DDL replication v4
DDL stands for data definition language, the subset of the SQL language that creates, alters, and drops database objects.
For operational convenience and correctness, BDR replicates most DDL actions, with these exceptions:
- Temporary or unlogged relations
- Certain DDL statements (mostly long running)
- Locking commands (
LOCK
) - Table maintenance commands (
VACUUM
,ANALYZE
,CLUSTER
,REINDEX
) - Actions of autovacuum
- Operational commands (
CHECKPOINT
,ALTER SYSTEM
) - Actions related to databases or tablespaces
Automatic DDL replication makes certain DDL changes easier without having to manually distribute the DDL change to all nodes and ensure that they are consistent.
In the default replication set, DDL is replicated to all nodes by default. To replicate DDL, you must add a DDL replication filter to the replication set. See DDL replication filtering.
BDR is significantly different from standalone PostgreSQL when it comes to DDL replication. Treating it the same is the most common issue with BDR.
The main difference from table replication is that DDL replication doesn't replicate the result of the DDL but the statement itself. This works very well in most cases, although it introduces the requirement that the DDL must execute similarly on all nodes. A more subtle point is that the DDL must be immutable with respect to all datatype-specific parameter settings, including any datatypes introduced by extensions (not built-in). For example, the DDL statement must execute correctly in the default encoding used on each node.
DDL replication options
The bdr.ddl_replication
parameter specifies replication behavior.
bdr.ddl_replication = on
is the default and replicates DDL to the
default replication set, which by default means all nodes. Nondefault
replication sets don't replicate DDL unless they have a
DDL filter
defined for them.
You can also replicate DDL to specific replication sets using the
function bdr.replicate_ddl_command()
. This can be helpful if you
want to run DDL commands when a node is down or if you want to have
indexes or partitions that exist on a subset of nodes or rep sets,
for example, all nodes at site1.
While we don't recommend it, you can skip automatic DDL replication and
execute it manually on each node using bdr.ddl_replication
configuration
parameters.
When set, it makes BDR skip both the global locking and the replication of executed DDL commands. You must then run the DDL manually on all nodes.
Warning
Executing DDL manually on each node without global locking can cause the whole BDR group to stop replicating if conflicting DDL or DML executes concurrently.
The bdr.ddl_replication
parameter can be set only by the bdr_superuser,
by superuser, or in the config
file.
Executing DDL on BDR systems
A BDR group isn't the same as a standalone PostgreSQL server. It's based on asynchronous multi-master replication without central locking and without a transaction coordinator. This has important implications when executing DDL.
DDL that executes in parallel continues to do so with BDR. DDL execution respects the parameters that affect parallel operation on each node as it executes, so you might notice differences in the settings between nodes.
Prevent the execution of conflicting DDL, otherwise DDL replication causes errors and the replication stops.
BDR offers three levels of protection against those problems:
ddl_locking = 'dml'
is the best option for operations, usable when you execute
DDL from only one node at a time. This isn't the default, but we recommend
that you use this setting if you can control where DDL is executed from. Doing so
ensures that there are no inter-node conflicts. Intra-node conflicts are already
handled by PostgreSQL.
ddl_locking = on
is the strictest option and is best when DDL might execute
from any node concurrently and you want to ensure correctness.
ddl_locking = off
is the least strict option and is dangerous in general use.
This option skips locks altogether, avoiding any performance overhead, which makes
it a useful option when creating a new and empty database schema.
These options can be set only by the bdr_superuser, by the superuser, or in the config file
.
When using the bdr.replicate_ddl_command
, you can set this
parameter directly with the third argument, using the specified
bdr.ddl_locking
setting only for the DDL commands passed to that
function.
DDL locking details
Two kinds of locks enforce correctness of replicated DDL with BDR.
The first kind is known as a global DDL lock and is used only when ddl_locking = on
.
A global DDL lock prevents any other DDL from executing on the cluster while
each DDL statement runs. This ensures full correctness in the general case but
is too strict for many simple cases. BDR acquires a global lock on
DDL operations the first time in a transaction where schema changes are made.
This effectively serializes the DDL-executing transactions in the cluster. In
other words, while DDL is running, no other connection on any node can run
another DDL command, even if it affects different tables.
To acquire a lock on DDL operations, the BDR node executing DDL contacts the other nodes in a BDR group and asks them to grant it the exclusive right to execute DDL. The lock request is sent by the regular replication stream, and the nodes respond by the replication stream as well. So it's important that nodes (or at least a majority of the nodes) run without much replication delay. Otherwise it might take a long time for the node to acquire the DDL lock. Once the majority of nodes agrees, the DDL execution is carried out.
The ordering of DDL locking is decided using the Raft protocol. DDL statements executed on one node are executed in the same sequence on all other nodes.
To ensure that the node running a DDL has seen effects of all prior DDLs run in the cluster, it waits until it has caught up with the node that ran the previous DDL. If the node running the current DDL is lagging behind in replication with respect to the node that ran the previous DDL, then it might take a long time to acquire the lock. Hence it's preferable to run DDLs from a single node or the nodes that have nearly caught up with replication changes originating at other nodes.
The second kind is known as a relation DML lock. This kind of lock is used when
either ddl_locking = on
or ddl_locking = dml
, and the DDL statement might cause
in-flight DML statements to fail. These failures can occur when you add or modify a constraint
such as a unique constraint, check constraint, or NOT NULL constraint.
Relation DML locks affect only one relation at a time. Relation DML
locks ensure that no DDL executes while there are changes in the queue that
might cause replication to halt with an error.
To acquire the global DML lock on a table, the BDR node executing the DDL contacts all other nodes in a BDR group, asking them to lock the table against writes and waiting while all pending changes to that table are drained. Once all nodes are fully caught up, the originator of the DML lock is free to perform schema changes to the table and replicate them to the other nodes.
The global DML lock holds an EXCLUSIVE LOCK on the table on each node, so it blocks DML, other DDL, VACUUM, and index commands against that table while it runs. This is true even if the global DML lock is held for a command that normally doesn't take an EXCLUSIVE LOCK or higher.
Waiting for pending DML operations to drain can take a long time and even longer if replication is currently lagging. This means that schema changes affecting row representation and constraints, unlike with data changes, can be performed only while all configured nodes can be reached and are keeping up reasonably well with the current write rate. If such DDL commands must be performed while a node is down, first remove the down node from the configuration.
If a DDL statement isn't replicated, no global locks are acquired.
Locking behavior is specified by the bdr.ddl_locking
parameter, as
explained in Executing DDL on BDR systems:
ddl_locking = on
takes global DDL lock and, if needed, takes relation DML lock.ddl_locking = dml
skips global DDL lock and, if needed, takes relation DML lock.ddl_locking = off
skips both global DDL lock and relation DML lock.
Some BDR functions make DDL changes. For those functions, DDL locking behavior applies. This is noted in the docs for each function.
Thus, ddl_locking = dml
is safe only when you can guarantee that
no conflicting DDL is executed from other nodes. With this setting,
the statements that require only the global DDL lock don't use the global
locking at all.
ddl_locking = off
is safe only when you can guarantee that there are no
conflicting DDL and no conflicting DML operations on the database objects
DDL executes on. If you turn locking off and then experience difficulties,
you might lose in-flight changes to data. The user application team needs to resolve any issues caused.
In some cases, concurrently executing DDL can properly be serialized. If these serialization failures occur, the DDL might reexecute.
DDL replication isn't active on logical standby nodes until they are promoted.
Some BDR management functions act like DDL, meaning that they attempt to take global locks, and their actions are replicated if DDL replication is active. The full list of replicated functions is listed in BDR functions that behave like DDL.
DDL executed on temporary tables never need global locks.
ALTER or DROP of an object created in the current transaction doesn't required global DML lock.
Monitoring of global DDL locks and global DML locks is shown in Monitoring.
Minimizing the impact of DDL
Good operational advice for any database, these points become even more important with BDR:
To minimize the impact of DDL, make transactions performing DDL short, don't combine them with lots of row changes, and avoid long running foreign key or other constraint rechecks.
For
ALTER TABLE
, useADD CONSTRAINT NOT VALID
followed by another transaction withVALIDATE CONSTRAINT
rather than usingADD CONSTRAINT
alone.VALIDATE CONSTRAINT
waits until replayed on all nodes, which gives a noticeable delay to receive confirmations.When indexing, use the
CONCURRENTLY
option whenever possible.
An alternate way of executing long-running DDL is to disable DDL replication and then to execute the DDL statement separately on each node. You can still do this using a single SQL statement, as shown in the following example. Global locking rules still apply, so be careful not to lock yourself out with this type of usage, which is more of a workaround.
We recommend using the bdr.run_on_all_nodes()
technique with CREATE
INDEX CONCURRENTLY
, noting that DDL replication must be disabled for the whole
session because CREATE INDEX CONCURRENTLY
is a multi-transaction command.
Avoid CREATE INDEX
on production systems
since it prevents writes while it executes.
REINDEX
is replicated in versions up to 3.6 but not with BDR 3.7 or later.
Avoid using REINDEX
because of the AccessExclusiveLocks it holds.
Instead, use REINDEX CONCURRENTLY
(or reindexdb --concurrently
),
which is available in PG12+ or 2QPG11+.
You can disable DDL replication when using command-line utilities like this:
Multiple DDL statements might benefit from bunching into a single transaction rather than fired as individual statements, so take the DDL lock only once. This might not be desirable if the table-level locks interfere with normal operations.
If DDL is holding up the system for too long, you can safely
cancel the DDL on the originating node with Control-C
in psql or with pg_cancel_backend()
.
You can't cancel a DDL lock from any other node.
You can control how long the global lock takes with optional
global locking timeout settings.
bdr.global_lock_timeout
limits how long the wait
for acquiring the global lock can take before it's canceled.
bdr.global_lock_statement_timeout
limits the runtime length of any statement
in transaction that holds global locks, and bdr.global_lock_idle_timeout
sets the maximum allowed idle time (time between statements) for a transaction
holding any global locks. You can disable all of these timeouts by setting
their values to zero.
Once the DDL operation has committed on the originating node, you can't cancel or abort it. The BDR group must wait for it to apply successfully on other nodes that confirmed the global lock and for them to acknowledge replay. For this reason, keep DDL transactions short and fast.
Handling DDL with down nodes
If the node initiating the global DDL lock goes down after it acquired the global lock (either DDL or DML), the lock stays active. The global locks don't time out, even if timeouts were set. In case the node comes back up, it releases all the global locks that it holds.
If it stays down for a long time (or indefinitely),
remove the node from the BDR group to release the global locks. This
is one reason for executing emergency DDL using the SET
command as
the bdr_superuser to update the bdr.ddl_locking
value.
If one of the other nodes goes down after it confirmed the global lock but before the command acquiring it executed, the execution of that command requesting the lock continues as if the node were up.
As mentioned earlier, the global DDL lock requires only a majority of the nodes to respond, and so it works if part of the cluster is down, as long as a majority is running and reachable. But the DML lock can't be acquired unless the whole cluster is available.
With global DDL or global DML lock, if another node goes down, the command continues normally, and the lock is released.
Statement-specific DDL replication concerns
Not all commands can be replicated automatically. Such commands
are generally disallowed, unless DDL replication is turned off
by turning bdr.ddl_replication
off.
BDR prevents some DDL statements from running when it's active on a database. This protects the consistency of the system by disallowing statements that can't be replicated correctly or for which replication isn't yet supported.
If a statement isn't permitted under BDR, you can often find
another way to do the same thing. For example, you can't do an ALTER TABLE
,
which adds a column with a volatile default value. But generally you can
rephrase that as a series of independent ALTER TABLE
and UPDATE
statements
that work.
Generally unsupported statements are prevented from being
executed, raising a feature_not_supported
(SQLSTATE 0A000
) error.
Any DDL that references or relies on a temporary object can't be replicated by BDR and throws an error if executed with DDL replication enabled.
BDR DDL command handling matrix
The following table describes the utility or DDL commands that are allowed, the ones that are replicated, and the type of global lock they take when they're replicated.
For some more complex statements like ALTER TABLE
, these can differ depending
on the subcommands executed. Every such command has detailed explanation
under the following table.
Command | Allowed | Replicated | Lock |
---|---|---|---|
ALTER AGGREGATE | Y | Y | DDL |
ALTER CAST | Y | Y | DDL |
ALTER COLLATION | Y | Y | DDL |
ALTER CONVERSION | Y | Y | DDL |
ALTER DATABASE | Y | N | N |
ALTER DATABASE LINK | Y | Y | DDL |
ALTER DEFAULT PRIVILEGES | Y | Y | DDL |
ALTER DIRECTORY | Y | Y | DDL |
ALTER DOMAIN | Y | Y | DDL |
ALTER EVENT TRIGGER | Y | Y | DDL |
ALTER EXTENSION | Y | Y | DDL |
ALTER FOREIGN DATA WRAPPER | Y | Y | DDL |
ALTER FOREIGN TABLE | Y | Y | DDL |
ALTER FUNCTION | Y | Y | DDL |
ALTER INDEX | Y | Y | DDL |
ALTER LANGUAGE | Y | Y | DDL |
ALTER LARGE OBJECT | N | N | N |
ALTER MATERIALIZED VIEW | Y | N | N |
ALTER OPERATOR | Y | Y | DDL |
ALTER OPERATOR CLASS | Y | Y | DDL |
ALTER OPERATOR FAMILY | Y | Y | DDL |
ALTER PACKAGE | Y | Y | DDL |
ALTER POLICY | Y | Y | DDL |
ALTER PROCEDURE | Y | Y | DDL |
ALTER PROFILE | Y | Y | Details |
ALTER PUBLICATION | Y | Y | DDL |
ALTER QUEUE | Y | Y | DDL |
ALTER QUEUE TABLE | Y | Y | DDL |
ALTER REDACTION POLICY | Y | Y | DDL |
ALTER RESOURCE GROUP | Y | N | N |
ALTER ROLE | Y | Y | DDL |
ALTER ROUTINE | Y | Y | DDL |
ALTER RULE | Y | Y | DDL |
ALTER SCHEMA | Y | Y | DDL |
ALTER SEQUENCE | Details | Y | DML |
ALTER SERVER | Y | Y | DDL |
ALTER SESSION | Y | N | N |
ALTER STATISTICS | Y | Y | DDL |
ALTER SUBSCRIPTION | Y | Y | DDL |
ALTER SYNONYM | Y | Y | DDL |
ALTER SYSTEM | Y | N | N |
ALTER TABLE | Details | Y | Details |
ALTER TABLESPACE | Y | N | N |
ALTER TEXT SEARCH CONFIGURATION | Y | Y | DDL |
ALTER TEXT SEARCH DICTIONARY | Y | Y | DDL |
ALTER TEXT SEARCH PARSER | Y | Y | DDL |
ALTER TEXT SEARCH TEMPLATE | Y | Y | DDL |
ALTER TRIGGER | Y | Y | DDL |
ALTER TYPE | Y | Y | DDL |
ALTER USER MAPPING | Y | Y | DDL |
ALTER VIEW | Y | Y | DDL |
ANALYZE | Y | N | N |
BEGIN | Y | N | N |
CHECKPOINT | Y | N | N |
CLOSE | Y | N | N |
CLOSE CURSOR | Y | N | N |
CLOSE CURSOR ALL | Y | N | N |
CLUSTER | Y | N | N |
COMMENT | Y | Details | DDL |
COMMIT | Y | N | N |
COMMIT PREPARED | Y | N | N |
COPY | Y | N | N |
COPY FROM | Y | N | N |
CREATE ACCESS METHOD | Y | Y | DDL |
CREATE AGGREGATE | Y | Y | DDL |
CREATE CAST | Y | Y | DDL |
CREATE COLLATION | Y | Y | DDL |
CREATE CONSTRAINT | Y | Y | DDL |
CREATE CONVERSION | Y | Y | DDL |
CREATE DATABASE | Y | N | N |
CREATE DATABASE LINK | Y | Y | DDL |
CREATE DIRECTORY | Y | Y | DDL |
CREATE DOMAIN | Y | Y | DDL |
CREATE EVENT TRIGGER | Y | Y | DDL |
CREATE EXTENSION | Y | Y | DDL |
CREATE FOREIGN DATA WRAPPER | Y | Y | DDL |
CREATE FOREIGN TABLE | Y | Y | DDL |
CREATE FUNCTION | Y | Y | DDL |
CREATE INDEX | Y | Y | DML |
CREATE LANGUAGE | Y | Y | DDL |
CREATE MATERIALIZED VIEW | Y | N | N |
CREATE OPERATOR | Y | Y | DDL |
CREATE OPERATOR CLASS | Y | Y | DDL |
CREATE OPERATOR FAMILY | Y | Y | DDL |
CREATE PACKAGE | Y | Y | DDL |
CREATE PACKAGE BODY | Y | Y | DDL |
CREATE POLICY | Y | Y | DML |
CREATE PROCEDURE | Y | Y | DDL |
CREATE PROFILE | Y | Y | Details |
CREATE PUBLICATION | Y | Y | DDL |
CREATE QUEUE | Y | Y | DDL |
CREATE QUEUE TABLE | Y | Y | DDL |
CREATE REDACTION POLICY | Y | Y | DDL |
CREATE RESOURCE GROUP | Y | N | N |
CREATE ROLE | Y | Y | DDL |
CREATE ROUTINE | Y | Y | DDL |
CREATE RULE | Y | Y | DDL |
CREATE SCHEMA | Y | Y | DDL |
CREATE SEQUENCE | Details | Y | DDL |
CREATE SERVER | Y | Y | DDL |
CREATE STATISTICS | Y | Y | DDL |
CREATE SUBSCRIPTION | Y | Y | DDL |
CREATE SYNONYM | Y | Y | DDL |
CREATE TABLE | Details | Y | DDL |
CREATE TABLE AS | Details | Y | DDL |
CREATE TABLESPACE | Y | N | N |
CREATE TEXT SEARCH CONFIGURATION | Y | Y | DDL |
CREATE TEXT SEARCH DICTIONARY | Y | Y | DDL |
CREATE TEXT SEARCH PARSER | Y | Y | DDL |
CREATE TEXT SEARCH TEMPLATE | Y | Y | DDL |
CREATE TRANSFORM | Y | Y | DDL |
CREATE TRIGGER | Y | Y | DDL |
CREATE TYPE | Y | Y | DDL |
CREATE TYPE BODY | Y | Y | DDL |
CREATE USER MAPPING | Y | Y | DDL |
CREATE VIEW | Y | Y | DDL |
DEALLOCATE | Y | N | N |
DEALLOCATE ALL | Y | N | N |
DECLARE CURSOR | Y | N | N |
DISCARD | Y | N | N |
DISCARD ALL | Y | N | N |
DISCARD PLANS | Y | N | N |
DISCARD SEQUENCES | Y | N | N |
DISCARD TEMP | Y | N | N |
DO | Y | N | N |
DROP ACCESS METHOD | Y | Y | DDL |
DROP AGGREGATE | Y | Y | DDL |
DROP CAST | Y | Y | DDL |
DROP COLLATION | Y | Y | DDL |
DROP CONSTRAINT | Y | Y | DDL |
DROP CONVERSION | Y | Y | DDL |
DROP DATABASE | Y | N | N |
DROP DATABASE LINK | Y | Y | DDL |
DROP DIRECTORY | Y | Y | DDL |
DROP DOMAIN | Y | Y | DDL |
DROP EVENT TRIGGER | Y | Y | DDL |
DROP EXTENSION | Y | Y | DDL |
DROP FOREIGN DATA WRAPPER | Y | Y | DDL |
DROP FOREIGN TABLE | Y | Y | DDL |
DROP FUNCTION | Y | Y | DDL |
DROP INDEX | Y | Y | DDL |
DROP LANGUAGE | Y | Y | DDL |
DROP MATERIALIZED VIEW | Y | N | N |
DROP OPERATOR | Y | Y | DDL |
DROP OPERATOR CLASS | Y | Y | DDL |
DROP OPERATOR FAMILY | Y | Y | DDL |
DROP OWNED | Y | Y | DDL |
DROP PACKAGE | Y | Y | DDL |
DROP PACKAGE BODY | Y | Y | DDL |
DROP POLICY | Y | Y | DDL |
DROP PROCEDURE | Y | Y | DDL |
DROP PROFILE | Y | Y | DDL |
DROP PUBLICATION | Y | Y | DDL |
DROP QUEUE | Y | Y | DDL |
DROP QUEUE TABLE | Y | Y | DDL |
DROP REDACTION POLICY | Y | Y | DDL |
DROP RESOURCE GROUP | Y | N | N |
DROP ROLE | Y | Y | DDL |
DROP ROUTINE | Y | Y | DDL |
DROP RULE | Y | Y | DDL |
DROP SCHEMA | Y | Y | DDL |
DROP SEQUENCE | Y | Y | DDL |
DROP SERVER | Y | Y | DDL |
DROP STATISTICS | Y | Y | DDL |
DROP SUBSCRIPTION | Y | Y | DDL |
DROP SYNONYM | Y | Y | DDL |
DROP TABLE | Y | Y | DML |
DROP TABLESPACE | Y | N | N |
DROP TEXT SEARCH CONFIGURATION | Y | Y | DDL |
DROP TEXT SEARCH DICTIONARY | Y | Y | DDL |
DROP TEXT SEARCH PARSER | Y | Y | DDL |
DROP TEXT SEARCH TEMPLATE | Y | Y | DDL |
DROP TRANSFORM | Y | Y | DDL |
DROP TRIGGER | Y | Y | DDL |
DROP TYPE | Y | Y | DDL |
DROP TYPE BODY | Y | Y | DDL |
DROP USER MAPPING | Y | Y | DDL |
DROP VIEW | Y | Y | DDL |
EXECUTE | Y | N | N |
EXPLAIN | Y | Details | Details |
FETCH | Y | N | N |
GRANT | Y | Details | DDL |
GRANT ROLE | Y | Y | DDL |
IMPORT FOREIGN SCHEMA | Y | Y | DDL |
LISTEN | Y | N | N |
LOAD | Y | N | N |
LOAD ROW DATA | Y | Y | DDL |
LOCK TABLE | Y | N | Details |
MOVE | Y | N | N |
NOTIFY | Y | N | N |
PREPARE | Y | N | N |
PREPARE TRANSACTION | Y | N | N |
REASSIGN OWNED | Y | Y | DDL |
REFRESH MATERIALIZED VIEW | Y | N | N |
REINDEX | Y | N | N |
RELEASE | Y | N | N |
RESET | Y | N | N |
REVOKE | Y | Details | DDL |
REVOKE ROLE | Y | Y | DDL |
ROLLBACK | Y | N | N |
ROLLBACK PREPARED | Y | N | N |
SAVEPOINT | Y | N | N |
SECURITY LABEL | Y | Details | DDL |
SELECT INTO | Details | Y | DDL |
SET | Y | N | N |
SET CONSTRAINTS | Y | N | N |
SHOW | Y | N | N |
START TRANSACTION | Y | N | N |
TRUNCATE TABLE | Y | Details | Details |
UNLISTEN | Y | N | N |
VACUUM | Y | N | N |
ALTER SEQUENCE
Generally ALTER SEQUENCE
is supported, but when using global
sequences, some options have no effect.
ALTER SEQUENCE ... RENAME
isn't supported on galloc sequences (only).
ALTER SEQUENCE ... SET SCHEMA
isn't supported on galloc sequences (only).
ALTER TABLE
Generally, ALTER TABLE
commands are allowed. However, several
subcommands aren't supported.
ALTER TABLE disallowed commands
Some variants of ALTER TABLE
currently aren't allowed on a BDR node:
ADD COLUMN ... DEFAULT (non-immutable expression)
— This is not allowed because it currently results in different data on different nodes. See Adding a column for a suggested workaround.ADD CONSTRAINT ... EXCLUDE
— Exclusion constraints aren't supported for now. Exclusion constraints don't make much sense in an asynchronous system and lead to changes that can't be replayed.ALTER TABLE ... SET WITH[OUT] OIDS
— Isn't supported for the same reasons as inCREATE TABLE
.ALTER COLUMN ... SET STORAGE external
— Is rejected if the column is one of the columns of the replica identity for the table.RENAME
— Can't rename an Autopartitioned table.SET SCHEMA
— Can't set the schema of an Autopartitioned table.ALTER COLUMN ... TYPE