Application Usage v3.7
This chapter looks at BDR from an application or user perspective.
Setting up nodes is discussed in a later chapter, as are replication of DDL and the various options for controlling replication using replication sets.
Application Behavior
BDR supports replicating changes made on one node to other nodes.
BDR will, by default, replicate all changes from INSERTs, UPDATEs, DELETEs and TRUNCATEs from the source node to other nodes. Only the final changes will be sent, after all triggers and rules have been processed. For example, INSERT ... ON CONFLICT UPDATE will send either an INSERT or an UPDATE depending on what occurred on the origin. If an UPDATE or DELETE affects zero rows, then no changes will be sent.
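For instance, a minimal sketch (table and column names are illustrative, assuming a unique constraint on order_id):

```sql
-- Only the final change is replicated. If a row with order_id = 42 already
-- exists on the origin, this statement results in an UPDATE being sent to
-- the other nodes; otherwise an INSERT is sent.
INSERT INTO app_orders (order_id, status)
VALUES (42, 'new')
ON CONFLICT (order_id) DO UPDATE SET status = EXCLUDED.status;
```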
INSERTs can be replicated without any pre-conditions.
For UPDATEs and DELETEs to be replicated to other nodes, we must be able to identify the unique rows affected. BDR requires that a table have either a PRIMARY KEY defined, a UNIQUE constraint, or an explicit REPLICA IDENTITY defined on specific column(s). If none of those is defined, a WARNING will be generated, and later UPDATEs or DELETEs will be explicitly blocked. If REPLICA IDENTITY FULL is defined for a table, then a unique index is not required; in that case, UPDATEs and DELETEs are allowed and will use the first non-unique index that is live, valid, not deferred, and does not have expressions or WHERE clauses; otherwise a sequential scan will be used.
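As an illustration, either of the following (the table and its columns are hypothetical) satisfies that requirement:

```sql
CREATE TABLE app_orders (
    order_id bigint,
    status   text
);

-- Option 1: define a primary key, so UPDATEs and DELETEs can identify
-- the rows they affect.
ALTER TABLE app_orders ADD PRIMARY KEY (order_id);

-- Option 2: no suitable unique index is available, so send the whole old row
-- with every UPDATE and DELETE instead.
ALTER TABLE app_orders REPLICA IDENTITY FULL;
```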
TRUNCATE can be used even without a defined replication identity. Replication of TRUNCATE commands is supported, but some care must be taken when truncating groups of tables connected by foreign keys. When replicating a truncate action, the subscriber will truncate the same group of tables that was truncated on the origin, either explicitly specified or implicitly collected via CASCADE, except in cases where replication sets are defined; see the Replication Sets chapter for further details and examples. This will work correctly if all affected tables are part of the same subscription. But if some tables to be truncated on the subscriber have foreign-key links to tables that are not part of the same (or any) replication set, then the application of the truncate action on the subscriber will fail.
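For example, with two hypothetical tables linked by a foreign key:

```sql
-- "order_lines" references "orders" via a foreign key (names illustrative).
-- CASCADE makes the origin truncate both tables; the subscriber then
-- truncates the same group, which only succeeds if both tables are
-- replicated together.
TRUNCATE TABLE orders CASCADE;
```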
Row-level locks taken implicitly by INSERT, UPDATE and DELETE commands will be replicated as the changes are made. Table-level locks taken implicitly by INSERT, UPDATE, DELETE and TRUNCATE commands will also be replicated. Explicit row-level locking (SELECT ... FOR UPDATE/FOR SHARE) by user sessions is not replicated, nor are advisory locks. Information stored by transactions running in SERIALIZABLE mode is not replicated to other nodes; the SERIALIZABLE transaction isolation level is supported, but transactions will not be serialized across nodes in the presence of concurrent transactions on multiple nodes.
If DML is executed on multiple nodes concurrently, then potential conflicts could occur when executing with asynchronous replication, and these must be either handled or avoided. Various avoidance mechanisms are possible; they are discussed in the chapter on Conflicts, which is also required reading.
Sequences need special handling, described in the Sequences chapter.
Binary data in BYTEA columns is replicated normally, allowing "blobs" of data up to 1GB in size. Use of the PostgreSQL "Large object" facility is not supported in BDR.
Rules execute only on the origin node, so are not executed during apply, even if they are enabled for replicas.
Replication is only possible from base tables to base tables. That is, the tables on both the source and the target side of the subscription must be ordinary tables, not views, materialized views, or foreign tables. Attempts to replicate tables other than base tables will result in an error. DML changes that are made through updatable views are resolved to base tables on the origin and then applied to the same base table name on the target.
BDR supports partitioned tables transparently, meaning that a partitioned table can be added to a replication set and changes that involve any of the partitions will be replicated downstream.
By default, triggers execute only on the origin node. For example, an INSERT trigger executes on the origin node and is ignored when we apply the change on the target node. You can specify that triggers should execute on both the origin node at execution time and on the target when it is replicated ("apply time") by using ALTER TABLE ... ENABLE ALWAYS TRIGGER, or use the REPLICA option to execute only at apply time: ALTER TABLE ... ENABLE REPLICA TRIGGER.
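For example (trigger and table names are illustrative):

```sql
-- Fire the trigger on the origin and also at apply time on other nodes.
ALTER TABLE app_orders ENABLE ALWAYS TRIGGER audit_trigger;

-- Fire the trigger only at apply time, not on the origin.
ALTER TABLE app_orders ENABLE REPLICA TRIGGER stamp_trigger;
```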
Some types of trigger are not executed on apply, even if they exist on a table and are currently enabled. Trigger types not executed are:
- Statement-level triggers (FOR EACH STATEMENT)
- Per-column UPDATE triggers (UPDATE OF column_name [, ...])
BDR replication apply uses the system-level default search_path. Replica triggers, stream triggers, and index expression functions may assume other search_path settings, which will then fail when they execute on apply. To ensure this does not occur, resolve object references unambiguously: rely only on the default search_path, always use fully qualified references to objects (e.g. schema.objectname), or set the search path for the affected functions using ALTER FUNCTION ... SET search_path = ....
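For example, a sketch assuming a hypothetical function used by a replica trigger or index expression:

```sql
-- Pin the search path so object references inside the function resolve the
-- same way at apply time as they do on the origin.
ALTER FUNCTION app_schema.normalize_code(text)
    SET search_path = app_schema, pg_catalog;
```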
Note that BDR assumes that there are no issues related to text or other collatable datatypes, i.e. all collations in use are available on all nodes and the default collation is the same on all nodes. Replication of changes uses equality searches to locate Replica Identity values, so this will not have any effect except where unique indexes are explicitly defined with non-matching collation qualifiers. Row filters might be affected by differences in collations if collatable expressions were used.
BDR handling of very-long "toasted" data within PostgreSQL is transparent to the user. Note that the TOAST "chunkid" values will likely differ between the same row on different nodes, but that does not cause any problems.
BDR cannot work correctly if Replica Identity columns are marked as "external".
PostgreSQL allows CHECK() constraints that contain volatile functions. Since BDR re-executes CHECK() constraints on apply, any subsequent re-execution that doesn't return the same result as previously will cause data divergence.
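For example, a constraint like the following (table and constraint names are illustrative) can evaluate differently when re-executed at apply time on another node:

```sql
CREATE TABLE app_events (event_id bigint PRIMARY KEY, created_at timestamptz);

-- clock_timestamp() is volatile: it returns a different value on every
-- execution, so a row accepted here on the origin can fail this check when
-- the constraint is re-executed at apply time on another node.
ALTER TABLE app_events
    ADD CONSTRAINT events_must_be_recent
    CHECK (created_at > clock_timestamp() - interval '1 day');
```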
BDR does not restrict the use of Foreign Keys; cascading FKs are allowed.
BDR does not currently support the use of non-ASCII schema or relation names. Later versions will remove this restriction.
Non-replicated statements
None of the following user commands are replicated by BDR, so their effects occur on the local/origin node only:
- Cursor operations (DECLARE, CLOSE, FETCH)
- Execution commands (DO, CALL, PREPARE, EXECUTE, EXPLAIN)
- Session management (DEALLOCATE, DISCARD, LOAD)
- Parameter commands (SET, SHOW)
- Constraint manipulation (SET CONSTRAINTS)
- Locking commands (LOCK)
- Table Maintenance commands (VACUUM, ANALYZE, CLUSTER, REINDEX)
- Async operations (NOTIFY, LISTEN, UNLISTEN)
Note that since the NOTIFY SQL command and the pg_notify() function are not replicated, notifications are not reliable in case of failover. This means that notifications could easily be lost at failover if a transaction is committed just at the point the server crashes. Applications running LISTEN may miss notifications in case of failover. This is regrettably true in standard PostgreSQL replication, and BDR does not yet improve on this. CAMO and Eager replication options do not allow the NOTIFY SQL command or the pg_notify() function.
DML and DDL Replication
Note that BDR does not replicate the DML statement, it replicates the changes caused by the DML statement. So for example, an UPDATE that changed two rows would replicate two changes, whereas a DELETE that did not remove any rows would not replicate anything. This means that the results of execution of volatile statements are replicated, ensuring there is no divergence between nodes as might occur with statement-based replication.
DDL replication works differently to DML. For DDL, BDR replicates the statement, which is then executed on all nodes. So a DROP TABLE IF EXISTS might not replicate anything on the local node, but the statement is still sent to other nodes for execution if DDL replication is enabled. Full details are covered in their own chapter: [DDL replication].
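A brief sketch of the difference (table names are hypothetical):

```sql
-- DML: the resulting row values are replicated, not the statement text, so
-- the volatile random() values computed on the origin are identical on all
-- nodes.
UPDATE app_metrics SET score = random();

-- DDL: the statement itself is replicated and executed on every node, even
-- if it changes nothing on the local node.
DROP TABLE IF EXISTS app_metrics_old;
```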
BDR goes to great lengths to ensure that intermixed DML and DDL statements work correctly, even within the same transaction.
Replicating between different release levels
BDR is designed to replicate between nodes that have different major versions of PostgreSQL. This is a feature designed to allow major version upgrades without downtime.
BDR is also designed to replicate between nodes that have different versions of BDR software. This is a feature designed to allow version upgrades and maintenance without downtime.
However, while it's possible to join a node running a different major version to a cluster, you cannot add a node running an older minor version if the cluster uses a newer protocol version; this will return an error.
Both of the above features may be affected by specific restrictions; any known incompatibilities will be described in the release notes.
Replicating between nodes with differences
By default, DDL will automatically be sent to all nodes. This can be controlled manually, as described in DDL Replication, which could be used to create differences between database schemas across nodes. BDR is designed to allow replication to continue even while minor differences exist between nodes. These features are designed to allow application schema migration without downtime, or to allow logical standby nodes for reporting or testing.
Currently, replication requires the same table name on all nodes. A future feature may allow a mapping between different table names.
It is possible to replicate between tables with dissimilar partitioning definitions, such as a source which is a normal table replicating to a partitioned table, including support for updates that change partitions on the target. It can be faster if the partitioning definition is the same on the source and target since dynamic partition routing need not be executed at apply time. Further details are available in the chapter on Replication Sets.
By default, all columns are replicated. BDR replicates data columns based on the column name. If a column has the same name but a different datatype, we attempt to cast from the source type to the target type, if casts have been defined that allow that.
BDR supports replicating between tables that have a different number of columns.
If the target is missing one or more columns that are present on the source, then BDR will raise a target_column_missing conflict, for which the default conflict resolver is ignore_if_null. This will throw an ERROR if a non-NULL value arrives. Alternatively, a node can also be configured with a conflict resolver of ignore. That setting will not throw an ERROR; it just silently ignores any additional columns.
If the target has additional column(s) not seen in the source record then BDR will raise a source_column_missing conflict, for which the default conflict resolver is use_default_value. Replication will proceed if the additional columns have a default, either NULL (if nullable) or a default expression, but will throw an ERROR and halt replication if not.
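For example (definitions are illustrative), with a column present on the source but not on the target:

```sql
-- Source node
CREATE TABLE items (item_id bigint PRIMARY KEY, qty int, note text);

-- Target node: "note" is missing, so an incoming row with a non-NULL "note"
-- raises a target_column_missing conflict (the default resolver
-- ignore_if_null throws an ERROR). Conversely, a target-only column with a
-- default is filled via the source_column_missing resolver use_default_value.
CREATE TABLE items (item_id bigint PRIMARY KEY, qty int);
```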
Transform triggers can also be used on tables to provide default values or alter the incoming data in various ways before apply.
If the source and the target have different constraints, then replication will be attempted, but it might fail if the rows from source cannot be applied to the target. Row filters may help here.
Replicating data from one schema to a more relaxed schema won't cause failures. Replicating data from a schema to a more restrictive schema will be a source of potential failures. The right way to solve this is to place a constraint on the more relaxed side, so bad data is prevented from being entered. That way, no bad data ever arrives via replication, so it will never fail the transform into the more restrictive schema. For example, if one schema has a column of type TEXT and another schema defines the same column as XML, add a CHECK constraint onto the TEXT column that enforces that the text is XML.
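A minimal sketch of that example, assuming a hypothetical docs table with a body column and a server built with XML support:

```sql
-- On the node where "body" is TEXT, reject values that are not well-formed
-- XML, so replicated rows can never fail the XML-typed column on the other
-- node.
ALTER TABLE docs
    ADD CONSTRAINT docs_body_is_xml
    CHECK (body IS NULL OR xml_is_well_formed(body));
```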
A table may be defined with different indexes on each node. By default, the index definitions will be replicated. Refer to DDL Replication to specify how to create an index only on a subset of nodes, or just locally.
Storage parameters, such as fillfactor and toast_tuple_target, may differ between nodes for a table without problems. An exception is that the value of a table's user_catalog_table storage parameter must be identical on all nodes.
A table being replicated should be owned by the same user/role on each node. Refer to Security and Roles for further discussion.
Roles may have different passwords for connection on each node, though by default changes to roles are replicated to each node. Refer to DDL Replication to specify how to alter a role password only on a subset of nodes, or just locally.
Comparison between nodes with differences
LiveCompare is a tool used for data comparison on a database, against BDR and non-BDR nodes. It needs a minimum of two connections to compare against and reach a final result.
From LiveCompare 1.3 onwards, you can configure it with all_bdr_nodes set. This saves you from specifying all the relevant DSNs for each separate node in the cluster. An EDB Postgres Distributed cluster has N nodes with connection information, but only the initial and output connections are needed by LiveCompare 1.3+ to complete its job. Setting logical_replication_mode states how all the nodes are communicating.
All the configuration is done within a .ini file, named bdrLC.ini for example. Templates for this configuration file can be seen within the /etc/2ndq-livecompare/ location, where they were placed after the package install.
During the execution of LiveCompare, you will see N+1 progress bars, N being the number of processes. Once all the tables are sourced, a timer is displayed, along with the measured transactions per second (tps). The timer keeps counting, giving you an estimate, and then the total execution time at the end.
This tool offers a lot of customisation and filters, such as tables, schemas, and replication_sets. LiveCompare can be stopped and restarted without losing context information, so it can be run at convenient times. After the comparison, a summary and a DML script are generated so the user can review them. Apply the DML to fix any differences found.
General Rules for Applications
As discussed above, BDR uses replica identity values to identify the rows to be changed. Applications can cause difficulties if they insert, delete, and then later re-use the same unique identifiers. This is known as the ABA Problem. BDR cannot know whether the rows are the current row, the last row, or much older rows. See https://en.wikipedia.org/wiki/ABA_problem.
Similarly, since BDR uses table names to identify the table against which changes will be replayed, a similar ABA problem exists with applications that CREATE, then DROP, and then later re-use the same object names.
These issues give rise to some simple rules for applications to follow:
- Use unique identifiers for rows (INSERT)
- Avoid modification of unique identifiers (UPDATE)
- Avoid reuse of deleted unique identifiers
- Avoid reuse of dropped object names
In the general case, breaking those rules can lead to data anomalies and divergence. Applications can break those rules as long as certain conditions are met, but use caution: although anomalies can be unlikely, they are not impossible. For example, a row value can be reused as long as the DELETE has been replayed on all nodes, including down nodes. This might normally occur in less than a second, but could potentially take days if a severe issue occurred on one node that prevented it from restarting correctly.
Timing Considerations and Synchronous Replication
Being asynchronous by default, peer nodes may lag behind, making it possible for a client connected to multiple BDR nodes, or switching between them, to read stale data.
A queue wait function is provided for clients or proxies to prevent such stale reads.
The synchronous replication features of Postgres are available to BDR as well. In addition, BDR provides multiple variants of more synchronous replication. Please refer to the Durability & Performance Options chapter for an overview and comparison of all the variants available and their different modes.
Application Testing
BDR applications can be tested using the following programs, in addition to other techniques.
- [TPAexec]
- [pgbench with CAMO/Failover options]
- [isolationtester with multi-node access]
TPAexec
TPAexec is the system used by EDB to deploy reference TPA architectures, including those based on EDB Postgres Distributed.
TPAexec includes test suites for each reference architecture; it also simplifies creating and managing a local collection of tests to be run against a TPA cluster.
We strongly recommend that developers write their own multi-node suite of TPAexec tests which verify the main expected properties of the application.
pgbench with CAMO/Failover options
pgbench has been extended to allow users to run failover tests while using CAMO or regular BDR deployments. The following new options have been added:
- Use -m camo or -m failover to specify the mode for pgbench. The -m failover specification can be used to test failover in regular BDR deployments.
- Use --retry to specify whether transactions should be retried when failover happens with -m failover mode. This is enabled by default for -m camo mode.
In addition to the above options, the connection information about the peer node for failover must be specified in DSN form.
When run in camo mode in a CAMO environment, pgbench connects to node1 and runs the tests; if the connection to node1 is lost, then pgbench connects to node2. It queries node2 to get the status of in-flight transactions. Aborted and in-flight transactions will be retried in camo mode.
In failover mode, if --retry is specified, then in-flight transactions will be retried. In this scenario there is no way to find the status of in-flight transactions.
isolationtester with multi-node access
isolationtester has been extended to allow users to run tests on multiple sessions and on multiple nodes. This is used for internal BDR testing, though it is also available for use with user application testing.