Data Replication Methods

Data replication takes data from your source database, business application, API, file storage, etc., and copies it into your destination data warehouse or transactional database. After identifying the data you want to bring in, you need to determine how to replicate it for meeting your business needs.

Choosing the Right Method

Data Replication Methods

Given that the data replication method you choose will impact your data, we support various replication methods to give you as much flexibility as possible. The table below contains a high-level look at each of Etlworks' Replication Methods and compares their pros and cons.

Method
Pros
Cons
Change Data Capture (CDC)
Uses database redo [transaction] log to track changes in the source
  • Fast
  • No polling from database tables, uses database redo log instead
  • No polling from database tables, uses database redo log instead
  • Supports deletes
  • Currently supports Postgres, MySQL, SQL Server, DB2, Oracle, and MongoDB
  • Some older versions of the databases above do not support CDC
  • Requires extra setup in the source database
Change Data Tracking (CT)
Synchronous tracking mechanism, in which the changes information will be available directly once the DML change is committed
  • Fast
  • No polling from database tables
  • Supports deletes
  • Supports [almost] real-time replication
  • Requires extra setup in the source database
High Watermark
Synchronous tracking mechanism, in which the changes information will be available directly once the DML change is committed
  • Fast
  • No extra moving parts
  • Works for all data sources, including all databases, files, and APIs
  • Does not support deletes
  • Requires a dedicated high watermark field in each table
Database Triggers
Synchronous tracking mechanism, in which the changes information will be available directly once the DML change is committed
  • Works for any source database which has triggers
  • No extra requirements for the specific version of the database or extra field in each table
  • Requires adding triggers to all database tables
  • Triggers can negatively impact performance
Real-time CDC with Kafka
Polls CDC events from the Kafka topic(s) to track changes in the source.
  • Fast
  • No polling from database tables, uses database redo log instead
  • No polling from database tables, uses database redo log instead
  • Supports deletes
  • Complicated setup (requires Kafka, Zookeeper, Kafka Connect, and Debezium)
  • Currently supports Postgres, MySQL, SQL Server, DB2, Oracle, and MongoDB
  • Some older versions of the databases abovedo not support CDC
  • Requires extra setup in the source database
Full Refresh
Always polls the entire dataset from the source.
  • The simplest to setup
  • Can be quite fast for the relatively small datasets (<100K records)
  • Works for all data sources
  • Not recommended for large datasets
Method
Change Data Capture (CDC)
Uses database redo [transaction] log to track changes in the source
Pros
  • Fast
  • No polling from database tables, uses database redo log instead
  • No polling from database tables, uses database redo log instead
  • Supports deletes
Cons
  • Currently supports Postgres, MySQL, SQL Server, DB2, Oracle, and MongoDB
  • Some older versions of the databases above do not support CDC
  • Requires extra setup in the source database
Method
Change Data Tracking (CT)
Synchronous tracking mechanism, in which the changes information will be available directly once the DML change is committed
Pros
  • Fast
  • No polling from database tables
  • Supports deletes
  • Supports [almost] real-time replication
Cons
  • Requires extra setup in the source database
Method
High Watermark
Synchronous tracking mechanism, in which the changes information will be available directly once the DML change is committed
Pros
  • Fast
  • No extra moving parts
  • Works for all data sources, including all databases, files, and APIs
Cons
  • Fast
  • No extra moving parts
  • Works for all data sources, including all databases, files, and APIs
Method
Database Triggers
Synchronous tracking mechanism, in which the changes information will be available directly once the DML change is committed
Pros
  • Works for any source database which has triggers
  • No extra requirements for the specific version of the database or extra field in each table
Cons
  • Requires adding triggers to all database tables
  • Triggers can negatively impact performance
Method
Real-time CDC with Kafka
Polls CDC events from the Kafka topic(s) to track changes in the source.
Pros
  • Fast
  • No polling from database tables, uses database redo log instead
  • No polling from database tables, uses database redo log instead
  • Supports deletes
Cons
  • Complicated setup (requires Kafka, Zookeeper, Kafka Connect, and Debezium)
  • Currently supports Postgres, MySQL, SQL Server, DB2, Oracle, and MongoDB
  • Some older versions of the databases abovedo not support CDC
  • Requires extra setup in the source database
Method
Full Refresh
Always polls the entire dataset from the source.
Pros
  • The simplest to setup
  • Can be quite fast for the relatively small datasets (<100K records)
  • Works for all data sources
Cons
  • Not recommended for large datasets

Ready to Start Using ETLWorks Integrator?

Try 14 Days Free

start free trial

Get a Personalized Demo

request demo