
Data flowing into the Data Lake inevitably changes. Change data capture (CDC) captures table-level changes in the source database and delivers them downstream to a receiving system or endpoint, keeping the two systems synchronised. CDC can feed real-time systems as well as batch and micro-batch systems. Typical objectives are to support end-user analytics (real-time or batch-based) and to avoid downtime during a database migration (e.g. from on premises to AWS).
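To make the mechanics concrete, here is a minimal Python sketch of how a downstream consumer might apply CDC events to stay synchronised. The event shape (an op field plus before/after row images) follows the common Debezium-style envelope; the in-memory dict is just a stand-in for the real target store.

```python
# A minimal sketch of replaying CDC events against a downstream target.
# The op/before/after envelope follows the common Debezium convention;
# the dict stands in for any keyed receiving store.

def apply_change(target: dict, event: dict) -> None:
    """Apply a single CDC event to a keyed target store."""
    key = event["key"]
    if event["op"] in ("c", "u"):        # create / update
        target[key] = event["after"]     # upsert the new row image
    elif event["op"] == "d":             # delete
        target.pop(key, None)            # remove the row if present

# Replaying the stream in order keeps the target in sync with the source.
target_table = {}
for event in [
    {"op": "c", "key": 1, "after": {"id": 1, "name": "alice"}},
    {"op": "u", "key": 1, "after": {"id": 1, "name": "alicia"}},
    {"op": "d", "key": 1},
]:
    apply_change(target_table, event)

print(target_table)  # {} -- the insert and update were superseded by the delete
```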
There are two common approaches: streaming changes in real time from on premises to AWS using Kafka, and migrating the data from on premises to AWS using AWS Database Migration Service (DMS).
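As a hedged sketch of the Kafka route, the snippet below consumes a Debezium-style CDC topic and lands the change records in S3 as micro-batches. The topic name, broker address, and bucket are placeholder assumptions, not values from the original setup.

```python
# Consuming a Debezium-style CDC topic and landing records in S3.
# Topic, broker, and bucket names below are placeholder assumptions.
import json
import boto3
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "onprem.inventory.orders",               # hypothetical CDC topic
    bootstrap_servers=["broker1:9092"],      # placeholder broker address
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)
s3 = boto3.client("s3")

batch = []
for message in consumer:
    batch.append(message.value)
    if len(batch) >= 500:                    # flush in micro-batches
        s3.put_object(
            Bucket="my-data-lake",           # placeholder bucket
            Key=f"cdc/orders/{message.offset}.json",
            Body=json.dumps(batch),
        )
        batch = []
```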


On-premises SQL -> AWS EC2-hosted SQL -> use AWS Database Migration Service (DMS) to move the data and schemas from on premises to AWS
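A minimal boto3 sketch of that DMS path, assuming the source and target endpoints and a replication instance are already provisioned; all ARNs and identifiers below are placeholders. The full-load-and-cdc migration type performs the initial bulk copy and then keeps replicating changes, which is what enables a no-downtime cutover.

```python
# Creating a DMS replication task: full load followed by ongoing CDC.
# All ARNs and identifiers are placeholder assumptions.
import json
import boto3

dms = boto3.client("dms")

table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-all",
        "object-locator": {"schema-name": "%", "table-name": "%"},
        "rule-action": "include",
    }]
}

response = dms.create_replication_task(
    ReplicationTaskIdentifier="onprem-to-aws-cdc",        # placeholder
    SourceEndpointArn="arn:aws:dms:...:endpoint:source",  # placeholder ARN
    TargetEndpointArn="arn:aws:dms:...:endpoint:target",  # placeholder ARN
    ReplicationInstanceArn="arn:aws:dms:...:rep:inst",    # placeholder ARN
    MigrationType="full-load-and-cdc",  # initial copy, then ongoing changes
    TableMappings=json.dumps(table_mappings),
)
print(response["ReplicationTask"]["Status"])
```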
If you are not migrating the database and application but leaving them on premises, and you still want the data to flow into the Data Lake, you can use DMS (ongoing database replication) for large volumes, or AWS DataSync (bulk file transfer).
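And a hedged sketch of the DataSync alternative, assuming the on-premises and S3 locations have already been registered with the service; both location ARNs are placeholders.

```python
# Running a DataSync transfer from a pre-registered on-premises NFS
# location to a pre-registered S3 location. ARNs are placeholders.
import boto3

datasync = boto3.client("datasync")

task = datasync.create_task(
    SourceLocationArn="arn:aws:datasync:...:location/loc-onprem-nfs",  # placeholder
    DestinationLocationArn="arn:aws:datasync:...:location/loc-s3",     # placeholder
    Name="onprem-files-to-data-lake",
)
execution = datasync.start_task_execution(TaskArn=task["TaskArn"])
print(execution["TaskExecutionArn"])
```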

Considerations for data migrations: