change data capturepostgres

5 min read

September 12, 2024

How-to: Real-time Change Data Capture for Neon PostgreSQL with Estuary Flow

Learn how to stream real-time changes from Neon PostgreSQL with Estuary Flow and Change Data Capture (CDC)

Dani Pálma

How-to: Real-time Change Data Capture for Neon PostgreSQL with Estuary Flow

Share this article

This guide will walk you through setting up Change Data Capture (CDC) for your Neon PostgreSQL database using Estuary Flow. This setup will allow you to replicate data from your Neon database to external destinations in real-time, such as Snowflake, Databricks, Iceberg, or even other PostgreSQL databases.

Why CDC is Great for Data Replication

Here are a few points to illustrate why log-based CDC is the best way to replicate data in real-time from PostgreSQL databases such as Neon.

Real-time Data Synchronization: CDC enables you to keep multiple systems in sync with minimal delay, ensuring data consistency across your entire data ecosystem.
Reduced Load on Source Systems: CDC minimizes the impact on your source database's performance by only capturing and transferring changed data.
Efficient Data Integration: CDC facilitates efficient ETL (Extract, Transform, Load) processes by focusing only on changed data, reducing processing time and resource usage.
Historical Change Tracking: CDC can provide a detailed audit trail of data changes, which is crucial for compliance and data governance.

Why Estuary Flow is The Best Option for Neon CDC

Native Integration with Neon: Estuary Flow offers native support for Neon PostgreSQL, making it easy to set up and manage CDC pipelines from your Neon database to various destinations.
Real-time Data Pipelines: With Estuary Flow, you can create real-time data pipelines that capture and process data changes as they occur, enabling up-to-the-minute analytics and decision-making.
Automated Schema Management: Estuary Flow automatically detects schema changes in your source database and propagates these changes to your target systems, reducing manual intervention and potential errors.
Wide Range of No-Code Connectors: Estuary Flow supports a variety of source and destination connectors, allowing you to easily integrate your Neon PostgreSQL data with popular data warehouses, analytics platforms, and other databases.

Prerequisites

Before you begin, make sure you have:

An Estuary Flow account (you can start for free, no credit card is required!)
A Neon account

Step 1: Enable Logical Replication in Neon

Log in to the Neon Console and select your project.
Go to the Neon Dashboard and select "Project settings".
Click on the "Beta" tab.
Click "Enable" to turn on logical replication.

Image #2.png — Configure logical replication in Neon

Note: This action will modify the wal_level configuration parameter to logical for all databases in your Neon project. This change cannot be reversed and will restart all computes in your project. To verify that logical replication is enabled, run this query in the Neon SQL Editor:

plaintext
SHOW wal_level;

The result should be logical.

Step 2: Create a Postgres Role for Replication

Navigate to the Neon Console.
Select a project.
Select Branches.
Select the branch where you want to create the role.
On the Roles & Databases tab, click Add role.
Specify a role name (e.g., cdc_role) and click "Create".
The role is created and you are provided with the password for the role.

Alternatively, you can use the Neon CLI:

plaintext
neon roles create --name cdc_role

Step 3: Grant Schema Access to Your Postgres Role

In the SQL Editor of your project, run these SQL commands for each schema you want to replicate:

plaintextGRANT USAGE ON SCHEMA public TO cdc_role;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO cdc_role;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO cdc_role;

Step 4: Create a Publication

Create a publication named flow_publication that includes all the tables you want to ingest into Estuary Flow:

plaintextCREATE PUBLICATION flow_publication FOR TABLE <tbl1, tbl2, tbl3>;
ALTER PUBLICATION flow_publication SET (publish_via_partition_root = true);

Step 5: (Optional) Allow Inbound Traffic

Allow traffic in Neon based on IP addresses

Image #4.png — Allow traffic in Neon based on IP addresses

If you're using Neon's IP Allow feature, you'll need to allow inbound traffic from Estuary Flow's IP addresses. Consult the Estuary Flow documentation for the list of IPs to allowlist.

Step 6: Create a Neon Source Connector in Estuary Flow

In the Estuary Flow web UI, go to "Sources" and click "New Capture".
Search for "Neon PostgreSQL" from the connector catalog and click "Capture".

Image #5.png — Create Neon Capture in Estuary Flow

Enter your Neon database connection details:
- Server Address: (from your Neon connection string)
- User: cdc_role
- Password: (the password for cdc_role)
- Database: (your database name)

Image #6.png — Configure Neon Capture in Estuary Flow

Click "Next" to scan your database for available tables for replication.
Select the tables you want to replicate by enabling bindings for them.

Image #7.png — Configure Collections in Estuary Flow

Click "Save and Publish" to start the connector and begin the backfill process.

And that’s it! Estuary Flow will start backfilling all existing data in just a few seconds, then shift into a continuous replication mode to capture all future data changes. From this point, you can either set up some transformations on your data in a true ETL fashion, or you can materialize the incoming records straight into a data warehouse, such as Snowflake, Databricks, or BigQuery.

Conclusion

In this quick guide, you’ve learned how to set up an Estuary Flow Neon change data capture data flow in a just a few minutes. This integration enables you to activate your data for analytics, machine learning, AI, and a heap of other use cases.

Make sure to check out some of our other articles on our blog.

If you have any questions, join us in our community Slack channel!

Share this article

Table of Contents

Build a Pipeline

Start streaming your data for free

Build a Pipeline

About the author

Dani Pálma

Dani is a data professional with a rich background in data engineering and real-time data platforms. At Estuary, Daniel focuses on promoting cutting-edge streaming solutions, helping to bridge the gap between technical innovation and developer adoption. With deep expertise in cloud-native and streaming technologies, Dani has successfully supported startups and enterprises in building robust data solutions.

How-to: Real-time Change Data Capture for Neon PostgreSQL with Estuary Flow

Why CDC is Great for Data Replication

Why Estuary Flow is The Best Option for Neon CDC

Prerequisites

Step 1: Enable Logical Replication in Neon

Step 2: Create a Postgres Role for Replication

Step 3: Grant Schema Access to Your Postgres Role

Step 4: Create a Publication

Step 5: (Optional) Allow Inbound Traffic

Step 6: Create a Neon Source Connector in Estuary Flow

Conclusion

Start streaming your data for free

Start streaming your data for free

About the author

Popular Articles

ChatGPT for Sales Conversations: Building a Smart Dashboard

Why You Should Reconsider Debezium: Challenges and Alternatives

Don't Use Kafka as a Data Lake. Do This Instead.

Streaming Pipelines.

Simple to Deploy.

Simply Priced.