PostgreSQL

8 min read

October 4, 2024

Measuring WAL Throughput in PostgreSQL: Step-by-Step Guide

Learn how to approximate change event throughput for PostgreSQL by measuring the Write-ahead-log (WAL).

Dani Pálma

Measuring WAL Throughput in PostgreSQL: Step-by-Step Guide

Share this article

Write-ahead logging (WAL) is essential to PostgreSQL's crash recovery and Change Data Capture (CDC) replication. Tracking the amount of WAL generated over time can help developers understand the volume of changes happening in their database, which is particularly useful for evaluating replication systems or monitoring replication performance.

In this article, we’ll walk through how to measure WAL throughput using SQL queries in PostgreSQL. We will create a table and views that allow you to track and calculate WAL volume over specific periods.

Step 1: Set Up a Table to Track WAL LSN Positions

To start, you’ll need a table to store the Log Sequence Numbers (LSNs) along with timestamps. This allows you to record the current LSN over time, which we can later use to calculate the WAL volume.

plaintextCREATE TABLE wal_lsn_history (
    id SERIAL PRIMARY KEY,
    timestamp TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    lsn_position pg_lsn
);

This table captures two key pieces of information:

timestamp: The exact time the LSN was recorded.
lsn_position: The current LSN when the row is inserted.

The Log Sequence Number (LSN) in PostgreSQL is a unique identifier for a specific location within the WAL. An LSN might look like this: 0/16B3740. This represents a specific point in the WAL, which the system can use to determine where to read or write next.

Source: https://materialize.com/blog/connecting-materialize-directly-to-postgresql-via-the-replication-stream/

Screenshot 2024-10-04 at 09.34.14.png — Source: https://materialize.com/blog/connecting-materialize-directly-to-postgresql-via-the-replication-stream/

Step 2: Create a Function to Record the Current WAL LSN

Next, you will create a function that inserts the current LSN into the wal_lsn_history table. This function can be called periodically to track changes in the WAL.

plaintextCREATE OR REPLACE FUNCTION record_current_wal_lsn()
RETURNS void AS $$
BEGIN
    INSERT INTO wal_lsn_history (lsn_position)
    VALUES (pg_current_wal_lsn());
END;
$$ LANGUAGE plpgsql;

You can execute this function manually or set up a cron job to run it at regular intervals, depending on how often you want to sample the WAL LSN.

Step 3: Create a View for WAL Volume Analytics

This view calculates the WAL bytes generated between each recorded LSN and the time difference between samples. It allows you to analyze WAL throughput over time.

plaintextCREATE OR REPLACE VIEW wal_volume_analytics AS
WITH time_periods AS (
    SELECT
        timestamp,
        lsn_position,
        LEAD(timestamp) OVER (ORDER BY timestamp DESC) as prev_timestamp,
        LEAD(lsn_position) OVER (ORDER BY timestamp DESC) as prev_lsn
    FROM wal_lsn_history
),
calculations AS (
    SELECT
        timestamp,
        lsn_position,
        EXTRACT(EPOCH FROM (timestamp - prev_timestamp)) as seconds_diff,
        pg_wal_lsn_diff(lsn_position, prev_lsn) as bytes_written
    FROM time_periods
    WHERE prev_lsn IS NOT NULL
)
SELECT
    timestamp,
    lsn_position,
    bytes_written as wal_bytes_since_previous,
    pg_size_pretty(bytes_written) as wal_size_since_previous,
    ROUND(bytes_written::numeric / NULLIF(seconds_diff, 0), 2) as bytes_per_second,
    pg_size_pretty((bytes_written::numeric / NULLIF(seconds_diff, 0))::bigint) || '/s' as rate_pretty,
    seconds_diff as seconds_since_previous
FROM calculations
WHERE seconds_diff > 0
ORDER BY timestamp DESC;

Explanation of Columns in the View:

wal_bytes_since_previous: The number of bytes written to the WAL since the previous LSN.
wal_size_since_previous: Human-readable size of the WAL written (e.g., KB, MB).
bytes_per_second: The rate of WAL generation, in bytes per second.
rate_pretty: A human-readable version of the rate, in KB/s or MB/s.
seconds_since_previous: The time difference between the current and previous LSN sample.

Example Output

The following output shows the calculated WAL volume for each recorded LSN:

timestamp	wal_size_since_previous	bytes_per_second	rate_pretty	seconds_since_previous
2024-10-01 14:22:10 +00:00	2 MB	250000.00	244 KB/s	8
2024-10-01 14:22:02 +00:00	1.5 MB	187500.00	183 KB/s	8

wal_volume_analytics table sample

Step 4: Create Summary Views for Time Windows

To provide a broader overview of WAL volume over various time windows (e.g., last 5 minutes, last hour), we can create another view that summarizes the WAL volume over configurable periods.

plaintextCREATE OR REPLACE VIEW wal_volume_summary AS
WITH time_windows AS (
    SELECT
        'Last 5 minutes' as window,
        5 as minutes,
        NOW() - INTERVAL '5 minutes' as start_time
    UNION ALL
    SELECT 'Last 15 minutes', 15, NOW() - INTERVAL '15 minutes'
    UNION ALL
    SELECT 'Last hour', 60, NOW() - INTERVAL '1 hour'
    UNION ALL
    SELECT 'Last day', 1440, NOW() - INTERVAL '1 day'
),
wal_diffs AS (
    SELECT
        h.timestamp,
        h.lsn_position,
        LAG(h.lsn_position) OVER (ORDER BY h.timestamp) as prev_lsn
    FROM wal_lsn_history h
),
calculated_wal AS (
    SELECT
        wd.timestamp,
        wd.lsn_position,
        wd.prev_lsn,
        pg_wal_lsn_diff(wd.lsn_position, wd.prev_lsn) as wal_diff
    FROM wal_diffs wd
    WHERE wd.prev_lsn IS NOT NULL
)
SELECT 
    w.window,
    COUNT(*) as samples,
    pg_size_pretty(SUM(c.wal_diff)) as total_wal_size,
    pg_size_pretty(AVG(c.wal_diff)::bigint) as avg_wal_per_minute,
    pg_size_pretty((SUM(c.wal_diff)::numeric / (w.minutes * 60))::bigint) || '/s' as avg_rate
FROM 
    time_windows w
    JOIN calculated_wal c ON c.timestamp > w.start_time
GROUP BY w.window, w.minutes
ORDER BY w.minutes;

Quick Explanation of Columns:

window: The measured time window.
samples: Number of WAL size samples.
total_wal_size: Total WAL size written in the time window.
avg_wal_per_minute: Average WAL generated per minute.
avg_rate: Average rate of WAL generation (e.g., KB/s).

Example Output

window	samples	total_wal_size	avg_wal_per_minute	avg_rate
Last 5 minutes	3	10 MB	2 MB	341 KB/s
Last 15 minutes	34	30 MB	2 MB	227 KB/s
Last hour	532	120 MB	2 MB	341 KB/s
Last day	3243	2 GB	1.5 MB	170 KB/s

wal_volume_summary table sample

And there we have it! Detailed statistics of WAL throughput over time. Keep in mind the WAL contains other stuff as well, so this is still just an approximation of change event throughput, but it is as close as we can get. Here's why:

WAL Logging: WAL logs more than just data changes—it also includes metadata and internal information not always reflected in CDC output.
WAL Compression: PostgreSQL can compress operations or include additional overhead (e.g., transaction IDs) that CDC might exclude.
Row-Level vs Full Writes: Some CDC tools might capture only the new values, while WAL logs old and new data along with metadata, affecting size comparisons.

But of course we don’t want to manually run the size collection function every time, so as a last step, let’s take a look at how we can automate this.

Scheduling WAL Data Collection with Cron

To continuously monitor WAL throughput over time, you need to collect the WAL LSN positions at regular intervals. This can be easily achieved by using cron jobs, a Linux-based task scheduler that runs scripts or commands at specified intervals.

If you prefer to avoid external tools like cron, PostgreSQL offers an internal job scheduling option using the pg_cron extension. pg_cron allows you to run SQL-based jobs directly within the database without needing to manage external scripts. This is particularly useful if you want to keep everything self-contained within PostgreSQL.

Step 1: Install pg_cron

First, you need to install and enable pg_cron. On most PostgreSQL installations, you can do this by running:

Install pg_cron via your package manager (e.g., for Ubuntu/Debian):

plaintext
sudo apt install postgresql-16-cron

Load the extension into your PostgreSQL instance:

plaintext
CREATE EXTENSION pg_cron;

Ensure pg_cron is set up to run jobs by editing your postgresql.conf file to include the cron schema:

plaintext
shared_preload_libraries = 'pg_cron'

Then, restart your PostgreSQL instance to apply the changes.

Step 2: Schedule WAL Data Collection

With pg_cron installed and loaded, you can now schedule jobs directly in SQL. To collect WAL LSN data periodically, use the following command to schedule the record_current_wal_lsn() function to run every minute:

plaintext
SELECT cron.schedule('Record WAL LSN every minute', '*/1 * * * *', 'SELECT record_current_wal_lsn();');

This command schedules the record_current_wal_lsn() function to run every minute using the cron syntax. The job will be handled entirely within PostgreSQL, eliminating the need for external scripts or cron setups.

Step 3: Monitor Scheduled Jobs

You can monitor and manage your scheduled jobs within PostgreSQL using pg_cron. To see the list of jobs scheduled with pg_cron, you can query the cron.job table:

plaintext
SELECT jobid, schedule, command, nodename, nodeport, active FROM cron.job;

If you want to stop a scheduled job, you can disable it using the cron.unschedule function:

plaintext
SELECT cron.unschedule(jobid);

Step 4: Adjusting the Collection Frequency

As with standard cron, you can adjust the frequency by modifying the cron syntax in the schedule string:

*/5 * * * *: Every 5 minutes.
0 * * * *: Every hour.
0 0 * * *: Every day at midnight.

Conclusion

Measuring WAL throughput is an important part of monitoring and optimizing PostgreSQL, especially for systems that rely on replication. By using the table, functions, and views outlined in this article, you can track how much data is being written to the WAL in real-time and over specific periods. This can help you make informed decisions about replication system evaluations, performance tuning, and database scaling.

Share this article

Table of Contents

Build a Pipeline

Start streaming your data for free

Build a Pipeline

About the author

Dani Pálma

Dani is a data professional with a rich background in data engineering and real-time data platforms. At Estuary, Daniel focuses on promoting cutting-edge streaming solutions, helping to bridge the gap between technical innovation and developer adoption. With deep expertise in cloud-native and streaming technologies, Dani has successfully supported startups and enterprises in building robust data solutions.

Measuring WAL Throughput in PostgreSQL: Step-by-Step Guide

Step 1: Set Up a Table to Track WAL LSN Positions

Step 2: Create a Function to Record the Current WAL LSN

Step 3: Create a View for WAL Volume Analytics

Example Output

Step 4: Create Summary Views for Time Windows

Example Output

Scheduling WAL Data Collection with Cron

Step 1: Install pg_cron

Step 2: Schedule WAL Data Collection

Step 3: Monitor Scheduled Jobs

Step 4: Adjusting the Collection Frequency

Conclusion

Start streaming your data for free

Start streaming your data for free

About the author

Popular Articles

ChatGPT for Sales Conversations: Building a Smart Dashboard

Why You Should Reconsider Debezium: Challenges and Alternatives

Don't Use Kafka as a Data Lake. Do This Instead.

Streaming Pipelines.

Simple to Deploy.

Simply Priced.