Introduction to PostgreSQL Logical Replication: A Beginner's Guide
PostgreSQL logical replication is a powerful feature that allows you to replicate data at a finer level of control compared to physical replication. Unlike physical replication, which copies the entire database cluster, logical replication enables replication of specific tables, offering flexibility and efficiency for various use cases like real-time analytics and data warehousing.
In this guide, we’ll cover the fundamentals of logical replication, how it differs from physical replication, and how to set it up step-by-step.
What is Logical Replication?
Logical replication is a method of streaming data changes between PostgreSQL databases at the table level. It uses a publish-subscribe model:
- Publisher: The source database that sends the changes.
- Subscriber: The destination database that applies the changes.
With logical replication, you can replicate only the required data instead of the entire database, making it ideal for heterogeneous replication or consolidating data from multiple sources.
Logical Replication vs Physical Replication
Feature | Logical Replication | Physical Replication |
---|---|---|
Replication Level | Table-specific replication | Entire database cluster |
Use Case | Partial data replication, analytics | High Availability, disaster recovery |
Flexibility | High (supports custom setups) | Limited |
Cross-Version | Supported | Not supported |
When to Use Logical Replication
- Data Analytics: Replicate specific tables for real-time reporting.
- Data Distribution: Share data with multiple subscribers or replicas.
- Heterogeneous Replication: Sync data between different PostgreSQL versions.
- Selective Replication: Target specific tables for replication.
Before Setting up Replication
Before enabling logical replication in PostgreSQL, it’s essential to prepare your environment to handle the changes effectively. Logical replication requires certain configurations that can impact system performance and storage. This section outlines the key considerations and steps you should take before setting up replication.
- Monitor and Plan for Increased WAL Usage
Logical replication relies on Write-Ahead Logging (WAL) to track and propagate changes. When you set wal_level
to logical
, the volume of WAL logs increases, potentially consuming significant disk space.
- Evaluate Network Bandwidth
Logical replication streams data changes over the network between the publisher and subscriber. Insufficient bandwidth may cause delays in replication and increase latency.
- Verify Resource Availability
Logical replication may increase CPU and memory usage on both the publisher and subscriber due to the additional processes handling replication.
Setting Up Logical Replication
This section provides a step-by-step guide to configure logical replication, including commands and parameter settings to ensure a seamless setup.
Step 1: Enable Logical Replication
Logical replication requires specific parameters to be configured in the postgresql.conf
file.
- Open the PostgreSQL configuration file:
sudo nano /var/lib/pgsql/data/postgresql.conf
- Modify or add the following parameters:
wal_level = logical
max_replication_slots = 4
max_wal_senders = 4
wal_level
: Must be set tological
.max_replication_slots
: Number of replication slots available.max_wal_senders
: Number of concurrent processes for sending WAL data.
- Restart PostgreSQL to apply changes:
sudo systemctl restart postgresql
Step 2: Configure the Publisher
- Allow replication connections in
pg_hba.conf
:
Open thepg_hba.conf
file:
sudo nano /var/lib/pgsql/data/pg_hba.conf
Add an entry for the subscriber:
host replication replicator 192.168.1.10/32 md5
- Create a replication role:
CREATE ROLE replicator WITH REPLICATION PASSWORD 'password' LOGIN;
- Create a publication:
CREATE PUBLICATION my_publication FOR TABLE employees, orders;
To set replication for all tables:
CREATE PUBLICATION my_publication FOR ALL TABLES;
Step 3: Export Database Structure
We can use pg_dump
to export the structure of the tables included in the publication:
pg_dump -h <publisher_host> -U <publisher_user> -d <publisher_db> -t employees -t orders --schema-only -f table_structure.sql
- Replace
<publisher_host>
,<publisher_user>
, and<publisher_db>
with the publisher's connection details. -t employees -t orders
: Specify the tables to export.--schema-only
: Exports only the table structure, not the data.-f table_structure.sql
: Outputs the structure to a file.
Step 4: Import structure in subscriber
Use pg_restore
or psql
to import the table structure into the subscriber database:
psql -h <subscriber_host> -U <subscriber_user> -d <subscriber_db> -f table_structure.sql
Replace <subscriber_host>
, <subscriber_user>
, and <subscriber_db>
with the subscriber's connection details.
Step 5: Configure the Subscriber
- Create table structure for all the tables needed, preferred method for creating these table is to have pg_dump and pg_restore for the structure
- Create a subscription on the subscriber database:
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=192.168.1.20 port=5432 dbname=mydb user=replicator password=password'
PUBLICATION my_publication;
my_subscription
: Name of the subscription.CONNECTION
: Connection string to the publisher.PUBLICATION my_publication
: Name of the publication to subscribe to.
- Verify the subscription:
SELECT * FROM pg_stat_subscription;
Step 6: Verify Logical Replication
Insert sample data into the employees
table on the publisher:
INSERT INTO employees (id, name, department) VALUES (1, 'Alice', 'HR');
Check the data on the subscriber database:
SELECT * FROM employees;
The data should appear in the subscriber database.
Important Queries for Managing Logical Replication
Below queries are important when you want to manage or check current status of the replication in the database.
- List Publications:
SELECT * FROM pg_publication;
- List Subscriptions:
SELECT * FROM pg_subscription;
- Monitor Replication Slots:
SELECT * FROM pg_replication_slots;
Cleanup Replication
After you are done with the replication testing or if you do not want to keep streaming the replication you can use below commands to cleanup it
- Drop a Subscription:
DROP SUBSCRIPTION my_subscription;
- Drop a Publication:
DROP PUBLICATION my_publication;
Best Practices for Logical Replication
- Monitor replication lag and adjust
max_wal_senders
as needed. - Ensure subscriber tables are initialized before creating a subscription.
Conclusion
Logical replication in PostgreSQL is a versatile tool that gives you granular control over data replication. Whether you're distributing data across regions or setting up real-time analytics, it provides a flexible and efficient solution. With this guide, you're now ready to set up and manage logical replication in your PostgreSQL environment.
For more tutorials and insights into PostgreSQL, stay tuned to our blog!
References:
Member discussion