PostgresBench Brings Transparency to Managed Database Performance
ClickHouse's open-source benchmarking harness lets developers run standardized pgbench workloads to compare hosted Postgres providers.
Cloud database providers love to claim they are "3x faster" or "the most cost-effective" option on the market. Historically, these claims have been nearly impossible for developers to verify. They are typically published in marketing-heavy whitepapers with hidden configurations, custom hardware, or undisclosed network topologies.
To combat this lack of transparency, ClickHouse has launched PostgresBench, an open-source, reproducible benchmarking harness designed to evaluate managed PostgreSQL services. Following the success of ClickBench in the OLAP space, PostgresBench brings a standardized, public methodology to transactional (OLTP) workloads. By automating the execution of native Postgres benchmarking tools, PostgresBench allows developers to run identical workloads across different hosted environments (including AWS RDS, Aurora, Crunchy Bridge, Neon, and Postgres by ClickHouse) and compare the results objectively.
For developers tasked with choosing a database vendor or justifying a migration, this tool represents a major shift toward cloud performance claims that are verifiable and open to public review.
Standardizing the OLTP Workload with pgbench
At the core of PostgresBench is pgbench, the native benchmarking tool that ships directly with PostgreSQL. While other multi-database benchmarking suites like sysbench or Percona TPCC exist, they were originally designed with MySQL workloads in mind. Utilizing pgbench ensures a natural fit for PostgreSQL's architecture and eliminates the need for complex, third-party client installations.
PostgresBench leverages pgbench's built-in, TPC-B-like workload. This workload simulates a heavy transactional banking application, executing a mix of SELECT, UPDATE, and INSERT operations across four primary tables: pgbench_accounts, pgbench_branches, pgbench_tellers, and pgbench_history. It is a write-heavy, highly concurrent pattern that mimics common production scenarios like payment processing, inventory updates, and order management.
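For reference, here is the transaction behind pgbench's built-in TPC-B-like script, as given in the pgbench documentation (pgbench --show-script=tpcb-like prints the same script). Each transaction updates one account, reads its balance back, updates one teller and one branch, and appends a row to pgbench_history. The sketch writes the script to a file so it can be inspected, modified, and run as a custom script; the function and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Write a copy of pgbench's built-in "tpcb-like" transaction to a custom script file.
# The quoted 'EOF' keeps the shell from interpreting the \set meta-commands and :variables.
write_tpcb_like_script() {
  cat > "${1:-tpcb_like.sql}" <<'EOF'
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
EOF
}

# Usage (hypothetical file name):
#   write_tpcb_like_script tpcb_like.sql
#   pgbench -f tpcb_like.sql -s "$SCALE_FACTOR" -c 256 -j 16 -T 600 -M prepared "$PGDATABASE"
# With a custom script, pass -s so that :scale matches the initialized data;
# pgbench only detects the scale automatically for its built-in scripts.
```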
To capture stable, realistic throughput, PostgresBench executes pgbench with a highly demanding set of parameters:
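# pgbench takes the database name as a trailing positional argument; its -d flag means --debug
pgbench -c 256 -j 16 -T 600 -M prepared -P 30 \
  -s "$SCALE_FACTOR" \
  -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" "$PGDATABASE"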
Each parameter is chosen deliberately to push the target database to its limits:
- -c 256: Simulates 256 concurrent client sessions, representing a highly active production application.
- -j 16: Spawns 16 worker threads on the client machine to distribute the load generation (16 clients per thread).
- -T 600: Runs the benchmark for 10 minutes (600 seconds). This duration is critical: it ensures the database moves past its initial warm-up phase, fills its shared buffers, and experiences sustained disk I/O.
- -M prepared: Forces the use of prepared statements, so each query is parsed once per connection rather than on every execution. This reduces parsing and planning overhead on the server and isolates raw execution performance.
- -P 30: Outputs a progress report every 30 seconds to monitor performance stability over time.
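Since -P 30 emits one progress line per interval, a run's stability can be judged by summarizing those lines. A minimal sketch, assuming the "progress: <seconds> s, <tps> tps, lat ..." layout printed by recent pgbench versions (check it against your own output); summarize_progress is a hypothetical helper, not part of PostgresBench.

```shell
#!/usr/bin/env bash
# Summarize pgbench -P progress lines read from stdin: interval count, mean, min, and max TPS.
# ASSUMPTION: each line looks like "progress: 30.0 s, <tps> tps, lat ... ms stddev ..., ...".
summarize_progress() {
  awk '
    $1 == "progress:" {
      for (i = 2; i <= NF; i++) {
        if ($i == "tps,") {              # the TPS value is the token just before "tps,"
          tps = $(i - 1) + 0
          n++; sum += tps
          if (n == 1 || tps < min) min = tps
          if (n == 1 || tps > max) max = tps
        }
      }
    }
    END {
      if (n == 0) { print "no progress lines found"; exit 1 }
      printf "intervals=%d mean_tps=%.1f min_tps=%.1f max_tps=%.1f\n", n, sum / n, min, max
    }'
}

# Usage: capture both output streams, then summarize.
#   pgbench -c 256 -j 16 -T 600 -M prepared -P 30 "$PGDATABASE" 2>&1 | tee run.log
#   summarize_progress < run.log
```

A wide gap between min_tps and max_tps late in a run suggests the database had not yet reached a steady state.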
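To evaluate how databases handle different data volumes, PostgresBench tests two distinct scale factors (-s): 6849 (generating a dataset of approximately 100 GB) and 34247 (generating approximately 500 GB). The scale factor takes effect when the tables are created and loaded with pgbench -i -s; during the timed run, the built-in script detects it from the pgbench_branches table. Because the TPC-B-like script picks account rows uniformly at random, the working set is effectively the whole pgbench_accounts table. At 100 GB, a large share of that data can stay cached in memory on the larger configuration. At 500 GB, the dataset is many times larger than RAM on either configuration, forcing the database to constantly read from and write to disk and exposing the true performance of the underlying cloud storage subsystem under heavy write pressure.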
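The two scale factors above work out to about 14.6 MB of data per scale unit (100 GB / 6849, and likewise 500 GB / 34247). A minimal helper based on that ratio can suggest a scale factor for other target sizes. The ratio is back-derived from the article's figures rather than an exact on-disk size, and scale_for_gb is a hypothetical name.

```shell
#!/usr/bin/env bash
# Estimate a pgbench scale factor for a target dataset size given in whole GB.
# ASSUMPTION: ~14.6 MB per scale unit, back-derived from 6849 -> ~100 GB and 34247 -> ~500 GB.
# Integer-only rounding: round(gb * 1000 / 14.6) == floor((gb * 10000 + 73) / 146).
scale_for_gb() {
  local gb="$1"
  case "$gb" in
    ''|*[!0-9]*) echo "usage: scale_for_gb <whole number of GB>" >&2; return 1 ;;
  esac
  echo $(( (gb * 10000 + 73) / 146 ))
}

scale_for_gb 100   # prints 6849
scale_for_gb 500   # prints 34247

# The scale is applied at initialization time, for example:
#   pgbench -i -s "$(scale_for_gb 100)" "$PGDATABASE"
```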
The Fair Play Framework (And Its Caveats)
Designing a fair benchmark across competing cloud platforms is notoriously difficult. PostgresBench addresses this by standardizing the testing environment and documenting every variable in its public repository.
To minimize network latency, the benchmark client runs on a beefy 16 vCPU, 64 GB RAM instance provisioned in the same cloud region (us-east-2) as the target databases. Client and database instances are not colocated within the same Availability Zone (AZ), because some managed services do not allow users to pin resources to a specific AZ. Treating every service the same way keeps providers that do allow AZ pinning from gaining a same-AZ latency advantage.
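Because the client and database may land in different AZs, it can be useful to record the baseline round-trip latency before a run, so that network time can be separated from database time. A sketch, assuming pgbench is installed and the PG* connection variables are set; the helper names and the single-connection SELECT 1 approach are illustrative choices, not part of PostgresBench.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: estimate client-to-database round-trip latency with pgbench.
write_select1_script() {
  printf 'SELECT 1;\n' > "$1"
}

measure_baseline_latency() {
  local script status
  script="$(mktemp)" || return 1
  write_select1_script "$script"
  # -n: skip the pre-run VACUUM of the pgbench tables (not needed for this script)
  # -c 1 -j 1: one client on one thread, so the reported latency is roughly one round trip
  pgbench -n -c 1 -j 1 -T "${1:-30}" -M prepared -f "$script" ${PGDATABASE:+"$PGDATABASE"}
  status=$?
  rm -f "$script"
  return "$status"
}

# Usage: measure_baseline_latency 30   # run for 30 seconds
```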
For hardware parity, PostgresBench targets a standard 1:4 CPU-to-RAM ratio, testing two primary configurations:
- 4 vCPUs / 16 GB RAM
- 16 vCPUs / 64 GB RAM
Because AWS Aurora does not offer an instance class matching this exact 1:4 ratio, the benchmark uses Aurora's 1:8 ratio instances (4 vCPUs/32 GB and 16 vCPUs/128 GB) so that it can be included, which gives Aurora twice the memory of the other services at each tier. Additionally, Graviton instances with NVMe caching are used for all services that support them (such as AWS RDS and Aurora).
However, developers must keep a few critical caveats in mind when analyzing the results:
- Default Configurations: PostgresBench tests each service using its out-of-the-box, default PostgreSQL settings. While this reflects the experience of a developer who spins up a database without manual tuning, it can penalize services with conservative default configurations (such as low shared_buffers or restrictive autovacuum settings).
- Single-Node Focus: To isolate compute and storage performance, high availability (HA) is disabled during these tests. Since replication architectures vary wildly, from standby nodes to distributed storage layers, HA performance remains an untested variable for now.
- Synthetic Data: The TPC-B-like schema is simple and uses randomly generated data. Real-world applications with complex schemas, foreign keys, and skewed data distributions may behave differently.
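Given the default-configuration caveat, it helps to record the settings each service actually ships with alongside its results. A sketch using psql and the pg_settings view; the particular list of settings is an illustrative assumption about what tends to matter for a write-heavy pgbench run, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Build a query for settings that commonly differ between providers' defaults.
# The list below is an illustrative choice, not the set PostgresBench records.
settings_query() {
  cat <<'SQL'
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'max_connections', 'max_wal_size',
               'checkpoint_timeout', 'autovacuum', 'autovacuum_vacuum_scale_factor',
               'synchronous_commit', 'wal_compression')
ORDER BY name;
SQL
}

dump_settings() {
  # -X: ignore ~/.psqlrc; -A -t: unaligned, tuples-only output; -F ',': comma-separated fields
  # Connection details come from the PG* environment variables.
  psql -X -A -t -F ',' -c "$(settings_query)"
}

# Usage: dump_settings > settings_before_run.csv
```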
The Developer Angle: Running the Harness Locally
For engineering teams, the real value of PostgresBench is not just looking at the published leaderboards, but running the harness against their own self-managed or hosted databases. If you are considering migrating from self-hosted EC2 Postgres to a managed provider, you can use this exact suite to run a pre-migration sanity check.
Prerequisites
To run the benchmark, you need a local machine or client VM with PostgreSQL client tools (version 18 or higher) and jq installed:
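# On Ubuntu/Debian (postgresql-client-18 comes from the PostgreSQL Apt repository
# on releases whose own archive ships an older major version)
sudo apt-get install jq postgresql-client-18
# On macOS via Homebrew
brew install jq postgresql@18
# postgresql@18 is keg-only, so put its binaries (including pgbench) on PATH
export PATH="$(brew --prefix postgresql@18)/bin:$PATH"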
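Before running anything, a quick check can confirm that jq is present and that pgbench is at least version 18. A minimal sketch, assuming pgbench --version prints a line such as "pgbench (PostgreSQL) 18.0"; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Extract the major version from a line like "pgbench (PostgreSQL) 18.0" read on stdin.
major_version() {
  # Field 3 holds the version; drop everything from the first non-digit onward.
  awk '{ v = $3; sub(/[^0-9].*/, "", v); print v; exit }'
}

check_client_tools() {
  local major
  command -v jq >/dev/null 2>&1 || { echo "jq not found" >&2; return 1; }
  command -v pgbench >/dev/null 2>&1 || { echo "pgbench not found" >&2; return 1; }
  major="$(pgbench --version | major_version)"
  if [ "${major:-0}" -lt 18 ]; then
    echo "pgbench major version ${major:-unknown} found; 18 or higher is required" >&2
    return 1
  fi
}

# Usage: check_client_tools && echo "client tools OK"
```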
Execution Flow
The benchmark is driven by a single run.sh script found in the PostgresBench GitHub repository. The script automates database initialization (dropping old tables, creating the schema, and loading data via pgbench -i), executes three consecutive 10-minute runs to ensure statistical consistency, and outputs the results to a structured JSON file.
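The flow described above can be sketched roughly as follows. This is a hypothetical simplification written for illustration, not the contents of run.sh: it skips the JSON report, relies on the PG* connection variables, and reads SCALE_FACTOR, CLIENTS, THREADS, and DURATION from the environment. Setting PGBENCH=echo turns it into a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified sketch of the init-then-repeat flow; NOT the repository's run.sh.
# PGBENCH may be overridden (e.g. PGBENCH=echo) to print commands instead of running them.
PGBENCH="${PGBENCH:-pgbench}"

run_flow() {
  local runs="${1:-3}" outdir="${2:-.}" run=1
  # pgbench -i drops and recreates the pgbench_* tables, then loads data at the given scale.
  "$PGBENCH" -i -s "$SCALE_FACTOR" ${PGDATABASE:+"$PGDATABASE"} || return 1
  while [ "$run" -le "$runs" ]; do
    # One timed run per iteration; progress reports and the final summary go to a per-run log.
    "$PGBENCH" -c "$CLIENTS" -j "$THREADS" -T "$DURATION" -M prepared -P 30 \
      ${PGDATABASE:+"$PGDATABASE"} > "$outdir/run_${run}.log" 2>&1 || return 1
    run=$((run + 1))
  done
}

# Usage: run_flow 3 ./logs
```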
To execute the benchmark against your target database, configure the connection environment variables and run the script:
# Define connection parameters
export PGHOST="your-database-host.amazonaws.com"
export PGPORT=5432
export PGUSER="postgres"
export PGPASSWORD="your-secure-password"
export PGDATABASE="postgres"
# Define instance metadata for the output report
export VCPUS=16
export RAM_GB=64
export SYSTEM_NAME="My Self-Managed Postgres"
export INSTANCE_TYPE="m8gd.4xlarge"
export PRIMARY_STORAGE="NVMe"
# Define benchmark parameters
export SCALE_FACTOR=6849 # ~100 GB dataset
export CLIENTS=256
export THREADS=16
export DURATION=600
export OUT_JSON="my_benchmark_results.json"
# Run the automated harness
./run.sh
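Because run.sh is a separate process, it only sees variables that were exported. A small preflight check (a hypothetical helper, not part of PostgresBench) can catch a missing or unexported variable before a long run starts.

```shell
#!/usr/bin/env bash
# Preflight check: confirm each named variable is exported and non-empty.
# printenv only sees exported variables, which is what a child process like ./run.sh inherits.
require_exported() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing or unexported variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, before starting the harness:
#   require_exported PGHOST PGPORT PGUSER PGPASSWORD PGDATABASE \
#     SCALE_FACTOR CLIENTS THREADS DURATION OUT_JSON && ./run.sh
```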
Upon completion, the JSON file named by OUT_JSON (my_benchmark_results.json in this example) will contain detailed metrics for each of the three runs, including average transactions per second (TPS), average latency, P95 and P99 latency, latency standard deviation, and any failed transactions. This structured output makes it easy to parse the data into internal dashboards or CI/CD pipelines.
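The three runs exist to show whether results are consistent. Once the per-run average TPS values have been pulled out of the JSON file (this sketch does not assume its field names), a quick spread check can flag an unstable result. tps_spread is a hypothetical helper, and what counts as an acceptable deviation is your call.

```shell
#!/usr/bin/env bash
# Report the mean TPS across runs and the largest deviation from that mean, in percent.
tps_spread() {
  [ "$#" -gt 0 ] || { echo "usage: tps_spread <tps> [<tps> ...]" >&2; return 1; }
  printf '%s\n' "$@" | awk '
    { v[NR] = $1 + 0; sum += v[NR] }
    END {
      mean = sum / NR
      if (mean == 0) { print "mean TPS is zero"; exit 1 }
      maxdev = 0
      for (i = 1; i <= NR; i++) {
        d = v[i] - mean
        if (d < 0) d = -d
        if (d > maxdev) maxdev = d
      }
      printf "runs=%d mean_tps=%.1f max_dev_pct=%.1f\n", NR, mean, 100 * maxdev / mean
    }'
}

tps_spread 1000 1100 1200   # prints runs=3 mean_tps=1100.0 max_dev_pct=9.1
```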
A Step Toward Open Database Engineering
PostgresBench is a highly welcome development in the database ecosystem. By open-sourcing the entire harness, ClickHouse has shifted the conversation from marketing claims to reproducible code. If a cloud provider disagrees with its ranking, it can do more than complain: it can submit a pull request to optimize its configuration or point out flaws in the testing methodology.
For developers, this tool is an excellent starting point for evaluating database performance. While synthetic benchmarks should never entirely replace testing with your own application's query patterns, PostgresBench provides a rigorous, standardized baseline that cuts through the cloud marketing noise.
Sources & further reading
- PostgresBench: A Reproducible Benchmark for Postgres Services (clickhouse.com)
- ClickHouse/PostgresBench: a benchmark for Postgres-compatible DBMS using pgbench (github.com)
- PostgresBench: A Reproducible Benchmark for Postgres Services (postgresbench.clickhouse.com)
- PostgreSQL 18 Documentation: pgbench (postgresql.org)
Mariana covers the fast-moving world of machine learning and generative AI, with a particular focus on how these technologies are reshaping development workflows. When she isn't stress-testing the latest foundation models, she's usually at a local hackathon.