Postgres Performance Analysis Scripts

Or How to Stop Worrying About Database Timeouts And Learn to Love These Scripts

The Postgres elephant. His name is Slonik.

Query Performance Analysis Scripts

The pg_stat_statements and pg_stat_activity tables provide valuable insights on query performance. The pg_stat_statments table holds information on queries that ran in the past, while the pg_stat_activity table holds information on queries running right this moment.

The pg_stat_statements table has has one row per for each query that ran. It provides useful information in columns like:

  • pg_stat_statements.query — the query text
  • pg_stat_statements.calls — The number of times this query was run
  • pg_stat_statements.total_time — The total time (ms) spent on the query

The pg_stat_activity table has one row per server process at this moment. It provides insightful info in columns like:

  • pg_stat_activity.query_start when query associated with this process started
  • pg_stat_activity.query the text of this query
  • pg_stat_activity.state the state of this query, possible values include active , idle , etc.

By querying these tables, we can answer questions like:

What are the slowest queries?

What are the slowest queries running right now sorted by running time

Which SELECT queries have read/touched the greatest number of rows

If a query touches/affects a large number or rows, adding indices may help.

While pg_stat_statement provides information about queries that already ran, to answer questions about queries running right this moment, we query the pg_stat_activity table. It can answer questions like:

How many queries have been running for longer than a minute?

Lots of long-running queries can cause your database to be slow or non-responsive.

How long have the oldest queries been running?

How many queries are currently writing to the same table?

Too many write queries to the same (especially large) table can cause lock-contention.

Because the pg_stat_activity table shows queries that are running right now, we can use its pg_cancel_backend function to kill problematic queries.

For example, if pg_stat_activity reveals long-running queries clogging up your database, a hot fix can just be to kill those queries. This will give a chance for the shorter queries to use the database. It’s better to have a few internal server errors caused by killing these queries, than to have the entire database unavailable.

Lock Analysis Scripts

Postgres queries acquire various types of locks on rows and tables. Depending on the type of lock, other queries may or not may be able to run concurrently depending on whether or not the locks conflict. The purpose of these locks is to resolve ambiguity regarding how the concurrent queries should behave. For example, if an INSERT query is running concurrently with a SELECT query, should the SELECT query return the newly inserted row? Or what happens if two UPDATE queries are running on the same row? These lock mechanism are part of a database design pattern called MVCC (Multi-version Concurrency Control) which is worth understanding.

But for the purpose of this article, just know that locks on rows/tables are mechanisms which prevent certain other queries from accessing that same row/tables until that lock is released.

Certain queries will acquire certain locks. If a query acquires a lock that conflicts with another query’s lock, then the second query cannot run until the first query has completed, or we say the first query is blocking the second query.

For example, a SELECT query will acquire an ACCESS SHARE lock on a table. An UPDATE query will acquire a ROW EXCLUSIVE lock on table. Because ACCESS SHARE and ROW EXCLUSIVE locks do not conflict, a SELECT and UPDATE query may both run on the same table concurrently (which makes sense).

However, a CREATE INDEX without CONCURRENTLY command acquires a SHARE lock on a table, which conflicts with a ROW EXCLUSIVE , which means we cannot UPDATE query cannot run until the CREATE INDEX has completed. However, an ACCESS SHARE lock does not conflict with a SHARE lock which means a SELECT query on a table while a CREATE INDEX is also running on that table.

Conflict matrix for postgres locks

The pg_locks table provides information on the locks through the following columns:

  • locktype — the type of object being locked
  • relation — the table being locked, null if locktype is not relation
  • transactionid — the id of the transaction that is doing the locking
  • pid — the process id holding or waiting for this lock
  • granted — a boolean field where true means the lock is being held and false means process is waiting to acquire the lock

From this table we can find out what queries are blocking other queries.

When queries that should be executing quickly are taking a long time, chances are they are being blocked. Checking pg_locks can provide insight into why queries are taking unexpectedly long.

Index Usage Scripts

Because indices improve read query performance at the expense of write query performance, it’s a good idea to look for un-used (or even low use) indices and drop them.

Here’s a script to find unused or low-use indices:

Source: https://gist.github.com/jberkus/6b1bcaf7724dfc2a54f3

Cache Performance Scripts

Postgres caches recent queries in memory called the shared buffer cache (shared_buffers in postgresql.conf). The pg_statio_user_tables has as rows representing various stats on each of the (user) tables. The two columns of interest we’ll be looking at are:

  • pg_statio_user_tables.heap_blks_read — Number of disk blocks read from a table (ie. missed cache)
  • pg_statio_user_tables.heap_blks_hits — Number of buffer hits from this table (ie. cache hit)

From these two columns, we can determine the cache hit ratio of a table by:

Or we can determine the average cache hit ratio across all tables:

Similarly, there is a pg_statio_user_indexes which has as rows various stats of indexes:

  • pg_statio_user_indexes.indx_blks_read — Number of disk block reads from this index
  • pg_statio_user_indexes.indx_blks_hit — Number of buffer hits from this index.

We can determine the cache hit ratio for a given index or across all indexes:

A cache hit ratio of 99% is considered “well-engineered”. If it’s significantly less than that, we will need to increase the shared buffer cache (which ranges from 15% to 25% of the total system memory).

If the Cache Hit Ratio is low, and the shared_buffer is less than 25% of the total system memory, consider bumping up `shared_buffers` in `postgresql.conf`

Database Connections

The maximum number of connections Postgres can support is limited by the Postgres server size, in particular the memory. To see the maximum number of connections the database has, run:

To see how many connections are being used at the moment, and what their statuses are or which applications are using them:

See these connections can give great insight how how to re-architect your application to use those connections more effectively. It can help you decide to use a connection pooling proxy like pgbouncer.

Resetting the Stats

After implementing changes, be sure to reset all the pg_stat* tables using pg_stat_reset() to check for changes in performance.

--

--

Writing About Rails, React, Web Application Technology, Databases, and Software Engineering

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Steven Li

Writing About Rails, React, Web Application Technology, Databases, and Software Engineering