PostgreSQL query monitoring

Pavel Trukhanov · okmeter.io blog · Dec 5, 2017

Since 2008, ‘pgsql-hackers’ have been discussing an extension for collecting query statistics. This extension, called ‘pg_stat_statements’ and shipped with PostgreSQL since 8.4, collects statistical information about the queries executed by a server.

The extension is typically used by database administrators as a data source for various reports. Although this data is just a set of cumulative counters accumulated since the last reset, it is very useful for monitoring query execution time, locating performance issues, and in-depth analysis of what’s going on with a database server.
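For reference, the usual setup looks like this: the extension’s library must be preloaded (which requires a server restart), and then the extension is created in the target database:

-- in postgresql.conf, then restart the server:
-- shared_preload_libraries = 'pg_stat_statements'

-- in the database you want to track:
CREATE EXTENSION pg_stat_statements;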

pg_stat_statements

So, let’s take a closer look at the pg_stat_statements view (this one is from 9.4):

postgres=# \d pg_stat_statements;
          View "public.pg_stat_statements"
       Column        |       Type       | Modifiers
---------------------+------------------+-----------
 userid              | oid              |
 dbid                | oid              |
 queryid             | bigint           |
 query               | text             |
 calls               | bigint           |
 total_time          | double precision |
 rows                | bigint           |
 shared_blks_hit     | bigint           |
 shared_blks_read    | bigint           |
 shared_blks_dirtied | bigint           |
 shared_blks_written | bigint           |
 local_blks_hit      | bigint           |
 local_blks_read     | bigint           |
 local_blks_dirtied  | bigint           |
 local_blks_written  | bigint           |
 temp_blks_read      | bigint           |
 temp_blks_written   | bigint           |
 blk_read_time       | double precision |
 blk_write_time      | double precision |

As you can see, the queries are grouped: statistics are collected not for each individual query, but for groups of queries that PostgreSQL considers similar (I’ll explain this in more detail below). All counters keep growing from the start of the PostgreSQL process or from the last call of ‘pg_stat_statements_reset’.
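For example, to start measuring from a clean slate, the accumulated counters can be discarded (superuser rights are required by default):

SELECT pg_stat_statements_reset();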

  • query — query text
  • calls — number of query calls
  • total_time — total execution time of all query calls, in milliseconds
  • rows — number of rows returned (‘select’) or modified (‘update’) during the query execution
  • shared_blks_hit — number of shared memory blocks returned from the cache
  • shared_blks_read — number of shared memory blocks NOT returned from the cache. The documentation isn’t quite clear whether this counts all blocks read or only those that were not found in the cache, so let’s check the source code:
/*
 * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
 * not currently in memory.
 */
bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
                     strategy, &found);
if (found)
    pgBufferUsage.shared_blks_hit++;
else
    pgBufferUsage.shared_blks_read++;
So each block access is counted either as a hit or as a read, never both.

  • shared_blks_dirtied — number of shared memory blocks marked “dirty” during the query execution (i.e. the query modified at least one tuple in the block, and the block must later be written to disk by the ‘checkpointer’ or ‘bgwriter’)
  • shared_blks_written — number of shared memory blocks written to disk synchronously during the query execution. A backend writes a block out itself when it has to evict a dirty block from the buffer cache.
  • local_blks_* — the same counters, but for blocks that are local to the backend and are used for temporary tables
  • temp_blks_read — number of blocks of temporary files read from disk. Temporary files are used when there’s not enough memory to execute a query (the limit is set by the ‘work_mem’ parameter)
  • temp_blks_written — number of blocks of temporary files written to disk
  • blk_read_time — total time spent waiting for block reads, in milliseconds
  • blk_write_time — total time spent waiting for block writes, in milliseconds (only synchronous writes performed by the backend itself are counted; ‘checkpointer’/‘bgwriter’ time is not included)

Note that blk_read_time and blk_write_time are collected only when the ‘track_io_timing’ setting is enabled.
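It can be switched on without a restart; it’s worth checking the cost of timing calls on your hardware first with the pg_test_timing utility that ships with Postgres:

ALTER SYSTEM SET track_io_timing = on; -- 9.4+; on older versions edit postgresql.conf
SELECT pg_reload_conf();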

Note also that ‘pg_stat_statements’ accounts only for queries that have completed: if a query has already been crunching away at some resource-intensive task for an hour, you will only see it in ‘pg_stat_activity’.

How the queries are grouped

I used to think that queries are grouped by their execution plans. But then I noticed that queries with a different number of IN arguments usually end up in different groups, even though such queries should have the same plan.

After looking it up in the code, I found that queries are grouped by the hash of their “query jumble”, which consists of only the significant parts of the query text left after parsing. Since version 9.4 this hash is exposed in the ‘queryid’ column.
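You can see this grouping for yourself: the ‘query’ column holds the first query text seen for each group, keyed by that hash:

SELECT queryid, calls, query
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;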

In practice, we have to additionally normalize and group queries on the agent side, e.g. merge a variable number of IN arguments into a single ‘?’ placeholder, or replace inline arguments with placeholders. This is especially difficult when the query text is incomplete.

In versions before 9.4, each query text was cut down to ‘track_activity_query_size’; in 9.4 and above this limitation is gone, because query texts are now stored outside of shared memory. However, we still cut large queries down to 8 KB so as not to noticeably affect PostgreSQL’s performance.

This is why we cannot feed a query to an SQL parser for additional normalization: any SQL parser will report an error on such truncated query text. So we had to write a number of heuristics and regular expressions for better query cleanup. Adding new heuristics is certainly far from perfect, but it’s the only working solution we came up with.
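As an illustration, here is a much simplified version of one such heuristic, expressed as a Postgres regexp_replace call (the real agent applies its patterns in its own code, and they are more involved):

-- collapse IN lists of any length into one placeholder, so that
-- "IN ($1, $2)" and "IN ($1, $2, $3)" fall into the same group:
SELECT regexp_replace(
    'SELECT * FROM users WHERE id IN ($1, $2, $3)',
    'IN\s*\([^)]*\)', 'IN (?)', 'gi');
-- result: SELECT * FROM users WHERE id IN (?)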

Yet another issue is that PostgreSQL stores in the ‘query’ field the first query text received for a group, without normalization or any formatting, so resetting the counters may cause that text to be overwritten by another query from the same group which looks completely different. In addition, many developers write comments directly in queries (e.g. to indicate the query’s ID or the function which issues it), and these comments end up in the ‘query’ field as well.

To avoid creating new metrics for the same query groups every time, we strip all such comments and similar noise.
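Again purely for illustration, comment stripping can be sketched as two regexp_replace passes:

SELECT regexp_replace(
    regexp_replace(
        'SELECT 1 /* request 42 */ -- from get_user()',
        '/\*.*?\*/', '', 'g'),  -- drop /* ... */ comments
    '--[^\n]*', '', 'g');       -- drop -- comments up to end of line
-- result: 'SELECT 1' (modulo leftover whitespace)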

The questions to ask

Together with our friends from PostgreSQL Consulting, we dug deep into PostgreSQL’s internals and picked out the metrics most useful for locating database issues.

The goal of our monitoring is to answer the following questions:

  • How does the database operate at this moment compared to previous periods?
  • What queries are the most resource-intensive for the server (by CPU, drive, etc.)?
  • How many queries (by type) are received?
  • How quickly are various queries executed?

Collecting the metrics

In fact, it doesn’t make sense to create metrics for every single query, so we pick the top 50 queries for our analysis. However, we can’t simply take the top by ‘total_time’, because ‘total_time’ of a newly appeared query will for a long time stay much lower than that of long-registered queries.

We decided to take the top by the derivative (rate) of ‘total_time’ instead. To do this, our agent reads the whole ‘pg_stat_statements’ view and keeps the previous counter values. Then, for each counter of each query, we additionally group similar queries (ones that Postgres considers different) and sum up their statistics. Finally, we take the top, create dedicated metrics for those queries, and sum everything that remains into a single ‘~other’ query.
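The snapshot-and-diff idea can be sketched in plain SQL (our agent keeps the snapshots in memory; the ‘pgss_snapshot’ table here is made up for the example):

-- remember the current counter values:
CREATE TABLE pgss_snapshot AS
SELECT userid, dbid, queryid, calls, total_time
FROM pg_stat_statements;

-- ...some time later, rank the queries by the growth of total_time:
SELECT cur.queryid,
    cur.total_time - coalesce(prev.total_time, 0) AS total_time_delta_ms,
    cur.calls - coalesce(prev.calls, 0) AS calls_delta
FROM pg_stat_statements cur
LEFT JOIN pgss_snapshot prev USING (userid, dbid, queryid)
ORDER BY total_time_delta_ms DESC
LIMIT 50;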

As a result, we get 11 metrics for each query from the top:

  • postgresql.query.time.cpu (for convenience, we simply subtract the disk wait time from ‘total_time’; see the sketch after this list)
  • postgresql.query.time.disk_read
  • postgresql.query.time.disk_write
  • postgresql.query.calls
  • postgresql.query.rows
  • postgresql.query.blocks.hit
  • postgresql.query.blocks.read
  • postgresql.query.blocks.written
  • postgresql.query.blocks.dirtied
  • postgresql.query.temp_blocks.read
  • postgresql.query.temp_blocks.written
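In ‘pg_stat_statements’ terms, the three time metrics decompose ‘total_time’ roughly like this (assuming ‘track_io_timing’ is on):

SELECT queryid,
    total_time - blk_read_time - blk_write_time AS time_cpu_ms,
    blk_read_time AS time_disk_read_ms,
    blk_write_time AS time_disk_write_ms
FROM pg_stat_statements;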

Each of them will have a set of attributes (labels):

{"database": "<db>", "user": "<user>", "query": "<query>"}

Detailed description — Postgres Query Monitoring metrics.

Interpreting the metrics

Users are often confused by the ‘postgresql.query.time.*’ metrics and their physical meaning. Although it’s not always obvious what the total response time of a query group really means, such metrics illustrate well how various processes interact with one another.

If we leave locks aside, we can say that PostgreSQL spends certain resources (CPU or disks) while executing a query, so this usage can be expressed in resource-seconds per second, or as the share of a CPU core a query consumes out of total CPU utilization. For example, a query group accumulating 500 ms of CPU time every second keeps roughly half a core (50%) busy.

Let’s see what we’ve got

First, we need to check whether our metrics are usable. For example, let’s try to find out why our database server performs more disk write operations than usual.

Let’s check whether PostgreSQL wrote anything to the disks at that moment:

Then we can figure out which queries have ‘dirtied’ the pages:

As we can see, although the query chart doesn’t exactly match the buffer write chart, there is a clear correlation between the two. The difference comes from the fact that blocks are written out in the background, which changes the drive utilization profile.

Now let’s see the charts for read operations:

And again, there is a correlation, but no exact match: PostgreSQL reads blocks from a drive not directly but through the file system cache, so part of the actual drive workload stays hidden from us.

CPU utilization can also be attributed to specific queries, though not very accurately, because of possible locks and other waits:

Summary

  • We believe that ‘pg_stat_statements’ is a really great extension which provides in-depth statistics without putting a heavy load on the server.
  • However, one should keep a number of assumptions and inaccuracies in mind when interpreting these metrics.

Okmeter.io provides complete, ready-to-use Postgres monitoring. We have an online live demo with query charts and statistics, but it runs on a synthetic workload and doesn’t quite resemble real life.

It’s always better to try it out on your project to see what’s happening in your actual production environment.

We offer a two-week free trial that you can use just for that!

I encourage you to try it; at the very least, you’ll know that everything is OK with your database and services.

If you want to know more about Postgres operations, you’ll find this article on “When SELECT can cause a data change and a disk write” truly interesting!
