PostgreSQL Series: Part 1: The 8 Tunable Parameters
PostgreSQL is an open-source DBMS that typically supports both SQL and JSON for relational and non-relational queries respectively. PostgreSQL is typically known for its reliability, architecture, data integrity, and scalability. It is backed by an experienced community of open-source developers who have contributed to it significantly to make it what it is today. The roots of PostgreSQL were laid back in 1986 as part of a project at the University of California, Berkeley. It runs on all the major operating systems and supports advanced performance optimization and data types that are typically found only in commercial databases such as Oracle and SQL Server.
PostgreSQL is typically shipped with basic configurations. Now, these configurations are focused primarily on compatibility with multiple systems rather than performance. It is up to the developers to tune the default parameters to their requirements and reap the benefits of a powerful DBMS platform. Here are 8 tunable parameters of Postgre that will have maximum impact on the performance when you tune –
8 Tunable Parameters
1) shared_buffer: (Recommended Value: 25% of system memory)
The first tunable parameter is shared_buffer. PostgreSQL uses its own buffer and kernel buffered IO. What this means is the data is stored twice in a memory which is first in the PostgreSQL buffer and then in the kernel buffer. There is no direct IO in PostgreSQL, unlike other DBMS. This is termed as double buffering. The buffer of PostgreSQL is called shared_buffer and takes the first rank in our list of parameters due to its ability to affect the performance. It typically sets the level of dedicated memory that I used by PostgreSQL for cache purposes.
As expected, the default value of shared_buffer is very low out of the box which has been kept that way to make it compatible with machines that don’t support high values. But most modern machines can easily support higher values and thus you should plan to increase the value to optimize the performance.
Experts recommend a value that is close to 25% of your system’s RAM. You should also play around with the values and may experience better performance with a value lower or higher than 25%.
2) wal_buffers: Recommended Value: 16MB or higher)
The Write Ahead Log (WAL) record is written by PostgreSQL in buffers. These buffers are then flushed to the disk. The default size of these buffers which is actually defined by wal_buffers is 16MB. However, we recommend a higher value if you have multiple concurrent connections.
We recommend you to experiment with 32MB and 64MB, but with synchronous_commit=off
3) effective_cache_size:
This parameter provides the estimate of total memory available for disk caching. Do note that this is just a guideline and not the exact cache size. effective_cache_size doesn’t allocate memory, instead, it informs the optimizer about the level of cache available in the kernel. If this value is set low, then the query planner will automatically restrain from using certain indexes and sometimes these indexes are helpful. So always set a large value of this parameter.
4) work_mem:
This configuration is used in the case of complex sorting. If you intend to do complex sorting, then it is ideal to increase the value of this parameter to see good results. Do note that in-memory sorts are quicker than sorts overflowing to disk. Thus setting a high value can be counter-effective as it turns into a bottleneck for your deployment environment as this parameter is sort operation per user. Therefore based on the number of users trying to execute the sort operations, the system will allocate sort operations that are equal to the value of work_mem *total sort. Thus do not set this parameter globally and instead keep modifying this at the session-level.
5) maintenance_work_mem:
The maintenance_work_mem is used to specify the maximum memory that can be used by maintenance operations such as VACUUM, ALTER TABLE ADD FOREIGN KEY and CREATE INDEX. Since at any given point of time, only one of these operations is executed and also since PostgreSQL doesn’t have many of them running in parallel, it is ideal to set the value of this parameter higher than work_mem. In fact, setting a larger value might improve the performance of vacuuming and restoring database dumps.
6) temp_buffers:
The temp_buffers parameter is used to set the maximum number of temporary buffers that are used by every database session. The settings of temp_buffers can be altered within every session, however only before the initial use of temporary tables in the session.
PostgreSQL uses this memory to hold the temporary tables for each session. This memory will be cleared when the connection is terminated.
The value of temp_buffers in the out of the box state is 8MB. However, we recommend 32MB.
7) synchronous_commit:
The synchronous_commit parameter helps to enforce the commit that will typically wait for the WAL to be written on the disk. This is done before a success status is returned to the client. This parameter is a clear trade-off between performance and reliability and depends purely on your application. If your application is designed in a way that performance is more important, then the synchronous_commit parameter should be turned off. As a result, there will be a time-gap between the success status and the guaranteed write to disk. During a server crash, the data could be lost even if the client receives a success message on commit. In such instances, transactions commit very fast and will not wait for a WAL file to be flushed. However, in this case, reliability is compromised.
8) checkpoint_timeout & checkpoint_completion_target:
Whenever PostgreSQL writes any changes to WAL, the checkpoint flushes the data into the data files. This activity is usually done when the CHECKPOINT occurs. It is not just an expensive operation but also results in a huge amount of IO. The entire process involves expensive read/write operations on the disk. Users can issue CHECKPOINT whenever it looks necessary or to automate the system by PostgreSQL’s parameters viz. checkpoint_timeout and checkpoint_completion_target.
The checkpoint_timeout parameter is used mainly to set the time between WAL checkpoints. A low value of this parameter can decrease the crash recovery time, however, their performance will be impacted since every checkpoint ends up consuming important system resources.
Similarly, the checkpoint_completion_target is the fraction of time between the checkpoints for checkpoint completion. A high frequency of checkpoints can possibly affect performance. Hence for a smooth process of checkpoint setting, the checkpoint_timeout parameter must have a low value. Else, the OS will start accumulating all the bad pages until the ratio is met, which will then result in a huge flush.
Wrapping Up
This post talks about the 8 main parameters which when tuned to appropriate values can generate great results for your application. However, these are not the only parameters that can have an impact. There are several other parameters that can play a role in improving the performance of PostgreSQL. These parameters will be covered in our next post in the PostgreSQL series.