Differences
This shows you the differences between two versions of the page.
postgresql_management [2024/07/17 20:01] – [VACUUMING] andonovj | postgresql_management [2024/07/18 04:41] (current) – [VACUUMING] andonovj | ||
---|---|---|---|
Line 467: | Line 467: | ||
=====VACUUMING===== | =====VACUUMING===== | ||
+ | Vacuuming should be part of routine database maintenance, | ||
+ | Don’t run manual VACUUM or ANALYZE without reason. | ||
+ | Database administrators should refrain from running manual vacuums too often on the entire database, as the autovacuum process might already have optimally vacuumed the target database. In that case, a manual vacuum may not remove any dead tuples but will still cause unnecessary I/O load or CPU spikes. | ||
+ | Manual vacuums should be run on a table-by-table basis and only when necessary, for example when the ratio of live rows to dead rows is low or when there are large gaps between autovacuum operations. They should also be run when user activity is at its lowest. | ||
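As a rough illustration of the "table-by-table, only when necessary" rule above, the sketch below flags tables whose share of dead rows is high. It assumes the counts were already read from the n_live_tup and n_dead_tup columns of pg_stat_user_tables; the TableStats type, the needs_manual_vacuum helper, the 20% cut-off and the sample numbers are hypothetical and only serve the example.

```python
from typing import NamedTuple


class TableStats(NamedTuple):
    """One row of (relname, n_live_tup, n_dead_tup), e.g. from pg_stat_user_tables."""
    relname: str
    n_live_tup: int
    n_dead_tup: int


def needs_manual_vacuum(stats: TableStats, max_dead_ratio: float = 0.2) -> bool:
    """Return True when dead rows make up more than max_dead_ratio of the table.

    max_dead_ratio is an illustrative cut-off, not a PostgreSQL setting.
    """
    total = stats.n_live_tup + stats.n_dead_tup
    if total == 0:
        return False  # empty table: nothing to clean up
    return stats.n_dead_tup / total > max_dead_ratio


# Made-up sample figures, for illustration only.
tables = [
    TableStats("orders", n_live_tup=90_000, n_dead_tup=40_000),
    TableStats("customers", n_live_tup=50_000, n_dead_tup=500),
]

candidates = [t.relname for t in tables if needs_manual_vacuum(t)]
print(candidates)
```

Only the tables in candidates would then be vacuumed individually, ideally during a quiet period.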
+ | Autovacuum also keeps a table’s data distribution statistics up-to-date (it doesn’t rebuild them). When manually run, the ANALYZE command rebuilds these statistics instead of updating them. Again, rebuilding statistics when they’re already optimally updated by a regular autovacuum might cause unnecessary pressure on system resources. | ||
+ | The one time you should run ANALYZE manually is immediately after bulk loading data into the target table. A large number of new rows (even a few hundred) in an existing table can significantly skew its column data distribution and leave the existing column statistics out of date. When the query optimizer relies on such statistics, it can choose poor plans and queries can run slowly. | ||
+ | In these cases, running the ANALYZE command immediately after a data load to rebuild the statistics completely is better than waiting for the autovacuum to kick in. | ||
+ | Select VACUUM FULL only when performance degrades badly | ||
The autovacuum functionality doesn’t recover disk space taken up by dead tuples. Running a VACUUM FULL command will do so, but has performance implications. The target table is exclusively locked during the operation, preventing even reads on the table. The process also makes a full copy of the table, which requires extra disk space when it runs. We recommend only running VACUUM FULL if there is a very high percentage of bloat and queries are suffering badly. We also recommend using periods of lowest database activity for it. | The autovacuum functionality doesn’t recover disk space taken up by dead tuples. Running a VACUUM FULL command will do so, but has performance implications. The target table is exclusively locked during the operation, preventing even reads on the table. The process also makes a full copy of the table, which requires extra disk space when it runs. We recommend only running VACUUM FULL if there is a very high percentage of bloat and queries are suffering badly. We also recommend using periods of lowest database activity for it. | ||
+ | |||
+ | Fine-tune Autovacuum Threshold | ||
+ | It’s essential to check and, where needed, tune the autovacuum and analyze configuration parameters, either in the postgresql.conf file or in individual table properties, to balance the cost of running autovacuum against the performance it gains. | ||
PostgreSQL uses two configuration parameters to decide when to kick off an autovacuum: | PostgreSQL uses two configuration parameters to decide when to kick off an autovacuum: | ||
+ | - autovacuum_vacuum_threshold: | ||
+ | - autovacuum_vacuum_scale_factor: | ||
+ | |||
+ | Together, these parameters tell PostgreSQL to start an autovacuum when the number of dead rows in a table exceeds the number of rows in that table multiplied by the scale factor plus the vacuum threshold. In other words, PostgreSQL will start autovacuum on a table when: | ||
+ | pg_stat_user_tables.n_dead_tup > (pg_class.reltuples x autovacuum_vacuum_scale_factor) + autovacuum_vacuum_threshold | ||
- | autovacuum_vacuum_threshold: | + | This may be sufficient for small to medium-sized tables. For example, in a table with 10,000 rows, the number |
- | | + | Not every table in a database experiences the same rate of data modification. Usually, a few large tables will experience frequent data modifications, |
+ | Therefore, the goal should be to set these thresholds to optimal values so that autovacuum runs at regular intervals and doesn’t take a long time (and affect user sessions), while keeping the number of dead rows relatively low. | ||
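To see why a single pair of values rarely suits every table, the trigger condition described above (dead rows exceeding row count times scale factor, plus threshold) can be sketched as follows. This is a minimal illustration, not PostgreSQL's own code: the function names are hypothetical, and the values 0.2 and 50 are the ones used in this page's 10,000-row example.

```python
def autovacuum_trigger_point(reltuples: float,
                             scale_factor: float = 0.2,
                             threshold: int = 50) -> float:
    """Number of dead rows a table must exceed before autovacuum starts."""
    return reltuples * scale_factor + threshold


def autovacuum_is_due(n_dead_tup: int, reltuples: float,
                      scale_factor: float = 0.2,
                      threshold: int = 50) -> bool:
    """Apply the rule: n_dead_tup > reltuples * scale_factor + threshold."""
    return n_dead_tup > autovacuum_trigger_point(reltuples, scale_factor, threshold)


# The same settings applied to tables of very different sizes.
for rows in (10_000, 1_000_000, 100_000_000):
    print(f"{rows:>11,} rows -> vacuum after about "
          f"{autovacuum_trigger_point(rows):,.0f} dead rows")
```

With these values a 10,000-row table is vacuumed after roughly 2,050 dead rows, while a 100-million-row table has to accumulate about 20 million dead rows first, which is why large, busy tables usually need their own settings.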
- | pg_stat_user_tables.n_dead_tup > (pg_class.reltuples x autovacuum_vacuum_scale_factor) + autovacuum_vacuum_threshold | + | One approach is to use one or the other parameter. So, if we set autovacuum_vacuum_scale_factor |
- | This may be sufficient for small to medium-sized tables. For example, in a table with 10,000 rows, the number of dead rows has to be over 2,050 ((10,000 x 0.2) + 50) before an autovacuum | + | Fine-tune Autoanalyze Threshold |
+ | Similar | ||
+ | | ||
+ | - autovacuum_analyze_scale_factor: | ||
+ | |||
+ | Like autovacuum, the autovacuum_analyze_threshold parameter can be set to a value that dictates the number of inserted, deleted, or updated tuples | ||
+ | The code snippet below shows the SQL syntax for modifying the autovacuum_analyze_threshold setting for a table. | ||
+ | ALTER TABLE < | ||
+ | |||
+ | Fine-tune Autovacuum workers | ||
+ | Another parameter often overlooked is autovacuum_max_workers, | ||
+ | A common practice among PostgreSQL DBAs is to increase the maximum number of worker threads to speed up autovacuum. This doesn’t work as all the threads share the same autovacuum_vacuum_cost_limit, | ||
+ | |||
+ | individual thread’s cost_limit = autovacuum_vacuum_cost_limit / autovacuum_max_workers | ||
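The division above can be worked through with a few numbers. The sketch below simply evaluates the formula as this page states it; the helper name is hypothetical, and the total limit of 200 and the worker counts are illustrative assumptions rather than values taken from any particular server.

```python
def per_worker_cost_limit(total_cost_limit: float, max_workers: int) -> float:
    """Share of the cost limit each worker gets, per the formula above."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    return total_cost_limit / max_workers


# Assumed total limit of 200, split across a growing number of workers.
for workers in (3, 6, 12):
    share = per_worker_cost_limit(200, workers)
    print(f"{workers:>2} workers -> each may accumulate a cost of about {share:.1f}")
```

Doubling the number of workers halves each worker's share, so every worker reaches its limit and goes to sleep sooner; the total amount of vacuum work per unit of time does not grow.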
+ | |||
+ | The cost of work done by an autovacuum thread is calculated using three parameters: | ||
+ | - vacuum_cost_page_hit: | ||
+ | - vacuum_cost_page_miss: | ||
+ | - vacuum_cost_page_dirty: | ||
+ | |||
+ | What these parameters mean is this: | ||
+ | When a vacuum thread finds the data page that it’s supposed to clean in the shared buffer, the cost is 1. | ||
+ | If the data page is not in the shared buffer but in the OS cache, the cost will be 10. | ||
+ | If the page has to be marked dirty because the vacuum thread had to delete dead rows, the cost will be 20. | ||
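To make the three costs concrete, the following sketch adds them up for one batch of pages and compares the total with a cost limit. It is a simplified model that uses the values 1, 10 and 20 quoted above; the function name, the page counts and the limit of 200 are assumptions chosen for illustration.

```python
COST_PAGE_HIT = 1     # page found in the shared buffer
COST_PAGE_MISS = 10   # page not in the shared buffer
COST_PAGE_DIRTY = 20  # page had to be marked dirty after removing dead rows


def vacuum_cost(pages_hit: int, pages_missed: int, pages_dirtied: int) -> int:
    """Total cost accumulated for a batch of pages (simplified model)."""
    return (pages_hit * COST_PAGE_HIT
            + pages_missed * COST_PAGE_MISS
            + pages_dirtied * COST_PAGE_DIRTY)


cost = vacuum_cost(pages_hit=100, pages_missed=10, pages_dirtied=5)
cost_limit = 200  # assumed limit for this example

print(f"accumulated cost: {cost}, limit: {cost_limit}")
if cost >= cost_limit:
    print("limit reached: the worker would pause before continuing")
```

In this model, 100 buffer hits, 10 misses and only 5 dirtied pages each contribute the same cost of 100, which shows how quickly pages that need cleaning use up the budget.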
+ | An increased | ||
+ | A better way is to tune these parameters for individual tables only when necessary. For example, if the autovacuum | ||
+ | The code snippet below shows how to configure individual tables. | ||
+ | |||
+ | * ALTER TABLE < | ||
+ | * ALTER TABLE < | ||
- | ALTER TABLE < | + | Using the first parameter will ensure the autovacuum thread assigned to the table performs more work before going to sleep. Lowering the autovacuum_vacuum_cost_delay will also mean the thread sleeps for less time. |
===Examples=== | ===Examples=== |