In this section we will discuss several things you can do with PostgreSQL in terms of management. Let's start with user management.
User management in PostgreSQL is quite different from other databases. In PostgreSQL, you should remember one thing: everything is a role. You have roles that can connect (users) and roles that cannot connect (groups). You can nest roles one into another, thus granting privileges and group memberships to other users.
Every role has a unique name and also an OID value, which represents the role numerically. This is similar to how users are represented in Unix (and many other systems), where the numerical value of a role is used only internally.
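As a minimal sketch (the role names dev and alice are purely illustrative), a group is simply a role created without the LOGIN attribute, a user is a role created with it, and membership is granted like any other privilege:
Create a group and a user (sketch)
-- A "group": a role that cannot connect
CREATE ROLE dev NOLOGIN;
-- A "user": a role that can connect and inherits what its groups are granted
CREATE ROLE alice LOGIN PASSWORD 'change_me' INHERIT;
-- Nest the roles: alice is now a member of dev
GRANT dev TO alice;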
In order to create a user in PostgreSQL, you simply create a role that is allowed to log in. So let's get going.
Again, a user is a role that can connect and has a password; that is the only difference between a user and a group in PostgreSQL. So, to create a user, we can use the following:
Create a User
CREATE ROLE jcaruana WITH PASSWORD 'yxKEt045ZKcWiiu2YBxJ' INHERIT LOGIN;
Grant role
GRANT dev TO jcaruana;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO dba;
To reset the password, we can use the following command:
Reset password
ALTER USER integration_playson WITH PASSWORD 'xjaR3IvMDXF9NvPHwVSe';
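If needed, a password can also be given an expiration date with the standard VALID UNTIL clause (the date below is only an example):
Set password expiry
ALTER ROLE integration_playson VALID UNTIL '2025-12-31';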
List Roles
forumdb=> SELECT * FROM pg_roles WHERE rolname = 'luca';
-[ RECORD 1 ]--+---------
rolname        | luca
rolsuper       | f
rolinherit     | t
rolcreaterole  | f
rolcreatedb    | f
rolcanlogin    | t
rolreplication | f
rolconnlimit   | 1
rolpassword    | ********
rolvaliduntil  |
rolbypassrls   | f
rolconfig      |
oid            | 16390

forumdb=# \x
Expanded display is on.
forumdb=# SELECT * FROM pg_authid WHERE rolname = 'luca';
-[ RECORD 1 ]--+------------------------------------
oid            | 16390
rolname        | luca
rolsuper       | f
rolinherit     | t
rolcreaterole  | f
rolcreatedb    | f
rolcanlogin    | t
rolreplication | f
rolbypassrls   | f
rolconnlimit   | 1
rolpassword    | md5bd18b4163ec8a3ad833d867a5933c8ec
rolvaliduntil  |
List Users with their roles
forumdb=> SELECT r.rolname, g.rolname AS group, m.admin_option AS is_admin
          FROM pg_auth_members m
          JOIN pg_roles r ON r.oid = m.member
          JOIN pg_roles g ON g.oid = m.roleid
          ORDER BY r.rolname;
  rolname   |        group         | is_admin
------------+----------------------+----------
 enrico     | book_authors         | f
 enrico     | forum_admins         | f
 luca       | forum_stats          | f
 luca       | book_authors         | f
 pg_monitor | pg_read_all_settings | f
 pg_monitor | pg_read_all_stats    | f
 pg_monitor | pg_stat_scan_tables  | f
 test       | forum_stats          | f
(8 rows)
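For a quick interactive overview, psql's \du meta-command also lists all roles together with their attributes and the groups they are members of:
List Roles (psql shortcut)
forumdb=> \du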
PostgreSQL stores permissions assigned to roles and objects as Access Control Lists (ACLs), and, when needed, it examines the ACLs for a specific role and a database object in order to understand whether the command or query can be performed. In this section, you will learn what ACLs are, how they are stored, and how to interpret them to understand what permissions an ACL provides.
An ACL entry is a representation of a group of permissions and has the structure grantee=flags/grantor:
Example
luca=arw/enrico
First of all, identify the roles involved: luca and enrico. luca is the role before the equals sign, hence it is the role the ACL refers to, which means this ACL describes what permissions the luca role has. The other role, enrico, appears after the slash and is therefore the role that granted those permissions to luca. As for the flags, the ACL grants append (a), read (r), and write (w) permissions. The above reads as "enrico granted luca the right to perform INSERT (a), SELECT (r), and UPDATE (w) on the table."
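As a sketch of how such an entry comes to exist: assuming enrico owns the table, the ACL above is what you would see after enrico runs something like the following:
Grant producing the ACL (sketch)
-- executed as enrico, the owner of the table
GRANT INSERT, SELECT, UPDATE ON categories TO luca;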
Let's now see an example of ACLs from a table in the database: you can use the special \dp psql command to get information about a table:
Display ACL
forumdb=> \dp categories
                                  Access privileges
 Schema |    Name    | Type  |   Access privileges   | Column privileges | Policies
--------+------------+-------+-----------------------+-------------------+----------
 public | categories | table | enrico=arwdDxt/enrico+|                   |
        |            |       | luca=arw/enrico      +|                   |
        |            |       | =d/enrico             |                   |
Display ACL v2
forumdb=> WITH acl AS (
    SELECT relname,
           (aclexplode(relacl)).grantor,
           (aclexplode(relacl)).grantee,
           (aclexplode(relacl)).privilege_type
    FROM pg_class
)
SELECT g.rolname AS grantee,
       acl.privilege_type AS permission,
       gg.rolname AS grantor
FROM acl
JOIN pg_roles g  ON g.oid  = acl.grantee
JOIN pg_roles gg ON gg.oid = acl.grantor
WHERE acl.relname = 'categories';
   grantee   | permission | grantor
-------------+------------+---------
 luca        | INSERT     | luca
 luca        | SELECT     | luca
 luca        | UPDATE     | luca
 luca        | DELETE     | luca
 luca        | TRUNCATE   | luca
 luca        | REFERENCES | luca
 luca        | TRIGGER    | luca
 forum_stats | INSERT     | luca
 forum_stats | SELECT     | luca
 forum_stats | UPDATE     | luca
When it comes to database management, there are many possible tasks you may need to perform. Let's see how we can dump data to CSV.
We can use the \copy command from within a psql session as follows:
Copy data to CSV
hunter_dev=# \copy (SQL STATEMENT) to 'out.csv' with csv header
That will execute the SQL statement and write the result into a CSV file named "out.csv", with a header row at the top.
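For instance (the table and column names below are purely illustrative), exporting a filtered query together with its header row could look like this:
Copy data to CSV (example)
hunter_dev=# \copy (SELECT id, name, created_at FROM customers WHERE created_at >= '2021-01-01') to 'customers_2021.csv' with csv header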
Now, to clean up the pg_wal directory, we first need to back the WAL files up. Assuming we have done that, or that we don't care about them, we can use the following process:
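Before going through these steps, it can help to check how much space pg_wal is actually using; a minimal sketch using the pg_ls_waldir() function (available since PostgreSQL 10):
Check pg_wal size
SELECT count(*) AS wal_files, pg_size_pretty(sum(size)) AS total_size
FROM pg_ls_waldir();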
Find latest checkpoint file
[root@s24-se-db01 pg_wal]# /usr/pgsql-11/bin/pg_controldata -D /var/lib/pgsql/11/data/
pg_control version number:            1100
Catalog version number:               201809051
Database system identifier:           6742286774524545149
Database cluster state:               in production
pg_control last modified:             Fri 21 May 2021 21:37:51 CEST
Latest checkpoint location:           B6D/62D62990
Latest checkpoint's REDO location:    B6D/62941780
Latest checkpoint's REDO WAL file:    0000000100000B6D00000062   <- This entry
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:774839045
Latest checkpoint's NextOID:          300868
Latest checkpoint's NextMultiXactId:  75180307
Latest checkpoint's NextMultiOffset:  30502311
Latest checkpoint's oldestXID:        600002360
Latest checkpoint's oldestXID's DB:   16406
Latest checkpoint's oldestActiveXID:  760450727
Latest checkpoint's oldestMultiXid:   62073622
Latest checkpoint's oldestMulti's DB: 16407
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Fri 21 May 2021 21:35:46 CEST
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    logical
wal_log_hints setting:                off
max_connections setting:              1500
max_worker_processes setting:         28
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
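As an alternative sketch, while the server is running the same checkpoint information is available from SQL through pg_control_checkpoint() (PostgreSQL 9.6+), which exposes the REDO WAL file name directly:
Find latest checkpoint file via SQL
SELECT redo_wal_file FROM pg_control_checkpoint();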
Before actually cleaning, we can do a dry run:
Dry run
[root@s24-se-db01 pg_wal]# /usr/pgsql-11/bin/pg_archivecleanup -n /var/lib/pgsql/11/data/pg_wal/ 0000000100000B6D00000061
[root@s24-se-db01 pg_wal]#
Then we can do the actual delete:
Clean WAL Files
[root@s24-se-db01 pg_wal]# /usr/pgsql-11/bin/pg_archivecleanup -d /var/lib/pgsql/11/data/pg_wal/ 0000000100000B6D00000061
pg_archivecleanup: keeping WAL file "/var/lib/pgsql/11/data/pg_wal//0000000100000B6D00000061" and later
pg_archivecleanup: removing file "/var/lib/pgsql/11/data/pg_wal//0000000100000B6A0000008C"
pg_archivecleanup: removing file "/var/lib/pgsql/11/data/pg_wal//0000000100000B6400000053"
pg_archivecleanup: removing file "/var/lib/pgsql/11/data/pg_wal//0000000100000B5400000008"
pg_archivecleanup: removing file "/var/lib/pgsql/11/data/pg_wal//0000000100000B550000007A"
pg_archivecleanup: removing file "/var/lib/pgsql/11/data/pg_wal//0000000100000B4C0000000E"
pg_archivecleanup: removing file "/var/lib/pgsql/11/data/pg_wal//0000000100000B65000000E1"
pg_archivecleanup: removing file "/var/lib/pgsql/11/data/pg_wal//0000000100000B480000008F"
pg_archivecleanup: removing file "/var/lib/pgsql/11/data/pg_wal//0000000100000B61000000BC"
[root@s24-se-db01 pg_wal]#
Partitioning in PostgreSQL can be achieved in several ways, but in general there are two approaches: declarative partitioning and partitioning via table inheritance.
Each has its pros and cons. We have to start somewhere, so let's start with declarative partitioning.
PostgreSQL offers a way to specify how to divide a table into pieces called partitions. The table that is divided is referred to as a partitioned table. The specification consists of the partitioning method and a list of columns or expressions to be used as the partition key.
So let's see how they are done:
We can create a table using declarative partitioning as follows:
Create table
CREATE TABLE measurement (
    city_id   int not null,
    logdate   date not null,
    peaktemp  int,
    unitsales int
) PARTITION BY RANGE (logdate);
Then we can define a partition as follows:
Define partition
CREATE TABLE measurement_y2021m09
    PARTITION OF measurement
    FOR VALUES FROM ('2021-09-01') TO ('2021-10-01')
    WITH (parallel_workers = 4)
    TABLESPACE tbs_fast;
Let's say we want to move that partition to a different tablespace (because its data became old). We can do that (at a cost), but first let's create a new partition and fill our old one.
Fill the partition
do $$
declare
    counter integer := 0;
begin
    while counter != 100000000 LOOP
        insert into measurement values (counter, '2021-09-04', counter, counter + 1);
        counter = counter + 1;
    END LOOP;
end;
$$;

CREATE TABLE measurement_y2021m10
    PARTITION OF measurement
    FOR VALUES FROM ('2021-10-02') TO ('2021-11-01')
    WITH (parallel_workers = 4)
    TABLESPACE tbs_fast;
That is a simple procedure, but quite effective. Now we have two partitions, one filled with data and one that is empty. Next, let's move the old partition (09) onto the slower tablespace:
Move partition
repmgr=# alter table measurement_y2021m09 set tablespace tbs_slow;
ALTER TABLE
repmgr=#
During the move, PostgreSQL locks the partition with an ACCESS EXCLUSIVE lock, which means no one can do anything with it, not even read from it.
Partition Lock
repmgr=# SELECT a.datname,
                c.relname,
                l.transactionid,
                l.mode,
                l.GRANTED,
                a.usename,
                a.query,
                a.query_start,
                age(now(), a.query_start) AS "age",
                a.pid
         FROM pg_stat_activity a
         JOIN pg_locks l ON l.pid = a.pid
         JOIN pg_class c ON c.oid = l.relation
         where c.relname like 'measurement%'
         ORDER BY a.query_start;
-[ RECORD 1 ]-+----------------------------------------------------------
datname       | repmgr
relname       | measurement_y2021m09
transactionid |
mode          | AccessExclusiveLock
granted       | t
usename       | postgres
query         | alter table measurement_y2021m09 set tablespace tbs_slow;
query_start   | 2021-09-17 07:16:55.676676+00
age           | 00:00:44.439482
pid           | 20994
We can reduce the impact of that lock on reads and writes when attaching existing data to a partitioned table: add a matching CHECK constraint as NOT VALID, validate it separately (validation allows concurrent reads and writes), and only then attach the partition, so the expensive scan does not happen under the heaviest lock. The following pg_partman-based migration uses this approach:
BEGIN;

create schema partman;
create extension pg_partman schema partman;

ALTER TABLE auditlog RENAME TO auditlog_old;

CREATE TABLE auditlog (LIKE auditlog_old including all) PARTITION BY RANGE (inserted_at);
CREATE TABLE auditlog_template (LIKE auditlog_old including all);

alter table auditlog_old add constraint old_data_constraint
    check ( inserted_at >= '1900-01-01' and inserted_at <= '2024-04-30' ) NOT VALID;
alter table auditlog_old validate constraint old_data_constraint;

ALTER TABLE auditlog ATTACH PARTITION auditlog_old FOR VALUES FROM ('1900-01-01') TO ('2024-05-01');

SELECT partman.create_parent('public.auditlog',
                             p_control         := 'inserted_at',
                             p_type            := 'native',
                             p_template_table  := 'public.auditlog_template',
                             p_interval        := 'monthly',
                             p_premake         := 4,
                             p_start_partition := '2024-05-01');

CALL partman.run_maintenance_proc(0,false,false);
Of course, operations on other partitions can continue:
Other partitions
repmgr=# insert into measurement values(2,'2021-10-05',1,2);
INSERT 0 1
repmgr=# insert into measurement values(2,'2021-10-05',1,2);
INSERT 0 1
repmgr=# commit
repmgr-# ;
WARNING:  there is no transaction in progress
COMMIT
repmgr=# select * from measurement_y2021m10;
 city_id |  logdate   | peaktemp | unitsales
---------+------------+----------+-----------
       2 | 2021-10-05 |        1 |         2
       2 | 2021-10-05 |        1 |         2
If you try to even select data from the partition being moved, you will be blocked:
Try select on moved partition
repmgr=# select * from measurement_y2021m09 limit 1;
Blocked Select
postgres=# \c repmgr
You are now connected to database "repmgr" as user "postgres".
repmgr=# SELECT blocked_locks.pid AS blocked_pid,
repmgr-#        blocked_activity.usename AS blocked_user,
repmgr-#        blocking_locks.pid AS blocking_pid,
repmgr-#        blocking_activity.usename AS blocking_user,
repmgr-#        blocked_activity.query AS blocked_statement,
repmgr-#        blocking_activity.query AS current_statement_in_blocking_process
repmgr-# FROM pg_catalog.pg_locks blocked_locks
repmgr-# JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
repmgr-# JOIN pg_catalog.pg_locks blocking_locks
repmgr-#     ON blocking_locks.locktype = blocked_locks.locktype
repmgr-#     AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
repmgr-#     AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
repmgr-#     AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
repmgr-#     AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
repmgr-#     AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
repmgr-#     AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
repmgr-#     AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
repmgr-#     AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
repmgr-#     AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
repmgr-#     AND blocking_locks.pid != blocked_locks.pid
repmgr-#
repmgr-# JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
repmgr-# WHERE NOT blocked_locks.GRANTED;
 blocked_pid | blocked_user | blocking_pid | blocking_user |              blocked_statement              |           current_statement_in_blocking_process
-------------+--------------+--------------+---------------+---------------------------------------------+------------------------------------------------------------
       21142 | postgres     |        20994 | postgres      | select * from measurement_y2021m09 limit 1; | alter table measurement_y2021m09 set tablespace tbs_fast;
(1 row)
We can check the location of the table using the following function:
Check table location
repmgr=# SELECT pg_relation_filepath('measurement_y2021m09');
            pg_relation_filepath
---------------------------------------------
 pg_tblspc/16455/PG_12_201909212/16385/16465
(1 row)
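If you prefer the tablespace name over a file path, a small sketch joining pg_class and pg_tablespace gives the same answer (note that a reltablespace of 0 means the database's default tablespace, which this join will not show):
Check table tablespace
SELECT t.spcname
FROM pg_class c
JOIN pg_tablespace t ON t.oid = c.reltablespace
WHERE c.relname = 'measurement_y2021m09';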
While the built-in declarative partitioning is suitable for most common use cases, there are some circumstances where a more flexible approach may be useful. Partitioning can be implemented using table inheritance, which allows for several features not supported by declarative partitioning, such as letting child tables have extra columns not present in the parent, supporting multiple inheritance, and dividing the data by criteria other than range, list, or hash.
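A minimal sketch of the inheritance-based approach (table names are illustrative; routing rows into the right child via a trigger or rule is left out):
Inheritance-based partitioning (sketch)
CREATE TABLE measurement_inh (
    city_id   int  NOT NULL,
    logdate   date NOT NULL,
    peaktemp  int,
    unitsales int
);
-- Child table: the CHECK constraint lets the planner (constraint_exclusion) skip it when irrelevant
CREATE TABLE measurement_inh_y2021m09 (
    CHECK (logdate >= DATE '2021-09-01' AND logdate < DATE '2021-10-01')
) INHERITS (measurement_inh);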
Vacuuming should be part of routine database maintenance, and there are optimal ways to go about it. Below are some performance tips for PostgreSQL vacuuming.

Don't run manual VACUUM or ANALYZE without reason
Database administrators should refrain from running manual vacuums too often on the entire database, as the autovacuum process might already have optimally vacuumed the target database. As a result, a manual vacuum may not remove any dead tuples but still cause unnecessary I/O load or CPU spikes. If needed, manual vacuums should be run on a table-by-table basis and only when necessary, for example when there are low ratios of live rows to dead rows or large gaps between autovacuum operations. They should also be run when user activity is at a minimum.

Autovacuum also keeps a table's data distribution statistics up to date (it doesn't rebuild them). When run manually, the ANALYZE command rebuilds these statistics instead of updating them. Again, rebuilding statistics when they are already optimally updated by a regular autovacuum may cause unnecessary pressure on system resources. The time when you must run ANALYZE manually is immediately after bulk loading data into the target table. A large number (even a few hundred) of new rows in an existing table will significantly skew its column data distribution. The new rows will cause any existing column statistics to be out of date. When the query optimizer uses such statistics, query performance can be really slow. In these cases, running the ANALYZE command immediately after a data load to rebuild the statistics completely is better than waiting for the autovacuum to kick in.

Select VACUUM FULL only when performance degrades badly
The autovacuum functionality doesn't recover disk space taken up by dead tuples. Running a VACUUM FULL command will do so, but it has performance implications. The target table is exclusively locked during the operation, preventing even reads on the table. The process also makes a full copy of the table, which requires extra disk space while it runs. We recommend only running VACUUM FULL if there is a very high percentage of bloat and queries are suffering badly. We also recommend using periods of lowest database activity for it.
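To see which tables might actually warrant a manual run, you can look at the dead-row counts and the last autovacuum times in the statistics views; a minimal sketch (the VACUUM target below is a hypothetical table):
Check dead rows and vacuum a single table
-- Tables with the most dead rows
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
-- Vacuum and refresh statistics for one table only
VACUUM (VERBOSE, ANALYZE) my_big_table;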
Fine-tune Autovacuum Threshold
It's essential to check or tune the autovacuum and analyze configuration parameters in the postgresql.conf file, or in individual table properties, to strike a balance between autovacuum and performance gain. PostgreSQL uses two configuration parameters to decide when to kick off an autovacuum: autovacuum_vacuum_threshold (default 50) and autovacuum_vacuum_scale_factor (default 0.2).
Together, these parameters tell PostgreSQL to start an autovacuum when the number of dead rows in a table exceeds the number of rows in that table multiplied by the scale factor plus the vacuum threshold. In other words, PostgreSQL will start autovacuum on a table when:
pg_stat_user_tables.n_dead_tup > (pg_class.reltuples x autovacuum_vacuum_scale_factor) + autovacuum_vacuum_threshold
This may be sufficient for small to medium-sized tables. For example, in a table with 10,000 rows, the number of dead rows has to be over 2,050 ((10,000 x 0.2) + 50) before an autovacuum kicks off. Not every table in a database experiences the same rate of data modification. Usually, a few large tables will experience frequent data modifications, resulting in a higher number of dead rows. The default values may not work for such tables. For example, with the default values, a table with 1 million rows must have more than 200,050 dead rows before an autovacuum starts ((1,000,000 x 0.2) + 50). This can mean longer gaps between autovacuums, increasingly long autovacuum times, and worse, autovacuum not running at all if active transactions on the table are locking it.
Therefore, the goal should be to set these thresholds to optimal values so that autovacuums happen at regular intervals and don't take a long time (and affect user sessions), while keeping the number of dead rows relatively low.
One approach is to use one or the other parameter. So, if we set autovacuum_vacuum_scale_factor to 0 and instead set autovacuum_vacuum_threshold to, say, 5,000, a table will be autovacuumed when its number of dead rows is more than 5,000.
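As a sketch of that approach applied to a single table (placeholder name, values as in the example above):
Per-table autovacuum threshold
ALTER TABLE <table_name> SET (autovacuum_vacuum_scale_factor = 0, autovacuum_vacuum_threshold = 5000);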
Fine-tune Autoanalyze Threshold
Similar to autovacuum, autoanalyze also uses two parameters that decide when autovacuum will also trigger an autoanalyze: autovacuum_analyze_threshold (default 50) and autovacuum_analyze_scale_factor (default 0.1).
Like autovacuum, the autovacuum_analyze_threshold parameter can be set to a value that dictates the number of inserted, deleted, or updated tuples in a table before an autoanalyze starts. We recommend setting this parameter separately on large and high-transaction tables. The table configuration will override the postgresql.conf values. The code snippet below shows the SQL syntax for modifying the autovacuum_analyze_threshold setting for a table.
ALTER TABLE <table_name> SET (autovacuum_analyze_threshold = <threshold rows>)
Fine-tune Autovacuum workers
Another parameter often overlooked is autovacuum_max_workers, which has a default value of 3. Autovacuum is not a single process but a number of individual vacuum threads running in parallel. The reason for specifying multiple workers is to ensure that vacuuming large tables doesn't hold up the vacuuming of smaller tables and user sessions. The autovacuum_max_workers parameter tells PostgreSQL how many autovacuum worker threads to spin up to do the cleanup. A common practice among PostgreSQL DBAs is to increase the number of maximum worker threads in an attempt to speed up autovacuum. This doesn't work, as all the threads share the same autovacuum_vacuum_cost_limit, which has a default value of 200. Each autovacuum thread is assigned a cost limit using the formula shown below:
individual thread’s cost_limit = autovacuum_vacuum_cost_limit / autovacuum_max_workers
The cost of work done by an autovacuum thread is calculated using three parameters: vacuum_cost_page_hit, vacuum_cost_page_miss, and vacuum_cost_page_dirty.
What these parameters mean is this: when a vacuum thread finds the data page that it's supposed to clean in the shared buffer, the cost is 1. If the data page is not in the shared buffer but in the OS cache, the cost will be 10. If the page has to be marked dirty because the vacuum thread had to delete dead rows, the cost will be 20. An increased number of worker threads therefore lowers the cost limit for each thread. As each thread is assigned a lower cost limit, it will go to sleep more often because the cost threshold is easily reached, ultimately causing the whole vacuum process to run slowly. We recommend increasing autovacuum_vacuum_cost_limit to a higher value, like 2000, and then adjusting the maximum number of worker threads. A better way is to tune these parameters for individual tables only when necessary. For example, if the autovacuum of a large transactional table is taking too long, the table may be temporarily configured to use its own vacuum cost limit and cost delay. The cost limit and delay will override the system-wide values set in postgresql.conf. The code snippet below shows how to configure an individual table.
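A sketch of what that per-table configuration could look like, following the pattern of the earlier ALTER TABLE example (placeholder values):
Per-table vacuum cost settings
ALTER TABLE <table_name> SET (autovacuum_vacuum_cost_limit = <cost limit>, autovacuum_vacuum_cost_delay = <cost delay in ms>)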
Using the first parameter (the per-table cost limit) will ensure the autovacuum thread assigned to the table performs more work before going to sleep. Lowering autovacuum_vacuum_cost_delay will also mean the thread sleeps for less time.