======Overview======

PostgreSQL is an open-source database and, as such, can be used free of charge in production environments. Furthermore, PostgreSQL is very robust and capable of sustaining high-availability configurations with:

  * Replication: both physical (native) and logical (e.g. with Slony)
  * Native clustering (via PostgreSQL-XL)
  * Integration with etcd or Bucardo
  * Partitioning, both by value and by hash (since version 11)

Like any modern database, PostgreSQL is composed of two main structures:

  * Database server: the on-disk part, such as data files, configuration files and WAL logs
  * Database instance: the shared buffers, WAL buffers and other buffers which reside in RAM

Of course, there are background processes that coordinate the dance between the instance and the server itself. So let's take a closer look at the architecture with the picture below:

{{ :postgresql_basic_architecure.png?700 |}}

Let's take a closer look at each of these structures:

======Database Instance======

The memory in PostgreSQL (and not only there) is divided into two main categories:

  * Shared memory
  * Private memory

You can see this clearly in the picture below:

{{ :postgres_instance.png?600 |}}

=====Shared Memory=====

The instance, in pretty much every database, is the collection of memory structures and buffers which serve the database. These buffers are used to speed up read/write operations to and from the database, as RAM is usually much faster than disk. A database usually has separate buffers for user data and for the other memory needed for its normal operation. For example, in PostgreSQL the shared buffers keep the user data (tables, indexes, sequences, etc.), while the WAL buffers keep the information regarding transactions, more precisely, how to redo a transaction. As already mentioned in previous sections, PostgreSQL uses MVCC, which allows it to keep several versions of the same data in memory.
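The hash partitioning mentioned above (available since PostgreSQL 11) can be sketched as follows; the table and column names here are hypothetical, not part of any example in this article:

```sql
-- Hypothetical example: a hash-partitioned table (PostgreSQL 11+)
CREATE TABLE measurements (
    sensor_id  bigint NOT NULL,
    reading    numeric,
    taken_at   timestamptz NOT NULL
) PARTITION BY HASH (sensor_id);

-- Four partitions; each row is routed by hashing sensor_id
CREATE TABLE measurements_p0 PARTITION OF measurements
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE measurements_p1 PARTITION OF measurements
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE measurements_p2 PARTITION OF measurements
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE measurements_p3 PARTITION OF measurements
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);
```

Partitioning by value is done the same way with ''PARTITION BY RANGE'' or ''PARTITION BY LIST'' instead of ''HASH''.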
That, of course, is needed to provide different consistency levels to the users, but it also prevents readers from blocking writers and writers from blocking readers, one of the mottos of the Oracle engine as well. This type of memory is called SHARED memory because it is shared by all the sessions: every session can, and at some point will, read from and write to it. The opposite of that is the private memory.

=====Private Memory=====

In the private memory, each session stores information which is accessible only to that session and really needed only by that session. For example:

  * Bind variables
  * Temporary memory (for sorting)
  * Session-specific query settings
  * Others

That memory isn't accessible by other processes.

=====Utility Processes=====

Utility processes are needed so the instance can interact with the database server; they provide the connection between the instance and the server. For example, if a user wants to read data which isn't in the shared buffers yet, the backend process has to read the data from the data file and load it into the shared buffers, from where it is passed to the session. As you can see, we have different processes for different structures. For example:

  * Writer (background writer) - writes dirty buffers to the database server's data files.
  * WAL (Write-Ahead Log) writer - writes the WAL buffers to the WAL log files.
  * Logging collector - gathers events in the database and writes them to the log files.
  * Etc.

Some of these processes are optional, like the archiver process, but very much needed if you want point-in-time recovery (PITR) capabilities in production.
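You can observe these utility processes from within the database itself; since PostgreSQL 10, the pg_stat_activity view also lists background processes via its backend_type column:

```sql
-- List the background (utility) processes of a running cluster;
-- typical rows include: checkpointer, background writer,
-- walwriter and autovacuum launcher
SELECT pid, backend_type
FROM   pg_stat_activity
WHERE  backend_type <> 'client backend';
```

The exact set of rows depends on the version and configuration of your cluster (for example, whether archiving or replication is enabled).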
Below you can see a brief summary, as we already discussed how it interacts with the instance:

{{ :postgres_database_cluster.png?600 |}}

That structure doesn't represent the latest version, but it is a good summary of the files which are mandatory in a PostgreSQL database.

=====PGData Structure=====

PostgreSQL doesn't store objects under their names in the PGData directory. Rather, it uses unique numeric IDs:

  * Databases have an OID
  * Tables have a filenode ID

<code bash>
-bash-4.2$ pwd
/var/lib/pgsql/12/data
-bash-4.2$ /usr/pgsql-12/bin/oid2name
All databases:
    Oid   Database Name  Tablespace
-----------------------------------
  16424     edge_wallet  pg_default
  19352  edge_wallet_by  pg_default
  16425      extensions  pg_default
  16426   extensions_by  pg_default
  16427   game_registry  pg_default
</code>

Further, we can even explore the filenode IDs for a particular database as so:

<code bash>
-bash-4.2$ /usr/pgsql-12/bin/oid2name -d edge_wallet
From database "edge_wallet":
  Filenode               Table Name
-----------------------------------
     16537    flyway_schema_history
     16543       old_player_journal
     16533              old_players
     16527  old_transaction_journal
     16526         old_transactions
     16554           player_journal
     16557                  players
     16536           schema_version
     16542      transaction_journal
     16528             transactions
-bash-4.2$
</code>

We can also search for a particular table if we know its filenode:

<code bash>
-bash-4.2$ /usr/pgsql-12/bin/oid2name -d edge_wallet -f 16537
From database "edge_wallet":
  Filenode             Table Name
---------------------------------
     16537  flyway_schema_history
-bash-4.2$
</code>

We can extract the filenode using the following SQL:

<code sql>
edge_wallet=# select pg_relation_filepath('flyway_schema_history');
 pg_relation_filepath
----------------------
 base/16424/16537
(1 row)

edge_wallet=#
</code>

The structure is as follows: /path/database_oid/table_filenodeid

From the preceding, it should be clear that every SQL table is stored as a file with a numeric name. However, PostgreSQL does not allow a single file to be greater than 1 GB in size, so what happens if a table grows beyond that limit?
PostgreSQL "attaches" another file with a numeric extension that indicates the next chunk of 1 GB of data. In other words, if your table is stored in the file 123, the second gigabyte will be stored in the file 123.1, and if another gigabyte of storage is needed, another file, 123.2, will be created. Therefore, the filenode refers to the very first file related to a specific table, but more than one file can be stored on disk.

=====Tablespaces=====

PostgreSQL expects to find all of its data within the PGDATA directory, but that does not mean that your cluster is "jailed" in this directory. In fact, PostgreSQL allows "escaping" the PGDATA directory by means of tablespaces. A tablespace is a storage area that can live outside the PGDATA directory. Tablespaces are brought into the PGDATA directory by means of symbolic links stored in the pg_tblspc subdirectory. In this way, the PostgreSQL processes do not have to look outside PGDATA, while still being able to access "external" storage. A tablespace can be used to achieve different aims, such as enlarging the available storage or providing different storage performance for specific objects. For instance, you can create a tablespace on a slow disk to contain infrequently accessed objects and tables, keeping fast storage within another tablespace for frequently accessed objects. You don't have to create the links by yourself: PostgreSQL provides the TABLESPACE feature to manage this, and the cluster will create and manage the appropriate links under the pg_tblspc subdirectory.
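Creating a tablespace is a one-liner; PostgreSQL takes care of the symbolic link under pg_tblspc. The names and paths below are hypothetical, and the directory must already exist, be empty, and be owned by the postgres OS user:

```sql
-- Hypothetical: a tablespace on fast storage, then a table placed in it
CREATE TABLESPACE ts_fast LOCATION '/data/tablespaces/ts_fast';

CREATE TABLE hot_events (
    id      bigint,
    payload text
) TABLESPACE ts_fast;
```

Note that ''CREATE TABLESPACE'' requires superuser privileges (or, on recent versions, the appropriate grant).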
For instance, the following is a PGDATA directory that has four different tablespaces:

<code bash>
$ sudo ls -l /postgres/12/pg_tblspc/
total 0
lrwx------ 1 postgres postgres 22 Dec 23 19:47 16384 -> /data/tablespaces/ts_a
lrwx------ 1 postgres postgres 22 Dec 23 19:47 16385 -> /data/tablespaces/ts_b
lrwx------ 1 postgres postgres 22 Dec 23 19:47 16386 -> /data/tablespaces/ts_c
lrwx------ 1 postgres postgres 22 Dec 23 19:47 16387 -> /data/tablespaces/ts_d
</code>

As with databases, tablespaces have OIDs as well, which we can explore as follows:

<code bash>
$ oid2name -s
All tablespaces:
   Oid  Tablespace Name
------------------------
  1663       pg_default
  1664        pg_global
 16384             ts_a
 16385             ts_b
 16386             ts_c
 16387             ts_d
</code>

=====Tables=====

Tables, or relations, typically hold the bulk of your data, with indexes accounting for most of the rest. We can observe the table size, along with other metadata, using the following query:

===Table Overview===

<code sql>
SELECT l.metric, l.nr AS bytes
     , CASE WHEN is_size THEN pg_size_pretty(nr) END AS bytes_pretty
     , CASE WHEN is_size THEN nr / NULLIF(x.ct, 0) END AS bytes_per_row
FROM  (
   SELECT min(tableoid)        AS tbl      -- = 'public.tbl'::regclass::oid
        , count(*)             AS ct
        , sum(length(t::text)) AS txt_len  -- length in characters
   FROM   schema.table t                   -- provide table name *once*
   ) x
CROSS JOIN LATERAL (
   VALUES
      (true , 'core_relation_size'               , pg_relation_size(tbl))
    , (true , 'visibility_map'                   , pg_relation_size(tbl, 'vm'))
    , (true , 'free_space_map'                   , pg_relation_size(tbl, 'fsm'))
    , (true , 'table_size_incl_toast'            , pg_table_size(tbl))
    , (true , 'indexes_size'                     , pg_indexes_size(tbl))
    , (true , 'total_size_incl_toast_and_indexes', pg_total_relation_size(tbl))
    , (true , 'live_rows_in_text_representation' , txt_len)
    , (false, '------------------------------'   , NULL)
    , (false, 'row_count'                        , ct)
    , (false, 'live_tuples'                      , pg_stat_get_live_tuples(tbl))
    , (false, 'dead_tuples'                      , pg_stat_get_dead_tuples(tbl))
   ) l(is_size, metric, nr);
</code>

<code>
              metric               |    bytes    | bytes_pretty | bytes_per_row
-----------------------------------+-------------+--------------+---------------
 core_relation_size                | 23835271168 | 22 GB        |          2147
 visibility_map                    |      737280 | 720 kB       |             0
 free_space_map                    |     5881856 | 5744 kB      |             0
 table_size_incl_toast             | 23841898496 | 22 GB        |          2148
 indexes_size                      |  4579852288 | 4368 MB      |           412
 total_size_incl_toast_and_indexes | 28421750784 | 26 GB        |          2561
 live_rows_in_text_representation  | 22837947661 | 21 GB        |          2057
 ------------------------------    |             |              |
 row_count                         |    11097648 |              |
 live_tuples                       |    11507786 |              |
 dead_tuples                       |           0 |              |
</code>

As you can see, that query shows us the live and dead tuples, as well as the size of each component. Furthermore, we can slice the table vertically to see the size of each column:

===Column Size===

<code sql>
BEGIN;
CREATE FUNCTION tc_column_size(table_name text, column_name text)
RETURNS BIGINT AS
$$
declare
    response BIGINT;
BEGIN
    EXECUTE 'select sum(pg_column_size(t."' || column_name || '")) from '
            || table_name || ' t ' into response;
    return response;
END;
$$
LANGUAGE plpgsql;

SELECT z.table_name, z.column_name, pg_size_pretty(z.size)
FROM (
    SELECT table_name, column_name,
           tc_column_size(table_name, column_name) size
    FROM information_schema.columns
    WHERE table_schema = 'schema') AS z
WHERE z.table_name = 'table'
ORDER BY z.size DESC;
</code>

<code>
          table_name           |        column_name         | pg_size_pretty
-------------------------------+----------------------------+----------------
 store_sms_message_p2021_08_05 | delivery_time              |
 store_sms_message_p2021_08_05 | profiling_category_id      |
 store_sms_message_p2021_08_05 | profiling_category_rule_id |
 store_sms_message_p2021_08_05 | number_lookup_request_id   |
 store_sms_message_p2021_08_05 | charge_on_status           |
 store_sms_message_p2021_08_05 | sid_rewrite_id             |
 store_sms_message_p2021_08_05 | category_list              | 8949 MB
 store_sms_message_p2021_08_05 | message                    | 1335 MB
 store_sms_message_p2021_08_05 | message_data               | 1334 MB
 store_sms_message_p2021_08_05 | country_id                 | 187 MB
 store_sms_message_p2021_08_05 | route_id                   | 187 MB
 store_sms_message_p2021_08_05 | message_id                 | 187 MB
 store_sms_message_p2021_08_05 | client_id                  | 187 MB
 store_sms_message_p2021_08_05 | client_connection_id       | 187 MB
 store_sms_message_p2021_08_05 | network_id                 | 187 MB
</code>

In our example, we can see that the category_list column occupies the most space. (Note that with ''ORDER BY ... DESC'', columns whose size is NULL sort first, which is why the empty rows appear at the top.)
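The function above simply sums pg_column_size() over every row of the table. To get a feel for per-value overhead, you can also call it directly on literals; the fixed-width results below hold on standard builds, while the text result depends on the value's header size:

```sql
-- pg_column_size() returns the on-disk size of a single value, in bytes
SELECT pg_column_size(1::smallint) AS smallint_bytes,  -- 2
       pg_column_size(1::integer)  AS integer_bytes,   -- 4
       pg_column_size(1::bigint)   AS bigint_bytes,    -- 8
       pg_column_size('abc'::text) AS text_bytes;      -- varies (header + data)
```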