Overview

PostgreSQL is an open-source database and, as such, can be used free of charge in production environments. Furthermore, PostgreSQL is very robust and capable of sustaining High Availability configurations with:

Like any modern database, PostgreSQL is composed of two main structures:

Of course, there are background processes that coordinate the dance between the instance and the server itself. So let's take a closer look at the architecture with the picture below:

Let's take a closer look at each of these structures:

Database Instance

The memory in PostgreSQL (as in most databases) is divided into two main categories as well:

The picture below clearly illustrates this:

Shared Memory

The instance, in pretty much every database, is the collection of memory structures and buffers which serve the database. These buffers are used to speed up read/write operations to and from the database, as RAM is usually much faster than disk. A database usually has separate buffers for user data and other memory needed for its normal behaviour. For example, in PostgreSQL, the shared buffers keep the user data (tables, indexes, sequences, etc.), while the WAL buffers keep the information regarding transactions, more precisely, how to redo a transaction. As already mentioned in previous sections, PostgreSQL uses MVCC, which allows it to keep several versions of the same data in memory. That, of course, is needed to provide different consistency levels to the users, but it also prevents readers from blocking writers and writers from blocking readers, one of the mottos of the Oracle engine as well.

This type of memory is called SHARED memory, because it is shared by all the sessions. In other words, every session can, and at some point will, read from and write to this memory. The opposite of that is the private memory.
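The sizes of these shared areas are driven by configuration parameters. Assuming a psql session to the cluster, a quick way to inspect them is as follows (the actual values are cluster-specific):

Inspect Shared Memory Settings

SHOW shared_buffers;  -- cache for user data: tables, indexes, sequences
SHOW wal_buffers;     -- buffer for WAL (redo) records before they are flushed to disk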

Private Memory

In the private memory, each session stores information which is accessible ONLY to that session and really needed only by that session. For example:

That memory isn't accessible to other processes.
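The private memory areas are governed by parameters such as work_mem, maintenance_work_mem and temp_buffers. Assuming a psql session, we can check them as follows (the actual values depend on your configuration):

Inspect Private Memory Settings

SHOW work_mem;              -- memory per sort/hash operation within a query
SHOW maintenance_work_mem;  -- memory for VACUUM, CREATE INDEX and similar tasks
SHOW temp_buffers;          -- per-session buffers for temporary tables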

Utility Processes

Utility processes are needed so the instance can interact with the database server; they provide the connection between the instance and the server. For example, if a user wants to read data which isn't in the shared buffers, the backend process has to read the data from the datafile into the shared buffers, from where it is passed to the session.
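On PostgreSQL 10 and later, where pg_stat_activity exposes a backend_type column, we can list the background processes of a running instance with a simple query:

List Background Processes

SELECT pid, backend_type
  FROM pg_stat_activity
 WHERE backend_type <> 'client backend';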

As you can see, we have different processes for different structures. For example, we have:

Some of these processes are optional, like the archiver (ARCHIVE) process, but they are very much needed if you want Point-in-Time Recovery (PITR) capabilities in production.
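Enabling the archiver is a matter of configuration. A minimal sketch in postgresql.conf; the archive destination below is purely illustrative:

Enable WAL Archiving

wal_level = replica                        # minimum level required for archiving
archive_mode = on                          # starts the archiver process (restart required)
archive_command = 'cp %p /pg_archive/%f'   # %p = path to the WAL file, %f = file name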

Database Server

The server itself, or cluster as it is officially called, represents the physical part of the database: the part which resides on the file system (FS). That system is comprised of several layers of directories and structures. Below you can see a brief summarization, as we already discussed how it interacts with the instance:

That structure doesn't represent the latest version, but it is a good summary of the files which are mandatory in a PostgreSQL database.

PGData Structure

PostgreSQL doesn't store objects under their names in the PGDATA directory. Rather, it uses a unique ID (OID) for each:

Explore Database OIDs

-bash-4.2$ pwd
/var/lib/pgsql/12/data
-bash-4.2$ /usr/pgsql-12/bin/oid2name
All databases:
    Oid            Database Name  Tablespace
--------------------------------------------
  16424              edge_wallet  pg_default
  19352           edge_wallet_by  pg_default
  16425               extensions  pg_default
  16426            extensions_by  pg_default
  16427            game_registry  pg_default

Further, we can even explore the filenode IDs for a particular database, like so:

Explore Table's FileNode IDs

-bash-4.2$ /usr/pgsql-12/bin/oid2name -d edge_wallet
From database "edge_wallet":
  Filenode               Table Name
-----------------------------------
     16537    flyway_schema_history
     16543       old_player_journal
     16533              old_players
     16527  old_transaction_journal
     16526         old_transactions
     16554           player_journal
     16557                  players
     16536           schema_version
     16542      transaction_journal
     16528             transactions
-bash-4.2$

We can also search for a particular table if we know its OID:

Search object by OID

-bash-4.2$ /usr/pgsql-12/bin/oid2name -d edge_wallet -f 16537
From database "edge_wallet":
  Filenode             Table Name
---------------------------------
     16537  flyway_schema_history
-bash-4.2$

We can extract the filenode using the following SQL:

Extract Table FileNode ID

edge_wallet=# select pg_relation_filepath('flyway_schema_history');
 pg_relation_filepath
----------------------
 base/16424/16537
(1 row)

edge_wallet=#

The structure is as follows: base/<database_oid>/<table_filenode>, relative to the PGDATA directory.

From the preceding, it should be clear that every SQL table is stored as a file with a numeric name. However, PostgreSQL does not allow a single file to be greater than 1 GB in size, so what happens if a table grows beyond that limit? PostgreSQL “attaches” another file with a numeric extension that indicates the next chunk of 1 GB of data. In other words, if your table is stored in the 123 file, the second gigabyte will be stored in the 123.1 file, and if another gigabyte of storage is needed, another file, 123.2, will be created. Therefore, the filenode refers to the very first file related to a specific table, but more than one file can be stored on disk.
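Assuming the filenode from the earlier example (16537) belongs to a table larger than 1 GB, a listing of the data directory would look something like this (the free space map and visibility map forks, _fsm and _vm, live next to the data segments):

Table Segment Files

-bash-4.2$ ls base/16424/16537*
base/16424/16537  base/16424/16537.1  base/16424/16537_fsm  base/16424/16537_vm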

Tablespaces

PostgreSQL expects to find all its data within the PGDATA directory, but that does not mean that your cluster is “jailed” in this directory. In fact, PostgreSQL allows “escaping” the PGDATA directory by means of tablespaces. A tablespace is a storage space that can be outside the PGDATA directory. Tablespaces are dragged into the PGDATA directory by means of symbolic links stored in the pg_tblspc subdirectory. In this way, the PostgreSQL processes do not have to seek outside PGDATA, while still being able to access “external” storage. A tablespace can be used to achieve different aims, such as enlarging the available storage or providing different storage performance for specific objects. For instance, you can create a tablespace on a slow disk to contain infrequently accessed objects and tables, keeping fast storage within another tablespace for frequently accessed objects.

You don't have to make the links yourself: PostgreSQL provides the TABLESPACE feature to manage this, and the cluster will create and manage the appropriate links under the pg_tblspc subdirectory.
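Creating a tablespace and placing a table in it is straightforward. A minimal sketch; the directory must already exist, be empty and be owned by the postgres OS user, and the names here are illustrative:

Create a Tablespace

-- The directory must exist and be writable by the postgres user
CREATE TABLESPACE ts_a LOCATION '/data/tablespaces/ts_a';

-- Place a new table in that tablespace
CREATE TABLE archive_data (id bigint) TABLESPACE ts_a;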

For instance, the following is a PGDATA directory that has four different tablespaces:

Tablespace example

$ sudo ls -l /postgres/12/pg_tblspc/
total 0
lrwx------  1 postgres  postgres  22 Dec 23 19:47 16384 -> /data/tablespaces/ts_a
lrwx------  1 postgres  postgres  22 Dec 23 19:47 16385 -> /data/tablespaces/ts_b
lrwx------  1 postgres  postgres  22 Dec 23 19:47 16386 -> /data/tablespaces/ts_c
lrwx------  1 postgres  postgres  22 Dec 23 19:47 16387 -> /data/tablespaces/ts_d

As with databases, tablespaces have OIDs as well, which we can explore as follows:

Tablespace OIDs example

$ oid2name -s
All tablespaces:
    Oid  Tablespace Name
------------------------
   1663       pg_default
   1664        pg_global
  16384             ts_a
  16385             ts_b
  16386             ts_c
  16387             ts_d

Tables

Tables, or relations, typically represent around 70% of your data, with most of the rest being indexes. We can observe the table size, along with other metadata, using the following query:

Table Overview

SELECT l.metric, l.nr AS bytes
     , CASE WHEN is_size THEN pg_size_pretty(nr) END AS bytes_pretty
     , CASE WHEN is_size THEN nr / NULLIF(x.ct, 0) END AS bytes_per_row
FROM (
    SELECT min(tableoid)        AS tbl      -- = 'public.tbl'::regclass::oid
         , count(*)             AS ct
         , sum(length(t::text)) AS txt_len  -- length in characters
    FROM schema.table t                     -- provide table name *once*
    ) x
CROSS JOIN LATERAL (
    VALUES
      (true , 'core_relation_size'                , pg_relation_size(tbl))
    , (true , 'visibility_map'                    , pg_relation_size(tbl, 'vm'))
    , (true , 'free_space_map'                    , pg_relation_size(tbl, 'fsm'))
    , (true , 'table_size_incl_toast'             , pg_table_size(tbl))
    , (true , 'indexes_size'                      , pg_indexes_size(tbl))
    , (true , 'total_size_incl_toast_and_indexes' , pg_total_relation_size(tbl))
    , (true , 'live_rows_in_text_representation'  , txt_len)
    , (false, '------------------------------'    , NULL)
    , (false, 'row_count'                         , ct)
    , (false, 'live_tuples'                       , pg_stat_get_live_tuples(tbl))
    , (false, 'dead_tuples'                       , pg_stat_get_dead_tuples(tbl))
    ) l(is_size, metric, nr);


              metric               |    bytes    | bytes_pretty | bytes_per_row
-----------------------------------+-------------+--------------+---------------
 core_relation_size                | 23835271168 | 22 GB        |          2147
 visibility_map                    |      737280 | 720 kB       |             0
 free_space_map                    |     5881856 | 5744 kB      |             0
 table_size_incl_toast             | 23841898496 | 22 GB        |          2148
 indexes_size                      |  4579852288 | 4368 MB      |           412
 total_size_incl_toast_and_indexes | 28421750784 | 26 GB        |          2561
 live_rows_in_text_representation  | 22837947661 | 21 GB        |          2057
 ------------------------------    |             |              |
 row_count                         |    11097648 |              |
 live_tuples                       |    11507786 |              |
 dead_tuples                       |           0 |              |

As you can see, that query shows us the live and dead tuples, as well as the size of each component. Furthermore, we can slice the table vertically to see the size of each column:

Column Size

BEGIN;
-- Helper function: returns the total on-disk size of a single column.
-- Created inside a transaction so it can be rolled back afterwards.
CREATE FUNCTION tc_column_size(table_name text, column_name text)
    RETURNS BIGINT AS
$$
DECLARE
    response BIGINT;
BEGIN
    EXECUTE 'select sum(pg_column_size(t."' || column_name || '")) from ' || table_name || ' t ' INTO response;
    RETURN response;
END;
$$
LANGUAGE plpgsql;
    SELECT
        z.table_name,
        z.column_name,
        pg_size_pretty(z.size)
    FROM (
        SELECT
            table_name,
            column_name,
            tc_column_size(table_name, column_name) size
        FROM
            information_schema.columns
        WHERE
            table_schema = 'schema') AS z
    WHERE z.table_name = 'table'
    ORDER BY
        z.size DESC;
          table_name           |         column_name         | pg_size_pretty
-------------------------------+-----------------------------+----------------
 store_sms_message_p2021_08_05 | delivery_time               |
 store_sms_message_p2021_08_05 | profiling_category_id       |
 store_sms_message_p2021_08_05 | profiling_category_rule_id  |
 store_sms_message_p2021_08_05 | number_lookup_request_id    |
 store_sms_message_p2021_08_05 | charge_on_status            |
 store_sms_message_p2021_08_05 | sid_rewrite_id              |
 store_sms_message_p2021_08_05 | category_list               | 8949 MB
 store_sms_message_p2021_08_05 | message                     | 1335 MB
 store_sms_message_p2021_08_05 | message_data                | 1334 MB
 store_sms_message_p2021_08_05 | country_id                  | 187 MB
 store_sms_message_p2021_08_05 | route_id                    | 187 MB
 store_sms_message_p2021_08_05 | message_id                  | 187 MB
 store_sms_message_p2021_08_05 | client_id                   | 187 MB
 store_sms_message_p2021_08_05 | client_connection_id        | 187 MB
 store_sms_message_p2021_08_05 | network_id                  | 187 MB

In our example, we can see that the category_list column occupies the most data.