PostgreSQL is the RDBMS (relational database management system) that we use at Overloop.
Like any SQL database engine, PostgreSQL is software that executes SQL commands.
To install it, just get the package using brew:
$ brew install postgres
Then, we can start executing SQL commands with the psql
command-line interface, or by connecting the database to our favorite backend system.
To use the database well, please follow the rules below:
Query it as little as possible
Try to group your queries.
-- Instead of doing:
SELECT * FROM table WHERE id = 1;
SELECT * FROM table WHERE id = 2;
-- Do:
SELECT * FROM table WHERE id IN (1, 2);
Make it return only what it needs
Unnecessary columns force the database to transmit a lot of data over the network, and therefore to answer slowly.
So, prefer SELECT id FROM ...
to SELECT * FROM ...
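For example, assuming a hypothetical users table, the two variants look like this:

```sql
-- Transfers every column of every matching row over the network:
SELECT * FROM users WHERE active = true;

-- Transfers only the ids, which is all the caller needs here:
SELECT id FROM users WHERE active = true;
```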
Monitor it frequently
Like any complex engine, it can sometimes be slow or do a lot of unexpected work. So it's important to check periodically that the DB is doing well, using the monitoring tips given below: check that all tables have a reasonable size, that indexes are used, that queries are fast, etc.
If you are using Postgres on Heroku, you can just run the command below to have some basic insights:
$ heroku pg:diagnose -a app-name
To reset the collected statistics, just run:
$ heroku pg:stats_reset -a app-name
To get the size of all tables:
SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 25;
To get the size of all indexes:
SELECT s.schemaname,
s.relname AS tablename,
s.indexrelname AS indexname,
pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
s.idx_scan as usage
FROM pg_catalog.pg_stat_user_indexes s
JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid
WHERE 0 <>ALL (i.indkey) -- no index column is an expression
AND NOT i.indisunique -- is not a UNIQUE index
AND NOT EXISTS -- does not enforce a constraint
(SELECT 1 FROM pg_catalog.pg_constraint c
WHERE c.conindid = s.indexrelid)
AND s.relname = 'table_name'
ORDER BY pg_relation_size(s.indexrelid) DESC;
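Indexes that report a usage of 0 in this query are candidates for removal. Assuming you have double-checked that the index really is unused in every environment, it can be dropped without blocking writes:

```sql
-- CONCURRENTLY avoids locking writes while the index is removed
-- (note: it cannot run inside a transaction block)
DROP INDEX CONCURRENTLY IF EXISTS index_name;
```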
-- Index hit rate
WITH idx_hit_rate as (
SELECT
relname as table_name,
n_live_tup,
round(100.0 * idx_scan / (seq_scan + idx_scan),2) as idx_hit_rate
FROM pg_stat_user_tables WHERE (seq_scan + idx_scan) > 0
ORDER BY n_live_tup DESC
),
-- Cache hit rate
cache_hit_rate as (
SELECT
relname as table_name,
heap_blks_read + heap_blks_hit as reads,
round(100.0 * sum (heap_blks_read + heap_blks_hit) over (ORDER BY heap_blks_read + heap_blks_hit DESC) / sum(heap_blks_read + heap_blks_hit) over (),4) as cumulative_pct_reads,
round(100.0 * heap_blks_hit / (heap_blks_hit + heap_blks_read),2) as cache_hit_rate
FROM pg_statio_user_tables
WHERE heap_blks_hit + heap_blks_read > 0
ORDER BY 2 DESC
)
SELECT
idx_hit_rate.table_name,
idx_hit_rate.n_live_tup as size,
cache_hit_rate.reads,
cache_hit_rate.cumulative_pct_reads,
idx_hit_rate.idx_hit_rate,
cache_hit_rate.cache_hit_rate
FROM idx_hit_rate, cache_hit_rate
WHERE idx_hit_rate.table_name = cache_hit_rate.table_name
AND cumulative_pct_reads < 100.0
ORDER BY reads DESC;
Run the snippet below to find all slow queries (it relies on the pg_stat_statements extension):
SELECT query, calls, total_time, min_time, max_time, mean_time, stddev_time, rows
FROM pg_stat_statements ORDER BY mean_time DESC;
The data used to find them can be reset using:
SELECT pg_stat_statements_reset()
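Note that pg_stat_statements is an extension; if the view does not exist, it has to be enabled first (this assumes the module is listed in shared_preload_libraries, which requires a server restart):

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```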
To find the usage of a single table (not studying a particular slow query), run:
SELECT * FROM pg_stat_user_tables WHERE relname = 'table_name' ORDER BY relname;
When you have found a slow query, run the EXPLAIN command (see the official documentation) to find the origin of the problem.
EXPLAIN query                                -- explain the query plan
EXPLAIN ANALYZE query                        -- explain the query plan and run the query
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF) query  -- more detailed EXPLAIN call
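For instance, on a hypothetical orders table (the table and column names are made up for illustration):

```sql
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF)
SELECT * FROM orders WHERE customer_id = 42;
```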
You'll obtain a detailed query plan (an explanation of what the DB does to run a particular query). For more information about how to read it, check this article.
Very often, problems of performance in the database can be explained by the lack of an index. In that case, the query plan will look like:
QUERY PLAN
Limit (cost=0.09..240.66 rows=2 width=501)
-> Index Scan Backward using table_pkey on table (cost=0.09..209539.43 rows=1742 width=501)
Filter: (foreign_key_id = 2)
It states that the DB is scanning the whole primary-key index backwards and filtering every row on foreign_key_id.
This kind of plan is very slow, because it has to browse a whole index to find a particular record.
Adding an index will change it to:
Limit (cost=0.09..4.09 rows=1 width=501)
-> Index Scan using table_pkey on table (cost=0.09..4.09 rows=1 width=501)
Index Cond: (id = 2)
Here, it's much faster because instead of filtering every entry of the index, the DB uses an index condition to jump directly to the matching rows, thus improving performance.
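In a case like the one above, the fix is a plain index on the filtered column (the table and column names below come from the sample plan and are placeholders):

```sql
-- Index the column that appears in the Filter line of the slow plan
CREATE INDEX index_table_on_foreign_key_id ON "table" (foreign_key_id);
```

On a live database, prefer CREATE INDEX CONCURRENTLY so the build does not block writes.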
Sometimes, some requests may experience timeouts or slowness. This can be discovered through logs showing “ERROR: canceling statement due to statement timeout”, or because some features appear to be slower than expected.
There can be many causes for that, but generally you can use the procedure below to find out what is going wrong.
The first step of the investigation is to find the query that is running slow.
To do that, search the logs for the following statement: ERROR: canceling statement due to statement timeout
Around it, you will find the exact query causing the problem.
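If no timeout error shows up in the logs, you can ask Postgres to log slow statements itself. This is a sketch assuming you have the privileges to change the server configuration (on managed platforms such as Heroku, the setting is usually managed through the provider instead):

```sql
-- Log every statement that takes longer than one second;
-- adjust the threshold to your needs
ALTER SYSTEM SET log_min_duration_statement = '1s';
SELECT pg_reload_conf();
```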
It can be something like:
SELECT COUNT(*) FROM mytable WHERE columnname = ?
Once you've got the problematic query, try reproducing the issue, replacing the placeholders with values that trigger the same slow query.
If possible, do that in the same environment the original query ran in, since the amount of data that caused the problem can be part of the answer.
Note that for some queries, running it once can make subsequent calls faster thanks to the cache, so you must find the right set of parameters to reproduce the problem.
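Following the steps above, the reproduction could look like this (the column name and value are assumptions; pick a value that you know matches many rows in the real data):

```sql
SELECT COUNT(*) FROM mytable WHERE columnname = 'frequent_value';
```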
Then, you need to analyze the query.
Generally, you can solve the problem with the EXPLAIN ANALYZE approach described in the Optimize a single query part of the Monitoring section above. If the problem was due to a missing index, you can stop here.
If the previous step did not work, you'll have to look for the root cause elsewhere.
Try a time-correlation analysis: look at the general state of the database, other bugs, etc., to see whether the problem is tied to the timeframe of the request.
For example, the DB may be particularly overloaded at a specific moment in time.