Adventures in PostgreSQL
We’re using PostgreSQL as our database at Klaggle. My respect for Postgres continues to build, this is the first project I’ve used Postgres with, so the respect may be more fear than anything. There seems to be a black art to Postgress, and the body of knowledge out there is far from organized although it is “out there.” Here’s my part in adding to the obfuscation.
Performance Tuning:
Postgres has been around awhile, sometimes you find a clue from the days when 512MB RAM was a mind blowing quantity and the direct extrapolation, especially after a couple days of looking for said clue, is so non-intuitively opaque you wonder if in fact you’ve found a clue. The first thing in tuning Postgres is to realize that, for whatever reason, the Postgres gods have an “out of the box” installation assumption of a minor footprint. From an engineering perspective, especially these days, you have to respect the capability of Postgres to run well with constrained resources and for the this ability to be, de facto, demonstrated upon firing Postgres up for the first time. From a practicality perspective it’s horseshit because occasionally quasi-turn-key brilliance is appreciated.
Before reading my thoughts, here are some of the most valuable links I’ve found regarding Postgres and performance tuning. My intent here isn’t to express my prodigal brilliance in Postgres, it’s to attempt to organize what I’ve found. I’m an idiot. Just ask my wife. But here’s some nuggets from a few that are far smarter than I:
http://www.ca.postgresql.org/docs/momjian/hw_performance/0.html
http://varlena.com/GeneralBits/Tidbits/annotated_conf_e.html
http://www.freebsddiary.org/postgresql-analyze.php
http://www.amd.co.at/adminwiki/PostgreSQL
Additionally, there are two vital tidbits to understand:
1. Postgres runs under a multi-process paradigm as opposed to a multi-threaded paradigm. Fixing that in your mind goes a long way towards researching how to tune your hardware, OS, etc.
2. Hierarchical data relationships can be absolute murder your database performance. M-U-R-D-E-R it. Okay? So if you’re dataset is dependent on maintaining anscestor/child contexts start rewriting your schemas. (More on that later)
First steps - Hardware:
We’re in start-up mode. I found a Generation 3 HP DL380 with dual 3.2Ghz XEON procs, ordered some refurbished parts to beef things up including 12GB of RAM to make a decent platform all for about $675.00. Checkout http://www.pacificgeeks.com who I found, after the fact, and who last time I looked is selling the same kit fully loaded for $500.00.
Before you bang your head any further trying to load Ubuntu 8.10 on your HP with a 5i controller just stop and burn Ubuntu 8.04 to a CD and start over. 8.10 doesn’t write the GRUB config to disk and although the installation goes flawlessly, it’ll never boot. 8.04 is fine, it’s supported until 2013. I’m betting a firmware upgrade to the 5i controller would fix things but two failled downloads of the ISO from HP and I moved on.
It made sense so I did it. The server has 6 73GB 10k RPM drives. I mirrored disk0 and disk1 for the OS. Disk2 through disk4 are RAID 10 for the database(s). The basic consideration here is that logging, OS/System calls aren’t competing with the database. Also, Postgres benefits from a number of spindles and this config gives the speed of striping with the redundancy of mirroring.
15,000 RPM drives would have been better but the price difference is significant especially when you’re you’re own investor at the time.
Second Steps - Kernel Tweaks:
You have to make a couple changes to the kernel to increase shared memory availability first. Obfuscation #1, “Calculating SHMMAX and SHMALL in 2009″:
kernel.shmmax=2202671224
kernel.shmall=2202671224
Then make changes to postgresql.conf:
- shared_buffers
- effective_cache
Postgres’ daddy, Paul Momijian, recommends that effective cache be sized to fit your most active tables. If you’re going to be spending anytime living with Postgres I highly recommend injesting most of what he says as gospel. His postgres blog is here - http://momjian.us/main/blogs/pgblog.html
The top 4 tables I have total (as of writing this) roughly 1.1GB so I’ve set my effective cache to 1.6GB for now. Supposedly Linux is very effective at caching what is relevant so for now I’ve got my shared_buffers set to 256MB. The SHMMAX/SHMMALL configuration was calculated assuming this:
250kb + 8.2kb * shared_buffers +14.2kb * max_connections
For an idea of the performance impact of these minor changes, I had one query pattern that did several joins, sorts and counts that took an average of 1550ms to execute. After modifying these memory settings that query now executes in 550ms.
This isn’t finished. Ask me to bitch about TurboGear’s ORM, SQLAlchemy…
comment
Please Leave a Reply
TrackBack URL :
You are not an idiot. I love you.