I have a few goals in mind for this weekend (wait, it's Saturday afternoon already? Well, you know, the rest of this weekend):
- build my first presentation with reveal.js for my upcoming talk at Postgres Open in Dallas
- watch the Green Bay Packers win their season opener tomorrow at noon
- crush my foolish opponents in Fantasy Football
But those can all wait until I relate this story to you. Imagine you are a database administrator. Now, some of you may not know what a database administrator (dba) does, but imagining one isn't too hard. A developer writes code, a sys admin keeps the data center running, a manager makes it possible for these people to work without interruption; but you're supposed to imagine being a dba. A dba worries only about the database.
To make the database happy a dba will always want one of a few things:
- More space. Database admins are always going on and on about how there's not enough space, and the business should learn how to archive old data, and has anybody ever heard of a Data Warehouse?
- An upgrade. Most dbas seem to think that by getting "new features" and "performance improvements" they will somehow find contentment. But they will never be content, because the next alpha always has a newer feature.
- More RAM and more cores. Always.
- More space. Yes, the database just kept growing while I was writing this, and we don't have enough room for all these backups. Tell me where to move these backups, and so help me if you say /dev/null I will slug you.
One day you are presented with an opportunity. You can migrate the database to a new server where you will be given more space, more memory, more compute and a chance to upgrade to the newest version. Hoisted by your own petard, you must now perform the migration with minimal downtime.
But you run the world's most advanced open source database in PostgreSQL, and that's not actually a very hard proposition if the database is on version 9.0 or newer! Version 9.0 came out almost exactly five years ago, so hopefully you are somewhat with it. And I'm not judging if you're not, but you need to catch up. Unsupported databases and Production environments just don't mix.
Anyway, in version 9.0 PostgreSQL implemented streaming replication, which has since become the de facto standard for enterprise database availability. Streaming replication is a powerful feature that lets you build a readable clone of your production database that keeps up with changes rapidly. Just how rapidly depends on many factors, but that's not the point of this story.
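For the record, the moving parts on a standby of this vintage are just a copy of the primary's data directory plus a little recovery.conf telling it where to stream from. Here's a rough sketch; the hostname and replication user are made up, and pg_basebackup is the easy way to take the base copy from 9.1 onward (on a 9.0 primary you'd use pg_start_backup() and rsync instead):

# Assumes the primary already has wal_level set for standbys, max_wal_senders > 0,
# and a pg_hba.conf "replication" entry for the replicator user.
pg_basebackup -h old-db.example.com -U replicator -D /pgdata/standby -P

# recovery.conf in the standby's data directory:
cat > /pgdata/standby/recovery.conf <<'EOF'
standby_mode = 'on'
primary_conninfo = 'host=old-db.example.com port=5432 user=replicator'
EOF

pg_ctl -D /pgdata/standby start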
In this story, you enable streaming replication from your primary (the old database server) to your new hardware. On the night of the cutover, when you're ready to upgrade, you: stop web traffic for a few minutes, promote your replica to master and shut down the old service (don't want anybody connecting to that), run pg_upgrade -k to upgrade without copying a bunch of files around (no backing down now), start the new cluster, and analyze your tables. If you've built in a nice maintenance window, you might take the time to VACUUM some big tables or rebuild bloated indexes while nobody's connected, but it's not a must. When you're done, the app starts up, and whoever was responsible for changing all the connection strings (wait, was that your job? I thought Chef would handle it...) did so, and the app is now happy because the database is happy.
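Sketched out in shell, the database half of that cutover night looks roughly like this; the paths and binary directories are invented for illustration, so adjust for your own layout:

# Promote the replica on the new hardware. (pg_ctl promote exists from 9.1 on;
# on a 9.0 standby you touch the trigger_file named in recovery.conf instead.)
pg_ctl -D /pgdata/9.0 promote

# Shut down the old primary so nobody can connect to it by accident.
pg_ctl -D /old/pgdata/9.0 stop -m fast

# pg_upgrade wants both clusters stopped, so stop the freshly promoted one too.
# (The 9.4 cluster in /pgdata/9.4 is assumed to be initdb'd and empty.)
pg_ctl -D /pgdata/9.0 stop -m fast
pg_upgrade -k -b /usr/pgsql-9.0/bin -B /usr/pgsql-9.4/bin -d /pgdata/9.0 -D /pgdata/9.4

# Start the new cluster and rebuild planner statistics, which pg_upgrade
# does not carry over (the analyze_new_cluster.sh script it generates works too).
pg_ctl -D /pgdata/9.4 start
vacuumdb --all --analyze-only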
If you're really good at your job, you've practiced this a few times before doing it for realzies to make sure there are no quirks to the upgrade process, but eh. If you want to walk on a tightrope in your imaginary career as a dba, that's your problem. You can take it up with the imaginary CTO after you have to roll back four hours in because you hadn't realized the extension system changed between 9.0 and 9.4, and now this new production database won't load data because of "citext"!? *
Everybody is happy. The production app is running, and you can now set up a new streaming replica from this, the new primary, since the laws of high availability dictate you should have at least two of everything. You sync the data directories for the first time, eager to start up your replica, only it won't start. It complains:
ERROR: could not open file "pg_xlog/00000002.history": No such file or directory
FATAL: could not receive timeline history file from the primary
What happened? In fact, PostgreSQL has - or rather had - a bug of exactly this fashion. When PostgreSQL promotes a new primary server, it creates a marker of the timeline split in the form of a small text file placed in the WAL file directory. This file makes it possible to achieve Point-In-Time Recovery under some rather complex failover and fail-back scenarios. It cannot, however, be used to recover to a point in time before the upgrade, and pg_upgrade knows this. So pg_upgrade deletes the .history file. This isn't a problem for the operation of the new primary. But it makes it impossible to start a streaming replica, for lack of a file that's probably less than 100 bytes. Or at least, it did before Bruce Momjian patched the bug and back-ported the fix through 9.0.
But before that you were stuck. If you spend some time reading the recovery documentation you may come across recovery_target_timeline and think that setting this to 'latest' will help. But it doesn't. Instead it makes the replica think there is a newer history entry, and it will just start complaining about wanting "pg_xlog/00000003.history" now. So it seems that you will have to recreate the file. You can find a very nice summary of the .history file on the Postgres wiki. Since the info is in .pdf, though, it tends to be harder to index, so you may have trouble locating the doc if you don't already know it's there.
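For completeness, this is the setting in question; it goes in the standby's recovery.conf, and in this particular pickle it only changes which history file the replica whines about:

# recovery.conf on the would-be standby -- tempting, but not the fix here.
recovery_target_timeline = 'latest'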
Which brings me in a rather roundabout fashion to what I asked at the very beginning of this entry (how long ago was that? Good god, Lemon): are you familiar with the internals of the .history file? Probably not, but they're very simple. A numbered history file contains an entry for each earlier point in the WAL stream where a new timeline began. Sound hard? It's not.
# cat 00000002.history
1	1404/A0000017	Some comments about when the timeline split.
That's it. One digit, a magic number, and some comments. Now, about that magic number...? If we actually needed the ability to recover back to a different timeline, we'd need the correct address. But we're never going to go back to that timeline, because it's from before our upgrade. All we'd need to recreate a lost file is a good enough number. And you can get one by running:
# SELECT pg_current_xlog_location();
pg_current_xlog_location
--------------------------
1/38F70328
(1 row)
Mock up a .history file in your WAL directory with these values, et voila. The replica will immediately be able to start.
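Concretely, the mock-up might look like this on the new primary; $PGDATA and the comment text are placeholders, and the location is just the one the query above handed back:

# The file name matches the timeline the replica asked for (00000002.history),
# and the single entry describes where old timeline 1 left off.
cat > $PGDATA/pg_xlog/00000002.history <<'EOF'
1	1/38F70328	Recreated by hand; pg_upgrade removed the original.
EOF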
Be seeing you!
* Please note that upgrading extensions from pre-9.1 versions of PostgreSQL is generally a very simple matter, and I don't mean to imply that you'll have problems with citext or any other extension. Generally, the toughest thing you'll have to do is download a patch and run CREATE EXTENSION <extension> FROM unpackaged;
And you can handle that. Just don't wait until the night of the upgrade to find out you need a patch and have to start looking. Plan ahead, imaginary dba. Gosh.