Monday, April 14. 2008Tuning PostgreSQL for Evergreen on a test serverUpdate 2008-05-01: Fixed a typo for sysctl: -a parameter simply shows all settings; -w parameter is needed to write the setting. Duh. Once you have decided on and acquired your test hardware for Evergreen, you need to think about tuning your PostgreSQL database server. Once you start loading bibliographic records, you might notice that after 100,000 records or so that your search response times aren't too snappy. Don't snarl at Evergreen. By default, PostgreSQL ships with very conservative settings (something like machines with 256 MB of RAM!) so if you don't tune those settings you're getting a false representation of your system's capabilities.
The "right" settings for PostgreSQL depend significantly on your hardware and deployment context, but in almost any circumstance you will want to bump up the settings from the delivered defaults. To give you an idea of what you need to consider, I thought I would share the settings that we're currently using on our Evergreen test server at Laurentian University. You might be able to use these as a starting point and adjust them accordingly once you've run some representative load tests against your configuration. And it's useful documentation for me to fall back on in a few months, when all of this has escaped my grasp The defaults (as shipped in Debian Etch)The defaults in Debian Etch are quite conservative. Consider that our test server has 12GB of RAM. The default only allocates 1MB of RAM to work memory (which is critical for sorting performance) and only 8MB of RAM to shared buffers. Following are the defaults set in /etc/postgresql/8.1/main/postgresql.conf: # - Memory - #shared_buffers = 1000 # min 16 or max_connections*2, 8KB each #temp_buffers = 1000 # min 100, 8KB each #max_prepared_transactions = 5 # can be 0 or more # note: increasing max_prepared_transactions costs ~600 bytes of shared memory # per transaction slot, plus lock space (see max_locks_per_transaction). #work_mem = 1024 # min 64, size in KB #maintenance_work_mem = 16384 # min 1024, size in KB #max_stack_depth = 2048 # min 100, size in KB # - Free Space Map - #max_fsm_pages = 20000 # min max_fsm_relations*16, 6 bytes each #max_fsm_relations = 1000 # min 100, ~70 bytes each Our test server settingsOur test server has 12 GB of RAM. Assuming that the PostgreSQL defaults were set for a system with 1 GB of RAM, we should be able to multiply the memory-based settings by at least a factor of 12. We're a little bit more aggressive than that in our settings. Note, however, that this is a single-server install of Evergreen, so we're also running memcached, ejabberd, Apache, and all of the Evergreen services as well as the database - oh, and a test instance of an institutional repository, among other apps - so we're not nearly as aggressive as we would be in a dedicated PostgreSQL server configuration. Please note that I'm making no claims that this is the optimal set of configuration values for PostgreSQL even on our own hardware! # shared_buffers: much of our performance depends on sorting, so we'll set it 100X the default # some tuning guides suggest cranking this up to as much 30% of your available RAM shared_buffers = 100000 # 8K * 100000 = ~ 0.8 GB # work_mem: how much RAM each concurrent process is allowed to claim before swapping to disk # your workload will probably have a large number of concurrent processes work_mem=524288 # 512 MB # max_fsm_pages: increased because PostgreSQL demanded it max_fsm_pages = 200000 After you change these settings, you will need to restart PostgreSQL to make the settings take effect. Kernel tuningIn addition to PostgreSQL complaining about max_fsm_pages not being high enough, your operating system kernel defaults for SysV shared memory might not be high enough to support the amount of RAM PostgreSQL demands as a result of your modifications. In one of our test configurations, we had cranked up work_mem to 8GB; Debian complained about an insufficient SHMMAX setting, so we were able to adjust that by running the following command as root to set the kernel SHMMAX to 8GB (8*1024^2): sysctl -w kernel.shmmax=8589934592 To make this setting sticky through reboots, you can simply modify /etc/sysctl.conf to include the following line: # Set SHMMAX to 8GB for PostgreSQL #kernel.shmmax=8589934592 Other measuresDebian Etch comes with PostgreSQL 8.1. The first version of PostgreSQL 8.1 was released in November 2005. That's a long time in computer years. Version 8.2, which was released less than a year later, "adds many functionality and performance improvements" (according to the release notes). If you're not getting the performance you expect from your hardware with Debian Etch, perhaps a backport of PostgreSQL 8.2 would help out. Further resourcesThis is just a shallow dip into PostgreSQL tuning for Evergreen - hopefully enough to alert you to some of the factors you need to consider if you're putting Evergreen into a serious testing environment or production environment. Here are a few places to dig deeper into the art of PostgreSQL tuning:
Sunday, September 16. 2007The Linux Documentation Project, DB2, XML, and nostalgiaI took a walk down memory lane this evening. I thought I might as well bore you with the details. One of my first forays into the open source world was to participate in the Linux Documentation Project (TLDP). At the time (circa 1999), I was working for IBM as a technical writer for DB2 database. IBM was releasing DB2 on the Linux platform, I was part of the pre-release testing team, and I had turned to TLDP to provide me with an introduction to the world of Linux as a total n00b. It was a godsend of information. When DB2 was officially released on the Linux platform, it only officially supported a handful of distributions. Given the normal technical writing and release cycle, the officially supported versions of the distributions were woefully out of date in the official documentation. That, and much of the required installation and configuration information was either missing, or wrong. Don't lay any blame on the people involved; that was just the way that the release process (including translation into umpteen languages that were all available on release day) forced the end product to be. My focus was on application development, but I had to get test environments set up so I could ensure what I was writing actually worked (that's the way I roll as a tech writer). Of course, I chose an unsupported-by-DB2 but much more current distribution (Mandrake Linux 5.3 "Venus" I believe) simply because it would install on my hardware, when Red Hat 5.2 would not. It struck me that my install experiences would help other DB2 users as well. I realized it would also give IBM a way around the barrier imposed by the restriction that the official documentation for a given release was published once per release - no updates. By contributing a DB2 HOWTO to TLDP, I would not only be able to provide documentation on the distributions that people were actually using, I would also be able to update the HOWTO as circumstances warranted. My manager supported the project, and helped me stickhandle some obstacles. The result, I believe, was beneficial all around; I contributed some code to TLDP to help improve the PDF output and helped mentor some TLDP n00bs; DB2 got some usable documentation when it really needed it; and I had the opportunity to learn a technical writing DTD that made sense (DocBook) and play with an impressive open-source publishing toolchain. Over time, my friend and co-worker Ian Hakes picked up the ball and drove the next iteration of the DB2 HOWTO with my help. It has been over a year and a half since I left IBM, so I haven't paid any attention to the DB2 HOWTO. Recently, however, as I was playing around with an updated version of the DocBook toolchain, I discovered that Ian has released a brand new version of the DB2 HOWTO to cover installation of DB2 Express-C on various distributions. He included a touching tip of the hat to me, as well. What a swell guy! On one hand, I'm not sure that TLDP has nearly as much of a mandate as it did eight years ago. There are scads of books, a handful of good magazines, blogs, wikis, and web sites all publishing information about Linux these days. On the other hand, there's something to be said for a corpus of documentation maintained and edited by volunteers who just want to get information into the hands of people who need help -- without compensation, without publicity, and generally without thanks. So, given how much TLDP has helped me - thank you, TLDP volunteers. And thank you, Norm Walsh and the entire DocBook community, for providing an open-source publishing toolchain that starts with semantic XML and results in professional-looking documentation. P.S. I made another commit to the DB2 HOWTO tonight - just balancing out an XML element that was missing to return the document to valid XML state. And let me tell you, it felt good! Monday, December 11. 2006Java 6 is out; now including JavaDB (aka Apache Derby)A quick little note to mention the official release of Java 6. Okay, Dan, but why do you care?, you might ask. Good question, oh person-who-does-not-read-headlines. The reason I care is that Sun chose to bundle Apache Derby in this release -- take a peek in the /db/ subdirectory of the Java SDK. Bundling Derby is going to mean a huge boost to the visibility and usage of the little Java database that could. It will be the de facto default database for Java developers; and if they haven't already used it, I suspect they're going to be pleasantly surprised at Derby's robustness and ability to perform. I was chatting with a few of the DSpace developers a week or so ago, and mentioned my hope (in all my spare time) to port the DSpace institutional repository to Derby as a possible default database. Right now, you see, the default database for DSpace is PostgreSQL, and unfortunately correctly configuring PostgreSQL seems to be the biggest barrier new users encounter while deploying DSpace. Switch to an embedded Derby database, and those headaches go away. On the other hand, it seems that at least one of the DSpace developers have done a bit of experimenting with Derby in the past, as he claimed its performance suffered after 500,000 rows of data or so. Well, even if that is an insurmountable limit, that's a pretty good start for most institutional repositories -- and I suspect that the Derby developers would be highly motivated to show that Derby can, in fact, scale beyond that limit. So, if you're a Java developer or dabbler, get on out there and give Derby + Java 6 a try. You're going to have a lot of company. Oh yeah, and if you need a good book on Derby... Tuesday, November 14. 2006Neat-o: Archimède uses Apache DerbyA while back I mentioned on the DSpace-devel mailing list that I was interested in adapting DSpace to use embedded Apache Derby as the default database, rather than PostgreSQL, as a means of lowering the installation and configuration barriers involved with setting up access to an external database. I haven't had time yet to actually carry out my musing, but today I had the chance to set up the Archimède institutional repository on a test server -- and imagine my surprise when I saw a derby.log file sitting in the Archimède repository. It looks like someone else at Université Laval had the same idea as me much further back. It's still on my horizon to adapt DSpace to Derby; seeing that it works well for Archimède confirms my belief that it's the right direction to go.
(Page 1 of 4, totaling 14 entries)
» next page
|
QuicksearchAbout MeI'm Dan Scott: barista, library geek, and free-as-in-freedom software developer.
I hack on projects such as the Evergreen
open-source ILS project and PEAR's File_MARC package .
By day I'm the Systems Librarian for Laurentian University. You can reach me by email at dan@coffeecode.net. Identi.ca microblogging
LicenseCategories |
