PostgreSQL 8.0 is the first major release of the open source database in over a year. Linuxworld talks to two of 8.0's key contributors, Gavin Sherry and Claudio Natoli, who also happen to hail from Australia, about why 8.0 is so significant; porting it to Windows, and reasons why mainstream vendors such as Fujitsu have taken a keen interest in the object-relational database.
So, what's new in 8.0?
Gavin: 8.0 is the most significant release of PostgreSQL ever. It combines five major features, each of which would justify a release, with a huge number of smaller contributions.
Savepoints, often called nested transactions, provide application developers with a sophisticated error recovery framework.
Tablespaces allow PostgreSQL administrators to determine where the data for a given database object resides on the file systems. This means that without any downtime, administrators can scale systems by dynamically shifting data onto different devices. It also means that they can map data to devices based on the requirements applications have of that data. Data which is regularly used can be placed on expensive storage, while data which is only used for out-of-hours report generation can use cheaper, lower performance storage. It's all about giving the administrator more control.
Complementing this, the buffer management system was entirely rewritten. There were two goals for this project: to increase throughput by using a new algorithm which can adapt as system usage changes and to silently synchronise buffers with data stored on disk to avoid the "checkpoint IO storm" issues experienced by users of other database systems (including previous releases of PostgreSQL). This means that not only has throughput been increased, the availability of the system is more consistent.
One of the most significant developments has been a native port of PostgreSQL to the Microsoft Windows platform. Up until now, those wanting to run PostgreSQL on Windows either had to use the Cygwin UNIX emulator or had to purchase one of a few commercial versions of PostgreSQL which were designed specifically for Windows. With much hard work PostgreSQL 8.0 now uses native Windows system calls, resulting in better performance and stability.
PostgreSQL 8.0 also dramatically increases reliability through point-in-time recovery (PITR). This allows administrators to archive logs of actual data modifications on to backup systems so that if data is destroyed maliciously or otherwise, or if data is lost due to a catastrophic hardware failure, PostgreSQL can be recovered to the exact moment before the problem took place. It also means that PostgreSQL can be run in a continuous backup mode, which dramatically reduces recovery times for large systems.
These five major features are bolstered by hundreds of other improvements, including a new Perl server-side scripting language, the ability for users to modify column types, supporting an enriched CSV import format to ease migration from other systems, significant improvements in performance for transaction processing and administrative tasks, improved logging facilities, dramatically improved array handling, exception handling in the PL/PgSQL scripting language and much much more.
How long has 8.0 been in development?
Gavin: Approximately 13 months.
How many developers worked on the release?
Gavin: As always in open source, it's hard to say. There are usually between three and four contributors employed full-time to work on PostgreSQL. However, we probably attracted contributions of varying sizes from over 200 people. For example, savepoints, tablespaces, PITR and the Windows port were all contributed by part timers.
In a departure from previous releases, I believe some of the developers got paid to develop some of the features for 8.0?
Gavin: Well, 8.0 is actually one of the first releases where some major contributions were sponsored by a major software vendor. I developed tablespaces for 8.0 with the financial assistance of Fujitsu; as did Alvaro Herrera, the developer of savepoints. Fujitsu was also very keen to help out with JDBC 3.0 compliance as well. This project runs alongside PostgreSQL development.
The Windows port sounds like a big job.
Claudio: It was. Major development started over a year ago. Going back further, there were discussions and some groundwork laid during the development of the previous version (v7.4), which is where a Windows port was first seriously considered.
How did the port develop?
Claudio: The biggest stumbling block to getting a Windows port developed was the lack of process forking under Windows, which is a mechanism by which a running process can effectively duplicate itself and its state in a new child process. This is how PostgreSQL creates, among other things, its database connection processes. We wrote Unix code, mimicking the model of child process execution that would be required under Windows, whilst leaving the original mechanisms intact. This allowed us to get all the issues with a new process execution model nicely sorted out, without having to simultaneously deal with porting issues. When this was complete, it was then a matter of replacing the new Unix code with the equivalent Windows API calls.
With this out of the way, and with a handful of temporary "stub" code in a number of other areas, PostgreSQL was able to be compiled under Windows, and even ran... in a fashion. Which was crucial, as it facilitated the remainder of the porting effort, as various people were then able and willing to work on the broken subsystems, like signals. Others worked on rewriting all of the PostgreSQL shell scripts, which would obviously be unusable on a virgin Windows installation, into C. Others still worked on Windows specific tools, like the MSI installer package.
By mid-2004, the porting effort was sufficiently advanced that it was clear that the next version of PostgreSQL would be available under Windows.
Why release a Windows version at all?
Claudio: Good question! Whilst there was and continues to be an emulated Cygwin version available, it was widely considered unsuitable for production use, advocates received technical push-back from clients, and so on. It was not a general replacement for a native version, and Windows has such a large installed base that it is a market which PostgreSQL could not afford to ignore.
In my case, our company required a single database solution across all our client platforms, which includes Windows. Despite being otherwise suitable PostgreSQL was unable to fill that role. I've since learnt that this was a very common situation faced by PostgreSQL advocates, and hopefully this port will help play a part in increasing the awareness and use of PostgreSQL.
Still no replication for PostgreSQL, however?
Gavin: PostgreSQL has supported a variety of replication models for a long time. There's a strong focus on developing these as extensions to PostgreSQL however, since people have many different ideas about replication. This has meant that replication tools for PostgreSQL have not seen the prominence they deserve.
However, Slony I, which was released during the 8.0 development cycle and targets 7.4 and 8.0, provides enterprise grade replication which is suitable for most users. It was funded and is used by Afilias, the company which runs .org and .info.
Slony I replicates data from a master to one or more slaves. These slaves can act has hot fail-overs or be used as read-only masters, depending on the application.
What kind of workloads can PostgreSQL handle now?
Gavin: Well, PostgreSQL was already being used for very large work loads before 8.0. Computerworld itself reported on Red Sheriff's 32TB PostgreSQL system. Open Source Development Labs (OSDL) -- http://www.osdl.org -- is regularly throwing hundreds of gigabytes of data at PostgreSQL to stress the system and to stress Linux.
PostgreSQL use is also growing amongst people who only have a very small workload. Its small footprint makes PostgreSQL ideal for installation on desktop or mobile systems.
However, now that we have these five headline enterprise features I expect to see strong growth amongst users wanting to store very large and very important data in PostgreSQL.
Pervasive Software recently announced they were getting into PostgreSQL, and you mentioned the involvement of Fujitsu, what other vendors are involved?
Gavin: Software Research Associates (SRA), a Japanese systems integrator, employs one of PostgreSQL's core developers and offers PostgreSQL professional services and products. Red Hat employs another core developer. There are other smaller vendors, including Command Prompt in the USA, not to mention hundreds of consultants in Australia and around the world.
Claudio Natoli is a Sydney-based software developer at Memetrics (www.memetrics.com). He is a major contributor to PostgreSQL.
Gavin Sherry is a software developer based in Sydney. He is a major contributor to PostgreSQL. He and his company, Alcove Systems Engineering, provide professional services to IDG Communications, publisher of this Web site, and to Fujitsu Australia Software Limited.