On Monday, October 5, Sal Vella, IBM VP of DB2 development, introduced DB2 pureScale to the attendees of the IDUG Europe conference in Rome. Matt Huras, IBM Distinguished Engineer and the architect of DB2 pureScale, provided an in-depth technical description of DB2 pureScale in his conference session. Today, IBM followed up with a press release highlighting this new and exciting technology. Since press releases generally don’t provide much in the way of detail, I decided to post some information on this blog based on what Sal Vella and Matt Huras talked about at IDUG Europe. This is not going to be a comprehensive account of what Sal and Matt presented. For that you would need to attend the IDUG Europe conference or the upcoming IOD Conference in Las Vegas to hear Sal and Matt, or join us for a DB2 Chat With the Lab (https://events.webdialogs.com/register.php?id=4e91bbf910&l=en-US).
What is pureScale? Simply put, it is the marketing name for the new technology for creating clusters of DB2 servers. Describing it as “new” is a bit of a stretch, since the implementation is based in large part on the DB2 for z/OS Data Sharing and SYSPLEX technology that has been in use by the world’s largest organizations for about a decade now. This technology is considered by many to be the gold standard for delivering reliability and scalability, and it is now available for UNIX database servers.
This brings us to the three key design goals for DB2 pureScale:
- virtually unlimited capacity,
- application transparency, and
- continuous data availability.
But before we go any further, it is important to understand that the focus of DB2 pureScale is providing scalability and reliability for transactional (OLTP) workloads. If you are looking for data warehousing, pureScale is not for you; instead, I recommend that you take a look at the DB2 Database Partitioning Feature (DPF). DPF is an excellent, well-proven technology for building out very large clusters of database servers that can tackle the most demanding data warehousing tasks by clustering hundreds of nodes.
More on DPF later … back to pureScale. How does pureScale deliver scalability, transparency and availability? There are two ways in which one can scale IT resources:
- vertical and
- horizontal.
Vertical scalability means you are able to add DBMS resources by increasing the size of the database server. In other words, you have some headroom to add processors and memory, or to upgrade to a larger server with more processor and memory capacity. Most important, though, is that your DBMS software is able to use the extra resources and improve its performance as more resources are made available. Perfect scalability is “linear”, i.e. the performance of the DBMS increases by the same factor as the increase in server resources. For example, you would expect the performance of your database to double if you moved from a two-processor server with 2GB of memory to a server with four processors and 4GB of memory. This fact is not lost on those who have struggled with MySQL scalability issues, for example, where simply increasing the amount of resources does not lead to a proportional increase in performance. There are even cases where performance may decrease (negative scalability). I am not picking on MySQL here; vertical scalability is a tough challenge for many DBMSs to solve, and it is one area where commercial databases like DB2 have been very strong.
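To make “linear” scalability concrete, here is a quick back-of-the-envelope sketch in Python. The numbers are made up purely for illustration; they are not DB2 (or MySQL) benchmark results.

```python
# Scaling efficiency: how close an observed speedup comes to linear scaling.
# All figures below are illustrative, not benchmark results.

def scaling_efficiency(base_resources, new_resources,
                       base_throughput, new_throughput):
    """Return observed speedup divided by the ideal (linear) speedup.

    1.0 means perfectly linear; below 1.0 means sub-linear;
    below base_resources/new_resources means performance actually dropped.
    """
    ideal_speedup = new_resources / base_resources
    observed_speedup = new_throughput / base_throughput
    return observed_speedup / ideal_speedup

# Doubling a 2-CPU server to 4 CPUs; throughput goes from 1000 to 1900 tps.
eff = scaling_efficiency(2, 4, 1000, 1900)
print(f"{eff:.0%}")  # 95% of linear
```

A system with negative scalability would score below 50% here when resources are doubled, i.e. worse than not upgrading at all.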
Vertical scalability, however, has its issues. For one thing, adding extra resources is often a “forklift upgrade”, i.e. you actually have to get a brand new server just to add extra resources. Also, the inexpensive servers based on Intel/AMD x86-64 architectures are limited in their vertical scalability. That is why people who need that extra headroom typically settle for UNIX servers from IBM (System p with AIX), HP (HP-UX), or Sun (Solaris on SPARC). As you would expect, more scalable servers cost more. And, as scalable as they are, these servers are not unlimited in their capacity, especially if your budget has a finite limit.

This is why horizontal scalability is so desirable. Horizontal scalability means being able to add database capacity by adding more servers. It is desirable because you can use much cheaper, smaller servers, and you can add capacity in small increments. In theory, this scalability is unlimited: you can start with one server and just keep adding one inexpensive server at a time to your DBMS cluster as your workload increases. In practice, this is a very difficult thing to achieve for database servers. Web and application servers don’t have an issue with horizontal scalability because they don’t keep shared data. To achieve horizontal scalability for databases, on the other hand, we have to enable many servers to work on shared data at the same time. (I will leave the use of replication as a way to achieve scalability for a different post.) The trick, of course, is to maintain linear scalability as more and more servers are brought in to operate on this shared copy of the data. Not many DBMS systems have been able to achieve this. IBM DB2 for z/OS (with its Data Sharing facility) is one such system, and it is considered by many to be the gold standard for implementing database clustering. It is this architecture that DB2 pureScale borrows to bring this capability to UNIX database servers.
The result is a very impressive demonstration of near linear scalability to well over a hundred servers.
There is, however, one very significant difference between DB2 pureScale on UNIX and DB2 for z/OS. DB2 for z/OS has the luxury of using specialized IBM System z mainframe hardware called the Coupling Facility, which provides incredibly useful facilities for synchronizing locking between the multiple members of a DB2 for z/OS cluster (called a “Data Sharing Group”). It maintains a cluster-wide timer (this is really important) and helps DB2 keep track of data not just on the shared disk (that part is relatively easy) but also of data that may be in the caches (called “bufferpools” in DB2) of each of the servers. To bring this capability to UNIX systems, DB2 pureScale implements the Coupling Facility completely in software. It is called “PowerHA pureScale”. The “Power” in the name refers to the POWER processor architecture used in IBM System p UNIX servers. This technology was developed in close cooperation between the DB2 and IBM System p teams, and that cooperation produced a solution with scalability and availability results that speak for themselves. However, at this time DB2 pureScale requires a very specific server configuration: IBM System p servers running the AIX (UNIX) operating system, connected by high-bandwidth, low-latency InfiniBand networking. Other types of systems are sure to come, but at this time pureScale is limited to these well-defined configurations.
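To get a feel for the role a coupling facility plays, here is a deliberately simplified toy sketch in Python of its two headline duties described above: a cluster-wide lock table and a cluster-wide timer. Every name in it is my own invention for illustration; the real PowerHA pureScale component is far more sophisticated (page caching, member recovery, and so on) and is not implemented this way.

```python
import threading

# Toy illustration only: a centralized service that hands out cluster-wide
# lock grants and monotonically increasing timestamps, the way a coupling
# facility arbitrates between cluster members. Not the PowerHA pureScale API.

class ToyCouplingFacility:
    def __init__(self):
        self._mutex = threading.Lock()
        self._locks = {}   # resource name -> owning member
        self._clock = 0    # cluster-wide logical timer

    def tick(self):
        """Return the next cluster-wide timestamp (one timer for all members)."""
        with self._mutex:
            self._clock += 1
            return self._clock

    def acquire(self, member, resource):
        """Grant the lock if it is free or already held by this member."""
        with self._mutex:
            owner = self._locks.get(resource)
            if owner is None or owner == member:
                self._locks[resource] = member
                return True
            return False

    def release(self, member, resource):
        """Release the lock, but only if this member actually owns it."""
        with self._mutex:
            if self._locks.get(resource) == member:
                del self._locks[resource]

cf = ToyCouplingFacility()
print(cf.acquire("member1", "page:42"))   # True: lock was free
print(cf.acquire("member2", "page:42"))   # False: member1 holds it
cf.release("member1", "page:42")
print(cf.acquire("member2", "page:42"))   # True: now available
```

The point of the sketch is simply that every member consults one arbiter before touching shared data, which is why a single cluster-wide timer and lock state matter so much.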
Making scalability transparent to applications is paramount to the effective use of the new capacity. After all, what use is the extra capacity if, to take advantage of it, you have to alter your application? pureScale promises that any application can take advantage of the added capacity without any changes or tweaking of the application code. This is not at all true of other database clustering technologies, such as Oracle RAC, which may require extensive application changes and tuning for efficient use in a cluster.
Maintaining continuous access to data is just as important as scalability, and some say more important. DB2 pureScale aims to provide applications with continuous access to data even when one or more database nodes in a cluster go down. As long as there is at least one active database server available, applications will continue to function, albeit with reduced performance. And when failed members of the cluster are brought back online, they are ready to take on workload and bring back the performance of the overall system. These high levels of availability are delivered by the PowerHA pureScale technology and by very innovative techniques for managing transaction state and cached data. It would take way too much room in this post to describe these techniques; I recommend taking a look at Matt’s presentation at IDUG Europe or, better yet, attending his session at IOD in Las Vegas. Also, do not forget to register for the upcoming Chat with the Labs on the subject of pureScale (https://events.webdialogs.com/register.php?id=4e91bbf910&l=en-US).
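To illustrate the “as long as one member is up, work continues” idea, here is a hypothetical client-side sketch in Python. The member names and the execute() function are placeholders I made up; in a real pureScale cluster the DB2 client reroutes work to surviving members automatically, so application code like this is not actually required.

```python
# Hypothetical illustration of work continuing while cluster members fail.
# MEMBERS and execute() are made-up placeholders, not the DB2 client API;
# the DB2 pureScale client reroutes to surviving members on its own.

MEMBERS = ["member1", "member2", "member3"]
DOWN = {"member1"}  # simulate one failed member

def execute(member, sql):
    """Placeholder: pretend to run a statement on one cluster member."""
    if member in DOWN:
        raise ConnectionError(f"{member} is unavailable")
    return f"{sql!r} ran on {member}"

def run_with_failover(sql):
    """Try each member in turn; succeed as long as at least one is up."""
    last_error = None
    for member in MEMBERS:
        try:
            return execute(member, sql)
        except ConnectionError as err:
            last_error = err  # remember the failure, try the next member
    raise RuntimeError("no cluster member available") from last_error

print(run_with_failover("SELECT 1 FROM SYSIBM.SYSDUMMY1"))
# the statement lands on member2, the first surviving member
```

With member1 down, the work simply lands on the next member; only when every member in the list is down does the request fail, which mirrors the availability promise described above.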
Meanwhile, watch the interview with Sal and Matt and hear them talk about this exciting new technology.