FreeDB2.com

… on DB2 Express-C and other free databases

FreeDB2.com header image 1

The demise of the open source fanatic

August 3rd, 2008 · 3 Comments

I spent the last week in Portland at the O’Reilly Open Source Convention also known as OSCON. We were part of the IBM exhibit talking to people about DB2 Express-C. OSCON is an interesting conference. Basically it is a big lovefest for the open source guys. Imagine about 2500 geeks all assembled in a single place for a week to talk about free software. The best part for me is that OSCON is not fixated on Linux. There were plenty of tutorials and talks on PHP, Python, Django, Ruby on Rails and of course databases. As one would expect there was quite a bit of MySQL presence and PostgreSQL, Ingres and EnterpriseDB all either had sessions or booths on the exhibit floor or both. And then there was the IBM booth (that you could not really tell was IBM) with DB2 Express-C.

I have to admit that talking up a non-open source product at the open source conference was a bit intimidating. In the past, I have on occasion run afoul of the people I call “open source fanatics” and I was a bit afraid that we would be going in to a hive of these and stirring the nest so to speak. In the end, I lived through it and I did not encounter any of the fanatics at the conference or at least none of them threatened me with physical harm. I hope this is a sign that fanatics in the open source are nearing extinction and are evolving in to more rational pragmatic bunch or at least are content not to force their views on the rest of us.

So, what do I mean by the open source fanatics? Have you ever met a person for whom commercial software is evil or the one that would argue that open source is the better way to produce quality software? I am not talking about these people. These are just opinionated folks. I am also not talking about those who thrive on making fun of the large software companies. Hey, making fun of Microsoft and Windows has become a sport. And there was some of that at the conference but it was all in good fun. I should mention that Microsoft was a sponsor and they had good presence with IronRuby. The people I am talking about are the ones that take offense to the fact that someone would dare to even compare a commercial product to an open source project. On several occasions I had people jump on me (figuratively speaking) when I compared DB2 Express-C business model to the very popular open source business model of a free license and a low cost support subscription. When someone considers a mere attempt at comparison to be an assault by the big bad corporation on the defiant and honorable open source project to me that is a clear manifestation of fanaticism. So, at the conference we talked to a lot of people and I made sure to say right upfront that DB2 Express-C is not open source and despite what may have been said in the press we have no plans to open source it. Conditioned by the verbal assaults on the forums, every time I said this I instinctively was raising my arms to guard my face and body. I continued to make the comparisons with the open source databases and I stand by it. DB2 Express-C is free and optional low cost subscription is available. And, unlike other “Express” databases from Microsoft and Oracle DB2 Express-C does not have crippling limitations on the size of the database (4GB user data limit for both Oracle XE and Microsoft SQL Server Express). So from this point of view DB2 Express-C is not much different from MySQL or Ingres. What I thought was great at this conference is that not a single person thought that not having access to DB2 Express-C source code was an issue for them. I guess the number of people who want to tinker with the database engine is not all that big. I think this is in line with what MySQL is seeing. Monty Widenius, co-founder of MySQL, talked about very low code contribution from the MySQL community. The plan is only 8 patches from the community in 2008.  On the other hand, I found the feedback to the model of free license and optional support subscription to be universally positive. In general, most people who came by to talk to us were genuinely surprised that IBM had an offering like that and even more that this offering was so function rich and was not crippleware they may have expected. When we talked about the hybrid relational-XML storage engine and data access most people were not aware that there was such technology available. The fact that it was part of the free DB2 Express-C was a huge surprise.

Another thing led me to believe that demise of open source fanatics was close at hand were many comments who saw great value in a large corporation like IBM offering free products. Many commented that it brought legitimacy to the business model that many in the open source community rely on. People also really liked that they could take DB2 Express-C and bundle it in to their solution open source or commercial and have the reputation and clout of IBM be on their side.

Despite the fact that the radicalization of the open source community in my opinion seems to be on the decline, I was still not entirely in agreement with some of the messages from the icons of the open source movement. Every keynote had some sort of a statement on the open source being the engine of innovation. It was almost as if the speaker after speaker were trying to say that without open source there would be no innovation. I know I am going to get some hate mail on this but I will say it none the less. I actually believe that overall open source has not been very innovative. Many of the open source projects are simply implementations of very successful commercial products. Don’t take me wrong, I think it someone taking the time and energy to provide a free alternative to people who could not otherwise be able to purchase a product is great. I think that putting your code out there for everyone to see and critique is gutsy. I just don’t think that this is the definition of innovation. There was one moment that really brought home this contradiction around innovation. Mark Shuttleworth (Ubuntu guy) praised open source community as great innovators and then called on them to build a desktop interface to Linux that would rival that of Mac OS X. I think it would be great to have Mac OS X interface brought to Linux but, hey, give the credit where credit is due. Apple is the real innovator here.

Overall, I was really liked OSCON this year. I thought it was full of great sessions with really funky titles and a great open exchange of ideas. I was surprised and very glad not to be in a middle of some ideological debates on the value of open source or have the dogma of FOSS (Free Open Source Software) shoved down my throat. If anyone has a chance to attend OSCON 2009, I would very much recommend it.

Share/Save/Bookmark

→ 3 CommentsTags: Conferences · DB2 · EnterpriseDB · Ingres · MySQL · Open Source · Oracle · PostgreSQL · Random thoughts · SQL Server

Tell us what you really think about DB2 Express-C

July 23rd, 2008 · No Comments

Consumability survey is still available. Why not tell us what you think about your experience with getting and using DB2 Express-C. If you still don’t have DB2 Express-C you can download and use it free for as long as you want and you can store as much data as you want. Really, I am not kidding. There is no catch. This is not a beta or a time-limited trial. This is a real free product. or as DB2 Express-C tag line would say “Free to Build, Deploy Distribute”.

If you decide to do the survey, and I hope you do, choose your language and be sure to choose “Information Management” and then select “DB2 for Linux, UNIX and Windows - DB2 Express-C Edition”. Your feedback is highly valued and much appreciated.

Share/Save/Bookmark

→ No CommentsTags: DB2

DB2 Express-C at OSCON 2008 in Portland

July 21st, 2008 · No Comments

We will have a pedestal in the exhibit hall at the O’Reilly Opens Source Conference 2008 that is on this week at the Oregon Convention Center in Portland. While sessions and tutorials pass does cost but money admittance to the Expo Hall is free. If you are at the conference on Wednesday or Thursday stop by the IBM booth 401 and say hello to Leon Katsnelson (me) and Rav Ahuja.

OSCON is all about open source so “free” is presumed. So, with DB2 Express-C we are focusing on what makes it really different from other DBMS both open source and commercial. We are focusing on the HYBRID relational/pureXML aspect of the DB2 Express-C. Here is a bookmark that sums the message up:

Go Hybrid with DB2 Express-C

Go Hybrid with DB2 Express-C

What do you think?

Share/Save/Bookmark

→ No CommentsTags: Conferences · Linux · Open Source

Where did my Data Studio go?

July 20th, 2008 · No Comments

So, you downloaded the free DB2 Express-C and you have started to play around with it. Maybe you created sample database and started to explore it using DB2 Control Center. If you are a developer, you may find that DB2 Control Center does not quite fit what you think of a a development tool. It is more in tune with what a DBA might want to do. So, you may have taken our advice and downloaded the new IBM Data Studio. After all, it is free so why not! If you have not done this already, go get it from the DB2 Express-C download page. Give yourself a bit of time and have at least 600MB free on your hard drive to hold Data Studio installer as this is not a small download. Don’t say I did not warn you.

Let’s say you downloaded and installed the Data Studio. This is where the first surprise may come in. If you install products like most people i.e. click on Next button until you get to the finish button without reading much on the way, you may be surprised to find that there is no shortcut for Data Studio in your Start/All Programs/IBM DB2 folder. Don’t panic. Your install in all likelyhood has indeed succeded and you do have Data Studio on your machine. So, where is it? Go to Start/All Programs/IBM Software Development Platform and you should see IBM Data Studio Folder and in it a shortcut for launching Data Studio.

At this time you are probably wondering “why in the world would IBM do this; doesn’t IBM DB2 folder make a lot more sense?” Yes it does … if you are using DB2. Notice however that we did not call this product “DB2 Data Studio”. That is because in addition to DB2 for Linux, Unix and Windows, DB2 for z/OS and DB2 for i5/OS, Data Studio also supports IDS (Informix Dynamic Server) databases. In the future, it is also not unreasonable to expect Data Studio to work with other databases say MySQL, Oracle, SQL Server etc. As a matter of fact, in a way it already does. Just take a look at the Rational Data Architect products in the Data Studio portfolio which already supports a number of non-IBM database servers.

This brings me to the second reason why Data Studio is in the IBM Software Development Platform
folder. If you have any of the IBM Rational products installed on your machine you will find them in the same folder. IBM Software Development Platform is IBM platfrom for delivering a set of tools that implement a complete software development lifecycle. Data Studio handles the data lifecycle within the scope of appilcation development. I know this sounds like marketeering but there is actually a lot of meat to this. For example, if you do have any of the Rational software development tools, you will find that Data Studio you just installed, plugs itself right in to these tools (e.g. RAD - Rational Application Developer, RSA - Rational System Architect). As a Java developer you will find this very convenient as both your Java programming and your database developement tasks can be done from a single IDE without having to start multiple copies of Eclipse and minimizing your risk of developing a curpal tunnel syndrome from switching between IDEs all the time (yes, it is my lame attempt at a joke).

OK, so you may now understand why Data Studio it is where it is but none of these points apply to you. You don’t do Java, Lifecycle is the name of your stationary bike that you use to excercise, and you are just happy to be working with DB2 and don’t need another database, thank you very much. No problem. Just create a shortcut on your Windows desktop, or the Quick Launch toolbar and you have a one click access to the Data Studio.

Share/Save/Bookmark

→ No CommentsTags: DB2 · Tools

Everything you ever wanted to know about a DB2 table but did not know where to look

July 15th, 2008 · No Comments

Just like any other DBMS, DB2 keeps all of your data in tables. The tables are organized in a usual i.e. data is stored in rows with each row having one or more columns. This is true even in the case when the data that is stored as XML. Hopefully it is not a surprise that DB2 Express-C has a “hybrid” storage engine that stores both relational data and XML data in their native formats. We call DB2’s ability to deal with XML data “pureXML”. For XML, this means that XML documents are stored in DB2 as pre-parsed trees. These XML trees are not a replacement for rows and columns. Instead, XML is just another data type for a column and each table can have zero or more columns of XML type.

Since table objects (or just tables) are so fundamental to your ability to store and retrieve data there comes a time when you want to get some information on a particular table in your database. For example, you may want to know how much disk space has been allocated for a particular table, or what state it is in. Should it be reorganized? Has the last load job finished loading data in to it and so on. If you were using MySQL, you may have just looked at the file that represents a table and were able to figure out disk usage that way. There is a better way to do this in DB2 Express-C (and every other DB2 for Linux, Unix and Windows v9.5). Since DB2 speaks SQL, it makes sense to query information about its tables using SQL. DB2 provides not one but 2 ways of getting cornucopia of information about tables via SQL. One way is to use a SYSIBMADM.ADMINTABINFO view. At this point in time you may be tempted to just fire up your DB2 Command Line Processor (CLP) and type in this query:
SELECT * FROM SYSIBMADM.ADMINTABINFO
Go ahead and try this. I bet you are not very impressed with the results. You are getting the right information but it is just too much of a good thing. The query is returning about 29 columns and this is for every table in your database. CLP is not the ideal way to display such query results. The good thing is that this is just a view and we are using SQL. Obviously we can apply a predicate to, for example, only return information for a particular table and specify only the columns we are interested in to make it more manageable. And, we can always use a better tool than CLP to run the query; say DB2 Control Center.

Instead of retrieving all of the columns in the view, you will likely be interested in a subset of columns. DB2 information Center provides a very comprehensive description of each column in this view. I will provide a much shorter description here to save you the hastle of trying to figure out what information applies to DB2 Express-C and what you can ignore.

Column name Data type Description
TABSCHEMA VARCHAR(128) Schema name. Schema in DB2 is a convenient way to manage the name space for objects in the same database.
TABNAME VARCHAR(128) Table name.
TABTYPE CHAR(1) Table type:

  • ‘H’ = hierarchy table
  • ‘S’ = materialized query table. Materialized Query Tables (MQT) are not supported in DB2 Express-C.
  • ‘T’ = table
DBPARTITIONNUM SMALLINT Database partition number. Not applicable to DB2 Express-C
DATA_PARTITION_ID INTEGER Data partition number. Not applicable to DB2 Express-C
AVAILABLE CHAR(1) State of the table:

  • ‘N’ = the table is unavailable. If the table is unavailable, all other output columns relating to the size and state will be NULL.
  • ‘Y’ = the table is available.
Note: Rollforward through an unrecoverable load will put a table into the unavailable state.
DATA_OBJECT_L_SIZE BIGINT Amount of disk space logically allocated for the table, reported in kilobytes.  This size represents the logical size of the base table only. Space consumed by LOB data, Long Data, Indexes and XML objects are reported by other columns.
DATA_OBJECT_P_SIZE BIGINT Amount of disk space physically allocated for the table, reported in kilobytes. This size represents the physical size of the base table only. Space consumed by LOB data, Long Data, Indexes and XML objects are reported by other columns.
INDEX_OBJECT_L_SIZE BIGINT Amount of disk space logically allocated for the indexes defined on the table, reported in kilobytes.
INDEX_OBJECT_P_SIZE BIGINT Amount of disk space physically allocated for the indexes defined on the table, reported in kilobytes.
LONG_OBJECT_L_SIZE BIGINT Amount of disk space logically allocated for long field data in a table, reported in kilobytes.
LONG_OBJECT_P_SIZE BIGINT Amount of disk space physically allocated for long field data in a table, reported in kilobytes.
LOB_OBJECT_L_SIZE BIGINT Amount of disk space logically allocated for LOB data in a table, reported in kilobytes.
LOB_OBJECT_P_SIZE BIGINT Amount of disk space physically allocated for LOB data in a table, reported in kilobytes.
XML_OBJECT_L_SIZE BIGINT Amount of disk space logically allocated for XML data in a table, reported in kilobytes.
XML_OBJECT_P_SIZE BIGINT Amount of disk space physically allocated for XML data in a table, reported in kilobytes.
INDEX_TYPE SMALLINT Indicates the type of indexes currently in use for the table. Returns:

  • 1 if type-1 indexes are being used. You should not see type 1 indexes in DB2 Express-C
  • 2 if type-2 indexes are being used.
REORG_PENDING CHAR(1) A value of ‘Y’ indicates that a reorg recommended alter has been applied to the table and a classic (offline) reorg is required. Otherwise ‘N’ is returned.
INPLACE_REORG_STATUS VARCHAR(10) Current status of an inplace table reorganization on the table. The status value can be one of the following:

  • ABORTED (in a PAUSED state, but unable to RESUME; STOP is required)
  • EXECUTING
  • NULL (if no inplace reorg has been performed on the table)
  • PAUSED
LOAD_STATUS VARCHAR(12) Current status of a load operation against the table. The status value can be one of the following:

  • IN_PROGRESS
  • NULL (if there is no load in progress for the table and the table is not in load pending state)
  • PENDING
READ_ACCESS_ONLY CHAR(1) ‘Y’ if the table is in Read Access Only state, ‘N’ otherwise. A value of ‘N’ should not be interpreted as meaning that the table is fully accessible. If a load is in progress or pending, a value of ‘Y’ means the table data is available for read access, and a value of ‘N’ means the table is inaccessible. Similarly, if the table status is set integrity pending (refer to SYSCAT.TABLES STATUS column), then a value of ‘N’ means the table is inaccessible.
NO_LOAD_RESTART CHAR(1) A value of ‘Y’ indicates the table is in a partially loaded state that will not allow a load restart. A value of ‘N’ is returned otherwise.
NUM_REORG_REC_ALTERS SMALLINT Number of reorg recommend alter operations (for example, alter operations after which a reorganization is required) that have been performed against this table since the last reorganization.
INDEXES_REQUIRE_REBUILD CHAR(1) ‘Y’ if any of the indexes defined on the table require a rebuild, and ‘N’ otherwise.
LARGE_RIDS CHAR(1) Indicates whether or not the table is using large row IDs (RIDs) (4 byte page number, 2 byte slot number). A value of ‘Y’ indicates that the table is using large RIDs and ‘N’ indicates that it is not using large RIDs. A value of ‘P’ (pending) will be returned if the table supports large RIDs (that is, the table is in a large table space), but at least one of the indexes for the table has not been reorganized or rebuilt yet, so the table is still using 4 byte RIDs (which means that action must be taken to convert the table or indexes).
LARGE_SLOTS CHAR(1) Indicates whether or not the table is using large slots (which allows more than 255 rows per page). A value of ‘Y’ indicates that the table is using large slots and ‘N’ indicates that it is not using large slots. A value of ‘P’ (pending) will be returned if the table supports large slots (that is, the table is in a large table space), but there has been no offline table reorganization or table truncation operation performed on the table yet, so it is still using a maximum of 255 rows per page.
DICTIONARY_SIZE BIGINT Size of the dictionary, in bytes, used for row compression if a row compression dictionary exists for the table. The value will always be zero since DB2 Express-C does not support row compression.
BLOCKS_PENDING_CLEANUP BIGINT For MDC tables, the number of blocks pending cleanup. Value will always be zero as DB2 Express-C does not support MDC tables.
STATSTYPE CHAR(1)
  • ‘F’ = System fabricated statistics without table or index scan. These statistics are stored in memory and are different from what is stored in the system catalogs. This is a temporary state and eventually full statistics will be gathered by DB2 and stored in the system catalogs.
  • ‘A’= System asynchronously gathered statistics. Statistics have been automatically collected by DB2 by a background process and stored in the system catalogs.
  • ‘S’ = System synchronously gathered statistics. Statistics have been automatically collected by DB2 during SQL statement compilation. These statistics are stored in memory and are different from what is stored in the system catalogs. This is a temporary state and eventually DB2 will store the statistics in the system catalogs.
  • ‘U’ = User gathered statistics. Statistics gathering was initiated by the user through a utility such as RUNSTATS, CREATE INDEX, LOAD, REDISTRIBUTE or by manually updating system catalog statistics.
  • NULL = unknown type

As you can see, many of the columns are not applicable to the DB2 Express-C environment but they are there to preserve 100% compatibility with other editions of DB2.

At the start of the post I said that there are two SQL based ways to get detailed information on a DB2 table. ADMINTABINFO view is one. The other is a to use SYSPROC.ADMIN_GET_TAB_INFO_V95 table function. This function, like any other table function, can be used in a SELECT statement. It accepts table schema and table name as parameters and returns exactly the same information as the ADMINTABINFO view.
SELECT * FROM TABLE (SYSPROC.ADMIN_GET_TAB_INFO_V95('LEON', 'DEPARTMENT'))
I recommend using ADMINTABINFO and to stay away from using ADMIN_GET_TAB_INFO_V95 table function directly. There is absolutely nothing that the ADMINTABINFO can not provide that ADMIN_GET_TAB_INFO_V95 can. And, you probably already noticed “V95″ in the name. This on its own should serve as the warning sign.

Share/Save/Bookmark

→ No CommentsTags: Administration · DB2 · Programming