I think that every Database Administrator needs to read the article “Opinion: No more excuses for SQL injection attacks” and send it out to every application developer they know. I hate SPAM as much as the next guy but I think it is justified in this case. I am dead serious.
We, the database people of all walks of life (DB2, Oracle, SQL Server …), have been talking about the benefits of using parameters in the queries instead of literal values for years. The main message has been “performance”. Most DBMS will perform much better and will use less server resources processing queries that have parameters instead of queries that have literal values in them instead. The reason is very simple. Good DBMS will compile an SQL statement once and will cache it so that the next time it sees the same statement it would not have to waste time and resources compiling/preparing the same statement and would pull query execution plan from the cache. The problem is that when SQL statements use literal values instead of parameters matching incoming statements to what is already in the statement cache becomes a much more difficult task for the database to do well. Consider this simple SQL that uses literal value for my last name:
select * from EMPLOYEE where LASTNAME='Katsnelson'
Once the database sees this SQL statement it will compile it and save it in the statement cache (EDM Pool for those of you on DB2 for z/OS). Let’s say a query for another employee comes along:
select * from EMPLOYEE where LASTNAME='Ahuja'
Because the queries don’t look exactly the same, the cache matching algorithm will have a very hard time understanding that this is exactly the same query and most DBMS will compile it again and will store it in the statement cache as well. So, you get a double hit of bad news:
- your server will keep compiling the same statement over and over again, and
- you keep wasting precious memory to hold more and more copies of the same statement in the cache
I said “most DBMS” because DB2 9.7 and DB2 Connect 9.7 (makes this feature available for DB2 for z/OS and DB2 for i5/OS) have a new feature called Statement Concentrator. The name does not really explain what this feature does so I will try to clarify. Basically, Statement Concentrator will intercept and change the statement text to match similar statement that is already in the cache.
So, there is no reason for me to get hot under the collar about use of literal values in SQL? Not so fast. Statement Concentrator tries to do a good thing but it has some limitations and drawbacks. First of all, it is not enabled by default and either a DBA or a programmer has to enable it on either the client (recommended) or on the server. Once enabled, it will inspect all SQL that is coming across and this will have some overhead. This overhead is minimal and is well justified by the savings in statement compilation but it is there. Also, in its current (v9.7) form, it will only save 100000 literal values and anything above will go through and cause statement to be compiled again. (I stand corrected; this restriction was present in the beta but has since been lifted). Last but not least, if the statement contains any parameters, DB2 will not substitute any literals i.e. statements with both parameters and literals do not benefit from the Statement Concentrator. This is why I am not describing Statement Concentrator as the “new cool feature in DB2 9.7“. I think it still needs a bit of polish before I’d call it “new cool feature”.
Now, if your application developer made a very simple change and coded a parameterized query like so:
Select * from EMPLOYEE where LASTNAME = ?
the life would be so much better. The article shows how easy to do it in Java but the same easy facility is available in most programming languages. DB2 (and other DBMS) could cache the statements and avoid the penalty associated with using literal values.
So, what about that article I am asking every DBA to send to every programmer they know? This article does not talk about the impact of using literal values on database performance. Instead, it focuses on something much more important – the evil of using literal values in SQL from a point of view of security. SQL injection attacks are the most common and the easiest ways to severely compromise the security and privacy of the data stored in an SQL database. The real danger of SQL injection is in the sheer simplicity and ease of automation. As the article and recent high profile security breaches illustrate, you don’t need to be an SQL guru or a top notch programmer to mount a wide ranging and a really devastating SQL injection attack. Script Kiddies have been using simple Google searches to identify vulnerable sites (read this post to see what this attack looked like to internet security professionals) and infect or bring down 10s of thousands of websites in a single day.
I hope I made my point on just how bad a programming practice use of literal values in the queries is. it will kill performance of your database and worse, it will open it up to an SQL injection attack. Do not count on sophisticated functionality such as Statement Concentrator in DB2 9.7 to save you. It will help but there is no substitute for good coding practices. So, here is my call to action: copy this url http://www.computerworld.com/s/article/9137478/Opinion_No_more_excuses_for_SQL_injection_attacks and paste it in to an email addressed to anyone you know who writes SQL statements. Maybe they will heed the warning and do the right thing and start using parameters in the SQL they write.