I think that every Database Administrator needs to read the article “Opinion: No more excuses for SQL injection attacks” and send it out to every application developer they know. I hate SPAM as much as the next guy but I think it is justified in this case. I am dead serious.
We, the database people of all walks of life (DB2, Oracle, SQL Server …), have been talking about the benefits of using parameters in the queries instead of literal values for years. The main message has been “performance”. Most DBMS will perform much better and will use less server resources processing queries that have parameters instead of queries that have literal values in them instead. The reason is very simple. Good DBMS will compile an SQL statement once and will cache it so that the next time it sees the same statement it would not have to waste time and resources compiling/preparing the same statement and would pull query execution plan from the cache. The problem is that when SQL statements use literal values instead of parameters matching incoming statements to what is already in the statement cache becomes a much more difficult task for the database to do well. Consider this simple SQL that uses literal value for my last name:
select * from EMPLOYEE where LASTNAME='Katsnelson'
Once the database sees this SQL statement it will compile it and save it in the statement cache (EDM Pool for those of you on DB2 for z/OS). Let’s say a query for another employee comes along:
select * from EMPLOYEE where LASTNAME='Ahuja'
Because the queries don’t look exactly the same, the cache matching algorithm will have a very hard time understanding that this is exactly the same query and most DBMS will compile it again and will store it in the statement cache as well. So, you get a double hit of bad news:
- your server will keep compiling the same statement over and over again, and
- you keep wasting precious memory to hold more and more copies of the same statement in the cache
I said “most DBMS” because DB2 9.7 and DB2 Connect 9.7 (makes this feature available for DB2 for z/OS and DB2 for i5/OS) have a new feature called Statement Concentrator. The name does not really explain what this feature does so I will try to clarify. Basically, Statement Concentrator will intercept and change the statement text to match similar statement that is already in the cache.
So, there is no reason for me to get hot under the collar about use of literal values in SQL? Not so fast. Statement Concentrator tries to do a good thing but it has some limitations and drawbacks. First of all, it is not enabled by default and either a DBA or a programmer has to enable it on either the client (recommended) or on the server. Once enabled, it will inspect all SQL that is coming across and this will have some overhead. This overhead is minimal and is well justified by the savings in statement compilation but it is there. Also, in its current (v9.7) form, it will only save 100000 literal values and anything above will go through and cause statement to be compiled again. (I stand corrected; this restriction was present in the beta but has since been lifted). Last but not least, if the statement contains any parameters, DB2 will not substitute any literals i.e. statements with both parameters and literals do not benefit from the Statement Concentrator. This is why I am not describing Statement Concentrator as the “new cool feature in DB2 9.7“. I think it still needs a bit of polish before I’d call it “new cool feature”.
Now, if your application developer made a very simple change and coded a parameterized query like so:
Select * from EMPLOYEE where LASTNAME = ?
the life would be so much better. The article shows how easy to do it in Java but the same easy facility is available in most programming languages. DB2 (and other DBMS) could cache the statements and avoid the penalty associated with using literal values.
So, what about that article I am asking every DBA to send to every programmer they know? This article does not talk about the impact of using literal values on database performance. Instead, it focuses on something much more important – the evil of using literal values in SQL from a point of view of security. SQL injection attacks are the most common and the easiest ways to severely compromise the security and privacy of the data stored in an SQL database. The real danger of SQL injection is in the sheer simplicity and ease of automation. As the article and recent high profile security breaches illustrate, you don’t need to be an SQL guru or a top notch programmer to mount a wide ranging and a really devastating SQL injection attack. Script Kiddies have been using simple Google searches to identify vulnerable sites (read this post to see what this attack looked like to internet security professionals) and infect or bring down 10s of thousands of websites in a single day.
I hope I made my point on just how bad a programming practice use of literal values in the queries is. it will kill performance of your database and worse, it will open it up to an SQL injection attack. Do not count on sophisticated functionality such as Statement Concentrator in DB2 9.7 to save you. It will help but there is no substitute for good coding practices. So, here is my call to action: copy this url http://www.computerworld.com/s/article/9137478/Opinion_No_more_excuses_for_SQL_injection_attacks and paste it in to an email addressed to anyone you know who writes SQL statements. Maybe they will heed the warning and do the right thing and start using parameters in the SQL they write.
This is an excellent article! If we would also share it with those, who care about the security of the Data you could also get a good backing of Making the Change! A lot of COTS application drive this effort of not using Host Variables (Speeds the operation of getting the COTS application out the door). Yet many of these COTS allow for rewriting of this process.
Therefore, for a business what is more important becomes the question. What is the cost to the bottom line? I as DBA say pay now (up front and fix the issue) or they will pay later with slow performance and/or unsecured data being transmitted!
The Oracle quivlent of the DB2 Statement Concentrator facility would be the Cursor_Sharing parameter
(http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/memory.htm#sthref497 etc)
The main problem with parameters is when you output the sql statement to a debug log you can not see what parameters have been set for the sql. You have to write debug statements for all the parameters used by the statement so you can found out what went wrong. Now if there was a way to output the actual sql including parameters invoked this would encourage people to use the parameter option more.
Regan Galbraith just pointed out on LinkedIN to this huge case of identity theft that was a result of SQL injection attack. Read and weep. Better yet, if you are using DB2 go to DB2 v9.7 and enable Statement Concentrator to help prevent SQL injection.
http://news.bbc.co.uk/2/hi/americas/8206305.stm
The discussion of switching to use a parameter marker instead of a literal is a little flawed, or perhaps just incomplete.
If the SAME access path will be used regardless of the LASTNAME value (to use the provided example), then switching to a parameter is a ‘duh’ move. Clearly, there’s a benefit because the access path, and cost of optimizing the different SQL text each time is eliminated.
On the other hand, if the SQL contains multiple predicates, or the data is sufficiently skewed that only some LASTNAME values should use an index, switching to a parameter marker could be dangerous. The first instance of the SQL coming sets up the access path, and it may not change again. If it’s not the right one for the predicate value, the performance degradation could be considerable and embarrassing.
Like all things DB2: it depends.
There is no doubt that there are cases where switching from a literal value to a parameter can lead to a change in the access path and in some cases indeed, it can actually degrade performance. Can this be embarrassing, you bet. But nowhere near as embarrassing when your company is on hte news because some script kiddie or a disgruntle employee compromised security of your data.
I received a lot of questions/comments about SQL injection attacks and their applicability to intranet applications. Many people are under misguided assumption that SQL injection attacks happen only when you have customer facing applications. Don’t let yourself fall in to a false sense of security. Most data security breaches are perpetrated by the insiders. SQL injection attack can be mounted even more effectively by your company’s employee.
> The main problem with parameters is when you output the sql statement to
> a debug log you can not see what parameters have been set for the sql.
Which is a problem of the driver though.
Some JDBC driver do implement PreparedStatement.toString() such that parameter values are shown.