Why SELECT * is bad (or could be)

I have always been told that SELECT * FROM some_table is a bad thing, but, lazy developer that I am, it was always easier to select everything than to worry about it. I also think that a lot of people treat this statement as some kind of heresy without really understanding what it means. One of the worst examples one can provide to defend against star selects is this:

SELECT field1, field2, fieldN FROM some_table

where field1, field2 and fieldN are ALL the fields of some_table. Sure, it is more explicit, but you gain little beyond that: listing every column fetches exactly the same data as the star.

It is not hard to find plenty of reasons and explanations against SELECT *, but one does not simply understand the impact of something before having problems with it. It is the same for everything, from Design Patterns to fire and children: a developer can’t just read GoF and try to apply it “everywhere” without learning the hard way the effects of bad software design. It does not work that way. Likewise, a child will not understand what his parents truly mean when they say “don’t put your hand in the fire, it will burn and hurt a lot” without doing it.

So, when can SELECT * be bad? As I see it, it can have a huge impact on both disk and network I/O, and on memory. The explanation is simple: the more data you ask for, the more the server has to read from disk and transfer over the network to your application, and you can end up using much more memory than you need for any given operation.
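As a sketch, assume a hypothetical users table with many columns; if a listing page only needs two of them, asking for just those columns shrinks every row the server has to read and ship:

```sql
-- Hypothetical table: users(id, name, email, bio, avatar, preferences, ...)

-- Reads and transfers every column of every matching row:
SELECT * FROM users WHERE active = true;

-- Reads and transfers only what the listing page actually needs:
SELECT id, name FROM users WHERE active = true;
```

The wider the rows and the larger the result set, the bigger the difference between the two queries.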

See the following graph from a DB server, showing I/O usage:

See the two big gaps? The disk I/O dropped when some queries were optimized to return only what the application really needed for each operation. Before that, the server was hitting around 80,000 to 100,000 I/O blocks per second, which is a lot for this setup. Not surprisingly, the web server was wasting a lot of memory as well.

In the end, you will have to weigh the pros and cons of each strategy, and remember that application size matters – one may not see big improvements in small services.

And always remember that premature optimization is the root of all evil.

Setting up SQL logs in Rails 2 and 3 console

When developing Rails applications, it is often desirable to see what ActiveRecord is doing behind the scenes when using the console. By default, log messages in script/console (Rails 2) or rails c (Rails 3) are only sent to log/development.log, which I find quite annoying to keep an eye on. Much better would be if the messages were sent directly to the console itself.

It turns out that it is very simple to achieve that. Add the following code to your ~/.irbrc file:

# File ~/.irbrc
require 'logger'

# For Rails 2
RAILS_DEFAULT_LOGGER = Logger.new(STDOUT)

# For Rails 3
ActiveRecord::Base.logger = Logger.new(STDOUT)

Simple as that. Restart the console, load any AR model, and the SQL statements should appear right before the data. For the sake of simplicity, I set both RAILS_DEFAULT_LOGGER and ActiveRecord::Base.logger, mostly because I have had situations where only one of the two settings didn’t work so well (for reasons unknown to me).
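The reason this works is that ActiveRecord writes its SQL through whatever Logger object it is given, so pointing that logger at STDOUT is all it takes to surface the statements. A minimal sketch of the mechanism, with a StringIO standing in for STDOUT so the effect is easy to see outside a Rails app:

```ruby
require 'logger'
require 'stringio'

# StringIO plays the role of STDOUT here; in ~/.irbrc you would
# pass STDOUT instead, exactly as in the snippet above.
buffer = StringIO.new
logger = Logger.new(buffer)

# ActiveRecord emits SQL at the debug level, which Logger
# accepts by default.
logger.debug('SELECT * FROM users')

puts buffer.string.include?('SELECT * FROM users')  # => true
```

Any object responding to the Logger interface works, which is also why swapping the destination (a file, STDOUT, a StringIO) requires no other changes.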