debuggable

 
Contact Us
 

Code Insults Round 1 - Why switch blocks are dumb

Posted on 27/10/08 by Nate Abele

Howdy folks,

Sorry for not getting back to you sooner. Between speaking engagements, framework releases and hard drive crashes, blog posting just gets lost in all the chaos sometimes. You know how it is.

So, today's code was randomly selected from the entries I've received since the introductory post. The entire submission is actually two files, which together comprise a console script for interacting with a web service (the names of the entrants have been withheld to protect the identities of the guilty). Rather than examine the full entry, we're going to take a look at one part which I find comes up fairly often: switch block overkill.

Don't get me wrong, switch blocks are great, especially for flow control (which makes the title of this post slightly misleading and sensationalist), but they're often overused in ways that make your code less maintainable. Let's take a look at the example:

for($k=0; $k<count($args); $k++) {
  $option = $args[$k];
  switch ($option) {
    case 'id':
      if (mb_ereg("^[0-9 ,]+$",$args[$k+1])) {
        $this->id = explode(',',str_replace(' ','',$args[$k+1]));
        $k++;
      } else {
        $err = 'No id number found with id option';
      }
      break;
    case 'q':
      $this->verbose = 0;
      $this->quiet = true;
      break;
    case 'v':
      $this->verbose = 1;
      $this->quiet = false;
      break;
    case 'V':
      $this->verbose = 2;
      $this->quiet = false;
      break;
    case 'D':
      $this->debug = true;
      $this->verbose = 2;
      $this->quiet = false;
      break;
    case 'T':
      $this->testrun = true;
      $this->quiet = false;
      break;
    case 'L':
      if (mb_ereg("^[0-9]+$",$args[$k+1])) {
        $this->limit = $args[$k+1];
        $k++;
      } else {
        $err = 'Limit param can only be a digit';
      }          
      break;
    default:
      if (!isset($valid_args[$option])) {
        $err = 'Uknown argument \''.$option.'\'.';
      } else if ($valid_args[$option] !== false) {
        $k++;
      }
      break;
  }
}

This code takes a list of arguments from the command line, and uses them to set the state of the shell object. In other words, it is creating a mapping between the parameter flags and the shell object's properties.

A key to writing flexible, maintainable code is realizing that your applications are basically comprised of two things: algorithms and data. Algorithms are the code, the logic. Data is what the algorithms operate on, though in many cases (like this one), data is showing up as part of the algorithm: it's too tightly bound up in logic, and unnecessarily so.

Finding where data is tied up in your business logic can sometimes be subtle and hard to spot. It might help to think about the distinction in these terms. Logic (or algorithm) code does something: it causes a transition to the next state of execution within your application. Conditionals (if/switch blocks) branch, loops iterate, etc. Conversely, data code is declarative, it takes no direct action; i.e. setting a variable or object property. (Unless __set() is involved, in which case you're invoking a method anyway, so it doesn't count. Cheater.)

Now, setting a variable does cause a change of state within the machine of your application, but in and of itself, this change has no value. It requires some form of processing via logic code. With that in mind, let's take a look at the re-written example:

$flags = array(
  'q' => array('verbose' => 0, 'quiet' => true),
  'v' => array('verbose' => 1, 'quiet' => false),
  'V' => array('verbose' => 2, 'quiet' => false),
  'D' => array('debug' => true, 'verbose' => 2, 'quiet' => false),
  'T' => array('testrun' => true, 'quiet' => false)
);

for ($k = 0; $k < count($args); $k++) {

  $option = $args[$k];
  $value = $args[$k + 1];

  switch (true) {
    case ($option == 'id'):
      if (mb_ereg("^[0-9 ,]+$", $value)) {
        $this->id = explode(',', str_replace(' ','', $value));
        $k++;
      } else {
        $err = 'No id number found with id option';
      }
    break;
    case (isset($flags[$option])):
      foreach ($flags[$option] as $key => $val) {
        $this->{$key} = $val;
      }
    break;
    case ($option == 'L'):
      if (mb_ereg("^[0-9]+$", $value)) {
        $this->limit = $value;
        $k++;
      } else {
        $err = 'Limit param can only be a digit';
      }
    break;
    default:
      if (!isset($validArgs[$option])) {
        $err = 'Uknown argument \'' . $option . '\'.';
      } else if ($validArgs[$option] !== false) {
        $k++;
      }
    break;
  }
}

First of all, notice how the code formatting has been changed: things are now spaced out in a more consistent format that's easier to read, and an explaining variable ($value) has been added alongside $option, to make it more clear from the code which options support parameters, and which are stand-alone flags. When you're writing code, even if you're in a hurry, it always helps to think of the next guy. That next guy might be you in a couple months (or days) when you come back to the code and wonder what the heck you were doing there.

Moving on: as we can see, the data mapping has been abstracted out of the logic and put at the top of the code block. This means that (a) our data can expand infinitely with no logic duplication, and (b) the data can be moved around, manipulated, re-used, or loaded from an external source. This highlights another property of logic vs. data: logic is fixed and immutable, while data is infinitely flexible.

Okay... any questions/comments/snide remarks?

No? Good.

Now get outta here and refactor your code!

 
&nsbp;

You can skip to the end and add a comment.

Daniel Watson  said on Oct 27, 2008:

Nate...

We all know that switch statements are bad... We should definitely move back to the old-school way of doing things... if() elseif() elseif() elseif() elseif() elseif()elseif() elseif() elseif() elseif() else()

Oh, and we need to be sure to make use of ternary operators, and dozens of includes.

if($option == 'id') {
include((file_exists('id.php')) ? 'id.php' : 'default.php');

elseif()...

...

// End of snide remark.

:-)

Nate Abele said on Oct 27, 2008:

Daniel Watson: Heh. That's why I added "switch blocks are great, especially for flow control" ;-)

Rob Wilkerson said on Oct 27, 2008:

I think the key thought here gets a little lost - perhaps because of the "sensational" title - in the reading. It's not about switch blocks at all - they're perfectly useful constructs (when used well) - but rather about the separation of logic from data. Though it could be argued that switch blocks make it easier than it should be to munge it all together.

As you say, "our data can expand infinitely with no logic duplication". That's the critical path. I had to read the article twice because the first time I was reading with an eye for why you thought that particular construct (or its general use) was flawed. I did better on the second reading because I dropped my preconception.

keymaster  said on Oct 28, 2008:

Data driven algorithms are often preferable, especially as the size of the data set grows.

If the data set is small and unlikely to change, I'm not sure it's worth the loss of readability or the extra coding time, though.

Thanks for giving us something to think about. Keep it up.

rafaelbandeira3 said on Oct 28, 2008:

@Nate Abele : good call, I see people commiting - literally - the same "error" more ofently than I would want to. You could have written a little about the {{{ switch (true) }}} that is wide seen on cakephp code, but I must admit that before I started with cake I had never seen such usage and it's very handy.

@keymaster : do you think it's less readable? I think that the "proposed" - should I say "imposed"? ;-)- is much more readable, as the data structure shows everything you need to know about the flags

Rogojine  said on Oct 30, 2008:

Like keymaster, I think that the new code is far less readable. It makes you think that the first array contains all the possible parameters, and that's wrong. And the "switch(true)" might be handy, but does it really make the code clearer?

And if you wanted to turn the parameter "-v" into an incremental one, the first code could easily be changed, whereas the second code would change completely (removing "-v" from the array and adding it to the switch).

If you want to insult other people's code, get ready to some insults to ;)

Loïc Hoguin said on Oct 30, 2008:

That's the kind of situations where getopt is good. But using it or not, the first thing to do is to transform your data into something usable, and then -but only then- to process it.

In this example and in the "fixed" example, data transformation is done at the same time as the data processing. Both examples are bad. And... switch (true)? It's just like if/else but with a different syntax. If you're going to do this at least use if/else.

The real solution for this specific example is to use getopt -or write a similar algorithm- which parse the command line and returns an array containing all the options, with their parameter if any. You can then iterate over this array and apply the appropriate actions when they're found in flags, and add individual tests for each of the "special values". This would become something similar to this:

// define $flags and do getopt (or your own custom parsing operations) here

foreach ($options as $name => $value) {
if ($value !== false)

continue; // Expecting an argument, skip

if (!isset($flags[$name]))
die('Uknown argument \'' . $name . '\'.');

foreach ($flags[$name] as $key => $val)
$this->{$key} = $val;

}

if (isset($options['id'])) {
// ...

}

if (isset($options['L'])) {
// ...

}

Hoping the code won't get screwed up...

Enc  said on Oct 30, 2008:

Separation of algorithm and data.. hold on .. some words like encapsulation, data hiding, interface, contracts pop up all of a sudden? I wonder why? Perhaps, because it's 2008 and every half decent programmer already knows what you're talking about? Back in 1990 maybe your article wouldn't have wasted people's time. Stop writing about trivial things, please.

Nicholas Piasecki said on Oct 30, 2008:

Two comments:

A.

I've written code that's like both of the examples. If I know the options ahead of time and they are unlikely to change, then I prefer the switch statement. It is in my opinion easier to read and easier for a maintenance programmer to easily digest.

If I don't know all of the options ahead of time, or if the options might change frequently, or if the options available could be defined in an external configuration file, then I will code something like the second example. To add an option, I muck with the config file, and the parsing code does not change.

B.

We should also remember that language is important. While the second example is perfectly fine PHP code, it would be somewhat unusual in a statically-typed language: pulling out reflection to set some properties or invoke some setters is slow and not type safe. It certainly is done for the reasons described above, but if we're talking a total of 6 options here, there's nothing wrong with a switch.

Just my two cents.

Tim Koschützki said on Oct 30, 2008:

@Enc: We all know these words. This however is an edge case, as this discussion already shows and I totally second Nicholas Piasecki's opinion A (not sure about B though).

Please don't flame on this blog. If you think it's all too trivial for you, you probably want to contribute yourself by insulting other people's code. Or might as well just start your own blog.

Jonathan Doklovic said on Oct 30, 2008:

just read the book Clean Code: A Handbook of Agile Software Craftsmanship (Robert C. Martin Series) and be done with the whole argument.

It's on safari if you have an account.

Nate Abele said on Oct 30, 2008:

Jonathan Doklovic: So, you're saying there will never be another new concept in computer programming? That seems unlikely. ;-) This is a series for people who want to learn something. If you already know it all, great for you.

Nicholas Piasecki: It was a real-world example used to highlight an abstract concept. I'm not going to try and convince you that this applies 100% of the time. Sure, for simple stuff, keep it simple. The point is to be aware of your own assumptions while doing so.

Enc: You say it's trivial, and yet I see programmers at all levels making the exact same mistake. It's great that all this is so obvious to you, but to many other people, it isn't. That said, maybe *you* should go teach them, rather than sitting around criticizing others for doing so. ;-)

SeanJA said on Oct 30, 2008:

I hate switch statements... I always use if elseif...

You should probably do:
$count = count($args)

for ($i = 0; $i < $count; $i+=1)

so that your program doesn't count each time...

Nate Abele said on Oct 30, 2008:

Switch statements are far better than if/else for flow-control. They make it clear what's going on, the flow logically from one item to the next (often from more specific to more general, depending), and multiple case conditions are able to share code.

primeminister said on Nov 03, 2008:

Nate, Thanks for getting that piece of my code (Yes, I recognize my code from far distance) ;)
That one was one of the things I thought that good be cleaner.

After reading the comments I think a lot the same like Nicholas Piasecki. The first example is still readable and good if the options doesn't change a lot. In this case they don't.

If they good change a lot then your solution would bet more nicer to change after a while.

Thanks again!

Jon said on Dec 20, 2008:

good stuff. thanks for the insight, nate et al.

Nate Abele said on Jan 28, 2009:

All: This is a much better explanation of the concept I was trying to illustrate: http://database-programmer.blogspot.com/2008/05/minimize-code-maximize-data.html

This post is too old. We do not allow comments here anymore in order to fight spam. If you have real feedback or questions for the post, please contact us.