Cyclomatic Complexity

Cyclomatic Complexity

Have you ever opened up a file in an older piece of software and seen this?

# rubocop:ignore Metrics/CyclomaticComplexity

def complex_method(x, y, z)
  if x >  0
    if y >  0
      if z >  0
        return "All positive"
      else
        return "z is not positive"
      end
    else
      if z >  0
        return "y is not positive, but z is positive"
      else
        return "y and z are not positive"
      end
    end
  else
    if y >  0
      if z >  0
        return "x is not positive, but y and z are positive"
      else
        return "x and z are not positive, but y is positive"
      end
    else
      if z >  0
        return "x and y are not positive, but z is positive"
      else
        return "All are not positive"
      end
    end
  end
end
# did you actually read the whole thing? I hope not!
# rubocop enable:Metrics/CyclomaticComplexity

What is Cyclomatic Complexity?

Looking at that method, it’s, well, bad. It’s hard to read and you’re likely lost by the second or third conditional.

It’s easy to say, complexity here is bad. But what does the linter mean when it calls this “Cyclomatic Complexity”?

In typical Wikipedia fashion, the first sentence on cyclomatic complexity uses the word “complexity” to define it:

Cyclomatic complexity is a software metric used to indicate the complexity of a program.

Errrr…okay. Cyclomatic complexity is…complexity? Not helpful.

Let’s read on:

It is a quantitative measure of the number of linearly independent paths through a program's source code.

Ah! That’s a bit more helpful. Cyclomatic Complexity is a number grading system to tell you how messy your code is! 😆

Let’s dig into this.

The Math (or not)

There are formulae behind computing cyclomatic complexity, and you can map those formulas onto the code. We’ll take a look at this in a minute, but first for those of us who think of things in connected graphs rather than formulas. There’s another way of thinking about cyclomatic complexity.

How many tests do you have to write to get it right?

In the disastrous example above, I can see at least 9 test cases because there are 8 return statements and the implicit return of the method itself. That’s a lot to keep track of!

You can even visualize this! Here’s an example from the Wikipedia page:

A control-flow graph of a simple program. The program begins executing at the red node, then enters a loop (group of three nodes immediately below the red node). Exiting the loop, there is a conditional statement (group below the loop) and the program exits at the blue node. This graph has nine edges, eight nodes and one connected component, so the program's cyclomatic complexity is 9 − 8 + 2×1 = 3.

The formulae

The caption on the image above uses a formula where

E = the number of edges of the graph.
N = the number of nodes of the graph.
P = the number of connected components.

And calculates the complexity with E - N + 2P. You can determine the number of edges and nodes by counting the control flow statements (conditionals).

Rubocop uses a slightly simpler calculation.

Control flow statements (I.e., conditionals, or if statements) count towards cyclomatic complexity. Every method has a cyclomatic complexity of 1 if the code path is linear. That means there are no control flow statements and no alternate outcomes.

In Rubocop, every control flow statement adds 1 to the cyclomatic complexity. Control flow statements are if, unless, ternary statements using ? and :, &&, and ||.

You might think those last two are odd because you can write if true && true and that looks like one statement. But it can be broken down into two if statements so it should be treated the same.

if true
  if true
      return someting
  end
end

Rubocop also counts enumerations towards cyclomatic complexity because code can branch based on the contents of the enumerator. However, enumerations do not multiply the complexity. For example, if an enumerator has 5 entities and there is an if inside the enumeration block, the if only counts 1 point towards cyclomatic complexity, not 5. In most cases, Rubocop won’t know how many entities are in the enumerator anyway.

The Rubocop docs have a great example of how they calculate cyclomatic complexity:

def each_child_node(*types)               # count begins: 1
  unless block_given?                     # unless: +1
    return to_enum(__method__, *types)

  children.each do |child|                # each{}: +1
    next unless child.is_a?(Node)         # unless: +1

    yield child if types.empty? ||        # if: +1, ||: +1
                   types.include?(child.type)
  end

  self
end                                       # total: 6

How do I fix cyclomatic complexity?!

Each case will be slightly different, so here’s the approach I usually take.

  1. Start with breaking the method down into smaller methods that have a single responsibility. Keep human-readability in mind!
  2. Bring early return statements to the top. These are called guard clauses. If there are any cases where there is a single condition like if var == 1, you can convert this to a guard clause.
  3. If you begin to see a pattern where some groups of method have related functionality or handle specific types of data, consider breaking these off into their own class.
  4. Skim through design pattern docs and see if any would help you. I’m bad at remembering names of design patterns and their examples. But I often find them helpful when I’m untangling complex code. So do a bit of searching or use your favourite AI to decide if a design pattern can help.
  5. Did I say keep human-readability in mind? We’re writing code for humans to read and computers to execute. Computers don’t care if your code is messy, but humans will get conflustered if your code is hard to read and it causes a bug.