Don't Panic: The Dependency Inversion Principle

Controlling complexity is the essence of computer programming.

- Brian Kernighan in [Kernighan-Plauger]

The Dependency Inversion Principle is commonly interpreted as “Always talk to an interface, never to its implementation”, with the implementation placed in the same package as the interface (or in a subpackage called impl). The Java Collections Framework in java.util is an example of this. While it works for the collections framework (both the interfaces and the implementations describe the same thing), as a general rule it is WRONG.

This blog post re-introduces the Dependency Inversion Principle, done right, as a technique to untangle cyclic dependencies.

Introduction

Since the rise of test-driven design, programmers have been spending more time on creating quality software. Our managers understand why: money. However, they still insist on a comprehensive (to them), high-level overview that tells them everything still works and that also quantifies how much money is involved in bad software. As a result, we’re employing Continuous Integration: a Bamboo server continuously builds our artifacts when changes are checked in, and we get pretty graphs to show. We also use Sonar, which is the more important of the two as it visualizes code quality.

The best feature Sonar has for both managers and programmers is its project dashboard. This shows things like rule violations (the chance for bugs to occur), code coverage (tested code behaves as its unit test expects), complexity (another source of bugs), LCOM4 (logical components per class), the package tangle index and technical debt.

Of these, I’m going to focus on the package tangle index.

Dependency Structure Matrix (DSM)

If there are cyclic dependencies, Sonar reports a non-zero package tangle index. It uses a so-called Dependency Structure Matrix [Sangal-2005] to both discover them and to visualize where the cycles are. I’m going to give a short description of dependency structure matrices. For more detailed information on understanding and using a DSM, there are explanations online from Sonar and from IntelliJ.

A DSM shows your architecture. Not in a diagram made to impress, but in a sorted matrix that accurately shows dependencies. The most common way to sort a DSM is triangular: dependencies are placed in the lower left corner as much as possible. For example, a layered system without cyclic dependencies looks like the diagram on the left (on the right a strictly layered system, where each component only has access to the layer below):

DSM for a loosely layered application
DSM for a strictly layered application

As you can see, the dependencies tell you which packages/classes use which other parts of your application. Also, the presence or absence of numbers in the top right half of the matrix tells you whether there are dependency cycles. The following example shows a project that is pretty badly tangled (a tangle index of nearly 50%):

DSM showing design erosion
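
The triangular reading of a DSM can be sketched in a few lines of Java. This is a minimal illustration, not how Sonar computes its matrix: the class `DsmSketch`, its methods and the three package names are hypothetical, and it assumes the convention that the package in a column uses the package in a row, with the topmost layer listed first. Under that convention, every cell above the diagonal is a dependency that points against the chosen order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a dependency structure matrix (not Sonar's implementation).
final class DsmSketch {

    // cells[row][col] is true when the package in column `col` uses the package in row `row`.
    static boolean[][] matrix(List<String> order, Map<String, Set<String>> uses) {
        int n = order.size();
        boolean[][] cells = new boolean[n][n];
        for (int col = 0; col < n; col++) {
            Set<String> used = uses.getOrDefault(order.get(col), Set.of());
            for (int row = 0; row < n; row++) {
                cells[row][col] = row != col && used.contains(order.get(row));
            }
        }
        return cells;
    }

    // Lists the dependencies found in the top right half of the matrix,
    // i.e. those that point against the given package order.
    static List<String> aboveDiagonal(List<String> order, Map<String, Set<String>> uses) {
        boolean[][] cells = matrix(order, uses);
        List<String> found = new ArrayList<>();
        for (int row = 0; row < order.size(); row++) {
            for (int col = row + 1; col < order.size(); col++) {
                if (cells[row][col]) {
                    found.add(order.get(col) + " -> " + order.get(row));
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed example packages, highest layer first.
        List<String> order = List.of("service", "repository", "model");
        Map<String, Set<String>> uses = Map.of(
                "service", Set.of("repository", "model"),
                "repository", Set.of("model"),
                "model", Set.of("repository")); // the dependency that closes a cycle
        System.out.println(aboveDiagonal(order, uses)); // prints [model -> repository]
    }
}
```

With the `model` entry removed from the map, the same call returns an empty list: all remaining dependencies sit in the lower left half, as in a cleanly layered system.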

The reason to focus on the package tangle index is twofold:

A tangled codebase becomes very brittle: fixing a bug may cause several other bugs to pop up (assuming you don’t break your build outright).

Cutting dependency cycles can be very difficult.

A brittle codebase is only slightly better than a legacy codebase (which is usually badly and/or misdocumented in addition to being brittle). As a result, a brittle codebase is expensive. Expensive for your manager (it gobbles up his budget), and expensive for you (it causes stress, and is most definitely not fun to work with). This is why you want to cut dependency cycles.

But cutting dependency cycles means you may have to rethink your design rules: which classes may access which others? Why are some dependencies forbidden? Where do I put this code? But worse: sometimes the dependency causing a cycle has entered the design on purpose. How do you get rid of that? This is where the Dependency Inversion Principle comes in.

Dependency Inversion Principle

Often, the Dependency Inversion Principle is interpreted as “Always talk to an interface, never to its implementation”, and the implementation is put in the same package as the interface (or a subpackage called impl). An example that fits this interpretation is the Java Collections Framework. While this works for the collections framework (both the interfaces and implementations describe the same thing), generally it is WRONG.

Why? Because 1) you have not inverted any dependencies (not really, yet), and 2) you have increased complexity. My statement: the Dependency Inversion Principle is much more powerful!

Let’s forget the interface/implementation abstraction for now, and focus on untangling cyclic dependencies. What we want is to invert dependencies: move them from the top right half of the Dependency Structure Matrix to the lower left half. By inverting the dependency, the using class becomes the used class (and vice versa), and this goal is accomplished.

Think about it for a few seconds: how can simply splitting a class into an interface and its implementation be good? This is not what the Dependency Inversion Principle is about, but it does increase maintenance costs by defining the API of your code in two places. And remember the DRY principle? Don’t Repeat Yourself. That’s right, simply splitting a class into an interface and implementation violates the DRY principle.

The essence of applying the Dependency Inversion Principle correctly is this:

Split the code/service/... you depend on into an interface and implementation.

The interface restates the dependency in the jargon of the code using it; the implementation implements it in terms of its underlying techniques.

The implementation remains where it is. But the interface has a different function now (and uses a different jargon/language), describing something the using code can do. Move it to that package.

By not placing the interface and implementation in the same package, the (direction of the) dependency is inverted from user→implementation to implementation→user.
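
The steps above can be sketched in Java as follows. Everything in this block is a hypothetical illustration: the `invoicing` and `mail` packages and the names `CustomerNotifier`, `InvoiceSender` and `MailCustomerNotifier` do not come from a real codebase, and the two packages are shown in a single file, with comments marking where each type would live.

```java
import java.util.ArrayList;
import java.util.List;

// --- would live in package invoicing (the using code) ---

// The interface is phrased in the jargon of the invoicing code: it says what
// invoicing needs, not how a mail server works.
interface CustomerNotifier {
    void invoiceSent(String customer, String invoiceNumber);
}

final class InvoiceSender {
    private final CustomerNotifier notifier;

    InvoiceSender(CustomerNotifier notifier) {
        this.notifier = notifier;
    }

    void send(String customer, String invoiceNumber) {
        // Invoicing logic would go here; it only ever sees its own interface.
        notifier.invoiceSent(customer, invoiceNumber);
    }
}

// --- would live in package mail (the used code) ---

// The implementation stays where it was and now imports invoicing.CustomerNotifier.
// Nothing in invoicing imports mail, so the dependency points from mail to invoicing.
final class MailCustomerNotifier implements CustomerNotifier {
    // Stand-in for a real mail transport, so the sketch runs without a mail server.
    final List<String> outbox = new ArrayList<>();

    @Override
    public void invoiceSent(String customer, String invoiceNumber) {
        outbox.add("To " + customer + ": invoice " + invoiceNumber + " has been sent");
    }
}
```

Whatever wires the application together (a main method or a dependency injection container, for example) creates the `MailCustomerNotifier` and hands it to the `InvoiceSender`. That wiring code is the only place that needs to know both packages.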

An example

An example can be found in Domain Driven Design. In DDD, the model is the heart of the application. The model implements all business logic in terms of the underlying knowledge domain (the ubiquitous language). The model is persisted by a repository, which also provides efficient access to the model from the datastore.

Of course, you do not want your model to be concerned with the details of persistence. So, you reuse the familiar model, DAO/Repository and service layers. But: now you’re implementing transaction scripts in your service layer, which have a tendency to also absorb business logic. Clearly, this is not desired.

One easy solution would be to diligently put all business logic into the model, reducing the service layer to a transaction wrapper (which is good, as then the service layer has only a single responsibility). The downside of this is performance: you can no longer easily search all registered hours that are ready to be invoiced, for example, because that requires access to the repository. The obvious solution then is to allow the model access to the repository.

But… the repository already depends on the model (it needs it to retrieve it from the datastore)! So we have a dependency cycle. Bad design!

Yes, bad design, up to a point. The decision to allow the model access to the repository is valid. Thus, let’s proceed to break the dependency cycle.

Inverting the dependency

First, review how to invert the dependency (see above): we split the repository into an interface and an implementation. The repository interface defines the possible data access paths, and the implementation is the object that actually accesses the data in the datastore: it is a data access object, a DAO. Does this help us?

By making the repository an interface in our model, we define the fact that our model is searchable (and optionally, persisted). This is what our model may know and use, and the datastore has nothing to do with it. So far, so good.

The implementation of the repository, our data access object, is specific to the datastore we use. Its only dependencies are the model and the datastore, and nothing depends on it (others depend on the repository instead). Nothing, nada… wow. This is excellent news! Sure, our existing dependencies between our service and model layers have been strengthened (but not by much). But the good news is that the data access, a dependency on an external system, is now as loosely coupled to the application as possible. Good stuff indeed!
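
A minimal sketch of this arrangement, using the registered-hours example from above. All type names (`RegisteredHours`, `RegisteredHoursRepository`, `Project`, `InMemoryRegisteredHoursDao`) are assumptions made for illustration, and the in-memory list merely stands in for a real datastore. Comments mark the package each type would live in.

```java
import java.util.ArrayList;
import java.util.List;

// --- would live in package model ---

final class RegisteredHours {
    final String project;
    final int hours;
    final boolean readyToInvoice;

    RegisteredHours(String project, int hours, boolean readyToInvoice) {
        this.project = project;
        this.hours = hours;
        this.readyToInvoice = readyToInvoice;
    }
}

// Part of the model: it states that registered hours are searchable,
// without saying anything about the datastore.
interface RegisteredHoursRepository {
    List<RegisteredHours> findReadyToInvoice(String project);
}

final class Project {
    private final String name;
    private final RegisteredHoursRepository repository;

    Project(String name, RegisteredHoursRepository repository) {
        this.name = name;
        this.repository = repository;
    }

    // Business logic in the model, using the repository for an efficient search.
    int hoursToInvoice() {
        int total = 0;
        for (RegisteredHours entry : repository.findReadyToInvoice(name)) {
            total += entry.hours;
        }
        return total;
    }
}

// --- would live in package dao ---

// Depends on the model (and, in a real application, on the datastore).
// Nothing depends on this class except the code that wires the application together.
final class InMemoryRegisteredHoursDao implements RegisteredHoursRepository {
    private final List<RegisteredHours> rows = new ArrayList<>();

    void insert(RegisteredHours entry) {
        rows.add(entry);
    }

    @Override
    public List<RegisteredHours> findReadyToInvoice(String project) {
        List<RegisteredHours> result = new ArrayList<>();
        for (RegisteredHours row : rows) {
            if (row.readyToInvoice && row.project.equals(project)) {
                result.add(row);
            }
        }
        return result;
    }
}
```

A datastore-specific DAO would implement the same interface with a query instead of a loop; the model code would not change.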

Conclusion

The Dependency Inversion Principle is a powerful tool that allows us to reduce the package entanglement of our code, reducing its brittleness. It builds upon concepts like abstraction, coupling and cohesion, and would not be possible without them. But the Dependency Inversion Principle goes further: it makes it possible (among other things) to give our business logic, instead of our transaction scripts, uncomplicated and efficient data access. This makes our code faster, better and easier to maintain.

References

[Kernighan-Plauger]: Brian Kernighan and P. J. Plauger. Software Tools. Addison-Wesley. 1976. ISBN 201-03669-X
[Sangal-2005]: N. Sangal, E. Jordan, V. Sinha and D. Jackson. Using Dependency Models to Manage Complex Software Architecture. 2005.
http://sdg.csail.mit.edu/pubs/2005/oopsla05-dsm.pdf

15 August 2011
