This is the first in a series of articles about using the .NET software metrics tool, NDepend. Each article will focus on a particular use for NDepend, demonstrating how it helps developers “x-ray” their code to see strengths and weaknesses in design.
The Problem, or Why Does it Burn so Bad?
No one likes to experience pain, but the fact is that pain is often a very good metric for measuring the level of stupidity required for making certain decisions. For example, I have a very distinct childhood memory of touching a hot iron to determine if it was in fact plugged in. It dawned on me later, after much wailing and many tears, that perhaps looking at the outlet would have been a better approach, but if I had experienced no pain, I would not have learned.
Software development is really no different, though you might not know it by examining many software projects. Many of the pain points in software development are actually indicators–signals that alert developers that there are design flaws, perhaps significant ones, in the software solution. Often under tight deadlines, budget constraints, and a small army of Pointy Haired Bosses, developers spend hours agonizing over software bugs that seem to spring from the aether. Changes in one module often break code in another, seemingly unrelated module, creating QA and maintenance nightmares that are only remedied by more duct tape, bailing wire, and bubble gum. Problems persist until, eventually, the code base becomes too fragile and tangled to change.
A common cause of pain is bad coupling, or bad dependencies, between software modules. In .NET, the unit of composition for software modules is the assembly. In general, the following principles should be applied to assemblies:
- modular software should be built to reuse, and be reused by, other modules;
- software modules should be easy to change; and
- changes to a given software module should not affect dependent software modules.
Most developers understand and agree with the first principle, but have experienced the pain and frustration of practically implementing the second and third. If a .NET assembly violates the second principle, it is said to be rigid; if it violates the third, it is said to be fragile. [Martin, Metrics]
Reuse is impossible to achieve without dependencies. Assemblies hold references to other assemblies and reuse their types. Unfortunately, an assembly becomes difficult to change as its dependents increase, and as a consequence, changes to that assembly can have adverse consequences for dependents which may have no conceptual relationship to it. It would seem that, to adhere to the first principle of modular software, the second and the third must necessarily be violated. And it is precisely this violation that causes so much turmoil in software projects.
The good news is that principles two and three can be adhered to if abstractions (in the form of actual abstract classes or interfaces) are introduced between dependent modules. Consider the following simplified case of a dependency between two classes in two different assemblies:
In this scenario, consider that changes to Class2 could have potential ramifications for Class1, even though Class1 may not require change to fulfill a given requirement or change request. The more classes that come to depend on Class2, the more difficult changes will be–the number of dependencies and possible side effects will cause developers to flee in panic whenever changes to Class2 are necessary. In contrast, consider the following alternative scenario:
By introducing an abstraction between classes 1 and 2, the developer is free to change the implementation of Class2 as necessary without fear that it will cause other assemblies and classes to change as well. By changing the nature of the dependency across assemblies, developers can easily adhere to all three principles of modular software. It is also interesting to note that the rigid nature of the direct dependency between Class1 and AbstractClass2 also serves to crystallize the nature of abstract classes (or interfaces)–that is, the dependency itself dissuades developers from changing the public facing API of the assembly. This is the Open/Closed principle applied to assemblies. Assembly APIs should be open for extension, but closed to modification.
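The second scenario can be sketched in a few lines. This is a minimal illustration, written in Python only for brevity (in .NET the abstraction would be an interface or abstract class exposed by the referenced assembly). The names Class1, Class2, and AbstractClass2 come from the scenario above; the `do_work` method and `ReplacementClass2` are hypothetical, added purely for the example.

```python
from abc import ABC, abstractmethod


class AbstractClass2(ABC):
    """The abstraction Class1 depends on (the public-facing API)."""

    @abstractmethod
    def do_work(self, value: int) -> int:  # hypothetical operation
        ...


class Class2(AbstractClass2):
    """Concrete implementation; free to change without touching Class1."""

    def do_work(self, value: int) -> int:
        return value * 2


class ReplacementClass2(AbstractClass2):
    """A different implementation, swapped in with no change to Class1."""

    def do_work(self, value: int) -> int:
        return value * 10


class Class1:
    """Client code: knows only about AbstractClass2, never about Class2."""

    def __init__(self, collaborator: AbstractClass2) -> None:
        self._collaborator = collaborator

    def run(self, value: int) -> int:
        return self._collaborator.do_work(value) + 1


# Class1 works unchanged with either implementation.
print(Class1(Class2()).run(3))
print(Class1(ReplacementClass2()).run(3))
```

Class1 never names a concrete type, so the implementation behind AbstractClass2 can be rewritten or replaced without Class1 being touched. The only thing that must stay put is the abstract API itself.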
The Solution, or How Using NDepend is like Bringing a Bazooka to a Gun Fight
SO. Now that the nature of the problem is clear, how does a developer go about identifying problem assemblies and types in an application? The software metrics tool NDepend allows developers to examine their software projects and determine where pain points are so they can be eliminated. NDepend applies a number of software metric algorithms to a given code base, and presents a host of useful information about the relationships of assemblies, namespaces, types, members, variables, etc. in a .NET project.
To illustrate how NDepend can help solve the assembly dependency problem, I resurrected an old .NET project from the mothballs to see exactly how badly my code violated the modular software principles. I was delighted to find that NDepend has Visual Studio 2010 integration, which makes it very easy to analyze a solution without leaving the IDE. I proceeded to upgrade my old project to VS2010 format, and attached a new NDepend project to begin my analysis.
Because NDepend is such a powerful tool, it is easy to drown in the sheer number of metrics available for a .NET solution. I spent quite a bit of time reading and digesting the NDepend Code Metrics Definitions page on the NDepend website. For this specific problem, I was interested in assembly metrics, specifically:
- Afferent Coupling (Ca) — the number of types external to a given assembly that depend on types within that assembly (incoming dependencies)
- Efferent Coupling (Ce) — the number of types within a given assembly that depend on types external to that assembly (outgoing dependencies)
- Instability (I) — the ratio of efferent coupling to total coupling within an assembly: I = Ce / (Ce + Ca)
- Abstractness (A) — the ratio of abstract types (and interfaces) in an assembly to the number of total types in the assembly (if A = 1, the assembly is completely abstract; if A = 0, the assembly is completely concrete)
- Distance from the main sequence (D) — the degree to which an assembly’s abstractness and dependencies balance each other (a very abstract assembly should have many dependents, and a very concrete assembly should have few)
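The arithmetic behind these definitions is small enough to sketch directly. The snippet below is an illustration, not NDepend's implementation. It uses the instability formula given above and the usual normalized form of the distance, D = |A + I - 1|. The function names and the sample counts are hypothetical, and returning 0.0 for an assembly with no coupling or no types is an assumed convention.

```python
def instability(ca: int, ce: int) -> float:
    """I = Ce / (Ce + Ca); 0.0 for an assembly with no coupling (assumed convention)."""
    total = ca + ce
    return ce / total if total else 0.0


def abstractness(abstract_types: int, total_types: int) -> float:
    """A = abstract types (including interfaces) / total types."""
    return abstract_types / total_types if total_types else 0.0


def distance_from_main_sequence(a: float, i: float) -> float:
    """Normalized distance D = |A + I - 1|; 0 means on the main sequence."""
    return abs(a + i - 1.0)


# Hypothetical counts for one assembly: 40 incoming, 10 outgoing,
# 3 abstract types out of 10.
i = instability(ca=40, ce=10)
a = abstractness(abstract_types=3, total_types=10)
print(i, a, distance_from_main_sequence(a, i))
```

An assembly with many dependents and few abstractions produces a small I and a small A, so A + I falls well short of 1 and D grows.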
NDepend offers a number of ways to find data, but the most versatile is the CQL query editor. CQL is a robust and, frankly, daunting domain-specific language that enables developers to slice and dice a code base to identify solution items that meet specific criteria. As a starting point, I was interested in assemblies in my project that had a high number of incoming dependencies (afferent coupling), but a low level of abstraction. I opened the CQL editor (complete with IntelliSense!) and queried the code base.
Right away, I noticed that the Common assembly has a high number of dependencies, but a very low level of abstraction, which means assemblies that depend on it are directly using its concrete types. If I made changes to Common, I would have a good chance of breaking other assemblies, forcing me to change them as well. Since this project is fairly trivial in size, my deployment strategy had always been to release all assemblies in each software release. Non-trivial projects, often composed of dozens or hundreds of assemblies, fare far better if changes can be deployed on an assembly-by-assembly basis. If my Common assembly were a member of a larger project, it would require all dependent assemblies to be released whenever its concrete types were changed.
It is clear that Common was an offensive assembly, but I needed to know what its dependent assemblies were to identify the exact points of Ca to be abstracted. I could have monkeyed around with the CQL editor some more to find that information, but I have a general disdain for SQL-like languages (which makes me a favorite among DBAs!), and I wanted to try out other NDepend features, so I opened the Dependency Matrix to find the information I needed.
The Dependency Matrix is really as awesome as it sounds. It allows a developer to view every assembly, type, member, etc. in terms of every other assembly, type, member, etc., on a matrix that shows the number of dependencies at a given drill-down level. Since I was interested in the dependencies of types among assemblies, I adjusted the view option and got a nice breakdown.
As you can see, each assembly is listed on both the X and Y axes. The squares in the middle represent the number of types that have dependency relationships on the other assemblies. The green boxes represent the efferent (outgoing) coupling of the horizontal assemblies; the blue boxes represent the afferent (incoming) coupling of the same. I located Common on the horizontal axis and noticed that types from the root namespace, UI.Common, Cache, Data.Import, and Data directly used types in Common. Since I wanted to identify those types in Common that were directly referenced, I expanded the horizontal assembly node to get a more detailed view.
The bulk of the incoming references are for business entities, which are used in many places in the application. Now, recall that the measure of an assembly’s instability is the ratio of efferent (outgoing) coupling to total coupling. I noticed that Common, lacking any green box on its horizontal axis, references no other assemblies (framework assemblies excluded), so its instability in terms of developer-maintained code is zero. Common is a completely “stable” or rigid assembly, meaning that it is not easily changeable (like the foundation of a structure is “stable”), but it is also not very abstract, making it a very fragile assembly–both are conditions that I wanted to avoid. To fix these problems, I could have used the dependency matrix to “drill down” deeper into actual types and methods to determine the exact locations of tightly coupled code and introduce abstractions that would allow Common to maintain stability, but also allow extensibility.
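To make the counting concrete, here is a rough sketch of how afferent and efferent coupling fall out of a list of type-level dependencies, using the definitions of Ca and Ce given earlier. It is not how NDepend computes its matrix. The assembly names echo the ones in this project, but every type name and every edge below is invented for illustration.

```python
from collections import defaultdict

# Hypothetical type-level dependencies: (assembly, type) uses (assembly, type).
DEPENDENCIES = [
    (("UI.Common", "CustomerView"), ("Common", "Customer")),
    (("Cache", "EntityCache"), ("Common", "Customer")),
    (("Data", "CustomerRepository"), ("Common", "Customer")),
    (("Data", "CustomerRepository"), ("Common", "Order")),
    (("Data.Import", "Importer"), ("Data", "CustomerRepository")),
]


def coupling(dependencies):
    """Return {assembly: (Ca, Ce)}, counting distinct types across assembly boundaries."""
    incoming = defaultdict(set)  # assembly -> external types that depend on it
    outgoing = defaultdict(set)  # assembly -> its own types that depend on others
    for (src_asm, src_type), (dst_asm, _dst_type) in dependencies:
        if src_asm == dst_asm:
            continue  # dependencies inside one assembly do not count
        incoming[dst_asm].add((src_asm, src_type))
        outgoing[src_asm].add(src_type)
    assemblies = set(incoming) | set(outgoing)
    return {a: (len(incoming[a]), len(outgoing[a])) for a in assemblies}


for name, (ca, ce) in sorted(coupling(DEPENDENCIES).items()):
    print(f"{name}: Ca={ca} Ce={ce} I={ce / (ca + ce):.2f}")
```

With this made-up data, Common has incoming dependencies but no outgoing ones, so its Ce is 0 and its instability is 0 no matter how large Ca grows. That is the same shape the matrix showed for the real assembly.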
I was curious about other dependencies in my project, but wanted to get a visual overview of the project as a whole. I opened the Dependency Graph and changed the box size setting to adjust each node in the graph by afferent coupling. The result confirmed my analysis of the Common assembly, but also highlighted the Logger assembly as another possible offender.
The interesting thing about this diagram is that it shows the flow of dependencies from the left to the right, and it’s very obvious that the assemblies on the right are the ones that a) need to be most stable, since they have the most dependencies, and b) desperately need to be abstract to avoid violating the second and third principles of modular software. The Dependency Graph fly-out lists detailed information about the Logger assembly, but the two items that immediately stand out are the abstractness rating (0.3, relatively close to 0), and the instability (0.2, very difficult to change). So Common and Logger have the same problems.
I decided to run the NDepend Report last, which produces a comprehensive HTML file with detailed information about the entire project. I was looking for a specific diagram which illustrates, in a nutshell, the reason why dependency management is important, and I found it.
There is a lot of information in this graph, but notice the lower-left-hand corner: a special little hell known as the “Zone of Pain”. On this graph, Logger and Common occupy a sherbet-orange colored slice of the graph, just next to it. The Y axis of the graph indicates that the level of abstraction for these two assemblies is low (i.e., they contain mostly concrete classes), and the X axis indicates that each assembly has a low level of instability, i.e., a high level of stability, i.e., a large number of incoming dependencies which make it difficult for the assemblies to change. If the “Zone of Pain” is hell, the “Zone of Uselessness” is Plato’s heaven: all forms, and no substance. Assemblies that fall into this space have maximum abstractness (they have no concrete classes), and maximum instability, i.e., no incoming dependencies. So they are, in reality, “useless” and unused.
The perpendicular position of each assembly on the graph relative to the center diagonal line is known as the “distance from the main sequence”. Notice that there is a nice green area surrounding the main sequence. This represents the ideal location that assemblies should occupy, which implies a special relationship between abstractness and instability. According to the chart, the more “stable” an assembly is, the more abstract it should be, and likewise the more “unstable” an assembly is, the more concrete it should be. Assemblies that occupy the lower-right-hand corner have no dependent assemblies (think web project, for example), and can have as many concrete classes as they like, while assemblies that occupy the upper-left-hand corner should be completely abstract because their dependents are so many. As long as assemblies remain in close proximity to the main sequence, they conform to the Open/Closed principle, and will be relatively painless to modify.
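The chart's geometry can be expressed as a quick check. The sketch below assumes the main sequence is the line A + I = 1, and reports which side of it an assembly falls on. The `locate` function and its `tolerance` cut-off are hypothetical, chosen for illustration; they are not NDepend settings, and the report's colored bands may be drawn differently.

```python
def locate(abstractness: float, instability: float, tolerance: float = 0.2) -> str:
    """Describe where an assembly sits relative to the main sequence (A + I = 1).

    `tolerance` is a hypothetical cut-off for "close enough" to the line.
    """
    signed = abstractness + instability - 1.0
    if abs(signed) <= tolerance:
        return "near the main sequence"
    if signed < 0:
        return "toward the Zone of Pain"  # stable but concrete
    return "toward the Zone of Uselessness"  # abstract but unused


print(locate(0.3, 0.2))  # Logger's abstractness and instability from the graph fly-out
print(locate(1.0, 0.0))  # fully abstract, fully stable
print(locate(0.0, 1.0))  # fully concrete, no dependents
```

Logger's values give A + I = 0.5, well below the line, which is why it lands on the Zone of Pain side of the chart. The two corner cases both sit exactly on the main sequence.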
The Conclusion, or How Drinking Helped Me to Understand All of This
My first round with NDepend pretty much kicked my ass, but in a very good way. I was able to look at a project that I was intimately familiar with in a new way, and learned that there is a very precise and quantifiable relationship between the quality of a design and the extensibility of a design. NDepend takes complicated metrics and gives developers a way to visualize data about solutions, track down problem areas, and refactor bad designs with confidence. Any software team that is serious about making robust, extensible software should consider adding NDepend to the toolbox.
For further reading on assembly dependencies, I recommend the following resources: