Thursday, July 05, 2007

Spring on Code organization for large projects

Code organization for large projects.

I discovered this 88-minute presentation on dzone.com today. Not many people have the time to sit through a presentation this long, so I have decided to write about it. Most of my comments below simply paraphrase what the presenter says, and I take no credit for his thoughts or words. I simply found them enlightening and want to share them.

Before I start on the presentation, I want to go over some of the terms that are used, as they are the principles under discussion.

Cohesion: A measure of how well the lines of source code within a module work together to provide a specific piece of functionality. That is, how well a module of code focuses on one single task or function.

Low Coupling: Refers to a relationship in which one module interacts with another module through a stable interface and does not need to be concerned with the other module's internal implementation. The benefit of low coupling is that when a change in one module is required, it will not require a change in the implementation of another module.

Dependency: A state in which one object uses the functionality of another object.

Java Package: A set of Java classes organized for convenience in the same directory.

Now then, the complete presentation includes video and a slide show and can be found at http://www.infoq.com/presentations/code-organization-large-projects.

Summary
Juergen Hoeller shares his experiences working on large projects (including his role as chief architect of the Spring Framework) to provide general guidelines on packaging and package interdependencies, layering and module decomposition, and evolving a large code base. Juergen will also discuss how tools can play a role in enforcing architectural soundness.

Bio
Juergen Hoeller has been the most active Spring developer since the open source project began from Rod's Interface21 framework back in February 2003. Juergen and Rod together continue to provide the direction for Spring.

About the conference
The Spring Experience conference is hosted by Interface21 and NoFluffJustStuff Java Symposiums


Why worry about code organization?
Many organizations and programmers don't spend time thinking about code organization; it simply evolves without thought.

Personally, I usually come up with some sort of rough package structure before I start coding, but I don’t document it, and I don’t think about modules. My packages are usually based upon the layers of my multi-tiered design.

For example, my packages typically look like:

com.mycompany.myapp.dao
dao interfaces

com.mycompany.myapp.dao.hibernate
hibernate implementations of the dao interfaces

com.mycompany.myapp.services
service class interfaces and implementations

com.mycompany.myapp.web.springmvc
or com.mycompany.myapp.web.struts
for my web controllers

com.mycompany.myapp.model
for all my model classes

I share my package structure with you so that you can be reassured that others do it this way if this is how you structure your packages. If you don’t structure your packages this way, I am interested in how you do it and why.
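To make the structure above concrete, here is a minimal sketch of how the layers might relate in code. Everything in it is hypothetical: the Customer, CustomerDao, InMemoryCustomerDao, and CustomerService names are invented for illustration, and an in-memory map stands in for a real Hibernate implementation so the example stays self-contained. The comments note which package each type would live in.

```java
import java.util.HashMap;
import java.util.Map;

// Would live in com.mycompany.myapp.model (hypothetical model class)
final class Customer {
    final long id;
    final String name;

    Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Would live in com.mycompany.myapp.dao (the DAO interface)
interface CustomerDao {
    Customer findById(long id);

    void save(Customer customer);
}

// Stand-in for a class in com.mycompany.myapp.dao.hibernate.
// A real implementation would use Hibernate; a map keeps the sketch runnable.
final class InMemoryCustomerDao implements CustomerDao {
    private final Map<Long, Customer> rows = new HashMap<>();

    @Override
    public Customer findById(long id) {
        return rows.get(id);
    }

    @Override
    public void save(Customer customer) {
        rows.put(customer.id, customer);
    }
}

// Would live in com.mycompany.myapp.services.
// It knows only the CustomerDao interface, never the implementation package.
final class CustomerService {
    private final CustomerDao dao;

    CustomerService(CustomerDao dao) {
        this.dao = dao;
    }

    String displayName(long id) {
        Customer customer = dao.findById(id);
        return customer == null ? "unknown" : customer.name;
    }
}
```

Because the service depends only on the interface, the implementation package can be swapped without touching the services package, which is the low coupling described in the definitions above.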

There isn't a lot of literature on package dependency and interdependency best practices.

I have to agree with Juergen. Given all the books that I have read, the blogs that I subscribe to and the classes that I have taken, no one has ever spoken on best practices when it comes to package organization. I may have read about package interdependency without realizing what I was reading at the time.

Code needs to be logically organized. A code base needs to be able to evolve based on its original structure and, even years later, based on completely new requirements. Refactoring and agile development are fine, but how do you preserve backwards compatibility once the code is released? What do you do if you already have people using your published code?

How should modules evolve? Separate modules might need to interact in a later revision, despite the original design not having intended it, thereby introducing new interdependencies at the module or package level.

How well does the code base allow for repackaging into more fine-grained modules if the need arises, while preserving the API?

I like that question. I think it’s fascinating how the Spring Framework can be divided up into a dozen packages or one single package, allowing users to include only those packages that they need for their application. For example, you can use spring-full for the entire Spring Framework, or just spring-ldap, spring-aop, or spring-core, if you have more specific needs. The fact that the project can be divided into so many modules is a testament to the care and thought that they have put into their package design and code maintainability.

Package Interdependencies
Most public code bases are not good examples of package design. This includes the JDK libraries and many open source projects such as Hibernate. They contain bad practices such as cyclical package dependencies.

Later in the presentation, Juergen goes on to demonstrate how java.lang depends on java.util and how java.util depends on java.lang, using the String class as an example. He also shows that, on average, any given class in Hibernate depends on two-thirds of the entire Hibernate library.

As a counterexample, Juergen states that “Spring does not contain any package circles.”

Typical scenario: Package B depends on package A according to the initial architectural design. Later, new code in package A could use some code that already exists in package B. Since you don't want to duplicate code, you call the code in package B from package A. Now you have package B using package A using package B. Read this as “Now you have package B dependent on package A dependent on package B.”

Central rule: Packages should have one-way dependencies between each other, at most. That is, B can call A, but A should not call B.
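One way to check the central rule mechanically is a depth-first search over the package dependency graph. The sketch below is only an illustration: the class name PackageCycleFinder and the map-based graph representation are assumptions made for this example, and are not taken from the presentation or from any particular tool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: finds one dependency cycle in a package graph, if any. */
final class PackageCycleFinder {

    /**
     * @param deps maps a package name to the packages it depends on
     * @return the packages forming a cycle, with the first package repeated
     *         at the end, or an empty list if the graph has no cycle
     */
    static List<String> findCycle(Map<String, Set<String>> deps) {
        Set<String> finished = new HashSet<>();
        for (String start : deps.keySet()) {
            List<String> cycle = visit(start, deps, finished, new ArrayList<>());
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return Collections.emptyList();
    }

    private static List<String> visit(String pkg, Map<String, Set<String>> deps,
                                      Set<String> finished, List<String> path) {
        if (finished.contains(pkg)) {
            return Collections.emptyList(); // already fully explored, no cycle here
        }
        int seenAt = path.indexOf(pkg);
        if (seenAt >= 0) {
            // pkg is already on the current path, so we walked in a circle
            List<String> cycle = new ArrayList<>(path.subList(seenAt, path.size()));
            cycle.add(pkg);
            return cycle;
        }
        path.add(pkg);
        for (String dep : deps.getOrDefault(pkg, Collections.emptySet())) {
            List<String> cycle = visit(dep, deps, finished, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        finished.add(pkg);
        return Collections.emptyList();
    }
}
```

For the scenario described above, where B depends on A and A later starts calling B, findCycle returns a three-element list that starts and ends with the same package. With only the original one-way dependency it returns an empty list.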

Why are one-way dependencies between packages so important? Why are circles so undesirable?

Typically, circles are not deliberate; they usually sneak in. Over time, as more programmers touch the code, the original intent or design of the code is lost. Circles can be a sign of code deterioration and say something about the state of maintenance of the code base. Consequences of cyclical dependencies include limited reuse of packages.

Cyclical dependencies tie packages together so that you cannot extract one package for use in another project or code base. The packages become inseparable: neither can be compiled without the other. If you needed to extract some code for another project, you would also need to bring along another package that has nothing to do with the new code base.

Avoid circular dependencies between packages. This is easier said than done, and many times you don't even realize that you have done it. To remove these circular dependencies, creative refactoring is usually required, which creates backwards compatibility challenges. Regardless, always avoid code duplication.

How do you avoid circular dependencies while at the same time avoiding code duplication? Usually this means taking the time to analyze your package architecture to determine the best place for the new code. For example, at first thought you might want to put the code in some deeper package, but after some careful consideration, you determine that the code really needs to be placed in a shallower package.

Module Decomposition and Layering
Modules are conceptual in Java. They are conceptual boundaries within your code base. Generally, modules are a collection of specific packages that collaborate and/or are conceptually related. They might live in separate source directories but do not have to. Some modules might consist of a single package only, while others may consist of many packages. Identifying modules can be challenging.

Typically, modules are driven by deployment needs as much as conceptual boundaries. For example, part of the code might be used by one application deployed on one server, while another part of the code might be used by a second application. Others create modules based on multi-tier separation. However, this is unnatural since it does not match conceptual boundaries.

So the package structure that I use, which I described above, would be considered a package structure based on multi-tier separation, which Juergen states is “unnatural.” Sometimes modules are isolated by specific dependencies, such as JDK 1.5 or Hibernate. However, this is not a conceptual boundary, so it is a rather incorrect use of module separation. Sometimes modules are created to keep jar file size down. Again, this is an incorrect usage.

Desirable characteristics of modules include low coupling to other modules, and high cohesion within the module.

Modules are conceptual units as much as source management and deployment units. Modules should allow for individual usage or a distinct role within a larger system. For example, a new developer on the team, or a new user of your code, should be able to review a module and understand the reasoning behind it and how it is used. A module is thus a complete, implemented concept with clear boundaries.

This sounds ideal. It would be great if, when I was assigned to a new project, I could pull it down, see distinct modules, know what each module's responsibilities were, and know that they did not overlap.

Modules should not have circular dependencies. Therefore module 1 -> module 2 -> module 1 is an undesirable situation.

Layering is essentially a logical view on the package structure. Higher layers build on lower layers. That is, higher level packages depend on lower level packages, not the other way around.

I seem to follow this rule of thumb with my DAO packages, being com.mycompany.myapp.dao and com.mycompany.myapp.dao.hibernate.
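The layering rule can be expressed as a small calculation: a package with no dependencies sits in layer 0, and every other package sits one layer above the highest package it depends on. The following is a minimal sketch under that assumption. The class name PackageLayers is hypothetical, and the dependency edges you feed it (for example dao.hibernate depending on dao) are whatever your own code base contains, not something stated in the presentation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: derives a layer number for each package. */
final class PackageLayers {

    /**
     * @param deps maps a package name to the packages it depends on
     * @return layer per package: 0 for packages without dependencies,
     *         otherwise one more than the highest layer among its dependencies
     * @throws IllegalStateException if the packages form a cycle,
     *         because a cycle has no valid layering
     */
    static Map<String, Integer> assignLayers(Map<String, Set<String>> deps) {
        Map<String, Integer> layers = new HashMap<>();
        for (String pkg : deps.keySet()) {
            layerOf(pkg, deps, layers, new HashSet<>());
        }
        return layers;
    }

    private static int layerOf(String pkg, Map<String, Set<String>> deps,
                               Map<String, Integer> layers, Set<String> inProgress) {
        Integer known = layers.get(pkg);
        if (known != null) {
            return known;
        }
        if (!inProgress.add(pkg)) {
            throw new IllegalStateException("Package cycle involving " + pkg);
        }
        int layer = 0;
        for (String dep : deps.getOrDefault(pkg, Collections.emptySet())) {
            layer = Math.max(layer, layerOf(dep, deps, layers, inProgress) + 1);
        }
        inProgress.remove(pkg);
        layers.put(pkg, layer);
        return layer;
    }
}
```

If com.mycompany.myapp.dao.hibernate depends on com.mycompany.myapp.dao and nothing points the other way, the hibernate package always lands in a higher layer than the dao package. A dependency in the reverse direction would make the calculation fail, which is exactly the situation the rule forbids.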

The module structure might have a straightforward mapping into layers. However, this is not strictly necessary, since modules might be a vertical slice. Modules are often driven by deployment considerations more than layering. That is, modules are usually vertical while layers are usually horizontal.

Establish natural conceptual boundaries in your code base. It does not matter where your code resides. It could be a single shared source root, or one source root per module. It does help if the source code structure mirrors the conceptual structure. This creates a natural package naming system and makes navigation easier.

Evolving a Large Code Base
The hardest challenge is evolving the code as well as the architecture over time without letting the code deteriorate and without compromising on architectural quality. This becomes exponentially harder with growing size of the overall code base. Consider the situation where you have many developers involved and no single point of architectural management or enforcement at the fine-grained artifact or module level.

You are going to see inconsistencies across the code, duplicated code, inconsistent naming, and most likely a decrease in loose coupling and high cohesion.

What is the trade-off between backwards compatibility and architectural quality? Strict 100% backwards compatibility might not allow for sustaining the architectural quality level. Nevertheless, there is always a better solution than compromising on architectural quality. For example, a creative internal refactoring can preserve compatibility as well as well-defined package dependencies.

Certain packages, for example the public API or core packages, can almost never be changed. The code base must revolve around these packages. One solution, if you must change the public API or core, is deprecation but even this is sometimes not possible. Some changes can only be done by breaking backwards compatibility. What do you do in that case? Break backwards compatibility or reduce the integrity of the architecture? The Spring development team is willing to accept a small breakage of backwards compatibility as long as the end result is a strong architecture.

Case Study: the Evolution of Spring

Background of Spring core:

Origins back to early 2001.
First public release was in 2002
First public source release was mid 2003
1.0 release in 2004
2.0 release in 2006, largely compatible with 1.2.

The Spring project has faced many code evolution challenges, including a broad public API used by applications; a sophisticated SPI used by advanced applications as well as sister products and third-party frameworks; and new requirements addressed in every release, often implying some refactoring.

This is a tough situation, and handling it would not have been possible without their straightforward architecture and their strict adherence to the rules: accept small breakages of backwards compatibility as long as the end result is a strong architecture, and, right from the start, no circles allowed at package level, not even as a temporary measure.

How has the Spring code base survived in its original shape for 3.5 years?

Strict architecture management and loosely coupled packages with well-defined interdependencies. Right from the start, no circles were allowed at package level, not even as a temporary measure.

When introducing new code or functionality, great care is taken in deciding where the new code will go.

Challenges
Ever changing third party libraries.
Hibernate 2.1 -> 3.0 -> 3.1 -> 3.2
Resulted in the creation of the package hibernate3
Quartz 1.3 -> 1.4 -> 1.5 -> 1.6
What do you do in case of incompatible API changes in such libraries? The goal is always to try to maintain backwards compatibility, so in many cases Spring developers had to use runtime reflection to determine which library version is in use. With reflection, Spring can check whether a certain method exists; if it does, it is called, and if it does not, it is skipped.
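The "call it if it exists, otherwise skip it" idea can be sketched with the standard java.lang.reflect API. This is not Spring's actual code. The class name OptionalMethodCaller and the callIfPresent method are hypothetical, and the sketch only handles public methods without arguments.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Hypothetical helper: calls a method only if this library version has it. */
final class OptionalMethodCaller {

    /**
     * Invokes a public no-argument method by name if the target's class has it.
     *
     * @return true if the method existed and was called, false if it was skipped
     */
    static boolean callIfPresent(Object target, String methodName) {
        Method method;
        try {
            method = target.getClass().getMethod(methodName);
        } catch (NoSuchMethodException e) {
            return false; // this version of the class lacks the method: skip it
        }
        try {
            method.invoke(target);
            return true;
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("Could not invoke " + methodName, e);
        }
    }
}
```

The caller compiles against neither version of the library method, so the same compiled code runs whether or not the method is present at runtime. The cost is that the compiler can no longer catch a misspelled method name.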

Tools for Architectural Analysis
How do we make sure that our architecture remains sound and avoids architectural violations? The best thing to do is to always keep these restrictions in mind, but even this can only get you so far.

Manual analysis only gets you so far; it is like manual testing vs. automated, repeatable testing. The only way to guarantee a sound architecture is to use tools. Since 2003, Spring has been using JDepend before every public release. They have also recently introduced the SonarJ tool.

JDepend traverses Java class file directories and generates design quality metrics for each Java package. JDepend allows you to automatically measure the quality of a design in terms of its extensibility, reusability, and maintainability to manage package dependencies effectively.

JDepend is an open source tool that has been around since 2001. It is typically used as a command line tool. It can generate an analysis report which, among many other things, includes package dependency cycles.
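Among the per-package metrics that JDepend reports are afferent coupling (Ca, the number of other packages that depend on a package), efferent coupling (Ce, the number of other packages a package depends on), and instability, I = Ce / (Ca + Ce). The sketch below computes these three numbers from a hand-written dependency map. It is a simplified illustration and not JDepend's implementation or API: the CouplingMetrics class is hypothetical, and it takes a ready-made map rather than reading class files.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: coupling metrics over a package dependency map. */
final class CouplingMetrics {

    /** Efferent coupling (Ce): how many other packages this package depends on. */
    static int efferent(String pkg, Map<String, Set<String>> deps) {
        int count = 0;
        for (String dep : deps.getOrDefault(pkg, Collections.emptySet())) {
            if (!dep.equals(pkg)) {
                count++;
            }
        }
        return count;
    }

    /** Afferent coupling (Ca): how many other packages depend on this package. */
    static int afferent(String pkg, Map<String, Set<String>> deps) {
        int count = 0;
        for (Map.Entry<String, Set<String>> entry : deps.entrySet()) {
            if (!entry.getKey().equals(pkg) && entry.getValue().contains(pkg)) {
                count++;
            }
        }
        return count;
    }

    /** Instability I = Ce / (Ca + Ce); 0 means stable, 1 means unstable. */
    static double instability(String pkg, Map<String, Set<String>> deps) {
        int ce = efferent(pkg, deps);
        int ca = afferent(pkg, deps);
        return (ca + ce) == 0 ? 0.0 : (double) ce / (ca + ce);
    }
}
```

A package that many others rely on and that relies on nothing itself scores 0, which marks it as a core package that is expensive to change. A package at the top of the layering that nothing else depends on scores 1.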

SonarJ is an innovative solution that helps you to manage and monitor the logical architecture and the technical quality of your Java projects.

SonarJ is a commercial tool, is GUI-driven, and is highly customizable. SonarJ also allows for on-the-fly analysis.

The presentation ends with a demonstration using SonarJ to analyze Spring.

I was about to go check out JDepend, when I realized that the IDE that I use, IntelliJ, contains cyclic dependency and backwards compatibility tests via the Analyze menu. Fellow IntelliJ users should see http://www.jetbrains.com/idea/features/code_analysis.html#Dependencies_Analysis and http://blogs.jetbrains.com/idea/2006/04/analyzing-code-dependencies-part-i/
