# Foreword

I agree with you that this whole thing sounds like a daunting - but fascinating - task, and that there's a lot of ground to cover. So I'm humbly going to suggest what I think could be a rather comprehensive guide for your team, with pointers to appropriate tools (and alternatives) and appropriate reading or educational material to share.

<sub>_I'm planning a rather large answer to this, and this is only a work-in-progress at the moment, so check back again later for more, as I won't have time to finish it now. Apologies for that._</sub>

-----

# Executive Summary for the Impatient

* Define a **rigid project structure**, with:
    * **project templates**,
    * **coding conventions**,
    * familiar **build systems**,
    * and sets of **usage guidelines** for your infrastructure and tools.
* Install a good **SCM** and make sure they know how to use it.
* Point them to good **IDEs** for their technology, and make sure they know how to use them.
* Implement **code quality checkers** and **automatic reporting** in the build system.
* Couple the build system to **continuous integration** and **continuous inspection** systems.
* With the help of the above, identify **code quality "hotspots"** and **refactor**.

_Now for the long version... Caution, brace yourselves!_

----

# Rigidity is (Often) Good

_This is a rather controversial opinion, as rigidity is often seen as a force working against you and slowing you down. It's true for some phases of some projects. But once you see rigidity as a structure, a framework that takes away the guesswork, it greatly reduces the amount of wasted time and effort. Make rigidity work for you, not against you._

## Rigidity of the Project Structure

If each project comes with its own structure, you are lost and need to start from scratch every time you look at it, and the same applies to each newcomer. You don't want this in a professional software engineering shop, and you don't want this in a research lab either.

## Rigidity of the Build Systems

As mentioned above, if each project **looks** different, there's a good chance each also **builds differently**. A project's build shouldn't require too much research or too much guesswork. In general, you want to be able to do the canonical thing and not need to worry about specifics: `configure; make install`, `ant`, `mvn install`, etc... A quick `README` at the root can point to the things that differ, but that's all there should be (in an ideal world).

Plus, this also greatly facilitates other parts of your build infrastructure, namely:

* [continuous integration][1],
* [continuous inspection][2].

It also helps to ensure that all projects are built to the same level of quality, by re-using the same build system for all of them and making it evolve over time. Not only do you keep it (and all your projects) up to date, you also make it stricter over time, and more efficient at reporting potential mistakes and enhancements. Do not reinvent the wheel for each project: reuse what you have already done.

**Recommended Reading:**

* _[Continuous Integration: Improving Software Quality and Reducing Risk][3]_ (Duvall, Matyas, Glover, 2007)
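To make the last two sections concrete, here is a minimal sketch of what a shared project template could look like. I'm assuming the standard Maven layout here (Maven comes up again later in this answer), but the specific convention matters far less than every project sharing it:

```
my-project/
|-- README         <- the one place to document anything non-canonical
|-- pom.xml        <- standard build: `mvn install` just works
`-- src/
    |-- main/
    |   |-- java/         <- production code
    |   `-- resources/    <- production configuration and assets
    `-- test/
        |-- java/         <- unit tests (see "Rigidity in Testing" below)
        `-- resources/    <- test fixtures
```

Whatever the layout, the point is that a newcomer should be able to check out any project in the lab and immediately know where everything lives and how to build it.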
## Rigidity in the Choice of Programming Languages

You probably can't expect, especially in a research environment, to have all teams (let alone all individual developers) use the same language and technology stack. However, you can identify a set of "officially supported" languages and frameworks, and encourage their use. Other languages shouldn't be permitted beyond prototyping without a good rationale. It is essential to keep your build system simple, and to keep the maintenance and breadth of required skills down to a bare minimum: a core set of technologies and tools.

## Rigidity of the Coding Conventions and Guidelines

Coding conventions and guidelines are what allow you to develop both an identity as a team and a shared _lingo_. You don't want to err into _terra incognita_ every time you open a source file. There's no use trying to enforce nonsensical rules that make things harder, or forbidding things to the extent that commits would be refused based on a single violation. However, it takes away a lot of the whining and of the thinking if you identify a clear, concise set of ground rules that **nobody** should break under any circumstances, and a set of recommended rules that people are advised to follow.

I am fairly aggressive when it comes to coding conventions, some even say _nazi_ <sup>(without wanting to offend anyone with the evocation)</sup>, because I do believe in having a _lingua franca_ and a recognizable style for my team. When crap code gets checked in, it stands out like a cold sore on the face of a Hollywood star, which helps you to identify that a quick review and action are required. In fact, I've sometimes gone as far as to advocate the use of pre-commit hooks to reject commits that do not satisfy some common rules (a sketch of what such a hook could check follows below). As mentioned before, it shouldn't be overly crazy and get too much in the way, especially as you try to introduce these measures. But it may well be worth it if you spend so much time reviewing and dealing with crap code that you can't work on real issues.

Some languages enforce some rules by design. Java was meant to reduce the amount of dull crap you can write with it (though no doubt it can be done, as evidenced here and on SO), for instance. Python's block structure by indentation is another idea in this sense. So is the Go programming language with its `gofmt` tool, which completely takes any styling work - **and ego!!** - out of the coding effort: if you run it before every commit, things are sure to always look fine to everybody.

Be sure to make it so that **critical code gore** cannot slip through. **Code conventions**, **continuous integration** and **continuous inspection**, and **pair programming** and **code reviews** are your best weapons against this demon.

Plus, as you'll see below, **code is documentation**, and that's another area where your conventions should encourage proper readability and clarity.
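Here is that sketch: a deliberately tiny convention checker of the kind a pre-commit hook could delegate to. It is only an illustration - the two rules and the file-arguments interface are made up for the example, and in practice you would reach for an existing checker (Checkstyle, for instance) rather than rolling your own:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/**
 * A minimal convention checker: scans the files it is given (e.g. the files
 * staged for a commit) against two of the team's hypothetical ground rules.
 */
public class ConventionCheck {

    private static final int MAX_LINE_LENGTH = 120; // one of the "ground rules"

    public static void main(String[] args) throws IOException {
        int violations = 0;
        for (String file : args) {
            List<String> lines = Files.readAllLines(Paths.get(file));
            for (int i = 0; i < lines.size(); i++) {
                String line = lines.get(i);
                if (line.contains("\t")) {
                    System.err.printf("%s:%d: tab character (use spaces)%n", file, i + 1);
                    violations++;
                }
                if (line.length() > MAX_LINE_LENGTH) {
                    System.err.printf("%s:%d: line longer than %d chars%n", file, i + 1, MAX_LINE_LENGTH);
                    violations++;
                }
            }
        }
        // A non-zero exit status is all a calling hook needs to reject the commit.
        System.exit(violations == 0 ? 0 : 1);
    }
}
```

A real pre-commit hook would simply run this (or, better, an off-the-shelf checker) against the changed files and veto the commit when the exit status is non-zero.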
## Rigidity of the Documentation

Documentation goes hand in hand with code. Code itself can be documentation. But there must be clear-cut instructions on how to build things, how to use things, and how to maintain things.

Using a single point of control for documentation (like a WikiWiki or DMS) is a good thing. Create separate spaces for projects, and separate spaces for more random banter and experimentation. Make sure that each of these spaces reuses a set of common rules, and that people take care to follow them when editing. In fact, most of the instructions that apply to code and tooling here apply to documentation as well.

### Rigidity in Code Comments

Code comments, as mentioned above, are also documentation. Developers like to express their feelings about their code (mostly pride and frustration, if you ask me). So it's not unusual for them to express these in no uncertain terms in comments (or even code), when a more formal piece of text could have conveyed the same meaning with fewer expletives or less drama. It's OK to let a few slip through for fun and historical reasons: it's also part of **developing a team culture**. But it's very important that everybody knows what is acceptable and what isn't, and that comment noise is just that: **noise**.

### Rigidity in Commit Logs

Commit logs are not that annoying part of an SCM's usage lifecycle that you just need to skip to get home on time, to get on with the next task, or to catch up with the buddies who left for lunch. They matter, and, like (most) good wine, the more time passes, the more value they have. So make sure they are done right. I'm always flabbergasted when I see co-workers writing one-liners for giant commits, or for non-obvious things.

All commits are done for a reason, and that reason may not be clearly expressed in the one line of code you added and the one line of commit log you entered. There's more to it than that. **Each line of code has a story, and a history**. The diffs can tell its history, but you have to write its story.

> Why did you need to update this line? Because the interface changed.
>
> Why did the interface change? Because the library that provides it
> was updated.
>
> Why was this library updated? Because it's a dependency of another
> library that we needed to implement feature X.
>
> And what's feature X? All about it is in `TASK_KEY_HERE`.

Git actually gets this right, in that it is more geared towards providing good logs than any other SCM. It's not my SCM of choice, and not necessarily the best one for your lab either, but it gets this right. It lets you provide a short log and a long log. Leave the general update to the `shortlog`, with the reference task IDs to link to your issue tracker (yes, you need one), and expand in the long log. Write the changeset's **story**.

<sub>For crying out loud, if you can do it on a blog, you can do it in a log. It's the same origin for (We)Blogs, after all: just keeping track of things.</sub>

Really ask yourself the question:

> If I were searching for something about this change later, would
> this log answer my questions?

### Documentation and Code, and Projects as a Whole, Are ALIVE

You need to keep them in sync, otherwise they no longer form that symbiotic entity. That's why it works wonders when you have:

* clear commit logs in your SCM, with links to task IDs in your issue tracker,
* where this tracker's tickets themselves link to the changesets in your SCM, and possibly to the builds in your continuous integration system,
* and a documentation system that links to all of these.

Code and documentation need to be cohesive.

## Rigidity in Testing

Any new code shall come with (at least) unit tests. Any refactored legacy code shall come with unit tests. Period. Of course, these tests need to actually test something valuable, and not be just a waste of time and energy. They need to be well written and commented, just like any other code you check in. They are documentation as well, and they help to outline the contract of your code. Especially if you use [Test Driven Development][4]. But even if you don't, you need them for your peace of mind. They are your safety net for the future (for maintenance, for future enhancements) and your antibiotic against normal code rot. And of course, you should go further and have [integration tests][5], and [regression tests][6] for each reproducible bug you fix.
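To show what "tests as documentation" can look like, here is a minimal JUnit 4 sketch. The class under test is hypothetical and deliberately trivial; the point is that each test name spells out one clause of the contract:

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// The class under test: hypothetical, kept trivial for the example.
class SampleNormalizer {
    String normalize(String label) {
        if (label == null || label.trim().isEmpty()) {
            throw new IllegalArgumentException("empty sample label");
        }
        return label.trim().toLowerCase();
    }
}

// Each test name states one guarantee that future maintainers can rely on.
public class SampleNormalizerTest {

    private final SampleNormalizer normalizer = new SampleNormalizer();

    @Test
    public void normalizeTrimsSurroundingWhitespace() {
        assertEquals("ph 7.4", normalizer.normalize("  ph 7.4  "));
    }

    @Test
    public void normalizeLowercasesLabels() {
        assertEquals("ph 7.4", normalizer.normalize("PH 7.4"));
    }

    @Test(expected = IllegalArgumentException.class)
    public void normalizeRejectsBlankInput() {
        normalizer.normalize("   ");
    }
}
```

When these run green they read like a specification; when a refactoring turns them red, that's the safety net doing its job.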
## Rigidity in the Use of the Tools

Sure, it's OK for the occasional developer/scientist to want to try some new static checker on the source, generate some graph or model using another tool, or implement a new module using a DSL. But it's best if there's a canonical set of tools that **all** team members are expected to know about and to use.

I regard it as generally OK to **recommend** a default working environment with these tools, but to let each developer use their IDE or editor of choice, **as long as they are productive** AND **do not require regular assistance** to adjust to your general infrastructure AND **do not modify the common areas (code, build system, documentation...) in ways that affect other developers**. If that's not the case, then it's fair to enforce that they fall back to your defaults.

**Note:** Of course, some flexibility is good. Letting someone occasionally use a shortcut, a quick-n-dirty approach, or a favorite pet tool because it **gets the job done** is fine... But **never** let it become a habit, and don't let this snippet of code or prototype become the actual codebase to support.

----

# Team Spirit Matters

### Develop a Sense of Pride in Your Codebase

* Develop a sense of pride in code.
* Use wallboards:
    * a leader board for a continuous integration game,
    * wallboards for issue management and defect counting.
* Use an [issue tracker][7] / [bug tracker][8].

### Avoid Blame Games

* DO use Continuous Integration / Continuous Inspection games: it fosters good-mannered and [productive competition][9].
* DO keep track of defects: it's just good housekeeping.
* DO **identify root causes**: it's just future-proofing your processes.
* BUT DO NOT [assign blame][10]: it's counter-productive.

### It's About the Code, Not About the Developers

The whole point is to make developers conscious of the quality of their code, but to have them see it as a detached entity and not as an extension of themselves (and react badly when a part of this extension is criticized). Encourage [ego-less programming][11] for a healthy workplace, but do rely on ego for motivation.

----

# From Scientist to Programmer

You can't expect people who do not value and take pride in code to produce good code. They need to discover how valuable (and fun) it can be, for this property to emerge. Sheer professionalism and the desire to do good are not enough: good code needs passion. So you need to turn your scientists into **programmers** (in the large sense).

-----

# Code Maintenance is Part of Research Work

Nobody wants to read a crappy research paper. Papers are proof-read, refined, rewritten, and re-submitted for approval countless times until they reach the final state that's deemed good enough for publication. The same applies to a thesis. And **the same applies to a codebase!** You want to make it clear that constant refactoring and refreshing of a codebase is what prevents code rot and technical debt, and what facilitates future re-use and adaptation of the work for other projects.

----

# Why All This??!

In the end, why do we need all of the above? For the Holy Grail: **code quality**. Or is it **quality code**...? All of the above aims at driving your team towards this goal. Some aspects of it do so by getting them to genuinely want it themselves (which is much better), and others by gently taking them by the hand (but that's how you educate people and develop habits).
But how do you know if you have found the Holy Grail, and not some cheap knock-off (which might make you turn to dust quickly, which is unpleasant)?

## Quality is Measurable

Not always quantitatively, but it **is measurable**. As mentioned above, you need to develop a sense of pride in your team(s), and showing progress and good results is key. Measure code quality at a point in time, and show the progress between intervals. Show how it matters. Do retrospectives to reflect on what has been done, and how it made things better or worse.

There are great tools out there for **continuous inspection**. [Sonar][12] is one of them - quite popular in the Java world, but adaptable to other technologies - and there are many others. Keep your code under the microscope and look for these pesky annoying bugs and microbes.

----

# But What if My Code is Already Crap??

Of course, all of the above is fun and cute like a trip to Never Land, but it's not that easy to do when you already have (a big pile of steamy and smelly) crap code. Here's the secret: **you need to start somewhere**.

> **Personal anecdote:** In our current project, we are working with a
> codebase that originally was more than 650,000 lines of Java code,
> more than 200,000 lines of JSPs, more than 40,000 lines of
> JavaScript, and more than 400 MBs of binary dependencies on
> external projects and libraries.
>
> Today, after about 18 months, we have 500,000 lines of **(MOSTLY
> CLEAN)** Java code, around 150,000 lines of JSPs, and still about
> 38,000 lines of JavaScript, and our dependencies are down to barely
> more than 100 MBs (and these dependencies are not in our SCM
> anymore!).
>
> **How did we do it?** _We just did all of the above. Or we try to._
>
> It's a huge team effort, but we slowly **inject** new regulations
> and new tools that help us to monitor the heart-rate of our product,
> while we hastily **slash** away the fat of the crap code and useless
> dependencies we can find. We didn't stop all development to do
> that. We have occasional periods of relative peace and quiet where
> we are more or less free to go crazy on the codebase and tear it
> apart, but most of the time we just do it all by defaulting to a sort
> of "review and refactor" mode every chance we get: when things
> build, over lunch, during team bug fixing sessions, when Friday
> afternoons get drowsy...
>
> We did have a few big construction sites... Switching our build
> system from a giant Ant build of more than 8,500 lines of code to a
> multi-module Maven build was one of them. We now have clear-cut
> modules (or at least it's already a lot better than before, and we
> still have big plans for the future), automatic dependency
> management (which allows for easy maintenance and updates, and
> allowed us to remove lots of them), and faster builds that are easier
> to get started with, to reproduce on demand, and to integrate
> with code quality tools.
>
> Injecting some "utility tool-belts" into the codebase, even though
> we were trying to reduce dependencies, was another: Google Guava and
> Apache Commons can help your code slim down to a much smaller size,
> and greatly reduce the surface for bugs in **your** code.
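> To give a (contrived, not-from-our-codebase) taste of what that
> slimming looks like, compare a hand-rolled, null-aware string join
> with the Guava one-liner that replaces it:
>
> ```java
> import com.google.common.base.Joiner;
>
> import java.util.Arrays;
> import java.util.List;
>
> public class SlimDown {
>     public static void main(String[] args) {
>         List<String> names = Arrays.asList("alice", null, "bob");
>
>         // Before: hand-rolled joining with manual null handling - bug-prone boilerplate.
>         StringBuilder sb = new StringBuilder();
>         for (String name : names) {
>             if (name == null) continue;
>             if (sb.length() > 0) sb.append(", ");
>             sb.append(name);
>         }
>         String before = sb.toString();
>
>         // After: the same intent, stated once.
>         String after = Joiner.on(", ").skipNulls().join(names);
>
>         System.out.println(before.equals(after)); // true
>     }
> }
> ```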
> Persuading our IT department that the tools we use today (JIRA,
> Fisheye, Crucible, Confluence, Jenkins) might be better than the
> ones in place was yet another. We still need to deal with a few
> tools we despise (I'm looking at you, QC, Sharepoint and
> SupportWorks), but it's still been a huge improvement, and we
> believe there's still room for more.
>
> And every day, there's a trickle of anywhere from one to dozens of
> commits that deal only with fixing and refactoring things. We do
> occasionally break stuff (remember kids: you need unit tests, and
> you'd better write them **before** you refactor stuff away), but
> overall the benefit for our morale AND for the product has been
> enormous. We get there one fraction of a code quality percentage at
> a time. **And it's fun to see it increase!!!**

_It's indeed important to note that every once in a while, rigidity (here, that of our IT department and of other development teams in the company) needs to be shaken to make room for new and better things. But you need to prove that they are indeed better and will boost your productivity. That's what trial runs and prototypes are for._

## The Iterative Spaghetti Code Refactoring List

Once you have some quality tools in your toolbelt:

1. Run the checkers.
2. Identify the hotspots.
3. Fix critical hotspots and violations first.
4. Fix minor violations whose fixes can be automated, in one large sweep. <sup>(It reduces noise, so you are able to see significant violations when they appear on the radar.)</sup>
5. Go back to 1 and repeat until you're satisfied with your code. <sup>(Which, ideally, you should never be, if this is still an active product.)</sup>

[1]: http://en.wikipedia.org/wiki/Continuous_integration
[2]: http://www.ibm.com/developerworks/java/library/j-ap08016/index.html
[3]: http://www.amazon.com/Continuous-Integration-Improving-Software-Signature/dp/0321336380
[4]: http://en.wikipedia.org/wiki/Test-driven_development
[5]: http://en.wikipedia.org/wiki/Integration_testing
[6]: http://en.wikipedia.org/wiki/Regression_testing
[7]: http://en.wikipedia.org/wiki/Issue_tracking_system
[8]: http://en.wikipedia.org/wiki/Bug_tracking_system
[9]: http://www.codinghorror.com/blog/2009/05/how-to-motivate-programmers.html
[10]: http://programmers.stackexchange.com/questions/83038/who-is-responsible-for-defects-found-during-development
[11]: http://www.codinghorror.com/blog/2006/05/egoless-programming-you-are-not-your-job.html
[12]: http://www.sonarsource.org/