Sunday 3 January 2016

Lessons learned

The project I've been working on since May, 2012, went live September 14, 2015. It's an ERP system for a sales company, which specializes in industrial HVAC rep sales (where you 'represent' the manufacturer). It is nice to announce the deployment of a 100% Smalltalk application, built with VisualWorks, GemStone and Seaside.

Our users are happy, mostly. They want more features, and they want them sooner than later. Not a bad place to be.

Personally, it's  been both a rewarding & frustrating project. Rewarding because I get to work for a far-sighted company that sees the value of a custom application, and can deal with the risks of using a niche technology. Frustrating because it could have been done better (which, I suspect, is true of just about any project).

The past couple of years have been a head down, ignore everything else, focused effort. We've done some interesting things, many of which I had hoped to share, but there never seems to be any time; work takes it all.

So, with the benefit of hindsight, here is what I've learned...

Have a project champion
Our project champion is one of the founders of the company. Without him risks would not have been taken, and the project would not have happened. We replaced a 20+ year old custom system that was also championed by the same person. He, and the company, believe that a custom ERP provides a competitive advantage. The old software proved it, which made selling the idea of a new system easier.

Smalltalk productivity rocks
Total effort was about 16 to 18 person years (our team size varied from 3 to 5 over 3.5 years). Compare that with effort to deploy something like SAP, and we look good. Our team's productivity will really shine as new features and customizations are rolled out over the next couple of years.

Expect a long tail of trivial things
What really stands out is how much time we spent (and continue to spend) on the little things. It tends to be boring, almost clerical work. But it's what users notice. Font sizes, colours, navigation sequences, default values, business rule adjustment... nothing intellectually challenging.

The beginning of the project was fun: figuring out how to use thousands of VW window specs in a Seaside application, including modal dialogs and dynamically morphing views. Finding ways to hold complex updates prior to a Save / Cancel decision. Building a new report framework that allows for edits and generates PDF content. Adding application permissions. Implementing a RESTful web-to-GS mechanism. And so on. All good stuff, but mostly done.

Pay your technical debt early
Looking back, we would have been better off not trying to preserve the ecosystem of the old fat client framework (the idea was to keep most of the domain code as is). Instead, we should have started from scratch, using the old system as a spec. The old framework was garbage. We knew it, but thought the technical debt could be managed. It was, but at a cost. We now know where we spent our time; it's evident switching to new code earlier would have allowed us to be deployed earlier.

If you see garbage code, be ruthless and get rid of it. Bad code is like a bed bug: it will keep biting until properly exterminated.

Show progress
Users need to see progress. And developers need feedback. We hit the jackpot with our beta users: they were willing to put up with a lot of early unfinished code. It gave them a view of what was to come, and they communicated that to the rest of the company. Our project dragged on a bit, but they saw progress, which made the delays palatable.  

Have clear metrics
It's so easy to get caught up in the moment, and to work on what is of interest right now, because that is what users see. But if you do that, you'll forget about the long term, and the important internal stuff just won't get done. If developers are not measured on the long term deliverables, there is little incentive to work on them.

Make long term metrics just as visible as short term ones. Break them down and make them part of each iteration, even if they are obscure and of no interest to the end users. It will be frustrating, You'll get asked "why are you working on that and not the feature I'm waiting for?". But they'll be far more aggravated if the application is not reliable. It's like backups: you don't notice their absence until you need them. Be sure they see the value of the boring internal tech stuff.

Use agile development 
We release a new production version every two weeks, with minor changes published twice each week. Developers merge their code every couple of hours. All new code is expected to have an SUnit test. The full test suite is run each night with Jenkins and keeping test green is the first developer priority. We pair up for tricky problems. Refactoring is considered to be a 'technical investment'. All changes are tracked (we have a nifty issue management tool).

Reviewing the process is part of the process. We adjust things almost every week. It's not easy, but getting to a smooth productive rhythm is so worth it.

What next? Mobile web interfaces, an Android app (we can do that with Pharo once a VM is ready), moving to a GS Seaside interface (need to port PDF4Smalltalk to GS), and a lots of small stuff.

Simple things should be simple. Complex things should be possible. - Alan Key


  1. Great post!

    Can you share some examples of metrics you used? :)

  2. For quality, we use Jenkins to report on the number of SUnit tests we have (which we expect to be increasing) and the number of failed test (the fixing of which is a top priority). Each modification & problem fix is expected to have a documented test, either an SUnit test or some manual test sequence.

    For features we use our Issue Library to track our backlog, which, now that we're live, keeps increasing with new user requests. Our application will be mature when the list starts to steadily go down.

    For performance, we track the time each GS call takes, and generate a rolling four hour histogram. A quarter are under 5ms, the remaining 95% are under 300 ms. We aggregate these into weekly stats.

    For code, we keep a method and class count. And we count the number of legacy methods (by name pattern) that remain to be removed. Our daily Jenkins driven VW image build includes a code critic report, along with a number of application meta-data metrics.