Friday 1 November 2013

The Trouble with Scrum

Scrum. An iterative and incremental agile software development framework. It's full of buzzwords. It frees us from the tyranny of waterfall development (not that waterfall ever really existed anywhere anyway). It's based on the premise that the customer doesn't know what they want; we iterate quickly and deliver every sprint. We communicate with the customer often, inspect and adapt, and we build what the customer wants.

Sorted. We fired the silver bullet and scored a direct hit.

Well no. Some things don't fit into our nicely delineated sprints. Where does user experience fit into a two-week sprint? Where does architecture fit? Where does overall product quality fit in?

Stories that we can’t estimate are known as epics. According to Scrum terminology, there’s nothing intrinsically wrong with epics, as long as they aren't high priority (!). We can’t directly work on epics. We can't put a story like "Fix User Experience" on the board. Project managers would go insane, blood would be spilt.

So what do we do? Well, if your experience is anything like mine, we try to break the story up into smaller bits. Perhaps we break "Great User Experience" into pithy little tasks such as "Move Button" and "Improve Dialog". Maybe the grand architectural rework is broken down into a small prod in the right direction: "Extract class for FooBar" or "Break X and Y Dependency".

Do these small tasks make sure that we get the best user experience? Do these tasks make sure that we've got an architecture to support the needs of the code over the next few months?

Of course not.

How do we make sure that cross-cutting concerns like user experience, quality and architecture are given adequate attention in an iterative development environment? I’m not sure I have the answer (I'm not sure that anyone does), but I have a suggestion.

A clearly communicated vision.

It doesn't matter whether it's user experience, product quality or software architecture. A clearly communicated vision gives you a tool for making the right decisions as you build software.

Am I suggesting the dreaded "Big Design Up Front"? No! It doesn't need all the minutiae, just enough to navigate in the right direction. You might say, Just Enough Design Upfront.

Wednesday 30 October 2013

Not all names are created equal

I think everyone agrees that naming things is one of the hardest things we do. Books like Clean Code devote whole chapters to naming. Names should convey meaning so that the next person reading the code has an easier job understanding what it does. After all, we read code far more than we write it. It's definitely OK to spend some time arguing about the right name. It's important.

So that's it. Names are important. Job done? Of course not! There's more to the story than that.

At Agile Cambridge 2013, I attended a session (Unpicking the Haystack) where the source code was only available as decompiled byte code (some sad story involving not using version control, not backing up and all the other things that no-one ever does). Our task was to recover what the original program actually did. When we're looking at decompiled code, almost all the naming information has gone. By the time you've gone from source code to binary and back to source code, you've lost the variable names. Unsurprisingly, trying to decipher what the code does with local variables such as a1 to a999 is very hard.

With variable names gone, we have to look for other clues for programmer intent. So what else is there? Well, it certainly helps that public methods aren't lost. In this respect, method names are more important to get right than variable names. The naming is stickier. But something far more important gives us even more clues about this mystery code base.

Enter types. Decompilation reveals the names of public types, and a type name can convey much more information than a variable name. For example, string s reveals little, whereas URL s reveals much more. If we're disciplined followers of domain-driven design then our types align with the problem we are solving. I'd say that types sit right at the top of the most-important-things-to-name-correctly hierarchy.

In this view of decompiled code, some names are more important than others. Parameter names and local variables are least important, whereas type names are the most important (with methods a close second).

Coming at names from decompiled source is certainly a weird way to do it, but this seems to fit with Bob Martin's guidance on name length.



I'd like to try to reinforce the view that types are by far the most important thing to get right. Crisply named abstractions matter more than almost anything else. To explore this area, we'll look at a strongly-typed static language, Haskell, and explore just enough syntax to understand its types. But first...

What is a type? A type is a label that describes the properties of all objects that are instances of that type. If you see string in C#, you know you are getting an immutable sequence of characters with certain methods available. If you see an AbstractSingletonFactoryVisitorBean then you know you've got problems. I'm kidding.

Anyway, back to sensible types. Types describe program behaviour. Don't believe me? Let's begin our detour into Haskell:


-- Whenever you see "::" replace it with "is of type"
-- When you see a capital letter variable then you've got a type
-- add5 is of type Int -> Int (takes an Int, returns an Int)
add5 :: Int -> Int
add5 x = x + 5

-- Parameters are separated by ->
-- For the purposes of this, let's just say the last one is the return
-- type and the rest are the arguments
-- add is of type Int -> Int -> Int (two Int arguments, returning an Int)
add :: Int -> Int -> Int
add x y = x + y

-- Generics are represented with lower-case letters
-- middle takes three generic parameters (a, b, c) and returns a b
middle :: a -> b -> c -> b
middle x y z = y


Let's look at that last one again. middle :: a -> b -> c -> b. From the name we might guess that it returns the middle argument (e.g. middle 1 2 3 returns 2). Is there any other definition of what the function could do? In Haskell there's no such thing as type-casting; if all I know is that something could be any type, there aren't many options. I can't add anything to it. I can't convert it to a string. In fact, I can't do anything with it other than return it. The types don't let me. Types constrain the implementation choices to a more sensible subset.

Do the names matter? We know that the argument x has type a. Is there a more descriptive name? Probably not: from the type we have no idea what properties hold, so a long descriptive name is just wasting space. For all we know, the argument could be a function. Or it could be a monad. What are you going to call it?

Is the method name important? It's definitely nice to have a good name, but is it essential? If I gave you quux :: a -> b -> a I'm betting you could tell me what it does?
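
In fact, with that signature there's essentially only one total implementation, so here's a sketch of it (quux is of course a made-up name):

-- quux can't inspect arguments of unknown type, so all it can do is return the first one
-- (the Prelude already has this function; it's called const)
quux :: a -> b -> a
quux x _ = x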

In fact, armed with just a little knowledge about types you can start to infer what functions do without even needing to see their definition. Here's a few random functions with really poor names; what do they do?


bananaFactory :: a -> a

-- (a,b) is a tuple of two elements of type a and type b
spannerBlender :: (a,b) -> a

-- (a -> b) is a function taking anything of type a and returning type b
-- [a] is a list of items of type a
omgWTF :: (a -> b) -> [a] -> [b]

-- "Num a =>" says a must be an instance of the Num typeclass
-- think of this as specifying an interface
-- boing is a function taking two numbers and returning a number
boing :: (Num a) => a -> a -> a

-- m is a type constructor that takes an argument of any type a
mindBlown :: (a -> b) -> m a -> m b


Armed with this basic knowledge of reading Haskell type signatures, you're now equipped to use Hoogle. You can search for the type signatures given above (a -> a, (a,b) -> a, (a -> b) -> [a] -> [b] and (a -> b) -> m a -> m b) and get a good idea of what these functions do.
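
(Spoiler, in case you'd rather not search: here's roughly what Hoogle points you at.)

-- a -> a                    is id
-- (a,b) -> a                is fst
-- (a -> b) -> [a] -> [b]    is map
-- (a -> b) -> m a -> m b    is fmap (with a Functor constraint on m)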

So that's why I think long variable names are less common in functional programming. It's because the languages are terser (Uncle Bob's rule still applies) and because the type signature gives you the power of reasoning, not the variable names.



Names are important; but not all names are equally important.

Thursday 10 October 2013

Software Architect 2013 Day #2

What's wrong with current software architecture methods and 10 principles for improvement

Tom Gilb showed us a heck of a lot of slides and tried to convince us that we must take architecture seriously. I don't disagree with this, our industry could definitely do with a bit more rigour. Tom was very forthright in his views, and I appreciated his candour.

The system should be scalable, easily customizable and have a great user interface

That's a typical "design constraint" that we're probably all guilty of saying. This is nothing more than architectural poetry (putting it politely) or complete and utter bullshit. In order to take architecture seriously we should measure. Architecture is responsible for the values of the system. We should know these values and be able to measure them. If a given architecture isn't living up to these values, we should replace it with something that does. Architecture exists solely to satisfy the requirements.

Real architecture has multi-dimensional objectives and clear constraints, and it estimates the effects of changes. Pseudo-architecture has no dedication to objectives and constraints, no idea of the effects, and no sense of the relationship between the architecture and the requirements.

If we're going to take architecture seriously, then we need to start treating it as engineering. We must understand the relationship between our architecture and the requirements of the system. We must demonstrate that our architecture works.

And then the wheels came off.

I don't work with huge systems, but I can clearly see that understanding the relationship between an architecture and the requirements is a good thing. Unfortunately, Tom presented examples from a domain that was unfamiliar to me (300 million dollar projects). In the examples, suspiciously precise percentages were shown (302%). At that point, I lost the thread. Estimates are just that, and if experience has taught me anything, it's that estimates have HUGE error bars. I didn't really see how all that planning up front led to a more measurable design. I've got a copy of Tom's book, Competitive Engineering, so hopefully I can fill in the blanks.

Building on SOLID foundations

Nat Pryce and Steve Freeman gave a thought-provoking presentation entitled "Building on SOLID foundations" which explored the gap between low-level detail and high-level abstractions.

At the lowest level we have guidelines for clean code, such as the SOLID principles. At this level, it's all about the objects, but not about how they collaborate and become assembled into a functioning system. Even with the SOLID principles applied, macro-level problems occur (somehow all related to food metaphors), colourfully referred to as "ravioli code": the individual blocks are well organized, but as a whole it still looks like a mess. "Filo code" is code with so many layers you can't tell what's going on. "Spaghetti and meatballs" code is an application with a good core, but the communication glue surrounding it is a huge mess.

At the highest level we have principles such as Conway's Law, Postel's Robustness Principle, CAP, end-to-end principle and REST.

But what's in the middle?

In the middle there are some patterns, such as Cockburn's Hexagonal Architecture, that help us structure systems as an inner domain language surrounded by specific adapters converting that data to the needs of the client. The question remains though: what are the principles between low and high level design?

Nat and Steve assert that compositionality is the principle for the middle. We should adopt a functional-style approach and build a series of functions operating on immutable data in a stateful context. That sounds complicated, so what does code written in this style look like? Hamcrest gives us some examples, where by using simple combinators (functions that combine data) you can build up complicated expressions from simple operations (see the examples).

Having done a fair bit of Haskell I found it really easy to agree with this point of view. When there's no mutable state you can reason about code locally (without checking for mutation). Local reasoning means that I can understand the code without jumping around. This is a hugely important part of a well-designed system.

I was slightly concerned to hear this style of programming described as Modern Java. I hope it's not, because using Java like this feels like putting lipstick on a pig. One of the things I value in Haskell is that composition is a first-class citizen. Partial application, function composition and first-class functions mean that gluing simple code together to make something powerful is incredibly easy. I hope we're just at that awkward point in language evolution where we're stretching our current languages to do things they don't want to do. Maybe this is finally the time when a functional language hits the mainstream? (Maybe it's Clojure or Scala.)

We tried adopting this style of programming at Dynamic Aspects when building domain/j [PDF]. It was fantastic fun, and I really love Java's static imports for making the code lovely and terse (finding out $ is a valid identifier also helps). Something about it felt dirty though. I haven't quite put my finger on what it was; hopefully with lambdas in Java 8 it'll feel more natural.

So what is the bit in the middle? The bit in the middle is the language that describes your domain. Naming is everything, and you should do whatever you can to make it easy to understand. Eschewing mutable state and using functional programming to compose multiple simple operators seems to work!

Agile Architecture - Part 2

Allen Holub gave a presentation on agile techniques for design. Allen examined the fragile base class in some depth, before recapping CRC cards (not used enough!). Allen is a good presenter, so it was great to have a recap and have a few more examples to stick in my brain!


Leading Technical change

Nate closed out the day with a presentation on Leading Technical Change. It was well presented and focused on two things: how do you keep up with technology, and how do you engage your organization to move to different technologies?

Nate presented some really disturbing statistics about how much time Americans (and presumably other countries) waste on TV. Apparently the average American watches 151 hours of TV a month! Wow.

Nate introduced the audience to the idea of the technology radar, which allows you to keep track of technology that is hot for yourself or your organization. We're trying to build one at Red Gate. We've also experimented with skills maps too, and you can see an example from a software engineering point of view here (I'd love to know what you think).

Introducing change is hard, and Nate presented the same sort of ideas that Roy presented the previous day. Change makes things worse in the beginning, but better in the end. Having the courage to stick out the dip is a hard thing!

I have to admit, I didn't take many notes from this talk because I was enjoying it instead :) It was well presented and engaging with the audience. In summary, change is hard and it's all about the people. I think deep down I always knew that (people are way more complicated than code) but it was great to hear it presented so well!

Software Architect 2013 Day #1

The Coaching Architect

Roy Osherove presented "The Coaching Architect". If you want a better idea of the manifesto, read Notes to a Software Team Leader, it's a great book!

Roy asserts that your role as a "Software Architect" / team leader / leader of any kind is to grow the team to solve problems on their own. Far from making you redundant, this makes you a highly valued employee; by growing others, you'll always be wanted. Unfortunately, this means stepping outside your comfort zone and dealing with people.

Many managers like to take the money, but not do all the hard parts (Gerald Weinberg)

Learning something new is tough. Everything you learn has a downslope initially. You lose productivity, it's hard. However, once that thing has clicked, your performance rises. This pattern never ends!

I've seen this behaviour before in Programming Epiphanies. Initially I'll try a new technique or language (let's say when I first found C++ and objects). I was terrible; I constructed code at work that made me cry when I read it the next day. Eventually things started to get better. I had my code critiqued by clever people (I say critiqued, I mean brutally torn apart), rewrote it and practised some more. Eventually, I felt as if I was a C++ ninja and I finally got the language. A few years later I felt pretty comfortable. Until Alexandrescu wrote Modern C++ Design and it felt like throwing myself off the cliff again!

Roy argues that to grow your team, you should push them off this cliff and challenge them. You need some risk to learn a new technology. Learning a framework in your spare time is not a risk; learning a new framework on the job? That takes some balls. Pushing people to learn is also a risky thing, but to grow the team we must first realize that we can put ourselves in that scary place and grow.

There's a time and a place for learning. Roy outlines three phases of team development and a suitable stance for a leader to take in each:

  • Survival - Teams are fire-fighting at this stage. Chaos rules! There's no time for learning. Teams in this mode need to get out of it, and the best way to do this is "Command and Control". Prioritize the tasks, use a clean slate and exit into the next mode.
  • Learning - Teams in this mode have time to learn new techniques. Roy asserts that teams in this mode might go 300-400% slower whilst they learn a new skill (say TDD). As a leader on a team like this, your role is to support the team through coaching.
  • Self Organizing - The team doesn't need you. They can solve problems on their own. Roy estimates that fewer than 10% of software teams find themselves in this place.

Teams get addicted to survival mode. Faced with the "write it now and get it out" or "test it" choice, teams often pick the former and get away with it. It feels good. It's only later when we realize we have to do a rewrite that we realize the folly of this decision.

It's OK, I hear you "lean" people. It's a waste doing it right, surely? It's an MVP man. What's the point in testing it if I'm going to throw it away? That'd be fine if you did throw it away, but you don't. We also overestimate how long we can get away with poor quality code. The design stamina hypothesis doesn't label the time axis, but in my opinion it's probably days or weeks not months or years.

Anyway. In order to break survival addiction, it's up to us as developers to take this under control. Give realistic estimates that build in time for doing it to a proper level of quality (you want to write unit tests? Make sure your estimate includes time for that; don't show it as a separate activity).

In order to understand why people don't change, Roy recommends the book Influencer: The Power to Change Anything.

For each behaviour, the world is perfectly designed for that to happen

In order to change behaviours, we need to understand the personal, social and environmental conditions that led to that behaviour.

Roy ended with a song. Which was weird. But good.

Implementing micro-service architecture

Fred George gave an awesome presentation on micro-service architectures. He started with a brief history lesson of how he arrived at microservices, showing how his career had progressed from big monolithic applications through to service-oriented architecture. Each time he felt there was a collection of smaller things trying to get out, until one day he had the chance to try something crazy with a desperate team.

What if we built our applications as many tiny services, each fewer than 100 lines of code? Each service is like a class, a small and crisp conceptualization. Each service has a segregated database and encapsulates that information. Services publish conclusions; not raw data. This brings up some really interesting questions. How do you monitor systems like this? How do you keep it running?

The slides are available and I'd encourage you to take a read. It's full of dangerous ideas. Why do we need unit tests if we are just writing 100 lines of code? Why should we adopt a uniform language when we could just rewrite servers anyway? Why do we need to worry about copy-and-paste code?

I don't necessarily agree with everything (because I've never been involved in a system like this, and I'm a sceptical kind of guy), but it's great to see something different that challenges the way you think.

Architecture in the age of Agile

Rob Smallshire talked to us about architecture in the age of agile.

Lean thinking (I should read The Goal) says that it's all about the flow, and we should reduce our batch size to reduce waste. TDD does a great job of reducing batch size. You get feedback quicker and you find defects earlier. Rob argues that architecture is a counterpart to this; it helps you find defects in your system design earlier, it just works on a different time-scale.

A calendar and a clock also work on a different time scale, but we view these as complementary; not competing.

Architecture is about maximizing our ability to keep working sprint after sprint. Without architecture, how will you ever reach 200 sprints? Scrum is feature driven: I have a backlog of features which I rapidly complete, I throw them into a system, and the non-functional requirements are emergent properties.

This really clicked for me. In teams I've worked on, cross-cutting concerns (performance, usability) often get neglected in Scrum. We try to make features out of them, but that never works (how do you estimate "improve performance"?), so they instead get added to the bug tracker, usually split out into lots of little bugs (performance in the history pane is too slow, performance on dialog X is too slow). The solution to these is not a local optimization; we should be considering the system as a whole and solving it from that perspective.

Rob presented some scary statistics on the half-life of code and of developers on systems. On average, after about 3 years on any project, 50% of the developers will have left. However, the code often lasts longer than that.

As architects, we should take that into account. Deliberate design provides context and structure and, most importantly, continuity for the project.

Keeping Agile Development from becoming Fragile

Just because you can go fast, doesn't mean you should. (1995 Darwin awards)

In my opinion the technical debt metaphor is overused. In this talk, Howard used conscious technical debt to illustrate the point. That's a trade-off I'm willing to make sometimes, and he gave some good examples of how you could recognize problems early on and counteract them.

Final Thoughts

It's been a bit of a mixed-bag so far. The format of the conference doesn't encourage the conversations that happened at Agile Cambridge. Looking forward to the rest of it though, some exciting sessions on Thursday.

Saturday 28 September 2013

Agile Cambridge 2013 Day #3

Lean Coffee

The day started particularly early today with Lean Coffee. It's a great way to look at a wide-range of topics in a short time period.

I got a lot out of the discussion, particularly around technical debt. I need to try the Get Kanban game as that's engineered to show that quality does matter. Someone also mentioned a low-tech way of capturing technical debt; just place a dot on the board when you're affected by it. At least that demonstrates improvements!

Real Options in the Real World

Chris is from a financial background and presented his approach to IT risk management. The briefest of summaries is:

  • Options have value (maybe an indeterminate value).
  • Options expire.
  • Never commit early unless you know why.

The never commit early unless you know why point echoed Neil Denny on day 1, where he spoke about the delicious discomfort of uncertainty. As humans, we find it easier to close options out, but in reality from a rational point of view we should keep our options open.

Humans are bad at risk management. We have a tendency to want to assign probabilities to failures so that we can pretend that it's not very likely to happen. We should instead focus on time to business as usual; what are the options for recovery?

Chris then looked at how options thinking applies to moving staff between projects. Based on the theory of constraints, there's only one bottleneck in the system and (logically) we should move people to solve that problem regardless of their role (such as blurring the lines between dev and test). We (as an industry) worry about this because of people's attitudes (I'm not going to do testing!) or people's capability (but Joe's the only COBOL guy!).

Chris presented a compelling case. We should (counter intuitively) assign people with least experience to critical tasks first so we can use the experienced people in more of a mentoring/coaching role. The best developers coach; others fix problems with their help. This helps eliminate the "project heroes". Staff liquidity truly delivers agile.

This was a very compelling case, but something doesn't feel right to me. It ignores the human factors. Software engineering is not manufacturing; it's a craft. People also don't have an infinite capacity to learn (or at least I don't!). If I'm switched between projects, and dancing between coaching, developing and testing, then I feel my overall effectiveness in each of these areas will be reduced. I mentioned this on Twitter, and I was given some pointers to reading more about this.

I'll try to find the time to read that work, along with the Commitment book.

The Art of Systematic Feedback

Marcin Floryan gave a series of examples of how feedback is useful. I've long been sold on this point, and it was great to see it backed up by a series of examples. I've summarized Marcin's formula for systematic feedback below:

  1. Explicit Assumptions (scientific reasoning!)
  2. Clear Objective
  3. Careful design
  4. Learn from Results
  5. Rinse and Repeat

You know you're doing feedback right when you are "acting responsibly to meaningful data". I've tried to argue before that we need to do this for coding too, encoding the explicit assumptions with deliberate development, and this has helped me see the pieces that I was missing.

The AutoTrader Experience report

All too often in software development we focus our attention on the 1% of software projects that go well, and we never really look at the problems. The Auto Trader experience report was fascinating and brutally honest. They showed how, despite being agile, projects can still fail.

Given an impossible deadline, the team panicked. They tried incredibly hard to estimate the project to prove that it was mission impossible, but this didn't help. Instead the team was given unlimited resources (large numbers of contractors) and just told to "do it". It was great to get an insight into the panic, and this is definitely one to watch on InfoQ when the video goes live!


Gamification - How I became a spaceship commander

Tomasz presented an entertaining study in gamification. The goal was to influence the behaviour of software developers to use their bug tracking system and track tasks. It was interesting to see the behaviours that this encouraged within developers. Definitely an area to spend some more time with.

Wrapping up

So what do I take from Agile Cambridge 2013? An awful lot of reading material, a huge collection of ideas floating around, some practical tips for Monday. Job done.

Agile Cambridge 2013 Day #2

Conference Cold. Every conference I attend seems to result in me getting ill. If you see me turning up next time in a mask, then you'll know why.

Conference Protection

Anyway, back to the writeup.

Change or be changed

Change. The ever-present moment of opportunity or terror that's been a staple of every company I've ever worked at. Janet Gregory explored the various types of change that occur in life.

Do you need or want to change? Sometimes you want to change (wouldn't it be great if I was fit?), but sometimes you have no choice (your health is suffering, you have to get fit). Change can often bring new opportunities (Ford made cars, others wanted faster horses).

Towards the end, Janet highlighted some of the change models. I caught most of them, and I'll push them on my unbounded stack of reading material.

Making Sense of Systems Development

Cynefin. Kee - ne - fen. This is a word I've heard much about. And now I can pronounce it. It's a non-sense making framework (that was a cheap shot, sorry).

The workshop was well-run, and we classified situations as either:

  • Simple (the answer is clear, no need for analysis)
  • Complicated (solvable by an expert or process)
  • Complex (might know what to do better next time, hindsight)
  • Chaos (totally new, no idea how to do things)

There's something about this classification that feels familiar. Learning a new subject traverses from chaos (I've no idea what I'm doing!) through complex and complicated and then finally to simple (unconscious competence), and I can see how it would be a useful tool in the armoury. I didn't see anything that fundamentally changed my opinion (maybe something will click later?)

I'm very sceptical of things you have to pay for to understand (see ScrumMaster courses, Agile certification, Scientology and, from the looks of it, Cynefin). I coin Jeff's law:

Blurring the Lines

Chris George presented on blurring the lines. The central premise is that by putting walls between dev and test, we've suffered. By breaking down these walls, we can produce better software.

Chris references one of my favourite papers, the 1968 NATO Report on Software Engineering, and starts by introducing a quote from Alan Perlis along the lines of "testing is a process best undertaken throughout the product life cycle".

Dedicated testing departments split this process, introducing artificial communication barriers which (due to Conway's Law) resulted in a split between development and test. Chris looked at breaking down this wall and merging the roles of test and development and the positive effects it had on his team.

This was a theme revisited by Chris Matts in the next day's keynote. There's something I don't quite agree with in both talks, namely the assumption that people are fungible resources. More to come on this, once I get my thoughts together.

The Open Session

Due to the conference flu, a number of speakers dropped out at the last-minute. Huge kudos to the organizers for managing to fill sessions at the last minute with interesting and relevant content. The final session of the day was an open session organized by Simon and Ryan.

Our group discussed controversial opinions.

I'm not sure we tried hard enough to rock the boat!

Wednesday 25 September 2013

Agile Cambridge 2013 Day #1

I'm at Agile Cambridge 2013 this week. I participated in the review panel this year and there were a huge number of quality submissions. The programme looks fab and I'm looking forward to the rest of the week. Here are my notes on the sessions I attended on the first day.

Moving Towards Symbiotic Design

The opening keynote by Michael Feathers explored Conway's Law which states:

"organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations."

He gave some examples of this in action, and I found it easy to relate these to experiences of my own. When working with remote teams we've struggled with communication issues, and so has the code. The boundaries between "our" module and "their" module were as cloudy and woolly as the communication between the teams.

I've also seen the Tragedy of the Commons that Michael described. I've worked at some places where shared code had no clear owner and thus depleted over time. On the flip side, I've seen shared code be tremendously successful when a team was devoted to it full-time (and that team was highly communicative). Conway's law holds!

Michael's observation that people working on a code base for too long become immune to the cruft that's accruing hit home. Uncle Bob has this great metaphor of developers understanding the geography of their code when working with large classes (this function is just here, after all this whitespace, and the really long method beginning foo...). I think this kind of complexity is what developers who work on a project for too long get comfortable with. When you're a new person on that code base you don't have that historical knowledge and you struggle to pick it up.

So what would change if code were king and people were subservient to it? It's an interesting question. People working on the same code for a long time become immune to the cruft that's accumulated within it. The code is screaming out for someone else to tend to it and address this. With this in mind we should probably be more deliberate about moving people around between software projects.

If code were a first-class citizen in project discussions then we'd pay more attention to things like "death by 1000 features". There's a point at which adding features to a project becomes considerably more difficult. Perhaps we should push back more and do what the code wants sometimes, rather than just caving in and sacrificing the code base?

The takeaway for me is that we don't do a good job of talking about code as a party in the relationship between people and process. We're already happy to change project structure to deliver the product, but let's start to explore changing project structure to drive the code in positive directions.

Revere the code!

Unpicking a Haystack

I thoroughly enjoyed the Unpicking a Haystack session by Duncan McGregor and Robert Chatley. We took perhaps the worst example imaginable of legacy code (decompiled from binaries!) and tried to make sense of it.

There were different approaches tried. One group did a lot of reading (they only had a text editor, not an IDE). Another tried to get a foothold in the code, test it and test their way to success. The group I was in took a slightly different approach and leaned heavily on the IDE. I've touched on this idea before (You don't need tests to refactor).

Our approach was to crowbar out large lumps of code with repeated "Extract Methods", rename things to make sense and use the IDE inspections to fix all the silliness (e.g. redundant variables, constant parameters, unused methods). It seemed to work pretty well, but I think there was some skepticism among others as to whether this was a good idea or not.

I hope in the future development tools will evolve to become something more developers trust and code becomes malleable putty we can shape in seconds. As a thought experiment, imagine if you could put your IDE in "semantic lock" mode and just drag, drop and reshape code to your heart's content with the guarantee that behaviour would be preserved. How would that change your ways of viewing legacy code? How would it change your design process?


To wrap up, the group discussed how they might change the way they write code in light of the problems of refactoring. I didn't say anything at the time, but I've thought some more about it. Decompilation loses names (not all names, but some names), but it always preserves types. I've waffled before about Types and Tests.

With this in mind, we should be more aggressive about encoding information within types (e.g. compiler verifiable properties) rather than names. As an example, we found some code with a value that either referenced working directories (pwd) or a password. The type information told us it was a string. The variable name was str12345. We had no idea which one to choose. If the original authors had encoded that information in a type, then we wouldn't have had a problem, the type would describe the shape of the variable.

Rob and Duncan are organizing a Software Archaeology Conference which looks fabulously interesting.

Rituals of Software Engineering

Alex Shaw examined the topic of rituals in software engineering. Rituals have a long history, and the essential parts of a ritual are:

  • Obligatory (not in the sense of being forced, but you feel like you should attend)
  • Admit of Delivery (the style of delivery doesn't matter)
  • Consequence Free (measurable outcomes are not the goal)

I guess rituals have a slightly negative connotation for me. When I hear rituals I think of animals being slaughtered and cargo-cultism (obviously not a healthy way to do things). From the Wikipedia definition:

A ritual "is a stereotyped sequence of activities involving gestures, words, and objects, performed in a sequestered place, and designed to influence preternatural entities or forces on behalf of the actors' goals and interests"

By the end of the presentation I was more convinced that I was disagreeing with a word and not the content. The essence of great team 'rituals' includes that members of the team know the context and understand the value and reason for the ritual. With all that in mind, I agree rituals are important for engineering teams and it's a really interesting idea to explore.

Growing XP Teams

Rachel Davies gave us a great insight into how the team works at Unruly Media, a company run from the start on XP principles.

I love these kind of experience reports. Speaking afterwards, The Agile Pirate described it as voyeurism. I like that description (slightly sordid though it sounds). It's great to get an insight into how a team works and interacts.

It was interesting to hear how the "Lone Ranger" role worked at Unruly. The idea of having a single developer being available to answer questions from support/sales/product managers is interesting, and I can see how that could help protect the other members of the team from unwanted interruptions.

We don't know where we're going. Let's go!

Last, but by no means least, was a talk with an intriguing title by Neil Denny. This was my favourite talk from the first day, and had nothing to do with software whatsoever!

Neil's talk explored uncertainty, that "delicious discomfort of not knowing". Uncertainty is something we face all the time in software development (because no-one knows how to do it right), so it was great to explore this topic. The presentation style was fantastic with audience participation, touches of humour and an engaged audience.

What's more dangerous than uncertainty? "We are never more wrong than when we are most right". Once you convince yourself that you definitely have the answer then you become closed to better solutions, becoming dogmatic and rejecting alternatives. This is a dangerous place!

The point I left with is that we should treat uncertainty as a challenge to savour, not something to fear. When people are looking for answers, they tend to just want the smallest possible change (confirmation bias?). This often makes us go for the minimal change, rejecting the truth of what we need to do.

There were a few books mentioned which I need to add to my reading list.

And then...

So there I am, it's the end of the day and I'm outside the college waiting for a taxi. I get talking to someone else waiting and I discover that she's a retired midwife. Nothing too strange yet. A bit more talking and I discover that she's from the same area that I was born in. A few minutes later, we talk about times and ages, and realize that she would have worked at the same hospital I was born in. Putting more dates together we discover that I've just bumped into a person who quite possibly delivered me. Freaky!

Looking forward to tomorrow!

Oops, should have mentioned we're hiring.

Friday 20 September 2013

How do you write software?

It's a really simple question, but one that's hard to answer. If you start by saying "I take a story from the board" then let me stop you right there. I'm not interested in the process; I'm interested in what happens after the choice of what to do and before the "done" bit.

Maybe your answer involves the tools you use? You might begin with "I write in Emacs; let me tell you about my setup...". Nope. Not interested in that. Glad it works for you, but it doesn't really tell me how you actually construct software.

Maybe your answer involves the syntactic rules you use? You write software with spaces, not tabs and all your statements are semi-colon terminated and it’s either K&R or the highway? Yawn. That’s not what I'm after.

How do you actually build your software; what happens between the brain and the keyboard?

Half-baked practices

After pushing a bit more, I ask again and I sometimes get responses like this:

  • All the significant code I write is peer-reviewed.
  • I try to write unit-tests when I can.
  • Most of my bug fixes contain a regression test.

What do they have in common? They are all weak statements that don’t really tell me (and most importantly YOU) anything about the way you develop software. They probably tell me that your heart is in the right place, but it doesn't show any commitment.

Every single one of these statements has an escape hatch. It's far too easy to ignore these values. You can almost imagine the excuses now. This is just a small commit, it doesn't need a review, let alone a unit test. This bug I just fixed, it’s so tiny, so inconsequential that it doesn't need a regression test for it.

I'm not going to try to argue that you should never use an escape hatch, but to word things so generally is simply inviting temptation.

Trying to strengthen those practices

So what happens if we strengthen our statements a bit more? Let’s take the first half-fledged principle and try to improve it a little.

  • All code will be peer-reviewed

Peer reviewing all code. That's undoubtedly a good thing, and it certainly doesn't have an escape hatch. Or does it? The problem with this statement is it's far too vague. What is the code being reviewed for, and to what criteria? Perhaps it's obvious to you, but what about the others on the team? Often you'll ask around and get different answers from members of the team.

Is it a formatting check? I hope not, because you’ll have automated that and stopped arguing about it a long time ago. Right?

Is it checking for obvious mistakes? Maybe. That’s definitely a good start, but it’s still a bit woolly. What's obvious to you probably isn't so obvious to someone else on the team.

I'd suggest a strong set of criteria creates a tighter statement. For example:

It doesn't really matter to me (though it should matter to you) what the criteria is, the most important thing is that it’s strong enough for a disciplined code review.

So what good are fully fledged practices?

OK, so you've agreed on a couple of practices. What’s that actually going to do for you? Once you've got a disciplined way of developing software you can start to reflect on how things are working for you.

You've got a set of fully fledged practices. These practices are hopefully challenging you to think much more carefully about the way you construct software. Perhaps you've committed to getting all code peer-reviewed? Chances are you've got some doubters on your team? How can you demonstrate these practices are working?

Look at your practices; how do they translate into outcomes? Perhaps you want a code review before every push because you are trying to unpick a mess of legacy code? Maybe you’re trying to roll out test-driven development because your integration tests take hours to run? For any outcome, you can probably think of something you can compare that will help you know whether it's working or not. Perhaps you could measure whether your code coverage increases? Maybe you could take a look at the cyclomatic complexity of the code? How many times did the build break this week? You don't even need automated measures, you could simply ask opinions.

Put together, practices and retrospectives give you a base to reflect on the way you develop software and try to find ways to improve upon it. Donald Knuth did this. For the entire history of the TeX program, Knuth kept a bug journal recording the hows and whys of everything that went wrong. I've never seen a quote suggesting so, but I'd imagine it's hard not to become a better developer by reflecting on what works and what doesn't.

Putting it all together

Every team dislikes something about their software. Maybe it's harder to change than it was a year ago? Maybe the compilation time takes forever because of all the coupling? Maybe someone else wrote it and it just plain sucks? Flip the problems around and you've got goals to achieve. You want the ability to change code quickly and easily. You want a 10 minute build. You want simple and maintainable code. Now you have measurable outcomes!

This is where deliberate practices come into play. Now you've chosen some practices you can start to see how much difference they make on day-to-day development. I'd suggest that most practices probably need a good few months before they've truly bedded in. It's worth reflecting on practices more often though, perhaps you can see benefits sooner?

There is no such thing as a one-size-fits-all approach to software development. The idea of best practice is a complete myth; it's about what works for your project, your team and associated human factors and your work environment. By being deliberate about the practices you use then you can attempt to find what works best for you.

This is obviously a grossly simplified approach. The real world is messier and there are constraints all over the place, but I still think that being deliberate about the way we create software is an important step in the continuing journey to become a better software developer.

So, how do you write software?

Thursday 5 September 2013

Programming Epiphanies

What is a programming epiphany? It's that moment that you have when you realize that the way you've coded is wrong wrong wrong, and there's a better way to do things.  Here's a few of my programming turning points.
When I was studying Computer Science at the University of Southampton object-oriented programming was simply:
  • Inheritance
  • Encapsulation
  • Polymorphism
Inheritance was first, and that meant it was the highest priority for me.  If I could find a way of inheriting A from B, I probably would.  Encapsulation?  That's just wrapping all those lovely members with getters and setters.  Polymorphism wasn't something I really ever thought about.  It came last so it was probably something I could get away without.

Around this time, I was programming using C++ and the Microsoft Foundation Classes. My understanding of MFC at the time gelled quite well with how I understood object-oriented code. Plenty of inheritance! I even felt I saw a use for polymorphism in overriding some of those virtual function things.

Things got a bit better towards the end of my degree. I found a copy of Effective C++ and read about const correctness. I remember having particularly knotty issues in some of my code (probably due to my understanding of encapsulation) and not being able to find the bug. By liberally sprinkling const over the code base (it's like a virus!) I eventually found my unwanted mutation. My first epiphany: design code correctly and make the bugs impossible.

I bumbled my way through a research degree, and then became a research scientist for a bit.  I never really wrote code that anyone else had to read, so my code was just good enough.  My next big leap in learning came with my next job. For posterity here's the original advert (no idea how I got in!).

Day 1. Someone mentions this thing called the visitor pattern, then ubiquitous language and then a few more things. WTF? Visitor pattern? Names matter? Oh dear, there's a huge amount I don't know. I managed to get through the day without getting found out, visited Amazon and ordered a few dozen books. My second epiphany: smarter people than me have likely solved your problems before; get reading. I went through my pattern craze, no doubt needlessly applying them sometimes, but I worked it out of my system.

At some point came another one: singletons are bad. I think everyone realizes this at some point and instantly recoils from all design patterns. I love functional programming, and so I remember finding the Design Patterns in Functional Language presentation and thinking to myself that maybe patterns are just missing language features?

All good things must come to an end, and next I turned to the dark side of enterprise programming. If you don't know what this is, it's very simple. A sales person promises the impossible to a clueless manager, and then a team of software engineers works at solving the impossible problem with an equally impossible deadline. I identified a huge amount with Death March, but lacked the gumption to quit. Not all was bad though; by seeing every possible variant of wrong I learnt something incredibly important. You can't tolerate complexity.

Quality isn't just something you can get back another day; quality matters.  Once you've lost quality, once you've lost clean code, you're fucked.  You might get away with it for a bit (the human brain can deal with remarkable amounts of complexity), but in the end that ball of mud will crush you.

So how do you build in quality from the ground up? As part of my job, I visited an extremely Californian company that practiced XP. All code pair-programmed, all code with a failing test first. This made a big (though not very immediate!) impression on me. It wasn't until I read Growing Object Oriented Software Guided By Tests that it clicked in a way that felt right and TDD seemed a bit more natural.

So what are your programming epiphanies? What are the moments in your development so far that changed the way you think about writing code?

Monday 19 August 2013

Legacy Code Retreat

I had the pleasure of attending a Legacy Code Retreat at the weekend organized by the Cambridge Software Craftsmanship group. Legacy code is a subject close to my heart. I've spent most of my working life immersed in it, and I've been at least fairly successful in altering my point of view from "OMG THE CODE IS SHIT" to "what can I learn from this?".

The event was excellently co-facilitated by Alastair Smith and Erik Talboom and followed a familiar format of short sessions with tight constraints and retrospectives to discuss.

This was my first encounter with TDD as if you meant it, which was certainly a challenging experience and one that I'll definitely try again.

I enjoyed the discipline of baby steps. We set a timer for a minute and either had to write a test or perform a refactoring (but not both) within that minute. Failing to do so would reset the code and you'd start again. The object of this exercise was to judge the smallest steps you can work in. This is a really interesting experiment to do; it forces you to think of your work as a series of smaller and smaller steps.

I also learnt the term "Golden Master", referring to a record of the working system. Working with Andrea, we were tasked with writing system-level tests to capture the behaviour. We quickly hit upon the idea of simply capturing the entire output of the program as the simplest end-to-end test. This proved to be amazingly powerful; armed just with an end-to-end test we felt very confident about changing the code. We understood that the seeded randomness didn't cover every case, but we figured we could just capture more runs of the program to increase the confidence. Our simple measure of the confidence we had was that commenting out or changing a line of code in the program always seemed to result in that integration test failing.

The main takeaway for me was that the discipline of focusing on a single thing is incredibly powerful. Instead of "fixing all the easy things", I'm going to try to take a step back and focus on something specific (be it renaming, simplifying conditionals, or just increasing test coverage).

Sunday 18 August 2013

Technical Debt and Legacy Code

It's time to ban the phrase "technical debt". It's a metaphor that's been stretched to mean too many things, from hand waving usages to make teams "go faster" to being a catch-all excuse for developers to write terrible code. I'd like to challenge you to avoid saying "technical debt" and instead try to define the actual problem. By stating the problem concretely, you're already closer to the solution.

Let's go right back to the beginning. Ward Cunningham introduced the metaphor of technical debt at OOPSLA 92 in "The Wycash Portfolio Management System" experience report. Let's look at what he says:

Another, more serious pitfall is the failure to consolidate. Although immature code may work fine and be completely acceptable to the customer, excess quantities will make a program unmasterable, leading to extreme specialization of programmers and finally an inflexible product. Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite. Objects make the cost of this transaction tolerable. The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise.

What Ward is saying here is that the build up of technical debt starts when you ship something with an incomplete understanding of the problem. It's the acknowledgement that we never ship the right thing first time. This isn't news to anyone! No matter what the customer says, the first time they get the new feature they want something slightly different. Technical debt starts to occur when you don't reflect that difference back to the underlying code.

The original definition of technical debt is nothing to do with quality of the code alone, it's simply an acknowledgement that once your code no longer models the problem, you're in trouble. Things aren't going to get better until you acknowledge that and fix things. I'm willing to bet that this definition of technical debt isn't the one you use.

Unfortunately, when I hear the term technical debt it's usually in the context of moaning about legacy code.

Most of us know what legacy code is. It's really hard to work with. It's difficult to like (though some might say there is Joy in Legacy Code), and progress is slow. Saying it's full of technical debt is almost certainly true, but it's not very helpful. You need to define that debt in meaningful ways so you can do something about it (or explain your woes concretely to the rest of the team so you can make time to tackle it together). In this context, technical debt is nothing but a lazy phrase that stops you concretely defining what the problems really are. See also Bob Martin's "A Mess is Not Technical Debt" article.

Ever wonder why it's so hard to convince your boss to have time to tackle technical debt? You're fighting the wrong battle. You're describing the problem in terms of your domain, not the problem domain. What happens if you're not allowed to say "technical debt"? Now you're forced to describe the problem in concrete terms. How is the code mismatched against the customer requirements? Which one is the more compelling case to make to your boss?

So stop being lazy; define problems in concrete terms and don't use the dreaded phrase "technical debt".

Thursday 6 June 2013

JavaScript; not your regular curly brace language

JavaScript. It's the language of the web.

It's also a language originally created in 10 days, so there's a few warts here and there.

Here's my choice of JavaScript behaviours that might be a bit odd if you're expecting it to behave like Java/C#!

Types

JavaScript is a dynamic and weakly-typed language. What does this mean in practice coming from a Java/C# world?

It's perfectly valid to assign anything to a variable.

var a = 'string';
a = 4;
a = [];

JavaScript won't interpret the assignments as errors, and it's caveat emptor for consumers as there's no way to specify the types of objects you'd like to receive.

JavaScript will use objects in ways you might not expect.

7 + '7' 
> '77'

7 * '7'
> 49

7 * []
> 0

Equality Operators


== and != don't do what you think. In JavaScript, != and == perform type conversion first, and then compare the objects.

3 == '3'
> true

1 == true
> true

0 == []
> true

Roughly speaking: if either operand is a boolean it is converted to a number before the comparison; a string compared with a number is converted to a number; and an object compared with a number or string is first converted to a primitive value.

Avoid this weirdness: train yourself to use === and !==. Both === and !== check that their operands are of the same type first and do no jiggery-pokery converting their arguments. In the comparison examples above, === returns false, as you'd expect.


Scope

Coming from another curly-brace language, you'll probably be used to declaring variables as late as you need them. JavaScript doesn't like that.

function foo() {
    if (true) {
        var a = 3;
    }
    return a;
}

foo();
> 3

You may have expected a to be scoped to the if block (i.e. that a would be undefined by the time it's returned). Not true. JavaScript does not have block scope. JavaScript treats variable declarations as if they appeared at the top of the enclosing function (this is known as hoisting).


The logical operators, || and && and !

In Java and C#, the Boolean operators && and || return a Boolean result. In JavaScript the rules are again somewhat different.

3 || 4
> 3

undefined || 'banana'
> 'banana'

'fish' && 'chips'
> 'chips'

|| returns its first operand if that operand is truthy, otherwise its second operand.

&& returns its second operand if the first is truthy, otherwise the first operand.

! does return a proper Boolean: it converts its argument to a Boolean and negates it. If you want to convert a value to its Boolean representation, the idiomatic way is !!, which negates it twice and leaves you with the Boolean equivalent of the original value.

|| acts similarly to C#'s null coalescing operator ??.

function bar(foo) {
   foo = foo || 'banana';
   return 'baz_' + foo;
}

bar();
> 'baz_banana'

bar('bar');
> 'baz_bar'

Objects

There's no such thing as a class in JavaScript. Object definitions are much "looser" than they are in Java/C#.

Object constructor functions are simply defined as functions. By convention, function names that represent constructors start with a capital letter. Unless you invoke them with new, they aren't very useful.

function I_Make_Objects() {
    var counter = 0;
    this.Property = 'I am a property';
    this.Method = function() {
        return 'Method' + counter++;
    };
}

I_Make_Objects();
> undefined

new I_Make_Objects();
> I_Make_Objects {
   Property: "I am a property", 
   Method: function
  }

var a = new I_Make_Objects();
> undefined    

a.Method();
> "Method0"

a.Method();
> "Method1"

JavaScript functions are actually kind of cool. As the simple example above shows, you can create objects with local state and use closures to hide this data from the outside world, providing you with a measure of encapsulation. Objects are simply represented as a bag of properties and closures.

Inheritance

In Java/C#, inheritance is class-based. Objects are instances of classes and an object cannot change its class. In JavaScript, there are no classes; there are only objects. JavaScript uses prototypal inheritance: each object constructor has a prototype that describes the basic set of behaviours (this is also known as differential inheritance).

String // the object constructor for Strings
> function String() { [native code] }

String.prototype // the prototype of string
> String {}

"wibble".wibble();
> TypeError: Object wibble has no method 'wibble'

String.prototype.wibble = function () {
  return "wobble";
}
> function () { return "wobble"; }

"wibble".wibble();
> "wobble"

By changing the prototype all objects constructed will now have that behaviour available. This is in contrast to the fixed nature of C#/Java where adding new behaviour at runtime is much more challenging.

Constructing an object assigns a prototype to the special property __proto__. Continuing the previous example:

"wobble".__proto__
> String { wibble: function }

Invoking a method on an object first searches for a match on the object itself; if that fails, the JavaScript runtime searches up the prototype chain until it finds an object that has the requested property or method (Object.prototype sits at the end of the chain, and its prototype is null).

"wibble".__proto__
> String { wibble: function}

"wibble".__proto__.__proto__
> Object {}

"wibble".__proto__.__proto__.__proto__
> null

Prototypes are a fantastically flexible way of adding and removing behaviour at runtime. For example, you could extend String's prototype to give every string access to a common set of methods.

Thursday 9 May 2013

Choosing Container Types

Choosing a container type is easy; just pick ArrayList and be done with it. Right?

Unfortunately, if you choose the wrong data type and don't encapsulate it properly then you're saddled with the results for the foreseeable future.

Using data structures


One of the basic principles of object-oriented design is that you can hide details (encapsulation). One of the best things to hide is the way you've got your data stored. That way, if you do choose a list of pairs instead of a map, you can at least fix it locally!

In most of the legacy code bases I've seen, developers have gone the other way. They've spot-welded the choice of data structure onto objects and shouted it proudly across the code base.
// I AM DEFINITELY A LIST
class Customers : List<Customer> {

}
I'm not sure what particular brand of insanity causes this. Either customers is a List (in which case this is probably an abstraction too far) or it represents a group of customers that has specific behaviour.

Slightly better is to at least hide some details. Interface implementation isn't quite so spot-welded as implementation inheritance. At least now you can change the underlying implementation without telling the outside world. I've seen people object to this because of the "noise" it generates (all those delegating members). This monotony can often be auto-generated (thanks ReSharper) and these delegating members are often a stepping stone to a clearer design.
class Customers : IList<Customer> {
    private List<Customer> implementation;

    // a million and one delegating members
 }
Better yet is to completely hide the details. Make Customers an object in its own right. Give it methods to manipulate its data. Make it a living breathing object, and not just a pale copy of a collection.
class Customers {
   public Invoice GenerateInvoice();

   private ISet<Customer> customers;
}
Once you've got an object wrapping your collection, make sure you don't leak any more details than you need to.
class Customers {
   public ISet<Customer> GetCustomers() {
       return customers;
   }
}
ARGH! Don't do this! In a few years' time, someone will inadvertently capture that reference and do something bonkers.
var set = customers.GetCustomers();
logger.Debug("Found {0} customers", set.Count);

// Clear the memory, clearly we don't need it.
set.Clear();
Ideally, don't expose the information. Your Customers object is responsible for managing that collection. If you expose its internal details to the world then it's got no chance!

If you have to expose it, use the types in your language that protect you from aliasing problems (Java's Collections.unmodifiableSet, or handing out an IEnumerable in C#). As a slight aside, I strongly dislike how Java's iterators define remove, meaning I have to rely on a run-time guarantee of immutability (I never did get a great answer to this question).
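
As a rough C# sketch of that idea (the class names are illustrative, not from any real code base), the wrapper can hand back an IEnumerable so callers can read the collection but can't mutate it behind your back:

using System.Collections.Generic;

class Customer { }

class Customers
{
    private readonly HashSet<Customer> customers = new HashSet<Customer>();

    // Callers can enumerate the customers, but the iterator we hand back
    // has no Add() or Clear() for them to reach.
    public IEnumerable<Customer> GetCustomers()
    {
        foreach (var customer in customers)
            yield return customer;
    }
}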

Choosing the right data structure


There are two parts to every data structure. The first is the abstract data type - what operations does the data structure allow? The second is the underlying implementation of the ADT. What are the trade-offs? Is it a linked list or an array list?

In my experience people tend to focus on the latter (how is it implemented) instead of choosing the right data structure in the first place. Every time I see a list of pairs instead of a map, or a list without duplicates instead of a set, I sigh a little.

As a quick example of the perils of choosing the wrong data structure, I've recently been working on refactoring some code of this form.
var xs = new List();
var ys = new List();

foreach(var y in ys) 
  if (xs.Contains(y))
    DoSomething();
Even with a moderate number of elements (10,000 or so in both collections) this code has serious performance problems. Running the code through a profiler showed that the equality operator was executed over 60 million times! Testing for membership was taking 10 times as long as performing the operation itself.

There's a number of problems with the code above, but the root cause is choosing the wrong data structure. Despite the rich collection libraries in Java/C# most developers plump for a list. Premature optimization is one thing, but choosing the wrong data structure is just as bad.
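
Here's a rough sketch of the fix under the same assumptions (the numbers and the int elements are purely illustrative): hold the side you search through in a HashSet so each Contains call is a hash lookup rather than a linear scan.

using System;
using System.Collections.Generic;
using System.Linq;

class HashSetLookupDemo
{
    static void Main()
    {
        // Two overlapping ranges of 10,000 elements each.
        var xs = new HashSet<int>(Enumerable.Range(0, 10000));
        var ys = Enumerable.Range(5000, 10000).ToList();

        var matches = 0;
        foreach (var y in ys)
            if (xs.Contains(y))   // O(1) hash lookup, not a scan of the whole list
                matches++;

        Console.WriteLine(matches); // 5000
    }
}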

Hide your decisions about collections. At least that way you can fix it locally.

Tuesday 23 April 2013

Lamenting the lack of RAII in C# / Java

One of the hardest problems in programming is managing resources. How do you make sure you don't have a memory leak, reclaim that opened file handle, or give up that network connection?


In the C language you get a file handle via the fopen function, and you absolutely must remember to return it with fclose. If you forget to call fclose then the underlying file descriptor in the operating system is not closed and eventually you'll run out of file handles. In the simple case, it's easy: just open the file and close it. If you aren't careful, though, it starts to get pretty complicated. What if you have multiple exit points in your function? You've got to remember to fclose at every one. Manually managing resources is a really tough problem. Here's a simple example that leaves a file descriptor open. Can you see why?

#include <stdio.h>

int copyFile(char* szIn, char* szOut) {
    FILE* in;
    FILE* out;

    in = fopen(szIn, "r");
    if (in == NULL) return 7;

    out = fopen(szOut, "w");
    if (out == NULL) return 8;

    /* read from in, write to out */
    fclose(in);
    fclose(out);

    return 0;
}

C++ introduced the "RAII" pattern (Resource Acquisition Is Initialization), which is a fancy way of saying acquire in your constructor and release in your destructor. By using RAII I can write a file copy and not need to worry about remembering to close files.

#include <fstream>
using namespace std;

int copyFilePlusPlus(char* szIn, char* szOut) {
   ifstream in(szIn);
   ofstream out(szOut);

   if (!in.is_open()) return 7;
   if (!out.is_open()) return 8;

   /* do the copy */

   return 0;
}

The knowledge about the resources can now be completely hidden in the object. Progress!


The most common resource to manage is memory. Garbage collection frees the programmer from having to worry about reclaiming memory (and hence makes it more difficult to leak memory). Unfortunately, garbage collection cedes control of when objects will be released. The lack of deterministic finalization means that resource handling becomes more difficult. Without a destructor that runs when an object goes out of scope, you can't hide the details of resource ownership quite as elegantly as you can with RAII. (Brian Harry discusses some of the reasons why C# doesn't have deterministic finalization here.)


C# introduces the using statement to deal with this problem. This gives you deterministic clean-up, with precise control over the lifetime of a resource.

using (var f = File.OpenRead("foo.txt")) {
  // use f here
}  

// ...which is roughly equivalent to:
FileStream f = null;
try {
    f = File.OpenRead("foo.txt");
    // use f here
}
finally {
    if (f != null) f.Dispose();
}

The using statement expands to the try/finally above, saving us from writing a whole lot of boilerplate ourselves. This is progress of a kind, but it does mean we have to remember to write the using block (or call Dispose) and we can't insulate the knowledge of the resource behind an object as we can in C++. Objects with unmanaged resources must implement the IDisposable interface, and people who use these objects must remember to use using blocks.


Java was a bit slow in adopting this convention, but eventually got around to copying it with Java 7 (see try-with-resources).


Without the RAII pattern, Java and C# are more susceptible to leaking resources. As Herb Sutter says


I called this Dispose pattern “a coding pattern that is off by default and causing correctness or performance problems when it is forgotten.” That is all true. You have to remember to write it, and you have to write it correctly. In contrast, in C++ you just put stuff in scopes or delete it when you're done, whether said stuff has a nontrivial destructor or not.


Tools like Gendarme can perform static analysis to attempt to find instances where variables haven't been disposed, but it's not perfect.


So what are the best practices to manage resources in Java/C#?


  • You should always dispose of locally created IDisposable or AutoCloseable objects. If you create a file stream to read in, remember to close it! Use tools like Gendarme or Resharper to check for violations.

  • If you write a class that has a member variable that is disposable, then that class should also be disposable (see the sketch after this list).

  • Have a clear convention for denoting who owns a given return value. For example, GetXYZ is unclear: does it return a new XYZ that I must dispose, or a shared reference to an XYZ?

  • Make object lifetimes clear!

  • Fail fast - don't let disposed objects be used again (see the Disposable pattern)
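
To make the second and last bullets concrete, here's a minimal C# sketch (the class and file name are invented for illustration): a type that owns a StreamReader implements IDisposable itself and fails fast once disposed.

using System;
using System.IO;

class LogReader : IDisposable
{
    private readonly StreamReader reader;   // the owned, disposable member
    private bool disposed;

    public LogReader(string path)
    {
        reader = new StreamReader(path);
    }

    public string ReadLine()
    {
        // Fail fast rather than letting a disposed object limp on.
        if (disposed) throw new ObjectDisposedException("LogReader");
        return reader.ReadLine();
    }

    public void Dispose()
    {
        if (disposed) return;
        reader.Dispose();
        disposed = true;
    }
}

// Callers still need to remember the using block:
// using (var log = new LogReader("app.log")) { /* read lines */ }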


The main point to take away is that resource management is hard, but it's often easy to assume it isn't until it goes wrong!

Sunday 10 March 2013

You don't need tests to refactor

Whenever I do refactoring, the first step is always the same. I need to build a solid set of tests for that section of code. The tests are essential because even though I follow refactorings structured to avoid most of the opportunities for introducing bugs, I'm still human and still make mistakes. Thus I need solid tests. (Refactoring: Improving the Design of Existing Code, Fowler et al, 1999)

When I first read Refactoring, I believed that tests were a necessary prerequisite before making structural changes to the code. However, when I worked with legacy code this often left me paralysed: I can't change the code because it doesn't have any tests, but I can't write the tests because I can't change the code.

Reading Working Effectively with Legacy Code gave me an escape route.  I could write simpler characterization tests to capture the behaviour of existing code and then refactor. Instead of writing a unit test to capture correctness, I could write a characterization test to capture the behaviour; if the behaviour is preserved, I've not broken the code.
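
As a rough sketch of what that looks like (an invented example using NUnit; the class and numbers are made up), a characterization test simply pins down whatever the code does today:

using NUnit.Framework;

// A scrap of "legacy" pricing logic, invented for illustration.
public static class LegacyPriceCalculator
{
    public static decimal Discount(string tier, decimal total)
    {
        // Nobody remembers why GOLD gets 12.5%; the test doesn't care either.
        if (tier == "GOLD") return total * 0.125m;
        return 0m;
    }
}

[TestFixture]
public class PriceCalculatorCharacterization
{
    [Test]
    public void GoldCustomerDiscount_MatchesCurrentBehaviour()
    {
        // The expected value was captured by running the existing code,
        // not derived from a spec; if a refactoring changes it, this fails.
        Assert.AreEqual(12.5m, LegacyPriceCalculator.Discount("GOLD", 100m));
    }
}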

In the years since Refactoring was written, tools (such as Eclipse, IntelliJ and Resharper) have well and truly crossed the Rubicon. Over the years I've used these tools, they've gradually earned my trust to the point where I wouldn't consider writing Java/C# without the aid of such a tool.

Using a refactoring tool changes the way you approach code. I feel quite confident in saying that refactoring doesn't always require tests [1]. Instead of considering legacy code as an inanimate collection of methods, I view it as a lump of putty I can shape as needed. Refactoring tools allow you to quickly explore the design space before settling on a decision.

[1] In so much as you trust the refactoring tool! There's always the chance that the refactoring tool will break your code, but I believe these chances are vanishingly small in most circumstances (just don't use reflection!).

Wednesday 23 January 2013

Finding the Joy in Legacy Code

Recently, I gave a presentation for NxtGenUG.  The main event was Uncle Bob's talk about Clean Architecture which meant that the event was really well attended (100+ people).  I decided to try to talk about finding the positives in legacy code.  The slides from the original presentation are available here.

Legacy code. Words that strike fear and revulsion into most developers. Who wants to work on a code base full of problems that's incredibly difficult to change and has little or no automated tests? I've spent most of my programming career working in legacy code of one form or another.

Legacy code doesn't have a formal definition.  I've heard various definitions floating around.  Maybe it's the code you just don't want to work on?  Every project I've ever worked on has had some modules that no-one wants to go anywhere near.  One of my favourite examples is:

    if (!fileExists("c:/some/path")) {
       // The file exists?
       printf("WOT!");
    }

    #define FILE_EXIST_SUCCESS 0

    if (FILE_EXIST_SUCCESS == fileExists("c:/some/path")) {
       printf("Ah, it makes sense now");
    }

Yes, someone had actually written code like that first snippet all over the place, so every time you read fileExists you had to do a crazy double-take to work out that the code means exactly what it doesn't say. I'm pretty sure that counts as legacy code, right? (This file also contained the legendary fileCopyWithRetry7, but that's another story.)

Another definition of legacy code might be code built on a platform that's on the way out. Perhaps you're the poor person stuck maintaining a COBOL or Fortran application? If you're really unlucky, perhaps you have a J2EE application to maintain. That must count as legacy code, right?

A simple definition of legacy code might just be code that's on the way out. Maybe your airline reservation system is being phased out and replaced with a new one. That's a simple definition, and it almost certainly covers that case.

I thought I'd try to work out what legacy code is by looking at it from the other direction. What's great code? I went round the engineering teams at Red Gate and asked each of them what makes great code.


There was broad agreement on the themes.  Testable, tested, readable, maintainable.  All of these words are associated with confidence, any code that satisfies these properties is going to be fun to work on.

So that brings me to what I consider legacy code:
Legacy Code is code that you are scared to change
That's a hugely broad definition, but it sums it up nicely for me.  I'm aware of Michael Feathers' definition, that legacy code is code not covered by tests, but that doesn't quite cut it for me.  You need to have confidence in the tests, and they also need to give you that feedback quickly, otherwise you'll still be reluctant to change the code.

If you're working in legacy code, you might be wondering how on earth you can find any joy when working with it.  It's all too easy to feel like this guy:

You spend each and every day pushing around this code base, making things a little bit worse every day (metaphor courtesy of Roly Perera way back when).  Let's face it, no-one wants to be a dung beetle.  Who wants to push shit around all day?

Unfortunately, it's all too easy to fall into this situation.  If a method already takes 10 parameters, what harm is one more?  If the preferred development methodology is copy and paste rather than abstraction, then some kind of Stockholm Syndrome sets in and it's easy to convince yourself that it's not that bad.

Even if you do push through that barrier, I've often found myself falling into the trap of not really solving the right problems.  For example, I'll convince myself that before I start any restructuring of the code I should eliminate all the warnings, or perhaps get rid of those pesky PMD warnings, or increase adherence to FxCop standards.  All of these things are easy to do, but do they really increase the quality of the code base?  Do they make engineering the next feature easier than the previous one?  Do they increase your confidence when you work with the code?  Really?

So how do you transform yourself from a dung beetle into someone who enjoys the challenge of legacy code?  It's no easy task; the most important thing for me is to establish the basics:

  • Continuous Build server
  • Tests that give you confidence
  • Fast feedback
None of these things are easy.  Tests are really hard.  The cost of getting the very first test written for a legacy code base might be obscenely high, but until you invest in it, you're destined to forever be pushing that dung around.


What is there to love about legacy code?  One thing is progress: it's pretty easy to gamify some aspects of legacy code.  As an example, perhaps your team could focus on eliminating global variables one at a time.  It's easy to keep score (Find Usages in the IDE) and it's the sort of change that can usually be made piecemeal.  Having a small set of engineering goals each sprint helps maintain the sense of progress.

One of the biggest assets a legacy code base has is the code itself.  It might be messy, incomprehensible and thousands of lines long, but there's gold buried in there.  Mining the code base for information can often reveal really interesting things.  It's all too easy to think that any particular class could be reimplemented in half the time it'd take to understand the existing code.  Unfortunately, estimating rewrites tends to miss all the edge cases and strange behaviour that's captured in the legacy system.  Perhaps the SP3 edition of the Foo/Bar/Baz component has weird behaviour on the Turkish locale?  The only reason the legacy code captures this is that it's been around the block; it's used code and it's had real bugs squashed by real users.  Joel Spolsky's classic article on the Things you Should Never Do covers this problem really well.

In addition to the codebase, the version control system can reveal lots of other details.  For example, which files have changed the most?  Which have changed the least?  This can help guide important decisions, such as which parts of the system are worth getting under test.  Chances are if code hasn't been edited for 10 years, it probably works pretty well and doesn't require as much testing as the code that's changed every 5 minutes to handle just another edge case.

For me though, the biggest advantage of legacy code is the rate at which you can learn new techniques.  As a new software developer you're bombarded with information about refactoring, patterns and unit testing.  By necessity these are presented with trivial examples, and it's often difficult to see the point.  Why should I be disciplined about extract method when the method is only ten lines long?  Why on earth would I need that pattern?  Surely I can just new this object up?  Tests coupled to implementation details rather than behaviour?  It doesn't matter for this example.

Legacy code forces you to confront these issues head on.  You'll discover patterns from first principles.  Just by trying to introduce a test you'll quickly realise the dangers of certain code constructs that make things hard to test.  Without the safety blanket of unit tests, you'll quickly learn which sorts of refactorings you can confidently apply without tests (yes, sometimes you DO need to refactor without tests, otherwise you're permanently stymied).

The skills you gain with legacy code are entirely transferable.  They aren't a faddish library that no-one will be using next week, they are the meat and potatoes of software engineering.  Master the challenges of legacy code, and you'll have a job for life.