Wednesday 30 October 2013

Not all names are created equal

I think everyone agrees that naming things is one of the hardest parts of programming. Books like Clean Code devote whole chapters to it. Names should convey meaning so that the next person reading the code has an easier job understanding what it does; after all, we read code far more often than we write it. It's definitely worth spending some time arguing about the right name. It's important.

So that's it. Names are important. Job done? Of course not! There's more to the story than that.

At Agile Cambridge 2013, I attended a session (Unpicking the Haystack) where the source code was only available as decompiled bytecode (a sad story involving no version control, no backups and all the other things that no-one ever does). Our task was to work out what the original program actually did. Decompilation strips almost all the naming information: by the time code has gone from source to bytecode and back to source, the local variable names are gone. Unsurprisingly, working out what code does when its local variables are called a1 to a999 is very hard.

With variable names gone, we have to look for other clues to the programmer's intent. So what else is there? Well, it certainly helps that public method names survive. In this respect, method names are more important to get right than variable names: they are stickier. But something far more important gives us even more clues about this mystery code base.

Enter types. Decompilation preserves the names of public types, and a type name can carry far more information than a variable name. For example, string s reveals little, whereas URL s reveals much more. If we're disciplined followers of domain-driven design, then our types align with the problem we are solving. I'd put types right at the top of the most-important-things-to-name-correctly hierarchy.
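To make the string s versus URL s contrast concrete in Haskell (the language used later in this post), here is a minimal sketch. The Url type, the mkUrl smart constructor, urlScheme and the http(s) check are all hypothetical illustrations, not part of any library.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical domain type: wrapping a String in a newtype gives it a
-- name that says what it is, with no runtime cost.
newtype Url = Url String
  deriving (Show, Eq)

-- Hypothetical smart constructor: only strings with an http:// or
-- https:// prefix become Urls, so every Url has passed this check.
mkUrl :: String -> Maybe Url
mkUrl s
  | "http://" `isPrefixOf` s || "https://" `isPrefixOf` s = Just (Url s)
  | otherwise = Nothing

-- Compare the signatures: String -> String says little about its input,
-- whereas Url -> String says the input is an already-checked URL.
urlScheme :: Url -> String
urlScheme (Url s) = takeWhile (/= ':') s

main :: IO ()
main = do
  print (mkUrl "https://example.com")                -- Just (Url "https://example.com")
  print (mkUrl "not a url")                          -- Nothing
  print (fmap urlScheme (mkUrl "https://example.com")) -- Just "https"
```

The variable inside urlScheme is still just s; the type name does the explaining.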

In this view of decompiled code, some names are more important than others. Parameter names and local variables are least important, whereas type names are the most important (with methods a close second).

Approaching names via decompiled source is certainly a weird way to do it, but the conclusion seems to fit Bob Martin's rule of thumb for name length: the longer the scope a name lives in, the longer and more descriptive it should be.

I'd like to reinforce the view that types are by far the most important thing to name well. Crisply named abstractions matter more than almost anything else. To explore this, we'll look at a strongly and statically typed language, Haskell, and learn just enough syntax to read its types. But first...

What is a type? A type is a label that describes properties shared by every value of that type. If you see string in C#, you know you are getting an immutable sequence of characters with certain methods available. If you see an AbstractSingletonFactoryVisitorBean, then you know you've got problems. I'm kidding.

Anyway, back to sensible types. Types describe program behaviour. Don't believe me? Let's begin our detour into Haskell:

-- Whenever you see "::" read it as "is of type"
-- A name starting with a capital letter (Int, String) is a concrete type
-- add5 takes an Int and returns an Int
add5 :: Int -> Int
add5 x = x + 5

-- Parameters are separated by ->
-- For the purposes of this post, let's just say the last type is the
-- return type and the rest are the argument types
-- add takes two Ints and returns an Int
add :: Int -> Int -> Int
add x y = x + y

-- Generic types are written as lower-case type variables
-- middle takes three arguments of any types (a, b, c) and returns a b
middle :: a -> b -> c -> b
middle x y z = y
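If you want to try the three definitions above, here is a minimal runnable file containing them plus a main. The addFive definition is an illustrative extra: because -> associates to the right, Int -> Int -> Int is really Int -> (Int -> Int), which is why "the last one is the return type" is only a convenient simplification.

```haskell
-- add5 takes an Int and returns an Int
add5 :: Int -> Int
add5 x = x + 5

-- add takes two Ints and returns an Int
add :: Int -> Int -> Int
add x y = x + y

-- middle returns its second argument, whatever its type
middle :: a -> b -> c -> b
middle x y z = y

-- Supplying only one argument to add gives back a function Int -> Int.
addFive :: Int -> Int
addFive = add 5

main :: IO ()
main = do
  print (add5 10)              -- 15
  print (add 2 3)              -- 5
  print (addFive 10)           -- 15
  print (middle 1 2 3)         -- 2
  print (middle 'a' "b" True)  -- "b"
```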

Let's look at that last one again: middle :: a -> b -> c -> b. From the name we might guess that it returns the middle argument (e.g. middle 1 2 3 returns 2). Could the function do anything else? Haskell has no type-casting, so if all I know is that a value could be of any type, there aren't many options. I can't add anything to it. I can't convert it to a string. In fact, I can't do anything with it other than return it. The types don't let me. Types constrain the implementation choices to a more sensible subset.
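A small sketch of how the type restricts the implementation (leaving aside escape hatches such as undefined or non-termination). The commented-out middleAsString is rejected by the compiler because nothing says b can be shown; describeMiddle is a hypothetical variant showing that adding a constraint such as Show b widens what the body may do.

```haskell
-- With no constraints on b, handing y back is the only sensible option.
middle :: a -> b -> c -> b
middle _ y _ = y

-- Rejected by the compiler: there is no Show instance for an arbitrary b.
-- middleAsString :: a -> b -> c -> String
-- middleAsString _ y _ = show y

-- Hypothetical variant: the Show b constraint permits converting y to text.
describeMiddle :: Show b => a -> b -> c -> String
describeMiddle _ y _ = "middle was " ++ show y

main :: IO ()
main = do
  putStrLn (describeMiddle 'x' (42 :: Int) True)  -- middle was 42
  -- The middle argument can even be a function; middle just returns it,
  -- and here we apply the returned function to 41.
  print (middle () (+ 1) "z" (41 :: Int))         -- 42
```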

Do the names matter? We know that the argument x has type a. Is there a more descriptive name? Probably not: the type tells us nothing about what properties the value has, so a long descriptive name would just waste space. For all we know, the argument could be a function. Or it could be a monad. What are you going to call it?

Is the function name important? It's definitely nice to have a good name, but is it essential? If I gave you quux :: a -> b -> a, I bet you could tell me what it does.
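For checking your answer: the only sensible total implementation of that signature returns the first argument, which is exactly what the Prelude's const does. A minimal sketch:

```haskell
-- quux :: a -> b -> a can only hand back its first argument.
quux :: a -> b -> a
quux x _ = x

main :: IO ()
main = do
  print (quux 'a' True)                          -- 'a'
  print (quux 1 "ignored" == const 1 "ignored")  -- True
  -- Partially applied, it ignores whatever it is given next.
  print (map (quux 0) [1, 2, 3])                 -- [0,0,0]
```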

In fact, armed with just a little knowledge about types, you can start to infer what functions do without even seeing their definitions. Here are a few functions with deliberately poor names; what do they do?

bananaFactory :: a -> a

-- (a,b) is a tuple of two elements of type a and type b
spannerBlender :: (a,b) -> a

-- (a -> b) is a function taking anything of type a and returning type b
-- [a] is a list of items of type a
omgWTF :: (a -> b) -> [a] -> [b]

-- "Num a =>" says a must be an instance of the Num typeclass
-- think of this as specifying an interface
-- boing takes two numbers of the same type and returns a number of that type
boing :: (Num a) => a -> a -> a

-- m is a type constructor (such as [] or Maybe) applied to a type a
-- "Functor m =>" says m must support mapping over its contents
mindBlown :: (Functor m) => (a -> b) -> m a -> m b
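To check your guesses, here is one plausible implementation of each signature. bananaFactory and spannerBlender have only one sensible implementation (Prelude's id and fst). omgWTF could also, say, reverse or drop elements, and boing could use any Num operation, so these are the most likely readings rather than the only ones. mindBlown needs a Functor m constraint to be implementable at all (without one there is no way to build an m b other than undefined); with it, it is Prelude's fmap.

```haskell
bananaFactory :: a -> a
bananaFactory x = x                      -- same as Prelude's id

spannerBlender :: (a, b) -> a
spannerBlender (x, _) = x                -- same as Prelude's fst

-- Short names (f, x, xs) suffice: the type already says what they are.
omgWTF :: (a -> b) -> [a] -> [b]
omgWTF _ []       = []
omgWTF f (x : xs) = f x : omgWTF f xs    -- same as Prelude's map

boing :: Num a => a -> a -> a
boing x y = x + y                        -- one guess; (*) or (-) also fit

mindBlown :: Functor m => (a -> b) -> m a -> m b
mindBlown = fmap

main :: IO ()
main = do
  print (bananaFactory "same")           -- "same"
  print (spannerBlender (1, "one"))      -- 1
  print (omgWTF (* 2) [1, 2, 3])         -- [2,4,6]
  print (boing 2 3)                      -- 5
  print (mindBlown (+ 1) (Just 41))      -- Just 42
  print (mindBlown length ["ab", "cde"]) -- [2,3]
```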

With this basic ability to read Haskell type signatures, you're now equipped to use Hoogle, the Haskell API search engine. You can search for the type signatures given above (a -> a, (a,b) -> a, (a -> b) -> [a] -> [b] and (a -> b) -> m a -> m b) and get a good idea of what these functions do.

I think this is why long variable names are less common in functional programming: the languages are terser, so scopes are smaller (Uncle Bob's rule still applies), and it's the type signatures, not the variable names, that give you the power to reason about the code.

Names are important; but not all names are equally important.
