ArrayList and be done with it. Right? Unfortunately, if you choose the wrong data type and don't encapsulate it properly, then you're saddled with the results for the foreseeable future.
Using data structures
One of the basic principles of object-oriented design is that you can hide details (encapsulation). One of the best things to hide is how your data is stored. That way, if you do choose a list of pairs instead of a map, you can at least fix it locally!
In most of the legacy code bases I've seen, developers have gone the other way. They've spot-welded the choice of data structure onto objects and shouted it proudly about the code base.
// I AM DEFINITELY A LIST
class Customers : List<Customer> {
}
Either Customers really is just a List (in which case this is probably an abstraction too far) or it represents a group of customers that has specific behaviour.
Slightly better is to at least hide some details. Implementing an interface isn't quite so spot-welded as implementation inheritance: at least now you can change the underlying implementation without telling the outside world. I've seen people object to this because of the "noise" it generates (all those delegating members). This monotony can often be auto-generated (thanks, ReSharper), and these delegating members are often a stepping stone to a clearer design.
class Customers : IList<Customer> {
private List<Customer> implementation;
// a million and one delegating members
}
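To make the delegation idea concrete, here is a minimal Java sketch (Java because the post's examples already mix C# and Java). It assumes a hypothetical Customer record (Java 16+) and exposes only Iterable<Customer> plus two hand-written delegating members rather than the whole List interface. The factory names backedByList and backedBySet are illustrative, not from any library.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashSet;

// Hypothetical Customer type, just enough for the example.
record Customer(String name) {}

// Callers see Iterable<Customer> plus the few methods chosen here; the
// backing collection is a private detail that can change without them noticing.
final class Customers implements Iterable<Customer> {
    private final Collection<Customer> implementation;

    private Customers(Collection<Customer> implementation) {
        this.implementation = implementation;
    }

    // Two interchangeable storage choices behind the same public surface.
    static Customers backedByList() { return new Customers(new ArrayList<>()); }
    static Customers backedBySet()  { return new Customers(new LinkedHashSet<>()); }

    // Delegating members: forward to the hidden collection.
    void add(Customer customer) { implementation.add(customer); }
    int size() { return implementation.size(); }

    // Read-only iterator, so callers can't remove() elements behind our back.
    @Override
    public Iterator<Customer> iterator() {
        return Collections.unmodifiableCollection(implementation).iterator();
    }
}

class CustomersDemo {
    public static void main(String[] args) {
        for (Customers customers : new Customers[] {Customers.backedByList(), Customers.backedBySet()}) {
            customers.add(new Customer("Ada"));
            customers.add(new Customer("Grace"));
            for (Customer c : customers) System.out.print(c.name() + " ");
            System.out.println("(" + customers.size() + ")"); // prints "Ada Grace (2)" for both backings
        }
    }
}
```

Callers can't tell the two backings apart. The one behavioural difference is duplicates: the set-backed version ignores a second add of an equal Customer, which is exactly the kind of decision that is easier to revisit when it lives in one class.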
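Better still, make Customers an object in its own right. Give it methods to manipulate its data. Make it a living, breathing object, and not just a pale copy of a collection.
class Customers {
    public Invoice GenerateInvoice() {
        // Invoice logic elided: it works on the private set below.
        throw new NotImplementedException();
    }
    private ISet<Customer> customers = new HashSet<Customer>();
}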
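A Java sketch of the same shape, assuming Java 16+ for records. Customer's amountDueCents field, the Invoice record and the format of its lines are all hypothetical: the post only names GenerateInvoice. The point is that callers ask Customers for an invoice instead of pulling out the set and looping over it themselves.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical domain types; only the names Customers, Customer and Invoice come from the post.
record Customer(String name, long amountDueCents) {}

record Invoice(List<String> lines, long totalCents) {}

class Customers {
    // The storage choice stays private: nothing outside knows this is a Set.
    private final Set<Customer> customers = new LinkedHashSet<>();

    void add(Customer customer) {
        customers.add(customer);
    }

    // Behaviour lives next to the data instead of in every caller.
    Invoice generateInvoice() {
        List<String> lines = customers.stream()
                .map(c -> c.name() + ": " + c.amountDueCents())
                .toList();
        long total = customers.stream().mapToLong(Customer::amountDueCents).sum();
        return new Invoice(lines, total);
    }

    public static void main(String[] args) {
        Customers customers = new Customers();
        customers.add(new Customer("Ada", 1200));
        customers.add(new Customer("Grace", 800));
        System.out.println(customers.generateInvoice().totalCents()); // 2000
    }
}
```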
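It's all too easy to undo that good work by handing the internal collection straight back out:
class Customers {
    private ISet<Customer> customers = new HashSet<Customer>();
    public ISet<Customer> GetCustomers() {
        return customers;
    }
}
var set = customers.GetCustomers();
logger.Debug("Found {0} customers", set.Count);
// Clear the memory, clearly we don't need it.
set.Clear();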
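The same trap reproduced in Java, with plain strings standing in for customers and hypothetical class names. The leaky version returns its live set, so the caller's clear() wipes the object's own state. The guarded version returns Collections.unmodifiableSet, a read-only view whose mutators throw UnsupportedOperationException.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Leaky: returns the live internal set, so callers share (and can mutate) our state.
class LeakyCustomers {
    private final Set<String> customers = new HashSet<>();

    LeakyCustomers(Set<String> initial) { customers.addAll(initial); }

    Set<String> getCustomers() { return customers; }

    int count() { return customers.size(); }
}

// Guarded: returns a read-only view. Mutators on the view throw
// UnsupportedOperationException at run time; the compiler doesn't stop the call.
class GuardedCustomers {
    private final Set<String> customers = new HashSet<>();

    GuardedCustomers(Set<String> initial) { customers.addAll(initial); }

    Set<String> getCustomers() { return Collections.unmodifiableSet(customers); }

    int count() { return customers.size(); }
}

class AliasingDemo {
    public static void main(String[] args) {
        LeakyCustomers leaky = new LeakyCustomers(Set.of("Ada", "Grace"));
        leaky.getCustomers().clear();          // "clearly we don't need it"
        System.out.println(leaky.count());     // 0: the object's own data is gone

        GuardedCustomers guarded = new GuardedCustomers(Set.of("Ada", "Grace"));
        try {
            guarded.getCustomers().clear();    // compiles fine...
        } catch (UnsupportedOperationException e) {
            System.out.println("clear() rejected"); // ...but fails at run time
        }
        System.out.println(guarded.count());   // 2
    }
}
```

The guarded call still compiles; the protection only kicks in at run time.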
The Customers object is responsible for managing that collection. If you expose its internal details to the world then it's got no chance!
If you have to expose it, use the types (and, if you must, the objects) in your language that make sure you won't get aliasing problems (such as Java's Collections.unmodifiableSet or C#'s IEnumerable). As a slight aside, I strongly dislike that Java's Iterator interface defines remove: even a read-only collection hands out iterators with a remove method, so instead of a compile-time guarantee I have to rely on a run-time guarantee of immutability (I never did get a great answer to this question).
Choosing the right data structure
There are two parts to every data structure. The first is the abstract data type (ADT): what operations does the data structure allow? The second is the underlying implementation of the ADT: what are its trade-offs? Is it a linked list or an array list?
In my experience people tend to focus on the latter (how it is implemented) instead of choosing the right data structure in the first place. Every time I see a list of pairs instead of a map, or a list without duplicates instead of a set, I sigh a little.
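A small Java illustration of the list-of-pairs smell, using a hypothetical Pair record (Java 16+) for id/name entries. The list happily stores two entries for the same key and answers lookups with a linear scan that returns whichever comes first; a Map states the one-value-per-key rule in its type. Here Map is the abstract data type and HashMap just one implementation of it (TreeMap would trade expected constant-time lookups for O(log n) ones with sorted keys).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entry type: a customer id paired with a name.
record Pair(int id, String name) {}

class PairsVersusMap {
    // With a list of pairs, lookup is a linear scan and the first match wins.
    static String findInPairs(List<Pair> pairs, int id) {
        for (Pair p : pairs) {
            if (p.id() == id) {
                return p.name();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Pair> pairs = new ArrayList<>();
        pairs.add(new Pair(1, "Ada"));
        pairs.add(new Pair(1, "Grace"));           // nothing stops a duplicate key
        System.out.println(pairs.size());          // 2
        System.out.println(findInPairs(pairs, 1)); // Ada

        // Map is the ADT (one value per key); HashMap is one implementation of it.
        Map<Integer, String> byId = new HashMap<>();
        byId.put(1, "Ada");
        byId.put(1, "Grace");                      // replaces "Ada"
        System.out.println(byId.size());           // 1
        System.out.println(byId.get(1));           // Grace
    }
}
```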
As a quick example of the perils of choosing the wrong data structure, I've recently been working on refactoring some code of this form.
var xs = new List<int>();  // element type is illustrative
var ys = new List<int>();
foreach (var y in ys)
    if (xs.Contains(y))    // List<T>.Contains is a linear scan over xs
        DoSomething();
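There are a number of problems with the code above, but the root cause is choosing the wrong data structure: Contains on a list is a linear scan, so the loop does work proportional to the product of the two lists' sizes, whereas a set answers each membership test in expected constant time. Despite the rich collection libraries in Java and C#, most developers plump for a list. Premature optimization is one thing, but choosing the wrong data structure is just as bad.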
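One possible fix, sketched in Java under the assumption that xs is only used for membership tests: build a HashSet once and query that instead. DoSomething is replaced by a hypothetical match counter so the two versions can be compared; they return the same result, but the set version does expected constant-time lookups instead of rescanning xs for every y.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MembershipDemo {
    // Original shape: xs.contains(y) on a List scans xs, so this is O(|xs| * |ys|).
    static int countMatchesWithList(List<Integer> xs, List<Integer> ys) {
        int matches = 0;
        for (Integer y : ys) {
            if (xs.contains(y)) matches++;     // stands in for DoSomething()
        }
        return matches;
    }

    // Same question asked of a Set: building it is O(|xs|) and each contains()
    // is expected O(1), so the loop is roughly O(|xs| + |ys|).
    static int countMatchesWithSet(List<Integer> xs, List<Integer> ys) {
        Set<Integer> lookup = new HashSet<>(xs);
        int matches = 0;
        for (Integer y : ys) {
            if (lookup.contains(y)) matches++; // stands in for DoSomething()
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 2);
        List<Integer> ys = List.of(2, 4, 2, 3);
        System.out.println(countMatchesWithList(xs, ys)); // 3
        System.out.println(countMatchesWithSet(xs, ys));  // 3
    }
}
```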
Hide your decisions about collections. At least that way you can fix it locally.