Tuesday, 23 April 2013

Lamenting the lack of RAII in C# / Java

One of the hardest problems in programming is managing resources. How do you make sure you don't leak memory, that you close that open file handle, and that you release that network connection?

In C you get a file handle from the fopen function, and you absolutely must remember to give it back with fclose. If you forget to call fclose, the underlying file descriptor in the operating system is never closed, and eventually you'll run out of file handles. In the simple case it's easy: just open the file and close it. If you aren't careful, though, it quickly gets complicated. What if your function has multiple exit points? You've got to remember to call fclose on every one of them. Manually managing resources is a genuinely tough problem. Here's a simple example that leaks a file descriptor. Can you see why?

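#include <stdio.h>

int copyFile(char* szIn, char* szOut) {
    FILE* in;
    FILE* out;
    int c;

    in = fopen(szIn, "r");
    if (in == NULL) return 7;

    out = fopen(szOut, "w");
    if (out == NULL) return 8;

    /* read from in, write to out */
    while ((c = fgetc(in)) != EOF) fputc(c, out);

    fclose(out);
    fclose(in);
    return 0;
}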

C++ introduced the RAII pattern ("Resource Acquisition Is Initialization"), which is a fancy way of saying: acquire in your constructor and release in your destructor. Using RAII, I can write a file copy without having to remember to close any files.

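#include <fstream>

int copyFilePlusPlus(char* szIn, char* szOut) {
   std::ifstream in(szIn);
   std::ofstream out(szOut);

   if (!in.is_open()) return 7;
   if (!out.is_open()) return 8;

   /* do the copy */
   out << in.rdbuf();

   /* no close calls: the destructors of in and out close both files on every return path */
   return 0;
}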

The knowledge about the resources can now be completely hidden in the object. Progress!

The most common resource to manage is memory. Garbage collection frees the programmer from having to worry about reclaiming memory (and hence makes it harder to leak memory). Unfortunately, garbage collection also means giving up control over when objects are released. This lack of deterministic finalization makes handling other resources more difficult: without a destructor that runs when an object goes out of scope, you can't hide the details of resource ownership as elegantly as you can with RAII. (Brian Harry has written about some of the reasons why C# doesn't have deterministic finalization.)

C# introduced the using statement to deal with this problem. It gives you deterministic disposal, and with it precise control over the lifetime of a resource.

using (var f = File.OpenRead("foo.txt")) {
  // use f here
}

is compiled into roughly the following:

FileStream f = File.OpenRead("foo.txt");
try {
    // use f here
} finally {
    if (f != null) ((IDisposable)f).Dispose();
}

The using statement saves us from writing all of that boilerplate by hand. This is progress of a kind, but we still have to remember to write the using block (or call Dispose explicitly), and we can't hide the knowledge of the resource behind an object as we can in C++. Objects that hold unmanaged resources must implement the IDisposable interface, and everyone who uses those objects must remember to wrap them in using blocks.
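Java had no equivalent of using before version 7, so Java code of that era wrote the same expansion out by hand. The sketch below shows that idiom; the class `ManualClose` and its `firstLine` method are hypothetical names chosen for illustration, not part of any library.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper class illustrating the pre-Java 7 idiom.
class ManualClose {
    // Hypothetical method: returns the first line of a text file,
    // closing the reader by hand in a finally block.
    static String firstLine(String path) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(path));
        try {
            return reader.readLine();
        } finally {
            // Runs on every exit path, including when readLine() throws.
            reader.close();
        }
    }
}
```

Forgetting the finally block, or closing only on the success path, reintroduces exactly the leak from the C example.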

Java was a bit slow in adopting this convention, but eventually got around to copying it with Java 7 (see try-with-resources).
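As a rough sketch of what try-with-resources buys you, here is a Java counterpart of the leaky copyFile example. The class name `CopyWithResources` is hypothetical, and it reports failures by throwing IOException rather than returning 7 or 8. Resources declared in the try header are closed automatically, in reverse order of opening, however the block exits, so failing to open the output file no longer leaks the input file.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical class name; a sketch, not a library API.
class CopyWithResources {
    static void copyFile(String inPath, String outPath) throws IOException {
        // Both streams are closed when the try block exits, out first, then in.
        // If opening outPath throws, in is still closed before the exception propagates.
        try (InputStream in = new FileInputStream(inPath);
             OutputStream out = new FileOutputStream(outPath)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }
}
```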

Without the RAII pattern, Java and C# are more susceptible to leaking resources. As Herb Sutter puts it:

I called this Dispose pattern “a coding pattern that is off by default and causing correctness or performance problems when it is forgotten.” That is all true. You have to remember to write it, and you have to write it correctly. In contrast, in C++ you just put stuff in scopes or delete it when you're done, whether said stuff has a nontrivial destructor or not.

Tools like Gendarme can use static analysis to find places where disposable objects haven't been disposed, but the analysis isn't perfect.

So what are the best practices for managing resources in Java and C#?

  • You should always dispose of IDisposable or AutoCloseable objects that you create locally. If you open a file stream to read from, remember to close it! Use tools like Gendarme or ReSharper to check for violations.

  • If you write a class that has a member variable that is disposable, then that same class should also be disposable.

  • Have a clear convention for denoting who owns a given return value. For example, GetXYZ is unclear: does it return a new XYZ that I must dispose, or a shared reference to an existing XYZ?

  • Make object lifetimes clear!

  • Fail fast: don't let disposed objects be used again (see the Dispose pattern).
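Several of these practices can be combined in one class. The sketch below uses a hypothetical `CharSource` class (not from any library): it owns a Reader member, so it implements Closeable itself; it uses an `open` prefix on its factory method as an assumed naming convention meaning "the caller owns the result and must close it"; and it fails fast with IllegalStateException if used after close().

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical class. It owns a disposable member, so it is disposable too.
class CharSource implements Closeable {
    private final Reader reader;   // owned: released when this object is closed
    private boolean closed = false;

    private CharSource(Reader reader) {
        this.reader = reader;
    }

    // Assumed convention: "open" means the caller receives a new object
    // and is responsible for closing it (unlike an ambiguous "get").
    static CharSource openFromString(String text) {
        return new CharSource(new StringReader(text));
    }

    // Returns the next character, or -1 at end of input.
    int nextChar() throws IOException {
        ensureOpen();
        return reader.read();
    }

    private void ensureOpen() {
        if (closed) {
            // Fail fast, much as .NET code throws ObjectDisposedException.
            throw new IllegalStateException("CharSource has been closed");
        }
    }

    @Override
    public void close() throws IOException {
        if (closed) return;   // closing twice has no effect
        closed = true;
        reader.close();       // release the owned member
    }
}
```

Callers would then write `try (CharSource s = CharSource.openFromString("...")) { ... }`, so the owned Reader is released when the block exits, and any accidental use afterwards fails immediately instead of silently misbehaving.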

The main point to take away is that resource management is hard, but it's often easy to assume it isn't until it goes wrong!