Monday 23 February 2009

Bit Fields using Clojure

A bit array is a way of getting a very compact array of Boolean values with each value being represented as a single bit. Bit arrays are usually associated with low(er)-level like C, but you can do them in Clojure too.

Clojure provides array functions through aget and aset (slightly strange in that it mutates a data structure in place). Using Java arrays isn't very functional, but in my opinion this is about providing the balance between strictness (e.g. Haskell) and leniency (e.g. C).

We define a bit-array as a structure consisting of some array data and the width of each field.


(defstruct bit-field :element-width :array-data)

(defn bit-array
[n]
(struct bit-field 31 (int-array (inc (int (/ n 31))))))


Where 31 is the range of an integer in Java (it doesn't support unsigned) (yeah, my thinking is fuzzy here, but I think that's right... Certainly 32 fails all my tests.).

A bit-array only consists of a couple of operations, get-bit and set-bit!. The bang (!) notation at the end of the set-bit function name is an informal way of indicating that the function mutates its data.


(defn set-bit!
[bitfield bit val]
(let [r (mod bit (bitfield :element-width))
n (int (/ bit (bitfield :element-width)))
x (aget (bitfield :array-data) n)]
(if (not (zero? val))
(aset (bitfield :array-data) n (bit-or x (bit-shift-left 1 r)))
(aset (bitfield :array-data) n (bit-xor x (bit-shift-left 1 r))))
bitfield))

(defn get-bit
[bitfield bit]
(let [r (mod bit (bitfield :element-width))
x (aget (bitfield :array-data) (int (/ bit (bitfield :element-width))))]
(if (= 0 (bit-and x (bit-shift-left 1 r))) 0 1)))


We work out the index in the array to change by dividing the bit to set by the width of each element and use bit-shift-left to identify the bit in question to twiddle.

How can we be sure this works? Well, we can never be 100% sure without a proof, but we can at least run it against a reasonable set of data! Clojure Contrib has a very simple library for testing. You define tests using the deftest macro, each test consists of a number of is functions that verify assertions.



(deftest test-bits
(let [n 3000
f (bit-array n)]
(is (= 31 (f :element-width)))
(doseq [x (range 0 n)]
(is (= 0 (get-bit f x))))
(doseq [x (range 0 n)]
(set-bit! f x 1)
(is (= 1 (get-bit f x))))
(doseq [x (range 0 n)]
(set-bit! f x 0)


Tests are run using run-tests. For example:


bitarray> (run-tests 'bitarray)

Testing bitarray

Ran 1 tests containing 9001 assertions.
0 failures, 0 errors.
nil


So it passes the tests therefore it at least vaguely works!

4 comments:

  1. I wrote a macro that's similar in spirit, but allows you to access structures composed of variable-sized integers:

    http://github.com/nathell/clj-bitfields/tree/master

    ReplyDelete
  2. Hi Daniel,

    That looks really interesting, mine only focuses on single bits so that's a nice generalization. Would have been very useful for the ICFP 2009 contest as I struggled with problems like that!

    ReplyDelete
  3. Wouldn't it be more straightforward to use a boolean array, instead of encoding booleans as int bits?

    ReplyDelete
  4. @Febeling - yeah, it'd definitely be more straight forward to use a Boolean array.

    A Boolean array isn't necessarily implemented as a bit field - see here where it implies that the size of a Boolean array is one byte per Boolean.

    Java also has java.util.BitSet class which I could have used as well.

    This was more about me learning about array interop with Clojure :)

    ReplyDelete