Build a Collection of Unique Values in Ruby

One of the really cool things about working with Rails is that you can actually read the source code: it's a bit like seeing how a magician does all his tricks. I'm always interested in this kind of code, since it tends to be a good indication of best practices.

I came across a little snippet of code while reading The Rails 3 Way that is a nice example of how to build a collection of unique values in Ruby. Active Record uses this snippet in its AssociationCollection class to remove duplicate records when an association is declared with the :uniq => true option.

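require 'set'

# Returns the records in collection with duplicate ids removed,
# keeping the first record seen for each id.
def uniq(collection = self)
  seen = Set.new                      # ids encountered so far
  collection.inject([]) do |kept, record|
    unless seen.include?(record.id)
      kept << record                  # first time we see this id: keep the record
      seen << record.id
    end
    kept                              # becomes the accumulator for the next iteration
  end
end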
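The method above lives inside Active Record's AssociationCollection, so it can't be run on its own. Here is a minimal standalone sketch of the same algorithm, assuming a hypothetical Record struct as a stand-in for a model (anything that responds to id would do) and a hypothetical method name, uniq_by_id:

```ruby
require 'set'

# Hypothetical stand-in for an Active Record model: anything that responds to #id.
Record = Struct.new(:id, :name)

# Same algorithm as the Rails snippet, pulled out so it can run on its own.
# Keeps the first record seen for each id, in the original order.
def uniq_by_id(collection)
  seen = Set.new
  collection.inject([]) do |kept, record|
    unless seen.include?(record.id)
      kept << record
      seen << record.id
    end
    kept
  end
end

records = [
  Record.new(1, "alice"),
  Record.new(2, "bob"),
  Record.new(1, "alice (duplicate)"),
  Record.new(3, "carol")
]

unique = uniq_by_id(records)
p unique.map(&:name) # => ["alice", "bob", "carol"]
```

For comparison, modern Ruby's Array#uniq also accepts a block, so records.uniq(&:id) keeps the first element for each id in the same way.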

Being pretty new to Ruby myself, I didn't understand all of it, which is what triggered this post.

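This is the first time I've seen the Set class in use. I remember C# having a set type as well (HashSet<T>), but it doesn't seem to be a popular collection type. As I understand it, a Set is similar to an Array with two important differences: it never holds the same element twice, and checking whether an element is present is fast. Sets are also unordered as far as their contract goes, so code shouldn't depend on element order. The Ruby documentation describes Set as a hybrid of Array's intuitive inter-operation facilities and Hash's fast lookup, which suggests that membership checks take constant time on average (O(1)), rather than the linear scan performed by Array#include?. From that description it sounds particularly well suited to this problem.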
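A short sketch of the Set operations the snippet relies on (<< and include?), plus Set#add?, which returns nil when the element was already present:

```ruby
require 'set'

ids = Set.new
ids << 1
ids << 2
ids << 1              # already present, so the set is unchanged

p ids.size            # => 2
p ids.include?(1)     # => true
p ids.include?(5)     # => false

# add? returns the set when the element is new, and nil when it was already there.
p ids.add?(3).nil?    # => false (3 was new)
p ids.add?(3).nil?    # => true  (3 was already present)
```

Because duplicates are silently ignored, a Set is a natural place to record "ids seen so far".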

The second bit I found confusing was the use of Enumerable#inject. I've blogged about this bit of magic before, but this particular form, which takes an initial value, seems a bit different. Again, the Ruby documentation is pretty useful for figuring out what's going on.
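To illustrate the difference, here is a small sketch of inject called with and without an initial value, using plain integers rather than records:

```ruby
numbers = [1, 2, 3, 4]

# Without an initial value, the first element becomes the starting accumulator
# and the block runs for the remaining elements.
sum = numbers.inject { |acc, n| acc + n }
p sum                  # => 10

# The same thing, passing the method name as a symbol.
p numbers.inject(:+)   # => 10

# With an initial value, that value is the starting accumulator and the block
# runs once for every element. Here the accumulator is an array, as in the
# Rails snippet.
doubled = numbers.inject([]) { |acc, n| acc << n * 2 }
p doubled              # => [2, 4, 6, 8]
```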

So basically we want to loop over the records and keep only the first one for each id. As we loop, we maintain two collections: a Set of the ids we have already encountered, and an array of the unique records kept so far.
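The same idea can be written with a plain each loop, which makes the two collections explicit. This is a sketch for comparison, not the Rails code; the Record struct and the uniq_with_each name are hypothetical:

```ruby
require 'set'

# Hypothetical stand-in for a model with an id.
Record = Struct.new(:id)

def uniq_with_each(collection)
  seen = Set.new   # ids we have already encountered
  kept = []        # unique records, in original order
  collection.each do |record|
    next if seen.include?(record.id)   # duplicate id: skip it
    seen << record.id
    kept << record
  end
  kept
end

p uniq_with_each([Record.new(1), Record.new(2), Record.new(1)]).map(&:id) # => [1, 2]
```

The inject version folds the kept array into the accumulator instead of a separate local variable.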

This form of inject takes an initial value (an empty array, in this case) and a block with two parameters. The first parameter, kept, is the accumulator: on the first iteration it is the initial value, and on every later iteration it is whatever the block returned on the previous one. The second parameter, record, is the element currently being processed. When the loop finishes, inject returns the final value of the accumulator. Once you understand the parameters and realize that a Set supports the same include? and << calls you would use on an array, the code is straightforward.
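To see the accumulator being threaded through, here is a sketch that prints kept at the start of each iteration (again assuming a hypothetical Record struct):

```ruby
require 'set'

Record = Struct.new(:id)
records = [Record.new(1), Record.new(1), Record.new(2)]

seen = Set.new
result = records.inject([]) do |kept, record|
  # kept is whatever the previous iteration returned ([] on the first pass)
  puts "kept ids before: #{kept.map(&:id).inspect}, record id: #{record.id}"
  unless seen.include?(record.id)
    kept << record
    seen << record.id
  end
  kept
end

p result.map(&:id) # => [1, 2]
```

This prints "kept ids before: [], record id: 1", then "kept ids before: [1], record id: 1", then "kept ids before: [1], record id: 2": the duplicate on the second pass leaves kept unchanged.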

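If you're unfamiliar with Ruby, keep in mind that methods and blocks return the value of the last evaluated expression (unless there is an explicit return). That's why the block ends with a bare kept: whatever the block returns becomes the accumulator for the next iteration. Without that line, the block would return the value of the unless expression instead, which is nil whenever a duplicate is skipped.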
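A sketch of what goes wrong if the trailing kept is dropped, assuming a hypothetical Record struct and input with a duplicate followed by a new id:

```ruby
require 'set'

Record = Struct.new(:id)
records = [Record.new(1), Record.new(1), Record.new(2)]

seen = Set.new
error = nil
begin
  # Same block as before, but WITHOUT the trailing `kept` line.
  records.inject([]) do |kept, record|
    unless seen.include?(record.id)
      kept << record
      seen << record.id   # last expression when the record is new: the Set itself
    end                   # when the record is a duplicate, `unless` evaluates to nil
  end
rescue NoMethodError => e
  error = e
end

puts error.message
```

After the first pass the accumulator is the seen Set rather than the array; after the duplicate it is nil, so the third pass fails when it calls << on nil. The exact error message varies between Ruby versions.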

Happy coding.