Repository Pattern – Are we using it wrong?

The repository pattern has seen a lot of publicity lately – if you’re not sure how it works I would suggest that you take a look at Rob Connery’s excellent MVC Storefront webcast series.  While implementing the repository pattern a couple of weeks ago it struck me that I might be introducing subtle concurrency errors.

Lazy List

The problem does not lie with the repository pattern itself – the problem lies with the use of the Lazy List class, also introduced in the MVC Storefront series.  The Lazy List class allows us to late-load data associated with a parent object – it’s the way we do lazy loading with the repository pattern (hence the name).

The implementation is very simple – we basically wrap the query for loading associated data in an IList implementation and only execute the query when necessary.  You can find the implementation as well as the discussion of how it came about on Rob Connery’s blog.

Don’t get me wrong – there is absolutely nothing wrong with the implementation – I’m simply trying to illustrate that we can use this class in a way that introduces concurrency errors.

Example

A while ago I wrote an application similar to StackOverflow as an illustration of MVC – I’m going to use this as an example.  In this domain model we have a post object, and a post has zero or more comments.

Domain

This is the ideal scenario for using the Lazy List class – we want to delay the loading of comments until they are used.  The alternative is to load all comments when we load the post object – an unnecessary (and possibly expensive) overhead.

public IQueryable<Domain.Post> GetPosts()
{
    var posts = from post in dataContext.Posts
                join user in dataContext.Users on post.CreatedBy equals user.Id
                select new Domain.Post
                {
                    Id = post.Id,
                    Title = post.Title,
                    Body = post.Body,
                    Rating = post.Rating,
                    Views = post.Views,
                    Tags = post.Tags,
                    CreatedOn = post.CreatedOn,
                    CreatedBy = user.Username,
                    Timestamp = post.Timestamp,
                    Comments = GetCommentsForPost(post.Id)
                };

    return posts;
}

private IList<Domain.Comment> GetCommentsForPost(int postId)
{
    var comments = from comment in dataContext.Comments
                   where comment.PostId == postId
                   select new Domain.Comment
                   {
                       Id = comment.Id,
                       Message = comment.Message,
                       CreatedOn = comment.CreatedOn,
                       Timestamp = comment.Timestamp
                   };

    return new LazyList<Domain.Comment>(comments);
}

If you don’t believe the lazy loading is working, fire up profiler and check for yourself.  Go on, I’ll wait right here.

There are 2 scenarios that I want to cover in terms of concurrency issues.  To get started, lets take a look at a single post with 2 comments.

Data

So imagine Joe developer came along and asked a question about Attributes in C# and so far there has been 2 comments.  I loaded the post using the repository above and the query for loading the associated comments has been loading into the lazy list object but not yet executed.

Loading data that wasn’t there at first

Now let’s take a look at a scenario where a comment was added after we loaded the post but before the comments were loaded.

var post = repository.GetPosts().WithId(2);

AddComment(post, "Added after we loaded the post");

DisplayPost(post);

FirstExample

The lazy loading is working and it resulted in us loading data that wasn’t there when we loaded the original post.  So what should have happened?  What data should we have seen?

The lazy loading resulted in us loading the correct data.  If we had not lazy loaded the comments (and loaded them upfront with the post) we would have missed the last comment – our post object would have been outdated.  Pretty cool – the lazy loading actually resulted in us loading the most current associated data.

Now let’s look at another scenario where the results are not that great.

Loading data that shouldn’t be there

Imagine the same scenario as above, but instead of a simple comment being added, Joe developer modified his original post and 2 comments were added due to his change.</p>

var post = repository.GetPosts().WithId(2);

ModifyPost(post, "Attributes in C# - UPDATED!");
AddComment(post, "Woah - you updated the post");
AddComment(post, "No way - that's crazy - updating the post");

DisplayPost(post);

SecondExample

Again the lazy loading resulted in us loading comments which weren’t there when we loaded the original post, but here we have a concurrency error – we have 2 sets of data which are incorrect when combined.  The original post was modified and as a result the comments don’t make sense.  (Of course in this case the nature of the comments dictate that they don’t make sense, but you get the idea)

Of course this is unlikely to happen in the case of a post/comment system and even if it does occur, the results are not that dramatic.  However, imagine a hospital system where the list of a patient’s medicine is lazy loaded or an accounting system where expenses against an account are lazy loaded – inconsistent results can have serious consequences.

So what should we do?

What’s the conclusion?  Should we stop using lazy loading in critical areas?  Not necessarily.  In the examples above, the post we displayed was only problematic when the post was modified before we loaded the comments.  By modifying the lazy list we can maintain the lazy loading as well as avoid concurrency issues.

My original idea was to join to the posts table and add a where clause for the timestamp of the post.  While this would prevent us from loading any comments that are not specific to the original version of the post, if the post is modified (as in the second example), the result would simply be that no comments are returned – the where clause would simply exclude all comments.  We would have no way of knowing if we are dealing with a post with no comments or a concurrency issue.

The alternative is to store 2 queries – one for loading the original object with the original timestamp and one for loading the associated objects.  We execute both queries inside a transaction – if the original object is found we know that it has not been modified and we can load the associated objects.

public IList<T2> Inner
{
    get
    {
        if (inner == null)
        {
            using (var transaction = new TransactionScope())
            {
                if (Equals(parentObject.SingleOrDefault(), default(T1)))
                {
                    throw new ChangeConflictException(string.Format("Lazy loading of a list of objects of type {0} failed - the parent object of type {1} was modified", typeof(T2).Name, typeof(T1).Name));
                }
                inner = query.ToList();

                transaction.Complete();
            }
        }
        return inner;
    }
}
private IList<Domain.Comment> GetCommentsForPost(int postId, Binary timestamp)
{
    var parentObject = from post in dataContext.Posts
                       where post.Id == postId && post.Timestamp == timestamp
                       select new Domain.Post();

    var comments = from comment in dataContext.Comments
                   where comment.PostId == postId
                   select new Domain.Comment
                   {
                       Id = comment.Id,
                       Message = comment.Message,
                       CreatedOn = comment.CreatedOn,
                       Timestamp = comment.Timestamp
                   };

    return new LazyList<Domain.Post, Domain.Comment>(parentObject, comments);
}

I’m not too crazy about using the TransactionScope object inside the Lazy List class, but I can’t see any other way of ensuring concurrency and still using lazy loading.

If I run the second example again an exception is generated.

Exception

Final thoughts

You could argue that this type of concurrency is unnecessary – and you would probably be right.  In most applications these types of errors would be incredibly rare and when they occur the user would simply need to hit refresh.  I am only trying to show a way of eliminating this type of error for systems where this type of concurrency is necessary.