The Amazing Visitor Pattern

I’m a big proponent of using design patterns whenever they are the proper tool for the job. One of the design patterns I find myself pulling out of the toolbox fairly frequently these days is the visitor design pattern. However, it’s not one of the more common patterns you see many people blogging about. So I’d like to cover it with some real-world examples of how I put it to use.

Before delving into code behind the visitor pattern, it’s best to understand the problems this pattern sets about solving. In C++ (and many similar OO languages), you have what’s called single dispatch semantics for method calls. Which method gets called is based on the vtable of the class instance object for the method being called. When you have an expression like foo.bar(x), foo’s dynamic type is used to determine what method bar is called. Functions can be overloaded based on the static type of x to call different versions of the bar method. However, this overloading is done at compile time. The compiler generates different internal names for functions based on the parameter information. So, for instance, given:

void bar( int i );
void bar( double d );

The compiler might mangle the names to bar_i and bar_d to represent the integer and double versions of the functions, respectively. So when you go to call the method, the compiler can use the type information known at compile time of the parameter to look up the appropriate instance of the method to call.

However, there are times when single dispatch is not powerful enough to express some ideas elegantly. For instance, if you want to walk a tree structure, performing an action for every node. This is fairly trivial if all of the nodes are of the same type. You could do so like this:

struct Node {
  struct Node *Left, *Right;
  void *Data;
} Node;

class INodeHandler {
public:
  virtual void HandleNode( Node *n ) = 0;
};

class TreeWalker {
public:
  void Walk( Node *n, INodeHandler *handler ) {
    handler->HandleNode( n );
    if (n->Left)
      Walk( n->Left, handler );
    if (n->Right)
      Walk( n->Right, handler );
  }
};

You call the Walk method and pass in the root of the tree, and it will call HandleNode for every node in the tree. This works fine since all of the Nodes are the same type. However, if the Nodes contain different types of data, this is fairly inelegant for C++ — we’re not likely to use a Node with a void * representing the data, as that’s not particularly type-safe. Instead, we’re likely to have a Node base class, and represent the data as subclasses. Let’s see how that might work:

class Node {
  friend class TreeWalker;
  Node *Left, *Right;

protected:
  Node( Node *left, Node *right ) : 
    Left( left ), Right( right ) {}

};

class AwesomeNode : public Node {
  int AwesomeData;

public:
  AwesomeNode( Node *left, Node *right, int data ) : 
    Node( left, right ), AwesomeData( data ) {}
};

class BadNode : public Node {
  double BadData;

public:
  BadNode( Node *left, Node *right, double data ) :
    Node( left, right ), BadData( data ) {}
};

Let’s say we want to walk a tree comprised of AwesomeNode and BadNode instances, and do something different for each one by using parameter overloading:

class INodeHandler {
public:
  virtual void HandleNode( AwesomeNode *node ) = 0;
  virtual void HandleNode( BadNode *node ) = 0;
};

If you were to try to compile this code along with our original TreeWalker class, you would find that you get a compile error:

error C2664: ‘void INodeHandler::HandleNode(AwesomeNode *)’ : cannot convert parameter 1 from ‘Node *’ to ‘AwesomeNode *’

This is one of the limitations of single dispatch; only the static type of the parameter being passed at the call site can be used to determine the function to call. In this case, Walk takes a Node *, but not all Node * are AwesomeNode * (the same is true for BadNode *) and so the compiler cannot determine which HandleNode method should be called.

The solution to this problem is to use double dispatch instead. With double dispatch, the dynamic type of the class the method is being called on is still used, but so is the dynamic type of the parameters. C++ does not facilitate this as part of the language, so we must devise our own way to perform double dispatch.

Our ultimate goal is to perform different work for each Node subclass while we walk the tree. This means that the Node subclasses have to perform some of this work. The tree walker needs to call a virtual method on a Node instance to access the proper Node subclass (this is single-dispatch at work; the dynamic type of the object is being used). But then we want to perform actual work based on the Node subclass we’re in while relying on the static type of the Node.

The visitor pattern is the codification of that very goal. And I’ve already described how it works, in a nutshell. We turn the problem on its head, so to speak. The walker calls a virtual method on the node, and the node subclass is responsible for calling the method on the handler. This is C++’s version of double dispatch because the walker dispatches the call to the node, and the node dispatches the call to the handler. Here’s what it would look like were we to continue with the overloading approach in the handler interface:

class INodeHandler {
public:
  virtual void HandleNode( class AwesomeNode *node ) = 0;
  virtual void HandleNode( class BadNode *node ) = 0;
};

class Node {
  friend class TreeWalker;
  Node *Left, *Right;

protected:
  Node( Node *left, Node *right ) : 
    Left( left ), Right( right ) {}

  virtual void Handle( INodeHandler *handler ) = 0;
};

class AwesomeNode : public Node {
  int AwesomeData;

public:
  AwesomeNode( Node *left, Node *right, int data ) : 
    Node( left, right ), AwesomeData( data ) {}

  void Handle( INodeHandler *handler ) {
    handler->HandleNode( this );
  }	  
};

class BadNode : public Node {
  double BadData;

public:
  BadNode( Node *left, Node *right, double data ) :
    Node( left, right ), BadData( data ) {}

  void Handle( INodeHandler *handler ) {
    handler->HandleNode( this );
  }
};

class TreeWalker {
public:
  void Walk( Node *n, INodeHandler *handler ) {
    n->Handle( handler );
    if (n->Left)
      Walk( n->Left, handler );
    if (n->Right)
      Walk( n->Right, handler );
  }
};

As you can see, Walk is called and it dispatches the call to Node::Handle. Since that’s a pure virtual function, all of the subclasses of Node must implement the method. The Handle method of the proper subclass of Node is called, and the handler object is passed into it. Then the node turns around and calls the HandleNode function on the handler, passing in “this” as the parameter. The static type of “this” is known at compile time, and so the proper overload is called. In this fashion, we are able to perform work based on the node type for every node in the tree.

While overloading certainly works for the node handler interface, it can quickly become a cognitive burden since all of the methods have the same name. Due to this, it is generally advised to name the handler methods more appropriately, and modify the call site within the nodes. For instance, instead of void HandleNode( BadNode * ), you would have void HandleBad( BadNode * ), and BadNode::Handle’s implementation would call handler->HandleBad( this ). It’s not strictly required, but it will definitely make your code easier to read.

You may have already spotted the drawback to using the visitor pattern. When you add a new Node subclass, you must add a new method to the interface, and you must implement the boilerplate Handle function to call the interface method. Only then can you implement the new Node’s logic within the handler. This can be a bit of an onerous process, especially if you are continually adding new nodes. Unfortunately, this is the nature of the beast — in order to get the double dispatch, you have to do double the work. However, if your code is truly boilerplate, you could automate the process with a simple code generator or even some macros.

#define DECLARE_NODE( node )  virtual void Handle##node( class node##Node *n ) = 0
#define DEFINE_NODE( node )   void Handle( INodeHandler *handler ) {  \
                                handler->Handle##node( this ); \
                              }

Now that you’ve seen how to implement the visitor pattern, and you have a use case demonstrating why, I’d like to cover one last topic: how to know when the visitor pattern is the right tool for the job. In my experience, the visitor pattern is a great way to solve “walking” problems where you have disparate data items that you want to process in different ways. For instance, in compiler work you typically have a tree of nodes called the abstract syntax tree (AST). These node represent the user’s code in a tree structure. You will walk this structure several times to perform work such as creating a symbol table, doing semantic analysis, doing code generation, etc. In this case, you have multiple visitors (INodeHandler implementers) to perform actions on disparate data (the syntax nodes). The data needn’t always be tree-like in nature, however. Any list of data could benefit from the visitor pattern, so long as you intend to implement several different handlers for the data.

Just like there are times when the visitor is the right pattern to use, there are times when it is the wrong pattern as well. If your data is not disparate, the visitor pattern becomes effectively useless since only single dispatch is required. Also, if you only ever foresee implementing a single visitor, then the visitor pattern may be overkill for your needs. The visitor pattern really shines when you need several different visitors since you can encapsulate their logic properly.

Hopefully you have a better understanding of what the visitor pattern is, what problems it solves, and what problems it doesn’t solve.

The Amazing Visitor Pattern

One Response to The Amazing Visitor Pattern

Leave a Reply

Who

Search

Categories

Archives

Feeds

The Amazing Visitor Pattern

One Response to The Amazing Visitor Pattern

Leave a Reply

Who

Search

Categories

Keywords

Archives

Feeds