Whining about iterators

In the STL, there are multiple classes of iterators: random access, bidirectional, forward, input, and output. I’d like to discuss the different types of iterators in a bit more detail, so that I can whine about “missing” functionality.

The most restricted iterator types are input and output iterators. These types are unidirectional, single pass iterators. This means that they can only advance over the list a single element at a time, and once an item has been iterated, it will never be iterated again. For instance, consider an input iterator that iterates over std::cin. It will return a character at a time, as they are ready in the input stream, but you can never “go back” to a previous character in the stream. Output iterators are much the same, conceptually.

A forward iterator is also a unidirectional iterator — it can only advance a single element at a time. However, unlike input and output iterators, a forward iterator is a multi-pass iterator. This means that you can “go back” to a previous character, but you cannot do so from the iterator object itself. This sounds a bit confusing, so let’s look at some code:

pretend_forward_iterator iter = some_list.begin();
pretend_forward_iterator iter2 = iter;

item i = *iter;  // Legal, we're using a first pass

++iter;  // Legal, moving forward
--iter;  // Illegal!  It's a forward-only iterator

item i2 = *iter2;  // Legal, we're using a second pass to read an earlier item

Contrast this with an input iterator, with which the

item i2 = *iter2;

would also be illegal, since it would require a second pass.

A bi-directional iterator is an iterator that can advance or retreat, but can only do so one item at a time. This means you can use both ++ and — to traverse the list, but cannot use the + or – operators to do so. It is also a multi-pass iterator, so you can re-read elements.

Finally, a random access iterator is an iterator that lets you navigate the list however you’d like. You can advance and retreat with ++ and –, you can skip around using + and -, and it is a multi-pass iterator allowing you to access the same element multiple times.

So what could possibly be missing from this list of functionality? An adapter iterator. The current list of iterators allow for highly optimized operations over lists by restricting what you can do with the iterator. However, there are times when optimization isn’t the key concern, but clarity is. In all circumstances, you can adapt a less restricted iterator to be more restricted with no loss of performance. In most circumstances, you can adapt a more restricted iterator to be a less restricted one with varying loss of performance. However, there are some circumstances you cannot adapt the more restricted iterator to be less restricted — but there are exceptions that can cover that.

For instance, consider an input iterator over std::cin, and writing a lexer. The purpose of a lexer is to scan through the source data set, looking for patterns of strings to create tokens. So, for instance, you read a ” scan over characters looking for the closing ” and create a string token with the contents. You could do that something like this:

Token LexString( istream_iterator& iter, const istream_iterator& end ) {
  // The caller has peeked the open quote, so we scan for the closing
  // quote.  We are not handling any escaped characters.
  istream_iterator begin = iter++; // Skip the quote for our scan
  while (iter != end) {
    if (*iter == '"') {
      // We found the end of the string, so grab the contents of the data
      // and make a string token
      std::string contents( begin, iter++ ); // Skip the close quote
      return Token( Token::kString, contents );
    }
    ++iter;
  }
  return Token( Token::kBadToken );
}

The code is fairly clear — save where we started lexing the string, loop over the input until we find the closing quote, then use the iterator to create a std::string object which is passed into the token we create. However, because istream_iterator is an input iterator, this code won’t work. It would require a second pass over the iterator in order to create the std::string object.

This is a good example of where an iterator adapter would come in handy. If we could adapt the input iterator so that it becomes a forward iterator, then the code would work fine. However, no such adapter exists (to my knowledge).

So what would it take to create an adapter iterator? How would one work?

Without having put too much thought into the implementation, I would say it would be a class conforming to the target iterator specification, that wraps an existing iterator. For instance, class forward_iterator_adapter which has a constructor that accepts an iterator to adapt. If we are creating a less restricted iterator, the class would have to store a buffer of elements that have previously been read from the wrapped iterator, which would allow us to perform semi-random access backwards. For random access forwards, the wrapper would have to scan forward on the wrapped iterator, storing values as it goes.

It’s not overly efficient because it means we now have to store elements in at least one more iterator. However, it would at least get the job done.

Don’t get me wrong, I like the division between the iterator classes. It helps to ensure you are using the write tool for the job. However, there are times when the restrictions placed on the iterator require you to write considerably uglier code. Taking my example from above and making it work would look something like this:

Token LexString( istream& stream ) {
  // The caller has peeked the open quote, so we scan for the closing
  // quote.  We are not handling any escaped characters.
  streampos begin = stream.tellg();
  while (!stream.eof()) {
    if (stream.get() == '"') {
      // We found the end of the string, so grab the contents of the data
      // and make a string token
      streampos end = stream.tellg();
      stream.seekg( begin );

      // We could avoid the extra alloc by reading in chunks, but that
      // also makes the code more complex.
      char *buffer = (char *)::malloc( end - begin );
      stream.read( buffer, end - begin );
      std::string contents( buffer, end - begin );
      ::free( buffer );
      return Token( Token::kString, contents );
    }
    ++iter;
  }
  return Token( Token::kBadToken );
}

Granted, there are other ways to write the code (not using streams, reading the buffer in chunks, etc). However, the original code I was trying to accomplish was quite clear and concise. It would be nice if it was also functional. Who knows, maybe I’ll sit down and find a reasonable way to solve this problem. Or I’ll discover why no one else has!

This entry was posted in C/C++ and tagged , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *