Opaque Data Pointers

Most of the frameworks that I work on need to be usable from multiple programming languages (typically C++, C# and Objective-C, but sometimes more). This means I must target the lowest common denominator in terms of the function prototypes, so I write a lot of C header files. But I’m an object-oriented junkie at heart, and the languages which will eventually consume my frameworks all support object-oriented paradigms, so many of my C APIs end up being flattened “classes.” This shouldn’t be too surprising as it’s a common technique for C frameworks.

What this means from a practical standpoint is that there are usually one or two APIs responsible for creating an “object” and this object gets passed into several other APIs as a sort of manual “this” pointer. What I want to talk about today are possible ways to design that object reference, and the pros and cons associated with each.

Because the goal is to design a clean API, we want to hide as many implementation details as possible. This means that we do not want to expose a structure to the caller that allows them to manipulate the internals of our object directly; they should be using our APIs to do that! So the object pointer we give back to the user should be opaque in that it doesn’t expose any data members directly. Hence the term opaque data pointer.

One of the simplest ways to expose an opaque data pointer is to use a void *. It can represent an arbitrary memory location, but does not describe the format of that memory in any way. So you can pass back a pointer to a class instance, or a pointer to a structure instance, or an integer key into a std::map (cast to void *), etc. This is the approach taken in Win32, where a HANDLE is simply a void *.

The downside to using a void * is that you lose all semblance of type safety. I know that C isn’t well-known for being the most type-safe language in the world, but it at least makes some attempts to ensure that you don’t assign a struct foo * to a struct bar *! However, all void * are the same datatype to the compiler, which means the user really could assign a struct foo * to a struct bar * without knowing it. One way to combat this problem is to define a common structure format for all of your “handle” types, and give it a magic value field. This way, when you receive a handle, you can double-check to make sure it’s a handle of the proper type. For instance:

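/* Every handle structure is expected to begin with this common header. */
typedef struct handle_type {
	unsigned long magic;
} handle_type;

/* Note: the value of a multi-character constant is implementation-defined. */
#define HANDLE_TYPE_FOO		'foo '
#define HANDLE_TYPE_QUUX	'QuuX'

/* Arguments are parenthesized so that expressions can be passed safely. */
#define IS_HANDLE_TYPE( handle, type )	(((const handle_type *)(handle))->magic == (type))

void FooDoSomething( void *handle ) {
	if (IS_HANDLE_TYPE( handle, HANDLE_TYPE_FOO )) {
		/* It is safe to treat handle as a foo object here. */
	}
}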

By doing something along these lines, you can protect yourself from your user’s mistakes. However, it doesn’t exactly make your framework user-friendly. It doesn’t help the user in any way, it just protects you when they mess up!
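To see the check catch a mistake at run time, here is a minimal C++ sketch. It assumes every object structure begins with the shared header, which is what makes the cast valid. The names foo_object, quux_object, FooTryDoSomething and DemoMagicCheck are hypothetical, and plain numeric tags stand in for the multi-character constants.

```cpp
#include <cassert>

// Shared header: assumed to be the first member of every handle structure.
struct handle_type {
	unsigned long magic;
};

// Hypothetical numeric tags (the ASCII bytes of "foo " and "QuuX").
const unsigned long kHandleTypeFoo  = 0x666F6F20UL;
const unsigned long kHandleTypeQuux = 0x51757558UL;

// Hypothetical object layouts, each starting with the shared header.
struct foo_object {
	handle_type header;
	int value;
};

struct quux_object {
	handle_type header;
	double value;
};

static bool IsHandleType( const void *handle, unsigned long type ) {
	return handle != nullptr &&
	       static_cast< const handle_type * >( handle )->magic == type;
}

// Returns false instead of touching memory that is not a foo_object.
bool FooTryDoSomething( void *handle ) {
	if (!IsHandleType( handle, kHandleTypeFoo ))
		return false;
	static_cast< foo_object * >( handle )->value += 1;
	return true;
}

// Both calls compile, because everything is a void * to the compiler.
// Only the magic value tells the two objects apart.
bool DemoMagicCheck() {
	foo_object foo = { { kHandleTypeFoo }, 41 };
	quux_object quux = { { kHandleTypeQuux }, 1.0 };
	bool accepted = FooTryDoSomething( &foo );
	bool rejected = !FooTryDoSomething( &quux );
	assert( accepted && rejected );
	return accepted && rejected && foo.value == 42;
}
```

The wrong handle is only noticed once the call has already been made, which is exactly the limitation described above.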

Another approach is to make use of the type system provided by C by creating unique types for all of your different objects. By having a unique datatype for each object, you have less chance of the user accidentally trying to pass the incorrect object instance around. You accomplish this by using typedef to declare a unique structure for each of your object types, like this:

typedef struct foo_ *FooPtr;
typedef struct bar_ *BarPtr;

void FooDoSomething( FooPtr handle );

Now if the user attempts to call FooDoSomething and pass in a BarPtr, they should get a compile error due to the types not matching (assuming they’re still in a language where they call these functions directly instead of through interop). This approach to opaque data pointers is very popular on the Mac; you’ll see it all over Core Foundation.
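One way to convince yourself of this without writing a call that fails to build is to ask the compiler directly. The following is a small C++ sketch using the standard type traits; CheckHandleTypes is a hypothetical helper name.

```cpp
#include <cassert>
#include <type_traits>

typedef struct foo_ *FooPtr;
typedef struct bar_ *BarPtr;

// The two handle types are distinct and do not convert to one another,
// so FooDoSomething( someBarPtr ) would be rejected at compile time.
static_assert( !std::is_same< FooPtr, BarPtr >::value,
               "handles are distinct types" );
static_assert( !std::is_convertible< BarPtr, FooPtr >::value,
               "no implicit BarPtr to FooPtr conversion" );

// Contrast with void *: every object pointer converts to it silently.
static_assert( std::is_convertible< FooPtr, void * >::value,
               "FooPtr converts to void * implicitly" );
static_assert( std::is_convertible< BarPtr, void * >::value,
               "BarPtr converts to void * implicitly" );

// Hypothetical run-time mirror of the checks above, e.g. for a unit test.
void CheckHandleTypes() {
	assert( !( std::is_convertible< BarPtr, FooPtr >::value ) );
	assert( ( std::is_convertible< BarPtr, void * >::value ) );
}
```

Neither foo_ nor bar_ needs a definition for these checks to compile, since only the pointer types are inspected.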

There are two different ways you can look at this approach as a framework designer. Either your header contains a forward declaration for a structure you intend to define in one of your source files, or your header contains an incomplete type declaration that is never completed. Either way is perfectly fine; it just boils down to your preference.

With the forward declaration, you are simply telling the consumer “here’s a type that you know nothing about”, but you can use that type in your implementation files since it will be defined for you. For instance, let’s say you have a header like this:

typedef struct foo_ *FooPtr;

FooPtr FooCreate( void );
void FooDestroy( FooPtr foo );

int FooGetIntValue( FooPtr foo );
void FooSetIntValue( FooPtr foo, int value );

Your implementation file could look like this:

typedef struct foo_ {
	int mVal;
} foo_, *FooPtr;

FooPtr FooCreate( void ) {
	return new foo_();	// the parentheses value-initialize mVal to 0
}

void FooDestroy( FooPtr foo ) {
	delete foo;
}

int FooGetIntValue( FooPtr foo ) {
	return foo->mVal;
}

void FooSetIntValue( FooPtr foo, int value ) {
	foo->mVal = value;
}

In this case, your header file acts as a forward declaration, and your implementation file completes the type information. Then you can access the data with impunity as the framework implementer, but your consumers won’t be able to access the structure details except through the API you provide.
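The consumer’s side of this arrangement can be sketched in a single file by placing the calling code before the structure definition, so that it is compiled while the type is still incomplete. ConsumerRoundTrip is a hypothetical function; in a real framework the three parts would live in separate files.

```cpp
#include <cassert>

// --- What the header exposes ---
typedef struct foo_ *FooPtr;

FooPtr FooCreate( void );
void FooDestroy( FooPtr foo );
int FooGetIntValue( FooPtr foo );
void FooSetIntValue( FooPtr foo, int value );

// --- Consumer code: only the declarations above are visible here ---
int ConsumerRoundTrip( int value ) {
	FooPtr foo = FooCreate();
	assert( foo != nullptr );
	FooSetIntValue( foo, value );
	int result = FooGetIntValue( foo );
	// foo->mVal = 0;  // would not compile here: foo_ is still incomplete
	FooDestroy( foo );
	return result;
}

// --- Implementation file: completes the type ---
struct foo_ {
	int mVal;
};

FooPtr FooCreate( void ) { return new foo_(); }
void FooDestroy( FooPtr foo ) { delete foo; }
int FooGetIntValue( FooPtr foo ) { return foo->mVal; }
void FooSetIntValue( FooPtr foo, int value ) { foo->mVal = value; }
```

The commented-out line is the point of the exercise: the consumer can hold and pass the pointer, but any attempt to look inside it is a compile error.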

The forward declaration form of opaque data pointers comes in handy when you are working with simple container objects that you treat as POD (plain old data) types. You get the type safety, you get the ease of use, and it works well. But it doesn’t work well for more complex datatypes where you are using classes instead of structures. You can’t forward declare a class in a C header file, and you shouldn’t mix and match class and struct when defining your datatypes. In this case, it’s best to use an incomplete object datatype instead.

In that case, you are merely finding creative ways to turn a void * into a specific type — the end result is that you get a typed pointer object, but it can only be used in very restrictive circumstances. For instance, you cannot dereference an incomplete type. But the nice thing about this approach is that you can be sure of the fact that the size of the pointer to the incomplete type is the same as the size of the pointer to the actual type, so typecasting will be a safe, well-defined operation.
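The pointer-size claim and the round trip through the incomplete type can be checked with a few lines of C++. This is only a sketch: the static_assert documents an assumption that holds on mainstream platforms rather than something the language promises for every implementation, and RoundTripPreservesPointer is a hypothetical helper.

```cpp
#include <cassert>

typedef struct foo_ *FooPtr;  // foo_ is never defined anywhere

class Foo {
public:
	int value = 7;
};

// Assumption being documented: both pointer types have the same size.
static_assert( sizeof( FooPtr ) == sizeof( Foo * ),
               "handle and object pointers are expected to match in size" );

// Casts a real object pointer to the opaque handle type and back again.
bool RoundTripPreservesPointer() {
	Foo object;
	FooPtr handle = reinterpret_cast< FooPtr >( &object );
	Foo *back = reinterpret_cast< Foo * >( handle );
	assert( back == &object );
	return back == &object && back->value == 7;
}
```

The handle itself is never dereferenced; it is only carried around and converted back to the type it originally pointed to.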

For instance, if we were to use the declarations from above, but use FooPtr as an incomplete type, our implementation file could look something like this:

class Foo {
private:
	int mVal;

public:
	Foo() : mVal( 0 ) {}
	virtual ~Foo() {}	// defined inline so the example links
	
	int GetValue() const { return mVal; }
	void SetValue( int val ) { mVal = val; }
};


FooPtr FooCreate( void ) {
	return reinterpret_cast< FooPtr >( new Foo() );
}

void FooDestroy( FooPtr foo ) {
	delete reinterpret_cast< Foo * >( foo );
}

int FooGetIntValue( FooPtr foo ) {
	return reinterpret_cast< Foo * >( foo )->GetValue();
}

void FooSetIntValue( FooPtr foo, int value ) {
	reinterpret_cast< Foo * >( foo )->SetValue( value );
}

The tradeoff with this approach is that you find yourself doing a lot of typecasting to go back and forth between the types. However, it also means that you retain the type safety of your opaque datatype without paying the performance penalty of the extra layer of indirection that the forward declaration approach would require (where you would put your class Foo * inside of your struct foo_ type).
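For comparison, the extra layer of indirection mentioned above can be sketched as follows. This is the forward declaration approach applied to a class: struct foo_ is completed in the implementation file as a thin wrapper holding a Foo *. The member name mFoo and the DemoWrappedFoo helper are hypothetical.

```cpp
#include <cassert>

typedef struct foo_ *FooPtr;  // same declaration the header would carry

class Foo {
	int mVal;

public:
	Foo() : mVal( 0 ) {}
	int GetValue() const { return mVal; }
	void SetValue( int val ) { mVal = val; }
};

// The struct is completed here, but it only wraps a pointer to the class.
struct foo_ {
	Foo *mFoo;
};

FooPtr FooCreate( void ) {
	FooPtr foo = new foo_();
	foo->mFoo = new Foo();
	return foo;
}

void FooDestroy( FooPtr foo ) {
	if (foo) {
		delete foo->mFoo;
		delete foo;
	}
}

int FooGetIntValue( FooPtr foo ) {
	return foo->mFoo->GetValue();  // two dereferences, no cast
}

void FooSetIntValue( FooPtr foo, int value ) {
	foo->mFoo->SetValue( value );
}

// Hypothetical usage: identical to what a consumer of the cast version writes.
int DemoWrappedFoo( int value ) {
	FooPtr foo = FooCreate();
	assert( FooGetIntValue( foo ) == 0 );
	FooSetIntValue( foo, value );
	int result = FooGetIntValue( foo );
	FooDestroy( foo );
	return result;
}
```

Every call pays for two allocations at creation time and one additional pointer hop per access, in exchange for not needing any casts.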

The nice thing about either the forward declaration or the incomplete type approach to opaque datatypes is that the consumer of the framework won’t know the difference. You can switch between them at will, and the user won’t know.

Regardless of what approach you take, using an opaque data pointer to hide the implementation details of your framework is a great thing. It allows you to perform refactorings and optimizations with considerably more ease because you don’t have to worry about whether users are relying on those details.

tl;dr: Opaque data types are a great approach that allows you to hide the implementation details of your APIs.


8 Responses to Opaque Data Pointers

  1. Andrea says:

    Excellent post! The kind of best-practice advice I always look for in my day-to-day C/C++ adventures. Very clear, I just got a bit confused by “…but do not take a performance penalty by having the extra layer of indirection required by the forward declaration approach”.
    I don’t see the difference in terms of indirection between the two approaches.
    Also the last “…(where you could put your class Foo * inside of your struct foo_ type)”: could you elaborate on what you mean with this and how it relates with the previous statement?
    Thanks, and keep up the very good job with your nice blog.
    Andrea

  2. Aaron Ballman says:

    @Andrea — glad you enjoy the posts!

    The difference between the two approaches is where the data lives. In one approach, you have a pointer to a structure that contains a pointer to a class that does the work. In the other approach, you have a pointer to the class that does the work directly, but have to typecast to access it. So the performance difference occurs because in the first case, you dereference the structure, then dereference the class, then call the function. In the second case you cast the structure to the class (no runtime penalty), dereference the class, then call the function.

    Eg)

    // 1
    typedef struct opaque  *FooPtr;  // header
    struct opaque {  // implementation
      class Foo *mFoo;
    };
    
    void fun(FooPtr s ) {
      s->mFoo->Function();
    }
    
    // 2
    typedef struct opaque_2 *FooPtr;  // header
    void fun( FooPtr s ) {  // implementation
      ((Foo *)s)->Function();
    }
    

    Does this make more sense?

  3. Andrea says:

    Ok, that makes sense now. I got confused because in your post the forward declaration approach uses
    typedef struct foo_ {
    int mVal;
    } foo_, *FooPtr;
    i.e. it doesn’t wrap a pointer to the class that “does the work”, and I missed that you were suggesting that right when comparing the approaches.
    Thanks for the clarification.

  4. Don Johnson says:

    This was very interesting, Aaron. I’m curious… for Core Foundation opaque types like CFMutableArrayRef, is there a way to hack around the functions provided and change the elements of the array? I’m just wondering since you are given a reference to an element… I know I wouldn’t want to do this in practice but would like to know if something is breakable…

  5. Aaron Ballman says:

    @Don — I would imagine the answer is yes, but I’m uncertain as to the internal details of the class. As I understand it, the object starts with a function pointer to an is-a check, and the rest is custom depending on the type of the class.

  6. Aaron Ballman says:

    @Don — This may be of interest to you, btw: http://ridiculousfish.com/blog/posts/array.html

  7. hlide says:

    Hi,

    some years ago, I learned there was a third way to have an opaque pointer which doesn’t need a forward declaration or casting:

    public interface:
    // class Element; <-- no need to forward declare it
    class Stack
    {
    // you must write "class Element *" (or "struct Element *") here, not just "Element *"
    class Element * _top; // <-- pointer to an opaque named class

    public:

    void push(class Element * elem); // <-- just use a pointer to the opaque named class
    bool pop(class Element *& elem); // <-- again

    };

    private implementation (no need for casting with reinterpret_cast):

    class Element
    {
    Element * _next;
    public:

    Element * getNext() { return _next; }
    Element * setNext(Element * next) { _next = next; return this; }

    }; // define the opaque named class

    void Stack::push(Element * elem) { _top = elem->setNext(_top); }

    // only follow the link when the stack is not empty
    bool Stack::pop(Element *& elem) { elem = _top; if (elem) _top = elem->getNext(); return nullptr != elem; }

    However note that you cannot declare “struct Element” or respectively “class Element” in public interface then define “class Element” or respectively “struct Element” in private implementation.

  8. hlide says:

    hmm… discard the previous and let me reexplain it with a better example:

    You have a class Signal which implements a Win32 Event-like synchronization for any platform:

    public interface:


    class Signal
    {
    class HalSignal * _opaque; // totally opaque and private but still named
    public:
    Signal();
    Signal * Create(bool manual, bool initialState);
    void Close();
    void Set();
    void Reset();
    bool Wait(unsigned long timeout_ms);
    };

    private implementation for a win32 platform:


    class HalSignal
    {
    HANDLE _handle; // HANDLE is already a pointer type
    public:
    HalSignal(bool manual, bool initialState) { _handle = ::CreateEvent(..., manual, initialState); ... }
    ~HalSignal() { if (_handle) ::CloseHandle(_handle); }
    void Set() { if (_handle) ::SetEvent(_handle); }
    void Reset() { if (_handle) ::ResetEvent(_handle); }
    bool Wait(unsigned long timeout_ms) { return WAIT_OBJECT_0 == ::WaitForSingleObject(_handle, (DWORD)timeout_ms); }
    ...
    };

    private implementation of Signal:


    #if defined(_WIN32)
    #include "win32/halsignal.hpp"
    #elif defined(__linux__)
    #include "linux/halsignal.hpp"
    #elif ...
    ...
    #endif
    Signal::Signal() : _opaque(nullptr) {}
    Signal * Signal::Create(bool manual, bool initialState) { Close(); _opaque = new HalSignal(manual, initialState); return this; }
    void Signal::Close() { delete _opaque; _opaque = nullptr; } // reset so a second Close() is harmless
    void Signal::Set() { if (_opaque) _opaque->Set(); }
    void Signal::Reset() { if (_opaque) _opaque->Reset(); }
    bool Signal::Wait(unsigned long timeout_ms) { return _opaque ? _opaque->Wait(timeout_ms) : false; }
