String Resources

On Windows, when you need to access a string resource, you turn to the LoadString API. It takes care of finding the string for you, loading it, and copying it into the buffer you supply. However, there are times when LoadString simply falls short. For instance, for my day job, I found myself needing to access localized resources regardless of the user’s current UI locale. The only way to do this is to use the FindResourceEx function and pass in the specific language you’re after. This works fine for most resources, but strings would always come back as not found! I want to cover this particular case in more depth today.

The string resource is special in that the ID you assign to a string is not actually its real resource ID. The MSDN documentation hints at this, but doesn’t go into enough detail. It says,

String resources are stored in sections of up to 16 strings per section. The strings in each section are stored as a sequence of counted (not necessarily null-terminated) Unicode strings.

Unfortunately, this tidbit only scratches the surface. It is accurate that strings are stored in blocks of 16 counted Unicode strings that may not be null-terminated, but several key pieces of information are missing.

  • The ID you pass into FindResource/Ex is not the string ID, but the 1-based ID of the block the string lives in. You get this by dividing the string ID by 16 (integer division) and adding 1.
  • The text contained by the string is Unicode, but the strings themselves are like wide character Pascal strings. The first two bytes of each string are not part of the text. Instead, they specify the length of the string in characters (WCHARs), not bytes.
  • There are always sixteen strings in the block returned, but the strings may be zero-length. In this case, you will have two bytes of zeros for each of the “empty” strings.
  • You find the string you are after by taking the string ID modulo 16 and using that as an index into the array of 16 strings within the block.
  • Because of this formatting, there is no difference between an empty string and a non-existent string.
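To make the ID arithmetic from the list concrete, here is a minimal sketch in portable C++. `SplitStringId` and `StringLocation` are hypothetical names, not Win32 APIs; the function simply applies the divide-by-16-plus-one and modulo-16 rules, assuming string IDs are 16-bit values as they are in resource scripts.

```cpp
#include <cassert>

// Hypothetical helper type (not part of the Win32 API).
struct StringLocation {
  unsigned block;  // 1-based resource ID of the RT_STRING block
  unsigned index;  // slot within that block, 0 through 15
};

// Hypothetical helper: splits a string ID into the block ID to pass to
// FindResourceEx and the index of the string inside that block.
StringLocation SplitStringId(unsigned id) {
  assert(id <= 0xFFFF);    // string IDs are 16-bit values
  return { (id >> 4) + 1,  // same as id / 16 + 1
           id & 15u };     // same as id % 16
}
```

For example, string ID 1000 lands in block 63 at index 8, while IDs 0 through 15 all share block 1.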

So in order to accomplish what I was after, I needed to write my own LoadString function, accepting the language I was after. Given what I’ve mentioned above, the algorithm for loading a string is:

  1. Get the block number and offset number from the string ID
  2. Call FindResourceEx, passing in RT_STRING, the block number and the language you’re after
  3. Call SizeofResource to determine the size of the block in bytes. Call LoadResource and LockResource to get a pointer to the start of the block
  4. Loop over the first N strings to reach the offset calculated from the original string ID. When skipping a string, read its length prefix (one WCHAR) and advance by that many characters, plus one character for the prefix itself
  5. When reaching the proper string offset, copy that many characters (the length times sizeof(WCHAR) bytes) into the passed buffer, truncating if necessary and null-terminating the result
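Once you have a pointer to the block, steps 4 and 5 don’t depend on Win32 at all, so they can be sketched portably. The function below is a hypothetical stand-in: it treats the block as a vector of `char16_t` (the same 16-bit code units a `WCHAR` holds on Windows), returns `std::nullopt` if the block is malformed, and returns a copy of the text rather than writing into a caller-supplied buffer as the real API does.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch of steps 4 and 5: walk a string-table block that has
// already been located and locked, and return the string in slot `index`.
std::optional<std::u16string> FindStringInBlock(
    const std::vector<char16_t>& block, unsigned index) {
  assert(index < 16);  // a block always has exactly 16 slots
  std::size_t pos = 0;
  for (unsigned slot = 0; slot <= index; ++slot) {
    if (pos >= block.size()) return std::nullopt;  // ran off the end
    std::size_t length = block[pos];               // length prefix, in characters
    if (length > block.size() - pos - 1) return std::nullopt;  // truncated text
    if (slot == index)
      return std::u16string(block.data() + pos + 1, length);
    pos += length + 1;  // skip the text plus the prefix itself
  }
  return std::nullopt;  // not reached: the loop returns at slot == index
}
```

Note that an empty slot yields an empty string rather than `std::nullopt`, which matches the point above that an empty string and a non-existent string look the same.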

The code for this looks something like this:

#include <windows.h>
#include <string.h>

int WINAPI LoadStringExW( HINSTANCE hInstance, UINT uID, LPWSTR lpBuffer, 
  int nBufferMax, WORD wLanguage ) {
  // Loading a string is a bit strange.  Strings are grouped by blocks of 16
  // items.  However, the ID passed in does not reflect this.  This means 
  // that the resource itself does *not* have the same ID as what the user
  // expects!  Instead, the actual resource is a combination of the block
  // number and index within the block.  This can be calculated by 
  // translating the ID.
  UINT blockNumber = (uID >> 4) + 1;
  UINT indexNumber = uID % 16;

  if (!lpBuffer || nBufferMax < 0) return 0;

  // Now we can attempt to find the resource by block number
  HRSRC hResource = ::FindResourceEx( hInstance, RT_STRING, 
    MAKEINTRESOURCE( blockNumber ), wLanguage );
  if (!hResource) return 0;

  // Get the size of the block; we need to traverse it to find the index
  // we are after.  The strings are like Unicode versions of Pascal 
  // strings; each is prefaced with a single WCHAR (a WORD) holding the
  // number of characters in the string (or 0 if the string is empty).
  // SizeofResource reports bytes, so convert the size to WCHARs.
  DWORD size = ::SizeofResource( hInstance, hResource );
  HGLOBAL glob = ::LoadResource( hInstance, hResource );
  if (!glob || !size) return 0;
  LPCWSTR buffer = static_cast< LPCWSTR >( ::LockResource( glob ) );
  if (!buffer) return 0;
  LPCWSTR end = buffer + size / sizeof( WCHAR );

  for (UINT idx = 0; buffer < end; ++idx) {
    // Get the length prefix, and stop if the string would run past the
    // end of the block (a malformed resource)
    WORD length = static_cast< WORD >( buffer[ 0 ] );
    if (length > end - buffer - 1) break;

    if (idx == indexNumber) {
      if (0 == nBufferMax) {
        // As with LoadStringW, a zero buffer length means lpBuffer really
        // points to an LPWSTR that receives a read-only pointer into the
        // resource.  That text is *not* null-terminated, so return its
        // length in characters.
        *reinterpret_cast< LPWSTR * >( lpBuffer ) =
          const_cast< LPWSTR >( &buffer[ 1 ] );
        return length;
      }
      // nBufferMax is measured in characters, not bytes.  Copy as much as
      // fits while leaving room for the null terminator, and return the
      // number of characters copied (not counting the terminator).
      int count = (length < nBufferMax) ? length : nBufferMax - 1;
      ::memcpy( lpBuffer, &buffer[ 1 ], count * sizeof( WCHAR ) );
      lpBuffer[ count ] = L'\0';
      return count;
    }

    // Advance by the string length, plus one for the length prefix itself
    buffer += length + 1;
  }
  return 0;
}

It may help to understand why strings are so complicated. Keep in mind that the resource structure in Windows is quite old; it dates back to the 16-bit days! Back then, the OS needed to make the most of the resources at its disposal. While there are many ways to lay out a string table, this format is a very efficient one. By splitting all of the strings up into blocks, you can get “near” the string you’re after with a simple bit shift and addition. Then, to find the exact string, you loop past at most 15 other strings, and passing over each one is a single pointer addition. The worst-case lookup isn’t too bad, and you spend no extra space on an index. The alternatives would involve a table of contents (mapping IDs to offsets), long lookups (looping over all strings to find the right ID), and so on.
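To see why the layout costs so little space, it can help to build a table in the other direction. The encoder below is a hypothetical sketch, not how the resource compiler is actually implemented: it groups an ID-to-text map into blocks using the same split as `LoadStringExW`, then writes each of the 16 slots as a length word followed by the text, with no terminator and no index.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical encoder: returns block ID -> block contents (UTF-16 words).
std::map<unsigned, std::vector<char16_t>> BuildStringBlocks(
    const std::map<unsigned, std::u16string>& strings) {
  // Group the strings by 1-based block ID; unused slots stay empty.
  std::map<unsigned, std::array<std::u16string, 16>> grouped;
  for (const auto& [id, text] : strings) {
    assert(id <= 0xFFFF && text.size() <= 0xFFFF);  // 16-bit IDs and lengths
    grouped[(id >> 4) + 1][id & 15u] = text;
  }
  // Emit every slot as a length prefix followed by its characters.
  std::map<unsigned, std::vector<char16_t>> blocks;
  for (const auto& [blockId, slots] : grouped) {
    std::vector<char16_t>& out = blocks[blockId];
    for (const std::u16string& text : slots) {
      out.push_back(static_cast<char16_t>(text.size()));  // length prefix
      out.insert(out.end(), text.begin(), text.end());    // no terminator
    }
  }
  return blocks;
}
```

Each block comes out to 16 length words plus the characters of its strings. ID 1000 with the text u"Thousand", for instance, produces a 24-word block 63 whose first eight words are zeros.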

Unfortunately, these savings are no longer really needed, but we’re stuck with the relative complexity of a string lookup. Thankfully, most of the time you can get away with calling LoadString and letting it do the heavy lifting. But if you find yourself needing to load a string based on a language, you can now use my LoadStringExW function to do it for you.

This entry was posted in Win32.

2 Responses to String Resources

  1. Jayson says:

    Hi Aaron

    I came across this 2011 post when I was simply looking for a way to know the correct buffer size for LoadString before I actually load the string. I figured out you need FindResourceEx for this, but it has its own problems.

    I am hoping that I will be able to alter your LoadStringEx function enough so that it creates its own buffer that is the correct size and then simply return to me a pointer to this string.

    Am I going to waste my time or is this possible?

    Kind Regards

    Jayson

  2. Aaron Ballman says:

    It shouldn’t be too hard to alter the function to perform the memory allocation yourself. The downside to designing an API like that is that it is a bit fragile — if the caller doesn’t know how the function allocates the memory, then nasty problems can happen. e.g., function uses malloc() to allocate and the caller uses delete[] to free it, or the function uses a debug malloc() to allocate and the caller uses a non-debug free() to free it, etc. Other than that, there shouldn’t be too many issues with changing the function around to do the allocation for the caller.
