Frustrations in Assembly

At my day job, we do a lot of complex math calculations in our frameworks. To increase performance, we have enabled the optimizer to use certain assembly instruction sets such as MMX and SSE. However, we have found that not all customers have processors which support these instruction sets. A while back, I added a simple CPUID check to the library initialization code to ensure the customer had SSE support on chip and I figured that was the end of that. I was wrong.

The CPUID check I performed is fairly simple. According to Intel’s documentation, when CPUID is executed with EAX set to 1, bit 25 of EDX (counting from bit 0) reports whether the processor supports SSE. The following code demonstrates a simplified test:

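static bool IsSSEEnabled() {
  unsigned long edx_reg = 0;
  __asm {
    mov eax, 1        ; select CPUID leaf 1 (feature information)
    cpuid             ; overwrites EAX, EBX, ECX and EDX
    mov edx_reg, edx  ; save the feature flags
  }
  // Bit 25 of EDX is the SSE flag, so the mask is 1 << 25 (2 << 25 would test bit 26).
  return (edx_reg & (1UL << 25)) != 0;
}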
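
For compilers without `__asm` blocks, roughly the same test can be sketched with compiler intrinsics. This is only an illustration, not the code we ship: the names `HasFeatureBit`, `ReadCpuidLeaf1Edx`, `IsSseReported` and `kSseBit` are made up for this example. It assumes `__cpuid` from `<intrin.h>` on MSVC and `__get_cpuid` from `<cpuid.h>` on GCC/Clang, and it simply reports "no SSE" on non-x86 targets.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

// Hypothetical helper: true when the given bit (0 = least significant) is set.
inline bool HasFeatureBit(std::uint32_t reg, unsigned bit) {
  assert(bit < 32);
  return (reg & (std::uint32_t{1} << bit)) != 0;
}

// Bit 25 of EDX for CPUID leaf 1 is the SSE feature flag.
const unsigned kSseBit = 25;

// Reads EDX for CPUID leaf 1; returns 0 when the leaf cannot be queried.
inline std::uint32_t ReadCpuidLeaf1Edx() {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  int regs[4] = {0, 0, 0, 0};  // EAX, EBX, ECX, EDX
  __cpuid(regs, 1);
  return static_cast<std::uint32_t>(regs[3]);
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0) return 0;
  return edx;
#else
  return 0;  // not an x86 target
#endif
}

// True when the processor itself reports SSE support.
inline bool IsSseReported() {
  return HasFeatureBit(ReadCpuidLeaf1Edx(), kSseBit);
}
```

Keeping the bit test in its own small function makes the mask easy to check in isolation: `HasFeatureBit(1u << 25, 25)` is true, while `HasFeatureBit(2u << 25, 25)` is false because `2 << 25` sets bit 26.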

The test is fairly straightforward (though safety code has been removed for clarity). However, it also caused some problems.

It turns out that some processors’ CPUID responses do not match their marketing material. For instance, the marketing materials for the AMD Athlon XP (Barton) processor describe it as “SSE compatible”, but CPUID returns 0 in bit 25. This is because “SSE compatible” is not the same as “implements SSE”: it seems the processor can execute SSE instructions, but it either cannot execute all of them, or the execution deviates from the standard definition. Either way, the processor does not report itself as supporting SSE.

If it happens on one processor, it likely happens on more than one, so be careful when using CPUID as it may “lie” to you. Additionally, you really should check whether the CPUID instruction itself is supported before executing it, as some processors don’t implement it (so performing the check will cause an illegal operation).
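
A common way to make that pre-check on 32-bit x86 is to try flipping the ID flag (bit 21) in EFLAGS: if the change sticks, the processor implements CPUID. The sketch below is a rough illustration under several assumptions. The names `IsCpuidAvailable`, `IdBitToggled` and `kEflagsIdBit` are invented for this example. The inline assembly path applies only to 32-bit MSVC, GCC/Clang builds lean on `__get_cpuid_max` from `<cpuid.h>` (which returns 0 when CPUID is unavailable), and 64-bit x86 builds assume CPUID is always present.

```cpp
#include <cstdint>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

// The ID flag is bit 21 of EFLAGS; software can change it only if CPUID exists.
const std::uint32_t kEflagsIdBit = std::uint32_t{1} << 21;

// Hypothetical helper: did the ID bit differ between two EFLAGS snapshots?
inline bool IdBitToggled(std::uint32_t before, std::uint32_t after) {
  return ((before ^ after) & kEflagsIdBit) != 0;
}

// Returns true when it should be safe to execute CPUID.
inline bool IsCpuidAvailable() {
#if defined(_MSC_VER) && defined(_M_IX86)
  std::uint32_t before = 0, after = 0;
  __asm {
    pushfd
    pop eax            ; EAX = current EFLAGS
    mov before, eax
    xor eax, 200000h   ; flip the ID bit
    push eax
    popfd              ; try to write it back
    pushfd
    pop eax            ; read EFLAGS again
    mov after, eax
    mov eax, before
    push eax
    popfd              ; restore the original flags
  }
  return IdBitToggled(before, after);
#elif defined(_MSC_VER) && defined(_M_X64)
  return true;  // CPUID is always available in 64-bit mode
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  return __get_cpuid_max(0, nullptr) != 0;
#else
  return false;  // not an x86 target
#endif
}
```

The idea would be to call `IsCpuidAvailable()` first and only run the SSE test when it returns true, treating a false result as "no SSE".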

Ah, the joys of being a low-level guy… Has anyone else run into this sort of mess before?


1 Response to Frustrations in Assembly

  1. Dan says:

    Ah, that’s what makes it so fun! :-D

    A lot of the microcontrollers I use embed an ID and a revision that can be read through memory-mapped registers. Silicon is always being fixed & revved, and sometimes the boot-up code needs to know “OK, can I enable the flash prefetch on this or not?” By reading the silicon’s ID, you can make this determination at run time, instead of having separate builds for each chip rev and hoping/praying that the factory programs the right image into the right part. (Sometimes assembly lines have a mixture of old parts & new; it depends on what the distributor ships them or what they have left over from a previous production run.)

    One of the nice things about using embedded microcontrollers (ARM, MIPS, ColdFire, etc.) is that usually you can just read the registers like any other memory address (they’re memory mapped) – you don’t need to drop down to assembly & execute a special instruction. Some chips have MMUs (or at least MPUs) and certain regions of the memory space are marked as privileged (only accessible in a certain privileged CPU mode) or read-only, which in this case would be fine since we’re only reading the register (wouldn’t that be something if you could write to the register? Hmmmm… I want these processor extensions – I’ll just write the bit to magically enable them!)
