
Efficient Byte Swapping using Armv6 and Armv7-A/R instructions

Since the introduction of the Arm11 processor family, the Arm instruction set has supported on-the-fly switching of the current endian mode. Although the architecture is natively little-endian, the processor can be put into big-endian mode to process chunks of data in that format. DSPs, network adapters and similar peripherals might return processed data in big-endian format.

The Armv6 architecture has several instructions for handling data in a different byte order. For endian conversion (byte swapping) of single data words, the REV instruction is usually the simplest choice.

LDR r1, [r0]      ; load word from memory
REV r1, r1        ; change byte order
STR r1, [r0]      ; store word to memory (same address)
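
For checking results on a host machine, the effect of REV on a 32-bit value can be modelled in portable C. This is only a sketch: the name rev32_model is hypothetical and not part of any Arm toolchain, and on the target a single REV does the same work.

```c
#include <stdint.h>

/* Hypothetical host-side model of REV: reverse the four bytes of a word. */
static uint32_t rev32_model(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |  /* byte 0 -> byte 3 */
           ((x & 0x0000FF00u) << 8)  |  /* byte 1 -> byte 2 */
           ((x & 0x00FF0000u) >> 8)  |  /* byte 2 -> byte 1 */
           ((x & 0xFF000000u) >> 24);   /* byte 3 -> byte 0 */
}
```

Applying the function twice returns the original value, which makes a convenient sanity check.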

(Variants of this instruction exist, such as REV16 and REVSH; some are architecture dependent.)
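
Two of the Armv6 variants are REV16, which swaps the bytes within each halfword, and REVSH, which swaps the bytes of the low halfword and sign-extends the result. The C functions below are a rough host-side model of that behaviour; the names rev16_model and revsh_model are made up for illustration.

```c
#include <stdint.h>

/* Model of REV16: swap the two bytes inside each 16-bit halfword. */
static uint32_t rev16_model(uint32_t x)
{
    return ((x & 0x00FF00FFu) << 8) | ((x & 0xFF00FF00u) >> 8);
}

/* Model of REVSH: swap the bytes of the low halfword, then sign-extend
   the 16-bit result to 32 bits. The upper halfword of x is ignored. */
static int32_t revsh_model(uint32_t x)
{
    uint32_t swapped = ((x & 0x00FFu) << 8) | ((x & 0xFF00u) >> 8);

    /* Sign-extend by hand to avoid implementation-defined conversions. */
    return (swapped & 0x8000u) ? (int32_t)swapped - 0x10000
                               : (int32_t)swapped;
}
```

REV16 suits buffers of packed 16-bit values, while REVSH is aimed at a single signed 16-bit quantity.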

If you have larger blocks of data to convert, the method above might not be the quickest way to do this. Instead, we change the processor endianness between loading and storing the data. Combined with the load/store multiple instructions using different addressing modes, this leads to the more efficient function below. Note that the block length (in bytes) must be divisible by 16 for this example to work; otherwise the final iteration reads and writes past the end of the block!

__asm void *be2le4w(uint32_t *blk, size_t len)
{
    CMP  r1, #0           ; cover trivial case
    BXEQ lr

    PUSH { r4, r5 }       ; set up stack
    MOV  r12, r0          ; save pointer

loop
    SETEND BE             ; switch to big endian
    LDM  r12, { r2-r5 }   ; and load four words
                          ;   from memory
    SETEND LE             ; switch to little endian
    STM  r12!, { r2-r5 }  ; store four words to memory 
                          ;   and advance pointer
    SUBS r1, r1, #16      ; decrement counter
                          ;   (16 bytes transferred)
    BGT  loop             ; repeat while bytes remain

exit
    POP  { r4, r5 }       ; restore registers
    BX   lr               ; return
}
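
To verify the assembly routine against known data, a portable C model of the same in-place conversion can be handy. The sketch below is an assumption-laden stand-in, not the routine itself: be2le4w_model is a hypothetical name, len is taken to be a byte count, and the return value mirrors the fact that the assembly never modifies r0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable model of be2le4w: byte-swaps every word in place,
   four words (16 bytes) per outer iteration. Unlike the assembly, a
   trailing group of fewer than 16 bytes is left untouched, not overrun. */
static void *be2le4w_model(uint32_t *blk, size_t len)
{
    uint32_t *p = blk;

    for (; len >= 16; len -= 16) {
        for (int i = 0; i < 4; i++) {
            uint32_t x = p[i];          /* read without moving p (like LDM) */
            p[i] = (x << 24) | ((x & 0x0000FF00u) << 8) |
                   ((x >> 8) & 0x0000FF00u) | (x >> 24);
        }
        p += 4;                         /* advance after the stores (STM r12!) */
    }
    return blk;                         /* original pointer, as left in r0 */
}
```

Running the model and the assembly over the same buffer and comparing the results is one way to test the routine on hardware.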

Every iteration of this loop processes four data words, and the memory accesses may take advantage of burst transfers on the system bus. Note that the LDM instruction does not update the base register (r12), so the subsequent STM accesses the same memory locations and modifies the data in place. The base register is updated (incremented by 16) only by the STM, as indicated by the exclamation mark following the register name.
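
The multiple-of-16 restriction can be relaxed with a small wrapper that hands the largest 16-byte-aligned portion to the block routine and converts the remaining words one at a time. The following is a minimal sketch under stated assumptions: swap_block_any, bulk_swap and swap_word are hypothetical names, len is a byte count assumed to be a multiple of 4, and bulk_swap is a portable stand-in for the call to be2le4w so that the example builds on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable single-word swap; on the target this would be one REV. */
static uint32_t swap_word(uint32_t x)
{
    return (x << 24) | ((x & 0x0000FF00u) << 8) |
           ((x >> 8) & 0x0000FF00u) | (x >> 24);
}

/* Stand-in for the assembly block routine (assumption: on the target,
   this call would be replaced by be2le4w(blk, len)). */
static void bulk_swap(uint32_t *blk, size_t len)
{
    for (size_t i = 0; i < len / 4; i++)
        blk[i] = swap_word(blk[i]);
}

/* Hypothetical wrapper: len is in bytes and assumed to be a multiple of 4. */
static void swap_block_any(uint32_t *blk, size_t len)
{
    size_t bulk = len & ~(size_t)15;    /* largest multiple of 16 <= len */

    if (bulk != 0)
        bulk_swap(blk, bulk);           /* fast path, 16 bytes at a time */

    for (size_t i = bulk / 4; i < len / 4; i++)
        blk[i] = swap_word(blk[i]);     /* up to three leftover words */
}
```

With a 24-byte buffer, for example, the first 16 bytes go through the block path and the last two words are swapped individually.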