Say I have an array A[0..N-1] of N bytes, where you may assume N is divisible by 8.

I also have a function f mapping bytes to bytes, defined by some 256-byte array f[0..255] of constants.

I want to do this:

for(i=0 to N-1){ B[i] = f[A[i]]; }

or perhaps this in-place version of the same thing:

for(i=0 to N-1){ A[i] = f[A[i]]; }
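
For concreteness, here is a minimal C sketch of the naive method I have in mind as the baseline. The function name map_bytes and the parameter names out, in, n, and table are just placeholders I made up for illustration (table plays the role of f above). It is only meant to pin down the operation, not to suggest this is the best scalar version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive baseline: out[i] = table[in[i]] for every byte.
   Passing the same pointer for out and in gives the in-place version. */
static void map_bytes(uint8_t *out, const uint8_t *in, size_t n,
                      const uint8_t table[256])
{
    /* Assumption stated in the question: N is divisible by 8.
       The loop itself does not rely on it. */
    assert(n % 8 == 0);

    for (size_t i = 0; i < n; i++) {
        out[i] = table[in[i]];
    }
}
```

Calling map_bytes(B, A, N, f) corresponds to the first loop, and map_bytes(A, A, N, f) to the in-place one.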

Is there a way to do this a lot faster than the naive method by taking advantage of MMX instructions? Even better, is there a way to access that power directly from non-assembly languages?

I suppose there are other operations on arrays of bytes that such instructions could speed up, if one knew how.