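The Hitachi SH4 has a small set of pipelined vector instructions that work on 4x4 matrices and 4-element vectors: FIPR (4-element inner product) and FTRV (4x4 matrix times 4-element vector). The operands live in the FP register file, which is implemented as 2 banks of 16 single-precision registers. Nothing compared to MMX/SSE/AVX, but still relatively complex.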
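To make that concrete, here is a rough C model of what the two instructions compute. It is only a sketch of the arithmetic: the names `fp_bank`, `fipr` and `ftrv` are made up for illustration, the register index is assumed to be a multiple of 4 (FV0, FV4, FV8, FV12), and the real hardware produces approximate results rather than exactly rounded IEEE sums.

```c
/* Hypothetical model of one SH4 FP register bank: 16 single-precision regs. */
typedef struct { float r[16]; } fp_bank;

/* FIPR FVm, FVn: 4-element inner product of two vectors in the front bank.
 * The scalar result overwrites the last element of FVn.
 * m and n are the first register of each vector (0, 4, 8 or 12). */
void fipr(fp_bank *front, int m, int n) {
    const float *a = &front->r[m];
    float *b = &front->r[n];
    b[3] = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* FTRV XMTRX, FVn: multiply the 4x4 matrix held in the back bank by the
 * vector FVn in the front bank and write the result back to FVn.
 * The matrix is stored column by column: XF0..XF3 is the first column. */
void ftrv(const fp_bank *back, fp_bank *front, int n) {
    float *v = &front->r[n];
    float out[4];
    for (int row = 0; row < 4; row++) {
        out[row] = 0.0f;
        for (int col = 0; col < 4; col++)
            out[row] += back->r[col * 4 + row] * v[col];
    }
    for (int i = 0; i < 4; i++)
        v[i] = out[i];
}
```

With a matrix loaded once into the back bank, a run of FTRV instructions over FV0, FV4, FV8 and FV12 transforms four vertices without reloading it, which is the typical use.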