> And how can DMA not give a speedup even if the CPU slows down? Your transfer will take just as long as a blocking transfer, and even if the CPU slows down, at least it can do some work during the transfer.

You forgot the overhead of setting up the DMA before each transaction. And if you implement a callback at the end of the job, it also takes time and completely blocks the CPU from doing other tasks while it runs.
Thus, depending on the SPI clock speed, the setup overhead plus the post-processing can take as long as transferring, say, 25 bytes.
In that case, transferring 20 bytes without DMA is faster than transferring them with DMA.
Hence, again, the appropriate strategy strongly depends on the application.
If you always write blocks of 256 bytes or more and have a lot of other work to do between consecutive block writes (not just waiting for the previous SPI job to finish), then using DMA is clearly a good approach. Otherwise, it can be slower than the non-DMA version.