Non blocking SPI DMA - Added callback to the SPI DMA functions (dmaSend, dmaTransfer...)

stevestrong
Posts: 1505
Joined: Mon Oct 19, 2015 12:06 am
Location: Munich, Germany

Re: Planning to add callback to the SPI DMA functions (dmaSend, dmaTransfer...)

Post by stevestrong » Wed Feb 22, 2017 11:39 am

and how can DMA not give a speedup even if the cpu slows down? your transfer will take equally long as with a blocking transfer, and even if the cpu slows down, at least it can do some work during the transfer.
You forgot the overhead of setting up the DMA before each transaction. And if you implement a callback at the end of the job, that also takes time and completely blocks the CPU from doing other tasks while it runs.
Thus, depending on the SPI clock speed, the setup overhead together with the post-processing can take as long as transferring, let's say, 25 bytes.
So transferring 20 bytes without DMA is faster than transferring them with DMA.

Hence, again, choosing the appropriate strategy depends strongly on the application.
If you always write blocks of 256 bytes or more and have a lot of other work to do between consecutive block writes (not just waiting for the previous SPI job to finish), then using DMA is clearly a good approach. Otherwise it can be slower than the non-DMA version.
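
To put rough numbers on that break-even point, here is a minimal sketch. The 400-cycle figure for DMA setup plus completion handling is an assumption chosen for illustration, not a measured value, and the function names are hypothetical. The SPI clock is expressed as a divider of the CPU clock.

```cpp
#include <cstdint>

// CPU cycles one byte occupies on the SPI bus when the SPI clock is the
// CPU clock divided by spiDivider (8 bits per byte).
constexpr uint32_t cyclesPerByte(uint32_t spiDivider) {
    return 8u * spiDivider;
}

// Number of bytes a plain blocking transfer could already have sent in the
// time the fixed DMA overhead (setup plus completion handling) takes.
// The division is rounded up.
constexpr uint32_t breakEvenBytes(uint32_t overheadCycles, uint32_t spiDivider) {
    return (overheadCycles + cyclesPerByte(spiDivider) - 1u) / cyclesPerByte(spiDivider);
}

// ASSUMED overhead of 400 CPU cycles; measure the real value on your setup.
constexpr uint32_t kAssumedOverheadCycles = 400;

// Checked at compile time for two example SPI clock dividers.
static_assert(breakEvenBytes(kAssumedOverheadCycles, 2) == 25, "SPI = CPU clock / 2");
static_assert(breakEvenBytes(kAssumedOverheadCycles, 8) == 7, "SPI = CPU clock / 8");
```

With these assumed numbers, a transfer shorter than the break-even size finishes sooner when done blocking, because the fixed overhead alone already costs more than the whole transfer. The slower the SPI clock, the smaller the break-even size becomes.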

racemaniac
Posts: 432
Joined: Sat Nov 07, 2015 9:09 am

Post by racemaniac » Wed Feb 22, 2017 12:06 pm

If you optimize a bit, the configuration overhead should be minimal (if you're doing similar transfers, at most an update of the memory address and transfer length, and re-enabling the DMA). If you're going through the entire initialization like HAL does, it will indeed take time ^^.
But indeed, I can imagine that if you do a lot of small transfers, using DMA might not give any advantage.

And what was the part about the CPU becoming slower? Are you saturating the memory bus with your DMA, causing the code executing during the DMA to slow down noticeably? Or were you just talking about the overhead? (From my experiments, I can hardly imagine a single SPI bus hammering the memory bus so hard that you'd notice a difference in the CPU, but I never looked for that, so maybe I'm completely wrong :) ).

Post by stevestrong » Wed Feb 22, 2017 3:06 pm

Extract from RM0008:
13.3 DMA functional description
The DMA controller performs direct memory transfer by sharing the system bus with the
Cortex®-M3 core. The DMA request may stop the CPU access to the system bus for some
bus cycles, when the CPU and DMA are targeting the same destination (memory or
peripheral). The bus matrix implements round-robin scheduling, thus ensuring at least half
of the system bus bandwidth (both to memory and peripheral) for the CPU.
So the CPU may be slowed down to half of its speed. The higher the SPI clock, the worse the situation for the CPU if it performs a lot of memory accesses.

Post by racemaniac » Wed Feb 22, 2017 3:28 pm

I know that, but suppose you have SPI running at max speed (half the clock speed I believe, so 36 MHz for an STM32F103). It needs 1 byte for every 8 bits it transmits at that pace, so one every 16 clock cycles (or 2 bytes if you're working bidirectionally). And from my experiments with memory-to-memory transfers towards the port registers, I remember reaching far higher speeds than the 36 Mbit/s (or 72) the SPI can do.
I think that, apart from DMA memory-to-memory transfers, you're unlikely to notice much of a speed difference. A mere SPI port is too slow to be that much of an issue.

But it sounds like an interesting experiment to do :). I'll have to give this a try: set up a circular SPI DMA in the background and do some performance tests with and without it running :).

victor_pv
Posts: 1600
Joined: Mon Apr 27, 2015 12:12 pm

Post by victor_pv » Wed Feb 22, 2017 4:17 pm

stevestrong wrote:Please check my comments posted here: https://github.com/rogerclarkmelbourne/ ... -277516813
Thanks, I found them when reading that PR, which was named for USB but included the SPI changes. I added some comments on the commit, and they show up out of context in the PR comments. Please look at them in the commit to see which lines they refer to; I did not expect them to show up in the PR.

I'll check your latest commit. One of the issues I found is that the read() function was not doing any transmit. The other is in a function that sends data (I don't remember its name), which was not incrementing any counter, so it would loop forever.
I will check whether these were corrected in your last commit. If not, it's possible you didn't notice them because those functions are not used in the libraries you tested.

Post by victor_pv » Wed Feb 22, 2017 4:31 pm

Racemaniac, the problem with the general-purpose dmaSend functions in the SPI library is that they don't reuse buffers and transfer sizes, nor do they know whether the user code will need the DMA channel for something else after the SPI DMA transfer finishes. So to be safe, the functions set up everything on every call; that's the reason for the big overhead.

As for the memory access slowdown, I can see what Steve is saying: in code that's memory-access intensive, like reading or writing blocks of memory at the same time, there will be some slowdown. But for code that, for example, does some calculations on a data block, the slowdown should be minimal. I think the biggest impact comes from setting all the DMA transfer parameters each time.

Still, they provide a big advantage for anything sending or receiving more than a few bytes, especially with any device that operates at less than full speed. At 18 MHz with 16-bit transfers, the DMA does only 1 read every 64 cycles, and sending 100 bytes takes 3200 CPU cycles if I calculated it right. There is plenty the CPU can do in 3200 cycles if the transfer is done by DMA.
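
The cycle counts quoted in this thread can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes a 72 MHz CPU clock (which matches the "36 MHz is half the clock speed" remark earlier in the thread) and ignores gaps between frames; the function names are made up for illustration.

```cpp
#include <cstdint>

// CPU cycles between two consecutive DMA requests from the SPI peripheral:
// CPU cycles per SPI bit, times the number of bits in one frame.
constexpr uint32_t cyclesPerFrame(uint32_t cpuHz, uint32_t spiHz, uint32_t frameBits) {
    return (cpuHz / spiHz) * frameBits;
}

// CPU cycles that pass while `bytes` bytes are shifted out back to back.
constexpr uint32_t cyclesForTransfer(uint32_t cpuHz, uint32_t spiHz,
                                     uint32_t frameBits, uint32_t bytes) {
    // Number of frames, rounded up in case bytes is not a multiple of the frame size.
    const uint32_t frames = (bytes * 8u + frameBits - 1u) / frameBits;
    return frames * cyclesPerFrame(cpuHz, spiHz, frameBits);
}

constexpr uint32_t kCpuHz = 72000000;  // ASSUMED STM32F103 core clock

// 18 MHz SPI, 16-bit frames: one DMA read every 64 CPU cycles,
// and 100 bytes (50 frames) take 3200 CPU cycles.
static_assert(cyclesPerFrame(kCpuHz, 18000000, 16) == 64, "one read per 64 cycles");
static_assert(cyclesForTransfer(kCpuHz, 18000000, 16, 100) == 3200, "100 bytes");

// 36 MHz SPI, 8-bit frames: one DMA access every 16 CPU cycles.
static_assert(cyclesPerFrame(kCpuHz, 36000000, 8) == 16, "one byte per 16 cycles");
```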

I think we all agree that, like Steve said, it all depends on the use case.

So having the functionality doesn't hurt anything, other than costing a few bytes of RAM.

Post by racemaniac » Wed Feb 22, 2017 7:28 pm

Indeed :). It would still be nice to be able to declare that you won't reuse the DMA channel for anything else and get the cheaper DMA setup (likely the most common use case, since so far DMA is hardly used anyway ^^).

And I'm still wondering about the performance hit on whatever is running simultaneously with it :). I'm really going to give that a try when I have some time; it's very good to know :).

Any suggestions on what to benchmark it with? Just a Dhrystone benchmark or so?

Post by victor_pv » Wed Feb 22, 2017 7:51 pm

I would think that Dhrystone would be good for comparing general compute-intensive operations, but a simple memory copy test, which is intensive in RAM access, would also help a lot to see the worst-case scenario.
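
A memory copy test of that kind could look roughly like the sketch below. It is written with `std::chrono` so it runs on a desktop compiler; on the STM32 you would presumably swap the timing calls for `micros()` or a cycle counter. The struct and function names are hypothetical, and the block size and iteration count are up to you.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct CopyResult {
    std::size_t bytesCopied;  // total bytes moved by all iterations
    double seconds;           // wall-clock time for the whole loop
    uint8_t firstByte;        // dst[0] after the last copy, keeps the copies observable
};

// Copies a block of blockSize bytes `iterations` times and times the loop.
CopyResult timeMemCopy(std::size_t blockSize, std::size_t iterations) {
    if (blockSize == 0) {
        return {0, 0.0, 0};  // nothing to copy
    }
    std::vector<uint8_t> src(blockSize, 0xA5);
    std::vector<uint8_t> dst(blockSize, 0x00);

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        src[0] = static_cast<uint8_t>(i);  // change the input on every round
        std::memcpy(dst.data(), src.data(), blockSize);
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double> elapsed = stop - start;
    return {blockSize * iterations, elapsed.count(), dst[0]};
}
```

The idea would be to run the same call once with the SPI DMA idle and once with a circular SPI DMA transfer running in the background, then compare the two `seconds` values.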

Post by victor_pv » Thu Feb 23, 2017 2:51 am

racemaniac wrote: Indeed :). It would still be nice to be able to declare that you won't reuse the DMA channel for anything else and get the cheaper DMA setup (likely the most common use case, since so far DMA is hardly used anyway ^^)
If I understand right, you are suggesting adding functions that could repeat a DMA transfer that was already set up before?
Something like:

Code:

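SPI.dmaSetTX (uint16 *buffer, uint16 length, bool minc);  // proposed: store the transfer settings once
SPI.dmaRepeatTX();                                        // proposed: start a transfer with the stored settings
// ... refilling buffer, etc ...
SPI.dmaRepeatTX();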
That could make sense. I have to check the DMA documentation, but I believe we only need to reload the length before repeating a DMA transfer; everything else stays, which should remove some overhead.
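
To illustrate the difference between a full setup and a cheap repeat, here is a host-side sketch that uses a mock DMA channel instead of real hardware. The register names (CCR, CNDTR, CPAR, CMAR) and bit positions follow the STM32F1 reference manual as far as I know, but the `MockDmaChannel` and `SpiTxDma` types are hypothetical and are not part of the SPI library. It also assumes that the transfer count has to be reloaded while the channel is disabled.

```cpp
#include <cstdint>

// Host-side stand-in for one DMA channel. This is a mock that only counts
// register writes; it is not a hardware driver.
struct MockDmaChannel {
    uint32_t ccr = 0;     // channel configuration, bit 0 = enable
    uint32_t cndtr = 0;   // number of data items left to transfer
    uintptr_t cpar = 0;   // peripheral address
    uintptr_t cmar = 0;   // memory address
    unsigned writes = 0;  // register writes, a rough measure of setup cost

    template <typename T, typename U>
    void write(T& reg, U value) { reg = static_cast<T>(value); ++writes; }
};

constexpr uint32_t kEnable     = 1u << 0;                 // EN
constexpr uint32_t kFromMemory = 1u << 4;                 // DIR: read from memory
constexpr uint32_t kMemInc     = 1u << 7;                 // MINC
constexpr uint32_t kSize16     = (1u << 8) | (1u << 10);  // PSIZE and MSIZE = 16 bit

// Hypothetical helper mirroring the proposed dmaSetTX()/dmaRepeatTX() pair.
class SpiTxDma {
public:
    SpiTxDma(MockDmaChannel& channel, uintptr_t spiDataRegister)
        : ch_(channel), spiDr_(spiDataRegister) {}

    // Full setup, done once: direction, sizes, increment mode and addresses.
    void setTX(const uint16_t* buffer, uint16_t length, bool minc) {
        length_ = length;
        ch_.write(ch_.ccr, kFromMemory | kSize16 | (minc ? kMemInc : 0u));
        ch_.write(ch_.cpar, spiDr_);
        ch_.write(ch_.cmar, reinterpret_cast<uintptr_t>(buffer));
    }

    // Cheap restart: disable the channel, reload the count, enable again.
    void repeatTX() {
        ch_.write(ch_.ccr, ch_.ccr & ~kEnable);
        ch_.write(ch_.cndtr, length_);
        ch_.write(ch_.ccr, ch_.ccr | kEnable);
    }

private:
    MockDmaChannel& ch_;
    uintptr_t spiDr_;
    uint16_t length_ = 0;
};
```

In this model every repeat costs three register writes and leaves the addresses and configuration untouched. Pointing the channel at a different buffer would add one more write to the memory address register.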

Post by racemaniac » Thu Feb 23, 2017 8:31 am

Yup, something like that. We can see how much can be reconfigured (changing the target address could also be an option; it is just 1 write to a register). But going through the full DMA setup every time seems a bit overkill most of the time :)
