SoftPWM via DMA and no CPU cycles

Pito
Posts: 1628
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: SoftPWM via DMA and no CPU cycles

Post by Pito » Fri Jul 07, 2017 4:08 pm

victor_pv wrote:
Fri Jul 07, 2017 3:47 pm
Ollie wrote:
Fri Jul 07, 2017 3:02 pm
The classic PWM is analog and quite slow - it used to be 20 ms, but it is still limited by the servo signal definition of 1 - 2 ms.
What do you mean that the classic PWM is analog and slow? Are you referring to the STM32F1 or something else?
He is referring to a standard used in RC systems since the sixties: each channel's "PWM" period is 20 ms (50 Hz), with an active pulse length of 1-2 ms, where 1500 us is the middle position of the servo on that channel. People call it analog even though it is not anymore; what is analog is the servo loop. The ESC (which converts the servo pulse length into the actual power of the BLDC motor) is fed by the same signal. But the guys flying acro and similar stuff need something faster, as a 50 Hz control loop per channel is too slow for them. They need a period of around 1 ms or less, because flying a quadcopter at 100 knots among the trees in a forest requires fast responses in the control chain :)
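To make the classic servo numbers concrete, here is a minimal host-side C++ sketch (not from the thread; `servoPulseUs` and `servoDutyPercent` are hypothetical names) that maps a normalized command to the 1000-2000 us pulse and shows how small a share of the 20 ms frame it occupies.

```cpp
#include <algorithm>
#include <cstdint>

// Classic RC servo framing as described in the post:
// 20 ms frame (50 Hz), 1000-2000 us active pulse, 1500 us = centre.
constexpr uint32_t kFramePeriodUs = 20000;
constexpr uint32_t kMinPulseUs = 1000;
constexpr uint32_t kMaxPulseUs = 2000;

// Map a command in [-1.0, +1.0] to a pulse width in microseconds.
uint32_t servoPulseUs(double command) {
    command = std::max(-1.0, std::min(1.0, command));           // clamp out-of-range input
    const double centre = (kMinPulseUs + kMaxPulseUs) / 2.0;     // 1500 us
    const double halfSpan = (kMaxPulseUs - kMinPulseUs) / 2.0;   // 500 us
    return static_cast<uint32_t>(centre + command * halfSpan + 0.5);  // round to nearest us
}

// Share of the 20 ms frame during which the pulse is high, in percent.
double servoDutyPercent(uint32_t pulseUs) {
    return 100.0 * pulseUs / kFramePeriodUs;
}
```

At centre the output is high for only 7.5 % of the frame. The pulse width carries the command, but a new command can only arrive once per 20 ms frame, which is the 50 Hz limit the acro pilots run into.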
Pukao Hats Cleaning Services Ltd.

victor_pv
Posts: 1750
Joined: Mon Apr 27, 2015 12:12 pm

Re: SoftPWM via DMA and no CPU cycles

Post by victor_pv » Fri Jul 07, 2017 5:11 pm

Pito wrote:
Fri Jul 07, 2017 4:08 pm
He is referring to a standard used in RC systems since the sixties: each channel's "PWM" period is 20 ms (50 Hz), with an active pulse length of 1-2 ms, where 1500 us is the middle position of the servo on that channel. People call it analog even though it is not anymore; what is analog is the servo loop. The ESC (which converts the servo pulse length into the actual power of the BLDC motor) is fed by the same signal. But the guys flying acro and similar stuff need something faster, as a 50 Hz control loop per channel is too slow for them. They need a period of around 1 ms or less, because flying a quadcopter at 100 knots among the trees in a forest requires fast responses in the control chain :)
Ahhh, that makes sense. Hardware PWM on the STM32s should have no problem with 1 ms periods, and it looks like the DMA/software approach doesn't either.

Pito
Posts: 1628
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: SoftPWM via DMA and no CPU cycles

Post by Pito » Fri Jul 07, 2017 5:46 pm

While the standard analog PWM pulse length is 1000-2000 us, the newer and fastest analog protocol, Multishot, uses 5-25 us.
The new DShot600 uses a fixed 26 us frame and the newest DShot1200 a fixed 13 us frame. These two are called "digital" because, within the frame, the control unit sends the ESC a binary number encoded as pulse lengths (the frame is 16 bits: 11 bits of the actual power value, 1 telemetry-request bit and a 4-bit CRC). The ESC contains an MCU (e.g. an STM32) that decodes the number from the frame and sets the power accordingly. They can do ~30k updates per second. The STM32 inside the ESC must a) decode the frame and b) generate 3x 30 kHz (11-bit) PWM to drive the 3 phases of the brushless DC motor.
So a typical 4- to 8-motor multicopter setup would require an F4 in the control unit (IMU + RC signal fusion) and 4-8 F3s (or F4s) in the ESCs (one ESC per motor). A lot of silicon.
Q: Are you able to generate, e.g., 8 DShot600/1200 frames on 8 GPIO outputs via DMA (all 8 frames shot out in parallel)?
https://github.com/cleanflight/cleanfli ... m_output.c
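One way to approach that question, sketched in C++ under stated assumptions (this is not the Cleanflight code linked above; `dshotFrame` and `encodeParallel` are hypothetical names): build each 16-bit DShot frame using the commonly documented layout (11-bit throttle, 1 telemetry-request bit, 4-bit XOR checksum, sent MSB first), then expand up to 16 frames into one stream of BSRR words, so that a single DMA run per frame drives all pins of a port in parallel. Splitting each DShot bit into 3 DMA slots only approximates the specified high times of roughly 37.5 % (bit 0) and 75 % (bit 1) of the bit period. At DShot600 (about 1.67 us per bit) that is one DMA request roughly every 0.56 us.

```cpp
#include <array>
#include <cstdint>

// One DShot frame: 11-bit throttle, telemetry-request bit, 4-bit checksum.
uint16_t dshotFrame(uint16_t throttle, bool telemetry) {
    const uint16_t value =
        static_cast<uint16_t>(((throttle & 0x7FFu) << 1) | (telemetry ? 1u : 0u));
    const uint16_t csum = static_cast<uint16_t>((value ^ (value >> 4) ^ (value >> 8)) & 0xFu);
    return static_cast<uint16_t>((value << 4) | csum);
}

// Each DShot bit becomes 3 BSRR words (assumption: 3 slots per bit):
//   slot 0: set all DShot pins high
//   slot 1: reset pins whose bit is 0 (a "0" is high for ~1/3 of the bit)
//   slot 2: reset all DShot pins      (a "1" is high for ~2/3 of the bit)
constexpr int kSlotsPerBit = 3;
constexpr int kBitsPerFrame = 16;
using BsrrStream = std::array<uint32_t, kSlotsPerBit * kBitsPerFrame>;

// frames[pin] = frame for GPIO bit `pin`; pinMask = port bits used as DShot outputs.
BsrrStream encodeParallel(const std::array<uint16_t, 16>& frames, uint16_t pinMask) {
    BsrrStream out{};
    for (int bit = 0; bit < kBitsPerFrame; ++bit) {
        const uint32_t bitMask = 0x8000u >> bit;   // MSB is sent first
        uint32_t zeros = 0;                        // pins sending a 0 in this bit
        for (int pin = 0; pin < 16; ++pin) {
            if ((pinMask & (1u << pin)) && !(frames[pin] & bitMask))
                zeros |= 1u << pin;
        }
        out[bit * kSlotsPerBit + 0] = pinMask;                               // low half: set
        out[bit * kSlotsPerBit + 1] = zeros << 16;                           // high half: reset
        out[bit * kSlotsPerBit + 2] = static_cast<uint32_t>(pinMask) << 16;  // all low
    }
    return out;
}
```

The stream would be pushed to GPIOx->BSRR by a timer-paced DMA, exactly like the soft-PWM buffer in this thread, just as a one-shot transfer per frame rather than a circular one.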

universam10
Posts: 19
Joined: Sun Jan 03, 2016 8:35 am
Location: Germany

Re: SoftPWM via DMA and no CPU cycles

Post by universam10 » Fri Jul 07, 2017 10:01 pm

Thanks for the explanation about the race conditions, I will try with a single buffer.
victor_pv wrote:
Fri Jul 07, 2017 4:02 pm
Is the problem related to the timer, or is it perhaps due to the fast ISR rate and the time it takes to fill the buffer?
What bothers me is that with a timer trigger below 1 us the F1 crashes right from the start. Unfortunately I have no debugger right now, so I'm in the dark about why this happens. AFAIK, and according to your quotation, the DMA and the port shouldn't be limited to 1 MHz, so I wonder what the limit is here. Any ideas?

victor_pv
Posts: 1750
Joined: Mon Apr 27, 2015 12:12 pm

Re: SoftPWM via DMA and no CPU cycles

Post by victor_pv » Fri Jul 07, 2017 11:21 pm

universam10 wrote:
Fri Jul 07, 2017 10:01 pm
Thanks for the explanation about the race conditions, I will try with a single buffer.
What bothers me is that with a timer trigger below 1 us the F1 crashes right from the start. Unfortunately I have no debugger right now, so I'm in the dark about why this happens. AFAIK, and according to your quotation, the DMA and the port shouldn't be limited to 1 MHz, so I wonder what the limit is here. Any ideas?
They shouldn't be limited, and it is strange that it crashes. If the DMA were going too fast for the RAM, it would just fail to keep up with the frequency rather than crash; Racemaniac tested pushing the limits and didn't get any crashes.

If I have some time later today I'll flash it to a board and check with the debugger.

Just tested: I can get up to 8 kHz with 255-step resolution.
That's about 2M DMA transfers per second (8 kHz x 255), if I calculate it right.
At 10 kHz the DMA handler crashes. I think a new DMA ISR is triggered before the previous one has completed.

Correction: that was 800 Hz with 255-step resolution; see my note below about the setPeriod division.
With 100-step resolution I can go up to 2.5 kHz.
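The frequency and resolution figures above are tied together by one multiplication: every PWM period needs one timer request (one DMA transfer) per step. A tiny C++ helper with hypothetical names makes that explicit. Plugging in the corrected 800 Hz x 255 and 2.5 kHz x 100 gives about 204k and 250k DMA requests per second respectively.

```cpp
#include <cstdint>

// One DMA transfer per step, so requests/s = PWM frequency x steps.
constexpr uint32_t dmaRequestsPerSecond(uint32_t pwmHz, uint32_t steps) {
    return pwmHz * steps;
}

// Inverse: highest PWM frequency reachable for a given request-rate ceiling.
constexpr uint32_t maxPwmHz(uint32_t requestsPerSecond, uint32_t steps) {
    return requestsPerSecond / steps;   // integer division rounds down
}
```

For example, if roughly 250k requests per second were the ceiling, 255 steps would top out just under 1 kHz.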

From the debugger, it fails this ASSERT:

Code:

dma_irq_cause dma_get_irq_cause(dma_dev *dev, dma_channel channel) {
    /* Grab and clear the ISR bits. */
    uint8 status_bits = dma_get_isr_bits(dev, channel);
    dma_clear_isr_bits(dev, channel);

    /* If the channel global interrupt flag is cleared, then
     * something's very wrong. */
    ASSERT(status_bits & 0x1);
    /* ... rest of dma_get_irq_cause() not shown ... */
}

I think the ISR is taking too long to get serviced. Right now the filling is done inside the ISR, so a possible solution is to not do that, and instead set a flag in the ISR and have a loop wait for that flag and refill the DMA buffer.
That would prevent the problem with nested ISRs, but the DMA would likely complete a cycle before the fillBuffer function has been able to fill the buffer completely.
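A minimal host-side C++ sketch of that flag approach, assuming hypothetical names (`dmaCompleteIsr`, `serviceRefill`, `fillBuffer`). `std::atomic<bool>` stands in for the `volatile` flag you would typically use in the Arduino sketch. The ISR only records that a refill is due; the slow loop runs from `loop()`.

```cpp
#include <array>
#include <atomic>
#include <cstdint>

constexpr int kSteps = 100;                    // assumption: 100-step resolution
std::array<uint32_t, kSteps> dmaBuffer{};      // buffer the DMA streams to the port
std::atomic<bool> refillRequested{false};      // set by the ISR, cleared by the loop
int fillCount = 0;                             // for demonstration only

// Placeholder for the slow refill work.
void fillBuffer() {
    for (int i = 0; i < kSteps; ++i)
        dmaBuffer[i] = static_cast<uint32_t>(i);   // placeholder pattern
    ++fillCount;
}

// DMA transfer-complete handler: raise the flag and return immediately.
void dmaCompleteIsr() {
    refillRequested.store(true, std::memory_order_release);
}

// Call repeatedly from loop(): refill only when the ISR asked for it.
void serviceRefill() {
    if (refillRequested.exchange(false, std::memory_order_acquire))
        fillBuffer();
}
```

This removes the nested-ISR crash, but, as noted in the post, the DMA may still stream a partly updated buffer if the refill takes longer than one DMA cycle.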

Perhaps you can do some other optimizations so the ISR takes a bit less time; that will give you a higher top frequency. One thing that comes to mind: if pinVal is 0, there is no need to compare it to step; just leave the set bit at 0 for all buffer positions. That may not save a lot of cycles, but since that loop repeats a lot, it may help.
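A sketch of that shortcut for a "horizontal" fill that rebuilds every BSRR word from scratch. This is a hypothetical helper, not the thread's fillBuffer: pins whose duty value is 0 are dropped before the step loop, so their set bit simply stays 0 and no compares are spent on them. It uses `step < pinVal` so that 0 means fully off, and relies on the STM32F1 rule that when a pin's set and reset bits are both written to BSRR, the set bit wins.

```cpp
#include <array>
#include <cstdint>

constexpr uint16_t kResolution = 100;        // assumption: 100 steps
std::array<uint32_t, kResolution> buffer{};  // one BSRR word per step

void fillBuffer(const std::array<uint16_t, 16>& pinVal, uint16_t pinMask) {
    // Collect once the pins that actually need a per-step compare.
    uint8_t active[16];
    int nActive = 0;
    for (uint8_t pin = 0; pin < 16; ++pin)
        if ((pinMask & (1u << pin)) && pinVal[pin] != 0)
            active[nActive++] = pin;

    for (uint16_t step = 0; step < kResolution; ++step) {
        uint32_t word = static_cast<uint32_t>(pinMask) << 16;  // reset every enabled pin...
        for (int i = 0; i < nActive; ++i)
            if (step < pinVal[active[i]])
                word |= 1u << active[i];                       // ...set wins while still high
        buffer[step] = word;
    }
}
```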

Also, the division that calculates the period in us uses 10 million rather than 1 million, so it was actually setting a period 10 times longer.

Since you only update the buffer every X DMA ISRs, I think it would be better to use 2 independent buffers and refill each one outside of the ISR, so the refill doesn't need to complete before the next DMA transfer. In fact, you should start refilling one buffer as soon as the DMA starts using the other. By the time the DMA has run X cycles, the refill function has had X times as long to finish filling the second buffer; then the ISR just switches the buffer address and starts the DMA over.
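A host-side C++ sketch of that two-buffer scheme, under stated assumptions: `restartDma()` is a hypothetical stand-in for the channel restart (on the F1 that means disabling the channel, reloading the memory address and transfer count, and re-enabling it), and all names are illustrative. The main loop only ever writes the buffer the DMA is not reading, and the ISR swaps only once that spare buffer is marked complete, so a half-written buffer is never handed to the DMA.

```cpp
#include <array>
#include <atomic>
#include <cstdint>

constexpr int kSteps = 100;                              // assumption: 100 steps
std::array<std::array<uint32_t, kSteps>, 2> buffers{};
std::atomic<int> dmaIndex{0};                            // buffer the DMA is reading
std::atomic<bool> spareReady{false};                     // spare buffer holds fresh data

// Hypothetical restart hook; here it only records the new source address.
const uint32_t* dmaSource = buffers[0].data();
void restartDma(const uint32_t* src) { dmaSource = src; }

// Main loop: refill the buffer the DMA is NOT using, at leisure.
void refillSpare(uint32_t value) {
    if (spareReady.load()) return;                       // last update not consumed yet
    auto& spare = buffers[1 - dmaIndex.load()];
    for (auto& word : spare) word = value;               // placeholder content
    spareReady.store(true);
}

// DMA transfer-complete ISR: swap only when a complete spare buffer is ready.
void dmaCompleteIsr() {
    if (spareReady.exchange(false)) {
        const int next = 1 - dmaIndex.load();
        dmaIndex.store(next);
        restartDma(buffers[next].data());
    }
}
```

Because the ISR never swaps while `spareReady` is false, the index the loop is writing to cannot change under it.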

EDIT:
I wanted to try SystemView for tracing, so I gave it a shot with this. First, the DMA is triggering way more often than I expected, but I had changed the timer settings, so I have to double-check that. The part I wanted to measure, though, was refilling the buffers: one call to the fillBuffer function with a resolution of 100 takes about 50 us.
My sketch was getting a DMA event roughly every 70 us, so whenever it called fillBuffer it only had 20 us left for everything else. When I tried to increase the frequency, it crashed with nested DMA event calls.
So if you want to increase the speed considerably, a good solution should be to run fillBuffer outside of the ISR, and possibly use more buffers, so that one can be filled over a period longer than the DMA takes to run a cycle.

universam10
Posts: 19
Joined: Sun Jan 03, 2016 8:35 am
Location: Germany

Re: SoftPWM via DMA and no CPU cycles

Post by universam10 » Mon Jul 10, 2017 7:30 am

victor_pv wrote:
Fri Jul 07, 2017 11:21 pm
I think the ISR is taking too long to get serviced. Right now the filling is done inside the ISR, so a possible solution is to not do that, and instead set a flag in the ISR and have a loop wait for that flag and refill the DMA buffer.
That would prevent the problem with nested ISRs, but the DMA would likely complete a cycle before the fillBuffer function has been able to fill the buffer completely.
Great, thanks for debugging!
I took your advice and adopted the paradigm of updating the buffer outside of the ISR, so the race condition will not occur... see below
victor_pv wrote:
Fri Jul 07, 2017 11:21 pm
I think it would be better to use 2 independent buffers and refill each one outside of the ISR, so the refill doesn't need to complete before the next DMA transfer. In fact, you should start refilling one buffer as soon as the DMA starts using the other. By the time the DMA has run X cycles, the refill function has had X times as long to finish filling the second buffer; then the ISR just switches the buffer address and starts the DMA over.
That's not clear to me, since I am using an F1, which has no ping-pong (double) buffers for DMA. So if I switch buffers I need to stop and restart the DMA, which (I can't measure it right now) will likely pause the PWM for longer. Are there methods to do this within a single or a few cycles?

victor_pv wrote:
Fri Jul 07, 2017 11:21 pm
So if you want to increase the speed considerably, a good solution should be to run fillBuffer outside of the ISR, and possibly use more buffers, so that one can be filled over a period longer than the DMA takes to run a cycle.
Great, so I put the suggestions together:

As you said, fillBuffer can run outside of the ISR and does not need to be in sync, which evidently works. There may be some glitches :?:
Therefore, I could _completely_ remove the whole ISR part, as it became unnecessary.
Next was to remove the double buffer, as it was unnecessary too.

On second thought, another idea came to mind: with writePWM() I'm updating only one pin, but in fillBuffer() I'm rewriting the whole port, i.e. all 16 pins. So I created a "vertical" fillBuffer that only changes the pin that was updated, which as a result runs about 16x faster.

With the above changes I can now take the DMA trigger (the PWM step clock) down to a prescaler of 1, which is a clock speed of 72 MHz! :o
Of course, this is not technically possible, but the F1 no longer crashes at a PWM of 280 kHz with 8-bit resolution.
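Where the 280 kHz figure comes from, as a small C++ sketch with hypothetical helper names. With RESOLUTION = 255 and FREQUENCY = 200000 as in the sketch in this post, the integer prescaler F_CPU / RESOLUTION / FREQUENCY works out to 1, and 72 MHz / 1 / 255 is about 282 kHz, which is also why the requested 200 kHz is not what you get. This is the nominal schedule only, assuming one DMA request per prescaled timer tick; whether the DMA really keeps up with that is exactly what needs measuring.

```cpp
#include <cstdint>

// Prescaler as the sketch computes it: integer math, so it rounds down
// (and reaches 0 if FREQUENCY is set too high for the resolution).
constexpr uint32_t prescaleFactor(uint32_t fCpu, uint32_t resolution, uint32_t freq) {
    return fCpu / resolution / freq;
}

// Nominal PWM frequency, assuming one DMA request (one step) per prescaled tick.
constexpr double nominalPwmHz(uint32_t fCpu, uint32_t prescale, uint32_t resolution) {
    return static_cast<double>(fCpu) / prescale / resolution;
}
```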

Now I would be super curious what the real speed on the pins is, if you have the equipment to measure it :)

Code:

#include <Arduino.h>
#include <libmaple/dma.h>
#include <dma_private.h>

#define RESOLUTION 255   // PWM resolution (steps per PWM period)
#define FREQUENCY 200000 // requested PWM frequency in Hz (the integer prescaler rounds it)

class DMASoftPWM
{
  public:
    DMASoftPWM() {}
    void begin(gpio_dev *port);
    void setPinMode(uint8_t pin, bool enable); // pin: Arduino pin number, e.g. PC13
    void writePWM(uint8_t bit, uint16_t val);  // bit: port bit number, e.g. 13 for PC13
    // One BSRR word per step: low 16 bits = pins to set, high 16 bits = pins to reset.
    uint32_t buffer[RESOLUTION];
    uint16_t pinmask = 0;

  private:
    void fillBufferVert(uint8_t bit);
    uint16_t pinVal[16] = {0};
    dma_tube_config tube_config;
};

void DMASoftPWM::begin(gpio_dev *port)
{
    // The prescaler is computed with integer division and must not round down to 0.
    static_assert(F_CPU / RESOLUTION / FREQUENCY >= 1, "FREQUENCY too high for RESOLUTION");

    dma_init(DMA1);
    tube_config.tube_src = buffer;
    tube_config.tube_src_size = DMA_SIZE_32BITS;
    tube_config.tube_dst = (uint32_t *)&port->regs->BSRR; // bit set/reset register of the selected port
    tube_config.tube_dst_size = DMA_SIZE_32BITS;
    tube_config.tube_nr_xfers = RESOLUTION;
    tube_config.tube_flags = DMA_CFG_SRC_INC | DMA_CFG_CIRC; // circular; no DMA interrupts needed
    tube_config.target_data = 0;
    tube_config.tube_req_src = DMA_REQ_SRC_TIM2_CH3; // DMA request source.
    dma_tube_cfg(DMA1, DMA_CH1, &tube_config); // Attach the tube to channel 1 (timer2 ch3)
    // Set the priority after dma_tube_cfg(), in case that call rewrites the channel configuration.
    dma_set_priority(DMA1, DMA_CH1, DMA_PRIORITY_VERY_HIGH);
    dma_enable(DMA1, DMA_CH1);

    // Timer setup: each TIM2 channel 3 compare event requests one DMA transfer (one PWM step)
    Timer2.pause();
    Timer2.setPrescaleFactor(F_CPU / RESOLUTION / FREQUENCY);
    Timer2.setOverflow(1);

    Timer2.setChannel3Mode(TIMER_OUTPUT_COMPARE);
    Timer2.setCompare(TIMER_CH3, 1);
    Timer2.refresh();
    TIMER2_BASE->DIER = TIMER_DIER_CC3DE; // enable only the CC3 DMA request
    Timer2.resume();
}

// "Vertical" fill: update one port bit in every step, leaving the other pins untouched.
void DMASoftPWM::fillBufferVert(uint8_t bit)
{
    for (uint16_t step = 0; step < RESOLUTION; step++)
    {
        if (step < pinVal[bit])          // high for the first pinVal steps: 0 = off, RESOLUTION = always on
            buffer[step] |= BIT(bit);    // BSRR: the set bit wins over the reset bit
        else
            buffer[step] &= ~BIT(bit);   // only the reset bit remains: pin low
    }
}

void DMASoftPWM::setPinMode(uint8_t pin, bool enable)
{
    pinMode(pin, OUTPUT);

    if (enable)
        pinmask |= digitalPinToBitMask(pin);
    else
        pinmask &= ~digitalPinToBitMask(pin);
    // Pre-fill the buffer so every step resets all enabled pins...
    for (uint16_t step = 0; step < RESOLUTION; step++)
        buffer[step] = (uint32_t)pinmask << 16;
    // ...then restore the duty cycles of the enabled pins, which the loop above wiped.
    for (uint8_t bit = 0; bit < 16; bit++)
        if (pinmask & BIT(bit))
            fillBufferVert(bit);
}

void DMASoftPWM::writePWM(uint8_t bit, uint16_t val)
{
    if (bit >= 16)
        return;              // pinVal[] only covers the 16 bits of one port
    if (val > RESOLUTION)
        val = RESOLUTION;    // clamp to 100 % duty
    pinVal[bit] = val;
    fillBufferVert(bit);
}

DMASoftPWM softPWMPortC;

void setup()
{
    Serial.begin(115200);
    Serial.println("starting usb serial");

    softPWMPortC.begin(GPIOC);
    softPWMPortC.setPinMode(PC13, true);
}
// #define DEBUGBUFFER 1

void loop()
{
    // Manual control: send "<bit> <value>" over serial, e.g. "13 128"
    if (Serial.available())
    {
        int pin = Serial.parseInt();
        int val = Serial.parseInt();
        while (Serial.available())
            Serial.read();
        softPWMPortC.writePWM(pin, val);
        Serial.print(pin);
        Serial.print(':');
        Serial.println(val);

#ifdef DEBUGBUFFER
        delay(100);
        for (int u = 0; u < RESOLUTION; u++)
            Serial.println(softPWMPortC.buffer[u], BIN);
#endif
    }

    // Demo: sweep the PC13 duty cycle through all steps, one step every few ms
    static uint32_t sweep = 0;
    static uint16_t t = 0;
    if (millis() - sweep > 1000 / RESOLUTION)
    {
        sweep = millis();
        t = (t + 1) % RESOLUTION;
        softPWMPortC.writePWM(13, t);
    }
}

zmemw16
Posts: 1491
Joined: Wed Jul 08, 2015 2:09 pm
Location: St Annes, Lancs,UK

Re: SoftPWM via DMA and no CPU cycles

Post by zmemw16 » Mon Jul 10, 2017 10:53 am

I don't understand the term "ping-pong buffers". Do you mean 'set up two DMA start addresses and the DMA will alternate between them'?

The ISR fires on DMA complete, right? So why can't you alternate the DMA start address in the ISR and re-trigger the DMA?

stephen

universam10
Posts: 19
Joined: Sun Jan 03, 2016 8:35 am
Location: Germany

Re: SoftPWM via DMA and no CPU cycles

Post by universam10 » Mon Jul 10, 2017 11:14 am

How much time (how many cycles) will it take to stop, reconfigure and restart the DMA for every duty cycle change? I'm not sure, but either way the PWM will get out of sync. On the other hand, if I alter the buffer in place, one sample might be inaccurate if the PWM duty really "jumps", but I think the impact of the former approach is way more dramatic!

Anyway, it works very well like this; one would first have to prove that there is a glitch.

victor_pv
Posts: 1750
Joined: Mon Apr 27, 2015 12:12 pm

Re: SoftPWM via DMA and no CPU cycles

Post by victor_pv » Mon Jul 10, 2017 4:16 pm

universam10 wrote:
Mon Jul 10, 2017 11:14 am
How much time (how many cycles) will it take to stop, reconfigure and restart the DMA for every duty cycle change? I'm not sure, but either way the PWM will get out of sync. On the other hand, if I alter the buffer in place, one sample might be inaccurate if the PWM duty really "jumps", but I think the impact of the former approach is way more dramatic!

Anyway, it works very well like this; one would first have to prove that there is a glitch.
It should not be too much, depending on how you do it. If you do direct register manipulation (and I can't see a reason not to), it should be just a few instructions.
You only need to disable the channel, change the source address, set the transfer count again (I believe it goes to 0; I would need to read the reference manual again to confirm), and enable the channel.
Since you are not changing the target address, interrupt settings, callback function... it should not take that much time.
But of course, depending on how fast the timer requests are coming, it could take longer than you want and cause that pulse to last longer than it should.
The other option, using a single buffer and modifying it while the DMA is sending it, always has the risk that the DMA catches up with the CPU writing the new table, and you get a pin going down and up twice in the same cycle if the DMA is fast enough. So I guess it depends on the application, but I think for most applications it is better to get a pulse that's a few us longer than it should be than to get 2 fast pulses instead of 1.
The F4 double buffering DMA would be great for pushing this to the limit :)
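A minimal sketch of the register sequence described above, written against a mock struct so it runs on a PC. The field names follow the STM32F1 DMA channel registers (CCR, CNDTR, CPAR, CMAR, with EN as bit 0 of CCR), but `DmaChannelRegs` and `switchBuffer` are hypothetical. Per the reference manual, CNDTR and CMAR may only be written while the channel is disabled. On the open question about the count: in normal mode CNDTR reaches 0 at the end of the transfer, while in circular mode it is reloaded automatically, but a channel stopped mid-cycle keeps its remaining count, so reloading it explicitly is the safe choice either way.

```cpp
#include <cstdint>

// Mock of one STM32F1 DMA channel register block (on target, use the device's register map).
struct DmaChannelRegs {
    volatile uint32_t CCR;    // configuration; bit 0 = EN
    volatile uint32_t CNDTR;  // number of transfers remaining
    volatile uint32_t CPAR;   // peripheral address (GPIOx->BSRR, left unchanged)
    volatile uint32_t CMAR;   // memory address (the buffer being sent)
};

constexpr uint32_t kDmaCcrEn = 1u << 0;

// Point the channel at another buffer: disable, reload address and count, re-enable.
// All other CCR settings (direction, increment, circular mode, priority) are preserved.
void switchBuffer(DmaChannelRegs& ch, uint32_t bufferAddr, uint16_t transfers) {
    ch.CCR = ch.CCR & ~kDmaCcrEn;   // CNDTR and CMAR must only be written while disabled
    ch.CMAR = bufferAddr;
    ch.CNDTR = transfers;           // reload the transfer count explicitly
    ch.CCR = ch.CCR | kDmaCcrEn;
}
```

On the target, `bufferAddr` would be the buffer's address cast to `uint32_t`; the whole switch is a handful of register writes, which matches the "just a few instructions" estimate.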

I have a small 8-channel analyzer; I think its max speed is 10 MHz or 24 MHz. I'll see if I get a chance to measure the pulses, and whether I can force PWM "jumps" and see them. Did you update the code in the first post with fillBuffer decoupled from the DMA ISR?

universam10
Posts: 19
Joined: Sun Jan 03, 2016 8:35 am
Location: Germany

Re: SoftPWM via DMA and no CPU cycles

Post by universam10 » Mon Jul 10, 2017 5:45 pm

Cool, let me know the result. The updated code is in the post above.
