[F4] Neopixel driver using hardware timers

flyboy74

Re: [F4] Neopixel driver using hardware timers

Post by flyboy74 »

ozcar wrote: Wed May 06, 2020 1:43 pm You could maybe add something to know when the transmission is complete and there has been sufficient delay for the "reset".
Probably the easiest way to do this is to check the status of the UIE bit, because the stopping of the transmission is done by this line.

Code: Select all

TIM4->DIER &= ~TIM_DIER_UIE; //disable interrupt flag to end transmission.
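
For example, a caller could poll that bit to wait for the end of transmission. A minimal sketch (it only shows the bit test, it doesn't add the "reset" delay by itself):

Code: Select all

while (TIM4->DIER & TIM_DIER_UIE)
{
	//still transmitting - the interrupt routine clears UIE when it runs out of data
}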
Is it possible that your scope isn't triggering until the second bit and is cutting off the first bit? I find it hard to believe that it behaves differently every 20 executions.
ozcar

Re: [F4] Neopixel driver using hardware timers

Post by ozcar »

flyboy74 wrote: Wed May 06, 2020 9:11 pm
Probably the easiest way to do this is to check the status of the UIE bit, because the stopping of the transmission is done by this line.

Code: Select all

TIM4->DIER &= ~TIM_DIER_UIE; //disable interrupt flag to end transmission.
Yes, you can tell that it is no longer transmitting by checking UIE, but as it stands you don't know how long it has been in that state, so you don't know whether the output has been low long enough to provide the "reset". That time varies depending on the LED. You could maybe count off some multiple of the "dummy" pulses with CCR1 = 0, or change ARR for one longer period.
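
Something along these lines in the "out of data" part of the interrupt routine, say (just a sketch - reset_count and RESET_PERIODS are made-up names, and RESET_PERIODS would have to be chosen so that that many bit periods cover the reset time of the particular LEDs):

Code: Select all

if (reset_count < RESET_PERIODS)
{
	TIM4->CCR1 = 0;              //one more "dummy" period, output stays low
	reset_count++;
}
else
{
	reset_count = 0;
	TIM4->DIER &= ~TIM_DIER_UIE; //only now are transmission and "reset" really complete
}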
flyboy74 wrote: Wed May 06, 2020 9:11 pm
Is it possible that your scope isn't triggering until the second bit and is cutting off the first bit? I find it hard to believe that it behaves differently every 20 executions.
After I saw the occasional dropped bit, I inserted code to provide a trigger in show_neopixels:

Code: Select all

void show_neopixels(){
	GPIOD->BSRR = GPIO_BSRR_BS14;      // trigger up
	pos = 0;                           //set the interrupt to start at first byte
	mask = 0B10000000;                 //set the interrupt to start at first bit
	TIM4->DIER |= TIM_DIER_UIE;        //enable interrupt flag to be generated to start transmission
	GPIOD->BSRR = GPIO_BSRR_BR14;      // trigger down
}
If I miss the trigger there (which I can't see happening anyway), then I get to see nothing. With that in place, I saw it usually working OK and outputting 8 bits, but very occasionally only 7, with the first bit missing.

I'm not saying it is going to fail exactly every 20th time - for me it was very random. However, when I eventually worked out what it was, I found I could get it to fail almost 100% of the time. The clue is there in what I did to fix it.

I was seeing it with the timing as you had it for 400kHz LEDs, but I only have 800kHz LEDs. With the different timing, and perhaps the optimisation setting, it might be more or less likely to occur. If I get a chance, I will see if I can get it to happen with the 800kHz LEDs attached - I'm pretty sure it would cause a visible effect on the LEDs, particularly if the data sent was something like 0x808080808080... or 0x800000800000...
Hackswell

Re: [F4] Neopixel driver using hardware timers

Post by Hackswell »

Isn't this similar to how the OctoWS2811 library for Teensy is implemented? It uses three timer channels to drive 8 GPIO pins from DMA.
ozcar

Re: [F4] Neopixel driver using hardware timers

Post by ozcar »

Hackswell wrote: Thu May 07, 2020 12:47 am Isn't this similar to how the OctoWS2811 library for Teensy is implemented? It uses three timer channels to drive 8 GPIO pins from DMA.
This only uses one timer channel, but as was mentioned above this could be extended to use more timer channels to output data to several strings simultaneously. Could probably synchronise timers for even more channels.

It does not use DMA. I did look at OctoWS2811 a long time ago, but I don't remember much about it. I'm not sure if its use of Freescale, sorry NXP, DMA and timers could be easily adapted to STM32.

Somewhere I've seen code that used DMA to set the duty cycle for a timer, rather than doing that in an interrupt routine as in Flyboy's code, but it required that all the LED data be converted up front to a set of values that DMA could just grab and send to the timer, so it needed an additional buffer far larger than the raw LED data. Inspired by Flyboy's code, I came up with a version that uses DMA, but with a buffer only big enough for 2 LEDs' worth of data. Instead of having to handle one interrupt per bit, there is only 1 interrupt per LED (24 bits). The interrupt routine has to do more work now, but it still comes out ahead in terms of CPU usage. This is what the interrupt routine looks like:

Code: Select all

void DMA1_Stream6_IRQHandler(void)
{
   uint16_t *buf = 0;
   int i;

   if ( DMA1->HISR & DMA_HISR_HTIF6 )
     {
       DMA1->HIFCR |= DMA_HIFCR_CHTIF6;       // clear half-transfer interrupt
       buf = DMAbuf;                          // to fill 1st half
     }
   if ( DMA1->HISR & DMA_HISR_TCIF6 )
     {
       DMA1->HIFCR |= DMA_HIFCR_CTCIF6;       // clear transfer complete interrupt
       buf = DMAbuf + BITS_PER_LED;           // to fill 2nd half
     }

   if ( !buf ) return;

   if ( pos < sizeof(LED_data) )
     {
       for ( i=0; i<BYTES_PER_LED; i++,pos++ )   // sizeof(LED_data) is a multiple of BYTES_PER_LED
         for ( mask=0x80; mask; mask>>=1 )
           {
             if ( LED_data[pos] & mask )
               *buf = high_CCR1;
             else
               *buf = low_CCR1;
             buf++;
           }
     }
   else      // out of data
     {
       if ( lastbit < TRESET )    // approx for now
         {
           for ( i=0; i<BITS_PER_LED; i++ )
             *buf++ = 0;                       // no pulses, keep the line low for the "reset"
           lastbit++;
         }
       else
         DMA1_Stream6->CR &= ~DMA_SxCR_EN;   // DMA1 stream6 disable
     }
}
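
For reference, the setup that drives it is roughly like this (a sketch only: it assumes the CMSIS device header, the DMAbuf / BITS_PER_LED names from the routine above, and that TIM4 channel 1 PWM and the bit-period timing are already configured; start_neopixels_dma is just a made-up name, and DMAbuf would be pre-filled with the first two LEDs before calling it). On the F4, TIM4_UP maps to DMA1 Stream 6 / Channel 2, which is why the handler above is DMA1_Stream6_IRQHandler.

Code: Select all

void start_neopixels_dma(void)
{
	RCC->AHB1ENR |= RCC_AHB1ENR_DMA1EN;                //DMA1 clock on

	DMA1_Stream6->CR &= ~DMA_SxCR_EN;                  //make sure the stream is disabled
	while (DMA1_Stream6->CR & DMA_SxCR_EN);
	DMA1->HIFCR = DMA_HIFCR_CTCIF6 | DMA_HIFCR_CHTIF6 |
	              DMA_HIFCR_CTEIF6 | DMA_HIFCR_CDMEIF6 | DMA_HIFCR_CFEIF6;  //clear any stale flags

	DMA1_Stream6->PAR  = (uint32_t)&TIM4->CCR1;        //peripheral address = duty cycle register
	DMA1_Stream6->M0AR = (uint32_t)DMAbuf;             //memory address = 2 LEDs worth of CCR values
	DMA1_Stream6->NDTR = 2 * BITS_PER_LED;             //48 half-words per trip round the buffer

	DMA1_Stream6->CR = DMA_SxCR_CHSEL_1                //channel 2 (binary 010) = TIM4_UP request
	                 | DMA_SxCR_DIR_0                  //memory-to-peripheral
	                 | DMA_SxCR_MINC                   //step through the buffer
	                 | DMA_SxCR_MSIZE_0 | DMA_SxCR_PSIZE_0   //16-bit on both sides
	                 | DMA_SxCR_CIRC                   //circular, wraps back to the start
	                 | DMA_SxCR_HTIE | DMA_SxCR_TCIE;  //interrupt at half-way and at the end

	NVIC_EnableIRQ(DMA1_Stream6_IRQn);

	DMA1_Stream6->CR |= DMA_SxCR_EN;                   //arm the stream
	TIM4->DIER |= TIM_DIER_UDE;                        //each timer update now requests one DMA transfer
}
The interrupt routine then has a whole LED's worth of bit periods (24) to refill its half of the buffer before the DMA wraps around to it.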
I suspect somebody may have done something similar already. I don't know if it can be done using the STM32duino core. Yet another thing for me to look at some time.
flyboy74

Re: [F4] Neopixel driver using hardware timers

Post by flyboy74 »

ozcar wrote: Thu May 07, 2020 3:31 am Inspired by Flyboy's code, I came up with a version that uses DMA, but with a buffer only big enough for 2 LEDs' worth of data. Instead of having to handle one interrupt per bit, there is only 1 interrupt per LED (24 bits). The interrupt routine has to do more work now, but it still comes out ahead in terms of CPU usage. This is what the interrupt routine looks like:

[quoted interrupt routine snipped - see the post above]
I didn't look super closely at your code, so correct me if I am wrong. You create a buffer big enough to hold all the values for the CCR for 48 bits (I assume you could use uint8_t, as the values are always low), so that would be a total of an extra 48 bytes on top of the array that stores the whole LED strip. Then you use a double-buffered DMA to transfer the buffer to the CCR, and after every byte it fires an interrupt that updates half the buffer.
ozcar

Re: [F4] Neopixel driver using hardware timers

Post by ozcar »

flyboy74 wrote: Thu May 07, 2020 9:12 am I didn't look super closely at your code, so correct me if I am wrong. You create a buffer big enough to hold all the values for the CCR for 48 bits (I assume you could use uint8_t, as the values are always low), so that would be a total of an extra 48 bytes on top of the array that stores the whole LED strip. Then you use a double-buffered DMA to transfer the buffer to the CCR, and after every byte it fires an interrupt that updates half the buffer.
The DMA buffer is 48 uint16_t values, so it is not that huge. No particular reason for the buffer to be exactly that many entries, but 48 for two LEDs worth seemed like a reasonable number. The values will fit in uint8_t so maybe there is a way to cut the size of the buffer in half. I did avoid using uint32_t sized entries because TIM4 uses only 16 bits.

Double buffered is where you can have two non-contiguous buffer areas, so this is not what they would call double buffered, but yes, it fills alternate halves of the one buffer on each interrupt. There is not an interrupt for every byte, only when the transfer reaches the half-way point, and again when it wraps back to the beginning (circular mode). So there is one interrupt for every 24 CCR values = 24 bits of raw LED data = 1 LED.

I added some code there for the "reset" time, but that was more-or-less just an idea, and is not currently properly calculating the time for that.

I got around to trying your code with the values set for 800kHz LEDs. If anything, the problem with the first bit getting lost happens more often. And with some LEDs finally connected, the problem is very obvious, at least when the LED data contain 0x80 values and the high-order bit for one LED effectively becomes the low-order bit for the preceding LED. The good news is that with the modification I suggested, I did not see the problem.
ozcar

Re: [F4] Neopixel driver using hardware timers

Post by ozcar »

ozcar wrote: Thu May 07, 2020 3:31 am I suspect somebody may have done something similar already.
I decided to check if this method has been used before, and the very first thing brought up by my uncle Google is this: http://stm32f4-discovery.net/2018/06/tu ... eds-stm32/ .

I did not look very closely at the actual code, but he also decided to make the DMA buffer big enough for 2 LEDs' worth of data. He drew a nice picture there of the DMA buffer to show how it works, but TBH I could not follow all of what he was saying in the description below that - he made it sound a lot more complicated than I thought it needed to be (but then maybe I missed something vital in my attempt).

I note his DMA buffer contains uint32_t values, while mine were uint16_t - I'm still not 100% sure if you could get it to work with uint8_t.
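
If anybody wants to try uint8_t, I suspect the stream would need the FIFO enabled, because as far as I remember in direct mode the memory size is forced to follow the peripheral size. Something like this perhaps, completely untested (and done while the stream is disabled):

Code: Select all

DMA1_Stream6->FCR |= DMA_SxFCR_DMDIS;                  //enable the FIFO (disable direct mode)
DMA1_Stream6->CR  &= ~DMA_SxCR_MSIZE;                  //memory side: bytes
DMA1_Stream6->CR   = (DMA1_Stream6->CR & ~DMA_SxCR_PSIZE)
                   | DMA_SxCR_PSIZE_0;                 //peripheral side: half-words for CCR1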
flyboy74

Re: [F4] Neopixel driver using hardware timers

Post by flyboy74 »

ozcar wrote: Thu May 07, 2020 10:56 pm
ozcar wrote: Thu May 07, 2020 3:31 am I suspect somebody may have done something similar already.
I decided to check if this method has been used before, and the very first thing brought up by my uncle Google is this: http://stm32f4-discovery.net/2018/06/tu ... eds-stm32/ .

I did not look very closely at the actual code, but he also decided to make the DMA buffer big enough for 2 LEDs' worth of data. He drew a nice picture there of the DMA buffer to show how it works, but TBH I could not follow all of what he was saying in the description below that - he made it sound a lot more complicated than I thought it needed to be (but then maybe I missed something vital in my attempt).

I note his DMA buffer contains uint32_t values, while mine were uint16_t - I'm still not 100% sure if you could get it to work with uint8_t.
Ha Ha Ha just when you think you have invented something new you find out someone else had the same idea :)

The largest value for the CCR will be for a T1H of 0.8us. On an STM32F4 with the timer running at 84MHz, the value for CCR will be 0.8/(1/84) = 67.2, which rounds to 67, so it will easily fit into a uint8_t. Considering you're only buffering 48 values it isn't a huge saving, but still a slight saving :)
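
In code it would be something like this (a sketch, reusing your high_CCR1/low_CCR1 names and the usual 0.8us / 0.4us figures for the 800kHz parts - the exact numbers depend on which LEDs you have):

Code: Select all

#define TIMER_CLK_MHZ  84                    //TIM4 clock on the F4 at full speed
#define T1H_US         0.8f                  //nominal high time for a "1" bit
#define T0H_US         0.4f                  //nominal high time for a "0" bit

uint8_t high_CCR1 = (uint8_t)(T1H_US * TIMER_CLK_MHZ + 0.5f);   //0.8 * 84 = 67.2 -> 67
uint8_t low_CCR1  = (uint8_t)(T0H_US * TIMER_CLK_MHZ + 0.5f);   //0.4 * 84 = 33.6 -> 34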
flyboy74

Re: [F4] Neopixel driver using hardware timers

Post by flyboy74 »

ozcar wrote: Wed May 06, 2020 11:34 pm If I miss the trigger there (which I can't see happening anyway), then I get to see nothing. With that in place, I saw it usually working OK and outputting 8 bits, but very occasionally only 7, with the first bit missing.

I'm not saying it is going to fail exactly every 20th time - for me it was very random. However, when I eventually worked out what it was, I found I could get it to fail almost 100% of the time. The clue is there in what I did to fix it.

I was seeing it with the timing as you had it for 400kHz LEDs, but I only have 800kHz LEDs. With the different timing, and perhaps the optimisation setting, it might be more or less likely to occur. If I get a chance, I will see if I can get it to happen with the 800kHz LEDs attached - I'm pretty sure it would cause a visible effect on the LEDs, particularly if the data sent was something like 0x808080808080... or 0x800000800000...
I have been thinking about why a bit could possibly get dropped; the only way would be for the interrupt to fire twice, causing the registers to be updated twice.

I have changed my interrupt handler a little bit, so that all the update code is inside the loop that tests the TIM_SR_UIF bit, and I don't reset this bit until the end of the handler; that way it can't fire again until after exit.

Please test this update and see if you still get a dropped bit: https://github.com/OutOfTheBots/STM32_N ... ter/main.c
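
The handler now has roughly this shape (just a sketch reconstructed from the description above, using the same sort of variable names as elsewhere in the thread - the real code is in the linked main.c):

Code: Select all

void TIM4_IRQHandler(void)
{
	while (TIM4->SR & TIM_SR_UIF)            //only act on a genuine update event
	{
		if (pos < sizeof(LED_data))
		{
			if (LED_data[pos] & mask) TIM4->CCR1 = high_CCR1;
			else                      TIM4->CCR1 = low_CCR1;

			mask >>= 1;                      //next bit
			if (!mask)
			{
				mask = 0B10000000;           //back to the top bit...
				pos++;                       //...of the next byte
			}
		}
		else
		{
			TIM4->CCR1 = 0;                  //no more pulses
			TIM4->DIER &= ~TIM_DIER_UIE;     //disable interrupt flag to end transmission
		}

		TIM4->SR = ~TIM_SR_UIF;              //clear UIF only at the very end, as described
	}
}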
ozcar

Re: [F4] Neopixel driver using hardware timers

Post by ozcar »

flyboy74 wrote: Fri May 08, 2020 7:53 am
I have changed my interrupt handler a little bit, so that all the update code is inside the loop that tests the TIM_SR_UIF bit, and I don't reset this bit until the end of the handler; that way it can't fire again until after exit.

Please test this update and see if you still get a dropped bit: https://github.com/OutOfTheBots/STM32_N ... ter/main.c
The good news is that with your modified interrupt routine the first bit does not get lost.

The bad news is that now, very occasionally, the first bit gets duplicated!

The other bad news is that my explanation is a bit long...

Normally, let's say somewhere in the middle of generating the whole long stream of pulses, the execution of the interrupt routine is synchronised with the timer. That is, the interrupt routine always gets called just after the timer update event, when the count starts back at zero. The previous CCR value has just been latched, so it is then "safe" to set a new CCR value, without overlaying the previous value before it could get used, and you also have a (relatively) long time available to do this before the next update event. I'm mentioning CCR; the same thing applies to ARR.

Things are different right at the start of the pulse stream, kicked off in show_neopixels by turning on the UIE bit. The timer is already enabled at that point, and has been running, still generating update events which will have set the UIF flag. So, as soon as you set UIE the interrupt will occur, and the interrupt routine will be called. It is not so much the fact that the interrupt routine gets called, but that this call to the interrupt routine is not synchronised with what is happening in the timer; it just depends on when you happen to call show_neopixels.

Usually, this does not cause a problem, but this is a sequence that might happen, with the original interrupt routine, which cleared UIF and then wrote to CCR1:

Interrupt occurs as soon as show_neopixels sets UIE.
Interrupt routine clears UIF.
Update event happens to occur now - this latches previously set CCR1 (0 for no pulse), and sets UIF again.
Interrupt routine sets CCR1 according to first data bit, to be latched at next update event.
As soon as interrupt routine exits, interrupt occurs again as UIF is set.
Interrupt routine clears UIF.
Interrupt routine sets CCR1 according to second data bit, overwriting the value for the first data bit, which has not been latched yet.
Result - first data bit is simply lost, pulse train starts with the second bit.

Now you changed the interrupt routine to write to CCR1 and then clear UIF. Imagine a sequence similar to that above, except now the first time the interrupt routine is called:

The update event happens to occur after CCR1 is updated with value for first bit, but before the UIF is cleared.
The update event latched the value for the first bit, and the pulse output has already gone high for the start of the first bit.
The interrupt routine clears UIF immediately after it was set.
Because UIF was cleared, at the time when the interrupt routine should have been called again to provide the value for the second bit, well, nothing happens (the interrupt is not called).
However the timer is still running, and is quite happy to generate another pulse using the value for the first bit. Only when the timer is reusing the first CCR value for the second time, does the interrupt routine get called again, and it then continues as if nothing had happened.
Result - first bit gets duplicated.

My head hurts if I try to think what might happen if the update event happened between writing to CCR1 and ARR, and the zero and one bit ARR values were not the same.
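
For what it is worth, a change consistent with all of the above is simply to throw away any stale update event in show_neopixels before enabling the interrupt, so that the first call to the interrupt routine is synchronised with the timer again. A sketch of the idea (scope-trigger lines omitted, and not necessarily the exact modification I mentioned earlier):

Code: Select all

void show_neopixels(){
	pos = 0;                    //set the interrupt to start at first byte
	mask = 0B10000000;          //set the interrupt to start at first bit
	TIM4->SR = ~TIM_SR_UIF;     //discard any update event that happened while idle
	TIM4->DIER |= TIM_DIER_UIE; //first interrupt now occurs at the next real update event
}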