WS2812B (Neopixel) library has been added to the F1 core

Rick Kimball
Posts: 1058
Joined: Tue Apr 28, 2015 1:26 am
Location: Eastern NC, US

Re: WS2812B (Neopixel) library has been added to the F1 core

Post by Rick Kimball » Wed Jun 14, 2017 12:28 am

RogerClark wrote: From reading some other experimental blogs about the WS2812B, I was also wondering if all that was required to send a pixel 1 or 0 was just the length of time that the input is in the logic high state, and that perhaps a short pulse of 100 ns followed by logic low for 100 ns would be OK.
5 years or so ago I was obsessed with these ws281x chips. It was back when I was searching around for the MCU I wanted to spend my time focusing on. Getting the ws281x working on a chip became my standard test to see how flexible the chip's architecture really was. If I could make a chip drive a ws281x pixel without flogging myself, then I'd probably be happy writing code for the chip. I got the msp430s going pretty easily with their SPI and also by cycle-counted bit banging. The pic24f easily did it with cycle counting and a small asm routine, even with the free compiler. The LPC1114 has a really flexible SPI peripheral and that was easy to get going. The LPC81x series has an SCT (State Configurable Timer) peripheral which offers nanosecond-level logic based on a state table. That is easily the slickest and did everything without the CPU getting involved.

Ah, but the STM32F103, for all its nice things, isn't the easiest chip to drive these LEDs with. To be honest I find the STM32 SPI peripheral lacking even compared to the msp430 and lpc111x. It is not that flexible by itself; you have to use multiple peripherals to make it happy.

One approach to this problem for the STM32F103 I've thought about, but haven't implemented, is driving the SPI peripheral as a slave. I'd use the MISO pin as the output and send it to the LED chip. I'd drive the SPI_CLK using a timer; that way you'd have much more control over the SPI clock. Of course I'd just ignore MOSI. Then the problem is to create an array of pulses you could just DMA into it, which I guess is what you are doing now. The advantage would be that you could fine-tune the SPI_CLK so the bits work out to your advantage. Personally I'd just use 8 SPI bytes per color byte (one SPI byte per LED data bit), so you'd end up using 24 bytes per LED pixel. Then you could drive the clock at 6.4 MHz and have more bit resolution for tweaking: 800 kHz x 8 bits gives a 6.4 MHz SPI clock with a bit period of 156.25 ns, so 0b11000000 is high for 312.5 ns and 0b11110000 is high for 625 ns.
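As a rough illustration of the one-SPI-byte-per-LED-bit encoding described above, here is a minimal sketch. The function and macro names (ws_encode_byte, ws_encode_pixel, WS_BIT0, WS_BIT1) are made up for this example, and it assumes a 6.4 MHz SPI clock that shifts each byte out most significant bit first.

```c
#include <stdint.h>

/* Assumed patterns at a 6.4 MHz SPI clock (156.25 ns per SPI bit). */
#define WS_BIT0 0xC0u  /* 0b11000000: high for 2 SPI bits = 312.5 ns */
#define WS_BIT1 0xF0u  /* 0b11110000: high for 4 SPI bits = 625 ns   */

/* Expand one color byte into 8 SPI bytes, most significant bit first. */
static void ws_encode_byte(uint8_t value, uint8_t out[8])
{
    for (int i = 0; i < 8; i++) {
        out[i] = (value & (0x80u >> i)) ? WS_BIT1 : WS_BIT0;
    }
}

/* Expand one pixel into 24 SPI bytes, sent in G, R, B order. */
static void ws_encode_pixel(uint8_t r, uint8_t g, uint8_t b, uint8_t out[24])
{
    ws_encode_byte(g, &out[0]);
    ws_encode_byte(r, &out[8]);
    ws_encode_byte(b, &out[16]);
}
```

The buffer grows to 24 bytes per pixel, but the encoder is a plain table-free loop with no bit packing across byte boundaries.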

The code would look something like this
https://gist.github.com/RickKimball/976 ... 9d46a53939
(Ignore that this is using an msp430f5529 and DMA. Also, there are comments in that code about the bit timing based on the SYSCLK speed.)

The one other thing I spent some time doing was figuring out what timing my led chip is really using. Turn on your scope and watch the data you are sending in to the led. Make note of your timings then measure the data out pin coming out of the led chip going into the next chip. Those numbers are the timing your ws281x chip really wants. I don't remember the numbers but I do remember being surprised in that all the numbers I found in online data sheets didn't match the timing the chip was generating. Which of course is what the led chip really wants to be fed.
-rick

RogerClark
Posts: 7491
Joined: Mon Apr 27, 2015 10:36 am
Location: Melbourne, Australia

Re: WS2812B (Neopixel) library has been added to the F1 core

Post by RogerClark » Wed Jun 14, 2017 1:47 am

Thanks Rick

I'm obviously coming to this party very late ;-)

As you guessed... My code builds a bitstream to be sent via the SPI, and needs 3 SPI bits per 1 LED data bit

I build / pack 100 for a LED data bit of 0 and 110 for a LED data bit of 1

So it takes 24 SPI bits (3 bytes) for each colour, and 9 bytes for all 3 colours. So it's somewhat more compressed than your version, but...

Building / packing the 3-bit triplets, and also having 3 bytes per colour byte, requires a lot more processing by the CPU
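A minimal sketch of the 3-SPI-bits-per-LED-bit packing described above. This is not the library's actual code; the function name ws_pack3 is hypothetical, and it assumes bits are sent most significant first.

```c
#include <stdint.h>

/* Pack one colour byte into 3 SPI bytes (24 SPI bits):
 * each LED bit becomes 100 (for a 0) or 110 (for a 1), MSB first. */
static void ws_pack3(uint8_t value, uint8_t out[3])
{
    uint32_t bits = 0;

    for (int i = 7; i >= 0; i--) {
        bits = (bits << 3) | (((value >> i) & 1u) ? 0x6u : 0x4u);
    }
    out[0] = (uint8_t)(bits >> 16);
    out[1] = (uint8_t)(bits >> 8);
    out[2] = (uint8_t)bits;
}
```

Because a triplet can straddle a byte boundary, the encoder has to shift and mask for every LED bit, which is where the extra processing comes from. A 256-entry lookup table of precomputed triples would trade flash for speed.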

@racemaniac did basically the same thing as I did (he used SPI DMA), but he uses 4 SPI bits per LED pixel bit, which takes a bit more RAM but also slows down the transmission by about a third, as he is still using the same "bit clock" of DIV32 (444.44 ns)

(and he hasn't published his code, as it was part of a larger project rather than being a library)

So what he saves on processing / packing the bits, I think he more than loses in the transmission speed.
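The timing trade-off can be checked with a little arithmetic. The sketch below assumes a 72 MHz SPI input clock (which is what a DIV32 bit time of 444.44 ns implies); the function name led_bit_time_ns is made up for this example.

```c
/* Time taken by one LED data bit, in nanoseconds, when it is built
 * from spi_bits_per_led_bit SPI bits at input_clock_hz / prescaler. */
static double led_bit_time_ns(double input_clock_hz,
                              unsigned prescaler,
                              unsigned spi_bits_per_led_bit)
{
    double spi_bit_ns = 1e9 * (double)prescaler / input_clock_hz;
    return spi_bit_ns * (double)spi_bits_per_led_bit;
}
```

With 72 MHz and DIV32, 3 SPI bits per LED bit come to about 1333 ns and 4 SPI bits to about 1778 ns, so the 4-bit scheme takes one third longer per frame.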

I added a partially hacky extra function to SPI as part of this, which does an asynchronous send. (Note this currently does not have a callback, but I PM'ed @victor_pv for help with this, as I know he has code for it, and I'm waiting for him to get back to me)

But doing async transfers incurs a penalty, because most of the effects are additive and I ended up having to copy the whole encoded data buffer just after I kick off the async send.

There may be faster ways to do this, e.g. perhaps build / update the encoded SPI data buffer only when the user calls the show() command, but I'd need to add a system to only update pixels that have changed (though this could easily be done using the unused "white" channel in the uint32_t )
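One way the "only update changed pixels" idea could look, as a sketch only: the flag PIXEL_DIRTY and the functions set_pixel, encode_dirty and pack3 are hypothetical names, colours are assumed to be stored as 0x00RRGGBB, and the 3-bit packing (100 / 110) is assumed for the SPI buffer.

```c
#include <stdint.h>
#include <stddef.h>

#define PIXEL_DIRTY         0x01000000u /* hypothetical flag in the unused "white" byte */
#define SPI_BYTES_PER_PIXEL 9u          /* 3 SPI bits per LED bit, 24 LED bits per pixel */

/* Pack one colour byte as 3 SPI bytes: 100 for a 0 bit, 110 for a 1 bit. */
static void pack3(uint8_t value, uint8_t out[3])
{
    uint32_t bits = 0;
    for (int i = 7; i >= 0; i--) {
        bits = (bits << 3) | (((value >> i) & 1u) ? 0x6u : 0x4u);
    }
    out[0] = (uint8_t)(bits >> 16);
    out[1] = (uint8_t)(bits >> 8);
    out[2] = (uint8_t)bits;
}

/* Store the colour and mark the pixel dirty only if it actually changed. */
static void set_pixel(uint32_t *pixels, size_t i, uint8_t r, uint8_t g, uint8_t b)
{
    uint32_t colour = ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
    if ((pixels[i] & 0x00FFFFFFu) != colour) {
        pixels[i] = colour | PIXEL_DIRTY;
    }
}

/* Called from show(): re-encode dirty pixels in G, R, B order and clear
 * their flags. Returns the number of pixels that were re-encoded. */
static size_t encode_dirty(uint32_t *pixels, size_t count, uint8_t *spi_buf)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        if ((pixels[i] & PIXEL_DIRTY) == 0) {
            continue;
        }
        uint8_t *dst = &spi_buf[i * SPI_BYTES_PER_PIXEL];
        pack3((uint8_t)(pixels[i] >> 8), &dst[0]);  /* green */
        pack3((uint8_t)(pixels[i] >> 16), &dst[3]); /* red */
        pack3((uint8_t)pixels[i], &dst[6]);         /* blue */
        pixels[i] &= ~PIXEL_DIRTY;
        n++;
    }
    return n;
}
```

A real driver would still need to encode every pixel once at start-up, since a black pixel encodes to 0x92 0x49 0x24 rather than to zero bytes.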

But I have a feeling that all these tricks, including async sends, will really only come into their own on the faster processors, as on the F103 they only yield a marginal improvement for most LED effects, e.g. ColorWheel

Rick Kimball
Posts: 1058
Joined: Tue Apr 28, 2015 1:26 am
Location: Eastern NC, US

Re: WS2812B (Neopixel) library has been added to the F1 core

Post by Rick Kimball » Sat Jun 17, 2017 11:35 pm

FWIW: I was able to implement the slave SPI approach using the new Arduino_Core_STM32 core, which uses HAL, with a NUCLEO-F030R8 board. The F030R8 only runs at 48 MHz; however, I used the internal high-speed oscillator, slightly overclocked it using a multiplier of 13 (52 MHz), and then reduced the speed using the oscillator TRIM value until it produced 51.2 MHz (RCC_OscInitStruct.HSICalibrationValue = 11;). That tweak allowed me to create a timer that ran at 6.4 MHz (51.2 MHz / 8), and I just passed it an array of bytes with the values I wanted.

Here is the meat of how to send using SPI. I didn't bother doing the DMA:

Code: Select all

	/* Infinite loop */
	/* USER CODE BEGIN WHILE */
	while (1) {
		// 1 ws2812b pixel
		uint8_t pixel_data[] = {
				0b11000000, 0b11000000, 0b11000000, 0b11000000, 0b11000000, 0b11000000, 0b11110000, 0b11110000,  // 0x03 G
				0b11000000, 0b11000000, 0b11000000, 0b11000000, 0b11000000, 0b11000000, 0b11110000, 0b11110000,  // 0x03 R
				0b11000000, 0b11000000, 0b11000000, 0b11000000, 0b11000000, 0b11000000, 0b11110000, 0b11110000,  // 0x03 B 
				0,0,0,0,0,0,0,0,  // 128 zero bytes in total: line held low for 160 us at 6.4 MHz
				0,0,0,0,0,0,0,0,
				0,0,0,0,0,0,0,0,
				0,0,0,0,0,0,0,0,
				0,0,0,0,0,0,0,0,
				0,0,0,0,0,0,0,0,
				0,0,0,0,0,0,0,0,
				0,0,0,0,0,0,0,0,
				0,0,0,0,0,0,0,0,
				0,0,0,0,0,0,0,0,
				0,0,0,0,0,0,0,0,
				0,0,0,0,0,0,0,0,
				0,0,0,0,0,0,0,0,
				0,0,0,0,0,0,0,0,
				0,0,0,0,0,0,0,0,
				0,0,0,0,0,0,0,0,
		};
		/* USER CODE END WHILE */

		/* USER CODE BEGIN 3 */
		__HAL_SPI_ENABLE(&hspi1);
		hspi1.Instance->DR = 0X0;
		HAL_TIM_PWM_Start(&htim14, TIM_CHANNEL_1);
		HAL_SPI_Transmit(&hspi1, pixel_data, sizeof(pixel_data), 100);
		HAL_TIM_PWM_Stop(&htim14, TIM_CHANNEL_1);
		hspi1.Instance->DR = 0X0;
		__HAL_SPI_DISABLE(&hspi1);
		HAL_GPIO_WritePin(GPIOA, GPIO_PIN_6, GPIO_PIN_RESET);

		HAL_Delay(10);
	}
	/* USER CODE END 3 */
I just wired the timer output to the SPI_SCK pin and the output comes out on SPI_MISO.
Last edited by Rick Kimball on Sun Jun 18, 2017 9:09 am, edited 2 times in total.
-rick

RogerClark
Posts: 7491
Joined: Mon Apr 27, 2015 10:36 am
Location: Melbourne, Australia

Re: WS2812B (Neopixel) library has been added to the F1 core

Post by RogerClark » Sat Jun 17, 2017 11:58 pm

Thanks Rick

BTW. I wondered how you were going to clock the slave.

I thought perhaps there was some internal routing available, but now see you are doing it with an external link.

fotisl
Posts: 14
Joined: Fri May 19, 2017 8:35 am

Re: WS2812B (Neopixel) library has been added to the F1 core

Post by fotisl » Thu Jun 29, 2017 12:45 pm

I think you may be able to gain some time by using pointers to 32 bit ints and some logic operators. For example:

Code: Select all

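uint32_t *p, *q;  /* p: destination word, q: source word; both must point at valid, 4-byte aligned data */
/* keep the top byte of *p and take the low three bytes from *q */
*p = (*p & 0xff000000) | (*q & 0x00ffffff);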
Depending on the compiler's output, the following may be faster:

Code: Select all

uint32_t *p, *q;
uint32_t mask = 0xff000000;
*p = (*p & mask) | (*q & ~mask);
Further optimization using inline assembly may be even faster.

Please note that the code above is for little endian architectures, for big endian you have to change the masks.

Fotis

lmamakos
Posts: 1
Joined: Sun Feb 21, 2016 4:31 am

Re: WS2812B (Neopixel) library has been added to the F1 core

Post by lmamakos » Wed Jul 26, 2017 8:43 pm

Just a thought to avoid copying the buffers; why not set the pixel in both buffers? It seems like that would be a win for individual LED twiddling rather than copying the entire buffer each time, just in case? It might make sense to have some optimized alternatives for clearing/setting all the LEDs to one value.

RogerClark
Posts: 7491
Joined: Mon Apr 27, 2015 10:36 am
Location: Melbourne, Australia

Re: WS2812B (Neopixel) library has been added to the F1 core

Post by RogerClark » Sun Jul 30, 2017 7:58 am

It depends on how many pixels are changed

Buffer copies are quicker (per pixel) than setting individual pixels in buffers.

The memcpy can run while the DMA is transferring the pixel data, but setting the pixel in both buffers would need to happen in the foreground.

But if not many pixels are changed before they are sent to the LEDs, then yes, setting the pixel into both buffers would be quicker and better
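To make the two strategies concrete, here is a small host-side sketch. All names (set_encoded, show_with_copy, set_encoded_both, show_swap_only) and sizes are hypothetical, the actual DMA start is left out, and 9 encoded bytes per pixel are assumed.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define NUM_PIXELS      4u
#define BYTES_PER_PIXEL 9u
#define BUF_SIZE        (NUM_PIXELS * BYTES_PER_PIXEL)

static uint8_t buf_a[BUF_SIZE];
static uint8_t buf_b[BUF_SIZE];
static uint8_t *draw_buf = buf_a; /* edited by the foreground code */
static uint8_t *send_buf = buf_b; /* read by the SPI DMA */

static void swap_buffers(void)
{
    uint8_t *tmp = send_buf;
    send_buf = draw_buf; /* a real driver would start the async SPI DMA on send_buf here */
    draw_buf = tmp;
}

/* Strategy A: edit only the draw buffer; show() swaps, then copies everything. */
static void set_encoded(size_t pixel, const uint8_t *enc)
{
    memcpy(&draw_buf[pixel * BYTES_PER_PIXEL], enc, BYTES_PER_PIXEL);
}

static void show_with_copy(void)
{
    swap_buffers();
    memcpy(draw_buf, send_buf, BUF_SIZE); /* bring the new draw buffer up to date */
}

/* Strategy B: write every change into both buffers so show() only swaps.
 * Assumes the previous transfer has finished, otherwise the write races the DMA. */
static void set_encoded_both(size_t pixel, const uint8_t *enc)
{
    memcpy(&buf_a[pixel * BYTES_PER_PIXEL], enc, BYTES_PER_PIXEL);
    memcpy(&buf_b[pixel * BYTES_PER_PIXEL], enc, BYTES_PER_PIXEL);
}

static void show_swap_only(void)
{
    swap_buffers();
}
```

Strategy A costs one full-buffer copy per frame regardless of how much changed; strategy B costs one extra small copy per changed pixel, so it wins only when few pixels change between frames.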


Overall, there doesn't seem to be an optimal way to do this with the STM32F103 as it does not have dedicated hardware that can generate the pulsetrain without a lot of extra processing of data and larger buffers.


For small LED strips and small numbers of pixels changed per output, bit banging is probably more efficient.

But for longer strips or arrays etc., DMA becomes more efficient

victor_pv
Posts: 1750
Joined: Mon Apr 27, 2015 12:12 pm

Re: WS2812B (Neopixel) library has been added to the F1 core

Post by victor_pv » Sun Jul 30, 2017 1:00 pm

Roger, I replied about adding the callback feature to your async function somewhere, I think github.
I don't see a problem doing it.
We can use a similar method to what I did with dmaSend:
If *callback is NULL, execute as it does today.
If *callback is set, then skip the check, since you are supposed to have used the callback to determine when the previous transfer is over.
I'll have a look at the code and see if I can write it and test it with something.
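The callback rule described above could be sketched roughly as follows. This is a host-side simulation, not the core's SPI code: send_async, dma_complete_isr, transfer_busy and on_frame_done are all hypothetical names, and the real DMA start is replaced by a comment.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

typedef void (*xfer_done_cb)(void);

static volatile bool transfer_busy = false;
static xfer_done_cb done_cb = NULL;

/* Start an asynchronous send. With no callback, wait for the previous
 * transfer first (the existing behaviour). With a callback, skip that
 * check: the caller is expected to wait for the callback before sending again. */
static void send_async(const uint8_t *data, size_t len, xfer_done_cb cb)
{
    if (cb == NULL) {
        while (transfer_busy) {
            /* block until the previous transfer has completed */
        }
    }
    done_cb = cb;
    transfer_busy = true;
    (void)data; /* a real driver would program and start the SPI DMA here */
    (void)len;
}

/* Would be called from the DMA transfer-complete interrupt. */
static void dma_complete_isr(void)
{
    transfer_busy = false;
    if (done_cb != NULL) {
        done_cb();
    }
}

/* Example callback: count finished frames. */
static int frames_done = 0;
static void on_frame_done(void)
{
    frames_done++;
}
```

A caller using the callback form would typically set a "frame sent" flag in the callback and only call send_async again once that flag is set.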

Regarding the buffer copy, not sure if you thought of it already, but you could use a second DMA channel to do that, without any MCU usage.
Since the DMA is super fast for that, you could:
Set initial buffer.
On show() set DMA copy from buffer 1 to buffer 2 (do not wait for it, keep running), then set SPI DMA async transfer.
As show() returns, you can start working in buffer 2 from the bottom up. By this time the DMA memcpy has already had time to copy a bunch, and it goes up the buffer faster than the MCU, so by the time you get to update a byte it is very likely the transfer has completed. If you want, you can check and confirm the DMA copy has finished before you edit the buffer.

This way the buffer copy is going in the background during the time you are setting the SPI async, returning calls, etc, and not taking CPU cycles right when you are going to update. If you use 32bit aligned buffers, you can set the DMA to 32bit mode, so only 2 RAM accesses per 4 bytes copied.

RogerClark
Posts: 7491
Joined: Mon Apr 27, 2015 10:36 am
Location: Melbourne, Australia

Re: WS2812B (Neopixel) library has been added to the F1 core

Post by RogerClark » Sun Jul 30, 2017 9:25 pm

Victor,

I thought that @racemaniac found that memory-to-memory DMA impacted SPI DMA performance, and for these LED strings the SPI must run at a constant rate with no breaks, otherwise it screws up the timing of the protocol

If the memcpy DMA can be set at a lower priority than the SPI DMA so that it does not interfere, then I would definitely use it
