STM32L4 Core

GrumpyOldPizza
Posts: 184
Joined: Fri Apr 15, 2016 4:15 pm
Location: Denver, CO

Post by GrumpyOldPizza » Tue May 10, 2016 11:49 pm

Just put the first alpha of the STM32L4 core onto GitHub. Perhaps somebody is interested in reviewing, installing, commenting, or ... actually using it.

The target hardware is this here: https://www.tindie.com/products/onehors ... out-board/

The code is here: https://github.com/GrumpyOldPizza/arduino-STM32L4

OK, what is special about it?

The concept was to come up with an Arduino Zero compatible pinout and software layer. There is a 16MB NOR FLASH on the board which is accessible from a sketch as a normal local file system, while at the same time it shows up on the host via USB/MSC. Nice for logging, to say the least. Overall, low power was the main motivator.

Something that always bothered me about Arduino is the idea of blocking, synchronous io. Most of the things I deal with really shine with asynchronous io. So this port added consistently asynchronous io. For I2C/Wire it looks like this here:

Code: Select all

    bool transfer(uint8_t address, const uint8_t *txBuffer, size_t txSize, uint8_t *rxBuffer, size_t rxSize, bool stopBit, void(*callback)(uint8_t));
    bool done(void);
    uint8_t status(void);
So either you can get a callback, or periodically check whether the last transfer is done ... A good example for the use is this here: https://github.com/kriswiner/Dragonfly/ ... 0Optimized
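
To make the two usage styles concrete, here is a minimal host-side sketch. It is not the core's implementation: MockWire, simulateCompletion(), onTransferDone() and the two helper functions are hypothetical names made up for illustration, and only the three method signatures are taken from the post. The "one transfer at a time" behaviour and "status 0 means success" are assumptions of the mock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical host-side stand-in for the core's Wire object. Only the three
// method signatures are taken from the post; every body below is invented.
class MockWire {
public:
    bool transfer(uint8_t address, const uint8_t *txBuffer, size_t txSize,
                  uint8_t *rxBuffer, size_t rxSize, bool stopBit,
                  void (*callback)(uint8_t)) {
        assert(rxBuffer != nullptr || rxSize == 0);
        (void)address; (void)txBuffer; (void)txSize; (void)stopBit;
        if (busy_) return false;   // assumption: one transfer in flight at a time
        rxBuffer_ = rxBuffer;
        rxSize_ = rxSize;
        callback_ = callback;
        busy_ = true;
        return true;               // the transfer was started, not finished
    }
    bool done(void) { return !busy_; }
    uint8_t status(void) { return status_; }

    // Stands in for the interrupt/DMA completion that real hardware would raise.
    void simulateCompletion(uint8_t fill) {
        if (!busy_) return;
        if (rxSize_ > 0) std::memset(rxBuffer_, fill, rxSize_);
        status_ = 0;               // "0 means success" is a convention of this mock only
        busy_ = false;
        if (callback_) callback_(status_);
    }

private:
    uint8_t *rxBuffer_ = nullptr;
    size_t rxSize_ = 0;
    void (*callback_)(uint8_t) = nullptr;
    bool busy_ = false;
    uint8_t status_ = 0;
};

static volatile bool g_transferFinished = false;
static void onTransferDone(uint8_t status) { (void)status; g_transferFinished = true; }

// Style 1: start a register read and let the callback report completion.
bool startReadWithCallback(MockWire &wire, uint8_t address, const uint8_t *reg,
                           uint8_t *out, size_t count) {
    g_transferFinished = false;
    return wire.transfer(address, reg, 1, out, count, true, onTransferDone);
}

// Style 2: start the same read without a callback; the caller polls done().
bool startReadPolled(MockWire &wire, uint8_t address, const uint8_t *reg,
                     uint8_t *out, size_t count) {
    return wire.transfer(address, reg, 1, out, count, true, nullptr);
}
```

In either style the buffers presumably have to stay valid until the transfer completes, so they should not live on the stack of a function that returns right after starting the transfer.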


A couple of more details (edited as I completely forgot in the original post):
  • Full set of CDC, Uart, SPI & Wire classes supported
  • Most communication peripherals automatically use DMA whenever possible
    • Serial1 uses DMA on RX & TX, Serial2 uses DMA on RX
    • SPI, SPI1 and SPI2 use DMA on RX & TX
    • Wire uses DMA on RX
  • All communication devices are non-blocking internally. Any Arduino API that waits for a device (say Wire.requestFrom()) puts the MCU into a sleep mode till the pending communication is completed
  • All communication devices can be used from either within setup()/loop() or interrupt handlers or callbacks
  • Peripherals are internally aggressively clock gated. If for example there is no active communication on an I2C bus, the corresponding peripheral is powered down
  • analogWriteFrequency() and analogWriteRange() have been added to control the PWM frequency and range of a timer block, so that high precision servo or motor/ESC control can be implemented
  • delayMicroseconds() uses the internal CPU Cycle Counter, which is not affected by interrupts
  • USB/MSC accesses and local file system accesses from a sketch can be done in parallel (no explicit locking required)
  • There is a power failsafe FAT file system embedded
  • The internal DAC is available on A0/A1
Ah well, again this is an early alpha, and some bits are still pending (Servo and RTC, to be specific).
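
As a rough illustration of why a configurable PWM frequency and range matter for servos, the arithmetic can be sketched as below. This is plain host-side math, not code from the core: pulseToDuty() is a hypothetical helper, and it assumes that the value passed to analogWrite() is interpreted as a fraction of the configured range. The exact signatures of analogWriteFrequency() and analogWriteRange() are not shown in the post, so they are deliberately not called here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert a pulse width in microseconds into a duty value,
// assuming duty/range is the fraction of the PWM period spent high.
uint32_t pulseToDuty(uint32_t pulseUs, uint32_t pwmHz, uint32_t range) {
    assert(pwmHz > 0);
    // duty = range * pulseUs / periodUs, with periodUs = 1e6 / pwmHz.
    // Rearranged to keep everything in integers, rounded to nearest.
    const uint64_t scaled = static_cast<uint64_t>(range) * pulseUs * pwmHz;
    const uint64_t duty = (scaled + 500000u) / 1000000u;
    // A pulse longer than the period is clamped to 100% duty.
    return duty > range ? range : static_cast<uint32_t>(duty);
}
```

At 50 Hz with a range of 255, the usual 1000 to 2000 µs servo span covers only duty values 13 to 26. With a range of 20000 the same span gets one step per microsecond.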
Last edited by GrumpyOldPizza on Wed May 11, 2016 1:32 pm, edited 1 time in total.

WereCatf
Posts: 166
Joined: Sat Apr 16, 2016 5:30 pm

Post by WereCatf » Wed May 11, 2016 12:15 am

GrumpyOldPizza wrote: The target hardware is this here: https://www.tindie.com/products/onehors ... out-board/
Eyyy, that looks like an interesting design! Including the 16MB Flash there is a great idea. Too bad most of the things I wanna do involve WiFi.
GrumpyOldPizza wrote: Something that always bothered me about Arduino is the idea of blocking, synchronous io. Most of the things I deal with really shine with asynchronous io. So this port added consistently asynchronous io. For I2C/Wire it looks like this here:
This is something I'd like to see with the other cores, too, to be honest.

RogerClark
Posts: 7440
Joined: Mon Apr 27, 2015 10:36 am
Location: Melbourne, Australia

Post by RogerClark » Wed May 11, 2016 12:30 am

Interesting

Where are the CMSIS / Drivers files from? The STM32 Cube?

Did you write all of the mid layer code to interface with the CMSIS?

There is a parallel project that @sheepdoll and @vassilis are working on which uses the STM32 Cube to build a new "core" on top of the HAL.

This core would support the STM32L4 (if they exported the Cube files for that target).

But at the moment it only has a USB CDC Serial device, and it will operate like libmaple, using the bootloader to upload via DFU before it switches to the Serial USB CDC device built into the sketch.

Post by GrumpyOldPizza » Wed May 11, 2016 12:42 am

RogerClark wrote: Interesting

Where are the CMSIS / Drivers files from? The STM32 Cube?

Did you write all of the mid layer code to interface with the CMSIS?

There is a parallel project that @sheepdoll and @vassilis are working on which uses the STM32 Cube to build a new "core" on top of the HAL.

This core would support the STM32L4 (if they exported the Cube files for that target).
The CMSIS files ... Either from the ARM repository or CubeL4.

And yes the middle layer (minus USB) was written directly on top of CMSIS.

And yes, I am aware of the efforts to use the ST HAL to put a core on top.

As with many other implementations the USB/CDC code resets the device into DFU mode, which is used to flash the device. The sequence is the traditional Arduino one though. So all in all quite similar.

Ah, there is SWD as well, which was used extensively during development.
Last edited by GrumpyOldPizza on Wed May 11, 2016 3:46 am, edited 1 time in total.

Post by RogerClark » Wed May 11, 2016 1:04 am

GrumpyOldPizza wrote: And yes, the USB/CDC code resets the device into DFU mode, which is used to flash the device. The sequence is the traditional Arduino one though. Ah, there is SWD as well, which was used extensively during development.
I thought you were using "mass storage" mode to upload?

Post by GrumpyOldPizza » Wed May 11, 2016 3:41 am

RogerClark wrote:
GrumpyOldPizza wrote: And yes, the USB/CDC code resets the device into DFU mode, which is used to flash the device. The sequence is the traditional Arduino one though. Ah, there is SWD as well, which was used extensively during development.
I thought you were using "mass storage" mode to upload?
Ah, no. "mass storage" is there to allow the host direct access to the NOR FLASH, as opposed to what ESP8266 does with their SPIFFS.

But yes, the idea of allowing a "flash.bin" on the NOR FLASH that gets flashed into the microcontroller on reset has been raised before, but it's a tad more complicated to do. Guess this might be added after the code has had a bit more soak time.

So right now the code really makes use of the built-in DFU bootloader, which integrates nicely with the standard way the Arduino IDE wants to do things.

Post by RogerClark » Wed May 11, 2016 5:42 am

OK

I'm not sure where I got the idea you were using mass storage, I think it must have been another project ;-)

I think the new BBC Micro Bit uses that method, but it uses a completely different processor.
And I think it uses a separate processor for upload to the main processor

It would be possible to write a bootloader that enumerated as a Mass storage device, which wrote straight into flash, and then jumped to the sketch start address, and I guess this removes the need for a DFU driver.

It's a shame there isn't a default Serial CDC VID/PID that can be used, as most systems have this sort of driver as part of the OS (that's what libmaple uses, but it has to enumerate as a LeafLabs device, which means we have to fool Windows into loading the correct / built-in driver).

Post by GrumpyOldPizza » Wed May 11, 2016 11:44 am

RogerClark wrote: OK

I'm not sure where I got the idea you were using mass storage, I think it must have been another project ;-)

I think the new BBC Micro Bit uses that method, but it uses a completely different processor.
And I think it uses a separate processor for upload to the main processor

It would be possible to write a bootloader that enumerated as a Mass storage device, which wrote straight into flash, and then jumped to the sketch start address, and I guess this removes the need for a DFU driver.
You are thinking of MBED. But they do have a secondary debug chip, that handles STLINK/CMSIS-DAP as well as this kind of virtual USB/MSC. It's also not that tricky to do with a single MCU setup, but it requires a rather stable USB stack and a reserved bootloader area.

IMHO it's probably better to allow a SFLASH or SDCARD to have a flash update stored, so that the boot process can recover from a reset during the flash process.
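
The recovery idea can be sketched as a small host-side simulation. Everything here is hypothetical (Device, StagedUpdate, bootOnce() and the CRC-based validity check are illustration only, nothing like this exists in the core yet): the update stays on external storage until the internal flash has been verified, so a reset in the middle of programming simply causes a retry on the next boot.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32 (reflected, polynomial 0xEDB88320), used here to validate images.
uint32_t crc32(const std::vector<uint8_t> &data) {
    uint32_t crc = 0xFFFFFFFFu;
    for (uint8_t byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Hypothetical model: "staged" lives on SFLASH/SDCARD, "internalFlash" is the MCU flash.
struct StagedUpdate {
    bool present = false;
    std::vector<uint8_t> image;
    uint32_t expectedCrc = 0;
};

struct Device {
    std::vector<uint8_t> internalFlash;
    StagedUpdate staged;
};

enum class BootResult { RunExisting, Updated, Interrupted };

// One pass through the boot process. maxBytes simulates a reset part-way
// through programming: only that many bytes get written before "power is lost".
BootResult bootOnce(Device &dev, size_t maxBytes) {
    const StagedUpdate &staged = dev.staged;
    // A missing or corrupt staged image is ignored; nothing unverified is flashed.
    if (!staged.present || crc32(staged.image) != staged.expectedCrc) {
        return BootResult::RunExisting;
    }
    // Internal flash already carries the staged image: nothing to do.
    if (crc32(dev.internalFlash) == staged.expectedCrc) {
        return BootResult::RunExisting;
    }
    // Erase, then program. The staged copy is never modified, so it survives a reset.
    dev.internalFlash.assign(staged.image.size(), 0xFF);
    const size_t count = std::min(maxBytes, staged.image.size());
    std::copy(staged.image.begin(),
              staged.image.begin() + static_cast<std::ptrdiff_t>(count),
              dev.internalFlash.begin());
    return count < staged.image.size() ? BootResult::Interrupted : BootResult::Updated;
}
```

Calling bootOnce() again after an Interrupted result finds a CRC mismatch on the internal flash and reprograms it from the untouched staged copy.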
RogerClark wrote: It's a shame there isn't a default Serial CDC VID/PID that can be used, as most systems have this sort of driver as part of the OS (that's what libmaple uses, but it has to enumerate as a LeafLabs device, which means we have to fool Windows into loading the correct / built-in driver).
Yes and no. Everything is good and fine with Linux, just Windows is such a PITA. For the Arduino IDE you want to have a different VID/PID so that the user can select the right "COM" port. But having to use Zadig/wdi-simple to force an .inf update that just uses an already built-in driver is just a tad silly. To work around those issues I am almost tempted to add a USB/DFU class driver to the CDC/MSC composite device with WCID support, so that for Windows 10 all of this would be unnecessary.

But in the grand scheme of things, there are other priorities for this STM32L4 core. Next up is a USB/HID based gdbstub, and then some reasonable RTOS support.

Rick Kimball
Posts: 1056
Joined: Tue Apr 28, 2015 1:26 am
Location: Eastern NC, US

Post by Rick Kimball » Wed May 11, 2016 5:30 pm

GrumpyOldPizza wrote: Just put the first alpha of the STM32L4 core onto GitHub. Perhaps somebody is interested in reviewing, installing, commenting, or ... actually using it.
...
The code is here: https://github.com/GrumpyOldPizza/arduino-STM32L4
I just gave this a spin on Arduino 1.6.9 (compiled from GitHub source) on a Linux box (Ubuntu 15.10) using the Arduino arm-none-eabi-gcc (4.8.3) and it does seem to compile. I don't have a real device, so I just poked around at the generated .elf file. There are a few warnings on the compile but it successfully links.

I did notice that the generated binary is rather large, about 45k. This was using the ASCIITable example. On a generic f103cx board it is only about 14k for the maple version. Granted, it probably doesn't matter with a board that has 512K of flash. I noticed that you turned on the -mslow-flash-data option. I thought those F4 chips had that ART flash accelerator that effectively makes flash zero wait state? Is there any advantage to using movw/movt to load constants compared with the typical PC-relative literal loads?

Looking good :)
-rick

Post by GrumpyOldPizza » Wed May 11, 2016 6:05 pm

Rick Kimball wrote: I just gave this a spin on Arduino 1.6.9 (compiled from GitHub source) on a Linux box (Ubuntu 15.10) using the Arduino arm-none-eabi-gcc (4.8.3) and it does seem to compile. I don't have a real device, so I just poked around at the generated .elf file. There are a few warnings on the compile but it successfully links.
Thanx for checking. Yes, I need to get some of the warnings figured out. The system code should be fairly clean. I moved late from gcc 4.9 to 4.8.3, so some things need cleanup. Did you see anything worrying?
Rick Kimball wrote: I did notice that the generated binary is rather large, about 45k. This was using the ASCIITable example. On a generic f103cx board it is only about 14k for the maple version. Granted, it probably doesn't matter with a board that has 512K of flash.
I had not checked into the size. There are a couple of reasons. One is that the image always carries the full USB stack, including USB/MSC and the FAT backend along with it. This will get even worse when a gdbstub and an RTOS are added. But as you pointed out, it should not matter that much with 512k. An alternate solution is to keep some kind of boot image permanently in the flash that contains all of those functions, but does not get reflashed every time.
Rick Kimball wrote: I noticed that you turned on the -mslow-flash-data option. I thought those F4 chips had that ART flash accelerator that effectively makes flash zero wait state? Is there any advantage to using movw/movt to load constants compared with the typical PC-relative literal loads?
Ah, yes, the effects of marketing ;-) ART is really just a code/data cache. On STM32L4 it's 1024/256 bytes. Running at 80 MHz, a FLASH access that is not in the cache gets hit with 4 wait states. ART has a prefetcher that sucks up a lot of power, but does not seem to be terribly efficient. Without the "-mslow-flash-data" option, gcc places all literals at the end of a function (or sometimes a group of functions), so most of those literal fetches will miss in the data cache part. The prefetcher does not prefetch data, only instructions, so for every 64 bits of literal data you take the 4-cycle hit. If, on the other hand, the compiler emits movw/movt, the literals are part of the instruction stream and can be efficiently prefetched. I wish I had written down numbers when I did the analysis.
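
For readers not familiar with the two ways of loading a 32-bit constant, the difference can be modelled in a few lines. This is only a model of what the two instructions do to a register, not compiler output: the function names are made up for illustration, and the assembly in the comments is a generic example.

```cpp
#include <cassert>
#include <cstdint>

// Literal pool style (what gcc emits without -mslow-flash-data):
//     ldr  r0, [pc, #offset]   ; data read from a constant stored after the function
//
// movw/movt style (with -mslow-flash-data):
//     movw r0, #0x5678         ; low 16 bits, upper half cleared
//     movt r0, #0x1234         ; upper 16 bits, lower half kept
// Both immediates are encoded in the instructions themselves, so they arrive
// with the instruction stream rather than through a separate data access.

// Model of MOVW: write a 16-bit immediate, zero-extended, into the register.
constexpr uint32_t movw(uint16_t imm16) {
    return imm16;
}

// Model of MOVT: replace the top halfword, leave the bottom halfword unchanged.
constexpr uint32_t movt(uint32_t reg, uint16_t imm16) {
    return (reg & 0x0000FFFFu) | (static_cast<uint32_t>(imm16) << 16);
}

// Build an arbitrary 32-bit constant from the two halves, as the pair of
// instructions above would.
constexpr uint32_t loadConstant(uint32_t value) {
    return movt(movw(static_cast<uint16_t>(value & 0xFFFFu)),
                static_cast<uint16_t>(value >> 16));
}
```

The pair costs two 32-bit Thumb-2 instructions per constant, which is one of the trade-offs against the single load plus a 4-byte literal.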

N.b. STM32F4 is different. STM32F446 for example has the same cache sizes, but 128-bit wide accesses, and at 80 MHz only 2 wait states (at 180 MHz still 5). So the perf pattern there is different.
