F4 - Update boards.txt

Pito
Posts: 1593
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: F4 - Update boards.txt

Post by Pito » Tue Jul 04, 2017 8:48 am

A long time ago I played with eLua and, with help from the community experts, managed to make the full 192kB available to eLua on an F4. It was done with an "allocator=multiple" directive and maybe two lines in a script defining the memory setup, and it worked. Not being a talented programmer, I'll have to dig through old topics to find out how it worked.

PS: http://elua-development.2368040.n2.nabb ... 82063.html
PPS: eLua uses scons instead of make. From the eLua build documentation:
allocator = newlib | multiple | simple: choose between the default newlib allocator (newlib) which is an older version of dlmalloc, the multiple memory spaces allocator (multiple) which is a newer version of dlmalloc that can handle multiple memory spaces, and a very simple memory allocator (simple) that is slow and doesn’t handle fragmentation very well, but it requires very few resources (Flash/RAM). You should use the multiple allocator only if you need to support multiple memory spaces (for example boards that have external RAM). You should use simple only on very resource-constrained systems.
So mastering multiple fragmented SRAM spaces would be a great achievement here. Consider that the upcoming STM32H7 has maybe six (update: 128+64+512+288+64+4 kB) scattered internal SRAM spaces, plus an external SRAM/SDRAM space as well :)
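To make the "multiple memory spaces" idea concrete, here is a minimal sketch in C of an allocator that serves requests from several separate regions, trying each in turn. It is not eLua's dlmalloc-based allocator, only a bump allocator with no free(). The names `mem_region_t` and `multi_alloc`, and the two static arrays standing in for SRAM banks, are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One contiguous memory space (hypothetical type, for illustration only). */
typedef struct {
    uint8_t *base;   /* start of the region */
    size_t   size;   /* total bytes in the region */
    size_t   used;   /* bytes handed out so far */
} mem_region_t;

/* Stand-ins for two separate SRAM banks. On real hardware these would be
 * fixed address ranges taken from the linker script or the datasheet. */
static _Alignas(8) uint8_t bank_a[64];
static _Alignas(8) uint8_t bank_b[128];

static mem_region_t regions[] = {
    { bank_a, sizeof bank_a, 0 },
    { bank_b, sizeof bank_b, 0 },
};

#define REGION_COUNT (sizeof regions / sizeof regions[0])

/* Bump-allocate `bytes` from the first region that still has room.
 * Returns NULL when no single region can satisfy the request.
 * Sizes are rounded up to a multiple of 8 so blocks stay 8-byte aligned. */
void *multi_alloc(size_t bytes)
{
    size_t need = (bytes + 7u) & ~(size_t)7u;
    if (need < bytes) {
        return NULL;                 /* size overflowed while rounding */
    }
    for (size_t i = 0; i < REGION_COUNT; i++) {
        mem_region_t *r = &regions[i];
        if (r->size - r->used >= need) {
            void *p = r->base + r->used;
            r->used += need;
            return p;
        }
    }
    return NULL;
}
```

A request that does not fit in the first bank simply falls through to the next one, which is the behaviour a single contiguous heap cannot give you. A real allocator such as dlmalloc additionally tracks freed blocks and handles fragmentation.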
Pukao Hats Cleaning Services Ltd.

victor_pv
Posts: 1681
Joined: Mon Apr 27, 2015 12:12 pm

Post by victor_pv » Tue Jul 04, 2017 3:45 pm

I know how to place the stack, the heap, or anything else we want in CCM RAM. I do not know how to make the linker place things in one region or the other of its own accord. I checked that thread, but it seems the allocator is a function, not something in the linker script, is that right?

Forcing the stack and heap there is not difficult. The problematic part is that if user code creates a buffer at runtime that ends up on the stack or heap and then tries to use it for DMA, it will crash.
Other than that, CCM is great for those uses, and it leaves the main chunk of RAM for user global variables, buffers, etc. But do we want that risk?
Perhaps we can use a compile option to choose whether the stack and heap, or whatever else, go in CCM, so that if we know we need to do DMA on a local variable or one allocated with malloc, we select the board option that avoids it?

Pito
Posts: 1593
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Post by Pito » Tue Jul 04, 2017 4:26 pm

but seems that allocator is a function, not something in the linker script, is that right?
eLua is built with scons, and allocator=multiple is a scons parameter; based on it, the build includes dlmalloc.c (a 130kB source file). The compiler itself was CodeSourcery. Frankly, I don't know the details, as the eLua build is a fairly complex exercise.
I think the decision on DMA-accessible variables has to be left to the user (via some attributes), as creating a build system that considers all the MCU-related nuances would be quite an effort.
Pukao Hats Cleaning Services Ltd.

testato
Posts: 39
Joined: Sun Aug 14, 2016 7:44 am

Post by testato » Tue Jul 04, 2017 4:49 pm

I think that for now it is good to exclude the CCM RAM from the summary printed at the end of compilation, so the user knows how much RAM the current version of the core can actually manage.

If use of the CCM RAM is implemented in the future, I think it would be better to report it separately, via boards.txt as well, and not simply increase the value displayed. For example, the output should be:

Code: Select all

Sketch uses 21,548 bytes (4%) of program storage space. Maximum is 514,288 bytes.
Global variables use 12,064 bytes (9%) of dynamic memory
CCM Ram use xxx bytes. Maximum is xxx bytes
This is the new PR
https://github.com/rogerclarkmelbourne/ ... 2/pull/303

victor_pv
Posts: 1681
Joined: Mon Apr 27, 2015 12:12 pm

Post by victor_pv » Sat Jul 29, 2017 12:26 pm

Adding to the CCM discussion in this thread:
In F4 MCUs we have 64KB of CCM memory.
There are two restrictions on using CCM:
  • CCM can only hold data; no code can be run from it.
  • DMA controllers cannot access it.
The advantage is that CCM is used exclusively by the CPU over a separate bus, so it doesn't share bandwidth with any other memory or peripheral.
In theory you could have fast DMA going on in normal RAM while the CPU runs from flash using data in CCM, with no penalty to CPU or DMA performance.

With that in mind, there are 3 possible uses for CCM:
  • Heap
  • Stack
  • Normal user variables.
As long as we don't use variables allocated in those blocks for DMA, all is good.
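The DMA restriction comes down to an address-range test. As a minimal sketch, assuming the STM32F405/407 memory map where the 64KB of CCM starts at 0x10000000 (check the reference manual for other parts), a buffer can be checked like this. The function name `overlaps_ccm` is hypothetical, and it takes a plain integer address so it can also be tried on a PC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CCM data RAM on STM32F405/407-class parts: 64 KB starting at 0x10000000.
 * Assumed values; verify against the reference manual for the exact part. */
#define CCM_BASE 0x10000000u
#define CCM_SIZE (64u * 1024u)

/* Returns true if any byte of [addr, addr + len) lies inside CCM,
 * i.e. if the buffer must not be handed to a DMA controller. */
bool overlaps_ccm(uintptr_t addr, size_t len)
{
    if (len == 0) {
        return false;                       /* empty buffer touches nothing */
    }
    uintptr_t last = addr + (len - 1);      /* address of the last byte */
    if (last < addr) {
        last = UINTPTR_MAX;                 /* clamp on address wrap-around */
    }
    return addr < (CCM_BASE + CCM_SIZE) && last >= CCM_BASE;
}
```

On the target, a driver would call it as `overlaps_ccm((uintptr_t)buf, len)` before starting a transfer.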
In the past, as a proof of concept, I modified Steve's F4 USB code to allocate its buffers in CCM. It was not much trouble, and it gives the option to use large buffers without taking from normal RAM.
I also moved the heap and stack to CCM, and that gave a small speed gain in one of the CPU benchmarks.
I have not pushed CPU+DMA the way racemaniac did, but it should allow more concurrent operations with no penalty.

Now, allocating all the normal data to CCM is very risky, because if a user allocates a buffer for DMA use there, the code will crash.
The heap and stack can hold variables too, so moving them is still somewhat risky, but most people using DMA will use a globally allocated buffer (not always, though).

With all those conditions in mind, I have been thinking that a good compromise for using CCM without causing much pain would be to offer it as a board option. Just as selecting between ST-Link upload and bootloader upload makes the build use a different linker script with different addresses, we could add an option that uses a script placing the stack and heap in CCM.
We can also add a #define in the core, similar to how __FLASH__ is defined to let the user force a variable into flash (that's not used often, since the linker will place read-only data in flash anyway, but it's in the core):

Code: Select all

#define __attr_flash __attribute__((section (".USER_FLASH")))
#define __FLASH__ __attr_flash
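A CCM counterpart could follow the same pattern. The sketch below assumes the linker script defines an output section named ".ccmram" located in CCM; that section name and the macros `__attr_ccm` and `__CCM__` are hypothetical and do not exist in the core today.

```c
#include <stdint.h>

/* Hypothetical CCM counterpart of __FLASH__. Assumes the linker script
 * provides an output section named ".ccmram" mapped to CCM. */
#define __attr_ccm __attribute__((section(".ccmram")))
#define __CCM__ __attr_ccm

/* A large buffer that only the CPU works on; it must never be given to DMA. */
__CCM__ static uint8_t scratch[4096];

/* Fills the buffer with a byte ramp starting at `seed` and returns the
 * sum of all bytes written. CPU access only, so CCM is a safe home. */
uint32_t fill_scratch(uint8_t seed)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < sizeof scratch; i++) {
        scratch[i] = (uint8_t)(seed + i);
        sum += scratch[i];
    }
    return sum;
}
```

One thing to keep in mind with this approach: startup code usually only initialises the standard data and bss sections, so variables in an extra section like this would probably need either explicit initialisation in code or matching changes to the startup file.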
I'm not sure anyone is using all the RAM in an F4 MCU and so needs the extra memory from CCM, but the speed gains may be more interesting to some.

What are everyone's thoughts on this?
Should we add a menu option to allocate the heap and stack in CCM?
Or should we add defines and linker script sections so data can be placed in CCM with an attribute?
Or does no one have an interest in using CCM at the moment?

stevestrong
Posts: 1746
Joined: Mon Oct 19, 2015 12:06 am
Location: Munich, Germany

Post by stevestrong » Sat Jul 29, 2017 3:02 pm

I would welcome heap and stack in CCM.
I would make it default.

Using large buffers for DMA on the stack does not make much sense to me. I don't know of any application doing this, and it seems a sub-optimal practice.

One special case would be writing the same data repeatedly with DMA (with no increment of the source pointer), but for that we could adapt the SPI DMA functions to copy the input data at the passed buffer pointer[0] into a global buffer (in normal RAM).

victor_pv
Posts: 1681
Joined: Mon Apr 27, 2015 12:12 pm

Post by victor_pv » Sat Jul 29, 2017 4:10 pm

That's a good option, and hopefully it will not cost many cycles: something that compares the pointer address to the CCM range and, if it matches, does the copy.
Or perhaps add an assertion that fails at compile time if a CCM address is used?
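A rough sketch of that check-then-copy idea for the non-incrementing source case could look like the following. This is not the core's SPI DMA code: `dma_fixed_source` and `bounce_byte` are hypothetical names, the CCM bounds are the assumed STM32F405/407 values, and they are kept as variables only so the logic can be tried on a PC.

```c
#include <stdint.h>

/* Assumed CCM range on STM32F405/407-class parts (64 KB at 0x10000000).
 * Variables rather than constants so a host test can redirect them. */
static uintptr_t ccm_start = 0x10000000u;
static uintptr_t ccm_end   = 0x10010000u;   /* one past the last CCM byte */

/* An ordinary global, so it lands in normal DMA-reachable RAM as long as
 * the linker script keeps the default data sections out of CCM. */
static volatile uint8_t bounce_byte;

/* For a "send the same byte N times" transfer: returns the address the DMA
 * controller should read from. If the caller's byte sits in CCM, it is first
 * copied by the CPU into bounce_byte and that address is returned instead. */
const volatile uint8_t *dma_fixed_source(const uint8_t *src)
{
    uintptr_t addr = (uintptr_t)src;
    if (addr >= ccm_start && addr < ccm_end) {
        bounce_byte = *src;        /* the CPU can read CCM, DMA cannot */
        return &bounce_byte;
    }
    return src;                    /* already reachable by DMA */
}
```

The cost on the normal path is two comparisons. As for the assertion, the pointer value is generally only known at run time, so a compile-time check would miss most cases; a runtime assert on the same range test is probably the closest equivalent.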

RogerClark
Posts: 7156
Joined: Mon Apr 27, 2015 10:36 am
Location: Melbourne, Australia

Post by RogerClark » Sat Jul 29, 2017 9:59 pm

Slightly off topic, but judging from @racemaniac's DMA speed tests, I don't think this RAM being on a separate bus would make much difference to performance unless you are doing specific things, e.g. memory-to-memory DMA at the same time as DMA to SPI.

victor_pv
Posts: 1681
Joined: Mon Apr 27, 2015 12:12 pm

Post by victor_pv » Sun Jul 30, 2017 3:03 am

It all depends. For some uses there may be a performance improvement; for others it may just be nice to have the extra 64KB.
What do you think about Steve's suggestion to move the heap and stack to CCM?

RogerClark
Posts: 7156
Joined: Mon Apr 27, 2015 10:36 am
Location: Melbourne, Australia

Post by RogerClark » Sun Jul 30, 2017 3:39 am

What benefit is there in moving both the stack and the heap to CCM?

Surely it would be better to move only one of them, either stack or heap, so that both get more space.
