Compiling with Link Time Optimization enabled

Post here first, or if you can't find a relevant section!
grafalex
Posts: 9
Joined: Tue May 16, 2017 4:11 pm

Compiling with Link Time Optimization enabled

Postby grafalex » Tue May 16, 2017 4:38 pm

.... moving discussion here

Hi everyone,

I tried to compile stm32duino with -flto (Link Time Optimization) enabled and got number of warnings like this (same for other timers, and adc):

Code: Select all

D:\GRAFAL~1\Hobbies\Prog\MYPROJ~1\GPSLOG~3\Libs\STM32D~1\system\libmaple\include/libmaple/timer.h:138:18: warning: type of 'timer1' does not match original declaration
 extern timer_dev timer1;
                  ^
D:\Graf Alex\Hobbies\Prog\MyProjects\GPSLogger3\Libs\STM32duino\cores\maple\libmaple\timer.c:53:11: note: previously declared here
 timer_dev timer1 = ADVANCED_TIMER(1);


The code is

Header

Code: Select all

/** Timer device type */
typedef struct timer_dev {
    timer_reg_map regs;         /**< Register map */
    rcc_clk_id clk_id;          /**< RCC clock information */
    timer_type type;            /**< Timer's type */
    voidFuncPtr handlers[];     /**<
                                 * Don't touch these. Use these instead:
                                 * @see timer_attach_interrupt()
                                 * @see timer_detach_interrupt() */
} timer_dev;


Source:

Code: Select all

timer_dev timer1 = ADVANCED_TIMER(1);
/** Timer 1 device (advanced) */
timer_dev *const TIMER1 = &timer1;


First attempt to fix that using only pointers to the structure failed. I ended up with growing RAM usage (see https://github.com/rogerclarkmelbourne/ ... 2/pull/286)

Now I took a deeper look an see what the problem is. The cause is that timer_dev structure has variable size due to handlers[] array. This means that specific timers can be created with different number of handlers in the list. Compiler (actually linker) generates the warning because it can't calculate exact object size just by extern declaration.

I have a solution. It seems that timer1 object is statically allocated, no one is copying it. Everyone else in the code is using TIMER1, which is pointer to timer1. Perhaps we could use some kind of structure inheritance: structure is split into public part (first 3 fields) and a private part (handlers array) which is visible only in timer.c

It could look like this:
Header

Code: Select all

/** Timer device type */
typedef struct timer_dev {
    timer_reg_map regs;         /**< Register map */
    rcc_clk_id clk_id;          /**< RCC clock information */
    timer_type type;            /**< Timer's type */
} timer_dev;


timer.c

Code: Select all

/** Timer device type */
typedef struct timer_dev_pvt
{
    timer_dev dev; // Not a pointer - whole stucture is placed here
    voidFuncPtr handlers[];     /**<
                                 * Don't touch these. Use these instead:
                                 * @see timer_attach_interrupt()
                                 * @see timer_detach_interrupt() */
} timer_dev_pvt;

timer_dev_pvt timer1 = ADVANCED_TIMER(1);
timer_dev *const TIMER1 = &timer1;


If using c++ compiler instead of C it could be even more elegant (just structure inheritance)

timer_dev is a fixed size structure, so Link Time Optimization applies well. All the code which needs the handlers field will just cast passed timer_dev pointer to timer_dev_pvt one.

Any thoughts?

stevestrong
Posts: 1006
Joined: Mon Oct 19, 2015 12:06 am
Location: Munich, Germany

Re: Compiling with Link Time Optimization enabled

Postby stevestrong » Tue May 16, 2017 5:18 pm

In general, PIN_MAP must contain constant data in order to be moved to FLASH.
That means, the linker has to know at link time with which kind of data must be the table filled up.
That's why it contains addresses of structures/arrays/variables, because their allocated addresses are fixed at link time. Any other alternative will move PIN_MAP to RAM.

Regarding you problem, I would do following:
As TIMER1 is already a pointer to timer1,

Code: Select all

timer_dev *const TIMER1 = &timer1;

why not replacing the line where timer1 is used

Code: Select all

timer_dev timer1 = ADVANCED_TIMER(1);

with:

Code: Select all

*TIMER1 = ADVANCED_TIMER(1);

This way timer1 is not directly initialized, only indirectly through TIMER1.
So the LTO should not throw any warning, I hope.

Another alternative is to remove the "static" declaration for timer structures in general. Done for F1, to be done for F3.

Or move the timer structure initialization to an init_timers() function.

grafalex
Posts: 9
Joined: Tue May 16, 2017 4:11 pm

Re: Compiling with Link Time Optimization enabled

Postby grafalex » Tue May 16, 2017 6:30 pm

I doubt this will work. The issue is that timer structures have different size, so compiler can't allocate a space for them as it has to know the size in advance. In other words sizeof(ADVANCED_TIMER(1)) is greater than sizeof(timer_dev) (and that is why compiler generates a warning)

I just realized, that my idea will not work as well

User avatar
RogerClark
Posts: 5554
Joined: Mon Apr 27, 2015 10:36 am
Location: Melbourne, Australia
Contact:

Re: Compiling with Link Time Optimization enabled

Postby RogerClark » Tue May 16, 2017 10:19 pm

If you made those arrays a fixed size, Did link time optimisation result in smaller ROM or RAM footprint?

grafalex
Posts: 9
Joined: Tue May 16, 2017 4:11 pm

Re: Compiling with Link Time Optimization enabled

Postby grafalex » Wed May 17, 2017 7:53 pm

Please check out my changes at https://github.com/rogerclarkmelbourne/ ... /286/files. I did timer_dev and adc_dev structures as fixed size. Instead of array of handlers now these structures have a pointer to array of handlers.

In case of timers I had to change initialization macros. Macro were renamed from ADVANCED_TIMER to DEFINE_ADVANCED_TIMER (an so on for other timers). This change should reflect nature of the macro. Now they define variables (timer1, timer2, ...), not just initialization structure for them. Handler arrays are also defined in those macros.

Despite I could generate variable name inside the macro (e.g. timer##num), I decided that it would be better to accept variable name explicitly - this will help searching for a definition by variable name.

I also fixed array initialization (in case of adc handlers there was an array overrun). Now it explicitly specify elements range that is initialized.

I tested on a F103CB/BluePill. Blink example (using timer) works fine. Let me know how I can easily to run a deeper test.

As for memory questions.

PIN_MAP is still in the flash
.text segment is around the same (+8 bytes)
.data segment reduced by 112 bytes
.bss segment increased by 128 bytes
Last 2 items means that some of the objects (particularly handler arrays - 108 bytes) moved from data to bss. Total .data + .bss change is +16 bytes due to extra pointers.

Hope this is not a big deal. Esspecially as this change opens new opportunity - define timer_dev objects as constants and move them to flash as well (handler arrays remain non-constant). This will also imply making TIMERx pointers constant as well, mark all functions accepting timer_dev pointer to accept constant pointer. Perhaps this will also allow compiler to better optimize code around this. Though this should be a separate change.

grafalex
Posts: 9
Joined: Tue May 16, 2017 4:11 pm

Re: Compiling with Link Time Optimization enabled

Postby grafalex » Wed May 17, 2017 8:07 pm

RogerClark wrote:If you made those arrays a fixed size, Did link time optimisation result in smaller ROM or RAM footprint?

Roger, I am not sure I understand your question.

Offtopic:
I have to say that fixing these warnings (original question in this topic) is just a half of the LTO deal. There is another issue: ar/ranlib require a plugin to work with lto objects. Despite the plugin is bundled with the compiler (in my case it is C:\Program Files (x86)\GNU Tools ARM Embedded\5.4 2016q3\lib\gcc\arm-none-eabi\5.4.1\liblto_plugin-0.dll), I have not yet found a nice cross platform way to specify the plugin to the build system.

In the mean time I hacked ar/ranlib options and passed plugin explicitly as full path. I could link my ~60kb project. Overall outcome so far is -3.5kb in .text section and -24 bytes in RAM. Though I have not investigated deeper. Perhaps there is a room for improvement.

victor_pv
Posts: 1235
Joined: Mon Apr 27, 2015 12:12 pm

Re: Compiling with Link Time Optimization enabled

Postby victor_pv » Thu May 18, 2017 3:52 pm

grafalex wrote:
RogerClark wrote:If you made those arrays a fixed size, Did link time optimisation result in smaller ROM or RAM footprint?

Roger, I am not sure I understand your question.

Offtopic:
I have to say that fixing these warnings (original question in this topic) is just a half of the LTO deal. There is another issue: ar/ranlib require a plugin to work with lto objects. Despite the plugin is bundled with the compiler (in my case it is C:\Program Files (x86)\GNU Tools ARM Embedded\5.4 2016q3\lib\gcc\arm-none-eabi\5.4.1\liblto_plugin-0.dll), I have not yet found a nice cross platform way to specify the plugin to the build system.

In the mean time I hacked ar/ranlib options and passed plugin explicitly as full path. I could link my ~60kb project. Overall outcome so far is -3.5kb in .text section and -24 bytes in RAM. Though I have not investigated deeper. Perhaps there is a room for improvement.


I wanted to discuss something about the changes you are doing for LTO.
I you you mentioned the change for FreeRTOS was needed because otherwise that function would be left out by the linker. Now, the FreeRTOS files have minimal changes to make it compatible with the core, and nothing related to the function you had to mark as "used". Do you have any idea why that function was left out without that attribute? FreeRTOS is widely used, and if that was a somewhat common problem I think RealTime Engineer would have fixed it already.
Can we cause any problem by adding that attribute, such us getting that function, or even others called by it, included in the resulting binary when not actually needed?
I haven't had time to check what the function is for, but I think it was something related to context switching, so it's probably always needed for FreeRTOS, but that makes me even wonder more why would the linker leave it out.

grafalex
Posts: 9
Joined: Tue May 16, 2017 4:11 pm

Re: Compiling with Link Time Optimization enabled

Postby grafalex » Fri May 19, 2017 12:49 pm

victor_pv wrote:I you you mentioned the change for FreeRTOS was needed because otherwise that function would be left out by the linker. Now, the FreeRTOS files have minimal changes to make it compatible with the core, and nothing related to the function you had to mark as "used". Do you have any idea why that function was left out without that attribute? FreeRTOS is widely used, and if that was a somewhat common problem I think RealTime Engineer would have fixed it already.
Can we cause any problem by adding that attribute, such us getting that function, or even others called by it, included in the resulting binary when not actually needed?
I haven't had time to check what the function is for, but I think it was something related to context switching, so it's probably always needed for FreeRTOS, but that makes me even wonder more why would the linker leave it out.


I just took one more look. The problem happens with vTaskSwitchContext() which is responsible for switching between different tasks/threads. This is essential part of the library, so it always has to be included into the binary. At least I can't imagine a meaningful library usage without this function.

The function gets called from 2 places:
- piece of asm code in __exc_pendsv() function in port.c. This function is a PendSV interrupt handler which actually does context switching on system request
- vTaskSuspend() in tasks.c

In my case vTaskSuspend() is toggled off with corresponding ifdef. So __exc_pendsv() is the only place where it is used. It seems compiler can't handle a case where function is called from an assembler code. Compiler just does not mark vTaskSwitchContext() as used when compiling port.c, then LTO throws out all unused functions which leads to unresolved external.

I found a pretty recent discussion about the same issue on FreeRTOS support page - http://www.freertos.org/FreeRTOS_Suppor ... 3808j.html
There is no solution yet, but they are discussing adding a macro which will add 'used' attribute in case of gcc compiler

Code: Select all

#define portFORCE_USED  __attribute__(( used ))

victor_pv
Posts: 1235
Joined: Mon Apr 27, 2015 12:12 pm

Re: Compiling with Link Time Optimization enabled

Postby victor_pv » Fri May 19, 2017 4:12 pm

grafalex wrote:
victor_pv wrote:I you you mentioned the change for FreeRTOS was needed because otherwise that function would be left out by the linker. Now, the FreeRTOS files have minimal changes to make it compatible with the core, and nothing related to the function you had to mark as "used". Do you have any idea why that function was left out without that attribute? FreeRTOS is widely used, and if that was a somewhat common problem I think RealTime Engineer would have fixed it already.
Can we cause any problem by adding that attribute, such us getting that function, or even others called by it, included in the resulting binary when not actually needed?
I haven't had time to check what the function is for, but I think it was something related to context switching, so it's probably always needed for FreeRTOS, but that makes me even wonder more why would the linker leave it out.


I just took one more look. The problem happens with vTaskSwitchContext() which is responsible for switching between different tasks/threads. This is essential part of the library, so it always has to be included into the binary. At least I can't imagine a meaningful library usage without this function.

The function gets called from 2 places:
- piece of asm code in __exc_pendsv() function in port.c. This function is a PendSV interrupt handler which actually does context switching on system request
- vTaskSuspend() in tasks.c

In my case vTaskSuspend() is toggled off with corresponding ifdef. So __exc_pendsv() is the only place where it is used. It seems compiler can't handle a case where function is called from an assembler code. Compiler just does not mark vTaskSwitchContext() as used when compiling port.c, then LTO throws out all unused functions which leads to unresolved external.

I found a pretty recent discussion about the same issue on FreeRTOS support page - http://www.freertos.org/FreeRTOS_Suppor ... 3808j.html
There is no solution yet, but they are discussing adding a macro which will add 'used' attribute in case of gcc compiler

Code: Select all

#define portFORCE_USED  __attribute__(( used ))


It may be the case that even without LTO, if vTaskSuspend() is disabled the function is left out.
I dont remember ever looking at enabling/disabling vTaskSuspend() so I guess it's enabled by default and no one experienced the issue yet. Well, if it's a general problem with GCC, and given that's a critical function anyway, I'm fine with adding the used attribute. Can't see any negative to it other than possible code size increase in the unlikely case that's not really needed for some freeRTOS configuration.


Return to “General discussion”

Who is online

Users browsing this forum: Bing [Bot] and 2 guests