PR to add optimisation menu (to all boards individually)

RogerClark
Posts: 7440
Joined: Mon Apr 27, 2015 10:36 am
Location: Melbourne, Australia

Post by RogerClark » Wed Jul 19, 2017 11:09 pm

There is a PR to add an optimisation menu

https://github.com/rogerclarkmelbourne/ ... 2/pull/313

e.g. it adds this to every board

Code:

mapleMini.menu.opt.o2std=Faster
mapleMini.menu.opt.o2std.build.flags.optimize=-O2
mapleMini.menu.opt.o2std.build.flags.ldspecs=
mapleMini.menu.opt.o2lto=Faster with LTO
mapleMini.menu.opt.o2lto.build.flags.optimize=-O2 -flto
mapleMini.menu.opt.o2lto.build.flags.ldspecs=-flto
mapleMini.menu.opt.o1std=Fast
mapleMini.menu.opt.o1std.build.flags.optimize=-O1
mapleMini.menu.opt.o1std.build.flags.ldspecs=
mapleMini.menu.opt.o1lto=Fast with LTO
mapleMini.menu.opt.o1lto.build.flags.optimize=-O1 -flto
mapleMini.menu.opt.o1lto.build.flags.ldspecs=-flto
mapleMini.menu.opt.o3std=Fastest
mapleMini.menu.opt.o3std.build.flags.optimize=-O3
mapleMini.menu.opt.o3std.build.flags.ldspecs=
mapleMini.menu.opt.o3lto=Fastest with LTO
mapleMini.menu.opt.o3lto.build.flags.optimize=-O3 -flto
mapleMini.menu.opt.o3lto.build.flags.ldspecs=-flto
mapleMini.menu.opt.ogstd=Debug
mapleMini.menu.opt.ogstd.build.flags.optimize=-Og
mapleMini.menu.opt.ogstd.build.flags.ldspecs=
mapleMini.menu.opt.oglto=Debug with LTO
mapleMini.menu.opt.oglto.build.flags.optimize=-Og -flto
mapleMini.menu.opt.oglto.build.flags.ldspecs=-flto
mapleMini.menu.opt.osstd=Smallest Code
mapleMini.menu.opt.osstd.build.flags.optimize=-Os
mapleMini.menu.opt.osstd.build.flags.ldspecs=
mapleMini.menu.opt.oslto=Smallest Code with LTO
mapleMini.menu.opt.oslto.build.flags.optimize=-Os -flto
mapleMini.menu.opt.oslto.build.flags.ldspecs=-flto

It also adds a Link Time Optimisation (LTO) variant of each level.

Although I like the idea in general, it adds a large amount of bulk to boards.txt, because the way the IDE works means the same menu options need to be defined individually for each board :-(
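One way to live with the repetition is to generate the per-board block from a single table instead of editing it by hand. The following is only a sketch under assumptions: the option ids, labels and flags are copied from the excerpt above, the board list ("mapleMini", "maple") is just an example, and the function name is made up for illustration. It is not part of the PR.

```python
# Sketch: build the optimisation menu lines for any board from one table.
# Ids, labels and flags mirror the PR excerpt; the board list is an example only.

OPTIONS = [
    # (option id prefix, menu label, optimisation flag)
    ("o2", "Faster", "-O2"),
    ("o1", "Fast", "-O1"),
    ("o3", "Fastest", "-O3"),
    ("og", "Debug", "-Og"),
    ("os", "Smallest Code", "-Os"),
]

# Each level is emitted twice: plain ("std") and with LTO ("lto").
VARIANTS = [
    # (id suffix, label suffix, extra flag)
    ("std", "", ""),
    ("lto", " with LTO", "-flto"),
]


def menu_lines(board):
    """Return the boards.txt optimisation menu lines for one board."""
    lines = []
    for opt_id, label, flag in OPTIONS:
        for suffix, label_extra, lto in VARIANTS:
            key = f"{board}.menu.opt.{opt_id}{suffix}"
            optimize = f"{flag} {lto}".strip()
            lines.append(f"{key}={label}{label_extra}")
            lines.append(f"{key}.build.flags.optimize={optimize}")
            lines.append(f"{key}.build.flags.ldspecs={lto}")
    return lines


for board in ("mapleMini", "maple"):
    print("\n".join(menu_lines(board)))
    print()
```

The output still has to be pasted into boards.txt for each board, so the file is no smaller, but the menu itself is then defined in one place.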

stevestrong
Posts: 1813
Joined: Mon Oct 19, 2015 12:06 am
Location: Munich, Germany

Post by stevestrong » Thu Jul 20, 2017 9:29 am

I think this bloat is not going to solve any problem.

If -flto generally improves code size without affecting execution speed, then I think it would be wise to integrate it into platform.txt (assuming that nothing else breaks).
If execution speed suffers, then I don't welcome it. Many people use this chip because of its speed, and that must be kept.
And there is enough room in flash for most user applications; the maximum 5% saving will not bring much, apart from some time saved at upload.

Regarding speed optimisation, I think -Os gives good overall performance for most apps, unless one has a special case (LiveOV7670).
Where necessary, speed can be increased by using a special coding style, which would then bring the same advantage as -O2 or -O3 while keeping the general setting of -Os in place.
An alternative solution is to change the optimisation level locally:

Code:

#pragma GCC push_options
#pragma GCC optimize ("O0")

your code

#pragma GCC pop_options
or for particular functions:

Code:

void __attribute__((optimize("O0"))) foo(unsigned char data) {
    // unmodifiable compiler code
}
also see here: https://stackoverflow.com/a/2220565

Post by RogerClark » Thu Jul 20, 2017 10:50 am

Thanks Steve

I think -O2 may be faster than -Os, since that is what works for LiveOV7670, but I already posted about using -O2 and the consensus was that we should stay with -Os.

mtiutiu
Posts: 1
Joined: Fri Jul 21, 2017 7:02 am

Post by mtiutiu » Fri Jul 21, 2017 7:19 am

stevestrong wrote:
Thu Jul 20, 2017 9:29 am
I think this bloat is not going to solve any problem. [...]

There is indeed a little bloat in there, and I know the DRY principle isn't respected, but that is because I don't know of any other way to specify those menu options in one place and then "include" them wherever needed. Maybe the Arduino IDE interpolation system doesn't support such a thing. Anyway, this is not the main point here. I consider it a great addition because it lets me quickly test various compile-time options and then see the real benefit, if any, on real hardware. So instead of having those compile-time options hardcoded in platform.txt, I prefer to have a menu to switch between them when needed. And after all, users still have the option of using the "standard" compile-time options that were previously in platform.txt. This is NOT going to force you to use LTO if you don't want it.

Yes, it adds more text to the boards.txt files, but I think this is due to the limitations of the Arduino IDE, so in the end I had to repeat the same block of text for each board. If anyone knows a better way of doing it then please let me know; I'm open to discussion.

I know about pragmas, but as far as I know they apply only to the file in which they are specified, and not to the Arduino core and the other libraries used in the project. What if I want to optimise those at compile time too, and not only the current C/C++ file? Or maybe I'm wrong and don't fully understand the pragma compiler directive?

I did some speed tests on real STM32 hardware, and going from -Os to -O2 gave a 50% increase in speed. That doesn't always matter, it's true; how well the code is written and optimised matters too.

zoomx
Posts: 540
Joined: Mon Apr 27, 2015 2:28 pm
Location: Mt.Etna, Italy

Post by zoomx » Mon Jul 24, 2017 1:34 pm

https://gcc.gnu.org/onlinedocs/gcc/Func ... agmas.html

Code:

#pragma GCC optimize ("string"...)
This pragma allows you to set global optimization options for functions defined later in the source file. One or more strings can be specified. Each function that is defined after this point is as if __attribute__((optimize("STRING"))) was specified for that function. The parenthesis around the options is optional. See Function Attributes, for more information about the optimize attribute and the attribute syntax.

It seems that the result depends on where the main sketch ends up after the preprocessor has run.

Post by RogerClark » Mon Jul 24, 2017 10:21 pm

#pragma GCC optimize looks really interesting.

We already have at least one example that needs -O2, which at the moment requires changes to platform.txt.

The problem with the menu system is that if someone opened the "Ov7670 live" demo, which needs -O2, it would not automatically select -O2, but the pragma would fix this situation.

Post by RogerClark » Sun Jul 30, 2017 12:17 am

Unfortunately, looking at the gcc reference page

https://gcc.gnu.org/onlinedocs/gcc/Func ... agmas.html

it says

The #pragma GCC target pragma is presently implemented for x86, PowerPC, and Nios II targets only.

which implies it's not available for ARM :-(

Post by RogerClark » Sun Jul 30, 2017 12:44 am

I tested that #pragma on the LiveOV7670 code, which only runs correctly with -O2 optimisation, and it didn't seem to work.

I tried adding the pragma to various headers, and also directly into the core code that needs this optimisation, and it made no difference.

So at the moment, using #pragma GCC optimize is not an option.

Post by RogerClark » Sun Jul 30, 2017 1:08 am

Back to the original purpose of this thread

I've tried the PR and I actually quite like it... However it will need some changes

I ran the graphics test, which draws lines, fills, text etc. onto the ILI9341 display, and compared our current optimisation of -Os (this generally means optimise for size, at optimisation level -O2) with -O3 plus LTO.

-Os code and RAM sizes were
Sketch uses 30292 bytes (46%) of program storage space. Maximum is 65536 bytes.
Global variables use 3728 bytes (18%) of dynamic memory

-O3 & LTO code and RAM sizes were
Sketch uses 32720 bytes (49%) of program storage space. Maximum is 65536 bytes.
Global variables use 3704 bytes (18%) of dynamic memory

Which is what would be expected as the code size increased by 2k (around 8%)
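As a quick check of the "2k (around 8%)" figure, the arithmetic can be redone from the numbers the IDE printed. This is a small sketch only; the variable names are made up for illustration and the values are the ones quoted above.

```python
# Sketch: recompute the flash and RAM differences from the IDE output quoted above.

FLASH_SIZE = 65536          # "Maximum is 65536 bytes"

flash_os = 30292            # -Os
flash_o3_lto = 32720        # -O3 with LTO
ram_os = 3728
ram_o3_lto = 3704

extra_flash = flash_o3_lto - flash_os
growth_percent = 100.0 * extra_flash / flash_os
ram_saved = ram_os - ram_o3_lto

print(f"extra flash: {extra_flash} bytes ({growth_percent:.1f}% larger than -Os)")
print(f"share of flash used: {100.0 * flash_os / FLASH_SIZE:.1f}% -> "
      f"{100.0 * flash_o3_lto / FLASH_SIZE:.1f}%")
print(f"RAM saved: {ram_saved} bytes")
```

The difference comes out at 2428 bytes, which is where "2k" and "around 8%" come from.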

The speed test was also interesting and does what would be expected, with noticeable gains in some places.
Operation                     -Os   -O3 & LTO   Improvement %
ScreenFill                 170789      170739            0.03
Text                        39905       31275           21.63
Lines                      228371      169259           25.88
Horiz/Vert Lines            15736       15031            4.48
Rectangles (outline)        11469       10496            8.48
Rectangles (filled)        355032      354789            0.07
Circles (filled)           140210      106118           24.31
Circles (outline)          154955      120331           22.34
Triangles (outline)         57983       40706           29.80
Triangles (filled)         164206      146417           10.83
Rounded rects (outline)     54528       42923           21.28
Rounded rects (filled)     414590      403744            2.62

Text, Lines, Circles and Rounded rects are all considerably faster, with "Triangles (outline)" being almost 30% faster.
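The improvement column is simply (before - after) / before, expressed as a percentage. The sketch below recomputes it from the two timing columns and sorts the tests by gain; the timings are the ones posted above (their units are not stated in the post), and the function name is made up for illustration.

```python
# Sketch: recompute the "Improvement %" column from the posted timings.
# Each entry is (time with -Os, time with -O3 and LTO); units as reported by the test.

TIMINGS = {
    "ScreenFill": (170789, 170739),
    "Text": (39905, 31275),
    "Lines": (228371, 169259),
    "Horiz/Vert Lines": (15736, 15031),
    "Rectangles (outline)": (11469, 10496),
    "Rectangles (filled)": (355032, 354789),
    "Circles (filled)": (140210, 106118),
    "Circles (outline)": (154955, 120331),
    "Triangles (outline)": (57983, 40706),
    "Triangles (filled)": (164206, 146417),
    "Rounded rects (outline)": (54528, 42923),
    "Rounded rects (filled)": (414590, 403744),
}


def improvement(before, after):
    """Percentage reduction in elapsed time going from 'before' to 'after'."""
    return 100.0 * (before - after) / before


# Largest gains first.
ranked = sorted(TIMINGS, key=lambda name: improvement(*TIMINGS[name]), reverse=True)

for name in ranked:
    t_os, t_o3 = TIMINGS[name]
    print(f"{name:<25}{t_os:>8}{t_o3:>10}{improvement(t_os, t_o3):>8.2f}")
```

Sorted this way, the tests dominated by large fills (ScreenFill, filled rectangles) sit at the bottom with almost no gain.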

I think the guys using the PigOScope (and derivatives) may find this speed increase quite useful, assuming it doesn't break anything else.

The only problem I see with this PR is that it changes the default optimisation to "Faster", which is

.menu.opt.o2std=Faster
.menu.opt.o2std.build.flags.optimize=-O2


I'd like to add this, but to stop it potentially breaking existing code the optimisation needs to be set to -Os by default.

I think it would also be better if the amount of optimisation increased towards the lower menu options.

So the order would probably be:

Smallest
Smallest + LTO
Fast
Fast+LTO
Faster
Faster+LTO
Fastest
Fastest+LTO

and then have Debug as the last option. I'm not entirely sure who would use it, or whether we should include it at all, as the IDE does not have any debugging capabilities; this option would only be useful to people using the repo in another IDE, which may not support the menu options at all.

Unfortunately I don't have time at the moment to go through and change the order of all of these entries in boards.txt

https://github.com/mtiutiu/Arduino_STM3 ... boards.txt

I will see if the OP is willing to change this, or perhaps someone else with time on their hands could do it?

PS. I guess it could be done by taking one section into a separate editor window

Code:

#-- Optimizations
mapleMini.menu.opt.o2std=Faster
mapleMini.menu.opt.o2std.build.flags.optimize=-O2
mapleMini.menu.opt.o2std.build.flags.ldspecs=
mapleMini.menu.opt.o2lto=Faster with LTO
mapleMini.menu.opt.o2lto.build.flags.optimize=-O2 -flto
mapleMini.menu.opt.o2lto.build.flags.ldspecs=-flto
mapleMini.menu.opt.o1std=Fast
mapleMini.menu.opt.o1std.build.flags.optimize=-O1
mapleMini.menu.opt.o1std.build.flags.ldspecs=
mapleMini.menu.opt.o1lto=Fast with LTO
mapleMini.menu.opt.o1lto.build.flags.optimize=-O1 -flto
mapleMini.menu.opt.o1lto.build.flags.ldspecs=-flto
mapleMini.menu.opt.o3std=Fastest
mapleMini.menu.opt.o3std.build.flags.optimize=-O3
mapleMini.menu.opt.o3std.build.flags.ldspecs=
mapleMini.menu.opt.o3lto=Fastest with LTO
mapleMini.menu.opt.o3lto.build.flags.optimize=-O3 -flto
mapleMini.menu.opt.o3lto.build.flags.ldspecs=-flto
mapleMini.menu.opt.ogstd=Debug
mapleMini.menu.opt.ogstd.build.flags.optimize=-Og
mapleMini.menu.opt.ogstd.build.flags.ldspecs=
mapleMini.menu.opt.oglto=Debug with LTO
mapleMini.menu.opt.oglto.build.flags.optimize=-Og -flto
mapleMini.menu.opt.oglto.build.flags.ldspecs=-flto
mapleMini.menu.opt.osstd=Smallest Code
mapleMini.menu.opt.osstd.build.flags.optimize=-Os
mapleMini.menu.opt.osstd.build.flags.ldspecs=
mapleMini.menu.opt.oslto=Smallest Code with LTO
mapleMini.menu.opt.oslto.build.flags.optimize=-Os -flto
mapleMini.menu.opt.oslto.build.flags.ldspecs=-flto
re-ordering it, and pasting it back into boards.txt to replace the existing section.

Then search and replace "mapleMini" with another board, e.g. "maple", and paste the result into the maple section.
Then undo the search and replace, and redo it for the next board, etc.
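The manual procedure above (re-order one block, then search and replace the board name) could also be scripted. The following is a rough sketch, not the script that was eventually used: the function names are made up, the sample block is shortened to three options, and it assumes the IDE shows the options in file order so that putting -Os first keeps it as the default.

```python
# Sketch: re-order one board's optimisation block and retarget it to another board.

# Proposed order: smallest first, increasing optimisation, debug last.
ORDER = ["osstd", "oslto", "o1std", "o1lto", "o2std", "o2lto",
         "o3std", "o3lto", "ogstd", "oglto"]

# Shortened sample of the block above, in its original order.
SAMPLE = """\
#-- Optimizations
mapleMini.menu.opt.o2std=Faster
mapleMini.menu.opt.o2std.build.flags.optimize=-O2
mapleMini.menu.opt.o2std.build.flags.ldspecs=
mapleMini.menu.opt.ogstd=Debug
mapleMini.menu.opt.ogstd.build.flags.optimize=-Og
mapleMini.menu.opt.ogstd.build.flags.ldspecs=
mapleMini.menu.opt.osstd=Smallest Code
mapleMini.menu.opt.osstd.build.flags.optimize=-Os
mapleMini.menu.opt.osstd.build.flags.ldspecs=
"""


def option_id(line):
    """Extract the option id (e.g. 'o2std') from a '<board>.menu.opt.<id>...' line."""
    key = line.split("=", 1)[0]
    return key.split(".")[3]


def reorder(block):
    """Return the menu lines sorted into ORDER; comments and blank lines are dropped."""
    lines = [ln for ln in block.splitlines() if ".menu.opt." in ln]
    # sorted() is stable, so the three lines of each option stay together and in order.
    return sorted(lines, key=lambda ln: ORDER.index(option_id(ln)))


def retarget(lines, old_board, new_board):
    """Replace the leading board name, the scripted form of search and replace."""
    return [ln.replace(old_board + ".", new_board + ".", 1) for ln in lines]


result = retarget(reorder(SAMPLE), "mapleMini", "maple")
print("\n".join(result))
```

Replacing only the leading "mapleMini." (with the trailing dot) avoids touching anything else on the line, which a plain editor search and replace would not guarantee.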

Still quite time consuming :-( and unfortunately I need to fix the leaking roof above my home office :-(

Edit.

I've noticed the -O3 with LTO seems to cause some warnings to be displayed during compilation,

e.g.
D:\Documents\Arduino\hardware\Arduino_STM32\STM32F1\system/libmaple/include/libmaple/timer.h:150:18: warning: type of 'timer4' does not match original declaration [enabled by default]
extern timer_dev timer4;
D:\Documents\Arduino\hardware\Arduino_STM32\STM32F1\cores\maple\libmaple\timer.c:68:11: note: previously declared here
timer_dev timer4 = GENERAL_TIMER(4);
^

The same happens for all the timers and also for the ADC.

I can't see why these warnings are being generated.

Post by RogerClark » Sun Jul 30, 2017 5:53 am

I'll see if I can script this change myself...

Edit.

I've committed changes for the F1 but I don't have time to do the F3 and F4 at the moment.
