Dhrystone and Whetstone Benchmarks for STM32F103

ag123
Posts: 798
Joined: Thu Jul 21, 2016 4:24 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by ag123 » Tue Apr 18, 2017 4:29 pm

here you go STM32F103 maple mini
again this is based on -O2 optimizations, debug flags off

Code: Select all

Dhrystone Benchmark, Version 2.1 (Language: C)
Execution starts, 3000000 runs through Dhrystone

Execution ends
Microseconds for one run through Dhrystone: 9.21
Dhrystones per Second: 108569.65
VAX MIPS rating = 61.79
note that the default optimization in platform.txt is apparently -Os (optimise for size)
recompiled with -Os optimization, debug off

Code: Select all

Dhrystone Benchmark, Version 2.1 (Language: C)
Execution starts, 3000000 runs through Dhrystone

Execution ends
Microseconds for one run through Dhrystone: 11.97
Dhrystones per Second: 83540.53
VAX MIPS rating = 47.55
i've updated my prior post on the f407 benchmark with -Os optimization as well, apparently the lower VAX MIPS readings are simply due to the -Os optimization flag
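
for reference, the numbers in these dumps are tied together by simple arithmetic: dhrystones per second is 1e6 divided by the microseconds per run, and the VAX MIPS rating is that figure divided by 1757, the conventional VAX 11/780 score. a minimal sketch in C, where the helper names are made up for illustration and are not taken from the benchmark source:

```c
#include <assert.h>

/* 1757 Dhrystones/s is the conventional VAX 11/780 (1 MIPS) reference score. */
#define VAX_DHRYSTONES_PER_SEC 1757.0

/* Hypothetical helper: Dhrystones per second from microseconds per run. */
double dhrystones_per_sec(double us_per_run)
{
    assert(us_per_run > 0.0);
    return 1.0e6 / us_per_run;
}

/* Hypothetical helper: VAX MIPS (DMIPS) rating from Dhrystones per second. */
double vax_mips(double dhry_per_sec)
{
    return dhry_per_sec / VAX_DHRYSTONES_PER_SEC;
}
```

e.g. vax_mips(108569.65) comes out at about 61.79, matching the -O2 run above. the printed microseconds are rounded to two decimals, so dhrystones_per_sec(9.21) only lands near the printed 108569.65, not exactly on it.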

if you are not using eclipse etc, i'd suggest updating platform.txt for this experiment; back up a copy first in case of mistakes
what one can do depends on flash availability: if the sketch still fits, one could simply choose -O2 when compiling and get higher performance
i'd guess this won't impact sram per se, as the mcu executes code directly from flash
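
a possible middle ground, sketched under assumptions: gcc also accepts a per-function `optimize` attribute, so a single hot function could be built at -O2 while the rest of the sketch stays at -Os. `hot_sum` below is a made-up example, and gcc's own documentation cautions that this attribute is meant more for debugging than for production code, so treat it as an experiment only:

```c
#include <stdint.h>

/* GCC-specific attribute: request -O2 for this one function even if the
 * rest of the build uses -Os. hot_sum is a hypothetical example function. */
__attribute__((optimize("O2")))
uint32_t hot_sum(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++) {
        sum += buf[i];  /* byte-wise sum of the buffer */
    }
    return sum;
}
```

whether this actually helps would need measuring per function; changing the global flag is the simpler route for the benchmark itself.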
Last edited by ag123 on Tue Apr 18, 2017 4:50 pm, edited 6 times in total.

Pito
Posts: 1627
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Post by Pito » Tue Apr 18, 2017 4:36 pm

48-49 with F103 -Os is something that was measured in the past, so it seems to be ok.
The ratio is 62/48 = 1.29 with F103, and 232/160 = 1.45 with F407, which is a difference, though..
Pukao Hats Cleaning Services Ltd.

Post by ag123 » Tue Apr 18, 2017 4:54 pm

my guess is that on the f4, the hardware (e.g. the memory controllers) is able to feed the cpu pipeline more effectively
http://infocenter.arm.com/help/index.js ... GJICF.html
^^ if this link is right, the f3 and f4 would probably have a similar pipeline, but what is notable is that the cpu is capable of dual instruction fetch
simply unrolling the loops (e.g. with -O2) and having a possibly better internal memory controller may account for the jump in performance; on the fetch side, throughput may literally be doubled if the loops are unrolled

i've posted my code in the initial post on the benchmarks (in the attached zip file)
http://www.stm32duino.com/viewtopic.php ... 110#p26557

Post by ag123 » Tue Apr 18, 2017 5:17 pm

-O3 optimization for f407vet6

Code: Select all

Dhrystone Benchmark, Version 2.1 (Language: C)
Execution starts, 3000000 runs through Dhrystone

Execution ends
Microseconds for one run through Dhrystone: 2.16
Dhrystones per Second: 462753.58
VAX MIPS rating = 263.38
this is quite extreme if you consider that the stm32f407ve runs at a mere 168 mhz, it puts it in the 'old' pentium 200mhz class :D

stm32f103 maple mini, -O3 optimization

Code: Select all

Dhrystone Benchmark, Version 2.1 (Language: C)
Execution starts, 3000000 runs through Dhrystone

Execution ends
Microseconds for one run through Dhrystone: 8.11
Dhrystones per Second: 123286.01
VAX MIPS rating = 70.17
almost 1 mips per mhz (70.17 at 72 mhz)! :D
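
to put numbers on that, a rough sketch that normalises the VAX MIPS rating by the core clock. the 72 mhz figure for the maple mini is an assumption (its stock clock), the 168 mhz for the f407 is the figure quoted above, and `dmips_per_mhz` is a made-up helper name:

```c
#include <assert.h>

/* Hypothetical helper: Dhrystone MIPS per MHz of core clock. */
double dmips_per_mhz(double vax_mips_rating, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return vax_mips_rating / clock_mhz;
}
```

with the -O3 results, dmips_per_mhz(70.17, 72.0) is roughly 0.97 for the f103 and dmips_per_mhz(263.38, 168.0) is roughly 1.57 for the f407, so the f4 does more work per clock and not only more clocks.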

but i'm half wondering if gcc 'cheated' and deleted some code that has no references to it :lol:
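
one common way to guard against that in a home-made benchmark loop is to store every result in a volatile variable, since volatile writes are side effects the compiler has to keep. a minimal sketch, with `busy_work`, `benchmark_sink` and `run_benchmark` all being made-up names that are not part of the dhrystone source:

```c
#include <stdint.h>

/* Hypothetical workload: the sum 0 + 1 + ... + (n - 1). */
uint32_t busy_work(uint32_t n)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; i++) {
        acc += i;
    }
    return acc;
}

/* Writes to a volatile object are observable, so they cannot be dropped. */
volatile uint32_t benchmark_sink;

/* Store each result in the sink so the calls are not discarded as dead code.
 * The compiler may still simplify the computation itself; only the stores
 * are guaranteed to happen. */
void run_benchmark(uint32_t runs, uint32_t n)
{
    for (uint32_t r = 0; r < runs; r++) {
        benchmark_sink = busy_work(n);
    }
}
```

this only shows the general technique; to see what gcc really did with the benchmark at -O3, the disassembly (e.g. from objdump -d) would be the place to check.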

Post by Pito » Tue Apr 18, 2017 5:49 pm

Try to overclock the 407. It may work up to 250MHz.. I think 240MHz is the one to pick, so that you still get the 48MHz for USB..

Post by ag123 » Tue Apr 18, 2017 5:52 pm

i'm not sure if liquid nitrogen is needed :lol:

oh, btw how do i do that, can i simply change F_CPU=168000000L? oh wait a minute, would usb still work, since it is usb-serial?
surprisingly, F_CPU doesn't seem to be referenced in the c/c++ source code

ok, giving up for now, i need to study the rcc code :lol:
Last edited by ag123 on Tue Apr 18, 2017 6:04 pm, edited 1 time in total.

victor_pv
Posts: 1745
Joined: Mon Apr 27, 2015 12:12 pm

Post by victor_pv » Tue Apr 18, 2017 6:01 pm

ag123 wrote:i'm not sure if liquid nitrogen is needed :lol:

oh, btw how do i do that, can i simply change F_CPU=168000000L? oh wait a minute, would usb still work it is usb-serial? :lol:
surprisingly, F_CPU don't seem to be referenced in c/c++ source codes

meanwhile, coming up ...
To overclock you will need to adjust the PLL multiplier (which raises the base clock for everything) and then adjust the USB divider, so that the USB peripheral still ends up with a 48MHz clock, as Pito said.
I haven't done it on the F4, so sadly I can't point you to which files to check, but there is at least one thread about overclocking the F1 that should give some pointers.

About the memory controller, I understand the F1 just has a prefetch buffer, while the F407 has a larger buffer plus branch prediction or other perks that increase the speed further.
Now the flash itself is just as fast, which means that on the F4 it has to run with more wait states. If the prefetch and the rest work optimally it will be fast, but if they can't correctly predict what will be needed (interrupts come to mind, they are unpredictable), a few wait states will be inserted while fetching from the new address.
Perhaps the compiler is producing a perfectly optimal stream of instructions that takes advantage of it all?
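
To make the wait state point concrete, here is a rough sketch of how the flash latency grows with the clock. The step sizes are assumptions taken from my reading of ST's reference manuals (one extra wait state per 24MHz on the F103, one per 30MHz on the F407 at 2.7-3.6V) and are worth re-checking there; `flash_wait_states` is a made-up helper, not a libmaple function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed step sizes (check the reference manuals):
 * F103: one extra wait state per 24 MHz of SYSCLK,
 * F407 at 2.7-3.6 V: one extra wait state per 30 MHz of HCLK. */
#define F103_HZ_PER_WAIT_STATE 24000000u
#define F407_HZ_PER_WAIT_STATE 30000000u

/* Hypothetical helper: flash wait states needed at a given clock.
 * A clock exactly on a step boundary still belongs to the lower band. */
uint32_t flash_wait_states(uint32_t clock_hz, uint32_t hz_per_wait_state)
{
    assert(clock_hz > 0u && hz_per_wait_state > 0u);
    return (clock_hz - 1u) / hz_per_wait_state;
}
```

Under those assumptions the F103 at 72MHz needs 2 wait states and the F407 at 168MHz needs 5, which is why the prefetch and caching matter so much more on the F4.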

Post by ag123 » Tue Apr 18, 2017 6:07 pm

thanks victor

found the relevant code in STM32F4/cores/maple/libmaple/rccF4.c
i'll examine how those pll variables are defined

well, interrupts certainly matter, but i'd guess this is mainly just a casual test to see what is practically achievable (e.g. without needing to over clock) and just how 'fast' it is :D

the notion that the f4 is reaching those old pentium speeds, e.g. the early pentiums at 133 mhz, lets us guess what would be good applications to run on the platform. i remember that when those 133 mhz pentiums were around, windows 95 was the common desktop os
hence the f4 is a mini pc of that 'class', just that it has no mmu, and the internal sram is pretty limited, assuming we don't bother with external sram or sdram, as those would be 'hard to connect' given the number of parallel pins to set up, and possibly costly as well.

these mcus are effectively a 'computer on a chip', and they have reached that old desktop performance, just with very limited ram
Last edited by ag123 on Tue Apr 18, 2017 6:22 pm, edited 2 times in total.

Post by Pito » Tue Apr 18, 2017 6:21 pm

Open the CubeMX configurator for the 407 and set the clocks. You have to change about 4 dividers around the PLL.
8MHz /4(PLLM) *240(PLLN) /2(PLLP), this gives you a 240MHz clock.
Then set the USB divider to 10(PLLQ), so 480MHz /10 gives the 48MHz for USB.
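
The same arithmetic as a small C sketch, in case you want to try other dividers without installing CubeMX. The struct and function names are made up for illustration; they are not libmaple or HAL calls.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the clocks derived from the main PLL. */
typedef struct {
    uint32_t vco_hz;    /* PLL VCO output: HSE / PLLM * PLLN */
    uint32_t sysclk_hz; /* system clock:   VCO / PLLP */
    uint32_t usb_hz;    /* USB clock:      VCO / PLLQ, should be 48 MHz */
} pll_clocks_t;

/* Hypothetical helper: compute the PLL output clocks from the dividers. */
pll_clocks_t pll_compute(uint32_t hse_hz, uint32_t pllm, uint32_t plln,
                         uint32_t pllp, uint32_t pllq)
{
    assert(pllm != 0u && pllp != 0u && pllq != 0u);
    pll_clocks_t c;
    c.vco_hz = (hse_hz / pllm) * plln;
    c.sysclk_hz = c.vco_hz / pllp;
    c.usb_hz = c.vco_hz / pllq;
    return c;
}
```

pll_compute(8000000, 4, 240, 2, 10) gives a 480MHz VCO, 240MHz system clock and 48MHz for USB. As a comparison, pll_compute(8000000, 4, 168, 2, 7) gives 168MHz and 48MHz, though the dividers your core actually uses for the stock clock may differ.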
Attachment: 407 240MHz.JPG
Last edited by Pito on Tue Apr 18, 2017 6:32 pm, edited 3 times in total.

Post by ag123 » Tue Apr 18, 2017 6:25 pm

thanks pito :D
oops, i'd need to set up cube-mx and i'm running somewhat low on disk space, so i guess i'll try reading rm0090 and looking at the source code first

but the suggestion of 240mhz makes me wonder if i need a fire extinguisher on standby, i possibly need to find out what a lower 'safe' speed bump is :lol:

instead of usb, i may try with a uart, so that keeping usb working is less of a problem

oh, i think those -O flags are a 'good thing', it is 'free' performance so long as the app fits in flash; we don't necessarily need to limit ourselves to the -Os default. even the maple mini and various blue pills have 128k flash, that's plenty for small apps
Last edited by ag123 on Tue Apr 18, 2017 6:38 pm, edited 3 times in total.
