Dhrystone and Whetstone Benchmarks for STM32F103

Pito
Posts: 1593
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by Pito » Tue Apr 18, 2017 6:29 pm

makes me wonder if i'd need to standby a fire extinguisher
You will notice almost no temperature change (maybe 4 °C). It seems overclocking is not your hobby :)
You do not need to install CubeMX, just follow the picture and hints above :)
When everything is set properly you should see ~374 VAXMIPS (-O3).. EDIT: :lol:
About the memory controller, I understand the F1 just has a prefetch buffer, while the F407 has a larger buffer + branch prediction or other perks that increase the speed further.
The F4 includes the "ART" accelerator, which claims to get the 5 flash waitstates down to 0. That maybe makes the F4 more efficient when optimizing with -O3..
Last edited by Pito on Tue Apr 18, 2017 11:05 pm, edited 1 time in total.
Pukao Hats Cleaning Services Ltd.

Pito
Posts: 1593
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by Pito » Tue Apr 18, 2017 9:34 pm

My overclocking results for Black F407ZET:

-Os, -g, eabi-4.8.3-2014q1, 240MHz clock

Code: Select all

Dhrystone Benchmark, Version 2.1 (Language: C)
Execution starts, 3000000 runs through Dhrystone

Execution ends
Microseconds for one run through Dhrystone: 2.50
Dhrystones per Second: 399236.18
VAX MIPS rating = 227.23
Note: for reference - 159 VAXMIPS @168MHz with above settings
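
For anyone checking the numbers: Dhrystone 2.1 derives the VAX MIPS rating by dividing the Dhrystones-per-second figure by 1757, the score of the VAX-11/780 reference machine. Below is a minimal sketch in C that reproduces the rating above and applies a naive linear clock scaling to the 168 MHz reference. The helper names are made up for illustration and are not part of the benchmark source.

```c
#include <stdio.h>

/* Dhrystones per second of the VAX-11/780, the "1 MIPS" reference machine. */
#define VAX_DHRYSTONES_PER_SEC 1757.0

/* Hypothetical helper: convert Dhrystones per second to VAX MIPS (DMIPS). */
double vax_mips(double dhrystones_per_sec)
{
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC;
}

/* Hypothetical helper: scale a rating linearly with the core clock. */
double scale_with_clock(double mips, double from_mhz, double to_mhz)
{
    return mips * to_mhz / from_mhz;
}

/* Print the two figures discussed in this post. */
void print_dhrystone_check(void)
{
    printf("VAX MIPS from 399236.18 Dhrystones/s: %.2f\n",
           vax_mips(399236.18));
    printf("159 VAXMIPS @168MHz scaled to 240MHz: %.2f\n",
           scale_with_clock(159.0, 168.0, 240.0));
}
```

The first line comes out at 227.23, matching the log, and the scaled reference lands at about 227.14, so at -Os the result tracks the clock almost exactly.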

-O2, -g, eabi-4.8.3-2014q1, 240MHz clock

Code: Select all

Dhrystone Benchmark, Version 2.1 (Language: C)
Execution starts, 3000000 runs through Dhrystone

Execution ends
Microseconds for one run through Dhrystone: 1.43
Dhrystones per Second: 699455.10
VAX MIPS rating = 398.10

-O3, -g, eabi-4.8.3-2014q1, 240MHz clock

Code: Select all

Dhrystone Benchmark, Version 2.1 (Language: C)
Execution starts, 3000000 runs through Dhrystone

Execution ends
Microseconds for one run through Dhrystone: 1.23
Dhrystones per Second: 813388.81
VAX MIPS rating = 462.94
Thinking what went wrong here.. :) Orbital resonance??
Last edited by Pito on Tue Apr 18, 2017 11:07 pm, edited 2 times in total.
Pukao Hats Cleaning Services Ltd.

Pito
Posts: 1593
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by Pito » Tue Apr 18, 2017 10:01 pm

Whetstone tells a different story (Black F407ZET):

-Os, -g, eabi-4.8.3-2014q1, 240MHz clock

Code: Select all

Loops: 1000, Iterations: 1, Duration: 4375 millisec.
C Converted Double Precision Whetstones: 22.86 MIPS
Note: for reference - 16.0 MIPS @168MHz with above settings
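
The classic C Whetstone source computes its rating as KIPS = 100 × loops × iterations / elapsed seconds, and reports MIPS once that passes 1000. Assuming this port keeps that formula (the printed numbers are consistent with it), the rating can be recomputed from the loop count and duration alone. The function name below is illustrative, not taken from the sketch used here.

```c
/* Rating formula of the classic C Whetstone: one unit of the loop count
 * corresponds to 100 thousand Whetstone instructions.
 * whetstone_mips() is an illustrative name, not from the posted sketch. */
double whetstone_mips(long loops, int iterations, double duration_ms)
{
    double seconds = duration_ms / 1000.0;
    double kips = 100.0 * (double)loops * (double)iterations / seconds;
    return kips / 1000.0; /* KIPS -> MIPS */
}
```

With 1000 loops, 1 iteration and 4375 ms this gives about 22.86, the same value as in the log above.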

-O2, -g, eabi-4.8.3-2014q1, 240MHz clock

Code: Select all

Loops: 1000, Iterations: 1, Duration: 2770 millisec.
C Converted Double Precision Whetstones: 36.10 MIPS

-O3, -g, eabi-4.8.3-2014q1, 240MHz clock

Code: Select all

Loops: 1000, Iterations: 1, Duration: 2665 millisec.
C Converted Double Precision Whetstones: 37.52 MIPS
[Attachment: DHRY and WHET on STM32F407.JPG (29.6 KiB)]
Pukao Hats Cleaning Services Ltd.

ag123
Posts: 770
Joined: Thu Jul 21, 2016 4:24 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by ag123 » Wed Apr 19, 2017 2:52 am

wow :D

on a side note:
http://www.st.com/resource/en/datasheet/stm32f407ve.pdf
2.2.2 Adaptive real-time memory accelerator (ART Accelerator™)
The ART Accelerator™ is a memory accelerator which is optimized for STM32 industry-standard ARM® Cortex®-M4 with FPU processors. It balances the inherent performance advantage of the ARM Cortex-M4 with FPU over Flash memory technologies, which normally requires the processor to wait for the Flash memory at higher frequencies.
To release the processor full 210 DMIPS performance at this frequency, the accelerator implements an instruction prefetch queue and branch cache, which increases program execution speed from the 128-bit Flash memory. Based on CoreMark benchmark, the performance achieved thanks to the ART accelerator is equivalent to 0 wait state program execution from Flash memory at a CPU frequency up to 168 MHz.
and oh our tests with -O3 1 VAX MIPS on stm32f103
http://www.stm32duino.com/viewtopic.php ... 120#p26581
also puts the maple mini / blue pill at 70x the DMIPS of the much bulkier VAX-11/780 or IBM System/360 :D
https://en.wikipedia.org/wiki/VAX
For a while the VAX-11/780 was used as a standard in CPU benchmarks. It was initially described as a one-MIPS machine, because its performance was equivalent to an IBM System/360 that ran at one MIPS

Pito
Posts: 1593
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by Pito » Wed Apr 19, 2017 8:37 am

FYI - the Single Precision Whetstone on the Black F407ZET at 240MHz clock:
-Os

Code: Select all

Loops: 1000, Iterations: 10, Duration: 28744 millisec.
C Converted Single Precision Whetstones: 34.79 MIPS

-O3

Code: Select all

Loops: 1000, Iterations: 10, Duration: 17315 millisec.
C Converted Single Precision Whetstones: 57.75 MIPS
The Next step: switch on FPU (not easy, btw.)
Pukao Hats Cleaning Services Ltd.

ag123
Posts: 770
Joined: Thu Jul 21, 2016 4:24 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by ag123 » Wed Apr 19, 2017 11:27 am

i'm getting various compile troubles too :lol:

Code: Select all

failed to merge target specific data of file /opt4/opt/gcc-arm-none-eabi-6-2017-q1-update/bin/../lib/gcc/arm-none-eabi/6.3.1/thumb/v7e-m/fpv4-sp/hard/libgcc.a(_udivmoddi4.o)

seems like gcc is trying to link to an armv7e-m lib but has problems doing so, i guess no worries, we can always circle back again once the perfect flag or solution is found :D

googled and found a link on "stack overflow" might be useful
http://stackoverflow.com/questions/1676 ... t-behavior

oh, if i do a 'clean' in eclipse my errors seem to go away ... :P
Attachments
ardshrink2.png (46.5 KiB): fp-hard settings in eclipse
Last edited by ag123 on Wed Apr 19, 2017 2:18 pm, edited 1 time in total.

stevestrong
Posts: 1748
Joined: Mon Oct 19, 2015 12:06 am
Location: Munich, Germany

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by stevestrong » Wed Apr 19, 2017 11:54 am

Pito wrote:The Next step: switch on FPU (not easy, btw.)
Add these flags to build:

Code: Select all

-mfloat-abi=hard -mfpu=fpv4-sp-d16
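
One way to confirm which floating-point ABI a build actually ended up with is to look at the macros the compiler predefines. The sketch below relies on `__SOFTFP__` and `__ARM_PCS_VFP`, which to my knowledge arm-none-eabi-gcc defines for `-mfloat-abi=soft` and `-mfloat-abi=hard` respectively; `float_abi_name()` itself is a hypothetical helper, not something from the sketches in this thread.

```c
/* Report which floating-point ABI the compiler was told to use.
 * __SOFTFP__ and __ARM_PCS_VFP are predefined by arm-none-eabi-gcc;
 * float_abi_name() is a hypothetical helper for illustration. */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not an ARM build";
#elif defined(__ARM_PCS_VFP)
    return "hard";   /* FPU instructions, floats passed in FPU registers */
#elif defined(__SOFTFP__)
    return "soft";   /* no FPU instructions, software routines only */
#else
    return "softfp"; /* FPU instructions, soft-float calling convention */
#endif
}
```

Printing the returned string once at startup over the serial console shows whether the flags really reached the compiler.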

ag123
Posts: 770
Joined: Thu Jul 21, 2016 4:24 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by ag123 » Wed Apr 19, 2017 1:10 pm

stm32f407vet, compile -Os (optimise size), no debug
software floating point, double precision

Code: Select all

Beginning Whetstone benchmark at 168 MHz ...

Loops:1000, Iterations:1, Duration:10356.75 millisec
C Converted Double Precision Whetstones:9.66 mflops
surprisingly slow :lol:

compile -O2 (optimise more), no debug
software floating point, double precision

Code: Select all

Beginning Whetstone benchmark at 168 MHz ...

Loops:1000, Iterations:1, Duration:7811.58 millisec
C Converted Double Precision Whetstones:12.80 mflops

compile -O3 (optimise most), no debug
software floating point, double precision

Code: Select all

Beginning Whetstone benchmark at 168 MHz ...

Loops:1000, Iterations:1, Duration:7801.82 millisec
C Converted Double Precision Whetstones:12.82 mflops

hardware doubles
if -mfloat-abi=hard -mfpu=fpv4-sp-d16 is selected, the sketch seems to 'hang' :o leave debug for another day

single precision:
compile -Os (optimise size), no debug
software floating point, single precision

Code: Select all

Beginning Whetstone benchmark at 168 MHz ...

Loops:1000, Iterations:1, Duration:7033.78 millisec
C Converted Single Precision Whetstones:14.22 mflops

compile -O2 (optimise more), no debug
software floating point, single precision

Code: Select all

Beginning Whetstone benchmark at 168 MHz ...

Loops:1000, Iterations:1, Duration:5449.36 millisec
C Converted Single Precision Whetstones:18.35 mflops

compile -O3 (optimise most), no debug
software floating point, single precision

Code: Select all

Beginning Whetstone benchmark at 168 MHz ...

Loops:1000, Iterations:1, Duration:5474.52 millisec
C Converted Single Precision Whetstones:18.27 mflops

-O2 and -O3 show little difference (in fact -O3 is slightly slower), not sure if it is a finger error :lol:

---
- uploaded whetstone.zip (attachment)
usage:
connect to usb-serial console, press 'g' to start

fixes
- v0.5 fixes for single precision: call single precision math lib functions instead of double precision functions;
  added code to detect whether software or hardware floating point is selected
- v0.4 added FPU enabling code
- v0.3 fixed a dangling double that made the single precision run take longer than expected
- v0.2 fixed some syntax errors
- v0.1 uploaded whetstone.zip (attachment)
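
The v0.5 and v0.3 items describe a common C pitfall: a single unsuffixed literal or a call to `sin()` instead of `sinf()` quietly turns a float expression into a double one. A minimal sketch that asks the compiler for the type of each expression, assuming `float` and `double` have different sizes (true for arm-none-eabi); the macro and function names are illustrative and not taken from whetstone.zip.

```c
#include <math.h>

/* sizeof does not evaluate its operand, so these checks only ask the
 * compiler which type each expression has. Names are illustrative. */
#define IS_FLOAT_EXPR(e)  (sizeof(e) == sizeof(float))
#define IS_DOUBLE_EXPR(e) (sizeof(e) == sizeof(double))

/* Returns 1 when every expression has the type named in its comment. */
int single_precision_checks(void)
{
    float x = 0.5f;
    (void)x;
    return IS_FLOAT_EXPR(x * 0.5f)  /* float literal: stays single precision */
        && IS_DOUBLE_EXPR(x * 0.5)  /* unsuffixed literal: promoted to double */
        && IS_FLOAT_EXPR(sinf(x))   /* single-precision math call */
        && IS_DOUBLE_EXPR(sin(x));  /* double-precision math call */
}
```

Any expression that lands in the double column is computed in double precision even when the result is stored in a float.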
Attachments
whetstone.zip (4.45 KiB)
Last edited by ag123 on Wed Apr 19, 2017 9:19 pm, edited 13 times in total.

Pito
Posts: 1593
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by Pito » Wed Apr 19, 2017 1:27 pm

stevestrong wrote:
Pito wrote:The Next step: switch on FPU (not easy, btw.)
Add these flags to build:

Code: Select all

-mfloat-abi=hard -mfpu=fpv4-sp-d16
I messed with those flags (and with soft/softfp, etc.) till 5.20am today.. Not so easy.. :(
That works in a CMSIS environment only (we do not have it here). There is a lot of stuff to be done.
1. you must include "arm_math.h" from CMSIS (and you have to have all the FPU CMSIS stuff installed)
2. you must enable the lib and add it to the makefile
3. you must enable the FPU (a few lines of asm when not using systeminit from CMSIS)
4. add the flags and some -D switches..
I will start a separate topic for the FPU exercise.
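
Step 3 can also be written in C. On the Cortex-M4 the FPU is switched on by granting full access to coprocessors CP10 and CP11 in the CPACR register at 0xE000ED88 (bits 20 to 23). The following is only a sketch under that assumption, not the startup code of any core discussed here; `fpu_grant_access()` and the macro names are made up, and the register is passed as a pointer so the bit manipulation can also be tried off-target.

```c
#include <stdint.h>

/* CPACR (Coprocessor Access Control Register) address on Cortex-M4. */
#define CPACR_ADDR       0xE000ED88u
/* Bits 20-23: full access to coprocessors CP10 and CP11 (the FPU). */
#define CPACR_CP10_CP11  (0xFu << 20)

/* Grant full access to the FPU. Hypothetical helper: the register
 * pointer is a parameter so the logic can be exercised off-target. */
void fpu_grant_access(volatile uint32_t *cpacr)
{
    *cpacr |= CPACR_CP10_CP11;
#if defined(__arm__) && defined(__thumb2__)
    /* Make sure the write has taken effect before any FPU instruction. */
    __asm__ volatile ("dsb\n\tisb" ::: "memory");
#endif
}
```

On the target this would be called as `fpu_grant_access((volatile uint32_t *)CPACR_ADDR);` very early in startup, before any code that touches a float runs.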
Pukao Hats Cleaning Services Ltd.

ag123
Posts: 770
Joined: Thu Jul 21, 2016 4:24 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by ag123 » Wed Apr 19, 2017 2:03 pm

hardware float hangs :o , leave debug for another day, think pito is right, not easy (at least not 'automatic') :lol: - need code to enable the fpu

stm32f407vet, compile -Os (optimise size), no debug
hardware floating point, single precision

Code: Select all

Beginning Whetstone benchmark at 168 MHz ...

Loops:1000, Iterations:1, Duration:1558.46 millisec
C Converted Single Precision Whetstones:64.17 mflops

compile -O2 (optimise more), no debug
hardware floating point, single precision

Code: Select all

Beginning Whetstone benchmark at 168 MHz ...

Loops:1000, Iterations:1, Duration:1211.56 millisec
C Converted Single Precision Whetstones:82.54 mflops

compile -O3 (optimise most), no debug
hardware floating point, single precision

Code: Select all

Beginning Whetstone benchmark at 168 MHz ...

Loops:1000, Iterations:1, Duration:1123.15 millisec
C Converted Single Precision Whetstones:89.04 mflops

stm32f407vet, compile -Os (optimise size), no debug
hardware floating point, double precision

Code: Select all

Beginning Whetstone benchmark at 168 MHz ...

Loops:1000, Iterations:1, Duration:10394.72 millisec
C Converted Double Precision Whetstones:9.62 mflops

compile -O2 (optimise more), no debug
hardware floating point, double precision

Code: Select all

Beginning Whetstone benchmark at 168 MHz ...

Loops:1000, Iterations:1, Duration:7830.20 millisec
C Converted Double Precision Whetstones:12.77 mflops

compile -O3 (optimise most), no debug
hardware floating point, double precision

Code: Select all

Beginning Whetstone benchmark at 168 MHz ...

Loops:1000, Iterations:1, Duration:7892.63 millisec
C Converted Double Precision Whetstones:12.67 mflops
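
A note on why only the single-precision runs speed up: the "sp" in -mfpu=fpv4-sp-d16 stands for a single-precision-only FPU, so double arithmetic still goes through the software routines even with hardware float selected. A small sketch to put a number on it, using the -O3 durations from this post and the earlier software floating point post; `speedup()` is an illustrative helper, not part of whetstone.zip.

```c
/* Ratio of two run times; a value above 1.0 means the second run
 * was faster. speedup() is an illustrative helper. */
double speedup(double before_ms, double after_ms)
{
    return before_ms / after_ms;
}
```

For single precision at -O3, `speedup(5474.52, 1123.15)` comes out at roughly 4.9, while for double precision `speedup(7801.82, 7892.63)` stays at about 0.99, i.e. no gain.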
Last edited by ag123 on Wed Apr 19, 2017 8:05 pm, edited 11 times in total.
