Dhrystone and Whetstone Benchmarks for STM32F103

ag123
Posts: 742
Joined: Thu Jul 21, 2016 4:24 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by ag123 » Sun Apr 23, 2017 6:05 am

@victor
The results you've shown are for double precision. For single precision, edit whetstone.cpp:

Code: Select all

/* default is double precision, define this for single precision */
#undef SINGLE_PRECISION
^^ Note that this is #undef; change it to:

Code: Select all

/* default is double precision, define this for single precision */
#define SINGLE_PRECISION
;)
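A minimal sketch of how a switch like SINGLE_PRECISION typically selects the working type. The names real_t, REAL_C, half_of and real_size are hypothetical and only illustrate the pattern; the actual whetstone.cpp may be organised differently.

```c
#include <stddef.h>

/* default is double precision, define this for single precision */
#define SINGLE_PRECISION

#ifdef SINGLE_PRECISION
typedef float real_t;        /* hypothetical alias for the benchmark's working type */
#define REAL_C(x) x##f       /* appends the f suffix so constants stay float */
#else
typedef double real_t;
#define REAL_C(x) x
#endif

/* Example arithmetic carried out entirely in the selected precision. */
real_t half_of(real_t x)
{
    return x * REAL_C(0.5);
}

/* Reports the size of the selected type (4 bytes for float on Cortex-M4). */
size_t real_size(void)
{
    return sizeof(real_t);
}
```

With the macro left undefined, the same code computes in double, which the single-precision FPU of the F407 cannot accelerate.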

victor_pv
Posts: 1654
Joined: Mon Apr 27, 2015 12:12 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by victor_pv » Sun Apr 23, 2017 3:03 pm

ag123 wrote:@victor
The results you've shown are for double precision. For single precision, edit whetstone.cpp:

Code: Select all

/* default is double precision, define this for single precision */
#undef SINGLE_PRECISION
^^ Note that this is #undef; change it to:

Code: Select all

/* default is double precision, define this for single precision */
#define SINGLE_PRECISION
;)
Just noticed when I tried to use the hardware FPU and got the same results... :lol:

Now this is with the same options as my previous post, except using hard FPU with soft calls and single precision:

Code: Select all

Starting test
Beginning Whetstone benchmark at 168 MHz ...
Loops:1000, Iterations:1, Duration:2109.07 millisec
C Converted Single Precision Whetstones:47.41 mflops
Now, those numbers seem much lower than what you posted. To be clear, I used the following:
168 MHz
-Os
-mfloat-abi=softfp -mfpu=fpv4-sp-d16 -fsingle-precision-constant
Changed line 14 of whetstone.cpp from
#undef SINGLE_PRECISION
to
#define SINGLE_PRECISION
Added the assembler code to enable the FPU in start_c.c
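For reference, enabling the FPU on a Cortex-M4 comes down to granting full access to coprocessors CP10 and CP11 in the CPACR register. The sketch below is a C equivalent, not the assembler that was actually added to start_c.c. The function name fpu_enable is hypothetical, and the register is passed in as a pointer only so that the logic can also be exercised off-target.

```c
#include <stdint.h>

/* CPACR address in the Cortex-M4 System Control Block. */
#define CPACR_ADDR            0xE000ED88u
/* Bits 20..23: full access for coprocessors CP10 and CP11 (the FPU). */
#define CPACR_CP10_CP11_FULL  (0xFu << 20)

/* Hypothetical helper: sets the FPU access bits in the given register. */
static inline void fpu_enable(volatile uint32_t *cpacr)
{
    *cpacr |= CPACR_CP10_CP11_FULL;
#if defined(__ARM_ARCH_7EM__) && defined(__GNUC__)
    /* On the target, make sure the write takes effect before any FP instruction. */
    __asm volatile ("dsb\n\tisb" ::: "memory");
#endif
}
```

On the target this would be called as fpu_enable((volatile uint32_t *)CPACR_ADDR) early in startup. With -mfloat-abi=softfp or -mfloat-abi=hard the compiler may emit FPU instructions, and executing one before the FPU is enabled raises a UsageFault.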

EDIT: Hard fpu with hard calls and 2000 loops:

Code: Select all

Starting test
Beginning Whetstone benchmark at 168 MHz ...
Loops:2000, Iterations:1, Duration:1664.73 millisec
C Converted Single Precision Whetstones:120.14 mflops
Last edited by victor_pv on Sun Apr 23, 2017 3:27 pm, edited 1 time in total.
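The scores in this thread are consistent with the classic Whetstone rating, 100 x loops x iterations / seconds, which gives thousands of Whetstone instructions per second; dividing by 1000 gives the figure that is printed here as "mflops". A small sketch of that arithmetic follows. The function name whetstone_mwips is hypothetical, and the formula is inferred from the posted numbers rather than read from the source.

```c
/* Converts a Whetstone run into millions of Whetstone instructions per second.
   Assumes the classic rating: 100 * loops * iterations / seconds = KWIPS. */
double whetstone_mwips(long loops, long iterations, double duration_ms)
{
    double seconds = duration_ms / 1000.0;
    double kwips = 100.0 * (double)loops * (double)iterations / seconds;
    return kwips / 1000.0;
}
```

For example, whetstone_mwips(2000, 1, 1664.73) comes out at about 120.14 and whetstone_mwips(1000, 1, 2109.07) at about 47.41, matching the two runs above.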

ag123
Posts: 742
Joined: Thu Jul 21, 2016 4:24 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by ag123 » Sun Apr 23, 2017 3:27 pm

try -O2 (optimise more) and -O3 (optimise most) and turn debug off :D
note that i compiled with -mfloat-abi=hard when i'm doing the fpu benchmark tests, this probably pushed up the benchmark scores as well
---
oh wow 120 mflops that last result :D

victor_pv
Posts: 1654
Joined: Mon Apr 27, 2015 12:12 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by victor_pv » Sun Apr 23, 2017 3:31 pm

ag123 wrote:try -O2 (optimise more) and -O3 (optimise most) and turn debug off :D
note that i compiled with -mfloat-abi=hard when i'm doing the fpu benchmark tests, this probably pushed up the benchmark scores as well
Well, I am now using the hardware FPU for sure, since the result climbed to 120 mflops, but it seems that I am getting a higher result than you did while using the same flags :?

-Os
Hard FPU with hard calls
168 MHz

How do you turn debug off?
So far I am using code that I can debug with Ozone, so my guess is my debug settings are on.

ag123
Posts: 742
Joined: Thu Jul 21, 2016 4:24 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by ag123 » Sun Apr 23, 2017 3:34 pm

compile without the -g :D

victor_pv
Posts: 1654
Joined: Mon Apr 27, 2015 12:12 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by victor_pv » Sun Apr 23, 2017 4:19 pm

F407
168 MHz
Hard FPU with hard calls
Single precision
-O0

Code: Select all

Starting test
Beginning Whetstone benchmark at 168 MHz ...
Loops:2000, Iterations:1, Duration:3946.32 millisec
C Converted Single Precision Whetstones:50.68 mflops
Last edited by victor_pv on Sun Apr 23, 2017 4:22 pm, edited 1 time in total.

Pito
Posts: 1531
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by Pito » Sun Apr 23, 2017 4:20 pm

-Os, no -g, eabi-4.8.3-2014q1, -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant, 168MHz clock
old repo, classic source in 1 file.

Code: Select all

Loops: 1000Iterations: 10Duration: 9984 millisec.   1677419955 clocks
C Converted Single Precision Whetstones: 100.16 MIPS
Last edited by Pito on Sun Apr 23, 2017 5:30 pm, edited 3 times in total.
Pukao Hats Cleaning Services Ltd.
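A short illustration of why -fsingle-precision-constant matters with -mfpu=fpv4-sp-d16. In C an unsuffixed constant such as 1.1 has type double, so multiplying a float by it is done in double precision, which this single-precision FPU cannot do in hardware. The function names below are hypothetical; this is a sketch of the language rule, not code from the benchmark.

```c
#include <stddef.h>

/* 1.1 is a double constant: x is promoted and the multiply runs in double. */
float scale_promoted(float x)
{
    return x * 1.1;
}

/* 1.1f is a float constant: the multiply stays in single precision. */
float scale_single(float x)
{
    return x * 1.1f;
}

/* Sizes of the two kinds of constant as the compiler sees them. */
size_t unsuffixed_constant_size(void) { return sizeof(1.1); }
size_t suffixed_constant_size(void)   { return sizeof(1.1f); }
```

With -fsingle-precision-constant GCC treats the unsuffixed constant as single precision too, so both functions can use the hardware FPU.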

victor_pv
Posts: 1654
Joined: Mon Apr 27, 2015 12:12 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by victor_pv » Sun Apr 23, 2017 4:24 pm

Pito wrote:-Os, no -g, eabi-4.8.3-2014q1, -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant, 168MHz clock

Code: Select all

Loops: 1000Iterations: 10Duration: 9985 millisec.   1677420928 clocks
C Converted Single Precision Whetstones: 100.15 MIPS
HA!
I beat you by 20Mflops :D

EDITED
-Os, no -g, eabi-4.8.3-2014q1, -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant, 168MHz clock
Also changed to 1000 loops and 10 iterations to better match Pito.

Code: Select all

Beginning Whetstone benchmark at 168 MHz ...
Loops:1000, Iterations:10, Duration:9420.46 millisec
C Converted Single Precision Whetstones:106.15 mflops
But I noticed your output shows MIPS rather than mflops; are you using the same code?

Now, all the same as above, except using CCM RAM for the .bss area (uninitialized variables):

Code: Select all

Starting test
Beginning Whetstone benchmark at 168 MHz ...
Loops:1000, Iterations:10, Duration:8634.16 millisec
C Converted Single Precision Whetstones:115.82 mflops
That is a gain of about 9 mflops with the hardware FPU. I need to run the same comparison (.bss in SRAM vs CCM RAM) with softfp; it seems there are good gains from using CCM as much as possible.
Last edited by victor_pv on Sun Apr 23, 2017 4:47 pm, edited 1 time in total.
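The post does not show how the .bss area was moved to CCM RAM; moving the whole area is normally a linker script change. A per-variable alternative with GCC is a section attribute, sketched below. The section name .ccmram, the macro CCM_BSS and the function names are assumptions: the name must match an output section that the linker script maps to the CCM region (0x10000000 on the STM32F407).

```c
#include <stddef.h>

/* Hypothetical section name: it must match an output section that the
   linker script maps to CCM RAM (0x10000000 on the STM32F407). */
#if defined(__arm__) && defined(__GNUC__)
#define CCM_BSS __attribute__((section(".ccmram")))
#else
#define CCM_BSS /* host build: ordinary RAM, so the sketch still compiles */
#endif

#define WORK_LEN 64u

/* Uninitialized working array, the kind of data that normally lands in .bss. */
CCM_BSS static float work_buf[WORK_LEN];

/* Custom sections are often not zeroed by startup code, so clear explicitly. */
void work_clear(void)
{
    for (size_t i = 0; i < WORK_LEN; i++) {
        work_buf[i] = 0.0f;
    }
}

/* Adds v to every element and returns the sum of the array. */
float work_add_and_sum(float v)
{
    float sum = 0.0f;
    for (size_t i = 0; i < WORK_LEN; i++) {
        work_buf[i] += v;
        sum += work_buf[i];
    }
    return sum;
}
```

Keep in mind that on the F407 the CCM RAM is reachable only by the CPU, so buffers used by DMA have to stay in normal SRAM.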

Pito
Posts: 1531
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by Pito » Sun Apr 23, 2017 4:32 pm

-Os, no -g, eabi-4.8.3-2014q1, -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant, 240MHz clock, Black 407ZET

Code: Select all

Loops: 1000Iterations: 10Duration: 6988 millisec.   1677248483 clocks
C Converted Single Precision Whetstones: 143.10 MIPS
Using the standard code. Indicate your eabi-arm version, plz (add the version to your above results).
8-)

With Whetstone from ag123 I get:
-Os, no -g, eabi-4.8.3-2014q1, -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant, 168MHz clock, Black 407ZET

Code: Select all

press 'g' to start
Beginning Whetstone benchmark at 168 MHz ...

Loops:1000, Iterations:10, Duration:9634.95 millisec
C Converted Single Precision Whetstones:103.79 mflops
There are 3 major differences from ag123's version:
1. Mine measures time in millis().
2. Mine is in 1 single file, while ag123's is split into 3. So splitting the sources makes a difference in timing.. :?
3. Also, you are most probably using the new improved F407 repo. I still work with the 1-year-old one, so that is another source of lower performance (that 2.5 MIPS difference).
Pukao Hats Cleaning Services Ltd.
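To compare runs made at different clock speeds, it helps to normalise the score per MHz and to cross-check the reported clock count against the duration. The sketch below uses hypothetical function names and only the figures posted above.

```c
/* Score per MHz of core clock, so 168 MHz and 240 MHz runs can be compared. */
double mwips_per_mhz(double mwips, double clock_mhz)
{
    return mwips / clock_mhz;
}

/* Converts a CPU clock count into milliseconds at the given core clock. */
double clocks_to_ms(double clocks, double clock_mhz)
{
    return clocks / (clock_mhz * 1000.0);
}
```

With the standard code, 100.16 at 168 MHz and 143.10 at 240 MHz both work out to roughly 0.596 per MHz, which suggests that the overclock scales the result linearly. The clock counts agree with the timings as well: 1677419955 clocks at 168 MHz is about 9984.6 ms, and 1677248483 clocks at 240 MHz is about 6988.5 ms.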

victor_pv
Posts: 1654
Joined: Mon Apr 27, 2015 12:12 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by victor_pv » Sun Apr 23, 2017 5:26 pm

Pito wrote:-Os, no -g, eabi-4.8.3-2014q1, -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant, 240MHz clock, Black 407ZET

Code: Select all

Loops: 1000Iterations: 10Duration: 6988 millisec.   1677248483 clocks
C Converted Single Precision Whetstones: 143.10 MIPS
Using the standard code. Indicate your eabi-arm version, plz (add the version to your above results).
8-)

With Whetstone from ag123 I get:
-Os, no -g, eabi-4.8.3-2014q1, -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant, 168MHz clock, Black 407ZET

Code: Select all

press 'g' to start
Beginning Whetstone benchmark at 168 MHz ...

Loops:1000, Iterations:10, Duration:9634.95 millisec
C Converted Single Precision Whetstones:103.79 mflops
There are 2 major differences from ag123's version: mine measures time in millis(), and mine is in 1 single file, while ag123's is split into 3. So splitting the sources makes a difference in timing.. :?
Also, you are most probably using the new improved F407 repo. I still work with the 1-year-old one, so that is another source of lower performance.
Splitting the source definitely has some effect on how the linker does things; we have seen it before.
I am using my F407 core based on the latest libmaple. It has differences, but nothing related to speed, except that I use CCM RAM for the pinmap. Moving the uninitialized variables to CCM gains still more mflops; I updated my post above.
