Dhrystone and Whetstone Benchmarks for STM32F103

Pito
Posts: 1593
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by Pito » Wed Apr 19, 2017 4:34 pm

@ag123: I doubt your single precision results are ok. Double check your source. It seems to me you are still computing in double precision.
Pukao Hats Cleaning Services Ltd.

ag123
Posts: 770
Joined: Thu Jul 21, 2016 4:24 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by ag123 » Wed Apr 19, 2017 4:54 pm

thanks, would take a look, thanks for posting about the fpu
http://www.stm32duino.com/viewtopic.php?f=39&t=2001
i'd think we can circle back on fpu another time

oh, the whetstone code had a leftover double local variable in one subroutine; fixed, and the update is attached a few posts back

surprisingly the -O2 and -O3 results remained unchanged, while the -Os (optimise for size) result improved
Last edited by ag123 on Wed Apr 19, 2017 5:26 pm, edited 4 times in total.

Bingo600
Posts: 9
Joined: Thu Mar 16, 2017 5:25 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by Bingo600 » Wed Apr 19, 2017 4:54 pm

@ag123

Did you see answer 3 above?
You must enable the FPU (a few lines of asm when not using SystemInit() from CMSIS).

Example
https://www.mikrocontroller.net/topic/261021#3959896

Doc
http://www.st.com/resource/en/applicati ... 047230.pdf


Some more info about hw-float
https://visualgdb.com/tutorials/arm/stm32/fpu/


/Bingo

ag123
Posts: 770
Joined: Thu Jul 21, 2016 4:24 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by ag123 » Wed Apr 19, 2017 5:43 pm

thanks @Bingo600 and @pito
the first benchmarks on the hardware fpu are in, i'm updating the results a couple of posts back ... :D

Pito
Posts: 1593
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by Pito » Wed Apr 19, 2017 6:48 pm

@ag123: you are still doing double precision :) Double check your source.. HINT: sin() and sinf() are not the same..
There is a special thread on the FPU. The FPU Enable functions are already there, but it is not enough, it seems.
Pukao Hats Cleaning Services Ltd.

ag123
Posts: 770
Joined: Thu Jul 21, 2016 4:24 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by ag123 » Wed Apr 19, 2017 7:03 pm

thanks pito
would check it again

the code to enable the fpu is as follows
http://infocenter.arm.com/help/topic/co ... BJHIG.html

//enable the fpu (cortex-m4 - stm32f4* and above)
void enablefpu(void)
{
  __asm volatile
  (
    "  ldr.w r0, =0xE000ED88     \n" /* The FPU enable bits are in the CPACR. */
    "  ldr r1, [r0]              \n" /* Read CPACR */
    "  orr r1, r1, #( 0xf << 20 )\n" /* Set bits 20-23 to enable CP10 and CP11 coprocessors */
    "  str r1, [r0]              \n" /* Write back the modified value to the CPACR */
    "  dsb                       \n" /* Wait for store to complete */
    "  isb                       \n" /* Reset pipeline now the FPU is enabled */
    :                                /* no outputs */
    :                                /* no inputs */
    : "r0", "r1", "memory"           /* tell the compiler what the asm clobbers */
  );
}

updated the results for -O2 and -O3 optimizations, there doesn't seem to be much difference between -O2 and -O3

i've added the above code to whetstone.zip a couple of posts back, and am continuing to work on the single precision code
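For reference, the same CPACR update can be sketched in C. This is only an illustration of what the asm does, not the code in whetstone.zip: the helper names are hypothetical, and the register write is guarded so the bit arithmetic can also be checked on a PC.

```c
#include <stdint.h>

/* CPACR address, same value as loaded by the asm version. */
#define CPACR_ADDR       0xE000ED88u
/* Bits 20-23: full access for coprocessors CP10 and CP11 (the FPU). */
#define CPACR_CP10_CP11  (0xFu << 20)

/* Pure helper: the value to write back, given the current CPACR contents. */
uint32_t cpacr_with_fpu_enabled(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11;
}

/* Pure helper: are both CP10 and CP11 set to full access? */
int cpacr_fpu_is_enabled(uint32_t cpacr)
{
    return (cpacr & CPACR_CP10_CP11) == CPACR_CP10_CP11;
}

#if defined(__arm__)
/* Target-only part: read-modify-write of the real register, then barriers. */
void enablefpu_c(void)
{
    volatile uint32_t *cpacr = (volatile uint32_t *)CPACR_ADDR;
    *cpacr = cpacr_with_fpu_enabled(*cpacr);
    __asm volatile ("dsb\n isb" ::: "memory");
}
#endif
```

With the CMSIS headers available, the same write is usually expressed through SCB->CPACR. Either way it has to run before the first floating point instruction executes.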

i suspect that i'm linking against libraries that use software floating point, so if those were substituted with the CMSIS hard-fp math libraries there may be a significant improvement
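One way to check what a given build was actually compiled for is to look at the macros GCC predefines for ARM targets. The sketch below is an assumption about how one might report it from the benchmark, with hypothetical function names. It only tells you about the translation unit it is compiled in, not about prebuilt libraries that get linked in.

```c
/* Report which floating point ABI this translation unit was compiled for.
   __ARM_PCS_VFP and __SOFTFP__ are predefined by GCC on 32-bit ARM targets. */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not a 32-bit ARM build";
#elif defined(__ARM_PCS_VFP)
    return "hard";    /* -mfloat-abi=hard: FP arguments passed in FPU registers */
#elif defined(__SOFTFP__)
    return "soft";    /* -mfloat-abi=soft: all FP done by library calls */
#else
    return "softfp";  /* FPU instructions used, soft-float calling convention */
#endif
}

/* __ARM_FP is a bit mask of the precisions supported in hardware:
   0x04 is single precision, 0x08 is double precision. */
int has_hw_single_precision(void)
{
#if defined(__ARM_FP)
    return (__ARM_FP & 0x04) != 0;
#else
    return 0;
#endif
}
```

Printing float_abi_name() at the start of the benchmark run makes it obvious whether flags such as -mfloat-abi=hard -mfpu=fpv4-sp-d16 really reached the compiler.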

ag123
Posts: 770
Joined: Thu Jul 21, 2016 4:24 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by ag123 » Wed Apr 19, 2017 7:47 pm

ok, fixed the single precision build to call the single precision math lib functions and updated the hardware float results, now the single precision floating point performance looks much better :D

hardware floating point additional tests with -fsingle-precision-constant compilation flag
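A short sketch of why this flag matters for single precision code: an unsuffixed constant such as 0.5 has type double in C, so an expression mixing it with a float is evaluated in double unless the constant carries an f suffix or the flag is used. The function names below are hypothetical.

```c
/* Built without -fsingle-precision-constant, 0.5 is a double constant,
   so x * 0.5 has type double even though x is a float. */
unsigned size_of_mixed_product(void)
{
    float x = 2.0f;
    return (unsigned)sizeof(x * 0.5);
}

/* With an f suffix the constant is a float and the product stays float,
   whatever flags the file is built with. */
unsigned size_of_float_product(void)
{
    float x = 2.0f;
    return (unsigned)sizeof(x * 0.5f);
}
```

Suffixing the constants in the source is the portable fix; the flag is a convenient shortcut when porting a benchmark that was written with double constants throughout.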

stm32f407vet
compiled optimization -Os (optimise size), no debug, -fsingle-precision-constant
hardware floating point, single precision

Loops:1000, Iterations:1, Duration:1111.43 millisec
C Converted Single Precision Whetstones:89.97 mflops

compiled optimization -O2 (optimise more), no debug, -fsingle-precision-constant
hardware floating point, single precision, 1000 loops, 1 iteration

Beginning Whetstone benchmark at 168 MHz ...
Insufficient duration- Increase the LOOP count

:o
try again compiled optimization -O2 (optimise more), no debug, -fsingle-precision-constant
hardware floating point, single precision, 1000 loops, 5 iterations

Beginning Whetstone benchmark at 168 MHz ...

Loops:1000, Iterations:5, Duration:3296.84 millisec
C Converted Single Precision Whetstones:151.66 mflops

try again compiled optimization -O3 (optimise most), no debug, -fsingle-precision-constant
hardware floating point, single precision, 1000 loops, 5 iterations

Beginning Whetstone benchmark at 168 MHz ...

Loops:1000, Iterations:5, Duration:3588.02 millisec
C Converted Single Precision Whetstones:139.35 mflops
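The scores above can be reproduced from the loop count, iteration count and duration. The sketch below assumes the formula of the classic C Whetstone source (KIPS = 100 * LOOP * II / seconds); the function name and the millisecond argument are choices made for this illustration, not the code in whetstone.zip.

```c
/* Score in millions of Whetstone instructions per second.
   Assumed formula (classic C Whetstone): KIPS = 100 * LOOP * II / seconds. */
double whetstone_score(long loops, long iterations, double duration_ms)
{
    double seconds = duration_ms / 1000.0;
    if (seconds <= 0.0) {
        return -1.0;  /* run too short to time, more loops are needed */
    }
    double kips = 100.0 * (double)loops * (double)iterations / seconds;
    return kips / 1000.0;
}
```

whetstone_score(1000, 1, 1111.43) gives about 89.97, whetstone_score(1000, 5, 3296.84) about 151.66 and whetstone_score(1000, 5, 3588.02) about 139.35, matching the three runs. Strictly speaking the figure counts Whetstone instructions rather than raw floating point operations, so the "mflops" label in the output is a loose one.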

Pito
Posts: 1593
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by Pito » Thu Apr 20, 2017 7:31 am

-O0, -g, 4.8.3-2014q1, -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant, Black F407ZET at 168MHz

Loops: 1000, Iterations: 10, Duration: 19638 millisec.
C Converted Single Precision Whetstones: 50.92 MIPS

It seems Dhrystone and Whetstone should be run with -O0 in order to avoid optimization..
Last edited by Pito on Thu Apr 20, 2017 8:41 am, edited 2 times in total.
Pukao Hats Cleaning Services Ltd.

ag123
Posts: 770
Joined: Thu Jul 21, 2016 4:24 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by ag123 » Thu Apr 20, 2017 7:53 am

i'd think that -Os is ok as that is pretty much the de-facto optimization setting, the thing about these benchmarks is that one should never take them too seriously
if the mflops, or vax mips, get much faster than this, i'd think intel & the rest of the chip giants may start coming in to scrutinize the results for an emerging 'new competitor' :lol:

i doubt these benchmarks accurately reflect the technical limits given the gcc -O optimizations, and i'd think even with no optimization flags stated, it would be quite difficult to tell whether the compiler has 'optimised away' code as part of its build process
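One common way to keep a compiler from removing benchmark work is to route the input and the result through volatile objects. This is a generic sketch of that technique, not something taken from the dhrystone or whetstone sources; the names and the workload are hypothetical.

```c
#include <stdint.h>

/* Volatile input: the compiler cannot assume its value at compile time. */
static volatile uint32_t benchmark_input = 1000;

/* Volatile sink: a store to it is an observable side effect,
   so the value being stored must actually be produced. */
static volatile float benchmark_sink;

/* Hypothetical workload standing in for a benchmark module:
   sums 1/i for i = 1..n in single precision. */
float run_workload(void)
{
    uint32_t n = benchmark_input;  /* read once, at run time */
    float acc = 0.0f;
    for (uint32_t i = 1; i <= n; i++) {
        acc += 1.0f / (float)i;
    }
    benchmark_sink = acc;          /* the result is used, so the loop must run */
    return acc;
}
```

This does not stop the optimizer from restructuring the loop itself, so comparing the generated assembly between -Os, -O2 and -O3 is still the only reliable way to see what is really being timed.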

but nevertheless the benchmarks do show that -O2 can make code run faster on the stm32f1 and stm32f4. this is good as the only cost is using more flash. e.g. for the oscilloscope project, i'm not too sure how much -O2 may actually help, but other than doing analog reads it is drawing the graph on the lcd, hence that drawing code may be accelerated by simply specifying -O2
and this won't consume sram as code executes directly off flash

oh and the f4 is a pretty fast chip, at least as these benchmarks show, i'd think it probably beats the old cray 1 or even closes in on the cray 2 supercomputers, and on top of that it is still an mcu, pretty much a single chip computer. the dhrystone tests show the 'power' of the ART accelerator :lol:

victor_pv
Posts: 1681
Joined: Mon Apr 27, 2015 12:12 pm

Re: Dhrystone and Whetstone Benchmarks for STM32F103

Post by victor_pv » Sun Apr 23, 2017 3:56 am

I gave the single precision Whetstone code that ag123 posted a few posts ago a try.
Settings are:
Libmaple-based core.
407VET
No serial or USB, using SWO, so hopefully no interrupts are firing.
168 MHz
no fpu
-Os

Beginning Whetstone benchmark at 168 MHz ...
Loops:1000, Iterations:1, Duration:6205.83 millisec
C Converted Double Precision Whetstones:16.11 mflops
