FPU on F407 - how to

ag123
Posts: 768
Joined: Thu Jul 21, 2016 4:24 pm

Re: FPU on F407 - how to

Post by ag123 » Fri Apr 21, 2017 3:21 am

@pito thanks
yup it works
v6.3.1-2017-q1-update, optimization -O2 (optimise more), debug off; note -fsingle-precision-constant is not specified
hardware floating point, single precision, WITH_FPU 1,

Code: Select all

Hello 1, The Benchmark starts..
left num: 6.7200002e-3 ana: 6.7121386e-3
righ num: 6.7057795e-3 ana: 6.7121386e-3
midr num: 6.7128791e-3 ana: 6.7121386e-3
trap num: 6.7120594e-3 ana: 6.7121386e-3
simp num: 6.7120504e-3 ana: 6.7121386e-3

left num: -2.6137950e-3 ana: -2.6122479e-3
righ num: -2.6139435e-3 ana: -2.6122479e-3
midr num: -2.6138706e-3 ana: -2.6122479e-3
trap num: -2.6122679e-3 ana: -2.6122479e-3
simp num: -2.6122713e-3 ana: -2.6122479e-3

left num: 2.8255537e-1 ana: 2.8247904e-1
righ num: 2.8240327e-1 ana: 2.8247904e-1
midr num: 2.8247923e-1 ana: 2.8247904e-1
trap num: 2.8247917e-1 ana: 2.8247904e-1
simp num: 2.8247907e-1 ana: 2.8247904e-1

left num: 6.6035757e-1 ana: 6.6010518e-1
righ num: 6.5985774e-1 ana: 6.6010518e-1
midr num: 6.6010627e-1 ana: 6.6010518e-1
trap num: 6.6010856e-1 ana: 6.6010518e-1
simp num: 6.6010732e-1 ana: 6.6010518e-1

Benchmark ends, elapsed 2305 msecs
i got 2305 msecs
v6.3.1-2017-q1-update, optimization -O2 (optimise more), debug off, -fsingle-precision-constant
hardware floating point, single precision, WITH_FPU 1,

Code: Select all

Hello 1, The Benchmark starts..
left num: 6.7200002e-3 ana: 6.7121386e-3
righ num: 6.7057795e-3 ana: 6.7121386e-3
midr num: 6.7128791e-3 ana: 6.7121386e-3
trap num: 6.7120594e-3 ana: 6.7121386e-3
simp num: 6.7120509e-3 ana: 6.7121386e-3

left num: -2.6137950e-3 ana: -2.6122405e-3
righ num: -2.6139435e-3 ana: -2.6122405e-3
midr num: -2.6138706e-3 ana: -2.6122405e-3
trap num: -2.6122679e-3 ana: -2.6122405e-3
simp num: -2.6122713e-3 ana: -2.6122405e-3

left num: 2.8255537e-1 ana: 2.8247904e-1
righ num: 2.8240325e-1 ana: 2.8247904e-1
midr num: 2.8247917e-1 ana: 2.8247904e-1
trap num: 2.8247917e-1 ana: 2.8247904e-1
simp num: 2.8247909e-1 ana: 2.8247904e-1

left num: 6.6035757e-1 ana: 6.6010518e-1
righ num: 6.5985774e-1 ana: 6.6010518e-1
midr num: 6.6010627e-1 ana: 6.6010518e-1
trap num: 6.6010856e-1 ana: 6.6010518e-1
simp num: 6.6010732e-1 ana: 6.6010518e-1

Benchmark ends, elapsed 872 msecs
With -fsingle-precision-constant I got 872 msecs, almost 1/10 of the software-float time. Now that's fast :D
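
A small illustration of why the flag makes such a difference, as a sketch in plain C (the helper names are made up for this example). Without -fsingle-precision-constant an unsuffixed literal such as 0.5 has type double, so a float expression that touches it is promoted to double, and the single-precision FPU on the F407 cannot do double arithmetic in hardware.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only. sizeof inspects the type of
 * the expression at compile time; nothing is evaluated at run time. */

/* 0.5 is a double literal, so x * 0.5 has type double
 * (unless the compiler is given -fsingle-precision-constant). */
size_t size_of_product_with_plain_literal(void) {
    float x = 1.0f;
    return sizeof(x * 0.5);
}

/* 0.5f is a float literal, so the product stays in single precision. */
size_t size_of_product_with_f_literal(void) {
    float x = 1.0f;
    return sizeof(x * 0.5f);
}
```

Writing the f suffix on every constant has the same effect as the flag for those constants, and keeps the code in single precision even when it is built without the flag.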

Note: I replaced Serial with SerialUSB, as otherwise nothing shows on my serial console. That's mainly because I'm using SerialUSB
and I'm on the f4 black development branch, which may have a slightly different Serial implementation.
Last edited by ag123 on Fri Apr 21, 2017 4:21 am, edited 2 times in total.


Post by ag123 » Fri Apr 21, 2017 3:48 am

When the completed version is done, let's post the source and results in the dhrystone and whetstone thread. This is a better benchmark: it is less prone to gcc optimising code away, e.g. -Os, -O2 and -O3 apparently made little difference with this benchmark :D

Pito
Posts: 1593
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui


Post by Pito » Fri Apr 21, 2017 7:08 am

Ok, this is the version with Serial.print() only. And the new results:
-Os -g, 4.8.3-2014q1, -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant, FPUon, 240MHz

Code: Select all

Benchmark starts..

left num: 6.7200002e-3 ana: 6.7121386e-3
righ num: 6.7057795e-3 ana: 6.7121386e-3
midr num: 6.7128791e-3 ana: 6.7121386e-3
trap num: 6.7120594e-3 ana: 6.7121386e-3
simp num: 6.7120509e-3 ana: 6.7121386e-3

left num: -2.6137950e-3 ana: -2.6122405e-3
righ num: -2.6139435e-3 ana: -2.6122405e-3
midr num: -2.6138706e-3 ana: -2.6122405e-3
trap num: -2.6122679e-3 ana: -2.6122405e-3
simp num: -2.6122713e-3 ana: -2.6122405e-3

left num: 2.8255537e-1 ana: 2.8247904e-1
righ num: 2.8240325e-1 ana: 2.8247904e-1
midr num: 2.8247917e-1 ana: 2.8247904e-1
trap num: 2.8247917e-1 ana: 2.8247904e-1
simp num: 2.8247909e-1 ana: 2.8247904e-1

left num: 6.6035757e-1 ana: 6.6010518e-1
righ num: 6.5985774e-1 ana: 6.6010518e-1
midr num: 6.6010627e-1 ana: 6.6010518e-1
trap num: 6.6010856e-1 ana: 6.6010518e-1
simp num: 6.6010732e-1 ana: 6.6010518e-1

Benchmark ends, elapsed 633 msecs..
-O3 -g, 4.8.3-2014q1, -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant, FPUon, 240MHz

Code: Select all

Benchmark starts..

left num: 6.7200002e-3 ana: 6.7121386e-3
righ num: 6.7057795e-3 ana: 6.7121386e-3
midr num: 6.7128791e-3 ana: 6.7121386e-3
trap num: 6.7120594e-3 ana: 6.7121386e-3
simp num: 6.7120509e-3 ana: 6.7121386e-3

left num: -2.6137950e-3 ana: -2.6122405e-3
righ num: -2.6139435e-3 ana: -2.6122405e-3
midr num: -2.6138706e-3 ana: -2.6122405e-3
trap num: -2.6122679e-3 ana: -2.6122405e-3
simp num: -2.6122713e-3 ana: -2.6122405e-3

left num: 2.8255537e-1 ana: 2.8247904e-1
righ num: 2.8240325e-1 ana: 2.8247904e-1
midr num: 2.8247917e-1 ana: 2.8247904e-1
trap num: 2.8247917e-1 ana: 2.8247904e-1
simp num: 2.8247909e-1 ana: 2.8247904e-1

left num: 6.6035757e-1 ana: 6.6010518e-1
righ num: 6.5985774e-1 ana: 6.6010518e-1
midr num: 6.6010627e-1 ana: 6.6010518e-1
trap num: 6.6010856e-1 ana: 6.6010518e-1
simp num: 6.6010732e-1 ana: 6.6010518e-1

Benchmark ends, elapsed 620 msecs..
I will prepare the final version with some improvements.
integral_float.rar
The FPU works overclocked too, it seems :)
Last edited by Pito on Fri Apr 21, 2017 7:19 am, edited 1 time in total.
Pukao Hats Cleaning Services Ltd.

stevestrong
Posts: 1744
Joined: Mon Oct 19, 2015 12:06 am
Location: Munich, Germany


Post by stevestrong » Fri Apr 21, 2017 7:19 am

Please also describe step-by-step what is necessary to get FPU running (maybe edit the very first post?).


Post by Pito » Fri Apr 21, 2017 7:40 am

Updated the front page..

A quick guide to turning the FPU on for STM32F407 boards under STM32duino
(to be refined further)

In your platform.txt add

Code: Select all

 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant
or, to use FPU instructions while keeping the soft-float calling convention,

Code: Select all

 -mfloat-abi=softfp -mfpu=fpv4-sp-d16 -fsingle-precision-constant
for

Code: Select all

compiler.c.flags= ..
compiler.c.elf.flags= ..
compiler.cpp.flags=..
Put this asm code at the start of setup(). Setting the WITH_FPU flag to 1 enables the FPU.

Code: Select all

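#define WITH_FPU 0   /* set to 1 to enable the FPU */

void setup() {

    if (WITH_FPU) {
        __asm volatile
        (
            "  ldr.w r0, =0xE000ED88       \n" /* The FPU enable bits are in the CPACR. */
            "  ldr   r1, [r0]              \n" /* read CPACR */
            "  orr   r1, r1, #(0xf << 20)  \n" /* enable CP10 and CP11 coprocessors */
            "  str   r1, [r0]              \n" /* write the modified value back to the CPACR */
            "  dsb                         \n" /* wait for the store to complete */
            "  isb                         \n" /* reset the pipeline, now the FPU is enabled */
            : : : "r0", "r1", "memory"         /* tell the compiler what the asm touches */
        );
    }
    ..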
It is better placed in init(), before main(), so that it runs before any compiler-generated floating-point code and avoids compile problems.
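
The same register write can also be sketched in C, which leaves the choice of scratch registers to the compiler. This is only a sketch under assumptions: fpu_enable_at() and fpu_enable() are hypothetical names, not libmaple functions; the address 0xE000ED88 and the bit mask are the ones used in the asm above.

```c
#include <stdint.h>

#define CPACR_ADDR       0xE000ED88u   /* Coprocessor Access Control Register */
#define CPACR_CP10_CP11  (0xFu << 20)  /* full access for coprocessors CP10 and CP11 */

/* Hypothetical helper: sets the CP10/CP11 access bits in the register that
 * `cpacr` points to. Taking a pointer lets the bit logic be checked on a PC. */
static inline void fpu_enable_at(volatile uint32_t *cpacr) {
    *cpacr |= CPACR_CP10_CP11;
#if defined(__arm__)
    /* On the ARM target: wait for the store, then flush the pipeline. */
    __asm volatile ("dsb\n\tisb" : : : "memory");
#endif
}

/* Hypothetical wrapper for the target: call once, before the first
 * floating-point instruction executes. */
static inline void fpu_enable(void) {
    fpu_enable_at((volatile uint32_t *)(uintptr_t)CPACR_ADDR);
}
```

On the board one would call fpu_enable() early in startup; off the board, fpu_enable_at() can be pointed at an ordinary variable to confirm that only bits 20-23 change.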

In your code use the "float" type only. Do not use "double".
Note that the math functions for type "float" take an "f" suffix, i.e. cosf, atanf, powf, sqrtf, etc.
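
To see what the "f" suffix buys, here is a tiny sketch in plain C (the helper names are made up). In C, sqrt() returns a double even when it is handed a float, while sqrtf() stays in float. C++ overloads can behave differently, so the suffixed names are the unambiguous choice.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helpers: sizeof only looks at the result type of the call,
 * the math function itself is never executed. */

/* sqrt() takes and returns double, so the result is double-sized. */
size_t result_size_sqrt(void) {
    float x = 2.0f;
    return sizeof(sqrt(x));
}

/* sqrtf() takes and returns float, so the result stays float-sized. */
size_t result_size_sqrtf(void) {
    float x = 2.0f;
    return sizeof(sqrtf(x));
}
```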


Printing the floats:
Use Serial.print(x, 6) for example, or use a library.
Avoid using the vsnprintf() based libs or code chunks for printing the floats.

Post by ag123 » Fri Apr 21, 2017 9:38 am

@pito put the asm code in a function enablefpu(). This may help in cases where someone would otherwise copy the asm into the same block of code as the floating-point code, resulting in hard faults etc.

I'd think at some 'future' time enablefpu() could go into libmaple as a 'convenience function', i.e. only sketches that need it call it.
I'm not sure whether calling it in init() would cause more power consumption etc. or, more importantly, whether calling it on the F103 would lead to a hard fault/exception. Hence it would be left to the user/sketch that needs it to call it.

i've a version that's sort of 'pretty printed'

Code: Select all

//enable the fpu (cortex-m4 - stm32f4* and above)
//http://infocenter.arm.com/help/topic/com.arm.doc.dui0553a/BEHBJHIG.html
void enablefpu()
{
  __asm volatile
  (
    "  ldr.w r0, =0xE000ED88       \n" /* The FPU enable bits are in the CPACR. */
    "  ldr   r1, [r0]              \n" /* read CPACR */
    "  orr   r1, r1, #(0xf << 20)  \n" /* Set bits 20-23 to enable CP10 and CP11 coprocessors */
    "  str   r1, [r0]              \n" /* Write back the modified value to the CPACR */
    "  dsb                         \n" /* wait for store to complete */
    "  isb                         \n" /* reset pipeline now the FPU is enabled */
    : : : "r0", "r1", "memory"         /* tell the compiler what the asm touches */
  );
}
just 2 cents


Post by Pito » Fri Apr 21, 2017 9:57 am

This is the final version 1.0 of the Numerical Integration Benchmark.
The Benchmark shows the speed and the accuracy of the math lib.
The Benchmark is not easy to optimize out.
Changes:
1. Added precise results from Wolfram Alpha
2. Printing of the results is now outside the calculation loop.
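
For readers without the attachment, the five rules named in the output (leftrect, righrect, midrect, trapezium, simpson) can be sketched in single precision as below. This is not the code from integral_float.rar; the function names, the integrand_fn type and the square() test function are assumptions made for illustration.

```c
typedef float (*integrand_fn)(float);

/* Left rectangle rule: sample each strip at its left edge. */
float integrate_left(integrand_fn f, float a, float b, int n) {
    float h = (b - a) / (float)n, sum = 0.0f;
    for (int i = 0; i < n; i++) sum += f(a + (float)i * h);
    return sum * h;
}

/* Right rectangle rule: sample each strip at its right edge. */
float integrate_right(integrand_fn f, float a, float b, int n) {
    float h = (b - a) / (float)n, sum = 0.0f;
    for (int i = 1; i <= n; i++) sum += f(a + (float)i * h);
    return sum * h;
}

/* Midpoint rule: sample each strip at its centre. */
float integrate_mid(integrand_fn f, float a, float b, int n) {
    float h = (b - a) / (float)n, sum = 0.0f;
    for (int i = 0; i < n; i++) sum += f(a + ((float)i + 0.5f) * h);
    return sum * h;
}

/* Trapezium rule: end points count half, interior points count once. */
float integrate_trap(integrand_fn f, float a, float b, int n) {
    float h = (b - a) / (float)n, sum = 0.5f * (f(a) + f(b));
    for (int i = 1; i < n; i++) sum += f(a + (float)i * h);
    return sum * h;
}

/* Simpson's rule: weights 1,4,2,4,...,2,4,1; n must be even. */
float integrate_simpson(integrand_fn f, float a, float b, int n) {
    float h = (b - a) / (float)n, sum = f(a) + f(b);
    for (int i = 1; i < n; i++)
        sum += ((i & 1) ? 4.0f : 2.0f) * f(a + (float)i * h);
    return sum * h / 3.0f;
}

/* Example integrand: the integral of x*x over [0, 1] is exactly 1/3. */
static float square(float x) { return x * x; }
```

Because every sample feeds into the printed sum, the compiler cannot simply drop the loop, which fits the observation that -Os, -O2 and -O3 give similar timings.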

-O3 -g, 4.8.3-2014q1, -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant, FPU on, Black F407ZET @240MHz

Code: Select all

Numerical Integration Benchmark starts..
Single Precision with FPU, v1.0

F1 Result: 0.00671211394423041159587
leftrect  num: 6.720000e-3 ana: 6.712139e-3
righrect  num: 6.705780e-3 ana: 6.712139e-3
midrect   num: 6.712879e-3 ana: 6.712139e-3
trapezium num: 6.712059e-3 ana: 6.712139e-3
simpson   num: 6.712051e-3 ana: 6.712139e-3

F2 Result: -0.00261227114510560413684
leftrect  num: -2.613795e-3 ana: -2.612241e-3
righrect  num: -2.613944e-3 ana: -2.612241e-3
midrect   num: -2.613871e-3 ana: -2.612241e-3
trapezium num: -2.612268e-3 ana: -2.612241e-3
simpson   num: -2.612271e-3 ana: -2.612241e-3

F3 Result: 0.28247911095899395473
leftrect  num: 2.825554e-1 ana: 2.824790e-1
righrect  num: 2.824032e-1 ana: 2.824790e-1
midrect   num: 2.824792e-1 ana: 2.824790e-1
trapezium num: 2.824792e-1 ana: 2.824790e-1
simpson   num: 2.824791e-1 ana: 2.824790e-1

F4 Result: 0.660105195382224847535
leftrect  num: 6.603576e-1 ana: 6.601052e-1
righrect  num: 6.598577e-1 ana: 6.601052e-1
midrect   num: 6.601063e-1 ana: 6.601052e-1
trapezium num: 6.601086e-1 ana: 6.601052e-1
simpson   num: 6.601073e-1 ana: 6.601052e-1

Benchmark ends, elapsed 616621 microsecs..
Last edited by Pito on Fri Apr 21, 2017 5:34 pm, edited 7 times in total.

Post by ag123 » Fri Apr 21, 2017 10:05 am

@pito
The trouble is that an F1 user who runs it with soft float may need to wait more than 10x as long for the benchmark to complete :lol:
I'd suggest we use micros() instead, which gives the time in microseconds, so we can make do with fewer loops.
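
A minimal sketch of the timing arithmetic, assuming micros() returns a 32-bit unsigned count as it does on these cores (elapsed_us() is a made-up helper name). Unsigned subtraction gives the right answer even if the counter wraps once between the two readings.

```c
#include <stdint.h>

/* Hypothetical helper: microseconds elapsed between two readings of a
 * free-running 32-bit microsecond counter. The subtraction is done modulo
 * 2^32, so a single wrap of the counter between the readings is harmless. */
static inline uint32_t elapsed_us(uint32_t start, uint32_t now) {
    return now - start;
}
```

In a sketch this would be used as: read micros() into start, run the benchmark loop, then print elapsed_us(start, micros()).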

In addition, I found that gcc defines __ARM_PCS_VFP when compiling with the options -mfloat-abi=hard -mfpu=fpv4-sp-d16. This could be used to determine whether the user is compiling with hardware float specified in the compile options:

Code: Select all

#ifdef __ARM_PCS_VFP
#warning enabling hardware fpu
	enablefpu();
#else
#warning using software floating point
#endif


Post by Pito » Fri Apr 21, 2017 10:17 am

Let's keep it KISS.
We may enhance it in the near future. The Benchmark is for an FPU, so an F1 user has to modify it for F1.
Single precision is not intended for numerical math, so the results will be off for several reasons.
With newer processors with a DP FPU we may create a double-precision version.
The question is whether anybody will use it, so do not spend too much time on it.. ;)
P.
PS: updated the pretty printed asm :)

edogaldo
Posts: 281
Joined: Fri Jun 03, 2016 8:19 am


Post by edogaldo » Fri Apr 21, 2017 11:40 am

ag123 wrote:@pito
The trouble is that an F1 user who runs it with soft float may need to wait more than 10x as long for the benchmark to complete :lol:
I'd suggest we use micros() instead, which gives the time in microseconds, so we can make do with fewer loops.

In addition, I found that gcc defines __ARM_PCS_VFP when compiling with the options -mfloat-abi=hard -mfpu=fpv4-sp-d16. This could be used to determine whether the user is compiling with hardware float specified in the compile options:

Code: Select all

#ifdef __ARM_PCS_VFP
#warning enabling hardware fpu
	enablefpu();
#else
#warning using software floating point
#endif
__ARM_PCS_VFP is defined if you specify "-mfloat-abi=hard" but not if you specify "-mfloat-abi=softfp". With softfp the compiler still emits FPU instructions, so without the enablefpu() call you would still get a hard fault.
I'd suggest the following condition:

Code: Select all

#if defined (__GNUC__) && defined (__VFP_FP__) && !defined(__SOFTFP__)
#warning enabling hardware fpu
	enablefpu();
#else
#warning using software floating point
#endif
I'd also suggest putting this in the init() function of boards.c.
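
The three cases discussed above can be told apart at compile time roughly as follows. This is a sketch, assuming GCC's ARM predefined macros behave as described in this thread; float_abi_name() and SKETCH_NEEDS_FPU are hypothetical names, not part of libmaple.

```c
/* 1 when the compiler may emit FPU instructions, so the FPU has to be
 * enabled before any floating-point code runs; 0 otherwise. */
#if defined(__GNUC__) && defined(__VFP_FP__) && !defined(__SOFTFP__)
#define SKETCH_NEEDS_FPU 1
#else
#define SKETCH_NEEDS_FPU 0
#endif

/* Hypothetical helper: names the float ABI this file was compiled for. */
static const char *float_abi_name(void) {
#if defined(__ARM_PCS_VFP)
    return "hard";    /* -mfloat-abi=hard: floats passed in FPU registers */
#elif defined(__VFP_FP__) && !defined(__SOFTFP__)
    return "softfp";  /* -mfloat-abi=softfp: FPU instructions, core-register calls */
#else
    return "soft";    /* software floating point, or not an ARM VFP target */
#endif
}
```

To check which macros a given set of flags really defines, the preprocessor can dump them, e.g. by running arm-none-eabi-gcc with the chosen -mfloat-abi and -mfpu options plus -dM -E on an empty input.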

Best, E.
