Bluepill F4 board, anyone still working on it?

If you made your own board, post here, unless you built a Maple or Maple mini clone etc
Pito
Posts: 94
Joined: Tue Dec 24, 2019 1:53 pm

Re: Bluepill F4 board, anyone still working on it?

Post by Pito »

On the STM32F407, Pito achieved 500 MFLOPS on an MCU overclocked to 250 MHz; that is almost 2 FLOPs per clock
Unbelievable :D Is the post available on the old forum somewhere?

Edit: I've found my old thread HERE: 340 MIPS (Whetstone) at 168 MHz, but it looks like we had some problems with it at that time. :?

Does the new STM's core support FPU?
Pukao Hats Cleaning Services Ltd.
fpiSTM
Posts: 1836
Joined: Wed Dec 11, 2019 7:11 pm
Answers: 102
Location: Le Mans

Re: Bluepill F4 board, anyone still working on it?

Post by fpiSTM »

Pito wrote: Tue Dec 24, 2019 2:48 pm Does the new STM's core support FPU?
Yes
Pito
Posts: 94
Joined: Tue Dec 24, 2019 1:53 pm

Re: Bluepill F4 board, anyone still working on it?

Post by Pito »

FYI: I've built the single-precision Whetstone without and with the FPU (F407ZET, 168 MHz, Roger's core). The first result is without the FPU, the second with it:

Loops: 1000 Iterations: 10 Duration: 42000 millisec.   0 clocks
C Converted Single Precision Whetstones: 23.81 MIPS

Loops: 1000 Iterations: 10 Duration: 10103 millisec.   0 clocks
C Converted Single Precision Whetstones: 98.98 MIPS
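As a sanity check, the ratings above can be reproduced from the loop count, iteration count and duration. This is only a sketch: it assumes the classic Whetstone convention of 100 kilo-instructions per loop pass, and `whetstone_rating` and `print_rating` are made-up names, not part of the benchmark source.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the benchmark source): turns a run's
 * loop count, iteration count and wall time into the MIPS-style rating.
 * Assumes 100 kilo Whetstone instructions per loop pass. */
double whetstone_rating(long loops, long iterations, double duration_ms)
{
    assert(duration_ms > 0.0);
    double kilo_instr = 100.0 * (double)loops * (double)iterations;
    double seconds = duration_ms / 1000.0;
    return kilo_instr / seconds / 1000.0;   /* KIPS -> MIPS */
}

/* Prints the rating for a run with Loops: 1000 and Iterations: 10. */
void print_rating(const char *label, double duration_ms)
{
    printf("%-8s %.2f\n", label, whetstone_rating(1000, 10, duration_ms));
}
```

Calling `print_rating("no FPU", 42000.0)` and `print_rating("FPU", 10103.0)` gives 23.81 and 98.98, matching the two runs, so the rating is simply inversely proportional to the measured duration.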
ag123
Posts: 1797
Joined: Thu Dec 19, 2019 5:30 am
Answers: 28

Re: Bluepill F4 board, anyone still working on it?

Post by ag123 »

Pito wrote: Tue Dec 24, 2019 2:48 pm
On the STM32F407, Pito achieved 500 MFLOPS on an MCU overclocked to 250 MHz; that is almost 2 FLOPs per clock
Unbelievable :D Is the post available on the old forum somewhere?
Here it is: 466.95 MFLOPS, a whisker off 500 MFLOPS but close enough.
https://web.archive.org/web/20190316173 ... 160#p26942

:D
Pito
Posts: 94
Joined: Tue Dec 24, 2019 1:53 pm

Re: Bluepill F4 board, anyone still working on it?

Post by Pito »

I think an 8-10x speed-up on math functions is the most you can get from the STM32 single-precision FPUs. The roughly 4x I got in the Whetstone yesterday (98.98 vs 23.81) is pretty realistic, imho. A few weeks back I ran a test on the PIC32MZ EF (double-precision FPU, different benchmark) and got 13x.
So 466 MFLOPS at 240 MHz indicates an issue somewhere.
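The arithmetic behind that can be sketched as below. It only uses the figures quoted in this thread; `speedup`, `ops_per_clock` and `print_summary` are illustrative names, and treating a benchmark rating as "operations per second" is itself an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Speed-up of one run over another, from their ratings. */
double speedup(double rating_fast, double rating_slow)
{
    assert(rating_slow > 0.0);
    return rating_fast / rating_slow;
}

/* Operations per core clock, given a rating in millions of operations
 * per second and a core clock in MHz. */
double ops_per_clock(double mega_ops_per_s, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return mega_ops_per_s / clock_mhz;
}

/* Prints the ratios for the figures quoted in the thread. */
void print_summary(void)
{
    printf("FPU speed-up at 168 MHz: %.2fx\n", speedup(98.98, 23.81));
    printf("FPU build, per clock:    %.2f\n", ops_per_clock(98.98, 168.0));
    printf("466.95 at 240 MHz:       %.2f\n", ops_per_clock(466.95, 240.0));
}
```

The FPU build works out to well under one operation per clock, while the 466.95 figure would need close to two per clock on the same kind of core, which is why it looks suspicious.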
Last edited by Pito on Wed Dec 25, 2019 1:32 pm, edited 1 time in total.
ag123
Posts: 1797
Joined: Thu Dec 19, 2019 5:30 am
Answers: 28

Re: Bluepill F4 board, anyone still working on it?

Post by ag123 »

I'd think the compiler could have 'optimised' away code and taken shortcuts rather than doing the math. Nevertheless, the F407 has a hardware single-precision FPU plus the ART accelerator, so if one measures only the SP floating-point speed, it is still plausible that 500 MFLOPS in this 'simple' way may be true. But I'd guess that in real apps it would be hard to get anywhere close, as there would be overhead in other code :lol:
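One common guard against the compiler removing benchmark work is to route inputs and results through `volatile` objects. The following is only a sketch of that idea, not what the Whetstone source does: `fp_kernel`, `bench_sink` and `mflops` are made-up names, and the timing call (for example Arduino's `micros()`) is left out so that it also builds on a host.

```c
#include <assert.h>
#include <stdio.h>

/* A store to a volatile object is an observable side effect, so the
 * compiler has to produce the value instead of discarding the loop. */
volatile float bench_sink;

/* Hypothetical kernel: n multiply-add steps in single precision.
 * Reading the seed through a volatile hides the constant from the
 * optimiser, so the result cannot be computed at compile time. */
float fp_kernel(unsigned long n)
{
    volatile float seed = 1.0f;
    float x = seed;
    for (unsigned long i = 0; i < n; ++i) {
        x = x * 0.5f + 1.0f;   /* one multiply and one add per pass */
    }
    bench_sink = x;
    return x;
}

/* MFLOPS for n passes of the kernel (2 operations each), given the
 * elapsed time in milliseconds measured by the caller. */
double mflops(unsigned long n, double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    return (2.0 * (double)n) / (elapsed_ms * 1000.0);
}
```

If a measured figure implies more operations per clock than the core can issue, checking the generated assembly for the loop is a quick way to see whether the work is still there.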
Pito
Posts: 94
Joined: Tue Dec 24, 2019 1:53 pm

Re: Bluepill F4 board, anyone still working on it?

Post by Pito »

With default optimization (Roger's core, 168MHz, FPU on)

Loops: 1000 Iterations: 10 Duration: 11689 millisec.   0 clocks
C Converted Single Precision Whetstones: 85.55 mflops
0       0       0       1.00    -1.00   -1.00   -1.00   0
12000   14000   12000   -0.13   -0.18   -0.43   -0.48   12000
14000   12000   12000   0.02    -0.03   -0.04   -0.09   14000
345000  1       1       1.00    -1.00   -1.00   -1.00   345000
210000  1       2       6.00    6.00    -0.04   -0.09   210000
32000   1       2       0.09    0.09    0.09    0.09    32000
899000  1       2       1.00    1.00    1.00    1.00    899000
616000  1       2       3.00    2.00    3.00    -0.09   616000
0       2       3       1.00    -1.00   -1.00   -1.00   0
93000   2       3       1.00    1.00    1.00    1.00    93000

With -O1 LTO (Roger's core, 168MHz, FPU on)

Loops: 1000 Iterations: 10 Duration: 5290 millisec.   0 clocks
C Converted Single Precision Whetstones: 189.04 mflops
0       0       0       1.00    -1.00   -1.00   -1.00   0
12000   14000   12000   -0.13   -0.18   -0.43   -0.48   12000
14000   12000   12000   0.02    -0.03   -0.04   -0.09   14000
345000  1       1       1.00    -1.00   -1.00   -1.00   345000
210000  1       2       6.00    6.00    -0.04   -0.09   210000
32000   1       2       0.09    0.09    0.09    0.09    32000
899000  1       2       1.00    1.00    1.00    1.00    899000
616000  1       2       3.00    2.00    3.00    -0.09   616000
0       2       3       1.00    -1.00   -1.00   -1.00   0
93000   2       3       1.00    1.00    1.00    1.00    93000

With -O3 (Roger's core, 168MHz, FPU on)

Loops: 1000 Iterations: 10 Duration: 5055 millisec.   0 clocks
C Converted Single Precision Whetstones: 197.82 mflops
0       0       0       1.00    -1.00   -1.00   -1.00   0
12000   14000   12000   -0.13   -0.18   -0.43   -0.48   12000
14000   12000   12000   0.02    -0.03   -0.04   -0.09   14000
345000  1       1       1.00    -1.00   -1.00   -1.00   345000
210000  1       2       6.00    6.00    -0.04   -0.09   210000
32000   1       2       0.09    0.09    0.09    0.09    32000
899000  1       2       1.00    1.00    1.00    1.00    899000
616000  1       2       3.00    2.00    3.00    -0.09   616000
0       2       3       1.00    -1.00   -1.00   -1.00   0
93000   2       3       1.00    1.00    1.00    1.00    93000

With -O2 LTO (Roger's core, 168MHz, FPU on)

Loops: 1000 Iterations: 10 Duration: 4924 millisec.   0 clocks
C Converted Single Precision Whetstones: 203.09 mflops
0       0       0       1.00    -1.00   -1.00   -1.00   0
12000   14000   12000   -0.13   -0.18   -0.43   -0.48   12000
14000   12000   12000   0.02    -0.03   -0.04   -0.09   14000
345000  1       1       1.00    -1.00   -1.00   -1.00   345000
210000  1       2       6.00    6.00    -0.04   -0.09   210000
32000   1       2       0.09    0.09    0.09    0.09    32000
899000  1       2       1.00    1.00    1.00    1.00    899000
616000  1       2       3.00    2.00    3.00    -0.09   616000
0       2       3       1.00    -1.00   -1.00   -1.00   0
93000   2       3       1.00    1.00    1.00    1.00    93000
ag123
Posts: 1797
Joined: Thu Dec 19, 2019 5:30 am
Answers: 28

Re: Bluepill F4 board, anyone still working on it?

Post by ag123 »

It seems GCC has 'fixed' the wild optimization problems. I'm still waiting for my F401 to arrive and will probably try running it on that. :lol:
Btw, 203 MFLOPS is still pretty decent considering this is an MCU! :D
Pito
Posts: 94
Joined: Tue Dec 24, 2019 1:53 pm

Re: Bluepill F4 board, anyone still working on it?

Post by Pito »

Well, double precision would be more interesting to have handy.
Single precision is rather limited in use, imho.
ag123
Posts: 1797
Joined: Thu Dec 19, 2019 5:30 am
Answers: 28

Re: Bluepill F4 board, anyone still working on it?

Post by ag123 »

Actually, my guess is that part of the reason for those FP 'speeds' is that it is 32-bit FP, which is probably significantly simpler to implement than a 64-bit (or 80-bit) FPU.
A lot of (earlier and even current) NVIDIA and AMD GPUs basically accelerate 32-bit FP; 64-bit is almost always much slower (I've seen ratios like 1:60, i.e. double-precision FP64 being 60 times slower than FP32).
My guess is that FP32 primarily targets digital filters (e.g. DSP) and 'basic' FP calculations, and that certain things like inverse kinematics may be adequately done in FP32. FP32 is more of a problem in iterative searches (errors accumulate in every iteration) or where values degenerate (underflow) and you suddenly have 1 / 0 -> infinity.
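Both failure modes are easy to show on a host. This is a minimal sketch assuming IEEE 754 `float` and `double` with default (non-trapping) floating-point behaviour; the function names are made up for illustration.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Adds `step` to a float accumulator n times. Rounding error builds up
 * once the running total is much larger than the step. */
float sum_fp32(float step, unsigned long n)
{
    float acc = 0.0f;
    for (unsigned long i = 0; i < n; ++i) {
        acc += step;
    }
    return acc;
}

/* Same loop with a double accumulator, for comparison. */
double sum_fp64(float step, unsigned long n)
{
    double acc = 0.0;
    for (unsigned long i = 0; i < n; ++i) {
        acc += (double)step;
    }
    return acc;
}

/* Squares a value in fp32 and returns the reciprocal. If the square
 * underflows to zero, the reciprocal becomes +infinity. */
float reciprocal_of_square(float x)
{
    volatile float sq = x * x;   /* volatile: force a real fp32 store */
    return 1.0f / sq;
}
```

Summing `0.1f` ten million times with `sum_fp32` lands noticeably away from 1000000, while `sum_fp64` stays within a small fraction of it. `reciprocal_of_square(1e-30f)` returns infinity because the square is too small to represent in FP32.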