Bluepill F4 board, anyone still working on it?

If you made your own board, post here, unless you built a Maple or Maple mini clone etc

Post by ag123 »

1 min, -O3 is there, fpu flags missing, updating them

using some big flags

Code: Select all

-mcpu=cortex-m4 -march=armv7e-m+fp

Code: Select all

-mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant
platform.txt

Code: Select all

# this can be overriden in boards.txt
build.mcu=cortex-m4
build.cpu_flags=-mcpu=cortex-m4 -march=armv7e-m+fp -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant
that didn't work, so try

Code: Select all

build.common_flags=-mcpu=cortex-m4 -march=armv7e-m+fp -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant -mthumb -D__STM32F4__ -DSTM32F4
now ld returns an error :lol:
i know why, i'll dig that out of my makefile
need to add

Code: Select all

-L $(ARM_NONE_EABI_PATH)/arm-none-eabi/lib/thumb/v7e-m+fp
when linking
try

Code: Select all

compiler.ldflags={build.flags.ldspecs} -L{runtime.tools.arm-none-eabi-gcc.path}/arm-none-eabi/lib/thumb/v7e-m+fp
nope, need to find another place to patch
try

Code: Select all

compiler.c.elf.extra_flags="-L{build.variant.path}/ld" "-Wl,--wrap=atexit,--wrap=__cxa_atexit,--wrap=exit" -L{runtime.tools.arm-none-eabi-gcc.path}/arm-none-eabi/lib/thumb/v7e-m+fp
Last edited by ag123 on Sun Jan 19, 2020 5:16 pm, edited 1 time in total.

Post by Bingo600 »

Yddrfff

Try (in boards.txt)

blackpill_f401.build.extra_flags=-DLED_BUILTIN=PC13 -DCRYSTAL_FREQ=25 -DNO_CCMRAM -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant

blackpill_f401.menu.opt.o3std.build.flags.optimize=-O3
blackpill_f401.menu.opt.o3std.build.flags.ldspecs=-mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant

Instead of that AWFUL: -L $(ARM_NONE_EABI_PATH)/arm-none-eabi/lib/thumb/v7e-m+fp
That ought to work.

I have always held that it is BAD karma to point directly at MCU-specific libs (thumb / fp etc.); it will get you in trouble later on.
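
One way to check that the FPU flags really reached the compiler, rather than guessing from the link step, is to look at the macros the compiler predefines. The sketch below relies on `__arm__`, `__ARM_FP`, `__ARM_PCS_VFP` and `__SOFTFP__`, which GCC sets from -mfpu / -mfloat-abi; the function name fpu_build_mode() is made up for illustration and is not part of any core.

```c
#include <assert.h>

/* Hypothetical helper: reports which floating-point mode this file was
   compiled for, based on macros predefined by GCC for 32-bit ARM. */
const char *fpu_build_mode(void)
{
#if !defined(__arm__)
    return "not a 32-bit ARM target";
#elif defined(__ARM_PCS_VFP) && defined(__ARM_FP)
    /* -mfloat-abi=hard with an FPU selected */
    return "hard-float ABI, FPU instructions enabled";
#elif defined(__ARM_FP)
    /* FPU selected, but arguments still passed in core registers */
    return "FPU instructions enabled, soft-float calling convention";
#elif defined(__SOFTFP__)
    return "software floating point";
#else
    return "no FPU information";
#endif
}
```

Printing the returned string once from setup() should be enough to see whether a given boards.txt / platform.txt change took effect.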

/Bingo
Last edited by Bingo600 on Sun Jan 19, 2020 5:23 pm, edited 1 time in total.

Post by ag123 »

give up for now, that special vfp library probably has something to do with the mflops

but my makefile works
viewtopic.php?p=327#p327
trying that 1st

Code: Select all

Beginning Whetstone benchmark at 84 MHz ... -Os

Loops:10000, Iterations:1, Duration:15041.21 millisec
C Converted Single Precision Whetstones:66.48 Mflops

Code: Select all

Beginning Whetstone benchmark at 84 MHz ...

Loops:10000, Iterations:1, Duration:8287.97 millisec
C Converted Single Precision Whetstones:120.66 Mflops
now this looks normal :lol:
i'd need to figure out where to patch platform.txt and boards.txt

in the meantime, if you are keen on trying out the makefile, the necessary edits are:
- ARM_NONE_EABI_PATH - this needs to point to the installed location of your arm-none-eabi gcc/g++ toolchain
(if you have the official core installed there is actually one under $HOME/.arduino15: .arduino15/packages/STM32/tools/xpack-arm-none-eabi-gcc/9.2.1-1.1)
- move the arduino sources into a subfolder called src and rename the .ino to .cpp
- symlink ./STM32F4 to the STM32F4 folder in Arduino_STM32, or copy Arduino_STM32/STM32F4 into ./STM32F4

running make puts the output into ./build, including the bin file
note that make clean deletes the ./build directory
Attachments
blackpill_f401.zip (bin file, 21.85 KiB)
makefile.zip (makefile, 4.16 KiB)

Post by ag123 »

now i know why that vfp is so special
http://infocenter.arm.com/help/topic/co ... dejjh.html
1.5.9. Vector Floating-Point (VFP)

The VFP coprocessor supports floating point arithmetic operations and is a functional block within the ARM1176JZF-S processor. The VFP coprocessor is mapped as coprocessor numbers 10 and 11. Software can determine whether the VFP is present by the use of the Coprocessor Access Control Register. See c1, Coprocessor Access Control Register for more details.
if you look at the commented-out enablefpu() code

Code: Select all

// Enable the FPU (Cortex-M4 - STM32F4xx and higher)
// http://infocenter.arm.com/help/topic/com.arm.doc.dui0553a/BEHBJHIG.html
//void enablefpu() {
//    __asm volatile
//    (
//      "  ldr.w r0, =0xE000ED88    \n"  /* The FPU enable bits are in the CPACR. */
//      "  ldr r1, [r0]             \n"  /* read CPACR */
//      "  orr r1, r1, #( 0xf << 20 )\n" /* Set bits 20-23 to enable CP10 and CP11 coprocessors */
//      "  str r1, [r0]              \n" /* Write back the modified value to the CPACR */
//      "  dsb                       \n" /* wait for store to complete */
//      "  isb"                          /* reset pipeline now the FPU is enabled */
//    );
//}
CP10, CP11, exactly that, this enablefpu() is there in both the official core and libmaple core, hence it isn't necessary to put that in the sketch
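
for anyone who prefers C over the asm, the same bit manipulation can be sketched as below. the address and the 0xF << 20 mask are taken from the commented code above; the function names are made up for illustration, and the code only computes values so it can be tried on a PC.

```c
#include <assert.h>
#include <stdint.h>

#define CPACR_ADDR      0xE000ED88u    /* Coprocessor Access Control Register */
#define CPACR_CP10_CP11 (0xFu << 20)   /* bits 20-23: full access to CP10 and CP11 */

/* Nonzero if both CP10 and CP11 are set to full access. */
int cpacr_fpu_enabled(uint32_t cpacr)
{
    return (cpacr & CPACR_CP10_CP11) == CPACR_CP10_CP11;
}

/* Returns the CPACR value with the FPU coprocessors enabled,
   leaving all other bits untouched. */
uint32_t cpacr_with_fpu(uint32_t cpacr)
{
    uint32_t out = cpacr | CPACR_CP10_CP11;
    assert(cpacr_fpu_enabled(out));
    return out;
}
```

on the target the result would be written back through a volatile pointer to CPACR_ADDR, followed by the same dsb / isb as in the asm version.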

i've always been telling pito that there are 2 fpus in there, pito disagreed.
it isn't 2 fpus, it is 1 fpu with 2 vector lanes, hence we see mflops higher than the cpu mhz
it is a little incredible that our little cortex-m4 processors have such features
but as it seems, it is there indeed

arm gcc compiler is 'no ordinary' compiler, -O3 turns ordinary looking c/c++ routines into partly vector codes, i'd guess this is reflected in the whetstone mflops. and perhaps, it may not be just gcc, but that special vfp library
and as pito noted
viewtopic.php?p=558#p558
our little m4 has mflops that look like they are as fast or faster than a 2 GHz P4
:lol:

Post by Bingo600 »

Well, if we had an ARM11 I'd say you're right; unfortunately this is a Cortex-M4.

What is happening in the MCU I can't say: whether it's really calculating those flops, or doing something else entirely, i.e. NOPs.

/Bingo

Post by Bingo600 »

Aha ... You had something there : -march=armv7e-m+fp

As the old 7-2017q4 did NOT like -march=armv7e-m+fp

I just switched to the latest arm-gcc : 9-2019-q4
Put it here : $HOME/.arduino15/packages/arduino/tools/arm-none-eabi-gcc

And did these in boards.txt
blackpill_f401.build.extra_flags=-DLED_BUILTIN=PC13 -DCRYSTAL_FREQ=25 -DNO_CCMRAM -mthumb -mcpu=cortex-m4 -march=armv7e-m+fp -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant

And (for the -O3) (linking)
blackpill_f401.menu.opt.o3std.build.flags.ldspecs=-march=armv7e-m+fp -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant

Now it is "flying" - I enabled print

Beginning Whetstone benchmark at 84 MHz ...
0 0 0 1.00 -1.00 -1.00 -1.00
120000 140000 120000 -0.00 0.00 -0.00 0.00
140000 120000 120000 -0.00 0.00 0.00 0.00
3450000 1 1 1.00 -1.00 -1.00 -1.00
2100000 1 2 6.00 6.00 0.00 0.00
320000 1 2 0.00 0.00 0.00 0.00
8990000 1 2 1.00 1.00 1.00 1.00
6160000 1 2 3.00 2.00 3.00 0.00
0 2 3 1.00 -1.00 -1.00 -1.00
930000 2 3 1.00 1.00 1.00 1.00

Loops:10000, Iterations:1, Duration:8596.40 millisec
C Converted Single Precision Whetstones:116.33 Mflops


/Bingo

Post by ag123 »

time to mess with the m, n, p, q pll scalers
at 96 MHz, the usual clock for the F411,
you'd get 134.89 Mflops (hopefully more)
viewtopic.php?p=539#p539
:)

the small catch is
https://github.com/stevstrong/Arduino_S ... f401.h#L39

Code: Select all

#define CYCLES_PER_MICROSECOND   84
^you would need to change that or systick and the timers would be incorrect
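
to put a number on "incorrect": assuming SysTick runs from the core clock and the core derives its 1 ms reload value from CYCLES_PER_MICROSECOND (an assumption about the core, not checked against its source), the arithmetic looks like this. both function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* SysTick reload value for a 1 ms tick, given core cycles per microsecond. */
uint32_t systick_reload_1ms(uint32_t cycles_per_us)
{
    assert(cycles_per_us > 0);
    return cycles_per_us * 1000u - 1u;
}

/* Real microseconds per "1 ms" tick when the reload was computed for
   assumed_mhz but the core actually runs at actual_mhz. */
uint32_t real_us_per_tick(uint32_t assumed_mhz, uint32_t actual_mhz)
{
    assert(actual_mhz > 0);
    return (assumed_mhz * 1000u) / actual_mhz;
}
```

with the define left at 84 and the core running at 96 MHz, each "millisecond" would only last 875 us, so millis() and delay() would run fast by about 14%.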

but
https://github.com/stevstrong/Arduino_S ... cF4.c#L476

Code: Select all

void rcc_clk_init(void)
{
	SystemCoreClock = CYCLES_PER_MICROSECOND * 1000000;

#if CYCLES_PER_MICROSECOND == 168
	  SetupClock168MHz();
#elif CYCLES_PER_MICROSECOND == 120
	  SetupClock120MHz();
#elif CYCLES_PER_MICROSECOND == 96
	  SetupClock96MHz();
#elif CYCLES_PER_MICROSECOND == 84
	  SetupClock84MHz();
#elif CYCLES_PER_MICROSECOND == 72
	  SetupClock72MHz();
#else
	#error Wrong CYCLES_PER_MICROSECOND!
#endif
}
so you'd probably want to do a 'bypass' so that it keeps calling SetupClock84MHz()
even if you change CYCLES_PER_MICROSECOND,
e.g. comment out the other branches and leave only SetupClock84MHz();
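
a minimal sketch of what that bypass could look like. SetupClock84MHz() is stubbed out here so the snippet compiles on its own; in the real core it programs the PLL. the counter setup84_calls is only there for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CYCLES_PER_MICROSECOND 96   /* new target clock in MHz */

uint32_t SystemCoreClock;           /* stand-in for the core's global */
int setup84_calls;                  /* illustration only: counts stub calls */

/* Stub for the core's SetupClock84MHz(); the real one sets up the PLL and
   is where the M/N/P/Q values would be retuned. */
static void SetupClock84MHz(void)
{
    setup84_calls++;
}

void rcc_clk_init(void)
{
    SystemCoreClock = CYCLES_PER_MICROSECOND * 1000000u;
    assert(SystemCoreClock != 0u);

    /* Bypass: always call the (retuned) 84 MHz routine, whatever
       CYCLES_PER_MICROSECOND is set to. */
    SetupClock84MHz();
}
```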

then you can go into SetupClock84MHz() and tweak the m, n, p, q prescalers to get 96 MHz

that python script would be useful there
viewtopic.php?p=393#p393
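
the arithmetic behind the prescalers is simple enough to sketch in C as well: VCO = HSE / M * N, SYSCLK = VCO / P, and the USB clock = VCO / Q. the 25 MHz crystal is the CRYSTAL_FREQ=25 from boards.txt; the divider values in the usage note are example values that give the right frequencies, not values read from the core, and the names pll_out / pll_calc are made up. the reference manual limits on M, N, P, Q and on the VCO range still need to be checked by hand.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t vco_mhz;     /* PLL VCO output: HSE / M * N */
    uint32_t sysclk_mhz;  /* VCO / P */
    uint32_t usb_mhz;     /* VCO / Q, needs to be 48 for USB */
} pll_out;

/* Main PLL outputs for a crystal of hse_mhz and dividers M, N, P, Q.
   Integer MHz only; does not check the reference manual limits. */
pll_out pll_calc(uint32_t hse_mhz, uint32_t m, uint32_t n, uint32_t p, uint32_t q)
{
    assert(m > 0 && p > 0 && q > 0);
    pll_out r;
    r.vco_mhz    = (hse_mhz * n) / m;
    r.sysclk_mhz = r.vco_mhz / p;
    r.usb_mhz    = r.vco_mhz / q;
    return r;
}
```

for example pll_calc(25, 25, 336, 4, 7) gives 84 MHz with a 48 MHz USB clock, and pll_calc(25, 25, 192, 2, 4) gives 96 MHz, again with 48 MHz for USB.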

keep a copy of your 'custom' libmaple core, lest you 'forget' when you do an 'update'
;)
Last edited by ag123 on Sun Jan 19, 2020 8:45 pm, edited 1 time in total.

Post by Bingo600 »

I just enabled ART (I think it might be default)


Beginning Whetstone benchmark at 84 MHz ...
0 0 0 1.00 -1.00 -1.00 -1.00
120000 140000 120000 -0.00 0.00 -0.00 0.00
140000 120000 120000 -0.00 0.00 0.00 0.00
3450000 1 1 1.00 -1.00 -1.00 -1.00
2100000 1 2 6.00 6.00 0.00 0.00
320000 1 2 0.00 0.00 0.00 0.00
8990000 1 2 1.00 1.00 1.00 1.00
6160000 1 2 3.00 2.00 3.00 0.00
0 2 3 1.00 -1.00 -1.00 -1.00
930000 2 3 1.00 1.00 1.00 1.00

Loops:10000, Iterations:1, Duration:7974.93 millisec
C Converted Single Precision Whetstones:125.39 Mflops

ART disabled
Beginning Whetstone benchmark at 84 MHz ...
0 0 0 1.00 -1.00 -1.00 -1.00
120000 140000 120000 -0.00 0.00 -0.00 0.00
140000 120000 120000 -0.00 0.00 0.00 0.00
3450000 1 1 1.00 -1.00 -1.00 -1.00
2100000 1 2 6.00 6.00 0.00 0.00
320000 1 2 0.00 0.00 0.00 0.00
8990000 1 2 1.00 1.00 1.00 1.00
6160000 1 2 3.00 2.00 3.00 0.00
0 2 3 1.00 -1.00 -1.00 -1.00
930000 2 3 1.00 1.00 1.00 1.00

Loops:10000, Iterations:1, Duration:10925.91 millisec
C Converted Single Precision Whetstones:91.53 Mflops

ART enabled
Beginning Whetstone benchmark at 84 MHz ...
0 0 0 1.00 -1.00 -1.00 -1.00
120000 140000 120000 -0.00 0.00 -0.00 0.00
140000 120000 120000 -0.00 0.00 0.00 0.00
3450000 1 1 1.00 -1.00 -1.00 -1.00
2100000 1 2 6.00 6.00 0.00 0.00
320000 1 2 0.00 0.00 0.00 0.00
8990000 1 2 1.00 1.00 1.00 1.00
6160000 1 2 3.00 2.00 3.00 0.00
0 2 3 1.00 -1.00 -1.00 -1.00
930000 2 3 1.00 1.00 1.00 1.00

Loops:10000, Iterations:1, Duration:7976.07 millisec
C Converted Single Precision Whetstones:125.37 Mflops
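
For reference, ART on the STM32F4 is switched through the FLASH_ACR register: prefetch, instruction cache and data cache each have an enable bit next to the wait-state field. The sketch below only builds the register value so it can be tried on a PC; the function name is made up and is not taken from flashF4.h, and the wait-state count must come from the reference manual for your clock and supply voltage.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_ACR_LATENCY_MASK 0xFu        /* wait states, bits 3:0 */
#define FLASH_ACR_PRFTEN       (1u << 8)   /* prefetch enable */
#define FLASH_ACR_ICEN         (1u << 9)   /* instruction cache enable */
#define FLASH_ACR_DCEN         (1u << 10)  /* data cache enable */

/* Hypothetical helper: FLASH_ACR value for a given number of wait states,
   with the ART features (prefetch + both caches) on or off. */
uint32_t flash_acr_value(uint32_t wait_states, int art_on)
{
    assert(wait_states <= FLASH_ACR_LATENCY_MASK);
    uint32_t acr = wait_states & FLASH_ACR_LATENCY_MASK;
    if (art_on)
        acr |= FLASH_ACR_PRFTEN | FLASH_ACR_ICEN | FLASH_ACR_DCEN;
    return acr;
}
```

With 2 wait states as an example, flash_acr_value(2, 1) is 0x702 and flash_acr_value(2, 0) is 0x002.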


I had to add : "-I{build.core.path}/libmaple/" to platform.txt - to compile/resolve flashF4.h

compiler.libs.c.flags="-I{build.system.path}/libmaple" "-I{build.core.path}/libmaple/" "-I{build.core.path}/libmaple/usbF4"
Attachments
STM32F4-board-platform.tar.bz2 (4.24 KiB)
BP411-wheatstone-Steve-ART.tar.bz2 (6.5 KiB)

Post by ag123 »

like pito mentioned
viewtopic.php?p=558#p558
it would superficially look like we beat the 2 GHz P4 at a mere 84 MHz

but of course we aren't comparing the same thing, to get more mflops than mhz, there is vector floating point
literally 2 vector lanes in 1 fpu.
probably back then gcc -O3 didn't know how to auto-vectorize whetstone for the P4 using SSE
:lol:

we have that high end arm 11 fpu in our little stm32 F4
http://infocenter.arm.com/help/topic/co ... dejjh.html
https://community.arm.com/cfs-file/__ke ... D00_M7.pdf
Last edited by ag123 on Sun Jan 19, 2020 9:05 pm, edited 1 time in total.

Post by Bingo600 »

ag123 wrote: Sun Jan 19, 2020 8:55 pm like pito mentioned
viewtopic.php?p=558#p558
it would superficially look like we beat the 2 Ghz P4 at only a mere 84 mhz

but of course we aren't comparing the same thing, to get more mflops than mhz, there is vector floating point
literally 2 vector lanes in 1 fpu.
probably back then gcc -O3 don't know how to auto vectorize whetstone for the P4 using SSE

we have that high end arm 11 fpu in our little stm32 F4
http://infocenter.arm.com/help/topic/co ... dejjh.html
https://community.arm.com/cfs-file/__ke ... D00_M7.pdf
:lol:
The whitepaper is interesting, but I can't find the "literally 2 vector lanes in 1 fpu" part.

It says: All of the instructions are single-cycle on Cortex-M4 (except hardware divide)
All the "dual" I can see is for the M7, and we don't have an ARM11; power usage would "skyrocket".

But THANK YOU for spending so much time with me on this FPU quest :)
Really appreciated

Edit: I'm inclined to agree with pito ... our numbers are, or could be, fishy ... if we beat a 2 GHz P4

/Bingo
Last edited by Bingo600 on Sun Jan 19, 2020 9:20 pm, edited 1 time in total.