Execute from external SRAM

Generic boards that are not Maple or Maple mini clones, and don't contain the additional USB reset hardware
Pito
Posts: 1502
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: Execute from external SRAM

Post by Pito » Sat Jan 07, 2017 7:52 am

For an EXRAM build, comment out the VTOR write in STM32F1\cores\maple\libmaple\nvic.c:

Code: Select all

void nvic_set_vector_table(uint32 address, uint32 offset) {
  //  SCB_BASE->VTOR = address | (offset & 0x1FFFFF80);
}
With that change in place you must not modify the Vector Table address in STM32F1\boards.txt.
SysTick then starts working.
Everything blinks as it should :) :)
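
For reference, the line commented out above ORs a base address with a masked offset. The host-side sketch below only reproduces that arithmetic; `vtor_value` and `VTOR_OFFSET_MASK` are hypothetical names used for illustration, not part of libmaple. It shows that the 0x1FFFFF80 mask drops the low seven bits of the offset, so an offset that is not a multiple of 128 bytes is rounded down.

```c
#include <stdint.h>

/* Mask taken from the commented-out line in nvic.c */
#define VTOR_OFFSET_MASK 0x1FFFFF80u

/* Hypothetical helper: the value nvic_set_vector_table() would write to VTOR */
static uint32_t vtor_value(uint32_t address, uint32_t offset) {
    return address | (offset & VTOR_OFFSET_MASK);
}
```

With these definitions, `vtor_value(0x20000000, 0xFC00)` gives 0x2000FC00, while an offset of 0x10C0 is reduced to 0x1080 by the mask.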
Last edited by Pito on Sat Jan 07, 2017 9:00 am, edited 1 time in total.
Pukao Hats Cleaning Services Ltd.

Pito
Posts: 1502
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: Execute from external SRAM

Post by Pito » Sat Jan 07, 2017 8:57 am

Now the difficult part - Serial over USB in EXRAM :) :)
This is the sketch that blinks under the debugger. It steps through the Serial calls, but I see no HelloW chars in the TeraTerm terminal; the Loader prints ok before the jump into the APP in EXRAM.
It also blinks standalone (off the debugger), again with no HelloW chars in the terminal.
The variant that waits on isConnected loops forever around isConnected in the debugger.

Code: Select all

void setup() {
  int i;
  pinMode(PC13, OUTPUT);
  digitalWrite(PC13, HIGH);
  
  delay(5000);
  
  Serial.begin(115200);  

  //while(!Serial.isConnected()){};

  Serial.println("Hello World from EXRAM!");
  Serial.println("Now we blink 17x..");

  for (i = 0; i < 17; i++) {  
  digitalWrite(PC13, LOW);
  delay(500);
  digitalWrite(PC13, HIGH);
  delay(500);
  }

  digitalWrite(PC13, HIGH);
  Serial.println("Blinking stopped..");
}

void loop() {
}
It seems that getting USB Serial to work in EXRAM would require significant expertise :) :ugeek:

What we do basically, imho:
1. we init usb dfu in the Bootloader,
2. then we init serial usb in the Loader and print something ok,
3. then we jump to APP in EXRAM and we init usb serial again (?),
4. after the jump to APP it does not disconnect/connect
5. it runs smoothly through the serial
6. it does not print HelloW,
7. it does not crash and
8. then it goes to blinking and blinks 17x (2017 :) )..

And this is what it looks like in EXRAM:

Code: Select all

  Serial.begin(115200);  
6800019e:	4b14      	ldr	r3, [pc, #80]	; (680001f0 <_Z5setupv+0x6c>)
680001a0:	4814      	ldr	r0, [pc, #80]	; (680001f4 <_Z5setupv+0x70>)
680001a2:	f44f 31e1 	mov.w	r1, #115200	; 0x1c200
680001a6:	4798      	blx	r3

  //while(!Serial.isConnected()){};

  Serial.println("Hello World from EXRAM!");
680001a8:	4c13      	ldr	r4, [pc, #76]	; (680001f8 <_Z5setupv+0x74>)
680001aa:	4914      	ldr	r1, [pc, #80]	; (680001fc <_Z5setupv+0x78>)
680001ac:	4811      	ldr	r0, [pc, #68]	; (680001f4 <_Z5setupv+0x70>)
680001ae:	47a0      	blx	r4
  Serial.println("Now we blink 17x..");
680001b0:	4810      	ldr	r0, [pc, #64]	; (680001f4 <_Z5setupv+0x70>)
680001b2:	4913      	ldr	r1, [pc, #76]	; (68000200 <_Z5setupv+0x7c>)
680001b4:	47a0      	blx	r4
680001b6:	2511      	movs	r5, #17

  for (i = 0; i < 17; i++) {  
  digitalWrite(PC13, LOW);
680001b8:	2100      	movs	r1, #0

Pito
Posts: 1502
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: Execute from external SRAM

Post by Pito » Sat Jan 07, 2017 11:14 am

Ok, let me try with Serial1 :)
COM22 is Serial USB from the LOADER; it prints 8+8 bytes of the APP from EXRAM to verify that the FSMC works and that the APP has been loaded into the EXRAM. COM5 is Serial1 from the HelloWorld sketch above, running in EXRAM.
:) :) :) :) :ugeek: :ugeek: :ugeek: :ugeek: :) :) :) :)
[Attachment: HelloWorld from EXRAM.JPG]
Now let me run something larger :P

Pito
Posts: 1502
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: Execute from external SRAM

Post by Pito » Sat Jan 07, 2017 11:40 am

Bubble Sort on 3000 random uints of various types, allocated by malloc(), with everything except the Vector Table in EXRAM.
VT at 0x2000FFC0 in reserved IRAM (reserved so that the Loader's IRAM ends 4kB lower in the linker script, to be safe :) ).
STM32F103ZET6 @72MHz, SRAM 256kx16, 10ns, FSMC timing D1, A0.
[Attachment: BubbleSort in EXRAM.JPG]
And the same in flash:

Code: Select all

Allocating memory - all in IRAM..
********
Generating 3000 8bit uints:
Checksum: 383933
BubbleSorting 8bit uints:
Elapsed: 1190  msecs
2x compare count: 0
2x rd and 2x wr count: 0
Checksum: 383933
Sorted last 10 in ascending order:
2990 254
2991 254
2992 254
2993 255
2994 255
2995 255
2996 255
2997 255
2998 255
2999 255
********
Generating 3000 16bit uints:
Checksum: 99006717
BubbleSorting 16bit uints:
Elapsed: 1127  msecs
2x compare count: 0
2x rd and 2x wr count: 0
Checksum: 99006717
Sorted last 10 in ascending order:
2990 65361
2991 65379
2992 65399
2993 65408
2994 65414
2995 65437
2996 65497
2997 65510
2998 65522
2999 65531
********
Generating 3000 32bit uints:
Checksum: 2798690133398
BubbleSorting 32bit uints:
Elapsed: 1065  msecs
2x compare count: 0
2x rd and 2x wr count: 0
Checksum: 2798690133398
Sorted last 10 in ascending order:
2990 1992061039
2991 1993095694
2992 1993936447
2993 1994165868
2994 1994355317
2995 1994787653
2996 1994815262
2997 1994870290
2998 1996946895
2999 1999650861
********
Generating 3000 64bit uints:
Checksum: 16959245246837334
BubbleSorting 64bit uints:
Elapsed: 1406  msecs
2x compare count: 0
2x rd and 2x wr count: 0
Checksum: 16959245246837334
Sorted last 10 in ascending order:
2990 11843812700115
2991 11848708201890
2992 11852166314406
2993 11853718180899
2994 11854411491885
2995 11857725163128
2996 11860571865573
2997 11863881749010
2998 11864248679358
2999 11872081464387
The next plans:
1. make USB Serial work ??
2. fine-tuning the FSMC - I doubt it could be made faster
3. how to return from EXRAM
4. documentation :)
5. loading APPs from an SDcard and running them in EXRAM (300.000+ sketch_binaries on a $5 sdmicro) :)
6. CP/Mduino :)
Last edited by Pito on Thu Jan 12, 2017 10:28 am, edited 1 time in total.

martinayotte
Posts: 1213
Joined: Mon Apr 27, 2015 1:45 pm

Re: Execute from external SRAM

Post by martinayotte » Sat Jan 07, 2017 3:28 pm

Pito wrote:6. CP/Mduino :)
:lol:

Pito
Posts: 1502
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: Execute from external SRAM

Post by Pito » Sat Jan 07, 2017 8:12 pm

I like retro :)

Some more benchmarks (from the first post of this forum's thread "Dhrystone and Whetstone Benchmarks for STM32F103"). For the avoidance of any doubt, the first result in each block is from IRAM:

Code: Select all

Starting Whetstone benchmark...
Loops: 1000Iterations: 1Duration: 20366 millisec.
C Converted Double Precision Whetstones: 4.91 MIPS

Starting Whetstone benchmark...
Loops: 1000Iterations: 1Duration: 179589 millisec.
C Converted Double Precision Whetstones: 556.83 KIPS

Code: Select all

Dhrystone Benchmark, Version 2.1 (Language: C)
Execution starts, 300000 runs through Dhrystone
Execution ends
Microseconds for one run through Dhrystone: 12.93
Dhrystones per Second: 77332.02
VAX MIPS rating = 44.01

Dhrystone Benchmark, Version 2.1 (Language: C)
Execution starts, 300000 runs through Dhrystone
Execution ends
Microseconds for one run through Dhrystone: 104.23
Dhrystones per Second: 9594.56
VAX MIPS rating = 5.46
A 5.5 VAX MIPS machine, with "unlimited RAM space". A lot of $$ 40y back ;)
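
The VAX MIPS ratings above follow from dividing the Dhrystones-per-second figure by 1757, the score conventionally assigned to the VAX 11/780 reference machine. A small host-side sketch of that arithmetic, plus the slowdown ratio between two runs, is given below; `vax_mips` and `slowdown` are hypothetical helper names chosen for illustration.

```c
/* Dhrystones per second of the VAX 11/780, the conventional 1 MIPS reference */
#define VAX_DHRYSTONES_PER_SEC 1757.0

/* Convert a Dhrystones-per-second score into a VAX MIPS rating */
static double vax_mips(double dhrystones_per_sec) {
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC;
}

/* How many times slower the second run is than the first */
static double slowdown(double fast_time, double slow_time) {
    return slow_time / fast_time;
}
```

Fed with the numbers above, `vax_mips(77332.02)` is about 44.01 and `vax_mips(9594.56)` about 5.46, while `slowdown(12.93, 104.23)` is roughly 8.06. The same ratio for the Whetstone durations (20366 ms and 179589 ms) is about 8.8.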

victor_pv
Posts: 1607
Joined: Mon Apr 27, 2015 12:12 pm

Re: Execute from external SRAM

Post by victor_pv » Sun Jan 08, 2017 1:12 pm

The USB port, as it is coded in the core, should disconnect and reconnect when the board is powered up (bootloader running) and again when the bootloader jumps to the sketch (the sketch init code forces a disconnect and reconnect, and the PC re-enumerates the port). The same should happen when you jump from the loader to the EXRAM sketch.
It is possible that the part that initializes the serial USB is called from one of the functions we disable, probably from setupClock().
It is also possible that the disconnect is not long enough for Windows to detect. In another thread Roger mentioned he had extended that time slightly because some people were having trouble with the enumeration.

If I understand right, you are first copying the NVIC table to internal RAM, and then setting the VTOR value, is that right?
Do you do that in main() or in the sketch (setup() or loop())?

Pito
Posts: 1502
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: Execute from external SRAM

Post by Pito » Sun Jan 08, 2017 1:24 pm

At 128MHz CPU, running all in EXRAM (SRAM settings D1, A0):

Code: Select all

Dhrystone Benchmark, Version 2.1 (Language: C)
Execution starts, 300000 runs through Dhrystone

Execution ends
Microseconds for one run through Dhrystone: 57.35
Dhrystones per Second: 17437.22
VAX MIPS rating = 9.92
***
victor_pv wrote:If I understand right, you are first copying the NVIC table to internal RAM, and then setting the VTOR value, is that right?
Yes, in the xLOADER (in its sketch) I copy bin.h to EXRAM, then the NVIC's 512 bytes (512 is simply a generous size, I do not care about the exact one) from bin.h to 0x2000FC00 in IRAM (reserved by the linker for the xLOADER), then I set the new VTOR value.

Code: Select all

  
  // copy the bin.h with the APP Sketch to EXRAM
  for (i = 0; i < bin_len; i++) {
    EXRAM8(i) = bin[i];
  }

  // copy the APP Vector Table to 0x2000FC00 in IRAM
  for (i = 0; i < 512; i++) {
    IRAMVT(i) = bin[i];
  }

  Serial1.println("Starting the .bin:");

  /* Jump to the Sketch code loaded into EXRAM and execute it */

  // word 1 of the APP's Vector Table holds the address of its reset handler
  JumpAddress = *(__IO uint32_t*) (ApplicationAddress + 4);
  Jump_To_Application = (pFunction) JumpAddress;

  nvicDisableInterrupts();  // systick disabled too

  // point VTOR (at 0xE000ED08) to the Vector Table copy in IRAM
  //SCB_BASE->VTOR = (volatile uint32_t) 0x2000FC00;
  *(volatile uint32_t*)0xE000ED08 = 0x2000FC00;

  /* Initialize user application's Stack Pointer */
  __set_CONTROL(0); // Change from PSP to MSP
  __set_MSP(*(__IO uint32_t*) ApplicationAddress);  // word 0 holds the initial SP

  Jump_To_Application();
}
victor_pv wrote:Do you do that in main() or in the sketch (setup() or loop())?
I do that in the xLOADER's setup().
I do nothing special in the APP Sketch. The APP Sketch source is 100% the same for EXRAM and IRAM. The build environments differ, of course (as described in this thread).
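
The two copy loops can be tried out on a PC by replacing the memory-mapped targets with ordinary buffers. The sketch below is only an illustration under that assumption: `load_image` and `VT_BYTES` are hypothetical names, the buffers stand in for the EXRAM at 0x68000000 and the reserved IRAM at 0x2000FC00, and the length check is an addition that the loops above do not have.

```c
#include <stddef.h>
#include <stdint.h>

#define VT_BYTES 512u  /* vector table size copied by the xLOADER in the post */

/* Hypothetical host-side version of the two copy loops.
 * exram and iram_vt are plain buffers standing in for the real targets.
 * Returns 0 on success, -1 if the image is too short to hold a vector table. */
static int load_image(volatile uint8_t *exram, volatile uint8_t *iram_vt,
                      const uint8_t *bin, size_t bin_len) {
    size_t i;
    if (bin_len < VT_BYTES) {
        return -1;              /* added guard: avoid reading past the image */
    }
    for (i = 0; i < bin_len; i++) {
        exram[i] = bin[i];      /* whole image, vector table included */
    }
    for (i = 0; i < VT_BYTES; i++) {
        iram_vt[i] = bin[i];    /* first 512 bytes again, into the IRAM copy */
    }
    return 0;
}
```

On the target, the same function could be called with pointers to the real addresses, but that part is not covered by this host-side sketch.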

############# EXRAM on BlueZEX USAGE GUIDE v 0.01 ##################################
###
### EXRAM lets you run APPs limited in size only by the EXRAM's size, under the stm32duino IDE,
### provided you have connected the EXRAM memory (up to 256 MBytes) to an STM32F MCU
### via the STM's FSMC interface.
###
### The EXRAM can be an SRAM, PSRAM or SDRAM memory. Which memory types the FSMC interface
### supports depends on the STM32F family.
###
### When implemented, the EXRAM makes it possible to reload and run large APPs dynamically, from any
### media, with the help of a control program or an OS. Currently, a small sketch - the xLOADER - loads
### the APP's binary into the EXRAM and starts executing the APP from the EXRAM.
###
### Note: Not all STM32F chips support the FSMC, not all chip packages support the FSMC.
###
### The first implementation and performance results have been provided on an STM32F103ZET6
### ebay board described on the stm32duino wiki, with a 256kx16 10ns SRAM chip connected to
### FSMC's Bank 1, sub-bank n.3 (EXRAM base address 0x68000000).
###
### This page includes vital information on the EXRAM implementation started by Pito, with
### contributions from Pito, Victor_pv and Rick:
### "Execute from external SRAM"
### stm32duino.com/viewtopic.php?f=28&t=1651
### The information therein will be summarized and refined in this Guide.
###
### Provided As-Is. No warranties of any kind are provided. Use at your own risk.
### by Pito, January 8th 2017
###
#################################################################################

IRAM - the internal STM32F103xx RAM (starting at 0x20000000, size as per MCU).

FSMC - the internal STM32F controller which enables the use of an external SRAM, NOR, NAND, PSRAM, SDRAM.

EXRAM - an External SRAM, PSRAM, or SDRAM connected via the FSMC to STM32F103xx MCU.
SRAM memory Bank 1 in the STM32F103xx starts at 0x6X000000 - the EXRAM's base address.
The size of the actual EXRAM depends on the specific chip(s) connected; the FSMC allows 4 subbanks of 64MB each, max 256MB.
X in the base address depends on the FSMC hw chipselect used: X = 0, 4, 8, C for subbanks 1 to 4 respectively.
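
The base address rule above can be written as a small calculation: Bank 1 starts at 0x60000000 and each subbank adds 64MB (0x04000000). The sketch below is a host-side illustration; `exram_base` and the two macro names are hypothetical and not taken from libmaple.

```c
#include <assert.h>
#include <stdint.h>

#define FSMC_BANK1_BASE   0x60000000u
#define FSMC_SUBBANK_SIZE 0x04000000u  /* 64 MB per subbank */

/* Hypothetical helper: base address of Bank 1 subbank n, with n in 1..4 */
static uint32_t exram_base(unsigned n) {
    assert(n >= 1 && n <= 4);
    return FSMC_BANK1_BASE + (uint32_t)(n - 1u) * FSMC_SUBBANK_SIZE;
}
```

For subbank 3, as used on the board described in the guide header, `exram_base(3)` evaluates to 0x68000000.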

APP - a standard 100% arduino/stm32duino sketch we want to run off the EXRAM.
The APP sketch size (= text, data, bss, heap, stack) has to fit into the EXRAM's size.
The APP must have the FSMC enabled.
The APP has to be built for EXRAM.

xLOADER - a standard arduino sketch which loads the APP sketch binary into the EXRAM and starts the APP in EXRAM.
The xLOADER resides in flash and uses IRAM.
The xLOADER must have the FSMC enabled.
The xLOADER has to be built for IRAM.
The xLOADER requires space at the top of IRAM to be reserved by the linker in order to provide room for the EXRAM Vector Table.

EXRAM Vector Table - the table which must be placed within the IRAM. It contains the interrupt vectors the APP is using. The EVT is currently placed just below the top of IRAM.

Important Hacks - the fixes which must be done in order to build the APP and the xLOADER.
The hacks include a few changes - in linker files, in start_c.c (FSMC init sequence), boards.cpp (commenting out flash and clock inits in init()), nvic.c (commenting out the VTOR setting), and commenting out the inits of GPIO_D,E,F,G - and all are described in this thread.

bin.h - an include file for the xLOADER sketch, which contains the APP's binary formatted as "C-source static const unsigned char include_array" called bin[].
For example:

Code: Select all

static const unsigned char bin[] = {
  0x00, 0xf8, 0x07, 0x68, 0x39, 0x06, 0x00, 0x68, 0x91, 0x09, 0x00, 0x68,
  0x95, 0x09, 0x00, 0x68, 0x99, 0x09, 0x00, 0x68, 0x9d, 0x09, 0x00, 0x68,
..
  0x02, 0x00, 0x01, 0x04, 0x03, 0x09, 0x04, 0x00, 0x43, 0x00, 0x00, 0x00,
  0xb8, 0x52, 0x00, 0x68, 0x78, 0x4d, 0x00, 0x68
};
unsigned int bin_len = 25844;
The bin[] array itself is generated by the xxd tool ("xxd -i APP_sketch.bin"), where APP_sketch.bin is built within the standard IDE with the important hacks applied.
The bin_len value equals the size of APP_sketch.bin.
The xLOADER reads bin[] and places the binary into the EXRAM (starting from the EXRAM's base address).
The current understanding is that an APP could be loaded anywhere in the EXRAM, so several APPs could be loaded in EXRAM at different places.
The APP's binary could be read by the xLOADER from any media (flash/SDcard/HDD/punch tape), provided you modify the xLOADER sketch accordingly.
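
The first eight bytes of the example bin[] above are the first two words of the APP's vector table: the initial stack pointer and the reset handler address, both stored little-endian. The host-side sketch below decodes them; `read_le32`, `initial_sp`, `reset_handler` and `bin_head` are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Read a little-endian 32-bit word from a byte array */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* First 8 bytes of the example bin[] above */
static const unsigned char bin_head[8] = {
    0x00, 0xf8, 0x07, 0x68, 0x39, 0x06, 0x00, 0x68
};

/* Word 0 of the vector table: initial stack pointer */
static uint32_t initial_sp(const unsigned char *image) {
    return read_le32(image);
}

/* Word 1 of the vector table: reset handler address (bit 0 set for Thumb) */
static uint32_t reset_handler(const unsigned char *image) {
    return read_le32(image + 4);
}
```

For the example bytes this gives an initial stack pointer of 0x6807F800 and a reset handler at 0x68000639, both inside the EXRAM region starting at 0x68000000. These are the two words the xLOADER reads from ApplicationAddress and ApplicationAddress + 4 before the jump.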

BlueZEX - the STM32F103ZET6 board with an SRAM chip populated.
The board accepts 256kx16=512kB or 512kx16=1024kB TSOP II 44-pin 3.3V, 10-55ns chips.

RogerClark
Posts: 6726
Joined: Mon Apr 27, 2015 10:36 am
Location: Melbourne, Australia
Contact:

Re: Execute from external SRAM

Post by RogerClark » Sun Jan 08, 2017 8:25 pm

@pito

To load a sketch into ExRAM, you could write some sketch code to accept uploads via serial using the STM32-Flash protocol which the internal bootloader uses.

This would allow you to just reset the board (which would run the uploader sketch), then just set the IDE to use the Serial upload method.

The protocol is fairly simple and well documented by ST, and we also have the C and Python uploaders as a reference.

There is also a branch of the LeafLabs bootloader (in the LeafLabs GitHub repo) which uploads using that protocol. Note, I don't know if that bootloader actually works because I never tried it ;-)

You could even add something to the protocol, e.g. to set a File Name, so that you could store the sketch on SD and retrieve it later.
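
As a rough idea of what an uploader sketch would have to parse, the serial bootloader protocol documented by ST (application note AN3155) sends each command as the command byte followed by its bitwise complement, and each address as four bytes, MSB first, followed by the XOR of those four bytes. The host-side sketch below builds those two frames; the function and macro names are hypothetical, and the byte values are worth checking against the application note before relying on them.

```c
#include <stdint.h>

#define BL_ACK       0x79u  /* acknowledge byte sent by the device */
#define BL_NACK      0x1Fu  /* negative acknowledge */
#define BL_CMD_WRITE 0x31u  /* Write Memory command */
#define BL_CMD_GO    0x21u  /* Go command */

/* A command is sent as the command byte followed by its bitwise complement */
static void frame_command(uint8_t cmd, uint8_t out[2]) {
    out[0] = cmd;
    out[1] = (uint8_t)~cmd;
}

/* An address is sent MSB first, followed by the XOR of its four bytes */
static void frame_address(uint32_t addr, uint8_t out[5]) {
    out[0] = (uint8_t)(addr >> 24);
    out[1] = (uint8_t)(addr >> 16);
    out[2] = (uint8_t)(addr >> 8);
    out[3] = (uint8_t)addr;
    out[4] = (uint8_t)(out[0] ^ out[1] ^ out[2] ^ out[3]);
}
```

For a write to the EXRAM base address, `frame_command(BL_CMD_WRITE, ...)` yields 0x31 0xCE and `frame_address(0x68000000, ...)` yields 0x68 0x00 0x00 0x00 0x68. An uploader sketch would validate these bytes on reception and answer with BL_ACK or BL_NACK.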

victor_pv
Posts: 1607
Joined: Mon Apr 27, 2015 12:12 pm

Re: Execute from external SRAM

Post by victor_pv » Sun Jan 08, 2017 10:05 pm

Pito, I agree Roger's idea can be interesting. Also loading from an SD Card. You could even make the loader show a menu on serial or an LCD screen showing what .bin files are in the sd card and giving the user a choice to pick one.

Now, about returning to the loader: I have been thinking about a possible way, but there are many "ifs"...
I'm numbering the points to make them easier to reference; they are not necessarily steps, nor necessarily in this order:

1. First, we know the stack of the loader and the stack of the App do not overlap, since one is in IRAM and the other in EXRAM, so we are not concerned about the loader stack getting corrupted.
2. Right before loading the App, we change the stack pointer to the EXRAM one.
3. Next we call the APP (the PC should be saved to the stack automatically, shouldn't it?). Rick knows more about how the internals of these MCUs work, so he may be able to help on this.
4. IF the PC is saved automatically when calling the App, then the first entry at the top of the new stack is the return point to get back to the loader sketch.
4b. IF that is not the case, we could read the PC before calling the user code, add as many bytes as needed to get the right return address, and then save it to the top of the new stack before calling the App code.
5. The App could exit by just calling the address saved at the top of the EXRAM stack. That is a known address, so we don't even have to care what else is in that stack by then; reading the first entry and jumping to it should hopefully be enough.
6. IF we can return to the loader, we need to be able, upon the return, to point the SP register back to the bottom of the Loader stack (remember point 2 above: we overwrite the SP with the EXRAM top of stack). To do this, I think we should save the SP to a uint32 variable right before point 2, that is, right before overwriting it with the EXRAM stack. Then the jump to the App function should be followed by an instruction that reads the value from that variable and writes it to SP, and the loader can continue running as normal.

Does anyone see any flaw in this process?

Pito, I think you can verify with the debugger whether the return PC is saved to the top of the new stack, by placing a breakpoint right at the start of the App and checking the content at the top of the stack, right?
