Fast bitbanding gpio/sram access

arpruss
Posts: 153
Joined: Sat Sep 30, 2017 3:34 am


Post by arpruss » Sat Nov 25, 2017 8:00 pm

I made some bitband-based GPIO preprocessor macros that generate fast, small I/O code. They only work with pins named explicitly, like PB13 and PA6. Usage:

Code:

value = DIGITAL_READ(PB13);
DIGITAL_WRITE(PB14,1);
Notes: DIGITAL_READ() returns 0 or 1 at no additional cost, unlike gpio_read_bit(), which returns 0 or the mask. You need explicit PXxx pin names, or else things will go badly, because the macros parse the pin's name as text. For instance, DIGITAL_READ(0+PB13) will fail silently, since the port letter is then read as '+'. (DIGITAL_WRITE(0+PB13,1), on the other hand, gives a compile-time error. I couldn't manage to put the error check into the DIGITAL_READ() macro.) DIGITAL_WRITE(pin,value) ignores all but the LSB of the value.
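The pin parsing works because the macros apply the preprocessor's # operator: the pin argument becomes a string literal that is picked apart character by character. Here is a host-side sketch of the same parsing in plain C++14 (not the macros themselves; the function names are hypothetical). It also shows why 0+PB13 goes wrong: the string is "0+PB13", so the "port letter" is '+'.

```cpp
#include <cstdint>

// Hypothetical constexpr equivalents of ATOI2 / XPIN_TO_GPIO /
// XPARSE_PIN_BIT / XCHECK_PIN_ID, applied to the stringified pin name.

// One- or two-digit decimal parse, mirroring ATOI2.
constexpr uint32_t atoi2(const char* s) {
    return s[1] ? 10u * (uint32_t)(s[0] - '0') + (uint32_t)(s[1] - '0')
                : (uint32_t)(s[0] - '0');
}

// Port letter is the second character: "PB13" -> 'B'.
constexpr char pinToGpio(const char* name) { return name[1]; }

// Bit number is the digits after the port letter: "PB13" -> 13.
constexpr uint32_t parsePinBit(const char* name) { return atoi2(name + 2); }

// Same shape test as XCHECK_PIN_ID: 'P', a port letter A..I,
// one or two digits, and a bit number below 16.
constexpr bool isValidPinName(const char* n) {
    return n[0] == 'P' && n[1] >= 'A' && n[1] <= 'I' &&
           n[2] >= '0' && n[2] <= '9' &&
           (n[3] == 0 || (n[3] >= '0' && n[3] <= '9' && n[4] == 0)) &&
           atoi2(n + 2) < 16;
}

#define STR(pin) #pin  // the text the real macros see

static_assert(pinToGpio(STR(PB13)) == 'B', "port letter");
static_assert(parsePinBit(STR(PB13)) == 13u, "bit number");
static_assert(isValidPinName(STR(PA6)), "PA6 is accepted");
// "0+PB13" does not start with 'P', so the check rejects it.
static_assert(!isValidPinName(STR(0+PB13)), "0+PB13 is rejected");
```

On the board, the parsed letter and bit feed BITBAND_GPIO_INPUT/OUTPUT; here they are only checked with static_assert.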

I also included macros for sram access. For instance, you can do:

Code:

BITBAND_SRAM(&var1,7) = BITBAND_SRAM(&var2,0);
to set bit 7 of var1 to the value of bit 0 of var2. (The macro takes an address, so pass &var for an ordinary variable.)
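The BITBAND_SRAM formula can be checked by hand. Each bit in the SRAM region gets its own 32-bit word in the alias region, so the byte offset is multiplied by 32 and the bit number by 4. A minimal constexpr sketch of the same arithmetic (the address 0x20000100 is just an example, not a real variable):

```cpp
#include <cstdint>

// Bit-band constants for the SRAM region, as in the macros.
constexpr uint32_t kSramStart       = 0x20000000u;
constexpr uint32_t kBitbandSramBase = 0x22000000u;

// Alias word for bit `bit` of the byte at `byteAddress` (same formula as
// BITBAND_SRAM): one byte of SRAM maps to 32 bytes of alias space.
constexpr uint32_t sramAlias(uint32_t byteAddress, uint32_t bit) {
    return kBitbandSramBase + (byteAddress - kSramStart) * 32u + bit * 4u;
}

// Example address 0x20000100 (hypothetical variable location).
static_assert(sramAlias(0x20000100u, 0) == 0x22002000u, "bit 0");
static_assert(sramAlias(0x20000100u, 7) == 0x2200201Cu, "bit 7");
```

Bit numbers above 7 simply run into the following bytes, so bit n of a 32-bit variable works as expected with the same formula.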

Below is the ORIGINAL code with a blinky demo (black pill). **I have since modified the code to use BSRR instead of bitbanding for DIGITAL_WRITE(), putting the slower, but smaller, bitbanded version into DIGITAL_WRITE_BITBAND. The modified version is here:** https://gist.github.com/arpruss/5be978f ... 7abf954c68

Code:

// Bit-band alias regions: every bit of SRAM / peripheral space has its own 32-bit alias word
#define BITBAND_SRAM_BASE        0x22000000
#define SRAM_START               0x20000000
#define BITBAND_PERIPHERAL_BASE  0x42000000
#define PERIPHERAL_START         0x40000000

// alias word address = alias base + (byte offset from region start) * 32 + bit * 4
#define BITBAND_SRAM(address, bit) ( *(volatile uint32*)( BITBAND_SRAM_BASE + ((uint32)(address)-SRAM_START) * 32 + (uint32)(bit)*4) )
#define BITBAND_PERIPHERAL(address, bit) *( (volatile uint32*)( BITBAND_PERIPHERAL_BASE + ((uint32)(address)-PERIPHERAL_START) * 32 + (uint32)(bit)*4) )
// note: GPIO_ADDRESS is unused below and relies on GPIO_START, which this snippet does not define
#define GPIO_ADDRESS(gpio) (GPIO_START + 0x400 * (uint32)((gpio)-'A'))
// GPIO port register blocks are 0x400 bytes apart, starting from GPIOA
#define GPIO_OFFSET(gpioLetter) (0x400 * (uint32)((gpioLetter)-'A'))
#define BITBAND_GPIO_INPUT(gpioLetter, bit) BITBAND_PERIPHERAL((uint32)&(GPIOA_BASE->IDR)+GPIO_OFFSET(gpioLetter), (bit))
#define BITBAND_GPIO_OUTPUT(gpioLetter, bit) BITBAND_PERIPHERAL((uint32)&(GPIOA_BASE->ODR)+GPIO_OFFSET(gpioLetter), (bit))

// parse the stringified pin name: "PB13" -> port letter 'B', bit 13 (one or two digits)
#define ATOI2(s) ((s)[1] ? 10*(uint32)((s)[0]-'0') + (s)[1]-'0' : (uint32)((s)[0]-'0'))
#define XPIN_TO_GPIO(pin) (#pin[1])
#define XPARSE_PIN_BIT(pin) (ATOI2(#pin+2))
// compile-time check that the pin is written as P<A..I><0..15>
#define XCHECK_PIN_ID(pin) static_assert(#pin[0] == 'P' && #pin[1] >= 'A' && #pin[1] <= 'I' && #pin[2] >= '0' && #pin[2] <= '9' && \
    (#pin[3] == 0 || (#pin[3] >= '0' && #pin[3] <= '9' && #pin[4] == 0)) && ATOI2(#pin+2) < 16, "Invalid pin " #pin)

// DIGITAL_READ has no XCHECK_PIN_ID, so a malformed pin is not caught at compile time
#define DIGITAL_READ(pin) (BITBAND_GPIO_INPUT(XPIN_TO_GPIO(pin), XPARSE_PIN_BIT(pin)))
#define DIGITAL_WRITE(pin,value) do { XCHECK_PIN_ID(pin); (BITBAND_GPIO_OUTPUT(XPIN_TO_GPIO(pin), XPARSE_PIN_BIT(pin)))=(value); } while(0)

void setup() {
  Serial.begin(9600);
  pinMode(PB12, OUTPUT);
}

void loop() {
  DIGITAL_WRITE(PB12,1);
  Serial.println(String(DIGITAL_READ(PB12),HEX) );
  delay(1000);
  DIGITAL_WRITE(PB12,0);
  Serial.println(String(DIGITAL_READ(PB12),HEX) );
  delay(1000);
}
Last edited by arpruss on Sun Nov 26, 2017 3:56 pm, edited 1 time in total.
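The same arithmetic applied to the blinky demo's PB12. The constants below are STM32F103 values from the reference manual (GPIOA at 0x40010800, ports 0x400 apart, IDR at offset 0x08, ODR at 0x0C), which is an assumption about the target chip; the function names are hypothetical.

```cpp
#include <cstdint>

// STM32F103 values (reference manual), assumed for this sketch.
constexpr uint32_t kPeriphStart       = 0x40000000u;
constexpr uint32_t kBitbandPeriphBase = 0x42000000u;
constexpr uint32_t kGpioaBase         = 0x40010800u; // GPIOA
constexpr uint32_t kGpioPortStride    = 0x400u;      // GPIOB = GPIOA + 0x400, ...
constexpr uint32_t kIdrOffset         = 0x08u;
constexpr uint32_t kOdrOffset         = 0x0Cu;

// Same formula as BITBAND_PERIPHERAL.
constexpr uint32_t periphAlias(uint32_t address, uint32_t bit) {
    return kBitbandPeriphBase + (address - kPeriphStart) * 32u + bit * 4u;
}

// Mirrors BITBAND_GPIO_OUTPUT(letter, bit): ODR of port `letter`.
constexpr uint32_t gpioOutputAlias(char letter, uint32_t bit) {
    return periphAlias(kGpioaBase + kOdrOffset +
                       kGpioPortStride * (uint32_t)(letter - 'A'), bit);
}

// Mirrors BITBAND_GPIO_INPUT(letter, bit): IDR of port `letter`.
constexpr uint32_t gpioInputAlias(char letter, uint32_t bit) {
    return periphAlias(kGpioaBase + kIdrOffset +
                       kGpioPortStride * (uint32_t)(letter - 'A'), bit);
}

// PB12: GPIOB ODR is at 0x40010C0C, so its bit 12 alias is 0x422181B0.
static_assert(gpioOutputAlias('B', 12) == 0x422181B0u, "PB12 output alias");
```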

dannyf
Posts: 228
Joined: Wed May 11, 2016 4:29 pm

Re: Fast bitbanding gpio/sram access

Post by dannyf » Sun Nov 26, 2017 3:32 am

arpruss wrote:
I made some fast bitband-based gpio preprocessor macros that generate fast and small i/o code.
Just curious: how much faster are they than the standard approaches?

arpruss

Re: Fast bitbanding gpio/sram access

Post by arpruss » Sun Nov 26, 2017 5:36 am

dannyf wrote:
Sun Nov 26, 2017 3:32 am
I made some fast bitband-based gpio preprocessor macros that generate fast and small i/o code.
just curious: how much faster are they than the standard approaches?
Basically, reading and writing become as fast as reading/writing a uint32 at a location pointed to by a global uint32* pointer. I haven't timed it yet.

Update: Not quite. The processor seems to have extra overhead when accessing bitbanded memory locations, especially when writing.
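One likely source of the extra write cost: on the Cortex-M3, a store to a bit-band alias word is carried out as an atomic read-modify-write of the underlying word, while a load just reads the word and returns the selected bit as 0 or 1. A minimal host-side model of those two operations (hypothetical names, plain integers standing in for memory, not real hardware access):

```cpp
#include <cstdint>

// Alias read: the whole word is read and the bit is returned as 0 or 1.
constexpr uint32_t aliasRead(uint32_t word, uint32_t bit) {
    return (word >> bit) & 1u;
}

// Alias write: read the word, replace one bit with bit 0 of `value`,
// and return the word to be written back (one locked transfer in hardware).
constexpr uint32_t aliasWrite(uint32_t word, uint32_t bit, uint32_t value) {
    return (value & 1u) ? (word | (1u << bit)) : (word & ~(1u << bit));
}

static_assert(aliasWrite(0x00000000u, 12, 1) == 0x00001000u, "set bit 12");
static_assert(aliasWrite(0xFFFFFFFFu, 12, 0) == 0xFFFFEFFFu, "clear bit 12");
static_assert(aliasWrite(0u, 3, 2) == 0u, "only bit 0 of value counts");
static_assert(aliasRead(0x00001000u, 12) == 1u, "reads back as 1, not the mask");
```

The model also matches the notes in the first post: DIGITAL_WRITE only uses the LSB of the value, and DIGITAL_READ returns 0 or 1.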
Last edited by arpruss on Sun Nov 26, 2017 3:57 pm, edited 1 time in total.

RogerClark
Posts: 7683
Joined: Mon Apr 27, 2015 10:36 am
Location: Melbourne, Australia

Re: Fast bitbanding gpio/sram access

Post by RogerClark » Sun Nov 26, 2017 6:46 am

FYI.

I have looked at the digitalWrite function, and it's pretty well optimised, despite not looking like it.

The compiler is a strange beast.

I'd try timing your code, taking the call overhead into consideration, to see whether it is much faster.

Pito
Posts: 1739
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: Fast bitbanding gpio/sram access

Post by Pito » Sun Nov 26, 2017 9:12 am

1 million for-loop iterations, switching the pin 1/0 with:

Code:

digitalWrite       1517ms   (591ns/Write)
DIGITAL_WRITE       570ms   (118ns/WRITE)
1 million for-loop iterations, reading the pin with:

Code:

digitalRead         974ms   (640ns/Read)
DIGITAL_READ        584ms   (250ns/READ)
Empty for-loop: 334 ms.
BPill, 72MHz, default opt., Roger's core.
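The per-operation figures follow from subtracting the empty-loop time and dividing by the number of operations per iteration (two for the 1/0 write loops, one for the read loops). The arithmetic as a small constexpr sketch (names are illustrative):

```cpp
// Per-operation time in ns from a loop benchmark: subtract the empty-loop
// time (ms), then divide by the total number of operations performed.
constexpr double nsPerOp(double totalMs, double emptyLoopMs,
                         double iterations, double opsPerIteration) {
    return (totalMs - emptyLoopMs) * 1e6 / (iterations * opsPerIteration);
}

// Pito's numbers: 1,000,000 iterations, empty loop 334 ms.
constexpr double digitalWriteNs = nsPerOp(1517, 334, 1e6, 2); // 591.5
constexpr double macroWriteNs   = nsPerOp(570, 334, 1e6, 2);  // 118 (DIGITAL_WRITE)
constexpr double digitalReadNs  = nsPerOp(974, 334, 1e6, 1);  // 640
constexpr double macroReadNs    = nsPerOp(584, 334, 1e6, 1);  // 250 (DIGITAL_READ)
```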

Code:

#include "Arduino.h"
  #define BITBAND_SRAM_BASE        0x22000000
  #define SRAM_START               0x20000000
  #define BITBAND_PERIPHERAL_BASE  0x42000000
  #define PERIPHERAL_START         0x40000000

  #define BITBAND_SRAM(address, bit) ( *(volatile uint32*)( BITBAND_SRAM_BASE + ((uint32)(address)-SRAM_START) * 32 + (uint32)(bit)*4) )
  #define BITBAND_PERIPHERAL(address, bit) *( (volatile uint32*)( BITBAND_PERIPHERAL_BASE + ((uint32)(address)-PERIPHERAL_START) * 32 + (uint32)(bit)*4) )
  #define GPIO_ADDRESS(gpio) (GPIO_START + 0x400 * (uint32)((gpio)-'A'))
  #define GPIO_OFFSET(gpioLetter) (0x400 * (uint32)((gpioLetter)-'A'))
  #define BITBAND_GPIO_INPUT(gpioLetter, bit) BITBAND_PERIPHERAL((uint32)&(GPIOA_BASE->IDR)+GPIO_OFFSET(gpioLetter), (bit))
  #define BITBAND_GPIO_OUTPUT(gpioLetter, bit) BITBAND_PERIPHERAL((uint32)&(GPIOA_BASE->ODR)+GPIO_OFFSET(gpioLetter), (bit))

  #define ATOI2(s) ((s)[1] ? 10*(uint32)((s)[0]-'0') + (s)[1]-'0' : (uint32)((s)[0]-'0'))
  #define XPIN_TO_GPIO(pin) (#pin[1])
  #define XPARSE_PIN_BIT(pin) (ATOI2(#pin+2))
  #define XCHECK_PIN_ID(pin) static_assert(#pin[0] == 'P' && #pin[1] >= 'A' && #pin[1] <= 'I' && #pin[2] >= '0' && #pin[2] <= '9' && \
      (#pin[3] == 0 || (#pin[3] >= '0' && #pin[3] <= '9' && #pin[4] == 0)) && ATOI2(#pin+2) < 16, "Invalid pin " #pin)

  #define DIGITAL_READ(pin) (BITBAND_GPIO_INPUT(XPIN_TO_GPIO(pin), XPARSE_PIN_BIT(pin)))
  #define DIGITAL_WRITE(pin,value) do { XCHECK_PIN_ID(pin); (BITBAND_GPIO_OUTPUT(XPIN_TO_GPIO(pin), XPARSE_PIN_BIT(pin)))=(value); } while(0)

//port/gpio oriented macros
#define IO_SET(port, pins)          port->regs->ODR |= (pins)       //set bits on port
#define IO_CLR(port, pins)          port->regs->ODR &=~(pins)       //clear bits on port

//fast routines through BRR/BSRR registers
#define FIO_SET(port, pins)         port->regs->BSRR = (pins)
#define FIO_CLR(port, pins)         port->regs->BRR = (pins)

volatile uint32_t elapsed, dummy, i;
void setup() {
    Serial.begin(9600);
    pinMode(PB12, OUTPUT);
    pinMode(PB11, INPUT);
  }

void loop() {
    elapsed = millis();
    for (i=0; i<1000000; i++) {
        // digitalWrite (PB12,1);
        // DIGITAL_WRITE(PB12,1);
        // dummy = digitalRead(PB11);
        // dummy = DIGITAL_READ(PB11);
        IO_SET(GPIOB, 1 << 12);   // pins is a bit mask: 1 << 12 selects PB12 (bare 12 would be PB2|PB3)
        IO_CLR(GPIOB, 1 << 12);
        // FIO_SET(GPIOB, 1 << 12);
        // FIO_CLR(GPIOB, 1 << 12);
        // digitalWrite (PB12,0);
        // DIGITAL_WRITE(PB12,0);
    }
    elapsed = millis() - elapsed;
    Serial.println(elapsed);
    delay(1000);
  }
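In the IO_SET/IO_CLR/FIO_SET/FIO_CLR macros the pins argument goes straight into ODR, BSRR or BRR, so it is a bit mask rather than a pin number: pin n needs 1 << n (libmaple's BIT(n) gives the same value). A tiny compile-time check of the difference (pinMask is a hypothetical helper):

```cpp
#include <cstdint>

// Hypothetical helper: the register mask for a single pin number.
constexpr uint32_t pinMask(uint32_t pinNumber) { return 1u << pinNumber; }

static_assert(pinMask(12) == 0x1000u, "PB12 -> bit 12");
// The bare number 12 (0b1100) is the mask for pins 2 and 3 instead.
static_assert(12u == (pinMask(2) | pinMask(3)), "12 selects pins 2 and 3");
```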
Last edited by Pito on Sun Nov 26, 2017 1:30 pm, edited 3 times in total.
Pukao Hats Cleaning Services Ltd.

stevestrong
Posts: 2063
Joined: Mon Oct 19, 2015 12:06 am
Location: Munich, Germany

Re: Fast bitbanding gpio/sram access

Post by stevestrong » Sun Nov 26, 2017 9:54 am

The libraries make wide use of core defines like these:

Code:

const uint8_t inPin = PB0;
const uint8_t outPin = PB10;

volatile uint32_t * inPort = portInputRegister(digitalPinToPort(inPin));
uint16_t inMask = BIT(digitalPinToBit(inPin));
volatile uint32_t * setPort = portSetRegister(outPin);
uint16_t outMask = BIT(digitalPinToBit(outPin));
...
  bool rd = ( (*inPort) & inMask ) ? 1 : 0; // read
...
  *setPort = outMask; // set the pin (BSRR low half)
  *setPort = (uint32_t)outMask<<16; // reset the pin (BSRR high half)
How much faster is your code compared to this?
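The set/reset lines rely on how BSRR behaves on the F1: writing a 1 to bit n (0..15) sets pin n, writing a 1 to bit n+16 resets it, zeros are ignored, and if both are written for the same pin the set wins. That is also why a BSRR write needs no read-modify-write. A host-side model (hypothetical function names; bsrrWordFor is only an illustration of a BSRR-based write, not the code from arpruss's gist):

```cpp
#include <cstdint>

// Model of how GPIOx_BSRR updates ODR: clear the pins in the high half,
// then set the pins in the low half (so set has priority).
constexpr uint32_t applyBsrr(uint32_t odr, uint32_t bsrr) {
    return ((odr & ~(bsrr >> 16)) | (bsrr & 0xFFFFu)) & 0xFFFFu;
}

// Hypothetical helper: the single BSRR word that drives pin `bit` to `value`.
constexpr uint32_t bsrrWordFor(uint32_t bit, uint32_t value) {
    return (value & 1u) ? (1u << bit) : (1u << (bit + 16));
}

static_assert(applyBsrr(0x0000u, 1u << 12) == 0x1000u, "set PB12");
static_assert(applyBsrr(0x1000u, 1u << (12 + 16)) == 0x0000u, "reset PB12");
static_assert(applyBsrr(0x0004u, bsrrWordFor(12, 1)) == 0x1004u, "other pins untouched");
```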

dannyf

Re: Fast bitbanding gpio/sram access

Post by dannyf » Sun Nov 26, 2017 12:58 pm

arpruss wrote:
I haven't timed it yet.
I happened to be running IAR (7.10 I think) when I asked the question.

So I decided to benchmark my GPIO macros:

Code:

//port/gpio oriented macros
#define IO_SET(port, pins)					port->ODR |= (pins)				//set bits on port
#define IO_CLR(port, pins)					port->ODR &=~(pins)				//clear bits on port

//fast routines through BRR/BSRR registers
#define FIO_SET(port, pins)					port->BSRR = (pins)
#define FIO_CLR(port, pins)					port->BRR = (pins)
Here is what I got:

Code:

		IO_SET(LED_PORT, LED); IO_CLR(LED_PORT, LED);	//flip led, 9 cycles total @ medium optimization
		FIO_SET(LED_PORT, LED); FIO_CLR(LED_PORT, LED);	//flip led, 3 cycles total @ medium optimization
I think bitbanding is one of those beautiful concepts that doesn't quite work in reality.

Pito

Re: Fast bitbanding gpio/sram access

Post by Pito » Sun Nov 26, 2017 1:18 pm

Here it is under our core:

Code:

//port/gpio oriented macros
#define IO_SET(port, pins)          port->regs->ODR |= (pins)       //set bits on port
#define IO_CLR(port, pins)          port->regs->ODR &=~(pins)       //clear bits on port

//fast routines through BRR/BSRR registers
#define FIO_SET(port, pins)         port->regs->BSRR = (pins)
#define FIO_CLR(port, pins)         port->regs->BRR = (pins)
With my benchmark above (1 million iterations of 1/0, empty loop 334 ms):

Code:

IO_SET/CLR  570ms  (118ns/IO) 8.5cycles
FIO_SET/CLR 417ms  (42ns/FIO) 3.02cycles
I have updated the benchmark above.
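The cycle counts follow from the per-operation times at 72 MHz: cycles = ns × f[MHz] / 1000. With the empty loop subtracted, (570 - 334)/2 = 118 ns and (417 - 334)/2 = 41.5 ns, rounded to 42 ns. As a small sketch:

```cpp
// Convert a per-operation time to CPU cycles at a given clock.
constexpr double cyclesPerOp(double ns, double clockMHz) {
    return ns * clockMHz / 1000.0;
}

constexpr double ioCycles  = cyclesPerOp(118.0, 72.0); // about 8.5 (IO_SET/CLR)
constexpr double fioCycles = cyclesPerOp(42.0, 72.0);  // about 3.02 (FIO_SET/CLR)
```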

arpruss

Re: Fast bitbanding gpio/sram access

Post by arpruss » Sun Nov 26, 2017 2:21 pm

An advantage of my macros over some of the other solutions is that my macros don't require any setup -- no masks or addresses to define.

dannyf

Re: Fast bitbanding gpio/sram access

Post by dannyf » Sun Nov 26, 2017 2:45 pm

It doesn't need setup only because you explicitly define the addresses in the macros themselves.

You could have done it with other macros. And it is actually a good idea NOT to define your own and to rely on the device header file instead, so your macros are portable across platforms.
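A sketch of that suggestion: derive register offsets from a struct that describes the GPIO block instead of hard-coding them. GpioRegs below is a hypothetical stand-in for the device header's struct (field order as in the STM32F1 reference manual), and the GPIOB base address is an STM32F103 assumption; with a real header you would use its own type and base pointers.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the device header's GPIO register block.
struct GpioRegs {
    volatile uint32_t CRL, CRH, IDR, ODR, BSRR, BRR, LCKR;
};

static_assert(offsetof(GpioRegs, IDR) == 0x08, "IDR offset");
static_assert(offsetof(GpioRegs, ODR) == 0x0C, "ODR offset");
static_assert(offsetof(GpioRegs, BSRR) == 0x10, "BSRR offset");
static_assert(offsetof(GpioRegs, BRR) == 0x14, "BRR offset");

// Peripheral bit-band alias of one register bit.
constexpr uint32_t periphAlias(uint32_t regAddress, uint32_t bit) {
    return 0x42000000u + (regAddress - 0x40000000u) * 32u + bit * 4u;
}

constexpr uint32_t kGpiobBase = 0x40010C00u; // STM32F103 GPIOB (assumed)
static_assert(periphAlias(kGpiobBase + offsetof(GpioRegs, ODR), 12) == 0x422181B0u,
              "PB12 ODR alias from the struct layout");
```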
