Port manipulation - LCD 8bit parallel

ahull
Posts: 1596
Joined: Mon Apr 27, 2015 11:04 pm
Location: Sunny Scotland

Re: Port manipulation - LCD 8bit parallel

Post by ahull » Sun Jul 03, 2016 8:51 pm

martinayotte wrote: Andy, even if the LCD is parallel, if the code has to assign GPIOs one by one, especially in the case of a shield with D2-D9, where those GPIOs are not consecutive (especially on Netduino2Plus), I/O performance is hard to achieve.
I appreciate that, and am more than impressed with the progress so far. You might like to take a look at http://andybrown.me.uk/2013/08/03/lg-kf700/ in particular the section titled "optimised stm32plus GPIO access mode". It may be applicable to this problem too. You would of course need to know the maximum speed that the parallel interface can cope with, rather than blatting it with data at full pelt.
- Andy Hull -
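
For the non-consecutive pin case, one commonly used trick is to precompute, for every possible byte, the word to write to the port's bit set/reset register, so that each byte costs one table lookup and one register write. The following is only a rough sketch of that idea; it is not taken from the linked article or from any library in this thread. The pin list `kDataPins`, the `FakePort` stand-in and the function names are made up for illustration, and it assumes all eight data pins sit on a single STM32F1-style port (set bits in the low half of BSRR, reset bits in the high half).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical wiring: data bit i of the LCD bus goes to port pin kDataPins[i].
// The pin numbers are invented and deliberately non-consecutive.
const std::array<uint8_t, 8> kDataPins = {0, 1, 4, 5, 8, 9, 12, 13};

// Build the 32-bit BSRR word for one data byte:
// bits 0..15 set the matching pin, bits 16..31 reset it.
inline uint32_t bsrrForByte(uint8_t value) {
    uint32_t word = 0;
    for (unsigned bit = 0; bit < 8; ++bit) {
        assert(kDataPins[bit] < 16);  // a port has 16 pins
        const uint32_t pinMask = 1u << kDataPins[bit];
        word |= (value & (1u << bit)) ? pinMask : (pinMask << 16);
    }
    return word;
}

// Precompute all 256 patterns once, e.g. at start-up.
inline std::array<uint32_t, 256> makeBsrrTable() {
    std::array<uint32_t, 256> table{};
    for (unsigned v = 0; v < 256; ++v) {
        table[v] = bsrrForByte(static_cast<uint8_t>(v));
    }
    return table;
}

// Stand-in for the memory-mapped port so the sketch runs on a PC.
struct FakePort {
    uint16_t odr = 0;  // output latch
    void writeBsrr(uint32_t word) {
        // Clear the pins named in the high half, set the pins named in the low half.
        odr = static_cast<uint16_t>((odr & ~(word >> 16)) | (word & 0xFFFFu));
    }
};
```

On real hardware the `writeBsrr` call would become a single store to the port's BSRR register, and pins on the port that are not in `kDataPins` are left untouched. If the eight data pins were spread over more than one port, one table per port would be needed.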

RogerClark
Posts: 6877
Joined: Mon Apr 27, 2015 10:36 am
Location: Melbourne, Australia

Post by RogerClark » Sun Jul 03, 2016 10:27 pm

Interesting.

I looked at some previous postings to the forum e.g. http://www.stm32duino.com/viewtopic.php?t=862&start=40

and it looks like operations which set individual pixels are faster, but anything that fills large areas is slower.

From what I recall, the Adafruit SPI-based library is not very efficient at writing individual pixels, and I suspect that improvements could be made there to speed up single-pixel access in the SPI version.
(I think the Teensy has an optimised library with fast pixel access.)


Where this really comes into its own is on boards that use the F103V or F103Z, which have whole GPIO ports that are unused.
Further speed improvements can be gained where there are separate data and address ports.

Post by RogerClark » Sun Jul 03, 2016 11:02 pm

Just a quick follow-up about slow single-pixel write speeds in the SPI version (unless I have looked in the wrong file)...

The drawPixel function wastes a lot of time, as it assumes it's being called as a one-off command. But as it's also called from the functions which draw triangles etc., it makes operations like triangles slow.

The simple solution is to have two functions to write a pixel: one that works standalone, and one that needs the SPI to be initialised for it beforehand (e.g. Chip Select pins etc. already toggled).
Then get the triangle functions to call the non-standalone version.

Ideally the non-standalone function would be private to the class, but because the text generator is in a separate cpp, I think that to optimise the text the new function would need to be made public, which is not ideal, but not something I would lose sleep over ;-)

Code: Select all

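void Adafruit_ILI9341_STM::drawPixel(int16_t x, int16_t y, uint16_t color) {

  // Reject pixels outside the visible area.
  if ((x < 0) || (x >= _width) || (y < 0) || (y >= _height)) return;

  // SPI is set up here (and torn down below) on every single call.
  if (hwSPI) spi_begin();
  setAddrWindow(x, y, x + 1, y + 1);

  // DC high = data, CS low = display selected.
  *dcport |=  dcpinmask;
  *csport &= ~cspinmask;

  // Send the 16-bit colour, high byte first.
  spiwrite(color >> 8);
  spiwrite(color);

  // Deselect the display and release SPI.
  *csport |= cspinmask;
  if (hwSPI) spi_end();
}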
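
To make the two-function idea a bit more concrete, here is a minimal sketch that runs on a PC. It is not the Adafruit_ILI9341_STM code: `FakeDisplay`, `beginWrite`, `endWrite`, `writePixelRaw` and `drawDiagonal` are hypothetical names, and the "bus" only counts how often it gets set up, so the difference between the two call patterns can be seen without hardware. The 240x320 size is the usual ILI9341 panel size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the display driver. It only counts bus activity.
class FakeDisplay {
public:
    int transactions = 0;   // how many times the bus was set up
    int pixelsWritten = 0;  // how many pixels reached the (imaginary) panel

    // Standalone version: safe to call from anywhere, pays the setup cost each time.
    void drawPixel(int16_t x, int16_t y, uint16_t color) {
        beginWrite();
        writePixelRaw(x, y, color);
        endWrite();
    }

    // Non-standalone version: the caller must already have called beginWrite().
    void writePixelRaw(int16_t x, int16_t y, uint16_t color) {
        assert(active_);
        if (x < 0 || x >= kWidth || y < 0 || y >= kHeight) return;
        (void)color;  // a real driver would send the address window and colour here
        ++pixelsWritten;
    }

    // Example of a multi-pixel primitive: one setup, many raw pixel writes.
    void drawDiagonal(int16_t x, int16_t y, int16_t length, uint16_t color) {
        beginWrite();
        for (int16_t i = 0; i < length; ++i) {
            writePixelRaw(static_cast<int16_t>(x + i), static_cast<int16_t>(y + i), color);
        }
        endWrite();
    }

    void beginWrite() {  // stands in for spi_begin() plus CS low
        assert(!active_);
        active_ = true;
        ++transactions;
    }

    void endWrite() {    // stands in for CS high plus spi_end()
        active_ = false;
    }

private:
    static constexpr int16_t kWidth = 240;
    static constexpr int16_t kHeight = 320;
    bool active_ = false;
};
```

Drawing 100 pixels through `drawPixel` sets the bus up 100 times, while `drawDiagonal` with a length of 100 sets it up once. `beginWrite` and `endWrite` are public here for the same reason given above: code in another file (such as the text renderer) would need to call them.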

iwalpola
Posts: 24
Joined: Tue Jun 21, 2016 1:08 pm
Location: Silchar, India

Post by iwalpola » Sun Jul 03, 2016 11:13 pm

@Roger, you're almost right, but you're referring to the SPI code, not the 8-bit parallel code. The triangle drawing functions call the functions which draw lines (horizontal, vertical or inclined). These functions in turn call the draw pixel function, but only if the length of the line is 1px.

However, in line drawing, screen filling etc., where there is a lot of data to be written, there was room for improvement in the toggling of CHIP_SELECT and of the RS pin (which decides whether the byte written is a command or data).

So after adjusting (after doing a git diff), the benchmark is better than Kurt's mod!! Maybe there is room for even more improvement, but I guess this is pretty good.

Code: Select all

Benchmark                Time (microseconds)
Screen fill              784951
Text                     27277
Lines                    242789
Horiz/Vert Lines         66034
Rectangles (outline)     42204
Rectangles (filled)      1630078
Circles (filled)         240487
Circles (outline)        180967
Triangles (outline)      58242
Triangles (filled)       563808
Rounded rects (outline)  84634
Rounded rects (filled)   1777583
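
The CHIP_SELECT and RS change described above might look roughly like the sketch below. It is not the code from this library: `FakePin`, `FakeBus`, `write8`, `fillPerPixel` and `fillBurst` are hypothetical names, the pins are simulated so it runs on a PC, and it assumes an 8080-style bus where the controller latches each byte on the WR strobe while CS stays low.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical control line that counts how often its level actually changes.
struct FakePin {
    bool level = true;  // idle high
    int edges = 0;
    void set(bool v) {
        if (v != level) { level = v; ++edges; }
    }
};

struct FakeBus {
    FakePin cs, rs, wr;
    uint8_t data = 0;
    uint32_t bytesLatched = 0;

    // One 8-bit write: put the byte on the data lines, then pulse WR.
    void write8(uint8_t value) {
        assert(!cs.level);  // the display must already be selected
        data = value;
        wr.set(false);
        wr.set(true);       // assumed latch point: rising edge of WR
        ++bytesLatched;
    }
};

// Earlier approach: select and deselect the display around every pixel.
inline void fillPerPixel(FakeBus& bus, uint16_t color, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        bus.cs.set(false);
        bus.rs.set(true);   // RS high = data
        bus.write8(static_cast<uint8_t>(color >> 8));
        bus.write8(static_cast<uint8_t>(color & 0xFF));
        bus.cs.set(true);
    }
}

// Adjusted approach: select once, stream all pixels, deselect once.
inline void fillBurst(FakeBus& bus, uint16_t color, std::size_t count) {
    bus.cs.set(false);
    bus.rs.set(true);
    for (std::size_t i = 0; i < count; ++i) {
        bus.write8(static_cast<uint8_t>(color >> 8));
        bus.write8(static_cast<uint8_t>(color & 0xFF));
    }
    bus.cs.set(true);
}
```

For 1000 pixels, both versions strobe WR 2000 times, but `fillPerPixel` changes CS 2000 times while `fillBurst` changes it twice. On a large fill those extra pin writes are pure overhead.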

Post by RogerClark » Sun Jul 03, 2016 11:25 pm

Yes. I was referring to the SPI version

Does the parallel version toggle CS etc for every pixel, or did you optimise it?

Post by iwalpola » Sun Jul 03, 2016 11:39 pm

RogerClark wrote:Yes. I was referring to the SPI version

Does the parallel version toggle CS etc for every pixel, or did you optimise it?
It was doing CS toggle for every pixel write earlier (my dumb idea). Now I fixed it.

I was worried that it will interfere with the touch digitizer output available in the same module (TFT and Touch share 4 pins).

Post by RogerClark » Mon Jul 04, 2016 12:43 am

iwalpola wrote: It was doing CS toggle for every pixel write earlier (my dumb idea). Now I fixed it.

I was worried that it will interfere with the touch digitizer output available in the same module (TFT and Touch share 4 pins).
Wow, that's interesting.

I thought the difference in speed was probably down to CS, but having checked the code again, I see that each pixel is sent in its own SPI transaction, and the transaction handling is incredibly inefficient when used like this to draw individual pixels.

I think if the spi_begin() call is moved out of an improved version of drawPixel, the SPI version would be substantially faster.

It's actually interesting that no one has looked at doing this before. I guess we were so impressed at the SPI speed being much, much faster than AVR that no one spent any time looking carefully at it to extract even more speed for some functions.

martinayotte
Posts: 1219
Joined: Mon Apr 27, 2015 1:45 pm

Post by martinayotte » Mon Jul 04, 2016 2:09 am

I'm starting to misunderstand the whole thread here:
On every parallel 8-bit transfer, there must be a CS pulse.
So, how can that be skipped for optimisation?

Post by RogerClark » Mon Jul 04, 2016 2:17 am

CS can probably be skipped if only one device is on the SPI bus, but it's hard-coded into the Adafruit ILI9341 lib.

The other line that has to be toggled a lot is the Data / Command pin.

It looks like the main reason that writing single pixels is slow is not the CS line but the unnecessary calling of SPI begin transaction before writing each pixel, which takes up a huge amount of time, as the whole SPI device has to be disabled, all its settings updated, and so on.

Some minimal changes would result in a massive speed improvement in some functions, e.g. drawing of diagonal lines, but the Adafruit graphics lib would also need to be updated to yield faster text speeds.

If I get time this evening, I will take a quick look at the existing code in the Adafruit_ili9341_STM lib (for F1)

Post by iwalpola » Mon Jul 04, 2016 2:36 am

Yet another performance gain by inlining the write8special() function.
Demo (almost too fast to see):
https://youtu.be/sGU_wwX20hM

@Roger, there must surely be potential for improvement in the SPI code as well. What I learned today is that even microseconds' worth of delay adds up fast when you repeat it tens of thousands of times.

Code: Select all

Benchmark                Time (microseconds)
Screen fill              320577
Text                     14869
Lines                    129592
Horiz/Vert Lines         25437
Rectangles (outline)     16493
Rectangles (filled)      666012
Circles (filled)         104502
Circles (outline)        96406
Triangles (outline)      30358
Triangles (filled)       230338
Rounded rects (outline)  41103
Rounded rects (filled)   709046
Done!
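
To put the two benchmark runs side by side, the small sketch below divides the earlier timings (after the CS/RS fix) by the ones above (after inlining). The numbers are copied from the two listings in this thread; the `BenchRow` struct and function names are made up, and it assumes both runs used the same sketch and settings, which the thread does not state explicitly.

```cpp
#include <cassert>
#include <cstdio>

// One benchmark line: time before and after inlining, in microseconds.
struct BenchRow {
    const char* name;
    long beforeUs;
    long afterUs;
};

// Values copied from the two benchmark listings in this thread.
const BenchRow kRows[] = {
    {"Screen fill",             784951, 320577},
    {"Text",                     27277,  14869},
    {"Lines",                   242789, 129592},
    {"Horiz/Vert Lines",         66034,  25437},
    {"Rectangles (outline)",     42204,  16493},
    {"Rectangles (filled)",    1630078, 666012},
    {"Circles (filled)",        240487, 104502},
    {"Circles (outline)",       180967,  96406},
    {"Triangles (outline)",      58242,  30358},
    {"Triangles (filled)",      563808, 230338},
    {"Rounded rects (outline)",  84634,  41103},
    {"Rounded rects (filled)", 1777583, 709046},
};

// Speed-up factor: how many times faster the later run is.
inline double speedup(const BenchRow& row) {
    assert(row.afterUs > 0);
    return static_cast<double>(row.beforeUs) / static_cast<double>(row.afterUs);
}

// Print one "name  factor" line per benchmark.
inline void printSpeedups() {
    for (const BenchRow& row : kRows) {
        std::printf("%-24s %.2fx\n", row.name, speedup(row));
    }
}
```

Calling `printSpeedups()` prints one factor per test. For example, the screen fill comes out a little under 2.5 times faster and the text test a little under 1.9 times faster.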
