[STM32GENERIC/HAL] SerialUSB TX/RX speed problem

Discussions about the STM32generic core
danieleff
Posts: 336
Joined: Thu Sep 01, 2016 8:52 pm
Location: Hungary
Contact:

Re: [STM32GENERIC/HAL] SerialUSB TX speed problem

Post by danieleff » Sun Aug 13, 2017 11:16 am

There is timeout in writes like libmaple so that might contribute.
In SerialUSBClass write, unsigned long timeout=millis()+5;
Try to increase it.

Sorry I was not able to reproduce the error, but do not have time to thoroughly test this.

victor_pv
Posts: 1603
Joined: Mon Apr 27, 2015 12:12 pm

Re: [STM32GENERIC/HAL] SerialUSB TX speed problem

Post by victor_pv » Sun Aug 13, 2017 1:24 pm

Pito I have not tested with Generic yet, only libmaple F4. I will test with generic when I have time again.

In libmaple the buffers are defined in another line, but same effect. Those buffers are 2KB each by default in the latest repo. I increased those without much effect, except if I get them really slow.
Playing with the buffers in commecho had a more definitive effect, the best results is when the transmission total, each individual chunk, and the commecho buffers were all multiples. I.E. 1milling total bytes, sent in chunks of 100 at a time, and commecho buffers of 1000 each. Still the speed sending 1 byte at a time to 100 bytes at a time is almost the same overall total speed. I think the bottleneck was in commecho receiving and sending back, but at least confirmed it got everything to Windows and back.
If I try to send 1000 bytes a time, I lose data, because I think it fills the libmaple buffer faster than Windows is taking it, and then the TX timeout quicks in and doesn't send the total it should.
My F4 libmaple is modified so it does not dump RX bytes evers. The current repo copy will dump RX data if the sketch is not reading it at the same speed that comes from the USB bus, and what's worse, it does so without properly moving the tail in the RX fifo buffer, so you end up with very corrupted data, not just lost, but bytes received later would be read before the older ones by the application.

I'll try to compile the sketch like it is with the Generic core and see what it does.

User avatar
Pito
Posts: 1498
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: [STM32GENERIC/HAL] SerialUSB TX speed problem

Post by Pito » Sun Aug 13, 2017 3:07 pm

@victor: do not use buffers in the sketch, do experiment with CDC_SERIAL_BUFFER_SIZE only plz.

Here it works only with this (Serial.readBytes() timeouts after 1 sec)

Code: Select all

  for (i = 0; i < (TXCHARS); i++) {
    Serial.write(x);
    //n+= Serial.readBytes(buf, bufsize);
    while(Serial.available()>0) {
      char dummy = Serial.read();
      n++;
    }
  }
and the result with Generic and #define CDC_SERIAL_BUFFER_SIZE 4096
CommEcho test.JPG
CommEcho test.JPG (25.53 KiB) Viewed 146 times
Interestingly with CDC_SERIAL_BUFFER_SIZE 128 I get a crap as well
CommEcho 128.JPG
CommEcho 128.JPG (36.45 KiB) Viewed 146 times
.
Here Win7_64bit.

Update: with

Code: Select all

size_t SerialUSBClass::write(const uint8_t *buffer, size_t size){
   unsigned long timeout=millis()+100; //5
the same..

Update2: with CommEcho buffer 2 or 10 or 1000 the same..
Last edited by Pito on Sun Aug 13, 2017 3:46 pm, edited 6 times in total.
Pukao Hats Cleaning Services Ltd.

vitor_boss
Posts: 61
Joined: Wed Apr 19, 2017 9:50 am

Re: [STM32GENERIC/HAL] SerialUSB TX speed problem

Post by vitor_boss » Sun Aug 13, 2017 3:15 pm

W10 x64 creators update. I have disabled PERS buffers on COM advanced settings for both tests.

EDIT: I'm using default CDC_SERIAL_BUFFER_SIZE

victor_pv
Posts: 1603
Joined: Mon Apr 27, 2015 12:12 pm

Re: [STM32GENERIC/HAL] SerialUSB TX speed problem

Post by victor_pv » Sun Aug 13, 2017 9:05 pm

Looks like the generic core dumps RX bytes if the app doesn't pick them, like libmaple F4, and unlinke Libmaple F1.
The function is here:
https://github.com/danieleff/STM32GENER ... B.cpp#L150

In libmaple F4, it would overwrite what's in the buffer. In the GENERIC core it will just not save the incoming packet to the buffer if there is no capacity, so if the buffer has let's say 10 bytes free, and a packet of 40 bytes comes in, it will write the first 10 bytes, and return. The packet will get over written with the next incoming packet since the communication is not stopped with the host and the 30 bytes that did not make it to the buffer will be lost forever.
So this will dump RX if the host sends faster than the sketch reads them.

On TX, it will write until the buffer fills up.

But if the buffer has capacity for only part of what we want to send, I believe it will write that part in the buffer, but not return the correct number of bytes that were buffered, instead return 0.
I.E. We want to send 100 bytes. Buffer has capacity for 40. It will buffer 40, and return 0. It should return 40, so the sketch can know what happened and can continue sending the rest.
That is not according to the Arduino API:
https://www.arduino.cc/en/Serial/Write
write() will return the number of bytes written, though reading that number is optional
So that will cause the sketches to end up sending corrupted data to the host if the buffer ever fills during a transmission.

https://github.com/danieleff/STM32GENER ... SB.cpp#L98
This is the part of the function doing that (breaking and returning 0 even if some bytes made it to the buffer):

Code: Select all

   for(size_t i=0; i < size; i++) {

       tx_buffer.buffer[tx_buffer.iHead] = *buffer;
       tx_buffer.iHead = (tx_buffer.iHead + 1) % sizeof(tx_buffer.buffer);
       buffer++;

       while(tx_buffer.iHead == tx_buffer.iTail && millis()<timeout);
       if (tx_buffer.iHead == tx_buffer.iTail) break;
   }
I think it could be corrected like this:

Code: Select all

   while(size--) {

       tx_buffer.buffer[tx_buffer.iHead] = *buffer;
       tx_buffer.iHead = (tx_buffer.iHead + 1) % sizeof(tx_buffer.buffer);
       buffer++;

       while(tx_buffer.iHead == tx_buffer.iTail && millis()<timeout);
       if (tx_buffer.iHead == tx_buffer.iTail) break;
   }
return size;
But there is more code in the function about setting a "transmitting" flag that I still need to check and understand, then I will see if the code may need anything else to not break something else.
There is also the possibility that the transmitting flag is 0 and then will return size even if the loop above "breaks".


I still have to look at why it fails with large buffers.

vitor_boss
Posts: 61
Joined: Wed Apr 19, 2017 9:50 am

Re: [STM32GENERIC/HAL] SerialUSB TX speed problem

Post by vitor_boss » Mon Aug 14, 2017 4:13 am

victor_pv wrote:
Sun Aug 13, 2017 9:05 pm
...
I think it could be corrected like this:

Code: Select all

   while(size--) {

       tx_buffer.buffer[tx_buffer.iHead] = *buffer;
       tx_buffer.iHead = (tx_buffer.iHead + 1) % sizeof(tx_buffer.buffer);
       buffer++;

       while(tx_buffer.iHead == tx_buffer.iTail && millis()<timeout);
       if (tx_buffer.iHead == tx_buffer.iTail) break;
   }
return size;
It could be easier like this:

Code: Select all

if( i<size) { return i; }
else { return size; }
EDIT:
Another thing witch could improve speed is change frequency of calls to millis(), like that:

Code: Select all

   unsigned long lastmS=millis();
   for(size_t i=0; i < size; i++) {

       tx_buffer.buffer[tx_buffer.iHead] = *buffer;
       tx_buffer.iHead = (tx_buffer.iHead + 1) % sizeof(tx_buffer.buffer);
       buffer++;
      
       if(size & 16) { lastmS=millis(); } //Take less than a mS to send 16bytes
       while( (tx_buffer.iHead == tx_buffer.iTail) && (lastmS<timeout) );
       if (tx_buffer.iHead == tx_buffer.iTail) break;
   }

victor_pv
Posts: 1603
Joined: Mon Apr 27, 2015 12:12 pm

Re: [STM32GENERIC/HAL] SerialUSB TX speed problem

Post by victor_pv » Mon Aug 14, 2017 4:30 am

vitor_boss wrote:
Mon Aug 14, 2017 4:13 am

It could be easier like this:

Code: Select all

if( i<size) { return i; }
else { return size; }
My code was wrong anyway, since I was going to decrement size, returning it would be wrong.
But I'm not sure I follow yours, why do you need to compare? i will never be larger than size, could be equal or smaller, if it is smaller, you need to return i, if it is equal, you can return size, but also return i, since they are equal. In that case, just return i always without having to compare to size.
But then the declaration of i needs to be moved out of the for loop.
Anyway, the function is still larger than that and does semething with a transmitting value which I think is messed up and contributing to the issues.

About this line, I don't think I am following you either. That timeout does not depend on how long it takes to send a number of bytes. It is rather a timeout so the function doesn't block forever if there is no host receiving the data.
vitor_boss wrote:
Mon Aug 14, 2017 4:13 am
if(size & 16) { lastmS=millis(); } //Take less than a mS to send 16bytes
[/code]
EDIT: Removed my previous comments, too late and I dont think I'm thinking right.

vitor_boss
Posts: 61
Joined: Wed Apr 19, 2017 9:50 am

Re: [STM32GENERIC/HAL] SerialUSB TX speed problem

Post by vitor_boss » Mon Aug 14, 2017 4:50 am

victor_pv wrote:
Mon Aug 14, 2017 4:30 am
vitor_boss wrote:
Mon Aug 14, 2017 4:13 am

It could be easier like this:

Code: Select all

if( i<size) { return i; }
else { return size; }
My code was wrong anyway, since I was going to decrement size, returning it would be wrong.
But I'm not sure I follow yours, why do you need to compare? i will never be larger than size, could be equal or smaller, if it is smaller, you need to return i, if it is equal, you can return size, but also return i, since they are equal. In that case, just return i always without having to compare to size.
But then the declaration of i needs to be moved out of the for loop.
Anyway, the function is still larger than that and does semething with a transmitting value which I think is messed up and contributing to the issues.

About this line, I don't think I am following you either. That timeout does not depend on how long it takes to send a number of bytes. It is rather a timeout so the function doesn't block forever if there is no host receiving the data.
vitor_boss wrote:
Mon Aug 14, 2017 4:13 am
if(size & 16) { lastmS=millis(); } //Take less than a mS to send 16bytes
[/code]
EDIT: Removed my previous comments, too late and I dont think I'm thinking right.
I didn't see the "i < size" condition, so just "return i" should work, about lastmS, forgive me if I misunderstood but that loop isn't to fill buffer? The real transmitting is done on https://github.com/danieleff/STM32GENER ... B.cpp#L121 right?

EDIT: UART speed is mediocre 10.49 KBytes/sec, any way to improve that?

User avatar
Pito
Posts: 1498
Joined: Sat Mar 26, 2016 3:26 pm
Location: Rapa Nui

Re: [STM32GENERIC/HAL] SerialUSB TX/RX speed problem

Post by Pito » Mon Aug 14, 2017 5:57 am

Frankly, I do not understand how we can "loose" (or we must drop or dump or overwrite) bytes while transmitting or receiving via USB.

The USB communication is based on "packets" where the RX/TX control is done via handshaking (ACK/NACK/STALL handshake packets), where packet sizes for control transfer stuff (like command and status) are 8 to 64 bytes in size, and the payload packet size is max 1024 bytes (actually 8, 16, 32, 64, 512, 1023 or 1024 based on the type of a transfer).

There are no bigger packet sizes, afaik (plz correct me).

When we set the RX buffer to 1024 and TX buffer to 1024 (perfectly feasible sizes for any stm32) we must not drop/dump/overwrite any bytes and thus we cannot loose any bytes when doing TX or RX via USB..
:?
Pukao Hats Cleaning Services Ltd.

victor_pv
Posts: 1603
Joined: Mon Apr 27, 2015 12:12 pm

Re: [STM32GENERIC/HAL] SerialUSB TX/RX speed problem

Post by victor_pv » Mon Aug 14, 2017 3:27 pm

Pito wrote:
Mon Aug 14, 2017 5:57 am
Frankly, I do not understand how we can "loose" (or we must drop or dump or overwrite) bytes while transmitting or receiving via USB.

The USB communication is based on "packets" where the RX/TX control is done via handshaking (ACK/NACK/STALL handshake packets), where packet sizes for control transfer stuff (like command and status) are 8 to 64 bytes in size, and the payload packet size is max 1024 bytes (actually 8, 16, 32, 64, 512, 1023 or 1024 based on the type of a transfer).

There are no bigger packet sizes, afaik (plz correct me).

When we set the RX buffer to 1024 and TX buffer to 1024 (perfectly feasible sizes for any stm32) we must not drop/dump/overwrite any bytes and thus we cannot loose any bytes when doing TX or RX via USB..
:?
You are correct Pito, but that is how it should work, but not always implemented like that. In the Generic core every packet that the host sent is ACKed inmediately even if it doesn't fit in the buffer. So if the packet doesn't fit, it stays in the USB device ram to be overwritten by the next packet sent by the host.
The libmaple F1 will stop ACKing packets when the buffer is full, so the host has to hold up. In the libmaple F4 SerialUSB, which is not really libmaple since it was added by AeroQuad from the Standard Peripheral Library, every packet is ACKed whether it fit in the buffer or not.
I have modified that already (libmaple F4) so it does not ACK when the buffer is full, and as soon as the buffer has capacity for 1 more packet, then it ACKs the previous one and the host can continue sending.

TX in all the cores (libmaple F1, F4 and Generic) has a timeout, if the transmission can't be completed within X mS, it returns. But the behaviour varies on that too. I need to confirm, but I believe is like follows:
Libmaple F1 & F4: Returns the number of bytes correctly queued for send.
Generic F4: Returns 0 even if some bytes were queue. Additionally, it has a "transmission" variable that as I can't manage to understand what exactly is intended to do. I think the intention is for it to indicate how many bytes are in the out buffer waiting to be sent, but that should rather be calculated with the head and tail of the queue. Also I am not sure that transmission variable gets the correct value depending on that path the code takes. But that may be me not understanding it correctly, although your confirmation that TX is corrupted and missing bytes I think is confirmation that doesn't work right.

Given that the Generic core is based in the HAL, it may be a good idea to replace the SerialUSB code with the one from STM.

Post Reply

Who is online

Users browsing this forum: No registered users and 1 guest