Another "blue pill" synthesizer

tfried
Posts: 21
Joined: Mon Dec 04, 2017 8:45 pm

Another "blue pill" synthesizer

Post by tfried » Tue Jan 02, 2018 9:28 pm

I am aware this is not the first such project, but as far as I can see, most projects did not really go anywhere, after all... The ones that did seem to be mostly libraries or frameworks, while what I am aiming at is a synthesizer that is versatile enough to be something that you'd just want to build as is (while still keeping it reasonably easy to customize).

Well, here is what I got so far, an almost serious synth https://github.com/tfry-git/almost.serious.synth, based on the Mozzi library (http://sensorium.github.io/Mozzi/). Features already present:
  • Sound generation based on 15 parameters, including envelope, mixing of two waveforms, low frequency oscillator for various effects, low pass filter.
  • Polyphonic (currently "up to" 12 notes)
  • MIDI input
  • Saving and loading of (currently) one synth voice on SD card
  • Recording and playback of (currently) one MIDI sequence on SD card
  • Low hardware requirements, with some flexibility - currently:
    • a 128*64 SSD1306 display
    • a 4 by 4 button matrix
    • one joystick, _or_ potentiometer, _or_ rotary encoder
    • SD card reader (optional, but recommended)
    • Audio amp and speaker _or_ simply a headphone w/o amp
Features clearly on the radar:
  • Saving / loading of several named voices
  • Saving / loading of several named MIDI sequences
  • More synthesizer effects (e.g. separate envelopes for the oscillators)
  • Support for more hardware options (esp. display)
More details in the readme. Would love to hear your feedback, if you give this a try... and of course you're welcome to contribute!

Re: Another "blue pill" synthesizer

Post by tfried » Wed Jan 10, 2018 9:23 am

Well, so the project is coming along nicely. It now also has a (somewhat clumsy) UI to handle loading and saving of an arbitrary number of voices and MIDI recordings to SD card.

Now I might have become a bit distracted from my original direction here, but next I dug somewhat deeper into MIDI playback, and now - in principle (see below) - the synth can play back (almost) arbitrary MIDI files from SD card. Which is rather cool, also because it means the project could eventually incorporate a multi-track sequencer.

Well, all of that is lofty plans. For now there is not even support for using different voices for different channels. However, I can already play back MIDI format 1 files, which means a collection of simultaneous tracks. Unfortunately, doing so results in choppy sound, due to frequent buffer underruns. After quite a bit of trial and error, I think I'm beginning to understand that the primary cause of my troubles is that I need to read from disjoint positions in the file on SD card, i.e. I need to seek a lot, and that seems to be really slow (simple timing suggests seeks take almost 3ms on average, and I only have 8ms of audio buffer).

To explain the reason for the seeking: MIDI format 1 means several "tracks" of events that may happen in parallel (such as a note starting in the "piano" track, and another in the "bass" track), but the tracks are stored sequentially. There could be up to 65535 parallel tracks, but for now I've set a rather arbitrary limit at 16 tracks (and I don't think many real world MIDI files exceed 32 tracks). The bandwidth of data to read is not terribly huge. Typical MIDI events require between 3 and 8 bytes for storage on file, and while there is no defined upper limit, if you can handle a peak of some two dozen events inside 10ms or so, you should be golden, even for a largish arrangement.
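To illustrate why the tracks end up at disjoint file positions, here is a minimal sketch that walks the chunk headers of a Standard MIDI File and notes where each track's event data starts. It relies on the standard chunk layout ("MThd" or "MTrk" tag followed by a 4-byte big-endian length). The names readBE, TrackChunk and findTracks are hypothetical helpers for illustration, not code from the synth, and the sketch works on a file held in memory rather than on an SD card:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Read a big-endian unsigned integer of n bytes (n <= 4) starting at p.
static uint32_t readBE(const uint8_t *p, int n) {
  uint32_t v = 0;
  for (int i = 0; i < n; ++i) v = (v << 8) | p[i];
  return v;
}

// Hypothetical helper type: where one track's event data lives in the file.
struct TrackChunk {
  uint32_t start;   // file offset of the first event byte
  uint32_t length;  // number of event bytes in this track
};

// Walk the chunk headers and collect the position of each track.
// Returns an empty vector if the file header is malformed.
std::vector<TrackChunk> findTracks(const std::vector<uint8_t> &file) {
  std::vector<TrackChunk> tracks;
  if (file.size() < 14 || std::memcmp(file.data(), "MThd", 4) != 0) return tracks;
  uint32_t headerLen = readBE(&file[4], 4);  // 6 in a regular file
  if (headerLen > file.size() - 8) return tracks;
  uint16_t ntrks = static_cast<uint16_t>(readBE(&file[10], 2));
  uint32_t pos = 8 + headerLen;
  while (tracks.size() < ntrks && static_cast<size_t>(pos) + 8 <= file.size()) {
    uint32_t len = readBE(&file[pos + 4], 4);
    if (std::memcmp(&file[pos], "MTrk", 4) == 0) {
      TrackChunk t;
      t.start = pos + 8;
      t.length = len;
      tracks.push_back(t);
    }
    if (len > file.size() - pos - 8) break;  // truncated chunk, stop here
    pos += 8 + len;                          // next chunk follows directly
  }
  return tracks;
}
```

Each track then needs its own read position between start and start + length, and advancing several tracks in step means jumping between those positions.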

So what I'd like to pick your brains about is ideas to improve on the performance of MIDI playback, here. Some initial ideas:
  • I could convert the MIDI file to format 0 (all events in a single track, i.e. strictly sequential), before playback. Seems pretty clumsy, to me, and I'm worried that it will cause annoying delays.
  • Keep an IO buffer per track. But memory requirements are multiplicative, so the buffers will have to remain rather small. Also, the problem will then shift to keeping several disjoint IO buffers filled. It seems I would need a way to fill several buffers a) asynchronously, and b) in parallel. Has anyone here done such a thing before?
  • So far I'm using the "Arduino" SD-library. I understand that may not be the most performant option, but would you expect switching to "SdFat" will help me, significantly?
  • Your ideas
Any magic bullets for sale on this forum?

stevestrong
Posts: 2067
Joined: Mon Oct 19, 2015 12:06 am
Location: Munich, Germany

Re: Another "blue pill" synthesizer

Post by stevestrong » Wed Jan 10, 2018 1:18 pm

You could read in several sequential tracks beforehand and store them in separate temporary files on SD.
SdFat allows you to open multiple files at the same time, so you would not need to seek. Though it may come with a separate 512-byte buffer in RAM per file, I am not sure, check the SdFat help for more info.
Moreover, if you use the SdFatEX class, it would speed up the read process a lot.

Re: Another "blue pill" synthesizer

Post by tfried » Wed Jan 10, 2018 7:41 pm

You could read in several sequential tracks beforehand and store them in separate temporary files on SD. SdFat allows you to open multiple files at the same time, so you would not need to seek.
Hm, yes, or actually, for the sequencer use case, it may make sense to keep all tracks as separate files, and only export to MIDI format 1 on demand (for exchange). But then I'm still kind of fond of the idea of working on valid MIDI files, directly. Your suggestion would allow for that, but it is also not entirely trivial to implement, and I'm not sure how much it will help. Wouldn't reading from several files simultaneously be much the same as seeking inside one file, on the level of SD card access?
Moreover, if you use the SdFatEX class, it would speed up the read process a lot.
Thanks for the hint. So I did give SdFat a try, and at any rate, the results were quite interesting. I used the following code for timing (where f is an already opened file of a bit more than 5kB):

Code: Select all

    volatile uint32_t do_profile_dummy = 0; // volatile, to keep the compiler from optimizing the reads away
    uint32_t elapsed;
    uint32_t oldt = millis ();
    for (uint32_t i = 0; i < 10000ul; ++i) {
      if (!(i % 5000)) {   // rewind every 5000 reads, as the file is only a bit more than 5kB
        f.seek (0);
      }
      do_profile_dummy += f.read ();
    }
    elapsed = millis () - oldt;   // number of milliseconds needed for 10000 mostly consecutive SD reads

// ... and...

    oldt = millis ();
    for (uint32_t i = 0; i < 10000ul; ++i) {
      f.seek ((i * 967 % 5000));   // jump to a pseudo-random position before every read
      do_profile_dummy += f.read ();
    }
    elapsed = millis () - oldt; // number of milliseconds needed for 10000 mostly "random" SD reads
And the result was

Code: Select all

                SD              SdFat              SdFatEx
contiguous      88                 47                   43
random       25899               4673                 5463
----
flash usage    (0)              +1312                +1392
RAM usage      (0)              + 760                + 768
(timings in ms, RAM and flash consumption in bytes, with use of SD as baseline)

So using SdFat does cut down on the seek latency, quite significantly, at the cost of a bit of flash and RAM. Using SdFatEx increases the throughput (I imagine that differences might be more noticeable when using larger reads), but apparently leads to slightly slower seeking.
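For a rough feel of what these numbers mean in practice, the totals can be broken down into a per-operation cost and compared with the 8ms audio buffer mentioned earlier in the thread. This is only back-of-the-envelope arithmetic on the "random" row of the table; the function names are made up for illustration, and all other CPU work is ignored:

```cpp
#include <cstdint>

// Average cost of one seek-plus-read in microseconds, given the total
// time in milliseconds measured for `ops` operations (rounded down).
constexpr uint32_t perOpMicros(uint32_t totalMs, uint32_t ops) {
  return totalMs * 1000u / ops;
}

// Upper bound on how many such operations fit into an audio buffer
// that lasts `bufferMs` milliseconds.
constexpr uint32_t opsPerBuffer(uint32_t bufferMs, uint32_t opMicros) {
  return bufferMs * 1000u / opMicros;
}

// "random" row of the table, 10000 operations each:
constexpr uint32_t kSdMicros      = perOpMicros(25899, 10000);  // Arduino SD
constexpr uint32_t kSdFatMicros   = perOpMicros(4673, 10000);   // SdFat
constexpr uint32_t kSdFatExMicros = perOpMicros(5463, 10000);   // SdFatEx
```

With these figures, roughly 3 seeks fit into one 8ms buffer with the SD library, against about 17 with SdFat and about 14 with SdFatEx, before any time is spent on actual sound generation.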

The difference between using SdFat and SD is clearly audible, but unfortunately, that alone is not enough to make the problem go away, entirely...

victor_pv
Posts: 1868
Joined: Mon Apr 27, 2015 12:12 pm

Re: Another "blue pill" synthesizer

Post by victor_pv » Wed Jan 10, 2018 9:43 pm

stevestrong wrote:
Wed Jan 10, 2018 1:18 pm
Moreover, if you use the SdFatEX class, it would speed up the read process a lot.
Adding to what Steve said, you should gain speed when using the SdFat library since it uses DMA, more so with the SdFatEX class, but even with the normal one. SdFatEX will give you better speed for sequential access.
So if you can format the card, and write some large files, the difference should be more noticeable. Even if the card is not formatted recently, with large files you have a good chance of getting multiple sequential blocks, and those benefit from SdFatEX.
Also, the larger the buffer you are reading into, the more it will benefit.
If you read 512 bytes at a time, it should not make a difference, but if you read 2KB or 4KB blocks, you should see a noticeable difference in large files.
One more thing, since SdFat uses DMA, you could do something else with the CPU while it's reading data from the sdcard, but that requires modifying the SdFat library so instead of blocking when doing DMA it calls whatever other function you want to run in the meantime.

Another way to take advantage of that is if your code uses interrupts to do things. So while the SdFat is waiting for a DMA transfer to be over, a timed interrupt could trigger, call the ISR and do something else. The SdFat transfer won't be affected much since the data moving is done by DMA (still needs CPU time to initiate and finish the transfer).
So sequentially, the CPU could be doing this:
1-SdFat gets called to read 1KB.
2-SdFat sets up the transfer and triggers DMA.
3-Interrupt triggers, CPU jumps to ISR and starts executing it (in background the DMA controller continues the SPI transfer)
4-ISR returns, CPU jumps back to SdFat
5-CPU waits until DMA finishes, if it had not finished yet, and then returns from the SdFat read call.

Re: Another "blue pill" synthesizer

Post by tfried » Thu Jan 11, 2018 9:04 am

Re DMA: Out of curiosity, I tried disabling DMA for SdFat(EX). Here's an updated table:

Code: Select all

                SD      SdFat      SdFatEx   SdFat no DMA   SdFatEx no DMA
contiguous      88         47           43             49               45
random       25899       4673         5463           5151             6214
----
flash usage    (0)      +1312        +1392           +864             +960
RAM usage      (0)      + 760        + 768           +752             +760
So DMA transfers do help, but they do not seem to be the big factor in my use case (ignoring CPU load). Again, my issue does not revolve around throughput, but around latency (frequent seeks and small reads).

Re buffers: Yes, I guess I will need to add some buffers (per track). But I need to support at least 16 of them, and might even want to support 32+, so I cannot afford any significant buffer size. Well, probably I do not need terribly large buffers, either. While the MIDI event rate is hugely variable, you'll rarely see more than a few hundred bytes per second on average. It's all about making sure I don't have to nibble bytes from various locations on the card in the middle of processing.

I'm thinking about using a double buffer strategy, where processing would happen on one buffer "in real time", while the other buffer would be refilled from SD in the background, asynchronously. So next issue:
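As a minimal sketch of that double buffer idea (not code from the synth): the consumer drains the "front" half while a refill routine fills the "back" half, and the two swap roles once the front half is used up. DoubleBuffer, Source and countingSource are hypothetical names; countingSource merely stands in for an SD read so the sketch can be tried on a PC. Everything runs in a single context here, so the synchronization that would be needed if refill() were called from an ISR is deliberately left out:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical double buffer holding two halves of N bytes each.
template <size_t N>
class DoubleBuffer {
public:
  // Data source: writes up to maxLen bytes to dest, returns the count.
  // On the device this would wrap a read from the SD card.
  typedef size_t (*Source)(uint8_t *dest, size_t maxLen, void *ctx);

  // True if the back half is empty and should be refilled.
  bool needsRefill() const { return backLen_ == 0; }

  // Fill the back half from the source; returns the bytes obtained.
  size_t refill(Source src, void *ctx) {
    backLen_ = src(buf_[1 - front_], N, ctx);
    return backLen_;
  }

  // Fetch the next byte. Returns false on underrun, i.e. when the
  // front half is used up and the back half has not been refilled.
  bool next(uint8_t &out) {
    if (pos_ >= frontLen_) {
      if (backLen_ == 0) return false;
      front_ = 1 - front_;  // swap roles of the two halves
      frontLen_ = backLen_;
      backLen_ = 0;
      pos_ = 0;
    }
    out = buf_[front_][pos_++];
    return true;
  }

private:
  uint8_t buf_[2][N];
  int front_ = 0;
  size_t frontLen_ = 0, backLen_ = 0, pos_ = 0;
};

// Stand-in source for testing: produces 0, 1, 2, ... using a counter in ctx.
size_t countingSource(uint8_t *dest, size_t maxLen, void *ctx) {
  uint8_t *counter = static_cast<uint8_t *>(ctx);
  for (size_t i = 0; i < maxLen; ++i) dest[i] = (*counter)++;
  return maxLen;
}
```

Note that memory use is 2 * N bytes per track, so with 16 or 32 tracks N has to stay small.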

Re asynchronous processing:
victor_pv wrote:
Wed Jan 10, 2018 9:43 pm
One more thing, since SdFat uses DMA, you could do something else with the CPU while it's reading data from the sdcard, but that requires modifying the SdFat library so instead of blocking when doing DMA it calls whatever other function you want to run in the meantime.
You're saying that that is an easy thing to do. Is it? Even if so, I suppose I'd have to be careful to have only one DMA transfer going on at the same time, right?
victor_pv wrote:
Wed Jan 10, 2018 9:43 pm
Another way to take advantage of that is if your code uses interrupts to do things. So while the SdFat is waiting for a DMA transfer to be over, a timed interrupt could trigger, call the ISR and do something else.
Thanks! Sounds like exactly the strategy I was looking for. Although I think a) I'd rather put it into a wrapper around read() than patch SdFat, and b) preferably I'd want it the other way around: I'm doing whatever I'm doing, and meanwhile, somewhere in the background, a routine is filling up my read buffers, so ideally I will never (or rarely) have to wait for IO. That does raise some follow-up questions, though, showing my lack of basic understanding of the way ISRs are handled:
  • The main loop can/will be interrupted by ISRs at any point (unless explicitly protected). Correct?
  • An ISR can be interrupted by a different ISR, but neither by "itself" nor the main loop. Correct?
  • Is there any way to tell an ISR to "yield" to the main loop?
  • Are ISRs the only multi-threading mechanism (easily) available on STM32F1?
(Trying to understand my options, as use of the Mozzi library already implies processing done in two timer ISRs in addition to the main loop. Currently I'm doing the MIDI handling in one of those ISRs, though I guess I could move it to the main loop, instead. But at any rate, the setup is not exactly trivial to start with...)

Re: Another "blue pill" synthesizer

Post by victor_pv » Thu Jan 11, 2018 5:41 pm

Steve put quite some effort in optimizing the normal SPI functions to achieve max speed, so the results you see are expected. There are some other threads where we discussed those optimizations. Since sending or receiving 1 byte takes at the very least 16 cpu cycles, he made the loop tight enough that we can send or receive a bunch of bytes without a pause between them.

DMA comes into play if you really use that CPU time for something else.
It's relatively easy to do if you understand how the transfers work and are willing to modify some things. Recently we added a few functions to the SPI library to help with that. I promised Steve I would write a short wiki on how to use them, I guess I really need to do that.
About DMA: the DMA controller has multiple channels that can be used simultaneously, so you can have 2 transfers at the same time, one on SPI1 and one on SPI2. But the program can't start 2 DMA transfers to SPI1, since we haven't implemented any kind of queue system. You could do that within your application.

The double buffering should help, even if you use small ones, from what you are describing as your problems.

About patching SdFat, you really need to if you want the CPU to do something else efficiently while SdFat is doing a DMA transfer. Using interrupts alone doesn't guarantee that your ISR will trip exactly when SdFat is waiting for DMA to complete. Patching it is very easy though. The library has all SPI functions in 1 single file, so you only need to make small changes to that one file.

On your ISR questions:
  • Correct
  • Correct
  • No, unless the "main loop" is actually another ISR
  • No, you could use an RTOS. CoOS and FreeRTOS are ported. Also you could implement a simpler multi-threading just using the systick timer interrupt.

I'm a very lousy programmer, there are better ones that may have better advice, but an RTOS seems like it could make your life easier at the cost of memory resources. Otherwise, if you can schedule everything on your own with ISRs, systick, etc, that probably gets you more things done, but requires more work in planning it all.

Re: Another "blue pill" synthesizer

Post by tfried » Fri Jan 12, 2018 9:53 am

victor_pv wrote:
Thu Jan 11, 2018 5:41 pm
The double buffering should help, even if you use small ones, from what you are describing as your problems.
Indeed. I have now implemented per-track buffers (using only one ring buffer per track), and that does make a huge difference. My sample files now play back reasonably, even with the Arduino SD library (using SdFat still makes a difference on the most densely packed beats, though).

All this despite the fact that the buffer refill strategy is entirely non-parallel for now. Instead, a buffer is refilled a) when it is empty but I need the next byte, and b) proactively: when no events had to be processed in one iteration of the loop, the first buffer that is half full or less is refilled. I think it should be very possible to squeeze out yet more performance by tweaking this a bit (but I'll need to get some patches into the Mozzi library before I can implement some of the more promising ideas).
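The refill strategy described here could look roughly like the following. This is a sketch under assumptions, not the actual code from the repository: TrackBuffer, ReadAt, refillOneIfIdle, MemFile and memReadAt are all hypothetical names, and ReadAt stands in for "seek to a position, then read" on whatever SD library is in use. MemFile is just an in-memory file so the sketch can be tried on a PC:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for "seek to pos, then read up to len bytes"; returns bytes read.
typedef size_t (*ReadAt)(uint32_t pos, uint8_t *dest, size_t len, void *ctx);

// Hypothetical ring buffer for one track, with a capacity of N bytes.
template <size_t N>
struct TrackBuffer {
  uint8_t data[N];
  size_t head = 0;       // index of the next byte to hand out
  size_t count = 0;      // bytes currently buffered
  uint32_t filePos = 0;  // file offset of the next byte to fetch
  uint32_t fileEnd = 0;  // file offset just past this track's data

  // Top the buffer up from the file. Returns the number of bytes added.
  size_t refill(ReadAt read, void *ctx) {
    size_t added = 0;
    while (count < N && filePos < fileEnd) {
      size_t tail = (head + count) % N;
      size_t chunk = N - count;                // free space
      if (chunk > N - tail) chunk = N - tail;  // stop at the wrap-around point
      if (chunk > fileEnd - filePos) chunk = fileEnd - filePos;
      size_t got = read(filePos, &data[tail], chunk, ctx);
      filePos += got;
      count += got;
      added += got;
      if (got < chunk) break;                  // short read, give up for now
    }
    return added;
  }

  // Case a): refill on demand when empty. Returns false at the end of the track.
  bool next(uint8_t &out, ReadAt read, void *ctx) {
    if (count == 0 && refill(read, ctx) == 0) return false;
    out = data[head];
    head = (head + 1) % N;
    --count;
    return true;
  }
};

// Case b): in an idle loop iteration, top up the first buffer that is
// half full or less. Returns true if any data was fetched.
template <size_t N>
bool refillOneIfIdle(TrackBuffer<N> *tracks, size_t numTracks, ReadAt read, void *ctx) {
  for (size_t i = 0; i < numTracks; ++i) {
    if (tracks[i].count <= N / 2 && tracks[i].refill(read, ctx) > 0) return true;
  }
  return false;
}

// In-memory "file" as a data source for trying the sketch on a PC.
struct MemFile { const uint8_t *bytes; size_t size; };
size_t memReadAt(uint32_t pos, uint8_t *dest, size_t len, void *ctx) {
  const MemFile *f = static_cast<const MemFile *>(ctx);
  if (pos >= f->size) return 0;
  if (len > f->size - pos) len = f->size - pos;
  std::memcpy(dest, f->bytes + pos, len);
  return len;
}
```

The point of case b) is that at most one buffer is topped up per idle iteration, so the cost of a seek is paid when nothing else is pending rather than in the middle of event processing.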
About patching SdFat, you really need to if you want the CPU to do something else efficiently while SdFat is doing a DMA transfer.
Yes, I started to realize that after some more thinking. Well, it does not seem worth it, currently. As I can "afford" the latency of a single seek, all I have to worry about is making sure I don't get too many seeks directly next to each other.
but an RTOS seems like could make your life easier at the cost of memory resources
Thanks for the pointer. Not something for this project, I guess (which is growing in flash and RAM much faster than I had anticipated). But it may be a useful option in the future.
