Device drivers on SMP systems - Embedded Software

By Colin Walls

As I am on vacation, I thought that I would invite colleagues to provide a guest blog and Faheem Sheikh came up with the goods …

Multicore embedded designs are becoming increasingly common – a topic that I have addressed before. This presents some new challenges to software developers. Faheem was talking to an existing user of our Nucleus RTOS, who is considering a multicore design and, hence, a transition to the recently announced Nucleus SMP product. The matter of device drivers was raised, which brings up some interesting issues …

Symmetric multiprocessing (SMP) is a configuration that exposes multiple hardware resources to the software, with the objective of achieving high performance and/or increased responsiveness from the system. Roughly speaking, an OS can do this by exploiting thread-level parallelism (TLP), which means running multiple threads (software contexts) simultaneously on separate cores. It is up to the developer to utilize TLP according to the requirements of their system. More often than not, benefiting from the performance gains of multiple cores necessitates a new programming paradigm, both for development and for debugging. However, before high performance can be achieved, most developers must concern themselves with whether their legacy software and device drivers still function correctly once they move to a symmetric multicore solution. After all, performance and responsiveness are only relevant if the code is functionally correct. The question is: what does it take to make your existing low-level software work in a multicore environment? The answer, somewhat surprisingly, is not much, especially if the SMP support in the OS is robust.

Here are some general guidelines:

All shared data structures accessed inside the Interrupt Service Routine(s) (ISR) should be protected using spinlocks.
If a device driver API contains any re-entrant code, make sure that interrupts are disabled before the spinlock is acquired. This avoids deadlock by preventing another instance of the code from running on the same core while the lock is held.
Use separate locks for independent critical section paths/structures for reduced contention. However, using a large number of locks increases the possibility of a deadlock, so there is a trade-off.
Explicitly tell the application developer if the correct SMP safe behavior of an API depends on the use-case.
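
To make the first two guidelines concrete, here is a minimal sketch in C. It is not Nucleus code: the spinlock is a bare test-and-set loop built on C11 atomics, and irq_save_and_disable(), irq_restore(), drv_rx_isr() and drv_take_pending() are hypothetical names invented for illustration. Interrupt masking is simulated with a flag, where a real port would use whatever primitive the RTOS or CPU provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for per-core interrupt masking. A real driver would
 * call the RTOS or CPU primitive; here a flag simulates the current core. */
static bool irq_enabled = true;

static bool irq_save_and_disable(void)
{
    bool was_enabled = irq_enabled;
    irq_enabled = false;
    return was_enabled;
}

static void irq_restore(bool was_enabled)
{
    irq_enabled = was_enabled;
}

/* Minimal test-and-set spinlock on C11 atomics (illustrative only). */
typedef struct { atomic_flag held; } drv_spinlock;

static void drv_spin_lock(drv_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* busy-wait until the holder on another core releases the lock */
    }
}

static void drv_spin_unlock(drv_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* State shared between the ISR and the driver API. */
static drv_spinlock rx_lock = { ATOMIC_FLAG_INIT };
static unsigned rx_pending;

/* ISR side: assumed to run with local interrupts already masked. */
void drv_rx_isr(void)
{
    drv_spin_lock(&rx_lock);
    rx_pending++;
    drv_spin_unlock(&rx_lock);
}

/* API side: mask local interrupts first, then take the lock, so the ISR
 * cannot preempt this code on the same core and spin on a held lock. */
unsigned drv_take_pending(void)
{
    bool was_enabled = irq_save_and_disable();
    assert(!irq_enabled);          /* sanity check on the simulated state */
    drv_spin_lock(&rx_lock);
    unsigned n = rx_pending;
    rx_pending = 0;
    drv_spin_unlock(&rx_lock);
    irq_restore(was_enabled);
    return n;
}
```

The order in drv_take_pending() is the point: if the lock were taken first and the interrupt then fired on the same core, the ISR would spin forever on a lock that its own core holds.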

As an example, consider a serial UART driver working on an SoC. This system is transitioning to a multicore processor. What precautions should be taken to guarantee that the driver will still work in the new SMP context? Under ideal circumstances, the requirement would be no change in the public interfaces and very little change in the code of the driver. We will assume an interrupt-driven serial driver, as the polling case is relatively simple. Most serial drivers support APIs to initialize the UART, to get and put a character on the serial port, and to print a string on the UART console. In addition, there might be some other internal functions that help set up the UART, service the UART interrupts and so on. We will assume that the following three basic APIs are exposed:

MySerial_Getchar (VOID) – Receives a character from the default UART
MySerial_Putchar (int c) – Places a character on the default UART
MySerial_Puts (const char* s) – Prints a character string on the default UART
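
For readers who like to see the shape of an interface, the three calls might look something like the sketch below. The return types and error values are assumptions, since only the names and parameters are given above. A small loopback buffer stands in for the UART hardware so that the code can be run on a desktop, and there is deliberately no locking here.

```c
#include <assert.h>
#include <stddef.h>

#define LOOPBACK_SIZE 64u

/* Loopback buffer standing in for the UART hardware (hypothetical). */
static unsigned char loopback[LOOPBACK_SIZE];
static size_t head;   /* index of the next slot to write */
static size_t tail;   /* index of the next slot to read  */

/* Assumed contract: returns the character written, or -1 if there is no room. */
int MySerial_Putchar(int c)
{
    size_t next = (head + 1u) % LOOPBACK_SIZE;
    if (next == tail) {
        return -1;                      /* buffer full */
    }
    loopback[head] = (unsigned char)c;
    head = next;
    return (int)(unsigned char)c;
}

/* Assumed contract: returns the next character, or -1 if none is waiting. */
int MySerial_Getchar(void)
{
    if (tail == head) {
        return -1;                      /* nothing received */
    }
    int c = loopback[tail];
    tail = (tail + 1u) % LOOPBACK_SIZE;
    return c;
}

/* Assumed contract: returns the number of characters written, or -1 on error. */
int MySerial_Puts(const char *s)
{
    assert(s != NULL);
    int written = 0;
    for (; *s != '\0'; s++) {
        if (MySerial_Putchar((unsigned char)*s) < 0) {
            return -1;
        }
        written++;
    }
    return written;
}
```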

Some simple use-cases will help to illustrate the potential hazards and their solutions while porting our UART driver to a multicore processor:

Two threads, each executing on a different core, are trying to print strings on the UART console by calling MySerial_Puts(). If this API has been internally protected by an OS semaphore or mutex for correct multithreaded execution, and the OS has robust SMP support (meaning that it has upgraded all its components, including semaphores and mutexes, for SMP operation), then no change in this API is required. The concurrent execution will be handled by internal OS spinlocks.
One thread is printing on the UART using MySerial_Putchar(). The other is waiting for a command on the serial line and calls MySerial_Getchar() as a result of an RX serial interrupt. This can be a potential problem, since both cores might be running the ISR simultaneously. A first-level approach is to protect the whole ISR with a spinlock. Although this will ensure correctness of the driver, it can drag down performance. The developer can think about introducing separate spinlocks for the transmit and receive paths, provided the two data paths are independent of each other.
Sometimes it is just not worth the effort to make the device driver absolutely safe for multicore execution. Often, it is up to the application code to make sure that it uses a shared resource judiciously. For instance, if two threads are simultaneously using MySerial_Putchar(), they should know that their characters can get lost or be printed in no particular order. Instead of introducing new spinlocks in the MySerial_Putchar() API, it is better to document this behavior so that an application programmer will take care when using this API in an SMP context.
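
The second use-case can be sketched as follows, with one lock per direction so that a core draining the transmit queue never waits on a core filling the receive queue. Everything apart from the MySerial_ names is hypothetical: the hw_ variables simulate UART registers, the queue type is invented for this example, the return values are assumptions, and the spinlock is again a bare C11 test-and-set loop rather than an RTOS service. A real port would also mask local interrupts around the locks taken on the API side, which is omitted here for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { atomic_flag held; } drv_spinlock;

static void drv_spin_lock(drv_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) { }
}

static void drv_spin_unlock(drv_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

#define QSIZE 16u

/* A small ring buffer that carries its own lock. */
typedef struct {
    unsigned char buf[QSIZE];
    unsigned head, tail;
    drv_spinlock lock;
} queue;

static queue rx_q = { .lock = { ATOMIC_FLAG_INIT } };   /* receive path  */
static queue tx_q = { .lock = { ATOMIC_FLAG_INIT } };   /* transmit path */

/* Simulated UART registers (hypothetical). */
static bool hw_rx_ready, hw_tx_empty;
static unsigned char hw_rx_data, hw_tx_data;

static bool q_push(queue *q, unsigned char c)
{
    bool ok = false;
    drv_spin_lock(&q->lock);
    unsigned next = (q->head + 1u) % QSIZE;
    if (next != q->tail) {              /* room left in the ring */
        q->buf[q->head] = c;
        q->head = next;
        ok = true;
    }
    drv_spin_unlock(&q->lock);
    return ok;
}

static bool q_pop(queue *q, unsigned char *out)
{
    assert(out != NULL);
    bool ok = false;
    drv_spin_lock(&q->lock);
    if (q->tail != q->head) {           /* at least one byte queued */
        *out = q->buf[q->tail];
        q->tail = (q->tail + 1u) % QSIZE;
        ok = true;
    }
    drv_spin_unlock(&q->lock);
    return ok;
}

/* The ISR touches each path under that path's own lock only. */
void uart_isr(void)
{
    unsigned char c;
    if (hw_rx_ready) {
        (void)q_push(&rx_q, hw_rx_data);    /* byte is dropped if the queue is full */
        hw_rx_ready = false;
    }
    if (hw_tx_empty && q_pop(&tx_q, &c)) {
        hw_tx_data = c;
        hw_tx_empty = false;
    }
}

int MySerial_Putchar(int c)
{
    return q_push(&tx_q, (unsigned char)c) ? (int)(unsigned char)c : -1;
}

int MySerial_Getchar(void)
{
    unsigned char c;
    return q_pop(&rx_q, &c) ? (int)c : -1;
}
```

Note that the locks here only keep each queue consistent. Two threads calling MySerial_Putchar() at the same time will still see their characters interleaved, which is exactly the behavior that the third use-case suggests documenting rather than engineering away.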

Hopefully this post will help to allay “fears of concurrency” in embedded developers eyeing a multicore upgrade to their systems.
