diff --git a/Documentation/guides/index.rst b/Documentation/guides/index.rst
index 9ea60fcd65e3f..64e388916eaef 100644
--- a/Documentation/guides/index.rst
+++ b/Documentation/guides/index.rst
@@ -52,3 +52,6 @@ Guides
   port_drivers_to_stm32f7.rst
   semihosting.rst
   renode.rst
+  signal_events_interrupt_handlers.rst
+  signaling_sem_priority_inheritance.rst
+  smaller_vector_tables.rst
\ No newline at end of file
diff --git a/Documentation/guides/signal_events_interrupt_handlers.rst b/Documentation/guides/signal_events_interrupt_handlers.rst
new file mode 100644
index 0000000000000..c363d7daccb37
--- /dev/null
+++ b/Documentation/guides/signal_events_interrupt_handlers.rst
@@ -0,0 +1,272 @@
========================================
Signaling Events from Interrupt Handlers
========================================

.. warning:: Migrated from
   https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Events+from+Interrupt+Handlers

Best way to wake multiple threads from interrupt?
=================================================

  I want to make a character device driver that passes the same data to
  all tasks that are reading it. It is not so important whether the data
  is queued or if just the latest sample is retrieved. The problem is just
  how to wake up the waiting threads.

At the most primitive level, a thread can be waiting for a semaphore, a signal,
or a message queue (not empty or not full). Then there are higher
level wrappers around these like mutexes, semaphores, poll waits,
etc. But under the hood those are the three fundamental wait
mechanisms. Any of them could be used to accomplish what you want.

In NuttX, some additional effort was put into the design of the signaling
side of each of these IPCs so that they could be easily used by interrupt
handlers. This behavior is unique to NuttX; POSIX says nothing about
interrupt handlers. As a result, we will be talking primarily about
non-portable OS interfaces.

  So far I've considered the following options:

And you have basically gone through the list of wait mechanisms:

Message Queues
==============

  1) Open a message queue when the device is opened (a new queue for each
  task) and keep them in a list. Post to a non-blocking endpoint of these
  queues in the ISR. Read from a blocking endpoint in the device ``read()``.
  I would need to generate names for the message queues, as there do not
  seem to be anonymous message queues?

When you start a project, it is a good idea to decide upon a common IPC
mechanism to base your design on. POSIX message queues are one good
choice for that: Assign each thread a message queue and have the ``main()``
of each thread simply wait on its message queue. It is a good
architecture and is used frequently.

However, I would probably avoid creating a lot of message queues just
to support interrupt-level signaling. There are other ways to do
that which do not use so much memory. So, if you already have message
queues, use them. If not, keep it simple.

In this case, your waiting task will block on a call to ``mq_receive()``
until a message is received. It will then wake up and can process
the message. The interrupt handler will call ``mq_send()`` when
an event of interest occurs which will, in turn, wake up the waiting
task.

Advantages of the use of message queues in this case are that 1) you
can pass quite a lot of data in the message, and 2) it integrates
well in a message-based application architecture.
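
A minimal sketch of these mechanics for one waiting task is shown below
(for the wake-everyone case in the original question there would be one
such queue per reading task). The queue name ``/dev_events``, the handler
name, and the message format are illustrative assumptions only, and error
handling is omitted:

.. code-block:: c

   #include <nuttx/config.h>
   #include <fcntl.h>
   #include <mqueue.h>
   #include <stdint.h>

   #define EVENT_MQ_NAME "/dev_events"   /* Hypothetical queue name */

   static mqd_t g_event_mq;              /* Send endpoint used by the interrupt handler */

   /* Driver initialization: create the event queue (non-blocking for the ISR) */

   static void event_mq_initialize(void)
   {
     struct mq_attr attr;

     attr.mq_flags   = 0;
     attr.mq_maxmsg  = 8;
     attr.mq_msgsize = sizeof(uint32_t);
     attr.mq_curmsgs = 0;

     g_event_mq = mq_open(EVENT_MQ_NAME, O_WRONLY | O_CREAT | O_NONBLOCK, 0666, &attr);
   }

   /* Waiting task: blocks in mq_receive() until the next event message arrives */

   static void wait_for_event(void)
   {
     mqd_t rd = mq_open(EVENT_MQ_NAME, O_RDONLY);
     uint32_t event;

     while (mq_receive(rd, (FAR char *)&event, sizeof(event), NULL) < 0)
       {
         /* EINTR or other error; simply try again */
       }

     /* ... process 'event' ... */

     mq_close(rd);
   }

   /* Interrupt handler: reports the event without blocking */

   static int my_interrupt_handler(int irq, FAR void *context, FAR void *arg)
   {
     uint32_t event = 1;                 /* Illustrative event code */

     mq_send(g_event_mq, (FAR const char *)&event, sizeof(event), 1);
     return 0;
   }

Opening the send endpoint with ``O_NONBLOCK`` reflects the fact that an
interrupt handler must never block; if the queue is full, the event is
simply dropped, which is the overrun condition discussed next.
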
A disadvantage
is that there is a limitation on the number of messages that can be
sent from an interrupt handler, so it is possible to get data overrun
conditions, that is, more interrupt events may be received than can
be reported with the available messages.

This limitation is due to the fact that you cannot allocate memory
dynamically from an interrupt handler. Instead, interrupt handlers
are limited to the use of pre-allocated messages. The number of
pre-allocated messages is given by ``CONFIG_PREALLOC_MQ_MSGS`` plus eight.
The ``CONFIG_PREALLOC_MQ_MSGS`` messages can be used either by normal tasking
logic or by interrupt-level logic. The extra eight are an emergency
pool for interrupt handling logic only (that value is not currently
configurable).

If the task logic consumes all of the ``CONFIG_PREALLOC_MQ_MSGS`` messages, it
will fall back to dynamically allocating messages at some cost to
performance and deterministic behavior.

If the interrupt level consumes all of the ``CONFIG_PREALLOC_MQ_MSGS``
messages, it will fall back and use the emergency pool of eight
pre-allocated messages. If those are also exhausted, then the message
will not be sent and an interrupt is effectively lost.

Semaphores
==========

  2) Allocate a semaphore for each device open and keep them in a list.
  Post the semaphores when new data is available in a shared buffer.
  Read the data inside ``sched_lock()``.

If you don't have an architecture that uses message queues, and all of
these threads are waiting only for the interrupt event and nothing else,
then signaling semaphores would work fine too. You are basically using
semaphores as condition variables in this case, so you do have to be careful.

NOTE: You do not need multiple semaphores. You can do this with a single
semaphore. If the semaphore is used for this purpose, then you initialize
it to zero:

.. code-block:: c

   sem_init(&sem, 0, 0);
   sem_setprotocol(&sem, SEM_PRIO_NONE);

``sem_setprotocol()`` is a non-standard NuttX function that should be called
immediately after the ``sem_init()``. The effect of this function call is to
disable priority inheritance for that specific semaphore. There will
then be no priority inheritance operations on this semaphore, which is
used only for signaling. See :doc:`/guides/signaling_sem_priority_inheritance`
for further information.

Since the semaphore is initialized to zero, each time that a thread joins
the group of waiting threads, the count is decremented. So a simple loop
like this would wake up all waiting threads:

.. code-block:: c

   int svalue;
   int ret;

   for (; ; )
     {
       ret = sem_getvalue(&sem, &svalue);
       if (ret < 0)
         {
           break;            /* Failed to get the semaphore value */
         }

       if (svalue < 0)
         {
           sem_post(&sem);   /* Wake up one waiting thread */
         }
       else
         {
           break;            /* No more waiters */
         }
     }

NOTE: This use of ``sem_getvalue()`` is not portable. In many environments,
``sem_getvalue()`` will not return negative values if there are waiters on
the semaphore.

The above code snippet is essentially what the NuttX
``pthread_cond_broadcast()`` does (see ``nuttx/sched/pthread_condbroadcast.c``).
In NuttX, condition variables are really just wrappers around semaphores
that give them a few new properties. You could even call
``pthread_cond_broadcast()`` from an interrupt handler: See
http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_cond_signal.html
for usage information.

Neither of the above mechanisms is a portable use of these interfaces.
However, there is no portable interface for communicating directly with
interrupt handlers.
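
Putting the pieces above together, a minimal sketch of the wake-all
pattern might look like the following. This is illustrative only: the
names ``g_event_sem``, ``wait_for_event()``, and ``my_interrupt_handler()``
are placeholders and error handling is omitted.

.. code-block:: c

   #include <nuttx/config.h>
   #include <semaphore.h>

   static sem_t g_event_sem;    /* Signaling semaphore, count starts at zero */

   /* One-time setup, e.g. in the driver initialization logic */

   static void event_initialize(void)
   {
     sem_init(&g_event_sem, 0, 0);
     sem_setprotocol(&g_event_sem, SEM_PRIO_NONE);  /* Signaling use: no priority inheritance */
   }

   /* Each interested thread blocks here until the next event occurs */

   static void wait_for_event(void)
   {
     while (sem_wait(&g_event_sem) < 0)
       {
         /* Awakened by a signal (errno == EINTR); just wait again */
       }

     /* ... process the event ... */
   }

   /* The interrupt handler posts once for every thread currently waiting */

   static int my_interrupt_handler(int irq, FAR void *context, FAR void *arg)
   {
     int svalue;

     while (sem_getvalue(&g_event_sem, &svalue) == 0 && svalue < 0)
       {
         sem_post(&g_event_sem);
       }

     return 0;
   }

The loop in the interrupt handler relies on the same non-portable
``sem_getvalue()`` behavior described above, so this sketch is specific
to NuttX.
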
If you want to signal a single waiting thread, there are simpler things
you can do. In the waiting task:

.. code-block:: c

   sem_t g_mysemaphore;
   volatile bool g_waiting;
   ...

   sem_init(&g_mysemaphore, 0, 0);
   sem_setprotocol(&g_mysemaphore, SEM_PRIO_NONE);
   ...

   flags = enter_critical_section();
   g_waiting = true;
   while (g_waiting)
     {
       ret = sem_wait(&g_mysemaphore);
       ... handle errors ...
     }

   leave_critical_section(flags);

In the above code snippet, interrupts are disabled to set and test
``g_waiting``. Interrupts will, of course, be re-enabled automatically
and atomically while the task is waiting for the interrupt event.

Then in the interrupt handler:

.. code-block:: c

   extern sem_t g_mysemaphore;
   extern volatile bool g_waiting;
   ...

   if (g_waiting)
     {
       g_waiting = false;
       sem_post(&g_mysemaphore);
     }

An integer counter could also be used instead of a ``bool`` in order to
support multiple waiters. In that case, this is equivalent to the
case above using ``sem_getvalue()`` but does not depend on non-portable
properties of ``sem_getvalue()``.

NOTE: There is a possibility of improper interactions between priority
inheritance and a semaphore that is used for signaling in this way.
In this case, you should disable priority inheritance on the
signaling semaphore using ``sem_setprotocol(SEM_PRIO_NONE)``. See
:doc:`/guides/signaling_sem_priority_inheritance`
for further information.

Signals
=======

  3) Store the thread IDs in a list when ``read()`` is called. Wake up the
  threads using ``sigqueue()``. Read the data from a shared buffer
  inside ``sched_lock()``.

Signals would work fine too. Signals have a side-effect that is sometimes
helpful and sometimes a pain in the butt: They cause almost all kinds of
waits (``read()``, ``sem_wait()``, etc.) to wake up and return an error with
``errno=EINTR``.

That is sometimes helpful because you can wake up a ``recv()`` or a ``read()``,
etc., detect the event that generated the signal, and do something
about it. It is sometimes a pain because you have to remember to
handle the ``EINTR`` return value even when you don't care about it.

The POSIX signal definition includes some support that would make this
easier for you. This support is not currently implemented in NuttX.
The ``kill()`` interface, for example
(http://pubs.opengroup.org/onlinepubs/009695399/functions/kill.html),
supports this behavior:

"If pid is 0, sig will be sent to all processes (excluding an unspecified
set of system processes) whose process group ID is equal to the process
group ID of the sender, and for which the process has permission to send
a signal."

"If pid is -1, sig will be sent to all processes (excluding an unspecified
set of system processes) for which the process has permission to send that
signal."

"If pid is negative, but not -1, sig will be sent to all processes (excluding
an unspecified set of system processes) whose process group ID is equal to
the absolute value of pid, and for which the process has permission to send
a signal."

NuttX does not currently support process groups. But that might be a good
RTOS extension. If you and others think that would be useful I could
probably add the basics of such a feature in a day or so.

``poll()``
==========

  Is there some better way that I haven't discovered?

The obvious thing that you did not mention is ``poll()``. See
http://pubs.opengroup.org/onlinepubs/009695399/functions/poll.html .
Since you are writing a device driver, support for the ``poll()`` method
in your driver seems to be the natural solution. See the ``drivers/``
directory for many examples, ``drivers/pipes/pipe_common.c`` for one.
Each thread could simply wait on ``poll()``; when the event occurs the
driver could then wake up the set of waiters. Under the hood, this
is again just a set of ``sem_post()`` calls. But it is also a very standard
mechanism.

In your case, the semantics of ``poll()`` might have to be bent just a
little. You might have to bend the meaning of some of the event
flags since they are all focused on data I/O events.

Another creative use of ``poll()`` in cases like this:

  That would be something great! The PX4 project has that implemented somehow
  (in C++), so maybe - if license permits - it could be ported to NuttX in
  no time?

  https://pixhawk.ethz.ch/px4/dev/shared_object_communication

I don't know a lot about this, but it might be worth looking into
if it matches your need.
\ No newline at end of file
diff --git a/Documentation/guides/signaling_sem_priority_inheritance.rst b/Documentation/guides/signaling_sem_priority_inheritance.rst
new file mode 100644
index 0000000000000..b1d37de500f31
--- /dev/null
+++ b/Documentation/guides/signaling_sem_priority_inheritance.rst
@@ -0,0 +1,212 @@
=============================================
Signaling Semaphores and Priority Inheritance
=============================================

.. warning:: Migrated from
   https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance

Locking vs Signaling Semaphores
===============================

Locking Semaphores
------------------
POSIX counting semaphores have multiple uses. The typical usage is where
the semaphore is used as a lock on one or more resources. In this typical
case, priority inheritance works perfectly: The holder of a semaphore
count must be remembered so that its priority can be boosted if a higher
priority task requires a count from the semaphore. It remains the
holder until the same task calls ``sem_post()`` to release the count on
the semaphore.

Mutual Exclusion Example
------------------------
This usage is very common for providing mutual exclusion. The semaphore
is initialized to a value of one. The first task to take the semaphore
has access; additional tasks that need access will then block until
the first holder calls ``sem_post()`` to relinquish access:

+---------------------+--------------------+
| **TASK A**          | **TASK B**         |
+=====================+====================+
| **sem_wait(sem);**  |                    |
+---------------------+--------------------+
| `have access`       |                    |
+---------------------+--------------------+
|                     | **sem_wait(sem);** |
+---------------------+--------------------+
| `priority boost`    | `blocked`          |
+---------------------+--------------------+
| **sem_post(sem);**  |                    |
+---------------------+--------------------+
| `priority restored` | `have access`      |
+---------------------+--------------------+

The important thing to note is that ``sem_wait()`` and ``sem_post()`` are both
called on the same thread, TASK A. When ``sem_wait()`` succeeds, TASK
A becomes the holder of the semaphore and, while it is the holder
of the semaphore, (1) other threads, such as TASK B, cannot access
the protected resource and (2) the priority of TASK A may be modified
by the priority inheritance logic. TASK A remains the holder until
it calls ``sem_post()`` on the `same thread`.
At that time, (1) its
priority may be restored and (2) TASK B has access to the resource.

Signaling Semaphores
--------------------
But a very different usage model for semaphores is for signaling
events. In this case, the semaphore count is initialized to
zero and the receiving task calls ``sem_wait()`` to wait for the
next event of interest to occur. When an event of interest is
detected by another task (or even an interrupt handler),
``sem_post()`` is called which increments the count to 1 and
wakes up the receiving task.

Signaling Semaphores and Priority Inheritance Details
=====================================================

Example
-------
For example, in the following sequence TASK A waits on a semaphore
for events and TASK B (or perhaps an interrupt handler)
signals TASK A of the occurrence of the events by posting
to that semaphore:

+--------------------------+--------------------+
| **TASK A**               | **TASK B**         |
+==========================+====================+
| **sem_init(sem, 0, 0);** |                    |
+--------------------------+--------------------+
| **sem_wait(sem);**       |                    |
+--------------------------+--------------------+
| `blocked`                |                    |
+--------------------------+--------------------+
|                          | **sem_post(sem);** |
+--------------------------+--------------------+
| `Awakens as holder`      |                    |
+--------------------------+--------------------+

Notice that unlike the mutual exclusion case above,
``sem_wait()`` and ``sem_post()`` are called on `different`
threads.

Usage in Drivers
----------------

This usage model appears often within drivers, for example,
when the user calls the ``read()`` method and there is no data
available. ``sem_wait()`` is called to wait for new data to be
received; ``sem_post()`` is called when the new data arrives
and the user task is re-awakened.

Priority Inheritance Fails
--------------------------

These two usage models, the locking model and the
signaling model, are really very different and priority
inheritance simply does not apply when the semaphore is
used for signaling rather than locking. In this signaling
case, priority inheritance can interfere with the operation
of the semaphore. The problem is that when TASK A is
awakened, it is a holder of the semaphore. Normally, a
task is removed from the holder list when it finally
releases the semaphore via ``sem_post()``.

In this case, TASK B calls ``sem_post(sem)`` but TASK B is
not the holder of the semaphore. Since TASK A never
calls ``sem_post(sem)``, it remains a holder of the
semaphore permanently and may have its priority boosted at
any time when any other task tries to acquire the
semaphore.

Who's to Blame
--------------

In the POSIX case, priority inheritance is specified only
in the pthread mutex layer. In NuttX, on the other hand,
pthread mutexes are simply built on top of binary locking
semaphores. Hence, in NuttX, priority inheritance is
implemented in the semaphore layer.

In the case of a mutex this could be simply resolved since
there is only one holder, but for the case of counting
semaphores, there may be many holders and, if the holder
is not the thread that calls ``sem_post()``, then it is not
possible to know which thread/holder should be released.

Selecting the Semaphore Protocol
================================

``sem_setprotocol()``
---------------------

The fix is to call the non-standard NuttX function
``sem_setprotocol(SEM_PRIO_NONE)`` immediately after the
``sem_init()``.
The effect of this function call is to
disable priority inheritance for that specific
semaphore. There will then be no priority inheritance
operations on this semaphore, which is used only for signaling.

.. code-block:: c

   sem_t sem;
   // ...
   sem_init(&sem, 0, 0);
   sem_setprotocol(&sem, SEM_PRIO_NONE);

Here is the rule: If you have priority inheritance
enabled and you use semaphores for signaling events,
then you `must` call ``sem_setprotocol(SEM_PRIO_NONE)``
immediately after initializing the semaphore.


Why Another Non-Standard OS Interface?
--------------------------------------

The non-standard ``sem_setprotocol()`` is the `moral equivalent`
of the POSIX ``pthread_mutexattr_setprotocol()``
and its naming reflects that relationship. In most
implementations, priority inheritance is implemented
only in the pthread mutex layer. In NuttX, on the
other hand, pthread mutexes are simply built on top
of binary locking semaphores. Hence, in NuttX,
priority inheritance is implemented in the semaphore
layer. This architecture then requires an interface
like ``sem_setprotocol()`` in order to manage the protocol
of the underlying semaphore.


``pthread_mutexattr_setprotocol()``
-----------------------------------

Since NuttX implements pthread mutexes on top of
binary semaphores, the above recommendation also
applies when pthread mutexes are used for inter-thread
signaling. That is, a mutex that is used for
signaling should be initialized like this (simplified,
no error checking here):

.. code-block:: c

   pthread_mutexattr_t attr;
   pthread_mutex_t mutex;
   // ...
   pthread_mutexattr_init(&attr);
   pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_NONE);
   pthread_mutex_init(&mutex, &attr);

Is this Always a Problem?
=========================

Ideally, ``sem_setprotocol(SEM_PRIO_NONE)`` should be
called for all signaling semaphores. But, no,
often the use of a signaling semaphore with priority
inheritance enabled is not a problem. It is not a problem
if the signaling semaphore is always taken on
the same thread. For example:

* If the driver is used by only a single task, or
* If the semaphore is only taken on the worker thread.

But this can be a serious problem if multiple tasks
ever wait on the signaling semaphore. Drivers like
the serial driver, for example, have many user
threads that may call into the driver.
\ No newline at end of file
diff --git a/Documentation/guides/smaller_vector_tables.rst b/Documentation/guides/smaller_vector_tables.rst
new file mode 100644
index 0000000000000..6a13f082cc6d5
--- /dev/null
+++ b/Documentation/guides/smaller_vector_tables.rst
@@ -0,0 +1,472 @@
=====================
Smaller Vector Tables
=====================

.. warning::
   Migrated from:
   https://cwiki.apache.org/confluence/display/NUTTX/Smaller+Vector+Tables


One of the largest OS data structures is the vector table,
``g_irqvector[]``. This is the table that holds the vector
information when ``irq_attach()`` is called and that is used to
dispatch interrupts in ``irq_dispatch()``. Recent changes
have made that table even larger; for 32-bit ARM, the
size of that table is given by:

.. code-block:: c

   nbytes = number_of_interrupts * (2 * sizeof(void *))

We will focus on the STM32 for this discussion to keep
things simple. However, this discussion applies to all
architectures.
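
For reference, the two pointers per interrupt in the equation above are
the attached handler and its argument. A simplified sketch of one table
entry is shown below; this is illustrative only, and the actual
``struct irq_info_s`` definition in the OS may carry additional fields
(for example, when interrupt statistics are enabled):

.. code-block:: c

   /* Simplified sketch of one vector table entry (illustrative only) */

   struct irq_info_s
   {
     xcpt_t    handler;   /* Interrupt handler attached by irq_attach() */
     FAR void *arg;       /* Argument passed to the handler */
   };

   /* One entry per interrupt: nbytes = NR_IRQS * (2 * sizeof(void *)) */

   struct irq_info_s g_irqvector[NR_IRQS];
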
The number of (physical) interrupt vectors supported by
the MCU hardware is given by the definition ``NR_IRQS``, which
is provided in a header file in ``arch/arm/include/stm32``.
This is, by default, the value of ``number_of_interrupts``
in the above equation.

For a 32-bit ARM like the STM32 with, say, 100 interrupt
vectors, this size would be 800 bytes of memory. That is
not a lot for high-end MCUs with a lot of RAM memory,
but could be a show stopper for MCUs with minimal RAM.

Two approaches for reducing the size of the vector tables
are described below. Both depend on the fact that not all
interrupts are used on a given MCU. Most of the time,
the majority of entries in ``g_irqvector[]`` are zero because
only a small number of interrupts are actually attached
and enabled by the application. If you know that certain
IRQ numbers are not going to be used, then it is possible
to filter those out and reduce the size of the table to
just the number of interrupts actually used.

For example, if the actual number of interrupts used were
20, then the above requirement would go from 800 bytes to
160 bytes.

Software IRQ Remapping
======================

`[On March 3, 2017, support for this "Software IRQ Remapping"
was included in the NuttX repository.]`

One of the simplest ways of reducing the size of
``g_irqvector[]`` would be to remap the large set of physical
interrupt vectors into a much smaller set of interrupts that
are actually used. For the sake of discussion, let's
imagine two new configuration settings:

* ``CONFIG_ARCH_MINIMAL_VECTORTABLE``: Enables IRQ mapping
* ``CONFIG_ARCH_NUSER_INTERRUPTS``: The number of IRQs after mapping.

The OS could then allocate the interrupt vector table to be of
size ``CONFIG_ARCH_NUSER_INTERRUPTS`` instead of the much bigger
``NR_IRQS``:

.. code-block:: c

   #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
   struct irq_info_s g_irqvector[CONFIG_ARCH_NUSER_INTERRUPTS];
   #else
   struct irq_info_s g_irqvector[NR_IRQS];
   #endif

The ``g_irqvector[]`` table is accessed in only three places:

``irq_attach()``
----------------

``irq_attach()`` receives the physical vector number along
with the information needed later to dispatch interrupts:

.. code-block:: c

   int irq_attach(int irq, xcpt_t isr, FAR void *arg);

Logic in ``irq_attach()`` would map the incoming physical
vector number to a table index like:

.. code-block:: c

   #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
   int ndx = g_irqmap[irq];
   #else
   int ndx = irq;
   #endif

where ``g_irqmap[]`` is an array indexed by the physical
interrupt vector number that contains the new, mapped
interrupt vector table index. This array must be
provided by platform-specific code.

``irq_attach()`` would then use this index to set the ``g_irqvector[]`` entry:

.. code-block:: c

   g_irqvector[ndx].handler = isr;
   g_irqvector[ndx].arg     = arg;

``irq_dispatch()``
------------------

``irq_dispatch()`` is called by MCU logic when an interrupt is received:

.. code-block:: c

   void irq_dispatch(int irq, FAR void *context);

where, again, ``irq`` is the physical interrupt vector number.

``irq_dispatch()`` would do essentially the same thing as
``irq_attach()``. First it would map the ``irq`` number to
a table index:

.. code-block:: c

   #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
   int ndx = g_irqmap[irq];
   #else
   int ndx = irq;
   #endif

Then it would dispatch the interrupt to the attached
interrupt handler.
NOTE that the physical vector
number is passed to the handler so it is completely
unaware of the underlying `shell` game:

.. code-block:: c

   vector = g_irqvector[ndx].handler;
   arg    = g_irqvector[ndx].arg;

   vector(irq, context, arg);

``irq_initialize()``
--------------------

``irq_initialize()`` simply sets the ``g_irqvector[]`` table to
a known state on power-up. It would only have to account for
the difference in table sizes:

.. code-block:: c

   #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
   #  define TAB_SIZE CONFIG_ARCH_NUSER_INTERRUPTS
   #else
   #  define TAB_SIZE NR_IRQS
   #endif

   for (i = 0; i < TAB_SIZE; i++)

``g_irqmap[]``
--------------

An implementation of ``g_irqmap[]`` might be something like:

.. code-block:: c

   #include <nuttx/irq.h>

   const irq_mapped_t g_irqmap[NR_IRQS] =
   {
     ... IRQ to index mapping values ...
   };

``g_irqmap[]`` is an array of mapped IRQ table indices. It
contains the mapped index value and is itself indexed
by the physical interrupt vector number. It provides
an ``irq_mapped_t`` value in the range of 0 to
``CONFIG_ARCH_NUSER_INTERRUPTS`` that is the new, mapped
index into the vector table. Unsupported IRQs would
simply map to an out-of-range value like ``IRQMAPPED_MAX``.
So, for example, if ``g_irqmap[37] == 24``, then hardware
interrupt vector 37 will be mapped to the interrupt vector
table at index 24. If ``g_irqmap[42] == IRQMAPPED_MAX``, then
hardware interrupt vector 42 is not used and, if it occurs,
will result in an unexpected-interrupt crash.

Hardware Vector Remapping
=========================

`[This technical approach is discussed here but is
discouraged because of the technical "Complications" and
"Dubious Performance Improvements" discussed at the
end of this section.]`

Most ARMv7-M architectures support two mechanisms for handling interrupts:

* The so-called `common` vector handler logic enabled with
  ``CONFIG_ARMV7M_CMNVECTOR=y`` that can be found in
  ``arch/arm/src/armv7-m/``, and
* MCU-specific interrupt handling logic. For the
  STM32, this logic can be found at ``arch/arm/src/stm32/gnu/stm32_vectors.S``.

The `common` vector logic is slightly more efficient;
the MCU-specific logic is slightly more flexible.

If we don't use the `common` vector logic enabled with
``CONFIG_ARMV7M_CMNVECTOR=y``, but instead the more
flexible MCU-specific implementation, then we can
also use this to map the large set of hardware
interrupt vector numbers to a smaller set of software
interrupt numbers. This involves minimal changes to
the OS and does not require any magic software lookup
table. But it is considerably more complex to implement.

This technical approach requires changes to three files:

* A new header file at ``arch/arm/include/stm32``, say
  ``xyz_irq.h`` for the purposes of this discussion.
  This new header file is like the other IRQ definition
  header files in that directory except that it
  defines only the IRQ numbers of the interrupts after
  remapping. So, instead of having the 100 IRQ number
  definitions of the original IRQ header file based on
  the physical vector numbers, this header file would
  define `only` the small set of 20 `mapped` IRQ numbers in
  the range from 0 through 19. It would also set ``NR_IRQS``
  to the value 20.
* A new header file at ``arch/arm/src/stm32/hardware``, say
  ``xyz_vector.h``. It would be similar to the other vector
  definition files in that directory: It will consist
  of a sequence of 100 ``VECTOR`` and ``UNUSED`` macros.
  It will
  define ``VECTOR`` entries for the 20 valid interrupts and
  80 ``UNUSED`` entries for the unused interrupt vector numbers.
  More about this below.
* Modification of the ``stm32_vectors.S`` file. These changes
  are trivial and involve only the conditional inclusion
  of the new, special ``xyz_vector.h`` header file.

**REVISIT**: This needs to be updated. Neither the ``xyz_vector.h``
files nor ``stm32_vectors.S`` exist in the current implementation.
This has all been replaced with the common vector handling at
``arch/arm/src/armv7-m``.

Vector Definitions
==================

In ``arch/arm/src/stm32/gnu/stm32_vectors.S``, notice that the
``xyz_vector.h`` file will be included twice. Before each
inclusion, the macros ``VECTOR`` and ``UNUSED`` are defined.

The first time that ``xyz_vector.h`` is included, it defines the
hardware vector table. The hardware vector table consists
of ``NR_IRQS`` 32-bit addresses in an array. This is
accomplished by setting:

.. code-block:: c

   #undef VECTOR
   #define VECTOR(l,i) .word l

   #undef UNUSED
   #define UNUSED(i)   .word stm32_reserved

and then including ``xyz_vector.h``. So consider the following
definitions in the original file:

.. code-block:: c

   ...
   VECTOR(stm32_usart1, STM32_IRQ_USART1) /* Vector 16+37: USART1 global interrupt */
   VECTOR(stm32_usart2, STM32_IRQ_USART2) /* Vector 16+38: USART2 global interrupt */
   VECTOR(stm32_usart3, STM32_IRQ_USART3) /* Vector 16+39: USART3 global interrupt */
   ...

Suppose that we wanted to support only USART1 and that
we wanted the IRQ number for USART1 to be 12.
That would be accomplished in the ``xyz_vector.h`` header
file like this:

.. code-block:: c

   ...
   VECTOR(stm32_usart1, STM32_IRQ_USART1) /* Vector 16+37: USART1 global interrupt */
   UNUSED(0)                              /* Vector 16+38: USART2 global interrupt */
   UNUSED(0)                              /* Vector 16+39: USART3 global interrupt */
   ...

where the value of ``STM32_IRQ_USART1`` is defined to
be 12 in the ``arch/arm/include/stm32/xyz_irq.h`` header
file. When ``xyz_vector.h`` is included by ``stm32_vectors.S``
with the above definitions for ``VECTOR`` and ``UNUSED``, the
following would result:

.. code-block:: c

   ...
   .word stm32_usart1
   .word stm32_reserved
   .word stm32_reserved
   ...

These are the settings for vectors 53, 54, and 55,
respectively. The entire vector table would be populated
in this way. ``stm32_reserved``, if called, would result in
an "unexpected ISR" crash. ``stm32_usart1``, if called, will
process the USART1 interrupt normally as we will see below.

Interrupt Handler Definitions
-----------------------------

In the vector table, all of the valid vectors are set to
the address of a `handler` function. All unused vectors
are forced to vector to ``stm32_reserved``. Currently, only
vectors that are not supported by the hardware are
marked ``UNUSED``, but you can mark any vector ``UNUSED`` in
order to eliminate it.

The second time that ``xyz_vector.h`` is included by
``stm32_vectors.S``, the `handler` functions are generated.
Each of the valid vectors points to the matching handler
function. In this case, you do NOT have to provide
handlers for the ``UNUSED`` vectors, only for the used
``VECTOR`` vectors. All of the unused vectors will go
to the common ``stm32_reserved`` handler. The remaining
set of handlers is very sparse.

These are the values of the ``UNUSED`` and ``VECTOR`` macros the
second time that ``xyz_vector.h`` is included by ``stm32_vectors.S``:

.. code-block:: asm

   .macro HANDLER, label, irqno
   .thumb_func
   \label:
     mov r0, #\irqno
     b   exception_common
   .endm

   #undef VECTOR
   #define VECTOR(l,i) HANDLER l, i

   #undef UNUSED
   #define UNUSED(i)

In the above USART1 example, a single handler would be
generated that will provide the IRQ number 12. Remember
that 12 is the expansion of the macro ``STM32_IRQ_USART1``
that is provided in the ``arch/arm/include/stm32/xyz_irq.h``
header file:

.. code-block:: asm

   .thumb_func
   stm32_usart1:
     mov r0, #12
     b   exception_common

Now, when vector 16+37 occurs it is mapped to IRQ 12
with no significant software overhead.

A Complication
--------------

A complication in the above logic has been noted by David Sidrane:
When we access the NVIC in ``stm32_irq.c`` in order to enable
and disable interrupts, the logic requires the physical
vector number in order to select the NVIC register and
the bit(s) to modify in that NVIC register.

This could be handled with another small IRQ lookup table
(20 ``uint8_t`` entries in our example situation above). But
then this approach is not so much better than the `Software
IRQ Remapping` described above, which does not suffer from
this problem. Certainly enabling/disabling interrupts is a
much lower-rate operation and at least does not put the
lookup in the critical interrupt path.

Another option suggested by David Sidrane is equally ugly:

* Don't change the ``arch/arm/include/stm32`` IRQ definition file.
* Instead, encode the IRQ number so that it has both
  the index and the physical vector number:

.. code-block:: c

   ...
   VECTOR(stm32_usart1, STM32_IRQ_USART1 << 8 | STM32_INDEX_USART1)
   UNUSED(0)
   UNUSED(0)
   ...

``STM32_INDEX_USART1`` would have the value 12 and
``STM32_IRQ_USART1`` would be as before (53). This encoded
value would be received by ``irq_dispatch()`` and it would
decode both the index and the physical vector number.
It would use the index to look up in the ``g_irqvector[]``
table but would pass the physical vector number to the
interrupt handler as the IRQ number.

A lookup would still be required in ``irq_attach()`` in
order to convert the physical vector number back to
an index (100 ``uint8_t`` entries in our example). So
some lookup is unavoidable.

Based upon this analysis, my recommendation is that
we not consider the second option any further. The
first option is cleaner, more portable, and generally
preferable.

Dubious Performance Improvements
--------------------------------

The intent of this second option was to provide a higher
performance mapping of physical interrupt vectors to IRQ
numbers compared to the pure software mapping of option 1.
However, in order to implement this approach, we had
to use the less efficient, non-common vector handling
logic. That logic is not terribly less efficient; the
cost is probably only a 16-bit load immediate instruction
and a branch to another location in FLASH (which will cause
the CPU pipeline to be flushed).

The variant of option 2 where both the physical vector number
and the vector table index are encoded would require even more
processing in ``irq_dispatch()`` in order to decode the
physical vector number and vector table index.
Possibly just AND and SHIFT instructions.

However, the minimal cost of the first, pure software
mapping approach is possibly as small as a single
indexed byte fetch from FLASH in ``irq_attach()``.
Indexing is, of course, essentially `free` in the ARM
ISA; the primary cost would be the FLASH memory access.
So my first assessment is that the performance of both
approaches is essentially the same. If anything, the
first approach is possibly the more performant if
implemented efficiently.

Both options would require some minor range checking in
``irq_attach()`` as well.

Because of this and because of the simplicity of the
first option, I see no reason to support or consider
this second option any further.

Complexity and Generalizability
-------------------------------

Option 2 is overly complex; it depends on a deep understanding
of how the MCU interrupt logic works and on a high level of
Thumb assembly language skills.

Another problem with option 2 is that it really only applies to
the Cortex-M family of processors and perhaps others that
support vectored interrupts in a similar fashion.
It is not a general solution that can be used with all CPU
architectures.

And even worse, the MCU-specific interrupt handling logic
that this support depends upon is very limited. As soon
as the common interrupt handler logic was added, I stopped
implementing the MCU-specific logic in all newer ARMv7-M
ports. So that MCU-specific interrupt handler logic is
only present for EFM32, Kinetis, LPC17, SAM3/4, STM32,
Tiva, and nothing else. Very limited!

These are further reasons why option 2 is not recommended and
will not be supported explicitly.