Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core/sched.c: fix _runqueue_pop() removing wrong thread #20938

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

derMihai
Copy link
Contributor

@derMihai derMihai commented Oct 23, 2024

Contribution description

_runqueue_pop() calls clist_lpop(), which pops the leftmost entry in the thread's priority runqueue, but not necessarily the one passed as argument.

I understand the original logic behind this: if a thread is to be removed, then it is so because it is about to block on something or exit. And a thread can only block or exit if it's currently running. As a result, it must be the leftmost entry on the runqueue. But this isn't always true. Consider sched_change_priority(), which does the following:

    if (thread_is_active(thread)) { 
        _runqueue_pop(thread);
        _runqueue_push(thread, priority);
    }

thread_is_active() returns true if a thread is either running or queued to be running. So the _runqueue_pop() might retrieve any thread. Let's extend _runqueue_pop() with the following assertion:

    ...
    clist_node_t *removed = clist_lpop(&sched_runqueues[thread->priority]);
    assume(container_of(removed, thread_t, rq_entry) == thread);
    ...

Running this will trigger the assertion:

    int thread_ids[WAITERS_CNT];
    for (unsigned i = 0; i < WAITERS_CNT; i++) {
       // set a lower priority than the current thread's to make sure they are all in the runqueue when
       // changing priorities below
        thread_ids[i] = thread_create(stacks[i], sizeof(stacks[0]), THREAD_PRIORITY_MAIN + 1,
                      THREAD_CREATE_STACKTEST, waiter, NULL, "waiter");
    }

    thread_t *thread2 = thread_get(thread_ids[1]);

    irq_disable(); // sched_change_priority() wants interrupts disabled
    sched_change_priority(thread2, THREAD_PRIORITY_MAIN);
    irq_enable();

If we change _runqueue_pop() to search for the specific thread:

    ...
    clist_node_t *removed = clist_remove(&sched_runqueues[thread->priority], &thread->rq_entry);
    assume(container_of(removed, thread_t, rq_entry) == thread);
    ....

Then everything is all right.

The only ugly thing is that, when the thread is indeed running (vast majority of cases), due to how clists are implemented the whole list is walked to remove the thread.

Testing procedure

I briefly tested this code on a SAM0 board.

@github-actions github-actions bot added the Area: core Area: RIOT kernel. Handle PRs marked with this with care! label Oct 23, 2024
@benpicco benpicco added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Oct 23, 2024
@benpicco benpicco requested a review from maribu October 23, 2024 14:09
@riot-ci
Copy link

riot-ci commented Oct 23, 2024

Murdock results

✔️ PASSED

13d9b9c core/sched.c: fix _runqueue_pop() removing wrong thread

Success Failures Total Runtime
10214 0 10215 13m:21s

Artifacts

@benpicco
Copy link
Contributor

The only ugly thing is that, when the thread is indeed running (vast majority of cases), due to how clists are implemented the whole list is walked to remove the thread.

clist_remove() already contains an optimisation there:

static inline clist_node_t *clist_remove(clist_node_t *list, clist_node_t *node)
{
    if (list->next) {
        if (list->next->next == node) {
            return clist_lpop(list);
        }
        else {

So in the common case, this just adds two if conditions.

@derMihai
Copy link
Contributor Author

So in the common case, this just adds two if conditions

If the common case is that there are exactly one or two threads, and we want to remove the first. If there are more, it will start looping from the second. But probably good enough.

@benpicco benpicco added this pull request to the merge queue Oct 23, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 23, 2024
@benpicco
Copy link
Contributor

looks like Makefile.ci needs an update - can you check by how much this increases code size?

It's probably just a few bytes and tests_xbee was already at the very limit for nucleo-l031k6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: core Area: RIOT kernel. Handle PRs marked with this with care! CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants