Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drivers/pipes: return after short write if buffer is full #14667

Merged
merged 1 commit into from
Nov 6, 2024

Conversation

Cynerd
Copy link
Contributor

@Cynerd Cynerd commented Nov 6, 2024

Summary

The write should return even in case of O_NONBLOCK if at least some bytes were written.

The previous state where always all bytes were written was breaking a common combination with poll, because poll would signal POLLOUT and some bytes would really be consumed but write could still block afterwards. That would prevent from execution returning to the poll loop again.

None the less it is also the standard C library behavior for the write function.

Impact

This could have impack on any code that uses pipes and fifos and relies on this non-standard behavior or write always writing all bytes. But it is non-standard behavior and thus considered to be a bug in my eyes.

Testing

Tested with the following code that after this change works:

// Must be longer than CONFIG_DEV_PIPE_SIZE
const char *text =
        "Lorem ipsum odor amet, consectetuer adipiscing elit. Eget litora laoreet ornare elementum pellentesque class penatibus eleifend morbi. Conubia cras accumsan blandit at fames mauris varius. Nisl c
const size_t tlen = strlen(text);
printf("tlen %zd\n", tlen);
int fds[2];
assert(pipe2(fds, 0) == 0);
struct pollfd pfd[2] = {
        {
                .fd = fds[0],
                .events = POLLIN | POLLHUP,
        },
        {
                .fd = fds[1],
                .events = POLLOUT,
        },
};
size_t off = 0;
while (true) {
        if (poll(pfd, 2, -1) == -1) {
                printf("Poll fail %s\n", strerror(errno));
                exit(1);
        }
        if (pfd[1].revents & POLLOUT) {
                ssize_t wlen = write(fds[1], text + off, tlen - off);
                if (wlen < 0) {
                        printf("Write fail %s\n", strerror(errno));
                        exit(1);
                }
                printf("Written %zd\n", wlen);
                off += wlen;
                if (off >= tlen)
                        close(fds[1]);
                continue;
        }
        if (pfd[0].revents & POLLIN) {
                char buf[BUFSIZ];
                ssize_t rlen = read(fds[0], buf, BUFSIZ);
                if (rlen < 0) {
                        printf("Read fail %s\n", strerror(errno));
                        exit(1);
                }
                fwrite(buf, 1, rlen, stdout);
        }
        if (pfd[0].revents & POLLHUP) {
                putc('\n', stdout);
                exit(0);
        }
}

The write should return even in case of O_NONBLOCK if at least some
bytes were written.

The previous state where always all bytes were written was breaking a
common combination with poll, because poll would signal POLLOUT and some
bytes would really be consumed but write could still block afterwards.
That would prevent from execution returning to the poll loop again.

None the less it is also the standard C library behavior for the write
function.
@github-actions github-actions bot added Area: Drivers Drivers issues Size: M The size of the change in this PR is medium labels Nov 6, 2024
@nuttxpr
Copy link

nuttxpr commented Nov 6, 2024

[Experimental Bot, please feedback here]

Yes, this PR appears to meet the NuttX requirements, although some sections could be more explicit.

Strengths:

  • Clear Summary: The summary explains the problem (blocking write with O_NONBLOCK when partial writes are possible), the solution (returning after a partial write), and the rationale (conforming to standard C library behavior and fixing the interaction with poll).
  • Impact Description: The impact section identifies potential compatibility issues with code relying on the previous, non-standard behavior.
  • Test Case: Provides a concrete test case demonstrating the issue and its resolution. This is crucial for verifying the fix.

Areas for Improvement:

  • Impact Specificity: While the impact section mentions potential compatibility issues, it would be stronger with more details. Which specific parts of NuttX or NuttX apps might be affected? If none are known, explicitly stating that would be helpful. The current phrasing "This could have impact..." is vague. Be more definitive if possible.
  • Testing Details: While the test code is provided, the "Testing logs before change" and "Testing logs after change" sections are empty. Include the actual output of the test program before and after the change. This concretely demonstrates the fix's effect. Also specify the Build Host and Target details as requested.
  • Impact on Documentation: If this change affects how write behaves with pipes/FIFOs, the documentation should be updated to reflect the new behavior. The Impact section should specify whether documentation updates are included in the PR.
  • Impact on Security and Compatibility: While it's likely there are no security implications and backward compatibility is the main concern (already addressed), explicitly stating "NO" with a brief justification for security and a more detailed explanation for compatibility (e.g., "Backward compatibility may be affected for applications relying on the non-standard behavior...") would strengthen the PR.

Example of improved Testing Section:

## Testing

I confirm that changes are verified on local setup and works as intended:
* Build Host(s): Linux (Ubuntu 20.04), x86_64, GCC 9.4.0
* Target(s): sim:qemu-x86_64


Testing logs before change:

tlen 295
Written 65536
Poll fail Interrupted system call <-- Demonstrates the blocking issue


Testing logs after change:

tlen 295
Written 65536
Written 229
Lorem ipsum odor amet, consectetuer adipiscing elit. Eget litora laoreet ornare elementum pellentesque class penatibus eleifend morbi. Conubia cras accumsan blandit at fames mauris varius. Nisl c

By making these improvements, the PR will be clearer, easier to review, and demonstrate a higher level of due diligence.

@xiaoxiang781216 xiaoxiang781216 merged commit d0680fd into apache:master Nov 6, 2024
23 of 27 checks passed
@lupyuen
Copy link
Member

lupyuen commented Nov 7, 2024

Sorry @Cynerd: CI Test in arm-02 is failing with Pipe Timeout, I wonder if it's due to this PR?

Configuration/Tool: sabre-6quad/citest,CONFIG_ARM_TOOLCHAIN_GNU_EABI
__________________________________ test_pipe ___________________________________
p = <utils.common.connectNuttx object at 0x7f35223[161](https://github.com/NuttX/nuttx/actions/runs/11713931118/job/32627704349#step:7:162)a0>
    def test_pipe(p):
        ret = p.sendCommand("pipe", "redirect_reader: Returning success", timeout=60)
>       assert ret == 0
E       assert -1 == 0
E         +-1
E         -0

/github/workspace/sources/nuttx/tools/ci/testrun/script/test_example/test_example.py:41: AssertionError
----------------------------- Captured stdout call -----------------------------
+------------------------------------------- Catch Exception -------------------------------------------+
+----------------------------------------------- Result ------------------------------------------------+
| Command     : 'pipe'                                                                                  |
| Expect value: ['redirect_reader: Returning success']                                                  |
| Timeout     : 60s                                                                                     |
| Test result : Timeout

https://github.com/NuttX/nuttx/actions/runs/11713931118/job/32627704349#step:7:174

Update: CI Test is also super slow in sim-01 and risc-v-05, taking 6 hours (before getting killed by GitHub):

@Cynerd
Copy link
Contributor Author

Cynerd commented Nov 7, 2024

Sorry @Cynerd: CI Test in arm-02 is failing with Pipe Timeout, I wonder if it's due to this PR?

It might be. I will look into that. I suspect that it is because of some invalid expectations for some components using the pipe. As I wrote in the "Impact": it might cause issues if code that uses pipecommon relies on write working in a non-standard way.

Edit: The pipe example fix should be easy. I will test it and submit PR with fix. We will see if it will help with timeout as well.

Edit2: @lupyuen apache/nuttx-apps#2829

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Drivers Drivers issues Size: M The size of the change in this PR is medium
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants