Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix dereferencing of unaligned FILE_NAME_INFO #51

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
47 changes: 29 additions & 18 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@ pub fn is(stream: Stream) -> bool {

// Otherwise, we fall back to a very strange msys hack to see if we can
// sneakily detect the presence of a tty.
unsafe { msys_tty_on(fd) }
msys_tty_on(fd)
}

/// returns true if this is _not_ a tty
Expand All @@ -115,34 +115,45 @@ unsafe fn console_on_any(fds: &[DWORD]) -> bool {

/// Returns true if there is an MSYS tty on the given handle.
#[cfg(windows)]
unsafe fn msys_tty_on(fd: DWORD) -> bool {
use std::{mem, slice};

fn msys_tty_on(fd: DWORD) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this should be unsafe. It's taking a raw file descriptor without any guarantees about the validity of that file descriptor.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fd isn't used as a file descriptor internally, just as an argument to GetStdHandle which maps the handle id to the real runtime handle. I've documented why that's sound in its own unsafe block, which I'm fairly confident upholds winapis contracts.

Most winapi APIs are actually very forgiving with handles, and will simply return errors when they're invalild.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sunfishcode What do you think about this? Particularly with respect to your new I/O safety RFC, should a routine like this be marked as unsafe even though there is no possibility of UB for any value of fd?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This... is a subtle one. It's true there's no possibility of UB here, or even corrupted I/O. On the other hand, the overall pattern here of using a pre-existing I/O handle without being given explicit access to it is the kind of thing I/O safety wants to avoid. On the first hand though, atty's own public API does this, by taking an enum instead of a handle; it's not unsafe, and this is doing essentially the same thing.

So I think it's reasonable to leave this as a safe function for now. But I also think it makes sense to explore APIs where the user passes in a handle.

use winapi::{
ctypes::c_void,
shared::minwindef::MAX_PATH,
um::{
fileapi::FILE_NAME_INFO, minwinbase::FileNameInfo, processenv::GetStdHandle,
minwinbase::FileNameInfo, processenv::GetStdHandle,
winbase::GetFileInformationByHandleEx,
},
};

let size = mem::size_of::<FILE_NAME_INFO>();
let mut name_info_bytes = vec![0u8; size + MAX_PATH * mem::size_of::<WCHAR>()];
let res = GetFileInformationByHandleEx(
GetStdHandle(fd),
FileNameInfo,
&mut *name_info_bytes as *mut _ as *mut c_void,
name_info_bytes.len() as u32,
);
/// Mirrors winapi::um::fileapi::FILE_NAME_INFO, giving it a fixed length that
/// we can stack allocate
#[repr(C)]
struct FILE_NAME_INFO {
FileNameLength: DWORD,
FileName: [WCHAR; MAX_PATH]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why the winapi definition isn't equivalent to this? I'm worried we're missing something given that winapi chose a different representation and it's not clear why to me. cc @retep998

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the way winapi normally models C's flexible array members, as it's typically the most flexible (heh) option. The actual max length of file name that GetFileInformationByHandleEx will write is controlled by the fourth parameter, which could well be less than the full MAX_PATH length (and could even be longer! It's possible to enable even longer path lengths, but our ptys shouldn't need to worry about that :))

let mut name_info = FILE_NAME_INFO {
FileNameLength: 0,
FileName: [0; MAX_PATH]
};
let handle = unsafe {
// Safety: function has no invariants. an invalid handle id will cause
// GetFileInformationByHandleEx to return an error
GetStdHandle(fd)
};
let res = unsafe {
// Safety: handle is valid, and buffer length is fixed

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: The handle is not necessarily valid, GetStdHandle can return an INVALID_HANDLE_VALUE (ref docs) and it's fragile to rely on this api not doing anything wrong when passed invalid data. I would be more comfortable with this error being checked in rust before calling the system api.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetFileInformationByHandleEx should be well-behaved here, as documented in the comments. Handling an invalid handle is one of its error states. That said, I'd never really argue against defensive programming especially around winapi!

Either way, I probably won't do anything more with PR without input from a maintainer

GetFileInformationByHandleEx(
handle,
FileNameInfo,
&mut name_info as *mut _ as *mut c_void,
std::mem::size_of::<FILE_NAME_INFO>() as u32,
)
};
if res == 0 {
return false;
}
let name_info: &FILE_NAME_INFO = &*(name_info_bytes.as_ptr() as *const FILE_NAME_INFO);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that a smaller change here would be to just use (name_info_bytes.as_ptr() as *const FILE_NAME_INFO).read_unaligned(), right? Then you don't need to reason about uninitialized memory and what not, which seems like overkill here IMO.

Copy link
Author

@Plecra Plecra Jul 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to use uninitialized memory here anyway - mem::zeroed() would be just fine. I personally prefer to use MaybeUninit when dealing with OS apis, as it helps flag the invariants we're actually interested in. Even without the type there, this function entirely relies on the buffer being written to. That said, I'm happy to make the change to mem::zeroed if you decide this breaks the complexity budget :)

Edit: On second thought mem::zeroed() is overkill too 😄 we can simply initialize the struct itself. In the new commit, it turns out this drastically reduces the unsafety, limiting it to only the winapi calls.

As for a read_unaligned, I think using the actual type rather than a byte buffer makes the intention clearer.

let size = mem::size_of::<FILE_NAME_INFO>();
// Imo, this takes no less effort to grok
let mut name_info_bytes = vec![0u8; size + MAX_PATH * mem::size_of::<WCHAR>()];

The main problem with using read_unaligned would be that the invariants it requires are actually more complex than dereferencing a struct we've just initialized. i.e. The change you suggest would also be unsound: s would now have a buffer overflow because only 1 WCHAR would be read from the buffer (based on winapi's FILE_NAME_INFO struct). We could absolutely make read_unaligned work (we'd need to use something like name_info_bytes.as_ptr().add(4) on line 143) but my concern would be that the safety is actually more nuanced.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

w.r.t. to read_unaligned, I was suggesting it specifically as a replacement to the raw pointer dereference. If read_unaligned is unsound without some pointer arithmetic, then so is the raw pointer dereference. The key here is that this PR is trying to solve two problems (unaligned dereference and getting rid of heap alloc) despite the title of the PR only referring to one (unaligned dereference). Getting rid of heap allocs seems like a questionable thing to me, which means the smallest diff for this PR to address the unaligned deference is to simply use an unaligned load.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I'm fine with fixing the heap alloc issue here. While it's not clear to me why fixing it is important, there is little cost to doing so I think.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That certainly wasn't the intention! I totally agree that it's good to keep the scope of these changes small, I just found that trying to align the vector properly was trickier than doing it on the stack, as I needed to intertwine the alignment with the generic type I was putting in the vector.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also, I know you're only skimming so I don't blame you for misreading the change, but the difference between a read_unaligned and the previous pointer dereference is that it means we have a FILE_NAME_INFO instead of a &FILE_NAME_INFO. This makes name_info.FileName.as_ptr() below unsound because now it's moved away from the actual path data! Point being, I'm glad we don't have to have any more nasty unsafe here :P)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. Tricksy indeed. You're totally right.

let s = slice::from_raw_parts(
name_info.FileName.as_ptr(),
name_info.FileNameLength as usize / 2,
);
let s = &name_info.FileName[..name_info.FileNameLength as usize];
let name = String::from_utf16_lossy(s);
// This checks whether 'pty' exists in the file name, which indicates that
// a pseudo-terminal is attached. To mitigate against false positives
Expand Down