Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fork overview, and thoughts to improve basis for chance of upstreaming #1

Open
mara004 opened this issue Nov 11, 2023 · 2 comments
Open

Comments

@mara004
Copy link
Member

mara004 commented Nov 11, 2023

See below for an overview of this fork. Note, this writeup is a non-exhaustive work in progress.

This information may be valuable for working towards a basis that could be merged back into upstream at some point, though this seems fairly hypothetical for the near term, given time constraints, and mismatched design intents (e.g. relating to backwards compatibility).

However, this fork of ctypesgen may be a good starting point for any active future development, with a significantly overhauled code base that should be nicer to work with.

Selection of improvements from this fork

small, self-contained fixes have usually been submitted upstream and may have been merged

Points to consider

  • Restoring implicit UTF-8 string encoding/decoding as optional?

    ctypesgen originally did implicit UTF-8 encoding/decoding of in/out strings.
    While that tends to be bad practice and callers had better handle strings explicitly instead, it would seem reasonable to retain an optional backward compatibility layer for existing callers.
    I also imagine it might be convenient for a library that consistently uses UTF-8 for everything.

    Adding the old string classes back is certainly not an option for us. However, it may be possible to create a lean replacement. See Make string auto-conversion optional & use leaner strings classes ctypesgen/ctypesgen#177 for a suggestion (copy below), or String seems incorrect ctypesgen/ctypesgen#77 (comment) for an alternative draft by @olsonse.
    Note that in/out must be handled in a single class.

  • Mixed calling conventions

    We do not currently support mixed calling conventions (both cdecl and windows-only stdcall functions in one binary). Our fork dropped this while rewriting the library loader, because it does not match the API layer of ctypes and would need a template to handle, which the assumably low relevance did not seem to justify. The previous implementation was a bit dirty, too: In case of a pure stdcall library, a cdecl handle would be opened anyway, and binary variables would be taken from the cdecl one.
    If there is a real-world use case, this could be added back.

  • Removal of support for multiple libraries in one bindings file

    This feature was a significant complexity burden in some code areas, including pollution around symbols in printer/output code. For now we decided to remove it - callers can use --no-embed-preamble and --link-modules to create separate bindings files. This also encourages individual/explicit rather than unified loader config.

    See also Recognize structs from common header files in different wrapper modules ctypesgen/ctypesgen#86 (comment) for some alternative considerations.

Other notes

  • Shifts in design intent: We would prefer to stick with plain ctypes as much as possible and avoid cluttering the bindings with custom wrappers.

  • CLI: We changed the command-line interface from action=append to action=extend and nargs=+/*. This implied switching headers from positional to flag argument to avoid confusion/interference with flags that take multiple arguments. There are more CLI changes not listed here, see diff for details.

Done tasks

  • Restored test suite usability by adapting to fork changes.
  • Restored macro guards as opt-out

Footnotes

  1. Quoting from the commit in question "Fun fact: It seems like ctypesgen prior to this change added the complie_libdirs to the runtime library loader. Further, relative paths (I think) were never correctly interpreted relative to the file's directory, but to the user's CWD, which is useless and could even cause unexpected behavior. So runtime_libdirs = ["."] didn't work, but the fallback below [commented with] 'then we search the directory where the generated python interface is stored' captured."

  2. Note, this is meant for use with inherently ABI correct packaging only

@mara004 mara004 pinned this issue Nov 11, 2023
@mara004
Copy link
Member Author

mara004 commented Nov 11, 2023

Dumping our lean string class replacement draft below as it's not very well visible in the PR diff.

class _wraps_c_char_p:
    def __init__(self, raw, value):
        self.raw = raw
        self.value = value

    # provided for clarity, not actually necessary due to __getattr__ wrapper below
    def decode(self, encoding="utf-8", errors="strict"):
        return self.value.decode(encoding, errors=errors)

    def __str__(self):
        return self.decode()

    def __getattr__(self, attr):
        return getattr(self.value, attr)


class String(ctypes.c_char_p):
    @classmethod
    def _check_retval_(cls, result):
        value = result.value
        return value if value is None else _wraps_c_char_p(result, value)

    @classmethod
    def from_param(cls, obj):
        if isinstance(obj, str):
            obj = obj.encode("utf-8")
        return super().from_param(obj)

@mara004 mara004 changed the title Improving basis for chance of merging back into upstream Fork overview, and thoughts to improve basis for chance of upstreaming Nov 11, 2023
@mara004
Copy link
Member Author

mara004 commented Dec 4, 2023

Another improvement that comes to my mind for autostrings would be making the kind of encoding configurable.

e.g. pdfium mostly uses UTF16LE, so autostrings with this might actually be convenient for pypdfium2, though formally a default encoding remains a problem - it would still be a concern with any APIs that use other encodings, like UTF-8 or ASCII.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant