Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser.cpp: fix WalkType + many native bugs #1819

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

deadlocklogic
Copy link
Contributor

@deadlocklogic deadlocklogic commented Dec 30, 2023

Consider this snippet from the standard string library on msvc, without skipping private declarations:

class Test : ILibrary
{
    public void SetupPasses(Driver driver)
    {
    }

    public void Preprocess(Driver driver, ASTContext ctx)
    {
    }

    public void Postprocess(Driver driver, ASTContext ctx)
    {
    }

    public void Setup(Driver driver)
    {
        driver.ParserOptions.UnityBuild = false;
        driver.ParserOptions.SkipPrivateDeclarations = false;

        var options = driver.Options;
        options.Quiet = false;
        options.DryRun = true;
        options.Verbose = true;
        options.GenerationOutputMode = GenerationOutputMode.FilePerModule;
        options.GeneratorKind = GeneratorKind.CPlusPlus;
        options.UseHeaderDirectories = true;

        var module = driver.Options.AddModule("test");

        string temp = Path.GetTempFileName();
        File.WriteAllText(temp, @"
template <class _Ty>
class allocator {};
template <class _Alloc>
struct allocator_traits {};
template <class _Alloc, class _Value_type>
using _Rebind_alloc_t = typename allocator_traits<_Alloc>::template rebind_alloc<_Value_type>;
template <class _Elem, class _Alloc = allocator<_Elem>>
class basic_string { // null-terminated transparent array of elements
private:
using _Alty        = _Rebind_alloc_t<_Alloc, _Elem>;
};
");

        module.Headers.Add(temp);
    }
}

In Debug mode, Parser::WalkType assertion fails: assert(TL->getTypeLocClass() == TypeLoc::DependentTemplateSpecialization);
The actual TypeLoc is: TL->getTypeLocClass() == TypeLoc::TemplateSpecialization.

There are also other problems when TypeLocClass == TypeLoc::Qualified and the inner TypeLoc == TypeLoc::Elaborated.

By the way, why the need of TypeLoc in the API at all? Haven't seen its usage.

@deadlocklogic
Copy link
Contributor Author

This PR partially fixes #1791, I am still in the process of debugging and fixing the general use case.

@deadlocklogic deadlocklogic marked this pull request as draft December 30, 2023 23:28
@deadlocklogic deadlocklogic changed the title Parser.cpp: fix WalkType for clang::Type::TypeClass == clang::Type::DependentTemplateSpecialization Parser.cpp: fix WalkType + many native bugs Dec 31, 2023
@deadlocklogic
Copy link
Contributor Author

deadlocklogic commented Dec 31, 2023

Consider parsing:

template <class...>
struct conjunction {};

template <class B>
struct conjunction<B> : B {};

The assertion fails when calling GetCXXRecordDeclFromBaseType, we shouldn't assert this as nullptr is an expected value which we test against in the 2 referring calls. Because in this case the type is a TemplateTypeParmType and is totally valid.
I believe the code related to HasLayout/offset should be migrated to C#. Consider when template parameter substitution is implemented, we should be able to calculate the layout of a full class template specialization on the C# side. Not to mention that this is a C# generator specific requirement.

@deadlocklogic
Copy link
Contributor Author

Tests in Debug mode before this PR had 4 crashing 1 failing, now down to 1 crashing 1 failing.

@deadlocklogic
Copy link
Contributor Author

@tritao Do you think we should refactor the Parser.cpp to remove TypeLoc usage.
Why do the current implementation prefers querying type properties from TypeLoc rather than the type itself.
Consider:

if (LocValid)
{
    // ...
    TPT->parameter = WalkTypeTemplateParameter(TTTL.getDecl());
}
else if (TP->getDecl())
    TPT->parameter = WalkTypeTemplateParameter(TP->getDecl());

@tritao
Copy link
Collaborator

tritao commented Jan 2, 2024

@tritao Do you think we should refactor the Parser.cpp to remove TypeLoc usage.

If possible yes, it would simplify the parser code a bit.

Why do the current implementation prefers querying type properties from TypeLoc rather than the type itself. Consider:

IIRC unless we used the TypeLoc versions we got incomplete type information in some cases. It's worth a try removing it and seeing if all tests still pass, but even then, most of the edge cases only happened with complex bindings like QtSharp.

@deadlocklogic
Copy link
Contributor Author

deadlocklogic commented Jan 2, 2024

IIRC unless we used the TypeLoc versions we got incomplete type information in some cases. It's worth a try removing it and seeing if all tests still pass, but even then, most of the edge cases only happened with complex bindings like QtSharp.

Interesting because most of TypeLoc and its derivates relay their properties from the type pointer.
I will try cleaning them up in a separate branch and see what goes on.

BTW I am having trouble parsing a relatively big project (without any processing, just parsing and AST generation), with around 650 header files, the process is crashing out of memory (consuming up to 15GB).
I wonder how compiling QT, which might be a larger library, was setup up.

Is there a way to let clang ignore function bodies?

@tritao
Copy link
Collaborator

tritao commented Jan 2, 2024

BTW I am having trouble parsing a relatively big project (without any processing, just parsing and AST generation), with around 650 header files, the process is crashing out of memory (consuming up to 15GB).

Is this in debug or release mode? Are you using the unity header parsing or just header by header parsing?

I think QtSharp enabled unity mode, you can check it here: https://gitlab.com/ddobrev/QtSharp

@deadlocklogic
Copy link
Contributor Author

I am not using UnityBuild so far, and was really wondering about its advantages but still haven't tried it to come up with a conclusion.

@tritao
Copy link
Collaborator

tritao commented Jan 2, 2024

Since it parses all the headers as a single translation unit, it can be significantly faster than parsing header by header, at the cost of a bit more memory usage.

@deadlocklogic
Copy link
Contributor Author

deadlocklogic commented Jan 2, 2024

Since it parses all the headers as a single translation unit, it can be significantly faster than parsing header by header, at the cost of a bit more memory usage.

I can't afford any more memory. I tried -fsyntax-only without any effect. Is there any techniques to avoid function body parsing, or at least to free their memory footprint from the AST?
Maybe making the debugText field generation optional?
BTW, I am parsing in Release mode of course, Debug mode is barely suitable for debugging: extremely slow, even more memory hungry.

This commit closes #1791, now will keep this PR active, to add tests and fix the one remaining bug in tests which causes crashing in Debug mode.

@tritao
Copy link
Collaborator

tritao commented Jan 2, 2024

I would try to understand what's going on with that particular project, it must be generating a ton of unnecessary garbage. First figure out how much memory parsing with Clang takes, then figure out how much memory we allocate when processing Clang AST, then drill down from there.

@deadlocklogic
Copy link
Contributor Author

Well I am trying to parse this game engine rbfx, which gets parsed with clang json dump within few minutes, and relatively slow with libclang, but so far crashing because of out of memory with clang frontend (CppSharp).
I am an not even doing anything rather than just getting the sources to parse before any processing, but so far with no luck.

@tritao
Copy link
Collaborator

tritao commented Jan 2, 2024

Cool, I actually developed a prototype C# binding for Urho3D with CppSharp many years ago, so it was definitely parseable back then.

Also @rokups (from rbfx) contributed a few fixes at the time to CppSharp: https://github.com/mono/CppSharp/commits?author=rokups

I would try unity build just in case it helps.

@deadlocklogic
Copy link
Contributor Author

I will try for sure.
While urho3d has a small community, it is friendly and willing to share experience with others.
I usually chat with the guys on discord, and was interested if I could integrate CppSharp with their build system to generate binding helpers for SWIG, because in its current form, this project still falls behind in terms of customization/advanced typemaps in comparison to the former.
Even thought, I have a kind of big idea in mind, if coupled with this project, could make something way better and accurate than SWIG. Still I need a little bit more of experimenting, getting the grounds ready for such a step.

@deadlocklogic
Copy link
Contributor Author

@tritao I tried adding a test, but because the C++17 requirement for deduction guides, a new bug is being introduced when adding C++17 compiler flag. The following tests are not ignoring the default argument in the C# generator:

void defaultNonEmptyCtor(QGenericArgument arg = QGenericArgument(0));
void defaultNonEmptyCtorWithNullPtr(QGenericArgument arg = QGenericArgument(nullptr));
// DEBUG: void defaultNonEmptyCtor(QGenericArgument arg = QGenericArgument(0))
public void DefaultNonEmptyCtor(global::CSharp.QGenericArgument arg = 0)
{
    var ____arg0 = arg.__Instance;
    var __arg0 = new __IntPtr(&____arg0);
    __Internal.DefaultNonEmptyCtor(__Instance, __arg0);
}

// DEBUG: void defaultNonEmptyCtorWithNullPtr(QGenericArgument arg = QGenericArgument(nullptr))
public void DefaultNonEmptyCtorWithNullPtr(global::CSharp.QGenericArgument arg = nullptr)
{
    var ____arg0 = arg.__Instance;
    var __arg0 = new __IntPtr(&____arg0);
    __Internal.DefaultNonEmptyCtorWithNullPtr(__Instance, __arg0);
}

Original valid results were:

// DEBUG: void defaultNonEmptyCtor(QGenericArgument arg = QGenericArgument(0))
public void DefaultNonEmptyCtor(global::CSharp.QGenericArgument arg)
{
    var ____arg0 = arg.__Instance;
    var __arg0 = new __IntPtr(&____arg0);
    __Internal.DefaultNonEmptyCtor(__Instance, __arg0);
}

// DEBUG: void defaultNonEmptyCtorWithNullPtr(QGenericArgument arg = QGenericArgument(nullptr))
public void DefaultNonEmptyCtorWithNullPtr(global::CSharp.QGenericArgument arg)
{
    var ____arg0 = arg.__Instance;
    var __arg0 = new __IntPtr(&____arg0);
    __Internal.DefaultNonEmptyCtorWithNullPtr(__Instance, __arg0);
}

I am not sure which pass is responsible for such modifications, so I can debug further.

@tritao
Copy link
Collaborator

tritao commented Jan 3, 2024

@deadlocklogic
Copy link
Contributor Author

deadlocklogic commented Jan 3, 2024

Is HandleDefaultParamValuesPass broken then?
Because as far as I know, the pass role is to find functions with default values then create overloads and delete the default value (in my own library I had something similar).
So, I would expected something like that generated for MethodsWithDefaultValues::defaultNonEmptyCtor():

public void DefaultNonEmptyCtor(global::CSharp.QGenericArgument arg)
{
    var ____arg0 = arg.__Instance;
    var __arg0 = new __IntPtr(&____arg0);
    __Internal.DefaultNonEmptyCtor(__Instance, __arg0);
}
public void DefaultNonEmptyCtor()
{
    var ____arg0 = arg.__Instance;
    __Internal.DefaultNonEmptyCtor(__Instance);
}

Edit:

By the way, I noticed that:

  1. the Expr API is messy and needs cleanup, maybe cleanup all obsolete API.
  2. we can use the built-in APValue::printPretty instead of the manual APValuePrinter which has bugs.
  3. native Parameter has defaultArgument and defaultValue, only defaultArgument is used.

Edit2:

Indeed DefaultNonEmptyCtor() is generated.

public void DefaultNonEmptyCtor()
{
    DefaultNonEmptyCtor(new global::CSharp.QGenericArgument());
}

But in my own experience, I would've preferred to generate a matching CLI function which calls the native function with defaulted arguments.

Edit2:

After a bit of debugging: I found that in:

void defaultNonEmptyCtor(QGenericArgument arg = QGenericArgument(0));

parameter.OriginalDefaultArgument is parsed as CXXConstructExprObsolete with default setup but as BuiltinTypeExpressionObsolete when we use Parser.LanguageVersion.CPP17.
Need further debugging in order to analyze why the behavior diverges.
BTW, a cleaner/foolproof solution would follow what SWIG does:

  1. generate overloads for every default argument combination for CLI and C# generators separately. Just as if default arguments were not allowed in C++ and we were restricted to create overloads instead (same as if we were coding in java).
  2. generate fields as C# properties: this way we address 2 current issues which the C# backend is suffering from: offset calculation, and default initializers, especially the latter which, as in 1, needs extra boilerplate and checkups to correctly marshal the value, and might introduce many bugs/corner cases.

@deadlocklogic
Copy link
Contributor Author

deadlocklogic commented Jan 4, 2024

After further debugging, I found out that C++17 might parse CXXConstructExprClass as CXXFunctionalCastExpr, so even further modifications are needed.
So best option right now, is to:

  1. improve the Expr support
  2. consider previous SWIG options discussed before
  3. merge this PR but keep the issue referencing it

@deadlocklogic deadlocklogic marked this pull request as ready for review January 4, 2024 00:08
@tritao
Copy link
Collaborator

tritao commented Jan 4, 2024

Nice debugging, that sounds like an annoying issue.

So should I merge this one right away, assuming we fix this new C++17 issue at a later stage?

@deadlocklogic
Copy link
Contributor Author

deadlocklogic commented Jan 4, 2024

The solution that will solve the default arguments forever is to:

  1. modify HandleDefaultParamValuesPass so it generates all possible overloads for a function with default arguments, no need to check for default arguments expressions.
  2. make HandleDefaultParamValuesPass a global pass and move it to the Driver
  3. when the Expr is stable, we can introduce an option which keeps trivial default values and avoid extra overload generation. But overall we should always prefer correctness over advanced features.

Then we can close this PR and the related open issue, and later improve the Expr API.
If you want, I can work on these fixes so we can close this PR and add tests too.

Edit:

Just to add an extra notice, CXXFunctionalCastExpr is currently treated as a opaque expression so in:
void defaultNonEmptyCtor(QGenericArgument arg = QGenericArgument(0));
QGenericArgument(0) in C++17 is treated as IntegerLiteralClass, therefore the related function passes the HandleDefaultParamValuesPass without any modification as it treats the IntegerLiteralClass as a valid expression for default arguments.

@deadlocklogic
Copy link
Contributor Author

deadlocklogic commented Jan 4, 2024

Again after a lot of digging, and been learning many new conventions along the way, fixing the whole problem with overload generation alone won't be that trivial as it will introduces new constraints which are hard to mitigate considering the current implementation.
Lets dissect the main obstacles and conclude what would be the best solution for the time being:
As I discussed before my obvious solution for default function arguments would be following what SWIG does: generating an overloaded function for every combination.
The only serious drawback is that I don't know how to generate the mangled name for these synthetized functions with different possible ABI. Another minor issue is that we will lose the default argument hint when using intellisense, but nothing too serious.
Because SWIG uses global C-style functions for all the CLI, it doesn't suffer from this issue.
So to keep it simple for the time being, we have 3 solutions each one needs some concessions: (! = strength, ? = weakness)

  1. migrate to a full Expr API and remove the obsolete part, and do the needed refactoring
    • best for the current implementation!
    • more corner cases to consider?
    • when typemaps API is more mature, marshalling problems might occur?
  2. use global c-style functions just for the overloaded methods
    • relatively easy!
    • needs adopting some conventions like naming etc...?
  3. use global c-style functions for all the CLI backend
    • following SWIG!
    • typemaps will be implemented way easier!
    • we can provide a separate CLI backend as option!
    • codebase far from being ready?

Unfortunately, our best bet will be on 1 currently.
Actually, we can create a milestone, for many things I've discussed previously and even in other tickets, just to keep them in mind before we lose note.

@tritao
Copy link
Collaborator

tritao commented Jan 29, 2024

Hey mate, how is progress, still working on this?

@deadlocklogic
Copy link
Contributor Author

@tritao, basically this PR is valid, but I am having hard time deciding how to fix C++17 requirements. Going forward the obsolete Expr API would need refactoring into the newer API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants