Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Majestic" PGF runtime #130

Open
johnjcamilleri opened this issue Aug 9, 2021 · 91 comments
Open

"Majestic" PGF runtime #130

johnjcamilleri opened this issue Aug 9, 2021 · 91 comments

Comments

@johnjcamilleri
Copy link
Member

@krangelov has mentioned at the summer school his ideas on completely reimagining the PGF format as a database. @anka-213 has expressed his interest in contributing, and I (@johnjcamilleri) also think I could be of use here. I'm creating this issue to be the home of that discussion.

@krangelov
Copy link
Member

krangelov commented Aug 10, 2021 via email

@johnjcamilleri
Copy link
Member Author

Wow, this seems a lot more advanced than you let on! I will take some time to better understand how it works.

@odanoburu
Copy link
Contributor

Hi @krangelov, pretty cool work! I remember hearing something about using SQLite for this work, which would give you transactions for free and also a platform-independent format (I'm assuming it would take the role of the .ngf files). Have you decided against it? I'm just curious and would love to contribute (I don't think I can, but I'll follow the discussion and if it turns out I can I'll say something!)

@krangelov
Copy link
Member

krangelov commented Aug 10, 2021 via email

@aarneranta
Copy link
Contributor

aarneranta commented Aug 10, 2021 via email

@odanoburu
Copy link
Contributor

@krangelov I see, thanks for the explanation! and daison looks interesting indeed!

@johnjcamilleri
Copy link
Member Author

I figure the best way I can start contributing to this effort is to be your first user! Which probably means I will have lots of annoying setup questions. To start with, I tried to build the runtime using the instructions from the old C runtime, which failed. I set up a CI workflow here which documents what I tried and also contains the current error I get. I'm happy to contribute to other kinds of documentation (as library docs and how-to guides) as I learn more.

Going forward, we should add whatever testsuite you have (mentioned earlier) to this workflow.

@johnjcamilleri
Copy link
Member Author

To be explicit, I am trying:

autoreconf -i
./configure
make

and getting:

make[1]: Entering directory '/home/runner/work/gf-core/gf-core/src/runtime/c'
  CXX      pgf/db.lo
  CXX      pgf/text.lo
  CXX      pgf/pgf.lo
  CXX      pgf/reader.lo
make[1]: *** No rule to make target 'pgf/expr.cxx', needed by 'pgf/expr.lo'.  Stop.
make[1]: Leaving directory '/home/runner/work/gf-core/gf-core/src/runtime/c'

What am I doing wrong?

@krangelov
Copy link
Member

krangelov commented Aug 13, 2021 via email

@johnjcamilleri
Copy link
Member Author

Thanks Krasimir, the CI workflow now builds everything and runs the tests successfully.
I will keep poking around to understand things better. Probably the next thing I can work on is adding more tests.

@johnjcamilleri
Copy link
Member Author

most of the code in the Python binding is commented out. It compiles, but
the only thing that you can do there is to load a grammar. I only needed
that in order to test that it is possible to load the runtime dynamically.

As far as I can see, nothing is commented out in src/runtime/python/pypgf.c?

I added a simple test to load the basic.pgf file from Python:

import pgf
import sys

sys.stdout.write("loading...")
sys.stdout.flush();
gr = pgf.readPGF("../haskell/tests/basic.pgf")
sys.stdout.write("\n")

but I get a segmentation fault. The CI output is here, and I get the same locally.

@krangelov
Copy link
Member

krangelov commented Aug 24, 2021 via email

@johnjcamilleri
Copy link
Member Author

Ah, thanks for the clarification!

@johnjcamilleri
Copy link
Member Author

Hi Krasimir,
I'm not experienced in C++, but I think this work is so important to the future of GF that it is worth me taking time to learn C++ itself as well as the internal details of how this new runtime works. I might be slow initially, but perhaps you can think of some simpler tasks to give me now, to help me get up to speed?

@krangelov
Copy link
Member

krangelov commented Aug 26, 2021 via email

@krangelov
Copy link
Member

krangelov commented Aug 27, 2021 via email

@johnjcamilleri
Copy link
Member Author

Great, I will start working on getting these implemented.

@johnjcamilleri johnjcamilleri pinned this issue Aug 30, 2021
@johnjcamilleri
Copy link
Member Author

johnjcamilleri commented Aug 30, 2021

Checklist for Python bindings:

Reading from file

  • pgf_readNGF
  • pgf_bootNGF

Restore from old code

  • PGF_getAbstractName
  • PGF_getCategories
  • PGF_getFunctions
  • PGF_functionsByCat

Require marshalling of abstract expressions

  • PGF_functionType
  • PGF_getStartCat
  • Expr_repr
  • pgf_readExpr
  • Type_repr
  • pgf_readType

@johnjcamilleri
Copy link
Member Author

So I have managed to implement the simpler functions for getting categories/functions by restoring old code, but one thing I can't figure out is how the error handing is supposed to work. Namely, the GuExn* err parameter seems to have disappeared from functions such as pgf_iter_categories.

For now the old error-handling code is commented out like so:

static PyObject*
PGF_getCategories(PGFObject *self, void *closure)
{
    PyObject* categories = PyList_New(0);
    if (categories == NULL)
        return NULL;

    // GuPool* tmp_pool = gu_local_pool();
    //
    // Create an exception frame that catches all errors.
    // GuExn* err = gu_new_exn(tmp_pool);

    PyPGFClosure clo = { { pgf_collect_cats }, self, categories };
    pgf_iter_categories(self->pgf, &clo.fn);

    // if (!gu_ok(err)) {
    //     Py_DECREF(categories);
    //     gu_pool_free(tmp_pool);
    //     return NULL;
    // }
    //
    // gu_pool_free(tmp_pool);
    return categories;
}

Any pointers here?

@krangelov
Copy link
Member

krangelov commented Aug 31, 2021 via email

@johnjcamilleri
Copy link
Member Author

Ok, what about the cases below? It seems possible that PyString_FromString and PyList_Append can fail. Do you think it's safe to ignore those too?

static void
pgf_collect_funs(PgfItor* fn, PgfText* key, void* value)
{
    PgfText* name = key;
    PyPGFClosure* clo = (PyPGFClosure*) fn;

    PyObject* py_name = NULL;

    py_name = PyString_FromString(name->text);
    if (py_name == NULL) {
        // gu_raise(err, PgfExn);
        goto end;
    }

    if (PyList_Append((PyObject*) clo->collection, py_name) != 0) {
        // gu_raise(err, PgfExn);
        goto end;
    }

end:
    Py_XDECREF(py_name);
}

@krangelov
Copy link
Member

krangelov commented Aug 31, 2021 via email

@johnjcamilleri
Copy link
Member Author

Thanks Krasimir. On the subject of strings, does this look right to you?:

static PyObject*
PGF_functionsByCat(PGFObject* self, PyObject *args)
{
    const char* c;
    if (!PyArg_ParseTuple(args, "s", &c))
        return NULL;

    const size_t s = strlen(c);

    PgfText* catname = (PgfText*) alloca(sizeof(PgfText)+s);
    strcpy(catname->text, c);
    catname->size = s;

    ...

I need to read the parameter from args into a PgfText (previously PgfCId). But PyArg_ParseTuple wants to put it into a char*, so I have to do this allocation + copy. I'm guessing strlen also works on null-terminated strings, but maybe here this is the correct solution since the string in the arguments not coming from the runtime?

@krangelov
Copy link
Member

krangelov commented Aug 31, 2021 via email

@johnjcamilleri
Copy link
Member Author

I've starting working on the marshalling in the Python bindings (see 3ecb937). I think it's starting to make sense, but I'd like to run a few things by you.

  • What should TypeObject contain (see expr.h)? Currently I have just a PgfText* in there for the type, but it needs more structure for the hypotheses etc. But am I correct in that we do not need a PgfType* in there anymore?
  • Next, when building a type object to return in the unmarshaller (see dtyp in marshaller.c), I am allocating a TypeObject and then returning it by casting it to a PgfType. Is that right?
  • Then, when using something which is created by the unmarshaller (see pgf_readType in pypgf.c), I just cast it back to a TypeObject* again. I think this aligns with what you wrote in the documentation, but would appreciate if you you confirm this.

@krangelov
Copy link
Member

krangelov commented Sep 5, 2021 via email

@johnjcamilleri
Copy link
Member Author

In TypeObject, I would put something like:

typedef struct {
    PyObject_HEAD
    PyList* hypos;
    PyString *cat;
    PyList* exprs;
} TypeObject;

In this way all fields use native Python types. The list hypos should
contain tuples with binding type, variable and the type of the variable.
The list exprs should contain objects of type ExprObject.

Understood!
What about the member PyObject* master, is this still relevant? I only half understood what its purpose is/was.

@johnjcamilleri
Copy link
Member Author

In the process I've come across another portability problem, namely the initialisation of flexible array members, which is apparently a GCC extension:

pgf/pgf.cxx:34:34: error: initialization of flexible array member is not allowed
PgfText master = {size: 6, text: {'m','a','s','t','e','r',0}};

References:

This seems to be used in two places, for the definitions of master in pgf.cxx and wildcard in expr.cxx. I've tried a few things but nothing fits neatly in the current code. My ideas:

  • There is some library-wide init function which can allocate the memory and write the values for these PgfText "constants".
  • The constants are not stored as a PgfText's but as two separate values:
    size_t master_size = 6;
    char master_text[] = {'m','a','s','t','e','r',0};
    
    and you have to do a little more work when comparing stuff with them.

@johnjcamilleri
Copy link
Member Author

Another issue related to the use of unsigned ints and EOF:

uint32_t PgfExprParser::getc()
{
    uint32_t ch = pgf_utf8_decode((const uint8_t **) &pos);
	if (pos - ((const char*) &inp->text) > inp->size) {
		ch = EOF;
	}
    return ch;
}
pgf/expr.cxx:368:7: error: case value evaluates to -1, which cannot be narrowed to type 'uint32_t' (aka 'unsigned int') [-Wc++11-narrowing]
        case EOF:
             ^

@johnjcamilleri
Copy link
Member Author

Also:

ld: library not found for -lrt

Removing it from libpgf_la_LIBADD in Makefile.am seems to allow everything to compile.

@johnjcamilleri
Copy link
Member Author

I've pushed my current changes (WIP) to the branch majestic-macos, relevant diff is here: 8c721e0...majestic-macos

@krangelov
Copy link
Member

krangelov commented Oct 12, 2021 via email

@krangelov
Copy link
Member

krangelov commented Oct 12, 2021 via email

@krangelov
Copy link
Member

krangelov commented Oct 12, 2021 via email

@krangelov
Copy link
Member

krangelov commented Oct 12, 2021 via email

@johnjcamilleri
Copy link
Member Author

librt is needed ipc.cxx but that module is not used yet. That is why removing it worked.

From what I understand, whatever is provided by librt on Linux is included in standard libraries on macOS, so it's just a case of the flag itself not being needed on macOS.

https://stackoverflow.com/questions/34379290/library-rt-not-found-on-mac-while-configuring-sfft-sparse-fast-fourier-transfo

@johnjcamilleri
Copy link
Member Author

Well, compiling now works on macOS (yay!), but readPGF immediately gives me a segmentation fault (boo). I'll see what I can find out by debugging.

@johnjcamilleri
Copy link
Member Author

I compiled and run a minimal example. Putting all details here for reference.

Code

#include "pgf/pgf.h"

int main () {
  PgfRevision rev;
  PgfExn err;
  PgfDB *obj = pgf_read_pgf("/Users/john/repositories/GF/gf-core-majestic/src/runtime/haskell/tests/basic.pgf", &rev, &err);
}

Command

clang -lpgf -L/usr/local/lib main.c
lldb -o run ./a.out

Output

Process 81037 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10016d058)
    frame #0: 0x000000010012c7f2 libpgf.0.dylib`PgfDB::~PgfDB(this=0x0000000100404100) at db.cxx:368:17 [opt]
   365 	PgfDB::~PgfDB() {
   366 	    if (ms != NULL) {
   367 	        size_t size =
-> 368 	            ms->top + chunksize(ptr(ms,ms->top)) + sizeof(size_t);
   369 	
   370 	        munmap(ms,size);
   371 	    }
Target 0: (a.out) stopped.

Backtrace

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10016d058)
  * frame #0: 0x000000010012c7f2 libpgf.0.dylib`PgfDB::~PgfDB(this=0x0000000100304100) at db.cxx:368:17 [opt]
    frame #1: 0x000000010012f971 libpgf.0.dylib`::pgf_read_pgf(fpath=<unavailable>, revision=<unavailable>, err=0x00007ffeefbff908) at pgf.cxx:70:9 [opt]
    frame #2: 0x0000000100003f2c a.out`main + 28
    frame #3: 0x00007fff2046bf3d libdyld.dylib`start + 1
    frame #4: 0x00007fff2046bf3d libdyld.dylib`start + 1

Since this doesn't happen in Linux (inside Docker), I first suspected that the culprit is this workaround for mremap in db.cxx line 887:

#ifndef MREMAP_MAYMOVE
            if (munmap(ms, old_size) == -1)
                throw pgf_systemerror(errno);
            malloc_state* new_ms =
                (malloc_state*) mmap(0, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
#else
            malloc_state* new_ms =
                (malloc_state*) mremap(ms, old_size, new_size, MREMAP_MAYMOVE);
#endif

However I a set breakpoint there which did not get triggered, so that can't be the problem?
The backtrace mentions pgf.cxx:70, which is this:

69    if (db != NULL)
70        delete db;

@johnjcamilleri
Copy link
Member Author

Slightly more investigation reveals that the read itself isn't working. If I comment out that destructor code, pgf_read_pgf gives me a PGF_EXN_SYSTEM_ERROR with code 9 (Bad file number).

@johnjcamilleri
Copy link
Member Author

Also, loading from NGF with pgf_read_ngf seems to work perfectly well.

@krangelov
Copy link
Member

krangelov commented Oct 13, 2021 via email

@krangelov
Copy link
Member

krangelov commented Oct 13, 2021 via email

@johnjcamilleri
Copy link
Member Author

I've stepped through in the debugger but I cannot see anything going wrong with the call to mmap. The check with MAP_FAILED immediately afterwards is false.
That page you linked is outdated, plus seems to be for iOS not macOS. When I run man mmap on my system it says:

     MAP_ANONYMOUS     Synonym for MAP_ANON.

I couldn't find the documentation online, so here's a gist of it: man mmap

It seems to be this call in PgfReader::read_pgf which is causing the crash:

read_abstract(ref<PgfAbstr>::from_ptr(&pgf->abstract));

I know that read_absract is never reached. When I try to print pgf->abstract in the debugger I get the message "Couldn't lookup symbols" (maybe I need to recompile the runtime with/out some flags).

@krangelov
Copy link
Member

krangelov commented Oct 14, 2021 via email

@johnjcamilleri
Copy link
Member Author

johnjcamilleri commented Oct 14, 2021

Thanks! Either I was doing something wrong before, or turning off optimisation has changed things a little, such that now it seems that the call to

    abstract->funs = read_namespace<PgfAbsFun>(&PgfReader::read_absfun);

in PgfReader::read_abstract is the source. At some point it seems to "crash" and then the destructor of PgfDB suddenly gets called:

Process 99205 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100140a1c libpgf.0.dylib`PgfReader::read_absfun(this=0x00007ffeefbff888) at reader.cxx:382:9
   379 	ref<PgfAbsFun> PgfReader::read_absfun()
   380 	{
   381 	    ref<PgfAbsFun> absfun =
-> 382 	        read_name<PgfAbsFun>(&PgfAbsFun::name);
   383 	    absfun->ref_count = 1;
   384 	    ref<PgfExprFun> efun =
   385 	        ref<PgfExprFun>::from_ptr((PgfExprFun*) &absfun->name);
Target 0: (a.out) stopped.
(lldb) n
Process 99205 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x100189058)
    frame #0: 0x000000010012a7f7 libpgf.0.dylib`PgfDB::~PgfDB(this=0x00000001003044d0) at db.cxx:369:17
   366 	PgfDB::~PgfDB() {
   367 	    if (ms != NULL) {
   368 	        size_t size =
-> 369 	            ms->top + chunksize(ptr(ms,ms->top)) + sizeof(size_t);
   370 	
   371 	        munmap(ms,size);
   372 	    }
Target 0: (a.out) stopped.

No, I know nothing about emulating macOS in Ubuntu. A quick search seems to indicate that it's actively discouraged by Apple. You could test things in a GitHub Actions workflow (which I will set up for macOS anyway) although testing by pushing & waiting is hardly optimal either. Perhaps we should schedule a meeting and debug together.

@krangelov
Copy link
Member

krangelov commented Oct 14, 2021 via email

@johnjcamilleri
Copy link
Member Author

I realized what is probably happening if indeed you actually go to the munmap/mmap part of the code.

When we have a file (i.e. fd > 0), then munmap/mmap works since the data is stored in the file. On the other hand when you have fid < 0 then munmap is just a more efficient version of free, and mmap becomes equivalent to malloc. In that case, of course, after free+malloc we will lose all the data.

On Mac the case when fd < 0 should be treated differently. In PgfDB::PgfDB the initial block should be allocated with malloc. In PgfDB::malloc_internal, instead of munmap+mmap we should use realloc().

I understand what you're suggesting and tried to make the changes in the majestic-macos branch, however there's a compilation error I can't understand:

pgf/db.cxx:331:28: error: no matching function for call to 'malloc'
      ms = (malloc_state*) malloc(file_size); // doesn't compile
                           ^~~~~~

More generally, what is the reason that this needs to be different for macOS?

@krangelov
Copy link
Member

krangelov commented Oct 15, 2021 via email

@johnjcamilleri
Copy link
Member Author

Right, thanks! This seems to have worked. I'm now getting a lot further and pass all the Haskell "basic" tests. Now I'm getting a new error:

transactions: branchPGF: invalid argument (Invalid argument)

but I will need to debug this further. Could it be related to the call to mmap in ipc.cxx?

@krangelov
Copy link
Member

krangelov commented Oct 15, 2021 via email

@johnjcamilleri
Copy link
Member Author

If I want to use the runtime from C, surely I don't need to implement my own un/marshaller? I see there are implementations of all the marshalling functions in expr.cxx, but these are not exposed in any way.
Maybe if one passes NULL when an unmarshaller is needed, it can/should use these internal implementations?

@johnjcamilleri
Copy link
Member Author

The cause behind the 'Invalid argument' error mentioned above seems to be this line in PgfDB::sync:

int res = msync((void *) ms, size, MS_SYNC | MS_INVALIDATE);

Which returns -1. Any idea how this should be changed for macOS (when ms is malloc'ed)?

@johnjcamilleri
Copy link
Member Author

For reference, macOS man page for msync: https://gist.github.com/johnjcamilleri/1cdc73d25420302035fb40df1aac2d75#file-msync

@johnjcamilleri
Copy link
Member Author

Any idea how this should be changed for macOS (when ms is malloc'ed)?

I tested not calling mysnc at all in this case, and seems to work on all platforms (see CI here). But I'm not sure if it's the right solution.

@krangelov
Copy link
Member

krangelov commented Oct 18, 2021 via email

@krangelov
Copy link
Member

krangelov commented Oct 18, 2021 via email

@johnjcamilleri
Copy link
Member Author

Ok, thanks for the clarifications!

I haven't anyone yet who would like to use the runtime from C :-)

Haha fair enough. I did this myself for testing purposes, in order to exclude the bindings themselves as a source of problems, but actually I managed to avoid using any functions that required un/marshalling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants