Harvest dataservices #3029
Some comments along the way
Wow, thank you for this PR! I'm looking forward to seeing this live 👏
I haven't dived into all the weird DCAT cases, but let's add support for poorly described catalogs in follow-up PRs if needed!
Co-authored-by: maudetes <[email protected]>
I think we're good to go for a first iteration! 👏
I've added some minor suggestions as well :)
Fix datagouv/data.gouv.fr#1353
To harvest dataservices, we first need to harvest datasets, because dataservices reference datasets in the `serveDatasets` attribute. But right now, datasets are harvested asynchronously (by saving a `HarvestJob` and then queuing these jobs independently). This means we need to wait until all the dataset jobs are done before starting to harvest dataservices. Multiple options:

- keep the `HarvestJob` for debug-only purposes
- a `finalize` function that is called at the end of all the jobs. Not a big fan because it adds one more class and a lot of code
- a `HarvestJob` (either the same model as for the datasets or a new one) and some Celery magic to dispatch all the jobs with dependency chains. Not a big fan because it complexifies the architecture a lot
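The core ordering constraint behind all three options can be sketched without Celery. This is a hedged, dependency-free illustration, not udata's actual code: the function names (`harvest_dataset`, `harvest_dataservices`, `run_harvest`) and the data shapes are made up for the example. The point is only the barrier: every dataset job completes before the dataservice step runs, so `serveDatasets` references can be resolved.

```python
# Sketch of the "wait for all dataset jobs, then harvest dataservices"
# constraint. Names and payloads are illustrative, not udata's API.
from concurrent.futures import ThreadPoolExecutor


def harvest_dataset(dataset_id):
    # Stand-in for one asynchronous dataset HarvestJob.
    return {"id": dataset_id}


def harvest_dataservices(datasets):
    # Runs strictly after every dataset job has finished, so a
    # dataservice's serveDatasets references all resolve.
    known = {d["id"] for d in datasets}
    return [{"serves": sorted(known)}]


def run_harvest(dataset_ids):
    with ThreadPoolExecutor() as pool:
        # list() blocks until every dataset future completes --
        # this is the barrier a `finalize` callback or a Celery
        # dependency chain would provide in the real architecture.
        datasets = list(pool.map(harvest_dataset, dataset_ids))
    return harvest_dataservices(datasets)
```

In Celery terms, the same barrier is what a `chord` gives you (a group of parallel tasks plus a callback that fires once they are all done), which is roughly the "celery magic" the third option refers to.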