Montage should allow to upload multiple categories at once #236

Open
geertivp opened this issue Oct 3, 2024 · 7 comments

Comments

@geertivp

geertivp commented Oct 3, 2024

The ISA Tool allows loading multiple categories at once. This is extremely convenient when related campaigns need to be processed as one unit. In the same way, Montage should allow multiple categories to be loaded into one Montage campaign. I created T299167 in January 2022, but there is still no solution. Please verify/prioritise...
In addition, the File URL upload and the File list upload are broken (problems with UTF-8 characters and with " and ' characters in filenames).

@mahmoud
Member

mahmoud commented Oct 4, 2024

Hey Geert, I believe the unicode issues have mostly been fixed with the Python 3 upgrade earlier this year. I've been pushing changes for any stragglers as they come up, so please try again.

The idea for multiple categories is an interesting one. Are the categories all part of one parent category? Or are they really disjoint? Examples appreciated.

@geertivp
Author

geertivp commented Oct 4, 2024

Thanks, Mahmoud.

I had tried to use the CSV File URL upload, but it failed with the following error:
Internal server error: <ExceptionInfo [TypeError: startswith first arg must be bytes or a tuple of bytes, not str] (41 frames, last=Callpoint('load_name_list', 129, 'montage.loaders', './montage/loaders.py', 46, " if filename.startswith('File:'):"))>
I believe that one problem might be related to ' and " in the filenames. Could a tab-separated file be a solution (using HT as the separator instead of quoted, comma-separated fields)?
Also the File list solution returned a similar error...
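
The TypeError quoted above is what Python 3 raises when `startswith` is called on a `bytes` object with a `str` argument. Below is a minimal sketch of the failure and one possible fix, assuming the loader receives the raw file list as UTF-8 bytes. The helper name `normalize_names` and the second filename are hypothetical and are not taken from `montage/loaders.py`.

```python
def normalize_names(raw: bytes) -> list[str]:
    """Decode a raw file list and strip an optional 'File:' prefix."""
    names = []
    # Decode once, up front, so every later string operation works on str.
    for line in raw.decode("utf-8").splitlines():
        name = line.strip()
        if not name:
            continue
        if name.startswith("File:"):
            name = name[len("File:"):]
        names.append(name)
    return names


# Sample input: one filename from this thread, one made-up name with quotes.
raw = 'File:Zandbergen brug over Dender.jpg\nFile:Café "t Hoekske".jpg\n'.encode("utf-8")

error = None
try:
    raw.startswith("File:")  # bytes method called with a str argument
except TypeError as exc:
    error = str(exc)

names = normalize_names(raw)
print(error)
print(names)
```

In this sketch the quotes inside the filename play no role in the error; the failure comes from mixing `bytes` and `str`.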

Please find the list of related but disjoint campaigns that we wanted to judge as one unit:

Please note that, as a workaround, I have added all the images to the category Images from Wiki Loves Heritage Belgium in 2024 (2382 images in total).

This is not the ideal solution, but it is the only one I have available, and it is what I have done since 2022. I would really like the multiple-category upload to be (easily) implemented. This would relieve coordinators of having to use the complex CSV URL method.

@geertivp
Author

geertivp commented Oct 4, 2024

I have written a Python script to easily “merge” (=add) categories to a list of Wikimedia Commons files (based on the Category parameter). It is written specifically for Wikimedia Commons, but it works for any MediaWiki project.
pwb add_wikitext commons commons Wiki_Loves_Denderland_2024
stdin contains the wiki text to append to each page. The script checks for duplicate content.
This way Montage can load one single (merged) category…
Please find the script here.
Example: File:Zandbergen brug over Dender.jpg
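
The append-with-duplicate-check behaviour described above could look roughly like the sketch below. This is not the linked script: the function name `append_wikitext` and the category name are placeholders, and the part that reads and saves pages through Pywikibot is left out.

```python
def append_wikitext(page_text: str, addition: str) -> tuple[str, bool]:
    """Append `addition` to `page_text` unless it is already present.

    Returns the (possibly unchanged) text and a flag saying whether
    the page would need to be saved.
    """
    addition = addition.strip()
    if not addition or addition in page_text:
        return page_text, False  # nothing to add, or duplicate content
    return page_text.rstrip("\n") + "\n" + addition + "\n", True


# Placeholder wikitext; a real run would read the addition from stdin.
text = "== Summary ==\nA bridge over the Dender.\n"
extra = "[[Category:Example merged campaign]]"

once, changed_once = append_wikitext(text, extra)
twice, changed_twice = append_wikitext(once, extra)
print(changed_once, changed_twice)
```

Running the function a second time on its own output leaves the text unchanged, which is the property that makes a mass update safe to repeat.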

@geertivp
Author

geertivp commented Oct 4, 2024

To make the proposed functionality clear:

  • If there are subcategories, a simple "recurse depth" value could be added (default 0).
  • If the categories are disjoint, a list of categories should be specified.

Please look at the ISA Tool as an example of how to implement the GUI/backend part.
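
The two options above (a list of categories plus a recursion depth) can be sketched as a single breadth-first traversal. This is an illustration only, not Montage or ISA Tool code. `fetch_members` is a hypothetical callable that returns the files and subcategories of one category; in practice it might wrap the MediaWiki API's `list=categorymembers` query. The category and file names are placeholders.

```python
from typing import Callable, Iterable

# Hypothetical fetcher type: category name -> (files, subcategories).
Fetch = Callable[[str], tuple[list[str], list[str]]]


def collect_files(categories: Iterable[str], fetch_members: Fetch,
                  depth: int = 0) -> list[str]:
    """Merge the files of several categories, descending `depth` levels."""
    seen_cats: set[str] = set()
    seen_files: set[str] = set()
    ordered: list[str] = []
    frontier = list(categories)
    for _level in range(depth + 1):
        next_frontier: list[str] = []
        for cat in frontier:
            if cat in seen_cats:  # guards against category loops
                continue
            seen_cats.add(cat)
            files, subcats = fetch_members(cat)
            for name in files:
                if name not in seen_files:  # a file may sit in several categories
                    seen_files.add(name)
                    ordered.append(name)
            next_frontier.extend(subcats)
        frontier = next_frontier
    return ordered


# In-memory stand-in for the API, with a shared file and a category loop.
FAKE_TREE = {
    "A": (["a1.jpg", "shared.jpg"], ["A/sub"]),
    "B": (["b1.jpg", "shared.jpg"], []),
    "A/sub": (["deep.jpg"], ["A"]),
}

flat = collect_files(["A", "B"], FAKE_TREE.__getitem__)
nested = collect_files(["A", "B"], FAKE_TREE.__getitem__, depth=1)
print(flat)
print(nested)
```

With `depth=0` only the listed categories are read, which matches the proposed default. Deduplicating by filename matters because related campaigns may share images.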

@mahmoud
Member

mahmoud commented Oct 4, 2024

> I had tried to use the CSV File URL upload, but it failed with the following error: Internal server error: <ExceptionInfo [TypeError: startswith first arg must be bytes or a tuple of bytes, not str] (41 frames, last=Callpoint('load_name_list', 129, 'montage.loaders', './montage/loaders.py', 46, " if filename.startswith('File:'):"))> I believe that one problem might be related by ' and " in the filenames. Could a tabbed CSV file be a solution (containing HT separator, instead of quoted comma separator) ? Also the File list solution returned a similar error...

Right, I figured this was due to the production Montage running a slightly outdated version of the code. I deployed the new code, so you can try the CSV method again. But actually now I see that the issue was that the File List URL provided was at a https://wikimedia.be/public/wlh/ URL, while the Montage code is currently expecting the URL to be a gist. The same file uploaded to https://gist.github.com actually worked as expected when I tried it just now.

This is technically in the field description of the File List URL field (see below), but I agree that reliance on gist alone is suboptimal. I think one reason for this was availability and versioning maybe? We can look at changing it in future, or at least adding validation to the frontend.

[Screenshot: field description of the File List URL field]
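
The validation mentioned above could be as small as a host check before any download is attempted. The sketch below is written in Python for illustration and is not Montage's actual code. The function name `check_file_list_url` and the allow-list contents are assumptions based only on the statement that a gist URL is expected.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumption: only gist.github.com is accepted, as described in this thread.
ALLOWED_HOSTS = {"gist.github.com"}


def check_file_list_url(url: str) -> Optional[str]:
    """Return an error message for the user, or None if the URL is accepted."""
    parsed = urlparse(url.strip())
    if parsed.scheme != "https":
        return "The URL must start with https://"
    if parsed.hostname not in ALLOWED_HOSTS:
        return f"Expected a gist URL, but got host {parsed.hostname!r}"
    return None


rejected = check_file_list_url("https://wikimedia.be/public/wlh/")
accepted = check_file_list_url("https://gist.github.com/")
print(rejected)
print(accepted)
```

A check like this would turn the internal server error into a readable message. Widening `ALLOWED_HOSTS` later is a one-line change if other web servers are to be accepted.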

Thanks for sharing the details on the category list proposition. Just to be sure, you don't feel like having a unified category is useful for archival reasons? Like, if anyone ever wanted to browse the entries on Commons, seems like a single Category following a naming convention might be good vs having to try and find the list of the categories that were added. Might be worth discussing in a separate issue as well.

@geertivp
Author

geertivp commented Oct 5, 2024

I have one more question: How do you encode the filename when there is already a double quote in the filename?
Sometimes it happens that both " and ' are within the original filename.
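
For reference, the common CSV convention (RFC 4180) writes a double quote inside a quoted field as two double quotes, and a single quote needs no escaping. Whether Montage's loader accepts this form is not confirmed in this thread. The sketch below only shows how Python's standard `csv` module round-trips such a name, with a comma and with a tab as the separator. The filename and the second column are made up.

```python
import csv
import io

# Made-up filename containing both " and '.
name = "Huis \"De Zwaan\" aan 't plein.jpg"


def to_csv(rows, delimiter=","):
    """Serialise rows with the csv module's default (minimal) quoting."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def from_csv(text, delimiter=","):
    """Parse text back into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


comma_text = to_csv([[name, "1"]])                  # embedded " is doubled
tab_text = to_csv([[name, "1"]], delimiter="\t")    # same rows, tab-separated

print(comma_text, end="")
print(from_csv(comma_text) == [[name, "1"]])
print(from_csv(tab_text, delimiter="\t") == [[name, "1"]])
```

With the default dialect, the writer still quotes a field that contains a double quote even when the separator is a tab, so switching to tabs does not by itself remove the need for a quoting rule.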

I agree that the File URL interface should not be restricted to https://gist.github.com/; any web server should be accepted...

Using one single category is of course the preferred solution, but we had 4 disjoint related campaigns. To load everything into one category, I had to perform a mass update of the categories of almost 2000 images... which is not transparent to the owners of the images, nor to the general public, who see a "confusing" additional category...

Please take into account the possibility of loading multiple categories at once as described above.

Please also note that the File List (copy/paste) interface throws a similar error due to (double) quotes in some of the filenames.

Thank you very much for your answers and analysis!

@CiellB

CiellB commented Oct 10, 2024

Wiki For Arabic Minorities would also like to have a range of subcategories included in their Montage campaign. I think an important point when adding an option like this is that the campaign coordinator can set how deep into the subcategories Montage should look (an idea taken from the glamorgan-tool).
