Document this module and make it easier for others to re-run #3
Comments
I'm on it. |
I believe this is Kevin's analysis based on this module: https://meta.wikimedia.org/wiki/Research:Wiki_Ed_student_editor_contributions_to_sciences_on_Wikipedia |
As for the format... I guess the best option would probably be to add a new markdown file with details on how to use it, along with inline comments for anything within the code that you think should be clarified. |
okay. Thank you |
[Help needed] The main issue I have been having for days now is that the ... I am planning to replicate Kevin's research using topic contribution data for the year 2020. This is how far the command has gotten so far: https://pastebin.com/SFVSXKwp |
Thanks for the update! Hmm... I suspect that Kevin may have done this from Toolforge, and even if he didn't, that's probably the best way around the problem you're facing. https://wikitech.wikimedia.org/wiki/Help:Cloud_Services_Introduction I suggest going through the process to get Toolforge access and trying to do it from there, since using a server within the same cloud environment should make the dump downloads much faster and more reliable. |
Thank you. I'll go through the guide, set it up and give it another try. |
I requested Toolforge access, and the Quickstart page (https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Quickstart) says that I have to wait a week for it to be granted. Are there any other related tasks that I can work on until then? |
Hopefully it will be less than a week, but here's another related analysis module you could look at: https://github.com/WikiEducationFoundation/academic_classification Like this one, it's from the work Kevin was doing several years ago. We'd love to be able to easily re-run similar analyses on more recent data, so documenting where the bottlenecks and problems are will be helpful. |
Okay. I'm checking it out now |
I have received Toolforge access, and it's taking me surprisingly long to understand how to use it. I apologize for my slow pace so far; I am doing my best to make a substantial contribution before the 30th. |
Thanks @tab1tha! Sorry I couldn't provide a more clear-cut way to dive in. |
I have a few questions. Do I need to create a Toolforge tool? |
Yes, creating a tool might be the best way to go. The 'toolforge' shell probably just means the terminal once you've logged on to Toolforge via SSH. If you can get to the PAWS dumps, I think that means you're in the Toolforge shell already. |
Ohh. This is helpful. Thank you |
[Update: help needed] I receive Error [13] which says that I do not have file permissions but when I check, it shows that I do have all the file permissions for that folder. Trying to use |
@tab1tha it looks like |
Using the absolute path /home/tambetabitha/demo_results yields this instead: https://pastebin.com/DhMSriMh |
That seems like progress, perhaps. I don't know why it would be trying to treat that gzip file as a directory, though. |
I have been trying to figure that out too. I'm looking at the code now. |
I think it fails because of the regex in `topics.cmdline._get_files_to_work_on`, which starts with `def _get_files_to_work_on(input_dir): raw_files = [join(input_dir, f) for f in listdir(input_dir) ...`. It is therefore necessary to unzip the file before passing it as a command line argument. Alternatively, we could adjust the regex code in the topics.cmdline module so that it accepts both zipped and unzipped files and, when the file is zipped, unzips it using gzip. |
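For reference, the "gzip file treated as a directory" symptom is what `listdir` produces when it is handed a file path. Below is a minimal sketch of one way to accept either a directory or a single file. The helper name `resolve_input_files` is hypothetical and is not part of the topics module; the real `_get_files_to_work_on` also applies a regex filter that is not shown in this thread, so this only illustrates the idea.

```python
from os import listdir
from os.path import isdir, isfile, join


def resolve_input_files(input_path):
    """Return a sorted list of file paths for a directory or a single file.

    Hypothetical helper for illustration; not the module's actual code.
    """
    if isdir(input_path):
        # Keep regular files only; subdirectories are skipped.
        candidates = (join(input_path, name) for name in listdir(input_path))
        return sorted(path for path in candidates if isfile(path))
    if isfile(input_path):
        # A single file (for example a .gz dump) was passed directly.
        return [input_path]
    raise FileNotFoundError(f"No such file or directory: {input_path}")
```

With a helper like this, passing the path of one gzip file would return a one-element list instead of failing inside `listdir`.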
In the meantime, considering that the commit of Demonstration.md is part of Pull request 5, I have changed the pull request name to a more appropriate one. Also, am I on track with respect to the content and format of Demonstration.md so far? Is there something else that you expected or would want me to add? |
To enable handling of .gz files, I have considered adding a try-except clause to the topics.cmdline._get_files_to_work_on function as such: Is this okay? What would you prefer? |
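The snippet from the comment above is not preserved in this thread. As a rough sketch of what a try-except approach could look like (an assumption, not the code that was actually proposed), the hypothetical function `read_text_maybe_gzip` below first tries to read a file as gzip and falls back to a plain read if that fails. It relies only on the standard-library `gzip` module.

```python
import gzip


def read_text_maybe_gzip(path, encoding="utf-8"):
    """Read a text file that may or may not be gzip-compressed.

    Hypothetical helper for illustration; not the module's actual code.
    """
    try:
        # gzip raises an OSError subclass when the file is not gzip data.
        with gzip.open(path, "rt", encoding=encoding) as handle:
            return handle.read()
    except OSError:
        # Fall back to treating the file as plain text.
        with open(path, "r", encoding=encoding) as handle:
            return handle.read()
```

Reading the whole file at once keeps the sketch short; for large dumps, iterating over the open handle line by line would likely be the safer choice.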
Anything that works is fine with me! I don't have much of a sense for what is the most Pythonic way to do things, so use your best judgment. |
Okay. I'm on it! |
It has been several years since this module was used to recreate Kevin's original analysis. Try to make it work, document any problems found, and add documentation and/or fixes to make it practical to do similar analyses on a regular basis.