
This Library #2

Open
MrWolvwxyz opened this issue Nov 6, 2013 · 1 comment

Comments

@MrWolvwxyz

Hey, this code looks perfect for a research project I'm working on. I downloaded it through Canopy and now I'm just trying to figure out how it works. Do you have any documentation, or a file I should start reading to understand it better?

@saffsd
Owner

saffsd commented Nov 7, 2013

I don't have any plans to develop this project further in the immediate future. That said, it is in a usable state, and I've used it myself fairly recently.

I'm not familiar with Canopy, but if you install it like a normal Python package, it will install a command-line tool, wikidump; wikidump -h provides some details on how to use it. When run, wikidump will generate a config file, wikidump.cfg, in the directory it was run in. This config file contains two paths you will need to amend: 'scratch', where the indexes can be stored, and 'xml_dumps', the path to a directory containing the XML dumps downloaded from Wikipedia. I've personally been using wp-download to download the dumps, so set xml_dumps to the directory that wp-download saves them to.

After downloading the relevant dumps, run wikidump index; thereafter you can use wikidump dataset to pull out a dataset. Each of the commands should have a bit of help text, for example wikidump dataset -h.

Let me know if you need help figuring out how to do anything specific, and I'll see if it can be done under the current implementation.
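Editing the two config paths is the step most likely to be scripted, so here is a minimal Python sketch of it. It assumes wikidump.cfg is an INI-style file containing lines of the form `scratch = ...` and `xml_dumps = ...` (the exact layout wikidump generates has not been checked). `set_wikidump_paths` and `run_wikidump` are hypothetical helpers for illustration, not part of the wikidump package; only the `wikidump index` and `wikidump dataset -h` commands come from the comment above.

```python
import re
import subprocess
from pathlib import Path

# ASSUMPTION: wikidump.cfg holds INI-style "key = value" or "key: value" lines.
# The real layout generated by wikidump has not been verified here.
_PATH_KEY = re.compile(r"^(?P<key>scratch|xml_dumps)\s*(?P<sep>[=:])")


def set_wikidump_paths(cfg_path, scratch, xml_dumps):
    """Hypothetical helper: rewrite the 'scratch' and 'xml_dumps' entries.

    Edits the file line by line so comments and other settings are kept.
    Raises KeyError (without touching the file) if either key is absent.
    """
    wanted = {"scratch": str(scratch), "xml_dumps": str(xml_dumps)}
    path = Path(cfg_path)
    seen = set()
    out = []
    for line in path.read_text(encoding="utf-8").splitlines():
        match = _PATH_KEY.match(line)
        if match:
            key = match["key"]
            # Replace the whole value, keeping the original separator.
            line = f"{key} {match['sep']} {wanted[key]}"
            seen.add(key)
        out.append(line)
    missing = sorted(set(wanted) - seen)
    if missing:
        raise KeyError(f"{', '.join(missing)} not found in {path}")
    path.write_text("\n".join(out) + "\n", encoding="utf-8")


def run_wikidump(*args, cwd=None):
    """Hypothetical wrapper: run the wikidump CLI (must be on PATH).

    check=True makes a failing command raise CalledProcessError.
    """
    subprocess.run(["wikidump", *args], cwd=cwd, check=True)
```

For example, once wikidump has generated wikidump.cfg in a working directory, one could call `set_wikidump_paths("wikidump.cfg", "/data/scratch", "/data/wp-dumps")` (placeholder paths), then `run_wikidump("index")`, and `run_wikidump("dataset", "-h")` to see the dataset options. The file is edited line by line rather than through `configparser`'s `write()`, because `configparser` drops comments when it reads a file.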
