Run scraper on local machine to gather regulations.gov comments for the NILC #57

Open
dotj opened this issue Nov 18, 2018 · 2 comments

Comments

dotj (Contributor) commented Nov 18, 2018

Continuation of #48:

We have not yet been able to get a VM up to run the scraper, so we need your help running the scraper locally in order to gather an initial dataset that the NILC can look at.

The documentation for the scraper can be found here: https://github.com/Data4Democracy/immigration-connect/tree/master/public-charge/scraper

Ping me (@dotj) or @alejandrox1 here, or post in the #immigration-connect Slack channel, if you need help.

We've seen each page (50 comments) take about 4 minutes to scrape, and there are currently almost 10k comments (roughly 200 pages), so a full run will take about 13 hours. Of course, this depends on your internet speed and various other factors.
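As a rough check on that estimate, the arithmetic can be sketched in shell. The figures are the approximate ones quoted above (10,000 comments, 50 per page, 4 minutes per page), and the variable names are only illustrative:

```shell
# Approximate figures from the estimate above; adjust as the comment count grows.
comments=10000
per_page=50
minutes_per_page=4

# Round up so a partially filled last page still counts as a page.
pages=$(( (comments + per_page - 1) / per_page ))
total_minutes=$(( pages * minutes_per_page ))

echo "${pages} pages, about $(( total_minutes / 60 )) h $(( total_minutes % 60 )) min"
# prints: 200 pages, about 13 h 20 min
```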

Tasks

  • Set up the scraper locally
  • Let the scraper run and collect all the comments (~13 hours)

coreyar commented Mar 30, 2019

I started to take a look at this issue as a first contribution and have a couple of questions.

I was able to get the Docker container running but was unable to run python get_comments.py because the file wasn't being added to the container. I believe all the Python files and the database should be added to the container as well. Would it be preferable to add the Python files after installing the requirements, so that the requirements install step isn't rerun whenever the Python files change?

I also ran into an error running python get_comments.py. What is the development process? The Dockerfile uses ADD rather than a volume, so file changes are not synced to the container.


coreyar commented Mar 30, 2019

Ahh, I see where it creates the volume in the docker run command: -v $(CURDIR):/opt/app.

It seems the issue is that I don't have CURDIR set. I'm not too familiar with it, but I think it might be related to make. The other issue I had is related to the DISPLAY env var, which also isn't getting set properly.
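For anyone hitting the same thing: CURDIR is a variable that GNU make sets itself to the current working directory, so $(CURDIR) only expands inside a Makefile. Typed directly into a shell, $(CURDIR) is command substitution and tries to run a command named CURDIR. A minimal sketch of an equivalent command for a plain shell follows, assuming the command was copied out of the Makefile. The image name "scraper" and the ":0" fallback for DISPLAY are assumptions, not taken from the repository, so the sketch prints the command rather than running it:

```shell
# In a plain shell, use $PWD where the Makefile uses $(CURDIR).
app_dir="${PWD}"

# Assumption: fall back to :0 when DISPLAY is unset; the right value
# depends on how your X server is set up.
display="${DISPLAY:-:0}"

# "scraper" is a hypothetical image name; substitute the one you built.
cmd="docker run -v ${app_dir}:/opt/app -e DISPLAY=${display} scraper"

# Print the command for review instead of executing it.
echo "$cmd"
```

Running the corresponding make target instead, if the repository's Makefile provides one, would let make fill in CURDIR on its own.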
