- Set up MongoDB and get it running
- Set up a virtual environment and activate it
- Install the required packages with
pip install -r requirements.txt
- Install the application package with
pip install -e .
- Add an environment variable FLASK_APP pointing to beehivedata.beehivedata
- Initialise the database by running
flask init_db
- Fetch charity data by running
flask fetch_charities
and then
flask import_charities
- Fetch any grants data using
flask fetch_all
- Run the server with
flask run
(Installed through requirements.txt)
- flask
- flask-sass - no pypi package
- flask_login
- flask_wtf
- gunicorn - for deployed version
- numpy (for windows use numpy-1.12.1+mkl-cp35-cp35m-win32.whl)
- pymongo
- pytest
- python-dateutil
- requests
- scipy (for windows use scipy-0.19.0-cp35-cp35m-win32.whl)
- sklearn
- slugify
- flatten-tool - no pypi package
- titlecase
- mechanicalsoup
Run flask run from the command line.
For development/debug mode, set the FLASK_DEBUG environment variable to 1.
Charity data can be downloaded using the flask fetch_charities command, and then imported into MongoDB by running flask import_charities. The data comes from:
When fetching data on Scottish charities you'll need to agree to the terms and conditions.
This step can either be run in one go using flask fetch_all, or in the individual steps shown below. You can also run all the update procedures without fetching new data by running flask update_all.
This command fetches the data registry and saves it to a MongoDB database. It then goes through the data registry, downloads each file, converts it to JSON (if needed) and saves all the grants in the database.
Run the command using:
$ flask fetch_data
The command can also be run to just fetch the files that have been updated since a given date:
$ flask fetch_data --files-since 2017-01-01
You can also restrict the download to particular funders, using a comma-separated list of funder prefixes, slugs or names. For example:
$ flask fetch_data --funders 360G-ocf
The command line options for this are:
- --files-since: fetch only files updated after this date (in YYYY-MM-DD format, default all files)
- --funders: only fetch these funders (list of funder prefixes separated by comma, default all funders)
- --registry: where to find the data registry (default http://data.threesixtygiving.org/data.json)
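As a rough illustration of how the --files-since and --funders options narrow down the registry, the sketch below filters a list of registry entries held in memory. The entry layout (a modified timestamp plus a publisher with prefix and name), the sample funder names and the function name select_files are all assumptions made for this example, not the application's actual code. It also treats the date as inclusive, which may differ from the real command.

```python
from datetime import date

# Made-up sample entries; only the 360G-ocf prefix appears in this README.
SAMPLE_REGISTRY = [
    {"publisher": {"prefix": "360G-ocf", "name": "Example Funder A"},
     "modified": "2017-03-01T10:00:00"},
    {"publisher": {"prefix": "360G-xyz", "name": "Example Funder B"},
     "modified": "2016-06-15T09:30:00"},
]


def select_files(registry, files_since=None, funders=None):
    """Return the registry entries that match the given filters.

    registry    -- list of dicts, one per published file (assumed layout)
    files_since -- "YYYY-MM-DD" string; keep files modified on or after it
    funders     -- comma-separated prefixes or names; keep only these funders
    """
    since = date.fromisoformat(files_since) if files_since else None
    wanted = None
    if funders:
        wanted = {f.strip().lower() for f in funders.split(",") if f.strip()}

    selected = []
    for entry in registry:
        if since is not None:
            # Compare only the date part of the (assumed ISO 8601) timestamp.
            modified = date.fromisoformat(entry["modified"][:10])
            if modified < since:
                continue
        if wanted is not None:
            publisher = entry["publisher"]
            keys = {publisher["prefix"].lower(), publisher["name"].lower()}
            if not keys & wanted:
                continue
        selected.append(entry)
    return selected
```

Calling select_files(SAMPLE_REGISTRY, files_since="2017-01-01", funders="360G-ocf") would keep only the first sample entry.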
These two steps update the organisations in the data. They are run using:
$ flask update_organisations
$ flask update_charity
update_organisations tries to guess the organisation type of the recipient organisation and apply the Beehive codes to it. It also processes the grant according to the function in fetch_data, so it can be useful to rerun if you don't want to fetch all the data again.
update_charities gets data about the recipient from the charities MongoDB collection. It then tries to work out the type of organisation, how long they have operated for, and get the latest financial information.
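The "how long they have operated for" part can be pictured as a date calculation like the one below. This is only a sketch: the function name and the idea of working from a registration date are assumptions, and the application may derive the figure differently.

```python
from datetime import date


def years_operating(date_registered, today=None):
    """Whole years between a registration date and today (hypothetical helper)."""
    today = today or date.today()
    years = today.year - date_registered.year
    # Subtract one if this year's anniversary has not happened yet.
    if (today.month, today.day) < (date_registered.month, date_registered.day):
        years -= 1
    # A registration date in the future counts as zero years.
    return max(years, 0)
```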
Note: this stage allows for multiple recipients, but the end result only outputs the first recipient.
@todo: Add in companies data here too.
Using regexes and other techniques, try to identify the beneficiaries of each grant, including the age range and gender.
$ flask update_beneficiaries
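To give a feel for the regex approach, here is a minimal sketch that pulls an age range and gender terms out of a grant description. The patterns, the result layout and the function name guess_beneficiaries are illustrative assumptions; the real command may use different and more extensive rules.

```python
import re

# Illustrative patterns only, e.g. "aged 11 to 18" or "ages 5-11".
AGE_RANGE = re.compile(
    r"\b(?:aged|ages?)\s+(\d{1,3})\s*(?:-|to)\s*(\d{1,3})\b", re.IGNORECASE
)
GENDER_TERMS = {
    "female": re.compile(r"\b(?:women|girls|female)\b", re.IGNORECASE),
    "male": re.compile(r"\b(?:men|boys|male)\b", re.IGNORECASE),
}


def guess_beneficiaries(description):
    """Return a guessed age range and list of genders for a grant description."""
    result = {"age_from": None, "age_to": None, "genders": []}
    match = AGE_RANGE.search(description)
    if match:
        result["age_from"] = int(match.group(1))
        result["age_to"] = int(match.group(2))
    for gender, pattern in GENDER_TERMS.items():
        # Word boundaries stop "men" matching inside "women".
        if pattern.search(description):
            result["genders"].append(gender)
    return result
```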
Using regexes and other techniques, try to identify the countries served by each grant.
$ flask update_geography
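A comparable sketch for geography matches country names against a lookup table. The three-country table and the function name guess_countries are assumptions for illustration; a real lookup would need every country plus alternative spellings.

```python
import re

# A tiny illustrative lookup of country names to ISO 3166-1 alpha-2 codes.
COUNTRIES = {"kenya": "KE", "india": "IN", "sierra leone": "SL"}
COUNTRY_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in COUNTRIES) + r")\b",
    re.IGNORECASE,
)


def guess_countries(text):
    """Return the sorted country codes of every country named in the text."""
    found = set()
    for match in COUNTRY_PATTERN.finditer(text):
        # Lower-case the match so "KENYA" and "Kenya" hit the same key.
        found.add(COUNTRIES[match.group(1).lower()])
    return sorted(found)
```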
The site is designed to be deployed using Heroku. You'll need to run a MongoDB instance and make the connection URI available as a config variable MONGODB_URI.
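One way an application can pick up that config variable is sketched below. The helper name get_mongodb_uri and the local fallback URI (including its database name) are hypothetical and only there to make the example self-contained.

```python
import os

# Hypothetical local fallback; the database name is made up for this example.
DEFAULT_MONGODB_URI = "mongodb://localhost:27017/beehivedata"


def get_mongodb_uri(environ=None):
    """Return the connection URI from MONGODB_URI, or a local default."""
    environ = os.environ if environ is None else environ
    # An unset or empty variable falls back to the local default.
    return environ.get("MONGODB_URI") or DEFAULT_MONGODB_URI
```

The returned string could then be passed to pymongo.MongoClient to open the connection.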
The site uses pytest to run the tests. The test database will be created with a different database name, and then destroyed at the end of every test.
The tests use seed data from tests/seed_data, which is based on actual 360Giving data. Some of the files have been changed to give a wider range of test scenarios.
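The create-then-destroy pattern for the test database can be sketched as a context manager. This is not the project's actual test setup: the helper name and the database name beehivedata_test are assumptions, and client is only expected to behave like pymongo.MongoClient.

```python
from contextlib import contextmanager


@contextmanager
def temporary_database(client, name="beehivedata_test"):
    """Yield a throwaway database and drop it afterwards.

    client is expected to behave like pymongo.MongoClient: client[name]
    returns a database and client.drop_database(name) removes it.
    """
    try:
        yield client[name]
    finally:
        # Runs even if the test body raised, so no test data is left behind.
        client.drop_database(name)
```

In a pytest suite, the same logic could sit inside a fixture that yields the database to each test.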
Run the tests with:
$ python -m pytest tests
The deployed version of the site also has CircleCI integration, meaning the tests are run after every GitHub commit.