- Python 3.12.3 installed on your system.
- Ensure you have poetry installed. If not, you can install it using pip:
  pip install poetry
- Clone the GitHub Repository:
  Clone the GitHub repository locally using the git clone command.
  git clone https://github.com/dataforgoodfr/12_observatoire_des_imaginaires.git
- Navigate to the Repository Directory:
  Use the cd command to navigate into the repository directory.
  cd 12_observatoire_des_imaginaires/
- Configure poetry to create a Virtual Environment inside the project:
  Ensure that poetry creates the .venv directory inside the project with the command:
  poetry config virtualenvs.in-project true
- Install Project Dependencies using poetry:
  Use poetry to install the project dependencies.
  poetry install
  This reads the pyproject.toml file in the repository and installs all the dependencies it specifies.
- Activate the Virtual Environment:
  Activate the virtual environment to work within its isolated environment.
  On Unix or macOS:
  poetry shell
- Run & edit notebooks:
  jupyter notebook
This code base uses a .env file at its root directory, with the following variables:
| Variable | Description | Default Value |
|---|---|---|
| HF_TOKEN | Hugging Face API Token. You must have write access to the datasets. | N/A |
| TMDB_API_KEY | TMDB API Token. | N/A |
| TMDB_BATCH_SIZE | Number of TMDB entries to download before updating a HF dataset. | 10000 |
| TMDB_MAX_RETRIES | Maximum number of times to retry a failed TMDB API call. | 500 |
The observable directory contains an Observable Framework site that collects movie and TV show data from datasets on Hugging Face. The site provides a search UI that allows a user to select a specific movie or TV show. The user can then click on the link for their selection to kick off the questionnaire on Tally. The site is destined to be embedded in an iframe in the main Observatoire des Imaginaires web site. To reduce the size of the data present on the generated web site, the datasets are filtered according to the following rules:
Movies:
- filter out adult movies
- filter out movies released more than two years ago
TV Shows:
- filter out adult shows
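The rules above can be sketched as simple predicates. This is only an illustration in Python: the actual site is built with Observable Framework, and the field names `adult` and `release_date`, the helper names, and the approximation of two years as 730 days are all assumptions rather than details taken from this repository.

```python
from datetime import date, timedelta


def is_recent(release_date: str, today: date, max_age_days: int = 2 * 365) -> bool:
    """True when an ISO date (YYYY-MM-DD) is at most about two years old."""
    try:
        released = date.fromisoformat(release_date)
    except ValueError:
        # Missing or malformed dates are dropped in this sketch.
        return False
    return released >= today - timedelta(days=max_age_days)


def keep_movie(movie: dict, today: date) -> bool:
    """Keep non-adult movies released within roughly the last two years."""
    if movie.get("adult", False):
        return False
    return is_recent(movie.get("release_date") or "", today)


def keep_show(show: dict) -> bool:
    """Keep non-adult TV shows; no release date rule applies to shows."""
    return not show.get("adult", False)
```

A caller would apply these with something like `[m for m in movies if keep_movie(m, date.today())]`.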
The web site is currently hosted on the Observable hosting platform and is available at the following URL:
https://observatoire-des-imaginaires.observablehq.cloud/questionnaire
pre-commit run --all-files
tox -vv
This repo includes invoke for pythonic task execution. To see the list of available tasks you can run:
invoke -l
To run the observable site in development mode you can run:
invoke dev
The French regional TMDB Movies Dataset on Hugging Face can be updated using the following command:
invoke update-movies-dataset
The French regional TMDB Series Dataset on Hugging Face can be updated using the following command:
invoke update-series-dataset
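The TMDB_BATCH_SIZE and TMDB_MAX_RETRIES settings suggest that updates proceed in batches, with failed API calls retried. A minimal sketch of that pattern follows; the helpers `batched` and `call_with_retries` are hypothetical and only illustrate the idea, not how the update tasks are actually implemented.

```python
import time
from collections.abc import Callable, Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def call_with_retries(fn: Callable[[], T], max_retries: int, delay_seconds: float = 0.0) -> T:
    """Call fn, retrying up to max_retries times after a failure."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            time.sleep(delay_seconds)
    raise AssertionError("unreachable when max_retries >= 0")
```

With these helpers, a loop could fetch each entry through `call_with_retries` and push to the Hugging Face dataset once per batch produced by `batched`, which bounds how much work is lost if a run is interrupted.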
The Python CLI supports the following commands:
python -m observatoire.tmdb.movies --mode=[latest | missing]
python -m observatoire.tmdb.series --mode=[latest | missing]
In the latest mode, which is the default, these commands sync the latest records from TMDB to our datasets on Hugging Face. In the missing mode, they calculate which rows may be missing from the Hugging Face datasets and attempt to sync these records.
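To make the two modes concrete, here is a small sketch of how a `--mode` option with a `latest` default could be parsed, and how missing rows could be computed as a set difference of IDs. The functions `parse_mode` and `missing_ids` are hypothetical; the repository's CLI may be built with a different library and may determine missing rows in another way.

```python
import argparse


def parse_mode(argv: list[str]) -> str:
    """Parse --mode, accepting 'latest' (the default) or 'missing'."""
    parser = argparse.ArgumentParser(prog="observatoire.tmdb.movies")
    parser.add_argument("--mode", choices=["latest", "missing"], default="latest")
    return parser.parse_args(argv).mode


def missing_ids(tmdb_ids: set[int], dataset_ids: set[int]) -> list[int]:
    """IDs known to TMDB but absent from the Hugging Face dataset."""
    return sorted(tmdb_ids - dataset_ids)
```

For example, `parse_mode(["--mode=missing"])` returns `"missing"`, and `missing_ids({1, 2, 3, 4}, {2, 4})` returns `[1, 3]`.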