Add times, dates and stations datasets #1

Merged: 23 commits, merged on Oct 26, 2023.

Commits:
28dc8db chore: Remove dummy module and script (saattrupdan, Oct 24, 2023)
ac032a9 feat: Add dates and times modules (saattrupdan, Oct 24, 2023)
7419437 chore: Add deps (saattrupdan, Oct 24, 2023)
4e5a756 feat: Add bus_stops_and_stations module (saattrupdan, Oct 24, 2023)
f27c088 feat: Add output_dir arguments (saattrupdan, Oct 24, 2023)
2fc5752 feat: Add build_tts_dataset script (saattrupdan, Oct 24, 2023)
aaff3f4 Initial commit (saattrupdan, Oct 24, 2023)
a0067d1 feat: Add utils module (saattrupdan, Oct 24, 2023)
7363ec6 feat: Add interleaving of datasets (saattrupdan, Oct 24, 2023)
9823638 style: Remove comma before year (saattrupdan, Oct 24, 2023)
16c1cac fix: Add 0-prefixes to hour and minute (saattrupdan, Oct 24, 2023)
fc4fe6d chore: Add webdriver_manager to deps (saattrupdan, Oct 24, 2023)
8529e16 feat: Set up Dockerfile (saattrupdan, Oct 24, 2023)
23ea57d feat: Add data folder as a volume to docker (saattrupdan, Oct 24, 2023)
1c6c2e3 fix: It's okay to run unlink if file doesn't exist (saattrupdan, Oct 24, 2023)
4f853a2 feat: Automatically install Gecko Driver (saattrupdan, Oct 24, 2023)
a941689 fix: Path in volume should be relative (saattrupdan, Oct 24, 2023)
5bc2fb0 docs: Add author (saattrupdan, Oct 24, 2023)
06a5be8 chore: Deps (saattrupdan, Oct 24, 2023)
3775a0f feat: Simplify fetching of table with bus stops and stations (saattrupdan, Oct 25, 2023)
eed0fb2 docs: Update readme with cov badge, tree and quick start (saattrupdan, Oct 25, 2023)
6f0196d chore: Ignore ruff cache (saattrupdan, Oct 25, 2023)
e03b682 feat: Add years 1970-1999 as well in the dates dataset (saattrupdan, Oct 25, 2023)
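
The commit `fix: Add 0-prefixes to hour and minute` indicates that the times dataset writes hours and minutes with two digits. A minimal sketch of such a generator, assuming one `HH:MM` string per minute of the day; `build_times` and its `separator` parameter are hypothetical and not taken from `src/tts_text/times.py`:

```python
from itertools import product


def build_times(separator: str = ":") -> list[str]:
    """Return every minute of the day as a zero-padded 'HH:MM' string.

    Hypothetical sketch; the real times module may use another format.
    """
    # The :02d format spec adds the leading zero, so 7:05 becomes '07:05'
    return [
        f"{hour:02d}{separator}{minute:02d}"
        for hour, minute in product(range(24), range(60))
    ]


times = build_times()
print(len(times))  # 1440
print(times[:3])  # ['00:00', '00:01', '00:02']
```

Without the padding, `9:5` and `09:05` would be different strings for the same time, which matters when the text is later read aloud by a TTS system.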

.gitignore: 3 changes (3 additions, 0 deletions)
@@ -113,3 +113,6 @@ models/*
 
 # Dotenv file with name and email
 .name_and_email
+
+# Linting cache
+.ruff_cache

Dockerfile: 2 changes (1 addition, 1 deletion)
@@ -12,4 +12,4 @@ RUN poetry env use python3.11
 RUN poetry install --no-interaction --no-cache --without dev
 
 # Run the script
-CMD poetry run python src/scripts/your_script.py
+CMD poetry run python src/scripts/build_tts_dataset.py

README.md: 110 changes (22 additions, 88 deletions)
@@ -7,109 +7,37 @@ ______________________________________________________________________
 [![Documentation](https://img.shields.io/badge/docs-passing-green)](https://alexandrainst.github.io/tts_text/tts_text.html)
 [![License](https://img.shields.io/github/license/alexandrainst/tts_text)](https://github.com/alexandrainst/tts_text/blob/main/LICENSE)
 [![LastCommit](https://img.shields.io/github/last-commit/alexandrainst/tts_text)](https://github.com/alexandrainst/tts_text/commits/main)
-[![Code Coverage](https://img.shields.io/badge/Coverage-100%25-brightgreen.svg)](https://github.com/alexandrainst/tts_text/tree/main/tests)
+[![Code Coverage](https://img.shields.io/badge/Coverage-47%25-orange.svg)](https://github.com/alexandrainst/tts_text/tree/main/tests)
 [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg)](https://github.com/alexandrainst/tts_text/blob/main/CODE_OF_CONDUCT.md)
 
 
 Developers:
 
 - Anders Jess Pedersen ([email protected])
 - Dan Saattrup Nielsen ([email protected])
 
 
-## Setup
+## Quick Start
 
-### Installation
+The quickest way to build the dataset is using Docker. With Docker installed, simply
+write `make docker` and the final dataset will be built in the `data/processed`
+directory, with the individual datasets in `data/raw`.
 
-1. Run `make install`, which installs Poetry (if it isn't already installed), sets up a virtual environment and all Python dependencies therein.
-2. Run `source .venv/bin/activate` to activate the virtual environment.
-
-### Adding and Removing Packages
-
-To install new PyPI packages, run:
-
-```
-poetry add <package-name>
-```
-
-To remove them again, run:
-```
-poetry remove <package-name>
-```
-
-To show all installed packages, run:
-```
-poetry show
-```
-
+## Development Setup
 
-## A Word on Modules and Scripts
-In the `src` directory there are two subdirectories, `tts_text`
-and `scripts`. This is a brief explanation of the differences between the two.
+To install the project for further development, run the following steps:
 
-### Modules
-All Python files in the `tts_text` directory are _modules_
-internal to the project package. Examples here could be a general data loading script,
-a definition of a model, or a training function. Think of modules as all the building
-blocks of a project.
-
-When a module is importing functions/classes from other modules we use the _relative
-import_ notation - here's an example:
-
-```
-from .other_module import some_function
-```
-
-### Scripts
-Python files in the `scripts` folder are scripts, which are short code snippets that
-are _external_ to the project package, and which is meant to actually run the code. As
-such, _only_ scripts will be called from the terminal. An analogy here is that the
-internal `numpy` code are all modules, but the Python code you write where you import
-some `numpy` functions and actually run them, that a script.
+1. Run `make install`, which installs Poetry (if it isn't already installed), sets up a
+virtual environment and all Python dependencies therein.
+2. Run `source .venv/bin/activate` to activate the virtual environment.
 
-When importing module functions/classes when you're in a script, you do it like you
-would normally import from any other package:
+With the project installed, you can build the dataset by running:
 
 ```
-from tts_text import some_function
+python src/scripts/build_tts_dataset.py
 ```
 
-Note that this is also how we import functions/classes in tests, since each test Python
-file is also a Python script, rather than a module.
-
-
-## Features
-
-### Docker Setup
-
-A Dockerfile is included in the new repositories, which by default runs
-`src/scripts/your_script.py`. You can build the Docker image and run the Docker
-container by running `make docker`.
-
-### Automatic Documentation
-
-Run `make docs` to create the documentation in the `docs` folder, which is based on
-your docstrings in your code. You can view this by running `make view-docs`.
-
-### Automatic Test Coverage Calculation
-
-Run `make test` to test your code, which also updates the "coverage badge" in the
-README, showing you how much of your code base that is currently being tested.
-
-### Continuous Integration
-
-Github CI pipelines are included in the repo, running all the tests in the `tests`
-directory, as well as building online documentation, if Github Pages has been enabled
-for the repository (can be enabled on Github in the repository settings).
-
-### Code Spaces
-
-Code Spaces is a new feature on Github, that allows you to develop on a project
-completely in the cloud, without having to do any local setup at all. This repo comes
-included with a configuration file for running code spaces on Github. When hosted on
-`alexandrainst/tts_text` then simply press the `<> Code` button
-and add a code space to get started, which will open a VSCode window directly in your
-browser.
+
 
 ## Project structure
 ```
@@ -150,16 +78,22 @@ browser.
 │   └── .gitkeep
 ├── notebooks
 │   └── .gitkeep
+├── poetry.lock
 ├── poetry.toml
 ├── pyproject.toml
 ├── src
 │   ├── scripts
-│   │   ├── fix_dot_env_file.py
-│   │   └── your_script.py
+│   │   ├── build_tts_dataset.py
+│   │   └── fix_dot_env_file.py
 │   └── tts_text
 │       ├── __init__.py
-│       └── your_module.py
+│       ├── __pycache__
+│       ├── bus_stops_and_stations.py
 │       ├── dates.py
 │       ├── times.py
 │       └── utils.py
 └── tests
     ├── __init__.py
+    ├── __pycache__
     └── test_dummy.py
 ```

config/config.yaml: 5 changes (5 additions, 0 deletions)
@@ -8,3 +8,8 @@ dirs:
   processed: processed
   final: final
   models: models
+
+sampling_probabilities:
+  dates: 0.25
+  times: 0.25
+  bus_stops_and_stations: 0.5
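
The new `sampling_probabilities` block gives each source dataset a weight (0.25 + 0.25 + 0.5 = 1), and the commit `feat: Add interleaving of datasets` indicates that the final dataset mixes the sources. A minimal sketch of probability-weighted interleaving, assuming each source is a list of text samples and that mixing stops once the source picked next has run out; `interleave`, its `seed` default and the dummy samples are hypothetical, not the project's code:

```python
import random
from collections.abc import Iterable


def interleave(
    datasets: dict[str, Iterable[str]],
    probabilities: dict[str, float],
    seed: int = 4242,
) -> list[str]:
    """Mix samples from several datasets, choosing each sample's source at random.

    Hypothetical sketch: stops as soon as the randomly chosen source is exhausted.
    """
    if set(probabilities) != set(datasets):
        raise ValueError("Every dataset needs exactly one sampling probability.")
    if abs(sum(probabilities.values()) - 1.0) > 1e-9:
        raise ValueError("Sampling probabilities must sum to 1.")

    rng = random.Random(seed)  # seeded, so the mix is reproducible
    names = list(datasets)
    weights = [probabilities[name] for name in names]
    iterators = {name: iter(datasets[name]) for name in names}

    mixed: list[str] = []
    while True:
        # Pick the source of the next sample according to the configured weights
        name = rng.choices(names, weights=weights, k=1)[0]
        sample = next(iterators[name], None)
        if sample is None:
            # The chosen source has no samples left, so stop here
            return mixed
        mixed.append(sample)


# Example using the weights from config/config.yaml and dummy samples
probabilities = {"dates": 0.25, "times": 0.25, "bus_stops_and_stations": 0.5}
datasets = {
    "dates": [f"date {i}" for i in range(100)],
    "times": [f"time {i}" for i in range(100)],
    "bus_stops_and_stations": [f"stop {i}" for i in range(100)],
}
mixed = interleave(datasets, probabilities)
print(len(mixed), mixed[:3])
```

Stopping at the first exhausted source keeps the mixture close to the configured proportions. Dropping exhausted sources and renormalising the remaining weights yields more samples but skews the mix. If the project builds on Hugging Face `datasets`, its `interleave_datasets` function accepts `probabilities` and `stopping_strategy` arguments that cover both behaviours.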

makefile: 2 changes (1 addition, 1 deletion)
@@ -152,7 +152,7 @@ test: ## Run tests
 
 docker: ## Build Docker image and run container
 	@docker build -t tts_text .
-	@docker run -it --rm tts_text
+	@docker run -it --rm -v ./data:/project/data tts_text
 
 tree: ## Print directory tree
 	@tree -a --gitignore -I .git .
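
With `-v ./data:/project/data`, the container's `data` directory is the host's `./data`, so files the build script writes there survive the `--rm` container, and a rerun sees the previous run's files. A minimal sketch of writing output in batches under that setup; `reset_output`, `append_samples`, the one-sample-per-line `.txt` format and the `data/raw/times.txt` path are hypothetical. Passing `missing_ok=True` to `Path.unlink` is one way to get the behaviour described in the commit `fix: It's okay to run unlink if file doesn't exist`:

```python
import tempfile
from collections.abc import Iterable
from pathlib import Path


def reset_output(path: Path) -> None:
    """Create the parent directory and delete any output left by a previous run."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # missing_ok=True (Python 3.8+) means no FileNotFoundError on a first run
    path.unlink(missing_ok=True)


def append_samples(path: Path, samples: Iterable[str]) -> None:
    """Append samples to a text file, one per line."""
    with path.open("a", encoding="utf-8") as file:
        for sample in samples:
            file.write(f"{sample}\n")


# Example in a throwaway directory standing in for the mounted data/ folder
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data" / "raw" / "times.txt"
    reset_output(path)  # no error, although the file does not exist yet
    append_samples(path, ["00:00", "00:01"])
    append_samples(path, ["00:02"])
    print(path.read_text(encoding="utf-8").splitlines())  # ['00:00', '00:01', '00:02']
```

Because batches are appended, the reset step is what prevents samples from an earlier run in the persistent volume from piling up in the same file.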