Add batch conversion module #35

Merged
merged 86 commits into from
Jul 13, 2023
a90722f
initial commit of the batch subpackage
JessyBarrette Sep 15, 2022
a9c6e72
first working batch
JessyBarrette Sep 15, 2022
ceeaa57
Merge branch 'development' into add-batch-convert-mode
JessyBarrette Oct 21, 2022
571e431
add logging config
JessyBarrette Oct 21, 2022
4e02bae
add file_registry
JessyBarrette Oct 21, 2022
eb6ca3f
cleanup imports
JessyBarrette Oct 21, 2022
9115eab
Merge branch 'development' into add-batch-convert-mode
JessyBarrette Oct 28, 2022
b0e6454
add more to config and auto mode tests
JessyBarrette Nov 2, 2022
c779f2d
Merge branch 'development' into add-batch-convert-mode
JessyBarrette May 24, 2023
c9fcd83
improve more the batch mode
JessyBarrette May 24, 2023
a797485
black isort
JessyBarrette May 25, 2023
84715b0
use logging config
JessyBarrette May 25, 2023
90783f7
fix output_path logic
JessyBarrette May 25, 2023
380a825
rename sub functions
JessyBarrette May 25, 2023
29e7cb7
update batch test module
JessyBarrette May 25, 2023
76ce3cb
fix to registry methods
JessyBarrette May 25, 2023
17a606b
refactor batch.convert
JessyBarrette May 25, 2023
d9b89c5
isort black
JessyBarrette May 25, 2023
c8aa87e
add cli methods for batch conversion
JessyBarrette May 25, 2023
bd4eb96
review parse output to be xarray.Dataset
JessyBarrette May 25, 2023
beeedf8
move test files to better accommodate the auto detection test
JessyBarrette May 25, 2023
50b793a
streamline auto.file method
JessyBarrette May 25, 2023
48bad9b
parametrize auto detect parser tests
JessyBarrette May 25, 2023
e530a14
add nmea to auto detects
JessyBarrette May 25, 2023
6b4650d
add batch.__init__.py
JessyBarrette May 25, 2023
9bd1f97
add default batch config to package
JessyBarrette May 25, 2023
f69b0d7
make pytest on different historical python versions only pr to main
JessyBarrette May 25, 2023
7779525
improve registry logic and add pytests
JessyBarrette May 26, 2023
4331dfd
isort . black .
JessyBarrette May 26, 2023
9980bd8
isort . black .
JessyBarrette May 26, 2023
67973dc
Merge remote-tracking branch 'origin/development' into add-batch-conv…
JessyBarrette May 26, 2023
345a5d8
fix entry_points
JessyBarrette May 26, 2023
22454d6
compare registry with difflib instead of hash in test
JessyBarrette Jun 1, 2023
dcfe0ea
Merge branch 'development' into add-batch-convert-mode
JessyBarrette Jun 2, 2023
8f548ae
fix comment to read.auto.file
JessyBarrette Jun 2, 2023
3234c26
fix auto method to return dataset
JessyBarrette Jun 2, 2023
ef59d03
rename test_*.py
JessyBarrette Jun 2, 2023
5878e60
add a seabird timeseries test file
JessyBarrette Jun 2, 2023
9b18679
add process tool section
JessyBarrette Jun 2, 2023
cc14edf
add process dependencies
JessyBarrette Jun 2, 2023
762bddc
add __init__ to process
JessyBarrette Jun 2, 2023
3efd80f
rename again test_compile_netcdf
JessyBarrette Jun 2, 2023
4901afe
fix manual_qc imports
JessyBarrette Jun 2, 2023
5c1235c
isort . black .
JessyBarrette Jun 2, 2023
4326dd5
move process output file to tests folder
JessyBarrette Jun 2, 2023
49ceedf
improve registry api and add multiprocessing
JessyBarrette Jul 10, 2023
9863cbc
make ioos_qc and xarray_gsw optional packages
JessyBarrette Jul 10, 2023
95ddeab
improve multiprocessing input
JessyBarrette Jul 10, 2023
7fea021
add a cli flag to batch convert to generate a new config file
JessyBarrette Jul 10, 2023
3d3ac49
manual_qc imports
JessyBarrette Jul 10, 2023
bdb331d
add test for --new_config batch input
JessyBarrette Jul 10, 2023
555300c
update registry tests
JessyBarrette Jul 10, 2023
4c4463c
move registry_files tests to its own module
JessyBarrette Jul 10, 2023
5663397
add sub method _get_sources to standardize sources input
JessyBarrette Jul 10, 2023
db664b1
drop units from datetime variables when standardizing
JessyBarrette Jul 10, 2023
3c1cc12
improve generate_output_path
JessyBarrette Jul 10, 2023
dbdf956
add errors input to config to handle errors
JessyBarrette Jul 10, 2023
ef2e93b
fix generate_output_path
JessyBarrette Jul 10, 2023
18b58ee
clean up convert module
JessyBarrette Jul 10, 2023
40c1272
drop overwrite parameter in config file_output
JessyBarrette Jul 10, 2023
e45d8b1
add more generate_output_path tests and improve function
JessyBarrette Jul 10, 2023
fa3e941
fix multiprocessing tool
JessyBarrette Jul 10, 2023
ab5a44e
Merge branch 'development' into add-batch-convert-mode
JessyBarrette Jul 10, 2023
acb00bb
isort . black .
JessyBarrette Jul 10, 2023
ffa6177
fix datetime variable units pop
JessyBarrette Jul 11, 2023
fbaf466
improve auto detection parser tests assertion
JessyBarrette Jul 11, 2023
dbbc57f
drop os dependency in read.auto and rely on pathlib.Path
JessyBarrette Jul 11, 2023
67824cf
make shapely import optional and only imported when needed
JessyBarrette Jul 11, 2023
226829a
add reference_station and reference_regions parameters in batch conve…
JessyBarrette Jul 11, 2023
1bbbd51
remove seabird hex file which shouldn't be there
JessyBarrette Jul 11, 2023
399561e
convert test variables associated with reference object variables as…
JessyBarrette Jul 11, 2023
190b0e7
convert list and tuple in xarray attributes as numpy arrays if all it…
JessyBarrette Jul 11, 2023
17a7096
fix time_coverage attribute type if standardize_attribute is run mult…
JessyBarrette Jul 11, 2023
5f1197d
use difflib only to compare two xarray datasets
JessyBarrette Jul 11, 2023
d9605e7
improve reference tests
JessyBarrette Jul 12, 2023
f0206f6
drop reference odf netcdf files for data not fully relate to bio orga…
JessyBarrette Jul 12, 2023
f01d71a
fix handling of list of strings attributes and generate_file_name
JessyBarrette Jul 12, 2023
46a57d1
another attempt to fix attributes to np.array
JessyBarrette Jul 12, 2023
37d96ef
add ioos_qc to dev dependencies
JessyBarrette Jul 12, 2023
b9221fe
drop path exist condition for new_config input
JessyBarrette Jul 12, 2023
10ac746
fix auto_detection tests test file list
JessyBarrette Jul 12, 2023
f4dd22c
fix copy of a slice warning
JessyBarrette Jul 12, 2023
e93f6dd
fix new config creation test
JessyBarrette Jul 12, 2023
7d7c4c3
drop new_config type
JessyBarrette Jul 13, 2023
d063da5
fix new_config with missing directory for paths
JessyBarrette Jul 13, 2023
fe3074d
black .
JessyBarrette Jul 13, 2023
1 change: 0 additions & 1 deletion .github/workflows/test-python-version.yml
@@ -4,7 +4,6 @@ on:
   pull_request:
     branches:
       - main
-      - development
 
 jobs:
   build:
4 changes: 4 additions & 0 deletions .gitignore
@@ -22,5 +22,9 @@ tests/parsers_test_files/amundsen/12713/**/*
output/
site/**
docs/read/**/*-hook.md
*.log
file_registry.csv
temp/**
tests/test_file_registry_temp.csv

temp/**
3 changes: 2 additions & 1 deletion MANIFEST.in
@@ -1,4 +1,5 @@
 recursive-include ocean_data_parser/read/vocabularies *
 include ocean_data_parser/read/dfo/odf_source/*config.json
 recursive-include ocean_data_parser/read/dfo/odf_source/references *
-recursive-include ocean_data_parser/read/dfo/odf_source/references/geographical_areas *
+recursive-include ocean_data_parser/read/dfo/odf_source/references/geographical_areas *
+include ocean_data_parser/batch/default-batch-config.yaml
Empty file.
57 changes: 57 additions & 0 deletions ocean_data_parser/batch/config.py
@@ -0,0 +1,57 @@
import logging
from pathlib import Path
from collections.abc import Generator

import yaml
import pandas as pd

from ocean_data_parser.geo import read_geojson

MODULE_PATH = Path(__file__).parent
DEFAULT_CONFIG_PATH = MODULE_PATH / "default-batch-config.yaml"

logger = logging.getLogger(__name__)


def glob(paths: str) -> Generator[Path]:
    """Create a generator of paths from a glob path expression.

    Args:
        paths (str): glob-type path

    Yields:
        Generator[Path]: generator of Path objects
    """
    paths = Path(paths)
    anchor = paths.anchor
    return Path(anchor).glob(str(paths.relative_to(anchor)))


def load_config(config_path: str = None, encoding="UTF-8"):
    """Load a YAML configuration file, or the default configuration if none is provided."""
    # Get default config if no file provided
    if config_path is None:
        config_path = Path(__file__).parent / "default-batch-config.yaml"

    with open(config_path, "r", encoding=encoding) as file:
        config = yaml.load(file, Loader=yaml.SafeLoader)

    # Load geojson regions
    if config.get("reference_regions") and config["reference_regions"].get("path"):
        for path in glob(config["reference_regions"]["path"]):
            config["reference_regions"]["regions"].update(read_geojson(path))

    # Load reference stations
    if config.get("reference_stations") and config["reference_stations"].get("path"):
        config["reference_stations"]["stations"] = pd.concat(
            [
                pd.read_csv(path)
                for path in glob(config["reference_stations"]["path"])
                if path
            ]
        )

    return config


config = load_config(DEFAULT_CONFIG_PATH)
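The `glob` helper in `config.py` splits off the path anchor because pathlib's `Path.glob()` does not accept absolute patterns, so an absolute expression such as `/data/stations/*.csv` has to be globbed relative to its anchor (`/` here). The sketch below reproduces that helper together with the reference-station merge from `load_config` so the behavior can be tried in isolation. `load_reference_stations` is a hypothetical name, not part of the package, and unlike the diff it returns an empty DataFrame when nothing matches, since `pd.concat` raises `ValueError` when given an empty list.

```python
import tempfile
from collections.abc import Iterator
from pathlib import Path

import pandas as pd


def glob(paths: str) -> Iterator[Path]:
    """Expand a glob expression that may be absolute (same approach as batch/config.py)."""
    paths = Path(paths)
    anchor = paths.anchor  # "/" for absolute POSIX paths, "" for relative paths
    # Path.glob() only accepts relative patterns, so glob from the anchor.
    return Path(anchor).glob(str(paths.relative_to(anchor)))


def load_reference_stations(pattern: str) -> pd.DataFrame:
    """Hypothetical helper: concatenate every CSV file matching `pattern`.

    Assumption: an empty DataFrame is a reasonable result when no file
    matches, instead of letting pd.concat([]) raise ValueError.
    """
    frames = [pd.read_csv(path) for path in sorted(glob(pattern))]
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    # Write two small station files into a temporary (absolute) directory.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.csv").write_text("station,lat\nS1,44.5\n")
        Path(tmp, "b.csv").write_text("station,lat\nS2,45.0\n")
        print(load_reference_stations(str(Path(tmp) / "*.csv")))
```

Sorting the matches keeps the concatenated table in a deterministic order, since `Path.glob()` does not guarantee one.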