As neuroscience evolves toward increasingly large datasets, complex analyses, and bigger teams, building robust, reproducible, and scalable analysis workflows becomes essential for developing large computational science projects and keeping them sustainable over time.
In this workshop, you will gain hands-on experience crafting modular scripts and command-line tools for efficient data processing, and learn how to cleanly combine them into a complete analysis pipeline using Python's Snakemake package. We will also explore the Conda package manager to ensure consistent computational environments, vital for sharing code and ensuring computational reproducibility, and Git and GitHub for collaborating and sharing code across teams.
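To give a sense of what such a pipeline looks like, below is a minimal Snakemake sketch. The directory layout, the `{trial}` wildcard values, and the helper scripts `scripts/clean.py` and `scripts/summarize.py` are all hypothetical placeholders, not workshop materials.

```
# Snakefile: a two-step workflow sketch (all paths are hypothetical)

# The final target; asking Snakemake for it triggers the whole chain.
rule all:
    input:
        "results/summary.csv"

# Step 1: clean each raw per-trial file independently via a wildcard.
rule clean_trial:
    input:
        "data/raw/{trial}.csv"
    output:
        "data/clean/{trial}.csv"
    shell:
        "python scripts/clean.py {input} {output}"

# Step 2: collect all cleaned trials into one summary table.
rule summarize:
    input:
        expand("data/clean/{trial}.csv", trial=["t01", "t02", "t03"])
    output:
        "results/summary.csv"
    shell:
        "python scripts/summarize.py {input} -o {output}"
```

Running `snakemake --cores 1` would then execute only the steps whose outputs are missing or out of date, which is what makes such pipelines cheap to rerun as data and code change.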
By the end of this workshop, you'll be able to develop and manage data analysis pipelines, ensuring that your projects are not only advanced in their execution but also sustainable in the long term. Your future self will thank you for it!
- Git Slides: The basics of Git and GitHub
- Exercises: The basics of creating Python scripts
- Exercises: The basics of creating CLIs
- Project: Using scripts and CLIs to collect per-trial data into a single file (a sketch of such a CLI follows this list)
- Exercises - part 1: Creating Python functions
- Exercises - part 2: Organizing functions into Python modules
- Project: Extracting functions from existing project code and storing them in modules
- Exercises - part 1: Intro to Snakemake and creating single rules
- Exercises - part 2: Creating workflows with Snakemake
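As a concrete taste of the scripting and CLI sessions, here is a minimal sketch of a command-line tool that collects per-trial data into a single file. The script name `collect_trials.py`, its `-o/--output` flag, and the pandas-based concatenation are illustrative assumptions, not the actual workshop code.

```python
#!/usr/bin/env python
"""Collect per-trial CSV files into a single combined file (illustrative sketch)."""
import argparse
from pathlib import Path

import pandas as pd


def collect_trials(trial_files, output_path):
    """Concatenate per-trial tables, tagging each row with its source trial."""
    frames = []
    for path in trial_files:
        df = pd.read_csv(path)
        df["trial"] = Path(path).stem  # record which trial each row came from
        frames.append(df)
    pd.concat(frames, ignore_index=True).to_csv(output_path, index=False)


def main():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("trial_files", nargs="+", help="per-trial CSV files")
    parser.add_argument("-o", "--output", default="summary.csv",
                        help="path of the combined output file")
    args = parser.parse_args()
    collect_trials(args.trial_files, args.output)


if __name__ == "__main__":
    main()
```

Invoked as, say, `python collect_trials.py data/clean/*.csv -o results/summary.csv`, this is exactly the kind of call a Snakemake rule can wrap in its `shell` directive, which is how the scripting and pipeline sessions connect.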