Jupyter notebooks are a great tool for exploring and interacting with data using the Python programming language and its rich ecosystem of libraries. In this course we will cover basic usage of the Pandas library to download a dataset, explore its contents, clean up missing or invalid data, filter the data according to different criteria, and plot visualizations of the data.
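As a rough taste of those steps, here is a minimal sketch using a tiny in-memory table instead of a downloaded dataset. The column names and values are made up for illustration and are not taken from the course material.

```python
import pandas as pd

# Hypothetical toy data standing in for a downloaded dataset.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Cairo", "Pune"],
    "temp_c": [4.0, None, 35.5, 29.0],
})

# Explore: table size and number of missing values per column.
n_rows, n_cols = df.shape
missing = df.isna().sum()

# Clean: drop rows where the temperature is missing.
clean = df.dropna(subset=["temp_c"])

# Filter: keep only rows warmer than 20 degrees.
warm = clean[clean["temp_c"] > 20]

print(warm)

# Plot (needs matplotlib, so it is left commented out here):
# warm.plot.bar(x="city", y="temp_c")
```

In a notebook you would normally put each step in its own cell and look at the intermediate result before moving on.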
You will need a computer with Python, Jupyter and pandas installed.
If you don't already have these, I recommend installing Anaconda, which includes all of them and more.
After installing:
- open Jupyter (e.g. on Windows: `Start Menu -> Anaconda3 -> Jupyter Notebook`)
- create a new notebook (in the web browser window where Jupyter opened, click `New -> Notebook Python 3` in the top right)
- type `import pandas` and press Shift+Enter (or click `Cell -> Run Cells` in the menu)
- if no error message appears, you are ready to start the course!
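
If you would rather check the setup from a script or a single cell, the sketch below reports which packages cannot be found instead of stopping at the first failed import. The helper name `missing_packages` is made up for this example, and the import names `pandas` and `notebook` are an assumption about a typical Anaconda install, so adjust the list to your own setup.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for pandas and the classic Jupyter Notebook.
not_found = missing_packages(["pandas", "notebook"])

if not_found:
    print("Missing packages:", ", ".join(not_found))
else:
    print("All set, you are ready to start the course!")
```

This only confirms that the packages can be located. Running `import pandas` in a notebook cell, as described above, is still the real test.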
The notebooks created during the course are updated automatically, as they are edited, at jupyter-data-exploration-live.
- Part 1: Python and Jupyter - online slides, colab interactive notebook, read-only notebook
- Part 2: Pandas with toy data - online slides, colab interactive notebook, read-only notebook
- Part 3: Pandas with real data - online slides, colab interactive notebook, read-only notebook
- pandas
- Kaggle courses