In this section, we'll prepare the environment.
The easiest way to set up the environment is to use Anaconda or Miniconda. Anaconda comes with the most commonly used libraries preinstalled in the base environment, while Miniconda is a minimal version of Anaconda that contains only conda, Python, and a small number of supporting packages.
It is a good idea to set up a dedicated environment for the course and not to use it for other projects.
Follow the instructions on the relevant page to install the correct package for your system (the site will automatically detect your operating system and suggest the correct package):
Anaconda: https://www.anaconda.com/products/individual
Miniconda: https://docs.conda.io/en/latest/miniconda.html#latest-miniconda-installer-links
(If you are using Windows, you can use WSL instead and follow the installation instructions for Linux.)
In your terminal, run this command to create the environment:
conda create -n ml-zoomcamp python=3.8
Activate the environment whenever you add new packages for the course or work on the coursework:
conda activate ml-zoomcamp
Install the libraries available on conda:
conda install numpy pandas scikit-learn seaborn jupyter
Optionally, if you want to use TensorFlow locally with a GPU:
conda config --add channels conda-forge
conda install cudatoolkit=11.2
Additional libraries available only on PyPI:
pip install xgboost
pip install tensorflow
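After installing, it can be useful to confirm that the libraries are visible from the activated environment. The following is a minimal sketch, not part of the course materials: the helper name `missing_modules` is hypothetical, and the module list simply mirrors the packages installed above (note that scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names of the packages installed above.
# scikit-learn is imported as "sklearn"; the rest match their package names.
COURSE_MODULES = ["numpy", "pandas", "sklearn", "seaborn", "xgboost", "tensorflow"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(COURSE_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All course libraries were found.")
```

Run the script with the `ml-zoomcamp` environment activated; if it reports missing modules, the environment is probably not active or the corresponding install command did not complete.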
If you are comfortable using Docker, you can use the following guide:
Code: Setup using Docker
Instead of running things locally, you can use online services or rent a server.
You can rent an instance on AWS:
To use Kaggle to open and run the Jupyter notebooks provided as part of this course, do the following:
Prerequisites: you need a Kaggle account (it's free) and must be logged in to Kaggle.
1. Find the URL of the notebook.
2. To open the notebook in Kaggle, prefix the notebook URL with https://kaggle.com/kernels/welcome?src= and paste the result into your web browser's address bar.
3. Check whether the notebook reads data from a file. If it does, note the file name from the code: look for pd.read_csv("somefilename.csv").
4. Download the file into Kaggle. To do this:
   a. Find the URL of the data file on GitHub.
   b. Use the URL of the raw file. For example, if the URL is https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/chapter-02-car-price/data.csv, the raw URL will look something like https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/chapter-02-car-price/data.csv
5. In the notebook opened in Kaggle, add a code cell with the command to download the file: !wget your-datafile-url
You can now start the exercise in Kaggle.
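The two URL rewrites in the Kaggle steps (adding the Kaggle prefix to the notebook URL, and turning a GitHub file URL into its raw form) can be sketched in Python. This is an illustrative sketch only: the function names `to_kaggle_url` and `to_raw_url` are hypothetical, and the raw-URL conversion assumes the usual `https://github.com/<user>/<repo>/blob/<branch>/<path>` layout seen in the example above.

```python
KAGGLE_PREFIX = "https://kaggle.com/kernels/welcome?src="
GITHUB_PREFIX = "https://github.com/"
RAW_PREFIX = "https://raw.githubusercontent.com/"


def to_kaggle_url(notebook_url: str) -> str:
    """Prepend the Kaggle prefix to the notebook URL."""
    return KAGGLE_PREFIX + notebook_url


def to_raw_url(blob_url: str) -> str:
    """Convert a GitHub file URL (with /blob/) into the URL of the raw file."""
    if not blob_url.startswith(GITHUB_PREFIX):
        raise ValueError("expected a URL starting with " + GITHUB_PREFIX)
    # Split into user, repo, the literal "blob", and the rest (branch/path).
    parts = blob_url[len(GITHUB_PREFIX):].split("/", 3)
    if len(parts) < 4 or parts[2] != "blob":
        raise ValueError("expected a URL of the form <user>/<repo>/blob/<branch>/<path>")
    user, repo, _, rest = parts
    return f"{RAW_PREFIX}{user}/{repo}/{rest}"


if __name__ == "__main__":
    data_url = (
        "https://github.com/alexeygrigorev/mlbookcamp-code/"
        "blob/master/chapter-02-car-price/data.csv"
    )
    # The result is the URL to pass to !wget in the Kaggle notebook.
    print(to_raw_url(data_url))
```

The URL printed by `to_raw_url` is the one to use with `!wget`; `to_kaggle_url` builds the address to paste into the browser in Step 2.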
To use Google Colab to open and run the Jupyter notebooks provided as part of this course, do the following:
Prerequisites: you need a Google account (any Gmail account) and must be logged in to it.
The steps for Google Colab are the same as those for Kaggle, except for a change in Step 2, as explained below.
2. To open the notebook in Google Colab, replace https://github.com/ in the notebook URL with https://colab.research.google.com/github/ and paste the result into your web browser's address bar.
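The Colab variant of the URL rewrite can be sketched the same way. This is a minimal illustration, assuming the notebook is hosted on GitHub; the function name `to_colab_url` and the notebook URL used in the example are hypothetical placeholders, not actual course files.

```python
GITHUB_PREFIX = "https://github.com/"
COLAB_PREFIX = "https://colab.research.google.com/github/"


def to_colab_url(notebook_url: str) -> str:
    """Replace the GitHub prefix of a notebook URL with the Colab prefix."""
    if not notebook_url.startswith(GITHUB_PREFIX):
        raise ValueError("expected a URL starting with " + GITHUB_PREFIX)
    # Keep everything after the GitHub prefix unchanged.
    return COLAB_PREFIX + notebook_url[len(GITHUB_PREFIX):]


if __name__ == "__main__":
    # Hypothetical notebook URL, used only to show the shape of the result.
    example = "https://github.com/some-user/some-repo/blob/master/notebook.ipynb"
    print(to_colab_url(example))
```

Only the prefix changes; the rest of the URL, including the `/blob/<branch>/` part, stays as it is.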