This repo contains the notebooks and slides for the Large Language Models: Foundation Models from the Ground Up course on edX & Databricks Academy.
Note: this is the second course in the two-part series. For the first installment please see the course on edX & Databricks Academy as well as the supporting repo.
Notebooks
- First, add Git credentials to Databricks. Refer to the Databricks documentation for instructions.
- Click Repos in the sidebar, then click Add Repo on the top right.
- Copy the "HTTPS" URL from GitHub, or copy https://github.com/databricks-academy/llm-foundation-models.git, and paste it into the Git repository URL box. The remaining fields, i.e. Git provider and Repository name, will be populated automatically. Click Create Repo on the bottom right.
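If you prefer to script the steps above rather than click through the UI, the same repo can, as far as we know, be added through the Databricks Repos REST endpoint (`/api/2.0/repos`). The following is a minimal sketch, not an official procedure: the host, token, and workspace path are placeholders, and `build_repo_request` is a hypothetical helper name. The request is only built here, not sent.

```python
import json
import urllib.request

REPO_URL = "https://github.com/databricks-academy/llm-foundation-models.git"


def build_repo_request(host, token, workspace_path):
    """Build (but do not send) a POST request that asks Databricks to add the repo."""
    payload = {
        "url": REPO_URL,          # the HTTPS URL from the step above
        "provider": "gitHub",     # assumed provider identifier for GitHub
        "path": workspace_path,   # where the repo should appear in the workspace
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/repos",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: replace with your own workspace URL, token, and path.
request = build_repo_request(
    host="https://example.cloud.databricks.com",
    token="<personal-access-token>",
    workspace_path="/Repos/<your-user>/llm-foundation-models",
)

# To actually create the repo, send the request:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the payload before anything touches your workspace.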
- You can also download the notebooks from a release by navigating to the releases section on the GitHub page.
- From the releases page, download the .dbc file. This contains all of the course notebooks, along with their structure and metadata.
- In your Databricks workspace, navigate to the Workspace menu, click Home, and select Import.
- Using the import tool, navigate to the location on your computer where you downloaded the .dbc file. Once you select the file, click Import, and the files will be loaded and extracted to your workspace.
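The import steps above can also be approximated programmatically. The sketch below assumes the Databricks Workspace import endpoint (`/api/2.0/workspace/import`), which expects the archive as base64-encoded content with the format set to `DBC`. The helper name `build_dbc_import_payload` and the target path are hypothetical, and placeholder bytes stand in for the real downloaded file.

```python
import base64
import json


def build_dbc_import_payload(dbc_bytes, workspace_path):
    """Build the JSON payload for importing a .dbc archive into the workspace."""
    return {
        "path": workspace_path,  # target folder in the workspace (placeholder)
        "format": "DBC",         # tells the API the content is a .dbc archive
        "content": base64.b64encode(dbc_bytes).decode("ascii"),
    }


# Placeholder bytes; in practice, read the downloaded file instead:
# with open("<downloaded-file>.dbc", "rb") as f:
#     dbc_bytes = f.read()
dbc_bytes = b"example-bytes"

payload = build_dbc_import_payload(
    dbc_bytes, "/Users/<your-user>/llm-foundation-models"
)
request_body = json.dumps(payload)  # POST this body with a bearer token
```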
Cluster settings
- First, select Single Node.
- This courseware has been tested on Databricks Runtime 13.3 LTS for Machine Learning. The ML Runtime pre-installs many commonly used machine learning packages, so if you do not have access to a 13.3 LTS ML Runtime cluster, you will need to install many additional libraries yourself, and this courseware is not guaranteed to run.
The Module 1 and 3 notebooks run fine on i3.xlarge. We recommend i3.2xlarge for the Module 2 and 4 notebooks.
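The settings above can be captured as a cluster specification for anyone who creates clusters through the Databricks Clusters API instead of the UI. This is a sketch under assumptions: the runtime key `13.3.x-cpu-ml-scala2.12` and the single-node `spark_conf` and tag values reflect our understanding of how Databricks encodes these options and should be checked against your workspace. The function names and the cluster name are hypothetical.

```python
import json

# Assumed runtime key for Databricks Runtime 13.3 LTS ML (CPU); verify in your workspace.
RUNTIME_KEY = "13.3.x-cpu-ml-scala2.12"


def node_type_for_module(module):
    """Return the instance type suggested in this README for a course module."""
    if module in (1, 3):
        return "i3.xlarge"
    if module in (2, 4):
        return "i3.2xlarge"
    raise ValueError(f"Unknown module: {module}")


def single_node_cluster_spec(module):
    """Build a single-node cluster spec for the given module (1 to 4)."""
    return {
        "cluster_name": f"llm-foundation-models-module-{module}",  # hypothetical name
        "spark_version": RUNTIME_KEY,
        "node_type_id": node_type_for_module(module),
        "num_workers": 0,  # single node: driver only, no workers
        "spark_conf": {
            # Assumed settings that mark the cluster as single node.
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
    }


print(json.dumps(single_node_cluster_spec(2), indent=2))
```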