Commit

Revised markdown and description of tutorial
darrylong committed Oct 25, 2024
1 parent 9bee41b commit deac885
Showing 1 changed file with 43 additions and 30 deletions.
73 changes: 43 additions & 30 deletions tutorials/model_ensembling.ipynb
@@ -32,11 +32,16 @@
"id": "9c98761e-37fc-4407-b451-d59a68aecd83",
"metadata": {},
"source": [
"This notebook provides an example of how to ensemble multiple recommendation models in Cornac.\n",
"This Jupyter Notebook demonstrates the process of ensembling multiple recommendation models using the Cornac library.\n",
"\n",
"Ensemble models is a technique that combines the predictions of multiple models to produce a single prediction. The idea is that by combining the predictions of multiple models, we can improve the overall performance of the recommendation system.\n",
"Model ensembling is a technique that combines the predictions of multiple models to produce a single, more accurate prediction. By leveraging the strengths of different models, we can improve the overall performance of the recommendation system.\n",
"\n",
"We will use the MovieLens 100K dataset and ensemble 2 models."
"There are 5 main parts to this tutorial:\n",
"1. [**Introduction**](#introduction): We will first get you started by running a simple experiment with both BPR and WMF models. We will also take a look at what the dataset distribution is like.\n",
"2. [**Simple Model Ensembling**](#ensemble-models): We will ensemble the predictions of the BPR and WMF models using a simple technique called Borda Count.\n",
"3. [**Further Ensembling**](#further-ensembling): We will create variations of the WMF model and ensemble them using the same technique.\n",
"4. [**Ensembling with Regression Models**](#ensembling-with-regression-models): We will take the same WMF models and ensemble them using linear regression and random forest regression from the `scikit-learn` package.\n",
"5. [**Further Evaluation**](#evaluation): We will evaluate the performance of the ensemble models."
]
},
{
@@ -52,7 +57,16 @@
"id": "82295d13-46da-4e8e-a052-b420beb969e8",
"metadata": {},
"source": [
"## 1. Setup"
"## 1. Introduction\n",
"<a id='introduction'></a>"
]
},
{
"cell_type": "markdown",
"id": "50e0c21e",
"metadata": {},
"source": [
"We will first run a simple experiment with both BPR and WMF models. We will also take a look at what the dataset distribution is like.\n"
]
},
{
@@ -167,9 +181,7 @@
"id": "c5636df0-91c3-4b73-8c60-26d75b9bd6f6",
"metadata": {},
"source": [
"## 2. Prepare Experiment\n",
"\n",
"### 2.1 Loading Dataset\n",
"### 1.2 Loading Dataset\n",
"\n",
"First, we load the MovieLens 100K dataset."
]
@@ -219,12 +231,12 @@
"id": "2d37011e-1384-42cb-8ea0-4667784f952a",
"metadata": {},
"source": [
"### 2.2 Training BPR and WMF models\n",
"### 1.3 Training BPR and WMF models\n",
"\n",
"We will train two models: \n",
"\n",
"1. BPR (Bayesian Personalized Ranking)\n",
"2. WMF (Weighted Matrix Factorization)"
"1. **BPR (Bayesian Personalized Ranking)**\n",
"2. **WMF (Weighted Matrix Factorization)**"
]
},
{
@@ -334,25 +346,23 @@
"source": [
"In terms of Precision and Recall, BPR and WMF produce comparable results.\n",
"\n",
"Let's move on to try to interpret these results by using the genres of movies that were recommended to us.\n",
"\n",
"Generally, we could assume that if an individual likes a particular film genre like 'Romance', the recommender system should provide more of such 'Romance' films."
"Let's move on to try to interpret these results by using the genres of movies that were recommended to us."
]
},
{
"cell_type": "markdown",
"id": "0fcc4831",
"metadata": {},
"source": [
"### 2.3 Interpreting Results"
"### 1.4 Interpreting Results"
]
},
{
"cell_type": "markdown",
"id": "0be074f7",
"metadata": {},
"source": [
"##### Creating a Movie Genre Dataframe"
"##### 1.4.1 Creating a Movie Genre Dataframe"
]
},
{
@@ -560,7 +570,7 @@
"id": "b476a6f8",
"metadata": {},
"source": [
"##### Creating Training Data Dataframe\n",
"##### 1.4.2 Creating Training Data Dataframe\n",
"\n",
"To get a sense of the data the models were trained on, let's count the genres that appear in the training data.\n",
"\n",
@@ -602,7 +612,7 @@
"id": "2a2f5733",
"metadata": {},
"source": [
"##### Filtering Training Data\n",
"##### 1.4.3 Filtering Training Data\n",
"\n",
"Let's filter based on a particular user to learn more about the user.\n",
"\n",
@@ -762,7 +772,7 @@
"id": "d242944c",
"metadata": {},
"source": [
"##### Interpreting Recommendations of BPR, WMF Models"
"##### 1.4.4 Interpreting Recommendations of BPR, WMF Models"
]
},
{
@@ -1163,7 +1173,7 @@
"id": "9f02bf37",
"metadata": {},
"source": [
"##### Comparing Models by Genre Distribution"
"##### 1.4.5 Comparing Models by Genre Distribution"
]
},
{
@@ -1329,19 +1339,20 @@
"id": "7cce45f7",
"metadata": {},
"source": [
"## 2. Simple model ensembling by Borda Count\n",
"## 2. Simple Model Ensembling by Borda Count\n",
"<a id='ensemble-models'></a>\n",
"\n",
"We will ensemble the two models using the Borda Count method. The Borda Count method is a simple voting method that ranks the items based on the sum of their ranks from each model.\n",
"\n",
"Assuming that we have a list of 5 items, the Borda Count method works as follows:\n",
"Assuming that we have a list of **5 items**, the Borda Count method works as follows:\n",
"\n",
"1. For each model, rank the items from 1 to 5 based on the predicted scores.\n",
"2. Sum the ranks of each item across all models.\n",
"3. Sort the items based on the sum of their ranks.\n",
"4. The top-ranked item is the final recommendation.\n",
"5. Repeat the process for the next user.\n",
"\n",
"Given the below example for a random user 123:\n",
"Given the below example for a random user **123**:\n",
"\n",
"| Rank | Model 1 | Model 2 | Model 3 | Allocated Points (N - rank) |\n",
"|------|---------|---------|---------|-----------------------------|\n",
@@ -1835,7 +1846,8 @@
"id": "142db187",
"metadata": {},
"source": [
"## 3. Adding more models to the Borda Count ensemble\n",
"## 3. Further Ensembling by Adding More Models\n",
"<a id='further-ensembling'></a>\n",
"\n",
"We can easily extend the ensemble by training more models and adding their predictions. One approach is to train the same model with different initializations by varying the random seed (for example, `seed=123`). Some of the resulting models may perform better for one set of users, while others perform better for another set.\n",
"\n",
@@ -2598,7 +2610,8 @@
"id": "2c286769",
"metadata": {},
"source": [
"## 4. Model Ensembling via Regression Models"
"## 4. Ensembling with Regression Models\n",
"<a id='ensembling-with-regression-models'></a>"
]
},
{
@@ -3305,22 +3318,22 @@
"id": "c6ac3d4e",
"metadata": {},
"source": [
"## 5. Further Comparison"
"## 5. Further Evaluation\n",
"<a id='evaluation'></a>"
]
},
{
"cell_type": "markdown",
"id": "2d02a252",
"metadata": {},
"source": [
"Based on section 4, we are only able to eyeball the results and compare them visually based on a **single user**.\n",
"\n",
"We have only compared the genre distributions of the recommendations for a **single user**.\n",
"In Section 4, we could only view the recommendation distributions and compare them visually for a **single user**.\n",
"\n",
"What if we want to compare the models based on **multiple users**?\n",
"We can do so by calculating the score of all users and item combinations, then the **Precision** and **Recall** of the predictions for each model.\n",
"\n",
"Let's first create a dataframe to store the results of the models given **all users and items**."
"> We can do so by calculating the scores of all user-item combinations, then computing the **Precision** and **Recall** of each model's predictions.\n",
"\n",
"We will also create a dataframe `rank_df` to store the results of the models for **all users and items**."
]
},
{
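
The Borda Count steps listed in the Section 2 markdown cell can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the notebook's actual implementation: the function name `borda_count` and the sample rankings are hypothetical, and points are allocated as N - rank to match the "Allocated Points (N - rank)" column header in that cell.

```python
from collections import defaultdict


def borda_count(rankings):
    """Combine several ranked item lists (best item first) into one ranking.

    Hypothetical helper: each item at 1-based position `rank` in a list of
    N items receives N - rank points; points are summed across all lists.
    """
    points = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for rank, item in enumerate(ranking, start=1):
            points[item] += n - rank
    # Highest total first; ties are broken by item id so the output is deterministic.
    return sorted(points, key=lambda item: (-points[item], item))


# Hypothetical top-5 lists from three models for a single user.
model_rankings = [
    ["A", "B", "C", "D", "E"],
    ["B", "A", "C", "E", "D"],
    ["A", "C", "B", "D", "E"],
]
ensemble = borda_count(model_rankings)
print(ensemble)
```

Summing points rather than raw ranks means a higher total is better, so the final sort is descending. Repeating the call for each user gives the per-user ensemble ranking.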
