Revised markdown and description of tutorial

PreferredAI · Oct 25, 2024 · commit deac885
1 parent 9bee41b
Showing 1 changed file with 43 additions and 30 deletions.
diff --git a/tutorials/model_ensembling.ipynb b/tutorials/model_ensembling.ipynb
@@ -32,11 +32,16 @@
    "id": "9c98761e-37fc-4407-b451-d59a68aecd83",
    "metadata": {},
    "source": [
-    "This notebook provides an example of how to ensemble multiple recommendation models in Cornac.\n",
+    "This Jupyter Notebook demonstrates the process of ensembling multiple recommendation models using the Cornac library.\n",
     "\n",
-    "Ensemble models is a technique that combines the predictions of multiple models to produce a single prediction. The idea is that by combining the predictions of multiple models, we can improve the overall performance of the recommendation system.\n",
+    "Model ensembling is a technique that combines the predictions of multiple models to produce a single prediction that is often more accurate than that of any individual model. By leveraging the strengths of different models, we aim to improve the overall performance of the recommendation system.\n",
     "\n",
-    "We will use the MovieLens 100K dataset and ensemble 2 models."
+    "There are 5 main parts to this tutorial:\n",
+    "1. [**Introduction**](#introduction): We will get started by running a simple experiment with both the BPR and WMF models. We will also look at the distribution of the dataset.\n",
+    "2. [**Simple Model Ensembling**](#ensemble-models): We will ensemble the predictions of the BPR and WMF models using a simple technique called Borda Count.\n",
+    "3. [**Further Ensembling**](#further-ensembling): We will create variations of the WMF model and ensemble them using the same technique.\n",
+    "4. [**Ensembling with Regression Models**](#ensembling-with-regression-models): We will take the same WMF models and ensemble them with linear regression and random forest regression from the `scikit-learn` package.\n",
+    "5. [**Further Evaluation**](#evaluation): We will evaluate the performance of the ensemble models."
    ]
   },
   {
@@ -52,7 +57,16 @@
    "id": "82295d13-46da-4e8e-a052-b420beb969e8",
    "metadata": {},
    "source": [
-    "## 1. Setup"
+    "## 1. Introduction\n",
+    "<a id='introduction'></a>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "50e0c21e",
+   "metadata": {},
+   "source": [
+    "We will first run a simple experiment with both the BPR and WMF models, then look at the distribution of the dataset.\n"
    ]
   },
   {
@@ -167,9 +181,7 @@
    "id": "c5636df0-91c3-4b73-8c60-26d75b9bd6f6",
    "metadata": {},
    "source": [
-    "## 2. Prepare Experiment\n",
-    "\n",
-    "### 2.1 Loading Dataset\n",
+    "### 1.2 Loading Dataset\n",
     "\n",
     "First, we load the MovieLens 100K dataset."
    ]
@@ -219,12 +231,12 @@
    "id": "2d37011e-1384-42cb-8ea0-4667784f952a",
    "metadata": {},
    "source": [
-    "### 2.2 Training BPR and WMF models\n",
+    "### 1.3 Training BPR and WMF models\n",
     "\n",
     "We will train two models: \n",
     "\n",
-    "1. BPR (Bayesian Personalized Ranking)\n",
-    "2. WMF (Weighted Matrix Factorization)"
+    "1. **BPR (Bayesian Personalized Ranking)**\n",
+    "2. **WMF (Weighted Matrix Factorization)**"
    ]
   },
   {
@@ -334,25 +346,23 @@
    "source": [
     "Comparing Precision and Recall, both BPR and WMF are providing comparable results.\n",
     "\n",
-    "Let's move on to try to interpret these results by using the genres of movies that were recommended to us.\n",
-    "\n",
-    "Generally, we could assume that if an individual likes a particular film genre like 'Romance', the recommender system should provide more of such 'Romance' films."
+    "Let's move on to try to interpret these results by using the genres of movies that were recommended to us."
    ]
   },
   {
    "cell_type": "markdown",
    "id": "0fcc4831",
    "metadata": {},
    "source": [
-    "### 2.3 Interpreting Results"
+    "### 1.4 Interpreting Results"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "0be074f7",
    "metadata": {},
    "source": [
-    "##### Creating a Movie Genre Dataframe"
+    "##### 1.4.1 Creating a Movie Genre Dataframe"
    ]
   },
   {
@@ -560,7 +570,7 @@
    "id": "b476a6f8",
    "metadata": {},
    "source": [
-    "##### Creating Training Data Dataframe\n",
+    "##### 1.4.2 Creating Training Data Dataframe\n",
     "\n",
     "To get a sense of what data has been inserted into our model for training, let's count the genres of the training data used to train the model.\n",
     "\n",
@@ -602,7 +612,7 @@
    "id": "2a2f5733",
    "metadata": {},
    "source": [
-    "##### Filtering Training Data\n",
+    "##### 1.4.3 Filtering Training Data\n",
     "\n",
     "Let's filter based on a particular user to learn more about the user.\n",
     "\n",
@@ -762,7 +772,7 @@
    "id": "d242944c",
    "metadata": {},
    "source": [
-    "##### Interpreting Recommendations of BPR, WMF Models"
+    "##### 1.4.4 Interpreting Recommendations of BPR, WMF Models"
    ]
   },
   {
@@ -1163,7 +1173,7 @@
    "id": "9f02bf37",
    "metadata": {},
    "source": [
-    "##### Comparing Models by Genre Distribution"
+    "##### 1.4.5 Comparing Models by Genre Distribution"
    ]
   },
   {
@@ -1329,19 +1339,20 @@
    "id": "7cce45f7",
    "metadata": {},
    "source": [
-    "## 2. Simple model ensembling by Borda Count\n",
+    "## 2. Simple Model Ensembling by Borda Count\n",
+    "<a id='ensemble-models'></a>\n",
     "\n",
     "We will ensemble the two models using the Borda Count method. The Borda Count method is a simple voting method that ranks the items based on the sum of their ranks from each model.\n",
     "\n",
-    "Assuming that we have a list of 5 items, the Borda Count method works as follows:\n",
+    "Assuming that we have a list of **5 items**, the Borda Count method works as follows:\n",
     "\n",
     "1. For each model, rank the items from 1 to 5 based on the predicted scores.\n",
     "2. Convert each rank to points (N - rank, with N = 5) and sum the points of each item across all models.\n",
     "3. Sort the items by their total points in descending order.\n",
     "4. The top-ranked item is the final recommendation.\n",
     "5. Repeat the process for the next user.\n",
     "\n",
-    "Given the below example for a random user 123:\n",
+    "Given the below example for a random user **123**:\n",
     "\n",
     "| Rank | Model 1 | Model 2 | Model 3 | Allocated Points (N - rank) |\n",
     "|------|---------|---------|---------|-----------------------------|\n",
@@ -1835,7 +1846,8 @@
    "id": "142db187",
    "metadata": {},
    "source": [
-    "## 3. Adding more models to the Borda Count ensemble\n",
+    "## 3. Further Ensembling by Adding More Models\n",
+    "<a id='further-ensembling'></a>\n",
     "\n",
     "We can easily add more models to the ensemble by training them and including their predictions. One approach is to train the same model with different initializations using different random seeds (e.g., `seed=123`). Because each seed yields a slightly different model, some models could perform better for one set of users, while others could perform better for another set.\n",
     "\n",
@@ -2598,7 +2610,8 @@
    "id": "2c286769",
    "metadata": {},
    "source": [
-    "## 4. Model Ensembling via Regression Models"
+    "## 4. Ensembling with Regression Models\n",
+    "<a id='ensembling-with-regression-models'></a>"
    ]
   },
   {
@@ -3305,22 +3318,22 @@
    "id": "c6ac3d4e",
    "metadata": {},
    "source": [
-    "## 5. Further Comparison"
+    "## 5. Further Evaluation\n",
+    "<a id='evaluation'></a>"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "2d02a252",
    "metadata": {},
    "source": [
-    "Based on section 4, we are only able to eyeball the results and compare them visually based on a **single user**.\n",
-    "\n",
-    "We have only compared the genre distributions of the recommendations for a **single user**.\n",
+    "In Section 4, we could only inspect the recommendation distributions and compare them visually for a **single user**.\n",
     "\n",
     "What if we want to compare the models based on **multiple users**?\n",
-    "We can do so by calculating the score of all users and item combinations, then the **Precision** and **Recall** of the predictions for each model.\n",
     "\n",
-    "Let's first create a dataframe to store the results of the models given **all users and items**."
+    "> We can do so by calculating the scores of all user-item combinations, and then the **Precision** and **Recall** of each model's predictions.\n",
+    "\n",
+    "We will also create a dataframe `rank_df` to store the results of the models given **all users and items**."
    ]
   },
   {