Merge pull request #263 from NBISweden/olga-lm
Fix typos and notations
olgadet authored Apr 29, 2024
2 parents c81da5e + e1ebe46 commit af27691
Showing 41 changed files with 235 additions and 88 deletions.

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion session-lm-presentation/.quarto/xref/1eca1403
@@ -1 +1 @@
{"headings":["we-will-learn","why-linear-models","statistical-vs.-deterministic-relationship","statistical-vs.-deterministic-relationship-1","statistical-vs.-deterministic-relationship-2","statistical-vs.-deterministic-relationship-3","what-linear-models-are","what-linear-models-are-1","simple-linear-regression","simple-linear-regression-1","weight-and-plasma-volume","simple-linear-regression-2","simple-linear-regression-3","simple-linear-regression-4","simple-linear-regression-5","least-squares","least-squares-1","slope","intercept","hypothesis-testing","hypothesis-testing-1","hypothesis-testing-2","hypothesis-testing-3","vector-matrix-notations","vector-matrix-notations-1","vector-matrix-form-of-the-linear-model","vector-matrix-notations-2","least-squares-in-vector-matrix-notation","vector-matrix-notations-3","vector-matrix-notation","vector-matrix-notations-least-squares","vector-matrix-notations-least-squares-1","assessing-model-fit","r2-summary-of-the-fitted-model","r2-summary-of-the-fitted-model-1","r2-summary-of-the-fitted-model-2","r2-summary-of-the-fitted-model-3","r2","r2-and-correlation-coefficient","r2-1","r2adj","r2adj-1","checking-model-assumptions","the-assumptions-of-a-linear-model","the-assumptions-of-a-linear-model-1","checking-assumptions","checking-assumptions-1","exercises","linear-models-regression-and-classification-tasks","linear-models-in-ml-context","evaluating-linear-models","feature-selection","feature-selection-1","there-are-generally-three-main-groups-of-feature-selection-methods","regularized-regression","regularized-regression-1","regularized-regression-2","regularized-regression-3","bias-variance-trade-off","bias-variance-trade-off-1","bias-variance-trade-off-2","ridge-vs.-lasso","ridge-vs.-lasso-1","elastic-net","elastic-net-1","generalized-linear-models","why-generalized-linear-models","logistic-regression","logistic-regression-1","logistic-regression-2","logistic-regression-3","logistic-regression-4","logistic-regression-5","logistic-regression-6","common-glm-models","logistic-lasso","logistic-lasso-1","common-cases"],"entries":[{"key":"exm-vector-matrix-notation","order":{"section":[1,24,0,0,0,0,0],"number":3},"caption":"vector-matrix-notation"},{"key":"thm-lss-vector-matrix","order":{"section":[1,23,0,0,0,0,0],"number":1},"caption":"Least squares in vector-matrix notation"},{"key":"fig-obesity","order":{"section":[6,2,0,0,0,0,0],"number":4},"caption":"?(caption)"},{"key":"eq-ridge2","order":{"section":[5,12,0,0,0,0,0],"number":6},"caption":""},{"key":"exm-hypothesis-testing","order":{"section":[1,20,0,0,0,0,0],"number":2},"caption":"Hypothesis testing"},{"key":"eq-elastic-net","order":{"section":[5,14,0,0,0,0,0],"number":8},"caption":""},{"key":"eq-lasso","order":{"section":[5,12,0,0,0,0,0],"number":7},"caption":""},{"key":"eq-lm-no-error","order":{"section":[1,11,0,0,0,0,0],"number":1},"caption":""},{"key":"fig-reg-errors","order":{"section":[1,15,0,0,0,0,0],"number":2},"caption":"Scatter plot of the data shows that high plasma volume tends to be associated with high weight and vice versa. Linear regression gives the equation of the straight line (red) that best describes how the outcome changes with a change of exposure variable. 
Blue lines represent error terms, the vertical distances to the regression line"},{"key":"def-r2","order":{"section":[2,4,0,0,0,0,0],"number":2},"caption":"R^2"},{"key":"def-vector-matrix-lm","order":{"section":[1,22,0,0,0,0,0],"number":1},"caption":"vector matrix form of the linear model"},{"key":"eq-lm","order":{"section":[5,6,0,0,0,0,0],"number":3},"caption":""},{"key":"fig-bias-variance","order":{"section":[5,11,0,0,0,0,0],"number":3},"caption":"Squared bias, variance and test mean squared error for ridge regression predictions on a simulated data as a function of lambda demonstrating bias-variance trade-off. Based on Gareth James et. al, A Introduction to statistical learning"},{"key":"eq-ridge","order":{"section":[5,7,0,0,0,0,0],"number":5},"caption":""},{"key":"exm-simple-lm","order":{"section":[1,9,0,0,0,0,0],"number":1},"caption":"Weight and plasma volume"},{"key":"fig-lm-example-reg","order":{"section":[1,11,0,0,0,0,0],"number":1},"caption":"Scatter plot of the data shows that high plasma volume tends to be associated with high weight and vice verca. Linear regression gives the equation of the straight line (red) that best describes how the outcome changes (increase or decreases) with a change of exposure variable"},{"key":"thm-r2adj","order":{"section":[2,6,0,0,0,0,0],"number":3},"caption":"R^2(adj)"},{"key":"thm-r2","order":{"section":[2,5,0,0,0,0,0],"number":2},"caption":"R^2"}]}
{"headings":["we-will-learn","why-linear-models","statistical-vs.-deterministic-relationship","statistical-vs.-deterministic-relationship-1","statistical-vs.-deterministic-relationship-2","statistical-vs.-deterministic-relationship-3","what-linear-models-are","what-linear-models-are-1","simple-linear-regression","simple-linear-regression-1","weight-and-plasma-volume","simple-linear-regression-2","simple-linear-regression-3","simple-linear-regression-4","simple-linear-regression-5","least-squares","least-squares-1","slope","intercept","hypothesis-testing","hypothesis-testing-1","hypothesis-testing-2","hypothesis-testing-3","vector-matrix-notations","vector-matrix-notations-1","vector-matrix-form-of-the-linear-model","vector-matrix-notations-2","least-squares-in-vector-matrix-notation","vector-matrix-notations-3","vector-matrix-notation","vector-matrix-notations-least-squares","vector-matrix-notations-least-squares-1","assessing-model-fit","r2-summary-of-the-fitted-model","r2-summary-of-the-fitted-model-1","r2-summary-of-the-fitted-model-2","r2-summary-of-the-fitted-model-3","r2","r2-and-correlation-coefficient","r2-1","r2adj","r2adj-1","checking-model-assumptions","the-assumptions-of-a-linear-model","the-assumptions-of-a-linear-model-1","checking-assumptions","checking-assumptions-1","exercises","linear-models-regression-and-classification-tasks","linear-models-in-ml-context","evaluating-linear-models","feature-selection","feature-selection-1","there-are-generally-three-main-groups-of-feature-selection-methods","regularized-regression","regularized-regression-1","regularized-regression-2","regularized-regression-3","bias-variance-trade-off","bias-variance-trade-off-1","bias-variance-trade-off-2","ridge-vs.-lasso","ridge-vs.-lasso-1","elastic-net","elastic-net-1","generalized-linear-models","why-generalized-linear-models","logistic-regression","logistic-regression-1","logistic-regression-2","logistic-regression-3","logistic-regression-4","logistic-regression-5","logistic-regression-6","common-glm-models","logistic-lasso","logistic-lasso-1","common-cases"],"entries":[{"order":{"section":[2,6,0,0,0,0,0],"number":3},"caption":"R^2(adj)","key":"thm-r2adj"},{"order":{"section":[1,15,0,0,0,0,0],"number":2},"caption":"Scatter plot of the data shows that high plasma volume tends to be associated with high weight and vice versa. Linear regression gives the equation of the straight line (red) that best describes how the outcome changes with a change of exposure variable. Blue lines represent error terms, the vertical distances to the regression line","key":"fig-reg-errors"},{"order":{"section":[5,12,0,0,0,0,0],"number":5},"caption":"","key":"eq-lasso"},{"order":{"section":[1,20,0,0,0,0,0],"number":2},"caption":"Hypothesis testing","key":"exm-hypothesis-testing"},{"order":{"section":[1,11,0,0,0,0,0],"number":1},"caption":"","key":"eq-lm-no-error"},{"order":{"section":[5,6,0,0,0,0,0],"number":3},"caption":"","key":"eq-ridge"},{"order":{"section":[5,14,0,0,0,0,0],"number":6},"caption":"","key":"eq-elastic-net"},{"order":{"section":[1,11,0,0,0,0,0],"number":1},"caption":"Scatter plot of the data shows that high plasma volume tends to be associated with high weight and vice verca. 
Linear regression gives the equation of the straight line (red) that best describes how the outcome changes (increase or decreases) with a change of exposure variable","key":"fig-lm-example-reg"},{"order":{"section":[5,11,0,0,0,0,0],"number":3},"caption":"Squared bias, variance and test mean squared error for ridge regression predictions on a simulated data as a function of lambda demonstrating bias-variance trade-off. Based on Gareth James et. al, A Introduction to statistical learning","key":"fig-bias-variance"},{"order":{"section":[1,13,0,0,0,0,0],"number":2},"caption":"","key":"eq-lm"},{"order":{"section":[5,12,0,0,0,0,0],"number":4},"caption":"","key":"eq-ridge2"},{"order":{"section":[6,2,0,0,0,0,0],"number":4},"caption":"?(caption)","key":"fig-obesity"},{"order":{"section":[2,5,0,0,0,0,0],"number":2},"caption":"R^2","key":"thm-r2"},{"order":{"section":[1,22,0,0,0,0,0],"number":1},"caption":"vector matrix form of the linear model","key":"def-vector-matrix-lm"},{"order":{"section":[2,4,0,0,0,0,0],"number":2},"caption":"R^2","key":"def-r2"},{"order":{"section":[1,23,0,0,0,0,0],"number":1},"caption":"Least squares in vector-matrix notation","key":"thm-lss-vector-matrix"},{"order":{"section":[1,9,0,0,0,0,0],"number":1},"caption":"Weight and plasma volume","key":"exm-simple-lm"},{"order":{"section":[1,24,0,0,0,0,0],"number":3},"caption":"vector-matrix-notation","key":"exm-vector-matrix-notation"}]}
2 changes: 1 addition & 1 deletion session-lm-presentation/renv.lock
@@ -1,6 +1,6 @@
{
"R": {
"Version": "4.2.1",
"Version": "4.3.1",
"Repositories": [
{
"Name": "CRAN",
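For context (not part of the diff): the R version recorded in this lockfile is typically refreshed by re-snapshotting the project after upgrading R — a sketch assuming the standard renv workflow:

    # Run inside the project after switching to R 4.3.1; renv rewrites renv.lock,
    # including the "R": {"Version": ...} field shown above
    renv::snapshot()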
16 changes: 8 additions & 8 deletions session-lm-presentation/session-lm-presentation.html
@@ -732,7 +732,7 @@ <h2>Simple linear regression</h2>
<li class="fragment"><span class="math inline">\(2.638\)</span> is not exactly the as same as <span class="math inline">\(2.75\)</span>, the first measurement we have in our dataset, i.e.&nbsp;<span class="math inline">\(2.75 - 2.638 = 0.112 \neq 0\)</span>.</li>
<li class="fragment">We thus add to the previous equation (<a href="#/simple-linear-regression-3">Equation&nbsp;1</a>) an <strong>error term</strong> to account for this and now we can write our <strong>simple regression model</strong> more formally as:</li>
<li class="fragment"><span id="eq-lm"><span class="math display">\[Y_i = \alpha + \beta \cdot x_i + \epsilon_i \qquad(2)\]</span></span> where:</li>
<li class="fragment"><span class="math inline">\(x\)</span>: is called: exposure variable, explanatory variable, dependent variable, predictor, covariate</li>
<li class="fragment"><span class="math inline">\(x\)</span>: is called: exposure variable, explanatory variable, independent variable, predictor, covariate</li>
<li class="fragment"><span class="math inline">\(y\)</span>: is called: response, outcome, dependent variable</li>
<li class="fragment"><span class="math inline">\(\alpha\)</span> and <span class="math inline">\(\beta\)</span> are <strong>model coefficients</strong></li>
<li class="fragment">and <span class="math inline">\(\epsilon_i\)</span> is an <strong>error terms</strong></li>
@@ -1402,11 +1402,11 @@ <h2>Regularized regression</h2>
<h2>Regularized regression</h2>
<p><em>Ridge regression</em> <br></p>
<ul>
<li>Previously we saw that the least squares fitting procedure estimates model coefficients <span class="math inline">\(\beta_0, \beta_1, \cdots, \beta_p\)</span> using the values that minimize the residual sum of squares: <span id="eq-lm"><span class="math display">\[RSS = \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p}\beta_jx_{ij} \right)^2 \qquad(3)\]</span></span></li>
<li>Previously we saw that the least squares fitting procedure estimates model coefficients <span class="math inline">\(\beta_0, \beta_1, \cdots, \beta_p\)</span> using the values that minimize the residual sum of squares: <span class="math display">\[RSS = \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p}\beta_jx_{ij} \right)^2\]</span></li>
</ul>
<div class="fragment">
<ul>
<li>In <strong>regularized regression</strong> the coefficients are estimated by minimizing a slightly different quantity. Specifically, in <strong>Ridge regression</strong> we estimate <span class="math inline">\(\hat\beta^{L}\)</span> that minimizes <span id="eq-ridge"><span class="math display">\[\sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p}\beta_jx_{ij} \right)^2 + \lambda \sum_{j=1}^{p}\beta_j^2 = RSS + \lambda \sum_{j=1}^{p}\beta_j^2 \qquad(4)\]</span></span></li>
<li>In <strong>regularized regression</strong> the coefficients are estimated by minimizing a slightly different quantity. Specifically, in <strong>Ridge regression</strong> we estimate <span class="math inline">\(\hat\beta^{L}\)</span> that minimizes <span id="eq-ridge"><span class="math display">\[\sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p}\beta_jx_{ij} \right)^2 + \lambda \sum_{j=1}^{p}\beta_j^2 = RSS + \lambda \sum_{j=1}^{p}\beta_j^2 \qquad(3)\]</span></span></li>
</ul>
<p>where:</p>
<p><span class="math inline">\(\lambda \ge 0\)</span> is a <strong>tuning parameter</strong> to be determined separately e.g.&nbsp;via cross-validation</p>
@@ -1415,8 +1415,8 @@ <h2>Regularized regression</h2>
<section id="regularized-regression-2" class="slide level2">
<h2>Regularized regression</h2>
<p><em>Ridge regression</em> <br></p>
<p><span id="eq-ridge"><span class="math display">\[\sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p}\beta_jx_{ij} \right)^2 + \lambda \sum_{j=1}^{p}\beta_j^2 = RSS + \lambda \sum_{j=1}^{p}\beta_j^2 \qquad(5)\]</span></span></p>
<p><a href="#/regularized-regression-1">Equation&nbsp;5</a> trades two different criteria:</p>
<p><span class="math display">\[\sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p}\beta_jx_{ij} \right)^2 + \lambda \sum_{j=1}^{p}\beta_j^2 = RSS + \lambda \sum_{j=1}^{p}\beta_j^2\]</span></p>
<p><a href="#/regularized-regression-1">Equation&nbsp;3</a> trades two different criteria:</p>
<ul>
<li>ridge regression seeks coefficient estimates that fit the data well, by making RSS small</li>
<li>however, the second term <span class="math inline">\(\lambda \sum_{j=1}^{p}\beta_j^2\)</span>, called the <strong>shrinkage penalty</strong>, is small when <span class="math inline">\(\beta_1, \cdots, \beta_p\)</span> are close to zero, so it has the effect of <strong>shrinking</strong> the estimates of <span class="math inline">\(\beta_j\)</span> towards zero.</li>
@@ -1473,8 +1473,8 @@ <h2>Bias-variance trade-off</h2>
<img data-src="images/bias-variance.png" style="width:100.0%" class="r-stretch quarto-figure-center"><p class="caption">Figure&nbsp;3: Squared bias, variance and test mean squared error for ridge regression predictions on a simulated data as a function of lambda demonstrating bias-variance trade-off. Based on Gareth James et. al, A Introduction to statistical learning</p></section>
<section id="ridge-vs.-lasso" class="slide level2">
<h2>Ridge vs.&nbsp;Lasso</h2>
<p>In <strong>Ridge</strong> regression we minimize: <span id="eq-ridge2"><span class="math display">\[\sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p}\beta_jx_{ij} \right)^2 + \lambda \sum_{j=1}^{p}\beta_j^2 = RSS + \lambda \sum_{j=1}^{p}\beta_j^2 \qquad(6)\]</span></span> where <span class="math inline">\(\lambda \sum_{j=1}^{p}\beta_j^2\)</span> is also known as the <strong>L2</strong> regularization term, or <span class="math inline">\(l_2\)</span> penalty</p>
<p>In <strong>Lasso</strong> regression, that is, Least Absolute Shrinkage and Selection Operator regression, we change the penalty term to the absolute value of the regression coefficients: <span id="eq-lasso"><span class="math display">\[\sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p}\beta_jx_{ij} \right)^2 + \lambda \sum_{j=1}^{p}|\beta_j| = RSS + \lambda \sum_{j=1}^{p}|\beta_j| \qquad(7)\]</span></span> where <span class="math inline">\(\lambda \sum_{j=1}^{p}|\beta_j|\)</span> is also known as the <strong>L1</strong> regularization term, or <span class="math inline">\(l_1\)</span> penalty</p>
<p>In <strong>Ridge</strong> regression we minimize: <span id="eq-ridge2"><span class="math display">\[\sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p}\beta_jx_{ij} \right)^2 + \lambda \sum_{j=1}^{p}\beta_j^2 = RSS + \lambda \sum_{j=1}^{p}\beta_j^2 \qquad(4)\]</span></span> where <span class="math inline">\(\lambda \sum_{j=1}^{p}\beta_j^2\)</span> is also known as the <strong>L2</strong> regularization term, or <span class="math inline">\(l_2\)</span> penalty</p>
<p>In <strong>Lasso</strong> regression, that is, Least Absolute Shrinkage and Selection Operator regression, we change the penalty term to the absolute value of the regression coefficients: <span id="eq-lasso"><span class="math display">\[\sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p}\beta_jx_{ij} \right)^2 + \lambda \sum_{j=1}^{p}|\beta_j| = RSS + \lambda \sum_{j=1}^{p}|\beta_j| \qquad(5)\]</span></span> where <span class="math inline">\(\lambda \sum_{j=1}^{p}|\beta_j|\)</span> is also known as the <strong>L1</strong> regularization term, or <span class="math inline">\(l_1\)</span> penalty</p>
<p>Lasso regression was introduced to help with model interpretation. With Ridge regression we improve model performance, but unless <span class="math inline">\(\lambda = \infty\)</span> all beta coefficients are non-zero, hence all variables remain in the model. By using the <span class="math inline">\(l_1\)</span> penalty we can force some of the coefficient estimates to be exactly equal to 0, hence performing <strong>variable selection</strong></p>
</section>
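A small sketch (again assuming glmnet) of the contrast described above: at a comparable penalty strength, ridge keeps every coefficient non-zero while lasso can set some exactly to zero:

    library(glmnet)
    x <- as.matrix(mtcars[, -1])   # illustrative predictors
    y <- mtcars$mpg

    ridge <- glmnet(x, y, alpha = 0)   # l2 penalty: shrinks but (for finite lambda) never zeroes
    lasso <- glmnet(x, y, alpha = 1)   # l1 penalty: can set coefficients exactly to zero

    coef(ridge, s = 1)   # all predictors retained
    coef(lasso, s = 1)   # some coefficients exactly 0 -> variable selection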
<section id="ridge-vs.-lasso-1" class="slide level2">
@@ -1508,7 +1508,7 @@ <h2>Ridge vs.&nbsp;Lasso</h2>
<section id="elastic-net" class="slide level2">
<h2>Elastic Net</h2>
<p><br></p>
<p><strong>Elastic Net</strong> uses both L1 and L2 penalties to find a middle ground, performing both parameter shrinkage and variable selection. <span id="eq-elastic-net"><span class="math display">\[\sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p}\beta_jx_{ij} \right)^2 + \lambda \sum_{j=1}^{p}|\beta_j| + \lambda \sum_{j=1}^{p}\beta_j^2 = RSS + \lambda \sum_{j=1}^{p}|\beta_j| + \lambda \sum_{j=1}^{p}\beta_j^2 \qquad(8)\]</span></span></p>
<p><strong>Elastic Net</strong> uses both L1 and L2 penalties to find a middle ground, performing both parameter shrinkage and variable selection. <span id="eq-elastic-net"><span class="math display">\[\sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p}\beta_jx_{ij} \right)^2 + \lambda \sum_{j=1}^{p}|\beta_j| + \lambda \sum_{j=1}^{p}\beta_j^2 = RSS + \lambda \sum_{j=1}^{p}|\beta_j| + \lambda \sum_{j=1}^{p}\beta_j^2 \qquad(6)\]</span></span></p>

<img data-src="session-lm-presentation_files/figure-revealjs/elastic-net-1.png" width="960" class="r-stretch quarto-figure-center"><p class="caption">Example of Elastic Net regression to model BMI using age, chol, hdl and glucose variables: model coefficients are plotted over a range of lambda values and alpha value 0.1, showing the changes of model coefficients as a function of lambda being somewhere between those for Ridge and Lasso regression.</p></section>
<section id="elastic-net-1" class="slide level2">