
nickhamlin/mids_261_homework


MIDS 261 - Machine Learning at Scale

This repo contains assignments for UC Berkeley's Machine Learning at Scale course. Content is organized by assignment and covers the following concepts:

Class Assignments

  • Week 1: MapReduce wordcount on the command line
  • Week 2: Wordcount in Python/Hadoop Streaming
  • Week 4: Basics in MrJob
  • Week 5: Reducer-side Inner Join
  • Week 10: Basics in Spark
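The Week 1 and Week 2 wordcount exercises follow the classic map, shuffle/sort, reduce pattern. The block below is a minimal sketch, not code taken from this repo. It assumes whitespace tokenization with lowercasing, the function names `mapper`, `reducer`, and `wordcount` are hypothetical, and Python's `sorted()` stands in for the shuffle/sort that Hadoop Streaming performs between the two stages:

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map stage: emit (word, 1) for every lowercased, whitespace-separated token."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce stage: sum counts per word.

    Assumes pairs arrive sorted by key, as they would after Hadoop's shuffle,
    so all records for one word are adjacent and groupby can collect them.
    """
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def wordcount(lines: Iterable[str]) -> Dict[str, int]:
    # sorted() plays the role of the shuffle/sort phase between map and reduce.
    shuffled = sorted(mapper(lines))
    return dict(reducer(shuffled))
```

In an actual Hadoop Streaming job the mapper and reducer would be separate scripts that read stdin and write tab-separated `word\tcount` lines; keeping them as generators here simply makes the data flow easy to run and check locally.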

Homework Assignments

  • HW 1: Naive Bayes spam filter in command line using Enron emails
  • HW 2: Naive Bayes spam filter in Hadoop Streaming using Enron emails
  • HW 3: Shopping Cart Analysis, Pairs vs Stripes, Secondary Sort, Custom Partitioning
  • HW 4: Tweet clustering via KMeans in MrJob
  • HW 5: Large-scale joins in MrJob, EDA and synonym detection in the Google n-gram corpus (on AWS)
  • HW 6: Weighted OLS using Gradient Descent, Gaussian Mixture Models
  • HW 7: Distributed Shortest-Path in MrJob (on AWS) using English Wikipedia
  • HW 9: Distributed PageRank in MrJob (on AWS) using English Wikipedia
  • HW 10: KMeans, Ridge/Lasso Regression in MLlib
  • HW 11: Logistic Regression, SVM in both base Spark and MLlib (in a Zeppelin notebook)
  • HW 12: Structured walkthrough of feature hashing, one-hot encoding, and click-through prediction using the Criteo dataset
  • HW 13: PageRank in Spark (using Wikipedia) and click-through prediction at scale (using Criteo data)
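PageRank shows up twice above (HW 9 in MrJob, HW 13 in Spark). The distributed versions parallelize one iteration at a time: each page "maps" its rank out along its links, and each page "reduces" the contributions it receives. The block below is a single-machine sketch of that iteration, not the homework code. It assumes a damping factor of 0.85, a fixed iteration count instead of a convergence test, and uniform redistribution of rank from dangling nodes (pages with no out-links); the function name `pagerank` and its parameters are hypothetical:

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over an adjacency list {node: [out-links]}."""
    # Collect every node, including ones that only appear as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # "Map" step: each node sends rank / out-degree to each neighbor.
        contributions = {node: 0.0 for node in nodes}
        dangling_mass = 0.0
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = ranks[node] / len(targets)
                for target in targets:
                    contributions[target] += share
            else:
                # Dangling nodes have nowhere to send rank, so their mass is
                # spread uniformly over all nodes below.
                dangling_mass += ranks[node]
        # "Reduce" step: teleport term plus damped incoming and dangling mass.
        ranks = {
            node: (1 - damping) / n
            + damping * (contributions[node] + dangling_mass / n)
            for node in nodes
        }
    return ranks
```

Because the teleport and dangling terms together redistribute exactly the mass that is not passed along links, the ranks keep summing to 1 after every iteration, which is a handy sanity check when the same loop is split across mappers and reducers.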
