Housing-Price-Prediction-MATLAB

Machine Learning (ML) model for price prediction using Linear Regression.

Description

This code was written in MATLAB for a Kaggle competition. The proposed ML model represents one possible solution to the housing price prediction problem.

Dataset

The dataset provided by Kaggle consists of 2919 samples with 79 features each. It is originally split into training and testing datasets with 1460 and 1459 samples, respectively. To assess the model's performance, the training dataset is further split into two subsets. One subset contains 86% of the original training data and is used to train the model; the second, called the validation subset, contains the remaining 14% and is used to validate it. The accuracy on the validation subset indicates how well the design performs.
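The 86%/14% hold-out split described above can be sketched as follows. This is a minimal illustration in plain Python, not the repository's MATLAB code; the function name `split_train_validation`, the shuffling, the fixed seed, and the rounding of the subset size are all assumptions, since the description does not say how the rows were assigned.

```python
import random

def split_train_validation(n_samples, train_fraction=0.86, seed=0):
    """Return (train_indices, validation_indices) for a shuffled hold-out split.

    Shuffling and the seed are assumptions made for this sketch.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)  # assumed rounding rule
    return indices[:n_train], indices[n_train:]

# 1460 is the size of the original Kaggle training set.
train_idx, val_idx = split_train_validation(1460)
print(len(train_idx), len(val_idx))  # prints: 1256 204
```

Under these assumptions, 1256 samples are used for training and 204 for validation; a different rounding rule would shift these counts by one.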

Data Preprocessing

Data preprocessing consists of the following steps:

Features with more than 50% missing data are removed;
All categorical features are transformed into numerical features;
The features are sorted so that they can be described linearly;
Some features with very low variance are deleted;
Outliers are deleted;
All missing values are found and replaced with either 0 or the most frequent value of the feature that contains them, wherever it makes sense;
The data is normalized, where necessary.
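Some of the steps above can be illustrated on a toy table. The following is a rough sketch in plain Python, not the repository's MATLAB code: the helper names, the integer encoding of categories, the min-max normalization, and the sample values are assumptions, and the order of the steps is simplified for convenience.

```python
from collections import Counter

def drop_sparse_columns(table, max_missing=0.5):
    """Keep only columns whose share of missing (None) values is at most max_missing."""
    return {name: col for name, col in table.items()
            if sum(v is None for v in col) / len(col) <= max_missing}

def fill_missing_with_mode(column):
    """Replace None with the most frequent non-missing value of the column."""
    present = [v for v in column if v is not None]
    mode = Counter(present).most_common(1)[0][0]
    return [mode if v is None else v for v in column]

def encode_categories(column):
    """Map each distinct category to an integer code (alphabetical order, an assumption)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(column)))}
    return [codes[v] for v in column]

def min_max_normalize(column):
    """Scale numeric values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Illustrative values only; None marks a missing entry.
table = {
    "PoolQC": [None, None, None, "Gd"],        # 75% missing, so it is dropped
    "Street": ["Pave", "Grvl", None, "Pave"],  # categorical with one gap
    "LotArea": [8450, 9600, 11250, 9550],      # numeric
}
table = drop_sparse_columns(table)
table["Street"] = encode_categories(fill_missing_with_mode(table["Street"]))
table["LotArea"] = min_max_normalize(table["LotArea"])
print(table)
```

Variance filtering and outlier removal are left out of the sketch because the description does not give the thresholds that were used.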

Linear Regression

The training dataset is used to calculate the parameters "w" and "b". This is done by solving the equation $\hat{w} = (\hat{X}^T \hat{X} - \varepsilon I_{68})^{-1} \hat{X}^T y$, where $\hat{X}$ is a modified version of the training dataset, $y$ is a vector that contains the labels (prices), and $\varepsilon$ is a small value, in our case 0.01. The term $\varepsilon I_{68}$ is included to ensure that the inverse $(\hat{X}^T \hat{X} - \varepsilon I_{68})^{-1}$ exists; without it, the matrix can be badly scaled and the results may be inaccurate. Solving the equation yields the optimal parameters $w^*$ and $b^*$. They define a linear model that predicts the prices of the houses from their features: $Y = w^T X + b$.
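The closed-form solve can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the repository's MATLAB code: the "modified" data matrix is assumed to be the feature matrix with a column of ones appended, so that the last entry of the solution is the bias b, and the function names are hypothetical. Note also that the sketch uses the conventional ridge form, in which the term εI is added to the Gram matrix (which makes it positive definite), whereas the equation in the text is written with a minus sign.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_regularized_linear_regression(features, prices, eps=0.01):
    """Return (w, b) from (Xh^T Xh + eps*I) w_hat = Xh^T y, with Xh = [X, 1] (assumed)."""
    xh = [row + [1.0] for row in features]  # append the bias column of ones
    d = len(xh[0])
    gram = [[sum(r[i] * r[j] for r in xh) + (eps if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * y for r, y in zip(xh, prices)) for i in range(d)]
    w_hat = solve_linear_system(gram, rhs)
    return w_hat[:-1], w_hat[-1]

def predict(features, w, b):
    """Evaluate the linear model w^T x + b for every row."""
    return [sum(wi * xi for wi, xi in zip(w, row)) + b for row in features]

# Toy data generated from price = 2*x1 + 3*x2 + 1 (illustrative only).
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [4.0, 3.0], [5.0, 5.0]]
y = [2 * x1 + 3 * x2 + 1 for x1, x2 in X]
w, b = fit_regularized_linear_regression(X, y, eps=1e-8)
print(w, b)
```

With a very small eps, the fitted parameters on this noise-free toy data come out close to the generating values (2, 3) and 1. Solving the linear system directly, rather than forming the inverse explicitly, is a design choice of the sketch.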

Results

Linear regression was chosen as the prediction method because the given data can be described linearly. The method turned out to be fairly accurate. Two evaluation measures are used: RMSLE (root mean squared logarithmic error) and the relative prediction error percentage. The final RMSLE was 14% and the relative prediction error was 12%. As can be seen, the final accuracy of the algorithm is approximately 87%.
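The two evaluation measures can be written down as follows. This is a sketch in plain Python under assumptions: RMSLE is taken in its common form using log(1 + x), and the relative prediction error is assumed to be the mean of |predicted - actual| / actual expressed as a percentage, since the exact definitions are not given here. The prices in the example are made up for illustration and are not results from the project.

```python
import math

def rmsle(predicted, actual):
    """Root mean squared logarithmic error, using log(1 + x) (assumed definition)."""
    squared = [(math.log1p(p) - math.log1p(a)) ** 2
               for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(actual))

def mean_relative_error_percent(predicted, actual):
    """Mean of |predicted - actual| / actual, in percent (assumed definition)."""
    ratios = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return 100.0 * sum(ratios) / len(actual)

# Illustrative prices only.
actual = [200000.0, 150000.0, 300000.0]
predicted = [210000.0, 135000.0, 300000.0]
print(rmsle(predicted, actual))
print(mean_relative_error_percent(predicted, actual))
```

Because RMSLE compares logarithms, it penalizes relative rather than absolute deviations, so an error of a given size weighs more on a cheap house than on an expensive one.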
