
Project 3: Predicting Housing Price

  • Writer: Hoanglan Nguyen
  • Jan 11, 2022
  • 3 min read

Updated: Apr 4, 2022

Anyone who wants to buy a home first has to check housing prices. The problem is that houses have many features that could affect the sale price. To find out how these features affect the price, I will use regression to predict housing prices using this housing price dataset from Kaggle. I will use train.csv as the training dataset and test.csv as the testing dataset. The dataset was compiled by Dean De Cock for use in data science education and has 79 features describing houses in Ames, Iowa. To achieve this goal, I will run multiple experiments on the dataset to see how different techniques change the result.


What is regression and how does it work?

Regression is a statistical method used in finance, investing, and other disciplines that attempts to determine the strength and character of the relationship between one dependent variable and a series of other variables, known as independent variables. Regression helps narrow down a dataset by showing which features contribute to the outcome. One example is linear regression, which establishes a linear relationship between the independent and dependent variables. In linear regression, the dependent variable is a continuous variable.


Linear Regression is a predictive model used for finding the linear relationship between a dependent variable and one or more independent variables.

Linear regression predicts the value of a dependent variable (y) from a given independent variable (x). In other words, this technique finds a linear relationship between x (input) and y (output). In the image above, the x variable is work experience and the y variable is a person's salary, and the regression line is the best-fit line for the model.


The simple linear model from algebra: Y = mX + b.

  • The "m" is the slope

  • The "b" is the y-intercept

The linear regression: Ŷ = b0 + b1X.

The symbol Ŷ indicates a predicted value for Y. The value b0 is the Y-intercept and b1 is the slope. Compare this notation to the traditional slope-intercept form.
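To make b0 and b1 concrete, here is a minimal sketch of fitting a line by least squares in plain Python. The experience and salary numbers are made up for illustration and are not from the housing dataset, and the function names fit_line and predict are my own.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line always passes through (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict(b0, b1, x):
    """Predicted value y_hat for a single input x."""
    return b0 + b1 * x


# Made-up data: years of work experience and salary in thousands of dollars.
experience = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]

b0, b1 = fit_line(experience, salary)
print("intercept b0:", b0)
print("slope b1:", b1)
print("predicted salary at 6 years:", predict(b0, b1, 6))
```

Because the made-up points lie exactly on a line, the fit recovers an intercept of 30 and a slope of 5. With real data the points scatter around the line, and b0 and b1 are the values that make the squared errors as small as possible.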


Experiment 1: Data Understanding

Before getting into the real fun part, I first briefly looked through the training and test csv files on Kaggle to understand the columns and data types.

Then, I started the Jupyter notebook by importing all the libraries I needed for the three experiments. Running "train.shape" shows that the training dataset has 81 columns: the 79 features plus the Id column and the SalePrice target. Then, I ran code to check for null values in the dataset.

The null check shows that many attributes have a large number of null values. The attributes that stand out the most are LotFrontage, Alley, FireplaceQu, PoolQC, Fence, and MiscFeature.
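The shape and null checks described above can be sketched with pandas roughly like this. The small DataFrame is a made-up stand-in so the example runs on its own; in the notebook, train would presumably come from pd.read_csv("train.csv"), and the exact code I ran may differ.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for train.csv (assumption, not the real data).
# In the notebook this would be: train = pd.read_csv("train.csv")
train = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "Alley": [np.nan, np.nan, "Pave", np.nan],
    "PoolQC": [np.nan, np.nan, np.nan, np.nan],
    "SalePrice": [200000, 180000, 220000, 140000],
})

# (number of rows, number of columns)
print(train.shape)

# Count nulls per column, keep only columns that have any,
# and list the worst offenders first.
null_counts = train.isnull().sum()
null_counts = null_counts[null_counts > 0].sort_values(ascending=False)
print(null_counts)
```

Sorting the counts puts the columns with the most missing values at the top, which makes attributes like PoolQC and Alley easy to spot.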


Then, I created a heatmap to see which attributes have strong relationships with housing price. For example, SalePrice has its strongest relationships with OverallQual, GrLivArea, GarageCars, and 1stFlrSF. Other attributes may also be related to SalePrice, but these stand out the most. Similarly, OverallQual has its strongest relationships with GarageCars, GrLivArea, GarageArea, and YearBuilt.

The dataset has many attributes, so the heatmap is crowded. According to the heatmap, SalePrice is related to many other features, so I will create another graph for a better view, with SalePrice at the top and the other features ranked beneath it.
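One way to get that ranked view is to take the SalePrice column of the correlation matrix and sort it. This is only a sketch on made-up rows, assuming the heatmap is based on the pandas correlation matrix; the column names match the dataset but the values do not.

```python
import pandas as pd

# Made-up rows for illustration (assumption, not the real train.csv).
train = pd.DataFrame({
    "OverallQual": [7, 6, 7, 5, 8, 4],
    "GrLivArea": [1710, 1262, 1786, 1100, 2198, 900],
    "YrSold": [2008, 2007, 2008, 2006, 2008, 2009],
    "SalePrice": [200000, 180000, 220000, 140000, 250000, 110000],
})

# Correlation only makes sense for numeric columns.
numeric = train.select_dtypes(include="number")
corr = numeric.corr()

# Rank every feature by its correlation with SalePrice, highest first.
# SalePrice itself comes out on top with a correlation of 1.
ranked = corr["SalePrice"].sort_values(ascending=False)
print(ranked)
```

The same corr matrix is what a heatmap function would draw, so the ranked list is just a one-column slice of the heatmap that is easier to read.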


Experiment 1: Pre-processing

For this first experiment, I will use the ones that have the strongest relationship with the


Experiment 1: Modeling

To prepare for modeling,


Experiment 1: Evaluation

In this evaluation,


Experiment 2

wip


Experiment 3

wip


References

https://byuistats.github.io/BYUI_M221_Book/Lesson22.html

 
 
 
