DATA 606 Data Project Proposal

Data Preparation

The data set describes houses sold in Melbourne. Its columns include the number of rooms, the type of house (t = townhouse, h = house, u = unit), the sale price, county, suburb, and others.

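# Load the data; stringsAsFactors = TRUE keeps Type as a factor (no longer the default in R >= 4.0)
getwd()  # confirm that the working directory contains the CSV file
melHousData <- read.csv("Melbourne_housing_FULL.csv", header = TRUE, stringsAsFactors = TRUE)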


str(melHousData)
dim(melHousData)



head(melHousData)

Research question

You should phrase your research question in a way that matches up with the scope of inference your dataset allows for. Is the sale price of a house in Melbourne associated with its number of rooms (Rooms) and its type (Type)?
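One possible way to examine this question is a linear model with the price as the response and the two predictors as explanatory variables. The sketch below is only an illustration under stated assumptions: the data frame `toy` and all of its values are invented, and the column names Rooms, Type and Price are assumed to match the real data. It is not necessarily the analysis the project will use.

```r
# Toy stand-in for melHousData; every value here is invented for illustration
toy <- data.frame(
  Rooms = c(1, 2, 2, 3, 3, 4, 4, 5),
  Type  = factor(c("u", "u", "t", "h", "t", "h", "h", "h")),
  Price = c(400000, 520000, 650000, 900000, 780000, 1150000, 1250000, 1500000)
)

# Price as a function of a quantitative predictor (Rooms) and a
# qualitative predictor (Type); "h" is the reference level of Type
fit <- lm(Price ~ Rooms + Type, data = toy)

# Coefficient table: one slope for Rooms, one offset each for types t and u
print(summary(fit))
```

With the real data, the same formula would be fitted with `data = melHousData`; cases with a missing price are dropped by `lm` by default.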

Cases

What are the cases, and how many are there? There are 34857 cases with 19 variables. Each case is a house sold in Melbourne, recorded with its postcode, region, suburb, county, number of rooms, house type, and the price at which it was sold.
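The counts above can be checked directly in R, along with how many cases have no recorded price. The sketch below uses a small invented data frame `toy` so that it runs on its own; with the real data the same calls would be applied to melHousData, assuming the price column is named Price.

```r
# Toy stand-in for melHousData; values are invented for illustration only
toy <- data.frame(
  Rooms = c(2, 3, 4, 3, 1),
  Type  = factor(c("h", "u", "h", "t", "u")),
  Price = c(850000, NA, 1200000, 900000, 450000)
)

n_cases   <- nrow(toy)              # number of cases (rows)
n_vars    <- ncol(toy)              # number of variables (columns)
n_missing <- sum(is.na(toy$Price))  # cases without a recorded price

cat("cases:", n_cases, "variables:", n_vars, "missing Price:", n_missing, "\n")
```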

Data collection

Describe the method of data collection. The data set was obtained from Kaggle, where it is publicly available.

Type of study

What type of study is this (observational/experiment)? This is an observational study: we draw inferences from previously collected data and look for associations, without assigning any treatment.

Data Source

If you collected the data, state self-collected. If not, provide a citation/link. The data was taken from Kaggle: https://www.kaggle.com/anthonypino/melbourne-housing-market

Dependent Variable

What is the response variable? Is it quantitative or qualitative? The response variable is the sale price (Price), which is quantitative.

Independent Variable

You should have two independent variables, one quantitative and one qualitative. The independent variables are Rooms and house type (Type): Rooms is numerical and Type is categorical.

Relevant summary statistics

Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

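Rooms: The summary and the histogram show that the distribution is right skewed. The mean (3.031) is slightly above the median (3), at least three quarters of the houses have 4 or fewer rooms, and a long tail extends to a maximum of 16 rooms.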

summary(melHousData$Rooms)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   3.031   4.000  16.000

hist(melHousData$Rooms)

House Type: Houses (h) are the most common type with 23980 cases, followed by units (u) with 7297 cases; townhouses (t) are the least common with 3580 cases.

barplot(table(melHousData$Type))  # bar chart of counts; works whether Type is a factor or character

summary(melHousData$Type)

##     h     t     u 
## 23980  3580  7297

# Column names are case-sensitive: the price column is Price, not price
plot(x = melHousData$Rooms, y = melHousData$Price)
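The response variable can be summarized in the same way as the predictors, and a boxplot of price by type relates it to the qualitative predictor. The sketch below is a minimal example under stated assumptions: the data frame `toy` and its values are invented, and the column names Type and Price are assumed to match the real data.

```r
# Toy stand-in for melHousData; every value here is invented for illustration
toy <- data.frame(
  Type  = factor(c("u", "u", "t", "h", "t", "h", "h", "h")),
  Price = c(400000, 520000, 650000, 900000, 780000, 1150000, 1250000, 1500000)
)

# Five-number summary and mean of the response variable
print(summary(toy$Price))

# Side-by-side boxplots of Price for each house type
boxplot(Price ~ Type, data = toy, xlab = "Type", ylab = "Price")

# Median price per type; na.rm = TRUE skips cases with a missing price
med_by_type <- tapply(toy$Price, toy$Type, median, na.rm = TRUE)
print(med_by_type)
```

With the real data, replacing `toy` with `melHousData` would give the corresponding summary and plot for the full set of cases.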