> Housing <- read.table("/Users/erindane/Desktop/R Studios /Table2.1HousePrices-NoID.csv",
+ header=TRUE, stringsAsFactors=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
From this histogram, the distribution of Price appears approximately normal: it is roughly centred with no obvious skewness. A histogram alone is not conclusive, so the Q-Q plot, boxplot and Shapiro-Wilk test that follow are used for further analysis.
> with(Housing, Hist(Price, scale="frequency", breaks="Sturges", col="darkgray"))
> with(Housing, qqPlot(Price, dist="norm", id=list(method="y", n=2, labels=rownames(Housing))))
[1] 104 117
> Boxplot( ~ Price, data=Housing, id=list(method="y"))
[1] "104"
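The boxplot labels row 104 as an outlier. As a rough illustration of how such points are usually flagged, here is a minimal sketch in base R, assuming Boxplot() here is the car version and applies the usual 1.5 * IQR rule. The vector `price` is a simulated, hypothetical stand-in for Housing$Price, not the real data.

```r
# Hypothetical stand-in for Housing$Price; the real file is not reproduced here.
set.seed(1)
price <- c(rnorm(120, mean = 300, sd = 50), 520)  # last value is deliberately extreme

# boxplot.stats() returns the values lying beyond 1.5 * IQR from the hinges.
flagged <- boxplot.stats(price)$out

# Row positions of the flagged values, comparable to the labels Boxplot() prints.
flagged_rows <- which(price %in% flagged)
flagged_rows
```

On the real data, replacing `price` with Housing$Price should reproduce the labelled rows if the same rule is in use.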
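Because the p-value (0.05836) is greater than 0.05, we fail to reject the null hypothesis that Price is normally distributed. This does not prove normality; it means the Shapiro-Wilk test found no significant evidence against it at the 5% level, and the p-value is only just above that threshold.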
> normalityTest(~Price, test="shapiro.test", data=Housing)
Shapiro-Wilk normality test
data: Price
W = 0.98023, p-value = 0.05836
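The decision rule applied to this output can be sketched as below. The p-value is copied from the output above; the 0.05 significance level is the conventional choice and is an assumption here, not something set by the test itself.

```r
# p-value copied from the Shapiro-Wilk output above.
p_value <- 0.05836
alpha   <- 0.05   # assumed conventional significance level

# The null hypothesis is that the data come from a normal distribution.
decision <- if (p_value < alpha) {
  "reject normality"
} else {
  "fail to reject normality"
}
decision
```

Failing to reject is weaker than confirming normality, which is why the plots are read alongside the test.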
First, a correlation matrix can be used to find initial correlation coefficients between Price and the other variables. The strongest correlations with Price are:
- Positive correlation between Price and SqFt (0.55)
- Positive correlation between Price and Bedrooms (0.53)
- Negative correlation between Price and Offers (-0.31)
> cor(Housing[,c("Bedrooms","Offers","Price","SqFt")], use="complete")
          Bedrooms     Offers      Price      SqFt
Bedrooms 1.0000000  0.1142706  0.5259261 0.4838071
Offers   0.1142706  1.0000000 -0.3136359 0.3369234
Price    0.5259261 -0.3136359  1.0000000 0.5529822
SqFt     0.4838071  0.3369234  0.5529822 1.0000000
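To make the reading of the Price row explicit, the sketch below ranks the three correlations by absolute size and squares them. The coefficients are copied from the matrix above; the names `r_price`, `ranked` and `r_squared` are illustrative only.

```r
# Correlations with Price, copied from the matrix above.
r_price <- c(Bedrooms = 0.5259261, Offers = -0.3136359, SqFt = 0.5529822)

# Rank by absolute value so a strong negative correlation is not overlooked.
ranked <- r_price[order(abs(r_price), decreasing = TRUE)]
ranked

# Squared correlation: the share of variance in Price that a simple
# linear relationship with each variable alone would account for.
r_squared <- ranked^2
round(r_squared, 3)
```

These are pairwise figures only; they do not account for overlap between the predictors, such as the 0.48 correlation between Bedrooms and SqFt.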
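From this scatter plot we can see a positive relationship between SqFt and Price (r = 0.55): as the square footage of a house increases, the price tends to increase. Note that the formula SqFt~Price places SqFt on the vertical axis and Price on the horizontal axis.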
> scatterplot(SqFt~Price, regLine=FALSE, smooth=FALSE, boxplots=FALSE, data=Housing)