Packages
library(openintro) # for data
library(tidyverse) # for data wrangling and visualization
library(knitr) # for tables
library(broom) # for model summary

$$
price = \hat{\beta}_0 + \hat{\beta}_1 \times area + \epsilon \tag{1}
$$
And now inline:
A formula like this one, $price = \hat{\beta}_0 + \hat{\beta}_1 \times area + \epsilon$, looks nice!
See also (Waal, Goedegebuure, and Geradts 2011). We can cite it once more (Waal, Goedegebuure, and Geradts 2011).
This is apparent from Equation 1.
Quick test, we can Equation 1, well now …
Now a call-out
In this analysis, we build a model predicting sale prices of houses based on data on houses that were sold in the Duke Forest neighborhood of Durham, NC around November 2020. Let’s start by loading the packages we’ll use for the analysis.
library(openintro) # for data
library(tidyverse) # for data wrangling and visualization
library(knitr) # for tables
library(broom) # for model summary

We present the results of exploratory data analysis in Section 2 and the regression model in Section 3.
The data contains 98 houses. As part of the exploratory analysis let’s visualize and summarize the relationship between areas and prices of these houses.
Figure 1 shows two histograms displaying the distributions of price and area individually.
ggplot(duke_forest, aes(x = price)) +
  geom_histogram(binwidth = 50000) +
  labs(title = "Histogram of prices")

ggplot(duke_forest, aes(x = area)) +
  geom_histogram(binwidth = 250) +
  labs(title = "Histogram of areas")

[Figure 1: (a) prices, (b) areas]

Figure 2 displays the relationship between these two variables in a scatterplot.
ggplot(duke_forest, aes(x = area, y = price)) +
  geom_point() +
  labs(title = "Price and area of houses in Duke Forest")

Table 1 displays basic summary statistics for these two variables.
duke_forest %>%
  summarise(
    `Median price` = median(price),
    `IQR price` = IQR(price),
    `Median area` = median(area),
    `IQR area` = IQR(area),
    `Correlation, r` = cor(price, area)
  ) %>%
  kable(digits = c(0, 0, 0, 0, 2))

| Median price | IQR price | Median area | IQR area | Correlation, r |
|---|---|---|---|---|
| 540000 | 193125 | 2623 | 1121 | 0.67 |
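For reference, the correlation in Table 1 comes from `cor()`, whose default method is the Pearson correlation coefficient. The formula is written out below; the notation is an assumption introduced here for illustration, with $x_i$ standing for the area and $y_i$ for the price of house $i$, and $n = 98$.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

A value of 0.67 indicates a moderately strong positive linear association between area and price.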
We can fit a simple linear regression model of the form shown in Equation 1:

$$
price = \hat{\beta}_0 + \hat{\beta}_1 \times area + \epsilon
$$
Table 2 shows the regression output for this model.
price_fit <- lm(price ~ area, data = duke_forest)

price_fit %>%
  tidy() %>%
  kable(digits = c(0, 0, 2, 2, 2))

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 116652 | 53302.46 | 2.19 | 0.03 |
| area | 159 | 18.17 | 8.78 | 0.00 |
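
One way to read Table 2 is to use the fitted line for a prediction. The sketch below refits the same model and predicts the price of a hypothetical house; the 2,000 area value and the names `new_house`, `predicted_price`, and `by_hand` are assumptions chosen only for illustration, and the units are taken to be those of the `duke_forest` data (area in square feet, price in US dollars, as far as the package documentation describes them).

```r
library(openintro) # for the duke_forest data

# Refit the model from the text so this block runs on its own
price_fit <- lm(price ~ area, data = duke_forest)

# Hypothetical house, chosen only for illustration
new_house <- data.frame(area = 2000)

# Point prediction from the fitted line: intercept + slope * area
predicted_price <- predict(price_fit, newdata = new_house)

# The same value computed by hand from the estimated coefficients
by_hand <- coef(price_fit)[["(Intercept)"]] + coef(price_fit)[["area"]] * 2000

predicted_price
by_hand
```

With the rounded estimates shown in Table 2, the hand calculation is roughly 116652 + 159 × 2000 = 434652; the unrounded coefficients used by `predict()` will give a slightly different figure.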