Inferential statistics are statistical methods that allow us to draw conclusions and make meaningful predictions about a population based on sample data.
Uses of Inferential Statistics:
- Testing hypotheses
- Estimating parameters or boundaries of data values
- Making meaningful predictions
2024-06-04
What is Linear Regression?
Linear regression fits a line that models the relationship between a dependent variable and an independent variable in a data set.
Often, the dependent variable (the outcome being explained) is denoted by “y” and plotted on the vertical axis.
The independent variable (or the variable that may “cause” a certain result) is denoted by “x” and plotted on the horizontal axis.
Simple Linear Regression is often referred to as the “line of best fit”.
One way to calculate how well a line fits a data set is through the R-Squared method.
This method measures the proportion of variance in the dependent variable that is explained by the independent variable.
\[ R^2 = 1 - \frac{\text{SS}_{\text{Residual}}}{\text{SS}_{\text{Total}}} \]
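As a minimal sketch of the formula above, the two sums of squares can be computed by hand and compared with the value R reports. The data here are simulated purely for illustration (they are not the train.csv data), and the names `ss_residual`, `ss_total`, and `r_squared` are our own.

```r
# Simulated stand-in data (hypothetical; not the train.csv data)
set.seed(1)
x <- runif(100, min = 0, max = 100)
y <- 50 - 0.3 * x + rnorm(100, sd = 5)

# Fit a simple linear regression of y on x
fit <- lm(y ~ x)

# Numerator: squared error left over after fitting the line
ss_residual <- sum(residuals(fit)^2)

# Denominator: squared deviation of y around its own mean
ss_total <- sum((y - mean(y))^2)

r_squared <- 1 - ss_residual / ss_total

# The hand-computed value should agree with the one from summary()
all.equal(r_squared, summary(fit)$r.squared)
```

An R-squared close to 1 means the line accounts for most of the variation in y; a value close to 0 means it accounts for very little.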
Because simple linear regression is a line, it is represented by the following equation:
\[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \]
In layman’s terms:
Y = Y-intercept + Slope*x + error
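To connect the equation to R, the sketch below fits a line with `lm()` and pulls out the estimated intercept and slope. The data are simulated with a known intercept of 10 and slope of 2 (an assumption made only for this example), so the estimates can be checked against the values used to generate them.

```r
# Simulated data with a known intercept (10) and slope (2), plus random noise
set.seed(42)
x <- seq(1, 50)
y <- 10 + 2 * x + rnorm(50, sd = 3)

fit <- lm(y ~ x)

# Estimated Y-intercept and slope
beta0 <- coef(fit)[["(Intercept)"]]
beta1 <- coef(fit)[["x"]]

# Intercept + slope * x + error (the residual) reproduces each observed y
reconstructed <- beta0 + beta1 * x + residuals(fit)
all.equal(unname(reconstructed), y)
```

The residuals play the role of the error term: they are whatever is left over between each observed value and the fitted line.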
The following plot is from the data set train.csv. Points on the graph represent second-hand vehicles. Price, our dependent variable, is on the y-axis, and mileage, our independent variable, is on the x-axis.
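# Load the used car data (assumes train.csv is in the working directory)
train <- read.csv("train.csv")
x <- train$km
y <- train$current.price
# Scatter plot of price against mileage
plot(x, y, pch = 16,
     col = "slateblue4",
     xlab = "Mileage",
     ylab = "Price",
     main = "Mileage vs Price of Used Car Market"
)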
Using the same data, ggplot2 can also be used to generate a plot, here with the fitted regression line added:
library(ggplot2)
# Scatter plot with a least-squares regression line (method = "lm")
ggplot(train, aes(x = km, y = current.price)) +
  geom_point(color = "mediumslateblue", size = .3) +
  labs(title = "Mileage vs Price: Used Car Market",
       x = "Mileage",
       y = "Price") +
  geom_smooth(method = "lm", se = FALSE, color = "orangered") +
  theme_minimal()
Simple linear regression allows us to make meaningful predictions based on data. For the used car market data set, we can see a distinct negative correlation between price and mileage: as mileage goes up, price goes down. This makes sense because mileage indicates the amount of wear on a vehicle, so a car with more mileage is prone to more repairs and thus has a lower value.
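A rough sketch of what such a prediction could look like in R is shown below. Because the real train.csv values are not reproduced here, the example uses a simulated data frame, `sim_cars`, that borrows the column names `km` and `current.price`; the numbers in it are invented for illustration and say nothing about the actual data set.

```r
# Simulated stand-in for the used car data (hypothetical values)
set.seed(7)
sim_cars <- data.frame(km = runif(200, min = 20000, max = 150000))
sim_cars$current.price <- 600000 - 2 * sim_cars$km + rnorm(200, sd = 20000)

# Fit price as a function of mileage
fit <- lm(current.price ~ km, data = sim_cars)

# Predict the price of two cars with different mileage
new_cars <- data.frame(km = c(50000, 100000))
predicted <- predict(fit, newdata = new_cars)

# A negative slope means higher mileage predicts a lower price
coef(fit)[["km"]] < 0
predicted[[1]] > predicted[[2]]
```

The same pattern would apply to the real data by fitting `lm(current.price ~ km, data = train)` and passing new mileage values to `predict()`.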