# Set a seed for reproducibility
set.seed(123)
# Generate independent variable (predictor)
x <- 1:100
# Generate dependent variable (response) with some noise
# y = 2*x + 5 + random_noise
y <- 2 * x + 5 + rnorm(100, mean = 0, sd = 20)
# Create a data frame
my_data <- data.frame(x = x, y = y)
# View the first few rows of the data
head(my_data)
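A quick way to confirm that set.seed() does what it promises is the short check below. It is an illustration added here, not one of the original steps, and the names noise_a, noise_b and noise_c are illustrative. It resets the seed before each of two rnorm() calls and compares the draws.

```r
# Reset the generator, draw noise, then reset and draw again
set.seed(123)
noise_a <- rnorm(100, mean = 0, sd = 20)

set.seed(123)
noise_b <- rnorm(100, mean = 0, sd = 20)

# The two draws match exactly because the seed was the same
identical(noise_a, noise_b)  # TRUE

# Without resetting the seed, the next draw continues the random stream and differs
noise_c <- rnorm(100, mean = 0, sd = 20)
identical(noise_b, noise_c)  # FALSE
```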
Explanation:
- set.seed(123): Ensures that the random numbers generated are the same every time you run this code. This is crucial for reproducible research.
- x <- 1:100: Creates a vector x containing the integers from 1 to 100. This will be our predictor variable.
- y <- 2 * x + 5 + rnorm(100, mean = 0, sd = 20):
  - 2 * x + 5: Creates a linear relationship between y and x with a slope of 2 and an intercept of 5.
  - rnorm(100, mean = 0, sd = 20): Adds random noise to the y values. rnorm generates random numbers from a normal distribution; here we draw 100 of them with a mean of 0 and a standard deviation of 20. This simulates real-world data, where the relationship isn’t perfectly linear.
- my_data <- data.frame(x = x, y = y): Combines the x and y vectors into a data frame, the standard way to store and manipulate tabular data in R.
- head(my_data): Displays the first 6 rows of the my_data data frame.

2. Fitting the Linear Regression Model
The core function in R for linear regression is
lm().
# Fit the linear regression model
# The formula y ~ x means "model y as a function of x"
model <- lm(y ~ x, data = my_data)
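For a single predictor, the least-squares estimates that lm() reports also have a closed form: the slope is cov(x, y) / var(x), and the intercept is mean(y) minus the slope times mean(x). The sketch below rebuilds the simulated data so it runs on its own, computes both estimates by hand and compares them with coef(model). lm() itself solves the problem with a QR decomposition rather than this formula, so the two agree up to floating-point tolerance, which all.equal() allows for.

```r
# Rebuild the simulated data and model from the tutorial
set.seed(123)
x <- 1:100
y <- 2 * x + 5 + rnorm(100, mean = 0, sd = 20)
my_data <- data.frame(x = x, y = y)
model <- lm(y ~ x, data = my_data)

# Closed-form ordinary least squares for one predictor
slope_manual <- cov(my_data$x, my_data$y) / var(my_data$x)
intercept_manual <- mean(my_data$y) - slope_manual * mean(my_data$x)

# Compare with lm(); unname() drops the "(Intercept)" and "x" labels
all.equal(unname(coef(model)), c(intercept_manual, slope_manual))  # TRUE
```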
Explanation:
- lm(): The function that fits linear models.
- y ~ x: The model formula.
  - y is the dependent variable (the variable you want to predict).
  - ~ separates the dependent variable from the independent variables.
  - x is the independent variable (the variable used to predict y).
- data = my_data: Specifies the data frame containing the variables y and x.

3. Examining the Model Results
Once the model is fitted, you can inspect its results using several functions.
# Get a summary of the model
summary(model)
# Get the coefficients of the model
coefficients(model)
# Get the fitted (predicted) values
fitted(model)
# Get the residuals (differences between actual and fitted values)
residuals(model)
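These functions return ordinary R objects, so individual numbers can be extracted instead of read off the printed output. The sketch below (which again rebuilds the data so it runs standalone; variable names such as coef_table are illustrative) pulls R-squared and the coefficient table out of summary(). It also checks two facts: residuals equal observed y minus fitted y, and, for a single predictor, R-squared equals the squared correlation between x and y.

```r
set.seed(123)
x <- 1:100
y <- 2 * x + 5 + rnorm(100, mean = 0, sd = 20)
my_data <- data.frame(x = x, y = y)
model <- lm(y ~ x, data = my_data)

model_summary <- summary(model)

# Proportion of the variance in y explained by x
r_squared <- model_summary$r.squared
all.equal(r_squared, cor(my_data$x, my_data$y)^2)  # TRUE for simple regression

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(model_summary)
slope_p_value <- coef_table["x", "Pr(>|t|)"]

# Residuals are observed minus fitted values
manual_residuals <- my_data$y - fitted(model)
all.equal(residuals(model), manual_residuals)  # TRUE
```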
Explanation:
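- summary(model): The most comprehensive output. It reports the distribution of the residuals; the estimated coefficients for the intercept and x, each with its standard error, t value, and p-value; the residual standard error; R-squared and adjusted R-squared; and the F-statistic for the model as a whole.
- coefficients(model): Returns a named vector containing the intercept and the slope coefficient.
- fitted(model): Returns a vector of the predicted values of y for each observation in your data.
- residuals(model): Returns a vector of the differences between the actual y values and the predicted y values.

4. Visualizing the Model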
It’s always a good practice to visualize your data and the fitted regression line.
# Plot the data and the regression line
plot(my_data$x, my_data$y, main = "Linear Regression Model",
xlab = "Independent Variable (x)", ylab = "Dependent Variable (y)")
abline(model, col = "red", lwd = 2) # Add the regression line
legend("topleft", legend = paste("y =", round(coef(model)[2], 2), "* x +", round(coef(model)[1], 2)),
col = "red", lty = 1, lwd = 2)
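The scatter plot shows the fit itself. A residuals-versus-fitted plot is a common companion check, added here as a suggestion rather than part of the original steps. If a straight line is an adequate model, the points should scatter evenly around zero with no obvious curve or funnel shape. A minimal base-graphics sketch:

```r
set.seed(123)
x <- 1:100
y <- 2 * x + 5 + rnorm(100, mean = 0, sd = 20)
my_data <- data.frame(x = x, y = y)
model <- lm(y ~ x, data = my_data)

# Residuals against fitted values, with a dashed reference line at zero
plot(fitted(model), residuals(model),
     main = "Residuals vs. Fitted",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, col = "red", lty = 2)
```

R also has a built-in version of this diagnostic: plot(model, which = 1) draws residuals against fitted values for an lm object.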
Explanation:
- plot(my_data$x, my_data$y, ...): Creates a scatter plot of your data points.
  - main: Sets the title of the plot.
  - xlab, ylab: Set the labels for the x and y axes.
- abline(model, col = "red", lwd = 2):
  - abline(): Draws straight lines on a plot.
  - model: When given a linear model object, abline() draws the regression line defined by the model’s intercept and slope.
  - col = "red": Sets the line color to red.
  - lwd = 2: Sets the line width to 2.
- legend(...): Adds a legend to the plot showing the equation of the fitted line.

5. Making Predictions
You can use the fitted model to predict y values for
new, unseen x values.
# Create new data for prediction
new_data <- data.frame(x = c(50, 110, 150))
# Predict y values for the new data
predictions <- predict(model, newdata = new_data)
# Display the predictions
print(predictions)
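For a linear model, predict() simply evaluates the fitted equation (intercept plus slope times x) at the new x values, which the sketch below confirms by hand. Passing interval = "prediction" additionally returns lower and upper bounds, 95% by default, for an individual new observation; interval = "confidence" would instead bound the mean response at each x.

```r
set.seed(123)
x <- 1:100
y <- 2 * x + 5 + rnorm(100, mean = 0, sd = 20)
my_data <- data.frame(x = x, y = y)
model <- lm(y ~ x, data = my_data)
new_data <- data.frame(x = c(50, 110, 150))

# Point predictions computed directly from the coefficients
b <- coef(model)
manual_predictions <- b[1] + b[2] * new_data$x
all.equal(unname(predict(model, newdata = new_data)), unname(manual_predictions))  # TRUE

# 95% prediction intervals: a matrix with columns fit, lwr, upr
pred_intervals <- predict(model, newdata = new_data, interval = "prediction", level = 0.95)
print(pred_intervals)
```

The intervals widen as x moves away from the mean of the x values used for fitting (50.5), so the interval at x = 150 is wider than the one at x = 50.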
Explanation:
- new_data <- data.frame(x = c(50, 110, 150)): Creates a new data frame with the x values for which you want to make predictions. It’s crucial that this data frame uses the same column name (x) as the model formula. Note that 110 and 150 lie outside the range of x used to fit the model (1 to 100), so those two predictions are extrapolations and deserve more caution than the prediction at 50.
- predict(model, newdata = new_data):
  - predict(): The function for making predictions from a fitted model.
  - model: The fitted linear regression model.
  - newdata = new_data: Specifies the data frame containing the new predictor values.

Example with Real Data (Conceptual)
If you had a CSV file named sales_data.csv with columns
advertising_spend and sales, you would do
this:
# 1. Load your data
sales_data <- read.csv("sales_data.csv")
# 2. Fit the model
sales_model <- lm(sales ~ advertising_spend, data = sales_data)
# 3. Examine results
summary(sales_model)
# 4. Visualize
plot(sales_data$advertising_spend, sales_data$sales,
main = "Sales vs. Advertising Spend",
xlab = "Advertising Spend", ylab = "Sales")
abline(sales_model, col = "blue", lwd = 2)
# 5. Predict for a new advertising spend value
new_ad_spend <- data.frame(advertising_spend = 5000)
predicted_sales <- predict(sales_model, newdata = new_ad_spend)
print(predicted_sales)
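Because sales_data.csv is only a placeholder, the conceptual pipeline above cannot be run as is. One way to try it end to end is to write a synthetic stand-in file first. Everything about that data is hypothetical: the spend range, the coefficients used to generate it (1000 and 3) and the noise level are invented purely to exercise the code, and tempfile() keeps the file out of your working directory. The stopifnot() line is an optional guard that the CSV actually contains the columns the formula expects.

```r
set.seed(123)

# Hypothetical stand-in data; the numbers are invented for illustration only
fake_sales <- data.frame(advertising_spend = runif(50, min = 1000, max = 10000))
fake_sales$sales <- 1000 + 3 * fake_sales$advertising_spend + rnorm(50, mean = 0, sd = 2000)

csv_path <- tempfile(fileext = ".csv")
write.csv(fake_sales, csv_path, row.names = FALSE)

# Same steps as the conceptual example, reading from the temporary file
sales_data <- read.csv(csv_path)
stopifnot(all(c("advertising_spend", "sales") %in% names(sales_data)))

sales_model <- lm(sales ~ advertising_spend, data = sales_data)
summary(sales_model)

new_ad_spend <- data.frame(advertising_spend = 5000)
predicted_sales <- predict(sales_model, newdata = new_ad_spend)
print(predicted_sales)
```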