2026-03-03

What is linear regression?

  • Linear regression takes a set of 2D points, normally (x, y) pairs, and fits a linear function that represents the relationship between x and y.
  • The formula: \[\hat{y} = b_0 + b_1 x\] where \(b_0\) is the intercept and \(b_1\) is the slope.
  • The formula produces a predicted value \(\hat{y}\) that can be compared to the true y-value to find the error.
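
As a rough illustration of the formula, the sketch below computes b_0 and b_1 by hand on a small made-up set of points and checks them against R's built-in lm(). The vectors x and y are hypothetical and not taken from the battery dataset. It assumes the usual least-squares estimates: b_1 = cov(x, y) / var(x) and b_0 = mean(y) - b_1 * mean(x).

```r
# Hypothetical toy data, not from the battery dataset
x = c(1, 2, 3, 4, 5)
y = c(2.1, 3.9, 6.2, 7.8, 10.1)

# Least-squares estimates of the slope and intercept
b1 = cov(x, y) / var(x)
b0 = mean(y) - b1 * mean(x)

# Predicted values and the error (residual) for each point
y_hat = b0 + b1 * x
errors = y - y_hat

# The hand-computed coefficients should match the ones from lm()
fit = lm(y ~ x)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

The errors vector holds the difference between each true y-value and its prediction, which is the quantity the fitted line keeps as small as possible overall.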

Data Cleaning:

I am using a dataset imported from kaggle.com that looks at battery usage. The code below shows how I cleaned the data. I will be looking only at rows where the user is playing a game, the brightness level is above 50%, and the device is not currently charging. Here is the code:

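library(dplyr)

# Keep only gaming rows with brightness above 50% while the device is not charging
battery_not_charging = battery_stats %>%
  filter(Charging_State != 'Charging',
         Usage_Mode == 'Gaming',
         `Brightness_Level_%` > 50)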
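
To see what the filter keeps, here is a small sketch on a made-up data frame. The column names match the cleaning code, but the rows and the values such as 'Discharging' and 'Video' are assumptions for illustration only, not values confirmed from the Kaggle dataset.

```r
library(dplyr)

# Hypothetical rows; only the column names come from the cleaning code
toy_stats = data.frame(
  Charging_State = c('Charging', 'Discharging', 'Discharging', 'Discharging'),
  Usage_Mode = c('Gaming', 'Gaming', 'Video', 'Gaming'),
  `Brightness_Level_%` = c(80, 75, 90, 40),
  check.names = FALSE  # keep the % sign in the column name
)

# Same three conditions as the real cleaning step
toy_not_charging = toy_stats %>%
  filter(Charging_State != 'Charging',
         Usage_Mode == 'Gaming',
         `Brightness_Level_%` > 50)

# Only the second row satisfies all three conditions
print(toy_not_charging)
```

The conditions inside filter() are combined with a logical AND, so a row is dropped as soon as it fails any one of them.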

Code to create scatterplot:

The code below uses plotly to draw a scatter plot of the data that I will test for linear regression.

library(plotly)

# Fit the simple linear model once so its fitted values can be reused
fit = lm(Battery_Drop_Per_Hour ~ Screen_On_Time_min, data = battery_not_charging)

fig = plot_ly(
  x = battery_not_charging$Screen_On_Time_min,
  y = battery_not_charging$Battery_Drop_Per_Hour,
  type = 'scatter',
  mode = 'markers') %>%
  # Overlay the fitted regression line on the points
  add_lines(x = battery_not_charging$Screen_On_Time_min, y = fitted(fit))

The Scatterplot:

The scatter plot does not show any obvious trend in the data, but we will fit the regression line regardless.

Code to create linear regression line:

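library(ggplot2)

# geom_smooth(method='lm') draws the least-squares line;
# level=.01 asks for a 1% confidence band, so the shaded band is very narrow
fig = ggplot(data=battery_not_charging,
             aes(x=Screen_On_Time_min, y=Battery_Drop_Per_Hour)) +
  geom_point() +
  geom_smooth(method='lm', level=.01)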

Linear Regression:

The graph appears to show no real relationship, but you can see that a line is drawn that minimizes the squared error between the predicted and actual values.
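
One way to put a number on "no real relationship" is the R-squared of the fit. The sketch below uses simulated data in which y is generated independently of x, so it is only an analogy for the battery data. The vectors screen_time and battery_drop, and the ranges used to generate them, are hypothetical stand-ins. lm() still returns a line, but R-squared is expected to be close to zero.

```r
set.seed(42)  # make the simulated data reproducible

# Hypothetical stand-ins for the real columns: y is generated independently of x
screen_time = runif(200, min = 10, max = 300)
battery_drop = rnorm(200, mean = 15, sd = 4)

fit = lm(battery_drop ~ screen_time)

# R-squared: share of the variance in y explained by the line
r2 = summary(fit)$r.squared

# For a one-predictor model, R-squared equals the squared correlation
r = cor(screen_time, battery_drop)

print(coef(fit))
print(c(r_squared = r2, cor_squared = r^2))
```

Replacing the simulated vectors with battery_not_charging$Screen_On_Time_min and battery_not_charging$Battery_Drop_Per_Hour would give the corresponding value for the data in this presentation.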

Code for linear regression with extra variables:

fig = ggplot(data=battery_not_charging,
             aes(x=Screen_On_Time_min,
                 y=Battery_Drop_Per_Hour,
                 colour = RAM_Usage_MB,
                 size=Battery_Temperature_C)) +
  geom_point() +
  geom_smooth(method='lm', level=.01)

This code colors the points by RAM usage and sizes them by battery temperature.

Graph with extra stats:


Conclusion:

  • In this example there is not a strong linear relationship between battery drop per hour and screen-on time when the data is restricted to gaming with brightness above 50% on a device that is not charging.
  • Although the relationship isn't strong in this example, linear regression can make useful predictions from historical data when the underlying relationship is close to linear.
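
As a sketch of how a fitted line is used for prediction, the example below fits lm() on made-up historical data and calls predict() on new x-values. The data frame history, its columns x and y, and the new points are all hypothetical.

```r
# Hypothetical historical data with a clear linear trend
history = data.frame(
  x = c(10, 20, 30, 40, 50),
  y = c(12, 21, 33, 41, 52)
)

fit = lm(y ~ x, data = history)

# New x-values to predict for; the column name must match the predictor
new_points = data.frame(x = c(60, 70))
predictions = predict(fit, newdata = new_points)

# Each prediction is b0 + b1 * x using the fitted coefficients
by_hand = coef(fit)[1] + coef(fit)[2] * new_points$x

print(predictions)
```

Both new x-values lie outside the range of the historical data, so these predictions rely on the linear trend continuing beyond the points that were observed.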