library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("C:/Users/StarKid/Desktop/Data_Science/Data_101/week_5/IC9")
day <- read.csv("day.csv")

str(day)
## 'data.frame':    731 obs. of  16 variables:
##  $ instant   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ dteday    : chr  "2011-01-01" "2011-01-02" "2011-01-03" "2011-01-04" ...
##  $ season    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ yr        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ mnth      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ holiday   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ weekday   : int  6 0 1 2 3 4 5 6 0 1 ...
##  $ workingday: int  0 0 1 1 1 1 1 0 0 1 ...
##  $ weathersit: int  2 2 1 1 1 1 2 2 1 1 ...
##  $ temp      : num  0.344 0.363 0.196 0.2 0.227 ...
##  $ atemp     : num  0.364 0.354 0.189 0.212 0.229 ...
##  $ hum       : num  0.806 0.696 0.437 0.59 0.437 ...
##  $ windspeed : num  0.16 0.249 0.248 0.16 0.187 ...
##  $ casual    : int  331 131 120 108 82 88 148 68 54 41 ...
##  $ registered: int  654 670 1229 1454 1518 1518 1362 891 768 1280 ...
##  $ cnt       : int  985 801 1349 1562 1600 1606 1510 959 822 1321 ...

Question 4

Imagine you are the owner of Capital Bike Share. Which variable in day.csv would you use as the response variable in a linear regression problem? Why?

I would choose the weather variables, meaning temperature and humidity. Temperature affects whether people are willing to go out and rent a bike. Sunny, warm days should increase sales per day. On hot, humid days, people who want to reach their destination faster than walking may also rent, which affects sales that day. Rain is the least favorable condition: people are less inclined to go outside and are more worried about getting into a bike accident. Cold weather should also reduce sales. Overall, weather is the best predictor of sales. In good weather I would charge more to offset the lower sales on rainy or cold days.

Question 5

5. Use linear regression to create a linear model that relates two quantitative variables in this dataset, with the response variable being the variable you chose in problem 4.

# Rescale the 0-1 values stored in temp and hum to a 0-100 range
x <- day$temp * 100
y <- day$hum * 100

day %>% 
  lm(x ~ y, data = .) %>% 
  summary()
## 
## Call:
## lm(formula = x ~ y, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -39.907 -15.685   0.253  15.205  38.844 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.29289    3.03982  12.926  < 2e-16 ***
## y            0.16317    0.04722   3.456  0.00058 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.17 on 729 degrees of freedom
## Multiple R-squared:  0.01612,    Adjusted R-squared:  0.01477 
## F-statistic: 11.94 on 1 and 729 DF,  p-value: 0.0005801
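
The call above works because `x` and `y` exist as global vectors, so `data = .` is not really used. A minimal sketch of the same kind of fit written against data frame columns follows. The data frame `toy` and the columns `temp100` and `hum100` are hypothetical stand-ins with made-up values, not rows from day.csv.

```r
# Hypothetical stand-in for day.csv: a few made-up rows with the same column names.
toy <- data.frame(
  temp = c(0.20, 0.35, 0.50, 0.65, 0.80, 0.45),
  hum  = c(0.55, 0.60, 0.70, 0.58, 0.75, 0.62)
)

# Rescale inside the data frame so the formula can refer to columns.
toy$temp100 <- toy$temp * 100
toy$hum100  <- toy$hum * 100

# Response on the left of ~, predictor on the right.
fit <- lm(temp100 ~ hum100, data = toy)

coef(fit)                    # intercept and slope
summary(fit)$coefficients    # estimates, standard errors, t values, p-values
```

Naming the columns in the formula makes it harder to swap the response and the predictor by accident.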

Question 6

6. Create a scatter plot and line of best fit visualization.

plot(hum ~ temp, data = day)

plot(x, y, main = "Temperature vs Humidity",
     xlab = "Temperature", ylab = "Humidity",
     frame = FALSE)
# Add the line of best fit as a separate call; plot() + abline() is not valid in base graphics
abline(lm(y ~ x))

Question 7

7. Is your linear model statistically significant? How do you know?

The p-value is 0.0005801, which is well below 0.05, so the model is statistically significant and shows a positive relationship between temperature and humidity. That makes some physical sense: at or below 32 degrees F water stays frozen, while above that temperature ice melts. As it gets hotter, more rain and melted ice evaporate into the air, so humidity rises.
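
The overall p-value can be reproduced from the F-statistic line of the summary output. This is a small sketch using only the numbers printed above; the 0.05 cutoff is the conventional threshold and is an assumption, since the assignment does not name one.

```r
# Values copied from the summary() output for Question 5.
f_stat <- 11.94
df1 <- 1
df2 <- 729

# Upper-tail probability of the F distribution gives the overall p-value.
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)
p_value

# Conventional 5% threshold (an assumption, not stated in the assignment).
p_value < 0.05
```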

Question 8

8. What is the correlation coefficient between your two variables?

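The fitted line from Question 5 is temperature = 0.16317 * humidity + 39.29, but that is the regression equation, not the correlation coefficient. With one predictor, r is the square root of R-squared with the sign of the slope: r = sqrt(0.01612), or about 0.127, a weak positive correlation.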
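
For a model with a single predictor, the correlation coefficient can be recovered from the summary output. The sketch below uses only the R-squared and slope printed for Question 5; calling `cor(x, y)` on the two vectors should give the same value up to rounding.

```r
r_squared <- 0.01612   # Multiple R-squared from the summary() output
slope     <- 0.16317   # slope estimate from the same output

# r has the magnitude sqrt(R-squared) and the sign of the slope.
r <- sign(slope) * sqrt(r_squared)
round(r, 3)
```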

Question 9

9. What is the r2 value for your model? What does this mean?

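R-squared is 0.01612 (adjusted 0.01477); the value 0.16317 is the slope, not R-squared. R-squared is the proportion of the variation in the response that the model explains, so humidity accounts for only about 1.6% of the variation in temperature here.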
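
To show where the number comes from, here is a sketch that computes R-squared by hand as one minus the ratio of residual to total sum of squares. The vectors `pred` and `resp` are simulated, made-up data and are not taken from day.csv.

```r
set.seed(1)                      # made-up data, for illustration only
pred <- runif(50, 0, 100)
resp <- 40 + 0.2 * pred + rnorm(50, sd = 18)

fit <- lm(resp ~ pred)

ss_res <- sum(resid(fit)^2)             # variation left unexplained by the line
ss_tot <- sum((resp - mean(resp))^2)    # total variation in the response
r2_manual <- 1 - ss_res / ss_tot

r2_manual
summary(fit)$r.squared           # the value summary() reports
cor(pred, resp)^2                # matches R-squared when there is one predictor
```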

Question 10

Plot the residuals for your model. Do they appear random?

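# Refit the Question 5 model (temperature as the response) so the residuals match it
model <- lm(x ~ y)

res <- resid(model)

# Residuals against fitted values, with a reference line at zero
plot(fitted(model), res)
abline(h = 0)

# Normal Q-Q plot of the residuals with its reference line
qqnorm(res)
qqline(res)

# Density of the residuals
plot(density(res))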

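The density plot of the residuals shows a roughly bell-shaped curve centered near zero. A bell shape points toward approximately normal residuals rather than away from randomness. Randomness is judged from the residuals-versus-fitted plot, where the points should scatter around zero with no visible pattern.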

Question 11

11. Did you meet all of the assumptions for your linear regression results to be valid?

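My expectation about temperature and humidity seems to be correct: the higher the temperature, the more humidity there is. The assumptions behind the regression itself (a linear relationship, independent errors, constant variance, and roughly normal residuals) are judged from the residual plots in Question 10.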
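
The regression assumptions can also be checked with a few numbers in addition to the plots. The sketch below is one possible approach on simulated, made-up data (`pred` and `resp` are hypothetical, not from day.csv); the correlation between absolute residuals and fitted values is only a rough indicator of non-constant variance, not a formal test.

```r
set.seed(2)                      # made-up data, for illustration only
pred <- runif(100, 0, 100)
resp <- 40 + 0.2 * pred + rnorm(100, sd = 18)
fit  <- lm(resp ~ pred)
res  <- resid(fit)

# Normality: Shapiro-Wilk test; a small p-value suggests non-normal residuals.
sw <- shapiro.test(res)
sw$p.value

# Constant variance (rough check): the size of the residuals should not
# trend with the fitted values, so this correlation should be near zero.
cor(abs(res), fitted(fit))

# Linearity and spread can also be inspected visually:
# plot(fit, which = 1)   # residuals vs fitted
# plot(fit, which = 2)   # normal Q-Q
```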