week6

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

bike <- read.csv('D:/FALL 2023/STATISTICS/datasets/bike.csv')
summary(bike)

##      Date           Rented.Bike.Count      Hour        Temperature    
##  Length:8760        Min.   :   0.0    Min.   : 0.00   Min.   :-17.80  
##  Class :character   1st Qu.: 191.0    1st Qu.: 5.75   1st Qu.:  3.50  
##  Mode  :character   Median : 504.5    Median :11.50   Median : 13.70  
##                     Mean   : 704.6    Mean   :11.50   Mean   : 12.88  
##                     3rd Qu.:1065.2    3rd Qu.:17.25   3rd Qu.: 22.50  
##                     Max.   :3556.0    Max.   :23.00   Max.   : 39.40  
##     Humidity       Wind.speed      Visibility   Dew.point.temperature
##  Min.   : 0.00   Min.   :0.000   Min.   :  27   Min.   :-30.600      
##  1st Qu.:42.00   1st Qu.:0.900   1st Qu.: 940   1st Qu.: -4.700      
##  Median :57.00   Median :1.500   Median :1698   Median :  5.100      
##  Mean   :58.23   Mean   :1.725   Mean   :1437   Mean   :  4.074      
##  3rd Qu.:74.00   3rd Qu.:2.300   3rd Qu.:2000   3rd Qu.: 14.800      
##  Max.   :98.00   Max.   :7.400   Max.   :2000   Max.   : 27.200      
##  Solar.Radiation     Rainfall          Snowfall         Seasons         
##  Min.   :0.0000   Min.   : 0.0000   Min.   :0.00000   Length:8760       
##  1st Qu.:0.0000   1st Qu.: 0.0000   1st Qu.:0.00000   Class :character  
##  Median :0.0100   Median : 0.0000   Median :0.00000   Mode  :character  
##  Mean   :0.5691   Mean   : 0.1487   Mean   :0.07507                     
##  3rd Qu.:0.9300   3rd Qu.: 0.0000   3rd Qu.:0.00000                     
##  Max.   :3.5200   Max.   :35.0000   Max.   :8.80000                     
##    Holiday          Functioning.Day   
##  Length:8760        Length:8760       
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##

Including Plots

The dataset contains the following columns:

##  [1] "Date"                  "Rented.Bike.Count"     "Hour"                 
##  [4] "Temperature"           "Humidity"              "Wind.speed"           
##  [7] "Visibility"            "Dew.point.temperature" "Solar.Radiation"      
## [10] "Rainfall"              "Snowfall"              "Seasons"              
## [13] "Holiday"               "Functioning.Day"

# Add wind chill and the temperature/dew point spread as calculated columns
dataset1 <- bike %>%
  mutate(
    # Wind chill formula is applied only when Temperature <= 10;
    # otherwise the air temperature is kept unchanged
    Wind.Chill = ifelse(
      Temperature <= 10,
      13.12 + 0.6215 * Temperature - 11.37 * Wind.speed^0.16 +
        0.3965 * Temperature * Wind.speed^0.16,
      Temperature
    ),
    Temperature_Dew_Point_Diff = Temperature - Dew.point.temperature
  )

# Select columns for the dataset
dataset1 <- dataset1 %>%
  select(Rented.Bike.Count, Temperature, Wind.Chill, Dew.point.temperature, Temperature_Dew_Point_Diff)

# Print the first few rows of the dataset
head(dataset1)

##   Rented.Bike.Count Temperature Wind.Chill Dew.point.temperature
## 1               254        -5.2  -5.349585                 -17.6
## 2               204        -5.5  -3.373733                 -17.6
## 3               173        -6.0  -4.358000                 -17.7
## 4               107        -6.2  -4.330441                 -17.6
## 5                78        -6.0  -6.317965                 -18.6
## 6               100        -6.4  -5.697357                 -18.7
##   Temperature_Dew_Point_Diff
## 1                       12.4
## 2                       12.1
## 3                       11.7
## 4                       11.4
## 5                       12.6
## 6                       12.3
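In the output above, row 2 has a wind chill (-3.37) that is warmer than the air temperature (-5.5), which suggests the formula is being applied outside its intended range. The wind chill formula with these coefficients is normally stated for wind speed in km/h and for wind speeds above 4.8 km/h. The following is a minimal sketch of a helper that handles both points, assuming `Wind.speed` is recorded in m/s (an assumption; the source does not state the unit). The function name `wind_chill` and its arguments are hypothetical.

```r
# Wind chill index in degrees C (sketch, not the method used above).
# temp_c:  air temperature in degrees C
# wind_ms: wind speed in m/s (ASSUMED unit; converted to km/h below)
wind_chill <- function(temp_c, wind_ms) {
  wind_kmh <- wind_ms * 3.6
  v <- wind_kmh^0.16
  wc <- 13.12 + 0.6215 * temp_c - 11.37 * v + 0.3965 * temp_c * v
  # Outside the formula's stated range, fall back to the air temperature
  ifelse(temp_c <= 10 & wind_kmh > 4.8, wc, temp_c)
}

# Cold and windy, cold and nearly calm, and warm
wind_chill(temp_c = c(-5, -5, 15), wind_ms = c(5, 1, 5))
```

With this version the wind chill can never exceed the air temperature in cold conditions, because calm hours simply keep the measured temperature.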

# Add relative humidity (%) computed from temperature and dew point
# using a Magnus-type approximation
dataset2 <- bike %>%
  mutate(
    Relative_Humidity = 100 *
      exp((17.625 * Dew.point.temperature) / (243.04 + Dew.point.temperature)) /
      exp((17.625 * Temperature) / (243.04 + Temperature))
  )

# Select columns for the dataset
dataset2 <- dataset2 %>%
  select(Rented.Bike.Count, Humidity, Visibility, Relative_Humidity)

# Print the first few rows of the dataset
head(dataset2)

##   Rented.Bike.Count Humidity Visibility Relative_Humidity
## 1               254       37       2000          37.13416
## 2               204       38       2000          37.98850
## 3               173       39       2000          39.13000
## 4               107       40       2000          40.06815
## 5                78       36       2000          36.25807
## 6               100       37       2000          37.06603
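The computed `Relative_Humidity` values track the recorded `Humidity` column closely, which is a useful sanity check on the formula. As a small sketch, the same calculation can be wrapped in a standalone function and checked against the first row of the output above (temperature -5.2, dew point -17.6). The function name `relative_humidity` is hypothetical.

```r
# Relative humidity (%) from air temperature and dew point, both in degrees C,
# using the same Magnus-type coefficients as the mutate() call above.
relative_humidity <- function(temp_c, dew_point_c) {
  a <- 17.625
  b <- 243.04
  100 * exp(a * dew_point_c / (b + dew_point_c)) /
    exp(a * temp_c / (b + temp_c))
}

# First row of the dataset: should be close to the recorded humidity of 37
relative_humidity(temp_c = -5.2, dew_point_c = -17.6)

# When the dew point equals the air temperature, the air is saturated
relative_humidity(temp_c = 10, dew_point_c = 10)
```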

# Add total precipitation as the sum of rainfall and snowfall
dataset3 <- bike %>%
  mutate(
    Precipitation = Rainfall + Snowfall
  )

# Select columns for the dataset
dataset3 <- dataset3 %>%
  select(Rented.Bike.Count, Rainfall, Snowfall, Precipitation)

head(dataset3)

##   Rented.Bike.Count Rainfall Snowfall Precipitation
## 1               254        0        0             0
## 2               204        0        0             0
## 3               173        0        0             0
## 4               107        0        0             0
## 5                78        0        0             0
## 6               100        0        0             0

# Scatter plot for Rented Bike Count vs. Wind Chill
ggplot(data = dataset1, aes(x = Wind.Chill, y = Rented.Bike.Count)) +
  geom_point() +
  labs(x = "Wind Chill", y = "Rented Bike Count") +
  ggtitle("Scatter Plot: Rented Bike Count vs. Wind Chill")

Insights

Wind Chill vs. Rented Bike Count: The scatter plot for wind chill vs. rented bike count indicates a positive relationship: rentals tend to be higher when the wind chill value is higher. There are no apparent outliers.

Dew Point vs. Rented Bike Count: The scatter plot for dew point vs. rented bike count suggests that as dew point increases, bike rentals tend to increase as well. There are no significant outliers.

# Scatter plot for Rented Bike Count vs. Relative Humidity
ggplot(data = dataset2, aes(x = Relative_Humidity, y = Rented.Bike.Count)) +
  geom_point() +
  labs(x = "Relative Humidity", y = "Rented Bike Count") +
  ggtitle("Scatter Plot: Rented Bike Count vs. Relative Humidity")

Insights

Relative Humidity vs. Rented Bike Count: The scatter plot for relative humidity vs. rented bike count indicates a weak negative relationship, consistent with the correlation coefficient of about -0.21 reported in the correlation section. There are no significant outliers.

# Scatter plot for Rented Bike Count vs. Precipitation
ggplot(data = dataset3, aes(x = Precipitation, y = Rented.Bike.Count)) +
  geom_point() +
  labs(x = "Precipitation", y = "Rented Bike Count") +
  ggtitle("Scatter Plot: Rented Bike Count vs. Precipitation")

Insights

Precipitation vs. Rented Bike Count: The precipitation variable combines rainfall and snowfall. The scatter plot suggests a potential negative relationship with rented bike count, but there are relatively few data points at high precipitation values. There don't appear to be significant outliers.

CORRELATION COEFFICIENT

# Calculate the correlation coefficient
correlation_wind_chill <- cor(dataset1$Wind.Chill, dataset1$Rented.Bike.Count)

# Print the result
cat("Correlation between Wind Chill and Rented Bike Count:", correlation_wind_chill, "\n")

## Correlation between Wind Chill and Rented Bike Count: 0.531666

# Calculate the correlation coefficient
correlation_relative_humidity <- cor(dataset2$Relative_Humidity, dataset2$Rented.Bike.Count)

# Print the result
cat("Correlation between Relative Humidity and Rented Bike Count:", correlation_relative_humidity, "\n")

## Correlation between Relative Humidity and Rented Bike Count: -0.2074383

# Calculate the correlation coefficient
correlation_precipitation <- cor(dataset3$Precipitation, dataset3$Rented.Bike.Count)

# Print the result
cat("Correlation between Precipitation and Rented Bike Count:", correlation_precipitation, "\n")

## Correlation between Precipitation and Rented Bike Count: -0.165494
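By default, `cor()` returns the Pearson correlation coefficient. To make explicit what that number measures, here is a minimal sketch that computes it by hand on a small made-up pair of vectors and compares it with `cor()`. The function name `pearson_r` and the toy vectors are illustrative only and are not taken from the bike data.

```r
# Pearson correlation: covariance of x and y scaled by both spreads
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Toy data (made up for illustration)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

manual  <- pearson_r(x, y)
builtin <- cor(x, y)
all.equal(manual, builtin)
```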

INTERPRETATION BETWEEN VISUALIZATION AND COEFFICIENT

Interpretation based on Correlation Coefficients and Visualizations:

Set 1: Temperature, Wind Chill, and Dew Point vs. Rented Bike Count: The correlation between wind chill and rented bike count is about 0.53, a moderate positive value. This agrees with the scatter plot, in which rentals rise with wind chill.

Set 2: Humidity, Visibility, and Relative Humidity vs. Rented Bike Count: The correlation between relative humidity and rented bike count is about -0.21, a weak negative value: rentals tend to be slightly lower at higher relative humidity. A coefficient this small implies a diffuse point cloud rather than a clear trend.

Set 3: Rainfall, Snowfall, and Precipitation vs. Rented Bike Count: The correlation between precipitation (combined rainfall and snowfall) and rented bike count is about -0.17, a weak negative value. This agrees with the scatter plot, which suggested fewer rentals at higher precipitation but had few observations at high precipitation values.
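When comparing a coefficient with a plot, it can help to turn the number into a consistent verbal label. The sketch below does this for the three coefficients printed earlier. The cut-offs of 0.3 and 0.7 are an illustrative convention chosen for this example, not a standard taken from the source, and the function name `describe_correlation` is hypothetical.

```r
# Label a correlation coefficient by strength and direction.
# The thresholds (0.3, 0.7) are ASSUMED for illustration only.
describe_correlation <- function(r) {
  strength <- cut(
    abs(r),
    breaks = c(-Inf, 0.3, 0.7, Inf),
    labels = c("weak", "moderate", "strong"),
    right = FALSE
  )
  direction <- c("negative", "zero", "positive")[sign(r) + 2]
  paste(as.character(strength), direction)
}

# Coefficients reported above: wind chill, relative humidity, precipitation
labels <- describe_correlation(c(0.531666, -0.2074383, -0.165494))
labels
```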

CONFIDENCE INTERVAL

bike_count <- bike$Rented.Bike.Count

# Calculate the sample mean, standard deviation, and sample size
sample_mean <- mean(bike_count)
sample_sd <- sd(bike_count)
n <- length(bike_count)
alpha <- 0.05  # Significance level; 1 - alpha = 0.95 gives a 95% confidence interval

# Calculate the standard error of the mean
standard_error <- sample_sd / sqrt(n)

# Calculate the margin of error using the t-distribution
t_critical <- qt(1 - alpha/2, df = n - 1)  # Get the t-critical value
margin_of_error <- t_critical * standard_error

# Calculate the confidence interval
confidence_interval <- c(sample_mean - margin_of_error, sample_mean + margin_of_error)

# Print the confidence interval
cat("Confidence Interval (95%):", confidence_interval, "\n")

## Confidence Interval (95%): 691.0933 718.1108

Interpretation: The 95% confidence interval for the mean number of rented bikes is (691.09, 718.11). Treating the 8,760 observations as a random sample, we are 95% confident that the true population mean number of rented bikes lies between about 691 and 718.

Conclusion: The interval is narrow relative to the sample mean of about 704.6: the margin of error is only about 13.5 bikes, which reflects the large sample size (n = 8,760).
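The manual calculation above can be cross-checked against R's built-in `t.test()`, which reports the same t-based interval in its `conf.int` element. The sketch below wraps the manual steps in a function and compares the two on the first six rented bike counts shown in the `head()` output earlier. The function name `mean_ci` is hypothetical.

```r
# t-based confidence interval for a mean, following the steps above
mean_ci <- function(x, conf_level = 0.95) {
  n <- length(x)
  standard_error <- sd(x) / sqrt(n)
  t_critical <- qt(1 - (1 - conf_level) / 2, df = n - 1)
  mean(x) + c(-1, 1) * t_critical * standard_error
}

# First six Rented.Bike.Count values from the head() output
x <- c(254, 204, 173, 107, 78, 100)

manual  <- mean_ci(x)
builtin <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)
all.equal(manual, builtin)
```

On the full data, `t.test(bike$Rented.Bike.Count)$conf.int` should reproduce the interval printed above.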

2023-09-28