This is an R Markdown document; for more details on R Markdown, see http://rmarkdown.rstudio.com. The analysis below looks at how weather variables relate to the number of rented bikes in bike.csv. We start by loading ggplot2 and dplyr and reading the data:
library(ggplot2)
library(dplyr)
# Adjust this path to wherever bike.csv is stored on your machine
bike <- read.csv('D:/FALL 2023/STATISTICS/datasets/bike.csv')
summary(bike)
##      Date           Rented.Bike.Count      Hour        Temperature
##  Length:8760        Min.   :   0.0    Min.   : 0.00   Min.   :-17.80
##  Class :character   1st Qu.: 191.0    1st Qu.: 5.75   1st Qu.:  3.50
##  Mode  :character   Median : 504.5    Median :11.50   Median : 13.70
##                     Mean   : 704.6    Mean   :11.50   Mean   : 12.88
##                     3rd Qu.:1065.2    3rd Qu.:17.25   3rd Qu.: 22.50
##                     Max.   :3556.0    Max.   :23.00   Max.   : 39.40
##     Humidity       Wind.speed      Visibility   Dew.point.temperature
##  Min.   : 0.00   Min.   :0.000   Min.   :  27   Min.   :-30.600
##  1st Qu.:42.00   1st Qu.:0.900   1st Qu.: 940   1st Qu.: -4.700
##  Median :57.00   Median :1.500   Median :1698   Median :  5.100
##  Mean   :58.23   Mean   :1.725   Mean   :1437   Mean   :  4.074
##  3rd Qu.:74.00   3rd Qu.:2.300   3rd Qu.:2000   3rd Qu.: 14.800
##  Max.   :98.00   Max.   :7.400   Max.   :2000   Max.   : 27.200
##  Solar.Radiation     Rainfall          Snowfall          Seasons
##  Min.   :0.0000   Min.   : 0.0000   Min.   :0.00000   Length:8760
##  1st Qu.:0.0000   1st Qu.: 0.0000   1st Qu.:0.00000   Class :character
##  Median :0.0100   Median : 0.0000   Median :0.00000   Mode  :character
##  Mean   :0.5691   Mean   : 0.1487   Mean   :0.07507
##  3rd Qu.:0.9300   3rd Qu.: 0.0000   3rd Qu.:0.00000
##  Max.   :3.5200   Max.   :35.0000   Max.   :8.80000
##    Holiday          Functioning.Day
##  Length:8760        Length:8760
##  Class :character   Class :character
##  Mode  :character   Mode  :character
The dataset contains the following columns:
## [1] "Date" "Rented.Bike.Count" "Hour"
## [4] "Temperature" "Humidity" "Wind.speed"
## [7] "Visibility" "Dew.point.temperature" "Solar.Radiation"
## [10] "Rainfall" "Snowfall" "Seasons"
## [13] "Holiday" "Functioning.Day"
# Dataset 1: temperature-related variables plus two derived columns
dataset1 <- bike %>%
  mutate(
    # Wind chill (JAG/TI formula) for Temperature <= 10 °C, else air temperature.
    # The formula expects km/h; Wind.speed is used here without conversion.
    Wind.Chill = ifelse(Temperature <= 10,
                        13.12 + 0.6215 * Temperature - 11.37 * Wind.speed^0.16 +
                          0.3965 * Temperature * Wind.speed^0.16,
                        Temperature),
    # Spread between air temperature and dew point (larger = drier air)
    Temperature_Dew_Point_Diff = Temperature - Dew.point.temperature
  )
# Keep only the columns used in this set
dataset1 <- dataset1 %>%
  select(Rented.Bike.Count, Temperature, Wind.Chill,
         Dew.point.temperature, Temperature_Dew_Point_Diff)
# Print the first few rows of the dataset
head(dataset1)
## Rented.Bike.Count Temperature Wind.Chill Dew.point.temperature
## 1 254 -5.2 -5.349585 -17.6
## 2 204 -5.5 -3.373733 -17.6
## 3 173 -6.0 -4.358000 -17.7
## 4 107 -6.2 -4.330441 -17.6
## 5 78 -6.0 -6.317965 -18.6
## 6 100 -6.4 -5.697357 -18.7
## Temperature_Dew_Point_Diff
## 1 12.4
## 2 12.1
## 3 11.7
## 4 11.4
## 5 12.6
## 6 12.3
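In the rows printed above, wind chill is sometimes warmer than the air temperature (row 2: -3.37 against -5.5 °C), which a wind chill index should not produce. The formula used here (the JAG/TI wind chill index adopted by Environment Canada and the US National Weather Service) expects wind speed in km/h and is defined only for temperatures at or below 10 °C with wind above roughly 4.8 km/h. Since Wind.speed never exceeds 7.4 in the summary, it is most likely recorded in m/s. The sketch below assumes m/s input; wind_chill_ms() is a hypothetical helper, not part of the original analysis.

```r
# Hypothetical helper: JAG/TI wind chill index in degrees Celsius.
# ASSUMPTION: temp_c is in degrees Celsius and wind_ms is in metres per second.
wind_chill_ms <- function(temp_c, wind_ms) {
  wind_kmh <- wind_ms * 3.6                  # the formula expects km/h
  v16 <- wind_kmh^0.16
  wci <- 13.12 + 0.6215 * temp_c - 11.37 * v16 + 0.3965 * temp_c * v16
  # Outside the valid range (cold AND wind above ~4.8 km/h), keep air temperature
  ifelse(temp_c <= 10 & wind_kmh > 4.8, wci, temp_c)
}

# Illustrative inputs: a cold windy hour, a cold calm hour, a warm hour
wind_chill_ms(c(-10, -5.5, 20), c(20 / 3.6, 0.8, 3))  # roughly -17.86, -5.5, 20
```

Applied to the data this would be, for example, bike$Wind.Chill.kmh <- wind_chill_ms(bike$Temperature, bike$Wind.speed) (a hypothetical column name); the wind chill plot and correlation would then need to be recomputed.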
# Dataset 2: humidity-related variables plus relative humidity recomputed
# from temperature and dew point (Magnus approximation, a = 17.625, b = 243.04 °C)
dataset2 <- bike %>%
  mutate(
    Relative_Humidity = (exp((17.625 * Dew.point.temperature) / (243.04 + Dew.point.temperature)) /
                           exp((17.625 * Temperature) / (243.04 + Temperature))) * 100
  )
# Keep only the columns used in this set
dataset2 <- dataset2 %>%
  select(Rented.Bike.Count, Humidity, Visibility, Relative_Humidity)
# Print the first few rows of the dataset
head(dataset2)
## Rented.Bike.Count Humidity Visibility Relative_Humidity
## 1 254 37 2000 37.13416
## 2 204 38 2000 37.98850
## 3 173 39 2000 39.13000
## 4 107 40 2000 40.06815
## 5 78 36 2000 36.25807
## 6 100 37 2000 37.06603
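The Relative_Humidity column uses the Magnus approximation with coefficients 17.625 and 243.04 °C. In the rows above it tracks the measured Humidity column closely (37 vs. 37.13, 38 vs. 37.99), so it largely re-derives a variable the dataset already has. A small sketch of the same formula as a standalone function, magnus_rh() (a hypothetical name), with two sanity checks: air at its dew point is saturated (100%), and the first row above is reproduced.

```r
# Hypothetical helper: relative humidity (%) from air temperature and dew point,
# both in degrees Celsius, using the Magnus approximation (a = 17.625, b = 243.04).
magnus_rh <- function(temp_c, dew_c, a = 17.625, b = 243.04) {
  (exp(a * dew_c / (b + dew_c)) / exp(a * temp_c / (b + temp_c))) * 100
}

magnus_rh(10, 10)       # 100: air at its dew point is saturated
magnus_rh(-5.2, -17.6)  # about 37.13, matching row 1 above
```

To check the agreement over the whole dataset, one could inspect summary(dataset2$Relative_Humidity - dataset2$Humidity).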
# Dataset 3: precipitation variables plus total precipitation
# Caution: check the units of the source data; if Rainfall is in mm and
# Snowfall in cm, this simple sum mixes units.
dataset3 <- bike %>%
  mutate(Precipitation = Rainfall + Snowfall)
# Keep only the columns used in this set
dataset3 <- dataset3 %>%
  select(Rented.Bike.Count, Rainfall, Snowfall, Precipitation)
# Print the first few rows of the dataset
head(dataset3)
## Rented.Bike.Count Rainfall Snowfall Precipitation
## 1 254 0 0 0
## 2 204 0 0 0
## 3 173 0 0 0
## 4 107 0 0 0
## 5 78 0 0 0
## 6 100 0 0 0
# Scatter plot for Rented Bike Count vs. Wind Chill
ggplot(data = dataset1, aes(x = Wind.Chill, y = Rented.Bike.Count)) +
geom_point() +
labs(x = "Wind Chill", y = "Rented Bike Count") +
ggtitle("Scatter Plot: Rented Bike Count vs. Wind Chill")
### Insights
Wind Chill vs. Rented Bike Count: The scatter plot indicates a positive relationship between wind chill and bike rentals: more bikes tend to be rented at higher wind chill values. There are no apparent outliers.
Dew Point vs. Rented Bike Count: The scatter plot for dew point vs. rented bike count suggests that as dew point increases, bike rentals tend to increase as well. There are no significant outliers.
# Scatter plot for Rented Bike Count vs. Relative Humidity
ggplot(data = dataset2, aes(x = Relative_Humidity, y = Rented.Bike.Count)) +
geom_point() +
labs(x = "Relative Humidity", y = "Rented Bike Count") +
ggtitle("Scatter Plot: Rented Bike Count vs. Relative Humidity")
### Insights
Relative Humidity vs. Rented Bike Count: The scatter plot suggests a weak negative relationship: rentals tend to be lower at high relative humidity, which matches the negative correlation coefficient computed below (about -0.21). There are no significant outliers.
# Scatter plot for Rented Bike Count vs. Precipitation
ggplot(data = dataset3, aes(x = Precipitation, y = Rented.Bike.Count)) +
geom_point() +
labs(x = "Precipitation", y = "Rented Bike Count") +
ggtitle("Scatter Plot: Rented Bike Count vs. Precipitation")
### Insights
Precipitation vs. Rented Bike Count: Precipitation here is the sum of rainfall and snowfall. The scatter plot suggests a possible negative relationship, but there are relatively few data points at high precipitation values, so the trend is uncertain. There don’t appear to be significant outliers.
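Because most hours are dry (the third quartiles of Rainfall and Snowfall are both 0 in the summary), the precipitation scatter plot is dominated by points at zero, and a single correlation coefficient says little about rentals in wet hours. One way to make the comparison explicit is to group hours into precipitation bands and compare mean rentals per band. A minimal base-R sketch; rentals_by_precip() and its break points are hypothetical choices, and the toy values are made up for illustration.

```r
# Hypothetical helper: number of hours and mean rentals per precipitation band.
rentals_by_precip <- function(count, precip, breaks = c(-Inf, 0, 1, 5, Inf)) {
  band <- cut(precip, breaks = breaks)   # first band, (-Inf,0], holds the dry hours
  data.frame(
    band = levels(band),
    hours = as.vector(table(band)),
    mean_rentals = as.vector(tapply(count, band, mean))
  )
}

# Toy example with made-up values (not taken from bike.csv)
toy_count  <- c(800, 700, 900, 300, 150, 50)
toy_precip <- c(0,   0,   0,   0.5, 2,   10)
rentals_by_precip(toy_count, toy_precip)
```

On the real data this would be called as rentals_by_precip(dataset3$Rented.Bike.Count, dataset3$Precipitation); bands with no hours would show NA as their mean.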
### CORRELATION COEFFICIENT
# Pearson correlation between wind chill and rented bike count
correlation_wind_chill <- cor(dataset1$Wind.Chill, dataset1$Rented.Bike.Count)
# Print the result
cat("Correlation between Wind Chill and Rented Bike Count:", correlation_wind_chill, "\n")
## Correlation between Wind Chill and Rented Bike Count: 0.531666
# Pearson correlation between relative humidity and rented bike count
correlation_relative_humidity <- cor(dataset2$Relative_Humidity, dataset2$Rented.Bike.Count)
# Print the result
cat("Correlation between Relative Humidity and Rented Bike Count:", correlation_relative_humidity, "\n")
## Correlation between Relative Humidity and Rented Bike Count: -0.2074383
# Pearson correlation between total precipitation and rented bike count
correlation_precipitation <- cor(dataset3$Precipitation, dataset3$Rented.Bike.Count)
# Print the result
cat("Correlation between Precipitation and Rented Bike Count:", correlation_precipitation, "\n")
## Correlation between Precipitation and Rented Bike Count: -0.165494
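cor() returns only the point estimate. Base R's cor.test() additionally reports a confidence interval (via Fisher's z transformation) and a p-value for Pearson's r. With 8,760 hourly observations, even a weak correlation such as -0.17 will be highly "significant", so the size of r and the width of its interval matter more than the p-value; the test also assumes independent observations, which consecutive hours may not be. A sketch on made-up toy data:

```r
# Toy data (made up for illustration): a noisy linear relationship
set.seed(1)
x <- 1:50
y <- 2 * x + rnorm(50, sd = 20)

res <- cor.test(x, y, method = "pearson", conf.level = 0.95)
res$estimate   # same value as cor(x, y)
res$conf.int   # 95% confidence interval for the correlation
res$p.value
```

For this report the call would be, for example, cor.test(dataset1$Wind.Chill, dataset1$Rented.Bike.Count).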
### INTERPRETATION: VISUALIZATIONS VS. CORRELATION COEFFICIENTS
Comparing the correlation coefficients with the scatter plots:
Set 1 (Temperature, Wind Chill, and Dew Point vs. Rented Bike Count): The correlation between wind chill and rented bike count is 0.53, a moderate positive association, which agrees with the upward trend in the wind chill scatter plot.
Set 2 (Humidity, Visibility, and Relative Humidity vs. Rented Bike Count): The correlation between relative humidity and rented bike count is -0.21, a weak negative association: rentals tend to be somewhat lower in more humid hours.
Set 3 (Rainfall, Snowfall, and Precipitation vs. Rented Bike Count): The correlation between precipitation (combined rainfall and snowfall) and rented bike count is -0.17, a weak negative association, consistent with the tentative negative trend in the precipitation scatter plot. Because a large majority of hours are dry, this coefficient rests on relatively few wet hours.
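Pearson's r measures linear association, so a clear but curved trend in a scatter plot can still produce a modest coefficient, and the rental counts are right-skewed (the mean, 704.6, is well above the median, 504.5). Spearman's rank correlation, available through cor(..., method = "spearman"), measures monotonic association and can serve as a cross-check when a plot and a coefficient seem to disagree. A toy illustration with made-up values:

```r
# Toy data: y always increases with x, but not along a straight line
x <- 1:20
y <- exp(x / 3)

cor(x, y, method = "pearson")   # below 1: the trend is not linear
cor(x, y, method = "spearman")  # 1: the trend is perfectly monotonic
```

Applied here, one could compare, for example, cor(dataset2$Relative_Humidity, dataset2$Rented.Bike.Count, method = "spearman") with the Pearson value above.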
### CONFIDENCE INTERVAL
# Hourly rental counts
bike_count <- bike$Rented.Bike.Count
# Sample mean, standard deviation, and sample size
sample_mean <- mean(bike_count)
sample_sd <- sd(bike_count)
n <- length(bike_count)
alpha <- 0.05  # significance level; confidence level = 1 - alpha = 95%
# Calculate the standard error of the mean
standard_error <- sample_sd / sqrt(n)
# Calculate the margin of error using the t-distribution
t_critical <- qt(1 - alpha/2, df = n - 1) # Get the t-critical value
margin_of_error <- t_critical * standard_error
# Calculate the confidence interval
confidence_interval <- c(sample_mean - margin_of_error, sample_mean + margin_of_error)
# Print the confidence interval
cat("Confidence Interval (95%):", confidence_interval, "\n")
## Confidence Interval (95%): 691.0933 718.1108
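The same t-based interval can be obtained in one call with base R's t.test(), which is a convenient check on the manual computation; t.test(bike$Rented.Bike.Count)$conf.int should reproduce the interval above. Both versions treat the 8,760 hourly counts as independent draws; because consecutive hours are likely correlated, the real uncertainty is probably larger than this interval suggests. A sketch that confirms the two approaches agree, on made-up toy data:

```r
# Toy data (made up for illustration), loosely resembling hourly counts
set.seed(42)
counts <- rpois(200, lambda = 5) * 100

# Manual t-interval, following the same steps as the chunk above
alpha <- 0.05
se <- sd(counts) / sqrt(length(counts))
t_crit <- qt(1 - alpha / 2, df = length(counts) - 1)
manual_ci <- mean(counts) + c(-1, 1) * t_crit * se

# Built-in equivalent: one-sample t-test confidence interval
builtin_ci <- t.test(counts, conf.level = 1 - alpha)$conf.int
```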
Interpretation: The 95% confidence interval for the mean number of rented bikes per hour is about (691.1, 718.1). If the sampling were repeated many times and an interval built this way each time, about 95% of those intervals would contain the true population mean; in that sense we are 95% confident that the true mean hourly rental count lies between roughly 691 and 718 bikes.
Conclusion: The interval is narrow relative to the sample mean of 704.6 (a margin of error of about 13.5 bikes) because the sample is large (n = 8760). It describes the uncertainty in the average hourly rental count, not the spread of individual hours, which range from 0 to 3556 bikes.