The surge in popularity of bike-sharing systems in urban environments has led to an increased interest in understanding the factors that influence bike rental patterns (Lu and Lin 2020). The availability of extensive datasets, capturing details ranging from weather conditions to temporal variations, offers an opportunity to examine this phenomenon in depth. Previous studies in this domain have often focused on specific aspects such as the influence of weather on bike rentals or the role of time-related factors in shaping demand. This study instead aims to provide a holistic perspective by considering a diverse array of variables and recognizing the interdependencies that may exist among them. Such a comprehensive analysis is useful for urban planners, policymakers, and bike-sharing operators seeking to optimize service provision and infrastructure based on the factors that drive bike rental demand. The study explores a bike-sharing dataset obtained from http://archive.ics.uci.edu/static/public/275/bike+sharing+dataset.zip. The dataset comprises 731 observations across 16 variables.
The study will undertake a systematic and rigorous process encompassing data exploration, descriptive analysis, exploratory data analysis (EDA), and multiple linear regression modeling. The initial step involves a thorough exploration of the dataset. This encompasses a meticulous examination of the structure, types, and general characteristics of the variables, setting the stage for subsequent analyses. Following data exploration, the study will conduct a descriptive analysis to unveil fundamental statistics and characteristics of numeric variables. This phase aims to provide a quantitative summary of the dataset, including measures of central tendency, dispersion, and potential outliers, offering a foundational understanding of the distribution and variability within the data. To enhance comprehension and reveal patterns within the dataset, the study will employ data visualization techniques. Figures depicting the frequency distribution of categorical variables, pairwise relationships between numeric variables, and the daily frequency of total rental bikes over time have been included, which will serve as a pivotal component in unraveling trends, relationships, and potential anomalies within the data. Moreover, this phase aims to uncover relationships between variables, particularly focusing on how categorical predictors may influence the total count of rented bikes.
Multiple regression analysis plays a pivotal role in understanding the relationships between variables in a study such as this investigation of bike rentals. Its primary value lies in its capacity to disentangle the individual contributions of different predictors to the variability observed in the dependent variable (Kashyap and Swastik 2021). By considering multiple predictors, the study can discern the unique association of each variable with the response while accounting for the presence of the others. In the bike rental analysis, this means separating the influence of temporal factors (season, month, day of the week), meteorological conditions (temperature, humidity, windspeed), and categorical variables (working day, weather situation) on the total count of bike rentals. Moreover, when interaction terms are included, multiple regression aids in identifying interactions and dependencies among variables (Butt et al. 2023), that is, how the effect of one variable may depend on the levels of others. Additionally, multiple regression provides a framework for hypothesis testing and model validation (Lu and Lin 2020). By assessing the statistical significance of each predictor, the study can ascertain which variables make a meaningful contribution to predicting the total count of bike rentals.
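For reference, the general form of a multiple linear regression model can be written as below. The symbols are introduced here for illustration and are not taken from the original text: \(y_i\) stands for the total rental count on day \(i\), \(x_{ij}\) for the value of predictor \(j\) on that day (categorical predictors enter as 0/1 indicator variables), and \(\varepsilon_i\) for the error term.

```latex
\[
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i,
\qquad i = 1, \dots, n
\]
```

Each coefficient \(\beta_j\) is read as the expected change in \(y\) for a one-unit change in \(x_j\) with the other predictors held fixed, which is the sense in which the model isolates individual contributions.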
The dataset employed in the bike rental study serves as a foundational element, providing the empirical basis for investigating the factors influencing the total count of rented bikes. This dataset encompasses various variables that encapsulate diverse dimensions relevant to bike rentals. Among the key variables are temporal features such as ‘season,’ ‘year,’ ‘month,’ and ‘weekday,’ which capture the influence of seasonal, annual, and weekly patterns on bike rental demand. Additionally, meteorological variables like ‘temperature,’ ‘humidity,’ and ‘windspeed’ offer insights into the impact of weather conditions on rental patterns. Categorical variables such as ‘workingday’ and ‘weathersit’ further contribute to the dataset’s richness, allowing for the exploration of how working days and different weather situations influence bike rentals. Furthermore, variables like ‘casual’ and ‘registered’ represent the counts of bike rentals by casual users and registered users, respectively, providing a breakdown of the overall demand.
In the study, the absence of outliers, as identified through boxplots, and the absence of missing values in the dataset represent favorable conditions for statistical analyses. The meticulous examination of boxplots, which visually display the distribution of the total count of rented bikes and other relevant variables, contributes to the assurance that extreme values or anomalies that could distort statistical analyses are not present. Simultaneously, the absence of missing values in the dataset is equally advantageous. The availability of complete data enhances the precision and accuracy of statistical analyses, allowing for a more comprehensive exploration of relationships between variables and a more reliable interpretation of results.
The descriptive statistics of the numeric variables in Table 1 provide a succinct overview of key statistical measures. The mean, standard deviation, median, minimum, maximum, range, skewness, and kurtosis are presented for each relevant variable. For instance, temperature (temp) and apparent temperature (atemp) exhibit mean values of 0.4954 and 0.4744, with standard deviations of 0.1831 and 0.1630, respectively. Their skewness values (-0.0543 and -0.1306) are close to zero, indicating roughly symmetric distributions, while their negative kurtosis values (-1.1246 and -0.9921) indicate lighter tails than a normal distribution. By contrast, ‘casual’ is clearly right-skewed (skewness 1.2613). Figure 1 illustrates the frequency distribution of the categorical variables ‘holiday’, ‘workingday’, and ‘weathersit’. Notably, the ‘holiday’ variable, which appears in Figure 1 with only one outcome, will be excluded from subsequent analyses. The plot reveals the prevalence of clear weather days, followed by cloudy conditions, and a lesser occurrence of light rainfall. Additionally, it indicates that there are more working days than non-working days.
| | mean | sd | median | min | max | range | skew | kurtosis |
|---|---|---|---|---|---|---|---|---|
| temp | 0.4954 | 0.1831 | 0.4983 | 0.0591 | 0.8617 | 0.8025 | -0.0543 | -1.1246 |
| atemp | 0.4744 | 0.1630 | 0.4867 | 0.0791 | 0.8409 | 0.7618 | -0.1306 | -0.9921 |
| hum | 0.6279 | 0.1424 | 0.6267 | 0.0000 | 0.9725 | 0.9725 | -0.0695 | -0.0803 |
| windspeed | 0.1905 | 0.0775 | 0.1810 | 0.0224 | 0.5075 | 0.4851 | 0.6746 | 0.3906 |
| casual | 848.1765 | 686.6225 | 713.0000 | 2.0000 | 3410.0000 | 3408.0000 | 1.2613 | 1.2931 |
| registered | 3656.1724 | 1560.2564 | 3662.0000 | 20.0000 | 6946.0000 | 6926.0000 | 0.0435 | -0.7227 |
| cnt | 4504.3488 | 1937.2115 | 4548.0000 | 22.0000 | 8714.0000 | 8692.0000 | -0.0472 | -0.8206 |
Figure 1: Frequency distribution of categorical variables
Pairwise relationships among numeric variables, as depicted in the pairs plot in Figure 2, uncover correlations that are crucial for subsequent analyses, as well as the distributions of the numeric variables. The time series plot in Figure 3 provides a visual representation of the daily frequency of total rental bikes over time. The plot indicates a recurring pattern with peaks during the mid-year and troughs at the end and beginning of each year. This temporal trend highlights the cyclical nature of bike rentals, potentially influenced by seasonal factors.
Figure 2: Pairwise relationships of numeric variables
Figure 3: Daily frequency of total rental bikes
The boxplots in Figure 4 explore the relationship between categorical variables and the total count of rental bikes. Noteworthy findings include the observation that fall exhibits the highest overall rental bikes, followed by summer, winter, and then spring. Additionally, the year 2012 witnessed more rentals compared to 2011. Examining the months, January tends to have the lowest overall rentals, with a gradual increase from February to June, maintaining a steady frequency until October, and then a decline leading back to January. Furthermore, the analysis indicates consistent rental frequencies throughout the week, with no discernible difference between working and non-working days. Clear weather conditions consistently lead to the highest rentals, followed by cloudy conditions, while light rainfall results in the lowest rental counts.
Figure 4: Relationship between total count and categorical variables
In the process of constructing a multiple linear regression model for predicting bike rentals, a thoughtful selection of the response variable and potential predictors is crucial for obtaining meaningful and interpretable results. The correlation matrix shown in Figure 2 reveals strong correlations, notably a correlation of 0.992 between temperature (temp) and apparent temperature (atemp). Because including both would introduce multicollinearity, only ‘temp’ is used in subsequent analyses, given its slightly higher correlation with the other variables compared to ‘atemp’. Moreover, ‘registered’ has a correlation of 0.946 with the total count ‘cnt’. Since ‘casual’ and ‘registered’ are components of the total, they are not used as predictors, and ‘cnt’ is chosen as the response variable. By focusing on the total count, the model aims to capture the comprehensive picture of bike rental patterns, encompassing both casual and registered rentals.
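The effect of keeping two nearly identical predictors can be illustrated with a small sketch. It uses simulated data rather than the bike-sharing data, and all variable names (`x1`, `x2`, `x3`, `y`) are hypothetical. The variance inflation factor is computed by hand as 1 / (1 - R²) from a regression of one predictor on the others.

```r
# Simulated data (not the bike-sharing data): x2 is almost a copy of x1,
# mimicking a pair of predictors with a correlation close to 1.
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- x1 + rnorm(n, sd = 0.02)   # near-duplicate predictor
x3 <- runif(n)                   # unrelated predictor
y  <- 3 * x1 + 2 * x3 + rnorm(n, sd = 0.5)

# Pairwise correlation between the two near-duplicates
r12 <- cor(x1, x2)

# VIF for x1: regress it on the other predictors, then 1 / (1 - R^2)
vif_x1_both <- 1 / (1 - summary(lm(x1 ~ x2 + x3))$r.squared)
vif_x1_drop <- 1 / (1 - summary(lm(x1 ~ x3))$r.squared)

# Standard error of the x1 coefficient with and without x2 in the model
se_both <- coef(summary(lm(y ~ x1 + x2 + x3)))["x1", "Std. Error"]
se_drop <- coef(summary(lm(y ~ x1 + x3)))["x1", "Std. Error"]

print(c(cor = r12, vif_both = vif_x1_both, vif_drop = vif_x1_drop))
print(c(se_both = se_both, se_drop = se_drop))
```

In this sketch the standard error of the `x1` coefficient is much larger when its near-duplicate stays in the model, which is the practical reason for retaining only one variable of such a pair.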
In tandem with the selection of the response variable, a careful consideration of potential predictors is paramount. The predictors, or independent variables, are chosen based on their perceived influence on the response variable. The predictors selected for the linear regression model include seasonal indicators (“season”), the year (“yr”), month (“mnth”), day of the week (“weekday”), working day status (“workingday”), weather conditions (“weathersit”), temperature (“temp”), humidity (“hum”), and windspeed (“windspeed”). Additionally, certain variables, such as “holiday,” “date,” and “instant,” have been intentionally omitted from the model. The exclusion of the “holiday” variable is justified by its singular outcome, implying that it does not provide variability for analysis. Meanwhile, “date” and “instant” are omitted because they act as identifiers rather than predictors. Including these variables in the regression model would not enhance the understanding of the factors influencing bike rentals and could introduce noise into the analysis (Desboulets 2018).
Subsequently, the multiple linear regression model is constructed using the selected response and predictor variables. The summary output of the model provides comprehensive information, including coefficients, standard errors, t-values, and p-values for each predictor variable, enabling a detailed examination of their individual contributions to predicting the total count of rented bikes. The diagnostic plots and tests provide insight into the model’s assumptions and performance. The most useful diagnostic for checking the assumption of constant variance is a plot of the residuals \(\hat{\varepsilon}\) against the fitted values \(\hat{y}\) (Williams, Grajales, and Kurkiewicz 2019). A Q-Q plot and the Shapiro-Wilk test were used to check the residuals for normality (Osborne and Waters 2019). A leverage point has an unusual predictor value and may influence certain model features. It may have little influence on the regression coefficient estimates yet a significant impact on model summary statistics such as \(R^2\) and the standard errors. A half-normal plot of the hat values was used to evaluate leverage (Grégoire 2014).
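The diagnostics described above can be sketched in base R as follows. This is a minimal illustration on simulated data with hypothetical variable names, not the study's own procedure: the study uses a half-normal plot for leverage, whereas this sketch swaps in a simple rule-of-thumb threshold on the hat values so that it needs no extra packages.

```r
# Simulated data for illustration only; variable names are hypothetical.
set.seed(42)
n   <- 100
x   <- c(runif(n - 1), 5)          # last point has an unusual predictor value
y   <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

# Residuals vs fitted values: a roughly flat band suggests constant variance
plot(fitted(fit), resid(fit), xlab = "Fitted", ylab = "Residuals")
abline(h = 0, lty = 2)

# Q-Q plot and Shapiro-Wilk test for normality of residuals
qqnorm(resid(fit)); qqline(resid(fit))
sw <- shapiro.test(resid(fit))

# Leverages are the diagonal of the hat matrix; they sum to the
# number of estimated coefficients (here 2)
h <- hatvalues(fit)
high_lev <- which(h > 2 * mean(h))  # common rule of thumb, not a formal test
```

The point with the extreme predictor value is flagged by the leverage check even though its residual need not be large, which is why leverage is examined separately from the residual plots.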
The linear regression model constructed for predicting the total count of rented bikes (cnt) reveals several insightful findings, where results are presented in the output below. The overall performance of the model is captured by the Multiple R-squared value of 0.8463, indicating that approximately 84.63% of the variability in total bike rentals can be explained by the predictor variables included in the model. The F-statistic of 177.2 is highly significant with a p-value less than 2.2e-16, suggesting that the overall model is statistically significant.
The intercept term, representing the estimated total count of rented bikes when all numeric predictors are zero and all categorical predictors are at their reference levels, is 2346.94. This intercept is statistically significant (p-value < 0.001), although such a combination of predictor values need not correspond to a realistic day, so the intercept serves mainly as a baseline for the other coefficients. Examining the coefficients for the seasonal variables, the model identifies significant variations in bike rentals across seasons. For instance, holding the other predictors fixed, the ‘Springer’ (spring) season is associated with a decrease of an estimated 846.25 units compared to the reference season, Fall. Conversely, the ‘Winter’ season is associated with an increase of 771.63 units relative to Fall, while the ‘Summer’ coefficient (46.89) is not statistically significant.
Call:
lm(formula = cnt ~ ., data = dt3)
Residuals:
    Min       1Q   Median       3Q      Max
-3941.8   -355.3     75.1    458.1   2973.9
Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)               2346.94     321.64   7.297 7.93e-13 ***
seasonSpringer            -846.25     213.54  -3.963 8.15e-05 ***
seasonSummer                46.89     185.44   0.253 0.800451
seasonWinter               771.63     191.55   4.028 6.22e-05 ***
yr1                       2020.11      58.35  34.622  < 2e-16 ***
mnth10                     505.32     242.02   2.088 0.037160 *
mnth11                    -152.08     230.83  -0.659 0.510215
mnth12                     -99.65     182.54  -0.546 0.585297
mnth2                      147.09     144.01   1.021 0.307426
mnth3                      574.91     165.37   3.477 0.000539 ***
mnth4                      476.36     247.82   1.922 0.054979 .
mnth5                      753.16     267.68   2.814 0.005034 **
mnth6                      540.66     281.30   1.922 0.055006 .
mnth7                       43.64     313.16   0.139 0.889218
mnth8                      448.49     301.16   1.489 0.136873
mnth9                      994.94     264.86   3.756 0.000187 ***
weekday                     68.89      14.33   4.806 1.88e-06 ***
workingdayYes              166.34      61.93   2.686 0.007396 **
weathersitCloudy          -470.32      77.19  -6.093 1.82e-09 ***
weathersitLight Rainfall -1960.57     196.29  -9.988  < 2e-16 ***
temp                      4402.22     410.61  10.721  < 2e-16 ***
hum                      -1479.18     292.10  -5.064 5.24e-07 ***
windspeed                -2909.10     406.83  -7.151 2.16e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 771.3 on 708 degrees of freedom
Multiple R-squared: 0.8463, Adjusted R-squared: 0.8415
F-statistic: 177.2 on 22 and 708 DF, p-value: < 2.2e-16
Regarding the year variable, ‘yr1’ (representing the year 2012) has a positive coefficient of 2020.11. This implies that, on average and holding the other predictors fixed, the total count of rented bikes in 2012 is estimated to be 2020.11 units higher than in the reference year 2011. This difference is highly statistically significant (p-value < 0.001), emphasizing the influence of the year on bike rentals. The month coefficients are each measured against the reference month, January, and not all of them are statistically significant. For example, ‘mnth3’ (March) is associated with an increase of 574.91 units in the total count, and this effect is statistically significant with a p-value of 0.000539. On the other hand, ‘mnth11’ (November) does not show a statistically significant difference from January (p-value of 0.510215).
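The reference categories and the ordering of the month coefficients in the output follow from how R builds factors from character columns. The following sketch reproduces that behaviour using the labels that appear in the output; the object names are hypothetical.

```r
# Labels as they appear in the regression output
season <- c("Springer", "Summer", "Fall", "Winter")
mnth   <- as.character(1:12)

# lm() converts character predictors to factors, with levels sorted
# as strings; the first level becomes the reference category
season_f <- factor(season)
mnth_f   <- factor(mnth)
levels(season_f)[1]   # "Fall"
levels(mnth_f)        # "1" "10" "11" "12" "2" ... "9"

# Optional: an explicit ordering makes months 1..12 appear in calendar order
mnth_ord <- factor(mnth, levels = as.character(1:12))

# Optional: choose a different reference season
season_spring_ref <- relevel(season_f, ref = "Springer")
```

Because month is stored as text, "10" sorts before "2", which is why `mnth10` precedes `mnth2` in the coefficient table while month "1" (January) remains the baseline.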
The ‘weekday’ variable, which enters the model as a numeric day-of-week code rather than as a categorical variable, has a positive coefficient of 68.89, indicating that, on average, each one-unit increase in the weekday code is associated with an increase of 68.89 units in the total count of rented bikes. This effect is statistically significant (p-value < 0.001). The ‘workingdayYes’ variable, distinguishing between working and non-working days, has a positive coefficient of 166.34, suggesting that working days are associated with an increase in bike rentals compared to non-working days. This effect is statistically significant with a p-value of 0.007396. For the ‘weathersit’ variable, the coefficients for ‘Cloudy’ and ‘Light Rainfall’ indicate the impact of different weather conditions on bike rentals compared to the reference condition, ‘Clear.’ ‘Cloudy’ is associated with a decrease of 470.32 units, while ‘Light Rainfall’ is associated with a substantial decrease of 1960.57 units in the total count of rented bikes. Both effects are highly statistically significant.
The coefficients for numeric variables, such as ‘temp,’ ‘hum’ (humidity), and ‘windspeed,’ provide insights into the quantitative impact of these meteorological factors on bike rentals. For instance, a one-unit increase in temperature (‘temp’) is associated with an increase of 4402.22 units in the total count of rented bikes. Because ‘temp’ takes values only between 0.0591 and 0.8617 in this dataset (Table 1), a one-unit change exceeds the observed range; a more interpretable reading is that an increase of 0.1 in ‘temp’ is associated with about 440 additional rentals. Conversely, ‘hum’ and ‘windspeed’ show negative associations, with increases in humidity and windspeed associated with decreases in bike rentals. In summary, the multiple linear regression model provides a comprehensive view of the factors influencing bike rentals, encompassing seasonal variations, year effects, month-to-month differences, day of the week, working day distinctions, and the impact of weather conditions and meteorological variables. The model demonstrates strong explanatory power, and the coefficients shed light on the relationships between predictors and the total count of rented bikes.
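The single ‘weekday’ coefficient arises because a numeric column yields one slope, whereas a factor yields one contrast per non-reference level. The sketch below shows the difference on simulated data; the data and object names are hypothetical and the example is not a re-analysis of the study.

```r
# Simulated data for illustration; not the bike-sharing data
set.seed(7)
weekday <- rep(0:6, times = 30)                    # numeric day codes
y <- 100 + 10 * (weekday %in% 1:5) + rnorm(length(weekday))

fit_num <- lm(y ~ weekday)           # one slope: linear trend across codes
fit_fac <- lm(y ~ factor(weekday))   # six contrasts against code 0

length(coef(fit_num))   # 2 (intercept + slope)
length(coef(fit_fac))   # 7 (intercept + 6 indicators)

# Compare the two nested fits with an F test
cmp <- anova(fit_num, fit_fac)
print(cmp)
```

A numeric coding assumes rentals change by the same amount from each day code to the next, while the factor coding allows each day its own level; the F test indicates whether the extra flexibility is supported by the data.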
Figure 5 shows an almost straight red line around zero in the residuals-versus-fitted plot, implying that the residuals exhibit constant variance, so no change in the structural form of the model is needed. There are, however, outliers in the residuals for certain observations. Additionally, the regression model’s tests and confidence intervals are based on the assumption of normal errors. The Q-Q plot reveals several observations with standardized residuals greater than 2, which are omitted from the data before refitting. The half-normal plot displays two observations, corresponding to the dates “2011-03-31” and “2011-03-10”, which appear to have large leverage and are also removed from the resampled data.
Figure 5: Model diagnostics of the fitted regression model
In light of the above observations, the dataset is resampled by removing the flagged observations and the model is fit again; the diagnostic tests for the refit model are presented below. To check the normality of the residuals formally, the Shapiro-Wilk test of normality is conducted. It is observed that the test rejects the null hypothesis of normally distributed residuals (W = 0.97048, p-value = 6.363e-11), so some departure from normality remains after refitting. Finally, outlier testing can help distinguish between truly abnormal observations and residuals that are large but not exceptional. The studentized residuals of the refit model are computed, the largest in absolute value is extracted, and a Bonferroni critical value is determined to evaluate the significance of this potential outlier, as presented in the output below. The largest studentized residual is for the date “2012-03-17”, with a value of 4.200546, which is rather large for a standard normal distribution. In absolute terms, the Bonferroni critical value is 4.553191. The observation’s absolute residual of 4.200546 is less than the critical value, indicating that the observation is not a significant outlier. Ultimately, the refit model addresses the concerns related to leverage and shows no significant outlier, although the Shapiro-Wilk result means that inference relying on normal errors should be interpreted with caution.
Shapiro-Wilk normality test
data: resid(m2)
W = 0.97048, p-value = 6.363e-11
2012-03-17
4.200546
[1] -4.553191
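For comparison, a commonly used textbook form of the Bonferroni outlier test is sketched below on simulated data. It is an illustration under stated assumptions, not the study's calculation: the divisor `2 * n` and the degrees of freedom `n - p - 1` are the conventional two-sided choices and differ from the expression used in the study's script, so the resulting critical value is not comparable with the one reported above. All object names are hypothetical.

```r
# Simulated data for illustration; names are hypothetical
set.seed(123)
n   <- 200
x   <- runif(n)
y   <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

p     <- length(coef(fit))          # number of estimated coefficients
stud  <- rstudent(fit)              # externally studentized residuals
worst <- stud[which.max(abs(stud))] # largest residual in absolute value

# Two-sided Bonferroni critical value at overall level 0.05
alpha <- 0.05
crit  <- qt(alpha / (2 * n), df = n - p - 1)

# Flag the observation only if it exceeds the critical value in magnitude
is_outlier <- abs(worst) > abs(crit)
print(c(largest = unname(worst), critical = crit))
```

Dividing the significance level by the number of tests keeps the overall chance of a false alarm at roughly the nominal level when every observation is screened.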
The comprehensive analysis of bike rental data provides valuable insights that can inform strategic decisions for stakeholders. Understanding the factors influencing rental patterns is crucial for optimizing operational efficiency and enhancing user experience. The study’s findings offer perspectives on the interplay of temporal, meteorological, and categorical variables, providing stakeholders with actionable information.
The temporal trends revealed in the time series plot underscore the importance of seasonality in bike rentals. Stakeholders can leverage this insight to anticipate peak demand periods and allocate resources effectively. For instance, the recurrent mid-year peaks suggest a need for increased bike availability, maintenance, and customer support during these periods.
The impact of meteorological factors, as indicated by the linear regression model, is of paramount importance for stakeholders. Temperature (‘temp’) emerges as a significant predictor, showcasing a strong positive association with rental counts. Stakeholders can use this information to optimize inventory management, ensuring an ample supply of bikes during periods of favorable weather conditions. Additionally, recognizing the influence of weather situations such as ‘Light Rainfall’ on reduced rentals highlights the importance of proactive measures, such as targeted marketing or promotional activities during adverse weather.
The insights gleaned from the relationship between categorical variables and rental counts provide actionable information for stakeholders. Understanding the varying rental patterns across seasons, years, and months allows for strategic planning of marketing campaigns, promotions, and infrastructure maintenance. Stakeholders can tailor their strategies to align with specific periods of high demand, for example by promoting winter riding while fall rentals are at their peak.
The diagnostic tests and model refinements underscore the importance of ongoing monitoring and validation. Stakeholders should remain vigilant to outliers and deviations from model assumptions. Continuous validation ensures that predictive models remain robust and aligned with evolving patterns in bike rentals. Furthermore, stakeholders may explore opportunities for collaboration with local weather services to enhance predictive accuracy, especially during periods of extreme weather conditions.
Stakeholders in the bike rental industry can leverage the study’s findings to optimize resource allocation, enhance user experience, and develop targeted strategies. The cyclical nature of bike rentals, coupled with the impact of weather and temporal factors, provides a roadmap for informed decision-making. Ongoing monitoring and collaboration with relevant services can further refine predictive models and contribute to a resilient and responsive bike rental ecosystem.
While the study contributes valuable insights to the field of bike-sharing systems, researchers should remain cognizant of the limitations and potential biases inherent in the analysis. Addressing these limitations and exploring avenues for improvement will pave the way for more robust and generalizable models in future research endeavors. One notable limitation is the reliance on a single dataset obtained from a specific source. The dataset’s representativeness may be constrained by factors such as geographical specificity, bike-sharing system characteristics, and temporal scope. Generalizing the study’s conclusions to a broader context or different bike-sharing systems requires caution. Future research could benefit from incorporating diverse datasets from multiple locations and systems to enhance the external validity of the findings.
Another potential source of bias arises from the variables included in the model. While the study carefully selected predictors based on their perceived relevance to bike rental patterns, the exclusion or inclusion of certain variables may introduce bias. For instance, unaccounted-for socio-economic factors, cultural nuances, or specific events during the study period could influence bike rentals but may not be adequately captured by the chosen variables. Future research should strive to identify and incorporate additional variables that may contribute to a more comprehensive understanding of bike rental dynamics.
The assumption of linearity in the multiple linear regression model is another point of consideration. While this assumption simplifies the model, it may not fully capture potential non-linear relationships between predictors and the response variable. Future research could explore more advanced modeling techniques, such as non-linear regression or machine learning algorithms, to uncover complex patterns that may be overlooked in linear models. Furthermore, the analysis primarily focused on the impact of meteorological, temporal, and categorical variables on bike rentals. Other influential factors, such as marketing initiatives, promotional campaigns, or changes in infrastructure, were not explicitly incorporated into the model. Future studies could expand the scope to include these additional factors, providing a more holistic understanding of the determinants of bike rental demand.
# Load required packages
library(psych)
library(reshape2)
library(gridExtra)
library(GGally)
library(lubridate)
library(faraway)
library(lmtest)
library(tidyverse)
# Import dataset
day <- read.csv("day.csv")
# New data for visualization
dt1 <- day %>%
  mutate(season = case_when(
    season == 1 ~ "Springer",
    season == 2 ~ "Summer",
    season == 3 ~ "Fall",
    season == 4 ~ "Winter"
  )) %>%
  mutate(weathersit = case_when(
    weathersit == 1 ~ "Clear",
    weathersit == 2 ~ "Cloudy",
    weathersit == 3 ~ "Light Rainfall",
    weathersit == 4 ~ "Heavy Rainfall"
  )) %>%
  mutate(
    # Recode holiday once; a second pass would compare the recoded strings
    # with 1 and set every value to "No"
    holiday = ifelse(holiday == 1, "Yes", "No"),
    workingday = ifelse(workingday == 1, "Yes", "No"),
    mnth = as.character(mnth),
    dteday = as.POSIXct(dteday, format = "%Y-%m-%d"),
    yr = as.character(yr)
  )
glimpse(dt1)
# Split dataset into character and continuous variables
df1 <- dt1[ ,sapply(dt1, is.character)] # Character variables
df2 <- dt1[ ,!sapply(dt1, is.character)] # Continuous variables
# Descriptive statistics
dsc <- describe(df2[,-c(1,2,3)]) %>%
select(-vars, -n, -trimmed, -mad, -se)
# Present results
knitr::kable(dsc, digits = 4,
caption = "Table 1: Descriptive statistics of numeric variables")
# Skew and kurtosis are close to zero for most variables; casual is right-skewed (skew 1.2613)
# Get frequencies of the categorical variables holiday, workingday and weathersit
Frequencies2 <- apply(df1[,-c(1:3)], 2, table) %>%
  lapply(data.frame)
ggplot(bind_rows(Frequencies2, .id = "df"), aes(x = Var1, y = Freq, fill = Var1)) +
  geom_bar(stat = "identity") +
  theme(legend.position = "none") +
  geom_text(aes(label = Freq), vjust = -0.3, size = 2.5) +
  labs(x = "") + facet_wrap(~ df, scales = "free", ncol = 1)
# Holiday variable is excluded from subsequent analysis (dropped from dt3 below)
# Clear weather days are the most frequent, followed by cloudy, followed by light rainfall
# There are more working days than non working days
# Pairwise relationships of numeric variables
ggpairs(df2[,-c(1:3)])
# From the pairs plot, temp and atemp highly correlated (0.992), temp variable will be used in subsequent analysis
# Registered and cnt (Total count) are also highly correlated (0.946) therefore cnt will be used as the response.
round(cor(df2[,-c(1:3)]), 3)
# Daily time series of total, registered and casual rentals (square-root scale)
ts_bikes <- data.frame(
  Date = dt1$dteday, TotalCount = dt1$cnt, Registered = dt1$registered,
  Casual = dt1$casual) %>%
  melt(id = "Date")
ggplot(ts_bikes, aes(x = Date, y = sqrt(value), color = variable)) +
  geom_line() +
  labs(y = "Square root of count", color = "Variable")
# The variables follow an almost identical pattern: peaks mid-year and troughs at the end and beginning of each year.
# Relationship between categorical variables and total count
dt2 <- dt1 %>%
  select(season, yr, mnth, weekday, workingday, weathersit, cnt) %>%
  melt(id = "cnt")
ggplot(dt2, aes(y = cnt, x = value, fill = value)) +
  geom_boxplot() +
  labs(y = "Total Rental Bikes", x = "") +
  theme(legend.position = "none") +
  facet_wrap(~variable, ncol = 2, scales = "free_x")
# Fall has the highest overall rental bikes, followed by summer then winter then spring
# 2012 had more rentals than 2011
# Jan has the lowest overall rentals, a gradual increase from Feb to June, then steady until Oct, then a decline back to Jan
# Generally, rentals are the same throughout the week, also working and non working days
# Highest rentals on clear weather, then cloudy, lowest on light rainfall
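# Select predictors for the LR model
# Drops instant, dteday, holiday, atemp, casual and registered
dt3 <- dt1[,-c(1,2,6,11,14,15)]
rownames(dt3) <- dt1$dteday
# Rows: 731
# Columns: 10 (nine predictors plus the response cnt)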
# Linear Regression Model
m1 <- lm(cnt ~., data = dt3)
summary(m1)
par(mfrow = c(2,2))
plot(m1, which = 1)
plot(m1, which = 2)
hatv <- hatvalues(m1)
halfnorm(hatv,ylab="Leverages", main="Half-normal Plot",labs=rownames(dt3))
cook <- cooks.distance(m1)
halfnorm(cook,3,labs=rownames(dt3),ylab="Cook's distances")
par(mfrow = c(1,1))
dt_resamp <- dt3[-c(69, 90, 692, 266, 668, 669), ]
# Linear Regression Model
m2 <- lm(cnt ~., data = dt_resamp)
# Check normality of residuals
shapiro.test(resid(m2))
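stud <- rstudent(m2)          # Compute studentized residuals
stud[which.max(abs(stud))]    # Extract the largest studentized residual in absolute value
# n is not defined earlier in the script; nrow(dt3) (731 rows) is assumed here
# because it reproduces the reported critical value of -4.553191
n <- nrow(dt3)
qt(.05/(n*22), n-22)          # Critical value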