R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any R code chunks embedded in the document. You can embed an R code chunk like this:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(purrr)
library(ggplot2)
bike <- read.csv('D:/FALL 2023/STATISTICS/datasets/bike.csv')
## Random sampling of bike dataset

```r
# Draw a random number of subsamples (between 5 and 10)
num <- sample(5:10, 1)
columns <- c("Seasons", "Holiday", "Rented.Bike.Count", "Visibility", "Humidity", "Snowfall")
subsample_list <- list()
for (i in seq_len(num)) {
  # Determine sample size (approximately 50% of the data)
  s_size <- round(0.5 * nrow(bike))
  # Randomly select rows with replacement
  s_index <- sample(seq_len(nrow(bike)), size = s_size, replace = TRUE)
  # Create the subsample data frame
  subsample <- bike[s_index, columns]
  # Store the subsample in the list
  subsample_list[[i]] <- subsample
}
# View() opens a data viewer, so only call it in an interactive session
if (interactive()) View(subsample_list)
```
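
Because `sample()` draws are random, every knit of this document produces different subsamples and slightly different numbers. A minimal sketch of one way to make the draw reproducible, assuming a small synthetic data frame `toy` in place of `bike`, a helper named `draw_subsamples`, and an arbitrary seed (all three are hypothetical and not part of the original analysis):

```r
# Hypothetical stand-in for the bike data; the real dataset has more rows and columns
toy <- data.frame(
  Humidity   = c(30, 45, 50, 62, 70, 81, 90, 55, 48, 66),
  Visibility = c(2000, 1800, 1500, 900, 400, 300, 2000, 1200, 1900, 700)
)

# Draw `n_sub` subsamples of about `frac` of the rows, with replacement
draw_subsamples <- function(data, n_sub, frac = 0.5, seed = 123) {
  set.seed(seed)  # fixing the seed makes every call return the same rows
  s_size <- round(frac * nrow(data))
  lapply(seq_len(n_sub), function(i) {
    s_index <- sample(seq_len(nrow(data)), size = s_size, replace = TRUE)
    data[s_index, , drop = FALSE]
  })
}

first_run  <- draw_subsamples(toy, n_sub = 3)
second_run <- draw_subsamples(toy, n_sub = 3)

# Both runs use the same seed, so they contain exactly the same rows
identical(first_run, second_run)
```

Setting the seed inside the helper keeps the example self-contained; in a report it is more common to call `set.seed()` once near the top of the document.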

Displaying the dimensions of each subsample

sapply(subsample_list, dim)
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 4380 4380 4380 4380 4380 4380
## [2,]    6    6    6    6    6    6

Summaries of the sample data

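# Build a kable summary table for each subsample
summary_table <- lapply(subsample_list, function(subsample) {
  summary_df <- summary(subsample)
  knitr::kable(summary_df, caption = "summary statistics")
})
# Print each table under its own heading; with the chunk option
# results = 'asis' the headings and tables render as Markdown
for (i in seq_len(num)) {
  cat("### Subsample", i, "summary statistics \n")
  print(summary_table[[i]])
}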
## ### Subsample 1 summary statistics 
## 
## 
## Table: summary statistics
## 
## |   |  Seasons        |  Holiday        |Rented.Bike.Count |  Visibility   |   Humidity   |   Snowfall     |
## |:--|:----------------|:----------------|:-----------------|:--------------|:-------------|:---------------|
## |   |Length:4380      |Length:4380      |Min.   :   0.0    |Min.   :  33.0 |Min.   : 0.00 |Min.   :0.00000 |
## |   |Class :character |Class :character |1st Qu.: 192.0    |1st Qu.: 951.5 |1st Qu.:42.00 |1st Qu.:0.00000 |
## |   |Mode  :character |Mode  :character |Median : 490.5    |Median :1681.0 |Median :57.00 |Median :0.00000 |
## |   |NA               |NA               |Mean   : 704.9    |Mean   :1435.9 |Mean   :58.12 |Mean   :0.07011 |
## |   |NA               |NA               |3rd Qu.:1062.0    |3rd Qu.:1999.0 |3rd Qu.:74.00 |3rd Qu.:0.00000 |
## |   |NA               |NA               |Max.   :3556.0    |Max.   :2000.0 |Max.   :98.00 |Max.   :8.80000 |
## ### Subsample 2 summary statistics 
## 
## 
## Table: summary statistics
## 
## |   |  Seasons        |  Holiday        |Rented.Bike.Count |  Visibility |   Humidity   |   Snowfall     |
## |:--|:----------------|:----------------|:-----------------|:------------|:-------------|:---------------|
## |   |Length:4380      |Length:4380      |Min.   :   0.0    |Min.   :  27 |Min.   : 0.00 |Min.   :0.00000 |
## |   |Class :character |Class :character |1st Qu.: 193.0    |1st Qu.: 946 |1st Qu.:42.00 |1st Qu.:0.00000 |
## |   |Mode  :character |Mode  :character |Median : 479.5    |Median :1674 |Median :57.00 |Median :0.00000 |
## |   |NA               |NA               |Mean   : 704.7    |Mean   :1435 |Mean   :58.24 |Mean   :0.07023 |
## |   |NA               |NA               |3rd Qu.:1071.2    |3rd Qu.:2000 |3rd Qu.:74.00 |3rd Qu.:0.00000 |
## |   |NA               |NA               |Max.   :3298.0    |Max.   :2000 |Max.   :98.00 |Max.   :8.80000 |
## ### Subsample 3 summary statistics 
## 
## 
## Table: summary statistics
## 
## |   |  Seasons        |  Holiday        |Rented.Bike.Count |  Visibility |   Humidity   |   Snowfall     |
## |:--|:----------------|:----------------|:-----------------|:------------|:-------------|:---------------|
## |   |Length:4380      |Length:4380      |Min.   :   0.0    |Min.   :  27 |Min.   : 0.00 |Min.   :0.00000 |
## |   |Class :character |Class :character |1st Qu.: 185.0    |1st Qu.: 895 |1st Qu.:43.00 |1st Qu.:0.00000 |
## |   |Mode  :character |Mode  :character |Median : 479.5    |Median :1662 |Median :57.00 |Median :0.00000 |
## |   |NA               |NA               |Mean   : 692.7    |Mean   :1420 |Mean   :58.52 |Mean   :0.07888 |
## |   |NA               |NA               |3rd Qu.:1044.0    |3rd Qu.:2000 |3rd Qu.:75.00 |3rd Qu.:0.00000 |
## |   |NA               |NA               |Max.   :3556.0    |Max.   :2000 |Max.   :98.00 |Max.   :8.80000 |
## ### Subsample 4 summary statistics 
## 
## 
## Table: summary statistics
## 
## |   |  Seasons        |  Holiday        |Rented.Bike.Count |  Visibility |   Humidity   |   Snowfall     |
## |:--|:----------------|:----------------|:-----------------|:------------|:-------------|:---------------|
## |   |Length:4380      |Length:4380      |Min.   :   0.0    |Min.   :  53 |Min.   : 0.00 |Min.   :0.00000 |
## |   |Class :character |Class :character |1st Qu.: 202.0    |1st Qu.: 955 |1st Qu.:42.00 |1st Qu.:0.00000 |
## |   |Mode  :character |Mode  :character |Median : 518.0    |Median :1715 |Median :57.00 |Median :0.00000 |
## |   |NA               |NA               |Mean   : 720.8    |Mean   :1447 |Mean   :57.96 |Mean   :0.07048 |
## |   |NA               |NA               |3rd Qu.:1088.0    |3rd Qu.:2000 |3rd Qu.:74.00 |3rd Qu.:0.00000 |
## |   |NA               |NA               |Max.   :3418.0    |Max.   :2000 |Max.   :98.00 |Max.   :6.00000 |
## ### Subsample 5 summary statistics 
## 
## 
## Table: summary statistics
## 
## |   |  Seasons        |  Holiday        |Rented.Bike.Count |  Visibility   |   Humidity   |   Snowfall     |
## |:--|:----------------|:----------------|:-----------------|:--------------|:-------------|:---------------|
## |   |Length:4380      |Length:4380      |Min.   :   0.0    |Min.   :  33.0 |Min.   : 0.00 |Min.   :0.00000 |
## |   |Class :character |Class :character |1st Qu.: 193.0    |1st Qu.: 954.2 |1st Qu.:43.00 |1st Qu.:0.00000 |
## |   |Mode  :character |Mode  :character |Median : 499.0    |Median :1700.0 |Median :57.00 |Median :0.00000 |
## |   |NA               |NA               |Mean   : 704.4    |Mean   :1444.2 |Mean   :58.23 |Mean   :0.07781 |
## |   |NA               |NA               |3rd Qu.:1058.0    |3rd Qu.:2000.0 |3rd Qu.:74.00 |3rd Qu.:0.00000 |
## |   |NA               |NA               |Max.   :3404.0    |Max.   :2000.0 |Max.   :98.00 |Max.   :7.10000 |
## ### Subsample 6 summary statistics 
## 
## 
## Table: summary statistics
## 
## |   |  Seasons        |  Holiday        |Rented.Bike.Count |  Visibility |   Humidity   |   Snowfall     |
## |:--|:----------------|:----------------|:-----------------|:------------|:-------------|:---------------|
## |   |Length:4380      |Length:4380      |Min.   :   0.0    |Min.   :  27 |Min.   : 0.00 |Min.   :0.00000 |
## |   |Class :character |Class :character |1st Qu.: 189.8    |1st Qu.: 906 |1st Qu.:43.00 |1st Qu.:0.00000 |
## |   |Mode  :character |Mode  :character |Median : 489.5    |Median :1686 |Median :58.00 |Median :0.00000 |
## |   |NA               |NA               |Mean   : 698.4    |Mean   :1424 |Mean   :58.69 |Mean   :0.08171 |
## |   |NA               |NA               |3rd Qu.:1059.0    |3rd Qu.:2000 |3rd Qu.:75.00 |3rd Qu.:0.00000 |
## |   |NA               |NA               |Max.   :3404.0    |Max.   :2000 |Max.   :98.00 |Max.   :5.00000 |
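
The per-subsample tables above are long and hard to compare side by side. One possible alternative, sketched here on synthetic data, collects the mean and standard deviation of a single numeric column into one data frame with a row per subsample. The list `toy_list`, its values, and the helper `compare_column` are hypothetical stand-ins, not part of the original analysis:

```r
# Hypothetical stand-in for subsample_list: three tiny "subsamples"
toy_list <- list(
  data.frame(Humidity = c(40, 50, 60)),
  data.frame(Humidity = c(45, 55, 65)),
  data.frame(Humidity = c(50, 50, 50))
)

# One row per subsample with the mean and standard deviation of `column`
compare_column <- function(samples, column) {
  data.frame(
    subsample = seq_along(samples),
    mean = vapply(samples, function(s) mean(s[[column]]), numeric(1)),
    sd   = vapply(samples, function(s) sd(s[[column]]), numeric(1))
  )
}

comparison <- compare_column(toy_list, "Humidity")
print(comparison)
```

With the real data, the call would presumably look like `compare_column(subsample_list, "Humidity")`, and the resulting data frame could be passed to `knitr::kable()` once instead of printing one table per subsample.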

Scrutinizing subsamples

# Summary statistics for each subsample
summary_stats <- lapply(subsample_list, summary)
# Build a histogram of Humidity for each subsample
histograms <- lapply(subsample_list, function(subsample) {
  ggplot(subsample, aes(x = Humidity)) +
    geom_histogram(binwidth = 1, fill = 'blue', color = 'black') +
    labs(title = "Histogram for humidity", x = 'Humidity', y = 'Frequency')
})
# Display the summary and the histogram for each subsample
for (i in seq_len(num)) {
  cat("Subsample", i, "summary statistics:\n")
  print(summary_stats[[i]])
  print(histograms[[i]])
}
## Subsample 1 summary statistics:
##    Seasons            Holiday          Rented.Bike.Count   Visibility    
##  Length:4380        Length:4380        Min.   :   0.0    Min.   :  33.0  
##  Class :character   Class :character   1st Qu.: 192.0    1st Qu.: 951.5  
##  Mode  :character   Mode  :character   Median : 490.5    Median :1681.0  
##                                        Mean   : 704.9    Mean   :1435.9  
##                                        3rd Qu.:1062.0    3rd Qu.:1999.0  
##                                        Max.   :3556.0    Max.   :2000.0  
##     Humidity        Snowfall      
##  Min.   : 0.00   Min.   :0.00000  
##  1st Qu.:42.00   1st Qu.:0.00000  
##  Median :57.00   Median :0.00000  
##  Mean   :58.12   Mean   :0.07011  
##  3rd Qu.:74.00   3rd Qu.:0.00000  
##  Max.   :98.00   Max.   :8.80000

## Subsample 2 summary statistics:
##    Seasons            Holiday          Rented.Bike.Count   Visibility  
##  Length:4380        Length:4380        Min.   :   0.0    Min.   :  27  
##  Class :character   Class :character   1st Qu.: 193.0    1st Qu.: 946  
##  Mode  :character   Mode  :character   Median : 479.5    Median :1674  
##                                        Mean   : 704.7    Mean   :1435  
##                                        3rd Qu.:1071.2    3rd Qu.:2000  
##                                        Max.   :3298.0    Max.   :2000  
##     Humidity        Snowfall      
##  Min.   : 0.00   Min.   :0.00000  
##  1st Qu.:42.00   1st Qu.:0.00000  
##  Median :57.00   Median :0.00000  
##  Mean   :58.24   Mean   :0.07023  
##  3rd Qu.:74.00   3rd Qu.:0.00000  
##  Max.   :98.00   Max.   :8.80000

## Subsample 3 summary statistics:
##    Seasons            Holiday          Rented.Bike.Count   Visibility  
##  Length:4380        Length:4380        Min.   :   0.0    Min.   :  27  
##  Class :character   Class :character   1st Qu.: 185.0    1st Qu.: 895  
##  Mode  :character   Mode  :character   Median : 479.5    Median :1662  
##                                        Mean   : 692.7    Mean   :1420  
##                                        3rd Qu.:1044.0    3rd Qu.:2000  
##                                        Max.   :3556.0    Max.   :2000  
##     Humidity        Snowfall      
##  Min.   : 0.00   Min.   :0.00000  
##  1st Qu.:43.00   1st Qu.:0.00000  
##  Median :57.00   Median :0.00000  
##  Mean   :58.52   Mean   :0.07888  
##  3rd Qu.:75.00   3rd Qu.:0.00000  
##  Max.   :98.00   Max.   :8.80000

## Subsample 4 summary statistics:
##    Seasons            Holiday          Rented.Bike.Count   Visibility  
##  Length:4380        Length:4380        Min.   :   0.0    Min.   :  53  
##  Class :character   Class :character   1st Qu.: 202.0    1st Qu.: 955  
##  Mode  :character   Mode  :character   Median : 518.0    Median :1715  
##                                        Mean   : 720.8    Mean   :1447  
##                                        3rd Qu.:1088.0    3rd Qu.:2000  
##                                        Max.   :3418.0    Max.   :2000  
##     Humidity        Snowfall      
##  Min.   : 0.00   Min.   :0.00000  
##  1st Qu.:42.00   1st Qu.:0.00000  
##  Median :57.00   Median :0.00000  
##  Mean   :57.96   Mean   :0.07048  
##  3rd Qu.:74.00   3rd Qu.:0.00000  
##  Max.   :98.00   Max.   :6.00000

## Subsample 5 summary statistics:
##    Seasons            Holiday          Rented.Bike.Count   Visibility    
##  Length:4380        Length:4380        Min.   :   0.0    Min.   :  33.0  
##  Class :character   Class :character   1st Qu.: 193.0    1st Qu.: 954.2  
##  Mode  :character   Mode  :character   Median : 499.0    Median :1700.0  
##                                        Mean   : 704.4    Mean   :1444.2  
##                                        3rd Qu.:1058.0    3rd Qu.:2000.0  
##                                        Max.   :3404.0    Max.   :2000.0  
##     Humidity        Snowfall      
##  Min.   : 0.00   Min.   :0.00000  
##  1st Qu.:43.00   1st Qu.:0.00000  
##  Median :57.00   Median :0.00000  
##  Mean   :58.23   Mean   :0.07781  
##  3rd Qu.:74.00   3rd Qu.:0.00000  
##  Max.   :98.00   Max.   :7.10000

## Subsample 6 summary statistics:
##    Seasons            Holiday          Rented.Bike.Count   Visibility  
##  Length:4380        Length:4380        Min.   :   0.0    Min.   :  27  
##  Class :character   Class :character   1st Qu.: 189.8    1st Qu.: 906  
##  Mode  :character   Mode  :character   Median : 489.5    Median :1686  
##                                        Mean   : 698.4    Mean   :1424  
##                                        3rd Qu.:1059.0    3rd Qu.:2000  
##                                        Max.   :3404.0    Max.   :2000  
##     Humidity        Snowfall      
##  Min.   : 0.00   Min.   :0.00000  
##  1st Qu.:43.00   1st Qu.:0.00000  
##  Median :58.00   Median :0.00000  
##  Mean   :58.69   Mean   :0.08171  
##  3rd Qu.:75.00   3rd Qu.:0.00000  
##  Max.   :98.00   Max.   :5.00000
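
Comparing several separate histograms by eye is error-prone. A rough sketch of a numeric check, using base R's `hist()` with `plot = FALSE`: when every subsample is binned with the same breaks, the share of observations in each bin can be compared directly. The vectors `sub_a` and `sub_b`, the bin width, and the helper `bin_share` are hypothetical and chosen only for illustration:

```r
# Hypothetical humidity values for two subsamples (not the bike data)
sub_a <- c(12, 35, 41, 58, 59, 63, 77, 92)
sub_b <- c(18, 33, 47, 52, 61, 68, 74, 95)

# Shared breaks so the bins line up across subsamples
breaks <- seq(0, 100, by = 25)

# Proportion of observations falling into each bin
bin_share <- function(x, breaks) {
  counts <- hist(x, breaks = breaks, plot = FALSE)$counts
  counts / sum(counts)
}

shares <- rbind(a = bin_share(sub_a, breaks), b = bin_share(sub_b, breaks))

# Largest difference in bin share between the two subsamples
max_gap <- max(abs(shares["a", ] - shares["b", ]))
print(shares)
print(max_gap)
```

A small `max_gap` suggests the two distributions have a similar shape; what counts as "small" is a judgment call that depends on the bin width and the sample size.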

Anomalies and consistency

# Mean Humidity of each subsample
means <- lapply(subsample_list, function(subsample) {
  mean(subsample$Humidity)
})
print(means)
## [[1]]
## [1] 58.11849
## 
## [[2]]
## [1] 58.23653
## 
## [[3]]
## [1] 58.51553
## 
## [[4]]
## [1] 57.96096
## 
## [[5]]
## [1] 58.23333
## 
## [[6]]
## [1] 58.68721
# Standard deviation of Visibility in each subsample
sds <- lapply(subsample_list, function(subsample) {
  sd(subsample$Visibility)
})
print(sds)
## [[1]]
## [1] 607.5581
## 
## [[2]]
## [1] 610.8125
## 
## [[3]]
## [1] 614.9329
## 
## [[4]]
## [1] 602.7616
## 
## [[5]]
## [1] 602.0797
## 
## [[6]]
## [1] 615.1279
# Print the mean of Humidity and the standard deviation of Visibility per subsample
for (i in seq_len(num)) {
  cat("Subsample", i, "Humidity", means[[i]], "\n")
  cat("Subsample", i, "Visibility", sds[[i]], "\n")
}
## Subsample 1 Humidity 58.11849 
## Subsample 1 Visibility 607.5581 
## Subsample 2 Humidity 58.23653 
## Subsample 2 Visibility 610.8125 
## Subsample 3 Humidity 58.51553 
## Subsample 3 Visibility 614.9329 
## Subsample 4 Humidity 57.96096 
## Subsample 4 Visibility 602.7616 
## Subsample 5 Humidity 58.23333 
## Subsample 5 Visibility 602.0797 
## Subsample 6 Humidity 58.68721 
## Subsample 6 Visibility 615.1279

ANALYSIS:

1. After dividing the dataset into subsamples and applying several aggregations, we observe no substantial variation between subsamples: the mean of Humidity ranges only from 57.96 to 58.69, and the standard deviation of Visibility from 602.08 to 615.13.
2. Because the deviations are this small, anomalies are unlikely. This is what we can observe from one of the columns, "Humidity".
3. I conclude that the columns are highly consistent across all the subsamples.
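
The consistency claim can be put into numbers using the values printed earlier. A small sketch that takes the six reported Humidity means and Visibility standard deviations and expresses their range as a percentage of their average (the helper `relative_range` is a hypothetical name introduced here, and this measure is only one of several reasonable choices):

```r
# Values copied from the printed output of the six subsamples
humidity_means <- c(58.11849, 58.23653, 58.51553, 57.96096, 58.23333, 58.68721)
visibility_sds <- c(607.5581, 610.8125, 614.9329, 602.7616, 602.0797, 615.1279)

# Range of the values divided by their average
relative_range <- function(x) (max(x) - min(x)) / mean(x)

humidity_spread_pct   <- round(100 * relative_range(humidity_means), 2)
visibility_spread_pct <- round(100 * relative_range(visibility_sds), 2)

cat("Humidity means vary by", humidity_spread_pct, "% of their average\n")
cat("Visibility SDs vary by", visibility_spread_pct, "% of their average\n")
```

Both spreads come out at roughly one to two percent of the average. This is a descriptive check, not a formal significance test.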