You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as an knitted HTML file on Canvas.

In some cases, a incomplete code is given to guide you in the right direction. However, you will need to fill in the blanks in order to run the code block. Make sure to fill all the blanks, or comment them out, before you knit your R markdown file. Otherwise, it will return knitting errors.

Rio Olympics data set revisited

Load data

rio = read.csv("https://raw.githubusercontent.com/BUSN32100/data_files/master/rio.csv")

1a. Using count() check which country brought the most athletes, and how many was this? What are the 5 most represented countries? Do the same for the men and women separately.

rio %>% 
  count(nationality) %>% 
  arrange(desc(n))

##     nationality   n
## 1           USA 567
## 2           BRA 485
## 3           GER 441
## 4           AUS 431
## 5           FRA 410
## 6           CHN 404
## 7           GBR 374
## 8           JPN 346
## 9           CAN 321
## 10          ESP 313
## 11          ITA 312
## 12          RUS 286
## 13          NED 249
## 14          POL 242
## 15          ARG 223
## 16          KOR 213
## 17          NZL 208
## 18          UKR 205
## 19          SWE 164
## 20          COL 154
## 21          HUN 154
## 22          RSA 146
## 23          DEN 128
## 24          MEX 126
## 25          BLR 124
## 26          CUB 123
## 27          IND 123
## 28          EGY 122
## 29          BEL 108
## 30          CZE 104
## 31          SUI 104
## 32          KAZ 103
## 33          SRB 103
## 34          TUR 103
## 35          ROU  98
## 36          POR  95
## 37          GRE  93
## 38          CRO  88
## 39          VEN  88
## 40          IRL  80
## 41          KEN  80
## 42          NGR  78
## 43          AUT  71
## 44          UZB  70
## 45          ALG  68
## 46          LTU  67
## 47          IRI  64
## 48          SLO  63
## 49          NOR  62
## 50          TUN  61
## 51          JAM  57
## 52          AZE  56
## 53          TPE  56
## 54          FIJ  54
## 55          FIN  54
## 56          THA  54
## 57          SVK  51
## 58          BUL  50
## 59          MAR  49
## 60          ISR  47
## 61          EST  46
## 62          MGL  43
## 63          CHI  42
## 64          GEO  40
## 65          PUR  40
## 66          QAT  39
## 67          ECU  38
## 68          ETH  38
## 69          HKG  38
## 70          MNE  35
## 71          ZIM  35
## 72          BRN  34
## 73          ARM  32
## 74          LAT  32
## 75          MAS  32
## 76          TTO  32
## 77          PRK  31
## 78          BAH  30
## 79          HON  30
## 80          DOM  29
## 81          PER  29
## 82          INA  28
## 83          ANG  26
## 84          IRQ  26
## 85          SIN  25
## 86          CMR  24
## 87          MDA  23
## 88          VIE  23
## 89          SEN  22
## 90          GUA  21
## 91          UGA  21
## 92          KGZ  19
## 93          URU  17
## 94          CYP  16
## 95          GHA  16
## 96          PHI  13
## 97          UAE  13
## 98          BOL  12
## 99          BOT  12
## 100         CIV  12
## 101         ERI  12
## 102         BAR  11
## 103         BIH  11
## 104         CRC  11
## 105         KSA  11
## 106         MRI  11
## 107         PAR  11
## 108         CGO  10
## 109         HAI  10
## 110         LUX  10
## 111         NAM  10
## 112         PAN  10
## 113         ROT  10
## 114         SEY  10
## 115         ANT   9
## 116         BDI   9
## 117         COK   9
## 118         IOA   9
## 119         LIB   9
## 120         SRI   9
## 121         TKM   9
## 122         BER   8
## 123         ESA   8
## 124         ISL   8
## 125         JOR   8
## 126         KOS   8
## 127         LES   8
## 128         PNG   8
## 129         SAM   8
## 130         ARU   7
## 131         BAN   7
## 132         DJI   7
## 133         GRN   7
## 134         ISV   7
## 135         LBA   7
## 136         MLT   7
## 137         MYA   7
## 138         NEP   7
## 139         PAK   7
## 140         RWA   7
## 141         SKN   7
## 142         SYR   7
## 143         TAN   7
## 144         TGA   7
## 145         TJK   7
## 146         ZAM   7
## 147         ALB   6
## 148         BEN   6
## 149         CAF   6
## 150         CAM   6
## 151         GAB   6
## 152         GUY   6
## 153         LAO   6
## 154         MAD   6
## 155         MKD   6
## 156         MLI   6
## 157         MOZ   6
## 158         NIG   6
## 159         PLE   6
## 160         SUD   6
## 161         SUR   6
## 162         AND   5
## 163         BUR   5
## 164         CAY   5
## 165         CPV   5
## 166         FSM   5
## 167         GBS   5
## 168         GUI   5
## 169         GUM   5
## 170         LCA   5
## 171         MAW   5
## 172         MHL   5
## 173         NCA   5
## 174         PLW   5
## 175         SMR   5
## 176         TOG   5
## 177         ASA   4
## 178         COD   4
## 179         COM   4
## 180         GAM   4
## 181         IVB   4
## 182         MDV   4
## 183         OMA   4
## 184         SLE   4
## 185         VAN   4
## 186         VIN   4
## 187         AFG   3
## 188         BIZ   3
## 189         BRU   3
## 190         KIR   3
## 191         LIE   3
## 192         MON   3
## 193         SOL   3
## 194         SSD   3
## 195         STP   3
## 196         TLS   3
## 197         YEM   3
## 198         BHU   2
## 199         CHA   2
## 200         DMA   2
## 201         GEQ   2
## 202         LBR   2
## 203         MTN   2
## 204         NRU   2
## 205         SOM   2
## 206         SWZ   2
## 207         TUV   1

1b. Use summarise and sum , check how many medals of each type—gold, silver, bronze—were awarded at this Olympics?

⊕Hint We talk about using group_by and summarise together to create a summary table in lecture. If you only use summarise function (without using group_by), you can create a dataframe with one row to summarise the data.)

rio %>% 
  summarize(sum(gold),sum(bronze),sum(silver))

##   sum(gold) sum(bronze) sum(silver)
## 1       666         704         655

1c. Use mutate to create a column called total which adds the number of gold, silver, and bronze medals for each athlete, and add this column to the dataframe rio. Who are the top 10 athletes with the most number of medals and how many was this?

  rio=rio %>% 
  rowwise() %>% 
  mutate(total=sum(c(gold,bronze,silver))) %>% 
  ungroup()

1d. Create a new numeric variable called age, equal to 2016 - (the year of birth). Use group_by, summarise to find the minimum and maximum age of each sports, and order results by the ascending order of minimum age

rio %>% 
  mutate(age=2016-year_of_birth)

## # A tibble: 11,538 × 14
##           id name  natio…¹ sex   date_…² height weight sport  gold silver bronze
##        <int> <chr> <chr>   <chr> <chr>    <dbl>  <int> <chr> <int>  <int>  <int>
##  1 736041664 A Je… ESP     male  1969-1…   1.72     64 athl…     0      0      0
##  2 532037425 A La… KOR     fema… 1986-0…   1.68     56 fenc…     0      0      0
##  3 435962603 Aaro… CAN     male  1992-0…   1.98     79 athl…     0      0      1
##  4 521041435 Aaro… MDA     male  1991-0…   1.83     80 taek…     0      0      0
##  5  33922579 Aaro… NZL     male  1990-1…   1.81     71 cycl…     0      0      0
##  6 173071782 Aaro… AUS     male  1990-0…   1.8      67 tria…     0      0      0
##  7 266237702 Aaro… USA     male  1993-0…   2.05     98 voll…     0      0      1
##  8 382571888 Aaro… AUS     male  1991-0…   1.93    100 aqua…     0      0      0
##  9  87689776 Aaur… ESP     fema… 1988-1…   1.8      62 athl…     0      0      0
## 10 997877719 Abab… ETH     fema… 1991-0…   1.65     54 athl…     0      0      0
## # … with 11,528 more rows, 3 more variables: year_of_birth <int>, total <int>,
## #   age <dbl>, and abbreviated variable names ¹nationality, ²date_of_birth

rio$age = 2016 - rio$year_of_birth
rio %>% 
  group_by(sport) %>% 
  summarise(min = min(age),max = max(age)) %>% 
  arrange(min)

## # A tibble: 28 × 3
##    sport               min   max
##    <chr>             <dbl> <dbl>
##  1 aquatics             14    41
##  2 table tennis         15    54
##  3 athletics            16    47
##  4 fencing              16    42
##  5 football             16    41
##  6 gymnastics           16    41
##  7 shooting             16    56
##  8 weightlifting        16    41
##  9 archery              17    44
## 10 modern pentathlon    17    37
## # … with 18 more rows

1e. Create a new summary table call sports, with a column called n_participants which contains the number of participants in each sport, a column called n_gold which contains the number of gold medals in each sports. Then calculate the ratio of the number of gold medals to participants for each sport. Which sport has the highest ratio? Which has the lowest?

⊕Hint n() counts the total number of rows in each group, slice(c(1,n())) will return the first and last row

rio %>% 
  group_by(sport) %>% 
  summarise(n_participants=n(),n_gold=sum(gold),gold_ratio=n_gold/n_participants) %>% 
  arrange(gold_ratio) %>% 
  
  slice(c(which.min(gold_ratio),which.max(gold_ratio)))

## # A tibble: 2 × 4
##   sport      n_participants n_gold gold_ratio
##   <chr>               <int>  <int>      <dbl>
## 1 golf                  120      2     0.0167
## 2 gymnastics            324     30     0.0926

 #The sport with the lowest gold ratio is golf, and the sport with the highest gymnastics

Bike rentals in DC

Data

bike = read_csv("https://raw.githubusercontent.com/BUSN32100/data_files/master/bikeshare-day.csv")

The data include daily bike rental counts (by members and casual users) of Capital Bikeshare in Washington, DC in 2011 and 2012 as well as weather information on these days.

⊕Source: UCI Machine Learning Repository - Bike Sharing Dataset The original data sources are http://capitalbikeshare.com/system-data and http://www.freemeteo.com.

The codebook is below:

Variable name	Description
`instant`	record index
`dteday`	date
`season`	season (1:winter, 2:spring, 3:summer, 4:fall)
`yr`	year (0: 2011, 1:2012)
`mnth`	month (1 to 12)
`holiday`	weather day is holiday or not (extracted from http://dchr.dc.gov/page/holiday-schedule)
`weekday`	day of the week
`workingday`	if day is neither weekend nor holiday is 1, otherwise is 0.
`weathersit`	1: Clear, Few clouds, Partly cloudy, Partly cloudy
	2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
	3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
	4: Heavy Rain + Ice Pallets + Thunderstorm + Mist, Snow + Fog
`temp`	Normalized temperature in Celsius. The values are divided by 41 (max)
`atemp`	Normalized feeling temperature in Celsius. The values are divided by 50 (max)
`hum`	Normalized humidity. The values are divided by 100 (max)
`windspeed`	Normalized wind speed. The values are divided by 67 (max)
`casual`	Count of casual users
`registered`	Count of registered users
`cnt`	Count of total rental bikes including both casual and registered

Data wrangling

2a. Find the top three dates that had the highest total rental. In addition to number of total rentals also display the dteday, weekday, and holiday. What about three dates with the highest casual rentals?

bike$dteday=as.Date(bike$dteday)

bike %>% 
  group_by(dteday,weekday,holiday) %>% 
  summarise(total_rents=sum(cnt)) %>% 
  arrange(desc(total_rents))

## `summarise()` has grouped output by 'dteday', 'weekday'. You can override using
## the `.groups` argument.

## # A tibble: 731 × 4
## # Groups:   dteday, weekday [731]
##    dteday     weekday holiday total_rents
##    <date>       <dbl>   <dbl>       <dbl>
##  1 2012-09-15       6       0        8714
##  2 2012-09-29       6       0        8555
##  3 2012-09-22       6       0        8395
##  4 2012-03-23       5       0        8362
##  5 2012-05-19       6       0        8294
##  6 2012-09-09       0       0        8227
##  7 2012-07-25       3       0        8173
##  8 2012-09-21       5       0        8167
##  9 2012-10-05       5       0        8156
## 10 2012-06-02       6       0        8120
## # … with 721 more rows

2b. Find the top three dates that had the highest total rental during the spring season. In addition to number of total rentals also display the dteday, weekday, and holiday.

bike %>% 
 filter(season==2) %>%
  group_by(dteday,weekday,holiday) %>% 
  summarise(total_rents=sum(cnt)) %>% 
  arrange(desc(total_rents)) %>% 
  head(3)

## `summarise()` has grouped output by 'dteday', 'weekday'. You can override using
## the `.groups` argument.

## # A tibble: 3 × 4
## # Groups:   dteday, weekday [3]
##   dteday     weekday holiday total_rents
##   <date>       <dbl>   <dbl>       <dbl>
## 1 2012-03-23       5       0        8362
## 2 2012-05-19       6       0        8294
## 3 2012-06-02       6       0        8120

2c. Find the holidays in 2011 and order them by total rentals. In addition to total rental and date, also display the weather and temperatures.

bike %>% 
  filter(yr==0 & holiday==1) %>%
  group_by(dteday) %>% 
  summarise(dteday,total_rents=sum(cnt),weathersit,temp) %>% 
  arrange(desc(total_rents))

## # A tibble: 10 × 4
##    dteday     total_rents weathersit  temp
##    <date>           <dbl>      <dbl> <dbl>
##  1 2011-07-04        6043          2 0.727
##  2 2011-10-10        5117          1 0.571
##  3 2011-05-30        4098          1 0.733
##  4 2011-11-11        3368          1 0.324
##  5 2011-09-05        3351          2 0.673
##  6 2011-04-15        3126          1 0.447
##  7 2011-11-24        1495          1 0.373
##  8 2011-12-26        1317          1 0.322
##  9 2011-02-21        1107          2 0.303
## 10 2011-01-17        1000          2 0.176

2d. Recode the season variable to be a factor with meaningful level names as outlined in the codebook using case_when, and save it back to the bike variable (it is important to rewrite the season column in the framework for later questions).

bike %>% 
  mutate(season=case_when(season==1~"winter",season==2~"spring",season==3~"summer",season==4~"fall"))

## # A tibble: 731 × 16
##    instant dteday     season    yr  mnth holiday weekday working…¹ weath…²  temp
##      <dbl> <date>     <chr>  <dbl> <dbl>   <dbl>   <dbl>     <dbl>   <dbl> <dbl>
##  1       1 2011-01-01 winter     0     1       0       6         0       2 0.344
##  2       2 2011-01-02 winter     0     1       0       0         0       2 0.363
##  3       3 2011-01-03 winter     0     1       0       1         1       1 0.196
##  4       4 2011-01-04 winter     0     1       0       2         1       1 0.2  
##  5       5 2011-01-05 winter     0     1       0       3         1       1 0.227
##  6       6 2011-01-06 winter     0     1       0       4         1       1 0.204
##  7       7 2011-01-07 winter     0     1       0       5         1       2 0.197
##  8       8 2011-01-08 winter     0     1       0       6         0       2 0.165
##  9       9 2011-01-09 winter     0     1       0       0         0       1 0.138
## 10      10 2011-01-10 winter     0     1       0       1         1       1 0.151
## # … with 721 more rows, 6 more variables: atemp <dbl>, hum <dbl>,
## #   windspeed <dbl>, casual <dbl>, registered <dbl>, cnt <dbl>, and abbreviated
## #   variable names ¹workingday, ²weathersit

2e. Create new columns temp_raw, atemp_raw, hum_raw, windspeed_raw and store the raw values of temperature, feeling temperature, humidity and windspeed, respectively, as specified in the codebook. Update the dataframe and store the new columns in the dataframe.

bike$temp_raw=bike$temp*41
bike$atemp_raw=bike$atemp*50
bike$hum_raw=bike$hum*100
bike$windspeed_raw=bike$windspeed*67

head(bike)

## # A tibble: 6 × 20
##   instant dteday     season    yr  mnth holiday weekday workingday weath…¹  temp
##     <dbl> <date>      <dbl> <dbl> <dbl>   <dbl>   <dbl>      <dbl>   <dbl> <dbl>
## 1       1 2011-01-01      1     0     1       0       6          0       2 0.344
## 2       2 2011-01-02      1     0     1       0       0          0       2 0.363
## 3       3 2011-01-03      1     0     1       0       1          1       1 0.196
## 4       4 2011-01-04      1     0     1       0       2          1       1 0.2  
## 5       5 2011-01-05      1     0     1       0       3          1       1 0.227
## 6       6 2011-01-06      1     0     1       0       4          1       1 0.204
## # … with 10 more variables: atemp <dbl>, hum <dbl>, windspeed <dbl>,
## #   casual <dbl>, registered <dbl>, cnt <dbl>, temp_raw <dbl>, atemp_raw <dbl>,
## #   hum_raw <dbl>, windspeed_raw <dbl>, and abbreviated variable name
## #   ¹weathersit

2f. Check that the sum of casual and registered adds up to cnt for each record.

all(bike$casual+bike$registered==bike$cnt)

## [1] TRUE

Modelling

3a. Set random seed as 123 with the set.seed function. Using the sample.split function (in the caTools package) to randomly choose 70% data as the train data (stored in variable train) and the rest as the test data (stored in variable test).

#install.packages("caTools")
set.seed(123)
library(caTools)
samples=sample.split(bike$cnt, SplitRatio=0.7) 
train=subset(bike, samples==T)
test=subset(bike, samples==F)

3b. Fit a linear model predicting total daily bike rentals from daily temperature (i.e., temp_raw) on the train data. Write the linear model, interpret the slope and the intercept in context of the data. What is the train R square for this model?

model=lm(cnt~temp_raw,data=train)
summary(model)

## 
## Call:
## lm(formula = cnt ~ temp_raw, data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4556.5 -1140.5   -41.7  1064.3  3670.9 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1322.697    188.102   7.032  6.6e-12 ***
## temp_raw     155.982      8.683  17.964  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1470 on 509 degrees of freedom
## Multiple R-squared:  0.388,  Adjusted R-squared:  0.3868 
## F-statistic: 322.7 on 1 and 509 DF,  p-value: < 2.2e-16

#cnt=1322+156*temp_raw
#Intercept 1322 and coefficient 156, both positives and significants. Rentals increase with higher temperature.
#R squared 0.388

3c. Fit another linear model predicting total daily bike rentals from daily feeling temperature (i.e., the variable atemp_raw you created before) on the train data. Write the linear model, interpret the slope and the intercept in context of the data. What is the train R square for this model?

model1=lm(cnt~atemp_raw, data=train)
summary(model1)

## 
## Call:
## lm(formula = cnt ~ atemp_raw, data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4541.3 -1087.3   -94.8  1068.0  4326.4 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1073.832    200.556   5.354  1.3e-07 ***
## atemp_raw    144.204      8.002  18.022  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1468 on 509 degrees of freedom
## Multiple R-squared:  0.3895, Adjusted R-squared:  0.3883 
## F-statistic: 324.8 on 1 and 509 DF,  p-value: < 2.2e-16

#cnt=1074+144*atemp_raw
#Coefficient and intercept significants, daily bike rentals increase with the feeling temperature
#Rsquared =0.3895

3d. Fit another linear model predicting total daily bike rentals from season, daily feeling temperature and the interaction between season and daily feeling temperature on the train data. Write the linear model, interpret the coefficients associated with different seasons, and determine and interpret the \(R^2\).

model2=lm(cnt~season+atemp_raw+season:atemp_raw, data=train)
summary(model2)

## 
## Call:
## lm(formula = cnt ~ season + atemp_raw + season:atemp_raw, data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4640.0 -1022.4  -161.2  1088.4  3712.8 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -1265.226    445.872  -2.838  0.00473 ** 
## season            1136.960    183.283   6.203 1.15e-09 ***
## atemp_raw          237.221     23.344  10.162  < 2e-16 ***
## season:atemp_raw   -43.609      9.029  -4.830 1.81e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1405 on 507 degrees of freedom
## Multiple R-squared:  0.4433, Adjusted R-squared:   0.44 
## F-statistic: 134.6 on 3 and 507 DF,  p-value: < 2.2e-16

#cnt=-1265+1136*season+237*atem_raw-43*season*atemp_raw
#All coefficients are significant. The intercept is negative as well as the coefficient of the interaction between temperature and season (the colder seasons have a higher number so they become more negative as it gets colder) consistent with when is colder number of rentals decrease.

3e. Fit a model predicting total daily bike rentals on the train data from season, year, whether the day is holiday or not, whether the day is a workingday or not, the weather category, temperature, feeling temperature, humidity and windspeed. What is the train R square for this model.

model3=lm(cnt~season+yr+holiday+workingday+weathersit+temp_raw+atemp_raw+hum_raw+windspeed_raw, data=train)
summary(model3)

## 
## Call:
## lm(formula = cnt ~ season + yr + holiday + workingday + weathersit + 
##     temp_raw + atemp_raw + hum_raw + windspeed_raw, data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4044.2  -441.0    61.4   518.3  2997.0 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2072.033    277.651   7.463 3.78e-13 ***
## season         329.254     37.765   8.719  < 2e-16 ***
## yr            1993.146     77.782  25.625  < 2e-16 ***
## holiday       -806.922    235.856  -3.421 0.000674 ***
## workingday      99.275     87.061   1.140 0.254711    
## weathersit    -513.606     96.085  -5.345 1.37e-07 ***
## temp_raw        83.923     36.127   2.323 0.020577 *  
## atemp_raw       46.385     33.582   1.381 0.167820    
## hum_raw        -14.977      3.808  -3.933 9.55e-05 ***
## windspeed_raw  -45.399      8.049  -5.641 2.84e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 865.4 on 501 degrees of freedom
## Multiple R-squared:  0.7912, Adjusted R-squared:  0.7874 
## F-statistic: 210.9 on 9 and 501 DF,  p-value: < 2.2e-16

#Rsquared is 0.7912

3f.. Based on the model you found in the previous questions, discuss what makes for a good day to bike in DC (as measured by rental bikes being more in demand).

Your answer here:

A good day to ride a bike could be a cool but dry summer working day in 2012 with no wind, . #A good day would be to ride in the summer in 2012 on a working day, avoiding holidays, with a Clear, Few clouds, Partly cloudy, Partly cloudy day with a high temperature feeling and low humidity and windspeed.

3g. Compute the R square on the testing data for the models you created in 3e.

pred=predict(model3,newdata=test) 
r_sqr_pred=1-((sum((pred-test$cnt)^2))/(sum((mean(test$cnt)-test$cnt)^2)))
r_sqr_pred

## [1] 0.791317

#Rsquared on the testing data 0.791317

Exploratory data analysis (Optional)

4a. Recreate the following visualization (see the instruction html), and interpret it in context of the data. ⊕Hint: You will need to use one of the variables you created above.

(bike=bike %>%
  mutate(atemp_raw = atemp * 50)) %>%
  ggplot(mapping=aes(x=dteday, y=cnt, color=temp_raw)) +
  geom_point(alpha=0.7) +
  labs(
    title="Bike Rentals vs Temperature and Date",
    subtitle="Bike rentals increase with better weather",
    x="Date",
    y="Bike Rentals",
    color="Temperature"
  ) +
  theme_minimal()

4b. Create a visualization displaying the relationship between bike rentals and season using a boxplot, where x-axis is the four seasons. Interpret the plot in context of the data.

Week 4 Group Lab