Data Preparation

The data used in this exercise are taxi passenger pickups in a city. Pickup times within the day were recorded throughout January 2015. The data consist of three columns: (1) time of day (in minutes), (2) day of the week (1 = Monday through 7 = Sunday), and (3) number of pickups.

Load the Data

taxi <- read.csv("https://raw.githubusercontent.com/greenore/ac209b-coursework/master/hw1/data/dataset_1_train.txt")
head(taxi)
##   TimeMin DayOfWeek PickupCount
## 1      57         5         111
## 2      68         5          95
## 3     182         5          95
## 4     298         5          75
## 5     363         5          35
## 6     395         5          30

Preparing the Data

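## Convert day numbers (1 = Monday, ..., 7 = Sunday) to an ordered factor of day names.
## Setting levels explicitly keeps the labels aligned even if a day is missing from the data.
weekdays <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

taxi$DayOfWeek <- factor(taxi$DayOfWeek, levels = 1:7, labels = weekdays, ordered = TRUE)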

rm(weekdays)

## Convert time from minutes to hours, rounded to the nearest whole hour
taxi$TimeHours <- round(taxi$TimeMin / 60, 0)
head(taxi)
##   TimeMin DayOfWeek PickupCount TimeHours
## 1      57    Friday         111         1
## 2      68    Friday          95         1
## 3     182    Friday          95         3
## 4     298    Friday          75         5
## 5     363    Friday          35         6
## 6     395    Friday          30         7
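
Because `round()` assigns each record to the nearest hour, a time such as 23:59 ends up at hour 24, and minute 395 (06:35) ends up at hour 7. The sketch below compares this with integer division, which gives the clock hour (0 to 23). The `minutes` vector is made up for illustration and is not taken from the data set.

```r
# Hypothetical minute-of-day values (illustration only)
minutes <- c(0, 29, 31, 57, 395, 1439)

nearest_hour <- round(minutes / 60, 0)  # as in the exercise: nearest whole hour
clock_hour   <- minutes %/% 60          # integer division: hour shown on the clock

data.frame(minutes, nearest_hour, clock_hour)
```

Either convention works for smoothing. The difference matters mainly when reading the x-axis as clock time.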

Data Visualization

Visualizing Pickup Count on Each Day

Visualize the data. You may use the code below or write your own. Discuss with your partner: what does the distribution of PickupCount look like from day to day over the week?

The median pickup counts do not differ much from one day to the next.

library(tidyverse)
ggplot(taxi, aes(DayOfWeek, PickupCount)) + 
  labs(title="Plot I: Boxplot",
       subtitle="Pickup count vs. day of the week") +
  geom_boxplot(color="black") + 
  xlab("Weekday") + 
  ylab("Pickup count") +
  theme_bw()
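
The impression from the boxplot that the daily medians are close can also be checked numerically. This is a minimal sketch using `aggregate()` from base R. The `demo` data frame is a made-up stand-in with the same column names as `taxi`, so the block runs on its own. In the exercise you would pass `taxi` instead.

```r
# Made-up stand-in with the same column names as `taxi` (illustration only)
demo <- data.frame(
  DayOfWeek   = factor(c("Monday", "Monday", "Monday", "Friday", "Friday", "Friday"),
                       levels = c("Monday", "Friday")),
  PickupCount = c(10, 20, 60, 30, 40, 80)
)

# Median pickup count for each day of the week
day_medians <- aggregate(PickupCount ~ DayOfWeek, data = demo, FUN = median)
day_medians
```

Replacing `median` with `IQR` in the same call gives the spread per day, which corresponds to the box heights in the plot.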

Visualizing Pickup Count Throughout the Day

Next, explore the data, using the code below if you wish, and discuss with your partner: what does the distribution of PickupCount look like over the course of the day (morning to night)?

ggplot(taxi, aes(TimeMin, PickupCount)) + 
  geom_point(stroke=0, alpha=0.8) +
  theme_bw() +
  labs(title="Plot II: Scatterplot",
       subtitle="Pickup count vs. time of the day") +
  scale_x_continuous(breaks=c(0, 360, 720, 1080, 1440),
                     labels=c("00:00", "06:00", "12:00", "18:00", "24:00")) + 
  ylab(label="Pickup Count") + 
  xlab("Time of the day")
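
A rough numerical companion to the scatterplot is the mean pickup count per clock hour. The sketch below uses base R's `tapply()`. The `demo` data frame and its `Hour` column are assumptions made for illustration, and in the exercise `taxi` would take the place of `demo`.

```r
# Made-up stand-in with the same column names as `taxi` (illustration only)
demo <- data.frame(
  TimeMin     = c(15, 45, 130, 170, 725, 760),
  PickupCount = c(90, 110, 60, 40, 20, 30)
)

# Clock hour (0-23) of each record, then the mean pickup count within each hour
demo$Hour <- demo$TimeMin %/% 60
hourly_mean <- tapply(demo$PickupCount, demo$Hour, mean)
hourly_mean
```

These binned means are a crude smoother: they show the same daily pattern that the scatterplot suggests, but in steps of one hour.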

Smoothing Spline

Fit a smoothing spline together with your partner, and interpret the pattern you obtain.

fit <- smooth.spline(taxi$TimeHours, taxi$PickupCount, lambda = 0.5)
plot(taxi$TimeHours, taxi$PickupCount)
lines(fit, col = "red")

fit <- smooth.spline(taxi$TimeHours, taxi$PickupCount, lambda = 0.4)
plot(taxi$TimeHours, taxi$PickupCount)
lines(fit, col = "red")

fit <- smooth.spline(taxi$TimeHours, taxi$PickupCount, lambda = 0.3)
plot(taxi$TimeHours, taxi$PickupCount)
lines(fit, col = "red")

fit <- smooth.spline(taxi$TimeHours, taxi$PickupCount, lambda = 0.2)
plot(taxi$TimeHours, taxi$PickupCount)
lines(fit, col = "red")

fit <- smooth.spline(taxi$TimeHours, taxi$PickupCount, lambda = 0.1)
plot(taxi$TimeHours, taxi$PickupCount)
lines(fit, col = "red")
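
A larger `lambda` penalizes curvature more heavily and pulls the fit toward a straight line, so hand-picked values are worth comparing against a data-driven choice. The sketch below leaves `lambda` unset, in which case `smooth.spline()` selects it by generalized cross-validation (its default). The data are simulated so the block runs on its own. In the exercise you would use `taxi$TimeHours` and `taxi$PickupCount`.

```r
set.seed(1)  # simulated data, only so the sketch is self-contained
x <- rep(0:23, each = 10)                                   # hour of day, with ties
y <- 50 + 30 * sin(2 * pi * x / 24) + rnorm(length(x), sd = 10)

# With no lambda, spar, or df given, the smoothing parameter is chosen by GCV
fit_gcv <- smooth.spline(x, y)
fit_gcv$lambda  # selected smoothing parameter
fit_gcv$df      # equivalent degrees of freedom of the fitted curve

plot(x, y)
lines(fit_gcv, col = "red")
```

Comparing `fit_gcv$lambda` with the values tried above gives a sense of whether those fits are oversmoothed or undersmoothed.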

LOESS

Apply LOESS to these data together with your partner, and interpret the pattern you obtain.
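
As a possible starting point, the sketch below fits `loess()` from base R with several values of `span` and overlays the fitted curves. The span values, the colors, and the simulated `demo` data frame are assumptions chosen for illustration. In the exercise you would set `data = taxi`.

```r
set.seed(1)  # simulated data, only so the sketch is self-contained
time_min <- sort(sample(0:1439, 300))
pickups  <- 50 + 30 * sin(2 * pi * time_min / 1440) + rnorm(300, sd = 10)
demo <- data.frame(TimeMin = time_min, PickupCount = pickups)

spans <- c(0.2, 0.5, 0.9)                  # fraction of points used in each local fit
cols  <- c("red", "blue", "darkgreen")

# Evaluation grid restricted to the observed range (loess does not extrapolate by default)
grid <- data.frame(TimeMin = seq(min(demo$TimeMin), max(demo$TimeMin), length.out = 200))

plot(demo$TimeMin, demo$PickupCount, xlab = "Time of the day (minutes)", ylab = "Pickup count")
for (i in seq_along(spans)) {
  fit <- loess(PickupCount ~ TimeMin, data = demo, span = spans[i])
  lines(grid$TimeMin, predict(fit, newdata = grid), col = cols[i])
}
legend("topright", legend = paste("span =", spans), col = cols, lty = 1)
```

A small span follows local fluctuations closely, while a large span gives a smoother curve, which mirrors the role of `lambda` in the smoothing spline.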