My First assignment forsubsetting data

To Do! Data Tuberculosis

http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/cd971423-e02e-415d-85f5-77c0118d242e/download/2013.xlsx

http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/76ed6414-c5cb-4c87-9d48-4b4bc1185bdb/download/2014.xlsx

http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/3ca80408-74cd-4403-8f97-d11e402ddb3d/download/2015.xlsx

http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/40f38fee-e376-45dc-af96-067c588e9e4d/download/2016.xlsx

http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/66a5e3d0-55cd-4096-a701-e134b91b104d/download/2017.xlsx

Question

2013

  1. Import the dataset into R environment
  2. Make sure the variable names are Minggu and states from Johor to WP Labuan
  3. Remove Variable name Malaysia
  4. Rename WP Kuala Lumpur to ‘Kuala Lumpur’
  5. Rename WP Labuan to ‘Labuan’
  6. Remove Observation Total (entire row).
  7. Add another column, as Year (make sure all observation in this year are same, eg: 2013).
  8. Create another object for this dataset.
  9. group the observations to quarterly (Quarter 1 - Quarter 4).(take summation)
  10. Do the same process to year 2014, 2015, 2016, 20117 and 2018.
  11. combine these dataset to one dataset. (suppose you should have 2 dataset (1 by week, another 1 is by quarter).
  12. from quarter dataset, select only quarter 1 and quarter 3 for Perak.
  13. from weekly dataset, select only Perak and Perlis.
  14. from quarter dataset, find the total for the Peninsular Malaysia and East Malaysia based on quarter and year.
  15. from weekly dataset, find the total for the Peninsular Malaysia and East Malaysia based on weekly and year.
  16. from quarter dataset, find the total for Peninsular Malaysia and East Malaysia based on year.
  17. from quarter dataset, find the total for Malaysia based on Quarter and year
  18. From quarter dataset, find the total for Malaysia based on year.

Answer

  1. Import the dataset into R environment #2013
library(httr)
library(readxl)
url2013 <- "http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/cd971423-e02e-415d-85f5-77c0118d242e/download/2013.xlsx"
GET(url2013, write_disk(tf <- tempfile(fileext = ".xlsx")))
## Response [http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/cd971423-e02e-415d-85f5-77c0118d242e/download/2013.xlsx]
##   Date: 2019-10-02 15:24
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 22.3 kB
## <ON DISK>  C:\Users\asus\AppData\Local\Temp\Rtmp42C20b\file17ac13ec20a5.xlsx
data2013 <- read_excel(tf)
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ... and 11 more problems
#see dimension of data2013
dim(data2013)
## [1] 55 17
  1. Make sure the variable names are Minggu and states from Johor to WP Labuan
names(data2013) <- lapply(data2013[2,], as.character)
## Warning: Must use a character vector as names.
## This warning is displayed once per session.
#remove row 1 and 2
data2013 <- data2013[-c(1:2), ]
  1. Remove Variable name Malaysia
data2013$Malaysia <- NULL
  1. Rename WP Kuala Lumpur to ‘Kuala Lumpur’
  2. Rename WP Labuan to ‘Labuan’
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
data2013 <- rename(data2013, 
                   "Kuala Lumpur" = "WP Kuala Lumpur",
                   "Labuan" = "WP Labuan")
  1. Remove Observation Total (entire row).
data2013 <- data2013[-53, ]
  1. Add another column, as Year (make sure all observation in this year are same, eg: 2013)
data2013$Year <- 2013
  1. Create another object for this dataset.
# Order Year to be first variable in dataset
data2013 <- select(data2013, Year, everything())
# Change variable type to numeric
data2013[1:17] <- sapply(data2013[1:17],
                         as.numeric)
  1. group the observations to quarterly (Quarter 1 - Quarter 4).(take summation)
quarter2013 <- 
  data2013 %>%
  mutate(quarter = ifelse(`Epid Minggu` <=13, "Quarter 1", 
                          ifelse(`Epid Minggu` <= 26, "Quarter 2",
                                 ifelse(`Epid Minggu` <= 39, "Quarter 3", "Quarter 4"))))
  1. Do the same process to year 2014, 2015, 2016, 20117 and 2018. 2014
  2. Import the dataset into R environment
library(httr)
library(readxl)
url2014 <- "http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/76ed6414-c5cb-4c87-9d48-4b4bc1185bdb/download/2014.xlsx"
GET(url2014, write_disk(tf <- tempfile(fileext = ".xlxs")))
## Response [http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/76ed6414-c5cb-4c87-9d48-4b4bc1185bdb/download/2014.xlsx]
##   Date: 2019-10-02 15:24
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 22.4 kB
## <ON DISK>  C:\Users\asus\AppData\Local\Temp\Rtmp42C20b\file17ac47881193.xlxs
data2014 <- read_excel(tf)
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ... and 11 more problems
#see dimension of data2014
dim(data2014)
## [1] 56 17
  1. Make sure the variable names are Minggu and states from Johor to WP Labuan
names(data2014) <- lapply(data2014[2,], as.character)
#remove row 1 and 2
data2014 <- data2014[-c(1:2), ]
  1. Remove Variable name Malaysia
data2014$Malaysia <- NULL
  1. Rename WP Kuala Lumpur to ‘Kuala Lumpur’
  2. Rename WP Labuan to ‘Labuan’
library(dplyr)
data2014 <- rename(data2014, 
                   `Kuala Lumpur` = `WP Kuala Lumpur`,
                   `Labuan` = `WP Labuan`)
  1. Remove Observation Total (entire row).
data2014 <- data2014[-54, ]
  1. Add another column, as Year (make sure all observation in this year are same, eg: 2013)
data2014$Year <- 2014
  1. Create another object for this dataset.
# Order Year to be first variable in dataset
data2014 <- select(data2014, Year, everything())
# Change variable type to numeric
data2014[1:17] <- sapply(data2014[1:17],
                         as.numeric)
  1. group the observations to quarterly (Quarter 1 - Quarter 4).(take summation)
quarter2014 <- 
  data2014 %>%
  mutate(quarter = ifelse(`Epid Minggu` <=14, "Quarter 1", 
                          ifelse(`Epid Minggu` <= 27, "Quarter 2",
                                 ifelse(`Epid Minggu` <= 40, "Quarter 3", "Quarter 4"))))

2015 1) Import the dataset into R environment

library(httr)
library(readxl)
url2015 <- "http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/3ca80408-74cd-4403-8f97-d11e402ddb3d/download/2015.xlsx"
GET(url2015, write_disk(tf <- tempfile(fileext = ".xlsx")))
## Response [http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/3ca80408-74cd-4403-8f97-d11e402ddb3d/download/2015.xlsx]
##   Date: 2019-10-02 15:24
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 22.3 kB
## <ON DISK>  C:\Users\asus\AppData\Local\Temp\Rtmp42C20b\file17ac597f30c2.xlsx
data2015 <- read_excel(tf)
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ... and 11 more problems
#see dimension of data2015
dim(data2015)
## [1] 55 17
  1. Make sure the variable names are Minggu and states from Johor to WP Labuan
names(data2015) <- lapply(data2015[2,], as.character)
#remove row 1 and 2
data2015 <- data2015[-c(1:2), ]
  1. Remove Variable name Malaysia
data2015$Malaysia <- NULL
  1. Rename WP Kuala Lumpur to ‘Kuala Lumpur’
  2. Rename WP Labuan to ‘Labuan’
library(dplyr)
data2015 <- rename(data2015, 
                   `Kuala Lumpur` = `WP Kuala Lumpur`,
                   `Labuan` = `WP Labuan`)
  1. Remove Observation Total (entire row).
data2015 <- data2015[-53, ]
  1. Add another column, as Year (make sure all observation in this year are same, eg: 2013)
data2015$Year <- 2015
  1. Create another object for this dataset.
# Order Year to be first variable in dataset
data2015 <- select(data2015, Year, everything())
# Change variable type to numeric
data2015[1:17] <- sapply(data2015[1:17],
                         as.numeric)
  1. group the observations to quarterly (Quarter 1 - Quarter 4).(take summation)
quarter2015 <- 
  data2015 %>%
  mutate(quarter = ifelse(`Epid Minggu` <=13, "Quarter 1", 
                          ifelse(`Epid Minggu` <= 26, "Quarter 2",
                                 ifelse(`Epid Minggu` <= 39, "Quarter 3", "Quarter 4"))))

2016 1) Import the dataset into R environment

library(httr)
library(readxl)
url2016 <- "http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/40f38fee-e376-45dc-af96-067c588e9e4d/download/2016.xlsx"
GET(url2016, write_disk(tf <- tempfile(fileext = ".xlsx")))
## Response [http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/40f38fee-e376-45dc-af96-067c588e9e4d/download/2016.xlsx]
##   Date: 2019-10-02 15:24
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 22.2 kB
## <ON DISK>  C:\Users\asus\AppData\Local\Temp\Rtmp42C20b\file17ac689a22db.xlsx
data2016 <- read_excel(tf)
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ... and 11 more problems
#see dimension of data2016
dim(data2016)
## [1] 55 17
  1. Make sure the variable names are Minggu and states from Johor to WP Labuan
names(data2016) <- lapply(data2016[2,], as.character)
#remove row 1 and 2
data2016 <- data2016[-c(1:2), ]
  1. Remove Variable name Malaysia
data2016$Malaysia <- NULL
  1. Rename WP Kuala Lumpur to ‘Kuala Lumpur’
  2. Rename WP Labuan to ‘Labuan’
library(dplyr)
data2016 <- rename(data2016, 
                   `Kuala Lumpur` = `WP Kuala Lumpur`,
                   `Labuan` = `WP Labuan`)
  1. Remove Observation Total (entire row).
data2016 <- data2016[-53, ]
  1. Add another column, as Year (make sure all observation in this year are same, eg: 2013)
data2016$Year <- 2016
  1. Create another object for this dataset.
# Order Year to be first variable in dataset
data2016 <- select(data2016, Year, everything())
# Change variable type to numeric
data2016[1:17] <- sapply(data2016[1:17],
                         as.numeric)
  1. group the observations to quarterly (Quarter 1 - Quarter 4).(take summation)
quarter2016 <- 
  data2016 %>%
  mutate(quarter = ifelse(`Epid Minggu` <=13, "Quarter 1", 
                          ifelse(`Epid Minggu` <= 26, "Quarter 2",
                                 ifelse(`Epid Minggu` <= 39, "Quarter 3", "Quarter 4"))))

2017 1) Import the dataset into R environment

library(httr)
library(readxl)
url2017 <- "http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/66a5e3d0-55cd-4096-a701-e134b91b104d/download/2017.xlsx"
GET(url2017, write_disk(tf <- tempfile(fileext = ".xlsx")))
## Response [http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/66a5e3d0-55cd-4096-a701-e134b91b104d/download/2017.xlsx]
##   Date: 2019-10-02 15:24
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 22.4 kB
## <ON DISK>  C:\Users\asus\AppData\Local\Temp\Rtmp42C20b\file17ac7847587e.xlsx
data2017 <- read_excel(tf)
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ... and 11 more problems
#see dimension of data2017
dim(data2017)
## [1] 55 17
  1. Make sure the variable names are Minggu and states from Johor to WP Labuan
names(data2017) <- lapply(data2017[2,], as.character)
#remove row 1 and 2
data2017 <- data2017[-c(1:2), ]
  1. Remove Variable name Malaysia
data2017$Malaysia <- NULL
  1. Rename WP Kuala Lumpur to ‘Kuala Lumpur’
  2. Rename WP Labuan to ‘Labuan’
library(dplyr)
data2017 <- rename(data2017, 
                   `Kuala Lumpur` = `WP Kuala Lumpur`,
                   `Labuan` = `WP Labuan`)
  1. Remove Observation Total (entire row).
data2017 <- data2017[-53, ]
  1. Add another column, as Year (make sure all observation in this year are same, eg: 2013)
data2017$Year <- 2017
  1. Create another object for this dataset.
# Order Year to be first variable in dataset
data2017 <- select(data2017, Year, everything())
# Change variable type to numeric
data2017[1:17] <- sapply(data2017[1:17],
                         as.numeric)
  1. group the observations to quarterly (Quarter 1 - Quarter 4).(take summation)
quarter2017 <- 
  data2017 %>%
  mutate(quarter2017 = ifelse(`Epid Minggu` <=13, "Quarter 1", 
                          ifelse(`Epid Minggu` <= 26, "Quarter 2",
                                 ifelse(`Epid Minggu` <= 39, "Quarter 3", "Quarter 4"))))
  1. combine these dataset to one dataset. (suppose you should have 2 dataset (1 by week, another 1 is by quarter).

  2. from quarter dataset, select only quarter 1 and quarter 3 for Perak.

  3. from weekly dataset, select only Perak and Perlis

  4. from quarter dataset, find the total for the Peninsular Malaysia and East Malaysia based on quarter and year.

  5. from weekly dataset, find the total for the Peninsular Malaysia and East Malaysia based on weekly and year.

  6. From quarter dataset, find the total for Peninsular Malaysia and East Malaysia based on year.

  7. from quarter dataset, find the total for Malaysia based on Quarter and year

  8. From quarter dataset, find the total for Malaysia based on year.