Subsetting Data

My First assignment forsubsetting data

To Do! Data Tuberculosis

http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/cd971423-e02e-415d-85f5-77c0118d242e/download/2013.xlsx

http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/76ed6414-c5cb-4c87-9d48-4b4bc1185bdb/download/2014.xlsx

http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/3ca80408-74cd-4403-8f97-d11e402ddb3d/download/2015.xlsx

http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/40f38fee-e376-45dc-af96-067c588e9e4d/download/2016.xlsx

http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/66a5e3d0-55cd-4096-a701-e134b91b104d/download/2017.xlsx

Question

2013

Import the dataset into R environment
Make sure the variable names are Minggu and states from Johor to WP Labuan
Remove Variable name Malaysia
Rename WP Kuala Lumpur to ‘Kuala Lumpur’
Rename WP Labuan to ‘Labuan’
Remove Observation Total (entire row).
Add another column, as Year (make sure all observation in this year are same, eg: 2013).
Create another object for this dataset.
group the observations to quarterly (Quarter 1 - Quarter 4).(take summation)
Do the same process to year 2014, 2015, 2016, 20117 and 2018.
combine these dataset to one dataset. (suppose you should have 2 dataset (1 by week, another 1 is by quarter).
from quarter dataset, select only quarter 1 and quarter 3 for Perak.
from weekly dataset, select only Perak and Perlis.
from quarter dataset, find the total for the Peninsular Malaysia and East Malaysia based on quarter and year.
from weekly dataset, find the total for the Peninsular Malaysia and East Malaysia based on weekly and year.
from quarter dataset, find the total for Peninsular Malaysia and East Malaysia based on year.
from quarter dataset, find the total for Malaysia based on Quarter and year
From quarter dataset, find the total for Malaysia based on year.

Answer

Import the dataset into R environment #2013

library(httr)
library(readxl)
url2013 <- "http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/cd971423-e02e-415d-85f5-77c0118d242e/download/2013.xlsx"
GET(url2013, write_disk(tf <- tempfile(fileext = ".xlsx")))

## Response [http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/cd971423-e02e-415d-85f5-77c0118d242e/download/2013.xlsx]
##   Date: 2019-10-02 15:24
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 22.3 kB
## <ON DISK>  C:\Users\asus\AppData\Local\Temp\Rtmp42C20b\file17ac13ec20a5.xlsx

data2013 <- read_excel(tf)

## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ... and 11 more problems

#see dimension of data2013
dim(data2013)

## [1] 55 17

Make sure the variable names are Minggu and states from Johor to WP Labuan

names(data2013) <- lapply(data2013[2,], as.character)

## Warning: Must use a character vector as names.
## This warning is displayed once per session.

#remove row 1 and 2
data2013 <- data2013[-c(1:2), ]

Remove Variable name Malaysia

data2013$Malaysia <- NULL

Rename WP Kuala Lumpur to ‘Kuala Lumpur’
Rename WP Labuan to ‘Labuan’

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

data2013 <- rename(data2013, 
                   "Kuala Lumpur" = "WP Kuala Lumpur",
                   "Labuan" = "WP Labuan")

Remove Observation Total (entire row).

data2013 <- data2013[-53, ]

Add another column, as Year (make sure all observation in this year are same, eg: 2013)

data2013$Year <- 2013

Create another object for this dataset.

# Order Year to be first variable in dataset
data2013 <- select(data2013, Year, everything())
# Change variable type to numeric
data2013[1:17] <- sapply(data2013[1:17],
                         as.numeric)

group the observations to quarterly (Quarter 1 - Quarter 4).(take summation)

quarter2013 <- 
  data2013 %>%
  mutate(quarter = ifelse(`Epid Minggu` <=13, "Quarter 1", 
                          ifelse(`Epid Minggu` <= 26, "Quarter 2",
                                 ifelse(`Epid Minggu` <= 39, "Quarter 3", "Quarter 4"))))

Do the same process to year 2014, 2015, 2016, 20117 and 2018. 2014
Import the dataset into R environment

library(httr)
library(readxl)
url2014 <- "http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/76ed6414-c5cb-4c87-9d48-4b4bc1185bdb/download/2014.xlsx"
GET(url2014, write_disk(tf <- tempfile(fileext = ".xlxs")))

## Response [http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/76ed6414-c5cb-4c87-9d48-4b4bc1185bdb/download/2014.xlsx]
##   Date: 2019-10-02 15:24
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 22.4 kB
## <ON DISK>  C:\Users\asus\AppData\Local\Temp\Rtmp42C20b\file17ac47881193.xlxs

data2014 <- read_excel(tf)

## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ... and 11 more problems

#see dimension of data2014
dim(data2014)

## [1] 56 17

Make sure the variable names are Minggu and states from Johor to WP Labuan

names(data2014) <- lapply(data2014[2,], as.character)
#remove row 1 and 2
data2014 <- data2014[-c(1:2), ]

Remove Variable name Malaysia

data2014$Malaysia <- NULL

Rename WP Kuala Lumpur to ‘Kuala Lumpur’
Rename WP Labuan to ‘Labuan’

library(dplyr)
data2014 <- rename(data2014, 
                   `Kuala Lumpur` = `WP Kuala Lumpur`,
                   `Labuan` = `WP Labuan`)

Remove Observation Total (entire row).

data2014 <- data2014[-54, ]

Add another column, as Year (make sure all observation in this year are same, eg: 2013)

data2014$Year <- 2014

Create another object for this dataset.

# Order Year to be first variable in dataset
data2014 <- select(data2014, Year, everything())
# Change variable type to numeric
data2014[1:17] <- sapply(data2014[1:17],
                         as.numeric)

group the observations to quarterly (Quarter 1 - Quarter 4).(take summation)

quarter2014 <- 
  data2014 %>%
  mutate(quarter = ifelse(`Epid Minggu` <=14, "Quarter 1", 
                          ifelse(`Epid Minggu` <= 27, "Quarter 2",
                                 ifelse(`Epid Minggu` <= 40, "Quarter 3", "Quarter 4"))))

2015 1) Import the dataset into R environment

library(httr)
library(readxl)
url2015 <- "http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/3ca80408-74cd-4403-8f97-d11e402ddb3d/download/2015.xlsx"
GET(url2015, write_disk(tf <- tempfile(fileext = ".xlsx")))

## Response [http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/3ca80408-74cd-4403-8f97-d11e402ddb3d/download/2015.xlsx]
##   Date: 2019-10-02 15:24
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 22.3 kB
## <ON DISK>  C:\Users\asus\AppData\Local\Temp\Rtmp42C20b\file17ac597f30c2.xlsx

data2015 <- read_excel(tf)

## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ... and 11 more problems

#see dimension of data2015
dim(data2015)

## [1] 55 17

Make sure the variable names are Minggu and states from Johor to WP Labuan

names(data2015) <- lapply(data2015[2,], as.character)
#remove row 1 and 2
data2015 <- data2015[-c(1:2), ]

Remove Variable name Malaysia

data2015$Malaysia <- NULL

Rename WP Kuala Lumpur to ‘Kuala Lumpur’
Rename WP Labuan to ‘Labuan’

library(dplyr)
data2015 <- rename(data2015, 
                   `Kuala Lumpur` = `WP Kuala Lumpur`,
                   `Labuan` = `WP Labuan`)

Remove Observation Total (entire row).

data2015 <- data2015[-53, ]

Add another column, as Year (make sure all observation in this year are same, eg: 2013)

data2015$Year <- 2015

Create another object for this dataset.

# Order Year to be first variable in dataset
data2015 <- select(data2015, Year, everything())
# Change variable type to numeric
data2015[1:17] <- sapply(data2015[1:17],
                         as.numeric)

group the observations to quarterly (Quarter 1 - Quarter 4).(take summation)

quarter2015 <- 
  data2015 %>%
  mutate(quarter = ifelse(`Epid Minggu` <=13, "Quarter 1", 
                          ifelse(`Epid Minggu` <= 26, "Quarter 2",
                                 ifelse(`Epid Minggu` <= 39, "Quarter 3", "Quarter 4"))))

2016 1) Import the dataset into R environment

library(httr)
library(readxl)
url2016 <- "http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/40f38fee-e376-45dc-af96-067c588e9e4d/download/2016.xlsx"
GET(url2016, write_disk(tf <- tempfile(fileext = ".xlsx")))

## Response [http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/40f38fee-e376-45dc-af96-067c588e9e4d/download/2016.xlsx]
##   Date: 2019-10-02 15:24
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 22.2 kB
## <ON DISK>  C:\Users\asus\AppData\Local\Temp\Rtmp42C20b\file17ac689a22db.xlsx

data2016 <- read_excel(tf)

## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ... and 11 more problems

#see dimension of data2016
dim(data2016)

## [1] 55 17

Make sure the variable names are Minggu and states from Johor to WP Labuan

names(data2016) <- lapply(data2016[2,], as.character)
#remove row 1 and 2
data2016 <- data2016[-c(1:2), ]

Remove Variable name Malaysia

data2016$Malaysia <- NULL

Rename WP Kuala Lumpur to ‘Kuala Lumpur’
Rename WP Labuan to ‘Labuan’

library(dplyr)
data2016 <- rename(data2016, 
                   `Kuala Lumpur` = `WP Kuala Lumpur`,
                   `Labuan` = `WP Labuan`)

Remove Observation Total (entire row).

data2016 <- data2016[-53, ]

Add another column, as Year (make sure all observation in this year are same, eg: 2013)

data2016$Year <- 2016

Create another object for this dataset.

# Order Year to be first variable in dataset
data2016 <- select(data2016, Year, everything())
# Change variable type to numeric
data2016[1:17] <- sapply(data2016[1:17],
                         as.numeric)

group the observations to quarterly (Quarter 1 - Quarter 4).(take summation)

quarter2016 <- 
  data2016 %>%
  mutate(quarter = ifelse(`Epid Minggu` <=13, "Quarter 1", 
                          ifelse(`Epid Minggu` <= 26, "Quarter 2",
                                 ifelse(`Epid Minggu` <= 39, "Quarter 3", "Quarter 4"))))

2017 1) Import the dataset into R environment

library(httr)
library(readxl)
url2017 <- "http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/66a5e3d0-55cd-4096-a701-e134b91b104d/download/2017.xlsx"
GET(url2017, write_disk(tf <- tempfile(fileext = ".xlsx")))

## Response [http://www.data.gov.my/data/ms_MY/dataset/bbd0dfba-b586-47ae-91ea-6f0b7880d25b/resource/66a5e3d0-55cd-4096-a701-e134b91b104d/download/2017.xlsx]
##   Date: 2019-10-02 15:24
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 22.4 kB
## <ON DISK>  C:\Users\asus\AppData\Local\Temp\Rtmp42C20b\file17ac7847587e.xlsx

data2017 <- read_excel(tf)

## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ... and 11 more problems

#see dimension of data2017
dim(data2017)

## [1] 55 17

Make sure the variable names are Minggu and states from Johor to WP Labuan

names(data2017) <- lapply(data2017[2,], as.character)
#remove row 1 and 2
data2017 <- data2017[-c(1:2), ]

Remove Variable name Malaysia

data2017$Malaysia <- NULL

Rename WP Kuala Lumpur to ‘Kuala Lumpur’
Rename WP Labuan to ‘Labuan’

library(dplyr)
data2017 <- rename(data2017, 
                   `Kuala Lumpur` = `WP Kuala Lumpur`,
                   `Labuan` = `WP Labuan`)

Remove Observation Total (entire row).

data2017 <- data2017[-53, ]

Add another column, as Year (make sure all observation in this year are same, eg: 2013)

data2017$Year <- 2017

Create another object for this dataset.

# Order Year to be first variable in dataset
data2017 <- select(data2017, Year, everything())
# Change variable type to numeric
data2017[1:17] <- sapply(data2017[1:17],
                         as.numeric)

group the observations to quarterly (Quarter 1 - Quarter 4).(take summation)

quarter2017 <- 
  data2017 %>%
  mutate(quarter2017 = ifelse(`Epid Minggu` <=13, "Quarter 1", 
                          ifelse(`Epid Minggu` <= 26, "Quarter 2",
                                 ifelse(`Epid Minggu` <= 39, "Quarter 3", "Quarter 4"))))

combine these dataset to one dataset. (suppose you should have 2 dataset (1 by week, another 1 is by quarter).
from quarter dataset, select only quarter 1 and quarter 3 for Perak.
from weekly dataset, select only Perak and Perlis
from quarter dataset, find the total for the Peninsular Malaysia and East Malaysia based on quarter and year.
from weekly dataset, find the total for the Peninsular Malaysia and East Malaysia based on weekly and year.
From quarter dataset, find the total for Peninsular Malaysia and East Malaysia based on year.
from quarter dataset, find the total for Malaysia based on Quarter and year
From quarter dataset, find the total for Malaysia based on year.

Subsetting Data

Alya Azman

10/2/2019

My First assignment forsubsetting data

Question

Answer