Weight & Height of American women aged 30-39
data(women)
women
## height weight
## 1 58 115
## 2 59 117
## 3 60 120
## 4 61 123
## 5 62 126
## 6 63 129
## 7 64 132
## 8 65 135
## 9 66 139
## 10 67 142
## 11 68 146
## 12 69 150
## 13 70 154
## 14 71 159
## 15 72 164
Summary of the height of the sample of American women:
summary(women$height)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 58.0 61.5 65.0 65.0 68.5 72.0
To describe the data and provide a summary, the summary() function is used on the height variable. Based on the result above, the heights range from a minimum of 58 inches to a maximum of 72 inches. The mean and the median are both 65 inches.
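Each figure reported by summary() can also be obtained on its own. The following is a minimal sketch using base R functions on the built-in women data set; the variable names are chosen here for illustration only.

```r
data(women)  # built-in data set from the datasets package

# Each value in the summary() output has a dedicated base R function.
height_min    <- min(women$height)     # 58
height_max    <- max(women$height)     # 72
height_mean   <- mean(women$height)    # 65
height_median <- median(women$height)  # 65

# range() returns the minimum and the maximum together.
height_range <- range(women$height)
print(height_range)
```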
quantile(women$height)
## 0% 25% 50% 75% 100%
## 58.0 61.5 65.0 68.5 72.0
To find the quantiles of the height, the quantile() function is used. By default it returns the minimum, the three quartiles, and the maximum.
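quantile() is not limited to quartiles. As a small sketch, the probs argument can request other cut points, for example the 10th and 90th percentiles; the object names below are illustrative.

```r
data(women)

# Default call: minimum, quartiles, and maximum.
quartiles <- quantile(women$height)

# The probs argument selects other cut points,
# here the 10th and 90th percentiles.
deciles <- quantile(women$height, probs = c(0.10, 0.90))
print(deciles)
```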
IQR(women$height)
## [1] 7
To find the interquartile range of the height, the IQR() function is used. The interquartile range is 7.
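The interquartile range is the distance between the third and first quartiles. A short sketch that reproduces the IQR() result from the quantile() output:

```r
data(women)

# First and third quartiles of height.
q <- quantile(women$height, probs = c(0.25, 0.75))

# IQR = Q3 - Q1, i.e. 68.5 - 61.5.
iqr_manual <- unname(q[2] - q[1])
print(iqr_manual)
```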
var(women$height)
## [1] 20
To find the variance of the height, the var() function is used. The variance is 20.
sd(women$height)
## [1] 4.472136
To find the standard deviation of the height, the sd() function is used. The standard deviation is 4.472136.
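The two results are linked: the standard deviation is the square root of the variance. The sketch below recomputes both by hand using the sample formula (dividing by n - 1), which is what var() and sd() use; the variable names are illustrative.

```r
data(women)
x <- women$height
n <- length(x)

# Sample variance: squared deviations from the mean, divided by n - 1.
var_manual <- sum((x - mean(x))^2) / (n - 1)

# The standard deviation is the square root of the variance.
sd_manual <- sqrt(var_manual)

print(c(variance = var_manual, sd = sd_manual))
```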
Summary of the weight of the sample of American women:
summary(women$weight)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 115.0 124.5 135.0 136.7 148.0 164.0
To describe the data and provide a summary, the summary() function is used on the weight variable. Based on the result above, the minimum weight is 115 lbs. The maximum weight is 164 lbs. The mean is 136.7 lbs. The median is 135.0 lbs.
quantile(women$weight)
## 0% 25% 50% 75% 100%
## 115.0 124.5 135.0 148.0 164.0
To find the quantiles of the weight, the quantile() function is used.
IQR(women$weight)
## [1] 23.5
To find the interquartile range of the weight, the IQR() function is used. The interquartile range is 23.5.
var(women$weight)
## [1] 240.2095
To find the variance of the weight, the var() function is used. The variance is 240.2095.
sd(women$weight)
## [1] 15.49869
To find the standard deviation of the weight, the sd() function is used. The standard deviation is 15.49869.
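Since the same measures of spread are computed for both variables, they can also be produced in one call. This is only a sketch of one possible approach, using sapply() to apply the functions to every column of the data frame.

```r
data(women)

# Apply the same set of statistics to every column of the data frame.
# The result is a matrix with one row per statistic and one column per variable.
spread <- sapply(women, function(x) c(IQR = IQR(x), variance = var(x), sd = sd(x)))
print(round(spread, 4))
```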
Plot:
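# Scatter plot of weight against height
plot(women, xlab = "Height (in)", ylab = "Weight (lb)", main = "Weight & Height of American Women aged 30-39")
# Fit a simple linear regression of weight on height
fit <- lm(weight ~ height, data = women)
# Overlay the fitted regression line as a dashed line
abline(fit, lty = "dashed")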
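The dashed line drawn by abline() is defined by the intercept and slope of the fitted model. As a brief sketch, these can be inspected with coef(), and predict() gives the fitted weight at a chosen height; the height of 65 in used here is simply the sample mean.

```r
data(women)
fit <- lm(weight ~ height, data = women)

# Intercept and slope of the line drawn by abline(fit).
print(coef(fit))

# Fitted weight (lb) at a height of 65 in.
# A least-squares line passes through the point of means,
# so this equals the mean weight of the sample.
pred <- predict(fit, newdata = data.frame(height = 65))
print(pred)
```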
Codebook:
library(memisc)  # provides codebook()
codebook(women)
## ================================================================================
##
## height
##
## --------------------------------------------------------------------------------
##
## Storage mode: double
##
## Min: 58.000
## Max: 72.000
## Mean: 65.000
## Std.Dev.: 4.320
## Skewness: 0.000
## Kurtosis: -1.211
##
## ================================================================================
##
## weight
##
## --------------------------------------------------------------------------------
##
## Storage mode: double
##
## Min: 115.000
## Max: 164.000
## Mean: 136.733
## Std.Dev.: 14.973
## Skewness: 0.252
## Kurtosis: -1.100
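The Std.Dev. values in the codebook (4.320 and 14.973) are smaller than the sd() results reported earlier (4.472136 and 15.49869). The codebook figures are consistent with a population formula that divides by n instead of n - 1; this is inferred from the numbers, not taken from the package documentation. A sketch with a helper named pop_sd(), which is hypothetical and defined only for this check:

```r
data(women)

# Population standard deviation: divide by n rather than n - 1.
# pop_sd() is a helper written for this example, not a library function.
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

print(round(pop_sd(women$height), 3))
print(round(pop_sd(women$weight), 3))
```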
summary(women)
## height weight
## Min. :58.0 Min. :115.0
## 1st Qu.:61.5 1st Qu.:124.5
## Median :65.0 Median :135.0
## Mean :65.0 Mean :136.7
## 3rd Qu.:68.5 3rd Qu.:148.0
## Max. :72.0 Max. :164.0
states <- c("Perlis","Kedah","Penang","Perak","Selangor","Kuala Lumpur","Putrajaya", "Malacca","Johore","Kelantan","Terengganu", "Negeri Sembilan","Pahang")
region <- c("Northern","Northern","Northern","Northern","Central","Central","Central","Southern","Southern","East Coast","East Coast","Southern","East Coast")
cases <- c(8,422,372,279,2235,447,30,209,549,626,266,434,263)
covidmalaysia <- data.frame(states, cases, region)
print(covidmalaysia)
## states cases region
## 1 Perlis 8 Northern
## 2 Kedah 422 Northern
## 3 Penang 372 Northern
## 4 Perak 279 Northern
## 5 Selangor 2235 Central
## 6 Kuala Lumpur 447 Central
## 7 Putrajaya 30 Central
## 8 Malacca 209 Southern
## 9 Johore 549 Southern
## 10 Kelantan 626 East Coast
## 11 Terengganu 266 East Coast
## 12 Negeri Sembilan 434 Southern
## 13 Pahang 263 East Coast
This dataset gives the number of Covid-19 cases reported in Peninsular Malaysia on 23 May 2021. The data is retrieved from the official website of the Ministry of Health, covid-19.moh.gov.my
library(dplyr)  # provides rename(), filter(), mutate(), left_join() and %>%
covidmalaysia2 <- rename(covidmalaysia, c(Peninsular_States=states, Covid19_Cases = cases))
covidmalaysia2
## Peninsular_States Covid19_Cases region
## 1 Perlis 8 Northern
## 2 Kedah 422 Northern
## 3 Penang 372 Northern
## 4 Perak 279 Northern
## 5 Selangor 2235 Central
## 6 Kuala Lumpur 447 Central
## 7 Putrajaya 30 Central
## 8 Malacca 209 Southern
## 9 Johore 549 Southern
## 10 Kelantan 626 East Coast
## 11 Terengganu 266 East Coast
## 12 Negeri Sembilan 434 Southern
## 13 Pahang 263 East Coast
The rename() function is used to change the column names. The column ‘states’ is changed to ‘Peninsular_States’. The column ‘cases’ is changed to ‘Covid19_Cases’.
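In rename(), each pair is written as new name = old name. For comparison, the same change can be made in base R by assigning to names(). The sketch below rebuilds the first three rows of the table so that it runs on its own; the object name covid is illustrative.

```r
# First three rows of the data set above, rebuilt so the example is self-contained.
covid <- data.frame(
  states = c("Perlis", "Kedah", "Penang"),
  cases  = c(8, 422, 372),
  region = c("Northern", "Northern", "Northern")
)

# Base R equivalent of rename(): overwrite the matching entries of names().
names(covid)[names(covid) == "states"] <- "Peninsular_States"
names(covid)[names(covid) == "cases"]  <- "Covid19_Cases"

print(names(covid))
```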
covidmalaysia2 %>% filter(region=="Central")
## Peninsular_States Covid19_Cases region
## 1 Selangor 2235 Central
## 2 Kuala Lumpur 447 Central
## 3 Putrajaya 30 Central
The filter() function is used to pick the rows whose ‘region’ value is “Central” from the dataset called “covidmalaysia2”. The output contains Selangor, Kuala Lumpur, and Putrajaya.
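filter() can also combine several conditions. The following sketch uses the Central and Southern rows from the table above; the threshold of 100 cases is an arbitrary value chosen for illustration.

```r
library(dplyr)

# Central and Southern rows copied from the table above.
covid <- data.frame(
  Peninsular_States = c("Selangor", "Kuala Lumpur", "Putrajaya", "Malacca", "Johore"),
  Covid19_Cases     = c(2235, 447, 30, 209, 549),
  region            = c("Central", "Central", "Central", "Southern", "Southern")
)

# Conditions separated by commas must all hold (logical AND).
central_high <- covid %>% filter(region == "Central", Covid19_Cases > 100)

# %in% keeps rows that match any of several values.
central_or_south <- covid %>% filter(region %in% c("Central", "Southern"))

print(central_high)
```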
cluster <- c(0, 79, 121,20,72,40,0,59,194,140,113,55,117)
closecontact <- c(6,194,110,167,1768,263,23,99,254,361,112,253, 110)
covidmalaysia2 %>% mutate(cluster,closecontact, .after=Covid19_Cases)
## Peninsular_States Covid19_Cases cluster closecontact region
## 1 Perlis 8 0 6 Northern
## 2 Kedah 422 79 194 Northern
## 3 Penang 372 121 110 Northern
## 4 Perak 279 20 167 Northern
## 5 Selangor 2235 72 1768 Central
## 6 Kuala Lumpur 447 40 263 Central
## 7 Putrajaya 30 0 23 Central
## 8 Malacca 209 59 99 Southern
## 9 Johore 549 194 254 Southern
## 10 Kelantan 626 140 361 East Coast
## 11 Terengganu 266 113 112 East Coast
## 12 Negeri Sembilan 434 55 253 Southern
## 13 Pahang 263 117 110 East Coast
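Two new columns called ‘cluster’ and ‘closecontact’ are added to the dataset using the mutate() function, and the .after argument places them directly after ‘Covid19_Cases’. Because the result is not assigned to a name, ‘covidmalaysia2’ itself is left unchanged. The data is retrieved from covid-19.moh.gov.my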
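mutate() returns a new data frame, so the result has to be assigned if the added columns are needed later. It can also compute a column from existing ones. A minimal sketch on the first three rows of the table; the names covid_ext and closecontact_share are hypothetical and do not appear in the original analysis.

```r
library(dplyr)

# First three rows of the table above.
covid <- data.frame(
  Peninsular_States = c("Perlis", "Kedah", "Penang"),
  Covid19_Cases     = c(8, 422, 372),
  region            = c("Northern", "Northern", "Northern")
)
cluster      <- c(0, 79, 121)
closecontact <- c(6, 194, 110)

# Assign the result to keep the added columns; covid itself is not modified.
covid_ext <- covid %>%
  mutate(cluster, closecontact, .after = Covid19_Cases) %>%
  # Hypothetical derived column: share of cases traced to close contacts.
  mutate(closecontact_share = closecontact / Covid19_Cases)

print(covid_ext)
```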
covidmalaysia3 <- data.frame(
  Peninsular_States = c("Perlis", "Kedah", "Penang", "Perak", "Selangor", "Kuala Lumpur", "Putrajaya", "Malacca", "Johore", "Kelantan", "Terengganu", "Negeri Sembilan", "Pahang"),
  importcase = c(0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0)
)
covidmalaysia4 <- left_join(x = covidmalaysia2, y = covidmalaysia3, by = "Peninsular_States")
covidmalaysia4
## Peninsular_States Covid19_Cases region importcase
## 1 Perlis 8 Northern 0
## 2 Kedah 422 Northern 0
## 3 Penang 372 Northern 0
## 4 Perak 279 Northern 0
## 5 Selangor 2235 Central 0
## 6 Kuala Lumpur 447 Central 4
## 7 Putrajaya 30 Central 0
## 8 Malacca 209 Southern 0
## 9 Johore 549 Southern 0
## 10 Kelantan 626 East Coast 0
## 11 Terengganu 266 East Coast 0
## 12 Negeri Sembilan 434 Southern 0
## 13 Pahang 263 East Coast 0
By using the left_join() function, the dataframe ‘covidmalaysia3’ is joined to ‘covidmalaysia2’ on the shared ‘Peninsular_States’ column, and a new dataframe called ‘covidmalaysia4’ is created that also contains the ‘importcase’ column. The data is retrieved from covid-19.moh.gov.my
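A left join keeps every row of the left table and fills in NA where the right table has no matching key. The sketch below illustrates this with three rows from the table above and a lookup table that deliberately omits Putrajaya; the object names cases_tbl, imports_tbl, and joined are hypothetical.

```r
library(dplyr)

# Three rows taken from the table above.
cases_tbl <- data.frame(
  Peninsular_States = c("Selangor", "Kuala Lumpur", "Putrajaya"),
  Covid19_Cases     = c(2235, 447, 30)
)

# Lookup table that deliberately leaves out Putrajaya (for illustration only).
imports_tbl <- data.frame(
  Peninsular_States = c("Selangor", "Kuala Lumpur"),
  importcase        = c(0, 4)
)

# All rows of cases_tbl are kept; the unmatched key gets NA in importcase.
joined <- left_join(cases_tbl, imports_tbl, by = "Peninsular_States")
print(joined)
```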