library(readr)
library(pander)
library(tidyverse)
library(forcats)
library(ggplot2)
library(outliers)
library(EnvStats)
Problem 1: Consider a random sample of 110 small to mid-size companies in the Philippines, classified according to their annual revenues (in millions of pesos) into five categories: Under 150, 150 – under 300, 300 – under 450, 450 – under 600, 600 or more. The labels 1, 2, 3, 4, 5, respectively, were used for these categories. Construct a frequency, relative frequency, and percent frequency distribution table for the data. (10 points)
\(~\)
Data: 1, 4, 3, 5, 3, 4, 1, 2, 3, 4, 3, 1, 5, 3, 4, 2, 1, 1, 4, 5, 3, 2, 5, 2, 5, 2, 1, 2, 3, 3, 2, 1, 2, 5, 3, 2, 1, 1, 2, 1, 2, 4, 5, 3, 5, 1, 3, 1, 2, 1, 4, 1, 4, 5, 4, 1, 1, 2, 4, 1, 4, 1, 2, 4, 3, 3, 4, 1, 4, 1, 4, 1, 2, 1, 5, 3, 1, 5, 2, 1, 2, 3, 1, 2, 2, 1, 1, 2, 1, 5, 3, 2, 5, 5, 2, 5, 4, 3, 5, 2, 3, 2, 3, 5, 2, 3, 5, 5, 2, 3.
\(~\)
Create data manually in RStudio:
revclass <- c(1,4,3,5,3,4,1,2,3,4,3,1,5,3,4,2,1,1,4,5,3,2,5,2,5,2,1,2,3,3,2,1,2,5,3,2,1,1,2,1,2,4,5,3,5,1,3,1,2,1,4,1,4,5,4,1,1,2,4,1,4,1,2,4,3,3,4,1,4,1,4,1,2,1,5,3,1,5,2,1,2,3,1,2,2,1,1,2,1,5,3,2,5,5,2,5,4,3,5,2,3,2,3,5,2,3,5,5,2,3)
revclass
## [1] 1 4 3 5 3 4 1 2 3 4 3 1 5 3 4 2 1 1 4 5 3 2 5 2 5 2 1 2 3 3 2 1 2 5 3 2 1
## [38] 1 2 1 2 4 5 3 5 1 3 1 2 1 4 1 4 5 4 1 1 2 4 1 4 1 2 4 3 3 4 1 4 1 4 1 2 1
## [75] 5 3 1 5 2 1 2 3 1 2 2 1 1 2 1 5 3 2 5 5 2 5 4 3 5 2 3 2 3 5 2 3 5 5 2 3
\(~\)
Solution:
data.freq <- table(revclass)
data.relfreq <- data.freq/sum(data.freq)
data.pctfreq <- data.relfreq*100
freq.dist <- cbind(data.freq, data.relfreq, data.pctfreq)
colnames(freq.dist) <- c("Frequency", "Relative Frequency", "Percent Frequency")
pander(freq.dist)
Class | Frequency | Relative Frequency | Percent Frequency |
---|---|---|---|
1 | 28 | 0.2545 | 25.45 |
2 | 26 | 0.2364 | 23.64 |
3 | 21 | 0.1909 | 19.09 |
4 | 16 | 0.1455 | 14.55 |
5 | 19 | 0.1727 | 17.27 |
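The table can be made easier to read by showing the revenue categories instead of the codes 1 to 5. The following is a minimal sketch in base R, assuming the code-to-category mapping given in the problem statement; the names `rev_labels`, `rev_factor`, and `labeled.dist` are illustrative and not part of the original solution.

```r
# Revenue-class codes from Problem 1 (same values as `revclass`)
revclass <- c(1,4,3,5,3,4,1,2,3,4,3,1,5,3,4,2,1,1,4,5,3,2,5,2,5,2,1,2,3,3,2,1,2,5,3,2,1,
              1,2,1,2,4,5,3,5,1,3,1,2,1,4,1,4,5,4,1,1,2,4,1,4,1,2,4,3,3,4,1,4,1,4,1,2,1,
              5,3,1,5,2,1,2,3,1,2,2,1,1,2,1,5,3,2,5,5,2,5,4,3,5,2,3,2,3,5,2,3,5,5,2,3)

# Category labels in code order 1..5 (assumed from the problem statement)
rev_labels <- c("Under 150", "150 to under 300", "300 to under 450",
                "450 to under 600", "600 or more")

# Attach the labels, then tabulate
rev_factor <- factor(revclass, levels = 1:5, labels = rev_labels)
freq <- table(rev_factor)

labeled.dist <- data.frame(
  Class     = names(freq),
  Frequency = as.vector(freq),
  Relative  = round(as.vector(prop.table(freq)), 4),
  Percent   = round(100 * as.vector(prop.table(freq)), 2)
)
print(labeled.dist)
```

The frequencies should match the table above, and the relative frequencies should sum to 1 up to rounding.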
\(~\)
Problem 2: The following data give the consumption of electricity in kilowatt-hours during a given month in 30 rural households. Construct a stem-and-leaf plot. (5 points)
\(~\)
Data: 260,290,280,240,250,230,310,305,264,286,262,241,209,226,278,206,217,247,268,207,226,247,250,260,264,233,213,265,206,225
\(~\)
Solution:
econsumpt <- c(260,290,280,240,250,230,310,305,264,286,262,241,209,226,278,206,217,247,268,207,226,247,250,260,264,233,213,265,206,225)
stem(econsumpt)
The decimal point is 1 digit(s) to the right of the |
20 | 667937
22 | 56603
24 | 017700
26 | 00244588
28 | 060
30 | 50
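In this plot each stem covers two tens, so the line `20 |` holds the values from 200 to 219, `22 |` those from 220 to 239, and so on. As a rough cross-check, the sketch below counts the observations in matching 20 kWh bins; the names `bins` and `leaf.counts` are illustrative. The `scale` argument of `stem()` is shown only as an option for a longer plot, and its exact layout is not asserted here.

```r
# Electricity consumption data from Problem 2
econsumpt <- c(260,290,280,240,250,230,310,305,264,286,262,241,209,226,278,
               206,217,247,268,207,226,247,250,260,264,233,213,265,206,225)

# Bins that mirror the printed stems: [200,220), [220,240), ..., [300,320)
bins <- cut(econsumpt, breaks = seq(200, 320, by = 20), right = FALSE)
leaf.counts <- table(bins)
leaf.counts   # number of leaves expected on each stem line

# Optional: ask stem() for a longer plot with finer stems
stem(econsumpt, scale = 2)
```

The counts should equal the number of leaves on each line of the plot above (6, 5, 6, 8, 3, 2).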
\(~\)
Problem 3: The food services division of Oceanview Amusement Park is studying the amount families who visit the amusement park spend per day on food and drink. A sample of 40 families who visited the park yesterday revealed they spent the following amounts (in thousands of Pesos). (5 points)
\(~\)
Data: 7.7, 1.8, 6.3, 8.4, 3.8, 5.4, 5.0, 5.9, 5.4, 5.6, 3.6, 2.6, 5.0, 3.4, 4.4, 4.1, 5.8, 5.8, 5.3, 5.1, 6.2, 4.3, 5.2, 5.3, 6.3, 6.2, 6.2, 6.5, 6.1, 5.2, 6.0, 6.0, 4.5, 6.6, 8.3, 7.1, 6.3, 5.8, 6.1, 7.1
\(~\)
a.) Organize the data into a frequency distribution, using seven (7) classes, with a class width (increment between successive lower limits) of 1.0, where the lower limit of the first class interval is 1.5.
b.) Where do the data tend to cluster?
\(~\)
Solution:
amtspent <- c(7.7, 1.8, 6.3, 8.4, 3.8, 5.4, 5.0, 5.9, 5.4, 5.6, 3.6, 2.6, 5.0, 3.4, 4.4, 4.1, 5.8, 5.8, 5.3, 5.1, 6.2, 4.3, 5.2, 5.3, 6.3, 6.2, 6.2, 6.5, 6.1, 5.2, 6.0, 6.0, 4.5, 6.6, 8.3, 7.1, 6.3, 5.8, 6.1, 7.1)
breaks <- seq(1.5, 8.5, by = 1.0)
classint <- cut(amtspent, breaks, right = FALSE)
freq <- table(classint)
freq.dist <- cbind(freq)
colnames(freq.dist) <- c("Frequency")
pander(freq.dist)
Class Interval | Frequency |
---|---|
[1.5,2.5) | 1 |
[2.5,3.5) | 2 |
[3.5,4.5) | 5 |
[4.5,5.5) | 10 |
[5.5,6.5) | 15 |
[6.5,7.5) | 4 |
[7.5,8.5) | 3 |
\(~\)
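The frequency distribution table shows that the data tend to cluster in the class 5.5 to under 6.5, which contains 15 of the 40 families. Together with the adjacent class 4.5 to under 5.5 (10 families), it accounts for 25 of the 40 observations.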
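The answer to part b.) can also be read off programmatically. The sketch below rebuilds the same class intervals and picks the class with the largest count; `modal.class` and `share.mid` are illustrative names, not part of the original solution.

```r
# Amounts spent (in thousands of pesos) from Problem 3
amtspent <- c(7.7, 1.8, 6.3, 8.4, 3.8, 5.4, 5.0, 5.9, 5.4, 5.6, 3.6, 2.6, 5.0, 3.4,
              4.4, 4.1, 5.8, 5.8, 5.3, 5.1, 6.2, 4.3, 5.2, 5.3, 6.3, 6.2, 6.2, 6.5,
              6.1, 5.2, 6.0, 6.0, 4.5, 6.6, 8.3, 7.1, 6.3, 5.8, 6.1, 7.1)

# Seven left-closed classes of width 1.0 starting at 1.5
breaks <- seq(1.5, 8.5, by = 1.0)
freq <- table(cut(amtspent, breaks, right = FALSE))

# Class with the highest frequency
modal.class <- names(freq)[which.max(freq)]
modal.class

# Share of families in the two central classes, 4.5 to under 6.5
share.mid <- sum(freq[c("[4.5,5.5)", "[5.5,6.5)")]) / length(amtspent)
share.mid
```

Here `modal.class` should be `"[5.5,6.5)"`, and `share.mid` should be 0.625 (25 of 40 families).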
\(~\)
Problem 4: Annual imports from selected Canadian trading partners are listed below for the year 2019. Develop an appropriate chart or graph and write a brief report summarizing the information. (5 points)
Partner | Annual Imports (in $ million) |
---|---|
Japan | 9,500 |
United Kingdom | 4,556 |
South Korea | 2,441 |
Philippines | 1,182 |
Australia | 618 |
\(~\)
Solution:
import <- c(9500, 4556, 2441, 1182, 618)
partner <- c("Japan", "United Kingdom", "South Korea", "Philippines", "Australia")
data <- data.frame(partner, import)
ggplot(data, aes(x=partner, y=import))+geom_bar(stat = "identity") + ggtitle("Data on Annual Imports")
\(~\)
The bar chart shows Canada's annual imports in 2019 from five selected trading partners. Imports from Japan were the highest at $9,500 million, more than the combined imports from the other four partners ($8,797 million). Imports from Australia were the lowest, at $618 million.
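The comparisons in the report can be checked with a few lines of base R. This is a small sketch using a named vector; the names `others.total` and `share.pct` are illustrative.

```r
# Annual imports in $ million, named by trading partner (Problem 4)
import <- c(Japan = 9500, `United Kingdom` = 4556, `South Korea` = 2441,
            Philippines = 1182, Australia = 618)

# Combined imports from every partner except Japan
others.total <- sum(import[names(import) != "Japan"])
others.total

# Does Japan alone exceed the other four combined?
import[["Japan"]] > others.total

# Each partner's percentage share of the listed total
share.pct <- round(100 * import / sum(import), 1)
share.pct

# Partner with the smallest imports
names(which.min(import))
```

The combined total for the other four partners should come to 8,797, so Japan's 9,500 exceeds it, and the smallest value belongs to Australia.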
\(~\)
Problem 5: The People’s Banking Company is studying the number of times the ATM located in a certain supermall is used per day. The following data show the number of times the machine was used on each of the last 30 days. (10 points)
\(~\)
Data: 83, 64, 84, 76, 84, 54, 75, 59, 70, 61, 63, 80, 84, 73, 68, 52, 65, 90, 52, 77, 95, 36, 78, 61, 59, 84, 95, 47, 87, 60.
\(~\)
Solution:
atmdata <- c(83, 64, 84, 76, 84, 54, 75, 59, 70, 61, 63, 80, 84, 73, 68, 52, 65, 90, 52, 77, 95, 36, 78, 61, 59, 84, 95, 47, 87, 60)
round(mean(atmdata), digits = 2)
## [1] 70.53
The mean is 70.53.
\(~\)
round(median(atmdata), digits = 2)
[1] 71.5
The median is 71.5.
\(~\)
round(sd(atmdata), digits = 4)
[1] 14.8248
The standard deviation is 14.8248.
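Note that `sd()` returns the sample standard deviation, \(s = \sqrt{\sum (x_i - \bar{x})^2 / (n - 1)}\), with \(n - 1\) in the denominator. The sketch below recomputes it step by step as a check; `dev.sq` and `s.manual` are illustrative names.

```r
# ATM usage data from Problem 5
atmdata <- c(83, 64, 84, 76, 84, 54, 75, 59, 70, 61, 63, 80, 84, 73, 68,
             52, 65, 90, 52, 77, 95, 36, 78, 61, 59, 84, 95, 47, 87, 60)

n <- length(atmdata)

# Squared deviations from the sample mean
dev.sq <- (atmdata - mean(atmdata))^2

# Sample standard deviation: divide by n - 1, then take the square root
s.manual <- sqrt(sum(dev.sq) / (n - 1))
round(s.manual, digits = 4)

# Should agree with the built-in function
all.equal(s.manual, sd(atmdata))
```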
\(~\)
For the Quartiles and IQR:
Q1 <- quantile(atmdata, 0.25)
Q1
25%
60.25
Q2 <- quantile(atmdata, 0.50)
Q2
50%
71.5
Q3 <- quantile(atmdata, 0.75)
Q3
75%
83.75
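# unname() drops the "75%" label inherited from Q3; the lowercase name avoids masking stats::IQR()
iqr <- unname(Q3 - Q1)
iqr
[1] 23.5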
The 1st Quartile, \(Q_1\), is \(60.25\); the 2nd Quartile, \(Q_2\), is \(71.5\); the 3rd Quartile, \(Q_3\), is \(83.75\). The IQR, on the other hand, is \(23.5\).
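These quartiles come from the default method of `quantile()` (type 7), which interpolates linearly between neighbouring order statistics. The sketch below reproduces the calculation by hand; `q7` is a hypothetical helper written for illustration, not a function from any package.

```r
# ATM usage data from Problem 5
atmdata <- c(83, 64, 84, 76, 84, 54, 75, 59, 70, 61, 63, 80, 84, 73, 68,
             52, 65, 90, 52, 77, 95, 36, 78, 61, 59, 84, 95, 47, 87, 60)

# Hypothetical helper: type-7 sample quantile for a single probability p
q7 <- function(x, p) {
  x  <- sort(x)
  h  <- (length(x) - 1) * p + 1        # fractional position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])   # interpolate between the two neighbours
}

q1.manual <- q7(atmdata, 0.25)  # position 8.25:  60 + 0.25 * (61 - 60)
q3.manual <- q7(atmdata, 0.75)  # position 22.75: 83 + 0.75 * (84 - 83)
c(q1.manual, q3.manual)

# Cross-checks against the built-in functions
all.equal(q1.manual, quantile(atmdata, 0.25, names = FALSE))
IQR(atmdata)
```

The built-in `IQR()` gives the interquartile range directly and should agree with \(Q_3 - Q_1\).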
\(~\)
For the Boxplot:
boxplot(atmdata, outcol = "red", cex=1.5)
\(~\)
The boxplot shows no outliers: no points are drawn beyond the whiskers, which extend to the minimum (36) and maximum (95) of the data.
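The same conclusion can be checked numerically. `boxplot()` flags points lying more than 1.5 times the box height beyond the box, and it builds the box from Tukey's hinges (`fivenum()`), so the box edges can differ slightly from the `quantile()` values of 60.25 and 83.75. The sketch below is one way to confirm this; `bp`, `hinges`, and `fences` are illustrative names.

```r
# ATM usage data from Problem 5
atmdata <- c(83, 64, 84, 76, 84, 54, 75, 59, 70, 61, 63, 80, 84, 73, 68,
             52, 65, 90, 52, 77, 95, 36, 78, 61, 59, 84, 95, 47, 87, 60)

# Statistics that boxplot() uses internally (default coef = 1.5)
bp <- boxplot.stats(atmdata)
bp$out        # values flagged as outliers; empty if there are none

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
hinges <- fivenum(atmdata)
hinges

# Outlier fences based on the hinges
box.height <- hinges[4] - hinges[2]
fences <- c(lower = hinges[2] - 1.5 * box.height,
            upper = hinges[4] + 1.5 * box.height)
fences

# TRUE when every observation lies inside the fences
all(atmdata >= fences["lower"] & atmdata <= fences["upper"])
```

With hinges at 60 and 84, the fences fall at 24 and 120, and every observation (36 to 95) lies inside them.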