library(readr)
library(pander)
library(tidyverse)
library(forcats)
library(ggplot2)
library(outliers)
library(EnvStats)

Problem 1: Consider a random sample of 110 small to mid-size companies in the Philippines, classified according to their annual revenues (in millions of pesos). The data below classify the 110 companies into five revenue categories: Under 150, 150 – under 300, 300 – under 450, 450 – under 600, 600 or more. The labels 1, 2, 3, 4, 5, respectively, were used for these categories. Construct a frequency, relative frequency, and percent frequency distribution table for the data. (10 points)

\(~\)

Data: 1, 4, 3, 5, 3, 4, 1, 2, 3, 4, 3, 1, 5, 3, 4, 2, 1, 1, 4, 5, 3, 2, 5, 2, 5, 2, 1, 2, 3, 3, 2, 1, 2, 5, 3, 2, 1, 1, 2, 1, 2, 4, 5, 3, 5, 1, 3, 1, 2, 1, 4, 1, 4, 5, 4, 1, 1, 2, 4, 1, 4, 1, 2, 4, 3, 3, 4, 1, 4, 1, 4, 1, 2, 1, 5, 3, 1, 5, 2, 1, 2, 3, 1, 2, 2, 1, 1, 2, 1, 5, 3, 2, 5, 5, 2, 5, 4, 3, 5, 2, 3, 2, 3, 5, 2, 3, 5, 5, 2, 3.

\(~\)

Create data manually in RStudio:

revclass <- c(1,4,3,5,3,4,1,2,3,4,3,1,5,3,4,2,1,1,4,5,3,2,5,2,5,2,1,2,3,3,2,1,2,5,3,2,1,1,2,1,2,4,5,3,5,1,3,1,2,1,4,1,4,5,4,1,1,2,4,1,4,1,2,4,3,3,4,1,4,1,4,1,2,1,5,3,1,5,2,1,2,3,1,2,2,1,1,2,1,5,3,2,5,5,2,5,4,3,5,2,3,2,3,5,2,3,5,5,2,3)

revclass
##   [1] 1 4 3 5 3 4 1 2 3 4 3 1 5 3 4 2 1 1 4 5 3 2 5 2 5 2 1 2 3 3 2 1 2 5 3 2 1
##  [38] 1 2 1 2 4 5 3 5 1 3 1 2 1 4 1 4 5 4 1 1 2 4 1 4 1 2 4 3 3 4 1 4 1 4 1 2 1
##  [75] 5 3 1 5 2 1 2 3 1 2 2 1 1 2 1 5 3 2 5 5 2 5 4 3 5 2 3 2 3 5 2 3 5 5 2 3

\(~\)

Solution:

data.freq <- table(revclass)
data.relfreq <- data.freq/sum(data.freq)
data.pctfreq <- data.relfreq*100
freq.dist <- cbind(data.freq, data.relfreq, data.pctfreq)
colnames(freq.dist) <- c("Frequency", "Relative Frequency", "Percent Frequency")
pander(freq.dist)
    Frequency   Relative Frequency   Percent Frequency
1   28          0.2545               25.45
2   26          0.2364               23.64
3   21          0.1909               19.09
4   16          0.1455               14.55
5   19          0.1727               17.27
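
To make the table easier to read, the numeric labels can be paired with the revenue categories from the problem statement. The following is a minimal sketch in base R. It starts from the five frequencies reported above rather than the raw data, and the names `counts`, `categories`, and `labeled.dist` are illustrative assumptions, not part of the original solution.

```r
# Frequencies for labels 1 to 5, copied from the table above
counts <- c(28, 26, 21, 16, 19)

# Category names taken from the problem statement (plain hyphens used here)
categories <- c("Under 150", "150 - under 300", "300 - under 450",
                "450 - under 600", "600 or more")

labeled.dist <- data.frame(
  Category  = categories,
  Frequency = counts,
  Relative  = round(counts / sum(counts), 4),
  Percent   = round(100 * counts / sum(counts), 2)
)
labeled.dist

# Sanity checks: the counts should cover all 110 companies
# and the relative frequencies should sum to 1
sum(counts)
sum(counts / sum(counts))
```

The two checks at the end are a quick way to catch a mistyped value before the table is reported.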

\(~\)

Problem 2: The following data give the consumption of electricity in kilowatt-hours during a given month in 30 rural households. Construct a stem-and-leaf plot. (5 points)

\(~\)

Data: 260,290,280,240,250,230,310,305,264,286,262,241,209,226,278,206,217,247,268,207,226,247,250,260,264,233,213,265,206,225

\(~\)

Solution:

econsumpt <- c(260,290,280,240,250,230,310,305,264,286,262,241,209,226,278,206,217,247,268,207,226,247,250,260,264,233,213,265,206,225)
stem(econsumpt)
  
    The decimal point is 1 digit(s) to the right of the |
  
    20 | 667937
    22 | 56603
    24 | 017700
    26 | 00244588
    28 | 060
    30 | 50
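
Each stem in this display spans 20 kWh, so the row `20 | 667937` holds the six smallest readings: 206, 206, 207, 209, 213, and 217. If one row per ten is preferred, the `scale` argument of `stem()` lengthens the plot. The sketch below assumes `scale = 2` is enough to split the stems for these data, which should be confirmed against the printed output.

```r
econsumpt <- c(260, 290, 280, 240, 250, 230, 310, 305, 264, 286,
               262, 241, 209, 226, 278, 206, 217, 247, 268, 207,
               226, 247, 250, 260, 264, 233, 213, 265, 206, 225)

# The six smallest values are the leaves of the first row in the plot above
first.row <- sort(econsumpt)[1:6]
first.row

# Request a longer plot; scale = 2 roughly doubles the number of stems
stem(econsumpt, scale = 2)
```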

\(~\)

Problem 3: The food services division of Oceanview Amusement Park is studying the amount families who visit the amusement park spend per day on food and drink. A sample of 40 families who visited the park yesterday revealed they spent the following amounts (in thousands of Pesos). (5 points)

\(~\)

Data: 7.7, 1.8, 6.3, 8.4, 3.8, 5.4, 5.0, 5.9, 5.4, 5.6, 3.6, 2.6, 5.0, 3.4, 4.4, 4.1, 5.8, 5.8, 5.3, 5.1, 6.2, 4.3, 5.2, 5.3, 6.3, 6.2, 6.2, 6.5, 6.1, 5.2, 6.0, 6.0, 4.5, 6.6, 8.3, 7.1, 6.3, 5.8, 6.1, 7.1

\(~\)

a.) Organize the data into a frequency distribution, using seven (7) classes, with a class width (increment between successive lower limits) of 1.0, where the lower limit of the first class interval is 1.5.

b.) Where do the data tend to cluster?

\(~\)

Solution:

amtspent <- c(7.7, 1.8, 6.3, 8.4, 3.8, 5.4, 5.0, 5.9, 5.4, 5.6, 3.6, 2.6, 5.0, 3.4, 4.4, 4.1, 5.8, 5.8, 5.3, 5.1, 6.2, 4.3, 5.2, 5.3, 6.3, 6.2, 6.2, 6.5, 6.1, 5.2, 6.0, 6.0, 4.5, 6.6, 8.3, 7.1, 6.3, 5.8, 6.1, 7.1)

breaks <- seq(1.5, 8.5, by = 1.0)

classint <- cut(amtspent, breaks, right = FALSE)

freq <- table(classint)

freq.dist <- cbind(freq)

colnames(freq.dist) <- c("Frequency")

pander(freq.dist)
            Frequency
[1.5,2.5)   1
[2.5,3.5)   2
[3.5,4.5)   5
[4.5,5.5)   10
[5.5,6.5)   15
[6.5,7.5)   4
[7.5,8.5)   3

\(~\)

The frequency distribution table shows that the data tend to cluster in the class 5.5 to under 6.5, which contains 15 of the 40 families.
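
The class where the data cluster can also be read off programmatically instead of by eye. This is a small sketch using the same breaks as the solution; the names `modal.class` and `modal.share` are illustrative assumptions.

```r
amtspent <- c(7.7, 1.8, 6.3, 8.4, 3.8, 5.4, 5.0, 5.9, 5.4, 5.6,
              3.6, 2.6, 5.0, 3.4, 4.4, 4.1, 5.8, 5.8, 5.3, 5.1,
              6.2, 4.3, 5.2, 5.3, 6.3, 6.2, 6.2, 6.5, 6.1, 5.2,
              6.0, 6.0, 4.5, 6.6, 8.3, 7.1, 6.3, 5.8, 6.1, 7.1)

# Same left-closed classes as in the solution: [1.5,2.5), ..., [7.5,8.5)
breaks <- seq(1.5, 8.5, by = 1.0)
freq <- table(cut(amtspent, breaks, right = FALSE))

modal.class <- names(freq)[which.max(freq)]   # class with the highest count
modal.share <- max(freq) / sum(freq)          # proportion of families in that class

modal.class
modal.share
```

Because `right = FALSE` makes each class closed on the left, a value of exactly 6.5 is counted in the next class, which is why the cluster is described as 5.5 to under 6.5.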

\(~\)

Problem 4: Annual imports from selected Canadian trading partners are listed below for the year 2019. Develop an appropriate chart or graph and write a brief report summarizing the information. (5 points)

Partner           Annual Imports (in $ million)
Japan             9,500
United Kingdom    4,556
South Korea       2,441
Philippines       1,182
Australia         618

\(~\)

Solution:

import <- c(9500, 4556, 2441, 1182, 618)
partner <- c("Japan", "United Kingdom", "South Korea", "Philippines", "Australia")

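data <- data.frame(partner, import)
# Order the bars from largest to smallest and label the axes with units
ggplot(data, aes(x = reorder(partner, -import), y = import)) +
  geom_col() +
  labs(title = "Data on Annual Imports", x = "Partner", y = "Annual imports (in $ million)")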

\(~\)

The bar chart shows 2019 annual imports from five selected Canadian trading partners. Japan has the highest annual imports at $9,500 million, more than the combined imports of the other four partners ($8,797 million). Australia has the lowest at $618 million.
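
The comparisons in the summary can be checked with a few lines of base R. This is only a sketch; the name `others` and the share calculation are additions for illustration and were not part of the original solution.

```r
import <- c(9500, 4556, 2441, 1182, 618)
partner <- c("Japan", "United Kingdom", "South Korea", "Philippines", "Australia")
names(import) <- partner

# Combined imports of the four partners other than Japan
others <- sum(import[partner != "Japan"])
others

# TRUE if Japan alone exceeds the other four combined
import[["Japan"]] > others

# Partner with the smallest value
names(import)[which.min(import)]

# Each partner's share of the listed total, in percent
round(100 * import / sum(import), 1)
```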

\(~\)

Problem 5: The People’s Banking Company is studying the number of times the ATM located in a certain supermall is used per day. The following data show the number of times the machine was used on each of the last 30 days. (10 points)

\(~\)

Data: 83, 64, 84, 76, 84, 54, 75, 59, 70, 61, 63, 80, 84, 73, 68, 52, 65, 90, 52, 77, 95, 36, 78, 61, 59, 84, 95, 47, 87, 60.

\(~\)

  1. Find the mean, median, and standard deviation of these data.
  2. Find the three quartiles and the IQR for these data.
  3. Prepare a boxplot for these data. From the boxplot, determine if there are outliers. If there are, perform Rosner’s test for outliers to determine the significant outliers in the data set.

\(~\)

Solution:

atmdata <- c(83, 64, 84, 76, 84, 54, 75, 59, 70, 61, 63, 80, 84, 73, 68, 52, 65, 90, 52, 77, 95, 36, 78, 61, 59, 84, 95, 47, 87, 60)
round(mean(atmdata), digits = 2)
## [1] 70.53

The mean is 70.53.

\(~\)

round(median(atmdata), digits = 2)
  [1] 71.5

The median is 71.5.

\(~\)

round(sd(atmdata), digits = 4)
  [1] 14.8248

The standard deviation is 14.8248.
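
Note that `sd()` computes the sample standard deviation, which divides the sum of squared deviations by \(n - 1\) rather than \(n\). A short sketch of the same calculation done by hand follows; `dev.sq` and `s.manual` are illustrative names.

```r
atmdata <- c(83, 64, 84, 76, 84, 54, 75, 59, 70, 61, 63, 80, 84, 73, 68,
             52, 65, 90, 52, 77, 95, 36, 78, 61, 59, 84, 95, 47, 87, 60)

n <- length(atmdata)
dev.sq <- (atmdata - mean(atmdata))^2      # squared deviations from the mean
s.manual <- sqrt(sum(dev.sq) / (n - 1))    # sample formula: divide by n - 1

round(s.manual, 4)
all.equal(s.manual, sd(atmdata))           # TRUE when it agrees with sd()
```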

\(~\)

For the Quartiles and IQR:

Q1 <- quantile(atmdata, 0.25)
Q1
    25% 
  60.25
Q2 <- quantile(atmdata, 0.50)
Q2
   50% 
  71.5
Q3 <- quantile(atmdata, 0.75)
Q3
    75% 
  83.75
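# Stored as iqr so the built-in IQR() function is not masked;
# unname() drops the "75%" label inherited from Q3
iqr <- unname(Q3 - Q1)
iqr
  [1] 23.5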

The 1st Quartile, \(Q_1\), is \(60.25\); the 2nd Quartile, \(Q_2\), is \(71.5\); the 3rd Quartile, \(Q_3\), is \(83.75\). The IQR, on the other hand, is \(23.5\).
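
Textbooks differ in how they define quartiles, so it helps to know which rule produced these values. By default `quantile()` uses its type 7 rule, which interpolates linearly at position \(h = 1 + (n - 1)p\) in the sorted data. The sketch below reproduces that rule by hand; the helper `type7` is a hypothetical name written for this illustration.

```r
atmdata <- c(83, 64, 84, 76, 84, 54, 75, 59, 70, 61, 63, 80, 84, 73, 68,
             52, 65, 90, 52, 77, 95, 36, 78, 61, 59, 84, 95, 47, 87, 60)

x <- sort(atmdata)
n <- length(x)

# Type 7 quantile: interpolate between the order statistics
# on either side of position h = 1 + (n - 1) * p
type7 <- function(p) {
  h  <- 1 + (n - 1) * p
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

quartiles <- sapply(c(0.25, 0.50, 0.75), type7)
quartiles

# Built-in equivalent of Q3 - Q1
IQR(atmdata)
```

For \(Q_1\), for example, \(h = 1 + 29 \times 0.25 = 8.25\), so the value lies a quarter of the way from the 8th sorted observation (60) to the 9th (61), giving 60.25.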

\(~\)

For the Boxplot:

boxplot(atmdata, outcol = "red", cex=1.5)

\(~\)

The boxplot shows no points beyond the whiskers, so no outliers are identified in the data and Rosner’s test is not needed.
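
The visual reading can be confirmed numerically with `boxplot.stats()`, which applies the same 1.5 × IQR rule that `boxplot()` uses by default. The following is a sketch; the names `bs`, `hinges`, and `fences` are illustrative. The `rosnerTest()` call from EnvStats appears only as a comment, and the choice of `k` there is an assumption that would need to be justified if outliers were actually present.

```r
atmdata <- c(83, 64, 84, 76, 84, 54, 75, 59, 70, 61, 63, 80, 84, 73, 68,
             52, 65, 90, 52, 77, 95, 36, 78, 61, 59, 84, 95, 47, 87, 60)

bs <- boxplot.stats(atmdata)
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out     # values flagged as outliers (empty when none are found)

# Fences implied by the hinges: points outside them would be drawn as outliers
hinges <- bs$stats[c(2, 4)]
fences <- hinges + c(-1.5, 1.5) * diff(hinges)
fences
range(atmdata)

# Hypothetical follow-up, relevant only if bs$out were non-empty
# and the EnvStats package is installed:
# EnvStats::rosnerTest(atmdata, k = length(bs$out))
```

Note that `boxplot()` builds its box from the hinges returned by `fivenum()`, which can differ slightly from the type 7 quartiles given by `quantile()`. Either way, all 30 observations fall inside the fences here.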