LONG TEST 2. PART II.

library(readr)
library(pander)
library(tidyverse)

## -- Attaching packages ----------------------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.2     v dplyr   1.0.2
## v tibble  3.0.3     v stringr 1.4.0
## v tidyr   1.1.2     v forcats 0.5.0
## v purrr   0.3.4

## -- Conflicts -------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(forcats)
library(outliers)
library(EnvStats)

## 
## Attaching package: 'EnvStats'

## The following objects are masked from 'package:stats':
## 
##     predict, predict.lm

## The following object is masked from 'package:base':
## 
##     print.default

PROBLEM 1

Consider a random sample of 110 small to mid-size companies in the Philippines, classified according to their annual revenues (in millions of pesos). The data below present the 110 companies classified by annual revenue into five categories: Under 150, 150 – under 300, 300 – under 450, 450 – under 600, 600 or more. The labels 1, 2, 3, 4, 5, respectively, were used for these categories. Construct a table showing the frequency, relative frequency, and percent frequency distributions for the data.

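# Read the coded revenue categories (labels 1 to 5)
revenue<-read.csv("revenue.csv")
# Category names, in the same order as the codes 1 to 5
data<-c("Under 150","150 - Under 300","300 - Under 450","450 - Under 600","600 or more")
# Frequency, relative frequency, and percent frequency for each category
data.freq<-table(revenue)
data.relfreq<-round(data.freq/sum(data.freq),digits=4)
data.pctfreq<-data.relfreq*100
# Combine the columns into one table and label them
freq.dist<-cbind(data,data.freq,data.relfreq,data.pctfreq)
colnames(freq.dist)<-c("Annual Revenue","Frequency","Relative Frequency","Percent Frequency")
pander(freq.dist)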

Annual Revenue	Frequency	Relative Frequency	Percent Frequency
Under 150	28	0.2545	25.45
150 - Under 300	26	0.2364	23.64
300 - Under 450	21	0.1909	19.09
450 - Under 600	16	0.1455	14.55
600 or more	19	0.1727	17.27

Of the 110 companies, 28 (25.45%) have annual revenue below P150 million, 26 (23.64%) have at least P150 million but below P300 million, 21 (19.09%) have at least P300 million but below P450 million, 16 (14.55%) have at least P450 million but below P600 million, and 19 (17.27%) have P600 million or more.
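The percentages above can be checked directly from the counts in the table. The following is a minimal sketch that assumes only the five frequencies reported above; the names `counts`, `rel_freq`, and `pct_freq` are illustrative and do not appear in the original code.

```r
# Counts taken from the Frequency column of the table above
counts <- c("Under 150" = 28, "150 - Under 300" = 26, "300 - Under 450" = 21,
            "450 - Under 600" = 16, "600 or more" = 19)

rel_freq <- round(counts / sum(counts), digits = 4)  # share of the 110 companies
pct_freq <- rel_freq * 100                            # the same share as a percentage

# A data frame keeps the numeric columns numeric,
# whereas cbind() with a character column converts every column to text
data.frame(Frequency = counts,
           Relative.Frequency = rel_freq,
           Percent.Frequency = pct_freq)
```

Building the table as a data frame is one possible design choice; it leaves the frequency columns available for further arithmetic.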

PROBLEM 2

The following data give the consumption of electricity in kilowatt-hours during a given month in 30 rural households. Construct a stem-and-leaf diagram for these data.

electricity<-read.csv("electricity.csv")
data<-electricity$electricity
stem(data)

## 
##   The decimal point is 1 digit(s) to the right of the |
## 
##   20 | 667937
##   22 | 56603
##   24 | 017700
##   26 | 00244588
##   28 | 060
##   30 | 50

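Each row of the display covers two stems (the 20 row spans 200 to 219), so the lowest consumption is about 206 kilowatt-hours and the highest is about 310 kilowatt-hours. Consumption is most concentrated in the 260 to 279 kilowatt-hour row, which holds 8 of the 30 households.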
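Because `stem()` printed two stems per row here, the leaves of a row restart partway through. The sketch below decodes the first printed row, `20 | 667937`, assuming the readings are whole kilowatt-hours; the vector `first_row` is reconstructed from the display, not read from `electricity.csv`.

```r
# Values read from the first printed row, "20 | 667937", assuming integer kWh readings:
# stem 20 supplies leaves 6, 6, 7, 9 and stem 21 supplies leaves 3, 7
first_row <- c(206, 206, 207, 209, 213, 217)

# Split each value into its stem (tens and above) and its leaf (units digit)
stems  <- first_row %/% 10
leaves <- first_row %% 10
split(leaves, stems)

# A larger scale asks stem() for a longer plot, which can give each stem its own row
stem(first_row, scale = 2)
```

Applying `stem(data, scale = 2)` to the full data set is one way to make the individual stems easier to read.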

PROBLEM 3

The food services division of Oceanview Amusement Park is studying the amount families who visit the amusement park spend per day on food and drink. A sample of 40 families who visited the park yesterday revealed they spent the following amounts (in thousands of Pesos).

Problem 3a

Organize the data into a frequency distribution, using seven (7) classes, with a class width (increment between successive lower limits) of 1.0, where the lower limit of the first class interval is 1.5.

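# Read the family expense data (in thousands of pesos)
data1<-read.csv("expense.csv")
# Seven classes of width 1.0 starting at 1.5; right=FALSE makes each class [lower, upper)
breaks<-seq(1.5,8.5,by=1)
classint<-cut(data1$expense,breaks,right=FALSE)
freq1<-table(classint)
freq.dist1<-cbind(freq1)
colnames(freq.dist1)<-c("Frequency")
pander(freq.dist1)

	Frequency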
[1.5,2.5)	1
[2.5,3.5)	2
[3.5,4.5)	5
[4.5,5.5)	10
[5.5,6.5)	15
[6.5,7.5)	4
[7.5,8.5)	3

Of the 40 families, 1 family spent at least P1,500 but below P2,500, 2 families spent at least P2,500 but below P3,500, 5 families spent at least P3,500 but below P4,500, 10 families spent at least P4,500 but below P5,500, 15 families spent at least P5,500 but below P6,500, 4 families spent at least P6,500 but below P7,500, and 3 families spent at least P7,500 but below P8,500.
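The "at least ... but below ..." wording follows from `right=FALSE` in `cut()`. A minimal sketch with hypothetical amounts (not taken from `expense.csv`) shows where values that fall exactly on a class limit are placed under each setting.

```r
breaks <- seq(1.5, 8.5, by = 1)

# Hypothetical amounts (thousands of pesos) chosen to sit on or near class limits
amounts <- c(2.4, 2.5, 3.5, 8.4)

# right = FALSE closes each class on the left: [lower, upper)
as.character(cut(amounts, breaks, right = FALSE))

# The default, right = TRUE, closes each class on the right: (lower, upper]
as.character(cut(amounts, breaks))
```

With `right = FALSE`, a family that spent exactly 2.5 is counted in the class starting at 2.5, which matches the lower limits given in the problem.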

Problem 3b

Where do the data tend to cluster?

The data tend to cluster in the class from P5,500 to below P6,500, which contains 15 of the 40 families.

PROBLEM 4

Annual imports from selected Canadian trading partners are listed below for the year 2019. Develop an appropriate chart or graph and write a brief report summarizing the information.

imports<-read.csv("imports.csv")
ggplot(imports, aes(Partner,Annual.Imports.In.Million.Dollars))+
  geom_bar(stat="identity",width=.5)+
  ggtitle("Annual Imports from Selected Canadian Trading Partners")

Among the selected trading partners, Japan accounts for the highest annual imports at 9,500 million dollars, followed by the United Kingdom (4,556 million dollars), South Korea (2,441 million dollars), and the Philippines (1,182 million dollars). Australia has the lowest annual imports, at only 618 million dollars.
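A ranking like this is easier to read when the bars are sorted by value. The sketch below assumes the five partners and figures quoted in the report and rebuilds them as a small data frame instead of reading `imports.csv`. The axis labels and the use of `reorder()` and `geom_col()` are suggestions, not part of the original code.

```r
library(ggplot2)

# Values quoted in the report above, in millions of dollars
imports <- data.frame(
  Partner = c("Japan", "United Kingdom", "South Korea", "Philippines", "Australia"),
  Annual.Imports.In.Million.Dollars = c(9500, 4556, 2441, 1182, 618)
)

# reorder() sorts the partner levels by descending import value,
# so the bars run from largest to smallest
imports$Partner <- reorder(imports$Partner, -imports$Annual.Imports.In.Million.Dollars)

p <- ggplot(imports, aes(Partner, Annual.Imports.In.Million.Dollars)) +
  geom_col(width = 0.5) +
  labs(x = "Trading partner", y = "Annual imports (million dollars)",
       title = "Annual Imports from Selected Canadian Trading Partners")
p
```

`geom_col()` draws bars whose heights are the values in the data, which is the same behavior as `geom_bar(stat="identity")`.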

PROBLEM 5

The People’s Banking Company is studying the number of times the ATM located in a certain supermall is used per day. The following data show the number of times the machine was used on each of the last 30 days.

Problem 5a

Find the mean, median, and standard deviation of these data.

usage<-read.csv("usage.csv")
mean(usage$usage)

## [1] 70.53333

median(usage$usage)

## [1] 71.5

sd(usage$usage)

## [1] 14.8248

The ATM is used an average of 70.53 times per day. The median is 71.5: on half of the 30 days it was used fewer than 71.5 times, and on the other half more than 71.5 times. The standard deviation is about 14.82 uses per day.
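For reference, the three statistics can also be computed from their definitions. This is a minimal sketch on a hypothetical vector `x` (not the `usage.csv` data); it uses the sample standard deviation with divisor n - 1, which is what `sd()` computes.

```r
# Hypothetical daily usage counts, for illustration only (not the usage.csv data)
x <- c(52, 61, 68, 71, 72, 80, 95)

n <- length(x)
x_bar <- sum(x) / n                       # arithmetic mean
s <- sqrt(sum((x - x_bar)^2) / (n - 1))   # sample standard deviation (divisor n - 1)

# Median: middle value of the sorted data,
# or the average of the two middle values when n is even
x_sorted <- sort(x)
med <- if (n %% 2 == 1) x_sorted[(n + 1) / 2] else mean(x_sorted[n / 2 + 0:1])

c(mean = x_bar, median = med, sd = s)
```

With 30 observations, as in this problem, the median is the average of the 15th and 16th sorted values, which is how a value such as 71.5 can arise from whole-number counts.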

Problem 5b

Find the three quartiles and the IQR for these data.

quantile(usage$usage,0.25)

##   25% 
## 60.25

quantile(usage$usage,0.5)

##  50% 
## 71.5

quantile(usage$usage,0.75)

##   75% 
## 83.75

Q1<-quantile(usage$usage,0.25)
Q3<-quantile(usage$usage,0.75)
# Use a lowercase name so the built-in IQR() function is not masked
iqr<-unname(Q3-Q1)
iqr

## [1] 23.5

On 25% of the last 30 days the ATM was used fewer than 60.25 times, on 50% of the days fewer than 71.5 times, and on 75% of the days fewer than 83.75 times. The interquartile range is 23.5, so the middle 50% of the daily usage counts span 23.5 uses.
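Quartiles such as 60.25 and 83.75 are not whole numbers because `quantile()` interpolates by default (its type 7 method). With 30 observations the first quartile sits at position 1 + 29 × 0.25 = 8.25 in the sorted data, a quarter of the way from the 8th to the 9th value. The sketch below reproduces that rule on a hypothetical sample; `quantile_type7` is an illustrative helper, not a function from any package.

```r
# Hypothetical sample, for illustration only (not the usage.csv data)
x <- c(40, 45, 52, 58, 60, 61, 66, 70, 73, 77, 84, 90)

# Type 7 quantile: find position h = 1 + (n - 1) * p in the sorted data,
# then interpolate linearly between the two neighbouring values
quantile_type7 <- function(x, p) {
  x <- sort(x)
  h <- 1 + (length(x) - 1) * p
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

q <- sapply(c(0.25, 0.5, 0.75), quantile_type7, x = x)
q
q[3] - q[1]   # interquartile range
```

The results agree with `quantile(x, c(0.25, 0.5, 0.75))` and `IQR(x)` for the default method.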

Problem 5c

boxplot(usage,outcol="red",cex=1.5)

The boxplot shows no outliers: no point falls beyond the whiskers, so every daily usage count lies within 1.5 times the interquartile range of the quartiles.
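The outlier rule behind the boxplot can be sketched with the quartiles reported in Problem 5b. This is an approximation: `boxplot()` computes its box from hinges, which can differ slightly from the `quantile()` quartiles, so the exact cutoffs in the plot may not match these to the decimal. The variable names are illustrative.

```r
# Quartiles reported in Problem 5b
Q1 <- 60.25
Q3 <- 83.75
iqr <- Q3 - Q1

# Conventional 1.5 * IQR fences; values outside them would be flagged as outliers
lower_fence <- Q1 - 1.5 * iqr
upper_fence <- Q3 + 1.5 * iqr
c(lower = lower_fence, upper = upper_fence)
```

Under these assumptions, any daily count between 25 and 119 is not flagged. Running `boxplot.stats(usage$usage)$out` lists the points that `boxplot()` would draw as outliers, and an empty result agrees with the plot.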

LONGTEST2

2194168 GARNA, JAN KATHLENE RAY L.

1769 MWF 5:00-6:00