Total number of parts : 05
Each part carries 5 marks.
You need to answer 05 parts in total.
This is an R Markdown/HTML document. All questions are R-related.
Write your answers/scripts in the cells, then save this notebook in .Rmd format or generate an .html file using the ‘Knit’ button.
Index Number: [COADDS192P-006]
Answer all questions in this Markdown itself.
Using the provided “state” data, please answer the following questions:
Compute the mean, trimmed mean, and median for the population using R. (A trimmed mean is widely used to avoid the influence of outliers. For example, trimming the bottom and top 10% (a common choice) of the data will provide protection against outliers in all but the smallest data sets.) (1 Marks)
state <- read.csv("state.csv", header = TRUE)
mean(state$Population)
## [1] 6162876
mean(state$Population, trim = 0.1)
## [1] 4783697
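The question also asks for the median, which the calls above do not compute; with the same data frame it would presumably be `median(state$Population)`. The sketch below compares the three statistics on a small made-up vector (the values in `x` are hypothetical and are not taken from state.csv), to show how an outlier affects each one.

```r
# Hypothetical data: nine small values and one extreme outlier
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

plain_mean   <- mean(x)              # pulled upward by the outlier
trimmed_mean <- mean(x, trim = 0.1)  # drops the lowest and highest 10% (one value each here)
middle       <- median(x)            # depends only on the middle values

print(c(mean = plain_mean, trimmed = trimmed_mean, median = middle))
```

In this toy case the trimmed mean and the median agree, while the plain mean is dragged far above both by the single large value.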
Display some percentiles of the murder rate, such as 10%, 25%, 50%, 75% and 90%. (2 Marks)
quantile(state$Murder.Rate, c(.1, .25, .5, .75, .9))
##   10%   25%   50%   75%   90%
## 1.890 2.425 4.000 5.550 6.010
Using R’s functions, compute standard deviation and interquartile range (IQR) (2 Marks)
sd(state$Population)
## [1] 6848235
sd(state$Murder.Rate)
## [1] 1.915736
IQR(state$Population)
## [1] 4847308
IQR(state$Murder.Rate)
## [1] 3.125
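As a rough illustration of why both measures are reported, the sketch below compares `sd()` and `IQR()` on two made-up vectors that differ only in their largest value. The vectors `clean` and `with_outlier` are hypothetical and are not part of the state data.

```r
# Two hypothetical samples that differ only in the largest value
clean        <- 1:10
with_outlier <- c(1:9, 100)

sd_clean    <- sd(clean)
sd_outlier  <- sd(with_outlier)   # inflated by the single extreme value
iqr_clean   <- IQR(clean)
iqr_outlier <- IQR(with_outlier)  # unchanged: the quartiles ignore the extreme value

print(c(sd_clean = sd_clean, sd_outlier = sd_outlier,
        iqr_clean = iqr_clean, iqr_outlier = iqr_outlier))
```

The standard deviation reacts strongly to the outlier, while the interquartile range stays the same, which is why the IQR is often preferred for skewed data.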
Whole Life Organic, Inc., produces high-quality organic frozen turkeys for distribution in organic food markets in the upper Midwest. The company has developed a range feeding program with organic grain supplements to produce their product. The mean weight of its frozen turkeys is 18 pounds with a variance of 4. Historical experience indicates that weights can be approximated by the normal probability distribution.
#I.
x <- pnorm(16, mean = 18, sd=sqrt(4), lower.tail = TRUE)
x
## [1] 0.1586553
#II.
pnorm(20, mean = 18, sd=sqrt(4), lower.tail = FALSE)
## [1] 0.1586553
#III.
y <- pnorm(20, mean = 18, sd=sqrt(4), lower.tail = TRUE)
y
## [1] 0.8413447
#IV.
y-x
## [1] 0.6826895
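As a cross-check on parts I to IV, the same probability can be obtained by standardizing the weights first. This is a minimal sketch using the mean and variance stated in the question; the variable names are only illustrative.

```r
mu    <- 18        # mean weight in pounds, from the question
sigma <- sqrt(4)   # standard deviation = square root of the variance

# z-scores for the two weights used above
z_low  <- (16 - mu) / sigma   # -1
z_high <- (20 - mu) / sigma   #  1

# Probability of a weight between 16 and 20 pounds, computed two ways
p_direct   <- pnorm(20, mean = mu, sd = sigma) - pnorm(16, mean = mu, sd = sigma)
p_standard <- pnorm(z_high) - pnorm(z_low)

print(c(direct = p_direct, standardized = p_standard))
```

Both routes give the same value because 16 and 20 pounds lie exactly one standard deviation below and above the mean.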
Find the cutoff point for the top 15% of sales? (2 Marks)
# The question mentions sales, but only weights are described, so the cutoff is computed for weights.
qnorm(0.85, mean = 18, sd = sqrt(4))
## [1] 20.07287
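A quick way to sanity-check a quantile cutoff is to feed it back into `pnorm()` and confirm that the upper-tail area is the intended 15%. The sketch below assumes, as above, that the cutoff refers to turkey weights.

```r
mu    <- 18
sigma <- sqrt(4)

# Weight below which 85% of turkeys fall, i.e. the cutoff for the top 15%
cutoff <- qnorm(0.85, mean = mu, sd = sigma)

# Round trip: the area above the cutoff should be 0.15
upper_tail <- pnorm(cutoff, mean = mu, sd = sigma, lower.tail = FALSE)

print(c(cutoff = cutoff, upper_tail = upper_tail))
```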
A useful way to summarize two categorical variables is a contingency table. Using the provided lc_loans data, show the contingency table between the grade of a personal loan and the outcome of that loan.
Use the table command and obtain the contingency table (2 Marks)
lc <- read.csv("lc_loans.csv", header = TRUE)
table(lc$status, lc$grade)
##
##                    A     B     C     D     E     F     G
##   Charged Off   1562  5302  6023  5007  2842  1526   409
##   Current      50051 93852 88928 53281 24639  8444  1990
##   Fully Paid   20408 31160 23147 13681  5949  2328   643
##   Late           469  2056  2777  2308  1374   606   199
Install the library(descr) and use the CrossTable command to obtain the contingency table with counts and percentages
Hint (prop.c=F, prop.chisq=F, prop.t=F ) (3 Marks)
library(descr)
## Warning: package 'descr' was built under R version 3.6.3
CrossTable(lc$status, lc$grade, prop.c=F, prop.chisq=F, prop.t=F )
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Row Total |
## |-------------------------|
##
## ===============================================================================
##              lc$grade
## lc$status           A       B       C       D       E       F       G   Total
## -------------------------------------------------------------------------------
## Charged Off      1562    5302    6023    5007    2842    1526     409   22671
##                 0.069   0.234   0.266   0.221   0.125   0.067   0.018   0.050
## -------------------------------------------------------------------------------
## Current         50051   93852   88928   53281   24639    8444    1990  321185
##                 0.156   0.292   0.277   0.166   0.077   0.026   0.006   0.712
## -------------------------------------------------------------------------------
## Fully Paid      20408   31160   23147   13681    5949    2328     643   97316
##                 0.210   0.320   0.238   0.141   0.061   0.024   0.007   0.216
## -------------------------------------------------------------------------------
## Late              469    2056    2777    2308    1374     606     199    9789
##                 0.048   0.210   0.284   0.236   0.140   0.062   0.020   0.022
## -------------------------------------------------------------------------------
## Total           72490  132370  120875   74277   34804   12904    3241  450961
## ===============================================================================
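The row proportions reported by `CrossTable` can also be reproduced with base R's `prop.table()`. The sketch below rebuilds the table by hand from the counts printed above rather than reading lc_loans.csv, so the object name `tab` and the manual construction are only illustrative.

```r
# Counts copied from the contingency table above (rows = status, columns = grade)
tab <- matrix(
  c( 1562,  5302,  6023,  5007,  2842, 1526,  409,
    50051, 93852, 88928, 53281, 24639, 8444, 1990,
    20408, 31160, 23147, 13681,  5949, 2328,  643,
      469,  2056,  2777,  2308,  1374,  606,  199),
  nrow = 4, byrow = TRUE,
  dimnames = list(status = c("Charged Off", "Current", "Fully Paid", "Late"),
                  grade  = c("A", "B", "C", "D", "E", "F", "G"))
)

# margin = 1 divides each cell by its row total (N / Row Total)
row_props <- prop.table(tab, margin = 1)
print(round(row_props, 3))

# Share of each status in the grand total (the last column of the CrossTable output)
status_share <- rowSums(tab) / sum(tab)
print(round(status_share, 3))
```

Each row of `row_props` sums to 1, so the values can be read as the grade distribution within each loan outcome.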
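What is a box plot, and why use a box plot with a categorical variable? (1 Marks)
ANSWER: A box plot summarizes the distribution of a numeric variable through its median, quartiles, whiskers and outlying points, so its center, spread and skew can be seen at a glance. Drawing one box per level of a categorical variable places these summaries side by side, which makes it easy to compare the numeric variable across categories.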
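The numbers behind a grouped box plot can be inspected directly with `fivenum()`, which returns the minimum, lower hinge, median, upper hinge and maximum. The sketch below uses made-up delay percentages for two hypothetical airlines, "A" and "B"; none of these values come from airline_stats.csv.

```r
# Hypothetical delay percentages for two made-up airlines
delay   <- c(2, 4, 6, 8, 10,   1, 3, 5, 7, 30)
airline <- rep(c("A", "B"), each = 5)

# One five-number summary per airline: min, lower hinge, median, upper hinge, max
summaries <- tapply(delay, airline, fivenum)
print(summaries)
```

Each element of `summaries` corresponds to one box: the hinges give the edges of the box and the median gives the line inside it.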
Compare how the percentage of flight delays (pct_carrier_delay) varies across airlines by using a box plot. (2 Marks) Hint [fill by airline and use the ylim as (0:50)]
fly <- read.csv("airline_stats.csv", header = TRUE)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.2
ggplot(data = fly) +
  geom_boxplot(mapping = aes(x = airline, y = pct_carrier_delay, fill = airline)) +
  ylim(0, 50) +
  labs(title = "Delays according to airline", x = "airline", y = "delay")
## Warning: Removed 38 rows containing non-finite values (stat_boxplot).
Compare how the density of flight delays (pct_carrier_delay) varies across airlines by using a violin plot. (2 Marks)
Hint [fill by airline and use the ylim as (0:50)]
ggplot(data = fly) +
  geom_violin(mapping = aes(x = airline, y = pct_carrier_delay, fill = airline)) +
  ylim(0, 50) +
  labs(title = "Delays according to airline", x = "airline", y = "delay")
## Warning: Removed 38 rows containing non-finite values (stat_ydensity).
Using the provided “mpg” data, please answer the following questions:
Which variables in mpg are categorical? Which variables are continuous? (1 Marks)
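ANSWER: Text-valued variables such as manufacturer and class are categorical. cty and hwy are numeric measurements and are best treated as continuous. cyl is numeric but takes only a few distinct values, so it can reasonably be treated as categorical.
### Question 05-b
Make a scatterplot of cty versus cyl. (1 Marks)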
mpg <- read.csv("mpgta.csv", header = TRUE)
# Scatterplot of cty (y-axis) versus cyl (x-axis), as the question asks
plot(mpg$cyl, mpg$cty, xlab = "cyl", ylab = "cty", main = "cty vs. cyl")
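Find the correlation value between cty and cyl and comment on the value. (1 Marks)
cor(mpg$cty, mpg$cyl)
## [1] -0.8057714
ANSWER: A correlation of about -0.81 indicates a strong negative linear relationship: cars with more cylinders tend to have lower cty values.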
Map the colors of your points to the class variable to reveal the class of each car: (1 Marks)
## write your codes here
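One possible approach, sketched here in base R to match the `plot()` call used earlier, is to convert class to a factor and use its integer codes as colour indices. The data frame `cars` below is a small made-up stand-in for the mpg data, and its values are hypothetical. With ggplot2, the equivalent idea would be mapping `color = class` inside `aes()`.

```r
# Hypothetical stand-in for the mpg data (values invented for illustration)
cars <- data.frame(
  cyl   = c(4, 4, 6, 6, 8, 8),
  cty   = c(21, 24, 18, 16, 13, 11),
  class = c("compact", "compact", "midsize", "suv", "suv", "pickup")
)

# Each class level gets an integer code, which plot() interprets as a palette colour
class_factor <- factor(cars$class)
point_colors <- as.integer(class_factor)

plot(cars$cyl, cars$cty, col = point_colors, pch = 19,
     xlab = "cyl", ylab = "cty", main = "cty vs. cyl by class")
legend("topright", legend = levels(class_factor),
       col = seq_along(levels(class_factor)), pch = 19)
```

With the real data, replacing `cars` with the `mpg` data frame loaded above should give one colour per car class, assuming the file contains a class column as the question implies.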
Obtain the frequencies for manufacturers (1 Marks)
## write your codes here
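A frequency table for a single categorical variable can be obtained with `table()`, as in the loan example earlier. The sketch below uses a short made-up vector of manufacturer names; with the data loaded above, the call would presumably be `table(mpg$manufacturer)`, assuming the file has a manufacturer column as the question implies.

```r
# Hypothetical manufacturer values (not taken from mpgta.csv)
manufacturer <- c("audi", "audi", "ford", "toyota", "toyota", "toyota")

# Count how often each manufacturer appears
freq <- table(manufacturer)
print(freq)

# Optional: order the counts from most to least frequent
sorted_freq <- sort(freq, decreasing = TRUE)
print(sorted_freq)
```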