library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(readr)
library(ggpubr)
## Loading required package: ggplot2
library(ggplot2)
library(Rmisc)
## Loading required package: lattice
## Loading required package: plyr
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## The following object is masked from 'package:ggpubr':
## 
##     mutate
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
Infection_Risk_1_ <- read_csv("Infection_Risk(1).csv")
## Parsed with column specification:
## cols(
##   ID = col_double(),
##   Stay = col_double(),
##   Age = col_double(),
##   InfctRsk = col_double(),
##   Cultures = col_double(),
##   Xrays = col_double(),
##   Beds = col_double(),
##   MedSchl = col_double(),
##   Region = col_double(),
##   Census = col_double(),
##   Nurses = col_double(),
##   Services = col_double()
## )
View(Infection_Risk_1_)

Question 1

mean(Infection_Risk_1_$InfctRsk)
## [1] 4.354867
sd(Infection_Risk_1_$InfctRsk)
## [1] 1.340908
min(Infection_Risk_1_$InfctRsk)
## [1] 1.3
max(Infection_Risk_1_$InfctRsk)
## [1] 7.8
mean(Infection_Risk_1_$Nurses)
## [1] 173.2478
sd(Infection_Risk_1_$Nurses)
## [1] 139.2654
min(Infection_Risk_1_$Nurses)
## [1] 14
max(Infection_Risk_1_$Nurses)
## [1] 656
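The eight calls above can be collapsed into one table. The sketch below uses a hypothetical helper, `describe_cols()`, built only on base R (`sapply()` and `t()`). It is shown on made-up toy values rather than the real data.

```r
# Hypothetical helper (not from any package): mean, SD, min and max
# for each named column of a data frame, returned as one row per variable.
describe_cols <- function(df, cols) {
  stats <- sapply(df[cols], function(x) {
    c(mean = mean(x, na.rm = TRUE),
      sd   = sd(x, na.rm = TRUE),
      min  = min(x, na.rm = TRUE),
      max  = max(x, na.rm = TRUE))
  })
  t(stats)  # sapply() gives one column per variable; transpose to one row each
}

# Toy values only, standing in for Infection_Risk_1_
toy <- data.frame(InfctRsk = c(1, 2, 3), Nurses = c(10, 20, 60))
res <- describe_cols(toy, c("InfctRsk", "Nurses"))
print(res)
```

Called as `describe_cols(Infection_Risk_1_, c("InfctRsk", "Nurses"))`, it should reproduce the Question 1 values in a single table.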

Question 2

The box plots for InfctRsk and Nurses show the median, the IQR (the box), and the whiskers, which extend to the most extreme observations within 1.5 × IQR of the box; points beyond the whiskers are drawn individually as potential outliers. For Nurses, several outliers lie well above the upper whisker, and the median and IQR sit toward the lower end of the range. For InfctRsk, the median and IQR sit at or near the center of the distribution.

boxplot(Infection_Risk_1_$InfctRsk)

boxplot(Infection_Risk_1_$Nurses)
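To see the numbers behind each box, base R's `boxplot.stats()` (in grDevices) returns the values `boxplot()` draws. The wrapper `describe_box()` below is a hypothetical name, and the input vector is made up. Note that the box edges are Tukey hinges from `fivenum()`, which are close to, but not always identical to, the quartiles from `quantile()`.

```r
# Hypothetical wrapper around grDevices::boxplot.stats(), which is what
# boxplot() uses internally to decide what to draw.
describe_box <- function(x) {
  bs <- grDevices::boxplot.stats(x)  # default coef = 1.5 (whisker rule)
  list(
    whiskers = bs$stats[c(1, 5)],  # most extreme points within 1.5 * IQR of the box
    hinges   = bs$stats[c(2, 4)],  # lower and upper edges of the box
    median   = bs$stats[3],
    outliers = bs$out              # points drawn individually beyond the whiskers
  )
}

res <- describe_box(c(1, 2, 3, 4, 5, 100))
str(res)
```

For the real data, `describe_box(Infection_Risk_1_$Nurses)$outliers` would list the Nurses values plotted as outliers.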

Question 3

The density plots for InfctRsk and Nurses give a visual check of normality, although a plot alone cannot confirm it. InfctRsk looks approximately normal, with a roughly symmetric bell shape centered in the middle of its range. Nurses is positively (right) skewed: most of the density is concentrated at the low end, with a long tail extending to the right.

ggdensity(Infection_Risk_1_$InfctRsk)

ggdensity(Infection_Risk_1_$Nurses)
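A density plot is only a visual check. A sketch of a more formal companion, assuming base R's `shapiro.test()` (the Shapiro-Wilk test) together with a simple moment-based skewness estimate; `check_normality()` is a hypothetical helper and the example vectors are made up.

```r
# Hypothetical helper: Shapiro-Wilk test plus sample skewness.
# A small p-value is evidence against normality; skewness > 0 means a
# longer right tail (positive skew), as described for Nurses.
check_normality <- function(x) {
  x <- x[!is.na(x)]
  sw <- shapiro.test(x)  # requires between 3 and 5000 non-missing values
  skew <- mean((x - mean(x))^3) / sd(x)^3  # moment-based sample skewness
  c(W = unname(sw$statistic), p.value = sw$p.value, skewness = skew)
}

symmetric  <- check_normality(c(1, 2, 3, 4, 5))
right_skew <- check_normality(c(1, 1, 1, 2, 2, 3, 10))
print(rbind(symmetric, right_skew))
```

On the real data this would be `check_normality(Infection_Risk_1_$Nurses)`, and `qqnorm()` with `qqline()` from the stats package give the matching Q-Q plot. With larger samples the test can flag departures from normality that are too small to matter in practice, so it is best read alongside the plots.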

Question 4

# Mean infection risk where Age is greater than 60
Infection_Risk_1_ %>%
filter(Age > 60) %>%
summarize(mean=mean(InfctRsk, na.rm=TRUE))
##       mean
## 1 4.766667
# Mean infection risk where Age is less than 50
Infection_Risk_1_ %>%
filter(Age < 50) %>%
summarize(mean=mean(InfctRsk, na.rm=TRUE))
##       mean
## 1 4.428571
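Both group means can also be computed in one step, which makes the cut-offs explicit. A base-R sketch with a hypothetical helper, `age_group_means()`. It keeps the strict `>` and `<` comparisons used in the code, so ages of exactly 60 or 50 are excluded. Switching to `>=` and `<=` would match "60 or older" and "50 or younger" literally.

```r
# Hypothetical helper: mean infection risk for an "older" and a "younger"
# group defined by strict age cut-offs; rows in between are dropped.
age_group_means <- function(df, older_than = 60, younger_than = 50) {
  grp <- ifelse(df$Age > older_than, "older",
                ifelse(df$Age < younger_than, "younger", NA))
  keep <- !is.na(grp)
  tapply(df$InfctRsk[keep], grp[keep], mean, na.rm = TRUE)
}

# Toy data (made up): the row with Age 55 falls in neither group
toy <- data.frame(Age = c(45, 48, 55, 62, 70),
                  InfctRsk = c(2, 4, 9, 5, 7))
res <- age_group_means(toy)
print(res)
```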

Question 5

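# Average infection risk by region.
# Rmisc attaches plyr after dplyr, so a bare summarize() resolves to
# plyr::summarize(), which ignores group_by() and returns a single overall
# mean (4.354867, the Question 1 value) instead of one row per region.
# Calling dplyr::summarize() explicitly keeps the grouping.
Infection_Risk_1_ %>%
group_by(Region) %>%
dplyr::summarize(mean = mean(InfctRsk, na.rm = TRUE))
# Average number of nurses by region
Infection_Risk_1_ %>%
group_by(Region) %>%
dplyr::summarize(mean = mean(Nurses, na.rm = TRUE))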
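As a cross-check that does not depend on which `summarize()` is attached, base R's `stats::aggregate()` computes per-group means directly, so it is unaffected by the plyr/dplyr masking reported when the packages were loaded. `region_means()` is a hypothetical helper and the toy data are made up.

```r
# Hypothetical helper: per-region means of InfctRsk and Nurses using
# stats::aggregate(), which does not rely on dplyr or plyr.
region_means <- function(df) {
  aggregate(cbind(InfctRsk, Nurses) ~ Region, data = df, FUN = mean)
}

toy <- data.frame(Region   = c(1, 1, 2, 2),
                  InfctRsk = c(2, 4, 5, 7),
                  Nurses   = c(10, 30, 50, 70))
res <- region_means(toy)
print(res)
```

On the real data, `region_means(Infection_Risk_1_)` should return one row per Region code.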

Question 6

CI(Infection_Risk_1_$InfctRsk, ci=0.95)
##    upper     mean    lower 
## 4.604801 4.354867 4.104933
CI(Infection_Risk_1_$Nurses, ci=0.95)
##    upper     mean    lower 
## 199.2057 173.2478 147.2899
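The intervals above have the familiar t-based form: mean ± t(0.975, n − 1) × s/√n. A sketch that computes this by hand, assuming that is the formula `CI()` uses, and checks it against base R's `t.test()`; `t_ci()` is a hypothetical name and the input vector is made up.

```r
# Hypothetical helper: two-sided t-based confidence interval for a mean,
# returned in the same upper/mean/lower order that CI() prints.
t_ci <- function(x, ci = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  # critical value t_{1 - alpha/2, n - 1} times the standard error s / sqrt(n)
  half_width <- qt(ci + (1 - ci) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(upper = m + half_width, mean = m, lower = m - half_width)
}

x <- c(2, 4, 4, 5, 7)
res <- t_ci(x)
tt <- t.test(x, conf.level = 0.95)$conf.int
print(res)
```

If the assumption about `CI()` holds, `t_ci(Infection_Risk_1_$InfctRsk)` should give the same bounds as the output above.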

Question 7

The CIs give lower and upper bounds for the population mean, estimated from our sample. Using the 95% CIs, we can say we are 95% confident that the true mean infection risk of the population lies between 4.10 and 4.60, and that the true mean number of nurses lies between about 147 and 199. "95% confident" describes the procedure rather than this one interval: if we repeated the sampling many times, about 95% of the intervals built this way would contain the true mean.
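The "repeated sampling" meaning of 95% confidence can be checked by simulation: draw many samples from a population whose mean is known, build a 95% t-interval from each, and count how often the interval contains the true mean. The population values below (mean 4, SD 1.3, samples of 30) are made-up assumptions chosen only to be in the same range as InfctRsk, and `ci_coverage()` is a hypothetical helper.

```r
# Hypothetical helper: fraction of simulated t-intervals that contain the
# true mean mu when sampling from a normal population.
ci_coverage <- function(n_reps = 2000, n = 30, mu = 4, sigma = 1.3,
                        level = 0.95) {
  covered <- replicate(n_reps, {
    x <- rnorm(n, mean = mu, sd = sigma)
    ci <- t.test(x, conf.level = level)$conf.int
    ci[1] <= mu && mu <= ci[2]
  })
  mean(covered)  # proportion of intervals that captured mu
}

set.seed(42)
coverage <- ci_coverage()
print(coverage)
```

The printed proportion should land close to 0.95; any single interval either contains the true mean or it does not.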
