This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
“2.2.7 For each of 31 healthy dogs, a veterinarian measured the glucose concentration in the anterior chamber of the right eye and also in the blood serum. The following data are the anterior chamber glucose measurements, expressed as a percentage of the blood glucose. 81 85 93 93 99 76 75 84 78 84 81 82 89 81 96 82 74 70 84 86 80 70 131 75 88 102 115 89 82 79 106 Construct a frequency distribution and display it as a table and as a histogram.”
#data: anterior chamber glucose, as a percentage of blood glucose, for 31 dogs
concentration <- c(81, 85, 93, 93, 99, 76, 75, 84, 78, 84, 81, 82, 89, 81, 96, 82, 74, 70, 84, 86, 80, 70, 131, 75, 88, 102, 115, 89, 82, 79, 106)
#data set: the same values stored in a data frame
glucose_data <- data.frame(concentration = concentration)
glucose_data
## concentration
## 1 81
## 2 85
## 3 93
## 4 93
## 5 99
## 6 76
## 7 75
## 8 84
## 9 78
## 10 84
## 11 81
## 12 82
## 13 89
## 14 81
## 15 96
## 16 82
## 17 74
## 18 70
## 19 84
## 20 86
## 21 80
## 22 70
## 23 131
## 24 75
## 25 88
## 26 102
## 27 115
## 28 89
## 29 82
## 30 79
## 31 106
glucose <- glucose_data
#data table delineating the frequency of each concentration in the data set
concentration_frequency_table <- table(glucose)
concentration_frequency_table
## glucose
## 70 74 75 76 78 79 80 81 82 84 85 86 88 89 93 96 99 102
## 2 1 2 1 1 1 1 3 3 3 1 1 1 2 2 1 1 1
## 106 115 131
## 1 1 1
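The table above counts every distinct value, which is a frequency distribution of individual values. Grouping the values into class intervals usually gives a more readable table. The sketch below assumes classes of width 10 starting at 70 and closed on the left, e.g. [70,80); that class width is an illustrative choice, not one specified in the exercise, and `grouped_table` is a hypothetical name.

```r
# Glucose data from Exercise 2.2.7 (percent of blood glucose)
concentration <- c(81, 85, 93, 93, 99, 76, 75, 84, 78, 84, 81, 82, 89, 81, 96, 82,
                   74, 70, 84, 86, 80, 70, 131, 75, 88, 102, 115, 89, 82, 79, 106)

# Assumed class intervals: [70,80), [80,90), ..., [130,140)
class_breaks <- seq(70, 140, by = 10)
classes <- cut(concentration, breaks = class_breaks, right = FALSE)

# Frequency and relative frequency of each class
grouped_frequency <- table(classes)
grouped_table <- data.frame(
  class = names(grouped_frequency),
  frequency = as.vector(grouped_frequency),
  relative_frequency = round(as.vector(prop.table(grouped_frequency)), 3)
)
grouped_table
```

Empty classes (here [120,130)) are kept in the table, which makes the gap before the value 131 visible.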
#histogram demonstrating glucose concentration frequency
hist(concentration, main = "Glucose Concentrations of 31 Healthy Dogs", xlab= "Concentration", col = "red", breaks = 5)
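`hist()` treats a single number such as `breaks = 5` only as a suggestion and picks "pretty" cut points itself. A sketch that instead passes explicit breaks, assuming the same width-10 classes as a grouped table would use, so the bars line up with fixed class intervals:

```r
concentration <- c(81, 85, 93, 93, 99, 76, 75, 84, 78, 84, 81, 82, 89, 81, 96, 82,
                   74, 70, 84, 86, 80, 70, 131, 75, 88, 102, 115, 89, 82, 79, 106)

# Assumed class width of 10; right = FALSE makes each bin closed on the left
glucose_hist <- hist(concentration, breaks = seq(70, 140, by = 10), right = FALSE,
                     main = "Glucose Concentrations of 31 Healthy Dogs",
                     xlab = "Concentration", col = "red")

# Bar heights (frequencies) for each bin
glucose_hist$counts
```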
#Histogram with normal curve (BONUS)
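mean_glucose <- mean(concentration)
std_glucose <- sd(concentration)  # sample SD, equivalent to sqrt(var(concentration))
#freq = FALSE plots densities so the histogram shares a scale with the normal curve
hist(concentration, main = "Glucose Concentrations of 31 Healthy Dogs", xlab = "Concentration", col = "red", density = 20, breaks = 5, freq = FALSE)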
curve(dnorm(x, mean=mean_glucose, sd=std_glucose), add=TRUE)
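One way to judge how well the overlaid normal curve fits is to compare the data against the normal rule that about 68% of observations lie within 1 SD of the mean. A minimal sketch (the variable names are illustrative):

```r
concentration <- c(81, 85, 93, 93, 99, 76, 75, 84, 78, 84, 81, 82, 89, 81, 96, 82,
                   74, 70, 84, 86, 80, 70, 131, 75, 88, 102, 115, 89, 82, 79, 106)

glucose_mean <- mean(concentration)
glucose_sd <- sd(concentration)

# TRUE for each dog whose value lies within one SD of the mean
within_one_sd <- abs(concentration - glucose_mean) <= glucose_sd

sum(within_one_sd)   # number of dogs within 1 SD
mean(within_one_sd)  # proportion, to compare with roughly 0.68 for a normal curve
```

For these data 25 of the 31 values (about 81%) fall within 1 SD, more than the normal rule predicts, which is consistent with a peaked center and a long right tail (70, 70, 102, 106, 115, and 131 lie outside).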
“2.4.2 Here are the 18 measurements of Mao activity reported in Exercise 2.2.2: 6.8 8.4 8.7 11.9 14.2 18.8 9.9 4.1 9.7 12.7 5.2 7.8 7.8 7.4 7.3 10.6 14.5 10.7 (a) Determine the median and the quartiles. (b) Determine the interquartile range. (c) How large would an observation in this data set have to be in order to be an outlier? (d) Construct a (modified) boxplot of the data”
#data: MAO activity measurements (n = 18)
activity <- c(6.8, 8.4, 8.7, 11.9, 14.2, 18.8, 9.9, 4.1, 9.7, 12.7, 5.2, 7.8, 7.8, 7.4, 7.3, 10.6, 14.5, 10.7)
#data set: the same measurements stored in a data frame
MAO_data <- data.frame(activity = activity)
results <- MAO_data
#Re-ordering the measurements in ascending order
results_ascending_order <- sort(results$activity)
#(a, I) Determining the median --> Median = 9.2
median.result <- median(activity, na.rm = FALSE)
print(median.result)
## [1] 9.2
#(a, II) Determining the quartiles --> Q1 = 7.5, Q3 = 11.6 (R's default quantile method, type 7)
quantile(activity)
## 0% 25% 50% 75% 100%
## 4.1 7.5 9.2 11.6 18.8
#(b) Determining the interquartile range --> IQR = 4.1
IQR(activity)
## [1] 4.1
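Quartiles are not computed the same way everywhere. R's `quantile()` and `IQR()` interpolate between order statistics (type 7) by default, whereas Tukey's hinges, returned by `fivenum()` and used by `boxplot()`, are here the medians of the lower and upper nine values. A sketch comparing the two; which convention the course expects is an assumption worth checking against the textbook.

```r
activity <- c(6.8, 8.4, 8.7, 11.9, 14.2, 18.8, 9.9, 4.1, 9.7, 12.7,
              5.2, 7.8, 7.8, 7.4, 7.3, 10.6, 14.5, 10.7)

# R default (type 7): linear interpolation between order statistics
type7 <- quantile(activity, c(0.25, 0.75))
type7                         # 7.5 and 11.6

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
tukey <- fivenum(activity)
tukey                         # 4.1 7.4 9.2 11.9 18.8

# Spread between the hinges and the corresponding upper outlier fence
hinge_spread <- tukey[4] - tukey[2]                 # 4.5
upper_fence_hinges <- tukey[4] + 1.5 * hinge_spread # 18.65
```

Under the hinge convention the IQR is 4.5 and the upper fence is 18.65 rather than 17.75; 18.8 exceeds both.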
#(c) How large would an observation in this data set have to be in order to be an outlier?
“Outliers are observations that lie unusually far from the bulk of the data. Under the 1.5*IQR rule, an observation is an outlier if it lies more than 1.5*IQR above the 3rd quartile (a large outlier) or more than 1.5*IQR below the 1st quartile (a small outlier).
In this data set, Q3 + (1.5*IQR) = 11.6 + (1.5*4.1) = 11.6 + 6.15 = 17.75, so any observation greater than 17.75 would be considered an outlier. (On the low side, Q1 - (1.5*IQR) = 7.5 - 6.15 = 1.35.)”
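The fence arithmetic above can be reproduced in R, using the same default (type 7) quartiles as part (a). A minimal sketch; `lower_fence` and `upper_fence` are illustrative names:

```r
activity <- c(6.8, 8.4, 8.7, 11.9, 14.2, 18.8, 9.9, 4.1, 9.7, 12.7,
              5.2, 7.8, 7.8, 7.4, 7.3, 10.6, 14.5, 10.7)

q1 <- unname(quantile(activity, 0.25))   # 7.5
q3 <- unname(quantile(activity, 0.75))   # 11.6
iqr <- IQR(activity)                     # 4.1

lower_fence <- q1 - 1.5 * iqr   # 1.35
upper_fence <- q3 + 1.5 * iqr   # 17.75

# Observations strictly beyond either fence
activity[activity < lower_fence | activity > upper_fence]
```

Only 18.8 lies beyond the fences, so it is the one outlier in this data set.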
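#(d) Constructing a modified boxplot from the data (by default, boxplot() extends the whiskers only to the most extreme points within 1.5*IQR of the box and plots points beyond that individually)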
boxplot(activity, main = "MAO Activity: 18 Measurements", ylab = "MAO Activity", col = "purple")
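To see exactly what the modified boxplot draws, `boxplot.stats()` returns the numbers behind the plot: the whisker ends, the hinges, the median, and any points plotted individually. A short sketch:

```r
activity <- c(6.8, 8.4, 8.7, 11.9, 14.2, 18.8, 9.9, 4.1, 9.7, 12.7,
              5.2, 7.8, 7.8, 7.4, 7.3, 10.6, 14.5, 10.7)

box <- boxplot.stats(activity)   # coef = 1.5 by default, as in boxplot()
box$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
box$out     # observations drawn as separate points
```

The upper whisker stops at 14.5, the largest value inside the fence, and 18.8 is plotted on its own as an outlier.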
“2.6.7 Dopamine is a chemical that plays a role in the transmission of signals in the brain. A pharmacologist measured the amount of dopamine in the brain of each of seven rats. The dopamine levels (nmoles/g) were as follows: 6.8 5.3 6.0 5.9 6.8 7.4 6.2 (a) Calculate the mean and standard deviation. (b) Determine the median and the interquartile range. (d) Replace the observation 7.4 by 10.4 and repeat parts (a) and (b). Which of the descriptive measures display resistance and which do not?”
#data: dopamine levels (nmoles/g) in the brains of seven rats
dopamine_concentration <- c(6.8, 5.3, 6.0, 5.9, 6.8, 7.4, 6.2)
#data set: the same values stored in a data frame
dopamine_data <- data.frame(Dopamine_Concentrations = dopamine_concentration)
dopamine_data
## Dopamine_Concentrations
## 1 6.8
## 2 5.3
## 3 6.0
## 4 5.9
## 5 6.8
## 6 7.4
## 7 6.2
dopamine <- dopamine_data
#(a, I) Calculating the mean --> Mean = 6.342857
mean(dopamine_concentration)
## [1] 6.342857
#(a, II) Calculating the Standard Deviation --> Standard Deviation = 0.7020379
sd(dopamine_concentration)
## [1] 0.7020379
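`sd()` computes the sample standard deviation, s = sqrt(sum((y - mean(y))^2) / (n - 1)). Working the formula step by step confirms the value above. A minimal sketch; `sd_by_hand` is an illustrative name:

```r
dopamine_concentration <- c(6.8, 5.3, 6.0, 5.9, 6.8, 7.4, 6.2)

n <- length(dopamine_concentration)
deviations <- dopamine_concentration - mean(dopamine_concentration)

# Sum of squared deviations, divided by n - 1, then square-rooted
sd_by_hand <- sqrt(sum(deviations^2) / (n - 1))
sd_by_hand

# Should agree with the built-in function
all.equal(sd_by_hand, sd(dopamine_concentration))
```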
#(b, I) Determining the median --> Median = 6.2
median(dopamine_concentration)
## [1] 6.2
#(b, II) Determining the IQR --> IQR = 0.85
IQR(dopamine_concentration)
## [1] 0.85
#(d) Replacing the 7.4 value with 10.4
modified_dopamine_concentration = c(6.8, 5.3, 6.0, 5.9, 6.8, 10.4, 6.2)
#(d, I) Calculating the mean --> Mean = 6.771429
mean(modified_dopamine_concentration)
## [1] 6.771429
#(d, II) Calculating the Standard Deviation --> Standard Deviation = 1.683958
sd(modified_dopamine_concentration)
## [1] 1.683958
#(d, III) Determining the median --> Median = 6.2
median(modified_dopamine_concentration)
## [1] 6.2
#(d, IV) Determining the IQR --> IQR = 0.85
IQR(modified_dopamine_concentration)
## [1] 0.85
“The median and the IQR display resistance: both are unchanged from part (b) despite the modified data point (median = 6.2, IQR = 0.85). The mean and the SD do not: the mean rises from 6.342857 to 6.771429, and the SD more than doubles, from 0.7020379 to 1.683958.”
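Putting the four measures side by side makes the resistance comparison easier to read. A sketch, assuming a hypothetical helper `summary_stats()` that bundles the functions used in parts (a) and (b):

```r
# Original and modified dopamine levels (nmoles/g) from Exercise 2.6.7
original <- c(6.8, 5.3, 6.0, 5.9, 6.8, 7.4, 6.2)
modified <- replace(original, original == 7.4, 10.4)

# Hypothetical helper: the four descriptive measures from parts (a) and (b)
summary_stats <- function(x) {
  c(mean = mean(x), sd = sd(x), median = median(x), IQR = IQR(x))
}

comparison <- rbind(original = summary_stats(original),
                    modified = summary_stats(modified))
round(comparison, 6)

# Change in each measure caused by replacing 7.4 with 10.4
change <- comparison["modified", ] - comparison["original", ]
round(change, 6)
```

The `change` row is zero for the median and IQR and positive for the mean and SD, which is the resistance pattern described above.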
Match the Histograms (provided in the homework packet) to the respective Mean, Median, SD, and Sample Size descriptions.
For “A” we essentially see a normal (bell-shaped) distribution, in which most observations sit close to the mean (for a normal distribution, about 68% of the data lie within 1 SD of the mean). Because the data in “A” are concentrated near the center, the SD in this example will be the smallest of all the histograms. It is for this reason that the SECOND statistical cluster is the right one for this histogram.
For “B” the observations are spread roughly evenly across the x-axis (close to a uniform distribution). Since the SD measures how spread out the data are around the mean, this histogram, which shows the greatest spread of the three, will have the largest SD. It is for this reason that the FOURTH statistical cluster is the right one for this histogram.
For “C”, a right-skewed histogram, the long right tail typically pulls the mean to the right of (above) the median. Assuming this holds here, the matching cluster should report a mean larger than its median and, if the three histograms are centered in a similar place, a larger mean than either “A” or “B.” It is for this reason that the FIRST statistical cluster is the right one for this histogram.
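The claim that a right tail pulls the mean above the median can be checked on simulated data. The sketch below uses a hypothetical exponential sample with an arbitrary seed; it is not the data behind histogram “C”.

```r
set.seed(1)  # arbitrary seed, only so the simulation is reproducible

# Hypothetical right-skewed sample (exponential, rate 1); not the packet's data
skewed <- rexp(1000, rate = 1)
c(mean = mean(skewed), median = median(skewed))

hist(skewed, main = "Simulated right-skewed data", xlab = "Value", col = "gray")
abline(v = mean(skewed), lty = 1)    # solid line at the mean
abline(v = median(skewed), lty = 2)  # dashed line at the median
```

For an exponential distribution with rate 1 the population mean is 1 and the median is log(2), about 0.69, so the sample mean should land clearly to the right of the sample median.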