The dataset used in this homework is Pharmacokinetics of Theophylline, with n = 132 observations and 5 variables. The dataset was retrieved from http://vincentarelbundock.github.io/Rdatasets/ on 24th December, 2018 and was then saved to my GitHub.
Use the summary function to gain an overview of the dataset. Then display the mean and median for at least two attributes.
# Retrieving the dataset from my Github
theURL <- "https://raw.githubusercontent.com/greeneyefirefly/sps-msds-theoph/master/dataset.csv"
Theoph <- read.csv(file = theURL, header = TRUE, sep = ",")
head(Theoph)
## X Subject Wt Dose Time conc
## 1 1 1 79.6 4.02 0.00 0.74
## 2 2 1 79.6 4.02 0.25 2.84
## 3 3 1 79.6 4.02 0.57 6.57
## 4 4 1 79.6 4.02 1.12 10.50
## 5 5 1 79.6 4.02 2.02 9.66
## 6 6 1 79.6 4.02 3.82 8.58
summary(Theoph)
## X Subject Wt Dose
## Min. : 1.00 Min. : 1.00 Min. :54.60 Min. :3.100
## 1st Qu.: 33.75 1st Qu.: 3.75 1st Qu.:63.58 1st Qu.:4.305
## Median : 66.50 Median : 6.50 Median :70.50 Median :4.530
## Mean : 66.50 Mean : 6.50 Mean :69.58 Mean :4.626
## 3rd Qu.: 99.25 3rd Qu.: 9.25 3rd Qu.:74.42 3rd Qu.:5.037
## Max. :132.00 Max. :12.00 Max. :86.40 Max. :5.860
## Time conc
## Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.595 1st Qu.: 2.877
## Median : 3.530 Median : 5.275
## Mean : 5.895 Mean : 4.960
## 3rd Qu.: 9.000 3rd Qu.: 7.140
## Max. :24.650 Max. :11.400
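The dataset contains 5 variables (attributes): Subject, Wt, Dose, Time, and conc. The additional X column is the row number stored in the CSV file.
The attributes selected were dosage (‘Dose’) and concentration (‘conc’). The R functions mean(x) and median(x) were used.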
# The means are:
mean(Theoph$Dose)
## [1] 4.625833
mean(Theoph$conc)
## [1] 4.960455
# The medians are:
median(Theoph$Dose)
## [1] 4.53
median(Theoph$conc)
## [1] 5.275
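The same four numbers can also be obtained in a single call. The sketch below is an alternative, not the approach used above; it assumes that the copy of Theoph shipped in R's datasets package holds the same values as the CSV, so it runs without a network connection.

```r
# Assumption: the built-in Theoph data set matches the CSV read above.
theoph <- as.data.frame(datasets::Theoph)

# Apply mean and median to each selected column; the result is a
# 2 x 2 matrix with the statistics as rows and the attributes as columns.
stats <- sapply(theoph[, c("Dose", "conc")],
                function(x) c(mean = mean(x), median = median(x)))
print(round(stats, 3))
```

Selecting the columns by name rather than by position keeps the call valid even if the column order differs between the CSV and the built-in copy.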
Create a new data frame with a subset of the columns and rows. Create new names for the data frame.
newTheoph<-Theoph[51:132,4:6]
colnames(newTheoph) <- c("newDose", "newTime", "newConc")
rownames(newTheoph) <- 1:nrow(newTheoph)
head(newTheoph)
## newDose newTime newConc
## 1 5.86 5.02 7.56
## 2 5.86 7.02 7.09
## 3 5.86 9.10 5.90
## 4 5.86 12.00 4.37
## 5 5.86 24.35 1.57
## 6 4.00 0.00 0.00
summary(newTheoph)
## newDose newTime newConc
## Min. :3.10 Min. : 0.0000 Min. : 0.000
## 1st Qu.:4.00 1st Qu.: 0.8225 1st Qu.: 2.808
## Median :4.92 Median : 3.5850 Median : 4.900
## Mean :4.69 Mean : 6.2246 Mean : 4.677
## 3rd Qu.:5.30 3rd Qu.: 9.0300 3rd Qu.: 6.643
## Max. :5.86 Max. :24.4300 Max. :10.210
# The new means are:
apply(newTheoph[,c(1,3)],2,mean)
## newDose newConc
## 4.690244 4.676585
# The new medians are:
apply(newTheoph[,c(1,3)],2,median)
## newDose newConc
## 4.92 4.90
# Absolute difference between the full-dataset means and the subset means:
abs(apply(Theoph[,c(4,6)],2,mean)-apply(newTheoph[,c(1,3)],2,mean))
## Dose conc
## 0.06441057 0.28386918
# Absolute difference between the full-dataset medians and the subset medians:
abs(apply(Theoph[,c(4,6)],2,median)-apply(newTheoph[,c(1,3)],2,median))
## Dose conc
## 0.390 0.375
The mean of the Dose attribute changes by 0.0644 mg/kg, from 4.626 mg/kg in the full dataset to 4.690 mg/kg in the subset. The median changes by 0.390 mg/kg, from 4.530 mg/kg before to 4.920 mg/kg with the new data set.
The mean of the concentration attribute (conc) changes by 0.284 mg/L, from 4.960 mg/L to 4.677 mg/L. The median changes by 0.375 mg/L, from 5.275 mg/L with the full dataset to 4.900 mg/L with the subset data.
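The means, medians, and their differences can be laid out side by side so that the full and subset values are easy to compare. This is a minimal sketch, assuming the built-in datasets::Theoph has the same rows in the same order as the CSV; describe is a hypothetical helper, and the columns are selected by name rather than by position.

```r
# Assumption: built-in Theoph has the same rows, in the same order, as the CSV.
full_df <- as.data.frame(datasets::Theoph)[, c("Dose", "conc")]
sub_df  <- full_df[51:132, ]

# Hypothetical helper: mean and median of every column of a data frame.
describe <- function(df) {
  sapply(df, function(x) c(mean = mean(x), median = median(x)))
}

# as.vector() flattens column by column: Dose mean, Dose median, conc mean, conc median.
comparison <- rbind(full   = as.vector(describe(full_df)),
                    subset = as.vector(describe(sub_df)))
colnames(comparison) <- c("Dose.mean", "Dose.median", "conc.mean", "conc.median")

# Add the absolute differences as a third row.
comparison <- rbind(comparison,
                    abs.diff = abs(comparison["full", ] - comparison["subset", ]))
print(round(comparison, 3))
```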
For at least 3 values in a column, please rename them so that every value in that column is renamed.
Because my dataset contains numeric measurements rather than labels, I instead rounded any newTime value that lies within 0.10 hr of a whole number to that whole number. For example, newTime = 3.10 hr becomes 3.00 hr, newTime = 4.52 hr stays the same, and newTime = 5.98 hr becomes 6.00 hr.
Below are a few rows of the dataset before the change. Many newTime values lie within 0.10 hr of a whole number, so we want to rename them.
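The rule can be tried on the three example values before it is applied to the data. This is a minimal sketch; round_near_whole and its tol argument are hypothetical names, and the small 1e-8 allowance is an assumption added so that floating-point error does not exclude a value such as 3.10, whose stored distance from 3 is very slightly more than 0.10.

```r
# Hypothetical helper: snap values within `tol` of a whole number to it.
round_near_whole <- function(x, tol = 0.10) {
  nearest <- round(x)                       # nearest whole number
  near <- abs(x - nearest) <= tol + 1e-8    # allowance for floating-point error
  x[near] <- nearest[near]                  # values outside the band are left alone
  x
}

round_near_whole(c(3.10, 4.52, 5.98))
```

The call returns 3.00, 4.52, and 6.00, matching the three cases described in the text.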
newTheoph[51:67,]
## newDose newTime newConc
## 51 5.50 0.37 2.89
## 52 5.50 0.77 5.22
## 53 5.50 1.02 6.41
## 54 5.50 2.05 7.83
## 55 5.50 3.55 10.21
## 56 5.50 5.05 9.18
## 57 5.50 7.08 8.02
## 58 5.50 9.38 7.14
## 59 5.50 12.10 5.68
## 60 5.50 23.70 2.42
## 61 4.92 0.00 0.00
## 62 4.92 0.25 4.86
## 63 4.92 0.50 7.24
## 64 4.92 0.98 8.00
## 65 4.92 1.98 6.81
## 66 4.92 3.60 5.87
## 67 4.92 5.02 5.22
Carrying out the renaming
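# Round every time to the nearest whole hour; digits should be an integer (the original 0.1 behaved as 0)
roundall <- round(newTheoph$newTime, 0)
difference <- abs(roundall - newTheoph$newTime)
# Use 0.101 rather than 0.10 so that floating-point error does not exclude values such as 3.10
near <- difference < 0.101
newTheoph$newTime[near] <- roundall[near]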
Afterwards, the newTime values that met the criterion have been renamed to their nearest whole number.
newTheoph[51:67,]
## newDose newTime newConc
## 51 5.50 0.37 2.89
## 52 5.50 0.77 5.22
## 53 5.50 1.00 6.41
## 54 5.50 2.00 7.83
## 55 5.50 3.55 10.21
## 56 5.50 5.00 9.18
## 57 5.50 7.00 8.02
## 58 5.50 9.38 7.14
## 59 5.50 12.00 5.68
## 60 5.50 23.70 2.42
## 61 4.92 0.00 0.00
## 62 4.92 0.25 4.86
## 63 4.92 0.50 7.24
## 64 4.92 1.00 8.00
## 65 4.92 2.00 6.81
## 66 4.92 3.60 5.87
## 67 4.92 5.00 5.22
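A quick check can confirm that the renaming did what the rule says. The sketch below rebuilds the time column of the subset from the built-in datasets::Theoph (assumed to match the CSV, with rows in the same order), applies the same rule, and counts the changed values; the variable names are illustrative only.

```r
# Assumption: built-in Theoph matches the CSV, with rows in the same order.
before <- as.data.frame(datasets::Theoph)[51:132, "Time"]

# Same rule as above: snap to the nearest whole hour when within about 0.10 hr.
nearest <- round(before)
near    <- abs(nearest - before) < 0.101
after   <- ifelse(near, nearest, before)

sum(after != before)                      # number of values actually changed
all(after[near] == round(after[near]))    # TRUE if every snapped value is whole
```

Values that were already whole, such as 0.00, fall inside the band but are not counted as changed.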