CUNY SPS MSDS Bridge Winter 2018 - R Programming HW #2

The dataset used in this homework is Pharmacokinetics of Theophylline, with n = 132 observations of 5 variables. It was retrieved from http://vincentarelbundock.github.io/Rdatasets/ on 24th December, 2018 and then saved to my GitHub.

Question 1

Use the summary function to gain an overview of the dataset. Then display the mean and median for at least two attributes.

# Retrieving the dataset from my Github
theURL <- "https://raw.githubusercontent.com/greeneyefirefly/sps-msds-theoph/master/dataset.csv"
Theoph <- read.csv(file = theURL, header = TRUE, sep = ",")
head(Theoph)

##   X Subject   Wt Dose Time  conc
## 1 1       1 79.6 4.02 0.00  0.74
## 2 2       1 79.6 4.02 0.25  2.84
## 3 3       1 79.6 4.02 0.57  6.57
## 4 4       1 79.6 4.02 1.12 10.50
## 5 5       1 79.6 4.02 2.02  9.66
## 6 6       1 79.6 4.02 3.82  8.58

summary(Theoph)

##        X             Subject            Wt             Dose      
##  Min.   :  1.00   Min.   : 1.00   Min.   :54.60   Min.   :3.100  
##  1st Qu.: 33.75   1st Qu.: 3.75   1st Qu.:63.58   1st Qu.:4.305  
##  Median : 66.50   Median : 6.50   Median :70.50   Median :4.530  
##  Mean   : 66.50   Mean   : 6.50   Mean   :69.58   Mean   :4.626  
##  3rd Qu.: 99.25   3rd Qu.: 9.25   3rd Qu.:74.42   3rd Qu.:5.037  
##  Max.   :132.00   Max.   :12.00   Max.   :86.40   Max.   :5.860  
##       Time             conc       
##  Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 0.595   1st Qu.: 2.877  
##  Median : 3.530   Median : 5.275  
##  Mean   : 5.895   Mean   : 4.960  
##  3rd Qu.: 9.000   3rd Qu.: 7.140  
##  Max.   :24.650   Max.   :11.400

The dataset contains 5 variables (attributes); the leading X column is only the row index stored in the CSV file:

Subject - an ordered factor with levels 1, …, 12 identifying the subject on whom the observation was made. The ordering is by increasing maximum concentration of theophylline observed.
Wt - weight of the subject (kg).
Dose - dose of theophylline administered orally to the subject (mg/kg).
Time - time since drug administration when the sample was drawn (hr).
conc - theophylline concentration in the sample (mg/L).
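
Because read.csv() reads Subject as a plain integer (note its numeric summary above), the ordered-factor structure described for Subject is not restored automatically. Below is a minimal sketch of how it could be rebuilt. It assumes that the copy of Theoph shipped in R's datasets package matches the downloaded CSV; the names theoph and maxConc are illustrative only.

```r
# Assumption: R's built-in datasets::Theoph stands in for the downloaded CSV
theoph <- as.data.frame(datasets::Theoph)
# Mimic what read.csv() produces: Subject as plain integers
theoph$Subject <- as.integer(as.character(theoph$Subject))

# Maximum observed concentration per subject, sorted in increasing order
maxConc <- sort(tapply(theoph$conc, theoph$Subject, max))

# Rebuild Subject as an ordered factor whose levels follow that ordering
theoph$Subject <- factor(theoph$Subject, levels = names(maxConc), ordered = TRUE)
levels(theoph$Subject)
```

Subjects that share the same maximum concentration would be placed in an arbitrary relative order by this sketch.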

The attributes selected were Dosage (‘Dose’) and Concentration (‘conc’). The R functions mean(x) and median(x) were used.

# The means are:
mean(Theoph$Dose)

## [1] 4.625833

mean(Theoph$conc)

## [1] 4.960455

# The medians are:
median(Theoph$Dose)

## [1] 4.53

median(Theoph$conc)

## [1] 5.275

Question 2 & 3

Create a new data frame with a subset of the columns and rows. Create new names for the data frame.

# Keep rows 51 to 132 and columns 4 to 6 (Dose, Time, conc)
newTheoph <- Theoph[51:132, 4:6]
colnames(newTheoph) <- c("newDose", "newTime", "newConc")
rownames(newTheoph) <- 1:nrow(newTheoph)
head(newTheoph)

##   newDose newTime newConc
## 1    5.86    5.02    7.56
## 2    5.86    7.02    7.09
## 3    5.86    9.10    5.90
## 4    5.86   12.00    4.37
## 5    5.86   24.35    1.57
## 6    4.00    0.00    0.00
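
Selecting columns by position (4:6) depends on the column order in the CSV. A hedged alternative is to select them by name. The sketch below assumes the built-in datasets::Theoph as a stand-in for the downloaded file, and the object names theoph and subsetByName are illustrative.

```r
# Assumption: datasets::Theoph stands in for the CSV (it has no X index column)
theoph <- as.data.frame(datasets::Theoph)

# Same rows as above, but columns chosen by name rather than by position
subsetByName <- theoph[51:132, c("Dose", "Time", "conc")]
names(subsetByName) <- c("newDose", "newTime", "newConc")
rownames(subsetByName) <- NULL   # reset row names to 1, 2, ...
dim(subsetByName)
```

Selecting by name keeps working if a column such as X is added or dropped, at the cost of failing when a column is renamed.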

Question 4

Use the summary function to create an overview of the new data frame. Print the mean and median of the same attributes from the subset.

summary(newTheoph)

##     newDose        newTime           newConc      
##  Min.   :3.10   Min.   : 0.0000   Min.   : 0.000  
##  1st Qu.:4.00   1st Qu.: 0.8225   1st Qu.: 2.808  
##  Median :4.92   Median : 3.5850   Median : 4.900  
##  Mean   :4.69   Mean   : 6.2246   Mean   : 4.677  
##  3rd Qu.:5.30   3rd Qu.: 9.0300   3rd Qu.: 6.643  
##  Max.   :5.86   Max.   :24.4300   Max.   :10.210

# The new means are:
apply(newTheoph[,c(1,3)],2,mean)

##  newDose  newConc 
## 4.690244 4.676585

# The new medians are:
apply(newTheoph[,c(1,3)],2,median)

## newDose newConc 
##    4.92    4.90

Compare the mean and median of the full dataset to those of the subset.

abs(apply(Theoph[,c(4,6)],2,mean)-apply(newTheoph[,c(1,3)],2,mean))

##       Dose       conc 
## 0.06441057 0.28386918

abs(apply(Theoph[,c(4,6)],2,median)-apply(newTheoph[,c(1,3)],2,median))

##  Dose  conc 
## 0.390 0.375

The mean of the Dose attribute differs by 0.0644 mg/kg between the two data sets, rising from 4.626 mg/kg in the full dataset to 4.690 mg/kg in the subset. The median differs by 0.390 mg/kg, from 4.530 mg/kg in the full dataset to 4.920 mg/kg in the subset.

The mean of the concentration attribute (conc) differs by 0.284 mg/L, falling from 4.960 mg/L in the full dataset to 4.677 mg/L in the subset. The median differs by 0.375 mg/L, from 5.275 mg/L in the full dataset to 4.900 mg/L in the subset.
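
The comparison can also be gathered in one table with signed differences, which shows the direction of each change as well as its size. This is a minimal sketch, assuming datasets::Theoph matches the CSV; compareStat is a hypothetical helper and not part of the original analysis.

```r
# Assumption: datasets::Theoph stands in for the downloaded CSV
theoph <- as.data.frame(datasets::Theoph)
full <- theoph[, c("Dose", "conc")]
part <- theoph[51:132, c("Dose", "conc")]

# Hypothetical helper: one statistic for the full data, the subset,
# and the signed change (subset minus full)
compareStat <- function(f) {
  fullVal <- sapply(full, f)
  partVal <- sapply(part, f)
  data.frame(full = fullVal, subset = partVal, change = partVal - fullVal)
}

compareStat(mean)
compareStat(median)
```

A positive change means the statistic is larger in the subset than in the full dataset.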

Question 5

For at least 3 values in a column, please rename them so that every value in that column is renamed.

Because my dataset contains numeric measurements, I instead rounded any newTime value that lies within 0.10 hr of the nearest whole number to that whole number. For example, newTime = 3.10 hr becomes 3.00 hr, newTime = 4.52 hr stays the same, and newTime = 5.98 hr becomes 6.00 hr.

Below are a few rows of the dataset before the change. Many newTime values lie within 0.10 hr of the nearest whole number, so we want to rename them.

newTheoph[51:67,]

##    newDose newTime newConc
## 51    5.50    0.37    2.89
## 52    5.50    0.77    5.22
## 53    5.50    1.02    6.41
## 54    5.50    2.05    7.83
## 55    5.50    3.55   10.21
## 56    5.50    5.05    9.18
## 57    5.50    7.08    8.02
## 58    5.50    9.38    7.14
## 59    5.50   12.10    5.68
## 60    5.50   23.70    2.42
## 61    4.92    0.00    0.00
## 62    4.92    0.25    4.86
## 63    4.92    0.50    7.24
## 64    4.92    0.98    8.00
## 65    4.92    1.98    6.81
## 66    4.92    3.60    5.87
## 67    4.92    5.02    5.22

Carrying out the renaming

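# Round to the nearest whole hour (digits = 0; the original 0.1 was treated as 0)
roundall <- round(newTheoph$newTime, digits = 0)
difference <- abs(roundall - newTheoph$newTime)
# Use 0.101 rather than 0.10 to leave room for floating-point error
nearWhole <- difference < 0.101
newTheoph$newTime[nearWhole] <- roundall[nearWhole]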

Afterwards, we see that the newTime values which met the criterion were renamed to their nearest whole number.

newTheoph[51:67,]

##    newDose newTime newConc
## 51    5.50    0.37    2.89
## 52    5.50    0.77    5.22
## 53    5.50    1.00    6.41
## 54    5.50    2.00    7.83
## 55    5.50    3.55   10.21
## 56    5.50    5.00    9.18
## 57    5.50    7.00    8.02
## 58    5.50    9.38    7.14
## 59    5.50   12.00    5.68
## 60    5.50   23.70    2.42
## 61    4.92    0.00    0.00
## 62    4.92    0.25    4.86
## 63    4.92    0.50    7.24
## 64    4.92    1.00    8.00
## 65    4.92    2.00    6.81
## 66    4.92    3.60    5.87
## 67    4.92    5.00    5.22
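
The rounding rule can be wrapped in a small function and checked against the three examples given earlier. This is a sketch only: snapToWhole and its tol argument are hypothetical names, and the 0.001 allowance mirrors the 0.101 cutoff used above to absorb floating-point error.

```r
# Hypothetical helper: replace values lying within `tol` of a whole number
# by that whole number, and leave all other values unchanged
snapToWhole <- function(x, tol = 0.10) {
  nearest <- round(x)                          # nearest whole number
  isClose <- abs(nearest - x) < tol + 0.001    # allowance for floating-point error
  x[isClose] <- nearest[isClose]
  x
}

snapToWhole(c(3.10, 4.52, 5.98))
```

The function assumes x contains no missing values; a vector with NA entries would need those handled first.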

S. Deokinanan

January 6, 2018
