EXERCISE 1

DATA MANAGEMENT IN R

I. OBJECTIVES

At the end of this exercise, the participant must be able to:

  1. load built-in, external, and user-created data into R, and

  2. perform data management tasks in R.

II. METHODS

Write the correct R scripts to answer the following problems. Do not forget to document your scripts with comments.

A. Using the iris built-in data set in R:

  1. Load the data into an R object iris.

  2. Access the plants of species versicolor and load the plants in the object species1.

  3. Access the plants with sepal length of at least 5.0 and load the plants in the object largesepals.

B. Using the data set patients.csv:

  1. Load the data into the object patientrecords.

  2. Access the individuals weighing at least 50 kg and assign them to the object wgt50.

  3. How many individuals are at least 20 years old?

C. Create an R data set named experiment based on the experiment described below.

Experiment: A study was conducted to determine the effect of two new feed formulations (1, 2) on the weight of eggs. Three species of ducks (A, B, C) were purposively selected for the study. The following data were generated.

Species-Feed   Weight   Species-Feed   Weight
A-1            5.6      B-2            7.3
A-1            5.8      B-2            7.1
A-2            6.1      C-1            6.3
A-2            6.3      C-1            6.2
B-1            8.1      C-2            6.8
B-1            8.2      C-2            6.9

  1. Specifications: The data set must have three columns, namely species, feed, and weight.
  2. Save the R data set experiment as a comma-delimited file.

EXERCISE 1 ANSWER KEY

A. Using the iris built-in data set in R:

  1. Load the data into an R object iris.
#Load the built-in data set 'iris' into the workspace, then print it
data(iris)
iris
  2. Access the plants of species versicolor and load the plants in the object species1.
#To examine the structure of the data set 'iris'
str(iris)
# 'data.frame':   150 obs. of  5 variables:
#  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
species1 <- iris[which(iris$Species == 'versicolor'),]
species1
  3. Access the plants with sepal length of at least 5.0 and load the plants in the object largesepals.
#Versicolor plants (from species1) with sepal length of at least 5.0
largesepals <- species1[which(species1$Sepal.Length >= 5.0),]
#or, filtering iris directly on both conditions
largesepals <- iris[which(iris$Sepal.Length >= 5.0 & iris$Species == 'versicolor'),]
largesepals
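
Item 3 can also be read as applying to every plant in iris, not only to versicolor. The following is a minimal sketch of that reading. The object names largesepals_all and largesepals_sub are hypothetical and not part of the exercise. The sketch also shows subset(), which expresses the same row filter in base R.

```r
# Assumption: item 3 is read as covering all 150 plants in iris.
# 'largesepals_all' and 'largesepals_sub' are hypothetical names for this sketch.
data(iris)

# Row filter wrapped in which(), matching the style used in the key
largesepals_all <- iris[which(iris$Sepal.Length >= 5.0), ]

# subset() applies the same condition without repeating 'iris$'
largesepals_sub <- subset(iris, Sepal.Length >= 5.0)

# Both objects should contain the same plants
all(rownames(largesepals_all) == rownames(largesepals_sub))

# Number of plants that satisfy the condition
nrow(largesepals_all)
```

Either form works for a single condition; which() is mainly useful when the column may contain missing values, because it drops rows where the comparison is NA.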

B. Using the data set patients.csv:

  1. Load the data into the object patientrecords.
#Change the file path below to the location of your patients.csv. This applies to all 'read.csv' and 'write.csv' calls.
patientrecords <- read.csv(file='C:\\Users\\USER\\Desktop\\2019 Predictive Analytics Course in R\\R Scripts\\PA Course 1\\patients.csv')
patientrecords
  2. Access the individuals weighing at least 50 kg and assign them to the object wgt50.
wgt50 <- patientrecords[which(patientrecords$weight >= 50),]
wgt50
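  3. How many individuals are at least 20 years old?
#Individuals in wgt50 (at least 50 kg) who are at least 20 years old
age20AndUp <- wgt50[which(wgt50$age >= 20),]
#or, filtering patientrecords directly on both conditions
age20AndUp <- patientrecords[which(patientrecords$age >= 20 & patientrecords$weight >= 50),]
age20AndUp
#Number of individuals in the result
nrow(age20AndUp)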
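
Because item 3 asks for a count, it may help to see the counting step on its own. The sketch below uses a small made-up data frame, patients_demo, in place of patients.csv, whose contents are not shown here. Only the column names age and weight are taken from the key; every value and the names patients_demo and adults are hypothetical.

```r
# Hypothetical stand-in for patients.csv; the values are invented for illustration.
patients_demo <- data.frame(
  age    = c(18, 20, 35, NA, 42),
  weight = c(48, 55, 62, 70, 49)
)

# which() drops rows where the comparison is NA, so nrow() counts only known matches
adults <- patients_demo[which(patients_demo$age >= 20), ]
nrow(adults)

# Equivalent count without building a subset; na.rm = TRUE skips the missing age
sum(patients_demo$age >= 20, na.rm = TRUE)

# Count restricted to individuals weighing at least 50 kg
sum(patients_demo$age >= 20 & patients_demo$weight >= 50, na.rm = TRUE)
```

Whether the count should cover all patients or only those in wgt50 depends on how item 3 is read; the first two counts take the former reading and the last takes the latter.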

C. Create an R data set named experiment based on the experiment described below.

Experiment: A study was conducted to determine the effect of two new feed formulations (1, 2) on the weight of eggs. Three species of ducks (A, B, C) were purposively selected for the study. The following data were generated.

Species-Feed   Weight   Species-Feed   Weight
A-1            5.6      B-2            7.3
A-1            5.8      B-2            7.1
A-2            6.1      C-1            6.3
A-2            6.3      C-1            6.2
B-1            8.1      C-2            6.8
B-1            8.2      C-2            6.9

  1. Specifications: The data set must have three columns, namely species, feed, and weight.
experiment <- data.frame(
  species = c('A','A','A','A','B','B','B','B','C','C','C','C'),
  feed = c(1,1,2,2,1,1,2,2,1,1,2,2),
  weight = c(5.6,5.8,6.1,6.3,8.1,8.2,7.3,7.1,6.3,6.2,6.8,6.9)
)
experiment
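
Since the design is balanced (three species, two feeds, two eggs per combination), the species and feed columns can also be generated with rep() instead of being typed out. This is a sketch of one possible alternative, not the method required by the exercise; the name experiment_alt is hypothetical.

```r
# Each species appears in 4 consecutive rows (2 feeds x 2 eggs)
species <- rep(c('A', 'B', 'C'), each = 4)

# Within each species: feed 1 twice, then feed 2 twice; repeated for 3 species
feed <- rep(rep(c(1, 2), each = 2), times = 3)

# Weights are entered in the same row order as the table
weight <- c(5.6, 5.8, 6.1, 6.3, 8.1, 8.2, 7.3, 7.1, 6.3, 6.2, 6.8, 6.9)

# 'experiment_alt' is a hypothetical name used only for this sketch
experiment_alt <- data.frame(species, feed, weight)
experiment_alt
```

The weights must still follow the row order implied by the generated species and feed columns, so the printed result is worth checking against the table.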
  2. Save the R data set experiment as a comma-delimited file.
#row.names=FALSE keeps the file to the three required columns
write.csv(experiment, file="C:\\Users\\USER\\Desktop\\2019 Predictive Analytics Course in R\\R Scripts\\PA Course 1\\experiment.csv", row.names=FALSE)
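
To check what write.csv actually wrote, the file can be read back with read.csv. The sketch below writes to a temporary file created by tempfile() rather than to the course folder; that path, and the names csv_path and experiment_back, are assumptions made so the example runs on any machine.

```r
experiment <- data.frame(
  species = c('A','A','A','A','B','B','B','B','C','C','C','C'),
  feed = c(1,1,2,2,1,1,2,2,1,1,2,2),
  weight = c(5.6,5.8,6.1,6.3,8.1,8.2,7.3,7.1,6.3,6.2,6.8,6.9)
)

# Assumption: a temporary file stands in for the course folder path
csv_path <- tempfile(fileext = ".csv")

# row.names = FALSE writes only the three data columns
write.csv(experiment, file = csv_path, row.names = FALSE)
experiment_back <- read.csv(csv_path)
names(experiment_back)

# row.names = TRUE adds an unnamed first column, which read.csv labels 'X'
write.csv(experiment, file = csv_path, row.names = TRUE)
names(read.csv(csv_path))
```

Comparing the two sets of column names shows the effect of the row.names argument on the saved file.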