Import the height and weight dataset. There are a couple of ways to do this:
- The command
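Height_weight_2 <- read.csv('Height_weight_male_female.csv', header = TRUE, stringsAsFactors = TRUE)
(R is case-sensitive, so the name must match the Height_weight_2 used in the rest of this demo. stringsAsFactors = TRUE stores text columns such as sex as factors, which was the default before R 4.0.)
- Alternatively, you can have RStudio generate this command for you with the “Import Dataset” button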
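As a quick check that an import worked, it can help to look at the structure of the result. The sketch below does not use the class file: it writes a small made-up CSV to a temporary file (the values and the name toy are assumptions for illustration) and reads it back the same way.

```r
# Write a tiny stand-in CSV to a temporary file (made-up values, not the class data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Height_in,Weight_lbs,sex",
             "70,180,Male",
             "64,130,Female",
             "68,155,Male",
             "62,120,Female"), csv_path)

# stringsAsFactors = TRUE stores text columns such as sex as factors
toy <- read.csv(csv_path, header = TRUE, stringsAsFactors = TRUE)

str(toy)    # column names and types
head(toy)   # first few rows
```

If a column shows up with an unexpected type in str(), that is usually the first thing to fix before plotting.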
Next, let's start by looking at our dataset:
View(Height_weight_2)
Notice that nothing is assigned here. We are just running a command, because we are not saving anything to a variable name.
What kinds of variables does the dataset contain? We can use the summary function in R to quickly find out a lot of information about the data.
summary(Height_weight_2)
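What summary reports depends on the type of each column. The sketch below uses a small made-up data frame (toy and its values are assumptions, not the class data) to show the difference between numeric, factor, and character columns.

```r
# Made-up stand-in data with the same column names as the class dataset
toy <- data.frame(
  Height_in  = c(70, 64, 68, 62),
  Weight_lbs = c(180, 130, 155, 120),
  sex        = factor(c("Male", "Female", "Male", "Female"))
)

sapply(toy, class)               # one class name per column
summary(toy$Height_in)           # numeric: min, quartiles, mean, max
summary(toy$sex)                 # factor: a count for each level
summary(as.character(toy$sex))   # character: only length, class, and mode
```

This is why it is convenient to store a grouping variable such as sex as a factor.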
Let's start by just making a scatterplot of height and weight:
plot(Height_weight_2$Height_in, Height_weight_2$Weight_lbs)
We can use the help function to find out more about the plotting options. For example, how to add axis labels:
plot(Height_weight_2$Height_in, Height_weight_2$Weight_lbs, xlab = 'Height (inches)', ylab = 'Weight (lbs)')
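For reference, here are a few ways to look things up from the console. This is a minimal sketch using base R only. plot.default is the method that handles a plain x-y scatterplot, so its help page and argument list are where xlab and ylab are described.

```r
# Open the help page for the scatterplot method (same as typing ?plot.default)
help("plot.default")

# Print the argument list without opening the full help page
args(plot.default)

# Check whether a particular argument exists
"xlab" %in% names(formals(plot.default))
```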
What if we want to color the points by sex? In the code below, the first line makes sure sex is a factor, the plot call uses it to pick a color for each point, and the last line adds a legend to the plot, which is not necessary but helpful.
sex <- factor(Height_weight_2$sex)
plot(Height_weight_2$Height_in, Height_weight_2$Weight_lbs, xlab = 'Height (inches)', ylab = 'Weight (lbs)', col = sex)
legend('bottomright', legend = levels(sex), col = seq_along(levels(sex)), cex = 0.8, pch = 1)
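The coloring works because a factor is stored as integer codes (1, 2, ...), and plot uses those codes as indices into the color palette. The sketch below shows this on a small made-up data frame (toy and grp are hypothetical names, and the values are not the class data).

```r
# Made-up stand-in data
toy <- data.frame(
  Height_in  = c(70, 64, 68, 62),
  Weight_lbs = c(180, 130, 155, 120),
  sex        = factor(c("Male", "Female", "Male", "Female"))
)
grp <- toy$sex

as.integer(grp)   # the integer code behind each value: Female = 1, Male = 2

plot(toy$Height_in, toy$Weight_lbs,
     xlab = "Height (inches)", ylab = "Weight (lbs)",
     col = as.integer(grp))            # codes index the default palette

# Use the same codes in the legend so its colors match the points
legend("bottomright", legend = levels(grp),
       col = seq_along(levels(grp)), pch = 1, cex = 0.8)
```

Building the legend colors from the number of levels keeps the legend in step with the plot if the data has more or fewer groups.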
Now let's make a histogram of the heights:
hist(Height_weight_2$Height_in)
What if we want to change the bin sizes? We can tell R roughly how many bins we want with “breaks” (a single number is treated as a suggestion, and R picks nearby round break points), and I also like to include the xlim argument to make a nicer x-axis that goes from 65 to 90. We can also label the axes like we did before.
hist(Height_weight_2$Height_in, breaks = 10, xlim = c(65, 90), xlab = "Height (inches)")
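To control the bins exactly rather than suggest a count, breaks can also be given as a vector of bin edges. The sketch below uses simulated heights (heights, edges, and the simulation settings are assumptions for illustration, not the class data).

```r
set.seed(1)
heights <- rnorm(200, mean = 68, sd = 4)   # simulated heights in inches

# A single number is only a suggestion: inspect what R actually chose
h_suggested <- hist(heights, breaks = 10, plot = FALSE)
h_suggested$breaks
length(h_suggested$counts)   # number of bins used, which may differ from 10

# A vector of edges is used as given; it must cover the whole range of the data
edges <- seq(floor(min(heights)), ceiling(max(heights)), by = 1)
h_exact <- hist(heights, breaks = edges, xlim = range(edges),
                xlab = "Height (inches)", main = "Simulated heights")
```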
What about a box plot?
boxplot(Height_weight_2$Height_in)
It could be interesting to separate the data into two box plots by sex. Let's see how we can do that. Here the “~” tells R how to separate the data: the values on the left are grouped by the variable on the right.
boxplot(Height_weight_2$Height_in ~ Height_weight_2$sex)
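The formula can also be written with bare column names if the data frame is passed through the data argument, which avoids repeating the dataset name. A minimal sketch on made-up data (toy and bp are hypothetical names):

```r
# Made-up stand-in data
toy <- data.frame(
  Height_in = c(70, 64, 68, 62, 71, 65),
  sex       = factor(c("Male", "Female", "Male", "Female", "Male", "Female"))
)

# Height_in ~ sex reads as "Height_in grouped by sex"
bp <- boxplot(Height_in ~ sex, data = toy,
              xlab = "Sex", ylab = "Height (inches)")

bp$names   # one box per level of sex
bp$stats   # the five-number summary behind each box, one column per group
```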
Could we split the summary statistics by group in the same way to find the different means? To do this we use the “tapply” function, which uses the notation: tapply(value, factor, function). In this case our value is the height (that column of the dataset), the factor is sex, and the function is mean. We could do the same thing with median, summary, etc. Note that R's built-in mode function reports how a value is stored, not the most common value, so it does not give the statistical mode.
tapply(Height_weight_2$Height_in, Height_weight_2$sex, mean)
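The sketch below applies tapply with a few different functions on made-up data. The helper stat_mode is a hypothetical function written for this example, since base R has no built-in for the statistical mode. aggregate is shown as an alternative that does accept the “~” notation.

```r
# Made-up stand-in data
toy <- data.frame(
  Height_in = c(70, 64, 68, 62, 70, 64),
  sex       = factor(c("Male", "Female", "Male", "Female", "Male", "Female"))
)

means   <- tapply(toy$Height_in, toy$sex, mean)     # one mean per level of sex
medians <- tapply(toy$Height_in, toy$sex, median)

# Hypothetical helper: most frequent value (the first one wins if there is a tie)
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}
modes <- tapply(toy$Height_in, toy$sex, stat_mode)

mode(toy$Height_in)   # "numeric": the storage mode, not a statistic

# Same group means using the formula notation
aggregate(Height_in ~ sex, data = toy, FUN = mean)
```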