library(dbplyr)
I am using the article “You Can’t Trust What You Read About Nutrition” and its corresponding dataset, available at “https://fivethirtyeight.com/features/you-cant-trust-what-you-read-about-nutrition/”. The article explains how difficult it is to examine how food and other factors influence health. Health is a complex issue, and measuring it is even more complex; even recording everything you eat in a day is harder than it sounds. Despite these challenges, the researchers surveyed participants on 26 different characteristics. The full dataset is named raw_anonymized_data.csv on my GitHub.
I have created a new data frame from a subset of the columns in the article’s original dataset, keeping ID, rash, and cat. The “ID” column is the survey participant’s identification number. The “rash” column records whether the participant had a weird rash in the past year (yes/no). The “cat” column records whether the participant has a cat (yes/no).
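# Read the full survey dataset from GitHub (comma-separated, with a header row)
healthy <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/nutrition-studies/raw_anonymized_data.csv", header = TRUE, sep = ",")
# Keep only the participant ID and the two yes/no columns of interest
healthy2 <- subset(healthy, select = c(ID, rash, cat))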
healthy2
## ID rash cat
## 1 1003 Yes No
## 2 1053 Yes No
## 3 1006 No No
## 4 1166 No No
## 5 1134 No No
## 6 1014 No Yes
## 7 1074 No No
## 8 1151 No No
## 9 1001 No No
## 10 1048 No No
## 11 1073 No Yes
## 12 1075 No No
## 13 1051 No No
## 14 1173 No No
## 15 1148 Yes No
## 16 1105 No Yes
## 17 1008 No Yes
## 18 1192 No No
## 19 1081 Yes No
## 20 1103 No No
## 21 1071 Yes No
## 22 1063 No Yes
## 23 1146 No Yes
## 24 1039 No No
## 25 1058 No Yes
## 26 1123 Yes Yes
## 27 1068 Yes Yes
## 28 1120 No No
## 29 1115 No Yes
## 30 1043 No Yes
## 31 1152 No Yes
## 32 1086 No No
## 33 1076 No No
## 34 1138 Yes Yes
## 35 1177 No No
## 36 1080 No No
## 37 1034 No Yes
## 38 1054 No No
## 39 1101 No No
## 40 1119 No Yes
## 41 1102 No Yes
## 42 1176 No No
## 43 1022 No No
## 44 1019 No Yes
## 45 1153 No No
## 46 1128 Yes Yes
## 47 1002 Yes Yes
## 48 1026 No No
## 49 1013 No No
## 50 1129 No No
## 51 1005 No No
## 52 1044 No Yes
## 53 1045 No No
## 54 1093 No No
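Printing all 54 rows works for a dataset this small, but a shorter check is often easier to read. The sketch below is one possible way to do that. So that it runs without downloading anything, it retypes the first six rows of the output above into a small data frame; the name sample_rows is only illustrative and is not part of my original code, and the counts it produces describe those six rows only, not the whole survey.

```r
# First six rows of healthy2, typed in from the printed output above.
# sample_rows is an illustrative name, not part of the original code.
sample_rows <- data.frame(
  ID   = c(1003, 1053, 1006, 1166, 1134, 1014),
  rash = c("Yes", "Yes", "No", "No", "No", "No"),
  cat  = c("No", "No", "No", "No", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Compact checks instead of printing every row
dim(sample_rows)      # number of rows and columns
head(sample_rows, 3)  # first three rows only
str(sample_rows)      # column types at a glance

# Cross-tabulate the two yes/no columns (for these six rows only)
counts <- table(rash = sample_rows$rash, cat = sample_rows$cat)
counts
```

The same calls should work on healthy2 itself, assuming the download succeeds and the columns are read in as shown in the output.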
I wrote code to read the dataset from the article “You Can’t Trust What You Read About Nutrition” and created a subset of the original dataset containing the “ID”, “rash”, and “cat” columns.
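For comparison, the same column subset can also be taken with base R bracket indexing instead of subset(). This is a minimal sketch on a toy data frame rather than the real download: toy and its extra column are hypothetical stand-ins for the full dataset and the columns that get dropped, and only the first three ID, rash, and cat values come from the output above.

```r
# Toy stand-in for the full dataset; "extra" is a hypothetical column
# representing the survey columns that are dropped.
toy <- data.frame(
  ID    = c(1003, 1053, 1006),
  extra = c("a", "b", "c"),
  rash  = c("Yes", "Yes", "No"),
  cat   = c("No", "No", "No"),
  stringsAsFactors = FALSE
)

# The approach used in this write-up: unquoted column names in select
by_subset <- subset(toy, select = c(ID, rash, cat))

# Bracket indexing with quoted column names
by_bracket <- toy[, c("ID", "rash", "cat")]

# Compare the two results
all.equal(by_subset, by_bracket)
```

subset() is convenient at the console because the column names do not need quotes, while bracket indexing tends to be the safer choice inside functions, where the column names are often stored in a character vector.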