library(dbplyr)

Overview:

I am using the article “You Can’t Trust What You Read About Nutrition” and its corresponding dataset, from https://fivethirtyeight.com/features/you-cant-trust-what-you-read-about-nutrition/. The article explains how difficult it is to study how food and other factors influence health. Health is a complex issue, and measuring it is even more complex; even recording everything you eat in a day is harder than it sounds. Despite these challenges, the researchers surveyed participants on 26 different characteristics. The full dataset is named raw_anonymized_data.csv on my GitHub.

Modified Data Frame:

I have created a new data frame from a subset of the columns in the article’s original dataset, keeping ID, rash, and cat. The “ID” column is the survey participant’s identification number. The “rash” column records whether the participant had a weird rash in the past year (yes/no). The “cat” column records whether the participant has a cat (yes/no).

healthy <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/nutrition-studies/raw_anonymized_data.csv", header = TRUE, sep = ",")
healthy2 <- subset(healthy, select = c(ID, rash, cat))
healthy2
##      ID rash cat
## 1  1003  Yes  No
## 2  1053  Yes  No
## 3  1006   No  No
## 4  1166   No  No
## 5  1134   No  No
## 6  1014   No Yes
## 7  1074   No  No
## 8  1151   No  No
## 9  1001   No  No
## 10 1048   No  No
## 11 1073   No Yes
## 12 1075   No  No
## 13 1051   No  No
## 14 1173   No  No
## 15 1148  Yes  No
## 16 1105   No Yes
## 17 1008   No Yes
## 18 1192   No  No
## 19 1081  Yes  No
## 20 1103   No  No
## 21 1071  Yes  No
## 22 1063   No Yes
## 23 1146   No Yes
## 24 1039   No  No
## 25 1058   No Yes
## 26 1123  Yes Yes
## 27 1068  Yes Yes
## 28 1120   No  No
## 29 1115   No Yes
## 30 1043   No Yes
## 31 1152   No Yes
## 32 1086   No  No
## 33 1076   No  No
## 34 1138  Yes Yes
## 35 1177   No  No
## 36 1080   No  No
## 37 1034   No Yes
## 38 1054   No  No
## 39 1101   No  No
## 40 1119   No Yes
## 41 1102   No Yes
## 42 1176   No  No
## 43 1022   No  No
## 44 1019   No Yes
## 45 1153   No  No
## 46 1128  Yes Yes
## 47 1002  Yes Yes
## 48 1026   No  No
## 49 1013   No  No
## 50 1129   No  No
## 51 1005   No  No
## 52 1044   No Yes
## 53 1045   No  No
## 54 1093   No  No
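
The column subset above can also be written with base R bracket indexing. The following is a minimal sketch, not the code used for the output above: it assumes a small stand-in data frame, healthy_demo, built from the first three rows printed above, plus a hypothetical extra column named other that exists only to show a column being dropped. It also uses head() to preview rows instead of printing the whole data frame.

```r
# Stand-in for the full survey data: the first three rows shown above,
# plus a hypothetical extra column ("other") that the subset should drop.
healthy_demo <- data.frame(
  ID    = c(1003, 1053, 1006),
  rash  = c("Yes", "Yes", "No"),
  cat   = c("No", "No", "No"),
  other = c(1, 2, 3)
)

# Same kind of column selection as subset(healthy, select = c(ID, rash, cat)),
# written with bracket indexing on column names.
healthy_demo2 <- healthy_demo[, c("ID", "rash", "cat")]

# Preview the first rows (at most six) rather than the full data frame.
head(healthy_demo2)
```

Selecting columns by quoted name avoids the non-standard evaluation that subset() relies on, which can be easier to reason about when the selection is built programmatically.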

Conclusion:

I wrote code to read the dataset from the article “You Can’t Trust What You Read About Nutrition” and created a subset of the original dataset containing the columns “ID”, “rash”, and “cat”.