Nutrition has been a contentious topic for the last several years. The data from the article “You Can’t Trust What You Read About Nutrition” dates back to 2016, and controversy surrounding nutritional guidelines has only grown since then. Public policy on what constitutes a “healthy” diet is developed from unreliable self-reporting tools such as food diaries and food frequency questionnaires (FFQs). Many correlations with various health outcomes have been drawn from data collected this way, often leading to inaccurate conclusions that do not reflect causal relationships. The code below selects the columns from the article's raw data that I deemed most valuable for analysis.
Article Link: https://fivethirtyeight.com/features/you-cant-trust-what-you-read-about-nutrition/
# Install readr only if it is not already available
if (!requireNamespace("readr", quietly = TRUE)) install.packages("readr", repos = "http://cran.us.r-project.org")
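library(readr)
# Raw, anonymized survey data published by FiveThirtyEight
urlfile <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/nutrition-studies/raw_anonymized_data.csv"
# read.csv() already returns a data frame, so no extra conversion is needed
raw_data <- read.csv(url(urlfile))
# Keep the respondent ID, health outcomes, smoking status, and dietary intake columns
subset_raw_data <- raw_data[c("ID", "cancer", "diabetes", "heart_disease", "ever_smoked", "currently_smoke", "DT_PROT", "DT_CARB", "DT_ALCO", "DT_SUG_T", "DT_FIBE", "DT_CHOL")]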
subset_raw_data
## ID cancer diabetes heart_disease ever_smoked currently_smoke DT_PROT
## 1 1003 Yes No No Yes Yes 87.15
## 2 1053 No Yes Yes Yes Yes 79.06
## 3 1006 Yes Yes Yes No No 89.54
## 4 1166 No No No No No 73.73
## 5 1134 Yes No No No No 59.86
## 6 1014 No No No Yes Yes 45.56
## 7 1074 Yes No No No No 35.30
## 8 1151 Yes No Yes No No 53.87
## 9 1001 Yes Yes Yes Yes No 43.49
## 10 1048 Yes No No No No 91.29
## 11 1073 Yes No Yes Yes No 116.90
## 12 1075 No Yes No No No 89.56
## 13 1051 Yes No No No No 75.56
## 14 1173 No Yes Yes No No 59.55
## 15 1148 No No No No No 110.83
## 16 1105 No No Yes No No 83.90
## 17 1008 No No No Yes No 99.86
## 18 1192 Yes No Yes Yes Yes 71.13
## 19 1081 Yes Yes No Yes No 115.04
## 20 1103 No No No No No 74.77
## 21 1071 Yes Yes Yes No No 59.06
## 22 1063 No No No Yes No 80.82
## 23 1146 No No No No No 70.98
## 24 1039 No No Yes Yes Yes 53.98
## 25 1058 No Yes No No No 160.35
## 26 1123 No No No Yes No 119.14
## 27 1068 No No No No No 80.15
## 28 1120 No No Yes No No 50.05
## 29 1115 Yes No No No No 83.74
## 30 1043 No No Yes No No 48.48
## 31 1152 Yes Yes No Yes No 54.69
## 32 1086 Yes No Yes Yes No 42.78
## 33 1076 No No No No No 83.52
## 34 1138 Yes Yes No No No 32.04
## 35 1177 No Yes No No No 46.77
## 36 1080 No No Yes No No 69.18
## 37 1034 Yes No Yes No No 85.99
## 38 1054 Yes Yes No No No 189.83
## 39 1101 No No No No No 99.25
## 40 1119 No No Yes No No 25.80
## 41 1102 No No No Yes No 102.15
## 42 1176 Yes No No No No 86.39
## 43 1022 Yes Yes Yes No No 159.46
## 44 1019 Yes No No No No 64.13
## 45 1153 No No No No No 65.07
## 46 1128 Yes No Yes Yes No 43.07
## 47 1002 No No No No No 113.82
## 48 1026 Yes Yes Yes No No 74.58
## 49 1013 Yes No No No No 47.18
## 50 1129 Yes No No No No 75.32
## 51 1005 Yes Yes No No No 51.93
## 52 1044 Yes No No No No 93.68
## 53 1045 Yes No Yes Yes No 115.87
## 54 1093 No No No No No 59.72
## DT_CARB DT_ALCO DT_SUG_T DT_FIBE DT_CHOL
## 1 62.10 40.57000 25.50 8.90 507.92
## 2 197.57 16.66000 86.30 13.41 269.81
## 3 254.19 9.31000 100.76 17.00 259.12
## 4 377.33 0.00115 187.25 37.35 224.06
## 5 201.19 44.47000 76.28 20.39 165.06
## 6 167.93 6.99000 55.08 29.80 71.13
## 7 142.38 15.62000 53.19 13.88 71.60
## 8 147.10 4.24000 62.46 13.92 338.24
## 9 133.87 23.38000 39.35 14.06 118.31
## 10 249.77 3.94000 76.61 29.75 379.11
## 11 371.87 23.04000 161.87 36.14 258.93
## 12 142.74 16.13000 49.34 14.19 267.96
## 13 254.82 21.08000 63.60 37.79 45.16
## 14 189.34 8.95000 83.01 22.99 200.07
## 15 245.70 1.64000 77.71 27.73 307.83
## 16 239.38 26.59000 85.27 22.52 424.76
## 17 260.59 30.14000 68.62 34.80 495.97
## 18 205.78 33.83000 51.26 21.02 204.28
## 19 372.32 0.00000 106.34 92.87 120.80
## 20 249.09 0.00000 100.83 15.41 307.67
## 21 156.72 3.89000 56.02 11.97 151.16
## 22 213.93 16.14000 80.18 15.47 287.90
## 23 180.39 12.73000 63.20 20.68 191.49
## 24 176.99 72.23000 46.57 8.81 145.04
## 25 322.33 0.00000 159.58 21.58 847.40
## 26 309.65 16.44000 121.83 26.01 383.45
## 27 277.72 54.56000 171.50 13.54 153.73
## 28 194.50 21.20000 75.84 21.24 112.89
## 29 149.85 13.99000 65.60 29.60 565.27
## 30 135.92 14.64000 49.26 13.54 113.75
## 31 170.69 0.82600 87.52 13.34 212.47
## 32 137.64 9.03000 43.89 13.53 129.01
## 33 185.35 20.05000 44.06 19.39 471.53
## 34 156.99 0.18400 106.47 8.03 62.42
## 35 159.59 0.03680 57.61 15.21 146.99
## 36 168.82 0.00000 106.85 11.37 376.84
## 37 228.39 0.71600 104.93 20.04 241.22
## 38 516.19 4.70000 195.31 38.74 758.42
## 39 344.19 67.97000 104.98 21.77 245.72
## 40 84.71 28.66000 24.62 7.92 81.76
## 41 80.33 13.10000 40.33 13.46 655.72
## 42 261.29 0.00000 112.22 22.77 181.14
## 43 366.45 18.54000 109.76 38.58 793.10
## 44 105.57 8.06000 47.00 14.98 220.21
## 45 246.68 17.29000 99.03 25.98 173.63
## 46 164.42 0.15500 61.73 29.51 119.99
## 47 315.70 38.22000 106.42 42.94 383.22
## 48 249.73 18.11000 105.38 51.10 154.35
## 49 159.80 28.81000 74.85 11.77 130.79
## 50 198.48 2.43000 88.59 26.15 268.60
## 51 171.83 6.12000 72.68 16.08 81.39
## 52 207.24 6.12000 115.79 20.33 242.73
## 53 247.17 37.98000 85.34 38.66 267.13
## 54 160.55 23.56000 37.51 23.79 255.71
# One label per selected column, in the same order (12 in total, including "Cancer")
colnames(subset_raw_data) <- c("ID", "Cancer", "Diabetes", "Heart Disease", "Ever Smoked", "Currently Smoke", "Daily Protein Intake", "Daily Carbohydrate Intake", "Daily Alcohol Intake", "Daily Sugar Intake", "Daily Fiber Intake", "Daily Cholesterol Intake")
summary(subset_raw_data)
## ID Cancer Diabetes Heart Disease
## Min. :1001 Length:54 Length:54 Length:54
## 1st Qu.:1043 Class :character Class :character Class :character
## Median :1076 Mode :character Mode :character Mode :character
## Mean :1083
## 3rd Qu.:1127
## Max. :1192
## Ever Smoked Currently Smoke Daily Protein Intake
## Length:54 Length:54 Min. : 25.80
## Class :character Class :character 1st Qu.: 54.16
## Mode :character Mode :character Median : 75.05
## Mean : 78.61
## 3rd Qu.: 90.86
## Max. :189.83
## Daily Carbohydrate Intake Daily Alcohol Intake Daily Sugar Intake
## Min. : 62.1 Min. : 0.000 Min. : 24.62
## 1st Qu.:159.6 1st Qu.: 3.902 1st Qu.: 55.31
## Median :198.0 Median :14.315 Median : 77.16
## Mean :216.1 Mean :16.724 Mean : 83.95
## 3rd Qu.:253.1 3rd Qu.:23.295 3rd Qu.:105.28
## Max. :516.2 Max. :72.230 Max. :195.31
## Daily Fiber Intake Daily Cholesterol Intake
## Min. : 7.92 Min. : 45.16
## 1st Qu.:13.89 1st Qu.:145.53
## Median :20.54 Median :232.64
## Mean :23.18 Mean :271.18
## 3rd Qu.:29.07 3rd Qu.:330.64
## Max. :92.87 Max. :847.40
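The summary reports only length and class for the Yes/No columns because they are stored as character strings. A minimal sketch, assuming the goal is to see how many respondents answered each way, converts those columns to factors first. The data frame `demo` is a hypothetical stand-in built from the first six rows printed earlier, so the block runs without downloading anything; the same idea should carry over to the full subset.

```r
# Hypothetical stand-in: first six rows of three columns from the printed data
demo <- data.frame(
  ID = c(1003, 1053, 1006, 1166, 1134, 1014),
  cancer = c("Yes", "No", "Yes", "No", "Yes", "No"),
  diabetes = c("No", "Yes", "Yes", "No", "No", "No"),
  stringsAsFactors = FALSE
)

# Convert every Yes/No column to a factor with a fixed level order
yes_no_cols <- c("cancer", "diabetes")
demo[yes_no_cols] <- lapply(demo[yes_no_cols], factor, levels = c("No", "Yes"))

# summary() now reports counts per level instead of Length / Class / Mode
summary(demo[yes_no_cols])

# table() gives the same counts for a single column
cancer_counts <- table(demo$cancer)
cancer_counts
```

With factors in place, the health outcome and smoking columns become as readable in the summary as the numeric intake columns.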
As mentioned in the introduction, reliance on self-reporting and the inability to obtain accurate measurements are the biggest shortcomings of nutritional studies to date. Ironically, the lack of education surrounding nutrition makes the self-reporting problem worse. Many people don’t know how to read a nutrition label or judge serving sizes without a food scale, which inevitably leads to inaccurate caloric reporting. I also believe there is an issue of having too many data points, which the article raised repeatedly: the more variables a study collects, the more likely some of them are to correlate with a health outcome purely by chance. Keeping people under lock and key for weeks, months, or even years at a time for metabolic studies is unrealistic, so self-reporting is still the most feasible tool we have. Short of such controlled studies, I am unsure how self-reporting can be improved. It is also important to remember that everyone has different dietary needs and restrictions, which is why there is no “one-size-fits-all” approach to nutrition.
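The concern about too many data points can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the article's method: the outcome and the 200 "food" variables are pure random noise (hypothetical, not taken from the survey), and the sample size of 54 simply mirrors the number of respondents above. Because nothing here is truly related, any "significant" correlation is a false positive, and on average about 5% of the tests should cross the 0.05 threshold.

```r
set.seed(538)  # arbitrary seed, used only for reproducibility

n_people <- 54   # same number of respondents as the data above
n_foods  <- 200  # hypothetical number of unrelated "food" variables

# Outcome and predictors are independent random noise by construction
outcome <- rnorm(n_people)
foods   <- matrix(rnorm(n_people * n_foods), nrow = n_people)

# p-value of a Pearson correlation test for each simulated food variable
p_values <- apply(foods, 2, function(x) cor.test(x, outcome)$p.value)

# Count "significant" results at the conventional 0.05 threshold
n_significant <- sum(p_values < 0.05)
cat("Nominally significant correlations:", n_significant, "of", n_foods, "\n")

# A Bonferroni adjustment (one common correction) is far stricter
n_after_correction <- sum(p.adjust(p_values, method = "bonferroni") < 0.05)
cat("After Bonferroni correction:", n_after_correction, "\n")
```

Comparing the two counts shows how much of the apparent signal depends on the number of comparisons made. Bonferroni is used here only because it is simple; other corrections would serve the same illustrative purpose.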