An analysis of the previous version of the data (published in April 2021) can be found at https://rpubs.com/Em_Mari3/FDC

1 Introduction


The U.S. Department of Agriculture’s FoodData Central (FDC) database is one of the most expansive collections of food composition data available, if not the most expansive. By examining the data quality of FDC, we can determine which food composition data have been reliably collected and can be used accurately for research, and which information needs to be expanded or updated.

Furthermore, the database is expansive and difficult to parse. By analyzing and comparing collected variables in a detailed and readable format, we increase the potential of understanding the data and using it to further the state of nutrition research.

2 Data Collection and Organization


2.1 Data Structure


As of October 2021, there are two ways to download the FoodData Central database (1): as a collection of CSV files or as a collection of 4 JSON files.

The csv files contain information on the experimental foods (which the JSON files do not), but all updates to branded foods are organized under their own fdc id and thus act as duplicated entries with slight changes. These duplicates are practically impossible to parse with the information given. In the JSON files they are grouped together and we are provided with information for the most recent update only. To avoid counting each of these duplicates as their own entry, we will be using the JSON files.

The JSON files will be read into R as strings using the jsonlite package and converted into data frames for ease of use in analysis. The 4 JSON files are named as follows:

  • FoodData_Central_branded_food_json_2021-10-28
  • FoodData_Central_foundation_food_json_2021-10-28
  • FoodData_Central_sr_legacy_food_json_2021-10-28
  • FoodData_Central_survey_food_json_2021-10-28

To accurately compare and contrast data within the four files and identify the quality of the data overall we will have to analyze the variables available in each file and combine them into a singular data structure.
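
The step from nested JSON to one flat row per food and nutrient can be sketched as follows. This is a Python illustration rather than the R and jsonlite code used for the analysis, and the sample document, its top-level key, and the nested "nutrient" field names are assumptions made for the example; the output column names follow the variable names used later in this report.

```python
import json

# Hypothetical miniature of one FDC JSON file. The top-level key and the
# nested "nutrient" field names are assumptions made for illustration.
SAMPLE = """
{"FoundationFoods": [
  {"fdcId": 1, "description": "Milk, Whole", "dataType": "Foundation",
   "publicationDate": "4/28/2021",
   "foodNutrients": [
     {"nutrient": {"id": 1003, "name": "Protein", "unitName": "g"}, "amount": 3.3},
     {"nutrient": {"id": 1004, "name": "Total lipid (fat)", "unitName": "g"}, "amount": 3.2}
   ]}
]}
"""


def flatten_foods(document):
    """Return one flat dict per (food, nutrient) pair."""
    rows = []
    for foods in document.values():  # one list of foods per top-level key
        for food in foods:
            for entry in food.get("foodNutrients", []):
                nutrient = entry.get("nutrient", {})
                rows.append({
                    "fdc_id": food.get("fdcId"),
                    "food_description": food.get("description"),
                    "data_type": food.get("dataType"),
                    "publication_date": food.get("publicationDate"),
                    "nutrient_id": nutrient.get("id"),
                    "nutrient_name": nutrient.get("name"),
                    "unit_name": nutrient.get("unitName"),
                    "amount": entry.get("amount"),
                })
    return rows


rows = flatten_foods(json.loads(SAMPLE))
for row in rows:
    print(row["fdc_id"], row["nutrient_name"], row["amount"], row["unit_name"])
```

Running the same function over each of the four files and concatenating the resulting lists would give a single structure in which the data types can be compared directly.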

Variables Present in All 4 Files


The variables common among all 4 JSON files are:

Variable Name Variable Description
“foodClass” The classes of food within the data are “Survey” for FNDDS, “Branded” for branded foods, and “FinalFood” for SR legacy and foundation foods
“description” The name or description of the food such as “Milk, Whole” or “100 Grand Bar”
“foodNutrients” A nested variable containing all info on the nutrient composition (per 100g) and derivation of nutrient composition for each food
“foodAttributes” A nested variable left blank for SR legacy and foundation foods. For branded foods this variable contains a log of any updates made to this food (using variables “id”, “name”, “value”, “foodAttributeType.id”, “foodAttributeType.name”, and “foodAttributeType.description”). For survey foods this variable contains any attributes of the ingredients used
“fdcId” A unique identifier given to each food
“dataType” The dataset the food is contained in (of the 4 databases FNDDS, foundation, branded, and SR legacy)
“publicationDate” The day this version of the food as it appears in the data was published to the FoodData Central website

Variables Exclusive to Branded Foods


There are 16 variables exclusive to the branded foods. They are as follows:

Variable Name Variable Description
“modifiedDate” The last date the food was modified by the manufacturer
“availableDate” The date the food was made available to the USDA for entry in the FoodData central database
“marketCountry” The country the food product was sold in, now contains 3 unique entries “United States”, “New Zealand”, and “” (i.e. no marketCountry provided)
“brandOwner” The company who owns the brand that manufactured the food
“gtinUpc” The gtin or UPC barcode associated with the product
“dataSource” The source of the data from the following 3 options “LI” (for Label Insight), “GDSN”, and “NZGDSN” (for GS1 and New Zealand GS1)
“ingredients” The list of ingredients within the product
“servingSize” The serving size specified on the packaging of the food product
“servingSizeUnit” The unit of the serving size if provided in g or ml
“labelNutrients” The nutrient composition of the food as it is provided on the label of the food product
“brandedFoodCategory” The category of the food using the branded food categories
“foodUpdateLog” A nested variable containing variables “foodClass”, “description”, “foodAttributes”, “fdcId”, “dataType”, and “publicationDate” for any previous versions of the specified food
“brandName” The brand name of the product such as “LINDT” or “NEWMAN’S OWN”
“packageWeight” The weight of the food product including packaging
“householdServingFullText” The household serving if provided on the nutrition facts panel such as “1 cup” or “2 bars”
“subbrandName” The secondary brand of the product such as “CHIPS AHOY!” or “Coca-Cola”

Variables Exclusive to Foundation Foods and SR Legacy Foods


The files for foundation and SR foods follow the same structure. There are 7 variables exclusive to foundation and SR foods. They are as follows:

Variable Name Variable Description
“nutrientConversionFactors” A nested list of any conversion factors used to compute the nutrient composition of the food
“isHistoricalReference” A variable containing “FALSE” values for every entry, likely a variable intended for future use
“ndbNumber” The unique nutrient database number given to all foundation and SR legacy foods
“foodCategory” The SR food category associated with the food
“inputFoods” A nested variable containing the variables “id”, “foodDescription”, “inputFood.foodClass”, “inputFood.description”, “inputFood.foodCategory.id”, “inputFood.foodCategory.code”, “inputFood.foodCategory.description”, “inputFood.fdcId”, “inputFood.dataType”, and “inputFood.publicationDate” for all sample foods used to calculate the nutrition composition of the foundation food entry. For all SR entries inputFoods is a blank list with no entries
“foodPortions” A nested variable containing the variables “id”, “measureUnit.id”, “measureUnit.name”, “measureUnit.abbreviation”, “modifier”, “gramWeight”, “sequenceNumber”, and “minYearAcquired” that describe the portion of the food that was sampled for analysis. For all SR foods there is no variable specified as “minYearAcquired”
“scientificName” The scientific name of each food like “Solanum lycopersicum” or “Brassica oleracea (Acephala Group)”

Variables Exclusive to FNDDS (Survey) Foods


There are 6 variables unique to the survey foods. They are as follows:

Variable Name Variable Description
“foodCode” A unique code assigned to each food in FNDDS (multiple foods with different fdc ids can share a food code if they represent the same product made in different ways or by different manufacturers)
“startDate” The day the survey of foods began
“endDate” The day the survey of foods ended
“wweiaFoodCategory” The category of the food using the What We Eat in America(WWEIA) food categories
“inputFoods” A nested variable containing the variables “id”, “unit”, “portionDescription”, “portionCode”, “foodDescription”, “sequenceNumber”, “amount”, “ingredientCode”, “ingredientWeight”, and “ingredientDescription” that describe the ingredient(s) in the food that were used for analysis
“foodPortions” A nested variable containing the variables “id”, “measureUnit.id”, “measureUnit.name”, “measureUnit.abbreviation”, “modifier”, “gramWeight”, “sequenceNumber”, and “portionDescription” that describe the portion of ingredients in the food that were used for analysis
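
The grouping of variables into "common" and "exclusive" sets can be checked mechanically with set operations. The Python sketch below is only an illustration (the analysis itself was done in R), and the variable lists are abbreviated subsets of the tables above rather than the full sets.

```python
# Variable names per file, abbreviated from the tables above for illustration.
COMMON = {"foodClass", "description", "foodNutrients", "foodAttributes",
          "fdcId", "dataType", "publicationDate"}

keys_by_file = {
    "branded": COMMON | {"modifiedDate", "availableDate", "brandOwner"},
    "foundation": COMMON | {"ndbNumber", "inputFoods", "foodPortions"},
    "sr_legacy": COMMON | {"ndbNumber", "inputFoods", "foodPortions"},
    "survey": COMMON | {"foodCode", "inputFoods", "foodPortions"},
}

# Variables present in every file.
common = set.intersection(*keys_by_file.values())


def exclusive_to(name):
    """Variables that appear in the named file and in no other file."""
    others = set().union(
        *(keys for other, keys in keys_by_file.items() if other != name)
    )
    return keys_by_file[name] - others


print(sorted(common))
print(sorted(exclusive_to("branded")))
print(sorted(exclusive_to("survey")))
```

A comparison by name alone reports "inputFoods" and "foodPortions" as shared between the survey file and the foundation and SR files. The tables above list them under both groups because their nested contents differ.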

2.2 Common Variables


For the majority of our analysis we will focus on the variables found in all 4 JSON files. Each file was filtered to contain only the variables “description”, “foodNutrients”, “fdcId”, “dataType”, and “publicationDate”, and then unnested. The variable “foodClass” is excluded because it contains the same information as “dataType”. The nested variable “foodAttributes” is excluded because it contains no information for SR legacy and foundation foods.

For our initial analysis of the overall data we will be using the following variables:

Variable Name Variable Description
fdc_id A unique number for each food in the Food Data Central Database
nutrient_id A unique number given to each nutrient
amount The amount of each nutrient per 100g of the listed food
data_points The number of data points used to derive the nutrient value
min, max, median Minimum, maximum, and median value of nutrition content within sample provided in three separate columns
data_type The type of data based on how it was acquired
food_description The name of the food or a brief description of the food such as “milk, whole”
publication_date The date the food was published to the FoodData Central website
nutrient_name The name of the nutrient
unit_name The unit of each nutrient (g, mg, mcg, IU, etc.)
derivation_description How the food was analyzed for nutrition content
source_id A unique number given to each source of nutrient composition

The table below shows the number of food entries and nutrient entries in each data type.


Table 1: Number of Entries per Data Type
Data Type         nutrient_entries  food_entries
Branded                    5137893        373242
Foundation                   10023           159
SR Legacy                   644125          7793
Survey (FNDDS)              460395          7083
Total                      6252436        388277


Key notes/important takeaways:

  • Since the April update, 15,315 foods with a combined total of 306,038 nutrient entries have been added to branded (an average of 19.98 nutrient entries per food). This average number of nutrients per food is much higher than the overall average of nutrients per food in branded.
  • Since the April update, 19 foods with a combined total of 633 nutrient entries have been added to foundation (an average of 33.32 nutrient entries per food).
  • This version of the data does not include experimental foods. Since we intended to exclude the experimental foods from most of the analysis anyway, this does not pose any real problems.
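
The averages quoted in the notes above are simple ratios of nutrient entries to foods. As a quick arithmetic check in Python (a sketch using only the counts stated in the notes and in Table 1):

```python
# Additions since the April 2021 release, as stated in the notes above.
added = {
    "Branded": {"foods": 15315, "nutrient_entries": 306038},
    "Foundation": {"foods": 19, "nutrient_entries": 633},
}
# Overall Branded totals, taken from Table 1.
branded_total = {"foods": 373242, "nutrient_entries": 5137893}


def entries_per_food(counts):
    """Average number of nutrient entries per food."""
    return counts["nutrient_entries"] / counts["foods"]


for data_type, counts in added.items():
    print(data_type, round(entries_per_food(counts), 2))
print("Branded overall", round(entries_per_food(branded_total), 2))
```

The newly added branded foods average 19.98 nutrient entries each, compared with 13.77 for Branded as a whole.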


3 Accuracy


3.1 Range of Precision for Nutrients


In this section we will look at the range of measurements with information regarding precision.

Nutrients Recorded per Data Type


Below is a table of the units of measurement used for each nutrient per data type.

There are 259 unique nutrient names, but not 259 unique nutrients. Multiple versions of one nutrient are often present in the data such as “Total dietary fiber (AOAC 2011.25)” and “Fiber, total dietary” or “Vitamin A, IU” and “Vitamin A, RAE”. Because of the way the nutrients were recorded, for each unit used to record a nutrient there is a unique nutrient name.



Key notes/important takeaways:

  • “Alcohol, ethyl”, “Caffeine”, “Folate, DFE”, and “Folic acid” are recorded in every data type except for Foundation.
  • “Carbohydrate, by difference” is recorded in all four data types, but “Carbohydrate, by summation” is specific to Foundation and “Carbohydrate, other” is specific to Branded.
  • “Chromium, Cr” is recorded only for foods in Branded.


Source of Nutrient Information per Data Type


Below is a table of the frequency of derivation descriptions by data type.


Table 3: Source per Data Type
Derivation description                                                       SR Legacy  Foundation  Branded  Survey (FNDDS)
Aggregated data involving comb. of codes other then 1,12 or6                       817           -        -               -
Aggregated data involving combinations of source codes 1, 6, 12 and/or 13         7261           -        -               -
Analytical data from the literature, partial documentation                        1075           -        -               -
Analytical or derived from analytical                                           208534        9249        -               -
Assumed zero                                                                     57711           -        -               -
Calculated by manufacturer, not adjusted or rounded for NLEA                     10607           -        -               -
Calculated from nutrient label by NDL                                             5080           -        -               -
Calculated or imputed                                                           166646         774        -               -
Manufacturer's analytical; partial documentation                                  4513           -  5137892               -
Value manufacturer based label claim for added nutrients                           138           -        -               -
NA                                                                              181743           -        1          460395

note: NA = Not Available, unknown, or missing; "-" = no entries with that derivation in the data type


It looks as though the missing data is mostly coming from SR and FNDDS. There are other files that can be downloaded to determine more about the derivation of the FNDDS foods. However, from the documentation of the FNDDS it is noted that all FNDDS nutrition values are taken from a combination of other foods in FDC. We will explore the breakdown of what data FNDDS is derived from further in a later section.

Exactly one nutrient entry in Branded is missing its derivation source information. The food is “156 count, 3.0 oz, Guttenplan’s Frozen Dough” (fdc id 1849609) and the nutrient is “Fatty acids, total trans”. This is likely a clerical error.


Key notes/important takeaways:

  • SR legacy has the most variation in derivation methods because it was pulled together from a large variety of sources
  • We are missing the derivation method for 181743 of the 644125 nutrient entries in SR legacy. In other words, we don’t know how about 28.2% of the data in SR was derived.
  • All derivations in FNDDS are based on derivations for Foundation and SR
  • We know all the derivations for Foundation
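
A frequency table like Table 3, and the share of entries with no derivation, can be built by counting (data type, derivation) pairs. The Python sketch below uses a handful of hypothetical entries to show the counting, then repeats the missing-share calculation with the SR Legacy counts from Table 3; it is not the R code used for the analysis.

```python
from collections import Counter

# Hypothetical (data_type, derivation_description) pairs, one per nutrient
# entry; None marks a missing derivation.
entries = [
    ("SR Legacy", "Analytical or derived from analytical"),
    ("SR Legacy", "Calculated or imputed"),
    ("SR Legacy", None),
    ("Foundation", "Analytical or derived from analytical"),
    ("Survey (FNDDS)", None),
]

# Each key is one cell of the cross-tabulation.
crosstab = Counter(entries)


def missing_share(data_type):
    """Proportion of a data type's entries that have no derivation."""
    total = sum(n for (dt, _), n in crosstab.items() if dt == data_type)
    return crosstab[(data_type, None)] / total


for data_type in ("SR Legacy", "Foundation", "Survey (FNDDS)"):
    print(data_type, round(missing_share(data_type), 3))

# The same ratio using the published SR Legacy counts from Table 3.
print(round(181743 / 644125, 3))
```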


3.2 Average age of measurements


There are many different variables formatted as dates within the FDC data. To get an accurate look at the age of the measurements, we will have to look at several of them.


Publication Date


The one date variable provided for all values is the publication date which represents when each food/nutrient was uploaded to FoodData Central. Below is a table summarizing this variable.


Table 4: Publication Date
n min median max
Branded 5137893 1/28/2021 3/19/2021 9/29/2020
Foundation 10023 10/28/2021 12/16/2019 4/28/2021
SR Legacy 644125 4/1/2019 4/1/2019 4/1/2019
Survey (FNDDS) 460395 10/30/2020 10/30/2020 10/30/2020


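All publication dates fall between 2019 and 2021 because FDC has only been around for 2 years, so this variable tells us little about the age of the measurements. Note also that the min, median, and max columns in Table 4 are not in chronological order (for example, Branded lists a max of 9/29/2020, which is earlier than its min of 1/28/2021). This suggests the dates were compared as text rather than as dates.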
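
Because the publication dates are stored as month/day/year text, a summary is only chronological if the values are parsed as dates first. The Python sketch below (an illustration, not the R code used for the analysis) takes the three Foundation values shown in Table 4 and compares a text sort with a date sort.

```python
from datetime import datetime

# The three Foundation values shown in Table 4, as month/day/year text.
raw = ["10/28/2021", "12/16/2019", "4/28/2021"]


def parse(text):
    """Parse month/day/year text into a date object."""
    return datetime.strptime(text, "%m/%d/%Y").date()


as_text = sorted(raw)                     # lexicographic order
as_dates = sorted(parse(t) for t in raw)  # chronological order

print("text order:", as_text)
print("date order:", [d.isoformat() for d in as_dates])
```

Sorted as text, "10/28/2021" comes first because the character "1" sorts before "4". Sorted as dates, the earliest value is 2019-12-16 and the latest is 2021-10-28.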


Figure 1: Publication Dates


Since the last update in April 2021, a fair number of entries have been added to Foundation and Branded.


Foundation


The date variable associated with the foundation foods is “minYearAcquired”, which tells us when foods in Foundation were purchased or procured for analysis. This date represents the oldest sample for each nutrient entry of each food. Let’s look at the distribution of “minYearAcquired” in Foundation.


Table 5: Minimum Year Acquired
Year n
2000 9
2001 22
2003 2
2006 2
2008 2
2009 7
2010 3
2011 10
2012 4
2013 9
2014 3
2015 9
2016 15
2017 13
2018 6
2019 11
total 127


Despite there being 159 foundation foods, “minYearAcquired” is only specified for 127 samples.


Figure 2: Minimum Year Acquired

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2000    2003    2013    2011    2016    2019


On average these samples were acquired in 2011, but the median “minYearAcquired” is 2013. The distribution is being skewed by the amount of data with a “minYearAcquired” of 2001. There seems to be a general upward trend since 2008, with evident drops in input in years 2010, 2012, 2014, and 2018.
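
The mean and median quoted above can be reproduced from the counts in Table 5 by expanding each year by its count. A short Python check (a sketch; the analysis itself was done in R):

```python
from statistics import mean, median

# Counts of "minYearAcquired" per year, copied from Table 5.
year_counts = {
    2000: 9, 2001: 22, 2003: 2, 2006: 2, 2008: 2, 2009: 7, 2010: 3, 2011: 10,
    2012: 4, 2013: 9, 2014: 3, 2015: 9, 2016: 15, 2017: 13, 2018: 6, 2019: 11,
}

# Expand the table into one value per sample.
years = [year for year, n in year_counts.items() for _ in range(n)]

print("samples:", len(years))
print("mean:", round(mean(years)))
print("median:", median(years))
```

The 31 samples from 2000 and 2001 pull the mean (about 2011) below the median (2013).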


FNDDS


For FNDDS we will have to look at both the start date and end date of each sample. The same date is in each entry of start_date and end_date. All samples started on “2017-01-01” and ended on “2018-12-31.” This data means practically nothing to us due to the fact that all FNDDS nutrient calculations are based off of nutrition information in SR and Foundation.


Branded


For branded data we’ll look at both “modifiedDate”, which is the last date the food was altered by the manufacturer, and “availableDate”, which is the date the food was made available for inclusion in the database.


For modified date we have:

##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
## "2013-06-05" "2018-02-17" "2019-04-17" "2019-05-29" "2020-09-21" "2021-09-29" 
##         NA's 
##          "7"


For available date we have:

##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
## "2013-06-05" "2018-02-17" "2019-04-17" "2019-05-29" "2020-09-21" "2021-09-29" 
##         NA's 
##          "7"


Figure 3: Branded Foods Dates


This tells us more about how long food products can sit around before being sold than it does about when the nutrition information was gathered. You’ll notice that these two sets of data are identical. They should not be, and they were not in previous versions of the data. To see what the previous version of this variable looks like, visit the analysis of the April 2021 release (https://rpubs.com/Em_Mari3/FDC).


Key notes/important takeaways:

  • SR legacy is still using some measurements from as far back as 1976
  • We are missing 1493 of the dates for SR legacy’s 644125 nutrient entries, which is less than one percent.
  • We are missing 1932 of the dates for Foundation’s 9390 nutrient entries, meaning we are not sure when about 20.6% of this data came from. This is a lot more concerning.
  • All dates from FNDDS are listed from “2017-01-01” to “2018-12-31” which are the dates they put the numbers together, not when the data was actually collected.
  • modifiedDate and availableDate should hold different information, and did in past versions of the data; it appears availableDate has been overwritten with data from modifiedDate


4 Completeness


4.1 Average number of nutrients per data type


Below you’ll find a table of count data relating to how many nutrients were recorded per food in each data type. “Total” is the total number of nutrient entries in that data type, “Average” is the average number of nutrient entries per food, and “Min” and “Max” are the minimum and maximum number of nutrient entries associated with a single food. Foods with no nutrient information at all (primarily the 681 foods in Branded, discussed further in the missing data section) are not included in the minimum because we are focusing on provided nutrient information. For example, for Foundation the most nutrients stated for a food was 159 and the fewest was 13; if we picked a Foundation food at random, we would expect to know the values of about 63 nutrients for that food.


Table 6: Nutrients Listed per Food
Total Average Min Max
Branded 5137893 13.77 1 48
Foundation 10023 63.04 13 159
SR Legacy 644125 82.65 8 138
Survey (FNDDS) 460395 65.00 65 65


Key notes/important takeaways:

  • There are foods in Branded where we only have information on one nutrient
  • SR legacy has only 8 nutrient entries for at least one food
  • On average, we have estimates for fewer than 14 nutrients per food in Branded
  • All FNDDS food entries contain information on exactly 65 nutrients
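
The per-food summary in Table 6 amounts to counting nutrient entries per fdc id and then summarizing those counts within each data type. The Python sketch below shows the idea on a few hypothetical entries; it is an illustration, not the R code used for the analysis.

```python
from collections import Counter, defaultdict

# Hypothetical (data_type, fdc_id) pairs, one per nutrient entry.
entries = [
    ("Branded", 1), ("Branded", 1), ("Branded", 2),
    ("Foundation", 3), ("Foundation", 3), ("Foundation", 3), ("Foundation", 4),
]

# Number of nutrient entries for each food.
per_food = Counter(entries)

# Group the per-food counts by data type.
by_type = defaultdict(list)
for (data_type, _fdc_id), n in per_food.items():
    by_type[data_type].append(n)

summary = {
    data_type: {
        "total": sum(counts),
        "average": round(sum(counts) / len(counts), 2),
        "min": min(counts),
        "max": max(counts),
    }
    for data_type, counts in by_type.items()
}

for data_type, stats in summary.items():
    print(data_type, stats)
```

A food with no nutrient entries never appears in a list like this, which is why such foods cannot contribute to the minimum.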


4.2 Most Frequent Entries


Even though not all measurements are technically nutrients, I will refer to them all as nutrients because the column of component names is labeled “nutrient_name.” Below is a plot of the 15 most used nutrient names overall. As you can see, the large number of entries in Branded gives it a lot of bearing on the overall frequencies.


Figure 4: 15 most Frequently Reported Nutrient Names Overall


Below are 6 plots. The three plots on the left depict the 15 most used nutrient names in each data type. The three plots on the right depict the frequencies of those same 15 nutrient names across all data types. We will be skipping FNDDS in this case because it has the exact same frequency for each of the 65 nutrients it uses. Nutrient names in the plots below are ordered from left to right by the frequency of each nutrient name in the specified database.


Figure 5: 15 most Frequently Reported Nutrient Names per Data Type


Key notes/important takeaways:

  • Energy is reported most in Foundation and SR legacy by a large margin
  • Protein is reported most in Branded; both “Protein” and “Carbohydrate, by difference” rank above energy, which is certainly unexpected.
  • While branded has a large amount of variance in the different frequencies of nutrient names, both SR legacy and Foundation have a large drop after energy and then become fairly consistent
  • Branded has a tendency not to record ash and water content of most foods even though both are quite frequent in SR and Foundation
  • Fiber did not make it into the top 15 in either SR legacy or Foundation


4.3 Missing Data


Note that any variable omitted from the table below had no missing values.

Below is a table of missing or “NA” values for each applicable variable.


Table 7: Count of Missing Values per Data Type
nutrient_rank derivation_code derivation_description source_id source_code source_description data_points median max min
Branded 1166 1 1 1 1 1 5137893 5137893 5137893 5137893
Foundation 0 0 0 0 0 0 1998 1876 2274 2274
SR Legacy 0 181743 181743 181743 181743 181743 0 644125 547232 547234
Survey (FNDDS) 0 460395 460395 460395 460395 460395 460395 460395 460395 460395
total 1,166 642,139 642,139 642,139 642,139 642,139 5,600,286 6,244,289 6,147,794 6,147,796


Formatted as proportions of all nutrient entries in each data type (1.000 means every value is missing), we have:


Table 8: Percentage of Missing Values per Data Type
nutrient_rank derivation_code derivation_description source_id source_code source_description data_points median max min
Branded 0 0.000 0.000 0.000 0.000 0.000 1.000 1.000 1.000 1.000
Foundation 0 0.000 0.000 0.000 0.000 0.000 0.199 0.187 0.227 0.227
SR Legacy 0 0.282 0.282 0.282 0.282 0.282 0.000 1.000 0.850 0.850
Survey (FNDDS) 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000


You’ll notice that the nutrients that were used most have the most missing information. Further analysis must be done to draw any true conclusions from this data due to the large number of nutrients we have information on. For now we will have to look at the Essential Nutrients and work from there to draw conclusions.


Key notes/important takeaways:

  • The amount of data missing for each variable seems highly correlated with the data type the entry is associated with. This makes sense because different data types source their information differently.
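
Tables 7 and 8 count missing values per variable within each data type and then divide by the number of nutrient entries in that data type. A minimal Python sketch of that calculation, using a few hypothetical rows (not the R code used for the analysis):

```python
from collections import defaultdict

# Hypothetical flattened nutrient entries; None marks a missing value.
rows = [
    {"data_type": "Foundation", "median": 1.2, "min": 1.0},
    {"data_type": "Foundation", "median": None, "min": None},
    {"data_type": "SR Legacy", "median": None, "min": 0.4},
    {"data_type": "SR Legacy", "median": None, "min": None},
]


def missing_by_type(rows, columns):
    """Return {data_type: {column: (missing count, missing proportion)}}."""
    totals = defaultdict(int)
    missing = defaultdict(lambda: defaultdict(int))
    for row in rows:
        data_type = row["data_type"]
        totals[data_type] += 1
        for column in columns:
            if row.get(column) is None:
                missing[data_type][column] += 1
    return {
        data_type: {
            column: (
                missing[data_type][column],
                round(missing[data_type][column] / total, 3),
            )
            for column in columns
        }
        for data_type, total in totals.items()
    }


result = missing_by_type(rows, ["median", "min"])
for data_type, columns in result.items():
    print(data_type, columns)

# The same ratio with published counts: Foundation data_points in Tables 7 and 1.
print(round(1998 / 10023, 3))
```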


5 Frequency of Essential Nutrients


There are certain nutrients that are essential to maintaining life. For the data in FDC to be complete, each food item would need to have an entry for each of these nutrients. This is not the case.

Note: In all tables below, data types are excluded if they contain no relevant entries. A table entry of “-” means that there were no entries for the listed nutrient in that data type.

All percentage values displayed in plots are rounded to two decimal places, all percentage values displayed in tables are rounded to four decimal places.

5.1 Vitamins


Below is a table of the frequency of occurrences of each essential vitamin per data type:


Table 9: Essential Vitamins per Data Type
Count
                    Branded  Foundation  SR Legacy  Survey (FNDDS)
Vitamin A            187059          53       7386            7083
Thiamin               21753         103       7402            7083
Riboflavin            20811         102       7421            7083
Niacin                23037         109       7402            7083
Pantothenic acid       5001          56       6376               -
Vitamin B-6           14216         109       7262            7083
Biotin                  557          22          -               -
Folate                14215          76       6877            7083
Vitamin B-12          10533          37       7113            7083
Vitamin C            196864          42       7332            7083
Vitamin D            112902         147      12323            7083
Vitamin E              3490          61       5586            7083
Vitamin K              2314          44       5055            7083
Choline                 214          28       4612            7083
Percentage
                    Branded  Foundation  SR Legacy  Survey (FNDDS)
Vitamin A            0.5012      0.3333     0.9478               1
Thiamin              0.0583      0.6478     0.9498               1
Riboflavin           0.0558      0.6415     0.9523               1
Niacin               0.0617      0.6855     0.9498               1
Pantothenic acid     0.0134      0.3522     0.8182               -
Vitamin B-6          0.0381      0.6855     0.9319               1
Biotin               0.0015      0.1384          -               -
Folate               0.0381      0.4780     0.8825               1
Vitamin B-12         0.0282      0.2327     0.9127               1
Vitamin C            0.5274      0.2642     0.9408               1
Vitamin D            0.3025      0.9245     1.5813               1
Vitamin E            0.0094      0.3836     0.7168               1
Vitamin K            0.0062      0.2767     0.6487               1
Choline              0.0006      0.1761     0.5918               1


Below you’ll find two plots. The first shows the count data as it appears above. The second shows the proportion of foods in each data type that contain each of the essential vitamins listed above; the number above each bar is that proportion. For example, Foundation has a value of 0.65 for thiamin, which means that 65% of foods in the Foundation data contain information on thiamin content. Note that the vitamin D counts sum the entries for every form of vitamin D listed in Table 12, so a single food can be counted more than once; this is why the SR Legacy value (1.5813) exceeds 1. All following sections within “Frequency of Essential Nutrients” will have plots that can be interpreted in the same manner.


Figure 6: Count of Essential Vitamins


Figure 7: Percentage of Foods Containing Essential Vitamins per Data Type

Since FNDDS entries all contain information on the same 65 nutrients, the percentage of FNDDS foods that contain information on a given nutrient will always be either 1 or 0.
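
The values in the percentage half of Table 9 are the counts divided by the number of foods in each data type (Table 1). The Python sketch below reproduces two of them and shows how a value above 1 can arise when the entries for several forms of one vitamin are summed; it is a check on the published numbers, not the R code used for the analysis.

```python
# Number of foods per data type, from Table 1.
FOODS = {"Branded": 373242, "Foundation": 159, "SR Legacy": 7793, "Survey (FNDDS)": 7083}


def share(count, data_type):
    """Entries for a nutrient divided by the number of foods in the data type."""
    return round(count / FOODS[data_type], 4)


print("Thiamin, Foundation:", share(103, "Foundation"))
print("Vitamin A, Branded:", share(187059, "Branded"))

# SR Legacy entries for each form of vitamin D, from Table 12.
vitamin_d_forms = [5185, 5181, 138, 1819]
print("Vitamin D, SR Legacy:", sum(vitamin_d_forms), share(sum(vitamin_d_forms), "SR Legacy"))
```

Counting distinct foods that have at least one entry for any form of the vitamin, instead of summing entries, would keep every value between 0 and 1.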


Vitamins With Multiple Forms


Several of the essential vitamins are recorded in multiple forms. Below is a breakdown of the different forms of these entries, with a separate table for each vitamin giving the counts of each type of entry associated with it.


Vitamin A:
Table 10: A Vitamins per Data Type
Count
                  Branded  Foundation  SR Legacy  Survey (FNDDS)
Vitamin A, IU      187059           -       7356               -
Retinol                 -          27       6788            7083
Vitamin A, RAE          -          53       6918            7083
Percentage
                  Branded  Foundation  SR Legacy  Survey (FNDDS)
Vitamin A, IU      0.5012           -     0.9439               -
Retinol                 -      0.1698     0.8710               1
Vitamin A, RAE          -      0.3333     0.8877               1
Vitamin B9:
Table 11: B9 Vitamins per Data Type
Count
                                         Branded  Foundation  SR Legacy  Survey (FNDDS)
Folate, DFE                                  201           -       6482            7083
Folate, total                               7646          76       6851            7083
Folic acid                                  7617           -       6500            7083
10-Formyl folic acid (10HCOFA)                 -           1          -               -
5-Formyltetrahydrofolic acid (5-HCOH4          -           1          -               -
5-methyl tetrahydrofolate (5-MTHF)             -           1          -               -
Folate, food                                   -           -       6722            7083
Percentage
                                         Branded  Foundation  SR Legacy  Survey (FNDDS)
Folate, DFE                               0.0005           -     0.8318               1
Folate, total                             0.0205      0.4780     0.8791               1
Folic acid                                0.0204           -     0.8341               1
10-Formyl folic acid (10HCOFA)                 -      0.0063          -               -
5-Formyltetrahydrofolic acid (5-HCOH4          -      0.0063          -               -
5-methyl tetrahydrofolate (5-MTHF)             -      0.0063          -               -
Folate, food                                   -           -     0.8626               1
Vitamin D:
Table 12: D Vitamins per Data Type
Count
                                            Branded  Foundation  SR Legacy  Survey (FNDDS)
Vitamin D (D2 + D3)                              32          37       5185            7083
Vitamin D (D2 + D3), International Units     112843          37       5181               -
Vitamin D2 (ergocalciferol)                       1          32        138               -
Vitamin D3 (cholecalciferol)                     26          26       1819               -
25-hydroxycholecalciferol                         -          15          -               -
Percentage
                                            Branded  Foundation  SR Legacy  Survey (FNDDS)
Vitamin D (D2 + D3)                          0.0001      0.2327     0.6653               1
Vitamin D (D2 + D3), International Units     0.3023      0.2327     0.6648               -
Vitamin D2 (ergocalciferol)                  0.0000      0.2013     0.0177               -
Vitamin D3 (cholecalciferol)                 0.0001      0.1635     0.2334               -
25-hydroxycholecalciferol                         -      0.0943          -               -
Vitamin E:
Table 13: E Vitamins per Data Type
Count
                                     Branded  Foundation  SR Legacy  Survey (FNDDS)
Vitamin E                               1184           -          -               -
Vitamin E (alpha-tocopherol)             333          61       5580            7083
Vitamin E (label entry primarily)       1973           -          -               -
Tocopherol, beta                           -          61       1890               -
Tocopherol, delta                          -          61       1872               -
Tocopherol, gamma                          -          61       1888               -
Tocotrienol, alpha                         -          60       1463               -
Tocotrienol, beta                          -          60       1477               -
Tocotrienol, delta                         -          60       1461               -
Tocotrienol, gamma                         -          60       1466               -
Vitamin E, added                           -           -       4616            7083
Percentage
                                     Branded  Foundation  SR Legacy  Survey (FNDDS)
Vitamin E                             0.0032           -          -               -
Vitamin E (alpha-tocopherol)          0.0009      0.3836     0.7160               1
Vitamin E (label entry primarily)     0.0053           -          -               -
Tocopherol, beta                           -      0.3836     0.2425               -
Tocopherol, delta                          -      0.3836     0.2402               -
Tocopherol, gamma                          -      0.3836     0.2423               -
Tocotrienol, alpha                         -      0.3774     0.1877               -
Tocotrienol, beta                          -      0.3774     0.1895               -
Tocotrienol, delta                         -      0.3774     0.1875               -
Tocotrienol, gamma                         -      0.3774     0.1881               -
Vitamin E, added                           -           -     0.5923               1
Vitamin K:
Table 14: K Vitamins per Data Type
Count
                                    Branded  Foundation  SR Legacy  Survey (FNDDS)
Vitamin K (phylloquinone)              2314          44       5054            7083
Vitamin K (Dihydrophylloquinone)          -          38       1419               -
Vitamin K (Menaquinone-4)                 -          32        606               -
Percentage
                                    Branded  Foundation  SR Legacy  Survey (FNDDS)
Vitamin K (phylloquinone)            0.0062      0.2767     0.6485               1
Vitamin K (Dihydrophylloquinone)          -      0.2390     0.1821               -
Vitamin K (Menaquinone-4)                 -      0.2013     0.0778               -
Choline:
Table 15: Choline per Data Type
Count
                                       Branded  Foundation  SR Legacy  Survey (FNDDS)
Choline, from phosphotidyl choline           1          28          -               -
Choline, total                             213          28       4611            7083
Betaine                                      -          28       2091               -
Choline, free                                -          28          -               -
Choline, from glycerophosphocholine          -          28          -               -
Choline, from phosphocholine                 -          28          -               -
Choline, from sphingomyelin                  -          28          -               -
Percentage
                                       Branded  Foundation  SR Legacy  Survey (FNDDS)
Choline, from phosphotidyl choline      0.0000      0.1761          -               -
Choline, total                          0.0006      0.1761     0.5917               1
Betaine                                      -      0.1761     0.2683               -
Choline, free                                -      0.1761          -               -
Choline, from glycerophosphocholine          -      0.1761          -               -
Choline, from phosphocholine                 -      0.1761          -               -
Choline, from sphingomyelin                  -      0.1761          -               -


Key notes/important takeaways:

  • Vitamin names and measurements vary wildly between data types
  • Branded foods are supposed to have information on vitamin A, vitamin C, and vitamin D. Because of the number of branded foods, this inflates the overall counts of those vitamins, yet even the most frequent vitamin entry (vitamin C) appears in fewer than 55% of branded foods
  • SR Legacy has the highest proportions of entries for many of the vitamins but has fewer types of vitamin entries than Foundation.
  • There are no Biotin entries in SR legacy or FNDDS
  • There are no Pantothenic acid entries in FNDDS
  • Excluding FNDDS, there is no one essential vitamin that every food has a value for in any of the data types.


min, max, median


Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrients per data type.


Table 16: Count of Essential Vitamins
min
                             Foundation  SR Legacy
Vitamin A                            20        329
Thiamin                             101       1370
Riboflavin                          102       1402
Niacin                              109       1386
Pantothenic acid                     51       1145
Vitamin B-6                         108       1297
Biotin                               22          -
Folate                               71        769
Vitamin B-12                         36        661
Vitamin C (ascorbic acid)            39        655
Vitamin D                            29        136
Vitamin E                            57        942
Vitamin K                            38       1034
Choline                               5        231
max
                             Foundation  SR Legacy
Vitamin A                            20        329
Thiamin                             101       1370
Riboflavin                          102       1402
Niacin                              109       1386
Pantothenic acid                     51       1145
Vitamin B-6                         108       1297
Biotin                               22          -
Folate                               71        769
Vitamin B-12                         36        661
Vitamin C (ascorbic acid)            39        655
Vitamin D                            29        136
Vitamin E                            57        942
Vitamin K                            38       1034
Choline                               5        231
median
                             Foundation
Vitamin A                            27
Thiamin                             103
Riboflavin                          102
Niacin                              109
Pantothenic acid                     56
Vitamin B-6                         109
Biotin                               22
Folate                               76
Vitamin B-12                         37
Vitamin C (ascorbic acid)            42
Vitamin D                            37
Vitamin E                            61
Vitamin K                            44
Choline                              28


Table 17: Percentage of Entries with Essential Vitamins
min
                             Foundation  SR Legacy
Vitamin A                        0.1258     0.0422
Thiamin                          0.6352     0.1758
Riboflavin                       0.6415     0.1799
Niacin                           0.6855     0.1779
Pantothenic acid                 0.3208     0.1469
Vitamin B-6                      0.6792     0.1664
Biotin                           0.1384          -
Folate                           0.4465     0.0987
Vitamin B-12                     0.2264     0.0848
Vitamin C (ascorbic acid)        0.2453     0.0840
Vitamin D                        0.1824     0.0175
Vitamin E                        0.3585     0.1209
Vitamin K                        0.2390     0.1327
Choline                          0.0314     0.0296
max
                             Foundation  SR Legacy
Vitamin A                        0.1258     0.0422
Thiamin                          0.6352     0.1758
Riboflavin                       0.6415     0.1799
Niacin                           0.6855     0.1779
Pantothenic acid                 0.3208     0.1469
Vitamin B-6                      0.6792     0.1664
Biotin                           0.1384          -
Folate                           0.4465     0.0987
Vitamin B-12                     0.2264     0.0848
Vitamin C (ascorbic acid)        0.2453     0.0840
Vitamin D                        0.1824     0.0175
Vitamin E                        0.3585     0.1209
Vitamin K                        0.2390     0.1327
Choline                          0.0314     0.0296
median
                             Foundation
Vitamin A                        0.1698
Thiamin                          0.6478
Riboflavin                       0.6415
Niacin                           0.6855
Pantothenic acid                 0.3522
Vitamin B-6                      0.6855
Biotin                           0.1384
Folate                           0.4780
Vitamin B-12                     0.2327
Vitamin C (ascorbic acid)        0.2642
Vitamin D                        0.2327
Vitamin E                        0.3836
Vitamin K                        0.2767
Choline                          0.1761


Key notes/important takeaways:

  • It will soon become obvious that the variables “min”, “max”, and “median” are usually only specified for SR legacy and Foundation, the median often only being available for Foundation. There will be exceptions to this.


5.2 Minerals


Below you’ll find a table displaying the number of essential mineral entries in each data type.


Table 18: Essential Minerals per Data Type
Count Percentage
Branded Foundation SR Legacy Survey (FNDDS)
Calcium 307476 147 7708 7083
Chromium 241
Copper 4002 147 7284 7083
Iodine 1478 23
Iron 308329 147 7713 7083
Magnesium 11812 147 7421 7083
Manganese 4130 147 6492
Molybendum 269 34
Phosphorus 11983 147 7467 7083
Potassium 155188 147 7516 7083
Selenium 2569 92 6865 7083
Zinc 9346 147 7406 7083
Chloride 275
Sodium 370373 129 7709 7083
Branded Foundation SR Legacy Survey (FNDDS)
Calcium 0.8238 0.9245 0.9891 1
Chromium 0.0006
Copper 0.0107 0.9245 0.9347 1
Iodine 0.0040 0.1447
Iron 0.8261 0.9245 0.9897 1
Magnesium 0.0316 0.9245 0.9523 1
Manganese 0.0111 0.9245 0.8331
Molybdenum 0.0007 0.2138
Phosphorus 0.0321 0.9245 0.9582 1
Potassium 0.4158 0.9245 0.9645 1
Selenium 0.0069 0.5786 0.8809 1
Zinc 0.0250 0.9245 0.9503 1
Chloride 0.0007
Sodium 0.9923 0.8113 0.9892 1


Figure 8: Count of Essential Minerals


Figure 9: Percentage of Foods Containing Essential Minerals


Key notes/important takeaways:

  • Chromium and Chloride are only listed in Branded, which is unusual
  • Sodium, Calcium, Iron, and Potassium have high proportions in Branded because they often appear on nutrition facts panels
  • Although it’s very close, SR Legacy does not specify values for calcium, iron, and sodium for every food.
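The count and percentage tables in this section follow the same pattern: count the distinct foods in a data type that list a nutrient, then divide by the number of foods in that data type. Below is a minimal sketch of that calculation in base R. The long-format table `nutrients` and all of its values are invented for illustration, and in practice the denominator should come from the full food list so that foods with no nutrient entries are still counted.

```r
# Toy long-format nutrient table; every value here is made up.
nutrients <- data.frame(
  fdc_id    = c(1, 1, 2, 3, 3, 4),
  data_type = c("Branded", "Branded", "Branded",
                "SR Legacy", "SR Legacy", "SR Legacy"),
  nutrient  = c("Sodium", "Calcium", "Sodium", "Sodium", "Calcium", "Calcium"),
  stringsAsFactors = FALSE
)

# Denominator: number of distinct foods per data type.
foods_per_type <- tapply(nutrients$fdc_id, nutrients$data_type,
                         function(ids) length(unique(ids)))

# Numerator: distinct foods per data type that list each nutrient.
counts <- tapply(nutrients$fdc_id,
                 list(nutrients$nutrient, nutrients$data_type),
                 function(ids) length(unique(ids)))
counts[is.na(counts)] <- 0  # nutrient never listed in that data type

# Divide each column by its data type's food total.
totals <- as.numeric(foods_per_type[colnames(counts)])
percentages <- sweep(counts, 2, totals, "/")
print(round(percentages, 4))
```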


min, max median


Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrients per data type.


Table 19: Count of Entries with Essential Minerals
min max median
Foundation SR Legacy Branded Survey (FNDDS)
Calcium 143 1703
Chromium
Copper 143 1629
Iodine 23
Iron 143 1690
Magnesium 147 7421 11812 7083
Manganese 143 1628
Molybdenum 34
Phosphorus 143 1663
Potassium 143 1678
Selenium 85 1159
Zinc 143 1659
Chloride
Sodium 125 1505
Foundation SR Legacy Branded Survey (FNDDS)
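Calcium 143 1703
Chromium
Copper 143 1629
Iodine 23
Iron 143 1690
Magnesium 147 7421 11812 7083
Manganese 143 1628
Molybdenum 34
Phosphorus 143 1663
Potassium 143 1678
Selenium 85 1159
Zinc 143 1659
Chloride
Sodium 125 1505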
Foundation Branded SR Legacy Survey (FNDDS)
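Calcium 147
Chromium
Copper 147
Iodine 23
Iron 147
Magnesium 147 11812 7421 7083
Manganese 147
Molybdenum 34
Phosphorus 147
Potassium 147
Selenium 92
Zinc 147
Chloride
Sodium 129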


Table 20: Percentage of Entries with Essential Minerals
min max median
minerals Foundation SR Legacy Branded Survey (FNDDS)
Calcium 0.8994 0.2185
Chromium
Copper 0.8994 0.2090
Iodine 0.1447
Iron 0.8994 0.2169
Magnesium 0.9245 0.9523 0.0316 1
Manganese 0.8994 0.2089
Molybdenum 0.2138
Phosphorus 0.8994 0.2134
Potassium 0.8994 0.2153
Selenium 0.5346 0.1487
Zinc 0.8994 0.2129
Chloride
Sodium 0.7862 0.1931
Foundation SR Legacy Branded Survey (FNDDS)
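Calcium 0.8994 0.2185
Chromium
Copper 0.8994 0.2090
Iodine 0.1447
Iron 0.8994 0.2169
Magnesium 0.9245 0.9523 0.0316 1
Manganese 0.8994 0.2089
Molybdenum 0.2138
Phosphorus 0.8994 0.2134
Potassium 0.8994 0.2153
Selenium 0.5346 0.1487
Zinc 0.8994 0.2129
Chloride
Sodium 0.7862 0.1931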
Foundation Branded SR Legacy Survey (FNDDS)
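Calcium 0.9245
Chromium
Copper 0.9245
Iodine 0.1447
Iron 0.9245
Magnesium 0.9245 0.0316 0.9523 1
Manganese 0.9245
Molybdenum 0.2138
Phosphorus 0.9245
Potassium 0.9245
Selenium 0.5786
Zinc 0.9245
Chloride
Sodium 0.8113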


Key notes/important takeaways:

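  • Magnesium has min, max, and median values in every data type, which is extremely unusual and which we cannot explain. The counts match the total magnesium counts in Table 18 exactly (147, 7421, 11812, and 7083), so these values are reported for every magnesium entry.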


5.3 Amino Acids


Below you’ll find a table of the number of essential amino acid entries that exist for each data type.

Table 21: Amino Acids per Data Type
Count Percentage
Branded Foundation SR Legacy
Histidine 53 32 5076
Isoleucine 50 32 5084
Leucine 51 32 5083
Lysine 53 32 5097
Methionine 53 32 5096
Cysteine 23 14
Methionine + Cysteine 23 14
Phenylalanine 53 32 5079
Tyrosine 53 32 5049
Phenylalanine + Tyrosine 49 32 5048
Threonine 52 32 5080
Tryptophan 50 32 5030
Valine 52 32 5083
Branded Foundation SR Legacy
Histidine 1e-04 0.2013 0.6514
Isoleucine 1e-04 0.2013 0.6524
Leucine 1e-04 0.2013 0.6523
Lysine 1e-04 0.2013 0.6540
Methionine 1e-04 0.2013 0.6539
Cysteine 1e-04 0.0881
Methionine + Cysteine 1e-04 0.0881
Phenylalanine 1e-04 0.2013 0.6517
Tyrosine 1e-04 0.2013 0.6479
Phenylalanine + Tyrosine 1e-04 0.2013 0.6478
Threonine 1e-04 0.2013 0.6519
Tryptophan 1e-04 0.2013 0.6455
Valine 1e-04 0.2013 0.6523


Figure 10: Count of Essential Amino Acids


Figure 11: Percentage of Foods Containing Essential Amino Acids


Key notes/important takeaways:

  • In the vast majority of cases, a food that has an entry for one of these amino acids has entries for all of them. In the few exceptions one or more is missing, most often tryptophan.
  • Foods added to Foundation since the April update do not appear to contain entries for amino acids, so the proportion of Foundation foods containing amino acids has gone down.
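One way to check the "all or nothing" pattern described in the first note is to count how many amino acid columns are filled in for each food. The sketch below does this on a toy wide table. The data frame `aa`, its three amino acid columns, and its values are placeholders, not data from FDC.

```r
# Toy wide table: one row per food, one column per amino acid (NA = no entry).
aa <- data.frame(
  fdc_id     = 1:4,
  Histidine  = c(0.5, NA, 0.3, 0.2),
  Leucine    = c(1.1, NA, 0.9, 0.7),
  Tryptophan = c(0.2, NA, NA, 0.1)
)
aa_cols <- setdiff(names(aa), "fdc_id")

# Number of amino acid columns that are filled in for each food.
n_present <- rowSums(!is.na(aa[aa_cols]))

has_any <- n_present > 0
has_all <- n_present == length(aa_cols)
partial <- has_any & !has_all  # foods with some, but not all, amino acids

# Among the partial foods, count how often each amino acid is missing.
missing_in_partial <- colSums(is.na(aa[partial, aa_cols, drop = FALSE]))
print(missing_in_partial)
```

A large count for a single column in `missing_in_partial` would point to the kind of exception noted above for tryptophan.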


min, max median


Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrients per data type.


Table 22: Count of Entries with Amino Acids
min max median
Foundation SR Legacy
Histidine 29 7
Isoleucine 29 7
Leucine 29 7
Lysine 29 7
Methionine 29 7
Cysteine 12
Methionine + Cysteine 12
Phenylalanine 29 7
Tyrosine 29 7
Phenylalanine + Tyrosine 29 7
Threonine 29 7
Tryptophan 29 7
Valine 29 7
Foundation SR Legacy
Histidine 29 7
Isoleucine 29 7
Leucine 29 7
Lysine 29 7
Methionine 29 7
Cysteine 12
Methionine + Cysteine 12
Phenylalanine 29 7
Tyrosine 29 7
Phenylalanine + Tyrosine 29 7
Threonine 29 7
Tryptophan 29 7
Valine 29 7
Foundation
Histidine 32
Isoleucine 32
Leucine 32
Lysine 32
Methionine 32
Cysteine 14
Methionine + Cysteine 14
Phenylalanine 32
Tyrosine 32
Phenylalanine + Tyrosine 32
Threonine 32
Tryptophan 32
Valine 32


Table 23: Percentage of Entries with Amino Acids
min max median
Foundation SR Legacy
Histidine 0.1824 9e-04
Isoleucine 0.1824 9e-04
Leucine 0.1824 9e-04
Lysine 0.1824 9e-04
Methionine 0.1824 9e-04
Cysteine 0.0755
Methionine + Cysteine 0.0755
Phenylalanine 0.1824 9e-04
Tyrosine 0.1824 9e-04
Phenylalanine + Tyrosine 0.1824 9e-04
Threonine 0.1824 9e-04
Tryptophan 0.1824 9e-04
Valine 0.1824 9e-04
Foundation SR Legacy
Histidine 0.1824 9e-04
Isoleucine 0.1824 9e-04
Leucine 0.1824 9e-04
Lysine 0.1824 9e-04
Methionine 0.1824 9e-04
Cysteine 0.0755
Methionine + Cysteine 0.0755
Phenylalanine 0.1824 9e-04
Tyrosine 0.1824 9e-04
Phenylalanine + Tyrosine 0.1824 9e-04
Threonine 0.1824 9e-04
Tryptophan 0.1824 9e-04
Valine 0.1824 9e-04
Foundation
Histidine 0.2013
Isoleucine 0.2013
Leucine 0.2013
Lysine 0.2013
Methionine 0.2013
Cysteine 0.0881
Methionine + Cysteine 0.0881
Phenylalanine 0.2013
Tyrosine 0.2013
Phenylalanine + Tyrosine 0.2013
Threonine 0.2013
Tryptophan 0.2013
Valine 0.2013


5.4 Omega 3


Table 24: Omega 3 Fatty Acids per Data Type
Count Percentage
Branded Foundation SR Legacy Survey (FNDDS)
PUFA 18:3 n-3 c,c,c (ALA) 4 62 1967
PUFA 20:5 n-3 (EPA) 61 5800 7083
PUFA 22:5 n-3 (DPA) 59 5756 7083
PUFA 22:6 n-3 (DHA) 61 5772 7083
Branded Foundation SR Legacy Survey (FNDDS)
PUFA 18:3 n-3 c,c,c (ALA) 0 0.3899 0.2524
PUFA 20:5 n-3 (EPA) 0.3836 0.7443 1
PUFA 22:5 n-3 (DPA) 0.3711 0.7386 1
PUFA 22:6 n-3 (DHA) 0.3836 0.7407 1


Figure 12: Count and Percentage of Foods Containing Omega 3 Fatty Acids


min, max median


Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrients per data type.


Table 25: Count of Entries with Omega 3 Fatty Acids
min max median
Foundation SR Legacy
18:3 n-3 c,c,c (ALA) 62 946
20:5 n-3 (EPA) 61 957
22:5 n-3 (DPA) 59 927
22:6 n-3 (DHA) 61 939
Foundation SR Legacy
18:3 n-3 c,c,c (ALA) 62 946
20:5 n-3 (EPA) 61 957
22:5 n-3 (DPA) 59 927
22:6 n-3 (DHA) 61 939
Foundation
18:3 n-3 c,c,c (ALA) 62
20:5 n-3 (EPA) 61
22:5 n-3 (DPA) 59
22:6 n-3 (DHA) 61


Table 26: Percentage of Entries with Omega 3 Fatty Acids
min max median
Foundation SR Legacy
18:3 n-3 c,c,c (ALA) 0.3899 0.1214
20:5 n-3 (EPA) 0.3836 0.1228
22:5 n-3 (DPA) 0.3711 0.1190
22:6 n-3 (DHA) 0.3836 0.1205
Foundation SR Legacy
18:3 n-3 c,c,c (ALA) 0.3899 0.1214
20:5 n-3 (EPA) 0.3836 0.1228
22:5 n-3 (DPA) 0.3711 0.1190
22:6 n-3 (DHA) 0.3836 0.1205
Foundation
18:3 n-3 c,c,c (ALA) 0.3899
20:5 n-3 (EPA) 0.3836
22:5 n-3 (DPA) 0.3711
22:6 n-3 (DHA) 0.3836


5.5 Omega 6


Table 27: Omega 6 Fatty Acids per Data Type
Count Percentage
Branded Foundation SR Legacy
PUFA 18:2 n-6 c,c 2 63 1842
PUFA 20:3 n-6 63 1260
PUFA 20:4 n-6 165
Branded Foundation SR Legacy
PUFA 18:2 n-6 c,c 0 0.3962 0.2364
PUFA 20:3 n-6 0.3962 0.1617
PUFA 20:4 n-6 0.0212


Figure 13: Count and Percentage of Foods Containing Omega 6 Fatty Acids


min, max median


Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrients per data type.


Table 28: Count of Entries with Omega 6 Fatty Acids
min max median
Foundation SR Legacy
18:2 n-6 c,c 63 873
20:3 n-6 63 720
20:4 n-6 2
Foundation SR Legacy
18:2 n-6 c,c 63 873
20:3 n-6 63 720
20:4 n-6 2
Foundation
18:2 n-6 c,c 63
20:3 n-6 63
20:4 n-6


Table 29: Percentage of Entries with Omega 6 Fatty Acids
min max median
Foundation SR Legacy
18:2 n-6 c,c 0.3962 0.1120
20:3 n-6 0.3962 0.0924
20:4 n-6 0.0003
Foundation SR Legacy
18:2 n-6 c,c 0.3962 0.1120
20:3 n-6 0.3962 0.0924
20:4 n-6 0.0003
Foundation
18:2 n-6 c,c 0.3962
20:3 n-6 0.3962
20:4 n-6


5.6 Total Trans, Saturated and Unsaturated Fatty Acids


Table 30: Fatty Acid Entries per Data Type
Count Percentage
Branded Foundation SR Legacy Survey (FNDDS)
Fatty acids, total monounsaturated 47778 67 7277 7083
Fatty acids, total polyunsaturated 47708 67 7279 7083
Fatty acids, total saturated 322966 67 7450 7083
Fatty acids, total trans 313750 57 4179
Branded Foundation SR Legacy Survey (FNDDS)
Fatty acids, total monounsaturated 0.1280 0.4214 0.9338 1
Fatty acids, total polyunsaturated 0.1278 0.4214 0.9340 1
Fatty acids, total saturated 0.8653 0.4214 0.9560 1
Fatty acids, total trans 0.8406 0.3585 0.5363


Figure 14: Count and Percentage of Foods Containing Fatty Acids


Key notes/important takeaways:

  • FNDDS does not specify total trans fat as one of its 65 nutrients
  • Several of these values come close to 100% without reaching it. FNDDS remains the only data type so far to provide a nutrient for 100% of its entries


min, max median


Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrients per data type.


Table 31: Count of Entries with Fatty Acids
min max median
Foundation SR Legacy
Fatty acids, total monounsaturated 20 27
Fatty acids, total polyunsaturated 20 27
Fatty acids, total saturated 20 55
Fatty acids, total trans 14 52
Foundation SR Legacy
Fatty acids, total monounsaturated 20 27
Fatty acids, total polyunsaturated 20 27
Fatty acids, total saturated 20 55
Fatty acids, total trans 14 52
Foundation
Fatty acids, total monounsaturated 20
Fatty acids, total polyunsaturated 20
Fatty acids, total saturated 20
Fatty acids, total trans 14


Table 32: Percentage of Entries with Fatty Acids
min max median
Foundation SR Legacy
Fatty acids, total monounsaturated 0.1258 0.0035
Fatty acids, total polyunsaturated 0.1258 0.0035
Fatty acids, total saturated 0.1258 0.0071
Fatty acids, total trans 0.0881 0.0067
Foundation SR Legacy
Fatty acids, total monounsaturated 0.1258 0.0035
Fatty acids, total polyunsaturated 0.1258 0.0035
Fatty acids, total saturated 0.1258 0.0071
Fatty acids, total trans 0.0881 0.0067
Foundation
Fatty acids, total monounsaturated 0.1258
Fatty acids, total polyunsaturated 0.1258
Fatty acids, total saturated 0.1258
Fatty acids, total trans 0.0881


5.7 Sugars


Table 33: Sugar Entries per Data Type
Count Percentage
Branded Foundation SR Legacy Survey (FNDDS)
Fructose 2 78 1745
Glucose 4 78 1740
Inositol 53
Lactose 11 78 1712
Ribose 1
Sorbitol 34
Starch 30 45 1167
Sugars, added 121510
Sugars, total including NLEA 352470 5 6007 7083
Total sugar alcohols 3941
Xylitol 108
Galactose 68 1579
Maltose 78 1711
Sucrose 78 1733
Sugars, Total NLEA 78
Branded Foundation SR Legacy Survey (FNDDS)
Fructose 0.0000 0.4906 0.2239
Glucose 0.0000 0.4906 0.2233
Inositol 0.0001
Lactose 0.0000 0.4906 0.2197
Ribose 0.0000
Sorbitol 0.0001
Starch 0.0001 0.2830 0.1497
Sugars, added 0.3256
Sugars, total including NLEA 0.9443 0.0314 0.7708 1
Total sugar alcohols 0.0106
Xylitol 0.0003
Galactose 0.4277 0.2026
Maltose 0.4906 0.2196
Sucrose 0.4906 0.2224
Sugars, Total NLEA 0.4906


Figure 15: Count and Percentage of Foods Containing Sugars


Key notes/important takeaways:

  • Information on individual sugars is sparse in every data type
  • Branded has a higher proportion of entries with total sugar information than both Foundation and SR Legacy. Even so, around 6% of Branded foods are missing information on sugar.
  • Branded is the only data type with any information regarding sugar alcohols
  • Since the last update in April, the proportion of Branded foods with information on added sugars has risen sharply


min, max median


Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrients per data type.


Table 34: Count of Entries with Sugars
min max median
Foundation SR Legacy
Fructose 76 1112
Galactose 66 959
Glucose 76 1114
Lactose 76 1072
Maltose 76 1069
Starch 41 778
Sucrose 76 1119
Sugars, total including NLEA 5 1016
Foundation SR Legacy
Fructose 76 1112
Galactose 66 959
Glucose 76 1114
Lactose 76 1072
Maltose 76 1069
Starch 41 778
Sucrose 76 1119
Sugars, total including NLEA 5 1016
Foundation
Fructose 78
Galactose 68
Glucose 78
Lactose 78
Maltose 78
Starch 45
Sucrose 78
Sugars, total including NLEA 5


Table 35: Percentage of Entries with Sugars
min max median
Foundation SR Legacy
Fructose 0.4780 0.1427
Galactose 0.4151 0.1231
Glucose 0.4780 0.1429
Lactose 0.4780 0.1376
Maltose 0.4780 0.1372
Starch 0.2579 0.0998
Sucrose 0.4780 0.1436
Sugars, total including NLEA 0.0314 0.1304
Foundation SR Legacy
Fructose 0.4780 0.1427
Galactose 0.4151 0.1231
Glucose 0.4780 0.1429
Lactose 0.4780 0.1376
Maltose 0.4780 0.1372
Starch 0.2579 0.0998
Sucrose 0.4780 0.1436
Sugars, total including NLEA 0.0314 0.1304
Foundation
Fructose 0.4906
Galactose 0.4277
Glucose 0.4906
Lactose 0.4906
Maltose 0.4906
Starch 0.2830
Sucrose 0.4906
Sugars, total including NLEA 0.0314


Fiber


Below is the number of fiber entries in each data type.


Table 36: Fiber Entries per Data Type
Count Percentage
Branded Foundation SR Legacy Survey (FNDDS)
Fiber, insoluble 3506 11
Fiber, soluble 3874 11
Fiber, total dietary 311806 73 7231 7083
Inulin 20
Total dietary fiber (AOAC 2011.25) 5
Branded Foundation SR Legacy Survey (FNDDS)
Fiber, insoluble 0.0094 0.0692
Fiber, soluble 0.0104 0.0692
Fiber, total dietary 0.8354 0.4591 0.9279 1
Inulin 0.0001
Total dietary fiber (AOAC 2011.25) 0.0314


Figure 16: Count and Percentage of Foods Containing Fiber


Key notes/important takeaways:

  • Surprisingly, Foundation does not provide much information regarding fiber
  • Branded was the only data type to include any information on inulin content
  • SR Legacy is missing fiber content for about 7% of its foods
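The share of foods missing a nutrient is simply one minus the proportion that have it. As a small worked example, the snippet below applies this to the “Fiber, total dietary” proportions from Table 36. The variable names are our own.

```r
# Proportion of foods with a "Fiber, total dietary" entry, copied from Table 36.
coverage <- c(Branded = 0.8354, Foundation = 0.4591, `SR Legacy` = 0.9279)

# Percentage of foods in each data type without a total dietary fiber entry.
missing_pct <- round(100 * (1 - coverage), 1)
print(missing_pct)
```

For SR Legacy this gives roughly 7%, in line with the note above. Survey (FNDDS) is left out because its proportion is 1.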


min, max median


Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrients per data type.


Table 37: Count of Entries with Fiber
min max median
Foundation SR Legacy
Fiber, insoluble 11
Fiber, soluble 11
Fiber, total dietary 68 1029
Total dietary fiber (AOAC 2011.25) 5
Foundation SR Legacy
Fiber, insoluble 11
Fiber, soluble 11
Fiber, total dietary 68 1029
Total dietary fiber (AOAC 2011.25) 5
Foundation
Fiber, insoluble 11
Fiber, soluble 11
Fiber, total dietary 68
Total dietary fiber (AOAC 2011.25) 5


Table 38: Percentage of Entries with Fiber
min max median
Foundation SR Legacy
Fiber, insoluble 0.0692
Fiber, soluble 0.0692
Fiber, total dietary 0.4277 0.132
Total dietary fiber (AOAC 2011.25) 0.0314
Foundation SR Legacy
Fiber, insoluble 0.0692
Fiber, soluble 0.0692
Fiber, total dietary 0.4277 0.132
Total dietary fiber (AOAC 2011.25) 0.0314
Foundation
Fiber, insoluble 0.0692
Fiber, soluble 0.0692
Fiber, total dietary 0.4277
Total dietary fiber (AOAC 2011.25) 0.0314

5.8 Carbohydrates


Table 39: Carbohydrate Entries per Data Type
Count Percentage
Branded Foundation SR Legacy Survey (FNDDS)
Carbohydrate, by difference 371253 131 7793 7083
Carbohydrate, other 1166
Carbohydrate, by summation 34
Branded Foundation SR Legacy Survey (FNDDS)
Carbohydrate, by difference 0.9947 0.8239 1 1
Carbohydrate, other 0.0031
Carbohydrate, by summation 0.2138


Figure 17: Count and Percentage of Foods Containing Carbohydrates


Key notes/important takeaways:

  • For nearly all entries we have information on carbohydrates
  • 9 foods in Foundation are missing carbohydrate entries
  • Some foods in Foundation have entries for both “Carbohydrate, by summation” and “Carbohydrate, by difference”
  • 1703 foods in Branded are missing carbohydrate information, which proportionally speaking is very small
  • All SR Legacy foods have carbohydrate entries


min, max median


Below, you’ll find a table with two sections, the first describing the frequency of “min” values and the second the frequency of “max” values for carbohydrates. Only SR Legacy reports either value, and no data type reports a “median” value.


Table 40: Count of Entries with Carbohydrates
min max
SR Legacy
Carbohydrate, by difference 20
SR Legacy
Carbohydrate, by difference 20


Table 41: Percentage of Entries with Carbohydrates
min max
SR Legacy
Carbohydrate, by difference 0.0026
SR Legacy
Carbohydrate, by difference 0.0026


5.9 Carotenoids


Table 42: Carotenoid Entries per Data Type
Count Percentage
Branded Foundation SR Legacy Survey (FNDDS)
Carotene, beta 19 38 5440 7083
Lutein + zeaxanthin 12 30 5294 7083
Carotene, alpha 39 5352 7083
cis-beta-Carotene 11
cis-Lutein/Zeaxanthin 14
cis-Lycopene 11
Cryptoxanthin, alpha 13
Cryptoxanthin, beta 39 5341 7083
Lutein 11
Lycopene 37 5314 7083
Phytoene 2
Phytofluene 2
trans-beta-Carotene 11
trans-Lycopene 10
Zeaxanthin 12
Branded Foundation SR Legacy Survey (FNDDS)
Carotene, beta 1e-04 0.2390 0.6981 1
Lutein + zeaxanthin 0e+00 0.1887 0.6793 1
Carotene, alpha 0.2453 0.6868 1
cis-beta-Carotene 0.0692
cis-Lutein/Zeaxanthin 0.0881
cis-Lycopene 0.0692
Cryptoxanthin, alpha 0.0818
Cryptoxanthin, beta 0.2453 0.6854 1
Lutein 0.0692
Lycopene 0.2327 0.6819 1
Phytoene 0.0126
Phytofluene 0.0126
trans-beta-Carotene 0.0692
trans-Lycopene 0.0629
Zeaxanthin 0.0755


Figure 18: Count of Foods Containing Carotenoids


Figure 19: Percentage of Foods Containing Carotenoids


Key notes/important takeaways:

  • Information for carotenoids in Foundation is provided seemingly at random


min, max median


Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrients per data type.


Table 43: Count of Entries with Carotenoids
min max median
Foundation SR Legacy
Carotene, alpha 34 379
Carotene, beta 32 439
cis-beta-Carotene 11
cis-Lutein/Zeaxanthin 14
cis-Lycopene 11
Cryptoxanthin, alpha 13
Cryptoxanthin, beta 34 377
Lutein 11
Lutein + zeaxanthin 21 322
Lycopene 31 352
Phytoene 2
Phytofluene 2
trans-beta-Carotene 11
trans-Lycopene 10
Zeaxanthin 12
Foundation SR Legacy
Carotene, alpha 34 379
Carotene, beta 32 439
cis-beta-Carotene 11
cis-Lutein/Zeaxanthin 14
cis-Lycopene 11
Cryptoxanthin, alpha 13
Cryptoxanthin, beta 34 377
Lutein 11
Lutein + zeaxanthin 21 322
Lycopene 31 352
Phytoene 2
Phytofluene 2
trans-beta-Carotene 11
trans-Lycopene 10
Zeaxanthin 12
Foundation
Carotene, alpha 39
Carotene, beta 37
cis-beta-Carotene 11
cis-Lutein/Zeaxanthin 14
cis-Lycopene 11
Cryptoxanthin, alpha 13
Cryptoxanthin, beta 39
Lutein 11
Lutein + zeaxanthin 26
Lycopene 36
Phytoene 2
Phytofluene 2
trans-beta-Carotene 11
trans-Lycopene 10
Zeaxanthin 12


Table 44: Percentage of Entries with Carotenoids
min max median
Foundation SR Legacy
Carotene, alpha 0.2138 0.0486
Carotene, beta 0.2013 0.0563
cis-beta-Carotene 0.0692
cis-Lutein/Zeaxanthin 0.0881
cis-Lycopene 0.0692
Cryptoxanthin, alpha 0.0818
Cryptoxanthin, beta 0.2138 0.0484
Lutein 0.0692
Lutein + zeaxanthin 0.1321 0.0413
Lycopene 0.1950 0.0452
Phytoene 0.0126
Phytofluene 0.0126
trans-beta-Carotene 0.0692
trans-Lycopene 0.0629
Zeaxanthin 0.0755
Foundation SR Legacy
Carotene, alpha 0.2138 0.0486
Carotene, beta 0.2013 0.0563
cis-beta-Carotene 0.0692
cis-Lutein/Zeaxanthin 0.0881
cis-Lycopene 0.0692
Cryptoxanthin, alpha 0.0818
Cryptoxanthin, beta 0.2138 0.0484
Lutein 0.0692
Lutein + zeaxanthin 0.1321 0.0413
Lycopene 0.1950 0.0452
Phytoene 0.0126
Phytofluene 0.0126
trans-beta-Carotene 0.0692
trans-Lycopene 0.0629
Zeaxanthin 0.0755
Foundation
Carotene, alpha 0.2453
Carotene, beta 0.2327
cis-beta-Carotene 0.0692
cis-Lutein/Zeaxanthin 0.0881
cis-Lycopene 0.0692
Cryptoxanthin, alpha 0.0818
Cryptoxanthin, beta 0.2453
Lutein 0.0692
Lutein + zeaxanthin 0.1635
Lycopene 0.2264
Phytoene 0.0126
Phytofluene 0.0126
trans-beta-Carotene 0.0692
trans-Lycopene 0.0629
Zeaxanthin 0.0755

5.10 Phytosterols


Table 45: Phytosterol Entries per Data Type
Count Percentage
Branded Foundation SR Legacy Survey (FNDDS)
Cholesterol 316228 41 7394 7083
Beta-sitostanol 12
Beta-sitosterol 12 138
Brassicasterol 8
Campestanol 8
Campesterol 12 137
Delta-5-avenasterol 12
Phytosterols, other 4
Stigmasterol 12 137
Phytosterols 489
Branded Foundation SR Legacy Survey (FNDDS)
Cholesterol 0.8472 0.2579 0.9488 1
Beta-sitostanol 0.0755
Beta-sitosterol 0.0755 0.0177
Brassicasterol 0.0503
Campestanol 0.0503
Campesterol 0.0755 0.0176
Delta-5-avenasterol 0.0755
Phytosterols, other 0.0252
Stigmasterol 0.0755 0.0176
Phytosterols 0.0627


Figure 20: Count of Foods Containing Phytosterols


Figure 21: Percentage of Foods Containing Phytosterols


Key notes/important takeaways:

  • A lot of cholesterol information is missing for such a commonly reported nutrient
  • Based on Table 45, cholesterol information is missing for about 15% of Branded foods, 74% of Foundation foods, and 5% of SR Legacy foods


min, max median


Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrients per data type.


Table 46: Count of Entries with Phytosterols
min max median
Foundation SR Legacy
Beta-sitostanol 12
Beta-sitosterol 12 63
Brassicasterol 8
Campestanol 8
Campesterol 12 62
Cholesterol 40 864
Delta-5-avenasterol 12
Phytosterols, other 4
Stigmasterol 12 62
Phytosterols 2
Foundation SR Legacy
Beta-sitostanol 12
Beta-sitosterol 12 63
Brassicasterol 8
Campestanol 8
Campesterol 12 62
Cholesterol 40 864
Delta-5-avenasterol 12
Phytosterols, other 4
Stigmasterol 12 62
Phytosterols 2
Foundation
Beta-sitostanol 12
Beta-sitosterol 12
Brassicasterol 8
Campestanol 8
Campesterol 12
Cholesterol 41
Delta-5-avenasterol 12
Phytosterols, other 4
Stigmasterol 12
Phytosterols


Table 47: Percentage of Entries with Phytosterols
min max median
Foundation SR Legacy
Beta-sitostanol 0.0755
Beta-sitosterol 0.0755 0.0081
Brassicasterol 0.0503
Campestanol 0.0503
Campesterol 0.0755 0.0080
Cholesterol 0.2516 0.1109
Delta-5-avenasterol 0.0755
Phytosterols, other 0.0252
Stigmasterol 0.0755 0.0080
Phytosterols 0.0003
Foundation SR Legacy
Beta-sitostanol 0.0755
Beta-sitosterol 0.0755 0.0081
Brassicasterol 0.0503
Campestanol 0.0503
Campesterol 0.0755 0.0080
Cholesterol 0.2516 0.1109
Delta-5-avenasterol 0.0755
Phytosterols, other 0.0252
Stigmasterol 0.0755 0.0080
Phytosterols 0.0003
Foundation
Beta-sitostanol 0.0755
Beta-sitosterol 0.0755
Brassicasterol 0.0503
Campestanol 0.0503
Campesterol 0.0755
Cholesterol 0.2579
Delta-5-avenasterol 0.0755
Phytosterols, other 0.0252
Stigmasterol 0.0755
Phytosterols


6 Variables Specific to Data Type


As described previously, there are variables unique to each data type. Here we will explore some of the information only available for certain data types. All data types also have unique variables regarding food groupings; these will be investigated in a later section.


6.1 Branded


There are 4 variables in Branded that could benefit from further exploration: “brandOwner”, “dataSource”, “brandName”, and “ingredients”.


Brand Owner and Brand Name


There are 28901 unique brand names and 20538 unique brand owners in the branded foods dataset. However, although each of these strings is unique, multiple names often seem to refer to the same brand. For instance, “ANNIE’S”, “ANNIES”, and “Annie’s” all refer to the same brand. The same problem is evident in the brand owners. For example, “ANETO NATURAL S. L. U. POLIGONO INDUSTRIAL SANTA MARIA”, “ANETO NATURAL S.L.U. POLIGONO INDUSTRIAL SANTA MARIA”, “ANETO NATURAL, S.L.U.”, and “ANETO NATURAL S.L.U.” are all recorded as different names for the same brand owner. There are no instances of both brand owner and brand name being left blank, but there are 953 food entries where brand owner is left blank and 3038 food entries where brand name is left blank.

Because of this problem, there is no reliable way to know how many distinct brands or brand owners the data covers.
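A simple normalization step can collapse some of these variants before counting. The sketch below is only one possible approach and was not used to produce the counts above. The helper `normalize_brand` is hypothetical, and the example strings use straight apostrophes for simplicity.

```r
brand_names <- c("ANNIE'S", "ANNIES", "Annie's",
                 "ANETO NATURAL, S.L.U.", "ANETO NATURAL S.L.U.")

# Build a comparison key: upper-case the string, drop everything except
# letters, digits, and spaces, then collapse repeated whitespace.
normalize_brand <- function(x) {
  key <- toupper(x)
  key <- gsub("[^A-Z0-9 ]", "", key)
  key <- gsub(" +", " ", key)
  trimws(key)
}

keys <- normalize_brand(brand_names)
length(unique(brand_names))  # 5 distinct raw strings
length(unique(keys))         # 2 distinct keys after normalization
```

This kind of key still misses variants such as “S. L. U.” written with spaces, or names with an address appended, so it can only give a lower bound on the amount of duplication.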


Data Source


There are 3 data sources: “GDSN” (Global Data Synchronization Network), “NZGDSN” (New Zealand Global Data Synchronization Network), and “LI” (Label Insight).


Table 48: Data Sources of Branded Foods
Total Foods Total Nutrients Average Nutrients per Food
GDSN 13514 217456 16.17856
LI 359821 4915972 13.68441
NZGDSN 562 4465 7.94484


The majority of the information has been collected through Label Insight. As the newest source (added only in this update), NZGDSN is the least utilized.


Ingredients



Figure 22: Character Length of Ingredients by Data Source


Figure 23: Number of Words in Ingredient Statement by Data Source


Table 49: Length of Ingredient Statements
Variable Overall, N = 373,897¹ GDSN, N = 13,514¹ LI, N = 359,821¹ NZGDSN, N = 562¹ p-value²
Characters per Ingredient Statement 198 (90, 371) 293 (138, 483) 195 (89, 366) 130 (38, 260) <0.001
Words per Ingredient Statement 27 (12, 50) 40 (19, 66) 27 (12, 50) 17 (5, 34) <0.001

¹ Median (IQR)

² Kruskal-Wallis rank sum test


The length of the ingredient statement, which we treat as a rough proxy for the amount of processing a food has undergone, varies widely between the data sources. With a p-value of less than 0.001, the difference in ingredient statement length between data sources is statistically significant.
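For readers who want to reproduce this kind of comparison, the sketch below counts characters and words per ingredient statement and runs a Kruskal-Wallis rank sum test with base R. The ingredient statements are invented, and the column names `dataSource` and `ingredients` are assumed to match the Branded variables discussed above.

```r
# Made-up ingredient statements; the real data has hundreds of thousands of rows.
branded <- data.frame(
  dataSource  = c("GDSN", "GDSN", "GDSN", "LI", "LI", "LI",
                  "NZGDSN", "NZGDSN", "NZGDSN"),
  ingredients = c(
    "WATER, SUGAR, CITRIC ACID, NATURAL FLAVOR, SALT, ASCORBIC ACID",
    "ENRICHED WHEAT FLOUR, WATER, YEAST, SALT, SOYBEAN OIL, SUGAR",
    "MILK, CREAM, SUGAR, COCOA, VANILLA EXTRACT, GUAR GUM, SALT",
    "TOMATOES, WATER, SALT, BASIL",
    "OATS, HONEY, ALMONDS, SALT",
    "CORN, SUNFLOWER OIL, SEA SALT",
    "MILK",
    "APPLE JUICE",
    "SEA SALT"
  ),
  stringsAsFactors = FALSE
)

# Length of each statement in characters and in whitespace-separated words.
branded$n_chars <- nchar(branded$ingredients)
branded$n_words <- lengths(strsplit(trimws(branded$ingredients), "\\s+"))

# Median length per data source.
medians <- aggregate(cbind(n_chars, n_words) ~ dataSource,
                     data = branded, FUN = median)
print(medians)

# Kruskal-Wallis rank sum test of word count across data sources.
kw <- kruskal.test(n_words ~ dataSource, data = branded)
print(kw$p.value)
```

With samples as large as those in Table 49, even small differences between sources produce very small p-values, so the medians and IQRs are the more informative part of the comparison.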


6.2 FNDDS (Survey)


There are a lot of variables stored within the data for FNDDS that contain duplicate information. For unique variables related to this data type we will go over the nested variables “foodAttributes”, “inputFoods”, and “foodPortions”.


Food Attributes


This nested variable contains 5 variables, 1 of which is nested further and contains 3 variables. “foodAttributes” contains variables “id”, “name”, “value”, “foodAttributeType”, and “rank”. “foodAttributeType” contains the 3 variables “id”, “name”, and “description”.

For every food in FNDDS, “foodAttributes” contains at least 2 entries: the WWEIA Category number and the WWEIA Category description. This information therefore appears twice for every food, once in “foodAttributes” and once in “wweiaFoodCategory”.

Foods having more than 2 entries in “foodAttributes” provide further information about the source of the food or food ingredients in the form of small additional description notes. These notes vary widely in the information they provide: some give brand names such as “McDonald’s”, while others describe contents, such as “leche fresca”.


Input Foods


Within the nested variable “inputFoods” we have 10 variables: “id”, “unit”, “portionDescription”, “portionCode”, “foodDescription”, “sequenceNumber”, “amount”, “ingredientCode”, “ingredientWeight”, and “ingredientDescription”.

The variables “foodDescription” and “ingredientDescription” are identical for all foods.
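A redundancy like this can be confirmed by comparing the two columns within each food's nested data frame. The sketch below assumes “inputFoods” has been read in as a list with one data frame per food. The list `input_foods` and its contents are toy stand-ins.

```r
# Each element mimics the nested "inputFoods" data frame of one FNDDS food.
input_foods <- list(
  data.frame(foodDescription       = c("Milk, whole", "Sugar, white"),
             ingredientDescription = c("Milk, whole", "Sugar, white"),
             stringsAsFactors = FALSE),
  data.frame(foodDescription       = "Butter, salted",
             ingredientDescription = "Butter, salted",
             stringsAsFactors = FALSE)
)

# TRUE for a food when the two description columns match row for row.
same_per_food <- vapply(
  input_foods,
  function(df) identical(df$foodDescription, df$ingredientDescription),
  logical(1)
)

all(same_per_food)  # TRUE when the columns are duplicates in every food
```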


7 Available Data That Could Potentially Be Added to FDC


7.1 USDA’s Expanded Flavonoid Database for the Assessment of Dietary Intakes


From the USDA Ag Data Commons website you can find a database of flavonoid content for many of the foods in the FDC database (2). These flavonoid values can be linked to Foundation and SR legacy foods by NDB number. Combining this data gives us amounts of the following flavonoids in 25 foods in Foundation and 1613 foods in SR legacy. In all cases, every flavonoid is provided for each food.


##  [1] "Daidzein"                       "Genistein"                     
##  [3] "Glycitein"                      "Cyanidin"                      
##  [5] "Petunidin"                      "Delphinidin"                   
##  [7] "Malvidin"                       "Pelargonidin"                  
##  [9] "Peonidin"                       "(+)-Catechin"                  
## [11] "(-)-Epigallocatechin"           "(-)-Epicatechin"               
## [13] "(-)-Epicatechin 3-gallate"      "(-)-Epigallocatechin 3-gallate"
## [15] "Theaflavin"                     "Thearubigins"                  
## [17] "Eriodictyol"                    "Hesperetin"                    
## [19] "Naringenin"                     "Apigenin"                      
## [21] "Luteolin"                       "Isorhamnetin"                  
## [23] "Kaempferol"                     "Myricetin"                     
## [25] "Quercetin"                      "Theaflavin-3,3'-digallate"     
## [27] "Theaflavin-3'-gallate"          "Theaflavin-3-gallate"          
## [29] "(+)-Gallocatechin"
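Linking by NDB number amounts to a join between the food list and the flavonoid table. The sketch below shows the idea with `merge()`. The column names (`ndbNumber`, `flavonoid`, `amount`), the NDB numbers, and all values are placeholders chosen for illustration.

```r
# Hypothetical stand-ins for the SR Legacy food list and the flavonoid table.
sr_legacy <- data.frame(
  ndbNumber   = c(1001, 1002, 1003),
  description = c("Food A", "Food B", "Food C"),
  stringsAsFactors = FALSE
)
flavonoids <- data.frame(
  ndbNumber = c(1001, 1001, 1003),
  flavonoid = c("Quercetin", "Cyanidin", "Kaempferol"),
  amount    = c(4.0, 1.6, 7.8),
  stringsAsFactors = FALSE
)

# Inner join: keeps only the foods that have at least one flavonoid value.
linked <- merge(sr_legacy, flavonoids, by = "ndbNumber")

# Number of distinct foods that gained flavonoid information.
n_foods_linked <- length(unique(linked$ndbNumber))
print(n_foods_linked)
```

Using `all.x = TRUE` in `merge()` would instead keep every food and leave the flavonoid columns as `NA` where no value exists.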


7.2 USDA Database for the Flavonoid Content of Selected Foods


After the release of the USDA’s Expanded Flavonoid Database for the Assessment of Dietary Intakes in 2015, alterations and additions were made to the USDA Database for the Flavonoid Content of Selected Foods in 2018. While the other supplemental databases are available on the USDA Ag Data Commons, this new update was published solely on the USDA Agricultural Research Service website (3). There are values in the USDA Database for the Flavonoid Content of Selected Foods for a total of 183 foods in SR legacy and Foundation. Of those 183, 131 can also be found in the USDA’s Expanded Flavonoid Database for the Assessment of Dietary Intakes (meaning a total of 52 new foods have been added in this release). These newer values are assumed to be more accurate and if added to the FDC data should replace their previous versions.
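The overlap described here (183 foods in the 2018 release, 131 of them already in the expanded database, leaving 183 - 131 = 52 new foods) is a set comparison on NDB numbers, and replacing older values is a keyed update. Below is a minimal sketch with made-up NDB numbers and amounts. The object and column names are assumptions.

```r
# Made-up NDB numbers standing in for the foods covered by each release.
expanded_2015 <- c(1001, 1002, 1003, 1004)
selected_2018 <- c(1003, 1004, 1005)

in_both  <- intersect(selected_2018, expanded_2015)  # foods whose values get replaced
new_only <- setdiff(selected_2018, expanded_2015)    # foods that are newly added

# Toy value tables keyed by NDB number and flavonoid.
old_values <- data.frame(ndbNumber = c(1001, 1003), flavonoid = "Quercetin",
                         amount = c(1.0, 2.0), stringsAsFactors = FALSE)
new_values <- data.frame(ndbNumber = c(1003, 1005), flavonoid = "Quercetin",
                         amount = c(2.5, 0.4), stringsAsFactors = FALSE)

# Keep every new value, then only the old values that have no newer version.
old_key  <- paste(old_values$ndbNumber, old_values$flavonoid)
new_key  <- paste(new_values$ndbNumber, new_values$flavonoid)
combined <- rbind(new_values, old_values[!(old_key %in% new_key), ])
print(combined)
```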


Table 50: Flavonoid Entries for SR Legacy Foods
n
(-)-Epicatechin 82
(-)-Epicatechin 3-gallate 75
(-)-Epigallocatechin 73
(-)-Epigallocatechin 3-gallate 74
(+)-Catechin 83
(+)-Gallocatechin 71
Apigenin 101
Cyanidin 48
Delphinidin 46
Eriodictyol 2
Hesperetin 39
Isorhamnetin 43
Kaempferol 132
Luteolin 112
Malvidin 38
Myricetin 122
Naringenin 37
Pelargonidin 39
Peonidin 38
Petunidin 37
Quercetin 162
Theaflavin 3
Theaflavin-3'-gallate 3
Theaflavin-3,3'-digallate 3
Thearubigins 3


Table 51: Flavonoid Entries for Foundation Foods
n
(-)-Epicatechin 4
(-)-Epicatechin 3-gallate 4
(-)-Epigallocatechin 4
(-)-Epigallocatechin 3-gallate 4
(+)-Catechin 4
(+)-Gallocatechin 4
Apigenin 6
Cyanidin 2
Delphinidin 2
Hesperetin 2
Isorhamnetin 1
Kaempferol 7
Luteolin 6
Malvidin 2
Myricetin 7
Naringenin 2
Pelargonidin 2
Peonidin 2
Petunidin 2
Quercetin 7


7.3 USDA Database for the Proanthocyanidin Content of Selected Foods


From the USDA Ag Data Commons website you can find a database of proanthocyanidin content for many of the foods in the FDC database (4). These proanthocyanidin values can be linked to Foundation and SR legacy foods by NDB number. The following tables contain the names of each type of proanthocyanidin content and the number of foods in SR legacy and Foundation we have entries for.


Table 52: Proanthocyanidin Entries for SR Legacy Foods
n
Proanthocyanidin 4-6mers 114
Proanthocyanidin 7-10mers 110
Proanthocyanidin dimers 130
Proanthocyanidin polymers (>10mers) 108
Proanthocyanidin trimers 124


Table 53: Proanthocyanidin Entries for Foundation Foods
n
Proanthocyanidin 4-6mers 6
Proanthocyanidin 7-10mers 6
Proanthocyanidin dimers 6
Proanthocyanidin polymers (>10mers) 6
Proanthocyanidin trimers 6
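
Counts like those in the two tables above can be produced by joining the supplemental data to the FDC foods on NDB number and tallying foods per compound. The following base R sketch shows one way to do this. The data frames `fdc_foods` and `pac`, their column names, and every identifier in them are hypothetical; the real files may be laid out differently.

```r
# Hypothetical list of FDC foods with their data type and NDB number.
fdc_foods <- data.frame(
  ndb_number = c("20001", "20002", "20003", "20004"),
  data_type  = c("sr_legacy", "sr_legacy", "foundation", "sr_legacy"),
  stringsAsFactors = FALSE
)

# Hypothetical supplemental table in long form: one row per food and compound.
pac <- data.frame(
  ndb_number = c("20001", "20001", "20002", "20003", "20099"),
  compound   = c("Proanthocyanidin dimers", "Proanthocyanidin trimers",
                 "Proanthocyanidin dimers", "Proanthocyanidin dimers",
                 "Proanthocyanidin dimers"),
  stringsAsFactors = FALSE
)

# Inner join on NDB number; supplemental rows without a matching FDC food
# (here "20099") are dropped.
linked <- merge(pac, fdc_foods, by = "ndb_number")

# Guard against repeated rows so that each food is counted once per compound.
linked <- unique(linked[, c("ndb_number", "compound", "data_type")])

# Number of foods with an entry, by compound (rows) and data type (columns).
counts <- table(linked$compound, linked$data_type)
print(counts)
```

Because `merge()` performs an inner join by default, supplemental entries whose NDB number is absent from the FDC food list do not contribute to the counts.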


7.4 USDA Database for the Isoflavone Content of Selected Foods


The USDA Ag Data Commons website hosts a database of isoflavone content for many of the foods in the FDC database (5). These isoflavone values can be linked to Foundation and SR legacy foods by NDB number. The following tables list each type of isoflavone and the number of SR legacy and Foundation foods that have an entry for it.


Table 54: Isoflavone Entries for SR Legacy Foods
n
Biochanin A 59
Coumestrol 123
Daidzein 262
Formononetin 123
Genistein 262
Glycitein 143
Total isoflavones 259


Table 55: Isoflavone Entries for Foundation Foods
n
Biochanin A 3
Coumestrol 8
Daidzein 15
Formononetin 8
Genistein 15
Glycitein 9
Total isoflavones 15


This data set overlaps significantly with the USDA’s Expanded Flavonoid Database for the Assessment of Dietary Intakes but provides additional information on “Biochanin A”, “Coumestrol”, “Formononetin”, and “Total isoflavones”.
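
The comparison of compound names can be expressed as a set difference. In the sketch below, `isoflavone_names` is taken from the isoflavone tables above, while `expanded_names` is an abbreviated, illustrative stand-in for the nutrient list of the Expanded Flavonoid Database rather than its full contents. Both variable names are assumptions made for this example.

```r
# Compound names reported in the isoflavone data set (from the tables above).
isoflavone_names <- c("Biochanin A", "Coumestrol", "Daidzein", "Formononetin",
                      "Genistein", "Glycitein", "Total isoflavones")

# Abbreviated stand-in for the Expanded Flavonoid Database nutrient names
# (illustrative subset only, not the complete list).
expanded_names <- c("Daidzein", "Genistein", "Glycitein", "Quercetin", "Kaempferol")

# Compounds that only the isoflavone data set provides.
only_in_isoflavone <- setdiff(isoflavone_names, expanded_names)

# Compounds reported by both sources.
overlapping <- intersect(isoflavone_names, expanded_names)

print(only_in_isoflavone)
print(overlapping)
```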


8 Food Groups


There is no consistency between the food groups associated with each data type. Here we will dive into what the current food groups are and a possible approach to standardizing them for comparison.


8.1 Currently Provided Food Groups


FNDDS uses the WWEIA (What We Eat In America) food groups, which are split into 167 unique categories. The SR legacy and Foundation foods follow the SR legacy food groups, which are split into 28 unique categories. Branded has its own list of food groups, which contains 309 unique categories. No category name appears in all three lists. However, the food categories for FNDDS and branded intersect on the following category names:

## [1] "Rice"     "Cheese"   "Pizza"    "Tomatoes" "Coffee"   "Beer"     "Bacon"
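
A comparison of this kind can be sketched with `intersect()` in base R. The vectors below are short, partly illustrative stand-ins for the three category lists (the real lists have 167, 28, and 309 entries), and the variable names are assumptions made for this example.

```r
# Abbreviated stand-ins for the three lists of food category names.
wweia_groups   <- c("Rice", "Cheese", "Pizza", "Apples")
sr_groups      <- c("Dairy and Egg Products", "Fruits and Fruit Juices")
branded_groups <- c("Rice", "Cheese", "Pizza", "Candy")

# Category names shared by all three lists (exact, case-sensitive matching).
common_all <- Reduce(intersect, list(wweia_groups, sr_groups, branded_groups))

# Category names shared by the FNDDS (WWEIA) and branded lists only.
common_fndds_branded <- intersect(wweia_groups, branded_groups)

print(common_all)
print(common_fndds_branded)
```

Because `intersect()` compares strings exactly, names that differ only in case, spacing, or pluralization are treated as different categories. That strictness is one reason exact matching finds so little overlap.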


8.2 A Machine Learning Approach to Standardizing Food Groups

Due to the size and scope of this approach and analysis, it has been moved to its own report. For information on how a machine learning algorithm can be used to standardize food group labels, see the full report here: https://rpubs.com/Em_Mari3/FoodGroups


9 References

  1. U.S. Department of Agriculture, Agricultural Research Service. FoodData Central, 2019. fdc.nal.usda.gov.

  2. Bhagwat, Seema; Haytowitz, David B.; Wasswa-Kintu, Shirley. (2015). USDA’s Expanded Flavonoid Database for the Assessment of Dietary Intakes, Release 1.1 - December 2015. Nutrient Data Laboratory, Beltsville Human Nutrition Research Center, ARS, USDA. https://doi.org/10.15482/USDA.ADC/1324677. Accessed 2022-01-12.

  3. Haytowitz, D.B., Wu, X., Bhagwat, S. 2018. USDA Database for the Flavonoid Content of Selected Foods, Release 3.3. U.S. Department of Agriculture, Agricultural Research Service. Nutrient Data Laboratory Home Page: http://www.ars.usda.gov/nutrientdata/flav

  4. Bhagwat, Seema; Haytowitz, David B. (2015). USDA Database for the Proanthocyanidin Content of Selected Foods, Release 2 (2015). Nutrient Data Laboratory, Beltsville Human Nutrition Research Center, ARS, USDA. https://doi.org/10.15482/USDA.ADC/1324621. Accessed 2022-01-12.

  5. Bhagwat, Seema; Haytowitz, David B. (2015). USDA Database for the Isoflavone Content of Selected Foods, Release 2.1 (November 2015). Nutrient Data Laboratory, Beltsville Human Nutrition Research Center, ARS, USDA. https://doi.org/10.15482/USDA.ADC/1324538. Accessed 2022-01-12.