Note: A new characterization of the October 2021 data can be found at https://rpubs.com/Em_Mari3/FDC_V2

0.1 Data Structure

As of April 2021, the largest data set available for download from the FoodData Central website is a zipped folder of 36 files. One file is a pdf meant to explain the variables in each file; the other 35 are csv files containing data (1). The pdf of variable declarations and database structure is largely inaccurate: it contains variables not present in the data and is missing descriptions of several variables.

One file, labeled “all_downloaded_table_record_counts,” lists the number of records in each of the other main files.


Some of these file names do not match the names of the actual files in the download, but they are close enough to identify the intended file. Many of these files will not help with this analysis, as they contain duplicate information and variables irrelevant to our focus.

0.1.1 Files of Interest

For our initial analysis we will focus on the files:

  • food
  • nutrient
  • food_nutrient
  • food_nutrient_derivation
  • branded_food
  • foundation_food
  • sr_legacy_food
  • market_acquisition
  • survey_fndds_food
  • wweia_food_category

Below is a diagram of the variables/ids used to connect the data:

These files contain the nutritional values for all foods in the database and the categories each food falls into.

0.1.1.1 Note on Categories

The data in the file “food_category” has variables that don’t match up to any variables from the other files. As a result we are missing food categories for all SR Legacy and Foundation food items. This means we will have to manually add food_category descriptions for Foundation foods based on the NDB_number of each food in the foundation_food file and its equivalent in the sr_legacy_food file.

The codes for SR Legacy food groups have been removed from the files in recent years, so we will have to get them from the ARS website. From that data we will need the files “FD_GROUP” and “FOOD_DES” (2).

0.2 Reading in the data

Certain variable names will need to be altered so that we can combine the files correctly. For instance, the variable labeled “id” in the file “food_nutrient_derivation” is the same variable labeled “derivation_id” in the “food_nutrient” file. In cases like this we will always default to the longer or more descriptive variable name. We will use the file “food_nutrient” as our base; it contains 15324912 observations of 11 variables.
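The renaming rule above can be sketched in a few lines. The following Python is a minimal illustration on made-up rows, not the code used for this analysis: the helper `rename_column` and every value in the toy rows are assumptions, and only the column names `id`, `derivation_id`, and `derivation_description` come from the files and variable list described in this report.

```python
# Toy rows standing in for the two csv files; all values are invented.
food_nutrient = [
    {"id": 1, "fdc_id": 100, "nutrient_id": 10, "amount": 3.2, "derivation_id": 70},
    {"id": 2, "fdc_id": 100, "nutrient_id": 11, "amount": 1.0, "derivation_id": 71},
]
food_nutrient_derivation = [
    {"id": 70, "description": "toy derivation A"},
    {"id": 71, "description": "toy derivation B"},
]


def rename_column(rows, old, new):
    """Return copies of the rows with one column renamed."""
    return [{(new if key == old else key): value for key, value in row.items()}
            for row in rows]


# Prefer the longer, more descriptive name before joining.
derivations = rename_column(food_nutrient_derivation, "id", "derivation_id")
derivations = rename_column(derivations, "description", "derivation_description")
lookup = {row["derivation_id"]: row for row in derivations}

# Left join: every nutrient row is kept, with derivation columns attached
# whenever its derivation_id has a match.
combined = [{**lookup.get(row["derivation_id"], {}), **row} for row in food_nutrient]

print(combined[0]["derivation_description"])
```

Renaming before the join matters here because both files have a column called `id` that means different things; without the rename, one would silently overwrite the other.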

For the sake of simplicity, all footnote columns will be removed from each file.

The files “sr_legacy_food” and “foundation_food” will not be added to the main data set for now as the only new variable they provide information on is “NDB_number” which we do not need at the present moment. Analysis of “market_acquisition” will be done separately, as it only applies to certain branded foods and only introduces information on which stores and states foods were acquired in.

When combining all the data, we come across a problem. “food_nutrient” has a total of 1094722 unique fdc_ids. However, “branded_food” has a total of 1142610 unique fdc_ids, meaning at minimum 47888 foods have an fdc_id but no nutrient data.

Complicating this picture, the column “gtin_upc” in the branded_food file has only 357927 unique UPC codes, meaning that the creators of the data gave a separate fdc_id to multiple versions of each product even though they share the same nutrient information. In the total combined data we have 1185096 unique fdc_ids, 1060007 of which belong to branded foods. Assuming each food with the same UPC code has identical nutrient information, 702080 of these entries (1060007 - 357927) must be repeated information. On the FDC website there are 368686 viewable branded foods, which is much closer to the number of unique UPC codes than to the number of unique fdc_ids. It will therefore be more representative to filter out the repeated entries of branded foods.
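The “at minimum” in that estimate comes from a set difference: food_nutrient also holds ids from other data types, so subtracting the totals can only undercount the Branded foods without nutrient rows. A minimal Python sketch, with invented toy ids and hypothetical variable names, followed by the arithmetic on the reported totals:

```python
# Toy fdc_ids; the real sets would be read from the csv files.
branded_ids = {101, 102, 103, 104}    # ids listed in branded_food
nutrient_ids = {101, 102, 900}        # ids that have rows in food_nutrient

# Branded foods that have an fdc_id but no nutrient rows.
without_nutrients = branded_ids - nutrient_ids
print(sorted(without_nutrients))

# Subtracting totals gives only a lower bound, because some ids in
# food_nutrient (like 900 above) do not belong to Branded at all.
toy_lower_bound = len(branded_ids) - len(nutrient_ids)

# The same subtraction on the totals reported in the text.
reported_lower_bound = 1142610 - 1094722
print(reported_lower_bound)
```

In the toy data the true number of Branded foods without nutrient rows is 2, while the subtraction of totals gives 1, which is why the reported figure is stated as a minimum.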

0.3 Dealing with Duplicates

Deleting data is never an ideal solution, so let’s gather more context and see if we can find an alternate solution. Before removing any duplicate information, let’s look at how many unique foods we have per category.

Table 2: Unique Food Entries per Data Type in csv Files
n
agricultural_acquisition 810
branded_food 1142610
experimental_food 11
foundation_food 195
market_acquistion 5480
sample_food 1982
sr_legacy_food 7793
sub_sample_food 19126
survey_fndds_food 7083
NA 6


We have some obvious problems right off the bat. On the FDC website there are 159 unique Foundation foods and 378903 Branded foods as of 12/10/2021 (the numbers of foods in SR Legacy and FNDDS match the website). To address this we will add the gtin_upc column from the branded_food file and drop every entry that duplicates another entry in all columns except fdc_id. In other words, we will keep only unique combinations of the variables nutrient_id, amount, food_description, data_type, gtin_upc, and unit_name. We will also change how we combine the data, joining only on foods for which we have nutrient information.
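Filtering on every column except fdc_id amounts to keeping the first row seen for each combination of the six variables named above. The sketch below is an illustrative Python version on invented rows; the function name `drop_duplicates` and the toy values are assumptions, while the key columns are the ones listed in the text.

```python
KEY_COLUMNS = ("nutrient_id", "amount", "food_description",
               "data_type", "gtin_upc", "unit_name")


def drop_duplicates(rows, key_columns=KEY_COLUMNS):
    """Keep the first row for each combination of the key columns."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row.get(column) for column in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


base = {"nutrient_id": 10, "food_description": "toy cereal",
        "data_type": "branded_food", "gtin_upc": "000111", "unit_name": "G"}
rows = [
    {**base, "fdc_id": 1, "amount": 2.5},
    {**base, "fdc_id": 2, "amount": 2.5},   # same food re-issued under a new fdc_id
    {**base, "fdc_id": 3, "amount": 3.0},   # amount changed, so it is kept
]

unique_rows = drop_duplicates(rows)
print([row["fdc_id"] for row in unique_rows])
```

Because fdc_id is left out of the key, two entries that differ only in their id collapse into one, while any change in amount or UPC code keeps both.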

Now our frequencies look like this:


Table 3: Unique Combinations of nutrient_id, amount, food_description, data_type, gtin_upc, and unit_name
n
agricultural_acquisition 805
branded_food 521993
foundation_food 177
sr_legacy_food 7793
sub_sample_food 14837
survey_fndds_food 7083
NA 5


We’re still not quite at the point where we match the numbers on the website, but we are a lot closer than before. Close isn’t going to cut it, though; we need another way to filter the entries that is precise and accurate.

The differences between the available types of downloadable data are crucial here. In the csv download version of the food file there are 1605403 entries; in the Access version of the file there are only 372954.

Altering the method to use only the foods in the food file from the Access version of the data, we get:

Table 4: Unique Food Entries per Data Type in Access Files
n
branded_food 357927
experimental_food 11
foundation_food 140
sr_legacy_food 7793
survey_fndds_food 7083


We did it! We have the right number of entries per data type!
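Restricting the csv data to foods that also appear in the Access food table is a semi-join on fdc_id. A minimal Python sketch under that assumption, using invented ids and hypothetical variable names:

```python
from collections import Counter

# Toy stand-ins: the csv food file lists more fdc_ids than the Access food table.
csv_food = [
    {"fdc_id": 1, "data_type": "branded_food"},
    {"fdc_id": 2, "data_type": "branded_food"},
    {"fdc_id": 3, "data_type": "foundation_food"},
    {"fdc_id": 4, "data_type": "sub_sample_food"},
]
access_fdc_ids = {2, 3}

# Semi-join: keep csv rows whose fdc_id also appears in the Access table.
kept = [row for row in csv_food if row["fdc_id"] in access_fdc_ids]
foods_per_type = Counter(row["data_type"] for row in kept)

print(dict(foods_per_type))
```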

Key notes/important takeaways:

  • The different forms of downloadable data (CSV files vs Access database) on the FDC website do not contain identical information; in our case we required files from both to get a clear picture of the data
  • In the case of branded foods, every time there is a small change or update to a food, the creators of the database create a new entry for the food and generate a new fdc_id for it, resulting in multiple entries for each food
  • The pdf file containing explanations of variables and data organization is several versions old and trying to use it to understand this version of the data is like trying to understand a book published this year using a Shakespearean dictionary

0.4 Variables for Foods with Provided Nutrient Data

Setting aside the foods with no nutrient values for now, let’s look at the data without them. This data set covers only foods that we have nutrient information for and that are listed in the food file.

Now we have 10795145 observations of 26 variables, all matched by id numbers. Most of these columns are superfluous so for now we will use a subset of the data. For our initial analysis of the overall data we will be using the following variables:

Variable Name Variable Description
fdc_id A unique number for each food in the Food Data Central Database
nutrient_id A unique number given to each nutrient
amount The amount of each nutrient per 100g of the listed food
data_points How many data points they used to derive the nutrition value
min, max, median Minimum, maximum, and median value of nutrition content within sample provided in three separate columns
min_year_acquired The first year data collection started on a sample
data_type The type of data based on how it was acquired
food_description The name of the food or a brief description of the food such as “milk, whole”
publication_date The date the food was published to the FoodData Central website
nutrient_name The name of the nutrient
unit_name The unit of each nutrient (g, mg, mcg, IU, etc.)
derivation_description How the food was analyzed for nutrition content

1 Accuracy

1.1 Range of Precision for Nutrients

In this section we will look at the range of measurements with information regarding precision.

1.1.1 Units Used per Type of Data

Below is a table of the units of measurement used for each nutrient per data type.


Table 5: Units of Measurement per Data Type
G IU KCAL kJ MG MG_ATE SP_GR UG
branded_food 2470260 288601 354840 9 1685504 1040 0 30920
experimental_food 0 0 0 0 0 0 0 0
foundation_food 5696 26 315 98 2390 0 1 864
sr_legacy_food 358495 12537 7793 7793 157006 0 0 100501
survey_fndds_food 205407 0 7083 0 134577 0 0 113328


Since each measurement must be rounded to a certain number of significant digits, and the vast majority of the amounts in this data are rounded to 6 significant figures or fewer, a larger variety of units of measurement implies greater precision. By this logic, data on Branded foods tends to be less precise than data on SR Legacy or Foundation foods.

1.1.2 Frequency of Derivation Methods per Data Type
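A table like Table 5 is a cross-tabulation of data type against unit name. As a rough Python sketch on invented entries (the variable names are hypothetical and the counts are not those of Table 5):

```python
from collections import Counter

# One (data_type, unit_name) pair per nutrient entry; toy data only.
entries = [
    ("branded_food", "G"), ("branded_food", "G"), ("branded_food", "MG"),
    ("foundation_food", "G"), ("foundation_food", "UG"),
]

counts = Counter(entries)
data_types = sorted({data_type for data_type, _ in entries})
units = sorted({unit for _, unit in entries})

# A Counter returns 0 for pairs that never occur, which fills the empty cells.
table = {d: [counts[(d, u)] for u in units] for d in data_types}

print(units)
for data_type, row in table.items():
    print(data_type, row)
```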

Below is a table of the frequency of derivation descriptions by data type.

The source codes the descriptions mention are as follows:


Table 7: Source Codes
code description
1 Analytical or derived from analytical
4 Calculated or imputed
5 Value manufacturer based label claim for added nutrients
6 Aggregated data involving combinations of source codes 1, 6, 12 and/or 13
7 Assumed zero
8 Calculated from nutrient label by NDL
9 Calculated by manufacturer, not adjusted or rounded for NLEA
11 Aggregated data involving comb. of codes other then 1,12 or6
12 Manufacturer's analytical; partial documentation
13 Analytical data from the literature, partial documentation


Since there are now 64 different measures of derivation, it doesn’t make much sense to try to fit them all into the original 10 categories.

As there are 384 unique combinations of data type and derivation this is not particularly helpful. To improve our understanding of the derivation methods used in each data type we will split the derivation description into groups.


Table 8: Derivation Group per Data Type
Branded Experimental Foundation SR legacy FNDDS
Aggregated_data 0 0 0 2746 0
Analytical 0 0 7594 181665 0
Assumed_zero 0 0 0 57711 0
Based_on_similar_food 0 0 0 47013 0
Based_on_physical_composition 0 0 0 41454 0
Calculated 4762782 0 578 41795 0
Concentration_adjustment 0 0 0 18 0
Estimated_from_ingredients 0 0 0 39304 0
Given_by_info_provider 68392 0 0 0 0
Label 0 0 0 5080 0
Manufacturer_supplied 0 0 0 15258 0
Based_on_other_nutrient 0 0 0 5618 0
Other 0 0 0 1960 0
Product_standard 0 0 0 29 0
Recipe 0 0 0 21734 0
Summed 0 0 1218 0 0
Food_composition_tables 0 0 0 997 0


Derivations were split into 17 groups based on the following groupings
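One way such a grouping could be implemented is keyword matching on the derivation description. The rules in this Python sketch are hypothetical and are not the ones used to build Table 8; only the group names are borrowed from that table.

```python
# Hypothetical (keyword, group) rules, checked in order; first match wins.
RULES = [
    ("analytical", "Analytical"),
    ("calculated", "Calculated"),
    ("zero", "Assumed_zero"),
    ("recipe", "Recipe"),
]


def derivation_group(description):
    """Map a free-text derivation description to a coarse group."""
    text = description.lower()
    for keyword, group in RULES:
        if keyword in text:
            return group
    return "Other"


for example in ["Analytical", "Assumed zero", "Calculated from label", "unlisted method"]:
    print(example, "->", derivation_group(example))
```

Because the first matching rule wins, the order of the rules decides where a description that mentions two keywords ends up, so any real rule list would need to be checked against all 64 descriptions.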

In the event that you prefer the groupings set by the original 10 source codes, the table would look like this:


Table 10: Source per Data Type
Source                                                                       Branded  Experimental  Foundation  SR legacy   FNDDS
Aggregated data involving comb. of codes other then 1,12 or6                       -             -           -        817       -
Aggregated data involving combinations of source codes 1, 6, 12 and/or 13          -             -           -       7261       -
Analytical data from the literature, partial documentation                         -             -           -       1075       -
Analytical or derived from analytical                                              -             -        8812     208534       -
Assumed zero                                                                       -             -           -      57711       -
Calculated by manufacturer, not adjusted or rounded for NLEA                       -             -           -      10607       -
Calculated from nutrient label by NDL                                              -             -           -       5080       -
Calculated or imputed                                                              -             -         578     166646       -
Manufacturer's analytical; partial documentation                             4831174             -           -       4513       -
Value manufacturer based label claim for added nutrients                           -             -           -        138       -
NA                                                                               681            11           -     181743  460395


note: NA = Not Available, unknown, or missing

You’ll notice that there were a lot of missing values in the table of derivations, so let’s take a look at how many derivations were missing in each data type.


Table 11: Missing Derivation Values
NA_derivation_count
branded_food 681
experimental_food 11
foundation_food 0
sr_legacy_food 181743
survey_fndds_food 460395


It looks as though the missing data is mostly coming from SR and FNDDS. There are other files that can be downloaded to determine more about the derivation of the FNDDS foods. However, from the documentation of the FNDDS it is noted that all FNDDS nutrition values are taken from a combination of other foods in FDC. We will explore the breakdown of what data FNDDS is derived from further in a later section.

Key notes/important takeaways:

  • SR legacy has the most variation in derivation methods because it was pulled together from a large variety of sources
  • We are missing the derivation method for 181743 of the 644125 nutrient entries in SR legacy. In other words, we don’t know how about 28.2% of the data in SR was derived.
  • All derivations in FNDDS are based on derivations for Foundation and SR
  • We know all the derivations for Foundation
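The share of missing derivations quoted above can be checked directly from the counts in Table 11 and the entry totals in Table 12. A short Python check (the dictionary names are arbitrary):

```python
# Counts copied from Table 11 (missing derivations) and Table 12 (n per data type).
missing = {"branded_food": 681, "experimental_food": 11, "foundation_food": 0,
           "sr_legacy_food": 181743, "survey_fndds_food": 460395}
total = {"branded_food": 4831855, "experimental_food": 11, "foundation_food": 9390,
         "sr_legacy_food": 644125, "survey_fndds_food": 460395}

share_missing = {name: missing[name] / total[name] for name in missing}

for name, share in share_missing.items():
    print(f"{name}: {share:.1%}")
```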

1.2 Average age of measurements

There are many different variables formatted as dates within the FDC data. To get an accurate look at the age of the measurements, we will have to look at several of them.

1.2.1 Publication Date

The one date variable provided for all values is the publication date which represents when each food/nutrient was uploaded to FoodData Central. Below is a table summarizing this variable.


Table 12: Publication Date
n min median max
branded_food 4831855 2019-04-01 2021-03-19 2021-03-19
experimental_food 11 2020-10-30 2020-10-30 2021-04-23
foundation_food 9390 2019-04-01 2019-12-16 2021-04-28
sr_legacy_food 644125 2019-04-01 2019-04-01 2019-04-01
survey_fndds_food 460395 2020-10-30 2020-10-30 2020-10-30


All publication dates are in 2019-2021 since FDC has only been around for 2 years. This makes this data rather unhelpful to us.

1.2.2 Foundation

The date variable associated with the Foundation foods is min_year_acquired, which tells us when foods in Foundation were purchased or procured for analysis. This date represents the oldest sample for each nutrient entry of each food. Let’s look at the distribution of min_year_acquired in Foundation.


Table 13: Minimum Year Acquired
n min median max
foundation_food 7458 1999 2015 2021


##  min_year_acquired
##  Min.   :1999     
##  1st Qu.:2011     
##  Median :2015     
##  Mean   :2014     
##  3rd Qu.:2018     
##  Max.   :2021     
##  NA's   :1932


We’ve got data ranging from 1999 to 2021, which gives us a fairly substantial range of around 22 years. However, we have 1932 missing dates, which is quite alarming.
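A summary of this kind has to set the missing years aside before computing the statistics and then report them separately. A minimal Python sketch on an invented column, where `None` stands in for a missing year:

```python
from statistics import median

# Toy column of min_year_acquired values; None marks a missing year.
years = [1999, 2011, None, 2015, 2018, None, 2021]

present = [year for year in years if year is not None]
summary = {
    "n": len(present),
    "min": min(present),
    "median": median(present),
    "max": max(present),
    "missing": len(years) - len(present),
}
print(summary)

# Consistency check on the reported figures: 9390 Foundation nutrient
# entries minus 1932 missing years should equal the n shown in Table 13.
reported_n = 9390 - 1932
print(reported_n)
```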



1.2.3 SR legacy

To get the dates for SR legacy we have to go back to the original SR legacy data and download the file “NUT_DATA”, which provides a variable named “AddMod_Date” giving the last modified date for each nutrient entry.


##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
## "1976-11-01" "1996-03-01" "2006-05-01" "2003-06-05" "2011-08-01" "2018-01-01" 
##         NA's 
##       "1493"


This isn’t looking so good. Some of these measurements are from 1976, the mean and median both fall in the 2000s, and a total of 1493 of the entries didn’t specify a date at all.



That is quite a jump in the early 2000s. More likely than not, some outside factor caused the amount of information to jump like that. If I had to guess, I would say there might have been a scientific advancement that allowed more people to study nutritional composition, or a large change in government funding.

1.2.4 FNDDS

For FNDDS we will have to look at both the start date and end date of each sample. The same date is in each entry of start_date and end_date. All samples started on “2017-01-01” and ended on “2018-12-31.” This data means practically nothing to us due to the fact that all FNDDS nutrient calculations are based off of nutrition information in SR and Foundation.

1.2.5 Branded

For Branded dates we’ll look at both “modified_date”, which is the last date the food was altered by the manufacturer, and “available_date”, which is the date the food was made available for inclusion in the database.

For modified date we have:


##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
## "2013-06-05" "2018-02-16" "2019-01-18" "2019-03-05" "2020-04-08" "2021-03-04" 
##         NA's 
##         "20"


For available date we have:


##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
## "2019-04-01" "2019-12-06" "2021-02-26" "2020-07-17" "2021-03-19" "2021-03-19"


This tells us more about how long food products can sit around before being sold than it does about when the nutrition information was gathered.

Key notes/important takeaways:

  • SR legacy still is using some measurements from as far back as 1976
  • We are missing 1493 of the dates for SR legacy’s 644125 nutrient entries, which is less than one percent.
  • We are missing 1932 of the dates for Foundation’s 9390 nutrient entries, meaning we are not sure when about 20.6% of this data came from. This is a lot more concerning.
  • All dates from FNDDS are listed from “2017-01-01” to “2018-12-31” which are the dates they put the numbers together, not when the data was actually collected.

2 Completeness

2.1 Average number of nutrients per data type

Below you’ll find a table of counts of how many nutrients were recorded per food in each data type. Here “total entries” is the total number of nutrient entries in that data type, “average count” is the average number of nutrients associated with each food, and “minimum count” and “maximum count” are the minimum and maximum number of nutrient entries associated with a single food in that data type. Note that foods with no nutrient information (primarily the 681 foods in Branded, discussed further in the missing data section) are not included in the minimum, as we are focusing on provided nutrient information. For example, for Foundation the most nutrients stated per food was 159 and the least was 13. If we grabbed a random Foundation food we would expect to know the values of about 67 nutrients for that food.


Table 14: Nutrients Listed per Food
Total Entries Average Count Minimum Count Maximum Count
branded_food 4831855 13.50 1 48
experimental_food 11 1.00 1 1
foundation_food 9390 67.07 13 159
sr_legacy_food 644125 82.65 8 138
survey_fndds_food 460395 65.00 65 65
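The columns of Table 14 can be derived by first counting nutrient entries per food and then summarizing those counts within each data type. A Python sketch on invented entries (names and values are illustrative), ending with a check of the reported Foundation average:

```python
from collections import Counter, defaultdict

# One (fdc_id, data_type) pair per nutrient entry; toy data only.
entries = [
    (1, "foundation_food"), (1, "foundation_food"), (1, "foundation_food"),
    (2, "foundation_food"),
    (3, "branded_food"), (3, "branded_food"),
]

# Step 1: number of nutrient entries per food.
per_food = Counter(entries)

# Step 2: collect those per-food counts by data type and summarize.
counts_by_type = defaultdict(list)
for (fdc_id, data_type), n in per_food.items():
    counts_by_type[data_type].append(n)

stats = {
    data_type: {"total": sum(c), "average": sum(c) / len(c), "min": min(c), "max": max(c)}
    for data_type, c in counts_by_type.items()
}
print(stats)

# Reported Foundation figures: 9390 entries spread over 140 foods.
foundation_average = round(9390 / 140, 2)
print(foundation_average)
```

Foods with no nutrient rows never appear in `entries`, so they are excluded from the minimum automatically, which matches the note above.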


Key notes/important takeaways:

  • There are foods in Branded where we only have information on one nutrient
  • SR legacy has only 8 nutrient entries for at least one food
  • On average, we have estimates for fewer than 14 nutrients per food in Branded
  • All FNDDS food entries contain information on exactly 65 nutrients

2.2 Most Frequent Entries

Even though not all measurements are technically nutrients, I will refer to them all as nutrients because the column of component names is labeled “nutrient_name.” Below is a plot of the 15 most used nutrient names overall. As you can see, the sheer number of entries in Branded gives it a lot of weight in the overall frequencies.



Below are 6 plots. The three plots on the left depict the top 15 most used nutrient names in that data type. The three plots on the right depict the frequencies of those same 15 nutrient names across all data types. We will skip FNDDS here because it has the exact same frequency for each of the 65 nutrients it uses. Nutrient names in the plots below are ordered from left to right by the frequency of each nutrient name in the specified database.



Key notes/important takeaways:

  • Energy is reported most in Foundation and SR legacy by a large margin
  • Protein is reported most in Branded; both protein and “Carbohydrate, by difference” rank above energy, which is certainly unexpected.
  • While branded has a large amount of variance in the different frequencies of nutrient names, both SR legacy and Foundation have a large drop after energy and then become fairly consistent
  • Branded has a tendency not to record ash and water content of most foods even though both are quite frequent in SR and Foundation
  • Fiber did not make it into the top 15 in either SR legacy or Foundation

2.3 Missing Data

Note that for the following variables we have no missing values:

  • fdc_id
  • data_type
  • description
  • publication_date

Below is a table of missing or “NA” values for each applicable variable.


Table 15: Missing Values
Number of Missing Values Percentage
nutrient_id 692 0.0001
amount 692 0.0001
data_points 5294155 0.8904
derivation_id 642830 0.1081
min 5841684 0.9825
max 5841682 0.9825
median 5938177 0.9987
min_year_acquired 5938318 0.9987
NDB_number 5292261 0.8901
nutrient_name 692 0.0001
unit_name 692 0.0001
derivation_description 642830 0.1081
food_code 5945776 1.0000
survey_start_date 5945776 1.0000
survey_end_date 5945776 1.0000


Naturally, all of the foods from data types other than FNDDS will be missing survey_start_date and survey_end_date. food_code is also only truly applicable to FNDDS.

Below you’ll find a breakdown of the values above per data type.


Table 16: Count of Missing Values per Data Type
branded_food experimental_food foundation_food sr_legacy_food survey_fndds_food
nutrient_id 681 11 0 0 0
amount 681 11 0 0 0
data_points 4831855 11 1894 0 460395
derivation_id 681 11 0 181743 460395
min 4831855 11 2189 547234 460395
max 4831855 11 2189 547232 460395
median 4831855 11 1791 644125 460395
min_year_acquired 4831855 11 1932 644125 460395
NDB_number 4831855 11 0 0 460395
nutrient_name 681 11 0 0 0
unit_name 681 11 0 0 0
derivation_description 681 11 0 181743 460395
food_code 4831855 11 9390 644125 460395
survey_start_date 4831855 11 9390 644125 460395
survey_end_date 4831855 11 9390 644125 460395


That table can be rather difficult to interpret without any context, so here is the same information displayed as the proportion of each variable missing within each data type, rounded to 3 decimal places.


Table 17: Percentage of Missing Values per Data Type
branded_food experimental_food foundation_food sr_legacy_food survey_fndds_food
nutrient_id 0 1 0.000 0.000 0
amount 0 1 0.000 0.000 0
data_points 1 1 0.202 0.000 1
derivation_id 0 1 0.000 0.282 1
min 1 1 0.233 0.850 1
max 1 1 0.233 0.850 1
median 1 1 0.191 1.000 1
min_year_acquired 1 1 0.206 1.000 1
NDB_number 1 1 0.000 0.000 1
nutrient_name 0 1 0.000 0.000 0
unit_name 0 1 0.000 0.000 0
derivation_description 0 1 0.000 0.282 1
food_code 1 1 1.000 1.000 1
survey_start_date 1 1 1.000 1.000 1
survey_end_date 1 1 1.000 1.000 1
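The proportions in Table 17 are the counts in Table 16 divided by the number of entries in each data type. A minimal Python sketch of that calculation on invented rows, where `None` marks a missing value; the function name is hypothetical:

```python
from collections import defaultdict

# Toy rows; None marks a missing value.
rows = [
    {"data_type": "foundation_food", "amount": 1.2, "min": None},
    {"data_type": "foundation_food", "amount": 0.4, "min": 0.3},
    {"data_type": "branded_food", "amount": 5.0, "min": None},
    {"data_type": "branded_food", "amount": None, "min": None},
]


def missing_proportions(rows, columns):
    """Proportion of missing values per column within each data type."""
    totals = defaultdict(int)
    missing = defaultdict(lambda: defaultdict(int))
    for row in rows:
        group = row["data_type"]
        totals[group] += 1
        for column in columns:
            if row.get(column) is None:
                missing[group][column] += 1
    return {group: {column: round(missing[group][column] / totals[group], 3)
                    for column in columns}
            for group in totals}


result = missing_proportions(rows, ["amount", "min"])
print(result)

# Check against the tables: 2189 missing "min" values among 9390 Foundation entries.
print(round(2189 / 9390, 3))
```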


Experimental foods are still missing most data, and NDB_number is only applicable to SR legacy and Foundation foods. It would appear there are 681 foods in Branded that have no nutrient data but do have a name and a unique fdc_id. They are all foods without available nutrition facts, such as alcohol. For example, the first 6 branded foods without nutrient information are as follows:

  • PALM 50l
  • RODENBACH Vintage 75cl 2012
  • RODENBACH Grand Cru 33cl
  • PALM 33cl
  • STEENBRUGGE Blond 20l One way keg
  • PALM HOP SELECT 20l One way keg

Below is a table of missing variables by nutrient name:



and again formatted as percentages we have:


You’ll notice that the nutrients that were used most have the most missing information. Further analysis must be done to draw any true conclusions from this data due to the large number of nutrients we have information on. For now we will have to look at the Essential Nutrients and work from there to draw conclusions.

Key notes/important takeaways:

  • There are 681 foods in the Branded foods data that have no nutrient data associated with them. It would appear these are foods that are not required to have a nutrition facts panel in a retail setting.
  • The amount of data missing for each variable seems highly correlated with the data type the entry is associated with. This makes sense, as different data types source their information differently.
  • We have 692 data points total that are not associated with a nutrient, this is the sum of the 681 Branded foods mentioned earlier and the 11 experimental foods within the data.

2.4 Frequency of Essential Nutrients

Note: For all tables below, data types are excluded if they contain no relevant entries. A table entry of “-” means that there were no entries for the listed nutrient in that category.

All percentage values displayed in plots are rounded to two decimal places; all percentage values displayed in tables are rounded to four decimal places.

2.4.1 Vitamins

Below is a table of the frequency of occurrences of each essential vitamin per data type:


Table 20: Essential Vitamins per Data Type
Count
                  Branded  Foundation  SR legacy  FNDDS
Vitamin A          185371          53       7386   7083
Thiamin             21157          84       7402   7083
Riboflavin          20232          87       7421   7083
Niacin              22207          90       7402   7083
Pantothenic acid     4765          56       6376      -
Vitamin B-6         13879          90       7262   7083
Biotin                461          12          -      -
Folate              12381          64       6877   7083
Vitamin B-12        10344          37       7113   7083
Vitamin C          194734          34       7332   7083
Vitamin D          101130         114      12323   7083
Vitamin E            3445          61       5586   7083
Vitamin K            2194          44       5055   7083
Choline               157          28       4612   7083
Percentage
                  Branded  Foundation  SR legacy  FNDDS
Vitamin A          0.5179      0.3786     0.9478      1
Thiamin            0.0591      0.6000     0.9498      1
Riboflavin         0.0565      0.6214     0.9523      1
Niacin             0.0620      0.6429     0.9498      1
Pantothenic acid   0.0133      0.4000     0.8182      -
Vitamin B-6        0.0388      0.6429     0.9319      1
Biotin             0.0013      0.0857          -      -
Folate             0.0346      0.4571     0.8825      1
Vitamin B-12       0.0289      0.2643     0.9127      1
Vitamin C          0.5441      0.2429     0.9408      1
Vitamin D          0.2825      0.8143     1.5813      1
Vitamin E          0.0096      0.4357     0.7168      1
Vitamin K          0.0061      0.3143     0.6487      1
Choline            0.0004      0.2000     0.5918      1


Below you’ll find two plots. The first shows the count data as it appears above. The second shows the percentage of foods in each data type that contain the essential vitamins listed above; the number above each bar in the second plot is the proportion of foods in that data type that contain the listed nutrient. For example, “foundation_food” has a value of 0.6 for Thiamin, which means that 60% of foods in the Foundation data contain information on thiamin content. All following sections within “Frequency of Essential Nutrients” have plots that can be interpreted in the same manner.




Since FNDDS entries all contain information on the same 65 nutrients, the percentage of FNDDS foods that contain information on a given nutrient will always be either 1 or 0.
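The proportions in these tables and plots come from dividing the number of foods that report a nutrient by the number of foods in the data type. The Python sketch below illustrates this on invented entries; the function and variable names are hypothetical. It counts distinct foods rather than entries, which is one way to keep a proportion from exceeding 1 when a food has several entries for the same nutrient.

```python
# Toy data: five Foundation foods and their (data_type, fdc_id, nutrient_name) entries.
foods_by_type = {"foundation_food": {1, 2, 3, 4, 5}}
entries = [
    ("foundation_food", 1, "Thiamin"),
    ("foundation_food", 2, "Thiamin"),
    ("foundation_food", 2, "Thiamin"),   # repeated entry for the same food
    ("foundation_food", 3, "Thiamin"),
    ("foundation_food", 4, "Niacin"),
]


def proportion_with(nutrient, data_type):
    """Share of foods in a data type with at least one entry for the nutrient."""
    with_nutrient = {fdc_id for t, fdc_id, name in entries
                     if t == data_type and name == nutrient}
    return len(with_nutrient) / len(foods_by_type[data_type])


toy_share = proportion_with("Thiamin", "foundation_food")
print(toy_share)

# The worked example from the text: 84 of 140 Foundation foods list Thiamin.
reported_share = round(84 / 140, 4)
print(reported_share)
```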

2.4.1.1 Vitamins With Multiple Forms

In this case, multiple forms existed for a few of the essential vitamins, below you will see a breakdown of all the different forms of these entries. Each tab contains a separate table that indicates the counts of each type of entry associated with that type of vitamin.

2.4.1.1.1 Vitamin A:
Table 21: A Vitamins per Data Type
Count
                Branded  Foundation  SR legacy  FNDDS
Vitamin A, IU    185371           -       7356      -
Retinol               -          27       6788   7083
Vitamin A, RAE        -          53       6918   7083
Percentage
                Branded  Foundation  SR legacy  FNDDS
Vitamin A, IU    0.5179           -     0.9439      -
Retinol               -      0.1929     0.8710      1
Vitamin A, RAE        -      0.3786     0.8877      1
2.4.1.1.2 Vitamin B9:
Table 21: B9 Vitamins per Data Type
Count
                                       Branded  Foundation  SR legacy  FNDDS
Folate, DFE                                179           -       6482   7083
Folate, total                             7622          64       6851   7083
Folic acid                                5789           -       6500   7083
10-Formyl folic acid (10HCOFA)               -           1          -      -
5-Formyltetrahydrofolic acid (5-HCOH4        -           1          -      -
5-methyl tetrahydrofolate (5-MTHF)           -           1          -      -
Folate, food                                 -           -       6722   7083
Percentage
                                       Branded  Foundation  SR legacy  FNDDS
Folate, DFE                             0.0005           -     0.8318      1
Folate, total                           0.0213      0.4571     0.8791      1
Folic acid                              0.0162           -     0.8341      1
10-Formyl folic acid (10HCOFA)               -      0.0071          -      -
5-Formyltetrahydrofolic acid (5-HCOH4        -      0.0071          -      -
5-methyl tetrahydrofolate (5-MTHF)           -      0.0071          -      -
Folate, food                                 -           -     0.8626      1
2.4.1.1.3 Vitamin D:
Table 22: D Vitamins per Data Type
Count
                                          Branded  Foundation  SR legacy  FNDDS
Vitamin D (D2 + D3)                            23          26       5185   7083
Vitamin D (D2 + D3), International Units   101088          26       5181      -
Vitamin D2 (ergocalciferol)                     1          21        138      -
Vitamin D3 (cholecalciferol)                   18          26       1819      -
25-hydroxycholecalciferol                       -          15          -      -
Percentage
                                          Branded  Foundation  SR legacy  FNDDS
Vitamin D (D2 + D3)                        0.0001      0.1857     0.6653      1
Vitamin D (D2 + D3), International Units   0.2824      0.1857     0.6648      -
Vitamin D2 (ergocalciferol)                0.0000      0.1500     0.0177      -
Vitamin D3 (cholecalciferol)               0.0001      0.1857     0.2334      -
25-hydroxycholecalciferol                       -      0.1071          -      -
2.4.1.1.4 Vitamin E:
Table 23: E Vitamins per Data Type
Count
                                   Branded  Foundation  SR legacy  FNDDS
Vitamin E                             1040           -          -      -
Vitamin E (alpha-tocopherol)           263          61       5580   7083
Vitamin E (label entry primarily)     2142           -          -      -
Tocopherol, beta                         -          61       1890      -
Tocopherol, delta                        -          61       1872      -
Tocopherol, gamma                        -          61       1888      -
Tocotrienol, alpha                       -          60       1463      -
Tocotrienol, beta                        -          60       1477      -
Tocotrienol, delta                       -          60       1461      -
Tocotrienol, gamma                       -          60       1466      -
Vitamin E, added                         -           -       4616   7083
Percentage
                                   Branded  Foundation  SR legacy  FNDDS
Vitamin E                           0.0029           -          -      -
Vitamin E (alpha-tocopherol)        0.0007      0.4357     0.7160      1
Vitamin E (label entry primarily)   0.0060           -          -      -
Tocopherol, beta                         -      0.4357     0.2425      -
Tocopherol, delta                        -      0.4357     0.2402      -
Tocopherol, gamma                        -      0.4357     0.2423      -
Tocotrienol, alpha                       -      0.4286     0.1877      -
Tocotrienol, beta                        -      0.4286     0.1895      -
Tocotrienol, delta                       -      0.4286     0.1875      -
Tocotrienol, gamma                       -      0.4286     0.1881      -
Vitamin E, added                         -           -     0.5923      1
2.4.1.1.5 Vitamin K:
Table 24: K Vitamins per Data Type
Count
                                  Branded  Foundation  SR legacy  FNDDS
Vitamin K (phylloquinone)            2194          44       5054   7083
Vitamin K (Dihydrophylloquinone)        -          38       1419      -
Vitamin K (Menaquinone-4)               -          32        606      -
Percentage
                                  Branded  Foundation  SR legacy  FNDDS
Vitamin K (phylloquinone)          0.0061      0.3143     0.6485      1
Vitamin K (Dihydrophylloquinone)        -      0.2714     0.1821      -
Vitamin K (Menaquinone-4)               -      0.2286     0.0778      -
2.4.1.1.6 Choline:
Table 25: Choline per Data Type
Count
                                     Branded  Foundation  SR legacy  FNDDS
Choline, from phosphotidyl choline         1          28          -      -
Choline, total                           156          28       4611   7083
Betaine                                    -          28       2091      -
Choline, free                              -          28          -      -
Choline, from glycerophosphocholine        -          28          -      -
Choline, from phosphocholine               -          28          -      -
Choline, from sphingomyelin                -          28          -      -
Percentage
                                     Branded  Foundation  SR legacy  FNDDS
Choline, from phosphotidyl choline    0.0000      0.2000          -      -
Choline, total                        0.0004      0.2000     0.5917      1
Betaine                                    -      0.2000     0.2683      -
Choline, free                              -      0.2000          -      -
Choline, from glycerophosphocholine        -      0.2000          -      -
Choline, from phosphocholine               -      0.2000          -      -
Choline, from sphingomyelin                -      0.2000          -      -

Key notes/important takeaways:

  • Vitamin names and measurements vary wildly between data types
  • Branded foods are supposed to have information on Vitamin A, Vitamin C, and Vitamin D. Because there are so many Branded foods, this inflates the overall counts of those vitamins, yet even the most frequent vitamin entry (Vitamin C) appears in less than 55% of Branded foods
  • SR Legacy has the highest proportions of entries for many of the vitamins but has fewer types of vitamin entries than Foundation.
  • There are no Biotin entries in SR legacy and FNDDS
  • There are no Pantothenic acid entries in FNDDS
  • Excluding FNDDS, there is no one essential vitamin that every food has a value for in any of the data types.

2.4.1.2 min, max, median

Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrients per data type.


Table 26: Count of Essential Vitamins
                   min                    max                    median
                   Foundation  SR legacy  Foundation  SR legacy  Foundation
Vitamin A                  20        329          20        329          27
Thiamin                    82       1370          82       1370          84
Riboflavin                 87       1402          87       1402          87
Niacin                     90       1386          90       1386          90
Pantothenic acid           51       1145          51       1145          56
Vitamin B-6                89       1297          89       1297          90
Biotin                     12          -          12          -          12
Folate                     59        769          59        769          64
Vitamin B-12               36        661          36        661          37
C (ascorbic acid)          31        655          31        655          34
Vitamin D                  18        136          18        136          26
Vitamin E                  57        942          57        942          61
Vitamin K                  38       1034          38       1034          44
Choline                     5        231           5        231          28


Table 27: Percentage of Entries with Essential Vitamins
min max median
Foundation SR legacy
Vitamin A 0.1429 0.0422
Thiamin 0.5857 0.1758
Riboflavin 0.6214 0.1799
Niacin 0.6429 0.1779
Pantothenic acid 0.3643 0.1469
Vitamin B-6 0.6357 0.1664
Biotin 0.0857
Folate 0.4214 0.0987
Vitamin B-12 0.2571 0.0848
C (ascorbic acid) 0.2214 0.0840
Vitamin D 0.1286 0.0175
Vitamin E 0.4071 0.1209
Vitamin K 0.2714 0.1327
Choline 0.0357 0.0296
Foundation SR legacy
Vitamin A 0.1429 0.0422
Thiamin 0.5857 0.1758
Riboflavin 0.6214 0.1799
Niacin 0.6429 0.1779
Pantothenic acid 0.3643 0.1469
Vitamin B-6 0.6357 0.1664
Biotin 0.0857
Folate 0.4214 0.0987
Vitamin B-12 0.2571 0.0848
C (ascorbic acid) 0.2214 0.0840
Vitamin D 0.1286 0.0175
Vitamin E 0.4071 0.1209
Vitamin K 0.2714 0.1327
Choline 0.0357 0.0296
Foundation
Vitamin A 0.1929
Thiamin 0.6000
Riboflavin 0.6214
Niacin 0.6429
Pantothenic acid 0.4000
Vitamin B-6 0.6429
Biotin 0.0857
Folate 0.4571
Vitamin B-12 0.2643
C (ascorbic acid) 0.2429
Vitamin D 0.1857
Vitamin E 0.4357
Vitamin K 0.3143
Choline 0.2000


Key notes/important takeaways:

  • As the following sections show, the variables “min”, “max”, and “median” are usually populated only for SR legacy and Foundation, and “median” is often available only for Foundation. There are exceptions to this.
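
The counts in these tables amount to asking, for each data type, how many nutrient rows have a non-empty “min”, “max”, or “median” field. A minimal sketch of that tally in Python follows. It assumes the food_nutrient rows have already been joined to their data type; the sample rows are made up, and the function name `count_summary_fields` and the simplified `nutrient` column are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical rows standing in for food_nutrient joined to food on fdc_id.
# The values are made up; "nutrient" is a simplified label for illustration.
SAMPLE = """data_type,nutrient,amount,min,max,median
foundation_food,Thiamin,0.10,0.08,0.12,0.10
foundation_food,Niacin,1.20,1.00,1.40,
sr_legacy_food,Thiamin,0.20,0.15,0.25,
branded_food,Thiamin,0.30,,,
"""


def count_summary_fields(rows, fields=("min", "max", "median")):
    """Count, per (data_type, field), the rows where that field is non-empty."""
    counts = Counter()
    for row in rows:
        for field in fields:
            if (row[field] or "").strip():
                counts[(row["data_type"], field)] += 1
    return counts


counts = count_summary_fields(csv.DictReader(io.StringIO(SAMPLE)))
for (data_type, field), n in sorted(counts.items()):
    print(data_type, field, n)
```

Dividing each count by the number of foods in the data type would give the proportions shown in the percentage tables.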

2.4.2 Minerals

Below you’ll find a table displaying the number of essential mineral entries that exist in each data type.


Table 28: Count of Foods Containing Essential Minerals
Branded Foundation SR legacy FNDDS
Calcium 291784 128 7708 7083
Chromium 194
Copper 3830 128 7284 7083
Iodine 1436 21
Iron 292662 128 7713 7083
Magnesium 11539 128 7421 7083
Manganese 3880 128 6492
Molybdenum 228 30
Phosphorus 11907 128 7467 7083
Potassium 128639 128 7516 7083
Selenium 2397 81 6865 7083
Zinc 9212 128 7406 7083
Chloride 232
Sodium 353724 110 7709 7083



Key notes/important takeaways:

  • Chromium and Chloride are only listed in Branded, which is unusual
  • Sodium, Calcium, Iron, and Potassium have high proportions in Branded because they commonly appear on nutrition facts panels
  • Although it’s very close, SR Legacy does not specify values for calcium, iron, and sodium for every food.
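
To make the "very close" claim concrete, the counts in Table 28 can be turned into coverage proportions and missing-food counts. The sketch below is a rough illustration only: the totals per data type are not stated in this section and are inferred from the tables (for example, 128 / 0.9143 for Foundation), so treat them as assumptions. The helper names are hypothetical.

```python
# Totals per data type are inferred from the tables in this report,
# not taken from an official source; they are assumptions.
TOTAL_FOODS = {"Foundation": 140, "SR legacy": 7793, "FNDDS": 7083}

# Counts copied from Table 28 for three minerals.
MINERAL_COUNTS = {
    "Calcium": {"Foundation": 128, "SR legacy": 7708, "FNDDS": 7083},
    "Iron": {"Foundation": 128, "SR legacy": 7713, "FNDDS": 7083},
    "Sodium": {"Foundation": 110, "SR legacy": 7709, "FNDDS": 7083},
}


def coverage(count, total):
    """Proportion of foods in a data type that have an entry."""
    return round(count / total, 4)


def missing_foods(count, total):
    """Number of foods in a data type without an entry."""
    return total - count


for mineral, per_type in MINERAL_COUNTS.items():
    for data_type, count in per_type.items():
        total = TOTAL_FOODS[data_type]
        print(mineral, data_type, coverage(count, total), missing_foods(count, total))
```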

2.4.2.1 min, max median

Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrients per data type.


Table 29: Count of Entries with Essential Minerals
min max median
Branded Foundation SR legacy FNDDS
Calcium 124 1703
Chromium
Copper 124 1629
Iodine 21
Iron 124 1690
Magnesium 128 7421 11539 7083
Manganese 124 1628
Molybdenum 30
Phosphorus 124 1663
Potassium 124 1678
Selenium 74 1159
Zinc 124 1659
Chloride
Sodium 106 1505
Foundation Branded SR legacy FNDDS
Calcium 124 1703
Chromium
Copper 124 1629
Iodine 21
Iron 124 1690
Magnesium 128 7421 11539 7083
Manganese 124 1628
Molybdenum 30
Phosphorus 124 1663
Potassium 124 1678
Selenium 74 1159
Zinc 124 1659
Chloride
Sodium 106 1505
Foundation Branded SR legacy FNDDS
Calcium 128
Chromium
Copper 128
Iodine 21
Iron 128
Magnesium 128 11539 7421 7083
Manganese 128
Molybdenum 30
Phosphorus 128
Potassium 128
Selenium 81
Zinc 128
Chloride
Sodium 110


Table 30: Percentage of Entries with Essential Minerals
min max median
Branded Foundation SR legacy FNDDS
Calcium 3e-04 12.1643
Chromium
Copper 3e-04 11.6357
Iodine 1e-04
Iron 3e-04 12.0714
Magnesium 4e-04 53.0071 1.4807 1
Manganese 3e-04 11.6286
Molybdenum 1e-04
Phosphorus 3e-04 11.8786
Potassium 3e-04 11.9857
Selenium 2e-04 8.2786
Zinc 3e-04 11.8500
Chloride
Sodium 3e-04 10.7500
Foundation Branded SR legacy FNDDS
Calcium 0.8857 0.0048
Chromium
Copper 0.8857 0.0046
Iodine 0.1500
Iron 0.8857 0.0047
Magnesium 0.9143 0.0207 1.4807 1
Manganese 0.8857 0.0045
Molybdenum 0.2143
Phosphorus 0.8857 0.0046
Potassium 0.8857 0.0047
Selenium 0.5286 0.0032
Zinc 0.8857 0.0046
Chloride
Sodium 0.7571 0.0042
Foundation Branded SR legacy FNDDS
Calcium 0.9143
Chromium
Copper 0.9143
Iodine 0.1500
Iron 0.9143
Magnesium 0.9143 0.0322 7421 1
Manganese 0.9143
Molybdenum 0.2143
Phosphorus 0.9143
Potassium 0.9143
Selenium 0.5786
Zinc 0.9143
Chloride
Sodium 0.7857


Key notes/important takeaways:

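  • The Magnesium rows report min, max, and median counts for every data type, which no other mineral does. These counts (128, 7421, 11539, and 7083) are identical to the total Magnesium entry counts in Table 28, so they may be an artifact of how the table was built rather than a real feature of the data. The cause is unknown.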
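
One quick sanity check for a row like Magnesium is to compare its min counts in Table 29 with its total entry counts in Table 28. The sketch below is a hypothetical helper, not part of the original analysis. It compares the numbers regardless of column order, because the column headers in Table 29 are not consistent.

```python
# Total entry counts from Table 28 and "min" counts from Table 29.
# Only the non-empty cells are listed; column order is deliberately ignored.
TOTALS = {
    "Calcium": [291784, 128, 7708, 7083],
    "Magnesium": [11539, 128, 7421, 7083],
}
MIN_COUNTS = {
    "Calcium": [124, 1703],
    "Magnesium": [128, 7421, 11539, 7083],
}


def looks_like_totals(summary_counts, totals):
    """True when the summary counts are exactly the total counts, in any order."""
    return sorted(summary_counts) == sorted(totals)


suspicious = [
    name for name in MIN_COUNTS if looks_like_totals(MIN_COUNTS[name], TOTALS[name])
]
print(suspicious)
```

A nutrient flagged this way is worth re-checking against the raw food_nutrient file before drawing conclusions from its min, max, or median counts.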

2.4.3 Amino Acids

Below you’ll find a table of the number of essential amino acid entries that exist for each data type.


Table 31: Amino Acids per Data Type
Count Percentage
Branded Foundation SR legacy
Histidine 48 32 5076
Isoleucine 47 32 5084
Leucine 47 32 5083
Lysine 48 32 5097
Methionine 48 32 5096
Cysteine 22 14
Methionine + Cysteine 22 14
Phenylalanine 48 32 5079
Tyrosine 46 32 5049
Phenylalanine + Tyrosine 46 32 5048
Threonine 48 32 5080
Tryptophan 47 32 5030
Valine 48 32 5083
Branded Foundation SR legacy
Histidine 1e-04 0.2286 0.6514
Isoleucine 1e-04 0.2286 0.6524
Leucine 1e-04 0.2286 0.6523
Lysine 1e-04 0.2286 0.6540
Methionine 1e-04 0.2286 0.6539
Cysteine 1e-04 0.1000
Methionine + Cysteine 1e-04 0.1000
Phenylalanine 1e-04 0.2286 0.6517
Tyrosine 1e-04 0.2286 0.6479
Phenylalanine + Tyrosine 1e-04 0.2286 0.6478
Threonine 1e-04 0.2286 0.6519
Tryptophan 1e-04 0.2286 0.6455
Valine 1e-04 0.2286 0.6523



Key notes/important takeaways:

  • In a vast majority of cases, if a food has a provided entry for one of these amino acids, it will have entries for all of them. There are a few exceptions to this where one or more will be missing, particularly tryptophan for some reason.
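
The completeness pattern described above can be checked food by food. The following is a minimal sketch under stated assumptions: the food identifiers and their reported amino acids are hypothetical, the list of names is a subset of the rows in Table 31, and `missing_amino_acids` is an illustrative helper rather than the method used for this report.

```python
# A subset of the amino acid names listed in Table 31.
ESSENTIAL = (
    "Histidine", "Isoleucine", "Leucine", "Lysine", "Methionine",
    "Phenylalanine", "Threonine", "Tryptophan", "Valine",
)


def missing_amino_acids(reported):
    """Map each partially covered food to the amino acids it lacks.

    Foods with no amino acid entries at all, or with all of them, are skipped.
    """
    gaps = {}
    for food, names in reported.items():
        present = set(names) & set(ESSENTIAL)
        if present and len(present) < len(ESSENTIAL):
            gaps[food] = [name for name in ESSENTIAL if name not in present]
    return gaps


# Hypothetical foods: one complete, one missing tryptophan, one with no data.
reported = {
    "food_a": list(ESSENTIAL),
    "food_b": [name for name in ESSENTIAL if name != "Tryptophan"],
    "food_c": [],
}
gaps = missing_amino_acids(reported)
print(gaps)
```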

2.4.3.1 min, max median

Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrients per data type.


Table 32: Count of Entries with Amino Acids
min max median
Foundation SR legacy
Histidine 29 7
Isoleucine 29 7
Leucine 29 7
Lysine 29 7
Methionine 29 7
Cysteine 12
Methionine + Cysteine 12
Phenylalanine 29 7
Tyrosine 29 7
Phenylalanine + Tyrosine 29 7
Threonine 29 7
Tryptophan 29 7
Valine 29 7
Foundation SR legacy
Histidine 29 7
Isoleucine 29 7
Leucine 29 7
Lysine 29 7
Methionine 29 7
Cysteine 12
Methionine + Cysteine 12
Phenylalanine 29 7
Tyrosine 29 7
Phenylalanine + Tyrosine 29 7
Threonine 29 7
Tryptophan 29 7
Valine 29 7
Foundation
Histidine 32
Isoleucine 32
Leucine 32
Lysine 32
Methionine 32
Cysteine 14
Methionine + Cysteine 14
Phenylalanine 32
Tyrosine 32
Phenylalanine + Tyrosine 32
Threonine 32
Tryptophan 32
Valine 32


Table 33: Percentage of Entries with Amino Acids
min max median
Foundation SR legacy
Histidine 0.2071 9e-04
Isoleucine 0.2071 9e-04
Leucine 0.2071 9e-04
Lysine 0.2071 9e-04
Methionine 0.2071 9e-04
Cysteine 0.0857
Methionine + Cysteine 0.0857
Phenylalanine 0.2071 9e-04
Tyrosine 0.2071 9e-04
Phenylalanine + Tyrosine 0.2071 9e-04
Threonine 0.2071 9e-04
Tryptophan 0.2071 9e-04
Valine 0.2071 9e-04
Foundation SR legacy
Histidine 0.2071 9e-04
Isoleucine 0.2071 9e-04
Leucine 0.2071 9e-04
Lysine 0.2071 9e-04
Methionine 0.2071 9e-04
Cysteine 0.0857
Methionine + Cysteine 0.0857
Phenylalanine 0.2071 9e-04
Tyrosine 0.2071 9e-04
Phenylalanine + Tyrosine 0.2071 9e-04
Threonine 0.2071 9e-04
Tryptophan 0.2071 9e-04
Valine 0.2071 9e-04
Foundation
Histidine 0.2286
Isoleucine 0.2286
Leucine 0.2286
Lysine 0.2286
Methionine 0.2286
Cysteine 0.1000
Methionine + Cysteine 0.1000
Phenylalanine 0.2286
Tyrosine 0.2286
Phenylalanine + Tyrosine 0.2286
Threonine 0.2286
Tryptophan 0.2286
Valine 0.2286


2.4.4 Omega 3


Table 34: Omega 3 Fatty Acids per Data Type
Count
                           Branded     Foundation  SR legacy   FNDDS
PUFA 18:3 n-3 c,c,c (ALA)  4           62          1967
PUFA 2:5 n-3 (EPA)                     61          5800        7083
PUFA 22:5 n-3 (DPA)                    59          5756        7083
PUFA 22:6 n-3 (DHA)                    61          5772        7083
Percentage (as a proportion)
                           Branded     Foundation  SR legacy   FNDDS
PUFA 18:3 n-3 c,c,c (ALA)  0           0.4429      0.2524
PUFA 2:5 n-3 (EPA)                     0.4357      0.7443      1
PUFA 22:5 n-3 (DPA)                    0.4214      0.7386      1
PUFA 22:6 n-3 (DHA)                    0.4357      0.7407      1



2.4.4.1 min, max median

Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrients per data type.


Table 35: Count of Entries with Omega 3 Fatty Acids
                      min                     max                     median
                      Foundation  SR Legacy   Foundation  SR Legacy   Foundation
18:3 n-3 c,c,c (ALA)  62          946         62          946         62
20:5 n-3 (EPA)        61          957         61          957         61
22:5 n-3 (DPA)        59          927         59          927         59
22:6 n-3 (DHA)        61          939         61          939         61


Table 36: Percentage of Entries with Omega 3 Fatty Acids
                      min                     max                     median
                      Foundation  SR Legacy   Foundation  SR Legacy   Foundation
18:3 n-3 c,c,c (ALA)  0.4429      0.1214      0.4429      0.1214      0.4429
20:5 n-3 (EPA)        0.4357      0.1228      0.4357      0.1228      0.4357
22:5 n-3 (DPA)        0.4214      0.1190      0.4214      0.1190      0.4214
22:6 n-3 (DHA)        0.4357      0.1205      0.4357      0.1205      0.4357


2.4.5 Omega 6


Table 37: Omega 6 Fatty Acids per Data Type
Count
                   Branded     Foundation  SR legacy
PUFA 18:2 n-6 c,c  2           63          1842
PUFA 20:3 n-6                  63          1260
PUFA 2:4 n-6                               165
Percentage (as a proportion)
                   Branded     Foundation  SR legacy
PUFA 18:2 n-6 c,c  0           0.45        0.2364
PUFA 20:3 n-6                  0.45        0.1617
PUFA 2:4 n-6                               0.0212



2.4.5.1 min, max median

Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrients per data type.


Table 38: Count of Entries with Omega 6 Fatty Acids
              min                     max                     median
              Foundation  SR Legacy   Foundation  SR Legacy   Foundation
18:2 n-6 c,c  63          873         63          873         63
20:3 n-6      63          720         63          720         63
20:4 n-6                  2                       2


Table 39: Percentage of Entries with Omega 6 Fatty Acids
              min                     max                     median
              Foundation  SR Legacy   Foundation  SR Legacy   Foundation
18:2 n-6 c,c  0.45        0.1120      0.45        0.1120      0.45
20:3 n-6      0.45        0.0924      0.45        0.0924      0.45
20:4 n-6                  0.0003                  0.0003


2.4.6 Total Trans, Saturated and Unsaturated Fatty Acids


Table 40: Fatty Acid Entries per Data Type
Count
                                    Branded     Foundation  SR Legacy   FNDDS
Fatty acids, total monounsaturated  46444       67          7277        7083
Fatty acids, total polyunsaturated  46415       67          7279        7083
Fatty acids, total saturated        307169      67          7450        7083
Fatty acids, total trans            293540      57          4179
Percentage (as a proportion)
                                    Branded     Foundation  SR Legacy   FNDDS
Fatty acids, total monounsaturated  0.1298      0.4786      0.9338      1
Fatty acids, total polyunsaturated  0.1297      0.4786      0.9340      1
Fatty acids, total saturated        0.8582      0.4786      0.9560      1
Fatty acids, total trans            0.8201      0.4071      0.5363



Key notes/important takeaways:

  • FNDDS does not specify total trans fat as one of its 65 nutrients
  • Several of these proportions come close to 100% without reaching it. FNDDS remains the only data type that reports a nutrient for 100% of its entries

2.4.6.1 min, max median

Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrient per data type.


Table 41: Count of Entries with Fatty Acids
                                    min                     max                     median
                                    Foundation  SR Legacy   Foundation  SR Legacy   Foundation
Fatty acids, total monounsaturated  20          27          20          27          20
Fatty acids, total polyunsaturated  20          27          20          27          20
Fatty acids, total saturated        20          55          20          55          20
Fatty acids, total trans            14          52          14          52          14


Table 42: Percentage of Entries with Fatty Acids
                                    min                     max                     median
                                    Foundation  SR Legacy   Foundation  SR Legacy   Foundation
Fatty acids, total monounsaturated  0.1429      0.0035      0.1429      0.0035      0.1429
Fatty acids, total polyunsaturated  0.1429      0.0035      0.1429      0.0035      0.1429
Fatty acids, total saturated        0.1429      0.0071      0.1429      0.0071      0.1429
Fatty acids, total trans            0.1000      0.0067      0.1000      0.0067      0.1000


2.4.7 Sugars


Table 43: Sugar Entries per Data Type
Count Percentage
Branded Foundation SR Legacy FNDDS
Fructose 2 70 1745
Glucose (dextrose) 4 70 1740
Inositol 51
Lactose 6 70 1712
Ribose 1
Sorbitol 32
Starch 30 45 1167
Sugars, added 61287
Sugars, total including NLEA 336043 5 6007 7083
Total sugar alcohols 3666
Xylitol 91
Galactose 60 1579
Maltose 70 1711
Sucrose 70 1733
Sugars, Total NLEA 70
Branded Foundation SR Legacy FNDDS
Fructose 0.0000 0.5000 0.2239
Glucose (dextrose) 0.0000 0.5000 0.2233
Inositol 0.0001
Lactose 0.0000 0.5000 0.2197
Ribose 0.0000
Sorbitol 0.0001
Starch 0.0001 0.3214 0.1497
Sugars, added 0.1712
Sugars, total including NLEA 0.9389 0.0357 0.7708 1
Total sugar alcohols 0.0102
Xylitol 0.0003
Galactose 0.4286 0.2026
Maltose 0.5000 0.2196
Sucrose 0.5000 0.2224
Sugars, Total NLEA 0.5000



Key notes/important takeaways:

  • Coverage of sugar information is poor: individual sugars are reported for at most half of Foundation foods, under a quarter of SR Legacy foods, and almost no Branded foods
  • Branded has a higher proportion of entries with total sugar than both Foundation and SR legacy. Even so, around 6% of branded foods are missing information on sugar.
  • Branded is the only data type with any information regarding sugar alcohols

2.4.7.1 min, max median

Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrient per data type.


Table 44: Count of Entries with Sugars
min max median
Foundation SR Legacy
Fructose 68 1112
Galactose 58 959
Glucose (dextrose) 68 1114
Lactose 68 1072
Maltose 68 1069
Starch 41 778
Sucrose 68 1119
Sugars, total including NLEA 5 1016
Foundation SR Legacy
Fructose 68 1112
Galactose 58 959
Glucose (dextrose) 68 1114
Lactose 68 1072
Maltose 68 1069
Starch 41 778
Sucrose 68 1119
Sugars, total including NLEA 5 1016
Foundation
Fructose 70
Galactose 60
Glucose (dextrose) 70
Lactose 70
Maltose 70
Starch 45
Sucrose 70
Sugars, total including NLEA 5


Table 45: Percentage of Entries with Sugars
min max median
Foundation SR Legacy
Fructose 0.4857 0.1427
Galactose 0.4143 0.1231
Glucose (dextrose) 0.4857 0.1429
Lactose 0.4857 0.1376
Maltose 0.4857 0.1372
Starch 0.2929 0.0998
Sucrose 0.4857 0.1436
Sugars, total including NLEA 0.0357 0.1304
Foundation SR Legacy
Fructose 0.4857 0.1427
Galactose 0.4143 0.1231
Glucose (dextrose) 0.4857 0.1429
Lactose 0.4857 0.1376
Maltose 0.4857 0.1372
Starch 0.2929 0.0998
Sucrose 0.4857 0.1436
Sugars, total including NLEA 0.0357 0.1304
Foundation
Fructose 0.5000
Galactose 0.4286
Glucose (dextrose) 0.5000
Lactose 0.5000
Maltose 0.5000
Starch 0.3214
Sucrose 0.5000
Sugars, total including NLEA 0.0357


2.4.8 Fiber

Below are the number of entries for total fiber in each data type.


Table 46: Fiber Entries per Data Type
Count
                                    Branded     Foundation  SR Legacy   FNDDS
Fiber, insoluble                    3745        11
Fiber, soluble                      4099        11
Fiber, total dietary                296982      69          7231        7083
Inulin                              12
Total dietary fiber (AOAC 2011.25)              5
Percentage (as a proportion)
                                    Branded     Foundation  SR Legacy   FNDDS
Fiber, insoluble                    0.0105      0.0786
Fiber, soluble                      0.0115      0.0786
Fiber, total dietary                0.8297      0.4929      0.9279      1
Inulin                              0.0000
Total dietary fiber (AOAC 2011.25)              0.0357



Key notes/important takeaways:

  • Surprisingly, Foundation does not provide much information regarding fiber: fewer than half of its foods have a total dietary fiber entry
  • Branded was the only data type to include any information on inulin content
  • SR legacy is missing fiber content for about 7% of its foods

2.4.8.1 min, max median

Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrient per data type.


Table 47: Count of Entries with Fiber
                                    min                     max                     median
                                    Foundation  SR Legacy   Foundation  SR Legacy   Foundation
Fiber, insoluble                    11                      11                      11
Fiber, soluble                      11                      11                      11
Fiber, total dietary                64          1029        64          1029        64
Total dietary fiber (AOAC 2011.25)  5                       5                       5


Table 48: Percentage of Entries with Fiber
                                    min                     max                     median
                                    Foundation  SR Legacy   Foundation  SR Legacy   Foundation
Fiber, insoluble                    0.0786                  0.0786                  0.0786
Fiber, soluble                      0.0786                  0.0786                  0.0786
Fiber, total dietary                0.4571      0.132       0.4571      0.132       0.4571
Total dietary fiber (AOAC 2011.25)  0.0357                  0.0357                  0.0357


2.4.9 Carbohydrates


Table 49: Carbohydrate Entries per Data Type
Count
                             Branded     Foundation  SR Legacy   FNDDS
Carbohydrate, by difference  355036      112         7793        7083
Carbohydrate, other          1188
Carbohydrate, by summation               37
Percentage (as a proportion)
                             Branded     Foundation  SR Legacy   FNDDS
Carbohydrate, by difference  0.9919      0.8000      1           1
Carbohydrate, other          0.0033
Carbohydrate, by summation               0.2643



Key notes/important takeaways:

  • For nearly all entries we have information on carbohydrates
  • 9 foods in Foundation are missing carbohydrate entries
  • Some foods in Foundation have entries for both “Carbohydrate, by summation” and “Carbohydrate, by difference”
  • 1703 foods in Branded are missing carbohydrate information, which proportionally speaking is very small
  • All SR Legacy foods have carbohydrate entries

2.4.9.1 min, max median

Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrient per data type.


Table 50: Count of Entries with Carbohydrates
                             min         max         median
                             Foundation  Foundation  Foundation
Carbohydrate, by difference  20          20          0


Table 51: Percentage of Entries with Carbohydrates
                             min         max         median
                             Foundation  Foundation  Foundation
Carbohydrate, by difference  0.1429      0.1429      0


2.4.10 Carotenoids


Table 52: Carotenoid Entries per Data Type
Count Percentage
Branded Foundation SR Legacy FNDDS
Carotene, beta 18 38 5440 7083
Lutein + zeaxanthin 12 29 5294 7083
Carotene, alpha 39 5352 7083
cis-beta-Carotene 11
cis-Lutein/Zeaxanthin 14
cis-Lycopene 11
Cryptoxanthin, alpha 13
Cryptoxanthin, beta 39 5341 7083
Lutein 11
Lycopene 37 5314 7083
Phytoene 2
Phytofluene 2
trans-beta-Carotene 11
trans-Lycopene 10
Zeaxanthin 12
Branded Foundation SR Legacy FNDDS
Carotene, beta 1e-04 0.2714 0.6981 1
Lutein + zeaxanthin 0e+00 0.2071 0.6793 1
Carotene, alpha 0.2786 0.6868 1
cis-beta-Carotene 0.0786
cis-Lutein/Zeaxanthin 0.1000
cis-Lycopene 0.0786
Cryptoxanthin, alpha 0.0929
Cryptoxanthin, beta 0.2786 0.6854 1
Lutein 0.0786
Lycopene 0.2643 0.6819 1
Phytoene 0.0143
Phytofluene 0.0143
trans-beta-Carotene 0.0786
trans-Lycopene 0.0714
Zeaxanthin 0.0857




Key notes/important takeaways:

  • Information on carotenoids in Foundation is provided seemingly at random

2.4.10.1 min, max median

Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrient per data type.


Table 53: Count of Entries with Carotenoids
min max median
Foundation SR Legacy
Carotene, alpha 34 379
Carotene, beta 32 439
cis-beta-Carotene 11
cis-Lutein/Zeaxanthin 14
cis-Lycopene 11
Cryptoxanthin, alpha 13
Cryptoxanthin, beta 34 377
Lutein 11
Lutein + zeaxanthin 21 322
Lycopene 31 352
Phytoene 2
Phytofluene 2
trans-beta-Carotene 11
trans-Lycopene 10
Zeaxanthin 12
Foundation SR Legacy
Carotene, alpha 34 379
Carotene, beta 32 439
cis-beta-Carotene 11
cis-Lutein/Zeaxanthin 14
cis-Lycopene 11
Cryptoxanthin, alpha 13
Cryptoxanthin, beta 34 377
Lutein 11
Lutein + zeaxanthin 21 322
Lycopene 31 352
Phytoene 2
Phytofluene 2
trans-beta-Carotene 11
trans-Lycopene 10
Zeaxanthin 12
Foundation
Carotene, alpha 39
Carotene, beta 37
cis-beta-Carotene 11
cis-Lutein/Zeaxanthin 14
cis-Lycopene 11
Cryptoxanthin, alpha 13
Cryptoxanthin, beta 39
Lutein 11
Lutein + zeaxanthin 26
Lycopene 36
Phytoene 2
Phytofluene 2
trans-beta-Carotene 11
trans-Lycopene 10
Zeaxanthin 12


Table 54: Percentage of Entries with Carotenoids
min max median
Foundation SR Legacy
Carotene, alpha 0.2429 0.0486
Carotene, beta 0.2286 0.0563
cis-beta-Carotene 0.0786
cis-Lutein/Zeaxanthin 0.1000
cis-Lycopene 0.0786
Cryptoxanthin, alpha 0.0929
Cryptoxanthin, beta 0.2429 0.0484
Lutein 0.0786
Lutein + zeaxanthin 0.1500 0.0413
Lycopene 0.2214 0.0452
Phytoene 0.0143
Phytofluene 0.0143
trans-beta-Carotene 0.0786
trans-Lycopene 0.0714
Zeaxanthin 0.0857
Foundation SR Legacy
Carotene, alpha 0.2429 0.0486
Carotene, beta 0.2286 0.0563
cis-beta-Carotene 0.0786
cis-Lutein/Zeaxanthin 0.1000
cis-Lycopene 0.0786
Cryptoxanthin, alpha 0.0929
Cryptoxanthin, beta 0.2429 0.0484
Lutein 0.0786
Lutein + zeaxanthin 0.1500 0.0413
Lycopene 0.2214 0.0452
Phytoene 0.0143
Phytofluene 0.0143
trans-beta-Carotene 0.0786
trans-Lycopene 0.0714
Zeaxanthin 0.0857
Foundation
Carotene, alpha 0.2786
Carotene, beta 0.2643
cis-beta-Carotene 0.0786
cis-Lutein/Zeaxanthin 0.1000
cis-Lycopene 0.0786
Cryptoxanthin, alpha 0.0929
Cryptoxanthin, beta 0.2786
Lutein 0.0786
Lutein + zeaxanthin 0.1857
Lycopene 0.2571
Phytoene 0.0143
Phytofluene 0.0143
trans-beta-Carotene 0.0786
trans-Lycopene 0.0714
Zeaxanthin 0.0857


2.4.11 Phytosterols


Table 55: Phytosterol Entries per Data Type
Count Percentage
Branded Foundation SR Legacy FNDDS
Cholesterol 300501 41 7394 7083
Beta-sitostanol 8
Beta-sitosterol 8 138
Brassicasterol 8
Campestanol 8
Campesterol 8 137
Delta-5-avenasterol 8
Phytosterols, other 4
Stigmasterol 8 137
Phytosterols 489
Branded Foundation SR Legacy FNDDS
Cholesterol 0.8396 0.2929 0.9488 1
Beta-sitostanol 0.0571
Beta-sitosterol 0.0571 0.0177
Brassicasterol 0.0571
Campestanol 0.0571
Campesterol 0.0571 0.0176
Delta-5-avenasterol 0.0571
Phytosterols, other 0.0286
Stigmasterol 0.0571 0.0176
Phytosterols 0.0627




Key notes/important takeaways:

  • A lot of cholesterol information is missing for such a commonly reported nutrient
  • Cholesterol is missing for about 16% of Branded foods, 71% of Foundation foods, and 5% of SR legacy foods
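
The share of foods missing a cholesterol entry is simply one minus the coverage proportion in Table 55. A small sketch of that arithmetic, with the proportions copied from the table and a hypothetical helper name:

```python
# Cholesterol coverage proportions copied from Table 55.
CHOLESTEROL_COVERAGE = {
    "Branded": 0.8396,
    "Foundation": 0.2929,
    "SR Legacy": 0.9488,
    "FNDDS": 1.0,
}


def missing_percent(coverage):
    """Percentage of foods without an entry, given the coverage proportion."""
    return round((1 - coverage) * 100, 1)


missing = {name: missing_percent(p) for name, p in CHOLESTEROL_COVERAGE.items()}
for name, percent in missing.items():
    print(f"{name}: {percent}% missing")
```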

2.4.11.1 min, max median

Below, you’ll find a table with three sections, the first describing the frequency of “min” values for each data type, the second containing frequencies of “max” values, and the third containing frequencies of “median” values for each of the listed nutrient per data type.


Table 56: Count of Entries with Phytosterols
min max median
Foundation SR Legacy
Beta-sitostanol 8
Beta-sitosterol 8 63
Brassicasterol 8
Campestanol 8
Campesterol 8 62
Cholesterol 40 864
Delta-5-avenasterol 8
Phytosterols, other 4
Stigmasterol 8 62
Phytosterols 2
Foundation SR Legacy
Beta-sitostanol 8
Beta-sitosterol 8 63
Brassicasterol 8
Campestanol 8
Campesterol 8 62
Cholesterol 40 864
Delta-5-avenasterol 8
Phytosterols, other 4
Stigmasterol 8 62
Phytosterols 2
Foundation
Beta-sitostanol 8
Beta-sitosterol 8
Brassicasterol 8
Campestanol 8
Campesterol 8
Cholesterol 41
Delta-5-avenasterol 8
Phytosterols, other 4
Stigmasterol 8
Phytosterols 0


Table 57: Percentage of Entries with Phytosterols
min max median
Foundation SR Legacy
Beta-sitostanol 0.0571
Beta-sitosterol 0.0571 0.0081
Brassicasterol 0.0571
Campestanol 0.0571
Campesterol 0.0571 0.0080
Cholesterol 0.2857 0.1109
Delta-5-avenasterol 0.0571
Phytosterols, other 0.0286
Stigmasterol 0.0571 0.0080
Phytosterols 0.0003
Foundation SR Legacy
Beta-sitostanol 0.0571
Beta-sitosterol 0.0571 0.0081
Brassicasterol 0.0571
Campestanol 0.0571
Campesterol 0.0571 0.0080
Cholesterol 0.2857 0.1109
Delta-5-avenasterol 0.0571
Phytosterols, other 0.0286
Stigmasterol 0.0571 0.0080
Phytosterols 0.0003
Foundation
Beta-sitostanol 0.0571
Beta-sitosterol 0.0571
Brassicasterol 0.0571
Campestanol 0.0571
Campesterol 0.0571
Cholesterol 0.2929
Delta-5-avenasterol 0.0571
Phytosterols, other 0.0286
Stigmasterol 0.0571
Phytosterols 0.0000


3 Variables Specific to Data Type

3.1 Market Acquisition

Part of Foundation is based on a dataset of branded foods that contains a significant amount of data not available for the other foods.

3.1.1 Market Country

A column for market country has been added to the data for branded foods. However, all entries indicate the market country to be “United States.” It is possible that this shows an intention to collect more data from other market countries.

3.1.2 Acquisition Locations

The file market_acquisition gives a breakdown of where each of 5480 collected branded foods came from. (Note: the summary file claims there are only 5327 entries in this file. That is true when you download the data in Access format, but the csv version of the file has 5480 entries.) Specifically, we have a “store_state” and “store_city” for each acquired item. Many items are duplicated because they were acquired multiple times from different locations. The unique entries in the store_state variable are as follows:

##  [1] "NE"       "PA"       "AZ"       "IA"       "TX"       "CA"      
##  [7] "IN"       "GA"       "NJ"       "KY"       "NC"       "FL"      
## [13] "NY"       "CO"       "TN"       "IL"       "WA"       "MA"      
## [19] "MN"       "MI"       "WI"       "KS"       "AL"       "CT"      
## [25] "AR"       "VA"       "OH"       "MO"       "OK"       ""        
## [31] "Al"       "WV"       "NM"       "NV"       "MD"       "Atlantic"
## [37] "West"     "Plains"   "Midwest"

34 of these entries are abbreviations of state names (one of them, “Al”, is a differently capitalized duplicate of “AL”, leaving 33 distinct states), 4 are region descriptions, and 1 represents missing values.
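
The breakdown of store_state values can be reproduced with a simple classification. The sketch below is an illustration in Python rather than the code used for this report. It uses a heuristic that is an assumption: any two-letter alphabetic value is treated as a state abbreviation, and anything else that is non-empty is treated as a region.

```python
from collections import Counter

# The unique store_state values listed above.
STORE_STATES = [
    "NE", "PA", "AZ", "IA", "TX", "CA", "IN", "GA", "NJ", "KY", "NC", "FL",
    "NY", "CO", "TN", "IL", "WA", "MA", "MN", "MI", "WI", "KS", "AL", "CT",
    "AR", "VA", "OH", "MO", "OK", "", "Al", "WV", "NM", "NV", "MD",
    "Atlantic", "West", "Plains", "Midwest",
]


def classify(value):
    """Label a store_state value as 'missing', 'state', or 'region' (heuristic)."""
    value = value.strip()
    if not value:
        return "missing"
    if len(value) == 2 and value.isalpha():
        return "state"
    return "region"


kinds = Counter(classify(value) for value in STORE_STATES)
# Upper-casing merges differently capitalized duplicates such as "Al" and "AL".
distinct_states = {v.upper() for v in STORE_STATES if classify(v) == "state"}
print(kinds, len(distinct_states))
```

Normalizing case before grouping by state avoids splitting one state's acquisitions across two labels.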

This data is far from evenly distributed across locations, which could introduce a confounding variable into an analysis if foods are not identical across all locations.

3.2 Food Groups

FNDDS uses the WWEIA food groups, which are split into 167 unique categories. The SR legacy and Foundation foods follow the SR legacy food groups, which are split into 28 unique categories. Branded has its own list of food groups, which contains 257 unique categories. No category name appears in all three lists. However, the food categories for FNDDS and Branded intersect on the following category names:

## [1] "Cheese"   "Rice"     "Tomatoes" "Pizza"    "Coffee"   "Beer"
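
An intersection like the one above is a plain set operation. The sketch below shows the idea in Python. The two lists are short hypothetical excerpts (including a placeholder entry), not the full 167 and 257 category lists, and the normalized variant is a suggestion for catching names that differ only in case or surrounding whitespace.

```python
def shared_categories(first, second):
    """Category names that appear verbatim in both lists."""
    return sorted(set(first) & set(second))


def shared_categories_normalized(first, second):
    """Same comparison after trimming whitespace and lower-casing."""
    def normalize(names):
        return {name.strip().lower() for name in names}

    return sorted(normalize(first) & normalize(second))


# Hypothetical excerpts; "Example Category" is a placeholder, not a real group.
wweia_excerpt = ["Cheese", "Rice", "Example Category"]
branded_excerpt = ["Cheese", "Rice", "example category "]

print(shared_categories(wweia_excerpt, branded_excerpt))
print(shared_categories_normalized(wweia_excerpt, branded_excerpt))
```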

3.3 FNDDS

Ingredients for FNDDS foods are not available in the downloads from the FDC website. They are, however, downloadable from the ARS website (https://www.ars.usda.gov/northeast-area/beltsville-md-bhnrc/beltsville-human-nutrition-research-center/food-surveys-research-group/docs/fndds-download-databases/). From there you can download the file “FNDDSIngred”, which can be linked to the information provided by FoodData Central using the food_code variable. The data available from the ARS website also lets you link each ingredient to its SR legacy equivalent and view its “Nutrient Value”, a number that ranks each food based on nutrient density. However, as the summary statistics below show, this variable has a problem with outliers. More than half of the values are below 1 (the median is 0.28) and three quarters are below 7.80, while several points are in the thousands.

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##     0.00     0.00     0.28    34.91     7.80 45902.00
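
One common way to quantify an outlier problem from a five-number summary is Tukey's rule, which flags values above the third quartile plus 1.5 times the interquartile range. The sketch below applies that rule to the summary statistics above. The rule is a conventional choice made here for illustration, not a method stated by the report.

```python
# Summary statistics copied from the output above.
Q1, MEDIAN, MEAN, Q3, MAXIMUM = 0.00, 0.28, 34.91, 7.80, 45902.00


def upper_fence(q1, q3, k=1.5):
    """Tukey's upper fence: values above it are conventionally called outliers."""
    return q3 + k * (q3 - q1)


fence = upper_fence(Q1, Q3)
print(f"upper fence: {fence:.2f}")
print("maximum is an outlier:", MAXIMUM > fence)
# A mean above the third quartile is a sign of a heavy right tail.
print("mean exceeds third quartile:", MEAN > Q3)
```

Because the mean is pulled so far above the median, the median and quartiles are the safer summaries for this variable.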

3.3.1 Derivations

Since all FNDDS foods are based on nutrient entries in SR legacy and Foundation, the derivations of each nutrient will reflect that. From the data available on the ARS website (link above), we can download the files “FNDDSIngred”, “IngredNutVal”, and “DerivDesc” to determine which foods from SR legacy and Foundation were used as ingredients, and from there look at the derivations of those foods.

3.4 Available Data on Flavonoids

3.4.1 USDA’s Expanded Flavonoid Database for the Assessment of Dietary Intakes

From the USDA Ag Data Commons website you can find a database of flavonoid content for many of the foods in the FDC database (3). These flavonoid values can be linked to Foundation and SR legacy foods by NDB number. Combining this data gives us amounts of the following flavonoids in 25 foods in Foundation and 1613 foods in SR legacy. In all cases, every flavonoid is provided for each food.

##  [1] "Daidzein"                       "Genistein"                     
##  [3] "Glycitein"                      "Cyanidin"                      
##  [5] "Petunidin"                      "Delphinidin"                   
##  [7] "Malvidin"                       "Pelargonidin"                  
##  [9] "Peonidin"                       "(+)-Catechin"                  
## [11] "(-)-Epigallocatechin"           "(-)-Epicatechin"               
## [13] "(-)-Epicatechin 3-gallate"      "(-)-Epigallocatechin 3-gallate"
## [15] "Theaflavin"                     "Thearubigins"                  
## [17] "Eriodictyol"                    "Hesperetin"                    
## [19] "Naringenin"                     "Apigenin"                      
## [21] "Luteolin"                       "Isorhamnetin"                  
## [23] "Kaempferol"                     "Myricetin"                     
## [25] "Quercetin"                      "Theaflavin-3,3'-digallate"     
## [27] "Theaflavin-3'-gallate"          "Theaflavin-3-gallate"          
## [29] "(+)-Gallocatechin"

3.4.2 USDA Database for the Flavonoid Content of Selected Foods

After the release of the USDA’s Expanded Flavonoid Database for the Assessment of Dietary Intakes in 2015, alterations and additions were made to the USDA Database for the Flavonoid Content of Selected Foods in 2018. While the other supplemental databases are available on the USDA Ag Data Commons, this new update was published solely on the USDA Agricultural Research Service website (4). There are values in the USDA Database for the Flavonoid Content of Selected Foods for a total of 183 foods in SR legacy and Foundation. Of those 183, 131 can also be found in the USDA’s Expanded Flavonoid Database for the Assessment of Dietary Intakes (meaning a total of 52 new foods have been added in this release). These newer values are assumed to be more accurate and if added to the FDC data should replace their previous versions.

Table 58: Flavonoid Entries for SR Legacy Foods
n
(-)-Epicatechin 82
(-)-Epicatechin 3-gallate 75
(-)-Epigallocatechin 73
(-)-Epigallocatechin 3-gallate 74
(+)-Catechin 83
(+)-Gallocatechin 71
Apigenin 101
Cyanidin 48
Delphinidin 46
Eriodictyol 2
Hesperetin 39
Isorhamnetin 43
Kaempferol 132
Luteolin 112
Malvidin 38
Myricetin 122
Naringenin 37
Pelargonidin 39
Peonidin 38
Petunidin 37
Quercetin 162
Theaflavin 3
Theaflavin-3'-gallate 3
Theaflavin-3,3'-digallate 3
Thearubigins 3


Table 59: Flavonoid Entries for Foundation Foods
Flavonoid n
(-)-Epicatechin 4
(-)-Epicatechin 3-gallate 4
(-)-Epigallocatechin 4
(-)-Epigallocatechin 3-gallate 4
(+)-Catechin 4
(+)-Gallocatechin 4
Apigenin 6
Cyanidin 2
Delphinidin 2
Hesperetin 2
Isorhamnetin 1
Kaempferol 7
Luteolin 6
Malvidin 2
Myricetin 7
Naringenin 2
Pelargonidin 2
Peonidin 2
Petunidin 2
Quercetin 7

3.4.3 USDA Database for the Proanthocyanidin Content of Selected Foods

The USDA Ag Data Commons hosts a database of proanthocyanidin content for many of the foods in the FDC database (5). These proanthocyanidin values can be linked to Foundation and SR Legacy foods by NDB number. The following tables list each proanthocyanidin class and the number of SR Legacy and Foundation foods that have an entry for it.


Table 60: Proanthocyanidin Entries for SR Legacy Foods
Proanthocyanidin n
Proanthocyanidin 4-6mers 114
Proanthocyanidin 7-10mers 110
Proanthocyanidin dimers 130
Proanthocyanidin polymers (>10mers) 108
Proanthocyanidin trimers 124


Table 61: Proanthocyanidin Entries for Foundation Foods
Proanthocyanidin n
Proanthocyanidin 4-6mers 6
Proanthocyanidin 7-10mers 6
Proanthocyanidin dimers 6
Proanthocyanidin polymers (>10mers) 6
Proanthocyanidin trimers 6
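Counts like those in Tables 60 and 61 could be produced by joining the supplemental values to the food list on NDB number and tallying foods per compound. The sketch below uses base R with made-up data; the data frames `foods` and `pa`, their column names, and the NDB numbers are hypothetical and stand in for the real files.

```r
# Hypothetical toy data; the real files have different columns and many more rows.
foods <- data.frame(
  ndb_number = c(1001, 1002, 1003),
  data_type  = c("sr_legacy_food", "foundation_food", "sr_legacy_food"),
  stringsAsFactors = FALSE
)
pa <- data.frame(
  ndb_number = c(1001, 1001, 1002, 1003, 9999),
  nutrient   = c("Proanthocyanidin dimers", "Proanthocyanidin trimers",
                 "Proanthocyanidin dimers", "Proanthocyanidin dimers",
                 "Proanthocyanidin dimers"),
  stringsAsFactors = FALSE
)

# Inner join on NDB number: supplemental rows with no matching food drop out.
linked <- merge(pa, foods, by = "ndb_number")

# Count distinct foods per compound within each data type.
linked <- unique(linked[, c("ndb_number", "nutrient", "data_type")])
counts <- as.data.frame(
  table(nutrient = linked$nutrient, data_type = linked$data_type),
  responseName = "n", stringsAsFactors = FALSE
)
print(counts)
```

Because `merge` performs an inner join by default, the row with the unmatched NDB number (9999 here) is excluded from the counts.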


3.4.4 USDA Database for the Isoflavone Content of Selected Foods

The USDA Ag Data Commons also hosts a database of isoflavone content for many of the foods in the FDC database (6). These isoflavone values can be linked to Foundation and SR Legacy foods by NDB number. The following tables list each isoflavone and the number of SR Legacy and Foundation foods that have an entry for it.


Table 62: Isoflavone Entries for SR Legacy Foods
Isoflavone n
Biochanin A 59
Coumestrol 123
Daidzein 262
Formononetin 123
Genistein 262
Glycitein 143
Total isoflavones 259


Table 63: Isoflavone Entries for Foundation Foods
Isoflavone n
Biochanin A 3
Coumestrol 8
Daidzein 15
Formononetin 8
Genistein 15
Glycitein 9
Total isoflavones 15


This data set overlaps substantially with the USDA’s Expanded Flavonoid Database for the Assessment of Dietary Intakes, but it adds values for “Biochanin A”, “Coumestrol”, “Formononetin”, and “Total isoflavones”.
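The split between shared and additional compounds can be checked with set operations on the compound names. In this minimal sketch the first vector is copied from Tables 62 and 63; the second is an assumption, inferred as the names in those tables that are not called out as additional.

```r
# Compound names as listed in Tables 62 and 63.
isoflavone_db <- c("Biochanin A", "Coumestrol", "Daidzein", "Formononetin",
                   "Genistein", "Glycitein", "Total isoflavones")

# Assumption: isoflavones already covered by the Expanded Flavonoid Database,
# inferred as the names above that are not described as additional.
expanded_isoflavones <- c("Daidzein", "Genistein", "Glycitein")

# Names only in the isoflavone database, and names present in both sources.
additional <- setdiff(isoflavone_db, expanded_isoflavones)
shared <- intersect(isoflavone_db, expanded_isoflavones)
print(additional)
print(shared)
```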

4 References

  1. U.S. Department of Agriculture, Agricultural Research Service. FoodData Central, 2019. fdc.nal.usda.gov.

  2. US Department of Agriculture, Agricultural Research Service. 2016. Nutrient Data Laboratory. USDA National Nutrient Database for Standard Reference, Release 28 (Slightly revised). Version Current: May 2016. http://www.ars.usda.gov/nea/bhnrc/mafcl

  3. Bhagwat, Seema; Haytowitz, David B.; Wasswa-Kintu, Shirley. (2015). USDA’s Expanded Flavonoid Database for the Assessment of Dietary Intakes, Release 1.1 - December 2015. Nutrient Data Laboratory, Beltsville Human Nutrition Research Center, ARS, USDA. https://doi.org/10.15482/USDA.ADC/1324677. Accessed 2022-01-12.

  4. Haytowitz, D.B., Wu, X., Bhagwat, S. 2018. USDA Database for the Flavonoid Content of Selected Foods, Release 3.3. U.S. Department of Agriculture, Agricultural Research Service. Nutrient Data Laboratory Home Page: http://www.ars.usda.gov/nutrientdata/flav

  5. Bhagwat, Seema; Haytowitz, David B. (2015). USDA Database for the Proanthocyanidin Content of Selected Foods, Release 2 (2015). Nutrient Data Laboratory, Beltsville Human Nutrition Research Center, ARS, USDA. https://doi.org/10.15482/USDA.ADC/1324621. Accessed 2022-01-12.

  6. Bhagwat, Seema; Haytowitz, David B. (2015). USDA Database for the Isoflavone Content of Selected Foods, Release 2.1 (November 2015). Nutrient Data Laboratory, Beltsville Human Nutrition Research Center, ARS, USDA. https://doi.org/10.15482/USDA.ADC/1324538. Accessed 2022-01-12.