
1 Introduction


The U.S. Department of Agriculture’s FoodData Central (FDC) database is one of the most expansive collections of food composition data available, if not the most expansive. By examining the data quality of FDC, we can determine what food composition data has been reliably collected and can be accurately used for research, and what information needs to be expanded or updated.

Furthermore, the database is expansive and difficult to parse. By analyzing and comparing collected variables in a detailed and readable format, we increase the potential of understanding the data and using it to further the state of nutrition research.


2 Data Collection


There are 4 ways to obtain data from the FoodData Central database (1). You can download the data as a collection of 35 Excel-compatible, CSV-delimited ASCII files, a single Microsoft Access database file, or a collection of 4 JavaScript Object Notation (JSON) files, or you can request data directly from the API.

As of October 2021, the Access format has been discontinued and it is recommended to use the new file format, JSON.

To fully analyze the age and groupings of nutrient measures in FDC, further information had to be collected from the original releases of SR Legacy and FNDDS published by the Agricultural Research Service to the USDA Ag Data Commons website (2-4).

The JSON files were read into R as strings using the jsonlite package and converted into data frames for ease of use in analysis. The 4 JSON files are named as follows:

  • FoodData_Central_branded_food_json_2021-10-28
  • FoodData_Central_foundation_food_json_2021-10-28
  • FoodData_Central_sr_legacy_food_json_2021-10-28
  • FoodData_Central_survey_food_json_2021-10-28

To accurately compare and contrast data within the four files and identify the quality of the data overall, we will have to analyze the variables available in each file and combine them into a single data structure.


3 Data Cleansing and Organization


Different variables are provided for each of the 4 data types.

The variables common among all 4 data types are:

Variable Name Variable Description
“foodClass” The classes of food within the data are “Survey” for FNDDS, “Branded” for branded foods, and “FinalFood” for SR legacy and foundation foods
“description” The name or description of the food such as “Milk, Whole” or “100 Grand Bar”
“foodNutrients” A nested variable containing all info on the nutrient composition (per 100g) and derivation of nutrient composition for each food
“foodAttributes” A nested variable left blank for SR legacy and foundation foods. For branded foods this variable contains a log of any updates made to this food (using variables “id”, “name”, “value”, “foodAttributeType.id”, “foodAttributeType.name”, and “foodAttributeType.description”). For survey foods this variable contains any attributes of the ingredients used
“fdcId” A unique identifier given to each food
“dataType” The dataset the food is contained in (of the 4 databases FNDDS, foundation, branded, and SR legacy)
“publicationDate” The day this version of the food as it appears in the data was published to the FoodData Central website


Just because these variables are present for all 4 data types does not mean that they are utilized for all 4 data types. For instance, every entry of “foodAttributes” in the SR legacy file is left blank.

These variables can be joined to create a comparison of the nutrient information per 100g and publication date of each food. All other information is unique to each data type and will have to be analyzed separately.
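
As a rough illustration of this join, the sketch below flattens the shared variables into one row per food and nutrient. It is written in Python rather than the R used for this report, and it rests on assumptions: that each JSON file has one top-level key holding a list of foods, and that each entry of “foodNutrients” has a nested “nutrient” object (with “name” and “unitName”) and an “amount”. The sample food and its values are made up.

```python
import json

# Hypothetical miniature of one FDC JSON download; the real files are far larger.
# Assumption: the top-level object has one key holding a list of foods, and each
# entry of "foodNutrients" carries a nested "nutrient" object and an "amount".
sample = """
{"SRLegacyFoods": [
  {"fdcId": 1, "dataType": "SR Legacy", "foodClass": "FinalFood",
   "description": "Milk, Whole", "publicationDate": "4/1/2019",
   "foodNutrients": [
     {"nutrient": {"name": "Protein", "unitName": "g"}, "amount": 3.2},
     {"nutrient": {"name": "Energy", "unitName": "kcal"}, "amount": 61}
   ]}
]}
"""

def flatten_foods(text):
    """Return one row per (food, nutrient) pair using only the shared variables."""
    document = json.loads(text)
    rows = []
    for foods in document.values():          # one list of foods per top-level key
        for food in foods:
            for entry in food.get("foodNutrients", []):
                nutrient = entry.get("nutrient", {})
                rows.append({
                    "fdcId": food["fdcId"],
                    "dataType": food["dataType"],
                    "publicationDate": food.get("publicationDate"),
                    "nutrient": nutrient.get("name"),
                    "unit": nutrient.get("unitName"),
                    "amount": entry.get("amount"),
                })
    return rows

rows = flatten_foods(sample)
print(len(rows), rows[0]["nutrient"])
```

Rows produced this way from each of the four files can be appended into one table, since they all share the same columns.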


Below you’ll find the number of foods and nutrient measures available for each data type.


Table 1: Number of Entries per Data Type
Data Type         food_entries   nutrient_entries
Branded                 373897            5138548
Foundation                 159              10023
SR Legacy                 7793             644125
Survey (FNDDS)            7083             460395
Total                   388932            6253091


Key notes/important takeaways:

  • The four data types were made for different purposes and contain vastly different data in many cases.
  • On average, there are 16 nutrients provided per food (total nutrient entries divided by total food entries) overall.
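
The average quoted above can be reproduced from the counts in Table 1. The short Python sketch below does that arithmetic and also computes the same ratio per data type; the variable names are illustrative only.

```python
# Counts copied from Table 1: (food entries, nutrient entries) per data type.
table_1 = {
    "Branded": (373897, 5138548),
    "Foundation": (159, 10023),
    "SR Legacy": (7793, 644125),
    "Survey (FNDDS)": (7083, 460395),
}

total_foods = sum(foods for foods, _ in table_1.values())
total_nutrients = sum(nutrients for _, nutrients in table_1.values())
overall_average = total_nutrients / total_foods

per_type_average = {
    name: nutrients / foods for name, (foods, nutrients) in table_1.items()
}

print(total_foods, total_nutrients, round(overall_average, 1))
for name, average in per_type_average.items():
    print(f"{name}: {average:.1f} nutrient entries per food")
```

Because Branded accounts for most of the food entries, the overall average mostly reflects that data type.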


4 Overall Analysis of Nutrient Measures and Date of Publication to FDC


4.1 Overall View of Nutrient Measures and Units by Data Type


Below is a table of the nutrient measures and units per data type. For foods or beverages with no nutrient entries, the nutrient name has been left blank.



Key notes/important takeaways:

  • There are 259 unique nutrient names, but not 259 unique nutrients. Multiple versions of one nutrient are often present in the data such as “Total dietary fiber (AOAC 2011.25)” and “Fiber, total dietary” or “Vitamin A, IU” and “Vitamin A, RAE”. Because of the way the nutrients were recorded, for each unit used to record a nutrient there is a unique nutrient name.
  • None of the data types use all 259 combinations of nutrient name and unit name.
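
One way to arrive at counts like these is to collect the distinct combinations of nutrient name and unit for each data type. The Python sketch below shows the idea on a handful of hypothetical rows; it is not the code used for this report.

```python
from collections import defaultdict

# Hypothetical long-format rows: (data type, nutrient name, unit).
rows = [
    ("SR Legacy", "Vitamin A, IU", "IU"),
    ("SR Legacy", "Vitamin A, RAE", "µg"),
    ("SR Legacy", "Fiber, total dietary", "g"),
    ("Foundation", "Total dietary fiber (AOAC 2011.25)", "g"),
    ("Foundation", "Vitamin A, RAE", "µg"),
    ("Foundation", "Vitamin A, RAE", "µg"),   # same measure repeated on another food
]

# A set per data type keeps each (name, unit) combination only once.
combos_by_type = defaultdict(set)
for data_type, name, unit in rows:
    combos_by_type[data_type].add((name, unit))

all_combos = set().union(*combos_by_type.values())
counts = {data_type: len(combos) for data_type, combos in combos_by_type.items()}
uses_all = {data_type: combos == all_combos for data_type, combos in combos_by_type.items()}

print(len(all_combos), counts, uses_all)
```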


4.2 Date of Publication to FDC


This variable indicates the date on which each nutrient measurement was added to the FDC database.


Figure 1: Publication to FDC by Data Type


Key notes/important takeaways:

  • Branded and foundation are being continually updated and new foods are added to these data types on a semi-regular basis.
  • SR Legacy has not been edited since it was originally published to FoodData Central in April 2019.
  • FNDDS has not been edited since it was originally published to FoodData Central in November 2020.


5 Analysis by Data Type


The 4 data types as defined by FDC are SR Legacy, Survey (FNDDS), Branded, and Foundation.

Different variables are provided for each data type, so each data type will have to be analyzed separately. For each data type we will be investigating the following areas of interest.

  • Nutrient Measures and Units
    • How many different nutrient measures were provided for each data type?
    • What nutrients are provided most for each data type?
  • Age of Nutrient Measures
    • How long ago were the nutrient measures recorded?
    • Are the nutrient measures being regularly updated?
  • Origin of Nutrient Measures
    • What methods were used to acquire the nutrient measures?
    • How much do we know about where the nutrient measures are coming from?
  • Food Categories
    • How have the foods been organized?
    • What types of food were collected the most?
    • Were foods consistently grouped for comparison?
  • Completeness
    • What nutrient measures are missing?
    • Are there any unexplained missing variables?
    • What foods and nutrient measures need to be researched further?

6 SR Legacy


The US Department of Agriculture (USDA) National Nutrient Database for Standard Reference is the major source of food composition data in the United States and provides the foundation for most food composition databases in the public and private sectors. This is the last release of the database in its current format. SR-Legacy will continue its preeminent role as a stand-alone food composition resource and will be available in the new modernized system currently under development. SR-Legacy contains data on 7,793 food items and up to 150 food components that were reported in SR28 (2015), with selected corrections and updates (2).


6.1 Nutrient Measures and Units


Below you’ll find the nutrient measures and corresponding units for all foods in SR legacy.



Figure 2: Frequency of Nutrient Measures in SR Legacy


Key notes/important takeaways:

  • Some nutrient measures are recorded using multiple units (such as energy being recorded in both kJ and kcal); this is reflected in the figure above.
  • There are 6 nutrient measures present for every SR Legacy food: Carbohydrate, by difference; Energy (in kcal); Energy (in kJ); Protein; Total lipid (fat); and Water. Ash comes close: it is missing for only 3 of the 7793 foods in SR Legacy.

6.2 Age of Nutrient Measures


FDC does not provide any variables for the age of nutrient measures in SR. To get the dates for SR legacy we have to go back to the original SR legacy data download on the USDA Ag Data Commons website (2). The file “NUT_DATA” provides a variable listed as “AddMod_Date”, which specifies the last modified date for each nutrient entry.


Table 4: Nutrient Measure Additions and Modifications in SR Legacy


Figure 3: Nutrient Measure Additions and Modifications in SR Legacy


Key notes/important takeaways:

  • Each nutrient measure has only one AddMod_Date, and it is not specified which dates are for additions and which are for modifications. Each time a nutrient measure is modified, the new AddMod_Date replaces the previous one.
  • There is no record of how many modifications were made or when previous modifications or additions took place.
  • There is data in SR Legacy that has not been updated since 1976.
  • Modifications or additions spiked in the years 1984, 2003, and 2013.
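
A tally of additions and modifications per year could be produced along the following lines. This Python sketch assumes, without confirmation from the source, that AddMod_Date values are strings of the form “MM/YYYY”; the dates listed are hypothetical.

```python
from collections import Counter

# Hypothetical AddMod_Date values; the "MM/YYYY" layout is an assumption.
add_mod_dates = ["08/1976", "05/1984", "11/1984", "03/2003", "", "07/2013"]

def year_of(value):
    """Return the four-digit year of an 'MM/YYYY' string, or None if missing."""
    value = value.strip()
    if not value:
        return None
    return int(value.rsplit("/", 1)[-1])

years = [year_of(value) for value in add_mod_dates]
per_year = Counter(year for year in years if year is not None)
missing = years.count(None)      # entries with no date at all
oldest = min(per_year)           # earliest year still present in the data

print(per_year.most_common(), missing, oldest)
```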


6.3 Origin of Nutrient Measures


There are three variables in the FoodData Central data that identify the origin of each nutrient measure. The derivation description is the method by which each nutrient measure was derived. The source code and source description identify the overall origin of each derivation method.


Table 5: Source and Derivation of Nutrient Measures in SR Legacy


Key notes/important takeaways:

  • One in three nutrient measures were obtained from analytical methods or derived from analytical methods.
  • Nearly 5% of nutrient measures are based on manufacturer calculations and labels.


6.4 Food Categories


There are 25 food categories present in SR legacy. Below you’ll find a breakdown of how many foods and nutrient measures were collected for each food category.



Table 7: Nutrient Measures per Food Category in SR Legacy


Key notes/important takeaways:

  • SR Food Categories are based on both how food products are packaged and sold and nutrient content.
    • This can lead to ambiguity in the placement of some foods, such as sausages made of beef being categorized as Sausages and Luncheon Meats rather than Beef Products.
    • There are some foods in SR that don’t necessarily fit into these groupings, such as condiments. For instance, catsup is in Vegetables and Vegetable Products and mustard is in Spices and Herbs.


6.5 Completeness


In this section we will be investigating the extent to which the data is complete and what further research needs to be completed.


All Nutrient Measures and Grouping



Figure 4: Frequency of Nutrient Measures For All Foods


Key notes/important takeaways:

  • On average, there are about 83 nutrient measures for each food in SR, which is well above the average of 16 for all foods in FDC.
  • Foods in SR most commonly have about 70 or 90 nutrient measures (as represented by the two peaks in the graph above).


Nutrient measures were grouped for comparison. Below you’ll find the groupings of every nutrient measure in SR legacy.


Table 8: Nutrient Measure Groups in SR Legacy


Nutrient Measures by Food Category



Each tab below displays the most essential or important nutrient measures in each nutrient measure group. The grouping of nutrient names into the groups displayed below can be found in table 8. The percentages below represent the proportion of foods in each food category containing at least one nutrient measure in the specified subgroup (For instance, if 50% of a food category contains Vitamin A, 50% of foods in that category contain either Retinol, Vitamin A, IU, or Vitamin A, RAE).
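
The percentage described above can be sketched as follows: a food counts toward a subgroup if it has at least one nutrient measure belonging to that subgroup. The Python example uses hypothetical foods and the Vitamin A subgroup from the example in the previous paragraph.

```python
from collections import defaultdict

# Hypothetical foods: (food category, nutrient names present for that food).
foods = [
    ("Dairy and Egg Products", {"Retinol", "Protein"}),
    ("Dairy and Egg Products", {"Protein"}),
    ("Beef Products", {"Vitamin A, IU", "Vitamin A, RAE"}),
    ("Beef Products", {"Vitamin A, RAE"}),
]

# One subgroup and the nutrient names that belong to it.
subgroup = {"Retinol", "Vitamin A, IU", "Vitamin A, RAE"}

totals = defaultdict(int)
with_subgroup = defaultdict(int)
for category, nutrients in foods:
    totals[category] += 1
    if nutrients & subgroup:       # at least one member of the subgroup is present
        with_subgroup[category] += 1

coverage = {category: 100 * with_subgroup[category] / totals[category] for category in totals}
print(coverage)
```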


Carbohydrates


Key notes/important takeaways:

  • Soluble and insoluble fiber are not reported in SR legacy

  • Every food in SR Legacy has a value for total carbohydrates

  • Overall Inconsistencies in carbohydrate measures:

    • Starch is not provided for any pork or beef products
    • There is information on starch for only 26% of baked products
    • Total fiber is provided for only 52% of American Indian/Alaska Native Foods
    • Starch is provided for 92% of Restaurant Foods, which is a much higher percentage than any of the other food categories


Proteins


Key notes/important takeaways:

  • Total protein is provided for all foods.

  • All essential amino acids are reported at similar rates within each food category, but differ greatly between food categories.

  • No essential amino acids are provided for 100% of the products in any food category

  • There were more non-essential amino acid measures than essential amino acid measures collected overall. However, the essential amino acids seem to have been collected more consistently with each one being present for a little more than 5000 foods whereas the non-essentials range from being collected for 1431 foods (Hydroxyproline) to 5170 foods (Theobromine).

  • Overall Inconsistencies in protein measures:

    • 100% of poultry products have values for all of the provided non-essential amino acids.
    • Values for essential amino acids are only provided for 22% of breakfast cereals, 19-21% of baby foods, and 12% of beverages.


Sugars


Key notes/important takeaways:

  • Some sugars included in FDC (Raffinose, Stachyose, Ribose, and Verbascose) are not specified for foods in SR.

  • Overall Inconsistencies in sugar measures:

    • For sweets, there are values for sucrose, glucose, fructose, lactose, and maltose for 10% of foods and galactose for only 9% of foods.
    • Values of galactose are recorded less than all the other sugars for 14 of the 25 food categories. In the 11 food categories where galactose is not recorded less, it is recorded at the same rate as lactose and maltose.
    • Only 18% of breakfast cereals have values for all 6 sugars.


Vitamins


Key notes/important takeaways:

  • Biotin B7 is not recorded in SR legacy

  • Overall inconsistencies in vitamin measures:

    • Measures of vitamin K are recorded for 19% of Lamb, Veal, and Game Products; all other food categories have vitamin K values for 40% of foods or more.
    • Measures of vitamin D are present for 11% of American Indian/Alaska Native Foods and 23% of Restaurant Foods; all other food categories have vitamin D values for 40% of foods or more.
    • Values for thiamin, riboflavin, and niacin are reported for at least 83% of foods in every food category.


Fats


Key notes/important takeaways:

  • Total fat is provided for all foods.

  • There were more varieties of fat measures collected than any other measure group.

  • Overall inconsistencies in fat measures:

    • Restaurant Foods is the only food category where 100% of foods have values for both trans fat and omega 6 fatty acids.
    • Values of trans fat are only provided for 43% of foods in the category fats and oils and values for omega-6 are only provided for 36%.
    • Omega-6 values are provided for less than 30% of foods in 18 of the 25 food groups.


Minerals


Key notes/important takeaways:

  • Iodine, chloride (or chlorine), and chromium were not provided for any foods in SR Legacy.

  • Overall inconsistencies in mineral measures:

    • Fluoride is not provided for any Pork Products, American Indian/Alaska Native Foods, and Restaurant Foods
    • Less than 30% of foods in all food categories have fluoride measures.


Phytochemicals


Key notes/important takeaways:

  • Overall inconsistencies in phytochemical measures:
    • Nut and seed products report stigmasterol, campesterol, and beta-sitosterol for 51% of foods, which is a much higher rate than any other food category.
    • Stigmasterol, campesterol, and beta-sitosterol are not provided for 11 food groups.


Figure 5: Frequency of Missing Nutrient Measure Entries by Food Category


Key notes/important takeaways:

  • All fat, protein, and sugar values are provided for at least one food in each food group.


Figure 6: Total Nutrient Measures by Food Category in SR Legacy


Key notes/important takeaways:

  • There are more nutrient measures recorded in beef products than any other food category


Date of Additions and Modifications



Each tab below displays the most essential or important nutrient measures in each nutrient measure group. The grouping of nutrient names into the groups displayed below can be found in table 8. The percentages below represent the proportion of each nutrient measure that was added or modified in the given time period. In this case, every column sums to 100%.

All AddMod dates had to be collected from the ARS website; they were not provided by FDC.


Carbohydrates


Key notes/important takeaways:

  • All starch measures were recorded after 1995.
  • 67% of carbohydrate measures currently in use were recorded between 2001 and 2020.
  • 41% of total fiber measurements are more than 20 years old.


Proteins


Key notes/important takeaways:

  • An addition or modification date is provided for all protein measures.


Sugars


Key notes/important takeaways:

  • An addition or modification date is provided for all sugar measures.
  • No sugar measures currently in use were recorded before the year 1996.


Vitamins


Key notes/important takeaways:

  • All Choline measures are from 2001 or later.
  • More than 20% of all values for thiamin, riboflavin, niacin, pantothenic acid, and pyridoxine are from the 1980s.
  • More than 50% of all vitamin measures were added or modified after the year 2000.


Fats


Key notes/important takeaways:

  • All trans fat and omega-6 values were added or modified after the year 2000.


Minerals


Key notes/important takeaways:

  • All selenium values were added or modified after 1990.
  • All fluoride values were added or modified after 2000.


Phytochemicals


Key notes/important takeaways:

  • An addition or modification date was provided for all phytochemicals.
  • All phytochemical measures were added or modified after 1996.




Key notes/important takeaways:

  • Only 7 food categories were collected prior to 1981, 13 food categories were added between 1981 and 1985, 2 were added between 1986 and 1990, and the remaining 3 food categories were added in the late 1990s and early 2000s.
  • Missing addition or modification dates do not appear to be associated with any particular food category.


Figure 7: Frequency of Missing Addition or Modification Date Entries


Key notes/important takeaways:

  • Less than 1% of addition or modification dates of SR foods were missing.


Origin of Nutrient Measures



Each tab below displays the most essential or important nutrient measures in each nutrient measure group. The grouping of nutrient names into the groups displayed below can be found in table 8. The percentages below represent the proportion of each nutrient measure that was acquired by that source. In this case, every column sums to 100%.


Carbohydrates


Key notes/important takeaways:

  • 91% of starch values were derived by analytical means.
  • 73% of total carbohydrate values were calculated or imputed.


Proteins


Key notes/important takeaways:

  • 46% - 47% of essential amino acid values have no provided source.
  • 39% of non-essential amino acid values have no provided source, meaning that the source was provided more often for the non-essential amino acids than the essential amino acids.
  • 6% of non-essential amino acid values were assumed zero.


Sugars


Key notes/important takeaways:

  • Total sugar values are spread among multiple sources but types of sugar were mostly (83%-85%) derived analytically.


Vitamins


Key notes/important takeaways:

  • The origin was provided for almost 100% of vitamin K values and 98% of vitamin E values.
  • 42% of vitamin D and vitamin B12 measures were assumed zero.


Fats


Key notes/important takeaways:

  • Omega-6 was the only fat measure that was never assumed zero, and the only fat measure that has a source for every value.


Minerals


Key notes/important takeaways:

  • At least 36% of all mineral measures were derived analytically.
  • 56% of selenium values were calculated or imputed.


Phytochemicals


Key notes/important takeaways:

  • 79% of all Stigmasterol, Campesterol, and beta-sitosterol values were analytically derived.


Figure 8: Frequency of Missing Origin Entries


Key notes/important takeaways:

  • Nearly 1/3 of the origins of nutrient measure values are missing.


6.6 Comparison of File Formats


Ideally, the information available on FDC would be identical across all file formats; however, this is not the case.

Requesting Data Using the API


To collect FDC data using the API, an API key must be acquired from FDC. API access is limited to 3,600 requests per hour per IP address. In the event that you need to collect a large amount of food data from the API you must contact FDC directly.

There are 4 ways to collect data through the API:

  1. Get /v1/food/{fdcId}
    • Requires one FDC ID and a list of up to 25 nutrient numbers (only the nutrient information for the specified nutrients will be returned) as input.
    • Retrieves a single food item by an FDC ID.
  2. Get /v1/foods
    • Requires a list of multiple FDC IDs and a list of up to 25 nutrient numbers (only the nutrient information for the specified nutrients will be returned) as input.
    • Retrieves a list of food items by a list of up to 20 FDC IDs.
  3. Get /v1/foods/list
    • No input required.
    • Retrieves a paged list of foods.
    • Results can be filtered by data type and there are options for result page sizes or sorting.
    • A maximum of 200 foods may be retrieved per request.
    • Only provides an abridged version of the data, which includes: FDC id, name and description of food, date published to FDC website, food code (FNDDS only), the nutrient measures calculated per 100g or 100ml of the food, and the method of derivation for each nutrient measure collected.
  4. Get /v1/foods/search
    • Requires one or more search terms as input.
    • Returns a list of foods that matched search keywords.
    • Results can be filtered by data type and there are options for result page sizes or sorting.
    • A maximum of 200 foods may be retrieved per request.


Through methods 1, 2, and 4, variables relating to nutrient acquisition and analysis are provided for foods in SR that cannot be found in the JSON, ASCII, and MS Access files downloadable on the FDC website. No description was provided for any of these variables.



Comparison of JSON and ASCII Data


Below you’ll find a comparison of variables in the JSON and ASCII file formats. This comparison covers variables unique to each format and is not an exhaustive list. By design, the ASCII format will contain multiple id variables per file that are used to combine the data from multiple files. These id variables are not necessary in the JSON format.


Table 32: Comparison of Unique SR Legacy Variables in JSON and ASCII Formats


Key notes/important takeaways:

  • Variable descriptions have not been provided for 11 variables overall.
  • The content of the two formats is very similar, but major inconsistencies include:
    • Footnotes and median values for nutrient measures are exclusive to the ASCII format.
    • Food Attributes show up as empty variables in the JSON format; this is almost certainly an error in exporting the data from the API.
    • Scientific name, food class, and food nutrient type appear exclusively in the JSON format. Scientific name provides further information on the foods in SR Legacy but food class and food nutrient type appear to be more for navigating through the system and act as replacements for certain id variables.
    • The food_portion file/table contains a variable called amount in the ASCII format; the information provided by amount is also included in the modifier variable present in both formats.


7 Survey (FNDDS)


FNDDS is a database that provides the nutrient values for foods and beverages reported in What We Eat in America, the dietary intake component of the National Health and Nutrition Examination Survey. FNDDS is made available for researchers to review the nutrient profiles for specific foods and beverages as well as their associated portions and recipes(3).


7.1 Nutrients and Units


As you will see in the table below, FNDDS provides information on the same 65 nutrients for each of its 7083 foods.

In previous versions of FNDDS, the nutrient profiles of each food were expanded to include 29 flavonoids; however, the flavonoid data associated with the current version of FNDDS is not due for release until summer of 2022.



Figure 9: Frequency of Nutrient Measures in FNDDS


Key notes/important takeaways:

  • Many essential nutrients are missing from FNDDS, such as biotin (B7), pyridoxine (B6), chloride, chromium, tryptophan, and threonine, among others.
  • There is a value for each of the 65 nutrient measures for every food. However, 28.9% (133106 out of 460395) of these nutrient measure values are 0. Since we cannot distinguish values assumed to be zero from values calculated to be zero, there is no way of knowing how many of these nutrient measures are correct.


7.2 Age of Nutrient Measures


There are two variables provided for the age of nutrient measures in FNDDS: the start date and end date of each sample. All samples started on “2017-01-01” and ended on “2018-12-31.”


Additional variables measuring the age of nutrient values in FNDDS are available in the original release of FNDDS as it appears on the Food Surveys Research Group Home Page(3) but are not present in the JSON files downloadable on the FDC website. These additional variables include the addition and/or modification date of SR foods and minimum year acquired of Foundation foods used to calculate the nutrient values in FNDDS.


7.3 Origin of Nutrient Measures


There is no variable for the origin of the nutrient measurements in FNDDS. Instead, we are given a list of foods in SR Legacy and Foundation that were used as components to calculate the nutrient measurements in FNDDS. However, the documentation of FNDDS states that for a few ingredient codes, a source other than SR Legacy or Foundation was the basis for either all or only select nutrients.


While the specific source is documented as a variable named “Nutrient Value Source” in the original release of FNDDS (as it appears on the Food Surveys Research Group Home Page)(3), this variable was not included in the FDC release of the data.

Sources used to calculate nutrient values are listed in the documentation and include:



7.4 Food Categories


FNDDS uses the food categories from What We Eat in America (WWEIA).

These categories are further grouped in a PDF labeled “List of WWEIA Food Categories 2017-2018” on the ARS website (4).



Table 34: Frequency of Foods by Food Category in FNDDS


Figure 10: Frequency of Foods by Food Category in FNDDS


Key notes/important takeaways:

  • The foods in FNDDS are supposed to reflect the average American diet, so it makes sense that the category containing the most foods (28%) is mixed dishes.
  • The subgroup with the most foods is Mixed Dishes – Meat, Poultry, Seafood which contains 8% of all foods in FNDDS.


7.5 Completeness


FNDDS provides values for the same 65 nutrient measures for all of its 7083 foods, with no exceptions. However, we can investigate which nutrient measure values were assumed to be zero by looking at the overall frequency of zero values present in the data.


Figure 11: Frequency of Nutrient Measure Values Recorded as Zero in FNDDS


Key notes/important takeaways:

  • Over 85% of values for ethyl alcohol, caffeine, lycopene, PUFA 18:4, theobromine, added vitamin B12, and added vitamin E are zero. These are all components that appear rarely in foods, so it is logical for their values to be assumed zero.
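
The frequency of zero values per nutrient measure can be computed with a simple tally. The Python sketch below uses hypothetical values for two nutrients and then repeats the overall share reported in section 7.1 from the counts given there.

```python
from collections import defaultdict

# Hypothetical (nutrient name, amount) pairs; every food reports every nutrient.
values = [
    ("Caffeine", 0), ("Caffeine", 0), ("Caffeine", 40), ("Caffeine", 0),
    ("Protein", 3.2), ("Protein", 0), ("Protein", 8.1), ("Protein", 20.5),
]

totals = defaultdict(int)
zeros = defaultdict(int)
for name, amount in values:
    totals[name] += 1
    if amount == 0:
        zeros[name] += 1

zero_share = {name: 100 * zeros[name] / totals[name] for name in totals}

# The overall share reported in section 7.1 follows the same arithmetic.
overall_share = 100 * 133106 / 460395

print(zero_share, round(overall_share, 1))
```

A tally like this shows how often a value is zero, not why; it cannot separate assumed zeros from calculated zeros.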



7.6 Comparison of File Formats


Ideally, the information available on FDC would be identical across all file formats; however, this is not the case.

Requesting Data Using the API


To collect FDC data using the API, an API key must be acquired from FDC. API access is limited to 3,600 requests per hour per IP address. In the event that you need to collect a large amount of food data from the API you must contact FDC directly.

There are 4 ways to collect data through the API:

  1. Get /v1/food/{fdcId}
    • Requires one FDC ID and a list of up to 25 nutrient numbers (only the nutrient information for the specified nutrients will be returned) as input.
    • Retrieves a single food item by an FDC ID.
  2. Get /v1/foods
    • Requires a list of multiple FDC IDs and a list of up to 25 nutrient numbers (only the nutrient information for the specified nutrients will be returned) as input.
    • Retrieves a list of food items by a list of up to 20 FDC IDs.
  3. Get /v1/foods/list
    • No input required.
    • Retrieves a paged list of foods.
    • Results can be filtered by data type and there are options for result page sizes or sorting.
    • A maximum of 200 foods may be retrieved per request.
    • Only provides an abridged version of the data, which includes: FDC id, name and description of food, date published to FDC website, food code (FNDDS only), the nutrient measures calculated per 100g or 100ml of the food, and the method of derivation for each nutrient measure collected.
  4. Get /v1/foods/search
    • Requires one or more search terms as input.
    • Returns a list of foods that matched search keywords.
    • Results can be filtered by data type and there are options for result page sizes or sorting.
    • A maximum of 200 foods may be retrieved per request.


Through methods 1, 2, and 4, the variable minYearAcquired is provided, which cannot be found in the JSON, ASCII, and MS Access files downloadable on the FDC website.


Comparison of JSON and ASCII Data


Below you’ll find a comparison of variables in the JSON and ASCII file formats. This comparison covers variables unique to each format and is not an exhaustive list. By design, the ASCII format will contain multiple id variables per file that are used to combine the data from multiple files. These id variables are not necessary in the JSON format.


Table 35: Comparison of Unique FNDDS Variables in JSON and ASCII Formats


The ASCII file fndds_ingredient_nutrient_value can only be linked to other ASCII files using the variable ingredient code. However, ingredient code is not present in any other ASCII file.

The ASCII file fndds_derivation has a similar problem: it contains no variables that can be used to link it to the other files.

This means we have addition and modification dates, sources where nutrient measures were collected, and the methods used to derive nutrient measures, but no way to find out which foods this information pertains to.

Key notes/important takeaways:

  • A user of this data would not be able to identify the foods used to derive the FNDDS nutrient measures. This issue can be seen from variables within the table/file input_food.
  • To acquire the most complete version of this data, both formats would need to be combined.


8 Branded


USDA Global Branded Food Products Database (Branded Foods) are data from a public-private partnership that provides values for nutrients in branded and private label foods that appear on the product label. Information in Branded Foods is received from food industry data providers. USDA supports this data type by standardizing the presentation of the data. Beginning in April 2020, data in Branded Foods will be updated on a monthly basis. These data can be found in the API. In addition, downloads containing the most recent data will be generated every six months with each new release of FoodData Central (1).

To use the branded food information in the ASCII format, the current branded foods must be identified from the Access format.


In the ASCII files, every time a branded food is updated it is entered as a new food with a new FDC id. Since the previous version of the food is not erased and there is no indicator of which foods have been duplicated in this process, there are 1555131 food entries in the branded_food file instead of 373897.
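
Once a list of current FDC ids is available from another source, the superseded entries can be filtered out of the ASCII data. The Python sketch below shows this on a hypothetical excerpt; the column names and all values are assumptions made for illustration.

```python
import csv
import io

# Hypothetical excerpt of branded_food.csv: three versions of one product plus
# one other product, each version carrying its own FDC id.
branded_csv = io.StringIO(
    "fdc_id,brand_owner,modified_date\n"
    "100,Example Foods,2019-01-05\n"
    "250,Example Foods,2020-03-11\n"
    "410,Example Foods,2021-06-30\n"
    "420,Sample Snacks,2021-07-02\n"
)

# FDC ids of the current versions, taken from a source that lists only current
# foods (hypothetical values).
current_ids = {"410", "420"}

# Keep only the rows whose FDC id belongs to a current food.
current_rows = [row for row in csv.DictReader(branded_csv) if row["fdc_id"] in current_ids]
print([row["fdc_id"] for row in current_rows])
```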


The Access format has been discontinued as of October 2021; all future releases must be accessed via JSON files or the API.


8.1 Nutrient Measures and Units


Branded has 655 foods with no provided nutrient measures. The NA or empty values in the table below represent these foods.



Figure 12: Frequency of Nutrient Measures in Branded


Key notes/important takeaways:

  • There are 101 different nutrient measures recorded in branded.
  • For 655 foods in branded, no nutrient measure values are provided. These 655 foods are alcoholic beverages and other food or beverage items that are not required to have a nutrition facts panel.
  • There are 5 nutrient measures that are provided for only 1 food.


8.2 Age of Nutrient Measures


Branded offers two measures of age: “modifiedDate”, the last date the food was altered by the manufacturer, and “availableDate”, the date the food was made available for inclusion in the database.

Unfortunately, in the JSON files the “availableDate” was mistakenly overwritten with the “modifiedDate”. This means that “availableDate” and “modifiedDate” are identical.

This overwritten information is also reflected on the FDC website.
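A check of this kind can be sketched as follows, in Python rather than the R used for this report. The field names come from the text above. The records and the date format are invented for illustration.

```python
def count_identical_dates(foods):
    """Return (foods that have both dates, foods where the two dates are equal)."""
    both = [f for f in foods if f.get("availableDate") and f.get("modifiedDate")]
    same = [f for f in both if f["availableDate"] == f["modifiedDate"]]
    return len(both), len(same)


# Hypothetical records; the real files hold one such object per food.
foods = [
    {"fdcId": 1, "availableDate": "3/14/2019", "modifiedDate": "3/14/2019"},
    {"fdcId": 2, "availableDate": "7/2/2020", "modifiedDate": "7/2/2020"},
    {"fdcId": 3, "modifiedDate": None},
]

with_both, identical = count_identical_dates(foods)
print(with_both, identical)
```

If the two counts are equal across the whole file, "availableDate" carries no information beyond "modifiedDate".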


In an attempt to accurately assess the age of measurements in Branded, the corresponding variables “available_date” and “modified_date” in the CSV files were examined. However, “available_date” and “modified_date” were only available for 65600 of the 373897 branded foods (17.5%), leaving 308297 foods without them.

Going forward, we will only analyze the modified date provided in the JSON files.


Table 37: Modified Date of Foods in Branded


Figure 13: Modified Date of Nutrient Measures in Branded


Key notes/important takeaways:

  • All nutrient measures in branded have been modified relatively recently; the oldest modification date is in 2013.
  • For 7 foods, no modification date was provided.


8.3 Origin of Nutrient Measures


Three variables in the FoodData Central data identify the origin of each nutrient measure. The derivation description is the method by which each nutrient measure was derived. The source code and source description identify the overall origin of the nutrient measures.


Table 38: Source and Derivation of Nutrient Measures in Branded



Key notes/important takeaways:

  • All nutrient measures in branded come from manufacturer data.
  • For 656 nutrient measures the source or derivation was not provided; this is most likely an error.


8.4 Food Categories


Food categories in branded are provided by food and beverage manufacturers and as such are highly inconsistent and often incorrect. A few examples of incorrectly categorized foods include but are not limited to:

brandedFoodCategory                  Food
Meat Substitutes                     BANQUET Classic Cheesy Patty And Mashed Potatoes, 9 OZ
Food/Beverage/Tobacco Variety Packs  Annie’s Organic Chocolate Chip Bunny Grahams
Media                                CELEBRATE WITH CHOCOLATE
Gardening                            LOVAGE



Key notes/important takeaways:

  • There are 309 branded food categories.
  • Each food category contains between 1 and 18306 foods.
  • Different food categories can contain extremely similar foods such as “Alcohol” and “Alcoholic Beverages”.
  • The branded food categories are inconsistent due to being provided by the food and beverage manufacturers.
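Overlapping category names such as the pair noted above can be flagged with a simple prefix check. The sketch below is in Python rather than the R used for this report. It is a deliberately narrow heuristic: it only catches names where one begins with the other, so other kinds of near-duplicates would need a different comparison.

```python
def prefix_pairs(categories):
    """Return pairs of category names where one name is a prefix of the other."""
    # Lower-case and de-duplicate; sorting puts a prefix before its extensions.
    normalized = sorted({c.strip().lower() for c in categories})
    pairs = []
    for i, short in enumerate(normalized):
        for longer in normalized[i + 1:]:
            if longer.startswith(short):
                pairs.append((short, longer))
    return pairs


categories = ["Alcohol", "Alcoholic Beverages", "Media", "Gardening"]
print(prefix_pairs(categories))
```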


8.5 Completeness


Ideally, we would have a full nutrient profile for each food in branded, but this is not the case. Below we examine the average number of nutrient measures provided per food.


Figure 14: Frequency of Nutrient Measures For All Foods


Key notes/important takeaways:

  • About 14 nutrient measures are present per food in branded, which is around how many you’d find on a nutrition facts label.
  • There are 48 nutrient measures provided in at least one food in branded.
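The per-food count behind this kind of summary can be sketched as follows, in Python rather than the R used for this report. The sketch assumes each food record carries its nutrient measures in a list field, here called foodNutrients, and the records are invented for illustration.

```python
from statistics import mean


def measures_per_food(foods):
    """Count the nutrient measures recorded for each food, keyed by FDC id."""
    return {f["fdcId"]: len(f.get("foodNutrients", [])) for f in foods}


# Hypothetical records for illustration.
foods = [
    {"fdcId": 1, "foodNutrients": [{"name": "Protein"}, {"name": "Sodium"}]},
    {"fdcId": 2, "foodNutrients": [{"name": "Protein"}]},
    {"fdcId": 3, "foodNutrients": []},
]

counts = measures_per_food(foods)
average = mean(list(counts.values()))
# Foods with a count of zero are those with no nutrient measures at all.
empty = [fdc_id for fdc_id, n in counts.items() if n == 0]
print(average, empty)
```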


8.6 Comparison of File Formats


Ideally, the information available on the FDC would be identical across all file formats; however, this is not the case.

Requesting Data Using the API


To collect FDC data using the API, an API key must be acquired from FDC. API access is limited to 3,600 requests per hour per IP address. In the event that you need to collect a large amount of food data from the API you must contact FDC directly.

There are 4 ways to collect data through the API:

  1. GET /v1/food/{fdcId}
    • Requires one FDC ID and a list of up to 25 nutrient numbers (only the nutrient information for the specified nutrients will be returned) as input.
    • Retrieves a single food item by an FDC ID.
  2. GET /v1/foods
    • Requires a list of multiple FDC IDs and a list of up to 25 nutrient numbers (only the nutrient information for the specified nutrients will be returned) as input.
    • Retrieves a list of food items by a list of up to 20 FDC IDs.
  3. GET /v1/foods/list
    • No input required.
    • Retrieves a paged list of foods.
    • Results can be filtered by data type and there are options for result page sizes or sorting.
    • A maximum of 200 foods may be retrieved per request.
    • Only provides an abridged version of the data, which includes: FDC id, name and description of food, date published to FDC website, food code (FNDDS only), the nutrient measures calculated per 100g or 100ml of the food, and the method of derivation for each nutrient measure collected.
  4. GET /v1/foods/search
    • Requires one or more search terms as input.
    • Returns a list of foods that matched search keywords.
    • Results can be filtered by data type and there are options for result page sizes or sorting.
    • A maximum of 200 foods may be retrieved per request.
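The page-size and rate limits above translate directly into a request budget. The sketch below is in Python rather than the R used for this report. It estimates how many paged requests a full listing would take, assuming every page is filled to the 200-food maximum.

```python
import math


def requests_needed(total_foods, page_size=200):
    """Number of paged requests needed to list total_foods foods."""
    if not 1 <= page_size <= 200:
        raise ValueError("page_size must be between 1 and 200")
    return math.ceil(total_foods / page_size)


def hours_at_limit(n_requests, per_hour=3600):
    """Minimum hours needed to send n_requests at the stated hourly limit."""
    return n_requests / per_hour


# The 373897 current branded foods reported earlier in this section.
n = requests_needed(373897)
print(n, round(hours_at_limit(n), 2))
```

At 200 foods per page, the current branded foods need 1870 requests, which fits within a single hour at the stated limit.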


For branded foods, the API data is identical to the JSON data.


Comparison of JSON and ASCII Data


Below is a comparison of the variables that are unique to the JSON and ASCII file formats; it is not an exhaustive list of every variable. By design, each ASCII file contains multiple id variables that are used to combine data across files. These id variables are not necessary in the JSON format.


Table 40: Comparison of Unique Branded Variables in JSON and ASCII Formats


Key notes/important takeaways:

  • In the JSON data, both the value of the nutrient measures as they appear in the nutrition facts panel and the value of nutrient measures per 100g of each food is recorded. In the ASCII data, only the value of nutrient measures per 100g of each food is recorded.
    • For instance, for the food “ALMOND MILK, ORIGINAL” from the HEB store brand, both formats give the amount of calcium, iron, potassium, and sodium calculated per 100ml. The JSON format also gives the %DV and the amount per 240ml of these nutrient measures as they appear on the nutrition facts panel.
  • In the JSON data a log of all changes made to each food is provided. This update log includes changes made to descriptive attributes (such as name, serving size, or brand information), changes made to the nutrient measures, and the date each change was published to the FDC.
  • Footnotes are declared as a variable in the ASCII files but appear as food attributes in the JSON file.
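When only the per-100g or per-100ml value is available, as in the ASCII data, an approximate label amount can be recovered by scaling to the serving size. The sketch below is in Python rather than the R used for this report. The nutrient value is hypothetical, and label values are usually rounded, so the result may not match the panel exactly.

```python
def per_serving(amount_per_100, serving_size):
    """Scale an amount recorded per 100 g or 100 ml to a serving of the given size."""
    return amount_per_100 * serving_size / 100


# Hypothetical value: 50 mg of a nutrient per 100 ml, with a 240 ml serving.
label_amount = per_serving(50, 240)
print(label_amount)
```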


9 Foundation


Foundation Foods includes values derived from analyses for food components, including nutrients on a diverse range of foods and ingredients as well as extensive underlying metadata. These metadata include the number of samples, sampling location, date of collection, analytical approaches used, and if appropriate, agricultural information such as genotype and production practices. The enhanced depth and transparency of Foundation Foods data can provide valuable insights into the many factors that influence variability in nutrient and food component profiles. The goal of Foundation Foods will be to, over time, expand the number of basic foods and ingredients and their underlying data (1).


9.1 Nutrient Measures and Units



Figure 15: Frequency of Nutrient Measures in Foundation

9.2 Age of Nutrient Measures


The minimum year each food sample was acquired is provided for foundation foods rather than the minimum date associated with each food or nutrient measure. However, this information is only provided for samples of 86 foundation foods (a little more than half).

Acquisition dates of foundation foods obtained through agricultural acquisition are provided in the CSV version of the data, but the number of foods with acquisition dates is rather limited.

The expiration date of foundation foods obtained through market acquisition can also be found within the CSV files. However, this information is unreliable as an indicator of age for the nutrient measures.



Figure 16: Frequency of Sample Nutrient Measures by Minimum Year Acquired in Foundation


Key notes/important takeaways:

  • Minimum year acquired is provided for 54% of foundation foods but is viewable for all foundation foods on the FDC website.


9.3 Origin of Nutrient Measures


Three variables in the FoodData Central data identify the origin of each nutrient measure. The derivation description is the method by which each nutrient measure was derived. The source code and source description identify the overall origin of the nutrient measures.


Table 43: Source and Derivation of Nutrient Measures in Foundation


Key notes/important takeaways:


9.4 Food Categories


Foundation uses the same food categories as SR, excluding “American Indian/Alaska Native Foods”, “Baby Foods”, “Breakfast Cereals”, “Fast Foods”, “Lamb, Veal, and Game Products”, “Meals, Entrees, and Side Dishes”, and “Snacks”.


Table 44: Nutrient Measures per Food Category in Foundation


Key notes/important takeaways:


9.5 Completeness



All Nutrient Measures and Grouping



Ideally, all 221 nutrient measures present in foundation would be provided for every food. However, the number of nutrient measures provided for each food in foundation is variable.


Figure 17: Frequency of Nutrient Measures For All Foods in Foundation


Nutrient measures were grouped for comparison; the groupings of every nutrient measure in Foundation are listed below.


Table 45: Nutrient Measure Groups in Foundation


Nutrient measures by Food Category



Each tab below displays the most essential or important nutrient measures in each nutrient measure group. The grouping of nutrient names into the groups displayed below can be found in Table 45. The percentages below represent the proportion of foods in each food category containing at least one nutrient measure in the specified subgroup.
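The percentage described here can be computed as in the sketch below, which is in Python rather than the R used for this report. The record layout (a category and a list of nutrient names per food) and the sample values are assumptions for illustration.

```python
from collections import defaultdict


def coverage_by_category(foods, subgroup):
    """Percent of foods per category with at least one nutrient measure in subgroup."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for food in foods:
        totals[food["category"]] += 1
        if subgroup & set(food["nutrients"]):
            hits[food["category"]] += 1
    return {cat: 100 * hits[cat] / totals[cat] for cat in totals}


# Hypothetical foods and subgroup, for illustration only.
foods = [
    {"category": "Fruits", "nutrients": ["Fiber", "Vitamin C"]},
    {"category": "Fruits", "nutrients": ["Vitamin C"]},
    {"category": "Dairy", "nutrients": ["Calcium"]},
]
carbohydrates = {"Fiber", "Starch"}

print(coverage_by_category(foods, carbohydrates))
```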


Carbohydrates


Key notes/important takeaways:


Proteins


Key notes/important takeaways:


Sugars


Key notes/important takeaways:


Vitamins


Key notes/important takeaways:


Fats


Key notes/important takeaways:


Minerals


Key notes/important takeaways:


Phytochemicals


Key notes/important takeaways:


Figure 18: Frequency of Missing Nutrient Measure Entries by Food Category


Figure 19: Nutrient Measures by Food Category in Foundation

Origin of Nutrient Measures



Key notes/important takeaways:


Age of Nutrient Measures


Figure 20: Frequency of Missing Minimum Year Acquired Entries


Key notes/important takeaways:


9.6 Comparison of File Formats


Ideally, the information available on the FDC would be identical across all file formats; however, this is not the case.

Requesting Data Using the API


To collect FDC data using the API, an API key must be acquired from FDC. API access is limited to 3,600 requests per hour per IP address. In the event that you need to collect a large amount of food data from the API you must contact FDC directly.

There are 4 ways to collect data through the API:

  1. GET /v1/food/{fdcId}
    • Requires one FDC ID and a list of up to 25 nutrient numbers (only the nutrient information for the specified nutrients will be returned) as input.
    • Retrieves a single food item by an FDC ID.
  2. GET /v1/foods
    • Requires a list of multiple FDC IDs and a list of up to 25 nutrient numbers (only the nutrient information for the specified nutrients will be returned) as input.
    • Retrieves a list of food items by a list of up to 20 FDC IDs.
  3. GET /v1/foods/list
    • No input required.
    • Retrieves a paged list of foods.
    • Results can be filtered by data type and there are options for result page sizes or sorting.
    • A maximum of 200 foods may be retrieved per request.
    • Only provides an abridged version of the data, which includes: FDC id, name and description of food, date published to FDC website, food code (FNDDS only), the nutrient measures calculated per 100g or 100ml of the food, and the method of derivation for each nutrient measure collected.
  4. GET /v1/foods/search
    • Requires one or more search terms as input.
    • Returns a list of foods that matched search keywords.
    • Results can be filtered by data type and there are options for result page sizes or sorting.
    • A maximum of 200 foods may be retrieved per request.


For foundation foods, methods 1, 2, and 4 return variables relating to nutrient acquisition and analysis that cannot be found in the JSON, ASCII, or MS Access files downloadable from the FDC website. No description is provided for any of these variables.
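Because method 2 accepts at most 20 FDC IDs and 25 nutrient numbers per request, a longer id list has to be split into batches. The Python sketch below (the report itself used R) only builds request URLs and sends nothing. The base URL, the query parameter names (api_key, fdcIds, nutrients), and the placeholder key are assumptions that should be checked against the FDC API guide.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.nal.usda.gov/fdc"  # assumed base URL


def chunk(ids, size=20):
    """Split a list of ids into batches of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def foods_urls(fdc_ids, api_key, nutrients=None):
    """Build one /v1/foods request URL per batch of up to 20 FDC ids."""
    if nutrients is not None and len(nutrients) > 25:
        raise ValueError("at most 25 nutrient numbers may be requested")
    urls = []
    for batch in chunk(fdc_ids):
        # Parameter names are assumptions, not confirmed by this report.
        params = {"api_key": api_key, "fdcIds": ",".join(str(i) for i in batch)}
        if nutrients:
            params["nutrients"] = ",".join(str(n) for n in nutrients)
        urls.append(f"{BASE_URL}/v1/foods?{urlencode(params)}")
    return urls


# 45 made-up ids and two example nutrient numbers; "DEMO_KEY" is a placeholder.
urls = foods_urls(list(range(1, 46)), api_key="DEMO_KEY", nutrients=[203, 204])
print(len(urls))
```

Each URL could then be fetched with any HTTP client, with requests spaced so that the hourly limit is not exceeded.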



Comparison of JSON and ASCII Data


Below is a comparison of the variables that are unique to the JSON and ASCII file formats; it is not an exhaustive list of every variable. By design, each ASCII file contains multiple id variables that are used to combine data across files. These id variables are not necessary in the JSON format.


Table 55: Comparison of Unique Foundation Variables in JSON and ASCII Formats


Key notes/important takeaways:


10 USDA Special Interest Databases


The US Department of Agriculture has published several Special Interest Databases (SID) on flavonoids. The USDA Ag Data Commons website is where the most current versions of these databases are maintained (5). It contains direct data downloads, and more detailed information for the following USDA Special Interest Databases on Flavonoids:

  • USDA’s Expanded Flavonoid Database for the Assessment of Dietary Intakes (September 2014)

  • USDA Database for the Flavonoid Content of Selected Foods, Release 3.2 (November 2015)

  • USDA Database for the Isoflavone Content of Selected Foods, Release 2.1 (November 2015)

  • USDA Database for the Proanthocyanidin Content of Selected Foods, Release 2 (2015)

Nutrient measures from these databases can be linked to Foundation and SR legacy foods by NDB number.
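Linking by NDB number amounts to a keyed lookup. The sketch below is in Python rather than the R used for this report. The field names (ndb_number, flavonoid, value) and all of the values are hypothetical.

```python
def attach_flavonoids(foods, flavonoid_rows):
    """Attach flavonoid values to each food by matching on NDB number."""
    by_ndb = {}
    for row in flavonoid_rows:
        by_ndb.setdefault(row["ndb_number"], {})[row["flavonoid"]] = row["value"]
    # Foods without a match receive an empty dictionary.
    return [dict(food, flavonoids=by_ndb.get(food["ndb_number"], {})) for food in foods]


# Hypothetical rows for illustration.
foods = [
    {"ndb_number": "00001", "description": "Food A"},
    {"ndb_number": "00002", "description": "Food B"},
]
flavonoid_rows = [
    {"ndb_number": "00001", "flavonoid": "Quercetin", "value": 1.5},
    {"ndb_number": "00001", "flavonoid": "Kaempferol", "value": 0.2},
]

linked = attach_flavonoids(foods, flavonoid_rows)
print(linked[0]["flavonoids"], linked[1]["flavonoids"])
```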


10.1 USDA’s Expanded Flavonoid Database for the Assessment of Dietary Intakes


The Expanded Flavonoid Database provides values of the following flavonoids for 25 Foundation foods and 1613 SR Legacy foods. In all cases, every flavonoid is provided for each food.


Table 56: List of Flavonoids in the Expanded Flavonoid Database
Flavanoid_Measures
Daidzein
Genistein
Glycitein
Cyanidin
Petunidin
Delphinidin
Malvidin
Pelargonidin
Peonidin
(+)-Catechin
(-)-Epigallocatechin
(-)-Epicatechin
(-)-Epicatechin 3-gallate
(-)-Epigallocatechin 3-gallate
Theaflavin
Thearubigins
Eriodictyol
Hesperetin
Naringenin
Apigenin
Luteolin
Isorhamnetin
Kaempferol
Myricetin
Quercetin
Theaflavin-3,3'-digallate
Theaflavin-3'-gallate
Theaflavin-3-gallate
(+)-Gallocatechin


10.2 USDA Database for the Flavonoid Content of Selected Foods


After the release of the USDA’s Expanded Flavonoid Database for the Assessment of Dietary Intakes in 2015, alterations and additions were made to the USDA Database for the Flavonoid Content of Selected Foods in 2018. While the other supplemental databases are available on the USDA Ag Data Commons, this new update was published solely on the USDA Agricultural Research Service website (6).

The USDA Database for the Flavonoid Content of Selected Foods contains values for 183 foods in SR Legacy and Foundation. Of those 183, 131 can also be found in the USDA’s Expanded Flavonoid Database for the Assessment of Dietary Intakes (meaning a total of 52 new foods have been added in this release). These newer values are assumed to be more accurate and, if added to the FDC data, should replace their previous versions.
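The replacement rule described here (newer values win where a food appears in both releases, and foods found only in the newer release are added) can be sketched as a dictionary update. The sketch is in Python rather than the R used for this report, and the keys and values are hypothetical.

```python
def merge_releases(older, newer):
    """Overlay newer values on older ones; also report foods only in the newer release."""
    merged = dict(older)
    merged.update(newer)  # newer values replace older ones for shared foods
    new_foods = set(newer) - set(older)
    return merged, new_foods


# Hypothetical values keyed by NDB number.
older = {"00001": 1.0, "00002": 2.0}
newer = {"00002": 2.5, "00003": 4.0}

merged, new_foods = merge_releases(older, newer)
print(merged, new_foods)

# The counts reported above follow the same logic: 183 foods, 131 shared.
print(183 - 131)
```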


Table 57: Flavonoid Entries for SR and Foundation Foods
SR_Count Foundation_Count
(-)-Epicatechin 82 4
(-)-Epicatechin 3-gallate 75 4
(-)-Epigallocatechin 73 4
(-)-Epigallocatechin 3-gallate 74 4
(+)-Catechin 83 4
(+)-Gallocatechin 71 4
Apigenin 101 6
Cyanidin 48 2
Delphinidin 46 2
Eriodictyol 2
Hesperetin 39 2
Isorhamnetin 43 1
Kaempferol 132 7
Luteolin 112 6
Malvidin 38 2
Myricetin 122 7
Naringenin 37 2
Pelargonidin 39 2
Peonidin 38 2
Petunidin 37 2
Quercetin 162 7
Theaflavin 3
Theaflavin-3'-gallate 3
Theaflavin-3,3'-digallate 3
Thearubigins 3


10.3 USDA Database for the Proanthocyanidin Content of Selected Foods


The database of proanthocyanidin content is available from the USDA website and provides values for 5 different measures of proanthocyanidin (7).


Table 58: Proanthocyanidin Entries for SR and Foundation Foods
SR_Count Foundation_Count
Proanthocyanidin 4-6mers 114 6
Proanthocyanidin 7-10mers 110 6
Proanthocyanidin dimers 130 6
Proanthocyanidin polymers (>10mers) 108 6
Proanthocyanidin trimers 124 6


10.4 USDA Database for the Isoflavone Content of Selected Foods


The USDA Ag Data Commons website hosts a database of isoflavone content for many of the foods in the FDC database (8). The following table contains the name of each isoflavone measure and the number of SR Legacy and Foundation foods for which values are available.


Table 59: Isoflavone Entries for SR and Foundation Foods
SR_Count Foundation_Count
Biochanin A 59 3
Coumestrol 123 8
Daidzein 262 15
Formononetin 123 8
Genistein 262 15
Glycitein 143 9
Total isoflavones 259 15


This data set has significant overlap with USDA’s Expanded Flavonoid Database for the Assessment of Dietary Intakes but provides additional information on “Biochanin A”, “Coumestrol”, “Formononetin”, and “Total isoflavones”.
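The additional measures can be recovered as a set difference between the measure names in Table 59 and those in Table 56. The sketch below is in Python rather than the R used for this report. Only the Table 56 entries that also appear in Table 59 are listed, since the remaining Table 56 names do not affect the result.

```python
# Measure names from Table 59.
isoflavone_db = {
    "Biochanin A", "Coumestrol", "Daidzein", "Formononetin",
    "Genistein", "Glycitein", "Total isoflavones",
}

# Subset of Table 56: the three isoflavones in the Expanded Flavonoid Database.
expanded_db = {"Daidzein", "Genistein", "Glycitein"}

additional = sorted(isoflavone_db - expanded_db)
overlap = sorted(isoflavone_db & expanded_db)
print(additional)
print(overlap)
```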


11 References

  1. U.S. Department of Agriculture, Agricultural Research Service. FoodData Central, 2019. fdc.nal.usda.gov.

  2. Haytowitz, David B.; Ahuja, Jaspreet K.C.; Wu, Xianli; Somanchi, Meena; Nickle, Melissa; Nguyen, Quyen A.; Roseland, Janet M.; Williams, Juhi R.; Patterson, Kristine Y.; Li, Ying; Pehrsson, Pamela R. (2019). USDA National Nutrient Database for Standard Reference, Legacy Release. Nutrient Data Laboratory, Beltsville Human Nutrition Research Center, ARS, USDA. https://data.nal.usda.gov/dataset/usda-national-nutrient-database-standard-reference-legacy-release. Accessed 2022-02-22.

  3. U.S. Department of Agriculture, Agricultural Research Service. 2018. USDA Food and Nutrient Database for Dietary Studies 2017-2018. Food Surveys Research Group Home Page, www.ars.usda.gov/nea/bhnrc/fsrg

  4. U.S. Department of Agriculture, Agricultural Research Service. 2020. What We Eat in America Food Categories 2017-2018. Available: www.ars.usda.gov/nea/bhnrc/fsrg

  5. Bhagwat, Seema; Haytowitz, David B.; Wasswa-Kintu, Shirley. (2015). USDA’s Expanded Flavonoid Database for the Assessment of Dietary Intakes, Release 1.1 - December 2015. Nutrient Data Laboratory, Beltsville Human Nutrition Research Center, ARS, USDA. https://doi.org/10.15482/USDA.ADC/1324677. Accessed 2022-01-12.

  6. Haytowitz, D.B., Wu, X., Bhagwat, S. 2018. USDA Database for the Flavonoid Content of Selected Foods, Release 3.3. U.S. Department of Agriculture, Agricultural Research Service. Nutrient Data Laboratory Home Page: http://www.ars.usda.gov/nutrientdata/flav

  7. Bhagwat, Seema; Haytowitz, David B. (2015). USDA Database for the Proanthocyanidin Content of Selected Foods, Release 2 (2015). Nutrient Data Laboratory, Beltsville Human Nutrition Research Center, ARS, USDA. https://doi.org/10.15482/USDA.ADC/1324621. Accessed 2022-01-12.

  8. Bhagwat, Seema; Haytowitz, David B. (2015). USDA Database for the Isoflavone Content of Selected Foods, Release 2.1 (November 2015). Nutrient Data Laboratory, Beltsville Human Nutrition Research Center, ARS, USDA. https://doi.org/10.15482/USDA.ADC/1324538. Accessed 2022-01-12.

  9. R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

  10. Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686

  11. Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25. URL https://www.jstatsoft.org/v40/i03/.

  12. Yihui Xie, Joe Cheng and Xianying Tan (2021). DT: A Wrapper of the JavaScript Library ‘DataTables’. R package version 0.18. https://CRAN.R-project.org/package=DT

  13. Richard Iannone, Joe Cheng and Barret Schloerke (2021). gt: Easily Create Presentation-Ready Display Tables. R package version 0.3.1. https://CRAN.R-project.org/package=gt

  14. H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

  15. Hadley Wickham and Jim Hester (2020). readr: Read Rectangular Text Data. R package version 1.4.0. https://CRAN.R-project.org/package=readr

  16. Hadley Wickham and Jennifer Bryan (2019). readxl: Read Excel Files. R package version 1.3.1. https://CRAN.R-project.org/package=readxl

  17. Jeroen Ooms (2014). The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects. arXiv:1403.2805 [stat.CO] URL https://arxiv.org/abs/1403.2805.

  18. Greg Lin (2022). reactable: Interactive Data Tables Based on ‘React Table’. https://glin.github.io/reactable/, https://github.com/glin/reactable.

  19. Kyle Cuilla (2021). reactablefmtr: Easily Customize Interactive Tables Made with Reactable. R package version 1.0.0. https://CRAN.R-project.org/package=reactablefmtr

  20. Yihui Xie (2021). bookdown: Authoring Books and Technical Documents with R Markdown. R package version 0.24.

  21. Yihui Xie (2016). bookdown: Authoring Books and Technical Documents with R Markdown. Chapman and Hall/CRC. ISBN 978-1138700109

  22. Joe Cheng, Carson Sievert, Winston Chang, Yihui Xie and Jeff Allen (2021). htmltools: Tools for HTML. R package version 0.5.1.1. https://CRAN.R-project.org/package=htmltools

  23. Yihui Xie (2021). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.31.

  24. Yihui Xie (2015) Dynamic Documents with R and knitr. 2nd edition. Chapman and Hall/CRC. ISBN 978-1498716963

  25. Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible Research in R. In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595

  26. JJ Allaire and Yihui Xie and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone (2021). rmarkdown: Dynamic Documents for R. R package version 2.11. URL https://rmarkdown.rstudio.com.

  27. Yihui Xie and J.J. Allaire and Garrett Grolemund (2018). R Markdown: The Definitive Guide. Chapman and Hall/CRC. ISBN 9781138359338. URL https://bookdown.org/yihui/rmarkdown.

  28. Yihui Xie and Christophe Dervieux and Emily Riederer (2020). R Markdown Cookbook. Chapman and Hall/CRC. URL https://bookdown.org/yihui/rmarkdown-cookbook.