Final Report

Author

Radiah Khan ’28 | Faculty Advisor: Dr. Kementari Whitcher

1 Acknowledgment

I would like to express my deepest gratitude to my faculty advisor, Dr. Kementari Whitcher, for being incredibly supportive not only as a teacher but also as a motivator every step of the way. In the past 8 weeks, there have been many times when working with a data set built from scratch felt intimidating, but with the supervision of Dr. Kementari Whitcher, I learnt to think as a statistician. This research would not have been possible without the amazing Statistical and Data Sciences Department at Smith College, which gave me unwavering support from the beginning. Furthermore, I would like to acknowledge Professor Randi Garcia, the chair of the department, whose support was pivotal to the completion of this project. I am grateful to every faculty member in the department who continues to be a source of inspiration through everything. Additionally, I am grateful for the generous funding from the Summer Undergraduate Research Fellowship at Smith College that made this project a reality. Special thanks to Becca Malloy, the Assistant Director of Sustainability at The Center for the Environment, Ecological Design and Sustainability (CEEDS), who provided Smith College dining data and graciously answered every question that came along the way. Last but not least, I would like to thank my niece, my other family members and my friends back home, who are my constant source of strength.

2 Abstract

This study examines food procurement in the dining sector of higher education institutions in the United States and how it has evolved over time. It discusses ethically produced food, locally sourced food, dining budgets and sustainability in colleges. The goal of this study is to explore the data set through exploratory data analysis and statistical tests. The data set was compiled from two sources. Smith College-specific data were collected from the Center for the Environment, Ecological Design and Sustainability (CEEDS) at Smith College. Data for the other colleges were obtained from the Sustainability Tracking, Assessment & Rating System (STARS), a framework created by the Association for the Advancement of Sustainability in Higher Education (AASHE). We aim to analyze how dining services have changed over the years in private and flagship institutions across the United States of America. The study conducts both cross-institution and within-institution analyses of these two categories, and it applies statistical analyses to identify possible associations between variables defined by the STARS framework. The statistical methods include stratified random sampling, logistic regression, hypothesis testing (ANOVA, repeated-measures ANOVA), t-tests, one-way and two-way chi-square tests, and regression analysis. The data point to patterns in university purchasing trends and their effects on sustainability metrics (budget class, total budget, ethically produced rate, locally produced food percentage, etc.). The findings are divided into three areas: Smith College Trends in Food Purchasing, Five College Consortium Comparison on Locally Sourced Produce, and Institutional Comparisons Across the United States.

3 Introduction

We will introduce the sources of the data used in this study.

3.1 STARS AASHE

The STARS (2025b) framework provides a sustainability rubric that scores higher education institutions to encourage continued improvement in four areas: academics (AC), engagement (EN), operations (OP), and planning & administration (PA). These areas have the following sub-sectors: sustainability course offerings, applied learning, sustainability research, responsible research and innovation, outreach and communication, co-curricular activities, staff engagement and training, sustainability and culture assessment, civic engagement, community partnerships, continuing education, shared facilities, inter-campus collaboration, building design and construction, building operations and maintenance, water use, ecologically managed grounds, energy use, greenhouse gas emissions, dining service procurement, food recovery, sustainable procurement, purchased goods, materials management, Waste Generation and Recovery, Vehicle Fleet, Commute Modal Split, Air Travel, Sustainability Coordination, Commitments and Planning, Institutional Governance, Sustainable Investment Program, Investment Holdings, Institutional Climate, Racial and Ethnic Representation, Gender Parity, Affordability and Access, Student Success, Health, Safety and Well-being, Employee Rights, Pay Equity, and Living Wage.

The data set used in this study is extracted from the Operations sector, specifically the Dining Service Procurement sub-sector (listed as OP7). Most of the universities in this study report under STARS version 2.2. In some cases, the data come from version 3.0 or version 2.1.

The STARS framework awards institutions ratings of five levels: Platinum, Gold, Silver, Bronze, or Reporter.

Under the Food and Dining impact area in the Operations category, the total number of points available is 8 (version 2.2).

The score breaks down as follows:

Credit system for dining impact area (Version 2.2)
Credit | Points
Food and Beverage Purchasing | 6
Sustainable Dining | 2

The STARS framework also aligns itself with the Sustainable Development Goals, Nations (2015), in version 2.2, STARS (2019). Here is the flow of how each area corresponds to fulfilling the SDGs set by the United Nations:

The data for creating this chart were extracted from the technical manual. The visualization was made with Lucidchart.

The applicable criteria for this study depended on the availability of data points. For the majority of the data points, the score was out of 6; in the other instances, it was out of 8 or 4. The relevant credit areas are therefore the percentage of food and beverage spending that meets sustainability criteria (6 credits) and the percentage of dining services spending with social impact suppliers (2 credits).

The minimum requirements for these credits are assessed by 2 metrics in version 3.0:

  • Food and beverage expenditure
  • Supplier attributes

The targets for earning full points are likewise expressed through 2 metrics:

  • Weighted cost of purchased food and beverages that meet sustainability criteria >= Total food and Beverage spend

  • Social impact supplier spend needs to be equal to or higher than 10% of the whole dining budget

The foods purchased by the institutions need to meet certain sustainability or ethical criteria set by STARS AASHE. The recognized standards are:

  • IFOAM Organics International, or standards endorsed by it
  • ISO Type 1 ecolabel
  • ISEAL Alliance Member Organization
  • Global Ecolabelling Network Member Organization
  • Anchors in Action Aligned Framework
  • Monterey Bay Aquarium Seafood Watch
  • Participatory Guarantee System (PGS)
  • Short food supply chain (SFSC)
  • Small Producers’ Symbol (SPP)
  • World Fair Trade Organization (WFTO) or Fair Trade Federation (FTF) membership

In total, AASHE recognizes 55 certifications for non-seafood items and 7 standards for seafood items around the world. The food categories covered by these certifications are fresh food (produce, eggs, meat, shellfish), packaged/ready-to-eat foods (spices, oil, sugar, grains, baked goods, candies, frozen food, dairy products), and beverages (sports drinks, fruit juices, tea, coffee, bottled water, and any other liquids).

The STARS framework has developed a decision criterion for different kinds of food types to be categorized as sustainable/ethical.

Because STARS is a self-reporting tool, institutions submit the following documentation:

  • The year the performance period ended for food and beverage expenditure
  • Sustainable/Ethically Produced Rate
  • Plant-based food rate
  • Inventory
  • Any data limitation
  • Methodology Description


This is how STARS defines its decision criteria in the technical manual for version 3.0.1, STARS (2025a):

The data for creating this chart were extracted from the technical manual. The visualization was made with Lucidchart.

3.2 CEEDS

CEEDS has collected data that adhere to the STARS framework.

It uses several categories to label food and indicate sustainability. These categories come from a student-initiated national effort, begun in 2008, to bring significant positive change to campus food systems.

  • Real Food : A holistic term for food that serves consumers, producers, communities and every other stakeholder. In 2008, students and advisers developed a framework that defines which foods count as Real Food. It is designed to encourage sustainable and fair producers, and it also emphasizes how production is evaluated. Ecologically Sound, Local, Humane and Fair are the Real Food categories.

  • Sustainable Food : This term is defined by the STARS framework through the certifications listed in Section 3.1.

  • Local : The food needs to be procured from:

    • Nearby farms, ranches, boats and businesses that are locally owned
    • Mid size food businesses

    However, institutions sometimes define “local” differently. According to Leighton (2017), here is how colleges define their food as “local”:

    Source : Measuring Up: When, How, and Why to Define “Local”

  • Fair : Workers must be guaranteed these rights:

    • Have a safe environment to work in

    • Receive fair compensation

    • Right to organize

    • Right to a grievance process

    • Equal opportunity for employment

  • Humane : Animals must be guaranteed these conditions:

    • Low stress environment
    • Drugs administered only when diseases are diagnosed

4 Sampling Method

There are two sampling methods that were used for this study. The first one is how STARS collects their data and the second one is how we built the data set used for this study.

4.1 STARS Framework

The framework uses representative sampling, which means that every unit in the population has an equal chance of being selected to represent the whole population.

All of the reporting institutions have two choices: One year period purchases or a representative sample that includes one academic term or equivalent.

However, there are some criteria that institutions need to consider while reporting.

  • Needs to be within the previous 3 years

  • Seasonal purchases

  • Self operated/contracted dining service reports

  • Include retail outlets managed by institution/contractor

  • Exclude independent operators, vending etc

4.2 Research Method

This research used stratified random sampling to create the data set. One stratification variable was type of institution: Flagship Institutions and Private Institutions. The second variable was State, with all 50 US states and the District of Columbia included. This process of stratification was based on institution type and geographical location. First, we listed all the flagship and private institutions in the United States, with their states. Subsequently, we matched their data availability in the AASHE (2025) reports. Following that, we created two separate data sets, one for flagship, and the other for private universities.

4.2.1 Private Institutions

We collected all of our data from the AASHE (2025) website.

For each state, we had more than one private institution, so we randomly selected one institution from each state. We encountered two cases. The first case is illustrated by Florida, which had 13 private institutions. We chose a random number from 1 to 13 using a simple random number generator and got the number 4, so we chose the institution in fourth place in our list. The second case is illustrated by Colorado, where we had 7 institutions. We randomly generated the number 5, but the selected institution did not include dining procurement data in its report. In cases like this, we moved to the immediately following institution and kept progressing in order until we found valid dining data to use for the study.
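The selection rule described above can be sketched in R as follows. This is only an illustration under stated assumptions: `pick_institution` and the toy inputs are hypothetical names, and wrapping around to the top of the list after reaching the end is an assumption that the report does not state.

```r
# Hypothetical helper: draw a random starting position, then walk forward
# through the list until an institution with dining procurement data is found.
pick_institution <- function(institutions, has_dining_data, start = NULL) {
  n <- length(institutions)
  if (is.null(start)) start <- sample.int(n, 1)  # simple random draw from 1..n
  # Candidate order: start, start + 1, ..., n, then wrap around (assumption)
  order_to_try <- c(start:n, seq_len(start - 1))
  for (i in order_to_try) {
    if (has_dining_data[i]) return(institutions[i])
  }
  NA_character_  # no institution in this state reported dining data
}

# Colorado-style case: 7 institutions, random draw of 5, institution 5 lacks data
colorado <- paste("Institution", 1:7)
has_data <- c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
pick_institution(colorado, has_data, start = 5)
```

With the draw fixed at 5, the sketch skips the fifth institution and returns the sixth, mirroring the Colorado case.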

4.2.2 Flagship Institutions

Some states have multiple flagship universities (NY, TX, IN, WY), so we used all of them instead of randomly selecting one. The states missing from the later analysis are the ones that were not present in the STARS framework.

After creating the two data sets, we combined them and then cleaned the result. The data flow after processing and cleaning is shown below.

4.3 Data Selection Flow

To clean the data set, we filtered out data rows/states that met either of these criteria:

  • Absence of value in at least 50% of the variables used for the analysis
  • No dining procurement data in any institution in the entire state
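A minimal sketch of the first filtering criterion, assuming one row per institution report. The data frame and its column names are made up for illustration and are not the report's actual variables.

```r
# Toy data; column names are hypothetical stand-ins for the analysis variables.
institutions <- data.frame(
  state        = c("MA", "MA", "CO", "FL"),
  score_ratio  = c(0.50, NA, NA, 0.25),
  total_budget = c(5e6, NA, NA, 2e6),
  ethical_rate = c(0.20, 0.10, NA, NA),
  budget_class = c(2, NA, NA, 1)
)
analysis_vars <- c("score_ratio", "total_budget", "ethical_rate", "budget_class")

# Share of analysis variables that are missing in each row
missing_share <- rowMeans(is.na(institutions[, analysis_vars]))

# Keep only rows where fewer than 50% of the analysis variables are missing
cleaned <- institutions[missing_share < 0.5, ]
cleaned
```

In this toy example the only row for "CO" has no usable values, so the state drops out of the cleaned data entirely, which is how the second criterion shows up in practice.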

Data Selection Flow for the Final Dataset

5 Research Questions

In this study, we want to examine the dining sourcing and dining performance across three sectors: Smith College, the Five College Consortium, and colleges across the United States.

5.1 Smith College

  • Has the proportion of locally sourced food changed over the course of 2017-2018?
  • Is the likelihood of a food item being locally sourced associated with its cost?

5.2 Five College Consortium

  • How does the sourcing of sustainable and local items in dining differ among the five colleges?

5.3 All Colleges

  • Do flagship institutions spend more money on dining than private institutions and how does it affect their score?
  • How does the dining performance score change over the years?
  • Does the STARS rating differ in institution categories?
  • How does the ethically produced rate differ in institution categories?
  • How are the key metrics used in this study (as defined by the STARS framework) correlated across institutions?

6 Data Variables Used

For this study, we selected variables among the available variables in STARS framework templates.

6.1 Budget Class

Institutions report their budget in one of 3 major ranges. One of the reported levels was denominated in Canadian dollars, so we converted it to fit the 3 ranges. This is the total dining budget of each institution for the reporting period.

We then coded the 3 ranges as 3 classes, as shown in the table below.

Table 1
Budget range | Class
1 million - 4.9 million | 1
5 million - 9.9 million | 2
10 million or more | 3
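As an illustration only, the coding in Table 1 could be reproduced from a numeric budget with `cut()`. The function name `budget_class` is hypothetical, and treating each range as closed on the left (so that exactly 5 million falls in class 2) is an assumption.

```r
# Hypothetical helper mapping a numeric dining budget to the classes of Table 1.
budget_class <- function(total_budget) {
  # Intervals are closed on the left: [1M, 5M), [5M, 10M), [10M, Inf)
  as.integer(cut(total_budget,
                 breaks = c(1e6, 5e6, 1e7, Inf),
                 labels = FALSE, right = FALSE))
}

budget_class(c(2.5e6, 7e6, 1.2e7))
```

Budgets below 1 million fall outside the three ranges and come back as `NA` in this sketch.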

6.2 Score Ratio

The points earned by institutions in the food and dining impact area were reported on two scales, out of 6 and out of 8, depending on the year. To keep the calculation consistent across all years, we calculated the score ratio as the proportion of the score earned to the total possible score.

Table 2 shows how we calculated the score ratio for these two cases.

Table 2
Score given by STARS | Score Ratio
3/6 | 0.5
4/8 | 0.5
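The calculation in Table 2 can be sketched as a small helper, assuming the STARS score is stored as text of the form "earned/possible". The function name `score_ratio` is hypothetical.

```r
# Hypothetical helper: convert "earned/possible" strings into a score ratio.
score_ratio <- function(score) {
  parts <- strsplit(score, "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}

score_ratio(c("3/6", "4/8"))
```

Both example scores map to the same ratio, which is what makes scores from different scales comparable.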

6.3 Ethically Produced Rate

STARS provides an inventory template for institutions to document sustainable/ethical products, with proper justification for inclusion, using this information: product name, label, brand, description, and certification of sustainability standard.

6.4 Year

The study used the two most recent years of each institution's reports. The most recent year is denoted as Year 1 and the second most recent year as Year 2.

6.5 Total Food and Beverage Expenditure

Institutions need to report the following categories of food: Meat (including frozen or canned), Dairy (anything that includes milk), Poultry, Fish/Seafood, Eggs, Produce, Baked goods, Grocery/Staples, Coffee-Tea, and Beverages. All of these categories need to be included to calculate the total food and beverage expenditure.

6.6 Category

As mentioned above, the institutions reported in the database are divided into two categories: Flagship and Private.

Flagship : Flagship universities are publicly funded research institutions which are the largest/oldest in each state.

Private : Private universities are not typically funded by the government and often run on private endowment.

6.7 Rating Status

STARS AASHE rates an institution at one of five levels. The breakdown below, from the technical manual, gives the minimum total score (across all of the combined sectors above) that an institution needs for each level. Reporter institutions are those that submitted a report but did not reach the minimum score that earns a STARS rating.

This data is sourced from STARS (2025b)

Minimum Percentage Score Needed for Achievement Levels in STARS V2.2

7 Results

We will do exploratory data analysis on this data set alongside statistical analysis to identify any association and patterns.

This section will have three sub-sections:

  • Smith College Analysis
    Using the data we obtained from CEEDS, we will analyze the trends of dining sourcing between 2017, 2018, and 2023.

  • Five College Consortium Analysis
    We will analyze what percentage of spending was allocated for local products in each college to do a comparison.

  • All Institutions Combined Analysis
    We will conduct tests to identify trends and patterns across representative colleges from all U.S. states.

7.1 Smith College

To analyze food procurement at Smith College, its trend, and how sourcing changed over time, we used the data sets for the years 2017, 2018 and 2023. It is important to note that the 2023 data are not published on the STARS AASHE website; they were provided to us by CEEDS.

7.1.1 Defining the variables used

The variables used for analyzing Smith College data are shown in Table 3.

Table 3
Variable | Type | Interpretation
Local_2017 | Binary | Coded as 1 (local) and 0 (non-local) in 2017
Local_2018 | Binary | Coded as 1 (local) and 0 (non-local) in 2018
Cost_2017 | Numerical | Price of each food item in 2017
Cost_2018 | Numerical | Price of each food item in 2018
Category | Categorical | STARS standard food categories that add up to the total food and beverage expenditure
local_proportion | Numerical | The proportion of food categorized as local
count_yes | Numerical | The count of food categorized as local
count_no | Numerical | The count of food categorized as non-local

7.1.2 Food source and Cost

This analysis models the relationship between food source (local vs non-local) and cost through logistic regression.

The binary response variable is local status (Local_2017 and Local_2018), and the numerical predictor variable is cost (Cost_2017 and Cost_2018).

In the response variable:

  • \(1\) = The item is locally sourced
  • \(0\) = The item is not locally sourced

We will test the following hypotheses:

  • Null hypothesis (\(H_0\)): \(\beta = 0\)
    Cost has no association with whether an item is locally sourced.

  • Alternative hypothesis (\(H_1\)): \(\beta \neq 0\)
    Cost has a significant association with whether an item is locally sourced.

The significance level for this test is \(\alpha = 0.05\), which corresponds to a \(95\%\) confidence level.

This analysis will be conducted for the years \(2017\) and \(2018\).

7.1.2.1 2017

For this test, we used the variable local and cost for the year \(2017\) to perform a logistic regression using a generalized linear model. This model examines the effect of cost on the probability of a food item being locally sourced.


Call:
glm(formula = Local_2017 ~ Cost_2017, family = binomial(link = logit), 
    data = logistical_regression_2017)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.869e+00  6.443e-02 -29.001  < 2e-16 ***
Cost_2017    3.336e-04  8.244e-05   4.047  5.2e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1908.4  on 2307  degrees of freedom
Residual deviance: 1892.7  on 2306  degrees of freedom
  (543 observations deleted due to missingness)
AIC: 1896.7

Number of Fisher Scoring iterations: 4

According to the output shown above, the p value for cost is \(0.000052\). The p value is lower than the \(0.05\) significance level, so the result is statistically significant. That means cost is a statistically significant predictor of whether an item is locally sourced, and the positive coefficient indicates that higher-cost items were more likely to be local in 2017.
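To make the size of the cost coefficient easier to read, it can be converted to an odds ratio. The sketch below simply exponentiates the Cost_2017 estimate printed in the output; the variable names are illustrative, and the 100-unit step is an arbitrary choice for readability.

```r
# Cost_2017 estimate copied from the glm output above (log-odds per unit of cost)
beta_cost <- 3.336e-04

odds_ratio_per_unit <- exp(beta_cost)        # change in odds for +1 unit of cost
odds_ratio_per_100  <- exp(100 * beta_cost)  # change in odds for +100 units of cost

round(c(per_unit = odds_ratio_per_unit, per_100 = odds_ratio_per_100), 4)
```

Under this model, a 100-unit increase in cost multiplies the odds of an item being local by a factor only slightly above 1 (roughly a 3% increase), so the effect is statistically detectable but small.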

To visualize how well the model works, see Figure 1

Figure 1

Now, to see how well the model works across every possible classification threshold, we will calculate the area under the curve.

Figure 2

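The higher the area under the curve (AUC), the better the predictive power of the model. The range is 0 to 1, and 0.5 corresponds to random guessing. Here the AUC is 0.5683, which is only slightly better than that baseline, so cost alone discriminates weakly between local and non-local items even though its coefficient is statistically significant.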
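For readers who want to see what the area under the curve measures, here is a minimal base-R sketch using the rank (Mann-Whitney) formulation. It is not necessarily how the value in this report was computed; `auc_from_scores` and the toy inputs are hypothetical.

```r
# AUC via ranks: the probability that a randomly chosen local item receives a
# higher predicted probability than a randomly chosen non-local item.
auc_from_scores <- function(scores, labels) {
  r  <- rank(scores)          # average ranks handle tied scores
  n1 <- sum(labels == 1)      # number of local items
  n0 <- sum(labels == 0)      # number of non-local items
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy example with made-up predicted probabilities
scores <- c(0.1, 0.4, 0.35, 0.8)
labels <- c(0, 0, 1, 1)
auc_from_scores(scores, labels)
```

In practice, `scores` would be the fitted probabilities from the logistic model and `labels` the observed local indicator.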

To get better performance, we added another predictor (the squared cost), but the area under the curve increased only slightly, so the improvement is negligible.

The data available for this model are limited. With the addition of an informative variable (which we may not have), the model could work better.

7.1.2.2 2018

We will conduct the same test with 2018 data as well to see if the association is statistically significant or not.


Call:
glm(formula = Local ~ Cost, family = binomial(link = logit), 
    data = logistic_regression_2018)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.6836815  0.0488150 -14.006  < 2e-16 ***
Cost        -0.0021150  0.0003125  -6.768  1.3e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3421.2  on 2850  degrees of freedom
Residual deviance: 3333.3  on 2849  degrees of freedom
AIC: 3337.3

Number of Fisher Scoring iterations: 6

From the results we can see that the p value is \(1.3 \times 10^{-11}\), so p < 0.05 (our significance level) and the result is statistically significant, as in 2017. Unlike 2017, however, the cost coefficient is negative.

Figure 3

Figure 3 shows that more values cluster around lower costs.

Area under the curve: 0.7221

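The higher the area under the curve, the better the predictive power of the model. The range is 0 to 1, and 0.5 corresponds to random guessing. Here the area under the curve is 0.7221, which is clearly better than that baseline, so cost discriminates between local and non-local items better in 2018 than in 2017.

It can be concluded that cost was significantly associated with local status in both years, but in opposite directions: in 2017 the positive coefficient means higher-cost products were more likely to be local, whereas in 2018 the negative coefficient means higher-cost products were less likely to be local.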

7.2 Association between categories of food purchased and local procurement status

To what extent is the local status of products associated with the categories Smith College purchased? Over the years (2017-18), Smith College has consistently purchased 7 categories of food for its dining services. The categories are:

  • Baked
  • Beverages
  • Dairy
  • Grocery
  • Meat
  • Poultry
  • Produce

We will use a two-way chi-square test of independence to test whether Category and local-procurement status are associated. If they are not independent, then some categories have a higher local proportion than others.

  • Null Hypothesis (\(H_0\)): The two variables are independent.

  • Alternative Hypothesis (\(H_1\)): The two variables are not independent.

We started by extracting the Category column and Local column from the 2017 and 2018 data sets. Then, we grouped the data by category and counted the local and non-local items in each category.
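The steps above can be sketched in R as follows. The item-level data here are made up, with only three categories, purely to show the mechanics of building the two-way table and running the test.

```r
# Made-up item-level data: one row per purchased item (not the report's data).
items <- data.frame(
  Category = rep(c("Dairy", "Produce", "Meat"), times = c(40, 50, 30)),
  Local    = c(rep(c(1, 0), times = c(25, 15)),   # Dairy: 25 local, 15 non-local
               rep(c(1, 0), times = c(20, 30)),   # Produce
               rep(c(1, 0), times = c(5, 25)))    # Meat
)

# Cross-tabulate category against local status, then test independence
two_way_table <- table(items$Category, items$Local)
result <- chisq.test(two_way_table)

two_way_table
result$parameter  # degrees of freedom: (rows - 1) * (columns - 1)
result$p.value
```

A small p value here would indicate that the local share differs across categories, which is the same reading applied to the real tables below.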

7.2.1 2017

First, we analyze the 2017 data with these two variables: Local and Category.


    Pearson's Chi-squared test

data:  two_way_2017_table
X-squared = 186.19, df = 7, p-value < 2.2e-16

The p value is extremely close to 0, so it is less than the significance level of 0.05.

Hence we can reject the null hypothesis and conclude that the two variables are not independent. This means that some categories have a higher proportion of local items than others.

Figure 4

From Figure 4, we can see that produce, grocery, meat, and dairy have higher counts of locally sourced products.

7.2.2 2018

Then, we analyze the 2018 data with the same variables to see whether they are also not independent in 2018.

After conducting the chi square test in 2018, we see the following results.


    Pearson's Chi-squared test

data:  two_way_2018_table
X-squared = 825.5, df = 7, p-value < 2.2e-16

The p value is extremely close to 0, so it is less than the significance level of 0.05. Hence we can reject the null hypothesis and conclude that the two variables are not independent. This means that some categories have a higher proportion of local items than others.

Now that we know that, in both years, some categories have a higher proportion of local items than others, we will plot this.

Figure 5

Figure 5 reflects our test results. Dairy has the highest count of local purchases, followed by produce, grocery, meat and the other categories.

7.3 Category and Difference in Proportion from year 2017-2018

Now that we know some categories have higher counts than others, we want to test whether the proportion of local items in each category differs significantly between 2017 and 2018.

After conducting a two-proportion test on each category, we obtained the results in Table 4.

Table 4
Category | p value
baked | less than 0.05
beverages | not significant
dairy | less than 0.05
grocery | less than 0.05
meat | not significant
poultry | not significant
produce | less than 0.05

The significance level is 0.05. From the table we can see that baked, dairy, grocery and produce have p < 0.05; hence the proportion of local items in these categories is significantly different between 2017 and 2018.
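A minimal sketch of one such two-proportion test, using `prop.test()` from base R. The counts are invented for a single hypothetical category and do not come from the Smith College data.

```r
# Made-up counts for one category: local items and total items in each year
local_counts <- c(`2017` = 30, `2018` = 60)
total_counts <- c(`2017` = 200, `2018` = 220)

# Two-sample test of equal proportions (continuity correction on by default)
test <- prop.test(x = local_counts, n = total_counts)

test$estimate        # sample proportion of local items in each year
test$p.value < 0.05  # TRUE means the change in proportion is significant
```

Repeating this for each category, one test per row, would produce a table of the same shape as Table 4.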

Now that we know the change was significant for these categories, we will plot the changes.

In Figure 6 we can see that dairy had the highest increase in locally sourced proportion from 2017 to 2018, followed by produce and grocery. Baked had a decrease in proportion.

Figure 6

7.4 Cash Flow at Smith College Over the Year of 2017-18

We know that category and local/non-local status are not independent; there is a statistical association between these two variables.

Now, we will visualize how the cash flow changed over the year (2017-2018) for the same categories.

Figure 7

Figure 7 shows a Sankey diagram that visualizes the cash flow from 2017 to 2018. The width of each band represents the amount of money: the thicker the band, the higher the cost.

Here we can see that the money spent on Beverages and Poultry stayed the same (the bands have almost the same width). For the other categories the band width decreased over time; which means that the money spent on these categories decreased from 2017 to 2018.

7.5 Amount distribution in all categories in 2 years at Smith College (2017-2018)

We now analyze how the number of items bought in each category changed over these two years.

Figure 8

Figure 8 shows that most of the categories increased in number of items, except the Meat category, which decreased from 2017 to 2018.

7.5.1 Proportion of food type in 2023

From Section 3.2, we can see the categories of sustainable food Smith College has maintained in their dining system.

We have seen in 2017 and 2018 that Smith College made an effort to increase locally sourced food. Based on the data available for 2023, we will see how Smith College allocated its dining money across the categories in the aforementioned section.

Figure 9

Figure 9 shows that Smith College spent the most money in the Real category, followed by the Local category. The Fair and Humane categories had similar spending (around $500,000), which is the lowest in the range. Sustainable falls in the middle of the money allocation.

7.6 Five College Consortium

The Five College Consortium consists of five colleges in the Pioneer Valley area: University of Massachusetts Amherst, Smith College, Amherst College, Mount Holyoke College and Hampshire College.

Among them, University of Massachusetts Amherst is the flagship university of Massachusetts. We now conduct a cross-university analysis. We calculated each institution's total spending and the amount allocated to local spending, defined by the Local variable being marked as “1”. We then summed the cost of all items labelled as local and divided that by the total spending to determine the proportion of spending that was local.
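The local spending proportion described above amounts to a single ratio. A minimal sketch with made-up purchases, where the column names `Cost` and `Local` are assumptions that mirror the variables in Table 3:

```r
# Made-up purchase records: cost of each item and whether it is local (1) or not (0)
purchases <- data.frame(
  Cost  = c(100, 250, 50, 600),
  Local = c(1, 0, 1, 0)
)

# Spending on local items divided by total spending
local_share <- sum(purchases$Cost[purchases$Local == 1]) / sum(purchases$Cost)
local_share
```

Because the ratio is weighted by cost, one expensive local item moves it more than several cheap ones, which is why it can differ from the share of items that are local.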

However, given the data availability, we used different years for each college.

  • Smith College = 2018
  • UMass Amherst = 2018
  • Mount Holyoke College = 2018
  • Hampshire College = 2021
  • Amherst College = 2019

So the year range is 2018-2021, chosen according to the usability of the data while keeping the variables consistent across colleges. It is important to note that University of Massachusetts Amherst did not have a Local sourcing variable, so we reported its percentage from its website, UMass (2018).

7.6.1 Sustainably Produced Percentage Comparison: Smith vs UMass vs Hampshire

For Smith College, sustainable food covered four major categories in 2018: Local, Fair, Ecological, and Humane. If a product was identified as true for any of these categories, we categorized it as sustainable.

UMass (2018) signed the Real Food Challenge Commitment and identified food that was third-party verified (organic or humane) or from local, community-based sources.

Hampshire College did not specify categories. However, it provided the percentage of food that was third-party verified or local and community-based.

7.6.2 Locally Produced Percentage Comparison: Smith College vs Hampshire vs Mount Holyoke College vs Amherst College

For Smith College, we only selected the local variable. The total cost of local products was calculated and then divided by the total cost of all products, local and non-local.

UMass (2018) signed the Real Food Challenge Commitment and identified food that was third-party verified (organic or humane) or from local, community-based sources.

Hampshire College did not specify categories. However, it provided the percentage of food that was local and community-based.

7.7 All College Categories Combined

In this section, we will analyze all of the institutions across the United States. The analysis will be of three types:

  • Comparison Across Years
  • Comparison Between Categories (Flagship and Private)
  • Correlation between variables used in Section 6

7.7.1 Score Ratio- Budget Class

We analyze the three budget classes assigned to the institutions in Table 1, the score ratio determined for the institutions in Table 2, and year (Section 6.4) with a two-way ANOVA test.

The two factors in this ANOVA test are Budget Class and Year, and the dependent variable is the score ratio.

  • Null Hypothesis 1 : Budget Class has no effect on the score ratio.

  • Null Hypothesis 2: Year has no effect on the score ratio.

  • Null Hypothesis 3: There is no interaction effect between the two factors.

  • Alternative Hypothesis 1: Budget Class has a significant effect on the score ratio.

  • Alternative Hypothesis 2: Year has a significant effect on score ratio.

  • Alternative Hypothesis 3: There is an interaction effect between two factors

Analysis of Variance Table

Response: value
                    Df  Sum Sq   Mean Sq F value Pr(>F)
`Budget Class`       2 0.02149 0.0107443  0.5250 0.5945
Year                 1 0.00222 0.0022204  0.1085 0.7431
`Budget Class`:Year  2 0.04408 0.0220410  1.0770 0.3478
Residuals           54 1.10514 0.0204655               
7.7.1.0.1 Result

The significance level is 0.05. Since all of the p values are greater than 0.05, neither factor nor their interaction is significant. Budget class and year do not make a detectable difference in the score ratio, which is based on the aggregated sustainability performance of schools.
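For reference, a two-way ANOVA with an interaction term can be set up in R as sketched below. The data are simulated with a fixed seed, so the resulting numbers carry no meaning; the column names are assumptions, and only the structure of the table (two main effects, one interaction, residuals) mirrors the analysis above.

```r
set.seed(1)  # simulated data only; values are random and purely illustrative

scores <- data.frame(
  budget_class = factor(rep(1:3, each = 20)),                    # 3 budget classes
  year         = factor(rep(c("Year 1", "Year 2"), times = 30)), # 2 reporting years
  score_ratio  = runif(60)                                       # placeholder response
)

# `budget_class * year` expands to both main effects plus their interaction
fit <- lm(score_ratio ~ budget_class * year, data = scores)
tab <- anova(fit)
tab
```

With 60 observations, 3 budget classes and 2 years, the degrees of freedom come out as 2, 1, 2 and 54, the same layout as the table reported above.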

7.8 Total food spend, Score Ratio

We performed a linear regression analysis here where the Total Budget is the independent variable and the Score Ratio is the dependent variable.

  • Null Hypothesis: \(H_0: \beta = 0\); there is no association between the independent and dependent variables.
  • Alternative Hypothesis: \(H_1: \beta \neq 0\); there is a statistically significant association between the independent and dependent variables.
Figure 10

From Figure 10, we can see a weak positive relationship between the two variables.

The resulting model shows a weak but statistically significant positive association between the two variables (t = 2.449 with 114 degrees of freedom, p = 0.0158 < 0.05).

Linear Equation:
Score Ratio = 0.2092 + 5.047 × 10^-9 × Total Budget
R-squared: 5.0% (adjusted R-squared: 4.2%)


Call:
lm(formula = score_ratio ~ total_budget, data = cleaned_scores)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.26145 -0.09788 -0.01400  0.05543  0.48120 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.092e-01  1.963e-02  10.657   <2e-16 ***
total_budget 5.047e-09  2.061e-09   2.449   0.0158 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1482 on 114 degrees of freedom
Multiple R-squared:  0.04998,   Adjusted R-squared:  0.04165 
F-statistic: 5.997 on 1 and 114 DF,  p-value: 0.01585
Analysis of Variance Table

Response: score_ratio
              Df Sum Sq  Mean Sq F value  Pr(>F)  
total_budget   1 0.1317 0.131699  5.9974 0.01585 *
Residuals    114 2.5034 0.021959                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

So we reject the null hypothesis and conclude that \(\beta \neq 0\), which means there is a statistically significant association between the two variables.
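As a consistency check, the headline statistics can be recomputed from the regression output above. The only added assumption is the increment of 10 million (in the budget's own units) used to express the slope on a more readable scale.

\[ t = \frac{\hat{\beta}}{\mathrm{SE}(\hat{\beta})} = \frac{5.047 \times 10^{-9}}{2.061 \times 10^{-9}} \approx 2.449, \qquad F = t^2 \approx 6.0 \]

\[ R^2 = \frac{SS_{\text{model}}}{SS_{\text{model}} + SS_{\text{residual}}} = \frac{0.1317}{0.1317 + 2.5034} \approx 0.0500 \]

\[ \hat{\beta} \times 10^{7} = 5.047 \times 10^{-9} \times 10^{7} \approx 0.05 \]

So the model explains about 5% of the variation in score ratio, and an additional 10 million in total budget is associated with a score ratio that is roughly 0.05 higher.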

To check the normality of the residuals for this analysis, we used a Q-Q plot.

Here we can see that the residuals follow the expected line reasonably well over most of the data, with deviations from the expected pattern at the high end.

Now, what does it mean in our context?

In our data, institutions that spent more on their dining sector (based on the STARS categories) tended to have a higher score ratio, and a higher score ratio indicates better performance in sustainable food procurement. The association is weak, however: total budget explains only about 5% of the variation in score ratio.

It seems that institutions with a larger student body and more funding spend more on food and beverages and, in turn, tend to achieve a higher score ratio.

It is also important to reiterate that the score ratio criteria differ from year to year. The earlier years (STARS Version 2.2) relied on two categories: Sustainable Dining, and Food and Beverage Purchasing. Recent years (STARS Version 3.0.1) also relied on two categories: the percentage of food and beverage spend that meets sustainability criteria, and the percentage of dining service spend with social impact suppliers. Converting each score to a ratio of the total score possible made comparisons across years and institutions possible.

7.9 Score ratio and Category of Institutions

Now we will analyze the score ratio against the categories of institutions (Flagship and Private) to see whether their mean score ratios differ.

7.9.1 Year 1

In this part, we compare the score ratios of the two categories of institutions (flagship and private) for the colleges' first year of data.

  • The null hypothesis (\(H_0\)): The mean score ratios of the two categories of institutions are equal.
  • The alternative hypothesis (\(H_1\)): The mean score ratios of the two categories of institutions are not equal.

The significance level is \(0.05\). To visualize the distribution of the data, we use a box plot before conducting the independent-samples t-test.

Figure 11

In Figure 11, we removed outliers to improve normality. The median differs between the two categories, and the private institutions show more dispersion than the flagship institutions.

Before conducting the t-test, we conduct a Levene test for equality of variances.

# A tibble: 1 × 4
    df1   df2 statistic     p
  <int> <int>     <dbl> <dbl>
1     1    54     0.286 0.595

We can see that p > 0.05 (the significance level), so the difference in variances is not statistically significant. Hence we conduct a t-test assuming equal variances.


    Two Sample t-test

data:  score_ratio by category
t = -0.85715, df = 54, p-value = 0.3952
alternative hypothesis: true difference in means between group Flagship and group Private is not equal to 0
95 percent confidence interval:
 -0.6065659  0.2432448
sample estimates:
mean in group Flagship  mean in group Private 
             -1.690458              -1.508797 

While there is a slight difference between the means of Flagship and Private, with Flagship having the lower mean, it is not statistically significant. The p-value is 0.3952, which is not less than the significance level of 0.05. So we fail to reject the null hypothesis that the mean score ratio is equal across the two college categories.

To put this in context, there appears to be no significant difference in the mean score ratio between flagship and private schools. Our previous test showed a significant positive association between dining expenditure and score ratio, and we assumed that the category of college might play a role in that association. This test gives no evidence that it does.
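The Year 1 procedure (a Levene test followed by a two-sample t-test) can be sketched in base R as follows. This is an illustration under stated assumptions: the data frame `dat` and its simulated values are hypothetical, and the Levene test is written as the median-centred (Brown-Forsythe) variant, i.e. a one-way ANOVA on absolute deviations from each group's median. Whether the report's test used exactly this variant is an assumption.

```r
# Hypothetical data: simulated score ratios for two categories (not the study's data)
set.seed(2)
dat <- data.frame(
  category    = factor(rep(c("Flagship", "Private"), each = 28)),
  score_ratio = c(rnorm(28, mean = 0.22, sd = 0.10),
                  rnorm(28, mean = 0.24, sd = 0.12))
)

# Levene-type test: one-way ANOVA on absolute deviations from the group median
dat$abs_dev <- abs(dat$score_ratio -
                   ave(dat$score_ratio, dat$category, FUN = median))
levene   <- anova(lm(abs_dev ~ category, data = dat))
levene_p <- levene[["Pr(>F)"]][1]

# Pooled-variance t-test if the variances look equal, Welch's t-test otherwise
equal_var <- levene_p > 0.05
res <- t.test(score_ratio ~ category, data = dat, var.equal = equal_var)

print(levene)
print(res)
```

Letting the Levene p-value choose between the pooled and Welch versions mirrors the decision rule described in the text; using Welch's test unconditionally would be a reasonable alternative design.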

7.10 Year 2

Now we repeat the same process for the Year 2 data of the reporting institutions.

Figure 12

In Figure 12, we removed outliers to improve normality. The median differs slightly between the two categories, and the private box is slightly wider than the flagship box.


# A tibble: 1 × 4
    df1   df2 statistic     p
  <int> <int>     <dbl> <dbl>
1     1    40      2.40 0.129

From the Levene test we can see that p > 0.05, so we use a two-sample t-test assuming equal variances.


    Two Sample t-test

data:  score_ratio by category
t = 0.20327, df = 40, p-value = 0.84
alternative hypothesis: true difference in means between group Flagship and group Private is not equal to 0
95 percent confidence interval:
 -0.07154204  0.08754204
sample estimates:
mean in group Flagship  mean in group Private 
               0.22375                0.21575 

The p-value is 0.84, which is not less than the significance level of 0.05. So, in Year 2 as well, we fail to reject the null hypothesis that the mean score ratio is equal across the two college categories.

7.11 Ethically Produced Rate and Budget

We performed a linear regression analysis here where the Total Budget is the independent variable and the Ethically Produced Rate is the dependent variable.

  • Null Hypothesis \(H_0\): \(\beta = 0\); there is no association between the independent and dependent variables.
  • Alternative Hypothesis \(H_1\): \(\beta \neq 0\); there is a statistically significant association between the independent and dependent variables.
Figure 13: Scatterplot Showing Association Between Budget and Ethical Production Rate

From Figure 13, there is a weak positive association between the two variables.

The resulting model shows that the association between the two variables is not statistically significant (t = 1.558 with 98 degrees of freedom, p = 0.122 > 0.05).

Linear equation: Ethically Produced Rate = 6.825 + 1.672 × 10^-7 × Total Budget


Call:
lm(formula = ethically_produced_rate ~ total_budget, data = cleaned_ethical_rate)

Residuals:
   Min     1Q Median     3Q    Max 
-8.624 -5.730 -3.596  4.734 21.068 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.825e+00  1.065e+00   6.411  5.1e-09 ***
total_budget 1.672e-07  1.073e-07   1.558    0.122    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.456 on 98 degrees of freedom
Multiple R-squared:  0.02418,   Adjusted R-squared:  0.01422 
F-statistic: 2.429 on 1 and 98 DF,  p-value: 0.1224

So we conclude that there is no statistically significant association between the ethically produced rate and total expenditure on food and dining. Any coefficient we obtain might be an artifact of the limited data.

7.12 Budget Class - Ethically Produced Rate

In this part, we conduct an ANOVA test to check for significant differences in the mean ethically produced rate among the three budget classes (Table 1).

  • Null Hypothesis (\(H_0\)): All means are equal, \[\mu_1 = \mu_2 = \mu_3\]
  • Alternative Hypothesis(\(H_1\)): At least one budget class mean ethically produced rate is different from others.
# A tibble: 3 × 4
  budget_class mean_rate std.dev_rate std.err
  <fct>            <dbl>        <dbl>   <dbl>
1 1                 7.87         9.87    1.83
2 2                 8.58         8.69    2.17
3 3                 5.04         5.71    1.35

From the ANOVA table below, we can see that p = 0.425 > 0.05 (the significance level), so the result is not statistically significant. We fail to reject the null hypothesis that all budget classes have an equal mean ethically produced rate.

This implies that budget class, whether lower or higher, is not significantly associated with the ethically produced rate. An institution with a lower budget might perform similarly in terms of ethical produce to an institution with a higher budget.

             Df Sum Sq Mean Sq F value Pr(>F)
budget_class  2    128   63.84   0.867  0.425
Residuals    60   4417   73.62               
Analysis of Variance Table

Response: ethically_produced_rate
             Df Sum Sq Mean Sq F value Pr(>F)
budget_class  2  127.7  63.842  0.8672 0.4253
Residuals    60 4417.0  73.616               

7.13 Score ratio over the years

We will analyse how the score ratios changed between Year 1 and Year 2 using a paired t-test.

First we check whether the data fulfill the conditions for conducting the paired t-test:

  • The sample size is greater than 30, so we do not need to check whether the data follow a normal distribution.
  • The samples are paired, since the score ratio of the same institutions was measured twice.

  • Null Hypothesis (\(H_0\)): \(\mu_d = 0\); the mean difference between the paired score ratios is zero.
  • Alternative Hypothesis (\(H_1\)): \(\mu_d \neq 0\); the mean difference between the paired score ratios is not zero.

Initially, to visualize the distribution of score ratio we use a box plot:

Figure 14

Figure 14 shows that the medians are almost the same.


    Paired t-test

data:  score_years$`Score ratio Year 1` and score_years$`Score ratio Year 2`
t = 0.64005, df = 39, p-value = 0.5259
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -0.03951361  0.07609694
sample estimates:
mean difference 
     0.01829167 

The mean difference (Year 1 - Year 2) is 0.0183, which is not statistically significant (p = 0.5259), so we fail to reject the null hypothesis. We find no evidence that score ratios changed between the years; the institutions' dining performance appears to have been consistent over the course of two consecutive reporting years.
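The confidence interval in the output can be reconstructed from the reported statistics. This is a worked check only; the critical value \(t_{0.975,\,39} \approx 2.023\) is taken from the t distribution and does not appear in the output above.

\[ \mathrm{SE}(\bar{d}) = \frac{\bar{d}}{t} = \frac{0.01829}{0.64005} \approx 0.02858 \]

\[ \bar{d} \pm t_{0.975,\,39}\,\mathrm{SE}(\bar{d}) = 0.01829 \pm 2.023 \times 0.02858 \approx (-0.0395,\ 0.0761) \]

Because the interval contains zero, it agrees with the non-significant p-value.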

7.14 Ethically produced rate over the years

We will analyse how the ethically produced rates changed between Year 1 and Year 2 using a paired t-test.

First we will check if the data fulfills the conditions for conducting the paired t-test.

  • The sample size is less than 30, so we do need to check whether the data follow a normal distribution.
shapiro.test(ethical_years$`Ethically Produced Rate Year 1`)

    Shapiro-Wilk normality test

data:  ethical_years$`Ethically Produced Rate Year 1`
W = 0.9085, p-value = 0.01575
shapiro.test(ethical_years$`Ethically Produced Rate Year 2`)

    Shapiro-Wilk normality test

data:  ethical_years$`Ethically Produced Rate Year 2`
W = 0.90698, p-value = 0.01445

We can see that neither year's data follow a normal distribution (both p-values are below 0.05). A rank-based test would be more appropriate here, but it is outside of the study’s scope.
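For readers who want to follow up, a rank-based alternative to the paired t-test is the Wilcoxon signed-rank test. The sketch below is not part of the study's results: the vectors `year1` and `year2` are simulated stand-ins for the paired rates. It also checks normality on the paired differences, which is the quantity the paired t-test actually relies on.

```r
# Hypothetical paired rates for the same institutions in two years (simulated, skewed)
set.seed(3)
year1 <- rexp(27, rate = 1 / 8)
year2 <- year1 + rnorm(27, mean = 0, sd = 2)

# Normality check on the paired differences
sw <- shapiro.test(year1 - year2)

# Wilcoxon signed-rank test: rank-based alternative to the paired t-test
wt <- wilcox.test(year1, year2, paired = TRUE)

print(sw)
print(wt)
```

In the study's own data, the two columns of `ethical_years` would take the place of `year1` and `year2`.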

7.15 Association between rating status & category of institutions

To compare the two categorical variables, rating status and category, we use a chi-squared test of independence.


    Pearson's Chi-squared test

data:  tab_rate_category
X-squared = 4.55, df = 3, p-value = 0.2079
                    X^2 df P(> X^2)
Likelihood Ratio 6.2329  3  0.10081
Pearson          4.5500  3  0.20787

Phi-Coefficient   : NA 
Contingency Coeff.: 0.509 
Cramer's V        : 0.592 
Figure 15

Cramer’s V is 0.592, which indicates a large effect size. However, the sample size is small: there is a probable association, but the small sample and high variability leave the p-value insignificant.
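The reported effect size can be related to the test statistic. As a hedged back-calculation, assume the contingency table is 2 × 4 (two institution categories by four rating statuses, consistent with df = 3) and that Cramer's V was computed from the Pearson statistic in the usual way:

\[ V = \sqrt{\frac{\chi^2}{n \cdot \min(r-1,\ c-1)}} \quad\Rightarrow\quad n = \frac{\chi^2}{V^2 \cdot \min(r-1,\ c-1)} = \frac{4.55}{0.592^2 \times 1} \approx 13 \]

\[ C = \sqrt{\frac{\chi^2}{\chi^2 + n}} = \sqrt{\frac{4.55}{4.55 + 13}} \approx 0.509 \]

The second line matches the reported contingency coefficient, so under these assumptions the table holds roughly 13 observations. With so few observations spread over eight cells, expected counts are small and the chi-squared approximation should be read with caution.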

From Figure 15, we can see that for private institutions the most probable rating is Gold and the next most likely rating is Bronze, though it is much less likely than Gold. For flagship institutions, the probability of having a Gold status is lower than expected (if there were no difference between the categories of institutions).

7.16 Rating status & Ethically produced rate

In this part, we conduct an ANOVA test to check for significant differences in the mean ethically produced rate among the rating statuses (Section 6.7).

  • Null Hypothesis (\(H_0\)): All rating status pairs have equal mean ethically produced rates:
    \(\mu_{\text{Gold}} = \mu_{\text{Platinum}} = \mu_{\text{Silver}} = \mu_{\text{Not Available}}\)

  • Alternative Hypothesis (\(H_1\)): At least one pair of rating statuses has a different mean ethically produced rate.

The significance level is 0.10 here, corresponding to a confidence level of 90%.

              Df Sum Sq Mean Sq F value Pr(>F)  
rating_status  3  480.2  160.08   2.559 0.0665 .
Residuals     46 2877.6   62.56                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the results we can see that p = 0.0665 < 0.10, so the result is statistically significant at the 0.10 level (it would not be at the 0.05 level). It can be concluded that at least one of the rating statuses above has a different mean ethically produced rate.

7.16.1 Post-Hoc ANOVA Analysis

Now that we know that the result is statistically significant at this level, we conduct a post-hoc analysis to identify which means differ. Again, the significance level here is 0.10, with 90% confidence intervals.

$emmeans
 rating_status emmean   SE df lower.CL upper.CL
 Gold            6.79 1.44 46     4.36     9.21
 Not Available  11.63 2.64 46     7.20    16.05
 Platinum       12.49 4.57 46     4.83    20.16
 Silver          2.02 2.80 46    -2.67     6.72

Confidence level used: 0.9 

$contrasts
 contrast                 estimate   SE df t.ratio p.value
 Gold - Not Available       -4.844 3.01 46  -1.611  0.6837
 Gold - Platinum            -5.708 4.79 46  -1.192  1.0000
 Gold - Silver               4.764 3.15 46   1.514  0.8216
 Not Available - Platinum   -0.864 5.27 46  -0.164  1.0000
 Not Available - Silver      9.608 3.84 46   2.500  0.0963
 Platinum - Silver          10.472 5.35 46   1.956  0.3395

P value adjustment: bonferroni method for 6 tests 

When analyzing the p-values, Not Available - Silver has a p-value below 0.1 (0.0963), so at the 90% confidence level these two groups have significantly different mean ethically produced rates. However, it is important to note that at the 95% confidence level the result is not significant.
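The Bonferroni adjustment behind the contrasts table can be reproduced from the reported t ratios. The sketch below copies the t ratios and residual degrees of freedom from the output above; the variable names are ours, and the unadjusted p-values are recomputed from the rounded t ratios, so small rounding differences are to be expected.

```r
# t ratios and residual df copied from the contrasts table above
t_ratio <- c(-1.611, -1.192, 1.514, -0.164, 2.500, 1.956)
names(t_ratio) <- c("Gold - Not Available", "Gold - Platinum", "Gold - Silver",
                    "Not Available - Platinum", "Not Available - Silver",
                    "Platinum - Silver")
df_resid <- 46

# Unadjusted two-sided p-values from the t distribution
p_raw <- 2 * pt(-abs(t_ratio), df = df_resid)

# Bonferroni: multiply each p-value by the number of tests (6) and cap at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")

print(round(cbind(p_raw, p_bonf), 4))
```

The adjustment is what turns an unadjusted p-value of roughly 0.016 for Not Available - Silver into the reported 0.0963, and it is why several contrasts show an adjusted p-value of exactly 1.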

This means that the Not Available and Silver groups differ significantly in mean ethically produced rate, with the Not Available group having a higher mean than the Silver group. Note that Not Available means that the institution no longer has an active rating (all of its ratings have expired), so the expired ratings could have been Gold, Platinum, Silver or Bronze.

Including all the expired rating statuses would help the study better understand the differences between the Gold, Platinum, Silver and Bronze ratings, but this is outside the study’s scope.

7.17 Total Food and Beverage Expenditure Across the United States

To visualize which colleges across the US states have the highest food and beverage expenditure, we map it by two categories: Flagship and Private.

It is important to note that we use one year of expenditure per college. Since the reporting years differ between colleges, we are looking at a range of years: all of the colleges’ data come from 2022-2025.

Figure 16: Total Food and Beverage Expenditure Across the United States (2022-2025)

In Figure 16, darker regions represent a higher amount of money spent.

In the flagship institutions:

The following states have higher expenditure: California (CA), Illinois (IL), Indiana (IN), Virginia (VA), Connecticut (CT), Massachusetts (MA), New York (NY)

The following states have lower expenditure: Montana (MT), Arizona (AZ), Kansas (KS), Iowa (IA), Virginia (VA), New Hampshire (NH)

Florida has the least expenditure.

In the private institutions:

The following states have higher expenditure: Washington (WA), Virginia (VA), Wisconsin (WI), Ohio (OH), Georgia (GA), New Jersey (NJ)

The following states have lower expenditure: California (CA), Michigan (MI), New York (NY), North Carolina (NC), Maine (ME), New Hampshire (NH)

Now that we know which states have higher expenditure, we examine whether the same states have similar statistics for the ethically produced rate.

7.18 Ethically Produced Rate Comparison Across US States

AASHE STARS has a field in which institutions report their ethically or sustainably produced rate (see the introductory sections on STARS and CEEDS).

7.18.1 Flagship

The plot in Figure 17 shows the distribution of the ethically produced rate across flagship US colleges in different states. The black line represents the average ethically produced rate for all colleges, which is around 8% to 9% here. The x-axis represents the categorical variable State, and the y-axis represents the ethically produced rate of the foods used in dining. States below the line have a below-average ethically produced rate; states above the line have an above-average rate.

Predictably, California, Vermont and Massachusetts have the highest mean ethically produced rates, followed by Montana, Colorado, North Carolina and Utah. Pennsylvania, Connecticut, Iowa, Texas, Illinois, New York, Louisiana, Maryland and Virginia have lower-than-average mean ethically produced rates.

Figure 17

7.18.2 Private

The plot in Figure 18 shows the distribution of the ethically produced rate across private universities in different US states. More colleges fall below than above the average ethically produced rate for all private institutions. Texas, California and Georgia have the highest ethically produced rates.

Figure 18

7.19 Score ratio in Institutes Across the United States

Now we visualize how the score ratio differs between institutions across states. We calculated the average score ratio over the two available years for the institutions in each state. The grouping variable here is state (which includes both flagship and private institutions). The data range over the years 2012-2025.

Figure 19

From Figure 19, we can see that Georgia and Oklahoma have the highest average score ratios among all of the states. The higher the score ratio, the darker the region: the lighter (yellow) regions indicate a lower score ratio, and the orange regions indicate a medium score ratio.

8 Analysis

As discussed in Section 5.1, some categories have a higher share of local sourcing in both years (2017 and 2018). The proportion of local items in a few categories increased from 2017 to 2018 relative to non-local items. This indicates Smith College’s effort to promote sustainability in the dining impact area, and a shift toward ethical and sustainable sourcing practices. We also explored the association between item cost and the likelihood of an item being locally sourced using logistic regression. For both 2017 and 2018, lower-cost items were clustered more around being non-local. The regression analysis tells us that the model performs well (better in 2018 than in 2017) at predicting whether an item is local or non-local. In 2023, Smith College’s highest spending was allocated to Real food. These spending patterns highlight the college’s commitment to supporting local food sources and sustainable food procurement.

Next, as discussed in Section 5.2, Hampshire College has the highest sustainable produce percentage and the second-highest local produce percentage. According to Daily Hampshire Gazette (2024), the college has focused on raising food in a way that is more climate resilient. The Hampshire College Farm is not a commercial enterprise; rather, it focuses on the student experience. The farm, which covers 80 acres, supplies meat, vegetables and other produce to the college. It has also partnered with a nonprofit named Momentum Ag, which solidifies its commitment to strengthening local producers. Amherst College has the highest locally sourced produce share. According to Amherst College (2024), its Book and Plow Farm produces over 20,000 lbs of fresh produce each year. The differences in percentage between the colleges, whether for local or sustainable produce, may result from the availability of college-owned farms, a larger student body, and proximity to local producers. However, all of the colleges in the consortium show effort in local and sustainable produce.

Finally, in Section 7, we identified patterns, trends, similarities and differences between flagship and private institutions. Flagship institutions consistently spent more than the private ones in the same state, as seen in Figure 16. Because score ratio is positively correlated with an institution's food and beverage expenditure, a higher budget tends to go with a higher score ratio. However, score ratios stayed consistent for both flagship and private institutions over the two years, which means dining performance did not differ significantly between the two categories in either year. While some institutions performed markedly better at times, they were treated as outliers in the distribution. Given that score ratio performance has not been statistically different between the two categories, private institutions appear to maintain comparable performance with lower expenditure. This may be because flagship institutions have a larger student population.

After analyzing total expenditure and the ethically produced rate, we found that they are not associated, yet we recognized a trend across institutions. The mean ethically produced rate for both Flagship (Figure 17) and Private (Figure 18) institutions was between 7% and 9%. More colleges fall below the average in private institutions, whereas more colleges fall above the average in flagship institutions. This suggests a minimal difference in ethical sourcing between flagship and private institutions.

When we analyzed the rating statuses across these two categories, most of the private institutions had Gold ratings. It is important to note that the ratings are based on overall performance and not just on dining. For flagship institutions, Gold ratings are less frequent than would be expected if there were truly no difference between the categories. Flagship institutions had more diverse ratings (given that we took all flagship institutions into account, compared with the randomly chosen private colleges for each state). However, rating status does not have a statistically significant association with the ethically produced rate at the 0.05 level either.

9 Limitation

While testing for changes in variables over the years, we found that not all institutions provided all of the data requested by the STARS framework. This led to inconsistency in the tests and might be a reason for the statistically insignificant results. Some colleges had expired ratings and no active rating for any year, so they were labelled “not available”; their lack of an active rating might skew the results and make them less significant. We also included all flagship institutions for each state but only one private college per state, given the scope of this research project. This limits the data and may skew the state-by-state comparisons across categories and the summary results. STARS is a self-reported tool, so data points may be inconsistent across institutions. Our comparison covers two years, and those years are not the same across institutions: for example, some colleges have information for 2022 and 2023, while others have information for 2017 and 2015. The year-by-year comparison might therefore be biased, because the socio-economic and geopolitical environment has changed over the years and might not be the same for all institutions.

10 Future Scope of Work

This study included a cross comparison of categories of institutions and years. However, the analysis covered only two years, so a longer-term analysis (preferably over more than five years) might reveal more significant trends. With data for all institutions in each state, we could gain a more comprehensive understanding of the averages by state.

References

AASHE. 2025. “STARS: A Program of AASHE.” 2025. https://stars.aashe.org.
Amherst College. 2024. “Sustainability Food & Dining.” 2024. https://www.amherst.edu/about/sustainability/sustainability-focus-areas/food-dining.
Daily Hampshire Gazette. 2024. “Harvesting Food, Nourishing Knowledge: Hampshire College Farm Harnesses Student Energy While Adapting to Climate Change.” 2024. https://www.gazettenet.com/HampshireCollegeFarm-hg-09112024-56952399.
Leighton, Hannah. 2017. “Measuring up: When, How, and Why to Define "Local".” 2017. https://www.farmtoinstitution.org/blog/measuring-when-how-and-why-define-local#:~:text=RFC%20and%20AASHE%20have%20partnered,geographic%20midpoint%20of%20New%20England.
STARS. 2019. “STARS-2.2-Technical-Manual.pdf.” 2019. https://stars.aashe.org/wp-content/uploads/2019/07/STARS-2.2-Technical-Manual.pdf.
STARS. 2025a. “STARS-3.0.1-Technical-Manual.pdf.” 2025. https://stars.aashe.org/wp-content/uploads/2025/07/STARS-Technical-Manual-Version-3.0.1.pdf.
STARS. 2025b. “Technical Manual.” 2025. https://stars.aashe.org/resources-support/technical-manual/.
UMass. 2018. “UMass Amherst Again Ranked Among Top 50 Green Colleges by Princeton Review.” 2018. https://www.umass.edu/news/article/umass-amherst-again-ranked-among-top-50-0#:~:text=The%20campus%20now%20has%20more,spent%20on%20local/organic%20food.
United Nations. 2015. “THE 17 GOALS | Sustainable Development.” 2015. https://sdgs.un.org/goals.

Footnotes

  1. Page 178-179 in the version 2.2 technical manual.↩︎

  2. https://www.lucidchart.com/ is a tool to make flow charts that we used to conceptualize the SDG goals written in the technical manual.↩︎

  3. Page 169-170 in the version 3.0.1 technical manual.↩︎

  4. https://www.lucidchart.com/ is a tool to make flow charts that we used to conceptualize it.↩︎