Study Abroad Program Models and Academics

ECI 586 Final Project

Author

Lindsey Anderegg

Published

December 8, 2024

Prepare

Purpose of Data and Guiding Research Questions

The following analysis examines U.S. study abroad participation over the last ten years to compare the relationships among program duration/type, student participation, and academic credit. Using data from the Open Doors report published by the Institute of International Education, which gathers study abroad data from universities across the United States, I focus on the following research questions:

  1. How have short-term, mid-length (semester), and long-term study abroad program durations shifted over time?

  2. Are there notable shifts in student fields of study or academic levels corresponding to increases or decreases in study abroad participation over the years?

Data Sources and Variables Used

I initially set out to review data on study abroad students and how their program models affect their academics. Because limited data are available on study abroad and the students who participate, the analysis below focuses on trends in study abroad program duration and academics from 2013 to 2023. Data were gathered from the Institute of International Education (IIE) in four data sets: Total Study Abroad Participation, Fields of Study, Student Profile, and Program Duration. Variables include year of participation, length of study (e.g., short-term vs. long-term), class level (e.g., freshman, sophomore, junior, senior), number of participants, and percentage change in participation.

Wrangle

Loading Packages

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidymodels)
── Attaching packages ────────────────────────────────────── tidymodels 1.2.0 ──
✔ broom        1.0.7     ✔ rsample      1.2.1
✔ dials        1.3.0     ✔ tune         1.2.1
✔ infer        1.0.7     ✔ workflows    1.1.4
✔ modeldata    1.4.0     ✔ workflowsets 1.1.0
✔ parsnip      1.2.1     ✔ yardstick    1.3.1
✔ recipes      1.1.0     
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter()   masks stats::filter()
✖ recipes::fixed()  masks stringr::fixed()
✖ dplyr::lag()      masks stats::lag()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step()   masks stats::step()
• Use suppressPackageStartupMessages() to eliminate package startup messages
library(tidygraph)

Attaching package: 'tidygraph'

The following object is masked from 'package:stats':

    filter
library(reshape2)

Attaching package: 'reshape2'

The following object is masked from 'package:tidyr':

    smiths
library(treemap)
library(ggplot2)
library(ggraph)
library(janitor)

Attaching package: 'janitor'
The following objects are masked from 'package:stats':

    chisq.test, fisher.test
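
The startup banners and masking notices above can be silenced, as the tidymodels message itself suggests. The following is a minimal sketch, assuming every package listed is already installed; it loads the same packages used in this report inside suppressPackageStartupMessages().

```r
# Load the packages used in this analysis without printing startup banners.
# Assumption: all of these packages are already installed.
suppressPackageStartupMessages({
  library(tidyverse)   # also attaches ggplot2, dplyr, readr, and tidyr
  library(tidymodels)
  library(tidygraph)
  library(reshape2)
  library(treemap)
  library(ggraph)
  library(janitor)
})
```

In a Quarto document, setting the chunk option `message: false` is another common way to hide these messages in the rendered output.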

Importing Data Sets

Total Study Abroad Participation

Total_Participation <- read_csv('~/Desktop/Data for Final Project/Final Data Sets Reformatted/Total Study Abroad Participation.csv')
Rows: 10 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (1): Year
num (1): Total

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
print(Total_Participation)
# A tibble: 10 × 2
    Year  Total
   <dbl>  <dbl>
 1  2013 304467
 2  2014 313415
 3  2015 325339
 4  2016 332727
 5  2017 341751
 6  2018 347099
 7  2019 162633
 8  2020  14549
 9  2021 188753
10  2022 280716
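
Because percentage change in participation is one of the variables of interest, a year-over-year change can be derived directly from these totals. The sketch below rebuilds the printed table by hand so that it runs on its own; the names total_participation and pct_change are illustrative and not part of the original analysis. The Year values appear to mark the start of each academic year (the 2013 total matches the 2013/14 total in the student profile data).

```r
library(dplyr)

# Values copied from the printed Total_Participation tibble above
total_participation <- tibble::tibble(
  Year  = 2013:2022,
  Total = c(304467, 313415, 325339, 332727, 341751,
            347099, 162633, 14549, 188753, 280716)
)

# Percent change relative to the previous year; the first year has no
# predecessor, so its value is NA
total_change <- total_participation |>
  mutate(pct_change = round((Total - lag(Total)) / lag(Total) * 100, 1))

print(total_change)
```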

Three additional data sets will be explored to help analyze study abroad programs and academics: Student Profile, Program Duration, and Fields of Study.

student_profile <- read_csv('~/Desktop/Data for Final Project/Final Data Sets Reformatted/FINAL.Profile_Study_Abroad_Students.csv')
New names:
Rows: 10 Columns: 46
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(5): Year, Nonbinary*, Attention-deficit/hyperactivity disorder (ADHD)*... dbl
(29): Undergraduate, Associate's, Freshman, Sophomore, Junior, Senior, B... lgl
(12): ...2, Academic level, ...17, Gender, ...22, Race/Ethnicity, ...30,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...2`
• `` -> `...17`
• `` -> `...22`
• `` -> `...30`
• `` -> `...33`
• `` -> `...43`
• `` -> `...45`
print(student_profile)
# A tibble: 10 × 46
   Year    ...2  `Academic level` Undergraduate `Associate's` Freshman Sophomore
   <chr>   <lgl> <lgl>                    <dbl>         <dbl>    <dbl>     <dbl>
 1 2013/14 NA    NA                        87             1.7      3.9      13.1
 2 2014/15 NA    NA                        87.6           1.8      3.9      13.1
 3 2015/16 NA    NA                        87.7           1.7      3.6      12.7
 4 2016/17 NA    NA                        87.9           1.7      4        13.2
 5 2017/18 NA    NA                        87.7           1.7      4.2      12.8
 6 2018/19 NA    NA                        88             1.9      4.1      13.2
 7 2019/20 NA    NA                        90.6           0.8      2.7      12.3
 8 2020/21 NA    NA                        88.1           0.4      7        14.2
 9 2021/22 NA    NA                        89.7           0.9      5        11.4
10 2022/23 NA    NA                        88.6           1.5      4.6      12.4
# ℹ 39 more variables: Junior <dbl>, Senior <dbl>,
#   `Bachelor's, Unspecified` <dbl>, Graduate <dbl>, `Master's` <dbl>,
#   Doctoral <dbl>, `Graduate, Professional` <dbl>,
#   `Graduate, Unspecified` <dbl>, `Other Academic Level` <dbl>, ...17 <lgl>,
#   Gender <lgl>, Female <dbl>, Male <dbl>, `Nonbinary*` <chr>, ...22 <lgl>,
#   `Race/Ethnicity` <lgl>, `American Indian or Alaska Native` <dbl>,
#   `Asian, Native Hawaiian, or Other Pacific Islander` <dbl>, …
program_duration <- read_csv('~/Desktop/Data for Final Project/Final Data Sets Reformatted/FINAL.Transposed_Study_Abroad_Duration_Data.csv')
New names:
Rows: 18 Columns: 27
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(6): Year, Two to eight weeks, Fewer than two weeks, Summer: More than ... dbl
(10): 8 Weeks or Less During Academic Year, January Term, Summer Term, O... lgl
(11): Short-term, ...11, Mid-length, ...16, Long-term, ...20, ...22, ......
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...11`
• `` -> `...16`
• `` -> `...20`
• `` -> `...22`
• `` -> `...24`
• `` -> `...26`
print(program_duration)
# A tibble: 18 × 27
   Year    `Short-term` 8 Weeks or Less During Academic Y…¹ `Two to eight weeks`
   <chr>   <lgl>                                      <dbl> <chr>               
 1 2005/06 NA                                           9.5 -                   
 2 2006/07 NA                                           9.8 -                   
 3 2007/08 NA                                          11   -                   
 4 2008/09 NA                                          11.7 -                   
 5 2009/10 NA                                          11.9 -                   
 6 2010/11 NA                                          13.3 5                   
 7 2011/12 NA                                          14.4 6.5                 
 8 2012/13 NA                                          15.3 6.9                 
 9 2013/14 NA                                          16.5 6.6                 
10 2014/15 NA                                          16.7 6.5                 
11 2015/16 NA                                          17.4 6.6                 
12 2016/17 NA                                          18.8 6.8                 
13 2017/18 NA                                          19   7.3                 
14 2018/19 NA                                          19.3 6.9                 
15 2019/20 NA                                          15.9 5.7                 
16 2020/21 NA                                           5.1 3.7                 
17 2021/22 NA                                          12.9 4.6                 
18 2022/23 NA                                          17.5 5.7                 
# ℹ abbreviated name: ¹​`8 Weeks or Less During Academic Year`
# ℹ 23 more variables: `Fewer than two weeks` <chr>, `January Term` <dbl>,
#   `Summer Term` <dbl>, `Summer: More than eight weeks` <chr>,
#   `Summer: Two to eight weeks` <chr>, `Summer: Fewer than two weeks` <chr>,
#   ...11 <lgl>, `Mid-length` <lgl>, `One Quarter` <dbl>, `Two Quarters` <dbl>,
#   `One Semester` <dbl>, ...16 <lgl>, `Long-term` <lgl>,
#   `Academic Year` <dbl>, `Calendar Year` <dbl>, ...20 <lgl>, Other <dbl>, …
field_of_study <- read_csv('~/Desktop/Data for Final Project/Final Data Sets Reformatted/FINAL.Transposed_Study_Abroad_Fields_of_Study_Data.csv')
New names:
Rows: 23 Columns: 35
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(5): Year, Foreign Language and International Studies***, Communication... dbl
(14): STEM Fields*, Physical or Life Sciences, Health Professions, Engin... lgl
(16): ...2, ...20, ...22, Note: Historical data may not always sum to to...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...2`
• `` -> `...20`
• `` -> `...22`
• `` -> `...33`
• `` -> `...34`
print(field_of_study)
# A tibble: 23 × 35
   Year    ...2  `STEM Fields*` `Physical or Life Sciences` `Health Professions`
   <chr>   <lgl>          <dbl>                       <dbl>                <dbl>
 1 2000/01 NA              16.6                        7.1                  3.2 
 2 2001/02 NA              16.8                        7.6                  3   
 3 2002/03 NA              17                          7.13                 3.09
 4 2003/04 NA              16.2                        7.06                 3.38
 5 2004/05 NA              16.3                        7.1                  3.4 
 6 2005/06 NA              16.4                        6.9                  3.8 
 7 2006/07 NA              17.5                        7.3                  4.1 
 8 2007/08 NA              17.6                        7.2                  4.5 
 9 2008/09 NA              17.7                        7.3                  4.5 
10 2009/10 NA              18.9                        7.5                  4.7 
# ℹ 13 more rows
# ℹ 30 more variables: Engineering <dbl>, `Math or Computer Science` <dbl>,
#   Agriculture <dbl>, `Business & Management` <dbl>,
#   `Social Sciences**` <dbl>,
#   `Foreign Language and International Studies***` <chr>,
#   `Fine and Applied Arts` <dbl>, `Communications and Journalism***` <chr>,
#   `Humanities**` <dbl>, Education <dbl>, …
glimpse(field_of_study)
Rows: 23
Columns: 35
$ Year                                                                                                                                                                                                                                                 <chr> …
$ ...2                                                                                                                                                                                                                                                 <lgl> …
$ `STEM Fields*`                                                                                                                                                                                                                                       <dbl> …
$ `Physical or Life Sciences`                                                                                                                                                                                                                          <dbl> …
$ `Health Professions`                                                                                                                                                                                                                                 <dbl> …
$ Engineering                                                                                                                                                                                                                                          <dbl> …
$ `Math or Computer Science`                                                                                                                                                                                                                           <dbl> …
$ Agriculture                                                                                                                                                                                                                                          <dbl> …
$ `Business & Management`                                                                                                                                                                                                                              <dbl> …
$ `Social Sciences**`                                                                                                                                                                                                                                  <dbl> …
$ `Foreign Language and International Studies***`                                                                                                                                                                                                      <chr> …
$ `Fine and Applied Arts`                                                                                                                                                                                                                              <dbl> …
$ `Communications and Journalism***`                                                                                                                                                                                                                   <chr> …
$ `Humanities**`                                                                                                                                                                                                                                       <dbl> …
$ Education                                                                                                                                                                                                                                            <dbl> …
$ `Legal Studies and Law Enforcement***`                                                                                                                                                                                                               <chr> …
$ `Foreign Languages***`                                                                                                                                                                                                                               <chr> …
$ `Other Fields of Study`                                                                                                                                                                                                                              <dbl> …
$ Undeclared                                                                                                                                                                                                                                           <dbl> …
$ ...20                                                                                                                                                                                                                                                <lgl> …
$ Total                                                                                                                                                                                                                                                <dbl> …
$ ...22                                                                                                                                                                                                                                                <lgl> …
$ `Note: Historical data may not always sum to totals.`                                                                                                                                                                                                <lgl> …
$ `Note: Percent distributions may not sum to 100.0 because of rounding.`                                                                                                                                                                              <lgl> …
$ `Note: Dash symbol (-) indicates no data available.`                                                                                                                                                                                                 <lgl> …
$ `Note: All broad field of study categories are based on Open Doors 2024 classifications, which may not match data in historical publications.`                                                                                                       <lgl> …
$ `Note: Starting in Open Doors 2021, data reported are from Classification of Instructional Programs, 2020 Edition, published by the National Center for Education Statistics (NCES) of the U.S. Department of Education.`                            <lgl> …
$ `Note: Prior to Open Doors 2021 The fields of study used were from Classification of Instructional Programs, 2010 Edition, published by the National Center for Education Statistics (NCES) of the U.S. Department of Education.`                    <lgl> …
$ `* Science, Technology, Engineering, and Math`                                                                                                                                                                                                       <lgl> …
$ `Notes on historical names and classifications:`                                                                                                                                                                                                     <lgl> …
$ `** Beginning in 2013/14, changes were made in the classification of fields of study reported in the Open Doors U.S. Study Abroad Survey. Figures reported from 2013/14 onward are not entirely comparable to prior years.`                          <lgl> …
$ `*** Beginning in 2013/14, Communications and Journalism  and Legal Studies & Law Enforcement were reported separately; and Foreign Language was merged with International Studies.`                                                                 <lgl> …
$ ...33                                                                                                                                                                                                                                                <lgl> …
$ ...34                                                                                                                                                                                                                                                <lgl> …
$ `Suggested citation: Institute of International Education. (2024). "Percent of U.S. Study Abroad Students by Field of Study, 2000/01 - 2022/23" Open Doors Report on International Educational Exchange. Retrieved from https://opendoorsdata.org/.` <lgl> …
glimpse(student_profile)
Rows: 10
Columns: 46
$ Year                                                <chr> "2013/14", "2014/1…
$ ...2                                                <lgl> NA, NA, NA, NA, NA…
$ `Academic level`                                    <lgl> NA, NA, NA, NA, NA…
$ Undergraduate                                       <dbl> 87.0, 87.6, 87.7, …
$ `Associate's`                                       <dbl> 1.7, 1.8, 1.7, 1.7…
$ Freshman                                            <dbl> 3.9, 3.9, 3.6, 4.0…
$ Sophomore                                           <dbl> 13.1, 13.1, 12.7, …
$ Junior                                              <dbl> 33.9, 33.1, 32.9, …
$ Senior                                              <dbl> 25.3, 26.4, 27.7, …
$ `Bachelor's, Unspecified`                           <dbl> 9.1, 9.3, 9.1, 8.6…
$ Graduate                                            <dbl> 12.7, 12.1, 12.1, …
$ `Master's`                                          <dbl> 7.6, 7.4, 7.0, 7.3…
$ Doctoral                                            <dbl> 0.7, 0.7, 0.7, 0.7…
$ `Graduate, Professional`                            <dbl> 2.0, 1.9, 2.1, 2.0…
$ `Graduate, Unspecified`                             <dbl> 2.4, 2.1, 2.3, 1.9…
$ `Other Academic Level`                              <dbl> 0.3, 0.3, 0.2, 0.2…
$ ...17                                               <lgl> NA, NA, NA, NA, NA…
$ Gender                                              <lgl> NA, NA, NA, NA, NA…
$ Female                                              <dbl> 65.3, 66.6, 66.5, …
$ Male                                                <dbl> 34.7, 33.4, 33.5, …
$ `Nonbinary*`                                        <chr> "-", "-", "-", "-"…
$ ...22                                               <lgl> NA, NA, NA, NA, NA…
$ `Race/Ethnicity`                                    <lgl> NA, NA, NA, NA, NA…
$ `American Indian or Alaska Native`                  <dbl> 0.5, 0.5, 0.5, 0.4…
$ `Asian, Native Hawaiian, or Other Pacific Islander` <dbl> 7.7, 8.1, 8.4, 8.2…
$ `Black or African-American`                         <dbl> 5.6, 5.6, 5.9, 6.1…
$ `Hispanic or Latino(a)`                             <dbl> 8.3, 8.8, 9.7, 10.…
$ Multiracial                                         <dbl> 3.6, 4.1, 3.9, 4.3…
$ White                                               <dbl> 74.3, 72.9, 71.6, …
$ ...30                                               <lgl> NA, NA, NA, NA, NA…
$ `Disability Status`                                 <lgl> NA, NA, NA, NA, NA…
$ Disability                                          <dbl> 5.7, 5.3, 8.8, 8.5…
$ ...33                                               <lgl> NA, NA, NA, NA, NA…
$ `Type of Disability`                                <lgl> NA, NA, NA, NA, NA…
$ `Attention-deficit/hyperactivity disorder (ADHD)**` <chr> "-", "-", "-", "-"…
$ `Autism spectrum disability ***`                    <chr> "-", "-", "1.8", "…
$ `Chronic health disability ***`                     <chr> "-", "-", "23.2", …
$ `Learning disability**`                             <dbl> 43.8, 42.1, 34.4, …
$ `Mental health/psychological disability  `          <dbl> 25.9, 27.0, 27.7, …
$ `Mobility/physical disability  `                    <dbl> 4.7, 5.2, 3.6, 4.5…
$ `Sensory disability  `                              <dbl> 5.0, 5.0, 4.4, 4.4…
$ `Other disability`                                  <dbl> 20.6, 20.7, 4.9, 5…
$ ...43                                               <lgl> NA, NA, NA, NA, NA…
$ `# of Institutions Reporting Disability Status`     <dbl> 273, 322, 341, 380…
$ ...45                                               <lgl> NA, NA, NA, NA, NA…
$ `TOTAL U.S. STUDY ABROAD`                           <dbl> 304467, 313415, 32…
glimpse(program_duration)
Rows: 18
Columns: 27
$ Year                                                                                                                                                                                                                                <chr> …
$ `Short-term`                                                                                                                                                                                                                        <lgl> …
$ `8 Weeks or Less During Academic Year`                                                                                                                                                                                              <dbl> …
$ `Two to eight weeks`                                                                                                                                                                                                                <chr> …
$ `Fewer than two weeks`                                                                                                                                                                                                              <chr> …
$ `January Term`                                                                                                                                                                                                                      <dbl> …
$ `Summer Term`                                                                                                                                                                                                                       <dbl> …
$ `Summer: More than eight weeks`                                                                                                                                                                                                     <chr> …
$ `Summer: Two to eight weeks`                                                                                                                                                                                                        <chr> …
$ `Summer: Fewer than two weeks`                                                                                                                                                                                                      <chr> …
$ ...11                                                                                                                                                                                                                               <lgl> …
$ `Mid-length`                                                                                                                                                                                                                        <lgl> …
$ `One Quarter`                                                                                                                                                                                                                       <dbl> …
$ `Two Quarters`                                                                                                                                                                                                                      <dbl> …
$ `One Semester`                                                                                                                                                                                                                      <dbl> …
$ ...16                                                                                                                                                                                                                               <lgl> …
$ `Long-term`                                                                                                                                                                                                                         <lgl> …
$ `Academic Year`                                                                                                                                                                                                                     <dbl> …
$ `Calendar Year`                                                                                                                                                                                                                     <dbl> …
$ ...20                                                                                                                                                                                                                               <lgl> …
$ Other                                                                                                                                                                                                                               <dbl> …
$ ...22                                                                                                                                                                                                                               <lgl> …
$ Total                                                                                                                                                                                                                               <dbl> …
$ ...24                                                                                                                                                                                                                               <lgl> …
$ `Note: Percent distribution may not total 100.0 due to rounding.`                                                                                                                                                                   <lgl> …
$ ...26                                                                                                                                                                                                                               <lgl> …
$ `Suggested citation: Institute of International Education. (2024). "Detailed Duration of U.S. Study Abroad, 2005/06 - 2022/23" Open Doors Report on International Educational Exchange. Retrieved from https://opendoorsdata.org/.` <lgl> …
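
Several percentage columns above were read as <chr> because the source files use a dash (-) to mark years with no data, and Year is stored as a label such as "2013/14". The sketch below shows one possible way to handle both at import time, using a small inline sample copied from the printed program_duration output rather than the original file; csv_text, duration_sample, and start_year are illustrative names.

```r
library(readr)
library(dplyr)

# Three rows of two columns, copied from the program_duration output above
csv_text <- "Year,Two to eight weeks
2009/10,-
2010/11,5
2011/12,6.5
"

duration_sample <- read_csv(
  I(csv_text),              # I() tells readr to treat the string as literal data
  na = c("", "NA", "-"),    # treat the dash as missing so the column parses as numeric
  show_col_types = FALSE
) |>
  # Take the first four characters of the label as the starting year
  mutate(start_year = as.integer(substr(Year, 1, 4)))

glimpse(duration_sample)
```

The same `na` argument could be passed to the read_csv() calls above, and a numeric starting year would make it easier to line these tables up with Total_Participation, whose Year column is numeric.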

Cleaning and Combining Data Sets

After using glimpse() to view the imported data sets, I see that each contains a handful of columns that are entirely NA. I will use is.na() to keep only the columns that have at least one non-missing value.

# Keep only the columns that contain at least one non-missing value
program_duration_cleaned <- program_duration[, apply(program_duration, 2, function(x) any(!is.na(x)))]

print(program_duration_cleaned)
# A tibble: 18 × 16
   Year    8 Weeks or Less During …¹ `Two to eight weeks` `Fewer than two weeks`
   <chr>                       <dbl> <chr>                <chr>                 
 1 2005/06                       9.5 -                    -                     
 2 2006/07                       9.8 -                    -                     
 3 2007/08                      11   -                    -                     
 4 2008/09                      11.7 -                    -                     
 5 2009/10                      11.9 -                    -                     
 6 2010/11                      13.3 5                    8.3                   
 7 2011/12                      14.4 6.5                  7.9                   
 8 2012/13                      15.3 6.9                  8.4                   
 9 2013/14                      16.5 6.6                  9.9                   
10 2014/15                      16.7 6.5                  10.2                  
11 2015/16                      17.4 6.6                  10.8                  
12 2016/17                      18.8 6.8                  12                    
13 2017/18                      19   7.3                  11.7                  
14 2018/19                      19.3 6.9                  12.4                  
15 2019/20                      15.9 5.7                  10.2                  
16 2020/21                       5.1 3.7                  1.4                   
17 2021/22                      12.9 4.6                  8.3                   
18 2022/23                      17.5 5.7                  11.8                  
# ℹ abbreviated name: ¹​`8 Weeks or Less During Academic Year`
# ℹ 12 more variables: `January Term` <dbl>, `Summer Term` <dbl>,
#   `Summer: More than eight weeks` <chr>, `Summer: Two to eight weeks` <chr>,
#   `Summer: Fewer than two weeks` <chr>, `One Quarter` <dbl>,
#   `Two Quarters` <dbl>, `One Semester` <dbl>, `Academic Year` <dbl>,
#   `Calendar Year` <dbl>, Other <dbl>, Total <dbl>
field_of_study_cleaned <- field_of_study[, apply(field_of_study, 2, function(x) any(!is.na(x)))]

print(field_of_study_cleaned)
# A tibble: 23 × 19
   Year   `STEM Fields*` Physical or Life Sci…¹ `Health Professions` Engineering
   <chr>           <dbl>                  <dbl>                <dbl>       <dbl>
 1 2000/…           16.6                   7.1                  3.2         2.7 
 2 2001/…           16.8                   7.6                  3           2.9 
 3 2002/…           17                     7.13                 3.09        2.89
 4 2003/…           16.2                   7.06                 3.38        2.86
 5 2004/…           16.3                   7.1                  3.4         2.9 
 6 2005/…           16.4                   6.9                  3.8         2.9 
 7 2006/…           17.5                   7.3                  4.1         3.1 
 8 2007/…           17.6                   7.2                  4.5         3.1 
 9 2008/…           17.7                   7.3                  4.5         3.2 
10 2009/…           18.9                   7.5                  4.7         3.9 
# ℹ 13 more rows
# ℹ abbreviated name: ¹​`Physical or Life Sciences`
# ℹ 14 more variables: `Math or Computer Science` <dbl>, Agriculture <dbl>,
#   `Business & Management` <dbl>, `Social Sciences**` <dbl>,
#   `Foreign Language and International Studies***` <chr>,
#   `Fine and Applied Arts` <dbl>, `Communications and Journalism***` <chr>,
#   `Humanities**` <dbl>, Education <dbl>, …
student_profile_cleaned <- student_profile[, apply(student_profile, 2, function(x) any(!is.na(x)))]

print(student_profile_cleaned)
# A tibble: 10 × 34
   Year    Undergraduate `Associate's` Freshman Sophomore Junior Senior
   <chr>           <dbl>         <dbl>    <dbl>     <dbl>  <dbl>  <dbl>
 1 2013/14          87             1.7      3.9      13.1   33.9   25.3
 2 2014/15          87.6           1.8      3.9      13.1   33.1   26.4
 3 2015/16          87.7           1.7      3.6      12.7   32.9   27.7
 4 2016/17          87.9           1.7      4        13.2   33     27.4
 5 2017/18          87.7           1.7      4.2      12.8   33     28.2
 6 2018/19          88             1.9      4.1      13.2   33.4   29.4
 7 2019/20          90.6           0.8      2.7      12.3   42.7   27  
 8 2020/21          88.1           0.4      7        14.2   27     33.6
 9 2021/22          89.7           0.9      5        11.4   30.6   34.8
10 2022/23          88.6           1.5      4.6      12.4   30.7   30.6
# ℹ 27 more variables: `Bachelor's, Unspecified` <dbl>, Graduate <dbl>,
#   `Master's` <dbl>, Doctoral <dbl>, `Graduate, Professional` <dbl>,
#   `Graduate, Unspecified` <dbl>, `Other Academic Level` <dbl>, Female <dbl>,
#   Male <dbl>, `Nonbinary*` <chr>, `American Indian or Alaska Native` <dbl>,
#   `Asian, Native Hawaiian, or Other Pacific Islander` <dbl>,
#   `Black or African-American` <dbl>, `Hispanic or Latino(a)` <dbl>,
#   Multiracial <dbl>, White <dbl>, Disability <dbl>, …
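
The same column filter can also be written with dplyr, which is already loaded through the tidyverse. The following is only a sketch on a small made-up tibble (the `toy` object and its `Notes` and `Partial` columns are hypothetical and not part of the Open Doors files); it is an alternative to the apply() approach used above, not a replacement for it.

```r
library(dplyr)
library(tibble)

# Toy tibble standing in for an imported sheet (illustrative values only).
toy <- tibble(
  Year    = c("2020/21", "2021/22"),
  Total   = c(5.1, 12.9),
  Notes   = c(NA, NA),   # entirely NA: should be dropped
  Partial = c(NA, 1.4)   # partly NA: should be kept
)

# Keep only the columns that have at least one non-missing value.
toy_cleaned <- toy %>%
  select(where(~ !all(is.na(.x))))

names(toy_cleaned)
```

Columns that are only partly missing are kept in both versions, so any remaining NA values still need to be handled later (for example with na.rm = TRUE).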

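Because the data sets cover different spans of time, I’m going to narrow each one down to the academic years 2013/14 through 2022/23. To do this, the Year column is first converted to the starting year of each academic year (for example, 2013/14 becomes 2013).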

field_of_study_cleaned$Year <- as.numeric(sub("/.*", "", field_of_study_cleaned$Year))

filtered_field_of_study <- subset(field_of_study_cleaned, Year >= 2013 & Year <= 2022)
print(filtered_field_of_study)
# A tibble: 10 × 19
    Year `STEM Fields*` Physical or Life Scie…¹ `Health Professions` Engineering
   <dbl>          <dbl>                   <dbl>                <dbl>       <dbl>
 1  2013           22.6                     8                    6           4.6
 2  2014           23.9                     8.1                  6.3         5  
 3  2015           25.2                     8.1                  7.1         5.1
 4  2016           25.8                     8                    7.1         5.3
 5  2017           25.6                     7.8                  6.9         5.2
 6  2018           26.8                     8.1                  7.1         5.5
 7  2019           24.5                     7.4                  6           4.3
 8  2020           29.2                     9.6                  4.9         8.9
 9  2021           25.6                     8.2                  5.6         5.6
10  2022           27                       8.2                  6.1         5.4
# ℹ abbreviated name: ¹​`Physical or Life Sciences`
# ℹ 14 more variables: `Math or Computer Science` <dbl>, Agriculture <dbl>,
#   `Business & Management` <dbl>, `Social Sciences**` <dbl>,
#   `Foreign Language and International Studies***` <chr>,
#   `Fine and Applied Arts` <dbl>, `Communications and Journalism***` <chr>,
#   `Humanities**` <dbl>, Education <dbl>,
#   `Legal Studies and Law Enforcement***` <chr>, …
student_profile_cleaned$Year <- as.numeric(sub("/.*", "", student_profile_cleaned$Year))

filtered_student_profile <- subset(student_profile_cleaned, Year >= 2013 & Year <= 2022)

filtered_student_profile <- filtered_student_profile[, c("Year", "Freshman", "Sophomore", "Junior", "Senior")]

print(filtered_student_profile)
# A tibble: 10 × 5
    Year Freshman Sophomore Junior Senior
   <dbl>    <dbl>     <dbl>  <dbl>  <dbl>
 1  2013      3.9      13.1   33.9   25.3
 2  2014      3.9      13.1   33.1   26.4
 3  2015      3.6      12.7   32.9   27.7
 4  2016      4        13.2   33     27.4
 5  2017      4.2      12.8   33     28.2
 6  2018      4.1      13.2   33.4   29.4
 7  2019      2.7      12.3   42.7   27  
 8  2020      7        14.2   27     33.6
 9  2021      5        11.4   30.6   34.8
10  2022      4.6      12.4   30.7   30.6
program_duration_cleaned$Year <- as.numeric(sub("/.*", "", program_duration_cleaned$Year))

filtered_program_duration <- subset(program_duration_cleaned, Year >= 2013 & Year <= 2022)

print(filtered_program_duration)
# A tibble: 10 × 16
    Year 8 Weeks or Less During Ac…¹ `Two to eight weeks` `Fewer than two weeks`
   <dbl>                       <dbl> <chr>                <chr>                 
 1  2013                        16.5 6.6                  9.9                   
 2  2014                        16.7 6.5                  10.2                  
 3  2015                        17.4 6.6                  10.8                  
 4  2016                        18.8 6.8                  12                    
 5  2017                        19   7.3                  11.7                  
 6  2018                        19.3 6.9                  12.4                  
 7  2019                        15.9 5.7                  10.2                  
 8  2020                         5.1 3.7                  1.4                   
 9  2021                        12.9 4.6                  8.3                   
10  2022                        17.5 5.7                  11.8                  
# ℹ abbreviated name: ¹​`8 Weeks or Less During Academic Year`
# ℹ 12 more variables: `January Term` <dbl>, `Summer Term` <dbl>,
#   `Summer: More than eight weeks` <chr>, `Summer: Two to eight weeks` <chr>,
#   `Summer: Fewer than two weeks` <chr>, `One Quarter` <dbl>,
#   `Two Quarters` <dbl>, `One Semester` <dbl>, `Academic Year` <dbl>,
#   `Calendar Year` <dbl>, Other <dbl>, Total <dbl>
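
Since the same two steps are repeated for all three data sets, they could be wrapped in a small function. This is a minimal sketch in base R; `filter_years()` and its `first` and `last` arguments are hypothetical names introduced here for illustration, and the toy data frame reuses three values from the program duration table above.

```r
# Hypothetical helper: convert "2013/14" to 2013, then keep a range of start years.
filter_years <- function(df, first = 2013, last = 2022) {
  df$Year <- as.numeric(sub("/.*", "", df$Year))
  df[df$Year >= first & df$Year <= last, , drop = FALSE]
}

# Toy input with one year outside the range and two inside it.
toy <- data.frame(
  Year  = c("2011/12", "2013/14", "2022/23"),
  Share = c(14.4, 16.5, 17.5)
)

filtered <- filter_years(toy)
filtered$Year
```

Under that assumption, each data set could be filtered with one call, such as filter_years(field_of_study_cleaned).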

Because of the research questions I am focusing on, I am also going to create subset data tables for short-term, mid-length, and long-term programs. Having these tables set up in advance will help when analyzing the data below. Please note that the program duration totals are the percentage of U.S. study abroad students participating in that type of program. The total number of participants can be viewed in Total_Participation.

short_term_programs <- filtered_program_duration[, c("Year", "8 Weeks or Less During Academic Year", "January Term", "Summer Term")]

short_term_programs$Total_Short <- rowSums(short_term_programs[, c("8 Weeks or Less During Academic Year", "January Term", "Summer Term")], na.rm = TRUE)

print(short_term_programs)
# A tibble: 10 × 5
    Year 8 Weeks or Less During Acade…¹ `January Term` `Summer Term` Total_Short
   <dbl>                          <dbl>          <dbl>         <dbl>       <dbl>
 1  2013                           16.5            7.5          38.1        62.1
 2  2014                           16.7            7.4          39          63.1
 3  2015                           17.4            7.4          38          62.8
 4  2016                           18.8            7.1          38.5        64.4
 5  2017                           19              7            38.5        64.5
 6  2018                           19.3            6.9          38.6        64.8
 7  2019                           15.9           13.9           0.9        30.7
 8  2020                            5.1            0.8          57.7        63.6
 9  2021                           12.9            2.9          48.9        64.7
10  2022                           17.5            5.8          40.6        63.9
# ℹ abbreviated name: ¹​`8 Weeks or Less During Academic Year`
mid_term_programs <- filtered_program_duration[, c("Year", "One Quarter", "Two Quarters", "One Semester")]

mid_term_programs$Total_Mid <- rowSums(mid_term_programs[, c("One Quarter", "Two Quarters", "One Semester")], na.rm = TRUE)

print(mid_term_programs)
# A tibble: 10 × 5
    Year `One Quarter` `Two Quarters` `One Semester` Total_Mid
   <dbl>         <dbl>          <dbl>          <dbl>     <dbl>
 1  2013           2.4            0.6           31.9      34.9
 2  2014           2.2            0.3           31.8      34.3
 3  2015           2.3            0.3           31.9      34.5
 4  2016           2.2            0.2           30.7      33.1
 5  2017           2.4            0.2           30.3      32.9
 6  2018           1.8            0.3           30.7      32.8
 7  2019           2.8            0.3           62.5      65.6
 8  2020           1.5            0.3           26.3      28.1
 9  2021           2.1            0.4           30.3      32.8
10  2022           2.7            0.4           30.5      33.6
long_term_programs <- filtered_program_duration[, c("Year", "Academic Year", "Calendar Year")]

long_term_programs$Total_Long <- rowSums(long_term_programs[, c("Academic Year", "Calendar Year")], na.rm = TRUE)

print(long_term_programs)
# A tibble: 10 × 4
    Year `Academic Year` `Calendar Year` Total_Long
   <dbl>           <dbl>           <dbl>      <dbl>
 1  2013             2.9             0.1        3  
 2  2014             2.5             0.1        2.6
 3  2015             2.3             0.1        2.4
 4  2016             2.2             0.1        2.3
 5  2017             2.2             0.1        2.3
 6  2018             2.1             0.2        2.3
 7  2019             3.5             0.2        3.7
 8  2020             7.3             0.5        7.8
 9  2021             2.4             0.1        2.5
10  2022             2.3             0.1        2.4
short_term_programs$ProgramType <- "Short-Term"
mid_term_programs$ProgramType <- "Mid-Term"
long_term_programs$ProgramType <- "Long-Term"

short_term_programs <- short_term_programs[, c("Year", "Total_Short", "ProgramType")]
colnames(short_term_programs) <- c("Year", "Total", "ProgramType")

mid_term_programs <- mid_term_programs[, c("Year", "Total_Mid", "ProgramType")]
colnames(mid_term_programs) <- c("Year", "Total", "ProgramType")

long_term_programs <- long_term_programs[, c("Year", "Total_Long", "ProgramType")]
colnames(long_term_programs) <- c("Year", "Total", "ProgramType")

combined_programs <- rbind(short_term_programs, mid_term_programs, long_term_programs)

print(combined_programs)
# A tibble: 30 × 3
    Year Total ProgramType
   <dbl> <dbl> <chr>      
 1  2013  62.1 Short-Term 
 2  2014  63.1 Short-Term 
 3  2015  62.8 Short-Term 
 4  2016  64.4 Short-Term 
 5  2017  64.5 Short-Term 
 6  2018  64.8 Short-Term 
 7  2019  30.7 Short-Term 
 8  2020  63.6 Short-Term 
 9  2021  64.7 Short-Term 
10  2022  63.9 Short-Term 
# ℹ 20 more rows
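
As an alternative sketch, the three subset tables and the rbind() step can be expressed as one dplyr/tidyr pipeline. The `toy` tibble below copies the 2013 to 2015 rows printed above. Note two differences from the code above: adding columns with `+` does not skip NA values the way rowSums(na.rm = TRUE) does, and the rows come out ordered by year rather than by program type.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Rows for 2013-2015, copied from the printed tables above.
toy <- tibble(
  Year = c(2013, 2014, 2015),
  `8 Weeks or Less During Academic Year` = c(16.5, 16.7, 17.4),
  `January Term`  = c(7.5, 7.4, 7.4),
  `Summer Term`   = c(38.1, 39, 38),
  `One Quarter`   = c(2.4, 2.2, 2.3),
  `Two Quarters`  = c(0.6, 0.3, 0.3),
  `One Semester`  = c(31.9, 31.8, 31.9),
  `Academic Year` = c(2.9, 2.5, 2.3),
  `Calendar Year` = c(0.1, 0.1, 0.1)
)

combined <- toy %>%
  # Sum the component columns into one total per program type.
  transmute(
    Year,
    `Short-Term` = `8 Weeks or Less During Academic Year` + `January Term` + `Summer Term`,
    `Mid-Term`   = `One Quarter` + `Two Quarters` + `One Semester`,
    `Long-Term`  = `Academic Year` + `Calendar Year`
  ) %>%
  # Reshape to one row per year and program type.
  pivot_longer(-Year, names_to = "ProgramType", values_to = "Total")

print(combined)
```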

Analyze

Study Abroad Program Duration

The bar chart below shows the trends in duration of study abroad programs. Comparing short-, mid-, and long-term durations indicates that short-term study abroad programs are by far the most popular type of program in nearly every year. The exception is 2019 (the 2019/20 academic year), when mid-length programs far exceeded short- and long-term programs. This is likely because the COVID-19 pandemic began during that academic year and most short-term (summer) programs were cancelled.

combined_programs$Year <- as.factor(combined_programs$Year)

ggplot(combined_programs, aes(x = Year, y = Total, fill = ProgramType)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "Program Length Study Abroad Trends",
    x = "Year",
    y = "Percentage of Students",
    fill = "Program Duration"
  ) +
  theme_minimal() +
  scale_x_discrete(labels = levels(combined_programs$Year)) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Student Profiles

Viewing the academic profile of students shows which year of their academic journey students choose for study abroad. According to the data reported, students most often study abroad during their junior year, although seniors made up a larger share than juniors in 2020 and 2021.

library(reshape2)
long_data <- melt(filtered_student_profile, id.vars = "Year", 
                  variable.name = "Student_Level", value.name = "Percentage")

ggplot(long_data, aes(x = Year, y = Percentage, fill = Student_Level)) +
  geom_area(alpha = 0.7, position = "stack") +
  labs(title = "Participation by Year and Student Level",
       x = "Year", y = "Percentage of Students") +
  # Year is numeric, so use a continuous scale with one break per year
  scale_x_continuous(breaks = unique(long_data$Year)) +
  theme_minimal()

Field of Study

In the three graphs below, field of study is compared for a specific year. With so much data to consider for each field of study, filtering the data to a specific year in this step makes it easy to adjust the code chunk if there is a different year that we would like to view. For the sake of this project, I have replicated the code chunk so we can view the field of study trends for 2022, 2021, and 2018. Physical and Life Sciences is consistently the top field of study reported. Fine and Applied Arts, Health Professions, and Foreign Language and International Studies are also top reported fields.

specific_year_data <- filtered_field_of_study %>%
  filter(Year == "2022") %>%
  select(-Undeclared, -Total)

long_data <- melt(specific_year_data, id.vars = "Year", 
                  variable.name = "Field", value.name = "Percentage")

ggplot(long_data, aes(x = Field, y = Percentage, fill = Field)) +
  geom_bar(stat = "identity") +
  labs(title = "Fields of Study Distribution (2022)",
       x = "Field of Study", y = "Percentage of Students") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

specific_year_data <- filtered_field_of_study %>%
  filter(Year == "2021") %>%
  select(-Undeclared, -Total)

long_data <- melt(specific_year_data, id.vars = "Year", 
                  variable.name = "Field", value.name = "Percentage")

ggplot(long_data, aes(x = Field, y = Percentage, fill = Field)) +
  geom_bar(stat = "identity") +
  labs(title = "Fields of Study Distribution (2021)",
       x = "Field of Study", y = "Percentage of Students") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

specific_year_data <- filtered_field_of_study %>%
  filter(Year == "2018") %>%
  select(-Undeclared, -Total)

long_data <- melt(specific_year_data, id.vars = "Year", 
                  variable.name = "Field", value.name = "Percentage")

ggplot(long_data, aes(x = Field, y = Percentage, fill = Field)) +
  geom_bar(stat = "identity") +
  labs(title = "Fields of Study Distribution (2018)",
       x = "Field of Study", y = "Percentage of Students") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
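
Rather than copying the chunk for each year, the same steps could be wrapped in a function that takes the year as an argument. The sketch below is one possible version: `plot_fields_for_year()` and its `drop_cols` argument are hypothetical names, tidyr's pivot_longer() is swapped in for melt(), and the as.numeric() conversion assumes that the columns printed as <chr> earlier hold numbers stored as text. The toy input uses two values from the 2022 row shown above plus a placeholder Total.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(tibble)

# Hypothetical helper: bar chart of field-of-study shares for one chosen year.
plot_fields_for_year <- function(data, year, drop_cols = c("Undeclared", "Total")) {
  long <- data %>%
    filter(Year == year) %>%
    select(-any_of(drop_cols)) %>%
    # Convert every field column to numeric so they share one type.
    mutate(across(-Year, ~ suppressWarnings(as.numeric(.x)))) %>%
    pivot_longer(-Year, names_to = "Field", values_to = "Percentage")

  ggplot(long, aes(x = Field, y = Percentage, fill = Field)) +
    geom_col() +
    labs(title = paste0("Fields of Study Distribution (", year, ")"),
         x = "Field of Study", y = "Percentage of Students") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}

# Toy input: Engineering is character on purpose, to mimic a <chr> column.
toy <- tibble(
  Year = 2022,
  `Health Professions` = 6.1,
  Engineering = "5.4",
  Total = 100   # placeholder value, dropped before plotting
)

p <- plot_fields_for_year(toy, 2022)
```

With a function like this, the three charts would become plot_fields_for_year(filtered_field_of_study, 2022), and likewise for 2021 and 2018.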

The graph below shows total study abroad participation, which we can compare with the field of study trends above. For example, 2018 had the most students participating in study abroad programs, even more than were reported for the 2022/2023 school year. Because the field of study charts report percentages of all participants, a similar percentage in 2018 corresponds to a larger number of students in that field.

Total_Participation$Total <- as.numeric(gsub(",", "", Total_Participation$Total))

Total_Participation$Year <- as.numeric(as.character(Total_Participation$Year))

ggplot(Total_Participation, aes(x = Year, y = Total)) +
  geom_line(color = "blue", linewidth = 1) + # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  geom_point(color = "red", size = 2) +
  labs(title = "Total Study Abroad Participation Over Time",
       x = "Year", y = "Total Participation") +
  scale_y_continuous(labels = scales::comma) + # Format y-axis with commas
  scale_x_continuous(breaks = seq(min(Total_Participation$Year), max(Total_Participation$Year), by = 1)) + # Year intervals
  theme_minimal()

Communicate

Key Findings

Returning to my original research questions below, I think the following can be concluded to a certain extent.

  1. How have short-term, mid-length (semester), and long-term study abroad program durations shifted over time?

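    • Short-term, mid-length, and long-term study abroad programs have ebbed and flowed over time in relation to the total number of study abroad participants, most visibly in 2019 and 2020. Outside those two years, short-term programs accounted for roughly 62 to 65 percent of participants, mid-length programs for about 33 to 35 percent, and long-term programs for about 2 to 3 percent. Short-term programs are the most popular program duration for students, followed by mid-length programs. Long-term programs have the lowest participation.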
  2. Are there notable shifts in student fields of study or academic levels corresponding to increases or decreases in study abroad participation over the years?

    • Shifts in student fields of study have occurred. Several influences could contribute to which field of study reports the highest participation in study abroad programs. However, given the data sets included here, the top reported fields of study do appear to differ between the years before and after the COVID pandemic.

Suggested Actions

I would like to further compare, and potentially gather more information on, the specific program models that students are participating in, for example Faculty-Led programs vs. Direct Enroll or Exchange programs. Although the data provided by the Open Doors reports is very beneficial and provides great insight into the current state of the study abroad field, I think that more in-depth program information would be useful to include in the analysis.

I would also like to see data specific to student academics, such as “average GPA” or “average grades of study abroad courses”. I think that comparing students’ grades prior to study abroad with their study abroad course grades, and then with their grades once they return, could provide some beneficial insight into the impact of study abroad on student academics.

Limitations and Ethical Concerns

Limitations:

  • Data limitations were definitely a concern when compiling this data. Some columns and rows were incomplete or missing, depending on the data file that I was using (e.g., “Undeclared” or “Total”). Each file also had different column headers that needed to be adjusted, which increased the need for mutating and filtering the data. The data represent aggregated values (e.g., total percentages) rather than granular details like individual student records.

Ethical Concerns:

  • Privacy and confidentiality - Even aggregated data might indirectly reveal sensitive information about specific groups (e.g., small institutions or demographics). Although the data comes from a national database, the filtering and mutating of data can lead to misrepresentation or misuse. Steps to address this include ensuring transparency about data use and limitations when presenting findings, highlighting underrepresented groups, advocating for improved equity in data collection, and avoiding overgeneralization or unwarranted conclusions from aggregated data.

References

Institute of International Education. (2024). “Detailed Duration of U.S. Study Abroad, 2005/06 - 2022/23.” Open Doors Report on International Educational Exchange. Retrieved from https://opendoorsdata.org/

Institute of International Education. (2024). “U.S. Study Abroad for Academic Credit Trends, 1989/90 - 2022/23.” Open Doors Report on International Educational Exchange. Retrieved from http://www.opendoorsdata.org

Institute of International Education. (2024). “Fields of Study of U.S. Study Abroad Students, 2000/01 - 2022/23.” Open Doors Report on International Educational Exchange. Retrieved from http://www.opendoorsdata.org

Institute of International Education. (2024). “Profile of U.S. Study Abroad Students, 2000/01 - 2022/23.” Open Doors Report on International Educational Exchange. Retrieved from http://www.opendoorsdata.org