Introduction

Research Question: Does geographic location (coastal vs. inland) influence the average years of education among adults (30+) living in The Islands?

Does geographic location influence educational attainment? Specifically, do residents of coastal cities on The Islands complete more years of education, on average, than residents of inland cities? This question guides my analysis and stems from a broader interest in how geography shapes access to opportunity. The population parameter of interest is the difference in mean years of education between adults (aged 30 or older) living in coastal cities and those living in inland cities on The Islands. In other words, it is the average number of years of formal education completed by all coastal adults minus the same average for all inland adults, and the question is whether that difference is zero.

Background research shows that educational outcomes are strongly influenced by geographic location, a pattern evident across multiple countries and contexts. In the UK, one study found that “schools located in all of the regions along England’s south coast are less likely to have their students proceed to a higher education qualification than those in London and the country’s midlands and north” (Playford et al. 2023). This surprising coastal disadvantage motivated me to explore whether similar patterns exist elsewhere. In the U.S., Perna and Ruiz (2017) note that “counties with higher shares of adults ages 25 to 64 who have completed at least an associate degree are clustered along the coasts.” This reinforces the idea that educational attainment is geographically uneven and often favors metropolitan areas, but it points in the opposite direction from the UK study. Realizing that the relationship may differ from country to country, I extended the geographic scope further. Because better education is commonly linked with a higher standard of living, I also looked at research on living standards. Wild and Stadelmann (2022) found that in sub-Saharan Africa, “individuals living further away from the coast are significantly poorer, measured along an array of welfare indicators,” suggesting a possibly broader link between coastal proximity, living standards, and, by extension, access to education. Together, these studies suggest that location plays a major role in shaping educational opportunities and outcomes.

This prior research does not agree on whether coastal or inland areas should have the higher average years of education. Despite this, before conducting the study I assumed that coastal cities would have the greater mean years of education. This was based partly on the evidence that coastal areas have a higher standard of living, combined with my own intuition and personal experience, which agree with that evidence.

Data Collection Methods

The observational units in this study are individual residents of The Islands who are 30 years or older. The age cutoff of 30 was chosen somewhat arbitrarily, on the assumption that most residents would have finished their education by that point in their lives. The study has two variables. The explanatory variable, living location, is categorical with two categories: inland and coastal. The response variable, years of education completed, is quantitative.

To collect data, I first classified every city on The Islands as either coastal or inland based on its visible proximity to the shoreline. Then I used a random number generator in Google Sheets to select 20 cities from each category. Within each selected city, I chose a household using a random number generator. If the household did not contain a resident aged 30 or older, I selected another household within the same city. Once an eligible household was found, I randomly selected one adult aged 30 or older from it and recorded their years of education. To do this, I examined the life history in the “About” section of each person’s profile and identified when they started and completed each level of formal schooling. The total number of years between starting and graduating school was used as the individual’s completed years of education.
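The two-stage selection above (cities first, then households within a city) can be sketched in R as a rough equivalent of the Google Sheets draws. This is an illustration under stated assumptions, not the procedure actually run: the city names, the household age lists, and the helper names `draw_cities()` and `pick_eligible_adult()` are all hypothetical.

```r
# Sketch of the two-stage random selection, written in R instead of Google Sheets.
# Hypothetical: the city names and household ages are invented for illustration.
set.seed(2025)  # arbitrary seed so the draw can be reproduced

coastal_cities <- paste("Coastal city", 1:25)
inland_cities  <- paste("Inland city", 1:12)

# Stage 1: draw 20 cities per category, allowing repeats only when a category
# has fewer than 20 cities to choose from
draw_cities <- function(cities, n = 20) {
  sample(cities, size = n, replace = length(cities) < n)
}
sampled_coastal <- draw_cities(coastal_cities)
sampled_inland  <- draw_cities(inland_cities)

# Stage 2: redraw households until one contains a resident aged 30+, then pick one
# of that household's eligible adults at random. `households` is a list in which
# each element holds the ages of one household's residents (hypothetical format).
pick_eligible_adult <- function(households) {
  has_adult <- vapply(households, function(ages) any(ages >= 30), logical(1))
  if (!any(has_adult)) stop("no household in this city has a resident aged 30 or older")
  repeat {
    h <- sample.int(length(households), 1)
    eligible <- which(households[[h]] >= 30)
    if (length(eligible) > 0) {
      # Index with sample.int() so a single eligible adult is handled correctly
      person <- eligible[sample.int(length(eligible), 1)]
      return(list(household = h, person = person, age = households[[h]][person]))
    }
  }
}

example_city <- list(c(8, 12), c(34, 36, 5), c(71), c(19, 22))
selected <- pick_eligible_adult(example_city)
selected
```

Setting `replace = length(cities) < n` is one way to reflect the note in the analysis that some cities were sampled more than once because a category did not have enough cities; a category with at least 20 cities is drawn without repeats.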

Originally I had planned to use the chat feature built into The Islands. Unfortunately, no matter how I phrased the question of how many years of education someone had, I almost always got an incorrect or nonsensical answer. Realizing this method would not provide reliable data, I switched to reading the “About” tab, even though transcribing life histories by hand carried a higher chance of human error. This was not my only issue during data collection. In some cases, elderly individuals had completed higher education, such as university, but had no records of elementary or high school. In multiple cases, education was completed after the individual had turned 30, raising questions about whether they should be included. In these cases, I included them if they were at least 30 when the data were collected, regardless of when their education occurred. Additionally, there are no graduate schools in The Islands, which does not reflect real-world societies. Because these issues would affect both groups, they should not prevent generalizing the results, but they are worth noting. Because I used the “About” tab and did not need consent, I avoided issues such as death or withdrawal from the study, giving a 100% response rate. I did have to reselect about 10% of households to find an eligible adult aged 30 or older.

Descriptive Statistics

To explore the relationship between geographic region and educational attainment, I used two variables. The explanatory variable was living location, a categorical variable (inland or coastal). The response variable was years of education, a quantitative variable counting the number of years of education an individual had completed. I also used a side-by-side boxplot and numerical summaries to compare the distributions of the two groups.

In the sample, inland residents had a higher mean (14.7 years) and median (15.5 years) of education than coastal residents, who had a mean of 13.5 years and a median of 13 years. The standard deviation was slightly higher for the inland group (3.36 years) than for the coastal group (2.91 years), suggesting more variability in inland education levels. Education for inland residents ranged from 4 to 19 years, while coastal values ranged from 6 to 17 years.

Visually, the boxplot showed fairly similar groups, with inland residents clustering at slightly higher levels of education. The inland distribution appeared slightly right-skewed, with one low outlier. In contrast, the coastal group was slightly left-skewed, with values concentrated near the upper end but some lower observations. While the center is higher for inland residents, the overlap between the distributions suggests that the difference is not large.

Overall, the descriptive statistics do not suggest a strong association between geographic region and education level. Although inland residents appear to complete more years of schooling on average, the difference in means is small (1.2 years) and most of the spread of the two groups overlaps. A statistical test is needed to decide whether there is any association.

# Load packages: mosaic provides favstats() and also attaches lattice, which provides histogram()
library(mosaic)
# Load and prepare the data
data <- read.csv("~/The Islands Data Sheet - Organized Data.csv", header = TRUE)
colnames(data) <- c("Region", "EducationYears")
# Numerical summaries (min, quartiles, max, mean, sd, n) for each region
favstats(EducationYears ~ Region, data = data)
# Side-by-side boxplot
boxplot(EducationYears ~ Region, data = data,
        main = "Years of Education by Region",
        ylab = "Years of Education",
        col = c("skyblue", "lightgreen"))

# Histogram of distributions by region, one panel per region stacked vertically
histogram(~ EducationYears | Region, data = data,
          layout = c(1, 2),
          main = "Distribution of Education Years by Region",
          xlab = "Years of Education")

Analysis of Results

The population parameter of interest in this study is the difference in the mean years of education between adults living in coastal cities and those living in inland cities on The Islands. To evaluate whether the observed difference is statistically significant, I conducted a two-sample t-test.

The null hypothesis (\(H_{0}\)) states that there is no difference in the mean years of education between the two regions (\(\mu_{coastal}=\mu_{inland}\)). The alternative hypothesis (\(H_a\)) is that there is a difference (\(\mu_{coastal}\ne\mu_{inland}\)). A Type I error in this context would mean concluding that a difference exists when there is none, while a Type II error would mean failing to detect a true difference between the regions.

To support representativeness, I used random selection across a range of cities and households, focusing on adults aged 30 and older in both coastal and inland regions. While there were some issues in data collection, they do not appear to affect representativeness, so the sample should be reasonably representative of the adult population of The Islands.

The validity conditions for a two-sample t-test appear to be satisfied. Observations were randomly sampled and are independent across groups. Although some cities were sampled more than once because there were not enough cities in each category, each individual came from a separate household. The sample size (n = 20 per group) is borderline but should be sufficient under the Central Limit Theorem. Visual inspection showed mild skewness and slightly unequal variability, but not enough to violate the test’s assumptions. I used Welch’s t-test, which does not assume equal variances.
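For reference, Welch’s statistic and its approximate degrees of freedom can be worked by hand from the summary statistics in the Descriptive Statistics section. Because the standard deviations there are rounded to two decimals, these hand calculations match the R output only approximately.

\[
t = \frac{\bar{x}_{coastal} - \bar{x}_{inland}}{\sqrt{\dfrac{s_{coastal}^{2}}{n_{coastal}} + \dfrac{s_{inland}^{2}}{n_{inland}}}}
  = \frac{13.5 - 14.7}{\sqrt{\dfrac{2.91^{2}}{20} + \dfrac{3.36^{2}}{20}}}
  \approx \frac{-1.2}{0.994} \approx -1.21
\]

\[
df \approx \frac{\left(\dfrac{s_{coastal}^{2}}{n_{coastal}} + \dfrac{s_{inland}^{2}}{n_{inland}}\right)^{2}}{\dfrac{\left(s_{coastal}^{2}/n_{coastal}\right)^{2}}{n_{coastal}-1} + \dfrac{\left(s_{inland}^{2}/n_{inland}\right)^{2}}{n_{inland}-1}}
  \approx \frac{(0.423 + 0.564)^{2}}{\dfrac{0.423^{2}}{19} + \dfrac{0.564^{2}}{19}} \approx 37.2
\]

Unlike the pooled t-test, neither formula combines the two sample variances into one, which is why Welch’s version tolerates the slightly larger spread in the inland group.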

Using the t.test() function in R, I calculated a t-statistic of -1.21 and a p-value of 0.2348. This means that, if the null hypothesis were true, there would be a 23.48% chance of observing a difference in sample means at least as extreme as the one found. Because this p-value is greater than the significance level of 0.05, I fail to reject the null hypothesis.
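As a check on that interpretation, the two-sided p-value can be recomputed from the t statistic and Welch degrees of freedom that t.test() reports in this section. This is a minimal sketch using base R’s `pt()`; the input values are copied from the R output rather than recomputed from the raw data.

```r
# Values copied from the Welch t.test() output in this section
t_stat   <- -1.2077
df_welch <- 37.251

# Two-sided p-value: probability of a t value at least this far from 0
# in either direction, assuming H0 (equal means) is true
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
p_value

# TRUE here means we fail to reject H0 at the 5% significance level
p_value > 0.05
```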

The 95% confidence interval for the difference in means (coastal - inland) runs from -3.21 to 0.81 years. Because this interval includes zero, it agrees with the conclusion that the observed difference is not statistically significant. From this study, we cannot confidently say that geography (coastal or inland) is associated with average years of education.
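The interval can also be rebuilt from the group summaries alone, which shows where each piece of the t.test() output comes from. This sketch assumes the rounded means and standard deviations from the Descriptive Statistics section, so its results should land close to, but not exactly on, the printed R values.

```r
# Group summaries from the Descriptive Statistics section (SDs rounded to 2 decimals)
n_coastal <- 20; mean_coastal <- 13.5; sd_coastal <- 2.91
n_inland  <- 20; mean_inland  <- 14.7; sd_inland  <- 3.36

# Squared standard error contributed by each group
v_coastal <- sd_coastal^2 / n_coastal
v_inland  <- sd_inland^2 / n_inland
se_diff   <- sqrt(v_coastal + v_inland)

# Welch-Satterthwaite approximate degrees of freedom
df_welch <- (v_coastal + v_inland)^2 /
  (v_coastal^2 / (n_coastal - 1) + v_inland^2 / (n_inland - 1))

# Point estimate, t statistic, and 95% interval for mu_coastal - mu_inland
diff_means <- mean_coastal - mean_inland
t_stat     <- diff_means / se_diff
margin     <- qt(0.975, df = df_welch) * se_diff
ci_95      <- c(lower = diff_means - margin, upper = diff_means + margin)

round(c(t = t_stat, df = df_welch, ci_95), 3)
```

The interval is simply the observed difference of -1.2 years plus or minus about two standard errors; since the margin is larger than 1.2, the interval crosses zero.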

# Two-sample t-test (Welch’s t-test, unequal variances)
t.test(EducationYears ~ Region, data = data, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  EducationYears by Region
## t = -1.2077, df = 37.251, p-value = 0.2348
## alternative hypothesis: true difference in means between group Coastal and group Inland is not equal to 0
## 95 percent confidence interval:
##  -3.2128968  0.8128968
## sample estimates:
## mean in group Coastal  mean in group Inland 
##                  13.5                  14.7

Conclusion

This study aimed to determine whether geographic location, specifically coastal versus inland residence, affects the average years of education completed by adults in The Islands. While the descriptive statistics suggested that inland residents might have slightly more education, the statistical analysis showed that this difference was not statistically significant. The t-test produced a small t-statistic (-1.21), a p-value of 0.2348, and a 95% confidence interval for the difference in means that included zero. Together, these results led me to fail to reject the null hypothesis.

The results did not match my initial expectation that coastal residents would have more education because of presumed better access to resources and infrastructure. Instead, the sample means aligned more closely with the literature suggesting that coastal areas can face educational disadvantages, although no clear conclusion can be drawn from this study. It is possible that the observed difference reflects sampling variability that would shrink with a larger sample, and that living location is simply not a variable coded into The Islands that affects years of education.

The sample was reasonably representative of the adult population in The Islands, and the validity conditions for the test were satisfied. However, limitations included possible non-independence from repeated city sampling and a relatively small sample size. In future studies, a larger sample could improve statistical power and reduce potential clustering effects. Additionally, a different age cutoff and excluding elderly residents whose records show abnormalities, such as missing elementary or high school records, would limit the problems I experienced.
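To give a rough sense of how much a larger sample might help, base R’s `power.t.test()` can estimate power for a planned study. This is a planning sketch under loud assumptions: it treats the observed 1.2-year gap as if it were the true difference, and it uses a single common standard deviation because `power.t.test()` assumes equal variances, unlike the Welch test used in this study.

```r
# Assumption: a common SD equal to the root-mean-square of the two sample SDs
sd_common <- sqrt((2.91^2 + 3.36^2) / 2)

# Approximate power of a two-sided, two-sample test at the 5% level with
# 20 adults per group, if the true difference in means were 1.2 years
power_n20 <- power.t.test(n = 20, delta = 1.2, sd = sd_common,
                          sig.level = 0.05)$power
power_n20

# Adults per group needed for 80% power under the same assumptions
n_for_80 <- power.t.test(delta = 1.2, sd = sd_common,
                         sig.level = 0.05, power = 0.80)$n
ceiling(n_for_80)
```

If the assumptions held, a study of this size would usually miss a real 1.2-year difference, and roughly five times as many adults per group would be needed to detect it reliably.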

Further research could investigate more nuanced geographic, economic, or institutional factors, or explore how proximity to educational institutions or employment opportunities influences education outcomes, although this is likely not possible within The Islands because of its limitations. This study serves as a starting point for understanding the geographic dimensions of education. It found no statistically significant relationship between living location and education within The Islands, which only adds to the complexity of this topic.

Bibliography

Christopher James Playford, Anna Mountford-Zimdars, Simon Benham-Clarke, Coast and City, It Matters Where You Live: How Geography Shapes Progression to Higher Education in England, Social Sciences, Volume 12, Issue 11, 2023, Article 610, ISSN 2076-0760, https://doi.org/10.3390/socsci12110610

Laura W. Perna, Roman Ruiz, Geography and College Attainment: A Place-Based Approach, Higher Education Today, June 19, 2017, https://www.higheredtoday.org/2017/06/19/geography-college-attainment-place-based-approach/

Frederik Wild, David Stadelmann, Coastal Proximity and Individual Living Standards: Econometric Evidence from Geo-Referenced Household Surveys in Sub-Saharan Africa, Review of Development Economics, Volume 26, Issue 4, 2022, Pages 1883–1901, ISSN 1363-6669, https://doi.org/10.1111/rode.12901

Bryce Mason. Geographic Location and Educational Attainment in The Islands. RPubs. https://rpubs.com/masonb/1311491

Letter of Learning

One of the most meaningful aspects of this project was being able to design and complete a study based on my own research question. Especially at first, this made the project more engaging and interesting. It also created more learning opportunities, since any issues or mistakes were likely unique to me. I also found value in getting a result that was not statistically significant. After a semester of mostly rejecting the null hypothesis, failing to reject it reminded me that not all research leads to clear conclusions, and that such results can still be important. It gave me insight into the limitations of data, models, and assumptions. Another interesting part of this project was watching the other presentations. It was great to see the diverse range of topics, and it was surprising how many of them found results that were not statistically significant.

The most challenging part of the project was data collection. Not only was it tedious and time-consuming, but interpreting and organizing the life history data from The Islands took more mental energy than I expected. It felt repetitive and slow, and it was a little frustrating that I did not fully use what The Islands offers. By collecting data through the “About” tab, I think I missed out on the more impressive part of the simulated world, where users can interact with residents after gaining consent. The presentation was another hurdle. I have never been fully comfortable with public speaking, and presenting in front of others added pressure that made it harder to clearly communicate what I had prepared.

As for strategies, my time management had mixed success. Once I set small, specific goals, I made steady progress. Earlier in the semester, however, especially over break when I wanted to get ahead, I had more time but did not stick to clear goals, which led to procrastination. What worked best for me was outlining the structure early on and working in short, focused sessions, which made the overall project feel more manageable. Overall, this project was a helpful experience in independence, persistence, and analysis, even if the data did not support my original expectations.