Scenario

We analyze data from the Gapminder data set, which records economic indicators for countries across many years. We focus entirely on 2007, the most recent year available in the data set, and on the variable GDP per capita (gdpPercap). Since GDP measures economic activity, a higher GDP per capita typically indicates that a country is better off economically.
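The code in this report uses a data frame named gap_2007 without showing how it was built. A minimal sketch of one way to create it is given here, assuming the data come from the gapminder R package (an assumption; the report does not say how the data were loaded):

```r
# Assumption: the data come from the gapminder package on CRAN.
library(gapminder)

# Keep only the rows for 2007, which gives one row per country.
gap_2007 <- subset(gapminder, year == 2007)

# Quick checks on what we kept.
dim(gap_2007)
unique(gap_2007$year)
```

Base R's subset() is used here to keep the sketch free of extra dependencies; dplyr::filter() would work equally well.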

Our goal is to examine the data in general and to analyze the GDP per capita variable (gdpPercap) in more detail.

A histogram of the data

We start by looking at a histogram of the GDP data.

ggplot(gap_2007, aes(x = gdpPercap)) +
  geom_histogram(bins = 30, alpha = 0.8, fill = "lightblue", color = "black") +
  theme_minimal()

As we can see, the distribution is unimodal and skewed to the right: most values sit close to the minimum, while a handful of values form a long tail at the high end. Since our observational units are countries in 2007, this right skew indicates that most countries have a low GDP per capita, while only a handful have a high GDP per capita. We anticipate that the median will therefore be lower than the mean, and we confirm this in the next section.

Calculating statistics for our GDP data

We now calculate some statistics for our GDP data.

# Measures of Center
mean(gap_2007$gdpPercap)

## [1] 11680.07

median(gap_2007$gdpPercap)

## [1] 6124.371

Starting with our measures of center, the mean of our data is 11680.07 and the median is 6124.371. This confirms our earlier expectation that the mean is higher than the median, because the right skew pulls the mean toward the long upper tail.
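To see why a right tail pulls the mean above the median, here is a small illustration. The values are hypothetical toy numbers chosen for this sketch, not Gapminder data:

```r
# Hypothetical toy values (not from the Gapminder data): four small
# values and one very large one, giving a right-skewed sample.
toy <- c(1, 2, 3, 4, 100)

toy_mean   <- mean(toy)    # pulled upward by the single large value
toy_median <- median(toy)  # the middle value, unaffected by the tail

toy_mean > toy_median
```

The single large value raises the mean to 22 while the median stays at 3, the same pattern that the GDP data show on a larger scale.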

# Measures of Spread
sd(gap_2007$gdpPercap)

## [1] 12859.94

range(gap_2007$gdpPercap)

## [1]   277.5519 49357.1902

IQR(gap_2007$gdpPercap)

## [1] 16383.99

We see that our standard deviation is 12859.94. This large number (it is slightly larger than the mean of 11680.07) suggests that our data are widely spread out. Our data range from a minimum of 277.5519 to a maximum of 49357.1902, and the interquartile range is 16383.99.
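One way to make "widely spread out" concrete is to compare the standard deviation to the mean and to compute the width of the range. The sketch below reuses the values printed in this report; the variable names are introduced here for illustration only:

```r
# Values copied from the output printed in this report.
gdp_mean  <- 11680.07
gdp_sd    <- 12859.94
gdp_range <- c(277.5519, 49357.1902)

# Ratio of standard deviation to mean; a value above 1 means the
# standard deviation exceeds the mean.
gdp_sd / gdp_mean

# Width of the range (maximum minus minimum).
diff(gdp_range)
```

The ratio comes out at roughly 1.10, and the range spans about 49080 dollars of GDP per capita.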

5 Number Summary and Box plot

We conclude the GDP analysis with a 5-number summary and a box plot.

summary(gap_2007$gdpPercap)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   277.6  1624.8  6124.4 11680.1 18008.8 49357.2

ggplot(gap_2007, aes(x = gdpPercap)) +
  geom_boxplot(fill="lightblue") +
  theme_minimal()
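By default, geom_boxplot() draws whiskers out to the most extreme points within 1.5 times the IQR of the box edges and plots anything beyond as individual points. As a rough check, the fences can be computed from the rounded summary values above. This is a sketch: ggplot2 computes the box edges as hinges, which may differ slightly from the quartiles that summary() prints.

```r
# Quartiles and extremes as printed by summary() (rounded), and the
# IQR as printed by IQR().
q1      <- 1624.8
q3      <- 18008.8
iqr     <- 16383.99
gdp_min <- 277.6
gdp_max <- 49357.2

# Fences for the 1.5 * IQR rule.
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
c(lower = lower_fence, upper = upper_fence)

gdp_min < lower_fence   # FALSE: nothing falls below the lower fence
gdp_max > upper_fence   # TRUE: at least one country lies above the upper fence
```

The lower fence is negative, so no country can fall below it, while the maximum exceeds the upper fence of about 42585. Any points the box plot flags individually therefore sit at the high end, which matches the right skew seen in the histogram.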

NOW: DO A SIMILAR ANALYSIS FOR Life Expectancy

ggplot(gap_2007, aes(x = lifeExp)) +
  geom_histogram(bins = 30, alpha = 0.8, fill = "lightblue", color = "black") +
  theme_minimal()

# Measures of Center
mean(gap_2007$lifeExp)

## [1] 67.00742

median(gap_2007$lifeExp)

## [1] 71.9355

# Measures of Spread
sd(gap_2007$lifeExp)

## [1] 12.07302

range(gap_2007$lifeExp)

## [1] 39.613 82.603

IQR(gap_2007$lifeExp)

## [1] 19.253

summary(gap_2007$lifeExp)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   39.61   57.16   71.94   67.01   76.41   82.60

ggplot(gap_2007, aes(x = lifeExp)) +
  geom_boxplot(fill="lightblue") +
  theme_minimal()

The histogram above is bimodal and skewed to the left: there are two peaks toward the upper end of the distribution, where most values are concentrated, and a tail extending toward lower life expectancies. Consistent with this left skew, the mean (67.00742) is lower than the median (71.9355).
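The direction of skew for both variables can also be summarized numerically. The sketch below uses Pearson's second skewness coefficient, 3 * (mean - median) / sd, which is not part of the original analysis and is added here only as a rough check. The helper name pearson_skew is hypothetical, and the inputs are the values printed in this report:

```r
# Pearson's second skewness coefficient: 3 * (mean - median) / sd.
# A positive value suggests right skew, a negative value left skew.
pearson_skew <- function(mean_x, median_x, sd_x) {
  3 * (mean_x - median_x) / sd_x
}

# Inputs copied from the output printed in this report.
gdp_skew  <- pearson_skew(11680.07, 6124.371, 12859.94)
life_skew <- pearson_skew(67.00742, 71.9355, 12.07302)

gdp_skew    # positive: mean above median
life_skew   # negative: mean below median
```

The opposite signs (roughly 1.30 for GDP per capita and -1.22 for life expectancy) agree with the right skew and left skew read off the two histograms.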

In Conclusion

In this report, we analyzed data collected from countries all around the world in 2007. Our initial work focused on GDP per capita, while our later work focused on life expectancy. For each variable, we calculated measures of center and measures of spread, and we looked at two kinds of graphs (histograms and box plots).

Unit 2 and 3 - Studying State Economic Data

Max Hogue

February 5, 2026
