Assignment 1: The Quality of Life Assessment Across the Globe

Overview

When assessing how people are doing worldwide, we usually rely on a common metric: Gross Domestic Product (GDP) per person. However, a growing number of researchers argue that this number does not tell the whole story and has real limitations. Other important factors, such as health, social support, political stability and freedom, have received much less attention.

(File:Average Annual HDI Growth From 2010 to 2021 Published in 2022.png - Wikipedia, 2022)

In this data analysis, we explore quality of life through measures beyond income: happiness, freedom, economic output (GDP), family and health. We want to answer several questions: which countries have the happiest people, whether stronger family and social support goes with greater happiness, whether trust is connected to freedom, and how happiness relates to GDP. Our goal is to produce clear, easy-to-understand insights that go beyond money and paint a fuller picture of life around the world.

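The dataset consists of 158 observations and 12 variables, with no missing values. The summary statistics show that 2 variables (Country and Region) are of type character and the remaining 10 are numeric. Happiness Score, Economy (GDP per Capita), Health (Life Expectancy) and Freedom show no obvious outliers. However, applying the 1.5 × IQR rule to the quartiles below flags at least one extreme value in Standard Error, Family, Trust (Government Corruption), Generosity and Dystopia Residual. The maximum values of Happiness Rank, Happiness Score, Economy (GDP per Capita), Freedom and Health (Life Expectancy) are 158, 7.587, 1.6904, 0.6697 and 1.0252 respectively.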

library(readr)  # provides read_csv()
data_2015 <- read_csv("2015.csv", show_col_types = FALSE)  # read the dataset
head(data_2015)  # view the first rows of the data
dim(data_2015)   # dimensions of the data (rows, columns)
## [1] 158  12
names(data_2015)  # variable names
##  [1] "Country"                       "Region"                       
##  [3] "Happiness Rank"                "Happiness Score"              
##  [5] "Standard Error"                "Economy (GDP per Capita)"     
##  [7] "Family"                        "Health (Life Expectancy)"     
##  [9] "Freedom"                       "Trust (Government Corruption)"
## [11] "Generosity"                    "Dystopia Residual"
sum(is.na(data_2015)) ### no missing values
## [1] 0
summary(data_2015)  # summary statistics
##    Country             Region          Happiness Rank   Happiness Score
##  Length:158         Length:158         Min.   :  1.00   Min.   :2.839  
##  Class :character   Class :character   1st Qu.: 40.25   1st Qu.:4.526  
##  Mode  :character   Mode  :character   Median : 79.50   Median :5.232  
##                                        Mean   : 79.49   Mean   :5.376  
##                                        3rd Qu.:118.75   3rd Qu.:6.244  
##                                        Max.   :158.00   Max.   :7.587  
##  Standard Error    Economy (GDP per Capita)     Family      
##  Min.   :0.01848   Min.   :0.0000           Min.   :0.0000  
##  1st Qu.:0.03727   1st Qu.:0.5458           1st Qu.:0.8568  
##  Median :0.04394   Median :0.9102           Median :1.0295  
##  Mean   :0.04788   Mean   :0.8461           Mean   :0.9910  
##  3rd Qu.:0.05230   3rd Qu.:1.1584           3rd Qu.:1.2144  
##  Max.   :0.13693   Max.   :1.6904           Max.   :1.4022  
##  Health (Life Expectancy)    Freedom       Trust (Government Corruption)
##  Min.   :0.0000           Min.   :0.0000   Min.   :0.00000              
##  1st Qu.:0.4392           1st Qu.:0.3283   1st Qu.:0.06168              
##  Median :0.6967           Median :0.4355   Median :0.10722              
##  Mean   :0.6303           Mean   :0.4286   Mean   :0.14342              
##  3rd Qu.:0.8110           3rd Qu.:0.5491   3rd Qu.:0.18025              
##  Max.   :1.0252           Max.   :0.6697   Max.   :0.55191              
##    Generosity     Dystopia Residual
##  Min.   :0.0000   Min.   :0.3286   
##  1st Qu.:0.1506   1st Qu.:1.7594   
##  Median :0.2161   Median :2.0954   
##  Mean   :0.2373   Mean   :2.0990   
##  3rd Qu.:0.3099   3rd Qu.:2.4624   
##  Max.   :0.7959   Max.   :3.6021
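The summary above shows only quartiles and extremes. The sketch below is a minimal base-R version of the 1.5 × IQR rule, applied to every numeric column at once. The helper name `count_iqr_outliers()` and the toy data frame are hypothetical and used only for illustration; they are not the 2015 data.

```r
# Count values outside the 1.5 * IQR fences in each numeric column.
# Works on any data frame or tibble, e.g. data_2015.
count_iqr_outliers <- function(df) {
  num_cols <- df[vapply(df, is.numeric, logical(1))]
  vapply(num_cols, function(x) {
    q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)  # Q1 and Q3
    fence <- 1.5 * (q[[2]] - q[[1]])               # 1.5 * IQR
    sum(x < q[[1]] - fence | x > q[[2]] + fence, na.rm = TRUE)
  }, integer(1))
}

# Hypothetical toy data, not the 2015 dataset.
toy <- data.frame(
  Country = c("A", "B", "C", "D", "E", "F"),
  Score   = c(5.1, 5.3, 5.2, 5.4, 5.0, 9.9),
  Freedom = c(0.40, 0.42, 0.45, 0.41, 0.43, 0.44)
)
count_iqr_outliers(toy)  # Score = 1, Freedom = 0
```

Calling `count_iqr_outliers(data_2015)` would return the same kind of per-column counts for the real data.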

Key Findings

1:

Among the 10 countries with the highest Happiness Score, most are in Western Europe (7 of the 10).
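The top-10 tally can be sketched in base R as follows, assuming the column names shown by `names(data_2015)`. The helper `top_n_regions()` is hypothetical, and the toy data are illustrative only.

```r
# Tally the regions of the n countries with the highest score.
top_n_regions <- function(df, score_col, region_col, n = 10) {
  ranked <- df[order(df[[score_col]], decreasing = TRUE), ]  # highest first
  table(head(ranked, n)[[region_col]])
}

# Hypothetical toy data for illustration.
toy <- data.frame(
  Country = c("A", "B", "C", "D", "E"),
  Region  = c("West", "West", "East", "West", "South"),
  Score   = c(7.5, 7.4, 7.3, 4.0, 3.9)
)
top_n_regions(toy, "Score", "Region", n = 3)  # East 1, West 2
```

With the real data, the call would be `top_n_regions(data_2015, "Happiness Score", "Region")`.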

2:

Another observation is that countries with low freedom scores (below 0.20) are not confined to a specific region but span several parts of the globe. Sub-Saharan Africa contributes the most such countries (4), followed by Central and Eastern Europe (3) and the Middle East and Northern Africa (3). Southern Asia and Western Europe contribute one each, for 12 countries in total.
table(no_freedom_df$region) # tally of the regions in freedom<0.20
## 
##      Central and Eastern Europe Middle East and Northern Africa 
##                               3                               3 
##                   Southern Asia              Sub-Saharan Africa 
##                               1                               4 
##                  Western Europe 
##                               1
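Raw counts favour regions that contain many countries. A complementary check is the share of each region's countries that fall below the cut-off. The sketch below assumes the `Region` and `Freedom` column names of `data_2015` and the 0.20 threshold used above; `low_freedom_share()` and the toy data are hypothetical.

```r
# Share of each region's countries whose Freedom score is below a cut-off.
low_freedom_share <- function(df, threshold = 0.20) {
  # mean() of a logical vector gives the proportion of TRUE values
  shares <- tapply(df$Freedom < threshold, df$Region, mean)
  sort(shares, decreasing = TRUE)
}

# Hypothetical toy data for illustration.
toy <- data.frame(
  Region  = c("North", "North", "North", "North", "South", "South"),
  Freedom = c(0.10, 0.50, 0.55, 0.60, 0.15, 0.18)
)
low_freedom_share(toy)  # South 1.00, North 0.25
```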

3:

Looking at the data, we noticed that Western European countries, particularly Norway, Switzerland, Sweden, Denmark and Finland, rank highly on both happiness and freedom. This suggests that people in these countries generally enjoy a good quality of life along with substantial personal freedom.

Interestingly, the United Arab Emirates and Qatar, from the Middle East, also appear in the top 10 for freedom. This may surprise some readers. Note that the Freedom score reflects residents' perceived freedom to make life choices rather than political rights. These countries' efforts to improve social and economic freedoms may also play a role, although this dataset alone cannot confirm it.
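The overlap between the two top-10 lists can be computed directly. The sketch assumes a `Country` column, as in `data_2015`; `top_in_both()` and the toy data are hypothetical.

```r
# Countries that appear in the top n for two different scores.
top_in_both <- function(df, col_a, col_b, n = 10) {
  top_a <- head(df$Country[order(df[[col_a]], decreasing = TRUE)], n)
  top_b <- head(df$Country[order(df[[col_b]], decreasing = TRUE)], n)
  intersect(top_a, top_b)
}

# Hypothetical toy data for illustration.
toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  Happy   = c(7.5, 7.0, 4.0, 3.0),
  Free    = c(0.60, 0.20, 0.65, 0.10)
)
top_in_both(toy, "Happy", "Free", n = 2)  # "A"
```

With the real data, the call would be `top_in_both(data_2015, "Happiness Score", "Freedom")`.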

4:

When the countries are grouped by region, the patterns differ: not every region combines high happiness with high freedom. As a result, a single straight (regression) line on a graph does not fit all the points equally well. Overall, however, there is a positive correlation between happiness and freedom.
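An overall positive correlation can coexist with different patterns inside each region. One way to check this is to compute the correlation both overall and within each region. The helper `cor_by_group()` is hypothetical and the toy data are illustrative; `cor()` uses Pearson's coefficient by default. A region with only two countries always yields a trivial value of 1 or -1.

```r
# Pearson correlation overall and within each group.
cor_by_group <- function(df, x, y, group) {
  overall <- cor(df[[x]], df[[y]])
  within  <- sapply(split(df, df[[group]]),
                    function(d) cor(d[[x]], d[[y]]))
  list(overall = overall, within = within)
}

# Hypothetical toy data: positive overall, but negative inside R2.
toy <- data.frame(
  Region  = rep(c("R1", "R2"), each = 3),
  Happy   = c(1, 2, 3, 4, 5, 6),
  Freedom = c(1, 2, 3, 6, 5, 4)
)
res <- cor_by_group(toy, "Happy", "Freedom", "Region")
res$overall  # about 0.77
res$within   # R1 = 1, R2 = -1
```

With the real data, the call would be `cor_by_group(data_2015, "Happiness Score", "Freedom", "Region")`.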

5:

Comparing the mean happiness and mean GDP of the 10 regions, happiness scores rise as GDP increases: regions whose countries have a higher average GDP also tend to report higher average happiness. The correlation between the regional means is 0.8989795 (about 0.90), which indicates a strong positive association. It is computed on only 10 regional averages, so it describes regions rather than individual people. It also does not imply that one variable causes the other. Other factors, such as social relationships, health and cultural values, may also influence happiness.
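The regional comparison can be sketched in base R: average each variable within a region, then correlate the regional means. The helper `regional_mean_cor()` and the toy data are hypothetical. With the real data, the columns would be `"Economy (GDP per Capita)"` and `"Happiness Score"`.

```r
# Average two variables within each group, then correlate the group means.
regional_mean_cor <- function(df, x, y, group) {
  means <- aggregate(df[c(x, y)], by = list(group = df[[group]]), FUN = mean)
  list(means = means, r = cor(means[[x]], means[[y]]))
}

# Hypothetical toy data: three regions with two countries each.
toy <- data.frame(
  Region = rep(c("A", "B", "C"), each = 2),
  GDP    = c(1.0, 1.2, 0.5, 0.7, 0.2, 0.2),
  Happy  = c(7.0, 7.2, 5.0, 5.2, 4.0, 4.0)
)
res <- regional_mean_cor(toy, "GDP", "Happy", "Region")
res$means  # one row per region
res$r      # correlation of the 3 regional means
```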

6:

Interestingly, the average GDP of the 10 happiest Sub-Saharan African countries is lower than the average GDP of the 10 least happy Western European countries. This may point to the importance of cultural differences in how happiness is perceived. Different regions and cultures may have different priorities and definitions of what constitutes a happy and fulfilling life.
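This comparison can be sketched with one helper that sorts a region's countries by happiness and averages the GDP of the first n. The sketch assumes a `Region` column, as in `data_2015`; `mean_gdp_of_extremes()` and the toy data are hypothetical.

```r
# Mean GDP of the n happiest (or n least happy) countries in one region.
mean_gdp_of_extremes <- function(df, region, score_col, gdp_col,
                                 n = 10, happiest = TRUE) {
  sub <- df[df$Region == region, ]
  # decreasing = TRUE lists the happiest first; FALSE lists the least happy first
  sub <- sub[order(sub[[score_col]], decreasing = happiest), ]
  mean(head(sub[[gdp_col]], n))
}

# Hypothetical toy data with two regions of three countries each.
toy <- data.frame(
  Region = rep(c("SSA", "WE"), each = 3),
  Happy  = c(5.0, 4.0, 3.0, 7.5, 6.5, 6.0),
  GDP    = c(0.3, 0.2, 0.1, 1.4, 1.3, 1.2)
)
mean_gdp_of_extremes(toy, "SSA", "Happy", "GDP", n = 2)                   # 0.25
mean_gdp_of_extremes(toy, "WE", "Happy", "GDP", n = 2, happiest = FALSE)  # 1.25
```

With the real data, the region names would be "Sub-Saharan Africa" and "Western Europe", and the columns "Happiness Score" and "Economy (GDP per Capita)".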


Conclusion

In summary, our study assessed quality of life across the globe. We considered features such as happiness, freedom, GDP, trust and family across regions. We found that Western European countries tend to dominate, with high happiness and freedom scores. We also saw that regions with higher GDP tend to report higher happiness. Although happiness, freedom and GDP tend to be correlated, other variables may contribute to these correlations.

Assignment 2: Major League Baseball 1986

Overview

Major League Baseball (MLB) is one of the world’s most iconic and popular sports leagues. The 1986 MLB season holds a special place in baseball history, full of unforgettable moments, intense competition and standout individual performances (Simon, 2011). In this project, we take a deep dive into batting statistics from the 1986 season, exploring a wide range of data that captures the essence of an incredible year in baseball.

(Major League Baseball, 2001)

The dataset provides batting statistics from the 1986 Major League Baseball season for various players. Here’s a summary of the key information:

Player Details:
    Last: Last name of the baseball player.
    First: First name of the baseball player.
    Age: Age of the player.

Performance Metrics:
    G (Games): Number of games played.
    PA (Plate Appearances): Total plate appearances.
    AB (At Bats): Total at-bats.
    R (Runs): Number of runs scored.
    H (Hits): Total number of hits.
    2B (Doubles): Number of doubles.
    3B (Triples): Number of triples.
    HR (Home Runs): Number of home runs.
    RBI (Runs Batted In): Runs batted in.
    SB (Stolen Bases): Number of stolen bases.
    CS (Caught Stealing): Number of times caught stealing.
    BB (Walks): Number of walks.
    SO (Strikeouts): Number of strikeouts.

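There are no missing values; the data consist of 771 rows and 16 features. Two (2) of the features (Last and First) are of type “character” and the remaining 14 are numeric. The summary statistics suggest some outliers and a strong right skew. For example, the maximum number of stolen bases (107) is far above the third quartile (3). Most means also sit well above their medians, because a quarter of the players had 16 or fewer plate appearances.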

#Problem 1
library(readr)  # provides read_csv()
baseball <- read_csv("baseball.csv", show_col_types = FALSE)
head(baseball)
#Problem 2
dim(baseball) #dimension
## [1] 771  16
sum(is.na(baseball)) #missing values
## [1] 0
summary(baseball) # summary statistics
##      Last              First                Age              G        
##  Length:771         Length:771         Min.   :20.00   Min.   :  1.0  
##  Class :character   Class :character   1st Qu.:25.00   1st Qu.: 17.0  
##  Mode  :character   Mode  :character   Median :27.00   Median : 56.0  
##                                        Mean   :27.98   Mean   : 66.2  
##                                        3rd Qu.:31.00   3rd Qu.:111.0  
##                                        Max.   :45.00   Max.   :163.0  
##        PA              AB              R                H         
##  Min.   :  0.0   Min.   :  0.0   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.: 16.0   1st Qu.: 14.0   1st Qu.:  1.00   1st Qu.:  2.00  
##  Median :101.0   Median : 90.0   Median :  9.00   Median : 19.00  
##  Mean   :208.6   Mean   :185.6   Mean   : 24.05   Mean   : 47.83  
##  3rd Qu.:366.5   3rd Qu.:328.5   3rd Qu.: 42.00   3rd Qu.: 84.00  
##  Max.   :742.0   Max.   :687.0   Max.   :130.00   Max.   :238.00  
##        2B               3B               HR              RBI        
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   :  0.00  
##  1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.:  0.00  
##  Median : 3.000   Median : 0.000   Median : 1.000   Median :  8.00  
##  Mean   : 8.445   Mean   : 1.109   Mean   : 4.946   Mean   : 22.56  
##  3rd Qu.:14.000   3rd Qu.: 2.000   3rd Qu.: 7.000   3rd Qu.: 38.00  
##  Max.   :53.000   Max.   :14.000   Max.   :40.000   Max.   :121.00  
##        SB                CS               BB               SO        
##  Min.   :  0.000   Min.   : 0.000   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.:  0.000   1st Qu.: 0.000   1st Qu.:  1.00   1st Qu.:  4.00  
##  Median :  0.000   Median : 0.000   Median :  7.00   Median : 20.00  
##  Mean   :  4.296   Mean   : 2.101   Mean   : 18.45   Mean   : 32.04  
##  3rd Qu.:  3.000   3rd Qu.: 3.000   3rd Qu.: 30.00   3rd Qu.: 51.50  
##  Max.   :107.000   Max.   :19.000   Max.   :105.00   Max.   :185.00

Research Questions

How does player performance (e.g., Runs, Hits, Home Runs) vary with age? Are there noticeable trends in different age groups?

Which baseball players have the highest number of strikeouts, and what are their respective age groups and the number of games they played?

What is the relationship between a baseball player’s home run (HR) count and their runs batted in (RBI) during the 1986 MLB season? Does a higher number of home runs correlate with a higher RBI count?

Key Findings

1:

The number of players varies by age, with most players in their mid-to-late twenties (the middle half of players are aged 25 to 31).
    Home Runs (HR): the average number of home runs per player tends to increase with age, with a notable spike at ages 34 and 35 and a peak at age 35 (10.78 home runs per player).
    Hits (H): the average number of hits per player fluctuates, peaking at age 35 (81.67 hits per player), so players aged 35 had a high number of hits on average.
    Runs (R): like home runs, the average number of runs per player increases with age, peaking around age 35 (40.39 runs per player).

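Overall, there appears to be a positive trend between age and average runs, hits and home runs. However, the averages at the oldest ages rest on fewer players, so they are noisier than those for players in their twenties.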

par(mfrow = c(1, 3))  # three plots side by side
plot(HR ~ Age, data = age_stats_df, main = "Age vs HR")
plot(H ~ Age, data = age_stats_df, main = "Age vs H")
plot(R ~ Age, data = age_stats_df, main = "Age vs R")
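`age_stats_df` is used above, but its construction is not shown. The sketch below shows one way it could be built, assuming it holds the number of players and the mean HR, H and R for each age. `age_summary()` and the toy data are hypothetical. The `Players` column shows how many players each average rests on.

```r
# Per-age player counts and mean HR, H and R.
age_summary <- function(players) {
  means  <- aggregate(cbind(HR, H, R) ~ Age, data = players, FUN = mean)
  counts <- aggregate(HR ~ Age, data = players, FUN = length)
  names(counts)[2] <- "Players"  # this column now holds a count, not HR
  merge(counts, means, by = "Age")
}

# Hypothetical toy data, not the 1986 dataset.
toy <- data.frame(
  Age = c(25, 25, 35),
  HR  = c(2, 4, 10),
  H   = c(20, 30, 80),
  R   = c(10, 12, 40)
)
age_summary(toy)
```

With the real data, this would be `age_stats_df <- age_summary(baseball)`.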

2:

Examining the table of players with the most strikeouts, all of them are between 21 and 30 years old, and their other statistics vary widely. Some players, like Pete Incaviglia and Rob Deer, also produced many hits, home runs and runs batted in. Dale Murphy and Darryl Strawberry performed strongly across several categories. Juan Samuel stood out for his speed, with 42 stolen bases. Others, like Steve Balboni and Jesse Barfield, were fairly steady across the different statistics. Dale Murphy had 692 plate appearances.
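A strikeout table like the one described can be sketched by sorting on `SO` and keeping the columns of interest. The column names follow the dataset summary; `top_strikeouts()` and the toy data are hypothetical.

```r
# The n players with the most strikeouts, with their age and games played.
top_strikeouts <- function(players, n = 10) {
  cols <- c("Last", "First", "Age", "G", "SO")
  head(players[order(players$SO, decreasing = TRUE), cols], n)
}

# Hypothetical toy data for illustration.
toy <- data.frame(
  Last  = c("A", "B", "C"),
  First = c("x", "y", "z"),
  Age   = c(24, 27, 30),
  G     = c(150, 140, 60),
  SO    = c(100, 185, 50)
)
top_strikeouts(toy, n = 2)  # B first, then A
```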

3:

Evaluating the relationship between HR and RBI, there appears to be a positive association between hitting more home runs and driving in more runs in the 1986 MLB season: players who hit more home runs also tend to have more RBIs. This is expected, because every home run credits the batter with at least one RBI (for himself), plus one for each runner on base.
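The correlation coefficient and a least-squares slope can quantify this relationship. Below is a base-R sketch; `hr_rbi_fit()` and the toy numbers are hypothetical. Applying it as `hr_rbi_fit(baseball)` would include players with very few at-bats, who may be worth filtering out first.

```r
# Strength and slope of the HR-RBI relationship.
hr_rbi_fit <- function(players) {
  fit <- lm(RBI ~ HR, data = players)  # least-squares line RBI = a + b * HR
  list(r = cor(players$HR, players$RBI), slope = coef(fit)[["HR"]])
}

# Hypothetical toy data lying exactly on a line.
toy <- data.frame(HR = c(0, 1, 2, 3), RBI = c(1, 4, 7, 10))
hr_rbi_fit(toy)  # r = 1, slope = 3
```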

4:

To evaluate players for end-of-season awards, we kept only players who met the eligibility criteria of at least 300 at-bats or at least 100 games played. We then sorted them in ascending order of TotalRank, which combines each player's ranks in the individual batting categories. The twenty players with the lowest TotalRank scores were identified as MVP candidates.

Among these 20 candidates, two players tied at the top with a TotalRank of 20: Don Mattingly and Mike Schmidt. Which of them is most valuable depends on the tiebreaker criteria a team or voter chooses.

However, to make a recommendation, we choose Mike Schmidt, since he holds the better (lower) rank in two individual categories, HR and RBI, compared with Don Mattingly.
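A TotalRank of this kind can be sketched as follows: rank every eligible player in each category (1 = best), then add up the ranks. The eligibility rule comes from the text above. The category set (H, HR, RBI) is a hypothetical choice, because the original categories are not listed. `mvp_candidates()` and the toy data are hypothetical too.

```r
# Hypothetical TotalRank: rank eligible players in each category (1 = best),
# then add the ranks; a lower total is better.
# Assumption: the categories (H, HR, RBI) are illustrative, not the original set.
mvp_candidates <- function(players, stats = c("H", "HR", "RBI"), n = 20) {
  eligible <- players[players$AB >= 300 | players$G >= 100, ]
  ranks <- lapply(stats, function(s) rank(-eligible[[s]], ties.method = "min"))
  eligible$TotalRank <- Reduce(`+`, ranks)
  head(eligible[order(eligible$TotalRank), ], n)
}

# Hypothetical toy data; P3 is not eligible (AB < 300 and G < 100).
toy <- data.frame(
  Last = c("P1", "P2", "P3", "P4"),
  AB   = c(500, 400, 100, 320),
  G    = c(150, 120, 40, 90),
  H    = c(200, 180, 50, 150),
  HR   = c(30, 35, 2, 10),
  RBI  = c(110, 115, 10, 60)
)
mvp_candidates(toy, n = 2)  # P2 (TotalRank 4), then P1 (TotalRank 5)
```

`ties.method = "min"` gives tied players the same rank, so two players can share the same TotalRank, as Mattingly and Schmidt do.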

Conclusion

In conclusion, our look at the 1986 Major League Baseball season revealed several patterns in player performance. On average, older players hit more home runs, got more hits and scored more runs, with the peak around age 35. Among the players who struck out most, performance in other categories varied widely. We also saw that hitting more home runs tends to go with driving in more runs. For Most Valuable Player (MVP), we chose Mike Schmidt, because he ranked better than the other top candidate, Don Mattingly, in home runs and runs batted in.

References

Datar, R. (n.d.). Hands-On Exploratory Data Analysis with R. O’Reilly Online Learning. https://www.oreilly.com/library/view/hands-on-exploratory-data/9781789804379/5a7a58f4-66f7-4661-8f6d-9c5c09eaa23a.xhtml

File:Average annual HDI growth from 2010 to 2021 published in 2022.png - Wikipedia. (2022, September 11). https://en.wikipedia.org/wiki/File:Average_annual_HDI_growth_from_2010_to_2021_published_in_2022.png

Home - RDocumentation. (n.d.). https://www.rdocumentation.org/

Kabacoff, R. (2022, May 3). R in Action: Data Analysis and Graphics with R and Tidyverse (3rd ed.). Simon and Schuster. http://books.google.ie/books?id=dCZoEAAAQBAJ&printsec=frontcover&dq=R+in+action,+3rd+edition&hl=&cd=1&source=gbs_api

Major League Baseball. (2001, October 2). Wikipedia. https://en.wikipedia.org/wiki/Major_League_Baseball

Simon, M. (2011, April 1). Wackiest season ever: Look back at 1986. ESPN SweetSpot. ESPN.com. https://www.espn.com/blog/sweetspot/post/_/id/8297/wackiest-season-ever-look-back-at-1986
