Assignment : 1

library(tidyverse) 
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Loading the data: using the astronaut dataset.

Loading the CSV file into astro

astro <- read_delim('/Users/sneha/H510-Statistics/astronaut.csv')
## Rows: 1277 Columns: 23
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): name, sex, nationality, military_civilian, selection, occupation, ...
## dbl (13): id, number, nationwide_number, year_of_birth, year_of_selection, m...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#head(astro, 5)

Numeric Summary

Total number of missions completed, by sex

astro |>
  group_by(sex) |>
  summarise(count_mission = sum(total_number_of_missions))
## # A tibble: 2 × 2
##   sex    count_mission
##   <chr>          <dbl>
## 1 female           428
## 2 male            3381

There is a large disparity between the totals for female and male astronauts: the summed mission count for male astronauts (3381) is roughly eight times that for female astronauts (428). The data suggests that male astronauts are far more represented in space missions than female astronauts.
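One point worth checking before reading too much into these totals: if each row of astro is one astronaut on one mission, and total_number_of_missions holds that astronaut's career total (an assumption about the data layout, not something confirmed above), then summing the column counts each career total once per row. A minimal sketch on hypothetical toy data, with made-up names, showing how the summed figure can differ from the number of rows and the number of distinct astronauts:

```r
library(dplyr)

# Hypothetical toy data: one row per astronaut per mission, with the
# astronaut's career total repeated on every one of their rows.
toy <- tibble(
  name = c("Ann", "Ann", "Ann", "Bob", "Cy", "Cy"),
  sex = c("female", "female", "female", "male", "male", "male"),
  total_number_of_missions = c(3, 3, 3, 1, 2, 2)
)

by_sex <- toy |>
  group_by(sex) |>
  summarise(
    summed_totals = sum(total_number_of_missions), # adds each career total once per row
    mission_records = n(),                         # one per astronaut-mission row
    astronauts = n_distinct(name)                  # distinct people
  )

by_sex
```

If the real data follows this layout, n() or n_distinct() may be closer to the quantity the heading describes than the sum.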

Distribution of Nationalities in the Astronaut Dataset

astro |>
  group_by(nationality) |>
  summarize(count_nation = n())
## # A tibble: 40 × 2
##    nationality    count_nation
##    <chr>                 <int>
##  1 Afghanistan               1
##  2 Australia                 4
##  3 Austria                   1
##  4 Belgium                   3
##  5 Brazil                    1
##  6 Bulgaria                  2
##  7 Canada                   18
##  8 China                    14
##  9 Cuba                      1
## 10 Czechoslovakia            1
## # ℹ 30 more rows

Filtering nationalities with a count of at least 100

astro |>
  group_by(nationality) |>
  summarize(count_nation = n()) |>
  filter(count_nation >=100)
## # A tibble: 2 × 2
##   nationality    count_nation
##   <chr>                 <int>
## 1 U.S.                    854
## 2 U.S.S.R/Russia          273

It can be observed that the U.S. and U.S.S.R/Russia are the only nationalities listed at least 100 times in the dataset, which means most rows in the dataset (1127 of 1277) belong to astronauts from one of these two.
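The share of rows held by these two nationalities can be worked out directly from the counts printed above (854 and 273 out of 1277 rows). A small base R sketch; the names row_counts, total_rows, and combined_share are only illustrative:

```r
# Row counts copied from the output above; "Other" is the remainder of the 1277 rows.
row_counts <- c("U.S." = 854, "U.S.S.R/Russia" = 273)
total_rows <- 1277
row_counts <- c(row_counts, Other = total_rows - sum(row_counts))

# Percentage of all rows per group, rounded to one decimal place
shares <- round(100 * row_counts / total_rows, 1)
shares

# Fraction of rows belonging to the two largest nationalities together
combined_share <- sum(row_counts[c("U.S.", "U.S.S.R/Russia")]) / total_rows
combined_share
```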

Three novel questions to investigate

Question 1: How does the number of missions completed vary across nationalities?

Question 2: In which year did the maximum number of missions take place?

Question 3: What patterns appear in mission duration (in hours) depending on the spacecraft used in orbit?

Task: Address at least one of the above questions using an aggregation function

Question addressed: How does the number of missions completed vary across nationalities?

aggregate(total_number_of_missions ~ nationality, data = astro, FUN = sum)
##                 nationality total_number_of_missions
## 1               Afghanistan                        1
## 2                 Australia                       16
## 3                   Austria                        1
## 4                   Belgium                        5
## 5                    Brazil                        1
## 6                  Bulgaria                        2
## 7                    Canada                       38
## 8                     China                       22
## 9                      Cuba                        1
## 10           Czechoslovakia                        1
## 11                  Denmark                        1
## 12                   France                       38
## 13                  Germany                       28
## 14                   Hungry                        1
## 15                    India                        1
## 16                   Israel                        1
## 17                    Italy                       29
## 18                    Japan                       42
## 19               Kazakhstan                        1
## 20                    Korea                        1
## 21                  Malysia                        1
## 22                   Mexico                        1
## 23                 Mongolia                        1
## 24               Netherland                        5
## 25                   Poland                        1
## 26 Republic of South Africa                        1
## 27                  Romania                        1
## 28             Saudi Arabia                        1
## 29                 Slovakia                        1
## 30                    Spain                        4
## 31                   Sweden                        4
## 32              Switzerland                       16
## 33                    Syria                        1
## 34                     U.K.                        2
## 35                U.K./U.S.                       14
## 36                     U.S.                     2734
## 37           U.S.S.R/Russia                      787
## 38          U.S.S.R/Ukraine                        1
## 39                      UAE                        1
## 40                  Vietnam                        1

Finding which nationalities have at least 20 total missions

aggregate(total_number_of_missions ~ nationality, data = astro, FUN = sum) |>
  filter(total_number_of_missions >= 20) |>
  pluck("nationality")
## [1] "Canada"         "China"          "France"         "Germany"       
## [5] "Italy"          "Japan"          "U.S."           "U.S.S.R/Russia"

Canada, China, France, Germany, Italy, Japan, the U.S., and U.S.S.R/Russia are the nationalities with at least 20 missions in this total.
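To see how the totals vary across nationalities at a glance, the aggregated table can also be sorted from largest to smallest. A minimal base R sketch on hypothetical toy data (the nationalities A, B, C and their values are made up for illustration):

```r
# Hypothetical toy records; column names mirror the ones used above.
toy <- data.frame(
  nationality = c("A", "A", "B", "C", "C", "C"),
  total_number_of_missions = c(2, 2, 1, 3, 3, 3)
)

# Sum the column within each nationality, as in the aggregate() call above
totals <- aggregate(total_number_of_missions ~ nationality, data = toy, FUN = sum)

# Sort so the nationality with the largest total comes first
ranked <- totals[order(-totals$total_number_of_missions), ]
ranked
```

Sorting before filtering makes it easier to judge where a cut-off such as 20 falls relative to the rest of the distribution.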

Visual summaries: including plots

Analysing patterns in mission duration (in hours) by military/civilian status

status <- astro |>
  group_by(military_civilian) |>
  summarise(mean_hours = mean(total_hrs_sum))

status
## # A tibble: 2 × 2
##   military_civilian mean_hours
##   <chr>                  <dbl>
## 1 civilian               3043.
## 2 military               2919.
ggplot(astro, aes(x = military_civilian, y = total_hrs_sum, fill = military_civilian)) +
  geom_boxplot() +
  labs(title = "Mission Duration Distribution by Military/Civilian Status",
       x = "Status (Military/Civilian)",
       y = "Mission Duration (Hours)") +
  theme_minimal()

The graph above suggests that military/civilian status has little effect on mission duration. The two group means are close: about 3043 hours for civilians and 2919 hours for military astronauts.
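Because a boxplot draws medians and quartiles while the table above reports means, it can help to compute both side by side; with right-skewed hours the two may tell different stories. A minimal sketch on hypothetical toy values (the hours below are made up, not taken from astro):

```r
library(dplyr)

# Hypothetical toy data: each group has one very large value that pulls the mean up.
toy <- tibble(
  military_civilian = c("civilian", "civilian", "civilian",
                        "military", "military", "military"),
  total_hrs_sum = c(100, 200, 9000, 150, 250, 8900)
)

summary_by_status <- toy |>
  group_by(military_civilian) |>
  summarise(
    mean_hours = mean(total_hrs_sum),     # sensitive to extreme values
    median_hours = median(total_hrs_sum), # the centre line a boxplot draws
    n = n()
  )

summary_by_status
```

In this toy case the means are identical while the medians differ, which is why comparing only one statistic can be misleading.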

Bar Plot of Number of Missions by Nationality

Grouping by nationality, summing the number of missions completed, and keeping only nationalities with at least 100 missions (>= 100)

missions_by_nationality <- astro |>
  group_by(nationality) |>
  summarise(total_missions = sum(total_number_of_missions)) |>
  filter( total_missions >= 100 ) 
missions_by_nationality
## # A tibble: 2 × 2
##   nationality    total_missions
##   <chr>                   <dbl>
## 1 U.S.                     2734
## 2 U.S.S.R/Russia            787
ggplot(missions_by_nationality, aes(x = nationality, y = total_missions, fill = nationality)) +
  geom_col() +
  labs(title = "Total Missions Completed by Nationality (At Least 100 Missions)",
       x = "Nationality",
       y = "Total Missions Completed") +
  theme_minimal()

The graph above indicates that the U.S. and U.S.S.R/Russia account for the most missions, with the U.S. leading at a summed count of 2734.