library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Loading the data: this analysis uses the astronaut dataset.
Loading the CSV file into astro
astro <- read_delim('/Users/sneha/H510-Statistics/astronaut.csv')
## Rows: 1277 Columns: 23
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): name, sex, nationality, military_civilian, selection, occupation, ...
## dbl (13): id, number, nationwide_number, year_of_birth, year_of_selection, m...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#head(astro, 5)
Total number of missions completed, by sex
astro |>
  group_by(sex) |>
  summarise(count_mission = sum(total_number_of_missions))
## # A tibble: 2 × 2
## sex count_mission
## <chr> <dbl>
## 1 female 428
## 2 male 3381
There is a large disparity between the summed mission totals of female and male astronauts: 428 for female astronauts against 3381 for male astronauts. The data suggests that male astronauts are far more heavily represented in space missions than female astronauts.
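One caveat about these sums: if the file holds one row per astronaut per mission, total_number_of_missions repeats on every row for the same astronaut, and summing it over all rows counts those missions more than once. A minimal sketch of a de-duplicated count follows. The tibble toy and its values are made up for illustration, and the sketch assumes that name identifies an astronaut uniquely.

```r
library(dplyr)

# Hypothetical toy data: one row per astronaut per mission.
toy <- tibble(
  name = c("A", "A", "B", "C", "C", "C"),
  sex = c("female", "female", "male", "male", "male", "male"),
  total_number_of_missions = c(2, 2, 1, 3, 3, 3)
)

# Keep one row per astronaut before summing, so each
# astronaut's mission total is counted exactly once.
missions_by_sex <- toy |>
  distinct(name, sex, total_number_of_missions) |>
  group_by(sex) |>
  summarise(count_mission = sum(total_number_of_missions))

missions_by_sex
```

On the toy data, summing without distinct() would add the repeated rows as well, so comparing the two results is a quick way to check whether the real file is affected.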
astro |>
  group_by(nationality) |>
  summarize(count_nation = n())
## # A tibble: 40 × 2
## nationality count_nation
## <chr> <int>
## 1 Afghanistan 1
## 2 Australia 4
## 3 Austria 1
## 4 Belgium 3
## 5 Brazil 1
## 6 Bulgaria 2
## 7 Canada 18
## 8 China 14
## 9 Cuba 1
## 10 Czechoslovakia 1
## # ℹ 30 more rows
Filtering nationalities with a count of at least 100
astro |>
  group_by(nationality) |>
  summarize(count_nation = n()) |>
  filter(count_nation >= 100)
## # A tibble: 2 × 2
## nationality count_nation
## <chr> <int>
## 1 U.S. 854
## 2 U.S.S.R/Russia 273
The U.S. and U.S.S.R/Russia are the only nationalities listed at least 100 times in the dataset. Together they account for 1127 of the 1277 rows (about 88%), which means most of the records in the dataset belong to astronauts from the U.S. or U.S.S.R/Russia.
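To put numbers on that concentration, each nationality's share of rows can be computed alongside its count. A minimal sketch on a made-up tibble (toy and its values are hypothetical, not taken from the astronaut file):

```r
library(dplyr)

# Hypothetical toy data: one row per record.
toy <- tibble(
  nationality = c("U.S.", "U.S.", "U.S.",
                  "U.S.S.R/Russia", "U.S.S.R/Russia", "Japan")
)

# Count rows per nationality, then express each count
# as a fraction of all rows, largest first.
nation_share <- toy |>
  count(nationality, name = "count_nation") |>
  mutate(share = count_nation / sum(count_nation)) |>
  arrange(desc(count_nation))

nation_share
```

Replacing toy with astro should give the same counts as the table above plus a share column, assuming the nationality column is named as in this report.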
Question 1: How does the number of missions completed vary across nationalities?
Question 2: In which year did the maximum number of missions take place?
Question 3: What patterns appear in mission duration (in hours) depending on the spacecraft used in orbit?
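As a starting point for Question 2, the rows can be counted per year and the largest count kept. This is only a sketch: the column name year_of_mission is an assumption (the column listing above is truncated), the tibble toy is made up, and the result counts rows rather than distinct missions.

```r
library(dplyr)

# Hypothetical toy data; year_of_mission is an assumed column name.
toy <- tibble(year_of_mission = c(1985, 1985, 1992, 1992, 1992, 2001))

# Count rows per year and keep the year(s) with the highest count.
# slice_max() keeps ties by default, so several years may be returned.
busiest_year <- toy |>
  count(year_of_mission, name = "n_rows") |>
  slice_max(n_rows, n = 1)

busiest_year
```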
astro %>%
  aggregate(total_number_of_missions ~ nationality, data = ., FUN = sum)
## nationality total_number_of_missions
## 1 Afghanistan 1
## 2 Australia 16
## 3 Austria 1
## 4 Belgium 5
## 5 Brazil 1
## 6 Bulgaria 2
## 7 Canada 38
## 8 China 22
## 9 Cuba 1
## 10 Czechoslovakia 1
## 11 Denmark 1
## 12 France 38
## 13 Germany 28
## 14 Hungry 1
## 15 India 1
## 16 Israel 1
## 17 Italy 29
## 18 Japan 42
## 19 Kazakhstan 1
## 20 Korea 1
## 21 Malysia 1
## 22 Mexico 1
## 23 Mongolia 1
## 24 Netherland 5
## 25 Poland 1
## 26 Republic of South Africa 1
## 27 Romania 1
## 28 Saudi Arabia 1
## 29 Slovakia 1
## 30 Spain 4
## 31 Sweden 4
## 32 Switzerland 16
## 33 Syria 1
## 34 U.K. 2
## 35 U.K./U.S. 14
## 36 U.S. 2734
## 37 U.S.S.R/Russia 787
## 38 U.S.S.R/Ukraine 1
## 39 UAE 1
## 40 Vietnam 1
Finding which nations have at least 20 total missions
astro %>%
  aggregate(total_number_of_missions ~ nationality, data = ., FUN = sum) |>
  filter(total_number_of_missions >= 20) |>
  pluck("nationality")
## [1] "Canada" "China" "France" "Germany"
## [5] "Italy" "Japan" "U.S." "U.S.S.R/Russia"
Canada, China, France, Germany, Italy, Japan, the U.S., and U.S.S.R/Russia are the nations with at least 20 missions in total.
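The same selection can also be written with dplyr verbs alone, which avoids mixing aggregate() and the two pipe operators. A minimal sketch on a made-up tibble (toy, its nationality labels, and its values are hypothetical):

```r
library(dplyr)

# Hypothetical toy data with the two columns used above.
toy <- tibble(
  nationality = c("X", "X", "Y", "Z"),
  total_number_of_missions = c(15, 10, 5, 20)
)

# Sum missions per nationality, keep totals of at least 20,
# and return the matching nationalities as a character vector.
nations_20 <- toy |>
  group_by(nationality) |>
  summarise(total_missions = sum(total_number_of_missions)) |>
  filter(total_missions >= 20) |>
  pull(nationality)

nations_20
```

pull() plays the role of pluck("nationality") here: both extract a single column as a vector.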
status <- astro |>
  group_by(military_civilian) |>
  summarise(mean_hours = mean(total_hrs_sum))
status
## # A tibble: 2 × 2
## military_civilian mean_hours
## <chr> <dbl>
## 1 civilian 3043.
## 2 military 2919.
ggplot(astro, aes(x = military_civilian, y = total_hrs_sum, fill = military_civilian)) +
  geom_boxplot() +
  labs(title = "Mission Duration Distribution by Military/Civilian Status",
       x = "Status (Military/Civilian)",
       y = "Mission Duration (Hours)") +
  theme_minimal()
The summary table and box plot above suggest that military or civilian status makes little difference to mission duration: the group means are close, at about 3043 hours for civilians and 2919 hours for military astronauts.
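Because a box plot draws medians rather than means, it can help to report both statistics, together with the group sizes, before comparing the two groups. A minimal sketch on a made-up tibble (toy and its values are hypothetical, not taken from the astronaut file):

```r
library(dplyr)

# Hypothetical toy data with one long-duration value per group.
toy <- tibble(
  military_civilian = c("civilian", "civilian", "civilian",
                        "military", "military", "military"),
  total_hrs_sum = c(100, 200, 3000, 150, 250, 2000)
)

# Report group size, mean, and median side by side; a large gap
# between mean and median points to a skewed distribution.
hours_summary <- toy |>
  group_by(military_civilian) |>
  summarise(
    n = n(),
    mean_hours = mean(total_hrs_sum),
    median_hours = median(total_hrs_sum)
  )

hours_summary
```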
Grouping by nationality, summing the number of missions completed, and keeping only nationalities with at least 100 missions
missions_by_nationality <- astro |>
  group_by(nationality) |>
  summarise(total_missions = sum(total_number_of_missions)) |>
  filter(total_missions >= 100)
missions_by_nationality
## # A tibble: 2 × 2
## nationality total_missions
## <chr> <dbl>
## 1 U.S. 2734
## 2 U.S.S.R/Russia 787
ggplot(missions_by_nationality, aes(x = nationality, y = total_missions, fill = nationality)) +
  geom_col() +
  labs(title = "Total Missions Completed by Nationality (At Least 100 Missions)",
       x = "Nationality",
       y = "Total Missions Completed") +
  theme_minimal()
The graph above shows that the U.S. and U.S.S.R/Russia have the highest mission totals, with the U.S. leading at 2734.