To work on Practice 1, the first step is to load the required package and prepare the data.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.3
## Warning: package 'ggplot2' was built under R version 4.5.3
## Warning: package 'tidyr' was built under R version 4.5.3
## Warning: package 'readr' was built under R version 4.5.3
## Warning: package 'purrr' was built under R version 4.5.3
## Warning: package 'dplyr' was built under R version 4.5.3
## Warning: package 'stringr' was built under R version 4.5.3
## Warning: package 'forcats' was built under R version 4.5.3
## Warning: package 'lubridate' was built under R version 4.5.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.1 ✔ readr 2.2.0
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.3 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Store the URL of the raw data on GitHub in a variable.
url <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-01/key_crop_yields.csv"
# Read the data with read_csv()
df_crop <- read_csv(url)
## Rows: 13075 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Entity, Code
## dbl (12): Year, Wheat (tonnes per hectare), Rice (tonnes per hectare), Maize...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
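The column-specification message above can be silenced in the way the message itself suggests. The following is a minimal sketch, assuming a small inline CSV (a two-row sample wrapped in `I()`, with the hypothetical name `sample_crop`) in place of the real URL so that it runs offline:

```r
library(readr)

# Two-row sample in the same layout as the real file (first four columns only)
csv_text <- paste0(
  "Entity,Code,Year,Wheat (tonnes per hectare)\n",
  "Afghanistan,AFG,1961,1.0220\n",
  "Afghanistan,AFG,1962,0.9735\n"
)

# I() marks the string as literal data rather than a file path;
# show_col_types = FALSE suppresses the column-specification message
sample_crop <- read_csv(I(csv_text), show_col_types = FALSE)
sample_crop
```

The same argument can be passed when reading from the URL, for example `read_csv(url, show_col_types = FALSE)`.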
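Inspect the structure of the data, then convert the Entity and Code columns to factors so that they are treated as categorical variables.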
glimpse(df_crop)
## Rows: 13,075
## Columns: 14
## $ Entity <chr> "Afghanistan", "Afghanistan", "Afgh…
## $ Code <chr> "AFG", "AFG", "AFG", "AFG", "AFG", …
## $ Year <dbl> 1961, 1962, 1963, 1964, 1965, 1966,…
## $ `Wheat (tonnes per hectare)` <dbl> 1.0220, 0.9735, 0.8317, 0.9510, 0.9…
## $ `Rice (tonnes per hectare)` <dbl> 1.5190, 1.5190, 1.5190, 1.7273, 1.7…
## $ `Maize (tonnes per hectare)` <dbl> 1.4000, 1.4000, 1.4260, 1.4257, 1.4…
## $ `Soybeans (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `Potatoes (tonnes per hectare)` <dbl> 8.6667, 7.6667, 8.1333, 8.6000, 8.8…
## $ `Beans (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `Peas (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `Cassava (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `Barley (tonnes per hectare)` <dbl> 1.0800, 1.0800, 1.0800, 1.0857, 1.0…
## $ `Cocoa beans (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `Bananas (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
# Convert Entity and Code from character to factor
df_crop$Entity <- as.factor(df_crop$Entity)
df_crop$Code <- as.factor(df_crop$Code)
# Check whether the column types changed
glimpse(df_crop)
## Rows: 13,075
## Columns: 14
## $ Entity <fct> "Afghanistan", "Afghanistan", "Afgh…
## $ Code <fct> AFG, AFG, AFG, AFG, AFG, AFG, AFG, …
## $ Year <dbl> 1961, 1962, 1963, 1964, 1965, 1966,…
## $ `Wheat (tonnes per hectare)` <dbl> 1.0220, 0.9735, 0.8317, 0.9510, 0.9…
## $ `Rice (tonnes per hectare)` <dbl> 1.5190, 1.5190, 1.5190, 1.7273, 1.7…
## $ `Maize (tonnes per hectare)` <dbl> 1.4000, 1.4000, 1.4260, 1.4257, 1.4…
## $ `Soybeans (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `Potatoes (tonnes per hectare)` <dbl> 8.6667, 7.6667, 8.1333, 8.6000, 8.8…
## $ `Beans (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `Peas (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `Cassava (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `Barley (tonnes per hectare)` <dbl> 1.0800, 1.0800, 1.0800, 1.0857, 1.0…
## $ `Cocoa beans (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `Bananas (tonnes per hectare)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
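The same type conversion can also be written in tidyverse style. This is a sketch only, using a small stand-in tibble (the hypothetical `toy_crop`, with values taken from the first rows shown above) instead of the full dataset:

```r
library(dplyr)
library(tibble)

# Small stand-in for df_crop
toy_crop <- tibble(
  Entity = c("Afghanistan", "Afghanistan"),
  Code   = c("AFG", "AFG"),
  Year   = c(1961, 1962)
)

# Convert both character columns to factors in a single step
toy_crop <- toy_crop %>%
  mutate(across(c(Entity, Code), as.factor))

glimpse(toy_crop)
```

Using `across()` avoids repeating the assignment for each column, which helps when more columns need the same conversion.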
Show only the Entity, Year, Potatoes, and Cassava columns, using the select() function.
data_select <- select(df_crop,
                      Entity,
                      Year,
                      `Potatoes (tonnes per hectare)`,
                      `Cassava (tonnes per hectare)`)
data_select
## # A tibble: 13,075 × 4
## Entity Year `Potatoes (tonnes per hectare)` Cassava (tonnes per hecta…¹
## <fct> <dbl> <dbl> <dbl>
## 1 Afghanistan 1961 8.67 NA
## 2 Afghanistan 1962 7.67 NA
## 3 Afghanistan 1963 8.13 NA
## 4 Afghanistan 1964 8.6 NA
## 5 Afghanistan 1965 8.8 NA
## 6 Afghanistan 1966 9.07 NA
## 7 Afghanistan 1967 9.8 NA
## 8 Afghanistan 1968 10 NA
## 9 Afghanistan 1969 10.2 NA
## 10 Afghanistan 1970 9.54 NA
## # ℹ 13,065 more rows
## # ℹ abbreviated name: ¹`Cassava (tonnes per hectare)`
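Long column names are abbreviated in the printed tibble, as the footnote above shows. One option, sketched here on a hypothetical stand-in tibble `toy_crop`, is to rename columns while selecting them; the short names `Potatoes` and `Cassava` are illustrative choices, not part of the original task:

```r
library(dplyr)
library(tibble)

# Stand-in for df_crop with the four columns of interest
toy_crop <- tibble(
  Entity = c("Afghanistan", "Afghanistan"),
  Year = c(1961, 1962),
  `Potatoes (tonnes per hectare)` = c(8.6667, 7.6667),
  `Cassava (tonnes per hectare)` = c(NA_real_, NA_real_)
)

# select() accepts new_name = old_name, so selecting and renaming happen together
short_names <- select(
  toy_crop,
  Entity,
  Year,
  Potatoes = `Potatoes (tonnes per hectare)`,
  Cassava = `Cassava (tonnes per hectare)`
)
short_names
```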
Remove the Soybeans, Beans, and Peas columns from the table.
data_select <- select(df_crop,
                      -c(`Soybeans (tonnes per hectare)`,
                         `Beans (tonnes per hectare)`,
                         `Peas (tonnes per hectare)`))
data_select
## # A tibble: 13,075 × 11
## Entity Code Year `Wheat (tonnes per hectare)` Rice (tonnes per hecta…¹
## <fct> <fct> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1961 1.02 1.52
## 2 Afghanistan AFG 1962 0.974 1.52
## 3 Afghanistan AFG 1963 0.832 1.52
## 4 Afghanistan AFG 1964 0.951 1.73
## 5 Afghanistan AFG 1965 0.972 1.73
## 6 Afghanistan AFG 1966 0.867 1.52
## 7 Afghanistan AFG 1967 1.12 1.92
## 8 Afghanistan AFG 1968 1.16 1.95
## 9 Afghanistan AFG 1969 1.19 1.98
## 10 Afghanistan AFG 1970 0.956 1.81
## # ℹ 13,065 more rows
## # ℹ abbreviated name: ¹`Rice (tonnes per hectare)`
## # ℹ 6 more variables: `Maize (tonnes per hectare)` <dbl>,
## # `Potatoes (tonnes per hectare)` <dbl>,
## # `Cassava (tonnes per hectare)` <dbl>, `Barley (tonnes per hectare)` <dbl>,
## # `Cocoa beans (tonnes per hectare)` <dbl>,
## # `Bananas (tonnes per hectare)` <dbl>
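When the columns to drop are held in a character vector, `all_of()` can be combined with the minus sign. The following sketch assumes a hypothetical stand-in tibble `toy_crop` and a hypothetical vector `drop_cols`:

```r
library(dplyr)
library(tibble)

# Stand-in for df_crop with a few of its columns
toy_crop <- tibble(
  Entity = c("Afghanistan", "Afghanistan"),
  Year = c(1961, 1962),
  `Wheat (tonnes per hectare)` = c(1.0220, 0.9735),
  `Soybeans (tonnes per hectare)` = c(NA_real_, NA_real_),
  `Beans (tonnes per hectare)` = c(NA_real_, NA_real_)
)

# Names of the columns to remove, stored as plain strings
drop_cols <- c("Soybeans (tonnes per hectare)", "Beans (tonnes per hectare)")

# all_of() raises an error if any listed column is missing, which catches typos
trimmed <- select(toy_crop, -all_of(drop_cols))
names(trimmed)
```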
Show the years in which the rice yield in Indonesia was below 2 tonnes per hectare.
data_filter <- filter(df_crop,
                      Entity == "Indonesia",
                      `Rice (tonnes per hectare)` < 2)
select(data_filter,
       Entity,
       Code,
       Year,
       `Rice (tonnes per hectare)`)
## # A tibble: 7 × 4
## Entity Code Year `Rice (tonnes per hectare)`
## <fct> <fct> <dbl> <dbl>
## 1 Indonesia IDN 1961 1.76
## 2 Indonesia IDN 1962 1.79
## 3 Indonesia IDN 1963 1.72
## 4 Indonesia IDN 1964 1.76
## 5 Indonesia IDN 1965 1.77
## 6 Indonesia IDN 1966 1.77
## 7 Indonesia IDN 1967 1.76
Show which countries had a wheat yield above 5 tonnes per hectare in the year 2000 or later.
data_filter <- filter(df_crop,
                      `Wheat (tonnes per hectare)` > 5,
                      Year >= 2000)
select(data_filter,
       Entity,
       Code,
       Year,
       `Wheat (tonnes per hectare)`)
## # A tibble: 424 × 4
## Entity Code Year `Wheat (tonnes per hectare)`
## <fct> <fct> <dbl> <dbl>
## 1 Austria AUT 2001 5.24
## 2 Austria AUT 2004 5.92
## 3 Austria AUT 2005 5.03
## 4 Austria AUT 2008 5.69
## 5 Austria AUT 2010 5.01
## 6 Austria AUT 2011 5.85
## 7 Austria AUT 2013 5.37
## 8 Austria AUT 2014 5.92
## 9 Austria AUT 2015 5.70
## 10 Austria AUT 2016 6.25
## # ℹ 414 more rows
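The output above has one row per country and year, so a country such as Austria appears many times. To list each country only once, `distinct()` can be added after the filter. This is a sketch on a hypothetical stand-in tibble `toy_crop`; the 1999 value is made up for illustration:

```r
library(dplyr)
library(tibble)

# Stand-in for df_crop: Austria passes the filter twice, Indonesia has no wheat data
toy_crop <- tibble(
  Entity = c("Austria", "Austria", "Austria", "Indonesia"),
  Year = c(1999, 2001, 2004, 2001),
  `Wheat (tonnes per hectare)` = c(5.10, 5.24, 5.92, NA)
)

# Keep qualifying rows, then reduce them to one row per country
high_wheat_countries <- toy_crop %>%
  filter(`Wheat (tonnes per hectare)` > 5, Year >= 2000) %>%
  distinct(Entity)

high_wheat_countries
```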
Show the data for Indonesia and Malaysia for the year 2015 only, using the filter() function.
data_filter <- filter(df_crop,
                      Entity %in% c("Indonesia", "Malaysia"),
                      Year == 2015)
data_filter
## # A tibble: 2 × 14
## Entity Code Year `Wheat (tonnes per hectare)` `Rice (tonnes per hectare)`
## <fct> <fct> <dbl> <dbl> <dbl>
## 1 Indonesia IDN 2015 NA 5.34
## 2 Malaysia MYS 2015 NA 4.02
## # ℹ 9 more variables: `Maize (tonnes per hectare)` <dbl>,
## # `Soybeans (tonnes per hectare)` <dbl>,
## # `Potatoes (tonnes per hectare)` <dbl>, `Beans (tonnes per hectare)` <dbl>,
## # `Peas (tonnes per hectare)` <dbl>, `Cassava (tonnes per hectare)` <dbl>,
## # `Barley (tonnes per hectare)` <dbl>,
## # `Cocoa beans (tonnes per hectare)` <dbl>,
## # `Bananas (tonnes per hectare)` <dbl>
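The filter above uses `%in%` rather than `==` to match several countries. The short base R sketch below, on a made-up vector `entity`, illustrates why: `==` compares element by element and recycles the shorter vector, so it can silently miss rows.

```r
# Made-up vector of country names in an arbitrary order
entity <- c("Indonesia", "Malaysia", "Malaysia", "Indonesia")
targets <- c("Indonesia", "Malaysia")

# == recycles targets as Indonesia, Malaysia, Indonesia, Malaysia
match_equal <- entity == targets

# %in% asks, for each element, whether it appears anywhere in targets
match_in <- entity %in% targets

sum(match_equal)
sum(match_in)
```

Here only two of the four elements match with `==`, while `%in%` matches all four.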
Show the country with the lowest maize yield in 2020.
data_arrange <- filter(df_crop,
                       Year == 2020,
                       !is.na(`Maize (tonnes per hectare)`)) %>%
  arrange(`Maize (tonnes per hectare)`)
select(data_arrange,
       Entity,
       Code,
       Year,
       `Maize (tonnes per hectare)`)
## # A tibble: 0 × 4
## # ℹ 4 variables: Entity <fct>, Code <fct>, Year <dbl>,
## # Maize (tonnes per hectare) <dbl>
The result above is empty, which indicates that the dataset has no maize data for 2020. To confirm this, check the latest year in the data.
max(df_crop$Year)
## [1] 2018
Based on this result, the latest year in the dataset is 2018.
For reference, we can instead show the countries with the lowest maize yield in 2018.
data_arrange <- filter(df_crop,
                       Year == 2018,
                       !is.na(`Maize (tonnes per hectare)`)) %>%
  arrange(`Maize (tonnes per hectare)`)
select(data_arrange,
       Entity,
       Code,
       Year,
       `Maize (tonnes per hectare)`)
## # A tibble: 201 × 4
## Entity Code Year `Maize (tonnes per hectare)`
## <fct> <fct> <dbl> <dbl>
## 1 Cape Verde CPV 2018 0.123
## 2 Botswana BWA 2018 0.211
## 3 South Sudan SSD 2018 0.426
## 4 Zimbabwe ZWE 2018 0.613
## 5 Vanuatu VUT 2018 0.616
## 6 Mauritania MRT 2018 0.689
## 7 Democratic Republic of Congo COD 2018 0.775
## 8 Lesotho LSO 2018 0.785
## 9 Morocco MAR 2018 0.799
## 10 Haiti HTI 2018 0.832
## # ℹ 191 more rows
In conclusion, the question of which country had the lowest maize yield in 2020 cannot be answered from this dataset, because it contains no observations for that year. In the latest available year, 2018, the lowest maize yield was in Cape Verde (0.123 tonnes per hectare).
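Rather than hard-coding a year that may not exist in the data, the latest available year can be looked up inside the pipeline. The following is a minimal sketch on a hypothetical stand-in tibble `toy_crop`; the 2017 values are made up, and `slice_min()` is used here as an alternative to `arrange()`:

```r
library(dplyr)
library(tibble)

# Stand-in for df_crop with two countries and two years
toy_crop <- tibble(
  Entity = c("Cape Verde", "Botswana", "Cape Verde", "Botswana"),
  Year = c(2017, 2017, 2018, 2018),
  `Maize (tonnes per hectare)` = c(0.150, 0.300, 0.123, 0.211)
)

lowest_latest <- toy_crop %>%
  # Drop missing yields first so the latest year is one that has maize data
  filter(!is.na(`Maize (tonnes per hectare)`)) %>%
  # Keep only the most recent year present in the data
  filter(Year == max(Year)) %>%
  # Return the row with the smallest yield
  slice_min(`Maize (tonnes per hectare)`, n = 1)

lowest_latest
```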
Sort the data for Indonesia by potato yield, from highest to lowest.
data_arrange <- filter(df_crop,
                       Entity == "Indonesia") %>%
  arrange(desc(`Potatoes (tonnes per hectare)`))
select(data_arrange,
       Entity,
       Code,
       Year,
       `Potatoes (tonnes per hectare)`)
## # A tibble: 58 × 4
## Entity Code Year `Potatoes (tonnes per hectare)`
## <fct> <fct> <dbl> <dbl>
## 1 Indonesia IDN 2018 18.7
## 2 Indonesia IDN 2016 18.3
## 3 Indonesia IDN 2015 18.2
## 4 Indonesia IDN 2014 17.7
## 5 Indonesia IDN 2006 16.9
## 6 Indonesia IDN 2008 16.7
## 7 Indonesia IDN 1995 16.6
## 8 Indonesia IDN 2012 16.6
## 9 Indonesia IDN 2009 16.5
## 10 Indonesia IDN 2005 16.4
## # ℹ 48 more rows
The output shows Indonesia's potato yields sorted from highest to lowest, with the highest yield (18.7 tonnes per hectare) in 2018.
Create a Rice_Status column containing the text "Tinggi" (high) if the rice yield is more than 4 tonnes per hectare and "Rendah" (low) otherwise. Rows with a missing rice yield get NA.
data_mutate <- df_crop %>%
  mutate(Rice_Status = ifelse(`Rice (tonnes per hectare)` > 4,
                              "Tinggi",
                              "Rendah"))
select(data_mutate,
       Entity,
       Code,
       Year,
       `Rice (tonnes per hectare)`,
       Rice_Status)
## # A tibble: 13,075 × 5
## Entity Code Year `Rice (tonnes per hectare)` Rice_Status
## <fct> <fct> <dbl> <dbl> <chr>
## 1 Afghanistan AFG 1961 1.52 Rendah
## 2 Afghanistan AFG 1962 1.52 Rendah
## 3 Afghanistan AFG 1963 1.52 Rendah
## 4 Afghanistan AFG 1964 1.73 Rendah
## 5 Afghanistan AFG 1965 1.73 Rendah
## 6 Afghanistan AFG 1966 1.52 Rendah
## 7 Afghanistan AFG 1967 1.92 Rendah
## 8 Afghanistan AFG 1968 1.95 Rendah
## 9 Afghanistan AFG 1969 1.98 Rendah
## 10 Afghanistan AFG 1970 1.81 Rendah
## # ℹ 13,065 more rows
The output shows that the new Rice_Status column was created according to the rice yield condition.
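With `ifelse()`, a yield of exactly 4 is labelled "Rendah" and a missing yield stays NA. If missing values should get their own label, `case_when()` is one option. The sketch below uses a made-up vector `rice`, and the label "No data" is a hypothetical choice that is not part of the original task:

```r
library(dplyr)

# Made-up rice yields: below, exactly at, and above the threshold, plus a missing value
rice <- c(1.52, 4.00, 5.34, NA)

# ifelse() propagates NA because the comparison NA > 4 is itself NA
status_ifelse <- ifelse(rice > 4, "Tinggi", "Rendah")

# case_when() checks conditions in order, so NA is handled before the comparison
status_case <- case_when(
  is.na(rice) ~ "No data",
  rice > 4    ~ "Tinggi",
  TRUE        ~ "Rendah"
)

status_ifelse
status_case
```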
Compute the average banana yield in Indonesia across all available years.
data_summary <- df_crop %>%
  filter(Entity == "Indonesia") %>%
  summarise(`Rata-rata hasil panen pisang di Indonesia` = mean(`Bananas (tonnes per hectare)`,
                                                               na.rm = TRUE))
data_summary
## # A tibble: 1 × 1
## `Rata-rata hasil panen pisang di Indonesia`
## <dbl>
## 1 30.5
Based on the output, the average banana yield in Indonesia across all years available in the dataset is 30.50554 tonnes per hectare, which the tibble displays rounded as 30.5.
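A tibble prints numbers with about three significant digits, so the full value is easier to read after extracting it as a plain number. This is a sketch on a hypothetical stand-in tibble `toy_indonesia` with made-up yields; `mean_yield` is an illustrative column name:

```r
library(dplyr)
library(tibble)

# Stand-in for the Indonesia rows of df_crop, with made-up banana yields
toy_indonesia <- tibble(
  Entity = "Indonesia",
  Year = c(2016, 2017, 2018),
  `Bananas (tonnes per hectare)` = c(30.12345, NA, 31.54321)
)

mean_bananas <- toy_indonesia %>%
  # na.rm = TRUE ignores the missing year instead of returning NA
  summarise(mean_yield = mean(`Bananas (tonnes per hectare)`, na.rm = TRUE)) %>%
  # pull() returns the column as an ordinary numeric vector
  pull(mean_yield)

# print() with digits shows more decimal places than the tibble display
print(mean_bananas, digits = 7)
```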
Take the maize data from 2010 onward, then compute the standard deviation per country and sort it from largest to smallest.
data_summary <- df_crop %>%
  filter(Year >= 2010,
         !is.na(`Maize (tonnes per hectare)`)) %>%
  group_by(Entity) %>%
  summarise(`Standar Deviasi Maize` = sd(`Maize (tonnes per hectare)`,
                                         na.rm = TRUE)) %>%
  arrange(desc(`Standar Deviasi Maize`))
data_summary
## # A tibble: 202 × 2
## Entity `Standar Deviasi Maize`
## <fct> <dbl>
## 1 Kuwait 9.24
## 2 United Arab Emirates 9.19
## 3 Jordan 7.03
## 4 Israel 4.80
## 5 Saint Vincent and the Grenadines 2.89
## 6 Qatar 2.74
## 7 French Guiana 2.50
## 8 New Caledonia 2.29
## 9 Slovakia 1.68
## 10 Oman 1.61
## # ℹ 192 more rows
The output shows the standard deviation of maize yield for each country from 2010 onward, sorted from largest to smallest. Kuwait has the largest value (9.24).
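A standard deviation is only defined for two or more observations, so it can help to report the number of years behind each value. The sketch below uses a hypothetical stand-in tibble `toy_crop` with made-up countries and yields; `n_years` and `sd_maize` are illustrative column names:

```r
library(dplyr)
library(tibble)

# Stand-in for df_crop: one country with three years, one with a single year
toy_crop <- tibble(
  Entity = c("Country A", "Country A", "Country A", "Country B"),
  Year = c(2010, 2011, 2012, 2010),
  `Maize (tonnes per hectare)` = c(1, 2, 3, 5)
)

sd_by_country <- toy_crop %>%
  filter(Year >= 2010, !is.na(`Maize (tonnes per hectare)`)) %>%
  group_by(Entity) %>%
  summarise(
    n_years = n(),                                   # observations per country
    sd_maize = sd(`Maize (tonnes per hectare)`)      # NA when n_years is 1
  ) %>%
  # arrange() places NA values last, also with desc()
  arrange(desc(sd_maize))

sd_by_country
```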