Import two related datasets from TidyTuesday Project.
big_tech_stock_prices <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-02-07/big_tech_stock_prices.csv')
## Rows: 45088 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): stock_symbol
## dbl (6): open, high, low, close, adj_close, volume
## date (1): date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
big_tech_companies <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-02-07/big_tech_companies.csv')
## Rows: 14 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): stock_symbol, company
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Describe the two datasets:
Data1
Data 2
set.seed(1234)
big_tech_stock_small <- big_tech_stock_prices %>%
select(stock_symbol, date, close, volume) %>%
sample_n(10)
big_tech_companies_small <- big_tech_companies %>%
select(stock_symbol, company)
big_tech_stock_small
## # A tibble: 10 × 4
## stock_symbol date close volume
## <chr> <date> <dbl> <dbl>
## 1 ORCL 2018-05-29 46.3 12427800
## 2 ORCL 2018-09-06 47.7 18456100
## 3 TSLA 2010-08-03 1.46 18457500
## 4 CSCO 2018-07-27 42.6 22050900
## 5 NFLX 2016-04-06 105. 9605800
## 6 NVDA 2011-04-06 4.36 85698400
## 7 GOOGL 2014-07-02 29.5 23916000
## 8 CSCO 2018-06-27 42.3 20794800
## 9 IBM 2010-11-03 138. 5420477
## 10 AAPL 2020-06-03 81.3 104491200
big_tech_companies_small
## # A tibble: 14 × 2
## stock_symbol company
## <chr> <chr>
## 1 AAPL Apple Inc.
## 2 ADBE Adobe Inc.
## 3 AMZN Amazon.com, Inc.
## 4 CRM Salesforce, Inc.
## 5 CSCO Cisco Systems, Inc.
## 6 GOOGL Alphabet Inc.
## 7 IBM International Business Machines Corporation
## 8 INTC Intel Corporation
## 9 META Meta Platforms, Inc.
## 10 MSFT Microsoft Corporation
## 11 NFLX Netflix, Inc.
## 12 NVDA NVIDIA Corporation
## 13 ORCL Oracle Corporation
## 14 TSLA Tesla, Inc.
Describe the resulting data:
How is it different from the original two datasets? The joined
dataset combines stock price information with company names.
Unlike the originals, this dataset only includes rows where the
stock_symbol appears in both datasets, and it’s much smaller because we
sampled just 10 rows.
big_tech_companies_small %>%
inner_join(big_tech_stock_small, by = "stock_symbol")
## # A tibble: 10 × 5
## stock_symbol company date close volume
## <chr> <chr> <date> <dbl> <dbl>
## 1 AAPL Apple Inc. 2020-06-03 81.3 1.04e8
## 2 CSCO Cisco Systems, Inc. 2018-07-27 42.6 2.21e7
## 3 CSCO Cisco Systems, Inc. 2018-06-27 42.3 2.08e7
## 4 GOOGL Alphabet Inc. 2014-07-02 29.5 2.39e7
## 5 IBM International Business Machines Corpor… 2010-11-03 138. 5.42e6
## 6 NFLX Netflix, Inc. 2016-04-06 105. 9.61e6
## 7 NVDA NVIDIA Corporation 2011-04-06 4.36 8.57e7
## 8 ORCL Oracle Corporation 2018-05-29 46.3 1.24e7
## 9 ORCL Oracle Corporation 2018-09-06 47.7 1.85e7
## 10 TSLA Tesla, Inc. 2010-08-03 1.46 1.85e7
Describe the resulting data:
How is it different from the original two datasets? A left join keeps all rows from the left table If a company doesn’t appear in the stock data, its new columns show NA.
big_tech_companies_small %>%
left_join(big_tech_stock_small, by = "stock_symbol")
## # A tibble: 16 × 5
## stock_symbol company date close volume
## <chr> <chr> <date> <dbl> <dbl>
## 1 AAPL Apple Inc. 2020-06-03 81.3 1.04e8
## 2 ADBE Adobe Inc. NA NA NA
## 3 AMZN Amazon.com, Inc. NA NA NA
## 4 CRM Salesforce, Inc. NA NA NA
## 5 CSCO Cisco Systems, Inc. 2018-07-27 42.6 2.21e7
## 6 CSCO Cisco Systems, Inc. 2018-06-27 42.3 2.08e7
## 7 GOOGL Alphabet Inc. 2014-07-02 29.5 2.39e7
## 8 IBM International Business Machines Corpo… 2010-11-03 138. 5.42e6
## 9 INTC Intel Corporation NA NA NA
## 10 META Meta Platforms, Inc. NA NA NA
## 11 MSFT Microsoft Corporation NA NA NA
## 12 NFLX Netflix, Inc. 2016-04-06 105. 9.61e6
## 13 NVDA NVIDIA Corporation 2011-04-06 4.36 8.57e7
## 14 ORCL Oracle Corporation 2018-05-29 46.3 1.24e7
## 15 ORCL Oracle Corporation 2018-09-06 47.7 1.85e7
## 16 TSLA Tesla, Inc. 2010-08-03 1.46 1.85e7
Describe the resulting data:
How is it different from the original two datasets? A right
join keeps all rows from the right table
(big_tech_stock_small)
and adds matching company information from the left table
big_tech_companies_small %>%
right_join(big_tech_stock_small, by = "stock_symbol")
## # A tibble: 10 × 5
## stock_symbol company date close volume
## <chr> <chr> <date> <dbl> <dbl>
## 1 AAPL Apple Inc. 2020-06-03 81.3 1.04e8
## 2 CSCO Cisco Systems, Inc. 2018-07-27 42.6 2.21e7
## 3 CSCO Cisco Systems, Inc. 2018-06-27 42.3 2.08e7
## 4 GOOGL Alphabet Inc. 2014-07-02 29.5 2.39e7
## 5 IBM International Business Machines Corpor… 2010-11-03 138. 5.42e6
## 6 NFLX Netflix, Inc. 2016-04-06 105. 9.61e6
## 7 NVDA NVIDIA Corporation 2011-04-06 4.36 8.57e7
## 8 ORCL Oracle Corporation 2018-05-29 46.3 1.24e7
## 9 ORCL Oracle Corporation 2018-09-06 47.7 1.85e7
## 10 TSLA Tesla, Inc. 2010-08-03 1.46 1.85e7
Describe the resulting data:
How is it different from the original two datasets?
big_tech_companies_small %>%
full_join(big_tech_stock_small, by = "stock_symbol")
## # A tibble: 16 × 5
## stock_symbol company date close volume
## <chr> <chr> <date> <dbl> <dbl>
## 1 AAPL Apple Inc. 2020-06-03 81.3 1.04e8
## 2 ADBE Adobe Inc. NA NA NA
## 3 AMZN Amazon.com, Inc. NA NA NA
## 4 CRM Salesforce, Inc. NA NA NA
## 5 CSCO Cisco Systems, Inc. 2018-07-27 42.6 2.21e7
## 6 CSCO Cisco Systems, Inc. 2018-06-27 42.3 2.08e7
## 7 GOOGL Alphabet Inc. 2014-07-02 29.5 2.39e7
## 8 IBM International Business Machines Corpo… 2010-11-03 138. 5.42e6
## 9 INTC Intel Corporation NA NA NA
## 10 META Meta Platforms, Inc. NA NA NA
## 11 MSFT Microsoft Corporation NA NA NA
## 12 NFLX Netflix, Inc. 2016-04-06 105. 9.61e6
## 13 NVDA NVIDIA Corporation 2011-04-06 4.36 8.57e7
## 14 ORCL Oracle Corporation 2018-05-29 46.3 1.24e7
## 15 ORCL Oracle Corporation 2018-09-06 47.7 1.85e7
## 16 TSLA Tesla, Inc. 2010-08-03 1.46 1.85e7
Describe the resulting data:
How is it different from the original two datasets? A semi
join keeps only the rows from the left table
(big_tech_companies_small)
that have a matching key in the right table
(big_tech_stock_small).
It does not add any new columns it only filters the
companies down to those that appear in the stock dataset.
This is different from inner joins because it only returns columns from
one table.
big_tech_companies_small %>%
semi_join(big_tech_stock_small, by = "stock_symbol")
## # A tibble: 8 × 2
## stock_symbol company
## <chr> <chr>
## 1 AAPL Apple Inc.
## 2 CSCO Cisco Systems, Inc.
## 3 GOOGL Alphabet Inc.
## 4 IBM International Business Machines Corporation
## 5 NFLX Netflix, Inc.
## 6 NVDA NVIDIA Corporation
## 7 ORCL Oracle Corporation
## 8 TSLA Tesla, Inc.
Describe the resulting data:
How is it different from the original two datasets? An anti
join returns the opposite of a semi join — it keeps rows from
the left table
(big_tech_companies_small) that do not
have a matching key in the right table
(big_tech_stock_small).
big_tech_companies_small %>%
anti_join(big_tech_stock_small, by = "stock_symbol")
## # A tibble: 6 × 2
## stock_symbol company
## <chr> <chr>
## 1 ADBE Adobe Inc.
## 2 AMZN Amazon.com, Inc.
## 3 CRM Salesforce, Inc.
## 4 INTC Intel Corporation
## 5 META Meta Platforms, Inc.
## 6 MSFT Microsoft Corporation