Import two related datasets from TidyTuesday Project.
big_tech_stock_prices <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-02-07/big_tech_stock_prices.csv')## Rows: 45088 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): stock_symbol
## dbl  (6): open, high, low, close, adj_close, volume
## date (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.big_tech_companies <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-02-07/big_tech_companies.csv')## Rows: 14 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): stock_symbol, company
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.Describe the two datasets:
Data1
Data 2
set.seed(1234)
big_tech_stock_small <- big_tech_stock_prices %>%
  select(stock_symbol, date, close, volume) %>%
  sample_n(10)
big_tech_companies_small <- big_tech_companies %>%
  select(stock_symbol, company)
big_tech_stock_small## # A tibble: 10 × 4
##    stock_symbol date        close    volume
##    <chr>        <date>      <dbl>     <dbl>
##  1 ORCL         2018-05-29  46.3   12427800
##  2 ORCL         2018-09-06  47.7   18456100
##  3 TSLA         2010-08-03   1.46  18457500
##  4 CSCO         2018-07-27  42.6   22050900
##  5 NFLX         2016-04-06 105.     9605800
##  6 NVDA         2011-04-06   4.36  85698400
##  7 GOOGL        2014-07-02  29.5   23916000
##  8 CSCO         2018-06-27  42.3   20794800
##  9 IBM          2010-11-03 138.     5420477
## 10 AAPL         2020-06-03  81.3  104491200big_tech_companies_small## # A tibble: 14 × 2
##    stock_symbol company                                    
##    <chr>        <chr>                                      
##  1 AAPL         Apple Inc.                                 
##  2 ADBE         Adobe Inc.                                 
##  3 AMZN         Amazon.com, Inc.                           
##  4 CRM          Salesforce, Inc.                           
##  5 CSCO         Cisco Systems, Inc.                        
##  6 GOOGL        Alphabet Inc.                              
##  7 IBM          International Business Machines Corporation
##  8 INTC         Intel Corporation                          
##  9 META         Meta Platforms, Inc.                       
## 10 MSFT         Microsoft Corporation                      
## 11 NFLX         Netflix, Inc.                              
## 12 NVDA         NVIDIA Corporation                         
## 13 ORCL         Oracle Corporation                         
## 14 TSLA         Tesla, Inc.Describe the resulting data:
How is it different from the original two datasets? The joined
dataset combines stock price information with company names.
Unlike the originals, this dataset only includes rows where the
stock_symbol appears in both datasets, and it’s much smaller because we
sampled just 10 rows.
big_tech_companies_small %>%
  inner_join(big_tech_stock_small, by = "stock_symbol")## # A tibble: 10 × 5
##    stock_symbol company                                 date        close volume
##    <chr>        <chr>                                   <date>      <dbl>  <dbl>
##  1 AAPL         Apple Inc.                              2020-06-03  81.3  1.04e8
##  2 CSCO         Cisco Systems, Inc.                     2018-07-27  42.6  2.21e7
##  3 CSCO         Cisco Systems, Inc.                     2018-06-27  42.3  2.08e7
##  4 GOOGL        Alphabet Inc.                           2014-07-02  29.5  2.39e7
##  5 IBM          International Business Machines Corpor… 2010-11-03 138.   5.42e6
##  6 NFLX         Netflix, Inc.                           2016-04-06 105.   9.61e6
##  7 NVDA         NVIDIA Corporation                      2011-04-06   4.36 8.57e7
##  8 ORCL         Oracle Corporation                      2018-05-29  46.3  1.24e7
##  9 ORCL         Oracle Corporation                      2018-09-06  47.7  1.85e7
## 10 TSLA         Tesla, Inc.                             2010-08-03   1.46 1.85e7Describe the resulting data:
How is it different from the original two datasets? A left join keeps all rows from the left table If a company doesn’t appear in the stock data, its new columns show NA.
big_tech_companies_small %>%
  left_join(big_tech_stock_small, by = "stock_symbol")## # A tibble: 16 × 5
##    stock_symbol company                                date        close  volume
##    <chr>        <chr>                                  <date>      <dbl>   <dbl>
##  1 AAPL         Apple Inc.                             2020-06-03  81.3   1.04e8
##  2 ADBE         Adobe Inc.                             NA          NA    NA     
##  3 AMZN         Amazon.com, Inc.                       NA          NA    NA     
##  4 CRM          Salesforce, Inc.                       NA          NA    NA     
##  5 CSCO         Cisco Systems, Inc.                    2018-07-27  42.6   2.21e7
##  6 CSCO         Cisco Systems, Inc.                    2018-06-27  42.3   2.08e7
##  7 GOOGL        Alphabet Inc.                          2014-07-02  29.5   2.39e7
##  8 IBM          International Business Machines Corpo… 2010-11-03 138.    5.42e6
##  9 INTC         Intel Corporation                      NA          NA    NA     
## 10 META         Meta Platforms, Inc.                   NA          NA    NA     
## 11 MSFT         Microsoft Corporation                  NA          NA    NA     
## 12 NFLX         Netflix, Inc.                          2016-04-06 105.    9.61e6
## 13 NVDA         NVIDIA Corporation                     2011-04-06   4.36  8.57e7
## 14 ORCL         Oracle Corporation                     2018-05-29  46.3   1.24e7
## 15 ORCL         Oracle Corporation                     2018-09-06  47.7   1.85e7
## 16 TSLA         Tesla, Inc.                            2010-08-03   1.46  1.85e7Describe the resulting data:
How is it different from the original two datasets? A right
join keeps all rows from the right table
(big_tech_stock_small)
and adds matching company information from the left table
big_tech_companies_small %>%
  right_join(big_tech_stock_small, by = "stock_symbol")## # A tibble: 10 × 5
##    stock_symbol company                                 date        close volume
##    <chr>        <chr>                                   <date>      <dbl>  <dbl>
##  1 AAPL         Apple Inc.                              2020-06-03  81.3  1.04e8
##  2 CSCO         Cisco Systems, Inc.                     2018-07-27  42.6  2.21e7
##  3 CSCO         Cisco Systems, Inc.                     2018-06-27  42.3  2.08e7
##  4 GOOGL        Alphabet Inc.                           2014-07-02  29.5  2.39e7
##  5 IBM          International Business Machines Corpor… 2010-11-03 138.   5.42e6
##  6 NFLX         Netflix, Inc.                           2016-04-06 105.   9.61e6
##  7 NVDA         NVIDIA Corporation                      2011-04-06   4.36 8.57e7
##  8 ORCL         Oracle Corporation                      2018-05-29  46.3  1.24e7
##  9 ORCL         Oracle Corporation                      2018-09-06  47.7  1.85e7
## 10 TSLA         Tesla, Inc.                             2010-08-03   1.46 1.85e7Describe the resulting data:
How is it different from the original two datasets?
big_tech_companies_small %>%
  full_join(big_tech_stock_small, by = "stock_symbol")## # A tibble: 16 × 5
##    stock_symbol company                                date        close  volume
##    <chr>        <chr>                                  <date>      <dbl>   <dbl>
##  1 AAPL         Apple Inc.                             2020-06-03  81.3   1.04e8
##  2 ADBE         Adobe Inc.                             NA          NA    NA     
##  3 AMZN         Amazon.com, Inc.                       NA          NA    NA     
##  4 CRM          Salesforce, Inc.                       NA          NA    NA     
##  5 CSCO         Cisco Systems, Inc.                    2018-07-27  42.6   2.21e7
##  6 CSCO         Cisco Systems, Inc.                    2018-06-27  42.3   2.08e7
##  7 GOOGL        Alphabet Inc.                          2014-07-02  29.5   2.39e7
##  8 IBM          International Business Machines Corpo… 2010-11-03 138.    5.42e6
##  9 INTC         Intel Corporation                      NA          NA    NA     
## 10 META         Meta Platforms, Inc.                   NA          NA    NA     
## 11 MSFT         Microsoft Corporation                  NA          NA    NA     
## 12 NFLX         Netflix, Inc.                          2016-04-06 105.    9.61e6
## 13 NVDA         NVIDIA Corporation                     2011-04-06   4.36  8.57e7
## 14 ORCL         Oracle Corporation                     2018-05-29  46.3   1.24e7
## 15 ORCL         Oracle Corporation                     2018-09-06  47.7   1.85e7
## 16 TSLA         Tesla, Inc.                            2010-08-03   1.46  1.85e7Describe the resulting data:
How is it different from the original two datasets? A semi
join keeps only the rows from the left table
(big_tech_companies_small)
that have a matching key in the right table
(big_tech_stock_small).
It does not add any new columns it only filters the
companies down to those that appear in the stock dataset.
This is different from inner joins because it only returns columns from
one table.
big_tech_companies_small %>%
  semi_join(big_tech_stock_small, by = "stock_symbol")## # A tibble: 8 × 2
##   stock_symbol company                                    
##   <chr>        <chr>                                      
## 1 AAPL         Apple Inc.                                 
## 2 CSCO         Cisco Systems, Inc.                        
## 3 GOOGL        Alphabet Inc.                              
## 4 IBM          International Business Machines Corporation
## 5 NFLX         Netflix, Inc.                              
## 6 NVDA         NVIDIA Corporation                         
## 7 ORCL         Oracle Corporation                         
## 8 TSLA         Tesla, Inc.Describe the resulting data:
How is it different from the original two datasets? An anti
join returns the opposite of a semi join — it keeps rows from
the left table
(big_tech_companies_small) that do not
have a matching key in the right table
(big_tech_stock_small).
big_tech_companies_small %>%
  anti_join(big_tech_stock_small, by = "stock_symbol")## # A tibble: 6 × 2
##   stock_symbol company              
##   <chr>        <chr>                
## 1 ADBE         Adobe Inc.           
## 2 AMZN         Amazon.com, Inc.     
## 3 CRM          Salesforce, Inc.     
## 4 INTC         Intel Corporation    
## 5 META         Meta Platforms, Inc. 
## 6 MSFT         Microsoft Corporation