Week 9: Apply it to your data 8

1. Import your data

Import two related datasets from TidyTuesday Project.

big_tech_stock_prices <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-02-07/big_tech_stock_prices.csv')

## Rows: 45088 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): stock_symbol
## dbl  (6): open, high, low, close, adj_close, volume
## date (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

big_tech_companies <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-02-07/big_tech_companies.csv')

## Rows: 14 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): stock_symbol, company
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2. Make data small

Describe the two datasets:

Data1

Columns: Big tech prices
Rows: 10

Data 2

Columns:
Rows:

set.seed(1234)

big_tech_stock_small <- big_tech_stock_prices %>%
  select(stock_symbol, date, close, volume) %>%
  sample_n(10)

big_tech_companies_small <- big_tech_companies %>%
  select(stock_symbol, company)

big_tech_stock_small

## # A tibble: 10 × 4
##    stock_symbol date        close    volume
##    <chr>        <date>      <dbl>     <dbl>
##  1 ORCL         2018-05-29  46.3   12427800
##  2 ORCL         2018-09-06  47.7   18456100
##  3 TSLA         2010-08-03   1.46  18457500
##  4 CSCO         2018-07-27  42.6   22050900
##  5 NFLX         2016-04-06 105.     9605800
##  6 NVDA         2011-04-06   4.36  85698400
##  7 GOOGL        2014-07-02  29.5   23916000
##  8 CSCO         2018-06-27  42.3   20794800
##  9 IBM          2010-11-03 138.     5420477
## 10 AAPL         2020-06-03  81.3  104491200

big_tech_companies_small

## # A tibble: 14 × 2
##    stock_symbol company                                    
##    <chr>        <chr>                                      
##  1 AAPL         Apple Inc.                                 
##  2 ADBE         Adobe Inc.                                 
##  3 AMZN         Amazon.com, Inc.                           
##  4 CRM          Salesforce, Inc.                           
##  5 CSCO         Cisco Systems, Inc.                        
##  6 GOOGL        Alphabet Inc.                              
##  7 IBM          International Business Machines Corporation
##  8 INTC         Intel Corporation                          
##  9 META         Meta Platforms, Inc.                       
## 10 MSFT         Microsoft Corporation                      
## 11 NFLX         Netflix, Inc.                              
## 12 NVDA         NVIDIA Corporation                         
## 13 ORCL         Oracle Corporation                         
## 14 TSLA         Tesla, Inc.

3. inner_join

Describe the resulting data:

Columns:10
Rows:5

How is it different from the original two datasets? The joined dataset combines stock price information with company names.
Unlike the originals, this dataset only includes rows where the stock_symbol appears in both datasets, and it’s much smaller because we sampled just 10 rows.

big_tech_companies_small %>%
  inner_join(big_tech_stock_small, by = "stock_symbol")

## # A tibble: 10 × 5
##    stock_symbol company                                 date        close volume
##    <chr>        <chr>                                   <date>      <dbl>  <dbl>
##  1 AAPL         Apple Inc.                              2020-06-03  81.3  1.04e8
##  2 CSCO         Cisco Systems, Inc.                     2018-07-27  42.6  2.21e7
##  3 CSCO         Cisco Systems, Inc.                     2018-06-27  42.3  2.08e7
##  4 GOOGL        Alphabet Inc.                           2014-07-02  29.5  2.39e7
##  5 IBM          International Business Machines Corpor… 2010-11-03 138.   5.42e6
##  6 NFLX         Netflix, Inc.                           2016-04-06 105.   9.61e6
##  7 NVDA         NVIDIA Corporation                      2011-04-06   4.36 8.57e7
##  8 ORCL         Oracle Corporation                      2018-05-29  46.3  1.24e7
##  9 ORCL         Oracle Corporation                      2018-09-06  47.7  1.85e7
## 10 TSLA         Tesla, Inc.                             2010-08-03   1.46 1.85e7

4. left_join

Describe the resulting data:

Columns:16
Rows:5

How is it different from the original two datasets? A left join keeps all rows from the left table If a company doesn’t appear in the stock data, its new columns show NA.

big_tech_companies_small %>%
  left_join(big_tech_stock_small, by = "stock_symbol")

## # A tibble: 16 × 5
##    stock_symbol company                                date        close  volume
##    <chr>        <chr>                                  <date>      <dbl>   <dbl>
##  1 AAPL         Apple Inc.                             2020-06-03  81.3   1.04e8
##  2 ADBE         Adobe Inc.                             NA          NA    NA     
##  3 AMZN         Amazon.com, Inc.                       NA          NA    NA     
##  4 CRM          Salesforce, Inc.                       NA          NA    NA     
##  5 CSCO         Cisco Systems, Inc.                    2018-07-27  42.6   2.21e7
##  6 CSCO         Cisco Systems, Inc.                    2018-06-27  42.3   2.08e7
##  7 GOOGL        Alphabet Inc.                          2014-07-02  29.5   2.39e7
##  8 IBM          International Business Machines Corpo… 2010-11-03 138.    5.42e6
##  9 INTC         Intel Corporation                      NA          NA    NA     
## 10 META         Meta Platforms, Inc.                   NA          NA    NA     
## 11 MSFT         Microsoft Corporation                  NA          NA    NA     
## 12 NFLX         Netflix, Inc.                          2016-04-06 105.    9.61e6
## 13 NVDA         NVIDIA Corporation                     2011-04-06   4.36  8.57e7
## 14 ORCL         Oracle Corporation                     2018-05-29  46.3   1.24e7
## 15 ORCL         Oracle Corporation                     2018-09-06  47.7   1.85e7
## 16 TSLA         Tesla, Inc.                            2010-08-03   1.46  1.85e7

5. right_join

Describe the resulting data:

Columns:10
Rows:5

How is it different from the original two datasets? A right join keeps all rows from the right table (big_tech_stock_small)
and adds matching company information from the left table

big_tech_companies_small %>%
  right_join(big_tech_stock_small, by = "stock_symbol")

## # A tibble: 10 × 5
##    stock_symbol company                                 date        close volume
##    <chr>        <chr>                                   <date>      <dbl>  <dbl>
##  1 AAPL         Apple Inc.                              2020-06-03  81.3  1.04e8
##  2 CSCO         Cisco Systems, Inc.                     2018-07-27  42.6  2.21e7
##  3 CSCO         Cisco Systems, Inc.                     2018-06-27  42.3  2.08e7
##  4 GOOGL        Alphabet Inc.                           2014-07-02  29.5  2.39e7
##  5 IBM          International Business Machines Corpor… 2010-11-03 138.   5.42e6
##  6 NFLX         Netflix, Inc.                           2016-04-06 105.   9.61e6
##  7 NVDA         NVIDIA Corporation                      2011-04-06   4.36 8.57e7
##  8 ORCL         Oracle Corporation                      2018-05-29  46.3  1.24e7
##  9 ORCL         Oracle Corporation                      2018-09-06  47.7  1.85e7
## 10 TSLA         Tesla, Inc.                             2010-08-03   1.46 1.85e7

6. full_join

Describe the resulting data:

Columns:16
Rows:5

How is it different from the original two datasets?

big_tech_companies_small %>%
  full_join(big_tech_stock_small, by = "stock_symbol")

## # A tibble: 16 × 5
##    stock_symbol company                                date        close  volume
##    <chr>        <chr>                                  <date>      <dbl>   <dbl>
##  1 AAPL         Apple Inc.                             2020-06-03  81.3   1.04e8
##  2 ADBE         Adobe Inc.                             NA          NA    NA     
##  3 AMZN         Amazon.com, Inc.                       NA          NA    NA     
##  4 CRM          Salesforce, Inc.                       NA          NA    NA     
##  5 CSCO         Cisco Systems, Inc.                    2018-07-27  42.6   2.21e7
##  6 CSCO         Cisco Systems, Inc.                    2018-06-27  42.3   2.08e7
##  7 GOOGL        Alphabet Inc.                          2014-07-02  29.5   2.39e7
##  8 IBM          International Business Machines Corpo… 2010-11-03 138.    5.42e6
##  9 INTC         Intel Corporation                      NA          NA    NA     
## 10 META         Meta Platforms, Inc.                   NA          NA    NA     
## 11 MSFT         Microsoft Corporation                  NA          NA    NA     
## 12 NFLX         Netflix, Inc.                          2016-04-06 105.    9.61e6
## 13 NVDA         NVIDIA Corporation                     2011-04-06   4.36  8.57e7
## 14 ORCL         Oracle Corporation                     2018-05-29  46.3   1.24e7
## 15 ORCL         Oracle Corporation                     2018-09-06  47.7   1.85e7
## 16 TSLA         Tesla, Inc.                            2010-08-03   1.46  1.85e7

7. semi_join

Describe the resulting data:

Columns:8
Rows:2

How is it different from the original two datasets? A semi join keeps only the rows from the left table (big_tech_companies_small)
that have a matching key in the right table (big_tech_stock_small).
It does not add any new columns it only filters the companies down to those that appear in the stock dataset.
This is different from inner joins because it only returns columns from one table.

big_tech_companies_small %>%
  semi_join(big_tech_stock_small, by = "stock_symbol")

## # A tibble: 8 × 2
##   stock_symbol company                                    
##   <chr>        <chr>                                      
## 1 AAPL         Apple Inc.                                 
## 2 CSCO         Cisco Systems, Inc.                        
## 3 GOOGL        Alphabet Inc.                              
## 4 IBM          International Business Machines Corporation
## 5 NFLX         Netflix, Inc.                              
## 6 NVDA         NVIDIA Corporation                         
## 7 ORCL         Oracle Corporation                         
## 8 TSLA         Tesla, Inc.

8. anti_join

Describe the resulting data:

Columns:6
Rows:2

How is it different from the original two datasets? An anti join returns the opposite of a semi join — it keeps rows from the left table
(big_tech_companies_small) that do not have a matching key in the right table
(big_tech_stock_small).

big_tech_companies_small %>%
  anti_join(big_tech_stock_small, by = "stock_symbol")

## # A tibble: 6 × 2
##   stock_symbol company              
##   <chr>        <chr>                
## 1 ADBE         Adobe Inc.           
## 2 AMZN         Amazon.com, Inc.     
## 3 CRM          Salesforce, Inc.     
## 4 INTC         Intel Corporation    
## 5 META         Meta Platforms, Inc. 
## 6 MSFT         Microsoft Corporation

Week 9: Apply it to your data 8

Connor Sweeney

2025-10-21

1. Import your data

2. Make data small

3. inner_join

4. left_join

5. right_join

6. full_join

7. semi_join

8. anti_join