Import wide text file

Read in Escarlet’s Travel Expense data that I put into a csv file.

travel <- read_csv("https://raw.githubusercontent.com/jacshap/Data607/refs/heads/main/Escarlet%20Travel%20Expense%20Data.csv", col_names = TRUE, show_col_types = FALSE) 
## New names:
## • `` -> `...1`
knitr::kable(travel)
…1 Meals Hotels Transport subtotals
San Jose NA NA NA NA
25-Aug-97 37.74 112 45 NA
26-Aug-97 27.28 112 45 NA
subtotals 65.02 224 90 379.02
Seattle NA NA NA NA
27-Aug-97 96.25 109 36 NA
28-Aug-97 35.00 109 36 NA
subtotals 131.25 218 72 421.25
Totals 196.27 442 162 800.27

Format to long

Get rid of subtotals column and rows of subtotals and Totals (rows 4, 8, and 9)

#rename first column
names(travel)[1] <- "Date"

#remove subtotals column
travel <- travel %>% select(Date, Meals, Hotels, Transport)

#remove rows
travel <- travel[-c(4,8,9),]

knitr::kable(travel)
Date Meals Hotels Transport
San Jose NA NA NA
25-Aug-97 37.74 112 45
26-Aug-97 27.28 112 45
Seattle NA NA NA
27-Aug-97 96.25 109 36
28-Aug-97 35.00 109 36

There will be 2 San Jose days and 2 Seattle days. Delete those rows and make list of city and append as a column.

cities <- c('San Jose', 'San Jose', 'Seattle', 'Seattle')

travel<- travel[-c(1, 4),]

travel <- travel %>% mutate(City = cities)

Reorder to put Date first and City second

travel <- travel[,c(1,5,2,3,4)]

Combine Meals, Hotels, and Transport into one column

travel <- travel %>% pivot_longer(cols = 'Meals':'Transport',
                               names_to = 'Category',
                               values_to = "Amount"
                               )

knitr::kable(travel)
Date City Category Amount
25-Aug-97 San Jose Meals 37.74
25-Aug-97 San Jose Hotels 112.00
25-Aug-97 San Jose Transport 45.00
26-Aug-97 San Jose Meals 27.28
26-Aug-97 San Jose Hotels 112.00
26-Aug-97 San Jose Transport 45.00
27-Aug-97 Seattle Meals 96.25
27-Aug-97 Seattle Hotels 109.00
27-Aug-97 Seattle Transport 36.00
28-Aug-97 Seattle Meals 35.00
28-Aug-97 Seattle Hotels 109.00
28-Aug-97 Seattle Transport 36.00

Analysis

Suggested analysis is “If tidied, the data could be used to analyze average expenses by city, compare spending across categories, or track changes in daily travel costs.”

Average expenses by city

#can divide sum by 2 because 2 days each
travel %>% group_by(City) %>% select(City, Amount) %>% summarize(Total = sum(Amount)/2)
## # A tibble: 2 × 2
##   City     Total
##   <chr>    <dbl>
## 1 San Jose  190.
## 2 Seattle   211.

Spending across categories

travel %>% group_by(Category) %>% summarize(Total = sum(Amount))
## # A tibble: 3 × 2
##   Category  Total
##   <chr>     <dbl>
## 1 Hotels     442 
## 2 Meals      196.
## 3 Transport  162

Changes in daily travel costs

travel %>% group_by(Category) %>% mutate(lagged = lag(Amount, n=1)) %>% mutate(Daily_Difference = Amount - lagged) %>% select(-lagged) %>% arrange(Category)
## # A tibble: 12 × 5
## # Groups:   Category [3]
##    Date      City     Category  Amount Daily_Difference
##    <chr>     <chr>    <chr>      <dbl>            <dbl>
##  1 25-Aug-97 San Jose Hotels     112               NA  
##  2 26-Aug-97 San Jose Hotels     112                0  
##  3 27-Aug-97 Seattle  Hotels     109               -3  
##  4 28-Aug-97 Seattle  Hotels     109                0  
##  5 25-Aug-97 San Jose Meals       37.7             NA  
##  6 26-Aug-97 San Jose Meals       27.3            -10.5
##  7 27-Aug-97 Seattle  Meals       96.2             69.0
##  8 28-Aug-97 Seattle  Meals       35              -61.2
##  9 25-Aug-97 San Jose Transport   45               NA  
## 10 26-Aug-97 San Jose Transport   45                0  
## 11 27-Aug-97 Seattle  Transport   36               -9  
## 12 28-Aug-97 Seattle  Transport   36                0