Read in Escarlet’s Travel Expense data that I put into a csv file.
travel <- read_csv("https://raw.githubusercontent.com/jacshap/Data607/refs/heads/main/Escarlet%20Travel%20Expense%20Data.csv", col_names = TRUE, show_col_types = FALSE)
## New names:
## • `` -> `...1`
knitr::kable(travel)
| …1 | Meals | Hotels | Transport | subtotals |
|---|---|---|---|---|
| San Jose | NA | NA | NA | NA |
| 25-Aug-97 | 37.74 | 112 | 45 | NA |
| 26-Aug-97 | 27.28 | 112 | 45 | NA |
| subtotals | 65.02 | 224 | 90 | 379.02 |
| Seattle | NA | NA | NA | NA |
| 27-Aug-97 | 96.25 | 109 | 36 | NA |
| 28-Aug-97 | 35.00 | 109 | 36 | NA |
| subtotals | 131.25 | 218 | 72 | 421.25 |
| Totals | 196.27 | 442 | 162 | 800.27 |
Get rid of subtotals column and rows of subtotals and Totals (rows 4, 8, and 9)
#rename first column
names(travel)[1] <- "Date"
#remove subtotals column
travel <- travel %>% select(Date, Meals, Hotels, Transport)
#remove rows
travel <- travel[-c(4,8,9),]
knitr::kable(travel)
| Date | Meals | Hotels | Transport |
|---|---|---|---|
| San Jose | NA | NA | NA |
| 25-Aug-97 | 37.74 | 112 | 45 |
| 26-Aug-97 | 27.28 | 112 | 45 |
| Seattle | NA | NA | NA |
| 27-Aug-97 | 96.25 | 109 | 36 |
| 28-Aug-97 | 35.00 | 109 | 36 |
There will be 2 San Jose days and 2 Seattle days. Delete those rows and make list of city and append as a column.
cities <- c('San Jose', 'San Jose', 'Seattle', 'Seattle')
travel<- travel[-c(1, 4),]
travel <- travel %>% mutate(City = cities)
Reorder to put Date first and City second
travel <- travel[,c(1,5,2,3,4)]
Combine Meals, Hotels, and Transport into one column
travel <- travel %>% pivot_longer(cols = 'Meals':'Transport',
names_to = 'Category',
values_to = "Amount"
)
knitr::kable(travel)
| Date | City | Category | Amount |
|---|---|---|---|
| 25-Aug-97 | San Jose | Meals | 37.74 |
| 25-Aug-97 | San Jose | Hotels | 112.00 |
| 25-Aug-97 | San Jose | Transport | 45.00 |
| 26-Aug-97 | San Jose | Meals | 27.28 |
| 26-Aug-97 | San Jose | Hotels | 112.00 |
| 26-Aug-97 | San Jose | Transport | 45.00 |
| 27-Aug-97 | Seattle | Meals | 96.25 |
| 27-Aug-97 | Seattle | Hotels | 109.00 |
| 27-Aug-97 | Seattle | Transport | 36.00 |
| 28-Aug-97 | Seattle | Meals | 35.00 |
| 28-Aug-97 | Seattle | Hotels | 109.00 |
| 28-Aug-97 | Seattle | Transport | 36.00 |
Suggested analysis is “If tidied, the data could be used to analyze average expenses by city, compare spending across categories, or track changes in daily travel costs.”
#can divide sum by 2 because 2 days each
travel %>% group_by(City) %>% select(City, Amount) %>% summarize(Total = sum(Amount)/2)
## # A tibble: 2 × 2
## City Total
## <chr> <dbl>
## 1 San Jose 190.
## 2 Seattle 211.
travel %>% group_by(Category) %>% summarize(Total = sum(Amount))
## # A tibble: 3 × 2
## Category Total
## <chr> <dbl>
## 1 Hotels 442
## 2 Meals 196.
## 3 Transport 162
travel %>% group_by(Category) %>% mutate(lagged = lag(Amount, n=1)) %>% mutate(Daily_Difference = Amount - lagged) %>% select(-lagged) %>% arrange(Category)
## # A tibble: 12 × 5
## # Groups: Category [3]
## Date City Category Amount Daily_Difference
## <chr> <chr> <chr> <dbl> <dbl>
## 1 25-Aug-97 San Jose Hotels 112 NA
## 2 26-Aug-97 San Jose Hotels 112 0
## 3 27-Aug-97 Seattle Hotels 109 -3
## 4 28-Aug-97 Seattle Hotels 109 0
## 5 25-Aug-97 San Jose Meals 37.7 NA
## 6 26-Aug-97 San Jose Meals 27.3 -10.5
## 7 27-Aug-97 Seattle Meals 96.2 69.0
## 8 28-Aug-97 Seattle Meals 35 -61.2
## 9 25-Aug-97 San Jose Transport 45 NA
## 10 26-Aug-97 San Jose Transport 45 0
## 11 27-Aug-97 Seattle Transport 36 -9
## 12 28-Aug-97 Seattle Transport 36 0