Fall 2024, Exam 1

Your name: Paul McCoy

Question 1 (35 points)

Fix the code below

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
t_sales <- tibble(
  id = 1:30,
  odd = id %% 2 == 1,
  region = rep(c('CA', 'WV', 'ID'), 10),
  year = rep(2010:2019,3),
  sales = ifelse(region == "CA", 2, 1) * id * 2.5
)

t_people <- tibble( 
  state = c('CA', 'WV', 'ID', NA),
  name = c('Bob', 'Sarah', 'Cash', 'Smith')
)

# Join the data
t <- t_sales %>% 
  filter(region=='ID') %>% 
  rename(state = region) %>% 
  inner_join(t_people, by = 'state') %>% 
  group_by(year, name) %>% 
  summarise(sales_sum = sum(sales), .groups = 'drop')
# Plot the results
ggplot(data = t) +
  geom_point(mapping = aes(color = name, y = sales_sum, x = year),
             size = 3) +
  geom_smooth(mapping = aes(color = name, y = sales_sum, x = year),
              se = FALSE) + 
  labs(title = 'Sales by Person',
       x = 'Year',
       y = 'Sales')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
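
The join step above can be checked on a smaller example. The sketch below uses hypothetical miniature tables (`sales_small` and `people_small` are not part of the exam data) to show that `inner_join()` keeps only rows whose key appears in both tables, and that `anti_join()` lists the people who have no matching sales rows.

```r
library(dplyr)
library(tibble)

# Hypothetical miniature tables, for illustration only
sales_small <- tibble(
  state = c('CA', 'WV', 'ID', 'ID'),
  sales = c(5, 5, 7.5, 15)
)
people_small <- tibble(
  state = c('CA', 'WV', 'ID', NA),
  name  = c('Bob', 'Sarah', 'Cash', 'Smith')
)

# inner_join() keeps only rows whose key value appears in both tables
joined <- inner_join(sales_small, people_small, by = 'state')

# anti_join() returns the people with no matching sales rows
unmatched <- anti_join(people_small, sales_small, by = 'state')

print(joined)
print(unmatched)
```

In this sketch the person with a missing `state` does not appear in the joined result, because no sales row carries a missing key.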

Question 2 (15 points)

Complete the following tasks with R functions.

# Create a vector with the numbers from 1 to 1,000
v1 <- 1:1000

# Create a new vector that multiplies each element by 1/4
v2 <- v1 / 4

# Create a new string that pulls out the first 8 characters of the class name
# string using a function.
s <- 'BUDA 451 Business Data Analytics'
s1 <- substring(s,1,8)

# Create a new string that pulls the numbers out of our class code using
# a function.
s2 <- str_extract(s, '[0-9]+')

# Turn the prior value into a number
s3 <- as.numeric(s2)
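
As a sanity check on the tasks above, the following sketch repeats them in base R only. The use of `substr()` and `regmatches()` with `regexpr()` is a swapped-in alternative to the tidyverse helpers, not the exam's required approach, and the `stopifnot()` checks are added purely for illustration.

```r
# Numbers from 1 to 1,000, then each element multiplied by 1/4
v1 <- 1:1000
v2 <- v1 / 4

s <- 'BUDA 451 Business Data Analytics'

# First 8 characters of the class name
s1 <- substr(s, 1, 8)

# First run of digits, kept as a character string
s2 <- regmatches(s, regexpr('[0-9]+', s))

# Convert the extracted string to a number
s3 <- as.numeric(s2)

# Illustrative checks on each intermediate result
stopifnot(length(v1) == 1000, v2[4] == 1)
stopifnot(s1 == 'BUDA 451')
stopifnot(is.character(s2), is.numeric(s3), s3 == 451)
```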

Q2a (35 points)

Create a new tibble that shows the number of days for each month and year. Then, show that data in a chart.

Hint: create a new tibble that is mutated to get the month and year, then summarise to find the number of days. Then plot the new tibble with a smooth geom (geom_smooth).

library(lubridate)

t_dates <- tibble(
  d = seq(as.Date('2000-01-01'), as.Date('2020-12-30'), by = 'day')
)

t_cal <- t_dates %>% 
  mutate(m=month(d),
         y=year(d)) %>% 
  group_by(y,m) %>% 
  summarise(num_of_days=n(), .groups='drop')
ggplot(data=t_cal,
       aes(m,num_of_days))+
  geom_point()+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
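
The per-month counts can be cross-checked without the tidyverse. The sketch below is a base-R alternative (an assumption for illustration, not the exam's intended solution) that builds the same date range, groups the dates by a year-month label, and counts the days in each group.

```r
# Same date range as the exam: 2000-01-01 through 2020-12-30
d <- seq(as.Date('2000-01-01'), as.Date('2020-12-30'), by = 'day')

# Group the dates by a 'YYYY-MM' label and count the days in each group
days_per_month <- vapply(split(d, format(d, '%Y-%m')), length, integer(1))

# Spot checks: a leap-year February, a regular February, and the final month
print(days_per_month[['2000-02']])
print(days_per_month[['2001-02']])
print(days_per_month[['2020-12']])
```

Because the range stops at December 30, the final month has one day fewer than a full December, which is worth keeping in mind when reading the chart.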

Q3 (15 points)

We are working to predict which companies our fund should purchase. The analysis will be based on a dataset of their historical performance.

Q3a. What type of prediction would we use to decide if we should invest (a yes or no decision)? 2-3 words

Answer: ?

Q3b. Which R functions and analyses should we use to measure our model’s performance? 1-2 sentences.

Answer: ?

Q3c. What metric would be most important if we want to prioritize not losing money on a deal? 1 word.

Answer: precision
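
To make the precision metric concrete, here is a minimal base-R sketch of one common way to evaluate a yes/no model: build a confusion matrix with `table()` and derive precision, recall, and accuracy from it. The `actual` and `predicted` vectors are hypothetical labels invented for illustration; they are not from any exam dataset.

```r
# Hypothetical invest / do-not-invest labels, for illustration only
lv <- c('no', 'yes')
actual    <- factor(c('yes', 'no', 'yes', 'yes', 'no', 'no', 'yes', 'no'), levels = lv)
predicted <- factor(c('yes', 'yes', 'yes', 'no', 'no', 'no', 'yes', 'no'), levels = lv)

# Confusion matrix: rows are predictions, columns are actual outcomes
cm <- table(predicted = predicted, actual = actual)

tp <- cm['yes', 'yes']  # predicted yes, actually yes
fp <- cm['yes', 'no']   # predicted yes, actually no (a losing deal)
fn <- cm['no', 'yes']   # predicted no, actually yes (a missed deal)

precision <- tp / (tp + fp)        # share of predicted 'yes' that were right
recall    <- tp / (tp + fn)        # share of actual 'yes' that were found
accuracy  <- sum(diag(cm)) / sum(cm)

print(cm)
print(c(precision = precision, recall = recall, accuracy = accuracy))
```

Precision falls as false positives rise, so a fund that wants to avoid money-losing deals would watch that number most closely.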