You are thinking of launching a new digital magazine subscription service, similar to Netflix but for magazines. To assess the viability of this idea and identify potential key segments, we will look at data on magazine subscribers and non-subscribers. The dataset used in this assignment is the Magazine Subscription Data.
Start by loading in the correct packages and dataset.
install.packages("tidyverse")
## Installing package into '/home/rstudio-user/R/x86_64-pc-linux-gnu-library/3.6'
## (as 'lib' is unspecified)
install.packages("ggplot2")
## Installing package into '/home/rstudio-user/R/x86_64-pc-linux-gnu-library/3.6'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.3
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(ggplot2)
library(readr)
mag <- read_csv("Magazine Subscription Data.csv")
## Parsed with column specification:
## cols(
## age = col_double(),
## gender = col_character(),
## income = col_double(),
## kids = col_double(),
## ownHome = col_character(),
## subscribe = col_character(),
## Segment = col_character()
## )
If you recall, we learned about 5 verbs to use when coding in the tidyverse: filter, arrange, mutate, summarize, and group_by. Which of these functions could be used to display only people with 0 children? filter (it keeps the rows that match a condition, whereas mutate adds or changes columns).
Use this function so that you only see the people in the dataset with 0 children.
filter(mag, kids == 0)
## # A tibble: 121 x 7
## age gender income kids ownHome subscribe Segment
## <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr>
## 1 43.2 Male 44169. 0 ownYes subNo Suburb mix
## 2 28.5 Male 47245. 0 ownNo subNo Suburb mix
## 3 35.2 Female 52568. 0 ownYes subNo Suburb mix
## 4 47.6 Male 47918. 0 ownYes subNo Suburb mix
## 5 37.6 Female 65767. 0 ownNo subNo Suburb mix
## 6 42.0 Female 53127. 0 ownYes subNo Suburb mix
## 7 44.0 Female 41255. 0 ownYes subNo Suburb mix
## 8 44.5 Male 57363. 0 ownNo subNo Suburb mix
## 9 45.3 Female 65170. 0 ownNo subNo Suburb mix
## 10 42.3 Male 49675. 0 ownYes subNo Suburb mix
## # … with 111 more rows
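Because `kids` is read in as a number, the condition can compare against the number 0 rather than the string "0". The sketch below uses a small made-up tibble (`toy` and its values are hypothetical, not the assignment data) to show filter() with one numeric condition and with two conditions. Conditions separated by a comma are combined with AND, which is one way to narrow the result to actual subscribers if that is what is wanted.

```r
library(dplyr)

# Hypothetical toy data that borrows column names from `mag`
toy <- tibble(
  age       = c(25, 40, 61, 33),
  kids      = c(0, 2, 0, 1),
  subscribe = c("subYes", "subNo", "subNo", "subYes")
)

# Keep only rows with zero children (numeric comparison)
no_kids <- filter(toy, kids == 0)

# Comma-separated conditions are combined with AND
no_kids_subscribers <- filter(toy, kids == 0, subscribe == "subYes")

no_kids
no_kids_subscribers
```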
arrange(mag,age)
## # A tibble: 300 x 7
## age gender income kids ownHome subscribe Segment
## <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr>
## 1 19.3 Female 18593. 0 ownNo subNo Urban hip
## 2 20.7 Male 22517. 3 ownNo subNo Urban hip
## 3 21.0 Female 27244. 1 ownNo subNo Urban hip
## 4 21.2 Male 18419. 1 ownYes subYes Urban hip
## 5 21.4 Male 16646. 3 ownNo subNo Urban hip
## 6 21.5 Female 17083. 2 ownNo subNo Urban hip
## 7 21.8 Male 27807. 2 ownNo subYes Urban hip
## 8 22.1 Male 21107. 0 ownNo subNo Urban hip
## 9 22.2 Female 20222. 2 ownYes subYes Urban hip
## 10 22.3 Female 24541. 1 ownNo subNo Urban hip
## # … with 290 more rows
arrange(mag,desc(age))
## # A tibble: 300 x 7
## age gender income kids ownHome subscribe Segment
## <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr>
## 1 80.5 Male 82077. 0 ownYes subYes Travelers
## 2 78.2 Female 24604. 0 ownYes subNo Travelers
## 3 75.9 Female 23968. 0 ownYes subNo Travelers
## 4 71.9 Female 60279. 0 ownYes subYes Travelers
## 5 70.6 Male 48697. 0 ownNo subNo Travelers
## 6 68.1 Female 51535. 0 ownNo subNo Travelers
## 7 68.1 Female 25772. 0 ownYes subNo Travelers
## 8 68.1 Male 104312. 0 ownYes subNo Travelers
## 9 68.0 Female 69075. 0 ownNo subNo Travelers
## 10 66.9 Male 54061. 0 ownYes subNo Travelers
## # … with 290 more rows
filter(mag, gender=="Female") %>%
arrange(desc(income))
## # A tibble: 157 x 7
## age gender income kids ownHome subscribe Segment
## <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr>
## 1 54.9 Female 106430. 0 ownYes subNo Travelers
## 2 57.8 Female 105538. 0 ownYes subNo Travelers
## 3 66.4 Female 101174. 0 ownYes subNo Travelers
## 4 55.2 Female 96509. 0 ownYes subNo Travelers
## 5 47.8 Female 92431. 0 ownYes subNo Travelers
## 6 56.3 Female 91509. 0 ownYes subNo Travelers
## 7 53.7 Female 85770. 0 ownYes subNo Travelers
## 8 62.4 Female 82349. 0 ownYes subNo Travelers
## 9 37.3 Female 81042. 1 ownNo subNo Suburb mix
## 10 47.9 Female 79544. 1 ownYes subNo Suburb mix
## # … with 147 more rows
mag %>% group_by(Segment) %>% summarise(count = n())
## # A tibble: 4 x 2
## Segment count
## <chr> <int>
## 1 Moving up 70
## 2 Suburb mix 100
## 3 Travelers 80
## 4 Urban hip 50
segment_age <-
mag %>% group_by (Segment) %>% summarise(avg_age = mean(age))
head(segment_age)
## # A tibble: 4 x 2
## Segment avg_age
## <chr> <dbl>
## 1 Moving up 36.3
## 2 Suburb mix 39.9
## 3 Travelers 57.9
## 4 Urban hip 23.9
segment_age_kids <- mag %>%
  group_by(Segment) %>%
  summarise(avg_age = mean(age), avg_kids = mean(kids))
head(segment_age_kids)
## # A tibble: 4 x 3
## Segment avg_age avg_kids
## <chr> <dbl> <dbl>
## 1 Moving up 36.3 1.91
## 2 Suburb mix 39.9 1.92
## 3 Travelers 57.9 0
## 4 Urban hip 23.9 1.1
ggplot(data=segment_age_kids, aes(x=Segment, y=avg_age)) +
geom_bar(stat="identity")
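Since the goal is to find promising segments, the same group_by/summarise pattern can also report the share of subscribers in each group. The following is a minimal sketch on made-up data (the tibble `toy`, its segment labels "A" and "B", and the result name `segment_rates` are all hypothetical). It relies on the fact that mean() of a logical vector gives the proportion of TRUE values.

```r
library(dplyr)

# Hypothetical toy data, not the assignment data
toy <- tibble(
  Segment   = c("A", "A", "B", "B", "B", "B"),
  subscribe = c("subYes", "subNo", "subNo", "subNo", "subYes", "subNo")
)

segment_rates <- toy %>%
  group_by(Segment) %>%
  summarise(
    count    = n(),                          # rows per segment
    sub_rate = mean(subscribe == "subYes")   # share of subscribers
  )

segment_rates
```

Applied to `mag`, the same pipeline should give one row per segment, which may help compare segments by both size and subscription rate.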
10. Let’s look at what the distribution of income is like across our data. Make a histogram showing the distribution of income.
ggplot(mag, aes(x=income)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(mag, aes(x=income)) + geom_histogram(binwidth=10500)
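The message from stat_bin() is a prompt to choose the bin width deliberately rather than accept the default of 30 bins. One simple way to get a starting value is to divide the range of the variable by the number of bins you would like. The sketch below assumes made-up incomes and a target of 8 bins (both `toy` and `target_bins` are hypothetical choices, not part of the assignment).

```r
library(ggplot2)

# Hypothetical incomes, not the assignment data
toy <- data.frame(income = c(18000, 25000, 31000, 44000, 52000,
                             58000, 65000, 79000, 92000, 106000))

# Aim for roughly 8 bins across the observed range
target_bins <- 8
bw <- diff(range(toy$income)) / target_bins

p <- ggplot(toy, aes(x = income)) +
  geom_histogram(binwidth = bw)
p
```

The computed width is only a starting point; it is usually worth trying a few nearby round values and keeping the one that shows the shape of the distribution most clearly.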
ggplot(mag, aes(x=Segment, y=income)) +
geom_boxplot()
ggplot(mag, aes(x=gender, y=age)) +
geom_boxplot()
ggplot(mag, aes(x=age, y=income)) +
geom_point()
ggplot(mag, aes(x=age, y=income, color=gender)) +
geom_point()
color <- ggplot(mag, aes(x=age, y=income, color=gender)) +
geom_point()
color <- color + facet_wrap(~Segment, ncol=2)
color