In this lab, I choose two variables from the “german” dataset in the datasetsICR package. The purpose of the lab is to look at the relationship between these two variables. An interpretation is provided after completing the 7 steps.
Part 1
## THIS CODE LOADS THE GERMAN DATASET FROM THE datasetsICR PACKAGE AND PREVIEWS THE FIRST ROWS
library(datasetsICR)
data(german)
head(german)
Age Gender Housing Saving accounts Checking account Credit amount Duration
1 67 male own <NA> little 1169 6
2 22 female own little moderate 5951 48
3 49 male own little <NA> 2096 12
4 45 male free little little 7882 42
5 53 male free little little 4870 24
6 35 male free <NA> <NA> 9055 36
Purpose Class Risk
1 radio/TV 1
2 radio/TV 2
3 education 1
4 furniture/equipment 1
5 car 2
6 education 1
Part 1: Practice using pipes (dplyr) to summarize the data: Two categorical variables
For this lab, I am going to use the Housing and Purpose variables.
Use dplyr to summarize the data by the two categorical variables and get the frequency and percent
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
`summarise()` has grouped output by 'Housing'. You can override using the
`.groups` argument.
pip1
# A tibble: 22 × 5
# Groups: Housing [3]
Housing Purpose N freq pct
<chr> <chr> <int> <dbl> <dbl>
1 free business 5 0.0463 5
2 free car 55 0.509 51
3 free education 15 0.139 14
4 free furniture/equipment 11 0.102 10
5 free radio/TV 15 0.139 14
6 free repairs 3 0.0278 3
7 free vacation/others 4 0.0370 4
8 own business 76 0.107 11
9 own car 219 0.307 31
10 own domestic appliances 10 0.0140 1
# ℹ 12 more rows
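The pipeline that produced pip1 is not shown in this report. The sketch below is one way to build a table with the same columns (N, freq, pct), assuming the percentages are computed within each Housing group. That assumption is consistent with the output above, where the "free" rows sum to 108 and 5/108 gives 0.0463. The toy data frame and the name pip1_sketch are hypothetical stand-ins, not the real german data.

```r
library(dplyr)

# Hypothetical toy data standing in for the german dataset
toy <- data.frame(
  Housing = c("free", "free", "free", "own", "own", "own", "own", "rent"),
  Purpose = c("car", "car", "education", "car", "business", "business", "car", "car")
)

pip1_sketch <- toy %>%
  group_by(Housing, Purpose) %>%
  # count rows per Housing/Purpose pair; keep the result grouped by Housing
  summarise(N = n(), .groups = "drop_last") %>%
  # share of each Purpose within its Housing group, then a rounded percent
  mutate(freq = N / sum(N),
         pct = round(freq * 100, 0))

pip1_sketch
```

Setting .groups = "drop_last" makes the grouping explicit, so the "grouped output" message is not printed.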
Use subset() in the data argument of ggplot() to remove missing values
library(ggplot2)

p_title <- "Housing type and their purposes"
p_caption <- "german dataset"

# AS STACKED BAR CHART
p <- ggplot(data = subset(pip1, !is.na(Housing) & !is.na(Purpose)),
            aes(x = Housing, y = pct, fill = Purpose))
p + geom_col(position = "stack") +
  labs(x = "Housing", y = "Percent", fill = "Purpose",
       title = p_title, caption = p_caption,
       subtitle = "As a stacked bar chart") +
  geom_text(aes(label = pct), position = position_stack(vjust = .5))
Interpretation:
In the chart, the two variables combine in a way that can be interpreted, so the pairing makes sense. When the housing types and purposes are divided into percentages, the chart displays meaningful percentages, which would not be the case with other variables such as Credit amount. Because each type of housing can be associated with different reasons for borrowing, the Purpose variable fits well with the Housing data.
Part 2: Create stacked and dodged bar charts: Two Categorical Variables
p_title <- "Housing type and their purposes"
p_caption <- "german dataset"

# AS STACKED BAR CHART
p <- ggplot(data = subset(pip1, !is.na(Housing) & !is.na(Purpose)),
            aes(x = Housing, y = pct, fill = Purpose))
p + geom_col(position = "stack") +
  labs(x = "Housing", y = "Percent", fill = "Purpose",
       title = p_title, caption = p_caption,
       subtitle = "As a stacked bar chart") +
  geom_text(aes(label = pct), position = position_stack(vjust = .5))
# AS DODGED BAR CHART
p + geom_col(position = "dodge2") +
  labs(x = "Housing", y = "Percent", fill = "Purpose",
       title = p_title, caption = p_caption,
       subtitle = "As a dodged bar chart") +
  geom_text(aes(label = pct), position = position_dodge(width = .9))
# AS FACETED HORIZONTAL BAR CHART
p + geom_col(position = "dodge2") +
  labs(x = NULL, y = "Percent", fill = "Purpose",
       title = p_title, caption = p_caption,
       subtitle = "As a faceted horizontal bar chart") +
  guides(fill = "none") +
  coord_flip() +
  facet_grid(~ Housing) +
  geom_text(aes(label = pct), position = position_dodge2(width = 1))
Part 3: Practice using pipes (dplyr) to summarize data: Two Continuous Variables and One Categorical
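The code that builds pip2 is not shown in this report. Since the later plots use credit_mean, duration_mean, and Purpose, a plausible sketch is a grouped mean of the two continuous variables by Purpose. The toy data frame, its values, and the name pip2_sketch are hypothetical. The column name `Credit amount` (with a space) is assumed from the head() output.

```r
library(dplyr)

# Hypothetical toy data; check.names = FALSE keeps the space in `Credit amount`
toy <- data.frame(
  Purpose = c("car", "car", "education", "education", "radio/TV"),
  `Credit amount` = c(1000, 3000, 2000, 4000, 1500),
  Duration = c(12, 24, 6, 18, 10),
  check.names = FALSE
)

pip2_sketch <- toy %>%
  group_by(Purpose) %>%
  # one row per Purpose with the mean of each continuous variable
  summarise(credit_mean = mean(`Credit amount`, na.rm = TRUE),
            duration_mean = mean(Duration, na.rm = TRUE))

pip2_sketch
```

Backticks are needed around `Credit amount` because the name contains a space.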
Part 4: Create a scatterplot: Two Continuous Variables and One Categorical
p <- ggplot(pip2, aes(x = credit_mean, y = duration_mean, color = Purpose))
p + geom_point(size = 5) +
  annotate(geom = "text", x = 1.6, y = 58,
           label = "Let's see how the credit amount earned and the duration it has is for \n each type of purpose",
           hjust = 0) +
  # caption must be passed with "=", not "<-", or labs() will not use it as the caption
  labs(y = "Duration in months", x = "Credit amount",
       title = "Credit amount earned and duration",
       subtitle = "How the credit amount lasts depending on the type of purpose",
       caption = "german dataset {datasetsICR}")
Part 5: Legends and guides
p <- ggplot(pip2, aes(x = credit_mean, y = duration_mean, color = Purpose))
p + geom_point(size = 5) +
  annotate(geom = "text", x = 1.6, y = 58,
           label = "Let's see how the credit amount earned and the duration it has is for \n each type of purpose",
           hjust = 0) +
  # caption must be passed with "=", not "<-", or labs() will not use it as the caption
  labs(y = "Credit Duration in months", x = "Credit Amount",
       title = "Credit amount earned in Deutsche Mark and its duration",
       subtitle = "How the credit amount lasts depending on the type of purpose",
       caption = "german dataset {datasetsICR}")
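The plot above only changes axis labels and titles. As a hedged sketch of what adjusting the legend itself could look like, the example below renames the legend title, lays the keys out in one row, and moves the legend below the plot. The toy data frame, the legend title "Loan purpose", and the name p_legend are hypothetical and are not taken from the lab.

```r
library(ggplot2)

# Hypothetical toy summary with the same column names as pip2
toy <- data.frame(
  credit_mean = c(2000, 3000, 1500),
  duration_mean = c(18, 12, 10),
  Purpose = c("car", "education", "radio/TV")
)

p_legend <- ggplot(toy, aes(x = credit_mean, y = duration_mean, color = Purpose)) +
  geom_point(size = 5) +
  labs(color = "Loan purpose") +             # legend title
  guides(color = guide_legend(nrow = 1)) +   # keys in a single row
  theme(legend.position = "bottom")          # legend below the plot

p_legend
```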
Part 6: Data Labels
p <- ggplot(pip2, aes(x = credit_mean, y = duration_mean, color = Purpose))
p + geom_point(size = 5) +
  geom_text(mapping = aes(label = Purpose), hjust = 1.2, size = 3) +
  annotate(geom = "text", x = 1.6, y = 58,
           label = "Let's see how the credit amount earned and the duration it has is for \n each type of purpose.",
           hjust = 0) +
  # caption must be passed with "=", not "<-"; the color scale maps Purpose, not Housing
  labs(y = "Credit Duration", x = "Credit Amount",
       title = "Credit amount earned in Deutsche Mark and its duration",
       subtitle = "How the credit amount lasts depending on the type of purpose",
       caption = "german dataset {datasetsICR}",
       color = "Purpose") +
  theme(legend.position = "none")
Interpretation.
The graph depicts trends in average credit amount and duration in Germany based on the purpose of the loan. Vacation loans stand out, with the highest average credit amounts and longest average durations. The average vacation loan is around 30,000 Deutsche Marks paid over 20 months. In contrast, loans for domestic appliances have the lowest credit amounts. For items like furniture, cars, education, and home repairs, the credit amounts and durations fall between these two extremes, with only minor variations. Overall, the data indicates Germans are willing to take on more long-term debt to fund recreation and vacations compared to basic household needs. This suggests vacation time is a higher priority in German culture than material possessions. People appear comfortable meeting basic needs without loans, only utilizing longer-term financing for discretionary expenses like travel, which bring enjoyment over many months or years. The graph reflects German cultural values and priorities.