Lab 3.1

Erick Xavier Maldonado

INTRODUCTION

I loaded the “german” dataset and examined the credit amount, age, and gender variables. I then created visualizations to explore the relationship between age and credit amount, with gender mapped to color. Finally, I produced a polished scatterplot and published the analysis to RPubs as a revealjs presentation. The exercise built skills in data wrangling, visualization, and presentation, and covered an end-to-end workflow from loading data to creating and sharing insights.

Looked at the first 10 and last 10 rows

   Age Credit amount Gender
1   67          1169   male
2   22          5951 female
3   49          2096   male
4   45          7882   male
5   53          4870   male
6   35          9055   male
7   53          2835   male
8   35          6948   male
9   61          3059   male
10  28          5234   male
     Age Credit amount Gender
991   37          3565   male
992   34          1569   male
993   23          1936   male
994   30          3959   male
995   50          2390   male
996   31          1736 female
997   40          3857   male
998   38           804   male
999   23          1845   male
1000  27          4576   male
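
A minimal sketch of how this kind of first/last-rows check can be done in R. The full data file is not part of this report, so `german_sample` is a hypothetical stand-in built only from the 20 rows printed above; the object name and the choice of `head()`/`tail()` are assumptions, not necessarily the calls used in the lab.

```r
# Stand-in data: only the 20 rows printed above (assumption: the real
# object holds all 1000 rows and 9 variables).
german_sample <- data.frame(
  Age = c(67, 22, 49, 45, 53, 35, 53, 35, 61, 28,
          37, 34, 23, 30, 50, 31, 40, 38, 23, 27),
  `Credit amount` = c(1169, 5951, 2096, 7882, 4870, 9055, 2835, 6948, 3059, 5234,
                      3565, 1569, 1936, 3959, 2390, 1736, 3857, 804, 1845, 4576),
  Gender = factor(c("male", "female", rep("male", 13), "female", rep("male", 4))),
  check.names = FALSE  # keep the space in "Credit amount"
)
# Reuse the original row numbers so the output resembles the listing above.
rownames(german_sample) <- c(1:10, 991:1000)

cols <- c("Age", "Credit amount", "Gender")
first_rows <- head(german_sample[, cols], 10)  # first 10 rows
last_rows  <- tail(german_sample[, cols], 10)  # last 10 rows
print(first_rows)
print(last_rows)
```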

Get summary statistics

 Credit amount        Age       
 Min.   :  250   Min.   :19.00  
 1st Qu.: 1366   1st Qu.:27.00  
 Median : 2320   Median :33.00  
 Mean   : 3271   Mean   :35.55  
 3rd Qu.: 3972   3rd Qu.:42.00  
 Max.   :18424   Max.   :75.00  
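
Summary statistics like these can be produced with `summary()` on the two numeric columns. The sketch below again uses the hypothetical 20-row stand-in `german_sample`, so its numbers will not match the full-data summary printed above.

```r
# Stand-in data: the 20 rows shown earlier in this report (assumption).
german_sample <- data.frame(
  Age = c(67, 22, 49, 45, 53, 35, 53, 35, 61, 28,
          37, 34, 23, 30, 50, 31, 40, 38, 23, 27),
  `Credit amount` = c(1169, 5951, 2096, 7882, 4870, 9055, 2835, 6948, 3059, 5234,
                      3565, 1569, 1936, 3959, 2390, 1736, 3857, 804, 1845, 4576),
  check.names = FALSE  # keep the space in "Credit amount"
)

# Min, quartiles, median, mean, and max for each selected column.
credit_age_summary <- summary(german_sample[, c("Credit amount", "Age")])
print(credit_age_summary)
```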

Checked the structure of the data and created a frequency table for the Gender factor

'data.frame':   1000 obs. of  9 variables:
 $ Age             : num  67 22 49 45 53 35 53 35 61 28 ...
 $ Gender          : Factor w/ 2 levels "female","male": 2 1 2 2 2 2 2 2 2 2 ...
 $ Housing         : chr  "own" "own" "own" "free" ...
 $ Saving accounts : chr  NA "little" "little" "little" ...
 $ Checking account: chr  "little" "moderate" NA "little" ...
 $ Credit amount   : num  1169 5951 2096 7882 4870 ...
 $ Duration        : num  6 48 12 42 24 36 24 36 12 30 ...
 $ Purpose         : chr  "radio/TV" "radio/TV" "education" "furniture/equipment" ...
 $ Class Risk      : num  1 2 1 1 2 1 1 1 1 2 ...

female   male 
   310    690 
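
One way to get output of this shape is `str()` for the structure and `table()` for the factor counts. This is a sketch on the hypothetical 20-row stand-in `german_sample`, where Gender starts as character and is converted to a factor (an assumption about how the real column was prepared), so the counts differ from the 310/690 split above.

```r
# Stand-in data: the 20 rows shown earlier in this report (assumption).
german_sample <- data.frame(
  Age = c(67, 22, 49, 45, 53, 35, 53, 35, 61, 28,
          37, 34, 23, 30, 50, 31, 40, 38, 23, 27),
  Gender = c("male", "female", rep("male", 13), "female", rep("male", 4)),
  stringsAsFactors = FALSE
)

# Convert the character column to a factor; levels sort alphabetically,
# giving "female" then "male" as in the str() output above.
german_sample$Gender <- factor(german_sample$Gender)

str(german_sample)                             # column types and first values
gender_counts <- table(german_sample$Gender)   # frequency of each level
print(gender_counts)
```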

Played with a few versions of the chart following the protocols in Chapter 3.

Chart 1

Chart 2

Chart 3
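
The report does not say which plotting package produced these charts, so the following is only a sketch of the same idea (age on the x-axis, credit amount on the y-axis, gender mapped to color) using base R graphics as a stand-in. The colors, the title, and the 20-row `german_sample` data are assumptions for illustration.

```r
# Stand-in data: the 20 rows shown earlier in this report (assumption).
german_sample <- data.frame(
  Age = c(67, 22, 49, 45, 53, 35, 53, 35, 61, 28,
          37, 34, 23, 30, 50, 31, 40, 38, 23, 27),
  `Credit amount` = c(1169, 5951, 2096, 7882, 4870, 9055, 2835, 6948, 3059, 5234,
                      3565, 1569, 1936, 3959, 2390, 1736, 3857, 804, 1845, 4576),
  Gender = factor(c("male", "female", rep("male", 13), "female", rep("male", 4))),
  check.names = FALSE
)

# Map each gender level to a color (hypothetical palette).
point_colors <- c(female = "darkorange", male = "steelblue")
cols_by_row <- point_colors[as.character(german_sample$Gender)]

pdf(NULL)  # null device: the sketch runs without opening a window or writing a file
plot(german_sample$Age, german_sample[["Credit amount"]],
     col = cols_by_row, pch = 19,
     xlab = "Age", ylab = "Credit amount",
     main = "Credit amount vs. age by gender")
legend("topright", legend = names(point_colors),
       col = point_colors, pch = 19, title = "Gender")
invisible(dev.off())
```

Dropping the `pdf(NULL)` and `dev.off()` lines draws the plot on the default device instead.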

Selected Chart 2 as my final chart and interpreted its findings in text.

Chart 2

Interpretation of the findings

Based on the scatter plot of credit amount vs. age, there appears to be a weak positive correlation between age and credit amount overall. However, the relationship differs when males and females are viewed separately. For males, there appears to be a moderate positive correlation: older males tend to have higher credit amounts on average, and the trend looks roughly linear, with credit amount increasing steadily with age.

For females, the relationship is more complex, with no clear linear correlation. Credit amounts vary widely at every age: some females in their 20s and 30s have among the highest credit amounts, higher than those of older females, while other young females have very low amounts. Overall, the female points are more scattered across age.
In summary, age and credit amount appear positively correlated for males but not for females in this dataset. Older males tend to have higher credit amounts, but that pattern does not hold for females across age groups. This gender difference in the age-credit amount relationship is an interesting finding from this data.
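
The reading above is visual. One way to put numbers on it would be to compute the Pearson correlation overall and within each gender. This was not part of the original lab and is only a sketch. It runs on the hypothetical 20-row stand-in `german_sample`, which contains just two female rows, so the values it prints are not meaningful; the same calls would need to be run on the full 1000-row data.

```r
# Stand-in data: the 20 rows shown earlier in this report (assumption).
german_sample <- data.frame(
  Age = c(67, 22, 49, 45, 53, 35, 53, 35, 61, 28,
          37, 34, 23, 30, 50, 31, 40, 38, 23, 27),
  `Credit amount` = c(1169, 5951, 2096, 7882, 4870, 9055, 2835, 6948, 3059, 5234,
                      3565, 1569, 1936, 3959, 2390, 1736, 3857, 804, 1845, 4576),
  Gender = factor(c("male", "female", rep("male", 13), "female", rep("male", 4))),
  check.names = FALSE
)

# Pearson correlation between age and credit amount for all rows.
overall_cor <- cor(german_sample$Age, german_sample[["Credit amount"]])

# Same correlation computed separately within each gender.
cor_by_gender <- sapply(
  split(german_sample, german_sample$Gender),
  function(d) cor(d$Age, d[["Credit amount"]])
)

print(overall_cor)
print(cor_by_gender)
```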

END