Exercises

  1. The dataset teengamb concerns a study of teenage gambling in Britain. Make a numerical and graphical summary of the data, commenting on any features that you find interesting. Limit the output you present to a quantity that a busy reader would find sufficient to get a basic understanding of the data.

Solution

Study of teenage gambling in Britain

A survey was conducted to study teenage gambling in Britain. The data frame contains the following columns:

sex 0=male, 1=female

status Socioeconomic status score based on parents’ occupation

income in pounds per week

verbal verbal score in words out of 12 correctly defined

gamble expenditure on gambling in pounds per year

The teengamb data set has 47 observations and 5 variables. There are 28 male and 19 female teenagers, and there are no missing values in the data set.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(GGally)
## Loading required package: ggplot2
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
# load the teengamb data set
teengamp <- read.table('teengamb.txt', header = TRUE)

# look at the first six rows
head(teengamp)
##   sex status income verbal gamble
## 1   1     51   2.00      8    0.0
## 2   1     28   2.50      8    0.0
## 3   1     37   2.00      6    0.0
## 4   1     28   7.00      4    7.3
## 5   1     65   2.00      8   19.6
## 6   1     61   3.47      6    0.1

There is not much background information in this data set, but we can compute basic statistics with the summary() function and inspect the structure with str(). We also tabulate the distribution of sex below.

# attach the data set so columns can be referenced by name
attach(teengamp)

summary(teengamp)
##       sex             status          income           verbal     
##  Min.   :0.0000   Min.   :18.00   Min.   : 0.600   Min.   : 1.00  
##  1st Qu.:0.0000   1st Qu.:28.00   1st Qu.: 2.000   1st Qu.: 6.00  
##  Median :0.0000   Median :43.00   Median : 3.250   Median : 7.00  
##  Mean   :0.4043   Mean   :45.23   Mean   : 4.642   Mean   : 6.66  
##  3rd Qu.:1.0000   3rd Qu.:61.50   3rd Qu.: 6.210   3rd Qu.: 8.00  
##  Max.   :1.0000   Max.   :75.00   Max.   :15.000   Max.   :10.00  
##      gamble     
##  Min.   :  0.0  
##  1st Qu.:  1.1  
##  Median :  6.0  
##  Mean   : 19.3  
##  3rd Qu.: 19.4  
##  Max.   :156.0
str(teengamp)
## 'data.frame':    47 obs. of  5 variables:
##  $ sex   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ status: int  51 28 37 28 65 61 28 27 43 18 ...
##  $ income: num  2 2.5 2 7 2 3.47 5.5 6.42 2 6 ...
##  $ verbal: int  8 8 6 4 8 6 7 5 6 7 ...
##  $ gamble: num  0 0 0 7.3 19.6 0.1 1.45 6.6 1.7 0.1 ...
# look at the distribution of sex

summary(factor(teengamp$sex))
##  0  1 
## 28 19
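
To back up the claim that there are no missing values, here is a quick check (a minimal sketch; colSums() reports the NA count per column):

# count missing values in each column (all should be 0)
colSums(is.na(teengamp))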

The distribution of income shows a wide range, indicating diverse economic backgrounds among the teenagers. The distribution of gambling expenditure is highly skewed: most teenagers spend very little, while a few spend much more. This suggests that gambling is not a common activity for most, but is significant for some.
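The skew is easy to quantify with deciles (a short sketch):

# deciles of gambling expenditure: most of the mass sits near zero
quantile(teengamp$gamble, probs = seq(0, 1, 0.1))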

library(ggplot2)
ggplot(teengamp, aes(x=income, y=gamble)) + geom_point(size =1) + facet_grid(~sex)

par(mfrow=c(2,3))

# Histograms for each numerical variable
hist(teengamp$income, main="Income Distribution", xlab="Income")
hist(teengamp$verbal, main="Verbal Score Distribution", xlab="Verbal Score")
hist(teengamp$gamble, main="Gambling Expenditure Distribution", xlab="Gambling Expenditure")

# Boxplots to compare gambling expenditure by sex and status
boxplot(gamble ~ sex, data=teengamp, main="Gambling Expenditure by Sex", xlab="Sex", ylab="Gambling Expenditure")
boxplot(gamble ~ status, data=teengamp, main="Gambling Expenditure by Status", xlab="Status", ylab="Gambling Expenditure")

library(GGally)
# pairwise plot matrix coloured by sex; the sex column itself is excluded
# (columns 2:5) so that within-group correlations are well defined
ggpairs(teengamp, columns = 2:5,
        mapping = aes(color = factor(sex), alpha = 0.5))
Looking at the pairs plot, we notice a positive correlation between income and gamble.
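This correlation can be checked directly, overall and within each sex (a sketch):

# correlation between income and gambling expenditure, overall and by sex
cor(teengamp$income, teengamp$gamble)
by(teengamp[, c('income', 'gamble')], teengamp$sex,
   function(d) cor(d$income, d$gamble))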

Exercise 1.3

The dataset prostate is from a study on 97 men with prostate cancer who were due to receive a radical prostatectomy. Make a numerical and graphical summary of the data as in the first question.

Solution

The prostate data set has 97 observations and 9 variables. The variables are:

lpsa: log PSA score

lcavol: log cancer volume

lweight: log prostate weight

age: age of patient

lbph: log of the amount of benign prostatic hyperplasia

svi: seminal vesicle invasion

lcp: log of capsular penetration

gleason: Gleason score

pgg45: percent of Gleason scores 4 or 5

The youngest patient in the study is 41 and the oldest is 79. Before modelling, it is important to understand the relationships between these variables and how the data are distributed.

prostate <- read.table('prostate.txt',header=T)
head(prostate)
##       lcavol lweight age      lbph svi      lcp gleason pgg45     lpsa
## 1 -0.5798185  2.7695  50 -1.386294   0 -1.38629       6     0 -0.43078
## 2 -0.9942523  3.3196  58 -1.386294   0 -1.38629       6     0 -0.16252
## 3 -0.5108256  2.6912  74 -1.386294   0 -1.38629       7    20 -0.16252
## 4 -1.2039728  3.2828  58 -1.386294   0 -1.38629       6     0 -0.16252
## 5  0.7514161  3.4324  62 -1.386294   0 -1.38629       6     0  0.37156
## 6 -1.0498221  3.2288  50 -1.386294   0 -1.38629       6     0  0.76547
str(prostate)
## 'data.frame':    97 obs. of  9 variables:
##  $ lcavol : num  -0.58 -0.994 -0.511 -1.204 0.751 ...
##  $ lweight: num  2.77 3.32 2.69 3.28 3.43 ...
##  $ age    : int  50 58 74 58 62 50 64 58 47 63 ...
##  $ lbph   : num  -1.39 -1.39 -1.39 -1.39 -1.39 ...
##  $ svi    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ lcp    : num  -1.39 -1.39 -1.39 -1.39 -1.39 ...
##  $ gleason: int  6 6 7 6 6 6 6 6 6 6 ...
##  $ pgg45  : int  0 0 20 0 0 0 0 0 0 0 ...
##  $ lpsa   : num  -0.431 -0.163 -0.163 -0.163 0.372 ...
summary(prostate)
##      lcavol           lweight           age             lbph        
##  Min.   :-1.3471   Min.   :2.375   Min.   :41.00   Min.   :-1.3863  
##  1st Qu.: 0.5128   1st Qu.:3.376   1st Qu.:60.00   1st Qu.:-1.3863  
##  Median : 1.4469   Median :3.623   Median :65.00   Median : 0.3001  
##  Mean   : 1.3500   Mean   :3.653   Mean   :63.87   Mean   : 0.1004  
##  3rd Qu.: 2.1270   3rd Qu.:3.878   3rd Qu.:68.00   3rd Qu.: 1.5581  
##  Max.   : 3.8210   Max.   :6.108   Max.   :79.00   Max.   : 2.3263  
##       svi              lcp             gleason          pgg45       
##  Min.   :0.0000   Min.   :-1.3863   Min.   :6.000   Min.   :  0.00  
##  1st Qu.:0.0000   1st Qu.:-1.3863   1st Qu.:6.000   1st Qu.:  0.00  
##  Median :0.0000   Median :-0.7985   Median :7.000   Median : 15.00  
##  Mean   :0.2165   Mean   :-0.1794   Mean   :6.753   Mean   : 24.38  
##  3rd Qu.:0.0000   3rd Qu.: 1.1786   3rd Qu.:7.000   3rd Qu.: 40.00  
##  Max.   :1.0000   Max.   : 2.9042   Max.   :9.000   Max.   :100.00  
##       lpsa        
##  Min.   :-0.4308  
##  1st Qu.: 1.7317  
##  Median : 2.5915  
##  Mean   : 2.4784  
##  3rd Qu.: 3.0564  
##  Max.   : 5.5829
attach(prostate)

par(mfrow = c(3,3)) # 3x3 grid of plots

# histogram for each variable
for (col in names(prostate)){
  hist(prostate[[col]], main = paste('Histogram of', col), xlab = col,
       col = 'lightblue', border = 'black')
}

# boxplot for each variable
for (col in names(prostate)) {
  boxplot(prostate[[col]], main = paste('Boxplot of', col), xlab = col,
          col = 'lightblue', border = 'black')
}

library(corrplot)
## corrplot 0.92 loaded
cor_matrix <- cor(prostate)

corrplot(cor_matrix, method = 'circle', type = 'upper', order = 'hclust',
         addCoef.col = 'black', tl.col = 'black', tl.srt = 45,
         title = 'Correlation Plot: Prostate Cancer', mar = c(0,0,1,0))

Histograms and boxplots reveal the distributions and potential outliers in the data, and the correlation plot shows how strongly the variables are related to each other. A scatter plot matrix (see the sketch below) helps visualize the pairwise relationships.
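No pairs plot was produced above, so here is a minimal base-R sketch:

# scatter plot matrix of all prostate variables
pairs(prostate, pch = 19, col = 'steelblue')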

The linear model below suggests that lcavol and lweight are significant predictors of lpsa, while age and gleason are not (p > 0.05 for both). The R-squared value of 0.5981 indicates that about 60% of the variability in lpsa is explained by the model, a reasonably good fit, and the F-statistic is highly significant (p-value < 2.2e-16).

lm_pros <- lm(lpsa ~ lcavol + age + lweight + gleason)
summary(lm_pros)
## 
## Call:
## lm(formula = lpsa ~ lcavol + age + lweight + gleason)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.48679 -0.46467 -0.00243  0.38421  1.95227 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.85636    1.04299  -0.821 0.413732    
## lcavol       0.64443    0.07367   8.748 9.82e-14 ***
## age         -0.01320    0.01125  -1.173 0.243783    
## lweight      0.58736    0.16506   3.559 0.000592 ***
## gleason      0.17211    0.12103   1.422 0.158393    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7476 on 92 degrees of freedom
## Multiple R-squared:  0.5981, Adjusted R-squared:  0.5806 
## F-statistic: 34.22 on 4 and 92 DF,  p-value: < 2.2e-16
plot(lm_pros)
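
plot() on an lm object produces four diagnostic plots; they are easier to compare in a 2x2 grid (a sketch):

# residual diagnostics side by side
par(mfrow = c(2, 2))
plot(lm_pros)
par(mfrow = c(1, 1))  # reset the plotting grid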

Exercise 1.4

The dataset sat comes from a study entitled “Getting What You Pay For: The Debate Over Equity in Public School Expenditures.” Make a numerical and graphical summary of the data as in the first question.

Solution

The sat data set has 50 observations (one per US state). After moving the state row names into a column, it has 8 variables:

State name of state

expend expenditure per pupil in average daily attendance in public elementary and secondary schools, 1994-95 (in thousands of dollars)

ratio average pupil/teacher ratio in public elementary and secondary schools, Fall 1994

salary estimated average annual salary of teachers in public elementary and secondary schools, 1994-95 (in thousands of dollars)

takers percentage of all eligible students taking the SAT, 1994-95

verbal average verbal SAT score, 1994-95

math average math SAT score, 1994-95

total average total SAT score, 1994-95

We convert the row names into a State column and then summarize the data with a correlation plot and a scatter plot matrix. The correlation plot shows a high correlation between teacher salary and expenditure per pupil. Sorting by math score (below) shows that North Dakota has the highest average math score (592), and it also has the highest average total score (1107), just ahead of Iowa (1099).

library(tibble)
sat <- read.table('sat.txt',header = T, sep='\t')
head(sat)
##            expend ratio salary takers verbal math total
## Alabama     4.405  17.2 31.144      8    491  538  1029
## Alaska      8.963  17.6 47.951     47    445  489   934
## Arizona     4.778  19.3 32.175     27    448  496   944
## Arkansas    4.459  17.1 28.934      6    482  523  1005
## California  4.992  24.0 41.078     45    417  485   902
## Colorado    5.443  18.4 34.571     29    462  518   980
# Changing the shape of the table 
new_df <- rownames_to_column(sat, var='State')
new_df
##             State expend ratio salary takers verbal math total
## 1         Alabama  4.405  17.2 31.144      8    491  538  1029
## 2          Alaska  8.963  17.6 47.951     47    445  489   934
## 3         Arizona  4.778  19.3 32.175     27    448  496   944
## 4        Arkansas  4.459  17.1 28.934      6    482  523  1005
## 5      California  4.992  24.0 41.078     45    417  485   902
## 6        Colorado  5.443  18.4 34.571     29    462  518   980
## 7     Connecticut  8.817  14.4 50.045     81    431  477   908
## 8        Delaware  7.030  16.6 39.076     68    429  468   897
## 9         Florida  5.718  19.1 32.588     48    420  469   889
## 10        Georgia  5.193  16.3 32.291     65    406  448   854
## 11         Hawaii  6.078  17.9 38.518     57    407  482   889
## 12          Idaho  4.210  19.1 29.783     15    468  511   979
## 13       Illinois  6.136  17.3 39.431     13    488  560  1048
## 14        Indiana  5.826  17.5 36.785     58    415  467   882
## 15           Iowa  5.483  15.8 31.511      5    516  583  1099
## 16         Kansas  5.817  15.1 34.652      9    503  557  1060
## 17       Kentucky  5.217  17.0 32.257     11    477  522   999
## 18      Louisiana  4.761  16.8 26.461      9    486  535  1021
## 19          Maine  6.428  13.8 31.972     68    427  469   896
## 20       Maryland  7.245  17.0 40.661     64    430  479   909
## 21  Massachusetts  7.287  14.8 40.795     80    430  477   907
## 22       Michigan  6.994  20.1 41.895     11    484  549  1033
## 23      Minnesota  6.000  17.5 35.948      9    506  579  1085
## 24    Mississippi  4.080  17.5 26.818      4    496  540  1036
## 25       Missouri  5.383  15.5 31.189      9    495  550  1045
## 26        Montana  5.692  16.3 28.785     21    473  536  1009
## 27       Nebraska  5.935  14.5 30.922      9    494  556  1050
## 28         Nevada  5.160  18.7 34.836     30    434  483   917
## 29  New Hampshire  5.859  15.6 34.720     70    444  491   935
## 30     New Jersey  9.774  13.8 46.087     70    420  478   898
## 31     New Mexico  4.586  17.2 28.493     11    485  530  1015
## 32       New York  9.623  15.2 47.612     74    419  473   892
## 33 North Carolina  5.077  16.2 30.793     60    411  454   865
## 34   North Dakota  4.775  15.3 26.327      5    515  592  1107
## 35           Ohio  6.162  16.6 36.802     23    460  515   975
## 36       Oklahoma  4.845  15.5 28.172      9    491  536  1027
## 37         Oregon  6.436  19.9 38.555     51    448  499   947
## 38   Pennsylvania  7.109  17.1 44.510     70    419  461   880
## 39   Rhode Island  7.469  14.7 40.729     70    425  463   888
## 40 South Carolina  4.797  16.4 30.279     58    401  443   844
## 41   South Dakota  4.775  14.4 25.994      5    505  563  1068
## 42      Tennessee  4.388  18.6 32.477     12    497  543  1040
## 43          Texas  5.222  15.7 31.223     47    419  474   893
## 44           Utah  3.656  24.3 29.082      4    513  563  1076
## 45        Vermont  6.750  13.8 35.406     68    429  472   901
## 46       Virginia  5.327  14.6 33.987     65    428  468   896
## 47     Washington  5.906  20.2 36.151     48    443  494   937
## 48  West Virginia  6.107  14.8 31.944     17    448  484   932
## 49      Wisconsin  6.930  15.9 37.746      9    501  572  1073
## 50        Wyoming  6.160  14.9 31.285     10    476  525  1001
str(new_df)
## 'data.frame':    50 obs. of  8 variables:
##  $ State : chr  "Alabama" "Alaska" "Arizona" "Arkansas" ...
##  $ expend: num  4.41 8.96 4.78 4.46 4.99 ...
##  $ ratio : num  17.2 17.6 19.3 17.1 24 18.4 14.4 16.6 19.1 16.3 ...
##  $ salary: num  31.1 48 32.2 28.9 41.1 ...
##  $ takers: int  8 47 27 6 45 29 81 68 48 65 ...
##  $ verbal: int  491 445 448 482 417 462 431 429 420 406 ...
##  $ math  : int  538 489 496 523 485 518 477 468 469 448 ...
##  $ total : int  1029 934 944 1005 902 980 908 897 889 854 ...
summary(new_df)
##     State               expend          ratio           salary     
##  Length:50          Min.   :3.656   Min.   :13.80   Min.   :25.99  
##  Class :character   1st Qu.:4.882   1st Qu.:15.22   1st Qu.:30.98  
##  Mode  :character   Median :5.768   Median :16.60   Median :33.29  
##                     Mean   :5.905   Mean   :16.86   Mean   :34.83  
##                     3rd Qu.:6.434   3rd Qu.:17.57   3rd Qu.:38.55  
##                     Max.   :9.774   Max.   :24.30   Max.   :50.05  
##      takers          verbal           math           total       
##  Min.   : 4.00   Min.   :401.0   Min.   :443.0   Min.   : 844.0  
##  1st Qu.: 9.00   1st Qu.:427.2   1st Qu.:474.8   1st Qu.: 897.2  
##  Median :28.00   Median :448.0   Median :497.5   Median : 945.5  
##  Mean   :35.24   Mean   :457.1   Mean   :508.8   Mean   : 965.9  
##  3rd Qu.:63.00   3rd Qu.:490.2   3rd Qu.:539.5   3rd Qu.:1032.0  
##  Max.   :81.00   Max.   :516.0   Max.   :592.0   Max.   :1107.0
attach(new_df)
## The following object is masked from teengamp:
## 
##     verbal
# the six states with the highest average math scores (tail() returns 6 rows)
new10 <- new_df %>% arrange(math) %>% tail()
new10
##           State expend ratio salary takers verbal math total
## 45 South Dakota  4.775  14.4 25.994      5    505  563  1068
## 46         Utah  3.656  24.3 29.082      4    513  563  1076
## 47    Wisconsin  6.930  15.9 37.746      9    501  572  1073
## 48    Minnesota  6.000  17.5 35.948      9    506  579  1085
## 49         Iowa  5.483  15.8 31.511      5    516  583  1099
## 50 North Dakota  4.775  15.3 26.327      5    515  592  1107
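
To confirm which state tops the total score, a direct check (a dplyr sketch):

# state with the highest average total SAT score
new_df %>% slice_max(total, n = 1)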
library(dplyr)
corr_matrix <- cor(new_df[,-1])

corrplot(corr_matrix, method = 'circle', type = 'upper', order = 'hclust',
         addCoef.col = 'black', tl.col = 'black', tl.srt = 45,
         title = 'Correlation Plot: SAT', mar = c(0,0,1,0))

# Scatter plot matrix
pairs(new_df[,-1], main = "Scatterplot Matrix", pch = 19, col = "blue")

Exercise 1.5

The dataset divusa contains data on divorces in the United States from 1920 to 1996. Make a numerical and graphical summary of the data as in the first question.

Solution

Divorce in the USA, 1920-1996

The divusa data frame records divorce rates in the USA from 1920 to 1996, with 77 observations on the following 7 variables:

year the year from 1920-1996

divorce divorce rate per 1000 women aged 15 or more

unemployed unemployment rate

femlab percent female participation in labor force aged 16+

marriage marriages per 1000 unmarried women aged 16+

birth births per 1000 women aged 15-44

military military personnel per 1000 population

# Read the dataset
df <- read.table("divusa.txt", header = TRUE, sep = "\t")
# Summary statistics
summary(df)
##       year         divorce        unemployed         femlab     
##  Min.   :1920   Min.   : 6.10   Min.   : 1.200   Min.   :22.70  
##  1st Qu.:1939   1st Qu.: 8.70   1st Qu.: 4.200   1st Qu.:27.47  
##  Median :1958   Median :10.60   Median : 5.600   Median :37.10  
##  Mean   :1958   Mean   :13.27   Mean   : 7.173   Mean   :38.58  
##  3rd Qu.:1977   3rd Qu.:20.30   3rd Qu.: 7.500   3rd Qu.:47.80  
##  Max.   :1996   Max.   :22.80   Max.   :24.900   Max.   :59.30  
##     marriage          birth           military     
##  Min.   : 49.70   Min.   : 65.30   Min.   : 1.940  
##  1st Qu.: 61.90   1st Qu.: 68.90   1st Qu.: 3.469  
##  Median : 74.10   Median : 85.90   Median : 9.102  
##  Mean   : 72.97   Mean   : 88.89   Mean   :12.365  
##  3rd Qu.: 80.00   3rd Qu.:107.30   3rd Qu.:14.266  
##  Max.   :118.10   Max.   :122.90   Max.   :86.641
str(df)
## 'data.frame':    77 obs. of  7 variables:
##  $ year      : int  1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 ...
##  $ divorce   : num  8 7.2 6.6 7.1 7.2 7.2 7.5 7.8 7.8 8 ...
##  $ unemployed: num  5.2 11.7 6.7 2.4 5 3.2 1.8 3.3 4.2 3.2 ...
##  $ femlab    : num  22.7 22.8 22.9 23 23.1 ...
##  $ marriage  : num  92 83 79.7 85.2 80.3 79.2 78.7 77 74.1 75.5 ...
##  $ birth     : num  118 120 111 110 111 ...
##  $ military  : num  3.22 3.56 2.46 2.21 2.29 ...
# Correlation matrix
cor(df[,-1])  # Exclude 'year' column
##                divorce unemployed      femlab   marriage      birth    military
## divorce     1.00000000 -0.2106019  0.91039698 -0.5342554 -0.7219242  0.01857483
## unemployed -0.21060188  1.0000000 -0.25746176 -0.2707630 -0.3138890 -0.40029295
## femlab      0.91039698 -0.2574618  1.00000000 -0.6486273 -0.6040949  0.05126339
## marriage   -0.53425537 -0.2707630 -0.64862728  1.0000000  0.6737273  0.25819826
## birth      -0.72192425 -0.3138890 -0.60409490  0.6737273  1.0000000  0.14089864
## military    0.01857483 -0.4002930  0.05126339  0.2581983  0.1408986  1.00000000
# Graphical summary: a quick look at the first rows before plotting
head(df)
##   year divorce unemployed femlab marriage birth military
## 1 1920     8.0        5.2  22.70     92.0 117.9   3.2247
## 2 1921     7.2       11.7  22.79     83.0 119.8   3.5614
## 3 1922     6.6        6.7  22.88     79.7 111.2   2.4553
## 4 1923     7.1        2.4  22.97     85.2 110.5   2.2065
## 5 1924     7.2        5.0  23.06     80.3 110.9   2.2889
## 6 1925     7.2        3.2  23.15     79.2 106.6   2.1735
# Convert the data to long format for easier plotting
df_long <- reshape2::melt(df, id.vars = "year")

# Plot histograms for each variable
ggplot(df_long, aes(x = value)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  facet_wrap(~variable, scales = "free") +
  labs(title = "Histograms of DivUSA Variables", x = "Value", y = "Frequency") +
  theme_minimal()

corr_matrix <- cor(df[-1])

corrplot(corr_matrix, method = 'circle', type = 'upper', order = 'hclust',
         addCoef.col = 'black', tl.col = 'black', tl.srt = 45,
         title = 'Correlation Plot: divusa', mar = c(0,0,1,0))

The histograms display the distribution of each variable and give better insight into the kind of data we have. Most notably, the divorce rate is strongly positively correlated with female labor-force participation (femlab, r ≈ 0.91).
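
Because both variables are annual series, the relationship is also visible over time (a minimal base-R sketch):

# divorce rate and female labor-force participation over time
par(mfrow = c(1, 2))
plot(df$year, df$divorce, type = 'l', xlab = 'Year', ylab = 'Divorces per 1000 women')
plot(df$year, df$femlab, type = 'l', xlab = 'Year', ylab = 'Female labor participation (%)')
par(mfrow = c(1, 1))  # reset the plotting grid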