This analysis explores the Belkin Elago data and gathers some first insights into the relationships within the data set. As a first step, we therefore take a look at the descriptive statistics.

Descriptive statistics

summary(BelkinElagoComplete)
##      salary            age            elevel           car       
##  Min.   : 20000   Min.   :20.00   Min.   :1.000   Min.   : 1.00  
##  1st Qu.: 52109   1st Qu.:35.00   1st Qu.:2.000   1st Qu.: 5.00  
##  Median : 84969   Median :50.00   Median :2.000   Median :10.00  
##  Mean   : 84897   Mean   :49.81   Mean   :2.339   Mean   :10.47  
##  3rd Qu.:117168   3rd Qu.:65.00   3rd Qu.:3.000   3rd Qu.:16.00  
##  Max.   :150000   Max.   :80.00   Max.   :4.000   Max.   :20.00  
##     zipcode          credit         brand     
##  Min.   :0.000   Min.   :416.6   Belkin:4652  
##  1st Qu.:2.000   1st Qu.:563.4   Elago :5348  
##  Median :4.000   Median :632.1                
##  Mean   :4.037   Mean   :632.1                
##  3rd Qu.:6.000   3rd Qu.:701.3                
##  Max.   :8.000   Max.   :849.0
summary(BelkinElagoComplete$salary)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   20000   52109   84969   84897  117168  150000
summary(BelkinElagoComplete$age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   20.00   35.00   50.00   49.81   65.00   80.00
summary(BelkinElagoComplete$credit)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   416.6   563.4   632.1   632.1   701.3   849.0

Exploring relationships between variables

With the first insights from the descriptive statistics in hand, the next steps examine the relationships between variables: first with plots and histograms, then by calculating correlations and variances, and finally with boxplots. But one after another :)

plot_num(BelkinData,bins=10)

cross_plot(data=BelkinData,input="education",target="brand")

correlation_table(data=BelkinData,target="salary")
##    Variable salary
## 1    salary   1.00
## 2    credit   0.87
## 3 education   0.71
## 4       age   0.01
## 5       zip  -0.01
## 6       car  -0.24

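The correlation table shows a strong positive relationship between salary and credit (0.87) and between salary and education (0.71), a weak negative relationship with car (-0.24), and practically none with age (0.01) or zip (-0.01). Moreover, the colorful histograms produced by plot_num show the distribution of each numeric attribute in the data set.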
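To make the correlation table less of a black box, here is a minimal base R sketch of the same idea. It assumes that correlation_table reports the Pearson coefficient of each numeric variable against the target. The data frame `toy` and its values are simulated placeholders, not the Belkin Elago data.

```r
# Toy stand-in for the real data: simulated values, NOT the Belkin Elago data.
set.seed(1)
n <- 200
toy <- data.frame(
  salary = runif(n, 20000, 150000),
  age    = sample(20:80, n, replace = TRUE)
)
# Let credit depend on salary plus noise so that the two are correlated.
toy$credit <- 400 + toy$salary / 400 + rnorm(n, sd = 30)

# Pearson correlation of every numeric column with the target "salary".
num_cols <- names(toy)[sapply(toy, is.numeric)]
r <- cor(toy[num_cols], toy$salary, method = "pearson")[, 1]

# Sort from the largest to the smallest coefficient, as in the table above.
corr_tab <- data.frame(Variable = names(r), salary = round(unname(r), 2))
corr_tab <- corr_tab[order(-corr_tab$salary), ]
rownames(corr_tab) <- NULL
print(corr_tab)
```

Swapping `toy` for the real data frame should give numbers comparable to the table above, provided the assumption about the Pearson coefficient holds.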

Using some Histograms

We start with simple histograms for education level, credit status and age.
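The plotting code is not part of this write-up, so the following is only a sketch of how such histograms could be drawn with base R `hist()`. The data frame `toy` holds simulated placeholder values, and the column names follow the summary output at the top.

```r
# Toy stand-in values (simulated); replace `toy` with the real data frame.
set.seed(2)
toy <- data.frame(
  elevel = sample(1:4, 500, replace = TRUE),
  credit = runif(500, 400, 850),
  age    = sample(20:80, 500, replace = TRUE)
)

# Draw the three histograms side by side.
op <- par(mfrow = c(1, 3))
# One bar per education level: breaks are centred on the integers 1 to 4.
h_elevel <- hist(toy$elevel, breaks = seq(0.5, 4.5, by = 1),
                 main = "Education level", xlab = "elevel")
h_credit <- hist(toy$credit, main = "Credit", xlab = "credit")
h_age    <- hist(toy$age, main = "Age", xlab = "age")
par(op)  # restore the previous plot layout
```

`hist()` also returns the bin counts invisibly, so `h_elevel$counts` can be inspected if the exact numbers behind a bar are needed.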

Checking out the attribute “car”

Next, we get to know the attribute “car”. To do so, we count the number of observations in our data set for each value the attribute takes.

##    var frequency percentage cumulative_perc
## 1    3       841       8.41            8.41
## 2   20       799       7.99           16.40
## 3    8       749       7.49           23.89
## 4    7       722       7.22           31.11
## 5   15       636       6.36           37.47
## 6    1       634       6.34           43.81
## 7   18       533       5.33           49.14
## 8    5       445       4.45           53.59
## 9   13       443       4.43           58.02
## 10  10       429       4.29           62.31
## 11  12       423       4.23           66.54
## 12  14       423       4.23           70.77
## 13  19       423       4.23           75.00
## 14   4       420       4.20           79.20
## 15  17       420       4.20           83.40
## 16  16       409       4.09           87.49
## 17   6       395       3.95           91.44
## 18  11       312       3.12           94.56
## 19   2       280       2.80           97.36
## 20   9       264       2.64          100.00
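A table with the same four columns can be put together in base R with `table()` and `cumsum()`. This is a sketch on a small hand-made vector (`car` below is hypothetical, not the real column), and it is not necessarily how the table above was generated.

```r
# Hand-made stand-in for the car column (hypothetical values).
car <- c(3, 3, 3, 20, 20, 8, 8, 7, 1, 3)

# Count each distinct value and sort by descending frequency.
counts <- sort(table(car), decreasing = TRUE)

freq_tab <- data.frame(
  var        = names(counts),
  frequency  = as.vector(counts),
  percentage = round(100 * as.vector(counts) / length(car), 2)
)
# Running total of the percentages, top row first.
freq_tab$cumulative_perc <- cumsum(freq_tab$percentage)
print(freq_tab)
```

Because the rows are sorted by frequency, the cumulative column shows directly how many of the most common values are needed to cover a given share of the observations.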

Since the values themselves have no meaningful numeric interpretation (it is not plausible that 7.99% of the people have 20 cars), the most plausible conclusion is that each number stands for a specific car type. For example, 20 = VW Golf, a very popular car, and 9 = Porsche 911, a maybe not so common car :)

We are now going to explore the relationship between salary and brand choice.

Exploring the relationship between salary and brand choice

The following plots highlight the relationship between an individual's salary and their brand choice.

Elago appears to be somewhat more popular among well-educated, high-earning individuals. We can also see that age has no notable effect on salary, whereas education level does.
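A claim like "more popular among the well educated" can also be checked numerically by looking at the brand share within each education level. The sketch below does this with `table()` and `prop.table()`. The data frame `toy` is hand-made for illustration only, so its shares say nothing about the real data.

```r
# Hand-made illustration data (hypothetical, not the Belkin Elago data).
toy <- data.frame(
  elevel = c(1, 1, 2, 2, 3, 3, 4, 4),
  brand  = c("Belkin", "Belkin", "Belkin", "Elago",
             "Belkin", "Elago", "Elago", "Elago")
)

# Cross-tabulate education level against brand.
counts <- table(elevel = toy$elevel, brand = toy$brand)

# margin = 1 normalises each row, giving the brand share per education level.
shares <- round(100 * prop.table(counts, margin = 1), 1)
print(shares)
```

Each row sums to 100, so the rows can be compared directly even when the education levels differ in size.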

Explaining patterns in credit status

As a next step, we take the credit variable into account to explore whether credit status is affected by age, salary and education.

Not surprisingly, credit status depends heavily on education level and salary. Age, however, does not seem to have any relevance in this case.

Explaining more about patterns in brand choice

In the following, we will check whether age, credit status, salary and education affect the brand choice of individuals.

Two different plots are used for this purpose. The second one shows age via a variable called “age.bins”, which represents age separated into 6 bins, as shown below.
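How the 6 bins were cut is not stated here. One simple way to get such a variable is base R `cut()`, sketched below under the assumption of six equal-width 10-year bins from 20 to 80 (the age range in the summary). The `age` vector is a hand-made example.

```r
# Hand-made example ages (hypothetical values within the observed 20-80 range).
age <- c(20, 29, 30, 45, 50, 64, 70, 80)

# ASSUMPTION: six equal-width 10-year bins; the original boundaries are unknown.
breaks <- seq(20, 80, by = 10)

# right = FALSE gives intervals closed on the left, e.g. [20,30);
# include.lowest = TRUE closes the last bin so that age 80 is kept: [70,80].
age.bins <- cut(age, breaks = breaks, include.lowest = TRUE, right = FALSE)
print(table(age.bins))
```

If the bins are meant to hold roughly the same number of people rather than span the same number of years, quantile-based breaks would be the alternative choice.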

Boxplot analysis

The first boxplot shows credit status as the target variable for each of the four education levels. The findings are not too surprising.
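A grouped boxplot of this kind can be drawn with the formula interface of base R `boxplot()`. The sketch below uses simulated placeholder data (`toy`, with an upward trend built in on purpose), so only the mechanics carry over to the real data set.

```r
# Simulated stand-in data: 50 observations per education level (NOT real data).
set.seed(3)
toy <- data.frame(elevel = rep(1:4, each = 50))
toy$credit <- 500 + 40 * toy$elevel + rnorm(200, sd = 30)

# One box per education level, credit on the y-axis.
bp <- boxplot(credit ~ elevel, data = toy,
              xlab = "Education level", ylab = "Credit",
              main = "Credit by education level")

# bp$stats holds the five box statistics per group: lower whisker,
# lower hinge, median, upper hinge, upper whisker (one column per level).
print(bp$stats)
```

The same call with `salary ~ brand` or `age ~ brand` would give boxplots split by brand instead.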

Finally, we use boxplots to investigate the distribution and range of salary and age, divided by the brands Belkin and Elago.