Credit Scoring star relation schema using erdplus. The data that is provided demonstrates variation between expenses and income. The amount of income and expenses create a variation in the amount of the overall savings, the dimensions provide information about an individual’s marital status, expenses, income, job and other important information.
mydata = read.csv(file="data/Scoring.csv")
head(mydata)
mydata= read.csv(file = "data/Scoring.csv")
mean(mydata$Income)-mean(mydata$Expenses)
[1] 85.02834
median(mydata$Income)-median(mydata$Expenses)
[1] 73
Below, I have extracted the Expenses column.
#Extracting the Expenses Column
Expenses = mydata$Expenses
#Calling the Expenses Column
Expenses
[1] 73 48 90 63 46 75 75 35 90 90 60 60
[13] 75 75 35 75 35 65 45 35 46 45 105 74
[25] 45 60 75 75 75 45 45 75 75 35 45 45
[37] 75 70 45 44 75 35 45 75 35 35 75 48
[49] 60 60 75 105 75 60 85 35 45 60 90 70
[61] 90 35 75 75 75 75 60 45 93 60 60 45
[73] 75 75 86 90 45 70 45 60 60 60 35 75
[85] 35 47 60 40 60 35 35 101 95 140 75 45
[97] 35 120 89 45 73 75 78 35 75 76 90 45
[109] 35 79 45 90 35 60 75 35 75 45 35 75
[121] 57 75 66 114 45 35 75 35 90 75 45 75
[133] 60 65 75 60 45 60 45 47 60 90 75 119
[145] 35 105 60 35 45 60 35 45 60 35 75 78
[157] 35 45 75 90 75 35 60 57 60 45 75 42
[169] 75 45 77 75 74 75 35 35 60 75 45 45
[181] 113 75 75 75 75 75 60 78 46 45 120 75
[193] 90 45 60 75 87 126 45 35 90 45 75 75
[205] 60 45 35 75 105 75 75 35 35 45 45 60
[217] 60 45 90 105 60 60 60 35 35 75 60 45
[229] 35 35 100 56 45 57 45 62 45 45 90 90
[241] 90 60 35 45 35 35 45 45 45 45 73 94
[253] 105 60 60 77 60 88 45 93 60 105 60 75
[265] 35 45 75 60 60 60 60 44 60 35 35 75
[277] 56 35 45 75 35 60 60 60 75 60 45 75
[289] 60 75 60 90 73 60 90 45 45 45 45 35
[301] 35 75 90 59 87 35 60 90 85 75 60 60
[313] 96 45 45 45 45 45 87 60 35 90 35 71
[325] 75 45 75 60 75 60 75 90 35 45 75 90
[337] 60 45 60 173 60 60 60 60 90 60 110 45
[349] 70 60 75 60 35 45 35 75 75 45 45 75
[361] 35 75 35 90 60 113 60 75 76 45 90 45
[373] 75 35 35 90 72 60 49 58 66 35 75 75
[385] 60 60 35 35 75 68 90 60 75 60 35 45
[397] 75 70 60 60 75 35 49 45 75 45 45 60
[409] 66 45 45 63 45 67 84 90 35 35 75 75
[421] 60 82 86 35 35 60 71 60 80 45 54 51
[433] 60 75 86 60 75 45 95 45 60 35 45 55
[445] 45 85 63 35 75 45 45 50 60 45 60 60
[457] 45 45 75 35 35 45 75 75 49 60 60 60
[469] 75 60 90 45 75 60 58 35 45 107 45 54
[481] 35 90 75 60 60 54 45 43 45 60 60 60
[493] 75 35 62 35 35 35 45 75 35 66 35 35
[505] 90 75 68 90 60 45 45 75 52 45 68 77
[517] 45 45 46 45 35 45 45 75 60 60 45 75
[529] 60 35 59 75 69 45 45 75 35 60 101 45
[541] 60 60 75 75 75 75 35 75 60 44 75 75
[553] 75 60 64 66 45 91 35 45 90 35 35 135
[565] 60 75 90 45 45 45 45 88 45 45 75 75
[577] 60 120 93 35 90 60 45 75 75 35 35 105
[589] 90 45 75 75 90 60 75 76 45 35 90 35
[601] 84 60 105 75 60 75 103 90 35 56 75 105
[613] 45 35 90 90 45 60 69 77 75 90 45 55
[625] 60 60 74 56 60 60 82 60 75 35 48 45
[637] 60 60 75 69 45 75 75 74 45 45 60 59
[649] 35 45 60 45 39 60 35 60 91 75 44 90
[661] 75 35 45 60 45 35 63 75 56 75 85 60
[673] 90 60 90 45 58 90 60 52 60 90 75 35
[685] 52 87 35 85 35 102 130 75 75 105 75 35
[697] 75 107 85 40 45 52 65 85 60 35 60 35
[709] 60 60 35 35 90 45 35 75 45 35 91 60
[721] 75 45 90 35 45 90 45 60 56 35 75 45
[733] 75 35 60 35 75 73 60 75 68 60 35 35
[745] 60 66 35 35 75 60 45 75 60 45 60 69
[757] 60 60 74 45 45 60 35 60 45 90 75 90
[769] 118 75 60 60 75 35 45 45 90 60 45 68
[781] 75 60 135 60 35 75 45 60 90 35 35 78
[793] 60 35 43 35 75 35 45 45 75 35 90 61
[805] 135 92 35 102 60 84 90 75 35 45 60 60
[817] 105 45 60 90 35 45 105 75 35 64 90 45
[829] 75 60 75 35 90 70 45 60 60 75 85 53
[841] 90 45 105 60 90 35 35 90 35 35 62 35
[853] 75 75 45 35 45 75 45 90 45 60 60 75
[865] 45 60 90 90 45 100 45 75 75 97 90 60
[877] 45 90 35 75 72 70 60 35 105 60 75 35
[889] 55 35 75 45 58 74 60 45 60 35 51 45
[901] 35 90 35 60 60 108 56 90 78 58 74 75
[913] 60 60 88 75 35 75 35 45 35 45 35 90
[925] 35 60 60 45 77 45 45 82 75 35 45 75
[937] 75 45 60 45 35 75 60 42 78 35 35 45
[949] 75 60 59 75 102 35 35 75 35 75 75 93
[961] 45 88 35 69 45 75 90 75 35 57 35 103
[973] 90 35 35 60 35 45 35 97 45 35 75 45
[985] 60 35 60 75 75 45 60 35 60 60 113 60
[997] 45 35 124 75
[ reached getOption("max.print") -- omitted 3446 entries ]
Now,I will extract and call the Income column.
#Extracting the Income Column
income=mydata$Income
#Calling the Income Column
income
[1] 129 131 200 182 107 214 125 80 107 80 125 121
[13] 199 170 50 131 330 200 130 137 107 324 112 140
[25] 143 130 180 251 85 150 122 198 150 170 119 208
[37] 115 99 120 90 137 230 142 71 120 233 289 128
[49] 150 145 90 301 200 150 100 100 100 155 715 245
[61] 150 70 190 152 126 181 185 170 176 238 115 200
[73] 411 93 108 500 45 250 100 70 150 70 263 200
[85] 78 120 125 50 146 70 105 413 500 350 200 138
[97] 80 208 137 58 130 123 180 140 315 164 325 135
[109] 109 185 217 300 77 253 101 200 124 143 250 135
[121] 115 160 214 390 500 95 200 85 214 140 60 180
[133] 300 200 242 155 100 105 166 120 115 350 214 442
[145] 101 122 250 90 160 300 83 200 60 205 133 179
[157] 69 195 112 210 155 394 149 120 400 165 125 74
[169] 86 185 165 110 138 300 147 348 112 350 110 75
[181] 230 85 210 125 149 201 105 183 113 126 300 160
[193] 160 89 95 125 120 359 80 67 148 298 318 185
[205] 39 194 80 147 156 178 130 63 88 140 115 83
[217] 144 200 172 200 318 177 133 150 208 145 157 190
[229] 86 90 100 214 100 117 110 168 150 166 283 149
[241] 250 120 236 277 55 200 235 80 200 140 125 185
[253] 190 150 102 170 315 130 156 177 341 240 142 333
[265] 125 170 220 230 157 340 120 91 150 88 120 70
[277] 106 162 77 128 189 300 92 380 500 130 141 220
[289] 146 250 132 150 127 90 166 69 182 77 50 131
[301] 62 60 86 143 120 283 138 90 99 160 330 100
[313] 210 100 97 65 227 140 115 150 90 275 176 110
[325] 140 500 66 273 145 67 232 80 130 200 154 187
[337] 135 160 133 230 154 50 189 202 20 40 50 125
[349] 100 115 160 70 220 99 107 160 150 232 120 225
[361] 100 80 208 105 98 532 140 90 155 120 159 122
[373] 156 90 130 45 118 152 135 125 208 81 160 300
[385] 135 121 145 165 428 233 245 232 200 42 87 200
[397] 250 350 105 100 160 132 140 160 148 113 97 107
[409] 206 275 176 175 223 70 87 42 60 300 144 217
[421] 180 69 110 50 86 172 109 95 203 114 186 155
[433] 195 246 113 290 103 125 199 133 210 124 78 200
[445] 100 95 183 43 120 198 100 150 133 110 202 140
[457] 123 275 200 180 55 105 146 100 136 90 144 116
[469] 128 105 125 68 400 251 225 140 34 320 124 87
[481] 161 90 92 110 87 93 110 81 98 155 167 190
[493] 110 76 170 214 70 55 80 81 139 206 74 120
[505] 300 224 80 166 110 161 80 150 170 100 225 166
[517] 200 110 110 160 204 135 158 290 100 115 149 143
[529] 150 199 142 80 93 113 92 205 210 256 260 106
[541] 160 148 220 63 106 81 100 260 243 88 186 105
[553] 80 154 35 63 297 161 200 140 125 78 57 300
[565] 66 202 175 150 158 92 123 125 113 167 148 156
[577] 123 143 180 92 150 92 139 250 315 121 180 129
[589] 214 99 110 85 138 93 100 160 215 90 464 167
[601] 86 125 274 135 300 300 128 90 115 107 300 230
[613] 122 43 250 190 100 246 85 173 150 86 183 100
[625] 94 150 140 107 117 230 470 102 210 42 130 90
[637] 120 117 160 90 50 100 182 135 67 167 137 140
[649] 85 324 77 100 42 251 180 400 158 300 90 146
[661] 255 250 50 150 232 190 178 254 110 70 100 137
[673] 258 242 85 179 125 500 140 70 150 245 193 110
[685] 70 115 85 400 130 120 250 108 225 300 250 150
[697] 156 319 100 51 145 168 54 100 115 209 180 63
[709] 130 170 157 79 64 99 35 150 65 100 160 215
[721] 112 126 150 136 123 80 72 110 111 60 184 104
[733] 290 173 125 60 160 126 148 170 230 166 67 190
[745] 80 60 91 96 180 130 67 100 830 125 30 237
[757] 152 95 139 145 300 135 80 100 275 250 125 100
[769] 130 131 350 538 276 107 105 156 110 260 34 81
[781] 76 98 174 132 105 300 117 119 140 70 265 180
[793] 62 189 75 70 188 150 124 113 98 184 195 159
[805] 959 170 28 268 155 240 500 135 225 100 190 200
[817] 154 210 145 210 102 91 75 150 120 36 129 125
[829] 156 72 166 75 125 250 128 155 459 360 100 80
[841] 250 105 300 137 110 127 56 75 52 100 170 150
[853] 188 175 64 110 186 140 200 135 130 200 33 69
[865] 90 175 197 125 170 247 250 130 207 65 180 250
[877] 129 159 116 88 120 100 100 113 80 150 52 71
[889] 99 63 166 108 134 138 110 160 176 70 60 130
[901] 176 130 108 144 150 183 108 200 180 130 140 141
[913] 127 117 130 240 425 800 80 95 71 135 47 250
[925] 150 118 130 125 171 265 100 69 293 87 110 75
[937] 40 140 122 161 70 65 66 73 178 140 50 78
[949] 60 350 140 125 121 155 133 117 125 136 25 176
[961] 430 125 79 92 176 85 188 300 100 120 60 125
[973] 464 176 90 200 415 318 163 72 156 49 110 140
[985] 92 74 145 104 400 95 150 160 177 85 380 107
[997] 134 81 191 100
[ reached getOption("max.print") -- omitted 3446 entries ]
I will now compute the mean
#Call mean Expenses
meanExpenses = mean(Expenses)
#Calling the average
meanExpenses
[1] 55.60144
#Call mean Income
meanIncome= mean(income)
meanIncome
[1] 140.6298
Next, I will compute the standard deviation or spread of both the Income and Expenses columns.
#Computing the standard deviation of Expenses
spreadExpenses = sd(Expenses)
spreadExpenses
[1] 19.52084
#Finding the standard deviation of Income
spreadIncome= sd(income)
spreadIncome
[1] 80.1779
Now, to compute the SNR, the signal to noise ratio, a formula is created because there is no built in function.
SNR is the mean, or average, divided by the spread.
#Compute the snr of Expenses and name it snr_Expenses
snr_Expenses = meanExpenses/spreadExpenses
#Call snr_Expenses
snr_Expenses
[1] 2.848312
#Finding the snr of the Income and name it snr_Income
snr_Income= meanIncome/spreadIncome
#Call snr_Saving
snr_Income
[1] 1.753972
Of the Income and Expenses, which has a higher SNR? Why do you think that is? After looking at the results I believe Expenses has a higher SNR. Expenses has a higher SNR because individuals spend the same amount of money regardless of their income.
#str(mydata)
summary(mydata)
Status Seniority Home Time Age Marital Records Job Expenses Income
bad :1249 Min. : 0.000 ignore : 20 Min. : 6.00 Min. :18.00 divorced : 38 no_rec :3677 fixed :2803 Min. : 35.0 Min. : 1.0
good:3197 1st Qu.: 2.000 other : 319 1st Qu.:36.00 1st Qu.:28.00 married :3238 yes_rec: 769 freelance:1021 1st Qu.: 35.0 1st Qu.: 90.0
Median : 5.000 owner :2106 Median :48.00 Median :36.00 separated: 130 others : 171 Median : 51.0 Median :124.0
Mean : 7.991 parents: 782 Mean :46.45 Mean :37.08 single : 973 partime : 451 Mean : 55.6 Mean :140.6
3rd Qu.:12.000 priv : 246 3rd Qu.:60.00 3rd Qu.:45.00 widow : 67 3rd Qu.: 72.0 3rd Qu.:170.0
Max. :48.000 rent : 973 Max. :72.00 Max. :68.00 Max. :180.0 Max. :959.0
Assets Debt Amount Price Finrat Savings
Min. : 0 Min. : 0.0 $1,000.00 : 541 $1,500.00 : 46 Min. : 6.702 Min. :-8.160
1st Qu.: 0 1st Qu.: 0.0 $1,200.00 : 221 $1,200.00 : 45 1st Qu.: 60.030 1st Qu.: 1.615
Median : 3000 Median : 0.0 $800.00 : 219 $1,300.00 : 45 Median : 77.097 Median : 3.120
Mean : 5355 Mean : 342.3 $1,100.00 : 210 $1,600.00 : 43 Mean : 72.616 Mean : 3.860
3rd Qu.: 6000 3rd Qu.: 0.0 $1,300.00 : 198 $1,100.00 : 41 3rd Qu.: 88.460 3rd Qu.: 5.196
Max. :300000 Max. :30000.0 $900.00 : 198 $1,700.00 : 39 Max. :100.000 Max. :33.250
(Other) :2859 (Other) :4187
median(mydata$Expenses)
[1] 51
median(mydata$Income)
[1] 124
median(mydata$Assets)
[1] 3000
median(mydata$Debt)
[1] 0
After using Watson Analytics I found different patterns in the data, some of the patterns are: What is the breakdown of Income by job and home. The second pattern shows comparison of Expenses by Sseniority.