R-Studio is a free and open-source integrated development environment (IDE) for R, a programming language for statistical computing and graphics. The Credit Risk Data displays the credit risk of an individual based on the loan they have taken out and other features of the individual.
R studio is able to compute various statistical and graphical techniques, such as linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, time series plots, maps, etc.
After downloading the bdad_lab01 zip folder, make sure to open the folder in the downloads, right click it, and select ‘extract’. This will give you a new unzipped folder. Next, we must set this folder as the working directory. The way to do this is to open R Studio, go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Now, follow the directions to complete the lab.
To begin the Lab, examine the content of the csv file ‘creditrisk.csv’ by opening the file in RStudio. Create a simple star relational schema in erdplus stanalone feature https://erdplus.com/#/standalone, take a screenshot of the image, and upload it below.
To add a picture, use the directions found in Lab 0. Below is an example of what the simple star relational schema should look like.
Finally export the diagram as an image.
Next, read the csv file into R Studio. It can be useful to name your data to create a shortcut to it. Here we will label the data, ‘mydata’. To see the data in the console, one can ‘call’ it by referring to it by its given name.
mydata = read.csv(file="data/Scoring.csv")
head(mydata)
To capture, or extract, the checking and savings columns and perform some analytics on them, we must first be able to extract the columns from the data separately. Using the ‘$’ sign following the label for the data extracts a specific column. For convenience, we relabel the extracted data.
Below, we have extracted the checking column.
#Extracting the Income column
income = mydata$Income
#Calling the Income column
income
[1] 129 131 200 182 107 214 125 80 107 80 125 121 199 170 50
[16] 131 330 200 130 137 107 324 112 140 143 130 180 251 85 150
[31] 122 198 150 170 119 208 115 99 120 90 137 230 142 71 120
[46] 233 289 128 150 145 90 301 200 150 100 100 100 155 715 245
[61] 150 70 190 152 126 181 185 170 176 238 115 200 411 93 108
[76] 500 45 250 100 70 150 70 263 200 78 120 125 50 146 70
[91] 105 413 500 350 200 138 80 208 137 58 130 123 180 140 315
[106] 164 325 135 109 185 217 300 77 253 101 200 124 143 250 135
[121] 115 160 214 390 500 95 200 85 214 140 60 180 300 200 242
[136] 155 100 105 166 120 115 350 214 442 101 122 250 90 160 300
[151] 83 200 60 205 133 179 69 195 112 210 155 394 149 120 400
[166] 165 125 74 86 185 165 110 138 300 147 348 112 350 110 75
[181] 230 85 210 125 149 201 105 183 113 126 300 160 160 89 95
[196] 125 120 359 80 67 148 298 318 185 39 194 80 147 156 178
[211] 130 63 88 140 115 83 144 200 172 200 318 177 133 150 208
[226] 145 157 190 86 90 100 214 100 117 110 168 150 166 283 149
[241] 250 120 236 277 55 200 235 80 200 140 125 185 190 150 102
[256] 170 315 130 156 177 341 240 142 333 125 170 220 230 157 340
[271] 120 91 150 88 120 70 106 162 77 128 189 300 92 380 500
[286] 130 141 220 146 250 132 150 127 90 166 69 182 77 50 131
[301] 62 60 86 143 120 283 138 90 99 160 330 100 210 100 97
[316] 65 227 140 115 150 90 275 176 110 140 500 66 273 145 67
[331] 232 80 130 200 154 187 135 160 133 230 154 50 189 202 20
[346] 40 50 125 100 115 160 70 220 99 107 160 150 232 120 225
[361] 100 80 208 105 98 532 140 90 155 120 159 122 156 90 130
[376] 45 118 152 135 125 208 81 160 300 135 121 145 165 428 233
[391] 245 232 200 42 87 200 250 350 105 100 160 132 140 160 148
[406] 113 97 107 206 275 176 175 223 70 87 42 60 300 144 217
[421] 180 69 110 50 86 172 109 95 203 114 186 155 195 246 113
[436] 290 103 125 199 133 210 124 78 200 100 95 183 43 120 198
[451] 100 150 133 110 202 140 123 275 200 180 55 105 146 100 136
[466] 90 144 116 128 105 125 68 400 251 225 140 34 320 124 87
[481] 161 90 92 110 87 93 110 81 98 155 167 190 110 76 170
[496] 214 70 55 80 81 139 206 74 120 300 224 80 166 110 161
[511] 80 150 170 100 225 166 200 110 110 160 204 135 158 290 100
[526] 115 149 143 150 199 142 80 93 113 92 205 210 256 260 106
[541] 160 148 220 63 106 81 100 260 243 88 186 105 80 154 35
[556] 63 297 161 200 140 125 78 57 300 66 202 175 150 158 92
[571] 123 125 113 167 148 156 123 143 180 92 150 92 139 250 315
[586] 121 180 129 214 99 110 85 138 93 100 160 215 90 464 167
[601] 86 125 274 135 300 300 128 90 115 107 300 230 122 43 250
[616] 190 100 246 85 173 150 86 183 100 94 150 140 107 117 230
[631] 470 102 210 42 130 90 120 117 160 90 50 100 182 135 67
[646] 167 137 140 85 324 77 100 42 251 180 400 158 300 90 146
[661] 255 250 50 150 232 190 178 254 110 70 100 137 258 242 85
[676] 179 125 500 140 70 150 245 193 110 70 115 85 400 130 120
[691] 250 108 225 300 250 150 156 319 100 51 145 168 54 100 115
[706] 209 180 63 130 170 157 79 64 99 35 150 65 100 160 215
[721] 112 126 150 136 123 80 72 110 111 60 184 104 290 173 125
[736] 60 160 126 148 170 230 166 67 190 80 60 91 96 180 130
[751] 67 100 830 125 30 237 152 95 139 145 300 135 80 100 275
[766] 250 125 100 130 131 350 538 276 107 105 156 110 260 34 81
[781] 76 98 174 132 105 300 117 119 140 70 265 180 62 189 75
[796] 70 188 150 124 113 98 184 195 159 959 170 28 268 155 240
[811] 500 135 225 100 190 200 154 210 145 210 102 91 75 150 120
[826] 36 129 125 156 72 166 75 125 250 128 155 459 360 100 80
[841] 250 105 300 137 110 127 56 75 52 100 170 150 188 175 64
[856] 110 186 140 200 135 130 200 33 69 90 175 197 125 170 247
[871] 250 130 207 65 180 250 129 159 116 88 120 100 100 113 80
[886] 150 52 71 99 63 166 108 134 138 110 160 176 70 60 130
[901] 176 130 108 144 150 183 108 200 180 130 140 141 127 117 130
[916] 240 425 800 80 95 71 135 47 250 150 118 130 125 171 265
[931] 100 69 293 87 110 75 40 140 122 161 70 65 66 73 178
[946] 140 50 78 60 350 140 125 121 155 133 117 125 136 25 176
[961] 430 125 79 92 176 85 188 300 100 120 60 125 464 176 90
[976] 200 415 318 163 72 156 49 110 140 92 74 145 104 400 95
[991] 150 160 177 85 380 107 134 81 191 100
[ reached getOption("max.print") -- omitted 3446 entries ]
Now, fill in the code to extract and call the savings column.
#Extracting expenses column
expenses = mydata$Expenses
#Calling expenses column
expenses
[1] 73 48 90 63 46 75 75 35 90 90 60 60 75 75 35
[16] 75 35 65 45 35 46 45 105 74 45 60 75 75 75 45
[31] 45 75 75 35 45 45 75 70 45 44 75 35 45 75 35
[46] 35 75 48 60 60 75 105 75 60 85 35 45 60 90 70
[61] 90 35 75 75 75 75 60 45 93 60 60 45 75 75 86
[76] 90 45 70 45 60 60 60 35 75 35 47 60 40 60 35
[91] 35 101 95 140 75 45 35 120 89 45 73 75 78 35 75
[106] 76 90 45 35 79 45 90 35 60 75 35 75 45 35 75
[121] 57 75 66 114 45 35 75 35 90 75 45 75 60 65 75
[136] 60 45 60 45 47 60 90 75 119 35 105 60 35 45 60
[151] 35 45 60 35 75 78 35 45 75 90 75 35 60 57 60
[166] 45 75 42 75 45 77 75 74 75 35 35 60 75 45 45
[181] 113 75 75 75 75 75 60 78 46 45 120 75 90 45 60
[196] 75 87 126 45 35 90 45 75 75 60 45 35 75 105 75
[211] 75 35 35 45 45 60 60 45 90 105 60 60 60 35 35
[226] 75 60 45 35 35 100 56 45 57 45 62 45 45 90 90
[241] 90 60 35 45 35 35 45 45 45 45 73 94 105 60 60
[256] 77 60 88 45 93 60 105 60 75 35 45 75 60 60 60
[271] 60 44 60 35 35 75 56 35 45 75 35 60 60 60 75
[286] 60 45 75 60 75 60 90 73 60 90 45 45 45 45 35
[301] 35 75 90 59 87 35 60 90 85 75 60 60 96 45 45
[316] 45 45 45 87 60 35 90 35 71 75 45 75 60 75 60
[331] 75 90 35 45 75 90 60 45 60 173 60 60 60 60 90
[346] 60 110 45 70 60 75 60 35 45 35 75 75 45 45 75
[361] 35 75 35 90 60 113 60 75 76 45 90 45 75 35 35
[376] 90 72 60 49 58 66 35 75 75 60 60 35 35 75 68
[391] 90 60 75 60 35 45 75 70 60 60 75 35 49 45 75
[406] 45 45 60 66 45 45 63 45 67 84 90 35 35 75 75
[421] 60 82 86 35 35 60 71 60 80 45 54 51 60 75 86
[436] 60 75 45 95 45 60 35 45 55 45 85 63 35 75 45
[451] 45 50 60 45 60 60 45 45 75 35 35 45 75 75 49
[466] 60 60 60 75 60 90 45 75 60 58 35 45 107 45 54
[481] 35 90 75 60 60 54 45 43 45 60 60 60 75 35 62
[496] 35 35 35 45 75 35 66 35 35 90 75 68 90 60 45
[511] 45 75 52 45 68 77 45 45 46 45 35 45 45 75 60
[526] 60 45 75 60 35 59 75 69 45 45 75 35 60 101 45
[541] 60 60 75 75 75 75 35 75 60 44 75 75 75 60 64
[556] 66 45 91 35 45 90 35 35 135 60 75 90 45 45 45
[571] 45 88 45 45 75 75 60 120 93 35 90 60 45 75 75
[586] 35 35 105 90 45 75 75 90 60 75 76 45 35 90 35
[601] 84 60 105 75 60 75 103 90 35 56 75 105 45 35 90
[616] 90 45 60 69 77 75 90 45 55 60 60 74 56 60 60
[631] 82 60 75 35 48 45 60 60 75 69 45 75 75 74 45
[646] 45 60 59 35 45 60 45 39 60 35 60 91 75 44 90
[661] 75 35 45 60 45 35 63 75 56 75 85 60 90 60 90
[676] 45 58 90 60 52 60 90 75 35 52 87 35 85 35 102
[691] 130 75 75 105 75 35 75 107 85 40 45 52 65 85 60
[706] 35 60 35 60 60 35 35 90 45 35 75 45 35 91 60
[721] 75 45 90 35 45 90 45 60 56 35 75 45 75 35 60
[736] 35 75 73 60 75 68 60 35 35 60 66 35 35 75 60
[751] 45 75 60 45 60 69 60 60 74 45 45 60 35 60 45
[766] 90 75 90 118 75 60 60 75 35 45 45 90 60 45 68
[781] 75 60 135 60 35 75 45 60 90 35 35 78 60 35 43
[796] 35 75 35 45 45 75 35 90 61 135 92 35 102 60 84
[811] 90 75 35 45 60 60 105 45 60 90 35 45 105 75 35
[826] 64 90 45 75 60 75 35 90 70 45 60 60 75 85 53
[841] 90 45 105 60 90 35 35 90 35 35 62 35 75 75 45
[856] 35 45 75 45 90 45 60 60 75 45 60 90 90 45 100
[871] 45 75 75 97 90 60 45 90 35 75 72 70 60 35 105
[886] 60 75 35 55 35 75 45 58 74 60 45 60 35 51 45
[901] 35 90 35 60 60 108 56 90 78 58 74 75 60 60 88
[916] 75 35 75 35 45 35 45 35 90 35 60 60 45 77 45
[931] 45 82 75 35 45 75 75 45 60 45 35 75 60 42 78
[946] 35 35 45 75 60 59 75 102 35 35 75 35 75 75 93
[961] 45 88 35 69 45 75 90 75 35 57 35 103 90 35 35
[976] 60 35 45 35 97 45 35 75 45 60 35 60 75 75 45
[991] 60 35 60 60 113 60 45 35 124 75
[ reached getOption("max.print") -- omitted 3446 entries ]
In order to calculate the mean, or the average by hand of the checkings columns, one can add each individual entry and divide by the total number or rows. This would take much time, but thankfully, R has a command for this.
We have done an example using the checkings column. Compute the same using the savings column.
#Here, I calculate and call the mean of the three columns
mean_income = mean(income)
mean_income
[1] 140.6298
mean_expenses = mean(expenses)
mean_expenses
[1] 55.60144
#Next, compute the standard deviation or spread of both the checkings and savings columns.
spread_income = sd(income) spread_income ```
Now, to compute the SNR, the signal to noise ratio, a formula is created because there is no built in function.
SNR is the mean, or average, divided by the spread.
# SNR
snr_expenses = mean_expenses/spread_expenses
snr_expenses
snr_income = mean_income/spread_income
snr_income
str(mydata)
Of the Checking and Savings, which has a higher SNR? Why do you think that is?
After using Watson Analytics to find patterns in the data, save your work and upload a screenshot here. Refer to Task 1 on how to upload a photo.