A. Remember to always set your working directory to the source file location. Click on Session -> Set Working Directory -> To Source File Location
B. Examine the content of creditrisk.csv by opening & importing the file in RStudio. Steps to open & import the file are as follows: Click on File -> Import Dataset -> From CSV -> Browse -> data folder (double click) -> creditrisk -> open -> Import
The creditrisk.csv file will be shown on the RStudio window. You can examine the columns (field names) and the content of the file.
C. Visit this url https://erdplus.com/assets/images/StarSchemaDemoTiny.gif to learn how to create a star schema using ERDPlus. (IT’S VERY USEFUL!)
STEP A. On the ERDPLUS window click “Diagram” -> “New Star Schema” STEP B. Create a dimension by clicking on “Dimension” then click on the white drawing area (a dimension box will be created in the drawing area) STEP C. Name the new dimension by going to “Edit Dimension” (on the rigth) and type “CUSTOMER” (type it without the quotes) STEP D. Create an attribute for the CUSTOMER dimension by clicking on “Add” (under Columns) STEP E. Give name “C_ID” (type it without quotes) to the new column or attribute STEP F. Set a primary key for the dimension by Clicking on “Primary Key” and select “C_ID” as the primary key, then click “Done”
### TASK 3 # i - Repeat STEPS B through F to create a new dimension called “OCCUPATION” with attributes “O_ID”, “Job” and “MonthsEmployed”, and set O_ID as the primary key |
# ii - Repeat STEPS B through F to create a new dimension called “ACCOUNT” with attributes “A_ID”, “Checking” and “Savings”, and set A_ID as the primary key |
# iii - Repeat STEPS B through F to create a new dimension called “LOAN” with attributes “L_ID”, “LoanPurpose”, “CreditRisk”, and “Years”, and set L_ID as the primary key |
STEP A. Click on “Fact” then click on the middle of the white drawing area (a fact box will be created in the drawing area) STEP B. Name the fact table by going to “Edit Fact” (on the right) and type “CUSTOMER_CREDIT_RISK” STEP C. Click on “Connect” then click on CUSTOMER dimension, hold your click and move your cursor such that a red square appears around the CUSTOMER dimension, then drag the cursor to the CUSTOMER_CREDIT_RISK fact table. STEP D. Repeat STEP C to connect other dimensions (OCCUPATION, ACCOUNT, and LOAN) to the CUSTOMER_CREDIT_RISK fact table.
STEP A. Click on Diagram -> Export Image -> Save Image STEP B. Go to the place where you save the image (could be in your Downloads folder), search for the the image (its name could be erdplus-diagram.png), and move this image to the “imgs” folder of “bsad_lab02” folder STEP C. Insert the image of the star schema into this document by following the directions found in the rmd file of bsad_lab00 (under “Inserting Images”). Make sure to include the folder name where you keep the image (which is imgs), and to type the name of the image file correctly (including the file extension).
This is a caption
#The code below reads the creditrisk.csv file and name it mydata
mydata = read.csv(file="data/creditrisk.csv")
#The code below calls mydata using the head() function to see the top head data
head(mydata)
## Loan.Purpose Checking Savings Months.Customer Months.Employed Gender
## 1 Small Appliance 0 739 13 12 M
## 2 Furniture 0 1230 25 0 M
## 3 New Car 0 389 19 119 M
## 4 Furniture 638 347 13 14 M
## 5 Education 963 4754 40 45 M
## 6 Furniture 2827 0 11 13 M
## Marital.Status Age Housing Years Job Credit.Risk
## 1 Single 23 Own 3 Unskilled Low
## 2 Divorced 32 Own 1 Skilled High
## 3 Single 38 Own 4 Management High
## 4 Single 36 Own 2 Unskilled High
## 5 Single 31 Rent 3 Skilled Low
## 6 Married 25 Own 1 Skilled Low
#Run this chunk of codes by clicking on the green arrow botton
To capture, or extract, the checking and savings columns and perform some analytics on them, we must first extract the columns from the data separately. Using the ‘$’ sign following the label for the data extracts a specific column. For convenience, we relabel the extracted data.
#The code below extracts the Checking column. Note: pay attention on the name of the column including the upper and lower case letter(s)
checking = mydata$Checking
#The code below Calls the Checking Column to display top head values
head(checking)
## [1] 0 0 0 638 963 2827
#Run this chunk of codes by clicking on the green arrow botton.
#Write below the code to extract the Savings Column
savings = mydata$Savings
#write below the code to Call the Savings Column
head(savings)
## [1] 739 1230 389 347 4754 0
#Run the above two lines of codes within this chunk
In order to calculate the mean in R we use the mean() function.
#The code below calculates the average of checking data using the mean() function and names the checking average as meanChecking
meanChecking = mean(checking)
#The code below calls the meanChecking and display its value (which is the average of checking data)
meanChecking
## [1] 1048.014
#Run this chunk of codes by clicking on the green arrow botton.
#Write below the code to compute the average of savings data and name the savings average as meanSavings
meanSavings = mean(savings)
#write below the code to Call meanSavings and display its value (which is the average of savings data)
meanSavings
## [1] 1812.562
#Run the above two lines of codes within this chunk
In order to compute the standard deviation or spread in R we use the sd() function.
#The code below computes the standard deviation of checking data using the sd() function and names the checking stadard deviation as spreadChecking
spreadChecking = sd(checking)
#The code below calls the spreadChecking and display its value (which is the standard deviation of checking data)
spreadChecking
## [1] 3147.183
#Run this chunk of codes by clicking on the green arrow botton.
#Write below the code to compute the standard deviation of savings data and name it spreadSavings
spreadsavings = sd(savings)
#write below the code to Call spreadSavings and display its value (which is the standard deviation of savings data)
spreadsavings
## [1] 3597.285
#Run the above two lines of codes within this chunk
To compute the Signal-to-Noise Ratio or SNR, we need to create a formula because there is no built in function. SNR is the mean or average, divided by the spread.
#The code below computes the snr of checking data and names it snrChecking
snrChecking = meanChecking/spreadChecking
#The code below calls the snrChecking and display its value (which is the signal-to-noise ratio of checking data)
snrChecking
## [1] 0.3330006
#Run this chunk of codes by clicking on the green arrow botton.
#Write below the code to compute the snr of savings data and name it snrSavings
snrsavings = meanSavings/spreadsavings
#write below the code to Call snrSavings and display its value (which is the signal-to-noise ration of savings data)
snrsavings
## [1] 0.5038695
#Run the above two lines of codes within this chunk
Type your answers below: a. Savings b. Savings has more signal to noise ratio than Checking.