ATTENTION:

1. READ EVERY SINGLE LINE BELOW TO GET A COMPLETE UNDERSTANDING OF WHAT NEED TO BE DONE IN THIS ASSIGNMENT.

2. SET UP THE DIRECTORY AS INSTRUCTED IN THE “SETUP” BELOW BEFORE YOU PROCEED WITH THIS ASSIGNMENT TO AVOID ANY ERRORS.

4. DO THE TASKS SEQUENTIALLY TO AVOID ERRORS WHEN YOU RUN THE CODES.

5. BEFORE YOU KNIT YOUR RMD FILE AND PUBLISH IT ON THE RPUBS, MAKE SURE TO RUN EACH CHUNK OF YOUR CODES BY CLICKING THE GREEN ARROW ON THE RIGHT OF THE CODE CHUNK.

6. FULL POINTS WILL ONLY BE GIVEN TO TASKS WHICH CODES ARE RUN SUCCESSFULLY (ERROR-FREE), FOR THE LAST TASK - IF YOU ANSWER CORRECTLY BOTH QUESTIONS

7. PUBLISH YOUR WORK TO RPUBS BY NAMING IT “bsad_lab02”, COPY YOUR RPUBS LINK TO SAKAI AND SUBMIT THE LINK.

SETUP

A. Remember to always set your working directory to the source file location. Click on Session -> Set Working Directory -> To Source File Location

B. Examine the content of creditrisk.csv by opening & importing the file in RStudio. Steps to open & import the file are as follows: Click on File -> Import Dataset -> From Text (readr) -> Browse -> data folder (double click) -> creditrisk -> open -> Import

The creditrisk.csv file will be shown on the RStudio window. You can examine the columns (field names) and the content of the file.

C. Visit this url https://erdplus.com/assets/images/StarSchemaDemoTiny.gif to learn how to create a star schema using ERDPlus. (IT’S VERY USEFUL!)

TASK 1 - Go to this url https://erdplus.com/#/standalone and follow the steps below, sequentially, to create a star schema:

STEP A. On the ERDPLUS window click “Diagram” -> “New Star Schema”. STEP B. Create a dimension by clicking on “Dimension” then click on the white drawing area (a dimension box will be created in the drawing area). STEP C. Name the new dimension by going to “Edit Dimension” (on the right) and type “CUSTOMER” (type it without the quotes). STEP D. Create an attribute for the CUSTOMER dimension by clicking on “Add” (under Columns). STEP E. Give name “C_ID” (type it without quotes) to the new column or attribute. STEP F. Set a primary key for the dimension by Clicking on “Primary Key” and select “C_ID” as the primary key, then click “Done”.

TASK 2 - Repeat only STEP D and STEP E to create two more attributes and name them “Age” and “Gender”.

## TASK 3 # i - Repeat STEPS B through F to create a new dimension called “OCCUPATION” with attributes “O_ID”, “Job” and “MonthsEmployed”, and set O_ID as the primary key.

# ii - Repeat STEPS B through F to create a new dimension called “ACCOUNT” with attributes “A_ID”, “Checking” and “Savings”, and set A_ID as the primary key.

# iii - Repeat STEPS B through F to create a new dimension called “LOAN” with attributes “L_ID”, “LoanPurpose”, “CreditRisk”, and “Years”, and set L_ID as the primary key.

TASK 4 - Create a Fact table and connect it to the dimension tables by following these steps:

STEP A. Click on “Fact” then click on the middle of the white drawing area (a fact box will becreated in the drawing area). STEP B. Name the fact table by going to “Edit Fact” (on the right) and type “CUSTOMER_CREDIT_RISK”. STEP C. Click on “Connect” then click on CUSTOMER dimension, hold your click and move your cursor such that a red square appears around the CUSTOMER dimension, then drag the cursor to the CUSTOMER_CREDIT_RISK fact table. STEP D. Repeat STEP C to connect other dimensions (OCCUPATION, ACCOUNT, and LOAN) to the CUSTOMER_CREDIT_RISK fact table.

TASK 5 - Export the star schema diagram that you just created and insert the image onto this document by following these steps:

STEP A. Click on Diagram -> Export Image -> Save Image. STEP B. Go to the place where you save the image (could be in your Downloads folder), search for the the image (its name could be erdplus-diagram.png), and move or copy this image to the “imgs” folder of “bsad_lab02” folder. STEP C. Insert the image of the star schema into this document by following the directions found in the rmd file of bsad_lab00 (under “Inserting Images”). Make sure to include the folder name where you keep the image (which is imgs), and to type the name of the image file correctly (including the file extension).

Write the code to insert the image below. Note: you don’t have to insert the image inside an R chunk. Just write the code to insert the image below this line (on this white space).

star schema

Next, read the creditrisk.csv file into R Studio. It can be useful to name your data to create a shortcut to it. Here we will name the data as mydata. To see the top head data in the console, you can call it using the function head() and referring to it by this given shortcut name.

#The code below reads the creditrisk.csv file and name it mydata.
mydata = read.csv(file="data/creditrisk.csv")

#The code below calls mydata using the head() function to see the top head data.
head(mydata)

##      Loan.Purpose Checking Savings Months.Customer Months.Employed Gender
## 1 Small Appliance        0     739              13              12      M
## 2       Furniture        0    1230              25               0      M
## 3         New Car        0     389              19             119      M
## 4       Furniture      638     347              13              14      M
## 5       Education      963    4754              40              45      M
## 6       Furniture     2827       0              11              13      M
##   Marital.Status Age Housing Years        Job Credit.Risk
## 1         Single  23     Own     3  Unskilled         Low
## 2       Divorced  32     Own     1    Skilled        High
## 3         Single  38     Own     4 Management        High
## 4         Single  36     Own     2  Unskilled        High
## 5         Single  31    Rent     3    Skilled         Low
## 6        Married  25     Own     1    Skilled         Low

#Run this chunk of codes by clicking on the green arrow button.

To capture, or extract, the checking and savings columns and perform some analytics on them, we must first extract the columns from the data separately. Using the ‘$’ sign following the label for the data extracts a specific column. For convenience, we relabel the extracted data.

#The code below extracts the Checking column and keeps its content (the checking data) in a variable called my_checking. Pay attention on the name of the column including the upper and lower case letter(s). Notes: the column name has a capitalized C at the beginning of the name.
my_checking = mydata$Checking 

#The code below calls the variable my_checking to display its top head values.
head(my_checking)

## [1]    0    0    0  638  963 2827

#Run this chunk of codes by clicking on the green arrow button.

TASK 6 - As shown above, we have extracted the Checking column, kept its content in variable my_checking and called the variable to display its content. Now write similar codes for the savings data as instructed below.

#Write below the code to extract the Savings column and keeps its content in a variable called my_savings.

my_savings = mydata$Savings

#write below the code to call the variable my_savings to display its top head values.

head(my_savings)

## [1]  739 1230  389  347 4754    0

#Run the above two lines of codes within this chunk by clicking on the green arrow button.

In order to calculate the mean in R we use the mean() function.

#The code below calculates  the average of checking data (which is saved in variable my_checking) using the mean() function and names the average as my_meanchecking.
my_meanchecking = mean(my_checking)

#The code below calls the my_meanchecking to display its value.
my_meanchecking

## [1] 1048.014

#Run this chunk of codes by clicking on the green arrow button.

TASK 7 - As shown above, we have computed the average for the checking data and named it my_meanchecking. Now write similar codes for the savings data as instructed below.

#Write below the code to compute the average of savings data (which is saved in variable my_savings), using the mean() function and name the average as my_meansavings. 

my_meansavings = mean(my_savings)

#Write below the code to call my_meansavings to display its value.

my_meansavings

## [1] 1812.562

#Run the above two lines of codes within this chunk by clicking on the green arrow button.

In order to compute the standard deviation or spread in R we use the sd() function.

#The code below computes  the standard deviation of checking data (which is saved in variable my_checking) using the sd() function and names the stadard deviation as my_spreadchecking
my_spreadchecking = sd(my_checking)

#The code below calls my_spreadchecking to display its value.
my_spreadchecking

## [1] 3147.183

#Run this chunk of codes by clicking on the green arrow button.

TASK 8 - As shown above, we have computed the standard deviation for the checking data and named it my_spreadchecking. Now write similar codes for the savings data as instructed below.

#Write below the code to compute the standard deviation of savings data (which is saved in variable my_savings), using the sd() function and name the standard deviation as my_spreadsavings

my_spreadsavings = sd(my_savings)

#write below the code to call my_spreadsavings to display its value.

my_spreadsavings

## [1] 3597.285

#Run the above two lines of codes within this chunk by clicking on the green arrow button.

To compute the Signal-to-Noise Ratio or SNR, we need to create a formula because there is no built in function in R. SNR is the mean or average, divided by the spread.

#The code below computes the SNR of checking data and names it my_snrchecking.
my_snrchecking = my_meanchecking/my_spreadchecking

#The code below calls the my_snrchecking to display its value.
my_snrchecking

## [1] 0.3330006

#Run this chunk of codes by clicking on the green arrow button.

TASK 9 - As shown above, we have computed the SNR for the checking data and named it my_snrchecking. Now write similar codes for the savings data as instructed below.

#Write below the code to compute the SNR of savings data and name it my_snrsavings.

my_snrsavings = my_meansavings/my_spreadsavings

#Write below the code to call my_snrsavings to display its value.

my_snrsavings

## [1] 0.5038695

#Run the above two lines of codes within this chunk by clicking on the green arrow button.

TASK 10 - Of the results from my_snrchecking and my_snrsavings:

A. Which one has a higher SNR? Type your answer here:

Savings

B. Why do you think that is? Type your answer here:

Because the savings has a greater mean and standard deviation than the checkings. As well as, the savings only has one zero entry, whereas the checking has three. Thus, the savings has more signals and relevant data than the noise like the zeros.

Lab02

CME Group Foundation Business Analytics Lab

Ahad Sami

2/5/2019

ATTENTION:

1. READ EVERY SINGLE LINE BELOW TO GET A COMPLETE UNDERSTANDING OF WHAT NEED TO BE DONE IN THIS ASSIGNMENT.

2. SET UP THE DIRECTORY AS INSTRUCTED IN THE “SETUP” BELOW BEFORE YOU PROCEED WITH THIS ASSIGNMENT TO AVOID ANY ERRORS.

4. DO THE TASKS SEQUENTIALLY TO AVOID ERRORS WHEN YOU RUN THE CODES.

5. BEFORE YOU KNIT YOUR RMD FILE AND PUBLISH IT ON THE RPUBS, MAKE SURE TO RUN EACH CHUNK OF YOUR CODES BY CLICKING THE GREEN ARROW ON THE RIGHT OF THE CODE CHUNK.

6. FULL POINTS WILL ONLY BE GIVEN TO TASKS WHICH CODES ARE RUN SUCCESSFULLY (ERROR-FREE), FOR THE LAST TASK - IF YOU ANSWER CORRECTLY BOTH QUESTIONS

7. PUBLISH YOUR WORK TO RPUBS BY NAMING IT “bsad_lab02”, COPY YOUR RPUBS LINK TO SAKAI AND SUBMIT THE LINK.

SETUP

TASK 1 - Go to this url https://erdplus.com/#/standalone and follow the steps below, sequentially, to create a star schema:

TASK 2 - Repeat only STEP D and STEP E to create two more attributes and name them “Age” and “Gender”.

TASK 4 - Create a Fact table and connect it to the dimension tables by following these steps:

TASK 5 - Export the star schema diagram that you just created and insert the image onto this document by following these steps:

Write the code to insert the image below. Note: you don’t have to insert the image inside an R chunk. Just write the code to insert the image below this line (on this white space).

Next, read the creditrisk.csv file into R Studio. It can be useful to name your data to create a shortcut to it. Here we will name the data as mydata. To see the top head data in the console, you can call it using the function head() and referring to it by this given shortcut name.

TASK 6 - As shown above, we have extracted the Checking column, kept its content in variable my_checking and called the variable to display its content. Now write similar codes for the savings data as instructed below.

TASK 7 - As shown above, we have computed the average for the checking data and named it my_meanchecking. Now write similar codes for the savings data as instructed below.

TASK 8 - As shown above, we have computed the standard deviation for the checking data and named it my_spreadchecking. Now write similar codes for the savings data as instructed below.

TASK 9 - As shown above, we have computed the SNR for the checking data and named it my_snrchecking. Now write similar codes for the savings data as instructed below.

TASK 10 - Of the results from my_snrchecking and my_snrsavings:

Lab02

CME Group Foundation Business Analytics Lab

Ahad Sami

2/5/2019

ATTENTION:

1. READ EVERY SINGLE LINE BELOW TO GET A COMPLETE UNDERSTANDING OF WHAT NEED TO BE DONE IN THIS ASSIGNMENT.

2. SET UP THE DIRECTORY AS INSTRUCTED IN THE “SETUP” BELOW BEFORE YOU PROCEED WITH THIS ASSIGNMENT TO AVOID ANY ERRORS.

3. THERE ARE TEN TASKS IN THIS ASSIGNMENT WHICH ARE RELATED TO DATA MODELING WITH ERDPLUS AND SIGNAL-TO-NOISE RATIO CALCULATION.

4. DO THE TASKS SEQUENTIALLY TO AVOID ERRORS WHEN YOU RUN THE CODES.

5. BEFORE YOU KNIT YOUR RMD FILE AND PUBLISH IT ON THE RPUBS, MAKE SURE TO RUN EACH CHUNK OF YOUR CODES BY CLICKING THE GREEN ARROW ON THE RIGHT OF THE CODE CHUNK.

6. FULL POINTS WILL ONLY BE GIVEN TO TASKS WHICH CODES ARE RUN SUCCESSFULLY (ERROR-FREE), FOR THE LAST TASK - IF YOU ANSWER CORRECTLY BOTH QUESTIONS

7. PUBLISH YOUR WORK TO RPUBS BY NAMING IT “bsad_lab02”, COPY YOUR RPUBS LINK TO SAKAI AND SUBMIT THE LINK.

SETUP

TASK 1 - Go to this url https://erdplus.com/#/standalone and follow the steps below, sequentially, to create a star schema:

TASK 2 - Repeat only STEP D and STEP E to create two more attributes and name them “Age” and “Gender”.

TASK 4 - Create a Fact table and connect it to the dimension tables by following these steps:

TASK 5 - Export the star schema diagram that you just created and insert the image onto this document by following these steps:

Write the code to insert the image below. Note: you don’t have to insert the image inside an R chunk. Just write the code to insert the image below this line (on this white space).

Next, read the creditrisk.csv file into R Studio. It can be useful to name your data to create a shortcut to it. Here we will name the data as mydata. To see the top head data in the console, you can call it using the function head() and referring to it by this given shortcut name.

TASK 6 - As shown above, we have extracted the Checking column, kept its content in variable my_checking and called the variable to display its content. Now write similar codes for the savings data as instructed below.

TASK 7 - As shown above, we have computed the average for the checking data and named it my_meanchecking. Now write similar codes for the savings data as instructed below.

TASK 8 - As shown above, we have computed the standard deviation for the checking data and named it my_spreadchecking. Now write similar codes for the savings data as instructed below.

TASK 9 - As shown above, we have computed the SNR for the checking data and named it my_snrchecking. Now write similar codes for the savings data as instructed below.

TASK 10 - Of the results from my_snrchecking and my_snrsavings: