For your assignment you may be using a different dataset than the one included here.
Always read the instructions on Sakai carefully.
Tasks/questions to be completed/answered are highlighted in a larger bold font and numbered according to the task section.
We are going to use tidyverse, a collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures. *Info:* https://www.tidyverse.org/
Loading required package: tidyverse
-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v ggplot2 2.2.1     v purrr   0.2.4
v tibble  1.4.2     v dplyr   0.7.4
v tidyr   0.7.2     v stringr 1.2.0
v readr   1.1.1     v forcats 0.2.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
mydata = read_csv("C:\\Users\\hp\\Documents\\Spring 2018\\BSAD 343H\\Labs\\Lab 1\\01-notebook-lab\\Scoring.csv")
Parsed with column specification:
cols(
Status = col_character(),
Seniority = col_integer(),
Home = col_character(),
Time = col_integer(),
Age = col_integer(),
Marital = col_character(),
Records = col_character(),
Job = col_character(),
Expenses = col_integer(),
Income = col_integer(),
Assets = col_integer(),
Debt = col_integer(),
Amount = col_character(),
Price = col_character(),
Finrat = col_double(),
Savings = col_double()
)
head(mydata)
To extract the features (columns) from the dataset, use the name of the dataset followed by the ‘$’ sign and the name of the specific column.
Extract the first feature (column)
# Extract the Expenses column
Expenses = mydata$Expenses
# Display the first 10 values of Expenses
Expenses[1:10]
[1] 73 48 90 63 46 75 75 35 90 90
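As a small aside, the ‘$’ extraction above is one of several equivalent ways to pull a column out of a data frame. The sketch below uses a tiny toy data frame (an assumption for illustration, not the full Scoring.csv) and the hypothetical names `toy`, `by_dollar`, `by_bracket`, and `by_variable`. The double-bracket form can be handy when the column name is stored in a variable.

```r
# Toy data frame for illustration only (not the full Scoring.csv data)
toy = data.frame(
  Expenses = c(73, 48, 90),
  Savings  = c(4.20, 4.98, 1.98)
)

# Extract a column by writing its name after the $ sign
by_dollar = toy$Expenses

# Extract the same column by passing its name as a string
by_bracket = toy[["Expenses"]]

# Extract the same column when the name is held in a variable
col_name = "Expenses"
by_variable = toy[[col_name]]

# All three results are the same plain numeric vector
identical(by_dollar, by_bracket)
identical(by_dollar, by_variable)
```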
Now, use the same procedure to extract the other feature.
# Extract the Savings column
Savings = mydata$Savings
# Display the first 10 values of Savings
Savings[1:10]
[1] 4.200000 4.980000 1.980000 7.933333 7.083871 12.830769 1.875000 2.700000
[9] 0.850000 -0.400000
# Calculate the feature average
meanExpenses = mean(Expenses)
# Inspect the variable with the calculated mean
meanExpenses
[1] 55.60144
Repeat the same procedure for the other feature
# Calculate the feature average
meanSavings = mean(Savings)
# Inspect the variable with the calculated mean
meanSavings
[1] 3.860083
Compute the standard deviation for the first feature
#Computing the standard deviation
spreadExpenses = sd(Expenses)
# Inspect the variable with the calculated sd
spreadExpenses
[1] 19.52084
Compute the standard deviation for the second feature
# Calculate the feature standard deviation
spreadSavings = sd(Savings)
# Inspect the variable with the calculated standard deviation
spreadSavings
[1] 3.726292
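To make it clearer what `mean()` and `sd()` compute, the sketch below reproduces both by hand on the first ten Expenses values printed earlier in this lab. Because it uses only ten rows, the results will differ from the full-column values above. The names `x`, `manual_mean`, `manual_sd`, and `x_with_na` are hypothetical, and the missing value is added purely for illustration. Note that `sd()` divides by n - 1 (the sample standard deviation).

```r
# First ten Expenses values, copied from the output earlier in this lab
x = c(73, 48, 90, 63, 46, 75, 75, 35, 90, 90)

# Mean: sum of the values divided by how many there are
n = length(x)
manual_mean = sum(x) / n

# Sample standard deviation: square root of the summed squared
# deviations from the mean, divided by n - 1
manual_sd = sqrt(sum((x - manual_mean)^2) / (n - 1))

# Both hand calculations agree with the built-in functions
all.equal(manual_mean, mean(x))
all.equal(manual_sd, sd(x))

# If a column contains a missing value, mean() and sd() return NA
# unless na.rm = TRUE is supplied (the NA here is made up for illustration)
x_with_na = c(x, NA)
mean(x_with_na)
mean(x_with_na, na.rm = TRUE)
sd(x_with_na, na.rm = TRUE)
```

If your assignment dataset has missing values, passing `na.rm = TRUE` is one way to keep the later SNR calculation from returning NA.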
# Compute the SNR of Expenses and name it snr_Expenses
snr_Expenses = meanExpenses/spreadExpenses
# Inspect snr_Expenses
snr_Expenses
[1] 2.848312
# Find the SNR of the second feature
snr_Savings = meanSavings/spreadSavings
# Inspect the variable with the calculated SNR
snr_Savings
[1] 1.035905
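Expenses has the higher SNR (2.85 versus 1.04 for Savings). This is because the mean of Expenses is large relative to its standard deviation, whereas the mean of Savings is about the same size as its standard deviation. A larger variance on its own would lower the SNR, not raise it.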
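Since the SNR used in this lab is the mean divided by the standard deviation, the calculation can be wrapped in a small helper and reused for any feature. The sketch below is one possible way to do that. The function `snr()` and the names `expenses_10`, `savings_10`, and `toy` are hypothetical, and the data are only the first ten values of each feature printed earlier, so the results will not match the full-column SNRs above.

```r
# Hypothetical helper (not part of the lab): SNR as mean divided by
# standard deviation, ignoring missing values by default
snr = function(x, na.rm = TRUE) {
  mean(x, na.rm = na.rm) / sd(x, na.rm = na.rm)
}

# First ten values of each feature, copied from the output earlier in this lab
expenses_10 = c(73, 48, 90, 63, 46, 75, 75, 35, 90, 90)
savings_10  = c(4.2, 4.98, 1.98, 7.933333, 7.083871,
                12.830769, 1.875, 2.7, 0.85, -0.4)

# SNR of each feature on this ten-row subset
snr(expenses_10)
snr(savings_10)

# Apply the helper to every numeric column of a data frame in one call
toy = data.frame(Expenses = expenses_10, Savings = savings_10)
snr_by_column = sapply(toy[sapply(toy, is.numeric)], snr)
snr_by_column
```

With the real data, replacing `toy` with `mydata` should give the SNR of every numeric column, which makes it easier to compare more than two features at once.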
mydata[1,]
Below is an example of what the simple star relational schema should look like.
Example of how to create a star schema using ERDPlus
Example of how to export the final star schema from ERDPlus
Completed Star Schema Example
knitr::include_graphics('C:\\Users\\hp\\Documents\\Spring 2018\\BSAD 343H\\Labs\\Lab 1\\01-notebook-lab\\imgs\\img07.png')
Here we are going to familiarize ourselves with Watson Analytics. You should have access to the portal below.
https://watson.analytics.ibmcloud.com
knitr::include_graphics('C:\\Users\\hp\\Documents\\Spring 2018\\BSAD 343H\\Labs\\Lab 1\\01-notebook-lab\\imgs\\img08.png')
knitr::include_graphics('C:\\Users\\hp\\Documents\\Spring 2018\\BSAD 343H\\Labs\\Lab 1\\01-notebook-lab\\imgs\\img09.png')