TASK 1: Create a New Rmd File And Save It In Your Desired Folder

TASK 2: Download The Data as a CSV file And Save It In The Same Folder

TASK 3: Set The Working Directory To The Folder In Which You Have Saved The Data

setwd("C:/Users/Dell/Downloads/MLM/Session 4")

Q.1a Write R code to read the data into a dataframe called “df”.

# reading external data and storing into a dataframe called "df"
df <- read.csv("DefaultData.csv")

Q.1b Also write R code to read the data into a data table called “dt”.

# converting the data frame "df" into a data.table called "dt"
library(data.table)

## Warning: package 'data.table' was built under R version 3.4.4

dt <- data.table(df)
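Note that data.table(df) converts the data frame already in memory rather than reading the file again. As a minimal sketch of reading a CSV straight into a data.table, fread() from the data.table package could be used. The file below is a temporary toy file with made-up values, not DefaultData.csv, and the names toy, tmp and dt_toy are hypothetical.

```r
library(data.table)

# Toy data with made-up values, written to a temporary CSV
# (a stand-in for DefaultData.csv, which is not assumed to be available here)
toy <- data.frame(default = c("No", "Yes", "No"),
                  student = c("No", "Yes", "Yes"),
                  balance = c(730.5, 1820.2, 1074.0),
                  income  = c(44362, 12106, 31767))
tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp, row.names = FALSE)

# fread() returns a data.table directly
dt_toy <- fread(tmp)
class(dt_toy)
```

By default fread() keeps text columns as character; passing stringsAsFactors = TRUE would convert them to factors, which is closer to the str() output shown for "dt".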

Q.2 Write R code to get the dimensions of the dataframe “df”

d <- dim(df)
d

## [1] 10000     4

Q.3 Write R code to list the column names of the dataframe “df”

# storing the names in "cols" avoids shadowing the base function c()
cols <- colnames(df)
cols

## [1] "default" "student" "balance" "income"

Q.4 Write R code to attach the dataframe “df”

attach(df)
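attach() needs to be called only once per session. Each further call puts another copy of the data frame on the search path and prints a "masked from df" message. As a hedged sketch, the columns can also be reached without attach() at all, using $ or with(). The data frame toy below is made up for illustration.

```r
# Toy data frame with made-up values (hypothetical stand-in for df)
toy <- data.frame(default = factor(c("No", "No", "Yes", "No")),
                  balance = c(700, 800, 1900, 500))

# Refer to a column explicitly with $ ...
counts_dollar <- table(toy$default)

# ... or evaluate an expression inside the data frame with with()
counts_with <- with(toy, table(default))

counts_with
```

If attach() has been used, detach(df) removes the data frame from the search path again.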

Q.5a Write R code to list the data structures of the columns in the dataframe “df”

str(df)

## 'data.frame':    10000 obs. of  4 variables:
##  $ default: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ student: Factor w/ 2 levels "No","Yes": 1 2 1 1 1 2 1 2 1 1 ...
##  $ balance: num  730 817 1074 529 786 ...
##  $ income : num  44362 12106 31767 35704 38463 ...

Q.5b Also write R code to list the data structures of the columns in the data.table “dt”. Notice if there is any difference in the outputs.

str(dt)

## Classes 'data.table' and 'data.frame':   10000 obs. of  4 variables:
##  $ default: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ student: Factor w/ 2 levels "No","Yes": 1 2 1 1 1 2 1 2 1 1 ...
##  $ balance: num  730 817 1074 529 786 ...
##  $ income : num  44362 12106 31767 35704 38463 ...
##  - attr(*, ".internal.selfref")=<externalptr>

Q.6 Write R code to count how many consumers default on their loan

# df is already attached (Q.4), so the columns can be used directly
table(default)

## default
##   No  Yes 
## 9667  333

Q.7 Write R code to count how many consumers default on their loan, further broken down by whether or not they are students

# df is already attached (Q.4), so the columns can be used directly
table(default, student)

##        student
## default   No  Yes
##     No  6850 2817
##     Yes  206  127

Q.8 Write R code to create the complete contingency table of defaulters broken down by students

# creating contingency table
tab1 <- table(default, student)
# adding row and column totals
addmargins(tab1, c(1, 2))

##        student
## default    No   Yes   Sum
##     No   6850  2817  9667
##     Yes   206   127   333
##     Sum  7056  2944 10000

Q.9 Write R code to calculate the percentage of Defaulters and non-Defaulters, rounded to 1 decimal place

# df is already attached (Q.4), so the columns can be used directly
tab2 <- table(default)
protable <- prop.table(tab2)
protable

## default
##     No    Yes 
## 0.9667 0.0333

Perc <- round(protable*100,1)
Perc

## default
##   No  Yes 
## 96.7  3.3
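prop.table() also accepts a margin argument for two-way tables. The following sketch rebuilds the default-by-student table from the counts printed in Q.7 and computes percentages within each student group. The object names tab and col_pct are hypothetical.

```r
# Counts copied from the Q.7 output (rows: default, columns: student)
tab <- as.table(matrix(c(6850, 206, 2817, 127), nrow = 2,
                       dimnames = list(default = c("No", "Yes"),
                                       student = c("No", "Yes"))))

# margin = 2 makes each column (student group) sum to 1
col_pct <- round(prop.table(tab, margin = 2) * 100, 1)
col_pct
```

Read column-wise, this gives a default rate of about 2.9% among non-students and about 4.3% among students.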

Q.10 Write R code to get Mean, Standard Deviation and Variance Of The Income

# df is already attached (Q.4); "s" is used so that the function sd() is not shadowed
m <- mean(income)
s <- sd(income)
v <- var(income)
m; s; v

## [1] 33516.98

## [1] 13336.64

## [1] 177865955
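The variance is the square of the standard deviation, which is why 13336.64 squared agrees with the printed variance up to rounding. A small sketch of that relationship, using the first five income values as displayed (rounded) in the str() output:

```r
# First five income values as shown (rounded) in the str() output
x <- c(44362, 12106, 31767, 35704, 38463)

s <- sd(x)    # sample standard deviation (n - 1 denominator)
v <- var(x)   # sample variance

# The variance equals the squared standard deviation
all.equal(s^2, v)
```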

Q.11 Write R code to calculate the Minimum And Maximum Income, rounding it to 2 decimal places

# df is already attached (Q.4), so the columns can be used directly
mn <- round(min(income), 2)
mn

## [1] 771.97

mx <- round(max(income),2)
mx

## [1] 73554.23

Q.12a Write R code to print the following Descriptive Statistics:

library(psych)

## Warning: package 'psych' was built under R version 3.4.4

## 
## Attaching package: 'psych'

## The following object is masked from 'df':
## 
##     income
describe(df)[,c(1:5,8:9)]

##          vars     n     mean       sd   median    min      max
## default*    1 10000     1.03     0.18     1.00   1.00     2.00
## student*    2 10000     1.29     0.46     1.00   1.00     2.00
## balance     3 10000   835.37   483.71   823.64   0.00  2654.32
## income      4 10000 33516.98 13336.64 34552.64 771.97 73554.23

Q.12b In the above output, Interpret the meaning of the 1.29 written as the mean of the student column.

# R assigns integer codes to factor levels in alphabetical order, so "No" = 1 and "Yes" = 2.
# A mean of 1.29 therefore means that about 29% of the consumers (2944 of 10000) are students.
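This interpretation can be checked with a short sketch. The factor below is rebuilt from the student totals in the Q.8 contingency table (7056 "No", 2944 "Yes"). The names student_toy, code_mean and share_yes are hypothetical.

```r
# Student counts taken from the contingency table in Q.8
student_toy <- factor(rep(c("No", "Yes"), times = c(7056, 2944)))

# Factor levels are stored as integer codes: "No" = 1, "Yes" = 2
code_mean <- mean(as.integer(student_toy))

# Share of "Yes" values, which equals code_mean - 1
share_yes <- mean(student_toy == "Yes")

code_mean   # 1.2944
share_yes   # 0.2944
```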

Q.13 Write R code to get average of balance, broken down by whether consumers default on their loan

library(dplyr)

## Warning: package 'dplyr' was built under R version 3.4.4

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:data.table':
## 
##     between, first, last

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

group <- group_by(df, default)
summarise(group, mean = mean(balance))

## # A tibble: 2 x 2
##   default  mean
##   <fct>   <dbl>
## 1 No       804.
## 2 Yes     1748.
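The same kind of grouped mean can be sketched in base R, without dplyr, using tapply() or aggregate(). The data frame toy below holds made-up values and is only a stand-in for "df".

```r
# Toy data with made-up values (hypothetical stand-in for df)
toy <- data.frame(default = factor(c("No", "No", "Yes", "Yes")),
                  balance = c(700, 900, 1600, 1900))

# Mean balance within each level of default, returned as a named array
means <- tapply(toy$balance, toy$default, mean)
means

# The same summary returned as a data frame
aggregate(balance ~ default, data = toy, FUN = mean)
```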

Q.13b Write R code to create a Histogram of balance

hist(balance,main = "Histogram of Balance", xlab = "balance",col = c("green"))

Q.14 Write R code to get a breakdown of the mean and standard deviation of the balance, with respect to whether someone is a student and whether he or she has defaulted in payment, as shown in the following output

group <- group_by(df,default,student)
# summarising by grouping variables
summarise(group, N = n(),
MeanBalance = mean(balance, na.rm = TRUE),
SDbalance  = sd(balance, na.rm = TRUE))

## # A tibble: 4 x 5
## # Groups:   default [2]
##   default student     N MeanBalance SDbalance
##   <fct>   <fct>   <int>       <dbl>     <dbl>
## 1 No      No       6850        745.      446.
## 2 No      Yes      2817        948.      451.
## 3 Yes     No        206       1678.      331.
## 4 Yes     Yes       127       1860.      329.
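Since the data was also stored as a data.table in Q.1b, a comparable breakdown could be written in data.table syntax. This is a sketch on made-up values. The table dt_toy and the result res are hypothetical names.

```r
library(data.table)

# Toy data with made-up values (hypothetical stand-in for dt)
dt_toy <- data.table(default = c("No", "No", "No", "Yes", "Yes"),
                     student = c("No", "No", "Yes", "Yes", "Yes"),
                     balance = c(700, 900, 950, 1800, 2000))

# .N is the group size; by = .(...) lists the grouping columns.
# A group with a single row gets NA for the standard deviation.
res <- dt_toy[, .(N = .N,
                  MeanBalance = mean(balance),
                  SDbalance = sd(balance)),
              by = .(default, student)]
res
```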

Q.15 Write R code to create a Box-Plot for credit card balance

# plotting a horizontal boxplot of the balance column
boxplot(balance,width = 0.5,
horizontal = TRUE,main = "Boxplot for Balance",
xlab = "balance",col = c("green"))

EDA Cred card

Jay Modi

27 June 2019
