TASK 1: Create a New Rmd File And Save It In Your Desired Folder
TASK 2: Download The Data as a CSV file And Save It In The Same Folder
TASK 3: Set The Working Directory To The Folder In Which You Have Saved The Data
setwd("C:/Users/Dell/Downloads/MLM/Session 4")
Q.1a Write R code to read the data into a dataframe called “df”.
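# reading external data and storing it in a dataframe called "df"
# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0.0)
df <- read.csv("DefaultData.csv", stringsAsFactors = TRUE)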
Q.1b Also write R code to read the data into a data table called “dt”.
# converting the dataframe into a data.table called "dt"
library(data.table)
## Warning: package 'data.table' was built under R version 3.4.4
dt <- data.table(df)
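Q.1b asks for the data to be read into a data.table, while the code above converts the existing data frame. As a minimal sketch, a CSV can also be read directly with `fread()`. The temporary file and its three rows below are invented for illustration and only stand in for DefaultData.csv; `dt_demo` is a hypothetical name. By default `fread()` keeps text columns as character, so `stringsAsFactors = TRUE` is set here to mirror the factor columns of df.

```r
library(data.table)

# Hypothetical stand-in for DefaultData.csv: a tiny CSV written to a temp file
tmp <- tempfile(fileext = ".csv")
writeLines(c("default,student,balance,income",
             "No,No,729.5,44361.6",
             "No,Yes,817.2,12106.1",
             "Yes,No,1073.5,31767.1"), tmp)

# fread() returns a data.table directly; stringsAsFactors = TRUE converts
# the text columns to factors
dt_demo <- fread(tmp, stringsAsFactors = TRUE)

class(dt_demo)
dim(dt_demo)
```

With the real file, the call would presumably be `fread("DefaultData.csv", stringsAsFactors = TRUE)`.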
Q.2 Write R code to get the dimensions of the dataframe “df”
d <- dim(df)
d
## [1] 10000 4
Q.3 Write R code to list the column names of the dataframe “df”
# using "cols" rather than "c" avoids shadowing the base function c()
cols <- colnames(df)
cols
## [1] "default" "student" "balance" "income"
Q.4 Write R code to attach the dataframe “df”
attach(df)
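`attach()` places the columns of df on the search path, and every additional `attach(df)` call adds another copy and prints masking messages. As a sketch of an alternative that leaves the search path untouched, `with()` or the `$` operator can be used. The small data frame `demo` below is invented for illustration and does not contain the DefaultData.csv values.

```r
# Invented example data (not the DefaultData.csv values)
demo <- data.frame(default = factor(c("No", "No", "Yes", "No")),
                   balance = c(700, 820, 1900, 530))

# with() evaluates the expression using the columns of demo,
# without attaching demo to the search path
counts <- with(demo, table(default))
counts

# The $ operator is an equally explicit alternative
mean_balance <- mean(demo$balance)
mean_balance
```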
Q.5a Write R code to list the data structures of the columns in the dataframe “df”
str(df)
## 'data.frame': 10000 obs. of 4 variables:
## $ default: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ student: Factor w/ 2 levels "No","Yes": 1 2 1 1 1 2 1 2 1 1 ...
## $ balance: num 730 817 1074 529 786 ...
## $ income : num 44362 12106 31767 35704 38463 ...
Q.5b Also write R code to list the data structures of the columns in the data.table “dt”. Notice whether there is any difference in the outputs.
str(dt)
## Classes 'data.table' and 'data.frame': 10000 obs. of 4 variables:
## $ default: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ student: Factor w/ 2 levels "No","Yes": 1 2 1 1 1 2 1 2 1 1 ...
## $ balance: num 730 817 1074 529 786 ...
## $ income : num 44362 12106 31767 35704 38463 ...
## - attr(*, ".internal.selfref")=<externalptr>
Q.6 Write R code to count how many consumers default on their loan
# df is already attached (Q.4), so its columns can be used directly
table(default)
## default
##   No  Yes
## 9667  333
Q.7 Write R code to count how many consumers default on their loan, further broken down by whether or not they are students
table(default,student)
##        student
## default   No  Yes
##     No  6850 2817
##     Yes  206  127
Q.8 Write R code to create the complete contingency table of defaulters broken down by students
# creating contingency table
tab1 <- table(default,student)
# adding row and column totals
addmargins(tab1, c(1,2))
##        student
## default    No   Yes   Sum
##     No   6850  2817  9667
##     Yes   206   127   333
##     Sum  7056  2944 10000
Q.9 Write R code to calculate the percentage of Defaulters and non-Defaulters, rounded to 1 decimal place
tab2 <- table(default)
protable <- prop.table(tab2)
protable
## default
##     No    Yes
## 0.9667 0.0333
Perc <- round(protable*100,1)
Perc
## default
##   No  Yes
## 96.7  3.3
Q.10 Write R code to get Mean, Standard Deviation and Variance Of The Income
m <- mean(income)
# named "s" rather than "sd" to avoid shadowing the function sd()
s <- sd(income)
v <- var(income)
m;s;v
## [1] 33516.98
## [1] 13336.64
## [1] 177865955
Q.11 Write R code to calculate the Minimum And Maximum Income, rounding it to 2 decimal places
mn <- round(min(income),2)
mn
## [1] 771.97
mx <- round(max(income),2)
mx
## [1] 73554.23
Q.12a Write R code to print the following Descriptive Statistics:
library(psych)
## Warning: package 'psych' was built under R version 3.4.4
##
## Attaching package: 'psych'
## The following object is masked from 'df':
##
## income
describe(df)[,c(1:5,8:9)]
##          vars     n     mean       sd   median    min      max
## default*    1 10000     1.03     0.18     1.00   1.00     2.00
## student*    2 10000     1.29     0.46     1.00   1.00     2.00
## balance     3 10000   835.37   483.71   823.64   0.00  2654.32
## income      4 10000 33516.98 13336.64 34552.64 771.97 73554.23
Q.12b In the above output, Interpret the meaning of the 1.29 written as the mean of the student column.
# R codes the levels of a categorical variable (factor) as integers in alphabetical order, so No = 1 and Yes = 2.
# A mean of 1.29 for student therefore means that about 29% of the consumers (2,944 of 10,000) are students.
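For a two-level factor coded No = 1 and Yes = 2, the mean of the integer codes equals 1 plus the proportion of Yes values, which is how a mean of 1.29 corresponds to roughly 29% Yes. A small check of this relationship, using an invented factor (`student_demo` is a hypothetical name, not the actual data):

```r
# Invented two-level factor: 7 "No" and 3 "Yes"
student_demo <- factor(c(rep("No", 7), rep("Yes", 3)))

levels(student_demo)                 # levels are sorted alphabetically
codes <- as.integer(student_demo)    # No -> 1, Yes -> 2

mean_code <- mean(codes)                  # mean of the integer codes
prop_yes  <- mean(student_demo == "Yes")  # share of "Yes" values

mean_code
prop_yes
```

Subtracting 1 from the mean of the codes recovers the share of the second level.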
Q.13 Write R code to get average of balance, broken down by whether consumers default on their loan
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.4
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
##
## between, first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
group <- group_by(df, default)
summarise(group, mean = mean(balance))
## # A tibble: 2 x 2
##   default  mean
##   <fct>   <dbl>
## 1 No       804.
## 2 Yes     1748.
Q.13b Write R code to create a Histogram of balance
hist(balance,main = "Histogram of Balance", xlab = "balance",col = c("green"))

Q.14 Write R code to get a breakdown of the mean and standard deviation of the balance, with respect to whether someone is a student and whether he or she has defaulted in payment, as shown in the following output
group <- group_by(df,default,student)
# summarising by grouping variables
summarise(group, N = n(),
MeanBalance = mean(balance, na.rm = TRUE),
SDbalance = sd(balance, na.rm = TRUE))
## # A tibble: 4 x 5
## # Groups:   default [2]
##   default student     N MeanBalance SDbalance
##   <fct>   <fct>   <int>       <dbl>     <dbl>
## 1 No      No       6850        745.      446.
## 2 No      Yes      2817        948.      451.
## 3 Yes     No        206       1678.      331.
## 4 Yes     Yes       127       1860.      329.
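A grouped summary of this kind can also be written in data.table syntax, which may be of interest because a data.table `dt` was created in Q.1b. The following is only a sketch on invented data; `dt_toy` and its values are hypothetical and unrelated to DefaultData.csv.

```r
library(data.table)

# Invented data (not the DefaultData.csv values)
dt_toy <- data.table(
  default = c("No", "No", "No", "Yes", "Yes", "Yes"),
  student = c("No", "No", "Yes", "No", "Yes", "Yes"),
  balance = c(700, 800, 950, 1600, 1800, 1900)
)

# .N counts the rows in each group; by = names the grouping columns
res <- dt_toy[, .(N = .N,
                  MeanBalance = mean(balance),
                  SDbalance = sd(balance)),
              by = .(default, student)]
res
```

Groups with a single row get `NA` for the standard deviation, since `sd()` needs at least two values.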
Q.15 Write R code to create a Box-Plot for credit card balance
# plotting a horizontal boxplot of the balance column
boxplot(balance,width = 0.5,
horizontal = TRUE,main = "Boxplot for Balance",
xlab = "balance",col = c("green"))
