Data set 1
Describing the data
The data is a cross-sectional data frame showing the credit history for a sample of credit card applicants (2003). It includes variables such as the applicant's age, expenditure, income, and application approval status. I was interested in understanding consumer spending by age and income, as well as application approval status by age.
Variables:
card —> Factor. Was the application for a credit card accepted?
reports —> Number of major derogatory reports.
age —> Age in years plus twelfths of a year.
income —> Yearly income (in USD 10,000 in the source data; the table below shows it converted to USD).
share —> Ratio of monthly credit card expenditure to yearly income.
expenditure —> Average monthly credit card expenditure.
owner —> Factor. Does the individual own their home?
selfemp —> Factor. Is the individual self-employed?
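The definition of share can be checked against the printed values. The sketch below copies the first five rows of the table shown further down and compares the reported share with 12 × expenditure / income. The factor of 12 (annualising monthly spend) is an assumption inferred from those rows, not a documented formula, and the match is only approximate.

```r
# First five rows, copied from the "Credit Card (first 10 rows)" table
income      <- c(45200, 24200, 45000, 25400, 97867)    # yearly income, USD
expenditure <- c(124.98, 9.85, 15.00, 137.87, 546.50)  # monthly spend, USD
share       <- c(0.0333, 0.0052, 0.0042, 0.0652, 0.0671)

# Assumption: monthly spend is annualised (x 12) before dividing by yearly income
implied_share <- 12 * expenditure / income

# Side-by-side comparison of reported and implied values
comparison <- data.frame(share, implied_share = round(implied_share, 4))
print(comparison)
```

The two columns agree to about three decimal places, with the largest gaps on the rows where spending is very low.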
library(knitr)

# Creating a table (first 10 rows)
kable(head(card, 10), format = "html", caption = "Credit Card (first 10 rows)")
Credit Card (first 10 rows)
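| card | reports | age | income | share | expenditure | owner | selfemp | dependents | months | majorcards | active |
|---|---|---|---|---|---|---|---|---|---|---|---|
| yes | 0 | 38 | 45,200 | 0.0333 | 124.98 | yes | no | 3 | 54 | 1 | 12 |
| yes | 0 | 33 | 24,200 | 0.0052 | 9.85 | no | no | 3 | 34 | 1 | 13 |
| yes | 0 | 34 | 45,000 | 0.0042 | 15.00 | yes | no | 4 | 58 | 1 | 5 |
| yes | 0 | 31 | 25,400 | 0.0652 | 137.87 | no | no | 0 | 25 | 1 | 7 |
| yes | 0 | 32 | 97,867 | 0.0671 | 546.50 | yes | no | 2 | 64 | 1 | 5 |
| yes | 0 | 23 | 25,000 | 0.0444 | 92.00 | no | no | 0 | 54 | 1 | 1 |
| yes | 0 | 28 | 39,600 | 0.0126 | 40.83 | no | no | 2 | 7 | 1 | 5 |
| yes | 0 | 29 | 23,700 | 0.0764 | 150.79 | yes | no | 0 | 77 | 1 | 3 |
| yes | 0 | 37 | 38,000 | 0.2456 | 777.82 | yes | no | 0 | 97 | 1 | 6 |
| yes | 0 | 28 | 32,000 | 0.0198 | 52.58 | no | no | 0 | 65 | 1 | 18 |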
Creating a Scatter plot and bar chart
library(ggplot2)

ggplot(card, aes(x = age, y = share)) +
  geom_point(color = "navyblue") +
  labs(title = "Scatter plot of Age vs Share (expense / income)", x = "Age", y = "Share") +
  theme_minimal()
# Filter data to only rejected applicants
rejected_data <- card[card$card == "no", ]

# Create a bar chart
ggplot(rejected_data, aes(x = as.factor(age))) +
  geom_bar(fill = "red", color = "black") +
  labs(title = "Count of Rejected Credit Card Applications by Age",
       x = "Age",
       y = "Count of Rejected Applications") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Filter data to only accepted applicants
accepted_data <- card[card$card == "yes", ]

# Create a bar chart
ggplot(accepted_data, aes(x = as.factor(age))) +
  geom_bar(fill = "green", color = "black") +
  labs(title = "Count of Accepted Credit Card Applications by Age",
       x = "Age",
       y = "Count of Accepted Applications") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
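Raw counts by age depend on how many applicants there are at each age, so the two bar charts are hard to compare directly. One possible complement, sketched here on a small made-up data frame (`toy_card` and its values are hypothetical stand-ins for `card`), is the share of accepted applications per 10-year age band. The band width is an arbitrary choice for illustration.

```r
# Hypothetical stand-in for the `card` data frame (values are made up)
toy_card <- data.frame(
  card = factor(c("yes", "no", "yes", "yes", "no", "yes")),
  age  = c(23, 27, 34, 38, 45, 52)
)

# Group ages into 10-year bands: [20,30), [30,40), [40,50), [50,60)
toy_card$age_band <- cut(toy_card$age, breaks = seq(20, 60, by = 10), right = FALSE)

# Share of accepted applications within each band
accept_rate <- tapply(toy_card$card == "yes", toy_card$age_band, mean)
print(accept_rate)
```

With the real data, the same two steps could be applied to `card` in place of `toy_card`.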
Data set 2
Describing the data
The data contains the net returns on US stocks and their expected dividends, with stock prices measured by the broad-based (NYSE and AMEX) value-weighted index of stock prices.
Type of data —> Monthly multivariate time series (1931-2002).
Variables:
Returns —> Monthly return (%) minus the risk-free rate on US Treasury bills. The return incorporates the gain or loss from the change in price plus the dividend received.
Dividends —> Distributions to shareholders over past 12 months divided by the price in the current month.
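To make the returns definition concrete, here is a small worked example with made-up numbers. It assumes a simple (not logarithmic) percentage return, which may differ from the convention used to build the data set; all variable names and values are hypothetical.

```r
# Hypothetical one-month figures (not taken from the data set)
price_start   <- 100.0  # index level at the start of the month
price_end     <- 102.0  # index level at the end of the month
dividend_paid <- 0.5    # dividend received during the month
risk_free_pct <- 0.3    # monthly Treasury bill rate, in percent

# Total return: price change plus dividend, as a percentage of the starting price
total_return_pct <- 100 * (price_end - price_start + dividend_paid) / price_start

# Excess return: total return minus the risk-free rate
excess_return_pct <- total_return_pct - risk_free_pct
print(c(total = total_return_pct, excess = excess_return_pct))
```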
Adjusting data from Multivariate time series to a data frame
Here, class() tells us that the object is a multivariate time series ("mts"), whereas ggplot2 expects a data frame.
The series also has no explicit time column in the format ggplot2 expects, so we have to create one.
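The sketch below illustrates the idea on a tiny made-up monthly series (`toy` and its values are hypothetical). It relies on the fact that time() returns decimal years for a monthly ts object, and rebuilds a first-of-month Date from them using base R only. This is one possible approach, not necessarily the one used for the plots.

```r
# Toy monthly series standing in for the real data (values are made up)
toy <- ts(matrix(1:12, ncol = 2, dimnames = list(NULL, c("returns", "dividend"))),
          start = c(1931, 1), frequency = 12)

# time() returns decimal years (1931.000, 1931.083, ...), not dates
tt <- as.numeric(time(toy))

# Split into year and month; the small offset and round() guard against
# floating-point error in the decimal-year values
yr <- floor(tt + 1e-8)
mo <- round((tt - yr) * 12) + 1

# Convert to a data frame and attach a first-of-month Date column
toy_df <- as.data.frame(toy)
toy_df$Date <- as.Date(sprintf("%d-%02d-01", yr, mo))
print(toy_df)
```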
GGPlot - Identifying the Data structure
str(stock)
Time-Series [1:864, 1:2] from 1931 to 2003: 5.96 10.31 -6.84 -10.45 -14.36 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "returns" "dividend"
class(stock)
[1] "mts" "ts"
Creating a graph in GGplot
# Convert time series to a data frame
stock_df <- as.data.frame(stock)

# Create a new Date column from the time index. time(stock) returns decimal
# years (1931.000, 1931.083, ...), so convert via zoo's yearmon class.
# Passing these values to as.Date() with origin = "1970-01-01" would treat
# them as day counts and place every observation in 1975.
stock_df$Date <- as.Date(as.yearmon(time(stock)))

# Formatting data
stock_df$returns <- round(stock_df$returns, digits = 2)
stock_df$dividend <- round(stock_df$dividend, digits = 2)

# Plot for "returns"
ggplot(stock_df, aes(x = Date, y = returns)) +
  geom_line(color = "navyblue") +
  labs(title = "US Stock Returns Over Time", x = "Date", y = "Returns") +
  theme_minimal()
# Plot for "dividends"
ggplot(stock_df, aes(x = Date, y = dividend)) +
  geom_line(color = "darkgreen") +
  labs(title = "US Stock Dividends Over Time", x = "Date", y = "Dividends") +
  theme_minimal()
The table now also shows the added Date column.
# Creating a table (first 10 rows)
kable(head(stock_df, 10), format = "html", caption = "Stock returns (first 10 rows)")