ECNM Discussion 2

Author

Bryan Calderon

library("AER")
Warning: package 'AER' was built under R version 4.3.3
Loading required package: car
Loading required package: carData
Loading required package: lmtest
Loading required package: zoo

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
Loading required package: sandwich
Loading required package: survival

Data set 1

Describing the data

The data is a cross-sectional data frame showing the credit history for a sample of credit card applicants (2003). It includes a variety of interesting variables such as the applicant's age, expenditures, income, and application approval status. I was very interested in understanding consumer spending by age and income, as well as applicants' approval status by age.

Variables:

  1. card —> Factor. Was the application for a credit card accepted?

  2. reports —> Number of major derogatory reports.

  3. age —> Age in years plus twelfths of a year.

  4. income —> Yearly income (in USD 10,000).

  5. share —> Ratio of monthly credit card expenditure to yearly income.

  6. expenditure —> Average monthly credit card expenditure.

  7. owner —> Factor. Does the individual own their home?

  8. selfemp —> Factor. Is the individual self-employed?

  9. dependents —> Number of dependents.

  10. months —> Months living at current address.

  11. majorcards —> Number of major credit cards held.

  12. active —> Number of active credit accounts.

Bringing in & Formatting Data

data("CreditCard")

card <- CreditCard

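# Formatting data
# Income is stored in units of USD 10,000; convert to dollars
card$income <- card$income * 10000
card$income <- round(card$income, digits = 0)
# format() returns character strings, so income is display text from here on
card$income <- format(card$income, big.mark = ",", scientific = FALSE)

card$expenditure <- round(card$expenditure, digits = 2)
# Round age to whole years (the small offset nudges exact halves upward)
card$age <- round(card$age + .01)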

card$share <- round(card$share, digits = 4)
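
One side effect of the formatting step is that format() returns character strings, so the formatted income can no longer be used in numeric summaries or on a numeric plot axis. A minimal sketch of one way around this, assuming a separate display-only column (the names toy and income_label are hypothetical) and using three income values from the first rows of the data:

```r
# Toy data frame: three incomes in units of USD 10,000
toy <- data.frame(income = c(4.52, 2.42, 4.50))

# Keep the numeric column in dollars for calculations
toy$income <- round(toy$income * 10000)

# Hypothetical display-only column with thousands separators
toy$income_label <- format(toy$income, big.mark = ",", scientific = FALSE)

class(toy$income)        # numeric column, usable in calculations
class(toy$income_label)  # character column, for tables only
mean(toy$income)         # numeric summaries still work
```

Keeping both columns lets the table show the formatted text while plots and summaries use the numeric values.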

Creating a structured Table

library(knitr)

# Creating a table (First 10 rows)
kable(head(card, 10), 
      format = "html", 
      caption = "Credit Card (first 10 rows)")
Credit Card (first 10 rows)
card  reports  age  income  share   expenditure  owner  selfemp  dependents  months  majorcards  active
yes   0        38   45,200  0.0333  124.98       yes    no       3           54      1           12
yes   0        33   24,200  0.0052  9.85         no     no       3           34      1           13
yes   0        34   45,000  0.0042  15.00        yes    no       4           58      1           5
yes   0        31   25,400  0.0652  137.87       no     no       0           25      1           7
yes   0        32   97,867  0.0671  546.50       yes    no       2           64      1           5
yes   0        23   25,000  0.0444  92.00        no     no       0           54      1           1
yes   0        28   39,600  0.0126  40.83        no     no       2           7       1           5
yes   0        29   23,700  0.0764  150.79       yes    no       0           77      1           3
yes   0        37   38,000  0.2456  777.82       yes    no       0           97      1           6
yes   0        28   32,000  0.0198  52.58        no     no       0           65      1           18

Creating a Scatter plot and bar chart

library(ggplot2)

ggplot(card, aes(x = age, y = share)) +
  geom_point(color = "navyblue") +
  labs(title = "Scatter plot of Age vs Share (expense / income)", 
       x = "Age", 
       y = "Share") +
  theme_minimal()

# Filter data to only rejected applicants
rejected_data <- card[card$card == "no", ]

# Create a bar chart
ggplot(rejected_data, aes(x = as.factor(age))) +
  geom_bar(fill = "red", color = "black") +
  labs(title = "Count of Rejected Credit Card Applications by Age",
       x = "Age",
       y = "Count of Rejected Applications") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Filter data to only accepted applicants
accepted_data <- card[card$card == "yes", ]

# Create a bar chart
ggplot(accepted_data, aes(x = as.factor(age))) +
  geom_bar(fill = "green", color = "black") +
  labs(title = "Count of Accepted Credit Card Applications by Age",
       x = "Age",
       y = "Count of Accepted Applications") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Creating a two-way table

table(card$card, card$age)
     
       0  1 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
  no   0  1  0  2  4  7  7 18 15 19 14 12 15  9  9 12 12 10  9 14 14 13  9  9
  yes  1  5  2  8 15 27 43 44 45 75 43 45 57 40 38 36 40 33 31 32 25 25 22 26
     
      40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
  no   6  9  6  5  0  6  4  2  4  0  1  0  2  5  1  1  1  2  1  1  0  0  0  2
  yes 36 29 24 14 14 14 24 12 13 13  8  3 13  3  7  9  1  6  3  1  2  4  2  2
     
      64 65 66 67 70 72 74 80 84
  no   0  0  0  1  1  0  0  1  0
  yes  2  1  1  1  0  1  1  0  1
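
Raw counts are hard to compare across ages because the number of applicants differs so much from one age to the next. A small sketch of turning counts into within-age shares with prop.table(), using the counts for ages 23 to 27 copied from the output above (the object names counts and shares are illustrative):

```r
# Counts for ages 23 to 27, copied from the two-way table output
counts <- matrix(c(18, 44,
                   15, 45,
                   19, 75,
                   14, 43,
                   12, 45),
                 nrow = 2,
                 dimnames = list(card = c("no", "yes"),
                                 age  = c("23", "24", "25", "26", "27")))

# Share of rejected and accepted applications within each age
# (margin = 2 makes each column sum to 1)
shares <- prop.table(counts, margin = 2)
round(shares, 2)
```

On the full data the same idea would be prop.table(table(card$card, card$age), margin = 2).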

Data set 2

Describing the data

The data contains the net returns for US stock prices and their expected dividends, measured by the broad-based (NYSE and AMEX) value-weighted index of stock prices.

  • Type of Data —> it is structured as a multivariate time series data set (1931 - 2002)

  • Variables

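    • returns —> Monthly return (%) minus the risk-free rate of the US Treasury bill. The return incorporates the gain or loss from the change in price plus the dividend received.

    • dividend —> Based on the dividend yield: distributions to shareholders over the past 12 months divided by the price in the current month. The AER documentation describes the stored values as 100 times the log of this ratio, which is why they appear as large negative numbers.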
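
The dividend values in this data set are large negative numbers, which fits the AER documentation's description of the variable as 100 times the log of the dividend yield. Assuming that scaling holds, a short sketch of converting the first three values back to an ordinary yield (the name dividend_yield is illustrative):

```r
# First three dividend values as they appear in the data
dividend <- c(-282.2329, -293.2089, -287.8614)

# Assumption: dividend = 100 * log(dividend yield), so invert with exp()
dividend_yield <- exp(dividend / 100)

# Express the yield as a percentage of the current price
round(100 * dividend_yield, 2)
```

Under this assumption the early-1931 yields come out between 5 and 6 percent.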

data("USStocksSW")
stock <- USStocksSW

# Creating a table (First 10 rows)
kable(head(stock, 10),
      format = "html",
      caption = "Stock returns (first 10 rows)")
Stock returns (first 10 rows)
 returns   dividend
  5.9650  -282.2329
 10.3053  -293.2089
 -6.8408  -287.8614
-10.4481  -278.2477
-14.3581  -265.4742
 12.8503  -280.5102
 -6.6559  -275.5950
  0.0461  -278.4424
-34.2584  -247.1829
  7.6799  -255.0321
# Creating a simple graph
plot(stock)

Understanding the type of data

  • Adjusting data from Multivariate time series to a data frame

    • Here, class() reports a multivariate time series ("mts"), while ggplot2 expects a data frame.

    • The time series has no explicit time column in the form ggplot2 expects, so we have to create one.
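
A minimal sketch of that conversion using only base R, assuming a short monthly series built here for illustration (toy_ts and toy_df are hypothetical names; the values are the first four returns in the data). time() gives the decimal year and cycle() gives the month position, which together are enough to build a Date column:

```r
# Hypothetical monthly series starting January 1931
toy_ts <- ts(c(5.9650, 10.3053, -6.8408, -10.4481),
             start = c(1931, 1), frequency = 12)

toy_df <- data.frame(returns = as.numeric(toy_ts))

# Year from the decimal time index; the small offset guards against
# floating-point error before taking the floor
yr <- as.integer(floor(as.numeric(time(toy_ts)) + 1e-6))

# Month from the position within the yearly cycle (1 to 12)
mo <- as.integer(cycle(toy_ts))

# First day of each month as a Date
toy_df$Date <- as.Date(sprintf("%d-%02d-01", yr, mo))

toy_df
```

This does not depend on any package-specific as.Date() method, so it behaves the same whether or not zoo is attached.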

ggplot2 - Identifying the data structure

str(stock)
 Time-Series [1:864, 1:2] from 1931 to 2003: 5.96 10.31 -6.84 -10.45 -14.36 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "returns" "dividend"
class(stock)
[1] "mts" "ts" 

Creating a graph in ggplot2

# Convert time series to a data frame
stock_df <- as.data.frame(stock)

# Create a new Date column from the time index.
# With zoo attached (it loads with AER), as.Date() has a method for ts objects
# that maps each monthly time point to the first day of that month.
stock_df$Date <- as.Date(time(stock))

# Formatting data
stock_df$returns <- round(stock_df$returns, digits = 2)
stock_df$dividend <- round(stock_df$dividend, digits = 2)

# Plot for "returns"
ggplot(stock_df, aes(x = Date, y = returns)) +
  geom_line(color = "navyblue") +
  labs(title = "US Stock Returns Over Time", 
       x = "Date", 
       y = "Returns") +
  theme_minimal()

# Plot for "dividend"
ggplot(stock_df, aes(x = Date, y = dividend)) +
  geom_line(color = "darkgreen") +
  labs(title = "US Stock Dividends Over Time", 
       x = "Date", 
       y = "Dividends") +
  theme_minimal()

The table now also includes the added Date column.

# Creating a table (First 10 rows)
kable(head(stock_df, 10), 
      format = "html", 
      caption = "Stock returns (first 10 rows)")
Stock returns (first 10 rows)
returns  dividend  Date
   5.96   -282.23  1931-01-01
  10.31   -293.21  1931-02-01
  -6.84   -287.86  1931-03-01
 -10.45   -278.25  1931-04-01
 -14.36   -265.47  1931-05-01
  12.85   -280.51  1931-06-01
  -6.66   -275.60  1931-07-01
   0.05   -278.44  1931-08-01
 -34.26   -247.18  1931-09-01
   7.68   -255.03  1931-10-01