library(readr)
library(readxl)
library(dplyr)
library(survey)
library(car)
library(broom)
library(stargazer)
data = read_excel("ms465_weighted.xlsx")
#fin = read_excel("Macintosh HD/Users/sudikshajoshi/Downloads/Survey Sampling/Project/ms465_weighted")
attach(data)

Variables

gender: What is your gender 1. Male 2. Female

doyouwork: Do you work for someone else, are you self-employed, or what? 1 Work for someone else 2 Self-employed 3 Other

III_A_1: ever purchased shares in an actively managed stock mutual fund

A mutual fund is a company that brings together money from many people and invests it in stocks, bonds or other assets. In an actively managed stock mutual fund, the fund manager tries to beat the overall stock market’s return by picking stocks to buy. In contrast, a passively managed stock mutual fund (also known as a stock index fund) holds stocks in order to match the performance of a market benchmark (such as the S&P 500 stock market index) as closely as possible.

Have you ever purchased shares in an actively managed stock mutual fund? 1 Yes 2 No 3 Unsure

D4: value of all investable financial assets

What is the value of all your investable financial assets? Include the value of your bank accounts, brokerage accounts, retirement savings accounts, investment properties, etc., but NOT the value of the home(s) you live in or any private businesses you own. 1 $0 2 $1 - $999 3 $1,000 - $4,999 4 $5,000 - $9,999 5 $10,000 - $24,999 6 $25,000 - $49,999 7 $50,000 - $74,999 8 $75,000 - $99,999 9 $100,000 or more

I_B_2_1b: uncertainty 10 year period

Concern that when uncertainty increases about how the U.S.’s material standard of living will change over the 10 year period starting 1 year in the future, the stock market will tend to drop. 1 Not important at all 2 A little important 3 Moderately important 4 Very important 5 Extremely important

I_B_3: economic disaster

Concern that in an economic disaster where the amount that the U.S. economy produces in a year shrinks by more than 10% – like the Great Depression – a dollar I invested in stocks would lose more value than a dollar I put in a bank savings account. 1 Not important at all 2 A little important 3 Moderately important 4 Very important 5 Extremely important

I_A_1a: What is the least amount of money worthwhile to invest in stocks

What is the least amount of money you would need to have available to make it worthwhile to invest in stocks? 1 $1 - $999 2 $1,000 - $4,999 3 $5,000 - $9,999 4 $10,000 - $24,999 5 $25,000 - $49,999 6 $50,000 - $74,999 7 $75,000 - $99,999 8 $100,000 or more

IV_A_1: growth stock vs value stock risk

A value stock is a stock that has a low price relative to its company’s current profits (and other fundamentals). A growth stock is a stock that has a high price relative to its company’s current profits (and other fundamentals). Compared to a growth stock, I expect a value stock to normally be 1 Riskier over the next year, on average 2 Equally risky over the next year, on average 3 Less risky over the next year, on average 4 No opinion

IV_A_2: growth stock vs value stock returns

Compared to a growth stock, I expect a value stock to normally have … 1 Higher returns over the next year, on average 2 About the same returns over the next year, on average 3 Lower returns over the next year, on average 4 No opinion

IV_B_1: stock whose price fell vs stock whose price rose risk

Compared to a stock whose price fell a lot over the past year, I expect a stock whose price rose a lot over the past year to normally be 1 Riskier over the next year, on average 2 Equally risky over the next year, on average 3 Less risky over the next year, on average 4 No opinion

I established the survey design before analyzing the data. As there is one observation per individual, I specified ids = ~0 (see code) to indicate no clustering. I omitted the finite population correction because the sample is a very small fraction of the population. The survey design also includes sampling weights.

# Initiate the svydesign object for an independent sampling design with weights and no clustering. I will use the "fin_decisions" object for further analysis:
(fin_decisions = svydesign(ids= ~0 , weights = ~weight, data = data))
## Independent Sampling design (with replacement)
## svydesign(ids = ~0, weights = ~weight, data = data)
# view the survey design’s object class or type:
class(fin_decisions)
## [1] "survey.design2" "survey.design"
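To make the role of the weights concrete, here is a minimal sketch on made-up data (the values of `y` and `w` are hypothetical, not taken from the survey). For a weighted design without clustering, the point estimate of a mean should coincide with the weighted sum of responses divided by the sum of the weights.

```r
# Hypothetical toy data: responses y and sampling weights w (not from the survey)
y <- c(2, 3, 5, 4)
w <- c(1.5, 0.5, 2.0, 1.0)

# Weighted mean by hand: sum of w * y divided by sum of w
manual_mean <- sum(w * y) / sum(w)

# Base R's weighted.mean() computes the same quantity
builtin_mean <- weighted.mean(y, w)

print(c(manual = manual_mean, builtin = builtin_mean))
```

Respondents with larger weights pull the estimate toward their answers, which is why the weighted and unweighted means can differ.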
# count the missing values in each of the selected variables
small_data = as.data.frame(data[c(6, 14, 33, 45, 47, 85, 106, 114:116)])

(missing = sapply(small_data, function(x) sum(is.na(x))) %>% as.data.frame() %>% rename("Number of Missing Values" = "."))
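As a small illustration of the same count, the sketch below uses a made-up data frame (the column names `a`, `b`, `c` are hypothetical). It relies only on base R: `is.na()` returns a logical matrix and `colSums()` counts the `TRUE` values per column.

```r
# Hypothetical toy data frame with some missing values (not the survey data)
toy <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 2), c = c(1, 2, 3))

# is.na() gives a logical matrix; colSums() counts the TRUEs in each column
missing_counts <- colSums(is.na(toy))
print(missing_counts)
```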

Mean estimates by three groups: gender, doyouwork and iii_a_1 (share purchases in mutual funds):

svyby(~d4+i_b_2_1b+i_b_3+i_a_1a+ iv_a_1+ iv_a_2, by=~gender, design=fin_decisions, svymean, na.rm=TRUE) %>% as.data.frame() %>% select("gender", "d4", "i_b_2_1b", "i_b_3", "i_a_1a" , "iv_a_1", "iv_a_2")
svyby(~d4+i_b_2_1b+i_b_3+i_a_1a+ iv_a_1+ iv_a_2, by=~doyouwork, design=fin_decisions, svymean, na.rm=TRUE) %>% select("doyouwork", "d4", "i_b_2_1b", "i_b_3", "i_a_1a" , "iv_a_1", "iv_a_2")
svyby(~d4+i_b_2_1b+i_b_3+i_a_1a+ iv_a_1+ iv_a_2, by=~iii_a_1, design=fin_decisions, svymean, na.rm=TRUE) %>% select("iii_a_1", "d4", "i_b_2_1b", "i_b_3", "i_a_1a" , "iv_a_1", "iv_a_2")

For d4 (value of investable financial assets), the estimate for males is 3.37, which exceeds the estimate for females by 0.23. While both groups fall in the $1,000 - $4,999 range of investable financial assets, men typically hold more than women. By employment status, respondents who work for someone else hold slightly less than those who are self-employed, but both fall in the same $1,000 - $4,999 range. Respondents in the “other” category hold a higher amount, in the $5,000 - $9,999 range.

In both groupings, males and respondents who work for someone else are less concerned than females and self-employed respondents, respectively, that rising uncertainty about the standard of living will lead to a drop in stock prices.

Males are very marginally less concerned than females (although both are moderately concerned) that in a Great Depression-like crisis, in which the U.S. economy shrinks by more than 10 percent, a dollar invested in stocks would lose more value than a dollar in a bank savings account. Similarly, self-employed respondents are more concerned than those working for others.

Men consider the least amount worthwhile to invest in stocks to be at the lower end of the $5,000 - $9,999 range. Females and respondents working for others place it at the higher end of the $1,000 - $4,999 range, while self-employed respondents place it in the middle of the $5,000 - $9,999 range.

Males and self-employed respondents expect value stocks to be equally risky over the next year, while females and respondents working for others expect value stocks to be less risky on average. Although all groups expect about the same returns from value stocks next year, females and those working for others are more optimistic than males and self-employed respondents.
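The reading of a mean response code as a dollar bracket can be sketched as follows. The helper `code_to_bracket` is hypothetical, and it assumes that a mean is read by rounding down to the nearest whole code, which is how the estimates appear to be interpreted here. Because the codes are ordinal, the position within a bracket is only a rough guide, not a dollar amount.

```r
# D4 response labels, copied from the codebook above
d4_labels <- c("$0", "$1 - $999", "$1,000 - $4,999", "$5,000 - $9,999",
               "$10,000 - $24,999", "$25,000 - $49,999", "$50,000 - $74,999",
               "$75,000 - $99,999", "$100,000 or more")

# Hypothetical helper: map a mean response code to the bracket it falls in.
# Assumption: the mean is read by rounding down to the nearest whole code.
code_to_bracket <- function(mean_code, labels) {
  code <- floor(mean_code)
  position <- mean_code - code  # near 0 = lower end, near 1 = higher end
  data.frame(mean_code = mean_code, bracket = labels[code], position = position)
}

# 3.37 is the estimate for males quoted in the text
print(code_to_bracket(3.37, d4_labels))
```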

# concern that when uncertainty about the U.S.’s material standard of living over a 10 year period increases, the stock market will tend to drop
survey_mean = rbind(
svymean(~i_b_2_1b, design = fin_decisions, na.rm = TRUE),

# value of all the investable financial assets
svymean(~d4, design = fin_decisions, na.rm = TRUE),

# in a Great Depression-like event, a dollar in stocks will lose more value than a dollar in a bank savings account
svymean(~i_b_3, design = fin_decisions, na.rm = TRUE),

# What is the least amount of money worthwhile to invest in stocks
svymean(~i_a_1a, design = fin_decisions, na.rm = TRUE),

# growth stock vs value stock risk
svymean(~iv_a_1, design = fin_decisions, na.rm = TRUE),

# growth stock vs value stock return
svymean(~iv_a_2, design = fin_decisions, na.rm = TRUE)
)

# confidence intervals of the mean responses, in the same variable order as survey_mean
conf_int = rbind(
uncertain_in_10yrs = confint(svymean(~i_b_2_1b, design = fin_decisions, na.rm = TRUE)),
value_of_investable_fin_assets = confint(svymean(~d4, design = fin_decisions, na.rm = TRUE)),
econ_disaster = confint(svymean(~i_b_3, design = fin_decisions, na.rm = TRUE)),
money_invest_in_stock = confint(svymean(~i_a_1a, design = fin_decisions, na.rm = TRUE)),
growth_value_vs_stock_risk = confint(svymean(~iv_a_1, design = fin_decisions, na.rm = TRUE)),
growth_value_vs_stock_return = confint(svymean(~iv_a_2, design = fin_decisions, na.rm = TRUE))
)

# combine intervals and means; rows line up because both use the same variable order
(mean_ci = cbind(conf_int, survey_mean) %>% as.data.frame() %>% rename(Mean = i_b_2_1b))
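One way to guarantee that each mean sits on the same row as its interval is to compute both in a single pass per variable. The sketch below uses made-up data; the names `toy`, `q1`, `q2`, and `mean_and_ci` are hypothetical stand-ins, and the design is declared with `ids = ~1`, which also means no clustering.

```r
library(survey)

# Hypothetical toy data standing in for the survey file (values are made up)
toy <- data.frame(
  q1 = c(1, 2, 3, 4, 5, NA, 2, 3),
  q2 = c(2, 2, 4, 1, NA, 3, 5, 4),
  weight = c(1.2, 0.8, 1.0, 1.5, 0.7, 1.1, 0.9, 1.3)
)
toy_design <- svydesign(ids = ~1, weights = ~weight, data = toy)

# For one variable, compute the mean and its confidence interval together
mean_and_ci <- function(var, design) {
  est <- svymean(reformulate(var), design = design, na.rm = TRUE)
  ci <- confint(est)
  data.frame(variable = var, mean = unname(coef(est)),
             lower = ci[1, 1], upper = ci[1, 2])
}

# Stack one row per variable
result <- do.call(rbind, lapply(c("q1", "q2"), mean_and_ci, design = toy_design))
print(result)
```

Since every row is built from one `svymean` call, the order of the variables cannot drift between the means and the intervals.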

On average, the response value for i_b_2_1b is 2.66. Respondents are moderately concerned that the stock market will tend to drop when uncertainty increases about how the U.S.’s material standard of living will change over the next 10 years. I am 95 percent confident that the mean response is between 2.546746 and 2.778895.
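The link between the reported mean and its interval can be checked with a little arithmetic. This sketch assumes the interval is a symmetric, normal-based 95 percent interval, so the mean is the midpoint and the half-width equals the normal quantile times the standard error.

```r
# Interval bounds quoted in the text for i_b_2_1b
lower <- 2.546746
upper <- 2.778895

# The point estimate sits at the midpoint of a symmetric interval
midpoint <- (lower + upper) / 2

# Half-width = z * SE, with z the 97.5th percentile of the standard normal
# (assumption: the interval is a normal-based 95% interval)
z <- qnorm(0.975)
implied_se <- (upper - lower) / 2 / z

print(round(c(mean = midpoint, se = implied_se), 3))
```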

On average, the response value for d4 is 6.017. Respondents hold an estimated $25,000 - $49,999 in investable financial assets, which include bank accounts, brokerage accounts, retirement savings accounts, investment properties, etc. I am 95 percent confident that the mean response is between 5.681171 and 6.354655.

On average, the response value for i_b_3 is 3.183. Respondents are moderately concerned that if a Great Depression-like crisis disrupts the U.S. economy, a dollar invested in stocks would lose more value than a dollar in a bank savings account. I am 95 percent confident that the mean response is between 3.045577 and 3.320872.

On average, the response value for i_a_1a is 3.047. So, the least amount of money that respondents consider worthwhile to invest in stocks is in the range of $5,000 - $9,999. I am 95 percent confident that the mean response is between 2.654978 and 3.440853.

On average, the response value for iv_a_1 is 2.828. Compared to a growth stock, respondents expect a value stock to be equally or less risky over the next year. I am 95 percent confident that the mean response is between 2.733018 and 2.923122.

On average, the response value for iv_a_2 is 2.57. Compared to a growth stock, respondents expect a value stock to have about the same or lower returns over the next year. I am 95 percent confident that the mean response is between 2.461672 and 2.690477.
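In the codebook, iv_a_1 and iv_a_2 use code 4 for "No opinion", which is not on the same ordinal scale as codes 1 to 3. One possible refinement, not something done in the analysis here, is to set code 4 to missing before averaging. The sketch below uses made-up responses to show how much the mean can move.

```r
# Hypothetical responses on the IV_A_1 scale (1-3 ordinal, 4 = No opinion)
responses <- c(1, 2, 3, 4, 4, 2, 3, 4)

# Treat "No opinion" as missing rather than as the top of the scale
recoded <- ifelse(responses == 4, NA, responses)

raw_mean <- mean(responses)                  # includes the 4s
recoded_mean <- mean(recoded, na.rm = TRUE)  # ordinal codes only

print(c(raw = raw_mean, recoded = recoded_mean))
```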

# regression with raw variables
fit_reg = svyglm(i_a_1a ~ d4 + i_b_2_1b + i_b_3 + iv_a_1 + iv_a_2   , design = fin_decisions)

stargazer(fit_reg, type="text", dep.var.labels = c("money invest in stocks"), covariate.labels = c("value of investable fin assets", "uncertainty in 10 yrs", "econ disaster", "growth value vs stock risk", "growth value vs stock return") , align = TRUE)
## 
## ==========================================================
##                                    Dependent variable:    
##                                ---------------------------
##                                  money invest in stocks   
## ----------------------------------------------------------
## value of investable fin assets            0.127           
##                                          (0.077)          
##                                                           
## uncertainty in 10 yrs                     0.208           
##                                          (0.192)          
##                                                           
## econ disaster                            -0.304           
##                                          (0.209)          
##                                                           
## growth value vs stock risk                0.260           
##                                          (0.242)          
##                                                           
## growth value vs stock return             -0.220           
##                                          (0.243)          
##                                                           
## Constant                                2.918***          
##                                          (0.784)          
##                                                           
## ----------------------------------------------------------
## Observations                               215            
## Log Likelihood                          -495.546          
## Akaike Inf. Crit.                       1,003.091         
## ==========================================================
## Note:                          *p<0.1; **p<0.05; ***p<0.01

Except for the intercept, none of the coefficients is statistically significant at the 10 percent level, so the model provides no evidence that these variables explain how much money respondents invest in stocks. The negative coefficient on “econ disaster” implies that respondents who are more concerned about an economic disaster invest less in stocks, which is consistent with economic theory. The negative coefficient on “growth value vs stock return” suggests that respondents who expect value stocks to earn lower returns relative to growth stocks invest less in stocks. The positive coefficient on “uncertainty in 10 yrs” indicates that respondents who are more concerned about rising economic uncertainty invest more. This runs counter to economic theory and, given the lack of significance, suggests that the variable is not a good predictor of the response variable. Below are the diagnostic plots that identify patterns in the residuals.
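The significance reading can be reproduced roughly from the table by dividing each coefficient by its standard error. This is a sketch using the rounded values printed in the table and a large-sample cutoff of 1.96 for the 5 percent level; exact p-values depend on the design degrees of freedom and on unrounded estimates, so borderline cases should be read with care.

```r
# Coefficients and standard errors as rounded in the regression table
est <- c(d4 = 0.127, i_b_2_1b = 0.208, i_b_3 = -0.304,
         iv_a_1 = 0.260, iv_a_2 = -0.220, Constant = 2.918)
se  <- c(0.077, 0.192, 0.209, 0.242, 0.243, 0.784)

# Ratio of each estimate to its standard error
t_ratio <- est / se

# Rough two-sided check at the 5% level (large-sample approximation)
exceeds <- abs(t_ratio) > 1.96

print(round(t_ratio, 2))
print(names(est)[exceeds])
```

Among the slopes, d4 has the largest ratio in absolute value, which matches it being the closest to significance in the table.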

# diagnostic plots of the model with raw variables
plot(fit_reg)

The residuals vs fitted plot shows that the residuals are spread almost evenly about the horizontal line, which suggests that they are homoskedastic and that the relationship between the predictors and the response variable is approximately linear.

In the normal Q-Q plot, the residuals deviate severely from the straight line. They appear to be skewed, so the residuals are not normally distributed. A few outliers are also present.
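For readers who want to see what these two diagnostics look like when the errors are skewed, here is a minimal sketch on simulated data. Everything in it is hypothetical: it uses an ordinary `lm` fit with exponential noise rather than the survey model, purely to illustrate how the plots are built.

```r
set.seed(1)

# Hypothetical simulated data with right-skewed errors (not the survey data)
n <- 200
x <- runif(n, 1, 9)
y <- 2 + 0.1 * x + rexp(n)  # exponential noise is skewed

fit <- lm(y ~ x)
res <- resid(fit)

# Residuals vs fitted: look for a trend or a funnel shape
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points bending away from the line indicate non-normality
qqnorm(res)
qqline(res)
```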

Because the variables were insignificant, I fit another regression model with transformed variables, applying a Box-Cox transformation to each variable.
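As a reminder of what the transformation does, here is a minimal sketch of the standard Box-Cox formula for strictly positive values. The helper name `box_cox` is hypothetical; it is meant to mirror the basic power family, not to replace the estimation of lambda.

```r
# Hypothetical helper implementing the Box-Cox power transformation
# for strictly positive y:
#   (y^lambda - 1) / lambda   if lambda != 0
#   log(y)                    if lambda == 0
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0, na.rm = TRUE))
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

y <- c(1, 2, 4, 8)
print(box_cox(y, 0))    # log transform
print(box_cox(y, 0.5))  # square-root-like transform
print(box_cox(y, 1))    # shifts y by 1, shape unchanged
```

A lambda close to 1 therefore leaves the shape of a variable essentially unchanged, while values near 0 compress the upper tail.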

# lambdas to get the transformed variables
lambda_bc_y = powerTransform(data$i_a_1a, family = "bcPower")
lambda_bc_d4 = powerTransform(data$d4, family = "bcPower")
lambda_bc_i_b_2_1b = powerTransform(data$i_b_2_1b, family = "bcPower")
lambda_bc_i_b_3 = powerTransform(data$i_b_3, family = "bcPower")
lambda_bc_iv_a_1 = powerTransform(data$iv_a_1, family = "bcPower")
lambda_bc_iv_a_2 = powerTransform(data$iv_a_2, family = "bcPower")
 
# data frame that consists of all the lambdas
transform = rbind(lambda_bc_y$lambda, lambda_bc_d4$lambda, lambda_bc_i_b_2_1b$lambda, lambda_bc_i_b_3$lambda, lambda_bc_iv_a_1$lambda, lambda_bc_iv_a_2$lambda) %>% as.data.frame() %>% rename("lambda" = "data$i_a_1a")

# dataframe that stores the names of the variables
variables = data.frame(Variables = c("money invest in stocks", "value of financial assets", "uncertainty 10 years", "econ disaster", "growth stock vs value stock risk", "growth stock vs value stock return"))

# data frame with the variable names and their respective lambdas
variables %>% cbind(transform)
# Add transformed variables to the data sets
data_bc = data %>%
mutate(bc_y = bcPower(i_a_1a, lambda = lambda_bc_y$lambda), 
       bc_d4 = bcPower(d4, lambda = lambda_bc_d4$lambda),
       bc_i_b_2_1b = bcPower(i_b_2_1b, lambda = lambda_bc_i_b_2_1b$lambda), 
       bc_i_b_3 = bcPower(i_b_3, lambda = lambda_bc_i_b_3$lambda),
       bc_iv_a_1 = bcPower(iv_a_1, lambda = lambda_bc_iv_a_1$lambda), 
       bc_iv_a_2 = bcPower(iv_a_2, lambda = lambda_bc_iv_a_2$lambda))

# make a new design object to incorporate the bc transformed variable
fin_decisions_bc = svydesign(ids= ~0 , weights = ~weight, data = data_bc)

I saved the transformed variables in a new dataframe, and created a new survey design.

# regression with the transformed variables
fit_reg_bc = svyglm(bc_y ~ bc_d4 + bc_i_b_2_1b + bc_i_b_3 + bc_iv_a_1 + bc_iv_a_2  , design = fin_decisions_bc)


stargazer(fit_reg_bc, type="text", dep.var.labels = c("bc money invest in stocks"), covariate.labels = c("bc value of investable fin assets", "bc uncertainty in 10 yrs", "bc econ disaster", "bc growth value vs stock risk", "bc growth value vs stock return") , align = TRUE)
## 
## =============================================================
##                                       Dependent variable:    
##                                   ---------------------------
##                                    bc money invest in stocks 
## -------------------------------------------------------------
## bc value of investable fin assets            0.023           
##                                             (0.020)          
##                                                              
## bc uncertainty in 10 yrs                     0.039           
##                                             (0.117)          
##                                                              
## bc econ disaster                            -0.090           
##                                             (0.088)          
##                                                              
## bc growth value vs stock risk                0.054           
##                                             (0.072)          
##                                                              
## bc growth value vs stock return             -0.102           
##                                             (0.139)          
##                                                              
## Constant                                   1.105***          
##                                             (0.218)          
##                                                              
## -------------------------------------------------------------
## Observations                                  215            
## Log Likelihood                             -297.527          
## Akaike Inf. Crit.                           607.054          
## =============================================================
## Note:                             *p<0.1; **p<0.05; ***p<0.01

Even in the new model, the variables remain statistically insignificant, and the diagnostic plots show only minimal differences.

# diagnostic plots of the model with transformed variables
plot(fit_reg_bc)