data=read_excel("E:\\datasets for programming\\Final Excel REsponses for PHD (2).xlsx",sheet = 'Sheet3')
New names:
data1=read_excel("E:\\datasets for programming\\Final Excel REsponses for PHD (2).xlsx")
New names:
# Cronbach's ALpha for Awareness of FL
library(psych)
Warning: package ‘psych’ was built under R version 4.4.3
Awareness_alpha = psych::alpha(data1[,c(48:77)]) 
ifelse(Awareness_alpha$total[,1] > 0.75 , 'Likert items hang together in well manner','No Redundancy')
[1] "Likert items hang together in well manner"
Awareness_alpha$total[,1]
[1] 0.9743263
Investment_conideration_alpha=psych::alpha(data1[,c(34:39)])
ifelse(Investment_conideration_alpha$total[,1] > 0.75 ,'Likert items of investment consideration hang together in well manner', 'No redudancy')
[1] "Likert items of investment consideration hang together in well manner"
Investment_conideration_alpha$total[,1]
[1] 0.8280706
Major_investment_alpha$total[,1]
[1] 0.6387958
Investment_challange_alpha$total[,1] # Acceptable
[1] 0.6004277

is os found that there is no statistically significant difference in Financial Literacy Score distributions across the maratial status

from the above plot we can infer that the average finincial literacy score for those womens whose working experience is Up to 5y, 6-10y, above 15y is almost same. And due to unfulfillement of required assumption of parametric test, i have moved to non parametric kruskal wallis test which suggest that, there is no statistically significant difference between FL score for women with different level of work experience.

ggplot(data, aes(x =data[[31]])) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  geom_vline(aes(xintercept = mean(data[[31]])), 
             color = "red", linetype = "dashed", size = 1.2) +
  labs(x = "Sector status", y = "Investment consideration Score", title = "Score by Sector Status")+
  facet_wrap(~data[[11]])
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
Please use `linewidth` instead.

CA Interpretation :

Here’s how to interpret the plot and the summary output:

Inertia (Explained Variance): The summary output will show you how much of the total “inertia” (which is related to the χ 2 statistic) is explained by each dimension (axis).

Dimension 1 (Dim 1) likely accounts for the vast majority of the inertia (~80-90%). Looking at the plot, this horizontal axis clearly separates preferences, running from LeastPreferred on the left to Preferred and HighlyPreferred on the right. It represents the primary axis of “dislike vs. like.”

Dimension 2 (Dim 2) accounts for a smaller portion of the inertia. This vertical axis helps separate the categories further.

Interpreting the Biplot:

Safe/Low Risk is positioned very closely to Preferred and HighlyPreferred. This indicates a strong association; people who prefer safe/low-risk investments also tend to rate them as preferred or highly preferred.

High Risk is positioned far out on the left, very close to LeastPreferred and DontKnow. This shows a strong association where high-risk investments are seen as the least preferred. The proximity to “DontKnow” might suggest uncertainty or aversion.

Moderate Risk and Traditional are located closer to the center (origin) and near Neutral. This suggests their preference profiles are less extreme and more average compared to the high and low-risk categories. “Traditional” in particular seems to have a fairly neutral, slightly varied preference profile.

Overall Story: The analysis paints a clear picture. The primary factor driving investment preference is the level of risk. Safe/Low-risk options are strongly and positively preferred, while High-risk options are strongly disliked. Moderate and traditional options fall in the middle, eliciting a more neutral or average response.

library(corrplot)
Warning: package ‘corrplot’ was built under R version 4.4.3corrplot 0.95 loaded
residual=chi_sq_result$residuals
print("standardized Residuals:");print(round(residual,2))
[1] "standardized Residuals:"
     DontKnow LeastPreferred Neutral Preferred HighlyPreferred
[1,]     8.52           3.48   -1.36     -5.46           -3.92
[2,]     0.49          -6.19    4.89      2.52           -2.99
[3,]    -5.67           0.65   -2.01      3.61            3.13
[4,]    -0.49           3.40   -1.79     -3.26            2.94
# Visualize the residuals using corrplot
# Positive residuals are in blue, negative are in red.
# The size and color intensity of the circle correspond to the magnitude of the residual.

corrplot(residual,is.cor=FALSE,
         method = "color", # Display as colored cells
         tl.col = "black",
         addCoef.col = "black", # Add the residual values on the plot
         number.cex = 0.8)

NA
NA

The key to interpretation is the “rule of 2”:

A standardized residual greater than 2 means the observed count in that cell is significantly higher than what we’d expect if there were no association.

A standardized residual less than -2 means the observed count is significantly lower than expected.

Values between -2 and 2 are considered within the range of random chance.


# Major Invest Objectives
Major_Objectives=rbind(table(data1[[16]]),table(data1[[17]]),table(data1[[18]]),table(data1[[19]]),
      table(data1[[20]]),table(data1[[21]]),table(data1[[22]]),table(data1[[23]]))
rownames(Major_Objectives)=c('Wealth Creation','Earn regular Return','Meet unanticipated e',
               'Children/Self education','Children/self Marriage','Old age Plannig',
               'Reduce Tax Burden','House Purchase')
Major_Objectives
                          1   2   3
Wealth Creation         304 106  29
Earn regular Return     184 148 107
Meet unanticipated e    110 195 134
Children/Self education 146 129 164
Children/self Marriage  165 149 125
Old age Plannig         194 136 109
Reduce Tax Burden       133 145 161
House Purchase          163 112 164
# Ordinary INvest objectives
Ordinary_Objectuves=rbind(table(data1[[25]]),table(data1[[26]]),table(data1[[27]]),table(data1[[28]]),
      table(data1[[29]]),table(data1[[30]]))
Warning: number of columns of result is not a multiple of vector length (arg 1)
rownames(Ordinary_Objectuves)=c('Make Earning/ part of income','Have a flow of income','Anticipated future needs',
               'Build reserve for unseen contigencies','Build assets[Land,Gold]','Save Tax')

Ordinary_Objectuves
                                        1   2  3  4  5   6   7
Make Earning/ part of income          246  82 40 10 53   8 246
Have a flow of income                 241 148 21 16 13 241 148
Anticipated future needs              182 122 96 13 26 182 122
Build reserve for unseen contigencies 122 147 90 10 31   9  30
Build assets[Land,Gold]               223 111 70 24 11 223 111
Save Tax                              150 121 68 26  6  65   3
Total_Major_invest_obj_score=rowSums(Major_Objectives)
Total_Ordi_invest_obj_score=rowSums(Ordinary_Objectuves)



# Visualisation of total score
barplot(Major_Objectives[,1],col='green',ylim=c(0,1000))

barplot(Major_Objectives[,2],col='red',ylim=c(0,1000))

barplot(Major_Objectives[,3],col='orange',ylim=c(0,1000))


chisq_major=chisq.test(Major_Objectives[,-1])

S=svd(Major_Objectives)
chisq.test(Ordinary_Objectuves[,-1])      # implies there is an association between investment objectives

    Pearson's Chi-squared test

data:  Ordinary_Objectuves[, -1]
X-squared = 877.81, df = 25, p-value < 2.2e-16
# Weighted overall investment objectives
data$Overall_invest_Obj_score=8*data1[[24]] + 6*data1[[31]]


# Correspondence Analysis (CA)

#Visual way to map association between type of investment objectives and ranking.


library(FactoMineR)
library(factoextra)

ca_result1 <- CA(as.matrix(Major_Objectives[-1,]), graph=TRUE)

fviz_ca_biplot(ca_result1)
Registered S3 methods overwritten by 'car':
  method       from
  hist.boot    FSA 
  confint.boot FSA 

ca_result2 <- CA(as.matrix(Ordinary_Objectuves[-1,]), graph=TRUE)

fviz_ca_biplot(ca_result2)

NA
NA
---
title: "R Notebook"
output: html_notebook
---

```{r}
library(readxl)

data=read_excel("E:\\datasets for programming\\Final Excel REsponses for PHD (2).xlsx",sheet = 'Sheet3')
str(data)

View(data.frame(NO = seq_along(colnames(data)),Name=colnames(data)))
```


```{r}
data1=read_excel("E:\\datasets for programming\\Final Excel REsponses for PHD (2).xlsx")
View(data.frame(No=seq_along(colnames(data1)),Name=colnames(data1)))
```



```{r}
# Cronbach's ALpha for Awareness of FL
library(psych)
Awareness_alpha = psych::alpha(data1[,c(48:77)]) 
ifelse(Awareness_alpha$total[,1] > 0.75 , 'Likert items hang together in well manner','No Redundancy')
Awareness_alpha$total[,1]



Investment_conideration_alpha=psych::alpha(data1[,c(34:39)])
ifelse(Investment_conideration_alpha$total[,1] > 0.75 ,'Likert items of investment consideration hang together in well manner', 'No redudancy')
Investment_conideration_alpha$total[,1]
```


```{r}
# Cronbach's ALpha for Major Investment of FL
library(psych)
Major_investment_alpha = psych::alpha(data1[,c(17:23)]) 
ifelse(Major_investment_alpha$total[,1] > 0.75 , 'Likert items hang together in well manner','No Redundancy')
Major_investment_alpha$total[,1]
```



```{r}
# Cronbach's ALpha for Investment Challange of FL

Investment_challange_alpha = psych::alpha(data1[,c(105:110)]) 
ifelse(Investment_challange_alpha$total[,1] > 0.75 , 'Likert items hang together in well manner','No Redundancy')
Investment_challange_alpha$total[,1] # Acceptable

```

```{r}
# Socio Economic Factor Maritial Status associate with FL among NMU WW
# wilcox.test(colnames(data[,24])~colnames(data[,3]),data)
kruskal.test(data[[24]]~as.factor(data[[3]]))

table(as.factor(data[[3]]))

boxplot(data[[24]] ~ as.factor(data[[3]]),
        main = "Score by Maratial Status",
        xlab = "Maratial Status", ylab = "Score", col = "lightblue")

# Even if p > 0.05, you might still want to check the magnitude of differences. For Kruskal–Wallis, you can use eta-squared or epsilon-squared. Example:

# install.packages("rcompanion")
library(rcompanion)
kruskalTest <- kruskal.test(data[[24]] ~ as.factor(data[[3]]))
rcompanion::etaSquared(kruskalTest, anova = FALSE)


# Optional – Dunn’s posthoc for pairwise comparison

# install.packages("FSA")
library(FSA)
dunnTest(data[[24]] ~ as.factor(data[[3]]), method="bonferroni")

# Visualization

library(ggplot2)

ggplot(data, aes(x = as.factor(data[[3]]), y = data[[24]], fill = as.factor(data[[3]]))) +
  geom_boxplot() +
  labs(x = "Marital Status", y = "Score", title = "FL Score w.r.t Marital Status") +
  theme_minimal()

```
is os found that there is no statistically significant difference in  Financial Literacy Score distributions across the maratial status

```{r}
# Socio Economic Factor Maritial Status associate with FL among NMU WW
# wilcox.test(colnames(data[,24])~colnames(data[,6]),data)
kruskal.test(data[[24]]~as.factor(data[[6]]))  # more than 2 levels

table(as.factor(data[[6]]))

boxplot(data[[24]] ~ as.factor(data[[6]]),
        main = "Score by Gender",
        xlab = "Work Experience", ylab = "Score", col = "lightblue")

# Even if p > 0.05, you might still want to check the magnitude of differences. For Kruskal–Wallis, you can use eta-squared or epsilon-squared. Example:

# install.packages("rcompanion")
library(rcompanion)
kruskalTest <- kruskal.test(data[[24]] ~ as.factor(data[[6]]))
# rcompanion::etaSquared(kruskalTest, anova = FALSE)


# Optional – Dunn’s posthoc for pairwise comparison

# install.packages("FSA")
library(FSA)
dunnTest(data[[24]] ~ as.factor(data[[6]]), method="bonferroni")

# Visualization

library(ggplot2)

ggplot(data, aes(x = as.factor(data[[6]]), y = data[[24]], fill = as.factor(data[[6]]))) +
  geom_boxplot() +
  labs(x = "Work Experience", y = "FL Score", title = "FL Score w.r.t working Experience") +
  theme_minimal()
```

from the above plot we can infer that the average finincial literacy score for those womens whose working experience is Up to 5y, 6-10y, above 15y is almost same. 
And due to unfulfillement of required assumption of parametric test, i have moved to non parametric kruskal wallis test which suggest that, there is no statistically significant difference between FL score for women with different level of work experience.

```{r}
# Socio Economic Factor Maritial Status associate with Investment Behaviour among NMU WW
# wilcox.test(colnames(data[,24])~colnames(data[,6]),data)
kruskal.test(data[[31]]~as.factor(data[[3]]))  # more than 2 levels

table(as.factor(data[[3]]))

boxplot(data[[31]] ~ as.factor(data[[3]]),
        main = "Score by Gender",
        xlab = "Maritial Status", ylab = "Score", col = "lightblue")

# Even if p > 0.05, you might still want to check the magnitude of differences. For Kruskal–Wallis, you can use eta-squared or epsilon-squared. Example:



# install.packages("rcompanion")
library(rcompanion)
kruskalTest <- kruskal.test(data[[24]] ~ as.factor(data[[3]]))
# rcompanion::etaSquared(kruskalTest, anova = FALSE)


# Optional – Dunn’s posthoc for pairwise comparison

# install.packages("FSA")
library(FSA)
dunnTest(data[[31]] ~ as.factor(data[[3]]), method="bonferroni")

# Visualization

library(ggplot2)

ggplot(data, aes(x = as.factor(data[[3]]), y = data[[31]], fill = as.factor(data[[3]]))) +
  geom_boxplot() +
  labs(x = "Martial status", y = "Score", title = "Score by Marital Status") +
  theme_minimal()
```


```{r}
# Socio Economic Factor Sector Status associate with Investment Behaviour among NMU WW
# wilcox.test(colnames(data[,31])~colnames(data[,11]),data)
kruskal.test(data[[31]]~as.factor(data[[11]]))  # more than 2 levels

table(as.factor(data[[11]]))

boxplot(data[[31]] ~ as.factor(data[[11]]),
        main = "Score by Gender",
        xlab = "Sector Status", ylab = "Score", col = "lightblue")

# Even if p > 0.05, you might still want to check the magnitude of differences. For Kruskal–Wallis, you can use eta-squared or epsilon-squared. Example:



# install.packages("rcompanion")
library(rcompanion)
kruskalTest <- kruskal.test(data[[24]] ~ as.factor(data[[3]]))
# rcompanion::etaSquared(kruskalTest, anova = FALSE)


# Optional – Dunn’s posthoc for pairwise comparison

# install.packages("FSA")
library(FSA)
dunnTest(data[[31]] ~ as.factor(data[[11]]), method="bonferroni")

# Visualization

library(ggplot2)

ggplot(data, aes(x = as.factor(data[[11]]), y = data[[31]], fill = as.factor(data[[11]]))) +
  geom_boxplot() +
  labs(x = "Sector status", y = "Score", title = "FL Score w.r.t residential sector") +
  theme_minimal()
```



```{r}
# Socio Economic Factor Sector Status associate with Investment Behaviour among NMU WW
# wilcox.test(colnames(data[,31])~colnames(data[,11]),data)
kruskal.test(data[[31]]~as.factor(data[[7]]))  # more than 2 levels

table(as.factor(data[[7]]))

boxplot(data[[31]] ~ as.factor(data[[7]]),
        main = "Score with respect to working sector",
        xlab = "Sector Status", ylab = "Score", col = "lightblue")

# Even if p > 0.05, you might still want to check the magnitude of differences. For Kruskal–Wallis, you can use eta-squared or epsilon-squared. Example:



# install.packages("rcompanion")
#library(rcompanion)
#kruskalTest <- kruskal.test(data[[24]] ~ as.factor(data[[3]]))
# rcompanion::etaSquared(kruskalTest, anova = FALSE)


# Optional – Dunn’s posthoc for pairwise comparison

# install.packages("FSA")
library(FSA)
dunnTest(data[[31]] ~ as.factor(data[[7]]), method="bonferroni")

# Visualization

library(ggplot2)

ggplot(data, aes(x = as.factor(data[[7]]), y = data[[31]], fill = as.factor(data[[7]]))) +
  geom_boxplot() +
  labs(x = "Occupational status", y = "Score", title = "FL Score w.r.t occupational status") +
  theme_minimal()

```


```{r H04}

wilcox.test(data[[31]]~as.factor(data[[11]]))  # more than 2 levels
library(lattice)

X=data[[31]];y=data[[11]]

ggplot(data, aes(x =data[[31]])) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  geom_vline(aes(xintercept = mean(data[[31]])), 
             color = "red", linetype = "dashed", size = 1.2) +
  labs(x = "Sector status", y = "Investment consideration Score", title = "Score by Sector Status")+
  facet_wrap(~data[[11]])

```


```{r  H05}
data$Invest_behavior <- NULL
data$Invest_behavior = 7*data1[[90]] + 5*data1[[96]]+3*data1[[100]] +3*data1[[104]] 
data[[32]]

shapiro.test(data[[32]])

# Create the data frame
df <- data.frame(
  RiskType = c(
    rep("Safe/Low Risk", 7),
    rep("Moderate Risk", 5),
    rep("High Risk", 3),
    rep("Traditional", 3)
  ),
  FinancialProduct = c(
    "Savings Account", "Bank Fixed Deposit", "Public Provident Fund", 
    "National Savings Certificate", "Post Office Savings", 
    "Government Securities", "Recurring Deposit",
    "Mutual Funds", "Insurance", "Debentures", "Bonds", "Preference Shares",
    "Equity Share Market", "Commodities Market", "Forex Market",
    "Real Estate / Property", "Gold / Silver", "Chit Funds"
  ),
  DontKnow = c(
    33, 17, 71, 85, 18, 37, 36,
    27, 83, 42, 107, 36,
    59, 104, 127,
    76, 16, 79
  ),
  LeastPreferred = c(
    21, 51, 54, 87, 82, 91, 96,
    23, 19, 42, 64, 66,
    58, 98, 94,
    109, 35, 106
  ),
  Neutral = c(
    68, 40, 88, 101, 152, 79, 73,
    55, 89, 179, 109, 126,
    78, 97, 82,
    58, 66, 127
  ),
  Preferred = c(
    124, 173, 105, 49, 60, 82, 81,
    162, 105, 40, 65, 87,
    79, 42, 44,
    62, 94, 45
  ),
  HighlyPreferred = c(
    122, 85, 48, 44, 54, 77, 64,
    82, 70, 21, 21, 53,
    87, 25, 19,
    61, 155, 9
  )
)

# Check structure
str(df)

# Preview first rows
head(df)
colnames(df)

# Suppose your data frame is df with columns: 
# RiskType, FinancialProduct, DontKnow, LeastPref, Neutral, Preferred, HighlyPreferred

# Collapse counts across products within each risk type:
df_risk <- aggregate(cbind(DontKnow, LeastPreferred, Neutral, Preferred, HighlyPreferred) ~ RiskType, data = df,FUN=sum)

# Look at the result
print(df_risk)
# Convert to matrix
pref_matrix <- as.matrix(df_risk[,-1])
rownames(pref_matrix) <- df_risk$RiskType

# Perform chi-square test
chisq.test(pref_matrix)

library(dplyr)

df_collapsed <- df %>%
  group_by(RiskType) %>%
  summarise(
    'DontKnow' = sum(DontKnow),
    'LeastPreferred' = sum(LeastPreferred),
    'Neutral' = sum(Neutral),
    'Preferred' = sum(Preferred),
    'HighlyPreferred' = sum(HighlyPreferred),
    .groups = "drop"
  )

df_collapsed

chi_sq_result=chisq.test(df_collapsed[,-1]) 
# There is a highly statistically significant association between the RiskType and the level of preference. The two variables are not independent.

# ------------------------------------------------------------

# Correspondence Analysis (CA)

#Visual way to map association between risk types and preferences.

#Helps interpret which risk category leans toward which preference.

library(FactoMineR)
library(factoextra)

ca_result <- CA(as.matrix(df_collapsed[,-1]), graph=TRUE)
fviz_ca_biplot(ca_result)



```

CA Interpretation :

Here’s how to interpret the plot and the summary output:

Inertia (Explained Variance): The summary output will show you how much of the total "inertia" (which is related to the χ 
2
  statistic) is explained by each dimension (axis).

Dimension 1 (Dim 1) likely accounts for the vast majority of the inertia (~80-90%). Looking at the plot, this horizontal axis clearly separates preferences, running from LeastPreferred on the left to Preferred and HighlyPreferred on the right. It represents the primary axis of "dislike vs. like."

Dimension 2 (Dim 2) accounts for a smaller portion of the inertia. This vertical axis helps separate the categories further.

Interpreting the Biplot:

Safe/Low Risk is positioned very closely to Preferred and HighlyPreferred. This indicates a strong association; people who prefer safe/low-risk investments also tend to rate them as preferred or highly preferred.

High Risk is positioned far out on the left, very close to LeastPreferred and DontKnow. This shows a strong association where high-risk investments are seen as the least preferred. The proximity to "DontKnow" might suggest uncertainty or aversion.

Moderate Risk and Traditional are located closer to the center (origin) and near Neutral. This suggests their preference profiles are less extreme and more average compared to the high and low-risk categories. "Traditional" in particular seems to have a fairly neutral, slightly varied preference profile.

Overall Story: The analysis paints a clear picture. The primary factor driving investment preference is the level of risk. Safe/Low-risk options are strongly and positively preferred, while High-risk options are strongly disliked. Moderate and traditional options fall in the middle, eliciting a more neutral or average response.


```{r Standardized Residuals in R}
library(corrplot)

residual=chi_sq_result$residuals
print("standardized Residuals:");print(round(residual,2))

# Visualize the residuals using corrplot
# Positive residuals are in blue, negative are in red.
# The size and color intensity of the circle correspond to the magnitude of the residual.

corrplot(residual,is.cor=FALSE,
         method = "color", # Display as colored cells
         tl.col = "black",
         addCoef.col = "black", # Add the residual values on the plot
         number.cex = 0.8)


```
The key to interpretation is the "rule of 2":

A standardized residual greater than 2 means the observed count in that cell is significantly higher than what we'd expect if there were no association.

A standardized residual less than -2 means the observed count is significantly lower than expected.

Values between -2 and 2 are considered within the range of random chance.



```{r  H0 3 again boldness in behaviour}

# Major Invest Objectives
Major_Objectives=rbind(table(data1[[16]]),table(data1[[17]]),table(data1[[18]]),table(data1[[19]]),
      table(data1[[20]]),table(data1[[21]]),table(data1[[22]]),table(data1[[23]]))
rownames(Major_Objectives)=c('Wealth Creation','Earn regular Return','Meet unanticipated e',
               'Children/Self education','Children/self Marriage','Old age Plannig',
               'Reduce Tax Burden','House Purchase')
Major_Objectives

# Ordinary INvest objectives
Ordinary_Objectuves=rbind(table(data1[[25]]),table(data1[[26]]),table(data1[[27]]),table(data1[[28]]),
      table(data1[[29]]),table(data1[[30]]))
rownames(Ordinary_Objectuves)=c('Make Earning/ part of income','Have a flow of income','Anticipated future needs',
               'Build reserve for unseen contigencies','Build assets[Land,Gold]','Save Tax')

Ordinary_Objectuves

Total_Major_invest_obj_score=rowSums(Major_Objectives)
Total_Ordi_invest_obj_score=rowSums(Ordinary_Objectuves)



# Visualisation of total score
barplot(Major_Objectives[,1],col='green',ylim=c(0,1000))
barplot(Major_Objectives[,2],col='red',ylim=c(0,1000))
barplot(Major_Objectives[,3],col='orange',ylim=c(0,1000))

chisq_major=chisq.test(Major_Objectives[,-1])

S=svd(Major_Objectives)
chisq.test(Ordinary_Objectuves[,-1])      # implies there is an association between investment objectives

# Weighted overall investment objectives
data$Overall_invest_Obj_score=8*data1[[24]] + 6*data1[[31]]


# Correspondence Analysis (CA)

#Visual way to map association between type of investment objectives and ranking.


library(FactoMineR)
library(factoextra)

ca_result1 <- CA(as.matrix(Major_Objectives[-1,]), graph=TRUE)
fviz_ca_biplot(ca_result1)


ca_result2 <- CA(as.matrix(Ordinary_Objectuves[-1,]), graph=TRUE)
fviz_ca_biplot(ca_result2)


```

