Picture of Vietnam

Question for Analysis


By looking at a cross section of the “Vietnam World Bank Livings Standards Survey” of households in Vietnam in 1997, can we make any meaningful observations about population, human development, or migration in Vietnam, especially considering the Vietnam War, which ran from 1 November 1955 to 30 April 1975?

Data Exploration


Data Selection

I selected the “Medical Expenses in Vietnam (household Level)” data set from http://vincentarelbundock.github.io/Rdatasets/ and uploaded the original .csv to my GitHub account.

#read the data from my github
vnm <- read.csv("https://raw.githubusercontent.com/pkofy/Bridge_Winter2022/main/VietNamH.csv", stringsAsFactors = FALSE)


Summary Statistics

#transpose and display the first few records
t(vnm[1:5, ]) #I could use t(head(vnm)) but it takes the first six rows and the sixth wraps around in the .html
##          1           2           3           4           5          
## X        "1"         "2"         "3"         "4"         "5"        
## sex      "female"    "female"    "male"      "female"    "female"   
## age      "68"        "57"        "42"        "72"        "73"       
## educyr   " 4"        " 8"        "14"        " 9"        " 1"       
## farm     "no"        "no"        "no"        "no"        "no"       
## urban    "yes"       "yes"       "yes"       "yes"       "yes"      
## hhsize   "6"         "6"         "6"         "6"         "8"        
## lntotal  "10.13649"  "10.25206"  "10.93231"  "10.26749"  "10.48811" 
## lnmed    "11.233210" " 8.505120" " 8.713418" " 9.291736" " 7.555382"
## lnrlfood " 8.639339" " 9.345752" "10.226330" " 9.263722" " 9.592890"
## lnexp12m "11.233210" " 8.505120" " 8.713418" " 9.291736" " 7.555382"
## commune  "1"         "1"         "1"         "1"         "1"

X is the record number
sex, age and educyr describe the head of household with educyr being the years of education obtained
farm and urban describe the household
hhsize is the number of people in the household
lntotal is the natural log of the household's total expenditure (labelled "Misc" in the charts below)
lnmed is the natural log of the total medical expenditures in the household
lnrlfood is the natural log of the total food expenditures in the household
lnexp12m is the natural log of the household's total health care expenditure over 12 months
commune is an organizational division for health and education in Vietnam. In 2008 there were over 9,000 communes. In this survey households from 150 different communes were surveyed.

summary(vnm)
##        X            sex                 age            educyr      
##  Min.   :   1   Length:5999        Min.   :16.00   Min.   : 0.000  
##  1st Qu.:1500   Class :character   1st Qu.:37.00   1st Qu.: 4.000  
##  Median :3000   Mode  :character   Median :46.00   Median : 7.000  
##  Mean   :3000                      Mean   :48.01   Mean   : 7.094  
##  3rd Qu.:4500                      3rd Qu.:58.00   3rd Qu.:10.000  
##  Max.   :5999                      Max.   :95.00   Max.   :22.000  
##                                                                    
##      farm              urban               hhsize          lntotal      
##  Length:5999        Length:5999        Min.   : 1.000   Min.   : 6.543  
##  Class :character   Class :character   1st Qu.: 4.000   1st Qu.: 8.920  
##  Mode  :character   Mode  :character   Median : 5.000   Median : 9.311  
##                                        Mean   : 4.752   Mean   : 9.342  
##                                        3rd Qu.: 6.000   3rd Qu.: 9.759  
##                                        Max.   :19.000   Max.   :12.202  
##                                                                         
##      lnmed           lnrlfood         lnexp12m         commune      
##  Min.   : 0.000   Min.   : 6.356   Min.   : 0.000   Min.   :  1.00  
##  1st Qu.: 4.174   1st Qu.: 8.376   1st Qu.: 5.273   1st Qu.: 51.00  
##  Median : 5.966   Median : 8.691   Median : 6.372   Median : 99.00  
##  Mean   : 5.266   Mean   : 8.680   Mean   : 6.311   Mean   : 98.27  
##  3rd Qu.: 7.180   3rd Qu.: 9.002   3rd Qu.: 7.392   3rd Qu.:146.50  
##  Max.   :12.363   Max.   :11.384   Max.   :12.363   Max.   :194.00  
##                                    NA's   :993


Initial Conclusion

  • There may be an urban/rural divide by sex of head of household.
  • Medical expenses look to be a significant portion of total expenses.
  • There may be an observable difference in age of head of household due to the war.
  • Farm households may spend more or less on food per household member than non-farm households.


Data Wrangling


String Manipulation

We can remove X, the record number, since that information is already contained in the order of the records.

vnm <- vnm[,c(2:12)]

We can change the values of urban and farm to be more descriptive, clarify the column names and create a new column combining the two.

#install.packages("dplyr",repos = "http://cran.us.r-project.org")
suppressPackageStartupMessages(require(dplyr))

vnm$urban <- replace(vnm$urban, vnm$urban == "yes", "Urban")
vnm$urban <- replace(vnm$urban, vnm$urban == "no", "Rural")
vnm$farm <- replace(vnm$farm, vnm$farm == "yes", "Farm")
vnm$farm <- replace(vnm$farm, vnm$farm == "no", "Home")

vnm <- rename(vnm, c("isurban"="urban", "isfarm"="farm"))

#Not done here but you can specify to add a column after or before another column
vnm$type <- paste(vnm$isurban, vnm$isfarm)

#show changes
t(vnm[1:3,])
##          1            2            3           
## sex      "female"     "female"     "male"      
## age      "68"         "57"         "42"        
## educyr   " 4"         " 8"         "14"        
## isfarm   "Home"       "Home"       "Home"      
## isurban  "Urban"      "Urban"      "Urban"     
## hhsize   "6"          "6"          "6"         
## lntotal  "10.13649"   "10.25206"   "10.93231"  
## lnmed    "11.233210"  " 8.505120"  " 8.713418" 
## lnrlfood " 8.639339"  " 9.345752"  "10.226330" 
## lnexp12m "11.233210"  " 8.505120"  " 8.713418" 
## commune  "1"          "1"          "1"         
## type     "Urban Home" "Urban Home" "Urban Home"
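The same recoding can also be written with ifelse(), which picks one of two labels per row. The sketch below is only an illustration in base R: the three-row data frame toy is invented and is not part of the survey data.

```r
# Toy data frame invented for illustration; the real data lives in vnm.
toy <- data.frame(urban = c("yes", "no", "yes"),
                  farm  = c("no", "yes", "yes"),
                  stringsAsFactors = FALSE)

# Recode each yes/no column in one step.
toy$isurban <- ifelse(toy$urban == "yes", "Urban", "Rural")
toy$isfarm  <- ifelse(toy$farm == "yes", "Farm", "Home")

# Combine the two labels, in the same way as vnm$type.
toy$type <- paste(toy$isurban, toy$isfarm)

print(toy[, c("isurban", "isfarm", "type")])
```

One thing to keep in mind with this approach is that ifelse() sends every value that is not "yes" to the second label, so it assumes the column holds only "yes" and "no".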


Number Manipulation

For each expenditure column we can add a new column that removes the log, inflates the 1997 value to today’s VND (Vietnamese Dong), and converts it to today’s USD (US Dollar).

#Inflation from 1997 to 2022 in VND
VNDinflation <- 4.2
VNDtoUSD <- 22730

#add new columns
vnm$misc <- round(exp(vnm$lntotal)*VNDinflation/VNDtoUSD, 2)
vnm$med <- round(exp(vnm$lnmed)*VNDinflation/VNDtoUSD, 2)
vnm$food <- round(exp(vnm$lnrlfood)*VNDinflation/VNDtoUSD, 2)
vnm$expmed <- round(exp(vnm$lnexp12m)*VNDinflation/VNDtoUSD, 2)

#show changes
t(vnm[1:3,])
##          1            2            3           
## sex      "female"     "female"     "male"      
## age      "68"         "57"         "42"        
## educyr   " 4"         " 8"         "14"        
## isfarm   "Home"       "Home"       "Home"      
## isurban  "Urban"      "Urban"      "Urban"     
## hhsize   "6"          "6"          "6"         
## lntotal  "10.13649"   "10.25206"   "10.93231"  
## lnmed    "11.233210"  " 8.505120"  " 8.713418" 
## lnrlfood " 8.639339"  " 9.345752"  "10.226330" 
## lnexp12m "11.233210"  " 8.505120"  " 8.713418" 
## commune  "1"          "1"          "1"         
## type     "Urban Home" "Urban Home" "Urban Home"
## misc     " 4.67"      " 5.24"      "10.34"     
## med      "13.97"      " 0.91"      " 1.12"     
## food     "1.04"       "2.12"       "5.10"      
## expmed   "13.97"      " 0.91"      " 1.12"
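To check the conversion by hand, the arithmetic can be wrapped in a small function and applied to the lntotal values of the first three records. This is a sketch only: to_usd is a hypothetical helper name, and the two conversion factors are the same assumed values used in the code above.

```r
# Conversion factors copied from the code above (the author's assumptions).
VNDinflation <- 4.2    # 1997 VND -> 2022 VND
VNDtoUSD     <- 22730  # VND per USD

# Hypothetical helper: undo the log, inflate, convert, round to cents.
to_usd <- function(ln_value) {
  round(exp(ln_value) * VNDinflation / VNDtoUSD, 2)
}

# lntotal for the first three records shown above.
first_three <- to_usd(c(10.13649, 10.25206, 10.93231))
print(first_three)
```

The result should match the misc row in the output above. Missing values pass through unchanged, since exp(NA) is NA.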


Graphics


Table of urban/rural divide by sex of head of household

Type of Household by Female/Male. It looks like women are more likely to be the head of household in an urban commune than in a rural commune.

#room for improvement-> show percentages
table(vnm$sex, vnm$type)
##         
##          Rural Farm Rural Home Urban Farm Urban Home
##   female        661        264         75        624
##   male         2587        757        115        916
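The percentages mentioned in the code comment can be obtained with prop.table(). The sketch below rebuilds the table from the counts printed above, so it runs without the data file. With the data loaded, prop.table(table(vnm$sex, vnm$type), margin = 2) should give the same shares.

```r
# Counts copied from the table above (filled column by column).
counts <- matrix(c(661, 2587, 264, 757, 75, 115, 624, 916), nrow = 2,
                 dimnames = list(sex  = c("female", "male"),
                                 type = c("Rural Farm", "Rural Home",
                                          "Urban Farm", "Urban Home")))

# margin = 2 makes each column (household type) sum to 100 percent.
share <- prop.table(counts, margin = 2) * 100
print(round(share, 1))
```

Reading across the female row shows how the share of female heads of household changes from rural to urban household types.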


Boxplot and pie chart of expenses by type

The expense types are miscellaneous, medical and food expenses (the natural log of their values in 1997 VND). Families appear to spend a significant portion of their expenses on medical care, but this may be an artifact of the log scale, which compresses large differences.

boxplot(vnm$lntotal, vnm$lnmed, vnm$lnrlfood, names=c("Misc","Medical","Food"))


meanmisc <- mean(vnm$misc)
meanmedical <- mean(vnm$med)
meanfood <- mean(vnm$food)

#pie chart with percentages
slices <- c(meanmisc, meanmedical, meanfood)
lbls <- c("Misc", "Medical", "Food")
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct)
lbls <- paste(lbls, "%",sep="")
pie(slices, labels=lbls, main="Pie Chart of Expenses (natural log removed)")
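A short calculation shows why the log scale is misleading here. The sketch below uses only the two medians printed by summary(vnm) above and compares their ratio before and after the log is removed.

```r
# Medians copied from the summary(vnm) output above.
median_lntotal <- 9.311
median_lnmed   <- 5.966

# Comparing the logs directly makes medical spending look large.
ratio_of_logs <- median_lnmed / median_lntotal

# Undoing the log first tells a different story.
ratio_of_levels <- exp(median_lnmed) / exp(median_lntotal)

print(round(c(logs = ratio_of_logs, levels = ratio_of_levels), 3))
```

Because a difference of one unit on the log scale is a factor of about 2.7 in VND, boxes that sit close together in the boxplot can still be far apart in actual spending.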


Histogram of age of head of household

At the time of the survey in 1997, a 42-year-old would have been born at the start of the war in 1955 and a 22-year-old at the end of the war in 1975. It looks like a chunk has been scooped out of this curve for heads of household aged 40 to 55, who were 18 to 33 when the war ended. One explanation could be that the war caused the death, or emigration as refugees, of a significant percentage of that age group.

hist(vnm$age)
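The age arithmetic behind this reading can be written out explicitly. The sketch below assumes the survey year is 1997, as stated in the question for analysis. age_in is a hypothetical helper name.

```r
survey_year <- 1997
war_start   <- 1955
war_end     <- 1975

# Hypothetical helper: the age a head of household had in a given year.
age_in <- function(age_at_survey, year) age_at_survey - (survey_year - year)

# Ages at the survey of people born in the first and last year of the war.
born_at_start <- survey_year - war_start
born_at_end   <- survey_year - war_end

# The 40-to-55 band at the survey, expressed as ages when the war ended.
band_at_war_end <- age_in(c(40, 55), war_end)

print(c(born_at_start = born_at_start, born_at_end = born_at_end))
print(band_at_war_end)
```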


We can see this gouge in the demographics in more detail using the ggplot2 package.

#install.packages("ggplot2")
require(ggplot2)
qplot(age, data=vnm, binwidth=1)


Scatterplot of size of household by age of head of household

You can see the size of the household increasing when the head of household is between 20 and 40, holding steady between 40 and 60 and decreasing thereafter.

plot(vnm$age, vnm$hhsize)
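The pattern is hard to judge by eye in a dense scatterplot, so one option is to average household size within age bands. The sketch below uses nine invented values purely to show the mechanics. With the data loaded, vnm$age and vnm$hhsize would take the place of the toy vectors.

```r
# Toy values invented for illustration; substitute vnm$age and vnm$hhsize.
age    <- c(22, 28, 35, 44, 51, 58, 63, 70, 77)
hhsize <- c(2, 3, 5, 6, 6, 5, 4, 3, 2)

# Group ages into the three bands discussed above.
band <- cut(age, breaks = c(0, 40, 60, Inf), right = FALSE,
            labels = c("under 40", "40 to 59", "60 and over"))

# Mean household size per band.
mean_size <- tapply(hhsize, band, mean)
print(round(mean_size, 2))
```

Narrower bands, for example ten-year steps in breaks, would show whether the rise and fall is gradual or abrupt.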


Scatterplot of food expenditure by size of household

Here is a scatterplot of the natural log of the expenditure of food versus the size of the household.

#this does not differentiate farm from non-farm households: colour='cyl' is a constant string, not a column of vnm, so every point gets the same colour
qplot(lnrlfood, hhsize, data=vnm, colour='cyl')


Here we use ggplot() to reproduce the chart while coloring points by farm versus non-farm household. Non-farm households appear to spend more on food per household member than farm households.

ggplot(vnm, aes(x=lnrlfood, y=hhsize)) + geom_point(aes(color=isfarm))
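The per-person comparison can also be computed directly rather than read off the chart. The sketch below is an illustration on four invented rows: it removes the log, divides by household size, and takes the median for each household type. The column names follow vnm, but the values are made up and say nothing about the survey.

```r
# Toy rows invented for illustration; substitute the columns of vnm.
toy <- data.frame(
  isfarm   = c("Farm", "Farm", "Home", "Home"),
  hhsize   = c(5, 4, 5, 4),
  lnrlfood = c(8.5, 8.2, 9.0, 8.8)
)

# Undo the log, then divide by household size.
toy$food_per_person <- exp(toy$lnrlfood) / toy$hhsize

# Median per-person food spending for each household type.
per_person <- aggregate(food_per_person ~ isfarm, data = toy, FUN = median)
print(per_person)
```

The median is used here instead of the mean because spending that has been taken out of logs tends to be strongly right-skewed.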



Conclusion


It looks like women are more likely to be the head of household in an urban commune than in a rural commune. This could be because of more conservative values in rural areas, increased opportunities for women in urban settings, or maybe country-wide infrastructure projects taking men away from the cities for work.

Medical expenses look to be a significant portion of total expenses; however, this is an effect of comparing the natural logs of the expenses. Once the log is removed, mean medical expenses are a much lower percentage of total expenses.

It looks like the war decimated a generation who were of fighting age or young enough to seek a home in another country. According to Wikipedia, an estimated 2 million were killed, 3 million were wounded and 12 million became refugees during the Vietnam War.

The scatterplot suggests that households tend to increase in size while the head of household is between 20 and 40 years of age, stay the same while the head of household is between 40 and 60 years of age, and decrease after the head of household is 60.

Households spend more on food the more household members they have. By switching from qplot() to ggplot() we were able to show that non-farm households appear to spend more per member on food than farm households.


Follow up questions:
- Are there differences in education between types of households?
- Do older heads of households have higher medical expenses?



References


Treasury Reporting Rates of Exchange as of March 31, 1997
Vietnam Inflation Rate 1996-2022