#reading the dataset file for further analysis
gssdata <- read.csv(file="gss2years.csv", header=TRUE, sep=",")
head(gssdata)
##   X abany abdefect abhlth abrape absingle age        aged
## 1 1  <NA>     <NA>   <NA>   <NA>     <NA>  31 A GOOD IDEA
## 2 2  <NA>     <NA>   <NA>   <NA>     <NA>  23     DEPENDS
## 3 3  <NA>     <NA>   <NA>   <NA>     <NA>  82 A GOOD IDEA
## 4 4  <NA>     <NA>   <NA>   <NA>     <NA>  40     DEPENDS
## 5 5   YES      YES    YES    YES      YES  46 A GOOD IDEA
## 6 6    NO      YES    YES    YES       NO  31     DEPENDS
##                            attend cappun childs      coneduc     conpress
## 1 ONCE A YR THRU SEV TIMES A YEAR   <NA>      0    ONLY SOME    ONLY SOME
## 2                  LT ONCE A YEAR OPPOSE      0    ONLY SOME   HARDLY ANY
## 3   ONCE A MNTH THRU ALMST WEEKLY OPPOSE      5 A GREAT DEAL A GREAT DEAL
## 4                  LT ONCE A YEAR OPPOSE      2    ONLY SOME    ONLY SOME
## 5 ONCE A YR THRU SEV TIMES A YEAR OPPOSE      1    ONLY SOME    ONLY SOME
## 6   ONCE A MNTH THRU ALMST WEEKLY OPPOSE      3    ONLY SOME   HARDLY ANY
##           degree         divlaw         eqwlth   fechld             fefam
## 1       BACHELOR         EASIER              3    AGREE              <NA>
## 2       BACHELOR      STAY SAME              2    AGREE STRONGLY DISAGREE
## 3 LT HIGH SCHOOL MORE DIFFICULT NO GOVT ACTION DISAGREE          DISAGREE
## 4 LT HIGH SCHOOL         EASIER              4    AGREE             AGREE
## 5 JUNIOR COLLEGE           <NA>              4     <NA>              <NA>
## 6    HIGH SCHOOL           <NA>              4     <NA>              <NA>
##      fepol    finalter       goodlife     grass gunlaw         happy
## 1 DISAGREE      BETTER          AGREE      <NA>   <NA>  PRETTY HAPPY
## 2 DISAGREE STAYED SAME          AGREE     LEGAL   <NA> NOT TOO HAPPY
## 3 DISAGREE       WORSE STRONGLY AGREE NOT LEGAL   <NA> NOT TOO HAPPY
## 4     <NA>       WORSE        NEITHER     LEGAL   <NA>  PRETTY HAPPY
## 5     <NA>       WORSE          AGREE      <NA>  FAVOR  PRETTY HAPPY
## 6     <NA>      BETTER STRONGLY AGREE NOT LEGAL  FAVOR    VERY HAPPY
##      health          helpful          homosex         kidssol letdie1
## 1      <NA> LOOKOUT FOR SELF             <NA>     MUCH BETTER    <NA>
## 2      <NA> LOOKOUT FOR SELF             <NA>  ABOUT THE SAME     YES
## 3      <NA> LOOKOUT FOR SELF             <NA>     MUCH BETTER      NO
## 4      <NA> LOOKOUT FOR SELF             <NA> SOMEWHAT BETTER      NO
## 5      GOOD          DEPENDS NOT WRONG AT ALL  ABOUT THE SAME    <NA>
## 6 EXCELLENT          HELPFUL             <NA>     MUCH BETTER    <NA>
##              marital   natcrime    nateduc     partyid     polviews owngun
## 1      NEVER MARRIED       <NA>       <NA>    DEMOCRAT      LIBERAL   <NA>
## 2      NEVER MARRIED TOO LITTLE TOO LITTLE    DEMOCRAT      LIBERAL   <NA>
## 3            WIDOWED       <NA>       <NA>  REPUBLICAN      LIBERAL   <NA>
## 4      NEVER MARRIED TOO LITTLE TOO LITTLE    DEMOCRAT      LIBERAL   <NA>
## 5 DIVORCED SEPARATED       <NA>       <NA> INDEPENDENT CONSERVATIVE     NO
## 6            MARRIED TOO LITTLE TOO LITTLE    DEMOCRAT      LIBERAL     NO
##           premarsx  race          region    relig         satfin  sei
## 1 NOT WRONG AT ALL OTHER MIDDLE ATLANTIC CATHOLIC   MORE OR LESS 76.4
## 2 NOT WRONG AT ALL WHITE MIDDLE ATLANTIC     NONE   MORE OR LESS 85.1
## 3  SOMETIMES WRONG WHITE MIDDLE ATLANTIC CATHOLIC      SATISFIED   NA
## 4  SOMETIMES WRONG BLACK MIDDLE ATLANTIC     NONE   MORE OR LESS 32.3
## 5             <NA> BLACK MIDDLE ATLANTIC CATHOLIC   MORE OR LESS 63.5
## 6             <NA> BLACK MIDDLE ATLANTIC CATHOLIC NOT AT ALL SAT   NA
##      sex                    sexfreq sibs                 socfrend
## 1   MALE ONCE A YR THRU ONCE A MNTH    2         SEV TIMES A MNTH
## 2 FEMALE                     WEEKLY    3 SEV TIMES A WEEK OR MORE
## 3 FEMALE                       <NA>   10      ONCE A YEAR OR LESS
## 4   MALE                       <NA>   11         SEV TIMES A MNTH
## 5 FEMALE                       <NA>    2                     <NA>
## 6 FEMALE                       <NA>    1                     <NA>
##              socommun suicide1 tvhours year       age3 SEI3
## 1 ONCE A YEAR OR LESS     <NA>       1 2010 18 THRU 36 HIGH
## 2        ONCE A MONTH      YES       0 2010 18 THRU 36 HIGH
## 3 ONCE A YEAR OR LESS       NO       3 2010 54 THRU 89 <NA>
## 4        ONCE A MONTH       NO       4 2010 37 THRU 53  LOW
## 5                <NA>     <NA>      NA 2010 37 THRU 53 HIGH
## 6                <NA>     <NA>      NA 2010 18 THRU 36 <NA>
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.4
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.4
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
#using the is.na() function to remove those rows with "NA"
gssnew<-subset(gssdata,!is.na(gssdata$age))

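# Convert via character so that, if read.csv() imported age as a factor, the axis shows ages rather than level codes
ggplot(gssnew, aes(x = as.numeric(as.character(age)))) + geom_histogram(binwidth = 1) + xlab("age")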

From this graph, the majority of respondents appear to fall in the 10 to 30 year age bracket. A histogram groups the values into bins and stacks the count in each bin, which makes the shape of the distribution easy to inspect for further analysis.
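Before reading ages off the axis, it may be worth checking how the column was imported. The sketch below assumes `age` came in as a factor (an assumption; the output above does not show the column types) and uses a made-up toy vector, not the GSS data, to show that `as.numeric()` on a factor returns level codes rather than the printed values.

```r
# Hypothetical toy vector standing in for an age column imported as a factor
age_factor <- factor(c("18", "23", "31", "46", "82"))

# as.numeric() on a factor returns the internal level codes
codes <- as.numeric(age_factor)

# Going through as.character() first recovers the printed values
ages <- as.numeric(as.character(age_factor))

print(codes)
print(ages)
```

If `str(gssdata$age)` reports a numeric or integer column, neither conversion is needed and the column can be plotted directly.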

qplot(sex, data = gssnew)

pie(table(gssdata$marital))

pie(table(gssdata$race))

From these pie charts, we can conclude that a large portion of the sample is married and identifies as white. Silge and Robinson (2016) underlined that bar charts are used to identify how three variables are connected.
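Pie slices are hard to compare by eye, so the shares behind a statement like the one above can also be tabulated. This is a minimal sketch on a made-up vector (the values are hypothetical, not GSS results) using `table()` and `prop.table()` from base R.

```r
# Hypothetical toy vector standing in for gssdata$marital
marital <- c("MARRIED", "MARRIED", "NEVER MARRIED", "WIDOWED", "MARRIED", NA)

# table() drops NA by default, so shares are computed over answered cases only
counts <- table(marital)

# Convert counts to percentages
shares <- round(100 * prop.table(counts), 1)
print(shares)
```

Applied to the real data, the same two calls on `gssdata$marital` or `gssdata$race` would give the percentage behind each slice.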

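# factor() first so level codes exist even if degree was read as character; codes follow alphabetical level order, not attainment
hist(as.numeric(factor(gssnew$degree)), main="Histogram for graduates", xlab="", border="red", breaks = 20, col="blue")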

ggplot(data = gssnew, aes(relig, fill=sex))+geom_bar()

ggplot(data = gssnew, aes(marital, fill=region))+geom_bar()

ggplot(data = gssnew, aes(degree))+geom_bar()+facet_grid(race~.)

ggplot(data = gssnew, aes(satfin, fill=region))+geom_bar()+coord_flip()

The simple histogram suggests that a significant number of respondents in the sample hold an associate/junior college degree and a limited number hold a bachelor's degree.

From the bar charts, we can draw several insights from the GSS survey results about population demographics and social behaviors.

  1. Protestants are the largest religious group, and the male-to-female ratio is lowest among Catholics and Protestants.

  2. The South Atlantic region has the highest proportion of married respondents, and the West South Central region has the highest proportion of unmarried respondents.

  3. White respondents show higher educational attainment than Black respondents.

  4. Respondents in the Mountain region are the most financially satisfied.
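Statements about ratios and proportions within groups, like those in the list above, are easier to check from a table of row shares than from stacked counts. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical, not GSS results) and base R's `prop.table()` with `margin = 1`.

```r
# Hypothetical toy data standing in for the relig and sex columns
toy <- data.frame(
  relig = c("PROTESTANT", "PROTESTANT", "PROTESTANT", "CATHOLIC", "CATHOLIC", "NONE"),
  sex   = c("FEMALE", "FEMALE", "MALE", "FEMALE", "MALE", "MALE")
)

# Cross-tabulate the two variables
tab <- table(toy$relig, toy$sex)

# margin = 1 divides each row by its total, giving the sex split within each religion
row_share <- prop.table(tab, margin = 1)
print(round(row_share, 2))
```

In ggplot2, `geom_bar(position = "fill")` draws the same within-group shares, which makes groups of different sizes directly comparable.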

gssnew<- subset(gssnew, !is.na(gssnew$goodlife))
ggplot(data = gssnew, aes(satfin, fill=goodlife))+geom_bar()

ggplot(data = gssnew, aes(sex, fill=degree))+geom_bar(position = "dodge")+coord_flip()

ggplot(data = gssnew, aes(age3))+geom_density()

Respondents who are more or less satisfied financially tend to agree that they have a good life, and vice versa. In the above graphs, female respondents scored higher than male respondents on the academic front. A density plot shows a smoothed estimate of how observations are distributed across the values of a variable. In this case, the density is low and falling for the 18-36 age group, unlike the curve for the 37-53 group.
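A density estimate is easiest to interpret on a continuous variable. As a rough sketch, the code below simulates a made-up numeric age vector (the numbers are hypothetical, not taken from the GSS file) and computes a kernel density estimate with base R's `density()`.

```r
# Hypothetical simulated ages: a younger and an older cluster
set.seed(1)
age <- c(rnorm(200, mean = 30, sd = 6), rnorm(100, mean = 60, sd = 8))

# Kernel density estimate; by default it is evaluated at 512 points
d <- density(age)

# The area under the curve is approximately 1
area <- sum(d$y) * diff(d$x[1:2])
print(area)

plot(d, main = "Density of simulated ages", xlab = "age")
```

With the survey data, the equivalent ggplot2 call would map the numeric age column, rather than the three-level `age3` bracket, to `x` before adding `geom_density()`.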

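# Compare the child count itself: is.numeric(childs) < 3 is always TRUE and would label every row "good"
gssnew<-mutate(gssnew, famchildpolicy = ifelse(as.numeric(as.character(childs)) <3, "good","bad"))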
## Warning: package 'bindrcpp' was built under R version 3.4.4

Horton and Kleinman (2015) highlighted that we can use mutate() from the dplyr package together with the base R ifelse() function to create new variables. I have created a new variable called “famchildpolicy” that takes the value “good” if the number of children in the family is less than 3 and “bad” otherwise.
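To make the recoding concrete, here is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical). It assumes the child count is stored as a number, and it also shows why the comparison has to be applied to the column itself rather than to `is.numeric()`.

```r
library(dplyr)

# Hypothetical toy data standing in for the childs column
toy <- data.frame(childs = c(0, 2, 3, 5, NA))

# ifelse() is vectorised: each row is tested separately, and NA stays NA
toy <- mutate(toy, famchildpolicy = ifelse(childs < 3, "good", "bad"))
print(toy)

# is.numeric() returns a single TRUE/FALSE for the whole column,
# so comparing its result with 3 cannot tell rows apart
print(is.numeric(toy$childs) < 3)
```

Rows with a missing child count remain `NA` in the new variable, which is usually preferable to silently assigning them to one of the two categories.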

References:

Horton, N. J., & Kleinman, K. (2015). Using R and RStudio for data management, statistical analysis, and graphics. Chapman and Hall/CRC.

Silge, J., & Robinson, D. (2016). tidytext: Text mining and analysis using tidy data principles in R. The Journal of Open Source Software, 1(3), 37.