knitr::opts_chunk$set(error = TRUE, tidy = FALSE, size = "small")

Load packages

    set.seed(123)
    library(ggplot2)
    library(dplyr)

BRFSS 2013 - SELF-REPORTED SURVEILLANCE SURVEY ANALYSIS

Part 1: About Data

Data Sources

The data frame (brfss2013) was downloaded from the Coursera course website's assignment page. I saved it to the hard drive (in the R data files directory) and then loaded it into the R Markdown file in RStudio for analysis. The data set is provided in workspace (.RData) format for analysis with RStudio. Categorical variables are stored as factors, and all missing values are coded NA. During the analysis I removed subsets that were no longer needed from time to time to free working memory.
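A minimal sketch of that load-and-free workflow, using a small stand-in data frame and a temporary file so that it runs anywhere. The file path and the `toy` and `scratch` objects are illustrative assumptions; in the report the file is the course's .RData workspace containing `brfss2013`.

```r
# Stand-in for the downloaded workspace: save a tiny data frame to a temp file.
toy <- data.frame(genhlth  = factor(c("Good", "Poor", NA)),
                  sleptim1 = c(7L, NA, 8L))
rdata_path <- tempfile(fileext = ".RData")   # hypothetical path
save(toy, file = rdata_path)
rm(toy)

# load() restores the saved objects and returns their names.
loaded <- load(rdata_path)
print(loaded)

# Drop intermediate objects once they are no longer needed, then let R
# release the memory.
scratch <- toy[!is.na(toy$sleptim1), ]
rm(scratch)
invisible(gc())
```

Removing objects with `rm()` followed by `gc()` is the usual way to free working memory in a long session.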

DATA - Generalizability

The data were collected by landline and cell phone from households in all American states. Any estimate based on 50 or fewer respondents, or thought to be biased, was dropped. Further weighting was applied to stratify the data. Non-English-speaking respondents were not interviewed in their own language. The survey covers the non-institutionalized population. Given its size, the sample can be treated as random. The data appear valid and reliable when compared with other surveys of a similar nature. According to independent analysts, reliability stands at 75%, and the results are generalizable. The data are used for health care planning: by comparing results year on year, planners can check whether their plans are working, and prevalence trends can help them revise those plans.

What is BRFSS?

The Behavioral Risk Factor Surveillance System (BRFSS) is a collaborative project between all of the states in the United States (US), participating US territories, and the Centers for Disease Control and Prevention (CDC). The BRFSS is administered and supported by CDC's Population Health Surveillance Branch, under the Division of Population Health at the National Center for Chronic Disease Prevention and Health Promotion.

BRFSS is an ongoing surveillance system designed to measure behavioral risk factors for the non-institutionalized adult population (18 years of age and older) residing in the US. Some bias in the data can exist.

The BRFSS objective is to collect uniform, state-specific data on preventive health practices and risk behaviors. Factors assessed include tobacco use, HIV/AIDS knowledge and prevention, exercise, immunization, health status, healthy days (health-related quality of life), health care access, inadequate sleep, hypertension awareness, cholesterol awareness, chronic health conditions, alcohol consumption, fruit and vegetable consumption, arthritis burden, and seat belt use.

Part 2: Research Questions

Research question 1:

What is the health status of the population by age, sex and activity?

Research question 2:

Which behaviours might cause poor health?

Research question 3:

What is the correlation between behaviours and chronic health conditions?

The following variables will be used:

This list shows which variables are used for which research question(s).

    genhlth  : General health status                           Research Question 1
    sex      : Sex of respondent                               Research Question 1
    X_age_g  : Respondent age group                            Research Question 1
    sleptim1 : How much time do you sleep (hours)              Research Question 2,3
    cvdinfr4 : Ever diagnosed with heart attack                Research Question 2,3
    addepev2 : Ever told you had a depressive disorder         Research Question 2,3
    smoke100 : Smoked at least 100 cigarettes                  Research Question 2,3
    diabete3 : Ever told you have diabetes                     Research Question 2,3
    exerany2 : Exercise or physical activity in past 30 days   Research Question 1,2,3
    toldhi2  : Ever told blood cholesterol high                Research Question 2,3
    X_bmi5cat: Weight category (BMI)                           Research Question 2,3
    X_rfbmi5 : Overweight or obese                             Research Question 2,3
    bphigh4  : Ever told blood pressure high                   Research Question 2,3
    X_pacat1 : Physical activity category                      Research Question 1

Exploring Data Frame brfss2013

This data frame has 330 columns and 491775 rows.

  dim(brfss2013)
## [1] 491775    330

There are 330 variables, as shown below. We have to choose the required variables from these by subsetting.

  str(brfss2013)
## 'data.frame':    491775 obs. of  330 variables:
##  $ X_state  : Factor w/ 55 levels "0","Alabama",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ fmonth   : Factor w/ 12 levels "January","February",..: 1 1 1 1 2 3 3 3 4 4 ...
##  $ idate    : int  1092013 1192013 1192013 1112013 2062013 3272013 3222013 3042013 4242013 4242013 ...
##  $ imonth   : Factor w/ 12 levels "January","February",..: 1 1 1 1 2 3 3 3 4 4 ...
##  $ iday     : Factor w/ 31 levels "1","2","3","4",..: 9 19 19 11 6 27 22 4 24 24 ...
##  $ iyear    : Factor w/ 2 levels "2013","2014": 1 1 1 1 1 1 1 1 1 1 ...
##  $ dispcode : Factor w/ 2 levels "Completed interview",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ seqno    : int  2013000580 2013000593 2013000600 2013000606 2013000608 2013000630 2013000634 2013000644 2013001305 2013001338 ...
##  $ X_psu    : int  2013000580 2013000593 2013000600 2013000606 2013000608 2013000630 2013000634 2013000644 2013001305 2013001338 ...
##  $ ctelenum : Factor w/ 1 level "Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ pvtresd1 : Factor w/ 2 levels "Yes","No": 1 1 1 1 1 1 1 1 1 1 ...
##  $ colghous : Factor w/ 1 level "Yes": NA NA NA NA NA NA NA NA NA NA ...
##  $ stateres : Factor w/ 1 level "Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ cellfon3 : Factor w/ 1 level "Not a cellular phone": 1 1 1 1 1 1 1 1 1 1 ...
##  $ ladult   : Factor w/ 2 levels "Yes, male respondent",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ numadult : Factor w/ 19 levels "1","2","3","4",..: 2 2 3 2 2 1 2 1 5 2 ...
##  $ nummen   : Factor w/ 14 levels "0","1","2","3",..: 2 2 3 2 2 1 2 1 5 2 ...
##  $ numwomen : Factor w/ 12 levels "0","1","2","3",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ genhlth  : Factor w/ 5 levels "Excellent","Very good",..: 4 3 3 2 3 2 4 3 1 3 ...
##  $ physhlth : int  30 0 3 2 10 0 1 5 0 0 ...
##  $ menthlth : int  29 0 2 0 2 0 15 0 0 0 ...
##  $ poorhlth : int  30 NA 0 0 0 NA 0 10 NA NA ...
##  $ hlthpln1 : Factor w/ 2 levels "Yes","No": 1 1 1 1 1 1 1 1 1 1 ...
##  $ persdoc2 : Factor w/ 3 levels "Yes, only one",..: 1 1 1 1 1 1 2 1 1 1 ...
##  $ medcost  : Factor w/ 2 levels "Yes","No": 2 2 2 2 2 2 2 2 2 2 ...
##  $ checkup1 : Factor w/ 5 levels "Within past year",..: 1 1 1 2 4 1 1 1 1 1 ...
##  $ sleptim1 : int  NA 6 9 8 6 8 7 6 8 8 ...
##  $ bphigh4  : Factor w/ 4 levels "Yes","Yes, but female told only during pregnancy",..: 1 3 3 3 1 1 1 1 3 3 ...
##  $ bpmeds   : Factor w/ 2 levels "Yes","No": 1 NA NA NA 2 1 1 1 NA NA ...
##  $ bloodcho : Factor w/ 2 levels "Yes","No": 1 1 1 1 1 1 1 1 1 1 ...
##  $ cholchk  : Factor w/ 4 levels "Within past year",..: 1 1 4 1 2 1 1 1 1 1 ...
##  $ toldhi2  : Factor w/ 2 levels "Yes","No": 1 2 2 1 2 1 2 1 1 2 ...
##  $ cvdinfr4 : Factor w/ 2 levels "Yes","No": 2 2 2 2 2 2 2 2 2 2 ...
##  $ cvdcrhd4 : Factor w/ 2 levels "Yes","No": NA 2 2 2 2 2 2 1 2 2 ...
##  $ cvdstrk3 : Factor w/ 2 levels "Yes","No": 2 2 2 2 2 2 2 2 2 2 ...
##  $ asthma3  : Factor w/ 2 levels "Yes","No": 1 2 2 2 1 2 2 2 2 2 ...
##  $ asthnow  : Factor w/ 2 levels "Yes","No": 1 NA NA NA 2 NA NA NA NA NA ...
##  $ chcscncr : Factor w/ 2 levels "Yes","No": 2 2 2 2 2 2 2 2 2 2 ...
##  $ chcocncr : Factor w/ 2 levels "Yes","No": 2 2 2 2 1 2 2 2 2 2 ...
##  $ chccopd1 : Factor w/ 2 levels "Yes","No": 1 2 2 2 2 2 2 2 2 2 ...
##  $ havarth3 : Factor w/ 2 levels "Yes","No": 1 2 1 2 2 2 1 1 1 2 ...
##  $ addepev2 : Factor w/ 2 levels "Yes","No": 1 1 1 2 2 2 2 2 2 2 ...
##  $ chckidny : Factor w/ 2 levels "Yes","No": 1 2 2 2 2 2 2 2 2 2 ...
##  $ diabete3 : Factor w/ 4 levels "Yes","Yes, but female told only during pregnancy",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ veteran3 : Factor w/ 2 levels "Yes","No": 2 2 2 2 2 2 2 2 2 2 ...
##  $ marital  : Factor w/ 6 levels "Married","Divorced",..: 2 1 1 1 1 2 1 3 1 1 ...
##  $ children : int  0 2 0 0 0 0 1 0 1 0 ...
##  $ educa    : Factor w/ 6 levels "Never attended school or only kindergarten",..: 6 5 6 4 6 6 4 5 6 4 ...
##  $ employ1  : Factor w/ 8 levels "Employed for wages",..: 7 1 1 7 7 1 1 7 7 5 ...
##  $ income2  : Factor w/ 8 levels "Less than $10,000",..: 7 8 8 7 6 8 NA 6 8 4 ...
##  $ weight2  : Factor w/ 570 levels "",".b","100",..: 154 30 63 31 169 128 9 1 139 73 ...
##  $ height3  : int  507 510 504 504 600 503 500 505 602 505 ...
##  $ numhhol2 : Factor w/ 2 levels "Yes","No": 1 2 2 2 2 1 2 2 2 2 ...
##  $ numphon2 : Factor w/ 6 levels "1 residential telephone number",..: 2 NA NA NA NA 1 NA NA NA NA ...
##  $ cpdemo1  : Factor w/ 2 levels "Yes","No": 1 1 1 1 1 1 1 1 1 1 ...
##  $ cpdemo4  : int  10 70 70 75 0 70 40 1 60 50 ...
##  $ internet : Factor w/ 2 levels "Yes","No": 1 1 1 1 1 1 1 1 1 1 ...
##  $ renthom1 : Factor w/ 3 levels "Own","Rent","Other arrangement": 1 1 1 1 1 1 1 2 1 1 ...
##  $ sex      : Factor w/ 2 levels "Male","Female": 2 2 2 2 1 2 2 2 1 2 ...
##  $ pregnant : Factor w/ 2 levels "Yes","No": NA NA NA NA NA NA 2 NA NA NA ...
##  $ qlactlm2 : Factor w/ 2 levels "Yes","No": 1 2 1 2 2 2 1 1 2 2 ...
##  $ useequip : Factor w/ 2 levels "Yes","No": 1 2 2 2 2 2 2 2 2 2 ...
##  $ blind    : Factor w/ 2 levels "Yes","No": 2 2 2 2 2 2 2 2 2 2 ...
##  $ decide   : Factor w/ 2 levels "Yes","No": 2 2 2 2 2 2 2 2 2 2 ...
##  $ diffwalk : Factor w/ 2 levels "Yes","No": 1 2 1 2 2 2 2 1 2 2 ...
##  $ diffdres : Factor w/ 2 levels "Yes","No": 2 2 2 2 2 2 2 2 2 2 ...
##  $ diffalon : Factor w/ 2 levels "Yes","No": 1 2 2 2 2 2 2 2 2 2 ...
##  $ smoke100 : Factor w/ 2 levels "Yes","No": 1 2 1 2 1 2 1 1 2 2 ...
##  $ smokday2 : Factor w/ 3 levels "Every day","Some days",..: 3 NA 2 NA 3 NA 3 1 NA NA ...
##  $ stopsmk2 : Factor w/ 2 levels "Yes","No": NA NA 1 NA NA NA NA 2 NA NA ...
##  $ lastsmk2 : Factor w/ 8 levels "Within the past month",..: 7 NA NA NA 1 NA 5 NA NA NA ...
##  $ usenow3  : Factor w/ 3 levels "Every day","Some days",..: 3 3 3 3 3 3 3 3 1 3 ...
##  $ alcday5  : int  201 0 220 208 210 0 201 202 101 0 ...
##  $ avedrnk2 : int  2 NA 4 2 2 NA 1 1 1 NA ...
##  $ drnk3ge5 : int  0 NA 20 0 0 NA 0 0 0 NA ...
##  $ maxdrnks : int  2 NA 10 2 3 NA 1 1 2 NA ...
##  $ fruitju1 : int  304 305 301 202 0 205 320 0 0 202 ...
##  $ fruit1   : int  104 301 203 306 302 206 325 320 101 202 ...
##  $ fvbeans  : int  303 310 202 202 101 0 330 360 202 203 ...
##  $ fvgreen  : int  310 203 202 310 310 203 315 315 203 201 ...
##  $ fvorang  : int  303 202 310 305 303 0 310 325 0 201 ...
##  $ vegetab1 : int  NA 203 330 204 101 207 310 308 101 203 ...
##  $ exerany2 : Factor w/ 2 levels "Yes","No": 2 1 2 1 2 1 1 1 1 1 ...
##  $ exract11 : Factor w/ 75 levels "Active Gaming Devices (Wii Fit, Dance, Dance revolution)",..: NA 64 NA 64 NA 6 64 64 7 64 ...
##  $ exeroft1 : int  NA 105 NA 205 NA 102 220 102 102 220 ...
##  $ exerhmm1 : int  NA 20 NA 30 NA 15 100 15 100 30 ...
##  $ exract21 : Factor w/ 76 levels "Active Gaming Devices (Wii Fit, Dance, Dance revolution)",..: NA 71 NA 75 NA 18 75 75 75 18 ...
##  $ exeroft2 : int  NA 101 NA NA NA 102 NA NA NA 101 ...
##  $ exerhmm2 : int  NA 10 NA NA NA 30 NA NA NA 100 ...
##  $ strength : int  0 0 0 0 0 0 205 0 102 0 ...
##  $ lmtjoin3 : Factor w/ 2 levels "Yes","No": 1 NA 1 NA NA NA 2 1 2 NA ...
##  $ arthdis2 : Factor w/ 2 levels "Yes","No": 1 NA 1 NA NA NA 1 2 2 NA ...
##  $ arthsocl : Factor w/ 3 levels "A lot","A little",..: 1 NA 2 NA NA NA 3 1 3 NA ...
##  $ joinpain : int  7 NA 5 NA NA NA 3 8 4 NA ...
##  $ seatbelt : Factor w/ 6 levels "Always","Nearly always",..: 1 1 1 1 1 1 1 1 2 1 ...
##  $ flushot6 : Factor w/ 2 levels "Yes","No": 2 1 1 2 2 1 2 1 1 2 ...
##  $ flshtmy2 : Factor w/ 26 levels "January 2012",..: NA 10 13 NA NA NA NA 10 10 NA ...
##  $ tetanus  : Factor w/ 4 levels "Yes, received Tdap",..: 4 1 1 4 4 4 4 4 1 4 ...
##  $ pneuvac3 : Factor w/ 2 levels "Yes","No": 1 2 2 2 2 1 2 2 2 2 ...
##   [list output truncated]
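The subsetting step mentioned above could look like the sketch below. It runs on a small made-up data frame (`toy`) rather than `brfss2013`; only the column names are taken from the variable list, and `keep` and `health_sub` are illustrative names.

```r
# Made-up stand-in with a few brfss2013 column names plus one unused column.
toy <- data.frame(
  genhlth  = factor(c("Good", "Fair", "Excellent")),
  sex      = factor(c("Male", "Female", "Female")),
  exerany2 = factor(c("Yes", "No", "Yes")),
  seqno    = 1:3
)

keep <- c("genhlth", "sex", "exerany2")

# Fail early if a requested column is misspelled or absent.
stopifnot(all(keep %in% names(toy)))

health_sub <- toy[, keep]
str(health_sub)
```

Checking the names first avoids a less helpful "undefined columns selected" error when working with 330 columns.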

Research question 1:

What is the current status of health by sex, age group and activity level?

The tabulated information gives general health by sex, age group and physical activity.

##            sex
## genhlth      Male Female
##   Excellent 35741  49740
##   Very good 65135  93940
##   Good      62998  87557
##   Fair      25882  40844
##   Poor      10713  17238
##            X_age_g
## genhlth     Age 18 to 24 Age 25 to 34 Age 35 to 44 Age 45 to 54
##   Excellent         6896        12204        13243        15683
##   Very good        10266        18453        21542        27630
##   Good              7795        14518        17677        24513
##   Fair              1874         4047         6055        10738
##   Poor               303          821         1729         4950
##            X_age_g
## genhlth     Age 55 to 64 Age 65 or older
##   Excellent        17434           20020
##   Very good        34626           46557
##   Good             32745           53305
##   Fair             16092           27919
##   Poor              8098           12050
##            exerany2
## genhlth        Yes     No
##   Excellent  67914  11453
##   Very good 120595  28860
##   Good       97095  42443
##   Fair       34844  26949
##   Poor       10941  14856
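The code that produced these tables is not shown. Output of this shape is what `table()` returns for two named factors, so a call along the following lines would be one way to obtain it. The data frame `toy` is invented for illustration; with the real data, `brfss2013` would take its place.

```r
toy <- data.frame(
  genhlth = factor(c("Excellent", "Good", "Good", "Poor", "Excellent", "Good"),
                   levels = c("Excellent", "Very good", "Good", "Fair", "Poor")),
  sex     = factor(c("Male", "Female", "Female", "Male", "Female", "Male"),
                   levels = c("Male", "Female"))
)

# with() lets table() pick up the column names as dimension labels.
# table() drops NA values by default; useNA = "ifany" would show them.
health_by_sex <- with(toy, table(genhlth, sex))
print(health_by_sex)
```

Because missing values are dropped per table, the three tables above need not have the same totals.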

General health status: prevalence of physical, mental and poor health in the population under study. The plots show health in the population.

Activity level and general health are shown here.

Both sexes have similar health levels.

Research Question 2

Behaviour patterns and prevalence of major disorders. This part prepares the data frame, checks for missing values and tabulates summaries of the variables used in the analysis. Three habits seem to affect life: physical activity, smoking and sleep.

Calculating figures and plotting

##       genhlth           sex                    X_age_g       X_rfbmi5    
##  Excellent: 63241   Male  :149886   Age 18 to 24   :  9038   No :121282  
##  Very good:121403   Female:214906   Age 25 to 34   : 26391   Yes:243510  
##  Good     :109965                   Age 35 to 44   : 42382               
##  Fair     : 49386                   Age 45 to 54   : 65957               
##  Poor     : 20797                   Age 55 to 64   : 88956               
##                                     Age 65 or older:132068               
##          X_bmi5cat     
##  Underweight  :  5537  
##  Normal weight:115745  
##  Overweight   :133390  
##  Obese        :110120  
##                        
##                        
##                                        diabete3      cvdinfr4    
##  Yes                                       : 51202   Yes: 24347  
##  Yes, but female told only during pregnancy:  3105   No :340445  
##  No                                        :303594               
##  No, pre-diabetes or borderline diabetes   :  6891               
##                                                                  
##                                                                  
##                                        bphigh4       addepev2    
##  Yes                                       :161096   Yes: 73620  
##  Yes, but female told only during pregnancy:  2258   No :291172  
##  No                                        :197358               
##  Told borderline or pre-hypertensive       :  4080               
##                                                                  
##                                                                  
##  exerany2     toldhi2      smoke100        sleptim1     
##  Yes:269002   Yes:160553   Yes:166624   Min.   : 1.000  
##  No : 95790   No :204239   No :198168   1st Qu.: 6.000  
##                                         Median : 7.000  
##                                         Mean   : 7.057  
##                                         3rd Qu.: 8.000  
##                                         Max.   :24.000
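The summary above contains no NA counts and every factor sums to 364792 rows, which suggests that the data frame holds only complete cases. Its construction is not shown, so the sketch below is an assumption: it drops incomplete rows from a made-up data frame using `complete.cases()`.

```r
toy <- data.frame(
  smoke100 = factor(c("Yes", "No", NA, "No")),
  sleptim1 = c(7L, NA, 6L, 8L)
)

n_before <- nrow(toy)

# Keep only rows with no missing value in any column.
complete <- toy[complete.cases(toy), ]

cat("rows kept:", nrow(complete), "of", n_before, "\n")
summary(complete)
```

Reporting how many rows were dropped makes it easier to judge whether the complete-case subset still represents the full sample.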

Behaviours and Life Style

Smoking by age and sex.

Tabulating and plotting.

## 
##    Yes     No 
## 166624 198168
## 
##      Yes       No 
## 45.67644 54.32356
##      Age 18 to 24 Age 25 to 34 Age 35 to 44 Age 45 to 54 Age 55 to 64 Age 65 or older
##                                                                                      
## Yes     0.6485888    2.7755543    4.3364438    7.7849843   11.9136385      18.2172306
## No      1.8289875    4.4589794    7.2816838   10.2957302   12.4717647      17.9864142
##          Male   Female
##                       
## Yes  21.44099 24.23545
## No   19.64709 34.67647

Exercise and physical activity. This activity is useful for health.

Summary of subset. Table and plots.

## 
##    Yes     No 
## 269002  95790
## 
##     Yes      No 
## 73.7412 26.2588
##      Age 18 to 24 Age 25 to 34 Age 35 to 44 Age 45 to 54 Age 55 to 64 Age 65 or older
##                                                                                      
## Yes      2.109421     5.970526     9.181945    13.653534    17.868265       24.957510
## No       0.368155     1.264008     2.436183     4.427180     6.517139       11.246135

Sleeping time: how much time do people spend sleeping?

Insufficient sleep causes health problems.

Let us compare sleeping time with health. Tables and plots.

## 
##      1      2      3      4      5      6      7      8      9     10 
##    130    718   2430  10283  24362  79431 110243 105978  17984   8820 
##     11     12     13     14     15     16     17     18     19     20 
##    580   2632    137    323    265    254     27    115     12     44 
##     21     22     23     24 
##      3      6      1     14
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
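The `stat_bin()` message appears because no bin width was supplied. Since sleep time is recorded in whole hours, a bin width of 1 is a natural choice. The sketch below rebuilds the sleep-time vector from the frequency table above and plots it with `geom_histogram(binwidth = 1)`. The object names are illustrative, and the plot is only drawn if ggplot2 is installed.

```r
# Frequency table of sleptim1 as printed above (hours 1 to 24).
counts <- c(130, 718, 2430, 10283, 24362, 79431, 110243, 105978, 17984, 8820,
            580, 2632, 137, 323, 265, 254, 27, 115, 12, 44,
            3, 6, 1, 14)
sleep_hours <- rep(1:24, times = counts)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(sleptim1 = sleep_hours), aes(x = sleptim1)) +
    geom_histogram(binwidth = 1) +   # one bar per whole hour
    labs(x = "Hours of sleep", y = "Respondents")
  print(p)
}
```

With one bar per hour, the concentration of responses between 6 and 8 hours is easier to read than with the default 30 bins.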

Obesity and overweight prevalence: how does obesity affect our health? Tables and plots.

table(disease$X_rfbmi5)
## 
##     No    Yes 
## 121282 243510
table(disease$X_rfbmi5,disease$genhlth)
##      
##       Excellent Very good  Good  Fair  Poor
##   No      31835     42698 28934 11938  5877
##   Yes     31406     78705 81031 37448 14920
prop.table(table(disease$X_bmi5cat,disease$X_age_g))*100
##                
##                 Age 18 to 24 Age 25 to 34 Age 35 to 44 Age 45 to 54
##   Underweight     0.09018838   0.11760126   0.12966293   0.21957718
##   Normal weight   1.24317419   2.69578280   3.65331477   5.27012654
##   Overweight      0.67654992   2.38574311   4.03216079   6.47903463
##   Obese           0.46766376   2.03540648   3.80298910   6.11197614
##                
##                 Age 55 to 64 Age 65 or older
##   Underweight     0.28454571      0.67627580
##   Normal weight   6.92449396     11.94214785
##   Overweight      8.98649093     14.00606373
##   Obese           8.18987258      9.57915744
prop.table(table(disease$X_rfbmi5,disease$exerany2))*100
##      
##             Yes        No
##   No  26.339119  6.907772
##   Yes 47.402081 19.351027
obs <- table(disease$genhlth,disease$X_rfbmi5)
plot(obs,main="Obesity vs General Health",col=rainbow(5))

qplot(disease$X_bmi5cat,xlab="BMI category",main="Obesity prevalence")

qplot(X_age_g, data = disease, fill=X_bmi5cat,
xlab="Age",ylab ="Population",
main ="Obese-Overweight by age")

prop.table(table(disease$diabete3,disease$genhlth))*100
##                                             
##                                                Excellent   Very good
##   Yes                                         0.43641308  2.26238514
##   Yes, but female told only during pregnancy  0.15570517  0.30208996
##   No                                         16.60864273 30.25559771
##   No, pre-diabetes or borderline diabetes     0.13541964  0.45998816
##                                             
##                                                     Good        Fair
##   Yes                                         5.09961841  4.12152679
##   Yes, but female told only during pregnancy  0.25631045  0.10800675
##   No                                         24.06494660  8.90425229
##   No, pre-diabetes or borderline diabetes     0.72370008  0.40434001
##                                             
##                                                     Poor
##   Yes                                         2.11600035
##   Yes, but female told only during pregnancy  0.02905765
##   No                                          3.39042523
##   No, pre-diabetes or borderline diabetes     0.16557381
prop.table(table(disease$genhlth,disease$exerany2))*100
##            
##                   Yes        No
##   Excellent 15.107239  2.228941
##   Very good 27.166988  6.113073
##   Good      21.289118  8.855457
##   Fair       7.716726  5.821400
##   Poor       2.461129  3.239929
prop.table(table(disease$genhlth,disease$sex))*100 
##            
##                  Male    Female
##   Excellent  7.140782 10.195399
##   Very good 13.661210 19.618851
##   Good      12.778789 17.365787
##   Fair       5.302748  8.235378
##   Poor       2.204544  3.496513

Prevalence of Disorders

Diabetes: what is the prevalence of diabetes by age group?

Tabulating and Plotting the relevant information.

## 
##                                        Yes 
##                                      51202 
## Yes, but female told only during pregnancy 
##                                       3105 
##                                         No 
##                                     303594 
##    No, pre-diabetes or borderline diabetes 
##                                       6891
## 
##                                        Yes 
##                                  14.035944 
## Yes, but female told only during pregnancy 
##                                   0.851170 
##                                         No 
##                                  83.223865 
##    No, pre-diabetes or borderline diabetes 
##                                   1.889022
##                                             
##                                              Age 18 to 24 Age 25 to 34
##   Yes                                          0.04166758   0.19545385
##   Yes, but female told only during pregnancy   0.01644773   0.12308384
##   No                                           2.40273910   6.86528213
##   No, pre-diabetes or borderline diabetes      0.01672186   0.05071383
##                                             
##                                              Age 35 to 44 Age 45 to 54
##   Yes                                          0.63734950   1.83638896
##   Yes, but female told only during pregnancy   0.23410601   0.19106779
##   No                                          10.62112108  15.77063093
##   No, pre-diabetes or borderline diabetes      0.12555100   0.28262681
##                                             
##                                              Age 55 to 64 Age 65 or older
##   Yes                                          3.85315467      7.47192921
##   Yes, but female told only during pregnancy   0.13377486      0.15268975
##   No                                          19.90257462     27.66151670
##   No, pre-diabetes or borderline diabetes      0.49589903      0.91750916
##                                             Age 18 to 24 Age 25 to 34 Age 35 to 44 Age 45 to 54 Age 55 to 64 Age 65 or older
##                                                                                                                             
## Yes                                           0.04166758   0.19545385   0.63734950   1.83638896   3.85315467      7.47192921
## Yes, but female told only during pregnancy    0.01644773   0.12308384   0.23410601   0.19106779   0.13377486      0.15268975
## No                                            2.40273910   6.86528213  10.62112108  15.77063093  19.90257462     27.66151670
## No, pre-diabetes or borderline diabetes       0.01672186   0.05071383   0.12555100   0.28262681   0.49589903      0.91750916

Cardiovascular Disorders

How does the prevalence of heart disease increase with age?

Tabulating and plotting.

table(disease$cvdinfr4)
## 
##    Yes     No 
##  24347 340445
prop.table(table(disease$cvdinfr4))*100
## 
##       Yes        No 
##  6.674214 93.325786
prop.table(ftable(table(disease$cvdinfr4,disease$X_age_g)))*100
##      Age 18 to 24 Age 25 to 34 Age 35 to 44 Age 45 to 54 Age 55 to 64 Age 65 or older
##                                                                                      
## Yes    0.01288405   0.04468300   0.15077085   0.60966249   1.52278559      4.33342836
## No     2.46469221   7.18985065  11.46735674  17.47105200  22.86261760     31.87021645
qplot(disease$cvdinfr4,xlab="Ever diagnosed with heart attack",ylab="count",
          main="Heart Disease Prevalence")

qplot(X_age_g, data = disease, fill=cvdinfr4,xlab="Age group",
      ylab ="Count",main ="Heart Disease by Age")
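The percentages tabulated above are shares of the whole sample, so they also reflect how many respondents fall in each age group. To see how prevalence changes with age, it may help to compute the share of "Yes" within each age group, which `prop.table()` does when given `margin = 2`. The sketch below applies it to the percentages printed above; `cvd_by_age` and `within_age` are illustrative names.

```r
ages <- c("Age 18 to 24", "Age 25 to 34", "Age 35 to 44",
          "Age 45 to 54", "Age 55 to 64", "Age 65 or older")

# Overall percentages copied from the ftable output above.
cvd_by_age <- rbind(
  Yes = c(0.01288405, 0.04468300, 0.15077085,
          0.60966249, 1.52278559, 4.33342836),
  No  = c(2.46469221, 7.18985065, 11.46735674,
          17.47105200, 22.86261760, 31.87021645)
)
colnames(cvd_by_age) <- ages

# margin = 2 rescales each column (age group) so that it sums to 100.
within_age <- prop.table(cvd_by_age, margin = 2) * 100
round(within_age, 2)
```

On the actual data frame the same result would come from `prop.table(table(disease$cvdinfr4, disease$X_age_g), margin = 2) * 100`.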

High Blood Pressure

How does high blood pressure increase with age?

table(disease$bphigh4)
## 
##                                        Yes 
##                                     161096 
## Yes, but female told only during pregnancy 
##                                       2258 
##                                         No 
##                                     197358 
##        Told borderline or pre-hypertensive 
##                                       4080
prop.table(table(disease$bphigh4))*100
## 
##                                        Yes 
##                                 44.1610562 
## Yes, but female told only during pregnancy 
##                                  0.6189829 
##                                         No 
##                                 54.1015154 
##        Told borderline or pre-hypertensive 
##                                  1.1184456
prop.table(ftable(table(disease$bphigh4,disease$X_age_g)))*100
##                                             Age 18 to 24 Age 25 to 34 Age 35 to 44 Age 45 to 54 Age 55 to 64 Age 65 or older
##                                                                                                                             
## Yes                                          0.223140858  1.020581592  2.488541415  6.123215421 11.593181868    22.712395009
## Yes, but female told only during pregnancy   0.025219851  0.146384789  0.161461874  0.106087853  0.084705805     0.095122700
## No                                           2.220443431  6.018224084  8.859021031 11.644443957 12.393638018    12.965744863
## Told borderline or pre-hypertensive          0.008772122  0.049343187  0.109103270  0.206967258  0.313877497     0.430382245
 qplot(disease$bphigh4)

 qplot(X_age_g, data = disease, fill=bphigh4,xlab="Age group",
 ylab ="Count",main ="High blood pressure by age")

High Blood cholesterol

How does the prevalence of high cholesterol relate to age?

Tabulating and plotting this information.

## 
##    Yes     No 
## 160553 204239
## 
##     Yes      No 
## 44.0122 55.9878
##      Age 18 to 24 Age 25 to 34 Age 35 to 44 Age 45 to 54 Age 55 to 64 Age 65 or older
##                                                                                      
## Yes      0.234106     1.142295     3.027753     7.028663    12.330863       20.248525
## No       2.243470     6.092239     8.590375    11.052052    12.054541       15.955120

Depression and Stress Disorders

How does depression relate to different age groups?

Tabulating and Plotting this information.


## 
##    Yes     No 
##  73620 291172
## 
##      Yes       No 
## 20.18136 79.81864
##      Age 18 to 24 Age 25 to 34 Age 35 to 44 Age 45 to 54 Age 55 to 64 Age 65 or older
##                                                                                      
## Yes      0.436139     1.444933     2.423025     4.241869     5.863067        5.772331
## No       2.041437     5.789601     9.195103    13.838845    18.522336       30.431314

Compare Diabetes and Obesity

Tabulation and Plots.

## hlthdiab1
##                                        Yes 
##                                      62363 
## Yes, but female told only during pregnancy 
##                                       4602 
##                                         No 
##                                     415374 
##    No, pre-diabetes or borderline diabetes 
##                                       8604
## 
##   Underweight Normal weight    Overweight         Obese 
##          5537        115745        133390        110120

Research question 3:

Analyzing the association of behavioural outcomes with their causes by linear and multiple regression methods

   vars <- names(brfss2013) %in% c("sleptim1",
            "cvdinfr4","addepev2","diabete3",
            "bphigh4","toldhi2","smoke100","X_rfbmi5")

          hlthsub2 <- brfss2013[vars] 
          
          names(hlthsub2)
## [1] "sleptim1" "bphigh4"  "toldhi2"  "cvdinfr4" "addepev2" "diabete3"
## [7] "smoke100" "X_rfbmi5"
          MissingData <- function(x){sum(is.na(x))/length(x)*100}
          
          apply(hlthsub2, 2, MissingData)
##   sleptim1    bphigh4    toldhi2   cvdinfr4   addepev2   diabete3 
##  1.5021097  0.2887499 14.5721112  0.5260536  0.4654568  0.1691831 
##   smoke100   X_rfbmi5 
##  3.0339078  5.4352092
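An equivalent and somewhat more direct way to get the same percentages is `colMeans(is.na(x)) * 100`, which skips the step where `apply()` first converts the data frame to a matrix. A small check on made-up data (`toy` is not part of the analysis):

```r
toy <- data.frame(
  sleptim1 = c(7, NA, 6, 8),
  smoke100 = factor(c("Yes", NA, NA, "No"))
)

# Same helper as in the report: percentage of missing values in a vector.
MissingData <- function(x) sum(is.na(x)) / length(x) * 100

via_apply    <- apply(toy, 2, MissingData)
via_colmeans <- colMeans(is.na(toy)) * 100

print(via_colmeans)
all.equal(via_apply, via_colmeans)
```

Both forms return a named numeric vector with one percentage per column.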
          summary(hlthsub2)
##     sleptim1                                             bphigh4      
##  Min.   :  0.000   Yes                                       :198921  
##  1st Qu.:  6.000   Yes, but female told only during pregnancy:  3680  
##  Median :  7.000   No                                        :282687  
##  Mean   :  7.052   Told borderline or pre-hypertensive       :  5067  
##  3rd Qu.:  8.000   NA's                                      :  1420  
##  Max.   :450.000                                                      
##  NA's   :7387                                                         
##  toldhi2       cvdinfr4      addepev2     
##  Yes :183501   Yes : 29284   Yes : 95779  
##  No  :236612   No  :459904   No  :393707  
##  NA's: 71662   NA's:  2587   NA's:  2289  
##                                           
##                                           
##                                           
##                                           
##                                        diabete3      smoke100     
##  Yes                                       : 62363   Yes :215201  
##  Yes, but female told only during pregnancy:  4602   No  :261654  
##  No                                        :415374   NA's: 14920  
##  No, pre-diabetes or borderline diabetes   :  8604                
##  NA's                                      :   832                
##                                                                   
##                                                                   
##  X_rfbmi5     
##  No  :163161  
##  Yes :301885  
##  NA's: 26729  
##               
##               
##               
## 
          summary(hlthsub2$X_rfbmi5)
##     No    Yes   NA's 
## 163161 301885  26729
           hlthsub2$X_rfbmi5 <- replace(hlthsub2$X_rfbmi5,
                   which(is.na(hlthsub2$X_rfbmi5)), "Yes")
          
           summary(hlthsub2$X_rfbmi5)
##     No    Yes 
## 163161 328614
          summary(hlthsub2$smoke100)
##    Yes     No   NA's 
## 215201 261654  14920
          hlthsub2$addepev2 <- replace(hlthsub2$addepev2,
                   which(is.na(hlthsub2$addepev2)), "Yes")
          
          summary(hlthsub2$addepev2)
##    Yes     No 
##  98068 393707
          hlthsub2$smoke100 <- replace(hlthsub2$smoke100,
                   which(is.na(hlthsub2$smoke100)), "Yes")
          
          summary(hlthsub2$smoke100)
##    Yes     No 
## 230121 261654
          hlthsub2$diabete3 <- replace(hlthsub2$diabete3,
                   which(is.na(hlthsub2$diabete3)), "Yes")
          
          summary(hlthsub2$diabete3)
##                                        Yes 
##                                      63195 
## Yes, but female told only during pregnancy 
##                                       4602 
##                                         No 
##                                     415374 
##    No, pre-diabetes or borderline diabetes 
##                                       8604
          hlthsub2$bphigh4 <- replace(hlthsub2$bphigh4,
                   which(is.na(hlthsub2$bphigh4)), "No")
          
          summary(hlthsub2$bphigh4)
##                                        Yes 
##                                     198921 
## Yes, but female told only during pregnancy 
##                                       3680 
##                                         No 
##                                     284107 
##        Told borderline or pre-hypertensive 
##                                       5067
          summary(hlthsub2$sleptim1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   6.000   7.000   7.052   8.000 450.000    7387
          mean(hlthsub2$sleptim1, na.rm = TRUE)
## [1] 7.052099
          hlthsub2$sleptim1 <- replace(hlthsub2$sleptim1,
                   which(is.na(hlthsub2$sleptim1)), 7)
          summary(hlthsub2$sleptim1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   6.000   7.000   7.051   8.000 450.000
          summary(hlthsub2$toldhi2)
##    Yes     No   NA's 
## 183501 236612  71662
          hlthsub2$toldhi2 <- replace(hlthsub2$toldhi2,
                   which(is.na(hlthsub2$toldhi2)), "Yes")
        
          summary(hlthsub2$toldhi2)
##    Yes     No 
## 255163 236612
          summary(hlthsub2$cvdinfr4)
##    Yes     No   NA's 
##  29284 459904   2587
          hlthsub2$cvdinfr4 <- replace(hlthsub2$cvdinfr4,
                   which(is.na(hlthsub2$cvdinfr4)), "Yes")
          
          summary(hlthsub2$cvdinfr4)
##    Yes     No 
##  31871 459904
          # Collapse bphigh4 and diabete3 to binary Yes/No factors.
          # Only the plain "Yes" level is coded "Yes"; the pregnancy-only
          # and borderline levels are coded "No". This is the coding
          # behind the results reported below.
          hlthsub2$bphigh4 <- as.factor(ifelse(hlthsub2$bphigh4 == "Yes",
                                               "Yes", "No"))

          hlthsub2$diabete3 <- as.factor(ifelse(hlthsub2$diabete3 == "Yes",
                                                "Yes", "No"))
        
        library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
## 
##     combine, src, summarize
## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units
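
If the pregnancy-only and borderline answers should also count as "Yes" when bphigh4 or diabete3 is collapsed to two levels, one option is a `%in%` test against the exact level names. The sketch below is an illustration on a made-up toy factor, not the code used for this report; `bp`, `yes_levels` and `bp_binary` are hypothetical names. The level names are copied from the summary of bphigh4 above, and each string literal is kept on a single line because a literal that contains a line break will not match any level.

```r
# Toy factor using the bphigh4 level names shown in the summary above
# (the six values are made up for illustration).
bp <- factor(c("Yes",
               "No",
               "Yes, but female told only during pregnancy",
               "Told borderline or pre-hypertensive",
               "No",
               "Yes"))

# Levels to treat as a positive answer; each literal stays on one line
# so that it matches the factor level exactly.
yes_levels <- c("Yes",
                "Yes, but female told only during pregnancy",
                "Told borderline or pre-hypertensive")

# Collapse to a two-level factor.
bp_binary <- factor(ifelse(bp %in% yes_levels, "Yes", "No"),
                    levels = c("No", "Yes"))
table(bp_binary)
```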

Converting factors to numerics for analysis.

hlthsub2$addepev2 <- ifelse(hlthsub2$addepev2=="Yes", 1, 0)
hlthsub2$cvdinfr4 <- ifelse(hlthsub2$cvdinfr4=="Yes", 1, 0)
hlthsub2$smoke100 <- ifelse(hlthsub2$smoke100=="Yes", 1, 0)
hlthsub2$diabete3 <- ifelse(hlthsub2$diabete3=="Yes", 1, 0)
hlthsub2$toldhi2 <- ifelse(hlthsub2$toldhi2 =="Yes", 1, 0)
hlthsub2$bphigh4 <- ifelse(hlthsub2$bphigh4=="Yes", 1, 0)
hlthsub2$X_rfbmi5 <- ifelse(hlthsub2$X_rfbmi5=="Yes", 1, 0)

Checking Missing Values

MissingData <- function(x){sum(is.na(x))/length(x)*100}
apply(hlthsub2, 2, MissingData)
## sleptim1  bphigh4  toldhi2 cvdinfr4 addepev2 diabete3 smoke100 X_rfbmi5 
##        0        0        0        0        0        0        0        0

Correlation and Regression Analysis.

This section computes the correlation matrix of the recoded variables and displays it with corrplot, then fits logistic regression models (glm with the binomial family) and tabulates the correlations between selected variables.

library(corrplot)
M <- cor(hlthsub2)
corrplot(M, method="number",pch=23,col=rainbow(7))

library(corrplot)
corrplot(M, type="upper", order="hclust", tl.col=rainbow(7), tl.srt=45,method="number",pch=23)

print(M)
             sleptim1    bphigh4      toldhi2   cvdinfr4    addepev2
sleptim1  1.000000000 0.00299319 -0.002282787 0.00151238 -0.05198593
bphigh4   0.002993190 1.00000000  0.180710056 0.17632702  0.07233525
toldhi2  -0.002282787 0.18071006  1.000000000 0.10240121  0.07886054
cvdinfr4  0.001512380 0.17632702  0.102401212 1.00000000  0.05931655
addepev2 -0.051985927 0.07233525  0.078860535 0.05931655  1.00000000
diabete3  0.004080381 0.27190219  0.130802191 0.15972742  0.07907738
smoke100 -0.024540698 0.07609272  0.064803460 0.09378122  0.10470799
X_rfbmi5 -0.028331766 0.18674776  0.067617184 0.04047252  0.04811917
            diabete3    smoke100    X_rfbmi5
sleptim1 0.004080381 -0.02454070 -0.02833177
bphigh4  0.271902195  0.07609272  0.18674776
toldhi2  0.130802191  0.06480346  0.06761718
cvdinfr4 0.159727418  0.09378122  0.04047252
addepev2 0.079077382  0.10470799  0.04811917
diabete3 1.000000000  0.04739043  0.15123138
smoke100 0.047390432  1.00000000  0.01842239
X_rfbmi5 0.151231383  0.01842239  1.00000000
summary(M)
    sleptim1             bphigh4            toldhi2         
 Min.   :-0.0519859   Min.   :0.002993   Min.   :-0.002283  
 1st Qu.:-0.0254885   1st Qu.:0.075153   1st Qu.: 0.066914  
 Median :-0.0003852   Median :0.178518   Median : 0.090631  
 Mean   : 0.1126806   Mean   :0.245889   Mean   : 0.202864  
 3rd Qu.: 0.0032650   3rd Qu.:0.208036   3rd Qu.: 0.143279  
 Max.   : 1.0000000   Max.   :1.000000   Max.   : 1.000000  
    cvdinfr4           addepev2           diabete3      
 Min.   :0.001512   Min.   :-0.05199   Min.   :0.00408  
 1st Qu.:0.054606   1st Qu.: 0.05652   1st Qu.:0.07116  
 Median :0.098091   Median : 0.07560   Median :0.14102  
 Mean   :0.204192   Mean   : 0.17380   Mean   :0.23053  
 3rd Qu.:0.163877   3rd Qu.: 0.08549   3rd Qu.:0.18777  
 Max.   :1.000000   Max.   : 1.00000   Max.   :1.00000  
    smoke100           X_rfbmi5       
 Min.   :-0.02454   Min.   :-0.02833  
 1st Qu.: 0.04015   1st Qu.: 0.03496  
 Median : 0.07045   Median : 0.05787  
 Mean   : 0.17258   Mean   : 0.18553  
 3rd Qu.: 0.09651   3rd Qu.: 0.16011  
 Max.   : 1.00000   Max.   : 1.00000  
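
Most columns of hlthsub2 are 0/1 indicators, so the Pearson correlation that `cor()` reports for a pair of them is the phi coefficient of the corresponding 2x2 table. A minimal sketch on two made-up vectors (`x` and `y` are hypothetical, not BRFSS data) shows the equivalence:

```r
# Two made-up 0/1 indicators (hypothetical data, not from BRFSS).
x <- c(1, 1, 1, 0, 0, 0, 1, 0, 1, 0)
y <- c(1, 1, 0, 0, 0, 1, 1, 0, 1, 0)

# Pearson correlation, computed the same way as the entries of M.
r <- cor(x, y)

# Phi coefficient from the 2x2 table of counts.
tab <- table(x, y)
n11 <- tab["1", "1"]; n00 <- tab["0", "0"]
n10 <- tab["1", "0"]; n01 <- tab["0", "1"]
phi <- (n11 * n00 - n10 * n01) /
  sqrt(sum(tab["1", ]) * sum(tab["0", ]) * sum(tab[, "1"]) * sum(tab[, "0"]))

# TRUE when the two quantities agree up to floating-point tolerance.
all.equal(r, as.numeric(phi))
```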

Logistic regression models are fitted for cardiovascular disease (cvdinfr4), depression (addepev2) and diabetes (diabete3) against the other variables. Diagnostic plots are drawn for the cardiovascular model, which uses all other variables as predictors.

library(stargazer, quietly = TRUE)
## 
## Please cite as:
##  Hlavac, Marek (2015). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2. http://CRAN.R-project.org/package=stargazer
fit <- glm(cvdinfr4 ~ ., data=hlthsub2, family = "binomial")
fit1 <- glm(addepev2 ~ ., data=hlthsub2, family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
fit2 <- glm(diabete3 ~ cvdinfr4+addepev2+bphigh4+smoke100+
        toldhi2+sleptim1, data=hlthsub2, family = "binomial")
summary(fit)
## 
## Call:
## glm(formula = cvdinfr4 ~ ., family = "binomial", data = hlthsub2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.8987  -0.3972  -0.2618  -0.2160   2.9409  
## 
## Coefficients:
##              Estimate Std. Error  z value Pr(>|z|)    
## (Intercept) -4.262688   0.027655 -154.140  < 2e-16 ***
## sleptim1     0.006726   0.003001    2.241    0.025 *  
## bphigh4      1.171271   0.013841   84.623  < 2e-16 ***
## toldhi2      0.544248   0.013241   41.105  < 2e-16 ***
## addepev2     0.267102   0.013492   19.797  < 2e-16 ***
## diabete3     0.834225   0.013728   60.766  < 2e-16 ***
## smoke100     0.640269   0.012428   51.517  < 2e-16 ***
## X_rfbmi5    -0.055023   0.013937   -3.948 7.88e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 236049  on 491774  degrees of freedom
## Residual deviance: 211103  on 491767  degrees of freedom
## AIC: 211119
## 
## Number of Fisher Scoring iterations: 6
plot(fit)
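
The coefficients in the summary are on the log-odds scale, so exponentiating them gives adjusted odds ratios. The sketch below copies the printed estimates into a named vector (`est` and `odds_ratio` are illustrative names). With the fitted object available, `exp(coef(fit))` should give the same values.

```r
# Estimates copied from the summary(fit) output above (log-odds scale).
est <- c(sleptim1 = 0.006726, bphigh4 = 1.171271, toldhi2 = 0.544248,
         addepev2 = 0.267102, diabete3 = 0.834225, smoke100 = 0.640269,
         X_rfbmi5 = -0.055023)

# Exponentiate to obtain adjusted odds ratios for cvdinfr4.
odds_ratio <- exp(est)
round(odds_ratio, 2)
```

An odds ratio above 1 indicates higher odds of reporting cvdinfr4 when the predictor equals 1 (or per additional hour for sleptim1), holding the other variables fixed. A value below 1 indicates lower odds.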

Comparison of the Three Logistic Regression Models

library(stargazer, quietly = TRUE)
stargazer(fit,fit1,fit2,type="text")
## 
## ========================================================
##                            Dependent variable:          
##                   --------------------------------------
##                     cvdinfr4     addepev2     diabete3  
##                       (1)          (2)          (3)     
## --------------------------------------------------------
## sleptim1            0.007**     -0.092***     0.012***  
##                     (0.003)      (0.003)      (0.003)   
##                                                         
## bphigh4             1.171***     0.165***     1.532***  
##                     (0.014)      (0.008)      (0.010)   
##                                                         
## toldhi2             0.544***     0.288***     0.505***  
##                     (0.013)      (0.007)      (0.010)   
##                                                         
## addepev2            0.267***                  0.351***  
##                     (0.013)                   (0.010)   
##                                                         
## cvdinfr4                         0.253***     0.818***  
##                                  (0.013)      (0.014)   
##                                                         
## diabete3            0.834***     0.327***               
##                     (0.014)      (0.010)                
##                                                         
## smoke100            0.640***     0.468***     0.073***  
##                     (0.012)      (0.007)      (0.009)   
##                                                         
## X_rfbmi5           -0.055***     0.152***               
##                     (0.014)      (0.008)                
##                                                         
## Constant           -4.263***    -1.383***    -3.349***  
##                     (0.028)      (0.020)      (0.023)   
##                                                         
## --------------------------------------------------------
## Observations        491,775      491,775      491,775   
## Log Likelihood    -105,551.600 -239,307.700 -166,126.000
## Akaike Inf. Crit. 211,119.100  478,631.300  332,266.000 
## ========================================================
## Note:                        *p<0.1; **p<0.05; ***p<0.01
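
The Akaike Inf. Crit. row can be reproduced from the Log Likelihood row as AIC = 2k - 2 log L, where k is the number of estimated coefficients including the constant (8 for models 1 and 2, 7 for model 3). The check below uses the rounded values printed in the table, so small differences are expected. The variable names are illustrative.

```r
# Values copied from the stargazer table above.
log_lik  <- c(cvdinfr4 = -105551.6, addepev2 = -239307.7, diabete3 = -166126.0)
k        <- c(cvdinfr4 = 8, addepev2 = 8, diabete3 = 7)  # coefficients incl. constant
reported <- c(cvdinfr4 = 211119.1, addepev2 = 478631.3, diabete3 = 332266.0)

# AIC = 2k - 2 * log-likelihood
aic <- 2 * k - 2 * log_lik

# Differences stay within the rounding error of the printed log-likelihoods.
round(aic - reported, 1)
```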

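Correlation of the other variables with a single outcome, e.g. diabetes (the same can be done for sleptim1, bphigh4, toldhi2, cvdinfr4, addepev2 or smoke100 if required). The correlations of diabetes with sleep time, high blood pressure, high cholesterol, cardiovascular disease, depression and smoking are given below, followed by the proportion of respondents with diabetes in each X_rfbmi5 group.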

cor(hlthsub2$diabete3,hlthsub2$sleptim1)
## [1] 0.004080381
cor(hlthsub2$diabete3,hlthsub2$bphigh4)
## [1] 0.2719022
cor(hlthsub2$diabete3,hlthsub2$toldhi2)
## [1] 0.1308022
cor(hlthsub2$diabete3,hlthsub2$cvdinfr4)
## [1] 0.1597274
cor(hlthsub2$diabete3,hlthsub2$addepev2)
## [1] 0.07907738
cor(hlthsub2$diabete3,hlthsub2$smoke100)
## [1] 0.04739043
aggregate(diabete3~X_rfbmi5,data=hlthsub2,mean)
##   X_rfbmi5   diabete3
## 1        0 0.05668021
## 2        1 0.16416525

Correlation of sleep time (sleptim1, the first column of hlthsub2) with every other variable, sorted in ascending order.

 coran <- sort(cor(hlthsub2)[,1])
 print(coran)
    addepev2     X_rfbmi5     smoke100      toldhi2     cvdinfr4 
-0.051985927 -0.028331766 -0.024540698 -0.002282787  0.001512380 
     bphigh4     diabete3     sleptim1 
 0.002993190  0.004080381  1.000000000 
 plot(coran)

Fisher Test on Selected variables.

Cardiovascular disorder and obesity

## 
##  Fisher's Exact Test for Count Data
## 
## data:  hlthsub2$cvdinfr4 and hlthsub2$X_rfbmi5
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.412973 1.488011
## sample estimates:
## odds ratio 
##   1.449966

Diabetes and Depression

## 
##  Fisher's Exact Test for Count Data
## 
## data:  hlthsub2$diabete3 and hlthsub2$addepev2
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.669775 1.734516
## sample estimates:
## odds ratio 
##   1.701818

Smoking and cardiovascular disease

## 
##  Fisher's Exact Test for Count Data
## 
## data:  hlthsub2$smoke100 and hlthsub2$cvdinfr4
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  2.126275 2.229919
## sample estimates:
## odds ratio 
##   2.177476

Diabetes and Obesity

## 
##  Fisher's Exact Test for Count Data
## 
## data:  hlthsub2$diabete3 and hlthsub2$X_rfbmi5
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  3.194582 3.344753
## sample estimates:
## odds ratio 
##    3.26872

Diabetes and cardiovascular disease

## 
##  Fisher's Exact Test for Count Data
## 
## data:  hlthsub2$cvdinfr4 and hlthsub2$diabete3
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  3.743158 3.936279
## sample estimates:
## odds ratio 
##   3.838856
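
The code for the Fisher tests is not shown in the output above. A call of the following form produces results in the same layout. The sketch uses a small made-up 2x2 table rather than hlthsub2, and the object names are illustrative.

```r
# Made-up 2x2 table of counts (hypothetical, not BRFSS data):
# rows = exposure (No/Yes), columns = outcome (No/Yes).
tab <- matrix(c(40, 10,
                20, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("No", "Yes"),
                              outcome  = c("No", "Yes")))

res <- fisher.test(tab)

res$p.value    # two-sided p-value
res$estimate   # conditional maximum-likelihood estimate of the odds ratio
res$conf.int   # 95 percent confidence interval for the odds ratio
```

The "data:" line in each result above suggests that the two columns were passed directly, as in `fisher.test(hlthsub2$cvdinfr4, hlthsub2$X_rfbmi5)`. Note that the reported odds ratio is a conditional estimate and can differ slightly from the simple cross-product ratio of the table.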

Run Density Plots of a few selected health outcomes

Inference of Analysis

From the above results we can infer that:

1. Sleeping time has a weak negative association with depression, smoking and
obesity: respondents who report smoking, obesity or depression tend to sleep
slightly fewer hours.

2. Exercise has a negative association with depression, diabetes, high blood pressure
and cholesterol. This suggests that health improves with exercise and that chronic
disorders such as high blood pressure and high cholesterol are reduced.

3. Instances of cardiovascular disorders are directly related to diabetes,
high blood pressure, cholesterol, depression, smoking and obesity, but show
little or no relationship with exercise and sleeping time.

We can find associations between outcomes and causes, which confirms our hypothesis.
"To lead a good life it is essential that one must avoid excesses and be physically
active irrespective of age or sex."