Statistics with R Specialization

Introduction to Probability and Data with R

Jayasree Kulothungan

Load packages

library(ggplot2)
library(dplyr)

Load data

Make sure the data and R Markdown files are in the same directory. Once loaded, the data frame is called brfss2013.

load("brfss2013.RData")

Part 1: Data

The Behavioral Risk Factor Surveillance System (BRFSS) is a collaborative project between all of the states in the United States (US) and participating US territories and the Centers for Disease Control and Prevention (CDC).

Generalizability

Generalizability depends on whether random sampling was used.

Since 2011, BRFSS has conducted both landline telephone- and cellular telephone-based surveys. In conducting the BRFSS landline telephone survey, interviewers collect data from a randomly selected adult in a household. In conducting the cellular telephone version of the BRFSS questionnaire, interviewers collect data from an adult who participates by using a cellular telephone and resides in a private residence or college housing.

Because the data are collected from a very large random sample of respondents, the results of this cross-sectional observational study can be generalized to the adult population of the respective states.

Causality

The Behavioral Risk Factor Surveillance System (BRFSS) is a cross-sectional observational study. Respondents are not randomly assigned to any treatment or condition, so the data can show associations between variables but cannot support causal inferences.


Part 2: Research questions

Research question 1:

How does arthritis affect daily life? Does it make walking for a long time difficult, and does it limit social activity?

Interest:

After a family member was recently diagnosed with arthritis, I want to know its consequences for personal life.

Variables:

  • havarth3 : Have Arthritis
  • diffwalk : Difficulty Walking Or Climbing Stairs
  • arthsocl : Social Activities Limited Because Of Joint Symptoms

Research question 2:

How many women have had the opportunity to take a Pap test? How many of these women have had their uterus removed?

Interest:

A large number of women around the world have been victims of uterine cancer. Unfortunately, many of them are unaware of the Pap test and of the importance of regular checkups. I want to know how many women in the sample have undergone this test and how many of them have had a hysterectomy.

Variables:

  • sex : Respondents Sex
  • hadpap2 : Ever Had A Pap Test
  • hadhyst2 : Had Hysterectomy

Research question 3:

Is there any association between receiving medical help for depression and how often people have felt depressed over the past month?

Interest:

Depression is prevalent worldwide among people of both genders. However, reaching out for help is still not widespread. I want to know whether getting medical assistance has helped in reducing depression.

Variables:

  • sex : Respondents Sex
  • addepev2 : Ever Told You Had A Depressive Disorder
  • misdeprd : How Often Feel Depressed Past 30 Days
  • mistmnt : Receiving Medicine Or Treatment From Health Pro For Emotional Problem

Part 3: Exploratory data analysis

Research question 1:

arthdata <- brfss2013[which(brfss2013$havarth3 == "Yes"),
                 names(brfss2013) %in% 
                     c("havarth3","diffwalk","arthsocl")]
arthdata <- arthdata[complete.cases(arthdata),]
summary(arthdata)
##  havarth3     diffwalk          arthsocl    
##  Yes:152390   Yes:57318   A lot     :28689  
##  No :     0   No :95072   A little  :36806  
##                           Not at all:86895
# Horizontal stacked bars: one bar per diffwalk level, split by arthsocl
ggplot(arthdata, aes(y = diffwalk)) +
        geom_bar(aes(fill = arthsocl), width = 0.5, position = position_stack(reverse = TRUE)) +
        labs(x = "Number of people with arthritis", y = "Difficulty walking or climbing stairs", fill = "Social limitations")

From the above analysis, it is evident that among people with arthritis, those who have difficulty walking or climbing stairs report that their joint symptoms limit their social activities considerably more than those who have no such difficulty.
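Because the two walking groups differ in size, comparing them is easier with row proportions than with raw counts. The sketch below shows one way to do this in base R. The data frame `toy` is a small made-up example, not rows from brfss2013; with the real data, the same two calls could be applied to `arthdata$diffwalk` and `arthdata$arthsocl`.

```r
# Toy data for illustration only; these rows are NOT taken from brfss2013.
toy <- data.frame(
  diffwalk = factor(c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"),
                    levels = c("Yes", "No")),
  arthsocl = factor(c("A lot", "A lot", "A little", "Not at all",
                      "A lot", "A little", "Not at all", "Not at all",
                      "Not at all", "Not at all"),
                    levels = c("A lot", "A little", "Not at all"))
)

# Cross-tabulate walking difficulty against social limitation
counts <- table(toy$diffwalk, toy$arthsocl)

# Convert each row (walking-difficulty group) to proportions that sum to 1
row_props <- prop.table(counts, margin = 1)
print(round(row_props, 2))
```

Each row of `row_props` gives the distribution of social limitation within one walking-difficulty group, so the rows can be compared directly.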

Research question 2:

women_test <- brfss2013[which(brfss2013$sex == "Female"),
                 names(brfss2013) %in% 
                     c("sex","hadpap2","hadhyst2")]

women_test <- women_test[complete.cases(women_test),]

summary(women_test)
##      sex        hadpap2     hadhyst2   
##  Male  :    0   Yes:27797   Yes:10254  
##  Female:29451   No : 1654   No :19197
# Side-by-side bars of Pap test status, split by hysterectomy status
ggplot(women_test, aes(x = hadpap2)) +
        geom_bar(aes(fill = hadhyst2), position = "dodge", width = 0.5) +
        labs(x = "Had Pap Test", y = "Frequency", fill = "Had Hysterectomy")

From the above summary and graph, it can be seen that most women in the sample have had a Pap test at least once (27,797 of 29,451, about 94%). Women who have had a hysterectomy are the minority (10,254 of 29,451, about 35%), irrespective of whether they have undergone the test.
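The shares behind this reading can be checked directly from the counts printed by summary(women_test). The sketch below copies those counts by hand; the variable names are chosen here for illustration and are not part of the original analysis.

```r
# Counts copied from the summary(women_test) output above
n_women  <- 29451   # female respondents with complete cases
pap_yes  <- 27797   # hadpap2 == "Yes"
hyst_yes <- 10254   # hadhyst2 == "Yes"

# Percentage of women who ever had a Pap test / who had a hysterectomy
pct_pap  <- round(100 * pap_yes / n_women, 1)
pct_hyst <- round(100 * hyst_yes / n_women, 1)

print(c(pap_test = pct_pap, hysterectomy = pct_hyst))
```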

Research question 3:

depressiondata <- brfss2013[which(brfss2013$addepev2 == "Yes"),
                 names(brfss2013) %in%
                     c("addepev2","sex","misdeprd","mistmnt")]

depressiondata <- depressiondata[complete.cases(depressiondata),]

summary(depressiondata)
##  addepev2       sex           misdeprd    mistmnt   
##  Yes:7643   Male  :2260   All     : 209   Yes:4338  
##  No :   0   Female:5383   Most    : 425   No :3305  
##                           Some    :1040             
##                           A little:1288             
##                           None    :4681
# Stacked bars of treatment status, split by past-30-day depression frequency, one panel per sex
ggplot(depressiondata, aes(x = mistmnt)) +
        geom_bar(aes(fill = misdeprd), width = 0.5, position = "stack") +
        labs(x = "Received Medical Help", y = "Frequency", fill = "Depressed for the past month") +
        theme_minimal(base_size = 10) +
        facet_grid(. ~ sex)

The graph shows that, among respondents who have been told they have a depressive disorder, females considerably outnumber males (5,383 versus 2,260 in the summary above). The graph gives no clear indication of whether medical assistance has helped these individuals. This may be because the observation window is only the past 30 days, and because the data do not show whether the individuals are currently receiving help or how much time passed between receiving help and the survey.
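Since the male and female panels, and the treated and untreated bars, contain very different numbers of respondents, one way to make them comparable is to convert each sex-by-treatment group to proportions. The sketch below does this in base R. The counts in `toy$n` are invented for illustration and are not taken from brfss2013; with the real data, the table could instead be built from `depressiondata` with `table(depressiondata$sex, depressiondata$mistmnt, depressiondata$misdeprd)`.

```r
# Invented counts for illustration only; NOT taken from brfss2013.
# expand.grid varies misdeprd fastest, then mistmnt, then sex.
toy <- expand.grid(
  misdeprd = c("All", "Most", "Some", "A little", "None"),
  mistmnt  = c("Yes", "No"),
  sex      = c("Male", "Female"),
  stringsAsFactors = TRUE
)
toy$n <- c( 2,  4, 10, 14, 20,   # Male,   treatment Yes
            1,  3,  8, 12, 26,   # Male,   treatment No
            5, 10, 25, 30, 30,   # Female, treatment Yes
            4,  6, 20, 30, 40)   # Female, treatment No

# Three-way table of counts: sex x treatment x depression frequency
tab <- xtabs(n ~ sex + mistmnt + misdeprd, data = toy)

# Proportions within each sex-by-treatment group (each group sums to 1)
props <- prop.table(tab, margin = c(1, 2))
print(ftable(round(props, 2)))
```

Reading across a row of the printed table gives the distribution of past-month depression within one group, which can then be compared between treated and untreated respondents of the same sex.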