(a) What does each row of the data matrix represent?
Each row of the data matrix represents one U.K. resident and that person's smoking habits. A row is also called a case or observational unit.
library(readxl)  # provides read_xls()
df <- data.frame(read_xls("11263-Smoking_tcm86-13253.xls"))
head(df)
## Sex Age Marital.Status Highest.Qualification Nationality Ethnicity
## 1 Male 38 Divorced No Qualification British White
## 2 Female 42 Single No Qualification British White
## 3 Male 40 Married Degree English White
## 4 Female 40 Married Degree English White
## 5 Female 39 Married GCSE/O Level British White
## 6 Female 37 Married GCSE/O Level British White
## Gross.Income Region Smoke. Amount.Weekends Amount.Weekdays
## 1 £2600 to less than £5200 The North No N/A N/A
## 2 Less than £2600 The North Yes 12 12
## 3 £28600 to less than £36400 The North No N/A N/A
## 4 £10400 to less than £15600 The North No N/A N/A
## 5 £2600 to less than £5200 The North No N/A N/A
## 6 £15600 to less than £20800 The North No N/A N/A
## Type
## 1 N/A
## 2 Packets
## 3 N/A
## 4 N/A
## 5 N/A
## 6 N/A
(b) How many participants were included in the survey?
A total of 1693 participants were included in the survey.
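The participant count is simply the number of rows in the data frame. A minimal sketch, using a small made-up stand-in (`toy`, not the survey file) so that it runs on its own:

```r
# Hypothetical stand-in for the survey data frame; each row is one respondent.
toy <- data.frame(
  Sex   = c("Male", "Female", "Male"),
  Age   = c(38, 42, 40),
  Smoke = c("No", "Yes", "No")
)

n_participants <- nrow(toy)  # number of rows = number of cases
n_variables    <- ncol(toy)  # number of columns = number of variables
print(n_participants)
```

Applied to the real data, `nrow(df)` would give the number of participants in the same way.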
(c) Indicate whether each variable included in the survey is numerical or categorical. If numerical, identify it as continuous or discrete. If categorical, indicate whether the variable is ordinal.
Let's examine each variable to determine its type. We will use the unique function in R, which returns all the distinct values of a variable. This gives us a better understanding of the data each variable contains and helps us decide what type of variable it is.
unique(df$Marital.Status)
## [1] "Divorced" "Single" "Married" "Widowed" "Separated"
Marital.Status is a categorical variable, as numerical operations such as addition or subtraction cannot be applied to it. It is not ordinal, as marital statuses have no specific order: a person can be single, then get married, then divorced, and then married again.
unique(df$Highest.Qualification)
## [1] "No Qualification" "Degree" "GCSE/O Level"
## [4] "GCSE/CSE" "Other/Sub Degree" "Higher/Sub Degree"
## [7] "ONC/BTEC" "A Levels" "99"
Highest.Qualification is also categorical, and it is ordinal because qualifications are obtained in a specific order.
unique(df$Nationality)
## [1] "British" "English" "Scottish" "Other" "Welsh" "Irish" "Refused"
## [8] "Unknown"
Nationality: is categorical and not ordinal.
unique(df$Ethnicity)
## [1] "White" "Mixed" "Black" "Refused" "Asian" "Chinese" "Unknown"
Ethnicity: is categorical and not ordinal.
unique(df$Gross.Income)
## [1] "£2600 to less than £5200" "Less than £2600"
## [3] "£28600 to less than £36400" "£10400 to less than £15600"
## [5] "£15600 to less than £20800" "£36400 or more"
## [7] "£5200 to less than £10400" "Refused"
## [9] "£20800 to less than £28600" "Unknown"
Gross.Income: is also categorical, as it is divided into the income brackets seen above. Because the brackets run from lowest to highest, it is ordinal.
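Since the income brackets have a natural low-to-high order, one way to record that order in R is an ordered factor. The sketch below is illustrative only: the bracket labels are copied from the unique() output above, while the `income` vector is a made-up sample and the variable names are assumptions.

```r
# Bracket labels copied from the unique() output, listed lowest to highest.
income_levels <- c(
  "Less than £2600",
  "£2600 to less than £5200",
  "£5200 to less than £10400",
  "£10400 to less than £15600",
  "£15600 to less than £20800",
  "£20800 to less than £28600",
  "£28600 to less than £36400",
  "£36400 or more"
)

# Hypothetical sample of responses, including one non-response.
income <- c("£2600 to less than £5200", "Less than £2600",
            "£36400 or more", "Refused")

# Values not listed in `levels` (here "Refused") become NA.
income_ord <- factor(income, levels = income_levels, ordered = TRUE)

# Comparisons such as > and < are defined for ordered factors.
print(income_ord[1] > income_ord[2])
```

Leaving "Refused" and "Unknown" out of the levels is a design choice: they are not income brackets, so they have no place in the ordering.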
unique(df$Region)
## [1] "The North" "Midlands & East Anglia" "London"
## [4] "South East" "South West" "Wales"
## [7] "Scotland"
Region: is also categorical, as it is divided into the categories seen above.
unique(df$Smoke)
## [1] "No" "Yes"
Smoke: is also categorical, with the two categories seen above.
unique(df$Amount.Weekends)
## [1] "N/A" "12" "6" "8" "15" "5" "20" "25" "40" "4" "30" "10"
## [13] "7" "9" "2" "50" "16" "35" "18" "1" "0" "3" "998" "60"
## [25] "24" "45"
Amount.Weekends: is numerical and discrete. A person smokes zero or more cigarettes, and the count is always a whole number.
unique(df$Amount.Weekdays)
## [1] "N/A" "12" "6" "8" "2" "20" "15" "25" "4" "10" "0" "30"
## [13] "3" "7" "40" "9" "5" "50" "18" "35" "1" "998" "55" "16"
## [25] "24" "45"
Amount.Weekdays: is numerical and discrete. A person smokes zero or more cigarettes, and the count is always a whole number.
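The unique() output shows that both amount columns are stored as character strings, with "N/A" for respondents who gave no count. To treat them as numbers they would first need converting. A minimal sketch, using a made-up vector (`amount_raw`) in the same form as those columns:

```r
# Hypothetical values in the same form as the Amount.* columns.
amount_raw <- c("N/A", "12", "6", "0", "20")

# Mark "N/A" as missing first, then convert the remaining strings to integers.
amount_raw[amount_raw == "N/A"] <- NA
amount <- as.integer(amount_raw)

# Average over the respondents who gave a count.
print(mean(amount, na.rm = TRUE))
```

The value 998 in the output looks like it may be a survey code rather than a real count, but the data do not say, so this sketch leaves such values untouched.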
unique(df$Type)
## [1] "N/A" "Packets"
## [3] "Hand-Rolled" "Both/Mainly Packets"
## [5] "Both/Mainly Hand-Rolled"
Type: is categorical and not ordinal.
unique(df$Sex)
## [1] "Male" "Female"
Sex: is categorical and not ordinal.
unique(df$Age)
## [1] 38 42 40 39 37 53 44 41 72 49 29 79 25 27 30 47 69 55 34 36 56 71 58 83 73
## [26] 31 26 57 22 78 74 85 75 80 33 81 76 59 54 28 89 64 61 20 82 23 67 43 18 63
## [51] 50 66 62 17 68 65 35 52 60 16 24 32 48 91 70 87 21 77 46 51 84 45 19 90 86
## [76] 88 93 95 97
Age: is numerical and discrete, as it is recorded in whole years.
160 children between the ages of 5 and 15
Before answering, it is important to understand the question the experiment asks: will the children cheat if they are explicitly told not to? Key pieces of information needed to decide whether the results can be generalized are missing. We would need to know how the sampling was done: was randomization used, and if so, was it simple, stratified, or some other design? Other influences may also affect whether a child cheats, such as upbringing, family, and cultural, ethical and religious background. Generalizing the results would require more in-depth research than the four data elements collected in this study. So I would say that no, we cannot generalize the results. In some ways a causal relationship can be established from the research: we can create a variable called “InstructedNotToCheat” with categories “Yes” and “No” as the explanatory variable, with whether the child cheated as the response variable, so a relationship could be formed between the two.
Based on this article, we can conclude that smoking may cause dementia later in life, as the data suggest that smokers are at a higher risk of the disease, 25% more than non-smokers. Also, as the quantity smoked increases from less than a pack to one to two packs a day, the risk is 44% higher than for non-smokers, and it rises to 50% for those who smoke more than two packs. The evidence is fairly strong. Other factors such as weather, geography and genetics may have contributed to dementia in combination with smoking, but the numbers are still high.
I don’t believe the statement is justified. Although the children with behavioral concerns were twice as likely to have sleep disorders, that does not necessarily mean that sleep disorders cause the behavioral issues. It could be the other way around, with bullying leading to sleep disorders. The article indicates that this was an observational study, and a causal relationship cannot be established from an observational study; the researchers would have to conduct an experiment to determine that. So I do not think it is justifiable to say that sleep disorders lead to bullying.
This is an experimental study, as the researchers conduct an experiment: they use stratified random sampling and then assign members to a treatment group or a control group.
The treatment group is the group that is told to exercise twice a week, and the control group is the one that is instructed not to exercise.
The study does make use of blocking, as half of the patients from each stratum are assigned to the treatment group and the other half to the control group. The blocking variable is age range.
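The assignment described above, where half of each age group goes to treatment and half to control, can be sketched in R. This is only an illustration under assumed inputs: the `patients` data frame, the age-group labels, and the group sizes are all made up and do not come from the study.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical patients: 4 in each of three made-up age groups.
patients <- data.frame(
  id        = 1:12,
  age_group = rep(c("18-30", "31-45", "46-55"), each = 4)
)

# Within each age group (block), shuffle an equal number of
# "treatment" and "control" labels and hand them out.
patients$group <- NA_character_
for (block in unique(patients$age_group)) {
  rows   <- which(patients$age_group == block)
  labels <- rep(c("treatment", "control"), length.out = length(rows))
  patients$group[rows] <- sample(labels)
}

# Each age group should show an even split between the two arms.
assignment_table <- table(patients$age_group, patients$group)
print(assignment_table)
```

Randomizing separately inside each block is what keeps the age distribution balanced between the two groups, which a single shuffle over all patients would not guarantee.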
The study does not make use of blinding as each person knows which group they are in and so do the researchers.
Yes, a causal relationship can be established between exercise and mental health, since exercise is the explanatory variable and mental health is the response variable. As for generalizing to a larger population, the stratified sampling and the use of blocking should make the results more accurate. However, the sample size and the number of individuals in each group are unknown. Other data may also need to be collected, such as income level, eating habits, sleep patterns and the weather conditions where participants live, as I believe these can also affect mental health, not exercise alone. So personally I would not generalize the results to a larger population before doing a deeper analysis of the subject.
I would say this is a very interesting study and should get funding, as mental health is a very important issue in society at large. If this study can determine a causal relationship between exercise and mental health, it will give people a way to improve their mental health. However, I would also collect other data from the population, such as income level, education level, eating habits, sleep patterns and geographical location.