STA 308: LAB FIELDWORK FOR SURVEY METHODS & SAMPLING THEORY
Survey data is the data collected from a sample of respondents who took a survey. It is information gathered from a target audience about a specific topic for research purposes. Many methods are used for survey data collection and statistical analysis.
Various channels are used to collect feedback and opinions from the desired sample of individuals. When conducting survey research, researchers often combine several channels, such as online surveys, telephone surveys, and face-to-face surveys.
However, the medium used to collect survey data determines which people can be reached, and therefore how the required number of survey responses is obtained.
Factors such as how the interviewer contacts the respondents (online or offline) and how information is communicated to them determine the quality of the data gathered.
SURVEY DATA COLLECTION METHODS
The methods used to collect survey data have evolved with the change in technology. From face-to-face surveys, telephonic surveys to now online and email surveys, the world of survey data collection has changed with time. Each survey data collection method has its pros and cons, and every researcher has a preference for gathering accurate information from the target sample.
The survey response rates for each of these data collection methods will differ as their reach and impact are always different. Different ways are chosen according to specific target population characteristics and the intent to examine human nature under various situations.
There are four main survey data collection methods: Online Surveys, Face-to-Face Surveys, Telephone Surveys, and Paper Surveys.
ONLINE SURVEYS
Online surveys are usually the most cost-effective method and can reach more people than the other media. When many questions must be put to the target sample, some researchers prefer online surveys over traditional face-to-face or telephone surveys.
Online surveys can use skip logic and branching so that respondents see only the questions relevant to them, which can improve the accuracy of the data collected compared with traditional means of surveying. They are straightforward to implement and take little of the respondents' time.
The investment required for survey data collection using online surveys is also negligible in comparison to the other methods. The results are collected in real-time for researchers to analyze and decide corrective measures. A very good example of an online survey is a hotel chain using an online survey to collect guest satisfaction metrics after a stay or an event at the property.
FACE-TO-FACE SURVEYS
Gathering information face to face is often more effective than the other media because respondents tend to trust the surveyors and give honest, clear feedback about the subject at hand. Researchers can easily tell whether respondents are uncomfortable with the questions asked, which makes this method especially useful when sensitive topics are involved.
The face-to-face method costs more than the other methods. Depending on the geographic or psychographic segmentation, interviewers must be trained to gather accurate information.
For example, a job evaluation survey is conducted in person between an HR representative or a manager and the employee. This kind of survey works best face to face because it allows the most accurate information to be collected.
TELEPHONE SURVEYS
Telephone surveys require much less investment than face-to-face surveys. Depending on the required reach, telephone surveys cost as much as or a little more than online surveys. Contacting respondents by telephone requires less effort and manpower than face-to-face surveying. If interviewers are located in the same place, they can cross-check their questions to ensure that error-free questions are put to the target audience.
The main drawback of telephone surveys is that building rapport with the respondent is harder without personal contact. Respondents are also likely to prefer to remain anonymous over the phone, because they cannot easily verify who the researcher is.
For example, if a retail giant would like to understand purchasing decisions, it can conduct a telephone survey on motivation and buying experience to collect data about the entire purchasing experience.
PAPER SURVEYS
The other commonly used survey method is the paper survey. Paper surveys can be used where laptops, computers, and tablets cannot go, and they rely on the age-old method of data collection: pen and paper. This method helps collect survey data in field research and can increase both the number of responses collected and their validity.
A popular use case of a paper survey is a fast-food restaurant survey, where the chain would like to collect feedback on the dining experience of its patrons.
TYPES OF SURVEY DATA BASED ON THE FREQUENCY AT WHICH THEY ARE ADMINISTERED
Surveys can be divided into three distinct types on the basis of how often they are administered. They are:
Cross-Sectional Surveys
Cross-sectional surveys are an observational research method that analyzes data of variables collected at one given point of time across a sample population or a pre-defined subset. The survey data from this method helps the researcher understand what the respondent is feeling at a certain point in time.
It helps measure opinions in a particular situation. For example, if the researcher would like to understand movie rental habits, a survey can be conducted across demographics and geographical locations.
The cross-sectional survey, for example, can help understand that males between 21-28 rent action movies and females between 35-45 rent romantic comedies.
Longitudinal Surveys
Longitudinal surveys are those surveys that help researchers to make an observation and collect data over an extended period of time. This survey data can be qualitative or quantitative in nature, and the survey creator does not interfere with the survey respondents.
For example, a longitudinal study can be carried out over several years to help understand whether mine workers are more prone to lung diseases. Such a study follows the same workers over time and accounts for any pre-existing conditions.
Retrospective Surveys
In retrospective surveys, researchers ask respondents to report events from the past. This survey method offers in-depth survey data but does not take as long to complete as a longitudinal survey. By deploying this kind of survey, researchers can gather data based on the past experiences and beliefs of people.
For example, if hikers are asked about a certain hike – the conditions of the hiking trail, ease of hike, weather conditions, trekking conditions, etc. after they have completed the trek, it is a retrospective study.
### SURVEY DATA ANALYSIS USING R LANGUAGE ###
After the survey data has been collected, this data has to be analyzed to ensure it aids towards the end research objective. There are different ways of conducting this research and some steps to follow. There are four main steps of survey data analysis:
- Understand the most popular survey research questions: The survey questions should align with the overall purpose of the survey. That is when the collected data will be effective in helping researchers. For example, if a seminar has been conducted, the researchers will send out a post-seminar feedback survey. The primary goal of this survey will be to understand whether the attendees are interested in attending future seminars. The question will be: “How likely are you to attend future seminars?” Data collected for this question will indicate the likelihood of success of future seminars.
- Filter obtained results using the cross-tabulation technique: Understand the various categories in the target audience and their thoughts using cross-tabulation format. For example, if there are business owners, administrators, students, etc. who attend the seminar, the data about whether they would prefer attending future seminars or not can be represented using cross-tabulation.
- Evaluate the derived numbers: Analyzing the gathered information is critical. How many of the attendees are of the opinion that they will be attending future seminars and how many will not – these facts need to be evaluated according to the results obtained from the sample.
- Draw conclusions: Weave a story with the collected and analyzed data. What was the intention of the survey research, and how does the survey data serve that objective? Understand that and develop accurate, conclusive results.
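The cross-tabulation and evaluation steps above can be sketched in base R. This is a minimal illustration only: the `responses` data frame and its values are hypothetical and are not taken from any real seminar survey.

```r
# Hypothetical post-seminar responses (illustrative values only)
responses <- data.frame(
  role = c("Business owner", "Student", "Administrator", "Student",
           "Business owner", "Student", "Administrator", "Business owner"),
  likely_to_attend = c("Likely", "Likely", "Unlikely", "Likely",
                       "Unlikely", "Unlikely", "Likely", "Likely")
)

# Step 2: cross-tabulate attendee role against the answer given
attend_tab <- table(responses$role, responses$likely_to_attend,
                    dnn = c("Role", "Future attendance"))
print(attend_tab)

# Step 3: evaluate the numbers as row proportions (share within each role)
attend_share <- prop.table(attend_tab, margin = 1)
print(round(attend_share, 2))
```

Row proportions answer the question "within each type of attendee, what share is likely to return?", which is usually what the researcher wants to compare across groups.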
### SURVEY DATA ANALYSIS METHODS ###
A survey is pointless if the resulting data cannot be accessed and no conclusions can be drawn from it. When you conduct a survey, it is imperative to have access to its analytics. Data from traditional methods such as pen and paper is hard to analyze and requires additional manpower. Analysis becomes much easier when data is collected online, for example with market research or customer survey software, and then analyzed with a statistical programming language such as R or Python.
Statistical analysis can be conducted on the survey data to make sense of the data that has been collected. There are multiple data analysis methods of quantitative data. Some of the commonly used types are:
Cross-tabulation analysis
Trend analysis
MaxDiff analysis
Conjoint analysis
Total Unduplicated Reach and Frequency (TURF) analysis
Gap analysis
SWOT analysis: SWOT analysis, another widely used method, organizes survey data into the strengths, weaknesses, opportunities, and threats of an organization, product, or service, giving a holistic picture of the competition. This method helps to create effective business strategies.
Text analysis: Text analysis is an advanced statistical method where intelligent tools make sense of and quantify or fashion qualitative and open-ended data into easily understandable data. This method is used when the survey data is unstructured.
APPLICATION OF R PROGRAMMING LANGUAGE IN SURVEY DATA ANALYSIS
We want to describe and implement functions (commands) that aid the exploration of survey data via simple tabulations of respondent counts and proportions, including the ability to specify:
either a frequency count or a row/column/joint/total table proportion;
multiple row and column variables; and
all margins, grand margins only, or no margins, plus retention of the data in a format that is amenable to further analysis in R.
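As a rough sketch of the capabilities listed above, base R already covers counts, the three kinds of proportion, margins, and conversion to a data frame. The `toy` data below is hypothetical and only serves to show the function calls.

```r
# Small illustrative data set (hypothetical values)
toy <- data.frame(
  Gender = c("Female", "Male", "Female", "Male", "Female", "Female"),
  Health = c("Poor", "Fair", "Fair", "Poor", "Fair", "Okay")
)

# Frequency counts
counts <- table(toy$Gender, toy$Health, dnn = c("Gender", "Health"))

# Proportions
row_prop   <- prop.table(counts, margin = 1)  # each row sums to 1
col_prop   <- prop.table(counts, margin = 2)  # each column sums to 1
joint_prop <- prop.table(counts)              # whole table sums to 1

# Counts with row and column totals added
with_margins <- addmargins(counts)

# Keep the counts as a data frame for further analysis
counts_df <- as.data.frame(counts)

print(with_margins)
print(counts_df)
```

For more than one row or column variable, `ftable()` applied to a multi-way `table()` gives a flat layout; the packages used later in this lab wrap similar steps in a single call.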
NOW, LET’S START THE CALCULATIONS
PRACTICAL ILLUSTRATION
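We will simulate a data set to stand in for real-life survey data. The values are drawn at random, so any patterns in the tables below come from the simulation and not from real respondents. This data set will serve as our case study.
Install the required packages as follows: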
install.packages("tidyverse")
## Warning: package 'tidyverse' is in use and will not be installed
install.packages("stringr")
## Warning: package 'stringr' is in use and will not be installed
install.packages("gmodels")
## Warning: package 'gmodels' is in use and will not be installed
#install.packages("ggplots")
install.packages("descr")
## Warning: package 'descr' is in use and will not be installed
After installation, load the packages one at a time with library(). Both gmodels and descr provide a function named CrossTable; because descr is loaded last, its version is the one used in the examples below:
library("tidyverse")
library("stringr")
library("gmodels")
#library("ggplots")
library("descr")
EXAMPLE 1:
TO CREATE SOME DEMOGRAPHIC DATA SETS
Run the following R code and note what you observe:
ID = seq_len(3000)
set.seed(234)
Age = sample(c("0 - 5", "6 - 14", "15 - 24", "25 - 50", "51 - 64", "65+"), 3000, replace = TRUE)
#View(Age)
set.seed(234)
Gender = sample(c("Male", "Female"), 3000, replace = TRUE)
#View(Gender)
set.seed(234)
Country = sample(c("Nigeria", "Ghana", "South Africa", "Botswana", "United Kingdom", "Austria"), 3000, replace = TRUE)
#View(Country)
set.seed(234)
Health_Status = sample(c("Poor", "Fair", "Okay"), 3000, replace = TRUE)
#View(Health_Status)
Survey = data.frame(Age, Gender, Country, Health_Status)
#View(Survey)
head(Survey) ## Recall that head() shows the first 6 rows of the generated data set
## Age Gender Country Health_Status
## 1 0 - 5 Male Nigeria Poor
## 2 6 - 14 Male Ghana Okay
## 3 65+ Female Austria Fair
## 4 6 - 14 Female Ghana Fair
## 5 6 - 14 Female Ghana Fair
## 6 51 - 64 Female United Kingdom Fair
#Look at this code:
head(Survey, 15) # This shows the first 15 rows of Survey
## Age Gender Country Health_Status
## 1 0 - 5 Male Nigeria Poor
## 2 6 - 14 Male Ghana Okay
## 3 65+ Female Austria Fair
## 4 6 - 14 Female Ghana Fair
## 5 6 - 14 Female Ghana Fair
## 6 51 - 64 Female United Kingdom Fair
## 7 0 - 5 Male Nigeria Poor
## 8 25 - 50 Female Botswana Poor
## 9 25 - 50 Male Botswana Okay
## 10 65+ Female Austria Fair
## 11 65+ Female Austria Okay
## 12 15 - 24 Male South Africa Okay
## 13 6 - 14 Female Ghana Fair
## 14 6 - 14 Male Ghana Okay
## 15 51 - 64 Male United Kingdom Fair
#What about the command below:
tail(Survey) # This shows the last 6 rows of the generated data set
## Age Gender Country Health_Status
## 2995 15 - 24 Male South Africa Okay
## 2996 15 - 24 Male South Africa Poor
## 2997 51 - 64 Female United Kingdom Fair
## 2998 6 - 14 Female Ghana Poor
## 2999 0 - 5 Female Nigeria Okay
## 3000 25 - 50 Female Botswana Poor
tail(Survey, 20) # This shows the last 20 rows of the generated data set
## Age Gender Country Health_Status
## 2981 0 - 5 Female Nigeria Fair
## 2982 6 - 14 Female Ghana Okay
## 2983 15 - 24 Female South Africa Poor
## 2984 6 - 14 Male Ghana Poor
## 2985 65+ Male Austria Poor
## 2986 0 - 5 Female Nigeria Fair
## 2987 51 - 64 Female United Kingdom Fair
## 2988 51 - 64 Female United Kingdom Poor
## 2989 6 - 14 Male Ghana Okay
## 2990 6 - 14 Female Ghana Okay
## 2991 51 - 64 Male United Kingdom Okay
## 2992 15 - 24 Male South Africa Poor
## 2993 25 - 50 Female Botswana Okay
## 2994 0 - 5 Female Nigeria Okay
## 2995 15 - 24 Male South Africa Okay
## 2996 15 - 24 Male South Africa Poor
## 2997 51 - 64 Female United Kingdom Fair
## 2998 6 - 14 Female Ghana Poor
## 2999 0 - 5 Female Nigeria Okay
## 3000 25 - 50 Female Botswana Poor
### Now, let's start working with our generated "Survey" data set ###
result_1 = CrossTable(Survey$Age, Survey$Gender, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Age", "Gender"))
print(result_1)
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
## =================================
## Gender
## Age Female Male Total
## ---------------------------------
## 0 - 5 253 227 480
## 246.7 233.3
## 0.160 0.169
## 0.527 0.473 0.160
## 0.164 0.156
## 0.084 0.076
## ---------------------------------
## 15 - 24 246 259 505
## 259.6 245.4
## 0.709 0.750
## 0.487 0.513 0.168
## 0.160 0.178
## 0.082 0.086
## ---------------------------------
## 25 - 50 235 263 498
## 256.0 242.0
## 1.718 1.817
## 0.472 0.528 0.166
## 0.152 0.180
## 0.078 0.088
## ---------------------------------
## 51 - 64 269 228 497
## 255.5 241.5
## 0.718 0.759
## 0.541 0.459 0.166
## 0.174 0.156
## 0.090 0.076
## ---------------------------------
## 6 - 14 268 232 500
## 257.0 243.0
## 0.471 0.498
## 0.536 0.464 0.167
## 0.174 0.159
## 0.089 0.077
## ---------------------------------
## 65+ 271 249 520
## 267.3 252.7
## 0.052 0.055
## 0.521 0.479 0.173
## 0.176 0.171
## 0.090 0.083
## ---------------------------------
## Total 1542 1458 3000
## 0.514 0.486
## =================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 7.876522 d.f. = 5 p = 0.163
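To see where the "Expected N" and "Chi-square contribution" rows come from, the sketch below recomputes them by hand in base R. The observed counts are copied from the Age by Gender table above; the variable names are illustrative.

```r
# Observed counts copied from the Age by Gender table above
observed <- matrix(
  c(253, 227,
    246, 259,
    235, 263,
    269, 228,
    268, 232,
    271, 249),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Age = c("0 - 5", "15 - 24", "25 - 50", "51 - 64", "6 - 14", "65+"),
    Gender = c("Female", "Male")
  )
)

n <- sum(observed)

# Expected count = (row total * column total) / table total
expected <- outer(rowSums(observed), colSums(observed)) / n

# Each cell's contribution to the chi-square statistic
contribution <- (observed - expected)^2 / expected

chi_sq  <- sum(contribution)
dof     <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)

print(round(expected, 1))
print(round(contribution, 3))
cat("Chi^2 =", chi_sq, " d.f. =", dof, " p =", round(p_value, 3), "\n")
```

For the first cell, the expected count is 480 × 1542 / 3000 = 246.72 and the contribution is (253 − 246.72)² / 246.72 ≈ 0.160, which agrees with the printed table. With p above 0.05, there is no evidence of an association between Age and Gender in this simulated data.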
To print the same table in SPSS format, showing only counts and column percentages, use the command below:
result_1B = CrossTable(Survey$Age, Survey$Gender,prop.r=FALSE,prop.t=FALSE,prop.chisq=FALSE,format="SPSS")
print(result_1B)
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ====================================
## Survey$Gender
## Survey$Age Female Male Total
## ------------------------------------
## 0 - 5 253 227 480
## 16.4% 15.6%
## ------------------------------------
## 15 - 24 246 259 505
## 16.0% 17.8%
## ------------------------------------
## 25 - 50 235 263 498
## 15.2% 18.0%
## ------------------------------------
## 51 - 64 269 228 497
## 17.4% 15.6%
## ------------------------------------
## 6 - 14 268 232 500
## 17.4% 15.9%
## ------------------------------------
## 65+ 271 249 520
## 17.6% 17.1%
## ------------------------------------
## Total 1542 1458 3000
## 51.4% 48.6%
## ====================================
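The column percentages in the SPSS-format table can be checked by hand. The sketch below uses base R with the counts copied from the table above; it is an illustration of the arithmetic, not the code that CrossTable runs internally.

```r
# Counts copied from the Age by Gender table above (columns filled in order)
counts <- matrix(
  c(253, 246, 235, 269, 268, 271,   # Female
    227, 259, 263, 228, 232, 249),  # Male
  ncol = 2,
  dimnames = list(
    Age = c("0 - 5", "15 - 24", "25 - 50", "51 - 64", "6 - 14", "65+"),
    Gender = c("Female", "Male")
  )
)

# Column percent: each count divided by its column total
col_pct <- round(100 * prop.table(counts, margin = 2), 1)

# Bottom row of the table: each column total as a share of all respondents
total_pct <- round(100 * colSums(counts) / sum(counts), 1)

print(col_pct)
print(total_pct)
```

For instance, 253 of the 1542 females are in the "0 - 5" group, and 253 / 1542 ≈ 16.4%, as printed above.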
Now, let’s consider Age and Health Status:
result_2 = CrossTable(Survey$Age, Survey$Health_Status, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Age", "Health_Status"))
print(result_2)
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
## ========================================
## Health_Status
## Age Fair Okay Poor Total
## ----------------------------------------
## 0 - 5 145 181 154 480
## 163.2 160.3 156.5
## 2.030 2.668 0.039
## 0.302 0.377 0.321 0.160
## 0.142 0.181 0.157
## 0.048 0.060 0.051
## ----------------------------------------
## 15 - 24 166 179 160 505
## 171.7 168.7 164.6
## 0.189 0.633 0.130
## 0.329 0.354 0.317 0.168
## 0.163 0.179 0.164
## 0.055 0.060 0.053
## ----------------------------------------
## 25 - 50 177 153 168 498
## 169.3 166.3 162.3
## 0.348 1.069 0.197
## 0.355 0.307 0.337 0.166
## 0.174 0.153 0.172
## 0.059 0.051 0.056
## ----------------------------------------
## 51 - 64 164 153 180 497
## 169.0 166.0 162.0
## 0.147 1.018 1.995
## 0.330 0.308 0.362 0.166
## 0.161 0.153 0.184
## 0.055 0.051 0.060
## ----------------------------------------
## 6 - 14 179 176 145 500
## 170.0 167.0 163.0
## 0.476 0.485 1.988
## 0.358 0.352 0.290 0.167
## 0.175 0.176 0.148
## 0.060 0.059 0.048
## ----------------------------------------
## 65+ 189 160 171 520
## 176.8 173.7 169.5
## 0.842 1.078 0.013
## 0.363 0.308 0.329 0.173
## 0.185 0.160 0.175
## 0.063 0.053 0.057
## ----------------------------------------
## Total 1020 1002 978 3000
## 0.340 0.334 0.326
## ========================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 15.34322 d.f. = 10 p = 0.12
The same table in SPSS format:
result_2B = CrossTable(Survey$Age, Survey$Health_Status,prop.r=FALSE,prop.t=FALSE,prop.chisq=FALSE,format="SPSS")
print(result_2B)
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ===========================================
## Survey$Health_Status
## Survey$Age Fair Okay Poor Total
## -------------------------------------------
## 0 - 5 145 181 154 480
## 14.2% 18.1% 15.7%
## -------------------------------------------
## 15 - 24 166 179 160 505
## 16.3% 17.9% 16.4%
## -------------------------------------------
## 25 - 50 177 153 168 498
## 17.4% 15.3% 17.2%
## -------------------------------------------
## 51 - 64 164 153 180 497
## 16.1% 15.3% 18.4%
## -------------------------------------------
## 6 - 14 179 176 145 500
## 17.5% 17.6% 14.8%
## -------------------------------------------
## 65+ 189 160 171 520
## 18.5% 16.0% 17.5%
## -------------------------------------------
## Total 1020 1002 978 3000
## 34.0% 33.4% 32.6%
## ===========================================
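Next, consider Age and Country. Because Age and Country were each drawn immediately after set.seed(234) and both have six categories, every age group maps to exactly one country. The table is therefore diagonal, and the very large chi-square statistic reflects how the data was simulated, not a real association: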
result_3 = CrossTable(Survey$Age, Survey$Country, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Age", "Country"))
print(result_3)
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
## =====================================================================================
## Country
## Age Austria Botswana Ghana Nigeria Sth Afrc Untd Kng Total
## -------------------------------------------------------------------------------------
## 0 - 5 0 0 0 480 0 0 480
## 83.2 79.7 80.0 76.8 80.8 79.5
## 83.200 79.680 80.000 2116.800 80.800 79.520
## 0.000 0.000 0.000 1.000 0.000 0.000 0.160
## 0 0 0 1 0 0
## 0.000 0.000 0.000 0.160 0.000 0.000
## -------------------------------------------------------------------------------------
## 15 - 24 0 0 0 0 505 0 505
## 87.5 83.8 84.2 80.8 85.0 83.7
## 87.533 83.830 84.167 80.800 2075.008 83.662
## 0.000 0.000 0.000 0.000 1.000 0.000 0.168
## 0 0 0 0 1 0
## 0.000 0.000 0.000 0.000 0.168 0.000
## -------------------------------------------------------------------------------------
## 25 - 50 0 498 0 0 0 0 498
## 86.3 82.7 83.0 79.7 83.8 82.5
## 86.320 2086.668 83.000 79.680 83.830 82.502
## 0.000 1.000 0.000 0.000 0.000 0.000 0.166
## 0 1 0 0 0 0
## 0.000 0.166 0.000 0.000 0.000 0.000
## -------------------------------------------------------------------------------------
## 51 - 64 0 0 0 0 0 497 497
## 86.1 82.5 82.8 79.5 83.7 82.3
## 86.147 82.502 82.833 79.520 83.662 2088.336
## 0.000 0.000 0.000 0.000 0.000 1.000 0.166
## 0 0 0 0 0 1
## 0.000 0.000 0.000 0.000 0.000 0.166
## -------------------------------------------------------------------------------------
## 6 - 14 0 0 500 0 0 0 500
## 86.7 83.0 83.3 80.0 84.2 82.8
## 86.667 83.000 2083.333 80.000 84.167 82.833
## 0.000 0.000 1.000 0.000 0.000 0.000 0.167
## 0 0 1 0 0 0
## 0.000 0.000 0.167 0.000 0.000 0.000
## -------------------------------------------------------------------------------------
## 65+ 520 0 0 0 0 0 520
## 90.1 86.3 86.7 83.2 87.5 86.1
## 2050.133 86.320 86.667 83.200 87.533 86.147
## 1.000 0.000 0.000 0.000 0.000 0.000 0.173
## 1 0 0 0 0 0
## 0.173 0.000 0.000 0.000 0.000 0.000
## -------------------------------------------------------------------------------------
## Total 520 498 500 480 505 497 3000
## 0.173 0.166 0.167 0.160 0.168 0.166
## =====================================================================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 15000 d.f. = 25 p <2e-16
The same table in SPSS format:
result_3B = CrossTable(Survey$Age, Survey$Country, prop.r=FALSE,prop.t=FALSE,prop.chisq=FALSE,format="SPSS")
print(result_3B)
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ==============================================================================
## Survey$Country
## Survy$Ag Austria Botswana Ghana Nigeria Sth Afrc Untd Kng Total
## ------------------------------------------------------------------------------
## 0 - 5 0 0 0 480 0 0 480
## 0% 0% 0% 100% 0% 0%
## ------------------------------------------------------------------------------
## 15 - 24 0 0 0 0 505 0 505
## 0% 0% 0% 0% 100% 0%
## ------------------------------------------------------------------------------
## 25 - 50 0 498 0 0 0 0 498
## 0% 100% 0% 0% 0% 0%
## ------------------------------------------------------------------------------
## 51 - 64 0 0 0 0 0 497 497
## 0% 0% 0% 0% 0% 100%
## ------------------------------------------------------------------------------
## 6 - 14 0 0 500 0 0 0 500
## 0% 0% 100% 0% 0% 0%
## ------------------------------------------------------------------------------
## 65+ 520 0 0 0 0 0 520
## 100% 0% 0% 0% 0% 0%
## ------------------------------------------------------------------------------
## Total 520 498 500 480 505 497 3000
## 17.3% 16.6% 16.7% 16.0% 16.8% 16.6%
## ==============================================================================
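The Age by Country table above has exactly one non-zero cell per row. The sketch below shows the likely reason: resetting the seed to the same value before each `sample()` call makes two six-category variables pick the same positions. It also shows, as one possible alternative rather than the lab's method, what happens when the seed is set only once.

```r
ages      <- c("0 - 5", "6 - 14", "15 - 24", "25 - 50", "51 - 64", "65+")
countries <- c("Nigeria", "Ghana", "South Africa", "Botswana",
               "United Kingdom", "Austria")

# Same seed before each call: both draws use the same random positions
set.seed(234)
age_same <- sample(ages, 3000, replace = TRUE)
set.seed(234)
country_same <- sample(countries, 3000, replace = TRUE)

tab_same <- table(age_same, country_same)
# TRUE when every age group falls in exactly one country
one_to_one <- all(rowSums(tab_same > 0) == 1)
chi_same <- unname(chisq.test(tab_same, correct = FALSE)$statistic)

# Seed set once: the second draw continues the random stream
set.seed(234)
age_indep     <- sample(ages, 3000, replace = TRUE)
country_indep <- sample(countries, 3000, replace = TRUE)
tab_indep <- table(age_indep, country_indep)
chi_indep <- unname(chisq.test(tab_indep, correct = FALSE)$statistic)

cat("Same seed: one-to-one =", one_to_one, ", Chi^2 =", chi_same, "\n")
cat("Seed set once: Chi^2 =", chi_indep, "\n")
```

For a perfectly diagonal 6 × 6 table with N = 3000, the statistic equals N × (6 − 1) = 15000, which is the value reported above. Setting the seed once keeps the run reproducible while letting the variables vary independently.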
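Now consider Gender and Country. Since Country mirrors Age in this simulated data, the counts and the chi-square statistic match those of the Age and Gender table, with rows and columns swapped: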
result_4 = CrossTable(Survey$Gender, Survey$Country, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Gender", "Country"))
print(result_4)
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
## ================================================================================
## Country
## Gender Austria Botswana Ghana Nigeria South Afrc Untd Kngdm Total
## --------------------------------------------------------------------------------
## Female 271 235 268 253 246 269 1542
## 267.3 256.0 257.0 246.7 259.6 255.5
## 0.052 1.718 0.471 0.160 0.709 0.718
## 0.176 0.152 0.174 0.164 0.160 0.174 0.514
## 0.521 0.472 0.536 0.527 0.487 0.541
## 0.090 0.078 0.089 0.084 0.082 0.090
## --------------------------------------------------------------------------------
## Male 249 263 232 227 259 228 1458
## 252.7 242.0 243.0 233.3 245.4 241.5
## 0.055 1.817 0.498 0.169 0.750 0.759
## 0.171 0.180 0.159 0.156 0.178 0.156 0.486
## 0.479 0.528 0.464 0.473 0.513 0.459
## 0.083 0.088 0.077 0.076 0.086 0.076
## --------------------------------------------------------------------------------
## Total 520 498 500 480 505 497 3000
## 0.173 0.166 0.167 0.160 0.168 0.166
## ================================================================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 7.876522 d.f. = 5 p = 0.163
The same table can be printed in SPSS format with the following code:
result_4B = CrossTable(Survey$Gender, Survey$Country, prop.r=FALSE,prop.t=FALSE,prop.chisq=FALSE,format="SPSS")
print(result_4B)
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ==============================================================================
## Survey$Country
## Srvy$Gnd Austria Botswana Ghana Nigeria Sth Afrc Untd Kng Total
## ------------------------------------------------------------------------------
## Female 271 235 268 253 246 269 1542
## 52.1% 47.2% 53.6% 52.7% 48.7% 54.1%
## ------------------------------------------------------------------------------
## Male 249 263 232 227 259 228 1458
## 47.9% 52.8% 46.4% 47.3% 51.3% 45.9%
## ------------------------------------------------------------------------------
## Total 520 498 500 480 505 497 3000
## 17.3% 16.6% 16.7% 16.0% 16.8% 16.6%
## ==============================================================================
Next, consider Gender and Health Status:
result_5 = CrossTable(Survey$Gender, Survey$Health_Status, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Gender", "Health_Status"))
print(result_5)
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
## =======================================
## Health_Status
## Gender Fair Okay Poor Total
## ---------------------------------------
## Female 538 493 511 1542
## 524.3 515.0 502.7
## 0.359 0.942 0.137
## 0.349 0.320 0.331 0.514
## 0.527 0.492 0.522
## 0.179 0.164 0.170
## ---------------------------------------
## Male 482 509 467 1458
## 495.7 487.0 475.3
## 0.380 0.996 0.145
## 0.331 0.349 0.320 0.486
## 0.473 0.508 0.478
## 0.161 0.170 0.156
## ---------------------------------------
## Total 1020 1002 978 3000
## 0.340 0.334 0.326
## =======================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 2.959869 d.f. = 2 p = 0.228
The same table in SPSS format:
result_5B = CrossTable(Survey$Gender, Survey$Health_Status,prop.r=FALSE,prop.t=FALSE,prop.chisq=FALSE,format="SPSS")
print(result_5B)
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## ==============================================
## Survey$Health_Status
## Survey$Gender Fair Okay Poor Total
## ----------------------------------------------
## Female 538 493 511 1542
## 52.7% 49.2% 52.2%
## ----------------------------------------------
## Male 482 509 467 1458
## 47.3% 50.8% 47.8%
## ----------------------------------------------
## Total 1020 1002 978 3000
## 34.0% 33.4% 32.6%
## ==============================================
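#### EXAMPLE 2: ####
This example uses esoph, a data set that ships with R, from a case-control study of oesophageal cancer. Each row is one combination of age group, alcohol consumption, and tobacco consumption, so a cross-tabulation of its factors counts combinations, not people.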
data(esoph, package = "datasets")
#View(esoph)
names(esoph)
## [1] "agegp" "alcgp" "tobgp" "ncases" "ncontrols"
result_6 = CrossTable(esoph$alcgp, esoph$agegp, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Alcohol consumption", "Age group"))
## Warning in chisq.test(tab, correct = FALSE, ...): Chi-squared approximation may
## be incorrect
print(result_6)
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
## ============================================================================
## Age group
## Alcohol consumption 25-34 35-44 45-54 55-64 65-74 75+ Total
## ----------------------------------------------------------------------------
## 0-39g/day 4 4 4 4 4 3 23
## 3.9 3.9 4.2 4.2 3.9 2.9
## 0.002 0.002 0.008 0.008 0.002 0.005
## 0.174 0.174 0.174 0.174 0.174 0.130 0.261
## 0.267 0.267 0.250 0.250 0.267 0.273
## 0.045 0.045 0.045 0.045 0.045 0.034
## ----------------------------------------------------------------------------
## 40-79 4 4 4 4 3 4 23
## 3.9 3.9 4.2 4.2 3.9 2.9
## 0.002 0.002 0.008 0.008 0.216 0.440
## 0.174 0.174 0.174 0.174 0.130 0.174 0.261
## 0.267 0.267 0.250 0.250 0.200 0.364
## 0.045 0.045 0.045 0.045 0.034 0.045
## ----------------------------------------------------------------------------
## 80-119 3 4 4 4 4 2 21
## 3.6 3.6 3.8 3.8 3.6 2.6
## 0.094 0.049 0.009 0.009 0.049 0.149
## 0.143 0.190 0.190 0.190 0.190 0.095 0.239
## 0.200 0.267 0.250 0.250 0.267 0.182
## 0.034 0.045 0.045 0.045 0.045 0.023
## ----------------------------------------------------------------------------
## 120+ 4 3 4 4 4 2 21
## 3.6 3.6 3.8 3.8 3.6 2.6
## 0.049 0.094 0.009 0.009 0.049 0.149
## 0.190 0.143 0.190 0.190 0.190 0.095 0.239
## 0.267 0.200 0.250 0.250 0.267 0.182
## 0.045 0.034 0.045 0.045 0.045 0.023
## ----------------------------------------------------------------------------
## Total 15 15 16 16 15 11 88
## 0.170 0.170 0.182 0.182 0.170 0.125
## ============================================================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 1.41891 d.f. = 15 p = 1
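The table above counts rows of esoph, and each row is a design cell, which is why the counts are nearly uniform, the expected counts fall below 5 (hence the warning), and p is 1. If the goal is to tabulate cancer cases, one option is to sum the `ncases` column within each cell using `xtabs()`. This is a sketch of that alternative, not part of the original lab output.

```r
data(esoph, package = "datasets")

# table() on the factors counts design cells (rows of the data frame)
cell_counts <- table(esoph$alcgp, esoph$agegp,
                     dnn = c("Alcohol consumption", "Age group"))

# xtabs() with a left-hand side sums that variable within each cell,
# giving the number of recorded cases per alcohol-by-age combination
case_counts <- xtabs(ncases ~ alcgp + agegp, data = esoph)

print(cell_counts)
print(case_counts)
```

The same formula with `ncontrols` on the left-hand side would tabulate the controls.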
#### EXAMPLE 3: ####
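set.seed(234)
# 900 females followed by 900 males
sex = factor(c(rep("F", 900), rep("M", 900)))
# Simulated income: normal with mean 500 and standard deviation 100
income = 100 * (rnorm(1800) + 5)
# Everyone gets weight 1, except females earning above 500, who get weight 3
weight = rep(1, 1800)
weight[sex == "F" & income > 500] = 3
#View(weight)
# Labels used by compmeans() in its printed title
attr(income, "label") = "Income"
attr(sex, "label") = "Sex"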
compmeans(income, sex, col = "lightgray", ylab = "income", xlab = "sex")
## Mean value of "Income" according to "Sex"
##           Mean    N Std. Dev.
## F     497.6180  900  96.95414
## M     503.7893  900  98.07035
## Total 500.7036 1800  97.53558
comp = compmeans(income, sex, weight, plot = FALSE)
plot(comp, col = c("orange", "lightblue"), ylab = "income", xlab = "sex")
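To make clear what the comparison of means involves, the sketch below rebuilds the group summaries with base R only. It repeats the simulation from Example 3 so that it runs on its own; the helper names `unweighted` and `weighted_mean` are illustrative, and the code is an approximation of the idea, not the internals of compmeans().

```r
set.seed(234)
sex    <- factor(c(rep("F", 900), rep("M", 900)))
income <- 100 * (rnorm(1800) + 5)
weight <- rep(1, 1800)
weight[sex == "F" & income > 500] <- 3

# Unweighted summary by group: mean, count, standard deviation
group_mean <- tapply(income, sex, mean)
unweighted <- data.frame(
  Mean = as.vector(group_mean),
  N    = as.vector(table(sex)),
  SD   = as.vector(tapply(income, sex, sd)),
  row.names = names(group_mean)
)

# Weighted mean by group: sum(w * x) / sum(w) within each level of sex
weighted_mean <- tapply(seq_along(income), sex, function(i) {
  weighted.mean(income[i], weight[i])
})

print(unweighted)
print(weighted_mean)
```

Because only females with income above 500 receive the larger weight, the weighted mean for F is pulled upward, while the weighted mean for M equals the unweighted one.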