STA 308: LAB FIELDWORK FOR SURVEY METHODS & SAMPLING THEORY

INTRODUCTION

SURVEY DATA

Survey data are the data collected from a sample of respondents who completed a survey: information gathered from a target audience about a specific topic for research purposes. Many methods exist for collecting survey data and analyzing them statistically.

Various channels are used to collect feedback and opinions from the desired sample of individuals. When conducting survey research, researchers often combine several channels, such as online, telephone, and face-to-face surveys.

However, the collection medium determines which people can be reached and, therefore, how many must be contacted to obtain the required number of responses.

Practical choices, such as how interviewers will contact respondents (online or offline) and how information is communicated to them, determine how useful the gathered data will be.

SURVEY DATA COLLECTION METHODS

The methods used to collect survey data have evolved with technology, from face-to-face and telephone surveys to today's online and email surveys. Each method has its pros and cons, and every researcher has preferences for gathering accurate information from the target sample.

Response rates differ between these methods because their reach and impact differ. The method is chosen according to the characteristics of the target population and the conditions under which the researcher wants to study respondents' behavior.

There are four main survey data collection methods:

Online Surveys
Face-to-Face Surveys
Telephone Surveys
Paper Surveys

ONLINE SURVEYS

Online surveys are usually the most cost-effective method and can reach more people than any of the other media. For questionnaires with many questions, some researchers therefore prefer online surveys to traditional face-to-face or telephone surveys.

Online surveys can also use computational logic and branching (skip logic), which routes each respondent only to the questions that apply to them and can make data collection more accurate than traditional means of surveying. They are straightforward to implement and take little of the respondents' time.
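A minimal sketch of how branching (skip logic) can work, assuming a simple routing table in which each row says "after question X, if the answer is Y, go to question Z". The question IDs, the `routing` table, and the `next_question()` helper are hypothetical and do not come from any survey platform's API.

```r
# Hypothetical routing table: after `question`, an answer of `answer`
# sends the respondent to `next_q`.
routing <- data.frame(
  question = c("Q1_attended", "Q1_attended"),
  answer   = c("Yes", "No"),
  next_q   = c("Q2_likely_to_return", "END"),
  stringsAsFactors = FALSE
)

# Return the next question for a respondent, given the current question and answer.
next_question <- function(question, answer, rules = routing) {
  hit <- rules$next_q[rules$question == question & rules$answer == answer]
  if (length(hit) == 0) "END" else hit[1]  # no matching rule: end the survey
}

next_question("Q1_attended", "Yes")  # branch into the follow-up question
next_question("Q1_attended", "No")   # follow-up question is skipped
```

Respondents who did not attend never see the follow-up question, which is the sense in which branching keeps irrelevant questions (and the noisy answers they attract) out of the data.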

The investment required is also small compared with the other methods, and results arrive in real time, so researchers can analyze them and decide on corrective measures. A good example is a hotel chain that uses an online survey to collect guest satisfaction metrics after a stay or an event at the property.

FACE-TO-FACE SURVEYS

Collecting information face to face is often more effective than the other media because respondents tend to trust the interviewer and give honest, clear feedback on the subject at hand. Interviewers can easily see when respondents are uncomfortable with a question, which is especially valuable when sensitive topics are discussed.

Face-to-face data collection costs more than the other methods, and interviewers must be trained for the geographic or psychographic segment they will survey in order to gather accurate information.

For example, a job evaluation survey is conducted in person between an HR representative or manager and the employee; the face-to-face format helps collect information that is as accurate as possible.

TELEPHONE SURVEYS

Telephone surveys require much less investment than face-to-face surveys; depending on the required reach, they cost about as much as, or a little more than, online surveys. Contacting respondents by telephone also requires less effort and manpower than face-to-face interviewing. If interviewers work from the same location, they can cross-check their questions to make sure the target audience is asked error-free questions.

The main drawback of telephone surveys is that building rapport with the respondent is harder, because the medium creates distance. Respondents are also more likely to want to remain anonymous over the phone, since they may doubt the researcher's credibility.

For example, if a retail giant would like to understand purchasing decisions, it can run a telephone survey on shopping motivation and buying experience to collect data about the entire purchasing experience.

PAPER SURVEYS

Paper surveys are the other commonly used method. They can be used where laptops, computers, and tablets cannot go, relying on the age-old method of data collection: pen and paper. This method is useful in field research and can help increase both the number of responses collected and their validity.

A popular use case for paper surveys is a fast-food restaurant survey, in which the chain collects feedback on the dining experience of its patrons (customers).

TYPES OF SURVEY DATA BASED ON THE FREQUENCY AT WHICH THEY ARE ADMINISTERED

Surveys can be divided into three distinct types based on how often they are administered:

Cross-Sectional Surveys

Cross-sectional surveys are an observational research method that analyzes data on variables collected at a single point in time across a sample population or a predefined subset. The resulting data help the researcher understand what respondents think or feel at that point in time.

It helps measure opinions in a particular situation. For example, to understand movie rental habits, a researcher can run a survey across demographic groups and geographic locations.

Such a cross-sectional survey might reveal, for example, that males aged 21-28 mostly rent action movies while females aged 35-45 mostly rent romantic comedies.

Longitudinal Surveys

Longitudinal surveys let researchers make observations and collect data over an extended period of time. The data can be qualitative or quantitative, and the survey creator does not interfere with the respondents.

For example, a longitudinal study carried out over several years can help determine whether mine workers are more prone to lung diseases; such a study should exclude any pre-existing conditions.

Retrospective Surveys

In retrospective surveys, researchers ask respondents to report events from the past. This method yields in-depth data but takes less time to complete than a longitudinal survey. With this kind of survey, researchers can gather data on people's past experiences and beliefs.

For example, asking hikers, after they have completed a trek, about the trail conditions, ease of the hike, weather, and trekking conditions is a retrospective study.

### SURVEY DATA ANALYSIS USING R LANGUAGE ###

Once the survey data have been collected, they must be analyzed so that they serve the research objective. Survey data analysis typically follows four main steps:

No. 1: Understand the main survey research questions: The survey questions should align with the overall purpose of the survey; only then will the collected data help the researchers. For example, after a seminar, the researchers may send out a post-seminar feedback survey whose primary goal is to find out whether attendees are interested in attending future seminars. The question would be: “How likely are you to attend future seminars?” The answers to this question indicate how likely future seminars are to succeed.

No. 2: Filter the results using cross-tabulation: Break the target audience into its categories and compare their responses in a cross-tabulation. For example, if business owners, administrators, and students attend the seminar, a cross-tabulation can show, for each group, whether its members would attend future seminars.

No. 3: Evaluate the derived numbers: Analyzing the gathered information is critical. How many attendees say they will attend future seminars, and how many will not? These figures need to be evaluated using the results obtained from the sample.

No. 4: Draw conclusions: Weave a story from the collected and analyzed data. What was the intention of the survey research, and how well do the survey data meet that objective? Answer these questions to develop accurate, conclusive results.
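Steps 2 and 3 can be sketched in base R with `table()` and `prop.table()`. The seminar responses below are invented for illustration; the column names `role` and `future` are assumptions, not part of any real data set.

```r
# Hypothetical post-seminar responses: attendee role and whether they
# would attend a future seminar (invented data).
attendees <- data.frame(
  role   = c("Business owner", "Student", "Administrator", "Student",
             "Business owner", "Administrator", "Student", "Student"),
  future = c("Yes", "Yes", "No", "Yes", "No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Step 2: cross-tabulate role against intention to attend again
tab <- table(Role = attendees$role, Future = attendees$future)
tab

# Step 3: evaluate the numbers as proportions within each role (rows sum to 1)
round(prop.table(tab, margin = 1), 2)

# Overall share of attendees who plan to return
mean(attendees$future == "Yes")  # 0.625
```

Row proportions answer "within each group, what share would return?", which is usually the comparison step 3 is after; `margin = 2` would instead give the make-up of the "Yes" and "No" groups.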

### SURVEY DATA ANALYSIS METHODS ###

Conducting a survey is pointless if you cannot access the resulting data and draw conclusions from them, so it is essential to have access to the survey's analytics. Data collected with traditional methods such as pen and paper are hard to analyze and require additional manpower. Analysis becomes much easier when the data are collected with an online survey platform (for example, market research or customer survey software) and analyzed with statistical programming languages such as R and Python.

Statistical analysis makes sense of the collected survey data. There are many methods for analyzing quantitative survey data; some commonly used ones are:

Cross-tabulation analysis
Trend analysis
MaxDiff analysis
Conjoint analysis
Total Unduplicated Reach and Frequency (TURF) analysis
Gap analysis
SWOT analysis: SWOT analysis, another widely used analytical framework, organizes survey data into the strengths, weaknesses, opportunities, and threats of an organization, product, or service, giving a holistic picture of the competition. This method helps create effective business strategies.
Text analysis: Text analysis is an advanced method in which software tools interpret and quantify qualitative, open-ended responses and turn them into easily understandable data. This method is used when the survey data are unstructured.
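TURF analysis, listed above, lends itself to a short sketch: for every combination of k items, reach is the share of respondents who like at least one item in the combination, and frequency is the average number of liked items per respondent. The flavor data and the `turf()` helper below are hypothetical, and an exhaustive search with `combn()` is only practical for a modest number of items.

```r
# Hypothetical 0/1 responses: 1 = respondent would buy that flavor.
likes <- matrix(c(
  1, 0, 0, 1,
  0, 1, 0, 0,
  0, 1, 1, 0,
  1, 0, 0, 1,
  0, 0, 1, 1,
  0, 1, 0, 0
), ncol = 4, byrow = TRUE,
dimnames = list(NULL, c("Vanilla", "Chocolate", "Mango", "Coffee")))

# Score every combination of k items by reach, then by frequency.
turf <- function(likes, k) {
  combos <- combn(colnames(likes), k, simplify = FALSE)
  liked  <- lapply(combos, function(items) rowSums(likes[, items, drop = FALSE]))
  out <- data.frame(
    items     = vapply(combos, paste, character(1), collapse = " + "),
    reach     = vapply(liked, function(x) mean(x > 0), numeric(1)),  # liked >= 1 item
    frequency = vapply(liked, mean, numeric(1)),                     # avg items liked
    stringsAsFactors = FALSE
  )
  out[order(-out$reach, -out$frequency), ]
}

turf(likes, k = 2)  # best pair first
```

In this toy data the pair Chocolate + Coffee reaches every respondent, so it ranks first; ties on reach are broken by frequency.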

APPLICATION OF R PROGRAMMING LANGUAGE IN SURVEY DATA ANALYSIS

We want to describe and implement functions (commands) that aid the exploration of survey data via simple tabulations of respondent counts and proportions, including the ability to specify:

either a frequency count or a row/column/joint/total table proportion;
multiple row and column variables; and
all margins, grand margins only, or no margins, plus retention of the results in a format that is amenable to further analysis in R.
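A minimal sketch of such a function in base R, assuming the data sit in a data frame of categorical columns. The name `tabulate_survey()` and its arguments are hypothetical; it covers counts and row, column, or joint proportions, several row or column variables (combined with `interaction()`), and all-or-no margins via `addmargins()`. A grand-total-only option is left out to keep the sketch short.

```r
# Hypothetical helper (not from any package): tabulate one or more row
# variables against one or more column variables.
tabulate_survey <- function(data, rows, cols,
                            type = c("count", "row", "col", "joint"),
                            margins = FALSE) {
  type <- match.arg(type)
  # Several variables on one side are combined into one factor, e.g. "Female | Poor"
  r  <- interaction(data[rows], sep = " | ", drop = TRUE)
  cl <- interaction(data[cols], sep = " | ", drop = TRUE)
  tab <- table(r, cl, dnn = c(paste(rows, collapse = " | "),
                              paste(cols, collapse = " | ")))
  tab <- switch(type,
                count = tab,
                row   = prop.table(tab, margin = 1),  # each row sums to 1
                col   = prop.table(tab, margin = 2),  # each column sums to 1
                joint = prop.table(tab))              # whole table sums to 1
  if (margins) tab <- addmargins(tab)                 # adds a "Sum" row and column
  tab
}

# Small invented example
demo <- data.frame(Gender = c("Male", "Female", "Female", "Male", "Female"),
                   Health = c("Poor", "Fair", "Poor", "Poor", "Fair"),
                   stringsAsFactors = FALSE)
tabulate_survey(demo, "Gender", "Health")
tabulate_survey(demo, "Gender", "Health", type = "row", margins = TRUE)
as.data.frame(tabulate_survey(demo, c("Gender", "Health"), "Health"))
```

Because the result is an ordinary R table, `as.data.frame()` turns it into a long data frame (one row per cell, with a `Freq` column) that can be filtered, merged, or plotted.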

NOW, LET’S START THE CALCULATIONS

PRACTICAL ILLUSTRATION

We will simulate a data set designed to resemble a real-life survey data set; it will serve as our case study.

Install the following packages (this only needs to be done once):

# install.packages("tidyverse")

# install.packages("stringr")

# install.packages("gmodels")

# install.packages("ggplot2")

# install.packages("descr")

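After installation, load the packages one at a time with library(). The console messages below confirm what was attached. Note that descr masks CrossTable() from gmodels, so the CrossTable() calls in this lab use the descr version (call gmodels::CrossTable() to use the other one).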

library("tidyverse")

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library("stringr")
library("gmodels")
library("ggplot2")
library("descr")

## 
## Attaching package: 'descr'
## 
## The following object is masked from 'package:gmodels':
## 
##     CrossTable

EXAMPLE 1: CREATING A DEMOGRAPHIC DATA SET

Run the following R code and observe the output:

  ID = 1:3000  # respondent identifiers

  # Note: set.seed(234) is repeated before each sample() call, so every variable
  # is drawn from the same random stream. Because Age and Country both have six
  # labels, they end up perfectly associated (see result_3 below).
  set.seed(234)
  Age = sample(c("0 - 5", "6 - 14", "15 - 24", "25 - 50", "51 - 64", "65+"), 3000, replace = TRUE)
  #View(Age)

  set.seed(234)
  Gender = sample(c("Male", "Female"), 3000, replace = TRUE)
  #View(Gender)

  set.seed(234)
  Country = sample(c("Nigeria", "Ghana", "South Africa", "Botswana", "United Kingdom", "Austria"), 3000, replace = TRUE)
  #View(Country)

  set.seed(234)
  Health_Status = sample(c("Poor", "Fair", "Okay"), 3000, replace = TRUE)
  #View(Health_Status)

  Survey = data.frame(ID, Age, Gender, Country, Health_Status)
  #View(Survey)

  head(Survey)      # first 6 rows of the generated data set

  head(Survey, 15)  # first 15 rows

  tail(Survey)      # last 6 rows

  tail(Survey, 20)  # last 20 rows

Now, let’s start working with our generated “Survey” data set:

  result_1 = CrossTable(Survey$Age, Survey$Gender, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Age", "Gender"))

  print(result_1)

##    Cell Contents 
## |-------------------------|
## |                       N | 
## |              Expected N | 
## | Chi-square contribution | 
## |           N / Row Total | 
## |           N / Col Total | 
## |         N / Table Total | 
## |-------------------------|
## 
## =================================
##            Gender
## Age        Female    Male   Total
## ---------------------------------
## 0 - 5         253     227     480
##             246.7   233.3        
##             0.160   0.169        
##             0.527   0.473   0.160
##             0.164   0.156        
##             0.084   0.076        
## ---------------------------------
## 15 - 24       246     259     505
##             259.6   245.4        
##             0.709   0.750        
##             0.487   0.513   0.168
##             0.160   0.178        
##             0.082   0.086        
## ---------------------------------
## 25 - 50       235     263     498
##             256.0   242.0        
##             1.718   1.817        
##             0.472   0.528   0.166
##             0.152   0.180        
##             0.078   0.088        
## ---------------------------------
## 51 - 64       269     228     497
##             255.5   241.5        
##             0.718   0.759        
##             0.541   0.459   0.166
##             0.174   0.156        
##             0.090   0.076        
## ---------------------------------
## 6 - 14        268     232     500
##             257.0   243.0        
##             0.471   0.498        
##             0.536   0.464   0.167
##             0.174   0.159        
##             0.089   0.077        
## ---------------------------------
## 65+           271     249     520
##             267.3   252.7        
##             0.052   0.055        
##             0.521   0.479   0.173
##             0.176   0.171        
##             0.090   0.083        
## ---------------------------------
## Total        1542    1458    3000
##             0.514   0.486        
## =================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 7.876522      d.f. = 5      p = 0.163

To display the same table in SPSS format (counts and column percentages only), use the command below:

  result_1B = CrossTable(Survey$Age, Survey$Gender,prop.r=FALSE,prop.t=FALSE,prop.chisq=FALSE,format="SPSS")
  print(result_1B)

##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |          Column Percent | 
## |-------------------------|
## 
## ====================================
##               Survey$Gender
## Survey$Age    Female    Male   Total
## ------------------------------------
## 0 - 5           253     227     480 
##                16.4%   15.6%        
## ------------------------------------
## 15 - 24         246     259     505 
##                16.0%   17.8%        
## ------------------------------------
## 25 - 50         235     263     498 
##                15.2%   18.0%        
## ------------------------------------
## 51 - 64         269     228     497 
##                17.4%   15.6%        
## ------------------------------------
## 6 - 14          268     232     500 
##                17.4%   15.9%        
## ------------------------------------
## 65+             271     249     520 
##                17.6%   17.1%        
## ------------------------------------
## Total          1542    1458    3000 
##                51.4%   48.6%        
## ====================================

Now, let’s consider Age and Health Status:

  result_2 = CrossTable(Survey$Age, Survey$Health_Status, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Age", "Health_Status"))
  print(result_2)

##    Cell Contents 
## |-------------------------|
## |                       N | 
## |              Expected N | 
## | Chi-square contribution | 
## |           N / Row Total | 
## |           N / Col Total | 
## |         N / Table Total | 
## |-------------------------|
## 
## ========================================
##            Health_Status
## Age         Fair    Okay    Poor   Total
## ----------------------------------------
## 0 - 5        145     181     154     480
##            163.2   160.3   156.5        
##            2.030   2.668   0.039        
##            0.302   0.377   0.321   0.160
##            0.142   0.181   0.157        
##            0.048   0.060   0.051        
## ----------------------------------------
## 15 - 24      166     179     160     505
##            171.7   168.7   164.6        
##            0.189   0.633   0.130        
##            0.329   0.354   0.317   0.168
##            0.163   0.179   0.164        
##            0.055   0.060   0.053        
## ----------------------------------------
## 25 - 50      177     153     168     498
##            169.3   166.3   162.3        
##            0.348   1.069   0.197        
##            0.355   0.307   0.337   0.166
##            0.174   0.153   0.172        
##            0.059   0.051   0.056        
## ----------------------------------------
## 51 - 64      164     153     180     497
##            169.0   166.0   162.0        
##            0.147   1.018   1.995        
##            0.330   0.308   0.362   0.166
##            0.161   0.153   0.184        
##            0.055   0.051   0.060        
## ----------------------------------------
## 6 - 14       179     176     145     500
##            170.0   167.0   163.0        
##            0.476   0.485   1.988        
##            0.358   0.352   0.290   0.167
##            0.175   0.176   0.148        
##            0.060   0.059   0.048        
## ----------------------------------------
## 65+          189     160     171     520
##            176.8   173.7   169.5        
##            0.842   1.078   0.013        
##            0.363   0.308   0.329   0.173
##            0.185   0.160   0.175        
##            0.063   0.053   0.057        
## ----------------------------------------
## Total       1020    1002     978    3000
##            0.340   0.334   0.326        
## ========================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 15.34322      d.f. = 10      p = 0.12
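A chi-square p-value alone does not say how strong an association is. One common effect-size measure is Cramér's V = sqrt(Chi^2 / (N x (k - 1))), where k is the smaller of the number of rows and columns; it runs from 0 (no association) to 1 (perfect association). A small sketch, with a hypothetical helper name, applied to the statistics printed in this lab:

```r
# Cramér's V from a Pearson chi-square statistic, the sample size, and the table shape
cramers_v <- function(chi_sq, n, n_rows, n_cols) {
  sqrt(chi_sq / (n * (min(n_rows, n_cols) - 1)))
}

cramers_v(15.34322, n = 3000, n_rows = 6, n_cols = 3)  # Age x Health_Status
cramers_v(7.876522, n = 3000, n_rows = 6, n_cols = 2)  # Age x Gender
cramers_v(15000,    n = 3000, n_rows = 6, n_cols = 6)  # Age x Country
```

Age x Health_Status and Age x Gender both give V of about 0.05, a very weak association consistent with their non-significant p-values (0.12 and 0.163), while the Age x Country statistic of 15000 gives V = 1.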

Here is the same table in SPSS format:

  result_2B = CrossTable(Survey$Age, Survey$Health_Status,prop.r=FALSE,prop.t=FALSE,prop.chisq=FALSE,format="SPSS")
  print(result_2B)

##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |          Column Percent | 
## |-------------------------|
## 
## ===========================================
##               Survey$Health_Status
## Survey$Age     Fair    Okay    Poor   Total
## -------------------------------------------
## 0 - 5          145     181     154     480 
##               14.2%   18.1%   15.7%        
## -------------------------------------------
## 15 - 24        166     179     160     505 
##               16.3%   17.9%   16.4%        
## -------------------------------------------
## 25 - 50        177     153     168     498 
##               17.4%   15.3%   17.2%        
## -------------------------------------------
## 51 - 64        164     153     180     497 
##               16.1%   15.3%   18.4%        
## -------------------------------------------
## 6 - 14         179     176     145     500 
##               17.5%   17.6%   14.8%        
## -------------------------------------------
## 65+            189     160     171     520 
##               18.5%   16.0%   17.5%        
## -------------------------------------------
## Total         1020    1002     978    3000 
##               34.0%   33.4%   32.6%        
## ===========================================

See another example below:

  result_3 = CrossTable(Survey$Age, Survey$Country, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Age", "Country"))
  print(result_3)

##    Cell Contents 
## |-------------------------|
## |                       N | 
## |              Expected N | 
## | Chi-square contribution | 
## |           N / Row Total | 
## |           N / Col Total | 
## |         N / Table Total | 
## |-------------------------|
## 
## =====================================================================================
##            Country
## Age         Austria   Botswana      Ghana    Nigeria   Sth Afrc   Untd Kng      Total
## -------------------------------------------------------------------------------------
## 0 - 5             0          0          0        480          0          0        480
##                83.2       79.7       80.0       76.8       80.8       79.5           
##              83.200     79.680     80.000   2116.800     80.800     79.520           
##               0.000      0.000      0.000      1.000      0.000      0.000      0.160
##                   0          0          0          1          0          0           
##               0.000      0.000      0.000      0.160      0.000      0.000           
## -------------------------------------------------------------------------------------
## 15 - 24           0          0          0          0        505          0        505
##                87.5       83.8       84.2       80.8       85.0       83.7           
##              87.533     83.830     84.167     80.800   2075.008     83.662           
##               0.000      0.000      0.000      0.000      1.000      0.000      0.168
##                   0          0          0          0          1          0           
##               0.000      0.000      0.000      0.000      0.168      0.000           
## -------------------------------------------------------------------------------------
## 25 - 50           0        498          0          0          0          0        498
##                86.3       82.7       83.0       79.7       83.8       82.5           
##              86.320   2086.668     83.000     79.680     83.830     82.502           
##               0.000      1.000      0.000      0.000      0.000      0.000      0.166
##                   0          1          0          0          0          0           
##               0.000      0.166      0.000      0.000      0.000      0.000           
## -------------------------------------------------------------------------------------
## 51 - 64           0          0          0          0          0        497        497
##                86.1       82.5       82.8       79.5       83.7       82.3           
##              86.147     82.502     82.833     79.520     83.662   2088.336           
##               0.000      0.000      0.000      0.000      0.000      1.000      0.166
##                   0          0          0          0          0          1           
##               0.000      0.000      0.000      0.000      0.000      0.166           
## -------------------------------------------------------------------------------------
## 6 - 14            0          0        500          0          0          0        500
##                86.7       83.0       83.3       80.0       84.2       82.8           
##              86.667     83.000   2083.333     80.000     84.167     82.833           
##               0.000      0.000      1.000      0.000      0.000      0.000      0.167
##                   0          0          1          0          0          0           
##               0.000      0.000      0.167      0.000      0.000      0.000           
## -------------------------------------------------------------------------------------
## 65+             520          0          0          0          0          0        520
##                90.1       86.3       86.7       83.2       87.5       86.1           
##            2050.133     86.320     86.667     83.200     87.533     86.147           
##               1.000      0.000      0.000      0.000      0.000      0.000      0.173
##                   1          0          0          0          0          0           
##               0.173      0.000      0.000      0.000      0.000      0.000           
## -------------------------------------------------------------------------------------
## Total           520        498        500        480        505        497       3000
##               0.173      0.166      0.167      0.160      0.168      0.166           
## =====================================================================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 15000      d.f. = 25      p <2e-16

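Every age group falls into exactly one country, and the chi-square statistic reaches its maximum possible value, N x (min(rows, columns) - 1) = 3000 x 5 = 15000. This is an artifact of the simulation, not a real finding: set.seed(234) is called before every sample() call, so Age and Country are generated from the same sequence of random positions in their six-label vectors and are therefore perfectly associated. Here is the same table in SPSS format: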

  result_3B = CrossTable(Survey$Age, Survey$Country, prop.r=FALSE,prop.t=FALSE,prop.chisq=FALSE,format="SPSS")
  print(result_3B)

##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |          Column Percent | 
## |-------------------------|
## 
## ==============================================================================
##             Survey$Country
## Survy$Ag    Austria   Botswana   Ghana   Nigeria   Sth Afrc   Untd Kng   Total
## ------------------------------------------------------------------------------
## 0 - 5            0          0       0       480          0          0     480 
##                  0%         0%      0%      100%         0%         0%        
## ------------------------------------------------------------------------------
## 15 - 24          0          0       0         0        505          0     505 
##                  0%         0%      0%        0%       100%         0%        
## ------------------------------------------------------------------------------
## 25 - 50          0        498       0         0          0          0     498 
##                  0%       100%      0%        0%         0%         0%        
## ------------------------------------------------------------------------------
## 51 - 64          0          0       0         0          0        497     497 
##                  0%         0%      0%        0%         0%       100%        
## ------------------------------------------------------------------------------
## 6 - 14           0          0     500         0          0          0     500 
##                  0%         0%    100%        0%         0%         0%        
## ------------------------------------------------------------------------------
## 65+            520          0       0         0          0          0     520 
##                100%         0%      0%        0%         0%         0%        
## ------------------------------------------------------------------------------
## Total          520        498     500       480        505        497    3000 
##               17.3%      16.6%   16.7%     16.0%      16.8%      16.6%        
## ==============================================================================
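A minimal sketch of a fix, assuming independent variables are what the simulation intends: set the seed once, so each sample() call continues the random stream instead of restarting it. The name `Survey_indep` is hypothetical, chosen so that the original `Survey` data (and the outputs shown in this lab) stay unchanged.

```r
set.seed(234)  # set the seed ONCE; later draws continue the same stream
n <- 3000
Survey_indep <- data.frame(
  ID            = seq_len(n),
  Age           = sample(c("0 - 5", "6 - 14", "15 - 24", "25 - 50", "51 - 64", "65+"), n, replace = TRUE),
  Gender        = sample(c("Male", "Female"), n, replace = TRUE),
  Country       = sample(c("Nigeria", "Ghana", "South Africa", "Botswana", "United Kingdom", "Austria"), n, replace = TRUE),
  Health_Status = sample(c("Poor", "Fair", "Okay"), n, replace = TRUE)
)

# Each age group is now spread across all countries
tab <- table(Age = Survey_indep$Age, Country = Survey_indep$Country)
tab
chisq.test(tab)
```

The data are still reproducible, because the whole sequence of draws is fixed by the single `set.seed()` call.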

Next, let’s look at Gender and Country:

  result_4 = CrossTable(Survey$Gender, Survey$Country, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Gender", "Country"))
  print(result_4)

##    Cell Contents 
## |-------------------------|
## |                       N | 
## |              Expected N | 
## | Chi-square contribution | 
## |           N / Row Total | 
## |           N / Col Total | 
## |         N / Table Total | 
## |-------------------------|
## 
## ================================================================================
##           Country
## Gender    Austria   Botswana   Ghana   Nigeria   South Afrc   Untd Kngdm   Total
## --------------------------------------------------------------------------------
## Female        271        235     268       253          246          269    1542
##             267.3      256.0   257.0     246.7        259.6        255.5        
##             0.052      1.718   0.471     0.160        0.709        0.718        
##             0.176      0.152   0.174     0.164        0.160        0.174   0.514
##             0.521      0.472   0.536     0.527        0.487        0.541        
##             0.090      0.078   0.089     0.084        0.082        0.090        
## --------------------------------------------------------------------------------
## Male          249        263     232       227          259          228    1458
##             252.7      242.0   243.0     233.3        245.4        241.5        
##             0.055      1.817   0.498     0.169        0.750        0.759        
##             0.171      0.180   0.159     0.156        0.178        0.156   0.486
##             0.479      0.528   0.464     0.473        0.513        0.459        
##             0.083      0.088   0.077     0.076        0.086        0.076        
## --------------------------------------------------------------------------------
## Total         520        498     500       480          505          497    3000
##             0.173      0.166   0.167     0.160        0.168        0.166        
## ================================================================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 7.876522      d.f. = 5      p = 0.163

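This chi-square statistic is identical to the Age x Gender result (7.876522, d.f. = 5) because, in this simulated data set, each country contains exactly one age group, so the Country columns are simply the Age rows relabeled. The same results in SPSS format: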

  result_4B = CrossTable(Survey$Gender, Survey$Country, prop.r=FALSE,prop.t=FALSE,prop.chisq=FALSE,format="SPSS")
  print(result_4B)

##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |          Column Percent | 
## |-------------------------|
## 
## ==============================================================================
##             Survey$Country
## Srvy$Gnd    Austria   Botswana   Ghana   Nigeria   Sth Afrc   Untd Kng   Total
## ------------------------------------------------------------------------------
## Female         271        235     268       253        246        269    1542 
##               52.1%      47.2%   53.6%     52.7%      48.7%      54.1%        
## ------------------------------------------------------------------------------
## Male           249        263     232       227        259        228    1458 
##               47.9%      52.8%   46.4%     47.3%      51.3%      45.9%        
## ------------------------------------------------------------------------------
## Total          520        498     500       480        505        497    3000 
##               17.3%      16.6%   16.7%     16.0%      16.8%      16.6%        
## ==============================================================================

Another example, cross-tabulating Gender against Health_Status:

  result_5 = CrossTable(Survey$Gender, Survey$Health_Status, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Gender", "Health_Status"))
  print(result_5)

##    Cell Contents 
## |-------------------------|
## |                       N | 
## |              Expected N | 
## | Chi-square contribution | 
## |           N / Row Total | 
## |           N / Col Total | 
## |         N / Table Total | 
## |-------------------------|
## 
## =======================================
##           Health_Status
## Gender     Fair    Okay    Poor   Total
## ---------------------------------------
## Female      538     493     511    1542
##           524.3   515.0   502.7        
##           0.359   0.942   0.137        
##           0.349   0.320   0.331   0.514
##           0.527   0.492   0.522        
##           0.179   0.164   0.170        
## ---------------------------------------
## Male        482     509     467    1458
##           495.7   487.0   475.3        
##           0.380   0.996   0.145        
##           0.331   0.349   0.320   0.486
##           0.473   0.508   0.478        
##           0.161   0.170   0.156        
## ---------------------------------------
## Total      1020    1002     978    3000
##           0.340   0.334   0.326        
## =======================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 2.959869      d.f. = 2      p = 0.228
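The p-value reported by CrossTable() can be reproduced from the statistic and the table dimensions. A minimal base-R sketch, using the numbers printed above:

```r
# Values read off the Gender x Health_Status output above
x2 <- 2.959869
n_rows <- 2   # Female, Male
n_cols <- 3   # Fair, Okay, Poor
df <- (n_rows - 1) * (n_cols - 1)   # degrees of freedom for an r x c table

# The statistic is the sum of the per-cell chi-square contributions (rounded as printed)
contributions <- c(0.359, 0.942, 0.137, 0.380, 0.996, 0.145)
sum(contributions)

# p-value = upper-tail probability of a chi-square distribution with df degrees of freedom
p_value <- pchisq(x2, df = df, lower.tail = FALSE)
round(p_value, 3)
```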

The same table in SPSS format:

  result_5B = CrossTable(Survey$Gender, Survey$Health_Status, prop.r = FALSE, prop.t = FALSE, prop.chisq = FALSE, format = "SPSS")
  print(result_5B)

##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |          Column Percent | 
## |-------------------------|
## 
## ==============================================
##                  Survey$Health_Status
## Survey$Gender     Fair    Okay    Poor   Total
## ----------------------------------------------
## Female            538     493     511    1542 
##                  52.7%   49.2%   52.2%        
## ----------------------------------------------
## Male              482     509     467    1458 
##                  47.3%   50.8%   47.8%        
## ----------------------------------------------
## Total            1020    1002     978    3000 
##                  34.0%   33.4%   32.6%        
## ==============================================

#### EXAMPLE 2: ####

  data(esoph, package = "datasets")
  View(esoph)
  names(esoph)

## [1] "agegp"     "alcgp"     "tobgp"     "ncases"    "ncontrols"

  result_6 = CrossTable(esoph$alcgp, esoph$agegp, expected = TRUE, chisq = TRUE, prop.chisq = TRUE, dnn = c("Alcohol consumption", "Age group"))

## Warning in chisq.test(tab, correct = FALSE, ...): Chi-squared approximation may
## be incorrect

  print(result_6)

##    Cell Contents 
## |-------------------------|
## |                       N | 
## |              Expected N | 
## | Chi-square contribution | 
## |           N / Row Total | 
## |           N / Col Total | 
## |         N / Table Total | 
## |-------------------------|
## 
## ============================================================================
##                        Age group
## Alcohol consumption    25-34   35-44   45-54   55-64   65-74     75+   Total
## ----------------------------------------------------------------------------
## 0-39g/day                  4       4       4       4       4       3      23
##                          3.9     3.9     4.2     4.2     3.9     2.9        
##                        0.002   0.002   0.008   0.008   0.002   0.005        
##                        0.174   0.174   0.174   0.174   0.174   0.130   0.261
##                        0.267   0.267   0.250   0.250   0.267   0.273        
##                        0.045   0.045   0.045   0.045   0.045   0.034        
## ----------------------------------------------------------------------------
## 40-79                      4       4       4       4       3       4      23
##                          3.9     3.9     4.2     4.2     3.9     2.9        
##                        0.002   0.002   0.008   0.008   0.216   0.440        
##                        0.174   0.174   0.174   0.174   0.130   0.174   0.261
##                        0.267   0.267   0.250   0.250   0.200   0.364        
##                        0.045   0.045   0.045   0.045   0.034   0.045        
## ----------------------------------------------------------------------------
## 80-119                     3       4       4       4       4       2      21
##                          3.6     3.6     3.8     3.8     3.6     2.6        
##                        0.094   0.049   0.009   0.009   0.049   0.149        
##                        0.143   0.190   0.190   0.190   0.190   0.095   0.239
##                        0.200   0.267   0.250   0.250   0.267   0.182        
##                        0.034   0.045   0.045   0.045   0.045   0.023        
## ----------------------------------------------------------------------------
## 120+                       4       3       4       4       4       2      21
##                          3.6     3.6     3.8     3.8     3.6     2.6        
##                        0.049   0.094   0.009   0.009   0.049   0.149        
##                        0.190   0.143   0.190   0.190   0.190   0.095   0.239
##                        0.267   0.200   0.250   0.250   0.267   0.182        
##                        0.045   0.034   0.045   0.045   0.045   0.023        
## ----------------------------------------------------------------------------
## Total                     15      15      16      16      15      11      88
##                        0.170   0.170   0.182   0.182   0.170   0.125        
## ============================================================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 1.41891      d.f. = 15      p = 1
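Each row of esoph is a stratum (one combination of agegp, alcgp and tobgp) that records how many cases and controls fall into it, so the table above counts strata (88 rows), not people. A sketch that weights each stratum by the number of subjects it represents, using base R's xtabs():

```r
data(esoph, package = "datasets")

# Weight each stratum by its number of subjects (cases plus controls)
subjects <- xtabs(ncases + ncontrols ~ alcgp + agegp, data = esoph)
print(subjects)

nrow(esoph)     # number of strata: the total of the unweighted table above
sum(subjects)   # number of subjects represented
chisq.test(subjects)
```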

#### EXAMPLE 3: ####

  set.seed(234)
  sex = factor(c(rep("F", 900), rep("M", 900)))

  income = 100 * (rnorm(1800) + 5)

  weight = rep(1, 1800)

  weight[sex == "F" & income > 500] = 3

  View(weight)

  attr(income, "label") = "Income"

  attr(sex, "label") = "Sex"

  compmeans(income, sex, col = "lightgray", ylab = "income", xlab = "sex")

## Mean value of "Income" according to "Sex"
##           Mean    N Std. Dev.
## F     497.6180  900  96.95414
## M     503.7893  900  98.07035
## Total 500.7036 1800  97.53558

  comp = compmeans(income, sex, weight, plot = FALSE)


  plot(comp, col = c("orange", "lightblue"), ylab = "income", xlab = "sex")
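Passing weight to compmeans() should give weighted group means. Assuming it computes ordinary weighted means (sum of w times x divided by sum of w), this base-R sketch recomputes them for the same simulated data. Males all have weight 1, so their weighted and unweighted means coincide. Females with income above 500 get weight 3, which pulls the female mean upward.

```r
set.seed(234)
sex <- factor(c(rep("F", 900), rep("M", 900)))
income <- 100 * (rnorm(1800) + 5)
weight <- rep(1, 1800)
weight[sex == "F" & income > 500] <- 3

# Unweighted and weighted mean income within each level of sex
unweighted_means <- sapply(levels(sex), function(s) mean(income[sex == s]))
weighted_means <- sapply(levels(sex), function(s) {
  keep <- sex == s
  weighted.mean(income[keep], weight[keep])
})
rbind(unweighted = unweighted_means, weighted = weighted_means)
```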

Load the following packages

library(dplyr)
library(ggplot2)
library(reshape2)

## 
## Attaching package: 'reshape2'

## The following object is masked from 'package:tidyr':
## 
##     smiths

library(DescTools)

## Registered S3 method overwritten by 'DescTools':
##   method         from 
##   reorder.factor gdata

NB: dplyr and ggplot2 are also attached by library(tidyverse); reshape2 and DescTools are not part of the tidyverse and must be loaded separately.

Create a contingency table of respondents' sex by LGA

sex=as.table(matrix(c(33,28,11,2,4,3,117,101,41,4,17,8),ncol=2,nrow=6,dimnames=list(c("Guma","Logo","Barkin-Ladi","Bokkos","Mangu","Riyom"),c("Male","Female"))))
print(sex)

##             Male Female
## Guma          33    117
## Logo          28    101
## Barkin-Ladi   11     41
## Bokkos         2      4
## Mangu          4     17
## Riyom          3      8

sex_bar=melt(sex, varnames=c("LGA","Sex"))
print(sex_bar)

##            LGA    Sex value
## 1         Guma   Male    33
## 2         Logo   Male    28
## 3  Barkin-Ladi   Male    11
## 4       Bokkos   Male     2
## 5        Mangu   Male     4
## 6        Riyom   Male     3
## 7         Guma Female   117
## 8         Logo Female   101
## 9  Barkin-Ladi Female    41
## 10      Bokkos Female     4
## 11       Mangu Female    17
## 12       Riyom Female     8

Compute each category's percentage within its LGA and draw a stacked bar chart

ggplot(sex_bar %>% group_by(LGA) %>% mutate(Percentage = round(value/sum(value), 4)),
       aes(x = LGA, y = Percentage, fill = Sex)) +
  geom_col() +
  geom_text(aes(label = paste0(Percentage*100, "%")),
            position = position_stack(vjust = 0.5))

Row Percentages

PercTable(sex, row.vars = NULL, col.vars = NULL, justify = "right",
  freq = TRUE, rfrq = "010", expected = FALSE, residuals = FALSE,
  stdres = FALSE, margins = c(1,2), digits = 2)

##                                           
##                       Male   Female    Sum
##                                           
## Guma          freq      33      117    150
##               p.row 22.00%   78.00%      .
##                                           
## Logo          freq      28      101    129
##               p.row 21.71%   78.29%      .
##                                           
## Barkin-Ladi   freq      11       41     52
##               p.row 21.15%   78.85%      .
##                                           
## Bokkos        freq       2        4      6
##               p.row 33.33%   66.67%      .
##                                           
## Mangu         freq       4       17     21
##               p.row 19.05%   80.95%      .
##                                           
## Riyom         freq       3        8     11
##               p.row 27.27%   72.73%      .
##                                           
## Sum           freq      81      288    369
##               p.row 21.95%   78.05%      .
##

Column Percentages

PercTable(sex, row.vars = NULL, col.vars = NULL, justify = "right",
  freq = TRUE, rfrq = "001", expected = FALSE, residuals = FALSE,
  stdres = FALSE, margins = c(1,2), digits = 2)

##                                           
##                       Male   Female    Sum
##                                           
## Guma          freq      33      117    150
##               p.col 40.74%   40.62% 40.65%
##                                           
## Logo          freq      28      101    129
##               p.col 34.57%   35.07% 34.96%
##                                           
## Barkin-Ladi   freq      11       41     52
##               p.col 13.58%   14.24% 14.09%
##                                           
## Bokkos        freq       2        4      6
##               p.col  2.47%    1.39%  1.63%
##                                           
## Mangu         freq       4       17     21
##               p.col  4.94%    5.90%  5.69%
##                                           
## Riyom         freq       3        8     11
##               p.col  3.70%    2.78%  2.98%
##                                           
## Sum           freq      81      288    369
##               p.col      .        .      .
##
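Base R's prop.table() gives the same row and column percentages as PercTable(). This sketch re-creates the sex table so it runs on its own; only the numbers are reproduced, not PercTable's layout.

```r
sex <- as.table(matrix(c(33, 28, 11, 2, 4, 3, 117, 101, 41, 4, 17, 8),
                       ncol = 2, nrow = 6,
                       dimnames = list(c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom"),
                                       c("Male", "Female"))))

row_pct <- round(100 * prop.table(sex, margin = 1), 2)  # each LGA (row) sums to 100
col_pct <- round(100 * prop.table(sex, margin = 2), 2)  # each sex (column) sums to 100
row_pct
col_pct
addmargins(sex)  # adds the Sum row and column
```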

Expected Frequencies and Residuals

PercTable(sex, row.vars = NULL, col.vars = NULL, justify = "right",
  freq = TRUE, rfrq = "000", expected = TRUE, residuals = TRUE,
  stdres = FALSE, margins = c(1,2), digits = NULL)

##                                            
##                       Male   Female     Sum
##                                            
## Guma          freq      33      117     150
##               exp   32.927  117.073       .
##               res    0.013   -0.007       .
##                                            
## Logo          freq      28      101     129
##               exp   28.317  100.683       .
##               res   -0.060    0.032       .
##                                            
## Barkin-Ladi   freq      11       41      52
##               exp   11.415   40.585       .
##               res   -0.123    0.065       .
##                                            
## Bokkos        freq       2        4       6
##               exp    1.317    4.683       .
##               res    0.595   -0.316       .
##                                            
## Mangu         freq       4       17      21
##               exp    4.610   16.390       .
##               res   -0.284    0.151       .
##                                            
## Riyom         freq       3        8      11
##               exp    2.415    8.585       .
##               res    0.377   -0.200       .
##                                            
## Sum           freq      81      288     369
##               exp        .        .       .
##               res        .        .       .
##
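The expected frequencies and residuals above follow from two formulas: E = (row total x column total) / N, and the Pearson residual (O - E) / sqrt(E). A base-R sketch that computes both directly and can be compared with the values returned by chisq.test():

```r
sex <- as.table(matrix(c(33, 28, 11, 2, 4, 3, 117, 101, 41, 4, 17, 8),
                       ncol = 2, nrow = 6,
                       dimnames = list(c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom"),
                                       c("Male", "Female"))))

n <- sum(sex)
expected <- outer(rowSums(sex), colSums(sex)) / n    # E_ij = row total * column total / N
pearson_res <- (sex - expected) / sqrt(expected)     # (O - E) / sqrt(E)
round(expected, 3)
round(pearson_res, 3)

# chisq.test() returns the same quantities (it warns here because some E < 5)
ct <- suppressWarnings(chisq.test(sex, correct = FALSE))
```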

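Chi-square test of independence between LGA and sex. R warns that the approximation may be incorrect because four of the twelve expected counts (both Bokkos cells and the Male cells for Mangu and Riyom) are below 5: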

chisq.test(sex, correct=FALSE)

## Warning in chisq.test(sex, correct = FALSE): Chi-squared approximation may be
## incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  sex
## X-squared = 0.76292, df = 5, p-value = 0.9793
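With expected counts this small, two common alternatives to the asymptotic p-value are a Monte Carlo p-value and Fisher's exact test, both in base R's stats package. A sketch for the sex table; the seed and B = 10000 are arbitrary choices:

```r
sex <- as.table(matrix(c(33, 28, 11, 2, 4, 3, 117, 101, 41, 4, 17, 8),
                       ncol = 2, nrow = 6,
                       dimnames = list(c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom"),
                                       c("Male", "Female"))))

set.seed(308)  # arbitrary seed so the simulated p-value is reproducible
mc <- chisq.test(sex, simulate.p.value = TRUE, B = 10000)  # Monte Carlo p-value
fe <- fisher.test(sex)                                     # exact test for an r x c table
mc$p.value
fe$p.value
```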

Create a contingency table of age group by LGA

age=as.table(matrix(c(19,11,7,1,3,1,131,118,45,5,18,10),ncol=2,nrow=6,dimnames=list(c("Guma","Logo","Barkin-Ladi","Bokkos","Mangu","Riyom"),c("(10-17) Years","(18-Above) Years"))))
print(age)

##             (10-17) Years (18-Above) Years
## Guma                   19              131
## Logo                   11              118
## Barkin-Ladi             7               45
## Bokkos                  1                5
## Mangu                   3               18
## Riyom                   1               10

age_bar=melt(age, varnames=c("LGA","Age"))
print(age_bar)

##            LGA              Age value
## 1         Guma    (10-17) Years    19
## 2         Logo    (10-17) Years    11
## 3  Barkin-Ladi    (10-17) Years     7
## 4       Bokkos    (10-17) Years     1
## 5        Mangu    (10-17) Years     3
## 6        Riyom    (10-17) Years     1
## 7         Guma (18-Above) Years   131
## 8         Logo (18-Above) Years   118
## 9  Barkin-Ladi (18-Above) Years    45
## 10      Bokkos (18-Above) Years     5
## 11       Mangu (18-Above) Years    18
## 12       Riyom (18-Above) Years    10

Compute each category's percentage within its LGA and draw a stacked bar chart

ggplot(age_bar %>% group_by(LGA) %>% mutate(Percentage = round(value/sum(value), 4)),
       aes(x = LGA, y = Percentage, fill = Age)) +
  geom_col() +
  geom_text(aes(label = paste0(Percentage*100, "%")),
            position = position_stack(vjust = 0.5))

Row Percentages

PercTable(age, row.vars = NULL, col.vars = NULL, justify = "right",
  freq = TRUE, rfrq = "010", expected = FALSE, residuals = FALSE,
  stdres = FALSE, margins = c(1,2), digits = 2)

##                                                              
##                       (10-17) Years   (18-Above) Years    Sum
##                                                              
## Guma          freq               19                131    150
##               p.row          12.67%             87.33%      .
##                                                              
## Logo          freq               11                118    129
##               p.row           8.53%             91.47%      .
##                                                              
## Barkin-Ladi   freq                7                 45     52
##               p.row          13.46%             86.54%      .
##                                                              
## Bokkos        freq                1                  5      6
##               p.row          16.67%             83.33%      .
##                                                              
## Mangu         freq                3                 18     21
##               p.row          14.29%             85.71%      .
##                                                              
## Riyom         freq                1                 10     11
##               p.row           9.09%             90.91%      .
##                                                              
## Sum           freq               42                327    369
##               p.row          11.38%             88.62%      .
##

Column Percentages

PercTable(age, row.vars = NULL, col.vars = NULL, justify = "right",
  freq = TRUE, rfrq = "001", expected = FALSE, residuals = FALSE,
  stdres = FALSE, margins = c(1,2), digits = 2)

##                                                              
##                       (10-17) Years   (18-Above) Years    Sum
##                                                              
## Guma          freq               19                131    150
##               p.col          45.24%             40.06% 40.65%
##                                                              
## Logo          freq               11                118    129
##               p.col          26.19%             36.09% 34.96%
##                                                              
## Barkin-Ladi   freq                7                 45     52
##               p.col          16.67%             13.76% 14.09%
##                                                              
## Bokkos        freq                1                  5      6
##               p.col           2.38%              1.53%  1.63%
##                                                              
## Mangu         freq                3                 18     21
##               p.col           7.14%              5.50%  5.69%
##                                                              
## Riyom         freq                1                 10     11
##               p.col           2.38%              3.06%  2.98%
##                                                              
## Sum           freq               42                327    369
##               p.col               .                  .      .
##

Expected Frequencies and Residuals

PercTable(age, row.vars = NULL, col.vars = NULL, justify = "right",
  freq = TRUE, rfrq = "000", expected = TRUE, residuals = TRUE,
  stdres = FALSE, margins = c(1,2), digits = NULL)

##                                                              
##                      (10-17) Years   (18-Above) Years     Sum
##                                                              
## Guma          freq              19                131     150
##               exp           17.073            132.927       .
##               res            0.466             -0.167       .
##                                                              
## Logo          freq              11                118     129
##               exp           14.683            114.317       .
##               res           -0.961              0.344       .
##                                                              
## Barkin-Ladi   freq               7                 45      52
##               exp            5.919             46.081       .
##               res            0.444             -0.159       .
##                                                              
## Bokkos        freq               1                  5       6
##               exp            0.683              5.317       .
##               res            0.384             -0.138       .
##                                                              
## Mangu         freq               3                 18      21
##               exp            2.390             18.610       .
##               res            0.394             -0.141       .
##                                                              
## Riyom         freq               1                 10      11
##               exp            1.252              9.748       .
##               res           -0.225              0.081       .
##                                                              
## Sum           freq              42                327     369
##               exp                .                  .       .
##               res                .                  .       .
##

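Chi-square test of independence between LGA and age group. The warning appears again because three expected counts in the (10-17) Years column (Bokkos, Mangu and Riyom) are below 5: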

chisq.test(age, correct=FALSE)

## Warning in chisq.test(age, correct = FALSE): Chi-squared approximation may be
## incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  age
## X-squared = 1.9096, df = 5, p-value = 0.8615

Create a contingency table of dialect by LGA

dialect=as.table(matrix(c(136,63,0,0,0,0,12,4,0,0,0,0,2,42,0,0,0,0,0,0,48,0,0,8,0,0,0,6,0,0,0,0,0,0,19,0,0,20,4,0,2,3),ncol=7,nrow=6,dimnames=list(c("Guma","Logo","Barkin-Ladi","Bokkos","Mangu","Riyom"),c("Tiv","Wapan","Idoma","Berom","Ron","Mwaghavul","Others"))))
print(dialect)

##             Tiv Wapan Idoma Berom Ron Mwaghavul Others
## Guma        136    12     2     0   0         0      0
## Logo         63     4    42     0   0         0     20
## Barkin-Ladi   0     0     0    48   0         0      4
## Bokkos        0     0     0     0   6         0      0
## Mangu         0     0     0     0   0        19      2
## Riyom         0     0     0     8   0         0      3

dialect_bar=melt(dialect, varnames=c("LGA","Dialects"))
print(dialect_bar)

##            LGA  Dialects value
## 1         Guma       Tiv   136
## 2         Logo       Tiv    63
## 3  Barkin-Ladi       Tiv     0
## 4       Bokkos       Tiv     0
## 5        Mangu       Tiv     0
## 6        Riyom       Tiv     0
## 7         Guma     Wapan    12
## 8         Logo     Wapan     4
## 9  Barkin-Ladi     Wapan     0
## 10      Bokkos     Wapan     0
## 11       Mangu     Wapan     0
## 12       Riyom     Wapan     0
## 13        Guma     Idoma     2
## 14        Logo     Idoma    42
## 15 Barkin-Ladi     Idoma     0
## 16      Bokkos     Idoma     0
## 17       Mangu     Idoma     0
## 18       Riyom     Idoma     0
## 19        Guma     Berom     0
## 20        Logo     Berom     0
## 21 Barkin-Ladi     Berom    48
## 22      Bokkos     Berom     0
## 23       Mangu     Berom     0
## 24       Riyom     Berom     8
## 25        Guma       Ron     0
## 26        Logo       Ron     0
## 27 Barkin-Ladi       Ron     0
## 28      Bokkos       Ron     6
## 29       Mangu       Ron     0
## 30       Riyom       Ron     0
## 31        Guma Mwaghavul     0
## 32        Logo Mwaghavul     0
## 33 Barkin-Ladi Mwaghavul     0
## 34      Bokkos Mwaghavul     0
## 35       Mangu Mwaghavul    19
## 36       Riyom Mwaghavul     0
## 37        Guma    Others     0
## 38        Logo    Others    20
## 39 Barkin-Ladi    Others     4
## 40      Bokkos    Others     0
## 41       Mangu    Others     2
## 42       Riyom    Others     3

Compute each category's percentage within its LGA and draw a stacked bar chart

ggplot(dialect_bar %>% group_by(LGA) %>% mutate(Percentage = round(value/sum(value), 4)),
       aes(x = LGA, y = Percentage, fill = Dialects)) +
  geom_col() +
  geom_text(aes(label = paste0(Percentage*100, "%")),
            position = position_stack(vjust = 0.5))

Row Percentages

PercTable(dialect, row.vars = NULL, col.vars = NULL, justify = "right",
  freq = TRUE, rfrq = "010", expected = FALSE, residuals = FALSE,
  stdres = FALSE, margins = c(1,2), digits = 2)

##                                                                        
##                         Tiv   Wapan   Idoma   Berom     Ron   Mwaghavul
##                                                                        
## Guma          freq      136      12       2       0       0           0
##               p.row  90.67%   8.00%   1.33%   0.00%   0.00%       0.00%
##                                                                        
## Logo          freq       63       4      42       0       0           0
##               p.row  48.84%   3.10%  32.56%   0.00%   0.00%       0.00%
##                                                                        
## Barkin-Ladi   freq        0       0       0      48       0           0
##               p.row   0.00%   0.00%   0.00%  92.31%   0.00%       0.00%
##                                                                        
## Bokkos        freq        0       0       0       0       6           0
##               p.row   0.00%   0.00%   0.00%   0.00% 100.00%       0.00%
##                                                                        
## Mangu         freq        0       0       0       0       0          19
##               p.row   0.00%   0.00%   0.00%   0.00%   0.00%      90.48%
##                                                                        
## Riyom         freq        0       0       0       8       0           0
##               p.row   0.00%   0.00%   0.00%  72.73%   0.00%       0.00%
##                                                                        
## Sum           freq      199      16      44      56       6          19
##               p.row  53.93%   4.34%  11.92%  15.18%   1.63%       5.15%
##                                                                        
##                                     
##                       Others     Sum
##                                     
## Guma          freq         0     150
##               p.row    0.00%       .
##                                     
## Logo          freq        20     129
##               p.row   15.50%       .
##                                     
## Barkin-Ladi   freq         4      52
##               p.row    7.69%       .
##                                     
## Bokkos        freq         0       6
##               p.row    0.00%       .
##                                     
## Mangu         freq         2      21
##               p.row    9.52%       .
##                                     
## Riyom         freq         3      11
##               p.row   27.27%       .
##                                     
## Sum           freq        29     369
##               p.row    7.86%       .
##

Column Percentages

PercTable(dialect, row.vars = NULL, col.vars = NULL, justify = "right",
  freq = TRUE, rfrq = "001", expected = FALSE, residuals = FALSE,
  stdres = FALSE, margins = c(1,2), digits = 2)

Expected Frequencies and Residuals

PercTable(dialect, row.vars = NULL, col.vars = NULL, justify = "right",
  freq = TRUE, rfrq = "000", expected = TRUE, residuals = TRUE,
  stdres = FALSE, margins = c(1,2), digits = NULL)

Chi-square test

chisq.test(dialect, correct=FALSE)
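The dialect table is dominated by empty cells because each dialect is concentrated in one or two LGAs. Before relying on a chi-square p-value, it is worth counting how many expected frequencies fall below 5 (the usual rule of thumb, and the condition that triggers R's approximation warning). A base-R sketch:

```r
dialect <- as.table(matrix(c(136, 63, 0, 0, 0, 0, 12, 4, 0, 0, 0, 0, 2, 42, 0, 0, 0, 0,
                             0, 0, 48, 0, 0, 8, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 19, 0,
                             0, 20, 4, 0, 2, 3),
                           ncol = 7, nrow = 6,
                           dimnames = list(c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom"),
                                           c("Tiv", "Wapan", "Idoma", "Berom", "Ron", "Mwaghavul", "Others"))))

# Expected counts under independence: row total * column total / N
expected <- outer(rowSums(dialect), colSums(dialect)) / sum(dialect)
n_small <- sum(expected < 5)       # number of cells with expected count below 5
share_small <- mean(expected < 5)  # as a proportion of all cells
c(cells_below_5 = n_small, share = round(share_small, 2))
```

When most cells fail this check, the asymptotic p-value is unreliable; a Monte Carlo p-value (chisq.test(..., simulate.p.value = TRUE)) is one option.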

Create a contingency table of marital status by LGA

marital=as.table(matrix(c(56,34,8,0,2,4,7,9,6,0,1,0,20,14,12,1,4,2,49,57,16,2,11,3,18,15,10,3,3,2),ncol=5,nrow=6,dimnames=list(c("Guma","Logo","Barkin-Ladi","Bokkos","Mangu","Riyom"),c("Married","Widowed","Divorced","Separated","Single"))))
print(marital)

##             Married Widowed Divorced Separated Single
## Guma             56       7       20        49     18
## Logo             34       9       14        57     15
## Barkin-Ladi       8       6       12        16     10
## Bokkos            0       0        1         2      3
## Mangu             2       1        4        11      3
## Riyom             4       0        2         3      2

marital_bar=melt(marital, varnames=c("LGA","Marital_Status"))
print(marital_bar)

##            LGA Marital_Status value
## 1         Guma        Married    56
## 2         Logo        Married    34
## 3  Barkin-Ladi        Married     8
## 4       Bokkos        Married     0
## 5        Mangu        Married     2
## 6        Riyom        Married     4
## 7         Guma        Widowed     7
## 8         Logo        Widowed     9
## 9  Barkin-Ladi        Widowed     6
## 10      Bokkos        Widowed     0
## 11       Mangu        Widowed     1
## 12       Riyom        Widowed     0
## 13        Guma       Divorced    20
## 14        Logo       Divorced    14
## 15 Barkin-Ladi       Divorced    12
## 16      Bokkos       Divorced     1
## 17       Mangu       Divorced     4
## 18       Riyom       Divorced     2
## 19        Guma      Separated    49
## 20        Logo      Separated    57
## 21 Barkin-Ladi      Separated    16
## 22      Bokkos      Separated     2
## 23       Mangu      Separated    11
## 24       Riyom      Separated     3
## 25        Guma         Single    18
## 26        Logo         Single    15
## 27 Barkin-Ladi         Single    10
## 28      Bokkos         Single     3
## 29       Mangu         Single     3
## 30       Riyom         Single     2

Set up the data frame to generate a bar chart

ggplot(marital_bar %>% group_by(LGA) %>% mutate(Percentage = round(value/sum(value),4)),
  aes(x = LGA, y = Percentage, fill = Marital_Status)) +  # share of each status within its LGA
  geom_col() +
  geom_text(aes(label = paste0(Percentage*100, "%")), position = position_stack(vjust = 0.5))
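The `Percentage` column that `mutate()` adds inside the `ggplot()` call is each count divided by its LGA total, so the segments of every bar add up to 100%. To inspect those numbers before plotting, one base-R route (an alternative to `reshape2`/`dplyr`, not the author's code) is to reshape the table with `as.data.frame()` and divide within each LGA using `ave()`. The column names `LGA`, `Marital_Status` and `value` are chosen to match the `melt()` output above; `marital_long` is an illustrative name.

```r
# Same marital-status counts as above, entered column by column.
marital <- as.table(matrix(
  c(56, 34, 8, 0, 2, 4,  7, 9, 6, 0, 1, 0,  20, 14, 12, 1, 4, 2,
    49, 57, 16, 2, 11, 3,  18, 15, 10, 3, 3, 2),
  nrow = 6, ncol = 5,
  dimnames = list(c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom"),
                  c("Married", "Widowed", "Divorced", "Separated", "Single"))))

# Long format: one row per LGA x marital-status cell, in the same order as melt().
marital_long <- as.data.frame(marital, responseName = "value",
                              stringsAsFactors = FALSE)
names(marital_long)[1:2] <- c("LGA", "Marital_Status")

# Share of each cell within its LGA, rounded to 4 decimals as in the plot labels.
marital_long$Percentage <- round(
  ave(marital_long$value, marital_long$LGA, FUN = function(v) v / sum(v)), 4)

head(marital_long)
```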

Row Percentages

PercTable(marital, row.vars = NULL, col.vars = NULL, justify = "right",
  freq = TRUE, rfrq = "010", expected = FALSE, residuals = FALSE,
  stdres = FALSE, margins = c(1,2), digits = 2)

##                                                                               
##                       Married   Widowed   Divorced   Separated   Single    Sum
##                                                                               
## Guma          freq         56         7         20          49       18    150
##               p.row    37.33%     4.67%     13.33%      32.67%   12.00%      .
##                                                                               
## Logo          freq         34         9         14          57       15    129
##               p.row    26.36%     6.98%     10.85%      44.19%   11.63%      .
##                                                                               
## Barkin-Ladi   freq          8         6         12          16       10     52
##               p.row    15.38%    11.54%     23.08%      30.77%   19.23%      .
##                                                                               
## Bokkos        freq          0         0          1           2        3      6
##               p.row     0.00%     0.00%     16.67%      33.33%   50.00%      .
##                                                                               
## Mangu         freq          2         1          4          11        3     21
##               p.row     9.52%     4.76%     19.05%      52.38%   14.29%      .
##                                                                               
## Riyom         freq          4         0          2           3        2     11
##               p.row    36.36%     0.00%     18.18%      27.27%   18.18%      .
##                                                                               
## Sum           freq        104        23         53         138       51    369
##               p.row    28.18%     6.23%     14.36%      37.40%   13.82%      .
##
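The `p.row` figures are each cell divided by its row (LGA) total, so every row sums to 100%. In base R, `prop.table()` with `margin = 1` gives the same proportions; the sketch below is a cross-check of the `PercTable()` output rather than a replacement for it, and `row_pct` is an illustrative name.

```r
# Marital-status counts by LGA (same values as the table above).
marital <- as.table(matrix(
  c(56, 34, 8, 0, 2, 4,  7, 9, 6, 0, 1, 0,  20, 14, 12, 1, 4, 2,
    49, 57, 16, 2, 11, 3,  18, 15, 10, 3, 3, 2),
  nrow = 6, ncol = 5,
  dimnames = list(c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom"),
                  c("Married", "Widowed", "Divorced", "Separated", "Single"))))

# margin = 1 divides each cell by its row (LGA) total.
row_pct <- round(100 * prop.table(marital, margin = 1), 2)
print(row_pct)
```

For example, Guma's Married entry is 56 / 150 = 37.33%, as in the table above.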

Column Percentages

PercTable(marital, row.vars = NULL, col.vars = NULL, justify = "right",
  freq = TRUE, rfrq = "001", expected = FALSE, residuals = FALSE,
  stdres = FALSE, margins = c(1,2), digits = 2)

##                                                                               
##                       Married   Widowed   Divorced   Separated   Single    Sum
##                                                                               
## Guma          freq         56         7         20          49       18    150
##               p.col    53.85%    30.43%     37.74%      35.51%   35.29% 40.65%
##                                                                               
## Logo          freq         34         9         14          57       15    129
##               p.col    32.69%    39.13%     26.42%      41.30%   29.41% 34.96%
##                                                                               
## Barkin-Ladi   freq          8         6         12          16       10     52
##               p.col     7.69%    26.09%     22.64%      11.59%   19.61% 14.09%
##                                                                               
## Bokkos        freq          0         0          1           2        3      6
##               p.col     0.00%     0.00%      1.89%       1.45%    5.88%  1.63%
##                                                                               
## Mangu         freq          2         1          4          11        3     21
##               p.col     1.92%     4.35%      7.55%       7.97%    5.88%  5.69%
##                                                                               
## Riyom         freq          4         0          2           3        2     11
##               p.col     3.85%     0.00%      3.77%       2.17%    3.92%  2.98%
##                                                                               
## Sum           freq        104        23         53         138       51    369
##               p.col         .         .          .           .        .      .
##
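The `p.col` figures divide each cell by its column (marital-status) total instead, so each column sums to 100% and the final `Sum` column shows each LGA's share of all 369 respondents. The base-R counterpart is `prop.table()` with `margin = 2`; as before, this is a cross-check sketch and `col_pct` is an illustrative name.

```r
# Marital-status counts by LGA (same values as the table above).
marital <- as.table(matrix(
  c(56, 34, 8, 0, 2, 4,  7, 9, 6, 0, 1, 0,  20, 14, 12, 1, 4, 2,
    49, 57, 16, 2, 11, 3,  18, 15, 10, 3, 3, 2),
  nrow = 6, ncol = 5,
  dimnames = list(c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom"),
                  c("Married", "Widowed", "Divorced", "Separated", "Single"))))

# margin = 2 divides each cell by its column (marital-status) total.
col_pct <- round(100 * prop.table(marital, margin = 2), 2)
print(col_pct)
```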

Expected Frequencies and Residuals

PercTable(marital, row.vars = NULL, col.vars = NULL, justify = "right",
  freq = TRUE, rfrq = "000", expected = TRUE, residuals = TRUE,
  stdres = FALSE, margins = c(1,2), digits = NULL)

##                                                                              
##                      Married   Widowed   Divorced   Separated   Single    Sum
##                                                                              
## Guma          freq        56         7         20          49       18    150
##               exp     42.276     9.350     21.545      56.098   20.732      .
##               res      2.111    -0.768     -0.333      -0.948   -0.600      .
##                                                                              
## Logo          freq        34         9         14          57       15    129
##               exp     36.358     8.041     18.528      48.244   17.829      .
##               res     -0.391     0.338     -1.052       1.261   -0.670      .
##                                                                              
## Barkin-Ladi   freq         8         6         12          16       10     52
##               exp     14.656     3.241      7.469      19.447    7.187      .
##               res     -1.739     1.532      1.658      -0.782    1.049      .
##                                                                              
## Bokkos        freq         0         0          1           2        3      6
##               exp      1.691     0.374      0.862       2.244    0.829      .
##               res     -1.300    -0.612      0.149      -0.163    2.384      .
##                                                                              
## Mangu         freq         2         1          4          11        3     21
##               exp      5.919     1.309      3.016       7.854    2.902      .
##               res     -1.611    -0.270      0.566       1.123    0.057      .
##                                                                              
## Riyom         freq         4         0          2           3        2     11
##               exp      3.100     0.686      1.580       4.114    1.520      .
##               res      0.511    -0.828      0.334      -0.549    0.389      .
##                                                                              
## Sum           freq       104        23         53         138       51    369
##               exp          .         .          .           .        .      .
##               res          .         .          .           .        .      .
##
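Under independence of LGA and marital status, the expected count for a cell is (row total x column total) / grand total, and the `res` rows are Pearson residuals, (observed - expected) / sqrt(expected). The sketch below computes both directly in base R and compares them with the `$expected` and `$residuals` components that `chisq.test()` returns; the object names are illustrative. Cells whose residuals exceed about 2 in absolute value (here Guma/Married and Bokkos/Single) contribute most to the chi-square statistic, which is the sum of the squared residuals.

```r
# Marital-status counts by LGA (same values as the table above).
marital <- as.table(matrix(
  c(56, 34, 8, 0, 2, 4,  7, 9, 6, 0, 1, 0,  20, 14, 12, 1, 4, 2,
    49, 57, 16, 2, 11, 3,  18, 15, 10, 3, 3, 2),
  nrow = 6, ncol = 5,
  dimnames = list(c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom"),
                  c("Married", "Widowed", "Divorced", "Separated", "Single"))))

n <- sum(marital)                                            # grand total
expected <- outer(rowSums(marital), colSums(marital)) / n    # E = R_i * C_j / n
pearson_res <- (marital - expected) / sqrt(expected)         # (O - E) / sqrt(E)

print(round(expected, 3))
print(round(pearson_res, 3))

# chisq.test() stores the same quantities; its small-count warning is
# suppressed here because only these components are being compared.
ct <- suppressWarnings(chisq.test(marital, correct = FALSE))
same_as_chisq <- isTRUE(all.equal(as.vector(expected), as.vector(ct$expected))) &&
  isTRUE(all.equal(as.vector(pearson_res), as.vector(ct$residuals)))
print(same_as_chisq)
```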

Chi-Square Test

chisq.test(marital, correct=FALSE)

## Warning in chisq.test(marital, correct = FALSE): Chi-squared approximation may
## be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  marital
## X-squared = 33.218, df = 20, p-value = 0.03193
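The warning appears because `chisq.test()` flags any expected count below 5, and with LGAs as small as Bokkos (6 respondents) and Riyom (11), many cells fall below that level. A common rule of thumb (Cochran's) treats the chi-square approximation as doubtful when more than about 20% of expected counts are under 5. One option, shown here as an illustration rather than the course's prescribed method, is the Monte Carlo p-value built into `chisq.test()` (`simulate.p.value = TRUE`), which simulates tables with the observed margins instead of relying on the chi-square distribution. The seed and the number of replicates `B` are arbitrary choices, and the object names are illustrative.

```r
# Marital-status counts by LGA (same values as the table above).
marital <- as.table(matrix(
  c(56, 34, 8, 0, 2, 4,  7, 9, 6, 0, 1, 0,  20, 14, 12, 1, 4, 2,
    49, 57, 16, 2, 11, 3,  18, 15, 10, 3, 3, 2),
  nrow = 6, ncol = 5,
  dimnames = list(c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom"),
                  c("Married", "Widowed", "Divorced", "Separated", "Single"))))

# How many expected counts are below 5?
expected <- suppressWarnings(chisq.test(marital))$expected
n_small <- sum(expected < 5)
cat("Cells with expected count < 5:", n_small, "of", length(expected),
    sprintf("(%.1f%%)\n", 100 * n_small / length(expected)))

# Monte Carlo p-value: the statistic is unchanged, only the p-value is
# obtained by simulation, so the approximation warning is not issued.
set.seed(308)  # arbitrary seed, for reproducibility only
mc <- chisq.test(marital, simulate.p.value = TRUE, B = 10000)
print(mc)
```

Because the p-value is simulated, it will vary slightly with the seed and `B`; it should be read alongside, not instead of, the caution about sparse cells.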

Create a contingency table for Educational Status

educational=as.table(matrix(c(134,116,48,6,15,9, 10,9,4,0,3,0, 2,1,0,0,0,1, 4,3,0,0,3,1, 0,0,0,0,0,0),  # matrix() fills by column: one group of 6 LGA counts per educational level
  ncol=5, nrow=6,
  dimnames=list(c("Guma","Logo","Barkin-Ladi","Bokkos","Mangu","Riyom"),c("No Formal Education","Primary Education","Secondary Education","Adult/Vocational Education","Tertiary Education"))))
print(educational)

##             No Formal Education Primary Education Secondary Education
## Guma                        134                10                   2
## Logo                        116                 9                   1
## Barkin-Ladi                  48                 4                   0
## Bokkos                        6                 0                   0
## Mangu                        15                 3                   0
## Riyom                         9                 0                   1
##             Adult/Vocational Education Tertiary Education
## Guma                                 4                  0
## Logo                                 3                  0
## Barkin-Ladi                          0                  0
## Bokkos                               0                  0
## Mangu                                3                  0
## Riyom                                1                  0

educational_bar=melt(educational, varnames=c("LGA","Educational_Status"))
print(educational_bar)

##            LGA         Educational_Status value
## 1         Guma        No Formal Education   134
## 2         Logo        No Formal Education   116
## 3  Barkin-Ladi        No Formal Education    48
## 4       Bokkos        No Formal Education     6
## 5        Mangu        No Formal Education    15
## 6        Riyom        No Formal Education     9
## 7         Guma          Primary Education    10
## 8         Logo          Primary Education     9
## 9  Barkin-Ladi          Primary Education     4
## 10      Bokkos          Primary Education     0
## 11       Mangu          Primary Education     3
## 12       Riyom          Primary Education     0
## 13        Guma        Secondary Education     2
## 14        Logo        Secondary Education     1
## 15 Barkin-Ladi        Secondary Education     0
## 16      Bokkos        Secondary Education     0
## 17       Mangu        Secondary Education     0
## 18       Riyom        Secondary Education     1
## 19        Guma Adult/Vocational Education     4
## 20        Logo Adult/Vocational Education     3
## 21 Barkin-Ladi Adult/Vocational Education     0
## 22      Bokkos Adult/Vocational Education     0
## 23       Mangu Adult/Vocational Education     3
## 24       Riyom Adult/Vocational Education     1
## 25        Guma         Tertiary Education     0
## 26        Logo         Tertiary Education     0
## 27 Barkin-Ladi         Tertiary Education     0
## 28      Bokkos         Tertiary Education     0
## 29       Mangu         Tertiary Education     0
## 30       Riyom         Tertiary Education     0

ggplot(educational_bar %>% group_by(LGA) %>% mutate(Percentage = round(value/sum(value),4)),
  aes(x = LGA, y = Percentage, fill = Educational_Status)) +  # share of each level within its LGA
  geom_col() +
  geom_text(aes(label = paste0(Percentage*100, "%")), position = position_stack(vjust = 0.5))

Row Percentages

PercTable(educational, row.vars = NULL, col.vars = NULL, justify = "right",
  freq = TRUE, rfrq = "010", expected = FALSE, residuals = FALSE,
  stdres = FALSE, margins = c(1,2), digits = 2)

##                                                              
##                       No Formal Education   Primary Education
##                                                              
## Guma          freq                    134                  10
##               p.row                89.33%               6.67%
##                                                              
## Logo          freq                    116                   9
##               p.row                89.92%               6.98%
##                                                              
## Barkin-Ladi   freq                     48                   4
##               p.row                92.31%               7.69%
##                                                              
## Bokkos        freq                      6                   0
##               p.row               100.00%               0.00%
##                                                              
## Mangu         freq                     15                   3
##               p.row                71.43%              14.29%
##                                                              
## Riyom         freq                      9                   0
##               p.row                81.82%               0.00%
##                                                              
## Sum           freq                    328                  26
##               p.row                88.89%               7.05%
##                                                              
##                                                                       
##                       Secondary Education   Adult/Vocational Education
##                                                                       
## Guma          freq                      2                            4
##               p.row                 1.33%                        2.67%
##                                                                       
## Logo          freq                      1                            3
##               p.row                 0.78%                        2.33%
##                                                                       
## Barkin-Ladi   freq                      0                            0
##               p.row                 0.00%                        0.00%
##                                                                       
## Bokkos        freq                      0                            0
##               p.row                 0.00%                        0.00%
##                                                                       
## Mangu         freq                      0                            3
##               p.row                 0.00%                       14.29%
##                                                                       
## Riyom         freq                      1                            1
##               p.row                 9.09%                        9.09%
##                                                                       
## Sum           freq                      4                           11
##               p.row                 1.08%                        2.98%
##                                                                       
##                                                 
##                       Tertiary Education     Sum
##                                                 
## Guma          freq                     0     150
##               p.row                0.00%       .
##                                                 
## Logo          freq                     0     129
##               p.row                0.00%       .
##                                                 
## Barkin-Ladi   freq                     0      52
##               p.row                0.00%       .
##                                                 
## Bokkos        freq                     0       6
##               p.row                0.00%       .
##                                                 
## Mangu         freq                     0      21
##               p.row                0.00%       .
##                                                 
## Riyom         freq                     0      11
##               p.row                0.00%       .
##                                                 
## Sum           freq                     0     369
##               p.row                0.00%       .
##

Column Percentages

PercTable(educational, row.vars = NULL, col.vars = NULL, justify = "right",
  freq = TRUE, rfrq = "001", expected = FALSE, residuals = FALSE,
  stdres = FALSE, margins = c(1,2), digits = 2)

##                                                              
##                       No Formal Education   Primary Education
##                                                              
## Guma          freq                    134                  10
##               p.col                40.85%              38.46%
##                                                              
## Logo          freq                    116                   9
##               p.col                35.37%              34.62%
##                                                              
## Barkin-Ladi   freq                     48                   4
##               p.col                14.63%              15.38%
##                                                              
## Bokkos        freq                      6                   0
##               p.col                 1.83%               0.00%
##                                                              
## Mangu         freq                     15                   3
##               p.col                 4.57%              11.54%
##                                                              
## Riyom         freq                      9                   0
##               p.col                 2.74%               0.00%
##                                                              
## Sum           freq                    328                  26
##               p.col                     .                   .
##                                                              
##                                                                       
##                       Secondary Education   Adult/Vocational Education
##                                                                       
## Guma          freq                      2                            4
##               p.col                50.00%                       36.36%
##                                                                       
## Logo          freq                      1                            3
##               p.col                25.00%                       27.27%
##                                                                       
## Barkin-Ladi   freq                      0                            0
##               p.col                 0.00%                        0.00%
##                                                                       
## Bokkos        freq                      0                            0
##               p.col                 0.00%                        0.00%
##                                                                       
## Mangu         freq                      0                            3
##               p.col                 0.00%                       27.27%
##                                                                       
## Riyom         freq                      1                            1
##               p.col                25.00%                        9.09%
##                                                                       
## Sum           freq                      4                           11
##               p.col                     .                            .
##                                                                       
##                                                
##                       Tertiary Education    Sum
##                                                
## Guma          freq                     0    150
##               p.col               NA     40.65%
##                                                
## Logo          freq                     0    129
##               p.col               NA     34.96%
##                                                
## Barkin-Ladi   freq                     0     52
##               p.col               NA     14.09%
##                                                
## Bokkos        freq                     0      6
##               p.col               NA      1.63%
##                                                
## Mangu         freq                     0     21
##               p.col               NA      5.69%
##                                                
## Riyom         freq                     0     11
##               p.col               NA      2.98%
##                                                
## Sum           freq                     0    369
##               p.col                    .      .
##

Expected Frequencies and Residuals

PercTable(educational, row.vars = NULL, col.vars = NULL, justify = "right",
  freq = TRUE, rfrq = "000", expected = TRUE, residuals = TRUE,
  stdres = FALSE, margins = c(1,2), digits = NULL)

##                                                             
##                      No Formal Education   Primary Education
##                                                             
## Guma          freq                   134                  10
##               exp                133.333              10.569
##               res                  0.058              -0.175
##                                                             
## Logo          freq                   116                   9
##               exp                114.667               9.089
##               res                  0.125              -0.030
##                                                             
## Barkin-Ladi   freq                    48                   4
##               exp                 46.222               3.664
##               res                  0.261               0.176
##                                                             
## Bokkos        freq                     6                   0
##               exp                  5.333               0.423
##               res                  0.289              -0.650
##                                                             
## Mangu         freq                    15                   3
##               exp                 18.667               1.480
##               res                 -0.849               1.250
##                                                             
## Riyom         freq                     9                   0
##               exp                  9.778               0.775
##               res                 -0.249              -0.880
##                                                             
## Sum           freq                   328                  26
##               exp                      .                   .
##               res                      .                   .
##                                                             
##                                                                      
##                      Secondary Education   Adult/Vocational Education
##                                                                      
## Guma          freq                     2                            4
##               exp                  1.626                        4.472
##               res                  0.293                       -0.223
##                                                                      
## Logo          freq                     1                            3
##               exp                  1.398                        3.846
##               res                 -0.337                       -0.431
##                                                                      
## Barkin-Ladi   freq                     0                            0
##               exp                  0.564                        1.550
##               res                 -0.751                       -1.245
##                                                                      
## Bokkos        freq                     0                            0
##               exp                  0.065                        0.179
##               res                 -0.255                       -0.423
##                                                                      
## Mangu         freq                     0                            3
##               exp                  0.228                        0.626
##               res                 -0.477                        3.000
##                                                                      
## Riyom         freq                     1                            1
##               exp                  0.119                        0.328
##               res                  2.551                        1.174
##                                                                      
## Sum           freq                     4                           11
##               exp                      .                            .
##               res                      .                            .
##                                                                      
##                                                
##                      Tertiary Education     Sum
##                                                
## Guma          freq                    0     150
##               exp                 0.000       .
##               res               NA            .
##                                                
## Logo          freq                    0     129
##               exp                 0.000       .
##               res               NA            .
##                                                
## Barkin-Ladi   freq                    0      52
##               exp                 0.000       .
##               res               NA            .
##                                                
## Bokkos        freq                    0       6
##               exp                 0.000       .
##               res               NA            .
##                                                
## Mangu         freq                    0      21
##               exp                 0.000       .
##               res               NA            .
##                                                
## Riyom         freq                    0      11
##               exp                 0.000       .
##               res               NA            .
##                                                
## Sum           freq                    0     369
##               exp                     .       .
##               res                     .       .
##

Chi-Square Test

chisq.test(educational, correct=FALSE)

## Warning in chisq.test(educational, correct = FALSE): Chi-squared approximation
## may be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  educational
## X-squared = NaN, df = 20, p-value = NA
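The statistic is `NaN` because no respondent reported tertiary education: that column total is 0, so every expected count in the column is 0 and each of its terms (O - E)^2 / E is 0/0. The same 0/0 is behind the `NA` column percentages and residuals for Tertiary Education above. A straightforward remedy, sketched here as an assumption rather than the author's procedure, is to drop all-zero rows and columns before testing, which lowers the degrees of freedom from 20 to (6 - 1)(4 - 1) = 15. Even then, 16 of the 18 expected counts outside the No Formal Education column are below 5, so `chisq.test()` will still warn and the p-value should be read cautiously (for example alongside a simulated p-value). The names `educational_nz` and `ct_edu` are illustrative.

```r
# Educational-status counts by LGA, entered column by column as above.
educational <- as.table(matrix(
  c(134, 116, 48, 6, 15, 9,   # No Formal Education
     10,   9,  4, 0,  3, 0,   # Primary Education
      2,   1,  0, 0,  0, 1,   # Secondary Education
      4,   3,  0, 0,  3, 1,   # Adult/Vocational Education
      0,   0,  0, 0,  0, 0),  # Tertiary Education
  nrow = 6, ncol = 5,
  dimnames = list(c("Guma", "Logo", "Barkin-Ladi", "Bokkos", "Mangu", "Riyom"),
                  c("No Formal Education", "Primary Education",
                    "Secondary Education", "Adult/Vocational Education",
                    "Tertiary Education"))))

# Keep only rows and columns with at least one response; an all-zero
# column has expected counts of 0, which makes (O - E)^2 / E undefined.
keep_rows <- rowSums(educational) > 0
keep_cols <- colSums(educational) > 0
educational_nz <- educational[keep_rows, keep_cols]

# The small-expected-count warning is left visible on purpose.
ct_edu <- chisq.test(educational_nz)
print(ct_edu)
```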

Best wishes from:

Timothy A. OGUNLEYE Department of Statistics, Faculty of Basic and Applied Sciences, Osun State University, Osogbo, Nigeria Official Email: timothy.ogunleye@uniosun.edu.ng Personal Email: thompsondx@gmail.com


Tim. A. OGUNLEYE

2023-05-04

