Impact of Financial Support on Academic Performance of Students in the MSc Epidemiology and Public Health (Honours) Programs at Wits University

Group 8 Stata Assignment

Authors

Mojaki Nyabela

Bongani Ncube, Msc, honours

Pontso Molapo

Pahora Tsibogo

Ziyanda Majokweni

Isikelele Mpoto

Published

March 14, 2025

Redcap

Comment: The screenshot above shows that more than two users were assigned for data quality checks on REDCap and that they were active during testing.

Set up the Stata engine in R

This involves loading the Statamarkdown package and defining the Stata installation path.

#install.packages("Statamarkdown")
library(Statamarkdown)
stataexe <- "C:/Program Files/Stata18/StataSE-64.exe" 
knitr::opts_chunk$set(engine.path=list(stata=stataexe))

Load your dataset

Loading data for First form

Load the Academic Performance dataset from its CSV export and save it in Stata format (11 variables, 51 observations).

/* load the data file from csv */
import delimited "Datasets\Academic_Performance.csv"

save "Datasets\Academic_Performance.dta"

(encoding automatically selected: ISO-8859-1)
(11 vars, 51 obs)

file Datasets\Academic_Performance.dta saved

Load dataset for second form

Load the Student Information dataset (41 variables, 51 observations); it also contains the students' demographics.

/* load the data file from csv */
import delimited "Datasets\Student_Information.csv"

save "Datasets\Student_Information.dta"

(encoding automatically selected: ISO-8859-1)
(41 vars, 51 obs)

file Datasets\Student_Information.dta saved

Merging data (1:1 and/or 1:m) and appending data sets.

Merging datasets together

merge joins corresponding observations from the dataset currently in memory (the master dataset) with those from filename.dta (the using dataset), matching on one or more key variables.
merge can perform match merges (one-to-one, one-to-many, many-to-one, and many-to-many).
merge creates a new variable, _merge, containing numeric codes that record the source and contents of each observation in the merged dataset.
You will need to drop the _merge variable if you plan to continue merging other datasets.
The two datasets from the two forms are merged one-to-one on the `record_id` column. By default merge also keeps unmatched observations; here all 51 records matched, so the result is the same as an inner join.

/*Merging the two datasets (student information and academic performance datasets)*/
use "Datasets\Academic_Performance.dta"  
merge 1:1 record_id using "Datasets\Student_Information.dta" 

save "Datasets\Combined_datasets.dta" , replace

    Result                      Number of obs
    -----------------------------------------
    Not matched                             0
    Matched                                51  (_merge==3)
    -----------------------------------------

(file Datasets\Combined_datasets.dta not found)
file Datasets\Combined_datasets.dta saved
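
The checks described above (a unique key, inspecting _merge, and dropping it before any further merge) could be sketched as follows. This is a minimal illustration that reuses the file names from this report; it is not part of the original pipeline, and the assert line assumes every record is expected to match.

```stata
use "Datasets\Academic_Performance.dta", clear

* A 1:1 merge needs the key to identify each row uniquely
isid record_id

merge 1:1 record_id using "Datasets\Student_Information.dta"

* _merge: 1 = master only, 2 = using only, 3 = matched
tabulate _merge
assert _merge == 3

* Drop _merge so that a later merge can create it again
drop _merge
```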

Appending an additional dataset to the merged data

Append 4 additional observations to the merged dataset to increase power.

/*Appending additional observations to the merged dataset*/
use "Datasets\Combined_datasets.dta"  
append using "Datasets\Tobeappended.dta"

save "Datasets\Combined_datasets.dta" , replace

(label _merge already defined)

file Datasets\Combined_datasets.dta saved

Converting dates from either string or different columns to Stata friendly dates.

Explore the date of birth (dob) variable

use "Datasets\Combined_datasets.dta"
list record_id dob in 1/5
des dob

     | record~d          dob |
     |-----------------------|
  1. |        1   1999-01-13 |
  2. |        3   2002-12-05 |
  3. |        4   2000-04-27 |
  4. |        6   1997-08-28 |
  5. |        7   2002-08-16 |
     +-----------------------+


Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
dob             str10   %10s

Comment: dob is stored as a string (str10), which Stata cannot treat as a date, so it needs to be converted to a Stata date.

Changing to proper date format

/*Change the string format to date */

use "Datasets\Combined_datasets.dta",clear
generate _date_ = date(dob,"YMD")
format _date_ %dM_d,_CY
drop dob
rename _date_ dob

des dob
save "Datasets\Combined_datasets.dta" , replace

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
dob             float   %dM_d,_CY             

file Datasets\Combined_datasets.dta saved
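
A quick sanity check on the converted variable might look like the sketch below. It assumes the conversion above has already been saved, so that dob is numeric; any string that date() could not parse would show up as a missing value.

```stata
use "Datasets\Combined_datasets.dta", clear

* Dates that could not be parsed from the string would now be missing
count if missing(dob)

* Earliest and latest birth dates, shown in the date display format
summarize dob, format
```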

Checking for consistency

/*Compare the new variable*/
use "Datasets\Combined_datasets.dta"
list record_id name dob in 1/5

     | record~d       name                dob |
     |----------------------------------------|
  1. |        1     Pontso   January 13, 1999 |
  2. |        3   Thandeka   December 5, 2002 |
  3. |        4     Mojaki     April 27, 2000 |
  4. |        6     karabo    August 28, 1997 |
  5. |        7     Fezeka    August 16, 2002 |
     +----------------------------------------+

Demonstrate exporting and importing data back into Stata using any two data formats of your choice.

*Exporting the dataset as a CSV file*
use "Datasets\Combined_datasets.dta",clear
export delimited "Datasets\Exported_Combined.csv", replace

*Importing the CSV file back into Stata*
import delimited "Datasets\Exported_Combined.csv", clear

file Datasets\Exported_Combined.csv saved

(encoding automatically selected: ISO-8859-1)
(52 vars, 55 obs)
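
The heading asks for two data formats, so a second round trip through Excel could be sketched as below. The workbook name Exported_Combined.xlsx is an assumption chosen for illustration; export excel and import excel are standard Stata commands.

```stata
use "Datasets\Combined_datasets.dta", clear

* Write the data to an Excel workbook, with variable names in the first row
export excel using "Datasets\Exported_Combined.xlsx", firstrow(variables) replace

* Read the workbook back, treating the first row as variable names
import excel using "Datasets\Exported_Combined.xlsx", firstrow clear
describe, short
```

By default export excel writes value labels rather than the underlying codes, so labelled variables come back as text; adding the nolabel option exports the numeric codes instead.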

Saving data sets, labelling variables and creating value labels and making notes on dataset.

Define and attach value labels for gender, socioeconomic_background, residential_location, income_range, financial_support_1 and program.

**********************************DATA CLEANING**************************************************************
*Labeling variables Gender, socioeconomic_background, residential location*
  
*Labeling gender*
use "Datasets\Combined_datasets.dta"
label define gender 1 "Female" 2 "Male"
label value gender gender

*Label socioeconomic background*

lab def socioeconomic_background 1 "Low-income household ( R0 - R100 000 per annum)" 2 "Middle-incomehousehold (R100 000 - R500 000 per annum)" 3 "High-income household (>R500 000 per annum)"
lab val socioeconomic_background socioeconomic_background

*Label residential location*

lab define residential_location 1"Eastern Cape" 2"Free State" 3"Gauteng" 4"KwaZulu-Natal" 5"Limpopo" 6"Mpumalanga" 7"Northern Cape" 8"North West" 9"Western Cape"
lab value residential_location residential_location

*Label income range *

lab define income_range 1 "0-<10 000" 2 "10 000-< 30 000" 3 "30 000 -< 50 000" 4 "50 000 -< 80 000" 5 "80 000 -<150 000" 6 "150 000+" 
lab value income_range income_range

*label variable financial_support_1*
  
label define financial_support_1 1 "Yes" 0 "No" , replace
label value financial_support_1 financial_support_1

*Labeling Program Variable*
  
label define program 1 "MSc Epidemiology" 2 "BHSc (Honours) in Public Health"
label value program program


save "Datasets\Combined_datasets.dta" , replace

file Datasets\Combined_datasets.dta saved
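
The chunk above attaches value labels; variable labels and dataset notes, which the heading also mentions, could be added along the following lines. The label and note wording here is illustrative only and is not taken from the original data dictionary.

```stata
use "Datasets\Combined_datasets.dta", clear

* Variable labels describe the column itself (wording is illustrative)
label variable gender "Gender of the student"
label variable financial_support_1 "Receives any financial support"
label variable program "Program of study"

* A dataset label and notes travel with the saved .dta file
label data "Financial support and academic performance, Group 8"
notes: Merged from the two survey forms on record_id, plus 4 appended observations.
notes gender: Coded 1 = Female, 2 = Male.

* Review what was attached
describe gender financial_support_1 program
notes
```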

Create tables to check for consistency

use "Datasets\Combined_datasets.dta"

tab residential_location

tab gender

tab socioeconomic_background

tab financial_support_1

residential_l |
      ocation |      Freq.     Percent        Cum.
--------------+-----------------------------------
 Eastern Cape |          7       12.73       12.73
   Free State |          2        3.64       16.36
      Gauteng |         32       58.18       74.55
KwaZulu-Natal |          6       10.91       85.45
      Limpopo |          4        7.27       92.73
   Mpumalanga |          2        3.64       96.36
   North West |          1        1.82       98.18
 Western Cape |          1        1.82      100.00
--------------+-----------------------------------
        Total |         55      100.00

     gender |      Freq.     Percent        Cum.
------------+-----------------------------------
     Female |         39       70.91       70.91
       Male |         16       29.09      100.00
------------+-----------------------------------
      Total |         55      100.00

               socioeconomic_background |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
Low-income household ( R0 - R100 000 pe |         29       52.73       52.73
Middle-incomehousehold (R100 000 - R500 |         21       38.18       90.91
High-income household (>R500 000 per an |          5        9.09      100.00
----------------------------------------+-----------------------------------
                                  Total |         55      100.00


financial_s |
   upport_1 |      Freq.     Percent        Cum.
------------+-----------------------------------
         No |         23       41.82       41.82
        Yes |         32       58.18      100.00
------------+-----------------------------------
      Total |         55      100.00

Checking the Program variable for consistency

use "Datasets\Combined_datasets.dta"
list record_id name program in 1/10

     | record~d        name                           program |
     |--------------------------------------------------------|
  1. |        1      Pontso                  MSc Epidemiology |
  2. |        3    Thandeka                  MSc Epidemiology |
  3. |        4      Mojaki                  MSc Epidemiology |
  4. |        6      karabo                  MSc Epidemiology |
  5. |        7      Fezeka                  MSc Epidemiology |
     |--------------------------------------------------------|
  6. |        9    Akuphelo                  MSc Epidemiology |
  7. |       10     Zandile                  MSc Epidemiology |
  8. |       11        John                  MSc Epidemiology |
  9. |       12    Nelisiwe                  MSc Epidemiology |
 10. |       14   Isikelele   BHSc (Honours) in Public Health |
     +--------------------------------------------------------+

Renaming variables in Stata

Variables are renamed with the command rename oldname newname.

use "Datasets\Combined_datasets.dta"

*Renaming variables*
rename languages___1 Afrikaans
rename languages___2 English
rename languages___3 Isixhosa
rename languages___4 Isizulu
rename languages___5 IsiNdebele
rename languages___6 Sesotho
rename languages___7 Setswana
rename languages___8 Siswati
rename languages___9 Tshivenda
rename languages___10 Xitsonga
rename languages___11 Sepedi
rename languages___12 SignLanguage


*labeling variables*
label define Afrikaans 1"yes" 0"No", replace
label value Afrikaans Afrikaans
label define English 1"yes" 0"No", replace
label value English English
label define Isixhosa 1"yes" 0"No", replace
label value Isixhosa Isixhosa
label define Isizulu 1"yes" 0"No", replace
label value Isizulu Isizulu
label define IsiNdebele 1"yes" 0"No", replace
label value IsiNdebele IsiNdebele
label define Sesotho 1"yes" 0"No", replace
label value Sesotho Sesotho
label define Setswana 1"yes" 0"No", replace
label value Setswana Setswana
label define Siswati 1"yes" 0"No", replace
label value Siswati Siswati
label define Tshivenda 1"yes" 0"No", replace
label value Tshivenda Tshivenda
label define Xitsonga 1"yes" 0"No", replace
label value Xitsonga Xitsonga
label define Sepedi 1"yes" 0"No", replace
label value Sepedi Sepedi
label define SignLanguage 1"yes" 0"No", replace
label value SignLanguage SignLanguage

save "Datasets\Combined_datasets.dta" , replace

file Datasets\Combined_datasets.dta saved
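
Because all twelve language indicators share the same 0/1 coding, one shared value label would be enough. The sketch below assumes a label named yesno, which is not part of the original code; label values accepts a variable list, so no loop is needed.

```stata
use "Datasets\Combined_datasets.dta", clear

* One shared label (the name yesno is an assumption for illustration)
label define yesno 0 "No" 1 "Yes", replace

* Attach it to every language indicator in a single command
label values Afrikaans English Isixhosa Isizulu IsiNdebele Sesotho ///
    Setswana Siswati Tshivenda Xitsonga Sepedi SignLanguage yesno

* List the variables that now carry the yesno label
ds, has(vallabel yesno)
```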

Checking the language variables for consistency

use "Datasets\Combined_datasets.dta"
list name Afrikaans English Isizulu IsiNdebele IsiNdebele in 1/5

     |     name   Afrika~s   English   Isizulu   IsiNde~e   IsiNde~e |
     |---------------------------------------------------------------|
  1. |   Pontso         No       yes        No         No         No |
  2. | Thandeka         No       yes       yes         No         No |
  3. |   Mojaki         No       yes        No         No         No |
  4. |   karabo         No       yes        No         No         No |
  5. |   Fezeka         No       yes       yes         No         No |
     +---------------------------------------------------------------+

Generate a new variable called language

use "Datasets\Combined_datasets.dta"

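*Create a new variable called language*
*Each replace overwrites the previous one, so a student who selected several languages keeps the highest-numbered one*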

gen language=.
replace language=1 if Afrikaans==1
replace language=2 if English==1
replace language=3 if Isixhosa==1
replace language=4 if Isizulu ==1
replace language=5 if IsiNdebele==1
replace language=6 if Sesotho==1
replace language=7 if Setswana ==1
replace language=8 if Siswati==1
replace language=9 if Tshivenda==1
replace language=10 if Xitsonga ==1
replace language=11 if Sepedi ==1
replace language=12 if SignLanguage ==1

label define language 1 "Afrikaans" 2 "English" 3 "Isixhosa" 4 "Isizulu" 5 "IsiNdebele" 6 "Sesotho" 7 "Setswana" 8 "Siswati" 9 "Tshivenda" 10 "Xitsonga" 11 "Sepedi" 12 "SignLanguage", replace
label value language language

*Create a new variable called financial_support_type*
  
gen Financial_support_type=.
replace Financial_support_type=1 if finacial_support___1==1
replace Financial_support_type=2 if finacial_support___2==1
replace Financial_support_type=3 if finacial_support___3==1
replace Financial_support_type=4 if finacial_support___4==1
replace Financial_support_type=5 if finacial_support___5==1


label define Financial_support_type 1 "Bursary" 2 "Loan" 3 "Self-funded" 4 "Family support" 5 "Other" , replace
label value Financial_support_type Financial_support_type

save "Datasets\Combined_datasets.dta" , replace

(55 missing values generated)

(5 real changes made)

(43 real changes made)

(15 real changes made)

(24 real changes made)

(2 real changes made)

(13 real changes made)

(7 real changes made)

(4 real changes made)

(2 real changes made)

(1 real change made)

(8 real changes made)

(0 real changes made)



(55 missing values generated)

(13 real changes made)

(1 real change made)

(7 real changes made)

(10 real changes made)

(0 real changes made)



file Datasets\Combined_datasets.dta saved

Checking the language variable for consistency

use "Datasets\Combined_datasets.dta"
list record_id language in 1/5

     | record~d   language |
     |---------------------|
  1. |        1    Sesotho |
  2. |        3    Isizulu |
  3. |        4    Sesotho |
  4. |        6     Sepedi |
  5. |        7    Isizulu |
     +---------------------+

Generating and manipulating data using the commands/functions:

gen, encode, egen, recode (with labels on the categories) to create one categorical variable in these three ways and show the results using different text and graphic summaries.

*recoding the variable age*
  
use "Datasets\Combined_datasets.dta"
recode age (min/18=1 "18 years and below") (19/25=2 "19 to 25 years") (26/32=3 "26 to 32 years") (33/40=4 "33 to 40 years") (41/max=5 "41 and above"), gen(Age_cat) label (agelabel)

*using the egen command to generate the categorical variable academic_score_cat*
  
egen academic_score_cat= cut( academic_score ), at(0, 49,65,75,100) label
label define academic_score_cat 0 "Fail" 1 "Third Class" 2 "Second Class" 3 "First Class", replace

save "Datasets\Combined_datasets.dta" , replace

(55 differences between age and Age_cat)

(2 missing values generated)


file Datasets\Combined_datasets.dta saved
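
The heading also lists gen. A version of the same score categories built with generate and replace, followed by one text and one graphic summary, might look like this. The variable name score_cat_gen is hypothetical, and the cut points simply mirror the at() values used in the egen call above.

```stata
use "Datasets\Combined_datasets.dta", clear

* Same intervals as egen cut(): [0,49) [49,65) [65,75) [75,100)
generate byte score_cat_gen = .
replace score_cat_gen = 0 if academic_score >= 0  & academic_score < 49
replace score_cat_gen = 1 if academic_score >= 49 & academic_score < 65
replace score_cat_gen = 2 if academic_score >= 65 & academic_score < 75
replace score_cat_gen = 3 if academic_score >= 75 & academic_score < 100
label values score_cat_gen academic_score_cat

* The two versions should agree, including on missing scores
assert score_cat_gen == academic_score_cat

* Text and graphic summaries
tabulate score_cat_gen, missing
graph bar (count), over(score_cat_gen)
```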

Check missing data using two or three approaches and explain the outputs or missingness.

use "Datasets\Combined_datasets.dta"
*Checking the missing values within the dataset using 3 methods*
misstable summarize 
*Provides a summary of missing values for each variable in the dataset.*
misstable pattern
*Displays patterns of missing values across multiple variables.*
misstable tree
*Displays a tree view of how missing values are nested across variables (not a fitted decision tree).*

                                                               Obs<.
                                                +------------------------------
               |                                | Unique
      Variable |     Obs=.     Obs>.     Obs<.  | values        Min         Max
  -------------+--------------------------------+------------------------------
  redcap_sur~r |        55                   0  |      0          .           .
  academic_s~e |         2                  53  |     20          5          95
  attendence~s |         2                  53  |     17          5          50
  attendance~e |         2                  53  |     17         10         100
  research_a~s |         2                  53  |      2          0           1
  workshops_~d |         6                  49  |     10          0           9
  financial_~s |         2                  53  |      2          0           1
  other_resp~s |        55                   0  |      0          .           .
  enrolment_~r |         3                  52  |      8          2        2028
  other_resp~e |        55                   0  |      0          .           .
  sufficienc~e |        27                  28  |      8          2          10
  financial_~t |        29                  26  |     12       1000       15000
  income_range |        26                  29  |      6          1           6
   travel_mode |        32                  23  |      5          1           5
  other_repl~s |        55                   0  |      0          .           .
  commute_le~h |         3                  52  |     15          5          90
      language |         2                  53  |     10          2          11
  Financial_~e |        27                  28  |      4          1           4
  academic_s~t |         2                  53  |      4          0           3
  -----------------------------------------------------------------------------

                        Missing-value patterns
                          (1 means complete)

              |   Pattern
    Percent   |  1  2  3  4    5  6  7  8    9 10 11 12   13 14 15 16
  ------------+-------------------------------------------------------
       <1%    |  1  1  1  1    1  1  1  1    1  1  1  1    1  1  1  1
              |  1  1  1
              |
       40     |  1  1  1  1    1  1  1  1    1  1  1  1    1  1  0  0
              |  0  0  0
              |
       33     |  1  1  1  1    1  1  1  1    1  1  0  0    0  0  1  0
              |  0  0  0
              |
        4     |  0  0  0  0    0  0  1  1    1  0  0  0    0  0  1  0
              |  0  0  0
              |
        4     |  1  1  1  1    1  1  0  0    1  0  0  0    0  0  0  0
              |  0  0  0
              |
        4     |  1  1  1  1    1  1  1  1    1  1  1  0    1  1  0  0
              |  0  0  0
              |
        4     |  1  1  1  1    1  1  1  1    1  1  1  1    1  0  0  0
              |  0  0  0
              |
        2     |  1  1  1  1    1  1  1  0    1  1  0  0    0  0  1  0
              |  0  0  0
              |
        2     |  1  1  1  1    1  1  1  1    0  0  0  0    0  0  1  0
              |  0  0  0
              |
        2     |  1  1  1  1    1  1  1  1    0  1  0  0    0  0  1  0
              |  0  0  0
              |
        2     |  1  1  1  1    1  1  1  1    0  1  1  1    0  1  0  0
              |  0  0  0
              |
        2     |  1  1  1  1    1  1  1  1    1  0  1  1    1  0  0  0
              |  0  0  0
              |
        2     |  1  1  1  1    1  1  1  1    1  1  0  1    1  0  0  0
              |  0  0  0
              |
        2     |  1  1  1  1    1  1  1  1    1  1  1  1    0  1  0  0
              |  0  0  0
              |
  ------------+-------------------------------------------------------
      100%    |

  Variables are
      Row 1:   (1) academic_score  (2) academic_score_cat
               (3) attendance_rate  (4) attendence_days
               (5) financial_constraints  (6) research_activities
               (7) language  (8) commute_length  (9) enrolment_year
               (10) workshops_attended  (11) income_range
               (12) Financial_support_type  (13) sufficiency_scale
               (14) financial_support  (15) travel_mode  (16) other_replies
      Row 2:   (1) other_response  (2) other_responses
               (3) redcap_survey_identifier

(only 7 variables shown)

  Nested pattern of missing values
  other_~ies other_re~e other_~ses redcap_s~r travel_m~e financia~t Financia~e
  ----------------------------------------------------------------------------
         100%       100%       100%       100%        58%        11%         4%
                                                                             7 
                                                                 47          4 
                                                                            44 
                                                      42         42         42 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                            0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                 0          0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                            0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                      0          0          0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                            0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                 0          0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                            0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
           0          0          0          0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                            0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                 0          0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                            0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                      0          0          0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                            0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                 0          0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                            0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
  ----------------------------------------------------------------------------
 (percent missing listed first)
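
A further way to look at missingness is per student rather than per variable. The sketch below counts missing values across a handful of the variables listed in the output above; the choice of variables and the name n_missing are assumptions made for illustration.

```stata
use "Datasets\Combined_datasets.dta", clear

* Number of missing values per student across a few key variables
egen n_missing = rowmiss(academic_score attendance_rate workshops_attended commute_length)
tabulate n_missing

* List the students with at least one of these values missing
list record_id academic_score attendance_rate workshops_attended commute_length if n_missing > 0
```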

Introduce duplicates and errors, then check the data for duplicates and clean data of duplicates and any errors.

*Introducing a duplicate and clean data of duplicates and any errors*
use "Datasets\Combined_datasets.dta"  
preserve
keep if student_id == 2751133
*Save the single-record copy to disk so that it can be appended back*
save "duplicate_copy", replace

restore
append using "duplicate_copy"

*checking for duplicates*
  
duplicates list student_id record_id name surname

*Introducing a missing-value error*
replace student_id = . if record_id == 4
replace student_id = . if record_id == 10

*checking for a missing value error*
misstable summarize student_id

*Cleaning data for introduced duplicates*
duplicates drop student_id record_id, force

*Cleaning data for missing value error*
replace student_id = 2751133 if record_id == 4
replace student_id = 2331203 if record_id == 10

save "Datasets\Combined_datasets.dta" , replace

(54 observations deleted)


file duplicate_copy.dta saved


(label Financial_support_type already defined)
(label language already defined)
(label program already defined)
(label financial_support_1 already defined)
(label income_range already defined)
(label residential_location already defined)
(label socioeconomic_background already defined)
(label gender already defined)
(label _merge already defined)
(label Afrikaans already defined)
(label English already defined)
(label Isizulu already defined)
(label IsiNdebele already defined)
(label Sesotho already defined)
(label Setswana already defined)
(label Siswati already defined)
(label Tshivenda already defined)
(label Xitsonga already defined)
(label Sepedi already defined)
(label SignLanguage already defined)
(label agelabel already defined)
(label academic_score_cat already defined)


Duplicates in terms of student_id record_id name surname

  +----------------------------------------------+
  | Obs   studen~d   record~d     name   surname |
  |----------------------------------------------|
  |   3    2751133          4   Mojaki   Nyabela |
  |  56    2751133          4   Mojaki   Nyabela |
  +----------------------------------------------+

(2 real changes made, 2 to missing)

(1 real change made, 1 to missing)

                                                               Obs<.
                                                +------------------------------
               |                                | Unique
      Variable |     Obs=.     Obs>.     Obs<.  | values        Min         Max
  -------------+--------------------------------+------------------------------
    student_id |         3                  53  |     48    1065346    1.05e+11
  -----------------------------------------------------------------------------


Duplicates in terms of student_id record_id

(1 observation deleted)

(1 real change made)

(1 real change made)

file Datasets\Combined_datasets.dta saved

Comments Only 7 variables have incomplete records.
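
The duplicate step above can also be written with a true temporary file, so that nothing is left on disk between runs. The following is a minimal sketch, assuming Stata's shipped auto dataset in place of the assignment data; the variables id and dup_flag and the macro dup_copy are hypothetical names introduced for illustration.

```stata
* Sketch only: Stata's shipped auto dataset stands in for the assignment data
sysuse auto, clear
* hypothetical unique key, one value per observation
generate long id = _n

* Copy one record into a temporary file (note the `' macro quotes)
preserve
keep if id == 5
tempfile dup_copy
save `dup_copy'
restore
append using `dup_copy'

* Detect the duplicate on the key
duplicates report id
duplicates tag id, generate(dup_flag)
list id make if dup_flag > 0

* Remove the duplicate and confirm the key is unique again
duplicates drop id, force
isid id
assert _N == 74
```

Because the file name lives in a local macro, Stata deletes the file when the do-file ends, so re-running the chunk should not raise the "already exists" error.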

Use of the dtable or other appropriate commands for Bivariate (Table 1) results and/or even regression (without any explanation), save the word based table as a separate attachment

Which variables are associated with academic score?

* dtable for Bivariate (Table 1) results and/or even regression*
  
use "Datasets\Combined_datasets.dta"  
dtable, by(academic_score_cat, tests testnotes totals) continuous(age attendence_days attendance_rate ) factor(gender residential_location income_range socioeconomic_background Financial_support_type finacial_support___1 ) title(Table1) titlestyles( font(, bold) )

file duplicate_copy.dta already exists
r(602);



note: using test regress across levels of academic_score_cat for age,
      attendence_days, and attendance_rate.
note: using test pearson across levels of academic_score_cat for gender,
      residential_location, income_range, socioeconomic_background,
      Financial_support_type, and finacial_support___1.

Table1
---------------------------------------------------------------------------------------------------------------------------------------------
                                                                                          academic_score_cat                                 
                                                              Fail        Third Class     Second Class    First Class        Total       Test
---------------------------------------------------------------------------------------------------------------------------------------------
N                                                              2 (3.8%)      10 (18.9%)      16 (30.2%)      25 (47.2%)     53 (100.0%)      
age                                                      24.000 (0.000) 30.100 (13.076)  24.688 (3.825)  23.440 (2.468)  25.094 (6.546) 0.051
attendence_days                                          23.000 (0.000)  37.900 (8.888) 34.875 (12.366) 42.960 (10.212) 38.811 (11.346) 0.022
attendance_rate                                          46.000 (0.000) 75.800 (17.775) 69.750 (24.732) 85.920 (20.424) 77.623 (22.692) 0.022
gender                                                                                                                                       
  Female                                                       0 (0.0%)       8 (80.0%)      12 (75.0%)      17 (68.0%)      37 (69.8%) 0.147
  Male                                                       2 (100.0%)       2 (20.0%)       4 (25.0%)       8 (32.0%)      16 (30.2%)      
residential_location                                                                                                                         
  Eastern Cape                                                 0 (0.0%)       2 (20.0%)       5 (31.2%)        0 (0.0%)       7 (13.2%) 0.004
  Free State                                                   0 (0.0%)       1 (10.0%)        0 (0.0%)        1 (4.0%)        2 (3.8%)      
  Gauteng                                                      0 (0.0%)       6 (60.0%)       8 (50.0%)      16 (64.0%)      30 (56.6%)      
  KwaZulu-Natal                                                0 (0.0%)        0 (0.0%)       3 (18.8%)       3 (12.0%)       6 (11.3%)      
  Limpopo                                                    2 (100.0%)       1 (10.0%)        0 (0.0%)        1 (4.0%)        4 (7.5%)      
  Mpumalanga                                                   0 (0.0%)        0 (0.0%)        0 (0.0%)        2 (8.0%)        2 (3.8%)      
  North West                                                   0 (0.0%)        0 (0.0%)        0 (0.0%)        1 (4.0%)        1 (1.9%)      
  Western Cape                                                 0 (0.0%)        0 (0.0%)        0 (0.0%)        1 (4.0%)        1 (1.9%)      
income_range                                                                                                                                 
  0-<10 000                                                    0 (0.0%)       3 (75.0%)       1 (14.3%)       8 (50.0%)      12 (41.4%) 0.002
  10 000-< 30 000                                              0 (0.0%)        0 (0.0%)       6 (85.7%)        1 (6.2%)       7 (24.1%)      
  30 000 -< 50 000                                             0 (0.0%)       1 (25.0%)        0 (0.0%)        1 (6.2%)        2 (6.9%)      
  50 000 -< 80 000                                           2 (100.0%)        0 (0.0%)        0 (0.0%)       2 (12.5%)       4 (13.8%)      
  80 000 -<150 000                                             0 (0.0%)        0 (0.0%)        0 (0.0%)       3 (18.8%)       3 (10.3%)      
  150 000+                                                     0 (0.0%)        0 (0.0%)        0 (0.0%)        1 (6.2%)        1 (3.4%)      
socioeconomic_background                                                                                                                     
  Low-income household ( R0 - R100 000 per annum)              0 (0.0%)       8 (80.0%)       7 (43.8%)      14 (56.0%)      29 (54.7%) 0.089
  Middle-incomehousehold (R100 000 - R500 000 per annum)     2 (100.0%)        0 (0.0%)       8 (50.0%)       9 (36.0%)      19 (35.8%)      
  High-income household (>R500 000 per annum)                  0 (0.0%)       2 (20.0%)        1 (6.2%)        2 (8.0%)        5 (9.4%)      
Financial_support_type                                                                                                                       
  Bursary                                                        0 (.%)       1 (25.0%)       3 (37.5%)       6 (37.5%)      10 (35.7%) 0.267
  Loan                                                           0 (.%)        0 (0.0%)        0 (0.0%)        1 (6.2%)        1 (3.6%)      
  Self-funded                                                    0 (.%)       3 (75.0%)       2 (25.0%)       2 (12.5%)       7 (25.0%)      
  Family support                                                 0 (.%)        0 (0.0%)       3 (37.5%)       7 (43.8%)      10 (35.7%)      
finacial_support___1                                                                                                                         
  0                                                          2 (100.0%)       9 (90.0%)      13 (81.2%)      16 (64.0%)      40 (75.5%) 0.277
  1                                                            0 (0.0%)       1 (10.0%)       3 (18.8%)       9 (36.0%)      13 (24.5%)      
---------------------------------------------------------------------------------------------------------------------------------------------

Comments

From the table output, attendance days and attendance rate are associated with academic performance: both have p = 0.022, which is below 0.05. This implies that academic performance differs according to how many days a student attends.
Residential location (p = 0.004) and income range (p = 0.002) are also associated with academic performance. This is evident from the Pearson chi-square test of association, which gives p-values below 0.05 for both variables, so the association goes beyond what is attributable to chance.
There is no evidence that the other variables are associated with academic performance, since their p-values are greater than 0.05; age is borderline (p = 0.051).
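
The notes under the dtable call report a Pearson test for the factor variables and a regression-based test for the continuous ones. The sketch below shows how comparable tests can be run one variable at a time; it is an illustration on Stata's shipped auto dataset, not the assignment data. Fisher's exact test is added here as an assumption: several cells in Table 1 hold zero or one student, and the chi-square approximation can be unreliable in such sparse tables, so the exact test may be worth running as a check.

```stata
* Sketch only: Stata's shipped auto dataset stands in for the assignment data
sysuse auto, clear

* Categorical by categorical: Pearson chi-square alongside Fisher's exact test
tabulate rep78 foreign, chi2 exact
display "Pearson p = " %6.4f r(p) "   Fisher's exact p = " %6.4f r(p_exact)

* Continuous across groups: one-way ANOVA F test of equal means
oneway mpg rep78
display "ANOVA p = " %6.4f Ftail(r(df_m), r(df_r), r(F))
```

If the exact and chi-square p-values lead to different conclusions for a sparse table, the exact result is usually the safer one to report.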

Drawing four different types of graphs and placing them on one graph.

use "Datasets\Combined_datasets.dta"  
*Plotting Graphs*

histogram age, normal title("Histogram of Age") note("Source: School of Public Health") ytitle("Density") name(graph1, replace)

graph hbar , over(financial_support_1) asyvars over(academic_score_cat) title("Academic score by Financial support")  name(graph2 , replace)

graph pie, over(program) title(Proportion By Program) name(graph3 , replace) 

graph box age, over(gender) title("Age by gender") name(graph4, replace) 

graph combine graph1 graph2 graph3 graph4,title("Four combined graphs") saving("all_graphs.gph" ,replace) 

quietly graph export all_graphs.svg, replace




(bin=7, start=18, width=5)




(file all_graphs.gph not found)
file all_graphs.gph saved

Question 3: Drawing a map in Stata to display data

Import the datasets for mapping

A shapefile that contains the boundaries of South Africa and its districts (called counties in the code below)
Geo-coordinates of major South African cities
Geographically disaggregated data we want to map (i.e. district population)

****Load stats SA population by districts for 2018*****
import delimited "Datasets\statssa_population_districts_2018.csv_", clear
rename adm2_id ADM2_ID
save "Datasets\pop_districts_final.dta", replace

*****loading the shapefiles*****
shp2dta using "Datasets\zaf_admbnda_adm2_sadb_ocha_20201109", database(SA_counties) coordinates(coord_SA) genid(county_id) replace



(encoding automatically selected: UTF-8)
(30 vars, 52 obs)


(file Datasets\pop_districts_final.dta not found)
file Datasets\pop_districts_final.dta saved

type: 5

Merge the datasets and draw the map

Merge the shapefile attribute data with the geographically disaggregated data we want to map (i.e. district population). To achieve this we need a variable that is common to both datasets; here that variable is ADM2_ID.
Having merged the district-level information with the master file, we use the spmap command to draw the map.
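
Before a 1:1 merge it can help to confirm that the key uniquely identifies rows in both files and that every row finds a match. The following is a minimal sketch on toy data: the district codes and population figures are made up for illustration, and only the variable names ADM2_ID, p_total and county_id are taken from this report.

```stata
* Sketch only: toy data with made-up district codes and populations
clear
input str5 ADM2_ID long p_total
"D001" 100000
"D002" 250000
"D003" 80000
end
* the key must be unique in the using file
isid ADM2_ID
tempfile pop
save `pop'

clear
input str5 ADM2_ID byte county_id
"D001" 1
"D002" 2
"D003" 3
end
* the key must be unique in the master file as well
isid ADM2_ID

* assert(match) stops with an error if any row fails to match
merge 1:1 ADM2_ID using `pop', assert(match) nogenerate
list, noobs
```

With assert(match), an unmatched district stops the do-file at the merge instead of surfacing later as a blank area on the map.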

*****Merge the datasets*********
use "SA_counties.dta", clear

merge 1:1 ADM2_ID using "Datasets\pop_districts_final"
cap drop _merge

graph bar (asis) p_total ,title("Population distribution by District,2018") over(ADM1_EN) asyvars over(ADM2_EN , sort(p_total) lab(angle(90)))  ysize(6) nofill legend(pos(3) col(1)) xsize(16) 
quietly graph export graph_hbar.svg, replace

***Draw the map****

/* specify the base map (coord_SA) and the variable identifying the geographic units (county_id);
   the legend labels replace the defaults (just cosmetics really) */
spmap p_total using "coord_SA.dta", id(county_id) legend(on) fcolor(Greens2) legend(label(2 "74247-502821") label(3 "502821-811400")  label(4 "811400-1164473.5" )  label(5 "1164473.5-4949347" ) ) title("Population by District, 2018")
quietly graph export graph_map1.svg, replace


    Result                      Number of obs
    -----------------------------------------
    Not matched                             0
    Matched                                52  (_merge==3)
    -----------------------------------------