Impact of Financial Support on Academic performance of Students in the Msc Epidemiology and Public health (Honours) Programs at Wits University

Group 8 Stata Assignment

Authors

Bongani Ncube

Pontso Molapo

Pahora Tsibogo

Ziyanda Majokweni

Isikelele Mpoto

Mojaki Nyabela

Published

February 25, 2025

Setup the Stata Engine in R

  • this involves loading the package Statamarkdown
  • defining the Stata installation path
#install.packages("Statamarkdown")
library(Statamarkdown)
stataexe <- "C:/Program Files/Stata18/StataSE-64.exe" 
knitr::opts_chunk$set(engine.path=list(stata=stataexe))

Load your dataset

Loading data for First form

  • Loading the Instrument Student Information dataset from the first form
  • this dataset also contains all demographics for the students
/* load the data file from csv */
import delimited "Datasets\Academic_Performance.csv"

save "Dofiles\Academic_Performance.dta" 
(encoding automatically selected: ISO-8859-1)
(11 vars, 51 obs)

file Dofiles\Academic_Performance.dta already exists
r(602);

r(602);

Load dataset for second form

  • load the academic performance dataset from the second form
/* load the data file from csv */
clear
import delimited "Datasets\Student_Information.csv"
save "Dofiles\Student_Information.dta" , replace
(encoding automatically selected: ISO-8859-1)
(41 vars, 51 obs)

file Dofiles\Student_Information.dta saved

Merging data (1:1 and/or 1:m) and appending data sets.

Merging datasets together

  • the two datasets from the two forms are merged using inner join or one to one merging using the `record_id`` Id column.
/*Merging the two datasets (student information and academic performance datasets)*/
use "Dofiles\Academic_Performance.dta"  
merge 1:1 record_id using "Dofiles\Student_Information.dta" 

save "Datasets\Combined_datasets.dta" , replace
> sets)*/

    Result                      Number of obs
    -----------------------------------------
    Not matched                             0
    Matched                                51  (_merge==3)
    -----------------------------------------

file Datasets\Combined_datasets.dta saved

Appending an additional dataset to the merged data

  • Append the merged dataset with 4 new observations (Students who responded late)
/*Merging the two datasets (student information and academic performance datasets)*/
use "Datasets\Combined_datasets.dta"  
append using "Datasets\Tobeappended.dta"

save "Datasets\Combined_datasets.dta" , replace
(label _merge already defined)

file Datasets\Combined_datasets.dta saved

Converting dates from either string or different columns to Stata friendly dates.

Explore the date of birth (dob) variable

use "Datasets\Combined_datasets.dta"
list dob in 1/5
     |        dob |
     |------------|
  1. | 1999-01-13 |
  2. | 2002-12-05 |
  3. | 2000-04-27 |
  4. | 1997-08-28 |
  5. | 2002-08-16 |
     +------------+

Comment + the variable is not in the format that stata understands hence we need to change it to a date format since it is a string

Changing to proper date format

/*Change the string format to date */

use "Datasets\Combined_datasets.dta",clear
generate _date_ = date(dob,"YMD")
format _date_ %dM_d,_CY
drop dob
rename _date_ dob

save "Datasets\Combined_datasets.dta" , replace
file Datasets\Combined_datasets.dta saved
  • checking for consistency
/*Compare the new variable*/
use "Datasets\Combined_datasets.dta"
list record_id name dob in 1/5
     | record~d       name                dob |
     |----------------------------------------|
  1. |        1     Pontso   January 13, 1999 |
  2. |        3   Thandeka   December 5, 2002 |
  3. |        4     Mojaki     April 27, 2000 |
  4. |        6     karabo    August 28, 1997 |
  5. |        7     Fezeka    August 16, 2002 |
     +----------------------------------------+

Demonstrate exporting and importing data back into Stata using any two data formats

of your choice.

*Exporting the dataset* 
use "Datasets\Combined_datasets.dta",clear
export delimited "Datasets\Exported_Combined.dta"

*Importing the dataset back into stata*
import delimited "Datasets\Exported_Combined.dta", clear
file Datasets\Exported_Combined.dta already exists
r(602);

r(602);

Saving data sets, labelling variables and creating value labels and making notes on dataset.

  • label the variables Gender , socioeconomic_background, residential location
**********************************DATA CLEANING**************************************************************
*Labeling variables Gender, socioeconomic_background, residential location*
  
*Labeling gender*
use "Datasets\Combined_datasets.dta"
label define gender 1 "Female" 2 "Male"
label value gender gender

*Label socioeconomic background*

lab def socioeconomic_background 1 "Low-income household ( R0 - R100 000 per annum)" 2 "Middle-incomehousehold (R100 000 - R500 000 per annum)" 3 "High-income household (>R500 000 per annum)"
lab val socioeconomic_background socioeconomic_background

*Label residential location*

lab define residential_location 1"Eastern Cape" 2"Free State" 3"Gauteng" 4"KwaZulu-Natal" 5"Limpopo" 6"Mpumalanga" 7"Northern Cape" 8"North West" 9"Western Cape"
lab value residential_location residential_location

save "Datasets\Combined_datasets.dta" , replace
file Datasets\Combined_datasets.dta saved

Create tables to check for consistency

use "Datasets\Combined_datasets.dta"

tab residential_location

tab gender

tab socioeconomic_background
residential_l |
      ocation |      Freq.     Percent        Cum.
--------------+-----------------------------------
 Eastern Cape |          7       12.73       12.73
   Free State |          2        3.64       16.36
      Gauteng |         32       58.18       74.55
KwaZulu-Natal |          6       10.91       85.45
      Limpopo |          4        7.27       92.73
   Mpumalanga |          2        3.64       96.36
   North West |          1        1.82       98.18
 Western Cape |          1        1.82      100.00
--------------+-----------------------------------
        Total |         55      100.00

     gender |      Freq.     Percent        Cum.
------------+-----------------------------------
     Female |         39       70.91       70.91
       Male |         16       29.09      100.00
------------+-----------------------------------
      Total |         55      100.00

               socioeconomic_background |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
Low-income household ( R0 - R100 000 pe |         29       52.73       52.73
Middle-incomehousehold (R100 000 - R500 |         21       38.18       90.91
High-income household (>R500 000 per an |          5        9.09      100.00
----------------------------------------+-----------------------------------
                                  Total |         55      100.00

Renaming variables in stata

  • renaming variables in stata using the code rename oldname newname
use "Datasets\Combined_datasets.dta"

*Renaming variables*
rename languages___1 Afrikaans
rename languages___2 English
rename languages___3 Isixhosa
rename languages___4 Isizulu
rename languages___5 IsiNdebele
rename languages___6 Sesotho
rename languages___7 Setswana
rename languages___8 Siswati
rename languages___9 Tshivenda
rename languages___10 Xitsonga
rename languages___11 Sepedi
rename languages___12 SignLanguage


*labeling variables*
label define Afrikaans 1"yes" 0"No", replace
label value Afrikaans Afrikaans
label define English 1"yes" 0"No", replace
label value English English
label define Isixhosa 1"yes" 0"No", replace
label value Isixhosa
label define Isizulu 1"yes" 0"No", replace
label value Isizulu Isizulu
label define IsiNdebele 1"yes" 0"No", replace
label value IsiNdebele IsiNdebele
label define Sesotho 1"yes" 0"No", replace
label value Sesotho Sesotho
label define Setswana 1"yes" 0"No", replace
label value Setswana Setswana
label define Siswati 1"yes" 0"No", replace
label value Siswati Siswati
label define Tshivenda 1"yes" 0"No", replace
label value Tshivenda Tshivenda
label define Xitsonga 1"yes" 0"No", replace
label value Xitsonga Xitsonga
label define Sepedi 1"yes" 0"No", replace
label value Sepedi Sepedi
label define SignLanguage 1"yes" 0"No", replace
label value SignLanguage SignLanguage

save "Datasets\Combined_datasets.dta" , replace
file Datasets\Combined_datasets.dta saved

Checking the language variables for consistency

use "Datasets\Combined_datasets.dta"
list name Afrikaans English Isizulu IsiNdebele IsiNdebele in 1/5
     |     name   Afrika~s   English   Isizulu   IsiNde~e   IsiNde~e |
     |---------------------------------------------------------------|
  1. |   Pontso         No       yes        No         No         No |
  2. | Thandeka         No       yes       yes         No         No |
  3. |   Mojaki         No       yes        No         No         No |
  4. |   karabo         No       yes        No         No         No |
  5. |   Fezeka         No       yes       yes         No         No |
     +---------------------------------------------------------------+

Generate a new variable called language

use "Datasets\Combined_datasets.dta"
gen language=.
replace language=1 if Afrikaans==1
replace language=3 if Isixhosa==1
replace language=4 if Isizulu ==1
replace language=5 if IsiNdebele==1
replace language=6 if Sesotho==1
replace language=7 if Setswana ==1
replace language=8 if Siswati==1
replace language=9 if Tshivenda==1
replace language=10 if Xitsonga ==1
replace language=11 if Sepedi ==1
replace language=12 if SignLanguage ==1

label define language 1 "Afrikaans" 2 "English" 3 " Isixhosa " 4 " Isizulu " 5 " IsiNdebele " 6 " Sesotho " 7"Setswana " 8 "Siswati" 9 "Tshivenda" 10 "Xitsonga" 11 "Sepedi" 12 "SignLanguage", replace
label value language language

save "Datasets\Combined_datasets.dta" , replace
(55 missing values generated)

(5 real changes made)

(15 real changes made)

(24 real changes made)

(2 real changes made)

(13 real changes made)

(7 real changes made)

(4 real changes made)

(2 real changes made)

(1 real change made)

(8 real changes made)

(0 real changes made)



file Datasets\Combined_datasets.dta saved

Checking the language variable for consistency

use "Datasets\Combined_datasets.dta"
list record_id language in 1/5
     | record~d   language |
     |---------------------|
  1. |        1    Sesotho |
  2. |        3    Isizulu |
  3. |        4    Sesotho |
  4. |        6     Sepedi |
  5. |        7    Isizulu |
     +---------------------+

Label the Program variable

*labeling the variable program*
use "Datasets\Combined_datasets.dta"
label define program 1 "MSc Epidemiology" 2 "BHSc (Honours) in Public Health"
label value program program
save "Datasets\Combined_datasets.dta" , replace
file Datasets\Combined_datasets.dta saved

Checking the Program variable for consistency

use "Datasets\Combined_datasets.dta"
list record_id name program in 1/10
     | record~d        name                           program |
     |--------------------------------------------------------|
  1. |        1      Pontso                  MSc Epidemiology |
  2. |        3    Thandeka                  MSc Epidemiology |
  3. |        4      Mojaki                  MSc Epidemiology |
  4. |        6      karabo                  MSc Epidemiology |
  5. |        7      Fezeka                  MSc Epidemiology |
     |--------------------------------------------------------|
  6. |        9    Akuphelo                  MSc Epidemiology |
  7. |       10     Zandile                  MSc Epidemiology |
  8. |       11        John                  MSc Epidemiology |
  9. |       12    Nelisiwe                  MSc Epidemiology |
 10. |       14   Isikelele   BHSc (Honours) in Public Health |
     +--------------------------------------------------------+

Generating and manipulating data using the commands/functions:

  • gen, encode, egen, recode (with labels on the categories) to create one categorical variable in these three ways and show the results using different text and graphic summaries.
*recoding the variable age*
use "Datasets\Combined_datasets.dta"
recode age (min/18=1 "18 years and below") (19/25=2 "19 to 25 years") (26/32=3 "26 to 32 years") (33/40=4 "33 to 40 years") (41/max=5 "41 and above"), gen(Age_cat) label (agelabel)

*using the egen command to generate a variable academic_score*
egen academic_score_cat= cut( academic_score ), at(0, 49,65,75,100) label
label define academic_score_cat 0 "Fail" 1 "Third Class" 2 "Second Class" 3 "First Class", replace

save "Datasets\Combined_datasets.dta" , replace
(55 differences between age and Age_cat)

(2 missing values generated)


file Datasets\Combined_datasets.dta saved

Check missing data using two or three approaches and explain the outputs or

missingness.

use "Datasets\Combined_datasets.dta"
*Checking the missing values within the dataset using 3 methods*
misstable summarize 
*Provides a summary of missing values for each variable in the dataset.*
misstable pattern
*Displays patterns of missing values across multiple variables.*
misstable tree
*Creates a decision tree that shows how missing values are related across variables.*
                                                               Obs<.
                                                +------------------------------
               |                                | Unique
      Variable |     Obs=.     Obs>.     Obs<.  | values        Min         Max
  -------------+--------------------------------+------------------------------
  redcap_sur~r |        55                   0  |      0          .           .
  academic_s~e |         2                  53  |     20          5          95
  attendence~s |         2                  53  |     17          5          50
  attendance~e |         2                  53  |     17         10         100
  research_a~s |         2                  53  |      2          0           1
  workshops_~d |         6                  49  |     10          0           9
  financial_~s |         2                  53  |      2          0           1
  other_resp~s |        55                   0  |      0          .           .
  enrolment_~r |         3                  52  |      8          2        2028
  other_resp~e |        55                   0  |      0          .           .
  sufficienc~e |        27                  28  |      8          2          10
  financial_~t |        29                  26  |     12       1000       15000
  income_range |        26                  29  |      6          1           6
   travel_mode |        32                  23  |      5          1           5
  other_repl~s |        55                   0  |      0          .           .
  commute_le~h |         3                  52  |     15          5          90
      language |         6                  49  |     10          1          11
  academic_s~t |         2                  53  |      4          0           3
  -----------------------------------------------------------------------------

                        Missing-value patterns
                          (1 means complete)

              |   Pattern
    Percent   |  1  2  3  4    5  6  7  8    9 10 11 12   13 14 15 16
  ------------+-------------------------------------------------------
       <1%    |  1  1  1  1    1  1  1  1    1  1  1  1    1  1  1  1
              |  1  1
              |
       38     |  1  1  1  1    1  1  1  1    1  1  1  1    1  0  0  0
              |  0  0
              |
       33     |  1  1  1  1    1  1  1  1    1  1  0  0    0  1  0  0
              |  0  0
              |
        5     |  1  1  1  1    1  1  1  1    1  0  1  1    1  0  0  0
              |  0  0
              |
        4     |  0  0  0  0    0  0  1  1    0  1  0  0    0  1  0  0
              |  0  0
              |
        4     |  1  1  1  1    1  1  0  1    0  0  0  0    0  0  0  0
              |  0  0
              |
        4     |  1  1  1  1    1  1  1  1    1  1  1  1    0  0  0  0
              |  0  0
              |
        2     |  1  1  1  1    1  1  0  1    1  1  0  0    0  1  0  0
              |  0  0
              |
        2     |  1  1  1  1    1  1  1  0    0  1  0  0    0  1  0  0
              |  0  0
              |
        2     |  1  1  1  1    1  1  1  0    1  1  0  0    0  1  0  0
              |  0  0
              |
        2     |  1  1  1  1    1  1  1  0    1  1  1  0    1  0  0  0
              |  0  0
              |
        2     |  1  1  1  1    1  1  1  1    0  1  1  1    0  0  0  0
              |  0  0
              |
        2     |  1  1  1  1    1  1  1  1    1  0  1  0    1  0  0  0
              |  0  0
              |
        2     |  1  1  1  1    1  1  1  1    1  1  0  1    0  0  0  0
              |  0  0
              |
  ------------+-------------------------------------------------------
      100%    |

  Variables are
      Row 1:   (1) academic_score  (2) academic_score_cat
               (3) attendance_rate  (4) attendence_days
               (5) financial_constraints  (6) research_activities
               (7) commute_length  (8) enrolment_year  (9) workshops_attended
               (10) language  (11) income_range  (12) sufficiency_scale
               (13) financial_support  (14) travel_mode  (15) other_replies
               (16) other_response
      Row 2:   (1) other_responses  (2) redcap_survey_identifier

(only 7 variables shown)

  Nested pattern of missing values
  other_~ies other_re~e other_~ses redcap_s~r travel_m~e financia~t sufficie~e
  ----------------------------------------------------------------------------
         100%       100%       100%       100%        58%        11%         4%
                                                                             7 
                                                                 47          4 
                                                                            44 
                                                      42         42         42 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                            0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                 0          0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                            0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                      0          0          0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                            0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                 0          0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                            0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
           0          0          0          0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                            0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                 0          0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                            0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                      0          0          0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                            0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                 0          0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                            0          0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
                                                       0          0          0 
                                                                             0 
                                                                  0          0 
                                                                             0 
  ----------------------------------------------------------------------------
 (percent missing listed first)

Introduce duplicates and errors, then check the data for duplicates and clean data of

duplicates and any errors.

*Introducing a duplicate and clean data of duplicates and any errors*
use "Datasets\Combined_datasets.dta"  
preserve
keep if student_id == 2751133
tempfile duplicate_copy
save "duplicate_copy"

restore
append using "duplicate_copy"

*checking for duplicates*
  
duplicates list student_id record_id name surname

*Introduction a missing value error*
replace student_id = . if record_id == 4
replace student_id = . if record_id == 10

*checking for a missing value error*
misstable summarize student_id

*Cleaning data for introduced duplicates*
duplicates drop student_id record_id, force

*Cleaning data for missing value error*
replace student_id = 2751133 if record_id == 4
replace student_id = 2331203 if record_id == 10

save "Datasets\Combined_datasets.dta" , replace
(54 observations deleted)


file duplicate_copy.dta already exists
r(602);

r(602);

Use of the dtable or other appropriate commands for Bivariate (Table 1) results and/or

even regression (without any explanation), save the word based table as a separate attachment

* dtable for Bivariate (Table 1) results and/or even regression*
  
use "Datasets\Combined_datasets.dta"  
dtable, by(program, tests testnotes) continuous(age attendence_days attendance_rate ) factor(gender residential_location income_range)
file duplicate_copy.dta already exists
r(602);



note: using test regress across levels of program for age, attendence_days,
      and attendance_rate.
note: using test pearson across levels of program for gender,
      residential_location, and income_range.

-------------------------------------------------------------------------------------------
                                                     program                               
                     MSc Epidemiology BHSc (Honours) in Public Health      Total       Test
-------------------------------------------------------------------------------------------
N                          31 (56.4%)                      24 (43.6%)     55 (100.0%)      
age                    26.613 (8.011)                  23.083 (2.535)  25.073 (6.443) 0.043
attendence_days       37.933 (13.419)                  39.957 (8.031) 38.811 (11.346) 0.525
attendance_rate       75.867 (26.838)                 79.913 (16.062) 77.623 (22.692) 0.525
gender                                                                                     
  Female                   22 (71.0%)                      17 (70.8%)      39 (70.9%) 0.991
  Male                      9 (29.0%)                       7 (29.2%)      16 (29.1%)      
residential_location                                                                       
  Eastern Cape               3 (9.7%)                       4 (16.7%)       7 (12.7%) 0.097
  Free State                 2 (6.5%)                        0 (0.0%)        2 (3.6%)      
  Gauteng                  20 (64.5%)                      12 (50.0%)      32 (58.2%)      
  KwaZulu-Natal              2 (6.5%)                       4 (16.7%)       6 (10.9%)      
  Limpopo                   4 (12.9%)                        0 (0.0%)        4 (7.3%)      
  Mpumalanga                 0 (0.0%)                        2 (8.3%)        2 (3.6%)      
  North West                 0 (0.0%)                        1 (4.2%)        1 (1.8%)      
  Western Cape               0 (0.0%)                        1 (4.2%)        1 (1.8%)      
income_range                                                                               
  1                         4 (28.6%)                       8 (53.3%)      12 (41.4%) 0.582
  2                         3 (21.4%)                       4 (26.7%)       7 (24.1%)      
  3                          1 (7.1%)                        1 (6.7%)        2 (6.9%)      
  4                         3 (21.4%)                        1 (6.7%)       4 (13.8%)      
  5                         2 (14.3%)                        1 (6.7%)       3 (10.3%)      
  6                          1 (7.1%)                        0 (0.0%)        1 (3.4%)      
-------------------------------------------------------------------------------------------

Drawing four different types of graphs and place them on one graph.

use "Datasets\Combined_datasets.dta"  
*Plotting Graphs*

histogram age ,title("Histogram of Age") note( "Source : Data source unknown") ytitle("Fraction") normal ytitle("Density") name(graph1 , replace)

graph hbar, over(residential_location) name(graph2 , replace)

graph pie, over(program) title(Proportion By Program) name(graph3 , replace) 

graph box age, over(gender) name(graph4 ,replace)

graph combine graph1 graph2 graph3 graph4,title("Four combined graphs") saving("all_graphs.gph" ,replace) 

quietly graph export all_graphs.svg, replace
file duplicate_copy.dta already exists
r(602);



(bin=7, start=18, width=5)




file all_graphs.gph saved

Stata Graph - Graph 0 .02 .04 .06 .08 .1 Density 20 30 40 50 60 age Source : Data source unknown Histogram of Age 0 20 40 60 percent Western Cape North West Mpumalanga Limpopo KwaZulu-Natal Gauteng Free State Eastern Cape MSc Epidemiology BHSc (Honours) in Public Health Proportion By Program 20 30 40 50 60 age Female Male Four combined graphs

Question 3: Drawing a map in Stata to display data

****Load stats SA population by districts for 2018*****
import delimited "Datasets\statssa_population_districts_2018.csv_", clear
rename adm2_id ADM2_ID
save "Datasets\pop_districts_final.dta", replace

*****loading the shapefiles*****
shp2dta using "Datasets\zaf_admbnda_adm2_sadb_ocha_20201109", database(SA_counties) coordinates(coord_SA) genid(county_id) replace
file duplicate_copy.dta already exists
r(602);


(encoding automatically selected: UTF-8)
(30 vars, 52 obs)


file Datasets\pop_districts_final.dta saved

type: 5
*****Merge the datasets*********
use "SA_counties.dta", clear

merge 1:1 ADM2_ID using "Datasets\pop_districts_final"
cap drop _merge

graph bar (asis) p_total ,over(ADM1_EN) asyvars over(ADM2_EN , sort(p_total) lab(angle(90)))  ysize(10) nofill legend(pos(3) col(1)) xsize(22) 
quietly graph export graph_hbar.svg, replace

***Draw the map****

spmap p_total using "coord_SA.dta", id(county_id) legend(on) fcolor(Greens2) legend(label(2 "74247-502821") label(3 "502821-811400")  label(4 "811400,1164473.5" )  label(5 "31164473.5,4949347" ) ) 
 /*specify base map (kencoord) and variable identifying relevant geographic units (county_id)*/

  
/*change default labels (just cosmetics really) */
quietly graph export graph_map1.svg, replace
file duplicate_copy.dta already exists
r(602);

    Result                      Number of obs
    -----------------------------------------
    Not matched                             0
    Matched                                52  (_merge==3)
    -----------------------------------------

Stata Graph - Graph 0 1.0e+06 2.0e+06 3.0e+06 4.0e+06 5.0e+06 p_Total Central Karoo Namakwa Xhariep Pixley ka Seme John Taolo Gaetsewe Z F Mgcawu Overberg Joe Gqabi Frances Baard West Coast Dr Ruth Segomotsi Mompati Cacadu Fezile Dabi Sisonke Amajuba Umzinyathi Eden Lejweleputswa iLembe Umkhanyakude Uthukela Dr Kenneth Kaunda Waterberg Ugu Thabo Mofutsanyane Mangaung Buffalo City West Rand Chris Hani Cape Winelands Alfred Nzo Amathole Ngaka Modiri Molema Zululand Sedibeng Uthungulu Umgungundlovu Gert Sibande Mopani Sekhukhune Nelson Mandela Bay Capricorn Vhembe Nkangala O.R.Tambo Bojanala Ehlanzeni City of Tshwane Ekurhuleni eThekwini City of Cape Town City of Johannesburg Eastern Cape Free State Gauteng KwaZulu-Natal Limpopo Mpumalanga North West Nothern Cape Western Cape

Stata Graph - Graph (1164473.5,4949347] (811400,1164473.5] (502821,811400] [74247,502821]