#install.packages("Statamarkdown")
library(Statamarkdown)
stataexe <- "C:/Program Files/Stata18/StataSE-64.exe"
knitr::opts_chunk$set(engine.path=list(stata=stataexe))
Impact of Financial Support on Academic Performance of Students in the MSc Epidemiology and Public Health (Honours) Programs at Wits University
Group 8 Stata Assignment
REDCap
Comment: The screenshot above shows that more than two users were assigned for data quality checks on REDCap and that they were working during testing.
Setup the Stata Engine in R
- This involves loading the Statamarkdown package
- Defining the Stata installation path
Load your dataset
Loading data for First form
- Loading the Instrument Student Information dataset from the first form
- this dataset also contains all demographics for the students
/* load the data file from csv */
import delimited "Datasets\Academic_Performance.csv"
save "Datasets\Academic_Performance.dta"
(encoding automatically selected: ISO-8859-1)
(11 vars, 51 obs)
file Datasets\Academic_Performance.dta saved
Load dataset for second form
- load the academic performance dataset from the second form
/* load the data file from csv */
import delimited "Datasets\Student_Information.csv"
save "Datasets\Student_Information.dta"
(encoding automatically selected: ISO-8859-1)
(41 vars, 51 obs)
file Datasets\Student_Information.dta saved
Merging data (1:1 and/or 1:m) and appending data sets.
Merging datasets together
- merge joins corresponding observations from the dataset currently in memory (called the master dataset) with those from filename.dta (called the using dataset), matching on one or more key variables.
- merge can perform match merges (one-to-one, one-to-many, many-to-one, and many-to-many).
- merge creates a new variable, _merge, containing numeric codes concerning the source and the contents of each observation in the merged dataset.
- You will need to drop the _merge variable if you plan to continue merging other datasets.
- The two datasets from the two forms are merged one-to-one (1:1) on the `record_id` ID column; because every record matches, the result is the same as an inner join.
/*Merging the two datasets (student information and academic performance datasets)*/
use "Datasets\Academic_Performance.dta"
merge 1:1 record_id using "Datasets\Student_Information.dta"
save "Datasets\Combined_datasets.dta" , replace
Result Number of obs
-----------------------------------------
Not matched 0
Matched 51 (_merge==3)
-----------------------------------------
(file Datasets\Combined_datasets.dta not found)
file Datasets\Combined_datasets.dta saved
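A minimal sketch of checking the merge result and dropping _merge before any further merge, assuming the same file names as above. The assert line is an added safeguard and is not part of the original workflow.

```stata
* Merge the two forms and confirm that every record matched
use "Datasets\Academic_Performance.dta", clear
merge 1:1 record_id using "Datasets\Student_Information.dta"

* Tabulate the match codes (3 = matched in both datasets)
tab _merge

* Stop with an error if any record failed to match
assert _merge == 3

* Drop _merge so that a later merge can create it again
drop _merge
```

Dropping _merge before saving would also avoid the "label _merge already defined" message that appears when datasets carrying that label are appended later.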
Appending an additional dataset to the merged data
- Append the merged dataset with 4 additional observations to increase power.
/*Appending the additional observations to the merged dataset*/
use "Datasets\Combined_datasets.dta"
append using "Datasets\Tobeappended.dta"
save "Datasets\Combined_datasets.dta" , replace
(label _merge already defined)
file Datasets\Combined_datasets.dta saved
Converting dates from either string or different columns to Stata friendly dates.
Explore the date of birth (dob) variable
use "Datasets\Combined_datasets.dta"
list record_id dob in 1/5
des dob
| record~d dob |
|-----------------------|
1. | 1 1999-01-13 |
2. | 3 2002-12-05 |
3. | 4 2000-04-27 |
4. | 6 1997-08-28 |
5. | 7 2002-08-16 |
+-----------------------+
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------
dob str10 %10s
Comment: the variable is stored as a string (str10), which Stata cannot treat as a date, so it needs to be converted to a Stata date format.
Changing to proper date format
/*Change the string format to date */
use "Datasets\Combined_datasets.dta",clear
generate _date_ = date(dob,"YMD")
format _date_ %dM_d,_CY
drop dob
rename _date_ dob
des dob
save "Datasets\Combined_datasets.dta" , replace
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------
dob float %dM_d,_CY
file Datasets\Combined_datasets.dta saved
- checking for consistency
/*Compare the new variable*/
use "Datasets\Combined_datasets.dta"
list record_id name dob in 1/5
| record~d name dob |
|----------------------------------------|
1. | 1 Pontso January 13, 1999 |
2. | 3 Thandeka December 5, 2002 |
3. | 4 Mojaki April 27, 2000 |
4. | 6 karabo August 28, 1997 |
5. | 7 Fezeka August 16, 2002 |
+----------------------------------------+
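The section heading also mentions dates held in different columns. The sketch below illustrates that case with mdy(); the variables birth_day, birth_month and birth_year are hypothetical and do not exist in the assignment dataset, so the example builds a two-row dataset of its own.

```stata
* Build a small illustrative dataset (hypothetical variable names)
clear
input byte birth_day byte birth_month int birth_year
13 1 1999
5 12 2002
end

* mdy() takes month, day, year and returns a Stata daily date
generate dob = mdy(birth_month, birth_day, birth_year)

* Apply a daily date display format
format dob %td
list
```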
Demonstrate exporting and importing data back into Stata using any two data formats of your choice.
*Exporting the dataset*
use "Datasets\Combined_datasets.dta",clear
export delimited "Datasets\Exported_Combined.dta"
*Importing the dataset back into stata*
import delimited "Datasets\Exported_Combined.dta", clear
file Datasets\Exported_Combined.dta saved
(encoding automatically selected: ISO-8859-1)
(52 vars, 55 obs)
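The export above writes delimited text into a file that carries a .dta extension. A possible variant, sketched here with assumed output file names, uses a .csv extension for the delimited file and adds Excel as the second format.

```stata
* Format 1: delimited text (file name is an assumption)
use "Datasets\Combined_datasets.dta", clear
export delimited using "Datasets\Exported_Combined.csv", replace
import delimited using "Datasets\Exported_Combined.csv", clear

* Format 2: Excel workbook (file name is an assumption)
use "Datasets\Combined_datasets.dta", clear
export excel using "Datasets\Exported_Combined.xlsx", firstrow(variables) replace
import excel using "Datasets\Exported_Combined.xlsx", firstrow clear
```

Neither format keeps Stata value labels as attached labels on re-import, so the .dta file remains the working copy.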
Saving data sets, labelling variables and creating value labels and making notes on dataset.
- Label the variables gender, socioeconomic_background, residential_location, income_range and financial_support_1
**********************************DATA CLEANING**************************************************************
*Labeling variables Gender, socioeconomic_background, residential location*
*Labeling gender*
use "Datasets\Combined_datasets.dta"
label define gender 1 "Female" 2 "Male"
label value gender gender
*Label socioeconomic background*
lab def socioeconomic_background 1 "Low-income household ( R0 - R100 000 per annum)" 2 "Middle-incomehousehold (R100 000 - R500 000 per annum)" 3 "High-income household (>R500 000 per annum)"
lab val socioeconomic_background socioeconomic_background
*Label residential location*
lab define residential_location 1"Eastern Cape" 2"Free State" 3"Gauteng" 4"KwaZulu-Natal" 5"Limpopo" 6"Mpumalanga" 7"Northern Cape" 8"North West" 9"Western Cape"
lab value residential_location residential_location
*Label income range *
lab define income_range 1 "0-<10 000" 2 "10 000-< 30 000" 3 "30 000 -< 50 000" 4 "50 000 -< 80 000" 5 "80 000 -<150 000" 6 "150 000+"
lab value income_range income_range
* label variable financial_support_1*
label define financial_support_1 1 "Yes" 0 "No" , replace
label value financial_support_1 financial_support_1
*Labeling Program Variable*
label define program 1 "MSc Epidemiology" 2 "BHSc (Honours) in Public Health"
label value program program
save "Datasets\Combined_datasets.dta" , replace
file Datasets\Combined_datasets.dta saved
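The code above defines value labels only. The heading also mentions labelling variables and making notes on the dataset, which could be sketched as follows; all label and note text is illustrative and not taken from the original.

```stata
use "Datasets\Combined_datasets.dta", clear

* Variable labels (illustrative text)
label variable gender "Gender of student"
label variable residential_location "Province of residence"
label variable financial_support_1 "Received financial support"

* Dataset label and notes (illustrative text)
label data "Group 8: financial support and academic performance"
notes: Merged from the Student Information and Academic Performance forms
notes gender: 1 = Female, 2 = Male

* Display all notes and confirm the variable labels
notes
describe gender residential_location financial_support_1
```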
Create tables to check for consistency
use "Datasets\Combined_datasets.dta"
tab residential_location
tab gender
tab socioeconomic_background
tab financial_support_1
residential_l |
ocation | Freq. Percent Cum.
--------------+-----------------------------------
Eastern Cape | 7 12.73 12.73
Free State | 2 3.64 16.36
Gauteng | 32 58.18 74.55
KwaZulu-Natal | 6 10.91 85.45
Limpopo | 4 7.27 92.73
Mpumalanga | 2 3.64 96.36
North West | 1 1.82 98.18
Western Cape | 1 1.82 100.00
--------------+-----------------------------------
Total | 55 100.00
gender | Freq. Percent Cum.
------------+-----------------------------------
Female | 39 70.91 70.91
Male | 16 29.09 100.00
------------+-----------------------------------
Total | 55 100.00
socioeconomic_background | Freq. Percent Cum.
----------------------------------------+-----------------------------------
Low-income household ( R0 - R100 000 pe | 29 52.73 52.73
Middle-incomehousehold (R100 000 - R500 | 21 38.18 90.91
High-income household (>R500 000 per an | 5 9.09 100.00
----------------------------------------+-----------------------------------
Total | 55 100.00
financial_s |
upport_1 | Freq. Percent Cum.
------------+-----------------------------------
No | 23 41.82 41.82
Yes | 32 58.18 100.00
------------+-----------------------------------
Total | 55 100.00
Checking the Program variable for consistency
use "Datasets\Combined_datasets.dta"
list record_id name program in 1/10
| record~d name program |
|--------------------------------------------------------|
1. | 1 Pontso MSc Epidemiology |
2. | 3 Thandeka MSc Epidemiology |
3. | 4 Mojaki MSc Epidemiology |
4. | 6 karabo MSc Epidemiology |
5. | 7 Fezeka MSc Epidemiology |
|--------------------------------------------------------|
6. | 9 Akuphelo MSc Epidemiology |
7. | 10 Zandile MSc Epidemiology |
8. | 11 John MSc Epidemiology |
9. | 12 Nelisiwe MSc Epidemiology |
10. | 14 Isikelele BHSc (Honours) in Public Health |
+--------------------------------------------------------+
Renaming variables in stata
- Variables are renamed in Stata with the syntax rename oldname newname
use "Datasets\Combined_datasets.dta"
*Renaming variables*
rename languages___1 Afrikaans
rename languages___2 English
rename languages___3 Isixhosa
rename languages___4 Isizulu
rename languages___5 IsiNdebele
rename languages___6 Sesotho
rename languages___7 Setswana
rename languages___8 Siswati
rename languages___9 Tshivenda
rename languages___10 Xitsonga
rename languages___11 Sepedi
rename languages___12 SignLanguage
*labeling variables*
label define Afrikaans 1"yes" 0"No", replace
label value Afrikaans Afrikaans
label define English 1"yes" 0"No", replace
label value English English
label define Isixhosa 1"yes" 0"No", replace
label value Isixhosa Isixhosa
label define Isizulu 1"yes" 0"No", replace
label value Isizulu Isizulu
label define IsiNdebele 1"yes" 0"No", replace
label value IsiNdebele IsiNdebele
label define Sesotho 1"yes" 0"No", replace
label value Sesotho Sesotho
label define Setswana 1"yes" 0"No", replace
label value Setswana Setswana
label define Siswati 1"yes" 0"No", replace
label value Siswati Siswati
label define Tshivenda 1"yes" 0"No", replace
label value Tshivenda Tshivenda
label define Xitsonga 1"yes" 0"No", replace
label value Xitsonga Xitsonga
label define Sepedi 1"yes" 0"No", replace
label value Sepedi Sepedi
label define SignLanguage 1"yes" 0"No", replace
label value SignLanguage SignLanguage
save "Datasets\Combined_datasets.dta" , replace
file Datasets\Combined_datasets.dta saved
Checking the language variables for consistency
use "Datasets\Combined_datasets.dta"
list name Afrikaans English Isizulu IsiNdebele IsiNdebele in 1/5
| name Afrika~s English Isizulu IsiNde~e IsiNde~e |
|---------------------------------------------------------------|
1. | Pontso No yes No No No |
2. | Thandeka No yes yes No No |
3. | Mojaki No yes No No No |
4. | karabo No yes No No No |
5. | Fezeka No yes yes No No |
+---------------------------------------------------------------+
Generate a new variable called language
use "Datasets\Combined_datasets.dta"
*Create a new variable called language*
gen language=.
replace language=1 if Afrikaans==1
replace language=2 if English==1
replace language=3 if Isixhosa==1
replace language=4 if Isizulu ==1
replace language=5 if IsiNdebele==1
replace language=6 if Sesotho==1
replace language=7 if Setswana ==1
replace language=8 if Siswati==1
replace language=9 if Tshivenda==1
replace language=10 if Xitsonga ==1
replace language=11 if Sepedi ==1
replace language=12 if SignLanguage ==1
label define language 1 "Afrikaans" 2 "English" 3 " Isixhosa " 4 " Isizulu " 5 " IsiNdebele " 6 " Sesotho " 7"Setswana " 8 "Siswati" 9 "Tshivenda" 10 "Xitsonga" 11 "Sepedi" 12 "SignLanguage", replace
label value language language
*Create a new variable called financial_support_type*
gen Financial_support_type=.
replace Financial_support_type=1 if finacial_support___1==1
replace Financial_support_type=2 if finacial_support___2==1
replace Financial_support_type=3 if finacial_support___3==1
replace Financial_support_type=4 if finacial_support___4==1
replace Financial_support_type=5 if finacial_support___5==1
label define Financial_support_type 1 "Bursary" 2 "Loan" 3 "Self-funded" 4 "Family support" 5 "Other" , replace
label value Financial_support_type Financial_support_type
save "Datasets\Combined_datasets.dta" , replace
(55 missing values generated)
(5 real changes made)
(43 real changes made)
(15 real changes made)
(24 real changes made)
(2 real changes made)
(13 real changes made)
(7 real changes made)
(4 real changes made)
(2 real changes made)
(1 real change made)
(8 real changes made)
(0 real changes made)
(55 missing values generated)
(13 real changes made)
(1 real change made)
(7 real changes made)
(10 real changes made)
(0 real changes made)
file Datasets\Combined_datasets.dta saved
Checking the language variable for consistency
use "Datasets\Combined_datasets.dta"
list record_id language in 1/5
| record~d language |
|---------------------|
1. | 1 Sesotho |
2. | 3 Isizulu |
3. | 4 Sesotho |
4. | 6 Sepedi |
5. | 7 Isizulu |
+---------------------+
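Because each replace overwrites the previous one, a student who selected several languages ends up coded with the last language in the list (for example, record 1 has English marked yes but is listed as Sesotho). A small check, assuming the renamed indicator variables are coded 0/1, counts how many languages each student selected; n_languages is a hypothetical variable name.

```stata
use "Datasets\Combined_datasets.dta", clear

* Count the number of languages selected per student
egen n_languages = rowtotal(Afrikaans English Isixhosa Isizulu IsiNdebele ///
    Sesotho Setswana Siswati Tshivenda Xitsonga Sepedi SignLanguage)

* Distribution of the count
tab n_languages

* Students whose single language code hides additional languages
list record_id language n_languages if n_languages > 1
```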
Generating and manipulating data using the commands/functions:
- gen, encode, egen, recode (with labels on the categories) to create one categorical variable in these three ways and show the results using different text and graphic summaries.
*recoding the variable age*
use "Datasets\Combined_datasets.dta"
recode age (min/18=1 "18 years and below") (19/25=2 "19 to 25 years") (26/32=3 "26 to 32 years") (33/40=4 "33 to 40 years") (41/max=5 "41 and above"), gen(Age_cat) label(agelabel)
* using the egen command to generate a variable academic_score*
egen academic_score_cat= cut( academic_score ), at(0, 49,65,75,100) label
label define academic_score_cat 0 "Fail" 1 "Third Class" 2 "Second Class" 3 "First Class", replace
save "Datasets\Combined_datasets.dta" , replace
(55 differences between age and Age_cat)
(2 missing values generated)
file Datasets\Combined_datasets.dta saved
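The heading asks for the categorical variable to be created in more than one way. A sketch of the gen/replace route, using the same cut points as the egen command above, is shown below; academic_score_gen is a hypothetical name, and the cross-tabulation is only a consistency check.

```stata
use "Datasets\Combined_datasets.dta", clear

* Same bands as at(0, 49, 65, 75, 100): lower bound included, upper bound excluded
gen academic_score_gen = .
replace academic_score_gen = 0 if academic_score >= 0  & academic_score < 49
replace academic_score_gen = 1 if academic_score >= 49 & academic_score < 65
replace academic_score_gen = 2 if academic_score >= 65 & academic_score < 75
replace academic_score_gen = 3 if academic_score >= 75 & academic_score < 100

* Reuse the value label defined for the egen version
label values academic_score_gen academic_score_cat

* The two versions should agree on every observation
tab academic_score_cat academic_score_gen, missing
```

Missing scores stay missing in both versions, since a missing value fails every upper-bound condition.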
Check missing data using two or three approaches and explain the outputs or missingness.
use "Datasets\Combined_datasets.dta"
*Checking the missing values within the dataset using 3 methods*
misstable summarize
*Provides a summary of missing values for each variable in the dataset.*
misstable pattern
*Displays patterns of missing values across multiple variables.*
misstable tree
*Creates a decision tree that shows how missing values are related across variables.*
Obs<.
+------------------------------
| | Unique
Variable | Obs=. Obs>. Obs<. | values Min Max
-------------+--------------------------------+------------------------------
redcap_sur~r | 55 0 | 0 . .
academic_s~e | 2 53 | 20 5 95
attendence~s | 2 53 | 17 5 50
attendance~e | 2 53 | 17 10 100
research_a~s | 2 53 | 2 0 1
workshops_~d | 6 49 | 10 0 9
financial_~s | 2 53 | 2 0 1
other_resp~s | 55 0 | 0 . .
enrolment_~r | 3 52 | 8 2 2028
other_resp~e | 55 0 | 0 . .
sufficienc~e | 27 28 | 8 2 10
financial_~t | 29 26 | 12 1000 15000
income_range | 26 29 | 6 1 6
travel_mode | 32 23 | 5 1 5
other_repl~s | 55 0 | 0 . .
commute_le~h | 3 52 | 15 5 90
language | 2 53 | 10 2 11
Financial_~e | 27 28 | 4 1 4
academic_s~t | 2 53 | 4 0 3
-----------------------------------------------------------------------------
Missing-value patterns
(1 means complete)
| Pattern
Percent | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
------------+-------------------------------------------------------
<1% | 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
| 1 1 1
|
40 | 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
| 0 0 0
|
33 | 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 0
| 0 0 0
|
4 | 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 0
| 0 0 0
|
4 | 1 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0
| 0 0 0
|
4 | 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0
| 0 0 0
|
4 | 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
| 0 0 0
|
2 | 1 1 1 1 1 1 1 0 1 1 0 0 0 0 1 0
| 0 0 0
|
2 | 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0
| 0 0 0
|
2 | 1 1 1 1 1 1 1 1 0 1 0 0 0 0 1 0
| 0 0 0
|
2 | 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 0
| 0 0 0
|
2 | 1 1 1 1 1 1 1 1 1 0 1 1 1 0 0 0
| 0 0 0
|
2 | 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 0
| 0 0 0
|
2 | 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0
| 0 0 0
|
------------+-------------------------------------------------------
100% |
Variables are
Row 1: (1) academic_score (2) academic_score_cat
(3) attendance_rate (4) attendence_days
(5) financial_constraints (6) research_activities
(7) language (8) commute_length (9) enrolment_year
(10) workshops_attended (11) income_range
(12) Financial_support_type (13) sufficiency_scale
(14) financial_support (15) travel_mode (16) other_replies
Row 2: (1) other_response (2) other_responses
(3) redcap_survey_identifier
(only 7 variables shown)
Nested pattern of missing values
other_~ies other_re~e other_~ses redcap_s~r travel_m~e financia~t Financia~e
----------------------------------------------------------------------------
100% 100% 100% 100% 58% 11% 4%
(remaining rows of the nested tree not reproduced; column alignment was lost)
----------------------------------------------------------------------------
(percent missing listed first)
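As a further approach alongside misstable, missingness can be counted directly. The sketch below picks four of the variables listed above as an illustration; n_missing is a hypothetical variable name.

```stata
use "Datasets\Combined_datasets.dta", clear

* Number of observations with no academic score
count if missing(academic_score)

* Number of missing values per student across a few selected variables
egen n_missing = rowmiss(academic_score attendance_rate income_range travel_mode)
tab n_missing
```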
Introduce duplicates and errors, then check the data for duplicates and clean data of duplicates and any errors.
*Introducing a duplicate and clean data of duplicates and any errors*
use "Datasets\Combined_datasets.dta"
preserve
keep if student_id == 2751133
tempfile duplicate_copy
save "duplicate_copy"
restore
append using "duplicate_copy"
*checking for duplicates*
duplicates list student_id record_id name surname
*Introduction a missing value error*
replace student_id = . if record_id == 4
replace student_id = . if record_id == 10
*checking for a missing value error*
misstable summarize student_id
*Cleaning data for introduced duplicates*
duplicates drop student_id record_id, force
*Cleaning data for missing value error*
replace student_id = 2751133 if record_id == 4
replace student_id = 2331203 if record_id == 10
save "Datasets\Combined_datasets.dta" , replace
(54 observations deleted)
file duplicate_copy.dta saved
(label Financial_support_type already defined)
(label language already defined)
(label program already defined)
(label financial_support_1 already defined)
(label income_range already defined)
(label residential_location already defined)
(label socioeconomic_background already defined)
(label gender already defined)
(label _merge already defined)
(label Afrikaans already defined)
(label English already defined)
(label Isizulu already defined)
(label IsiNdebele already defined)
(label Sesotho already defined)
(label Setswana already defined)
(label Siswati already defined)
(label Tshivenda already defined)
(label Xitsonga already defined)
(label Sepedi already defined)
(label SignLanguage already defined)
(label agelabel already defined)
(label academic_score_cat already defined)
Duplicates in terms of student_id record_id name surname
+----------------------------------------------+
| Obs studen~d record~d name surname |
|----------------------------------------------|
| 3 2751133 4 Mojaki Nyabela |
| 56 2751133 4 Mojaki Nyabela |
+----------------------------------------------+
(2 real changes made, 2 to missing)
(1 real change made, 1 to missing)
Obs<.
+------------------------------
| | Unique
Variable | Obs=. Obs>. Obs<. | values Min Max
-------------+--------------------------------+------------------------------
student_id | 3 53 | 48 1065346 1.05e+11
-----------------------------------------------------------------------------
Duplicates in terms of student_id record_id
(1 observation deleted)
(1 real change made)
(1 real change made)
file Datasets\Combined_datasets.dta saved
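The "file duplicate_copy.dta already exists" r(602) message that recurs in later output is consistent with save "duplicate_copy" writing a permanent file rather than the temporary file declared by tempfile. A sketch of the same step using the tempfile macro follows; it reintroduces the duplicate for illustration only and does not save the result.

```stata
use "Datasets\Combined_datasets.dta", clear

preserve
keep if student_id == 2751133
* tempfile stores a temporary file name in the local macro duplicate_copy
tempfile duplicate_copy
* Refer to the macro with `' quotes so the temporary file is used
save `duplicate_copy'
restore

append using `duplicate_copy'

* Summarise duplicates on the two identifier variables
duplicates report student_id record_id
```

Stata deletes the temporary file when the do-file or chunk finishes, so rerunning the code does not hit an existing file.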
Comments: misstable summarize reports 19 variables with incomplete records; misstable tree displays only 7 of them.
Use of the dtable or other appropriate commands for Bivariate (Table 1) results and/or even regression (without any explanation), save the word based table as a separate attachment
What variables are associated with academic score
* dtable for Bivariate (Table 1) results and/or even regression*
use "Datasets\Combined_datasets.dta"
dtable, by(academic_score_cat, tests testnotes totals) continuous(age attendence_days attendance_rate) factor(gender residential_location income_range socioeconomic_background Financial_support_type finacial_support___1) title(Table1) titlestyles(font(, bold))
file duplicate_copy.dta already exists
r(602);
note: using test regress across levels of academic_score_cat for age,
attendence_days, and attendance_rate.
note: using test pearson across levels of academic_score_cat for gender,
residential_location, income_range, socioeconomic_background,
Financial_support_type, and finacial_support___1.
Table1
---------------------------------------------------------------------------------------------------------------------------------------------
academic_score_cat
Fail Third Class Second Class First Class Total Test
---------------------------------------------------------------------------------------------------------------------------------------------
N 2 (3.8%) 10 (18.9%) 16 (30.2%) 25 (47.2%) 53 (100.0%)
age 24.000 (0.000) 30.100 (13.076) 24.688 (3.825) 23.440 (2.468) 25.094 (6.546) 0.051
attendence_days 23.000 (0.000) 37.900 (8.888) 34.875 (12.366) 42.960 (10.212) 38.811 (11.346) 0.022
attendance_rate 46.000 (0.000) 75.800 (17.775) 69.750 (24.732) 85.920 (20.424) 77.623 (22.692) 0.022
gender
Female 0 (0.0%) 8 (80.0%) 12 (75.0%) 17 (68.0%) 37 (69.8%) 0.147
Male 2 (100.0%) 2 (20.0%) 4 (25.0%) 8 (32.0%) 16 (30.2%)
residential_location
Eastern Cape 0 (0.0%) 2 (20.0%) 5 (31.2%) 0 (0.0%) 7 (13.2%) 0.004
Free State 0 (0.0%) 1 (10.0%) 0 (0.0%) 1 (4.0%) 2 (3.8%)
Gauteng 0 (0.0%) 6 (60.0%) 8 (50.0%) 16 (64.0%) 30 (56.6%)
KwaZulu-Natal 0 (0.0%) 0 (0.0%) 3 (18.8%) 3 (12.0%) 6 (11.3%)
Limpopo 2 (100.0%) 1 (10.0%) 0 (0.0%) 1 (4.0%) 4 (7.5%)
Mpumalanga 0 (0.0%) 0 (0.0%) 0 (0.0%) 2 (8.0%) 2 (3.8%)
North West 0 (0.0%) 0 (0.0%) 0 (0.0%) 1 (4.0%) 1 (1.9%)
Western Cape 0 (0.0%) 0 (0.0%) 0 (0.0%) 1 (4.0%) 1 (1.9%)
income_range
0-<10 000 0 (0.0%) 3 (75.0%) 1 (14.3%) 8 (50.0%) 12 (41.4%) 0.002
10 000-< 30 000 0 (0.0%) 0 (0.0%) 6 (85.7%) 1 (6.2%) 7 (24.1%)
30 000 -< 50 000 0 (0.0%) 1 (25.0%) 0 (0.0%) 1 (6.2%) 2 (6.9%)
50 000 -< 80 000 2 (100.0%) 0 (0.0%) 0 (0.0%) 2 (12.5%) 4 (13.8%)
80 000 -<150 000 0 (0.0%) 0 (0.0%) 0 (0.0%) 3 (18.8%) 3 (10.3%)
150 000+ 0 (0.0%) 0 (0.0%) 0 (0.0%) 1 (6.2%) 1 (3.4%)
socioeconomic_background
Low-income household ( R0 - R100 000 per annum) 0 (0.0%) 8 (80.0%) 7 (43.8%) 14 (56.0%) 29 (54.7%) 0.089
Middle-incomehousehold (R100 000 - R500 000 per annum) 2 (100.0%) 0 (0.0%) 8 (50.0%) 9 (36.0%) 19 (35.8%)
High-income household (>R500 000 per annum) 0 (0.0%) 2 (20.0%) 1 (6.2%) 2 (8.0%) 5 (9.4%)
Financial_support_type
Bursary 0 (.%) 1 (25.0%) 3 (37.5%) 6 (37.5%) 10 (35.7%) 0.267
Loan 0 (.%) 0 (0.0%) 0 (0.0%) 1 (6.2%) 1 (3.6%)
Self-funded 0 (.%) 3 (75.0%) 2 (25.0%) 2 (12.5%) 7 (25.0%)
Family support 0 (.%) 0 (0.0%) 3 (37.5%) 7 (43.8%) 10 (35.7%)
finacial_support___1
0 2 (100.0%) 9 (90.0%) 13 (81.2%) 16 (64.0%) 40 (75.5%) 0.277
1 0 (0.0%) 1 (10.0%) 3 (18.8%) 9 (36.0%) 13 (24.5%)
---------------------------------------------------------------------------------------------------------------------------------------------
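The task asks for the table to be saved as a Word document, which the code above does not show. One way this might be done is with collect export right after dtable; the shortened dtable call and the file name Table1.docx are assumptions for illustration.

```stata
use "Datasets\Combined_datasets.dta", clear

* A shortened Table 1; dtable leaves its results in the current collection
dtable, by(academic_score_cat, tests) continuous(age attendance_rate) ///
    factor(gender) title(Table1)

* Write the current collection to a Word document (file name is an assumption)
collect export "Table1.docx", replace
```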
Drawing four different types of graphs and place them on one graph.
use "Datasets\Combined_datasets.dta"
*Plotting Graphs*
histogram age, title("Histogram of Age") note("Source: School of Public Health") normal ytitle("Density") name(graph1, replace)
graph hbar , over(financial_support_1) asyvars over(academic_score_cat) title("Academic score by Financial support") name(graph2 , replace)
graph pie, over(program) title(Proportion By Program) name(graph3 , replace)
graph box age, over(gender) title("Gender by age") name(graph4 ,replace)
graph combine graph1 graph2 graph3 graph4,title("Four combined graphs") saving("all_graphs.gph" ,replace)
quietly graph export all_graphs.svg, replace
file duplicate_copy.dta already exists
r(602);
(bin=7, start=18, width=5)
(file all_graphs.gph not found)
file all_graphs.gph saved
Question 3: Drawing a map in Stata to display data
Import the datasets for mapping
- A shapefile that contains information on the boundaries of South Africa and its districts
- Geo-coordinates of major South African cities
- Geographically disaggregated data we want to map (i.e. district population)
****Load stats SA population by districts for 2018*****
import delimited "Datasets\statssa_population_districts_2018.csv_", clear
rename adm2_id ADM2_ID
save "Datasets\pop_districts_final.dta", replace
*****loading the shapefiles*****
shp2dta using "Datasets\zaf_admbnda_adm2_sadb_ocha_20201109", database(SA_counties) coordinates(coord_SA) genid(county_id) replace
file duplicate_copy.dta already exists
r(602);
(encoding automatically selected: UTF-8)
(30 vars, 52 obs)
(file Datasets\pop_districts_final.dta not found)
file Datasets\pop_districts_final.dta saved
type: 5
Merge the datasets and draw the map
Merge the geographic data with the geographically disaggregated data we want to map (i.e. district population). To achieve this we need a variable that is common to both datasets; here that variable is ADM2_ID.
Having merged the district-level information with the master file, we use the spmap command to draw the map.
*****Merge the datasets*********
use "SA_counties.dta", clear
merge 1:1 ADM2_ID using "Datasets\pop_districts_final"
cap drop _merge
graph bar (asis) p_total, title("Population distribution by District,2018") over(ADM1_EN) asyvars over(ADM2_EN, sort(p_total) lab(angle(90))) ysize(6) nofill legend(pos(3) col(1)) xsize(16)
quietly graph export graph_hbar.svg, replace
***Draw the map****
/*specify base map (coord_SA) and variable identifying relevant geographic units (county_id)*/
/*change default labels (just cosmetics really) */
spmap p_total using "coord_SA.dta", id(county_id) legend(on) fcolor(Greens2) legend(label(2 "74247-502821") label(3 "502821-811400") label(4 "811400-1164473.5") label(5 "1164473.5-4949347")) title("Population Density by District")
quietly graph export graph_map1.svg, replace
file duplicate_copy.dta already exists
r(602);
Result Number of obs
-----------------------------------------
Not matched 0
Matched 52 (_merge==3)
-----------------------------------------
Comments: p = 0.022