EdSurvey PISA USA
Loading Required Packages
library(tidyverse)
library(haven)
library(EdSurvey)
library(Dire)
library(WeMix)
library(ggplot2)
Downloading the PISA 2018 Data and Subsetting the US Data
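The raw files were already on disk from an earlier session, which is why readPISA() below reports cached data. A minimal sketch of that one-time step, assuming EdSurvey::downloadPISA() was used (the root path is a placeholder, not my actual folder):
# One-time download of the PISA 2018 international database into a local folder.
# "<your root folder>" is a placeholder path; point it at your own machine.
EdSurvey::downloadPISA(root = "<your root folder>", years = 2018)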
Reading PISA 2018 Data and Subsetting the US Data
eds_pisa <- EdSurvey::readPISA(
path = "C:/Users/nghimire/OneDrive - The University of Texas at Tyler/Redirected Folders/Documents/edsurvey_PISA_USA/PISA/2018",
database = "INT", countries = "usa", cognitive = "score"
)
Found cached data for country code "usa"
dim(eds_pisa)
[1] 4838 5045
# eds_pisa$w_fstuwt
It’s a massive data set, and I am surprised by the number of columns! Showing all 5,045 column names doesn’t make sense, so I will print just the first 50 and the last 50. Let’s see what they are.
hd <- head(colnames(eds_pisa), 50)
ht <- tail(colnames(eds_pisa), 50)
cbind(hd, ht)
     [,1]            [,2]
[1,] "ROWID" "sc003q01ta"
[2,] "cntryid" "sc053q01ta"
[3,] "cnt" "sc053q02ta"
[4,] "cntschid" "sc053q03ta"
[5,] "cntstuid" "sc053q04ta"
[6,] "cyc" "sc053q12ia"
[7,] "natcen" "sc053q13ia"
[8,] "stratum" "sc053q09ta"
[9,] "subnatio" "sc053q10ta"
[10,] "oecd" "sc053q14ia"
[11,] "adminmode" "sc053q15ia"
[12,] "langtest_qqq" "sc053q16ia"
[13,] "langtest_cog" "sc053d11ta"
[14,] "langtest_paq" "sc150q01ia"
[15,] "bookid" "sc150q02ia"
[16,] "st001d01t" "sc150q03ia"
[17,] "st003d02t" "sc150q04ia"
[18,] "st003d03t" "sc150q05ia"
[19,] "st004d01t" "sc164q01ha"
[20,] "st005q01ta" "sc064q01ta"
[21,] "st006q01ta" "sc064q02ta"
[22,] "st006q02ta" "sc064q03ta"
[23,] "st006q03ta" "sc064q04na"
[24,] "st006q04ta" "sc152q01ha"
[25,] "st007q01ta" "sc160q01wa"
[26,] "st008q01ta" "sc052q01na"
[27,] "st008q02ta" "sc052q02na"
[28,] "st008q03ta" "sc052q03ha"
[29,] "st008q04ta" "privatesch"
[30,] "st011q01ta" "schltype"
[31,] "st011q02ta" "stratio"
[32,] "st011q03ta" "schsize"
[33,] "st011q04ta" "ratcmp1"
[34,] "st011q05ta" "ratcmp2"
[35,] "st011q06ta" "totat"
[36,] "st011q07ta" "proatce"
[37,] "st011q08ta" "proat5ab"
[38,] "st011q09ta" "proat5am"
[39,] "st011q10ta" "proat6"
[40,] "st011q11ta" "clsize"
[41,] "st011q12ta" "creactiv"
[42,] "st011q16na" "edushort"
[43,] "st011d17ta" "staffshort"
[44,] "st011d18ta" "stubeha"
[45,] "st011d19ta" "teachbeha"
[46,] "st012q01ta" "scmceg"
[47,] "st012q02ta" "w_schgrnrabwt"
[48,] "st012q03ta" "w_fstuwt_sch_sum"
[49,] "st012q05na" "senwt.sch_qqq"
[50,] "st012q06na" "ver_dat.sch_qqq"
Amazing! I recognize some of these variables, but I have little idea what most of them mean.
Understanding the Data
Next, I check the structure and labels of some of the variables. First of all, I want to check how many plausible values there are for each cognitive test area (e.g., math, science, and reading).
# showWeights(eds_pisa, verbose = TRUE)
showPlausibleValues(eds_pisa)
There are 9 subject scale(s) or subscale(s) in this
edsurvey.data.frame:
'math' subject scale or subscale with 10 plausible values.
'read' subject scale or subscale with 10 plausible values (the
default).
'scie' subject scale or subscale with 10 plausible values.
'glcm' subject scale or subscale with 10 plausible values.
'rcli' subject scale or subscale with 10 plausible values.
'rcun' subject scale or subscale with 10 plausible values.
'rcer' subject scale or subscale with 10 plausible values.
'rtsn' subject scale or subscale with 10 plausible values.
'rtml' subject scale or subscale with 10 plausible values.
Based on this information, the data set contains 10 plausible values for reading, and I want to learn a little more about them by comparing their summaries.
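A minimal sketch of that comparison, pulling the ten plausible-value columns into a plain data frame with EdSurvey::getData(); pv_read is just a throwaway name introduced for this sketch:
# Pull only the reading plausible values; "read" expands to pv1read:pv10read.
pv_read <- EdSurvey::getData(eds_pisa, varnames = "read", addAttributes = TRUE)
# Side-by-side summaries of the ten plausible values.
round(sapply(pv_read[, paste0("pv", 1:10, "read")], summary), 1)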
Looks like they are fairly homogeneous, though their values differ slightly. The OECD report shows that the average reading score for US test takers was 505. The median (not the mean) of pv2read is close to that value, but all of the mean values are smaller. Taking the averages of the means and medians gives:
\[
\text{Mean of means} = \frac{500.2 + 500.8 + 500.3 + 501.1 + 500.5 + 501.0 + 499.9 + 500.3 + 501.1 + 500.6}{10} = 500.58
\]
\[
\text{Mean of medians} = \frac{503.6 + 505.2 + 503.8 + 503.1 + 504.5 + 504.4 + 502.6 + 504.0 + 503.9 + 503.6}{10} = 503.87
\]
\[
\text{Median of medians: } 502.6,\ 503.1,\ 503.6,\ 503.6,\ 503.8,\ 503.9,\ 504.0,\ 504.4,\ 504.5,\ 505.2 \Rightarrow \frac{503.8 + 503.9}{2} = 503.85
\]
Based on these statistics, the mean-of-medians and the median-of-medians (503.87 and 503.85) come much closer to the reported score than the mean-of-means (500.58), which is nowhere near the reported average. Thus, any analysis conducted using either a single plausible value or a simple average of all plausible values could lead to wrong findings. Let’s come back to this point after I check some other variables.
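A quick check of that arithmetic in code, reusing the pv_read data frame from the sketch above:
# Unweighted column means and medians of the ten plausible values.
pv_means   <- sapply(pv_read[, paste0("pv", 1:10, "read")], mean)
pv_medians <- sapply(pv_read[, paste0("pv", 1:10, "read")], median)
mean(pv_means)      # mean of means, roughly 500.6
mean(pv_medians)    # mean of medians, roughly 503.9
median(pv_medians)  # median of medians, roughly 503.9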
n_distinct(eds_pisa$ratcmp1)
[1] 93
levelsSDF(varnames = "ratcmp1", data = eds_pisa)
Levels for Variable 'ratcmp1' (Lowest level first):
995. VALID SKIP* (n = 0)
997. NOT APPLICABLE* (n = 0)
998. INVALID* (n = 0)
999. NO RESPONSE* (n = 0)
NOTE: * indicates an omitted level.
I was not sure what the ratcmp1 variable was or whether it had any levels. I requested the codebook using showCodebook(eds_pisa) and found that this variable, the index of availability of computers (RATCMP1), is the ratio of computers available to 15-year-olds for educational purposes to the total number of students in the modal grade for 15-year-olds.
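A quick way to look up a single variable without scrolling through the whole codebook, sketched here with searchSDF(), which matches variable names and labels (the exact columns it returns depend on the EdSurvey version):
# Search the edsurvey.data.frame's variable names and labels for "ratcmp1".
EdSurvey::searchSDF(string = "ratcmp1", data = eds_pisa)
# Or browse the full codebook interactively:
# View(showCodebook(eds_pisa))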
As seen above, there are 93 distinct values of this computer index in the US sample, and the variable does not have any labels. It should be a fairly straightforward continuous variable with some missing values. However, I don’t need to know more, because I am going to drop this variable; it has nothing to do with my current analysis, which focuses on teacher-related variables.
# getStratumVar(data = eds_pisa, weightVar = "origwt")
# summary2(eds_pisa, "composite")
# summary2(eds_pisa, "composite", weightVar = "NULL")
showCutPoints(data = eds_pisa)
Achievement Levels:
Mathematics: 357.77, 420.07, 482.38, 544.68, 606.99, 669.3
Reading: 189.33, 262.04, 334.75, 407.47, 480.18, 552.89, 625.61, 698.32
Science: 260.54, 334.94, 409.54, 484.14, 558.73, 633.33, 707.93
During our hands-on training at the NAEP Winter Data Workshop 2023, I learned that NAEP analyses use “origwt” as the weight variable and “composite” as the composite of all plausible math scores. I tried both expressions (commented out above), but it looks like neither applies here.
Coming back to the misleading mean scores that I discussed earlier: I know that NAEP recommends analyzing weighted samples rather than unweighted samples. Here is some further discussion about using weights:
Why Weight the Samples?
- The weights account for the fraction of the population represented by each stratum and reflect the probability that an element of the stratum is selected into the sample. One can show that the weighted sample mean is a good estimator, in the statistical sense, of the population mean when the sampling follows a stratified design (p. 300); a toy example follows this list.
- The unweighted sample size is in fact the size of the only sample selected. The weighted sample size is nothing more than the size of the population represented by the sample, which is already known or can be calculated from the weights (p. 301).
- Stratification is often used when the population has groups that differ from each other on the variable of interest, such as students from different countries, states, and school districts in the PISA assessment. In such cases, we are usually interested in some inference (e.g., mean, proportion, total, ratio) about each stratum (e.g., students from different ethnic groups). The weighting comes into play when combining the inferences from the strata into an inference about the entire population. For example, we are interested in how the 4,838 US 15-year-olds did on the 2018 PISA reading assessment and how they compare by ethnicity, and we want to make an inference about all US 15-year-olds in 2018, roughly 12,506,174 students (Ciol et al., 2006, p. 301).
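Here is a toy sketch of the first point with made-up numbers (not PISA data), just to show how a weighted sample mean pools stratum information:
# Hypothetical example: five sampled students, each representing a different
# number of students in the population (the weight), with made-up scores.
toy <- data.frame(
  score  = c(480, 520, 510, 495, 530),
  weight = c(700, 650, 900, 800, 510)
)
sum(toy$weight)                       # weighted N: the population size represented
weighted.mean(toy$score, toy$weight)  # weighted estimate of the population mean
mean(toy$score)                       # unweighted mean; differs whenever weights vary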
Let’s come back to this point.
Subsetting the Reading Data
This file is a compiled version of all available variables in the assessment (cognitive: reading, math, science, and digital literacy; surveys: student, teacher, and school), so we can subset the data to suit our needs, for example by school, student, or teacher variables. For now, I am going to subset a student-level file; the selected variables are the ones useful for this analysis.
read_data_full <- EdSurvey::getData(eds_pisa,
varnames = c(
"ROWID", "cntschid", "cntstuid",
"privatesch", "schltype", "stratio", "schsize",
"totat", "proatce", "proat5ab", "proat5am", "proat6",
"clsize", "teachbeha", "w_schgrnrabwt", "w_fstuwt_sch_sum",
"read", "w_fstuwt"
), addAttributes = TRUE, omittedLevels = FALSE
)
# names(read_data_full)
summary(read_data_full[, -(27:106)])
     ROWID         cntschid          cntstuid         privatesch
Min. : 1 Min. :8.4e+07 Min. :84000001 Length:4838
1st Qu.:1210 1st Qu.:8.4e+07 1st Qu.:84002155 Class :character
Median :2420 Median :8.4e+07 Median :84004338 Mode :character
Mean :2420 Mean :8.4e+07 Mean :84004300
3rd Qu.:3629 3rd Qu.:8.4e+07 3rd Qu.:84006418
Max. :4838 Max. :8.4e+07 Max. :84008626
schltype stratio schsize
PRIVATE INDEPENDENT : 163 Min. : 1.667 Min. : 22
PRIVATE GOVERNMENT-DEPENDENT: 13 1st Qu.: 13.100 1st Qu.: 639
PUBLIC :4636 Median : 16.154 Median :1411
NO RESPONSE : 26 Mean : 17.523 Mean :1491
3rd Qu.: 19.000 3rd Qu.:2076
Max. :100.000 Max. :4507
NA's :761 NA's :559
totat proatce proat5ab proat5am
Min. : 1.00 Min. :0.0000 Min. :0.0116 Min. :0.0167
1st Qu.: 46.00 1st Qu.:0.9765 1st Qu.:0.6095 1st Qu.:0.2692
Median : 80.50 Median :1.0000 Median :1.0000 Median :0.4727
Mean : 85.83 Mean :0.9407 Mean :0.8006 Mean :0.4843
3rd Qu.:114.00 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.6593
Max. :280.00 Max. :1.0000 Max. :1.0000 Max. :1.0000
NA's :702 NA's :802 NA's :861 NA's :853
proat6 clsize teachbeha w_schgrnrabwt
Min. :0.0000 26-30 STUDENTS:1660 Min. :-2.0409 Min. : 20.69
1st Qu.:0.0000 21-25 STUDENTS:1271 1st Qu.:-0.1274 1st Qu.: 42.70
Median :0.0154 31-35 STUDENTS: 645 Median : 0.2266 Median : 66.49
Mean :0.0227 16-20 STUDENTS: 478 Mean : 0.2720 Mean : 120.02
3rd Qu.:0.0330 NO RESPONSE : 277 3rd Qu.: 0.8952 3rd Qu.: 157.37
Max. :0.2222 (Other) : 221 Max. : 1.9937 Max. :1294.02
NA's :886 NA's : 286 NA's :467
w_fstuwt_sch_sum pv1read pv2read pv3read
Min. : 820.3 Min. :161.3 Min. :176.5 Min. :132.4
1st Qu.:18102.3 1st Qu.:423.5 1st Qu.:424.6 1st Qu.:423.2
Median :21341.2 Median :503.6 Median :505.2 Median :503.8
Mean :22421.2 Mean :500.2 Mean :500.8 Mean :500.3
3rd Qu.:25895.6 3rd Qu.:578.7 3rd Qu.:578.4 3rd Qu.:577.9
Max. :49343.6 Max. :868.9 Max. :898.5 Max. :858.4
pv4read pv5read pv6read pv7read
Min. :140.3 Min. :137.7 Min. :128.1 Min. :148.7
1st Qu.:426.1 1st Qu.:423.2 1st Qu.:424.4 1st Qu.:424.4
Median :503.1 Median :504.5 Median :504.4 Median :502.6
Mean :501.1 Mean :500.5 Mean :501.0 Mean :499.9
3rd Qu.:579.3 3rd Qu.:579.0 3rd Qu.:579.3 3rd Qu.:578.6
Max. :834.1 Max. :853.5 Max. :844.8 Max. :815.3
pv8read pv9read pv10read w_fstuwt
Min. :170.9 Min. :173.6 Min. :167.8 Min. : 262.8
1st Qu.:424.9 1st Qu.:426.0 1st Qu.:426.1 1st Qu.: 563.0
Median :504.0 Median :503.9 Median :503.6 Median : 661.7
Mean :500.3 Mean :501.1 Mean :500.6 Mean : 735.6
3rd Qu.:579.4 3rd Qu.:577.1 3rd Qu.:579.0 3rd Qu.: 854.5
Max. :823.4 Max. :818.1 Max. :834.1 Max. :2946.1
# showCodebook(read_data_full)
# View(showCodebook(read_data_full))
The truncated data file contains all the teacher-related variables, but I don’t need every one of them. For example, the teacher behavior (teachbeha) and w_fstuwt_sch_sum variables will not be used in the analysis, so I set them aside.
Creating a Composite Variable Using Ten Plausible Values
psych::alpha(read_data_full[, 17:26])
Reliability analysis
Call: psych::alpha(x = read_data_full[, 17:26])
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
0.99 0.99 0.99 0.94 152 0.00014 501 105 0.94
95% confidence boundaries
lower alpha upper
Feldt 0.99 0.99 0.99
Duhachek 0.99 0.99 0.99
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
pv1read 0.99 0.99 0.99 0.94 137 0.00016 1.6e-06 0.94
pv2read 0.99 0.99 0.99 0.94 136 0.00016 1.5e-06 0.94
pv3read 0.99 0.99 0.99 0.94 137 0.00016 1.6e-06 0.94
pv4read 0.99 0.99 0.99 0.94 137 0.00016 1.7e-06 0.94
pv5read 0.99 0.99 0.99 0.94 136 0.00016 1.7e-06 0.94
pv6read 0.99 0.99 0.99 0.94 136 0.00016 1.8e-06 0.94
pv7read 0.99 0.99 0.99 0.94 137 0.00016 1.4e-06 0.94
pv8read 0.99 0.99 0.99 0.94 136 0.00016 1.2e-06 0.94
pv9read 0.99 0.99 0.99 0.94 136 0.00016 1.4e-06 0.94
pv10read 0.99 0.99 0.99 0.94 136 0.00016 1.4e-06 0.94
Item statistics
n raw.r std.r r.cor r.drop mean sd
pv1read 4838 0.97 0.97 0.97 0.96 500 108
pv2read 4838 0.97 0.97 0.97 0.97 501 108
pv3read 4838 0.97 0.97 0.97 0.96 500 108
pv4read 4838 0.97 0.97 0.97 0.96 501 108
pv5read 4838 0.97 0.97 0.97 0.97 500 108
pv6read 4838 0.97 0.97 0.97 0.97 501 108
pv7read 4838 0.97 0.97 0.97 0.96 500 108
pv8read 4838 0.97 0.97 0.97 0.97 500 108
pv9read 4838 0.97 0.97 0.97 0.97 501 107
pv10read 4838 0.97 0.97 0.97 0.97 501 108
Qualitative Descriptors of Cronbach’s alpha
- .95 - 1.00: Excellent
- .90 - .94: Great
- .80 - .89: Good
- .70 - .79: Acceptable
- .60 - .69: Questionable, and
- .00 - .59: Unacceptable
Based on these statistics, the raw alpha of .99 indicates excellent internal consistency among the plausible values, and the reliability would be essentially unchanged if any one plausible value were dropped. Given that alpha = .99 and all ten plausible values appear to measure the same thing, I decided to retain all ten to build a composite variable.
read_data_full$composite <- rowMeans(read_data_full[, 17:26], na.rm = TRUE)
summary(read_data_full$composite)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
157.3 425.9 504.8 500.6 578.4 810.5
Nope, this is not what I wanted! I need to try something new. None of the item means in the Item Statistics, nor the “reliability if an item is dropped” figures, are close to the reported mean, and the outcome is no better than what I reported earlier: the mean of the composite variable, 500.6, is still well below 505. Sadly, I cannot proceed with this so-called ‘composite’ value as the dependent variable.
Let’s get back to square one. The summary of the variables above includes some weight variables that I need to check again. Here is a further summary in tabular form:
rbind(
summary(read_data_full$w_fstuwt),
summary(read_data_full$w_fstuwt_sch_sum),
summary(read_data_full$w_schgrnrabwt)
)
         Min.      1st Qu.      Median        Mean     3rd Qu.      Max.
[1,] 262.75044 563.04221 661.71657 735.6438 854.5203 2946.134
[2,] 820.25602 18102.33043 21341.22828 22421.2347 25895.5895 49343.581
[3,] 20.68506 42.69652 66.48856 120.0234 157.3688 1294.020
Based on these statistics, w_fstuwt may be useful for my study, but w_fstuwt_sch_sum and w_schgrnrabwt are not.
Now, I would like to go back to my original dataset and check if there is any variable that is listed as weights.
showWeights(eds_pisa, verbose = TRUE)
There is 1 full sample weight in this edsurvey.data.frame:
'w_fstuwt' with 80 JK replicate weights (the default).
Jackknife replicate weight variables associated with the full
sample weight 'w_fstuwt':
'w_fsturwt1', 'w_fsturwt2', 'w_fsturwt3', 'w_fsturwt4',
'w_fsturwt5', 'w_fsturwt6', 'w_fsturwt7', 'w_fsturwt8',
'w_fsturwt9', 'w_fsturwt10', 'w_fsturwt11', 'w_fsturwt12',
'w_fsturwt13', 'w_fsturwt14', 'w_fsturwt15', 'w_fsturwt16',
'w_fsturwt17', 'w_fsturwt18', 'w_fsturwt19', 'w_fsturwt20',
'w_fsturwt21', 'w_fsturwt22', 'w_fsturwt23', 'w_fsturwt24',
'w_fsturwt25', 'w_fsturwt26', 'w_fsturwt27', 'w_fsturwt28',
'w_fsturwt29', 'w_fsturwt30', 'w_fsturwt31', 'w_fsturwt32',
'w_fsturwt33', 'w_fsturwt34', 'w_fsturwt35', 'w_fsturwt36',
'w_fsturwt37', 'w_fsturwt38', 'w_fsturwt39', 'w_fsturwt40',
'w_fsturwt41', 'w_fsturwt42', 'w_fsturwt43', 'w_fsturwt44',
'w_fsturwt45', 'w_fsturwt46', 'w_fsturwt47', 'w_fsturwt48',
'w_fsturwt49', 'w_fsturwt50', 'w_fsturwt51', 'w_fsturwt52',
'w_fsturwt53', 'w_fsturwt54', 'w_fsturwt55', 'w_fsturwt56',
'w_fsturwt57', 'w_fsturwt58', 'w_fsturwt59', 'w_fsturwt60',
'w_fsturwt61', 'w_fsturwt62', 'w_fsturwt63', 'w_fsturwt64',
'w_fsturwt65', 'w_fsturwt66', 'w_fsturwt67', 'w_fsturwt68',
'w_fsturwt69', 'w_fsturwt70', 'w_fsturwt71', 'w_fsturwt72',
'w_fsturwt73', 'w_fsturwt74', 'w_fsturwt75', 'w_fsturwt76',
'w_fsturwt77', 'w_fsturwt78', 'w_fsturwt79', and 'w_fsturwt80'
Wow! It is indeed the w_fstuwt variable. Alongside this full sample weight there are 80 replicate weights, i.e., w_fsturwt1 through w_fsturwt80, for each of the 4,838 students. The w_fstuwt variable is the full sample weight itself; the 80 jackknife replicate weights that accompany it are used to estimate the sampling variance of any statistic computed with that weight (a rough sketch follows the quote below). The following information comes from the NCES website, which describes the Jackknife Replication Method: “A replication method that estimates standard errors of percentages and other statistics. It is particularly suited to complex sample designs. In the jackknife, sample units are grouped into pairs (replicate groups). Portions of the sample (replicates) are formed by repeatedly omitting one half of the units in one of the replicate groups and calculating the desired statistic (replicate estimate). The number of replicate estimates is equal to the number of replicate groups. The variability among the replicate estimates is used to estimate the overall sampling variability” (https://nces.ed.gov/nationsreportcard/glossary.aspx#jackknife).
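Here is a rough sketch of that idea, not how EdSurvey computes it internally: re-estimate the statistic under each replicate weight and pool the squared deviations from the full-sample estimate. It assumes the replicate-weight columns w_fsturwt1 through w_fsturwt80 came along with w_fstuwt into read_data_full (the 80 columns dropped from the summary earlier suggest they did), uses only the first plausible value for simplicity, and uses the 1/20 scale factor that goes with PISA’s 80 Fay-adjusted replicates; other designs use other constants.
# Full-sample weighted mean (first plausible value only, for illustration).
full_est <- weighted.mean(read_data_full$pv1read, read_data_full$w_fstuwt)
# Re-estimate the mean under each of the 80 replicate weights.
rep_est <- sapply(1:80, function(g) {
  w_g <- read_data_full[[paste0("w_fsturwt", g)]]
  weighted.mean(read_data_full$pv1read, w_g)
})
# Sampling variance and standard error of the weighted mean.
samp_var <- sum((rep_est - full_est)^2) / 20
sqrt(samp_var)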
Now I can move forward and conduct descriptive analyses. Before starting a series of descriptive statistics, I want to see whether there are differences between the weighted and unweighted means of students’ reading scores. Here are the unweighted means:
summary2(read_data_full, "read", weightVar = NULL)
Estimates are not weighted.
Variable N Min. 1st Qu. Median Mean 3rd Qu. Max. SD
1 pv1read 4838 161.343 423.4801 503.6375 500.1502 578.6824 868.870 108.4549
2 pv2read 4838 176.458 424.5367 505.2045 500.7907 578.4439 898.478 107.9547
3 pv3read 4838 132.423 423.1459 503.8005 500.3032 577.9054 858.393 107.8982
4 pv4read 4838 140.293 426.1104 503.0610 501.0978 579.3880 834.076 108.4841
5 pv5read 4838 137.737 423.1717 504.4565 500.4783 579.0315 853.488 108.0790
6 pv6read 4838 128.111 424.4202 504.4380 500.9541 579.3228 844.836 108.1850
7 pv7read 4838 148.739 424.3686 502.5630 499.8935 578.6955 815.275 107.7119
8 pv8read 4838 170.907 424.9227 503.9935 500.3028 579.3685 823.427 108.1128
9 pv9read 4838 173.639 425.9822 503.8790 501.0805 577.1258 818.066 107.4320
10 pv10read 4838 167.822 426.0661 503.5685 500.6259 579.0812 834.091 107.9336
NA's
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
The means, as before, are not close to the reported 505. These are the means for all ten plausible reading scores. Let’s check the weighted mean reading score among the US 15-year-olds who took part in the PISA 2018 assessment. Either of the calls below gives the same result.
# summary2(read_data_full, "read")
summary2(read_data_full, "read", weightVar = "w_fstuwt")
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max.
1 read 4838 3559045 153.7472 429.7936 509.7353 505.3528 583.5768 844.9
SD NA's Zero weights
1 107.9064 0 0
Amazing! That’s what I wanted. The weighted mean reading score for US students was 505.3528. Thus, w_fstuwt will be the sampling weight and read the outcome/dependent variable. Let’s draw a histogram and look at the distribution.
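A minimal sketch of that histogram, using only the first plausible value (pv1read) as a stand-in for the reading score and w_fstuwt as the weight:
# Weighted distribution of reading scores among US 15-year-olds (pv1read only).
ggplot(read_data_full, aes(x = pv1read, weight = w_fstuwt)) +
  geom_histogram(binwidth = 20, fill = "steelblue", color = "white") +
  labs(x = "PISA 2018 reading score (pv1read)", y = "Weighted count")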
Descriptive Statistics
Average Scores Based on Class Size
clsize_reading <- edsurveyTable(formula = read ~ clsize, data = read_data_full)
clsize_reading
Formula: read ~ clsize
Plausible values: 10
jrrIMax: 1
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
full data n: 4838
n used: 4275
Summary Table:
clsize N WTD_N PCT SE(PCT) MEAN SE(MEAN)
15 STUDENTS OR FEWER 98 78303.96 2.521100 1.020527 490.9990 10.635406
16-20 STUDENTS 478 361655.05 11.643965 2.477297 505.8996 10.771112
21-25 STUDENTS 1271 911449.99 29.345344 3.874926 507.4223 8.233802
26-30 STUDENTS 1660 1134459.30 36.525425 3.979064 508.5477 6.052202
31-35 STUDENTS 645 501987.16 16.162144 2.511393 511.2569 9.204440
36-40 STUDENTS 123 118088.67 3.802022 1.777732 530.6957 16.336060
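To make the pattern easier to see, here is a quick plot of the table above. The means and standard errors are transcribed by hand from the edsurveyTable output, so this is only a visual aid, not a new estimate:
# Weighted mean reading score by class size, with +/- 1 SE error bars
# (values copied from the summary table above).
clsize_plot <- data.frame(
  clsize = factor(c("<=15", "16-20", "21-25", "26-30", "31-35", "36-40"),
                  levels = c("<=15", "16-20", "21-25", "26-30", "31-35", "36-40")),
  mean_read = c(491.0, 505.9, 507.4, 508.5, 511.3, 530.7),
  se_read   = c(10.6, 10.8, 8.2, 6.1, 9.2, 16.3)
)
ggplot(clsize_plot, aes(x = clsize, y = mean_read)) +
  geom_point(size = 2) +
  geom_errorbar(aes(ymin = mean_read - se_read, ymax = mean_read + se_read), width = 0.15) +
  labs(x = "Class size", y = "Weighted mean reading score")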
School Type and Reading Scores
schltype_reading <- edsurveyTable(
formula = read ~ schltype,
data = read_data_full
)
schltype_reading
Formula: read ~ schltype
Plausible values: 10
jrrIMax: 1
Weight variable: 'w_fstuwt'
Variance method: jackknife
JK replicates: 80
full data n: 4838
n used: 4812
Summary Table:
schltype N WTD_N PCT SE(PCT) MEAN
1 PRIVATE INDEPENDENT 163 203557.99 5.757412 1.2083048 525.3717
2 PRIVATE GOVERNMENT-DEPENDENT 13 22100.53 0.625089 0.3349908 499.0487
3 PUBLIC 4636 3309922.77 93.617499 1.2244331 503.6134
SE(MEAN)
1 21.037771
2 27.273720
3 3.421781
Student-Teacher Ratio, Including Omitted Levels (e.g., NAs)
summary2(read_data_full, "stratio", omittedLevels = FALSE)
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max. SD
1 stratio 4838 3559045 1.6667 12.5827 16 17.21917 18.9474 100 9.444379
NA's Zero weights
1 761 0
Student-Teacher Ratio, Excluding Omitted Levels
summary2(read_data_full, "stratio", omittedLevels = TRUE)
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max. SD
1 stratio 4077 2932840 1.6667 12.5827 16 17.21917 18.9474 100 9.444379
NA's Zero weights
1 0 0
School Size (by Total Students)
summary2(read_data_full, "schsize", omittedLevels = FALSE)
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max. SD
1 schsize 4838 3559045 22 639 1411 1490.035 2061 4507 986.5392
NA's Zero weights
1 559 0
Total Teachers by School
summary2(read_data_full, "totat", omittedLevels = FALSE)
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max. SD
1 totat 4838 3559045 1 48 80 86.36344 115 280 51.42637
NA's Zero weights
1 702 0
Proportion of Teachers Fully Certified
summary2(read_data_full, "proatce", omittedLevels = FALSE)
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max. SD
1 proatce 4838 3559045 0 0.9685 1 0.9256084 1 1 0.2051878
NA's Zero weights
1 802 0
Proportion of Teachers with a Bachelor’s Degree or Above
summary2(read_data_full, "proat5ab", omittedLevels = FALSE)
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max.
1 proat5ab 4838 3559045 0.0116 0.6541 1 0.8113005 1 1
SD NA's Zero weights
1 0.2747722 861 0
Proportion of Teachers with a Master’s Degree or Above (Per School)
summary2(read_data_full, "proat5am", omittedLevels = FALSE)
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max. SD
1 proat5am 4838 3559045 0.0167 0.2974 0.5 0.50363 0.6818 1 0.250698
NA's Zero weights
1 853 0
Proportion of Teachers per School with a Doctoral or Other Higher Degree
summary2(read_data_full, "proat6", omittedLevels = FALSE)
Estimates are weighted using the weight variable 'w_fstuwt'
Variable N Weighted N Min. 1st Qu. Median Mean 3rd Qu. Max.
1 proat6 4838 3559045 0 0 0.0152 0.02227463 0.033 0.2222
SD NA's Zero weights
1 0.02850491 886 0
References
- Ciol, M. A., Hoffman, J. M., Dudgeon, B. J., Shumway-Cook, A., Yorkston, K. M., & Chan, L. (2006). Understanding the use of weights in the analysis of data from multistage surveys. Archives of Physical Medicine and Rehabilitation, 87(2), 299–303. https://doi.org/10.1016/j.apmr.2005.09.021