The data set describes the high schools in a single school district. It is a really big district, with 500 high schools. These are the variables I am going to create:
fnteacher = Number of female teachers by school
fsalary = Average salary of the female teachers by school
femaleperc = Percentage of female teachers by school
totteacher = Total number of teachers by school
asalary = Average salary of all teachers by school
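Given how these variables are defined, ‘total teacher’, ‘female teacher’, and ‘female percentage’ are directly linked: the number of female teachers equals the total number of teachers multiplied by the female percentage, divided by 100.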
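The link can be written down directly. Here is a minimal sketch for a single hypothetical school; the names `tot_example`, `perc_example`, and `fn_example` are made up for this illustration and are not used elsewhere in the post.

```r
# Hypothetical single school, used only to illustrate the relationship
tot_example  <- 120   # total teachers
perc_example <- 74    # percentage of teachers who are female

# Number of female teachers implied by the two values above
fn_example <- tot_example * perc_example / 100
fn_example
```

The result is generally not a whole number, which is why a rounding step is needed once the real column is computed.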
Let’s do it step by step.
First, I am going to create a placeholder matrix named ‘schooldata’ with 500 rows and 4 columns, every cell filled with the string "Not Yet". I will then name the columns after four of the variables listed above (fnteacher is added later).
schooldata <- matrix(nrow = 500, ncol = 4, data = "Not Yet")
colnames(schooldata) <- c("totteacher", "femaleperc", "asalary", "fsalary")
# Let's check how we did
head(schooldata)
totteacher femaleperc asalary fsalary
[1,] "Not Yet" "Not Yet" "Not Yet" "Not Yet"
[2,] "Not Yet" "Not Yet" "Not Yet" "Not Yet"
[3,] "Not Yet" "Not Yet" "Not Yet" "Not Yet"
[4,] "Not Yet" "Not Yet" "Not Yet" "Not Yet"
[5,] "Not Yet" "Not Yet" "Not Yet" "Not Yet"
[6,] "Not Yet" "Not Yet" "Not Yet" "Not Yet"
Yup. I now have a placeholder matrix. Next, I have to fill it with the data described above.
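One thing worth noticing: because the placeholder is a string, R stores the whole matrix as character, since a matrix can hold only one type. The sketch below shows what that implies, using small made-up matrices (`m_chr` and `m_num` are illustrative names). It also shows `NA_real_` as one possible alternative placeholder that keeps the matrix numeric.

```r
# A matrix filled with a string placeholder is a character matrix
m_chr <- matrix(data = "Not Yet", nrow = 3, ncol = 2)
typeof(m_chr)        # "character"

# Assigning a number into it converts the number to a string
m_chr[1, 1] <- 120
m_chr[1, 1]          # "120"

# Alternative placeholder: NA_real_ keeps the matrix numeric
m_num <- matrix(data = NA_real_, nrow = 3, ncol = 2)
m_num[1, 1] <- 120
typeof(m_num)        # "double"
```

This is the reason the columns need an explicit type conversion later on.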
set.seed(123)
# Total number of teachers per school ranges from 90 to 120
totteacher <- 90:120
# Percentage of female teachers per school ranges from 60 to 80
femaleperc <- 60:80
# Pool of possible average salaries per teacher by school
asalary <- 44200:70400
# Pool of possible average salaries per female teacher by school
fsalary <- 39780:63360
#Quick Check
head(totteacher);head(femaleperc);head(asalary);head(fsalary)
[1] 90 91 92 93 94 95
[1] 60 61 62 63 64 65
[1] 44200 44201 44202 44203 44204 44205
[1] 39780 39781 39782 39783 39784 39785
The pools of candidate values have been created. Some of them contain more than 500 values (e.g., asalary), while others contain far fewer (e.g., femaleperc). As mentioned above, there are 500 schools, so we need 500 values for each variable, and we want R to pick them at random. Because each draw is made independently, the same value can show up for more than one school.
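For comparison, `sample()` can also draw all 500 values in one call when `replace = TRUE` is set, which is required whenever the pool holds fewer than 500 values. This is only a sketch of an alternative to the loop used in this post; it consumes random numbers in a different order, so it would not reproduce the exact values shown here. The names `n_schools` and `femaleperc_draws` are illustrative.

```r
set.seed(123)
n_schools <- 500   # illustrative name, not used elsewhere in the post

# Draw 500 values at once; replace = TRUE allows repeats,
# which is needed because 60:80 has only 21 distinct values
femaleperc_draws <- sample(60:80, size = n_schools, replace = TRUE)

length(femaleperc_draws)   # 500
range(femaleperc_draws)    # stays within 60 and 80
```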
set.seed(123)
for (i in 1:500) {
  schooldata[i, 1] <- sample(totteacher, size = 1) # row i, column 1 (totteacher)
  schooldata[i, 2] <- sample(femaleperc, size = 1) # row i, column 2 (femaleperc)
  schooldata[i, 3] <- sample(asalary, size = 1)    # row i, column 3 (asalary)
  schooldata[i, 4] <- sample(fsalary, size = 1)    # row i, column 4 (fsalary)
}
schooldata <- as.data.frame(schooldata)
head(schooldata)
totteacher femaleperc asalary fsalary
1 120 74 69301 42765
2 107 70 55837 44540
3 115 64 65690 52415
4 114 68 54404 53446
5 97 66 53841 52278
6 93 73 61138 53962
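It looks like the data has been generated. So far we have only four variables of interest. I am going to create a new variable for the total number of female teachers and add it as a column to the existing data frame. Before that, the columns have to be converted to numeric types, because they were stored as text in the placeholder matrix.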
schooldata$totteacher <- as.integer(schooldata$totteacher)
schooldata$femaleperc <- as.integer(schooldata$femaleperc)
schooldata$asalary <- as.numeric(schooldata$asalary)
schooldata$fsalary <- as.numeric(schooldata$fsalary)
str(schooldata)
'data.frame': 500 obs. of 4 variables:
$ totteacher: int 120 107 115 114 97 93 110 98 117 116 ...
$ femaleperc: int 74 70 64 68 66 73 71 68 80 65 ...
$ asalary : num 69301 55837 65690 54404 53841 ...
$ fsalary : num 42765 44540 52415 53446 52278 ...
That worked. Now I can compute the number of female teachers from the total and the percentage.
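A caveat in case this is run on an older R installation: before R 4.0.0, `as.data.frame()` turned character columns into factors by default, and calling `as.integer()` on a factor returns the internal level codes rather than the printed numbers. The sketch below, with a made-up vector `x`, shows the difference and the usual workaround of going through `as.character()` first.

```r
# A factor whose labels look like numbers (made-up example values)
x <- factor(c("120", "107", "115"))

as.integer(x)                 # level codes: 3 1 2, not the original numbers
as.integer(as.character(x))   # the intended values: 120 107 115
```

On R 4.0.0 and later the columns arrive as plain character vectors, so the direct `as.integer()` call behaves as intended.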
schooldata$fnteacher <- (schooldata$totteacher*schooldata$femaleperc)/100
#Quick Check
head(schooldata)
totteacher femaleperc asalary fsalary fnteacher
1 120 74 69301 42765 88.80
2 107 70 55837 44540 74.90
3 115 64 65690 52415 73.60
4 114 68 54404 53446 77.52
5 97 66 53841 52278 64.02
6 93 73 61138 53962 67.89
Yup, it worked. The counts are fractional, though, so I am going to round them to whole numbers.
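A detail worth knowing about the rounding step that follows: R's `round()` follows the IEC 60559 convention of rounding halves to the even neighbour, so exact .5 values do not always round up. A short sketch with two combinations that can occur in this data (90 teachers at 65% and 98 teachers at 75%); `half_cases` is an illustrative name.

```r
# 90 * 65 / 100 is exactly 58.5; 98 * 75 / 100 is exactly 73.5
half_cases <- c(90 * 65 / 100, 98 * 75 / 100)
half_cases

# Halves go to the nearest even integer: 58.5 -> 58, 73.5 -> 74
round(half_cases, digits = 0)
```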
schooldata$fnteacher <- round(schooldata$fnteacher, digits = 0)
#Quick Check
schooldata
totteacher femaleperc asalary fsalary fnteacher
1 120 74 69301 42765 89
2 107 70 55837 44540 75
3 115 64 65690 52415 74
4 114 68 54404 53446 78
5 97 66 53841 52278 64
6 93 73 61138 53962 68
7 110 71 53558 49770 78
8 98 68 61630 63230 67
9 117 80 63790 47768 94
10 116 65 60800 50600 75
11 97 71 62182 60400 69
12 107 60 64672 59070 64
13 114 65 52668 53554 74
14 98 74 58625 47170 73
15 105 79 60165 50793 83
16 100 67 50941 51252 67
17 111 77 66712 50053 85
18 93 72 62700 45913 67
19 120 78 50752 61591 94
20 111 73 51326 49419 81
21 105 71 58656 49105 75
22 119 62 53892 60739 74
23 96 62 68414 49885 60
24 104 80 56652 47595 83
25 108 69 59737 41165 75
26 101 61 62961 60207 62
27 109 73 56248 62241 80
28 103 62 60303 54993 64
29 108 74 60351 47060 80
30 112 70 58414 54066 78
31 112 65 68757 63074 73
32 96 69 61612 48297 66
33 105 80 51934 41735 84
34 101 73 66467 60256 74
35 108 66 61569 44012 71
36 96 61 62167 60848 59
37 108 79 54635 49764 85
38 97 79 45184 58670 77
39 105 79 60778 53952 83
40 93 79 57363 62841 73
41 106 69 62715 54748 73
42 112 67 70101 43728 75
43 91 70 51956 42537 64
44 114 67 70227 52739 76
45 115 63 64084 42848 72
46 103 80 52919 60218 82
47 118 60 59057 56636 71
48 97 77 61732 41816 75
49 98 66 68324 51195 65
50 120 70 45512 39964 84
51 118 78 54961 57496 92
52 111 72 49074 49532 80
53 109 66 58944 43578 72
54 115 79 47780 50939 91
55 114 68 48912 56442 78
56 103 73 50989 46270 75
57 90 69 62488 54813 62
58 106 80 69731 53757 85
59 98 79 64045 61781 77
60 119 76 63591 49416 90
61 114 79 46410 42065 90
62 90 61 51883 58541 55
63 90 64 64796 52715 58
64 112 80 52025 41485 90
65 120 65 68462 54803 78
66 106 80 45187 50466 85
67 110 67 55260 60970 74
68 109 77 56836 59972 84
69 91 76 56873 40944 69
70 113 76 66666 41992 86
71 120 71 57888 60016 85
72 99 65 68625 57421 64
73 95 75 62695 46003 71
74 92 63 63618 54590 58
75 92 62 50822 47507 57
76 91 74 53182 44780 67
77 118 66 57733 45186 78
78 119 68 64971 61509 81
79 111 71 56784 60864 79
80 118 70 47310 49232 83
81 94 70 69832 48054 66
82 101 61 54716 43951 62
83 119 62 59589 47164 74
84 107 74 50717 50342 79
85 108 77 55047 53449 83
86 99 67 67665 51139 66
87 115 72 59963 43593 83
88 109 79 60421 49932 86
89 105 67 55032 45346 70
90 107 67 56730 58317 72
91 112 70 65304 56549 78
92 108 72 52532 39810 78
93 119 78 58301 60353 93
94 99 75 51787 48149 74
95 90 75 66115 44399 68
96 110 61 60787 43026 67
97 108 80 67373 58595 86
98 115 74 55866 42142 85
99 92 73 66568 45188 67
100 105 70 59659 51035 74
101 101 72 48305 60429 73
102 113 66 46649 41012 75
103 109 68 64050 52075 74
104 107 64 46788 55445 68
105 95 69 46529 57772 66
106 100 65 62343 56367 65
107 108 63 69631 41727 68
108 92 73 46159 40425 67
109 93 69 61847 52958 64
110 113 75 64667 42861 85
111 114 65 49931 53479 74
112 104 79 48573 60283 82
113 96 74 68707 49698 71
114 94 73 49622 41522 69
115 105 78 52079 45570 82
116 115 64 66318 41875 74
117 117 75 65072 51182 88
118 100 78 65349 61472 78
119 99 61 49701 58249 60
120 109 73 47093 57829 80
121 113 67 68082 57720 76
122 114 66 64154 56762 75
123 97 69 51020 54196 67
124 112 65 65103 45929 73
125 104 76 60298 45375 79
126 104 79 60776 44178 82
127 109 61 47895 39970 66
128 119 78 51889 53718 93
129 116 76 54500 49417 88
130 101 63 45596 45434 64
131 108 61 49919 59879 66
132 111 70 69533 43097 78
133 113 78 60834 42387 88
134 92 60 53381 58373 55
135 91 67 67515 62385 61
136 114 72 67950 61707 82
137 118 62 68870 51961 73
138 105 60 69395 40227 63
139 107 80 51270 54059 86
140 113 64 51591 58996 72
141 115 70 55486 60477 80
142 93 72 65327 39858 67
143 120 63 50580 60838 76
144 96 60 48717 52560 58
145 115 76 48894 43544 87
146 95 69 69652 42185 66
147 98 67 59688 44723 66
148 111 61 53647 49329 68
149 101 67 64720 51307 68
150 114 68 60635 61508 78
151 92 67 64425 56740 62
152 98 66 64628 53111 65
153 103 75 46987 54160 77
154 101 72 62587 56769 73
155 120 68 52147 42859 82
156 96 74 60504 50766 71
157 96 78 68006 57821 75
158 92 65 52462 45671 60
159 99 76 64599 46699 75
160 110 62 49171 43288 68
161 103 64 44469 57599 66
162 95 68 55112 44503 65
163 103 78 50572 55268 80
164 117 79 47493 56999 92
165 116 70 65580 54617 81
166 93 78 44490 47656 73
167 118 68 46976 49442 80
168 113 76 61948 50088 86
169 113 71 53417 50905 80
170 107 63 67857 51154 67
171 112 67 52840 46811 75
172 100 64 53938 55858 64
173 104 69 58806 44082 72
174 109 71 47248 60192 77
175 114 63 69162 43114 72
176 93 67 67292 45256 62
177 113 65 61419 60782 73
178 99 64 64767 45756 63
179 100 69 65142 59990 69
180 117 69 57172 51870 81
181 113 62 62873 53155 70
182 97 71 54599 41482 69
183 96 65 45752 45869 62
184 102 69 66601 59706 70
185 104 67 47081 49310 70
186 104 78 44734 47858 81
187 111 75 48751 42944 83
188 98 80 62679 42796 78
189 91 68 53586 40238 62
190 93 66 58073 48110 61
191 111 60 53234 49823 67
192 115 65 49916 54445 75
193 114 77 48490 61382 88
194 91 67 51331 57712 61
195 97 61 67362 62569 59
196 112 76 63559 44081 85
197 96 72 66543 51913 69
198 100 69 62136 43690 69
199 112 71 62499 58964 80
200 97 74 64597 60657 72
201 115 76 47816 54933 87
202 107 79 52738 47757 85
203 99 80 48102 61622 79
204 112 77 60089 44751 86
205 101 80 57887 61191 81
206 109 69 62316 50213 75
207 107 71 58341 54691 76
208 102 70 68442 44873 71
209 107 69 53104 43840 74
210 111 78 68975 59472 87
211 90 61 61987 41742 55
212 118 75 55468 60728 88
213 120 63 62018 47354 76
214 92 74 64270 44957 68
215 96 61 59846 56629 59
216 107 62 63119 52881 66
217 98 74 58424 55786 73
218 119 67 50664 50300 80
219 91 70 69839 59383 64
220 100 77 61533 44273 77
221 92 62 53269 43583 57
222 118 66 48446 42418 78
223 112 67 60744 58655 75
224 118 76 48960 54013 90
225 113 71 52615 61291 80
226 105 66 50695 56138 69
227 90 70 57962 53369 63
228 106 79 51978 44529 84
229 94 77 53935 53979 72
230 99 77 50360 48620 76
231 92 64 55932 59420 59
232 105 79 63105 52422 83
233 92 72 61643 41490 66
234 112 78 69444 63280 87
235 101 80 48443 50112 81
236 102 72 60955 40296 73
237 113 72 69246 59583 81
238 101 67 66853 41225 68
239 113 75 46846 54595 85
240 116 71 44503 57970 82
241 90 71 47025 47033 64
242 99 63 64592 44924 62
243 119 79 63948 58083 94
244 116 68 56698 54662 79
245 91 71 53622 45190 65
246 118 66 49284 56080 78
247 103 61 53796 53088 63
248 98 74 49960 41484 73
249 98 66 48537 43355 65
250 112 65 64583 53024 73
251 105 75 48297 57861 79
252 103 80 64894 41501 82
253 116 66 61933 55079 77
254 92 73 45922 42577 67
255 105 69 47811 44591 72
256 107 62 50728 53135 66
257 106 60 59723 53206 64
258 102 77 60918 42928 79
259 103 77 67603 40583 79
260 113 62 49536 47470 70
261 108 75 60438 62318 81
262 95 64 60521 49913 61
263 106 64 54042 42162 68
264 108 62 45826 45817 67
265 120 74 60194 49537 89
266 91 64 59502 59365 58
267 105 67 53950 51514 70
268 113 80 51642 51948 90
269 108 73 45434 47144 79
270 102 75 69262 44747 76
271 105 74 68417 43767 78
272 92 76 63982 50460 70
273 102 62 59745 55065 63
274 102 73 60198 56700 74
275 118 80 49847 54355 94
276 98 64 54749 51769 63
277 103 71 64678 57657 73
278 113 73 49477 46290 82
279 97 72 68430 59464 70
280 97 74 69557 49316 72
281 112 73 53950 47170 82
282 120 80 59896 56419 96
283 96 66 44244 57490 63
284 101 73 51834 44703 74
285 108 76 66352 47568 82
286 119 73 64574 52226 87
287 95 60 53219 52898 57
288 99 68 70147 54545 67
289 108 60 66871 61289 65
290 117 79 66415 60348 92
291 103 73 49583 61454 75
292 112 79 61612 60401 88
293 90 60 57356 49350 54
294 93 62 59313 52591 58
295 105 62 50547 62352 65
296 97 76 45562 52325 74
297 106 76 59354 40693 81
298 115 78 58604 61632 90
299 118 73 60584 54977 86
300 119 77 47323 57706 92
301 94 69 55755 57955 65
302 97 76 49998 60653 74
303 119 60 64673 60562 71
304 103 66 55142 47621 68
305 99 80 62326 41568 79
306 108 77 45845 55907 83
307 111 77 61018 49830 85
308 100 79 51046 60252 79
309 94 73 69817 62628 69
310 96 74 54104 45865 71
311 102 60 50189 51467 61
312 99 61 52380 56509 60
313 107 64 47203 48169 68
314 118 70 59299 46913 83
315 107 66 59216 40177 71
316 118 70 61539 62772 83
317 101 71 69978 59068 72
318 102 71 54317 51471 72
319 101 70 63674 46958 71
320 120 63 62889 46313 76
321 96 63 67550 41875 60
322 119 63 67408 42216 75
323 105 70 69792 58550 74
324 117 74 46819 55854 87
325 118 76 45668 48524 90
326 114 74 58867 42337 84
327 108 79 65390 42094 85
328 104 68 60065 45238 71
329 100 60 66672 49786 60
330 113 76 58161 45355 86
331 107 65 58368 47893 70
332 96 66 51763 57676 63
333 105 62 68678 43601 65
334 119 61 51395 60679 73
335 115 60 64769 45176 69
336 99 80 57120 41839 79
337 90 71 63796 60053 64
338 90 68 47942 40131 61
339 91 62 65416 58618 56
340 107 64 70001 51794 68
341 96 75 69690 62439 72
342 108 62 55753 48896 67
343 102 63 62075 54345 64
344 92 80 54748 57611 74
345 94 64 52858 44367 60
346 119 66 55829 62452 79
347 118 61 66232 61550 72
348 103 67 50504 40453 69
349 99 61 55332 47497 60
350 102 71 49610 40990 72
351 90 73 63570 52185 66
352 100 67 51960 46511 67
353 104 73 51930 42414 76
354 106 65 58771 57462 69
355 110 76 57836 41556 84
356 113 75 62482 62024 85
357 103 64 45202 49115 66
358 119 72 67355 44455 86
359 100 80 55131 44685 80
360 109 76 44274 54411 83
361 110 64 55521 58294 70
362 100 60 68425 55215 60
363 103 62 53207 62563 64
364 115 63 67755 55869 72
365 90 76 65404 46471 68
366 106 66 60832 55198 70
367 94 68 65478 47145 64
368 98 62 54512 43846 61
369 116 65 59129 48884 75
370 102 71 48470 45504 72
371 99 80 49591 44868 79
372 95 74 53762 58361 70
373 97 60 48280 41001 58
374 94 68 46022 41894 64
375 98 80 64249 58169 78
376 115 78 55232 44169 90
377 119 65 54415 61927 77
378 111 76 61467 53673 84
379 115 66 68674 57825 76
380 96 70 60220 48324 67
381 108 67 67768 59615 72
382 119 73 48703 54250 87
383 92 60 46975 58418 55
384 116 73 60610 49044 85
385 97 60 53245 47837 58
386 102 78 55046 63229 80
387 109 68 54664 49250 74
388 106 77 51824 43428 82
389 91 70 67200 57300 64
390 119 73 56729 56386 87
391 103 79 44431 52672 81
392 99 76 56027 44842 75
393 111 71 49062 44867 79
394 100 77 50982 55279 77
395 96 62 60245 51234 60
396 105 76 64877 52871 80
397 110 64 50532 50445 70
398 97 69 62814 57121 67
399 103 74 51930 61827 76
400 90 61 65398 57453 55
401 94 64 52574 46676 60
402 115 77 61737 41214 89
403 120 70 54247 52680 84
404 106 73 53514 51491 77
405 114 78 45961 60127 89
406 106 68 45657 44204 72
407 119 73 55382 53123 87
408 95 67 58659 58199 64
409 91 60 45625 54186 55
410 116 66 58059 62120 77
411 100 69 45810 62719 69
412 90 69 54983 48826 62
413 110 71 48270 60168 78
414 107 67 56054 63043 72
415 96 73 50705 58139 70
416 105 62 65565 52654 65
417 118 62 59439 47934 73
418 116 76 50176 42751 88
419 111 72 54227 51990 80
420 108 65 65524 62220 70
421 100 73 54746 40351 73
422 119 80 46046 62421 95
423 111 80 55444 51055 89
424 108 71 67201 58200 77
425 109 63 52611 60762 69
426 116 75 45400 51215 87
427 91 64 57786 43952 58
428 96 64 51310 41971 61
429 101 70 53370 43977 71
430 113 71 65876 43145 80
431 99 67 55985 61321 66
432 116 78 63556 63297 90
433 112 76 65501 55866 85
434 108 78 54299 44720 84
435 90 65 68327 58197 58
436 96 76 61388 61315 73
437 116 72 66511 43606 84
438 90 63 67935 48039 57
439 97 80 54237 56730 78
440 114 63 46201 43067 72
441 119 70 68517 39995 83
442 106 61 51441 47257 65
443 93 65 57210 55390 60
444 98 75 70036 50860 74
445 120 66 56666 44283 79
446 99 77 65535 42843 76
447 97 75 65059 47506 73
448 107 76 46053 52787 81
449 94 62 49389 56667 58
450 114 78 58034 48718 89
451 101 60 48997 46251 61
452 96 74 54026 50026 71
453 102 79 49008 44120 81
454 101 73 62478 60991 74
455 98 63 59770 54092 62
456 91 76 70131 57005 69
457 96 75 59145 44461 72
458 108 61 57660 56004 66
459 104 77 68746 40307 80
460 93 66 53349 43857 61
461 117 63 58008 44086 74
462 112 67 57961 42741 75
463 114 75 59425 46970 86
464 106 70 64041 60254 74
465 103 76 68029 51248 78
466 111 68 64662 47366 75
467 113 65 66129 46706 73
468 98 74 60608 59148 73
469 94 73 60152 44261 69
470 108 64 55719 43451 69
471 100 60 60206 50290 60
472 115 64 66046 56942 74
473 105 69 64137 62079 72
474 95 71 68922 49926 67
475 97 72 51159 52833 70
476 102 72 46428 46281 73
477 108 70 55571 45568 76
478 90 67 66720 60771 60
479 90 67 53922 53967 60
480 97 75 57777 55666 73
481 99 67 57859 62347 66
482 91 73 54073 49070 66
483 91 78 55191 49628 71
484 98 64 54561 57449 63
485 107 76 58818 48111 81
486 104 61 46133 61413 63
487 105 74 62180 51501 78
488 116 80 68927 44862 93
489 92 61 67538 43019 56
490 116 75 50224 58275 87
491 104 72 64606 49854 75
492 107 74 54739 52560 79
493 115 68 45477 61224 78
494 97 77 59437 60363 75
495 100 70 66049 40457 70
496 93 74 65368 41899 69
497 114 70 46237 41301 80
498 114 73 61331 55723 83
499 92 69 48454 60753 63
500 114 72 47213 59226 82
That worked.
Now I want to save the data frame on my local machine so I can use it whenever I need it.
save(schooldata, file = "C:/Users/nirma/Documents/EDX courses/MicroMaster MIT/14.310x-Data Analysis for Social Scientists/Programs/simulated_school_data.RData")
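To use the data later, `load()` restores the object under its original name. Here is a minimal round-trip sketch, assuming a temporary file in place of the path above and a tiny stand-in data frame (`toy` and `tmp` are made-up names).

```r
# Tiny stand-in for schooldata (two rows, for illustration only)
toy <- data.frame(totteacher = c(120, 107), femaleperc = c(74, 70))

# Save to a temporary file instead of a hard-coded path
tmp <- tempfile(fileext = ".RData")
save(toy, file = tmp)

# Remove the object, then restore it from disk
rm(toy)
load(tmp)
exists("toy")    # TRUE
```

Note that `load()` puts the object back into the workspace under the name it was saved with, so no assignment is needed.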