era_task

The solution for Task 1a and 1b must be programmatic and not resolved using copy/paste. You may use programs such as Access, SQL, SPSS or SAS. Save the program and logic you used to complete these tasks with your final files. You could also include your assumptions and explanations of your methodology if you would like.

library(kableExtra)# To create tables in R markdown files
library(tidyverse)# To manipulate data files 
library(ggplot2)# To create Plots and Diagrams
library(cowplot)# To put tables together
library(stringi)# To search for patter midst characters string/text/natural language
library(stringr) # for data cleaning and correct implementation of common string manipulation
library(readxl)# To read excel files
library(tidyr)# To simplify the process of creating tidy data, which is defined as a stadard way of storing data that is used wherever possible throughout the tidyverse.

Table 1:

table1 <- data.frame(
  Field_Name = c("SchoolCode","CourseCd","ClassroomCd","StudentCount"), 
  Type = c("String","String","String","Numeric"),
  Length = c("6","6","25","3"), 
  Description = c("Code that uniquely identifies a school","Code that uniquely identifies a course","Code that uniquely identifies a specific classroom in the school.  This code is unique by SchoolCode","Number of students in the classroom")
)
kbl(table1)%>%
  kable_classic_2(full_width=F)%>%
  column_spec(1, border_right=T)%>%
  column_spec(2, border_right=T)%>%
  column_spec(3, border_right=T)%>%
  column_spec(4)

Field_Name	Type	Length	Description
SchoolCode	String	6	Code that uniquely identifies a school
CourseCd	String	6	Code that uniquely identifies a course
ClassroomCd	String	25	Code that uniquely identifies a specific classroom in the school. This code is unique by SchoolCode
StudentCount	Numeric	3	Number of students in the classroom

Task 1a: Create a three-year view of total reported enrollment in one report using the separate October files provided. This report will be used by individuals need to examine the trends.

# Uploading 2012 data in R
data_2012 <- read_excel("ERA Task_October_2012.xlsx", range = "C16:E141")
# Giving the Column Names
colnames(data_2012) <- c("LEA", "District_Agency_Name", "TRS")
# Checking 2012 data 
names(data_2012)

[1] "LEA"                  "District_Agency_Name" "TRS"

# Uploading 2013 data in R
data_2013 <- read_excel("ERA Task_October_2013.xlsx", range = "A14:C142")
# Giving the Column Names
colnames(data_2013) <- c("LEA", "District_Agency_Name", "TRS")
# Subsetting the 2013 dataset to match it with 2012 & 2014 Data Files
#data_2013 <- select(data_2013, -Empty)
# Checking 2013 data 
names(data_2013)

[1] "LEA"                  "District_Agency_Name" "TRS"

# Uploading 2014 data in R
data_2014 <- read_excel("ERA Task_October_2014.xlsx", range = "A15:C150")
colnames(data_2014) <- c("LEA", "District_Agency_Name", "TRS")
# Checking 2014 data 
names(data_2014)

[1] "LEA"                  "District_Agency_Name" "TRS"

The datafiles have been uploaded to R. I have named the columns, and they are set appropriately. Dataset from the year 2013 had second and third columns in Excel Spreadsheet merged together. R separates them as two different columns. Thus, the third column was not useful to me because it did not have any data. I got rid of the column. Now, the data set for all of the years (i.e., 2012, 2013, and 2014) have same column names. How do the data look? Let’s have a quick look. I want to check 10-rows of data per year.

# Year 2012
head((data_2012),10)

# A tibble: 10 x 3
   LEA   District_Agency_Name   TRS
   <chr> <chr>                <dbl>
 1 002   Allen Parish          4350
 2 003   Ascension Parish     21090
 3 004   Assumption Parish     3830
 4 005   Avoyelles Parish      6060
 5 006   Beauregard Parish     6160
 6 007   Bienville Parish      2340
 7 008   Bossier Parish       21680
 8 009   Caddo Parish         41640
 9 010   Calcasieu Parish     32690
10 011   Caldwell Parish       1650

# Year 2013
head((data_2013),10)

# A tibble: 10 x 3
   LEA   District_Agency_Name TRS  
   <chr> <chr>                <chr>
 1 002   Allen Parish         4368 
 2 003   Ascension Parish     21525
 3 004   Assumption Parish    3892 
 4 005   Avoyelles Parish     5952 
 5 006   Beauregard Parish    6092 
 6 007   Bienville Parish     2279 
 7 008   Bossier Parish       22018
 8 009   Caddo Parish         41252
 9 010   Calcasieu Parish     32872
10 011   Caldwell Parish      1657

# Year 2014
head((data_2014),10)

# A tibble: 10 x 3
   LEA   District_Agency_Name   TRS
   <chr> <chr>                <dbl>
 1 002   Allen Parish          4370
 2 003   Ascension Parish     21866
 3 004   Assumption Parish     3729
 4 005   Avoyelles Parish      5968
 5 006   Beauregard Parish     6013
 6 007   Bienville Parish      2274
 7 008   Bossier Parish       22206
 8 009   Caddo Parish         41146
 9 010   Calcasieu Parish     32595
10 011   Caldwell Parish       1763

2013 data was troublesome. I went ahead and created a user defined function (UDF) in Excel to get rid of spaces and special characters in the dataset.

I activated the Visual Basic Editor in Excel by pressing Alt+F11
I Selected the 2013 data from the active worksheets
Right clicked on the Workbook and clicked on “insert a new module” button

Function RemoveSpecChar(sInput As String) As String

Dim sSpecChar As String

Dim i As Long

sSpecChar = "/:*?™""®<>|.&@# (_+`©~);-+=^$!,’"

For i = 1 To Len(sSpecChar)

sInput = Replace$(sInput, Mid$(sSpecChar, i, 1), "")

Next

RemoveSpecChar = sInput

End Function

We are able to see how data look for each of the study year. Although the data have similar column names, their types are not what they have to be. The “TRS” column is denoted as either “character” or “double”, but they should be numeric. District or Agency name on the other hand is denoted as character. It looks fine. Let’s change the “TRS” columns to numeric types.

# Year 2012
data_2012$TRS <- as.numeric(data_2012$TRS)
# Checking 
class(data_2012$TRS)

[1] "numeric"

# Year 2013
data_2013$TRS <- as.numeric(data_2013$TRS)
# Checking 
class(data_2013$TRS)

[1] "numeric"

# Year 2014
data_2014$TRS <- as.numeric(data_2014$TRS)
# Checking 
class(data_2014$TRS)

[1] "numeric"

Now that the columns have been straight forward, I want compare these data set. For trend analyses, I need to merge these data files. I want to check if they have equal number of data points, if there is any specific ID that would help me combine these files, and if the ids’ match in all of these data set. For this I am going to draw the summary:

# Year 2012
summary(data_2012)

     LEA            District_Agency_Name      TRS       
 Length:125         Length:125           Min.   :   30  
 Class :character   Class :character     1st Qu.:  520  
 Mode  :character   Mode  :character     Median : 2010  
                                         Mean   : 5626  
                                         3rd Qu.: 6060  
                                         Max.   :45940

# Year 2013
summary(data_2013)

     LEA            District_Agency_Name      TRS       
 Length:128         Length:128           Min.   :   39  
 Class :character   Class :character     1st Qu.:  548  
 Mode  :character   Mode  :character     Median : 1978  
                                         Mean   : 5539  
                                         3rd Qu.: 5626  
                                         Max.   :46661

# Year 2014
summary(data_2014)

     LEA            District_Agency_Name      TRS         
 Length:135         Length:135           Min.   :   38.0  
 Class :character   Class :character     1st Qu.:  527.5  
 Mode  :character   Mode  :character     Median : 1723.0  
                                         Mean   : 5286.3  
                                         3rd Qu.: 5249.0  
                                         Max.   :48126.0

The results show that, there have been total of 125 rows in the year 2012, which is 128 in 2013, and 135 in the year 2014. They unequal data points. I can see differential amount of missing data in these data set. However, I don’t much worry about them now. I am now going to merge this files in phases. Merge 2212 and 2013 first and merge 2014 afterward.

m1_data <- merge(data_2012, data_2013, by = "LEA")
#m1_data

The data set have been merged but R gave unique column names. I want to fix them first before I combine the data to 2014 data.

m1_data <- select(m1_data, LEA, District_Agency_Name.y, TRS.x, TRS.y)
head(m1_data)

  LEA District_Agency_Name.y TRS.x TRS.y
1 002           Allen Parish  4350  4368
2 003       Ascension Parish 21090 21525
3 004      Assumption Parish  3830  3892
4 005       Avoyelles Parish  6060  5952
5 006      Beauregard Parish  6160  6092
6 007       Bienville Parish  2340  2279

I got rid of one of the repeated columns. I now want to merge 2014 data.

complete_data <- merge(m1_data, data_2014, by = "LEA")
#complete_data

Data have been merged. It looks like the ID’s match as well. I am now going to get rid of the repeated column and rename the columns for easy read of the combined data file.

complete_data <- select(complete_data, LEA, District_Agency_Name, TRS.x, TRS.y, TRS)
#head(complete_data)

# Overwriting the Column Names
colnames(complete_data) <- c("LEA", "District_Agency_Name", "Year_2012", "Year_2013", "Year_2014")

Task 1a: Report in a Single Table

# Some of the names of the schools are too long. I want to truncate them so that I can fit all the column in a single table side by side
library(stringr)
complete_data$District_Agency_Name <- lapply(complete_data$District_Agency_Name, str_trunc, 40, ellipsis = "")
complete_data

    LEA                     District_Agency_Name Year_2012 Year_2013 Year_2014
1   002                             Allen Parish      4350      4368      4370
2   003                         Ascension Parish     21090     21525     21866
3   004                        Assumption Parish      3830      3892      3729
4   005                         Avoyelles Parish      6060      5952      5968
5   006                        Beauregard Parish      6160      6092      6013
6   007                         Bienville Parish      2340      2279      2274
7   008                           Bossier Parish     21680     22018     22206
8   009                             Caddo Parish     41640     41252     41146
9   010                         Calcasieu Parish     32690     32872     32595
10  011                          Caldwell Parish      1650      1657      1763
11  012                           Cameron Parish      1290      1289      1306
12  013                         Catahoula Parish      1520      1477      1465
13  014                         Claiborne Parish      1960      1746      1728
14  015                         Concordia Parish      3820      3590      3557
15  016                            DeSoto Parish      5240      5145      5122
16  017                  East Baton Rouge Parish     42720     42375     42618
17  018                      East Carroll Parish      1150      1114      1098
18  019                    East Feliciana Parish      2010      1994      2004
19  020                        Evangeline Parish      6150      6184      6160
20  021                          Franklin Parish      3210      3154      3097
21  022                             Grant Parish      3320      3173      3216
22  023                            Iberia Parish     13960     14099     14048
23  024                         Iberville Parish      4610      4697      4735
24  025                           Jackson Parish      2300      2345      2359
25  026                         Jefferson Parish     45940     46661     48126
26  027                   Jefferson Davis Parish      5930      5946      5871
27  028                         Lafayette Parish     30810     31171     30205
28  029                         Lafourche Parish     14690     14787     14875
29  030                           LaSalle Parish      2640      2673      2641
30  031                           Lincoln Parish      6650      6615      6628
31  032                        Livingston Parish     25550     25826     25992
32  033                           Madison Parish      1880      1492      1417
33  034                         Morehouse Parish      4430      4569      4449
34  035                      Natchitoches Parish      6780      6774      6565
35  036                           Orleans Parish     11360     12541     13352
36  037                          Ouachita Parish     20170     20130     20166
37  038                       Plaquemines Parish      3950      4066      4105
38  039                     Pointe Coupee Parish      2940      2916      3114
39  040                           Rapides Parish     24190     24288     23840
40  041                         Red River Parish      1480      1538      1494
41  042                          Richland Parish      3470      3338      3195
42  043                            Sabine Parish      4360      4405      4382
43  044                       St. Bernard Parish      6750      7189      7466
44  045                       St. Charles Parish      9790      9727      9757
45  046                        St. Helena Parish       790       805      1126
46  047                         St. James Parish      3830      3865      3868
47  048              St. John the Baptist Parish      6100      6031      5951
48  049                        St. Landry Parish     14880     14919     14580
49  050                        St. Martin Parish      8420      8345      8419
50  051                          St. Mary Parish      9500      9434      9401
51  052                       St. Tammany Parish     37880     38044     38193
52  053                        Tangipahoa Parish     19890     20093     20041
53  054                            Tensas Parish       680       671       653
54  055                        Terrebonne Parish     18920     18799     18657
55  056                             Union Parish      2370      2269      2205
56  057                         Vermilion Parish      9530      9599      9593
57  058                            Vernon Parish      9870      9716      9372
58  059                        Washington Parish      5470      5409      5376
59  060                           Webster Parish      6710      6665      6539
60  061                  West Baton Rouge Parish      3930      3968      3988
61  062                      West Carroll Parish      2210      2210      2126
62  063                    West Feliciana Parish      2160      2186      2144
63  064                              Winn Parish      2510      2545      2480
64  065           City of Monroe School District      8680      8557      8438
65  066         City of Bogalusa School District      2140      1962      1757
66  067        Zachary Community School District      5360      5519      5393
67  068            City of Baker School District      1750      1673      1459
68  069        Central Community School District      4330      4418      4585
69  101                  Special School District       510       478       425
70  300    RSD-New Beginnings Schools Foundation      2010      2019      2107
71  302 Louisiana School For Math Science & the        300       298       308
72  304 LA Schools for the Deaf and Visually Imp       210       207       193
73  306       Louisiana Special Education Center        30        39        38
74  318                    LSU Laboratory School      1370      1398      1414
75  319           Southern University Lab School       400       555      1040
76  321              New Vision Learning Academy       370       376       383
77  328 Southwest Louisiana Charter Acad. Founda       560       679       967
78  329             V. B. Glencoe Charter School       370       367       377
79  331        International School of Louisiana       740       826       908
80  333          Avoyelles Public Charter School       690       722       721
81  334     New Orleans Center for Creative Arts       110       177       242
82  336                     Delhi Charter School       670       924       832
83  337               Belle Chasse Academy, Inc.       970       946       933
84  339                        Milestone Academy       410       336       428
85  340                   The MAX Charter School       100       116       110
86  341           D'Arbonne Woods Charter School       560       724       806
87  343 Community School for Apprenticeship Lear      1590      2071      2246
88  344 Voices for International Business & Educ       410       476       546
89  345            Louisiana Connections Academy      1200      1200      1723
90  346 Lake Charles Charter Academy Foundation,       770       872       860
91  347    Lycee Francais de la Nouvelle-Orleans       340       400       467
92  348    New Orleans Military/Maritime Academy       210       360       540
93  349 Outreach Community Development Corporati       170       196       220
94  360   RSD-Educators for Quality Alternatives       150       159       164
95  361          RSD-Crescent Leadership Academy       150       227       162
96  363                RSD-Crescent City Schools       520       924      1015
97  364 RSD-Community Leaders Advocating Student       550       575       597
98  366   RSD-Lagniappe Academies of New Orleans       130       162       180
99  367         RSD-Spirit of Excellence Academy       370       355       387
100 368         RSD-Morris Jeff Community School       310       412       522
101 369    RSD-ReNEW-Reinventing Education, Inc.      2430      3332      3432
102 371          Shreveport Charter School, Inc.       520       527       533
103 373                        RSD-Arise Academy       380       905       848
104 374          RSD-Success Preparatory Academy       410       457       506
105 381         RSD-Akili Academy of New Orleans       380       472       487
106 382                 RSD-Collegiate Academies       580       854      1052
107 384 RSD-Miller-McCoy Academy for Math and Bu       380       325       365
108 385 RSD-New Orleans College Preparatory Acad       860      1173      1276
109 388       RSD-Broadmoor Charter School Board       610       647       628
110 389       RSD-Pelican Educational Foundation       520       566       564
111 390                         RSD-Dryades YMCA       640       570       514
112 391                      RSD-Friends of King      1140      1209      1229
113 392 RSD-New Orleans Charter Schools Foundati       420       447       448
114 393                    RSD-Choice Foundation      1840      1874      1861
115 395 RSD-Algiers Charter Schools Association       3570      4129      4450
116 396             Recovery School District-LDE      6720      3465       194
117 397    RSD-Institute for Academic Excellence       480       470       399
118 398 RSD-Knowledge is Power Program (KIPP) N.      3210      3755      4022
119 399              RSD-FirstLine Schools, Inc.      2560      2802      3114
120 A02               Office of Juvenile Justice       190       288       284

As can be seen, there are 120 data points. They all belong to different school districts. It is a long list, thus, things are not as clear. I don’t need the summary of the first two columns because they simply represent the District number and district name. Three year summary would provide us with some crucial information. Let’s see the summary of the data:

summary(complete_data[,-(1:2)])

   Year_2012         Year_2013       Year_2014      
 Min.   :   30.0   Min.   :   39   Min.   :   38.0  
 1st Qu.:  557.5   1st Qu.:  665   1st Qu.:  646.8  
 Median : 2185.0   Median : 2198   Median : 2135.0  
 Mean   : 5845.9   Mean   : 5887   Mean   : 5887.4  
 3rd Qu.: 6112.5   3rd Qu.: 5972   3rd Qu.: 5955.2  
 Max.   :45940.0   Max.   :46661   Max.   :48126.0

Now, we see the a much better picture. The school with the minimum number of students have been changed over the years. In addition, the mean number of students per school has been increasing, as well. The change, however, is not a big one. Schools seem to have higher enrollment over the years. Now, lets order a density plot and compare:

ggplot()+
  geom_freqpoly(data=complete_data, aes(Year_2012, ..density..), fill="blue", color="darkblue")+
  geom_freqpoly(data=complete_data, aes(Year_2013, ..density..), fill="red", color="darkred")+
  geom_freqpoly(data=complete_data, aes(Year_2014, ..density..), fill="purple", color="purple")+ xlab("Number of Students")

We can see that most of the schools have less than 10,000 students. Number of schools having lower than 10,000 students have been decreasing over time.

This is not the final thing I wanted my data to spell. I would thus, change the wide data into the long data to do a trend analysis. First of all, let’s change the data outlook. I want to create a new variable that will subsume 2012, 2013, and 2014 student numbers as subcategories. I will call it “Number_of_students” and my key will be the name of the school districts/agency.

long_data <- gather(data = complete_data, key = Year, value = Number_of_Students, 
                    Year_2012:Year_2013:Year_2014, factor_key = TRUE)
head(arrange(long_data, LEA), 10)

   LEA District_Agency_Name      Year Number_of_Students
1  002         Allen Parish Year_2012               4350
2  002         Allen Parish Year_2013               4368
3  002         Allen Parish Year_2014               4370
4  003     Ascension Parish Year_2012              21090
5  003     Ascension Parish Year_2013              21525
6  003     Ascension Parish Year_2014              21866
7  004    Assumption Parish Year_2012               3830
8  004    Assumption Parish Year_2013               3892
9  004    Assumption Parish Year_2014               3729
10 005     Avoyelles Parish Year_2012               6060

I can see that my new dataset has only four columns that includes two new columns, i.e., Year and Number_of_Students. Same school district has three data points each representing year 2012 through 2014.

Now lets see the student number time-series:

library(plotly)
a <- ggplot()+
  geom_line(data = long_data, aes(x = Number_of_Students, y = Year, group = Number_of_Students, colour = Number_of_Students))
ggplotly(a)

Task 1b: The ClassFile.txt contains a list of all courses taught in all schools along with the number of students in each course.

Using the CourseCd_with_Names.xlsx file, add the names of the courses to the Class File.
How many classes of Algebra II are being taught by school? The finished product should be in the following layout.

table2 <- data.frame(
  SchoolCode = c("-","-"), 
 Numbe_of_Algebra_II_Classrooms = c("-", "-")
)
kbl(table2)%>%
  kable_classic_2(full_width=F)%>%
  column_spec(1, border_right=T)%>%
  column_spec(2)

SchoolCode	Numbe_of_Algebra_II_Classrooms

# Third Table
table3 <- data.frame(
  CourseName = c("-", "-"), 
  ALGEBRA_II = c("-", "-")
)
kbl(table3)%>%
  kable_classic_2(full_width=F)%>%
  column_spec(1, border_right=T)%>%
  column_spec(2)

CourseName	ALGEBRA_II

What is the average number of students in a classroom by school? The finished product should be in the following layout.

    table3 <- data.frame(
  SchoolCode = c("-", "-"), 
  Average_Class_Size = c("-", "-")
)
kbl(table3)%>%
  kable_classic_2(full_width=F)%>%
  column_spec(1, border_right=T)%>%
  column_spec(2)

SchoolCode	Average_Class_Size

era_task

Nirmal Ghimire

8/10/2021