This is an R HTML document. When you click the Knit HTML button a web page will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

## This project collects information on teachers' salaries in different states and groups it by state
## and by region, to see which states and regions pay their teachers more.

#load different packages
library(ggplot2)
library(reshape2)
library(psych)
library(RCurl)

#load data file from github
statesData <- read.table(file="https://raw.githubusercontent.com/maharjansudhan/states_csv/master/States.csv", header=TRUE, sep=",",stringsAsFactors = FALSE)

#rename the X column to states_id, and give the third and fourth columns descriptive names
colnames(statesData)[1] <- "states_id"
colnames(statesData)[3] <- "population"
colnames(statesData)[4] <- "teachers_salary"

#multiply the salary column by 1000 (salaries are recorded in thousands of dollars)
statesData$teachers_salary <- statesData$teachers_salary * 1000

#multiply the population column by 1000
statesData$population <- statesData$population * 1000


#Trim leading and trailing whitespace from the state identifiers
statesData$states_id <- trimws(statesData$states_id)

#delete columns you don't want from the dataset
statesData <- subset( statesData, select = -c(SATV, SATM ) )
statesData <- subset( statesData, select = -c(percent, dollars))

summary(statesData)
##   states_id            region            population       teachers_salary
##  Length:51          Length:51          Min.   :  454000   Min.   :22000  
##  Class :character   Class :character   1st Qu.: 1215000   1st Qu.:27500  
##  Mode  :character   Mode  :character   Median : 3294000   Median :30000  
##                                        Mean   : 4876647   Mean   :30941  
##                                        3rd Qu.: 5780000   3rd Qu.:33500  
##                                        Max.   :29760000   Max.   :43000  
# After cleaning the dataset and dropping the columns we do not need, we are left with a dataset that
# holds teachers' salaries for the different states and regions.
<p>You can also embed plots, for example:</p>
## Error: <text>:33:22: unexpected symbol
## 32: summary(statesData)
## 33: states_id            region
##                          ^
boxplot(statesData$teachers_salary)
## Error in boxplot(statesData$teachers_salary): object 'statesData' not found
#According to the boxplot, the minimum salary nationwide is 22000, the maximum is 43000 and the median is 30000, and the plot shows an outlier.
#We can use a histogram to see the distribution of the data.
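The outlier mentioned above can be checked by hand with the 1.5 x IQR rule that boxplot() applies by default. This is a minimal sketch, assuming the quartiles printed by summary(statesData); the variable names are illustrative only. boxplot() itself works from the hinges returned by fivenum(), which can differ slightly from these quartiles.

```r
# Quartiles and extremes of teachers_salary, copied from the summary() output
q1 <- 27500
q3 <- 33500
min_salary <- 22000
max_salary <- 43000

# Interquartile range and the default boxplot fences (coef = 1.5)
iqr <- q3 - q1
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# A value is drawn as an outlier when it falls outside the fences
c(lower = lower_fence, upper = upper_fence)
max_salary > upper_fence   # is the maximum above the upper fence?
min_salary < lower_fence   # is the minimum below the lower fence?
```

Under these assumptions the fences fall at 18500 and 42500, so only the values at the top of the range are flagged.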

ggplot(statesData, aes(teachers_salary) ) + geom_histogram(color="black")
## Error in ggplot(statesData, aes(teachers_salary)): could not find function "ggplot"
#We can list the top 10 states that pay teachers the highest salaries.
head(statesData[order(statesData$teachers_salary, decreasing = TRUE),], n = 10)
##    states_id region population teachers_salary
## 2         AK    PAC     550000           43000
## 7         CN     NE    3287000           43000
## 33        NY     MA   17990000           42000
## 5         CA    PAC   29760000           39000
## 9         DC     SA     607000           39000
## 21        MD     SA    4781000           38000
## 23        MI    ENC    9295000           38000
## 31        NJ     MA    7730000           38000
## 40        RI     NE    1003000           37000
## 22        MA     NE    6016000           36000
#Salary spread according to the regions

ggplot(statesData, aes(x = teachers_salary, y = region )) + geom_point(na.rm=TRUE)+geom_smooth(method=lm,se=FALSE, na.rm=TRUE)
## Error in ggplot(statesData, aes(x = teachers_salary, y = region)): could not find function "ggplot"
#According to the scatterplot, teachers' salaries in the Mid-Atlantic region appear higher than in the other regions, even though the highest salary in the United States is not paid there.
#In conclusion, according to the dataset and the plot, the East South Central (ESC) and West South Central (WSC) regions pay their teachers very little compared with the other regions. Teachers in the North-Eastern states and regions are paid relatively more than others in the United States.
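The regional comparison can also be checked numerically instead of being read off the plot. This is a minimal sketch that uses only the ten rows printed by the head() call above, so it covers the top earners rather than all 51 rows; the names top10 and region_median are hypothetical.

```r
# The ten highest-paying rows, copied from the head() output above
top10 <- data.frame(
  states_id = c("AK", "CN", "NY", "CA", "DC", "MD", "MI", "NJ", "RI", "MA"),
  region = c("PAC", "NE", "MA", "PAC", "SA", "SA", "ENC", "MA", "NE", "NE"),
  teachers_salary = c(43000, 43000, 42000, 39000, 39000,
                      38000, 38000, 38000, 37000, 36000),
  stringsAsFactors = FALSE
)

# Median salary per region, sorted from highest to lowest
region_median <- aggregate(teachers_salary ~ region, data = top10, FUN = median)
region_median <- region_median[order(region_median$teachers_salary, decreasing = TRUE), ]
region_median
```

Running the same aggregate() call on the full statesData would give per-region figures for all regions, including ESC and WSC, which do not appear in this ten-row subset.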