Load the US city population file, bigcity.csv
getwd()
## [1] "C:/Users/hangr/Documents/SPSummer/RSummer"
uscity<-read.csv("bigcity.csv")
head(uscity)
Summary of the uscity dataset
summary(uscity)
##        X            u               x
##  Min.   : 1   Min.   :  2.0   Min.   : 46.0
##  1st Qu.:13   1st Qu.: 43.0   1st Qu.: 58.0
##  Median :25   Median : 64.0   Median : 79.0
##  Mean   :25   Mean   :103.1   Mean   :127.8
##  3rd Qu.:37   3rd Qu.:120.0   3rd Qu.:130.0
##  Max.   :49   Max.   :507.0   Max.   :634.0
Mean and median of two variables
mean(uscity$X)
## [1] 25
median(uscity$X)
## [1] 25
mean(uscity$u)
## [1] 103.1429
median(uscity$u)
## [1] 64
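The four calls above can also be collapsed into a single pass over the columns. The following is a minimal sketch using a small made-up data frame (`toy` and its values are illustrative only, not taken from bigcity.csv):

```r
# Toy data frame standing in for uscity (values are made up for illustration)
toy <- data.frame(u = c(2, 43, 64, 120), x = c(46, 58, 79, 130))

# Apply mean and median to every column; the result is a matrix
# with one row per statistic and one column per variable
stats <- sapply(toy, function(col) c(mean = mean(col), median = median(col)))
print(stats)
```

Indexing the result as `stats["mean", "u"]` pulls out a single value, which avoids repeating the column name in separate calls.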
Create a new data frame with a subset of the columns and rows
uscity_sub<-uscity[,c('x','u')]
head(uscity_sub)
Create new column names for the subset
colnames(uscity_sub)<-c("pop_1930","pop_1920")
head(uscity_sub)
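Assigning a whole vector of names depends on the column order of the subset, so a label can end up on the wrong column. A hedged alternative is to rename each column by matching its old name. The sketch below uses a made-up data frame (`toy`) and assumes, as in the boot package's documentation of bigcity, that `u` is the 1920 population and `x` is the 1930 population:

```r
# Toy stand-in for uscity (values are made up for illustration)
toy <- data.frame(X = 1:3, u = c(138, 93, 61), x = c(143, 104, 69))

# Keep only the two population columns
toy_sub <- toy[, c("u", "x")]

# Rename by old name, so the result does not depend on column order
# (assumption: u = 1920 population, x = 1930 population)
names(toy_sub)[names(toy_sub) == "u"] <- "pop_1920"
names(toy_sub)[names(toy_sub) == "x"] <- "pop_1930"
print(toy_sub)
```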
For question 5, load a new file and rename the columns
getwd()
## [1] "C:/Users/hangr/Documents/SPSummer/RSummer"
acme<-read.csv("acme.csv")
head(acme)
names(acme)<-gsub("e", "excellent", names(acme))
acme
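Note that `gsub()` replaces every match of the pattern, so the pattern "e" changes each column name that contains an "e" anywhere. The sketch below illustrates this on a hypothetical vector of names (the names in `cols` are an assumption about what acme.csv contains) and shows an anchored pattern that targets one column only:

```r
# Hypothetical column names (assumed, not read from acme.csv)
cols <- c("X", "month", "market", "acme")

# Unanchored pattern: every "e" in every name is replaced
all_e <- gsub("e", "excellent", cols)
print(all_e)

# Anchored pattern: only the name that is exactly "acme" is changed
one_col <- gsub("^acme$", "acme_excellent", cols)
print(one_col)
```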
Read from GitHub
bigcity<-read.csv("https://raw.githubusercontent.com/HantzSPS/SPS/master/bigcity.csv", sep = ",", na.strings = "NA", strip.white = TRUE, stringsAsFactors = FALSE)
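A GitHub address containing `/blob/` points to the HTML page that displays the file, not to the CSV itself, which is a common cause of the "EOF within quoted string" warning from `read.csv()`. The raw file is served from raw.githubusercontent.com. The helper below, `to_raw_url`, is a hypothetical name introduced for illustration; it is a minimal sketch that assumes the usual `github.com/<user>/<repo>/blob/<branch>/<path>` layout:

```r
# Hypothetical helper: convert a GitHub "blob" page URL to its raw-file URL
to_raw_url <- function(url) {
  # Swap the host for the raw-content host
  url <- sub("^https://github\\.com/", "https://raw.githubusercontent.com/", url)
  # Drop the first "/blob/" path segment
  sub("/blob/", "/", url, fixed = TRUE)
}

blob_url <- "https://github.com/HantzSPS/SPS/blob/master/bigcity.csv"
raw_url <- to_raw_url(blob_url)
print(raw_url)
```

The resulting string can be passed to `read.csv()` in place of the page URL.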
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.4
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
bigcitydf<-as_tibble(bigcity)
bigcitydf