Data set provided by Abdellah Ait Elmouden.
Yearly full-text report provides data on the average agricultural land values. Includes, by state and region, average value per acre for farm real estate (farmland and buildings), irrigated and non-irrigated cropland, and pastureland. yearly full-text report provides data on the average agricultural land values. Includes, by state and region, average value per acre for farm real estate (farmland and buildings), irrigated and non-irrigated cropland, and pastureland.
There are numerous tables within the report. I will extract, tidy and explore table Farm Real Estate Average Value per Acre, showed below.
Table
Step 1: Reading file and extracting part that I will work on.
Table is located between rows 64 and 100.
if (!require('tidyverse')) install.packages('tidyverse')
## Loading required package: tidyverse
## ── Attaching packages ───────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1 ✓ purrr 0.3.3
## ✓ tibble 2.1.3 ✓ dplyr 0.8.4
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
data_raw<-read.csv('https://downloads.usda.library.cornell.edu/usda-esmis/files/pn89d6567/g732dn07g/th83m920j/land0819.txt')
head(data_raw)
## United.States.Department.of.Agriculture
## 1 National Agricultural Statistics Service
## 2 ISSN: 1949-1867
## 3 Land Values
## 4 2019 Summary
## 5 August 2019
## 6 Contents
con<-file('https://downloads.usda.library.cornell.edu/usda-esmis/files/pn89d6567/g732dn07g/th83m920j/land0819.txt', open='r')
land_raw<-readLines(con)
leng<-length(land_raw)
rawDF1<-NULL
j<-0
for (i in 1:leng)
{
if(between(i,64,100))
{
j<-j+1
rawDF1<-rbind(rawDF1,land_raw[i])
}
}
close(con)
head(rawDF1)
## [,1]
## [1,] "Northeast ..........: 5,190 5,270 5,380 5,550 5,690 2.5 "
## [2,] " Connecticut ......: 11,700 11,900 12,100 12,300 12,200 -0.8 "
## [3,] " Delaware .........: 8,110 8,290 8,250 8,410 8,950 6.4 "
## [4,] " Maine ............: 2,170 2,210 2,370 2,370 2,410 1.7 "
## [5,] " Maryland .........: 7,270 7,470 7,620 7,860 8,060 2.5 "
## [6,] " Massachusetts ....: 10,600 10,700 10,800 10,900 11,100 1.8 "
Next step is to eliminate spaces, dots, semicolon blank lines etc. Data frame will be generated.
rawDF1<-str_trim(rawDF1, side="both")
states <- unlist(str_replace_all(rawDF1, "[[:space:]]", ""))
states<- unlist(str_extract_all(states,"[[:alpha:]]{2,}"))
values <- unlist((str_extract_all(rawDF1, "[[:digit:]]{1,},[[:digit:]]{2,}")))
values<-str_trim(values,side="both")
values <- matrix((unlist(values)), nrow = 33, ncol = 5)
values <- data.matrix(values)
values <- matrix(unlist(values), nrow = 33, ncol = 5)
#y<-transform(values,as.numeric(values))
#x<-as.numeric((unlist(values[[1]])))
rawDF2<-data.frame(states, values, stringsAsFactors = FALSE)
names(rawDF2)<-c("Region/State","2015", "2016", "2017", "2018", "2019")
head(rawDF2)
## Region/State 2015 2016 2017 2018 2019
## 1 Northeast 5,190 4,900 4,900 7,190 2,100
## 2 Connecticut 5,270 4,980 4,940 3,230 2,070
## 3 Delaware 5,380 13,300 4,960 3,220 2,070
## 4 Maine 5,550 13,300 4,960 3,120 3,810
## 5 Maryland 5,690 13,400 4,760 3,380 3,880
## 6 Massachusetts 11,700 13,500 4,790 3,400 3,970
Now to transform from long to wide and do some EDA. First, I will split first column to display only States
rawDF3<-rawDF2[-c(1,13,17,23,28),]
names(rawDF3)[1]<-"State"
#$=c("Northeast","LakeStates", "","CornBelts","NorthernPlains","Appalachian"),)
head(rawDF3)
## State 2015 2016 2017 2018 2019
## 2 Connecticut 5,270 4,980 4,940 3,230 2,070
## 3 Delaware 5,380 13,300 4,960 3,220 2,070
## 4 Maine 5,550 13,300 4,960 3,120 3,810
## 5 Maryland 5,690 13,400 4,760 3,380 3,880
## 6 Massachusetts 11,700 13,500 4,790 3,400 3,970
## 7 NewHampshire 11,900 13,500 4,870 5,930 4,030
Tidying I will create a variable year and transpose the format to wide
rawDF4<- rawDF3 %>%
gather(2:6,key="year", value="Land_Value")
head(rawDF4)
## State year Land_Value
## 1 Connecticut 2015 5,270
## 2 Delaware 2015 5,380
## 3 Maine 2015 5,550
## 4 Maryland 2015 5,690
## 5 Massachusetts 2015 11,700
## 6 NewHampshire 2015 11,900
(year_2015 <- rawDF4 %>%
filter(year==2015) %>%
ggplot(mapping=aes(x=State,y=Land_Value)) + geom_col()
)
I struggled for hours to convert characters into numeric (column Land_Value) and because of that I was not able to do any meaningful analysis, like aggregation by mean, or a trend chart.