I began by installing the packages I would use, specifically dplyr, ggplot2, and tidyverse.
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(ggplot2)
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.1.8
## ✔ purrr 1.0.1 ✔ tidyr 1.3.0
## ✔ readr 2.1.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(xlsx)
Using the Files pane and read.xlsx(), I imported an Excel spreadsheet into RStudio Cloud.
patents_count_spreadsheet<-read.xlsx("patents count spreadsheet.xlsx",
sheetName = "Years and Totals")
patents_count_spreadsheet
## Row.Labels Sum.of.Total.Patent.Applications..
## 1 1963 90982
## 2 1964 92971
## 3 1965 100150
## 4 1966 93482
## 5 1967 90544
## 6 1968 98737
## 7 1969 104357
## 8 1970 109359
## 9 1971 111095
## 10 1972 105300
## 11 1973 109622
## 12 1974 108011
## 13 1975 107456
## 14 1976 109580
## 15 1977 108377
## 16 1978 108648
## 17 1979 108209
## 18 1980 112379
## 19 1981 113966
## 20 1982 117987
## 21 1983 112040
## 22 1984 120276
## 23 1985 126788
## 24 1986 132665
## 25 1987 139455
## 26 1988 151491
## 27 1989 165748
## 28 1990 176264
## 29 1991 177830
## 30 1992 186507
## 31 1993 188739
## 32 1994 206090
## 33 1995 228238
## 34 1996 211013
## 35 1997 232424
## 36 1998 260889
## 37 1999 288811
## 38 2000 315015
## 39 2001 345732
## 40 2002 356493
## 41 2003 366043
## 42 2004 382139
## 43 2005 417508
## 44 2006 452633
## 45 2007 484955
## 46 2008 485312
## 47 2009 482871
## 48 2010 520277
## 49 2011 535188
## 50 2012 576763
## 51 2013 609052
## 52 2014 615243
## 53 2015 629647
## 54 2016 649319
## 55 2017 651355
## 56 2018 643303
## 57 2019 669434
## 58 2020 646244
## 59 Grand Total 16041006
Then, I performed data cleaning, renaming both columns and dropping the last row because it only held a sum.
# Rename both columns (new name = old name)
clean_patent_data<-rename(patents_count_spreadsheet,"Year"="Row.Labels",
"Patent_Applications"="Sum.of.Total.Patent.Applications..")
# Drop row 59, the "Grand Total" row
clean_patent_data<-clean_patent_data[-59,]
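Dropping row 59 by position works for this spreadsheet, but it would remove the wrong row if the sheet ever gained or lost a year. Below is a minimal sketch of a label-based alternative in base R. It runs on a small hypothetical stand-in data frame (`toy_patents`, with three yearly values copied from the table above and a total computed for just those three rows), not on the real import. The numeric `Year_num` column is an optional extra that the plots in this report do not rely on.

```r
# Hypothetical stand-in for the renamed spreadsheet (not the real import)
toy_patents <- data.frame(
  Year = c("1963", "1964", "1965", "Grand Total"),
  Patent_Applications = c(90982, 92971, 100150, 284103),
  stringsAsFactors = FALSE
)

# Keep only rows whose label is a four-digit year, so the total row
# is dropped by content rather than by position
is_year <- grepl("^[0-9]{4}$", toy_patents$Year)
toy_clean <- toy_patents[is_year, ]

# Optional: a numeric copy of the year, useful for model fitting
toy_clean$Year_num <- as.integer(toy_clean$Year)

print(toy_clean)
```

Keeping `Year` as text alongside `Year_num` means code that treats the year as a discrete axis still works unchanged.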
Next, I made a plot using ggplot2 for a first look at the data.
ggplot(data=clean_patent_data)+
geom_point(mapping=aes(x=Year,y=Patent_Applications))+
labs(title="Total Patents 1963 to 2020",y="Patent Applications")+
scale_x_discrete(guide=guide_axis(n.dodge=2))+
theme(axis.text.x = element_text(size=10,angle = 45, vjust = 1, hjust=1))
Since 1963, patent applications have increased, following either a polynomial or an exponential trend. Examining further, I identified three sections to explore where the slope appeared to change most noticeably, creating three smaller plots based on year ranges: 1963-1985, 1986-1995, and 1996-2020.
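One way to compare the polynomial and exponential readings of the trend is to fit both and look at the prediction error. The sketch below is only illustrative: it uses every fifth year from the table above rather than the full data set, treats "polynomial" as a second-degree fit (an assumption on my part), and the names `every_fifth`, `quad_fit`, `exp_fit`, and `rmse` are made up for this example.

```r
# Every fifth year, with counts copied from the table above
every_fifth <- data.frame(
  Year = seq(1965, 2020, by = 5),
  Patent_Applications = c(100150, 109359, 107456, 112379, 126788, 176264,
                          228238, 315015, 417508, 520277, 629647, 646244)
)

# Years since 1965, to keep the predictor values small
every_fifth$t <- every_fifth$Year - 1965

# Second-degree polynomial fit on the original scale
quad_fit <- lm(Patent_Applications ~ poly(t, 2), data = every_fifth)

# Exponential fit: a straight line through the log of the counts
exp_fit <- lm(log(Patent_Applications) ~ t, data = every_fifth)

# Root mean squared error, measured on the original scale for both models
rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))
quad_rmse <- rmse(every_fifth$Patent_Applications, fitted(quad_fit))
exp_rmse <- rmse(every_fifth$Patent_Applications, exp(fitted(exp_fit)))

print(c(quadratic = quad_rmse, exponential = exp_rmse))
```

A lower error for one model on this subsample would be suggestive rather than conclusive, since the comparison ignores the other years and does not penalize the extra parameter in the quadratic fit.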
clean_patent_data_63_to_85<-clean_patent_data[1:23,]
ggplot(data=clean_patent_data_63_to_85)+
geom_point(mapping=aes(x=Year,y=Patent_Applications))+
labs(title="Patent Applications 1963-1985",y="Patent Applications")+
theme(axis.text.x = element_text(size=10,angle = 45, vjust = 1, hjust=1))
This plot shows a possibly linear trend, though this interpretation is limited by the possible outliers in the data. An explanation for them could be:
clean_patent_data_86_to_95<-clean_patent_data[24:33,]
ggplot(data=clean_patent_data_86_to_95)+
geom_point(mapping=aes(x=Year,y=Patent_Applications))+
labs(title="Patent Applications 1986 to 1995",y="Patent Applications")+
theme(axis.text.x = element_text(size=10,angle = 45, vjust = 1, hjust=1))
This plot is much cleaner, with fewer outliers. However, this could be due to the small number of data points in the particular span of time I examined. If not, a better understanding of the patenting process could explain the improved linearity.
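To put a rough number on how linear this segment is, one option is to fit a straight line and inspect R-squared and the residuals. This is a minimal sketch, assuming a numeric year column; the counts are copied from the 1986 to 1995 rows of the table above, and the names `segment_86_95` and `linear_fit` are illustrative only.

```r
# 1986 to 1995 counts, copied from the table above
segment_86_95 <- data.frame(
  Year = 1986:1995,
  Patent_Applications = c(132665, 139455, 151491, 165748, 176264,
                          177830, 186507, 188739, 206090, 228238)
)

# Straight-line fit: applications as a linear function of year
linear_fit <- lm(Patent_Applications ~ Year, data = segment_86_95)

# Average yearly increase implied by the fit
slope <- unname(coef(linear_fit)["Year"])

# Share of variance explained by the line
r_squared <- summary(linear_fit)$r.squared

# Year with the largest absolute deviation from the line
largest_residual_year <- segment_86_95$Year[which.max(abs(resid(linear_fit)))]

print(c(slope = slope, r_squared = r_squared))
print(largest_residual_year)
```

With only ten points, a high R-squared here should be read with the same caution as the plot itself.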
clean_patent_data_96_to_20<-clean_patent_data[34:58,]
ggplot(data=clean_patent_data_96_to_20)+
geom_point(mapping=aes(x=Year,y=Patent_Applications))+
labs(title="Patent Applications 1996 to 2020",y="Patent Applications")+
theme(axis.text.x = element_text(size=10,angle = 45, vjust = 1, hjust=1))
Finally, the last plot. This time, there are a good number of data points as well as a very linear-looking trend. The only departure from linearity is near the end, around 2016, where the trend seems to plateau. I think greater proliferation of the internet and social media may create ways to circumvent the patent application process. These might include:
Further investigation would delve into how patent laws changed over time and whether patent-related crime rates increased.
I obtained my data from the following .gov site:
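As a starting point for that investigation, the apparent plateau in the last plot can be checked by computing year-over-year percentage changes. The sketch below uses the 2014 to 2020 counts copied from the table above; the names `recent` and `yoy` are illustrative.

```r
# 2014 to 2020 counts, copied from the table above
recent <- data.frame(
  Year = 2014:2020,
  Patent_Applications = c(615243, 629647, 649319, 651355,
                          643303, 669434, 646244)
)

# Percentage change relative to the previous year
pct_change <- 100 * diff(recent$Patent_Applications) /
  head(recent$Patent_Applications, -1)

# One row per year from 2015 onward
yoy <- data.frame(Year = recent$Year[-1], Pct_Change = round(pct_change, 2))

print(yoy)
```

Years with a negative `Pct_Change` are those in which applications fell compared with the year before.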
US Patent and Trademark Office (https://www.uspto.gov/web/offices/ac/ido/oeip/taf/us_stat.htm)