PACKAGE INSTALLATION

I began by installing the packages I would use: dplyr, ggplot2, and tidyverse. I also loaded xlsx to read the Excel file.

install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(ggplot2)
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.1.8
## ✔ purrr     1.0.1     ✔ tidyr     1.3.0
## ✔ readr     2.1.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(xlsx)
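
Calling install.packages() on every run reinstalls packages that are already present. The sketch below shows one possible alternative, assuming the four packages loaded above are the ones needed. `missing_packages` is a hypothetical helper name, not part of any package, and the install call is left commented out so the example has no side effects.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
# system.file(package = p) returns "" when package p cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

needed <- c("dplyr", "ggplot2", "tidyverse", "xlsx")
to_install <- missing_packages(needed)
print(to_install)

# To install whatever is missing, uncomment the next line:
# install.packages(to_install)
```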

FILE IMPORT

Using the Files pane and read.xlsx(), I imported an Excel spreadsheet into RStudio Cloud.

patents_count_spreadsheet<-read.xlsx("patents count spreadsheet.xlsx", 
                                     sheetName = "Years and Totals")
patents_count_spreadsheet
##     Row.Labels Sum.of.Total.Patent.Applications..
## 1         1963                              90982
## 2         1964                              92971
## 3         1965                             100150
## 4         1966                              93482
## 5         1967                              90544
## 6         1968                              98737
## 7         1969                             104357
## 8         1970                             109359
## 9         1971                             111095
## 10        1972                             105300
## 11        1973                             109622
## 12        1974                             108011
## 13        1975                             107456
## 14        1976                             109580
## 15        1977                             108377
## 16        1978                             108648
## 17        1979                             108209
## 18        1980                             112379
## 19        1981                             113966
## 20        1982                             117987
## 21        1983                             112040
## 22        1984                             120276
## 23        1985                             126788
## 24        1986                             132665
## 25        1987                             139455
## 26        1988                             151491
## 27        1989                             165748
## 28        1990                             176264
## 29        1991                             177830
## 30        1992                             186507
## 31        1993                             188739
## 32        1994                             206090
## 33        1995                             228238
## 34        1996                             211013
## 35        1997                             232424
## 36        1998                             260889
## 37        1999                             288811
## 38        2000                             315015
## 39        2001                             345732
## 40        2002                             356493
## 41        2003                             366043
## 42        2004                             382139
## 43        2005                             417508
## 44        2006                             452633
## 45        2007                             484955
## 46        2008                             485312
## 47        2009                             482871
## 48        2010                             520277
## 49        2011                             535188
## 50        2012                             576763
## 51        2013                             609052
## 52        2014                             615243
## 53        2015                             629647
## 54        2016                             649319
## 55        2017                             651355
## 56        2018                             643303
## 57        2019                             669434
## 58        2020                             646244
## 59 Grand Total                           16041006

DATA CLEANING

Then, I performed data cleaning, renaming both columns and dropping the last row because it only held a sum.

# Rename both columns to shorter, descriptive names
clean_patent_data<-rename(patents_count_spreadsheet,"Year"="Row.Labels",
                       "Patent_Applications"="Sum.of.Total.Patent.Applications..")
# Drop row 59, which holds only the "Grand Total" sum
clean_patent_data<-clean_patent_data[-59,]
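
Dropping row 59 by position works for this spreadsheet, but it would remove the wrong row if the sheet gained or lost a year. The sketch below shows one possible variant on a toy data frame: three values copied from the import above, with a total computed for the toy rows only. It filters out the summary row by its label and converts Year to an integer. The names `toy` and `toy_clean` are illustrative, and this is not the method used in this report.

```r
library(dplyr)

# Toy stand-in for the imported sheet, using the imported column names.
toy <- data.frame(
  Row.Labels = c("1963", "1964", "1965", "Grand Total"),
  Sum.of.Total.Patent.Applications.. = c(90982, 92971, 100150, 284103),
  stringsAsFactors = FALSE
)

toy_clean <- toy %>%
  rename(Year = Row.Labels,
         Patent_Applications = Sum.of.Total.Patent.Applications..) %>%
  filter(Year != "Grand Total") %>%   # drop the summary row by label
  mutate(Year = as.integer(Year))     # store years as numbers, not text

str(toy_clean)
```

A numeric Year would put the plots on a continuous x-axis, so scale_x_discrete() would no longer apply.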

FULL DATA PLOT

Next, I made a plot with ggplot2 for a first look at the data.

ggplot(data=clean_patent_data)+
  geom_point(mapping=aes(x=Year,y=Patent_Applications))+
  labs(title="Total Patents 1963 to 2020",y="Patent Applications")+
  scale_x_discrete(guide=guide_axis(n.dodge=2))+
  theme(axis.text.x = element_text(size=10,angle = 45, vjust = 1, hjust=1))

Since 1963, patent applications have increased, following what looks like either a polynomial or an exponential trend. Looking closer, I identified three sections where the slope appeared to change most noticeably and created three smaller plots based on year ranges: 1963-1985, 1986-1995, and 1996-2020.
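
One way to probe whether the growth looks more exponential or more polynomial is to fit both forms and compare their errors. The sketch below is an illustration only. It uses seven values copied from the imported table (roughly one per decade), base R's lm(), and hypothetical names such as `decade_sample`. It is not a substitute for fitting the full series.

```r
decade_sample <- data.frame(
  year = c(1963, 1970, 1980, 1990, 2000, 2010, 2020),
  apps = c(90982, 109359, 112379, 176264, 315015, 520277, 646244)
)

# Exponential trend: linear in log(apps).
exp_fit  <- lm(log(apps) ~ year, data = decade_sample)
# Polynomial trend: quadratic in year.
quad_fit <- lm(apps ~ poly(year, 2), data = decade_sample)

# Compare both fits on the original scale with root-mean-square error.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
rmse_exp  <- rmse(decade_sample$apps, exp(fitted(exp_fit)))
rmse_quad <- rmse(decade_sample$apps, fitted(quad_fit))

c(exponential = rmse_exp, quadratic = rmse_quad)
```

A lower RMSE indicates a closer fit on these points. With so few points, neither result would be conclusive.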

1963 TO 1985 PLOT

clean_patent_data_63_to_85<-clean_patent_data[1:23,] 
ggplot(data=clean_patent_data_63_to_85)+ 
  geom_point(mapping=aes(x=Year,y=Patent_Applications))+ 
  labs(title="Patent Applications 1963 to 1985",y="Patent Applications")+
  theme(axis.text.x = element_text(size=10,angle = 45, vjust = 1, hjust=1))
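
The row indices 1:23 depend on the data starting at 1963 with no gaps. As a sketch of an alternative, the three year ranges can be labeled by value with base R's cut(). The example assumes Year is stored as text, as it is after the import above. `toy_years` and `Period` are illustrative names, and the data frame holds only the years, not the application counts.

```r
# Toy stand-in: one row per year, with Year stored as text.
toy_years <- data.frame(
  Year = as.character(1963:2020),
  stringsAsFactors = FALSE
)

# cut() uses right-closed intervals by default:
# (1962, 1985], (1985, 1995], (1995, 2020]
toy_years$Period <- cut(
  as.integer(toy_years$Year),
  breaks = c(1962, 1985, 1995, 2020),
  labels = c("1963-1985", "1986-1995", "1996-2020")
)

# Number of years in each period
table(toy_years$Period)

# Select one period by label instead of by row index
first_period <- subset(toy_years, Period == "1963-1985")
```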

Analysis

This plot shows a possibly linear trend, though the interpretation is limited by possible outliers in the data. Possible explanations for them include:

  • rapidly changing patent laws
  • the patent filing process itself being understood by few people.
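
To make "possibly linear" and "possible outliers" more concrete, a straight line can be fitted to this period and the residuals inspected. The sketch below uses the 1963 to 1985 values copied from the imported table. The name `early` and the cutoff of 2 standardized residuals are assumptions: the cutoff is a common rule of thumb, not a formal test.

```r
early <- data.frame(
  year = 1963:1985,
  apps = c(90982, 92971, 100150, 93482, 90544, 98737, 104357, 109359,
           111095, 105300, 109622, 108011, 107456, 109580, 108377, 108648,
           108209, 112379, 113966, 117987, 112040, 120276, 126788)
)

# Fit a straight line to applications as a function of year.
fit <- lm(apps ~ year, data = early)

# Standardized residuals measure how far each year sits from the line.
early$std_resid <- rstandard(fit)

# Show the years whose standardized residual exceeds 2 in absolute value.
early[abs(early$std_resid) > 2, ]
```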

1986 TO 1995 PLOT

clean_patent_data_86_to_95<-clean_patent_data[24:33,]
ggplot(data=clean_patent_data_86_to_95)+
  geom_point(mapping=aes(x=Year,y=Patent_Applications))+
  labs(title="Patent Applications 1986 to 1995",y="Patent Applications")+
  theme(axis.text.x = element_text(size=10,angle = 45, vjust = 1, hjust=1))

Analysis

This plot is much cleaner, with fewer outliers. However, this could be due to the small number of data points in the span of time I examined. If not, a better understanding of the patenting process could explain the improved linearity.

1996 TO 2020 PLOT

clean_patent_data_96_to_20<-clean_patent_data[34:58,]
ggplot(data=clean_patent_data_96_to_20)+
  geom_point(mapping=aes(x=Year,y=Patent_Applications))+
  labs(title="Patent Applications 1996 to 2020",y="Patent Applications")+
  theme(axis.text.x = element_text(size=10,angle = 45, vjust = 1, hjust=1))

Analysis

This last plot has a good number of data points and a very linear-looking trend. The only departure from linearity is near the end, where the trend seems to plateau. Since this happens around 2016 on the graph, I think the greater proliferation of the internet and social media may have created ways to circumvent the patent application process. These might include:

  • getting fake patents from a site
  • research dissuading people from applying because their patent would be too similar to others’

Further investigation would delve into how patent laws changed over time and whether patent-related crime rates increased.
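
The apparent plateau near the end of the series can be quantified with year-over-year changes. This is a small sketch using the 2016 to 2020 values copied from the imported table. `recent` is an illustrative name.

```r
recent <- data.frame(
  year = 2016:2020,
  apps = c(649319, 651355, 643303, 669434, 646244)
)

# Year-over-year change; the first year has no predecessor, so it is NA.
recent$change <- c(NA, diff(recent$apps))

# The same change as a percentage of the previous year's total.
recent$pct_change <- round(100 * recent$change / c(NA, head(recent$apps, -1)), 2)

recent
```

Changes that alternate in sign, rather than staying positive, would be consistent with the flattening seen in the plot.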

Resources

I obtained my data from the following .gov site:

US Patent and Trademark Office (https://www.uspto.gov/web/offices/ac/ido/oeip/taf/us_stat.htm)