1 Summary

This is an R Markdown document describing the final analysis of the rhetorical move sequences for a sample of UK patents from 1740 to 2011, with one randomly selected patent from each year. The goal is to track change over time in the discursive structure of British patents, primarily using string edit distance. The results of this analysis will be presented at ICAME 39 and written up for publication.

2 Packages

We’re using stringdist, which contains various string matching and string distance functions, as are commonly used in computational linguistics, dialectometry, genetics, etc. Documentation is available at https://cran.r-project.org/web/packages/stringdist/stringdist.pdf. We’re also using forecast for time series smoothing, as well as knitr and dplyr.

library(stringdist)
library(forecast)

## Warning: package 'forecast' was built under R version 3.4.2

## Warning in as.POSIXlt.POSIXct(Sys.time()): unknown timezone 'zone/tz/2017c.
## 1.0/zoneinfo/Europe/London'

library(knitr)
library(dplyr)

## Warning: package 'dplyr' was built under R version 3.4.2

3 Data

The dataset consists of one string of characters for each of the 272 years between 1740 and 2011 (see the end of the report for the complete list).

patents <- read.table("PATENT_RECODE.csv", header = TRUE, sep = ",")
patents

Each string represents a sequence of rhetorical moves. In total, we have coded 21 distinct moves.

gloss <- read.table("GLOSS_RECODE.txt", header = TRUE, sep = "\t")
kable(gloss)

Code	Move
A	Salutation
B	Royal Grant
C	Condition Statement
D	Invention Declaration
E	Invention Description
F	Witness
G	Enrolment Confirmation
H	Other Witnesses
I	Petition
J	Drawings
K	Claims
L	Filing Information
M	Descriptive Title
N	Grant Declaration
O	Provisional Specification Header
P	Provisional Invention Description
Q	Specification Header
R	Overseas Communication
S	Generic Title
T	Abstract
U	Abstract Title

For example, the first year in the dataset (1740) is represented by a patent coded as ABCDEFG. This corresponds to this sequence of moves:

Salutation > Royal Grant > Condition Statement > Invention Declaration > Invention Description > Witness > Enrolment Confirmation

Alternatively, the string for the last year in the dataset (2011), SLMUTLJEK, corresponds to this sequence:

Generic Title > Filing Information > Descriptive Title > Abstract Title > Abstract > Filing Information > Drawings > Invention Description > Claims
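The decoding illustrated by these two examples can be sketched in a few lines of R. This is a minimal sketch rather than part of the original analysis: `gloss_lookup` and `decode_moves()` are hypothetical names, and the lookup vector simply copies the first seven rows of the gloss table above.

```r
# Lookup vector copied from the gloss table (codes A to G only).
gloss_lookup <- c(A = "Salutation", B = "Royal Grant", C = "Condition Statement",
                  D = "Invention Declaration", E = "Invention Description",
                  F = "Witness", G = "Enrolment Confirmation")

# decode_moves() is a hypothetical helper: it splits a move string into single
# characters and replaces each code with its move name.
decode_moves <- function(code, lookup) {
  moves <- strsplit(code, "")[[1]]
  paste(unname(lookup[moves]), collapse = " > ")
}

decode_moves("ABCDEFG", gloss_lookup)
```

With the full gloss table loaded, the same lookup could presumably be built with `setNames(as.character(gloss$Move), gloss$Code)`.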

The basic research question is: how did British patents evolve from ABCDEFG to SLMUTLJEK? Gradually? In bursts? And what do these results tell us about the evolution of discourse structures in general?

4 Initial Move Sequence Analysis

4.1 Distribution of Sequences

Over 272 years, 51 sequence types are attested.

length(table(patents$CODE))

## [1] 51

Some sequences are repeated up to 31 times.

summary(patents)

##       YEAR                 CODE    
##  Min.   :1740   SLQMDEKFJ    : 31  
##  1st Qu.:1808   SLMUTLJEK    : 23  
##  Median :1876   ABCDEKFHGJ   : 20  
##  Mean   :1876   ABCDEFG      : 17  
##  3rd Qu.:1943   LOMDPFQMDEKFJ: 16  
##  Max.   :2011   ABCDEFHG     : 12  
##                 (Other)      :153

Overall, the frequency distribution of sequences shows a fairly quick fall, with a limited number of sequences accounting for a substantial number of tokens.

freqt <- as.data.frame(count(patents, CODE))
freqt <- freqt[with(freqt, order(n, decreasing = TRUE)), ]
barplot(freqt$n, names.arg = freqt$CODE, main = "Patent Move Sequences", ylab = "Frequency", 
    las = 2, cex.names = 0.4, col = "seagreen3", border = FALSE)

We can also look at when these move sequences occurred.

First, we build a move sequence by year matrix.

moveseqmat <- data.frame()
for (i in c(1:nrow(patents))) {
    moveseqmat[i, "YEAR"] <- patents$YEAR[i]
    
    for (j in c(1:nrow(freqt))) {
        if (identical(as.character(patents$CODE[i]), as.character(freqt$CODE[j]))) {
            moveseqmat[i, as.character(freqt$CODE[j])] <- 1
        }
    }
}
moveseqmat
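The same kind of indicator matrix can also be built without an explicit double loop. The following is a minimal sketch on toy data (four rows copied from the appendix), not the code used for the report; `toy`, `ind`, and `toy_moveseqmat` are illustrative names. One difference to be aware of: the columns come out in alphabetical order rather than in order of frequency.

```r
# Toy data standing in for `patents` (values copied from the appendix).
toy <- data.frame(YEAR = 1740:1743,
                  CODE = c("ABCDEFG", "ABCDEFHG", "ABCDEFG", "ABCDFGE"),
                  stringsAsFactors = FALSE)

# Cross-tabulate year by sequence: 1 where that year's patent has that sequence.
ind <- as.data.frame.matrix(table(toy$YEAR, toy$CODE))

# The loop version leaves non-matching cells as NA, so mirror that here.
ind[ind == 0] <- NA

# Put the YEAR column in front, as in moveseqmat.
toy_moveseqmat <- cbind(YEAR = toy$YEAR, ind)
toy_moveseqmat
```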

Then we get the first occurrence of each move sequence.

firstoc <- patents[match(unique(patents$CODE), patents$CODE), ]
merget <- merge(firstoc, freqt, by.x = "CODE", by.y = "CODE")
merget <- merget[order(merget$YEAR, decreasing = FALSE), ]
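To make the `match(unique(...), ...)` idiom concrete, here is a small sketch on toy data (six rows copied from the appendix). The names `toy`, `first_rows`, and `toy_first` are illustrative only. Because `match()` returns the position of the first match, and the rows are already in year order, this picks out the earliest year for each sequence.

```r
# Toy data standing in for `patents`.
toy <- data.frame(YEAR = 1740:1745,
                  CODE = c("ABCDEFG", "ABCDEFHG", "ABCDEFG",
                           "ABCDEFG", "ABCDEFG", "ABCDFGE"),
                  stringsAsFactors = FALSE)

# Row index of the first occurrence of each distinct sequence.
first_rows <- match(unique(toy$CODE), toy$CODE)
toy_first <- toy[first_rows, ]

# Attach the total frequency of each sequence (the role played by merge() above).
toy_first$n <- as.vector(table(toy$CODE)[toy_first$CODE])
toy_first
```

An equivalent way to select the same rows would be `toy[!duplicated(toy$CODE), ]`.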

Then we set a colour ramp palette.

pal <- colorRampPalette(c(rgb(0.94, 0.96, 0.95), rgb(0.26, 0.8, 0.5)), bias = 1)
collist <- pal(max(merget$n))
color <- collist[merget$n]

And then we plot the results.

png("MOVSEQTIME.png", width = 1900, height = 2160, res = 150)

par(mfrow = c(nrow(firstoc), 1))
par(oma = c(3.5, 0, 1, 0))
par(mar = c(0.17, 16, 0.17, 3.5))

for (i in c(1:nrow(firstoc))) {
    plot(moveseqmat$YEAR, moveseqmat[, as.character(firstoc$CODE[i])], type = "n", 
        ylim = c(0, 1), xaxt = "n", yaxt = "n", ylab = "", xlab = "", bty = "n")
    
    rect(par("usr")[1], par("usr")[3], par("usr")[2], par("usr")[4], col = color[i])
    
    lines(moveseqmat$YEAR, moveseqmat[, as.character(firstoc$CODE[i])], type = "h", 
        lwd = 3.4)
    
    box(which = "plot", col = "black", lwd = 0.8)
    
    mtext(toupper(as.character(firstoc$CODE[i])), side = 2, las = 1, line = 1, 
        cex = 0.75)
    
    mtext(as.character(freqt$n[freqt$CODE == firstoc$CODE[i]]), side = 4, las = 1, 
        line = 1, cex = 0.75)
    
    if (i == nrow(firstoc)) {
        axis(side = 1, lwd = 0, lwd.ticks = 1, outer = TRUE, cex.axis = 1.6, 
            at = c(min(patents$YEAR), 1800, 1850, 1900, 1950, max(patents$YEAR)))
    }
}
dev.off()

## quartz_off_screen 
##                 2

4.2 Summary

This initial analysis of move sequences tells us various things:

Specific move sequences are often in usage for decades, sometimes longer, but at some point they fall out of usage.
Sometimes a move sequence will be used exclusively for a short period of time and sometimes a relatively small number of move sequences will be used concurrently for a short period of time, but in general move sequences are unstable and change is relatively constant and rapid.
However, there appear to be a few major divisions in the data, where one large set of related move sequences stops being used altogether, and another set takes over, specifically around 1850, 1915, and 1975.
There is also a clear difference in terms of the length of the move sequence, with move sequences being especially long from 1850-1900, although longer sequences still occur occasionally until the 1950s.

Overall, these results seem to show a mix between gradual and sudden evolution in patent structure: patents appear to always be in a state of variation and change, but there are a relatively small number of periods when the rate of change increases dramatically.

Despite these results, this analysis treats each move sequence as a unique and equally distinct structure. In other words, this analysis does not take into account that some of these distinct sequences are fairly similar to each other. This is why we’re going to be looking at string edit distance later on in this report.

5 Individual Move Analysis

5.1 Distribution of Moves

Before comparing strings, it is interesting to look at the frequency of each of the individual moves, regardless of their position.

First, we count the frequency of each move code across all sequences.

codes <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", 
    "N", "O", "P", "Q", "R", "S", "T", "U")
movefreq <- data.frame()
for (i in c(1:length(codes))) {
    movefreq[i, "CODE"] <- codes[i]
    movefreq[i, "COUNT"] <- length(grep(codes[i], patents$CODE))
}
movefreq
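Note that `length(grep(...))` counts the number of patents that contain a move at least once, not the number of times the move occurs. The sketch below, on three sequences copied from the appendix, shows the difference; `toy_codes`, `contains_E`, and `n_E` are illustrative names, and `fixed = TRUE` is an optional addition that treats the code as a literal character rather than a regular expression.

```r
# Three sequences from the appendix; the second contains move E twice.
toy_codes <- c("ABCDEFG", "ABCDEFEHG", "SLMUTLJEK")

# Number of patents containing E at least once (what the loop above counts).
contains_E <- grepl("E", toy_codes, fixed = TRUE)
sum(contains_E)

# Number of occurrences of E in each patent.
n_E <- lengths(regmatches(toy_codes, gregexpr("E", toy_codes, fixed = TRUE)))
n_E
```

Here all three patents contain E, but E occurs four times in total, so the two counts can diverge whenever a move is repeated within a patent.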

We then plot the frequency distribution of the individual moves, which shows a relatively steady drop in the percentage of patents containing each move.

movefreqsort <- movefreq[order(movefreq$COUNT, decreasing = TRUE), ]
barplot((movefreqsort$COUNT/272) * 100, names.arg = movefreqsort$CODE, main = "Patent Move", 
    ylab = "Percentage of Texts", ylim = c(0, 100), cex.names = 0.5, col = "seagreen3", 
    border = FALSE)

movefreqsort <- movefreq[order(movefreq$CODE, decreasing = FALSE), ]

We also looked at when the moves were used over time, again ignoring where the moves occur inside the patents.

First, we build a move by year frequency matrix.

movemat <- data.frame()
for (i in c(1:nrow(patents))) {
    movemat[i, "YEAR"] <- patents$YEAR[i]
    for (j in c(1:length(codes))) {
        movemat[i, codes[j]] <- length(grep(codes[j], patents$CODE[i]))
    }
}

movemat[movemat == 0] <- NA

movemat

Then we set a colour ramp palette.

pal <- colorRampPalette(c(rgb(0.94, 0.96, 0.95), rgb(0.26, 0.8, 0.5)), bias = 1)
collist <- pal(max(movefreqsort$COUNT))
color <- collist[movefreqsort$COUNT]

And then we plot the results.

png("MOVETIME.png", width = 1900, height = 1600, res = 150)

par(mfrow = c(nrow(movefreqsort), 1))
par(oma = c(3.5, 0, 1, 0))
par(mar = c(0.5, 3.5, 0.5, 3.5))

for (i in c(1:nrow(movefreqsort))) {
    
    plot(movemat$YEAR, movemat[, as.character(movefreqsort$CODE[i])], type = "n", 
        ylim = c(0, 1), xaxt = "n", yaxt = "n", ylab = "", xlab = "", bty = "n")
    
    rect(par("usr")[1], par("usr")[3], par("usr")[2], par("usr")[4], col = color[i])
    
    lines(movemat$YEAR, movemat[, as.character(movefreqsort$CODE[i])], type = "h", 
        lwd = 1.3)
    
    box(which = "plot", col = "black", lwd = 0.8)
    
    mtext(toupper(as.character(movefreqsort$CODE[i])), side = 2, las = 1, line = 1, 
        cex = 1)
    
    mtext(as.character(movefreqsort$COUNT[i]), side = 4, las = 1, line = 1, 
        cex = 1)
    
    if (i == nrow(movefreqsort)) {
        axis(side = 1, lwd = 0, lwd.ticks = 1, outer = TRUE, cex.axis = 1.6, 
            at = c(min(patents$YEAR), 1800, 1850, 1900, 1950, max(patents$YEAR)))
    }
}
dev.off()

## quartz_off_screen 
##                 2

5.2 Summary

The analysis of individual moves also tells us various things about how patents change over time:

Aside from move E (Invention Description), none of the moves are used across the complete time period, showing that not only the order of moves but the inventory of moves has changed drastically over time.
Mostly, the rise and fall of individual moves is very abrupt, with moves being used consistently for 50+ years before being discarded altogether.
However, there are a small number of moves that show more gradual rises and falls, with move J (Drawings) increasing gradually in usage over the entire time period, and moves O (Provisional Specification Header) and P (Provisional Invention Description) slowly falling out of use after abrupt introductions around 1850.
Additionally, moves O and P are also notable for always occurring in the same patents, although they are never adjacent and the intervening moves do change.
There is one very clear divide in these results around 1850, when moves L-Q are all introduced, but otherwise aside from moves that were present since the start (A-G), the introduction of individual moves happens at different times.
However, the fate of moves that originate before 1850 is variable, with moves A-G dying off abruptly at different times, aside from move E.
Other important dates are around 1875, when various moves fall (or start to fall) out of use, including A-C, H, N-P, and around 1975, when T and U are introduced, and D, F, and O-Q fall out of use.

Overall, we see that in terms of individual moves, change in usage is generally very abrupt, as opposed to the more gradual change observed above in move sequences. This suggests that change in the relative position of moves explains some of this more gradual change over time in move structure.

We can, however, see one very clear shift around 1850 that aligns with the analysis of move sequences presented above, when a large number of new and ultimately highly successful moves are introduced. By contrast, the (abrupt) abandonment of older moves occurs at different times.

6 Main Move Sequence Analysis

6.1 String Edit Distance

To look at change in discourse move structure over time in more detail, we use string edit distance to quantify the difference between the move strings. Edit distance measures the number of operations (insertion, deletion, substitution) needed to transform one string into another.

To do this we use the stringdist() function and its default metric, optimal string alignment (OSA), also known as “restricted Damerau-Levenshtein distance”, which additionally counts the transposition of two adjacent characters as a single operation. It works as one would expect:

as.character(patents$CODE[247])

## [1] "SLMUTLJEK"

as.character(patents$CODE[248])

## [1] "SLMUTLJEK"

stringdist(patents$CODE[247], patents$CODE[248])

## [1] 0

as.character(patents$CODE[245])

## [1] "SLUTLJEK"

stringdist(patents$CODE[247], patents$CODE[245])

## [1] 1

as.character(patents$CODE[5])

## [1] "ABCDEFG"

stringdist(patents$CODE[247], patents$CODE[5])

## [1] 9
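For readers who want to see how such a distance is computed, the following is a minimal base R sketch of the standard dynamic-programming recurrence for optimal string alignment. It is a teaching illustration under that assumption, not the stringdist implementation; `osa_distance()` is a hypothetical helper name.

```r
# osa_distance() is a hypothetical teaching helper, not part of stringdist.
osa_distance <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] holds the distance between the first i characters of a
  # and the first j characters of b.
  d <- matrix(0, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0 else 1
      best <- min(d[i, j + 1] + 1,   # deletion
                  d[i + 1, j] + 1,   # insertion
                  d[i, j] + cost)    # substitution (or match)
      # Transposition of two adjacent characters counts as one operation.
      if (i > 1 && j > 1 && x[i] == y[j - 1] && x[i - 1] == y[j]) {
        best <- min(best, d[i - 1, j - 1] + 1)
      }
      d[i + 1, j + 1] <- best
    }
  }
  d[n + 1, m + 1]
}

osa_distance("SLMUTLJEK", "SLUTLJEK")  # one deletion (M)
osa_distance("SLMUTLJEK", "ABCDEFG")   # agrees with the value of 9 shown above
osa_distance("ABCDEFG", "ABCDFEG")     # one transposition (EF to FE)
```

The last call shows where OSA differs from plain Levenshtein distance, which would need two operations for the swapped pair.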

6.2 Adjacent String Edit Distance

For every adjacent pair of move sequences in the time series, we first compute the edit distance between the two sequences, using the year of the second string as an index.

adjacent <- data.frame()
for (i in c(2:nrow(patents))) {
    adjacent[i - 1, "YEAR"] <- patents$YEAR[i]
    adjacent[i - 1, "DIST"] <- stringdist(patents$CODE[i - 1], patents$CODE[i])
}

adjacent
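The pairing logic (each year compared with the year before it) can be sketched without a loop by lining up the sequence vector against a copy of itself shifted by one. This is an illustrative sketch on toy data, and it assumes base R's `adist()` as a stand-in distance: `adist()` computes plain Levenshtein distance, which can differ from OSA when adjacent moves are swapped, although the two agree on these toy rows. With stringdist loaded, `stringdist(prev_code, next_code)` would presumably be the direct analogue.

```r
# Toy data standing in for `patents` (values copied from the appendix).
toy <- data.frame(YEAR = 1740:1745,
                  CODE = c("ABCDEFG", "ABCDEFHG", "ABCDEFG",
                           "ABCDEFG", "ABCDEFG", "ABCDFGE"),
                  stringsAsFactors = FALSE)

# Each sequence paired with the one that follows it.
prev_code <- head(toy$CODE, -1)
next_code <- tail(toy$CODE, -1)

# Levenshtein distance for each pair, indexed by the year of the second string.
toy_adjacent <- data.frame(
  YEAR = tail(toy$YEAR, -1),
  DIST = mapply(function(a, b) as.vector(adist(a, b)),
                prev_code, next_code, USE.NAMES = FALSE)
)
toy_adjacent
```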

We can then plot the distances between each adjacent pair, which perhaps most notably shows a lot of variation from 1900-1975, aside from the time around WWII, and relatively less variation since 1975.

barplot(adjacent$DIST, names.arg = adjacent$YEAR, main = "Adjacent Year Edit Distance", 
    ylab = "Edit Distance", col = "seagreen3", border = NA, cex.names = 0.7, 
    space = 0)

We can also smooth out the time series and plot that. This helps us see patterns in the data more clearly and also compensates to some extent for the fact that we are only comparing adjacent years, effectively opening up the comparison window.

Here is a 10-year moving average.

smth <- ma(as.ts(adjacent$DIST), 10, centre = TRUE)

plot(adjacent$YEAR, smth, type = "l", col = "seagreen3", main = "Adjacent Year Edit Distance (Smoothed)", 
    ylab = "Edit Distance", xlab = "Year")

And here is a 25-year moving average.

smth <- ma(as.ts(adjacent$DIST), 25, centre = TRUE)

plot(adjacent$YEAR, smth, type = "l", col = "seagreen3", main = "Adjacent Year Edit Distance (Smoothed)", 
    ylab = "Edit Distance", xlab = "Year")
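To show what a centred moving average does to a series, here is a base R sketch using `stats::filter()`. It reflects my understanding of how a centred moving average is usually defined (for an even window, the two end points get half weight so that the window stays centred) and is meant as an illustration rather than a description of the forecast internals; `centred_ma()` and the toy vector are hypothetical.

```r
# centred_ma() is a hypothetical helper illustrating a centred moving average.
centred_ma <- function(x, order) {
  if (order %% 2 == 0) {
    # Even window: order + 1 weights, with half weight at both ends.
    w <- c(0.5, rep(1, order - 1), 0.5) / order
  } else {
    # Odd window: equal weights.
    w <- rep(1, order) / order
  }
  as.numeric(stats::filter(x, w, sides = 2))
}

toy_dist <- c(1, 1, 0, 0, 2, 0, 4, 1)
centred_ma(toy_dist, 3)  # first and last values are NA
centred_ma(toy_dist, 4)  # first two and last two values are NA
```

The NA values at both ends are why the smoothed curves in the plots above start later and end earlier than the raw series, and why the 25-year curve loses more at each end than the 10-year curve.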

An alternative way to visualise the rate of change of patent structure, which also helps to smooth over some of the short-term variability, is to sum these distances over time and plot the results. This yields a cumulative (monotonic) time series, where the steeper the slope of the line connecting two time points, the greater the amount of change.

cumulative <- data.frame()
cumulative[1, "YEAR"] <- adjacent$YEAR[1]
cumulative[1, "CUMDIST"] <- adjacent$DIST[1]
for (i in c(2:nrow(adjacent))) {
    cumulative[i, "YEAR"] <- adjacent$YEAR[i]
    cumulative[i, "CUMDIST"] <- adjacent$DIST[i] + cumulative$CUMDIST[i - 1]
}

plot(cumulative$YEAR, cumulative$CUMDIST, type = "l", col = "seagreen3", main = "Cumulative Adjacent Year Edit Distance Change over Time", 
    ylab = "Cumulative Edit Distance", xlab = "Year")
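The running total built by the loop is what base R's `cumsum()` computes. A minimal sketch on toy values (`toy_adjacent` and `toy_cumulative` are illustrative names):

```r
# Toy adjacent-year distances.
toy_adjacent <- data.frame(YEAR = 1741:1745, DIST = c(1, 1, 0, 0, 2))

# Running total of the distances; each value adds the current distance
# to the sum of all earlier ones.
toy_cumulative <- data.frame(YEAR = toy_adjacent$YEAR,
                             CUMDIST = cumsum(toy_adjacent$DIST))
toy_cumulative$CUMDIST
```

Because edit distances are never negative, the cumulative series can only stay flat or rise, which is why flat stretches correspond to years in which the sampled sequence did not change.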

It’s also informative to zoom in on this time series a bit to see the detail better.

plot(cumulative$YEAR[c(1:60)], cumulative$CUMDIST[c(1:60)], type = "l", col = "seagreen3", 
    lwd = 3, main = "Cumulative Adjacent Year Edit Distance Change over Time 1740-1800", 
    ylab = "Cumulative Edit Distance", xlab = "Year")

plot(cumulative$YEAR[c(60:160)], cumulative$CUMDIST[c(60:160)], type = "l", 
    col = "seagreen3", lwd = 3, main = "Cumulative Adjacent Year Edit Distance Change over Time 1800-1900", 
    ylab = "Cumulative Edit Distance", xlab = "Year")

plot(cumulative$YEAR[c(140:271)], cumulative$CUMDIST[c(140:271)], type = "l", 
    col = "seagreen3", lwd = 3, main = "Cumulative Adjacent Year Edit Distance Change over Time 1880-2011", 
    ylab = "Cumulative Edit Distance", xlab = "Year")

6.2.1 Summary

This analysis of adjacent move sequence distance also tells us various things about how patents change over time:

There is a lot of variability and there is almost always some variability year to year.
But the degree of variability is relatively unstable and doesn’t seem to follow any general trend, suggesting that a lot of external factors are at play.
In general, the variability in moves decreases moderately although somewhat inconsistently from 1750-1900, but it really increases from around 1900 until the 1940s, at which point it drops sharply, with another sharp drop and subsequently especially low variability after around 1975.
Secondary prominent inflection points include fairly big drops around 1775, 1810, and 1860, with fairly big jumps around 1760, 1800, and 1850.
In general it appears that a big jump in variability will be followed within a decade or so by a fairly big fall, suggesting that although the structure is always in flux, big changes come in bursts, and then things settle down for a bit.
Looking at the cumulative graphs in particular, we see a fairly stable average rate of change up until about 1900, after which it really picks up until almost 1950.
Nevertheless, when we zoom in we can see that this gradual-burst-gradual-burst pattern repeats itself on a smaller scale (fractal-like), with the slower-change periods generally being longer than the faster-change periods.
Overall, these results are consistent with an s-curve theory of language change, or maybe more specifically with a theory of s-curves being composed of smaller s-curves, which I believe has been discussed somewhere in the literature (Janet Holmes?).

In terms of how patent structure changes, these results suggest it is a continuous and often gradual process, punctuated by quick bursts of change at certain points, which are often followed by periods of relative calm.

6.3 All String Edit Distances

Rather than just comparing adjacent strings, which is a bit artificial and limiting, especially because competing forms often alternate over a decade or more, we also looked at the edit distance between all pairs of move sequences.

6.3.1 String Distance Matrix

First, we make a distance matrix of string edit distances using the stringdistmatrix() function.

distmat <- as.dist(stringdistmatrix(patents$CODE, patents$CODE))

6.3.2 Multidimensional Scaling

We then ran a simple metric multidimensional scaling to reduce this matrix, which contains the distance between all pairs of strings, down to two dimensions.

fit <- cmdscale(distmat, k = 2)
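The two steps (pairwise distance matrix, then classical MDS) can be tried on a handful of sequences. This is a sketch under stated assumptions: five sequences copied from the appendix, and base R's `adist()` (plain Levenshtein distance) standing in for the OSA distances computed by stringdistmatrix(); `toy_codes`, `toy_dist`, and `toy_fit` are illustrative names.

```r
# Five sequences from different periods (copied from the appendix).
toy_codes <- c("ABCDEFG", "ABCDEFHG", "LQMDEKFJ", "SLQMDEKFJ", "SLMUTLJEK")

# Pairwise Levenshtein distances as a "dist" object.
toy_dist <- as.dist(adist(toy_codes))
toy_dist

# Classical (metric) MDS: one row of coordinates per sequence.
toy_fit <- cmdscale(toy_dist, k = 2)
dim(toy_fit)
```

Sequences that are one edit apart, such as the first two, should end up close together in the two-dimensional solution, while sequences from different periods are pushed further apart.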

The years can then be plotted along these two dimensions to visualise clusters of years in the data. The results are a bit hard to see because of all the overlap, but there are three big clusters: early years on the left, later years in the middle, and middling years on the right. This right cluster is also more diffuse, reflecting the greater rate of change in the middling years.

x <- fit[, 1]
y <- fit[, 2]
plot(x, y, xlab = "Dimension 1", ylab = "Dimension 2", main = "MDS Plot of Patent Sequence Distances by Year", 
    type = "n")
text(x, y, labels = patents$YEAR, cex = 0.5, col = "seagreen3")

We can pull these clusters out a bit better if we k-means these years over the 2 dimensions into three clusters, and we can then plot those over time.

Note that k-means can give a different result each time it runs since the initial seeds are placed at random. This affects both the (arbitrary) order of clusters, which does matter to us, since we want to plot over time, and the actual membership, which is not arbitrary. In terms of membership, I’ve run it multiple times and about 80% of the time it’s giving back the clusters one would expect, but the other 20% it’s classifying the upper few years in the right cluster with the middle cluster, although that does lead to a simpler picture when plotting the clusters over time (less overlap). So anyway, when running this, make sure the final run is a good one.
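One possible way to reduce this run-to-run variation, sketched here on made-up coordinates rather than the real MDS output, is to fix the random seed, let `kmeans()` try several random starts via its `nstart` argument, and then relabel the clusters by median year so that their order is no longer arbitrary. The toy values and the names `toy_fit`, `toy_years`, and `era` are assumptions for illustration only.

```r
# Made-up 2-D coordinates forming three tight groups, with made-up years.
toy_fit <- cbind(c(-6.2, -6.0, -5.8,  5.8, 6.0,  6.2, -0.2, 0.0, 0.2),
                 c( 0.1,  0.0, -0.1,  0.0, 0.1, -0.1,  5.0, 5.1, 4.9))
toy_years <- c(1740, 1741, 1742, 1860, 1861, 1862, 1980, 1981, 1982)

# Fix the seed so the run is repeatable, and keep the best of 25 random starts.
set.seed(1)
toy_clusts <- kmeans(toy_fit, centers = 3, nstart = 25)

# Relabel clusters by median year: era 1 is the earliest cluster, era 3 the latest.
med_year <- sapply(1:3, function(k) median(toy_years[toy_clusts$cluster == k]))
era <- match(toy_clusts$cluster, order(med_year))
era
```

Colours and boxplot positions could then be assigned from `era` rather than from the raw cluster numbers, so that the same colour always refers to the same period.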

clusts <- kmeans(fit, centers = 3, iter.max = 10000)

color = c()
color[clusts$cluster == 1] <- "seagreen3"
color[clusts$cluster == 2] <- "violetred3"
color[clusts$cluster == 3] <- "darkorange1"

plot(x, y, xlab = "Dimension 1", ylab = "Dimension 2", main = "MDS Plot of Patent Sequence Distances by Year", 
    type = "n")
text(x, y, labels = patents$YEAR, cex = 0.5, col = color)

# Box colours follow cluster order 1, 2, 3, matching the MDS plot above.
boxplot(patents$YEAR ~ clusts$cluster, ylab = "Cluster", names = c(), xlab = "Year", 
    main = "Patent Distance Clusters", col = c("seagreen3", "violetred3", 
        "darkorange1"), horizontal = TRUE)

The boxplot shows the relationship between these three clusters over time more clearly: the one on the middle-left consists entirely of early patents (pre-1850), the one on the top-middle consists primarily of later patents (post-1900), and the one on the right consists primarily of patents between these two points (1850-1900), although there is some overlap between the latter two eras, especially between 1900-1940, indicating that there is competition between these two general types of patent move sequences around that time.

Finally, we can also plot these two dimensions individually against time, and also smooth them so that we can abstract away a bit from the year-to-year noisiness and focus in on the larger trends.

plot(patents$YEAR, x, type = "l", col = "seagreen3", main = "Edit Distance (Dimension 1)", 
    ylab = "MDS 1", xlab = "Year")

s_dim1 <- ma(as.ts(x), 10, centre = TRUE)

plot(patents$YEAR, s_dim1, type = "l", col = "seagreen3", main = "Edit Distance (Dimension 1, Smoothed)", 
    ylab = "MDS 1", xlab = "Year")

plot(patents$YEAR, y, type = "l", col = "violetred3", main = "Edit Distance (Dimension 2)", 
    ylab = "MDS 2", xlab = "Year")

s_dim2 <- ma(as.ts(y), 10, centre = TRUE)

plot(patents$YEAR, s_dim2, type = "l", col = "violetred3", main = "Edit Distance (Dimension 2, Smoothed)", 
    ylab = "MDS 2", xlab = "Year")

And we can combine them.

plot(patents$YEAR, s_dim1, type = "l", col = "seagreen3", ylim = c(-8, 8), main = "Edit Distance (Dimension 1 + 2, Smoothed)", 
    ylab = "MDS 1 + 2", xlab = "Year")
lines(patents$YEAR, s_dim2, col = "violetred3")

6.3.3 Summary

Overall, we therefore get four main eras: 1740-1850, 1850-1900, 1900-1940, 1940-2011, with the third era being transitional.

…

7 Appendix: Move Sequences

kable(patents)

YEAR	CODE
1740	ABCDEFG
1741	ABCDEFHG
1742	ABCDEFG
1743	ABCDEFG
1744	ABCDEFG
1745	ABCDFGE
1746	ABCDEFG
1747	ABCDEFHG
1748	ABCDFGE
1749	AIBCDEFHG
1750	AIBCDFHGE
1751	AIBCDEFHG
1752	ABCDEFHG
1753	ABCDEFHGJ
1754	AIBCDEFHG
1755	ABCDEFEHG
1756	AIBCDEFHGJ
1757	ABCDEFHG
1758	ABCDEFHG
1759	AIBCDEFHG
1760	ABCDEFHG
1761	ABCDFGE
1762	ABCDEFHGJ
1763	ABCDFGEJ
1764	AIBCDEFHG
1765	ABCDEFEGJ
1766	AIBCDEFHG
1767	ABCDEFHG
1768	ABCDFGE
1769	AIBCDEFHGJ
1770	ABCDEFHG
1771	AIBCDEFHEGJ
1772	AIBCDEFHG
1773	ABCDEFG
1774	AIBCDEFHG
1775	AIBCDEFHGJ
1776	AIBCDFHGE
1777	ABCDEFG
1778	ABCDEFHGJ
1779	ABCDEFHG
1780	ABCDEFHGJ
1781	AIBCDEFHG
1782	AIBCDEFHGJ
1783	ABCDEFG
1784	ABCDFEGJ
1785	ABCDFEGJ
1786	ABCDEFHG
1787	AIBCDEFHGJ
1788	ABCDEFHGJ
1789	ABCDEFHGJ
1790	ABCDEFGJ
1791	ABCDEFG
1792	ABCDEFGJ
1793	ABCDEFG
1794	ABCDEFG
1795	ABCDEFHGJ
1796	ABCDEFGJ
1797	ABCDEFGJ
1798	ABCDEFHGJ
1799	ABCDEFG
1800	AIBCDEFHG
1801	ABCDFEGJ
1802	AIBCDEFHG
1803	ABCDEFG
1804	AIBCDEFHGJ
1805	ABCDFEGJ
1806	ABCDEFG
1807	ABCDEFG
1808	ABCDEFGJ
1809	ABCDEFG
1810	ABCDEFGJ
1811	ABCDEFGJ
1812	ABCBCDEFGJ
1813	ABCDEKFGJ
1814	ABCDEFGJ
1815	ABCDEFG
1816	ABCDEKFGJ
1817	ABCDEFHG
1818	ABCDEKFHGJ
1819	ABCDEKFGJ
1820	ABCDEKFGJ
1821	AIBCDEFGJ
1822	ABCDEFHGJ
1823	ABCDEFHG
1824	ABCDEKFHGJ
1825	ABCDEKFHGJ
1826	ABCDEKFHG
1827	AIBCDEKFHG
1828	ABCDEKFHGJ
1829	AIBCDEKFG
1830	ABCDEKFHGJ
1831	ABCDEKFHGJ
1832	ABCDEKFHG
1833	ABCDEKFHGJ
1834	AIBCDEKFGJ
1835	ABCDEFHGJ
1836	ABCDEKFHG
1837	ABCDEKFHGJ
1838	ABCDEKFHGJ
1839	ABCDEKFHGJ
1840	ABCDEKFHGJ
1841	ABCDEKFHGJ
1842	ABCDEKFHG
1843	ABCDEKFHGJ
1844	ABCDEKFHGJ
1845	ABCDEKFHGJ
1846	ABCDEKFHG
1847	ABCDEKFHGJ
1848	ABCDEKFHGJ
1849	ABCDEKFHGJ
1850	ABCDEKFHGJ
1851	ABCDEKFHGJ
1852	LMNODPQCABCDEKFHJ
1853	LMNODPQCABCDEKFHJ
1854	LMNODPQCABCDEF
1855	LMNODPQCABCDEKFJ
1856	LMNODPQCABCDEKFH
1857	LMNODPQCABCDEKFJ
1858	LMNQCAIDEKFHJ
1859	LMNODPQCABCDEKFJ
1860	LMNODPQCABCDEKFJ
1861	LMNODPQCABCDEKFHJ
1862	LMNODPQCABCDEKF
1863	LMNODPQCABCDEKFHJ
1864	LMNODPQCABCDEKFHJ
1865	LMNODPQCABCDEKFHJ
1866	LMNODPQCABCDEKFJ
1867	LMNODPQCABCDEKFHJ
1868	LMNODPQCABCDEKFHJ
1869	LMNODPQCABCDEKFJ
1870	LMNODPQCABCDEKFJ
1871	LMNODPQCABCDEKFJ
1872	LMNODPQCABCDEKFHJ
1873	LMNODPQCABCDEKFH
1874	LMNODPQCABCDEKFHJ
1875	LMNODPQCABCDEKFHJ
1876	LMNODPQCEKFJ
1877	LMNODPQCEKFHJ
1878	LMNODPQCEKFJ
1879	LMNODPQCEKFJ
1880	LMNODPQCEKFJ
1881	LMNODPQCEKFHJ
1882	LMNODPQCEKFHJ
1883	LMNODPQCEKFJ
1884	LMODPFQDEKFJ
1885	LOMDPFQMDEKFJ
1886	LOMDPFQMDEKFJ
1887	LOMDPFQMDEKFJ
1888	LOMDPFQMDEKFJ
1889	LOMDPFQMDEKFJ
1890	LOMDPFQMDEKF
1891	LQMDEKFJ
1892	LOMDPFQMDEKFJ
1893	LOMDPFQMDEKFJ
1894	LOMDPFQMDEKFJ
1895	LOMDPFQMDEKFJ
1896	LOMDPFQMDEKFJ
1897	LOMDPFQMDEKFJ
1898	LOMDPFQMDEKFJ
1899	LOMDPFQMDEKFJ
1900	LQMDEKFJ
1901	LQMDEKFJ
1902	LQMDEKF
1903	LQMDEKFJ
1904	LQMDEKFJ
1905	LOMDPFQMDEKFJ
1906	LQMDEKF
1907	LQMDEKFJ
1908	LQMDEKFJ
1909	LOMRDPFQMRDEKFJ
1910	LOMDPFQMDEKFJ
1911	LQMDEKFJ
1912	LQMDEKFJ
1913	LQMDEKFJ
1914	LOMDPFQMDEKFJ
1915	LQMDEKFJ
1916	SLQMDEKFJ
1917	SLOMDPFQMDEKFJ
1918	SLQMDEKFJ
1919	SLOMDPFQMDEKFJ
1920	SLQMDEKFJ
1921	SLOMDPFQMDEKFJ
1922	SLQMDEKFJ
1923	SLQMDEKFJ
1924	SLOMDPFQMDEKFJ
1925	SLQMDEKFJ
1926	SLOMDPFQMDEKFJ
1927	SLQMDEKFJ
1928	SLQMDEKFJ
1929	SLOMDPFQMDEKFJ
1930	SLQMDEKFJ
1931	SLOMDPFQMDEKFJ
1932	SLQMDEKFJ
1933	SLOMDPFQMDEKFJ
1934	SLQMDEKFJ
1935	SLOMDPFQMDEKFJ
1936	SLQMDEKF
1937	SLQMDEKFJ
1938	SLQMDEKFJ
1939	SLQMDEKFJ
1940	SLQMDEKFJ
1941	SLQMDEKF
1942	SLQMDEKFJ
1943	SLQMDEKFJ
1944	SLQMDEKFJ
1945	SLQMDEKFJ
1946	SLOMDPFQMDEKFJ
1947	SLQMDEKFJ
1948	SLQMDEKFJ
1949	SLQMDEKFJ
1950	SLQMDEKFODPFJ
1951	SLQMDEKFODPFJ
1952	SLQMDEKFODPFJ
1953	SLQMDEKFJ
1954	SLQMDEKFJ
1955	SLQMDEKFJ
1956	SLQMDEKFJ
1957	SLQMDEKFJ
1958	SLQMDEKFODPFJ
1959	SLQMDEKF
1960	SLQMDEFJ
1961	SLMDEKFJ
1962	SLQMDEKFJ
1963	SLQMDEKFJ
1964	SLMDEKFJTT
1965	SLQMDEKFJ
1966	SLQMDEKFJ
1967	SLMDEKFJ
1968	SLMDEKFJT
1969	SLMDEKFJ
1970	SLMDEKFTJ
1971	SLMDEKF
1972	SLMDEKFJ
1973	SLMDEKFJ
1974	SLMDEKFJ
1975	SLMDEKFJT
1976	SLMDEKFJTJ
1977	SLMDEKFJTJTJ
1978	SLUTLJQMEK
1979	SLMUTLJEK
1980	SLMUTLEKF
1981	SLMUTLJEK
1982	SLMUTLJEK
1983	SLMUTLJEK
1984	SLUTLJEK
1985	SLMUTLJEK
1986	SLMUTLJEK
1987	SLMUTLJEK
1988	SLMUTLJEK
1989	SLMUTLJEK
1990	SLMUTLJEK
1991	SLMUTLJEK
1992	SLMUTLJEK
1993	SLMUTLJEK
1994	SLMUTLJEK
1995	SLMUTLJEK
1996	SLUTLJEK
1997	SLMUTLJEK
1998	SLUTLJEK
1999	SLUTLJEK
2000	SLUTLJEK
2001	SLMUTLJEK
2002	SLMUTLJEK
2003	SLMUTLJEK
2004	SLUTLJEK
2005	SLUTLJEK
2006	SLMUTLJEK
2007	SLUTLJEK
2008	SLUTLJEK
2009	SLMUTLJEK
2010	SLMUTLJEK
2011	SLMUTLJEK

Tracking Change in a Corpus of British Patents using Move Analysis and Edit Distance

R Analysis

Jack Grieve and Nick Groom

University of Birmingham

2018/01/25