Seth J. Chandler
August 29, 2014
This programming exercise asks students to look at one of the most comprehensive studies of lawsuits and insurance claims ever compiled: the study of 1989 claims in Texas. The material is, of course, a bit dated here in 2014, but it is still a trove of information on the legal process. The idea is to show students how to use R to obtain descriptive statistics on a large data set.
CLOSED89 <- read.csv("~/Downloads/CLOSED89.csv", stringsAsFactors=FALSE)
ClosedClaimDataFieldDefinition1989 <- read.csv("~/Downloads/ClosedClaimDataFieldDefinition1989.csv", stringsAsFactors=FALSE)
tsa<-CLOSED89[,"Q12A7"]
meantsa<-mean(tsa)
mediantsa<-median(tsa)
q90<-tsa[order(tsa)[round(0.9*length(tsa))]]
length(which(tsa>100000))/length(tsa)
tried<-CLOSED89[CLOSED89$"Q10A"<=8 & CLOSED89$"Q10A">=5,]
jury<-tried[tried$"Q10C"==1,] # index tried by its own column, not by CLOSED89's
judge<-tried[tried$"Q10C"==2,]
mean(jury[,"Q12A7"])
mean(judge[,"Q12A7"])
Import the files CLOSED89.csv and ClosedClaimDataFieldDefinition1989.csv into R. They contain a trove of information about the disposition of lawsuits in Texas in 1989. Take a look at the ClosedClaimDataFieldDefinition1989 file and figure out the name of the variable that contains the “Total settlement amount or court award.”
The first thing we do is read in the data. I’ve printed out the dimensions of the two data frames. You can see that the closed claims file is pretty large. I’ve also printed out just the first few cases in the claims file to give a flavor of what they look like, along with the column names and first few rows of the definitions file. Intelligently poking about in data to solve a problem may not be as fancy a skill as learning a new machine learning algorithm or understanding how to compute the page rank of a node in a network, but, as a practical matter, it is extremely important.
closed89 <- read.csv("~/Downloads/CLOSED89.csv", stringsAsFactors=FALSE)
print(dim(closed89))
## [1] 7249 220
print(head(closed89,n=3)) # just need a few cases to understand data
## EXTSEQ TYPEF Q1A Q1B Q1C Q1D Q1E Q1F
## 1 4000001 S 19860824 19880516 19880310 0 19881025 0
## 2 4000002 S 19850213 19870923 19870626 19881114 19881104 0
## 3 4000003 S 19860409 19870629 19870430 19881012 19881128 19881013
## Q1G Q2 Q3A Q3B Q4A Q4B Q4C Q4D Q4E Q4F Q4G Q4H Q4I Q4J Q4K Q4L Q4M
## 1 19890201 0
## 2 19890201 0
## 3 19890101 0
## Q4N Q4O Q4P Q4Q Q4R Q5A Q5B Q5C Q5D Q5E Q5F Q5G Q5H Q5I Q5J Q5K Q5L Q5M
## 1
## 2
## 3
## Q5N Q5O Q6A Q6B Q6C Q6D Q7A Q7B Q7C Q7D1 Q7D2 Q7D3 Q8A Q8B Q8C
## 1 1 220 220 0 4 1 19 0 0 1000000 2000 1500 3500
## 2 1 165 165 0 4 1 19 0 100000 0 10000 6500 16500
## 3 1 15 15 15 4 1 20 0 100000 0 2000 1000 3000
## Q8D Q8E Q8F Q9A Q9B Q9C Q10A Q10B Q10C Q10D Q10E Q10F Q11A Q11B1
## 1 2000 1500 3500 Y Y N 4 0 0 0 0 100000 0
## 2 25000 10000 35000 Y Y N 4 0 0 0 0 42000 0
## 3 2000 1000 3000 Y Y N 7 5 1 0 N 0 30000 12979
## Q11B2A Q11B2B Q11B2C Q11B2D Q11B2E Q11C Q11D1 Q11D2 Q11D3A Q11D3B Q11D3C
## 1 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0
## 3 1000 10282 0 1697 12979 N 0 0 0 0
## Q11D3D Q11D3E Q11E1 Q11E2 Q11E3A Q11E3B Q11E3C Q11E3D Q11E3E Q12A1 Q12A2
## 1 0 0 22500 N 0 0 0 0 0 20000 0
## 2 0 0 20000 N 0 0 0 0 0 20000 0
## 3 0 0 0 0 0 0 0 0 12979 0
## Q12A3A Q12A3B Q12A4A Q12A4B Q12A5A Q12A5B Q12A6A Q12A6B Q12A7 Q12B1A
## 1 0 0 0 2500 22500 0
## 2 0 0 0 0 20000 0
## 3 0 0 0 0 12979 0
## Q12B1B Q12B1C Q12B2A Q12B2B Q12B2C Q12B3A Q12B3B Q12B3C Q12B4A Q12B4B
## 1 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0
## Q12B4C Q12B5A Q12B5B Q12B5C Q12B6A Q12B6B Q12B6C Q12C Q13A Q13B1A Q13B2A
## 1 0 0 0 0 0 0 0 N 0 0
## 2 0 0 0 0 0 0 0 N 0 0
## 3 0 0 0 0 0 0 0 N 0 0
## Q13B2B Q13B2C Q13B3A Q13B3B Q13B3C Q13B4A Q13B4B Q13B4C Q13B5 Q13B6 Q13C
## 1 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0
## Q13D1A Q13D2A Q13D2B Q13D3A Q13D3B Q13D4A Q13D4B Q13D5 Q13E1 Q13E2A1
## 1 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0
## Q13E2A2 Q13E2B1 Q13E2B2 Q13E2C1 Q13E2C2 Q13E2D1 Q13E2D2 Q13E2E1 Q13E2E2
## 1 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0
## Q13E2F1 Q13E2F2 Q13E2G1 Q13E2G2 Q13E2H1 Q13E2H2 Q13E2I1 Q13E2I2 Q14A
## 1 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0
## Q14B Q14C1 Q14C2 Q14C3 Q14C4 Q14C5 Q14C6 Q15A Q15B Q16A Q16B1 Q16B2
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## Q16B3 Q16B4 Q16C Q17A Q17B Q17C Q17D ECONOMIC NONECO EXEMP INTEREST
## 1 0 0 2032 0 161 2193 0 0 0 0
## 2 0 0 9998 0 2230 12228 0 0 0 0
## 3 0 0 9305 0 630 9935 1000 10282 0 1697
## TOTALDAM INSPCT OTHPCT UNPCT INSSHARE OTHSHARE UNSHARE ET1A1B ET1A1C
## 1 0 0 0 0 0 0 0 631 564
## 2 0 0 0 0 0 0 0 952 863
## 3 12979 0 0 0 0 0 0 446 386
## ET1A1D ET1A1E ET1A1F ET1A1G ET1B1C ET1B1D ET1B1E ET1B1F ET1B1G ET1C1D
## 1 NA 793 NA 892 -67 NA 162 NA 261 NA
## 2 1370 1360 NA 1449 -89 418 408 NA 497 507
## 3 917 964 918 998 -60 471 518 472 552 531
## ET1C1E ET1C1F ET1C1G ET1D1E ET1D1F ET1D1G ET1E1F ET1E1G ET1F1G POLLIMIT
## 1 229 NA 328 NA NA NA NA 99 NA 1000000
## 2 497 NA 586 -10 NA 79 NA 89 NA 100000
## 3 578 532 612 47 1 81 -46 34 80 100000
## PARTY MULTIDEF UNKNOWN COMPANUN
## 1 2 1 0 0
## 2 1 0 0 0
## 3 1 0 0 0
definitions89<-read.csv("~/Downloads/ClosedClaimDataFieldDefinition1989.csv",
stringsAsFactors=FALSE)
print(dim(definitions89))
## [1] 297 16
print(colnames(definitions89))
## [1] "Column" "Field.Name" "Field.Type" "Column.Width"
## [5] "Question" "Coding.Notes" "X" "X.1"
## [9] "X.2" "X.3" "X.4" "X.5"
## [13] "X.6" "X.7" "X.8" "X.9"
print(head(definitions89))
## Column Field.Name Field.Type Column.Width Question
## 1 1 EXTSEQ N 8 TDI Number
## 2 2 TYPEF A 1 Type of Form
## 3 3 Q1A N 8 Date of injury
## 4 4 Q1B N 8 Date reported to insurer
## 5 5 Q1C N 8 Date suit filed
## 6 6 Q1D N 8 Date of trial
## Coding.Notes X X.1 X.2 X.3 X.4 X.5 X.6 X.7 X.8 X.9
## 1 NA NA NA NA NA NA NA NA NA NA
## 2 L - Long S - Short NA NA NA NA NA NA NA NA NA NA
## 3 YYYYMMDD format NA NA NA NA NA NA NA NA NA NA
## 4 YYYYMMDD format NA NA NA NA NA NA NA NA NA NA
## 5 YYYYMMDD format NA NA NA NA NA NA NA NA NA NA
## 6 YYYYMMDD format NA NA NA NA NA NA NA NA NA NA
It looks as if we could just print definitions89 without killing too many forests if we restricted ourselves to the Field.Name and Question columns. I’m going to use the substr function to shorten the Question column to 50 characters so the table fits better on the page. (Again, this isn’t one of the traditional analytic skills, but making your output attractive and useful is an important practical skill.) Let’s do that.
definitions89$short.Question<-substr(definitions89[,"Question"],1,50)
print(definitions89[,c("Field.Name","short.Question")],row.names=FALSE)
## Field.Name short.Question
## EXTSEQ TDI Number
## TYPEF Type of Form
## Q1A Date of injury
## Q1B Date reported to insurer
## Q1C Date suit filed
## Q1D Date of trial
## Q1E Date of settlement
## Q1F Date of jury award
## Q1G Date claim closed
## Q2 Age of injured party
## Q3A Employment status
## Q3B Work related injury status
## Q4A Death
## Q4B Amputation
## Q4C Burns (heat)
## Q4D Burns (chemical)
## Q4E Systemic poisoning (toxic)
## Q4F Systemic poisoning (other)
## Q4G Eye injury (blindness)
## Q4H Respiratory condition
## Q4I Nervous condition
## Q4J Hearing loss or impairment
## Q4K Circulatory condition
## Q4L Multiple injuries
## Q4M Back injury
## Q4N Skin disorder
## Q4O Brain damage
## Q4P Scarring
## Q4Q Spinal cord injuries
## Q4R Other
## Q5A Off road vehicle
## Q5B Air transportation
## Q5C Railway
## Q5D Other motor vehicle
## Q5E Surgical/medical care
## Q5F Falls
## Q5G Drowning
## Q5H Use of defective product
## Q5I Fire
## Q5J Firearm
## Q5K Pollution/Toxic exposure
## Q5L Explosions
## Q5M Use of agricultural machinery
## Q5N Oil & gas extraction
## Q5O Other
## Q6A Injury location
##
## Q6B County where injury occurred
## Q6C County where suit filed
## Q6D County where case tried
## Q7A Policy type
##
##
##
##
## Q7B Policy form
##
## Q7C Business class
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
## Q7D1 Per person policy limit
## Q7D2 Per occurrence/accident policy limit
## Q7D3 Combined single limit
## Q8A Initial indemnity reserve
## Q8B Initial expense reserve
## Q8C Initial expenditure reserve (Q8A + Q8B)
## Q8D Final indemnity reserve
## Q8E Final expense reserve
## Q8F Final expenditure reserve (Q8D + Q8E)
## Q9A Attorney involvement - plaintiff
## Q9B Attorney involvement - insurer
## Q9C Attorney involvement - insured
## Q10A Legal stage where settlement was reached
##
##
##
##
##
##
##
##
## Q10B Result of court verdict
##
##
##
##
##
##
##
##
## Q10C Trial involvement
##
## Q10D Who requested appeal
##
## Q10E Did the court order a remittitur
## Q10F Amount remittitur reduced original award
## Q11A Final demand of claimant or attorney for claimant
## Q11B1 Amount of court verdict
## Q11B2A Economic loss on court verdict
## Q11B2B Non-economic loss on court verdict
## Q11B2C Exemplary damages on court verdict
## Q11B2D Prejudgment interest on court verdict
## Q11B2E Total court verdict amount
## Q11C Was settlement different from verdict
## Q11D1 Amount of settlement
## Q11D2 Was settlement influence by noneconomic damages or
## prejudgment interest (long forms), exemplary damag
## Q11D3A Economic loss on settlements after verdict
## Q11D3B Non-economic loss on settlements after verdict
## Q11D3C Exemplary damages on settlements after verdict
## Q11D3D Prejudgment interest on settlements after verdict
## Q11D3E Amount of settlement after verdict
## Q11E1 Settlement amount - no court verdict
## Q11E2 Was settlement influence by noneconomic damages or
## prejudgment interest (long forms), exemplary damag
## Q11E3A Economic loss on settlements without verdict
## Q11E3B Non-economic loss on settlements without verdict
## Q11E3C Exemplary damages on settlements without verdict
## Q11E3D Prejudgment interest on settlements without verdic
## Q11E3E Amount of settlement without verdict
## Q12A1 Amount paid by the primary carrier
## Q12A2 Amount paid by insured due to deductible (or self-
## Q12A3A Amount paid by excess carrier
## Q12A3B Unknown amount paid by excess carrier
## Q12A4A Amount paid by insured - over policy limit or to e
## Q12A4B Unknown for amount paid by insured
## Q12A5A Amount paid by other insured defendants
## Q12A5B Unknown amount paid by other insured defendants
## Q12A6A Amount paid by uninsured defendants
## Q12A6B Unknown amount paid by uninsured defendants
## Q12A7 Total settlement amount or court award
## Q12B1A Other companies contributing to settlement (1st co
## Q12B1B Other companies contributing to settlement NAIC co
## Q12B1C Amount paid by other companies (1st company)
## Q12B2A Other companies contributing to settlement (2nd co
## Q12B2B Other companies contributing to settlement NAIC co
## Q12B2C Amount paid by other companies (2nd company)
## Q12B3A Other companies contributing to settlement (3rd co
## Q12B3B Other companies contributing to settlement NAIC co
## Q12B3C Amount paid by other companies (3rd company)
## Q12B4A Other companies contributing to settlement (4th co
## Q12B4B Other companies contributing to settlement NAIC co
## Q12B4C Amount paid by other companies (4th company)
## Q12B5A Other companies contributing to settlement (5th co
## Q12B5B Other companies contributing to settlement NAIC co
## Q12B5C Amount paid by other companies (5th company)
## Q12B6A Other companies contributing to settlement (6th co
## Q12B6B Other companies contributing to settlement NAIC co
## Q12B6C Amount paid by other companies (6th company)
## Q12C Any other defendants still in liitigation
## Q13A Did judgment provide for joint and several liabili
## Q13B1A Court verdict: percentage fault - injured party
## Q13B2A Court verdict: percentage fault - insured party
## Q13B2B Amount of verdict - insured party
## Q13B2C Amount of settlement - insured party
## Q13B3A Court verdict: percentage fault - other insured de
## Q13B3B Amount of verdict - other insured defendants
## Q13B3C Amount of settlement - other insured defendants
## Q13B4A Court verdict: percentage fault - uninsured defend
## Q13B4B Amount of verdict - uninsured defendants
## Q13B4C Amount of settlement - uninsured defendants
## Q13B5 Total verdict amount
## Q13B6 Total payout amount paid in settlement after verdi
## Q13C Did the doctrine of joint and several liability im
## Q13D1A No verdict: percentage fault - injured party
## Q13D2A No verdict: percentage fault - insured party
## Q13D2B Settlement amount - insured party
## Q13D3A No verdict: percentage fault - other insured defen
## Q13D3B Settlement amount - other insured defendants
## Q13D4A No verdict: percentage fault - uninsured defendant
## Q13D4B Settlement amount - uninsured defendants
## Q13D5 Total settlement for all defendants, no court verd
## Q13E1 Number of other defendants
##
##
##
##
##
##
##
## Q13E2A1 Number of other insured defendants - Municipal
## Q13E2A2 Number of other uninsured defendants - Municipal
## Q13E2B1 Number of other insured defendants - Government ot
## Q13E2B2 Number of other uninsured defendants - Government
## Q13E2C1 Number of other insured defendants - Business
## Q13E2C2 Number of other uninsured defendants - Business
## Q13E2D1 Number of other insured defendants - Industrial
## Q13E2D2 Number of other uninsured defendants - Industrial
## Q13E2E1 Number of other insured defendants - Non-profit or
## Q13E2E2 Number of other uninsured defendants - Non-profit
## Q13E2F1 Number of other insured defendants - Hospital
## Q13E2F2 Number of other uninsured defendants - Hospital
## Q13E2G1 Number of other insured defendants - Physicians &
## Q13E2G2 Number of other uninsured defendants - Physicians
## Q13E2H1 Number of other insured defendants - Other health
## Q13E2H2 Number of other uninsured defendants - Other healt
## Q13E2I1 Number of other insured defendants - All others
## Q13E2I2 Number of other uninsured defendants - All others
## Q14A Workers' compensation available to injured party
## Q14B Availability of other collateral sources
## Q14C1 Medical insurance
## Q14C2 Disability insurance
## Q14C3 Social security disability/supplementary security
## Q14C4 Medicare, medicaid
## Q14C5 Sick leave
## Q14C6 Other
## Q15A Lawsuits pertaining to subrogation, contribution,
## Q15B Status in lawsuit for subrogation , contributions
##
##
##
## Q16A Structured settlement used in closing claim
## Q16B1 Immediate payment
## Q16B2 Present value of structured amount
## Q16B3 Total award or settlement (Equal to Q12A7)
## Q16B4 Total projected future payout
## Q16C Structured settlement used to pay plaintiff's atto
## Q17A Amount paid to outside defense counsel
## Q17B Amount for allocated expense for inhouse defense c
## Q17C Amount for other ALAE
## Q17D Total allocated loss adjustment expense
## ECONOMIC Calculated Field indicating economic damages
## NONECO Calculated Field indicating non-economic damages
## EXEMP Calculated Field indicating exemplary damages
## INTEREST Calculated Field indicating prejudgment interest
## TOTALDAM Calculated Field indicating total damages, if deta
## INSPCT Insured's percentage of fault after reallocating f
## OTHPCT Other insured parties percentage of fault after re
## UNPCT Uninsured parties percentage of fault after reallo
## INSSHARE Insured's share of settlement after reallocating f
## OTHSHARE Other insured parties share of settlement after re
## UNSHARE Uninsured parties share of settlement after reallo
## ET1A1B Elapsed time between date of injury and date repor
## ET1A1C Elapsed time between date of injury and date suit
## ET1A1D Elapsed time between date of injury and date of tr
## ET1A1E Elapsed time between date of injury and date of se
## ET1A1F Elapsed time between date of injury and date of ju
## ET1A1G Elapsed time between date of injury and date claim
## ET1B1C Elapsed time between date reported to insurer and
## ET1B1D Elapsed time between date reported to insurer and
## ET1B1E Elapsed time between date reported to insurer and
## ET1B1F Elapsed time between date reported to insurer and
## ET1B1G Elapsed time between date reported to insurer and
## ET1C1D Elapsed time between date suit filed and date of t
## ET1C1E Elapsed time between date suit filed and date of s
## ET1C1F Elapsed time between date suit filed and date of j
## ET1C1G Elapsed time between date suit filed and date clai
## ET1D1E Elapsed time between date of trial and date of set
## ET1D1F Elapsed time between date of trial and date of jur
## ET1D1G Elapsed time between date of trial and date claim
## ET1E1F Elapsed time between date of settlement and date o
## ET1E1G Elapsed time between date of settlement and date c
## ET1F1G Elapsed time between date of jury award and date c
## POLLIMIT Policy limit for each individual injury
## PARTY Indicates the presence of other contributors to th
## regardless of whether payment was made
## MULTIDEF Indicates whether multiple defendants were involve
##
## UNKNOWN Indicates incomplete settlement amounts due to unk
## litigation still in progress
## COMPANUN Identifies matches among multi-defendant claims
##
##
##
##
##
##
## Number of Records
## 5,917
## 7,249
## 9,140
## 11,197
## 12,610
## 12,891
When I do this, it looks as though the column I want is “Q12A7,” which contains the “Total settlement amount or court award.” I’m not 100% certain of this because the question descriptions are rather terse, so if you picked some other plausible column, don’t worry about it. In the real world, we’d either spend a lot more time on this issue or contact the authors of the study.
All we really need for the next section of the exercise is this total settlement amount or court award column. We should know how to do this by now.
head(tsa<-closed89[,"Q12A7"])
## [1] 22500 20000 12979 17500 15000 20000
Once we have the data, getting descriptive statistics is incredibly easy in R. I first print a summary, which reports the quartiles together with the mean (rounded for display). I then print the mean, which is the sum of the values divided by the number of values, and the median, which for an odd number of values is the middle value of the sorted list. For an even number of values, R defines the median as the mean of the two middle values of the sorted list, a point I establish with the last piece of code.
print(summary(tsa))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10000 17000 30000 134000 85000 22600000
print(mean(tsa))
## [1] 134393
print(median(tsa))
## [1] 30000
print(median(c(3,5,8,2)))
## [1] 4
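To make the odd/even rule concrete, here is a minimal sketch that computes the median by hand and checks it against R's own median. The helper name manual_median is my own invention for illustration; it is not part of the exercise.

```r
# Hypothetical helper (not from the original exercise): the median computed by hand.
manual_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                # odd length: the single middle value
  } else {
    mean(s[c(n / 2, n / 2 + 1)])  # even length: mean of the two middle values
  }
}

even_case <- manual_median(c(3, 5, 8, 2))  # sorted: 2 3 5 8, middle pair is 3 and 5
odd_case  <- manual_median(c(3, 5, 8))     # sorted: 3 5 8, middle value is 5

print(c(even_case, odd_case))
print(even_case == median(c(3, 5, 8, 2)))  # agrees with R's built-in median
```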
R lets us calculate quantiles (sometimes called percentiles) quite easily.
quantile(tsa,0.9)
## 90%
## 250000
So roughly 90% of the cases have settlements or awards at or below $250,000.
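If you computed the 90th percentile by sorting the values and picking one out, you may get a slightly different answer from quantile, because quantile interpolates between sorted values by default. The sketch below shows both approaches, plus a quick way to get the share of cases above a threshold. The numbers in toy are made up for illustration; they are not drawn from CLOSED89.csv.

```r
# Toy settlement amounts, invented for illustration (NOT from CLOSED89.csv).
toy <- c(10000, 12000, 15000, 20000, 30000, 45000, 80000, 120000, 250000, 900000)

# Order-statistic approach: take the value at position round(0.9 * n) of the sorted data.
q90_manual <- sort(toy)[round(0.9 * length(toy))]

# quantile() interpolates between neighboring sorted values by default (type = 7),
# so it need not agree exactly with the order-statistic approach.
q90_default <- quantile(toy, 0.9, names = FALSE)

# Share of cases above a threshold: the mean of a logical vector is a proportion.
share_over_100k <- mean(toy > 100000)

print(q90_manual)
print(q90_default)
print(share_over_100k)
```

On a data set with thousands of rows the two percentile methods will usually land close together; the gap here is large only because the toy vector is so short.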
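One thing you should immediately see is that the mean ($134,393) and the median ($30,000) are far apart. This is a consequence of the distribution of awards being highly right-skewed: a small number of very large awards pull the mean up while leaving the median largely unaffected.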
hist(tsa)
Or, if you want a clearer picture of the data, plot it on a logarithmic scale using ggplot2. Notice how I use + scale_x_log10() to create a logarithmic x-axis and how I use xlab and ylab to create custom axis labels.
require(ggplot2)
## Loading required package: ggplot2
ggplot(data.frame(tsa=tsa), aes(x = tsa)) +
geom_histogram(fill="steelblue") +
scale_x_log10() +
xlab("settlement or award (dollars, log scale)") +
ylab("number of cases")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
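To see why the logarithmic scale helps, here is a rough sketch using simulated data and base R only. The log-normal sample is an assumption I chose because it is right-skewed; it is not the Texas data, and the parameters are arbitrary.

```r
# Simulated, right-skewed "awards" (log-normal); NOT the Texas data.
set.seed(1)
toy <- rlnorm(500, meanlog = 10, sdlog = 1.5)

# In a right-skewed sample, a few huge values pull the mean above the median.
toy_mean   <- mean(toy)
toy_median <- median(toy)
print(c(mean = toy_mean, median = toy_median))

# Bin the raw values and the log10 values. plot = FALSE just returns the bin counts;
# drop it to draw the histograms.
raw_bins <- hist(toy, plot = FALSE)
log_bins <- hist(log10(toy), plot = FALSE)

# Share of all cases that land in the single fullest bin on each scale.
raw_peak_share <- max(raw_bins$counts) / length(toy)
log_peak_share <- max(log_bins$counts) / length(toy)
print(c(raw = raw_peak_share, log = log_peak_share))
```

On the raw scale most cases pile into one bin next to zero, which is why the plain histogram tells you so little; on the log scale they spread across many bins.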
Suppose we want to look at cases that actually went to trial and see how settlements and awards varied depending on whether the case was tried to a judge or a jury. To do this, we will want to look at the variable “Q10A,” which is described as “Legal stage where settlement was reached.” We can see the values of this variable with the following R code. See if you can follow what I am doing. First I find the row that starts Q10A. Then I find the row that starts the next question, which happens to be Q10B. I subtract 1 from that so that I get the last row of Q10A. Then I look at the “Coding.Notes” column for all of the answers relevant to Q10A.
print(start<-which(definitions89[,"Field.Name"]=="Q10A"))
## [1] 96
print(end<-which(definitions89[,"Field.Name"]=="Q10B")-1)
## [1] 104
definitions89[start:end,"Coding.Notes"]
## [1] "1 - Alternative dispute resolution: no suit"
## [2] "2 - No suit filed"
## [3] "3 - Alternative dispute resolution: with suit"
## [4] "4 - Suit filed, settled before trial"
## [5] "5 - During trial, before court verdict"
## [6] "6 - Court verdict"
## [7] "7 - Settlement reached after verdict"
## [8] "8 - Settlement after appeal filed"
## [9] "9 - Case dismissed or summary judgment"
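The same start-and-end trick can be wrapped in a small function so you don't have to know which question comes next. This is only a sketch: the helper coding_notes_for and the table toy_defs are hypothetical, and the sketch assumes that continuation rows have an empty Field.Name, which is how the printed definitions table appears to be laid out.

```r
# Toy definitions table, invented to mimic the layout described above:
# a field's first row carries its name; continuation rows leave Field.Name blank.
toy_defs <- data.frame(
  Field.Name   = c("QA", "", "", "QB", "", "QC"),
  Coding.Notes = c("1 - first", "2 - second", "3 - third",
                   "Y - yes", "N - no", "free text"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return all coding notes that belong to one field.
coding_notes_for <- function(defs, field) {
  start <- which(defs$Field.Name == field)
  stopifnot(length(start) == 1)            # the field must appear exactly once
  named <- which(defs$Field.Name != "")    # rows where some field begins
  later <- named[named > start]            # fields that begin after this one
  end   <- if (length(later) > 0) later[1] - 1 else nrow(defs)
  defs$Coding.Notes[start:end]
}

print(coding_notes_for(toy_defs, "QA"))
print(coding_notes_for(toy_defs, "QC"))    # last field: runs to the end of the table
```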
When I do this I see that answers 1 to 4 and answer 9 correspond to dispositions that do not involve trial. In fact, just out of curiosity, let’s make a pie chart showing the disposition of the cases. It’s not a beautiful graphic by any means, but for the moment I don’t want to spend the time making it better.
q10a<-table(closed89[,"Q10A"])
pie(q10a,labels=definitions89[start:end,"Coding.Notes"])
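One caution about labeling: passing the nine coding notes straight to pie only lines up if every code from 1 to 9 actually occurs in the data. I have not checked that here, so the sketch below shows a safer way to match labels to whatever codes are present. The vector toy_q10a is made up; the notes are the ones printed above for Q10A.

```r
# Toy disposition codes, invented for illustration; codes 1, 3, 5, 7 and 8 never occur.
toy_q10a <- c(4, 4, 2, 6, 4, 9, 2)

# Coding notes as printed above for Q10A.
notes <- c("1 - Alternative dispute resolution: no suit",
           "2 - No suit filed",
           "3 - Alternative dispute resolution: with suit",
           "4 - Suit filed, settled before trial",
           "5 - During trial, before court verdict",
           "6 - Court verdict",
           "7 - Settlement reached after verdict",
           "8 - Settlement after appeal filed",
           "9 - Case dismissed or summary judgment")

counts <- table(toy_q10a)

# Pull the leading code out of each note, then look up each observed code.
note_codes <- sub(" .*", "", notes)
pie_labels <- notes[match(names(counts), note_codes)]
print(pie_labels)

# Uncomment to draw the chart with correctly matched labels:
# pie(counts, labels = pie_labels)
```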
Anyway, let’s just focus on the cases that have gone to trial.
dim(tried<-closed89[closed89$"Q10A" %in% c(5,6,7,8),])
## [1] 316 220
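I used %in% rather than a pair of comparisons such as >= 5 and <= 8. The two select the same rows when the column has no missing values, but they behave differently when it does. I have not checked whether Q10A contains any NA values, so treat the sketch below, which uses an invented data frame, as a general caution rather than a statement about this data set.

```r
# Toy data, invented for illustration; one stage value is missing.
toy <- data.frame(stage  = c(4, 5, NA, 7, 9, 6),
                  amount = c(10, 20, 30, 40, 50, 60))

# Comparisons return NA for the missing stage, and indexing a data frame
# with an NA produces a row made up entirely of NAs.
by_range <- toy[toy$stage >= 5 & toy$stage <= 8, ]

# %in% returns FALSE for the missing stage, so no phantom row appears.
by_set <- toy[toy$stage %in% c(5, 6, 7, 8), ]

print(by_range)
print(by_set)
```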
We see there are 316 of them.
# Judge or Jury
To figure out which cases were tried to a judge and which to a jury, we need to look at other columns in the closed89 data frame. I can use R’s grep function to find any “Coding.Notes” entries that mention a jury. It turns out that Q10C appears to have the information I need.
definitions89[grep(".*jury.*",definitions89[,"Coding.Notes"])+-1:1,c("Question","Coding.Notes")]
## Question Coding.Notes
## 113 9 - All others
## 114 Trial involvement 1 - Trial by judge and jury
## 115 2 - Trial by judge alone
I can then use that information to obtain a distribution showing whether a case went to a jury (1), just a judge (2) or neither (0).
table(closed89[,"Q10C"])
##
## 0 1 2
## 6933 306 10
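From here, comparing awards by trial type is a matter of grouping the amounts by Q10C. The sketch below does that with tapply on a small invented data frame; the column names match the real ones, but the values are made up, so the output says nothing about the Texas cases.

```r
# Toy stand-in for the tried cases; the values are invented for illustration.
toy_tried <- data.frame(
  Q10C  = c(1, 1, 2, 1, 2),                        # 1 = judge and jury, 2 = judge alone
  Q12A7 = c(50000, 150000, 20000, 100000, 40000)   # total settlement or award
)

# Turn the numeric codes into readable labels.
trial_type <- factor(toy_tried$Q10C, levels = c(1, 2), labels = c("jury", "judge"))

# Apply mean and median to the amounts within each trial type.
group_means   <- tapply(toy_tried$Q12A7, trial_type, mean)
group_medians <- tapply(toy_tried$Q12A7, trial_type, median)

print(group_means)
print(group_medians)
```

With only 10 judge-alone cases in the real data, any comparison of the two groups should be read cautiously, and given how skewed the awards are, the medians are worth looking at alongside the means.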