An Analysis of Texas Lawsuits

Seth J. Chandler

August 29, 2014

Introduction

This programming exercise asks students to examine one of the most comprehensive studies of lawsuits and insurance claims ever compiled: the study of 1989 claims in Texas. The material is, of course, a bit dated here in 2014, but it is still a trove of information on the legal process. The idea is to show students how to use R to obtain descriptive statistics on a large data set.

Appendix: All the Code

CLOSED89 <- read.csv("~/Downloads/CLOSED89.csv", stringsAsFactors=FALSE)
ClosedClaimDataFieldDefinition1989 <- read.csv("~/Downloads/ClosedClaimDataFieldDefinition1989.csv", stringsAsFactors=FALSE)
tsa<-CLOSED89[,"Q12A7"]
meantsa<-mean(tsa)
mediantsa<-median(tsa)
q90<-tsa[order(tsa)[round(0.9*length(tsa))]]
length(which(tsa>100000))/length(tsa)
tried<-CLOSED89[CLOSED89$"Q10A"<=8 & CLOSED89$"Q10A">=5,]
# Index tried by its own Q10C column; a logical vector built from CLOSED89 has the wrong length
jury<-tried[tried$"Q10C"==1,]
judge<-tried[tried$"Q10C"==2,]
mean(jury[,"Q12A7"])
mean(judge[,"Q12A7"])

Reading in the Data

Import the files CLOSED89.csv and ClosedClaimDataFieldDefinition1989.csv into R. They contain a trove of information about the disposition of lawsuits in Texas in 1989. Take a look at the ClosedClaimDataFieldDefinition1989 file and figure out the name of the variable that contains the “Total settlement amount or court award.”

The first thing we do is read in the data. I’ve printed out the dimensions of the two data sets. You can see that the closed claims file is pretty large. I’ve also printed out just the top few cases in the claims file to get a flavor of what they look like, along with the column names in the definitions file. Intelligently poking about data to solve a problem may not be as fancy a skill as learning a new machine learning algorithm or understanding how to compute the page rank of a node in a network, but, as a practical matter, it is extremely important.

closed89 <- read.csv("~/Downloads/CLOSED89.csv", stringsAsFactors=FALSE)
print(dim(closed89))
## [1] 7249  220
print(head(closed89,n=3)) # just need a few cases to understand data 
##    EXTSEQ TYPEF      Q1A      Q1B      Q1C      Q1D      Q1E      Q1F
## 1 4000001    S  19860824 19880516 19880310        0 19881025        0
## 2 4000002    S  19850213 19870923 19870626 19881114 19881104        0
## 3 4000003    S  19860409 19870629 19870430 19881012 19881128 19881013
##        Q1G Q2 Q3A Q3B Q4A Q4B Q4C Q4D Q4E Q4F Q4G Q4H Q4I Q4J Q4K Q4L Q4M
## 1 19890201  0                                                            
## 2 19890201  0                                                            
## 3 19890101  0                                                            
##   Q4N Q4O Q4P Q4Q Q4R Q5A Q5B Q5C Q5D Q5E Q5F Q5G Q5H Q5I Q5J Q5K Q5L Q5M
## 1                                                                        
## 2                                                                        
## 3                                                                        
##   Q5N Q5O Q6A Q6B Q6C Q6D Q7A Q7B Q7C Q7D1   Q7D2    Q7D3   Q8A  Q8B   Q8C
## 1           1 220 220   0   4   1  19    0      0 1000000  2000 1500  3500
## 2           1 165 165   0   4   1  19    0 100000       0 10000 6500 16500
## 3           1  15  15  15   4   1  20    0 100000       0  2000 1000  3000
##     Q8D   Q8E   Q8F Q9A Q9B Q9C Q10A Q10B Q10C Q10D Q10E Q10F   Q11A Q11B1
## 1  2000  1500  3500  Y   Y   N     4    0    0    0         0 100000     0
## 2 25000 10000 35000  Y   Y   N     4    0    0    0         0  42000     0
## 3  2000  1000  3000  Y   Y   N     7    5    1    0   N     0  30000 12979
##   Q11B2A Q11B2B Q11B2C Q11B2D Q11B2E Q11C Q11D1 Q11D2 Q11D3A Q11D3B Q11D3C
## 1      0      0      0      0      0          0            0      0      0
## 2      0      0      0      0      0          0            0      0      0
## 3   1000  10282      0   1697  12979   N      0            0      0      0
##   Q11D3D Q11D3E Q11E1 Q11E2 Q11E3A Q11E3B Q11E3C Q11E3D Q11E3E Q12A1 Q12A2
## 1      0      0 22500    N       0      0      0      0      0 20000     0
## 2      0      0 20000    N       0      0      0      0      0 20000     0
## 3      0      0     0            0      0      0      0      0 12979     0
##   Q12A3A Q12A3B Q12A4A Q12A4B Q12A5A Q12A5B Q12A6A Q12A6B Q12A7 Q12B1A
## 1      0             0             0          2500        22500      0
## 2      0             0             0             0        20000      0
## 3      0             0             0             0        12979      0
##   Q12B1B Q12B1C Q12B2A Q12B2B Q12B2C Q12B3A Q12B3B Q12B3C Q12B4A Q12B4B
## 1      0      0      0      0      0      0      0      0      0      0
## 2      0      0      0      0      0      0      0      0      0      0
## 3      0      0      0      0      0      0      0      0      0      0
##   Q12B4C Q12B5A Q12B5B Q12B5C Q12B6A Q12B6B Q12B6C Q12C Q13A Q13B1A Q13B2A
## 1      0      0      0      0      0      0      0   N            0      0
## 2      0      0      0      0      0      0      0   N            0      0
## 3      0      0      0      0      0      0      0   N            0      0
##   Q13B2B Q13B2C Q13B3A Q13B3B Q13B3C Q13B4A Q13B4B Q13B4C Q13B5 Q13B6 Q13C
## 1      0      0      0      0      0      0      0      0     0     0     
## 2      0      0      0      0      0      0      0      0     0     0     
## 3      0      0      0      0      0      0      0      0     0     0     
##   Q13D1A Q13D2A Q13D2B Q13D3A Q13D3B Q13D4A Q13D4B Q13D5 Q13E1 Q13E2A1
## 1      0      0      0      0      0      0      0     0             0
## 2      0      0      0      0      0      0      0     0             0
## 3      0      0      0      0      0      0      0     0             0
##   Q13E2A2 Q13E2B1 Q13E2B2 Q13E2C1 Q13E2C2 Q13E2D1 Q13E2D2 Q13E2E1 Q13E2E2
## 1       0       0       0       0       0       0       0       0       0
## 2       0       0       0       0       0       0       0       0       0
## 3       0       0       0       0       0       0       0       0       0
##   Q13E2F1 Q13E2F2 Q13E2G1 Q13E2G2 Q13E2H1 Q13E2H2 Q13E2I1 Q13E2I2 Q14A
## 1       0       0       0       0       0       0       0       0     
## 2       0       0       0       0       0       0       0       0     
## 3       0       0       0       0       0       0       0       0     
##   Q14B Q14C1 Q14C2 Q14C3 Q14C4 Q14C5 Q14C6 Q15A Q15B Q16A Q16B1 Q16B2
## 1                                                  0          0     0
## 2                                                  0          0     0
## 3                                                  0          0     0
##   Q16B3 Q16B4 Q16C Q17A Q17B Q17C  Q17D ECONOMIC NONECO EXEMP INTEREST
## 1     0     0      2032    0  161  2193        0      0     0        0
## 2     0     0      9998    0 2230 12228        0      0     0        0
## 3     0     0      9305    0  630  9935     1000  10282     0     1697
##   TOTALDAM INSPCT OTHPCT UNPCT INSSHARE OTHSHARE UNSHARE ET1A1B ET1A1C
## 1        0      0      0     0        0        0       0    631    564
## 2        0      0      0     0        0        0       0    952    863
## 3    12979      0      0     0        0        0       0    446    386
##   ET1A1D ET1A1E ET1A1F ET1A1G ET1B1C ET1B1D ET1B1E ET1B1F ET1B1G ET1C1D
## 1     NA    793     NA    892    -67     NA    162     NA    261     NA
## 2   1370   1360     NA   1449    -89    418    408     NA    497    507
## 3    917    964    918    998    -60    471    518    472    552    531
##   ET1C1E ET1C1F ET1C1G ET1D1E ET1D1F ET1D1G ET1E1F ET1E1G ET1F1G POLLIMIT
## 1    229     NA    328     NA     NA     NA     NA     99     NA  1000000
## 2    497     NA    586    -10     NA     79     NA     89     NA   100000
## 3    578    532    612     47      1     81    -46     34     80   100000
##   PARTY MULTIDEF UNKNOWN COMPANUN
## 1     2        1       0        0
## 2     1        0       0        0
## 3     1        0       0        0
definitions89<-read.csv("~/Downloads/ClosedClaimDataFieldDefinition1989.csv", 
                        stringsAsFactors=FALSE)
print(dim(definitions89))
## [1] 297  16
print(colnames(definitions89))
##  [1] "Column"       "Field.Name"   "Field.Type"   "Column.Width"
##  [5] "Question"     "Coding.Notes" "X"            "X.1"         
##  [9] "X.2"          "X.3"          "X.4"          "X.5"         
## [13] "X.6"          "X.7"          "X.8"          "X.9"
print(head(definitions89))
##   Column Field.Name Field.Type Column.Width                 Question
## 1      1     EXTSEQ          N            8               TDI Number
## 2      2      TYPEF          A            1             Type of Form
## 3      3        Q1A          N            8           Date of injury
## 4      4        Q1B          N            8 Date reported to insurer
## 5      5        Q1C          N            8          Date suit filed
## 6      6        Q1D          N            8            Date of trial
##           Coding.Notes  X X.1 X.2 X.3 X.4 X.5 X.6 X.7 X.8 X.9
## 1                      NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
## 2 L - Long   S - Short NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
## 3      YYYYMMDD format NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
## 4      YYYYMMDD format NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
## 5      YYYYMMDD format NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
## 6      YYYYMMDD format NA  NA  NA  NA  NA  NA  NA  NA  NA  NA

Figuring out the most relevant data columns

It looks as if we could just print definitions89 without killing too many forests if we restricted ourselves to the Field.Name and Question columns. I’m going to use the substr function to shorten the Question column to its first 50 characters so the table fits better on the page. (Again, this isn’t one of the traditional analytic skills, but making your output attractive and useful is an important practical skill.) Let’s do that.

definitions89$short.Question<-substr(definitions89[,"Question"],1,50)
print(definitions89[,c("Field.Name","short.Question")],row.names=FALSE)
##         Field.Name                                     short.Question
##             EXTSEQ                                         TDI Number
##              TYPEF                                       Type of Form
##                Q1A                                     Date of injury
##                Q1B                           Date reported to insurer
##                Q1C                                    Date suit filed
##                Q1D                                      Date of trial
##                Q1E                                 Date of settlement
##                Q1F                                 Date of jury award
##                Q1G                                  Date claim closed
##                 Q2                               Age of injured party
##                Q3A                                  Employment status
##                Q3B                         Work related injury status
##                Q4A                                              Death
##                Q4B                                         Amputation
##                Q4C                                       Burns (heat)
##                Q4D                                   Burns (chemical)
##                Q4E                         Systemic poisoning (toxic)
##                Q4F                         Systemic poisoning (other)
##                Q4G                             Eye injury (blindness)
##                Q4H                              Respiratory condition
##                Q4I                                  Nervous condition
##                Q4J                         Hearing loss or impairment
##                Q4K                              Circulatory condition
##                Q4L                                  Multiple injuries
##                Q4M                                        Back injury
##                Q4N                                      Skin disorder
##                Q4O                                       Brain damage
##                Q4P                                           Scarring
##                Q4Q                               Spinal cord injuries
##                Q4R                                              Other
##                Q5A                                   Off road vehicle
##                Q5B                                 Air transportation
##                Q5C                                            Railway
##                Q5D                                Other motor vehicle
##                Q5E                              Surgical/medical care
##                Q5F                                              Falls
##                Q5G                                           Drowning
##                Q5H                           Use of defective product
##                Q5I                                               Fire
##                Q5J                                            Firearm
##                Q5K                           Pollution/Toxic exposure
##                Q5L                                         Explosions
##                Q5M                      Use of agricultural machinery
##                Q5N                               Oil & gas extraction
##                Q5O                                              Other
##                Q6A                                    Injury location
##                                                                      
##                Q6B                       County where injury occurred
##                Q6C                            County where suit filed
##                Q6D                            County where case tried
##                Q7A                                        Policy type
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                Q7B                                        Policy form
##                                                                      
##                Q7C                                     Business class
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##               Q7D1                            Per person policy limit
##               Q7D2               Per occurrence/accident policy limit
##               Q7D3                              Combined single limit
##                Q8A                          Initial indemnity reserve
##                Q8B                            Initial expense reserve
##                Q8C            Initial expenditure reserve (Q8A + Q8B)
##                Q8D                            Final indemnity reserve
##                Q8E                              Final expense reserve
##                Q8F              Final expenditure reserve (Q8D + Q8E)
##                Q9A                   Attorney involvement - plaintiff
##                Q9B                     Attorney involvement - insurer
##                Q9C                     Attorney involvement - insured
##               Q10A           Legal stage where settlement was reached
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##               Q10B                            Result of court verdict
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##               Q10C                                  Trial involvement
##                                                                      
##               Q10D                               Who requested appeal
##                                                                      
##               Q10E                   Did the court order a remittitur
##               Q10F           Amount remittitur reduced original award
##               Q11A  Final demand of claimant or attorney for claimant
##              Q11B1                            Amount of court verdict
##             Q11B2A                     Economic loss on court verdict
##             Q11B2B                 Non-economic loss on court verdict
##             Q11B2C                 Exemplary damages on court verdict
##             Q11B2D              Prejudgment interest on court verdict
##             Q11B2E                         Total court verdict amount
##               Q11C              Was settlement different from verdict
##              Q11D1                               Amount of settlement
##              Q11D2 Was settlement influence by noneconomic damages or
##                    prejudgment interest (long forms), exemplary damag
##             Q11D3A         Economic loss on settlements after verdict
##             Q11D3B     Non-economic loss on settlements after verdict
##             Q11D3C     Exemplary damages on settlements after verdict
##             Q11D3D  Prejudgment interest on settlements after verdict
##             Q11D3E                 Amount of settlement after verdict
##              Q11E1               Settlement amount - no court verdict
##              Q11E2 Was settlement influence by noneconomic damages or
##                    prejudgment interest (long forms), exemplary damag
##             Q11E3A       Economic loss on settlements without verdict
##             Q11E3B   Non-economic loss on settlements without verdict
##             Q11E3C   Exemplary damages on settlements without verdict
##             Q11E3D Prejudgment interest on settlements without verdic
##             Q11E3E               Amount of settlement without verdict
##              Q12A1                 Amount paid by the primary carrier
##              Q12A2 Amount paid by insured due to deductible (or self-
##             Q12A3A                      Amount paid by excess carrier
##             Q12A3B              Unknown amount paid by excess carrier
##             Q12A4A Amount paid by insured - over policy limit or to e
##             Q12A4B                 Unknown for amount paid by insured
##             Q12A5A            Amount paid by other insured defendants
##             Q12A5B    Unknown amount paid by other insured defendants
##             Q12A6A                Amount paid by uninsured defendants
##             Q12A6B        Unknown amount paid by uninsured defendants
##              Q12A7             Total settlement amount or court award
##             Q12B1A Other companies contributing to settlement (1st co
##             Q12B1B Other companies contributing to settlement NAIC co
##             Q12B1C       Amount paid by other companies (1st company)
##             Q12B2A Other companies contributing to settlement (2nd co
##             Q12B2B Other companies contributing to settlement NAIC co
##             Q12B2C       Amount paid by other companies (2nd company)
##             Q12B3A Other companies contributing to settlement (3rd co
##             Q12B3B Other companies contributing to settlement NAIC co
##             Q12B3C       Amount paid by other companies (3rd company)
##             Q12B4A Other companies contributing to settlement (4th co
##             Q12B4B Other companies contributing to settlement NAIC co
##             Q12B4C       Amount paid by other companies (4th company)
##             Q12B5A Other companies contributing to settlement (5th co
##             Q12B5B Other companies contributing to settlement NAIC co
##             Q12B5C       Amount paid by other companies (5th company)
##             Q12B6A Other companies contributing to settlement (6th co
##             Q12B6B Other companies contributing to settlement NAIC co
##             Q12B6C       Amount paid by other companies (6th company)
##               Q12C          Any other defendants still in liitigation
##               Q13A Did judgment provide for joint and several liabili
##             Q13B1A    Court verdict: percentage fault - injured party
##             Q13B2A    Court verdict: percentage fault - insured party
##             Q13B2B                  Amount of verdict - insured party
##             Q13B2C               Amount of settlement - insured party
##             Q13B3A Court verdict: percentage fault - other insured de
##             Q13B3B       Amount of verdict - other insured defendants
##             Q13B3C    Amount of settlement - other insured defendants
##             Q13B4A Court verdict: percentage fault - uninsured defend
##             Q13B4B           Amount of verdict - uninsured defendants
##             Q13B4C        Amount of settlement - uninsured defendants
##              Q13B5                               Total verdict amount
##              Q13B6 Total payout amount paid in settlement after verdi
##               Q13C Did the doctrine of joint and several liability im
##             Q13D1A       No verdict: percentage fault - injured party
##             Q13D2A       No verdict: percentage fault - insured party
##             Q13D2B                  Settlement amount - insured party
##             Q13D3A No verdict: percentage fault - other insured defen
##             Q13D3B       Settlement amount - other insured defendants
##             Q13D4A No verdict: percentage fault - uninsured defendant
##             Q13D4B           Settlement amount - uninsured defendants
##              Q13D5 Total settlement for all defendants, no court verd
##              Q13E1                         Number of other defendants
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##            Q13E2A1     Number of other insured defendants - Municipal
##            Q13E2A2   Number of other uninsured defendants - Municipal
##            Q13E2B1 Number of other insured defendants - Government ot
##            Q13E2B2 Number of other uninsured defendants - Government 
##            Q13E2C1      Number of other insured defendants - Business
##            Q13E2C2    Number of other uninsured defendants - Business
##            Q13E2D1    Number of other insured defendants - Industrial
##            Q13E2D2  Number of other uninsured defendants - Industrial
##            Q13E2E1 Number of other insured defendants - Non-profit or
##            Q13E2E2 Number of other uninsured defendants - Non-profit 
##            Q13E2F1      Number of other insured defendants - Hospital
##            Q13E2F2    Number of other uninsured defendants - Hospital
##            Q13E2G1 Number of other insured defendants - Physicians & 
##            Q13E2G2 Number of other uninsured defendants - Physicians 
##            Q13E2H1 Number of other insured defendants - Other health 
##            Q13E2H2 Number of other uninsured defendants - Other healt
##            Q13E2I1    Number of other insured defendants - All others
##            Q13E2I2  Number of other uninsured defendants - All others
##               Q14A   Workers' compensation available to injured party
##               Q14B           Availability of other collateral sources
##              Q14C1                                  Medical insurance
##              Q14C2                               Disability insurance
##              Q14C3 Social security disability/supplementary security 
##              Q14C4                                 Medicare, medicaid
##              Q14C5                                         Sick leave
##              Q14C6                                              Other
##               Q15A Lawsuits pertaining to subrogation, contribution, 
##               Q15B Status in lawsuit  for subrogation , contributions
##                                                                      
##                                                                      
##                                                                      
##               Q16A        Structured settlement used in closing claim
##              Q16B1                                  Immediate payment
##              Q16B2                 Present value of structured amount
##              Q16B3         Total award or settlement (Equal to Q12A7)
##              Q16B4                      Total projected future payout
##               Q16C Structured settlement used to pay plaintiff's atto
##               Q17A             Amount paid to outside defense counsel
##               Q17B Amount for allocated expense for inhouse defense c
##               Q17C                              Amount for other ALAE
##               Q17D            Total allocated loss adjustment expense
##           ECONOMIC       Calculated Field indicating economic damages
##             NONECO   Calculated Field indicating non-economic damages
##              EXEMP      Calculated Field indicating exemplary damages
##           INTEREST   Calculated Field indicating prejudgment interest
##           TOTALDAM Calculated Field indicating total damages, if deta
##             INSPCT Insured's percentage of fault after reallocating f
##             OTHPCT Other insured parties percentage of fault after re
##              UNPCT Uninsured parties percentage of fault after reallo
##           INSSHARE Insured's share of settlement after reallocating f
##           OTHSHARE Other insured parties share of settlement after re
##            UNSHARE Uninsured parties share of settlement after reallo
##             ET1A1B Elapsed time between date of injury and date repor
##             ET1A1C Elapsed time between date of injury and date suit 
##             ET1A1D Elapsed time between date of injury and date of tr
##             ET1A1E Elapsed time between date of injury and date of se
##             ET1A1F Elapsed time between date of injury and date of ju
##             ET1A1G Elapsed time between date of injury and date claim
##             ET1B1C Elapsed time between date reported to insurer and 
##             ET1B1D Elapsed time between date reported to insurer and 
##             ET1B1E Elapsed time between date reported to insurer and 
##             ET1B1F Elapsed time between date reported to insurer and 
##             ET1B1G Elapsed time between date reported to insurer and 
##             ET1C1D Elapsed time between date suit filed and date of t
##             ET1C1E Elapsed time between date suit filed and date of s
##             ET1C1F Elapsed time between date suit filed and date of j
##             ET1C1G Elapsed time between date suit filed and date clai
##             ET1D1E Elapsed time between date of trial and date of set
##             ET1D1F Elapsed time between date of trial and date of jur
##             ET1D1G Elapsed time between date of trial and date claim 
##             ET1E1F Elapsed time between date of settlement and date o
##             ET1E1G Elapsed time between date of settlement and date c
##             ET1F1G Elapsed time between date of jury award and date c
##           POLLIMIT            Policy limit for each individual injury
##              PARTY Indicates the presence of other contributors to th
##                                regardless of whether payment was made
##           MULTIDEF Indicates whether multiple defendants were involve
##                                                                      
##            UNKNOWN Indicates incomplete settlement amounts due to unk
##                                          litigation still in progress
##           COMPANUN    Identifies matches among multi-defendant claims
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##  Number of Records                                                   
##              5,917                                                   
##              7,249                                                   
##              9,140                                                   
##             11,197                                                   
##             12,610                                                   
##             12,891

When I do this, it looks as though the field I want is “Q12A7,” which contains the “Total settlement amount or court award.” I’m not 100% certain of this because the question descriptions are rather terse, so if you picked some other plausible column, don’t worry about it. In the real world, we’d either spend a lot more time on this issue or contact the authors of the study.
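Rather than scanning the printed table by eye, you can also search the Question column for a keyword. The following is a minimal sketch, not part of the original exercise: the data frame defs is a small hypothetical stand-in for definitions89 that copies four of its rows, so the block runs without the downloaded files.

```r
# Hypothetical stand-in for definitions89 (four rows copied from the table above).
defs <- data.frame(
  Field.Name = c("Q11D1", "Q12A1", "Q12A7", "Q16B3"),
  Question   = c("Amount of settlement",
                 "Amount paid by the primary carrier",
                 "Total settlement amount or court award",
                 "Total award or settlement (Equal to Q12A7)"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE for each row whose Question contains the pattern.
hits <- grepl("total settlement", defs$Question, ignore.case = TRUE)

# Show only the matching rows.
print(defs[hits, c("Field.Name", "Question")], row.names = FALSE)

# Keep the matching field name(s) for later use.
candidate <- defs$Field.Name[hits]
```

With the real data you would replace defs with definitions89. A keyword search only narrows the candidates; you still have to read the matching questions and judge which one you want.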

Getting the total settlement amount or court award column

All we really need for the next section of the exercise is this total settlement amount or court award column. We should know how to do this by now.

head(tsa<-closed89[,"Q12A7"])
## [1] 22500 20000 12979 17500 15000 20000

Descriptive statistics

Basics

Once we have the data, getting descriptive statistics is very easy in R. I first print a summary, which reports the minimum, the quartiles, the mean and the maximum. Note that summary displays rounded values, which is why the mean appears there as 134000. I then print the mean itself, which is the sum of the values divided by the number of values. Next I print the median, which for an odd number of values is the middle of a sorted list of values. For an even number of values R defines it as the mean of the two middle values of the sorted list, a point I establish with the last piece of code.

print(summary(tsa))
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    10000    17000    30000   134000    85000 22600000
print(mean(tsa))
## [1] 134393
print(median(tsa))
## [1] 30000
print(median(c(3,5,8,2)))
## [1] 4
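To make the definition of the median concrete, here is a small sketch that computes it by hand. The function name manual_median and the toy vectors are my own illustrations and are not part of the Texas data.

```r
# Hypothetical helper: compute the median by hand from a sorted vector.
manual_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                 # odd count: the middle value
  } else {
    mean(s[c(n / 2, n / 2 + 1)])   # even count: mean of the two middle values
  }
}

odd_values  <- c(3, 5, 8, 2, 10)   # toy data with an odd number of values
even_values <- c(3, 5, 8, 2)       # same vector as in the example above

manual_median(odd_values)    # 5
manual_median(even_values)   # 4, matching median(c(3,5,8,2))
```

In practice you would just call median; the point of the sketch is only to show what that call is doing.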

Quantiles

R lets us calculate quantiles (sometimes called percentiles) quite easily.

quantile(tsa,0.9)
##    90% 
## 250000

So 90% of the cases have a settlement or award of $250,000 or less.
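If you want several quantiles at once, quantile accepts a vector of probabilities. The sketch below uses a made-up vector called amounts rather than the real data, and it also checks what share of the values fall at or below the 90th percentile. Keep in mind that by default quantile interpolates between data points, so the value it returns need not be one of the observed amounts.

```r
# Toy data standing in for the settlement amounts (not the real values).
amounts <- c(10000, 12000, 15000, 20000, 30000,
             45000, 85000, 120000, 250000, 900000)

# Several quantiles in one call; the result is a named numeric vector.
qs <- quantile(amounts, probs = c(0.25, 0.5, 0.9))
print(qs)

# Share of cases at or below the 90th percentile.
share_below <- mean(amounts <= qs["90%"])
print(share_below)
```

The same pattern, mean applied to a logical vector, is a handy way to get the fraction of cases above or below any threshold you care about.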

Visualization

One thing you should immediately see is that the mean and the median are far apart. This is a consequence of the distribution of awards being highly skewed to the right: a small number of very large awards pull the mean far above the median.

hist(tsa)

[Plot: histogram of tsa]

Or, if you want a clearer picture of the data, plot it on a logarithmic scale using ggplot2. Notice how I use + scale_x_log10() to create a logarithmic x-axis and how I use xlab and ylab to create custom axis labels.

require(ggplot2)
## Loading required package: ggplot2
ggplot(data.frame("tsa"=tsa), aes(x = tsa)) + geom_histogram(fill="steelblue") + scale_x_log10() + xlab("log of settlement or award") + ylab("how many cases")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

[Plot: histogram of settlement or award amounts on a logarithmic x-axis]
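If ggplot2 is not installed, you can get a roughly similar picture with base R by taking the logarithm yourself before calling hist. The sketch below uses simulated, right-skewed data (the name toy_awards and the rlnorm parameters are assumptions chosen only for illustration), so the shape will not match the Texas data.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated, right-skewed amounts; NOT the real settlement data.
toy_awards <- rlnorm(1000, meanlog = 10.5, sdlog = 1.3)

# In right-skewed data the mean sits above the median.
mean(toy_awards) > median(toy_awards)

# Histogram of the base-10 logarithm of the amounts.
h <- hist(log10(toy_awards), breaks = 30,
          col = "steelblue",
          xlab = "log10 of settlement or award",
          ylab = "how many cases",
          main = "")
```

One difference from the ggplot2 version: here the axis shows the logarithms themselves (4, 5, 6, ...) rather than dollar amounts spaced on a logarithmic scale.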

Tried cases: judge and jury

Suppose we want to look at cases that actually went to trial and see how settlements and awards varied depending on whether the case was tried to a judge or a jury. To do this, we will want to look at the variable “Q10A,” which is described as “Legal stage where settlement was reached.” We can see the values of this variable with the following R code. See if you can follow what I am doing. First I find the row where Q10A starts. Then I find the row where the next question starts, which happens to be Q10B. I subtract 1 from that so that I get the last row of Q10A. Then I look at the “Coding.Notes” column for all of the answers relevant to Q10A.

print(start<-which(definitions89[,"Field.Name"]=="Q10A"))
## [1] 96
print(end<-which(definitions89[,"Field.Name"]=="Q10B")-1)
## [1] 104
definitions89[start:end,"Coding.Notes"]
## [1] "1 - Alternative dispute resolution: no suit"  
## [2] "2 - No suit filed"                            
## [3] "3 - Alternative dispute resolution: with suit"
## [4] "4 - Suit filed, settled before trial"         
## [5] "5 - During trial, before court verdict"       
## [6] "6 - Court verdict"                            
## [7] "7 - Settlement reached after verdict"         
## [8] "8 - Settlement after appeal filed"            
## [9] "9 - Case dismissed or summary judgment"
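Since we may want to look up the answer codes for other questions too, the start and end logic can be wrapped in a small function. This is only a sketch: the name coding_notes is hypothetical, and it assumes that the continuation rows for a field have an empty Field.Name, which I have not verified for every field in the real definitions file. The toy table below stands in for that file.

```r
# Hypothetical helper: return the coding notes for one field.
# Assumption: a field's rows run from its Field.Name row up to the row
# before the next non-empty Field.Name.
coding_notes <- function(defs, field) {
  named <- which(defs[, "Field.Name"] != "")        # rows where a field starts
  start <- named[defs[named, "Field.Name"] == field]
  later <- named[named > start]                     # starts of later fields
  end   <- if (length(later) > 0) later[1] - 1 else nrow(defs)
  defs[start:end, "Coding.Notes"]
}

# Toy definitions table with the same two column names (not the real file).
toy_defs <- data.frame(
  Field.Name   = c("QA", "", "", "QB", ""),
  Coding.Notes = c("1 - yes", "2 - no", "9 - unknown", "1 - judge", "2 - jury"),
  stringsAsFactors = FALSE
)

coding_notes(toy_defs, "QA")
coding_notes(toy_defs, "QB")
```

Unlike the code above, this version does not need to know the name of the next question, which is convenient for the last field in the file.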

When I do this I see that answers 1 to 4 and answer 9 correspond to dispositions that do not involve trial. In fact, just out of curiosity, let’s make a pie chart showing the disposition of the cases. It’s not a beautiful graphic by any means, but for the moment I don’t want to spend the time making it better.

q10a<-table(closed89[,"Q10A"])
pie(q10a,labels=definitions89[start:end,"Coding.Notes"])

[Plot: pie chart of case dispositions (Q10A)]

Anyway, let’s just focus on the cases that have gone to trial.

dim(tried<-closed89[closed89$"Q10A" %in% c(5,6,7,8),])
## [1] 316 220
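It is often useful to know not just how many cases were tried but what fraction of all cases that represents. Here is a minimal sketch using an invented vector of stage codes (the name stage and its values are assumptions for illustration, not the real Q10A column).

```r
# Toy stage codes standing in for closed89[, "Q10A"] (not the real data).
stage <- c(2, 4, 4, 6, 4, 9, 5, 4, 2, 7, 4, 4)

# Counts and proportions for each stage code.
counts <- table(stage)
round(prop.table(counts), 2)

# Logical flag for tried cases (codes 5 through 8), then their count and share.
is_tried <- stage %in% 5:8
sum(is_tried)    # number of tried cases
mean(is_tried)   # fraction of all cases that were tried
```

Keeping the logical vector around is handy, because the same flag can be reused to subset the rows of the data frame.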

We see there are 316 of them.

Judge or Jury

To figure out which cases were tried to a judge and which to a jury, we need to look at other data in the closed89 data.frame. I can use R’s grep function to find any “Coding.Notes” entries that refer to a jury. It turns out that Question 10C appears to have the information I need.

definitions89[grep(".*jury.*",definitions89[,"Coding.Notes"])+-1:1,c("Question","Coding.Notes")]
##              Question                Coding.Notes
## 113                                9 - All others
## 114 Trial involvement 1 - Trial by judge and jury
## 115                      2 - Trial by judge alone

I can then use that information to obtain a distribution showing whether a case went to a jury (1), just a judge (2) or neither (0).

table(closed89[,"Q10C"])
## 
##    0    1    2 
## 6933  306   10
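With the trial type in hand, one natural next step is to compare amounts for jury and judge trials among the tried cases. The sketch below shows one way this might be done with tapply. The data frame toy and all of its values are invented for illustration; only the column names are taken from the real data.

```r
# Toy data frame with the same column names (values are invented).
toy <- data.frame(
  Q10A  = c(4, 6, 6, 5, 2, 7, 6),
  Q10C  = c(0, 1, 2, 1, 0, 1, 2),
  Q12A7 = c(20000, 150000, 40000, 60000, 15000, 90000, 25000)
)

# Keep only tried cases (stage codes 5 through 8).
toy_tried <- toy[toy$Q10A %in% 5:8, ]

# Label the trial type, then summarize amounts within each group.
trial_type <- factor(toy_tried$Q10C, levels = c(1, 2),
                     labels = c("jury", "judge"))
group_means   <- tapply(toy_tried$Q12A7, trial_type, mean)
group_medians <- tapply(toy_tried$Q12A7, trial_type, median)
print(group_means)
print(group_medians)
```

Notice that the grouping variable is built from toy_tried itself rather than from the full data frame, so its length matches the rows being summarized. With only 10 judge-tried cases in the real data, any comparison of this kind should be read cautiously.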