Installing Packages
COMPAS
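The CRAN package compas is described as "Conformational Manipulations of Protein Atomic Structures". It is a protein-structure package, unrelated to the COMPAS risk assessment tool analysed here, so install.packages("compas") is not required.
For background on the tool itself, see the Wikipedia article on COMPAS (software).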
library("readr")
library("tidyverse")
library("dplyr")
library("magrittr")
library("lubridate")
readr
install.packages("readr")
library("readr")

The goal of readr is to provide a fast and friendly way to read rectangular data from delimited files, such as comma-separated values (CSV) and tab-separated values (TSV). It is designed to parse many types of data found in the wild, while providing an informative problem report when parsing leads to unexpected results.
tidyverse
install.packages("tidyverse")
library("tidyverse")

The tidyverse is a collection of open source packages for the R programming language introduced by Hadley Wickham and his team that “share an underlying design philosophy, grammar, and data structures” of tidy data.
dplyr
install.packages("dplyr")
library("dplyr")

One of the core packages of the tidyverse in the R programming language, dplyr is primarily a set of functions designed to enable data frame manipulation in an intuitive, user-friendly way.
magrittr
install.packages("magrittr")
library("magrittr")

Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions. For more information, see the package vignette. To quote René Magritte, "Ceci n'est pas un pipe."
lubridate
install.packages("lubridate")
library("lubridate")

R commands for date-times are generally unintuitive and change depending on the type of date-time object being used. Moreover, the methods we use with date-times must be robust to time zones, leap days, daylight savings times, and other time related quirks, and R lacks these capabilities in some situations. Lubridate makes it easier to do the things R does with date-times and possible to do the things R does not.
File Import
Read in .csv file
recid.dat = read.csv("compas.violent.csv")
Data Summary
amount1 <- length(recid.dat)
print(paste("there are", amount1, "columns in the recid.dat file"))
[1] "there are 20 columns in the recid.dat file"
amount2 <- nrow(recid.dat)
print(paste('there are', amount2, 'rows in the recid.dat file'))
[1] "there are 4743 rows in the recid.dat file"
take0 <- recid.dat[1:3,1:6]
Make a Copy of recid.dat
external.dat <- recid.dat
Number of Distinct Values
individuals <- length(unique(recid.dat$name))
print(paste('there are' , individuals,'unique individuals'))
[1] "there are 4721 unique individuals"
Assigning Unique Identifier
new_recid.dat <- cbind(recid.dat,
DEFID = c(5000:9742))
new_recid.dat <- new_recid.dat[, c(21, 1:20)] # assigning DEFID to 1st column
print(head(new_recid.dat[1:3,1:6]))
external.dat <- cbind(external.dat,
DEFID = new_recid.dat$DEFID)
external.dat <- external.dat[, c(21, 1:20)] # assigning DEFID to 1st column
take1 <- print(head(external.dat[1:3,1:6]))
ADD AND DELETE COLUMN(S)
Add AGECAT Variable
oldest <- min(external.dat$dob)
print(paste(oldest,"is the oldest date of birth"))
[1] "1932-09-24 is the oldest date of birth"
youngest <- max(external.dat$dob)
print(paste(youngest,"is the most recent date of birth"))
[1] "1998-01-20 is the most recent date of birth"
# ISO-formatted date strings (YYYY-MM-DD) sort chronologically, so they can be compared as text
external.dat$AGECAT <- ifelse(external.dat$dob <= "1924-12-31", "Greatest Gen",
                       ifelse(external.dat$dob <= "1945-12-31", "Silent Gen",
                       ifelse(external.dat$dob <= "1964-12-31", "Baby Boomer",
                       ifelse(external.dat$dob <= "1979-12-31", "Gen X",
                       ifelse(external.dat$dob <= "1994-12-31", "Millennials",
                       ifelse(external.dat$dob <= "2012-12-31", "Gen Z",
                       ifelse(external.dat$dob <= "2025-12-31", "Gen Alpha", NA_character_)))))))
Remove age & dob
external.dat <- external.dat[, c(19, 1:18)] #reassign new column as first column
external.dat <- external.dat[,-c(5,6)]
#length(external.dat) #check data frame dimension is correct
take3 <- print(head(external.dat[1:3, 1:10]))
RECODE VALUES OF A VARIABLE
REPLACE DATETIME COLUMNS
Create Jail Days Column
# Bin jail time into day categories (thresholds in hours; category 4 is assumed to mean more than 72 hours)
external.dat$jail_days <- ifelse(external.dat$jail_hours < 24, 1,
                          ifelse(external.dat$jail_hours < 48, 2,
                          ifelse(external.dat$jail_hours <= 72, 3, 4)))
external.dat <- external.dat[, c(19, 1:18)]  # move jail_days to the first column
tail_ <- tail(external.dat[, 1:6], 3)  # last three rows
head_ <- head(external.dat[, 1:6], 3)  # first three rows
print(tail_)
print(head_)
Bibliography
This assignment is about replicating the COMPAS software, which uses an algorithm to assess potential recidivism risk. It was written in R-4.2.2 for Windows.
R is 'GNU S', a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques, such as linear and nonlinear modelling, statistical tests, time series analysis, classification, and clustering; in our case, it is used to model the risk assessment software.
COMPAS - Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) is a case management and decision support tool developed and owned by Northpointe (now Equivant) and used by U.S. courts to assess the likelihood of a defendant becoming a recidivist.
For an in-depth understanding of the methodology used in COMPAS risk assessment, I recommend the Practitioner-s-Guide-to-COMPAS-Core handbook.
---
title: 'Data Confidentiality '
author: "by Ndubuisi Chibuogwu"
output:
  html_notebook:
    toc: yes
    toc_depth: 4
    number_sections: yes
  pdf_document:
    toc: yes
    toc_depth: '4'
    number_sections: yes
  html_document:
    
    toc: yes
    toc_depth: '4'
    df_print: paged
bibliography: references.bib
---

# Installing Packages

## COMPAS
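The CRAN package `compas` is described as "Conformational Manipulations of Protein Atomic Structures". It is a protein-structure package, unrelated to the COMPAS risk assessment tool analysed here, so `install.packages("compas")` is not required. \
For background on the tool itself, see [COMPAS](https://en.wikipedia.org/wiki/COMPAS_(software)).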

 
`library("readr")` \
`library("tidyverse")` \
`library("dplyr")` \
`library("magrittr")` \
`library("lubridate")`



## readr

`install.packages("readr")` \
`library("readr")`

<img align="right" width="50" height="50" src="https://readr.tidyverse.org/logo.png">

The goal of readr is to provide a fast and friendly way to read rectangular data from delimited files, such as comma-separated values (CSV) and tab-separated values (TSV). It is designed to parse many types of data found in the wild, while providing an informative problem report when parsing leads to unexpected results.

## tidyverse

`install.packages("tidyverse")` \
`library("tidyverse")`

<img align="right" width="50" height="50" src="https://www.tidyverse.org/images/hex-tidyverse.png">

The tidyverse is a collection of open source packages for the R programming language introduced by Hadley Wickham and his team that "share an underlying design philosophy, grammar, and data structures" of tidy data.


## dplyr

`install.packages("dplyr")` \
`library("dplyr")`

<img align="right" width="50" height="50" src="https://dplyr.tidyverse.org/logo.png">

One of the core packages of the tidyverse in the R programming language, dplyr is primarily a set of functions designed to enable data frame manipulation in an intuitive, user-friendly way.


## magrittr

`install.packages("magrittr")` \
`library("magrittr")`

<img align="right" width="50" height="50" src="https://magrittr.tidyverse.org/logo.png">

Provides a mechanism for chaining commands with a new forward-pipe operator, `%>%`. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions. For more information, see the package vignette. To quote René Magritte, "Ceci n'est pas un pipe."


## lubridate

`install.packages("lubridate")` \
`library("lubridate")`

<img align="right" width="50" height="50" src="https://lubridate.tidyverse.org/logo.png">

R commands for date-times are generally unintuitive and change depending on the type of date-time object being used. Moreover, the methods we use with date-times must be robust to time zones, leap days, daylight savings times, and other time related quirks, and R lacks these capabilities in some situations. Lubridate makes it easier to do the things R does with date-times and possible to do the things R does not.
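
The install and load calls listed above are written as text, so they are not run when the notebook is knit, while later chunks rely on `%>%`, `recode()`, and `ymd_hms()`. A minimal sketch of a setup step follows, assuming the four individual packages named above are the ones needed (tidyverse is left out of the list only to keep the sketch light). `missing_packages` is a hypothetical helper written for this example, not a function from any package.

```r
# Packages the later chunks appear to rely on (assumption based on the calls they make).
required <- c("readr", "dplyr", "magrittr", "lubridate")

# Hypothetical helper: return the subset of `pkgs` that is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}

# Attach only the packages that are available.
for (pkg in setdiff(required, to_install)) {
  library(pkg, character.only = TRUE)
}
```

To have this run at knit time, the same code could be placed in an `{r}` chunk near the top of the notebook.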



# File Import

## Read in .csv file

```{r}
recid.dat <- read.csv("compas.violent.csv")  # the file is expected in the working directory
```
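
As a hedged illustration of the import step, the sketch below writes a tiny stand-in file (the rows are made up, and only the `dob` column name is taken from the notebook), checks that the file exists, and asks `read.csv()` to parse `dob` as a `Date` through `colClasses`. The notebook itself keeps `dob` as text, so this is an optional variation rather than what the chunk above does.

```r
# Write a tiny stand-in file so the example is self-contained (made-up rows).
toy_path <- tempfile(fileext = ".csv")
writeLines(c("name,dob,age",
             "jane doe,1980-05-17,36",
             "john roe,1992-11-02,24"),
           toy_path)

# Guard against a wrong working directory or file name before reading.
stopifnot(file.exists(toy_path))

# Parse dob as a Date at import time; the other columns keep their guessed types.
toy_dat <- read.csv(toy_path, colClasses = c(dob = "Date"))
str(toy_dat)
```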


# Data Summary


```{r}
amount1 <- ncol(recid.dat)  # number of columns (same value as length(recid.dat))
print(paste("there are", amount1, "columns in the recid.dat file"))
amount2 <- nrow(recid.dat)  # number of rows
print(paste("there are", amount2, "rows in the recid.dat file"))
take0 <- recid.dat[1:3, 1:6]  # first three rows, first six columns
print(take0)
```
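
The summary chunk counts columns with a length-style call, which works because a data frame is a list of columns. A short sketch on a made-up data frame (`toy_df` is hypothetical) shows how the base R size functions relate:

```r
# Made-up data frame: 4 rows, 3 columns.
toy_df <- data.frame(id  = 1:4,
                     sex = c("F", "M", "M", "F"),
                     age = c(31, 45, 27, 52))

length(toy_df)  # number of columns, because a data frame is a list of columns
ncol(toy_df)    # number of columns, stated explicitly
nrow(toy_df)    # number of rows
dim(toy_df)     # rows first, then columns
```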


# Make a Copy of recid.dat 

```{r}
external.dat <- recid.dat
```


# Number of Distinct Values

```{r}
individuals <- length(unique(recid.dat$name))
print(paste('there are' , individuals,'unique individuals'))
```
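
When the number of unique names is smaller than the number of rows, some names occur more than once. That can mean one person recorded several times or different people sharing a name, which is one reason a separate identifier is useful. The sketch below, on made-up names, shows one way to count and list the repeats.

```r
# Made-up names; two of them occur more than once.
toy_names <- c("ann lee", "bob ray", "ann lee", "cy dunn", "bob ray", "ann lee")

n_unique  <- length(unique(toy_names))  # distinct names
n_repeats <- sum(duplicated(toy_names)) # rows whose name was already seen

# Names that occur more than once, with their counts.
repeated <- table(toy_names)
repeated <- repeated[repeated > 1]
print(repeated)
```

Here `n_unique + n_repeats` equals the number of rows, which is a quick consistency check.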


# Assigning Unique Identifier

```{r}
new_recid.dat <- cbind(recid.dat, 
                       DEFID = c(5000:9742))
new_recid.dat <- new_recid.dat[, c(21, 1:20)] # assigning DEFID to 1st column
print(head(new_recid.dat[1:3,1:6]))

external.dat <- cbind(external.dat, 
                      DEFID = new_recid.dat$DEFID)
external.dat <- external.dat[, c(21, 1:20)]  # assigning DEFID to 1st column
take1 <- print(head(external.dat[1:3,1:6]))
```
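
The chunk above hard-codes both the ID range and the column positions, which only works for a file with exactly the expected number of rows and columns. A minimal sketch of a size-independent variant follows, on a made-up data frame; the starting value 5000 is taken from the notebook, everything else is illustrative.

```r
# Made-up data frame standing in for recid.dat.
toy_df <- data.frame(name = c("ann lee", "bob ray", "cy dunn"),
                     age  = c(31, 45, 27))

first_id <- 5000  # starting value used in the notebook

# One consecutive ID per row, whatever the number of rows.
toy_df$DEFID <- first_id + seq_len(nrow(toy_df)) - 1

# Move DEFID to the first column by name rather than by position.
toy_df <- toy_df[, c("DEFID", setdiff(names(toy_df), "DEFID"))]

# The identifier should be unique.
stopifnot(!anyDuplicated(toy_df$DEFID))
print(toy_df)
```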



# REMOVE NAME RELATED COLUMNS

```{r}
external.dat <- external.dat[,-c(2,3,4)]
take2 <- print(head(external.dat[1:3, 1:6]))
```
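
Dropping columns by position depends on the column order staying fixed. A sketch of dropping them by name instead is shown below; the column names `name`, `first`, and `last` are assumptions about the file, and `toy_df` is made up.

```r
# Made-up data frame; the name-related column names are assumed.
toy_df <- data.frame(DEFID = 5000:5001,
                     name  = c("ann lee", "bob ray"),
                     first = c("ann", "bob"),
                     last  = c("lee", "ray"),
                     sex   = c("F", "M"))

name_cols <- c("name", "first", "last")

# Keep every column whose name is not in name_cols.
toy_df <- toy_df[, !(names(toy_df) %in% name_cols), drop = FALSE]
print(toy_df)
```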



# ADD AND DELETE COLUMN(S)

## Add AGECAT Variable 
```{r}
oldest <- min(external.dat$dob)
print(paste(oldest,"is the oldest date of birth"))
youngest <- max(external.dat$dob)
print(paste(youngest,"is the most recent date of birth"))


# ISO-formatted date strings (YYYY-MM-DD) sort chronologically, so they can be compared as text
external.dat$AGECAT <- ifelse(external.dat$dob <= "1924-12-31", "Greatest Gen",
                       ifelse(external.dat$dob <= "1945-12-31", "Silent Gen",
                       ifelse(external.dat$dob <= "1964-12-31", "Baby Boomer",
                       ifelse(external.dat$dob <= "1979-12-31", "Gen X",
                       ifelse(external.dat$dob <= "1994-12-31", "Millennials",
                       ifelse(external.dat$dob <= "2012-12-31", "Gen Z",
                       ifelse(external.dat$dob <= "2025-12-31", "Gen Alpha", NA_character_)))))))
```
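
The same binning can be sketched with `cut()`, which takes the cutoffs and labels as two vectors instead of a chain of nested calls. The cutoff dates are the ones used in the chunk above; treating each cutoff date as inclusive is an assumption, and the birth dates are made up apart from the two extremes reported earlier.

```r
# Example birth dates, including values on and just after a cutoff.
toy_dob <- as.Date(c("1932-09-24", "1945-12-31", "1946-01-01", "1998-01-20"))

# Upper bound of each generation, taken from the notebook's cutoffs.
cutoffs <- as.Date(c("1924-12-31", "1945-12-31", "1964-12-31", "1979-12-31",
                     "1994-12-31", "2012-12-31", "2025-12-31"))
gen_labels <- c("Greatest Gen", "Silent Gen", "Baby Boomer", "Gen X",
                "Millennials", "Gen Z", "Gen Alpha")

# right = TRUE makes each interval (previous cutoff, cutoff], so a cutoff date
# belongs to the earlier generation. Dates after the last cutoff become NA.
agecat <- cut(as.numeric(toy_dob),
              breaks = c(-Inf, as.numeric(cutoffs)),
              labels = gen_labels,
              right  = TRUE)
agecat <- as.character(agecat)
print(data.frame(dob = toy_dob, AGECAT = agecat))
```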

## Remove age & dob

```{r}
external.dat <- external.dat[, c(19, 1:18)] #reassign new column as first column
external.dat <- external.dat[,-c(5,6)]
#length(external.dat) #check data frame dimension is correct
take3 <- print(head(external.dat[1:3, 1:10]))
```



# RECODE VALUES OF A VARIABLE 

```{r recode-charge-degree}
external.dat <- external.dat %>%
  mutate(c_charge_degree = recode(c_charge_degree, M = "Misdemeanor", F = "Felony"))

take4 <- print(head(external.dat[1:3, 1:7]))  # show the recoded charge-degree values

```
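
For readers who prefer to avoid a package dependency for this step, a named lookup vector gives a similar result in base R. This is a sketch on made-up codes; the labels are illustrative, and the value `"X"` is included only to show what happens to a code that has no label.

```r
# Made-up charge-degree codes, including one unexpected value.
toy_degree <- c("F", "M", "M", "F", "X")

# Lookup vector: names are the codes, values are the labels.
charge_labels <- c(M = "Misdemeanor", F = "Felony")

# Index the lookup vector by code; codes without a label come back as NA.
recoded <- unname(charge_labels[toy_degree])

# Keep the original code where no label was found.
unmatched <- is.na(recoded)
recoded[unmatched] <- toy_degree[unmatched]
print(recoded)
```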


# REPLACE DATETIME COLUMNS

## Transform Date to Date Object

```{r}
external.dat$c_jail_out <- ymd_hms(external.dat$c_jail_out)  # convert character to date-time
external.dat$c_jail_in  <- ymd_hms(external.dat$c_jail_in)   # convert character to date-time
# hour() only extracts the hour-of-day component of a timestamp, so the time
# spent in jail is computed as the difference between release and booking.
jail_hours <- as.numeric(difftime(external.dat$c_jail_out, external.dat$c_jail_in, units = "hours"))

length(external.dat)
external.dat <- cbind(external.dat, jail_hours = jail_hours) #Add new column to external.dat
external.dat <- external.dat[, c(18, 1:17)]               #assign new column to first column
take5 <- print(head(external.dat[1:3, 1:6]))
```
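
An hour-of-day value (what `hour()` returns) and an elapsed duration are different quantities. The sketch below uses base R date-times and made-up booking and release timestamps to show the difference.

```r
# Made-up booking and release times, 36.5 hours apart.
jail_in  <- as.POSIXct("2013-08-13 06:00:00", tz = "UTC")
jail_out <- as.POSIXct("2013-08-14 18:30:00", tz = "UTC")

# Hour-of-day components (6 and 18); their sum says nothing about the stay.
hour_of_day_sum <- as.POSIXlt(jail_in)$hour + as.POSIXlt(jail_out)$hour

# Elapsed time between the two timestamps, in hours.
elapsed_hours <- as.numeric(difftime(jail_out, jail_in, units = "hours"))

print(c(hour_of_day_sum = hour_of_day_sum, elapsed_hours = elapsed_hours))
```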


## Create Jail Days Column

```{r}
# Bin jail time into day categories (thresholds in hours; category 4 is assumed to mean more than 72 hours)
external.dat$jail_days <- ifelse(external.dat$jail_hours < 24, 1,
                          ifelse(external.dat$jail_hours < 48, 2,
                          ifelse(external.dat$jail_hours <= 72, 3, 4)))

external.dat <- external.dat[, c(19, 1:18)]  # move jail_days to the first column
tail_ <- tail(external.dat[, 1:6], 3)  # last three rows
head_ <- head(external.dat[, 1:6], 3)  # first three rows
print(tail_)
print(head_)
```
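
With nested `ifelse()` calls the order of the tests matters: each value is assigned by the first test it passes, so a broad test placed early captures values meant for later branches. One way to sidestep this is `cut()` with explicit break points. This is a sketch on made-up hours, assuming half-open bins of 24 hours in which exactly 72 hours falls into the top bin.

```r
# Made-up jail durations in hours, including values on the bin edges.
toy_hours <- c(2.5, 23.95, 24, 47.9, 60, 72, 200)

# right = FALSE gives bins [-Inf, 24), [24, 48), [48, 72), [72, Inf);
# labels = FALSE returns the bin number (1 to 4) instead of a factor.
jail_days <- cut(toy_hours,
                 breaks = c(-Inf, 24, 48, 72, Inf),
                 labels = FALSE,
                 right  = FALSE)

print(data.frame(jail_hours = toy_hours, jail_days = jail_days))
```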


# Bibliography

This assignment is about replicating the **COMPAS** software, which uses an algorithm to assess potential recidivism risk. It was written in [R-4.2.2 for Windows](https://cran.r-project.org/). \
R is 'GNU S', a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques, such as linear and nonlinear modelling, statistical tests, time series analysis, classification, and clustering; in our case, it is used to model the risk assessment software.

[COMPAS](https://en.wikipedia.org/wiki/COMPAS_(software)) - Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) is a case management and decision support tool developed and owned by Northpointe (now Equivant) and used by U.S. courts to assess the likelihood of a defendant becoming a recidivist. \
For an in-depth understanding of the methodology used in COMPAS risk assessment, I recommend the [Practitioner-s-Guide-to-COMPAS-Core](https://s3.documentcloud.org/documents/2840784/Practitioner-s-Guide-to-COMPAS-Core.pdf) handbook.