I retrieved the majority opinions for the 32 Supreme Court cases that, according to Oyez.org, ruled on the Free Exercise Clause of the First Amendment. Oyez.org is a website that summarizes cases and sorts them into categories by topic. I then went to Westlaw to retrieve the full text of each majority opinion. I cleaned the corpus by removing stopwords, converting the text to lowercase, and removing numbers. I then ran an LDA model with 4 topics.
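The cleaning steps described above could look roughly like this in base R. This is a minimal sketch, not the code that was actually run: the two sample sentences, the tiny stopword list, and the helper name `clean_text` are placeholders, and both the order of the steps and the punctuation removal are assumptions.

```r
# Hypothetical stand-in for the opinion texts.
docs <- c(
  "The Court held in 1990 that the State may burden religious exercise.",
  "In 2012 the Court recognized a ministerial exception for the church."
)

# Tiny illustrative stopword list; a real run would use a fuller one.
stop_words <- c("the", "in", "that", "a", "for", "may")

clean_text <- function(x, stop_words) {
  x <- tolower(x)                 # convert to lowercase
  x <- gsub("[0-9]+", " ", x)     # remove numbers
  x <- gsub("[^a-z ]", " ", x)    # drop punctuation (assumption, not stated in the text)
  tokens <- strsplit(x, " +")     # split on runs of spaces
  vapply(tokens, function(tk) {
    # drop empty strings and stopwords, then rejoin
    tk <- tk[nzchar(tk) & !(tk %in% stop_words)]
    paste(tk, collapse = " ")
  }, character(1))
}

cleaned <- clean_text(docs, stop_words)
print(cleaned)
```

Lowercasing before the stopword filter means a lowercase stopword list is enough to catch capitalized forms such as "The".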
[1] "~/Documents/Text Project/Judicial"
“Tokens”
<itoken>
Inherits from: <CallbackIterator>
Public:
callback: function (x)
clone: function (deep = FALSE)
initialize: function (x, callback = identity)
is_complete: active binding
length: active binding
move_cursor: function ()
nextElem: function ()
x: GenericIterator, iterator, R6
Number of docs: 32
0 stopwords: ...
ngram_min = 1; ngram_max = 1
Vocabulary:
             term term_count doc_count
   1:    abdicate          1         1
   2:  abdication          1         1
   3:       abdul          1         1
   4:    abdullah          1         1
   5:  abhorrence          1         1
  ---
8635:       state        626        32
8636:      church        627        27
8637:       court        813        32
8638:          ct       1047        32
8639:   religious       1272        32
[1] 8639 3
[1] 448 3
“Iterated over tokens, built vocabulary, pruned vocabulary, created DTM.”
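The four steps in this note map onto text2vec's `itoken`, `create_vocabulary`, `prune_vocabulary`, and `create_dtm`. The sketch below runs them on a hypothetical four-document corpus. The pruning threshold (`term_count_min = 2`) is purely illustrative, because the settings that reduced the real vocabulary from 8639 to 448 terms are not shown in the output.

```r
library(text2vec)

# Hypothetical stand-in corpus; the real input was 32 cleaned opinions.
docs <- c(
  "court held state burden religious exercise",
  "court recognized ministerial exception church",
  "state court church property dispute religious",
  "religious exercise prison inmates court state"
)

# Iterator over tokens (unigrams, split on word boundaries).
it <- itoken(docs, preprocessor = tolower, tokenizer = word_tokenizer,
             progressbar = FALSE)

# Full vocabulary: one row per term, with term_count and doc_count.
vocab <- create_vocabulary(it)
print(dim(vocab))

# Prune rare terms; this threshold is an assumption for illustration only.
pruned <- prune_vocabulary(vocab, term_count_min = 2)
print(dim(pruned))

# Document-term matrix restricted to the pruned vocabulary.
dtm <- create_dtm(it, vocab_vectorizer(pruned))
print(dim(dtm))
```

Calling `dim()` on the vocabulary before and after pruning gives the same kind of `rows 3` pairs as the two `[1] ... 3` lines printed above.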
<WarpLDA>
Inherits from: <LDA>
Public:
clone: function (deep = FALSE)
components: active binding
fit_transform: function (x, n_iter = 1000, convergence_tol = 0.001, n_check_convergence = 10,
get_top_words: function (n = 10, topic_number = 1L:private$n_topics, lambda = 1)
initialize: function (n_topics = 10L, doc_topic_prior = 50/n_topics, topic_word_prior = 1/n_topics,
plot: function (lambda.step = 0.1, reorder.topics = FALSE, doc_len = private$doc_len,
topic_word_distribution: active binding
transform: function (x, n_iter = 1000, convergence_tol = 0.001, n_check_convergence = 10,
Private:
calc_pseudo_loglikelihood: function (ptr = private$ptr)
check_convert_input: function (x)
components_: NULL
doc_len: NULL
doc_topic_distribution: function ()
doc_topic_distribution_with_prior: function ()
doc_topic_matrix: NULL
doc_topic_prior: 0.1
fit_transform_internal: function (model_ptr, n_iter, convergence_tol, n_check_convergence,
get_c_all: function ()
get_c_all_local: function ()
get_doc_topic_matrix: function (prt, nr)
get_topic_word_count: function ()
init_model_dtm: function (x, ptr = private$ptr)
internal_matrix_formats: list
is_initialized: FALSE
n_iter_inference: 10
n_topics: 4
ptr: NULL
reset_c_local: function ()
run_iter_doc: function (update_topics = TRUE, ptr = private$ptr)
run_iter_word: function (update_topics = TRUE, ptr = private$ptr)
seeds: 1956722818.67766 1422772583.17494
set_c_all: function (x)
set_internal_matrix_formats: function (sparse = NULL, dense = NULL)
topic_word_distribution_with_prior: function ()
topic_word_prior: 0.01
transform_internal: function (x, n_iter = 1000, convergence_tol = 0.001, n_check_convergence = 10,
vocabulary: NULL
“LDA model”
INFO [17:09:11.183] early stopping at 100 iteration
INFO [17:09:11.248] early stopping at 50 iteration
“Fitted the model and used bar graphs to display the topic distribution for the first document, then for the entire dataset.”
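A fit of this kind can be sketched with text2vec's `LDA` class, using the hyperparameters visible in the printed model (4 topics, `doc_topic_prior = 0.1`, `topic_word_prior = 0.01`). Everything else here is an assumption: the corpus is the `movie_review` sample bundled with text2vec rather than the opinions, the pruning threshold is arbitrary, and averaging the document rows is only one possible reading of "the entire dataset".

```r
library(text2vec)

# Stand-in corpus bundled with text2vec (NOT the Free Exercise opinions).
data("movie_review")
it <- itoken(movie_review$review[1:200], preprocessor = tolower,
             tokenizer = word_tokenizer, progressbar = FALSE)
vocab <- prune_vocabulary(create_vocabulary(it), term_count_min = 5)  # illustrative threshold
dtm <- create_dtm(it, vocab_vectorizer(vocab))

# Hyperparameters copied from the printed model.
lda_model <- LDA$new(n_topics = 4, doc_topic_prior = 0.1,
                     topic_word_prior = 0.01)

# Returns a documents x topics matrix; fitting may stop early once
# the convergence tolerance is reached.
doc_topics <- lda_model$fit_transform(dtm, n_iter = 1000,
                                      convergence_tol = 0.001,
                                      n_check_convergence = 10)

# Topic distribution for the first document.
barplot(doc_topics[1, ], names.arg = seq_len(ncol(doc_topics)),
        xlab = "topic", ylab = "proportion", ylim = c(0, 1))

# Average topic distribution across all documents (assumed summary).
barplot(colMeans(doc_topics), names.arg = seq_len(ncol(doc_topics)),
        xlab = "topic", ylab = "mean proportion", ylim = c(0, 1))
```

Because the sampler is random, topic numbering and proportions will differ between runs unless a seed is set beforehand.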
      [,1]           [,2]             [,3]        [,4]
 [1,] "hhs"          "holy"           "ordinance" "perich"
 [2,] "rfra"         "diocese"        "prison"    "hastings"
 [3,] "corporations" "ecclesiastical" "animal"    "cls"
 [4,] "phillips"     "dionisije"      "sacrifice" "minister"
 [5,] "roy"          "foster"         "santeria"  "student"
 [6,] "trinity"      "css"            "beard"     "hosanna"
 [7,] "coverage"     "sales"          "inmates"   "tabor"
 [8,] "profit"       "illinois"       "rluipa"    "substances"
 [9,] "insurance"    "bishop"         "animals"   "forum"
[10,] "montana"      "disputes"       "prisoners" "ministerial"
“I extracted the top 10 words for each of the 4 topics.”
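The printed model exposes `get_top_words(n = 10, topic_number = ..., lambda = 1)`, so a table like the one above was presumably produced by a call along the lines of `lda_model$get_top_words(n = 10)`. With `lambda = 1`, words are ranked by their probability within each topic. The base R sketch below shows that ranking on a made-up topic-word matrix. The matrix values and the helper name `top_words` are hypothetical and do not come from the fitted model.

```r
# Hypothetical topic-word probabilities: 2 topics x 5 words, rows sum to 1.
topic_word <- matrix(
  c(0.40, 0.30, 0.15, 0.10, 0.05,
    0.05, 0.10, 0.15, 0.30, 0.40),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("court", "state", "church", "religious", "prison"))
)

# For each topic (row), return the n most probable words.
# The result has one column per topic, like the table above.
top_words <- function(topic_word, n = 3) {
  apply(topic_word, 1, function(p) {
    names(sort(p, decreasing = TRUE))[seq_len(n)]
  })
}

top <- top_words(topic_word, n = 3)
print(top)
```

Ranking by raw probability tends to surface words that are frequent everywhere. text2vec's `lambda` argument can be lowered below 1 to favor words that are more specific to a topic.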