Recreating Literary Works Using ChatGPT and Evaluating Results with NLP Analysis

```r
# Setup chunk (echo=FALSE in the source): table styling and the R-to-Python bridge
library(kableExtra)
library(reticulate)
use_python('/Users/thomasmatthews/opt/anaconda3/bin/python')
```

Introduction

This study is a deep dive into ChatGPT’s poetic proclivities. Can a cutting-edge chatbot re-create poetic styles effectively? Will the results feel authentic to the prompted style? Can they fool the leading AI-generated text detectors? This study explores these questions through Open-AI’s ChatGPT and its responses to prompts about a selection of poems. ChatGPT is a Large Language Model that can generate text in different styles for various uses, from answering questions about historical events to troubleshooting coding errors. It is distinctive in its design because it is trained with both Supervised Learning and Reinforcement Learning to improve its responses. A human feedback loop acts as moderation to reduce biased outputs and help prevent harmful responses (Kargar 2023).

Supervised Learning is a subset of Machine Learning in which models learn from labeled examples, that is, inputs paired with known correct outputs. Examples include a Regression Model that predicts a stock price or a Classification Model that predicts whether a text is AI-generated. Reinforcement Learning comes into play with ChatGPT through human feedback: people rank the model’s candidate responses, a reward model is trained on those rankings, and ChatGPT is then tuned to produce responses that the reward model scores highly, with feedback from people using the tool feeding further refinement (Kargar 2023). GPT stands for Generative Pre-trained Transformer, and ChatGPT is a large language model built on a very large neural network with billions of parameters. It generates text by predicting what is likely to come next, based on patterns learned from the enormous body of text it was trained on, and its responses are fluent enough that they can seem to have come from a human.

The transformer type of neural network is distinctive in how it treats text. Transformers were originally designed for translating text from one language to another. Before transformers, the leading neural networks for translation were Recurrent Neural Networks (RNNs), which process a sentence one word at a time in the order the words appear (Markowitz 2021). Word order is a challenge when translating from English to French, for example, because French nouns have grammatical gender that other words must agree with, and French adjectives usually come after the nouns they describe.

Transformers navigate this challenge with two ideas. The first is positional encoding: each word in the text is given a number recording its position in the sentence, so word order is stored in the data itself before it is loaded into the network. The second is attention, specifically self-attention, which lets the network weigh how relevant every other word in the sentence is when it processes a given word (Markowitz 2021). From these, the transformer learns how words relate and in what order they go, and the more text it is trained on, the more accurate those predictions become. This high accuracy in predicting the correct sequences of words, based on what the model learned in training, gives ChatGPT’s responses a conversationally authentic feel.
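To make positional encoding and self-attention concrete, here is a minimal NumPy sketch. It uses the sinusoidal positional encoding and scaled dot-product attention of the original transformer architecture; the toy sentence, random embeddings, and random weight matrices are illustrative assumptions, not a description of ChatGPT’s actual implementation.

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal encoding: even dimensions use sine, odd dimensions use cosine."""
    positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                      # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                        # (seq_len, d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])
    encoding[:, 1::2] = np.cos(angles[:, 1::2])
    return encoding

def softmax(x: np.ndarray) -> np.ndarray:
    shifted = x - x.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # relevance of every word to every other word
    weights = softmax(scores)                  # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
tokens = ["the", "black", "cat"]               # hypothetical toy sentence
d_model = 8
embeddings = rng.normal(size=(len(tokens), d_model))         # stand-in for learned embeddings
x = embeddings + positional_encoding(len(tokens), d_model)   # word order is now part of the data
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
output, weights = self_attention(x, w_q, w_k, w_v)
print(weights.round(2))
```

Row i of `weights` shows how much word i draws on each word in the sentence, regardless of how far apart they are.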

Six poems are prompted and recreated using ChatGPT, and each poem is also rewritten in the styles of the other five poets. This study uses Supervised Learning Classification techniques and is also a Natural Language Processing (NLP) exercise in which original works are compared to ChatGPT recreations in various poetic styles. Sentiment analysis compares each poem’s original sentiment to that of ChatGPT’s recreations. Lexical measures explore vocabulary richness and diversity, and this study covers which measures were selected as the most significant for its scope. Finally, two Artificial Intelligence (AI) detectors are evaluated on whether they recognize the AI-generated poems as AI-generated, which also shows how convincing the recreations are.

Video for Thesis Defense

link to Thesis Defense

Slides for Presentation

link to Defense Slides

Data Collection and Preparation

Poems were sourced from the following sites:

1. “The Raven” by Edgar Allan Poe - (Poe, n.d.)

2. “Jabberwocky” by Lewis Carroll - (Carroll, n.d.)

3. “Sonnet 18: Shall I compare thee to a summer’s day?” by William Shakespeare - (Shakespeare, n.d.)

4. “O Captain! My Captain!” by Walt Whitman - (Whitman, n.d.)

5. “Still I Rise” by Maya Angelou - (Angelou, n.d.)

6. “Do Not Go Gentle Into That Good Night” by Dylan Thomas - (Thomas, n.d.)

Each poem was loaded into Open-AI’s ChatGPT chatbot, and responses were collected for the following scenarios (n.d.):

1. Tell the original poem: the response becomes GPT’s version of the poem.

2. Recreate the same poem in another poet’s style, for example, “Recreate the Raven by Edgar Allen Poe, using Maya Angelou’s style.”

This creates seven versions of each poem: the original from the sources above, GPT’s version of the original, and five recreations of the original in GPT’s rendition of each of the other poets’ styles.

A row key is created for each permutation. For example, Maya_Angelou_Poets is “Still I Rise” by Maya Angelou, where the suffix Poets denotes the original version. Maya_Angelou_WW is “Still I Rise” recreated by ChatGPT in the style of Walt Whitman.

This forms the base structure for the data set: six poems with seven versions each yield forty-two rows of data. Each piece of analysis appends more columns to this data set to explore trends and provide deeper insights.
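The row-key scheme can be sketched in pandas as follows. Only the `Poets` suffix (original) and the `WW` suffix (Walt Whitman style) appear in the study; the `GPT` suffix for ChatGPT’s version of the original and the other poet abbreviations are hypothetical placeholders.

```python
import pandas as pd

poets = ["Edgar_Allen_Poe", "Maya_Angelou", "Walt_Whitman",
         "Lewis_Carroll", "William_Shakespeare", "Dylan_Thomas"]

# Style suffix per poet. Only "WW" comes from the study; the others are hypothetical.
style_suffix = {"Edgar_Allen_Poe": "EAP", "Maya_Angelou": "MA", "Walt_Whitman": "WW",
                "Lewis_Carroll": "LC", "William_Shakespeare": "WS", "Dylan_Thomas": "DT"}

rows = []
for poet in poets:
    # "Poets" = original text, "GPT" (hypothetical) = ChatGPT's version of the original,
    # followed by the five other poets' styles.
    versions = ["Poets", "GPT"] + [style_suffix[p] for p in poets if p != poet]
    for version in versions:
        rows.append({"key": f"{poet}_{version}", "poet": poet, "version": version})

df = pd.DataFrame(rows).set_index("key")
print(df.shape)  # six poems x seven versions
```

Later analyses then add one column per score to this frame, keyed by the same row index.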

Responses were obtained from ChatGPT on February 1st, 2023. The prompts and responses are linked below.

(link to ChatGPT Responses)

Python Packages

This study uses Python to run models, analyze results, and visualize trends, with the Pandas, NumPy, Seaborn, scikit-learn, and Plotly packages. The tools under study are constantly changing: ChatGPT continues to be refined through Reinforcement Learning from human feedback, and the AI detectors are updated as well, so repeating this analysis in the future may produce different results.

The packages below were used to generate features such as lexical measures and sentiment scores.

Version numbers used for this study and links to documentation for each:

Package	Version
LexicalRichness (Shen 2022)	0.4.1
pandas	1.5.3
textblob	0.16.0
numpy	1.24.0
matplotlib	3.7.0
seaborn	0.12.2
scikit-learn (sklearn)	1.2.1
plotly	5.13.1
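For anyone rerunning the analysis, a small check like the one below compares installed package versions against the versions listed for this study. It is a convenience sketch, not part of the original study; it uses only the standard-library `importlib.metadata` module and assumes the PyPI distribution names shown.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Versions reported for this study, keyed by (assumed) PyPI distribution name.
STUDY_VERSIONS = {
    "lexicalrichness": "0.4.1",
    "pandas": "1.5.3",
    "textblob": "0.16.0",
    "numpy": "1.24.0",
    "matplotlib": "3.7.0",
    "seaborn": "0.12.2",
    "scikit-learn": "1.2.1",
    "plotly": "5.13.1",
}

def compare_versions(expected: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Map each package to (expected, installed); installed is None if it is missing."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed)
    return report

for package, (wanted, installed) in compare_versions(STUDY_VERSIONS).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{package:16} study={wanted:8} installed={installed} ({status})")
```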

Differences in Originals and GPT

	words	terms	GPT_words	GPT_terms	words_var	terms_var
Edgar_Allen_Poe_Poets	1101	452	314	156	787	296
Maya_Angelou_Poets	240	132	252	133	12	1
Walt_Whitman_Poets	199	114	203	115	4	1
Lewis_Carroll_Poets	168	94	166	90	2	4
William_Shakespeare_Poets	116	84	117	83	1	1
Dylan_Thomas_Poets	168	98	168	98	0	0

The table above shows the differences between the original poems and ChatGPT’s versions of them; words_var and terms_var are the absolute differences in word and unique-term counts. The table is sorted by words_var in descending order, so the poems whose word counts changed most from the original to the ChatGPT version are at the top. Edgar Allan Poe’s “The Raven” stands out the most, with 787 fewer words in the GPT version than in the original. Dylan Thomas’ “Do Not Go Gentle Into That Good Night” matched exactly in both word and term counts, which suggests that a user who did not know the original could prompt ChatGPT for the poem and receive it essentially verbatim. Users who take ChatGPT’s responses as a source of truth without verifying the source material for the other poems would get abridged or altered renditions of these classics and would miss the full content of the original works.
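The two `_var` columns can be reproduced from the counts in the table. The sketch below assumes they are absolute differences, which matches every row shown (for “The Raven”, 1101 − 314 = 787); the study’s own code is not shown, so this is a reconstruction.

```python
import pandas as pd

# Word and unique-term counts from the table above (original vs. ChatGPT's version).
counts = pd.DataFrame(
    {
        "words":     [1101, 240, 199, 168, 116, 168],
        "terms":     [452, 132, 114, 94, 84, 98],
        "GPT_words": [314, 252, 203, 166, 117, 168],
        "GPT_terms": [156, 133, 115, 90, 83, 98],
    },
    index=["Edgar_Allen_Poe_Poets", "Maya_Angelou_Poets", "Walt_Whitman_Poets",
           "Lewis_Carroll_Poets", "William_Shakespeare_Poets", "Dylan_Thomas_Poets"],
)

# The *_var columns: absolute differences between the original and GPT counts.
counts["words_var"] = (counts["words"] - counts["GPT_words"]).abs()
counts["terms_var"] = (counts["terms"] - counts["GPT_terms"]).abs()
counts = counts.sort_values("words_var", ascending=False)
print(counts)
```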

Lexical Analysis

The results from ChatGPT’s responses are analyzed with the Python package LexicalRichness. This package offers multiple lexical diversity measures that score text using various methods, such as Type-Token Ratio (TTR), Measure of Textual Lexical Diversity (MTLD), vocd-D, and Summer’s, Dugast’s, Maas’s, and Herdan’s lexical diversity measures (Shen 2022). It also has standard text analysis tools, such as counts of words and unique terms, which are helpful when comparing original works to what ChatGPT presents as the original work. The original poems and ChatGPT results are combined into a data frame, and a column is added for each lexical score. This is done with the Pandas package in Python because the format provides an efficient base for later visualizations and models.

Lexical diversity scores benchmark the richness of the vocabulary in a text. For most of the scores, the higher the score, the more diverse the text. Type-Token Ratio (TTR) is the standard calculation for measuring lexical diversity: it divides the number of unique words (types) by the text’s total number of words (tokens). LexicalRichness offers several variations on this calculation to provide more scoring options. Maas, for example, subtracts the log of the number of terms from the log of the number of words and divides the result by the squared log of the number of words, producing an index-style variant of TTR. For Maas, a lower score indicates greater lexical diversity.
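The count-based measures described above fit in a few lines. This sketch uses the standard definitions with natural logarithms and a simple lowercase tokenizer, both assumptions; LexicalRichness applies its own preprocessing and log base, so its values can differ (Summer’s and Maas’s scores depend on the log base).

```python
from __future__ import annotations

import math
import re

def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters and apostrophes (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def lexical_scores(text: str) -> dict[str, float]:
    tokens = tokenize(text)
    n = len(tokens)        # N: total words (tokens)
    v = len(set(tokens))   # V: unique words (types / terms)
    log_n, log_v = math.log(n), math.log(v)
    return {
        "words": n,
        "terms": v,
        "ttr": v / n,                                  # higher = more diverse
        "herdan": log_v / log_n,
        "summer": math.log(log_v) / math.log(log_n),
        "maas": (log_n - log_v) / log_n ** 2,          # lower = more diverse
    }

print(lexical_scores("While I nodded, nearly napping, suddenly there came a tapping, "
                     "As of some one gently rapping, rapping at my chamber door."))
```

The functions assume at least two distinct words; a one-word text makes the logarithms in Summer’s measure undefined.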

The lexical analysis for this study draws on an extensive collection of lexical features. Having many methods to evaluate vocabulary richness and diversity can make it challenging to compare one piece of text to another. Text length is also a factor: TTR tends to fall as a text gets longer, because common words keep repeating, whereas the Hypergeometric Distribution D (HD-D) and vocd-D do not require as much text to calculate scores (Lissón and Ballier 2018).
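HD-D handles length by asking, for each word type, how likely it is to appear at least once in a random sample of a fixed number of tokens (42 by convention) drawn without replacement, which follows a hypergeometric distribution. The sketch below follows that definition; it is an independent illustration, not LexicalRichness’s code.

```python
from __future__ import annotations

from collections import Counter
from math import comb

def hdd(tokens: list[str], draws: int = 42) -> float:
    """HD-D: expected type-token ratio of a random sample of `draws` tokens."""
    n = len(tokens)
    if n < draws:
        raise ValueError(f"need at least {draws} tokens, got {n}")
    total_samples = comb(n, draws)
    score = 0.0
    for freq in Counter(tokens).values():
        # Probability the type is absent from the sample (hypergeometric, zero successes).
        p_absent = comb(n - freq, draws) / total_samples
        # Each type contributes its chance of appearing, divided by the sample size.
        score += (1 - p_absent) / draws
    return score
```

Because every text is judged on samples of the same size, a long poem such as “The Raven” and a short one such as “Sonnet 18” are scored on equal footing.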

Heatmap

Selecting from the many features that the LexicalRichness package brought to this study’s dataset is a challenging task, so a correlation heatmap is used to better understand the relationships between words, terms, and the various metrics. A correlation matrix is created from a subset of the original data frame. Values range from negative one to one; strongly correlated pairs of features land at the ends of the scale, and weakly correlated pairs land in the middle, near zero (McDonald 2022). A value close to 1 means a positive correlation: as the value of one column increases, the value of the correlated column also increases. A value close to -1 means the opposite: as one column increases, the other decreases. The grid below shows that the features from msttr to vocd are highly correlated with one another. These relationships are complex and need further analysis, so principal component analysis is used to identify the two main features, and a clustering model is used to explore trends in other features.
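The correlation step can be sketched with pandas. To stay self-contained, the sketch uses only the word and term counts from the earlier table (six originals and six ChatGPT versions) plus three measures derived from them with natural logs, not the study’s full 42-row feature set; the 0.9 threshold is an arbitrary choice. Passing `features.corr()` to `seaborn.heatmap(..., vmin=-1, vmax=1, annot=True)` draws a heatmap like the one described here.

```python
from itertools import combinations

import numpy as np
import pandas as pd

def strong_pairs(df: pd.DataFrame, threshold: float = 0.9):
    """List (feature_a, feature_b, r) for pairs whose |Pearson r| is at least `threshold`."""
    corr = df.corr()
    pairs = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)
             if abs(corr.loc[a, b]) >= threshold]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)

# Word/term counts for the six originals and ChatGPT's six versions (earlier table).
words = [1101, 240, 199, 168, 116, 168, 314, 252, 203, 166, 117, 168]
terms = [452, 132, 114, 94, 84, 98, 156, 133, 115, 90, 83, 98]
features = pd.DataFrame({"words": words, "terms": terms})
log_n, log_v = np.log(features["words"]), np.log(features["terms"])
features["ttr"] = features["terms"] / features["words"]
features["maas"] = (log_n - log_v) / log_n ** 2
features["dugast"] = log_n ** 2 / (log_n - log_v)

print(features.corr().round(2))
for a, b, r in strong_pairs(features):
    print(f"{a:>6} ~ {b:<6} r = {r:+.2f}")
```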

PCA Analysis for Feature Selection

Principal Component Analysis (PCA) is a dimensionality-reduction technique that can also guide feature selection by showing which features matter most within a dataset. This quote by Serafeim Loukas sums up PCA well: “PCA is an orthogonal transformation of the data into a series of uncorrelated data living in the reduced PCA space that the first component explains the most variance in the data with each subsequent component explaining less” (Loukas 2023). One way to think of PCA: given a large dataset with numerical columns, PCA builds a few components, each a weighted combination of the original columns, that together explain the majority of the variance in the dataset. The weights, called loadings, show which columns contribute most to each component. This analysis helps enhance the findings from the correlation heatmap.

[[0.02730565 0.00346815 0.27618432 0.07531858 0.07531858 0.27102501
  0.30987239 0.31523984 0.31540353 0.3321793  0.33657807 0.33347875
  0.32372083 0.3219576 ]
 [0.46519634 0.47107241 0.26987254 0.45823909 0.45823909 0.13335679
  0.06002295 0.0539825  0.16429446 0.06543924 0.00467274 0.02812352
  0.0805703  0.07749603]]

The PCA was completed using the method described above, with the scikit-learn package in Python. The output above lists the absolute loadings of the fourteen features on each of the two components. The eleventh and twelfth columns (Dugast and Maas) have the largest loadings on the first component, and words and terms have the largest loadings on the second. Together, the two components explain 93.48% of the variation in the data.

Below are the explained variance ratios of the two components, which add up to the 93.48% mentioned above.

[0.61787872 0.31689756]
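A sketch of this workflow with scikit-learn follows. Reporting absolute loadings matches the output above (each row there has unit length); standardizing the features first is an assumption, chosen because PCA is sensitive to column scale and the study’s columns range from word counts in the hundreds to ratios below one. The demo data are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pca_loadings(features: pd.DataFrame, n_components: int = 2):
    """Standardize features, fit PCA, and return absolute loadings plus explained variance."""
    scaled = StandardScaler().fit_transform(features)   # mean 0, variance 1 per column
    pca = PCA(n_components=n_components).fit(scaled)
    loadings = pd.DataFrame(
        np.abs(pca.components_),                          # rows = components, columns = features
        columns=features.columns,
        index=[f"PC{i + 1}" for i in range(n_components)],
    )
    return loadings, pca.explained_variance_ratio_

# Hypothetical stand-in data: 42 rows with two groups of related columns.
rng = np.random.default_rng(42)
base = rng.normal(size=(42, 2))
demo = pd.DataFrame({
    "words": base[:, 0] + rng.normal(scale=0.1, size=42),
    "terms": base[:, 0] + rng.normal(scale=0.1, size=42),
    "maas": base[:, 1] + rng.normal(scale=0.1, size=42),
    "dugast": -base[:, 1] + rng.normal(scale=0.1, size=42),
})
loadings, ratios = pca_loadings(demo)
print(loadings.round(3))
print(ratios, ratios.sum())        # the study's equivalent sum is 0.9348
print(loadings.idxmax(axis=1))     # feature with the largest loading on each component
```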

Cluster Analysis

Cluster analysis is a technique within Machine Learning, specifically Unsupervised Learning, for finding trends within datasets. Unsupervised Learning is used on datasets with no labels, meaning the scientist running the model has no known outcomes against which to check whether a particular prediction is true or false. By contrast, a dataset of smokers in which each person is labeled as having or not having cancer is a prime candidate for Supervised Learning, with a model trying to predict whether a person in the dataset will develop cancer. Cluster analysis is a good way to explore whether relationships exist within a dataset when no labels are present. The LexicalRichness package does not document which metric is best for determining lexical diversity, so cluster analysis helps evaluate trends within the data.

These PCA results are unsurprising given the heatmap: Maas was at the far end of the correlation scale, close to negative one, and the other highly correlated values came close to Dugast’s metric but were slightly edged out by it. This relationship and trend were visualized with k-means cluster analysis. The Elbow Method is used to determine how many clusters exist in the data set for each set of selected features: a measure of within-cluster spread, typically the within-cluster sum of squares, is plotted against the number of clusters, and the chosen number is the point where the curve bends, like the elbow of a raised arm, after which adding clusters improves the fit only slightly. Every clustering in this analysis required three clusters, so only one plot of the preliminary Elbow Method analysis is shown.
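The Elbow Method can be sketched with scikit-learn’s `KMeans`, whose `inertia_` attribute is the within-cluster sum of squares. The three-group data set below is synthetic, built so the elbow falls at k = 3; in practice the elbow is read from a plot of inertia against k, as in this study.

```python
from __future__ import annotations

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def elbow_inertias(x: np.ndarray, max_k: int = 6) -> list[float]:
    """Within-cluster sum of squares (KMeans inertia_) for k = 1..max_k."""
    inertias = []
    for k in range(1, max_k + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(x)
        inertias.append(model.inertia_)
    return inertias

# Synthetic stand-in for two selected features, with three well-separated groups.
x, _ = make_blobs(n_samples=150, centers=[[0, 0], [10, 0], [0, 10]],
                  cluster_std=0.5, random_state=0)
for k, inertia in enumerate(elbow_inertias(x), start=1):
    print(k, round(inertia, 1))
```

Plotting these inertias against k shows a steep drop up to k = 3 and a nearly flat line afterwards, which is the elbow.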

The cluster plot for the two features identified by the first principal component is straightforward: there is a clear negative correlation between Maas and Dugast, so as a Dugast score increases, the Maas score decreases. The reddish-pink cluster at the bottom right of the plot consists mainly of William Shakespeare’s poem and its recreations. The yellow cluster in the top left corner contains Dylan Thomas’ and Lewis Carroll’s recreations, which is an intriguing result. Dugast’s score squares the log of the number of tokens, N, and divides it by the log of N minus the log of V, the number of types. Maas’ calculation uses the same terms inverted, (log N − log V) / (log N)², so it is the reciprocal of Dugast’s score, and the two must move in opposite directions. Each lexical diversity measure is a variation or formulaic take on a previous method, so it is unsurprising that so many features showed extremes in the heatmap.
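Written side by side, with N tokens and V types, the two definitions show why the Maas and Dugast plot is so tight: one is the reciprocal of the other (for V < N, and with the same log base in both). A tangent-line approximation around a typical Maas value m₀ then shows why that curve can look nearly linear when Maas varies over a modest range. This derivation is added for clarity and is not a calculation from the study.

```latex
\text{Dugast} = \frac{(\log N)^2}{\log N - \log V}, \qquad
\text{Maas} = \frac{\log N - \log V}{(\log N)^2}
\quad\Longrightarrow\quad
\text{Dugast} = \frac{1}{\text{Maas}}

% Tangent-line approximation of f(m) = 1/m around m_0 (slope -1/m_0^2 < 0):
\frac{1}{m} \approx \frac{1}{m_0} - \frac{m - m_0}{m_0^2} = \frac{2}{m_0} - \frac{m}{m_0^2}
```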

Exploratory Clusters

Because many of these features are correlated, other combinations were also explored: vocd and words, Herdan and mtld, hdd and Maas, and mtld and Maas. These clusters offer further exploratory analysis of which other metrics help explain trends in this dataset. A critical callout is that none of the other combinations came as close to a clear linear relationship as the PCA-selected features did. In the vocd and words plot, the original Edgar Allan Poe poem is such an extreme outlier that it forms a cluster by itself. The other three plots have more scattered values; a line of best fit could be drawn, but the clustering plot based on the PCA has the tightest grouping of values, indicating a clear negative correlation: as the Maas score decreases, the Dugast score increases.

Sentiment Analysis

Sentiment Analysis indicates whether text is positive, negative, or neutral, expressed as categories and scores. This study uses the TextBlob package in Python. “TextBlob is built on top of NLTK and Pattern also, it very easy to use and can process text in a few lines of code” (Bandgar 2021). The Natural Language Toolkit (NLTK) is a Python platform with many lexical resources. It can tokenize text, which is an essential part of any NLP process because it lets users look for keywords and apply their own lists or dictionaries of keywords they want to track and analyze.

TextBlob reports sentiment with two scores, polarity and subjectivity. Polarity is a float value from -1 to 1, where -1 is negative sentiment, 1 is positive sentiment, and 0 is neutral. In other NLP exercises, these scores can change if a custom dictionary categorizes specific words as positive, negative, or neutral. Subjectivity is a float value from 0 to 1. The closer a subjectivity score is to 0, the more objective and factual the statement; a score near 1 indicates personal views or beliefs and a more subjective statement (Bandgar 2021).

Together, these two scores give context to a sentiment score. For the poems in this study, does ChatGPT tend to make poems more positive than the originals? Are its versions more subjective or more objective? The polarity and subjectivity scores for each poem were added to the data frame. NLP is a large field within Data Science, and this study uses a subset of its techniques to evaluate AI-generated text. The goal is to score every version of each poem and compare them. The plots below let a user filter by the legend, and each data point has a hover function that shows more detail.

Sentiment Scatterplots

Here is a view by GPT Style Re-creations:

The analysis below is based on the Sentiment Analysis by Poem plot.

Here is the link to ChatGPT chat prompts and responses (link to ChatGPT Responses)

“The Raven” shows an interesting trend: it ranks very high on subjectivity but is almost neutral on polarity. The high subjectivity makes sense, since the original poem contains much personal sentiment, but its polarity should arguably be more negative than neutral. The recreation in Maya Angelou’s style came close in polarity but diverged in subjectivity. The William Shakespeare and Lewis Carroll recreations came out overly positive compared to the other styles.

Sonnet 18 is the first plot in which GPT’s version and the original overlap on sentiment scoring. This aligns with the earlier comparison of the originals and GPT’s versions: the word and term counts differed by only one each, so it makes sense that the subjectivity and polarity scores match. The plots use Plotly and can be filtered in the Python environment to see the overlap. An interesting trend with this poem is that the Dylan Thomas and William Shakespeare points are grouped together, showing positivity and opinion. In contrast, the Lewis Carroll and Maya Angelou recreations are positive but come across as more objective than subjective.

The analysis of “Jabberwocky” shows intriguing trends. The original and GPT versions are objective and neutral in sentiment, whereas the Walt Whitman, Maya Angelou, Dylan Thomas, and William Shakespeare renditions are more subjective and positive. The Edgar Allan Poe version of “Jabberwocky” was more negative and subjective than the original, which is expected given the poetic signature of his style.

Walt Whitman’s “O Captain! My Captain!” followed a similar trend, with the Edgar Allan Poe version more subjective and negative in sentiment. The original poem shows evident passion and strong emotion, so it is surprising that it scored as relatively neutral and very objective. The other renditions follow the trend from the previous plot, producing more positive and opinionated versions of the original.

Dylan Thomas’ “Do Not Go Gentle Into That Good Night” showed an interesting trend: every other version had a more positive and opinionated sentiment than the original, and the Edgar Allan Poe version was the most positive and subjective. This poem captures a great deal of emotion and covers rich themes of maturation and losing a parent. This analysis is very high level and only scratches the surface of a poem’s true meaning; it shows that poetic themes and their impact are hard to capture with numerical scores alone.

Evaluating sentiment in AI-generated text is an interesting NLP topic because the text can be scored for feeling even though the machine that wrote it has none. Text that conveys emotion convincingly helps a machine pass as human, which is the idea behind the Turing test, but passing as human is not the same as being sentient. ChatGPT reassures users that it has no opinions or feelings, yet it can generate text showing a wide range of emotions. The TextBlob package provided a simple and effective way of scoring these feelings for each poem and recreation. For “Still I Rise,” GPT’s rendition of the original was close in words and terms and scored very close to the original. However, the William Shakespeare and Walt Whitman renditions took on a different sentiment signature: with higher subjectivity and polarity, they feel more subjective and positive, which does not align with the original.

AI-Text Classification Analysis

High Level AI-Detector Findings

Detecting AI-generated text is a complex and constantly evolving part of NLP. This analysis tests the effectiveness of two detectors: Open-AI’s classifier and GPTZero.

Open-AI Score	Count
Text too short	15
Very Unlikely AI-generated	11
Unlikely AI-generated	8
Unclear if it is AI-generated	8

The output of Open-AI’s classifier is used to evaluate that detector’s effectiveness. The table above shows that eleven of the forty-two texts were rated very unlikely to be AI-generated and eight unlikely. Eight were rated unclear, and fifteen were too short for the detector to assess; these included the versions of William Shakespeare’s “Sonnet 18” and Dylan Thomas’ “Do Not Go Gentle Into That Good Night.” No text was rated as likely AI-generated, even though thirty-six of the forty-two texts were written by ChatGPT.
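Because the detector outputs were gathered by hand, a quick tally is a cheap way to confirm that every one of the forty-two texts received exactly one label. The label list below is reconstructed from the counts in the table, not from the study’s raw data.

```python
import pandas as pd

# One Open-AI classifier label per text, reconstructed from the table's counts.
labels = pd.Series(
    ["Text too short"] * 15
    + ["Very Unlikely AI-generated"] * 11
    + ["Unlikely AI-generated"] * 8
    + ["Unclear if it is AI-generated"] * 8,
    name="Open-AI Score",
)
summary = labels.value_counts()
print(summary)
print("Total texts:", summary.sum())  # 6 originals + 36 ChatGPT-written versions
```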

The poems rated unclear are shown below, and the ones that stand out are “Jabberwocky” and “The Raven.” Lewis Carroll’s poem uses invented nonsense words to punctuate stanzas and add flavor to a dragon-slaying epic, so it is not too surprising that the detector was unsure about it. “The Raven” has a unique signature and is the longest poem in this study, which contributes to its complexity.

GPTZero Score	Count
Likely to be entirely written by human	41
Most likely written by a human with some sentences with low perplexities	1

GPTZero’s overall scoring for this study is less discerning than Open-AI’s detector. It classified forty-one texts as likely to be entirely written by a human and one as most likely written by a human but with some low-perplexity sentences. The irony is that the one text flagged this way was Maya Angelou’s original “Still I Rise.” In NLP terms, perplexity measures how well a language model predicts the words in a sample: the lower the score, the more predictable the text. Burstiness is another score GPTZero introduced; it looks for spikes and variation in perplexity across sentences. Human writers tend to vary their rhythm, whereas a bot conforms to patterns that make its text easier to predict.

As the subsequent plots show, this poem was the richest in average perplexity and burstiness. Five of the six original poems scored in the top ten for burstiness and perplexity, meaning the originals were not only correctly classified as human-written but also scored that way. The GPT version of Walt Whitman’s poem scored higher than the original, though still close to it. The recreations that scored lowest were versions of Maya Angelou’s “Still I Rise” and Edgar Allan Poe’s “The Raven.” The plots were created with the Python library Plotly and can be filtered dynamically.
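Perplexity has a standard definition: the exponential of the average negative log-probability a language model assigns to each token, so a model that always gives the actual next token a 1-in-4 chance has a perplexity of 4. The exact formulas GPTZero uses are not given in this study, so the burstiness function below is a hypothetical proxy (the spread of per-sentence perplexities) that only illustrates the idea, and the token probabilities are made up.

```python
from __future__ import annotations

import math
import statistics

def perplexity(token_probs: list[float]) -> float:
    """exp of the average negative log-probability the model gave each actual token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

def burstiness_proxy(sentence_probs: list[list[float]]) -> float:
    """Hypothetical proxy: population standard deviation of per-sentence perplexities."""
    return statistics.pstdev([perplexity(s) for s in sentence_probs])

# Made-up per-token probabilities from some language model.
print(perplexity([0.25] * 8))
steady = [[0.5, 0.5], [0.5, 0.5]]    # two equally predictable sentences
varied = [[0.5, 0.5], [0.05, 0.05]]  # one predictable sentence, one surprising one
print(burstiness_proxy(steady), burstiness_proxy(varied))
```

The varied pair scores higher on the proxy, mirroring the idea that human writing mixes predictable and surprising sentences.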

Bar Plot Showing Results from GPTZero

This bar chart ranks the GPTZero average perplexity and burstiness scores for the dataset in descending order.

This plot is insightful, showing GPTZero average perplexity and burstiness scores, but the original “Still I Rise” data point is an extreme outlier, which makes it hard to see the other scores and their overall impact within the study. Another cluster analysis was therefore run on a new dataset that excludes Maya Angelou’s original poem. Three clusters were chosen. In cluster 2, in yellow, the Walt Whitman and William Shakespeare texts trended high in average perplexity and burstiness, and the GPT version of Walt Whitman’s original scored highest in this model. Edgar Allan Poe and Lewis Carroll make up cluster 1, the reddish-pink cluster in the center of the plot. Dylan Thomas’ poem and most of its recreations fall into cluster 0, in blue on the lower left, and most of the GPT renditions are grouped there as well. This plot suggests that the original poets’ word choices, along with the variation in sentence and stanza length that perplexity and burstiness capture, are challenging for ChatGPT to reproduce.

Discussion

ChatGPT is an exciting development within AI, but it raises valid concerns in academia. How do professors know ChatGPT did not write a term paper? Edward Tian took on this challenge with his GPTZero site. His story is fascinating: he is a 22-year-old Computer Science student at Princeton with a minor in Journalism. He acknowledges that his detector is not yet foolproof and is working to improve it. Here is a quote from his NPR interview: “For so long, AI has been a black box where we really don’t know what is going on inside,” he said. “And with GPTZero, I wanted to start pushing back and fighting against that” (Bowman 2023).

This study explored three main areas within NLP: lexical diversity, sentiment analysis, and AI text detection. Two other areas were initially explored but later abandoned because they did not produce substantial results. The first was an iambic meter text classifier that predicts which century a text was written in; the plan was to run the original poems and their recreations through it to see whether the styles lined up with the centuries it identified. The text had to be ordered and arranged in a specific way for the classifier to run, which was a considerably manual process, and three-quarters of the results said 15th century, which was not accurate for any of the originals or recreations.

The second area explored but not pursued further was measuring the meter of each poem. A few Python packages can do this, but they could not classify Sonnet 18 correctly for this analysis. To troubleshoot, ChatGPT was asked for code to identify poetic meter. After multiple prompts and responses, it began to make up nonexistent Python libraries for the task. This is one of the more troubling aspects of ChatGPT: it can project false confidence on complex tasks, especially coding. It handles straightforward tasks well, but when its knowledge of an esoteric subject is weak, it doubles down on faulty information.

ChatGPT deserves credit for specific tasks: it was helpful for styling charts in Plotly and for getting the PCA analysis to run correctly. It did not have the context to select which chart or model would be most effective for this study’s analysis, but it offered enough suggestions to start deeper research. This issue has been noticed in the programming community, and Stack Overflow temporarily banned AI-generated answers to coding questions because the risk of misleading users was too high (Vincent 2022). ChatGPT can write impressively complex Structured Query Language (SQL) queries, but it does not know the underlying platform infrastructure well enough to produce the most optimized code. ChatGPT can be a helpful tool, but it cannot replace domain knowledge and experience.

Conclusion

Gathering clean data is essential to any scientific study, and for this exercise there were no missing data points, which made the analysis much more straightforward. The dataset was small, with only 42 rows and 16 columns, for 672 data points in total. The LexicalRichness and TextBlob packages processed the small dataset with ease. The more challenging part of data collection was running the poems and the various GPT recreations through the Open-AI and GPTZero classifiers and appending the results to the existing data frame.

The lexical analysis provided the framework for measuring and scoring text. When judging how rich a text is, there are many measures to pick from. Maas and Dugast were the most influential measures identified by the PCA for this study. In future iterations of this study, the measures that stand out could change depending on the texts selected. Unsupervised Learning, specifically cluster analysis, showed how closely related the features were, and the subsequent plots explored how other features trended within the dataset.

The classification analysis introduced four features to the dataset, all gathered manually. The original inspiration for the classification assessment in this study came from Michael Crichton’s Jurassic Park. In the book, dinosaurs are brought back to life and are slightly genetically modified in the process. Are these recreations dinosaurs, or are they genetically modified monsters? Likewise, are these poem recreations poems, or are they literary abominations of their originals? The honest answer is that some recreations are so bad that they are comical. Both AI detectors failed to identify the recreations as AI-generated, but GPTZero’s scores did provide a way to better group the recreations into subsets.

The variations found in the sentiment analysis show that ChatGPT can produce text in different styles that feels different from the original works. The AI can deviate and take on other styles and tones; it is not always accurate, and contextually it can be well off the mark, but it is an exciting feature of the chatbot. Students may prompt for an essay and then ask for the result to be recreated in Maya Angelou’s or Edgar Allan Poe’s style to try to trick AI detectors.

ChatGPT can be a valuable and helpful tool, but it needs to improve in a few areas as a knowledge source. For the poems analyzed in this study, it did not reproduce “The Raven” correctly, and most poems were slightly different from their sources, which could give an untrained eye false confidence. Generating lexically diverse and rich text is a challenge for AI and for ChatGPT: if the source poem is lexically rich, ChatGPT can create something close, but not something that feels organic. This reinforces the chatbot’s shortcoming of lacking substantial domain expertise.

References

n.d. https://chat.openai.com/.

Angelou, Maya. n.d. “Still I Rise by Maya Angelou.” Poetry Foundation. Poetry Foundation. https://www.poetryfoundation.org/poems/46446/still-i-rise.

Bandgar, Swapnil. 2021. “Sentiment Analysis Using TextBlob.” Medium. Analytics Vidhya. https://medium.com/analytics-vidhya/sentiment-analysis-using-textblob-ecaaf0373dff.

Bowman, Emma. 2023. “A College Student Created an App That Can Tell Whether AI Wrote an Essay.” NPR. NPR. https://www.npr.org/2023/01/09/1147549845/gptzero-ai-chatgpt-edward-tian-plagiarism.

Carroll, Lewis. n.d. “Jabberwocky by Lewis Carroll - Poems | Academy of American Poets.” Poets.org. Academy of American Poets. https://poets.org/poem/jabberwocky.

Kargar, Isaac. 2023. “Reinforcement Learning from Human Feedback, InstructGPT, and ChatGPT.” Medium. AIGuys. https://medium.com/aiguys/reinforcement-learning-from-human-feedback-instructgpt-and-chatgpt-693d00cb9c58.

Lissón, Paula, and Nicolas Ballier. 2018. “Investigating Lexical Progression Through Lexical Diversity ...” Discours. Revue de Linguistique, Psycholinguistique Et Informatique. A Journal of Linguistics, Psycholinguistics and Computational Linguistics. Presses universitaires de Caen. https://journals.openedition.org/discours/9950.

Loukas, Serafeim. 2023. “PCA Clearly Explained - How, When, Why to Use It and Feature Importance: A Guide in Python.” Medium. Towards Data Science. https://towardsdatascience.com/pca-clearly-explained-how-when-why-to-use-it-and-feature-importance-a-guide-in-python-7c274582c37e.

Markowitz, Dale. 2021. “Transformers, Explained: Understand the Model Behind GPT, BERT, and T5.” YouTube. YouTube. https://www.youtube.com/watch?v=SZorAJ4I-sA.

McDonald, Andy. 2022. “Seaborn Heatmap for Visualising Data Correlations.” Medium. Towards Data Science. https://towardsdatascience.com/seaborn-heatmap-for-visualising-data-correlations-66cbef09c1fe.

Poe, Edgar Allan. n.d. “The Raven by Edgar Allan Poe.” Poetry Foundation. Poetry Foundation. https://www.poetryfoundation.org/poems/48860/the-raven.

Shakespeare, William. n.d. “Sonnet 18: Shall I Compare Thee to a Summer’s...” Poetry Foundation. Poetry Foundation. https://www.poetryfoundation.org/poems/45087/sonnet-18-shall-i-compare-thee-to-a-summers-day.

Shen, Lucas. 2022. “LexicalRichness: A small module to compute textual lexical richness.” https://doi.org/10.5281/zenodo.6607007.

Thomas, Dylan. n.d. “Do Not Go Gentle into That Good Night by Dylan Thomas.” By Dylan Thomas - Famous Poems, Famous Poets. - All Poetry. https://allpoetry.com/Do-Not-Go-Gentle-Into-That-Good-Night.

Vincent, James. 2022. “AI-Generated Answers Temporarily Banned on Coding Q&A Site Stack Overflow.” The Verge. The Verge. https://www.theverge.com/2022/12/5/23493932/chatgpt-ai-generated-answers-temporarily-banned-stack-overflow-llms-dangers.

Whitman, Walt. n.d. “O Captain! My Captain! By Walt Whitman.” Poetry Foundation. Poetry Foundation. https://www.poetryfoundation.org/poems/45474/o-captain-my-captain.