Predictive Text Model and Exploratory Analysis

Introduction

This report presents a brief exploratory analysis of a sample corpus and outlines the initial steps toward building a predictive text algorithm and Shiny application. The goal is to demonstrate familiarity with the data and readiness to develop a scalable model.

Data Overview

The corpus consists of five English-language sentences related to sustainability and development. It was loaded and cleaned using basic text preprocessing techniques; the summary below reports the number of lines, the total word count, and the average number of words per line.

##   Lines Total_Words Avg_Words_Per_Line
## 1     5          27                5.4
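
The report does not show the code behind this summary, so the following is only a minimal base R sketch of how such a table might be produced. The three sentences are hypothetical placeholders, not the report's corpus, so the counts will differ from those above. The helper name `clean_text` and the specific cleaning rules (lowercasing, dropping non-letters, collapsing whitespace) are assumptions about what "basic text preprocessing" means here.

```r
# Placeholder sentences: hypothetical stand-ins, NOT the report's actual corpus.
corpus <- c(
  "Clean energy supports sustainable growth.",
  "Communities plan for long-term development!",
  "Recycling reduces waste, and saves resources."
)

# Assumed cleaning steps: lowercase, replace anything that is not a letter
# or a space with a space, collapse repeated spaces, trim the ends.
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z ]", " ", x)
  x <- gsub(" +", " ", x)
  trimws(x)
}

cleaned <- clean_text(corpus)

# Split each cleaned line into words and count them.
tokens <- strsplit(cleaned, " ", fixed = TRUE)
word_counts <- lengths(tokens)

# Same three columns as the summary shown in the report.
summary_tbl <- data.frame(
  Lines              = length(cleaned),
  Total_Words        = sum(word_counts),
  Avg_Words_Per_Line = mean(word_counts)
)
print(summary_tbl)
```

Note that replacing punctuation with a space splits hyphenated words such as "long-term" into two tokens; whether that is desirable depends on the corpus.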

Predictive Model Plan

The model uses n-grams (unigrams, bigrams, and trigrams) to predict the next word from the preceding context. To handle unseen combinations it includes a backoff strategy: when a longer context has not been observed, the model falls back to a shorter one. It is also optimized for memory and speed.
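
As an illustration of the idea, here is a minimal base R sketch of n-gram counting with a simple backoff from trigrams to bigrams to unigrams. It is not the report's implementation: the training sentences are hypothetical placeholders, and the names `count_ngrams` and `predict_next` are invented for this example. The backoff shown simply picks the most frequent continuation at the longest matching order, with no discounting.

```r
# Toy training text: a hypothetical placeholder, not the report's corpus.
train <- c(
  "we build a model",
  "we build a plan",
  "we test a model"
)

# Count all n-grams of order n, keyed as "context|next".
# For unigrams the context is empty, so keys look like "|word".
count_ngrams <- function(sentences, n) {
  grams <- unlist(lapply(strsplit(sentences, " ", fixed = TRUE), function(w) {
    if (length(w) < n) return(character(0))
    starts <- seq_len(length(w) - n + 1)
    vapply(starts, function(i) {
      context <- paste(w[i + seq_len(n - 1) - 1], collapse = " ")
      paste(context, w[i + n - 1], sep = "|")
    }, character(1))
  }))
  tab <- table(grams)
  setNames(as.vector(tab), names(tab))
}

# model[[n]] holds the counts for n-grams of order n.
model <- lapply(1:3, function(n) count_ngrams(train, n))

# Try the longest context first; back off to shorter contexts when the
# longer one was never seen. Ties go to the first key in table() order.
predict_next <- function(model, context_words) {
  for (n in 3:1) {
    k <- n - 1
    if (length(context_words) < k) next
    context <- paste(tail(context_words, k), collapse = " ")
    counts <- model[[n]]
    hits <- counts[startsWith(names(counts), paste0(context, "|"))]
    if (length(hits) > 0) {
      best <- names(hits)[which.max(hits)]
      return(sub("^.*\\|", "", best))
    }
  }
  NA_character_
}

predict_next(model, c("we", "build"))  # trigram context "we build" was seen
predict_next(model, c("plan", "a"))    # unseen trigram context, backs off to "a"
```

Storing counts in named vectors keeps the sketch short; a larger corpus would more likely call for a keyed data frame or hash-based lookup.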

Next steps:

Expand the corpus with real-world documents

Apply smoothing techniques

Integrate the model into a Shiny app for interactive predictions
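
The report does not say which smoothing technique is planned. As one possible starting point, the sketch below applies add-one (Laplace) smoothing to bigram probabilities in base R. The token stream is a hypothetical placeholder treated as a single sequence, and add-one is chosen here only because it is the simplest scheme to show.

```r
# Toy tokens: a hypothetical placeholder, not the report's corpus.
tokens <- c("we", "build", "a", "model", "we", "build", "a", "plan")

vocab <- unique(tokens)
V <- length(vocab)

# Bigram counts as a V x V table: rows = previous word, columns = next word.
# Fixing the factor levels keeps unseen pairs in the table with count 0.
prev_word <- factor(head(tokens, -1), levels = vocab)
next_word <- factor(tail(tokens, -1), levels = vocab)
bigram_counts <- table(prev_word, next_word)

# Add-one estimate:
#   P(next | prev) = (count(prev, next) + 1) / (count(prev) + V)
# where count(prev) is the number of bigrams that start with prev.
# Dividing by a length-V vector recycles down the columns, so each row i
# is divided by its own row total plus V.
smoothed <- (bigram_counts + 1) / (rowSums(bigram_counts) + V)

smoothed["a", "model"]  # seen bigram
smoothed["a", "we"]     # unseen bigram, still gets non-zero probability
```

Each row of `smoothed` sums to 1, so every previous word yields a proper distribution over the vocabulary. Add-one tends to give too much mass to unseen pairs when the vocabulary is large, which is why discounting schemes are often preferred once the corpus is expanded.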

Conclusion

This report confirms that the data has been successfully loaded and explored. The initial predictive model is functional, and the next phase will focus on scaling and deploying it via Shiny.

Natali Pérez
