Group 3: Sudeshna Sarkar, Animesh Jain, Ashish Poojari, Santosh Neelapala
The Small Business Innovation Research (SBIR) program is a United States Government program, coordinated by the Small Business Administration, intended to help certain small businesses conduct research and development (R&D). The data is taken from the official SBIR website. Please refer to the link below:
The data can be downloaded from the historical award records, which include the title and abstract of each awarded project.
Here we work with a dataset covering the period from 1983 to 2022. We analyze the dataset with respect to the common problems addressed, the solutions provided, and the awards the companies received for those solutions. We also try to determine how these solutions have changed over the years. The analysis will also help us understand the various departments involved in creating these solutions and thereby earning the awards.
1. Quantitative Analysis: We will analyze the text data to create the following:
• Corpus
• Tokens: generate and regenerate tokens, removing unnecessary words.
• Create a document-feature matrix (DFM) and display the top features of the DFM.
• Select 3 relevant keywords for the data and perform keyword-in-context analysis (that is, quanteda::kwic()) for each of these keywords.
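The steps listed above can be illustrated with a minimal, standard-library Python sketch. It is not the quanteda implementation and does not use SBIR data: the three abstracts, the stopword list, and all function names (tokenize, remove_stopwords, document_feature_matrix, top_features, kwic) are hypothetical and chosen only to show what each step produces.

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for SBIR award abstracts (not real data).
abstracts = {
    "award_1": "A low-cost sensor for water quality monitoring.",
    "award_2": "Novel sensor fusion software for autonomous vehicles.",
    "award_3": "Software tools for monitoring battery health in vehicles.",
}

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "the", "for", "in", "of", "and", "to"}


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def document_feature_matrix(token_lists):
    """Map each document id to a Counter of feature (token) counts."""
    return {doc_id: Counter(toks) for doc_id, toks in token_lists.items()}


def top_features(dfm, n=5):
    """Sum counts over all documents and return the n most frequent features."""
    total = Counter()
    for counts in dfm.values():
        total.update(counts)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def kwic(token_lists, keyword, window=2):
    """Return (doc_id, left context, keyword, right context) for each match."""
    hits = []
    for doc_id, toks in token_lists.items():
        for i, tok in enumerate(toks):
            if tok == keyword:
                left = " ".join(toks[max(0, i - window):i])
                right = " ".join(toks[i + 1:i + 1 + window])
                hits.append((doc_id, left, tok, right))
    return hits


raw_tokens = {doc_id: tokenize(text) for doc_id, text in abstracts.items()}
clean_tokens = {doc_id: remove_stopwords(toks) for doc_id, toks in raw_tokens.items()}
dfm = document_feature_matrix(clean_tokens)

print(top_features(dfm, n=4))
for hit in kwic(raw_tokens, "sensor", window=2):
    print(hit)
```

In this sketch the keyword-in-context search runs on the tokens before stopword removal, so that the context windows read naturally; whether to search the raw or the cleaned tokens is a design choice for the actual analysis.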
2. Qualitative Analysis:
To understand the data distribution and the prevalence of keywords in the award abstracts, we will use the keyword-assisted topic modelling technique keyATM.
1. Models to implement on the data set.
• STM: fit a structural topic model (STM) with 5 topics.
• LDA: perform latent Dirichlet allocation (LDA) and find the best number of topics based on perplexity.
• Keyword-assisted topic model (keyATM).
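As a rough illustration of selecting the number of LDA topics by perplexity, the sketch below uses the general definition of perplexity as the exponential of the negative mean log-probability per held-out token, and picks the candidate with the lowest value. The function names and the per-token probabilities are hypothetical, made-up values that only demonstrate the selection step; in practice the log-probabilities would come from fitted models evaluated on held-out abstracts.

```python
import math


def perplexity(token_log_probs):
    """exp of the negative mean log-probability of the held-out tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def best_num_topics(perplexity_by_k):
    """Return the candidate k with the lowest held-out perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)


# Sanity check: a model that assigns probability 1/8 to every token
# has perplexity 8 (up to floating-point rounding).
uniform = [math.log(1 / 8)] * 20
print(round(perplexity(uniform), 6))

# Hypothetical held-out token probabilities for three candidate topic counts
# (made-up numbers, only to show the selection step).
held_out = {
    5: [math.log(p) for p in (0.10, 0.05, 0.20, 0.10)],
    10: [math.log(p) for p in (0.20, 0.10, 0.25, 0.20)],
    20: [math.log(p) for p in (0.15, 0.05, 0.25, 0.10)],
}
scores = {k: perplexity(lp) for k, lp in held_out.items()}
print(best_num_topics(scores))
```

Lower perplexity means the model is less "surprised" by unseen text, which is why the minimum is taken.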
2. Computation and visualization of the data:
• Measure interpretability by determining the coherence (consistency) and exclusivity (distinctiveness) of the topics.
• Plot summary and quality of the topics for the model.
• Department-wise award distribution using ggplot.
• Interactive topic visualization using LDAvis.
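To make the coherence measure mentioned above concrete, here is a minimal Python sketch of one common co-occurrence-based definition (often called UMass coherence): for a topic's top words, sum log((D(w_i, w_j) + 1) / D(w_j)) over all pairs where w_j is ranked above w_i, with D counting the documents that contain the word or word pair. This is an assumption about which definition is meant, not a description of any particular package's implementation; the function name and the toy documents are hypothetical. Exclusivity is not covered here.

```python
import math


def semantic_coherence(top_words, documents):
    """UMass-style coherence: sum of log((D(w_i, w_j) + 1) / D(w_j)) over pairs
    with w_j ranked above w_i. D counts documents containing the word(s)."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            denom = doc_freq(top_words[j])
            if denom == 0:
                raise ValueError(f"top word {top_words[j]!r} occurs in no document")
            score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / denom)
    return score


# Hypothetical tokenized documents (not SBIR data).
documents = [
    ["sensor", "water", "quality"],
    ["sensor", "water", "monitoring"],
    ["battery", "vehicle", "health"],
    ["battery", "vehicle", "software"],
]

# Words that co-occur in the same documents score higher than words that do not.
coherent = semantic_coherence(["sensor", "water"], documents)
mixed = semantic_coherence(["sensor", "battery"], documents)
print(coherent, mixed)
```

A topic whose top words tend to appear together in the same abstracts gets a higher (less negative) score, which matches the reading of coherence as consistency within a topic.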