Lack of classification
Since there are several problems that have been identified, this data does not have negative and positive comment classification. This could become a problem for two reasons. Firstly, the sophisticated statistical approach cannot be in use to make prediction without this classification, which should have it for determining confidence interval. Secondly, as held in high regard of data integrity, one should not rely on different data to train for statisitcal purpose. That is, using the product-related data to train machine learning for interpreting the specific parts of text in political-related data could result in measurement error.
Created_at Variable
The created_at variable that should be date datatype does not behave as a such. Since the lobbyists are more likely to be interested in particular legislators who change their poltiical stance over time, this datatype is, thus, useful. The problem is that this variable does not behave like this datatype. Perhaps, this variable is encrypted, which allows authorized users only to access this kind of information. Before the hypotheses explain, the next slide will explain the data approach.