Analyzing Tesla


Introduction

Financial markets are notoriously unpredictable. Institutions funnel billions into understanding the noise in the markets, as discovering even the smallest signal can give them an edge over others. Edges like these are usually developed by looking at both subjective and objective external data sources such as real-time news feeds, corporate earnings reports, and raw data unrelated to the financial sector. In this document, we will investigate whether a correlation exists between Tesla’s stock price and data points gathered from related Reddit comments, and determine whether any such correlation is both significant and predictive.

Here is a Link to the Reddit comment data used.


The Data



Reddit

The data used in this analysis was gathered from Pushshift.io, which is a public wrapper for the official Reddit API that allows mass querying and aggregation of data at a large scale. Pushshift, in this instance, will be used to gather all comments pertaining to Tesla, TSLA, and $TSLA ranging from January 1st, 2015 to December 31st, 2017. Data past this point is not included because of server and data issues on Pushshift’s end, leaving us with a total just shy of 750,000 comments. Below is a table that details information about the columns we will be working with, as well as a random sample of what a row might look like.

Aggregating the Reddit comments was a bit more involved. Data had to be gathered from Pushshift.io in batches of 1,000 comments at a time. Although it came from an API, the data was not provided in a structured and consistent format: rows were not consistent across requests, and some columns were missing entirely. Data outside the date range described above will also have to be excluded.

Column            Data Type  Description
body              character  The comment text.
utc_datetime_str  character  The datetime the comment was posted.
Year              integer    The year the comment was posted.
Month             integer    The month the comment was posted.
Day               integer    The day the comment was posted.
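
The cleanup described above can be sketched roughly as follows. This is a minimal illustration and not the code used for this analysis: the function name clean_comments is hypothetical, and the timestamp format "%Y-%m-%d %H:%M:%S" is an assumption about how utc_datetime_str is stored.

```python
from datetime import date, datetime

# Columns every usable record must carry; anything else in the raw row is dropped.
REQUIRED = ("body", "utc_datetime_str")

# Assumed timestamp layout of utc_datetime_str (not confirmed by the source data).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def clean_comments(raw_rows, start, end):
    """Keep well-formed comments posted within [start, end] and add date parts."""
    cleaned = []
    for row in raw_rows:
        # Skip rows where a required column is missing or empty.
        if any(not row.get(key) for key in REQUIRED):
            continue
        try:
            posted = datetime.strptime(row["utc_datetime_str"], TIMESTAMP_FORMAT)
        except ValueError:
            # Skip rows whose timestamp does not match the assumed format.
            continue
        if not (start <= posted.date() <= end):
            continue
        cleaned.append({
            "body": row["body"],
            "utc_datetime_str": row["utc_datetime_str"],
            "Year": posted.year,
            "Month": posted.month,
            "Day": posted.day,
        })
    return cleaned
```

Calling clean_comments(rows, date(2015, 1, 1), date(2017, 12, 31)) would return only the rows that carry both columns and fall inside the window, each extended with the Year, Month, and Day columns listed in the table.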


Tesla Stock Prices

We will also be using Tesla’s historical stock price, gathered from Yahoo Finance. Stock splits and other corporate actions have already been accounted for, so we can use this data as-is. This data will serve as the ‘dependent variable’ in this analysis, establishing a baseline for the analysis. Below is a table that details the columns we will be working with, as well as a random sample of the data that this analysis will use. Bolded column names indicate columns derived from the original data.

The source data provides only the date, prices, and volume for each trading day, so we still need to establish a daily average value. This will be done by creating a column titled average_price that represents the mean of the daily opening and adjusted closing prices. A column detailing the change created by corporate actions has also been included. Extraneous rows that do not fall within the desired date range will also need to be removed.

Column                   Data Type  Description
Date                     Date       Date of the stock price.
Open                     Numeric    Opening price of the stock.
High                     Numeric    Highest price of the stock.
Low                      Numeric    Lowest price of the stock.
Close                    Numeric    Closing price of the stock.
Adj.Close                Numeric    Adjusted closing price of the stock.
Volume                   Numeric    Volume of the stock.
Average Price            Numeric    Average price of the stock.
Corporate Action Change  Numeric    Change in price due to corporate actions.
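
As a rough sketch of how the two derived columns could be computed: average_price follows the definition given above (mean of the opening and adjusted closing prices), while treating the corporate action change as Close minus Adj.Close is our assumption, as is the name corporate_action_change.

```python
def add_derived_columns(rows):
    """Return copies of the daily price rows with the two derived columns added."""
    enriched_rows = []
    for row in rows:
        enriched = dict(row)  # copy so the input rows are left untouched
        # Mean of the daily opening and adjusted closing prices.
        enriched["average_price"] = (row["Open"] + row["Adj.Close"]) / 2
        # Assumed definition: gap between the raw and adjusted close.
        enriched["corporate_action_change"] = row["Close"] - row["Adj.Close"]
        enriched_rows.append(enriched)
    return enriched_rows
```

A row where Close and Adj.Close are equal yields a corporate_action_change of zero, which is the pattern we would expect to see if no corporate actions affected the price.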


Sources and Libraries


The Analysis


Abstract

We will be approaching this analysis in three parts. First, we will examine Tesla’s stock price and try to understand the larger picture while also attempting to identify any outliers or anomalies that may impact our later analysis. This will be followed by a deep dive into the data gathered from Reddit comments, where we will try to identify what is important and what is not. Afterwards, we will investigate both data sets in conjunction with one another to discover any correlation that may exist between them, while taking note of any discoveries that may have come up in the previous two parts.


Part 1: Tesla Stock Price


Price

Since we are analyzing a stock, it only makes sense to begin by examining Tesla’s stock price over time. This is what would normally appear if one were to quickly look up a company, and should give us a good look into how the company has changed as time has progressed.

We can see here that there is, on a macro scale, a positive trend over time, as indicated by the trend line. This is to be expected, as it falls in line with the general positive trend that the stock market and world economy experienced during this time period. Assuming that everything outside the scope of this analysis holds steady, we can expect the company to continue to grow on a macro scale. However, it is important to note that this trend cannot be used as an indicator of short-term price movements, as short-term movement cannot be predicted with any certainty from previous price data alone.


Volume

Volume is another important metric to examine when looking at a stock. It indicates the number of shares that change hands in a specified period of time. While it is not a direct indicator of price, it can be used as a factor in determining the strength of public interest surrounding the company. This is important to note, as it could later be used to relate the volume of Reddit comments to the stock’s volume. Below is a bar chart that shows the volume of Tesla stock over time.

Looking at this chart, we can see that there is a slow but steady increase in volume as time progresses. This most likely indicates that the company slowly grew in popularity over this time period, which was to be expected here. All monthly volume here fell within the range of 900 million to 3.3 billion, with an average hovering around 1.5 billion.


Volume: A Closer Look

We can also take a closer look at the volume by examining it on a daily basis. This will reveal any anomalies or outliers that may exist, which could indicate a significant company-related event that may have occurred.

Examining this chart, we can see that while most of the volume falls within a small margin, there are a few outliers that stand out. Most of these outliers can probably be attributed to quarterly earnings reports, which are usually significant indicators of where a company stands financially. There is one larger outlier that stands out in June 2016, where volume exceeded 300 million. External research reveals that this was most likely due to Tesla’s announcement of their plan to acquire SolarCity, which was a publicly-traded company that sold renewable energy systems. It is important to take note of this, as this is an outlier that would have no correlation to reddit comments, and would have to be excluded from any analysis that we perform later on.
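
The outliers above were identified by eye from the chart. One simple way to flag them programmatically, shown here only as a sketch, is to mark days whose volume sits more than k standard deviations above the mean. The function name and the default of k = 3 are assumptions, not part of the original analysis.

```python
from statistics import mean, pstdev


def flag_volume_outliers(daily_volumes, k=3.0):
    """Return the indices of days whose volume exceeds mean + k * std deviation."""
    if len(daily_volumes) < 2:
        return []
    # Population standard deviation of the whole series.
    threshold = mean(daily_volumes) + k * pstdev(daily_volumes)
    return [i for i, volume in enumerate(daily_volumes) if volume > threshold]
```

A large spike inflates the standard deviation itself, so with short series a lower k (or a median-based threshold) may be needed before the spike is flagged.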


Month-over-Month Change

Another metric that we can examine is the relative change in price from month to month. This will give us a better insight into month-to-month price movements by ‘re-balancing’ the base price, therefore limiting the effect that compounding growth may have had on the price.

A look at this chart reveals that there is a moderate amount of volatility depending on the month, with the average change being around 7.9%. March 2016 saw the largest increase in price, with a change topping 20%. Comparing this to the first chart leads us to believe that this represents a rebound from a large drop in price that occurred in the months prior.
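
For reference, the month-over-month change can be computed as the percentage difference between consecutive monthly prices. The sketch below assumes one representative price per month, and it assumes that the "average change" quoted above is the mean of the absolute changes; both function names are hypothetical.

```python
def month_over_month_change(monthly_prices):
    """Percentage change from each month to the next."""
    changes = []
    for previous, current in zip(monthly_prices, monthly_prices[1:]):
        if previous == 0:
            raise ValueError("cannot compute a relative change from a zero price")
        changes.append((current - previous) * 100.0 / previous)
    return changes


def mean_absolute_change(changes):
    """Average size of the monthly moves, ignoring their direction."""
    if not changes:
        raise ValueError("no changes to average")
    return sum(abs(change) for change in changes) / len(changes)
```

Because each change is expressed relative to the prior month’s price, a move from 100 to 110 and a move from 200 to 220 both count as 10%, which is the ‘re-balancing’ effect described above.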


Corporate Actions

Certain actions taken by corporations can have a direct impact on the price of their stock. Examples of such actions include dividends and stock splits. The graph below shows any difference between the close price and the adjusted close price for this time period. This is important to note, as it could indicate the presence of any actions, whether one-off or recurring, that may have affected the price and in turn must be accounted for in the later analysis.

Examining the difference between the adjusted and unadjusted prices reveals that there were no corporate actions taken that directly impacted the price of the stock during this time period. This indicates that we will not need to account for any corporate actions in the later analysis.


Part 2: Reddit Comments


Comments per Month

Reddit is a popular online social media platform that allows users to post ‘submissions’ that represent either their opinion on something or a reference to something outside. These submissions can then be commented on by other users, giving them an opportunity to voice their own opinion on the parent post. We first want to examine the volume of comments over time so that we can begin to understand not only how discussion of Tesla has grown, but also how individual months compare with one another, which will ideally pave the way for a later analysis of both the sentiment and its possible relation with the stock.

Reviewing the chart above, it is apparent that the volume of comments has increased steadily over time. There is not much variability in the volume of comments, except for a few outliers. The lowest month was February 2015, when there were fewer than 2,000 comments. The highest was April 2016, when the volume of comments exceeded 30,000. External research reveals that this was most likely due to the unveiling of the Tesla Model 3, which was groundbreaking for the company as it was their first vehicle priced for the mass market. This outlier suggests that company events do in fact have an impact on the volume of public discussion.


Average Sentiment by Month

Sentiment is important to examine in the scope of this analysis, as it can give us an indication of how the public view surrounding Tesla changes as time progresses, and can be indicative of how specific events in specific months may change the public’s perception on a larger scale.

Notably, the average sentiment in every month after January 2015 was negative, though only by a small margin: there was not a single month in which the average sentiment was below -0.3, on a scale where the maximum and minimum possible sentiment values are 5 and -5. The average sentiment was therefore fairly neutral, sitting at -0.09 overall. It is also worth noting that this negativity was not constant, and has trended downward as time has progressed. This indicates that consumer sentiment around the brand progressively worsened over time, despite the company’s growth throughout this period.
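
To make the -5 to 5 scale concrete, here is a minimal sketch of lexicon-based scoring and monthly averaging. Everything in it is an assumption made for illustration: the tiny TOY_LEXICON and its scores are made up, scoring a comment as the mean of its matched word scores is only one possible choice, and the function names are hypothetical. The actual lexicon and aggregation used for the chart are not shown in this document.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical word scores on a -5 to 5 scale; a real lexicon has thousands of entries.
TOY_LEXICON = {"love": 3, "great": 3, "good": 3, "bad": -3, "terrible": -3, "fraud": -4}


def comment_score(text, lexicon=TOY_LEXICON):
    """Mean score of the lexicon words found in a comment (0.0 if none match)."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[word] for word in words if word in lexicon]
    return mean(scores) if scores else 0.0


def monthly_average_sentiment(comments, lexicon=TOY_LEXICON):
    """Average comment score per (Year, Month), in chronological order."""
    buckets = defaultdict(list)
    for comment in comments:
        key = (comment["Year"], comment["Month"])
        buckets[key].append(comment_score(comment["body"], lexicon))
    return {key: mean(scores) for key, scores in sorted(buckets.items())}
```

Averaging over many comments pulls the monthly figure toward zero, which is consistent with monthly values that stay within a few tenths of neutral even though individual words can score as low as -5.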


Cumulative Sentiment

Cumulative sentiment will be used to examine potential trends in the public’s perception. This can be achieved by taking the average sentiment for each month, and progressively adding it to prior months. Ideally, this visualization will allow us to determine the lasting impact that both positive and negative public events in individual months may have on the grand scheme of things.

Examining how the cumulative sentiment progressed over the time period, we can see that it declined at a consistent rate for the first year and a half, before the rate of change began to increase. This indicates that the quantified public opinion of the company started off negative and held at that level for a while, before the rate of decline began to increase in June 2016.
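
The cumulative series described above is a running total of the monthly averages. A minimal sketch, with a hypothetical function name:

```python
from itertools import accumulate


def cumulative_sentiment(monthly_averages):
    """Running total of the monthly average sentiment, in chronological order."""
    return list(accumulate(monthly_averages))
```

With this construction, a steady negative monthly average produces a straight downward line, and a steeper slope means the monthly averages themselves have become more negative.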


Most Common Emotions

Examining sentiment purely on a numeric scale is not enough to grasp the full picture. We also want to examine the emotions expressed in the comments, as this can give us a better understanding by helping to eliminate instances where words may have been misinterpreted.

Despite the negativity of the average sentiment, we can see here that the most common emotion class found was ‘positive’. This chart partially contradicts the previous chart, as it indicates that the absolute public opinion surrounding Tesla is more positive than negative. The first two descriptive emotions were ‘trust’ and ‘anticipation’, which implies that the public believes in the company’s future and is excited for what is to come.


Most Common Emotion by Month

We can expand upon the idea of a most common emotion by comparing it month by month. This can help us determine whether specific highly public events may have had a lasting impact on the public’s perception of the company.

Most months were fairly consistent in terms of their most common emotions, with trust taking the lead in all but one month. There were a few exceptions where anger took the lead. The first anger instance was in May 2015, and this was a precursor to fear taking the lead, over joy, for the next 5 months. This took place around the time of the SolarCity acquisition, so these instances of ‘joy’ could be a lingering effect. The next two instances of anger were in March 2017 and September 2017, but neither of these was followed by anything noteworthy.


Part 3: Correlation


Sentiment and Price

A security’s price indicates what the public perceives it to be worth. We can pair this value with the sentiment, which represents the public’s opinion of the company, to see if the two appear to relate to one another. In this visualization, we will chart both the price and the sentiment, ensuring that we normalize the sentiment against the average sentiment of the time period.

On a grand scale, there does not appear to be a strong correlation between these two elements. There is a lot of noise, with the sentiment appearing to be more ‘reactive’ to major changes in the company’s value. An example of this is in June 2016, where a slight decrease in the company’s value was followed by a strong decrease in the mean-adjusted average sentiment, which lasted for the next month or so.
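
As an illustration of the mean adjustment mentioned above, the sketch below subtracts the period average from each sentiment value. It also includes a z-score helper, which is our own addition and not part of the original analysis, since putting price and sentiment on a comparable scale can make a shared chart easier to read. Both function names are hypothetical.

```python
from statistics import mean, pstdev


def mean_adjust(values):
    """Subtract the period average so the series is centered on zero."""
    average = mean(values)
    return [value - average for value in values]


def z_scores(values):
    """Center the series and scale it by its population standard deviation."""
    spread = pstdev(values)
    if spread == 0:
        raise ValueError("a constant series cannot be scaled")
    return [value / spread for value in mean_adjust(values)]
```

After mean adjustment, values above zero are months that were more positive than the period as a whole, and values below zero are months that were more negative.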


Sentiment and RSI

Relative Strength Index (RSI) is a momentum indicator that compares the size of a stock’s recent gains to its recent losses over a set lookback period, and exists on a scale of 0 to 100. This will hopefully be a good indicator to pair with sentiment taken from Reddit comments, as both of these elements can be used to decipher shifts in public opinion. We will attempt to normalize the sentiment to account for the average sentiment overall, to ensure that we are only looking at the variability from day to day.

Examining this chart, we can see that there is not much of a relationship between the two on a day-to-day level; it is mainly just noise. Looking at specific larger-scale events though, for example June 2016 and March 2017, we can see that periods of largely negative sentiment are preceded by a swift drop in RSI. This doesn’t appear to be the case when looking at periods of positive sentiment, however.
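
For readers unfamiliar with RSI, the following is a sketch of the common formulation that uses Wilder’s smoothing. The 14-day lookback is the conventional default and an assumption here, as the period used for the chart above is not stated; returning 50 for a completely flat window is also just one possible convention.

```python
def _to_rsi(average_gain, average_loss):
    """Convert smoothed average gain and loss into an RSI value between 0 and 100."""
    if average_loss == 0:
        # No losses in the window: maximum strength, or neutral if also no gains.
        return 100.0 if average_gain > 0 else 50.0
    relative_strength = average_gain / average_loss
    return 100.0 - 100.0 / (1.0 + relative_strength)


def rsi(closes, period=14):
    """RSI series using Wilder's smoothing; the first value covers the first `period` changes."""
    if period < 1 or len(closes) <= period:
        raise ValueError("need more closing prices than the lookback period")
    changes = [later - earlier for earlier, later in zip(closes, closes[1:])]
    gains = [max(change, 0.0) for change in changes]
    losses = [max(-change, 0.0) for change in changes]
    # Seed the averages with a simple mean over the first window.
    average_gain = sum(gains[:period]) / period
    average_loss = sum(losses[:period]) / period
    values = [_to_rsi(average_gain, average_loss)]
    # Each later day blends the new gain or loss into the running averages.
    for gain, loss in zip(gains[period:], losses[period:]):
        average_gain = (average_gain * (period - 1) + gain) / period
        average_loss = (average_loss * (period - 1) + loss) / period
        values.append(_to_rsi(average_gain, average_loss))
    return values
```

A run of uninterrupted gains drives the value to 100 and a run of uninterrupted losses drives it to 0, so a swift drop in RSI reflects recent losses starting to outweigh recent gains.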


Comment Volume and Stock Volume

Comment volume is a good indicator of the public’s current interest in Tesla, regardless of whether that interest is negatively or positively biased. We can pair this with the stock’s volume to see if there is a relationship between the two, which could then pave the way for a more accurate prediction of volatility in the stock’s price.

Comment volume and stock volume do appear to be correlated with one another, but not in a way that is predictive by any means. Spikes in stock volume are followed by spikes in comment volume, which most likely implies that they are both good indicators of public interest. These spikes are most likely the result of large public events relating to the company. The trend between the two is almost linear, indicating that while both are growing at a steady rate, comments are growing at a slightly faster rate.
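
The observation that comment spikes follow stock volume spikes could be checked numerically with a lagged Pearson correlation. The sketch below is only an illustration of that idea and was not part of the analysis above: the function names are hypothetical, and it assumes the two series are aligned on the same dates.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / (spread_x * spread_y)


def lagged_correlation(stock_volume, comment_volume, lag):
    """Correlate stock volume at time t with comment volume at time t + lag."""
    if lag < 0 or lag >= len(stock_volume):
        raise ValueError("lag must be between 0 and the series length minus 1")
    if lag == 0:
        return pearson(stock_volume, comment_volume)
    return pearson(stock_volume[:-lag], comment_volume[lag:])
```

If the correlation at a lag of one or two periods is noticeably higher than at a lag of zero, that would support the reading that comment volume reacts to trading activity rather than leading it.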