Cover Sheet

By including this statement, we, the authors of this work, verify that:

Name                 Student Number   Contribution
Poonam Tammy Nair    15136653         100%

Assignment Three

Complete the following tasks using RStudio and record the results and your analysis in an R Markdown file.

Question One

Veteran or Novice Tweeters

A random sample of 10 users who have tweeted about #applepi showed that the users had an average tweet count of 154.3 tweets, with a standard deviation of 21.5 tweets. Compute the 95% confidence interval of the population mean tweet count of users who have tweeted about #applepi. We can assume that the users' tweet counts are Normally distributed. Solve the problem using R.

SampleSize = 10
SampleMean = 154.3
StandardDeviation = 21.5
ConfidenceCoefficient = 1.96  # z critical value for a two-sided 95% interval
StandardError = StandardDeviation/(sqrt(SampleSize))
MarginOfError = ConfidenceCoefficient * StandardError

print(paste("Confidence Coefficient: ", ConfidenceCoefficient))
## [1] "Confidence Coefficient:  1.96"
print(paste("Standard Error: ", StandardError))
## [1] "Standard Error:  6.79889696936202"
print(paste("Margin of Error: ", MarginOfError))
## [1] "Margin of Error:  13.3258380599496"
print(paste("Confidence Interval at 95%: ", SampleMean, " ± ", MarginOfError))
## [1] "Confidence Interval at 95%:  154.3  ±  13.3258380599496"

To compute the 95% confidence interval using the normal distribution, we:

  1. Take the sample mean (154.3) and standard deviation (21.5), which are given.
  2. Calculate the margin of error:
    Confidence coefficient: a 95% interval leaves 0.95/2 = 0.475 of the area on each side of the mean, which the Z table maps to z = 1.96
    Standard Error = SD/√(sample size) = 21.5/√10 = 6.80
    Margin of Error = Confidence coefficient × Standard Error = 1.96 × 6.80 = 13.33
  3. 95% confidence interval (normal distribution) = 154.3 ± 13.33, i.e. (140.97, 167.63)


Result:
Confidence Interval at 95% (normal distribution) = 154.3 ± 13.3, i.e. approximately 141.0 to 167.6 tweets.
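The hand calculation above uses the z critical value 1.96. The population standard deviation is unknown here and is estimated from only 10 users, so a t critical value with n - 1 = 9 degrees of freedom is the more conventional choice. The sketch below reuses the given summary statistics and computes both intervals for comparison. `qnorm` and `qt` are base R functions. The variable names `zCritical`, `tCritical`, `zInterval` and `tInterval` are illustrative.

```r
# Summary statistics given in Question One
SampleSize <- 10
SampleMean <- 154.3
StandardDeviation <- 21.5

StandardError <- StandardDeviation / sqrt(SampleSize)

# Two-sided 95% critical values
zCritical <- qnorm(0.975)                     # about 1.96, as used above
tCritical <- qt(0.975, df = SampleSize - 1)   # about 2.262 with 9 degrees of freedom

# Interval = mean -/+ critical value * standard error
zInterval <- SampleMean + c(-1, 1) * zCritical * StandardError
tInterval <- SampleMean + c(-1, 1) * tCritical * StandardError

print(round(zInterval, 2))
print(round(tInterval, 2))
```

The z interval matches the hand calculation, (140.97, 167.63). The t interval is wider, at (138.92, 169.68), because it accounts for the extra uncertainty in estimating the standard deviation from a small sample.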

Question Two

Marketing Influence

From a random sample of 500 tweets containing the term #applepi, we found that 96 were from marketing companies. Compute the 99% confidence interval of the population proportion of tweets containing #applepi that are from marketing companies.

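n = 500
c = 2.58  # z critical value for a two-sided 99% interval (qnorm(0.995) is about 2.576)
MarketingCount = 96
p = MarketingCount/n
RemainderProportion = 1 - p
# The margin of error is c times the standard error; it is added to and subtracted from p
MarginOfError = c * (sqrt((p * RemainderProportion)/n))
LowerBound = p - MarginOfError
UpperBound = p + MarginOfError
print(paste("Margin of Error: ", round(MarginOfError, 4)))
## [1] "Margin of Error:  0.0454"
print(paste("99% confidence interval: ", round(LowerBound, 4), " to ", round(UpperBound, 4)))
## [1] "99% confidence interval:  0.1466  to  0.2374"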
Result:
99% confidence interval for the population proportion of #applepi tweets from marketing companies:
14.66% to 23.74%
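The same normal-approximation (Wald) interval can be wrapped in a small helper so that the confidence level is a parameter. The function name `wald_ci` is hypothetical. It uses the exact critical value `qnorm(0.995)` (about 2.576) rather than the rounded 2.58. At two decimal places this does not change the percentages.

```r
# Wald (normal-approximation) confidence interval for a single proportion.
# wald_ci is a hypothetical helper name, not a base R function.
wald_ci <- function(successes, n, level = 0.99) {
  p <- successes / n
  z <- qnorm(1 - (1 - level) / 2)        # two-sided critical value
  margin <- z * sqrt(p * (1 - p) / n)    # z times the standard error
  c(lower = p - margin, upper = p + margin)
}

ci <- wald_ci(96, 500, level = 0.99)
print(round(100 * ci, 2))                # as percentages
```

The printed bounds are 14.66 and 23.74, matching the result above.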

Question Three

Checking the Job

Further investigation led us to find that one of the marketing companies was to issue an equal number of tweets containing the term #applepi and tweets containing the terms #apple and #raspberrypi. A random sample of 400 tweets showed that 86 contained the term #applepi and 65 contained the terms #apple and #raspberrypi. Compute the 95% confidence interval of the difference in proportions for tweets that contain the term #applepi and tweets that contain the terms #apple and #raspberrypi. Does this interval contain a difference of zero?

SampleSize = 400
ProportionOne = 86/SampleSize  # proportion of tweets containing #applepi
ProportionTwo = 65/SampleSize  # proportion of tweets containing #apple and #raspberrypi
CriticalValue = 1.96  # z critical value for a two-sided 95% interval
VarianceOne = (ProportionOne * (1 - ProportionOne))/SampleSize
VarianceTwo = (ProportionTwo * (1 - ProportionTwo))/SampleSize
LowerBound = (ProportionOne - ProportionTwo) - CriticalValue * sqrt(VarianceOne + 
    VarianceTwo)
UpperBound = (ProportionOne - ProportionTwo) + CriticalValue * sqrt(VarianceOne + 
    VarianceTwo)
print(paste("Lower Bound: ", LowerBound))
## [1] "Lower Bound:  -0.00161062257080398"
print(paste("Upper Bound: ", UpperBound))
## [1] "Upper Bound:  0.106610622570804"
print(paste("We are 95% confident that the difference between the two proportions is between ", 
    LowerBound, " and ", UpperBound))
## [1] "We are 95% confident that the difference between the two proportions is between  -0.00161062257080398  and  0.106610622570804"

Compute the 95% confidence interval of the difference between two proportions:

  1. Critical value (z for 95% confidence): c = 1.96

  2. Calculate proportions (n = 400 tweets):
    Proportion One (P1): 86/400 = 0.215
    Proportion Two (P2): 65/400 = 0.1625

  3. Calculate lower and upper bounds:
    Lower Bound Formula:
    \[ \begin{aligned} \text{Lower} &= (P_1-P_2) - c \sqrt{\frac{P_1(1-P_1)}{n} + \frac{P_2(1-P_2)}{n}} \\ &= (0.215-0.1625) - 1.96 \sqrt{\frac{0.215(1-0.215)}{400} + \frac{0.1625(1-0.1625)}{400}} \\ &= -0.0016 \end{aligned} \]
    Upper Bound Formula:
    \[ \begin{aligned} \text{Upper} &= (P_1-P_2) + c \sqrt{\frac{P_1(1-P_1)}{n} + \frac{P_2(1-P_2)}{n}} \\ &= (0.215-0.1625) + 1.96 \sqrt{\frac{0.215(1-0.215)}{400} + \frac{0.1625(1-0.1625)}{400}} \\ &= 0.1066 \end{aligned} \]
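Result:
We are 95% confident that the difference between the two proportions (#applepi minus #apple and #raspberrypi) is between -0.0016 and 0.1066. This interval does contain a difference of zero, though only just. At the 95% level, the sample is therefore consistent with the company issuing equal numbers of both kinds of tweet.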
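The following sketch repeats the calculation and checks the zero question programmatically. `diff_prop_ci` is a hypothetical helper name. Like the hand calculation, it treats the two sample proportions as independent. It uses `qnorm(0.975)` in place of the rounded 1.96.

```r
# Normal-approximation interval for the difference of two proportions
# observed in samples of the same size n (hypothetical helper).
diff_prop_ci <- function(x1, x2, n, level = 0.95) {
  p1 <- x1 / n
  p2 <- x2 / n
  z <- qnorm(1 - (1 - level) / 2)                  # two-sided critical value
  se <- sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
  estimate <- p1 - p2
  c(lower = estimate - z * se, upper = estimate + z * se)
}

ci <- diff_prop_ci(86, 65, 400)
print(round(ci, 4))

# TRUE when zero lies inside the interval
containsZero <- unname(ci["lower"] <= 0 & ci["upper"] >= 0)
print(containsZero)
```

The bounds round to -0.0016 and 0.1066, and `containsZero` is `TRUE`, in agreement with the result above.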

Question Four

Influence of Apple

The marketing company markIT wanted to observe the effect of Apple retweeting a tweet on that tweet's favourite count. Ten tweets were observed 1 minute before and 1 minute after Apple retweeted them, and the difference in each tweet's favourite count was recorded:

tweets <- c(1, 10, -2, -4, 1, -1, 4, 4, -2, 6)

NumberOfTweets = 10
MeanDifferenceTweets = mean(tweets)
print(paste("Mean difference of Tweets: ", MeanDifferenceTweets))
## [1] "Mean difference of Tweets:  1.7"
Variance = var(tweets)
print(paste("Variance: ", Variance))
## [1] "Variance:  18.4555555555556"
StandardDeviation = sd(tweets)
print(paste("Standard Deviation: ", StandardDeviation))
## [1] "Standard Deviation:  4.29599296502631"

Compute the 95% confidence interval for the mean difference in favourite count.

  1. Critical value (z for 95% confidence): c = 1.96

  2. Mean:
    \[ \begin{aligned} \bar{d} &= \frac{1+10+(-2)+(-4)+1+(-1)+4+4+(-2)+6}{10} = \frac{17}{10} \\ &= 1.7 \end{aligned} \]
  3. Work out the sample variance:
    \[ \begin{aligned} s^2 &= \frac{\sum_{i=1}^{n} (d_i-\bar{d})^2}{n-1} = \frac{\sum_{i=1}^{n} d_i^2 - n\bar{d}^2}{n-1} \\ &= \frac{(1^2+10^2+(-2)^2+(-4)^2+1^2+(-1)^2+4^2+4^2+(-2)^2+6^2) - 10(1.7)^2}{9} \\ &= \frac{195 - 28.9}{9} = 18.456 \end{aligned} \]
  4. Work out the standard deviation:
    \[ \begin{aligned} s &= \sqrt{s^2} = \sqrt{18.456} \\ &= 4.296 \end{aligned} \]
  5. Work out the margin of error:
    \[ \begin{aligned} \text{Margin of Error} &= c \times \frac{s}{\sqrt{n}} = 1.96 \times \frac{4.296}{\sqrt{10}} \\ &= 2.66 \end{aligned} \]
Result:
We are 95% confident that the population mean difference in favourite count lies within 1.7 ± 2.66, i.e. between -0.96 and 4.36.
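The steps above can be reproduced in R from the `tweets` vector. This sketch computes the z-based interval used in the hand calculation. For comparison, it also computes the t-based interval from base R's `t.test`, which uses n - 1 = 9 degrees of freedom. The t-based interval is the more conventional choice for a small paired sample whose standard deviation is estimated. The names `zInterval` and `tInterval` are illustrative.

```r
# Differences in favourite count for the ten tweets (data from Question Four)
tweets <- c(1, 10, -2, -4, 1, -1, 4, 4, -2, 6)

n <- length(tweets)
StandardError <- sd(tweets) / sqrt(n)   # sd() uses the n - 1 denominator

# z-based interval, matching the hand calculation (critical value 1.96)
zInterval <- mean(tweets) + c(-1, 1) * 1.96 * StandardError

# t-based interval with 9 degrees of freedom, from a one-sample t test
tInterval <- as.numeric(t.test(tweets, conf.level = 0.95)$conf.int)

print(round(zInterval, 2))
print(round(tInterval, 2))
```

The z interval rounds to (-0.96, 4.36) as above. The t interval is wider, at about (-1.37, 4.77). Both intervals contain zero.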