Cover Sheet

By including this statement, we the authors of this work, verify that:

We hold a copy of this assignment that we can produce if the original is lost or damaged.
We hereby certify that no part of this assignment/product has been copied from any other student's work or from any other source except where due acknowledgement is made in the assignment.
No part of this assignment/product has been written/produced for us by another person except where such collaboration has been authorised by the subject lecturer/tutor concerned.
We are aware that this work may be reproduced and submitted to plagiarism detection software programs for the purpose of detecting possible plagiarism (which may retain a copy on its database for future plagiarism checking).
We are aware that this work may be used in Unit based peer review assessment. There is no student identification information contained within this work other than the information provided on this cover page.
We hereby certify that we have read and understand what the School of Computing and Mathematics defines as minor and substantial breaches of misconduct as outlined in the learning guide for this unit.

	Name	Student.Number	Contribution
1	Poonam Tammy Nair	15136653	100%

Assignment Three

Complete the following tasks using RStudio and record the results and your analysis in an R Markdown file.

Question One

Veteran or Novice Tweeters

A random sample of 10 users who have tweeted about #applepi showed that the users had an average tweet count of 154.3 tweets with a standard deviation of 21.5 tweets. Compute the 95% confidence interval of the population mean user tweet count of users who have tweeted about #applepi. We can assume that the users' tweet counts are Normally distributed. Solve the problem using R.

SampleSize = 10
SampleMean = 154.3
StandardDeviation = 21.5
ConfidenceCoefficient = 1.96
StandardError = StandardDeviation/(sqrt(SampleSize))
MarginOfError = ConfidenceCoefficient * StandardError

print(paste("Confidence Coefficient: ", ConfidenceCoefficient))

## [1] "Confidence Coefficient:  1.96"

print(paste("Standard Error: ", StandardError))

## [1] "Standard Error:  6.79889696936202"

print(paste("Margin of Error: ", MarginOfError))

## [1] "Margin of Error:  13.3258380599496"

print(paste("Confidence Interval at 95%: ", SampleMean, " ± ", MarginOfError))

## [1] "Confidence Interval at 95%:  154.3  ±  13.3258380599496"

To work out the 95% Confidence Interval from a Normal distribution, we work out:

Mean (154.3) and Standard Deviation (21.5) - given in the question
Calculate Margin of Error
Confidence Coefficient = 1.96 (the Z Table value for an area of 0.95/2 = 0.475 on each side of the mean)
Standard Error = SD/√(sample size) = 21.5 / sqrt(10) = 6.80
Margin of Error = Confidence Coefficient x Standard Error = 1.96 * 6.80 = 13.33
Confidence Interval at 95% (Normal distribution) = 154.3 ± 13.33

Result:
Confidence Interval at 95% (Normal distribution) = 154.3 ± 13.33, or approximately 140.97 to 167.63.
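
With only 10 users and a standard deviation estimated from the sample, a Student's t critical value is often preferred to 1.96. The sketch below is an alternative under that assumption, not the calculation recorded above. It uses `qnorm()` to recover the z value and `qt()` with n - 1 degrees of freedom for the t value. The names `ZCritical`, `TCritical`, `TMarginOfError` and `TInterval` are illustrative only.

```r
# Summary statistics from the question
SampleSize <- 10
SampleMean <- 154.3
StandardDeviation <- 21.5
StandardError <- StandardDeviation / sqrt(SampleSize)

# z critical value for a two-sided 95% interval (about 1.96)
ZCritical <- qnorm(0.975)

# Assumption: t critical value with n - 1 = 9 degrees of freedom,
# commonly used when the SD is estimated from a small sample
TCritical <- qt(0.975, df = SampleSize - 1)
TMarginOfError <- TCritical * StandardError

# Lower and upper bounds of the t-based interval
TInterval <- c(SampleMean - TMarginOfError, SampleMean + TMarginOfError)
print(round(TInterval, 2))
```

Because the t critical value (about 2.26) is larger than 1.96, this interval is wider than the z-based one.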

Question Two

Marketing Influence

From a random sample of 500 tweets containing the term #applepi, we found that 96 were from marketing companies. Compute the 99% confidence interval of the population proportion of tweets containing #applepi, that are from marketing companies.

n = 500
c = 2.58
MarketingCount = 96
p = MarketingCount/n
RemainderProportion = 1 - p
MarginOfError = c * (sqrt((p * RemainderProportion)/n))
LowerBound = p - MarginOfError
UpperBound = p + MarginOfError
print(paste("99% confidence interval: ", round(LowerBound, 4), " to ", round(UpperBound, 4)))

## [1] "99% confidence interval:  0.1466  to  0.2374"

Result:
Range for the true population proportion:
14.66% to 23.74%
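
The same calculation can be wrapped in a small function so that both bounds come out together. This is a minimal sketch of the Normal-approximation (Wald) interval, assuming that is the intended method. `ProportionCI` and `MarketingCI` are hypothetical names, and `qnorm()` gives the unrounded critical value (about 2.576) where the working above uses the rounded 2.58.

```r
# Hypothetical helper: Normal-approximation (Wald) interval for a proportion
ProportionCI <- function(successes, n, conf_level) {
  p_hat <- successes / n
  # Two-sided critical value, e.g. about 2.576 for conf_level = 0.99
  z <- qnorm(1 - (1 - conf_level) / 2)
  margin <- z * sqrt(p_hat * (1 - p_hat) / n)
  c(lower = p_hat - margin, upper = p_hat + margin)
}

# 96 marketing tweets out of a sample of 500, 99% confidence
MarketingCI <- ProportionCI(96, 500, 0.99)
print(round(100 * MarketingCI, 2))  # bounds as percentages
```

The bounds should agree with the range above to two decimal places. Any difference in later digits comes from the rounding of the critical value.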

Question Three

Checking the Job

Further investigation led us to find that one of the marketing companies was to issue an equal number of tweets containing the term #applepi and the terms #apple and #raspberrypi. A random sample of 400 tweets showed that 86 contained the term #applepi and 65 contained the terms #apple and #raspberrypi. Compute the 95% confidence interval of the difference in proportions for tweets that contain the term #applepi and the terms #apple and #raspberrypi. Does this interval contain the difference of zero?

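SampleSize = 400  # number of tweets in the sample
ProportionOne = 86/SampleSize  # tweets containing #applepi
ProportionTwo = 65/SampleSize  # tweets containing #apple and #raspberrypi
ConfidenceCoefficient = 1.96  # z value for a 95% interval
# Estimated variance of each sample proportion
VarianceOne = (ProportionOne * (1 - ProportionOne))/SampleSize
VarianceTwo = (ProportionTwo * (1 - ProportionTwo))/SampleSize
LowerBound = (ProportionOne - ProportionTwo) - ConfidenceCoefficient * sqrt(VarianceOne + 
    VarianceTwo)
UpperBound = (ProportionOne - ProportionTwo) + ConfidenceCoefficient * sqrt(VarianceOne + 
    VarianceTwo)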
print(paste("Lower Bound: ", LowerBound))

## [1] "Lower Bound:  -0.00161062257080398"

print(paste("Upper Bound: ", UpperBound))

## [1] "Upper Bound:  0.106610622570804"

print(paste("We are 95% confident that the difference between the two proportions is between ", 
    LowerBound, " and ", UpperBound))

## [1] "We are 95% confident that the difference between the two proportions is between  -0.00161062257080398  and  0.106610622570804"

Compute the 95% confidence interval of the difference between two proportions:

Confidence Coefficient: 1.96
Calculate Proportions:
Proportion One (P1): 86/400 = 0.215
Proportion Two (P2): 65/400 = 0.1625
Calculate Lower and Upper Bounds:

Lower Bound Formula:
\[ \begin{aligned} & (P_1 - P_2) - c \sqrt{\frac{P_1(1-P_1)}{n} + \frac{P_2(1-P_2)}{n}} \\ &= (0.215 - 0.1625) - 1.96 \sqrt{\frac{0.215(1-0.215)}{400} + \frac{0.1625(1-0.1625)}{400}} \\ &= -0.00161062257080398 \end{aligned} \]

Upper Bound Formula:
\[ \begin{aligned} & (P_1 - P_2) + c \sqrt{\frac{P_1(1-P_1)}{n} + \frac{P_2(1-P_2)}{n}} \\ &= (0.215 - 0.1625) + 1.96 \sqrt{\frac{0.215(1-0.215)}{400} + \frac{0.1625(1-0.1625)}{400}} \\ &= 0.106610622570804 \end{aligned} \]

Result:
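We are 95% confident that the difference between the two proportions is between -0.0016 and 0.1066. This interval contains zero, so at the 95% level we cannot rule out that the two kinds of tweet were issued in equal proportions.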
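
The question of whether the interval contains zero can also be checked in code. The sketch below assumes the same Normal-approximation formula as the working above, treating the two proportions as independent. `DiffProportionCI`, `DifferenceCI` and `ContainsZero` are hypothetical names introduced for illustration.

```r
# Hypothetical helper: Normal-approximation interval for p1 - p2
# (assumes the two sample proportions can be treated as independent)
DiffProportionCI <- function(x1, n1, x2, n2, conf_level = 0.95) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  z <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  c(lower = (p1 - p2) - z * se, upper = (p1 - p2) + z * se)
}

# 86 and 65 tweets out of a sample of 400
DifferenceCI <- DiffProportionCI(86, 400, 65, 400)

# TRUE when zero lies between the lower and upper bounds
ContainsZero <- DifferenceCI[["lower"]] <= 0 && DifferenceCI[["upper"]] >= 0
print(ContainsZero)
```

A value of TRUE means a difference of zero is consistent with the sample at the chosen confidence level.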

Question Four

Influence of Apple

The marketing company markIT wanted to observe Apple retweeting a tweet and its effect on the tweet's favourite flag. The following ten tweets were observed 1 minute before and after Apple retweeted them and their favourite count was recorded in the table below:

tweets <- c(1, 10, -2, -4, 1, -1, 4, 4, -2, 6)

NumberOfTweets = 10
MeanDifferenceTweets = mean(tweets)
print(paste("Mean difference of Tweets: ", MeanDifferenceTweets))

## [1] "Mean difference of Tweets:  1.7"

Variance = var(tweets)
print(paste("Variance: ", Variance))

## [1] "Variance:  18.4555555555556"

StandardDeviation = sd(tweets)
print(paste("Standard Deviation: ", StandardDeviation))

## [1] "Standard Deviation:  4.29599296502631"

Compute the 95% confidence interval for the mean difference in favourite count.

Confidence Coefficient: 1.96
Mean:
\[ \begin{aligned} \text{Mean}: \bar{x} &= \frac{1+10+(-2)+(-4)+1+(-1)+4+4+(-2)+6}{10} \\ &= 1.7 \end{aligned} \]
Work out the variance:
\[ \begin{aligned} \text{Variance}: s^2 &= \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{(1-1.7)^2 + (10-1.7)^2 + \dots + (6-1.7)^2}{9} \\ &= \frac{166.1}{9} = 18.456 \end{aligned} \]
Work out the Standard Deviation:
\[ \begin{aligned} \text{Standard Deviation}: s &= \sqrt{s^2} \\ &= \sqrt{18.456} \\ &= 4.296 \end{aligned} \]
Work out the Margin of Error:
\[ \begin{aligned} \text{Margin of Error} &= c \times \frac{s}{\sqrt{n}} \\ &= 1.96 \times \frac{4.296}{\sqrt{10}} \\ &= 2.66 \end{aligned} \]

Result:
We are 95% confident that the mean difference in favourite count is 1.7 ± 2.66, that is, between -0.96 and 4.36.
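
With only ten differences, a t-based interval is a common alternative to using 1.96. The sketch below is offered under that assumption and is not the method used in the working above. It calls base R's `t.test()` on the differences and then reproduces the same interval by hand with `qt()`. `TResult` and `TMargin` are illustrative names.

```r
# Differences in favourite count, as recorded above
tweets <- c(1, 10, -2, -4, 1, -1, 4, 4, -2, 6)

# One-sample t-test on the differences; conf.int holds the interval
TResult <- t.test(tweets, conf.level = 0.95)
print(TResult$conf.int)

# Same interval by hand: mean ± t * sd / sqrt(n), with n - 1 degrees of freedom
n <- length(tweets)
TMargin <- qt(0.975, df = n - 1) * sd(tweets) / sqrt(n)
print(c(mean(tweets) - TMargin, mean(tweets) + TMargin))
```

The t critical value for 9 degrees of freedom is about 2.26, so this interval is wider than one built with 1.96.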