By including this statement, we, the authors of this work, verify that:
We hold a copy of this assignment that we can produce if the original is lost or damaged.
We hereby certify that no part of this assignment/product has been copied from any other student's work or from any other source except where due acknowledgement is made in the assignment.
No part of this assignment/product has been written/produced for us by another person except where such collaboration has been authorised by the subject lecturer/tutor concerned.
We are aware that this work may be reproduced and submitted to plagiarism detection software programs for the purpose of detecting possible plagiarism (which may retain a copy on its database for future plagiarism checking).
We are aware that this work may be used in Unit based peer review assessment. There is no student identification information contained within this work other than the information provided on this cover page.
We hereby certify that we have read and understand what the School of Computing and Mathematics defines as minor and substantial breaches of misconduct as outlined in the learning guide for this unit.
No. | Name | Student Number | Contribution |
---|---|---|---|
1 | Poonam Tammy Nair | 15136653 | 100% |
Complete the following tasks using RStudio and record the results and your analysis in an R Markdown file.
A random sample of 10 users who have tweeted about #applepi showed that the users had an average tweet count of 154.3 tweets with a standard deviation of 21.5 tweets. Compute the 95% confidence interval of the population mean user tweet count of users who have tweeted about #applepi. We can assume that the users' tweet counts are Normally distributed. Solve the problem using R.
SampleSize = 10
SampleMean = 154.3
StandardDeviation = 21.5
# Small sample with an estimated standard deviation: use the t critical value (n - 1 = 9 degrees of freedom)
ConfidenceCoefficient = qt(0.975, df = SampleSize - 1)
StandardError = StandardDeviation/(sqrt(SampleSize))
MarginOfError = ConfidenceCoefficient * StandardError
print(paste("Confidence Coefficient: ", round(ConfidenceCoefficient, 3)))
## [1] "Confidence Coefficient: 2.262"
print(paste("Standard Error: ", StandardError))
## [1] "Standard Error: 6.79889696936202"
print(paste("Margin of Error: ", round(MarginOfError, 2)))
## [1] "Margin of Error: 15.38"
print(paste("Confidence Interval at 95%: ", SampleMean, " ± ", round(MarginOfError, 2)))
## [1] "Confidence Interval at 95%: 154.3 ± 15.38"
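To work out the 95% confidence interval for the mean of a Normally distributed population, we compute:

$$\bar{x} \pm c \cdot \frac{s}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the sample size and $c$ is the confidence coefficient.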
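Because the sample is small (n = 10) and the standard deviation is estimated from the sample, the choice of confidence coefficient matters. The sketch below is not part of the original working; it compares the Normal critical value with the Student's t critical value on n - 1 degrees of freedom, and its variable names are illustrative only.

```r
# Summary statistics taken from the question
n <- 10
x_bar <- 154.3
s <- 21.5

# Standard error of the sample mean
se <- s / sqrt(n)

# Two-sided 95% critical values: Normal versus Student's t with n - 1 df
z_crit <- qnorm(0.975)
t_crit <- qt(0.975, df = n - 1)

# Intervals stored as c(lower, upper)
z_interval <- x_bar + c(-1, 1) * z_crit * se
t_interval <- x_bar + c(-1, 1) * t_crit * se

print(round(z_interval, 2))
print(round(t_interval, 2))
```

The t interval is the wider of the two, reflecting the extra uncertainty that comes from estimating the standard deviation from only ten observations.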
From a random sample of 500 tweets containing the term #applepi, we found that 96 were from marketing companies. Compute the 99% confidence interval of the population proportion of tweets containing #applepi that are from marketing companies.
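n = 500
ConfidenceCoefficient = 2.58
MarketingCount = 96
p = MarketingCount/n
# The margin of error is the half-width of the interval; it must not include p itself
MarginOfError = ConfidenceCoefficient * sqrt((p * (1 - p))/n)
LowerBound = p - MarginOfError
UpperBound = p + MarginOfError
print(paste("Sample proportion:", p, "±", round(MarginOfError, 4)))
## [1] "Sample proportion: 0.192 ± 0.0454"
print(paste("The 99% confidence interval of the population proportion is:", round(LowerBound, 4), "to", round(UpperBound, 4)))
## [1] "The 99% confidence interval of the population proportion is: 0.1466 to 0.2374"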
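The same calculation can be wrapped in a small helper so that the critical value comes from `qnorm()` rather than a rounded table value. This is a minimal sketch of the Normal-approximation (Wald) interval; `proportion_ci` is a hypothetical name chosen for illustration, not a function from any package.

```r
# Wald (Normal-approximation) confidence interval for a proportion.
# proportion_ci is an illustrative helper, not a library function.
proportion_ci <- function(successes, n, level = 0.95) {
  p_hat <- successes / n
  # Two-sided critical value, e.g. qnorm(0.995) for a 99% interval
  z <- qnorm(1 - (1 - level) / 2)
  margin <- z * sqrt(p_hat * (1 - p_hat) / n)
  c(estimate = p_hat, lower = p_hat - margin, upper = p_hat + margin)
}

# 96 marketing tweets out of a sample of 500, at 99% confidence
ci <- proportion_ci(96, 500, level = 0.99)
print(round(ci, 4))
```

Because `qnorm(0.995)` is about 2.576 rather than 2.58, the bounds differ slightly from those obtained with the rounded coefficient.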
Further investigation led us to find that one of the marketing companies was to issue an equal number of tweets containing the term #applepi and the terms #apple and #raspberrypi. A random sample of 400 tweets showed that 86 contained the term #applepi and 65 contained the terms #apple and #raspberrypi. Compute the 95% confidence interval of the difference in proportions for tweets that contain the term #applepi and the terms #apple and #raspberrypi. Does this interval contain the difference of zero?
# 400 is the sample size, and 1.96 is the confidence coefficient for a 95% interval
SampleSize = 400
ProportionOne = 86/SampleSize
ProportionTwo = 65/SampleSize
ConfidenceCoefficient = 1.96
VarianceOne = (ProportionOne * (1 - ProportionOne))/SampleSize
VarianceTwo = (ProportionTwo * (1 - ProportionTwo))/SampleSize
StandardError = sqrt(VarianceOne + VarianceTwo)
LowerBound = (ProportionOne - ProportionTwo) - ConfidenceCoefficient * StandardError
UpperBound = (ProportionOne - ProportionTwo) + ConfidenceCoefficient * StandardError
print(paste("Lower Bound: ", LowerBound))
## [1] "Lower Bound: -0.00161062257080398"
print(paste("Upper Bound: ", UpperBound))
## [1] "Upper Bound: 0.106610622570804"
print(paste("We are 95% confident that the difference between the two proportions is between ",
LowerBound, " and ", UpperBound))
## [1] "We are 95% confident that the difference between the two proportions is between -0.00161062257080398 and 0.106610622570804"
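The 95% confidence interval of the difference between two proportions is computed as:

$$(\hat{p}_1 - \hat{p}_2) \pm z \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n}}$$

where $\hat{p}_1 = 86/400$, $\hat{p}_2 = 65/400$, $n = 400$ and $z = 1.96$.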
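The question also asks whether the interval contains zero. A minimal sketch of that check follows. It recomputes the bounds from the sample counts and, like the working above, treats the two proportions as independent; the variable names are illustrative.

```r
# Sample counts from the question
n <- 400
p1 <- 86 / n
p2 <- 65 / n
z <- 1.96

# Standard error of the difference, treating the proportions as independent
se_diff <- sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)

lower <- (p1 - p2) - z * se_diff
upper <- (p1 - p2) + z * se_diff

# TRUE when zero lies between the two bounds
contains_zero <- lower <= 0 && upper >= 0
print(contains_zero)
```

Since the lower bound is slightly negative, the check returns `TRUE`: at the 95% level, the sample is consistent with the two kinds of tweet being issued in equal proportion.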
The marketing company markIT wanted to observe Apple retweeting a tweet and its effect on the tweet's favourite flag. The following ten tweets were observed 1 minute before and after Apple retweeted them and their favourite count was recorded in the table below:
tweets <- c(1, 10, -2, -4, 1, -1, 4, 4, -2, 6)
NumberOfTweets = 10
MeanDifferenceTweets = mean(tweets)
print(paste("Mean difference of Tweets: ", MeanDifferenceTweets))
## [1] "Mean difference of Tweets: 1.7"
Variance = var(tweets)
print(paste("Variance: ", Variance))
## [1] "Variance: 18.4555555555556"
StandardDeviation = sd(tweets)
print(paste("Standard Deviation: ", StandardDeviation))
## [1] "Standard Deviation: 4.29599296502631"
Compute the 95% confidence interval for the mean difference in favourite count.
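The working above stops at the standard deviation. One way to finish the calculation is a one-sample t interval on the differences. This assumes that the values in `tweets` are the paired differences in favourite count and that they are approximately Normally distributed. The sketch below computes the interval by hand and cross-checks it against base R's `t.test()`; names such as `differences` and `margin` are illustrative.

```r
# Differences in favourite count, copied from the working above
differences <- c(1, 10, -2, -4, 1, -1, 4, 4, -2, 6)
n <- length(differences)

mean_diff <- mean(differences)
se_diff <- sd(differences) / sqrt(n)

# Two-sided 95% critical value from Student's t with n - 1 df
t_crit <- qt(0.975, df = n - 1)
margin <- t_crit * se_diff

manual_ci <- c(mean_diff - margin, mean_diff + margin)
print(round(manual_ci, 3))

# Cross-check against the built-in one-sample t interval
builtin_ci <- as.numeric(t.test(differences, conf.level = 0.95)$conf.int)
print(round(builtin_ci, 3))
```

The interval runs from about -1.37 to 4.77, so it includes zero.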