Google Literacy Markdown

This R Markdown document serves two purposes. First, it allows us to keep track of key data highlights and important emerging results. Second, it includes the code used in the data analysis, to ensure our findings are replicable and can stand up to scrutiny.

Part I: Key Data Highlights

Note: All analysis is conducted using the weight designed for web users (not the general population weight). Any subgroup with a sample size below 80 is excluded from the charts shown below.
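The weighting and exclusion rule in the note above can be sketched in base R as follows. This is a minimal illustration, not the code used for the report: the data frame, the column names (`age_group`, `device`, `weight_web`) and the helper `weighted_shares()` are hypothetical, and it assumes the 80-respondent threshold refers to unweighted subgroup counts.

```r
# Toy data; column names and values are hypothetical, not the survey file.
survey <- data.frame(
  age_group  = c("18-29", "18-29", "18-29", "65+", "65+"),
  device     = c("Smartphone", "Smartphone", "Laptop", "Laptop", "Smartphone"),
  weight_web = c(1.2, 0.8, 1.0, 1.5, 0.5)
)

weighted_shares <- function(data, group, answer, weight, min_n = 80) {
  # Unweighted respondents per subgroup (assumed basis for the exclusion rule)
  n_by_group <- table(data[[group]])
  keep <- names(n_by_group)[n_by_group >= min_n]
  data <- data[data[[group]] %in% keep, , drop = FALSE]
  # Sum of weights in each group x answer cell
  f <- reformulate(c(group, answer), response = weight)
  totals <- xtabs(f, data = data, drop.unused.levels = TRUE)
  # Convert to weighted row percentages
  round(100 * prop.table(totals, margin = 1), 1)
}

# min_n is lowered to 3 only so the toy data keeps one subgroup
shares <- weighted_shares(survey, "age_group", "device", "weight_web", min_n = 3)
print(shares)
```

With the real data, `min_n` would stay at its default of 80, so small subgroups drop out before any percentages are computed.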

Q3: Overall, what type of device do you use most frequently when you use the Internet?

Overall, smartphone usage holds a slight plurality over the other options. The next most popular option is using both types of devices (smartphones and laptops) equally.

These trends vary in clear ways when examined by age group. Older adults, especially those 65+, are much more likely to rely mostly on desktops or laptops. This is likewise true of individuals with higher levels of education, though not to the dramatic level seen with age.

We also see higher smartphone usage among Black and Hispanic respondents.

Q4: Which search engine do you use most frequently?

Google is by far the most common search engine used by respondents. However, usage does decline with age. Conservatives are also less likely to use Google than people with other political beliefs.

Q5 SERIES: Which of the following social media sites or apps do you use at least once a week?

Facebook and YouTube are the only two social media sites used by a majority of respondents. Instagram, used by 40%, has a substantial user base as well. From there, usage rates (i.e. the percentage of people who say they use the platform at least once a week) fall precipitously.

Interesting patterns emerge across demographic groups. Most older Americans rely on Facebook alone rather than on other social media sites; younger Americans have a far more diverse social media diet, and one that does not primarily rely on Facebook (for the youngest Americans, in fact, YouTube is more popular). Liberals are slightly less likely to rely on Facebook than people of other political persuasions; we also see higher YouTube usage among Black and Hispanic respondents than among white respondents.

BLOCK 3 (Q7-Q17): INFORMATION LITERACY

Each of these items has a correct answer. The shorthand question wordings for these items are:

  • Q7 You can find the MOST reliable and verified description of a concept in a…
  • Q8 In which list have the information sources been correctly ordered from the LEAST to the MOST verified?
  • Q9 Which of the data listed below are raw data, that is, data that has been collected from a source and has not been processed?
  • Q10 The following sentences are from a newspaper article. Which statement is NOT the author’s personal opinion about Genetically Modified Organisms (GMOs)?
  • Q11 Fear mongering is sometimes used to get more attention on an article by using scary language. Which headline contains fear mongering?
  • Q12 A false choice makes it seem like there are only two options, when in reality, there could be more than two possible options. Which statement contains a false choice?
  • Q13 Imagine you came across an article from the website climatechange.org, which you are not familiar with. What is the most reliable way to check if you can trust the source?
  • Q14 From the following options, which is the best way to check if a claim someone makes in a social media post is true or false?
  • Q15 Which search query will provide the most relevant search results for national parks in Colorado?
  • Q16 Which search query will provide the widest range of search results about the herb “oregano”, which has the scientific name “Oreganum vulgare” and is called “wild marjoram” in traditional medicine?
  • Q17 Compared with a search for “Blue Oven”, a search for “Oven” results in…

Q16 (which search query provides the widest range of search results about the herb “oregano”) was answered correctly by the smallest share of respondents (46%). Fewer than half also answered Q9 (identifying an example of raw data) correctly.

On the flip side, Q11 (where respondents identify a fear-mongering statement) and Q15 (correct search terms for national parks in CO) were both widely understood, with nearly nine in ten respondents answering each of these items correctly. Slightly fewer, but still a large majority (around 8 in 10), also answered the following items correctly: Q10 (identifying a statement that is NOT a personal opinion of the author); Q12 (identifying a statement that asserts a false choice); and Q7 (where one identifies a reliable source of information).

Overall, there are 11 questions in this “Block 3” series of items. About 9% of respondents answered all questions correctly; another 17% answered 10 items correctly and a further 18% answered 9 items correctly. The vast majority of respondents (81%) answered most questions correctly (i.e. 6-11 items).
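The number-correct distribution described above could be computed along these lines. This is a sketch with invented data: only three items are shown for brevity, and the weights vector `w` is a stand-in for the web-user weight.

```r
# Toy 0/1 item scores (1 = correct); values and weights are hypothetical.
items <- data.frame(
  Q7num = c(1, 1, 0, 1),
  Q8num = c(1, 0, 0, 1),
  Q9num = c(1, 1, 0, 0)
)
w <- c(1.0, 0.5, 1.5, 1.0)

# Number of items each respondent answered correctly
n_correct <- rowSums(items)

# Weighted percent of respondents at each score
score_dist <- 100 * tapply(w, n_correct, sum) / sum(w)

# Weighted percent answering a majority of items correctly
# (with 11 items this cut-off is 6, matching the 6-11 range above)
majority_cut <- floor(ncol(items) / 2) + 1
pct_majority <- 100 * sum(w[n_correct >= majority_cut]) / sum(w)

print(round(score_dist, 1))
print(pct_majority)
```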

Q7-Q17: BLOCK 3 ITEMS – INITIAL EFA

Here, we explore the “Block 3” items through the lens of factor analysis. All items have been recoded as binary variables, whereby a response is marked as “incorrect” (0) or “correct” (1).
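A binary recode of this kind might look like the sketch below. The option codes and the answer key are hypothetical (the real codes are not shown in this document), and scoring a missing response as incorrect is an assumption rather than a documented rule. The `num` suffix mirrors the variable names in the factor analysis output.

```r
# Hypothetical raw responses and answer key
raw <- data.frame(
  Q7 = c("a", "c", NA),
  Q8 = c("b", "b", "d"),
  stringsAsFactors = FALSE
)
key <- c(Q7 = "c", Q8 = "b")

recode_binary <- function(raw, key) {
  scored <- lapply(names(key), function(q) {
    # 1 if the response matches the key, 0 otherwise;
    # missing responses are scored 0 here (an assumption)
    as.integer(!is.na(raw[[q]]) & raw[[q]] == key[[q]])
  })
  names(scored) <- paste0(names(key), "num")
  as.data.frame(scored)
}

scored <- recode_binary(raw, key)
print(scored)
```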

We use exploratory FA/PCA techniques to identify the number of factors. Parallel analysis initially suggests 5 factors, which is quite high for a battery of 11 items (in essence, this suggests these items should be treated separately). One reason for this is the relatively low correlations among the items (measured here with tetrachoric correlations, which are designed for binary variables).

## Parallel analysis suggests that the number of factors =  5  and the number of components =  2
## Factor Analysis using method =  ml
## Call: fa(r = q3.data[3:13], nfactors = 5, rotate = "varimax", fm = "mle", 
##     cor = "tet", weight = q3.data[, "Weight_WEB"])
## Standardized loadings (pattern matrix) based upon correlation matrix
##         ML4  ML1  ML2  ML3   ML5   h2    u2 com
## Q7num  0.49 0.20 0.22 0.10  0.45 0.54 0.464 2.8
## Q8num  0.71 0.32 0.05 0.13  0.25 0.69 0.310 1.8
## Q9num  0.54 0.10 0.15 0.19  0.02 0.36 0.641 1.5
## Q11num 0.56 0.26 0.13 0.31  0.17 0.52 0.480 2.4
## Q12num 0.32 0.26 0.06 0.90  0.07 1.00 0.005 1.5
## Q13num 0.09 0.08 0.99 0.06  0.07 1.00 0.005 1.0
## Q14num 0.30 0.27 0.38 0.03  0.06 0.31 0.694 2.8
## Q10num 0.65 0.44 0.11 0.21  0.01 0.67 0.327 2.1
## Q15num 0.22 0.87 0.10 0.27  0.34 1.00 0.005 1.7
## Q16num 0.40 0.39 0.10 0.06  0.04 0.33 0.670 2.2
## Q17num 0.27 0.57 0.16 0.13 -0.04 0.45 0.552 1.7
## 
##                        ML4  ML1  ML2  ML3  ML5
## SS loadings           2.26 1.79 1.26 1.11 0.43
## Proportion Var        0.21 0.16 0.11 0.10 0.04
## Cumulative Var        0.21 0.37 0.48 0.58 0.62
## Proportion Explained  0.33 0.26 0.18 0.16 0.06
## Cumulative Proportion 0.33 0.59 0.77 0.94 1.00
## 
## Mean item complexity =  2
## Test of the hypothesis that 5 factors are sufficient.
## 
## The degrees of freedom for the null model are  55  and the objective function was  4.3 with Chi Square of  5585.29
## The degrees of freedom for the model are 10  and the objective function was  0.07 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.04 
## 
## The harmonic number of observations is  1303 with the empirical chi square  47.89  with prob <  6.5e-07 
## The total number of observations was  1303  with Likelihood Chi Square =  85.51  with prob <  4.1e-14 
## 
## Tucker Lewis Index of factoring reliability =  0.925
## RMSEA index =  0.076  and the 90 % confidence intervals are  0.062 0.091
## BIC =  13.78
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    ML4  ML1  ML2  ML3   ML5
## Correlation of (regression) scores with factors   0.88 0.95 1.00 0.98  0.65
## Multiple R square of scores with factors          0.77 0.90 0.99 0.97  0.42
## Minimum correlation of possible factor scores     0.55 0.80 0.99 0.94 -0.16

This solution lacks some desirable qualities. Notably, no item loads primarily on the fifth factor (ML5): the item with the greatest loading on it is Q7 (0.45), but Q7 loads slightly more strongly on ML4 (0.49), a different factor.

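We could of course re-run the solution to specify 4 factors. This does help (see below). However, there are other issues of model fit to consider, namely the poor performance of Q12, Q13 and Q15. In the output above, “u2” is each item's uniqueness (the share of its variance not explained by the factors) and “com” is its complexity (roughly, how many factors it loads on). For these three items the communality (“h2”) is 1.00 and the uniqueness is only 0.005, which can signal an estimation problem rather than a genuinely well-measured item. One idea would be to remove these items and try a new solution. We will also want to consider whether these factors make substantive sense.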

Finally, we may consider other methods, including IRT.
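To make the “h2”, “u2” and “com” columns in the 5-factor output above concrete, the sketch below recomputes them for two items from the printed loadings. The loadings are copied from the Q7num and Q13num rows; because those values are rounded to two decimals, the results can differ slightly from the printed columns (in particular, the printed uniqueness for Q13 is 0.005, while the rounded loadings give a value just below zero). The complexity formula is Hofmann's index, the sum of squared loadings squared divided by the sum of fourth-power loadings.

```r
# Rotated loadings copied from the 5-factor output (rounded to 2 decimals)
loadings <- rbind(
  Q7num  = c(ML4 = 0.49, ML1 = 0.20, ML2 = 0.22, ML3 = 0.10, ML5 = 0.45),
  Q13num = c(ML4 = 0.09, ML1 = 0.08, ML2 = 0.99, ML3 = 0.06, ML5 = 0.07)
)

h2  <- rowSums(loadings^2)         # communality: variance explained by the factors
u2  <- 1 - h2                      # uniqueness: variance left unexplained
com <- h2^2 / rowSums(loadings^4)  # Hofmann's complexity index

print(round(cbind(h2, u2, com), 2))
```

A complexity near 1 (as for Q13) means the item loads on essentially one factor; a value near 3 (as for Q7) means its variance is spread across several.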

The output below shows the 4-factor solution across all variables.

## Parallel analysis suggests that the number of factors =  5  and the number of components =  2
## 
## Loadings:
##        ML4   ML1   ML2   ML3  
## Q7num  0.514 0.308 0.230      
## Q8num  0.754 0.330       0.124
## Q9num  0.540       0.143 0.198
## Q11num 0.581 0.265 0.133 0.306
## Q12num 0.326 0.266       0.902
## Q13num 0.100       0.987      
## Q14num 0.312 0.239 0.379      
## Q10num 0.652 0.368 0.105 0.224
## Q15num 0.288 0.916 0.111 0.245
## Q16num 0.427 0.348            
## Q17num 0.292 0.503 0.154 0.136
## 
##                  ML4   ML1   ML2   ML3
## SS loadings    2.452 1.761 1.272 1.107
## Proportion Var 0.223 0.160 0.116 0.101
## Cumulative Var 0.223 0.383 0.499 0.599

BLOCK 4 ITEMS WEB LITERACY: Q18-Q21

Now we turn to the Block 4 (Web Literacy) items, which are:

  • Q18 Which of the following is a reliable way to check if an article on a website is trustworthy?
  • Q19 Which of the following statements about “.org” websites is true?
  • Q20 Web browsers sometimes display a lock icon in the URL bar. What does it mean when the lock icon is displayed for a particular website?
  • Q21 What determines how search results are ordered on the results page? Please select all that apply.

Note that to be classified as providing a correct response to Q21, respondents had to identify both correct options (“relevance of terms” and “your browsing history”).

Indeed, this may be one reason so few respondents answered Q21 correctly (8% identified both items correctly). However, for most of these Web Literacy items, fewer than half of respondents answered correctly. The one exception is Q20, which asks about the lock icon in the URL bar, where 67% answered correctly (see below).
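The Q21 scoring rule could be expressed as in the sketch below. The two correct option labels come from the note above; the third label (“alphabetical order”) and the function `score_q21()` are hypothetical. The text does not say whether selecting an additional incorrect option disqualifies a response, so the sketch exposes both readings through a `strict` argument.

```r
correct_opts <- c("relevance of terms", "your browsing history")

# Toy select-all-that-apply responses; "alphabetical order" is a made-up distractor
responses <- list(
  c("relevance of terms", "your browsing history"),
  c("relevance of terms"),
  c("relevance of terms", "your browsing history", "alphabetical order")
)

score_q21 <- function(selected, correct, strict = FALSE) {
  has_all  <- all(correct %in% selected)   # both correct options were chosen
  no_extra <- all(selected %in% correct)   # nothing else was chosen
  as.integer(if (strict) has_all && no_extra else has_all)
}

q21num        <- vapply(responses, score_q21, integer(1), correct = correct_opts)
q21num_strict <- vapply(responses, score_q21, integer(1),
                        correct = correct_opts, strict = TRUE)
print(q21num)
print(q21num_strict)
```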

DISCUSSION OF MULTIVARIATE ANALYSIS HERE

BLOCK 5 (Q22-Q26): Self-Perceived Information Literacy

The first 4 items in this block of questions ask respondents to agree or disagree with a number of statements on a 7-point scale. As is shown below, responses to these items were widely distributed across the scale.

Should we turn these items into a scale? Cronbach’s alpha is very weak (though this could be corrected to some degree by reverse coding one of the items, or omitting it entirely), so we would need to consider factor analysis. However, with just 4 items, we wouldn’t expect more than two factors. Before doing so, we need to reverse code Q24, as the correlation matrix shows.

Furthermore, the (polychoric) correlation matrix reveals weak relationships, again suggesting that factor analysis may be of limited value here.
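The effect of reverse coding Q24 on Cronbach's alpha can be illustrated with a small base R sketch. The responses are invented, the items are assumed to be coded 1 to 7 (so the reversed value is 8 minus the original), and alpha is computed unweighted from its textbook formula rather than with the routine used for the report.

```r
# Toy 7-point responses; values are invented for illustration
block5 <- data.frame(
  Q22 = c(7, 6, 2, 5, 1),
  Q23 = c(6, 7, 1, 4, 2),
  Q24 = c(1, 2, 6, 3, 7),  # runs in the opposite direction to the others
  Q25 = c(7, 5, 2, 5, 1)
)

cronbach_alpha <- function(x) {
  k <- ncol(x)
  item_var  <- sum(apply(x, 2, var))  # sum of the item variances
  total_var <- var(rowSums(x))        # variance of the summed scale
  (k / (k - 1)) * (1 - item_var / total_var)
}

alpha_raw <- cronbach_alpha(block5)

# Reverse code Q24, assuming a 1-7 coding
block5$Q24 <- 8 - block5$Q24
alpha_rev <- cronbach_alpha(block5)

print(c(raw = alpha_raw, reversed = alpha_rev))
```

In this toy example alpha is negative before the reversal and high afterwards, which is the pattern one would look for in the real data before deciding whether a scale is defensible.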

The other item in this series is Q26: “What are the top three things you review to decide whether the information in an online post is accurate or not?” Respondents were allowed to provide 3 answers, so response totals will be greater than 100%.

Overall, respondents were most likely to review the “organization that published the content” in order to decide whether an online post is accurate. 61% said they also review whether the information has been shared by other authoritative sources; about half examine the amount of information and the clarity of the article.
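Because each respondent could give up to three answers to Q26, percentages are computed per option over all respondents, and they sum to more than 100%. The sketch below shows one way to do this; the option labels and weights are hypothetical shorthand, not the survey's actual response codes.

```r
# Toy multi-response data: each respondent picks up to three options
picks <- list(
  c("publisher", "shared by authorities", "clarity"),
  c("publisher", "author", "date"),
  c("publisher", "clarity")
)
w <- c(1, 1, 2)  # hypothetical respondent weights

opts <- sort(unique(unlist(picks)))

# Weighted percent of respondents who selected each option
pct <- vapply(opts, function(o) {
  chose <- vapply(picks, function(p) o %in% p, logical(1))
  100 * sum(w[chose]) / sum(w)
}, numeric(1))

print(pct)
print(sum(pct))  # exceeds 100 because respondents give several answers
```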