Analysis_Pipeline

We used the Google Jigsaw Perspective API to quantify the use of harmful language in each post. We then applied Principal Component Analysis (PCA) to the resulting scores and kept only the first principal component (harm_PC1). We also considered the quality of the news sources that users shared (domain_quality_rating).
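As an illustration of the PCA step, the following minimal base R sketch reduces several per-post scores to a single first principal component. The data are simulated, and the attribute names (TOXICITY, SEVERE_TOXICITY, INSULT, PROFANITY, THREAT) are an assumption about which Perspective API scores entered the PCA; the text above does not list them.

```r
# Simulated stand-in for Perspective API scores (values between 0 and 1).
# The attribute names are an assumption, not taken from the analysis above.
set.seed(1)
n <- 500
latent <- rnorm(n)                 # shared "harm" factor driving all scores
to_unit <- function(z) plogis(z)   # squash values onto (0, 1)
scores <- data.frame(
  TOXICITY        = to_unit(latent + rnorm(n, sd = 0.5)),
  SEVERE_TOXICITY = to_unit(latent + rnorm(n, sd = 0.5) - 1),
  INSULT          = to_unit(latent + rnorm(n, sd = 0.5)),
  PROFANITY       = to_unit(latent + rnorm(n, sd = 0.5)),
  THREAT          = to_unit(latent + rnorm(n, sd = 0.5) - 1)
)

# PCA on centered, unit-variance columns; keep only the first component.
pca <- prcomp(scores, center = TRUE, scale. = TRUE)
harm_PC1 <- pca$x[, 1]

# The sign of a principal component is arbitrary, so orient it so that
# higher values mean more harmful language.
if (cor(harm_PC1, scores$TOXICITY) < 0) harm_PC1 <- -harm_PC1

# Share of the total variance captured by the first component.
var_explained <- pca$sdev[1]^2 / sum(pca$sdev^2)
```

Checking `var_explained` is a quick way to judge how much information is lost by keeping only the first component.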

Linkedin

For the LinkedIn data, we have 157,171 posts with a valid URL and a domain quality rating.

For all posts that contain a link to a news website, we fit a linear regression with robust (HC2) standard errors.

library(estimatr)  # provides lm_robust()
Model_Linkedin_1 <- lm_robust(scale(harm_PC1) ~ scale(domain_quality_rating), data = Linkedin_data_clean)
summary(Model_Linkedin_1)

## 
## Call:
## lm_robust(formula = scale(harm_PC1) ~ scale(domain_quality_rating), 
##     data = Linkedin_data_clean)
## 
## Standard error type:  HC2 
## 
## Coefficients:
##                                Estimate Std. Error    t value  Pr(>|t|)
## (Intercept)                   1.036e-14   0.002548  4.066e-12 1.000e+00
## scale(domain_quality_rating) -4.572e-02   0.002672 -1.711e+01 1.429e-65
##                               CI Lower  CI Upper     DF
## (Intercept)                  -0.004995  0.004995 153655
## scale(domain_quality_rating) -0.050955 -0.040481 153655
## 
## Multiple R-squared:  0.00209 ,   Adjusted R-squared:  0.002084 
## F-statistic: 292.8 on 1 and 153655 DF,  p-value: < 2.2e-16
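Because both variables are standardized with scale(), the slope of this one-predictor regression equals the Pearson correlation between the two variables, the intercept is zero up to floating-point error, and R-squared is the squared slope. For the output above, (-0.04572)^2 is about 0.00209, which matches the reported Multiple R-squared. The base R sketch below checks these identities on simulated data; the variables `x` and `y` are hypothetical stand-ins for domain_quality_rating and harm_PC1.

```r
set.seed(42)
n <- 1000
x <- rnorm(n, mean = 50, sd = 10)   # stand-in for domain_quality_rating
y <- 2 - 0.03 * x + rnorm(n)        # stand-in for harm_PC1

# Standardize both variables (as.vector() drops the matrix returned by scale()).
zx <- as.vector(scale(x))
zy <- as.vector(scale(y))

fit <- lm(zy ~ zx)
intercept <- unname(coef(fit)[1])
slope     <- unname(coef(fit)[2])
r         <- cor(x, y)
r2        <- summary(fit)$r.squared

# slope equals r, r2 equals slope^2, and intercept is numerically zero.
c(intercept = intercept, slope = slope, r = r, r2 = r2)
```

Read this way, the coefficients reported for each platform are correlations on a -1 to 1 scale, so an estimate near -0.05 is a weak association even when the p-value is tiny.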

For all posts that contain a link to a news website, we fit a linear regression with robust standard errors clustered on user.

library(fixest)  # provides feols()
Model_Linkedin_2 <- feols(scale(harm_PC1) ~ scale(domain_quality_rating), cluster = ~authorName, data = Linkedin_data_clean)
print(summary(Model_Linkedin_2))

## OLS estimation, Dep. Var.: scale(harm_PC1)
## Observations: 153,657
## Standard-errors: Clustered (authorName) 
##                                   Estimate Std. Error       t value   Pr(>|t|)
## (Intercept)                   1.030000e-14   0.010842  9.530000e-13 1.0000e+00
## scale(domain_quality_rating) -4.571842e-02   0.006906 -6.620338e+00 3.6107e-11
##                                 
## (Intercept)                     
## scale(domain_quality_rating) ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.998951   Adj. R2: 0.002084
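Clustering on user leaves the point estimate unchanged (about -0.0457 in both LinkedIn models) but widens the standard error (0.002672 without clustering versus 0.006906 with clustering), because posts by the same author are not independent observations. The base R sketch below shows the mechanism on simulated data. It uses the plain sandwich formulas without small-sample corrections, so it is a simplification and will not reproduce the estimatr or fixest numbers exactly; all variable names are hypothetical.

```r
set.seed(7)
n_users <- 200
posts_per_user <- 10
user <- rep(seq_len(n_users), each = posts_per_user)

# Both the predictor and the error share a user-level component,
# which is what makes posts within a user correlated.
x <- rnorm(n_users)[user] + rnorm(length(user))
y <- 0.05 * x + rnorm(n_users)[user] + rnorm(length(user))

X <- cbind(1, x)
e <- lm.fit(X, y)$residuals
bread <- solve(crossprod(X))

# Heteroskedasticity-robust (HC0): every post contributes its own score.
meat_hc <- crossprod(X * e)
se_hc <- unname(sqrt(diag(bread %*% meat_hc %*% bread)))

# Cluster-robust: sum the scores within each user before squaring.
scores_by_user <- rowsum(X * e, group = user)
meat_cl <- crossprod(scores_by_user)
se_cl <- unname(sqrt(diag(bread %*% meat_cl %*% bread)))

# Second element is the slope; the clustered SE is the larger one here.
rbind(robust = se_hc, clustered = se_cl)
```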

bluesky

For the Bluesky data, we have 538,361 posts with a valid URL and a domain quality rating.

For all posts that contain a link to a news website, we fit a linear regression with robust (HC2) standard errors.

Model_bluesky_1 <- lm_robust(scale(harm_PC1)~scale(domain_quality_rating), data=BlueSky_data_clean)
print(summary(Model_bluesky_1))

## 
## Call:
## lm_robust(formula = scale(harm_PC1) ~ scale(domain_quality_rating), 
##     data = BlueSky_data_clean)
## 
## Standard error type:  HC2 
## 
## Coefficients:
##                                Estimate Std. Error    t value  Pr(>|t|)
## (Intercept)                  -6.616e-14   0.001363 -4.855e-11 1.000e+00
## scale(domain_quality_rating)  1.076e-02   0.001401  7.680e+00 1.588e-14
##                               CI Lower CI Upper     DF
## (Intercept)                  -0.002671 0.002671 538359
## scale(domain_quality_rating)  0.008015 0.013508 538359
## 
## Multiple R-squared:  0.0001158 , Adjusted R-squared:  0.000114 
## F-statistic: 58.99 on 1 and 538359 DF,  p-value: 1.588e-14
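The "HC2" label in the output refers to a heteroskedasticity-consistent variance estimator that divides each squared residual by one minus its leverage before building the sandwich. The base R sketch below computes HC2 standard errors by hand on simulated heteroskedastic data and compares them with the classical ones; it illustrates the formula and is not the estimatr implementation.

```r
set.seed(3)
n <- 300
x <- rnorm(n)
y <- 0.1 * x + rnorm(n, sd = 0.5 + abs(x))   # error variance grows with |x|

fit <- lm(y ~ x)
X <- model.matrix(fit)
e <- residuals(fit)
h <- hatvalues(fit)                          # leverage of each observation

# HC2 sandwich: (X'X)^-1 [ sum_i x_i x_i' e_i^2 / (1 - h_i) ] (X'X)^-1
bread <- solve(crossprod(X))
meat  <- crossprod(X * (e / sqrt(1 - h)))
se_hc2 <- unname(sqrt(diag(bread %*% meat %*% bread)))

# Classical standard errors assume constant error variance.
se_classical <- unname(sqrt(diag(vcov(fit))))

rbind(classical = se_classical, HC2 = se_hc2)
```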

For all posts that contain a link to a news website, we fit a linear regression with robust standard errors clustered on user.

Model_bluesky_2 <- feols(scale(harm_PC1) ~ scale(domain_quality_rating), cluster = ~username,  data = BlueSky_data_clean)
print(summary(Model_bluesky_2))

## OLS estimation, Dep. Var.: scale(harm_PC1)
## Observations: 538,361
## Standard-errors: Clustered (username) 
##                                   Estimate Std. Error       t value Pr(>|t|) 
## (Intercept)                  -6.460000e-14   0.009315 -6.940000e-12  1.00000 
## scale(domain_quality_rating)  1.076158e-02   0.008677  1.240276e+00  0.21488 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.999941   Adj. R2: 1.14e-4

gab

For the Gab data, we have 93,064 posts with a valid URL and a domain quality rating.

For all posts that contain a link to a news website, we fit a linear regression with robust (HC2) standard errors.

Model_gab_1 <- lm_robust(scale(harm_PC1)~scale(domain_quality_rating), data=Gab_data_clean)
print(summary(Model_gab_1))

## 
## Call:
## lm_robust(formula = scale(harm_PC1) ~ scale(domain_quality_rating), 
##     data = Gab_data_clean)
## 
## Standard error type:  HC2 
## 
## Coefficients:
##                                Estimate Std. Error    t value Pr(>|t|)
## (Intercept)                   3.697e-15   0.003278  1.128e-12  1.00000
## scale(domain_quality_rating) -5.817e-03   0.003038 -1.915e+00  0.05554
##                               CI Lower  CI Upper    DF
## (Intercept)                  -0.006425 0.0064248 93062
## scale(domain_quality_rating) -0.011772 0.0001376 93062
## 
## Multiple R-squared:  3.384e-05 , Adjusted R-squared:  2.309e-05 
## F-statistic: 3.666 on 1 and 93062 DF,  p-value: 0.05554

For all posts that contain a link to a news website, we fit a linear regression with robust standard errors clustered on user.

Model_gab_2 <- feols(scale(harm_PC1) ~ scale(domain_quality_rating), cluster = ~username,  data = Gab_data_clean)
summary(Model_gab_2)

## OLS estimation, Dep. Var.: scale(harm_PC1)
## Observations: 93,064
## Standard-errors: Clustered (username) 
##                                   Estimate Std. Error       t value Pr(>|t|) 
## (Intercept)                   3.590000e-15   0.009312  3.850000e-13  1.00000 
## scale(domain_quality_rating) -5.817091e-03   0.009287 -6.263795e-01  0.53108 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.999978   Adj. R2: 2.309e-5

truthsocial

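For the Truth Social data, we have 223,846 posts with a valid URL. Of these, 3,654 have no domain quality rating and are dropped from the models below, which leaves 220,192 observations.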

For all posts that contain a link to a news website, we fit a linear regression with robust (HC2) standard errors.

Model_truthsocial_1 <- lm_robust(scale(harm_PC1)~scale(domain_quality_rating), data=truthsocial_data_clean)
print(summary(Model_truthsocial_1))

## 
## Call:
## lm_robust(formula = scale(harm_PC1) ~ scale(domain_quality_rating), 
##     data = truthsocial_data_clean)
## 
## Standard error type:  HC2 
## 
## Coefficients:
##                               Estimate Std. Error t value  Pr(>|t|)  CI Lower
## (Intercept)                  0.0009964   0.002145  0.4646 6.423e-01 -0.003207
## scale(domain_quality_rating) 0.0212735   0.002256  9.4293 4.165e-21  0.016852
##                              CI Upper     DF
## (Intercept)                    0.0052 220190
## scale(domain_quality_rating)   0.0257 220190
## 
## Multiple R-squared:  0.0004466 , Adjusted R-squared:  0.000442 
## F-statistic: 88.91 on 1 and 220190 DF,  p-value: < 2.2e-16

For all posts that contain a link to a news website, we fit a linear regression with robust standard errors clustered on user.

Model_truthsocial_2 <- feols(scale(harm_PC1) ~ scale(domain_quality_rating), cluster = ~username,  data = truthsocial_data_clean)

## NOTE: 3,654 observations removed because of NA values (RHS: 3,654).

summary(Model_truthsocial_2)

## OLS estimation, Dep. Var.: scale(harm_PC1)
## Observations: 220,192
## Standard-errors: Clustered (username) 
##                              Estimate Std. Error  t value  Pr(>|t|)    
## (Intercept)                  0.000996   0.007364 0.135303 0.8923735    
## scale(domain_quality_rating) 0.021273   0.006617 3.214968 0.0013066 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 1.00646   Adj. R2: 4.42e-4
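Unlike the other platforms, the Truth Social models have an intercept that is not numerically zero (0.000996) and an RMSE slightly above 1. A likely explanation is the order of operations: scale() inside the formula standardizes harm_PC1 over all 223,846 rows, and only afterwards are the 3,654 rows with a missing rating dropped (223,846 - 3,654 = 220,192), so the outcome no longer has mean exactly 0 in the estimation sample. The base R sketch below reproduces this effect on simulated data; the variable names are hypothetical.

```r
set.seed(11)
n <- 1000
x <- rnorm(n)               # stand-in for domain_quality_rating
y <- 0.2 * x + rnorm(n)     # stand-in for harm_PC1
x[sample(n, 100)] <- NA     # some posts have no rating

# Option A: standardize first, let lm() drop incomplete rows afterwards.
zy_all <- as.vector(scale(y))   # mean 0 over all 1000 rows
zx_all <- as.vector(scale(x))   # mean 0 over the 900 non-missing rows
fit_a <- lm(zy_all ~ zx_all)

# Option B: drop incomplete rows first, then standardize.
keep <- complete.cases(x, y)
zy <- as.vector(scale(y[keep]))
zx <- as.vector(scale(x[keep]))
fit_b <- lm(zy ~ zx)

intercept_a <- unname(coef(fit_a)[1])   # small but not zero
intercept_b <- unname(coef(fit_b)[1])   # zero up to floating-point error
c(option_a = intercept_a, option_b = intercept_b)
```

Filtering to complete cases before calling scale() (option B) would make the Truth Social coefficients directly comparable with those of the other platforms.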

Telegram

For the Telegram data, we have 3,556,345 posts with a valid URL and a domain quality rating (so far).

For all posts that contain a link to a news website, we fit a linear regression with robust (HC2) standard errors.

Model_telegram_1 <- lm_robust(scale(harm_PC1)~scale(domain_quality_rating), data=Telegram_data_clean)
print(summary(Model_telegram_1))

## 
## Call:
## lm_robust(formula = scale(harm_PC1) ~ scale(domain_quality_rating), 
##     data = Telegram_data_clean)
## 
## Standard error type:  HC2 
## 
## Coefficients:
##                                Estimate Std. Error    t value Pr(>|t|)
## (Intercept)                  -5.213e-13  0.0005296 -9.844e-10        1
## scale(domain_quality_rating) -5.185e-02  0.0005389 -9.620e+01        0
##                               CI Lower  CI Upper      DF
## (Intercept)                  -0.001038  0.001038 3556343
## scale(domain_quality_rating) -0.052903 -0.050790 3556343
## 
## Multiple R-squared:  0.002688 ,  Adjusted R-squared:  0.002688 
## F-statistic:  9255 on 1 and 3556343 DF,  p-value: < 2.2e-16

For all posts that contain a link to a news website, we fit a linear regression with robust standard errors clustered on user.

Model_telegram_2 <- feols(scale(harm_PC1) ~ scale(domain_quality_rating), cluster = ~Username,  data = Telegram_data_clean)
summary(Model_telegram_2)

## OLS estimation, Dep. Var.: scale(harm_PC1)
## Observations: 3,556,345
## Standard-errors: Clustered (Username) 
##                                   Estimate Std. Error       t value   Pr(>|t|)
## (Intercept)                  -4.930000e-13   0.018779 -2.620000e-11 1.0000e+00
## scale(domain_quality_rating) -5.184646e-02   0.010419 -4.976303e+00 6.5301e-07
##                                 
## (Intercept)                     
## scale(domain_quality_rating) ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.998655   Adj. R2: 0.002688
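To compare platforms, the clustered estimates reported above can be collected in one table. The base R sketch below copies the estimates and clustered standard errors from the outputs in this document and recomputes the t statistics as estimate divided by standard error. The p-values and confidence intervals use a normal approximation, which is an assumption of this sketch, so they may differ slightly from the values fixest prints.

```r
# Estimates and clustered standard errors copied from the feols outputs above.
clustered <- data.frame(
  platform = c("LinkedIn", "Bluesky", "Gab", "Truth Social", "Telegram"),
  estimate = c(-0.04571842, 0.01076158, -0.005817091, 0.021273, -0.05184646),
  se       = c(0.006906, 0.008677, 0.009287, 0.006617, 0.010419)
)

# t statistic, two-sided p-value, and 95% interval (normal approximation).
clustered$t        <- clustered$estimate / clustered$se
clustered$p_normal <- 2 * pnorm(-abs(clustered$t))
clustered$ci_lower <- clustered$estimate - qnorm(0.975) * clustered$se
clustered$ci_upper <- clustered$estimate + qnorm(0.975) * clustered$se
clustered$significant <- clustered$p_normal < 0.05

print(clustered, digits = 3)
```

With clustering, the association is negative and significant on LinkedIn and Telegram, positive and significant on Truth Social, and not distinguishable from zero on Bluesky and Gab.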
