The Sloan Digital Sky Survey (SDSS) is a large-scale telescope survey that releases its data to the public, including raw data and images. One of its data products is MaStar, the MaNGA Stellar Library. The MaStar data come packaged with several error variables, which raises a question: is it possible to classify errors within the MaStar data based on the surrounding information? All MaStar/MaNGA/SDSS data is publicly available at www.sdss.org.
To begin the analysis, we need to understand the dataset. Here I retrieve a summary of the dataset and its first few rows.
library(tidyverse) # provides read_csv(), select(), group_by(), summarize(), across()
mastar <- read_csv("Data/mastarall.csv", show_col_types = FALSE)
summary(mastar) # Summarize the data
## DRPVER MPROCVER MANGAID PLATE
## Length:59266 Length:59266 Length:59266 Min. : 7443
## Class :character Class :character Class :character 1st Qu.: 8921
## Mode :character Mode :character Mode :character Median : 9800
## Mean :10036
## 3rd Qu.:11254
## Max. :12772
##
## IFUDESIGN MJD IFURA IFUDEC
## Min. : 701 Min. :56739 Min. : 0.0528 Min. :-32.666
## 1st Qu.: 706 1st Qu.:57673 1st Qu.:115.2279 1st Qu.: 7.576
## Median : 711 Median :58119 Median :194.3851 Median : 28.365
## Mean : 3599 Mean :58101 Mean :186.6236 Mean : 27.857
## 3rd Qu.: 6102 3rd Qu.:58582 3rd Qu.:261.0807 3rd Qu.: 47.416
## Max. :12705 Max. :59085 Max. :359.9651 Max. : 87.357
##
## PSFMAG MNGTARG2 EXPTIME NEXP_VISIT
## Length:59266 Min. : 132 Min. : 10.1 Min. : 1.000
## Class :character 1st Qu.: 1280 1st Qu.:900.1 1st Qu.: 3.000
## Mode :character Median : 8390656 Median :900.1 Median : 4.000
## Mean : 27139706 Mean :777.9 Mean : 6.727
## 3rd Qu.: 67108992 3rd Qu.:900.1 3rd Qu.: 6.000
## Max. :134299648 Max. :900.2 Max. :62.000
##
## NVELGOOD HELIOV VERR V_ERRCODE
## Min. : 0 Min. :-512.50 Min. :0.003199 Min. :0
## 1st Qu.: 9 1st Qu.: -47.90 1st Qu.:0.677605 1st Qu.:0
## Median : 12 Median : -10.03 Median :1.124152 Median :0
## Mean : 16 Mean : -17.12 Mean :1.477142 Mean :0
## 3rd Qu.: 17 3rd Qu.: 21.87 3rd Qu.:1.860484 3rd Qu.:0
## Max. :135 Max. : 488.71 Max. :9.987876 Max. :0
##
## HELIOV_VISIT VERR_VISIT V_ERRCODE_VISIT VELVARFLAG
## Min. :-520.317 Min. : 0.0000 Min. :0.0000000 Min. :0.0000
## 1st Qu.: -47.765 1st Qu.: 0.6357 1st Qu.:0.0000000 1st Qu.:0.0000
## Median : -9.952 Median : 1.3248 Median :0.0000000 Median :0.0000
## Mean : -17.002 Mean : 1.9169 Mean :0.0005568 Mean :0.2874
## 3rd Qu.: 22.202 3rd Qu.: 2.4725 3rd Qu.:0.0000000 3rd Qu.:1.0000
## Max. : 489.544 Max. :507.9113 Max. :1.0000000 Max. :1.0000
##
## DV_MAXSIG MJDQUAL BPRPERR_SP NEXP_USED
## Min. : 0.0000 Min. : 0.0 Min. :-999.00000 Min. : 1.000
## 1st Qu.: 0.4372 1st Qu.: 0.0 1st Qu.: 0.00430 1st Qu.: 3.000
## Median : 1.5599 Median : 0.0 Median : 0.00813 Median : 4.000
## Mean : 3.1547 Mean : 624.9 Mean : -41.07281 Mean : 6.693
## 3rd Qu.: 3.3920 3rd Qu.:1024.0 3rd Qu.: 0.01516 3rd Qu.: 6.000
## Max. :169.6528 Max. :3328.0 Max. : 0.05000 Max. :62.000
## NA's :8
## S2N S2N10 BADPIXFRAC RA
## Min. : 2.814 Min. : 34.08 Min. :0.000000 Min. : 0.0528
## 1st Qu.: 63.146 1st Qu.: 83.80 1st Qu.:0.000000 1st Qu.:115.2279
## Median : 95.904 Median : 126.89 Median :0.001096 Median :194.3851
## Mean : 126.010 Mean : 165.48 Mean :0.001126 Mean :186.6236
## 3rd Qu.: 159.397 3rd Qu.: 208.42 3rd Qu.:0.001534 3rd Qu.:261.0808
## Max. :1024.016 Max. :1386.06 Max. :0.019724 Max. :359.9651
##
## DEC EPOCH COORD_SOURCE PHOTOCAT
## Min. :-32.666 Min. :1999 Length:59266 Length:59266
## 1st Qu.: 7.576 1st Qu.:2012 Class :character Class :character
## Median : 28.365 Median :2012 Mode :character Mode :character
## Mean : 27.857 Mean :2010
## 3rd Qu.: 47.416 3rd Qu.:2012
## Max. : 87.357 Max. :2016
##
head(mastar) # Get the first lines of the data
## # A tibble: 6 × 32
## DRPVER MPROCVER MANGAID PLATE IFUDESIGN MJD IFURA IFUDEC PSFMAG MNGTARG2
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 v3_1_1 v1_7_7 5-66031 10001 701 57372 134. 56.4 [17.9317 … 8390656
## 2 v3_1_1 v1_7_7 5-66031 10001 701 57373 134. 56.4 [17.9317 … 8390656
## 3 v3_1_1 v1_7_7 5-12626 10001 702 57372 136. 57.6 [17.3449 … 8390656
## 4 v3_1_1 v1_7_7 5-12626 10001 702 57373 136. 57.6 [17.3449 … 8390656
## 5 v3_1_1 v1_7_7 5-66039 10001 703 57372 134. 57.9 [17.4606 … 8390656
## 6 v3_1_1 v1_7_7 5-66039 10001 703 57373 134. 57.9 [17.4606 … 8390656
## # ℹ 22 more variables: EXPTIME <dbl>, NEXP_VISIT <dbl>, NVELGOOD <dbl>,
## # HELIOV <dbl>, VERR <dbl>, V_ERRCODE <dbl>, HELIOV_VISIT <dbl>,
## # VERR_VISIT <dbl>, V_ERRCODE_VISIT <dbl>, VELVARFLAG <dbl>, DV_MAXSIG <dbl>,
## # MJDQUAL <dbl>, BPRPERR_SP <dbl>, NEXP_USED <dbl>, S2N <dbl>, S2N10 <dbl>,
## # BADPIXFRAC <dbl>, RA <dbl>, DEC <dbl>, EPOCH <dbl>, COORD_SOURCE <chr>,
## # PHOTOCAT <chr>
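The summary and header show that there are many columns. Several are character columns (for example PSFMAG, which stores a bracketed list of magnitudes as text), and the rest are read in as doubles, some of which hold integer-valued quantities such as PLATE and NEXP_VISIT. For this project we keep the MaNGA ID plus a hand-picked subset of the numeric columns: the timing, position, targeting, exposure, and velocity-related variables. That is done below.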
mastar_refined <- mastar |>
select(c(MANGAID, MJD, IFURA, IFUDEC, MNGTARG2, EXPTIME, NEXP_VISIT, NVELGOOD, HELIOV, VERR, HELIOV_VISIT, VERR_VISIT, V_ERRCODE_VISIT, DV_MAXSIG))
mastar_refined
## # A tibble: 59,266 × 14
## MANGAID MJD IFURA IFUDEC MNGTARG2 EXPTIME NEXP_VISIT NVELGOOD HELIOV VERR
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 5-66031 57372 134. 56.4 8390656 900. 3 9 -81.5 4.41
## 2 5-66031 57373 134. 56.4 8390656 900. 6 9 -81.5 4.41
## 3 5-12626 57372 136. 57.6 8390656 900. 3 9 -226. 2.16
## 4 5-12626 57373 136. 57.6 8390656 900. 6 9 -226. 2.16
## 5 5-66039 57372 134. 57.9 8390656 900. 3 9 -257. 1.60
## 6 5-66039 57373 134. 57.9 8390656 900. 6 9 -257. 1.60
## 7 5-108715 57372 133. 58.0 8390656 900. 3 9 -29.8 2.52
## 8 5-108715 57373 133. 58.0 8390656 900. 6 9 -29.8 2.52
## 9 5-108718 57372 133. 58.3 8390656 900. 3 9 14.6 1.48
## 10 5-108718 57373 133. 58.3 8390656 900. 6 9 14.6 1.48
## # ℹ 59,256 more rows
## # ℹ 4 more variables: HELIOV_VISIT <dbl>, VERR_VISIT <dbl>,
## # V_ERRCODE_VISIT <dbl>, DV_MAXSIG <dbl>
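As an alternative to listing columns by hand, the tidyselect helper `where(is.numeric)` can keep every numeric column next to the ID. Below is a minimal sketch on a made-up three-row table (the values are invented, not MaStar data). On the real file this would also keep columns such as PLATE, IFUDESIGN, and S2N, so some would still need to be dropped by name.

```r
library(dplyr)

# Made-up stand-in for a few MaStar columns (not real data)
toy <- tibble(
  MANGAID = c("5-66031", "5-66031", "5-12626"),
  PSFMAG  = c("[17.9 ...]", "[17.9 ...]", "[17.3 ...]"), # stored as text
  MJD     = c(57372, 57373, 57372),
  VERR    = c(4.41, 4.41, 2.16)
)

# Keep the ID column plus every numeric column; character columns are dropped
toy_numeric <- toy |>
  select(MANGAID, where(is.numeric))

names(toy_numeric) # "MANGAID" "MJD" "VERR"
```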
Now we have the columns we need: timing data, error data, astrophysical measurements, and measurements of the stars. Each of these can be used as a predictor in a regression model to look for associations with the error variables, which are VERR, VERR_VISIT, and V_ERRCODE_VISIT. We can still explore the data a bit further first.
For example, we can group the data by star (MANGAID) and average each column, collapsing the 59,266 visit-level rows into 24,290 per-star rows, a reduction to well under half.
mastar_stars <- mastar_refined |>
group_by(MANGAID) |>
summarize(across(everything(), \(x) mean(x, na.rm = FALSE))) # one row per star; other columns averaged over visits
mastar_stars
## # A tibble: 24,290 × 14
## MANGAID MJD IFURA IFUDEC MNGTARG2 EXPTIME NEXP_VISIT NVELGOOD HELIOV VERR
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 13-0 56743. 231. 41.9 1050624 900. 5 15 -60.1 0.850
## 2 13-1 56743. 231. 41.5 1050624 900. 5 15 -65.6 1.00
## 3 13-10 56743. 230. 42.3 1050624 900. 5 15 -197. 1.13
## 4 13-11 56743. 232. 43.6 1050624 900. 5 15 -185. 0.853
## 5 13-2 56743 231. 42.4 1050624 900. 6 15 -16.3 1.11
## 6 13-3 56743. 233. 42.6 1050624 900. 5 15 -211. 1.63
## 7 13-4 56743. 231. 43.0 1050624 900. 5 15 -23.1 1.06
## 8 13-5 56743. 233. 43.3 1050624 900. 5 15 -145. 1.80
## 9 13-6 56743. 231. 43.9 1050624 900. 5 15 -95.4 1.44
## 10 13-7 56743. 229. 42.5 1050624 900. 5 15 25.3 0.746
## # ℹ 24,280 more rows
## # ℹ 4 more variables: HELIOV_VISIT <dbl>, VERR_VISIT <dbl>,
## # V_ERRCODE_VISIT <dbl>, DV_MAXSIG <dbl>
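A toy sketch of what the grouping step does, using made-up visits for two hypothetical stars "A" and "B". Averaging also applies to MJD, which is why per-star MJD values such as 56743. are no longer whole days. The extra `n_visits` column is added here only for illustration and is not part of the table above.

```r
library(dplyr)

# Made-up visit-level data: star A observed twice, star B once
visits <- tibble(
  MANGAID    = c("A", "A", "B"),
  MJD        = c(57372, 57373, 57400),
  VERR_VISIT = c(1.0, 3.0, 2.0)
)

# Collapse to one row per star; across() skips the grouping column
per_star <- visits |>
  group_by(MANGAID) |>
  summarize(across(everything(), \(x) mean(x)), n_visits = n())

per_star # star A: MJD 57372.5, VERR_VISIT 2, n_visits 2
```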
Now we have the data for each individual star, one row per star. There is no need to treat MANGAID as a factor in the regression: with one row per star it would add 24,290 levels, one per observation, leaving no residual degrees of freedom and fitting the data trivially.
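To see why a per-star factor would not work, here is a toy sketch with five made-up stars, one row each. When every row has its own factor level, `lm()` has as many coefficients as observations, so there are no residual degrees of freedom and the fitted values reproduce the data exactly.

```r
set.seed(1)
toy <- data.frame(
  id   = factor(paste0("star-", 1:5)), # hypothetical star IDs
  verr = rnorm(5)                      # made-up velocity errors
)

fit <- lm(verr ~ id, data = toy)

length(coef(fit)) # 5 coefficients for 5 observations
df.residual(fit)  # 0 residual degrees of freedom
```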
The regression analysis uses three linear models, one for each error variable. Let's start by setting up each model with the same set of predictors.
verr <- lm(VERR ~ MJD + IFURA + IFUDEC + MNGTARG2 + EXPTIME + NEXP_VISIT + NVELGOOD + HELIOV + HELIOV_VISIT, data = mastar_stars)
verr_visit <- lm(VERR_VISIT ~ MJD + IFURA + IFUDEC + MNGTARG2 + EXPTIME + NEXP_VISIT + NVELGOOD + HELIOV + HELIOV_VISIT, data = mastar_stars)
verrcode <- lm(V_ERRCODE_VISIT ~ MJD + IFURA + IFUDEC + MNGTARG2 + EXPTIME + NEXP_VISIT + NVELGOOD + HELIOV + HELIOV_VISIT, data = mastar_stars)
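Since all three models share the same right-hand side, the formulas could also be built programmatically with base R's `reformulate()`. This is a sketch under simplifying assumptions: the data frame is simulated and only borrows a few MaStar column names, and only three predictors are used.

```r
set.seed(42)
n <- 200

# Simulated stand-in that borrows a few MaStar column names (not real data)
toy <- data.frame(
  MJD     = runif(n, 56700, 59100),
  EXPTIME = sample(c(10.1, 900.1), n, replace = TRUE),
  HELIOV  = rnorm(n, mean = 0, sd = 50)
)
toy$VERR            <- 1 + 0.001 * toy$EXPTIME + rnorm(n)
toy$VERR_VISIT      <- toy$VERR + rnorm(n)
toy$V_ERRCODE_VISIT <- as.numeric(seq_len(n) %% 50 == 0) # a few rare flags

predictors <- c("MJD", "EXPTIME", "HELIOV")
responses  <- c("VERR", "VERR_VISIT", "V_ERRCODE_VISIT")

# Fit one lm() per response, all with the same right-hand side
models <- lapply(responses, function(y) {
  lm(reformulate(predictors, response = y), data = toy)
})
names(models) <- responses

# R-squared of each fit
sapply(models, function(m) summary(m)$r.squared)
```

Keeping the predictor list in one vector means a refinement step only has to edit that vector, instead of three hand-typed formulas.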
summary(verr)
##
## Call:
## lm(formula = VERR ~ MJD + IFURA + IFUDEC + MNGTARG2 + EXPTIME +
## NEXP_VISIT + NVELGOOD + HELIOV + HELIOV_VISIT, data = mastar_stars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2214 -0.7541 -0.3047 0.3750 9.7019
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.853e+00 9.899e-01 -9.954 < 2e-16 ***
## MJD 1.869e-04 1.682e-05 11.112 < 2e-16 ***
## IFURA -5.489e-04 8.499e-05 -6.458 1.08e-10 ***
## IFUDEC -2.523e-03 3.160e-04 -7.985 1.46e-15 ***
## MNGTARG2 -2.187e-09 1.947e-10 -11.234 < 2e-16 ***
## EXPTIME 1.038e-03 4.858e-05 21.358 < 2e-16 ***
## NEXP_VISIT 9.719e-03 1.947e-03 4.991 6.05e-07 ***
## NVELGOOD -7.419e-03 8.468e-04 -8.760 < 2e-16 ***
## HELIOV -1.409e-02 2.609e-03 -5.401 6.67e-08 ***
## HELIOV_VISIT 1.327e-02 2.611e-03 5.081 3.79e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.231 on 24280 degrees of freedom
## Multiple R-squared: 0.08324, Adjusted R-squared: 0.0829
## F-statistic: 244.9 on 9 and 24280 DF, p-value: < 2.2e-16
summary(verr_visit)
##
## Call:
## lm(formula = VERR_VISIT ~ MJD + IFURA + IFUDEC + MNGTARG2 + EXPTIME +
## NEXP_VISIT + NVELGOOD + HELIOV + HELIOV_VISIT, data = mastar_stars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31.209 -0.900 -0.277 0.484 237.987
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.239e+01 1.736e+00 -7.139 9.68e-13 ***
## MJD 2.291e-04 2.950e-05 7.768 8.31e-15 ***
## IFURA -6.734e-04 1.490e-04 -4.519 6.24e-06 ***
## IFUDEC -3.531e-03 5.540e-04 -6.373 1.89e-10 ***
## MNGTARG2 -3.004e-09 3.413e-10 -8.799 < 2e-16 ***
## EXPTIME 1.637e-03 8.517e-05 19.215 < 2e-16 ***
## NEXP_VISIT 1.474e-03 3.414e-03 0.432 0.666
## NVELGOOD -1.528e-03 1.485e-03 -1.029 0.303
## HELIOV -1.819e-01 4.574e-03 -39.755 < 2e-16 ***
## HELIOV_VISIT 1.809e-01 4.579e-03 39.502 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.159 on 24280 degrees of freedom
## Multiple R-squared: 0.1162, Adjusted R-squared: 0.1159
## F-statistic: 354.8 on 9 and 24280 DF, p-value: < 2.2e-16
summary(verrcode)
##
## Call:
## lm(formula = V_ERRCODE_VISIT ~ MJD + IFURA + IFUDEC + MNGTARG2 +
## EXPTIME + NEXP_VISIT + NVELGOOD + HELIOV + HELIOV_VISIT,
## data = mastar_stars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.02681 -0.00130 -0.00061 0.00007 0.99917
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.913e-02 1.593e-02 -1.201 0.22988
## MJD 4.075e-07 2.708e-07 1.505 0.13231
## IFURA -6.475e-06 1.368e-06 -4.734 2.21e-06 ***
## IFUDEC 9.355e-06 5.085e-06 1.840 0.06581 .
## MNGTARG2 -1.350e-11 3.133e-12 -4.309 1.65e-05 ***
## EXPTIME -2.406e-06 7.817e-07 -3.078 0.00208 **
## NEXP_VISIT 4.386e-05 3.134e-05 1.400 0.16162
## NVELGOOD -8.253e-05 1.363e-05 -6.056 1.42e-09 ***
## HELIOV -1.223e-04 4.199e-05 -2.913 0.00358 **
## HELIOV_VISIT 1.234e-04 4.202e-05 2.936 0.00333 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01982 on 24280 degrees of freedom
## Multiple R-squared: 0.003767, Adjusted R-squared: 0.003398
## F-statistic: 10.2 on 9 and 24280 DF, p-value: 7.594e-16
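Before reading too much into the significance stars above, it helps to see how sample size alone drives p-values. Here is a simulated sketch (not MaStar data) with the same number of rows as `mastar_stars` and a deliberately tiny true effect.

```r
set.seed(2024)
n <- 24290                # same number of rows as mastar_stars
x <- rnorm(n)
y <- 0.05 * x + rnorm(n)  # a deliberately tiny true effect

fit <- lm(y ~ x)
s <- summary(fit)

coef(s)["x", "Pr(>|t|)"] # p-value: far below 0.05
s$r.squared              # R-squared: well under 1%
```

The slope comes out highly significant while R-squared stays tiny, a more extreme version of the pattern in the MaStar fits, where highly significant predictors coexist with R-squared values of 12% or less.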
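With 24,290 stars in each fit, most of the predictors reach conventional significance, but a sample this large makes even very small effects statistically significant, so significance alone says little about how much each variable matters. Not every predictor is significant, either: in the VERR_VISIT model, NEXP_VISIT (p = 0.666) and NVELGOOD (p = 0.303) are not, and in the V_ERRCODE_VISIT model the intercept, MJD, IFUDEC (p = 0.066), and NEXP_VISIT are not significant at the 0.05 level. For each model, let's keep only the stronger predictors: VERR keeps all nine (so its fit is unchanged), VERR_VISIT drops NEXP_VISIT and NVELGOOD, and V_ERRCODE_VISIT keeps just IFURA, MNGTARG2, EXPTIME, and NVELGOOD: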
verr <- lm(VERR ~ MJD + IFURA + IFUDEC + MNGTARG2 + EXPTIME + NEXP_VISIT + NVELGOOD + HELIOV + HELIOV_VISIT, data = mastar_stars)
verr_visit <- lm(VERR_VISIT ~ MJD + IFURA + IFUDEC + MNGTARG2 + EXPTIME + HELIOV + HELIOV_VISIT, data = mastar_stars)
verrcode <- lm(V_ERRCODE_VISIT ~ IFURA + MNGTARG2 + EXPTIME + NVELGOOD, data = mastar_stars)
summary(verr)
##
## Call:
## lm(formula = VERR ~ MJD + IFURA + IFUDEC + MNGTARG2 + EXPTIME +
## NEXP_VISIT + NVELGOOD + HELIOV + HELIOV_VISIT, data = mastar_stars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2214 -0.7541 -0.3047 0.3750 9.7019
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.853e+00 9.899e-01 -9.954 < 2e-16 ***
## MJD 1.869e-04 1.682e-05 11.112 < 2e-16 ***
## IFURA -5.489e-04 8.499e-05 -6.458 1.08e-10 ***
## IFUDEC -2.523e-03 3.160e-04 -7.985 1.46e-15 ***
## MNGTARG2 -2.187e-09 1.947e-10 -11.234 < 2e-16 ***
## EXPTIME 1.038e-03 4.858e-05 21.358 < 2e-16 ***
## NEXP_VISIT 9.719e-03 1.947e-03 4.991 6.05e-07 ***
## NVELGOOD -7.419e-03 8.468e-04 -8.760 < 2e-16 ***
## HELIOV -1.409e-02 2.609e-03 -5.401 6.67e-08 ***
## HELIOV_VISIT 1.327e-02 2.611e-03 5.081 3.79e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.231 on 24280 degrees of freedom
## Multiple R-squared: 0.08324, Adjusted R-squared: 0.0829
## F-statistic: 244.9 on 9 and 24280 DF, p-value: < 2.2e-16
summary(verr_visit)
##
## Call:
## lm(formula = VERR_VISIT ~ MJD + IFURA + IFUDEC + MNGTARG2 + EXPTIME +
## HELIOV + HELIOV_VISIT, data = mastar_stars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31.150 -0.900 -0.278 0.483 238.008
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.258e+01 1.726e+00 -7.286 3.28e-13 ***
## MJD 2.319e-04 2.936e-05 7.899 2.93e-15 ***
## IFURA -6.591e-04 1.478e-04 -4.458 8.32e-06 ***
## IFUDEC -3.555e-03 5.530e-04 -6.429 1.31e-10 ***
## MNGTARG2 -3.005e-09 3.408e-10 -8.819 < 2e-16 ***
## EXPTIME 1.651e-03 5.092e-05 32.422 < 2e-16 ***
## HELIOV -1.818e-01 4.572e-03 -39.755 < 2e-16 ***
## HELIOV_VISIT 1.808e-01 4.576e-03 39.502 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.159 on 24282 degrees of freedom
## Multiple R-squared: 0.1162, Adjusted R-squared: 0.1159
## F-statistic: 456.1 on 7 and 24282 DF, p-value: < 2.2e-16
summary(verrcode)
##
## Call:
## lm(formula = V_ERRCODE_VISIT ~ IFURA + MNGTARG2 + EXPTIME + NVELGOOD,
## data = mastar_stars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.00560 -0.00120 -0.00057 -0.00003 0.99833
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.842e-03 6.351e-04 9.198 < 2e-16 ***
## IFURA -6.492e-06 1.332e-06 -4.875 1.09e-06 ***
## MNGTARG2 -1.330e-11 3.113e-12 -4.273 1.94e-05 ***
## EXPTIME -3.506e-06 5.428e-07 -6.459 1.07e-10 ***
## NVELGOOD -7.324e-05 1.195e-05 -6.128 9.03e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01982 on 24285 degrees of freedom
## Multiple R-squared: 0.00312, Adjusted R-squared: 0.002956
## F-statistic: 19 on 4 and 24285 DF, p-value: 1.291e-15
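One way to check whether dropping predictors was justified is an F-test between nested models with `anova()`. In the simulated sketch below only `a` truly matters. On the real data this would mean keeping the full and reduced fits under separate, hypothetical names (for example `verr_visit_full` and `verr_visit_reduced`) rather than overwriting `verr_visit`.

```r
set.seed(7)
n <- 500
toy <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
toy$y <- 2 * toy$a + rnorm(n) # simulated: only `a` affects y

full    <- lm(y ~ a + b + c, data = toy)
reduced <- lm(y ~ a, data = toy)

# F-test: do b and c jointly add explanatory power beyond a?
cmp <- anova(reduced, full)
cmp
```

A large p-value in the second row suggests the dropped predictors can be left out without a meaningful loss of fit.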
It is important to note that the R-squared values, both multiple and adjusted, are low. Even after refinement, the models explain at most about 12% of the variance (VERR_VISIT, R-squared = 0.116), with VERR at about 8% and V_ERRCODE_VISIT at only 0.3%. That is too little to claim a strong relationship, and a regression on observational data cannot establish causation in any case. Before drawing that conclusion, though, consider how rare the flagged errors are:
# Are the R-squared values actually that low for a variable this sparse?
# Understanding how often errors occur helps put them in context.
error_percent <- 100 * mean(mastar_stars$V_ERRCODE_VISIT != 0) # % of all stars with any flagged visit
error_percent
## [1] 0.1193907
So for the error codes specifically, the picture is less bleak than the R-squared alone suggests. Very few stars are flagged, only 29 of 24,290 (about 0.12%), so being able to explain even a small amount of that variation may be useful.
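The share of flagged stars should be measured against all stars; dividing the flagged count by the unflagged count gives an odds-style ratio instead. With so few flags the two numbers are almost identical here, but a made-up vector of per-star error codes shows the difference.

```r
# Made-up per-star mean error codes: 2 of 10 stars have a flagged visit
codes <- c(0, 0, 0, 0.5, 0, 1, 0, 0, 0, 0)

pct_of_all <- 100 * mean(codes != 0)                  # share of all stars: 20
ratio      <- 100 * sum(codes != 0) / sum(codes == 0) # flagged per unflagged: 25

c(pct_of_all = pct_of_all, ratio = ratio)
```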
Next let’s talk about the model diagnostics.
# As an example, take one of the strongest predictors of VERR_VISIT, our best-explained error variable
plot(mastar_stars$HELIOV, mastar_stars$VERR_VISIT,
xlab="HELIOV (km/s)", ylab="VERR_VISIT", main="Heliocentric Velocity vs Errors")
# abline() on the multi-predictor model would use only its first two coefficients
# (the intercept and the MJD slope), so draw a one-predictor fit of VERR_VISIT on HELIOV instead
abline(lm(VERR_VISIT ~ HELIOV, data = mastar_stars), col=1, lwd=2)
Looking at the graph of heliocentric velocity against VERR_VISIT, there is something of a small pattern. The data seem to suggest that VERR_VISIT tends to be larger when the heliocentric velocity is near 0 and again around -200 km/s, with many VERR_VISIT values above 10. Let's look at the residuals.
# Residuals in row order (after summarize(), rows are sorted by MANGAID, not by date)
plot(resid(verr), type="b", main="VERR Residuals vs Order", ylab="Residuals")
abline(h=0, lty=2)
plot(resid(verrcode), type="b", main="V_ERRCODE_VISIT Residuals vs Order", ylab="Residuals")
abline(h=0, lty=2)
plot(resid(verr_visit), type="b", main="VERR_VISIT Residuals vs Order", ylab="Residuals")
abline(h=0, lty=2)
# Residual Plotting
par(mfrow=c(2,2)); plot(verr); par(mfrow=c(1,1))
par(mfrow=c(2,2)); plot(verrcode); par(mfrow=c(1,1))
par(mfrow=c(2,2)); plot(verr_visit); par(mfrow=c(1,1))
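The residual-versus-order plots show no obvious trend. Note, however, that after group_by() and summarize() the rows are sorted by MANGAID rather than by date, so these plots are not a direct test for a time trend. MJD is also a highly significant predictor in both the VERR and VERR_VISIT models (t = 11.1 and 7.9 after refitting), so before removing it from any model it would be worth checking whether the fit changes, for example by comparing the models with and without MJD.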
Additionally, the VERR_VISIT diagnostic plots look the best of the three, with most points following the model's line and large clumps in the middle where the majority of observations occur. These plots could be tidied by removing outliers, but characterizing the outliers is the point of this project, so we keep them.
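A related subtlety when looking for a time trend in residuals: ordinary least squares residuals are, by construction, uncorrelated with every predictor in the model. Plotting residuals against MJD for a model that already contains MJD therefore cannot reveal a linear time trend; one has to look at a model without it. A simulated sketch (not MaStar data):

```r
set.seed(3)
n <- 300
# Simulated data with a real time trend in y
d <- data.frame(mjd = runif(n, 56700, 59100), x = rnorm(n))
d$y <- 1e-3 * (d$mjd - 56700) + d$x + rnorm(n)

with_mjd    <- lm(y ~ mjd + x, data = d)
without_mjd <- lm(y ~ x, data = d)

# Essentially zero (up to rounding) because mjd is in the model
cor(resid(with_mjd), d$mjd)
# Clearly positive: the time trend is left in the residuals
cor(resid(without_mjd), d$mjd)
```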
So, now that we have three models with varying explanatory power, we should choose one to build on. My choice is the VERR_VISIT model, since it is based on the velocity error from each individual visit to the star by the observatory and has the highest R-squared of the three (0.116). V_ERRCODE_VISIT is almost always zero, a credit to good engineering and to the astronomers operating the telescope, and VERR is explained less well (R-squared 0.083), so VERR_VISIT is the best shot we have at characterizing the errors themselves.
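From this investigation, we can refine the models further to get a more detailed picture of what drives changes in VERR_VISIT. The data suggest that the heliocentric velocity terms are the strongest predictors (|t| of about 40 for both HELIOV and HELIOV_VISIT), and the scatter plot hints that VERR_VISIT tends to be larger near heliocentric velocities of 0 and -200 km/s. Because the two velocity coefficients are nearly equal and opposite (-0.182 and +0.181), the model may effectively be responding to the difference between HELIOV_VISIT and HELIOV rather than to either velocity alone.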
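Because HELIOV and HELIOV_VISIT both describe a star's velocity, it may be worth checking how collinear they are before interpreting their coefficients. Here is a minimal sketch that computes variance inflation factors by hand, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The data are simulated, with a per-visit velocity that closely tracks the per-star one; the `car` package's `vif()` function is a common ready-made alternative for numeric predictors.

```r
set.seed(11)
n <- 1000
# Simulated stand-ins (not MaStar data)
heliov       <- rnorm(n, 0, 100)
heliov_visit <- heliov + rnorm(n, 0, 2) # nearly a copy of heliov
exptime      <- rnorm(n, 800, 100)      # unrelated predictor
X <- data.frame(heliov, heliov_visit, exptime)

# Regress each predictor on the others and convert its R-squared to a VIF
vif_by_hand <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

round(vif_by_hand, 1)
```

VIF values far above 10 are commonly read as a sign of strong collinearity; on the real data one would pass the predictor columns of `mastar_stars` instead of the simulated `X`.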
Funding for the Sloan Digital Sky Survey V has been provided by the Alfred P. Sloan Foundation, the Heising-Simons Foundation, the National Science Foundation, and the Participating Institutions. SDSS acknowledges support and resources from the Center for High-Performance Computing at the University of Utah. SDSS telescopes are located at Apache Point Observatory, funded by the Astrophysical Research Consortium and operated by New Mexico State University, and at Las Campanas Observatory, operated by the Carnegie Institution for Science. The SDSS web site is www.sdss.org.
SDSS is managed by the Astrophysical Research Consortium for the Participating Institutions of the SDSS Collaboration, including the Carnegie Institution for Science, Chilean National Time Allocation Committee (CNTAC) ratified researchers, Caltech, the Gotham Participation Group, Harvard University, Heidelberg University, The Flatiron Institute, The Johns Hopkins University, L’Ecole polytechnique fédérale de Lausanne (EPFL), Leibniz-Institut für Astrophysik Potsdam (AIP), Max-Planck-Institut für Astronomie (MPIA Heidelberg), Max-Planck-Institut für Extraterrestrische Physik (MPE), Nanjing University, National Astronomical Observatories of China (NAOC), New Mexico State University, The Ohio State University, Pennsylvania State University, Smithsonian Astrophysical Observatory, Space Telescope Science Institute (STScI), the Stellar Astrophysics Participation Group, Universidad Nacional Autónoma de México (UNAM), University of Arizona, University of Colorado Boulder, University of Illinois at Urbana-Champaign, University of Toronto, University of Utah, University of Virginia, Yale University, and Yunnan University.