MVA Class 2 — Distance

Class Logistics and Introduction

  • Class recordings will be available for students to rewatch as needed; at the start, the instructor briefly paused the recording because of an issue with their cat (00:00:28).
  • The instructor discovered typos and mistakes in the syllabus on the day of the class, but they are minor and the online version is correct (00:02:08).
  • The class covers multivariate analysis and machine learning, with a focus on multivariate statistics taught from a modern angle (00:02:47).
  • There will be a full machine learning class in the spring that will cover more material, but this class will still cover some machine learning topics (00:02:51).

Instructor and Communication

  • The instructor’s name is Jaber, and students can contact him via direct message on Teams, as email and phone may not be reliable (00:03:30).

Course Content and Overview

  • The class covers multivariate data from a modern perspective, and the instructor will add to the existing knowledge in the field (00:04:23).
  • An example of multivariate data is text analytics, such as comparing coded qualitative interviews, which will be covered later in the semester (00:05:08).
  • Image analysis is a topic that the instructor wants to incorporate into the class but finds it too difficult due to data handling requirements (00:05:59).
  • A grayscale image can be thought of as a matrix of numbers (pixel intensities), which makes image analysis possible and lets images be treated as multivariate data (00:06:18).
  • Color images are more complicated than grayscale images and require advanced math to work with, and will be touched upon in the context of color vision and preference data (00:06:41).
  • Preference data involves comparing different stimuli and deciding which one is preferred, with preference relationships being potentially complicated (00:07:03).
  • Educational and psychological test data, such as the PHQ-9 or state accountability tests, are examples of multivariate data that are usually analyzed with methods like item response theory (IRT) or structural equation modeling (00:07:29).
  • Genetic studies, such as an epigenetic experiment with 49 subjects and 7,000 genes, can be analyzed using multivariate analysis (00:08:02).
  • Biosensor data, such as electroencephalography (EEG) data recorded at multiple sites on the head, can be analyzed using multivariate analysis (00:08:42).
  • Allometry, the study of shape and how it changes with size, can be applied to understand how things change over time, such as how children grow and develop (00:09:01).
  • Regression analysis often involves predicting one outcome from a multivariate space of predictors, and understanding the relationships among the predictors can help diagnose issues like collinearity (00:09:54).
  • Multivariate data often require data reduction to be visualized and analyzed, since a large number of variables, such as 7,000, cannot be visualized directly (00:10:23).
  • Data reduction lowers the dimensionality of the data to make it more manageable, and it also removes noise and redundancy (00:10:27).
  • The course will cover multivariate analysis methods, focusing on linear algebra, and will also touch on other mathematical concepts, with the goal of familiarizing students with important math for multivariate analysis (00:11:13).

Statistical Approach and Methods

  • In statistics, having a grounding in the substance of what is being studied is crucial for doing good analysis and answering useful and important questions (00:11:34).
  • There are different schools of thought in statistics. This course takes the approach of the French and Dutch schools, which emphasize theoretically motivated data description over formal statistical inference (00:12:37).
  • This differs from the Anglo-American school, which focuses on hypothesis testing; hypothesis testing will not be the primary focus of the course (00:12:56).
  • The course will cover topics such as resampling statistics, bootstrapping, and other methods. It will not focus on topics like multivariate analysis of variance (MANOVA), which is more relevant to other classes such as HLM or SEM (00:13:05).
  • The French school of statistics was known for working on smaller problems with smaller data sets; this course will cover both small and large data sets (00:14:11).
  • The methods covered in the course have been developed for high-dimensional environments, where the number of variables exceeds the number of observations, and will address issues that arise in such situations (00:15:08).
  • The course will spend time thinking about how to deal with more variables than cases, which is a common problem in high-dimensional data analysis (00:15:42).
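The "more variables than cases" problem can be seen directly. With N cases, the sample covariance matrix has rank at most N - 1. When there are more variables than cases, that matrix is singular and cannot be inverted, which breaks any method that needs its inverse. The sketch below uses Python/NumPy rather than the R used in class, and the data are simulated, not from the course.

```python
import numpy as np

def covariance_rank(n_obs: int, n_vars: int, seed: int = 0) -> int:
    """Rank of the sample covariance matrix for simulated data of shape (n_obs, n_vars)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_obs, n_vars))
    S = np.cov(X, rowvar=False)  # n_vars x n_vars covariance, denominator n_obs - 1
    return int(np.linalg.matrix_rank(S))

# Fewer cases than variables (N < P): 10 cases, 50 variables.
rank_small = covariance_rank(10, 50)
# More cases than variables, like the 259 x 6 face data.
rank_large = covariance_rank(200, 6)
print(rank_small, rank_large)
```

With 10 cases, the 50 x 50 covariance matrix has rank at most 9, so it is singular. With 200 cases and 6 variables it is (almost surely) full rank.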

Multivariate Analysis and Machine Learning

  • The border between multivariate analysis and machine learning is not clearly defined, and the last third of the class will cover machine learning topics that build upon previously studied concepts (00:15:47).

Class Recordings and Access

  • The class is recorded, and a Zoom link will be provided after a couple of hours of processing, usually posted on the day of class or the day after (00:16:26).

Prerequisites and Software

  • Course prerequisites include familiarity with algebra; linear algebra, including matrices and related topics, will be covered in class (00:17:01).
  • The class will use R. Students are recommended to have prior knowledge of R or to take an introduction to R beforehand, although the R code used in class will be kept simple (00:17:32).
  • Students should be familiar with R data types, such as vectors, matrices, and data frames, as well as type coercion between them (for example, as.numeric() or as.matrix()) (00:18:25).

In-Class Activities and Code

  • The instructor will post a link to the code for in-class activities, usually a day before or on the day of class, which will cover 90-95% of the material (00:19:04).
  • The instructor will occasionally update the code and post a fixed version if mistakes are found during class (00:19:38).
  • Students are encouraged to follow along with the code during class (00:19:51).
  • The instructor will be doing examples in the last hour of class and encourages students to follow along and ask questions (00:19:55).

R Packages

  • Several packages will be used in the class, including stats, match, M, psycho, R, FactoMineR, vegan, smacof, ellipse, and anacor (00:20:10).
  • FactoMineR will be used the most, as it does 80% of what is needed and has some really nice features; there are also extra packages for machine learning tasks that integrate nicely with FactoMineR (00:20:37).

Recording Policy and Availability

  • The instructor does not delete recordings; instead, they remain available until everyone loses access to them, and Zoom auto-deletes them after a while (00:21:39).
  • Students will have access to the recordings throughout the semester, but are asked to be judicious about sharing them with others (00:21:56).
  • If the recorder is not turned on or something happens during class, the instructor will redo the class and make a new recording (00:22:39).
  • There are also additional videos available that provide a more leisurely tour through matrices, which will be posted for students to access (00:23:01).
  • The videos can be watched at 1.5× speed for review, and the instructor assumes that students will use them as a resource (00:23:26).
  • The videos will be available in the video section in Teams; because of changes to Zoom’s video retention policy, the instructor no longer needs to upload them manually (00:23:39).
  • Links to the Zoom recordings will be posted in the video section of the Team, along with the passcode needed to access them (00:24:32).

Assessment and Grading

  • The course assessment will consist of three problem sets, which will be due in late September, early November, and during finals week, and the first assignment will be uploaded soon (00:27:52).
  • The problem sets are not cumulative, but the material is, so it is essential to understand the earlier material to make sense of the later material (00:28:11).
  • The instructor often uses the same data sets multiple times, so students will work with the same data set in different assignments to gain experience and see how it is used (00:28:27).
  • Homework assignments will be handled through Microsoft Teams. Students upload their responses as PDF files, and grades are returned through the platform (00:29:59).
  • If there are issues uploading files to Microsoft Teams, students can email their files to the instructor, who will figure out a solution (00:30:29).
  • Students are required to submit their code in a separate file, which can be included at the end of the PDF if desired (00:30:46).
  • It is essential to interpret results and not send uninterpreted output or code, as the point of the assignment is to analyze and think about the results, not just send code (00:31:01).
  • If assignments are submitted without interpretation, they will be returned and marked late (00:31:25).
  • There is no need to submit assignments as R Markdown documents, as some cases may cause R Markdown to fail and require complicated workarounds (00:31:49).
  • The instructor may use R’s try() function in R Markdown so that a failing step does not stop the whole document, which can help students understand how to handle failures in their analyses (00:32:03).
  • Students may draw graphs with a stylus or with pencil and paper (00:33:05).
  • The instructor plans to switch to Brightspace, a new learning management system (LMS), which is considered better than Blackboard Learn (00:29:21).
  • When submitting assignments, put problems in the specified order, but it doesn’t matter what order they are completed in, just ensure they are in the correct order in the document (00:33:12).
  • Merge separate PDFs and optimize the file to avoid extremely large documents; files over 100 megabytes are unnecessary for this class (00:33:30).
  • The scoring rubric is straightforward, with guidelines for a good answer including reasonable work shown, explanation of reasoning, and not needing to be overly lengthy (00:34:02).
  • Clearly indicate which parts are written reasoning, computer code (input), and computer output. Do not mistake volume for quality, as shorter answers are preferred (00:34:52).
  • Late assignments will be penalized 10% if submitted more than a week past the due date, but it’s essential to stay on top of homework to avoid falling behind in the cumulative material (00:35:48).
  • Grading follows the Grad Center standard, which may be subject to change but has remained consistent in the past (00:36:20).
  • Incompletes are no longer loosely handled by the Grad Center, and there is no discretion for the instructor to make exceptions, so it’s essential to follow the rules (00:36:41).
  • If an incomplete or unofficial withdrawal is needed, it’s crucial to have a conversation about it early on and discuss a plan with the department chair (00:37:17).
  • If a student is having trouble keeping up with the coursework, they should inform their advisor or someone above their level, so that their instructors can be informed and necessary arrangements can be made (00:37:49).
  • Incompletes and Withdrawals (WUs) are governed by Grad Center rules, and students should familiarize themselves with these rules (00:38:37).
  • International students or students with funding dependencies should consult with their advisor and relevant offices before taking any incompletes or withdrawals (00:39:00).

Course Structure and Topics

  • The course is roughly divided into four parts, with the first five classes covering concepts such as proximities, distances, matrices, covariance matrices, and related quantities (00:39:47).
  • The course will cover five multivariate techniques: principal component analysis (PCA), multidimensional scaling (MDS), correspondence analysis, clustering, and discriminant analysis (00:41:22).
  • These techniques are mostly unsupervised learning methods, except for discriminant analysis, which is a supervised learning method (00:41:39).
  • The course reading materials will be uploaded to the General files section, and students can request additional materials if needed (00:41:59).
  • The five multivariate techniques covered in the course are variations on a theme, with PCA being a general form that relates to the other techniques (00:42:24).
  • The topics to be covered are related to each other in various ways, and the goal is to provide a comprehensive viewpoint on how they are connected (00:42:33).
  • The topics include regularization, dealing with situations where N < P (fewer observations than variables), and the autoencoder, a machine learning method closely related to PCA (00:42:51).
  • An autoencoder can be viewed as a generalization of principal component analysis. It will be used to explore aspects of statistical inference, including jackknifing, bootstrapping, and permutation (00:43:09).
  • Statistical inference will be covered, focusing on computationally intensive methods, and measuring performance using cross-validation, ROC, precision, and recall (00:43:41).
  • Cross-validation is a method for measuring the performance of models by taking a small chunk of the data out, fitting the model on the rest, and then seeing how well it predicts the held-out data (00:44:15).
  • The last example to be covered is spam filtering, a text analytics application that draws on several techniques, including cross-validation and Mahalanobis distances (00:44:54).
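Cross-validation and Mahalanobis distance can be combined in a small example. The sketch uses a nearest-class-mean rule with a pooled within-class covariance, which is closely related to linear discriminant analysis. Each of k folds is held out in turn, the rule is fitted on the remaining data, and the held-out fold is scored. The example is in Python/NumPy rather than R. The two-group data are simulated, and the helper names (`fit_class_means`, `kfold_accuracy`) are hypothetical, not from the course code.

```python
import numpy as np

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of point x from mean, given an inverse covariance matrix."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

def fit_class_means(X, y):
    """Per-class means and the inverse of the pooled within-class covariance."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    resid = np.vstack([X[y == c] - means[c] for c in classes])
    pooled = resid.T @ resid / (len(X) - len(classes))
    return means, np.linalg.inv(pooled)

def predict(X, means, cov_inv):
    """Assign each row to the class whose mean is nearest in Mahalanobis distance."""
    labels = list(means)
    return np.array([
        labels[int(np.argmin([mahalanobis(x, means[c], cov_inv) for c in labels]))]
        for x in X
    ])

def kfold_accuracy(X, y, k=5, seed=0):
    """k-fold cross-validation: hold out each fold, fit on the rest, score the held-out fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = []
    for test_idx in folds:
        train = np.ones(len(X), dtype=bool)
        train[test_idx] = False
        means, cov_inv = fit_class_means(X[train], y[train])
        scores.append(np.mean(predict(X[test_idx], means, cov_inv) == y[test_idx]))
    return float(np.mean(scores))

# Simulated two-group data (hypothetical, not the course data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 3)), rng.normal(3.0, 1.0, size=(100, 3))])
y = np.array([0] * 100 + [1] * 100)
acc = kfold_accuracy(X, y)
print(round(acc, 3))
```

Because every prediction is scored on data the rule did not see during fitting, the cross-validated accuracy is a more honest estimate of performance than accuracy on the training data.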

Syllabus and Learning Approach

  • The syllabus may be subject to change, and there may be a guest speaker for the spam filtering topic (00:45:28).
  • When reading the syllabus, it’s recommended not to get bogged down in the details, but rather to skim and revisit important points as needed (00:46:02).
  • It’s recommended to ask questions in class and approach the work without feeling the need to know everything, as no one can know everything (00:46:51).

Course Materials and Resources

  • All necessary files for the class will be placed in the “General Files” and “Class Materials” sections, making it easier to find the required materials (00:47:47).
  • If students need to message the instructor, they can use the chat option in Teams or send an email, but Teams alerts are more reliable, especially on mobile devices (00:48:22).
  • Homework assignments will be posted in the “Assignments” section, and students will be required to upload their work there (00:49:01).
  • If hidden channels appear in Teams, students should notify the instructor, who will make the channel visible (00:49:13).
  • Teams is primarily designed for business use and is not a learning management system (LMS), but it is still a useful tool for the class (00:49:37).

Example Dataset and Analysis

  • The instructor will be using R code and data from a book by Bernhard Flury, a specialist in multivariate analysis, to work through an example (00:50:14).
  • The book by Bernhard Flury is highly recommended, but it is challenging and has a lot of mathematical content (00:50:36).
  • Although the book is not assigned, it is available for students to use, and the instructor will be using some examples from it in the class (00:51:41).
  • A dataset from the book contains facial measurements collected in the 1980s to design better-fitting masks (00:52:08).
  • The data were found to be interesting and relevant during the COVID-19 pandemic, despite being old, as they could be used to understand physical measurements and the challenges associated with them (00:53:27).
  • The dataset is simpler compared to more complex data used in facial recognition or automated emotion capture, but it still provides insight into physical measurements (00:54:16).
  • The measurements in the dataset are referred to as “landmarks,” which are used to track specific points on the face (00:56:20).
  • The reason landmarks are used is that some facial measurements, such as the distance from the nose to the tip, do not change significantly, making them useful for analysis (00:56:36).
  • A professor at New York University, Ian Reed, was mentioned as someone who worked on facial expression modeling, and another professor, Josh Aronson, was mentioned as someone known for his work in the field (00:55:28).
  • The conversation also mentioned a professor who worked on micro-expression detection research, analyzing videos to detect subtle facial expressions (00:55:05).
  • Landmark measures are typically used to track facial features that do not change from one expression to another, making them useful for tasks like creating masks that fit different face shapes (00:56:50).
  • These measures can be used to focus on the parts of the face that change, rather than tracking extra features, and are essential for creating accurate masks (00:57:08).
  • The measurements include distances from the point of the chin to the ear, the bridge of the nose to the ear, and other similar metrics (00:57:23).
  • A study measured 25 variables in 900 members of the Swiss Army, but only six of these variables are considered most important for the fit of protective masks (00:58:01).
  • The six key variables are defined in Figure 1.5, and data are available for 200 male soldiers and 59 female soldiers (00:58:11).
  • The data will be used for discriminant analysis to determine if it can distinguish between males and females, and if there is enough information to generate a good analysis for each gender (00:58:31).
  • The data and textbooks are available in PDF format, including works by authors such as Husson, Lê, Pagès, Härdle, and Simar (00:59:12).
  • The software FactoMineR will be used for analysis, and it provides a rundown of its capabilities and examples (00:59:45).
  • The necessary data and code have been uploaded, including the FluryHData package, which contains the required datasets (01:00:21).
  • To install the FluryHData package, users need to install it from a local repository, selecting the package archive file and installing it (01:00:52).
  • The data sets used are not particularly interesting from a modern perspective, but they serve as useful teaching examples (01:01:38).
  • The data set is organized into two pieces: males and females, with a binary indicator variable “female” where 0 represents men and 1 represents women (01:02:12).
  • The data set has 259 rows and 7 columns, and it is a data frame rather than a matrix because the “female” variable was made a factor (01:03:40).
  • A factor in R is a coded categorical variable with a special data type (01:04:11).
  • Warnings in R can be turned off to avoid repetitive messages, especially when resizing plot windows, but it’s essential to turn them back on to catch important warnings (01:04:50).
  • To avoid excessive warnings, add a line at the beginning of the file that turns warnings off (for example, options(warn = -1)), and turn them back on (options(warn = 0)) when needed (01:05:35).
  • The screen is usually arranged to prioritize the code, with the data sets section minimized since it’s already familiar (01:06:00).
  • Multivariate analysis involves developing special methods to handle larger data sets with higher dimensionality than those typically handled in introductory statistics (01:06:39).
  • In introductory statistics, smaller data sets are usually dealt with, but as data sets grow, it becomes necessary to cope with larger dimensionality and more variables (01:06:52).
  • Descriptive statistics is crucial for identifying problems in data, such as incorrect or impossible entry values, and should be run when working with a new data set, regardless of the complexity of the analysis (01:07:44).
  • Descriptive statistics and low-dimensional views of the data (single variables and pairs of variables) help make higher-dimensional data understandable, making it easier to identify patterns and issues (01:07:15).
  • The summary() function in R provides a lot of information, including the minimum, quartiles, median, mean, and maximum of numeric variables (but not standard deviations). It can be overwhelming, and it mixes different types of variables (01:09:27).
  • The summary function treats factors, such as the variable “female”, differently than other variables, providing counts instead of descriptive statistics (01:10:24).
  • The measurements in the data set are in millimeters and appear to be plausible, with no negative or zero values (01:10:58).
  • A general rule in statistics is that if the mean and median differ significantly, the data is likely not symmetric, but in this case, the mean and median are similar (01:11:26).
  • The term “marginal” in statistics refers to statistics computed while ignoring (averaging over) another variable, such as summaries that pool females and males; marginal and group-wise summaries give different perspectives on the data (01:11:55).
  • The means and standard deviations of different facial measurements are being analyzed, with the means varying and the standard deviations not differing strongly from each other, except for a few cases where they are almost twice as large (01:12:36).
  • Physical measurements tend to have larger standard deviations for bigger measurements, which is a common phenomenon (01:13:25).
  • One measurement with a higher standard deviation is minimal frontal breadth (MFB), with a value of 6.8, compared with 4.4 for LGAN (01:13:50).
  • The correlations between the measurements are all positive, reflecting a notion called size: people who are larger on one measurement tend, on average, to be larger on the others (01:15:28).
  • The correlations are not extremely large, with the largest being around 0.6 and the smallest being around 0.1 (01:15:45).
  • When analyzing the means and standard deviations separately for males and females, it is found that male means are larger than female means, but female standard deviations are larger than male standard deviations, which is an unusual finding (01:16:48).
  • The differences in means and standard deviations between males and females suggest that it may not be appropriate to analyze the data with males and females merged together (01:17:29).
  • The analysis of the data will be done separately and then merged to see the results from different approaches (01:17:34).
  • A question is raised about the possibility of the standard deviation of the female sample going down if a larger sample size were obtained, given that the current sample size is smaller (01:17:43).
  • It is acknowledged that in smaller samples, outliers can significantly impact the results, and a larger sample size may reduce the effect of these outliers (01:18:22).
  • The importance of using the “hand up” feature in the online class is emphasized to facilitate communication and avoid confusion (01:18:36).
  • The conclusion drawn from the analysis is that the male sample is reliable, but the female sample is too small to draw conclusive results (01:19:24).
  • The data will be revisited throughout the semester to explore different methods of analysis (01:19:36).
  • The possibility of using an autoencoder for outlier identification is mentioned, as autoencoders are effective in identifying outliers (01:20:12).
  • The importance of having a sufficient sample size is emphasized, and it is suggested that a sample size of 59 women may not be enough to draw conclusive results (01:20:47).
  • The limitations of the data are acknowledged, including that they date from the 1980s and that this subset may not be representative of the original, larger data set (01:21:22).
  • The correlations between variables were analyzed separately for males and females, revealing a negative correlation in the female data set that was not present in the male or mixed data sets (01:22:24).
  • The correlations in the female data set seemed lower compared to the male data set, while the male correlations were slightly higher (01:23:07).
  • When two disparate groups are mixed, the pooled correlation can be inflated because the differences between the group means add covariation that is not present within either group; this effect is referred to as “groupness” (01:23:35).
  • The correlation can be affected by the way the groups are mixed, and some of the correlation may be due to the differences between the groups (01:24:00).
  • Box plots are used to visualize the data, and they are oriented around quantiles rather than means (01:25:42).
  • The notch in the box plot represents an approximate confidence interval for the median; if the notches of two groups do not overlap, that suggests their medians differ (01:25:40).
  • The box plot shows more variability in the female data set compared to the male data set (01:26:08).
  • Exploratory data analysis, such as the one being done, is essential for understanding the data set and catching potential problems or issues (01:26:21).
  • Quantile plots can be used to further analyze the data, and in this case, the plots will be broken down by males and females (01:26:46).
  • QQ plots are used to check normality visually, helping to determine whether the data appear normally distributed (01:26:54).
  • The QQ plot for the whole data set shows that it does not appear to be Gaussian, but within each group, the data appear to be Gaussian (01:27:13).
  • Simulation envelopes show how far the sample quantiles could plausibly deviate from the reference line by chance if the data were Gaussian (01:27:43).
  • The car package in R can create QQ plots with its qqPlot() function (abbreviated qqp()); base R also provides qqnorm() (01:27:58).
  • The data appear non-Gaussian marginally, but only because they are a mix of two populations (women and men) that differ strongly on the variable being tested (01:28:49).
  • The car package has a companion book, An R Companion to Applied Regression, that contains useful information and examples (01:29:34).
  • The car package also provides an enhanced scatterplot matrix (scatterplotMatrix()) that shows all pairs of variables, with quantile plots on the diagonal and probability ellipses (01:29:57).
  • The R Commander package can help create graphs and code, especially for complex functions, and can be used to interface with the FactoMineR package (01:30:36).
  • FactoMineR commands can be complicated, with many options, which makes them hard to figure out without a GUI that lays out the options in a helpful way (01:32:00).
  • A plot of the data shows a negative correlation and potential issues with the MFB variable, which may not be a problem but is worth noting (01:32:42).
  • The plot also shows that females tend to be smaller than males, with some possible outliers (01:33:47).
  • Ellipses in the plot are useful for diagnosing variability, but this type of analysis is not feasible with a large number of variables (01:33:39).
  • The car team is praised for its well-designed regression plots and visualizations, but these are not suitable for large datasets (01:34:24).
  • The heatmap() function in base R performs cluster analysis on cases and variables separately, reordering rows and columns so that more similar variables and cases appear next to each other (01:35:13).
  • The heatmap shows that LGAN has the smallest mean and LTG has the largest mean, a pattern the cluster analysis picks up (01:36:34).
  • The variables LTG, LGAN, and the rest are analyzed, and it’s found that LTG has the largest value, while the rest are somewhat similar to each other, but the cluster analysis isn’t very useful in this case (01:36:48).
  • When the variables have very different means, that difference dominates the picture; subtracting each variable’s mean removes this trend so the data can be looked at again (01:37:15).
  • The scale() function in R centers and/or scales the columns of a matrix by subtracting the column means and optionally dividing by the column standard deviations (01:37:35).
  • The sweep() function is a more general tool that applies an operation (subtracting, dividing, multiplying, and so on) using a vector of statistics, row-wise or column-wise (01:38:18).
  • The scale function is used to remove the overall mean from each column, and then a heatmap is generated to show which variables seem to be more or less similar to each other (01:38:49).
  • The heatmap shows pockets of variables that are very similar to each other, such as TFH (true facial height) and MFB (minimal frontal breadth), which are very low and very small, respectively (01:39:35).
  • The FactoMineR package is introduced. It will be used a lot throughout the course and has an ecosystem of associated packages, including missMDA for handling missing data in multivariate analysis (01:40:09).
  • Factoshiny is a package that provides interactive visualizations and can be deployed to a web page and run on the web (01:41:03).
  • FactoInvestigate is a package that generates automated interpretations of the results (01:41:25).
  • The automatic interpretation can be produced as an editable Microsoft Word document with automated plots, in either English or French (01:41:35).
  • The analysis method used is principal component analysis (PCA), a simple multivariate method that helps show how variables are related (01:42:27).
  • In this analysis, an indicator variable is used to identify whether a participant is male or female, without separating the data by gender (01:42:49).
  • PCA analyzes the rows of the data, which represent the participants, and generates a plot showing the relationship between the participants and the variables (01:43:21).
  • The analysis shows that females are associated with Dimension 1, while males are associated with Dimension 2, with some variables more strongly associated with one dimension than the other (01:43:33).
  • The number of dimensions is limited by the number of variables, which is six in this case, and the analysis provides information on the association between the variables and the dimensions (01:44:01).
  • The variables used in the analysis include LGAN (length from glabella to apex nasi), TFH (true facial height), and others, which are correlated with each other (01:44:25).
  • The analysis provides additional information, such as variable percentage breakdowns and correlations, which can be accessed through a summary report (01:45:22).
  • FactoMineR integrates cluster analysis (for example, hierarchical clustering on principal components via HCPC()) to help interpret the results of principal component analysis and other methods (01:45:51).
  • The cluster analysis groups the individuals based on their characteristics, with females and males forming separate clusters (01:46:17).
  • The variables in the dataset are associated with face shape, with Dimension 2 essentially contrasting shorter, rounder faces with taller, narrower ones (01:46:53).
  • The clusters were made based on the person, using the full dataset but projecting it down into the first two dimensions for visualization (01:47:59).
  • The two axes on the PCA graph represent the two PCA components, which can be interpreted to understand how the variables are associated with each other (01:48:58).
  • The variables BAM and MFB are closely associated with each other, while another measure is associated with Dimension 2, representing the height of the face (01:50:07).
  • The other measures are more associated with pure size, and in an allometric study, the size dimension is often removed to focus on subsequent dimensions (01:50:51).
  • The importance of considering face shape and size is illustrated by the example of wearing masks, where different shaped faces can affect the fit of the mask (01:51:24).
  • Different types of masks fit better or worse, and the variables are strongly associated with the first dimension, primarily oriented around overall size, with examples including BAM and MFB, while LGAN and TFH are more about shape and breadth of angular men (01:51:48).
  • The association between variables is determined by angles, with wider angles resulting in less association, and this is shown by correlations such as cosiness (01:52:54).
  • The number of pairwise correlations grows rapidly with the number of variables: six variables give 15 correlations, and 50 variables give 1,225, which is difficult to understand and manage (01:53:10).
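A quick check of how fast the number of distinct pairwise correlations grows: for p variables it is p(p - 1)/2, the number of entries above the diagonal of a p x p correlation matrix. The helper name is hypothetical.

```python
def n_pairwise_correlations(p: int) -> int:
    """Number of distinct off-diagonal entries in a p x p correlation matrix."""
    return p * (p - 1) // 2

for p in (6, 50):
    print(p, n_pairwise_correlations(p))
# prints:
# 6 15
# 50 1225
```

Because the count grows quadratically, doubling the number of variables roughly quadruples the number of correlations to inspect.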
  • To manage this, the correlations are “squished down” into a smaller space that is easier to cope with; this is essentially what principal component analysis (PCA) does, reducing the variable space to a few dimensions (01:53:51).
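The "squishing down" can be illustrated by rebuilding all pairwise correlations from just the first k dimensions: R is approximated by L_k L_kᵀ, where L_k holds each variable's correlations with the first k dimensions. This is a minimal sketch on synthetic data generated from two hypothetical latent dimensions; approx_correlations is an illustrative helper, not a library function.

```python
import numpy as np

def approx_correlations(R, k):
    """Rank-k approximation of a correlation matrix from its top-k principal axes:
    R ~ L_k L_k^T, where L_k holds variable-dimension correlations (loadings)."""
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:k]                # k largest eigenvalues
    L = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0, None))
    return L @ L.T

rng = np.random.default_rng(4)
f = rng.normal(size=(200, 2))                            # two synthetic latent dimensions
W = rng.normal(size=(2, 6))
X = f @ W + 0.3 * rng.normal(size=(200, 6))              # six observed variables
R = np.corrcoef(X, rowvar=False)
R2 = approx_correlations(R, k=2)                         # 6 x 2 loadings stand in for 15 correlations
max_err = np.abs(R - R2)[np.triu_indices(6, k=1)].max()  # worst error over the 15 pairs
```

Keeping all six dimensions reproduces R exactly; keeping fewer trades accuracy for a much smaller picture.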
  • The concept of variable space is also referred to as “person space” (01:54:07).
  • Probability ellipses can be used to show the distribution of cases, for example ellipses containing 95% of the male cases and 95% of the female cases, which help identify where the groups overlap and where they do not (01:54:31).
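The idea behind a 95% probability ellipse can be sketched under a bivariate normal assumption: a point lies inside the ellipse when its squared Mahalanobis distance from the group mean is at most the 0.95 quantile of a chi-square distribution with 2 degrees of freedom, which is exactly -2 ln(0.05), about 5.991. The two groups below are synthetic and the helper name is hypothetical; the ellipses drawn by the software may be constructed differently.

```python
import numpy as np

# For 2-D normal data, squared Mahalanobis distance ~ chi-square with 2 df,
# whose 0.95 quantile is exactly -2 * ln(0.05), about 5.991.
CHI2_2DF_95 = -2.0 * np.log(0.05)

def inside_95_ellipse(points, mean, cov):
    """True for points inside the 95% probability ellipse of a bivariate normal."""
    diff = np.asarray(points) - mean
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)  # squared Mahalanobis
    return d2 <= CHI2_2DF_95

rng = np.random.default_rng(5)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
group_a = rng.multivariate_normal([0.0, 0.0], cov, size=500)   # synthetic group
group_b = rng.multivariate_normal([2.0, 1.0], cov, size=500)   # synthetic group
ellipse_a = (group_a.mean(axis=0), np.cov(group_a, rowvar=False))
share_a_in_a = inside_95_ellipse(group_a, *ellipse_a).mean()   # close to 0.95
share_b_in_a = inside_95_ellipse(group_b, *ellipse_a).mean()   # overlap with group A
```

The share of group B falling inside group A's ellipse gives a rough numeric sense of the overlap that the plot shows visually.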
  • A shiny app can be used to generate reports and provide graphical options, allowing users to select the language, graphs, and other features they want, and can also perform clustering and generate automated reports (01:56:01).
  • The report generated by the software will contain a lot of information, giving a good idea of what the analysis entails, and it’s recommended to let the software write the report to get a better understanding (01:57:23).
  • It’s suggested not to trust the software completely for analysis, but rather use it to get suggestions and help, as it’s very good at providing useful insights and ideas (01:57:40).
  • The software lets users take the graphs and modify them as needed, making it a really cool tool for analysis (01:58:07).
  • The software may take some time to generate reports, and it’s not uncommon for it to time out, but it’s usually fine and can be run again (01:58:25).
  • Multivariate statistics are used to manage large datasets that can’t be handled manually, and the software is helpful in this regard (01:59:12).
  • FactoMineR is a recommended R package; its authors have done excellent work, and it offers many useful options and supporting material (01:59:37).
  • FactoMineR has a plugin for R Commander and can be used through Shiny, making it a versatile tool (01:59:49).
  • The software is generally well-regarded and has a lot of useful features, making it a good choice for analysis (02:00:05).
  • It’s recommended to try the software out personally, as it may work better with fewer tabs open in the browser (02:00:59).
  • The instructor usually provides the R file in advance, but it might not be the final version, and it can be found under the “file” section, often linked in the schedule (02:03:08).
  • The instructor plans to upload the R file used in the class (02:02:28).
  • In-person class sessions will also be available online, so students who cannot attend in person can listen and participate remotely (02:03:45).
  • Students who prefer to attend in person rather than virtually can still do so, but the class will not be held in a computer lab (02:04:18).
  • The in-person class is currently held in the Ed Edge educational psychology conference room, as the instructor found that using computer labs was problematic due to issues with downloading packages (02:04:34).
  • Students are encouraged to bring their laptops to the in-person class, but if they do not have one, they can watch along with someone who does or watch the class online later (02:05:03).
  • The class recording will be available on the video channel after a couple of hours of processing, and the link will be shared for those who want to watch it again or review specific parts (02:06:47).
  • The next class will be held in the Grad Center, room 3204, and students who cannot physically attend are encouraged to participate remotely (02:07:08).
  • The next class will cover the topic of proximities and distances (02:07:35).
  • There are two assigned readings for the next class: an article on distances and another article that applies the concept of distances, co-authored by the instructor (02:07:43).
  • The instructor rarely assigns their own work but made an exception for this article, which was written with a few graduate students a couple of years ago during the COVID-19 pandemic (02:07:48).
  • The instructor will contact Peter separately and bids farewell to the rest of the class (02:08:24).