Data is rapidly becoming the backbone of decision-making across industries, and the field of data analytics has become a passion of mine, especially as it applies to public education. This collection of works represents the culmination of my academic journey in the Master of Science in Data Analytics program at UHD. Throughout this program, I have engaged in many different projects and collaborated with many different people. The program has helped me hone my skills in data collection, analysis, visualization, and interpretation, which has allowed me to address complex problems with data-driven solutions.
This compilation showcases a variety of projects that span different domains, methodologies, and analytical techniques. Each project was designed to tackle real-world challenges, providing insights that can influence strategic decision-making in fields such as education, healthcare, finance, and more. This work reflects not only my technical proficiency but also my ability to apply analytical thinking to derive meaningful insights from raw data.
In these projects, I employed a range of tools and technologies, from traditional statistical methods to cutting-edge machine learning algorithms. This collection illustrates my journey from understanding fundamental concepts to mastering advanced analytics techniques. As you explore my works, you will encounter a blend of rigorous analysis, innovative problem-solving, and thoughtful application of data science principles. It is my hope that these works not only reflect my achievements as a student but also my readiness to contribute meaningfully to the field of data analytics in my future endeavors.
I began the Master of Science in Data Analytics program in spring of 2023. In my first semester I took Statistical Foundations for Data Analytics (STAT 5301) and Programming Foundations for Data Analytics (CS 5301). Neither class required a project, so my collection of works begins in summer of 2023 and continues through summer of 2024.
Class: Multivariate Analysis and Nonparametric Statistics (STAT 5311)
Professor: Dr. Kendra Mhoon
Team Members: Vanessa De La Cruz, Juan Robledo, Fernanda Lehaci, Greg Ige, Dalya Ali
Task: Analyze body fat data and create a prediction model
Data: Body fat of 252 men
Coding Language: JMP SAS
Models: Multiple Regression, Logistic Regression, and Discriminant Analysis
Class: Applied Regression Analysis (STAT 5310)
Professor: Dr. Dexter Cahoy
Team Members: Vanessa De La Cruz, Jane Okon-Jeffrey, Jessica Zapata, Dalya Ali
Task: In-depth regression analysis/modeling with an adequate model fit
Data: Red and white wine quality
Coding Language: R Markdown, PDF or MS Word
Models: Regression Models
Class: Data Mining (CS 5310)
Professor: Dr. Pablo Rondon
Team Members: Amee Stevenson, Jeffrey Moreland, Dalya Ali
Task: Lithofacies classification using well logs: AutoGluon & Xgbtune
Data: Data of two wells drilled in a complex geological setting
Coding Language: Python (Jupyter Notebook)
Models: Auto machine learning algorithms (AutoGluon & Xgbtune)
Class: Information Visualization (CS 6301)
Professor: Dr. Ting Zhang
Team Members: Ryan Kelly, Dalya Ali
Task: PowerPoint Presentation using Tableau
Data: Student Performance in exams
Coding Language: Tableau
Models: At least 6 different visualization elements in Tableau
Class: Project Management (TCOM 5340)
Professor: Dr. Joseph Sample
Team Members: Rene Cantu, Faisal Asif, Mohammed Khan, Yasmin Noor, Dalya Ali
Task: PowerPoint Presentation on fulfilling a client need
Data: Student Performance in STAAR Interim at Willowridge High School
Coding Language: Python (Jupyter Notebook)
Models: Pandas dataframe
Class: Database Management Systems (CS 5318)
Professor: Dr. Shengli Yuan
Team Members: Tiebing Luo, Amrutha Nanduri, Dalya Ali
Task: Create a client database system
Data: Restaurant data with consumer ratings
Coding Language: MySQL
Models: ER diagram (ERD) normalized to 3NF, SQL statements
Class: Biostatistics (STAT 6312)
Professor: Dr. Kendra Mhoon
Team Members: Jane Okon-Jeffrey, Amrutha Nanduri, Bhanu Kiran, John Feng, Dalya Ali
Task: PowerPoint Presentation on child obesity
Data: Using BMI to determine obesity in children
Coding Language: Microsoft Excel
Models: Regression Analysis
I am torn between several options for this course. However, my first choice is my Applied Regression Analysis project (STAT 5310). I would like to expand on it because of how much coding I learned during the class, but I have mixed feelings about it because the data set is not something I am excited about. In regard to its coverage of the data analytics workflow, I believe it covers all aspects. The data preparation was minimal, since the data was a Kaggle collection that required no cleaning; the data analysis used some of the most complex algorithms I had worked with; the reporting/visualizations required data mining, a class I had not yet taken; and for our results, our group concluded that there was a possibility of reducing costs. One way I could expand on the project would be through visualizations, since my knowledge was limited at the time. Another would be to use automated machine learning algorithms, which I was unaware of at the time. I am not yet sure exactly how, but that could be a possible expansion.
My second project of choice would be the Project Management (TCOM 5340) assignment. Unlike the previous choice, I am very excited about the data. In the project I was assigned as the client liaison, which entailed finding the client and having weekly meetings with them. Our client was the school that I work at, Willowridge High School, and my direct contact was the principal. I got to work with an amazing team to analyze my campus's data and try to assist my data-driven campus, which is in a Campus Improvement Plan. It became a passion project of mine that I would like to expand on. As for the data analytics workflow, again, I believe it covers every step. The data preparation was more complex than in any other project I did. In order to be FERPA compliant, I had to create a key of aliases at the campus and then provide the team with the anonymized data. I had to revert it back when presenting to the client. One concern I have is the data analysis piece: it was performed in Python, and I am not sure how that would work since this class uses R Markdown. A possible expansion on the project would be to evaluate how accurate the prediction was, because at the time the students had not yet taken STAAR. I now have the results and am interested in seeing the comparison.
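The alias-key step described above can be illustrated with a minimal sketch. This is not the code used in the project; the function names, the `student_id` and `interim_score` fields, and the sample records are all hypothetical, and a real FERPA workflow would also need to remove names and other direct identifiers before sharing.

```python
import secrets


def build_alias_key(student_ids, prefix="S"):
    """Map each real ID to a unique alias (hypothetical helper).

    Alias numbers are assigned in a random order so that an alias
    does not reveal a student's position in the original roster.
    """
    unique_ids = list(dict.fromkeys(student_ids))  # de-duplicate, keep order
    numbers = list(range(1, len(unique_ids) + 1))
    secrets.SystemRandom().shuffle(numbers)
    width = max(4, len(str(len(unique_ids))))
    return {sid: f"{prefix}{n:0{width}d}" for sid, n in zip(unique_ids, numbers)}


def apply_key(records, key, id_field="student_id"):
    """Return copies of the records with the ID field translated through key."""
    return [{**row, id_field: key[row[id_field]]} for row in records]


def invert_key(key):
    """Swap the mapping so aliases can be translated back to real IDs."""
    return {alias: real for real, alias in key.items()}


# Made-up sample data, for illustration only.
records = [
    {"student_id": "1001", "interim_score": 62},
    {"student_id": "1002", "interim_score": 48},
    {"student_id": "1003", "interim_score": 75},
]

key = build_alias_key(row["student_id"] for row in records)  # stays on campus
anonymized = apply_key(records, key)                  # shared with the team
restored = apply_key(anonymized, invert_key(key))     # for the client meeting

print(anonymized)
print(restored == records)
```

Keeping the key in a single dictionary means the same mapping drives both directions, so the anonymized data handed to the team and the restored data shown to the client cannot drift apart.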
The third and final choice is the Biostatistics (STAT 6312) project. Although it is highly unlikely that I will pick this one, I included it simply because you asked for three projects. I feel that it is a bit too simple, and I really want to focus on refining my coding (my first choice) or expanding on my career path in education (my second choice). It does cover all steps of the data analytics workflow, from data preparation to sharing our findings/visualizations. I could possibly expand the project with some predictive/data mining analysis. But again, this is the bottom of my three choices.
Yasserh. (n.d.). Wine quality dataset. Kaggle. https://www.kaggle.com/datasets/yasserh/wine-quality-dataset
Montgomery, D. C., Peck, E. A., & Vining, G. G. (2012). Introduction to linear regression analysis (5th ed.). Wiley.
Dunphy, L. M., Winland-Brown, J. E., Porter, B. O., & Thomas, D. J. (2023). Primary care: The art and science of advanced practice nursing - An interprofessional approach (6th ed.). Jones & Bartlett Learning.
National Institute of Diabetes and Digestive and Kidney Diseases. (n.d.). Overweight & obesity statistics. National Institutes of Health. https://www.niddk.nih.gov/health-information/health-statistics/overweight-obesity
Lantz, B. (2015). Machine learning with R (2nd ed.). Packt Publishing.
Jones, B. (2014). Communicating data with Tableau. O’Reilly Media.
Huang, Y., Wen, X., Zhan, W., Zhang, W., & Shen, X. (2023). Hierarchical automated machine learning (AutoML) for advanced unconventional reservoir characterization. ResearchGate. https://www.researchgate.net/publication/373386938_Hierarchical_automated_machine_learning_AutoML_for_advanced_unconventional_reservoir_characterization