class: center, middle, inverse, title-slide .title[ # Select a Project of your Choice ] .institute[ ### UNSW Data Science Hub ] .date[ ### 25-11-2024 ] --- layout: true <div class="my-footer"><span>UNSW Data Science Hub / <a href='https://www.unsw.edu.au/research/udash'>uDASH</a></span></div> <!-- this adds the link footer to all slides, depends on my-footer class in css-->
--- class: inverse, center, middle # Lifecycle of your Data Science Project --- # Lifecycle of your Data Science Project - Problem description & data acquisition - Data preparation - Data cleaning - Transformation - Exploratory data analysis (EDA) - Data modeling - Visualisation and communication --- class: inverse, middle # Projects -
Psychology: Emotions during the pandemic -
Business insights: Small automotive business -
Entertainment: Video Games reviews -
Economics: Big economic data -
Climate: Weather & Air Quality in Schools - <img src="data:image/png;base64,#images/crab-svgrepo-com.svg" alt="Mud Crab Icon" style="width: 1em; height: 1em;"/> Ecology: Mud Crab tracking --- class: inverse, center, middle #
Psychology: Emotions during the pandemic >In studies of emotions, a robust finding is that there is a relationship between aging and improved emotional experience. It's a well-recognized fact that older people often report higher levels of emotional wellbeing compared to younger people. ***During the pandemic, did older people still feel better emotionally than younger ones?*** --- ## Emotions ... 945 participants with 107 variables! - Primary variables: 16 positive and 13 negative emotions - frequency and intensity of positive emotions - frequency and intensity of negative emotions - average frequency and intensity of positive and negative emotions - age - Other variables (personality traits): - Openness, - Conscientiousness - Extraversion - Agreeableness and - Emotional Stability --- ## Emotions ... #### Data Data and metadata available here: [
//osf.io/h7uqv/](https://osf.io/h7uqv/). You can download this file from R! #### Cleaning and transformation Select and rename your variables. #### Explore Range of ages, mean and extreme values of intensity and frequency of emotions, personality traits. #### Model Do emotions increase or decrease with age? Are there differences between groups? --- ## Emotions ... #### Visualise .pull-left[ <img src="data:image/png;base64,#Project_datasets_files/figure-html/emo-eda-1.png" width="100%" /> ] .pull-right[ <img src="data:image/png;base64,#Project_datasets_files/figure-html/emo-viz-1.png" width="100%" /> ] --- class: inverse, center, middle #
Business insights: Small automotive business > By using data analytics, small businesses can gain insights into their customers, their products, and their operations. This information can help them make better decisions about how to run their business. ***Which customers contribute the most income throughout the year?*** --- ## Small business ... This is a dataset of the jobs performed by a small automotive business located in Botany in 2010-2020. The dataset is itemised by each job performed and further by invoice number, so several jobs (and several cars) can appear on a single invoice. The dataset also contains basic information about each car (make, model, date of last service, odometer reading), information about the date and cost of the job, and the postcode of the customer. - Variables of interest: - Date of the job - Cost of the job - Other variables: - Postcode of customer - Car make and model - Date of last service - Odometer reading --- ## Small business ... #### Data The data will be provided by uDASH staff. #### Cleaning and transformation The dataset was entered by hand and thus has inconsistencies and omissions that will require cleaning. Some data cleaning has already been performed. #### Explore Histograms to show the spread and frequency of vehicle repair costs. Transform data to improve visualisation. #### Model Linear regression inquiry: Does the year of manufacture of a vehicle influence its repair costs? --- ## Small business ... #### Visualise .pull-left[ <img src="data:image/png;base64,#Project_datasets_files/figure-html/smallbiz-eda-1.png" width="100%" /> ] .pull-right[ <img src="data:image/png;base64,#Project_datasets_files/figure-html/smallbiz-viz-1.png" width="100%" /> ] --- class: inverse, center, middle #
Entertainment: Video Games Data > The video game industry has grown from a niche into the mainstream. It now makes more revenue than the international film industry. It has influenced the advance of personal computers with sound cards, graphics cards and 3D graphic accelerators. ***Do Metacritic scores reflect popularity among users?*** --- ## Video Games ... The dataset presented here contains the metascores of video games from 2000 to 2018. - Variables of interest: - metascore: averaged scores from reviews - userscore: averaged scores from user ratings - Other variables: - Console name - Release date - Name of the game --- ## Video Games ... #### Data The data has been assembled from two datasets: - Metacritic scores [
kaggle](https://www.kaggle.com/datasets/destring/metacritic-reviewed-games-since-2000), - Video game sales [
kaggle](https://www.kaggle.com/datasets/gregorut/videogamesales) #### Cleaning and transformation Transform userscore from character to numeric, transform date from character to date, group consoles, etc. #### Explore Boxplots of variables #### Model Multiple linear regression --- ## Video Games ... #### Visualise .pull-left[ <img src="data:image/png;base64,#Project_datasets_files/figure-html/game-eda-1.png" width="100%" /> ] .pull-right[ <img src="data:image/png;base64,#Project_datasets_files/figure-html/game-viz-1.png" width="100%" /> ] --- class: inverse, center, middle #
Economics: Big economic data > FRED-MD is a comprehensive collection of economic data compiled monthly by the Federal Reserve. This data is ideal for anyone interested in how the economy has changed over time. With information dating from 1959 to 2023, it covers a wide range of topics like employment, stock market prices, and production, offering a detailed look at the U.S. economy's history. ***What drives economic cycles?*** --- ## Big economic data ... - Variable of interest: choose any from - stock market returns, - unemployment rate, - inflation rate, etc. - Other variables: - financial indices, - housing, - money and credit, etc. Start with a small subset of variables, and add more if you like. --- ## Big economic data ... #### Data Data available from the [
FRED homepage](https://research.stlouisfed.org/econ/mccracken/fred-databases/). You can use a function provided by uDASH staff to get the data you need. #### Cleaning and transformation Select a manageable collection of key economic indicators, exclude outliers. #### Explore Univariate statistics, time series. #### Model Which factors have the most influence on the unemployment rate's ups and downs? --- ## Big economic data ... #### Visualise .pull-left[ <img src="data:image/png;base64,#Project_datasets_files/figure-html/eco-eda-1.png" width="100%" /> ] .pull-right[ <img src="data:image/png;base64,#Project_datasets_files/figure-html/eco-viz-1.png" width="100%" /> ] --- class: inverse, center, middle #
Climate: Weather & Air Quality in Schools > Good air quality in Sydney was something that many of us took for granted until the Black Summer Bushfires in 2019-2020. From serious health effects to general inconvenience, it was suddenly hard to ignore just how important the air we breathe truly is. However, fires aren’t the only phenomenon that can impact our air quality: urbanisation, pollution, and weather events can all influence our health and wellbeing. ***Can we predict air quality in Sydney using information about the weather?*** --- ## Weather & Air Quality ... This dataset contains 115 different variables documenting local weather and air quality measured at 12 different sites around Sydney. - Weather-related variables: - temperature `t`, relative humidity `rh`, - air pressure `p`, rainfall `rain`, - wind direction `wd`, and wind speed `ws` - Air-quality-related variables: - concentrations of nitrogen dioxide `no2`, sulphur dioxide `so2`, - carbon monoxide `co`, ozone `o3`, - particulate matter <2.5μm `pm25`, and <10μm `pm10`. Since measurements are taken every 20 minutes, we have 43771 data points! --- ## Weather & Air Quality ... #### Data Detailed time-series weather and air-quality data collected at Sydney schools are available for download via [TERN](https://www.tern.org.au/news-swaq-data/), but a cleaned version will be provided by uDASH staff. #### Cleaning and transformation - Time Twist: convert time format from UTC to Sydney local time. - Data Detangle: instead of a cluttered 116 columns, use one new column for location and 12 other columns, each for a different climate variable! #### Explore Relationships between key variables for each site. #### Model Multiple linear regression, for example between temperature, relative humidity, and NO<sub>2</sub> concentrations. --- ## Weather & Air Quality ... #### Visualise Combining predictions with data to plot the results of the model. 
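As a rough illustration of what "combining predictions with data" can look like, the sketch below fits a multiple linear regression of `no2` on `t` and `rh` and attaches the fitted values to each row. It is written in plain Python purely for illustration, and the same steps carry over to R. The numbers are synthetic (generated from an invented linear rule), not the real measurements, and the helper names `fit_ols` and `predict` are hypothetical.

```python
import random


def fit_ols(rows, target):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations."""
    X = [[1.0] + list(r) for r in rows]          # prepend intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, target))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    """Apply fitted coefficients (intercept first) to one row of predictors."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


random.seed(1)
# Synthetic temperature (deg C) and relative humidity (%) readings: assumptions
weather = [(random.uniform(5, 35), random.uniform(20, 95)) for _ in range(200)]
# Synthetic NO2 values from an invented linear rule plus noise
no2 = [30.0 - 0.5 * t + 0.1 * rh + random.gauss(0, 1) for t, rh in weather]

coefs = fit_ols(weather, no2)

# Combine predictions with the data, ready to plot observed vs fitted values
combined = [
    {"t": t, "rh": rh, "no2": y, "no2_pred": predict(coefs, (t, rh))}
    for (t, rh), y in zip(weather, no2)
]
```

Each entry of `combined` then holds both the observed and the predicted value, which is the shape a plotting function usually expects when overlaying a model on the data.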
Plotting the relationship between `\(NO_2\)` and key variables for one site (UNSW): .pull-left[ <img src="data:image/png;base64,#Project_datasets_files/figure-html/airq-eda-1.png" width="100%" /> ] .pull-right[ <img src="data:image/png;base64,#Project_datasets_files/figure-html/airq-viz-1.png" width="100%" /> ] --- class: inverse, center, middle # <img src="data:image/png;base64,#images/crab-svgrepo-com.svg" alt="Mud Crab Icon" style="width: 1em; height: 1em;"/> Ecology: Mud Crab Tracking > Environmental factors such as water temperature, conductivity, and time of day significantly influence the behavioral patterns of Giant Mud Crabs, including their movement, activity levels, and periods of heightened activity. ***Are there significant correlations or patterns that can be explored to understand their response to climate variability?*** --- ## Mud Crab Tracking ... Crabs were tracked using acoustic telemetry, which involved attaching tags that emit distinct sounds. To pick up these sounds, receiver stations, functioning like listening posts, were set up in the study area. When a crab was within range of three or more receivers, its exact location could be determined. Acceleration was measured along three axes for a period of 20 seconds, recording 5 measurements per second (5 Hz). <div style="float:right;"> <img src="data:image/png;base64,#images/crab_speed_measure.jpeg" alt="Crab Speed Measure"/> </div> - Primary variables: - crab: Unique identifier for each crab - sex: male/female - cl_mm: Size of the crab's shell in millimeters - time: Timestamp YYYY-MM-DD hh:mm:ss - lat, lon: Geospatial measurements in degrees - x, y: The easting and northing of the detection in meters - accel: The acceleration of the crab in meters per second squared - temperature_c: Water temperature in degrees Celsius - conductivity_ms_cm: Electrical conductivity in millisiemens per centimeter --- ## Mud Crab Tracking ... #### Data Data provided by uDASH staff. 
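The `time` variable is listed as a `YYYY-MM-DD hh:mm:ss` timestamp, and it arrives as text until it is parsed. A minimal sketch of that step, in plain Python for illustration only: the rows are invented, and the 18:00 to 06:00 definition of "night" is an assumption, not something taken from the dataset.

```python
from datetime import datetime

# Hypothetical rows shaped like the listed variables (all values invented)
rows = [
    {"crab": "C01", "time": "2021-03-04 23:15:00", "accel": 0.42},
    {"crab": "C01", "time": "2021-03-05 11:45:20", "accel": 0.08},
]

for row in rows:
    # Parse the text timestamp into a real datetime object
    row["time"] = datetime.strptime(row["time"], "%Y-%m-%d %H:%M:%S")
    # Derive the hour of day, useful for time-of-day questions
    row["hour"] = row["time"].hour
    # Assumed day/night split: night runs from 18:00 to 06:00
    row["is_night"] = row["hour"] >= 18 or row["hour"] < 6
```

Once the timestamp is a proper date-time value, grouping detections by hour or by day/night becomes a one-line operation.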
#### Cleaning and transformation Data types and formats for handling date and time are particularly important for data analysis. #### Explore Plot the paths of individual crabs based on their spatial coordinates. Look for potential differences in behavior between individual crabs (or possibly between sexes)! #### Model - Some questions you may consider asking of this data - What effect does water temperature have on crab behavior? - What effect does conductivity have on crab behavior? - What effect does the time of day have on crab behavior? --- ## Mud Crab Tracking ... #### Visualise .pull-left[ <img src="data:image/png;base64,#Project_datasets_files/figure-html/crab-eda-1.png" width="100%" /> ] .pull-right[ <img src="data:image/png;base64,#Project_datasets_files/figure-html/crab-tempEffect-1.png" width="100%" /> ] --- class: middle # Choose your project -
Psychology: Emotions during the pandemic -
Business insights: Small automotive business -
Entertainment: Video Games reviews -
Economics: Big economic data -
Climate: Weather & Air Quality in Schools - <img src="data:image/png;base64,#images/crab-svgrepo-com.svg" alt="Mud Crab Icon" style="width: 1em; height: 1em;"/> Ecology: Mud Crab Tracking --- # Thanks! [
@uDASH_UNSW](https://twitter.com/uDASH_UNSW) [
//www.unsw.edu.au/research/udash](https://www.unsw.edu.au/research/udash) [
uDASH@unsw.edu.au](mailto:uDASH@unsw.edu.au)