Deakin Representation: Scientists affiliated with Deakin University appeared in the dataset 212 times.
Comparative Ranking: Deakin University ranked 10th in Australia and 3rd in Victoria in terms of the number of scientists listed.
Figure 1. Ranking of the top 10 institutes in Australia (left panel) and within Victoria (right panel).
Indicator on Early-Career Researchers: Of the 212 Deakin scientists in the dataset, 23 (10.85%) had their first publication in 2013 or later.
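The early-career indicator is a simple share: the number of scientists whose first publication year is 2013 or later, divided by the number of scientists at the institution. The sketch below illustrates the arithmetic on a toy data frame. The `deakin` data frame, its values and the `ecr_cutoff` name are assumptions made for illustration and are not taken from the dataset.

```r
# Toy data standing in for the Deakin subset of the dataset (illustrative values only)
deakin <- data.frame(
  authfull = c("A", "B", "C", "D"),
  firstyr  = c(1998, 2013, 2015, 2007)
)

# Early-career researcher: first publication in 2013 or later
ecr_cutoff <- 2013
is_ecr <- deakin$firstyr >= ecr_cutoff

ecr_n   <- sum(is_ecr)                   # number of early-career researchers
ecr_pct <- round(100 * mean(is_ecr), 2)  # share of early-career researchers, in percent

# The same arithmetic applied to the counts reported above (23 of 212)
reported_pct <- round(100 * 23 / 212, 2)
```

Applied to the reported counts, 23 out of 212 gives the 10.85% quoted in the summary.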
Additional Insights:
1. Distribution of Scientist Rankings in the Top 10 Institutes: Although there are large differences in the number of scientists represented from each institution in the dataset, the distribution of scientists’ rankings was similar across nine of the top 10 Australian institutions. CSIRO was an exception, with its scientists ranking significantly lower overall than those from several universities, including Deakin University. This suggests that the quality and overall research impact of Deakin University scientists is comparable to that of much larger institutions.
Figure 2. Distribution of scientist rankings in the top 10 institutes in Australia. The boxplot displays the median rank and interquartile range. Note: Lower values correspond to a higher rank.
2. Research Fields of Top Scientists: General and Internal Medicine was the leading research subfield across all Australian and Victorian institutions. At Deakin University, however, the Materials subfield (within the “Engineering” and “Enabling & Strategic Technologies” categories) had the highest number of scientists represented. Notably, Artificial Intelligence and Image Processing ranked second at Deakin and across Australia, and third in Victoria, surpassing several health science subfields. This highlights the strong national and institutional focus on Artificial Intelligence research, with Australian scientists, including those at Deakin University, playing a prominent role in advancing this rapidly evolving field.
3. Representation of ECRs in Top 10 Institutes: Deakin University had the highest proportion of early-career researchers among the top ten Australian institutions in the dataset. Notably, Deakin had more early-career researchers than the University of Melbourne, even though Melbourne had three times as many scientists represented overall. This indicates that larger institutions such as the University of Melbourne may have a more senior research profile, while Deakin University has been particularly successful in attracting and supporting talented new researchers.
Figure 3. Percentage of early-career researchers in the top 10 Australian institutes (left panel) and the number of early-career researchers in the top 10 Australian institutes (right panel).
Conclusion: The Stanford “World’s Top 2% Scientists” dataset offers valuable insights into the performance of Deakin University researchers in comparison to their peers at other Australian and international institutions. The dataset can help identify research subfields where Deakin excels, assess the overall impact of its research, and evaluate the effectiveness of its recruitment of talented new researchers. However, there are known issues with the accuracy of institutional affiliations in the dataset. It is important to thoroughly check and, where necessary, correct these affiliations before conducting any detailed analysis to ensure the reliability of the findings.
The data preparation and analysis steps used to derive these results are presented below. Click the “Show” button above each result to display the code that was used to produce it. The floating table of contents on the left can be used to navigate through this document.
The original data file was saved as a CSV file and loaded into R for analysis.
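The loading step itself is not shown in this report. As a hedged sketch, one way to read such a CSV in base R while keeping column names like `rank (ns)` and `sm-subfield-2` unchanged is to set `check.names = FALSE`. The temporary file, its contents and the `S_toy` name below are assumptions for illustration only.

```r
# Write a tiny stand-in CSV to a temporary file (illustrative content only)
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c('authfull,inst_name,"rank (ns)",sm-subfield-2',
    '"Doe, J.",Deakin University,1200,Materials',
    '"Roe, A.",Monash University,3400,'),
  csv_path
)

# check.names = FALSE keeps names with spaces, brackets and hyphens as they are;
# na.strings turns empty cells into NA so they are counted by is.na()
S_toy <- read.csv(
  csv_path,
  check.names = FALSE,
  na.strings = c("", "NA"),
  stringsAsFactors = FALSE
)

col_names   <- names(S_toy)
missing_per <- colSums(is.na(S_toy))
```

Columns with non-syntactic names then need backticks in dplyr code, as in the chunks that follow.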
A check for missing data revealed that no data was missing for the key variables pertaining to this task (authfull, inst_name, firstyr, lastyr), so I proceeded with the analyses. It is worth noting, however, that there were 21 missing values in sm-subfield-2 and one missing value in sm-subfield-2-frac, which suggests that data about the listed authors’ second-ranked Science-Metrix category is incomplete.
#Check for missing data in columns by counting NAs
colSums(is.na(S_data))
##                authfull               inst_name                   cntry
##                       0                       0                       0
##                  np6023                 firstyr                  lastyr
##                       0                       0                       0
##               rank (ns)             nc2323 (ns)                h23 (ns)
##                       0                       0                       0
##               hm23 (ns)                nps (ns)                ncs (ns)
##                       0                       0                       0
##               cpsf (ns)               ncsf (ns)              npsfl (ns)
##                       0                       0                       0
##              ncsfl (ns)                  c (ns)           npciting (ns)
##                       0                       0                       0
##              cprat (ns)   np6023 cited2323 (ns)                   self%
##                       0                       0                       0
##               np6023_rw            nc2323_to_rw               nc2323_rw
##                       0                       0                       0
##           sm-subfield-1      sm-subfield-1-frac           sm-subfield-2
##                       0                       0                      21
##      sm-subfield-2-frac                sm-field           sm-field-frac
##                       1                       0                       0
##      rank sm-subfield-1 rank sm-subfield-1 (ns)     sm-subfield-1 count
##                       0                       0                       0
##              inst_state            inst_correct
##                       2                       0
All rows that contain missing values in the sm-subfield-2 and sm-subfield-2-frac columns are shown below.
Use the arrows next to the column headers to move left and right across columns, and click the page numbers at the bottom right to move through the rows.
missing_vals <- S_data %>% filter(if_any(c(`sm-subfield-2`, `sm-subfield-2-frac`), is.na)) %>% as.data.frame() # Keeps rows with a missing value in either of the two columns
paged_table(missing_vals)
Since the analyses to be conducted are based on scientists’ affiliations, I inspected the inst_name variable to see how many institutes were included in the dataset and to scan for any anomalies. Isolating inst_name showed that there are 403 unique institute names in the dataset. However, closer inspection showed that some of those names correspond to faculties or departments within Australian institutions, or to foreign institutions.
# Create a dataframe that contains the number of times each institute occurs in the "inst_name" variable in the dataset
inst_count <- as.data.frame(table(S_data$inst_name))
colnames(inst_count)[1] <- "Institute"
paged_table(inst_count)
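With several hundred names to inspect, it can help to sort the count table by frequency and pre-flag names that look like faculties or departments. The following is a minimal sketch on toy data. The `toy` data frame and the `pattern` regular expression are assumptions for illustration, and the flag only suggests candidates for manual review. It does not replace the inspection described above.

```r
# Toy data standing in for S_data$inst_name (illustrative values only)
toy <- data.frame(
  inst_name = c(
    "Deakin University",
    "Deakin University, School of Life and Environmental Sciences",
    "Monash University",
    "Monash University",
    "University of Liverpool"
  ),
  stringsAsFactors = FALSE
)

# Number of distinct institute names
n_unique <- length(unique(toy$inst_name))

# Frequency table, most common institutes first
inst_count <- as.data.frame(table(toy$inst_name), stringsAsFactors = FALSE)
colnames(inst_count) <- c("Institute", "Freq")
inst_count <- inst_count[order(-inst_count$Freq, inst_count$Institute), ]

# Flag names that contain a comma or start with "School of" / "Faculty of"
# (optionally preceded by "The"); these are only candidates for manual checking
pattern <- "^(The )?(School|Faculty) of|,"
inst_count$review <- grepl(pattern, inst_count$Institute)
```

Foreign institutions such as "University of Liverpool" are not caught by a pattern like this and still need to be identified by reading through the list.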
The original dataset did not include information about the state in which each institute is located. I therefore created the inst_state variable to include this information.
Note: The state of some institutes was difficult to determine because they had sites across multiple states. These were labelled as “Unknown”. Some scientists had foreign institute names, but I was able to identify the Australian institution with which they are affiliated and included the state of that institution. In five cases, no Australian affiliation could be identified, so their value in “inst_state” was set to NA.
# Add a column to the dataset that records the state in which each institute is located
S_data <- S_data %>% mutate(inst_state = case_when (
S_data$inst_name == "Advanced Mobility Analytics Group Pty Ltd" ~ "QLD",
S_data$inst_name == "AgriBio, Centre for AgriBioscience" ~ "VIC",
S_data$inst_name == "Agriculture Victoria" ~ "VIC",
S_data$inst_name == "Ambulance Victoria" ~ "VIC",
S_data$inst_name == "AML3D Ltd" ~ "SA",
S_data$inst_name == "ANZAC Research Institute" ~ "NSW",
S_data$inst_name == "ARC Centre of Excellence for Gravitational Wave Discovery" ~ "Unknown",
S_data$inst_name == "Arriba Consulting Pty Ltd" ~ "WA",
S_data$inst_name == "Arthur Rylah Institute for Environmental Research" ~ "VIC",
S_data$inst_name == "AUS-TBI Lived Experience Advisory Group" ~ "Unknown",
S_data$inst_name == "Austin Health" ~ "VIC",
S_data$inst_name == "Austin Hospital" ~ "VIC",
S_data$inst_name == "Australasian Society of Aesthetic Plastic Surgeons" ~ "NSW",
S_data$inst_name == "Australian Academy of Science" ~ "ACT",
S_data$inst_name == "Australian Antarctic Division" ~ "TAS",
S_data$inst_name == "Australian Catholic University" ~ "VIC",
S_data$inst_name == "Australian Centre for Sexual Health" ~ "NSW",
S_data$inst_name == "Australian Council for Educational Research" ~ "VIC",
S_data$inst_name == "Australian Herbicide Resistance Initiative" ~ "WA",
S_data$inst_name == "Australian Institute of Marine Science" ~ "QLD",
S_data$inst_name == "Australian Maritime College" ~ "TAS",
S_data$inst_name == "Australian Museum" ~ "VIC",
S_data$inst_name == "Australian Nuclear Science and Technology Organisation" ~ "NSW",
S_data$inst_name == "Australian Orthopaedic Association National Joint Replacement Registry" ~ "SA",
S_data$inst_name == "Australian Pet Welfare Foundation" ~ "QLD",
S_data$inst_name == "Australian Red Cross Blood Service" ~ "Unknown",
S_data$inst_name == "Australian Synchrotron" ~ "VIC",
S_data$inst_name == "Australian Wine Research Institute" ~ "SA",
S_data$inst_name == "Baker Heart and Diabetes Institute" ~ "VIC",
S_data$inst_name == "Ballarat Health Services" ~ "VIC",
S_data$inst_name == "Bangladesh & Research and Training International" ~ "NSW",
S_data$inst_name == "Barwon Health" ~ "VIC",
S_data$inst_name == "BHP" ~ "VIC",
S_data$inst_name == "Bio21 Molecular Science and Biotechnology Institute" ~ "VIC",
S_data$inst_name == "Bionics Institute of Australia" ~ "VIC",
S_data$inst_name == "Biospherics Pty Ltd" ~ "WA",
S_data$inst_name == "Black Dog Institute" ~ "NSW",
S_data$inst_name == "Blue Minerals Consultancy" ~ "WA",
S_data$inst_name == "BlueSky Genetics" ~ "SA",
S_data$inst_name == "Boeing Australia" & S_data$authfull == "Perez, Tristan" ~ "QLD",
S_data$inst_name == "Boeing Australia" & S_data$authfull == "Clothier, R." ~ "VIC",
S_data$inst_name == "Bond University" ~ "QLD",
S_data$inst_name == "Bowel Cancer Outcomes Registry (BCOR)" ~ "VIC",
S_data$inst_name == "Bradken" ~ "NSW",
S_data$inst_name == "BRITE Professional Services" ~ "VIC",
S_data$inst_name == "Bureau of Meteorology Australia" ~ "ACT",
S_data$inst_name == "Burnet Institute" ~ "VIC",
S_data$inst_name == "Cabrini Health" ~ "VIC",
S_data$inst_name == "Cairnmillar Institute" ~ "VIC",
S_data$inst_name == "Calvary Mater Newcastle" ~ "NSW",
S_data$inst_name == "Cambia" ~ "NSW",
S_data$inst_name == "Canberra Hospital" ~ "ACT",
S_data$inst_name == "Cancer Council NSW" ~ "NSW",
S_data$inst_name == "Cancer Council Victoria" ~ "VIC",
S_data$inst_name == "Cancer Institute NSW" ~ "NSW",
S_data$inst_name == "Cartherics Pty Ltd." ~ "VIC",
S_data$inst_name == "Castlereagh Imaging" ~ "NSW",
S_data$inst_name == "Caulfield Hospital" ~ "VIC",
S_data$inst_name == "Centenary Institute of Cancer Medicine and Cell Biology" ~ "NSW",
S_data$inst_name == "Central Adelaide Local Health Network" ~ "SA",
S_data$inst_name == "Centre for Adolescent Health" ~ "VIC",
S_data$inst_name == "Centre for Cognitive Neuroscience" ~ "VIC",
S_data$inst_name == "Centre for Digestive Diseases" ~ "NSW",
S_data$inst_name == "Centre for Eye Research Australia" ~ "VIC",
S_data$inst_name == "ChangeAgents AEC" ~ "VIC",
S_data$inst_name == "Charles Darwin University" ~ "NT",
S_data$inst_name == "Charles Sturt University" ~ "VIC",
S_data$inst_name == "Charles Sturt University, Wagga Wagga" ~ "NSW",
S_data$inst_name == "Chessman Ecology" ~ "NSW",
S_data$inst_name == "Children’s Health Queensland" ~ "QLD",
S_data$inst_name == "Children's Medical Research Institute" ~ "VIC",
S_data$inst_name == "Climate and Disaster Risk Research and Consulting (CDRC)" ~ "NSW",
S_data$inst_name == "Clinical Governance" ~ "QLD",
S_data$inst_name == "Coelho Software" ~ "NA",
S_data$inst_name == "Cogstate Ltd." ~ "VIC",
S_data$inst_name == "Comminution Reimagined" ~ "QLD",
S_data$inst_name == "Commission for the Conservation of Antarctic Marine Living Resources" ~ "TAS",
S_data$inst_name == "Commonwealth Scientific and Industrial Research Organisation" ~ "ACT",
S_data$inst_name == "Concord Repatriation General Hospital" ~ "NSW",
S_data$inst_name == "Conservation and Attractions" ~ "WA",
S_data$inst_name == "Coogee Energy" ~ "VIC",
S_data$inst_name == "Cooperative Research Centres Australia" ~ "ACT",
S_data$inst_name == "CQUniversity Australia" ~ "QLD",
S_data$inst_name == "CQUniversity Brisbane" ~ "QLD",
S_data$inst_name == "CQUniversity Bundaberg" ~ "QLD",
S_data$inst_name == "CSIRO Environment" ~ "NSW",
S_data$inst_name == "CSL Limited" ~ "VIC",
S_data$inst_name == "Curtin University" ~ "WA",
S_data$inst_name == "CyanoLakes (Pty) Ltd" ~ "NSW",
S_data$inst_name == "Cyberstronomy Pty Ltd" ~ "VIC",
S_data$inst_name == "Dairy Australia" ~ "VIC",
S_data$inst_name == "Deakin University" ~ "VIC",
S_data$inst_name == "Deakin University, School of Life and Environmental Sciences" ~ "VIC",
S_data$inst_name == "Defence Science and Technology Group" ~ "ACT",
S_data$inst_name == "Department of Environment and Science" ~ "QLD",
S_data$inst_name == "Dianoia Institute of Philosophy" ~ "NSW",
S_data$inst_name == "Digital Frontier Partners" ~ "VIC",
S_data$inst_name == "Donvale Rehabilitation Hospital" ~ "VIC",
S_data$inst_name == "Dulwich Centre" ~ "SA",
S_data$inst_name == "EastCoast Geospatial Consultants" ~ "NSW",
S_data$inst_name == "Eastern Health" ~ "VIC",
S_data$inst_name == "Ecas4 Australia Pty Ltd" ~ "SA",
S_data$inst_name == "Edith Cowan University" ~ "WA",
S_data$inst_name == "Education and Design Associates" ~ "Unknown",
S_data$inst_name == "Eirich Australia" ~ "QLD",
S_data$inst_name == "Eirich Australia" ~ "NSW", # Never reached: case_when() uses the first matching condition, so the "QLD" line above applies
S_data$inst_name == "Emesent" ~ "QLD",
S_data$inst_name == "Endeavour College of Natural Health" ~ "VIC",
S_data$inst_name == "Engineering Institute of Technology" ~ "VIC",
S_data$inst_name == "Environment Protection Authority Victoria" ~ "VIC",
S_data$inst_name == "Epworth Freemasons Hospital" ~ "VIC",
S_data$inst_name == "Epworth Sleep Centre" ~ "VIC",
S_data$inst_name == "Extreme Wellness Institute" ~ "VIC",
S_data$inst_name == "Faculty of Health and Medical Sciences" ~ "SA",
S_data$inst_name == "Faculty of Medicine and Health" ~ "NSW",
S_data$inst_name == "Faculty of Medicine, Dentistry and Health Sciences" ~ "VIC",
S_data$inst_name == "Faculty of Medicine, Nursing and Health Sciences" ~ "VIC",
S_data$inst_name == "Faculty of Science, Medicine and Health" ~ "NSW",
S_data$inst_name == "Federation University Australia" ~ "VIC",
S_data$inst_name == "Fiona Stanley Hospital" ~ "WA",
S_data$inst_name == "Fischer-Cripps Laboratories Pty Ltd." ~ "NSW",
S_data$inst_name == "Flinders Medical Centre" ~ "SA",
S_data$inst_name == "Flinders University" ~ "SA",
S_data$inst_name == "Football Australia" ~ "NSW",
S_data$inst_name == "Forensic Science Centre, Adelaide" ~ "SA",
S_data$inst_name == "Forest and fire science consultant" ~ "WA",
S_data$inst_name == "Fortescue Future Industries Pty Ltd" ~ "WA",
S_data$inst_name == "Fortescue Metals Group Ltd." ~ "WA",
S_data$inst_name == "Francis Buttle & Associates" ~ "NSW",
S_data$inst_name == "Fraser Coast Regional Council" ~ "QLD",
S_data$inst_name == "Garvan Institute of Medical Research" ~ "NSW",
S_data$inst_name == "General Practice" ~ "NSW",
S_data$inst_name == "Geochron Research Group" ~ "NSW",
S_data$inst_name == "Geoffrey Smithers Food Industry Consulting Services" ~ "VIC",
S_data$inst_name == "Geological Survey of NSW" ~ "NSW",
S_data$inst_name == "Geological Survey of Western Australia" ~ "WA",
S_data$inst_name == "George Institute for Global Health" ~ "NSW",
S_data$inst_name == "Geoscience Australia" ~ "ACT",
S_data$inst_name == "Geotomo Software Sdn Bhd" ~ "NA",
S_data$inst_name == "Geotrack International Pty Ltd" ~ "VIC",
S_data$inst_name == "Government of Western Australia" ~ "WA",
S_data$inst_name == "Griffith Health" ~ "QLD",
S_data$inst_name == "Griffith School of Medicine" ~ "QLD",
S_data$inst_name == "Griffith University" ~ "QLD",
S_data$inst_name == "Health Research Group" ~ "NSW",
S_data$inst_name == "Heart Research Institute Australia" ~ "NSW",
S_data$inst_name == "Heritage and architectural conservation and rural development." ~ "Unknown",
S_data$inst_name == "Hudson Institute of Medical Research" ~ "VIC",
S_data$inst_name == "Hunter Medical Research Institute, Australia" ~ "NSW",
S_data$inst_name == "HyVista Corporation" ~ "NSW",
S_data$inst_name == "IBM Research" ~ "NSW",
S_data$inst_name == "IC Independent Consulting" ~ "NSW",
S_data$inst_name == "ICTRYM Pty Ltd" ~ "NSW",
S_data$inst_name == "IEEE" ~ "NSW",
S_data$inst_name == "Imaging at Olympic Park" ~ "VIC",
S_data$inst_name == "Independent academic researcher and author" ~ "Unknown",
S_data$inst_name == "Ingham Institute for Applied Medical Research" ~ "NSW",
S_data$inst_name == "Institute for Marine and Antarctic Studies" ~ "TAS",
S_data$inst_name == "Institute for Musculoskeletal Health" ~ "NSW",
S_data$inst_name == "Institute for Sensible Transport" ~ "VIC",
S_data$inst_name == "Institute for Study and Development Worldwide" ~ "NSW",
S_data$inst_name == "Integrity Ag & Environment" ~ "QLD",
S_data$inst_name == "ISN Psychology" ~ "VIC",
S_data$inst_name == "James Cook University" ~ "QLD",
S_data$inst_name == "JDH Consulting" ~ "VIC",
S_data$inst_name == "Jemora Pty Ltd" ~ "VIC",
S_data$inst_name == "John Hunter Hospital" ~ "NSW",
S_data$inst_name == "King Edward Memorial Hospital for Women" ~ "WA",
S_data$inst_name == "Kirk Marine Optics" ~ "Unknown",
S_data$inst_name == "Kolling Institute of Medical Research" ~ "NSW",
S_data$inst_name == "La Trobe Business School" ~ "VIC",
S_data$inst_name == "La Trobe University" ~ "VIC",
S_data$inst_name == "Lallemand Australia Pty Ltd" ~ "SA",
S_data$inst_name == "Laseire Consulting Pty Ltd" ~ "NSW",
S_data$inst_name == "Launceston General Hospital" ~ "TAS",
S_data$inst_name == "Laverty Pathology" ~ "NSW",
S_data$inst_name == "Lions Eye Institute, Perth" ~ "WA",
S_data$inst_name == "LISMORE" ~ "NSW",
S_data$inst_name == "Liverpool Hospital" ~ "NSW",
S_data$inst_name == "LM Consulting" ~ "WA",
S_data$inst_name == "Lockheed Martin Corporation" ~ "SA",
S_data$inst_name == "Macquarie University" ~ "NSW",
S_data$inst_name == "Macroview Group" ~ "NSW",
S_data$inst_name == "Mater Pathology" ~ "QLD",
S_data$inst_name == "MedAlliance" ~ "NSW",
S_data$inst_name == "Medical Journal of Australia" ~ "NSW",
S_data$inst_name == "Medical Radiation Research Team" ~ "VIC",
S_data$inst_name == "MEL Consultants Pty Ltd" ~ "VIC",
S_data$inst_name == "Melanoma Institute Australia" ~ "NSW",
S_data$inst_name == "Melbourne Business School" ~ "VIC",
S_data$inst_name == "Melbourne Institute of Technology" ~ "VIC",
S_data$inst_name == "Melbourne Law School" ~ "VIC",
S_data$inst_name == "Melbourne Polytechnic" ~ "VIC",
S_data$inst_name == "Melbourne School of Population and Global Health" ~ "VIC",
S_data$inst_name == "Melbourne School of Psychological Sciences" ~ "VIC",
S_data$inst_name == "Melbourne Sexual Health Centre" ~ "VIC",
S_data$inst_name == "Menzies Health Institute Queensland" ~ "QLD",
S_data$inst_name == "Menzies School of Health Research" ~ "NT",
S_data$inst_name == "Mesaplexx Pty Ltd." ~ "QLD",
S_data$inst_name == "Metro North Hospital and Health Service" ~ "QLD",
S_data$inst_name == "Microbial Screening Technologies Pty Ltd" ~ "NSW",
S_data$inst_name == "Mineral Resources Ltd." ~ "WA",
S_data$inst_name == "MJC Pain Management and Research Centre" ~ "NSW",
S_data$inst_name == "Modbury Hospital Foundation" ~ "SA",
S_data$inst_name == "Monash Business School" ~ "VIC",
S_data$inst_name == "Monash Health" ~ "VIC",
S_data$inst_name == "Monash Institute of Pharmaceutical Sciences" ~ "VIC",
S_data$inst_name == "Monash Medical Centre" ~ "VIC",
S_data$inst_name == "Monash University" ~ "VIC",
S_data$inst_name == "Mt Lindesay" ~ "QLD",
S_data$inst_name == "Murdoch Children's Research Institute" ~ "VIC",
S_data$inst_name == "Murdoch University" ~ "WA",
S_data$inst_name == "National Acoustic Laboratories" ~ "NSW",
S_data$inst_name == "National Ageing Research Institute" ~ "VIC",
S_data$inst_name == "National Trauma Research Institute" ~ "VIC",
S_data$inst_name == "NDIS Quality and Safeguards Commission" ~ "NSW",
S_data$inst_name == "Neuroscience Research Australia" ~ "NSW",
S_data$inst_name == "New South Wales Government" ~ "NSW",
S_data$inst_name == "New South Wales Rural Fire Service" ~ "NSW",
S_data$inst_name == "NEXTSENSE" ~ "VIC",
S_data$inst_name == "Ngaanyatjarra Health Service" ~ "NT",
S_data$inst_name == "Nontrivialzeros Research" ~ "NSW",
S_data$inst_name == "Northside Nutrition & Dietetics" ~ "VIC",
S_data$inst_name == "NSW Brain Clot Bank" ~ "NSW",
S_data$inst_name == "NSW Department of Primary Industries" ~ "NSW",
S_data$inst_name == "Office of the Chief Forensic Scientist" ~ "NSW",
S_data$inst_name == "Office of the NSW Chief Scientist and Engineer" ~ "NSW",
S_data$inst_name == "Olivia Newton-John Cancer Research Institute" ~ "VIC",
S_data$inst_name == "ORYGEN Youth Health" ~ "VIC",
S_data$inst_name == "Ozemantics Pty Ltd" ~ "NSW",
S_data$inst_name == "Park Centre for Mental Health" ~ "QLD",
S_data$inst_name == "Parliament of Australia" ~ "ACT",
S_data$inst_name == "Peninsula Health" ~ "VIC",
S_data$inst_name == "Perron Institute for Neurological and Translational Science" ~ "WA",
S_data$inst_name == "Perth Children's Hospital" ~ "WA",
S_data$inst_name == "Peter Maccallum Cancer Centre" ~ "VIC",
S_data$inst_name == "Philip Darbyshire Consulting Ltd" ~ "SA",
S_data$inst_name == "Phillip Island Nature Park" ~ "VIC",
S_data$inst_name == "PIVET Medical Centre" ~ "WA",
S_data$inst_name == "Pollard Geological Services Pty. Ltd." ~ "QLD",
S_data$inst_name == "Port Macquarie Base Hospital" ~ "NSW",
S_data$inst_name == "Prince of Songkla University" ~ "QLD",
S_data$inst_name == "Prince of Wales Hospital" ~ "NSW",
S_data$inst_name == "Prince of Wales Private Hospital" ~ "NSW",
S_data$inst_name == "Princess Alexandra Hospital" ~ "QLD",
S_data$inst_name == "Private Practice" & S_data$authfull == "Jackson, S. A." ~ "NSW",
S_data$inst_name == "Private Practice" & S_data$authfull == "Meredith, Neil" ~ "QLD",
S_data$inst_name == "Process Optimization for Future" ~ "Unknown",
S_data$inst_name == "Professional Services Department (Resources)" ~ "VIC",
S_data$inst_name == "Prostate Cancer Foundation of Australia" ~ "NSW",
S_data$inst_name == "PsychoTropical Research" ~ "Unknown",
S_data$inst_name == "PV Lighthouse" ~ "NSW",
S_data$inst_name == "QIMR Berghofer Medical Research Institute" ~ "QLD",
S_data$inst_name == "Quantm Ltd." ~ "Unknown",
S_data$inst_name == "Quantum Brilliance Pty Ltd" ~ "NSW",
S_data$inst_name == "Queen Elizabeth II Research Institute for Mothers and Infants" ~ "NSW",
S_data$inst_name == "Queensland Department of Agriculture and Fisheries" ~ "QLD",
S_data$inst_name == "Queensland Government" ~ "QLD",
S_data$inst_name == "Queensland Museum" ~ "QLD",
S_data$inst_name == "Queensland of Technology" ~ "QLD",
S_data$inst_name == "Queensland University of Technology" ~ "QLD",
S_data$inst_name == "R. E. Johannes Pty Ltd." ~ "TAS",
S_data$inst_name == "Rare Fibre" ~ "VIC",
S_data$inst_name == "RDadvisor" ~ "TAS",
S_data$inst_name == "Redcliffe Hospital" ~ "QLD",
S_data$inst_name == "Renal Research" ~ "NSW",
S_data$inst_name == "Research and Data Analysis Centre" ~ "QLD",
S_data$inst_name == "ResMed Science Center" ~ "NSW",
S_data$inst_name == "Retired Professor of Nursing and Independent Scholar" ~ "VIC",
S_data$inst_name == "RMIT University" ~ "VIC",
S_data$inst_name == "Royal Adelaide Hospital" ~ "SA",
S_data$inst_name == "Royal Botanic Gardens Sydney" ~ "NSW",
S_data$inst_name == "Royal Botanic Gardens Victoria" ~ "VIC",
S_data$inst_name == "Royal Brisbane and Women's Hospital" ~ "QLD",
S_data$inst_name == "Royal Children's Hospital, Melbourne" ~ "VIC",
S_data$inst_name == "Royal College of Pathologists of Australasia Quality Assurance Programs" ~ "NSW",
S_data$inst_name == "Royal Hobart Hospital" ~ "TAS",
S_data$inst_name == "Royal Hospital for Women, Sydney" ~ "NSW",
S_data$inst_name == "Royal Melbourne Hospital" ~ "VIC",
S_data$inst_name == "Royal North Shore Hospital" ~ "NSW",
S_data$inst_name == "Royal Perth Hospital" ~ "WA",
S_data$inst_name == "Royal Prince Alfred Hospital" ~ "NSW",
S_data$inst_name == "Royal Victorian Eye & Ear Hospital, Melbourne" ~ "VIC",
S_data$inst_name == "Royal Women's Hospital, Carlton" ~ "VIC",
S_data$inst_name == "RuleQuest Research Pty Ltd" ~ "NSW",
S_data$inst_name == "RVT Australia" ~ "QLD",
S_data$inst_name == "SA Water" ~ "SA",
S_data$inst_name == "Save Sight Institute" ~ "NSW",
S_data$inst_name == "School of EducationUniversity of Tasmania" ~ "TAS",
S_data$inst_name == "School of Medicine" ~ "QLD",
S_data$inst_name == "Scolexia Avian and Animal Consultancy Co." ~ "VIC",
S_data$inst_name == "Sir Charles Gairdner Hospital" ~ "WA",
S_data$inst_name == "SMC Testing Pty Ltd" ~ "VIC",
S_data$inst_name == "South Australian Health and Medical Research Institute" ~ "SA",
S_data$inst_name == "South Australian Museum" ~ "SA",
S_data$inst_name == "Southern Cross University" ~ "VIC",
S_data$inst_name == "SP Jain School of Global Management, Australia" ~ "NSW",
S_data$inst_name == "St George Hospital" ~ "VIC",
S_data$inst_name == "St John of God Health Care" ~ "VIC",
S_data$inst_name == "St Vincent's Institute" ~ "VIC",
S_data$inst_name == "St Vincent’s Private Hospital Northside" ~ "QLD",
S_data$inst_name == "St. Vincent's Hospital Melbourne" ~ "VIC",
S_data$inst_name == "St. Vincent's Hospital Sydney" ~ "NSW",
S_data$inst_name == "Stanford Graduate School of Business" ~ "NA",
S_data$inst_name == "State Government of Victoria" ~ "VIC",
S_data$inst_name == "Statistical and ASReml Consultant" ~ "NSW",
S_data$inst_name == "Swinburne University of Technology" ~ "VIC",
S_data$inst_name == "Syddansk Universitet" ~ "WA",
S_data$inst_name == "Sydney Children's Hospital, Randwick" ~ "NSW",
S_data$inst_name == "Sydney Hospital and Sydney Eye Hospital" ~ "NSW",
S_data$inst_name == "Sydney Urodynamic Centres" ~ "NSW",
S_data$inst_name == "TAFE Queensland" ~ "QLD",
S_data$inst_name == "Tasmanian Institute of Agriculture" ~ "TAS",
S_data$inst_name == "Tasmanian School of Medicine" ~ "TAS",
S_data$inst_name == "Telethon Kids Institute" ~ "WA",
S_data$inst_name == "Tennis Australia" ~ "WA",
S_data$inst_name == "Tetra Tech Coffey Pty Ltd" ~ "VIC",
S_data$inst_name == "The Alfred" ~ "VIC",
S_data$inst_name == "The Australian National University" ~ "ACT",
S_data$inst_name == "The Cancer Council Queensland" ~ "QLD",
S_data$inst_name == "The Children's Hospital at Westmead" ~ "NSW",
S_data$inst_name == "The Department of Primary Industries and Regional Development" ~ "WA",
S_data$inst_name == "The Equality Institute" ~ "VIC",
S_data$inst_name == "The Faculty of Health and Medical Sciences" ~ "SA",
S_data$inst_name == "The Faculty of Health Sciences" ~ "VIC",
S_data$inst_name == "The Faculty of Medicine, Health and Human Sciences" ~ "NSW",
S_data$inst_name == "The Florey" ~ "VIC",
S_data$inst_name == "The Gap" ~ "QLD",
S_data$inst_name == "The Harry Perkins Institute of Medical Research" ~ "WA",
S_data$inst_name == "The Kirby Institute" ~ "NSW",
S_data$inst_name == "The Knowledge Synthesis Lab" ~ "VIC",
S_data$inst_name == "The Mid North Coast Local Health District" ~ "NSW",
S_data$inst_name == "THE MULLION GROUP PTY LTD" ~ "ACT",
S_data$inst_name == "The Peter Doherty Institute for Infection and Immunity" ~ "VIC",
S_data$inst_name == "The Prince Charles Hospital" ~ "QLD",
S_data$inst_name == "The Queen Elizabeth Hospital, North Western Adelaide Health Service" ~ "SA",
S_data$inst_name == "The Tourism CoLab" ~ "QLD",
S_data$inst_name == "The University of Adelaide" ~ "SA",
S_data$inst_name == "The University of Hong Kong" ~ "VIC",
S_data$inst_name == "The University of Newcastle, Australia" ~ "NSW",
S_data$inst_name == "The University of Notre Dame Australia" ~ "NSW",
S_data$inst_name == "The University of Queensland" ~ "QLD",
S_data$inst_name == "The University of Queensland Business School" ~ "QLD",
S_data$inst_name == "The University of Sydney" ~ "NSW",
S_data$inst_name == "The University of Sydney Business School" ~ "NSW",
S_data$inst_name == "The University of Sydney School of Dentistry" ~ "NSW",
S_data$inst_name == "The University of Sydney School of Health Sciences" ~ "NSW",
S_data$inst_name == "The University of Sydney School of Public Health" ~ "NSW",
S_data$inst_name == "The University of Western Australia" ~ "WA",
S_data$inst_name == "The Westmead Institute for Medical Research" ~ "NSW",
S_data$inst_name == "Torrens University Australia" ~ "Unknown",
S_data$inst_name == "Translational Research Institute Australia" ~ "QLD",
S_data$inst_name == "Transport for NSW" ~ "NSW",
S_data$inst_name == "Tropical Australian Academic Health Centre" ~ "QLD",
S_data$inst_name == "Tropical Coastal and Mangrove Consultants" ~ "VIC",
S_data$inst_name == "Unaffiliated" ~ "NA",
S_data$inst_name == "Unity of First People of Australia" ~ "WA",
S_data$inst_name == "Universität Graz" ~ "NA",
S_data$inst_name == "University of Canberra" ~ "ACT",
S_data$inst_name == "University of Liverpool" ~ "NA",
S_data$inst_name == "University of Melbourne" ~ "VIC",
S_data$inst_name == "University of New England Australia" ~ "NSW",
S_data$inst_name == "University of New South Wales at Australian Defence Force Academy" ~ "NSW",
S_data$inst_name == "University of Notre Dame" ~ "Unknown",
S_data$inst_name == "University of Oxford" ~ "ACT",
S_data$inst_name == "University of South Australia" ~ "SA",
S_data$inst_name == "University of Southern Queensland" ~ "QLD",
S_data$inst_name == "University of Tasmania" ~ "TAS",
S_data$inst_name == "University of Technology Sydney" ~ "NSW",
S_data$inst_name == "University of Technology Sydney Centre for Forensic Science" ~ "NSW",
S_data$inst_name == "University of the Sunshine Coast" ~ "QLD",
S_data$inst_name == "University of Wollongong" ~ "NSW",
S_data$inst_name == "UNSW Medicine" ~ "NSW",
S_data$inst_name == "UNSW Sydney" ~ "NSW",
S_data$inst_name == "UQ Centre for Clinical Research" ~ "QLD",
S_data$inst_name == "UTS Law Health Justice Centre for funding" ~ "NSW",
S_data$inst_name == "UWA Medical School" ~ "WA",
S_data$inst_name == "Van Cleef Roet Centre for Nervous Diseases" ~ "VIC",
S_data$inst_name == "Vaxine Pty Ltd" ~ "SA",
S_data$inst_name == "Veterinary Oncology Consultants" ~ "NSW",
S_data$inst_name == "VicHealth" ~ "VIC",
S_data$inst_name == "Victor Chang Cardiac Research Institute" ~ "NSW",
S_data$inst_name == "Victoria University" ~ "VIC",
S_data$inst_name == "Victoria's Skin and Cancer Foundation" ~ "VIC",
S_data$inst_name == "Victorian Infectious Diseases Reference Laboratory" ~ "VIC",
S_data$inst_name == "Victorian Institute of Technology (VIT)" ~ "VIC",
S_data$inst_name == "WA School of Mines: Minerals, Energy and Chemical Engineering" ~ "WA",
S_data$inst_name == "Wagga Wagga Agricultural Institute" ~ "NSW",
S_data$inst_name == "Walter and Eliza Hall Institute of Medical Research" ~ "VIC",
S_data$inst_name == "Watermark Numerical Computing" ~ "QLD",
S_data$inst_name == "Western Australian Museum" ~ "WA",
S_data$inst_name == "Western Health" ~ "VIC",
S_data$inst_name == "Western Sydney Sexual Health Centre" ~ "NSW",
S_data$inst_name == "Western Sydney University" ~ "NSW",
S_data$inst_name == "Westmead Hospital" ~ "NSW",
S_data$inst_name == "Westmead Public Hospital" ~ "NSW",
S_data$inst_name == "Whithycombe”" ~ "NSW",
S_data$inst_name == "Women's and Children's Hospital Adelaide" ~ "SA",
S_data$inst_name == "Woolcock Institute of Medical Research" ~ "NSW",
S_data$inst_name == "WorkSafe Victoria" ~ "VIC",
S_data$inst_name == "World Health Organization, Australia" ~ "NSW",
S_data$inst_name == "Yanco Agricultural Institute" ~ "NSW",
S_data$inst_name == "Zeobond Group" ~ "VIC",
S_data$inst_name == "Zeobond Pty Ltd" ~ "VIC"))
S_data <- arrange(S_data, inst_name) # Sort rows alphabetically by institute name
paged_table(select(S_data, 2,34,1,3,4,5,6,7,8,9)) #Displays selected columns in specified order. Full dataset now contains 34 columns.
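One detail of the chunk above is worth checking: institutes with no identifiable Australian affiliation are assigned the quoted string "NA". In R, a quoted "NA" is ordinary text rather than a missing value, so `is.na()` and `colSums(is.na(...))` do not count it. The sketch below shows the difference on a toy vector and one way to convert the text to a true missing value, assuming those rows are meant to be treated as missing. The vector and its values are illustrative only.

```r
# Toy vector standing in for S_data$inst_state (illustrative values only)
inst_state <- c("VIC", "NA", "Unknown", NA, "NSW", "NA")

# The quoted string "NA" is ordinary text, so only the real NA is counted here
n_missing_before <- sum(is.na(inst_state))

# Convert the text "NA" to a true missing value
inst_state[!is.na(inst_state) & inst_state == "NA"] <- NA_character_
n_missing_after <- sum(is.na(inst_state))

# Tabulate the states with missing values shown explicitly
state_counts <- table(inst_state, useNA = "ifany")
```

Inside `case_when()`, writing `NA_character_` instead of `"NA"` on the right-hand side would produce true missing values directly. Rows that match none of the conditions also come back as missing, so comparing the missing count against the expected number of cases is a quick way to spot unmatched institute names.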
While populating the inst_state variable, 264 authors were identified as having incorrect institute names. Incorrect names either included errors or displayed names of faculties or departments. I created a new variable called inst_correct which replaced incorrect institute names with correct ones, and copied over all the correct institute names for the remaining scientists.
The full list of incorrect institute names, the number of scientists affected, and what the institute names were corrected to can be found in Appendix A1.
# Add column to dataset to with corrected institution names to obtain accurate rankings
S_data <- S_data %>% mutate(inst_correct = case_when (
S_data$inst_name == "Bangladesh & Research and Training International" ~ "Bangladesh & Research and Training International",
S_data$inst_name == "Clinical Governance" ~"Gold Coast Hospital and Health Service",
S_data$inst_name == "Deakin University, School of Life and Environmental Sciences" ~"Deakin University",
S_data$inst_name == "Austin Hospital" ~"Austin Health",
S_data$inst_name == "Griffith Health" ~"Griffith University",
S_data$inst_name == "Griffith School of Medicine" ~"Griffith University",
S_data$inst_name == "La Trobe Business School" ~"La Trobe University",
S_data$inst_name == "Medical Radiation Research Team" ~"Barwon Health",
S_data$inst_name == "Melbourne Business School" ~"University of Melbourne",
S_data$inst_name == "Melbourne Law School" ~"University of Melbourne",
S_data$inst_name == "Melbourne School of Population and Global Health" ~"University of Melbourne",
S_data$inst_name == "Melbourne School of Psychological Sciences" ~"University of Melbourne",
S_data$inst_name == "Monash Business School" ~"Monash University",
S_data$inst_name == "Monash Institute of Pharmaceutical Sciences" ~"Monash University",
S_data$inst_name == "Monash Medical Centre" ~"Monash Health",
S_data$inst_name == "Professional Services Department (Resources)" ~"Esri Australia",
S_data$inst_name == "Queensland of Technology" ~ "Queensland University of Technology",
S_data$inst_name == "Renal Research" ~"Gosford Hospital",
S_data$inst_name == "School of EducationUniversity of Tasmania" ~"University of Tasmania",
S_data$inst_name == "School of Medicine" ~"University of Queensland",
S_data$inst_name == "The University of Queensland Business School" ~"The University of Queensland",
S_data$inst_name == "The University of Sydney Business School" ~"The University of Sydney",
S_data$inst_name == "The University of Sydney School of Dentistry" ~"The University of Sydney",
S_data$inst_name == "The University of Sydney School of Health Sciences" ~"The University of Sydney",
S_data$inst_name == "The University of Sydney School of Public Health" ~"The University of Sydney",
S_data$inst_name == "University of Technology Sydney Centre for Forensic Science" ~"University of Technology Sydney",
S_data$inst_name == "UTS Law Health Justice Centre for funding" ~"University of Technology Sydney",
S_data$inst_name == "UNSW Medicine" ~"UNSW Sydney",
S_data$inst_name == "UQ Centre for Clinical Research" ~"University of Queensland",
S_data$inst_name == "UWA Medical School" ~"University of Western Australia",
S_data$inst_name == "Westmead Public Hospital" ~"Westmead Hospital",
S_data$inst_name == "Faculty of Health and Medical Sciences" ~"University of Adelaide",
S_data$inst_name == "Faculty of Medicine and Health" ~"The University of Sydney",
S_data$inst_name == "Faculty of Medicine, Dentistry and Health Sciences" ~"University of Melbourne",
S_data$inst_name == "Faculty of Medicine, Nursing and Health Sciences" ~"Monash University",
S_data$inst_name == "Faculty of Science, Medicine and Health" ~"University of Wollongong",
S_data$inst_name == "Prince of Songkla University" ~"CQUniversity Australia",
S_data$inst_name == "The Faculty of Health and Medical Sciences" ~"University of Adelaide",
S_data$inst_name == "The Faculty of Health Sciences" ~"Australian Catholic University",
S_data$inst_name == "The Faculty of Medicine, Health and Human Sciences" ~"Macquarie University",
S_data$inst_name == "The Gap" ~"Urban Water Futures",
S_data$inst_name == "Whithycombe”" ~"Withycombe",
S_data$inst_name == "University of Woolagong”" ~"University of Wollongong",
.default = S_data$inst_name))
S_data<-arrange(S_data,S_data$inst_correct)
paged_table(select(S_data, 35,34,1,3,4,5,6,7,8,9)) #Displays selected columns in specified order. Dataset now contains 35 columns.
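As a minimal sketch of the same idea, the correction step can also be written as a lookup against a named vector, which keeps the list of corrections separate from the recoding logic. The toy data frame and the `corrections` vector below are hypothetical and hold only three of the mappings listed above; only the column names `inst_name` and `inst_correct` come from the report.

```r
# Hypothetical toy data; only the column name inst_name comes from the report.
toy <- data.frame(
  inst_name = c("Melbourne Law School", "Deakin University",
                "Monash Business School", "Griffith Health"),
  stringsAsFactors = FALSE
)

# Named vector: names are the incorrect labels, values are the corrected ones.
corrections <- c(
  "Melbourne Law School"   = "University of Melbourne",
  "Monash Business School" = "Monash University",
  "Griffith Health"        = "Griffith University"
)

# Look up each name; names without a correction return NA and keep their original value.
matched <- unname(corrections[toy$inst_name])
toy$inst_correct <- ifelse(is.na(matched), toy$inst_name, matched)

print(toy)
```

Names that are already correct, such as "Deakin University" here, pass through unchanged, which mirrors the role of `.default` in the `case_when()` call.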
As the final step of data preparation, scientists with no identifiable Australian affiliation were removed from the dataset.
The following five scientists were listed as belonging to institutes outside Australia, and a Google search did not reveal any known Australian affiliation:
They were therefore removed by filtering out all rows that had a value of “NA” in the inst_state variable. This reduced the total number of authors from 8034 to 8029.
The resulting dataset was saved to a dataframe called “data_clean”, which was used for analyses.
#Exclude authors from institutes outside Australia by filtering out all rows that have "NA" as the value for inst_state
data_clean <- dplyr::filter(S_data, !inst_state %in% c("NA")) #Returns 8029 of 8034 rows
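The filter above matches the literal string "NA". Whether the excluded rows were coded as that string or as true missing values is not shown in this section, so the following is only an illustrative sketch, on hypothetical toy data, of how the two cases behave differently under `%in%`.

```r
# Hypothetical toy data containing both the string "NA" and a true missing value.
toy <- data.frame(
  inst_state = c("VIC", "NA", NA, "NSW"),
  stringsAsFactors = FALSE
)

# Same pattern as the filter above: drops only the literal string "NA".
drop_string <- toy[!toy$inst_state %in% c("NA"), , drop = FALSE]

# Drops both the literal string "NA" and true missing values.
keep <- !is.na(toy$inst_state) & toy$inst_state != "NA"
drop_both <- toy[keep, , drop = FALSE]

nrow(drop_string)  # the true missing value is still present
nrow(drop_both)    # both kinds of NA are removed
```

If the state column can contain true missing values, adding an `is.na()` check as in `drop_both` covers both cases.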
To determine how many scientists in this dataset are affiliated with Deakin University, I counted the number of instances where “Deakin University” appeared in the newly created inst_correct variable.
# Count number of times Deakin University appears in the dataset using corrected institute column
deakin_count <- sum(data_clean$inst_correct == "Deakin University")
print(paste("Answer:", deakin_count, "of the scientists in this dataset are affiliated with Deakin University"))
## [1] "Answer: 212 of the scientists in this dataset are affiliated with Deakin University"
To determine Deakin’s comparative ranking in terms of number of scientists in Australia, I created a frequency table for all values in the inst_correct variable and arranged the rows in descending order of frequency. I used a bar plot to visualise this result as it is easy to interpret and clearly conveys both the number of scientists affiliated with each institute and the institute’s ranking.
# Determine ranking of Deakin University in AUSTRALIA in terms of number of scientists
inst_rank_AUS <- as.data.frame(table(data_clean$inst_correct)) # Creates a dataframe that contains the number of times each institute occurs in the inst_correct variable in the dataset.
colnames(inst_rank_AUS)[1] <- "Institute" # Renames heading of first column in dataframe from "Var1" to "Institute"
colnames(inst_rank_AUS)[2] <- "Number of Scientists" # Renames heading of second column in dataframe from "Freq" to "Number of Scientists"
inst_rank_AUS <- arrange(inst_rank_AUS, desc(`Number of Scientists`)) #Arranges institutes in descending order of frequency
paged_table(inst_rank_AUS)
print(paste("Answer: In terms of the number of scientists in this dataset, Deakin ranked 10th in Australia"))
## [1] "Answer: In terms of the number of scientists in this dataset, Deakin ranked 10th in Australia"
# Plot Uni rankings in Aus (Top 10)
Aus_top <- inst_rank_AUS[1:10,]
Aus_top$Institute <-str_replace(Aus_top$Institute, "Commonwealth Scientific and Industrial Research Organisation", "CSIRO")
Aus_top %>% ggplot() +
labs(title = "Top 10 Institutes in Australia",
x = "Number of Scientists",
y = NULL) +
scale_fill_manual(values = c("orangered3", "gray70"), guide = "none") + theme_classic() + theme(plot.title = element_text(color="orangered3", size=18, face="bold")) +
geom_bar(aes(x = `Number of Scientists`, y = reorder(Institute, `Number of Scientists`),
fill = Institute != "Deakin University"), stat = "identity", show.legend = FALSE)
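The rank reported above was read from the sorted frequency table. As a hedged sketch, the same rank could be derived in code so that it updates if the data change. The affiliations below are hypothetical toy values, and the object names are illustrative only.

```r
# Hypothetical toy affiliations; the real values would come from data_clean$inst_correct.
inst <- c(rep("Uni A", 5), rep("Uni B", 3), rep("Deakin University", 4), "Uni C")

# Frequency table sorted from most to fewest scientists.
counts <- sort(table(inst), decreasing = TRUE)

# ties.method = "min" gives tied institutes the same (best) rank.
ranks <- rank(-as.vector(counts), ties.method = "min")
names(ranks) <- names(counts)

deakin_rank <- unname(ranks["Deakin University"])
print(paste("Deakin University ranks", deakin_rank, "of", length(counts), "institutes"))
```

Using `rank()` rather than the row position makes the handling of ties explicit.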
Deakin’s comparative ranking in Victoria was determined using the same method as above, with the exception that I first filtered the dataset for institutes with a value of “VIC” in the inst_state variable.
# Filter data for institutes based in Victoria and save to new dataset
data_VIC <- dplyr::filter(data_clean, inst_state %in% c("VIC"))
# Determine ranking of Deakin University in VICTORIA in terms of number of scientists
inst_rank_VIC <- as.data.frame(table(data_VIC$inst_correct))
colnames(inst_rank_VIC)[1] <- "Institute"
colnames(inst_rank_VIC)[2] <- "Number of Scientists"
inst_rank_VIC <- arrange(inst_rank_VIC, desc(inst_rank_VIC[2]))
paged_table(inst_rank_VIC)
print(paste("Answer: In terms of the number of scientists in this dataset, Deakin ranked 3rd in Victoria"))
## [1] "Answer: In terms of the number of scientists in this dataset, Deakin ranked 3rd in Victoria"
# Plot Uni rankings in Victoria (Top 10)
VIC_top <- inst_rank_VIC[1:10,]
VIC_top %>% ggplot() +
labs(title = "Top 10 Institutes in Victoria",
x = "Number of Scientists",
y = NULL) +
scale_fill_manual(values = c("orangered3", "gray70"), guide = "none") + theme_classic() +
theme(plot.title = element_text(color="orangered3", size=18, face="bold")) +
geom_bar(aes(x = `Number of Scientists`, y = reorder(Institute, `Number of Scientists`),
fill = Institute != "Deakin University"), stat = "identity", show.legend = FALSE)
The percentage of Deakin-affiliated scientists in the dataset who are early-career researchers (ECRs) was calculated by first filtering the data for all scientists that had a value of “Deakin University” in the inst_correct variable. I then divided the number of scientists that had a value greater than 2012 in the firstyr variable by the total number of scientists in the filtered dataset.
# Work out percentage of Deakin Scientists who had their first publication in 2013 or later
data_Deakin <- dplyr::filter(data_clean, inst_correct %in% c("Deakin University"))
percentage_Deakin <- percent(round(sum(data_Deakin$firstyr > 2012) / nrow(data_Deakin), digits = 4)) # nrow() returns the row count as a plain number
print(paste("Answer:", percentage_Deakin, "of Deakin scientists in this dataset first published in 2013 or later"))
## [1] "Answer: 10.85% of Deakin scientists in this dataset first published in 2013 or later"
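The calculation above is a proportion of rows meeting a condition. A minimal sketch on hypothetical first-publication years shows that the mean of a logical vector gives the same proportion, and checks the reported figure from the counts stated in the summary (23 of 212).

```r
# Hypothetical first-publication years for eight scientists.
firstyr <- c(1998, 2005, 2013, 2016, 2010, 2001, 2012, 2014)

# The mean of a logical vector is the share of TRUE values.
ecr_share <- mean(firstyr > 2012)
print(sprintf("%.2f%%", 100 * ecr_share))

# Check of the reported Deakin figure: 23 ECRs out of 212 scientists.
deakin_pct <- round(100 * 23 / 212, 2)
print(deakin_pct)
```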
As the purpose of this dataset is to highlight the top 2% of Australian scientists within their fields, I wanted to examine how the rankings of the scientists from the top 10 institutes in Australia compared to one another. Specifically, I was interested in comparing the distribution of the ranks to see whether the quantitative difference between these institutes translated to a qualitative difference in the rankings of their scientists.
To achieve this, I filtered the clean dataset to select for scientists from the top 10 institutes in Australia (based on number of scientists in the dataset). I then used a boxplot to examine the distribution of rankings in those institutes, with a particular focus on the median and interquartile range. I chose a boxplot over a violin plot as it is easier to interpret for stakeholders who do not have much experience in data analysis, while still showing some information about how the data are distributed (compared to a column graph, which only shows central tendency and a measure of variance). Violin plots were also compressed at the lower end due to the wide range of values in the rank (ns) variable (108 to 1,056,509). The median was chosen over the mean as the summary statistic for central tendency due to the strong positive skew of the data.
If you like violin plots and want to see the distribution of composite scores in the top 10 institutes, check out Appendix A2.
# Institute names must match the spelling used in inst_correct exactly
data_top10 <- dplyr::filter(data_clean, `inst_correct` %in% c("University of Melbourne", "Monash University", "Deakin University", "The University of Western Australia", "The University of Adelaide", "The University of Sydney", "UNSW Sydney", "The Australian National University", "The University of Queensland", "Commonwealth Scientific and Industrial Research Organisation"))
data_top10 <- as.data.frame(data_top10)
data_top10$inst_correct <- str_replace(data_top10$inst_correct, "Commonwealth Scientific and Industrial Research Organisation", "CSIRO")
data_top10 %>% ggplot(aes(x = `rank (ns)`, y = inst_correct, fill = inst_correct != "Deakin University")) + scale_fill_manual(values = c("orangered3", "gray80"), guide = "none") +
labs(title = "Distribution of Scientist Rankings in Top 10 Institutes",
x = "Rank (ns)",
y = NULL) + theme_classic() +
theme(plot.title = element_text(color="orangered3", size=12, face="bold", hjust=0)) +
geom_boxplot()
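To illustrate why the median and interquartile range are preferred for positively skewed ranks, here is a small sketch on hypothetical rank values. The numbers are invented for illustration and are not taken from the dataset.

```r
# Hypothetical ranks with one very large value, mimicking a strong positive skew.
ranks <- c(150, 900, 2500, 8000, 20000, 60000, 1000000)

rank_mean   <- mean(ranks)    # pulled upwards by the single extreme rank
rank_median <- median(ranks)  # the middle value, unaffected by the extreme rank
rank_iqr    <- IQR(ranks)     # spread of the middle 50% of values

print(c(mean = rank_mean, median = rank_median, IQR = rank_iqr))
```

In this toy example the mean is far larger than the median because a single extreme rank dominates it, which is the behaviour the boxplot summary avoids.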
To test whether there was any statistically significant difference in scientists’ rankings between the institutes, I conducted a one-way ANOVA. While the one-way ANOVA assumes that the data are normally distributed, it can tolerate skewed data.
A statistically significant difference was found in scientists’ rankings between the institutes (F(8, 3843) = 3.83, p < .001). Post-hoc tests revealed that scientists affiliated with CSIRO ranked significantly lower than those at the University of Melbourne (p < .001), Monash University (p = .015), The University of Queensland (p < .001), UNSW Sydney (p < .001), and Deakin University (p = .012). No statistically significant differences were found between any other pair of institutes.
rank_test <-aov(`rank (ns)`~inst_correct, data_top10)
summary(rank_test)
## Df Sum Sq Mean Sq F value Pr(>F)
## inst_correct 8 4.857e+11 6.071e+10 3.826 0.000171 ***
## Residuals 3843 6.098e+13 1.587e+10
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
rank_posthoc <- posthocPairwiseT(rank_test)
print(rank_posthoc)
##
## Pairwise comparisons using t tests with pooled SD
##
## data: rank (ns) and inst_correct
##
## CSIRO Deakin University Monash University
## Deakin University 0.01206 - -
## Monash University 0.01451 1.00000 -
## The University of Adelaide 0.13806 1.00000 1.00000
## The University of Queensland 0.00051 1.00000 1.00000
## The University of Sydney 0.29106 1.00000 1.00000
## The University of Western Australia 0.25205 1.00000 1.00000
## University of Melbourne 0.00051 1.00000 1.00000
## UNSW Sydney 0.00025 1.00000 1.00000
## The University of Adelaide
## Deakin University -
## Monash University -
## The University of Adelaide -
## The University of Queensland 1.00000
## The University of Sydney 1.00000
## The University of Western Australia 1.00000
## University of Melbourne 1.00000
## UNSW Sydney 1.00000
## The University of Queensland
## Deakin University -
## Monash University -
## The University of Adelaide -
## The University of Queensland -
## The University of Sydney 0.38941
## The University of Western Australia 1.00000
## University of Melbourne 1.00000
## UNSW Sydney 1.00000
## The University of Sydney
## Deakin University -
## Monash University -
## The University of Adelaide -
## The University of Queensland -
## The University of Sydney -
## The University of Western Australia 1.00000
## University of Melbourne 0.39661
## UNSW Sydney 0.21124
## The University of Western Australia
## Deakin University -
## Monash University -
## The University of Adelaide -
## The University of Queensland -
## The University of Sydney -
## The University of Western Australia -
## University of Melbourne 1.00000
## UNSW Sydney 1.00000
## University of Melbourne
## Deakin University -
## Monash University -
## The University of Adelaide -
## The University of Queensland -
## The University of Sydney -
## The University of Western Australia -
## University of Melbourne -
## UNSW Sydney 1.00000
##
## P value adjustment method: holm
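Because the rank variable is strongly skewed, a rank-based test can serve as a sensitivity check on the ANOVA. The sketch below is not part of the original analysis: it swaps in the Kruskal-Wallis test (`kruskal.test()`) with Holm-adjusted pairwise Wilcoxon tests (`pairwise.wilcox.test()`), both from base R's stats package, and runs them on simulated, hypothetical data with three made-up groups.

```r
set.seed(1)

# Simulated, hypothetical skewed ranks for three made-up institutes.
toy <- data.frame(
  rank_ns = c(rlnorm(30, meanlog = 10), rlnorm(30, meanlog = 10), rlnorm(30, meanlog = 11)),
  inst    = factor(rep(c("A", "B", "C"), each = 30))
)

# Kruskal-Wallis rank-sum test: a non-parametric alternative to one-way ANOVA.
kw <- kruskal.test(rank_ns ~ inst, data = toy)
print(kw)

# Pairwise Wilcoxon rank-sum tests with Holm adjustment for multiple comparisons.
pw <- pairwise.wilcox.test(toy$rank_ns, toy$inst, p.adjust.method = "holm")
print(pw)
```

If the rank-based test and the ANOVA agree, the conclusion is less likely to depend on the normality assumption.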
Since the scientists in this dataset represent the top 2% in each primary research subfield, I was next interested in the research subfields of Australia’s top scientists. In which fields are Australian scientists leading research? And in what research fields do Deakin University’s top scientists excel?
To examine this, I looked at the frequency of the research subfields in the sm-subfield-1 variable for the whole dataset (Australia), when data was filtered on “VIC” in the inst_state variable (Victoria), and when data was filtered on “Deakin University” in the inst_correct variable.
While General & Internal Medicine was the top subfield at the national and state level, scientists at Deakin University excelled at the Materials subfield. Artificial Intelligence and Image Processing ranked second at Deakin and across Australia, and third in Victoria, surpassing several health science subfields.
#Clinical Medicine was the top-ranked research category, accounting for the largest share (31.72%) of Australian scientists in this dataset. This was followed by Engineering (7.5%), Enabling & Strategic Technologies (7.40%), and Biology (6.97%).
#Now rank on subfield - Aus
subfield_All <- as.data.frame(table(data_clean$`sm-subfield-1`))
subfield_All <- arrange(subfield_All, desc(subfield_All[2]))
colnames(subfield_All)[1] <- "A Research Subfield"
colnames(subfield_All)[2] <- "A Count"
#print(subfield_All[1:10,])
#And rank on subfield - VIC
subfield_VIC <- as.data.frame(table(data_VIC$`sm-subfield-1`))
subfield_VIC <- arrange(subfield_VIC, desc(subfield_VIC[2]))
colnames(subfield_VIC)[1] <- "B Research Subfield"
colnames(subfield_VIC)[2] <- "B Count"
#print(subfield_VIC[1:10,])
#Then rank on subfield - Deakin
subfield_Deakin <- as.data.frame(table(data_Deakin$`sm-subfield-1`))
subfield_Deakin <- arrange(subfield_Deakin, desc(subfield_Deakin[2]))
colnames(subfield_Deakin)[1] <- "C Research Subfield"
colnames(subfield_Deakin)[2] <- "C Count"
#print(subfield_Deakin[1:10,])
#combine them into a table
subfield_comb <-data.frame(subfield_All[1:10,], subfield_VIC[1:10,], subfield_Deakin[1:10,])
subfield_comb <- flextable(subfield_comb)
subfield_comb <- subfield_comb|>
add_header_row(top = TRUE, values = c("Australia",
"Victoria", "Deakin University"),colwidths = c(2,2,2))
subfield_comb <- autofit(subfield_comb)
#print(subfield_comb)
flextable::save_as_image(subfield_comb, path = "subfield_ranks.png")
Finally, I was interested in how the representation of ECRs at Deakin University compared to the other top Australian institutes. The percentage of ECRs in each of the top 10 Australian institutes (by number of scientists) was calculated in Excel using the VLOOKUP function, and the institutes were then ranked.
Deakin University had the highest percentage of ECRs among the top 10 Australian institutes. Deakin University also had a higher number of ECRs than the University of Melbourne despite having one-third the total number of scientists in the dataset. This suggests that larger institutions such as the University of Melbourne may have a more senior research profile, while Deakin University attracts and supports talented new researchers.
The imported table can be found in Appendix A3.
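The per-institute ECR percentages were calculated in Excel. As a hedged sketch only, a comparable summary could be built in R with `tapply()`. The toy data and the output column names (`Institute`, `Scientists`, `ECRs`) are hypothetical and do not match the columns of the imported table.

```r
# Hypothetical toy data; only inst_correct and firstyr are names from the report.
toy <- data.frame(
  inst_correct = c("Uni A", "Uni A", "Uni A", "Uni A",
                   "Deakin University", "Deakin University",
                   "Uni B", "Uni B", "Uni B"),
  firstyr = c(1990, 2001, 2014, 2008, 2015, 2003, 1999, 2011, 2013)
)

# Flag ECRs using the same rule as the report: first publication after 2012.
toy$is_ecr <- toy$firstyr > 2012

# Count scientists and ECRs per institute.
n_total <- tapply(toy$is_ecr, toy$inst_correct, length)
n_ecr   <- tapply(toy$is_ecr, toy$inst_correct, sum)

ecr_summary <- data.frame(
  Institute  = names(n_total),
  Scientists = as.vector(n_total),
  ECRs       = as.vector(n_ecr)
)
ecr_summary$Percentage <- ecr_summary$ECRs / ecr_summary$Scientists

# Rank institutes from highest to lowest ECR share.
ecr_summary <- ecr_summary[order(-ecr_summary$Percentage), ]
print(ecr_summary)
```

Computing the summary in R keeps the whole analysis in one script and avoids a manual export and import step.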
ECR_Aus_top %>% ggplot() +
labs(title = "Percentage of ECRs in Top 10 Australian Institutes",
x = "Percentage",
y = NULL) +
scale_fill_manual(values = c("orangered3", "gray70"), guide = "none") + theme_classic() +
theme(plot.title = element_text(color="orangered3", size=14, face="bold", hjust=0)) + scale_x_continuous(labels = scales::percent)+
geom_bar(aes(x = Percentage, y = reorder(`Top 10 Aus`, Percentage),
fill = `Top 10 Aus` != "Deakin University"), stat = "identity", show.legend = FALSE)
ECR_Aus_top %>% ggplot() +
labs(title = "Number of ECRs in Top 10 Australian Institutes",
x = "Number of Scientists",
y = NULL) +
scale_fill_manual(values = c("orangered3", "gray70"), guide = "none") + theme_classic() +
theme(plot.title = element_text(color="orangered3", size=14, face="bold", hjust=0)) +
geom_bar(aes(x = `Number of ECRs`, y = reorder(`Top 10 Aus`, `Number of ECRs`),
fill = `Top 10 Aus` != "Deakin University"), stat = "identity", show.legend = FALSE)
Below is the list of all incorrect institute names that appeared in the dataset, how many scientists were affected, and what the name was changed to in the inst_correct variable.
Violin plots give the most complete information about the distribution of data, but the range of values of the rank (ns) variable resulted in compression of the plots at the lower end. Rankings are based on the composite score, which has a much smaller range. I therefore made a violin plot of the composite scores to see if they provided any information about the data that wasn’t already captured by the boxplot. There was nothing worth noting in the violin plots, so I retained the boxplot of the rank (ns) variable in the final solution.
data_top10 %>% ggplot(aes(x = `c (ns)`, y = inst_correct, fill = inst_correct != "Deakin University")) + scale_fill_manual(values = c("orangered3", "gray85"), guide = "none") +
labs(title = "Distribution of Composite Scores in Top 10 Institutes",
x = "Composite Score",
y = NULL) + theme_classic() +
theme(plot.title = element_text(color="orangered3", size=12, face="bold", hjust=0)) +
geom_violin()+geom_boxplot(width = 0.1)
Below is a table showing the data used to calculate the percentage of ECRs in each of the top 10 Australian institutes. This table was created in Excel and loaded into R for further analysis.