Software Tools for Earth and Environmental Science

YSB 801E

Lecture 1 - Data and Code

  • Course

  • Data

  • Code

  • New Accounts

Course

Objective

  • Introduction to software tools.

  • Traditional 5-step process for the scientific method

  1. Observation
  2. Hypothesis
  3. Experimental Design
  4. Data collection
  5. Analysis and Conclusion

Syllabus

Extended Syllabus

LINK

Book

PDF

Flow

LINK

Course GitHub Web Page

LINK

Data

Data

  • What are Data and Metadata?
  • Data Collection and Generation
  • Data Types, Formats and Sources
  • Download and Get the Data
  • Popular Terms About Data

What are Data and Metadata?

Data are things known or assumed as facts that form the basis of reasoning or calculation.

Metadata is information about data.
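The distinction can be illustrated with a minimal sketch. The values, the station identifier, and the helper `describe` are all hypothetical; the point is only that the list holds the data while the dictionary holds information about the data.

```python
# Hypothetical example: a short temperature record and its metadata.
data = [12.4, 13.1, 11.8, 14.0]  # the data: measured values

metadata = {  # the metadata: information about those values
    "variable": "air temperature",
    "units": "degC",
    "station": "EXAMPLE-01",  # made-up station identifier
    "sampling_interval": "1 hour",
}


def describe(values, meta):
    """Return a one-line summary that combines the data with its metadata."""
    return f"{len(values)} values of {meta['variable']} in {meta['units']}"


print(describe(data, metadata))
```

Without the metadata, the four numbers could not be interpreted: the units and the measured variable would be unknown.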

Metadata


Data Collection and Generation

Data collection is the process of gathering and measuring information on targeted variables in an established system. The purpose is to answer relevant questions and/or evaluate outcomes.

  • Observational
  • Statistical
  • Simulation

Data Types

Data Formats

  • Text, Picture, Audio, Video
  • File: pdf, txt, csv, html, xml, nc, hdf
  • Point, Line, Polygon
  • 1D, 2D, 3D, xD
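As one illustration of working with a file format from the list above, the sketch below parses csv content with Python's standard `csv` module. The column names and values are made up, and the text is held in a string here only so that the example is self-contained; in practice it would be read from a file.

```python
import csv
import io

# Hypothetical CSV content; in practice this would be read from a file.
csv_text = """station,temperature_c
A,12.5
B,14.0
C,13.5
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(csv_text)
# CSV stores everything as text, so numeric columns must be converted.
temperatures = [float(row["temperature_c"]) for row in rows]
print(temperatures)
```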

Data Sources

A data source is the location from which the data being used originates.

LINK

Download and Get the Data

You can just click the DOWNLOAD button.

What if the data are very large, or you need hundreds or thousands of data files?
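One common answer is to script the download instead of clicking. The sketch below is only an assumption about how such a script might look: the server `example.org`, the file naming scheme, and the helper names `build_urls` and `download_all` are hypothetical and must be adapted to the real data source.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Hypothetical server and naming scheme; replace with the real data source.
BASE_URL = "https://example.org/data"


def build_urls(year, months):
    """Build one URL per monthly file, for example .../2020_01.nc."""
    return [f"{BASE_URL}/{year}_{month:02d}.nc" for month in months]


def download_all(urls, target_dir):
    """Download every URL into target_dir, skipping files already present."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for url in urls:
        destination = target / url.rsplit("/", 1)[-1]
        if not destination.exists():
            urlretrieve(url, str(destination))


urls = build_urls(2020, range(1, 13))
print(len(urls), urls[0])
# download_all(urls, "downloads")  # uncomment once BASE_URL points to real data
```

Skipping files that already exist lets an interrupted run be restarted without downloading everything again.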

Interpretation and Visualization

Interpretation is the process of making sense of numerical data that has been collected, analyzed, and presented.

Visualization is the graphical representation of information and data.

Data Analysis

Data analysis is a process of inspecting, cleaning, transforming and modeling data with the goal of discovering useful information, informing conclusions and supporting decision-making.

Exploratory Data Analysis (EDA)

In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods.
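A first pass at EDA might look like the sketch below, which uses only the Python standard library. The rainfall values are hypothetical, and the text bars are a crude stand-in for a real plot.

```python
import statistics

# Hypothetical daily rainfall totals in millimetres.
rainfall_mm = [0.0, 2.5, 0.0, 10.0, 4.5, 0.0, 1.0]


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


summary = summarize(rainfall_mm)
for name, value in summary.items():
    print(f"{name}: {value}")

# A crude text "plot": one '#' per whole millimetre.
for value in rainfall_mm:
    print(f"{value:5.1f} " + "#" * int(value))
```

Here the mean is well above the median, which already hints that a single wet day dominates the record.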

Big Data

Big data refers to data sets that are too large or complex to be handled by traditional data-processing application software.

Apache Point, New Mexico, US, 1995

After one month, Apache Point had collected as much astronomical data as had been collected in all the time before it.

After ten years, it had collected as much data as all the data we have (not only astronomical data, but all data).

Important because of the golden rate.

ALMA, Atacama, Antofagasta Region, Chile, 2011

66 high-precision antennas (each over 100 tons) that work together like one big telescope. The antennas can be moved across the desert plateau over distances from 150 m to 16 km.

Atacama Large Millimeter/submillimeter Array, ALMA Transporter (130-ton) and Correlator (1.5 trillion/sec)

In 5 days it collects as much data as we have. We also found the Black Body.

Data Mining

Data mining is the process of discovering patterns in large data sets.

Data Assimilation and Manipulation

Data assimilation is a mathematical discipline that seeks to optimally combine theory with observations.
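The simplest case is a single quantity with one model forecast and one observation. Assuming each has a known error variance, the minimum-variance estimate weights the two by their uncertainties. The sketch below shows only this scalar case; the function name `assimilate` and the numbers are hypothetical, and real systems work with many variables at once.

```python
def assimilate(forecast, forecast_var, observation, observation_var):
    """Combine a forecast and an observation, weighting each by its uncertainty.

    Returns the combined value (the analysis) and its variance, scalar case.
    """
    # The gain is near 1 when the forecast is uncertain, near 0 when it is trusted.
    gain = forecast_var / (forecast_var + observation_var)
    analysis = forecast + gain * (observation - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var


# Hypothetical numbers: the model says 20.0 degC, a thermometer reads 22.0 degC,
# and both are assumed to be equally uncertain (variance 1.0).
analysis, analysis_var = assimilate(20.0, 1.0, 22.0, 1.0)
print(analysis, analysis_var)
```

With equal uncertainties the result lies halfway between forecast and observation, and its variance is smaller than either input variance.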

Data manipulation is the inserting, deleting, and modifying of data in a database.
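The three operations can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the table name, the columns, and the use of -999.0 as a missing-value marker are assumptions made for illustration.

```python
import sqlite3

# An in-memory database with a hypothetical table of station readings.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE readings (station TEXT, temperature_c REAL)")

# Insert three rows.
connection.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("A", 12.5), ("B", 14.0), ("C", -999.0)],
)
# Modify one row.
connection.execute("UPDATE readings SET temperature_c = 13.0 WHERE station = 'A'")
# Delete rows where -999.0 (assumed here to mark a missing value) appears.
connection.execute("DELETE FROM readings WHERE temperature_c = -999.0")

rows = connection.execute(
    "SELECT station, temperature_c FROM readings ORDER BY station"
).fetchall()
print(rows)
connection.close()
```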

Chemical and Physical

Data Science

Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge from structured and unstructured data.

Code

Code

words, letters, figures, or symbols

telegraph key / morse alphabet / morse code (1835)

In computing -> program instructions.
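Both senses of the word meet in the small sketch below: program instructions that translate text into Morse code. The function name `to_morse` is hypothetical, only the letters A to Z are covered, and any other character is silently skipped.

```python
# International Morse code for the letters A to Z.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".",
    "F": "..-.", "G": "--.", "H": "....", "I": "..", "J": ".---",
    "K": "-.-", "L": ".-..", "M": "--", "N": "-.", "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.", "S": "...", "T": "-",
    "U": "..-", "V": "...-", "W": ".--", "X": "-..-", "Y": "-.--",
    "Z": "--..",
}


def to_morse(text):
    """Encode letters as Morse code, separating the letters with spaces."""
    return " ".join(MORSE[ch] for ch in text.upper() if ch in MORSE)


print(to_morse("sos"))
```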

Code

  • Operating Systems
  • Programming Languages
  • Interpretation and Visualization
  • Algorithm, Simulation and Modelling
  • Popular Terms About Programming

Operating Systems

An operating system (OS) is system software that manages computer hardware and software resources and provides common services for computer programs.

Microsoft Windows, Apple macOS, Linux

Programming Languages

Top 20 (256) most popular programming languages (2019-2020)

Interpretation and Visualization

Interpretation is the process of making sense of the data that has been collected, analyzed, and presented.

Visualization is the graphical representation of information and data.

Algorithm

An algorithm is a set of instructions, typically to solve a class of problems or perform a computation.
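A classic example is Euclid's algorithm for the greatest common divisor of two integers. It is chosen here only as an illustration: the same few instructions solve the whole class of problems, whatever the two numbers are.

```python
def gcd(a, b):
    """Greatest common divisor of two non-negative integers (Euclid's algorithm)."""
    # Repeatedly replace the pair (a, b) by (b, a mod b) until b becomes zero.
    while b != 0:
        a, b = b, a % b
    return a


print(gcd(48, 18))
```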

Simulation

A simulation is an approximate imitation of the operation of a process or system that represents its operation over time.
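As a small illustration, the sketch below steps a cooling object forward in time using Newton's law of cooling and a simple Euler time step. The function name, the cooling rate, and the temperatures are hypothetical, and the result is only approximate because time is advanced in finite steps.

```python
def simulate_cooling(initial_temp, ambient_temp, rate, dt, steps):
    """Step the temperature forward in time, returning one value per time step.

    Assumes the cooling speed is proportional to the difference between the
    object and its surroundings (Newton's law of cooling).
    """
    temps = [initial_temp]
    for _ in range(steps):
        current = temps[-1]
        change = -rate * (current - ambient_temp) * dt
        temps.append(current + change)
    return temps


# Hypothetical setup: a 90 degC object in a 20 degC room, 30 steps of 1 minute.
temps = simulate_cooling(90.0, 20.0, 0.1, 1.0, 30)
print(f"start: {temps[0]:.1f}, after 30 min: {temps[-1]:.1f}")
```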

Modelling

A model is a description of a system using mathematical concepts. The aim of modelling is to make a particular part or feature of a thing easier to understand.
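One of the simplest mathematical models is a straight line fitted to measurements by least squares. The sketch below assumes a linear relation between altitude and temperature; the data values and the function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements: air temperature at four altitudes.
altitude_km = [0.0, 1.0, 2.0, 3.0]
temperature_c = [15.0, 8.5, 2.0, -4.5]

slope, intercept = fit_line(altitude_km, temperature_c)
print(f"temperature = {slope:.1f} * altitude + {intercept:.1f}")
```

The two fitted numbers summarize the whole data set: a surface temperature and a rate of change with height.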

Artificial Intelligence

Artificial intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning, reasoning and self-correction.

Machine Learning

Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions.

1997, Deep Blue vs Garry Kasparov
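The idea of performing a task without explicit instructions can be shown with a nearest-neighbour classifier, one of the simplest learning methods. In this sketch no rule such as "high humidity means rain" is written anywhere; the answer comes from labelled examples. The training data and the function name are hypothetical.

```python
def nearest_neighbor_predict(training, sample):
    """Return the label of the training example closest to the sample."""

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(training, key=lambda pair: squared_distance(pair[0], sample))
    return label


# Hypothetical labelled examples: (temperature in degC, humidity in %) -> label.
training = [
    ((30.0, 20.0), "dry"),
    ((28.0, 25.0), "dry"),
    ((18.0, 90.0), "rainy"),
    ((16.0, 85.0), "rainy"),
]

print(nearest_neighbor_predict(training, (17.0, 88.0)))
```

Changing the examples changes the behaviour, without touching the code.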

Deep Learning

Deep learning is part of a family of machine learning methods based on artificial neural networks.

Internet of Things

The Internet of Things (IoT) is a system of interrelated computing devices and mechanical and digital machines that can transfer data over a network without requiring human-to-human or human-to-computer interaction.

NEW ACCOUNTS

NEW ACCOUNTS

  • GitHub, ResearchGate, DOI Code, ORCID, Overleaf (LaTeX)
  • Mendeley, Panoply, Sublime Text, FileZilla
  • ArcGIS, QGIS
  • Anaconda, Jupyter, Cygwin, RStudio, NCL
  • MetEd, Coursera, Udemy, DataCamp, edX, Khan Academy
  • Stack Overflow, Wolfram Alpha, Dropbox, WeTransfer

LINK

BY THE WAY

I prepared these slides in R

Our course website is also prepared in R

See you

I will share these on Ninova

I will let you know what we will do next week

  • Quiz
  • Homework

You can also check the Extended Abstract

Questions?

- THE END -