Unit 2: Fundamentals of R
2026-06-08


- Package manager to facilitate loading and updating software libraries
- Extensive collection of modules and packages for a wide range of functions (maps, data manipulation, etc.)
- Active support and continued development from a community of academic and corporate users
- Integrated Development Environment and Data Workbook
| Feature | R | Python |
|---|---|---|
| Overview | R is a language and environment for statistical programming, including statistical computing and data graphics. | Python is a general-purpose programming language for data analysis, scientific computing and application development. It aims to reduce program complexity through common approaches. |
| Design Objective | Designed by statisticians for data analysis, modelling and representation, in both batch computation and interactive websites, with the aim of simplifying complex mathematics and statistics. | Designed by engineers and computer scientists to develop GUI, web and embedded hardware applications. |
| Key applications | Forecasting, data visualization, machine learning | Data collection, computer vision, machine learning |
R vs Python
https://www.tiobe.com/tiobe-index/

- Scalar: `a = 15`
- Vector: `a = c(1,2,3,4)`
- Matrix: `m = matrix(1:6, nrow = 2, ncol = 3)`
- Array: `a = array(1:8, dim = c(2, 2, 2))`
- List: `l = list(name = "Alice", scores = c(95, 88, 100), has_passed = TRUE)`
- Data frame: `df = data.frame(id = c(1,2,3), name = c("Bob","Carl","Del"), active = c(TRUE, FALSE, TRUE))`
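
A minimal sketch of how the structures above can be inspected and indexed. The names `v` and `arr` are assumptions introduced here so that the vector and the array do not overwrite the scalar `a`; everything else follows the definitions above.

```r
a   <- 15                                   # length-one numeric vector (a "scalar")
v   <- c(1, 2, 3, 4)                        # numeric vector
m   <- matrix(1:6, nrow = 2, ncol = 3)      # 2 x 3 matrix, filled column by column
arr <- array(1:8, dim = c(2, 2, 2))         # 2 x 2 x 2 array
l   <- list(name = "Alice", scores = c(95, 88, 100), has_passed = TRUE)
df  <- data.frame(id = c(1, 2, 3),
                  name = c("Bob", "Carl", "Del"),
                  active = c(TRUE, FALSE, TRUE))

dim(m)                        # 2 3
m[2, 3]                       # 6: row 2, column 3
arr[1, 2, 2]                  # 7: row 1, column 2, layer 2
mean(l$scores)                # list elements are accessed by name with $
active_names <- as.character(df[df$active, "name"])   # rows where active is TRUE
active_names                  # "Bob" "Del"
```

Matrices and arrays hold a single type, while lists and data frames can mix types, which is why `l` and `df` can combine numbers, text and logical values.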

`plot(x, y2, type = "l")` draws a line plot; `hist(y2, breaks = 25)` draws a histogram.
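
The plotting calls above use `x` and `y2` without defining them. The sketch below assumes `x` is an evenly spaced grid and `y2` is a noisy sine signal. Both are hypothetical stand-ins, not the original data.

```r
set.seed(42)                                # reproducible noise
x  <- seq(0, 10, length.out = 500)          # hypothetical x grid
y2 <- sin(x) + rnorm(length(x), sd = 0.3)   # hypothetical noisy signal

if (!interactive()) pdf(NULL)               # avoid writing Rplots.pdf when run as a script
plot(x, y2, type = "l")                     # line plot of the series
h <- hist(y2, breaks = 25)                  # histogram; breaks is a suggestion, not an exact bin count
if (!interactive()) invisible(dev.off())
```

The object returned by `hist()` stores the bin edges in `h$breaks` and the bin counts in `h$counts`, so the histogram can be inspected without redrawing it.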
## Student Heights
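Given 6 randomly selected individuals from an approximately normally distributed population, the middle 4 are expected to fall within $\pm 1$ standard deviation of the mean: about 68% of values lie in that range, and each tail holds about 16% (roughly 1 in 6).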
::: {.cell}
::: {.cell-output .cell-output-stdout}
[1] 1.869028
:::
::: {.cell-output .cell-output-stdout}
[1] 1.527639
:::
:::
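
As a rough check of how many of six individuals fall within $\pm 1$ standard deviation, the sketch below computes the normal-theory share and then counts observations in a small sample. The six heights are hypothetical values in metres chosen for illustration; they are not the data behind the output above.

```r
# Share of a normal population within +/- 1 standard deviation of the mean.
p_within      <- pnorm(1) - pnorm(-1)   # about 0.683
expected_of_6 <- 6 * p_within           # about 4.1 individuals out of 6

# Hypothetical sample of six heights in metres.
heights <- c(1.52, 1.60, 1.68, 1.72, 1.79, 1.88)
lower   <- mean(heights) - sd(heights)  # lower edge of the 1 SD band
upper   <- mean(heights) + sd(heights)  # upper edge of the 1 SD band

# Count the observations inside the band.
n_within <- sum(heights >= lower & heights <= upper)
n_within                                # 4
```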
## R Notebook {.scrollable}
- Title
- Authors
- Date
- Abstract
- Introduction
  - The nature of the problem
  - What work has been done before
  - Key Research Objectives
- Methodology
- Results
- Discussion
- Conclusion
  - Possible steps for future research
- Bibliography
## Data file types
- Excel: `readxl::read_excel`
- SQL: `DBI::dbConnect`, `DBI::dbGetQuery`
- CSV: `readr::read_csv`
- XML: `xml2::read_xml`
- YAML: `yaml::read_yaml`
- JSON: `jsonlite::read_json`
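
A minimal sketch of the read pattern shared by these readers, using temporary files so it runs anywhere. It assumes base R's `read.csv` as a dependency-free stand-in for `readr::read_csv`, and the file contents are made up for illustration. The JSON step runs only if the `jsonlite` package is installed.

```r
# Write a small CSV to a temporary file, then read it back.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,active",
             "1,Bob,TRUE",
             "2,Carl,FALSE",
             "3,Del,TRUE"), csv_path)
df <- read.csv(csv_path)      # column types are inferred from the text
str(df)

# Same idea for JSON, guarded because jsonlite may not be installed.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  json_path <- tempfile(fileext = ".json")
  writeLines('{"name": "Alice", "scores": [95, 88, 100]}', json_path)
  rec <- jsonlite::read_json(json_path, simplifyVector = TRUE)
  print(rec$scores)           # JSON array read as a numeric vector
}
```

The other formats follow the same shape: pass a path (or, for SQL, a connection and a query) to the reader and get back an R object.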
## Data Sources {.sources}
- Weather APIs: api.open-meteo.com
- Kaggle: https://kaggle.com
- GitHub: https://github.com
- Open Government Data Platform India (data.gov.in): https://data.gov.in
- EU Open Data Portal: https://data.europa.eu/en
- UCI Machine Learning Repository: https://archive.ics.uci.edu/
- Hugging Face Datasets: https://huggingface.co/datasets
- Open Data on AWS: https://registry.opendata.aws/
- Harvard Dataverse: https://data.harvard.edu/dataverse
- PhysioNet: https://physionet.org/
- World Bank Open Data: https://data.worldbank.org/
- Federal Reserve Economic Data: https://fred.stlouisfed.org/
- GNU Regression, Econometrics and Time-series Library: https://gretl.sourceforge.net/
**Commercial Data Networks**
- Stata: https://www.stata.com/
- Microsoft Power BI
- Tableau
- SAP
- IBM Dashboard (SPSS)
- Splunk
- Data.world
- Bitbucket
- Google Docs
- Dropbox
- Flight tracker
- Marine Traffic
## Weather
- **Source:** api.open-meteo.com
- Current temperature at CNX (Chiang Mai) Airport:
https://api.open-meteo.com/v1/forecast?latitude=18.7668&longitude=98.9626&current_weather=true
- Past week:
https://api.open-meteo.com/v1/forecast?latitude=18.7706&longitude=98.9626&hourly=temperature_2m,windspeed,winddirection,weathercode,rain,surface_pressure&past_days=7&forecast_days=0&timezone=Asia%2FBangkok
`download.file("https://api.open-meteo.com/v1/forecast?latitude=18.7668&longitude=98.9626&current_weather=true", "temp.txt")`
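
A hedged sketch of building the request URL and reading the response directly into R instead of saving it to a file. The helper `open_meteo_url` is a hypothetical name introduced here. The sketch also assumes that the JSON response carries a `current_weather` object with a `temperature` field. The fetch is guarded because it needs network access and the `jsonlite` package.

```r
# Build an Open-Meteo forecast URL for the current weather at a given point.
# open_meteo_url is a hypothetical helper, not part of any package.
open_meteo_url <- function(latitude, longitude) {
  sprintf(
    "https://api.open-meteo.com/v1/forecast?latitude=%.4f&longitude=%.4f&current_weather=true",
    latitude, longitude
  )
}

url <- open_meteo_url(18.7668, 98.9626)   # CNX Airport coordinates from above

# Fetch and parse only in an interactive session with jsonlite available.
if (interactive() && requireNamespace("jsonlite", quietly = TRUE)) {
  weather <- jsonlite::fromJSON(url)
  # Assumed response layout: weather$current_weather$temperature
  print(weather$current_weather$temperature)
}
```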


| Code | Description |
|---|---|
| 0 | Clear sky |
| 1, 2, 3 | Mainly clear, partly cloudy, and overcast |
| 45, 48 | Fog and depositing rime fog |
| 51, 53, 55 | Drizzle: Light, moderate, and dense intensity |
| 56, 57 | Freezing Drizzle: Light and dense intensity |
| 61, 63, 65 | Rain: Slight, moderate and heavy intensity |
| 66, 67 | Freezing Rain: Light and heavy intensity |
| 71, 73, 75 | Snow fall: Slight, moderate, and heavy intensity |
| 77 | Snow grains |
| 80, 81, 82 | Rain showers: Slight, moderate, and violent |
| 85, 86 | Snow showers: Slight and heavy |
| 95 * | Thunderstorm: Slight or moderate |
| 96, 99 * | Thunderstorm with slight and heavy hail |
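
The table above can be turned into a lookup so that a numeric `weathercode` from the API is reported as text. This is a sketch: `wmo_codes` and `describe_weathercode` are hypothetical names, and the descriptions are copied from the table.

```r
# Named character vector: weather code -> description (from the table above).
wmo_codes <- c(
  "0"  = "Clear sky",
  "1"  = "Mainly clear", "2" = "Partly cloudy", "3" = "Overcast",
  "45" = "Fog", "48" = "Depositing rime fog",
  "51" = "Drizzle: light", "53" = "Drizzle: moderate", "55" = "Drizzle: dense",
  "56" = "Freezing drizzle: light", "57" = "Freezing drizzle: dense",
  "61" = "Rain: slight", "63" = "Rain: moderate", "65" = "Rain: heavy",
  "66" = "Freezing rain: light", "67" = "Freezing rain: heavy",
  "71" = "Snow fall: slight", "73" = "Snow fall: moderate", "75" = "Snow fall: heavy",
  "77" = "Snow grains",
  "80" = "Rain showers: slight", "81" = "Rain showers: moderate", "82" = "Rain showers: violent",
  "85" = "Snow showers: slight", "86" = "Snow showers: heavy",
  "95" = "Thunderstorm: slight or moderate",
  "96" = "Thunderstorm with slight hail", "99" = "Thunderstorm with heavy hail"
)

# Look up one or more codes; codes missing from the table map to "Unknown code".
describe_weathercode <- function(code) {
  desc <- unname(wmo_codes[as.character(code)])
  ifelse(is.na(desc), "Unknown code", desc)
}

describe_weathercode(c(0, 63, 999))
```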
**Association Rule:** An implication expression of the form $X \rightarrow Y$, where $X$ and $Y$ are disjoint itemsets. It signifies that if the items in set $X$ are present in a transaction, the items in set $Y$ are also likely to be present.

**Support:** A metric that measures how frequently an itemset appears in the entire database. For a rule $X \rightarrow Y$:

$$\text{Support}(X \rightarrow Y) = \text{Support}(X \cup Y) = \frac{\text{Number of transactions containing both } X \text{ and } Y}{\text{Total number of transactions}}$$

**Confidence:** The fraction of transactions containing $X$ that also contain $Y$:

$$\text{Confidence}(X \rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)}$$

**Lift:** The ratio of the rule's confidence to the support of $Y$ alone. A value above 1 indicates that $X$ and $Y$ occur together more often than expected if they were independent:

$$\text{Lift}(X \rightarrow Y) = \frac{\text{Confidence}(X \rightarrow Y)}{\text{Support}(Y)}$$
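
A worked example of the three metrics in base R, on a hypothetical set of five shopping baskets invented for illustration. The helper `support` is an assumption of this sketch, not a function from a package.

```r
# Five hypothetical transactions (baskets of items).
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk"),
  c("bread", "milk")
)

# Support of an itemset: share of transactions that contain every item in it.
support <- function(items) {
  mean(vapply(transactions, function(t) all(items %in% t), logical(1)))
}

X <- "bread"
Y <- "milk"

supp_xy <- support(c(X, Y))        # 3 of 5 baskets contain both: 0.6
conf_xy <- supp_xy / support(X)    # 0.6 / 0.8 = 0.75
lift_xy <- conf_xy / support(Y)    # 0.75 / 0.8 = 0.9375

c(support = supp_xy, confidence = conf_xy, lift = lift_xy)
```

Here the lift is slightly below 1, so in this toy data buying bread makes milk marginally less likely than its overall frequency would suggest.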

IT408