Data import to R studio

P K Parida, CRFM

Look at your file

Look at your file location
Look at file extension / type ( .csv, .xlsx)
Right click on you file and go to the properties and see the location, copy the location and paste in the source editor pane of R studio
Change the back slash'\' format to forward slash format “/”
"C:\ Users\ PK Parida\ Documents" to "C:/Users/PK Parida/Documents" and then a forward slash and then the file name with extension

read your file by read.csv or read_csv or read_excel

Read the file

w1 <- read.csv(“~/mc3.csv”)
Library (Tidyverse)
w1 <- read_csv (“~/mc3.csv”)
if you want to read the excel file
library(readxl)
dataset <- read_excel(“~/table6_3.xlsx”)

read_csv() will always read variables containing text as character variables. In contrast, the base R function read.csv() will, by default, convert any character variable to a factor

Direct import the file from the Environment pane

click on import sign in the environment pane
select the type of file you are interested
then browse the file from your location, select the file and click on open
the details of file will open in environment pane
select the sheet number or name if it is an excel file

for csv or TSV file select the “From Text (base)” in the import sign

saving a file in R studio

CSVs a little unreliable for caching interim results—you need to recreate the column specification every time you load in.
`write_csv(table6, “table6.csv”)`
where student is the name of the analysis in R
write_rds() and read_rds() are uniform wrappers around the base functions readRDS() and saveRDS().
`write_rds(table6, “table6.rds”)`
reading a rds file in R : `read_rds(“table6.rds”)`

Reading the semicolon - separated file and TSV files

read_csv2() reads semicolon-separated files. These use ; instead of , to separate fields and are common in countries that use , as the decimal marker.
read_tsv() reads tab-delimited files.Reading file in R studio

Reading package in R studio

library(tidyverse)

Read Table from excel file

library(readxl)

table2 <- read_excel("table6_3.xlsx", sheet = "Sheet2")

glimpse(table2)

Rows: 12
Columns: 4
$ Country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "B…
$ year    <dbl> 1999, 1999, 2000, 2000, 1999, 1999, 2000, 2000, 1999, 1999, 20…
$ type    <chr> "cases", "population", "cases", "population", "cases", "popula…
$ count   <dbl> 745, 19987071, 2666, 20595360, 37737, 172006362, 80488, 174504…

Cases per country per year

table2 |>
  group_by(Country, year) |>
  filter(type ==  "cases")

# A tibble: 6 × 4
# Groups:   Country, year [6]
  Country      year type   count
  <chr>       <dbl> <chr>  <dbl>
1 Afghanistan  1999 cases    745
2 Afghanistan  2000 cases   2666
3 Brazil       1999 cases  37737
4 Brazil       2000 cases  80488
5 China        1999 cases 212258
6 China        2000 cases 213766

Cases per country per year - 2

table2 |>
  group_by(Country, year) |>
  filter(type %in%  "cases")

# A tibble: 6 × 4
# Groups:   Country, year [6]
  Country      year type   count
  <chr>       <dbl> <chr>  <dbl>
1 Afghanistan  1999 cases    745
2 Afghanistan  2000 cases   2666
3 Brazil       1999 cases  37737
4 Brazil       2000 cases  80488
5 China        1999 cases 212258
6 China        2000 cases 213766

Wide the Table

table2_1 = pivot_wider(table2, names_from = type, values_from = count)

glimpse(table2_1)

Rows: 6
Columns: 4
$ Country    <chr> "Afghanistan", "Afghanistan", "Brazil", "Brazil", "China", …
$ year       <dbl> 1999, 2000, 1999, 2000, 1999, 2000
$ cases      <dbl> 745, 2666, 37737, 80488, 212258, 213766
$ population <dbl> 19987071, 20595360, 172006362, 174504898, 1272915272, 12804…

Extract the TB cases per year

table2_1 |> 
  group_by(Country, year, cases) |> 
  summarize(total_cases = sum(cases))

# A tibble: 6 × 4
# Groups:   Country, year [6]
  Country      year  cases total_cases
  <chr>       <dbl>  <dbl>       <dbl>
1 Afghanistan  1999    745         745
2 Afghanistan  2000   2666        2666
3 Brazil       1999  37737       37737
4 Brazil       2000  80488       80488
5 China        1999 212258      212258
6 China        2000 213766      213766

Alternatively

# as the cases and sum of case per year is same, so alternatively

table2_1 |> 
  select(Country, year, cases)

# A tibble: 6 × 3
  Country      year  cases
  <chr>       <dbl>  <dbl>
1 Afghanistan  1999    745
2 Afghanistan  2000   2666
3 Brazil       1999  37737
4 Brazil       2000  80488
5 China        1999 212258
6 China        2000 213766

Rate of cases

# caculation of rate of cases

#Rate: Divide cases by population, and multiply by 10000

table2_1 |>
  mutate(rate = cases / population * 10000)

# A tibble: 6 × 5
  Country      year  cases population  rate
  <chr>       <dbl>  <dbl>      <dbl> <dbl>
1 Afghanistan  1999    745   19987071 0.373
2 Afghanistan  2000   2666   20595360 1.29 
3 Brazil       1999  37737  172006362 2.19 
4 Brazil       2000  80488  174504898 4.61 
5 China        1999 212258 1272915272 1.67 
6 China        2000 213766 1280428583 1.67

Numbers in Different format

table3 <- read_excel("table6_3.xlsx")

glimpse(table3)

Rows: 6
Columns: 3
$ Country <chr> "Afghanistan", "Afghanistan", "Brazil", "Brazil", "China", "Ch…
$ year    <dbl> 1999, 2000, 1999, 2000, 1999, 2000
$ rate    <chr> "745/19987071", "2666/20595360", "37737/172006362", "80488/174…

Separte the numbers from the above Table

table3_1 = table3 |> 
  separate(rate, c("cases", "population")) 

glimpse(table3_1)

Rows: 6
Columns: 4
$ Country    <chr> "Afghanistan", "Afghanistan", "Brazil", "Brazil", "China", …
$ year       <dbl> 1999, 2000, 1999, 2000, 1999, 2000
$ cases      <chr> "745", "2666", "37737", "80488", "212258", "213766"
$ population <chr> "19987071", "20595360", "172006362", "174504898", "12729152…