Question 1 (10 points)
A string is given as a combination of multiple gene sequences separated by "-": "ATCGATCGATCG-ATCGAT-CGATC-GATCGAT-CGATCG-ATCGATCG-CGATCG"
a. Separate all the strings and store them in a variable "gene_seqs".
b. Using a loop, reverse all the strings in gene_seqs.
c. Using a loop, calculate the length of each string in gene_seqs.
d. In gene_seqs, if a string's length is greater than 7, print "Successful"; otherwise print "Error" (use a loop and a condition).
e. Find the locations and number of occurrences of the pattern "GATC" in each string of gene_seqs.
Bonus (+5): Solve parts b to e using one loop.
genes = 'ATCGATCGATCG-ATCGAT-CGATC-GATCGAT-CGATCG-ATCGATCG-CGATCG'
genes
## [1] "ATCGATCGATCG-ATCGAT-CGATC-GATCGAT-CGATCG-ATCGATCG-CGATCG"
gene_seqs=strsplit(genes, '-')
gene_seqs
## [[1]]
## [1] "ATCGATCGATCG" "ATCGAT" "CGATC" "GATCGAT" "CGATCG"
## [6] "ATCGATCG" "CGATCG"
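library(stringi)
# b. Reverse each sequence: loop over the split sequences, not the original string
rev_gene <- character(0)
for (i in gene_seqs[[1]]) {
  rev_gene <- c(rev_gene, stri_reverse(i))
}
rev_gene
## [1] "GCTAGCTAGCTA" "TAGCTA" "CTAGC" "TAGCTAG" "GCTAGC"
## [6] "GCTAGCTA" "GCTAGC"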
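# c. Length of each sequence, collected one string at a time
len_seq <- integer(0)
for (i in gene_seqs[[1]]) {
  len_seq <- c(len_seq, nchar(i))
}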
len_seq
## [1] 12 6 5 7 6 8 6
# d. Check whether each sequence is longer than 7 characters
for (i in gene_seqs[[1]]) {
  if (nchar(i) > 7) {
    print("Successful")
  } else {
    print("Error")
  }
}
## [1] "Successful"
## [1] "Error"
## [1] "Error"
## [1] "Error"
## [1] "Error"
## [1] "Successful"
## [1] "Error"
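Part e is not worked above. The following is a minimal sketch of one way to do it, assuming base R's gregexpr() with fixed = TRUE is an acceptable tool for the pattern search. The names pattern, gatc_pos, and gatc_count are illustrative and not part of the original solution.

```r
genes <- "ATCGATCGATCG-ATCGAT-CGATC-GATCGAT-CGATCG-ATCGATCG-CGATCG"
gene_seqs <- strsplit(genes, "-", fixed = TRUE)

pattern <- "GATC"                      # illustrative name
seqs <- gene_seqs[[1]]
gatc_pos <- vector("list", length(seqs))   # start positions per sequence
gatc_count <- integer(length(seqs))        # number of matches per sequence

for (i in seq_along(seqs)) {
  hits <- gregexpr(pattern, seqs[i], fixed = TRUE)[[1]]
  pos <- as.integer(hits[hits > 0])    # gregexpr() returns -1 when nothing matches
  gatc_pos[[i]] <- pos
  gatc_count[i] <- length(pos)
  cat(seqs[i], ": count =", length(pos), ", start positions =", pos, "\n")
}
```

An index loop is used rather than names because "CGATCG" appears twice in the input, so the sequence itself is not a unique key.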
# Bonus: parts b to d in a single loop over the split sequences
rev_gene <- character(0)
len_seq <- integer(0)
for (k in gene_seqs[[1]]) {
  rev_gene <- c(rev_gene, stri_reverse(k))
  len_seq <- c(len_seq, nchar(k))
  if (nchar(k) > 7) {
    print("Successful")
  } else {
    print("Error")
  }
}
## [1] "Successful"
## [1] "Error"
## [1] "Error"
## [1] "Error"
## [1] "Error"
## [1] "Successful"
## [1] "Error"
rev_gene
## [1] "GCTAGCTAGCTA" "TAGCTA" "CTAGC" "TAGCTAG" "GCTAGC"
## [6] "GCTAGCTA" "GCTAGC"
len_seq
## [1] 12 6 5 7 6 8 6
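The bonus asks for parts b to e in a single pass. The sketch below also includes the "GATC" search from part e and collects everything into one data frame. It is only one possible layout: it uses base R alone (reversal via strsplit() and rev() instead of stringi), and the names results, gatc_n, and gatc_at are hypothetical.

```r
genes <- "ATCGATCGATCG-ATCGAT-CGATC-GATCGAT-CGATCG-ATCGATCG-CGATCG"
seqs <- strsplit(genes, "-", fixed = TRUE)[[1]]
n <- length(seqs)

reversed <- character(n)
lens     <- integer(n)
status   <- character(n)
gatc_n   <- integer(n)     # hypothetical name: match count per sequence
gatc_at  <- character(n)   # hypothetical name: start positions, comma-separated

for (i in seq_along(seqs)) {
  s <- seqs[i]
  reversed[i] <- paste(rev(strsplit(s, "")[[1]]), collapse = "")   # part b
  lens[i]     <- nchar(s)                                          # part c
  status[i]   <- if (lens[i] > 7) "Successful" else "Error"        # part d
  hits <- gregexpr("GATC", s, fixed = TRUE)[[1]]                   # part e
  pos  <- hits[hits > 0]            # drop the -1 returned for "no match"
  gatc_n[i]  <- length(pos)
  gatc_at[i] <- paste(pos, collapse = ",")
}

results <- data.frame(sequence = seqs, reversed = reversed, length = lens,
                      status = status, gatc_count = gatc_n,
                      gatc_positions = gatc_at, stringsAsFactors = FALSE)
results
```

Preallocating the result vectors avoids growing them with c() on every iteration, which matters little for seven sequences but scales better.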
Question 2: Given the following data frame:
exam_score = data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Alice", "Bob", "David", "John", "Jenny"),
  Age = c(20, 25, 30, 22, 18),
  Score = c(100, 78, 90, 55, 81)
)
a. Create the data frame. Add 2 new rows. Add a new column called "Income". This column should be numeric.
b. Find the max, min, median, sum, mean, standard deviation, variance, and quantiles of the columns Age, Score, and Income.
c. Find the correlation between:
   i. Age and Score
   ii. Age and Income
   iii. Score and Income
d. Select the rows where the score is greater than or equal to 80.
e. Select the rows with age in the range 20 to 30.
# Create the data frame
exam_score = data.frame(
ID = c(1, 2, 3, 4, 5),
Name = c("Alice", "Bob", "David", "John", "Jenny"),
Age = c(20, 25, 30, 22, 18),
Score = c(100, 78, 90, 55, 81)
)
exam_score
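# Add 2 new rows (as a data frame, so the numeric columns are not coerced to character)
exam_score <- rbind(exam_score,
  data.frame(ID = c(6, 7),
             Name = c("Lin", "Jaccop"),
             Age = c(31, 37),
             Score = c(62, 45)))
exam_score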
# Add a new numeric column called "Income"
exam_score$Income <- c(4200, 3500, 5980, 3609, 4503, 2500, 4900)
exam_score
# For Age
exam_score$Age<-as.numeric(exam_score$Age)
exam_score$Age
## [1] 20 25 30 22 18 31 37
print(max(exam_score$Age))
## [1] 37
print(min(exam_score$Age))
## [1] 18
print(median(exam_score$Age))
## [1] 25
print(sum(exam_score$Age))
## [1] 183
print(mean(exam_score$Age))
## [1] 26.14286
print(sd(exam_score$Age))
## [1] 6.817345
print(var(exam_score$Age))
## [1] 46.47619
print(quantile(exam_score$Age))
## 0% 25% 50% 75% 100%
## 18.0 21.0 25.0 30.5 37.0
# For Score
exam_score$Score<-as.numeric(exam_score$Score)
exam_score$Score
## [1] 100 78 90 55 81 62 45
print(max(exam_score$Score))
## [1] 100
print(min(exam_score$Score))
## [1] 45
print(median(exam_score$Score))
## [1] 78
print(sum(exam_score$Score))
## [1] 511
print(mean(exam_score$Score))
## [1] 73
print(sd(exam_score$Score))
## [1] 19.73153
print(var(exam_score$Score))
## [1] 389.3333
print(quantile(exam_score$Score))
## 0% 25% 50% 75% 100%
## 45.0 58.5 78.0 85.5 100.0
# For Income
exam_score$Income<-as.numeric(exam_score$Income)
exam_score$Income
## [1] 4200 3500 5980 3609 4503 2500 4900
print(max(exam_score$Income))
## [1] 5980
print(min(exam_score$Income))
## [1] 2500
print(median(exam_score$Income))
## [1] 4200
print(sum(exam_score$Income))
## [1] 29192
print(mean(exam_score$Income))
## [1] 4170.286
print(sd(exam_score$Income))
## [1] 1116.043
print(var(exam_score$Income))
## [1] 1245552
print(quantile(exam_score$Income))
## 0% 25% 50% 75% 100%
## 2500.0 3554.5 4200.0 4701.5 5980.0
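The per-column calls above can also be written once and applied to all three columns. This is a compact sketch, assuming the seven-row data frame built earlier (rebuilt here so the block runs on its own). The names num_cols, stats, and quartiles are illustrative.

```r
exam_score <- data.frame(
  ID = 1:7,
  Name = c("Alice", "Bob", "David", "John", "Jenny", "Lin", "Jaccop"),
  Age = c(20, 25, 30, 22, 18, 31, 37),
  Score = c(100, 78, 90, 55, 81, 62, 45),
  Income = c(4200, 3500, 5980, 3609, 4503, 2500, 4900)
)

num_cols <- c("Age", "Score", "Income")   # illustrative name

# One row per statistic, one column per variable
stats <- sapply(exam_score[num_cols], function(x) {
  c(max = max(x), min = min(x), median = median(x), sum = sum(x),
    mean = mean(x), sd = sd(x), var = var(x))
})
stats

# Quartiles (0%, 25%, 50%, 75%, 100%) for the same columns
quartiles <- sapply(exam_score[num_cols], quantile)
quartiles
```

Both results are matrices, so a single value can be read off by name, for example stats["mean", "Score"].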
# c. Pearson correlation between each pair of numeric columns
cor(exam_score$Age, exam_score$Score)
## [1] -0.5625078
cor(exam_score$Age, exam_score$Income)
## [1] 0.153113
cor(exam_score$Score, exam_score$Income)
## [1] 0.2945793
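The three pairwise values can also be read from a single correlation matrix. A short sketch, assuming the same seven rows as above (only the numeric columns are rebuilt here); cor_mat is an illustrative name.

```r
exam_score <- data.frame(
  Age = c(20, 25, 30, 22, 18, 31, 37),
  Score = c(100, 78, 90, 55, 81, 62, 45),
  Income = c(4200, 3500, 5980, 3609, 4503, 2500, 4900)
)

# cor() on a data frame of numeric columns returns the full symmetric matrix
cor_mat <- cor(exam_score[c("Age", "Score", "Income")])
cor_mat

cor_mat["Age", "Score"]      # same value as cor(exam_score$Age, exam_score$Score)
```

With only seven observations these coefficients are rough; cor.test() would give a confidence interval if one is needed.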
# d. Rows where Score is at least 80
exam_score[exam_score$Score >= 80, ]
# e. Rows where Age is between 20 and 30 (inclusive)
exam_score[exam_score$Age >= 20 & exam_score$Age <= 30, ]