The R language was primarily designed as a language for data manipulation, modeling and visualization, and many of the data structures reflect this view. ####When R is started, a workspace is created and that workspace is where the user creates and manipulates variables.This workspace is an environment, and an environment is a set of bindings of names, or symbols, to values. The top-level workspace can be accessed through its name, which is GlobalEnv. Assignment of value to a variable is generally done with either the = (equals) character, or a special symbol that is the concatenation of less than and minus,<-. Assignment creates a binding between a symbol and a value, in a particular environment. An R package typically consists of a coherent collection of functions and data structures that are suitable for addressing a particular problem. Here we use some package : 1. Bio3d 2. scatter plot3d 3. ggbiplot 4. peptides 5. lines 3d
#setwd("H:/ROLL25")
Z<-read.csv("spectra.csv")
x<-Z$wavelength
y<-Z$absorb
mainlab=paste("wavelength vs absorbtion")
xlabel=paste("wavelength")
ylabel=paste("absorption")
plot(x,y,main = mainlab,col="red",cex=0.5,xlab = xlabel,ylab = ylabel)
jk<-which(x>450 & x<750)
length(jk)
## [1] 1557
wv<-x[jk]
aa<-y[jk]
plot(wv,aa,main = mainlab,col="red",cex=0.5,xlab = xlabel,ylab = ylabel)
mia<-max(aa)
mia
## [1] 0.386
miv<-which(aa==mia)
miv
## [1] 850
wvmax<-wv[850]
wvmax
## [1] 618.08
text(545,0.386,"wvmax= 618.08--->",cex = 0.7, col="blue")
lm(aa~wv)
##
## Call:
## lm(formula = aa ~ wv)
##
## Coefficients:
## (Intercept) wv
## 8.721e-02 2.193e-06
abline(h=0.386,v=618.08,col="black")
summary(wvmax)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 618.1 618.1 618.1 618.1 618.1 618.1
z<-read.csv("Murshidabadnewfolder.csv")
z.m<-z
summary(z.m)
## SL.NO NAME SURNAME AGE SEX
## Min. : 1.00 ANANDA : 1 HANSDA :3 13 YRS :4 F: 3
## 1st Qu.: 5.75 ANITA : 1 HEMBRAM:5 14 YRS :4 M:17
## Median :10.50 BAPI : 1 KISKU :1 15 YRS :2
## Mean :10.50 BIKASH : 1 MARDI :3 17 YRS :2
## 3rd Qu.:15.25 BIKRAM : 1 MURMU :4 8 YRS :2
## Max. :20.00 DEBEN : 1 SOREN :2 11 YRS :1
## (Other):14 TUDU :2 (Other):5
## HPLC.NO HB Ferritin RBC
## Min. : 13.00 Min. :12.10 Min. :33.8 Min. :3.200
## 1st Qu.: 48.25 1st Qu.:12.20 1st Qu.:33.8 1st Qu.:3.900
## Median :114.00 Median :12.40 Median :33.8 Median :4.800
## Mean :103.95 Mean :12.37 Mean :33.8 Mean :4.595
## 3rd Qu.:155.25 3rd Qu.:12.50 3rd Qu.:33.8 3rd Qu.:5.025
## Max. :174.00 Max. :12.80 Max. :33.8 Max. :6.000
## NA's :19
## PCV MCV MCH MCHC
## Min. :27.90 Min. :62.00 Min. :16.10 Min. :25.40
## 1st Qu.:33.80 1st Qu.:68.47 1st Qu.:19.70 1st Qu.:28.40
## Median :36.50 Median :71.40 Median :20.60 Median :28.75
## Mean :35.32 Mean :70.77 Mean :20.57 Mean :28.68
## 3rd Qu.:37.17 3rd Qu.:73.38 3rd Qu.:21.30 3rd Qu.:29.60
## Max. :39.10 Max. :77.30 Max. :25.00 Max. :30.90
##
## RDW HbA0 HbA2 HbF
## Min. :11.60 Min. :84.30 Min. :1.600 Min. :0.10
## 1st Qu.:12.62 1st Qu.:86.60 1st Qu.:2.375 1st Qu.:0.20
## Median :13.65 Median :87.45 Median :2.500 Median :0.25
## Mean :14.23 Mean :87.29 Mean :2.400 Mean :0.25
## 3rd Qu.:15.47 3rd Qu.:88.33 3rd Qu.:2.600 3rd Qu.:0.30
## Max. :21.50 Max. :89.20 Max. :2.600 Max. :0.50
##
## HBA2.E HBS HBD FERITIN
## Mode:logical Mode:logical Mode:logical Mode:logical
## NA's:20 NA's:20 NA's:20 NA's:20
##
##
##
##
##
## B12 CLINICAL.ABNORMALITY X X.1
## Mode:logical Mode:logical Mode:logical Mode:logical
## NA's:20 NA's:20 NA's:20 NA's:20
##
##
##
##
##
## molecular
## alpha mutation :14
## normal : 6
##
##
##
##
##
zx<-read.csv("naidanewfolder.csv")
zx.n<-zx
summary(zx.n)
## HPLC..No HB RBC HCT
## Min. : 8.00 Min. :12.10 Min. :4.510 Min. :31.60
## 1st Qu.: 52.50 1st Qu.:12.25 1st Qu.:4.965 1st Qu.:34.70
## Median : 87.00 Median :12.70 Median :5.180 Median :36.80
## Mean : 79.06 Mean :12.95 Mean :5.256 Mean :37.01
## 3rd Qu.:106.50 3rd Qu.:13.45 3rd Qu.:5.600 3rd Qu.:39.25
## Max. :121.00 Max. :14.90 Max. :6.030 Max. :45.00
## MCV MCH MCHC RDW
## Min. :61.50 Min. :17.60 Min. :28.60 Min. :13.70
## 1st Qu.:68.80 1st Qu.:20.60 1st Qu.:29.90 1st Qu.:15.05
## Median :70.80 Median :21.40 Median :30.30 Median :16.00
## Mean :70.45 Mean :21.41 Mean :30.38 Mean :16.07
## 3rd Qu.:72.30 3rd Qu.:22.10 3rd Qu.:30.80 3rd Qu.:16.80
## Max. :74.90 Max. :23.70 Max. :31.80 Max. :18.80
## HBA0 HBA2 HBF HBA2e
## Min. :86.00 Min. :1.600 Min. :0.1000 Mode:logical
## 1st Qu.:86.50 1st Qu.:2.600 1st Qu.:0.2000 NA's:47
## Median :87.00 Median :2.700 Median :0.3000
## Mean :87.01 Mean :2.723 Mean :0.3702
## 3rd Qu.:87.35 3rd Qu.:2.900 3rd Qu.:0.5000
## Max. :88.90 Max. :3.200 Max. :0.9000
## HBS HBD MOLECULAR
## Mode:logical Mode:logical ALPHA MUTATION:15
## NA's:47 NA's:47 NORMAL :32
##
##
##
##
xz<-read.csv("birbhumnewfolder.csv")
xz.b<-xz
summary(xz.b)
## SL.NO NAME SURNAME AGE SEX
## Min. : 1.00 LAKHINDAR : 2 SOREN :12 12Y : 9 F:12
## 1st Qu.:13.25 SONALI : 2 MURMU :10 14Y : 5 M:38
## Median :25.50 ARIKA : 1 MARDI : 8 14Y2M : 3
## Mean :25.50 BAIDYANATH: 1 HANSDA : 7 15Y7M : 3
## 3rd Qu.:37.75 BAPI : 1 HEMBRAM: 6 11Y10M : 2
## Max. :50.00 BIKASH : 1 TUDU : 5 12Y6M : 2
## (Other) :42 (Other): 2 (Other):26
## SCHOOL HPLC.TEST.NO. HB HPLC.DATE
## : 1 622A : 1 Min. :12.10 5.7.2016:10
## N.A.T.HIGH SCHOOL:49 628A : 1 1st Qu.:12.30 6.7.2016:31
## 631A : 1 Median :12.75 7.7.2016: 6
## 632A : 1 Mean :12.83 8.7.2016: 3
## 636A : 1 3rd Qu.:13.20
## 640A : 1 Max. :14.80
## (Other):44
## RBC PCV MCV MCH
## Min. :4.920 Min. :37.50 Min. :65.70 Min. :19.30
## 1st Qu.:5.275 1st Qu.:40.42 1st Qu.:72.33 1st Qu.:21.70
## Median :5.570 Median :41.45 Median :75.30 Median :23.20
## Mean :5.604 Mean :41.71 Mean :74.57 Mean :22.99
## 3rd Qu.:5.950 3rd Qu.:43.15 3rd Qu.:77.17 3rd Qu.:24.27
## Max. :6.420 Max. :47.50 Max. :79.30 Max. :26.00
##
## MCHC RDW HbA0 HbA2
## Min. :29.40 Min. :12.60 Min. :84.10 Min. :2.100
## 1st Qu.:30.10 1st Qu.:14.40 1st Qu.:85.70 1st Qu.:2.600
## Median :30.80 Median :15.20 Median :86.50 Median :2.800
## Mean :30.82 Mean :15.29 Mean :86.49 Mean :2.764
## 3rd Qu.:31.57 3rd Qu.:16.00 3rd Qu.:87.28 3rd Qu.:2.900
## Max. :32.90 Max. :18.90 Max. :88.60 Max. :3.700
##
## HbF HBA2.E HBS HBD
## Min. :0.200 Mode:logical Mode:logical Mode:logical
## 1st Qu.:0.200 NA's:50 NA's:50 NA's:50
## Median :0.300
## Mean :0.336
## 3rd Qu.:0.400
## Max. :0.700
##
## FERITIN B12 CLINICAL.ABNORMALITY MOLICULAR
## Mode:logical Mode:logical Mode:logical Alpha Mutation :29
## NA's:50 NA's:50 NA's:50 Normal :20
## not found : 1
##
##
##
##
mn<-read.csv("jalpaigurinewfolder.csv")
mn.j<-mn
summary(mn.j)
## NAME SURNAME AGE SEX School
## Sahil : 2 Oraon :7 Min. :10.00 M:19 Nathuahat :19
## Ashik : 1 Oraon :6 1st Qu.:11.00
## Atish : 1 Bhagat :1 Median :12.00
## Bipul : 1 Lama :1 Mean :12.16
## Bishal : 1 Munda :1 3rd Qu.:13.50
## Hrittik: 1 Shaibya :1 Max. :15.00
## (Other):12 (Other) :2
## HPLC.NO HB Molecular HPLC.TEST.NO.
## Min. : 1.00 Min. :12.10 alpha mutation:13 1/Nathua : 1
## 1st Qu.:13.00 1st Qu.:12.25 Alpha mutation: 1 17/Nathua: 1
## Median :19.00 Median :12.40 normal : 5 23/Nathua: 1
## Mean :35.68 Mean :12.48 31/Nathua: 1
## 3rd Qu.:68.50 3rd Qu.:12.65 33/Nathua: 1
## Max. :83.00 Max. :13.10 37/Nathua: 1
## (Other) :13
## HPLC.DATE RBC PCV MCV
## 17.03.16:6 Min. :4.550 Min. :34.00 Min. :64.40
## 16.03.16:5 1st Qu.:4.795 1st Qu.:35.55 1st Qu.:69.65
## 18.03.26:3 Median :5.160 Median :36.40 Median :71.30
## 18.03.17:1 Mean :5.129 Mean :36.76 Mean :71.90
## 18.03.19:1 3rd Qu.:5.445 3rd Qu.:37.50 3rd Qu.:74.45
## 18.03.20:1 Max. :5.940 Max. :41.20 Max. :78.50
## (Other) :2
## MCH MCHC RDW HbA0
## Min. :20.60 Min. :31.10 Min. :14.00 Min. :87.30
## 1st Qu.:22.55 1st Qu.:31.90 1st Qu.:15.65 1st Qu.:87.40
## Median :23.10 Median :32.20 Median :16.60 Median :87.60
## Mean :23.26 Mean :32.34 Mean :16.66 Mean :87.78
## 3rd Qu.:24.20 3rd Qu.:32.60 3rd Qu.:18.15 3rd Qu.:88.05
## Max. :26.60 Max. :33.90 Max. :18.90 Max. :89.20
##
## HbA2 HbF HBA2.E HBS
## Min. :2.700 Min. :0.0000 Mode:logical Mode:logical
## 1st Qu.:2.900 1st Qu.:0.0000 NA's:19 NA's:19
## Median :3.000 Median :0.2000
## Mean :3.016 Mean :0.2158
## 3rd Qu.:3.250 3rd Qu.:0.3500
## Max. :3.300 Max. :0.7000
##
## HBD FERITIN B12 CLINICAL.ABNORMALITY
## Mode:logical Mode:logical Mode:logical Mode:logical
## NA's:19 NA's:19 NA's:19 NA's:19
##
##
##
##
##
abc<-read.csv("Bankuranewfolder.csv")
abc.b<-abc
summary(abc.b)
## NAME SURNAME AGE SEX
## AJAY : 1 BASKEY:1 Min. : 5.000 MALE:24
## AVIJIT : 1 BESRA :1 1st Qu.: 8.000
## BIDYASAGAR: 1 KISKU :2 Median :10.000
## BIR SING : 1 MANDI :8 Mean : 9.208
## BUDDHADEB : 1 MURMU :9 3rd Qu.:10.000
## BUDHADEV : 1 SOREN :2 Max. :12.000
## (Other) :18 TUDU :1
## SCREENING HPLC.TEST.NO.
## Baganpara Bangladanga Chandi P. School: 2 Min. : 8.00
## Kawasole P. Vidyalaya : 9 1st Qu.:39.75
## Peardoba P. School : 2 Median :57.50
## Sri Maa Shishu Niketan :11 Mean :52.67
## 3rd Qu.:71.25
## Max. :86.00
##
## HB RBC PCV MCV
## Min. :12.10 Min. :4.970 Min. :37.50 Min. :68.40
## 1st Qu.:12.20 1st Qu.:5.452 1st Qu.:40.38 1st Qu.:69.72
## Median :12.45 Median :5.745 Median :41.00 Median :72.40
## Mean :12.47 Mean :5.647 Mean :40.99 Mean :72.78
## 3rd Qu.:12.62 3rd Qu.:5.883 3rd Qu.:41.62 3rd Qu.:74.30
## Max. :13.40 Max. :6.120 Max. :44.30 Max. :79.90
##
## MCH MCHC RDW HbA0
## Min. :20.20 Min. :29.30 Min. :14.00 Min. :86.50
## 1st Qu.:21.05 1st Qu.:29.95 1st Qu.:15.97 1st Qu.:87.38
## Median :21.90 Median :30.30 Median :16.40 Median :87.75
## Mean :22.19 Mean :30.43 Mean :16.51 Mean :87.72
## 3rd Qu.:22.80 3rd Qu.:30.80 3rd Qu.:17.10 3rd Qu.:88.12
## Max. :25.20 Max. :32.60 Max. :19.10 Max. :88.60
##
## HbA2 HbF HBA2.E HBS
## Min. :2.400 Min. :0.1000 Mode:logical Mode:logical
## 1st Qu.:2.700 1st Qu.:0.2000 NA's:24 NA's:24
## Median :2.800 Median :0.3000
## Mean :2.842 Mean :0.3583
## 3rd Qu.:3.000 3rd Qu.:0.3000
## Max. :3.200 Max. :1.6000
##
## HBD FERITIN B12 CLINICAL.ABNORMALITY
## Mode:logical Mode:logical Mode:logical Mode:logical
## NA's:24 NA's:24 NA's:24 NA's:24
##
##
##
##
##
## MOLICULAR.MUTATION
## Alpha Mutation:17
## normal : 7
##
##
##
##
##
#barplot
hj<-read.csv("Murshidabadnewfolder.csv")
jh<-read.csv("naidanewfolder.csv")
gd<-read.csv("birbhumnewfolder.csv")
dg<-read.csv("jalpaigurinewfolder.csv")
dk<-read.csv("Bankuranewfolder.csv")
f<-cbind(mean(hj$HB),mean(jh$HB),mean(gd$HB),mean(dg$HB),mean(dk$HB))
colnames(f)<-c("Mur","nai","bir","jal","Ban")
par(mfrow=c(2.5,2.5))
barplot(f)
e<-cbind(mean(hj$HbA0),mean(jh$HBA0),mean(gd$HbA0),mean(dg$HbA0),mean(dk$HB) )
colnames(e)<-c("Mur1","nai1","bir1","jal","Ban1")
barplot(e)
o<-cbind(mean(hj$HbF),mean(jh$HBF),mean(gd$HbF),mean(dg$HbF),mean(dk$HbF))
colnames(o)<-c("mur2","nai2","bir2","jal2","Ban2")
barplot(o)
ba<-read.csv("Bankuranewfolder.csv")
b<-read.csv("Birbhumnewfolder.csv")
j<-read.csv("jalpaigurinewfolder.csv")
m<-read.csv("Murshidabadnewfolder.csv")
n<-read.csv("naidanewfolder.csv")
par(mfrow=c(2,2))
boxhb<-cbind(ba$HB,b$HB,j$HB,m$HB,n$HB)
## Warning in cbind(ba$HB, b$HB, j$HB, m$HB, n$HB): number of rows of result
## is not a multiple of vector length (arg 1)
colnames(boxhb)<-c("BA","B","J","M","N")
boxplot(boxhb,ylab="hb",xlab="district",main="boxplot")
boxhbao<-cbind(ba$HbA0,b$HbA0,j$HbA0,m$HbA0,n$HBA0)
## Warning in cbind(ba$HbA0, b$HbA0, j$HbA0, m$HbA0, n$HBA0): number of rows
## of result is not a multiple of vector length (arg 1)
colnames(boxhbao)<-c("BA","B","J","M","N")
boxplot(boxhbao,ylab="hbao",xlab="district",main="boxplot")
boxhba2<-cbind(ba$HbA2,b$HbA2,j$HbA2,m$HbA2,n$HBA2)
## Warning in cbind(ba$HbA2, b$HbA2, j$HbA2, m$HbA2, n$HBA2): number of rows
## of result is not a multiple of vector length (arg 1)
colnames(boxhba2)<-c("BA","B","J","M","N")
boxplot(boxhba2,ylab="hba2",xlab="district",main="boxplot")
boxhbf<-cbind(ba$HbA2,b$HbF,j$HbF,m$HbF,n$HBF)
## Warning in cbind(ba$HbA2, b$HbF, j$HbF, m$HbF, n$HBF): number of rows of
## result is not a multiple of vector length (arg 1)
colnames(boxhbf)<-c("BA","B","J","M","N")
boxplot(boxhbf,ylab="hbf",xlab="district",main="boxplot")
ba<-read.csv("Bankuranewfolder.csv")
b<-read.csv("Birbhumnewfolder.csv")
j<-read.csv("jalpaigurinewfolder.csv")
m<-read.csv("Murshidabadnewfolder.csv")
n<-read.csv("naidanewfolder.csv")
nad<-cbind(n$HB,n$MCV,n$MCH,n$HBA0,n$HBA2,n$HBF,n$MOLECULAR)
distn<-rep("naida",nrow(n))
colnames(nad)<-c("hb","mcv","mch","hbao","hba2","hbf","mol")
f.n<-cbind(nad,distn)
ban<-cbind(ba$HB,ba$MCV,ba$MCH,ba$HbA0,ba$HbA2,ba$HbF,ba$MOLICULAR.MUTATION)
distn1<-rep("Bankura",nrow(ba))
colnames(ban)<-c("hb1","mcv1","mch1","hbao1","hba21","hbf1","mol1")
f.ba<-cbind(ban,distn1)
bir<-cbind(b$HB,b$MCV,b$MCH,b$HbA0,b$HbA2,b$HbF,b$MOLICULAR)
distn2<-rep("Birbhum",nrow(b))
colnames(bir)<-c("hb2","mcv2","mch2","hbao2","hba22","hbf2","mol2")
f.bir<-cbind(bir,distn2)
jal<-cbind(j$HB,j$MCV,j$MCH,j$HbA0,j$HbA2,j$HbF,j$Molecular)
distn3<-rep("jalpaiguri",nrow(j))
colnames(jal)<-c("hb3","mcv3","mch3","hbao3","hba23","hbf3","mol3")
f.jal<-cbind(jal,distn3)
mur<-cbind(m$HB,m$MCV,m$MCH,m$HbA0,m$HbA2,m$HbF,m$molecular)
distn4<-rep("Murshidabad",nrow(m))
colnames(mur)<-c("hb4","mcv4","mch4","hbao4","hba24","hbf4","mol4")
f.mur<-cbind(mur,distn4)
csvxtract<-function(csvfile){qq<-read.csv(csvfile,header = TRUE)
hb<-qq$HB
hbao<-qq$HbAO
hbf<-qq$HbF
mcv<-qq$mcv
mch<-qq$MCH
hba2<-qq$HbA2
diag<-qq$MOLECULAR
s1<-cbind(hb,hbao,hbf,mcv,mch,hba2)
s<-list("NUM"=s1,"MUT"=diag)
return(s)}
nad<-csvxtract("nadiaonly.csv")
summary(nad)
## Length Class Mode
## NUM 188 -none- numeric
## MUT 47 factor numeric
ban<-csvxtract("bankuraonly.csv")
summary(ban)
## Length Class Mode
## NUM 96 -none- numeric
## MUT 24 factor numeric
bir<-csvxtract("Birbhumonly.csv")
summary(bir)
## Length Class Mode
## NUM 200 -none- numeric
## MUT 50 factor numeric
mur<-csvxtract("Murshidabadonly.csv")
summary(mur)
## Length Class Mode
## NUM 80 -none- numeric
## MUT 20 factor numeric
jal<-csvxtract("jalpaigurionly.csv")
summary(jal)
## Length Class Mode
## NUM 76 -none- numeric
## MUT 19 factor numeric
mastercsv<-rbind(nad$NUM,ban$NUM,bir$NUM,mur$NUM,jal$NUM)
mastgrp<-c(nad$MUT,ban$MUT,bir$MUT,mur$MUT,jal$MUT)
recode<-c(normal=2,alpha=1)
gclass<-factor(mastgrp,levels=recode,labels=names(recode))
summary(gclass)
## normal alpha NA's
## 71 66 23
ik<-which(is.na(gclass))
ik
## [1] 48 49 53 57 60 63 67 122 123 125 127 132 133 135 136 137 138
## [18] 139 142 143 145 146 154
mast.pca<-prcomp(mastercsv,scale=TRUE)
summary(mast.pca)
## Importance of components:
## PC1 PC2 PC3 PC4
## Standard deviation 1.1334 1.0825 0.9453 0.8061
## Proportion of Variance 0.3212 0.2930 0.2234 0.1625
## Cumulative Proportion 0.3212 0.6141 0.8375 1.0000
biplot(mast.pca)
library(ggbiplot)
## Loading required package: ggplot2
## Loading required package: plyr
## Loading required package: scales
## Loading required package: grid
g<-ggbiplot(mast.pca,obs.scale = 1,varscale=1,groups = gclass,ellipse = TRUE,circle = TRUE)
g<-g+ scale_colour_discrete(name='')
g<-g+ theme(legend.direction = 'horizontal', legend.position = 'top')
print(g)
library("bio3d")
## Warning: package 'bio3d' was built under R version 3.4.4
library("scatterplot3d")
## Warning: package 'scatterplot3d' was built under R version 3.4.4
# problem 1
a<-read.csv("bond.csv")
a
d<-a$atom
f<-a$x
g<-a$y
h<-a$z
x.ab<-f[1]-f[2]
y.ab<-g[1]-g[2]
z.ab<-h[1]-h[2]
x.cb<-f[3]-f[2]
y.cb<-g[3]-g[2]
z.cb<-h[3]-h[2]
u<-c(x.ab,y.ab,z.ab)
v<-c(x.cb,y.cb,z.cb)
dotuv<-u%*%v
modu<-sqrt(u%*%v)
modv<-sqrt(v%*%v)
mod<-modu%*%modv
ct<-dotuv/mod
ct%*%180
## [,1]
## [1,] 127.2792
s<-read.pdb("1ZNI")
## Note: Accessing on-line PDB file
## PDB has ALT records, taking A only, rm.alt=TRUE
summary(s)
##
## Call: read.pdb(file = "1ZNI")
##
## Total Models#: 1
## Total Atoms#: 915, XYZs#: 2745 Chains#: 4 (values: A B C D)
##
## Protein Atoms#: 806 (residues/Calpha atoms#: 102)
## Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0)
##
## Non-protein/nucleic Atoms#: 109 (residues: 109)
## Non-protein/nucleic resid values: [ CL (3), HOH (103), ZN (3) ]
##
## + attr: atom, xyz, seqres, helix, sheet,
## calpha, remark, call
t<-torsion.pdb(s)
phi<-t$phi
psi<-t$psi
om<-t$omega
r1<-c(phi[1:10],psi[1:10])
(r1)
## [1] NA -61.87672 -69.04083 -68.51322 -84.38693 -118.16191
## [7] -100.90698 -92.32031 -143.03825 -84.72117 -153.62096 -31.84824
## [13] -37.72618 -35.06878 -35.52542 12.08905 -30.34935 -15.72777
## [19] 158.38593 146.23930
plot(om,pch=19)
plot(psi,phi,pch=19)
# problem 3
summary(s$atom)
## type eleno elety alt
## Length:915 Min. : 1.0 Length:915 Length:915
## Class :character 1st Qu.:230.5 Class :character Class :character
## Mode :character Median :467.0 Mode :character Mode :character
## Mean :466.7
## 3rd Qu.:698.5
## Max. :933.0
## resid chain resno insert
## Length:915 Length:915 Min. : 1.0 Length:915
## Class :character Class :character 1st Qu.: 8.0 Class :character
## Mode :character Mode :character Median :16.0 Mode :character
## Mean :17.1
## 3rd Qu.:24.0
## Max. :63.0
## x y z o
## Min. :-25.267 Min. :-0.149 Min. :-21.718 Min. :0.1700
## 1st Qu.: -8.268 1st Qu.: 7.718 1st Qu.: -6.747 1st Qu.:1.0000
## Median : -0.980 Median :12.372 Median : 0.215 Median :1.0000
## Mean : -1.742 Mean :12.493 Mean : 0.358 Mean :0.9874
## 3rd Qu.: 4.679 3rd Qu.:17.089 3rd Qu.: 7.059 3rd Qu.:1.0000
## Max. : 18.630 Max. :27.204 Max. : 27.500 Max. :1.0000
## b segid elesy charge
## Min. :11.90 Length:915 Length:915 Length:915
## 1st Qu.:20.16 Class :character Class :character Class :character
## Median :27.23 Mode :character Mode :character Mode :character
## Mean :31.79
## 3rd Qu.:39.42
## Max. :92.47
a<-s$atom[, c("x","y","z","elety")]
aa<-which(a$elety=="CA")
a.a<-s$atom[, c("x","y","z")]
r<-a.a[aa,]
summary(r)
## x y z
## Min. :-19.028 Min. : 3.866 Min. :-17.0430
## 1st Qu.: -8.092 1st Qu.: 7.950 1st Qu.: -6.1193
## Median : -0.957 Median :12.098 Median : 0.3630
## Mean : -1.820 Mean :12.477 Mean : 0.4244
## 3rd Qu.: 4.153 3rd Qu.:16.750 3rd Qu.: 7.1040
## Max. : 14.883 Max. :22.896 Max. : 21.8790
scatterplot3d(r,highlight.3d = TRUE, col.axis = "red",col.grid = "lightblue", main = "scatterplot 3D",pch = 20)
ik<-which(s$atom$elety=="CA")
CAxyz<-s$atom[ik,c("x","y","z")]
plot(CAxyz)
blast<-blast.pdb(pdbseq(s))
## Searching ... please wait (updates every 5 seconds) RID = E8NTJA64014
##
## Reporting 216 hits
top.hits<-plot.blast(blast, cutoff = 25)
## * Possible cutoff values: 32 15 31 16 32 16 27 15 57 13 27 11 30 11 32 16 25 16 25 13
## Yielding Nhits: 2 3 5 6 8 9 11 12 13 15 17 18 20 21 23 24 26 27 127 216
##
## * Chosen cutoff value of: 25
## Yielding Nhits: 117
head(top.hits$hits)
## pdb.id acc group
## 1 "6INS_E" "6INS_E" "1"
## 2 "6INS_E" "6INS_E" "1"
## 4 "1ZEI_A" "1ZEI_A" "1"
## 5 "1ZEI_A" "1ZEI_A" "1"
## 7 "1EFE_A" "1EFE_A" "1"
## 8 "1EFE_A" "1EFE_A" "1"
a<-blast$raw
returned.pdbids<-a$subjectids
(returned.pdbids)
## [1] "6INS_E" "6INS_E" "6INS_E" "1ZEI_A" "1ZEI_A" "1ZEI_A" "1EFE_A"
## [8] "1EFE_A" "1EFE_A" "1SJU_A" "1SJU_A" "1SJU_A" "2JZQ_A" "2JZQ_A"
## [15] "2JZQ_A" "5WBT_A" "5WBT_A" "5WBT_A" "5WDM_A" "5WDM_A" "5WDM_A"
## [22] "6B3Q_AA" "6B3Q_AA" "6B3Q_AA" "2KQP_A" "2KQP_A" "2KQP_A" "1APH_B"
## [29] "1APH_B" "2RN5_B" "2RN5_B" "2LGB_B" "2LGB_B" "1TRZ_B" "1TRZ_B"
## [36] "1XDA_B" "1XDA_B" "5AZZ_B" "5AZZ_B" "2INS_B" "2INS_B" "1IZA_B"
## [43] "1IZA_B" "1HO0_A" "1HO0_A" "4UNE_B" "4UNE_B" "1HLS_B" "1HLS_B"
## [50] "1B9E_B" "1B9E_B" "5MHD_B" "5MHD_B" "2MVD_B" "2MVD_B" "1BEN_B"
## [57] "1BEN_B" "1QIY_B" "1QIY_B" "3BXQ_B" "3BXQ_B" "3V1G_B" "3V1G_B"
## [64] "2WRX_B" "2WRX_B" "3ZQR_B" "3ZQR_B" "2M2O_B" "2M2O_B" "1W8P_B"
## [71] "1W8P_B" "3JSD_B" "3JSD_B" "2M2M_B" "2M2M_B" "1ZEG_B" "1ZEG_B"
## [78] "1HIQ_B" "1HIQ_B" "4UNG_B" "4UNG_B" "1HIT_B" "1HIT_B" "4UNH_B"
## [85] "4UNH_B" "1MHI_B" "1MHI_B" "5HPR_B" "5HPR_B" "1QIY_D" "1QIY_D"
## [92] "3U4N_B" "3U4N_B" "5BPO_B" "5BPO_B" "2N2V_B" "2N2V_B" "4P65_B"
## [99] "4P65_B" "5BQQ_B" "5BQQ_B" "4NIB_B" "4NIB_B" "1LPH_B" "1LPH_B"
## [106] "1SJT_B" "1SJT_B" "1JCO_B" "1JCO_B" "1HTV_B" "1HTV_B" "1A7F_B"
## [113] "1A7F_B" "1MHJ_B" "1MHJ_B" "4EFX_B" "4EFX_B" "1K3M_B" "1K3M_B"
## [120] "2N2X_B" "2N2X_B" "2MPG_B" "2MPG_B" "1T1K_B" "1T1K_B" "1T1P_B"
## [127] "1T1P_B" "3ZS2_B" "3ZS2_B" "1HUI_B" "1HUI_B" "1B9G_A" "1B9G_A"
## [134] "1B9G_A" "2HHO_B" "2HHO_B" "1T1Q_B" "1T1Q_B" "1TGR_A" "1TGR_A"
## [141] "1TGR_A" "2WS4_B" "2WS4_B" "1PID_B" "1PID_B" "1BZV_B" "1BZV_B"
## [148] "2KJU_B" "2KJU_B" "2K9R_B" "2K9R_B" "2MLI_B" "2MLI_B" "2H67_B"
## [155] "2H67_B" "2K91_B" "2K91_B" "2HH4_B" "2HH4_B" "2L1Y_B" "2L1Y_B"
## [162] "2MPI_B" "2MPI_B" "1IMX_A" "1IMX_A" "1IMX_A" "5L3M_A" "5L3M_A"
## [169] "5L3M_A" "1SDB_B" "1SDB_B" "5L3N_A" "5L3N_A" "5L3N_A" "1DEI_B"
## [176] "1DEI_B" "1IGL_A" "1IGL_A" "1IGL_A" "3LRI_A" "3LRI_A" "3LRI_A"
## [183] "2QIU_A" "2QIU_A" "5MHD_A" "5MHD_A" "4INS_A" "4INS_A" "2KXK_A"
## [190] "2KXK_A" "2BN3_A" "2BN3_A" "1XW7_A" "1XW7_A" "2JUM_A" "2JUM_A"
## [197] "5MWQ_A" "5MWQ_A" "1K3M_A" "1K3M_A" "1JCA_A" "1JCA_A" "2JUU_A"
## [204] "2JUU_A" "1Q4V_A" "1Q4V_A" "1J73_A" "1J73_A" "1APH_A" "1APH_A"
## [211] "4IYD_A" "4IYD_A" "2WBY_C" "2WBY_C" "1BEN_A" "1BEN_A"
p<-a$subjectids
p[10]
## [1] "1SJU_A"
p[40]
## [1] "2INS_B"
p[100]
## [1] "5BQQ_B"
plot(a$bitscore,a$evalue)
plot(a$evalue,type = "l")
plot(a$bitscore,type = "l")
y<-read.pdb("1SJU")
## Note: Accessing on-line PDB file
summary(y)
##
## Call: read.pdb(file = "1SJU")
##
## Total Models#: 1
## Total Atoms#: 762, XYZs#: 2286 Chains#: 1 (values: A)
##
## Protein Atoms#: 762 (residues/Calpha atoms#: 50)
## Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0)
##
## Non-protein/nucleic Atoms#: 0 (residues: 0)
## Non-protein/nucleic resid values: [ none ]
##
## + attr: atom, xyz, seqres, helix, calpha,
## call
t1<-torsion.pdb(y)
phi1<-t1$phi
psi1<-t1$psi
om1<-t1$omega
z<-read.pdb("2INS")
## Note: Accessing on-line PDB file
## PDB has ALT records, taking A only, rm.alt=TRUE
summary(z)
##
## Call: read.pdb(file = "2INS")
##
## Total Models#: 1
## Total Atoms#: 956, XYZs#: 2868 Chains#: 4 (values: A B C D)
##
## Protein Atoms#: 770 (residues/Calpha atoms#: 100)
## Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0)
##
## Non-protein/nucleic Atoms#: 186 (residues: 186)
## Non-protein/nucleic resid values: [ HOH (184), ZN (2) ]
##
## + attr: atom, xyz, seqres, helix, sheet,
## calpha, remark, call
t2<-torsion.pdb(z)
phi2<-t2$phi
psi2<-t2$psi
x<-read.pdb("5BQQ")
## Note: Accessing on-line PDB file
## PDB has ALT records, taking A only, rm.alt=TRUE
summary(x)
##
## Call: read.pdb(file = "5BQQ")
##
## Total Models#: 1
## Total Atoms#: 2812, XYZs#: 8436 Chains#: 12 (values: A B C D E F G H I J K L)
##
## Protein Atoms#: 2268 (residues/Calpha atoms#: 292)
## Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0)
##
## Non-protein/nucleic Atoms#: 544 (residues: 400)
## Non-protein/nucleic resid values: [ CL (2), HIX (5), HOH (375), IPH (10), NVA (6), ZN (2) ]
##
## + attr: atom, xyz, seqres, helix, sheet,
## calpha, remark, call
t3<-torsion.pdb(x)
phi3<-t3$phi
psi3<-t3$psi
plot(psi,phi,pch=19,col="blue",cex=0.5,text(-100,150,"1.b=1ZNI",col="blue"))
points(psi2,phi2,pch=19,col="green",cex=0.4,text(-25,150,"2.g=1SJU",col="green"))
points(psi1,phi1,pch=19,col="red",cex=0.4,text(45,15,"3.r=2INS",col = "red"))
points(psi3,phi3,pch=19,col="pink",cex=0.4,text(120,150,"4.bla=5BQQ",col = "black"))
z<-read.csv("xyz.csv")
x1<-z$x
y1<-z$y
z1<-z$z
AB<-c(x1[2]-x1[1],y1[2]-y1[1],z1[2]-z1[1])
BC<-c(x1[3]-x1[2],y1[3]-y1[2],z1[3]-z1[2])
crs<-function(u,v){
c1<- u[2]*v[3]-v[2]*u[3]
c2<- u[3]*v[1]-v[3]*u[1]
c3<- u[1]*v[2]-v[1]*u[2]
return(c(c1,c2,c3))
}
K<-crs(AB,BC)
(K)
## [1] 2.46 4.90 -0.54
#problem 2
## Suppose you have x and y variables with some NA elements .Write a program to find
###(a) Mean of x
####(b) Mean of y
#####(c) Plot of y against x
w<-read.csv("Book.csv")
f<-w$x
f1<-is.na(f)
f2<-which(f1)
b<-f[-f2]
b
## [1] 3 5 7 9 11
mean(b)
## [1] 7
g<-w$y
g1<-is.na(g)
g2<-which(g1)
a<-g[-g2]
a
## [1] 2 4 6 8 12
mean(a)
## [1] 6.4
plot(b,a)
# problem 4
## Suppose you have a spectrum with more than one peaks. Write a program to find out all the peaks
library("readxl")
z<-read_xlsx("rti.xlsx")
RA<-z$Ramean
t1<-z$Tmean
i1<-z$Imean
w1<-z$wavelen
#spectral extraction
a<-401-320
a
## [1] 81
aa<-a+(501-401)
aa
## [1] 181
aaa<-aa+(601-501)
aaa
## [1] 281
p=0
p1=0
p2=0
p3=0
p4=0
p5=0
p6=0
p7=0
p8=0
for(i in 1:a){
p[i]<-RA[i]
p1[i]<-t1[i]
p2[i]<-i1[i]
}
for(i in (a+1):aa){
p3[i-a]<-RA[i]
p4[i-a]<-t1[i]
p5[i-a]<-i1[i]
}
for(i in (aa+1):aaa){
p6[i-aa]<-RA[i]
p7[i-aa]<-t1[i]
p8[i-aa]<-i1[i]
}
plot(w1,RA,cex=0.5,col="blue",main="RA SFS",xlab = "Wavelength(320-600nm)",ylab = "Fluorescence intensities")
plot(w1,t1,cex=0.5,col="red",main="T SFS",xlab = "Wavelength(320-600nm)",ylab = "Fluorescence intensities")
plot(w1,i1,cex=0.5,col="cyan",main="I SFS",xlab = "Wavelength(320-600nm)",ylab = "Fluorescence intensities")
plot(p,cex=0.9,col="blue",main="pk in RA")
plot(p3,cex=0.9,col="blue",main="pk3 in RA")
plot(p6,cex=0.9,col="blue",main="pk6 in RA")
plot(p1,cex=0.9,col="blue",main="pk1 in T")
plot(p4,cex=0.9,col="blue",main="pk4 in T")
plot(p7,cex=0.9,col="blue",main="pk7 in T")
plot(p2,cex=0.9,col="blue",main="pk2 in I")
plot(p5,cex=0.9,col="blue",main="pk5 in I")
plot(p8,cex=0.9,col="blue",main="pk8 in I")
#problem 5
#Write a program to generate a random sequence of amino acids.
k5<-"CIHGLKFMTPFMTPSVWYGLKFMTPSVWY"
q<-unlist(strsplit(k5,""))
for(i in 1:30) {cat(sample(q))
cat("/n")}
## S C W W G L K M P T F G F K Y T H L P V S Y I M V F M P T/nM P G F S K T I W H T K Y C V P Y W F T M S L F M V G P L/nY W V P F T S Y I V H F G S K C M P L W P L T M M T F G K/nF P W T V Y C P T M T P W G F L V H S M I K L K Y G M S F/nH S I G C L T M W P Y T V V F S K L G P K Y F M M P W T F/nM T P S K W G F Y L I H G T M T F L V V S C K Y M W P P F/nW L P P I K G F V K T S F C T G H Y T V M P S M W M Y L F/nT T F Y K V M L Y M P W I M H P P K F C L S S F G T G V W/nI P W L G T H C Y S F K W S L M F P Y T T K G V M M P F V/nK P K C M G T V P F H T W M L T V F Y M G Y L P S W S I F/nP G V H F S K Y S K I T P F W F W M Y V M P T L G C M T L/nY Y P V G M F W P K P K L I C L M W T S M V F T H T S F G/nG K F M M L V F K G P Y Y M T S P S T C L P W V T W H I F/nT L S K K W I F P W G T M M M H C F G Y P F V V P S T Y L/nP W M T M V F K P F Y P T G W C V H K L T F S M I S G L Y/nP M C G Y T F T K P S V V Y F F L L K P W S W I G M M T H/nK G T P M F T I S K V F P L W H Y M Y V T G F W P S C M L/nT S H C T T P L W Y Y M P F W S G P I M M K G K F F V L V/nG S F P V M L M W L P W T T Y I T K V Y K S G F H C F M P/nC P F W P P T K H L F M L W V M I M T G Y K F S Y G S T V/nW M P W K V Y P M T I F G K M P F S T Y S V L C F G T H L/nP L Y M P M K G T S G I W P V C F T F T S M K H F V W L Y/nP Y F W P V M H Y M K G C S M T V F T L T G P K W L F S I/nY T M I Y G M F M P C G S L H V S W F F L K T W V K T P P/nC I M H F F G Y K W Y S P M T P S P K M T L W V G L F T V/nT I L L K T G W P M C F T F G W S P K V V F H Y M P M S Y/nC V S P F L V K P H I Y W M L G T T G M T M P Y F S K W F/nI K T K F F S G Y L L T T M H S M W P W G F V P P M V C Y/nP T W S P I K P Y L W H C F T L G M T F G F V M K Y M V S/nP Y M P F S L K H V K V G C P I F Y L W S T F T T G M W M/n
u<-read.csv("aspirincsv.csv")
a<-u$condition
#chi-square-analysis:
con<-u$condition
pla<-which(con=="placebo")
chg<-u$change
mn<-mean(chg)
h<-chg-mn
h[pla]
## [1] -0.05333333 1.04666667 1.14666667 0.64666667 2.24666667
## [6] 3.14666667 1.24666667 0.94666667 1.44666667 2.04666667
## [11] 1.64666667 1.24666667 0.74666667 1.04666667 2.04666667
## [16] 1.14666667
pm<-h[pla]
which(pm>mn)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
pp<-which(pm>mn)
length(pp)
## [1] 16
h
## [1] -0.05333333 1.04666667 -2.95333333 1.14666667 -2.15333333
## [6] -0.75333333 0.04666667 -3.35333333 0.64666667 2.24666667
## [11] -0.25333333 -2.45333333 -3.95333333 3.14666667 -1.65333333
## [16] 1.24666667 0.94666667 1.44666667 -2.35333333 2.04666667
## [21] -1.35333333 1.64666667 1.24666667 0.74666667 1.04666667
## [26] 2.04666667 0.94666667 -0.15333333 -1.35333333 1.14666667
mean(h)
## [1] -1.258433e-16
k<-u$change
k
## [1] -3.2 -2.1 -6.1 -2.0 -5.3 -3.9 -3.1 -6.5 -2.5 -0.9 -3.4 -5.6 -7.1 0.0
## [15] -4.8 -1.9 -2.2 -1.7 -5.5 -1.1 -4.5 -1.5 -1.9 -2.4 -2.1 -1.1 -2.2 -3.3
## [29] -4.5 -2.0
mean(k)
## [1] -3.146667
h
## [1] -0.05333333 1.04666667 -2.95333333 1.14666667 -2.15333333
## [6] -0.75333333 0.04666667 -3.35333333 0.64666667 2.24666667
## [11] -0.25333333 -2.45333333 -3.95333333 3.14666667 -1.65333333
## [16] 1.24666667 0.94666667 1.44666667 -2.35333333 2.04666667
## [21] -1.35333333 1.64666667 1.24666667 0.74666667 1.04666667
## [26] 2.04666667 0.94666667 -0.15333333 -1.35333333 1.14666667
which(k>mean(k))
## [1] 2 4 7 9 10 14 16 17 18 20 22 23 24 25 26 27 30
length(which(k>mean(k)))
## [1] 17
which(k>mean(k)& u$condition=="placebo")
## [1] 2 4 9 10 14 16 17 18 20 22 23 24 25 26 30
l<-which(k>mean(k) & u$condition=="placebo")
length(l)
## [1] 15
u$change<-((u$change)**2)**0.5
u$change
## [1] 3.2 2.1 6.1 2.0 5.3 3.9 3.1 6.5 2.5 0.9 3.4 5.6 7.1 0.0 4.8 1.9 2.2
## [18] 1.7 5.5 1.1 4.5 1.5 1.9 2.4 2.1 1.1 2.2 3.3 4.5 2.0
m<-mean(u$change)
m
## [1] 3.146667
ik<-which(u$change>m &u$condition=="placebo")
length(ik)
## [1] 1
jk<-which(u$change<m & u$condition=="placebo")
length(jk)
## [1] 15
kk<-which(u$change>m & u$condition=="aspirin")
length(kk)
## [1] 12
pk<-which(u$change<m & u$condition=="aspirin")
length(pk)
## [1] 2
placebo<-c(length(ik),length(jk))
placebo
## [1] 1 15
placebo<-cbind(length(ik),length(jk))
placebo
## [,1] [,2]
## [1,] 1 15
colnames(placebo)<-c("+","-")
placebo
## + -
## [1,] 1 15
aspirin<-cbind(length(kk),length(pk))
data.file<-rbind(placebo,aspirin)
data.file
## + -
## [1,] 1 15
## [2,] 12 2
colnames(data.file)<-c("+","-")
data.file
## + -
## [1,] 1 15
## [2,] 12 2
rownames(data.file)<-c("placebo","aspirin")
colnames(data.file)<-c("+","-")
data.file
## + -
## placebo 1 15
## aspirin 12 2
#probability of value
pa.pos<-((data.file[2,2]+data.file[2,1])/30)*((data.file[1,1]+data.file[2,1])/30)
po.pos<-((data.file[1,1]+data.file[1,2])/30)*((data.file[1,1]+data.file[2,1])/30)
po.pos
## [1] 0.2311111
pi.pos<-((data.file[2,1]+data.file[2,2])/30)*((data.file[1,2]+data.file[2,2])/30)
pi.pos
## [1] 0.2644444
pp.pos<-((data.file[1,2]+data.file[1,1])/30)*((data.file[1,2]+data.file[2,2])/30)
pp.pos
## [1] 0.3022222
pap<-pa.pos*30
pop<-po.pos*30
pip<-pi.pos*30
ppp<-pp.pos*30
e11<-(data.file[1,1]-pop)^2/pop
e12<-(data.file[1,2]-ppp)^2/ppp
e21<-(data.file[2,1]-pip)^2/pip
e22<-(data.file[2,2]-pap)^2/pap
chivalue<-sum(e11,e12,e21,e22)
chivalue
## [1] 13.77101
library(bio3d)
a<-read.pdb("1wba")
## Note: Accessing on-line PDB file
t.a<-torsion.pdb(a)
phi.a<-t.a$phi
b<-is.na(phi.a)
b1<-which(b)
phi.ak<-phi.a[-b1]
psi.a<-t.a$psi
e<-is.na(psi.a)
e1<-which(e)
psi.ak<-psi.a[-e1]
par(mfrow=c(2,2))
plot(phi.ak,psi.ak,xlim = c(-180,180),ylim = c(-180,180),main = "1wba")
abline(h=c(0,0),v=c(0,0))
phi1<- -50.0
phi2<- -100.0
ik<-which(phi.ak<phi1 & phi.ak>phi2)
phial.ak<-length(ik)
phi3<- -40.0
phi4<- -150.0
ij<-which(phi.ak<phi3 & phi.ak>phi4)
phib.ak<-length(ij)
psi5<- -80.0
psi6<- 10.0
il<-which(psi.ak<psi6 & psi.ak>psi5)
psial.ak<-length(il)
psi7<- 110.0
psi8<- 160.0
im<-which(psi.ak>psi7 & psi.ak<psi8)
psib.ak<-length(im)
r<-cbind(phial.ak,psial.ak,phib.ak,psib.ak)
rownames(r)<-c("1wba")
(r)
## phial.ak psial.ak phib.ak psib.ak
## 1wba 94 28 154 104
u<-read.pdb("3ua0")
## Note: Accessing on-line PDB file
t.ua<-torsion.pdb(u)
phi.ua<-t.ua$phi
x<-is.na(phi.ua)
x1<-which(x)
phi.ua<-phi.ua[-x1]
psi.ua<-t.ua$psi
s<-is.na(psi.ua)
s1<-which(s)
psi.ua<-psi.ua[-s1]
plot(phi.ua,psi.ua,xlim = c(-180,180),ylim = c(-180,180),main = "3ua0")
abline(h=c(0,0),v=c(0,0))
phi9<- -50.0
phi10<- -110.0
ik1<-which(phi.ua<phi9 & phi.ua>phi10)
phial.ua<-length(ik1)
phi11<- -50.0
phi12<- -150.0
ij1<-which(phi.ak<phi11 & phi.ak>phi12)
phib.ua<-length(ij1)
psi13<- -50.0
psi14<- 10.0
il1<-which(psi.ua<psi14 & psi.ua>psi13)
psial.ua<-length(il1)
psi15<- 100.0
psi16<- 160.0
im1<-which(psi.ua>psi15 & psi.ua<psi16)
psib.ua<-length(im1)
r1<-cbind(phial.ua,psial.ua,phib.ua,psib.ua)
rownames(r1)<-c("3ua0")
(r1)
## phial.ua psial.ua phib.ua psib.ua
## 3ua0 60 15 154 90
main.pro<-rbind(r,r1)
main.pro
## phial.ak psial.ak phib.ak psib.ak
## 1wba 94 28 154 104
## 3ua0 60 15 154 90
colnames(main.pro)<-c("Phi-alpha","Psi-alpha","Phi-beta","Psi-beta")
main.pro
## Phi-alpha Psi-alpha Phi-beta Psi-beta
## 1wba 94 28 154 104
## 3ua0 60 15 154 90
chisq.test(main.pro)
##
## Pearson's Chi-squared test
##
## data: main.pro
## X-squared = 7.1784, df = 3, p-value = 0.06642
x<-c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
t.test<-function(dataset){
s<-sd(dataset)
d.mean<-mean(dataset)
n1<-length(dataset)
n<-n1**0.5
u<-0.3
t<-(d.mean)/(s/n)
return(t)
}
t.test<-t.test(x)
t.test
## [1] 6.43351
mean(x)
## [1] 0.4564444
#INTERPRETATION
##from the above ,it was found that the mean of the above data ,level of Salmonella (MPN/g) of randomly sampled batches of ice cream is 0.4564444.We have taken the mean value for the Null hypothesis as 0.3,ie the Salmonella level is equal to 0.3. As the mean value found here is greater than the null hyopthesis, it is not possible to accept the Null hypothesis,we have to accept the alternate hypothesis,ie the level of Salmonella is more than 0.3 MPN/g.
#problem2
ballet<-c(89.2,78.2,89.3,88.3,87.3,90.1,95.2,94.3,78.3,89.3)
mean1<-mean(ballet)
sd.ball<-sd(ballet)
sd.ball
## [1] 5.690587
s1<-sd.ball^2
s1
## [1] 32.38278
l1<-length(ballet)
lb<-l1-1
lb
## [1] 9
FOOTBALL<-c(79.3,78.3,85.3,79.3,88.9,91.2,87.2,89.2,93.3,79.9)
mean2<-mean(FOOTBALL)
sd.foot<-sd(FOOTBALL)
sd.foot
## [1] 5.583995
s2<-sd.foot^2
s2
## [1] 31.181
l2<-length(FOOTBALL)
lf<-l2-1
lf
## [1] 9
L<-l1+l2-1
L
## [1] 19
sp<-(s1*lb + s2*lf)/L
sp
## [1] 30.10916
Sq<-(sp*(1/l1 + 1/l2))^0.5
Sq
## [1] 2.453942
MM<-mean1 - mean2
MM
## [1] 2.76
ttest2<-MM/Sq
ttest2
## [1] 1.124721
#Interpritation:
## the t table does not give you the precise probability of every t value,you can use it for hypothesis testing.according to the given reference index,our value is 1.124721 which is almost near to 1.372 haveing p value=0.10
### so at 95% confidence level,haveing value greater than 0.05 it indicate null hypothesis.
####This means that we can not distinguish between the fitness level between the ballet dancers and the football player.
#problem3
data<-c(220, 200, 240, 210, 225 ,210, 180, 170 ,210 ,220 ,190 ,180 ,195 ,190 ,200, 190, 210, 220,240,210)
M<-matrix(data,nrow=10,byrow=TRUE)
colnames(M)<-c("Before","After")
(M)
## Before After
## [1,] 220 200
## [2,] 240 210
## [3,] 225 210
## [4,] 180 170
## [5,] 210 220
## [6,] 190 180
## [7,] 195 190
## [8,] 200 190
## [9,] 210 220
## [10,] 240 210
t.test(M[,1],M[,2],alternative="two.sided",mu=0,paired=TRUE,var.equal=FALSE,conf.level=0.95)
##
## Paired t-test
##
## data: M[, 1] and M[, 2]
## t = 2.5017, df = 9, p-value = 0.03377
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.053366 20.946634
## sample estimates:
## mean of the differences
## 11
library("bio3d")
library("Peptides")
## Warning: package 'Peptides' was built under R version 3.4.4
#Membrane protein
mem.pgu<-read.pdb("2F1T")
## Note: Accessing on-line PDB file
seq.pgu<-pdbseq(mem.pgu)
#Thermostability value of the membrane protein
seq.pgu.i<-aIndex(seq.pgu)
ik<-which(seq.pgu.i==0.0)
seq.pgu.in<-seq.pgu.i[-ik]
therm.pgu<-sum(seq.pgu.in)/length(seq.pgu.in)
hk<-therm.pgu
summary(hk)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 261.7 261.7 261.7 261.7 261.7 261.7
#Protein Interaction
seq.pgu.pp<-boman(seq.pgu)
bin.pot<-sum(seq.pgu.pp)/length(seq.pgu)
bk<-bin.pot
summary(bk)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.275 1.275 1.275 1.275 1.275 1.275
#binpot<2.68 which implies the protein has lower affinity to bind with another protein
#Cytosolic protein
cyto.zkv<-read.pdb("3zkv")
## Note: Accessing on-line PDB file
seq.zkv<-pdbseq(cyto.zkv)
#Thermostability value of the cytosolic protein
seq.zkv.i<-aIndex(seq.zkv)
jk<-which(seq.zkv.i==0.0)
seq.zkv.in<-seq.zkv.i[-jk]
therm.zkv<-sum(seq.zkv.in)/length(seq.zkv.in)
gk<-therm.zkv
summary(gk)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 306.4 306.4 306.4 306.4 306.4 306.4
#Protein Interaction
seq.zkv.pp<-boman(seq.zkv)
bin.pot1<-sum(seq.zkv.pp)/length(seq.zkv)
fk<-bin.pot1
summary(fk)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.221 1.221 1.221 1.221 1.221 1.221
#binpot<2.68 which implies the protein has lower affinity to bind with another protein
#Principle components
#A function for protein pca
protanapca<-function(seq){
library("bio3d")
library("Peptides")
ch<-charge(seq, pH = 7, pKscale = "EMBOSS")
pp<-boman(seq)
ampI<-hmoment(seq, angle = 100, window = 11)
hy<-hydrophobicity(seq, scale = "KyteDoolittle")
pI<-pI(seq,pKscale = "EMBOSS")
mw<-mw(seq, monoisotopic = FALSE)
protana<-cbind(ch,pp,ampI,hy,pI,mw)
colnames(protana)<-c("CHARGE","P-P","AMPIPHILLICITY","HYDRPHOBICITY","ISOELECTRIC","MOL WEIGHT")
p<-cbind(seq,protana)
colnames(p)<-c("AA","CHARGE","P-P","AMPIPHILLICITY","HYDRPHOBICITY","ISOELECTRIC","MOL WEIGHT")
pr.p<-prcomp(protana)
return(pr.p)}
#A function for protein analysis of different properties
protana<-function(seq){
library("bio3d")
library("Peptides")
ch<-charge(seq, pH = 7, pKscale = "EMBOSS")
pp<-boman(seq)
ampI<-hmoment(seq, angle = 100, window = 11)
hy<-hydrophobicity(seq, scale = "KyteDoolittle")
pI<-pI(seq,pKscale = "EMBOSS")
mw<-mw(seq, monoisotopic = FALSE)
protana<-cbind(ch,pp,ampI,hy,pI,mw)
colnames(protana)<-c("CHARGE","P-P","AMPIPHILLICITY","HYDRPHOBICITY","ISOELECTRIC","MOL WEIGHT")
p<-cbind(seq,protana)
colnames(p)<-c("AA","CHARGE","P-P","AMPIPHILLICITY","HYDRPHOBICITY","ISOELECTRIC","MOL WEIGHT")
return(p)}
#Membrane protein
mem.pgu<-read.pdb("2F1T")
## Note: Accessing on-line PDB file
## Warning in get.pdb(file, path = tempdir(), verbose = FALSE): /var/folders/
## t7/t6964h0d35938mcmcfhp1ngw0000gn/T//RtmpJ4MSmb/2F1T.pdb exists. Skipping
## download
seq.pgu<-pdbseq(mem.pgu)
pgu.a<-protana(seq.pgu)
pca.pgu<-protanapca(seq.pgu)
par(mfrow=c(1,2))
biplot(pca.pgu,main="2F1T")
#Cytosolic protein
cyto.zkv<-read.pdb("3zkv")
## Note: Accessing on-line PDB file
## Warning in get.pdb(file, path = tempdir(), verbose = FALSE): /var/folders/
## t7/t6964h0d35938mcmcfhp1ngw0000gn/T//RtmpJ4MSmb/3zkv.pdb exists. Skipping
## download
seq.zkv<-pdbseq(cyto.zkv)
zkv.a<-protana(seq.zkv)
pca.zkv<-protanapca(seq.zkv)
biplot(pca.zkv,main="3ZKV")
#CONCLUSION
#The membrane and cytosolic protein have equal thermal stability
#The protein protein interaction parameter concludes that the cytosolic protein has higher binding potential for a protein of equal thermal stability
#The pca shows out of the six parameters chosen for analysis of this protein molecular weight is an outlier and has minimum relation to protein functionality whereas protein-protein interaction and hydrophobicity are inversely related
#The other four factors such as charge,pI,Amphophilicity and polarity are related factors for a protein irrespective of its origin
#Hence the a conclusive outcome for the above pca could be that for proteins of different origins is that hydrophobicity and protein protein interaction are inversely related irrespective of its origin
#The stastical tool used are pca,biplot and self made function protanapca(which gives the pca values of any protein sequence) and protana(which gives a matrix of all amino acid vs six parameters for any given protein)
##project 3 ###REAL DATA ANALYSIS
ali<-read.csv("Alipurduarr.csv")
ban<-read.csv("Bankura.csv")
bir<-read.csv("Birbhummm.csv")
bur<-read.csv("Burdwann.csv")
mal<-read.csv("Malda.csv")
coo<-read.csv("Coochbeharr.csv")
d.dina<-read.csv("D Dinajpur.csv")
e.med<-read.csv("E Medinipurr.csv")
how<-read.csv("Howrahh.csv")
jal<-read.csv("Jalpaiguri.csv")
kol<-read.csv("Kolkataa.csv")
mur<-read.csv("Murshidabadd.csv")
npgs<-read.csv("N 24 Pgs.csv")
alipur<-cbind(ali$AGE,ali$HB,ali$MCV,ali$MCH,ali$HBA0,ali$HBA2,ali$HBF)
distn<-rep("Alipurduarr",nrow(ali))
sex1<-as.character(ali$SEX)
alipura1<-cbind(alipur,distn,sex1)
summary(alipura1)
## V1 V2 V3 V4 V5
## 16 :143 11.3 : 43 72.3 : 13 22.9 : 26 218 : 29
## 13 :134 11.6 : 40 70.5 : 12 21.7 : 22 228 : 27
## 14 :120 11 : 39 71.3 : 12 21.8 : 20 225 : 26
## 17 :115 10.7 : 35 71.8 : 11 20.2 : 19 226 : 26
## 15 :114 10.8 : 35 72.9 : 11 20.9 : 19 217 : 25
## 12 :111 12 : 35 (Other):992 22.7 : 19 221 : 24
## (Other):315 (Other):825 NA's : 1 (Other):927 (Other):895
## V6 V7 distn sex1
## 0 :143 0 :272 Alipurduarr:1052 F:598
## 2.8 :102 0.2 :192 M:454
## 2.6 : 98 0.3 :149
## 2.7 : 96 0.4 : 78
## 3 : 91 0.1 : 50
## (Other):519 (Other):310
## NA's : 3 NA's : 1
bankura<-cbind(ban$AGE,ban$HB,ban$HBF,ban$HBA2,ban$HBA0,ban$MCV,ban$MCH)
distn1<-rep("Bankura",nrow(ban))
sex<-as.character(ban$SEX)
bankura1<-cbind(bankura,distn1,sex)
summary(bankura1)
## V1 V2 V3 V4 V5
## 13 :138 12.1 : 53 0.2 :349 2.8 :174 88.3 : 61
## 12 :132 12 : 52 0.3 :266 2.9 :158 87.8 : 52
## 10 :100 11.2 : 51 0.4 :120 2.7 :149 88 : 52
## 8 : 91 11.9 : 50 0 :106 2.6 :128 87.9 : 50
## 15 : 86 11.3 : 48 0.1 : 90 3 :125 88.1 : 50
## (Other):602 11.7 : 47 0.5 : 78 3.1 : 88 87.5 : 48
## NA's : 1 (Other):849 (Other):141 (Other):328 (Other):837
## V6 V7 distn1 sex
## 72.7 : 17 21.6 : 31 Bankura:1150 F:458
## 71.8 : 14 22.4 : 30 M:692
## 70.7 : 13 22.3 : 27
## 72.5 : 13 21.7 : 26
## 70.6 : 11 22 : 26
## 71.9 : 11 22.1 : 26
## (Other):1071 (Other):984
birbhum<-cbind(bir$AGE,bir$HB,bir$HBF,bir$HBA0,bir$HBA2,bir$MCH,bir$MCV)
distn2<-rep("Birbhummm",nrow(bir))
sex3<-as.character(bir$SEX)
birbhuma1<-cbind(birbhum,distn2,sex3)
summary(birbhuma1)
## V1 V2 V3 V4 V5
## 4 :207 11.4 : 60 0.2 :370 87.9 : 61 2.9 :140
## 3 :203 11.5 : 52 0.3 :237 87.5 : 54 3 :128
## 5 :137 11.6 : 38 0.1 :139 87.6 : 54 3.2 :121
## 6 : 93 11.8 : 36 0.4 : 77 87.8 : 52 3.1 :113
## 2 : 82 11.9 : 36 0.5 : 54 87 : 48 2.8 :106
## 8 : 75 10.6 : 35 0.7 : 30 87.7 : 48 2.7 : 82
## (Other):150 (Other):690 (Other): 40 (Other):630 (Other):257
## V6 V7 distn2 sex3
## 20.5 : 29 69.7 : 22 Birbhummm:947 F:326
## 20.8 : 28 70.7 : 17 M:621
## 21 : 28 77.6 : 16
## 21.2 : 28 68.5 : 15
## 19.9 : 27 69.2 : 15
## 21.8 : 27 72.9 : 15
## (Other):780 (Other):847
burdwan<-cbind(bur$AGE,bur$HB,bur$HBF,bur$HBA0,bur$HBA2,bur$MCV,bur$MCH)
distn3<-rep("Burdann",nrow(bur))
sex4<-as.character(bur$SEX)
burdwana1<-cbind(burdwan,distn3,sex4)
summary(burdwana1)
## V1 V2 V3 V4 V5
## 14 :352 11 : 100 0.2 :956 88 : 136 2.6 :358
## 13 :351 11.4 : 89 0.3 :429 87.8 : 125 2.7 :352
## 15 :327 11.7 : 77 0.1 :424 87.9 : 113 2.8 :311
## 12 :324 11.2 : 76 0.4 :215 87.7 : 112 2.5 :288
## 11 :258 11.1 : 73 0.5 : 92 87.6 : 105 2.9 :209
## 10 :188 11.5 : 72 (Other):139 (Other):1665 (Other):733
## (Other):458 (Other):1771 NA's : 3 NA's : 2 NA's : 7
## V6 V7 distn3 sex4
## 72.3 : 28 22 : 60 Burdann:2258 F: 522
## 72.5 : 21 22.1 : 48 M:1736
## 71 : 20 21.5 : 45
## 71.2 : 19 21.4 : 44
## 73.2 : 19 22.4 : 44
## 76 : 19 20.7 : 40
## (Other):2132 (Other):1977
cooch<-cbind(coo$AGE,coo$HB,coo$HBF,coo$HBA0,coo$HBA2,coo$MCV,coo$MCH)
distn4<-rep("Coochbeharr",nrow(coo))
sex5<-as.character(coo$SEX)
coocha1<-cbind(cooch,distn4,sex5)
summary(coocha1)
## V1 V2 V3 V4 V5 V6
## 11:3 10.8 : 2 0 :13 84.5 : 3 3.4 :5 72.4 : 3
## 12:3 11.2 : 2 0.2 : 3 84.6 : 3 3.2 :4 69.1 : 2
## 13:9 11.5 : 2 0.4 : 3 83.4 : 2 3.3 :3 73.1 : 2
## 14:7 11.6 : 2 0.3 : 2 84.2 : 2 3.5 :3 61.7 : 1
## 15:3 12.6 : 2 0.1 : 1 25.4 : 1 3.6 :3 64.7 : 1
## 16:2 12.9 : 2 0.6 : 1 5 : 1 3 :2 66.8 : 1
## (Other):15 (Other): 4 (Other):15 (Other):7 (Other):17
## V7 distn4 sex5
## 20.2 : 3 Coochbeharr:27 M:27
## 22.3 : 3
## 20.9 : 2
## 23.4 : 2
## 19 : 1
## 19.7 : 1
## (Other):15
dinaj<-cbind(d.dina$AGE,d.dina$HBF,d.dina$HB,d.dina$HBA0,d.dina$HBA2,d.dina$MCV,d.dina$MCH)
distn5<-rep("D Dinajpur",nrow(d.dina))
sex6<-as.character(d.dina$SEX)
dinaja1<-cbind(dinaj,distn5,sex6)
summary(dinaja1)
## V1 V2 V3 V4 V5
## 13 :408 0.2 :772 10.2 : 183 88.1 : 117 2.9 :333
## 14 :368 0.3 :558 10.8 : 171 87.8 : 114 2.8 :330
## 12 :357 0.4 :223 11.2 : 154 88 : 109 2.7 :279
## 15 :282 0.1 :194 11.6 : 152 88.3 : 103 3 :275
## 11 :206 0.5 :119 11.9 : 129 87.7 : 102 2.6 :177
## 16 :173 0.6 : 72 10.6 : 123 88.2 : 101 (Other):695
## (Other):296 (Other):152 (Other):1178 (Other):1444 NA's : 1
## V6 V7 distn5 sex6
## 80.2 : 85 21.9 : 137 D Dinajpur:2090 F: 694
## 81.2 : 73 20.8 : 125 M:1396
## 78.2 : 66 21.3 : 94
## 78.6 : 63 22.8 : 94
## 76.2 : 53 21.5 : 84
## 69.8 : 45 22.4 : 84
## (Other):1705 (Other):1472
e.medini<-cbind(e.med$AGE,e.med$HB,e.med$HBF,e.med$HBA0,e.med$HBA2,e.med$MCV,e.med$MCH)
distn6<-rep("E medinipurr",nrow(e.med))
sex7<-as.character(e.med$SEX)
emedini1<-cbind(e.medini,distn6,sex7)
summary(emedini1)
## V1 V2 V3 V4 V5
## 12 :153 12.8 : 32 0.5 :204 82.7 : 37 2.6 : 89
## 13 :135 12.7 : 29 0.6 :134 82.2 : 35 2.7 : 88
## 14 :116 12.6 : 28 0.4 :117 82.4 : 30 2.5 : 79
## 15 : 82 11.4 : 24 0.2 : 82 82.3 : 29 2.8 : 74
## 11 : 80 12.9 : 24 0.3 : 73 82.5 : 29 2.4 : 62
## 16 : 66 11.9 : 23 0.7 : 47 81.9 : 28 (Other):357
## (Other):120 (Other):592 (Other): 95 (Other):564 NA's : 3
## V6 V7 distn6 sex7
## 65 : 13 22.1 : 24 E medinipurr:752 F:123
## 65.5 : 12 22.6 : 22 M:629
## 70.6 : 11 21.8 : 19
## 73.8 : 11 22.8 : 19
## 63.7 : 10 22 : 18
## 63.2 : 9 22.2 : 18
## (Other):686 (Other):632
howrah<-cbind(how$AGE,how$HB,how$MCV,how$MCH,how$HBA0,how$HBA2,how$HBF)
distn8<-rep("Howarhh",nrow(how))
sex8<-as.character(how$SEX)
howrah1<-cbind(howrah,distn8,sex8)
summary(howrah1)
## V1 V2 V3 V4 V5 V6
## 11:10 12.4 : 4 71.2 : 2 26.4 : 3 87.2 : 4 2.6 :8
## 12: 9 11.7 : 3 73.2 : 2 21.2 : 2 88 : 4 2.7 :7
## 13: 4 13.4 : 3 88.1 : 2 21.3 : 2 87 : 3 2.9 :7
## 14:12 10.6 : 2 62.9 : 1 22.1 : 2 87.4 : 3 3.1 :7
## 15: 6 11 : 2 67.9 : 1 22.5 : 2 87.6 : 3 2.8 :4
## 16: 2 11.2 : 2 68.9 : 1 22.6 : 2 88.2 : 3 2.5 :3
## 17: 2 (Other):29 (Other):36 (Other):32 (Other):25 (Other):9
## V7 distn8 sex8
## 0.1: 6 Howarhh:45 F: 8
## 0.2:16 M:37
## 0.3:15
## 0.4: 2
## 0.5: 2
## 0.6: 3
## 0.8: 1
jalpai<-cbind(jal$AGE,jal$HB,jal$MCV,jal$MCH,jal$HBA0,jal$HBA2,jal$HBF)
distn9<-rep("Jalpaiguri",nrow(jal))
sex9<-as.character(jal$SEX)
jalpai1<-cbind(jalpai,distn9,sex9)
summary(jalpai1)
## V1 V2 V3 V4 V5
## 8 : 99 11.4 : 35 70.5 : 10 22.5 : 18 85 : 39
## 10 : 91 11.3 : 29 71.3 : 10 23 : 18 82 : 35
## 6 : 61 12 : 27 71.4 : 10 22.7 : 17 83 : 32
## 9 : 56 12.1 : 27 71.7 : 10 22.6 : 16 89 : 26
## 5 : 54 11.8 : 26 71.1 : 8 23.1 : 16 84 : 24
## 13 : 47 12.3 : 25 68.7 : 7 21.6 : 15 88 : 24
## (Other):257 (Other):496 (Other):610 (Other):565 (Other):485
## V6 V7 distn9 sex9
## 2.9 : 98 0.2 :188 Jalpaiguri:665 F:264
## 2.8 : 93 0.3 : 84 M:401
## 2.7 : 78 0 : 73
## 3 : 66 0.4 : 69
## 3.1 : 60 0.9 : 36
## (Other):265 0.1 : 31
## NA's : 5 (Other):184
kolkata<-cbind(kol$AGE,kol$HB,kol$MCV,kol$MCH,kol$HBA0,kol$HBA2,kol$HBF)
distn10<-rep("kolkataa",nrow(kol))
sex10<-as.character(kol$SEX)
kolkata1<-cbind(kolkata,distn10,sex10)
summary(kolkata1)
## V1 V2 V3 V4 V5
## 20 :24 13.3 : 4 85.3 : 4 22.5 : 3 88.8 :10
## 21 :18 13.5 : 4 71.1 : 2 26.4 : 3 88.2 : 8
## 22 :17 14.6 : 4 71.3 : 2 26.7 : 3 88.9 : 8
## 19 :12 14.7 : 4 71.5 : 2 27.1 : 3 88.5 : 7
## 18 :10 10.9 : 3 76 : 2 27.6 : 3 88.6 : 7
## 23 : 7 11.8 : 3 79.1 : 2 28.6 : 3 89 : 5
## (Other): 8 (Other):74 (Other):82 (Other):78 (Other):51
## V6 V7 distn10 sex10
## 2.6 :17 0.1 :42 kolkataa:96 F:21
## 2.4 :13 0.2 :30 M:75
## 2.5 :12 0.3 :11
## 2.7 :12 0.4 : 6
## 2.3 :10 0.5 : 2
## 2.9 : 7 0.6 : 1
## (Other):25 (Other): 4
malda<-cbind(mal$AGE,mal$HB,mal$MCV,mal$MCH,mal$HBA0,mal$HBA2,mal$HBF)
distn11<-rep("Malada",nrow(mal))
sex11<-as.character(mal$SEX)
malda1<-cbind(malda,distn11,sex11)
summary(malda1)
## V1 V2 V3 V4 V5
## 14 :156 11.7 : 71 81.6 : 50 25.1 : 63 86.9 : 36
## 12 :154 11 : 58 81.4 : 47 24.9 : 57 87.2 : 32
## 13 :146 11.9 : 58 80 : 44 25.6 : 54 87.4 : 25
## 15 :130 10.4 : 51 81.7 : 41 25.3 : 51 87.5 : 25
## 11 : 96 12 : 50 81 : 35 24.6 : 45 87.1 : 24
## (Other):450 10.7 : 49 81.9 : 33 24.7 : 42 81.7 : 23
## NA's : 1 (Other):796 (Other):883 (Other):821 (Other):968
## V6 V7 distn11 sex11
## 2.7 :145 0.3 :336 Malada:1133 F:460
## 2.6 :144 0.4 :231 M:673
## 2.5 :143 0.2 :214
## 2.9 :102 0.5 :105
## 2.8 :101 0.6 : 68
## 2.4 : 83 (Other):178
## (Other):415 NA's : 1
murshi<-cbind(mur$AGE,mur$HB,mur$MCV,mur$MCH,mur$HBA0,mur$HBA2,mur$HBF)
distn12<-rep("Murshidabadd",nrow(mur))
sex12<-as.character(mur$SEX)
murshi1<-cbind(murshi,distn12,sex12)
summary(murshi1)
## V1 V2 V3 V4 V5
## 13 : 49 11 : 15 80 : 10 24 : 11 86.8 : 15
## 14 : 44 10.2 : 14 82.9 : 7 25.3 : 11 87 : 13
## 15 : 44 10.7 : 12 81.6 : 6 25.9 : 8 87.1 : 13
## 12 : 40 13.9 : 12 81.7 : 6 21.1 : 7 87.3 : 13
## 16 : 39 10 : 11 82.5 : 6 25.1 : 7 87.5 : 13
## 11 : 36 11.1 : 10 90.4 : 5 25.4 : 7 87.8 : 12
## (Other):108 (Other):286 (Other):320 (Other):309 (Other):281
## V6 V7 distn12 sex12
## 2.7 : 49 0.3 :100 Murshidabadd:360 F: 81
## 3 : 43 0.2 : 97 M:279
## 2.9 : 40 0.4 : 54
## 2.6 : 38 0.5 : 27
## 2.8 : 35 0.9 : 14
## 3.1 : 32 (Other): 67
## (Other):123 NA's : 1
Npgs<-cbind(npgs$AGE,npgs$HB,npgs$MCV,npgs$MCH,npgs$HBA0,npgs$HBA2,npgs$HBF)
distn13<-rep("N 24 Pgs",nrow(npgs))
sex13<-as.character(npgs$SEX)
npgs1<-cbind(Npgs,distn13,sex13)
summary(npgs1)
## V1 V2 V3 V4 V5
## 13 :38 11.9 : 14 67.2 : 5 21.3 : 10 87.7 : 17
## 11 :29 11.1 : 13 69.1 : 4 21.2 : 8 87.5 : 15
## 12 :28 11.6 : 13 69.2 : 4 21.5 : 7 87.3 : 13
## 10 :23 11.7 : 12 69.5 : 4 21.7 : 7 87.8 : 13
## 8 :21 11 : 10 70.5 : 4 22.2 : 7 87.6 : 12
## 9 :21 11.8 : 9 76.7 : 4 21.4 : 6 87.4 : 11
## (Other):62 (Other):151 (Other):197 (Other):177 (Other):141
## V6 V7 distn13 sex13
## 2.7 :51 0.2 :82 N 24 Pgs:222 F: 28
## 2.8 :39 0.3 :41 M:194
## 2.6 :28 0.1 :38
## 2.5 :21 0.4 :19
## 2.9 :20 0.5 :17
## 3 :16 0.6 : 9
## (Other):47 (Other):16
master<-rbind(alipura1,bankura1,birbhuma1,burdwana1,coocha1,dinaja1,emedini1,howrah1,jalpai1,kolkata1,malda1,murshi1,npgs1)
master<-data.frame(master)
master.f<-na.omit(master)
master1<-rbind(alipur,burdwan,bankura,birbhum,cooch,dinaj,e.medini,howrah,jalpai,kolkata,malda,murshi,Npgs)
colnames(master1)<-c("age","hb","hbf","hba2","hba0","mcv","mch")
master1.f<-na.omit(master1)
pca<-prcomp(master1.f,scale= TRUE)
biplot(pca)
g_class<- master.f$sex
library("ggbiplot")
g<-ggbiplot(pca,obs.scale = 0.3,var.scale = 0.3,groups = g_class,ellipse = TRUE,circle = TRUE,pc.biplot = TRUE)
g<-g+xlim(-2.5,2.5)
g<-g+ylim(-2.5,2.5)
g<-g+theme(legend.direction = 'horizontal',legend.position = 'top')
print(g)
## Warning: Removed 139 rows containing missing values (geom_point).
The biplot shows that mcv, mch and hb are very much related to each other.Hence they are contribiting factors.Age has very less co-relation to all other components and does not directly affects any component to a large extent.Hbf is likely to be a independent component of the analysis. From the ggbiplot which seperates components on the distinction of male and female patients the components seem to be likely be the same as the biplot without much variance except the fact that the female popuplation have a high correlation of mcv to mch than male.Also the age becomes more important in case of females with aspect of all the components correlation.The most distinct factor here are the outliers that are found which may be either or consisderd as very special scenarios having very rare conditions for both the male and female conditions. further this project could be extended to focus on finding this outliers or rare clinically important conditions because they are purely independent of each other.For the time being this project concludes with the fact that mcv,mch and hb are dependent variables and hbf is the most independent variable component.Also age and sex(M/F) are moderately significant for the other components except females have atendency to have more mcv and mch with increase in age as suggested from the ggbiplot.According to the grammer of graphics plot (ggbiplot) HbA0 and HbA2 are oppositly releated. When the normal Hb increased then HbA2 decresed and when HbA2 value increased probablity of thalasemia high.