PRINCIPLE

The R language was primarily designed as a language for data manipulation, modeling and visualization, and many of the data structures reflect this view. ####When R is started, a workspace is created and that workspace is where the user creates and manipulates variables.This workspace is an environment, and an environment is a set of bindings of names, or symbols, to values. The top-level workspace can be accessed through its name, which is GlobalEnv. Assignment of value to a variable is generally done with either the = (equals) character, or a special symbol that is the concatenation of less than and minus,<-. Assignment creates a binding between a symbol and a value, in a particular environment. An R package typically consists of a coherent collection of functions and data structures that are suitable for addressing a particular problem. Here we use some package : 1. Bio3d 2. scatter plot3d 3. ggbiplot 4. peptides 5. lines 3d

ASSIGNMENT 1

SPECTRAL ANALYSIS

#setwd("H:/ROLL25")
Z<-read.csv("spectra.csv")
x<-Z$wavelength
y<-Z$absorb
mainlab=paste("wavelength vs absorbtion")
xlabel=paste("wavelength")
ylabel=paste("absorption")
plot(x,y,main = mainlab,col="red",cex=0.5,xlab = xlabel,ylab = ylabel)

jk<-which(x>450 & x<750)
length(jk)

## [1] 1557

wv<-x[jk]
aa<-y[jk]
plot(wv,aa,main = mainlab,col="red",cex=0.5,xlab = xlabel,ylab = ylabel)
mia<-max(aa)
mia

## [1] 0.386

miv<-which(aa==mia)
miv

## [1] 850

wvmax<-wv[850]
wvmax

## [1] 618.08

text(545,0.386,"wvmax= 618.08--->",cex = 0.7, col="blue")
lm(aa~wv)

## 
## Call:
## lm(formula = aa ~ wv)
## 
## Coefficients:
## (Intercept)           wv  
##   8.721e-02    2.193e-06

abline(h=0.386,v=618.08,col="black")

summary(wvmax)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   618.1   618.1   618.1   618.1   618.1   618.1

ASSIGNMENT 2

DISTRICT WISE ANALYSIS OF THALASEMIA

z<-read.csv("Murshidabadnewfolder.csv")
z.m<-z
summary(z.m)

##      SL.NO            NAME       SURNAME       AGE    SEX   
##  Min.   : 1.00   ANANDA : 1   HANSDA :3   13 YRS :4   F: 3  
##  1st Qu.: 5.75   ANITA  : 1   HEMBRAM:5   14 YRS :4   M:17  
##  Median :10.50   BAPI   : 1   KISKU  :1   15 YRS :2         
##  Mean   :10.50   BIKASH : 1   MARDI  :3   17 YRS :2         
##  3rd Qu.:15.25   BIKRAM : 1   MURMU  :4   8 YRS  :2         
##  Max.   :20.00   DEBEN  : 1   SOREN  :2   11 YRS :1         
##                  (Other):14   TUDU   :2   (Other):5         
##     HPLC.NO             HB           Ferritin         RBC       
##  Min.   : 13.00   Min.   :12.10   Min.   :33.8   Min.   :3.200  
##  1st Qu.: 48.25   1st Qu.:12.20   1st Qu.:33.8   1st Qu.:3.900  
##  Median :114.00   Median :12.40   Median :33.8   Median :4.800  
##  Mean   :103.95   Mean   :12.37   Mean   :33.8   Mean   :4.595  
##  3rd Qu.:155.25   3rd Qu.:12.50   3rd Qu.:33.8   3rd Qu.:5.025  
##  Max.   :174.00   Max.   :12.80   Max.   :33.8   Max.   :6.000  
##                                   NA's   :19                    
##       PCV             MCV             MCH             MCHC      
##  Min.   :27.90   Min.   :62.00   Min.   :16.10   Min.   :25.40  
##  1st Qu.:33.80   1st Qu.:68.47   1st Qu.:19.70   1st Qu.:28.40  
##  Median :36.50   Median :71.40   Median :20.60   Median :28.75  
##  Mean   :35.32   Mean   :70.77   Mean   :20.57   Mean   :28.68  
##  3rd Qu.:37.17   3rd Qu.:73.38   3rd Qu.:21.30   3rd Qu.:29.60  
##  Max.   :39.10   Max.   :77.30   Max.   :25.00   Max.   :30.90  
##                                                                 
##       RDW             HbA0            HbA2            HbF      
##  Min.   :11.60   Min.   :84.30   Min.   :1.600   Min.   :0.10  
##  1st Qu.:12.62   1st Qu.:86.60   1st Qu.:2.375   1st Qu.:0.20  
##  Median :13.65   Median :87.45   Median :2.500   Median :0.25  
##  Mean   :14.23   Mean   :87.29   Mean   :2.400   Mean   :0.25  
##  3rd Qu.:15.47   3rd Qu.:88.33   3rd Qu.:2.600   3rd Qu.:0.30  
##  Max.   :21.50   Max.   :89.20   Max.   :2.600   Max.   :0.50  
##                                                                
##   HBA2.E          HBS            HBD          FERITIN       
##  Mode:logical   Mode:logical   Mode:logical   Mode:logical  
##  NA's:20        NA's:20        NA's:20        NA's:20       
##                                                             
##                                                             
##                                                             
##                                                             
##                                                             
##    B12          CLINICAL.ABNORMALITY    X             X.1         
##  Mode:logical   Mode:logical         Mode:logical   Mode:logical  
##  NA's:20        NA's:20              NA's:20        NA's:20       
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##            molecular 
##  alpha mutation :14  
##  normal         : 6  
##                      
##                      
##                      
##                      
##

zx<-read.csv("naidanewfolder.csv")
zx.n<-zx
summary(zx.n)

##     HPLC..No            HB             RBC             HCT       
##  Min.   :  8.00   Min.   :12.10   Min.   :4.510   Min.   :31.60  
##  1st Qu.: 52.50   1st Qu.:12.25   1st Qu.:4.965   1st Qu.:34.70  
##  Median : 87.00   Median :12.70   Median :5.180   Median :36.80  
##  Mean   : 79.06   Mean   :12.95   Mean   :5.256   Mean   :37.01  
##  3rd Qu.:106.50   3rd Qu.:13.45   3rd Qu.:5.600   3rd Qu.:39.25  
##  Max.   :121.00   Max.   :14.90   Max.   :6.030   Max.   :45.00  
##       MCV             MCH             MCHC            RDW       
##  Min.   :61.50   Min.   :17.60   Min.   :28.60   Min.   :13.70  
##  1st Qu.:68.80   1st Qu.:20.60   1st Qu.:29.90   1st Qu.:15.05  
##  Median :70.80   Median :21.40   Median :30.30   Median :16.00  
##  Mean   :70.45   Mean   :21.41   Mean   :30.38   Mean   :16.07  
##  3rd Qu.:72.30   3rd Qu.:22.10   3rd Qu.:30.80   3rd Qu.:16.80  
##  Max.   :74.90   Max.   :23.70   Max.   :31.80   Max.   :18.80  
##       HBA0            HBA2            HBF          HBA2e        
##  Min.   :86.00   Min.   :1.600   Min.   :0.1000   Mode:logical  
##  1st Qu.:86.50   1st Qu.:2.600   1st Qu.:0.2000   NA's:47       
##  Median :87.00   Median :2.700   Median :0.3000                 
##  Mean   :87.01   Mean   :2.723   Mean   :0.3702                 
##  3rd Qu.:87.35   3rd Qu.:2.900   3rd Qu.:0.5000                 
##  Max.   :88.90   Max.   :3.200   Max.   :0.9000                 
##    HBS            HBD                   MOLECULAR 
##  Mode:logical   Mode:logical   ALPHA MUTATION:15  
##  NA's:47        NA's:47        NORMAL        :32  
##                                                   
##                                                   
##                                                   
##

xz<-read.csv("birbhumnewfolder.csv")
xz.b<-xz
summary(xz.b)

##      SL.NO               NAME       SURNAME        AGE     SEX   
##  Min.   : 1.00   LAKHINDAR : 2   SOREN  :12   12Y    : 9   F:12  
##  1st Qu.:13.25   SONALI    : 2   MURMU  :10   14Y    : 5   M:38  
##  Median :25.50   ARIKA     : 1   MARDI  : 8   14Y2M  : 3         
##  Mean   :25.50   BAIDYANATH: 1   HANSDA : 7   15Y7M  : 3         
##  3rd Qu.:37.75   BAPI      : 1   HEMBRAM: 6   11Y10M : 2         
##  Max.   :50.00   BIKASH    : 1   TUDU   : 5   12Y6M  : 2         
##                  (Other)   :42   (Other): 2   (Other):26         
##                SCHOOL   HPLC.TEST.NO.       HB           HPLC.DATE 
##                   : 1   622A   : 1    Min.   :12.10   5.7.2016:10  
##  N.A.T.HIGH SCHOOL:49   628A   : 1    1st Qu.:12.30   6.7.2016:31  
##                         631A   : 1    Median :12.75   7.7.2016: 6  
##                         632A   : 1    Mean   :12.83   8.7.2016: 3  
##                         636A   : 1    3rd Qu.:13.20                
##                         640A   : 1    Max.   :14.80                
##                         (Other):44                                 
##       RBC             PCV             MCV             MCH       
##  Min.   :4.920   Min.   :37.50   Min.   :65.70   Min.   :19.30  
##  1st Qu.:5.275   1st Qu.:40.42   1st Qu.:72.33   1st Qu.:21.70  
##  Median :5.570   Median :41.45   Median :75.30   Median :23.20  
##  Mean   :5.604   Mean   :41.71   Mean   :74.57   Mean   :22.99  
##  3rd Qu.:5.950   3rd Qu.:43.15   3rd Qu.:77.17   3rd Qu.:24.27  
##  Max.   :6.420   Max.   :47.50   Max.   :79.30   Max.   :26.00  
##                                                                 
##       MCHC            RDW             HbA0            HbA2      
##  Min.   :29.40   Min.   :12.60   Min.   :84.10   Min.   :2.100  
##  1st Qu.:30.10   1st Qu.:14.40   1st Qu.:85.70   1st Qu.:2.600  
##  Median :30.80   Median :15.20   Median :86.50   Median :2.800  
##  Mean   :30.82   Mean   :15.29   Mean   :86.49   Mean   :2.764  
##  3rd Qu.:31.57   3rd Qu.:16.00   3rd Qu.:87.28   3rd Qu.:2.900  
##  Max.   :32.90   Max.   :18.90   Max.   :88.60   Max.   :3.700  
##                                                                 
##       HbF         HBA2.E          HBS            HBD         
##  Min.   :0.200   Mode:logical   Mode:logical   Mode:logical  
##  1st Qu.:0.200   NA's:50        NA's:50        NA's:50       
##  Median :0.300                                               
##  Mean   :0.336                                               
##  3rd Qu.:0.400                                               
##  Max.   :0.700                                               
##                                                              
##  FERITIN          B12          CLINICAL.ABNORMALITY           MOLICULAR 
##  Mode:logical   Mode:logical   Mode:logical         Alpha Mutation :29  
##  NA's:50        NA's:50        NA's:50              Normal         :20  
##                                                     not found      : 1  
##                                                                         
##                                                                         
##                                                                         
##

mn<-read.csv("jalpaigurinewfolder.csv")
mn.j<-mn
summary(mn.j)

##       NAME        SURNAME       AGE        SEX           School  
##  Sahil  : 2   Oraon   :7   Min.   :10.00   M:19   Nathuahat :19  
##  Ashik  : 1   Oraon   :6   1st Qu.:11.00                         
##  Atish  : 1   Bhagat  :1   Median :12.00                         
##  Bipul  : 1   Lama    :1   Mean   :12.16                         
##  Bishal : 1   Munda   :1   3rd Qu.:13.50                         
##  Hrittik: 1   Shaibya :1   Max.   :15.00                         
##  (Other):12   (Other) :2                                         
##     HPLC.NO            HB                 Molecular    HPLC.TEST.NO.
##  Min.   : 1.00   Min.   :12.10   alpha mutation:13   1/Nathua : 1   
##  1st Qu.:13.00   1st Qu.:12.25   Alpha mutation: 1   17/Nathua: 1   
##  Median :19.00   Median :12.40   normal        : 5   23/Nathua: 1   
##  Mean   :35.68   Mean   :12.48                       31/Nathua: 1   
##  3rd Qu.:68.50   3rd Qu.:12.65                       33/Nathua: 1   
##  Max.   :83.00   Max.   :13.10                       37/Nathua: 1   
##                                                      (Other)  :13   
##     HPLC.DATE      RBC             PCV             MCV       
##  17.03.16:6   Min.   :4.550   Min.   :34.00   Min.   :64.40  
##  16.03.16:5   1st Qu.:4.795   1st Qu.:35.55   1st Qu.:69.65  
##  18.03.26:3   Median :5.160   Median :36.40   Median :71.30  
##  18.03.17:1   Mean   :5.129   Mean   :36.76   Mean   :71.90  
##  18.03.19:1   3rd Qu.:5.445   3rd Qu.:37.50   3rd Qu.:74.45  
##  18.03.20:1   Max.   :5.940   Max.   :41.20   Max.   :78.50  
##  (Other) :2                                                  
##       MCH             MCHC            RDW             HbA0      
##  Min.   :20.60   Min.   :31.10   Min.   :14.00   Min.   :87.30  
##  1st Qu.:22.55   1st Qu.:31.90   1st Qu.:15.65   1st Qu.:87.40  
##  Median :23.10   Median :32.20   Median :16.60   Median :87.60  
##  Mean   :23.26   Mean   :32.34   Mean   :16.66   Mean   :87.78  
##  3rd Qu.:24.20   3rd Qu.:32.60   3rd Qu.:18.15   3rd Qu.:88.05  
##  Max.   :26.60   Max.   :33.90   Max.   :18.90   Max.   :89.20  
##                                                                 
##       HbA2            HbF          HBA2.E          HBS         
##  Min.   :2.700   Min.   :0.0000   Mode:logical   Mode:logical  
##  1st Qu.:2.900   1st Qu.:0.0000   NA's:19        NA's:19       
##  Median :3.000   Median :0.2000                                
##  Mean   :3.016   Mean   :0.2158                                
##  3rd Qu.:3.250   3rd Qu.:0.3500                                
##  Max.   :3.300   Max.   :0.7000                                
##                                                                
##    HBD          FERITIN          B12          CLINICAL.ABNORMALITY
##  Mode:logical   Mode:logical   Mode:logical   Mode:logical        
##  NA's:19        NA's:19        NA's:19        NA's:19             
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##

abc<-read.csv("Bankuranewfolder.csv")
abc.b<-abc
summary(abc.b)

##          NAME      SURNAME       AGE           SEX    
##  AJAY      : 1   BASKEY:1   Min.   : 5.000   MALE:24  
##  AVIJIT    : 1   BESRA :1   1st Qu.: 8.000            
##  BIDYASAGAR: 1   KISKU :2   Median :10.000            
##  BIR SING  : 1   MANDI :8   Mean   : 9.208            
##  BUDDHADEB : 1   MURMU :9   3rd Qu.:10.000            
##  BUDHADEV  : 1   SOREN :2   Max.   :12.000            
##  (Other)   :18   TUDU  :1                             
##                                   SCREENING  HPLC.TEST.NO.  
##  Baganpara Bangladanga Chandi P. School: 2   Min.   : 8.00  
##  Kawasole P. Vidyalaya                 : 9   1st Qu.:39.75  
##  Peardoba P. School                    : 2   Median :57.50  
##  Sri Maa Shishu Niketan                :11   Mean   :52.67  
##                                              3rd Qu.:71.25  
##                                              Max.   :86.00  
##                                                             
##        HB             RBC             PCV             MCV       
##  Min.   :12.10   Min.   :4.970   Min.   :37.50   Min.   :68.40  
##  1st Qu.:12.20   1st Qu.:5.452   1st Qu.:40.38   1st Qu.:69.72  
##  Median :12.45   Median :5.745   Median :41.00   Median :72.40  
##  Mean   :12.47   Mean   :5.647   Mean   :40.99   Mean   :72.78  
##  3rd Qu.:12.62   3rd Qu.:5.883   3rd Qu.:41.62   3rd Qu.:74.30  
##  Max.   :13.40   Max.   :6.120   Max.   :44.30   Max.   :79.90  
##                                                                 
##       MCH             MCHC            RDW             HbA0      
##  Min.   :20.20   Min.   :29.30   Min.   :14.00   Min.   :86.50  
##  1st Qu.:21.05   1st Qu.:29.95   1st Qu.:15.97   1st Qu.:87.38  
##  Median :21.90   Median :30.30   Median :16.40   Median :87.75  
##  Mean   :22.19   Mean   :30.43   Mean   :16.51   Mean   :87.72  
##  3rd Qu.:22.80   3rd Qu.:30.80   3rd Qu.:17.10   3rd Qu.:88.12  
##  Max.   :25.20   Max.   :32.60   Max.   :19.10   Max.   :88.60  
##                                                                 
##       HbA2            HbF          HBA2.E          HBS         
##  Min.   :2.400   Min.   :0.1000   Mode:logical   Mode:logical  
##  1st Qu.:2.700   1st Qu.:0.2000   NA's:24        NA's:24       
##  Median :2.800   Median :0.3000                                
##  Mean   :2.842   Mean   :0.3583                                
##  3rd Qu.:3.000   3rd Qu.:0.3000                                
##  Max.   :3.200   Max.   :1.6000                                
##                                                                
##    HBD          FERITIN          B12          CLINICAL.ABNORMALITY
##  Mode:logical   Mode:logical   Mode:logical   Mode:logical        
##  NA's:24        NA's:24        NA's:24        NA's:24             
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##       MOLICULAR.MUTATION
##  Alpha Mutation:17      
##  normal        : 7      
##                         
##                         
##                         
##                         
##

#barplot
hj<-read.csv("Murshidabadnewfolder.csv")
jh<-read.csv("naidanewfolder.csv")
gd<-read.csv("birbhumnewfolder.csv")
dg<-read.csv("jalpaigurinewfolder.csv")
dk<-read.csv("Bankuranewfolder.csv")
f<-cbind(mean(hj$HB),mean(jh$HB),mean(gd$HB),mean(dg$HB),mean(dk$HB))
colnames(f)<-c("Mur","nai","bir","jal","Ban")
par(mfrow=c(2.5,2.5))
barplot(f)
e<-cbind(mean(hj$HbA0),mean(jh$HBA0),mean(gd$HbA0),mean(dg$HbA0),mean(dk$HB) )
colnames(e)<-c("Mur1","nai1","bir1","jal","Ban1")
barplot(e)
o<-cbind(mean(hj$HbF),mean(jh$HBF),mean(gd$HbF),mean(dg$HbF),mean(dk$HbF))
colnames(o)<-c("mur2","nai2","bir2","jal2","Ban2")
barplot(o)

problem3

BOX PLOT ANALYSIS

ba<-read.csv("Bankuranewfolder.csv")
b<-read.csv("Birbhumnewfolder.csv")
j<-read.csv("jalpaigurinewfolder.csv")
m<-read.csv("Murshidabadnewfolder.csv")
n<-read.csv("naidanewfolder.csv")

par(mfrow=c(2,2))
boxhb<-cbind(ba$HB,b$HB,j$HB,m$HB,n$HB)

## Warning in cbind(ba$HB, b$HB, j$HB, m$HB, n$HB): number of rows of result
## is not a multiple of vector length (arg 1)

colnames(boxhb)<-c("BA","B","J","M","N")
boxplot(boxhb,ylab="hb",xlab="district",main="boxplot")

boxhbao<-cbind(ba$HbA0,b$HbA0,j$HbA0,m$HbA0,n$HBA0)

## Warning in cbind(ba$HbA0, b$HbA0, j$HbA0, m$HbA0, n$HBA0): number of rows
## of result is not a multiple of vector length (arg 1)

colnames(boxhbao)<-c("BA","B","J","M","N")
boxplot(boxhbao,ylab="hbao",xlab="district",main="boxplot")

boxhba2<-cbind(ba$HbA2,b$HbA2,j$HbA2,m$HbA2,n$HBA2)

## Warning in cbind(ba$HbA2, b$HbA2, j$HbA2, m$HbA2, n$HBA2): number of rows
## of result is not a multiple of vector length (arg 1)

colnames(boxhba2)<-c("BA","B","J","M","N")
boxplot(boxhba2,ylab="hba2",xlab="district",main="boxplot")

boxhbf<-cbind(ba$HbA2,b$HbF,j$HbF,m$HbF,n$HBF)

## Warning in cbind(ba$HbA2, b$HbF, j$HbF, m$HbF, n$HBF): number of rows of
## result is not a multiple of vector length (arg 1)

colnames(boxhbf)<-c("BA","B","J","M","N")
boxplot(boxhbf,ylab="hbf",xlab="district",main="boxplot")

PRINCIPLE COMPONENT ANALYSIS

ba<-read.csv("Bankuranewfolder.csv")
b<-read.csv("Birbhumnewfolder.csv")
j<-read.csv("jalpaigurinewfolder.csv")
m<-read.csv("Murshidabadnewfolder.csv")
n<-read.csv("naidanewfolder.csv")

nad<-cbind(n$HB,n$MCV,n$MCH,n$HBA0,n$HBA2,n$HBF,n$MOLECULAR)
distn<-rep("naida",nrow(n))
colnames(nad)<-c("hb","mcv","mch","hbao","hba2","hbf","mol")
f.n<-cbind(nad,distn)

ban<-cbind(ba$HB,ba$MCV,ba$MCH,ba$HbA0,ba$HbA2,ba$HbF,ba$MOLICULAR.MUTATION)
distn1<-rep("Bankura",nrow(ba))
colnames(ban)<-c("hb1","mcv1","mch1","hbao1","hba21","hbf1","mol1")
f.ba<-cbind(ban,distn1)

bir<-cbind(b$HB,b$MCV,b$MCH,b$HbA0,b$HbA2,b$HbF,b$MOLICULAR)
distn2<-rep("Birbhum",nrow(b))
colnames(bir)<-c("hb2","mcv2","mch2","hbao2","hba22","hbf2","mol2")
f.bir<-cbind(bir,distn2)

jal<-cbind(j$HB,j$MCV,j$MCH,j$HbA0,j$HbA2,j$HbF,j$Molecular)
distn3<-rep("jalpaiguri",nrow(j))
colnames(jal)<-c("hb3","mcv3","mch3","hbao3","hba23","hbf3","mol3")
f.jal<-cbind(jal,distn3)

mur<-cbind(m$HB,m$MCV,m$MCH,m$HbA0,m$HbA2,m$HbF,m$molecular)
distn4<-rep("Murshidabad",nrow(m))
colnames(mur)<-c("hb4","mcv4","mch4","hbao4","hba24","hbf4","mol4")
f.mur<-cbind(mur,distn4)
csvxtract<-function(csvfile){qq<-read.csv(csvfile,header = TRUE)
hb<-qq$HB
hbao<-qq$HbAO
hbf<-qq$HbF
mcv<-qq$mcv
mch<-qq$MCH
hba2<-qq$HbA2
diag<-qq$MOLECULAR
s1<-cbind(hb,hbao,hbf,mcv,mch,hba2)
s<-list("NUM"=s1,"MUT"=diag)
return(s)}
nad<-csvxtract("nadiaonly.csv")
summary(nad)

##     Length Class  Mode   
## NUM 188    -none- numeric
## MUT  47    factor numeric

ban<-csvxtract("bankuraonly.csv")
summary(ban)

##     Length Class  Mode   
## NUM 96     -none- numeric
## MUT 24     factor numeric

bir<-csvxtract("Birbhumonly.csv")
summary(bir)

##     Length Class  Mode   
## NUM 200    -none- numeric
## MUT  50    factor numeric

mur<-csvxtract("Murshidabadonly.csv")
summary(mur)

##     Length Class  Mode   
## NUM 80     -none- numeric
## MUT 20     factor numeric

jal<-csvxtract("jalpaigurionly.csv")
summary(jal)

##     Length Class  Mode   
## NUM 76     -none- numeric
## MUT 19     factor numeric

mastercsv<-rbind(nad$NUM,ban$NUM,bir$NUM,mur$NUM,jal$NUM)
mastgrp<-c(nad$MUT,ban$MUT,bir$MUT,mur$MUT,jal$MUT)                       
recode<-c(normal=2,alpha=1)
gclass<-factor(mastgrp,levels=recode,labels=names(recode))
summary(gclass)

## normal  alpha   NA's 
##     71     66     23

ik<-which(is.na(gclass))
ik

##  [1]  48  49  53  57  60  63  67 122 123 125 127 132 133 135 136 137 138
## [18] 139 142 143 145 146 154

mast.pca<-prcomp(mastercsv,scale=TRUE)
summary(mast.pca)

## Importance of components:
##                           PC1    PC2    PC3    PC4
## Standard deviation     1.1334 1.0825 0.9453 0.8061
## Proportion of Variance 0.3212 0.2930 0.2234 0.1625
## Cumulative Proportion  0.3212 0.6141 0.8375 1.0000

biplot(mast.pca)

library(ggbiplot)

## Loading required package: ggplot2

## Loading required package: plyr

## Loading required package: scales

## Loading required package: grid

g<-ggbiplot(mast.pca,obs.scale = 1,varscale=1,groups = gclass,ellipse = TRUE,circle = TRUE)
g<-g+ scale_colour_discrete(name='')
g<-g+ theme(legend.direction = 'horizontal', legend.position = 'top')
print(g)

ASSIGNMENT 3

PROTEIN PDB FILE

library("bio3d")

## Warning: package 'bio3d' was built under R version 3.4.4

library("scatterplot3d")

## Warning: package 'scatterplot3d' was built under R version 3.4.4

# problem 1
a<-read.csv("bond.csv")
a

d<-a$atom
f<-a$x
g<-a$y
h<-a$z
x.ab<-f[1]-f[2]
y.ab<-g[1]-g[2]
z.ab<-h[1]-h[2]
x.cb<-f[3]-f[2]
y.cb<-g[3]-g[2]
z.cb<-h[3]-h[2]
u<-c(x.ab,y.ab,z.ab)
v<-c(x.cb,y.cb,z.cb)
dotuv<-u%*%v
modu<-sqrt(u%*%v)
modv<-sqrt(v%*%v)
mod<-modu%*%modv
ct<-dotuv/mod
ct%*%180

##          [,1]
## [1,] 127.2792

TORSION ANGLE (RAMACHANDRAN PLOT)

s<-read.pdb("1ZNI")

##   Note: Accessing on-line PDB file
##    PDB has ALT records, taking A only, rm.alt=TRUE

summary(s)

## 
##  Call:  read.pdb(file = "1ZNI")
## 
##    Total Models#: 1
##      Total Atoms#: 915,  XYZs#: 2745  Chains#: 4  (values: A B C D)
## 
##      Protein Atoms#: 806  (residues/Calpha atoms#: 102)
##      Nucleic acid Atoms#: 0  (residues/phosphate atoms#: 0)
## 
##      Non-protein/nucleic Atoms#: 109  (residues: 109)
##      Non-protein/nucleic resid values: [ CL (3), HOH (103), ZN (3) ]
## 
## + attr: atom, xyz, seqres, helix, sheet,
##         calpha, remark, call

t<-torsion.pdb(s)
phi<-t$phi
psi<-t$psi
om<-t$omega
r1<-c(phi[1:10],psi[1:10])
(r1)

##  [1]         NA  -61.87672  -69.04083  -68.51322  -84.38693 -118.16191
##  [7] -100.90698  -92.32031 -143.03825  -84.72117 -153.62096  -31.84824
## [13]  -37.72618  -35.06878  -35.52542   12.08905  -30.34935  -15.72777
## [19]  158.38593  146.23930

plot(om,pch=19)

plot(psi,phi,pch=19)

# problem 3
summary(s$atom)

##      type               eleno          elety               alt           
##  Length:915         Min.   :  1.0   Length:915         Length:915        
##  Class :character   1st Qu.:230.5   Class :character   Class :character  
##  Mode  :character   Median :467.0   Mode  :character   Mode  :character  
##                     Mean   :466.7                                        
##                     3rd Qu.:698.5                                        
##                     Max.   :933.0                                        
##     resid              chain               resno         insert         
##  Length:915         Length:915         Min.   : 1.0   Length:915        
##  Class :character   Class :character   1st Qu.: 8.0   Class :character  
##  Mode  :character   Mode  :character   Median :16.0   Mode  :character  
##                                        Mean   :17.1                     
##                                        3rd Qu.:24.0                     
##                                        Max.   :63.0                     
##        x                 y                z                 o         
##  Min.   :-25.267   Min.   :-0.149   Min.   :-21.718   Min.   :0.1700  
##  1st Qu.: -8.268   1st Qu.: 7.718   1st Qu.: -6.747   1st Qu.:1.0000  
##  Median : -0.980   Median :12.372   Median :  0.215   Median :1.0000  
##  Mean   : -1.742   Mean   :12.493   Mean   :  0.358   Mean   :0.9874  
##  3rd Qu.:  4.679   3rd Qu.:17.089   3rd Qu.:  7.059   3rd Qu.:1.0000  
##  Max.   : 18.630   Max.   :27.204   Max.   : 27.500   Max.   :1.0000  
##        b            segid              elesy              charge         
##  Min.   :11.90   Length:915         Length:915         Length:915        
##  1st Qu.:20.16   Class :character   Class :character   Class :character  
##  Median :27.23   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.79                                                           
##  3rd Qu.:39.42                                                           
##  Max.   :92.47

a<-s$atom[, c("x","y","z","elety")]
aa<-which(a$elety=="CA")
a.a<-s$atom[, c("x","y","z")]
r<-a.a[aa,]
summary(r)

##        x                 y                z           
##  Min.   :-19.028   Min.   : 3.866   Min.   :-17.0430  
##  1st Qu.: -8.092   1st Qu.: 7.950   1st Qu.: -6.1193  
##  Median : -0.957   Median :12.098   Median :  0.3630  
##  Mean   : -1.820   Mean   :12.477   Mean   :  0.4244  
##  3rd Qu.:  4.153   3rd Qu.:16.750   3rd Qu.:  7.1040  
##  Max.   : 14.883   Max.   :22.896   Max.   : 21.8790

scatterplot3d(r,highlight.3d = TRUE, col.axis = "red",col.grid = "lightblue", main = "scatterplot 3D",pch = 20)

ik<-which(s$atom$elety=="CA")
CAxyz<-s$atom[ik,c("x","y","z")]
plot(CAxyz)

PROTEIN BLAST

blast<-blast.pdb(pdbseq(s))

##  Searching ... please wait (updates every 5 seconds) RID = E8NTJA64014 
##  
##  Reporting 216 hits

top.hits<-plot.blast(blast, cutoff = 25)

##   * Possible cutoff values:    32 15 31 16 32 16 27 15 57 13 27 11 30 11 32 16 25 16 25 13 
##             Yielding Nhits:    2 3 5 6 8 9 11 12 13 15 17 18 20 21 23 24 26 27 127 216 
## 
##   * Chosen cutoff value of:    25 
##             Yielding Nhits:    117

head(top.hits$hits)

##   pdb.id   acc      group
## 1 "6INS_E" "6INS_E" "1"  
## 2 "6INS_E" "6INS_E" "1"  
## 4 "1ZEI_A" "1ZEI_A" "1"  
## 5 "1ZEI_A" "1ZEI_A" "1"  
## 7 "1EFE_A" "1EFE_A" "1"  
## 8 "1EFE_A" "1EFE_A" "1"

a<-blast$raw
returned.pdbids<-a$subjectids
(returned.pdbids)

##   [1] "6INS_E"  "6INS_E"  "6INS_E"  "1ZEI_A"  "1ZEI_A"  "1ZEI_A"  "1EFE_A" 
##   [8] "1EFE_A"  "1EFE_A"  "1SJU_A"  "1SJU_A"  "1SJU_A"  "2JZQ_A"  "2JZQ_A" 
##  [15] "2JZQ_A"  "5WBT_A"  "5WBT_A"  "5WBT_A"  "5WDM_A"  "5WDM_A"  "5WDM_A" 
##  [22] "6B3Q_AA" "6B3Q_AA" "6B3Q_AA" "2KQP_A"  "2KQP_A"  "2KQP_A"  "1APH_B" 
##  [29] "1APH_B"  "2RN5_B"  "2RN5_B"  "2LGB_B"  "2LGB_B"  "1TRZ_B"  "1TRZ_B" 
##  [36] "1XDA_B"  "1XDA_B"  "5AZZ_B"  "5AZZ_B"  "2INS_B"  "2INS_B"  "1IZA_B" 
##  [43] "1IZA_B"  "1HO0_A"  "1HO0_A"  "4UNE_B"  "4UNE_B"  "1HLS_B"  "1HLS_B" 
##  [50] "1B9E_B"  "1B9E_B"  "5MHD_B"  "5MHD_B"  "2MVD_B"  "2MVD_B"  "1BEN_B" 
##  [57] "1BEN_B"  "1QIY_B"  "1QIY_B"  "3BXQ_B"  "3BXQ_B"  "3V1G_B"  "3V1G_B" 
##  [64] "2WRX_B"  "2WRX_B"  "3ZQR_B"  "3ZQR_B"  "2M2O_B"  "2M2O_B"  "1W8P_B" 
##  [71] "1W8P_B"  "3JSD_B"  "3JSD_B"  "2M2M_B"  "2M2M_B"  "1ZEG_B"  "1ZEG_B" 
##  [78] "1HIQ_B"  "1HIQ_B"  "4UNG_B"  "4UNG_B"  "1HIT_B"  "1HIT_B"  "4UNH_B" 
##  [85] "4UNH_B"  "1MHI_B"  "1MHI_B"  "5HPR_B"  "5HPR_B"  "1QIY_D"  "1QIY_D" 
##  [92] "3U4N_B"  "3U4N_B"  "5BPO_B"  "5BPO_B"  "2N2V_B"  "2N2V_B"  "4P65_B" 
##  [99] "4P65_B"  "5BQQ_B"  "5BQQ_B"  "4NIB_B"  "4NIB_B"  "1LPH_B"  "1LPH_B" 
## [106] "1SJT_B"  "1SJT_B"  "1JCO_B"  "1JCO_B"  "1HTV_B"  "1HTV_B"  "1A7F_B" 
## [113] "1A7F_B"  "1MHJ_B"  "1MHJ_B"  "4EFX_B"  "4EFX_B"  "1K3M_B"  "1K3M_B" 
## [120] "2N2X_B"  "2N2X_B"  "2MPG_B"  "2MPG_B"  "1T1K_B"  "1T1K_B"  "1T1P_B" 
## [127] "1T1P_B"  "3ZS2_B"  "3ZS2_B"  "1HUI_B"  "1HUI_B"  "1B9G_A"  "1B9G_A" 
## [134] "1B9G_A"  "2HHO_B"  "2HHO_B"  "1T1Q_B"  "1T1Q_B"  "1TGR_A"  "1TGR_A" 
## [141] "1TGR_A"  "2WS4_B"  "2WS4_B"  "1PID_B"  "1PID_B"  "1BZV_B"  "1BZV_B" 
## [148] "2KJU_B"  "2KJU_B"  "2K9R_B"  "2K9R_B"  "2MLI_B"  "2MLI_B"  "2H67_B" 
## [155] "2H67_B"  "2K91_B"  "2K91_B"  "2HH4_B"  "2HH4_B"  "2L1Y_B"  "2L1Y_B" 
## [162] "2MPI_B"  "2MPI_B"  "1IMX_A"  "1IMX_A"  "1IMX_A"  "5L3M_A"  "5L3M_A" 
## [169] "5L3M_A"  "1SDB_B"  "1SDB_B"  "5L3N_A"  "5L3N_A"  "5L3N_A"  "1DEI_B" 
## [176] "1DEI_B"  "1IGL_A"  "1IGL_A"  "1IGL_A"  "3LRI_A"  "3LRI_A"  "3LRI_A" 
## [183] "2QIU_A"  "2QIU_A"  "5MHD_A"  "5MHD_A"  "4INS_A"  "4INS_A"  "2KXK_A" 
## [190] "2KXK_A"  "2BN3_A"  "2BN3_A"  "1XW7_A"  "1XW7_A"  "2JUM_A"  "2JUM_A" 
## [197] "5MWQ_A"  "5MWQ_A"  "1K3M_A"  "1K3M_A"  "1JCA_A"  "1JCA_A"  "2JUU_A" 
## [204] "2JUU_A"  "1Q4V_A"  "1Q4V_A"  "1J73_A"  "1J73_A"  "1APH_A"  "1APH_A" 
## [211] "4IYD_A"  "4IYD_A"  "2WBY_C"  "2WBY_C"  "1BEN_A"  "1BEN_A"

p<-a$subjectids
p[10]

## [1] "1SJU_A"

p[40]

## [1] "2INS_B"

p[100]

## [1] "5BQQ_B"

plot(a$bitscore,a$evalue)

plot(a$evalue,type = "l")

plot(a$bitscore,type = "l")

SCATER PLOT

y<-read.pdb("1SJU")

##   Note: Accessing on-line PDB file

summary(y)

## 
##  Call:  read.pdb(file = "1SJU")
## 
##    Total Models#: 1
##      Total Atoms#: 762,  XYZs#: 2286  Chains#: 1  (values: A)
## 
##      Protein Atoms#: 762  (residues/Calpha atoms#: 50)
##      Nucleic acid Atoms#: 0  (residues/phosphate atoms#: 0)
## 
##      Non-protein/nucleic Atoms#: 0  (residues: 0)
##      Non-protein/nucleic resid values: [ none ]
## 
## + attr: atom, xyz, seqres, helix, calpha,
##         call

t1<-torsion.pdb(y)
phi1<-t1$phi
psi1<-t1$psi
om1<-t1$omega


z<-read.pdb("2INS")

##   Note: Accessing on-line PDB file
##    PDB has ALT records, taking A only, rm.alt=TRUE

summary(z)

## 
##  Call:  read.pdb(file = "2INS")
## 
##    Total Models#: 1
##      Total Atoms#: 956,  XYZs#: 2868  Chains#: 4  (values: A B C D)
## 
##      Protein Atoms#: 770  (residues/Calpha atoms#: 100)
##      Nucleic acid Atoms#: 0  (residues/phosphate atoms#: 0)
## 
##      Non-protein/nucleic Atoms#: 186  (residues: 186)
##      Non-protein/nucleic resid values: [ HOH (184), ZN (2) ]
## 
## + attr: atom, xyz, seqres, helix, sheet,
##         calpha, remark, call

t2<-torsion.pdb(z)
phi2<-t2$phi
psi2<-t2$psi

x<-read.pdb("5BQQ")

##   Note: Accessing on-line PDB file
##    PDB has ALT records, taking A only, rm.alt=TRUE

summary(x)

## 
##  Call:  read.pdb(file = "5BQQ")
## 
##    Total Models#: 1
##      Total Atoms#: 2812,  XYZs#: 8436  Chains#: 12  (values: A B C D E F G H I J K L)
## 
##      Protein Atoms#: 2268  (residues/Calpha atoms#: 292)
##      Nucleic acid Atoms#: 0  (residues/phosphate atoms#: 0)
## 
##      Non-protein/nucleic Atoms#: 544  (residues: 400)
##      Non-protein/nucleic resid values: [ CL (2), HIX (5), HOH (375), IPH (10), NVA (6), ZN (2) ]
## 
## + attr: atom, xyz, seqres, helix, sheet,
##         calpha, remark, call

t3<-torsion.pdb(x)
phi3<-t3$phi
psi3<-t3$psi



plot(psi,phi,pch=19,col="blue",cex=0.5,text(-100,150,"1.b=1ZNI",col="blue"))
points(psi2,phi2,pch=19,col="green",cex=0.4,text(-25,150,"2.g=1SJU",col="green"))
points(psi1,phi1,pch=19,col="red",cex=0.4,text(45,15,"3.r=2INS",col = "red"))
points(psi3,phi3,pch=19,col="pink",cex=0.4,text(120,150,"4.bla=5BQQ",col = "black"))

ASSIGNMENT 4

DATA QUIZ

problem 1:

Given three points (A,B,C) write an R script to determine the cross product of the two vectors AB and BC

z<-read.csv("xyz.csv")
x1<-z$x
y1<-z$y
z1<-z$z
AB<-c(x1[2]-x1[1],y1[2]-y1[1],z1[2]-z1[1])
BC<-c(x1[3]-x1[2],y1[3]-y1[2],z1[3]-z1[2])

crs<-function(u,v){
  c1<- u[2]*v[3]-v[2]*u[3]
  c2<- u[3]*v[1]-v[3]*u[1]
  c3<- u[1]*v[2]-v[1]*u[2]
  
  return(c(c1,c2,c3))
}
K<-crs(AB,BC)
(K)

## [1]  2.46  4.90 -0.54

#problem 2
##  Suppose you have x and y variables with some NA elements .Write a program to find 
###(a)  Mean of x
####(b) Mean of y 
#####(c)    Plot of y against x

w<-read.csv("Book.csv")
f<-w$x
f1<-is.na(f)
f2<-which(f1)
b<-f[-f2]
b

## [1]  3  5  7  9 11

mean(b)

## [1] 7

g<-w$y
g1<-is.na(g)
g2<-which(g1)
a<-g[-g2]
a

## [1]  2  4  6  8 12

mean(a)

## [1] 6.4

plot(b,a)

# problem 4
##  Suppose you have a spectrum with more than one peaks. Write a program to find out all the peaks

library("readxl")
z<-read_xlsx("rti.xlsx")
RA<-z$Ramean
t1<-z$Tmean
i1<-z$Imean
w1<-z$wavelen
#spectral extraction
a<-401-320
a

## [1] 81

aa<-a+(501-401)
aa

## [1] 181

aaa<-aa+(601-501)
aaa

## [1] 281

p=0
p1=0
p2=0
p3=0
p4=0
p5=0
p6=0
p7=0
p8=0
for(i in 1:a){
  p[i]<-RA[i]
  p1[i]<-t1[i]
  p2[i]<-i1[i]
}
for(i in (a+1):aa){
  p3[i-a]<-RA[i]
  p4[i-a]<-t1[i]
  p5[i-a]<-i1[i]
}
for(i in (aa+1):aaa){
  p6[i-aa]<-RA[i]
  p7[i-aa]<-t1[i]
  p8[i-aa]<-i1[i]
  
}
plot(w1,RA,cex=0.5,col="blue",main="RA SFS",xlab = "Wavelength(320-600nm)",ylab = "Fluorescence intensities")

plot(w1,t1,cex=0.5,col="red",main="T SFS",xlab = "Wavelength(320-600nm)",ylab = "Fluorescence intensities")

plot(w1,i1,cex=0.5,col="cyan",main="I SFS",xlab = "Wavelength(320-600nm)",ylab = "Fluorescence intensities")

plot(p,cex=0.9,col="blue",main="pk in RA")

plot(p3,cex=0.9,col="blue",main="pk3 in RA")

plot(p6,cex=0.9,col="blue",main="pk6 in RA")

plot(p1,cex=0.9,col="blue",main="pk1 in T")

plot(p4,cex=0.9,col="blue",main="pk4 in T")

plot(p7,cex=0.9,col="blue",main="pk7 in T")

plot(p2,cex=0.9,col="blue",main="pk2 in I")

plot(p5,cex=0.9,col="blue",main="pk5 in I")

plot(p8,cex=0.9,col="blue",main="pk8 in I")

#problem 5
#Write a program to generate a random sequence of amino acids.
k5<-"CIHGLKFMTPFMTPSVWYGLKFMTPSVWY"
q<-unlist(strsplit(k5,""))
for(i in 1:30) {cat(sample(q))
  cat("/n")}

## S C W W G L K M P T F G F K Y T H L P V S Y I M V F M P T/nM P G F S K T I W H T K Y C V P Y W F T M S L F M V G P L/nY W V P F T S Y I V H F G S K C M P L W P L T M M T F G K/nF P W T V Y C P T M T P W G F L V H S M I K L K Y G M S F/nH S I G C L T M W P Y T V V F S K L G P K Y F M M P W T F/nM T P S K W G F Y L I H G T M T F L V V S C K Y M W P P F/nW L P P I K G F V K T S F C T G H Y T V M P S M W M Y L F/nT T F Y K V M L Y M P W I M H P P K F C L S S F G T G V W/nI P W L G T H C Y S F K W S L M F P Y T T K G V M M P F V/nK P K C M G T V P F H T W M L T V F Y M G Y L P S W S I F/nP G V H F S K Y S K I T P F W F W M Y V M P T L G C M T L/nY Y P V G M F W P K P K L I C L M W T S M V F T H T S F G/nG K F M M L V F K G P Y Y M T S P S T C L P W V T W H I F/nT L S K K W I F P W G T M M M H C F G Y P F V V P S T Y L/nP W M T M V F K P F Y P T G W C V H K L T F S M I S G L Y/nP M C G Y T F T K P S V V Y F F L L K P W S W I G M M T H/nK G T P M F T I S K V F P L W H Y M Y V T G F W P S C M L/nT S H C T T P L W Y Y M P F W S G P I M M K G K F F V L V/nG S F P V M L M W L P W T T Y I T K V Y K S G F H C F M P/nC P F W P P T K H L F M L W V M I M T G Y K F S Y G S T V/nW M P W K V Y P M T I F G K M P F S T Y S V L C F G T H L/nP L Y M P M K G T S G I W P V C F T F T S M K H F V W L Y/nP Y F W P V M H Y M K G C S M T V F T L T G P K W L F S I/nY T M I Y G M F M P C G S L H V S W F F L K T W V K T P P/nC I M H F F G Y K W Y S P M T P S P K M T L W V G L F T V/nT I L L K T G W P M C F T F G W S P K V V F H Y M P M S Y/nC V S P F L V K P H I Y W M L G T T G M T M P Y F S K W F/nI K T K F F S G Y L L T T M H S M W P W G F V P P M V C Y/nP T W S P I K P Y L W H C F T L G M T F G F V M K Y M V S/nP Y M P F S L K H V K V G C P I F Y L W S T F T T G M W M/n

ASSIGNMENT 5

CHI-SQUARE TEST

u<-read.csv("aspirincsv.csv")
a<-u$condition
#chi-square-analysis:
con<-u$condition
pla<-which(con=="placebo")
chg<-u$change
mn<-mean(chg)
h<-chg-mn
h[pla]

##  [1] -0.05333333  1.04666667  1.14666667  0.64666667  2.24666667
##  [6]  3.14666667  1.24666667  0.94666667  1.44666667  2.04666667
## [11]  1.64666667  1.24666667  0.74666667  1.04666667  2.04666667
## [16]  1.14666667

pm<-h[pla]
which(pm>mn)

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16

pp<-which(pm>mn)
length(pp)

## [1] 16

##  [1] -0.05333333  1.04666667 -2.95333333  1.14666667 -2.15333333
##  [6] -0.75333333  0.04666667 -3.35333333  0.64666667  2.24666667
## [11] -0.25333333 -2.45333333 -3.95333333  3.14666667 -1.65333333
## [16]  1.24666667  0.94666667  1.44666667 -2.35333333  2.04666667
## [21] -1.35333333  1.64666667  1.24666667  0.74666667  1.04666667
## [26]  2.04666667  0.94666667 -0.15333333 -1.35333333  1.14666667

mean(h)

## [1] -1.258433e-16

k<-u$change
k

##  [1] -3.2 -2.1 -6.1 -2.0 -5.3 -3.9 -3.1 -6.5 -2.5 -0.9 -3.4 -5.6 -7.1  0.0
## [15] -4.8 -1.9 -2.2 -1.7 -5.5 -1.1 -4.5 -1.5 -1.9 -2.4 -2.1 -1.1 -2.2 -3.3
## [29] -4.5 -2.0

mean(k)

## [1] -3.146667

##  [1] -0.05333333  1.04666667 -2.95333333  1.14666667 -2.15333333
##  [6] -0.75333333  0.04666667 -3.35333333  0.64666667  2.24666667
## [11] -0.25333333 -2.45333333 -3.95333333  3.14666667 -1.65333333
## [16]  1.24666667  0.94666667  1.44666667 -2.35333333  2.04666667
## [21] -1.35333333  1.64666667  1.24666667  0.74666667  1.04666667
## [26]  2.04666667  0.94666667 -0.15333333 -1.35333333  1.14666667

which(k>mean(k))

##  [1]  2  4  7  9 10 14 16 17 18 20 22 23 24 25 26 27 30

length(which(k>mean(k)))

## [1] 17

which(k>mean(k)& u$condition=="placebo")

##  [1]  2  4  9 10 14 16 17 18 20 22 23 24 25 26 30

l<-which(k>mean(k) & u$condition=="placebo")

length(l)

## [1] 15

u$change<-((u$change)**2)**0.5
u$change

##  [1] 3.2 2.1 6.1 2.0 5.3 3.9 3.1 6.5 2.5 0.9 3.4 5.6 7.1 0.0 4.8 1.9 2.2
## [18] 1.7 5.5 1.1 4.5 1.5 1.9 2.4 2.1 1.1 2.2 3.3 4.5 2.0

m<-mean(u$change)  
m

## [1] 3.146667

ik<-which(u$change>m &u$condition=="placebo")
length(ik)

## [1] 1

jk<-which(u$change<m & u$condition=="placebo")
length(jk)

## [1] 15

kk<-which(u$change>m & u$condition=="aspirin")
length(kk)

## [1] 12

pk<-which(u$change<m & u$condition=="aspirin")
length(pk)

## [1] 2

placebo<-c(length(ik),length(jk))
placebo

## [1]  1 15

placebo<-cbind(length(ik),length(jk))
placebo

##      [,1] [,2]
## [1,]    1   15

colnames(placebo)<-c("+","-")
placebo

##      +  -
## [1,] 1 15

aspirin<-cbind(length(kk),length(pk))
data.file<-rbind(placebo,aspirin)
data.file

##       +  -
## [1,]  1 15
## [2,] 12  2

colnames(data.file)<-c("+","-")
data.file

##       +  -
## [1,]  1 15
## [2,] 12  2

rownames(data.file)<-c("placebo","aspirin")
colnames(data.file)<-c("+","-")
data.file

##          +  -
## placebo  1 15
## aspirin 12  2

#probability of value

pa.pos<-((data.file[2,2]+data.file[2,1])/30)*((data.file[1,1]+data.file[2,1])/30)
po.pos<-((data.file[1,1]+data.file[1,2])/30)*((data.file[1,1]+data.file[2,1])/30)
po.pos

## [1] 0.2311111

pi.pos<-((data.file[2,1]+data.file[2,2])/30)*((data.file[1,2]+data.file[2,2])/30)
pi.pos

## [1] 0.2644444

pp.pos<-((data.file[1,2]+data.file[1,1])/30)*((data.file[1,2]+data.file[2,2])/30)
pp.pos

## [1] 0.3022222

pap<-pa.pos*30
pop<-po.pos*30
pip<-pi.pos*30
ppp<-pp.pos*30
e11<-(data.file[1,1]-pop)^2/pop
e12<-(data.file[1,2]-ppp)^2/ppp
e21<-(data.file[2,1]-pip)^2/pip
e22<-(data.file[2,2]-pap)^2/pap
chivalue<-sum(e11,e12,e21,e22)
chivalue

## [1] 13.77101

ASSIGNMENT 6

CHI-SQUARE VALUE AND PROTEIN COMPARISM

library(bio3d)
a<-read.pdb("1wba")

##   Note: Accessing on-line PDB file

t.a<-torsion.pdb(a)
phi.a<-t.a$phi
b<-is.na(phi.a)
b1<-which(b)
phi.ak<-phi.a[-b1]
psi.a<-t.a$psi
e<-is.na(psi.a)
e1<-which(e)
psi.ak<-psi.a[-e1]
par(mfrow=c(2,2))
plot(phi.ak,psi.ak,xlim = c(-180,180),ylim = c(-180,180),main = "1wba")
abline(h=c(0,0),v=c(0,0))
phi1<- -50.0
phi2<- -100.0
ik<-which(phi.ak<phi1 & phi.ak>phi2)
phial.ak<-length(ik)
phi3<- -40.0
phi4<- -150.0
ij<-which(phi.ak<phi3 & phi.ak>phi4)
phib.ak<-length(ij)
psi5<- -80.0
psi6<- 10.0
il<-which(psi.ak<psi6 & psi.ak>psi5)
psial.ak<-length(il)
psi7<- 110.0
psi8<- 160.0
im<-which(psi.ak>psi7 & psi.ak<psi8)
psib.ak<-length(im)
r<-cbind(phial.ak,psial.ak,phib.ak,psib.ak)
rownames(r)<-c("1wba")
(r)

##      phial.ak psial.ak phib.ak psib.ak
## 1wba       94       28     154     104

u<-read.pdb("3ua0")

##   Note: Accessing on-line PDB file

t.ua<-torsion.pdb(u)
phi.ua<-t.ua$phi
x<-is.na(phi.ua)
x1<-which(x)
phi.ua<-phi.ua[-x1]
psi.ua<-t.ua$psi
s<-is.na(psi.ua)
s1<-which(s)
psi.ua<-psi.ua[-s1]
plot(phi.ua,psi.ua,xlim = c(-180,180),ylim = c(-180,180),main = "3ua0")
abline(h=c(0,0),v=c(0,0))
phi9<- -50.0
phi10<- -110.0
ik1<-which(phi.ua<phi9 & phi.ua>phi10)
phial.ua<-length(ik1)
phi11<- -50.0
phi12<- -150.0
ij1<-which(phi.ak<phi11 & phi.ak>phi12)
phib.ua<-length(ij1)
psi13<- -50.0
psi14<- 10.0
il1<-which(psi.ua<psi14 & psi.ua>psi13)
psial.ua<-length(il1)
psi15<- 100.0
psi16<- 160.0
im1<-which(psi.ua>psi15 & psi.ua<psi16)
psib.ua<-length(im1)
r1<-cbind(phial.ua,psial.ua,phib.ua,psib.ua)
rownames(r1)<-c("3ua0")
(r1)

##      phial.ua psial.ua phib.ua psib.ua
## 3ua0       60       15     154      90

main.pro<-rbind(r,r1)
main.pro

##      phial.ak psial.ak phib.ak psib.ak
## 1wba       94       28     154     104
## 3ua0       60       15     154      90

colnames(main.pro)<-c("Phi-alpha","Psi-alpha","Phi-beta","Psi-beta")
main.pro

##      Phi-alpha Psi-alpha Phi-beta Psi-beta
## 1wba        94        28      154      104
## 3ua0        60        15      154       90

chisq.test(main.pro)

## 
##  Pearson's Chi-squared test
## 
## data:  main.pro
## X-squared = 7.1784, df = 3, p-value = 0.06642

ASSIGNMENT 7

T-TEST PROBLEM

x<-c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418) 
t.test<-function(dataset){
  s<-sd(dataset)
  d.mean<-mean(dataset)
  n1<-length(dataset)
  n<-n1**0.5
  u<-0.3
  t<-(d.mean)/(s/n)
  return(t)
}
t.test<-t.test(x)
t.test

## [1] 6.43351

mean(x)

## [1] 0.4564444

#INTERPRETATION
##from the above ,it was found that the mean of the above data ,level of Salmonella (MPN/g) of randomly sampled batches of ice cream is 0.4564444.We have taken the mean value for the Null hypothesis as 0.3,ie the Salmonella level is equal to 0.3. As the mean value found here is greater than the null hyopthesis, it is not possible to accept the Null hypothesis,we have to accept the alternate hypothesis,ie the level of Salmonella is more than 0.3 MPN/g.

#problem2

ballet<-c(89.2,78.2,89.3,88.3,87.3,90.1,95.2,94.3,78.3,89.3)
mean1<-mean(ballet)
sd.ball<-sd(ballet)
sd.ball

## [1] 5.690587

s1<-sd.ball^2
s1

## [1] 32.38278

l1<-length(ballet)
lb<-l1-1
lb

## [1] 9

FOOTBALL<-c(79.3,78.3,85.3,79.3,88.9,91.2,87.2,89.2,93.3,79.9)
mean2<-mean(FOOTBALL)
sd.foot<-sd(FOOTBALL)
sd.foot

## [1] 5.583995

s2<-sd.foot^2
s2

## [1] 31.181

l2<-length(FOOTBALL)
lf<-l2-1
lf

## [1] 9

L<-l1+l2-1
L

## [1] 19

sp<-(s1*lb + s2*lf)/L
sp

## [1] 30.10916

Sq<-(sp*(1/l1 + 1/l2))^0.5
Sq

## [1] 2.453942

MM<-mean1 - mean2
MM

## [1] 2.76

ttest2<-MM/Sq
ttest2

## [1] 1.124721

#Interpritation:
## the t table does not give you the precise probability of every t value,you can use it for hypothesis testing.according to the given reference index,our value is 1.124721 which is almost near to 1.372 haveing p value=0.10
### so at 95% confidence level,haveing value greater than 0.05 it indicate null hypothesis.
####This means that we can not distinguish between the fitness level between the ballet dancers and the football player.

#problem3

data<-c(220, 200, 240, 210, 225 ,210, 180, 170 ,210 ,220 ,190 ,180 ,195 ,190 ,200, 190, 210, 220,240,210) 
M<-matrix(data,nrow=10,byrow=TRUE)
colnames(M)<-c("Before","After")
(M)

##       Before After
##  [1,]    220   200
##  [2,]    240   210
##  [3,]    225   210
##  [4,]    180   170
##  [5,]    210   220
##  [6,]    190   180
##  [7,]    195   190
##  [8,]    200   190
##  [9,]    210   220
## [10,]    240   210

t.test(M[,1],M[,2],alternative="two.sided",mu=0,paired=TRUE,var.equal=FALSE,conf.level=0.95)

## 
##  Paired t-test
## 
## data:  M[, 1] and M[, 2]
## t = 2.5017, df = 9, p-value = 0.03377
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   1.053366 20.946634
## sample estimates:
## mean of the differences 
##                      11

ASSIGNMENT 8

PROJECT 1

library("bio3d")
library("Peptides")

## Warning: package 'Peptides' was built under R version 3.4.4

#Membrane protein
mem.pgu<-read.pdb("2F1T")

##   Note: Accessing on-line PDB file

seq.pgu<-pdbseq(mem.pgu)
#Thermostability value of the membrane protein
seq.pgu.i<-aIndex(seq.pgu)
ik<-which(seq.pgu.i==0.0)
seq.pgu.in<-seq.pgu.i[-ik]
therm.pgu<-sum(seq.pgu.in)/length(seq.pgu.in)
hk<-therm.pgu
summary(hk)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   261.7   261.7   261.7   261.7   261.7   261.7

#Protein Interaction
seq.pgu.pp<-boman(seq.pgu)
bin.pot<-sum(seq.pgu.pp)/length(seq.pgu)
bk<-bin.pot
summary(bk)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.275   1.275   1.275   1.275   1.275   1.275

#binpot<2.68 which implies the protein has lower affinity to bind with another protein


#Cytosolic protein
cyto.zkv<-read.pdb("3zkv")

##   Note: Accessing on-line PDB file

seq.zkv<-pdbseq(cyto.zkv)
#Thermostability value of the cytosolic protein
seq.zkv.i<-aIndex(seq.zkv)
jk<-which(seq.zkv.i==0.0)
seq.zkv.in<-seq.zkv.i[-jk]
therm.zkv<-sum(seq.zkv.in)/length(seq.zkv.in)
gk<-therm.zkv
summary(gk)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   306.4   306.4   306.4   306.4   306.4   306.4

#Protein Interaction
seq.zkv.pp<-boman(seq.zkv)
bin.pot1<-sum(seq.zkv.pp)/length(seq.zkv)
fk<-bin.pot1
summary(fk)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.221   1.221   1.221   1.221   1.221   1.221

#binpot<2.68 which implies the protein has lower affinity to bind with another protein
#Principle components
#A function for protein pca
protanapca<-function(seq){
  library("bio3d")
  library("Peptides")
  ch<-charge(seq, pH = 7, pKscale = "EMBOSS")
  pp<-boman(seq) 
  ampI<-hmoment(seq, angle = 100, window = 11)
  hy<-hydrophobicity(seq, scale = "KyteDoolittle")
  pI<-pI(seq,pKscale = "EMBOSS")
  mw<-mw(seq, monoisotopic = FALSE)
  protana<-cbind(ch,pp,ampI,hy,pI,mw)
  colnames(protana)<-c("CHARGE","P-P","AMPIPHILLICITY","HYDRPHOBICITY","ISOELECTRIC","MOL WEIGHT")
  p<-cbind(seq,protana)
  colnames(p)<-c("AA","CHARGE","P-P","AMPIPHILLICITY","HYDRPHOBICITY","ISOELECTRIC","MOL WEIGHT")
  pr.p<-prcomp(protana)
  return(pr.p)}
#A function for protein analysis of different properties
protana<-function(seq){
  library("bio3d")
  library("Peptides")
  ch<-charge(seq, pH = 7, pKscale = "EMBOSS")
  pp<-boman(seq) 
  ampI<-hmoment(seq, angle = 100, window = 11)
  hy<-hydrophobicity(seq, scale = "KyteDoolittle")
  pI<-pI(seq,pKscale = "EMBOSS")
  mw<-mw(seq, monoisotopic = FALSE)
  protana<-cbind(ch,pp,ampI,hy,pI,mw)
  colnames(protana)<-c("CHARGE","P-P","AMPIPHILLICITY","HYDRPHOBICITY","ISOELECTRIC","MOL WEIGHT")
  p<-cbind(seq,protana)
  colnames(p)<-c("AA","CHARGE","P-P","AMPIPHILLICITY","HYDRPHOBICITY","ISOELECTRIC","MOL WEIGHT")
  return(p)}

#Membrane protein
mem.pgu<-read.pdb("2F1T")

##   Note: Accessing on-line PDB file

## Warning in get.pdb(file, path = tempdir(), verbose = FALSE): /var/folders/
## t7/t6964h0d35938mcmcfhp1ngw0000gn/T//RtmpJ4MSmb/2F1T.pdb exists. Skipping
## download

seq.pgu<-pdbseq(mem.pgu)
pgu.a<-protana(seq.pgu)
pca.pgu<-protanapca(seq.pgu)
par(mfrow=c(1,2))
biplot(pca.pgu,main="2F1T")

#Cytosolic protein
cyto.zkv<-read.pdb("3zkv")

##   Note: Accessing on-line PDB file

## Warning in get.pdb(file, path = tempdir(), verbose = FALSE): /var/folders/
## t7/t6964h0d35938mcmcfhp1ngw0000gn/T//RtmpJ4MSmb/3zkv.pdb exists. Skipping
## download

seq.zkv<-pdbseq(cyto.zkv)
zkv.a<-protana(seq.zkv)
pca.zkv<-protanapca(seq.zkv)
biplot(pca.zkv,main="3ZKV")

#CONCLUSION
#The membrane and cytosolic protein have equal thermal stability
#The protein protein interaction parameter concludes that the cytosolic protein has higher binding potential for a protein of equal thermal stability
#The pca shows out of the six parameters chosen for analysis of this protein molecular weight is an outlier and has minimum relation to protein functionality whereas protein-protein interaction and hydrophobicity are inversely related 
#The other four factors such as charge,pI,Amphophilicity and polarity are related factors for a protein irrespective of its origin
#Hence the a conclusive outcome for the above pca could be that for proteins of different origins is that hydrophobicity and protein protein interaction are inversely related irrespective of its origin
#The stastical tool used are pca,biplot and self made function protanapca(which gives the pca values of any protein sequence) and protana(which gives a matrix of all amino acid vs six parameters for any given protein)

project2

A webserver based bio3d application to explore molecular dynamics

http://bio3d.ucsd.edu/nma-app/

CRYSTAL STRUCTURE OF NICKEL RECONSTITUTED HEMOGLOBIN-A CASE FOR PERMANENT, T-STATE HEMOGLOBIN

Classification: OXYGEN STORAGE/TRANSPORT Organism(s): Homo sapiens

PROTEIN IMAGE FOR PDB id 1FN3

OVERLAP ANALYSIS FOR PDB id 1FN3

NMA FLUCTUATION OF 1FN3 ##project 3 ###REAL DATA ANALYSIS

PROJECT 3

ali<-read.csv("Alipurduarr.csv")
ban<-read.csv("Bankura.csv")
bir<-read.csv("Birbhummm.csv")
bur<-read.csv("Burdwann.csv")
mal<-read.csv("Malda.csv")
coo<-read.csv("Coochbeharr.csv")
d.dina<-read.csv("D Dinajpur.csv")
e.med<-read.csv("E Medinipurr.csv")
how<-read.csv("Howrahh.csv")
jal<-read.csv("Jalpaiguri.csv")
kol<-read.csv("Kolkataa.csv")
mur<-read.csv("Murshidabadd.csv")
npgs<-read.csv("N 24 Pgs.csv")


alipur<-cbind(ali$AGE,ali$HB,ali$MCV,ali$MCH,ali$HBA0,ali$HBA2,ali$HBF)
distn<-rep("Alipurduarr",nrow(ali))
sex1<-as.character(ali$SEX)
alipura1<-cbind(alipur,distn,sex1)
summary(alipura1)

##        V1            V2            V3            V4            V5     
##  16     :143   11.3   : 43   72.3   : 13   22.9   : 26   218    : 29  
##  13     :134   11.6   : 40   70.5   : 12   21.7   : 22   228    : 27  
##  14     :120   11     : 39   71.3   : 12   21.8   : 20   225    : 26  
##  17     :115   10.7   : 35   71.8   : 11   20.2   : 19   226    : 26  
##  15     :114   10.8   : 35   72.9   : 11   20.9   : 19   217    : 25  
##  12     :111   12     : 35   (Other):992   22.7   : 19   221    : 24  
##  (Other):315   (Other):825   NA's   :  1   (Other):927   (Other):895  
##        V6            V7              distn      sex1   
##  0      :143   0      :272   Alipurduarr:1052   F:598  
##  2.8    :102   0.2    :192                      M:454  
##  2.6    : 98   0.3    :149                             
##  2.7    : 96   0.4    : 78                             
##  3      : 91   0.1    : 50                             
##  (Other):519   (Other):310                             
##  NA's   :  3   NA's   :  1

bankura<-cbind(ban$AGE,ban$HB,ban$HBF,ban$HBA2,ban$HBA0,ban$MCV,ban$MCH)
distn1<-rep("Bankura",nrow(ban))
sex<-as.character(ban$SEX)
bankura1<-cbind(bankura,distn1,sex)
summary(bankura1)

##        V1            V2            V3            V4            V5     
##  13     :138   12.1   : 53   0.2    :349   2.8    :174   88.3   : 61  
##  12     :132   12     : 52   0.3    :266   2.9    :158   87.8   : 52  
##  10     :100   11.2   : 51   0.4    :120   2.7    :149   88     : 52  
##  8      : 91   11.9   : 50   0      :106   2.6    :128   87.9   : 50  
##  15     : 86   11.3   : 48   0.1    : 90   3      :125   88.1   : 50  
##  (Other):602   11.7   : 47   0.5    : 78   3.1    : 88   87.5   : 48  
##  NA's   :  1   (Other):849   (Other):141   (Other):328   (Other):837  
##        V6             V7          distn1     sex    
##  72.7   :  17   21.6   : 31   Bankura:1150   F:458  
##  71.8   :  14   22.4   : 30                  M:692  
##  70.7   :  13   22.3   : 27                         
##  72.5   :  13   21.7   : 26                         
##  70.6   :  11   22     : 26                         
##  71.9   :  11   22.1   : 26                         
##  (Other):1071   (Other):984

birbhum<-cbind(bir$AGE,bir$HB,bir$HBF,bir$HBA0,bir$HBA2,bir$MCH,bir$MCV)
distn2<-rep("Birbhummm",nrow(bir))
sex3<-as.character(bir$SEX)
birbhuma1<-cbind(birbhum,distn2,sex3)
summary(birbhuma1)

##        V1            V2            V3            V4            V5     
##  4      :207   11.4   : 60   0.2    :370   87.9   : 61   2.9    :140  
##  3      :203   11.5   : 52   0.3    :237   87.5   : 54   3      :128  
##  5      :137   11.6   : 38   0.1    :139   87.6   : 54   3.2    :121  
##  6      : 93   11.8   : 36   0.4    : 77   87.8   : 52   3.1    :113  
##  2      : 82   11.9   : 36   0.5    : 54   87     : 48   2.8    :106  
##  8      : 75   10.6   : 35   0.7    : 30   87.7   : 48   2.7    : 82  
##  (Other):150   (Other):690   (Other): 40   (Other):630   (Other):257  
##        V6            V7            distn2    sex3   
##  20.5   : 29   69.7   : 22   Birbhummm:947   F:326  
##  20.8   : 28   70.7   : 17                   M:621  
##  21     : 28   77.6   : 16                          
##  21.2   : 28   68.5   : 15                          
##  19.9   : 27   69.2   : 15                          
##  21.8   : 27   72.9   : 15                          
##  (Other):780   (Other):847

burdwan<-cbind(bur$AGE,bur$HB,bur$HBF,bur$HBA0,bur$HBA2,bur$MCV,bur$MCH)
distn3<-rep("Burdann",nrow(bur))
sex4<-as.character(bur$SEX)
burdwana1<-cbind(burdwan,distn3,sex4)
summary(burdwana1)

##        V1            V2             V3            V4             V5     
##  14     :352   11     : 100   0.2    :956   88     : 136   2.6    :358  
##  13     :351   11.4   :  89   0.3    :429   87.8   : 125   2.7    :352  
##  15     :327   11.7   :  77   0.1    :424   87.9   : 113   2.8    :311  
##  12     :324   11.2   :  76   0.4    :215   87.7   : 112   2.5    :288  
##  11     :258   11.1   :  73   0.5    : 92   87.6   : 105   2.9    :209  
##  10     :188   11.5   :  72   (Other):139   (Other):1665   (Other):733  
##  (Other):458   (Other):1771   NA's   :  3   NA's   :   2   NA's   :  7  
##        V6             V7           distn3     sex4    
##  72.3   :  28   22     :  60   Burdann:2258   F: 522  
##  72.5   :  21   22.1   :  48                  M:1736  
##  71     :  20   21.5   :  45                          
##  71.2   :  19   21.4   :  44                          
##  73.2   :  19   22.4   :  44                          
##  76     :  19   20.7   :  40                          
##  (Other):2132   (Other):1977

cooch<-cbind(coo$AGE,coo$HB,coo$HBF,coo$HBA0,coo$HBA2,coo$MCV,coo$MCH)
distn4<-rep("Coochbeharr",nrow(coo))
sex5<-as.character(coo$SEX)
coocha1<-cbind(cooch,distn4,sex5)
summary(coocha1)

##   V1          V2           V3           V4           V5          V6    
##  11:3   10.8   : 2   0      :13   84.5   : 3   3.4    :5   72.4   : 3  
##  12:3   11.2   : 2   0.2    : 3   84.6   : 3   3.2    :4   69.1   : 2  
##  13:9   11.5   : 2   0.4    : 3   83.4   : 2   3.3    :3   73.1   : 2  
##  14:7   11.6   : 2   0.3    : 2   84.2   : 2   3.5    :3   61.7   : 1  
##  15:3   12.6   : 2   0.1    : 1   25.4   : 1   3.6    :3   64.7   : 1  
##  16:2   12.9   : 2   0.6    : 1   5      : 1   3      :2   66.8   : 1  
##         (Other):15   (Other): 4   (Other):15   (Other):7   (Other):17  
##        V7             distn4   sex5  
##  20.2   : 3   Coochbeharr:27   M:27  
##  22.3   : 3                          
##  20.9   : 2                          
##  23.4   : 2                          
##  19     : 1                          
##  19.7   : 1                          
##  (Other):15

dinaj<-cbind(d.dina$AGE,d.dina$HBF,d.dina$HB,d.dina$HBA0,d.dina$HBA2,d.dina$MCV,d.dina$MCH)
distn5<-rep("D Dinajpur",nrow(d.dina))
sex6<-as.character(d.dina$SEX)
dinaja1<-cbind(dinaj,distn5,sex6)
summary(dinaja1)

##        V1            V2            V3             V4             V5     
##  13     :408   0.2    :772   10.2   : 183   88.1   : 117   2.9    :333  
##  14     :368   0.3    :558   10.8   : 171   87.8   : 114   2.8    :330  
##  12     :357   0.4    :223   11.2   : 154   88     : 109   2.7    :279  
##  15     :282   0.1    :194   11.6   : 152   88.3   : 103   3      :275  
##  11     :206   0.5    :119   11.9   : 129   87.7   : 102   2.6    :177  
##  16     :173   0.6    : 72   10.6   : 123   88.2   : 101   (Other):695  
##  (Other):296   (Other):152   (Other):1178   (Other):1444   NA's   :  1  
##        V6             V7              distn5     sex6    
##  80.2   :  85   21.9   : 137   D Dinajpur:2090   F: 694  
##  81.2   :  73   20.8   : 125                     M:1396  
##  78.2   :  66   21.3   :  94                             
##  78.6   :  63   22.8   :  94                             
##  76.2   :  53   21.5   :  84                             
##  69.8   :  45   22.4   :  84                             
##  (Other):1705   (Other):1472

e.medini<-cbind(e.med$AGE,e.med$HB,e.med$HBF,e.med$HBA0,e.med$HBA2,e.med$MCV,e.med$MCH)
distn6<-rep("E medinipurr",nrow(e.med))
sex7<-as.character(e.med$SEX)
emedini1<-cbind(e.medini,distn6,sex7)
summary(emedini1)

##        V1            V2            V3            V4            V5     
##  12     :153   12.8   : 32   0.5    :204   82.7   : 37   2.6    : 89  
##  13     :135   12.7   : 29   0.6    :134   82.2   : 35   2.7    : 88  
##  14     :116   12.6   : 28   0.4    :117   82.4   : 30   2.5    : 79  
##  15     : 82   11.4   : 24   0.2    : 82   82.3   : 29   2.8    : 74  
##  11     : 80   12.9   : 24   0.3    : 73   82.5   : 29   2.4    : 62  
##  16     : 66   11.9   : 23   0.7    : 47   81.9   : 28   (Other):357  
##  (Other):120   (Other):592   (Other): 95   (Other):564   NA's   :  3  
##        V6            V7               distn6    sex7   
##  65     : 13   22.1   : 24   E medinipurr:752   F:123  
##  65.5   : 12   22.6   : 22                      M:629  
##  70.6   : 11   21.8   : 19                             
##  73.8   : 11   22.8   : 19                             
##  63.7   : 10   22     : 18                             
##  63.2   :  9   22.2   : 18                             
##  (Other):686   (Other):632

howrah<-cbind(how$AGE,how$HB,how$MCV,how$MCH,how$HBA0,how$HBA2,how$HBF)
distn8<-rep("Howarhh",nrow(how))
sex8<-as.character(how$SEX)
howrah1<-cbind(howrah,distn8,sex8)
summary(howrah1)

##   V1           V2           V3           V4           V5           V6   
##  11:10   12.4   : 4   71.2   : 2   26.4   : 3   87.2   : 4   2.6    :8  
##  12: 9   11.7   : 3   73.2   : 2   21.2   : 2   88     : 4   2.7    :7  
##  13: 4   13.4   : 3   88.1   : 2   21.3   : 2   87     : 3   2.9    :7  
##  14:12   10.6   : 2   62.9   : 1   22.1   : 2   87.4   : 3   3.1    :7  
##  15: 6   11     : 2   67.9   : 1   22.5   : 2   87.6   : 3   2.8    :4  
##  16: 2   11.2   : 2   68.9   : 1   22.6   : 2   88.2   : 3   2.5    :3  
##  17: 2   (Other):29   (Other):36   (Other):32   (Other):25   (Other):9  
##    V7         distn8   sex8  
##  0.1: 6   Howarhh:45   F: 8  
##  0.2:16                M:37  
##  0.3:15                      
##  0.4: 2                      
##  0.5: 2                      
##  0.6: 3                      
##  0.8: 1

jalpai<-cbind(jal$AGE,jal$HB,jal$MCV,jal$MCH,jal$HBA0,jal$HBA2,jal$HBF)
distn9<-rep("Jalpaiguri",nrow(jal))
sex9<-as.character(jal$SEX)
jalpai1<-cbind(jalpai,distn9,sex9)
summary(jalpai1)

##        V1            V2            V3            V4            V5     
##  8      : 99   11.4   : 35   70.5   : 10   22.5   : 18   85     : 39  
##  10     : 91   11.3   : 29   71.3   : 10   23     : 18   82     : 35  
##  6      : 61   12     : 27   71.4   : 10   22.7   : 17   83     : 32  
##  9      : 56   12.1   : 27   71.7   : 10   22.6   : 16   89     : 26  
##  5      : 54   11.8   : 26   71.1   :  8   23.1   : 16   84     : 24  
##  13     : 47   12.3   : 25   68.7   :  7   21.6   : 15   88     : 24  
##  (Other):257   (Other):496   (Other):610   (Other):565   (Other):485  
##        V6            V7             distn9    sex9   
##  2.9    : 98   0.2    :188   Jalpaiguri:665   F:264  
##  2.8    : 93   0.3    : 84                    M:401  
##  2.7    : 78   0      : 73                           
##  3      : 66   0.4    : 69                           
##  3.1    : 60   0.9    : 36                           
##  (Other):265   0.1    : 31                           
##  NA's   :  5   (Other):184

kolkata<-cbind(kol$AGE,kol$HB,kol$MCV,kol$MCH,kol$HBA0,kol$HBA2,kol$HBF)
distn10<-rep("kolkataa",nrow(kol))
sex10<-as.character(kol$SEX)
kolkata1<-cbind(kolkata,distn10,sex10)
summary(kolkata1)

##        V1           V2           V3           V4           V5    
##  20     :24   13.3   : 4   85.3   : 4   22.5   : 3   88.8   :10  
##  21     :18   13.5   : 4   71.1   : 2   26.4   : 3   88.2   : 8  
##  22     :17   14.6   : 4   71.3   : 2   26.7   : 3   88.9   : 8  
##  19     :12   14.7   : 4   71.5   : 2   27.1   : 3   88.5   : 7  
##  18     :10   10.9   : 3   76     : 2   27.6   : 3   88.6   : 7  
##  23     : 7   11.8   : 3   79.1   : 2   28.6   : 3   89     : 5  
##  (Other): 8   (Other):74   (Other):82   (Other):78   (Other):51  
##        V6           V7         distn10   sex10 
##  2.6    :17   0.1    :42   kolkataa:96   F:21  
##  2.4    :13   0.2    :30                 M:75  
##  2.5    :12   0.3    :11                       
##  2.7    :12   0.4    : 6                       
##  2.3    :10   0.5    : 2                       
##  2.9    : 7   0.6    : 1                       
##  (Other):25   (Other): 4

malda<-cbind(mal$AGE,mal$HB,mal$MCV,mal$MCH,mal$HBA0,mal$HBA2,mal$HBF)
distn11<-rep("Malada",nrow(mal))
sex11<-as.character(mal$SEX)
malda1<-cbind(malda,distn11,sex11)
summary(malda1)

##        V1            V2            V3            V4            V5     
##  14     :156   11.7   : 71   81.6   : 50   25.1   : 63   86.9   : 36  
##  12     :154   11     : 58   81.4   : 47   24.9   : 57   87.2   : 32  
##  13     :146   11.9   : 58   80     : 44   25.6   : 54   87.4   : 25  
##  15     :130   10.4   : 51   81.7   : 41   25.3   : 51   87.5   : 25  
##  11     : 96   12     : 50   81     : 35   24.6   : 45   87.1   : 24  
##  (Other):450   10.7   : 49   81.9   : 33   24.7   : 42   81.7   : 23  
##  NA's   :  1   (Other):796   (Other):883   (Other):821   (Other):968  
##        V6            V7        distn11     sex11  
##  2.7    :145   0.3    :336   Malada:1133   F:460  
##  2.6    :144   0.4    :231                 M:673  
##  2.5    :143   0.2    :214                        
##  2.9    :102   0.5    :105                        
##  2.8    :101   0.6    : 68                        
##  2.4    : 83   (Other):178                        
##  (Other):415   NA's   :  1

murshi<-cbind(mur$AGE,mur$HB,mur$MCV,mur$MCH,mur$HBA0,mur$HBA2,mur$HBF)
distn12<-rep("Murshidabadd",nrow(mur))
sex12<-as.character(mur$SEX)
murshi1<-cbind(murshi,distn12,sex12)
summary(murshi1)

##        V1            V2            V3            V4            V5     
##  13     : 49   11     : 15   80     : 10   24     : 11   86.8   : 15  
##  14     : 44   10.2   : 14   82.9   :  7   25.3   : 11   87     : 13  
##  15     : 44   10.7   : 12   81.6   :  6   25.9   :  8   87.1   : 13  
##  12     : 40   13.9   : 12   81.7   :  6   21.1   :  7   87.3   : 13  
##  16     : 39   10     : 11   82.5   :  6   25.1   :  7   87.5   : 13  
##  11     : 36   11.1   : 10   90.4   :  5   25.4   :  7   87.8   : 12  
##  (Other):108   (Other):286   (Other):320   (Other):309   (Other):281  
##        V6            V7              distn12    sex12  
##  2.7    : 49   0.3    :100   Murshidabadd:360   F: 81  
##  3      : 43   0.2    : 97                      M:279  
##  2.9    : 40   0.4    : 54                             
##  2.6    : 38   0.5    : 27                             
##  2.8    : 35   0.9    : 14                             
##  3.1    : 32   (Other): 67                             
##  (Other):123   NA's   :  1

Npgs<-cbind(npgs$AGE,npgs$HB,npgs$MCV,npgs$MCH,npgs$HBA0,npgs$HBA2,npgs$HBF)
distn13<-rep("N 24 Pgs",nrow(npgs))
sex13<-as.character(npgs$SEX)
npgs1<-cbind(Npgs,distn13,sex13)
summary(npgs1)

##        V1           V2            V3            V4            V5     
##  13     :38   11.9   : 14   67.2   :  5   21.3   : 10   87.7   : 17  
##  11     :29   11.1   : 13   69.1   :  4   21.2   :  8   87.5   : 15  
##  12     :28   11.6   : 13   69.2   :  4   21.5   :  7   87.3   : 13  
##  10     :23   11.7   : 12   69.5   :  4   21.7   :  7   87.8   : 13  
##  8      :21   11     : 10   70.5   :  4   22.2   :  7   87.6   : 12  
##  9      :21   11.8   :  9   76.7   :  4   21.4   :  6   87.4   : 11  
##  (Other):62   (Other):151   (Other):197   (Other):177   (Other):141  
##        V6           V7         distn13    sex13  
##  2.7    :51   0.2    :82   N 24 Pgs:222   F: 28  
##  2.8    :39   0.3    :41                  M:194  
##  2.6    :28   0.1    :38                         
##  2.5    :21   0.4    :19                         
##  2.9    :20   0.5    :17                         
##  3      :16   0.6    : 9                         
##  (Other):47   (Other):16

master<-rbind(alipura1,bankura1,birbhuma1,burdwana1,coocha1,dinaja1,emedini1,howrah1,jalpai1,kolkata1,malda1,murshi1,npgs1)
master<-data.frame(master)
master.f<-na.omit(master)
master1<-rbind(alipur,burdwan,bankura,birbhum,cooch,dinaj,e.medini,howrah,jalpai,kolkata,malda,murshi,Npgs)
colnames(master1)<-c("age","hb","hbf","hba2","hba0","mcv","mch")
master1.f<-na.omit(master1)
pca<-prcomp(master1.f,scale= TRUE)
biplot(pca)

g_class<- master.f$sex
library("ggbiplot")
g<-ggbiplot(pca,obs.scale = 0.3,var.scale = 0.3,groups = g_class,ellipse = TRUE,circle = TRUE,pc.biplot = TRUE)
g<-g+xlim(-2.5,2.5)
g<-g+ylim(-2.5,2.5)
g<-g+theme(legend.direction = 'horizontal',legend.position = 'top')
print(g)

## Warning: Removed 139 rows containing missing values (geom_point).

conclusion

The biplot shows that mcv, mch and hb are very much related to each other.Hence they are contribiting factors.Age has very less co-relation to all other components and does not directly affects any component to a large extent.Hbf is likely to be a independent component of the analysis. From the ggbiplot which seperates components on the distinction of male and female patients the components seem to be likely be the same as the biplot without much variance except the fact that the female popuplation have a high correlation of mcv to mch than male.Also the age becomes more important in case of females with aspect of all the components correlation.The most distinct factor here are the outliers that are found which may be either or consisderd as very special scenarios having very rare conditions for both the male and female conditions. further this project could be extended to focus on finding this outliers or rare clinically important conditions because they are purely independent of each other.For the time being this project concludes with the fact that mcv,mch and hb are dependent variables and hbf is the most independent variable component.Also age and sex(M/F) are moderately significant for the other components except females have atendency to have more mcv and mch with increase in age as suggested from the ggbiplot.According to the grammer of graphics plot (ggbiplot) HbA0 and HbA2 are oppositly releated. When the normal Hb increased then HbA2 decresed and when HbA2 value increased probablity of thalasemia high.

ASSIGNMENT 2018

Asmita Paul