First, let’s load the datas, right from R (no need to load csv) so it’s easier to share this report. As the base data is not that big, it’s pretty convenient.
First load the groups (with embedded Rhesus) :
id <- c("O-", "O+", "B-", "B+", "A-", "A+", "AB-", "AB+")
Next, create the function that extracts the group, that is, makes it easy to extract the Rhesus (Rh+ or Rh-) from a full group :
library(stringr)
get_groupe <- function(agroupe){
out <- ""
out <- str_replace_all(agroupe, "[^[:alnum:]]", "")
return(out)
}
And give a try to the function :
test <- get_groupe("AB-")
test
## [1] "AB"
Next, buld the vector that stores the extracted groups, thanks to the previsouly created function :
groupe <- lapply(id, get_groupe)
head(groupe)
## [[1]]
## [1] "O"
##
## [[2]]
## [1] "O"
##
## [[3]]
## [1] "B"
##
## [[4]]
## [1] "B"
##
## [[5]]
## [1] "A"
##
## [[6]]
## [1] "A"
Now, let’s deal with rhesus : at first, create the function that extracts the rhesus from a given group :
# get rhesus
get_rhesus <- function(agroupe){
if(grepl('-', agroupe)){
return("Rh-")
} else {
return("Rh+")
}
}
… and try to call it :
test <- get_rhesus("AB+")
test
## [1] "Rh+"
test <- get_rhesus("O-")
test
## [1] "Rh-"
Next, build the list of rhesus by applying the function :
rhesus <- lapply(id, get_rhesus)
head(rhesus)
## [[1]]
## [1] "Rh-"
##
## [[2]]
## [1] "Rh+"
##
## [[3]]
## [1] "Rh-"
##
## [[4]]
## [1] "Rh+"
##
## [[5]]
## [1] "Rh-"
##
## [[6]]
## [1] "Rh+"
Next, just store the available groups distribution accross the population :
rate <- c(7,39, 1.5, 7.5, 6, 36, 0.5, 2.5)
Next, build the datas needed for the network display, that is : nodes et edges :
# build the data frame for the nodes
nodes <- cbind(id, groupe, rhesus, rate)
head(nodes)
## id groupe rhesus rate
## [1,] "O-" "O" "Rh-" 7
## [2,] "O+" "O" "Rh+" 39
## [3,] "B-" "B" "Rh-" 1.5
## [4,] "B+" "B" "Rh+" 7.5
## [5,] "A-" "A" "Rh-" 6
## [6,] "A+" "A" "Rh+" 36
## now, build the edges
edges <- cbind(c("O-", "O-", "O-","O-","O-","O-","O-","O-", "O+","O+","O+","O+","B-","B-","B-","B-","B+","B+","A-","A-","A-","A-","A+","A+","AB-","AB-","AB+"),
c("AB+","AB-","A+","A-","B+","B-","O+","O-","AB+","A+","B+","O+","AB+","AB-","B+","B-","AB+","B+", "AB+", "AB-", "A+", "A-", "AB+", "A+", "AB+", "AB-", "AB+"))
head(edges)
## [,1] [,2]
## [1,] "O-" "AB+"
## [2,] "O-" "AB-"
## [3,] "O-" "A+"
## [4,] "O-" "A-"
## [5,] "O-" "B+"
## [6,] "O-" "B-"
We have the datas to create the charts
At first, I’m asking myself which group is the most common :
piechart_df <- data.frame(
group = id,
value = rate
)
head(piechart_df)
## group value
## 1 O- 7.0
## 2 O+ 39.0
## 3 B- 1.5
## 4 B+ 7.5
## 5 A- 6.0
## 6 A+ 36.0
library(ggplot2)
bp<- ggplot(piechart_df, aes(x="", y=value, fill=group))+
geom_bar(width = 1, stat = "identity")
pie <- bp + coord_polar("y", start=0)
pie
The main question was to show the easy way who can give blood to who, this is made through graph objects :
First, prepare the datas :
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
Ready to draw with a circular layout which suits the best this graph :
plot(net,
edge.arrow.size=.7,
vertex.label.dist=3,
layout=layout_in_circle
)
We can see clearly who can give blood to who, ie. who is compatible with who. We keep loops on the networks as it’s a reality that a people of the same group can exchange blood with each others (A+ is compatible with A+).
The size of the nodes are mapped on the degree of incoming edges : the bigger a node is, the best it is as i means that it can receive from a wider ranger of people…at the opposite : the more little you are, the more you are isolated, take a look at the O- node : he can give to everybody but only can receive from O- (check the local loop).
The question is to detect if there is a global rule of compatibility between rhesus.
Therefore, we need to prepare, that is, in our case make somme aggregation.
First, on the edges (relations between groups compatibility), replace each group by its rhesus :
rhesus_edges <- apply(edges, c(1,2), get_rhesus)
head(rhesus_edges)
## [,1] [,2]
## [1,] "Rh-" "Rh+"
## [2,] "Rh-" "Rh-"
## [3,] "Rh-" "Rh+"
## [4,] "Rh-" "Rh-"
## [5,] "Rh-" "Rh+"
## [6,] "Rh-" "Rh-"
Once this done just aggregate and count occurrences by grouping them :
# Add a column to compute the weight of the relation
rhesus_aggreg <- data.frame(rhesus_edges)
rhesus_aggreg["weight"] <- 1
# Next, compute the count
weighted_rhesus_edges <- aggregate(rhesus_aggreg[c("weight")], by=list(rhesus_from=rhesus_aggreg$X1, rhesus_to=rhesus_aggreg$X2), FUN=sum, na.rm=TRUE)
weighted_rhesus_edges
## rhesus_from rhesus_to weight
## 1 Rh- Rh- 9
## 2 Rh- Rh+ 9
## 3 Rh+ Rh+ 9
# Let's build a list of unique rhesus as they will become the nodes of the graph
unique_rhesus <- unique(weighted_rhesus_edges$rhesus_from)
Ready to plot, and to make things even easier to read, let’s remove loops on the graph :
# now plot relations between rhesus
net <- graph_from_data_frame(d=weighted_rhesus_edges, vertices=unique_rhesus, directed=T)
net <- simplify(net, remove.loops = TRUE)
plot(net, edge.arrow.size=.5,
vertex.label.dist=3)
With this last graph, it’s very clear that the Rh- are the givers…and the Rh+ the receivers : Rh- give to people like them and to the others while Rh+ only give to people like themselves…a bit like a selfish rhesus ;-p
In other terms, Rh- people are isolated…from the others. We can see that in a more “numerical” way by taking a look at the distances matrix, so we can get the shortest path length between rhesus :
net <- graph_from_data_frame(d=rhesus_edges, vertices=unique_rhesus, directed=T)
shortest.paths(net, v=V(net), mode = c("in"),
weights = NULL)
## Rh- Rh+
## Rh- 0 Inf
## Rh+ 1 0
The shortest path from Rh+ to Rh- is a path… of an infinite length (Inf) ; in other terms, there literally is no way for a group from RH+ to give to a Rh-.
To see a more detailed report on compatibility path between groups, just get the matrix of shortes paths as shown below :
net <- graph_from_data_frame(d=edges, vertices=id, directed=T)
shortest.paths(net, v=V(net), mode = c("in"),
weights = NULL)
## O- O+ B- B+ A- A+ AB- AB+
## O- 0 Inf Inf Inf Inf Inf Inf Inf
## O+ 1 0 Inf Inf Inf Inf Inf Inf
## B- 1 Inf 0 Inf Inf Inf Inf Inf
## B+ 1 1 1 0 Inf Inf Inf Inf
## A- 1 Inf Inf Inf 0 Inf Inf Inf
## A+ 1 1 Inf Inf 1 0 Inf Inf
## AB- 1 Inf 1 Inf 1 Inf 0 Inf
## AB+ 1 1 1 1 1 1 1 0
As we can see, we only get distances equal to :
I first wondered if there could be path of length == 2, as is there could be transitions in blood compatibility (A can give to C and C can give to B…but A cannot give to B), but there wasn’t.