Let us suppose that we need 4 numbers that add to 1.

randomNumbers<-sample(1:100,4)
sumOfRandomNumbers<-sum(randomNumbers)

round(randomNumbers/sumOfRandomNumbers,4) #proportion of each random number 
## [1] 0.4049 0.1415 0.3268 0.1268

The most straightforward way to accomplish this task would be to randomly obtain 4 numbers, sum them up (using this number as the denominator) and then divide each number by such denominator - providing, thus, each number a participation/proportion in those 4 numbers that add to 1.

sum(randomNumbers/sumOfRandomNumbers) #proof they actually sum to 1
## [1] 1
pie(randomNumbers/sumOfRandomNumbers)

Problem with This Methodology

The problem with this methodology is that numbers tend to cluster in the center.


Let’s observe this:

dfProb<-data.frame()
for(i in 1:2000)
{
  unaRow<-runif(4)
  rowFinal<-unaRow/sum(unaRow)
  dfProb<-rbind(dfProb,rowFinal)  
}

#adding names to columns
colnames(dfProb)<-paste('Obs',1:4)


#observing addition of 4 points - proof they sum to one
head(cbind(round(dfProb,4),Total=rowSums(dfProb)))
boxplot(dfProb, col = c(2:4,7))

Maximum number obtained in each observation:

#max number obtained
round(apply(dfProb,2,max),4)
##  Obs 1  Obs 2  Obs 3  Obs 4 
## 0.8242 0.7706 0.8350 0.8941

As you are able to notice, none of the observations in a sample of 2,000 randomly generated points obtains a value in the extremes. That is, beyond 0.95. And, for the most part, the highest number obtained in all 4 points is below 0.85. Which means that this methodology fails to actually be random.


Let us observe this in a geometrical perspective

Let’s suppose we have 4 points that form a polygon:

aDF<-data.frame(x=c(0,0.5,2,3),y=c(0,6,7.5,5))

library(ggplot2)

ggplot()+geom_path(data=rbind(aDF,c(0,0)), #added one last row to close the polygon - for better viewing
                   aes(x,y), linetype=2, color='slateblue1',size=1.3)+
  geom_point(data=aDF,aes(x,y), cex=3, color='steelblue')

From Liner Algebra, we can recall that we can obtain each dot within the polygon by obtaining each combination of 4 numbers that add to 1 and assigning each proportion to each dot.

\[\sum_{ij} x_{ij} y_{ij}\]

where :

\[\sum_{x_1}^{x_4} = 1\]

Let’s view the graph as the result of multiplying each point in the polygon by different proportions:

dataFrameMultiplication<-data.frame(i=0,j=0,x=0,y=0)

for(i in 1:2000)
{
  for(j in 1:4)
  {
    unVector<-aDF[j,]*dfProb[i,j]
    dataFrameMultiplication[(i-1)*4+j,]<-c(i,j,unVector)
    #points(unVector[1],unVector[2],col="blue",pch=19)
  }
  
}


library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
#now we will group

resultFinal<-group_by(dataFrameMultiplication,i)%>% summarise(sumaX = sum(x), suamaY = sum(y))%>% ungroup()


ggplot()+geom_path(data=rbind(aDF,c(0,0)), #added one last row to close the polygon - for better viewing
                   aes(x,y), linetype=2, color='slateblue1',size=1.3)+
  geom_point(data=aDF,aes(x,y), cex=3, color='steelblue')+
  geom_point(data=resultFinal[,2:3],aes(sumaX,suamaY))

As you are able to notice, there is a tendency for points to clutter in the center because randomly generated numbers do not contain extreme values. In other words this methodology smoothes out the extremes.


Fixing Randomness Deficiency

Creating Function for Real Randomness

#function to create Randomly selected numbers that sum to 1

crearNumerosRandomSumanUno<-function(n){
  
  laCadena<-rep(0,n) #crea un vector de zeroes
  # posee tantos ceros a como el numero que definis
  # en la funcion cuando la corres 

#supongamos n=5
  
  for(i in 1:n)
  {
    if(i==1) #dado que sea el primer elemento
    {
      secuencia<-seq(0,1,by=0.01) #crea un vector de 101 elementos 
      # que van desde cero hasta 1, por cada 0.01;
      # es decir 0,0.01,0.02...0.98,0.99,1
      
      x<-sample(secuencia,1) #se toma un elemento random
      laCadena[i]<-x #este vendria a ser el primer elemento
      
      if(sum(laCadena)==1){ #dado que el primer elemento sea igual a 1
        break
      }
      
      
    } else if (i<=n-1) { #dado que sea cualquier otro elemento pero no el ultimo elemento
      
      
      secuencia<-seq(0,1-sum(laCadena),by=0.01) #se crea una secuencia de numeros 
      # de tal manera de 0 hasta el maximo numero que pudiese tomar el vector 
      # dado que los numeros no sumen mas de 1.
      
      #por ejemplo si el primer numero random fue 0.6, entonces tiene logica que no podes
      # tomar un numero superior a 0.4 (porque la sumatoria de estos numero superaria a 1)
      
      x<-sample(secuencia,1)
      laCadena[i]<-x
      
      if(sum(laCadena)==1){
        break
      }
      
    }
    
    else { #dado que sea el ultimo elemento
      if(sum(laCadena)<1){
        laCadena[i]<-1-sum(laCadena)  
      }
      
    }
  }
sample(laCadena,n,replace = F) #una vez creada la cadena, la randomizas entre ella misma
#para que todos los numeros tengan igual oportunidad de aparecer
}

In essence, the formula does the following by steps:

1. Selects the first number randomly
2. Second number would be selected randomly from 1 minus the first number (sum of both numbers can not be larger than 1)
3. 3rd Number and on would be selected one by one randomly from 1 - sum(of numbers selected so far)
4. Last number would be 1 - sum(numbers selected). Last number needs not be randomized.
5. Let’s randomize all selected numbers so that no number can actually be predicted. Otherwise, the first number would have a tendency towards 0.5, and the second number towards 0.25 , etc…


Let’s randomize in the proper way

df<-data.frame()

for(i in 1:2000){
  
  df<- rbind(df,crearNumerosRandomSumanUno(4)) 
}

colnames(df) <- paste('Obs',1:4)
boxplot(df, col=c(2:4,7))

#max number obtained
round(apply(df,2,max),4)
## Obs 1 Obs 2 Obs 3 Obs 4 
##     1     1     1     1

As you are able to notice, all of the randomly generated sequences contain a possibility of generating extreme numbers, included the possibility of having all 100% of the weight (observation == 1).


Geometrical Perspective

dataFrameMultiplication1<-data.frame(i=0,j=0,x=0,y=0)

for(i in 1:2000)
{
  for(j in 1:4)
  {
    unVector<-aDF[j,]*df[i,j]
    dataFrameMultiplication1[(i-1)*4+j,]<-c(i,j,unVector)
    #points(unVector[1],unVector[2],col="blue",pch=19)
  }
  
}


library(dplyr)

#now we will group

resultFinal1<-group_by(dataFrameMultiplication1,i)%>% summarise(sumaX = sum(x), suamaY = sum(y))%>% ungroup()



ggplot()+geom_path(data=rbind(aDF,c(0,0)), #added one last row to close the polygon - for better viewing
                   aes(x,y), linetype=2, color='blue')+
  geom_point(data=aDF,aes(x,y), cex=3, color='steelblue')+
  geom_point(data=resultFinal1[,2:3],aes(sumaX,suamaY), color = 'green')

#superimposing old method vs new method
ggplot()+geom_path(data=rbind(aDF,c(0,0)), #added one last row to close the polygon - for better viewing
                   aes(x,y), linetype=2, 
                   color='slateblue1', size=1.3)+
  geom_point(data=aDF,aes(x,y), cex=3, color='steelblue')+
  geom_point(data=resultFinal[,2:3],aes(sumaX,suamaY), color = 'black')+
  geom_point(data=resultFinal1[,2:3],aes(sumaX,suamaY), color = 'green')

The difference between both methods is noticeable. The new method closely resembles the real shape of the graph, whereas the old method seems like a big blob in the center of the graph