(This article was originally published at R – Xi'an's Og, and syndicated at StatsBlogs.)

**T**he [then current now past] riddle of the week is a sort of multiarmed bandits optimisation. Of sorts. Or rather a generalised knapsack problem. The question is about optimising the allocation of 100 undistinguishable units to 10 distinct boxes against a similarly endowed adversary, when the loss function is

and the distribution q of the adversary is unknown. As usual (!), the phrasing of the riddle is somewhat ambiguous but I am under the impression that the game is played sequentially, hence that one can learn about the distribution of the adversary, at least when assuming this adversary keeps the same distribution q at all times.

Thus given such a q, one aims at maximising the expected loss

which also amounts to maximising

based solely on the marginals of q. Under the constraint that the sum of the allocations is 100. It seems rather natural to assume a Multinomial distribution on y, with probability vector p. Hence Binomial marginals.

If we knew y, the optimum would be obtained as taking as many x[i] equal to y[i]+1 as possible for the largest values of the index i and cancelling all other terms. However, this is not necessarily the best option when, by cancelling one x[i], all remaining lowest terms could then be filled. The other point is that, once a positive loss is achieved, there is plenty of freedom in allocating the remaining bits. But this is not the problem, only the distribution matters. Still, here is an R code to optimise x given y:

loz=function(x,y){ return(sum(2*(1:10)*(x>y)+(1:10)*(x==y)))} #distribution of dominating allocations T=1e2 armz=matrix(0,T,100) for (l in 1:T){ # initialisation with default probability y=rmultinom(n=1,size=100,prob=1:10) #adversary x=y+1 #impossible allocation i=1 #get rid of lesser values while (sum(x)>100){x[i]=0;i=i+1} x[i-1]=100-sum(x) #correct sum #random search maxloz=loss=0 newloss=loz(x,y) t=1 while ((t<100)||(newloss>loss)){ #exchange two groups i=j=sample(1:10,1,prob=(x>0)) while (j==i) j=sample(1:10,1,prob=(x<(y+1))) #optimise over amount of move kaz=min(x[i],y[j]+1-x[j]) gain=rep(0,kaz) for (k in 1:kaz){ newx=x newx[i]=x[i]-k;newx[j]=x[j]+k gain[k]=loz(newx,y) if (gain[k]>maxloz){ maxloz=gain[k];argz=newx}} gain=gain-loss #soft simulated annealing mov=1 #choice for a move if ((max(gain)>0)&(kaz>1)) mov=sample(1:kaz,1,prob=gain*(gain>0)) gain=gain+loss;loss=newloss newloss=gain[mov] x[i]=x[i]-mov;x[j]=x[j]+mov t=t+1 if (maxloz>loz(y,argz)) break() } #creating a distribution on each of the 100 objects reprz=rep(1,x[1]) for (m in 2:10) reprz=c(reprz,rep(m,x[m])) armz[l,]=reprz }

In the riddle, after each round, the posterior on the Multinomial distribution of the adversary can be updated from the latest input. The next step is to play the algorithm against itself, assuming each adversary follows the same track. Or wait for The Riddler to provide the winning solution. Which is what happened to me as a frantic schedule in Warwick over the past two days (and multiple activities in the Swiss Alps before) preventing me from procrastinating this way! Still the plane ride back to France and the even longer late (in both senses) night train slog to Paris let me experiment with R codes… Here is for instance a representation of the table of allocations of the 100 units to the ten classes [after re-ordering]:

This makes for a distribution to sample from and try to compete against the adversary. After building this scheme, I went back to The Riddler to check for solutions. And found that I had misinterpreted the riddle since all competitors entered a single value of y instead of a distribution! Like the winning (3,5,8,10,13,1,26,30,2,2). Still, using this repartition as the probability of the adversary in the above quote, using the resulting distribution of x led to a 96.8% success rate:

game=function(y,armz){ w=0 v=rep(1,100) x=rep(0,10) for (i in 1:1e3){ for (u in 1:100) v[u]=sample(armz[u,],1) for (b in 1:10) x[b]=sum(v==b) w=w+(loz(x,y)>loz(y,x))} return(w/1e3)}

Checking against other top finishers also led to dominance, albeit less clearly:

> cy=c(3,5,8,10,13,1,26,30,2,2) > game(cy,armz) [1] 0.966 > sy=c(3,6,7,9,11,2,27,31,2,2) > game(sy,armz) [1] 0.91 > ny=c(5,7,9,11,15,21,25,2,2,3) > game(ny,armz) [1] 0.609 > jy=c(1,8,2,2,13,18,20,32,2,2) > game(jy,armz) [1] 0.977 > my=c(0,0,1,11,11,16,26,31,2,2) > game(my,armz) [1] 0.959

I wish I had time to test this code against the 1388 solutions proposed there, on Github, which is much less than the 4 10¹² different allocations!, as it would show how resistant the distribution can be to different strategies.

Filed under: Books, pictures, R, Statistics, Travel Tagged: allocations, Colonel Blotto game, github, optimisation, R, simulated annealing, The Riddler

**Please comment on the article here:** **R – Xi'an's Og**