(This article was originally published at Statistics – Freakonometrics, and syndicated at StatsBlogs.)

In demography, we like to use life tables to estimate the probability that someone born in 1945 (say) is still alive nowadays. But another interesting quantity might be the probability that someone **alive** in 1945 is still alive nowadays.

The main difference is that we do not know when that person, alive in 1945, was born. Someone who was old in 1945 is very unlikely still alive in 2017. To compute those probabilities, we can use datasets from http://www.mortality.org/hmd/. More precisely, we need both death and birth data. I assume that datasets (text files) were downloaded (it is necessary to register – for free – to get the data).

`D=read.table("FRDeaths_1x1.txt",skip=1,header=TRUE)`

B=read.table("FRBirths.txt",skip=1,header=TRUE)

In the death dataset, there is a “110+” for people older than 110 years. For convenience, let us cap our observations at 110 years old,

`D$Age=as.numeric(as.character(D$Age))`

D$Age[is.na(D$Age)]=110

Consider now a first function that will return, for people born in 1930 (say) two informations

- the number of people (here, let us consider women only) born in 1930 (from the birth database)
- the number of death of people of age 0 in 1930, people of age 1 in 1931, people of age 2 in 1932, etc…

The code is simple

`nb=function(y=1930){`

debut=1816

MatDFemale=matrix(D$Female,nrow=111)

colnames(MatDFemale)=debut+0:198

cly=y-debut+1:111

deces=diag(MatDFemale[,cly[cly%in%1:199]])

return(c(B$Female[B$Year==y],deces))}

We have a single number for the number of births, and then a vector for the number of deaths. Consider now another function. Consider the people born in 1930. We want to get two numbers : the number of people still alive in 1945 (say), and the number of people still alive nowadays. The ratio will be the proportion of people born in 1930 that were alive in 1945, that are still alive in 2015.

`pop=function(ne=1930,an=1945){`

comptage=nb(ne)

s=0

if(an>ne) s=sum(comptage[seq(2,1+an-ne)])

p1=max(comptage[1]-s,0)

p2=max(p1-sum(comptage[seq(2+an-ne,length(comptage))]),0)

c(p1,p2)

}

Then, for a given year (say 1945), to get the proportion of people alive in 1945 that are still alive today, we need to count how many people born in 1944 were still alive in 1945, and in 2015, but also born in 1943, 1942, etc, And we simply consider the ratio of the total number of people alive in 2015 over the total number of people alive in 1945

`ptn=function(y=1945){`

V=Vectorize(function(x) pop(x,y))(1816:y)

sum(V[2,!is.na(V[2,])])/sum(V[1,!is.na(V[1,])])

}

Hence, 22% of those alive in 1945 are still alive in 2015,

`> ptn(1945)`

[1] 0.2209435

Actually, instead of looking only at 1945, it is possible to get a plot

`P=Vectorize(ptn)(1900:2010)`

plot(1900:2010,P,type="l",ylim=0:1)

For instance,

`> ptn(1975)`

[1] 0.6377413

i.e. 63.7% of those alive in 1975 are stil alive 40 years after. That is a rather interesting function, I was surprised that I couldn’t find it is standard demographical R package…

**Please comment on the article here:** **Statistics – Freakonometrics**