Methodology for Estimating Excess Mortality #3 - Gompertz
Still in pursuit of the most robust and reliable model. Getting there...
For the foundation of this process of modelling see iteration #1 (linear fitting to the periodic data) here:
And iteration #2 (polynomial fitting to the cumulative series) here:
Both methods had their limitations. The linear model did not capture the convexity in the younger ages. The polynomial model captured too much convexity, and too much of the concavity in the older ages.
The reasons for the limitations (and the solution) were always apparent. The underlying distributions follow a Gompertz function [1], which has both convexity and concavity established over the entire course of the periodic data (essentially 100 years for any given year of birth).
Attempting to fit simple linear or polynomial models to an interval or integral of this distribution is always going to be problematic. It can produce forecasts (which we need in order to estimate excess deaths) that are unreasonable, i.e. they do not (or would not) fit the empirical Gompertz distribution, if we knew what it was.
Now, of course, the data does exist in the ONS archives to establish just that - the entire empirical Gompertz distribution for every year of birth. Unfortunately, it is not publicly available without paying a hefty premium and waiting several weeks that I don’t have right now.
So, I had to get creative and estimate the Gompertz distribution from the limited amount of available data (2015 to early 2020). It turns out that the problem I had to solve was the same one faced with polynomial fitting: the shorter integral carries short-term information that dominates the calibration but does not necessarily fit the entire distribution.
This became most apparent to me when trying to fit the Gompertz to the integral. I could not. The reason is that my model produces a cumulative series, from which I then bootstrap the periodic data, so I normally start with a known point on the cumulative series. But the cumulative series in the limited dataset is not the true cumulative series: it is missing all the deaths prior to 2015. For every year of birth, it is lower than the true series.
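To make the problem concrete, here is a small sketch with made-up weekly deaths (the numbers and variable names are purely illustrative). It shows that a cumulative series started partway through the history has the right week-to-week increments but sits below the true series by exactly the deaths that were never observed.

```python
from itertools import accumulate

# Hypothetical weekly deaths for one year-of-birth cohort (illustrative only).
weekly_deaths = [4, 5, 7, 6, 8, 9, 11, 10]
first_observed = 3  # pretend the archive only starts at this week

# True cumulative series, built from the full history.
true_cumulative = list(accumulate(weekly_deaths))

# Cumulative series built only from the weeks we can actually see.
observed_cumulative = list(accumulate(weekly_deaths[first_observed:]))

# The gap between the two is the total of the unobserved earlier deaths.
missing = sum(weekly_deaths[:first_observed])
shifted = [c + missing for c in observed_cumulative]

print(true_cumulative[first_observed:])
print(observed_cumulative)
print(shifted)
```

The constant gap is the quantity that has to be imputed before a cumulative model can be calibrated to the truncated data.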
To overcome this issue, I simply had to solve for the correct seed point (imputing a point on the true cumulative series) in addition to solving for the other two parameters of the model (the instantaneous growth and constant decay). It took a while longer for each year-of-birth curve to calibrate but the results were satisfying.
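A minimal sketch of this kind of calibration, assuming a discrete reading of the Gompertz model in which the cumulative series grows each week by a rate that itself decays by a constant proportion. The function names, the synthetic data and the crude pattern search are all illustrative assumptions, not the code behind the figures. The point it tries to show is that the seed is solved for alongside the growth and decay parameters, with the fit made to the weekly deaths, which are unaffected by the missing pre-2015 history.

```python
def gompertz_periodic(seed, growth, decay, n_weeks):
    """Weekly deaths implied by a discrete Gompertz-style cumulative series.

    seed   : cumulative deaths accrued before the first observed week
    growth : initial week-on-week growth rate of the cumulative series
    decay  : constant proportional decay applied to that growth rate
    """
    cumulative, rate, deaths = seed, growth, []
    for _ in range(n_weeks):
        step = cumulative * rate   # deaths occurring in this week
        deaths.append(step)
        cumulative += step
        rate *= 1.0 - decay        # growth rate decays at a constant rate
    return deaths


def sse(params, observed):
    """Sum of squared errors between modelled and observed weekly deaths."""
    seed, growth, decay = params
    model = gompertz_periodic(seed, growth, decay, len(observed))
    return sum((m - o) * (m - o) for m, o in zip(model, observed))


def fit(observed, start, rounds=200):
    """Crude pattern search: scale each parameter up or down and keep
    only the moves that reduce the error; shrink the step when stuck."""
    best, best_err = list(start), sse(start, observed)
    step = 0.5
    for _ in range(rounds):
        improved = False
        for i in range(3):
            for factor in (1.0 + step, 1.0 / (1.0 + step)):
                trial = best[:]
                trial[i] *= factor
                if i == 2 and trial[i] >= 1.0:
                    continue       # decay must stay below 100% per week
                err = sse(trial, observed)
                if err < best_err:
                    best, best_err, improved = trial, err, True
        if not improved:
            step /= 2.0
    return best, best_err


# Synthetic check: generate weekly deaths from known (hypothetical) values,
# then try to recover a fit from a deliberately poor starting guess.
true_params = (5000.0, 0.002, 0.001)             # seed, growth, decay
observed = gompertz_periodic(*true_params, 260)  # roughly five years of weeks
start = (1000.0, 0.01, 0.005)
fitted, fitted_err = fit(observed, start)
print(fitted, fitted_err)
```

In practice a proper least-squares optimiser would likely replace the pattern search; it is used here only to keep the sketch dependency-free.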
To verify this, I plotted all the overlapping Gompertz functions, including the 3-year forecasts derived from each individual model (Figures 1 and 2).
If any model was unreasonable, we would see it deviate from the stitched-together curve (i.e. it captured too much short-term convexity or not enough). Either that, or it would indicate that a particular year-of-birth cohort had experienced an idiosyncratic event that would alter their expected course of mortality from other years (the ultimate subject of my analysis).
Evidently, there are differences in the absolute levels of weekly deaths as we would expect because the yearly cohorts are different sizes. However, there is also some slight variability in the shapes of the distributions in the older ages that we could perhaps improve (Figure 1).
More evidently, there are some quite significant differences in convexity in the younger ages (Figure 2), as we suspected would happen due to over-fitting: the models capture short-term dynamics too well, and these do not fit with the longer-term dynamics, especially when we don’t have the longer-term information to calibrate to! Let’s hope this isn’t the manifestation of some event affecting millennial babies, which would be the alternative hypothesis.
Curiously, year 2000 appears to have captured the expected concavity (albeit a little too much, to my eye), whereas the years either side appear to have captured some anomalous convexity.
And then, it is also apparent that we are completely missing convexity in the youngest cohort (2009).
Also, as I suggested at the end of methodology #2, we do, indeed, observe a bi-modal distribution with a clear hump between weeks 520 (16 years of age) and 1144 (28 years of age) [2]. We would do well to fit an average distribution between these points to assist us with the individual year cohorts.
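For anyone checking the week-to-age conversion, the arithmetic can be sketched as follows, assuming 52 weeks per year and the 6-year offset explained in the footnote (the constant and function names are my own labels for illustration).

```python
WEEKS_PER_YEAR = 52
OFFSET_YEARS = 2015 - 2009  # first year of death minus first year of birth


def week_to_age(week_index):
    """Convert a week index on the stitched curve to an age in years."""
    return week_index / WEEKS_PER_YEAR + OFFSET_YEARS


print(week_to_age(520))   # start of the hump
print(week_to_age(1144))  # end of the hump
```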
Anyway, too many corrections are required to make this model robust, so it’s on to methodology #4 next. This will involve producing a single Gompertz function by splicing together all the single year-of-birth series, then fitting a simple periodic percentage change model to it. Preliminary results are very satisfying in terms of accuracy and robustness.
[2] The data starts with year of birth 2009 and year of death 2015, so we have to add 6 years (312 weeks).
Any chance you will be working with MP Andrew Bridgen on October 20th, in Parliament? Everyone please write to your MP to let him/her know you expect them to attend Andrew’s presentation regarding this dystopian hell that is occurring.
Thank you for your stellar work Joel. Did you hear that Andrew Bridgen has finally secured an Adjournment Debate on Friday 20th October 2023: Trends in excess deaths? I bet very few turn up.