An algorithmic method to identify epidemic waves of COVID-19

The COVID-19 pandemic has put epidemiology in the spotlight. Epidemics, epidemic peaks and waves of transmission are all subjects of discussion. However, there is no universally agreed definition of these concepts. The term “epidemic wave” can refer to anything from a well-defined attribute of a mathematical object to a loosely defined component of a time series. Despite the limitations of the definitions, these descriptive phrases are useful for planning and public health.

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the causative agent of the pandemic, has spread around the world since it emerged in Wuhan, China at the end of December 2019. Non-pharmaceutical interventions (NPIs) have been carried out with varying levels of rigor and speed by governments around the world in an effort to prevent and reduce the importation and local spread of the virus. Unfortunately, these NPIs often come at a high price. Therefore, it is essential to understand how to reduce transmission as cost-effectively as possible. In addition, given the many potential drivers of regional heterogeneity, it is difficult to understand the epidemic in a single country; making meaningful comparisons between countries is even more difficult.

In this research paper, a team of scientists from various institutions in the UK and Poland offers contributions aimed at addressing this problem. First, the authors clarify the multiple ways in which researchers use the term “epidemic wave”. They then present a technique that divides the epidemic time series (of confirmed cases and deaths) into non-overlapping “observed waves”. The authors emphasize that this is not another definition of an epidemic wave, but rather an exercise to highlight some of the traits that any viable definition should include. Following this analysis, the authors present a more nuanced interpretation of the data.

A preprint version of this study, which is yet to be peer reviewed, is currently available on the medRxiv* server.

The study

The algorithm used in this study was applied to all countries for which COVID-19 data were available. By applying the algorithm to the time series of both cases and deaths, the authors could cross-validate one against the other to account for the confounding effect of changes in case ascertainment and to improve the identification of case waves.
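The idea of checking case waves against death waves can be illustrated with a minimal sketch. It is not the authors' procedure: the function name, the 28-day lag window, and the peak days below are all assumptions chosen for illustration. A case peak counts as "supported" if a death peak follows it within the lag window.

```python
def supported_case_peaks(case_peaks, death_peaks, max_lag=28):
    """Return case-peak days that are followed by a death peak within max_lag days.

    case_peaks, death_peaks: lists of day indices (hypothetical inputs).
    max_lag: assumed upper bound, in days, on the delay between a case
             peak and the death peak it gives rise to.
    """
    return [
        c for c in case_peaks
        if any(0 <= d - c <= max_lag for d in death_peaks)
    ]


# Made-up peak days, for illustration only.
case_peaks = [30, 120, 250]
death_peaks = [45, 270]
supported = supported_case_peaks(case_peaks, death_peaks)
```

In this toy input, the case peak on day 120 has no matching death peak, which would be consistent with a wave driven by increased testing rather than increased transmission.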

(A) Choropleth showing the number of days from the emergence of the first cases in China on December 31, 2019, until the cumulative number of deaths in each country exceeded 10. Countries shown in darker colors crossed the threshold earlier than those shown in lighter colors. After starting in China, epidemics occurred in Europe, the Middle East and North America before moving south to South America, Africa and the Pacific. (B) Scatter plot of the number of days until the epidemic threshold was reached in each country against that country's GNI per capita, showing a negative trend, i.e. the pandemic first spread to countries with higher GNI per capita. Linear regression line in purple with the 95% confidence interval shaded. (C) Time series of the daily number of confirmed cases (left) and deaths (right) per 10,000 population among countries with evidence of a second wave (light gray), and the 7-day moving median of the mean between countries (black line). For each country, time is measured relative to the date on which the epidemic took hold.

Only two of the trends identified are statistically significant at the 5% level. First, a greater number of waves is linked to a longer response time: a one-sided Mann-Whitney test suggests that countries with more than one wave responded considerably more slowly than countries with only one wave (p = 0.0002). Second, the number of waves is linked to gross national income (GNI) (p 0.0001). The relationship between population density and mortality is not statistically significant.
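For readers unfamiliar with the one-sided Mann-Whitney test, the following is a minimal pure-Python sketch using the normal approximation with a tie correction. The response times are invented for illustration and are not the study's data; the function name is also an assumption.

```python
from math import erf, sqrt


def mann_whitney_one_sided(x, y):
    """Return (U, p) for the alternative "values in x tend to exceed values in y".

    Uses the normal approximation with a tie correction and no
    continuity correction, so p is only approximate for small samples.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    # Pool the samples, tagging each value with its group (0 = x, 1 = y).
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)

    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        t = j - i                  # size of this group of tied values
        tie_term += t**3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (u - mean) / sqrt(var)
    p = 0.5 * (1 - erf(z / sqrt(2)))  # upper-tail probability of z
    return u, p


# Hypothetical response times in days (not taken from the study).
multi_wave = [21, 25, 30, 34, 40]
single_wave = [5, 9, 12, 14, 18, 23]
u_stat, p_value = mann_whitney_one_sided(multi_wave, single_wave)
```

A small p-value here would indicate that the multi-wave group tends to have longer response times than the single-wave group, which is the direction of the effect reported above.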

Descriptions of the waves discovered are based on the idea that time series of deaths are a more reliable and consistent indicator of patterns of viral activity than a simple time series of cases. Transmission and testing are the two main drivers of waves in the time series of case incidence.

A wave can be triggered by an increase in transmission, an increase in testing, or a combination of both if the testing regime changes during a transmission wave.

As a result, it is often impossible to compare the case incidence statistics of two successive waves. However, at the very least, the presence or absence of an associated peak in mortality incidence can be used to infer the relative difference in drivers. In addition, the authors identify a third kind of wave at the national scale (spatially asynchronous waves). Countries with this type of wave can benefit from isolating local epidemic curves and developing local intervention measures.

In Italy, two separate waves of confirmed cases and two separate waves of mortality occurred at almost identical times. However, the ratio of cases to deaths around each peak varies considerably between wave 1 and wave 2, implying a downward trend in the case fatality rate (CFR) that requires careful consideration.
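The comparison of deaths to cases around each peak can be made concrete with a small sketch. The window size, the function name, and the numbers are assumptions for illustration; the sketch also ignores the delay between case confirmation and death, which a careful analysis would need to handle.

```python
def windowed_cfr(cases, deaths, peak, half_width=7):
    """Crude case fatality rate in a window centred on a peak.

    cases, deaths: daily counts as equal-length lists.
    peak: index of the peak day.
    half_width: assumed number of days to include on each side of the peak.
    """
    lo = max(0, peak - half_width)
    hi = min(len(cases), peak + half_width + 1)
    total_cases = sum(cases[lo:hi])
    if total_cases == 0:
        raise ValueError("no cases in the window")
    return sum(deaths[lo:hi]) / total_cases


# Made-up daily counts around two peaks (not Italian data).
wave1_cases, wave1_deaths = [100, 200, 400, 200, 100], [12, 25, 50, 26, 12]
wave2_cases, wave2_deaths = [1000, 2000, 4000, 2000, 1000], [20, 40, 80, 40, 20]

cfr_wave1 = windowed_cfr(wave1_cases, wave1_deaths, peak=2, half_width=2)
cfr_wave2 = windowed_cfr(wave2_cases, wave2_deaths, peak=2, half_width=2)
```

A lower value for the second wave can reflect wider testing as much as a real change in lethality, which is why the apparent trend needs careful interpretation.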

Identification of COVID-19 epidemic waves. A: Zambia shows a clear structure with two waves (red circles) in the case data, while no waves are identified in the death data. B: The UK shows a structure that could arguably have two or three waves, but sub-algorithm D combines the last two. C: In Ghana, sub-algorithm B filters out an early peak of cases. It is not visually clear whether this is noise or a significant epidemiological event; the algorithm cannot do better than the reader in determining this by simply inspecting a graph. No wave of deaths is identified due to the low absolute numbers. D: The number of cases in Costa Rica does not decrease by 70% after the first wave, so it is not identified by the algorithm as a wave. This shows how important the Prel parameter can be. However, cross-validation with the time series of deaths makes it possible to identify the wave (yellow circle).
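The role of a relative-decline threshold such as Prel can be sketched in a few lines. This is a simplified illustration, not the authors' sub-algorithms A to D: it assumes an already smoothed series, uses a hypothetical function name, and accepts a peak only once the series has fallen by at least the fraction p_rel from that peak.

```python
def find_waves(series, p_rel=0.7, min_height=0.0):
    """Return indices of peaks that are followed by a relative decline >= p_rel.

    series: smoothed daily counts (smoothing is assumed to be done already).
    p_rel: required fractional drop from the peak; 0.7 means a 70% decline.
    min_height: assumed floor below which peaks are ignored as noise.
    """
    peaks = []
    candidate = None  # index of the highest point since the last accepted trough
    for i, value in enumerate(series):
        if candidate is None or value > series[candidate]:
            candidate = i
        elif (series[candidate] > min_height
              and value <= (1 - p_rel) * series[candidate]):
            peaks.append(candidate)
            candidate = i  # restart the search from this trough
    return peaks


# Made-up series: two clearly separated waves.
two_waves = [0, 10, 50, 100, 60, 20, 10, 40, 120, 200, 150, 50]
# Made-up series: the first bump only falls by 50% before cases rise again.
shallow_dip = [0, 10, 50, 100, 60, 50, 80, 150, 200, 150, 40]

peaks_two = find_waves(two_waves)
peaks_shallow = find_waves(shallow_dip)
```

In the second series the first bump is not reported as a separate wave because the decline after it is too shallow, which mirrors the Costa Rica example in the caption. Lowering p_rel would split it into two waves, showing how sensitive the result is to this parameter.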

In the United States, three waves of cases and deaths are visually apparent, with the algorithm merging the first two into a single wave. Again, there is a notable disparity between the number of cases and the number of deaths. In this case, the investigators noticed regional diversity between the waves, with the outbreak concentrating in different places at different times. This is an illustration of spatially asynchronous waves in action.


It is possible to convert the intuitive visual perception of time series “waves” into simple mathematical procedures that can annotate many time series by objectively identifying their wave components. In the context of COVID-19, these waves can occur due to increased transmission, increased testing, or a combination of both. Waves can also arise when time series are aggregated over a large geographic area, so that an apparent second wave is actually a first wave in a different part of the country. When performing comparative analyses of the links between interventions and disease-related mortality, using the wave as the time unit of analysis can lead to more precise conclusions. The speed at which interventions are applied is primarily related to the wave structure of the next outbreak.

*Important Notice

medRxiv publishes preliminary scientific reports which are not peer reviewed and, therefore, should not be considered conclusive, guide clinical practice / health-related behavior, or treated as established information.
