Influenza is a difficult disease to track because the symptoms it causes can also be caused by many other pathogens, such as coronaviruses and rhinoviruses. Although a laboratory test can confirm influenza infection, it is not widely used in outpatient visits: unless the infection is caught early, the course of treatment is similar for all of the viruses that cause these symptoms. As a result, surveillance of influenza, especially outpatient influenza, often uses influenza-like illness as a proxy.
In the United States, surveillance of influenza activity is coordinated by the Centers for Disease Control and Prevention (CDC). Levels of outpatient influenza are reported through the US Outpatient Influenza-like Illness Surveillance Network (ILINet), a network of 2,200 outpatient providers that report weekly data on the total number of patients seen that week, along with the number of patients with influenza-like illness (ILI), defined as fever and a cough and/or sore throat without a known cause other than influenza. Dividing the number of patients with ILI by the total number of patients gives the percentage of outpatient visits due to ILI. Unweighted percentages are reported for individual states; for Health and Human Services (HHS) Region and national estimates, state percentages are weighted by state population before being combined.
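The arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the state names, visit counts, and populations are made-up values rather than ILINet data, and the function names are hypothetical.

```python
def ili_percent(ili_visits: int, total_visits: int) -> float:
    """Percentage of outpatient visits due to influenza-like illness."""
    if total_visits <= 0:
        raise ValueError("total_visits must be positive")
    return 100.0 * ili_visits / total_visits


def population_weighted_percent(state_percents: dict[str, float],
                                populations: dict[str, int]) -> float:
    """Combine state ILI percentages into a regional or national estimate,
    weighting each state's percentage by that state's population."""
    total_pop = sum(populations[s] for s in state_percents)
    return sum(state_percents[s] * populations[s] for s in state_percents) / total_pop


# Hypothetical weekly reports (ILI visits, total visits) for a two-state "region".
reports = {"State A": (120, 4000), "State B": (45, 3000)}
# Hypothetical state populations.
populations = {"State A": 6_000_000, "State B": 2_000_000}

state_percents = {s: ili_percent(ili, total) for s, (ili, total) in reports.items()}
regional = population_weighted_percent(state_percents, populations)
print(state_percents)      # {'State A': 3.0, 'State B': 1.5}
print(round(regional, 3))  # 2.625
```

The population-weighted figure (2.625%) differs from the simple mean of the two state percentages (2.25%) because the larger state counts for more.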
Weekly forecasts are created separately for each location (the US as a whole, each HHS Region, and each state) based on ILI levels in prior weeks, the current cumulative percentage of infections due to each of the H1N1 and H3N2 influenza viruses, and Google Trends data on searches for 'flu'. Four related models are fit for each location: a weighted ensemble model, a dynamic harmonic regression model, a static historical average model, and a historical average model weighted by the prevalence of each influenza virus type.
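As a rough illustration of the two historical-average baselines, the sketch below averages ILI for a given week across past seasons. The text does not say exactly how the virus-type weighting works, so the second function assumes, purely for illustration, that each past season is weighted by how closely its H3N2 share matches the current season's share. The function names, the `bandwidth` parameter, and all numbers are hypothetical.

```python
import math


def historical_average(history: dict[int, list[float]], week: int) -> float:
    """Static historical average: mean ILI % for `week` over all past seasons.

    `history` maps a season's start year to its weekly ILI % values,
    indexed from week 0 of the season.
    """
    values = [season[week] for season in history.values()]
    return sum(values) / len(values)


def prevalence_weighted_average(history: dict[int, list[float]],
                                h3n2_share: dict[int, float],
                                current_h3n2_share: float,
                                week: int,
                                bandwidth: float = 0.2) -> float:
    """Hypothetical virus-type-weighted average: seasons whose H3N2 share is
    closer to the current season's share receive larger (Gaussian) weights."""
    weights = {
        year: math.exp(-((h3n2_share[year] - current_h3n2_share) / bandwidth) ** 2)
        for year in history
    }
    total = sum(weights.values())
    return sum(weights[y] * history[y][week] for y in history) / total


# Hypothetical ILI % for the first three weeks of three past seasons.
history = {
    2015: [1.2, 1.5, 2.0],
    2016: [1.4, 2.2, 3.1],
    2017: [1.8, 2.9, 4.5],
}
# Hypothetical share of subtyped infections that were H3N2 in each season.
h3n2_share = {2015: 0.3, 2016: 0.6, 2017: 0.9}

print(historical_average(history, week=2))  # about 3.2
print(prevalence_weighted_average(history, h3n2_share, current_h3n2_share=0.85, week=2))
```

Because the assumed current H3N2 share (0.85) is closest to the hypothetical 2017 season, the weighted estimate is pulled toward that season's value and lands above the plain historical average.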
The ensemble model combines the forecasts from the three other models. Ensembles have a strong track record in machine learning and forecasting: combining predictions from several individually weaker models often yields more accurate final predictions than any one of those models alone. Protea's ensemble combines the other three models' predictions using a weighted average. Each model's weight depends on the location, target, and time of year, and the weight values are determined from the models' predictions on training seasons.
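A weighted-average ensemble can be sketched as follows. The text does not specify how the weights are derived from training-season predictions, so this sketch assumes a simple inverse-error scheme: each model's weight is proportional to 1 / (mean absolute error on the training seasons). The model keys, errors, and forecasts are hypothetical, and in the setup described above a separate set of weights would be fit for each location, target, and time of year.

```python
def inverse_error_weights(training_errors: dict[str, list[float]]) -> dict[str, float]:
    """Assign each component model a weight proportional to 1 / (mean absolute
    error on training seasons), normalised so the weights sum to 1."""
    inv = {}
    for model, errors in training_errors.items():
        mae = sum(abs(e) for e in errors) / len(errors)
        inv[model] = 1.0 / mae
    total = sum(inv.values())
    return {model: w / total for model, w in inv.items()}


def ensemble_forecast(forecasts: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of the component models' point forecasts."""
    return sum(weights[m] * forecasts[m] for m in forecasts)


# Hypothetical training-season errors (forecast minus observed ILI %)
# for a single location, target, and time of year.
training_errors = {
    "harmonic_regression": [0.2, -0.4, 0.3, -0.1],   # MAE 0.25
    "historical_average": [0.6, -0.5, 0.4, -0.5],    # MAE 0.5
    "type_weighted_average": [0.5, -0.5, 0.5, 0.5],  # MAE 0.5
}
weights = inverse_error_weights(training_errors)
print(weights)  # roughly 0.5, 0.25, 0.25

# Hypothetical current-week point forecasts from the three component models.
forecasts = {"harmonic_regression": 3.0, "historical_average": 2.0, "type_weighted_average": 2.4}
print(ensemble_forecast(forecasts, weights))  # about 2.6
```

Inverse-error weighting is only one simple choice; optimising the weights directly against a score on the training seasons is another common approach, and the description above does not say which method Protea uses.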
Forecast accuracy is evaluated in two different ways, each of which measures a slightly different aspect of the forecast.
If you have any additional questions or comments, please feel free to reach out to craig.mcgowan@proteaanalytics.com.