 Research
 Open access
 Published:
A Bayesian spatiotemporal study of the association between meteorological factors and the spread of COVID19
Journal of Translational Medicine volume 21, Article number: 848 (2023)
Abstract
Background
The spread of COVID19 has brought challenges to health, social and economic systems around the world. With little to no prior immunity in the global population, transmission has been driven primarily by human interaction. However, as with common respiratory illnesses such as influenza some authors have suggested COVID19 may become seasonal as immunity grows. Despite this, the effects of meteorological conditions on the spread of COVID19 are poorly understood. Previous studies have produced contrasting results, due in part to limited and inconsistent study designs.
Methods
This study investigates the effects of meteorological conditions on COVID19 infections in England using a Bayesian conditional autoregressive spatiotemporal model. Our data consists of daily case counts from local authorities in England during the first lockdown from March–May 2020. During this period, legal restrictions limiting human interaction remained consistent, minimising the impact of changes in human interaction. We introduce a lag from weather conditions to daily cases to accommodate an incubation period and delays in obtaining test results. By modelling spatiotemporal random effects we account for the nature of a human transmissible virus, allowing the model to isolate meteorological effects.
Results
Our analysis considers cases across England’s 312 local authorities for a 55day period. We find relative humidity is negatively associated with COVID19 cases, with a 1% increase in relative humidity corresponding to a reduction in relative risk of 0.2% [95% highest posterior density (HPD): 0.1–0.3%]. However, we find no evidence for temperature, wind speed, precipitation or solar radiation being associated with COVID19 spread. The inclusion of weekdays highlights systematic under reporting of cases on weekends with between 27.2–43.7% fewer cases reported on Saturdays and 26.3–44.8% fewer cases on Sundays respectively (based on 95% HPDs).
Conclusion
By applying a Bayesian conditional autoregressive model to COVID19 case data we capture the underlying spatiotemporal trends present in the data. This enables us to isolate the main meteorological effects and make robust claims about the association of weather variables to COVID19 incidence. Overall, we find no strong association between meteorological factors and COVID19 transmission.
Introduction
Since January 2020 COVID19 has spread across the world and changed societies dramatically. As the virus spread many countries took unprecedented measures to control it, including restrictions on public gatherings and movement. This had negative knockon economic and societal impacts as well as creating mental health issues. A UK government study showed deterioration in public mental health during lockdown periods [1], whilst restrictions led to an increase in the court case backlog [2] and demand on the NHS led to a reduction in elective care, substantially increasing the NHS backlog [3]. Given the impact both the virus and control measures have had on daily lives, it is crucial for governing, health and economic authorities to understand risk factors associated with COVID19 so that public policies are designed to balance risks to public health with social and economic impacts.
In response, many studies were conducted to understand key risk factors associated with COVID19 transmission. Whilst age is known to be a key driver of serious illness other factors such as poverty and ethnicity were also identified as playing an important role in the spread of COVID19 [4]. These were often linked to increased human interaction, the foremost driver of COVID19 spread early in the pandemic, due to confounding factors such as living conditions or employment type. However, the effects of meteorological conditions are less well understood. For instance, due to similarities between COVID19 and the flu many authorities claimed the virus would be less effective in warmer weather [5,6,7]. However, these seem illjudged as the virus has impacted all but the most remote places on the planet. Discrepancies early in the pandemic could be due to factors such as when the disease first arrived in certain countries and the ability of more developed countries to roll out mass testing quicker than less developed countries, many of which are in warmer climates.
Nonetheless, many other respiratory viruses, including influenza and other human coronaviruses (of which COVID19 is an instance) are susceptible to weather [8, 9]. This is due to the virus being less stable in warmer and more humid weather, as well as these conditions limiting the distance the virus can travel, and the susceptibility of the host [10]. With COVID19 likely to become endemic in the longer term, many scientists believe that COVID19 may display similar behaviour and become seasonal, once immunity has reached a critical point either naturally or via vaccination [8, 9, 11].
At the time of writing, the World Health Organisation has declared that COVID19 is no longer a global health emergency, but still classes it as a pandemic [12]. Whilst life has returned to normal in many countries, this is reflective of the threat that COVID19 continues to pose. As such it is critical to understand the longerterm behaviour of COVID19 so that authorities can prepare, for instance by timing booster vaccine programmes for vulnerable sections of the population, or implementing testing infrastructure so that the prevalence and spread of COVID19 can be tracked.
However, when studying the association of weather with COVID19 spread many nonmeteorological factors that contribute to it must be accounted for. In particular, as COVID19 spreads through human interaction spatial and temporal correlations are present in case data. As such statistical models that capture these spatiotemporal effects are needed to address this question.
To date, previous studies have delivered heterogeneous findings. For instance, a study in Germany found temperature is negatively correlated with the spread of COVID19, and argued humidity was positively correlated [13]. In contrast, a US study found temperature, humidity and precipitation were all negatively correlated with the number of cases [14]. More recently [15] showed that the effect of meteorological conditions on COVID19 transmission varies by country, with both low and high ranges of humidity associated with increased transmission in differing climates.
This study highlights the importance of methods used as, unlike many studies which assume monotonic relationships between variables, it allows for nonmonotonicity between meteorological variables and case counts. This could be beneficial if, within a normal range of meteorological parameters, there is a subrange where COVID19 is most or least transmissible.
One reason suggested for the widespread heterogeneity is the lack of consistency in study designs and analysis methods [16]. Analysis of COVID19 case data is not straightforward, as methods need to take account of the temporal and geographical dependency between the data points. Numerous approaches are possible, each requiring its own assumptions, opening up the possibility for different conclusions to be drawn from similar data. In some instances, the dependencies were altogether ignored, resulting in underestimated uncertainty and increased the chance of spurious findings.
In this paper we investigate the relationship between weather conditions and COVID19 spread. We study the average effect of several variables in order to understand the potential longterm seasonal behaviour of COVID19. Using a Bayesian spatiotemporal model we capture autocorrelation in case counts. This enables us to separate out the effects of regression parameters and make strong claims about these effects. Recent developments have made such models accessible to researchers, in particular through the use of the CARBayesST package in R, introduced by [17]. These models have previously been used in other epidemiological contexts including the study of dengue fever [18], malaria [19] and human foot and mouth disease [20].
Data
Study area and period
The study considers England’s \(K = 312\) local authority districts (LADs) as of April 2020, including the Isle of Wight and Isles of Scilly. We consider a 61day period from 31st March 2020  30th May 2020 which has been chosen to reflect the first full lockdown in England when restrictions remained stable. This allows us to assume the nature of social interactions is consistent throughout the study period, not affected by the weather. Therefore we can isolate the natural effects of weather on COVID19 spread without accounting for indirect effects due to weather affecting social interactions.
Daily cases and geographical region
Counts of labconfirmed cases of COVID19 (daily cases) in England were obtained from Public Health England [21] for each day in the study period. Daily cases were recorded in each lower tier local authority (LTLA) in England and refer to the date when they were confirmed.
We aggregated LTLA daily cases to the LAD level using definitions from the Office for National Statistics (ONS). This aggregation was necessary as LAD shape files for spatial analysis are available for the LADs (but not the LTLAs) through the Open Geography portal [22]. We used the boundaries that were current as of May 2020. Based on these boundaries, daily cases for the LTLAs of Aylesbury Vale, Chiltern, South Bucks, and Wycombe were aggregated to obtain cases for the LAD of Buckinghamshire. We also merged the LAD boundaries for Hackney and the City of London as well as Cornwall and Isles of Scilly, as daily cases were recorded at this aggregated level. The resulting set of boundaries were used in the analyses to define shared borders between pairs of LADs as well as for presentation of results.
Population data for LADs in England were derived from the ONS mid2019 annual estimates [23]. These were aggregated as above for Hackney with City of London and Cornwall with Isles of Scilly.
Meteorological
We obtained weather data for the United Kingdom for the whole of 2020 from the European Commission reanalysis database (ERA5) [24]. This publicly available resource provides an accurate set of hourly data at a 25km \(\times\) 25km grid resolution and includes the following features: solar radiation, temperature, humidity, wind velocity and precipitation. Note here that the spatiotemporal resolutions of the raw weather data do not align with the daily reporting of cases in LADs. ERA5 provides a suite of hourly weather parameters that may affect local COVID19 transmissions at a 30km spatial resolution.
We first aggregated the hourly measurements for each grid to the daily level by taking an average from midnight to 11:59pm. For each LAD we identified the set of points on the ERA5 grid contained in the LAD or within 30km of its boundary. For each weather parameter, we then took the average of all points in the corresponding set, to obtain a LAD level daily average. This strategy provides a consistent rule for assigning measurements of weather data, particularly to LADs that do not contain one of the grid points from the ERA5 data.
Methodology
Software
All data processing and analyses were completed in the R language. The wgrib software was used to convert the data from ERA5 into a format readable by R. The CARBayesST package was used to run MCMC spatiotemporal Bayesian models whilst the packages ggplot2, rgdal, leaflet, coda, and spdep were used to analyse results.
Statistical modelling
Bayesian inference
In Bayesian inference we assume that an observable random variable is generated by a process whose parameters are themselves random variables. Our goal is to study this process by learning about the parameters. In Bayesian inference our learning process starts with our prior belief, before seeing any data, then incorporates observations y of the observable random variable.
Let our observable random variables be Y, and \(\Theta\) a set of parameters for its distribution \(p_Y\). Let \(p_{\Theta }(\theta )\) be a joint probability distribution representing our prior belief for the parameters; this may be a noninformative prior if we have no strong belief. The likelihood of an observation y is the conditional distribution \(p_{Y}(y\Theta )\). Bayes’ Theorem states that we can calculate the joint posterior distribution of the parameters as:
This means that our updated belief after observing y, represented by \(p(\theta y)\), can be computed by multiplying our prior belief with the conditional likelihood of the data. It is enough to know the posterior up to a normalising constant since we know the true posterior’s integral over its full support must equal 1.
The idea that parameters are themselves random variables can be extended so that the parameters \(\beta _{\theta }\) for the distribution \(p_{\Theta }\), often referred to as hyperparameters, are also random variables. In fact this can be continued indefinitely, however, as more levels are added interpretability can be lost. Because of this at some point hyperparameters are set as point estimates.
Introducing a lag
When someone is infected with COVID19 there is a delay before they become ill known as the virus incubation period. There may then be further delays before they get tested and the result is obtained. As such to assess the impact of weather on COVID19 transmission risk we introduced a 6day lag between weather factors and daily cases. In this report days refer to the day cases were recorded, the lag was created by taking the weather from 6 days prior. This effectively reduces the data set to \(N = 55\) time periods.
Model covariates
In the model we included regression variables for temperature, relative humidity, wind velocity, solar radiation and precipitation. Due to the large positive skew in the wind velocity and precipitation variables we applied a transformation to normalise them. For the wind variable we applied a log transformation and for precipitation a BoxCox transformation with parameter 0.274, both after replacing zero values with the minimum nonzero value. One factor not often considered in the existing literature is the difference between weekdays due to administrative processes. Since fewer cases were reported on weekends due to reduced testing capacity [25] we included weekdays as a factor regression variable to capture these effects. In our analysis, Thursday presents the baseline against which other days are measured.
Standardised case rates
The number of daily cases in a given LAD is highly dependent on the population. To account for this we chose to include expected cases as an offset in the model, allowing us to effectively model the relative risk across all LADs. The model random effects capture latent temporal and spatial trends within the case count data, so the expected count only needs to consider the LAD specific population. Since we consider a relatively small time period we assume LAD population remains constant over time, as it is not practical to obtain daily counts. This means that LADspecific expected cases remain constant in time.
As such expected cases are defined in equation (2) where \(P_k\) is the population of the kth LAD and \(y_{kt}\) its observed daily cases at time t.
The LAD specific relative risk for time t, \(\theta _{kt}\), is estimated by:
Bayesian spatiotemporal model with adaptive prior
As the number of cases \(Y_{kt}\) in LAD k at time t can be modelled as count data we use a Bayesian hierarchical Poisson GLM for the daily cases likelihood function. It is common for count data to be generated from several underlying Poisson processes, leading to ‘overdispersion’ where the variance in the data is much higher than the mean, so that the data is not well modelled by a single Poisson distribution. However, in our Poisson GLM the Poisson parameter is the product of expected cases \(E_{k}\) and relative risk \(\theta _{kt}\) for the kth local authority at time t. The relative risk is dependent on a linear predictor containing covariates \(\textbf{x}_{kt}\) and spatiotemporal random effects \(\phi _{kt}\). As the relative risks vary spatially and temporally through the covariates and spatiotemporal effects our model fits a different Poisson distribution to each LADtime pair. This allows us to capture the underlying system behaviour and its corresponding overdispersion.
In the linear predictor \(\textbf{x}_{kt}\) denotes model covariates with corresponding regression parameters \(\beta\) given a noninformative prior.
The spatiotemporal random effects \(\phi _{kt}\) act as a latent variable for factors affecting the relative risk \(\theta _{kt}\) that are not given as model inputs. For instance, these will adjust for case spread due to the number of cases in LAD k, and neighbouring LADs, at time \(t1\). As such these random effects are correlated both spatially and temporally. Different structures can be given to these random effects, all of which use a neighbourhood matrix W to identify geographically correlated LADs [26], but have other differing underlying assumptions on the system behaviour.
In this study we use an adaptive prior which assumes the autocorrelated random effects follow a normal distribution whose mean of \(\Phi _t = (\phi _{1,t},\cdots , \phi _{K,t})\), the collection of random effects at time t, depends only on the random effects at the previous time period, with the first time period having a zero mean. The adaptive prior allows us to replace nonzero elements of W with random variables \(\textbf{w}^+\). Through a precision matrix \(\textbf{Q}\) these control localised spatial correlation between random effects. As they are random variables they allow different LAD pairs to have different correlation strengths. For example, correlation may be greater between neighbouring innercity LADs than between neighbouring rural LADs.
The precision matrix also depends on the parameter \(\rho _S \in [0,1]\) which controls the underlying strength of spatial variance whilst the overall spatial variance also depends on the variance parameter \(\tau ^2\). In the \(\tau ^2\) InverseGamma prior the default values of a, b are respectively 1 and 0.01. Details of the InverseGamma distribution are presented in the Appendix. The strength of temporal dependence in random effects is controlled by \(\rho _T\) which has support [0, 1].
Given P covariates the model is then defined as:
To model \(\mathbf {w^+}\) as a collection of random variables they’re first mapped to the real line via the following transformation:
A multivariate Gaussian prior with fixed mean \(\mu\) and inversegamma variance \(\tau _{\omega }^2\) is used to model the transformed elements so the joint density f is:
Following the work of Rushworth in [27] the prior choice of the value \(\mu\) influences the type of posteriors found for elements of \(\mathbf {w^+}\). Values greater than 0 generally prefer posterior elements close to 1, indicative of autocorrelation. In particular, the default value of 15 assigns high probability for elements being close to 0 or 1, corresponding to neighbouring LADs displaying correlation or not. Since the case counts display spatial autocorrelation we proceed with the default value. The default hyperparamater values c, d in the \(\tau _{\omega }^2\) prior are respectively are 1 and 0.01.
The variance parameter \(\tau _{\omega }^2\) controls the amount of variation in the \(\mathbf {v^+}\) and hence \(\mathbf {w^+}\). Small values of \(\tau _{\omega }^2\) make the density value small almost everywhere, except when \(v_{ij}^+\) is close to \(\mu\). Therefore small values correspond to globally strong autocorrelation whilst large values allow autocorrelation strength to vary between neighbours.
Neighbourhood matrix
We defined W using a binary approach identifying neighbouring LADs, an established approach in the use of Bayesian spatiotemporal models, which introduces spatial correlation in the random effects between neighbouring LADs. If \(LA_i\) denotes the ith local authority the neighbourhood matrix is such that:
Since the model requires a contiguous geography we defined the neighbours of the Isle of Wight to be those in England with ferry routes to the island, namely Portsmouth, Southampton and the New Forest.
Second neighbour model
To investigate the effect of the neighbourhood matrix specification and better understand the spatiotemporal effects we ran a second model (\(M_1\)) using the same structure as the original model (\(M_0\)) but with a different neighbourhood matrix \(W'\). This was based on the original neighbourhood matrix but with an adaptation made for England’s five largest metropolitan areas, Greater London, Tyneside, Merseyside (excluding The Wirral), Greater Manchester and the West Midlands. Within each of these regions we identified two nonneighbouring LADs as neighbours if they shared a common immediate neighbour also within that region.
Model implementation
The adaptive model was implemented using version 3.3 of the CARBayesST package in R. Since MCMC methods are computationally intensive we ran four MCMC chains in parallel, each with a burnin of 200,000 samples and a further 500,000 samples, thinned by 200 to reduce sampling autocorrelation. This yielded a total of 10,000 samples (2,500 per chain) for inference. All chains used the default fixed hyperparameters whilst regression parameters were given the default normal prior distribution of mean 0 and variance \(10^5\). The regression parameters were sampled using the simple random walk algorithm method within the CARBayesST package.
We checked the posterior distributions for convergence of individual chains using the Geweke diagnostic and for convergence between MCMC chains using the GelmanRubin diagnostic. Our inference is based on the model’s 95% highest posterior density intervals and posterior means.
Spatiotemporal diagnostics
To validate the performance of our models we examined the residuals for spatiotemporal autocorrelation. We tested for spatial autocorrelation using Moran’s I statistic, with the binary neighbourhood matrix. This test was applied for each time period individually and a Bonferroni correction was used for inference. Temporal autocorrelation was tested for using the LjungBox test for time series autocorrelation for each LAD. Again, a Bonferroni correction was used for inference.
Results
After adjusting the data to account for lags between weather data and daily cases we were left with 55 time periods and 312 LADs, giving 17,160 total observations. Key summary statistics on the data are presented in Table 1:
An initial visualisation of the data in Fig. 1 shows fewer cases are regularly reported on Saturday and Sunday of each week, whilst all other days appear to be relatively consistent in reported cases. Geographically, Fig. 2 displays the relative risk across the study period by LAD on a \(\log _2\) scale, so that an increase by one unit corresponds to a doubling in the relative risk. This displays clear clustering in the relative risk with high risk areas including Kent, Manchester and Tyneside, whilst North Lincolnshire and the Southwest of England are areas of low risk.
Model inference
Weather analysis
The posterior distributions in Fig. 3 for each of temperature, wind velocity, solar radiation and precipitation show no strong relationship between the covariate and the number of observed cases. Whilst the 95% HPD for relative humidity lies below 0 the size of the parameter in this interval suggests the effect is marginal, with a 1% rise in humidity corresponding to a fall in the relative risk between 0.1–0.3%.
Spatial analysis
Qualitative analysis of the model residuals suggests it’s largely captured spatial trends in the data as the mean residuals for each LAD have a random pattern, seen in Fig. 4. Quantitatively, using Moran’s I statistic at a 95% significance level with a Bonferroni correction, no time period showed spatial autocorrelation.
The posterior of \(\rho _S\) in Fig. 5, controlling the underlying strength of spatial autocorrelation, is close to 1. This allows for strong spatial autocorrelation in the random effects \(\phi _{kt}\). However, to understand the spatial structure the \(\textbf{w}^+\) elements must also be considered. The variation amongst \(\textbf{w}^+\) elements, controlled by \(\tau _{\omega }^2\), is large taking values in the range of 150–220. This large variation allows for both strong and weak correlations in the neighbourhood matrix posteriors. The posteriors for both \(\rho _S\) and \(\tau _{\omega }^2\) remain similar in the second model.
In both versions of the model the majority of \(\textbf{w}^+\) elements have a posterior density towards 1, a few elements have most of their density towards 0. This demonstrates the models identifying an overall geographical correlation in daily cases, but finding there are a small number of neighbouring LADs that did not influence each other.
Whilst on a national scale the collection of spatial correlations display similar trends between the two models, there are differences at a localised level. In \(M_1\) there are both examples of strong second neighbour correlations and differences in immediate neighbour correlations from \(M_0\).
An example of this is in Hackney, which Fig. 6 shows was strongly correlated with all 9 of its neighbouring LADs in \(M_0\). This gives credence to our assumption of high correlation in urban areas. However, whilst \(M_1\) finds Hackney to be closely correlated with Camden it gives a different view on other neighbours as shown in Fig. 7. For instance, Hackney is mostly uncorrelated with Westminster. In contrast, Fig. 8 shows Hackney is strongly correlated with its Eastern second neighbours Barking & Dagenham and Redbridge, whilst the correlations are less strong with southern second neighbours, in particular Bromley.
Figure 7 also highlights that \(\textbf{w}^+\) elements can have bimodal posteriors, such as the correlation between Hackney and Islington. This style of posteriors allows these LADs to share an overall trend with some elements of independent behaviour. Figure 9 displays samples from the random effect posteriors for Hackney and Islington. Whilst both LADs have an overall decreasing trend in terms of relative risk, Islington has a relatively stationary random effect in days 5–12 and 28–38. Contrastingly, Hackney’s random effect is stationary for a much smaller time, generally decreasing in these periods.
Whilst the localised correlation structures differ between \(M_0\) and \(M_1\), we found the overall regression results were insensitive to the change in neighbourhood matrix. In both models humidity displayed a small effect at a 95% confidence level, with other factors showing no evidence of an association with daily cases.
Temporal analysis
Overall the model captures the temporal trend in the data. Figure 10 shows the mean daily residuals across LADs fluctuating around 0 with no discernible difference between weekdays. The temporal trend within individual LADs is also captured, with the LjungBox test indicating just 4.8% of LADs showing temporal autocorrelation in their residuals at a 95% significance level.
The weekday posteriors in Fig. 11 highlight the trend of fewer cases being reported on weekends. with both Saturday and Sunday having large negative coefficients. Their 95% posteriors are in the ranges (\(\)0.437, \(\)0.272) and (\(\)0.448, \(\)0.263) respectively. This corresponds to, relative to Thursday, 24–35% fewer cases reported on Saturdays with 23–36% fewer cases reported on Sundays. Friday also shows fewer cases reported to a lesser extent, with between 3–15% fewer cases. There is no evidence of consistent case overreporting on any particular day, with all other weekday 95% highest posterior densities containing 0.
The posterior for \(\rho _T\) in Fig. 5 is clustered around 0.85–0.89. This introduces high correlation in the random effects \(\phi _{kt}\). However, the temporal variance between the random effects, controlled by \(\tau ^2\), is relatively small clustered around 0.11–0.14. The effect of these hyperparameters is seen in the Hackney and Islington random effects, where the random effects for each LAD change smoothly and gradually over the 40day period.
Discussion
In this report we aimed to identify relationships between meteorological factors and COVID19 transmission using Bayesian spatiotemporal techniques. We’ve shown this type of model can capture the autocorrelation present in observed cases of a humantransmissible virus. Our results provide evidence for relative humidity being negatively associated with COVID19 transmission but no evidence that other factors are associated with the virus’s spread. We believe that weather conditions are unlikely to critically affect COVID19 transmission.
The model posterior provides evidence for a negative relationship between relative humidity and COVID19 spread, something that is disputed amongst previous studies. For instance, in Germany [13] claimed a positive correlation whilst in the US [14] proposed a negative relationship. More widely, [15] found both low and high ranges of humidity could be associated with increased COVID19 prevalence.
Our findings provide no evidence that other meteorological are associated with COVID19 transmission. For temperature this agrees with the findings of [28] in Spain, despite their study’s use of a nonmonotonic relationship between temperature and observed cases, allowing for a possible ‘peak transmission’ range to be identified.
A possible reason for the lack of meteorological effects is that initially the virus’s spread may have been driven almost entirely by human interaction due to the low immunity in the population. Whilst spatiotemporal effects are used to account for these factors we appeal to George Box’s famous quote that “All models are wrong, but some are useful”. We interpret this as believing the spatiotemporal effects help to understand the spatial and temporal structure of the relative risk. However, the ‘true’ effect on the spread of COVID19 may be so sensitive to slight changes not captured in the model. For instance, the complexities of how society adjusted to lockdowns, the increase in testing capacity during the study period as well as changes in case reporting whilst the infrastructure was developed could all impact the observed case numbers.
The main feature of the adaptive spatiotemporal effect structure is that it allows the model to account for different strengths of spatial autocorrelation across the full geography. As shown in [27] and this report the model design encourages values close to 1, representing correlation, or 0 representing independence. However, by limiting spatial autocorrelation through \(\rho _S\) and allowing for spatial variation through \(\tau _{\omega }^2\) the model remains flexible enough to capture localised trends even when two neighbours are highly correlated.
We found that a change in the neighbourhood matrix allowed us to place different assumptions on the underlying behaviour. In particular, the use of a more complex neighbourhood matrix could enable researchers to discover more complex relationships between LADs. However, since the overall regression results are insensitive to our change we believe it’s not crucial to find a ‘perfect’ neighbourhood matrix to analyse a problem. This allows researchers to make simplifying assumptions on geographical dependencies and design their neighbourhood matrix appropriately on that basis. This could be advantageous when either finding a ‘perfect’ neighbourhood matrix is a complex problem, or using one would put strain on computational resources.
One limitation of our findings is that we have not compared our results with different models, such as a spatial moving average model. One interesting approach would have been to use an Integrated Nested Laplace Approximation (INLA) model. The key advantage of INLA over MCMC methods is its low computational cost, since it focuses on approximating the marginal posteriors. Whilst INLA does not provide the same theoretical guarantees as MCMC, in practise it has proven to be an effective tool. However, one limitation of INLA is that it should only be used with a small number of hyperparameters outside of the linear predictor, otherwise the computational cost savings are lost [29]. In particular, we could not natively use INLA to model the underlying spatial correlations between LADs, as we did using the adaptive prior in CARBayesST. We believe this capability to identify a more complex spatial structure provides greater insight into COVID19 incidence in this scenario.
Overall this report has found that there is no evidence of an association between most meteorological factors and COVID19 spread. In particular, this report shows that COVID19 transmission may be impacted by humidity, but is unlikely to be critically limited by it. As such public policies designed to address COVID19 should not rely on seasonality as part of their strategy.
Conclusion
In this study, we accounted for the nature of the virus in two ways. By introducing a lag between the variables and observed cases we adjusted for the delay between transmission and reporting cases due to an incubation period. Other authors such as [13] have shown how this can be extended to include multiple weighted lags. We also controlled for correlation in virus transmission through the use of spatiotemporal effects. We’ve shown how our model has captured both temporal and spatial trends in daily cases, allowing us to make robust conclusions from the model outputs.
This study focused on isolating the main effects of individual meteorological factors on COVID19 spread. To do so we looked at a narrow time frame from 31st March 2020 until 30th May 2020, covering only the initial phase of the pandemic. This has several important implications regarding the generalisability of the results. Firstly, results are derived from limited ranges for the meteorological variables, in particular temperature, solar radiation and humidity. As such the results are only applicable in countries with similar climates and may not apply to England during other seasons. It would be prudent for future studies to include data from a full year to better assess the effect of meteorological variables.
A more complex version of our model could introduce a spatial, temporal or spatiotemporal regression parameters for each meteorological variable to identify whether any weather conditions have spatial, temporal or temporalspatial associations with the spread of COVID19. However, we believe any associations found would both be difficult to interpret and less likely to generalise outside of the specifics of our study. As such we chose to focus solely on the average effect of each factor.
The introduction and subsequent successful uptake of COVID19 vaccines, allowing the lifting of lockdown restrictions, has dramatically altered where and when the virus can spread. This is likely to have had an impact on how important meteorological factors are associated with COVID19 transmission. In England all legal social restrictions were lifted by the end of February 2022, but it may be of interest to perform similar analyses with 12 months of data where no restrictions were in place. Whilst vaccinations continue during this period the vast majority of adults had received the recommended 2 doses by this point.
Future research could also consider different design principles for the neighbourhood matrix. In this study the neighbourhood matrix was based solely on neighbouring local authorities, the most common approach in the literature. Another simplistic option would be to have a weight between all pairs of local authorities that is inversely related to distance, justified by assuming the further apart two local authorities are the less related they are likely to be. A more complex approach would be to estimate the amount of human traffic between neighbouring local authorities and define weights based on this. Dependent on the alternative approach this may require a different prior structure for the spatiotemporal effects as the adaptive model requires a binary neighbourhood matrix.
Availability of data and materials
The datasets supporting the conclusions of this article are included within the article and its additional files.
Abbreviations
 ECMWF:

European Centre for MediumRange Weather Forecasts
 LAD:

Local Authority District
 LTLA:

Lower Tier Local Authority
 MCMC:

Markov chain Monte Carlo
 ONS:

Office for National Statistics
 S.D.:

Standard deviation
 SSRD:

Surface solar radiation downwards
References
UKGovernment: COVID19 Mental Health and Wellbeing Surveillance: Report. https://www.gov.uk/government/publications/covid19mentalhealthandwellbeingsurveillancereport Accessed 18 Aug 2022.
MoJ: Reducing the backlog in criminal courts. Accessed August 2022; 2022.
NHS: Delivery plan for tackling the covid19 backlog of elective care. Accessed Aug 2022; 2022.
Mahase E. Covid19: death rate is 0.66% and increases with age, study estimates. BMJ Br Med J. 2020;369:m1327.
abcNews: DHS Study Trump Touted on Sunlight, Heat Killing Virus Preliminary, Not Peerreviewed. https://abcnews.go.com/Politics/dhsstudytrumptoutedsunlightheatkillingvirus/story?id=70328629 Accessed 18 Aug 2022.
BBCNews: UK Coronavrus Briefing 22062020. https://twitter.com/BBCNews/status/1275103107232432133. 18 Aug 2022.
SkyNews: Coronavirus: nine times jair bolsonaro dismissed the severity of COVID19. https://news.sky.com/story/coronavirusninetimesjairbolsonarodismissedtheseverityofcovid1912023297. Accessed 18 Aug 2022.
Neher RA, Dyrdak R, Druelle V, Hodcroft EB, Albert J. Potential impact of seasonal forcing on a sarscov2 pandemic. MedRxiv 2020.
Kissler SM, Tedijanto C, Goldstein E, Grad YH, Lipsitch M. Projecting the transmission dynamics of sarscov2 through the postpandemic period. Science. 2020;368(6493):860–8.
Lowen AC, Steel J. Roles of humidity and temperature in shaping influenza seasonality. J Virol. 2014;88:7692–5.
BritishMedicalJournal: Covid19 Will Become Endemic but with decreased potency over time, scientists believe. https://www.bmj.com/content/372/bmj.n494 Accessed 23 Jul 2023.
UnitedNations: WHO Chief Declares End to COVID19 as a Global Health Emergency. https://news.un.org/en/story/2023/05/1136367 Accessed 23 Jul 2023.
Ganegoda NC, Wijaya KP, Amadi M, Erandi K, Aldila D. Interrelationship between daily covid19 cases and average temperature as well as relative humidity in germany. Sci Rep. 2021;11(1):1–16.
Chien LC, Chen LWA, Lin RT. Lagged meteorological impacts on covid19 incidence among highrisk counties in the united statesa spatiotemporal analysis. J Expo Sci Environ Epidemiol. 2022. https://doi.org/10.1038/s4137002100356y.
Ai H, Nie R, Wang X. Evaluation of the effects of meteorological factors on covid19 prevalence by the distributed lag nonlinear model. J Transl Med. 2022;20:170.
Kerr GH, Badr HS, Gardner LM, PerezSaez J, Zaitchik BF. Associations between meteorology and covid19 in early studies: Inconsistencies, uncertainties, and recommendations. One Health. 2021;12: 100225.
Lee D, Rushworth A, Napier G. Spatiotemporal areal unit modeling in r with conditional autoregressive priors using the carbayesst package. J Stat Softw. 2018;84:1–39.
Aswi A, Cramb S, Mengersen K, et al. Effects of climatic factors on dengue incidence: A comparison of bayesian spatiotemporal models. In: Journal of Physics: Conference Series, 2021;1863: 012050. IOP Publishing.
Rouamba T, Samadoulougou S, Tinto H, Alegana VA, KirakoyaSamadoulougou F. Bayesian spatiotemporal modeling of routinely collected data to assess the effect of health programs in malaria incidence during pregnancy in burkina faso. Sci Rep. 2020;10(1):1–15.
Du Z, Lawrence WR, Zhang W, Zhang D, Yu S, Hao Y. Bayesian spatiotemporal analysis for association of environmental factors with hand, foot, and mouth disease in guangdong, china. Sci Rep. 2018;8(1):1–9.
Public Health ENgland: Daily summary: Coronavirus in the UK. https://coronavirus.data.gov.uk/ Accessed 06 Sep 2020.
Office for National Statistics: The open geography portal. https://geoportal.statistics.gov.uk/ Accessed 06 Sep 2020.
Office for National Statistics: Population estimates. https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates Accessed 06 Sep 2020.
Hersbach H, Bell B, Berrisford P, Hirahara S, Horanyi A, MunozSabater J, Nicolas J, Peubey C, Radu R, Schepers D, et al. The era5 global reanalysis. Q J Royal Meteorol Soc. 2020;147(730):1999–2049.
Government U. Coronavirus Press Conferences Quality and Methodology Information Report. https://www.gov.uk/government/publications/coronaviruspressconferencesqualityandmethodologyinformationreport/coronaviruspressconferencesqualityandmethodologyinformationreport Accessed 27 Feb 2023.
Lee D, Rushworth A, Napier G, Pettersson W. Carbayesst version 3.3: Spatiotemporal areal unit modeling in r with conditional autoregressive priors 2022.
Rushworth A, Lee D, Sarran C. An adaptive spatiotemporal smoothing model for estimating trends and step changes in disease risk. J Royal Stat Soc Series C Appl Stat. 2017;66(1):141–57.
BrizRedón Á, SerranoAroca Á. A spatiotemporal analysis for exploring the effect of temperature on covid19 early evolution in spain. Sci Total Environ. 2020;728: 138811.
Rue H, Riebler A, Sørbye SH, Illian JB, Simpson DP, Lindgren FK. Bayesian Computing with INLA: A Review 2016. arXiv:1604.00860
Acknowledgements
The authors thank David Whitney for his help in preprocessing the raw data.
Funding
Nothing to declare.
Author information
Authors and Affiliations
Contributions
TJ conceived the research and cleaned the data. The research was guided by TJ and BL; JM analysed the data and wrote the manuscript. TJ and BL reviewed and edited the manuscript. All authors have read and approved the manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1.
Contains daily case counts for each LAD along with the LAD population from the ONS estimates. LADs are identified by LAD code and name.
Additional file 2.
RMarkdown file containing code to setup, run both and save the main model using parallelisation.
Additional file 3.
RMarkdown file containing code to carry out an exploratory analysis of the prepared data.
Additional file 4.
RMarkdown file containing code to load the cleaned data sources, prepare the data for analysis and load required libraries.
Additional file 5.
RMarkdown file containing code to analyse both models. Includes model diagnostics, analysis of reisduals and investigations into both model parameters and hyperparameters.
Additional file 6.
An R object that can be loaded using the readRDS command. Contains daily measurements of for each LAD. LADs are identified by name and LAD code.
Additional file 7.
RMarkdown file containing code to setup, run and save the extended model incorporating secondorder neighbours. Makes use of parallelisation.
Additional file 8.
RMarkdown file containing code to conduct analysis into the random effects in both models.
Additional file 9.
An R object that can be loaded using the readRDS command. The loaded object is a binary matrix used to define neighbouring LADs in the extended model.
Additional file 10.
An R object that can be loaded using the readRDS command. The loaded object is a binary matrix used to define neighbouring LADs in the original model.
Additional file 11.
A zippedfolder containing the shapefiles required to create the maps of English LADs.
Appendix
Appendix
See Table 2 and Figs. 12 and 13
InverseGamma distribution
A variable X follows an InverseGamma(a, b) distribution if its support is the positive real line and it has a probability density function
where \(\Gamma (\cdot )\) is the Gamma function. Both the shape parameter a and scale parameter b must be positive.
Meteorological variables

Temperature is given at 2metres above the Earth’s surface in Kelvin. In the study we converted this to Celsius.

Humidity is measured by relative humidity.

Wind velocity is given at a height of 10metres above the Earth’s surface.

Solar radiation is defined in terms of surface solar radiation downwards, measured in energy per unit area. Due to the large values, this variable was scaled down by a factor of \(10^6\).

Precipitation is measured as the cumulative depth of daily precipitation spread over the geographical region in metres. Due to the small units we scaled this by 1,000, effectively measuring depth in millimetres.
A word on notation
In section 3.2 \(Y_{kt}\) is a random variable for the number of cases in the \(k\text{th}\) LAD at time t, whereas \(y_{kt}\) refers to the observed number of cases. Similarly, \(\theta _{kt}\) is a random variable for the corresponding relative risk whilst \(\hat{\theta }_{kt}\) is a point estimate of the relative risk.
Convergence diagnostics
Below are the convergence diagnostics for model \(M_0\). The GelmanRubin diagnostic is denoted GR, and \(\text{G}_i\) is the Geweke diagnostic for the \(i{\text{th}}\) chain.
Trace plots
Reading left to right, up to down, the trace plots are the posterior samples for Sunday, Friday, Saturday, Wednesday, Monday and Tuesday.
Reading left to right, up to down, the trace plots are the posterior samples for the intercept, temperature, relative humidity, wind velocity, solar radiation and precipitation.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Mullineaux, J.D., Leurent, B. & Jendoubi, T. A Bayesian spatiotemporal study of the association between meteorological factors and the spread of COVID19. J Transl Med 21, 848 (2023). https://doi.org/10.1186/s12967023044365
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12967023044365