
Correlation of the two most frequent HLA haplotypes in the Italian population to the differential regional incidence of Covid-19



Understanding how HLA polymorphisms may affect susceptibility to Covid-19, as well as the course and severity of the infection, could help both at the clinical level, to identify individuals at higher risk from the disease, and at the epidemiological level, to explain differences in the epidemic trend among countries or even within a single country. In Italy, Covid-19 showed a peculiar geographical distribution, from the most affected northern regions to the southern ones, which were only slightly touched.


In this study we analysed the regional frequencies of the most common Italian haplotypes from the Italian Bone Marrow Donor Registry (HLA-A, -B, -C and -DRB1 at the four-digit level). We then performed Pearson correlation analyses between the estimated regional haplotype frequencies in the population and Covid-19 incidence and mortality.


We found that the two most frequent HLA haplotypes in the Italian population, HLA-A*01:01g-B*08:01g-C*07:01g-DRB1*03:01g and HLA-A*02:01g-B*18:01g-C*07:01g-DRB1*11:04g, had a regional distribution overlapping that of Covid-19 and showed, respectively, a significant positive correlation (suggestive of susceptibility) and a significant negative correlation (suggestive of protection) with both Covid-19 incidence and mortality.


Based on these results, and in order to establish whether these HLA haplotypes are effectively associated with disease susceptibility, the creation of national networks that can collect patients’ samples from all regions for HLA typing should be strongly encouraged.


The novel coronavirus identified in the last months of 2019 (SARS-CoV-2) belongs to the family of human CoVs of zoonotic origin, together with 229E, OC43, HKU1 and NL63, which are community-acquired CoVs, well adapted to humans, that cause mild respiratory diseases. The CoVs causing severe acute respiratory syndrome (SARS-CoV) and Middle East respiratory syndrome (MERS-CoV), which are instead highly pathogenic and cause severe respiratory disease with a markedly high case fatality (9.6% for SARS-CoV and 34.4% for MERS-CoV), also belong to this family [1]. SARS-CoV-2-induced pneumonia, named coronavirus disease 2019 (Covid-19) by the World Health Organization, was declared a pandemic on the 11th of March 2020, following its first appearance in Wuhan, China, in December 2019 [2]. Italy was the first European country to report an outbreak of infections, with two hot spots located in northern Italy, which led to red zones being defined in the Lombardia and Veneto regions, followed by the complete isolation of these areas from the 21st of February onward. The epidemic then spread all over Italy, and on the 9th of March the lockdown was extended to the entire country to further limit its diffusion and avoid the collapse of the public health system [3]. Since then, for many weeks Italy was the country with the highest number of cases and deaths worldwide. As the pandemic evolved, Italy settled in sixth place worldwide for the number of confirmed infections (more than 233,000) and third for the number of deaths (more than 33,300) as of the 31st of May [4]. These numbers could even be an underestimate, because of hidden asymptomatic or pauci-symptomatic individuals who were not tested by swab.
Death counts may also have been underreported, especially at the peak of the emergency, as supported by several studies and by a report analysing mortality during the Covid-19 epidemic published by the National Institute for Social Security (INPS) [5, 6]. During this period, the assistance and monitoring network was unprepared to face a disease still unknown in many respects, hospitals and intensive care units were overcrowded, and many people may have died at home without being tested. A recent observational study conducted in Nembro, a small town in Lombardia, one of the most affected areas of northern Italy, reported an all-cause mortality between January and April 2020 several times higher than that recorded in the same time frame over the previous 8 years, reaching a peak of 154.4 per 1000 person-years in March 2020 vs a range of 1.0 to 21.5 per 1000 person-years between January 2012 and February 2020 [5]. Overall, Covid-19 deaths were mostly observed in males and in older patients with pre-existing comorbidities [7, 8]. However, the still unexplained high case fatality rate reported in Italy for the age group over 60, compared with other countries, has not been shown to be related to demographic characteristics or to the proportion of elderly people in the population [9, 10].

Even though Covid-19 pathogenesis has not yet been fully elucidated, the host antiviral response undoubtedly plays a key role in the course of the disease. Immunopathogenesis and the induction of a proinflammatory cytokine storm are the key events in the progression to severe forms leading to acute respiratory distress syndrome (ARDS). When the adaptive immune response fails to clear the infection, the disease progresses to more severe stages as the virus rapidly spreads to different organs (lungs, intestine, kidney), eliciting massive tissue destruction and a strong inflammatory response. These are mediated by innate immune cells, mainly macrophages and granulocytes, which can lead to a severe or even fatal clinical outcome through multi-organ dysfunction caused by a cytokine storm that spreads throughout the body [11]. High systemic levels of IL-6, IL-7, TNFα, IL-10, G-CSF, MCP-1 and MIP1α have been observed in the blood of Covid-19 patients, correlating with disease severity [12].

A fundamental question that urgently needs an answer is why, even considering only symptomatic patients, the disease progresses to a severe form compromising respiratory function in only a fraction of infected individuals. One factor could be the ability of the immune response to elicit specific antiviral immunity without destroying host tissues, which depends on both environmental and genetic factors. In this context, the human leukocyte antigen (HLA) complex, which is well known to influence the efficacy of T cell recognition of foreign antigens, could play a major role. The presentation of viral antigens on HLA class II molecules by antigen-presenting cells (APCs) is a key event in establishing the antiviral adaptive immune response, in addition to direct HLA class I presentation to cytotoxic CD8 T cells. The HLA locus is the most polymorphic region of the human genome. The polymorphism of HLA proteins controls the repertoire of epitopes that can be bound, thus shaping the immune response profile of an individual [13, 14]. Genetic polymorphisms have been reported to influence population and individual predisposition to multifactorial, autoimmune and infectious diseases [15]. Susceptibility to viral infections such as human immunodeficiency virus (HIV), hepatitis B virus (HBV), hepatitis C virus (HCV) and human papillomavirus (HPV), to name just a few, has been reported to be influenced by specific HLA subtypes [16,17,18,19]. Notably, studies conducted on a large data set of high-resolution HLA-typed individuals revealed significant differences in single-allele and haplotype frequencies among the Italian regions [20]. Understanding how genetic variation in HLA may affect susceptibility to Covid-19, as well as the course and severity of the infection, could help to identify and stratify individuals at higher risk from the disease.
Moreover, at the epidemiological level, it could help explain why the epidemic in Italy, although it spread across the whole country, showed a strong regional pattern, with the northern regions, above all Lombardia, reporting higher rates than the central and southern regions.

On these premises, in this work we performed a geographical epidemiological analysis to determine whether the Italian population carries particularly frequent HLA haplotypes and alleles whose distribution among the Italian regions overlaps with the regional distribution of Covid-19, and thus to formulate and test the hypothesis of their potential association with Covid-19 incidence and severity, in order to identify the sub-populations most susceptible to the infection.


Data sources

For our analyses we used the large dataset of the Italian Bone Marrow Donor Registry (IBMDR), maintained at E.O. Ospedali Galliera di Genova [20]. In the study, we used dataset 2, which consists of a sample of 104,135 donors with available data on their city of birth, typed for HLA-A, -B, -C and -DRB1 at high-resolution (HR) level by ASHI- or EFI-accredited tissue typing laboratories using HR molecular biology techniques (SBT, SSO, SSP, NGS) as described [20]. The data, based on the donor’s birth region, were divided into the 20 geographical regions of Italy, which in alphabetical order are: Abruzzo, Basilicata, Calabria, Campania, Emilia Romagna, Friuli Venezia Giulia, Lazio, Liguria, Lombardia, Marche, Molise, Piemonte, Puglia, Sardegna, Sicilia, Toscana, Trentino Alto Adige, Umbria, Valle d’Aosta and Veneto. The reference datasets are the CWD 2.0.0 catalogue (ASHI CWD) for the worldwide population and the EFI CWD catalogue for the European population [21, 22]. Sardegna and Valle d’Aosta were excluded from the correlation analyses because of their widely recognized genetic differences, including at the HLA locus, with respect to the rest of Italy, due to genetic isolation, as previously reported [23].

With regard to the number of Covid-19 cases and deaths, we used the data collected by the Italian National Institute of Health (Istituto Superiore di Sanità [ISS]), which are reported daily by the Civil Protection Department Headquarters and published at

Moreover, we obtained data on the number of inhabitants of each Italian region, which are freely available from the Italian National Institute of Statistics (ISTAT), the main provider of official statistics in Italy for both citizens and policy-makers.

Statistical analysis

All statistical analyses were performed using R version 4.0.0 (R Core Team) [24]. Pearson correlation coefficients, as a measure of the strength of the linear relationship between two variables (−1 ≤ r ≤ +1), and the accompanying P values were calculated using the package ‘Hmisc’. Correlation plots were generated using the package ‘ggpubr’. P values below 0.05 were considered statistically significant (* < 0.05, ** < 0.01, *** < 0.001).
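The coefficient and its P value can also be reproduced outside R. The sketch below is not the authors’ code: it is a plain-Python illustration, assuming complete paired observations (one value per region), of Pearson’s r and of the conventional two-sided P value from Student’s t with n − 2 degrees of freedom, where t = r·√((n − 2)/(1 − r²)). The t tail probability is expressed through the regularized incomplete beta function, evaluated with the standard continued-fraction method; all function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two paired, equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (n >= 3)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _beta_cf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-30):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even and odd coefficients of the continued fraction for this m
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    # use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where it converges faster
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def pearson_p(r, n):
    """Two-sided P value for H0: rho = 0, with t = r*sqrt((n-2)/(1-r^2)) on n-2 df."""
    if n < 3:
        raise ValueError("n must be >= 3")
    if abs(r) >= 1.0:
        return 0.0
    df = n - 2
    # P(|T| >= |t|) = I_{df/(df+t^2)}(df/2, 1/2), and df/(df+t^2) simplifies to 1 - r^2
    return reg_inc_beta(df / 2.0, 0.5, 1.0 - r * r)
```

If, as the Data sources section implies, 18 regions enter each correlation (20 minus Sardegna and Valle d’Aosta), then df = 16 and |r| must exceed about 0.47 for P < 0.05, which gives a sense of the threshold behind the significance stars.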


Geographical distribution of Covid-19 epidemic in Italy

We analysed the number of Covid-19 cases confirmed by real-time reverse transcriptase–polymerase chain reaction (RT-PCR) assay of nasal and pharyngeal swabs and the number of deaths reported by ISS for each Italian region. The analysis was performed at four meaningful time points of the epidemic: before the start of the lockdown (8th of March), 1 month later during the exponential phase of the epidemic (8th of April), at the end of the lockdown (3rd of May) and 3 weeks later (24th of May) (Table 1). To take the different population sizes into account, the numbers of cases and deaths were normalised to the number of inhabitants of each region (per 100,000 inhabitants), based on ISTAT statistics for 2019 (Additional file 1: Table S1). At every time point, the twenty regions cluster into three groups for numbers of cases and deaths, reflecting their geographical location, with the northern regions showing the most cases and deaths (Fig. 1).
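The normalisation is a per-capita rescaling, so that regions of very different size can be compared on the same scale. A minimal sketch follows; the region names and all counts are hypothetical and are not the values of Table 1 or Table S1.

```python
def per_100k(count, population):
    """Rate per 100,000 inhabitants (count and population refer to the same region)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Hypothetical figures for two regions, for illustration only (not Table 1 values)
regions = {
    "Region X": {"cases": 5_000, "deaths": 900, "population": 10_000_000},
    "Region Y": {"cases": 1_200, "deaths": 60, "population": 4_000_000},
}

rates = {
    name: {
        "cases_per_100k": per_100k(d["cases"], d["population"]),
        "deaths_per_100k": per_100k(d["deaths"], d["population"]),
    }
    for name, d in regions.items()
}
```

Region X has 2.5 times the population of Region Y, so the raw gap of 5,000 vs 1,200 cases shrinks to 50.0 vs 30.0 cases per 100,000 once population size is accounted for.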

Table 1 Regional data relative to the impact of COVID-19 on the Italian population
Fig. 1 Trend over time relative to the number of Covid-19 cases/100,000 inhabitants and deaths/100,000 inhabitants. The graphs report the number of Covid-19 cases/100,000 inhabitants (a) and deaths/100,000 inhabitants (b) at four time points of the epidemic. Red symbols are used for northern regions, blue symbols for central regions and green symbols for southern regions

Regional distribution of most frequent HLA haplotypes

Given the key role of the host immune response against SARS-CoV-2 in the pathogenesis of the disease and the high degree of HLA polymorphism, we next sought to determine, at the general population level, whether there are significant differences among the northern, central and southern regions in the frequency of the most frequent HLA haplotypes in the Italian population. We performed our analyses on the five most common Italian haplotypes as ranked by the Italian Bone Marrow Donor Registry (IBMDR), the most extensive Italian collection, consisting of more than 131,000 individuals typed at high resolution for HLA-A, -B, -C and -DRB1 at the four-digit level. The registry contains complete information on region of origin and ethnic origin for a sample of 104,135 subjects, thus providing a reliable estimate of HLA frequencies within the Italian population [20]. The estimated national frequencies of these haplotypes, calculated with the Arlequin programme using the EM algorithm, sum to 6.9%: HLA-A*01:01g-B*08:01g-C*07:01g-DRB1*03:01g (2.54%); HLA-A*02:01g-B*18:01g-C*07:01g-DRB1*11:04g (1.14%); HLA-A*30:01g-B*13:02g-C*06:02g-DRB1*07:01g (1.09%); HLA-A*29:02g-B*44:03g-C*16:01g-DRB1*07:01g (1.08%); HLA-A*03:01g-B*07:02g-C*07:02g-DRB1*15:01g (1.02%). The regional frequencies estimated from the data set sample are reported in Table 2. We observed that the five most frequent Italian haplotypes were not uniformly distributed across regions and, in some regions, were entirely absent. Sardegna and Valle d’Aosta, which are widely recognized for their genetic difference at the HLA locus with respect to the rest of Italy due to genetic isolation, are reported in the tables to give an overall picture of the geographic distribution of HLA haplotypes in the Italian population, but were excluded from all subsequent correlation analyses [23].
The haplotypes ranked #1 (HLA-A*01:01g-B*08:01g-C*07:01g-DRB1*03:01g) and #2 (HLA-A*02:01g-B*18:01g-C*07:01g-DRB1*11:04g) showed the greatest dispersion from the mean national value and an almost clear-cut clustering of northern, central and southern regions, in opposite directions for #1 and #2 (Fig. 2).
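Haplotype frequencies such as those above have to be estimated because donors are typed as unphased genotypes: a donor heterozygous at two loci is compatible with two different haplotype pairs. As an illustration only, the sketch below implements the expectation-maximisation (EM) idea for two loci in plain Python. It is not Arlequin’s implementation, which handles four loci and further options; it assumes Hardy-Weinberg proportions and no missing or ambiguous allele calls, and the genotype data and function names are hypothetical.

```python
from collections import defaultdict


def phase_pairs(genotype):
    """Distinct haplotype pairs compatible with an unphased two-locus genotype.

    genotype is ((a1, a2), (b1, b2)); each haplotype is an (allele_A, allele_B) tuple.
    """
    (a1, a2), (b1, b2) = genotype
    cis = tuple(sorted([(a1, b1), (a2, b2)]))
    trans = tuple(sorted([(a1, b2), (a2, b1)]))
    return [cis] if cis == trans else [cis, trans]


def em_haplotype_freqs(genotypes, max_iter=200, tol=1e-10):
    """EM estimate of two-locus haplotype frequencies, assuming Hardy-Weinberg proportions."""
    candidates = [phase_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in candidates for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    n_chromosomes = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: weight each phase by the product of current haplotype frequencies.
            # The factor 2 for heterozygous pairs is omitted: only double heterozygotes
            # have two candidate phases, and both are heterozygous, so it cancels.
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            if total == 0.0:
                weights, total = [1.0] * len(pairs), float(len(pairs))
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes
        new_freq = {h: counts[h] / n_chromosomes for h in haps}
        converged = max(abs(new_freq[h] - freq[h]) for h in haps) < tol
        freq = new_freq
        if converged:
            break
    return freq


# Hypothetical toy data: three A*01:01/B*08:01 homozygotes, three A*02:01/B*18:01
# homozygotes and one donor heterozygous at both loci (phase unknown)
toy_genotypes = (
    [(("A*01:01", "A*01:01"), ("B*08:01", "B*08:01"))] * 3
    + [(("A*02:01", "A*02:01"), ("B*18:01", "B*18:01"))] * 3
    + [(("A*01:01", "A*02:01"), ("B*08:01", "B*18:01"))]
)
toy_freqs = em_haplotype_freqs(toy_genotypes)
```

With the toy data, the single double heterozygote is assigned almost entirely to the A*01:01-B*08:01 / A*02:01-B*18:01 phase supported by the phase-known donors, so the estimated A*01:01-B*08:01 frequency converges to about 0.5 (7 of 14 chromosomes). With four loci the number of candidate phases per donor grows quickly, which is one reason dedicated software is used in practice.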

Table 2 Frequencies of the 5 most common haplotypes observed in the Italian population
Fig. 2 Frequencies of the 5 most common haplotypes observed in the Italian population. The horizontal bars indicate the mean national values plus the 95% confidence interval. The # refers to the ranking of the haplotype for frequency in the Italian population. Red symbols are used for northern regions, blue symbols for central regions and green symbols for southern regions

Correlation between regional HLA haplotype frequencies and Covid-19 incidence and mortality

Next, to determine whether the distribution of the most frequent haplotypes overlaps with the incidence of Covid-19 at the regional level, we used Pearson correlations to test whether, and how, the regional frequency of each haplotype in the population correlates linearly with the regional numbers of Covid-19 cases and deaths/100,000 inhabitants. We found that the haplotype ranked #1, HLA-A*01:01g-B*08:01g-C*07:01g-DRB1*03:01g, shows a significant positive correlation (suggestive of susceptibility) with both Covid-19 incidence and mortality. Conversely, the haplotype ranked #2, HLA-A*02:01g-B*18:01g-C*07:01g-DRB1*11:04g, shows a negative correlation (suggestive of protection). These correlations are observed at all the time points analysed except the 8th of March, when the numbers were still too low. Pearson’s correlation coefficients and the corresponding P values for each bivariate analysis are reported in Table 3. For haplotype #1, the distribution is characterized by a clear clustering of the regions into three groups: the northern regions show high haplotype frequencies and the highest incidence and mortality, the central regions intermediate values and the southern regions the lowest values (Fig. 3). The Pearson correlation coefficient between the frequency of the HLA-A*01:01g-B*08:01g-C*07:01g-DRB1*03:01g haplotype and the N° of Covid-19 cases/100,000 inhabitants ranges from 0.34 (8th of March) to 0.75 (8th of April). When the N° of Covid-19 deaths/100,000 inhabitants is taken as the correlated variable, Pearson’s coefficient ranges from 0.24 to 0.57 (Table 3 and Fig. 3).
On the contrary, for haplotype #2, HLA-A*02:01g-B*18:01g-C*07:01g-DRB1*11:04g, the regions are inversely clustered into three groups: the southern regions show higher haplotype frequencies and low numbers of both cases and deaths, whereas the central and northern regions show intermediate and low frequencies, respectively, with progressively higher reported incidence and mortality of Covid-19 (Fig. 4). For this haplotype, the Pearson correlation coefficient between its frequency and Covid-19 incidence ranges from −0.33 (8th of March) to −0.63 (3rd of May). When mortality is taken as the correlated variable, Pearson’s coefficient ranges from −0.28 to −0.51 (Table 3 and Fig. 4).

Table 3 Bivariate correlation analysis between estimated regional haplotype frequencies in the population and COVID-19 incidence and mortality
Fig. 3 Bivariate correlation analysis between the regional frequency of the HLA-A*01:01g-B*08:01g-C*07:01g-DRB1*03:01g haplotype and the N° cases/100,000 inhabitants and N° deaths/100,000 inhabitants. The graphs show the bivariate correlation analysis for the 3rd of May time point. A high HLA-A*01:01g-B*08:01g-C*07:01g-DRB1*03:01g frequency in the population is significantly correlated with a high number of both cases (a) and deaths (b)/100,000 inhabitants

Fig. 4 Bivariate correlation analysis between the regional frequency of the HLA-A*02:01g-B*18:01g-C*07:01g-DRB1*11:04g haplotype and the N° cases/100,000 inhabitants and N° deaths/100,000 inhabitants. The graphs show the bivariate correlation analysis for the 3rd of May time point. A high HLA-A*02:01g-B*18:01g-C*07:01g-DRB1*11:04g frequency in the population is significantly correlated with a low number of both cases (a) and deaths (b)/100,000 inhabitants

The haplotypes ranked #4 (HLA-A*29:02g-B*44:03g-C*16:01g-DRB1*07:01g) and #5 (HLA-A*03:01g-B*07:02g-C*07:02g-DRB1*15:01g) show only a weak, though significant, correlation with the N° of cases, without a clear clustering of the regions into the three areas (north, centre, south), whereas haplotype #3 shows no correlation (Table 3). Since individual alleles occur in thousands of haplotypes in different combinations within the Italian population, and therefore cover a larger percentage of the population, the next step was to determine whether the distribution of the single HLA-A, -B, -C and -DRB1 alleles of haplotypes #1 and #2 also overlaps with the regional distribution of Covid-19. We therefore analysed the estimated frequencies of each allele alone and in all possible double or triple combinations of the four HLA-A, -B, -C and -DRB1 loci considered (Tables 4 and 5). We found that the regional frequencies of the alleles HLA-A*01:01g, HLA-B*08:01g and HLA-DRB1*03:01g were all directly correlated with higher regional Covid-19 incidence and mortality, with the northern regions having higher frequencies of these alleles. The same held for all the allelic combinations, with stronger significance for those containing the HLA-B*08:01g and/or the HLA-DRB1*03:01g allele (Table 6). In contrast, the frequencies of the alleles HLA-B*18:01g, HLA-C*07:01g and HLA-DRB1*11:04g were all inversely related to the number of Covid-19 cases and deaths, with the southern regions having higher frequencies and a lower incidence of, and mortality from, the infection. The same result was observed for all the possible double or triple combinations of the four HLA-A, -B, -C and -DRB1 loci considered (Table 7).
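Single-allele and two- or three-locus frequencies that are consistent with a set of four-locus haplotype frequencies can be obtained by summing over the loci left out. The sketch below illustrates that marginalisation. It is an assumption that Tables 4 and 5 could be reproduced this way (the text does not say how they were computed); the input is assumed to be a complete set of haplotype frequencies summing to 1, and the three haplotypes and their frequencies are hypothetical. The results are allele (gene) frequencies, not the proportion of individuals carrying an allele.

```python
from collections import defaultdict
from itertools import combinations

LOCI = ("A", "B", "C", "DRB1")  # locus order used in every haplotype tuple


def marginal_freqs(hap_freqs, loci_subset):
    """Sum four-locus haplotype frequencies over the loci not in loci_subset."""
    idx = [LOCI.index(locus) for locus in loci_subset]
    out = defaultdict(float)
    for hap, f in hap_freqs.items():
        out[tuple(hap[i] for i in idx)] += f
    return dict(out)


def sub_combination_freqs(hap_freqs, target):
    """Frequency of every single allele and 2- or 3-locus combination of `target`."""
    result = {}
    for k in (1, 2, 3):
        for subset in combinations(LOCI, k):
            key = tuple(target[LOCI.index(locus)] for locus in subset)
            result[subset] = marginal_freqs(hap_freqs, subset).get(key, 0.0)
    return result


# Hypothetical frequencies, chosen only to make the arithmetic easy to follow
toy = {
    ("A*01:01g", "B*08:01g", "C*07:01g", "DRB1*03:01g"): 0.4,
    ("A*02:01g", "B*18:01g", "C*07:01g", "DRB1*11:04g"): 0.3,
    ("A*01:01g", "B*18:01g", "C*07:01g", "DRB1*11:04g"): 0.3,
}
combos = sub_combination_freqs(
    toy, ("A*01:01g", "B*08:01g", "C*07:01g", "DRB1*03:01g")
)
```

Here `combos[("A",)]` is about 0.7 (two toy haplotypes carry A*01:01g), `combos[("A", "B")]` is 0.4, and `combos[("C",)]` is 1.0 because every toy haplotype carries C*07:01g.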

Table 4 Frequencies of the single alleles and allelic combinations of the HLA-A, -B, -C, -DRB1 loci of the haplotype HLA-A*01:01g-B*08:01g-C*07:01g-DRB1*03:01g in the Italian population
Table 5 Frequencies of the single alleles and allelic combinations of the HLA-A, -B, -C, -DRB1 loci of the haplotype HLA-A*02:01g-B*18:01g-C*07:01g-DRB1*11:04g in the Italian population
Table 6 Correlation analysis of the single alleles and allelic combinations of the HLA-A, -B, -C, -DRB1 loci of the haplotype HLA-A*01:01g-B*08:01g-C*07:01g-DRB1*03:01g with N° cases and deaths/100,000 inhabitants
Table 7 Correlation analysis of the single alleles and allelic combinations of the HLA-A, -B, -C, -DRB1 loci of the haplotype HLA-A*02:01g-B*18:01g-C*07:01g-DRB1*11:04g with N° cases and deaths/100,000 inhabitants


In the present study, through a geographical epidemiological analysis, we observed significant regional differences among northern, central and southern Italy in the frequency of the two most common HLA haplotypes in the Italian population, with HLA-A*01:01g-B*08:01g-C*07:01g-DRB1*03:01g (ranked #1 at the national level) showing a decreasing frequency gradient and HLA-A*02:01g-B*18:01g-C*07:01g-DRB1*11:04g (ranked #2) an increasing frequency gradient from North to South. The geographical distribution of these haplotypes overlaps with that of Covid-19 in Italy, with a positive (direct) linear correlation for haplotype #1 and a negative (inverse) one for haplotype #2. In other words, high incidence and mortality were observed in the northern regions, where the population has high frequencies of the HLA-A*01:01g-B*08:01g-C*07:01g-DRB1*03:01g haplotype and of all the allelic combinations of the four HLA-A, -B, -C and -DRB1 loci considered that contain at least one of its alleles, particularly those carrying B*08:01g and DRB1*03:01g, suggestive of a potential ‘susceptibility’ to the disease. Conversely, low incidence and mortality for Covid-19 were observed in the central-southern regions, which have high frequencies of the HLA-A*02:01g-B*18:01g-C*07:01g-DRB1*11:04g haplotype and of its alleles B*18:01g, C*07:01g and DRB1*11:04g in all possible combinations containing at least one of them, suggestive of a potential ‘protection’ from the infection. Hence, the population of central-southern Italy, which shows the highest prevalence of the protective HLA-A*02:01g-B*18:01g-C*07:01g-DRB1*11:04g haplotype and its allelic combinations and, at the same time, the lowest frequencies of the disadvantageous HLA-A*01:01g-B*08:01g-C*07:01g-DRB1*03:01g haplotype and its allelic combinations, could be genetically shielded from Covid-19.
Such findings are only descriptive in nature and need to be validated through retrospective observational case–control studies on HLA-typed Covid-19 patients, comparing the frequencies of the potentially ‘protective’ and ‘unfavourable’ HLA haplotypes and alleles highlighted in the general Italian population with those observed in the Covid-19 patient cohort, in order to establish these HLA polymorphisms as factors effectively associated with disease susceptibility, as has already been done for other viral infections, communicable diseases and autoimmune pathologies [15,16,17,18,19]. Nevertheless, in these pathologies too, geographical epidemiological approaches have given important clues for identifying the sub-populations most susceptible to infection, also taking specific HLA alleles and haplotypes into account as susceptibility parameters [13].

To the best of our knowledge, this is the first study to estimate, through a population frequency analysis, the potential association of specific HLA alleles and haplotypes with the incidence and mortality of Covid-19. Although the primary purpose of a bone marrow registry is to increase the chances of finding compatible allogeneic donors for transplants, it is also a unique source of valuable HLA data from the widest and most representative sample available at the national level, which makes it possible to reliably estimate haplotype frequencies in a given population and to carry out association studies in many disease contexts. We conducted our study on a large sample of 104,135 subjects typed at high-resolution (four-digit) level and subdivided into the 20 Italian regions, with regional sample sizes statistically representative of the resident population of each region [20].

Our study is the first to propose HLA as a susceptibility marker for SARS-CoV-2 infection and to highlight its potential impact on the epidemic trend within a specific country, Italy, which has been hit particularly hard. However, similar associations may also be observed within other countries, bringing to light common genetic patterns or new country-specific protective or unfavourable HLA polymorphisms that could explain some of the differences observed in the epidemic from one country to another. Such geographical epidemiological studies, conducted at the general population level, need to be confirmed in cohorts of asymptomatic, mildly symptomatic and severely affected Covid-19 patients before firm conclusions can be drawn, with important implications not only at the epidemiological level but also at the clinical one. Indeed, particular HLA haplotypes/alleles could be associated with a stronger immune response and hence a better host response to the virus. Some useful information can also be inferred from previous research on SARS and MERS, where several HLA polymorphisms have been reported to be associated with SARS susceptibility (HLA-B*46:01, HLA-B*07:03, HLA-DRB1*12:02 and HLA-Cw*08:01) [25,26,27]. Conversely, the allelotypes HLA-DR*03:01, HLA-Cw*15:02 and HLA-A*02:01 seem to be protective against SARS infection [28]. HLA-DRB1*11:01 and HLA-DQB1*02:02 are associated with susceptibility to MERS-CoV infection [29]. On these premises, it is conceivable that several HLA associations could also be unfavourable or protective for the course of Covid-19.

Very recent studies have employed different bioinformatic approaches to predict the best SARS-CoV-2-derived B and T cell epitopes and their associated HLA alleles, which may help to design effective vaccines and find protective antibodies [30,31,32,33,34,35]. Using HLA binding affinity prediction tools, it has been observed that HLA-A and HLA-C alleles showed, respectively, the greatest and the least capacity to present SARS-CoV-2 antigens. However, depending on the specific study and the bioinformatic approach used, the best and worst predicted presenters of conserved peptides are not the same. We found that the alleles analysed in our study are present in the database recently made available by Nguyen et al., which lists 32,257 peptides, 8 to 12 residues long, from the SARS-CoV-2 proteome together with their binding affinity to 145 different HLA-A, -B and -C alleles, as predicted by bioinformatic tools [30]. In particular, all the alleles highlighted in our study have been predicted to have an overall good capacity to present viral peptides, independently of their potential correlation with Covid-19 regional incidence and mortality, with HLA-A*02:01 being the best (1062 total peptides, 268 with a very high binding affinity, < 50 nM), followed by HLA-B*08:01 (225 total, 25 high affinity), HLA-A*01:01 (183 total, 44 high affinity), HLA-B*18:01 (101 total, 12 high affinity) and HLA-C*07:01 (44 total, 4 high affinity) (Additional file 1: Fig. S1) [30].
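The per-allele tallies quoted above (total predicted peptides and the subset below 50 nM) amount to filtering and counting a table of peptide-allele predictions. The sketch below assumes a hypothetical record layout of (allele, peptide, predicted affinity in nM). The 500 nM cut-off for counting a peptide as a binder is an assumption for illustration only, since the text does not state the threshold behind the ‘total’ counts, and the rows are invented.

```python
from collections import Counter


def tally_binders(predictions, binder_nm=500.0, strong_nm=50.0):
    """Count predicted binders and strong binders per HLA allele.

    predictions: iterable of (allele, peptide, affinity_nm) tuples; a lower
    affinity_nm value means tighter predicted binding.
    binder_nm: assumed cut-off for counting a peptide as a binder (hypothetical).
    strong_nm: cut-off for 'very high' affinity (50 nM, as quoted in the text);
    it is assumed to be lower than binder_nm.
    """
    total, strong = Counter(), Counter()
    for allele, _peptide, affinity_nm in predictions:
        if affinity_nm < binder_nm:
            total[allele] += 1
            if affinity_nm < strong_nm:
                strong[allele] += 1
    return {allele: (total[allele], strong[allele]) for allele in total}


# Invented rows, for illustration only
rows = [
    ("HLA-A*02:01", "pep1", 12.0),
    ("HLA-A*02:01", "pep2", 310.0),
    ("HLA-A*02:01", "pep3", 950.0),
    ("HLA-B*08:01", "pep4", 44.0),
]
counts = tally_binders(rows)  # {'HLA-A*02:01': (2, 1), 'HLA-B*08:01': (1, 1)}
```

Keeping both thresholds as parameters makes it easy to check how sensitive such per-allele rankings are to the chosen cut-offs.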

It is important to note that all bioinformatic predictions of SARS-CoV-2 epitopes and their HLA binding are purely theoretical and thus need to be experimentally validated, both in in vitro binding assays and for their ability to effectively elicit T and B cell-mediated responses. Indeed, it is widely recognised that antigenicity, immunogenicity and, for T cells, TCR avidity for the antigen/HLA complex, and hence the functional immune responses elicited, are not directly related to peptide binding affinity [36, 37]. No information is available to date on the binding of HLA class II molecules, whose polymorphic variants could play a relevant role in orchestrating a functional adaptive immune response.

Undoubtedly, the method of analysis used in our study has some limitations and could be affected by an inevitable selection bias, since it takes into consideration the region of birth of the typed individuals but not their region of residence, whereas Covid-19 infections are reported by the region where the infection occurred, independently of birthplace. However, we can reasonably exclude an influence of migration flows (which in Italy have historically been directed from the southern to the northern regions) on the regional frequencies used in our computations, since these frequencies are equivalent to those from previous studies with information on both region of birth and region of residence and are therefore, thanks to the large size of the regional subgroups analysed, independent of migratory movements [38, 39]. The information on Covid-19 cases and deaths relies on public resources, updated daily on the basis of swabs that tested positive for the virus by RT-PCR at the accredited regional centres, followed by confirmatory testing at the Italian National Institute of Health in Rome. As reported above, these values could be underestimates for several reasons, such as a stringent testing policy limited to severely affected symptomatic individuals that excluded most asymptomatic ones from testing, a shortage of testing materials at the peak of the emergency and limited access to overcrowded hospital facilities, to name just a few. Notably, a higher overall mortality rate than in previous years, indicative of both the direct and indirect disease burden, was observed in Nembro, a small town in the Lombardia region, and has also been highlighted by a recent report published by the Italian National Institute for Social Security [5, 6].

Apart from its epidemiological value in tracing the distribution of Covid-19 and understanding its immunopathogenesis, the identification of specific HLA haplotypes as potential risk, susceptibility or protective biomarkers could be of great help in stratifying the population, in order to identify the patients at greater risk of developing a severe infection and thus to adopt appropriate preventive strategies and early interventions.

It is important to note that the HLA region is characterised by extensive linkage disequilibrium; therefore, other genes located very close to HLA could ultimately be responsible for the association with the regional distribution of Covid-19. Genetic polymorphisms in the HLA locus or in other genes encoding key components of the immune-inflammatory response observed in SARS-CoV-2 infection (KIR receptors, inflammasome components, cytokines and chemokines such as CXCL10) may help to explain the highly variable spectrum of disease manifestations, progression and outcome (from asymptomatic to mildly or moderately symptomatic and severely affected patients requiring intensive care and respiratory support).
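Linkage disequilibrium between an allele A at one locus and an allele B at another is commonly summarised by D = p_AB − p_A·p_B and by Lewontin’s normalised D′ = D/D_max, where D_max depends on the sign of D. This is a minimal sketch with hypothetical frequencies, not estimates from this study; the sign of D′ is kept here, although many reports give |D′|.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D') for two alleles at different loci.

    p_ab: frequency of the haplotype carrying both alleles;
    p_a, p_b: frequencies of the two alleles. All values must lie in [0, 1].
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


# Hypothetical allele and haplotype frequencies
d, d_prime = linkage_disequilibrium(p_ab=0.08, p_a=0.10, p_b=0.12)
```

With these toy values D ≈ 0.068 and D′ ≈ 0.77: the two alleles travel together far more often than the 0.012 expected under independence, which is why an association observed for one allele may in fact reflect a neighbouring variant on the same haplotype.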

With this in mind, even though current knowledge is still limited to a few studies, some susceptibility markers other than HLA have been proposed for Covid-19. An association with ABO blood groups has been observed in a cohort of Chinese patients, with blood groups A and O at the highest and lowest risk of infection, respectively, as previously reported for other viral infections [40]. This observation was confirmed in a genome-wide association study of Spanish and Italian patient cohorts, which reported a skewed distribution of ABO blood groups among Covid-19 patients who suffered from respiratory failure, whereas no significant association was found between HLA polymorphisms in Covid-19 patients and respiratory failure (oxygen supplementation or mechanical ventilation) [41]. To the best of our knowledge, this is the only study available to date that considers the association between HLA polymorphisms and Covid-19 severity, but it is important to note that it was performed in a limited Italian population, including only patients from the Lombardia region, without taking geographical patterns of HLA distribution into account. Genetic polymorphisms of key genes of the virus entry machinery (Ace2, Tmprss2, CtsB and CtsL) or of the inflammatory/immune response (e.g. cytokines and their receptors), as well as epigenetic mechanisms, may also influence susceptibility to the virus and the severity/outcome of the infection among different individuals and populations [42, 43]. Indeed, a novel susceptibility locus containing a cluster of six genes (SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6 and XCR1) on chromosome 3p21.31, most of which are involved in the regulation of the inflammatory and immune response, has been identified [41].

We recognize that other factors, such as climatic differences, pollution and the lockdown, which limited the spread of the virus from North to South, could be responsible, alone or in combination with genetic factors, for the different Covid-19 infection rates among Italian regions. The potential association we report between two haplotypes and the differential regional incidence and mortality of Covid-19 in Italy may explain, from the standpoint of the genetic diversity of the Italian population, why the epidemic hit the northern regions so hard while having a small impact on the central-southern ones, a pattern that cannot be explained on the basis of population, urban density, movements to and from large urban and industrial areas, pollution or climate alone. Indeed, several central and southern metropolitan areas, such as Rome, Naples, Bari and Palermo (located in Lazio, Campania, Puglia and Sicilia, respectively), have an urban density comparable to, or even higher than (Naples), that of Milan and Lombardia, atmospheric levels of PM10, PM2.5 and NO2 above threshold values, and high mobility flows on public transport [44]. Furthermore, climatic variation within Italy is very limited and not comparable to that occurring in larger countries such as China, the US or Brazil [45].

Because our correlation analysis between regional HLA frequencies and Covid-19 case and death counts was carried out at different time points during the epidemic, it also takes into account the potential effects of the displacement of thousands of students and workers living away from home, who moved from the northern regions (mainly Lombardia, the epicentre of the epidemic) to the southern ones (Campania, Puglia, Calabria, Sicilia). This displacement occurred in two large waves: the night before the start of the lockdown (the 9th of March, totally uncontrolled) and at the end of the lockdown (after the 3rd of May, with some monitoring from region to region). These exoduses, especially the first, uncontrolled one, could have triggered an outbreak in the southern Italian regions despite the mobility restrictions and contact reduction in place. That this did not happen makes the hypothesis of a protective genetic background in the populations of central-southern Italy even more plausible.
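
The time-resolved correlation described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code (the reference list cites R [24] for the statistics): the region labels and all numbers in the usage block are hypothetical placeholders, not IBMDR or Covid-19 surveillance data, and the helper names are illustrative. Only the Pearson coefficient and its t statistic (n - 2 degrees of freedom) are computed; a p-value would additionally require the t distribution, for example via R's `cor.test` or `scipy.stats.pearsonr`.

```python
import math
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (at least 3 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic with n - 2 degrees of freedom for testing H0: rho = 0."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def correlate_over_time(
    haplotype_freq: Dict[str, float],
    outcome_by_date: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Hypothetical helper: correlate the regional frequency of one haplotype
    with a regional outcome (e.g. case or death counts) at each snapshot date."""
    regions = sorted(haplotype_freq)  # same region order for both vectors
    freqs = [haplotype_freq[region] for region in regions]
    return {
        date: pearson_r(freqs, [outcome[region] for region in regions])
        for date, outcome in outcome_by_date.items()
    }


if __name__ == "__main__":
    # Hypothetical toy values for five made-up regions; NOT real data.
    freq = {"R1": 0.030, "R2": 0.025, "R3": 0.020, "R4": 0.012, "R5": 0.008}
    outcome = {
        "snapshot_1": {"R1": 410.0, "R2": 350.0, "R3": 220.0, "R4": 90.0, "R5": 60.0},
        "snapshot_2": {"R1": 520.0, "R2": 430.0, "R3": 300.0, "R4": 140.0, "R5": 95.0},
    }
    for date, r in correlate_over_time(freq, outcome).items():
        print(f"{date}: r = {r:.3f}, t = {t_statistic(r, len(freq)):.2f}")
```

Repeating the same correlation at each snapshot, rather than pooling dates, is what lets a later snapshot reflect population movements such as those described above.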

Genetic variations and HLA polymorphisms cannot by themselves explain other significant features of Covid-19, such as the higher mortality observed in men than in women (2.8% vs 1.7%) or the higher morbidity and mortality in older than in younger people [46,47,48]. However, significant immunological differences exist among these groups, and such differences could depend on HLA polymorphisms and, more generally, on the genetic, hormonal and metabolic background. Indeed, HLA genes are involved in the decline of the T cell-mediated antiviral response observed with aging.

Conclusions

Our study proposes for the first time that some HLA polymorphisms in the Italian population may be associated with the differing regional incidence and mortality of Covid-19, likely by eliciting a more effective and powerful antiviral response, with the central-southern regions being the most protected from the epidemic. Such evidence, obtained at the general population level, needs to be confirmed in retrospective case–control studies on large cohorts of Covid-19 patients from all Italian regions in order to define HLA polymorphisms as a factor involved in disease susceptibility. Moreover, since bioinformatic predictions of HLA-viral peptide binding affinity alone are of limited functional significance, it is essential to determine, through appropriate in vitro and in vivo studies, whether such HLA loci are actually associated with the induction of protective T and B cell-mediated antiviral immunity. Research efforts aimed at exploring genetic associations with the immune response in Covid-19 could be particularly useful at both the epidemiological and the clinical level: to identify the patients most at risk of developing severe complications, who should therefore have priority access to vaccination once it becomes available, and to evaluate the differential efficacy of vaccination in subjects with different HLA genetic backgrounds. HLA typing, which can easily be performed with cost-efficient methods, even alongside Covid-19 testing, should therefore be planned and encouraged at the clinical level and by policy makers, through the creation of a national network that collects DNA samples from patients in all regions.

Availability of data and materials

Most of the data used in this study are freely available from the source cited. Some data are available upon reasonable request from the corresponding authors.

Abbreviations

ARDS: Acute respiratory distress syndrome
HLA: Human leukocyte antigen
HIV: Human immunodeficiency virus
HBV: Human hepatitis B virus
HCV: Human hepatitis C virus
HPV: Human papilloma virus
INPS: National Institute for Social Security
ISS: Istituto Superiore di Sanità (Italian National Institute of Health)
ISTAT: Italian National Institute of Statistics
RT-PCR: Reverse transcriptase–polymerase chain reaction

References

1. de Wit E, van Doremalen N, Falzarano D, Munster VJ. SARS and MERS: recent insights into emerging coronaviruses. Nat Rev Microbiol. 2016;14:523–34.

2. Zhou P, Yang XL, Wang XG, et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;579:270–3.

3. Sebastiani G, Massa M, Riboli E. Covid-19 epidemic in Italy: evolution, projections and impact of government measures. Eur J Epidemiol. 2020;35:341–5.

4. Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis. 2020;20:533–4.

5. Piccininni M, Rohmann JL, Foresti L, Lurani C, Kurth T. Use of all cause mortality to quantify the consequences of covid-19 in Nembro, Lombardy: descriptive study. BMJ. 2020;369:m1835.

6. National Institute for Social Security (INPS). Analysis of mortality in the period of epidemic from COVID-19. 20 May 2020. Nota_CGSA_mortal_Covid19_def.pdf.

7. Distante C, Piscitelli P, Miani A. Covid-19 outbreak progression in Italian regions: approaching the peak by the end of March in Northern Italy and First Week of April in Southern Italy. Int J Environ Res Public Health. 2020;17:E3025.

8. Grasselli G, Zangrillo A, Zanella A, et al. Baseline characteristics and outcomes of 1591 patients infected with SARS-CoV-2 admitted to ICUs of the Lombardy Region, Italy. JAMA. 2020;323:1574–81.

9. Onder G, Rezza G, Brusaferro S. Case-fatality rate and characteristics of patients dying in relation to COVID-19 in Italy. JAMA. 2020.

10. Natale F, Ghio D, Tarchi D, Goujon A, Conte A. COVID-19 cases and case fatality rate by age. European Commission Knowledge for policy.

11. Huang C, Wang Y, Li X, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395:497–506.

12. Zhou F, Yu T, Du R, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020;395:1054–62.

13. Blackwell JM, Jamieson SE, Burgner D. HLA and infectious diseases. Clin Microbiol Rev. 2009;22:370–85.

14. Crux NB, Elahi S. Human leukocyte antigen (HLA) and immune regulation: how do classical and non-classical HLA alleles modulate immune response to human immunodeficiency virus and hepatitis C virus infections? Front Immunol. 2017;8:832.

15. Matzaraki V, Kumar V, Wijmenga C, Zhernakova A. The MHC locus and genetic susceptibility to autoimmune and infectious diseases. Genome Biol. 2017;18:76.

16. Pereyra F, Jia X, McLaren PJ, et al.; International HIV Controllers Study. The major genetic determinants of HIV-1 control affect HLA class I peptide presentation. Science. 2010;330:1551–7.

17. Nishida N, Ohashi J, Khor S, Sugiyama M, Tsuchiura T. Understanding of HLA-conferred susceptibility to chronic hepatitis B infection requires HLA genotyping-based association analysis. Sci Rep. 2016;6:24767.

18. Zhu M, Dai J, Wang C, et al. Fine mapping the MHC region identified four independent variants modifying susceptibility to chronic hepatitis B in Han Chinese. Hum Mol Genet. 2015;25:1225–32.

19. Duggal P, Thio CL, Wojcik GL, et al. Genome-wide association study of spontaneous resolution of hepatitis C virus infection: data from multiple cohorts. Ann Intern Med. 2013;158:235–45.

20. Sacchi N, Castagnetta M, Miotti V, Garbarino L, Gallina A. High-resolution analysis of the HLA-A, -B, -C and -DRB1 alleles and national and regional haplotype frequencies based on 120 926 volunteers from the Italian Bone Marrow Donor Registry. HLA. 2019;94:285–95.

21. Sanchez-Mazas A, Nunes JM, Middleton D, et al. Common and well-documented HLA alleles over all of Europe and within European sub-regions: a catalogue from the European Federation for Immunogenetics. HLA. 2017;89:104–13.

22. Mack SJ, Cano P, Hollenbach JA, et al. Common and well documented HLA alleles: 2012 update to the CWD catalogue. Tissue Antigens. 2013;81:194–203.

23. Fiorito G, Di Gaetano C, Guarrera S, et al. The Italian genome reflects the history of Europe and the Mediterranean basin. Eur J Hum Genet. 2016;24:1056–62.

24. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2020.

25. Lin M, Tseng H-K, Trejaut JA, et al. Association of HLA class I with severe acute respiratory syndrome coronavirus infection. BMC Med Genet. 2003;4:9.

26. Chen YM, Liang SY, Shih YP, et al. Epidemiological and genetic correlates of severe acute respiratory syndrome coronavirus infection in the hospital with the highest nosocomial infection rate in Taiwan in 2003. J Clin Microbiol. 2006;44:359–65.

27. Keicho N, Itoyama S, Kashiwase K, et al. Association of human leukocyte antigen class II alleles with severe acute respiratory syndrome in the Vietnamese population. Hum Immunol. 2009;70:527–31.

28. Wang SF, Chen KH, Chen M, et al. Human-leukocyte antigen class I Cw 1502 and class II DR 0301 genotypes are associated with resistance to severe acute respiratory syndrome (SARS) infection. Viral Immunol. 2011;24:421–6.

29. Hajeer AH, Balkhy H, Johani S, et al. Association of human leukocyte antigen class II alleles with severe Middle East respiratory syndrome-coronavirus infection. Ann Thorac Med. 2016;11:211–3.

30. Nguyen A, David JK, Maden SK, et al. Human leukocyte antigen susceptibility map for SARS-CoV-2. J Virol. 2020.

31. Ahmed SF, Quadeer AA, McKay MR. Preliminary identification of potential vaccine targets for the COVID-19 coronavirus (SARS-CoV-2) based on SARS-CoV immunological studies. Viruses. 2020.

32. Barquera R, Collen E, Di D, et al. Binding affinities of 438 HLA proteins to complete proteomes of seven pandemic viruses and distributions of strongest and weakest HLA peptide binders in populations worldwide. HLA. 2020 (online ahead of print, 31 May 2020).

33. Baruah V, Bose S. Immunoinformatics-aided identification of T cell and B cell epitopes in the surface glycoprotein of 2019-nCoV. J Med Virol. 2020;92:495–500.

34. Bhattacharya M, Sharma AR, Patra P, et al. Development of epitope-based peptide vaccine against novel coronavirus 2019 (SARS-COV-2): immunoinformatics approach. J Med Virol. 2020.

35. Grifoni A, Sidney J, Zhang Y, Scheuermann RH, Peters B, Sette A. A sequence homology and bioinformatic approach can predict candidate targets for immune responses to SARS-CoV-2. Cell Host Microbe. 2020;27:671–680.e2.

36. Bihl F, Frahm N, Di Giammarino L, et al. Impact of HLA-B alleles, epitope binding affinity, functional avidity, and viral coinfection on the immunodominance of virus-specific CTL responses. J Immunol. 2006;176:4094–101.

37. Wang S, Li J, Chen X, Wang L, Liu W, Wu Y. Analyzing the effect of peptide-HLA-binding ability on the immunogenicity of potential CD8+ and CD4+ T cell epitopes in a large dataset. Immunol Res. 2016;64:908–18.

38. Rendine S, Borelli I, Barbanti M, Sacchi N, Roggero S, Curtoni ES. HLA polymorphisms in Italian bone marrow donors: a regional analysis. Tissue Antigens. 1998;52:135–46.

39. Amoroso A, Ferrero NM, Rendine S, Sacchi N. Le caratteristiche HLA della popolazione Italiana: analisi di 370.000 volontari iscritti all'IBMDR [The HLA characteristics of the Italian population: analysis of 370,000 volunteers enrolled in the IBMDR]. Analysis. 2010;23:1–2.

40. Zietz M, Tatonetti NP. Testing the association between blood type and COVID-19 infection, intubation, and death. Preprint. medRxiv.

41. Ellinghaus D, Degenhardt F, Bujanda L, et al. Genomewide association study of severe Covid-19 with respiratory failure. N Engl J Med. 2020.

42. Cao Y, Li L, Feng Z, et al. Comparative genetic analysis of the novel coronavirus (2019-nCoV/SARS-CoV-2) receptor ACE2 in different populations. Cell Discov. 2020;6:11.

43. Paniri A, Hosseini MM, Akhavan-Niaki H. First comprehensive computational analysis of functional consequences of TMPRSS2 SNPs in susceptibility to SARS-CoV-2 among different populations. J Biomol Struct Dyn. 2020;15:1–18.

44. ISPRA. XIV Rapporto Qualità dell'ambiente urbano [14th report on urban environmental quality]. Stato dell'Ambiente 82/2018. ISBN: 978-88-448-0926-3.

45. Benedetti F, Pachetti M, Marini B, Ippodrino R, Gallo RC, Ciccozzi M, Zella D. Inverse correlation between average monthly high temperatures and COVID-19-related death rates in different geographical areas. J Transl Med. 2020;18:251.

46. Epidemiology Working Group for NCIP Epidemic Response. The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19) in China. Chin J Epidemiol. 2020;41:145–51.

47. Sharifi N, Ryan CJ. Androgen hazards with COVID-19. Endocr Relat Cancer. 2020;27:E1–3.

48. Jin JM, Bai P, He W, et al. Gender differences in patients with COVID-19: focus on severity and mortality. Front Public Health. 2020;8:152.

Acknowledgements

We wish to thank for helping us establish the network of collaborations.

Funding

This study was supported by funds provided by University of Salerno (FARB to RM). SP was funded by Associazione Italiana per la Ricerca sul Cancro (AIRC) and Fondazione Cariplo (AIRC TRIDEO No. 17216).

Author information

Authors and Affiliations


Contributions

SP and RM conceived and designed the study. SP, MC, MA, MC and AMG acquired and managed the data and ran some preliminary analyses. SP, JD, RM analysed the data and created the figures. SP, JD and RM interpreted the results. CV contributed to the development and conduct of the study. SP and RM drafted the first version of the manuscript. All coauthors provided critical comments and approved the final version of the manuscript.

Corresponding authors

Correspondence to Simona Pisanti or Rosanna Martinelli.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Table S1.

Regional statistics for the Italian population (2019 data from ISTAT).
Figure S1. The histograms report the number of peptides with high binding affinity (< 50 nM) predicted by Nguyen et al. 2020 to bind the most frequent HLA-A, -B and -C alleles worldwide (public database at

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Pisanti, S., Deelen, J., Gallina, A.M. et al. Correlation of the two most frequent HLA haplotypes in the Italian population to the differential regional incidence of Covid-19. J Transl Med 18, 352 (2020).

