What is the history of the H3 numbering system?

Where possible, researchers tend to refer to amino acid positions on the influenza hemagglutinin protein by two numbers: one based on the protein’s own sequence, and one based on the “H3 numbering system”. What exactly is the “H3 numbering system”? Is there a rational reason for using such a numbering system, or is this simply an artefact of history?

To answer this question, I did a bit of digging around on the history of the H3 numbering system. As it turns out, the numbering system is really an artefact of history.

Back in the 80s, when sequencing began to be used as a tool for comparative analysis, groups at the MRC (Cambridge, UK) and at Harvard needed a way of establishing a reference system. Since not much data were available for each subtype, researchers defaulted to an earlier virus – A/Aichi/2/68, which happened to be an H3 virus. Let me show you what I’ve found based on a backward citation trace.

The latest mention I can find of the H3 numbering system that does not explicitly use the phrase “H3 numbering system” is by Robertson et al., 1987, in the Journal of General Virology (link):

The amino acids are numbered to correspond to the H3 subtype HA following alignment of the cysteine residues as described (Winter et al., 1981).

Digging back into Winter et al., 1981, from Nature (link):

The precise position of gaps is rather subjective, but was finally decided after inspection of the three-dimensional structure of the H3 subtype [23]. Gaps were then introduced into regions which, by analogy with the H3 subtype, might be expected to be in surface loops. The numbering system is based on consecutive amino acid residues (ignoring gaps) starting at the terminus of the H3 subtype (A/Aichi/2/68). Thus, residue 160 of A/PR/8/34 is actually residue 157 from its N terminus, but is homologous to 160 of A/Aichi/2/68.
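To make the quoted scheme concrete, here is a minimal Python sketch of how a position in another HA could be translated into H3 numbering once the two sequences have been aligned. The function name `h3_numbering` and the toy sequences are hypothetical, made up purely for illustration. They are not real HA sequences, and the alignment itself (including the subjective gap placement the authors mention) is assumed to have been done already.

```python
def h3_numbering(aligned_ref: str, aligned_query: str, gap: str = "-") -> dict:
    """Map each query residue's own position to the reference position.

    Both inputs are rows from the same pairwise alignment, so they must
    have equal length. Positions are 1-based and count residues only,
    ignoring gaps, as in the quoted description.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")

    mapping = {}
    ref_pos = 0    # residues seen so far in the reference (H3) row
    query_pos = 0  # residues seen so far in the query row
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != gap:
            ref_pos += 1
        if query_char != gap:
            query_pos += 1
            # A query residue sitting opposite a reference gap has no
            # reference number of its own, so it is left out of the map.
            if ref_char != gap:
                mapping[query_pos] = ref_pos
    return mapping


# Toy alignment (made-up sequences): the query lacks two residues
# that the reference has, so its later residues shift by two.
toy_ref = "MKTCIALGHC"
toy_query = "MK--IALGHC"
toy_map = h3_numbering(toy_ref, toy_query)
print(toy_map[8])  # the query's 8th residue lines up with reference position 10
```

This is the same bookkeeping behind the quoted example, where residue 157 of A/PR/8/34 (counted from its own N terminus) is reported as residue 160 in H3 numbering. How to label query residues that fall opposite a gap in the reference is a design choice that this sketch sidesteps by simply skipping them.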

Aha! We’re getting somewhere specific now! For interest, here’s the crystal structure taken from the original paper that described the HA protein structure (by Wilson, Skehel and Wiley, also in Nature in 1981) (link):

[Image: crystal structure of the HA protein, from the original structure paper]
Isn’t the crystal structure just gorgeous?

Continuing with the original questions I posed above: what exactly is the “reference” virus on which the H3 numbering system is based? It turns out it’s a human H3 virus: A/Aichi/2/68. Was there any reason for picking this virus? No idea, but I decided to continue the backward citation trace.

Digging back into Wiley, Wilson and Skehel, 1981, in Nature (link):

Fourteen amino acid substitutions were detected in the HA1 polypeptide component of A/memphis/102/72 relative to that of the A/Aichi/2/68 virus and a further 16 in HA1 of A/Victoria/3/75.

Great, so it was these guys who decided to use A/Aichi/2/68. A structure was available for the 1968 virus, but another, earlier virus, A/Japan/305/57 (H2), was also mentioned in the paper. Why didn’t they use the H2 virus instead?

One reference to this virus, published in the book “Structure and Variation in Influenza Virus”, was not searchable online. However, the other reference is a Nature article, coming from a time when cloning something would land you a Nature article. The HA that was cloned came from recombinant viruses involving (you guessed it) A/Aichi/2/68!

Let’s take a look at Gething et al., 1980, Nature (link):

The influenza haemagglutinins chosen for analysis were from the high-yielding recombinant virus strains A/Japan/305/57:A/Bel/42:A/PR/8/34(H2N1) [12] and X-31 (H3N2) [13]. These strains contain haemagglutinin genes from the originally isolated strains A/Japan/305/57 (H2) and A/Aichi/2/68 (H3).

At that time, the cysteine residues held a lot of interest for the researchers, so much so that they published a figure with all of the cysteine residues present.

[Image: published figure showing the positions of the cysteine residues]
An example of a really effective visualization. You can clearly see, immediately, where the cysteine residues line up.


It’s now beginning to make sense why the alignments by Robertson were done based on the positions of the cysteine residues. Recalling some basic biochemistry, cysteine residues are responsible for disulphide bridges, which means they form important anchors in the resulting protein structure. By the way, should I spell it as disulphide or disulfide? I will go with -ph-, since the authors used the British “haemagglutinin”.

The citation trace stops here, as I could not retrieve references 12 and 13 from Gething et al. based on a cursory search on Google Scholar. I think this also means that the Japan H2 virus did not have a crystal structure available, while the Aichi H3 virus did.

In addition to the crystal structure, the amino acid sequence and glycan distribution (then called “oligosaccharide distribution”) were both known to a certain degree. This was published by Ward and Dopheide in Biochemical Journal, 1981 (link):

The amino acid sequence and oligosaccharide distribution for the haemagglutinin from the early Hong Kong influenza virus A/Aichi/2/68 (X-31) was investigated. The two polypeptide chains, HA1 and HA2, were fragmented by CNBr and enzymic digestion, and the amino acid sequence of each small peptide was deduced by comparing its chromatographic behaviour, electrophoretic mobility, amino acid composition and N-terminus with that of the corresponding peptide of the haemagglutinin of known structure from the influenza-virus variant A/Memphis/102/72. Those peptides in which changes were detected were sequenced fully. The complete amino acid sequence of the haemagglutinin HA1 chain (328 residues) and 188 of the 221 residues of the HA2 chain were established by this approach, and revealed only twelve differences between the amino acid sequences of variant-A/Aichi/68 and -A/Memphis/72 haemagglutinins. These occurred at positions 2, 3, 122, 144, 155, 158, 188, 207, 242 and 275 in the HA1 chain and 150 and 216 in the HA2 chain. The highly aggregated hydrophobic region (residues 180-121) near the C-terminal end of the HA2 chain was not resolved by peptide sequencing. The oligosaccharide distribution in variant-A/Aichi/68 haemagglutinin was identical with that found in that of A/Memphis/72, with sugar units attached at asparagine residues 8, 22, 38, 81, 165 and 285 in the HA1 chain and 154 on the HA2 chain. The monosaccharide compositions of the individual carbohydrate units on variant-A/Aichi/68 haemagglutinin differed from those of the corresponding units in variant-A/Memphis/72 haemagglutinin, and evidence was found for heterogeneity in the oligosaccharide units attached at single glycosylation sites.

The Brits really ruled the day back then, which is why hemagglutinin was spelled haemagglutinin. It’s been changed since, but I won’t say for better or worse. (I’ve straddled both spelling systems, and I get confused myself.) However, this particular article doesn’t pre-date the Winter article, and doesn’t give us the original data for the crystal structure.

So, by adding “crystal structure” to my search terms, I found this treasure trove of a paper/book chapter/whatever it is in Structure and Variation in Influenza Virus, published in 1980 (link):

The haemagglutinin glycoprotein of the Hong Kong strain of influenza virus (H3N2), [5] when released by bromelain from the membrane, [3] has been crystallized by microdialysis (50 microlitre) against 1.24-1.45M sodium citrate…

Citrate! Sounded like pickle juice to me.

They were able to infer the structure of the HA protein based on a bunch of electron density maps, which I remain unable to decipher. We’ll have to defer to them on that. Nonetheless, their description comes really close to the real structure of HA:

The molecule appears to be an elongate trimeric cylinder of approximately 130 [angstrom] length with radius varying from 15-38 [angstrom]. Sections down the molecular three-fold axis often show the subunit boundaries with little density in the centre of the molecule (Figure 3). Other regions have tightly-packed subunits forming a triangular structure with a smaller radius (Figure 4). The average azimuthal position of the subunit varies slightly along the length of the three-fold axis. The molecule tends to be thicker at one end (70-80 [angstrom] diameter), around the heavy atom locations (Figure 3), and thinner (30-50 [angstrom]) at the other (Figure 4).

Detailed interpretation of the map has not been attempted pending phase refinement by non-crystallographic symmetry averaging.

In case you’re interested in the crystal structure of the HA trimer, here it is:

The crystal structure of the HA protein from A/Aichi/2/68.

This was the basis for the structure that Wiley, Wilson and Skehel eventually published in 1981. That structure was picked up by Winter, Fields and Brownlee, also in 1981, to anchor their alignment and numbering. Robertson followed the same numbering in 1987, and it eventually morphed into the H3 numbering system that we have today.

So it seems like the H3 numbering system had its origins in a very well-characterized virus from 46 years ago. This numbering system has been used in the alignment of many different viruses, to provide a common reference point for talking about amino acid positions and their features. However, newer knowledge about HA evolution raises some new questions. Back in the day, there were many remarks on the high degree of similarity between HA subtypes. Granted, they only had to deal with a few subtypes. But now we have 16 different subtypes, with some being more host-restricted and others having a broader host range. While there is consensus that all HAs originated by common descent from an “original” HA, there has been sufficient evolutionary time (~40 years or so) for a large number of substitutions, insertions, and deletions to have occurred within one subtype, not to mention the substitutions and indels seen when comparing different subtypes.

Is it reasonable for us to keep using a numbering system based on a single virus? Would it make more sense to update the numbering system to be subtype-specific? Can we build upon the subtype-specific numbering system to create a properly unified numbering system for all HAs, so that we can properly communicate structural features of the HA protein? These are questions we as influenza researchers need to address, to ensure that we’re communicating on the same page.

Update 20 October 2015: Just a few months after I wrote this blog article, Drs. David Burke and Derek Smith of Cambridge University published a paper describing a reference numbering system for each subtype. I had a chance to chat with David one-on-one at a recent influenza surveillance meeting, and he recommended that I update my blog post. Great to know that we’re all thinking about the same issues!