PhD Project - Systematics of New Zealand Rhizobia

This page is a quick conversion to text format; for figures, tables, and references, see the PDF file.

Systematics of New Zealand Rhizobia

Introduction

Bacterial systematics

Bacterial systematics is the study of the diversity of organisms and their relationships, comprising classification, nomenclature, and identification. The goal of systematics is to have a 'natural' classification, one that reflects the evolutionary history of organisms. Bacterial systematics (with a focus on rhizobia) has been extensively reviewed (see Nutman87, Coutinho00, Broughton03, Brenner05).

Changes in technology have had a significant impact on the tools used in systematics. Initially bacteria were classified by gross morphological features (cocci, spirals, short and long rods) Cohn72, and by cell wall staining Gram84. Further technological developments led to the use of biochemical and physiological characters to identify and classify cultures Orla09. Present-day examples are the Biolog system (substrate utilisation), fatty acid profiles MacKenzie79, and multilocus enzyme electrophoresis (MLEE) Selander86, Pupo97, Nick99.

The next development was the use of polymorphisms in DNA as a basis for differentiation. Early examples were DNA-DNA hybridisation studies Jarvis80, where it was proposed that 70% homology over the genome would determine whether two strains were the same species Cohan02a. Other studies used restriction fragment length polymorphisms (RFLP) of DNA Laguerre94. The advent of the polymerase chain reaction (PCR) and DNA sequencing changed the face of bacterial systematics. DNA sequencing has become comparatively affordable in the past decade, and is rapid, specific, and easily comparable and reproducible between different labs and researchers. Gene phylogenies or 'trees' have contributed greatly to understanding the evolutionary history of bacteria Woese77. A more complex approach is a polyphasic one Colwell70, Vandamme96, which incorporates multiple gene sequence data (genotype) with biochemical or morphological data (phenotype) (reviewed in Gillis05).
Polyphasic studies have been used to describe new species Rivas03, and to clarify existing relationships Eardly05a. Combining these different datasets gives a better picture of the true evolutionary history, and may help to achieve the goal of systematics to have a 'natural' classification system. A polyphasic approach is used in this thesis to identify strains from New Zealand legumes.

Rhizobial systematics

Historically, rhizobial systematics was based largely on bacterial isolates from herbaceous crop and forage legumes of agricultural significance Broughton99. In contrast, few studies have been made of rhizobial associations among non-crop legumes, although indigenous legumes may be ecologically important in the natural landscape Boring88. Worldwide, there are an estimated 17000-19000 legume species Martinez96, although rhizobial species have only been identified for a small proportion of these. To date, 55 nodulating bacterial species have been identified in twelve genera (Table t-Rhizobia-species).

Rhizobial systematics had been dominated from the beginning by association with the host plant. By 1980, the species names reflected those of their hosts Skerman80 (Table old-hosts). Nevertheless, there were many strains that were unclassified, or indeed unclassifiable, under this scheme. Most of these anomalies were included in the 'cowpea' rhizobia group. This group eventually contained isolates from the majority of all nodulated legumes Norris56. "[This] situation was widely considered to be unsatisfactory" Howieson05.

Table old-hosts: Historical classification of Rhizobium

Rhizobium species     Hosts
R. leguminosarum      Lens, Pisum, Vicia
R. trifolii           Trifolium
R. phaseoli           Phaseolus
R. meliloti           Medicago, Melilotus, Trigonella
R. lupini             Lupinus, Ornithopus
R. japonicum          Glycine
Unclassified          Cowpea group

Note: Classification of Rhizobium species according to cross-inoculation groups Jordan74. After Coutinho00.

The realisation that transmissible genetic elements (plasmids and symbiosis islands) could carry genes that conferred nodulation ability Klein75, Johnston78, Prakash81, Fenton94, Rao94, Sullivan98 resolved one of the long-standing problems of rhizobial systematics, viz. that strains with identical nodulation profiles could appear to be different in biochemical and genetic tests (and vice versa). These studies also showed that non-nodulating strains could easily gain the ability to nodulate by acquiring an accessory genetic element.

Historical research on New Zealand rhizobia (s-NZ-rhizobia-history)

Introduction

There have been many studies made of rhizobia in New Zealand. Reflecting research around the world, most of these studies were made of crop and forage legumes, particularly Trifolium, Lotus and lupins (for example: Rys81, Patel84, Bonish85, Bonish91, Patrick92, Lowther95, Patrick95, Sarathchandra96, Lowther02). Most recently the molecular genetics of Mesorhizobium-Lotus interactions has been studied Sullivan95, Sullivan96, Scott96, Sullivan98, Sullivan01, Sullivan02.

Some studies have also been made of the rhizobial symbionts of native legumes. Most of this work was carried out by R. M. Greenwood and colleagues in the 1960s-70s Hastings66, Greenwood69, Jarvis77, Greenwood78a, Greenwood78b, Crow81, and also recently by McCallum96. Work by international scientists is also of relevance to New Zealand rhizobia, particularly work on relatives of the native legumes, such as exotic Sophora species and the Australian Swainsona, which is closely related to Clianthus, Carmichaelia, and Montigena.

Prior work on rhizobia nodulating native legumes

The earliest recorded work on the rhizobia of New Zealand native legumes was conducted in Europe. Dawson00 described thin infection threads in Carmichaelia australis root tissue. More elaborate structures were later described by Lechtova31.
Milovidov28 described the nodules of C. australis, and named the isolated bacteria "Bacterium radicicola forma carmichaeliana", although this description "lacked a sound basis" Allen81. Infection studies by Wilson39a, Wilson44 showed that Australian Clianthus species (now called Swainsona) were nodulated by members of the 'cowpea miscellany'.

The most comprehensive studies of rhizobia nodulating native New Zealand legumes were done by Greenwood, who described strains isolated from the native legumes as "all fairly similar acid producers" Greenwood69. Carmichaelia, Clianthus, and Sophora were also ineffectively nodulated with "a range of introduced rhizobia" Greenwood69. Investigations of the amino acid patterns of nodules (the amino acid pattern in the 80% alcohol-soluble fraction of nodules is determined primarily by the bacterial strain) revealed that there were many differences between strains that nodulated Carmichaelia Greenwood78a, Greenwood78b.

Jarvis77 used 37 morphological, cultural, and physiological tests on 65 strains from native legumes, and 45 reference strains that consisted of R. leguminosarum strains, Ensifer meliloti, and the diverse 'Lotus-Lupinus-Ornithopus cross-inoculation group'. The indigenous strains (from Sophora and Carmichaelia) were similar to one another and were well separated from the R. leguminosarum complex and Ensifer meliloti, but acid-producing strains from the Lotus-Lupinus-Ornithopus cross-inoculation group segregated with strains from native legumes. This diverse group included strains known as Rhizobium lupini (not currently a valid name). The identification of the New Zealand rhizobia was hampered by the small range of rhizobial species known at the time.

Crow81 investigated 122 strains from a wide range of legume hosts. These strains were assigned to groups based on original host and nodulation capacity, then 'DNA homology' was determined by DNA-DNA reassociation.
Strains from New Zealand legumes were placed in 'Group 4' along with Cicer and Lotus rhizobia (including the strain that later became the type for Mesorhizobium loti). These strains all showed high similarity (by DNA hybridisation) to fast-growing Lotus isolates (CC811 and CC809a). Some strains from 'Group 2' (isolated from Coronilla varia, Onobrychis viciifolia, Sophora formosa, and Sophora secundiflora) were able to form effective nodules on native Sophora and ineffective nodules on Carmichaelia and Clianthus. Both groups were distinct from R. leguminosarum and Ensifer species.

Pankhurst87 investigated the morphology and flavolan content of root nodules of Lotus, Leucaena leucocephala, Carmichaelia australis (as Carmichaelia flagelliformis in the publication), Ornithopus sativus, and Clianthus puniceus that were induced by Mesorhizobium loti strains. Strain NZP2037 (ICMP1326) formed effective nodules on all tested legumes; strain NZP2213 (ICMP4682) was effective on Lotus corniculatus and ineffective on all other legumes.

In a large study of hosts of the broad host-range Ensifer fredii strains USDA257 and NGR234, Sophora microphylla and Sophora tetraptera were not nodulated. Effective nodules, however, were formed on the exotic Sophora species S. tomentosa, S. velutina and S. davidii Pueppke99. In the same study, Swainsona forrestii was nodulated by NGR234 but four other Australian species did not nodulate.

The only molecular phylogenetic work done on nodule isolates from Carmichaelia and New Zealand Sophora was a BSc Honours dissertation, in which a small (200-bp) variable region of the 16S rRNA gene was sequenced from eleven strains. The isolates were identified as Mesorhizobium spp. and showed diverse RFLP profiles McCallum96.

In summary, these prior studies showed that the rhizobia nodulating native New Zealand legumes were distinct from known Rhizobium and Ensifer species, and belonged in the poorly described 'cowpea' group (or Lotus group).
However, a more accurate classification was inhibited by the techniques and knowledge of the time. The recent study using molecular techniques is promising, and this work will be greatly extended in this thesis through rigorous analysis of more strains.

Rhizobia nodulating woody legume weeds

There has been little research done on the rhizobia of legumes that are considered weeds in New Zealand. Before work began on this thesis, the published literature reported that Cytisus species were nodulated by slow-growing Rhizobium species in New Zealand Greenwood77. Other studies overseas by Pieters27 and Wilson39a classified gorse isolates into the 'cowpea group'. More recently Pueppke99 reported that Ulex and Cytisus formed ineffective nodules with the exceptionally broad host-range Ensifer fredii NGR234. Australian Acacia-nodulating strains have been identified as Bradyrhizobium spp. and Rhizobium spp. Barnet91, Fremont99, Marsudi99. However, the rhizobia nodulating Acacia in New Zealand have not been identified.

Experimental objectives

Although New Zealand became geographically isolated starting from about 80 million years ago Cooper93, Stevens95, legume ancestors arrived less than 10 million years ago from the northern hemisphere (see Section legume-history). It is postulated that the native legume genera co-evolved with nitrogen-fixing bacterial symbionts in isolation from the regions of major legume evolution Provorov98b, Aguilar04. This being the case, it is possible that there are novel rhizobial species associated with the native legumes in New Zealand.

The source of rhizobial symbionts of introduced legumes is unknown, although the invasive legume weeds are readily nodulated Zabkiewicz76, Hicks01. These legumes have been present for fewer than 200 years, and were deliberately introduced by early settlers. Their rhizobia either were introduced at the same time as the plants, or the plants were able to use a population pre-existing in New Zealand soils.
The objectives of this chapter were to identify the rhizobia nodulating New Zealand native legumes and selected invasive introduced woody legumes.

Methodology

In this study a polyphasic approach Vandamme96 was used to characterise bacterial strains. Rhizobial isolates were obtained from the root nodules of native and introduced legumes, and DNA was extracted from bacterial cultures. Four housekeeping genes were sequenced (16S rRNA, atpD, glnII, recA) and used to construct phylogenetic trees. The genes chosen were well spaced throughout the genome, and had data from other strains available in GenBank. The use of multiple genes allows more rigorous investigation of genotypes. The sequence data were aligned with type strains of rhizobia, and DNA and protein sequences were analysed with Maximum Likelihood and Bayesian methods to build phylogenetic trees. Protein trees were used where possible, as the protein is the unit of selection, and to counteract the effect of substitution bias at the third codon position ('wobble base') on phylogenies. Saturation of this base has been shown to contribute to phylogenetic misinformation Mindell96.

Phenotypic characters of rhizobial strains were assessed by two biochemical tests. The Biolog GN2 microplate system (see Section method-Biolog) analyses the ability of strains to metabolise different carbon source substrates. The data were analysed by hierarchical clustering and Bayesian inference. Fatty acid methyl ester (FAME) profiles (see Section method-FAME) were determined by saponification and methylation of cell wall lipids, then analysis by gas chromatography. Results were presented as a UPGMA dendrogram.

Results of phylogenetic analyses

16S rRNA analyses

Figure p-16S-ML (16S rRNA maximum likelihood phylogenetic tree). Maximum likelihood phylogenetic tree of 16S rRNA gene sequences showing the relationship of rhizobial isolates from New Zealand legume flora to type strains of rhizobia. Sequences are referred to by the ICMP number of the source strain, and the host plant. Isolates were assigned to eight major clades (Genomic groups A-H) based on this phylogeny. Original host legume is shown in parentheses. The model of evolution used was TIM+I+Γ. Likelihood score of the best tree was 4642.07. Scale bar indicates number of substitutions per site.

Figure p-16S-MB (16S rRNA Bayesian inference phylogenetic tree). Bayesian inference phylogenetic tree of 16S rRNA gene sequences showing the relationship of rhizobial isolates from New Zealand legumes to type strains of rhizobia. Letters after strain numbers indicate genomic grouping. Identical sequences were collapsed into a single taxon. The model of evolution used was GTR+I+Γ with ten million generations. Scale bar indicates number of expected changes per site. Clade posterior probability is indicated above the node.

Amplification of the 16S rRNA gene was successful for all isolates attempted. Sequences for ICMP strains 11719, 11736, 11542, 11727, 12637, 12642, 12674, 12687, 12624 were obtained by Duckchul Park (Landcare Research) in a pilot study for this project. Strains 12624 and 12642 were resequenced, and found to have been transposed in the pilot study. A total of 22 sequences were obtained from native legumes (Table t-GenBank-native), and 16 from introduced woody legume weeds (Table t-GenBank-exotic). Sequences with GenBank accession numbers beginning with 'DQ' were obtained after the publication of these data Weir04.

The sequence alignment contained 61 taxa and was 1321 bp long. There were several single base pair indels, but no significant features, such as major deletions, in the alignment. All known rhizobial Mesorhizobium and Bradyrhizobium species were included, represented by sequences of type strains, with selected species from Rhizobium and Ensifer also included after evaluation of initial trees.
In all figures of gene and protein trees the outgroup, Caulobacter crescentus, was removed to save space.

The model of evolution selected under Maximum Likelihood was TIM+I+Γ (transitional model), a submodel of GTR (Fig. p-16S-ML). The tree-island profile showed multiple hits on the best tree, but also multiple hits on another island. The -lnL score for the second island was 0.54 greater; this small difference indicates that the tree topologies of the two islands were probably very similar.

The same data were analysed under Bayesian inference using the GTR+I+Γ model, for ten million generations. Identical sequences were collapsed to a single taxon to simplify analysis. On the default settings the four concurrent analyses ('chains') did not swap their states well, so the temperature of the heated chains was decreased from the default of 0.2 to 0.05. After this adjustment, the analysis performed well, with measures of convergence at good levels. The consensus tree is shown in Figure p-16S-MB; the numbers above nodes are the marginal posterior probabilities of the clade being correct.

The strains were assigned to 'genomic groups' based on clustering observed on these trees. The sequences from rhizobia isolated from New Zealand legumes are distributed in eight genomic groups (A to H). Sequences from native legumes, Carmichaelia, Clianthus, Montigena and Sophora, were distributed in Groups A-D, together with the reference sequences representing Mesorhizobium spp. Other sequences from native legumes also formed a clade (Group E) with Rhizobium leguminosarum. All rhizobia isolated from introduced legumes, Acacia, Albizia, Cytisus and Ulex, were in the Bradyrhizobium clade (Groups F-H). In both analyses the bacterial genera (Mesorhizobium, Rhizobium, Ensifer, and Bradyrhizobium) were well separated from each other.
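The effect of lowering the chain temperature from 0.2 to 0.05, as described above for the Bayesian analysis, can be illustrated with a short sketch. It assumes the incremental heating scheme commonly used in Metropolis-coupled MCMC (for example in MrBayes), in which chain i (with i = 0 for the cold chain) samples the posterior raised to the power 1/(1 + T*i). The text does not state this formula, so both the formula and the function name `chain_heats` are assumptions made for illustration.

```python
def chain_heats(temp, n_chains=4):
    """Heat (power applied to the posterior) for each chain, cold chain first.

    Assumes incremental heating: heat_i = 1 / (1 + temp * i), i = 0..n_chains-1.
    This scheme is an assumption; it is not stated in the text.
    """
    return [1.0 / (1.0 + temp * i) for i in range(n_chains)]


if __name__ == "__main__":
    # Compare the default temperature with the reduced one used in the analysis.
    for temp in (0.2, 0.05):
        heats = ", ".join(f"{h:.3f}" for h in chain_heats(temp))
        print(f"T = {temp}: {heats}")
```

Under this assumption, the hottest of four chains has a heat of about 0.63 at T = 0.2 but about 0.87 at T = 0.05. A lower temperature keeps the heated chains closer to the cold chain, which would be expected to make proposed state swaps between chains easier to accept.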
The Maximum Likelihood and Bayesian trees were essentially congruent in their topology. A minor exception was Group D, where the clade was internal in the ML tree but was an outgroup to other Mesorhizobium sequences in the Bayesian tree; the posterior probability for the latter arrangement was low (0.58).

Most genomic groups do not show any regional variation (Fig. p-Isolates). Exceptions are the single strain of Group B (found only at Mt. Terako), and the three strains of Group C, which were all isolated from Clianthus in Waikaremoana, the natural range of this species.

atpD analyses

Figure p-atpD-ML (atpD maximum likelihood phylogenetic tree). Maximum likelihood phylogenetic tree of atpD DNA sequences showing the relationship of rhizobial isolates from New Zealand legumes to type strains of rhizobia. Letters after strain numbers indicate genomic grouping. The model of evolution used was GTR+I+Γ. Likelihood score of the best tree was 4038.52. Scale bar indicates number of substitutions per site.

Figure p-atpD-MB (atpD Bayesian inference phylogenetic tree). Bayesian phylogenetic tree of atpD DNA sequences showing the relationship of rhizobial isolates from New Zealand legumes to type strains of rhizobia. Letters after strain numbers indicate genomic grouping. The model of evolution used was GTR+I+Γ with ten million generations. Scale bar indicates number of expected changes per site. Clade posterior probability is indicated above the node.

Figure p-atpD-protein-ML (AtpD protein maximum likelihood phylogenetic tree). Maximum likelihood phylogenetic tree of AtpD protein sequences showing the relationship of rhizobial isolates from New Zealand legumes to type strains of rhizobia. Letters after strain numbers indicate genomic grouping. The model of evolution used was WAG+I+Γ. Scale bar indicates number of substitutions per site.

Figure p-atpD-protein-MB (AtpD protein Bayesian inference phylogenetic tree). Bayesian phylogenetic tree of AtpD protein sequences showing the relationship of rhizobial isolates from New Zealand legumes to type strains of rhizobia. Letters after strain numbers indicate genomic grouping. The model of evolution used was WAG+I+Γ with two million generations. Scale bar indicates number of expected changes per site. Clade posterior probability is indicated above the node.

Amplification of the atpD gene was successful for all isolates attempted, after the extension time of the PCR cycle was increased to 45 seconds. No sequences were available for the type strains of Mesorhizobium temperatum and Mesorhizobium septentrionale, as these species had not been described during the experimental phase of this project. A total of 14 sequences were obtained from native legumes (Table t-GenBank-native), and 7 from introduced woody legume weeds (Table t-GenBank-exotic).

The alignment of the atpD gene DNA sequence had 41 taxa and was 459 bp long. A feature of the alignment is a 15-bp deletion in the Mesorhizobium, Rhizobium and Ensifer sequences and a 3-bp deletion in the Bradyrhizobium sequences near the middle of the alignment, compared with the outgroup Caulobacter crescentus.

The model of DNA evolution selected under Maximum Likelihood was GTR+I+Γ. The ML tree is shown in Figure p-atpD-ML. The tree-island profile showed multiple hits on the best tree with no other significant islands indicating a different topology. The same data were analysed under Bayesian inference using the GTR+I+Γ model for ten million generations. The consensus tree is shown in Figure p-atpD-MB.

The gene sequence was translated to a protein sequence of 153 aa. The ML model of protein evolution selected was WAG+I+Γ (Fig. p-atpD-protein-ML). The same model was used for the Bayesian analysis of the protein data, and run for two million generations (Fig. p-atpD-protein-MB).
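The translation step described above (a 459-bp in-frame alignment giving 153 codons, hence 153 aa) can be sketched as follows. This is a minimal illustration, not the software used in the thesis: it assumes the sequences are in frame and uses the standard genetic code, and the function name `translate` and the two short example sequences are hypothetical. It also shows why protein trees are less sensitive to third codon position substitutions: synonymous changes at that position disappear on translation.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA sequence; unknown or gapped codons become 'X'."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna), 3)
    )


if __name__ == "__main__":
    # Two hypothetical sequences differing only at third codon positions.
    seq_a = "GGTCTGAAA"
    seq_b = "GGCCTCAAG"
    print(translate(seq_a), translate(seq_b))  # both translate to GLK
    print(459 // 3)  # number of codons in a 459-bp in-frame alignment
```

Sequences that differ in the DNA alignment can therefore be identical in the protein alignment, which is consistent with DNA trees showing more structure within a clade than the corresponding protein trees.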
The atpD DNA and protein sequence data, when analysed with Maximum Likelihood and Bayesian inference, generated phylogenetic trees that were generally similar. Most of the differences were in a large Mesorhizobium clade that was poorly resolved and had weak support, with a posterior probability of 0.56 in the gene sequence and 0.52 in the protein sequence. All isolates from the native legumes (with the exception of 14642) were grouped in the Mesorhizobium clade. All isolates from introduced legumes were found in the Bradyrhizobium clade.

There were exceptions in the congruence of the trees. The type strain of Mesorhizobium chacoense remained as an outgroup to all other Mesorhizobium sequences in all but the Bayesian DNA analysis, where it was included in a large poorly resolved clade. Other deviations from the 16S trees include the positions of Ensifer and Rhizobium. In the 16S trees these two genera were separated, but in the atpD analyses the Ensifer spp. sequences are internal to Rhizobium spp. In the ML protein tree, these positions are reversed, while in the Bayesian protein tree the groups are separated, as expected.

glnII analyses

Amplification of the glnII gene was successful for most isolates attempted, although sequences could only be partially amplified for four strains: Mesorhizobium plurifarium and strain ICMP 14753 (48% sequence coverage obtained), and Mesorhizobium amorphae and strain ICMP 13190 (67% sequence coverage obtained). Gaps in the alignment were treated as 'missing data' as described in the methods. A total of 10 sequences were obtained from native legumes (Table t-GenBank-native), and 7 from introduced woody legume weeds (Table t-GenBank-exotic).

The alignment of the glnII gene DNA sequence had 36 taxa and was 828 bp long. There were no deletions or insertions in the alignment. The glutamine synthetase II gene is only present in rhizobia, although a distant homologue is present in eukaryotes Turner00.
Thus there was no appropriate outgroup for this alignment. To avoid presenting these data as an unrooted tree, which is visually untidy, the trees were bent at the central node around the Bradyrhizobium sequences.

The model of DNA evolution selected under Maximum Likelihood was TIM+I+Γ. The ML tree is shown in Figure p-glnII-ML. The tree-island profile showed multiple hits on the best tree with no other significant islands indicating a different topology. The same data were analysed under Bayesian inference using the GTR+I+Γ model for ten million generations. The consensus tree is shown in Figure p-glnII-MB.

The gene sequence was translated to a protein sequence of 276 aa. The ML model of protein evolution selected was WAG+Γ (Fig. p-glnII-protein-ML). The same model was used for the Bayesian analysis of the protein data, and run for two million generations (Fig. p-glnII-protein-MB).

All isolates from the native legumes (with the exception of 14642) were grouped in the Mesorhizobium clade. All isolates from introduced legumes were found in the Bradyrhizobium clade.

Figure p-glnII-ML (glnII maximum likelihood phylogenetic tree). Maximum likelihood phylogenetic tree of glnII DNA sequences showing the relationship of rhizobial isolates from New Zealand legumes to type strains of rhizobia. Letters after strain numbers indicate genomic grouping. The model of evolution used was TIM+I+Γ. Likelihood score of the best tree was 6569.29. Scale bar indicates number of substitutions per site.

Figure p-glnII-MB (glnII Bayesian inference phylogenetic tree). Bayesian phylogenetic tree of glnII DNA sequences showing the relationship of rhizobial isolates from New Zealand legumes to type strains of rhizobia. Letters after strain numbers indicate genomic grouping. The model of evolution used was GTR+I+Γ with ten million generations. Scale bar indicates number of expected changes per site. Clade posterior probability is indicated above the node.

Figure p-glnII-protein-ML (GlnII protein maximum likelihood phylogenetic tree). Maximum likelihood phylogenetic tree of GlnII protein sequences showing the relationship of rhizobial isolates from New Zealand legumes to type strains of rhizobia. Letters after strain numbers indicate genomic grouping. The model of evolution used was WAG+Γ. Scale bar indicates number of substitutions per site.

Figure p-glnII-protein-MB (GlnII protein Bayesian inference phylogenetic tree). Bayesian phylogenetic tree of GlnII protein sequences showing the relationship of rhizobial isolates from New Zealand legumes to type strains of rhizobia. Letters after strain numbers indicate genomic grouping. The model of evolution used was WAG+Γ with two million generations. Scale bar indicates number of expected changes per site. Clade posterior probability is indicated above the node.

The topology of the glnII tree was quite consistent between analyses. The Mesorhizobium sequences were split into three clades, with all Rhizobium sequences grouping with M. plurifarium. Ensifer went to the root of the tree in DNA sequences, but grouped with M. chacoense in the protein trees; however, this could be due to long-branch attraction Bergsten05. The position of R. tropici had low support in the Bayesian trees, and did not group with other Rhizobium species in the ML protein analysis. The overall topology is similar to that of previous studies Turner00.

recA analyses

Figure p-recA-ML (recA maximum likelihood phylogenetic tree). Maximum likelihood phylogenetic tree of recA DNA sequences showing the relationship of rhizobial isolates from New Zealand legumes to type strains of rhizobia. Letters after strain numbers indicate genomic grouping. The model of evolution used was GTR+I+Γ. Likelihood score of the best tree was 4636.57. Scale bar indicates number of substitutions per site.

Figure p-recA-MB (recA Bayesian inference phylogenetic tree). Bayesian phylogenetic tree of recA DNA sequences showing the relationship of rhizobial isolates from New Zealand legumes to type strains of rhizobia. Letters after strain numbers indicate genomic grouping. The model of evolution used was GTR+I+Γ with ten million generations. Scale bar indicates number of expected changes per site. Clade posterior probability is indicated above the node.

Figure p-recA-protein-ML (RecA protein maximum likelihood phylogenetic tree). Maximum likelihood phylogenetic tree of RecA protein sequences showing the relationship of rhizobial isolates from New Zealand legumes to type strains of rhizobia. Letters after strain numbers indicate genomic grouping. The model of evolution used was CpREV+Γ. Scale bar indicates number of substitutions per site.

Figure p-recA-protein-MB (RecA protein Bayesian inference phylogenetic tree). Bayesian phylogenetic tree of RecA protein sequences showing the relationship of rhizobial isolates from New Zealand legumes to type strains of rhizobia. Letters after strain numbers indicate genomic grouping. The model of evolution used was CpREV+I+Γ with two million generations. Scale bar indicates number of expected changes per site. Clade posterior probability is indicated above the node.

Amplification of the recA gene was successful for most isolates attempted, with the exception of sequences from Group C Mesorhizobium strains. A total of 14 sequences were obtained from native legumes (Table t-GenBank-native), and 7 from introduced woody legume weeds (Table t-GenBank-exotic).

The alignment of the recA gene DNA sequence had 39 taxa and was 533 bp long. There were no deletions or insertions in the alignment. The model of DNA evolution selected under Maximum Likelihood was GTR+I+Γ. The ML tree is shown in Figure p-recA-ML.
The tree-island profile showed multiple hits on the best tree, but also multiple hits on another island. The -lnL score for the second island was 1.16 greater; this small difference indicates that the tree topologies of the two islands were probably very similar. The same data were analysed under Bayesian inference using the GTR+I+Γ model for ten million generations. The consensus tree is shown in Figure p-recA-MB.

The gene sequence was translated to a protein sequence of 276 aa. The ML model of protein evolution selected was CpREV+Γ (Fig. p-recA-protein-ML). The same model was used for the Bayesian analysis of the protein data, and run for two million generations (Fig. p-recA-protein-MB).

All isolates from the native legumes (with the exception of 14642) were grouped in the Mesorhizobium clade. All isolates from introduced legumes were found in the Bradyrhizobium clade.

An interesting difference in tree topology between the DNA and protein analyses is shown by the Mesorhizobium clade. This clade is somewhat heterogeneous in the DNA analyses, but collapses to a single homogeneous clade of identical sequences in the protein analysis (excepting M. plurifarium, M. chacoense, and 14330).

Coherence of groups

Group A: Mesorhizobium

Group A comprised strains 14330, 11719, 11736, 12637 (Sophora), and 12649, 15054 (Carmichaelia). In the 16S rRNA gene analyses, on which the grouping was based, all six Group A sequences were identical, and grouped closely with M. ciceri and M. loti. A BLAST search with the Group A 16S sequence revealed an identical match in the GenBank database to Mesorhizobium strain rob8, isolated from Robinia pseudoacacia in Germany Ulrich00. In the atpD trees, Group A splits, with strain 14330 diverging from the others of the group (11719, 11736, 15054). The latter have identical protein sequences but diverge a little in DNA sequence.
In the recA DNA trees, all Group A sequences are together; however, in the protein trees, 14330 splits away from the large clade containing most other New Zealand sequences. Two Group A strains were sequenced for the glnII gene, and in all analyses these sequences formed a well-supported clade. There was no consistency in the clustering of Group A strains with specific type strains, but they were always within Mesorhizobium.

Group B: Mesorhizobium

Group B consists of a single strain, 12685 (Montigena), that was sufficiently well separated from other strains in the 16S rRNA Maximum Likelihood and Bayesian analyses to be given its own group. A BLAST search with the Group B 16S sequence reveals an identical match in the GenBank database to two Mesorhizobium strains, R88b (USDA3462) and R8CS (USDA3467), isolated from the root nodules of Lotus corniculatus in New Zealand Sullivan96. In the atpD and recA analyses, this strain groups variably with other Mesorhizobium species and is not distinct. In the glnII analyses, however, Group B was well separated from all other strains.

Group C: Mesorhizobium

Group C comprised strains 11720, 11721, and 11726, which were all from Clianthus and the same region of New Zealand (Waikaremoana). Group C 16S rRNA sequences were identical to the type strain of Mesorhizobium amorphae from China and Spain; however, in the atpD analyses these strains did not cluster with M. amorphae, but were distinct from the large clade containing most other Mesorhizobium species. Mesorhizobium plurifarium was the closest neighbour. No sequences were amplified for the glnII or recA genes for Group C strains, yet these genes were amplified for Mesorhizobium type strains (Figure t-GenBank-type), suggesting that Group C glnII and recA genes are sufficiently different from those of the type strains (and other Mesorhizobium spp.) that the primers do not anneal well.
Group D -- Mesorhizobium

Group D comprised eight strains: 11708, 14319, 12635, 13190, 11722 (Carmichaelia), 12690 (Montigena), and 12680, 11541 (Clianthus), and is the most divergent of the Mesorhizobium groups. In the 16S rRNA analyses, these strains cluster with the type strain of Mesorhizobium huakuii. In the atpD DNA ML analysis, Group D forms a coherent group with M. huakuii, M. amorphae and M. loti. In the other atpD analyses most of Group D falls into the large Mesorhizobium clade. In the recA gene DNA phylograms, Group D formed a coherent clade, and in the protein trees all the sequences were in the large Mesorhizobium clade. In the glnII analyses, all Group D sequences formed a clade with M. loti, except for strain 11708.

Group E -- Rhizobium leguminosarum

Group E comprised four Rhizobium leguminosarum strains: 14642 (Sophora), 12687 (Carmichaelia), 11542 (Clianthus) and 11727 (Carmichaelia). In the 16S rRNA trees, these strains formed a clade with R. leguminosarum and R. etli; however, these strains are more similar to R. leguminosarum, as the branch length to R. etli is longer. In all other gene and protein analyses Group E (represented by strain 14642) grouped tightly with R. leguminosarum with excellent clade support.

Group F -- Bradyrhizobium

Group F comprised six strains: 12835, 14754, 14755 (Acacia), 14533, 14304 (Ulex) and 14753 (Albizia). The Bradyrhizobium groups differ from the previously published grouping Weir04. In all atpD analyses the group splits in two, with 12835 and 14533 forming a clade with B. canariense, and 14753, 14754, and 14755 grouping with B. liaoningense. In all recA trees the group is split into two clades, comprising 12835, 14533, 14755 and 14753, 14754 respectively. In the DNA trees they do not form a clade with known species, but in the protein trees 14533 and 12835 have the same sequence as B. canariense; the other strains form a clade with B. liaoningense.
In the glnII DNA trees the group forms a single clade with B. canariense. In the protein trees 12835, 14533, 14754, and 14755 loosely cluster with B. canariense, and 14753 groups with B. japonicum.

Group G -- Bradyrhizobium

Group G comprised 14320 and 12674 (Ulex), and in the 16S rRNA analyses was almost identical to B. canariense. No other genes were sequenced for this group.

Group H -- Bradyrhizobium

Group H comprised eight strains: 14309, 14310, 14291, 14328, 12624 (Cytisus), 14292, 14306 (Ulex), and 14752 (Albizia). Under the 16S rRNA analysis, five of the eight sequences were identical and there was no obvious relationship with a single type strain. In the atpD analyses, the two strains 14291 (Cytisus) and 14752 (Albizia) always grouped together. There was no clear relationship to type strains, but in the DNA ML analyses this cluster formed a clade with B. japonicum. In all of the recA analyses the strains 14291 and 14752 formed a well supported clade with B. japonicum. In the glnII DNA analyses, the strains cluster with B. japonicum. In the GlnII protein trees, the strains formed a clade with B. japonicum that included 14753 (Group F) and B. liaoningense.

Discussion of phylogenetic analyses

Identification of strains

Rhizobial isolates of the three most common and geographically widespread species from Carmichaelia and Sophora, and from the genera Clianthus and Montigena (Table t-GenBank-native), were used to infer phylogenetic relationships of the rhizobia of the native legume genera in New Zealand. These were compared with the rhizobia of invasive introduced legumes, Acacia, Cytisus and Ulex, which are noxious weeds in New Zealand. Reference sequences from Bradyrhizobium, Mesorhizobium, Rhizobium, and Ensifer type strains were included. Phylogenetic inference, as an approach to clarifying bacterial relationships, is usually based on the comparative analysis of 16S rRNA sequences and has been used in past investigations of rhizobia deLajudie94,Jarvis97,Young01a.
In this study, three additional genes were used in an attempt to derive more reliable phylogenetic inferences Anzai00,Gaunt01,Hilario04. Partial sequences of the three housekeeping genes (atpD and glnII and recA) were also used to generate phylograms, which were then compared. The topologies of all four trees are congruent in indicating that New Zealand's native legumes are nodulated by members of Mesorhizobium and Rhizobium genera. Based on the analysis of 16S rRNA, individual rhizobial strains were assigned to 8 groups (A--H). Sequences representing rhizobial strains from a single plant genus are distributed between groups. With the exception of Group C, which includes two strains from Clianthus, and Group D, which is dominated by strains from Carmichaelia, the groups generally do not represent bacterial strains from particular host legumes. Homogeneous groups such as Group C are probably a reflection of the small sample size of this legume genus. The presence of an outlying Clianthus strain in Group D suggests that larger representations of strains may result in groups that are more heterogeneous. All other groups are heterogeneous with respect to the host sources of strains. For instance, Group A comprises four strains from Sophora and two from Carmichaelia. Groups A--D are in a clade represented entirely by known Mesorhizobium spp. The clade formed by strains in Group E, from Sophora, Carmichaelia and Clianthus, includes the sequence representing Rhizobium leguminosarum. Group F in Bradyrhizobium contains all sequences from Acacia. The tree topologies for the different gene sequences place the strains isolated from New Zealand native legumes in the genus Mesorhizobium and in Rhizobium leguminosarum, and all the introduced legumes in Bradyrhizobium. Consideration of the individual gene trees, however, shows that they are not mutually congruent at the species level. 
In some cases, sequences are as similar to one another as to the neighbouring known species and therefore they may be members of these species. For instance, the sequences in Group C may represent strains of M. amorphae. The placement of many strains into clusters that are distinct from existing named species indicates possible novel species. Such novel species are unlikely to be unique to New Zealand, as 16S rRNA types are very similar to non-type isolates found overseas. Nevertheless, the absence of criteria relating sequence directly to taxonomic differences means that further data must be obtained by other methods before these strains can be properly classified Vandamme96. These data confirm a preliminary study, which showed that isolates from Carmichaelia were members of Mesorhizobium McCallum96. By extension it is suggested that the fast-growing, acid-producing strains isolated from native legumes by Greenwood69, Jarvis77, and Crow81 should be identified as Mesorhizobium spp.

Review of phylogenetic analysis methods

Each phylogenetic analysis method attempts to construct the 'true' tree, one that reflects evolutionary processes. Nevertheless, the phylogenies presented in this study were not entirely consistent, raising the question of the 'true' phylogenetic position of the strains. The methods of analysis used here are presently among the most rigorous methods available for phylogenetic inference; the analyses were done conservatively, with many more computational replicates than in comparable published analyses Gaunt01. The analyses performed well according to the internal measures of congruence (tree-island profiles and convergence diagnostics). Additionally, two methods of analysis (ML and Bayesian) were used to counteract any biases of a single method, and both DNA and protein data were analysed in order to counteract base saturation.
It was assumed that if the same patterns are present in different analyses then one could be more confident of the result. The phylogenies, however, were not entirely congruent; although strains were classified into genera with high confidence, grouping within a genus was not entirely consistent among analyses. Differences between phylogenies of different genes can easily be explained by differential evolution of the genes. For example, there is no reason to assume that the glnII gene, which encodes a glutamine synthetase involved in the nitrogen fixation process, will evolve at the same rate or be subject to the same evolutionary pressures as a gene encoding a protein involved in ATP synthesis (atpD). In cases of extreme differences in topology, horizontal gene transfer between related strains may be an explanation for the incongruence. Differences between methods of analysis of exactly the same data can in part be explained by the different biases and assumptions of the algorithms used, although the aim of each is to get the 'true' tree. In other cases the incongruence may arise from an intrinsic feature of the sequence alignment, such as biases in taxon sampling, or chimeric sequences. Alternatively the third base position may have become saturated, i.e. it has accumulated so many mutations that it is effectively random. Using protein trees (or character partitioning in MrBayes) can solve this problem. An example is the Mesorhizobium recA gene, which revealed diverse DNA sequences, yet nearly all had identical protein sequences. This may indicate a functional constraint on the structure of the protein.

Future directions in phylogenetic analysis

Gene choice

Phylogenetic analysis has usually been performed with a single gene. The gene of choice for bacterial systematics is the DNA sequence coding for the 16S ribosomal RNA (rrn) molecule. This is no doubt due, in part, to Woese's seminal 'three kingdoms of life' paper Woese87.
Since then, phylogenies of the 16S gene have been used extensively to identify and classify bacterial strains Young04b,Kuykendall05, and it is now a requirement to sequence this gene as part of the description of a proposed new bacterial species Graham91,Stackebrandt94. It was proposed that a 3% difference should be used to place strains into different species Cohan02a, although this 'rule' has widely been misinterpreted to mean that organisms sharing greater than 97% 16S homology are the same species. However, there are problems with this approach. Primarily it must be realised that these data are creating 'gene trees', not necessarily trees reflecting the evolution of the whole organism. This problem is intrinsic to using the sequence from any one gene. There are also problems with the 16S gene itself; some rhizobia have multiple copies of the 16S rRNA gene (M. loti: two, B. japonicum: one, E. meliloti: three), which may have internal variation. For example, Thermobispora bispora has two copies which differ by 6.4% Wang97b. Horizontal (lateral) gene transfer (HGT) can also confound single gene phylogenies, as has been observed in Bradyrhizobium elkanii, which included a small fragment of the 16S rRNA derived from Mesorhizobium VanBerkum03. Because of these problems, recent studies (including this work) have examined multiple genes in an attempt to reduce bias and the effects of HGT. A current implementation of multigene analysis that shows great promise is multilocus sequence typing (MLST) Maiden98. In this technique short segments of many different genes are sequenced. This method has high discriminatory power below the level of genus, down to individual strains. The use of multiple genes raises the question of which genes to select for analysis. Housekeeping genes are intuitively good to use as they are conserved, but few studies have analysed the appropriateness of gene choice.
The genes used in this work were selected because they had previously been used successfully. Zeigler03 investigated thirty-two protein-encoding genes that are distributed widely among bacterial genomes, and tested the potential usefulness of their DNA sequences in assigning bacterial strains to species. It was determined that recN and dnaX worked well, but genes such as 16S and trpS scored poorly. It is likely that future studies using multiple gene data could be improved by careful selection of the genes used.

Methods of analysis

For many years the predominant method of analysing genetic data was Neighbour Joining (NJ). This analysis is simple to perform, and exceptionally quick (only a few seconds), even for large data sets. However, such a simplistic analysis may not do justice to data that may have taken months to obtain. Modern methods of analysis (Maximum Likelihood and Bayesian) are more complex, and are character (nucleotide) based rather than distance based like NJ, and as such are more rigorous. It is important when inferring from data that they have been analysed accurately, preferably by multiple analyses to eliminate biases. It is likely that the best approach is to combine the data from multiple genes, and analyse the combined data with robust statistical analyses. It is possible to combine data from several genes into a single tree either by concatenation (joining the data end-to-end into a single long alignment) Hilario04, or computationally through splits or reticulate networks Huson06, or by MLST Maiden98. Such methods simplify inference by providing a single tree. It is likely and desirable that 16S sequencing will remain a part of bacterial systematics, as it has value in higher level (genus and above) taxonomy, and for use in comparison with the large amount of data available for this gene. In addition, a phylogenetic study should include carefully analysed multigene data to determine relationships below the level of the genus.
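The concatenation approach mentioned above (joining per-gene alignments end-to-end) can be sketched as follows. This is a minimal illustration only, not the procedure used in this study: the function name, the toy sequences, and the decision to drop strains lacking a sequence for any gene (as happened for glnII and recA in Group C) are all assumptions made for the example.

```python
def concatenate_alignments(alignments):
    """Join aligned genes end-to-end for the strains present in every gene.

    alignments: list of dicts mapping strain name -> aligned sequence.
    Dropping strains that lack any gene is an assumption of this sketch.
    """
    if not alignments:
        raise ValueError("at least one alignment is required")
    for gene in alignments:
        # Every sequence within one gene must share the alignment length.
        if len({len(seq) for seq in gene.values()}) > 1:
            raise ValueError("sequences within one gene must have equal length")
    shared = set(alignments[0])
    for gene in alignments[1:]:
        shared &= set(gene)
    return {
        strain: "".join(gene[strain] for gene in alignments)
        for strain in sorted(shared)
    }


# Hypothetical toy alignments (not data from this study).
atpD = {"strainA": "ATG-CA", "strainB": "ATGGCA", "strainC": "ATGGTA"}
recA = {"strainA": "GGCT", "strainB": "GGTT"}
combined = concatenate_alignments([atpD, recA])
print(combined)
```

Here strainC is excluded from the combined alignment because it has no recA sequence; an alternative design would pad the missing gene with gap or missing-data characters instead.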
Results of phenotypic analyses

Biolog metabolic fingerprinting

Figure p-BL2 (2-state hierarchical cluster Biolog dendrogram). Hierarchical clustering dendrogram of Biolog phenotypic two-state (0, 1) data showing the relationship of isolates from New Zealand legumes to type strains of rhizobia. Similarity matrix generated by simple matching, and the dendrogram by average linkage. Letters after the strain numbers indicate genomic grouping assigned from phylogenetic analyses.

Figure p-BL3 (3-state hierarchical cluster Biolog dendrogram). Hierarchical clustering dendrogram of Biolog phenotypic three-state (0, 0.5, 1) data showing the relationship of isolates from New Zealand legumes to type strains of rhizobia. Similarity matrix generated by simple matching, and the dendrogram by average linkage. Letters after the strain numbers indicate genomic grouping assigned from phylogenetic analyses.

Figure p-Biolog-0-1-MB (2-state Bayesian Biolog dendrogram). Bayesian inference dendrogram of Biolog phenotypic two-state (0, 1) data showing the relationship of isolates from New Zealand legumes to type strains of rhizobia. Analysis was run for 1010 generations; clade posterior probability is indicated above the node. Letters after the strain numbers indicate genomic grouping assigned from phylogenetic analyses.

Figure p-Biolog-0-1-2-MB (3-state Bayesian Biolog dendrogram). Bayesian inference dendrogram of Biolog phenotypic three-state (0, 1, 2) data showing the relationship of isolates from New Zealand legumes to type strains of rhizobia. Analysis was run for 1010 generations; clade posterior probability is indicated above the node. Letters after the strain numbers indicate genomic grouping assigned from phylogenetic analyses.

The Biolog metabolic fingerprint consists of a pattern of colour changes corresponding to the ability to metabolise a substrate in a 96 well plate containing 95 defined carbon-source substrates (Figure Biolog-grid). The Biolog software determined whether a well was unchanged, partially changed, or fully changed. These three 'states' were assigned the values (0, 0.5, 1) respectively. In analyses using two states (0, 1), partial reactions were assumed to be positives (all 0.5 values converted to 1). With Bayesian analyses it was not possible to specify a decimal value for a state, and therefore in this case all 0.5 values were converted to 2 (the value in this case is irrelevant as each state has an equal weight). In all these analyses, data from the 24 hour incubation were used. Few Bradyrhizobium strains were amenable to this method of analysis, as they produced profiles with fewer than three positive results. It is possible that the growth rate of these strains is too slow for this particular assay. The analysis was successfully performed on 15 strains from native legumes, one strain from an introduced legume and 16 type strains (Mesorhizobium spp., Rhizobium spp., Ensifer spp., Bradyrhizobium spp.). Generally isolates from native legumes had weaker profiles than type strains (less intense colour and fewer positives). Isolates formed two major clades in the hierarchical cluster analyses (Figures p-BL2 and p-BL3), separate from the type strains. One group contained mostly genomic group D (although other Group D strains were scattered throughout the dendrogram). The other clade contained the remaining Mesorhizobium spp., but also R. leguminosarum and Bradyrhizobium spp. In the Bayesian analyses (Figures p-Biolog-0-1-MB and p-Biolog-0-1-2-MB), nine isolates formed a large clade consisting of Mesorhizobium spp. and R. leguminosarum, with an internal Bradyrhizobium spp. clade.
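The recoding of Biolog well states described above can be sketched as follows. This is an illustrative sketch rather than the software actually used: the function names are hypothetical, and counting partial reactions as positives when applying the "fewer than three positive results" cut-off is an assumption.

```python
PARTIAL = 0.5  # value assigned to a partially changed well


def to_two_state(profile):
    """Two-state coding: partial reactions (0.5) are treated as positives (1)."""
    return [1 if value in (PARTIAL, 1) else 0 for value in profile]


def to_three_state_labels(profile):
    """Three-state coding for software that needs integer states: 0.5 becomes 2.

    The label 2 carries no magnitude; the states are treated as unordered.
    """
    labels = {0: 0, PARTIAL: 2, 1: 1}
    return [labels[value] for value in profile]


def is_usable(profile, minimum_positives=3):
    """Flag profiles with too few positives (assumes partials count as positive)."""
    return sum(to_two_state(profile)) >= minimum_positives


# Hypothetical five-well excerpt of a plate reading (not data from this study).
wells = [0, 0.5, 1, 1, 0]
print(to_two_state(wells))
print(to_three_state_labels(wells))
print(is_usable(wells))
```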
Grouping of strains by Biolog did not conform to currently accepted genera in bacterial classification, nor to classifications based on phenotypic investigations Kuykendall05,Brenner05. There was also no clear correlation of the clades to the genomic groups defined by the gene trees.

Fatty acid methyl ester profiles

Figure p-fatty_acids (FAME dendrogram). Dendrogram of Fatty Acid Methyl Esters (FAME). Dendrogram was calculated by the unweighted pair group method with arithmetic mean (UPGMA) and the scale is expressed in Euclidean distances. There are duplicate runs of some strains. Genomic group assigned from phylogenetic analyses is indicated where known.

Fatty acid methyl ester (FAME) profiles were generated by Central Science Laboratories in York, England. Hard copies of raw data and a dendrogram were received. The hardcopy tree was scanned and manually converted to a vector based format, and species names and genomic grouping labels were applied. Strains Mesorhizobium sp. 12637, Mesorhizobium sp. 14321, B. japonicum, B. liaoningense, and E. saheli did not grow or were contaminated (no profiles were generated from these). There are 55 profiles representing 45 strains (Tables t-Complete-type, t-Complete-native, and t-Complete-exotic). The scale of the dendrogram is expressed in Euclidean distances. "Euclidean distances of 2--3 are expected in reruns of the same strain under the strict cultural and CG conditions applied. Strains within a species usually have Euclidean distances of less than 7 [on average]. Distances above 7 often imply different species [although in this analysis] the rule may slip to as much as 12" (David Stead, pers. comm.). Some of the strains were repeated in the analysis. In most cases the replicates clustered closely, although some at a distance greater than 2--3. The repeated strains that did cluster closely were: 14291, 14642, 12674, M. amorphae, 11719, M. chacoense, 12649, 14319, 12690, and 12687.
However, replicates of strain 14324 and E. terangae are quite different. Mesorhizobium sp. 14324 (genomic group unknown) is present in two well-separated Mesorhizobium clades, about 18 Euclidean distance units (ED) apart. One replicate of E. terangae clusters with two other Ensifer species, yet the other replicate is over 50 ED away, as an outgroup to all other profiles. It is probable that the data for the latter replicate are erroneous. Grouping of strains did not conform to currently accepted genera in bacterial classification. Bradyrhizobium species from introduced legumes were found in two clusters separated by about 45 ED; the only Bradyrhizobium type strain to be analysed was 19 ED away from the closest cluster. Ensifer type strains were found in three clusters (not including the erroneous outgroup) that were well separated in the dendrogram. Rhizobium species were found in five well separated clades, including R. leguminosarum 12687 in a Mesorhizobium clade. Mesorhizobium species were found in about eight clades (depending on how they were defined). Likewise there was no clear correlation to the genomic groups defined by the 16S rRNA tree.

Discussion of phenotypic analyses

Introduction

The topologies of the Biolog and FAME trees are quite different from each other and from the phylogenetic analyses. Both Biolog (conducted in this laboratory) and FAME analyses (conducted at CSL, York) had problems with reproducibility, analysis, and disagreement with currently accepted bacterial classification. Therefore in interpreting the data for this chapter more weight is placed on the genetic work.

Critique of Biolog analysis

Although a few studies have used the Biolog system for systematics McInroy99,Wolde04a,Wolde04b, by far its most common use is for identification of isolates against a defined database. The system requires living cells, which must be handled correctly and be at the correct stage of growth and concentration.
Furthermore, the system was modified from the standard procedure for this study. The 'bug' agar was replaced with R2A agar, because New Zealand rhizobial isolates grew poorly on 'bug'. This is not a drastic change, as the Biolog company once recommended R2A medium in an early version of the protocol (Maureen Fletcher, pers. comm.), and the results would be internally consistent. The second major problem with the Biolog data was with analysis. With hierarchical clustering in GenStat there are seven different methods of forming the similarity matrix (Euclidean, Pythagorean, Jaccard, Simple matching, Cityblock, Manhattan, and Ecological), and seven different methods of clustering (Single link, Nearest neighbour, Complete link, Furthest neighbour, Average linkage, Median sort, and Group average). The analysis methods used here (Simple matching and Average linkage) were recommended by a statistician (Greg Arnold, pers. comm.). Trials using the other methods of analysis gave different results (not shown), but none of these other analysis methods gave results that were consistent with the gene phylogenies. Other studies have been made of rhizobia using Biolog. In a study of 15 rhizobial strains (mostly types), McInroy99 used R2A medium and average linkage cluster analysis, supporting the choices used here. This method worked well, and generally the genera were separated, although the Rhizobium strains split in two, with the R. tropici strains grouping with Sinorhizobium (Ensifer). In a study of Ethiopian native and exotic legumes, Wolde04a,Wolde04b used UPGMA cluster analysis with arithmetic averages. There was good separation of the genera; in both analyses Bradyrhizobium strains were at the base of the dendrogram. These studies show that Biolog dendrograms could be used to establish relationships, but clear genus delineations are not seen in the data presented here.
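The combination of simple matching and average linkage named above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not GenStat's implementation: simple matching is taken as the fraction of wells in which two profiles share the same state, average linkage as the mean of all pairwise similarities between two clusters, and the strain names and profiles are hypothetical.

```python
from itertools import combinations


def simple_matching(a, b):
    """Fraction of characters (wells) in which two profiles have the same state."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of wells")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def average_linkage(profiles):
    """Naive agglomerative clustering; returns merges as (cluster, cluster, similarity)."""
    def mean_similarity(c1, c2):
        # Average linkage: mean similarity over all between-cluster pairs.
        pairs = [(p, q) for p in c1 for q in c2]
        return sum(simple_matching(profiles[p], profiles[q]) for p, q in pairs) / len(pairs)

    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        # Merge the pair of clusters with the highest mean similarity.
        best = max(combinations(clusters, 2), key=lambda pair: mean_similarity(*pair))
        merges.append((best[0], best[1], mean_similarity(*best)))
        clusters = [c for c in clusters if c not in best] + [best[0] + best[1]]
    return merges


# Hypothetical two-state profiles (not data from this study).
profiles = {"s1": [1, 1, 0, 0], "s2": [1, 1, 0, 1], "s3": [0, 0, 1, 1]}
merges = average_linkage(profiles)
for left, right, similarity in merges:
    print(left, right, similarity)
```

Swapping `simple_matching` for another coefficient (for example Jaccard) or the mean for a minimum or maximum changes the resulting dendrogram, which is the sensitivity to method choice described above.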
Because of the great variation among different clustering methods, and the lack of any apparent way to test the validity of the hierarchical clustering dendrogram, a Bayesian system was used (which is also designed to analyse phenotypic data). The strength of this method is its rigorous statistical measure of support, expressed as clade probabilities. These dendrograms were worse in some respects than the hierarchical clustering analyses (tightly grouping R. leguminosarum and B. japonicum), and better in others (placing the Bradyrhizobium strain with the Bradyrhizobium type species). The dendrograms did not produce clustering consistent with established classification. It is unclear whether using the 'three-state' data (including partial changes as another character) or the 'two-state' data (converting partial changes to positives) gave better results, although clades were perhaps better resolved with the 'three-state' data. Although previous studies have shown some value of Biolog phenotypic analyses in identification and classification of strains, in this study strains were not correctly placed into genera, and thus inferring relationships from these data is dubious.

Critique of FAME analysis

Fatty acid methyl ester (FAME) profiles were determined under contract by Central Science Laboratories (CSL) in York, England. A printed dendrogram was received from CSL, and no further analysis of the data was possible. There have been several previous studies of rhizobia using FAME. Jarvis96 investigated 215 strains of rhizobia and agrobacteria. The data were analysed by principal component analysis, and fairly good resolution of the species was found. Sawada92 also used principal component analysis of FAME data in an investigation of Agrobacterium taxonomy.
Tighe00, in a large study of 600 strains, used 3D principal component analysis and found some correlation to genomic data, although Ensifer and Rhizobium could not clearly be distinguished, and one Mesorhizobium strain grouped with Bradyrhizobium. Unfortunately principal component analysis could not be performed in this study as the data for the dendrogram were not provided. In the dendrogram supplied, replicates of most strains grouped together; however, the distance between them was greater than expected for profiles of identical strains. The FAME dendrogram, like the Biolog ones, did not group related genera together, which calls into question the accuracy and reliability of these data. This indicates a failure of this method to classify the strains of this study. Since this method was unable to resolve relationships at the genus level, relationships below this level are likely to be spurious.

Conclusions

Relationship of genomic group to host plant

These data did not show a clear relationship within a rhizobial genus to original host legume. This may occur because host specificity in Mesorhizobium is conferred by a transmissible element. Studies of Lotus corniculatus have shown that this legume species was not nodulated in pristine New Zealand soils because there were no effective bacteria present Greenwood77. Nodulation and fixation were initiated when Lotus was inoculated with an effective rhizobial strain Patrick92. Since then, it has been shown that the effective rhizobial symbiont of Lotus corniculatus, Mesorhizobium loti, carries nodulation and nitrogen-fixation genes on a large transmissible genetic element, the 'symbiosis island', of 500 kb Sullivan95,Sullivan98. This symbiosis island can be transmitted to, and incorporated by, a range of Mesorhizobium strains already present in the soil, converting them into effective strains Sullivan95.
This raises the question of whether symbiosis islands may also be involved with transfer and fixation in the native New Zealand Mesorhizobium. If so, then the observation that sequences representative of isolates from Carmichaelia and Sophora are distributed across the Mesorhizobium clade indicates either that a single symbiosis island with a broad host range is responsible for nodulation and fixation of several native legume genera, or that symbiosis islands specific for each native legume genus are distributed across the genus. By their nature, symbiosis islands are incorporated into the bacterial genome of recipient strains. It seems clear that these genes may be transferred between many, if not all, Mesorhizobium species. The distribution of sequences in Mesorhizobium, apparently representing several species, raises a fundamental question concerning the specificity of the association of the effective nodulating strains. It appears that many, if not all, known Mesorhizobium spp. reported in other countries Chen05a are present in New Zealand and have nodulating capacity with the native legumes. The further studies presented in Chapters 4 and 5 sought to establish the extent and genetic basis of host specificity of these strains.

Rhizobium leguminosarum

An exception to the general association of native legume strains with Mesorhizobium was four isolates from Sophora chathamica, Carmichaelia australis and Clianthus puniceus that were very similar to Rhizobium leguminosarum in all genes sequenced. However, the recorded host-range of R. leguminosarum is Lathyrus spp., Lens spp., Phaseolus spp., Pisum spp., Trifolium spp. and Vicia spp., allocated to three biovars, named according to the host plants with which they are associated Kuykendall05. Therefore isolations reported in the present study may represent extensions to the known host-range of R. leguminosarum.
A possible explanation is that these strains of R. leguminosarum may harbour broad-host-range Sym plasmids Hooykaas81, or may have acquired a specific nodulating plasmid, or symbiosis island, from the Mesorhizobium strains, which would enable the nodulation of Carmichaelia, Clianthus and Sophora. Chapter 4 presents the results of an investigation of nodulation genes, and Chapter 5 of host-range studies of these strains. Only a single Rhizobium species, R. leguminosarum, was isolated from native species, in contrast with nodulation by Mesorhizobium spp. or Bradyrhizobium spp., where diverse strains of each genus were isolated. This may indicate a unique property of the R. leguminosarum strains, or alternatively may reflect the abundance of this species in New Zealand soils due to extensive inoculation of clover in pasture Hastings66. A larger sample of non-Mesorhizobium species from native legumes would be required to investigate this further.

Introduced weed legumes

A primary question of this research was whether the legume weeds introduced into New Zealand (broom, gorse and wattle) nodulated with rhizobia that were cosmopolitan (already present in New Zealand) or with rhizobia introduced during colonisation. A further possibility was that these hosts were able to take advantage of native New Zealand rhizobia. This study indicates that most rhizobia isolated from New Zealand native legumes are members of Mesorhizobium, and all isolates obtained from the introduced legumes studied are members of Bradyrhizobium. Therefore it is clear that the two groups of legume plants of different origins are nodulated by unrelated rhizobial populations. This means that introduced legume weeds in New Zealand are using either cosmopolitan or introduced strains, and have not been nodulated by strains from native legumes.
During the course of this work, other studies were published identifying Bradyrhizobium as the predominant symbiont of broom (Cytisus scoparius) Sajnaga01b,Perez03,Rodriguez-Echeverria03. Surprisingly, given the serious weed status of gorse (Ulex europaeus), only one other publication has identified gorse symbionts, where it was shown that gorse and Acacia koa were nodulated by Bradyrhizobium spp. in Hawaii Leary06. Australian Acacia have been reported to nodulate predominantly with Bradyrhizobium, and to a lesser extent with Rhizobium tropici Lafay01. The work of this thesis confirms the results of these studies, by showing that Acacia, Ulex, and Cytisus are nodulated by Bradyrhizobium spp. in New Zealand. The nodulating Bradyrhizobium may have been transmitted in the course of dispersal of the plants Wang03. For instance, Bradyrhizobium could be introduced either with adventive legumes, in soil imported with other plants, or with seed. Alternatively, these bacteria may occur naturally in New Zealand soils without being involved in symbiotic associations, but have been available to nodulate the introduced legumes. These bacteria may have been present before the breakup of Gondwana, or arrived since then by various mechanisms. The heterogeneity of the Bradyrhizobium sequences isolated from gorse and broom is substantially greater than the recorded difference between B. liaoningense and B. yuanmingense, which are classified as separate species; hence the New Zealand Bradyrhizobium sequences may represent several new species. This may be an indication of a long presence and evolution in New Zealand, rather than of a small recent founder population. Similar heterogeneity in Bradyrhizobium has been recorded elsewhere Lafay98,Jarabo03, suggesting the classification of Bradyrhizobium species is far from complete.
Summary of polyphasic analyses

Rhizobia isolated from legume plants in New Zealand were investigated by polyphasic methods, although the phenotypic analyses were surprisingly poor and were not used for systematic inference. Gene trees were built with four housekeeping genes (16S rRNA, atpD, glnII, recA), using maximum likelihood and Bayesian analyses on both DNA and protein data. Isolates from native legumes were identified as predominantly Mesorhizobium spp., and R. leguminosarum. Isolates from introduced woody weed legumes were identified as Bradyrhizobium spp. Strains were assigned to one of eight 'genomic groups' based on their similarity in the 16S rRNA data. No clear relation of host legume to a genomic group within a rhizobial genus was seen. Strains were also analysed by phenotypic methods (Biolog and FAME). Neither analysis agreed with the accepted classification of Mesorhizobium, Rhizobium, Bradyrhizobium and Ensifer into discrete genera, and strains did not cluster into consistent groups. FAME and Biolog appear not to have value in discriminating rhizobia at generic and species levels. Such incongruence of different methods of analysis has been reported before in Pseudomonas species analysed with 16S rRNA, ribotyping, SDS-PAGE, FAME, Westprinting, Biolog, and Biotype100 Young00. It is apparent that phenotypic methods are not a reliable means of systematic inference. In the next chapter, further phylogenetic analyses were performed on a gene involved in the symbiosis process, to determine whether host-specific patterns are seen with a symbiosis gene, and whether horizontal transfer of a transmissible symbiotic genetic element has occurred.