Review articles

SINE insertions: powerful tools for molecular systematics
Andrew M. Shedlock1,2 and Norihiro Okada1*

Summary
Short interspersed repetitive elements, or SINEs, are tRNA-derived retroposons that are dispersed throughout eukaryotic genomes and can be present in well over 10^4 total copies. The enormous volume of SINE amplifications per organism makes them important evolutionary agents for shaping the diversity of genomes, and the irreversible, independent nature of their insertion allows them to be used for diagnosing common ancestry among host taxa with extreme confidence. As such, they represent a powerful new tool for systematic biology that can be strategically integrated with other conventional phylogenetic characters, most notably morphology and DNA sequences. This review covers the basic aspects of SINE evolution that are especially relevant to their use as systematic characters and describes the practical methods of characterizing SINEs for cladogram construction. It also discusses the limits of their systematic utility, clarifies some recently published misunderstandings, and illustrates the effective application of SINEs for vertebrate phylogenetics with results from selected case studies.
BioEssays 22:148-160, 2000. © 2000 John Wiley & Sons, Inc.

Introduction

Systematics, genomics, and the nature of retroposons
The field of systematics is presently moving through an exciting period of discovery which has been advanced in large part by the examination of molecular genetic characters, primarily in the form of comparative DNA nucleotide sequences. Great strides have been made, but ambiguities in data sets due to back-mutation in nucleic acid character states confound the effective use of many molecular markers for phylogenetic inference. Appropriately matching the variation in sequence data to any given systematic question is of prime importance but is not necessarily straightforward. Commonly encountered problems also include nucleotide base composition bias, uneven mutation rates, and incomplete lineage sorting of polymorphic characters.(1) Accurately modeling DNA sequence evolution and refining the statistical analysis of phylogenetic signal within well-characterized regions continue to be major goals.(2) Many difficulties remain, however, and as the field rapidly develops in concert with the study of genomics, numerous investigators are eagerly searching for novel markers of clear phylogenetic utility in the expansive and largely unexplored world of nuclear "gene space."

1 Tokyo Institute of Technology, Faculty of Bioscience and Biotechnology, Yokohama, Japan.
2 The Institute of Statistical Mathematics, Tokyo, Japan.
*Correspondence to: Norihiro Okada, Tokyo Institute of Technology, Faculty of Bioscience and Biotechnology, 4259 Nagatsuta-cho, Midori-ku, Yokohama 226-8501, Japan. E-mail: nokada@bio.titech.ac.jp

The wave of DNA sequence information generated over the past decade has revealed in fine detail what was indicated by renaturation studies completed more than 30 years ago: remarkably, not more than 5% of the higher eukaryotic nuclear genome is composed of genes encoding specific protein products; the genome is instead largely formed by a variety of repetitive sequences with less obvious function. Progress in the Human Genome Project, in particular, indicates that at least 30% of our chromosomal DNA is either composed of or derived from repeat sequences called short and long interspersed elements, or SINEs and LINEs, respectively.(3) Similarly, well-studied plant genomes such as those of maize and Arabidopsis also clearly suggest that more than 50% of the nuclear DNA of higher plants is composed of repetitive sequences.(4) These are rather dramatic discoveries and underscore the importance of repetitive elements both as evolutionary agents for shaping the diversity of genomes (5-9) and as ubiquitous potential sources of phylogenetic information.(10-15) A large fraction of repetitive sequences are associated with mobile elements that can move from a parent locus to a target locus on the DNA level.
Such dispersed mobile repeats are fundamentally different from tandemly repeated sequences, such as microsatellites, which arise by gene duplications.(16) The general process of relocation, termed transposition, can be mediated by either DNA or RNA. This aspect of movement within the genome distinguishes two broad categories of transposable elements: 1) transposons, which are wholly DNA-based elements found in both prokaryotes and eukaryotes that can directly relocate autonomously via recombination; and 2) retroposons, which relocate indirectly via RNA intermediates. RNA-mediated transposition appears to be restricted to eukaryotic genomes and is specifically termed retroposition to denote the reverse flow of genetic information that occurs from RNA back into chromosomal DNA.(8,17) Retroposons are further divided into two superfamilies based on common structural features: 1) the viral superfamily, which encodes reverse transcriptase (RTase) and includes retroviruses, long terminal repeat (LTR) retrotransposons, and LINEs (also known as non-LTR retrotransposons); and 2) the nonviral superfamily, which does not encode RTase and includes SINEs and processed retropseudogenes. Our understanding of the relationships among these groups of elements is still incomplete but it continues to develop. In particular, retroviruses are thought to have evolved from LTR retrotransposons by the acquisition of env genes, and LTR retrotransposons in turn are believed to have evolved from LINEs by the acquisition of long terminal repeats.(18) LINEs encode RTase and are typically moderately or severely truncated at the 5' end, which suggests that an RTase encoded by a LINE must recognize the 3' end of the RNA template for first strand synthesis.(19) As discussed below, most SINEs are derived from tRNA and are believed to recombine and interact functionally with corresponding LINEs, leading to the acquisition of their retropositional activity.(20)

Figure 1. General model for a possible tRNA-derived SINE amplification process. Corresponding LINE and SINE components are color-coded and share a common region (green) due to a recombination process. Reverse transcriptase (yellow) is generated by a LINE and the corresponding SINE transcript can be recognized by its LINE-derived tail region (bold black 3' end). The SINE transcript is then reverse transcribed into cDNA and integrates into the host genome (red site) by the target DNA-primed mechanism adopted by LINEs.(19,22)

SINEs are distinguished in part from LINEs by their large copy number, relatively short length, and inability to encode enzymes, such as RTase, that are essential for their own amplification. They typically range between 70 and 500 bases in length and may be present in well over 10^4 total copies in the eukaryotic genome.(13,21) Thus, SINEs greatly outnumber other repetitive elements and are of increasing interest to systematists because of their exceptional diagnostic power for establishing common ancestry among taxa and their straightforward analysis once properly characterized.

The precise biochemical mechanism of SINE amplification is not yet completely known. The basic aspects are well characterized, however, and there is increasing evidence that SINEs may acquire the necessary enzymes for retroposition from corresponding LINEs, which do code for reverse transcriptase (19,20,22) and are thus capable of self-replication. Figure 1 illustrates a basic model of the SINE amplification process. Most SINEs are derived from cellular non-viral RNA and are composed of three intact major regions: a 5' tRNA-related region, a tRNA-unrelated region, and a 3' AT-rich region.(13) A major clue to the possible functional relationship between SINEs and corresponding LINEs is the direct homology evident between the 3' sequence tails in the tRNA-unrelated region of SINEs and those of certain LINEs. This homology of common 3' sequences has been demonstrated to hold true across a broad eukaryotic taxonomic spectrum and the partnership between these two elements has important implications for the general role that retroposons may play in the speciation process.(20,22,23) Clearly, the sheer number of insertion events from hundreds of thousands of SINEs adds great fluidity to genomes and has the potential to influence their architecture to a much greater degree than other common molecular genetic mechanisms such as point mutation, meiotic recombination, and DNA transposition. Furthermore, the retroposition of SINEs in concert with corresponding partner LINEs is quite likely an ancient molecular process and is consistent with the intriguing hypothesis that historical retroposon activity may have played an important role in landmark radiations of eukaryotic taxa.(18,20,23,24)

SINE evolutionary models
Understanding the evolutionary mechanisms of SINEs forms a critical foundation for their intelligent use as systematic tools. Repetitive SINE families are typically derived from tRNAs (13,14,25), with the notable exceptions of human Alu and rodent type 1 Alu families, which are homologous to 7SL cytoplasmic RNA.(26) SINEs can be classified according to the tRNA species from which they originated, and tRNA(Lys) appears to be the most common source of new SINEs in nature.(13,14,25) A SINE family is further classified into subfamilies based on relative sequence similarity. Two contrasting models of SINE evolution have emerged from available evidence: the master gene model (27) and the multiple source gene model.(28) Each of these supports different predictions regarding the possible role of selection, expected amplification rates, accumulation of sequence diversity within a SINE family and the subsequent formation and distribution of subfamilies over evolutionary time.

In the master gene model (Fig. 2A), only one or perhaps a few highly active "master" SINE loci are capable of amplification, giving rise only to non-propagating offspring copies. Thus, any diagnostic mutations in the master will be reflected in subfamilies derived vertically from the original master sequence. In this scenario, the amplification rate is tightly coupled to the activity of the master gene and will either decrease or increase over time based on the accumulated mutations in the master only. Because the master must be maintained over long periods of evolutionary time, this model predicts a functional role of the master for the host; that is, a successful master gene would likely need to be under selective constraints or in some other way protected so that it could maintain amplification activity for tens of millions of years.

Alternatively, the multiple source gene model (Fig. 2B) includes SINE offspring copies that potentially can propagate with the same capacity as their parent loci. Hence, copies can serve as "multiple sources" for subsequent SINE amplification. Aspects of this model have been reviewed by Schmid and Maraia.(29) SINE retroposition efficiency can be influenced by many factors at the molecular level, such as the chromosomal environment at the site of retroposition, RNA secondary structure, and promoter activity. SINEs that do not find themselves in favorable chromosomal environments for amplification or that become compromised by the accumulation of mutations will be gradually inactivated over evolutionary time. In this model, subfamilies can result from distinct, multiple, active founder genes and the rate of amplification is tightly linked to overall copy number. Unlike the master gene model, it is not necessary to invoke a functional role to support long-term persistence of individual source genes. The amplification rate will increase or decrease over time depending on whether accumulated mutations in all copies deactivate loci faster or slower than the overall production of new copies.

Evidence supporting the master gene model stems largely from early studies of sequence variation and divergence estimates among subfamilies of the human Alu family of SINEs (27,30) and, in particular, the ID SINEs in rodents.(31) A number of other detailed phylogenetic analyses, however, of Alu sequences (28,29,32), the mammalian L1 and MIR repeats (33,34), the AFC SINEs in cichlid fishes (10,23), and the extensive characterization of the Hpa-I subfamily in salmonid fishes (35-38) are all most clearly explained by the multiple source gene model. This expanding comparative database strongly suggests that this model may reflect the more probable mode of evolution for most SINEs other than the ID family in rodents.
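The contrasting predictions of the two models for amplification rate can be illustrated with a toy calculation. The following is only a minimal sketch under loud assumptions: time is counted in arbitrary steps, all rates are hypothetical constants, and the function names are invented for illustration. It is not a model taken from the cited studies.

```python
def master_gene_copies(steps, copies_per_step):
    """Total copies when a single master locus is the only active source.

    Offspring copies never propagate, so growth is linear in time and
    depends only on the (assumed constant) activity of the master.
    """
    total = 1.0  # the master locus itself
    history = [total]
    for _ in range(steps):
        total += copies_per_step
        history.append(total)
    return history


def multiple_source_copies(steps, birth_rate, inactivation_rate, active=1.0):
    """(active, total) copies when every active copy can itself propagate.

    Each step, the active copies spawn `birth_rate * active` new active
    copies, and `inactivation_rate * active` copies become inert.
    Inert copies remain in the genome, so the total never decreases.
    """
    total = active
    history = [(active, total)]
    for _ in range(steps):
        new_copies = birth_rate * active
        inactivated = inactivation_rate * active
        active += new_copies - inactivated
        total += new_copies
        history.append((active, total))
    return history


# Hypothetical rates: production outpaces inactivation, then the reverse.
expanding = multiple_source_copies(50, birth_rate=0.10, inactivation_rate=0.05)
declining = multiple_source_copies(50, birth_rate=0.05, inactivation_rate=0.10)
```

In the sketch, the multiple source family expands when new copies are produced faster than loci are inactivated and fades toward an inert "fossil" state otherwise, whereas the master gene family grows only as fast as its single source allows.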
Mechanisms controlling the birth and death of SINEs in specific lineages define the active lifespans of these retroposons over evolutionary time and are important to understand with respect to their possible implications for a SINE's phylogenetic utility. As mentioned above, the 3' tail of a given SINE is derived from that of a specific partner LINE present in the same genome; consequently, it must be generated by a recombination event with a corresponding LINE. General models for this kind of recombination, as well as for the mechanism that generates the other major components of SINEs, namely, the tRNA-related region and the tRNA-unrelated region excluding the 3' tail, have yet to be proposed. The opposite process, SINE death, is another key concept and may be influenced by a variety of factors. Under the multiple source gene model, it is clear that the fate of a given SINE will be dependent on numerous conditions, and SINEs that do not find themselves in favorable chromosomal environments for amplification and that become compromised by the accumulation of mutations will become inactivated and rendered inert or "dead" over evolutionary time. Likewise, because SINEs appear to acquire retropositional activity by sharing common 3'-end sequence tails with particular LINEs, we expect that for each SINE family in the genome, a corresponding LINE family with the same 3'-end sequence must exist in the same organism. If, over the course of evolutionary time, all the source genes of a LINE family become inert, then all the members of the partner SINE family would also become inert. In other words, the death of a LINE automatically dictates the extinction of its corresponding SINEs in the same organism.(20) A pair of mammalian LINE2 and MIR SINEs provides a typical example of such a relationship in that the MIRs probably died about 150-200 Myrs ago due to the death of mammalian LINE2 (see below).(23) Details of analyzing patterns of SINE insertion as well as placing aspects of a SINE's lifespan within a practical systematic context are discussed later in this review.

SINE insertion dynamics as the key to their systematic utility
Given the enormous complexities of the evolutionary process, the notion of a phylogenetic character type that is essentially free of ambiguity seems too good to be true. Yet despite considerable scrutiny of these markers within vertebrate, invertebrate and plant genomes, SINE insertions appear to offer such a treasure trove for systematic biology. The keystone to the intrinsic value of SINEs as systematic characters is their widely dispersed, irreversible re-integration into the host genome. It is worth highlighting that these features of SINE genetics are well-established phenomena and should not be confused with less clear aspects of SINE biochemistry, such as the precise mechanism of how SINEs acquire reverse transcriptase from other sources (e.g., LINEs) during the amplification process, or whether the master or multiple source gene model most accurately explains empirical patterns of variation in divergent SINE sequences. Although hotspots of insertion may occur in exceptional cases (H.A. Wichman, unpublished data), and human Alus may preferentially integrate in locally AT-rich regions or in R-band regions of chromatin (28,39), SINEs are commonly found dispersed throughout the genome and can be primed for retroposition at nicked sites.(19,22,40) Furthermore, there is still no evidence of any process that specifically removes SINEs from chromosomal DNA despite more than a decade's worth of intense analysis of patterns of retroelements in humans and other eukaryotes (see numerous detailed review papers (4,8,13,14,18,21,25,30,41)). Large, non-specific SINE deletions (e.g., unequal recombination between short repeats to either side of an insert) are relatively rare events that would also be detected by PCR diagnostics and comparative sequence inspection during SINE insertion analysis (see following section on Applications). Likewise, horizontal transfer of SINEs is severely restricted by their non-autonomous amplification. Even if it does occur occasionally between species (38,42), it is not problematic for the general use of SINEs as systematic characters: it is extremely unlikely that a SINE would be horizontally transferred exactly at the same locus independently in two different taxa. Thus, the probabilities of a SINE either inserting precisely in the same locus in two different unrelated lineages in a convergent manner or else being precisely excised from the genome and leaving no detectable trace are both exceedingly minute and, for all practical purposes, can be ignored.

Because copies of the same SINE shared in two different taxa are derived from the same initial insertion event in the germ line of a common ancestor, they define monophyletic groups, or clades, and can be considered essentially noise-free synapomorphies (shared, derived characters) in the sense of Hennigian phylogenetic methodology.(43) This makes them extremely powerful tools for systematics and distinguishes them in a fundamental manner from other standard molecular characters, most notably aligned nucleotide sequence data. Because sequence mutations are reversible, nucleotide characters are statistically considered to be ancestral or derived based on some probability of back-mutation at shared sites among individuals in an aligned data matrix (see reviews on DNA sequence characters (44-46)). Tree construction from DNA sequences has consequently become a sophisticated statistical pursuit with the routine use of larger data matrices and more complex mutation models (47,48), whereas building cladograms from fewer but highly informative SINE insertion data is straightforward in the absence of any character conflict.(12,43) Such unambiguous hypotheses can be strategically integrated with results from other data sets, such as DNA sequences, morphology, and ecological traits, to conclusively solve difficult biological problems.
In addition, high-confidence cladograms from SINE insertion analyses can be effectively used as reference hypotheses to test the diagnostic performance of different molecular markers, models of nucleotide substitution, and statistical methods of phylogenetic inference in a manner similar to the innovative use of known phylogenies generated by computer simulation (49) or those that can be created experimentally with microbes.(50)
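Why cladogram construction from insertion data is straightforward can be shown with a short sketch. It assumes that each locus has been scored per taxon as '+' (insertion present), '-' (insertion absent) or None (not scored), and that absence is the ancestral state. The data structure, the function names and the example loci are hypothetical. Under these assumptions two insertion characters agree with one tree whenever their '+' sets are nested or disjoint, so character conflict reduces to a simple set test.

```python
def find_conflicts(matrix):
    """Return pairs of loci whose '+' sets overlap without nesting.

    matrix maps locus -> {taxon: '+', '-', or None}. Only taxa scored
    for both loci are compared, so missing data never creates conflict.
    """
    conflicts = []
    loci = sorted(matrix)
    for i, a in enumerate(loci):
        for b in loci[i + 1:]:
            shared = [t for t in matrix[a]
                      if matrix[a][t] is not None and matrix[b].get(t) is not None]
            with_a = {t for t in shared if matrix[a][t] == '+'}
            with_b = {t for t in shared if matrix[b][t] == '+'}
            overlap = with_a & with_b
            nested = with_a <= with_b or with_b <= with_a
            if overlap and not nested:
                conflicts.append((a, b))
    return conflicts


def nested_clades(matrix):
    """Distinct clades (two or more taxa sharing an insertion), largest first."""
    clades = {frozenset(t for t, state in calls.items() if state == '+')
              for calls in matrix.values()}
    return sorted((c for c in clades if len(c) >= 2), key=len, reverse=True)


# Hypothetical scoring of four taxa at two loci.
example = {
    "L1": {"t1": "+", "t2": "+", "t3": "-", "t4": "-"},
    "L2": {"t1": "+", "t2": "+", "t3": "+", "t4": "-"},
}
print(find_conflicts(example))  # no conflicting pairs
print(nested_clades(example))   # {t1, t2, t3} contains {t1, t2}
```

Read from largest to smallest, the clades in this example correspond to the cladogram (((t1, t2), t3), t4). A locus scored '+' only in t2 and t3 would conflict with L1 and would be reported by `find_conflicts`.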

The value of SINE flanking sequences
Like morphological synapomorphies, SINE insertion events allow one to establish tree topologies, but by themselves are unreliable for calculating relative branch lengths without reference to additional information, such as an explicit model of amplification rates for different independent loci over evolutionary time. As discussed above, it is clear that such a model would depend on numerous factors that are difficult to estimate, such as the precise amplification mode of each locus and the historical profile of environmental conditions and population diversity for each lineage. To date, evidence from the extensively characterized Alu elements in humans, the MIR repeats in mammals, and the Hpa-I subfamily in Pacific salmon indicates that significant rate heterogeneity should be expected during the active lifespan of some loci within SINE subfamilies (21,30,33,38,51), making the a priori assumption of a constant amplification rate for any given locus difficult to justify. Fortunately, sequences of the SINE elements themselves are easily characterized as new informative loci in their own right, as are the DNA sequences immediately flanking each inserted element (see next section). Together, these two different forms of systematic character data (SINE insertions and nucleotide substitutions) can be integrated and analyzed to estimate the timing of phylogenetically informative SINE insertion events that occur on internodes of a cladogram. A basic assumption, of course, is that the levels of divergence between the flanking sequences examined are sufficiently large and free of excessive back-mutation. Branch lengths on SINE cladograms can be accurately estimated in theory because the amount of divergence between SINE flanking sequences analyzed for monophyletic taxa will be proportional to the time that has elapsed since a diagnostic SINE locus originally appeared in the genome of a common ancestor. This is feasible because each SINE insertion represents an irreversible evolutionary event at time zero for a molecular clock that corresponds to nucleotide change in its flanking sequences. An added convenience of such flanking sequence analysis is that the majority of SINE insertions are found in non-functional areas of the genome where the sometimes dubious assumptions of the molecular clock, most notably a constant mutation rate, are less likely to be violated by the occurrence of selection. This analytical approach was first elegantly applied to a human Alu SINE inserted adjacent to the HLA-DQA1 gene.
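The clock reasoning for flanking sequences can be sketched numerically. The following is a hedged illustration rather than the method of any cited study: it assumes two aligned flanking sequences of equal length, applies the standard Jukes-Cantor correction for multiple substitutions, and converts distance to time with a constant substitution rate that the user must supply. The function names and the rate parameter are assumptions made for the example.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites with gaps or ambiguous bases in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor corrected substitutions per site for observed difference p."""
    if p >= 0.75:
        raise ValueError("observed difference too large for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def time_since_split(distance, rate_per_site_per_myr):
    """Myr since two lineages diverged, assuming a constant clock.

    Both lineages accumulate changes, hence the factor of two.
    """
    return distance / (2.0 * rate_per_site_per_myr)
```

Because the insertion must predate the split of the taxa that share it, a time estimated this way from their flanking sequences gives a minimum age for the insertion event, subject to the clock assumptions noted above.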
In that study, the approach was used to date the divergence of DQA1 and DQA2 sequences with respect to the timing of the lineage split between hominids and Old World monkeys.(52) Although the examination of SINE flanking sequences is still at an early stage, it nicely complements the proven utility of SINE insertion analysis for cladogram construction and should add significantly to the overall value of SINEs as they become more widely applied as systematic tools.

Application of SINEs for phylogenetics

Isolation of novel loci
Initial characterization of SINEs was based on comparisons of reported repetitive families and compilation of tRNA sequences in mammals.(53,54) In order to gain direct access to a wider range of comparative information about SINE families, a more efficient means of detecting and characterizing new SINEs from different genomes was developed by Endoh and Okada (55), namely in vitro transcription of total genomic DNA. This technique allows one to discern, quite efficiently, the specific tRNA origin of new repetitive families. It also enhances the use of SINEs as systematic tools because it greatly broadens the scope of taxa from which one can search for novel loci of phylogenetic utility. For many species, in vitro transcription of total genomic DNA results in a discrete transcript that can be fingerprinted as if it were a single RNA species and also used as a hybridization probe to isolate cloned DNA fragments containing SINEs from a genomic library. This latter assay is essential in that it allows one to easily subclone and sequence numerous members of new SINE families and their adjacent flanking regions. Once novel loci have been isolated and sequenced from the genomic library, they can be characterized as follows prior to their use for phylogenetic inference: 1) a new SINE can be radioactively labelled and dot blot hybridization experiments can be conducted with genomic templates from various taxa of interest in order to calculate relative copy number of new SINEs in the host genome and to assess their taxonomic distribution; 2) sequences of newly cloned loci can be aligned and examined for profile differences that define distinct SINE subfamilies; and 3) subfamilies well matched in their taxonomic distribution to the phylogenetic question at hand can be selected for locus-specific PCR primer development in their adjacent flanking sequences. It is worth noting that the great majority of SINE insertions examined occur in non-functional regions of the genome, which do not preclude straightforward PCR primer design. The number of newly isolated loci inserted into other repeat regions or functional genes that might lead to non-target-specific PCR priming during insertion analysis does not present a significant technical problem. In addition, any non-target PCR amplifications are detected by Southern hybridization experiments and sequencing (see below).

PCR detection strategy and cladogram construction
Once PCR primers have been developed for SINE loci, the detection of their presence or absence in any taxon of interest is straightforward and can be easily characterized in a rigorous fashion. The process involves three basic steps: 1) PCR amplification and electrophoretic visualization of size polymorphic bands corresponding to fragments possessing (+) and/or lacking (-) target SINE inserts; 2) Southern hybridization of a blot of the PCR gel using a unit sequence of the SINE; and 3) Southern hybridization of the same blot using the SINE flanking sequence. Steps 2 and 3 simply confirm the fidelity of PCR target site amplification in step 1: in step 2, signals should be detectable only for + bands, whereas in step 3, both + and - bands should generate detectable signal. Additionally, any PCR products from step 1 can be easily sequenced using amplification primers and explicitly characterized for diagnostic features at the nucleotide level. As discussed above, irreversible SINE insertions (+ bands) are essentially noise-free synapomorphies (shared, derived characters) that define monophyletic groups, and taxa lacking these insertions for the same locus (- bands) can be considered outgroups. An important point is that only the successful PCR amplification of a negative SINE band that lacks insertion of the specific SINE element in question (versus the lack of any PCR amplification, typically due to unsuccessful priming in flanking regions) allows one to draw conclusions about the absence of a SINE in that given locus. In other words, the lack of PCR amplification does not indicate the absence of SINE insertion. A simple example of tree construction from PCR amplification patterns for two different SINE loci is illustrated by the sequence in Figure 3A-C. As more loci are examined for size variation, the phylogenetic history of both insertion events and host lineages can be established across an increasingly broad temporal and taxonomic range. Because the insertions of different loci are completely independent events, clades defined by insertions of multiple loci can be considered with extremely high confidence.

Figure 2. Two alternative SINE evolutionary models. A: Master Gene Model, in which a single parent SINE, A, and its derived subfamilies A' and A'', give rise only to non-propagating copies. B: Multiple Source Gene Model, in which some offspring of a parent SINE become inactive and others can propagate and serve as multiple sources (A, B, C) for new SINE copies over evolutionary time.

Figure 3. Schematic of PCR detection of SINE insertion and corresponding cladogram construction. A: Forward and reverse strand PCR primers, represented by arrows, are designed to anneal at sites flanking a SINE element inserted into the host genome at a specific locus. Fragments for a given SINE locus are amplified by PCR for the different host taxa being examined. B: Cartoon of a gel electrophoresis banding pattern for PCR products in which two different SINE loci, L1 and L2, are assayed for presence (+, expected fragment size) or absence (-, expected fragment size) in four different host taxa, gel lanes 1-4. C: The cladogram that corresponds to the banding pattern in B. Taxa 1 and 2 share a common ancestor that had the SINE L1 inserted into its genome, designated by an arrow on the tree branch. Taxa 3 and 4 lack this L1 insertion and can be considered outgroups. Likewise, Taxa 1, 2 and 3 share the L2 SINE insertion and are thus monophyletic, with taxon 4 as an outgroup.

Lifespans and limits of SINEs
As mentioned above, the application of SINEs, in addition to being restricted to eukaryotic taxa, is confined by limits to phylogenetic resolution and experimental detection just as is the case with any other type of molecular marker. A SINE's active lifespan will directly influence the evolutionary timeframe within which a given SINE is likely to resolve divergences among taxa. Likewise, if a SINE is unfixed for a species, its status as a possible synapomorphy remains unclear. From examination of the taxonomic distributions of SINE families characterized to date, it is evident that certain SINE families are only distributed within certain phylogenetic groups. This implies that, at a particular moment in evolutionary time, a particular SINE family is born in the host genome. Therefore, a SINE family is specific to a given host order, family, genus, or sometimes only a few species. The birth of a SINE family occurs by the generation of the tRNA-related region and the 5' part of the tRNA-unrelated region, and possible recombination between this major body of the SINE and the 3' tail of a LINE (see above). From the time of birth of SINEs, the amplification rate of a SINE family may not be constant, as is readily understood in light of the multiple source gene model for SINE amplification. As stated earlier, it is possible for a SINE family to become inert or dead. Therefore, patterns of birth and death that define the active lifespan of a SINE are important to establishing the evolutionary timeframe within which a given SINE family may be most useful for resolving phylogeny. If the average amount of sequence divergence between members of a certain subfamily of SINEs is small, we can assume they amplified relatively recently, and hence that this subfamily is relatively young. Alternatively, if the average sequence divergence of members of a certain subfamily of SINEs is large, then the subfamily can be considered old, and it is possible that the subfamily died in the host genome a long time ago. Therefore, even if the age (from birth to the present time) of two different families of SINEs is the same, it is possible that one is active and the other is already dead. Such a differential pattern of lifespan between two families over the same evolutionary time period is well illustrated by the "fossil" MIR SINEs (34,56) in mammals and the still active cichlid AFC SINEs (10,23) or tortoise pol III SINEs.(57) All of these families of SINEs are still detectable in extant genomes and initially arose about 150-200 Myrs ago. However, the MIR SINEs became inert over 150 Myrs ago and hence their presence or absence in a given host (if reliably detectable by PCR) would be most useful for resolving tree topologies with divergences among lineages around that time period, whereas the cichlid AFC and tortoise pol III SINEs are still actively proliferating and can thus be used for resolving cladogenesis that has occurred much more recently.
If a SINE is very young, it may not yet be fixed among all individuals in a species and, in this case, its use for inferring common ancestry among host taxa is inappropriate. Numerous factors can influence the number of fixed SINE loci in a species, most notably an interaction between amplification rate and population size. A SINE locus that yields a polymorphic banding pattern for insertion upon PCR assay of multiple individuals (e.g., combinations of +/+, +/-, -/-) is easily detected as being unfixed in that species for that locus. Because the time necessary for fixation under neutral conditions (58, 59) is usually short in comparison with the length of time from the speciation event to the present, the vast majority of SINEs characterized to date have been fixed in the lineages examined. A small subset has been inserted recently enough, however, to be still undergoing active lineage sorting toward fixation or loss across recently developed species boundaries (human Alu (60); rice pSINE1 (61); salmon Sma I (35); charr Fok I (62); cichlid AFC: Terai, Takahashi and Okada, unpublished data). Although unfixed SINE loci are not useful for species-level phylogenetics, they have been shown to be extremely useful for discerning population structure and intra-specific demographics.(62, 63) The application of SINEs to population biology is an area of rich potential that nicely complements their use for phylogenetics, but is clearly beyond the scope of this review.
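How a locus might be scored as fixed or unfixed from a PCR assay of several individuals can be sketched as follows. The genotype labels "+/+", "+/-" and "-/-" (insertion present on both, one, or neither chromosome), the use of None for a failed amplification, and both function names are assumptions of this illustration rather than an established protocol.

```python
VALID_GENOTYPES = {"+/+", "+/-", "-/-"}


def classify_locus(genotypes):
    """Classify one SINE locus from per-individual PCR genotypes.

    None marks an individual with no PCR product; such individuals are
    ignored rather than counted as lacking the insertion.
    """
    scored = [g for g in genotypes if g is not None]
    for g in scored:
        if g not in VALID_GENOTYPES:
            raise ValueError(f"unexpected genotype: {g!r}")
    if not scored:
        return "unscored"
    if all(g == "+/+" for g in scored):
        return "fixed-present"
    if all(g == "-/-" for g in scored):
        return "fixed-absent"
    return "polymorphic"


def insertion_frequency(genotypes):
    """Sample frequency of the insertion allele among scored individuals."""
    scored = [g for g in genotypes if g is not None]
    if not scored:
        raise ValueError("no scored individuals")
    return sum(g.count("+") for g in scored) / (2 * len(scored))


# Hypothetical sample of four individuals, one of which failed to amplify.
sample = ["+/+", "+/-", "-/-", None]
print(classify_locus(sample))       # polymorphic
print(insertion_frequency(sample))  # 0.5
```

A result of "fixed-present" describes only the individuals sampled; whether the insertion is fixed in the species depends on how many individuals and populations were assayed.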

Phylogenetic context and other practical concerns
In addition to the lifespan of a SINE family, another limitation for the application of SINE insertion analysis must be considered. This major limiting factor is the practical need to design effective PCR primers in SINE flanking regions, which are subject to random mutation over time. Empirical results to date suggest that the ability to design efficient PCR primers for any given locus is difficult to reliably predict if the two taxa to be examined diverged more than roughly 50 Myrs ago (100 Myrs total distance between sequences). Over such extended periods of time, the accumulated mutations in flanking sequences adjacent to a particular SINE locus in divergent taxa can become too numerous (usually ca. 25-30% empirical sequence difference) to permit effective annealing with the same locus-specific primers. This practical limit to reliable PCR priming in flanking sequences must be kept in mind. Yet it should be regarded only as a general experimental guideline and must be distinguished from the other limits discussed above, which are inherent to the evolutionary genetics of SINE elements themselves. Designing SINE assays for phylogenetic inference is also greatly facilitated by comparative results from other available studies, such as those based on morphology, allozymes or mtDNA. This is especially apparent with respect to selecting a species from which to construct a genomic library and subsequently isolate new, potentially informative loci. It is ideal to isolate such loci from a relatively derived source that contains the broadest range of useful markers across a clade (e.g., taxon 1 or 2 in Fig. 3C). If the source is too basal in the cladogram (e.g., taxon 4 in Fig. 3C), it could prove difficult to find any loci that resolve more derived branches. Since the primary goal is to infer phylogeny in the first place, there is often no way to be certain of the exact systematic position of many taxa a priori. 
Consulting results from other morphological and molecular studies can thus offer some practical guidelines. Although each informative SINE locus represents a powerful systematic character, isolating and characterizing sufficient numbers of SINE insertions to resolve a large phylogeny can clearly involve a substantial amount of benchwork. Fortunately, a single screening for positive clones from a genomic library can often yield dozens of useful markers and, in many cases, need only be done once or a few times for the completion of a sizeable systematic project. In addition, published comparative sequence information available for newly characterized SINE families is rapidly expanding in concert with the proliferation of genomic studies and increased attention on the possible function and evolution of retroelements in general. For example, in an attempt to close the information gap on acquiring valuable comparative sequence information on retroposons as quickly as possible, a new, highly successful electronic database has recently been established over the internet (Repbase Update 1997 on World Wide Web URL: http://www.girinst.org/server/repbase.html). Because SINE insertions can be weighted with extreme confidence and are easily analyzed once PCR primers become available, it is likely that more investigators will choose to use them as isolation of novel loci becomes more streamlined, or can be minimized or avoided altogether for some applications.

TABLE 1. SINE Families, Host Taxa, and Associated References.

SINE family        Host taxa                               References
Galago type 2      Primates, Galago                        64
Rodent B2          Rodents                                 65
ID Elements        Rodents                                 66, 31
Pig PRE 1          Suina (Pigs, Peccaries)                 67
Rabbit C           Rabbits                                 68
Bov-tA             Ruminantia (Ruminants)                  69
CHR-1              Cetaceans, Hippos, Ruminants            11
CHR-2              Cetaceans, Hippos, Ruminants            11
ERE-1              Horses, Equus                           70
Can SINE           Carnivora                               71
MIR                Mammalia                                34
Tortoise Pol III   Cryptodira (Tortoises, Turtles)         57, 22
Charr Fok I        Charr, Salvelinus                       7, 62
Salmon Sma I       Chum and Pink salmon, Oncorhynchus      72, 42
Salmon Hpa I       Salmonidae (All Salmonids)              7, 36, 37
Salmon Ava III     Salmonidae (All Salmonids)              37
AFC                Cichlid Fishes                          10, 23
DANA Elements      Zebrafish, Danio                        73
Squid SK           Squid, Loligo                           74
Octopus OK         Octopuses                               75
Octopus OR1        Octopuses                               75
Octopus OR2        Octopuses                               75
SURF1              Sea Urchin, Strongylocentrotus          76
Bm1 Elements       Silk Worm, Bombyx mori                  77, 20
Feilai SINEs       Mosquito, Aedes aegypti                 78
SM alpha           Blood Fluke, Schistosoma                79
Mg SINE            Rice Blast Fungus, Magnaporthe          80
Rice p-SINE1       Rice, Oryza                             61
Tobacco TS         Solanaceae (Tobacco)                    81
SINE S1            Cruciferous Plants                      82

Empirical studies
The number of animal and plant species for which SINEs have been characterized is increasing rapidly and now includes a broad range of vertebrates, invertebrates and plants. Table 1 summarizes repetitive families and their corresponding host taxa and associated references. It seems likely from empirical data that SINEs are widespread in most eukaryotes, especially vertebrates; hence they offer rich potential for advancing the study of systematic zoology and botany. They have already been incorporated into a number of important systematic revisions, perhaps most dramatically with respect to mammalian orders (see review by de Jong (83)). Empirical results from three distinct SINE insertion analyses are outlined below to illustrate the effective use of these phylogenetic markers and to demonstrate how SINE analyses can be employed strategically in concert with sequence data and morphology to solve particularly difficult and valuable evolutionary questions.

Salmonid fishes
The most extensive characterization of non-mammalian SINEs has been completed for salmonid fishes over the past decade by Okada and colleagues. In particular, loci from the Sma I, Hpa I, and Fok I families were originally employed to illustrate the use of SINE insertions as systematic tools for tree construction.(12) A fully resolved cladogram for Pacific salmon species and outgroup taxa is illustrated in Figure 4A. These results demonstrate monophyly for Pacific salmon, including a sister relationship with the steelhead trout, which traditionally had been placed in the Atlantic genus Salmo. They also add substantial confidence to relationships in the salmonid species complex that have been difficult to resolve with morphological markers or by other molecular data.

Whales and the Artiodactyla
The origin of whales and the remarkable reversion of their ancestral land-based lineage to a fully aquatic life has been studied in detail from a variety of biological perspectives. Evidence from both mtDNA and nDNA milk casein genes challenges the traditional morphology-based view that whales diverged from a common ancestor of the order Artiodactyla (even-toed ungulates) and suggests instead that cetaceans (whales, dolphins and porpoises) may actually be derived members nested deeply within a single "cetartiodactyl" clade.(84,85) To resolve the controversy regarding whale origins fueled by conflicting results from morphological and DNA sequence studies, two new SINE families specific to whales, hippopotamuses and ruminants, CHR1 and CHR2, were characterized and examined for insertions among these taxa.(11,86) The SINE insertion data show conclusively that whales are in fact derived artiodactyls and resolve the major clades of this diverse order of mammals (Fig. 4B). These results unambiguously make a hippopotamus more closely related to a whale than to a camel or pig, and prompt serious reconsideration of interpretations of fossil data that mesonychians (an extinct group of hooved terrestrial mammals) gave rise to the extinct cetacean suborder Archaeoceti. In addition to advancing our understanding of whale origins, these SINE markers facilitate two important directions of further research: 1) firmly resolving relationships among additional "cetartiodactyl" taxa; 2) defining clades among cetacean families more precisely, especially those including toothed vs. baleen whales.

African cichlid fishes
The speciation and diversification of African Great Lake cichlid fishes has been regarded as a prime example of explosive adaptive radiation. Understanding the evolutionary history of this radiation has been a major challenge for systematic biologists, and numerous attempts have been made to create a phylogenetic framework for the extensive endemic species flocks using morphological, ecological, and molecular data. In this vein, a novel family of SINEs, named AFC, was isolated and characterized in a broad range of African cichlid species.(10) An examination of Lake Tanganyikan cichlid species using AFC SINE insertions showed clear evidence of monophyly for four traditionally established tribes. Figure 4C shows results from the PCR detection assay published for one of three independent AFC loci that clearly resolves one of the major Tanganyikan cichlid tribes, Lamprologini. These conclusive and exceptionally clean results represent a valuable foundation for employing SINE insertions to address a broad array of puzzling systematic questions posed by African Great Lake cichlids.
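The presence/absence logic common to the three case studies can be sketched in a few lines. The sketch assumes that each locus is scored per taxon as 1 (insertion detected), 0 (empty site detected) or None (no PCR product), and that two loci are mutually consistent when the sets of taxa carrying the insertion are nested or disjoint. The matrix, the locus names and the function names are hypothetical illustrations, not the published data.

```python
from itertools import combinations


def carriers(states):
    """Taxa in which the insertion was positively detected (state 1)."""
    return frozenset(t for t, s in states.items() if s == 1)


def is_compatible(states_a, states_b):
    """True if both loci fit a single tree with one insertion event each.

    Only taxa scored at both loci are compared, so a failed PCR (None)
    is never read as absence of the insertion.
    """
    shared = [t for t in states_a
              if states_a[t] is not None and states_b.get(t) is not None]
    a = {t for t in shared if states_a[t] == 1}
    b = {t for t in shared if states_b[t] == 1}
    return a <= b or b <= a or not (a & b)


def find_conflicts(matrix):
    """Pairs of loci whose insertion patterns cannot share one tree."""
    return [(x, y) for x, y in combinations(matrix, 2)
            if not is_compatible(matrix[x], matrix[y])]


def supported_groups(matrix):
    """Groups of taxa united by an insertion, from fully scored loci only."""
    groups = {}
    for locus, states in matrix.items():
        if None in states.values():
            continue
        group = carriers(states)
        if 2 <= len(group) < len(states):
            groups.setdefault(group, []).append(locus)
    return groups


# Hypothetical scores for three loci; None marks a missing PCR result.
matrix = {
    "locus_1": {"whale": 1, "hippo": 1, "cow": 0, "pig": 0, "camel": 0},
    "locus_2": {"whale": 1, "hippo": 1, "cow": 1, "pig": 0, "camel": 0},
    "locus_3": {"whale": 1, "hippo": None, "cow": 1, "pig": 0, "camel": 0},
}

print(find_conflicts(matrix))  # []
for group, loci in supported_groups(matrix).items():
    print(sorted(group), loci)
```

In this toy matrix locus_3 lacks a result for one taxon; comparing loci only over the taxa scored at both keeps that gap from being misread as a conflicting absence.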

Figure 4. Summary cladograms of SINE insertion analyses published for salmonid fishes (A)(12) and whales and artiodactyls (B)(11,86). The distributions of historical insertion events of SINE families, LINE elements and of specific loci are indicated by colored arrows. An example of published data from PCR SINE insertion analyses of African cichlid fishes is shown in C(10). PCR products (a), results of Southern hybridizations probing with a SINE unit sequence (b) and the SINE flanking sequence (c) are presented for one of three independent SINE loci examined. Filled arrows indicate expected sizes of products containing SINE insertions; unfilled arrows indicate expected sizes of products lacking SINE insertions. This locus clearly confirms monophyly of the Lake Tanganyikan cichlid tribe Lamprologini. "S" in lanes 1 and 26 denotes DNA size standard; "C" in lane 25 denotes negative control. A taxonomic legend for cichlid species corresponding to gel lanes and names of specific loci are available in the original report(10).

Published misunderstandings of SINEs
Recent SINE data published in major journals (11, 86) have prompted new commentaries by noted authors regarding the value of SINEs as systematic characters.(87, 88) While these comments have been largely supportive, several published misinterpretations of SINE data are evident, and a misleading view has emerged implying that SINEs are "perfect" characters with no limitations.(87, 88, 89) Most of this can be attributed both to confusion about how retroposons evolve and to a focus on advantages over limitations in reports that have employed the technique. Four particular issues require clarification to prevent further misunderstanding:
1) Missing data. The absence of PCR amplification does not indicate the absence of a SINE insertion at a given locus. While PCR mispriming may present practical limits to SINE analysis for certain taxa and loci, it does not corrupt the legitimate use of insertion events as excellent systematic tools when they can be properly diagnosed experimentally. Luckett and Hong (89) badly misconstrue this fact with respect to data published on artiodactyl phylogeny (11) and consequently draw numerous erroneous conclusions about SINE analysis in general. A detailed discussion regarding the proper consideration of missing SINE data in relation to retroposon evolution and the origin of whales is available elsewhere.(90)
2) Incomplete lineage sorting. Hillis (87) and Miyamoto (88) have both highlighted the potential for phylogenetic incongruence created by incomplete lineage sorting, which is a legitimate concern for SINE analysis. The suggestion, however, that SINEs are "particularly sensitive" to this problem is not justified. It should be emphasized that each SINE locus analyzed represents the equivalent of an entirely different independent gene sequence analysis, since all the mutations at the DNA level considered within any given linked locus (e.g., the entire mitochondrial genome) will be confounded by ancestral polymorphism. Hence, the phylogenetic incongruence created by ancestral polymorphism as encountered by DNA sequence analysis of any single gene cannot be directly extended to SINE analysis: with a sequence analysis affected by incomplete lineage sorting, one can only speculate about whether in fact the phenomenon is responsible for ambiguous results, whereas polymorphic SINE data directly identify and diagnose the problem. Certainly, the evidence to date for hundreds of independent SINE loci, analyzed in a wide variety of eukaryotes, indicates that incongruence due to ancestral polymorphism is not a significant problem for SINE insertion analysis, as long as multiple loci are examined. Polymorphic SINE insertion results are themselves useful as alternative pathways that can be used to evaluate the fate of alleles identical by descent.
In fact, in cases where incomplete lineage sorting is well established and of interest as a subject of investigation, such as in the species complexes of freshwater salmonid and African cichlid fishes, the irreversible nature of SINE insertions has been exploited to resolve the scope and history of the evolutionary phenomenon with precision unavailable using other standard markers such as mtDNA or MHC genes (62; Terai, Takahashi, and Okada, unpublished data).
3) Bootstrapping. The meaning of a bootstrap test is obscure for a data set based on irreversible evolutionary events that exhibits no empirical character conflict. Nevertheless, the belief that some statistical assessment of confidence is necessary for trees inferred from SINE insertions is evident from published comments.(87, 89) The validity of the comparison of bootstrap values among different data sets by Hillis (87) is questionable both philosophically and in terms of the large error bias and significant loss of phylogenetic information one would predict for the relatively small sample size of characters in the SINE data matrix.(91) Cases where SINE insertion results exhibit character incongruence or suffer from a large proportion of missing data certainly invite statistical evaluation. The optimal method for such exploration, however, is not obvious and bootstrap results should be considered with due caution.
4) Independence of SINE insertions. Miyamoto, while supportive of the SINE method, suggested that co-occurrence of SINEs across taxa is not truly independent for elements originating from the same amplification event (88), and advocated the use of only insertions from different SINE families or subfamilies. While it is true that SINE families, and the amplified copies within them, are distributed unevenly across taxa, this logic does not hold for the independent nature of the insertion process at different unique loci in the host genome. Indeed, the bias in distribution is itself the basis for selecting a taxon from which to create a genomic library for optimizing the isolation of new phylogenetically informative SINE loci. It is irrelevant, however, to the use of the ancestral vs. derived state of insertion at any given locus as an independent indication of common ancestry. Consideration of SINE insertions as irreversible point mutations in the genome is a simple conceptual exercise that may help clarify any confusion about the nature of SINE character independence. Each locus inserted by a SINE evolves independently regardless of its amplification origin and will be lost or fixed by random genetic drift over time.

Conclusions
The irreversible nature of SINE retropositional events in eukaryotic genomes allows them to be employed as unambiguous, derived, homologous characters for straightforward cladogram construction. In this sense SINE insertions are exceptionally powerful systematic tools that can be used strategically to complement less conclusive results obtained from other data such as morphology and DNA sequences.
It must be emphasized that the intelligent use of SINEs for phylogenetics depends critically on an understanding of how SINEs evolve. In particular, the process of amplification and dispersion in the germ cell of an individual must be considered together with a population perspective on the random drift of SINEs towards eventual loss or fixation in a species. Topologies from SINE insertion can be nicely integrated with relative branch length information independently derived from DNA sequence data or used as valuable reference hypotheses for systematic methods development. Properly considered within an evolutionary framework, SINE flanking sequences also have the potential for use in dating diagnostic retropositional events on the branches of cladograms, given the probable neutral nature of their evolution.


Characterization of SINEs in major animal and plant groups is still in an early phase but it appears likely these elements are widespread throughout many eukaryotic genomes, especially vertebrates. The utility of any SINE locus for tree construction depends on its fixation and the timespan it has been active in the host genome. In cases where SINE insertion results are complicated by incomplete lineage sorting, it is logical to evaluate any character conflict with statistical analysis of data. The bootstrap, however, may not be the ideal analytical tool for such investigation. Excessive divergence in locus-specific PCR priming sites among host genomes can also preclude effective SINE insertion analysis and reduce phylogenetic resolution. The majority of SINEs that have been examined in detail, however, are fixed for the taxa in question and can be most reliably characterized for resolving divergences that are roughly 50 Myrs old or less. Hence, they are ideally suited for addressing a wide variety of species and generic level questions. While the isolation and characterization of new loci may involve substantial work, the payoff in conclusive results is exceptional. Consequently, empirical studies using SINE insertions to date have largely focused on particularly difficult and/or valuable systematic problems, which cannot be easily resolved with other data. Continued characterization of new SINE families in diverse taxa and expanding access to genome sequence data should greatly facilitate and expand the use of these powerful systematic tools in the near future.

Acknowledgments
We thank Peter Holland for his kind invitation to contribute this review and Masami Hasegawa for encouragement, advice, and critical reading of the manuscript. Two anonymous reviewers and the Editor, Adam Wilkins, provided helpful comments to improve the final manuscript. We apologize for not being able to cite numerous relevant contributions from colleagues because of space limitations. The Japan Society for the Promotion of Science provided generous fellowship support to AMS.

References
1. Swofford D, Olsen GJ, Waddell PJ, Hillis DM. Phylogenetic inference. In: Hillis DM, Moritz C, Mable B, editors. Molecular systematics, 2nd edition. Sunderland, MA: Sinauer; 1996. p 407-509.
2. Cao Y, Janke A, Waddell PJ, Westerman M, Takenaka O, Murata S, Okada N, Pääbo S, Hasegawa M. Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders. J Mol Evol 1998;47:307-322.
3. Kazazian HH Jr, Moran JV. The impact of L1 retrotransposons on the human genome. Nat Genet 1998;19:19-24.
4. Grandbastien MA. Retroelements in higher plants. Trends Genet 1992;8:103-108.
5. Maraia R. The impact of short interspersed elements on the host genome. New York: Springer Publishing; 1995.
6. Brosius J. Retroposons - seeds of evolution. Science 1991;251:753.
7. Kido Y, Aono M, Yamaki T, Matsumoto K, Murata S, Saneyoshi M, Okada N. Shaping and reshaping of salmonid genomes by amplification of tRNA-derived retroposons during evolution. Proc Natl Acad Sci USA 1991;88:2326-2330.
8. Weiner A, Deininger PL, Efstratiadis A. Nonviral retroposons: genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information. Ann Rev Biochem 1986;55:631-661.
9. Ohno S. Evolution by gene duplication. Heidelberg: Springer Publishing; 1970.
10. Takahashi K, Terai Y, Nishida M, Okada N. A novel family of short interspersed repetitive elements (SINEs) from cichlids: the pattern of insertion of SINEs at orthologous loci support the proposed monophyly of four major groups of cichlid fishes in Lake Tanganyika. Mol Biol Evol 1998;15:391-407.
11. Shimamura M, Yasue H, Ohshima K, Abe H, Kato H, Kishiro T, Goto M, Munechika I, Okada N. Molecular evidence from retroposons that whales form a clade within even-toed ungulates. Nature 1997;388:666-670.
12. Murata S, Takasaki N, Saitoh M, Okada N. Determination of the phylogenetic relationships among Pacific salmonids by using short interspersed elements (SINEs) as temporal landmarks of evolution. Proc Natl Acad Sci USA 1993;90:6995-6999.
13. Okada N. SINEs: short interspersed repeated elements of the eukaryotic genome. TREE 1991a;6:358-361.
14. Okada N. SINEs. Curr Opin Genet Dev 1991b;1:498-504.
15. Ryan S, Dugaiczyk A. Newly arisen DNA repeats in primate phylogeny. Proc Natl Acad Sci USA 1989;86:9360-9364.
16. Singer M, Berg P. Genes and genomes. Mill Valley, CA: University Science Books; 1991.
17. Rogers J. Retroposons defined. Nature 1983;301:460.
18. Eickbush TH. Origin and evolutionary relationships of retroelements. In: Morse SS, editor. The evolutionary biology of viruses. New York: Raven Press; 1994. p 121-157.
19. Eickbush TH. Transposing without ends: the non-LTR retrotransposable elements. New Biol 1992;4:430-440.
20. Okada N, Hamada M, Ogiwara I, Ohshima K. SINEs and LINEs share common 3' sequences: a review. Gene 1997;205:229-243.
21. Schmid C. Alu: structure, origin, evolution, significance and function of one-tenth of human DNA. Prog Nucl Acid Res Mol Biol 1996;53:283-319.
22. Ohshima K, Hamada M, Terai Y, Okada N. The 3' ends of tRNA-derived short interspersed repetitive elements are derived from the 3' ends of long interspersed repetitive elements. Mol Cell Biol 1996;16:3756-3764.
23. Terai Y, Takahashi K, Okada N. SINE cousins: the 3' end tails of the two oldest and distantly related families of SINEs are descended from the 3' ends of LINEs with the same genealogical origin. Mol Biol Evol 1998;15:1460-1471.
24. McDonald JF. Transposable elements, gene silencing, and macroevolution. TREE 1998;13:94.
25. Okada N, Ohshima K. Evolution of tRNA-derived SINEs. In: Maraia RJ, editor. The impact of short interspersed elements (SINEs) on the host genome. Austin: RG Landes Co; 1995. p 62-79.
26. Ullu E, Tschudi C. Alu sequences are processed 7SL RNA genes. Nature 1984;312:171-172.
27. Shen MR, Batzer MA, Deininger PL. Evolution of the master Alu gene(s). J Mol Evol 1991;33:311-320.
28. Matera AG, Hellman U, Schmid CW. A transpositionally and transcriptionally competent Alu subfamily. Mol Cell Biol 1990;10:5424-5432.
29. Schmid CW, Maraia R. Transcriptional regulation and transpositional selection of active SINE sequences. Curr Opin Genet Devel 1992;2:874-882.
30. Deininger PL, Batzer MA. Evolution of retroposons. Evol Biol 1993;27:157-196.
31. Kim J, Martignetti JA, Shen MR, Brosius J, Deininger P. Rodent BC1 RNA gene as a master gene for ID element amplification. Proc Natl Acad Sci USA 1994;91:3607-3611.
32. Leeflang EP, Liu W-M, Hashimoto C, Choudary PV, Schmid CW. Phylogenetic evidence for multiple Alu source genes. J Mol Evol 1992;35:7-16.
33. Smit AFA, Toth G, Riggs AD, Jurka J. Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences. J Mol Biol 1995;246:401-417.
34. Smit AFA, Riggs AD. MIRs are classic, tRNA-derived SINEs that amplified before the mammalian radiation. Nucleic Acids Res 1995;23:98-102.


35. Takasaki N, Yamaki T, Hamada M, Park L, Okada N. The salmon Sma I family of short interspersed elements (SINEs): interspecific and intraspecific variation in the insertion of SINEs in the genomes of chum and pink salmon. Genetics 1997;146:369-380.
36. Kido Y, Saitoh M, Murata S, Okada N. Evolution of the source sequence of Hpa I short interspersed elements. J Mol Evol 1995;41:986-995.
37. Kido Y, Himberg M, Takasaki N, Okada N. Amplification of distinct subfamilies of short interspersed elements (SINEs) during evolution of the Salmonidae. J Mol Biol 1994;241:633-644.
38. Takasaki N, Murata S, Saitoh M, Kobayashi T, Park L, Okada N. Species-specific amplification of tRNA-derived SINEs via retroposition: a process of parasitization of entire genomes during the evolution of salmonids. Proc Natl Acad Sci USA 1994;91:10153-10157.
39. Korenberg JR, Rykowski MC. Human genome organization: Alu, LINE and the molecular structure of metaphase chromosome bands. Cell 1988;53:391-400.
40. Luan DD, Korman MH, Jakubczak JL, Eickbush TH. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 1993;72:595-605.
41. Britten RJ, Baron WF, Stout DB, Davidson EH. Sources and evolution of human Alu repeated sequences. Proc Natl Acad Sci USA 1988;85:4770-4774.
42. Hamada M, Kido Y, Himberg M, Hasegawa M, Okada N. Characterization of a newly isolated family of short interspersed repetitive elements (SINEs) in coregonid fish (whitefish) with sequences that are almost identical to those of the Sma I family of repeats: possible evidence for the horizontal transfer of SINEs. Genetics 1997;146:369-380.
43. Hennig W. Phylogenetic systematics. Urbana-Champaign: University of Illinois Press; 1966.
44. Felsenstein J. Phylogenies from molecular sequences: inference and reliability. Annu Rev Genet 1988;22:521-565.
45. Hillis DM. Molecular versus morphological approaches to systematics. Annu Rev Ecol Syst 1987;18:23-42.
46. Patterson C. Homology in classical and molecular biology. Mol Biol Evol 1988;5:603-625.
47. Huelsenbeck JP, Rannala B. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 1997;276:227-232.
48. Swofford D. PAUP*: Phylogenetic Analysis Using Parsimony (and Other Methods), Version 4.0. Sunderland, MA: Sinauer Associates; 1996.
49. Huelsenbeck JP. Performance of phylogenetic methods in simulation. Syst Biol 1995;44:17-48.
50. Hillis DM, Bull JJ, White ME, Badgett MR, Molineux J. Experimental phylogenetics: generation of a known phylogeny. Science 1992;255:589-592.
51. Quentin Y. The Alu family developed through successive waves of fixation closely connected with primate lineage history. J Mol Evol 1988;27:194-202.
52. Del Pozzo G, Guardiola J. A SINE insertion provides information on the divergence of the HLA-DQA1 and HLA-DQA2 genes. Immunogenetics 1990;31:229-232.
53. Sakamoto K, Okada N. Rodent type 2 Alu family, rat identifier sequence, rabbit C family, and bovine or goat 73 bp repeat may have evolved from tRNA genes. J Mol Evol 1985;22:134-140.
54. Daniels GR, Deininger PL. Repeat sequence families derived from mammalian tRNA genes. Nature 1985;317:819.
55. Endoh H, Okada N. Total DNA transcription in vitro: a procedure to detect highly repetitive and transcribable sequences with tRNA-like structures. Proc Natl Acad Sci USA 1986;83:251-255.
56. Jurka J, Zietkiewicz E, Labuda D. Ubiquitous mammalian-wide interspersed repeats (MIRs) are molecular fossils from the Mesozoic era. Nucleic Acids Res 1995;23:170-175.
57. Endoh H, Nagahashi S, Okada N. Highly repetitive and transcribable sequence in the tortoise genome is probably a retroposon. Eur J Biochem 1990;189:25-31.
58. Kimura M. The neutral theory of molecular evolution. Cambridge: Cambridge University Press; 1983.
59. Nei M. Molecular evolutionary genetics. New York: Columbia University Press; 1987.
60. Batzer MA, Gudi VA, Mena JC, Foltz D, Herrera RJ, Deininger PL. Amplification dynamics of human-specific (HS) Alu family members. Nucleic Acids Res 1991;19:3619-3623.
61. Mochizuki K, Umeda M, Ohtsubo H, Ohtsubo E. Characterization of a plant SINE, p-SINE1, in rice genomes. Jpn J Genet 1992;67:155-166.
62. Hamada M, Takasaki N, Reist JD, De Cicco, Goto A, Okada N. Detection of the ongoing sorting of ancestrally polymorphic SINEs toward fixation or loss in populations of two species of charr during speciation. Genetics 1998;150:301-311.
63. Batzer MA, Arcot SS, Phinney JW, Alegria-Hartman M, Kass DH, Milligan SM, Kimpton C, Gill P, Hochmeister M, Ioannou P, et al. Genetic variation of recent Alu insertions in human populations. J Mol Evol 1996;42:22-29.
64. Daniels GR, Deininger PL. A second major class of Alu family repeated DNA sequences in a primate genome. Nucleic Acids Res 1983;11:7595-7610.
65. Krayev AS, Markusheva TV, Kramerov DA, Ryskov AP, Skryabin KG, Bayev AA, Georgiev GP. Ubiquitous transposon-like repeats B1 and B2 of the mouse genome: B2 sequencing. Nucleic Acids Res 1982;10:7461-7475.
66. Milner RJ, Bloom FE, Lai C, Lerner RA, Sutcliffe JG. Brain-specific genes have identifier sequences in their introns. Proc Natl Acad Sci USA 1984;81:713-717.
67. Singer DS, Parent LJ, Ehrlich R. Identification and DNA sequence of an interspersed repetitive DNA element in the genome of the miniature swine. Nucleic Acids Res 1987;15:2780.
68. Cheng J-F, Printz R, Callaghan T, Shuey D, Hardison RC. The rabbit C family of interspersed repeats: nucleotide sequence determination and transcriptional analysis. J Mol Biol 1984;176:1-20.
69. Lenstra JA, Boxtel JAF, Zwaagstra KA, Schwerin M. Short interspersed nuclear element (SINE) sequences of the Bovidae. Anim Genet 1993;24:33-39.
70. Sakagami M, Ohshima K, Mukoyama H, Yasue H, Okada N. A novel tRNA species as an origin of short interspersed repetitive elements (SINEs): equine SINEs may have originated from tRNASer. J Mol Biol 1994;239:731-735.
71. Coltman DW, Wright JM. Can SINEs: a family of tRNA retroposons specific to the superfamily Canoidea. Nucleic Acids Res 1994;22:2726-2730.
72. Matsumoto K, Murakami K, Okada N. Gene for lysine tRNA1 may be a progenitor of the highly repetitive and transcribable sequences present in the salmon genome. Proc Natl Acad Sci USA 1986;83:3156-3160.
73. Izsvak Z, Ivics Z, Hackett PB. Repetitive elements and their genetic applications in zebrafish. Biochem Cell Biol 1997;75:507-523.
74. Ohshima K, Koishi R, Matsuo M, Okada N. Several short interspersed repetitive elements (SINEs) in distant species may have originated from a common ancestral retrovirus: characterization of a squid SINE and a possible mechanism for generation of tRNA-derived retroposons. Proc Natl Acad Sci USA 1993;90:6260-6264.
75. Ohshima K, Okada N. Generality of the tRNA origin of short interspersed repetitive elements (SINEs): characterization of three different tRNA-derived retroposons in the octopus. J Mol Biol 1994;243:25-37.
76. Nisson PE, Hickey RJ, Boshar MF, Crain WR Jr. Identification of a repeated sequence in the genome of the sea urchin which is transcribed by RNA polymerase III and contains the features of a retroposon. Nucleic Acids Res 1988;16:1431-1452.
77. Adams DS, Eickbush TH, Herrera RJ, Lizardi PM. A highly reiterated family of transcribed oligo(A)-terminated, interspersed DNA elements in the genome of Bombyx mori. J Mol Biol 1986;187:465-478.
78. Tu Z. Genomic and evolutionary analysis of Feilai, a diverse family of highly reiterated SINEs in the yellow fever mosquito, Aedes aegypti. Mol Biol Evol 1999;16:760-762.
79. Spotila LD, Hirai H, Rekosh DM, LoVerde PT. A retroposon-like short repetitive DNA element in the genome of the human blood fluke, Schistosoma mansoni. Chromosoma 1989;97:421-428.
80. Kachroo P, Leong SA, Chattoo BB. Mg-SINE: A short interspersed nuclear element from the rice blast fungus, Magnaporthe grisea. Proc Natl Acad Sci USA 1995;92:11125-11129.
81. Yoshioka Y, Matsumoto S, Kojima S, Ohshima K, Okada N, Machida Y. Molecular characterization of a short interspersed repetitive element from tobacco that exhibits sequence homology to specific tRNAs. Proc Natl Acad Sci USA 1993;90:6562-6566.
82. Lenoir A, Cournoyer B, Warwick S, Picard G, Deragon J-M. Evolution of SINE S1 retroposons in Cruciferae plant species. Mol Biol Evol 1997;14:934-941.


83. De Jong WW. Molecules remodel the mammalian tree. TREE 1998;13:270-275.
84. Milinkovitch MC, Berube MM, Palsboll. Whales are highly derived artiodactyls. In: Thewissen JGM, editor. The emergence of whales, evolutionary patterns in the origin of Cetacea. New York: Plenum; 1998. p 113-131.
85. Gatesy J, Hayashi C, Cronin M, Arctander P. Evidence from milk casein genes that cetaceans are close relatives of hippopotamid artiodactyls. Mol Biol Evol 1996;13:954-963.
86. Nikaido M, Rooney AP, Okada N. Phylogenetic relationships among cetartiodactyls based on insertions of short and long interspersed elements: hippopotamuses are the closest extant relatives of whales. Proc Natl Acad Sci USA 1999;96:10261-10266.
87. Hillis DM. SINEs of the perfect character. Proc Natl Acad Sci USA 1999;96:9979-9981.
88. Miyamoto MM. Perfect SINEs of evolutionary history? Curr Biol 1999; (in press).
89. Luckett P, Hong N. Phylogenetic relationships between the orders Artiodactyla and Cetacea: a combined assessment of morphological and molecular evidence. J Mammal Evol 1998;5:127-182.
90. Shedlock AM, Milinkovitch MC, Okada N. SINE evolution, missing data, and the origin of whales. Syst Biol 2000; (in press).
91. Sanderson MJ. Objections to bootstrapping phylogenies: a critique. Syst Biol 1995;44:299-320.
