i5K/Pre-sequencing informatics

From ArthropodBase wiki


Criteria for prioritization of arthropods/Pre-sequencing informatics
Coordinator: Michael Pfrender


Goal of this working group

The goal of this working group is to develop criteria for prioritizing arthropod genome sequencing efforts. We have proposed a set of areas to serve as selection criteria for the i5k genome sequences. In coordination with the Phylogenomics Working Group and the i5k Coordination Committee, this working group takes a broad view of the scientific and societal impact of genome sequencing on outstanding questions in arthropod evolution.

Comments and edits are welcome.

Criteria for species prioritization

Relevance - Scientific impact from this genome project


  • The number and significance of the areas of biological investigation that a species' genome sequence could address. These areas are likely to be diverse and may include both basic and applied questions, such as agricultural impact, human health and well-being, outstanding issues in evolution and ecology, and the conservation of biodiversity in changing environments.
  • Potential utility of the species as a model system for biological investigation. An important consideration is the size and level of coordination of the research community that would use and enhance the species' genome sequence. Ideally, a genome sequence is a tool that facilitates research and serves as a focal point for interdisciplinary interactions.
  • Relevance to comparative studies: the species' phylogenetic placement, or the lineage it represents, in light of the growing diversity of completed genome sequences.
  • Potential for metagenomic discovery – are there biologically relevant associations with parasites, parasitoids, or symbionts that would enhance the utility of a genome sequence?


Scientific and Societal Impact

  • Agricultural impact, positive or negative (e.g., pest control, pollination, crop damage)
  • Human health impact (e.g., capacity to vector disease)
  • Economic and social relevance as a sentinel/keystone species (ecosystem role)
  • Conservation relevance: species undergoing range expansion or reduction due to climate change, human encroachment, or other forms of habitat modification; invasive species; etc.


Feasibility – What resources are available to ensure the success of the genome project?


  • Available genomic resources (genetic maps, ESTs, recombinant lines, etc.)
  • Community support for annotation and end usage (size of the community, established databases, etc.)
  • Development and availability of cultured and inbred strains

Consideration of the timeline has been an important component of this criterion. Species with well-developed resources and readily available material suitable for sequencing have been given higher priority in the first phase of species prioritization. We anticipate this factor will diminish in importance over the long term.


Clearly Identified Contact Individual(s)


A critical element of any genome sequencing project is the community that will take ownership of the sequence and work effectively to analyze it, describe it, and extract its biological significance. One or more clearly identified contact persons must therefore interface with the user community and lead the organization and writing of the genome publication. During the proposal phase, the contact person(s) will be responsible for organizing and presenting information relevant to the selection criteria. Once a genome project is underway, they will coordinate the delivery of DNA/RNA to the genome center generating the sequence data. After the sequence data are collected, they will coordinate the initial publication of the genome sequence.


Species prioritization: Process

Species prioritization is an iterative process (Fig. 1) coordinated by the Species Prioritization Working Group. This group was formed following the AGS meeting in Kansas City in June 2011.

  • The process begins with species nominations from (1) the i5k wiki and (2) self-assembled groups with expertise in specific taxonomic groups or research areas.
  • This information was used to generate lists of candidate species that formed the starting point for the Species Prioritization Working Group.
  • Through many (many!) emails, the use of Google Docs, and bi-monthly telephone conference calls, the working group narrowed the list with the initial goal of identifying a set of 50 species for top priority in the initial round of sequencing.
  • Our group used the criteria outlined above to prioritize species. These criteria were used as guidelines to motivate a careful consideration of all candidate species.
  • The process has been highly iterative with input from the i5k Coordination Committee and direct solicitation of input from taxonomists with expertise in specific major groups.
  • After generating an initial set of ~100 species, we considered additional criteria:
    • Overlap in phylogeny – one goal for this initial set is to maximize the value of these genomes across the dimensions of our primary criteria, including the phylogenetic distribution of species. Taxonomic groups with multiple representatives were evaluated to minimize overlap in phylogenetic coverage.
    • Genome size – some arthropods have large (>1 Gb) genomes. Large genomes pose a pragmatic trade-off: the effort needed to sequence one large genome could instead produce several smaller ones. Nevertheless, species with a range of genome sizes will be sequenced, to avoid the bias of sequencing only the smallest genomes in any particular group.
    • Availability of suitable material – ideally, inbred material from a lab culture is available, to minimize heterozygosity and ensure that DNA/RNA material is not limiting.
Figure 1: Flow chart of the iterative prioritization process