Publication/Paper (NAR 2012)

From SEQwiki
Revision as of 11:13, 9 August 2011 by Mmartin

Paper for the NAR annual Database Issue

Note: NAR 2012 is proposed to be a 'wiki special', so I thought it would be a good idea to write up the work done on, and the contents of, SEQwiki for that issue. For details, see the instructions for the NAR annual Database Issue (all the submitted wikis have to meet those criteria). --Dan 01:49, 1 August 2011 (PDT)
Note: See the NAR general guidelines for authors.
Note: For discussion, use the 'discussion' tab. See also the forum thread.

Title

SEQanswers Wiki: A Database of Tools for the Analysis of High-Throughput Sequencing-Related Data

Authors

Provisional--Marcowanger 22:18, 8 August 2011 (PDT)

Marcel Martin, USAD's name? Others?

  • Dan Bolser
  • Eric C. Olivares &

Dan M. Bolser, University of Dundee (Scotland, UK); The BiO Centre
Eric C. Olivares, Founder, SEQanswers.com (Union City, CA, USA)
Marcel Martin, Bioinformatics for High-throughput Technologies, TU Dortmund, Germany
Jing-Woei Li, School of Life Sciences, The Chinese University of Hong Kong, Shatin, NT, Hong Kong SAR; Hong Kong Bioinformatics Center, The Chinese University of Hong Kong, Shatin, NT, Hong Kong SAR <-This is me--Marcowanger 22:21, 8 August 2011 (PDT)

& corresponding author

Abstract

Recent advances in sequencing technologies have created unprecedented opportunities for biological research. The surge in data throughput from these new technologies has also created remarkable challenges in data management and analysis. As the demand for sophisticated analyses increases, the development of analysis packages and algorithms is beginning to outpace the time frame of peer-reviewed publications and other traditional means of information sharing. In extreme cases, the algorithms or even the methodology used by a package may change after its publication, making the paper irrelevant to the actual user. The SEQanswers forum (http://SEQanswers.com/) was founded to facilitate direct communication between the users and the developers of these packages. The SEQanswers wiki is a Semantic MediaWiki (SMW) site that is edited and updated by the members of the SEQanswers community. The wiki provides an extensive catalogue of manually categorized analysis tools, technologies and information about service providers. Within two years, the SEQanswers community has created pages for over 400 unique software tools and associated them with around 350 literature references and 500 web links. This collaborative effort has made the SEQanswers wiki the most comprehensive and detailed portal for high-throughput sequencing tools on the web.

Introduction

Recent breakthroughs in sequencing technologies have brought together wet-lab biologists and bioinformaticians. Biologists wishing to perform quick bioinformatics analyses and beginning bioinformaticians have a multitude of published packages to choose from, but choices without clear attributes tend to exhaust rather than enhance productivity (http://www.physorg.com/news127404469.html , please cite the appropriate psychology paper). Moreover, without community review or sharing, users who lack access to subscription journals have little opportunity to understand how a package works.

Online communities for Biological Sciences

As international collaboration in the sciences increases, community-curated databases and websites are gaining popularity. Prominent examples include OpenWetWare, a laboratory protocol site widely used in the International Genetically Engineered Machine (iGEM) competition; BitesizeBio, an online community and magazine with community-contributed articles; BioStar, a question-and-answer site for bioinformatics, computational genomics and systems biology; and the BioTechniques forum, where people discuss traditional laboratory techniques.

WikiGenes was one of the early wikis in the life sciences. It is a gene-centric wiki system that links traditional expert-reviewed information, for example from NCBI Entrez and UniProt, to research findings from published scientific articles. Each insight into a gene is referenced to the original article, so WikiGenes serves as an important portal to the available information on genes.

Complexity of the HTS field calls for a community effort

Accompanying the ever-expanding range of high-throughput sequencing (HTS) technologies is a sharp rise in the variety of informatics tools. On average, (counts) of tools appeared each year. (divide the total number of tools /beginning of HTS) [anyone have idea?, compare this surge to earlier years]. Such a rapid emergence of tools exceeds the capacity of an individual, or even a single institution, to monitor. Moreover, funding issues have limited institutions' ability to engage in non-core activities: the Sequence Read Archive of NCBI (http://www.ncbi.nlm.nih.gov/pubmed/21062823), for which there is no comparable alternative, will be phased out within the next 12 months (from June 2011). Individual researchers may find it even more difficult to keep track of trends in the HTS field. Traditionally, journal clubs serve to keep researchers up to date on current topics. However, their attendees are usually confined to one location, typically a single institution, and the limited number of topics per session restricts knowledge transfer across the field. All of this calls for a robust system for the rapid sharing of knowledge.

Community System: Complement to primary databases

Like Wikipedia, the SEQanswers wiki is open for editing, but not anonymously: each modification is associated with a registered user and can be reverted if faulty information is found. Previously, the "wikification" of primary sequence databases such as GenBank faced stiff resistance. We recognize that GenBank is an important archive of static sequences and annotations. However, we reasoned that individual curators cannot fully encompass the collective expertise of the larger scientific community. Indeed, to allow prompt error correction while maintaining content accuracy, Steven Salzberg suggested adding a wiki layer to existing expert-curated databases. Moreover, no chaos has been observed in the major life-science wikis(*).

The SEQanswers wiki is a semantic wiki that serves as a rapid, day-to-day reference for bioinformaticians. The database refers users to the publication(s) describing each tool, while community knowledge about the tools is discussed extensively in the SEQanswers forum. The SEQanswers wiki is a rigorously community-maintained platform.

Reference to the Letter to Science: The Emerging World of Wikis

Wikis:

SEQanswers and the wiki: A credible HTS community

SEQanswers enables the rapid dissemination of both wet-lab techniques and information on computational tools and analyses. It allows new tools, techniques and pipelines to be rapidly announced, tested and benchmarked within an active community. Since its establishment in late 2007, SEQanswers has been cited more than 35 [please update it] times in high-impact journals, including Nature and PLoS. SEQanswers aims to be an information resource and user-driven community focused on every aspect of high-throughput genomics, and it is open to everyone regardless of scientific background or knowledge.

An open forum works well for discussion, but less well for the collaborative editing of resources, so the SEQanswers wiki was set up to serve the latter function. The SEQanswers wiki (the wiki) originated from a discussion thread in SEQanswers that presented a static list of packages organized by type of data analysis. The wiki is a structured catalogue of bioinformatics tools for NGS analysis. Its pages provide structured (semantic) data for each tool, including data types and formats, capabilities and provenance details, as well as links to publications and online resources. All data is contributed by users in both structured and free-text form using SMW's semantic data entry capabilities. A search tool provides a simple yet powerful means of retrieving information on tools, and structured data can be queried and presented as reports directly within the wiki. The wiki thus serves as a community-annotated central portal to the available bioinformatics tools. SEQanswers moderators have no role in judging the usefulness of the listed tools. Instead, the popularity of a tool is reflected by the view count of its page: the more often a tool is searched for or browsed, the higher it ranks.

Goals of the SEQanswers wiki

The goals of the SEQanswers wiki are to:

(i) Gather and organize the ever-growing number of bioinformatics packages through a community effort.

(ii) Provide a freely accessible, criteria-based searchable interface to facilitate selection of packages for analysis.

(iii) Accelerate informatics-based knowledge exchange by bridging peer-reviewed journals and the online community.

SEQanswers wiki

Content

Currently, the database contains over 400 unique software tools, around 350 literature references and 500 web links.

(Describe with overview, more detail here!)

Community contribution

The SEQanswers wiki is open for editing by an active community of more than 19,000 members around the world. Registered members can submit and curate profiles of bioinformatics packages. Profiles of both open-source and commercial packages can be submitted with a short description, the application domain, the analysis methods employed and the compatible NGS technologies. A package can be tagged as still under active maintenance. The reference and abstract for a tool can be fetched automatically from its PubMed ID.
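How the wiki performs the PubMed lookup is not described here. As a minimal sketch, assuming NCBI's public E-utilities service is used, a PubMed ID can be turned into a request for the citation and abstract as below. The function names `build_efetch_url` and `fetch_abstract` are hypothetical; only the `efetch` endpoint and its `db`, `id`, `rettype` and `retmode` parameters belong to NCBI.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint (public NCBI service).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmid: str) -> str:
    """Build an efetch request that returns one PubMed record as a plain-text abstract."""
    if not pmid.isdigit():
        raise ValueError(f"PubMed IDs are numeric, got {pmid!r}")
    params = {"db": "pubmed", "id": pmid, "rettype": "abstract", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_abstract(pmid: str, timeout: float = 10.0) -> str:
    """Download the citation and abstract text for one PubMed ID (needs network access)."""
    with urlopen(build_efetch_url(pmid), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # PMID of the Sequence Read Archive paper cited in this article.
    print(build_efetch_url("21062823"))
```

A production version would also respect NCBI's usage policy (identifying the tool and contact e-mail, and limiting the request rate).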

Usage walkthrough

(i) Software Hub

Each package is tagged with its programming language, compatible operating systems, maintenance status, compatible NGS technologies and the type(s) of analysis it performs. Searches can be narrowed by combining multiple parameters. A package can carry multiple attributes in each category (e.g. an RNA-Seq program can perform both read alignment and junction finding), so a package is found as long as any one of its attributes matches. Through tagging, bioinformaticians working on real data can quickly retrieve a collection of up-to-date tools for analysis, while tool writers can find the most comparable tools against which to benchmark their own programs. The database organizes the semantic information so that users and developers can concentrate on productive analysis and development rather than on finding the right tool for the job. The Software Hub uses tag clouds extensively. A tag cloud is a visual representation of the relative importance of each keyword among all keywords, and it lets users glimpse trends in the field at a glance.
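The matching rule described above (any tag within a category may match, and every additional parameter narrows the result) can be sketched in a few lines. This is an illustrative in-memory model, not the wiki's SMW query machinery; the `Package` class, the category names and the tool names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical catalogue entry: each category maps to the set of tags on the package."""
    name: str
    tags: dict[str, set[str]] = field(default_factory=dict)


def matches(pkg: Package, criteria: dict[str, str]) -> bool:
    # Within a category the package matches if ANY of its tags equals the requested value;
    # across categories every criterion must hold, so each extra parameter narrows the search.
    return all(value in pkg.tags.get(category, set())
               for category, value in criteria.items())


def search(packages: list[Package], criteria: dict[str, str]) -> list[str]:
    return [pkg.name for pkg in packages if matches(pkg, criteria)]


catalogue = [
    Package("ToolA", {"language": {"C++"}, "os": {"Linux", "Mac OS X"},
                      "analysis": {"Read alignment", "Junction finding"}}),
    Package("ToolB", {"language": {"Java"}, "os": {"Linux"},
                      "analysis": {"Read alignment"}}),
    Package("ToolC", {"language": {"Perl"}, "os": {"Windows"},
                      "analysis": {"De novo assembly"}}),
]

if __name__ == "__main__":
    print(search(catalogue, {"analysis": "Read alignment"}))                      # ['ToolA', 'ToolB']
    print(search(catalogue, {"analysis": "Read alignment", "language": "Java"}))  # ['ToolB']
```

ToolA is found by either of its analysis tags, which is the "any one attribute" behaviour; adding the `language` criterion drops it from the second result.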

Overview of the Software Hub. Each tag's size increases with the number of packages tagged with it. In this example, packages written in C++ dominate, followed by Java and Perl. Most of the packages run on Linux. Packages whose maintenance status is still to be confirmed outnumber those confirmed as maintained. Most packages are compatible with Illumina and 454 technologies.
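The caption states only that a tag grows as more packages carry it. A minimal sketch of such sizing, assuming a linear scale between a hypothetical minimum and maximum font size (many tag clouds use a logarithmic scale instead), looks like this:

```python
from collections import Counter


def tag_cloud(tagged_packages, min_pt=10, max_pt=32):
    """Map each tag to a font size that grows linearly with the number of packages carrying it."""
    # set(tags) ensures a package is counted at most once per tag.
    counts = Counter(tag for tags in tagged_packages for tag in set(tags))
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when every tag has the same count
    return {tag: round(min_pt + (n - lo) * (max_pt - min_pt) / span)
            for tag, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical "programming language" tags for six packages.
    languages = [["C++"], ["C++"], ["C++", "Python"], ["Java"], ["Java"], ["Perl"]]
    print(tag_cloud(languages))  # {'C++': 32, 'Python': 10, 'Java': 21, 'Perl': 10}
```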

Comparisons and review of tools

The SEQanswers wiki maintains independent, community-based, hyper-focused reviews of commonly used bioinformatics tools. These reviews introduce users to essential knowledge of bioinformatics analysis and complement the search function.

(ii) HTS Providers

This section is a compilation of NGS service providers around the globe. Users can find a service provider by the type of service they need (sequencing, genotyping or analysis), and searches can be further narrowed to providers located in a specific region of the world. Finding a nearby service provider is of great importance to NGS users, as it ensures quick sample delivery and helps maintain sample integrity. This section is invaluable for researchers without NGS core facilities at their own institution. It also serves those curious about the current deployment of NGS services in different geographical locations.

Overview of the High-Throughput Sequencing Service Provider page, showing the types of service offered by different providers. By default, names are sorted alphabetically, but users can customize the sorting. Optionally, entries can be filtered by service, region, area or state. A new service provider can be added (top left), and existing entries can be edited.
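The provider listing combines optional filters with a default alphabetical sort. A minimal sketch of that behaviour, using a hypothetical `Provider` record and made-up provider names (the wiki itself does this through SMW queries, whose details are not described here):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """Hypothetical provider record holding the fields the page filters on."""
    name: str
    services: frozenset[str]
    region: str


def by_name(provider: Provider) -> str:
    # Default ordering: alphabetical by name, ignoring case.
    return provider.name.casefold()


def find_providers(providers, service=None, region=None, sort_key=by_name):
    """Apply the optional service and region filters, then sort (alphabetically by default)."""
    hits = [p for p in providers
            if (service is None or service in p.services)
            and (region is None or p.region == region)]
    return sorted(hits, key=sort_key)


providers = [
    Provider("Gamma Genomics", frozenset({"Sequencing", "Analysis"}), "Europe"),
    Provider("Alpha Seq Core", frozenset({"Sequencing", "Genotyping"}), "North America"),
    Provider("Beta Bio Services", frozenset({"Analysis"}), "Europe"),
]

if __name__ == "__main__":
    print([p.name for p in find_providers(providers, service="Sequencing")])
    # ['Alpha Seq Core', 'Gamma Genomics']
    print([p.name for p in find_providers(providers, region="Europe")])
    # ['Beta Bio Services', 'Gamma Genomics']
```

Passing a different `sort_key` (for example `lambda p: p.region`) corresponds to the user customizing the sort order.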

Future Directions

SEQanswers and the wiki are ongoing projects. The wiki has proven a successful platform for searching packages, and we plan to enhance its features further. In particular, we plan to implement the following:

(i) Community review system: Before SEQanswers, users' comments were usually directed solely to the authors of the respective packages, and bloggers' reviews of packages were posted independently. SEQanswers has fostered both pre-publication and post-publication review of packages. Long before peer-reviewed publication, packages are often announced in SEQanswers and tested extensively within the community (think DESeq, for example). Post-publication improvement and benchmarking among developers are encouraged by discussions in SEQanswers (think Cufflinks vs DESeq vs DEGSeq vs ...). The SEQanswers wiki aims to become an independent, community-based review system that complements the peer-review publication system and provides a central portal to the NGS field.

(ii) More accurate popularity metric: Currently, the usefulness of a software tool is inferred from its popularity. In the future, we may implement a more reliable popularity metric, possibly the number of citations in peer-reviewed journals normalized by the time since the tool's publication.
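The proposed metric is only outlined in one sentence. One way to write it down, assuming that a citation count $C_i$ and a publication date $t_i$ (in years) are available for each tool $i$ and that $T$ is the current date, is as a citation rate. The floor $\tau$ is a hypothetical addition that keeps very recent tools from being divided by a near-zero age.

```latex
P_i = \frac{C_i}{\max\left(T - t_i,\ \tau\right)}
```

For example, with $\tau = 1$ year, a tool published 3 years ago with 60 citations scores $60/3 = 20$ citations per year, while a tool published 6 months ago with 15 citations scores $15/\max(0.5, 1) = 15$ rather than the 30 it would receive without the floor.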

Please read the sites below!

1) http://www.genomesunzipped.org/2011/07/why-publish-science-in-peer-reviewed-journals.php

2) http://scienceofblogging.com/post-publication-peer-review-blogs-vs-letters-to-the-editor/

3) http://cenblog.org/terra-sigillata/2010/12/08/post-publication-peer-review-in-public-poison-or-progress/

Citing SEQanswers and the wiki

For a general citation of SEQanswers and the wiki, please cite this article. In addition, the following citation format is suggested when referring to data specific to SEQanswers and the wiki (URL: http://seqanswers.com/ or http://seqanswers.com/wiki/) [Type in the date (month, year) you retrieved the cited data].

Acknowledgements

The authors would like to thank the SEQanswers members who contributed to the creation and curation of the wiki.

Funding

[Authors, except ECO] are not affiliated with the operation of SEQanswers. SEQanswers has advertising relationships with commercial companies. All relationships between SEQanswers and sponsoring companies are explicitly listed in About SEQanswers. These companies have no role in the operation of SEQanswers, nor in the writing of this manuscript. Discussions on SEQanswers are conducted by individuals, each posting from a registered account. Funding for open access charge: To be confirmed

Conflict of interest statement. None declared / ECO is the founder of SEQanswers??

References

Description in accordance with BioDBcore standards

See some examples.

Database name: SEQanswers wiki
Main resource URL: http://SEQanswers.com/wiki/
Contact information: webmaster@seqanswers.com
Date resource established (year): 2009 (SEQanswers forum: 2007)
Conditions of use (free, or type of license): Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported
Scope:
  Data types captured:
  Curation policy: manual curation
  Standards (MIs, data formats, terminologies):
  Taxonomic coverage:
Data accessibility/output options:
Data release frequency: immediately after modification
Versioning period and access to historical files: every modification is versioned; full version history is available
Documentation available:
User support options: forum, e-mail
Data submission policy: any registered user may contribute; registration is not restricted
Relevant publications:
Resource's Wikipedia URL: http://en.wikipedia.org/wiki/SEQanswers
Tools available: