The PR2 Database

The Protist Ribosomal Reference Database (PR2) provides access to Small SubUnit (SSU) rRNA and rDNA sequences of unicellular eukaryotes, with curated taxonomy. The database focuses on nuclear-encoded sequences of protists. However, Metazoa, land plants and macrosporic fungi, as well as eukaryotic organelles (mitochondrion, plastid and others), are also included because they are useful for analyzing Next Generation Sequencing datasets. The PR2 database can be downloaded or searched by similarity as explained below. Because protists are tremendously diverse, their taxonomy is still a work in progress.

The PR2 database is updated after each new GenBank Release. Contributions or suggestions are very welcome and can be submitted either to Laure Guillou or to Richard Christen.


If you use the PR2 database, please cite it here. Read more about the PR2 database here.

For the latest news about the PR2 database, please click here. Read and share your knowledge in the PR2 Wiki.

 


The Protist Ribosomal Reference Database

The Protist Ribosomal Reference Database provides a catalog of Small SubUnit rRNA sequences of unicellular eukaryotes, with curated taxonomy. The database focuses on nuclear-encoded sequences of protists. However, Metazoa, land plants and macrosporic fungi, as well as eukaryotic organelles (mitochondrion, plastid and others), are also included because they are useful for analyzing Next Generation Sequencing datasets.




Complete as well as partial PR2 databases can be downloaded from this site, which also offers several tools for assigning your own sequences by similarity against the whole database. Because protists are tremendously diverse, their taxonomy is still a work in progress.

The taxonomy of this database was built by a consortium of experts and will be maintained and developed under the "Oceanomics" project (French "Grand Emprunt" project, 2012-2020, coordinator Colomban de Vargas).

New contributions are very welcome and can be submitted either to Laure Guillou, Station Biologique of Roscoff, France, or to Richard Christen.

The taxonomy of unclassified sequences can be inferred by sequence similarity against the PR2 database, using the following tools:

Use our BLAST server in combination with Blast2Tree.

BLAST is based on local alignments; as a result, the % similarity is computed only over the aligned region, not over the entire sequences.

Use our Crunch Assign Server (Needleman-Wunsch global aligner) in combination with Crunch2Tree.

Crunch is based on a single global alignment; the % similarity is therefore computed over the entire sequences.

Search by primers.

Primers can be searched for in the database using our special-purpose algorithm written in C.

Use KeyDNA Tools.

Sequences are annotated using short (15 nt) oligonucleotides generated from the core PR2 reference database.
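To illustrate why the alignment mode matters for the reported % similarity, here is a minimal Python sketch of percent identity over a Needleman-Wunsch global alignment. It is not the Crunch server's code: the function name `global_identity` and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration only.

```python
def global_identity(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity over a Needleman-Wunsch global alignment.

    Scoring values are illustrative assumptions, not those of any server.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal alignment, counting identical columns
    # and the total number of columns (gaps included).
    i, j = n, m
    identical = columns = 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            identical += a[i - 1] == b[j - 1]
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * identical / columns if columns else 0.0


if __name__ == "__main__":
    print(global_identity("ACGTACGT", "ACGT"))
```

With these assumed scores, a query that matches only half of a reference, as in the call above, is penalized for the unaligned half in global mode. A local alignment (for example Smith-Waterman) would align only the four matching bases and report 100% for the same pair.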

For NGS analyses or special cases, please contact:

Richard Christen

Laure Guillou

Bibliography:

Publications linked to accession numbers have been collected for every sequence present in the database. This does not mean that each article referenced for a clade deals with this clade, but that each article concerns a study that was identified as having submitted at least one sequence of this clade. Each reference has a link to PubMed.



You can find them on our bibliography page.

Intronic sequences

The gene encoding SSU-rRNA is the tool of choice for phylogenetic analyses and environmental biodiversity analyses of Bacteria and Archaea, but also of unicellular Eukaryota.

In Eukaryota, these gene sequences are often interrupted by long introns or by several short ones.

Building the database

Searching GenBank release 188, we found descriptions of 3,638 such introns. Using a database of 180,000 SSU-rRNA sequences well annotated for taxonomy and a parallel MPI C++ program written for that purpose (a server version is available here), we detected 18,691 introns (among them the 3,638 described introns). After filtering on length and sequence quality, 3,646 sequences were retained. These introns were clustered, and the clusters were analyzed for the presence of a single clade or multiple clades at various levels of taxonomic depth, allowing future analyses of horizontal transfers. Various analyses of the results are provided as tabulated files, together with fasta files of described or computed introns. Each sequence is annotated with its cellular location (nucleus, chloroplast or mitochondrion), the positions at which the intron was found in the SSU-rRNA sequence, and its taxonomy as provided by GenBank.

Intronic database available for download

We provide a series of fasta files containing only sequences extracted from abnormal clusters; each sequence is annotated with our taxonomy. Each file holds sequences clustered at a given level of dissimilarity, for discrepancies found at successive levels of the taxonomy (more details are provided in the readme associated with these files). Note, however, that some clusters may be flagged as abnormal when they probably are not. This is particularly the case when a discrepancy is found between, for example, "Eukaryota Opisthokonta" and "Eukaryota eukaryota_x", the latter annotation resulting from our failure to properly assign an SSU-rRNA sequence. Such cases can probably be resolved manually.
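The cluster check described above can be sketched roughly as follows. This is a hypothetical illustration, not the authors' pipeline: the function names, the list-of-ranks representation of a lineage, and the rule that names ending in "_x" are unresolved placeholders are all assumptions made for the example.

```python
import re
from typing import List, Optional

# Assumption: a name ending in "_x", "_xx", ... marks a rank that could
# not be assigned, as in the "eukaryota_x" example.
_PLACEHOLDER = re.compile(r"_x+$", re.IGNORECASE)


def first_discrepancy(lineages: List[List[str]]) -> Optional[int]:
    """Index of the first rank at which the lineages disagree, else None."""
    if not lineages:
        return None
    depth = min(len(lineage) for lineage in lineages)
    for rank in range(depth):
        if len({lineage[rank] for lineage in lineages}) > 1:
            return rank
    return None


def classify_cluster(lineages: List[List[str]]) -> str:
    """Label a cluster 'consistent', 'abnormal' or 'needs_review'."""
    rank = first_discrepancy(lineages)
    if rank is None:
        return "consistent"
    names = {lineage[rank] for lineage in lineages}
    resolved = {name for name in names if not _PLACEHOLDER.search(name)}
    # If the disagreement disappears once placeholders are ignored, the
    # discrepancy may come from a failed assignment rather than from
    # biology, so flag the cluster for manual inspection.
    return "needs_review" if len(resolved) <= 1 else "abnormal"


if __name__ == "__main__":
    cluster = [["Eukaryota", "Opisthokonta"], ["Eukaryota", "eukaryota_x"]]
    print(classify_cluster(cluster))
```

Separating "needs_review" from "abnormal" mirrors the caveat in the text: a cluster mixing "Eukaryota Opisthokonta" with "Eukaryota eukaryota_x" is set aside for manual resolution instead of being counted as a genuine discrepancy.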