Ray is a parallel application that computes de novo genome assemblies from next-generation sequencing data. Ray is written in C++ and runs in parallel using the MPI library.

Ray can run on up to 100,000 processor cores using MPI. With hypercube routing, the maximum is 4096 cores.

You can obtain a copy of Ray from the site http://denovoassembler.sourceforge.net/download.html

Installation on Colosse

File : script.sh
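# Load the necessary software environment
module load compilers/gcc/4.6.1 mpi/openmpi/1.4.5_gcc
# Unpack the source archive (x.y.z stands for the version number)
tar xjf Ray-vx.y.z.tar.bz2
cd Ray-vx.y.z
# Build with zlib support and install into the Installation subdirectory
make PREFIX=$(pwd)/Installation HAVE_LIBZ=y
make install
# Test the installed binary
mpiexec -n 1 ./Installation/Ray -version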


For help or advice, use the mailing list: http://denovoassembler.sourceforge.net/mailing-list.html

Public Data

A large amount of genomic data is freely available in the public domain from the following sources.

Data Source | Provider | Link
European Nucleotide Archive (ENA) | European Bioinformatics Institute (European Molecular Biology Laboratory) | http://www.ebi.ac.uk/ena/
Sequence Read Archive (SRA) | National Center for Biotechnology Information (U.S.A.) | http://www.ncbi.nlm.nih.gov/sra
DNA Data Bank of Japan | National Institute of Genetics (Japan) | http://trace.ddbj.nig.ac.jp/DRASearch/
Sequence Read Archive (SRA) | DNANexus, Inc. | http://sra.dnanexus.com/
MG-RAST | Argonne National Laboratory/U.S. Department of Energy | http://metagenomics.anl.gov/?page=MetagenomeSelect

E. coli K12 MG1655


[name@server $] mpiexec -n 32 Ray -k 21 -o Ecoli -p SRR001665_1.fastq.gz SRR001665_2.fastq.gz -p SRR001666_1.fastq.gz SRR001666_2.fastq.gz
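
When an assembly uses many paired-end runs, typing each -p option by hand becomes error-prone. The sketch below shows one possible way to generate the command line from a list of run accessions. The function build_ray_command is a hypothetical helper, not part of Ray, and it assumes the read files are named <accession>_1.fastq.gz and <accession>_2.fastq.gz in the current directory, as in the command above. It only prints the command and does not run it.

```shell
#!/bin/bash
# Hypothetical helper (not part of Ray): print a Ray command line for a
# list of paired-end run accessions, without executing it.
# Assumption: files are named <accession>_1.fastq.gz and <accession>_2.fastq.gz.
build_ray_command() {
    local cores="$1" kmer="$2" outdir="$3"
    shift 3
    local cmd="mpiexec -n ${cores} Ray -k ${kmer} -o ${outdir}"
    local acc
    for acc in "$@"; do
        # One -p option per pair of read files
        cmd="${cmd} -p ${acc}_1.fastq.gz ${acc}_2.fastq.gz"
    done
    printf '%s\n' "${cmd}"
}

# Reproduces the E. coli K12 MG1655 command shown above
build_ray_command 32 21 Ecoli SRR001665 SRR001666
```

Printing the command first makes it easy to review, or to paste into a job script, before any cores are used.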

