Next-Gene - Next Generation Sequencing Software
Software for Analysis of “Next Generation” Sequence Data
Download Application Notes (PDF):
• De novo Assembly Application Note (167 KB)
• Digital Gene Expression Application Note (131 KB)
• File Format Application Note (98 KB)
• SNP & Indel Application Note (243 KB)
• Transcriptome Application Note (605 KB)
• Compatible with all Next Generation systems
• Performs Assembly, SNP/Indel detection, Transcriptome
Analysis
• “Low-end” Hardware Requirements
• Easy & Intuitive User Interface
• Exclusive SoftGenetics technical support
SoftGenetics offers NextGENe, unique software for analyzing
“Next Generation” Sequencing data generated by platforms
such as the Genome Sequencer FLX System from Roche Applied Science
(454 Life Sciences) and short read systems such as the Illumina®
Genome Analyzer (Solexa sequencing technology). Designed with
speed and simplicity of use in mind, NextGENe can complete
much of the analysis on a standard PC, and the analyses will soon
be available on a Linux platform. By statistically polishing
large data sets into manageable fragments, NextGENe can quickly
generate accurate results. NextGENe can be used to identify Single
Nucleotide Polymorphisms (SNPs) and small Insertions and Deletions
(Indels) for large scale resequencing projects and transcriptome
analyses.
De novo assembly of both long and short sequences will be available
shortly, as well as analysis of data from the Applied Biosystems
SOLiD™ System. The software is designed to handle the unique
hurdles presented by each platform, such as homopolymers in
pyrosequencing and the short reads generated by both the bridge amplification
sequencing by synthesis method and the sequencing by ligation
method, while utilizing each system’s advantages.

Figure 1: The NextGENe Alignment tool highlights
SNPs and small Indels. Towards the left side, the software has
identified a T>G SNP. The center of the figure shows a GAAA
repeat region, and in some of the reads (highlighted in red)
the repeat ends in the middle of the read. This scenario
enables detection of both deletions, as shown here, and insertions.
Resequencing Projects
SNPs and small Indels can be detected in targeted
sequencing data from both longer sequence reads and the short
reads from techniques such as the Solexa sequencing technology.
Short reads sometimes align to multiple regions of the genome.
Using the Condensation Tool, they can be elongated to 50 or 65
bases, increasing the probability that each sequence is unique.
The Condensation Tool partitions the reads into smaller groups
and sorts them into two categories: condensed fragments and fragments
that are more likely to be noise from sequencing errors. Because
the short reads have been elongated by the Condensation Tool,
detection of Indels is possible. Since a true SNP occurs at a
high frequency within the reads, the low frequency variants caused
by noise can be distinguished from positions containing a true polymorphism.
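The frequency argument can be illustrated with a minimal Python sketch. It is not NextGENe's algorithm: the function name `call_variants`, the input format (one string of observed bases per reference position), and the 20% frequency and 5-read coverage thresholds are all assumptions chosen for illustration.

```python
from collections import Counter

def call_variants(pileup, reference, min_fraction=0.2, min_coverage=5):
    """Return (position, ref_base, alt_base, fraction) for well-supported variants.

    pileup: list of strings, one per reference position, holding the bases
            that the aligned reads show at that position (hypothetical format).
    """
    calls = []
    for pos, (ref_base, column) in enumerate(zip(reference, pileup)):
        if len(column) < min_coverage:
            continue  # too few reads to judge this position
        counts = Counter(column)
        for base, n in counts.items():
            fraction = n / len(column)
            # Keep only non-reference bases seen in a large share of the reads.
            if base != ref_base and fraction >= min_fraction:
                calls.append((pos, ref_base, base, fraction))
    return calls

reference = "ACGT"
pileup = [
    "AAAAAAAAAA",  # all reads agree with the reference
    "CCCCCGGGGG",  # half the reads carry G: a plausible SNP
    "GGGGGGGGGT",  # a single T among ten reads: likely sequencing noise
    "TTT",         # coverage too low to make a call
]
calls = call_variants(pileup, reference)
print(calls)
```

Only the second position is reported, because the lone T at the third position falls below the assumed frequency threshold. A real caller would also weigh base quality and strand information, which this sketch ignores.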
Whole Transcriptome Analysis
Analyzing an organism’s transcriptome
with Next Generation Sequencing technology presents several
challenges, including the generation of sequences with homopolymeric
regions and high variability in expression rates. Short reads
(25 to 35 bases) are not always unique, but fragments of 50 to
65 bases are generally unique. In addition, high expression of
some genes can mask genes with low expression levels. When the sequence
of a transcript expressed at a low level is similar to the sequence of
a gene expressed at a high level, this sequence could be misinterpreted
as noise or error. By using the Condensation Tool, the short reads
are polished into longer reads, allowing noise and error to
be filtered out more reliably.
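The effect of fragment length on uniqueness can be checked on a toy sequence. The sketch below is an illustration only: the helper names `unique_fraction` and `random_dna`, the random toy reference, and the 40-base repeated element are assumptions, not data from NextGENe.

```python
import random
from collections import Counter

def unique_fraction(sequence, k):
    """Fraction of length-k windows that occur exactly once in the sequence."""
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(windows)
    return sum(1 for w in windows if counts[w] == 1) / len(windows)

rng = random.Random(0)  # fixed seed so the toy sequence is reproducible

def random_dna(n):
    """Hypothetical helper: n random bases."""
    return "".join(rng.choice("ACGT") for _ in range(n))

# Toy reference with one 40-base element present in two places.
repeat = random_dna(40)
toy_reference = (random_dna(100) + repeat + random_dna(100)
                 + repeat + random_dna(100))

for k in (25, 35, 50, 65):
    print(k, round(unique_fraction(toy_reference, k), 3))
```

Windows shorter than the repeated element can fall entirely inside it and therefore match two locations, while windows longer than the element must extend into its distinct flanking sequence.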
When using the Alignment tools, the highly expressed
sequences are matched to the reference. The low level reads, which are often
mistaken for reads containing errors, are rescanned and matched to the
reference, allowing for more accurate detection of genes expressed
at lower rates. Uniqueness Scores are determined for each fragment.
Any Reference mRNA Sequence database can be used as a reference
for alignment of the reads.
The results of the analysis can be saved as
a reference file, allowing for direct comparison to the results
from another analysis. This is a useful feature for comparison
studies such as Chromatin Immunoprecipitation (ChIP).
Figure 2: The alignment tool shows three
SNPs, marked by a blue background. At the top of the figure,
the first row shows the chromosome position, the vertical numbers
below this show the coverage, and the reference sequence is located
below the coverage. Holding down the Ctrl key while the cursor is in the
reference region at the top of the figure displays the annotation line.
Condensation Assembly Tool
The Condensation Assembly Tool is used to statistically
polish and lengthen the short sequence reads into fragment sizes
that are more manageable. Short reads, such as those from the
Illumina Genome Analyzer System, are often not unique within the
genome being analyzed. By clustering similar reads containing
a unique anchor sequence, data of adequate coverage is condensed
and the short reads are lengthened. The unique anchor sequence,
or index, can be a 12 base fragment that is found in several of
the reads. All reads containing this exact sequence are clustered
together. Often, many of the reads within a cluster share the same 4
nucleotides both upstream and downstream of the index
sequence. The cluster of reads can be sorted by these flanking
shoulder regions into groups of similar reads. The consensus of each
group is much longer than the original reads, and often these 50 to 65 base
fragments are unique within the genome, with exceptions such
as homopolymeric regions, repeats and duplications.
Figure 3: The Condensation Assembly tool
clustered similar reads containing the same anchor sequence,
TCACGACGGTCT. The right shoulder, the four nucleotides to the right
of the anchor, splits the cluster into two groups, AATC (red) and AACC
(blue). A consensus sequence is generated for each group.

Figure 4: The Condensation Assembly tool
generates a fasta file of the sequence reads that were condensed
into larger fragments. The anchor, or index, is the sequence common to
the group of reads. Dividing the clusters of reads by their
shoulder sequences yields a longer consensus sequence for
each group.
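The anchor-and-shoulder clustering described in this section can be sketched in Python. This is a simplified illustration, not the SoftGenetics implementation: it assumes an exact anchor match, fixed 4-base shoulders on both sides, and a per-column majority vote for the consensus. The function name `condense` and the two toy fragments are hypothetical. Only the anchor TCACGACGGTCT and the shoulders AATC and AACC come from Figure 3.

```python
from collections import Counter, defaultdict

ANCHOR = "TCACGACGGTCT"  # 12-base anchor from Figure 3
SHOULDER = 4             # flanking bases used to split a cluster

def condense(reads, anchor=ANCHOR, shoulder=SHOULDER):
    """Group reads by the shoulders around the anchor; return a consensus per group."""
    groups = defaultdict(list)
    for read in reads:
        start = read.find(anchor)
        if start == -1:
            continue  # read does not contain the anchor
        end = start + len(anchor)
        left = read[max(0, start - shoulder):start]
        right = read[end:end + shoulder]
        if len(left) < shoulder or len(right) < shoulder:
            continue  # anchor too close to a read end to see both shoulders
        groups[(left, right)].append((start, read))

    consensus = {}
    for key, members in groups.items():
        # Line the reads up on the anchor, then take a majority vote per column.
        max_start = max(start for start, _ in members)
        columns = defaultdict(Counter)
        for start, read in members:
            offset = max_start - start
            for i, base in enumerate(read):
                columns[offset + i][base] += 1
        consensus[key] = "".join(
            columns[i].most_common(1)[0][0] for i in sorted(columns)
        )
    return consensus

# Two hypothetical source fragments that differ only in the right shoulder.
FRAG_RED = "CTGAGGCATA" + ANCHOR + "AATCGTTAGCAT"
FRAG_BLUE = "CTGAGGCATA" + ANCHOR + "AACCGTTAGCAT"

# Simulated 25-base reads: every window of each fragment, plus one read
# whose last base is miscalled (T read as G).
reads = [frag[i:i + 25] for frag in (FRAG_RED, FRAG_BLUE) for i in range(10)]
reads.append(FRAG_RED[3:27] + "G")

result = condense(reads)
for (left, right), seq in result.items():
    print(left, right, len(seq), seq)
```

In this toy run the 25-base reads condense into one 30-base consensus per shoulder group, and the single miscalled base is outvoted by the other reads covering that column.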
Alignment Tools
The Alignment tools are designed to match the
sequence reads to a user-defined annotated reference sequence.
Once the reads have been aligned to the reference, SNPs and Indels
are highlighted for quick identification. The display shows the
reference sequence, aligned sequence reads, breakpoints between
genes, and coverage. With the click of a button, biological information
for the position can be displayed. The projects and reports can
be saved for further analysis.
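The idea of placing reads on a reference and tracking coverage and mismatches can be shown with a small Python sketch. It uses a brute-force ungapped search, which is an assumption made for brevity and not how NextGENe aligns reads. The function name `align`, the one-mismatch limit, and the toy reference and reads are hypothetical, and Indels are not handled.

```python
def align(read, reference, max_mismatches=1):
    """Return (position, mismatches) of the best ungapped placement, or None."""
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best

reference = "ACGTTGCAAGGCTTAC"
reads = ["ACGTTG", "GTTGCA", "GCAAGG", "GCATGG", "GGCTTA"]

coverage = [0] * len(reference)   # number of reads covering each position
mismatch_positions = []           # (position, reference base, read base)
for read in reads:
    hit = align(read, reference)
    if hit is None:
        continue  # read could not be placed within the mismatch limit
    pos, _ = hit
    for i, base in enumerate(read):
        coverage[pos + i] += 1
        if base != reference[pos + i]:
            mismatch_positions.append((pos + i, reference[pos + i], base))

print(coverage)
print(mismatch_positions)
```

The per-position coverage list corresponds to the coverage track in the display, and the recorded mismatches are the positions a viewer would highlight as candidate substitutions.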
Figure 5: Coverage is indicated by gray lines
at the top of this alignment tool. The red lines indicate the
break points between the transcripts of this reference sequence
file. An Indel was highlighted by the tool in the middle of the
screen, and two substitutions were detected to the right.
Recommended Hardware
Customizable Desktop PC
• Windows Vista (64-bit)
• Intel(R) Core(TM) 2 Quad processor Q6600 (2.4GHz)
• 8GB DDR2-800MHz dual channel SDRAM (2x1024)
• 512MB NVIDIA GeForce 8500GT, TV-out, DVI-I, HDMI