Accelize Blog

By Anonymous (not verified)
on 23 Jul 2018 2:23 PM

In this post we explain how to accelerate search and replace operations in ASCII files using the FPGA-based Ultra Fast Search & Replace for Data Analytics accelerator, which speeds up compute-intensive workloads such as log processing, message processing, IP address filtering, and genomics. We illustrate how to use this accelerator to speed up the conversion of DNA sequences to amino acid sequences by up to 150x.

 

1 Genomics use case

DNA is present in all cells in a body aside from those cells that do not contain a nucleus. DNA carries genetic information in the form of two paired strands. The genes in DNA encode protein molecules. These proteins perform essential functions as enzymes, hormones and receptors. Expressing a gene means manufacturing its corresponding protein. In a cell, this is achieved in two steps: transcription and translation.

The transcription step copies one strand of the double-stranded DNA into a single-stranded messenger RNA (mRNA). Then, during the translation step, the mRNA is read according to the genetic code, which relates the nucleotide sequence to the amino acid sequence of proteins. Each group of three bases constitutes a codon. In the DNA notation used in this post, the four bases are adenine (A), thymine (T), cytosine (C), and guanine (G); in mRNA, uracil (U) takes the place of thymine. Each codon corresponds to a specific amino acid or to a stop signal. The bases can be arranged in 64 unique combinations (4 x 4 x 4), which are given in the following table. For example, the TCG codon corresponds to serine, an amino acid used in the biosynthesis of proteins.
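As a small illustration of the codon arithmetic and lookup described above, here is a minimal Python sketch. It enumerates all base triples and translates a short DNA string using a deliberately partial excerpt of the standard genetic code. The names `PARTIAL_CODE` and `translate` are ours, introduced only for this example.

```python
from itertools import product

BASES = "ACGT"  # DNA notation; mRNA would use U in place of T

# Every ordered triple of bases is a codon: 4 * 4 * 4 = 64.
ALL_CODONS = ["".join(triple) for triple in product(BASES, repeat=3)]

# Deliberately partial excerpt of the standard genetic code (illustration only).
PARTIAL_CODE = {
    "ATG": "Methionine",
    "TCG": "Serine",
    "TGG": "Tryptophan",
    "GGC": "Glycine",
    "TAA": "Stop",
}


def translate(dna):
    """Split a DNA string into complete codons and look each one up.

    Codons missing from the partial table are reported as "?";
    trailing bases that do not form a full codon are ignored.
    """
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return [PARTIAL_CODE.get(codon, "?") for codon in codons]


if __name__ == "__main__":
    print(len(ALL_CODONS))
    print(translate("ATGTCGTGGTAA"))
```

A full conversion would need all 64 entries; this is exactly the kind of fixed word-to-word mapping that the accelerator takes as its corpus.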

 

2 Accelerator usage

The Ultra Fast Search & Replace for Data Analytics accelerator is an FPGA-based accelerator that speeds up whole-word search and replace operations in ASCII files. It is developed by Axonerve, based on their high-speed, low-latency search engine IP core, and uses Accelize solutions. Axonerve has deployed this accelerator on public cloud infrastructures such as AWS EC2 F1 instances (with Xilinx FPGAs) and OVH Public Cloud instances (with Intel FPGAs), and on premises (for example with BittWare A10PL4 boards).

The accelerator can be run remotely or locally on AWS or OVH FPGA instances in two steps. The first step is to configure the accelerator with a corpus: a comma-separated values (CSV) file that specifies which words (codons) must be searched for and which words (amino acids) must replace them. The second step is to send an input text file (a sequence of codons) to the accelerator, which then performs a search and replace operation according to the corpus and produces an output text file (a sequence of amino acids). This second step is illustrated below for a remote execution.
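For the first step, a corpus file could be generated with a few lines of Python. This is only a sketch: the exact column layout the accelerator expects is not specified in this post, so we assume one `search,replace` pair per row. The helper names `write_corpus` and `read_corpus` are hypothetical.

```python
import csv


def write_corpus(pairs, path):
    """Write (search_word, replacement_word) pairs, one per CSV row.

    Assumption: the corpus is a plain two-column CSV without a header.
    """
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(pairs)


def read_corpus(path):
    """Load a two-column corpus CSV back into a dict for inspection."""
    with open(path, newline="") as handle:
        return {row[0]: row[1] for row in csv.reader(handle) if row}


if __name__ == "__main__":
    # Two illustrative entries; a real codon corpus would list all 64 codons.
    write_corpus([("TCG", "Serine"), ("ATG", "Methionine")],
                 "corpus_codons_aminoacids.csv")
    print(read_corpus("corpus_codons_aminoacids.csv"))
```

Check the accelerator's own documentation for the required format before using a file produced this way.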

 

An open source Python library is available to ease the integration of the accelerator into your application and to operate the accelerator. The following lines illustrate how to use this library's API to configure the accelerator with a corpus and then convert sequences of codons to sequences of amino acids.

import apyfal

# 1- Create Accelerator
with apyfal.Accelerator(accelerator='axonerve_hyperfire') as myaccel:

    # 2- Configure Accelerator with corpus (list of codons and corresponding amino acids)
    myaccel.start(datafile="corpus_codons_aminoacids.csv")

    # 3- Process files: replace codons with amino acids according to the corpus
    myaccel.process(file_in="codons_seq1.txt", file_out="amino_acids_seq1.txt")
    myaccel.process(file_in="codons_seq2.txt", file_out="amino_acids_seq2.txt")
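To sanity-check the accelerator's output on a small input, it can help to have a pure-software reference of the same operation. The sketch below is not the accelerator's algorithm. It assumes that words are delimited by whitespace, and the function name `replace_whole_words` is ours.

```python
import re


def replace_whole_words(text, corpus):
    """Replace each whitespace-delimited word that appears in `corpus`.

    Words not in the corpus, and all whitespace, are left unchanged.
    Only whole words match: "TCGA" is not rewritten by a "TCG" entry.
    """
    return re.sub(r"\S+",
                  lambda match: corpus.get(match.group(0), match.group(0)),
                  text)


if __name__ == "__main__":
    corpus = {"ATG": "Met", "TCG": "Ser", "TGG": "Trp"}
    print(replace_whole_words("ATG TCG\nTGG TCGA", corpus))
```

Because each word is looked up once in a dictionary, a replacement is never rewritten again by another corpus entry.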

 

3 Performance and other use cases

When running the Ultra Fast Search & Replace for Data Analytics engine to accelerate the conversion of a sequence of codons to a sequence of amino acids, BittWare measured a 150x acceleration factor compared to a sed (stream editor) command. This was measured when processing a 7.1 MB sequence of codons using the A10PL4 board (a PCIe x8 card based on the Altera Arria 10 GX FPGA) hosted in a server with an Intel Xeon E5 processor. Watch this video for more details.
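The exact sed command used for that baseline is not given here. As one plausible way to build a comparable software baseline, the sketch below turns a corpus into a sed script with one whole-word substitution per entry. It assumes GNU sed (the `\b` word-boundary escape is a GNU extension) and corpus words made only of letters and digits, so no escaping is needed. The function name `sed_script` is hypothetical.

```python
def sed_script(corpus):
    """Build a sed script with one whole-word substitution per corpus entry.

    Assumes GNU sed and purely alphanumeric words (no regex escaping).
    """
    lines = [f"s/\\b{word}\\b/{replacement}/g"
             for word, replacement in corpus.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(sed_script({"TCG": "Ser", "ATG": "Met"}), end="")
```

Saved to a file such as `corpus.sed`, the script would be run as `sed -f corpus.sed codons_seq1.txt > amino_acids_seq1.txt`. Note that sed applies the substitutions in sequence, so a replacement that happens to match a later pattern would be rewritten again.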

 

In the same video, BittWare used the Ultra Fast Search & Replace for Data Analytics accelerator with a corpus of 2,500 words to perform a search and replace operation in a 5.6 MB text file (the complete works of Shakespeare), and measured a 2,600x speedup compared to a sed command. The accelerator's GitHub repository gives instructions to reproduce this on AWS and OVH FPGA instances. Axonerve is currently working on accelerating short-read mapping operations to speed up genomic analysis using the accelerator.

 

4 Universal Ubiquitous Value

While the FPGA-based Ultra Fast Search & Replace for Data Analytics accelerator can speed up the conversion of DNA sequences to amino acid sequences by up to 150x, we see even greater gains (>1000x) when searching and replacing more words in text files. This Accelerator Function is accessible to any software programmer, who can operate it in their cloud context using an open source Python library. This brings new benefits to application developers in many fields, as simple search and replace operations are widely used in applications such as log processing (e.g. for GDPR requirements), message processing (for keyword or sentiment analysis), and security (IP address filtering).

The FPGA-based Ultra Fast Search & Replace for Data Analytics accelerator is currently available for AWS F1 and OVH FPGA instances. Testing it with your own corpus and text files through AccelStore on one of these platforms takes only a few minutes. It is also available for deployment on on-premises servers; anyone interested in this option should contact Accelize for more details.