Accelize Blog

By Anonymous (not verified)
on 03 Aug 2018 10:41 AM

In this post, we explain how to run GZIP compression up to 25x faster than on a CPU in cloud infrastructures, using the FPGA-based GZIP accelerator.

 

1. The need for faster compression

Over the last decade, “Big Data” applications (IoT, machine learning, VR, …) have created new uses for customers, along with huge volumes of data exchanged all over the world. The very large amounts of data generated by cutting-edge appliances, connected objects and software need to be stored all around the planet.

 

This volume of data makes storage a constant challenge. One way to increase available storage capacity is to compress data before storing it. This is done with algorithms such as GZIP, which dates back to the early 1990s and is one of the few compression formats of that era still in widespread use. GZIP compression is currently performed in software on CPUs, and roughly 5% of servers globally are said to be dedicated to storage.

While a CPU is compressing data, it is not available for other tasks. This can slow down the whole processing chain and even make compression its weakest link. To address this limitation, different approaches have emerged, ranging from new software compression algorithms to hardware-accelerated compression systems. Here we focus on hardware-accelerated compression.
 

2. How hardware acceleration works

Hardware acceleration relies on dedicated boards plugged into servers in the cloud. These boards outperform CPUs on equivalent operations because their chips, called FPGAs, execute massively parallel operations in hardware, whereas CPUs are essentially sequential devices.

Unlike general-purpose CPUs, FPGAs are programmed specifically for the function that runs on them: 100% of their resources are dedicated to reducing the latency and increasing the throughput of that function.

 

The GZIP accelerator is an FPGA-based accelerator developed by CAST, based on their GZIP IP core and using Accelize solutions. It is ready to use in public cloud infrastructures (AWS EC2 F1 and OVH Public Cloud instances) and on premises.
 

3. Leveraging hardware accelerated compression in 4 lines of code

Fortunately, to use the GZIP accelerator in your cloud application, you don’t need to learn how FPGAs work or how to program them. Instead, you simply write 4 lines of Python code that transparently do the job for you:
 

import apyfal
with apyfal.Accelerator(accelerator='cast_gzip') as myaccel:
    myaccel.start()
    myaccel.process(file_in="YOUR_FILE", file_out="COMPRESSED_FILE.gz")


It really is that simple! With only 4 lines of Python code, you can GZIP-compress a file on a hardware acceleration board in the cloud. Let’s take a look at the code line by line:

“import apyfal”:
APYFAL is an open-source Python library that lets you operate the hardware acceleration boards, either remotely or locally on the FPGA instances/servers.

“with apyfal.Accelerator(accelerator='cast_gzip') as myaccel:”:
This line selects the GZIP compression accelerator from the APYFAL toolkit.

“myaccel.start()”:
This line starts the cloud server that hosts the hardware acceleration board.

“myaccel.process(file_in="YOUR_FILE", file_out="COMPRESSED_FILE.gz")”:
With this line you give the path of the file you want to compress and choose the path where the compressed file will be stored. You can use paths on your local machine, in cloud storage buckets (AWS S3, OpenStack Swift…), or on the public Internet (using HTTP URLs). This line can be duplicated for every file you want to compress.
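Rather than duplicating the last line by hand, the same call can be placed in a loop. The sketch below is only an illustration: the helper names compress_all and gz_name are hypothetical and are not part of APYFAL, and the only accelerator call used is process(file_in=..., file_out=...), exactly as in the snippet above.

```python
def gz_name(path):
    """Hypothetical naming rule: the output is the input path plus '.gz'."""
    return str(path) + ".gz"


def compress_all(accel, paths):
    """Compress each path with one process() call per file.

    `accel` is any object exposing process(file_in=..., file_out=...),
    such as the `myaccel` object from the 4-line snippet.
    Returns the list of output paths, in input order.
    """
    outputs = []
    for path in paths:
        out = gz_name(path)
        accel.process(file_in=path, file_out=out)
        outputs.append(out)
    return outputs


# Intended use (not run here: it needs apyfal and a configured cloud account):
#
# import apyfal
# with apyfal.Accelerator(accelerator='cast_gzip') as myaccel:
#     myaccel.start()
#     compress_all(myaccel, ["YOUR_FILE_1", "YOUR_FILE_2"])
```

Passing the accelerator object in as an argument keeps the loop independent of how the accelerator was created and started.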
 

4. Performance

Depending on the file size and content, compressing files through APYFAL can be up to 25x faster than traditional software-based gzip compression.

 

https://github.com/Accelize/gzip

The compression ratio obtained with the GZIP accelerator also depends on the file content. When compressing 4 GB text files such as log files or genomic data, the accelerator can achieve compression ratios that match those of pure software compression with gzip -4.
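To put these figures in context on your own data, you can measure a software baseline yourself. The sketch below uses only Python's standard gzip module at compression level 4, which roughly corresponds to gzip -4. It is an assumption-laden illustration, not the benchmark behind the numbers above: the function names software_baseline and speedup are hypothetical, the sample data is synthetic, and the accelerator time has to come from your own runs.

```python
import gzip
import time


def software_baseline(data, level=4):
    """Compress `data` (bytes) on the CPU and report elapsed time and ratio."""
    start = time.perf_counter()
    compressed = gzip.compress(data, compresslevel=level)
    elapsed = time.perf_counter() - start
    return {
        "seconds": elapsed,
        # Compression ratio: original size divided by compressed size.
        "ratio": len(data) / len(compressed),
        "compressed": compressed,
    }


def speedup(cpu_seconds, accel_seconds):
    """How many times faster the accelerated run was than the CPU run."""
    return cpu_seconds / accel_seconds


# Synthetic, highly repetitive log-like sample (real files compress less well).
sample = b"2018-08-03 10:41:00 INFO request served in 12 ms\n" * 20000
result = software_baseline(sample)
print(f"ratio: {result['ratio']:.1f}x, time: {result['seconds']:.4f} s")
```

Once you have timed the same file through the accelerator, speedup(cpu_seconds, accel_seconds) gives the acceleration factor for that file.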
 

5. Enjoy accelerated compression

Faster data compression with the GZIP accelerator frees up servers for other duties, which helps improve the performance of any kind of data analysis in the cloud. The accelerator is accessible to any software programmer, who can use it in 4 lines of code through an open-source Python library. The same library also operates other accelerators, such as tar.gz compression, search and replace, and true random number generation, with more to come (H264 and HEVC video encoders).

It only takes a few minutes to start using the GZIP accelerator and compress your own files on FPGA instances through AccelStore. Anyone interested in deploying this accelerator on on-premises FPGA servers should contact Accelize for more details.