Increasing the video compression throughput on FPGAs: HD to UHD

Several video compression technologies exist today, including H.264 (MPEG-4 AVC), H.265 (HEVC), AV1, VVC and EVC.

Each of these standards has aimed to improve compression efficiency over the previous generation, but codec complexity has also increased significantly with each generation. Take, for example, the implementation complexity of the x264 and x265 codecs: while the latter shows a 30% compression gain over its predecessor, it is also 2.5 times more complex. VVC shows close to a 60% compression gain over H.264, but at a complexity cost close to 25x.

The use of FPGAs in video compression is particularly interesting in real-time applications, with benefits such as lower power usage, feature acceleration leading to increased throughput and, of course, in-the-field flexibility and reconfiguration. With FPGAs, it is feasible to process and compress a single channel of raw 1080p input video at 60fps using the HEVC standard, in real time, with a video quality comparable to the popular x265 at its highest quality settings. There is, however, a consideration to be made here. A typical HEVC implementation of this sort is expected to use an average of 600K LUTs (DSPs and FFs excluded), and this complexity means that only the larger FPGA chipsets can host it. This fact, coupled with the routing difficulty of designs which occupy more than 80% of the available FPGA resources, makes for an interesting problem indeed.

This poses a challenge for anyone aiming to provide a real-time UHD encoding solution on an FPGA, which brings a further complexity increase of between 2.5x and 4x. The two most common approaches used to date are to use either multiple FPGAs or a much larger FPGA, both of which increase the power usage as well as the cost of the platform required to host such a solution. The problem is exacerbated when considering even more complex video compression standards like VVC: the cost of providing a real-time compression solution with some of the newer standards becomes prohibitive.
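The resource pressure can be made concrete with some back-of-envelope arithmetic. The sketch below uses only the figures quoted in this article (600K LUTs for the 1080p60 encoder, the 80% utilisation threshold, and the 2.5x to 4x UHD complexity range); the function names are illustrative, and treating complexity as scaling linearly into LUT count is a simplifying assumption rather than a measured result.

```python
HD_LUTS = 600_000         # average LUT count quoted for the 1080p60 HEVC encoder
MAX_UTILISATION_PCT = 80  # utilisation above which routing becomes difficult


def scaled_luts(hd_luts, factor):
    """Estimate LUTs for a design `factor` times more complex (assumes linear scaling)."""
    return round(hd_luts * factor)


def min_device_luts(design_luts, max_utilisation_pct=MAX_UTILISATION_PCT):
    """Smallest device capacity that keeps the design at or below the utilisation limit."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-design_luts * 100 // max_utilisation_pct)


if __name__ == "__main__":
    for label, factor in (("1080p60", 1.0), ("UHD, 2.5x", 2.5), ("UHD, 4x", 4.0)):
        design = scaled_luts(HD_LUTS, factor)
        print(f"{label}: design {design:,} LUTs, device of at least {min_device_luts(design):,} LUTs")
```

Under these assumptions, a stand-alone UHD encoder would need somewhere between 1.5M and 2.4M LUTs, which is why it ends up on either several devices or one very large device.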

If there were a way to maintain the video quality and the throughput required for real-time encoding without increasing the required resources, it would enable real-time UHD encoding that is low in power usage and implementation cost while still delivering high quality.

Encoding and decoding enhanced video streams with MPEG-5 Part 2

The new MPEG-5 Part 2 (LCEVC) standard provides the technology to meet this requirement. LCEVC is an enhancement aimed at improving the compression and computational performance of any given codec. For example, an LCEVC-enabled HEVC UHD encoder can be implemented on a single FPGA device instead of four. It does so by leveraging a 1080p real-time HEVC ‘base’ solution and using the LCEVC enhancement to produce a resolution upgrade to UHD. In effect, this FPGA solution contains two separate codecs working in tandem to produce a real-time UHD HEVC encoder.
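The layering idea can be illustrated with a toy model. The sketch below is not the LCEVC algorithm or the FPGA implementation described here: it assumes a single residual layer, a 2x2 averaging downscaler, nearest-neighbour upscaling and a stand-in "base codec" that only quantises coarsely, and every function name is hypothetical. The standard itself transforms, quantises and entropy-codes its residuals, none of which is modelled. What the sketch does show is the division of labour: the base codec only ever sees the lower-resolution picture, and the enhancement carries the difference needed to get back to full resolution.

```python
def downscale_2x(frame):
    """Average each 2x2 block (stand-in for the real downsampler)."""
    height, width = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def toy_base_codec(frame, step=8):
    """Hypothetical lossy base codec: coarse quantisation only, no real HEVC."""
    return [[(pixel // step) * step for pixel in row] for row in frame]


def upscale_2x(frame):
    """Nearest-neighbour upscale back to the full resolution."""
    out = []
    for row in frame:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def encode(frame):
    """Return the low-resolution base picture and the full-resolution residuals."""
    base = toy_base_codec(downscale_2x(frame))
    predicted = upscale_2x(base)
    residuals = [
        [orig - pred for orig, pred in zip(orig_row, pred_row)]
        for orig_row, pred_row in zip(frame, predicted)
    ]
    return base, residuals


def decode(base, residuals):
    """Upscale the base picture and add the residuals back."""
    predicted = upscale_2x(base)
    return [
        [pred + res for pred, res in zip(pred_row, res_row)]
        for pred_row, res_row in zip(predicted, residuals)
    ]
```

Because the residuals are kept lossless in this toy, `decode(*encode(frame))` returns the original frame for any frame with even dimensions; a real encoder trades some of that accuracy for bitrate.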

In one implementation of this solution, it was possible to fit the LCEVC-enhanced HEVC encoder in a 1M LUT FPGA with utilisation well below the 80% mark. This has significant implications for its usability:

Firstly, the cloud infrastructure cost savings of running a single FPGA versus four FPGAs (as previous solutions would have required) become significant to the user as well as the infrastructure provider in terms of power, cost and space.
Secondly, the resulting video quality surpassed that of the stand-alone HEVC solution at all key bitrate operating points.
Thirdly, the LCEVC-enhanced HEVC solution maintains real-time performance at UHD resolutions or, alternatively, provides up to a 4x density upgrade to the existing solution at 1080p resolutions.
Finally, the solution is codec agnostic and hence can be reprogrammed in the field to enhance single-FPGA AV1, AVC or VVC encoding solutions with the same benefits as above.

The solution is also very applicable in Ultra Low Latency (ULL) scenarios such as video conferencing. Typically, ULL solutions sacrifice compression features in order to achieve low latency, but in this case the LCEVC-enhanced HEVC implementation provides an objective improvement in video quality of up to 65%.

Further to the final point above, the addition of LCEVC to existing encoding cores is made easy by the latest advances in the FPGA design process. This sort of multi-IP integration is typically not a trivial process: the individual IP interfaces have to be defined and adhered to prior to implementation, and the post-integration verification phase can be a logistical nightmare. However, the latest methodological advances from FPGA manufacturers have made such feats easier to accomplish. By defining a standard, static region of the chip which houses core data transfer peripherals and facilitates the host-to-device link, design houses need only cater for their own IP, ensuring that it adheres to the interfaces provided within the static region. This significantly reduces the time-to-market from conception to product and lightens the design collaboration effort required of the individual IP vendors. Designing for FPGAs with this methodology is not only practical but highly flexible, and the resulting designs are hence easily ported to other FPGA technologies as well as ASICs.

By Jean-Michel Frouin on 25 March 2021