NVIDIA GPUDirect

Introduction

With the advent of CUDA, OpenCL, and other general-purpose GPU computing technologies, very high-speed processing of massively parallel algorithms has become possible on standard consumer-level hardware, even on machines with otherwise modest specifications. This has been a boon to many fields of research and technology, and promises increasing benefits in the future. However, several impediments to widespread adoption and maximum performance remain.

One substantial difficulty for real-time applications is the data transfer bottleneck: a GPU can process data only as fast as the data can be transferred to it. In the traditional system memory model, each device has access only to its own buffers. A frame grabber acquires into one set of system buffers, the GPU uses a completely separate set, and any transfer between the two must be performed by the CPU. NVIDIA’s Direct-for-Video technology, part of the GPUDirect family, is meant to alleviate this problem. On Microsoft Windows systems, it allows the GPU and the frame grabber to share the same system memory, eliminating the single biggest bottleneck of the traditional architecture: the CPU copy from the frame grabber’s buffer to the GPU’s system buffer.

By properly utilizing the NVIDIA Direct-for-Video Protocol, a frame grabber application can perform very high-speed processing in real time, with very little additional latency and very few CPU cycles.

Justification

Because the NVIDIA Direct-for-Video Protocol (DVP) is generic, integrating it with any given application takes several non-trivial steps. In addition, NVIDIA DVP is not currently available for general use, so a potential user must clear bureaucratic hurdles to gain access to the library. For these reasons, BitFlow has chosen to publish BFDVP (BitFlow Direct-for-Video Protocol), a deep wrapper of NVIDIA DVP designed to enable very simple integration of the BitFlow Buffer Interface acquisition API (BiAPI) with NVIDIA GPUs. With BFDVP, one may focus more readily on the two most important parts of a frame grabber-GPU processing system: acquisition and processing. As with NVIDIA DVP, BFDVP is intended to be used with NVIDIA CUDA GPU processing. Other GPU processing architectures are not currently supported.

Dependencies

CUDA development library, v7.0 or newer
An NVIDIA GPU supporting CUDA and Direct-for-Video
The BitFlow SDK (any paid version supporting the BiAPI may be used; generally, v5.XX or newer)
A modern BitFlow frame grabber compatible with the BiAPI (e.g., Aon, Axion, Claxon, Cyton)
Microsoft Visual Studio 2012 or newer

Examples

SimpleDVP

Demonstrates a simple, continuous acquisition with transfer to a DVP device. Once transferred to the CUDA device, the image is copied into an OpenGL buffer and displayed using the GLUT and GLEW libraries. Pressing the ‘G’ key toggles a live Gaussian blur (adapted from the Gaussian example in the CUDA samples library), executed on the CUDA device prior to display. The live display is interactive: the image may be panned by dragging with the mouse, or zoomed with the mouse wheel. Although CUDA processing is very fast, the Gaussian operation is quite expensive, and it may not keep up with very fast acquisition of large images, especially on slower GPUs.
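
To make the blur step concrete, the following is a minimal CPU reference sketch of a separable Gaussian filter in plain C++. It is not the CUDA kernel used by SimpleDVP. The function names (makeGaussianKernel, blurPass, gaussianBlur), the float image layout, and the clamped edge handling are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Build a normalized 1-D Gaussian kernel of length 2 * radius + 1.
// (Hypothetical helper; not part of BFDVP or the CUDA samples.)
std::vector<float> makeGaussianKernel(int radius, float sigma) {
    assert(radius >= 0 && sigma > 0.0f);
    std::vector<float> kernel(static_cast<std::size_t>(2 * radius + 1));
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        const float w = std::exp(-static_cast<float>(i * i) / (2.0f * sigma * sigma));
        kernel[static_cast<std::size_t>(i + radius)] = w;
        sum += w;
    }
    for (float& w : kernel) w /= sum;  // weights sum to 1, so brightness is preserved
    return kernel;
}

// One 1-D filtering pass over a row-major mono image.
// horizontal == true filters along rows, false along columns.
// Samples outside the image are clamped to the nearest edge pixel.
std::vector<float> blurPass(const std::vector<float>& src, int width, int height,
                            const std::vector<float>& kernel, bool horizontal) {
    const int radius = static_cast<int>(kernel.size()) / 2;
    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int t = -radius; t <= radius; ++t) {
                const int sx = horizontal ? std::min(std::max(x + t, 0), width - 1) : x;
                const int sy = horizontal ? y : std::min(std::max(y + t, 0), height - 1);
                acc += kernel[static_cast<std::size_t>(t + radius)] *
                       src[static_cast<std::size_t>(sy) * width + sx];
            }
            dst[static_cast<std::size_t>(y) * width + x] = acc;
        }
    }
    return dst;
}

// Separable Gaussian blur: a horizontal pass followed by a vertical pass.
std::vector<float> gaussianBlur(const std::vector<float>& image, int width, int height,
                                int radius, float sigma) {
    assert(image.size() == static_cast<std::size_t>(width) * height);
    const std::vector<float> kernel = makeGaussianKernel(radius, sigma);
    return blurPass(blurPass(image, width, height, kernel, true),
                    width, height, kernel, false);
}
```

A separable filter of radius r costs 2(2r + 1) multiply-adds per pixel rather than (2r + 1)^2. Multiplying that per-pixel cost by the image size and the frame rate gives a rough sense of why large images acquired at high speed can outpace a slower GPU.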

SimpleCUDA

A variation of the SimpleDVP example, implemented using pure CUDA calls, without DVP. This example will run on any NVIDIA card supporting CUDA, including many consumer graphics cards that do not support DVP.

BayerDVP

A live image preview example, similar to SimpleDVP, that demonstrates a live Bayer demosaic operation. It also implements several features beyond those in SimpleDVP.

Additional dependencies: OpenCV GPU library
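
For readers new to demosaicing, the following is a minimal CPU reference sketch of bilinear Bayer interpolation in plain C++. It is not the code used by BayerDVP, which runs on the GPU. The RGGB tile layout, the 8-bit pixel depth, and the names bayerChannel and demosaicBilinear are assumptions made for illustration; a real camera may use a different pattern or bit depth.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Channel indices in the interleaved RGB output.
enum { kRed = 0, kGreen = 1, kBlue = 2 };

// Color of the sensor site at (x, y), assuming an RGGB tile:
//   R G
//   G B
inline int bayerChannel(int x, int y) {
    if ((y & 1) == 0) return (x & 1) == 0 ? kRed : kGreen;
    return (x & 1) == 0 ? kGreen : kBlue;
}

// Bilinear demosaic of an 8-bit raw Bayer image into interleaved RGB.
// A channel measured at a site is copied directly; a missing channel is
// the rounded average of same-color neighbors in the 3x3 window, skipping
// neighbors that fall outside the image.
std::vector<std::uint8_t> demosaicBilinear(const std::vector<std::uint8_t>& raw,
                                           int width, int height) {
    assert(width > 1 && height > 1);
    assert(raw.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> rgb(raw.size() * 3);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t idx = static_cast<std::size_t>(y) * width + x;
            for (int c = 0; c < 3; ++c) {
                if (bayerChannel(x, y) == c) {
                    rgb[idx * 3 + c] = raw[idx];  // measured sample
                    continue;
                }
                int sum = 0;
                int count = 0;
                for (int dy = -1; dy <= 1; ++dy) {
                    for (int dx = -1; dx <= 1; ++dx) {
                        const int nx = x + dx;
                        const int ny = y + dy;
                        if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                        if (bayerChannel(nx, ny) != c) continue;
                        sum += raw[static_cast<std::size_t>(ny) * width + nx];
                        ++count;
                    }
                }
                // count > 0 because any window of at least 2x2 sites holds every color.
                rgb[idx * 3 + c] = static_cast<std::uint8_t>((sum + count / 2) / count);
            }
        }
    }
    return rgb;
}
```

Each output pixel depends only on a small fixed neighborhood of the input, which is why this kind of operation maps naturally onto one GPU thread per pixel.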