async_cuda

This library adds a simple API that enables the user to retrieve a future from a cuda stream. Typically, a user launches one or more kernels and then gets a future from the stream that becomes ready when those kernels have completed. Getting a future from the cuda_future_helper object in this library hides the creation of a cuda stream event and the attachment of that event to the promise backing the returned future.
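To make the mechanism concrete, below is a conceptual sketch of how a future can be tied to a cuda stream: an event is recorded on the stream and a promise is fulfilled once the event reports completion. This is not the library's actual implementation; get_future_from_stream is a hypothetical helper, std::promise stands in for the library's internal promise, and the polling thread stands in for HPX's scheduler-integrated event polling.

#include <cuda_runtime.h>
#include <future>
#include <memory>
#include <thread>

// conceptual sketch only: HPX polls events from its scheduling loop
// rather than from a dedicated thread
std::future<void> get_future_from_stream(cudaStream_t stream)
{
    // record an event that fires once all work queued on the stream
    // up to this point has finished
    cudaEvent_t event;
    cudaEventCreateWithFlags(&event, cudaEventDisableTiming);
    cudaEventRecord(event, stream);

    auto p = std::make_shared<std::promise<void>>();
    std::future<void> f = p->get_future();

    // poll the event and fulfill the promise when the work is done
    std::thread([event, p]() {
        while (cudaEventQuery(event) == cudaErrorNotReady)
            std::this_thread::yield();
        cudaEventDestroy(event);
        p->set_value();
    }).detach();

    return f;
}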

The usage is best illustrated by an example:

// device is a CUDA device ordinal (0, 1, 2, ...)
int device = 0;
// create a cuda target for that device
hpx::cuda::experimental::target target(device);
// create a stream helper object
hpx::cuda::experimental::cuda_future_helper helper(device);

// launch a kernel and return a future
auto fn = &cuda_trivial_kernel<double>;
double d = 3.1415;
auto f = helper.async(fn, d);

// attach a continuation to the future
f.then([](hpx::future<void>&&) {
    std::cout << "trivial kernel completed\n";
}).get();

Kernels and CPU work may be freely intermixed/overlapped and synchronized with futures.
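For instance, GPU and CPU tasks can run concurrently and be joined with hpx::when_all. This sketch reuses the helper and kernel names from the example above:

// launch GPU work and obtain a future representing its completion
auto f_gpu = helper.async(&cuda_trivial_kernel<double>, 2.718);

// run unrelated CPU work on an HPX thread at the same time
auto f_cpu = hpx::async([]() { /* CPU-side computation */ });

// continue only once both the kernel and the CPU task have finished
hpx::when_all(f_gpu, f_cpu).then([](auto&&) {
    std::cout << "GPU and CPU work both completed\n";
}).get();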

Note that multiple kernels may be launched without fetching a future, and that multiple futures may be obtained from the helper; a sketch of this pattern follows. Please refer to the unit tests and examples of this module for complete demonstrations.
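Assuming the helper enqueues all launches on a single stream, so that kernels execute in launch order, one future obtained after several launches covers them all. This is a sketch under that assumption, not code taken from the tests:

auto fn = &cuda_trivial_kernel<double>;
helper.async(fn, 1.0);              // future discarded: fire-and-forget launch
helper.async(fn, 2.0);              // likewise
auto f_all = helper.async(fn, 3.0); // ready only after all three kernels finish
f_all.get();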

CMake variables

HPX_WITH_CUDA=ON - a general option that enables both HPX_WITH_ASYNC_CUDA and HPX_WITH_CUDA_COMPUTE.

HPX_WITH_ASYNC_CUDA=ON - enables the building of this module, which requires only the presence of CUDA on the system and exposes only the cuda + futures support (HPX_WITH_ASYNC_CUDA may be used when HPX_WITH_CUDA_COMPUTE=OFF).

HPX_WITH_CUDA_COMPUTE=ON - enables building the HPX compute features that allow parallel algorithms to be passed through to the GPU/CUDA backend.
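For example, based only on the variables described above, a configure step might look like this (paths and any other options are placeholders to adapt to your setup):

# enable both the futures support and the compute features
cmake -DHPX_WITH_CUDA=ON <hpx-source-dir>

# enable only this module, without the compute features
cmake -DHPX_WITH_ASYNC_CUDA=ON -DHPX_WITH_CUDA_COMPUTE=OFF <hpx-source-dir>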

See the API reference of this module for more details.