Writing single-node applications#

As a C++ Standard Library for Concurrency and Parallelism, HPX implements all of the corresponding facilities defined by the C++ Standard as well as those proposed as part of the ongoing C++ standardization process. This section focuses on the features available in HPX for parallel and concurrent computation on a single node, although many of the features presented here are also implemented to work in the distributed case.

Synchronization objects#

The following objects provide synchronization for HPX applications:

  1. Barrier

  2. Condition variable

  3. Latch

  4. Mutex

  5. Shared mutex

  6. Semaphore

  7. Composable guards

Barrier#

Barriers are used for synchronizing multiple threads. They provide a synchronization point at which every participating thread waits until all of them have arrived, and only then do they continue execution. This lets multiple threads work together on a common task and guarantees that no thread starts working on the next task until all threads have completed the current one, so that all threads are in a consistent state before any further operation is performed.

Unlike latches, barriers are reusable: once the participating threads are released from a barrier’s synchronization point, they can re-use the same barrier. Barriers are thus useful for managing repeated tasks, or phases of a larger task, that are handled by multiple threads. The code below shows how barriers can be used to synchronize two threads:

#include <hpx/barrier.hpp>
#include <hpx/future.hpp>
#include <hpx/init.hpp>

#include <iostream>

int hpx_main()
{
    hpx::barrier b(2);

    hpx::future<void> f1 = hpx::async([&b]() {
        std::cout << "Thread 1 started." << std::endl;
        // Do some computation
        b.arrive_and_wait();
        // Continue with next task
        std::cout << "Thread 1 finished." << std::endl;
    });

    hpx::future<void> f2 = hpx::async([&b]() {
        std::cout << "Thread 2 started." << std::endl;
        // Do some computation
        b.arrive_and_wait();
        // Continue with next task
        std::cout << "Thread 2 finished." << std::endl;
    });

    f1.get();
    f2.get();

    return hpx::local::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::local::init(hpx_main, argc, argv);
}

In this example, two tasks are launched with hpx::async, each running on its own HPX thread and represented by an hpx::future object. Each thread calls the arrive_and_wait function of the hpx::barrier object and waits there until the other thread has reached the barrier as well. Once both threads have arrived, they continue with their next task.
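The reuse across phases described above can be sketched with a small barrier built from standard C++ facilities. This is only an illustration of the idea: phase_barrier and run_phases are hypothetical names, the code uses std::thread rather than HPX threads, and it is not a description of how hpx::barrier is implemented.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical teaching class; not how hpx::barrier is implemented.
class phase_barrier
{
public:
    explicit phase_barrier(std::size_t expected)
      : expected_(expected)
      , waiting_(0)
      , generation_(0)
    {
        assert(expected > 0);
    }

    void arrive_and_wait()
    {
        std::unique_lock<std::mutex> lk(m_);
        std::size_t const my_generation = generation_;
        if (++waiting_ == expected_)
        {
            // Last arrival: reset the count and open the next phase.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
            return;
        }
        // Sleep until the last arrival has advanced the generation.
        cv_.wait(lk, [&] { return generation_ != my_generation; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t expected_;
    std::size_t waiting_;
    std::size_t generation_;
};

// Runs `phases` rounds on `workers` threads. In each round every worker
// publishes the round number, waits at the barrier, and then checks that
// all workers have published the same round. Returns true if no worker
// ever observed a value from a different round.
bool run_phases(std::size_t workers, int phases)
{
    phase_barrier b(workers);
    std::vector<int> slot(workers, -1);
    std::vector<char> ok(workers, 1);
    std::vector<std::thread> threads;

    for (std::size_t id = 0; id != workers; ++id)
    {
        threads.emplace_back([&, id] {
            for (int p = 0; p != phases; ++p)
            {
                slot[id] = p;           // work of the current phase
                b.arrive_and_wait();    // everyone has written round p
                for (int v : slot)
                    if (v != p)
                        ok[id] = 0;
                b.arrive_and_wait();    // everyone has finished reading
            }
        });
    }
    for (auto& t : threads)
        t.join();

    for (char c : ok)
        if (!c)
            return false;
    return true;
}
```

The same barrier object is passed through twice per round, which is exactly what a latch cannot do. In an HPX application the loop body would presumably call arrive_and_wait on an hpx::barrier in the same places.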

Condition variable#

A condition variable is a synchronization primitive in HPX that allows a thread to wait for a specific condition to be satisfied before continuing execution. It is typically used in conjunction with a mutex or a lock to protect shared data that is being modified by multiple threads. Hence, it blocks one or more threads until another thread both modifies a shared variable (the condition) and notifies the condition_variable. The code below shows how two threads modifying the shared variable data can be synchronized using the condition_variable:

#include <hpx/condition_variable.hpp>
#include <hpx/init.hpp>
#include <hpx/mutex.hpp>
#include <hpx/thread.hpp>

#include <chrono>
#include <iostream>
#include <mutex>
#include <string>

hpx::condition_variable cv;
hpx::mutex m;
std::string data;
bool ready = false;
bool processed = false;

void worker_thread()
{
    // Wait until the main thread signals that data is ready
    std::unique_lock<hpx::mutex> lk(m);
    cv.wait(lk, [] { return ready; });

    // Access the shared resource
    std::cout << "Worker thread: Processing data...\n";
    data = "Test data after";

    // Send data back to the main thread
    processed = true;
    std::cout << "Worker thread: data processing is complete\n";

    // Manual unlocking is done before notifying, to avoid waking up
    // the waiting thread only to block again
    lk.unlock();
    cv.notify_one();
}

int hpx_main()
{
    hpx::thread worker(worker_thread);

    // Do some work
    std::cout << "Main thread: Preparing data...\n";
    data = "Test data before";
    hpx::this_thread::sleep_for(std::chrono::seconds(1));
    std::cout << "Main thread: Data before processing = " << data << '\n';

    // Signal that data is ready and send data to worker thread
    {
        std::lock_guard<hpx::mutex> lk(m);
        ready = true;
        std::cout << "Main thread: Data is ready...\n";
    }
    cv.notify_one();

    // Wait for the worker thread to finish
    {
        std::unique_lock<hpx::mutex> lk(m);
        cv.wait(lk, [] { return processed; });
    }
    std::cout << "Main thread: Data after processing = " << data << '\n';
    worker.join();

    return hpx::local::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::local::init(hpx_main, argc, argv);
}

The main thread of the code above starts by creating a worker thread and preparing the shared variable data. Once the data is ready, the main thread acquires a lock on the mutex m using std::lock_guard<hpx::mutex> lk(m) and sets the ready flag to true, then signals the worker thread to start processing by calling cv.notify_one(). The cv.wait() call in the main thread then blocks until the worker thread signals that processing is complete by setting the processed flag.

The worker thread starts by acquiring a lock on the mutex m to ensure exclusive access to the shared data. The cv.wait() call blocks the thread until the ready flag is set by the main thread. Once this is true, the worker thread accesses the shared data resource, processes it, and sets the processed flag to indicate completion. The mutex is then unlocked using lk.unlock() and the cv.notify_one() call signals the main thread to resume execution. Finally, the new data is printed by the main thread to the console.
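The same wait-with-predicate pattern extends naturally to handing over a stream of values. The following is a minimal sketch using standard C++ types; handoff_queue and sum_through_queue are hypothetical names, and the assumption (not verified here) is that hpx::mutex and hpx::condition_variable could be supplied as the Mutex and CondVar template arguments when the queue is used from HPX threads.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical one-way queue; Mutex and CondVar default to the std types.
template <typename T, typename Mutex = std::mutex,
    typename CondVar = std::condition_variable>
class handoff_queue
{
public:
    void push(T value)
    {
        {
            std::lock_guard<Mutex> lk(m_);
            items_.push_back(std::move(value));
        }
        // Notify after unlocking so the consumer can take the lock at once.
        cv_.notify_one();
    }

    T pop()
    {
        std::unique_lock<Mutex> lk(m_);
        // The predicate protects against spurious wakeups and against
        // notifications that arrived before this thread started waiting.
        cv_.wait(lk, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop_front();
        return value;
    }

private:
    Mutex m_;
    CondVar cv_;
    std::deque<T> items_;
};

// Sends 1..n through the queue from a producer thread and sums them here.
long sum_through_queue(int n)
{
    assert(n >= 0);
    handoff_queue<int> q;
    std::thread producer([&] {
        for (int i = 1; i <= n; ++i)
            q.push(i);
    });

    long sum = 0;
    for (int i = 0; i != n; ++i)
        sum += q.pop();

    producer.join();
    return sum;
}
```

As in the example above, the shared state (here the deque) is only touched while the mutex is held, and the condition is re-checked every time the waiting thread wakes up.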

Latch#

A latch is a downward counter which can be used to synchronize threads. The value of the counter is initialized on creation. Threads may block on the latch until the counter is decremented to zero. There is no possibility to increase or reset the counter, which makes the latch a single-use barrier.

In HPX, a latch is implemented as a counting semaphore, which can be initialized with a specific count value and decremented each time a thread reaches the latch. When the count value reaches zero, all waiting threads are unblocked and allowed to continue execution. The code below shows how a latch can be used to synchronize 16 threads:

std::ptrdiff_t num_threads = 16;

///////////////////////////////////////////////////////////////////////////////
void wait_for_latch(hpx::latch& l)
{
    l.arrive_and_wait();
}

///////////////////////////////////////////////////////////////////////////////
int hpx_main(hpx::program_options::variables_map& vm)
{
    num_threads = vm["num-threads"].as<std::ptrdiff_t>();

    hpx::latch l(num_threads + 1);

    std::vector<hpx::future<void>> results;
    for (std::ptrdiff_t i = 0; i != num_threads; ++i)
        results.push_back(hpx::async(&wait_for_latch, std::ref(l)));

    // Wait for all threads to reach this point.
    l.arrive_and_wait();

    hpx::wait_all(results);

    return hpx::local::finalize();
}

In the above code, the hpx_main function creates a latch object l with a count of num_threads + 1 and then launches num_threads threads using hpx::async. Each of these threads runs the wait_for_latch function, which receives a reference to the latch object and calls its arrive_and_wait method; this decrements the count of the latch and makes the thread wait until the count reaches zero. The main thread accounts for the extra count: it calls arrive_and_wait itself, which blocks until all the other threads have arrived at the latch, and then waits for them to finish by calling hpx::wait_all.

Mutex#

A mutex (short for “mutual exclusion”) is a synchronization primitive in HPX used to control access to a shared resource, ensuring that only one thread can access it at a time. A mutex is used to protect data structures from race conditions and other synchronization-related issues. When a thread acquires a mutex, other threads that try to access the same resource will be blocked until the mutex is released. The code below shows the basic use of mutexes:

#include <hpx/future.hpp>
#include <hpx/init.hpp>
#include <hpx/mutex.hpp>

#include <iostream>
#include <mutex>

int hpx_main()
{
    hpx::mutex m;

    hpx::future<void> f1 = hpx::async([&m]() {
        std::scoped_lock sl(m);
        std::cout << "Thread 1 acquired the mutex" << std::endl;
    });

    hpx::future<void> f2 = hpx::async([&m]() {
        std::scoped_lock sl(m);
        std::cout << "Thread 2 acquired the mutex" << std::endl;
    });

    hpx::wait_all(f1, f2);

    return hpx::local::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::local::init(hpx_main, argc, argv);
}

In this example, two HPX threads created using hpx::async each acquire the hpx::mutex m. std::scoped_lock sl(m) is used to take ownership of the given mutex m. When control leaves the scope in which the scoped_lock object was created, the scoped_lock is destructed and the mutex is released.

Attention

A common way to acquire and release a mutex is to call m.lock() before accessing the shared resource and m.unlock() after the access is complete. However, this pattern is not exception-safe: if an exception is thrown while the mutex is locked, the code that unlocks the mutex is never executed, the lock remains held by the thread that acquired it, and every other thread that tries to acquire the same mutex blocks forever, which results in a deadlock. For this reason, we suggest you use std::scoped_lock, which prevents this issue by releasing the lock when control leaves the scope in which the scoped_lock object was created.
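The effect described above can be observed directly. The sketch below throws while a std::scoped_lock holds the mutex and then checks that the mutex is free again. released_after_throw is a hypothetical helper; it is written as a template so that it works with std::mutex, and it is assumed to work with hpx::mutex as well when called from an HPX thread.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical helper: runs a critical section that throws while the mutex
// is held by a std::scoped_lock, then reports whether the mutex could be
// acquired again afterwards.
template <typename Mutex>
bool released_after_throw(Mutex& m)
{
    bool threw = false;
    try
    {
        std::scoped_lock sl(m);
        throw std::runtime_error("failure inside the critical section");
    }
    catch (std::runtime_error const&)
    {
        // Stack unwinding has already destroyed sl and unlocked m.
        threw = true;
    }
    assert(threw);    // the exception path was really taken

    // Nobody owns the mutex at this point, so try_lock is well defined.
    bool const acquired = m.try_lock();
    if (acquired)
        m.unlock();
    return acquired;
}
```

With manual m.lock() and m.unlock() calls in place of the scoped_lock, the unlock would be skipped by the exception and the mutex would stay locked.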

Shared mutex#

A shared mutex is a synchronization primitive that can be used to protect shared data from being simultaneously accessed by multiple threads. In contrast to other mutex types which facilitate exclusive access, a shared_mutex has two levels of access:

  • Exclusive access prevents any other thread from acquiring the mutex, just as with the normal mutex. It does not matter if the other thread tries to acquire shared or exclusive access.

  • Shared access allows multiple threads to acquire the mutex, but all of them only in shared mode. Exclusive access is not granted until all of the previous shared holders have returned the mutex (typically, as long as an exclusive request is waiting, new shared ones are queued to be granted after the exclusive access).

Shared mutexes are especially useful when shared data can be safely read by any number of threads simultaneously, but a thread may only write the same data when no other thread is reading or writing at the same time. A typical scenario is a database: The data can be read simultaneously by different threads with no problem. However, modification of the database is critical: if some threads read data while another one is writing, the threads reading may receive inconsistent data. Hence, while a thread is writing, reading should not be allowed. After writing is complete, reads can occur simultaneously again. The code below shows how shared_mutex can be used to synchronize reads and writes:

int const writers = 3;
int const readers = 3;
int const cycles = 10;

using std::chrono::milliseconds;

int hpx_main()
{
    std::vector<hpx::thread> threads;
    std::atomic<bool> ready(false);
    hpx::shared_mutex stm;

    for (int i = 0; i < writers; ++i)
    {
        threads.emplace_back([&ready, &stm, i] {
            std::mt19937 urng(static_cast<std::uint32_t>(std::time(nullptr)));
            std::uniform_int_distribution<int> dist(1, 1000);

            while (!ready)
            { /*** wait... ***/
            }

            for (int j = 0; j < cycles; ++j)
            {
                // scope of unique_lock
                {
                    std::unique_lock<hpx::shared_mutex> ul(stm);

                    std::cout << "^^^ Writer " << i << " starting..."
                              << std::endl;
                    hpx::this_thread::sleep_for(milliseconds(dist(urng)));
                    std::cout << "vvv Writer " << i << " finished."
                              << std::endl;
                }

                hpx::this_thread::sleep_for(milliseconds(dist(urng)));
            }
        });
    }

    for (int i = 0; i < readers; ++i)
    {
        int k = writers + i;
        threads.emplace_back([&ready, &stm, k, i] {
            HPX_UNUSED(k);
            std::mt19937 urng(static_cast<std::uint32_t>(std::time(nullptr)));
            std::uniform_int_distribution<int> dist(1, 1000);

            while (!ready)
            { /*** wait... ***/
            }

            for (int j = 0; j < cycles; ++j)
            {
                // scope of shared_lock
                {
                    std::shared_lock<hpx::shared_mutex> sl(stm);

                    std::cout << "Reader " << i << " starting..." << std::endl;
                    hpx::this_thread::sleep_for(milliseconds(dist(urng)));
                    std::cout << "Reader " << i << " finished." << std::endl;
                }
                hpx::this_thread::sleep_for(milliseconds(dist(urng)));
            }
        });
    }

    ready = true;
    for (auto& t : threads)
        t.join();

    return hpx::local::finalize();
}

The above code creates writers and readers threads, each of which will perform cycles of operations. Both the writer and reader threads use the hpx::shared_mutex object stm to synchronize access to a shared resource.

  • For the writer threads, a unique_lock on the shared mutex is acquired before each write operation and is released after control leaves the scope in which the unique_lock object was created.

  • For the reader threads, a shared_lock on the shared mutex is acquired before each read operation and is released after control leaves the scope in which the shared_lock object was created.

While holding the lock, each reader and writer thread sleeps for a random time period drawn from a random number generator, which simulates the processing time of the operation. After releasing the lock, the thread sleeps for another random period before it starts its next cycle.
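The example above only prints messages while the locks are held. A minimal sketch of the same locking scheme protecting actual data could look as follows. guarded_counter and run_readers_and_writers are hypothetical names, the code uses std::shared_mutex and std::thread, and the assumption is that hpx::shared_mutex could be supplied as the SharedMutex template argument when running on HPX threads.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper around a value protected by a shared mutex.
template <typename SharedMutex = std::shared_mutex>
class guarded_counter
{
public:
    // Writers take exclusive ownership.
    void increment()
    {
        std::unique_lock<SharedMutex> ul(mtx_);
        ++value_;
    }

    // Readers take shared ownership and may overlap with each other.
    long read() const
    {
        std::shared_lock<SharedMutex> sl(mtx_);
        return value_;
    }

private:
    mutable SharedMutex mtx_;
    long value_ = 0;
};

// Runs `writers` threads that each increment `per_writer` times while
// `readers` threads poll the counter, and returns the final value.
long run_readers_and_writers(int writers, int readers, int per_writer)
{
    assert(writers >= 0 && readers >= 0 && per_writer >= 0);
    guarded_counter<> counter;
    std::vector<std::thread> threads;

    for (int i = 0; i != writers; ++i)
        threads.emplace_back([&] {
            for (int j = 0; j != per_writer; ++j)
                counter.increment();
        });

    for (int i = 0; i != readers; ++i)
        threads.emplace_back([&] {
            long last = 0;
            for (int j = 0; j != per_writer; ++j)
            {
                long const now = counter.read();
                assert(now >= last);    // values never go backwards
                last = now;
            }
        });

    for (auto& t : threads)
        t.join();
    return counter.read();
}
```

Because every increment happens under exclusive ownership, no update is lost, while the read-only accesses are free to run concurrently with each other.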

Semaphore#

Semaphores are a synchronization mechanism used to control concurrent access to a shared resource. The two types of semaphores are:

  • counting semaphore: it has a counter that is never negative. The counter is initialized in the constructor. Acquiring the semaphore decreases the counter and releasing the semaphore increases the counter. If a thread tries to acquire the semaphore when the counter is zero, the thread will block until another thread increments the counter by releasing the semaphore. Unlike hpx::mutex, an hpx::counting_semaphore is not bound to a thread, which means that the acquire and release calls of a semaphore can happen on different threads.

  • binary semaphore: it is an alias for a hpx::counting_semaphore<1>. In this case, the least maximal value is 1. hpx::binary_semaphore can be used to implement locks.

#include <hpx/init.hpp>
#include <hpx/semaphore.hpp>
#include <hpx/thread.hpp>

#include <chrono>
#include <iostream>

// initialize the semaphore with a count of 3
hpx::counting_semaphore<> semaphore(3);

void worker()
{
    semaphore.acquire();    // decrement the semaphore's count
    std::cout << "Entering critical section" << std::endl;
    hpx::this_thread::sleep_for(std::chrono::seconds(1));
    // report the exit while the slot is still held
    std::cout << "Exiting critical section" << std::endl;
    semaphore.release();    // increment the semaphore's count
}

int hpx_main()
{
    hpx::thread t1(worker);
    hpx::thread t2(worker);
    hpx::thread t3(worker);
    hpx::thread t4(worker);
    hpx::thread t5(worker);

    t1.join();
    t2.join();
    t3.join();
    t4.join();
    t5.join();

    return hpx::local::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::local::init(hpx_main, argc, argv);
}

In this example, the counting semaphore is initialized to the value of 3. This means that up to 3 threads can be inside the critical section (the code between acquire() and release() in the worker() function) at the same time. When a thread enters the critical section, it acquires the semaphore, which decrements the count; when it exits the critical section, it releases the semaphore, thus incrementing the count. The worker() function simulates a critical section by acquiring the semaphore, sleeping for 1 second and then releasing the semaphore.

In the hpx_main function, 5 worker threads are created and started, each trying to enter the critical section. If the count of the semaphore is already 0, a worker will wait until another worker releases the semaphore (increasing its value).
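The counter semantics described in this section can be sketched with a mutex and a condition variable from the C++ standard library. toy_semaphore and max_occupancy are hypothetical names introduced only for illustration; this is not how hpx::counting_semaphore is implemented, and the code runs on std::thread rather than on HPX threads.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical teaching class; not how hpx::counting_semaphore is built.
class toy_semaphore
{
public:
    explicit toy_semaphore(int initial)
      : count_(initial)
    {
        assert(initial >= 0);
    }

    // Blocks while the counter is zero, then decrements it.
    void acquire()
    {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return count_ > 0; });
        --count_;
    }

    // Increments the counter and wakes one waiter; any thread may call this.
    void release()
    {
        {
            std::lock_guard<std::mutex> lk(m_);
            ++count_;
        }
        cv_.notify_one();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Starts `workers` threads that pass through a section limited to `slots`
// concurrent entries and returns the highest occupancy that was observed.
int max_occupancy(int slots, int workers)
{
    toy_semaphore sem(slots);
    std::mutex stats;
    int inside = 0;
    int peak = 0;
    std::vector<std::thread> threads;

    for (int i = 0; i != workers; ++i)
        threads.emplace_back([&] {
            sem.acquire();
            {
                std::lock_guard<std::mutex> lk(stats);
                peak = std::max(peak, ++inside);
            }
            std::this_thread::yield();    // stand-in for real work
            {
                std::lock_guard<std::mutex> lk(stats);
                --inside;
            }
            sem.release();
        });

    for (auto& t : threads)
        t.join();
    return peak;
}
```

Calling max_occupancy(3, 5) mirrors the example above: five workers compete for three slots, and the observed peak never exceeds the initial count.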

Composable guards#

Composable guards operate in a manner similar to locks, but are applied only to asynchronous functions. The guard (or guards) is automatically locked at the beginning of a specified task and automatically unlocked at the end. Because guards are never added to an existing task’s execution context, the calling of guards is freely composable and can never deadlock.

To run a task with a single guard, simply declare the guard and call run_guarded() with a function (task):

hpx::lcos::local::guard gu;
run_guarded(gu,task);

If a single task needs to run with multiple guards, use a guard set:

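// the guard set has to be declared before guards can be added to it
hpx::lcos::local::guard_set gs;
std::shared_ptr<hpx::lcos::local::guard> gu1(new hpx::lcos::local::guard());
std::shared_ptr<hpx::lcos::local::guard> gu2(new hpx::lcos::local::guard());
// assumption: guard_set::add takes the shared_ptr itself, not the guard
gs.add(gu1);
gs.add(gu2);
run_guarded(gs, task);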

Guards use two atomic operations (which are not called repeatedly) to manage what they do, so overhead should be extremely low.

Execution control#

The following objects provide control of the execution in HPX applications:

  1. Futures

  2. Channels

  3. Task blocks

  4. Task groups

  5. Threads

Futures#

Futures are a mechanism to represent the result of a potentially asynchronous operation. A future is a type that represents a value that will become available at some point in the future, and it is used in HPX to write asynchronous and parallel code. Futures can be returned from functions that perform time-consuming operations, allowing the calling code to continue executing while the function performs its work. The value of the future is set when the operation completes and can be accessed later. Below is an example demonstrating different features of futures:

#include <hpx/assert.hpp>
#include <hpx/future.hpp>
#include <hpx/hpx_main.hpp>
#include <hpx/tuple.hpp>

#include <iostream>
#include <utility>

int main()
{
    // Asynchronous execution with futures
    hpx::future<void> f1 = hpx::async(hpx::launch::async, []() {});
    hpx::shared_future<int> f2 =
        hpx::async(hpx::launch::async, []() { return 42; });
    hpx::future<int> f3 =
        f2.then([](hpx::shared_future<int>&& f) { return f.get() * 3; });

    hpx::promise<double> p;
    auto f4 = p.get_future();
    HPX_ASSERT(!f4.is_ready());
    p.set_value(123.45);
    HPX_ASSERT(f4.is_ready());

    hpx::packaged_task<int()> t([]() { return 43; });
    hpx::future<int> f5 = t.get_future();
    HPX_ASSERT(!f5.is_ready());
    t();
    HPX_ASSERT(f5.is_ready());

    // Fire-and-forget
    hpx::post([]() {
        std::cout << "This will be printed later\n" << std::flush;
    });

    // Synchronous execution
    hpx::sync([]() {
        std::cout << "This will be printed immediately\n" << std::flush;
    });

    // Combinators
    hpx::future<double> f6 = hpx::async([]() { return 3.14; });
    hpx::future<double> f7 = hpx::async([]() { return 42.0; });
    std::cout
        << hpx::when_all(f6, f7)
               .then([](hpx::future<
                         hpx::tuple<hpx::future<double>, hpx::future<double>>>
                             f) {
                   hpx::tuple<hpx::future<double>, hpx::future<double>> t =
                       f.get();
                   double pi = hpx::get<0>(t).get();
                   double r = hpx::get<1>(t).get();
                   return pi * r * r;
               })
               .get()
        << std::endl;

    // Easier continuations with dataflow; it waits for all future or
    // shared_future arguments before executing the continuation, and also
    // accepts non-future arguments
    hpx::future<double> f8 = hpx::async([]() { return 3.14; });
    hpx::future<double> f9 = hpx::make_ready_future(42.0);
    hpx::shared_future<double> f10 = hpx::async([]() { return 123.45; });
    hpx::future<hpx::tuple<double, double>> f11 = hpx::dataflow(
        [](hpx::future<double> a, hpx::future<double> b,
            hpx::shared_future<double> c, double d) {
            return hpx::make_tuple<>(a.get() + b.get(), c.get() / d);
        },
        f8, f9, f10, -3.9);

    // split_future gives a tuple of futures from a future of tuple
    hpx::tuple<hpx::future<double>, hpx::future<double>> f12 =
        hpx::split_future(std::move(f11));
    std::cout << hpx::get<1>(f12).get() << std::endl;

    return 0;
}

The first section of the main function demonstrates how to use futures for asynchronous execution. The first two statements create two futures, an hpx::future<void> and an hpx::shared_future<int>, using the hpx::async() function. The corresponding tasks are executed asynchronously on separate HPX threads because of the hpx::launch::async launch policy. The third future is created by attaching a continuation to the second future using the then() member function. This continuation multiplies the result of the second future by 3.

The next part of the code demonstrates how to use promises and packaged tasks, which are constructs used for communicating data between threads. The promise class is used to store a value that can be retrieved later using a future. The packaged_task class represents a task that can be executed asynchronously, and its result can be obtained using a future. The remaining statements of this part create a packaged task that returns an integer, obtain its future, execute the task, and check that the future is not ready before the task has run and is ready afterwards.

The code then demonstrates how to use the hpx::post() and hpx::sync() functions for fire-and-forget and synchronous execution, respectively. The hpx::post() function executes a given function asynchronously and returns immediately without waiting for the result. The hpx::sync() function executes a given function synchronously and waits for the result before returning.

Next, the code demonstrates the use of combinators, which are higher-order functions that combine two or more futures into a single future. The hpx::when_all() function is used to combine two futures, which return double values, into a single future that holds a tuple of futures. The then() member function is then used to compute the area of a circle using the values of the two futures. The get() member function is used to retrieve the result of the computation.

The last section demonstrates the use of hpx::dataflow(), which is a higher-order function that waits for all the future or shared_future arguments to be ready before executing the continuation. The hpx::make_ready_future() function is used to create a future with a given value. The hpx::split_future() function is used to split a future of a tuple into a tuple of futures. The last line retrieves the value of the second future in the tuple using hpx::get() and prints it to the console.

Extended facilities for futures#

Concurrency is about both decomposing and composing the program from the parts that work well individually and together. It is in the composition of connected and multicore components where today’s C++ libraries are still lacking.

The functionality of std::future offers a partial solution. It allows for the separation of the initiation of an operation and the act of waiting for its result; however, the act of waiting is synchronous. In communication-intensive code this act of waiting can be unpredictable, inefficient and simply frustrating. The example below illustrates a possible synchronous wait using futures:

#include <future>
using namespace std;
int main()
{
    future<int> f = async([]() { return 123; });
    int result = f.get(); // might block
}

For this reason, HPX implements a set of extensions to std::future (as proposed by N4313). This proposal introduces the following key asynchronous operations to hpx::future, hpx::shared_future and hpx::async, which enhance and enrich these facilities.

Table 10 Facilities extending std::future#

| Facility | Description |
|---|---|
| hpx::future::then | In asynchronous programming, it is very common for one asynchronous operation, on completion, to invoke a second operation and pass data to it. The current C++ standard does not allow one to register a continuation to a future. With then, instead of waiting for the result, a continuation is “attached” to the asynchronous operation, which is invoked when the result is ready. Continuations registered using then function will help to avoid blocking waits or wasting threads on polling, greatly improving the responsiveness and scalability of an application. |
| unwrapping constructor for hpx::future | In some scenarios, you might want to create a future that returns another future, resulting in nested futures. Although it is possible to write code to unwrap the outer future and retrieve the nested future and its result, such code is not easy to write because users must handle exceptions and it may cause a blocking call. Unwrapping can allow users to mitigate this problem by doing an asynchronous call to unwrap the outermost future. |
| hpx::future::is_ready | There are often situations where a get() call on a future may not be a blocking call, or is only a blocking call under certain circumstances. This function gives the ability to test for early completion and allows us to avoid associating a continuation, which needs to be scheduled with some non-trivial overhead and near-certain loss of cache efficiency. |
| hpx::make_ready_future | Some functions may know the value at the point of construction. In these cases the value is immediately available, but needs to be returned as a future. By using hpx::make_ready_future a future can be created that holds a pre-computed result in its shared state. In the current standard it is non-trivial to create a future directly from a value. First a promise must be created, then the promise is set, and lastly the future is retrieved from the promise. This can now be done with one operation. |
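The promise-based route that the hpx::make_ready_future entry describes for the current standard can be sketched as follows. ready_future_via_promise and is_ready are hypothetical helper names, and the code uses only std::promise and std::future.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <utility>

// Hypothetical helper: builds an already satisfied std::future from a
// value by going through a promise, as plain standard C++ requires.
template <typename T>
std::future<T> ready_future_via_promise(T value)
{
    std::promise<T> p;                // 1. create the promise
    p.set_value(std::move(value));    // 2. set the promise
    return p.get_future();            // 3. retrieve the future
}

// std::future has no is_ready(); a zero-length wait gives the same answer.
template <typename T>
bool is_ready(std::future<T> const& f)
{
    assert(f.valid());
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

With HPX, the three steps collapse into a single call such as hpx::make_ready_future(42.0), and the readiness test becomes the is_ready member function listed in the table.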

The standard also omits the ability to compose multiple futures. This is a common pattern that is ubiquitous in other asynchronous frameworks and is absolutely necessary in order to make C++ a powerful asynchronous programming language. Not including these functions is synonymous to Boolean algebra without AND/OR.

In addition to the extensions proposed by N4313, HPX adds functions allowing users to compose several futures in a more flexible way.

Table 11 Facilities for composing hpx::futures#

| Facility | Description |
|---|---|
| hpx::when_any, hpx::when_any_n | Asynchronously wait for at least one of multiple future or shared_future objects to finish. |
| hpx::wait_any, hpx::wait_any_n | Synchronously wait for at least one of multiple future or shared_future objects to finish. |
| hpx::when_all, hpx::when_all_n | Asynchronously wait for all future and shared_future objects to finish. |
| hpx::wait_all, hpx::wait_all_n | Synchronously wait for all future and shared_future objects to finish. |
| hpx::when_some, hpx::when_some_n | Asynchronously wait for a given number of multiple future and shared_future objects to finish. |
| hpx::wait_some, hpx::wait_some_n | Synchronously wait for a given number of multiple future and shared_future objects to finish. |
| hpx::when_each | Asynchronously wait for multiple future and shared_future objects to finish and call a function for each of the future objects as soon as it becomes ready. |
| hpx::wait_each, hpx::wait_each_n | Synchronously wait for multiple future and shared_future objects to finish and call a function for each of the future objects as soon as it becomes ready. |

Channels#

Channels combine communication (the exchange of a value) with synchronization (guaranteeing that two calculations (tasks) are in a known state). A channel can transport any number of values of a given type from a sender to a receiver:

    hpx::lcos::local::channel<int> c;
    hpx::future<int> f = c.get();
    HPX_ASSERT(!f.is_ready());
    c.set(42);
    HPX_ASSERT(f.is_ready());
    std::cout << f.get() << std::endl;

Channels can be handed to another thread (or in case of channel components, to other localities), thus establishing a communication channel between two independent places in the program:

void do_something(hpx::lcos::local::receive_channel<int> c,
    hpx::lcos::local::send_channel<> done)
{
    // prints 43
    std::cout << c.get(hpx::launch::sync) << std::endl;
    // signal back
    done.set();
}

void send_receive_channel()
{
    hpx::lcos::local::channel<int> c;
    hpx::lcos::local::channel<> done;

    hpx::post(&do_something, c, done);

    // send some value
    c.set(43);
    // wait for thread to be done
    done.get().wait();
}

Note how hpx::lcos::local::channel::get without any arguments returns a future which is ready when a value has been set on the channel. The launch policy hpx::launch::sync can be used to make hpx::lcos::local::channel::get block until a value is set and return the value directly.

A channel component is created on one locality and can be sent to another locality using an action. This example also demonstrates how a channel can be used as a range of values:

// channel components need to be registered for each used type (not needed
// for hpx::lcos::local::channel)
HPX_REGISTER_CHANNEL(double)

void channel_sender(hpx::lcos::channel<double> c)
{
    for (double d : c)
        hpx::cout << d << std::endl;
}
HPX_PLAIN_ACTION(channel_sender)

void channel()
{
    // create the channel on this locality
    hpx::lcos::channel<double> c(hpx::find_here());

    // pass the channel to a (possibly remote invoked) action
    hpx::post(channel_sender_action(), hpx::find_here(), c);

    // send some values to the receiver
    std::vector<double> v = {1.2, 3.4, 5.0};
    for (double d : v)
        c.set(d);

    // explicitly close the communication channel (implicit at destruction)
    c.close();
}

Task blocks#

Task blocks in HPX provide a way to structure and organize the execution of tasks in a parallel program, making it easier to manage dependencies between tasks. A task block is a group of tasks that can be executed in parallel. Tasks in a task block can depend on other tasks in the same task block. The task block allows the runtime to optimize the execution of tasks by scheduling them in an optimal order based on the dependencies between them.

The define_task_block, run, and wait functions are implemented based on N4755. They build on the task_block concept, which is part of the common subset of the Microsoft Parallel Patterns Library (PPL) and the Intel Threading Building Blocks (TBB) libraries.

These implementations adopt a simpler syntax than the one exposed by those libraries, one that is influenced by language-based concepts such as spawn and sync from Cilk++ and async and finish from X10. They improve on existing practice in the following ways:

  • The exception handling model is simplified and more consistent with normal C++ exceptions.

  • Most violations of strict fork-join parallelism can be detected at compile time (with compiler assistance, in some cases).

  • The syntax allows scheduling approaches other than child stealing.

Consider an example of a parallel traversal of a tree, where a user-provided function compute is applied to each node of the tree, returning the sum of the results:

template <typename Func>
int traverse(node& n, Func && compute)
{
    int left = 0, right = 0;
    define_task_block(
        [&](task_block<>& tr) {
            if (n.left)
                tr.run([&] { left = traverse(*n.left, compute); });
            if (n.right)
                tr.run([&] { right = traverse(*n.right, compute); });
        });

    return compute(n) + left + right;
}

The example above demonstrates the use of two of the functions, hpx::experimental::define_task_block and the hpx::experimental::task_block::run member function of a hpx::experimental::task_block.

The define_task_block function delineates a region in the program code that may contain invocations of tasks spawned by the run member function of the task_block class. The run function spawns an HPX thread, a unit of work that is allowed to execute in parallel with respect to the caller. Any parallel tasks spawned by run within the task block are joined back to a single thread of execution at the end of define_task_block. run takes a user-provided function object f and starts it asynchronously, i.e., it may return before the execution of f completes. The HPX scheduler may choose to run f immediately or delay running f until compute resources become available.
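The traversal example leaves node and compute undefined. As a rough, self-contained illustration of the same fork-join shape, the sketch below uses std::async from the C++ standard library as a stand-in for task_block::run, with future::get() acting as the join point. It is not the HPX API; the node type and the sum_small_tree helper are assumptions made only for this illustration.

```cpp
#include <future>
#include <memory>

// Hypothetical binary tree node; the original example does not define it.
struct node
{
    int value = 0;
    std::unique_ptr<node> left;
    std::unique_ptr<node> right;
};

// Fork: each child subtree is traversed in its own asynchronous task.
// Join: get() blocks until the corresponding child task has finished.
template <typename Func>
int traverse(node const& n, Func const& compute)
{
    std::future<int> left, right;
    if (n.left)
        left = std::async(std::launch::async,
            [&] { return traverse(*n.left, compute); });
    if (n.right)
        right = std::async(std::launch::async,
            [&] { return traverse(*n.right, compute); });

    int result = compute(n);
    if (left.valid())
        result += left.get();
    if (right.valid())
        result += right.get();
    return result;
}

// Builds a three-node tree (root 1 with children 2 and 3) and sums the values.
int sum_small_tree()
{
    node root;
    root.value = 1;
    root.left = std::make_unique<node>();
    root.left->value = 2;
    root.right = std::make_unique<node>();
    root.right->value = 3;
    return traverse(root, [](node const& n) { return n.value; });
}
```

Unlike this stand-in, which creates one operating system thread per child, the run function spawns lightweight HPX threads and leaves the scheduling decision to the HPX scheduler.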

A task_block can be constructed only by define_task_block because it has no public constructors. Thus, run can be invoked directly or indirectly only from a user-provided function passed to define_task_block:

void g();

void f(task_block<>& tr)
{
    tr.run(g);          // OK, invoked from within task_block in h
}

void h()
{
    define_task_block(f);
}

int main()
{
    task_block<> tr;    // Error: no public constructor
    tr.run(g);          // No way to call run outside of a define_task_block
    return 0;
}

Extensions for task blocks#

Using execution policies with task blocks#

HPX implements some extensions for task_block beyond the actual standards proposal N4755. The main addition is that a task_block can be invoked with an execution policy as its first argument, very similar to the parallel algorithms.

An execution policy is an object that expresses the requirements on the ordering of functions invoked as a consequence of the invocation of a task block. Passing an execution policy to define_task_block gives the user control over the amount of parallelism employed by the created task_block. In the following example, the explicit par execution policy makes the user’s intent clear:

template <typename Func>
int traverse(node *n, Func&& compute)
{
    int left = 0, right = 0;

    define_task_block(
        execution::par,                // execution::parallel_policy
        [&](task_block<>& tb) {
            if (n->left)
                tb.run([&] { left = traverse(n->left, compute); });
            if (n->right)
                tb.run([&] { right = traverse(n->right, compute); });
        });

    return compute(n) + left + right;
}

This also causes the hpx::experimental::task_block object to be a template in our implementation. The template argument is the type of the execution policy used to create the task block. The template argument defaults to hpx::execution::parallel_policy.

HPX still supports calling hpx::experimental::define_task_block without an explicit execution policy. In this case the task block will run using the hpx::execution::parallel_policy.

HPX also adds the ability to access the execution policy that was used to create a given task_block.

Using executors to run tasks#

Often, users want to be able to not only define an execution policy to use by default for all spawned tasks inside the task block, but also to customize the execution context for one of the tasks executed by task_block::run. Adding an optionally passed executor instance to that function enables this use case:

template <typename Func>
int traverse(node *n, Func&& compute)
{
    int left = 0, right = 0;

    define_task_block(
        execution::par,                // execution::parallel_policy
        [&](auto& tb) {
            if (n->left)
            {
                // use explicitly specified executor to run this task
                tb.run(my_executor(), [&] { left = traverse(n->left, compute); });
            }
            if (n->right)
            {
                // use the executor associated with the par execution policy
                tb.run([&] { right = traverse(n->right, compute); });
            }
        });

    return compute(n) + left + right;
}

HPX still supports calling hpx::experimental::task_block::run without an explicit executor object. In this case the task will be run using the executor associated with the execution policy that was used to call hpx::experimental::define_task_block.

Task groups#

A task group in HPX is a synchronization primitive that allows you to execute a group of tasks concurrently and wait for their completion before continuing. The tasks in an hpx::experimental::task_group can be added dynamically. This is the HPX implementation of tbb::task_group of the Intel Threading Building Blocks (TBB) library.

The example below shows that to use a task group, you simply create an hpx::experimental::task_group object and add tasks to it using the run() method. Once all the tasks have been added, you can call the wait() method to synchronize the tasks and wait for them to complete.

#include <hpx/experimental/task_group.hpp>
#include <hpx/init.hpp>

#include <iostream>

void task1()
{
    std::cout << "Task 1 executed." << std::endl;
}

void task2()
{
    std::cout << "Task 2 executed." << std::endl;
}

int hpx_main()
{
    hpx::experimental::task_group tg;

    tg.run(task1);
    tg.run(task2);

    tg.wait();

    std::cout << "All tasks finished!" << std::endl;

    return hpx::local::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::local::init(hpx_main, argc, argv);
}

Note

Task groups and task blocks are both ways to group and synchronize parallel tasks, but they differ in how the tasks are joined. If the difference is not clear yet, continue reading.

A task group is an object that allows multiple parallel tasks to be grouped together as a single unit. Tasks can be added to it dynamically using run(), and the program synchronizes with all of them by calling wait() explicitly before continuing.

A task block, on the other hand, is a scoped fork-join region created by define_task_block. Tasks are spawned inside that region using the run member function of the task_block, and they are joined implicitly at the end of define_task_block.

Threads#

A thread in HPX refers to a sequence of instructions that can be executed concurrently with other such sequences in multithreading environments, while sharing the same address space. These threads can communicate with each other through various means, such as futures or shared data structures.

The example below demonstrates how to launch multiple threads and synchronize them using a hpx::latch object. It also shows how to query the state of threads and wait for futures to complete.

#include <hpx/future.hpp>
#include <hpx/init.hpp>
#include <hpx/latch.hpp>
#include <hpx/thread.hpp>

#include <functional>
#include <iostream>
#include <vector>

int const num_threads = 10;

///////////////////////////////////////////////////////////////////////////////
void wait_for_latch(hpx::latch& l)
{
    l.arrive_and_wait();
}

int hpx_main()
{
    // Spawn a couple of threads
    hpx::latch l(num_threads + 1);

    std::vector<hpx::future<void>> results;
    results.reserve(num_threads);

    for (int i = 0; i != num_threads; ++i)
        results.push_back(hpx::async(&wait_for_latch, std::ref(l)));

    // Allow spawned threads to reach latch
    hpx::this_thread::yield();

    // Enumerate all suspended threads
    hpx::threads::enumerate_threads(
        [](hpx::threads::thread_id_type id) -> bool {
            std::cout << "thread " << hpx::thread::id(id) << " is "
                      << hpx::threads::get_thread_state_name(
                             hpx::threads::get_thread_state(id))
                      << std::endl;
            return true;    // always continue enumeration
        },
        hpx::threads::thread_schedule_state::suspended);

    // Wait for all threads to reach this point.
    l.arrive_and_wait();

    hpx::wait_all(results);

    return hpx::local::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::local::init(hpx_main, argc, argv);
}

In more detail, wait_for_latch() is a simple helper function that arrives at an hpx::latch object and waits for it to be released. Recall that hpx::latch is a synchronization primitive that allows multiple threads to wait for a common event to occur.

In the hpx_main() function, an hpx::latch object is created with a count of num_threads + 1, indicating that the num_threads spawned threads and the thread running hpx_main() all need to arrive at the latch before it is released. The loop that follows launches num_threads asynchronous operations, each of which calls the wait_for_latch function. The resulting futures are added to the vector.

After the threads have been launched, hpx::this_thread::yield() is called to give them a chance to reach the latch before the program proceeds. Then, the hpx::threads::enumerate_threads function prints the state of each suspended thread, while the next call of l.arrive_and_wait() waits for all the threads to reach the latch. Finally, hpx::wait_all is called to wait for all the futures to complete.
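To see why the latch count is num_threads + 1, it can help to look at a stripped-down model of a countdown latch. The sketch below is written in plain standard C++ with std::thread; simple_latch and run_workers are hypothetical names introduced for this illustration and are not part of HPX, and the real hpx::latch is implemented differently.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, minimal countdown latch; not the HPX implementation.
class simple_latch
{
public:
    explicit simple_latch(std::ptrdiff_t count)
      : count_(count)
    {
    }

    // Decrement the counter, then block until it has reached zero.
    void arrive_and_wait()
    {
        std::unique_lock<std::mutex> lock(mtx_);
        if (--count_ == 0)
            cv_.notify_all();
        else
            cv_.wait(lock, [this] { return count_ == 0; });
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::ptrdiff_t count_;
};

// Spawns num_threads workers and returns how many of them passed the latch.
int run_workers(int num_threads)
{
    simple_latch l(num_threads + 1);    // the workers plus the launching thread
    std::atomic<int> passed{0};

    std::vector<std::thread> workers;
    for (int i = 0; i != num_threads; ++i)
    {
        workers.emplace_back([&] {
            l.arrive_and_wait();
            ++passed;
        });
    }

    // The "+ 1": the launching thread is the last participant to arrive.
    l.arrive_and_wait();

    for (auto& t : workers)
        t.join();
    return passed.load();
}
```

With a count of only num_threads, the workers could be released before the launching thread arrived, and its own arrive_and_wait() call would then decrement the counter below zero.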

Hint

An advantage of using hpx::thread over other threading libraries is that it is optimized for high-performance parallelism, with support for lightweight threads and task scheduling to minimize thread overhead and maximize parallelism. Additionally, hpx::thread integrates seamlessly with other features of HPX such as futures, promises, and task groups, making it a powerful tool for parallel programming.

Check out the examples of Shared mutex, Condition variable, and Semaphore to see how HPX threads are used in combination with other features.

High level parallel facilities#

In preparation for the upcoming C++ Standards, there are currently several proposals targeting different facilities supporting parallel programming. HPX implements (and extends) some of those proposals. This fits our strategy of aligning the APIs exposed by HPX with current and future C++ Standards.

At this point, HPX implements several of the C++ Standardization working papers, most notably N4409 (Working Draft, Technical Specification for C++ Extensions for Parallelism), N4755 (Task Blocks), and N4406 (Parallel Algorithms Need Executors).

Using parallel algorithms#

A parallel algorithm is a function template declared in the namespace hpx.

All parallel algorithms are very similar in semantics to their sequential counterparts (as defined in the namespace std) with an additional formal template parameter named ExecutionPolicy. The execution policy is generally passed as the first argument to any of the parallel algorithms and describes the manner in which the execution of these algorithms may be parallelized and the manner in which they apply user-provided function objects.

The applications of function objects in parallel algorithms invoked with an execution policy object of type hpx::execution::sequenced_policy or hpx::execution::sequenced_task_policy execute in sequential order. For hpx::execution::sequenced_policy the execution happens in the calling thread.

The applications of function objects in parallel algorithms invoked with an execution policy object of type hpx::execution::parallel_policy or hpx::execution::parallel_task_policy are permitted to execute in an unordered fashion in unspecified threads, and are indeterminately sequenced within each thread.

Important

It is the caller’s responsibility to ensure correctness, such as making sure that the invocation does not introduce data races or deadlocks.

The example below demonstrates how to perform a sequential and parallel hpx::for_each loop on a vector of integers.

#include <hpx/algorithm.hpp>
#include <hpx/execution.hpp>
#include <hpx/init.hpp>

#include <iostream>
#include <vector>

int hpx_main()
{
    std::vector<int> v{1, 2, 3, 4, 5};

    auto print = [](const int& n) { std::cout << n << ' '; };

    std::cout << "Print sequential: ";
    hpx::for_each(v.begin(), v.end(), print);
    std::cout << '\n';

    std::cout << "Print parallel: ";
    hpx::for_each(hpx::execution::par, v.begin(), v.end(), print);
    std::cout << '\n';

    return hpx::local::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::local::init(hpx_main, argc, argv);
}

The above code uses hpx::for_each to print the elements of the vector v{1, 2, 3, 4, 5}. First, hpx::for_each() is called without an execution policy, which means that it applies the lambda function print to each element in the vector sequentially. Hence, the elements are printed in order.

Next, hpx::for_each() is called with the hpx::execution::par execution policy, which applies the lambda function print to each element in the vector in parallel. Therefore, the output order of the elements in the vector is not deterministic and may vary from run to run.

Parallel exceptions#

During the execution of a standard parallel algorithm, if temporary memory resources are required by any of the algorithms and no memory is available, the algorithm throws a std::bad_alloc exception.

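During the execution of any of the parallel algorithms, if the application of a function object terminates with an uncaught exception, the behavior of the program is determined by the type of execution policy used to invoke the algorithm:

  • If the execution policy object is of type hpx::execution::parallel_unsequenced_policy, hpx::terminate is called.

  • If the execution policy object is of type hpx::execution::sequenced_policy, hpx::execution::sequenced_task_policy, hpx::execution::parallel_policy, or hpx::execution::parallel_task_policy, the execution of the algorithm terminates with an hpx::exception_list exception. All uncaught exceptions thrown during the application of user-provided function objects are contained in the hpx::exception_list.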

For example, the number of invocations of the user-provided function object in for_each is unspecified. When hpx::for_each is executed sequentially, only one exception will be contained in the hpx::exception_list object.

These guarantees imply that, unless the algorithm has failed to allocate memory and terminated with std::bad_alloc, all exceptions thrown during the execution of the algorithm are communicated to the caller. It is unspecified whether an algorithm implementation will “forge ahead” after encountering and capturing a user exception.

The algorithm may terminate with the std::bad_alloc exception even if one or more user-provided function objects have terminated with an exception. For example, this can happen when an algorithm fails to allocate memory while creating or adding elements to the hpx::exception_list object.
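To make the aggregation behavior concrete, here is a minimal model in plain standard C++. It is not HPX code: collected_exceptions is a hypothetical stand-in for hpx::exception_list, and for_each_collecting is an illustrative helper that treats every element as an independent task, as a parallel policy might. It also chooses to keep going after a failure, which the text above leaves unspecified.

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for hpx::exception_list: a bag of captured exceptions.
struct collected_exceptions
{
    std::vector<std::exception_ptr> errors;
};

// Applies f to every element, captures each exception that escapes f, and
// reports all of them to the caller in one aggregate exception.
template <typename Iterator, typename F>
void for_each_collecting(Iterator first, Iterator last, F f)
{
    collected_exceptions all;
    for (; first != last; ++first)
    {
        try
        {
            f(*first);
        }
        catch (...)
        {
            all.errors.push_back(std::current_exception());
        }
    }
    if (!all.errors.empty())
        throw all;
}

// Counts how many invocations failed; here every even value throws.
std::size_t count_failures(std::vector<int> const& v)
{
    try
    {
        for_each_collecting(v.begin(), v.end(), [](int x) {
            if (x % 2 == 0)
                throw std::runtime_error("even value");
        });
    }
    catch (collected_exceptions const& e)
    {
        return e.errors.size();
    }
    return 0;
}
```

The caller receives a single exception object and can inspect or rethrow each captured exception individually, for example with std::rethrow_exception.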

Parallel algorithms#

HPX provides implementations of the following parallel algorithms:

Table 12 Non-modifying parallel algorithms of header hpx/algorithm.hpp#

| Name | Description | C++ standard |
|---|---|---|
| hpx::adjacent_find | Finds the first two adjacent elements that are equal (or satisfy a given predicate). | adjacent_find |
| hpx::all_of | Checks if a predicate is true for all of the elements in a range. | all_any_none_of |
| hpx::any_of | Checks if a predicate is true for any of the elements in a range. | all_any_none_of |
| hpx::count | Returns the number of elements equal to a given value. | count |
| hpx::count_if | Returns the number of elements satisfying specific criteria. | count_if |
| hpx::equal | Determines if two sets of elements are the same. | equal |
| hpx::find | Finds the first element equal to a given value. | find |
| hpx::find_end | Finds the last sequence of elements in a certain range. | find_end |
| hpx::find_first_of | Searches for any one of a set of elements. | find_first_of |
| hpx::find_if | Finds the first element satisfying specific criteria. | find_if |
| hpx::find_if_not | Finds the first element not satisfying specific criteria. | find_if_not |
| hpx::for_each | Applies a function to a range of elements. | for_each |
| hpx::for_each_n | Applies a function to a number of elements. | for_each_n |
| hpx::lexicographical_compare | Checks if a range of values is lexicographically less than another range of values. | lexicographical_compare |
| hpx::mismatch | Finds the first position where two ranges differ. | mismatch |
| hpx::none_of | Checks if a predicate is true for none of the elements in a range. | all_any_none_of |
| hpx::search | Searches for a range of elements. | search |
| hpx::search_n | Searches for a number of consecutive copies of an element in a range. | search_n |


Table 13 Modifying parallel algorithms of header hpx/algorithm.hpp#

| Name | Description | C++ standard |
|---|---|---|
| hpx::copy | Copies a range of elements to a new location. | copy |
| hpx::copy_n | Copies a number of elements to a new location. | copy_n |
| hpx::copy_if | Copies the elements from a range to a new location for which the given predicate is true. | copy |
| hpx::move | Moves a range of elements to a new location. | move |
| hpx::fill | Assigns a range of elements a certain value. | fill |
| hpx::fill_n | Assigns a value to a number of elements. | fill_n |
| hpx::generate | Saves the result of a function in a range. | generate |
| hpx::generate_n | Saves the result of N applications of a function. | generate_n |
| hpx::experimental::reduce_by_key | Performs an inclusive scan on consecutive elements with matching keys, with a reduction to output only the final sum for each key. The key sequence {1,1,1,2,3,3,3,3,1} and value sequence {2,3,4,5,6,7,8,9,10} would be reduced to keys={1,2,3,1}, values={9,5,30,10}. | |
| hpx::remove | Removes the elements from a range that are equal to the given value. | remove |
| hpx::remove_if | Removes the elements from a range for which the given predicate is true. | remove |
| hpx::remove_copy | Copies the elements from a range to a new location that are not equal to the given value. | remove_copy |
| hpx::remove_copy_if | Copies the elements from a range to a new location for which the given predicate is false. | remove_copy |
| hpx::replace | Replaces all values satisfying specific criteria with another value. | replace |
| hpx::replace_if | Replaces all values satisfying specific criteria with another value. | replace |
| hpx::replace_copy | Copies a range, replacing elements satisfying specific criteria with another value. | replace_copy |
| hpx::replace_copy_if | Copies a range, replacing elements satisfying specific criteria with another value. | replace_copy |
| hpx::reverse | Reverses the order of elements in a range. | reverse |
| hpx::reverse_copy | Creates a copy of a range that is reversed. | reverse_copy |
| hpx::rotate | Rotates the order of elements in a range. | rotate |
| hpx::rotate_copy | Copies and rotates a range of elements. | rotate_copy |
| hpx::shift_left | Shifts the elements in the range left by n positions. | shift_left |
| hpx::shift_right | Shifts the elements in the range right by n positions. | shift_right |
| hpx::swap_ranges | Swaps two ranges of elements. | swap_ranges |
| hpx::transform | Applies a function to a range of elements. | transform |
| hpx::unique | Eliminates all but the first element from every consecutive group of equivalent elements from a range. | unique |
| hpx::unique_copy | Copies the elements from one range to another in such a way that there are no consecutive equal elements. | unique_copy |
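The reduce_by_key entry in the table above is the only one without a standard counterpart, so its semantics are worth spelling out. The following sequential reference sketch in plain standard C++ reproduces the keys and values from that entry. The name reduce_by_key_reference is hypothetical, and the sketch only models the result; it does not show the signature or the parallel implementation of hpx::experimental::reduce_by_key.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sequential reference model: sums the values of each run of consecutive
// equal keys and emits one (key, sum) pair per run.
// Assumes keys and values have the same length.
std::pair<std::vector<int>, std::vector<int>> reduce_by_key_reference(
    std::vector<int> const& keys, std::vector<int> const& values)
{
    std::vector<int> out_keys;
    std::vector<int> out_values;

    for (std::size_t i = 0; i != keys.size(); ++i)
    {
        if (out_keys.empty() || out_keys.back() != keys[i])
        {
            // A new run of keys starts here.
            out_keys.push_back(keys[i]);
            out_values.push_back(values[i]);
        }
        else
        {
            // Same key as the previous element: keep accumulating.
            out_values.back() += values[i];
        }
    }
    return {out_keys, out_values};
}
```

Note that the trailing key 1 forms its own run because only consecutive matching keys are combined, which is why key 1 appears twice in the output.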


Table 14 Set operations on sorted sequences of header hpx/algorithm.hpp#

| Name | Description | C++ standard |
|---|---|---|
| hpx::merge | Merges two sorted ranges. | merge |
| hpx::inplace_merge | Merges two ordered ranges in-place. | inplace_merge |
| hpx::includes | Returns true if one set is a subset of another. | includes |
| hpx::set_difference | Computes the difference between two sets. | set_difference |
| hpx::set_intersection | Computes the intersection of two sets. | set_intersection |
| hpx::set_symmetric_difference | Computes the symmetric difference between two sets. | set_symmetric_difference |
| hpx::set_union | Computes the union of two sets. | set_union |


Table 15 Heap operations of header hpx/algorithm.hpp#

| Name | Description | C++ standard |
|---|---|---|
| hpx::is_heap | Returns true if the range is a max heap. | is_heap |
| hpx::is_heap_until | Returns the first element that breaks a max heap. | is_heap_until |
| hpx::make_heap | Constructs a max heap in the range [first, last). | make_heap |


Table 16 Minimum/maximum operations of header hpx/algorithm.hpp#

| Name | Description | C++ standard |
|---|---|---|
| hpx::max_element | Returns the largest element in a range. | max_element |
| hpx::min_element | Returns the smallest element in a range. | min_element |
| hpx::minmax_element | Returns the smallest and the largest element in a range. | minmax_element |


Table 17 Partitioning Operations of header hpx/algorithm.hpp#

| Name | Description | C++ standard |
|---|---|---|
| hpx::nth_element | Partially sorts the given range making sure that it is partitioned by the given element. | nth_element |
| hpx::is_partitioned | Returns true if each true element for a predicate precedes the false elements in a range. | is_partitioned |
| hpx::partition | Divides elements into two groups without preserving their relative order. | partition |
| hpx::partition_copy | Copies a range dividing the elements into two groups. | partition_copy |
| hpx::stable_partition | Divides elements into two groups while preserving their relative order. | stable_partition |


Table 18 Sorting Operations of header hpx/algorithm.hpp#

| Name | Description | C++ standard |
|---|---|---|
| hpx::is_sorted | Returns true if the elements in a range are sorted. | is_sorted |
| hpx::is_sorted_until | Returns the first unsorted element. | is_sorted_until |
| hpx::sort | Sorts the elements in a range. | sort |
| hpx::stable_sort | Sorts the elements in a range, maintaining the relative order of equal elements. | stable_sort |
| hpx::partial_sort | Sorts the first elements in a range. | partial_sort |
| hpx::partial_sort_copy | Sorts the first elements in a range, storing the result in another range. | partial_sort_copy |
| hpx::experimental::sort_by_key | Sorts one range of data using keys supplied in another range. | |
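Like reduce_by_key, sort_by_key has no direct standard counterpart. The sequential sketch below models what "sorting one range using keys supplied in another range" means, using only the C++ standard library. The helper name sort_by_key_reference is hypothetical, and the choice of a stable sort is an assumption made to keep the result deterministic; the table does not say whether the HPX algorithm is stable.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sequential reference model: reorders keys into ascending order and applies
// the same permutation to values.
// Assumes keys and values have the same length.
void sort_by_key_reference(
    std::vector<int>& keys, std::vector<std::string>& values)
{
    // Pair every key with its value so that both move together.
    std::vector<std::pair<int, std::string>> zipped;
    zipped.reserve(keys.size());
    for (std::size_t i = 0; i != keys.size(); ++i)
        zipped.emplace_back(keys[i], values[i]);

    // Compare by key only; equal keys keep their original relative order.
    std::stable_sort(zipped.begin(), zipped.end(),
        [](auto const& a, auto const& b) { return a.first < b.first; });

    // Write the sorted pairs back into the two ranges.
    for (std::size_t i = 0; i != zipped.size(); ++i)
    {
        keys[i] = zipped[i].first;
        values[i] = std::move(zipped[i].second);
    }
}
```

For example, keys {3, 1, 2} with values {"c", "a", "b"} become keys {1, 2, 3} with values {"a", "b", "c"}.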


Table 19 Numeric Parallel Algorithms of header hpx/numeric.hpp#

| Name | Description | C++ standard |
|---|---|---|
| hpx::adjacent_difference | Calculates the difference between each element in an input range and the preceding element. | adjacent_difference |
| hpx::exclusive_scan | Does an exclusive parallel scan over a range of elements. | exclusive_scan |
| hpx::inclusive_scan | Does an inclusive parallel scan over a range of elements. | inclusive_scan |
| hpx::reduce | Sums up a range of elements. | reduce |
| hpx::transform_exclusive_scan | Does an exclusive parallel scan over a range of elements after applying a function. | transform_exclusive_scan |
| hpx::transform_inclusive_scan | Does an inclusive parallel scan over a range of elements after applying a function. | transform_inclusive_scan |
| hpx::transform_reduce | Sums up a range of elements after applying a function. Also, accumulates the inner products of two input ranges. | transform_reduce |
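The difference between the exclusive and inclusive scans listed in the table is easiest to see on a small input. The sketch below uses the sequential std::exclusive_scan and std::inclusive_scan from the numeric header (C++17), which are the standard counterparts named in the table. It does not call HPX itself, and the two wrapper functions are hypothetical helpers for this illustration.

```cpp
#include <numeric>
#include <vector>

// Inclusive scan: output element i is the sum of input elements 0..i.
std::vector<int> inclusive_sums(std::vector<int> const& in)
{
    std::vector<int> out(in.size());
    std::inclusive_scan(in.begin(), in.end(), out.begin());
    return out;
}

// Exclusive scan: output element i is init plus the sum of input elements
// 0..i-1, so the i-th input itself is excluded from the i-th output.
std::vector<int> exclusive_sums(std::vector<int> const& in, int init)
{
    std::vector<int> out(in.size());
    std::exclusive_scan(in.begin(), in.end(), out.begin(), init);
    return out;
}
```

For the input {1, 2, 3, 4}, inclusive_sums yields {1, 3, 6, 10}, while exclusive_sums with an initial value of 0 yields {0, 1, 3, 6}.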


Table 20 Dynamic Memory Management of header hpx/memory.hpp#

| Name | Description | C++ standard |
|---|---|---|
| hpx::destroy | Destroys a range of objects. | destroy |
| hpx::destroy_n | Destroys a number of objects. | destroy_n |
| hpx::uninitialized_copy | Copies a range of objects to an uninitialized area of memory. | uninitialized_copy |
| hpx::uninitialized_copy_n | Copies a number of objects to an uninitialized area of memory. | uninitialized_copy_n |
| hpx::uninitialized_default_construct | Constructs objects by default-initialization in an uninitialized area of memory. | uninitialized_default_construct |
| hpx::uninitialized_default_construct_n | Constructs a number of objects by default-initialization in an uninitialized area of memory. | uninitialized_default_construct_n |
| hpx::uninitialized_fill | Copies an object to an uninitialized area of memory. | uninitialized_fill |
| hpx::uninitialized_fill_n | Copies an object to an uninitialized area of memory, defined by a start and a count. | uninitialized_fill_n |
| hpx::uninitialized_move | Moves a range of objects to an uninitialized area of memory. | uninitialized_move |
| hpx::uninitialized_move_n | Moves a number of objects to an uninitialized area of memory. | uninitialized_move_n |
| hpx::uninitialized_value_construct | Constructs objects by value-initialization in an uninitialized area of memory. | uninitialized_value_construct |
| hpx::uninitialized_value_construct_n | Constructs objects by value-initialization in an uninitialized area of memory, defined by a start and a count. | uninitialized_value_construct_n |


Table 21 Index-based for-loops of header hpx/algorithm.hpp#

| Name | Description |
|---|---|
| hpx::experimental::for_loop | Implements loop functionality over a range specified by integral or iterator bounds. |
| hpx::experimental::for_loop_strided | Implements loop functionality over a range specified by integral or iterator bounds. |
| hpx::experimental::for_loop_n | Implements loop functionality over a range specified by integral or iterator bounds. |
| hpx::experimental::for_loop_n_strided | Implements loop functionality over a range specified by integral or iterator bounds. |

Executor parameters and executor parameter traits#

HPX introduces the notion of executor parameters and executor parameter traits. At this point, the only parameter that can be customized is the size of the chunks of work executed on a single HPX thread (such as the number of loop iterations combined to run as a single task).

An executor parameter object is responsible for exposing the calculation of the size of the chunks scheduled. It abstracts the (potentially platform-specific) algorithms of determining those chunk sizes.

The way executor parameters are implemented is aligned with the way executors are implemented. All functionalities of concrete executor parameter types are exposed and accessible through a corresponding customization point, e.g. get_chunk_size().

With executor_parameter_traits, clients access all types of executor parameters uniformly, e.g.:

std::size_t chunk_size =
    hpx::execution::get_chunk_size(my_parameter, my_executor,
        num_cores, num_tasks);

This call synchronously retrieves the size of a single chunk of loop iterations (or similar) to combine for execution on a single HPX thread, given the overall number of cores num_cores and the number of tasks to schedule num_tasks. The executor parameter type might dynamically determine the execution time of one or more tasks in order to calculate the chunk size; see hpx::execution::experimental::auto_chunk_size for an example of this executor parameter type.

Other functions in the interface exist to discover whether an executor parameter type should be invoked once (i.e., it returns a static chunk size; see hpx::execution::experimental::static_chunk_size) or whether it should be invoked for each scheduled chunk of work (i.e., it returns a variable chunk size; for an example, see hpx::execution::experimental::guided_chunk_size).

Although this interface appears to require executor parameter type authors to implement all different basic operations, none are required. In practice, all operations have sensible defaults. However, some executor parameter types will naturally specialize all operations for maximum efficiency.

HPX implements the following executor parameter types:

  • hpx::execution::experimental::auto_chunk_size: Loop iterations are divided into pieces and then assigned to threads. The number of loop iterations combined is determined based on measurements of how long the execution of 1% of the overall number of iterations takes. This executor parameter type makes sure that as many loop iterations are combined as necessary to run for the amount of time specified.

  • hpx::execution::experimental::static_chunk_size: Loop iterations are divided into pieces of a given size and then assigned to threads. If the size is not specified, the iterations are, if possible, evenly divided contiguously among the threads. This executor parameter type is equivalent to OpenMP’s STATIC scheduling directive.

  • hpx::execution::experimental::dynamic_chunk_size: Loop iterations are divided into pieces of a given size and then dynamically scheduled among the cores; when a core finishes one chunk, it is dynamically assigned another. If the size is not specified, the default chunk size is 1. This executor parameter type is equivalent to OpenMP’s DYNAMIC scheduling directive.

  • hpx::execution::experimental::guided_chunk_size: Iterations are dynamically assigned to cores in blocks as cores request them until no blocks remain to be assigned. This is similar to dynamic_chunk_size except that the block size decreases each time a number of loop iterations is given to a thread. The size of the initial block is proportional to number_of_iterations / number_of_cores. Subsequent blocks are proportional to number_of_iterations_remaining / number_of_cores. The optional chunk size parameter defines the minimum block size. The default minimal chunk size is 1. This executor parameter type is equivalent to OpenMP’s GUIDED scheduling directive.
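The chunking rules described in the list above can be sketched as plain arithmetic. The two functions below are a rough model in standard C++, not the HPX implementation: static_chunks and guided_chunks are hypothetical names, and the proportionality constant for the guided case is assumed to be 1, which the description leaves open.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Static scheduling: fixed-size pieces. With chunk == 0 (size not specified),
// the iterations are divided as evenly as possible among the cores.
std::vector<std::size_t> static_chunks(
    std::size_t iterations, std::size_t cores, std::size_t chunk = 0)
{
    std::vector<std::size_t> chunks;
    if (cores == 0)
        return chunks;
    if (chunk == 0)
        chunk = (iterations + cores - 1) / cores;    // round up

    for (std::size_t remaining = iterations; remaining != 0;)
    {
        std::size_t const next = std::min(chunk, remaining);
        chunks.push_back(next);
        remaining -= next;
    }
    return chunks;
}

// Guided scheduling: each block is remaining / cores iterations, but never
// smaller than min_chunk, so the block size shrinks as the work drains.
std::vector<std::size_t> guided_chunks(
    std::size_t iterations, std::size_t cores, std::size_t min_chunk = 1)
{
    std::vector<std::size_t> chunks;
    if (cores == 0)
        return chunks;

    for (std::size_t remaining = iterations; remaining != 0;)
    {
        // Never hand out an empty block, even if min_chunk is 0.
        std::size_t next =
            std::max({min_chunk, remaining / cores, std::size_t(1)});
        next = std::min(next, remaining);
        chunks.push_back(next);
        remaining -= next;
    }
    return chunks;
}
```

For 10 iterations on 4 cores, static_chunks produces the pieces 3, 3, 3, 1. For 16 iterations on 4 cores, guided_chunks starts with a block of 4 and then hands out progressively smaller blocks until only blocks of the minimum size remain.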