Welcome to the HPX documentation!#

If you’re new to HPX you can get started with the Quick start guide. Don’t forget to read the Terminology section to learn about the most important concepts in HPX. The Examples give you a feel for how it is to write real HPX applications and the Manual contains detailed information about everything from building HPX to debugging it. There are links to blog posts and videos about HPX in Additional material.

If you can’t find what you’re looking for in the documentation, you can find a comprehensive list of contact options on Support for deploying and using HPX.

See Citing HPX for details on how to cite HPX in publications. See HPX users for a list of institutions and projects using HPX.

A PDF version of this documentation and a Single HTML Page are also available.

What is HPX?#

HPX is a C++ Standard Library for Concurrency and Parallelism. It implements all of the corresponding facilities as defined by the C++ Standard. Additionally, in HPX we implement functionalities proposed as part of the ongoing C++ standardization process. We also extend the C++ Standard APIs to the distributed case. HPX is developed by the STE||AR group (see People).

The goal of HPX is to create a high quality, freely available, open source implementation of a new programming model for conventional systems, such as classic Linux-based Beowulf clusters or multi-socket highly parallel SMP nodes. At the same time, we want a modular and well-designed runtime system architecture that allows us to port our implementation to new computer system architectures. We want to use real-world applications to drive the development of the runtime system, identifying required functionality and converging on a stable API that provides a smooth migration path for developers.

The API exposed by HPX is modeled after the interfaces defined by the C++11/14/17/20 ISO standards and also adheres to the programming guidelines used by the Boost collection of C++ libraries. We aim to improve the scalability of today’s applications and to expose new levels of parallelism which are necessary to take advantage of the exascale systems of the future.

What’s so special about HPX?#

  • HPX exposes a uniform, standards-oriented API for ease of programming parallel and distributed applications.

  • It enables programmers to write fully asynchronous code using hundreds of millions of threads.

  • HPX provides unified syntax and semantics for local and remote operations.

  • HPX makes concurrency manageable with dataflow and future based synchronization.

  • It implements a rich set of runtime services supporting a broad range of use cases.

  • HPX exposes a uniform, flexible, and extendable performance counter framework which can enable runtime adaptivity.

  • It is designed to solve problems conventionally considered to be scaling-impaired.

  • HPX has been designed and developed for systems of any scale, from hand-held devices to very large scale systems.

  • It is the first fully functional implementation of the ParalleX execution model.

  • HPX is published under a liberal open-source license and has an open, active, and thriving developer community.

Quick start#

The following steps will help you get started with HPX.

Installing HPX#

The easiest way to install HPX on your system is to choose one of the options below:

  1. vcpkg

    You can download and install HPX using the vcpkg dependency manager:

    $ vcpkg install hpx
    
  2. Spack

    Another way to install HPX is using Spack:

    $ spack install hpx
    
  3. Fedora

    Installation can be done with Fedora as well:

    $ dnf install hpx*
    
  4. Arch Linux

    HPX is available in the Arch User Repository (AUR) as hpx too.

More information and alternative installation methods can be found in Building HPX, a detailed guide with thorough explanations of the ways to build and use HPX.

Hello, World!#

To get started with this minimal example you need to create a new project directory and a file CMakeLists.txt with the contents below in order to build an executable using CMake and HPX:

cmake_minimum_required(VERSION 3.18)
project(my_hpx_project CXX)
find_package(HPX REQUIRED)
add_executable(my_hpx_program main.cpp)
target_link_libraries(my_hpx_program HPX::hpx HPX::wrap_main HPX::iostreams_component)

The next step is to create a main.cpp with the contents below:

// Including 'hpx/hpx_main.hpp' instead of the usual 'hpx/hpx_init.hpp' lets us
// use the plain C-main below as the direct main HPX entry point.
#include <hpx/hpx_main.hpp>
#include <hpx/iostream.hpp>

int main()
{
    // Say hello to the world!
    hpx::cout << "Hello World!\n" << std::flush;
    return 0;
}

Then, in your project directory run the following:

$ mkdir build && cd build
$ cmake -DCMAKE_PREFIX_PATH=/path/to/hpx/installation ..
$ make all
$ ./my_hpx_program
Hello World!

The program looks almost like a regular C++ hello world with the exception of the two includes and hpx::cout.

  • When you include hpx_main.hpp HPX makes sure that main actually gets launched on the HPX runtime. So while it looks almost the same you can now use futures, async, parallel algorithms and more which make use of the HPX runtime with lightweight threads.

  • hpx::cout is a replacement for std::cout to make sure printing never blocks a lightweight thread. You can read more about hpx::cout in The HPX I/O-streams component.

Caution

When including hpx_main.hpp the user-defined main gets renamed and the real main function is defined by HPX. This means that the user-defined main must include a return statement, unlike the real main. If you do not include the return statement, you may end up with confusing compile time errors mentioning user_main or even runtime errors.

Writing task-based applications#

So far we haven’t done anything that can’t be done using the C++ standard library. In this section we will give a short overview of what you can do with HPX on a single node. The essence is to avoid global synchronization and break up your application into small, composable tasks whose dependencies control the flow of your application. Remember, however, that HPX allows you to write distributed applications similarly to how you would write applications for a single node (see Why HPX? and Writing distributed HPX applications).

If you are already familiar with async and futures from the C++ standard library, the same functionality is available in HPX.

The following terminology is essential when talking about task-based C++ programs:

  • lightweight thread: Essential for good performance with task-based programs. Lightweight refers to smaller stacks and faster context switching compared to OS threads. Smaller overheads allow the program to be broken up into smaller tasks, which in turn helps the runtime fully utilize all processing units.

  • async: The most basic way of launching tasks asynchronously. Returns a future<T>.

  • future<T>: Represents a value of type T that will be ready in the future. The value can be retrieved with get (blocking) and one can check if the value is ready with is_ready (non-blocking).

  • shared_future<T>: Same as future<T> but can be copied (similar to std::unique_ptr vs std::shared_ptr).

  • continuation: A function that is to be run after a previous task has run (represented by a future). then is a method of future<T> that takes a function to run next. Used to build up dataflow DAGs (directed acyclic graphs). shared_futures help you split up nodes in the DAG and functions like when_all help you join nodes in the DAG.
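The splitting and joining described in the list above can be tried out with the C++ standard library alone. The following is a minimal sketch, not HPX code: it assumes std::async and std::shared_future as stand-ins for their hpx counterparts, and fan_out_demo is a hypothetical name. Standard std::future has no then or is_ready, so the two consumers simply call get inside their own tasks.

```cpp
#include <future>
#include <string>

// Hypothetical helper: one producer task feeds two independent consumers.
// std::shared_future is copyable, so both consumers can read the same result.
std::string fan_out_demo()
{
    // Producer: runs asynchronously and yields a single value.
    std::shared_future<int> source =
        std::async(std::launch::async, [] { return 21; }).share();

    // Split: each consumer captures its own copy of the shared future.
    std::future<int> doubled = std::async(
        std::launch::async, [source] { return source.get() * 2; });
    std::future<int> squared = std::async(std::launch::async,
        [source] { return source.get() * source.get(); });

    // Join: block until both branches are done and combine the results.
    return std::to_string(doubled.get()) + " " +
        std::to_string(squared.get());
}
```

With the standard library each of these tasks may occupy an OS thread while it waits, which is the overhead that lightweight threads are meant to avoid.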

The following example is a collection of the most commonly used functionality in HPX:

#include <hpx/algorithm.hpp>
#include <hpx/future.hpp>
#include <hpx/init.hpp>

#include <cstdlib>
#include <iostream>
#include <random>
#include <vector>

void final_task(hpx::future<hpx::tuple<hpx::future<double>, hpx::future<void>>>)
{
    std::cout << "in final_task" << std::endl;
}

int hpx_main()
{
    // A function can be launched asynchronously. The program will not block
    // here until the result is available.
    hpx::future<int> f = hpx::async([]() { return 42; });
    std::cout << "Just launched a task!" << std::endl;

    // Use get to retrieve the value from the future. This will block this task
    // until the future is ready, but the HPX runtime will schedule other tasks
    // if there are tasks available.
    std::cout << "f contains " << f.get() << std::endl;

    // Let's launch another task.
    hpx::future<double> g = hpx::async([]() { return 3.14; });

    // Tasks can be chained using the then method. The continuation takes the
    // future as an argument.
    hpx::future<double> result = g.then([](hpx::future<double>&& gg) {
        // This function will be called once g is ready. gg is g moved
        // into the continuation.
        return gg.get() * 42.0 * 42.0;
    });

    // You can check if a future is ready with the is_ready method.
    std::cout << "Result is ready? " << result.is_ready() << std::endl;

    // You can launch other work in the meantime. Let's sort a vector.
    std::vector<int> v(1000000);

    // We fill the vector synchronously and sequentially.
    hpx::generate(hpx::execution::seq, std::begin(v), std::end(v), &std::rand);

    // We can launch the sort in parallel and asynchronously.
    hpx::future<void> done_sorting =
        hpx::sort(hpx::execution::par(          // In parallel.
                      hpx::execution::task),    // Asynchronously.
            std::begin(v), std::end(v));

    // We launch the final task when the vector has been sorted and result is
    // ready using when_all.
    auto all = hpx::when_all(result, done_sorting).then(&final_task);

    // We can wait for all to be ready.
    all.wait();

    // all must be ready at this point because we waited for it to be ready.
    std::cout << (all.is_ready() ? "all is ready!" : "all is not ready...")
              << std::endl;

    return hpx::local::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::local::init(hpx_main, argc, argv);
}

Try copying the contents to your main.cpp file and look at the output. It can be a good idea to go through the program step by step with a debugger. You can also try changing the types or adding new arguments to functions to make sure you can get the types to match. The types involved in the then method can be especially tricky to get right (the continuation needs to take the future as an argument).

Note

HPX programs accept command line arguments. The most important one is --hpx:threads=N to set the number of OS threads used by HPX. HPX uses one thread per core by default. Play around with the example above and see what difference the number of threads makes on the sort function. See Launching and configuring HPX applications for more details on how and what options you can pass to HPX.

Tip

The example above used the construction hpx::when_all(...).then(...). For convenience and performance it is a good idea to replace uses of hpx::when_all(...).then(...) with dataflow. See Dataflow for more details on dataflow.

Tip

If possible, try to use the provided parallel algorithms instead of writing your own implementation. This can save you time and the resulting program is often faster.

Next steps#

If you haven’t done so already, reading the Terminology section will help you get familiar with the terms used in HPX.

The Examples section contains small, self-contained walkthroughs of example HPX programs. The Local to remote example is a thorough, realistic example starting from a single node implementation and going stepwise to a distributed implementation.

The Manual contains detailed information on writing, building and running HPX applications.

Examples#

The following sections analyze some examples to help you get familiar with the HPX style of programming. We start off with simple examples that utilize basic HPX elements and then begin to expose the reader to the more complex and powerful HPX concepts. Section Building tests and examples shows how you can build the examples.

Asynchronous execution#

The Fibonacci sequence is a sequence of numbers starting with 0 and 1 where every subsequent number is the sum of the previous two numbers. In this example, we will use HPX to calculate the value of the n-th element of the Fibonacci sequence. In order to compute this problem in parallel, we will use a facility known as a future.

As shown in Fig. 1 below, a future encapsulates a delayed computation. It acts as a proxy for a result initially not known, most of the time because the computation of the result has not completed yet. The future synchronizes the access of this value by optionally suspending any HPX-threads requesting the result until the value is available. When a future is created, it spawns a new HPX-thread (either remotely with a parcel or locally by placing it into the thread queue) which, when run, will execute the function associated with the future. The arguments of the function are bound when the future is created.

Fig. 1 Schematic of a future execution.

Once the function has finished executing, a write operation is performed on the future. The write operation marks the future as completed, and optionally stores data returned by the function. When the result of the delayed computation is needed, a read operation is performed on the future. If the future’s function hasn’t completed when a read operation is performed on it, the reader HPX-thread is suspended until the future is ready. The future facility allows HPX to schedule work early in a program so that when the function value is needed it will already be calculated and available. We use this property in our Fibonacci example below to enable its parallel execution.
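The write and read operations described above can be illustrated with the standard library's promise/future pair. This is a sketch under assumptions rather than a view into HPX internals: std::promise plays the role of the write end, std::future the read end, and write_then_read is a hypothetical name. Unlike an HPX-thread, the reader here blocks an OS thread while it waits.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical demo: returns the value read from the future.
int write_then_read()
{
    std::promise<int> result;                         // write end
    std::future<int> reader = result.get_future();    // read end

    // The worker performs the "write operation" after some simulated work.
    std::thread worker([&result] {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        result.set_value(55);    // marks the future ready and stores the data
    });

    // The "read operation": blocks the caller until the value has been set.
    int value = reader.get();
    worker.join();
    return value;
}
```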

Setup#

The source code for this example can be found here: fibonacci_local.cpp.

To compile this program, go to your HPX build directory (see Building HPX for information on configuring and building HPX) and enter:

$ make examples.quickstart.fibonacci_local

To run the program type:

$ ./bin/fibonacci_local

This should print (time should be approximate):

fibonacci(10) == 55
elapsed time: 0.002430 [s]

This run used the default settings, which calculate the tenth element of the Fibonacci sequence. To declare which Fibonacci value you want to calculate, use the --n-value option. Additionally you can use the --hpx:threads option to declare how many OS-threads you wish to use when running the program. For instance, running:

$ ./bin/fibonacci_local --n-value 20 --hpx:threads 4

Will yield:

fibonacci(20) == 6765
elapsed time: 0.062854 [s]

Walkthrough#

Now that you have compiled and run the code, let’s look at how the code works. Since this code is written in C++, we will begin with the main() function. Here you can see that in HPX, main() is only used to initialize the runtime system. It is important to note that application-specific command line options are defined here. HPX handles command line processing through hpx::program_options, whose interface follows Boost.Program_options. You can see that our program’s --n-value option is set by calling the add_options() method on an instance of hpx::program_options::options_description. The default value of the variable is set to 10, which is why the program returned the 10th value of the Fibonacci sequence when we first ran it without the --n-value option. The constructor argument of the description is the text that appears when a user uses the --hpx:help option to see what command line options are available. HPX_APPLICATION_STRING is a macro that expands to a string constant containing the name of the HPX application currently being compiled.

If you wish to add command line options to your own program, add them in main() using an instance of options_description and its public member function add_options() (see Boost Documentation for more details). hpx::local::init calls hpx_main() after setting up HPX, which is where the logic of our program is encoded.

int main(int argc, char* argv[])
{
    // Configure application-specific options
    hpx::program_options::options_description desc_commandline(
        "Usage: " HPX_APPLICATION_STRING " [options]");

    desc_commandline.add_options()("n-value",
        hpx::program_options::value<std::uint64_t>()->default_value(10),
        "n value for the Fibonacci function");

    // Initialize and run HPX
    hpx::local::init_params init_args;
    init_args.desc_cmdline = desc_commandline;

    return hpx::local::init(hpx_main, argc, argv, init_args);
}

The hpx::local::init function in main() starts the runtime system, and invokes hpx_main() as the first HPX-thread. Below we can see that the basic program is simple. The command line option --n-value is read in, a timer (hpx::chrono::high_resolution_timer) is set up to record the time it takes to do the computation, the fibonacci function is invoked synchronously, and the answer is printed out.

int hpx_main(hpx::program_options::variables_map& vm)
{
    // extract command line argument, i.e. fib(N)
    std::uint64_t n = vm["n-value"].as<std::uint64_t>();

    {
        // Keep track of the time required to execute.
        hpx::chrono::high_resolution_timer t;

        std::uint64_t r = fibonacci(n);

        char const* fmt = "fibonacci({1}) == {2}\nelapsed time: {3} [s]\n";
        hpx::util::format_to(std::cout, fmt, n, r, t.elapsed());
    }

    return hpx::local::finalize();    // Handles HPX shutdown
}

The fibonacci function itself is called synchronously, while the work done inside it is asynchronous. To understand what is happening we have to look inside the fibonacci function:

std::uint64_t fibonacci(std::uint64_t n)
{
    if (n < 2)
        return n;

    // Invoking the Fibonacci algorithm twice is inefficient.
    // However, we intentionally demonstrate it this way to create some
    // heavy workload.

    hpx::future<std::uint64_t> n1 = hpx::async(fibonacci, n - 1);
    hpx::future<std::uint64_t> n2 = hpx::async(fibonacci, n - 2);

    return n1.get() +
        n2.get();    // wait for the Futures to return their values
}

This block of code looks similar to regular C++ code. First, if (n < 2), meaning n is 0 or 1, then we return 0 or 1 (recall the first element of the Fibonacci sequence is 0 and the second is 1). If n is larger than 1 we spawn two new tasks whose results are contained in n1 and n2. This is done using hpx::async which takes as arguments a function (function pointer, object or lambda) and the arguments to the function. Instead of returning a std::uint64_t like fibonacci does, hpx::async returns a future of a std::uint64_t, i.e. hpx::future<std::uint64_t>. Each of these futures represents an asynchronous, recursive call to fibonacci. After we’ve created the futures, we wait for both of them to finish computing, we add them together, and return that value as our result. We get the values from the futures using the get method. The recursive call tree will continue until n is equal to 0 or 1, at which point the value can be returned because it is implicitly known. When this termination condition is reached, the futures can then be added up, producing the n-th value of the Fibonacci sequence.

Note that calling get potentially blocks the calling HPX-thread, and lets other HPX-threads run in the meantime. There are, however, more efficient ways of doing this. examples/quickstart/fibonacci_futures.cpp contains many more variations of locally computing the Fibonacci numbers, where each method makes different tradeoffs in where asynchrony and parallelism are applied. To get started, however, the method above is sufficient and optimizations can be applied once you are more familiar with HPX. The example Dataflow presents dataflow, which is a way to more efficiently chain together multiple tasks.
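One common tradeoff of this kind is to stop spawning tasks once the subproblem becomes small. The sketch below illustrates the idea with std::async from the standard library as a stand-in for hpx::async. It is not taken from fibonacci_futures.cpp: the names fib_seq and fib_cutoff and the default cutoff of 15 are assumptions chosen for illustration.

```cpp
#include <cstdint>
#include <future>

// Plain sequential recursion, used below the cutoff.
std::uint64_t fib_seq(std::uint64_t n)
{
    return n < 2 ? n : fib_seq(n - 1) + fib_seq(n - 2);
}

// Hypothetical variant: only spawn a task while the subproblem is large.
std::uint64_t fib_cutoff(std::uint64_t n, std::uint64_t cutoff = 15)
{
    // Small inputs (and the base cases 0 and 1) are computed directly.
    if (n < 2 || n < cutoff)
        return fib_seq(n);

    // One branch runs as a separate task, the other on the current thread.
    std::future<std::uint64_t> n1 =
        std::async(std::launch::async, fib_cutoff, n - 1, cutoff);
    std::uint64_t n2 = fib_cutoff(n - 2, cutoff);

    return n1.get() + n2;    // wait for the spawned branch and combine
}
```

A larger cutoff means fewer, coarser tasks. A smaller cutoff exposes more parallelism at the cost of more task-creation overhead, which matters far more with OS threads than with lightweight HPX-threads.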

Parallel algorithms#

This program will perform a matrix multiplication in parallel. The output will look something like this:

Matrix A is :
4 9 6
1 9 8

Matrix B is :
4 9
6 1
9 8

Resultant Matrix is :
124 93
130 82

Setup#

The source code for this example can be found here: matrix_multiplication.cpp.

To compile this program, go to your HPX build directory (see Building HPX for information on configuring and building HPX) and enter:

$ make examples.quickstart.matrix_multiplication

To run the program type:

$ ./bin/matrix_multiplication

or:

$ ./bin/matrix_multiplication --n 2 --m 3 --k 2 --s 100 --l 0 --u 10

where the first matrix is n x m and the second is m x k, s is the seed used to create the random values of the matrices, and [l, u] is the range of these values.

This should print:

Matrix A is :
4 9 6
1 9 8

Matrix B is :
4 9
6 1
9 8

Resultant Matrix is :
124 93
130 82

Notice that the numbers may be different because of the random initialization of the matrices.

Walkthrough#

Now that you have compiled and run the code, let’s look at how the code works.

First, main() is used to initialize the runtime system and pass the command line arguments to the program. hpx::local::init calls hpx_main() after setting up HPX, which is where our program is implemented.

int main(int argc, char* argv[])
{
    using namespace hpx::program_options;
    options_description cmdline("usage: " HPX_APPLICATION_STRING " [options]");
    // clang-format off
    cmdline.add_options()
        ("n",
        hpx::program_options::value<std::size_t>()->default_value(2),
        "Number of rows of first matrix")
        ("m",
        hpx::program_options::value<std::size_t>()->default_value(3),
        "Number of columns of first matrix (equal to the number of rows of "
        "second matrix)")
        ("k",
        hpx::program_options::value<std::size_t>()->default_value(2),
        "Number of columns of second matrix")
        ("seed,s",
        hpx::program_options::value<unsigned int>(),
        "The random number generator seed to use for this run")
        ("l",
        hpx::program_options::value<int>()->default_value(0),
        "Lower limit of range of values")
        ("u",
        hpx::program_options::value<int>()->default_value(10),
        "Upper limit of range of values");
    // clang-format on
    hpx::local::init_params init_args;
    init_args.desc_cmdline = cmdline;

    return hpx::local::init(hpx_main, argc, argv, init_args);
}

Proceeding to the hpx_main() function, we can see that matrix multiplication can be done very easily.

int hpx_main(hpx::program_options::variables_map& vm)
{
    using element_type = int;

    // Define matrix sizes
    std::size_t const rowsA = vm["n"].as<std::size_t>();
    std::size_t const colsA = vm["m"].as<std::size_t>();
    std::size_t const rowsB = colsA;
    std::size_t const colsB = vm["k"].as<std::size_t>();
    std::size_t const rowsR = rowsA;
    std::size_t const colsR = colsB;

    // Initialize matrices A and B
    std::vector<int> A(rowsA * colsA);
    std::vector<int> B(rowsB * colsB);
    std::vector<int> R(rowsR * colsR);

    // Define seed
    unsigned int seed = std::random_device{}();
    if (vm.count("seed"))
        seed = vm["seed"].as<unsigned int>();

    // gen is a random number engine defined outside this function in the
    // full example source.
    gen.seed(seed);
    std::cout << "using seed: " << seed << std::endl;

    // Define range of values
    int const lower = vm["l"].as<int>();
    int const upper = vm["u"].as<int>();

    // Matrices have random values in the range [lower, upper]
    std::uniform_int_distribution<element_type> dis(lower, upper);
    auto generator = std::bind(dis, gen);
    hpx::ranges::generate(A, generator);
    hpx::ranges::generate(B, generator);

    // Perform matrix multiplication
    hpx::experimental::for_loop(hpx::execution::par, 0, rowsA, [&](auto i) {
        hpx::experimental::for_loop(0, colsB, [&](auto j) {
            R[i * colsR + j] = 0;
            hpx::experimental::for_loop(0, rowsB, [&](auto k) {
                R[i * colsR + j] += A[i * colsA + k] * B[k * colsB + j];
            });
        });
    });

    // Print all 3 matrices
    print_matrix(A, rowsA, colsA, "A");
    print_matrix(B, rowsB, colsB, "B");
    print_matrix(R, rowsR, colsR, "R");

    return hpx::local::finalize();
}

First, the dimensions of the matrices are defined. If they were not given as command-line arguments, their default values are 2 x 3 for the first matrix and 3 x 2 for the second. We use standard vectors to define the matrices to be multiplied as well as the resultant matrix.

To give some random initial values to our matrices, we use std::uniform_int_distribution. Then, std::bind() is used along with hpx::ranges::generate() to yield two matrices A and B, which contain values in the range of [0, 10] or in the range defined by the user through command-line arguments. The seed to generate the values can also be defined by the user.
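The seeding and filling step can be reproduced in isolation with standard library facilities only. This is a minimal sketch under assumptions: random_matrix is a hypothetical helper, std::mt19937 is an assumed choice of engine, and std::generate stands in for hpx::ranges::generate.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: fill a rows x cols row-major matrix with values in
// [lower, upper], using an explicitly seeded engine.
std::vector<int> random_matrix(std::size_t rows, std::size_t cols,
    unsigned int seed, int lower, int upper)
{
    std::mt19937 gen(seed);    // assumed engine type
    std::uniform_int_distribution<int> dis(lower, upper);

    std::vector<int> m(rows * cols);
    // The lambda captures the engine by reference, so each call advances it.
    std::generate(m.begin(), m.end(), [&] { return dis(gen); });
    return m;
}
```

Because the engine is constructed from the seed on every call, two calls with the same seed and dimensions return identical matrices, which is what makes a run reproducible.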

The next step is to perform the matrix multiplication in parallel. This can be done by just using an hpx::experimental::for_loop combined with a parallel execution policy hpx::execution::par as the outer loop of the multiplication. Note that the execution of hpx::experimental::for_loop without specifying an execution policy is equivalent to specifying hpx::execution::seq as the execution policy.
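The same decomposition, parallel over the rows of the first matrix and sequential inside each row, can be sketched without HPX. The example below assumes std::async as a stand-in for the parallel for_loop and launches one task per row; multiply_rows_parallel is a hypothetical name. One OS thread per row is only reasonable for small matrices, whereas the HPX execution policy distributes the iterations over its worker threads.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper: R = A * B for row-major matrices, one task per row of A.
std::vector<int> multiply_rows_parallel(std::vector<int> const& A,
    std::vector<int> const& B, std::size_t rowsA, std::size_t colsA,
    std::size_t colsB)
{
    std::vector<int> R(rowsA * colsB, 0);
    std::vector<std::future<void>> rows;
    rows.reserve(rowsA);

    for (std::size_t i = 0; i < rowsA; ++i)
    {
        // Each task writes only to row i of R, so no locking is needed.
        rows.push_back(std::async(std::launch::async, [&, i] {
            for (std::size_t j = 0; j < colsB; ++j)
                for (std::size_t k = 0; k < colsA; ++k)
                    R[i * colsB + j] += A[i * colsA + k] * B[k * colsB + j];
        }));
    }

    for (auto& row : rows)
        row.get();    // wait for every row to finish
    return R;
}
```

For the 2 x 2 matrices A = {1, 2, 3, 4} and B = {5, 6, 7, 8} the function returns {19, 22, 43, 50}.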

Finally, the matrices A, B that are multiplied as well as the resultant matrix R are printed using the following function.

void print_matrix(std::vector<int> const& M, std::size_t rows, std::size_t cols,
    char const* message)
{
    std::cout << "\nMatrix " << message << " is:" << std::endl;
    for (std::size_t i = 0; i < rows; i++)
    {
        for (std::size_t j = 0; j < cols; j++)
            std::cout << M[i * cols + j] << " ";
        std::cout << "\n";
    }
}

Asynchronous execution with actions#

This example extends the Asynchronous execution example by introducing actions: functions that can be run remotely. In this example, however, we will still only run the action locally. The mechanism to execute actions stays the same: hpx::async. Later examples will demonstrate running actions on remote localities (e.g. Remote execution with actions).

Setup#

The source code for this example can be found here: fibonacci.cpp.

To compile this program, go to your HPX build directory (see Building HPX for information on configuring and building HPX) and enter:

$ make examples.quickstart.fibonacci

To run the program type:

$ ./bin/fibonacci

This should print (time should be approximate):

fibonacci(10) == 55
elapsed time: 0.00186288 [s]

This run used the default settings, which calculate the tenth element of the Fibonacci sequence. To declare which Fibonacci value you want to calculate, use the --n-value option. Additionally you can use the --hpx:threads option to declare how many OS-threads you wish to use when running the program. For instance, running:

$ ./bin/fibonacci --n-value 20 --hpx:threads 4

Will yield:

fibonacci(20) == 6765
elapsed time: 0.233827 [s]

Walkthrough#

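The code needed to initialize the HPX runtime is nearly the same as in the Asynchronous execution example. The only difference is that hpx::init_params and hpx::init are used in place of their hpx::local counterparts, and hpx_main is not passed explicitly: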

int main(int argc, char* argv[])
{
    // Configure application-specific options
    hpx::program_options::options_description desc_commandline(
        "Usage: " HPX_APPLICATION_STRING " [options]");

    desc_commandline.add_options()("n-value",
        hpx::program_options::value<std::uint64_t>()->default_value(10),
        "n value for the Fibonacci function");

    // Initialize and run HPX
    hpx::init_params init_args;
    init_args.desc_cmdline = desc_commandline;

    return hpx::init(argc, argv, init_args);
}

The hpx::init function in main() starts the runtime system, and invokes hpx_main() as the first HPX-thread. The command line option --n-value is read in, a timer (hpx::chrono::high_resolution_timer) is set up to record the time it takes to do the computation, the fibonacci action is invoked synchronously, and the answer is printed out.

int hpx_main(hpx::program_options::variables_map& vm)
{
    // extract command line argument, i.e. fib(N)
    std::uint64_t n = vm["n-value"].as<std::uint64_t>();

    {
        // Keep track of the time required to execute.
        hpx::chrono::high_resolution_timer t;

        // Wait for fib() to return the value
        fibonacci_action fib;
        std::uint64_t r = fib(hpx::find_here(), n);

        char const* fmt = "fibonacci({1}) == {2}\nelapsed time: {3} [s]\n";
        hpx::util::format_to(std::cout, fmt, n, r, t.elapsed());
    }

    return hpx::finalize();    // Handles HPX shutdown
}

Upon a closer look we see that we’ve created a std::uint64_t to store the result of invoking our fibonacci_action fib. This action is invoked synchronously (the work done inside the action is itself asynchronous) and returns the requested element of the Fibonacci sequence. But wait, what is an action? And what is this fibonacci_action? For starters, an action is a wrapper for a function. By wrapping functions, HPX can send packets of work to different processing units. These vehicles allow users to calculate work now, later, or on certain nodes. The first argument to our action is the location where the action should be run. In this case, we just want to run the action on the machine that we are currently on, so we use hpx::find_here. To further understand this we turn to the code to find where fibonacci_action was defined:

// forward declaration of the Fibonacci function
std::uint64_t fibonacci(std::uint64_t n);

// This is to generate the required boilerplate we need for the remote
// invocation to work.
HPX_PLAIN_ACTION(fibonacci, fibonacci_action)

A plain action is the most basic form of action. Plain actions wrap simple global functions which are not associated with any particular object (we will discuss other types of actions in Components and actions). In this block of code the function fibonacci() is declared. After the declaration, the function is wrapped in an action using the HPX_PLAIN_ACTION macro. This macro takes two arguments: the name of the function that is to be wrapped and the name of the action that you are creating.

This picture should now start making sense. The function fibonacci() is wrapped in an action fibonacci_action, which is invoked synchronously, creates asynchronous work, and then returns a std::uint64_t representing the result of the function fibonacci(). Now, let’s look at the function fibonacci():

std::uint64_t fibonacci(std::uint64_t n)
{
    if (n < 2)
        return n;

    // We restrict ourselves to execute the Fibonacci function locally.
    hpx::id_type const locality_id = hpx::find_here();

    // Invoking the Fibonacci algorithm twice is inefficient.
    // However, we intentionally demonstrate it this way to create some
    // heavy workload.

    fibonacci_action fib;
    hpx::future<std::uint64_t> n1 = hpx::async(fib, locality_id, n - 1);
    hpx::future<std::uint64_t> n2 = hpx::async(fib, locality_id, n - 2);

    return n1.get() +
        n2.get();    // wait for the Futures to return their values
}

This block of code should look familiar from the Asynchronous execution example. First, if (n < 2), meaning n is 0 or 1, then we return 0 or 1 (recall the first element of the Fibonacci sequence is 0 and the second is 1). If n is larger than 1 we spawn two tasks using hpx::async, this time passing the action and the locality on which to run it. Each of the resulting futures represents an asynchronous, recursive call to fibonacci. As before, we wait for both futures to finish computing, get the results, add them together, and return that value as our result. The recursive call tree will continue until n is equal to 0 or 1, at which point the value can be returned because it is implicitly known. When this termination condition is reached, the futures can then be added up, producing the n-th value of the Fibonacci sequence.

Remote execution with actions#

This program will print out a hello world message on every OS-thread on every locality. The output will look something like this:

hello world from OS-thread 1 on locality 0
hello world from OS-thread 1 on locality 1
hello world from OS-thread 0 on locality 0
hello world from OS-thread 0 on locality 1

Setup#

The source code for this example can be found here: hello_world_distributed.cpp.

To compile this program, go to your HPX build directory (see Building HPX for information on configuring and building HPX) and enter:

$ make examples.quickstart.hello_world_distributed

To run the program type:

$ ./bin/hello_world_distributed

This should print:

hello world from OS-thread 0 on locality 0

To use more OS-threads use the command line option --hpx:threads and type the number of threads that you wish to use. For example, typing:

$ ./bin/hello_world_distributed --hpx:threads 2

will yield:

hello world from OS-thread 1 on locality 0
hello world from OS-thread 0 on locality 0

Notice that the ordering of the two print statements may change between runs. To run this program on multiple localities please see the section How to use HPX applications with PBS.

Walkthrough#

Now that you have compiled and run the code, let’s look at how the code works, beginning with main():

// Here is the main entry point. By using the include 'hpx/hpx_main.hpp' HPX
// will invoke the plain old C-main() as its first HPX thread.
int main()
{
    // Get a list of all available localities.
    std::vector<hpx::id_type> localities = hpx::find_all_localities();

    // Reserve storage space for futures, one for each locality.
    std::vector<hpx::future<void>> futures;
    futures.reserve(localities.size());

    for (hpx::id_type const& node : localities)
    {
        // Asynchronously start a new task. The task is encapsulated in a
        // future, which we can query to determine if the task has
        // completed.
        typedef hello_world_foreman_action action_type;
        futures.push_back(hpx::async<action_type>(node));
    }

    // The non-callback version of hpx::wait_all takes a single parameter,
    // a vector of futures to wait on. hpx::wait_all only returns when
    // all of the futures have finished.
    hpx::wait_all(futures);
    return 0;
}

In this excerpt of the code we again see the use of futures. This time the futures are stored in a vector so that they can easily be accessed. hpx::wait_all is a family of functions that wait for a set of futures, here a std::vector<>, to become ready. In this piece of code, we are using the synchronous version of hpx::wait_all, which takes one argument (the std::vector<> of futures to wait on). This function will not return until all the futures in the vector are ready.
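
The pattern of launching one task per target, collecting the futures in a vector, and then waiting for all of them can be sketched in standard C++ as follows. This is an analogue under stated assumptions: the standard library has no wait_all, so the sketch waits on each future in turn, and greet and greet_all are hypothetical names that stand in for the action and for main().

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for the work done on one locality.
std::string greet(std::size_t locality)
{
    return "hello world from locality " + std::to_string(locality);
}

// Launch one task per locality index, then wait for all of them.
std::vector<std::string> greet_all(std::size_t num_localities)
{
    // Reserve storage space for futures, one for each locality.
    std::vector<std::future<std::string>> futures;
    futures.reserve(num_localities);

    for (std::size_t i = 0; i != num_localities; ++i)
        futures.push_back(std::async(std::launch::async, greet, i));

    // Waiting on every future in turn returns only after all are ready.
    for (auto& f : futures)
        f.wait();

    // All futures are ready now, so get() does not block.
    std::vector<std::string> results;
    results.reserve(futures.size());
    for (auto& f : futures)
    {
        assert(f.valid());
        results.push_back(f.get());
    }
    return results;
}
```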

In Asynchronous execution with actions we used hpx::find_here to specify the target of our actions. Here, we instead use hpx::find_all_localities, which returns a std::vector<> containing the identifiers of all the localities in the system, including the one that we are on.

As in Asynchronous execution with actions our futures are set using hpx::async<>. The hello_world_foreman_action is declared here:

// Define the boilerplate code necessary for the function 'hello_world_foreman'
// to be invoked as an HPX action.
HPX_PLAIN_ACTION(hello_world_foreman, hello_world_foreman_action)

Another way of thinking about this wrapping technique is as follows: functions (the work to be done) are wrapped in actions, and actions can be executed locally or remotely (e.g. on another machine participating in the computation).

Now it is time to look at the hello_world_foreman() function which was wrapped in the action above:

void hello_world_foreman()
{
    // Get the number of worker OS-threads in use by this locality.
    std::size_t const os_threads = hpx::get_os_thread_count();

    // Populate a set with the OS-thread numbers of all OS-threads on this
    // locality. When the hello world message has been printed on a particular
    // OS-thread, we will remove it from the set.
    std::set<std::size_t> attendance;
    for (std::size_t os_thread = 0; os_thread < os_threads; ++os_thread)
        attendance.insert(os_thread);

    // As long as there are still elements in the set, we must keep scheduling
    // HPX-threads. Because HPX features work-stealing task schedulers, we have
    // no way of enforcing which worker OS-thread will actually execute
    // each HPX-thread.
    while (!attendance.empty())
    {
        // Each iteration, we create a task for each element in the set of
        // OS-threads that have not said "Hello world". Each of these tasks
        // is encapsulated in a future.
        std::vector<hpx::future<std::size_t>> futures;
        futures.reserve(attendance.size());

        for (std::size_t worker : attendance)
        {
            // Asynchronously start a new task. The task is encapsulated in a
            // future that we can query to determine if the task has completed.
            //
            // We give the task a hint to run on a particular worker thread
            // (core) and suggest binding the scheduled thread to the given
            // core, but no guarantees are given by the scheduler that the task
            // will actually run on that worker thread. It will however try as
            // hard as possible to place the new task on the given worker
            // thread.
            hpx::execution::parallel_executor exec(
                hpx::threads::thread_priority::bound);

            hpx::threads::thread_schedule_hint hint(
                hpx::threads::thread_schedule_hint_mode::thread,
                static_cast<std::int16_t>(worker));

            futures.push_back(
                hpx::async(hpx::execution::experimental::with_hint(exec, hint),
                    hello_world_worker, worker));
        }

        // Wait for all of the futures to finish. This version of
        // hpx::wait_each takes two arguments: a callback and a vector of
        // futures. The callback is wrapped in hpx::unwrapping, so it receives
        // the value of each future (here, the OS-thread number or
        // std::size_t(-1)) as that future becomes ready. hpx::wait_each
        // doesn't return until all the futures in the vector have become
        // ready.
        hpx::spinlock mtx;
        hpx::wait_each(hpx::unwrapping([&](std::size_t t) {
            if (std::size_t(-1) != t)
            {
                std::lock_guard<hpx::spinlock> lk(mtx);
                attendance.erase(t);
            }
        }),
            futures);
    }
}

Now, before we discuss hello_world_foreman(), let’s talk about the hpx::wait_each function. The version of hpx::wait_each used here invokes a callback function provided by the user, supplying the callback with the result of each future as it becomes ready.

In hello_world_foreman(), a std::set<> called attendance keeps track of which OS-threads have not yet printed the hello world message. When an OS-thread prints the message, hello_world_worker() returns that OS-thread’s number, the future is marked as ready, and the callback passed to hpx::wait_each in hello_world_foreman() removes the number from attendance. If the worker is not executing on the desired OS-thread, it returns std::size_t(-1) instead, which causes hello_world_foreman() to leave the OS-thread id in attendance and schedule it again in the next iteration. The worker function is shown below:

std::size_t hello_world_worker(std::size_t desired)
{
    // Returns the OS-thread number of the worker that is running this
    // HPX-thread.
    std::size_t current = hpx::get_worker_thread_num();
    if (current == desired)
    {
        // The HPX-thread has been run on the desired OS-thread.
        char const* msg = "hello world from OS-thread {1} on locality {2}\n";

        hpx::util::format_to(hpx::cout, msg, desired, hpx::get_locality_id())
            << std::flush;

        return desired;
    }

    // This HPX-thread has been run by the wrong OS-thread, make the foreman
    // try again by rescheduling it.
    return std::size_t(-1);
}

Because HPX features work-stealing task schedulers, there is no way to guarantee that an action will be scheduled on a particular OS-thread. This is why we must use a guess-and-check approach.
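
The guess-and-check loop can be isolated from the scheduler to make its logic easier to follow. The following is a minimal, single-threaded sketch in standard C++, not the HPX code: the callable placement is a hypothetical stand-in for the scheduler and reports which worker actually ran a task in a given round, and foreman returns the number of rounds that were needed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <vector>

// Marker returned when a task ran on the wrong worker.
constexpr std::size_t wrong_thread = std::size_t(-1);

// Keeps scheduling one task per missing worker until every worker has been
// hit once. 'placement(desired, round)' is a hypothetical scheduler model.
// Note: the loop does not terminate if a worker is never hit.
std::size_t foreman(std::size_t os_threads,
    std::function<std::size_t(std::size_t, std::size_t)> const& placement)
{
    std::set<std::size_t> attendance;
    for (std::size_t t = 0; t != os_threads; ++t)
        attendance.insert(t);

    std::size_t rounds = 0;
    while (!attendance.empty())
    {
        // One "task" per worker that has not reported yet.
        std::vector<std::size_t> results;
        results.reserve(attendance.size());
        for (std::size_t desired : attendance)
        {
            std::size_t const actual = placement(desired, rounds);
            results.push_back(actual == desired ? desired : wrong_thread);
        }

        // Remove the workers that were hit; the others are retried.
        for (std::size_t r : results)
        {
            if (r != wrong_thread)
            {
                assert(r < os_threads);
                attendance.erase(r);
            }
        }
        ++rounds;
    }
    return rounds;
}
```

If the placement model always honors the hint, one round suffices; every miss costs the affected worker one more round.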

Components and actions#

The accumulator example demonstrates the use of components. Components are C++ classes that expose methods as a type of HPX action. These actions are called component actions.

Components are globally named, meaning that a component action can be called remotely (e.g., from another machine). There are two accumulator examples in HPX.

In the Asynchronous execution with actions and the Remote execution with actions, we introduced plain actions, which wrapped global functions. The target of a plain action is an identifier which refers to a particular machine involved in the computation. For plain actions, the target is the machine where the action will be executed.

Component actions, however, do not target machines. Instead, they target component instances. The instance may live on the machine that we’ve invoked the component action from, or it may live on another machine.

The component in this example exposes three different functions:

  • reset() - Resets the accumulator value to 0.

  • add(arg) - Adds arg to the accumulator’s value.

  • query() - Queries the value of the accumulator.

This example creates an instance of the accumulator, and then allows the user to enter commands at a prompt, which subsequently invoke actions on the accumulator instance.
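
To make the semantics of the three functions concrete, here is a plain C++ sketch of such an accumulator. It is not an HPX component and cannot be called remotely. The std::mutex is an assumption of this sketch that serializes calls arriving from several threads, and std::int64_t is assumed as the argument type.

```cpp
#include <cstdint>
#include <mutex>

// Plain C++ analogue of the accumulator; not an HPX component.
class accumulator
{
public:
    using argument_type = std::int64_t;    // assumed value type

    // Resets the accumulator value to 0.
    void reset()
    {
        std::lock_guard<std::mutex> lk(mtx_);
        value_ = 0;
    }

    // Adds arg to the accumulator's value.
    void add(argument_type arg)
    {
        std::lock_guard<std::mutex> lk(mtx_);
        value_ += arg;
    }

    // Queries the value of the accumulator.
    argument_type query() const
    {
        std::lock_guard<std::mutex> lk(mtx_);
        return value_;
    }

private:
    mutable std::mutex mtx_;    // serializes all member calls
    argument_type value_ = 0;
};
```

Adding 5 and then 10 makes query() return 15, and after reset() followed by add(1) it returns 1.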

Setup#

The source code for this example can be found here: accumulator_client.cpp.

To compile this program, go to your HPX build directory (see Building HPX for information on configuring and building HPX) and enter:

$ make examples.accumulators.accumulator

To run the program type:

$ ./bin/accumulator_client

Once the program starts running, it will print the following prompt and then wait for input. An example session is given below:

commands: reset, add [amount], query, help, quit
> add 5
> add 10
> query
15
> add 2
> query
17
> reset
> add 1
> query
1
> quit

Walkthrough#

Now, let’s take a look at the source code of the accumulator example. This example consists of two parts: an HPX component library (a library that exposes an HPX component) and a client application which uses the library. This walkthrough will cover the HPX component library. The code for the client application can be found here: accumulator_client.cpp.

An HPX component is represented by two C++ classes:

  • A server class - The implementation of the component’s functionality.

  • A client class - A high-level interface that acts as a proxy for an instance of the component.

Typically, these two classes have the same name, but the server class usually lives in a different sub-namespace (server). For example, the full names of the two classes in accumulator are:

  • examples::server::accumulator (server class)

  • examples::accumulator (client class)

The server class#

The following code is from: accumulator.hpp.

All HPX component server classes must inherit publicly from the HPX component base class: hpx::components::component_base

The accumulator component inherits from hpx::components::locking_hook. This allows the runtime system to ensure that all action invocations are serialized. That means that the system ensures that no two actions are invoked at the same time on a given component instance. This makes the component thread safe and no additional locking has to be implemented by the user. Moreover, an accumulator component is a component because it also inherits from hpx::components::component_base (the template argument passed to locking_hook is used as its base class). The following snippet shows the corresponding code:

    class accumulator
      : public hpx::components::locking_hook<
            hpx::components::component_base<accumulator>>

Our accumulator class will need a data member to store its value in, so let’s declare a data member:

        argument_type value_;

The constructor for this class simply initializes value_ to 0:

        accumulator()
          : value_(0)
        {
        }

Next, let’s look at the three methods of this component that we will be exposing as component actions:

Here are the action types. These types wrap the methods we’re exposing. The wrapping technique is very similar to the one used in the Asynchronous execution with actions and the Remote execution with actions:

        HPX_DEFINE_COMPONENT_ACTION(accumulator, reset)
        HPX_DEFINE_COMPONENT_ACTION(accumulator, add)
        HPX_DEFINE_COMPONENT_ACTION(accumulator, query)

The last piece of code in the server class header is the declaration of the action type registration code:

HPX_REGISTER_ACTION_DECLARATION(
    examples::server::accumulator::reset_action, accumulator_reset_action)

HPX_REGISTER_ACTION_DECLARATION(
    examples::server::accumulator::add_action, accumulator_add_action)

HPX_REGISTER_ACTION_DECLARATION(
    examples::server::accumulator::query_action, accumulator_query_action)

Note

The code above must be placed in the global namespace.

The rest of the registration code is in accumulator.cpp

///////////////////////////////////////////////////////////////////////////////
// Add factory registration functionality.
HPX_REGISTER_COMPONENT_MODULE()

///////////////////////////////////////////////////////////////////////////////
typedef hpx::components::component<examples::server::accumulator>
    accumulator_type;

HPX_REGISTER_COMPONENT(accumulator_type, accumulator)

///////////////////////////////////////////////////////////////////////////////
// Serialization support for accumulator actions.
HPX_REGISTER_ACTION(
    accumulator_type::wrapped_type::reset_action, accumulator_reset_action)
HPX_REGISTER_ACTION(
    accumulator_type::wrapped_type::add_action, accumulator_add_action)
HPX_REGISTER_ACTION(
    accumulator_type::wrapped_type::query_action, accumulator_query_action)

Note

The code above must be placed in the global namespace.

The client class#

The following code is from accumulator.hpp.

The client class is the primary interface to a component instance. Client classes are used to create components:

// Create a component on this locality.
examples::accumulator c = hpx::new_<examples::accumulator>(hpx::find_here());

and to invoke component actions:

c.add(hpx::launch::apply, 4);

Clients, like servers, need to inherit from a base class, this time, hpx::components::client_base:

    class accumulator
      : public hpx::components::client_base<accumulator, server::accumulator>

For readability, we typedef the base class like so:

        typedef hpx::components::client_base<accumulator, server::accumulator>
            base_type;

Here are examples of how to expose actions through a client class:

There are a few different ways of invoking actions:

  • Non-blocking: For actions that don’t have return types, or when we do not care about the result of an action, we can invoke the action using fire-and-forget semantics. This means that once we have asked HPX to compute the action, we forget about it completely and continue with our computation. We use hpx::post to invoke an action in a non-blocking fashion.

        void reset(hpx::launch::apply_policy)
        {
            HPX_ASSERT(this->get_id());

            typedef server::accumulator::reset_action action_type;
            hpx::post<action_type>(this->get_id());
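        }

  • Asynchronous: Futures, as demonstrated in Asynchronous execution with actions and Remote execution with actions, enable asynchronous action invocation. Here’s an example from the accumulator client class:

        hpx::future<argument_type> query(hpx::launch::async_policy)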
        {
            HPX_ASSERT(this->get_id());

            typedef server::accumulator::query_action action_type;
            return hpx::async<action_type>(hpx::launch::async, this->get_id());
        }

  • Synchronous: To invoke an action in a fully synchronous manner, we can simply call hpx::async().get() (i.e., create a future and immediately wait on it to be ready). The accumulator client class achieves the same effect by invoking the action object directly, which returns only after the action has completed:

        void add(argument_type arg)
        {
            HPX_ASSERT(this->get_id());

            typedef server::accumulator::add_action action_type;
            action_type()(this->get_id(), arg);
        }

Note that this->get_id() returns the global identifier stored in the hpx::components::client_base base class, which identifies the server accumulator instance.

hpx::naming::id_type is a type which represents a global identifier in HPX. This type specifies the target of an action. It is also the type returned by hpx::find_here, in which case it represents the locality the code is running on.
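
The three invocation styles can be contrasted in a small standard C++ sketch. It is an analogue only: counter_server and counter_client are hypothetical classes, a std::shared_ptr stands in for the global identifier, and std::async stands in for hpx::post and hpx::async. Because the standard library has no true fire-and-forget call, the sketch stores the future of a posted operation so that its destructor does not block the caller.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical server object; a mutex serializes access to the value.
class counter_server
{
public:
    void reset() { std::lock_guard<std::mutex> lk(mtx_); value_ = 0; }
    void add(long arg) { std::lock_guard<std::mutex> lk(mtx_); value_ += arg; }
    long query() const { std::lock_guard<std::mutex> lk(mtx_); return value_; }

private:
    mutable std::mutex mtx_;
    long value_ = 0;
};

// Hypothetical client proxy; the shared_ptr plays the role of the id.
class counter_client
{
public:
    counter_client() : server_(std::make_shared<counter_server>()) {}

    // Non-blocking: start the work and do not wait for it here.
    void reset_post()
    {
        assert(server_);
        pending_.push_back(
            std::async(std::launch::async, [s = server_] { s->reset(); }));
    }

    // Asynchronous: hand a future to the caller.
    std::future<long> query_async()
    {
        assert(server_);
        return std::async(
            std::launch::async, [s = server_] { return s->query(); });
    }

    // Synchronous: create a future and immediately wait on it.
    void add_sync(long arg)
    {
        assert(server_);
        std::async(std::launch::async, [s = server_, arg] { s->add(arg); })
            .get();
    }

    // Helper for this sketch only: wait for all posted operations.
    void wait_pending()
    {
        for (auto& f : pending_)
            f.get();
        pending_.clear();
    }

private:
    std::shared_ptr<counter_server> server_;
    std::vector<std::future<void>> pending_;
};
```

The helper wait_pending() exists only so that the effect of a posted operation can be observed deterministically; with real fire-and-forget semantics the caller gets no such completion signal.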

Dataflow#

HPX provides its users with several different tools to express parallel concepts simply. One of these tools is a local control object (LCO) called dataflow. An LCO is a type of component that can spawn a new thread when triggered. LCOs are also distinguished from other components by a standard interface that allows users to understand and use them easily. A dataflow, being an LCO, is triggered when the values it depends on become available. For instance, if you have a calculation X that depends on the results of three other calculations, you could set up a dataflow that would begin the calculation X as soon as the other three calculations have returned their values. Dataflows can be set up to depend on other dataflows. It is this property that makes dataflow a powerful parallelization tool. If you understand the dependencies of your calculation, you can devise a simple algorithm that sets up a dependency tree to be executed.

In this example, we calculate compound interest. To calculate compound interest, one must calculate the interest made in each compound period, and then add that interest back to the principal before calculating the interest made in the next period. A practical person would, of course, use the formula for compound interest:

\[F = P(1 + i) ^ n\]

where \(F\) is the future value, \(P\) is the principal value, \(i\) is the interest rate, and \(n\) is the number of compound periods.

However, for the sake of this example, we have chosen to manually calculate the future value by iterating:

\[I = Pi\]

and

\[P = P + I\]
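
The relationship between the closed formula and the iteration can be checked with a few lines of plain C++. This is a sketch for illustration; the function names future_value_closed and future_value_iterated are assumptions introduced here, and the rate is expected as a fraction (0.05 for 5%).

```cpp
#include <cassert>
#include <cmath>

// Closed form: F = P (1 + i)^n
double future_value_closed(double principal, double rate, int periods)
{
    assert(periods >= 0);
    return principal * std::pow(1.0 + rate, periods);
}

// Iterative form: repeat I = P i and P = P + I once per compound period.
double future_value_iterated(double principal, double rate, int periods)
{
    assert(periods >= 0);
    for (int n = 0; n < periods; ++n)
    {
        double const interest = principal * rate;    // I = P i
        principal += interest;                       // P = P + I
    }
    return principal;
}
```

For P = 100, i = 0.05 and n = 6, both functions return approximately 134.0096; they agree up to floating-point rounding.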

Setup#

The source code for this example can be found here: interest_calculator.cpp.

To compile this program, go to your HPX build directory (see Building HPX for information on configuring and building HPX) and enter:

$ make examples.quickstart.interest_calculator

To run the program type:

$ ./bin/interest_calculator --principal 100 --rate 5 --cp 6 --time 36
Final amount: 134.01
Amount made: 34.0096

Walkthrough#

Let us begin with main. Here we can see that we again are using Boost.Program_options to set our command line variables (see Asynchronous execution with actions for more details). These options set the principal, rate, compound period, and time. It is important to note that the units of time for cp and time must be the same.

int main(int argc, char** argv)
{
    options_description cmdline("Usage: " HPX_APPLICATION_STRING " [options]");

    cmdline.add_options()("principal", value<double>()->default_value(1000),
        "The principal [$]")("rate", value<double>()->default_value(7),
        "The interest rate [%]")("cp", value<int>()->default_value(12),
        "The compound period [months]")("time",
        value<int>()->default_value(12 * 30),
        "The time money is invested [months]");

    hpx::init_params init_args;
    init_args.desc_cmdline = cmdline;

    return hpx::init(argc, argv, init_args);
}

Next we look at hpx_main.

int hpx_main(variables_map& vm)
{
    {
        using hpx::dataflow;
        using hpx::make_ready_future;
        using hpx::shared_future;
        using hpx::unwrapping;
        hpx::id_type here = hpx::find_here();

        double init_principal =
            vm["principal"].as<double>();              //Initial principal
        double init_rate = vm["rate"].as<double>();    //Interest rate
        int cp = vm["cp"].as<int>();     //Length of a compound period
        int t = vm["time"].as<int>();    //Length of time money is invested

        init_rate /= 100;    //Rate is a % and must be converted
        t /= cp;    //Determine how many times to iterate interest calculation:
            //How many full compound periods can fit in the time invested

        // In non-dataflow terms the implemented algorithm would look like:
        //
        // int t = 5;    // number of time periods to use
        // double principal = init_principal;
        // double rate = init_rate;
        //
        // for (int i = 0; i < t; ++i)
        // {
        //     double interest = calc(principal, rate);
        //     principal = add(principal, interest);
        // }
        //
        // Please note the similarity with the code below!

        shared_future<double> principal = make_ready_future(init_principal);
        shared_future<double> rate = make_ready_future(init_rate);

        for (int i = 0; i < t; ++i)
        {
            shared_future<double> interest =
                dataflow(unwrapping(calc), principal, rate);
            principal = dataflow(unwrapping(add), principal, interest);
        }

        // wait for the dataflow execution graph to be finished calculating our
        // overall interest
        double result = principal.get();

        std::cout << "Final amount: " << result << std::endl;
        std::cout << "Amount made: " << result - init_principal << std::endl;
    }

    return hpx::finalize();
}

Here the command line variables are read in, the rate is converted from a percent to a decimal, the number of calculation iterations is determined, and then our shared_futures are set up. Notice that we first place our principal and rate into shared futures by passing the variables init_principal and init_rate to hpx::make_ready_future.

In this way hpx::shared_future<double> principal and rate will be initialized to init_principal and init_rate when hpx::make_ready_future<double> returns a future containing those initial values. These shared futures then enter the for loop and are passed to interest. Next principal and interest are passed to the reassignment of principal using an hpx::dataflow. A dataflow will first wait for its arguments to be ready before launching any callbacks, so add in this case will not begin until both principal and interest are ready. This loop continues for each compound period that must be calculated. To see how interest and principal are calculated in the loop, let us look at the functions calc and add:

// Calculate interest for one period
double calc(double principal, double rate)
{
    return principal * rate;
}

///////////////////////////////////////////////////////////////////////////////
// Add the amount made to the principal
double add(double principal, double interest)
{
    return principal + interest;
}

After the shared future dependencies have been defined in hpx_main, we see the following statement:

double result = principal.get();

This statement calls hpx::shared_future::get on the shared future principal, whose value is calculated by the dataflow chain set up in our for loop. The program will wait here until the entire dataflow tree has been calculated and the value is assigned to result. The program then prints out the final value of the investment and the amount of interest made, which is obtained by subtracting the initial value of the investment from the final value.
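
The shape of this dependency chain can be approximated with the C++ standard library. The sketch below is not equivalent to hpx::dataflow: it uses std::async with std::launch::deferred, so each task runs lazily on the thread that eventually calls get(), whereas an HPX dataflow is scheduled as soon as its inputs are ready. The helper make_ready and the function compound are assumptions introduced here.

```cpp
#include <cassert>
#include <future>

// Calculate interest for one period.
double calc(double principal, double rate) { return principal * rate; }

// Add the amount made to the principal.
double add(double principal, double interest) { return principal + interest; }

// The standard library has no make_ready_future; a promise whose value is
// set immediately yields an already-ready shared future.
std::shared_future<double> make_ready(double value)
{
    std::promise<double> p;
    p.set_value(value);
    return p.get_future().share();
}

double compound(double init_principal, double init_rate, int periods)
{
    assert(periods >= 0);
    std::shared_future<double> principal = make_ready(init_principal);
    std::shared_future<double> rate = make_ready(init_rate);

    for (int i = 0; i < periods; ++i)
    {
        // Each deferred task reads its inputs with get(), so it cannot
        // produce a value before the futures it depends on are ready.
        std::shared_future<double> interest =
            std::async(std::launch::deferred, [principal, rate] {
                return calc(principal.get(), rate.get());
            }).share();

        // The lambda captures the previous principal before reassignment.
        principal = std::async(std::launch::deferred, [principal, interest] {
            return add(principal.get(), interest.get());
        }).share();
    }

    // Evaluating the last future pulls in the whole dependency chain.
    return principal.get();
}
```

With a principal of 100, a rate of 0.05 and 6 periods, compound returns approximately 134.0096, which matches the sample run shown in the Setup section.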

Local to remote#

When developers write code they typically begin with a simple serial code and build upon it until all of the required functionality is present. The following set of examples was developed to demonstrate this iterative process of evolving a simple serial program into an efficient, fully distributed HPX application. For this demonstration, we implemented a 1D heat distribution problem. This calculation simulates the diffusion of heat across a ring from an initialized state to some user-defined point in the future. It does this by breaking the ring into discrete segments and using the current segment’s temperature and the temperature of the surrounding segments to calculate the temperature of the current segment in the next timestep, as shown by Fig. 2 below.

_images/1d_stencil_program_flow.png

Fig. 2 Heat diffusion example program flow.#

We parallelize this code over the following eight examples:

The first example is straight serial code. In this code we instantiate a vector U that contains two vectors of doubles as seen in the structure stepper.

struct stepper
{
    // Our partition type
    typedef double partition;

    // Our data for one time step
    typedef std::vector<partition> space;

    // Our operator
    static double heat(double left, double middle, double right)
    {
        return middle + (k * dt / (dx * dx)) * (left - 2 * middle + right);
    }

    // do all the work on 'nx' data points for 'nt' time steps
    space do_work(std::size_t nx, std::size_t nt)
    {
        // U[t][i] is the state of position i at time t.
        std::vector<space> U(2);
        for (space& s : U)
            s.resize(nx);

        // Initial conditions: f(0, i) = i
        for (std::size_t i = 0; i != nx; ++i)
            U[0][i] = double(i);

        // Actual time step loop
        for (std::size_t t = 0; t != nt; ++t)
        {
            space const& current = U[t % 2];
            space& next = U[(t + 1) % 2];

            next[0] = heat(current[nx - 1], current[0], current[1]);

            for (std::size_t i = 1; i != nx - 1; ++i)
                next[i] = heat(current[i - 1], current[i], current[i + 1]);

            next[nx - 1] = heat(current[nx - 2], current[nx - 1], current[0]);
        }

        // Return the solution at time-step 'nt'.
        return U[nt % 2];
    }
};

Each element in the vector of doubles represents a single grid point. To calculate the change in heat distribution, the temperature of each grid point, along with its neighbors, is passed to the function heat. In order to improve readability, references named current and next are created which, depending on the time step, point to the first and second vector of doubles. The first vector of doubles is initialized with a simple heat ramp. After calling the heat function with the data in the current vector, the results are placed into the next vector.
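
A single time step on a very small ring makes the update rule easy to verify by hand. The sketch below assumes the constants k = 0.5, dt = 1 and dx = 1, which are chosen here for easy arithmetic and are not taken from the example, and it uses modular indexing instead of treating the two boundary points separately.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed constants for this sketch; they give k * dt / (dx * dx) = 0.5.
constexpr double k = 0.5;
constexpr double dt = 1.0;
constexpr double dx = 1.0;

// The heat operator from the stepper.
double heat(double left, double middle, double right)
{
    return middle + (k * dt / (dx * dx)) * (left - 2 * middle + right);
}

// Advance the whole ring by one time step.
std::vector<double> step(std::vector<double> const& current)
{
    std::size_t const nx = current.size();
    assert(nx >= 1);

    std::vector<double> next(nx);
    for (std::size_t i = 0; i != nx; ++i)
    {
        // Neighbors wrap around because the domain is a ring.
        double const left = current[(i + nx - 1) % nx];
        double const right = current[(i + 1) % nx];
        next[i] = heat(left, current[i], right);
    }
    return next;
}
```

For the heat ramp {0, 1, 2, 3} one step yields {2, 1, 2, 1}. For example, the first point becomes 0 + 0.5 * (3 - 0 + 1) = 2, and the total amount of heat (6) is unchanged.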

In example 2 we employ a technique called futurization. Futurization is a method by which we can easily transform a code that is serially executed into a code that creates asynchronous threads. In the simplest case this involves replacing a variable with a future to a variable, a function with a future to a function, and adding a .get() at the point where a value is actually needed. The code below shows how this technique was applied to the struct stepper.

struct stepper
{
    // Our partition type
    typedef hpx::shared_future<double> partition;

    // Our data for one time step
    typedef std::vector<partition> space;

    // Our operator
    static double heat(double left, double middle, double right)
    {
        return middle + (k * dt / (dx * dx)) * (left - 2 * middle + right);
    }

    // do all the work on 'nx' data points for 'nt' time steps
    hpx::future<space> do_work(std::size_t nx, std::size_t nt)
    {
        using hpx::dataflow;
        using hpx::unwrapping;

        // U[t][i] is the state of position i at time t.
        std::vector<space> U(2);
        for (space& s : U)
            s.resize(nx);

        // Initial conditions: f(0, i) = i
        for (std::size_t i = 0; i != nx; ++i)
            U[0][i] = hpx::make_ready_future(double(i));

        auto Op = unwrapping(&stepper::heat);

        // Actual time step loop
        for (std::size_t t = 0; t != nt; ++t)
        {
            space const& current = U[t % 2];
            space& next = U[(t + 1) % 2];

            // WHEN U[t][i-1], U[t][i], and U[t][i+1] have been computed, THEN we
            // can compute U[t+1][i]
            for (std::size_t i = 0; i != nx; ++i)
            {
                next[i] =
                    dataflow(hpx::launch::async, Op, current[idx(i, -1, nx)],
                        current[i], current[idx(i, +1, nx)]);
            }
        }

        // Now the asynchronous computation is running; the above for-loop does not
        // wait on anything. There is no implicit waiting at the end of each timestep;
        // the computation of each U[t][i] will begin as soon as its dependencies
        // are ready and hardware is available.

        // Return the solution at time-step 'nt'.
        return hpx::when_all(U[nt % 2]);
    }
};

In example 2, we redefine our partition type as a shared_future and, in main, create the object result, which is a future to a vector of partitions. We use result to represent the last vector in a string of vectors created for each timestep. In order to move to the next timestep, the values of a partition and its neighbors must be passed to heat once the futures that contain them are ready. In HPX, we have an LCO (Local Control Object) named Dataflow that assists the programmer in expressing this dependency. Dataflow allows us to pass the results of a set of futures to a specified function when the futures are ready. Dataflow takes three types of arguments, one which instructs the dataflow on how to perform the function call (async or sync), the function to call (in this case Op), and futures to the arguments that will be passed to the function. When called, dataflow immediately returns a future to the result of the specified function. This allows users to string dataflows together and construct an execution tree.

After the values of the futures in dataflow are ready, the values must be pulled out of the future container to be passed to the function heat. In order to do this, we use the HPX facility unwrapping, which underneath calls .get() on each of the futures so that the function heat will be passed doubles and not futures to doubles.
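
Two helpers used by the futurized stepper are worth sketching in isolation. The excerpt calls idx(i, -1, nx) and idx(i, +1, nx) without showing its definition, so the version below is an assumption: it wraps indices around the ring. The function unwrap_args is a hypothetical standard C++ analogue of what unwrapping does conceptually; it is not the HPX implementation. The constants are again assumed values.

```cpp
#include <cassert>
#include <cstddef>
#include <future>

// Assumed constants for this sketch; they give k * dt / (dx * dx) = 0.5.
constexpr double k = 0.5;
constexpr double dt = 1.0;
constexpr double dx = 1.0;

double heat(double left, double middle, double right)
{
    return middle + (k * dt / (dx * dx)) * (left - 2 * middle + right);
}

// Assumed ring-index helper: neighbor of i in direction dir (-1 or +1).
std::size_t idx(std::size_t i, int dir, std::size_t size)
{
    assert(size > 0 && i < size && (dir == -1 || dir == +1));
    if (dir < 0)
        return i == 0 ? size - 1 : i - 1;
    return i + 1 == size ? 0 : i + 1;
}

// Hypothetical analogue of unwrapping: returns a callable that takes
// futures, calls get() on each, and forwards the values to f.
template <typename F>
auto unwrap_args(F f)
{
    return [f](auto const&... futures) { return f(futures.get()...); };
}

// Helper that produces an already-ready shared future.
std::shared_future<double> make_ready(double value)
{
    std::promise<double> p;
    p.set_value(value);
    return p.get_future().share();
}
```

With these helpers, unwrap_args(heat) can be called with three shared futures and passes three doubles to heat, which is the role Op plays in the stepper.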

By setting up the algorithm this way, the program will be able to execute as quickly as the dependencies of each future are met. Unfortunately, this example runs terribly slowly. This increase in execution time is caused by the overheads needed to create a future for each data point. Because the work done within each call to heat is very small, the overhead of creating and scheduling each of the three futures is greater than that of the actual useful work! In order to amortize the overheads of our synchronization techniques, we need to be able to control the amount of work that will be done with each future. We call the amount of work per unit of overhead the grain size.

In example 3, we return to our serial code to figure out how to control the grain size of our program. The strategy that we employ is to create “partitions” of data points. The user can define how many partitions are created and how many data points are contained in each partition. This is accomplished by creating the struct partition, which contains a member object data_, a vector of doubles that holds the data points assigned to a particular instance of partition.

In example 4, we take advantage of the partition setup by redefining space to be a vector of shared_futures with each future representing a partition. In this manner, each future represents several data points. Because the user can define how many data points are in each partition, and, therefore, how many data points are represented by one future, a user can control the grain size of the simulation. The rest of the code is then futurized in the same manner as example 2. It should be noted how strikingly similar example 4 is to example 2.
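
The effect of partitioning on grain size can be sketched with the standard library: one task is created per partition instead of one per data point. The code below is an illustration under stated assumptions and not the code of examples 3 or 4. The partition type is a plain std::vector<double>, std::async stands in for the HPX facilities, and the constants are assumed values.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Assumed constants for this sketch; they give k * dt / (dx * dx) = 0.5.
constexpr double k = 0.5;
constexpr double dt = 1.0;
constexpr double dx = 1.0;

double heat(double left, double middle, double right)
{
    return middle + (k * dt / (dx * dx)) * (left - 2 * middle + right);
}

using partition = std::vector<double>;

// Update all points of 'middle'. Only the last element of 'left' and the
// first element of 'right' are needed from the neighbors.
partition heat_part(
    partition const& left, partition const& middle, partition const& right)
{
    std::size_t const size = middle.size();
    assert(size >= 1 && !left.empty() && !right.empty());

    partition next(size);
    for (std::size_t i = 0; i != size; ++i)
    {
        double const l = (i == 0) ? left.back() : middle[i - 1];
        double const r = (i + 1 == size) ? right.front() : middle[i + 1];
        next[i] = heat(l, middle[i], r);
    }
    return next;
}

// One time step: one task, and therefore one future, per partition.
std::vector<partition> step(std::vector<partition> const& current)
{
    std::size_t const np = current.size();
    std::vector<std::future<partition>> tasks;
    tasks.reserve(np);

    for (std::size_t p = 0; p != np; ++p)
    {
        tasks.push_back(std::async(std::launch::async, [&current, p, np] {
            return heat_part(current[(p + np - 1) % np], current[p],
                current[(p + 1) % np]);
        }));
    }

    std::vector<partition> next;
    next.reserve(np);
    for (auto& t : tasks)
        next.push_back(t.get());
    return next;
}
```

Splitting the ring {0, 1, 2, 3} into the two partitions {0, 1} and {2, 3} gives {2, 1} and {2, 1} after one step, the same values an unpartitioned step produces with these constants. The number of futures now depends on the number of partitions rather than on the number of data points.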

Example 4 finally shows good results. This code scales equivalently to the OpenMP version. While these results are promising, there are more opportunities to improve the application’s scalability. Currently, this code only runs on one locality, but to get the full benefit of HPX, we need to be able to distribute the work to other machines in a cluster. We begin to add this functionality in example 5.

In order to run on a distributed system, a large amount of boilerplate code must be added. Fortunately, HPX provides us with the concept of a component, which saves us from having to write quite as much code. A component is an object that can be remotely accessed using its global address. Components are made of two parts: a server and a client class. While the client class is not required, abstracting the server behind a client allows us to ensure type safety instead of having to pass around pointers to global objects.

Example 5 renames example 4’s struct partition to partition_data and adds serialization support. Next, we add the server-side representation of the data in the structure partition_server. partition_server inherits from hpx::components::component_base, which contains the server-side component boilerplate. The boilerplate code allows a component’s public members to be accessible anywhere on the machine via its Global Identifier (GID). To encapsulate the component, we create a client-side helper class. This object allows us to create new instances of our component and access its members without having to know its GID. In addition, we are using the client class to assist us with managing our asynchrony. For example, our client class partition’s member function get_data() returns a future to the partition_data returned by the server’s get_data(). The client struct inherits its boilerplate code from hpx::components::client_base.

In the structure stepper, we have also had to make some changes to accommodate a distributed environment. In order to get the data from a particular neighboring partition, which could be remote, we must retrieve the data from all of the neighboring partitions. These retrievals are asynchronous and the function heat_part_data, which, amongst other things, calls heat, should not be called unless the data from the neighboring partitions have arrived. Therefore, it should come as no surprise that we synchronize this operation with another instance of dataflow (found in heat_part). This dataflow receives futures to the data in the current and surrounding partitions by calling get_data() on each respective partition. When these futures are ready, dataflow passes them to the unwrapping function, which extracts the shared_array of doubles and passes them to the lambda. The lambda calls heat_part_data on the locality, which the middle partition is on.

Although this example could run distributed, it only runs on one locality, as it always uses hpx::find_here() as the target for the functions to run on.

In example 6, we begin to distribute the partition data on different nodes. This is accomplished in stepper::do_work() by passing the GID of the locality where we wish to create the partition to the partition constructor.

    for (std::size_t i = 0; i != np; ++i)
        U[0][i] = partition(localities[locidx(i, np, nl)], nx, double(i));

We distribute the partitions evenly based on the number of localities used, which is described in the function locidx. Because some of the data needed to update the partition in heat_part could now be on a new locality, we must devise a way of moving data to the locality of the middle partition. We accomplished this by adding a switch in the function get_data() that returns the end element of the buffer data_ if it is from the left partition or the first element of the buffer if the data is from the right partition. In this way only the necessary elements, not the whole buffer, are exchanged between nodes. The reader should be reminded that this exchange of end elements occurs in the function get_data() and, therefore, is executed asynchronously.
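
The even distribution described above can be sketched as a simple block mapping from partition index to locality index. This is a minimal illustration and not necessarily the formula used in the example source; the signature mirrors the call locidx(i, np, nl) shown earlier, where np is assumed to be the number of partitions and nl the number of localities.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of a block distribution (an assumption, not the example's own code):
// partition i out of np partitions is assigned to one of nl localities so
// that consecutive partitions land on the same locality.
std::size_t locidx(std::size_t i, std::size_t np, std::size_t nl)
{
    assert(np > 0 && nl > 0 && i < np);
    // Integer arithmetic keeps the result in the range [0, nl).
    return i * nl / np;
}
```

With 8 partitions and 2 localities, partitions 0 to 3 map to locality 0 and partitions 4 to 7 map to locality 1, so only the partitions at the block edges have a remote neighbor.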

Now that we have the code running in a distributed environment, it is time to make some optimizations. The function heat_part spends most of its time on two tasks: retrieving remote data and working on the data in the middle partition. Because we know that the data for the middle partition is local, we can overlap the work on the middle partition with that of the possibly remote call of get_data(). This algorithmic change, which was implemented in example 7, can be seen below:

    // The partitioned operator, it invokes the heat operator above on all elements
    // of a partition.
    static partition heat_part(
        partition const& left, partition const& middle, partition const& right)
    {
        using hpx::dataflow;
        using hpx::unwrapping;

        hpx::shared_future<partition_data> middle_data =
            middle.get_data(partition_server::middle_partition);

        hpx::future<partition_data> next_middle = middle_data.then(
            unwrapping([middle](partition_data const& m) -> partition_data {
                HPX_UNUSED(middle);

                // All local operations are performed once the middle data of
                // the previous time step becomes available.
                std::size_t size = m.size();
                partition_data next(size);
                for (std::size_t i = 1; i != size - 1; ++i)
                    next[i] = heat(m[i - 1], m[i], m[i + 1]);
                return next;
            }));

        return dataflow(hpx::launch::async,
            unwrapping([left, middle, right](partition_data next,
                           partition_data const& l, partition_data const& m,
                           partition_data const& r) -> partition {
                HPX_UNUSED(left);
                HPX_UNUSED(right);

                // Calculate the missing boundary elements once the
                // corresponding data has become available.
                std::size_t size = m.size();
                next[0] = heat(l[size - 1], m[0], m[1]);
                next[size - 1] = heat(m[size - 2], m[size - 1], r[0]);

                // The new partition_data will be allocated on the same locality
                // as 'middle'.
                return partition(middle.get_id(), std::move(next));
            }),
            std::move(next_middle),
            left.get_data(partition_server::left_partition), middle_data,
            right.get_data(partition_server::right_partition));
    }
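
The two-phase update used in heat_part can be sketched serially, without any HPX facilities, to make the data dependencies explicit. The interior elements depend only on the middle partition, while the two boundary elements each need a single value from a neighbor. The coefficients inside heat and the helper names update_interior, update_boundaries, and heat_part_serial are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Three-point heat operator; the coefficients are hypothetical values.
double heat(double left, double middle, double right)
{
    double const k = 0.5, dt = 1.0, dx = 1.0;
    return middle + (k * dt / (dx * dx)) * (left - 2.0 * middle + right);
}

// Phase 1: needs only the local (middle) partition, so it can start early.
std::vector<double> update_interior(std::vector<double> const& m)
{
    assert(m.size() >= 2);
    std::vector<double> next(m.size(), 0.0);
    for (std::size_t i = 1; i + 1 < m.size(); ++i)
        next[i] = heat(m[i - 1], m[i], m[i + 1]);
    return next;
}

// Phase 2: needs one edge value from each (possibly remote) neighbor.
void update_boundaries(std::vector<double>& next,
    std::vector<double> const& m, double left_edge, double right_edge)
{
    std::size_t const size = m.size();
    assert(size >= 2 && next.size() == size);
    next[0] = heat(left_edge, m[0], m[1]);
    next[size - 1] = heat(m[size - 2], m[size - 1], right_edge);
}

// Serial stand-in for heat_part: runs both phases back to back.
std::vector<double> heat_part_serial(
    double left_edge, std::vector<double> const& m, double right_edge)
{
    std::vector<double> next = update_interior(m);
    update_boundaries(next, m, left_edge, right_edge);
    return next;
}
```

In the HPX version above, phase 1 corresponds to the continuation attached to middle_data and phase 2 to the dataflow call, which lets the interior work overlap with the retrieval of the neighbor data.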

Example 8 completes the futurization process and utilizes the full potential of HPX by distributing the program flow to multiple localities, usually defined as nodes in a cluster. It accomplishes this task by running an instance of HPX main on each locality. In order to coordinate the execution of the program, the struct stepper is wrapped into a component. In this way, each locality contains an instance of stepper that executes its own instance of the function do_work(). This scheme does create an interesting synchronization problem that must be solved. When the program flow was being coordinated on the head node, the GID of each component was known. However, when we distribute the program flow, each partition has no notion of the GID of its neighbor if the next partition is on another locality. In order to make the GIDs of neighboring partitions visible to each other, we created two buffers to store the GIDs of the remote neighboring partitions on the left and right respectively. These buffers are filled by sending the GID of newly created edge partitions to the right and left buffers of the neighboring localities.

In order to finish the simulation, the solution vectors named result are then gathered together on locality 0 and added into a vector of spaces overall_result using the HPX functions gather_id and gather_here.

Example 8 completes this example series, which takes the serial code of example 1 and incrementally morphs it into a fully distributed parallel code. This evolution was guided by the simple principles of futurization, the knowledge of grainsize, and utilization of components. Applying these techniques easily facilitates the scalable parallelization of most applications.

Serializing user-defined types#

In order to facilitate the sending and receiving of complex datatypes, HPX provides a serialization abstraction.

Just like Boost, HPX allows users to serialize user-defined types by either providing the serializer as a member function or defining the serialization as a free function.

Unlike Boost, HPX ignores the second unsigned int parameter; it exists solely to preserve API compatibility with Boost.Serialization.

This tutorial was heavily inspired by Boost’s serialization concepts.

Setup#

The source code for this example can be found here: custom_serialization.cpp.

To compile this program, go to your HPX build directory (see Building HPX for information on configuring and building HPX) and enter:

$ make examples.quickstart.custom_serialization

To run the program type:

$ ./bin/custom_serialization

This should print:

Rectangle(Point(x=0,y=0),Point(x=0,y=5))
gravity.g = 9.81

Serialization Requirements#

In order to serialize objects in HPX, at least one of the following criteria must be met:

In the case of default constructible objects:

  • The object is an empty type.

  • The object has a serialization function as shown in this tutorial.

  • All members are accessible publicly and they can be used in structured binding contexts.

Otherwise:

  • They need to have special serialization support.

Member function serialization#

struct point_member_serialization
{
    int x{0};
    int y{0};

    // Required when defining the serialization function as private
    // In this case it isn't
    // Provides serialization access to HPX
    friend class hpx::serialization::access;

    // Second argument exists solely for compatibility with boost serialize
    // it is NOT processed by HPX in any way.
    template <typename Archive>
    void serialize(Archive& ar, const unsigned int)
    {
        // clang-format off
        ar & x & y;
        // clang-format on
    }
};

// Allow bitwise serialization
HPX_IS_BITWISE_SERIALIZABLE(point_member_serialization)

Notice that point_member_serialization is defined as bitwise serializable (see Bitwise serialization for bitwise copyable data for more details). HPX is also able to recursively serialize composite classes and structs, provided that their members are serializable.

struct rectangle_member_serialization
{
    point_member_serialization top_left;
    point_member_serialization lower_right;

    template <typename Archive>
    void serialize(Archive& ar, const unsigned int)
    {
        // clang-format off
        ar & top_left & lower_right;
        // clang-format on
    }
};
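
To see why a single serialize member can handle both saving and loading, and how nested members are visited recursively, the following sketch uses two toy archives. The types toy_oarchive, toy_iarchive, toy_point, and toy_rectangle and the function round_trip are hypothetical stand-ins written for this illustration; they are not HPX archive types and only support int members.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy output archive (not an HPX type): appends ints to a buffer.
struct toy_oarchive
{
    std::vector<int> buffer;

    toy_oarchive& operator&(int& value)
    {
        buffer.push_back(value);
        return *this;
    }

    // Any other type is expected to provide a serialize member.
    template <typename T>
    toy_oarchive& operator&(T& value)
    {
        value.serialize(*this, 0u);
        return *this;
    }
};

// Toy input archive (not an HPX type): reads ints back in the same order.
struct toy_iarchive
{
    std::vector<int> buffer;
    std::size_t pos;

    explicit toy_iarchive(std::vector<int> b)
      : buffer(std::move(b)), pos(0)
    {
    }

    toy_iarchive& operator&(int& value)
    {
        assert(pos < buffer.size());
        value = buffer[pos++];
        return *this;
    }

    template <typename T>
    toy_iarchive& operator&(T& value)
    {
        value.serialize(*this, 0u);
        return *this;
    }
};

struct toy_point
{
    int x{0};
    int y{0};

    template <typename Archive>
    void serialize(Archive& ar, const unsigned int)
    {
        ar & x & y;
    }
};

struct toy_rectangle
{
    toy_point top_left;
    toy_point lower_right;

    template <typename Archive>
    void serialize(Archive& ar, const unsigned int)
    {
        // Each member's own serialize function is invoked in turn.
        ar & top_left & lower_right;
    }
};

// Writes a rectangle into a buffer and reads a copy back out of it.
toy_rectangle round_trip(toy_rectangle original)
{
    toy_oarchive out;
    out & original;
    toy_iarchive in(std::move(out.buffer));
    toy_rectangle restored;
    in & restored;
    return restored;
}
```

Because the same operator is used in both directions, the order of members in serialize automatically matches between writing and reading.
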
Free function serialization#

In order to decouple your models from HPX, HPX also allows for the definition of free function serializers.

struct rectangle_free
{
    point_member_serialization top_left;
    point_member_serialization lower_right;
};

template <typename Archive>
void serialize(Archive& ar, rectangle_free& pt, const unsigned int)
{
    // clang-format off
    ar & pt.lower_right & pt.top_left;
    // clang-format on
}

Even if you cannot modify a class to befriend the serializer, you can still serialize it, provided that the class is default constructible and you are able to reconstruct it yourself.

class point_class
{
public:
    point_class(int x, int y)
      : x(x)
      , y(y)
    {
    }

    point_class() = default;

    [[nodiscard]] int get_x() const noexcept
    {
        return x;
    }

    [[nodiscard]] int get_y() const noexcept
    {
        return y;
    }

private:
    int x;
    int y;
};

template <typename Archive>
void load(Archive& ar, point_class& pt, const unsigned int)
{
    int x, y;
    ar >> x >> y;
    pt = point_class(x, y);
}

template <typename Archive>
void save(Archive& ar, point_class const& pt, const unsigned int)
{
    ar << pt.get_x() << pt.get_y();
}

// This tells HPX that you have split your serialize function into
// load and save
HPX_SERIALIZATION_SPLIT_FREE(point_class)
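
The key constraint of the split form is that load must read values in exactly the order in which save wrote them. The sketch below illustrates this with toy stream-style archives; toy_oarchive, toy_iarchive, toy_point_class, and round_trip are hypothetical names introduced for this illustration and are not part of HPX.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A class with private state that is only reachable through getters.
class toy_point_class
{
public:
    toy_point_class(int x, int y) : x_(x), y_(y) {}
    toy_point_class() = default;

    int get_x() const noexcept { return x_; }
    int get_y() const noexcept { return y_; }

private:
    int x_{0};
    int y_{0};
};

// Toy output archive (not an HPX type).
struct toy_oarchive
{
    std::vector<int> buffer;

    toy_oarchive& operator<<(int value)
    {
        buffer.push_back(value);
        return *this;
    }
};

// Toy input archive (not an HPX type).
struct toy_iarchive
{
    std::vector<int> buffer;
    std::size_t pos;

    explicit toy_iarchive(std::vector<int> b)
      : buffer(std::move(b)), pos(0)
    {
    }

    toy_iarchive& operator>>(int& value)
    {
        assert(pos < buffer.size());
        value = buffer[pos++];
        return *this;
    }
};

// save writes x first, then y.
template <typename Archive>
void save(Archive& ar, toy_point_class const& pt, const unsigned int)
{
    ar << pt.get_x() << pt.get_y();
}

// load reads in the same order and rebuilds the object via its constructor.
template <typename Archive>
void load(Archive& ar, toy_point_class& pt, const unsigned int)
{
    int x = 0, y = 0;
    ar >> x >> y;
    pt = toy_point_class(x, y);
}

toy_point_class round_trip(toy_point_class const& original)
{
    toy_oarchive out;
    save(out, original, 0u);
    toy_iarchive in(std::move(out.buffer));
    toy_point_class restored;
    load(in, restored, 0u);
    return restored;
}
```
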
Serializing non default constructable classes#

Some classes don’t provide any default constructor.

class planet_weight_calculator
{
public:
    explicit planet_weight_calculator(double g)
      : g(g)
    {
    }

    template <class Archive>
    friend void save_construct_data(
        Archive&, planet_weight_calculator const*, unsigned int);

    [[nodiscard]] double get_g() const
    {
        return g;
    }

private:
    // Provides serialization access to HPX
    friend class hpx::serialization::access;
    template <class Archive>
    void serialize(Archive&, const unsigned int)
    {
        // Serialization will be done in the save_construct_data
        // Still needs to be defined
    }

    double g;
};

In this case you have to define a save_construct_data and load_construct_data in which you do the serialization yourself.

template <class Archive>
inline void save_construct_data(Archive& ar,
    planet_weight_calculator const* weight_calc, const unsigned int)
{
    ar << weight_calc->g;    // Do all of your serialization here
}

template <class Archive>
inline void load_construct_data(
    Archive& ar, planet_weight_calculator* weight_calc, const unsigned int)
{
    double g;
    ar >> g;

    // ::new(ptr) construct new object at given address
    hpx::construct_at(weight_calc, g);
}
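
As the comment in load_construct_data indicates, the object is constructed in place at an address that has already been allocated. The sketch below shows the same idea with plain placement new on raw storage, used here as a stand-in for hpx::construct_at. The class toy_weight_calculator and the helper construct_in_place_and_read are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <new>

// A class without a default constructor.
class toy_weight_calculator
{
public:
    explicit toy_weight_calculator(double g) : g_(g) {}

    double get_g() const { return g_; }

private:
    double g_;
};

// Mimics what a deserializer has to do: it owns raw, suitably aligned
// storage and must construct the object there from the loaded value.
double construct_in_place_and_read(double g)
{
    alignas(toy_weight_calculator)
        unsigned char storage[sizeof(toy_weight_calculator)];

    // Placement new constructs the object at the given address.
    toy_weight_calculator* obj =
        ::new (static_cast<void*>(storage)) toy_weight_calculator(g);

    double const result = obj->get_g();

    // Objects created with placement new are destroyed explicitly.
    obj->~toy_weight_calculator();
    return result;
}
```
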
Bitwise serialization for bitwise copyable data#

When sending types that are not arithmetic (as determined by std::is_arithmetic), HPX has to (de)serialize each object separately. However, if the class you are trying to send consists only of bitwise copyable data types, you may mark it as such. HPX will then serialize your object bitwise instead of element-wise. This has enormous benefits, especially when sending a vector/array of your class. To mark your class this way, invoke the macro HPX_IS_BITWISE_SERIALIZABLE(T) with your custom class.

struct point_member_serialization
{
    int x{0};
    int y{0};

    // Required when defining the serialization function as private
    // In this case it isn't
    // Provides serialization access to HPX
    friend class hpx::serialization::access;

    // Second argument exists solely for compatibility with boost serialize
    // it is NOT processed by HPX in any way.
    template <typename Archive>
    void serialize(Archive& ar, const unsigned int)
    {
        // clang-format off
        ar & x & y;
        // clang-format on
    }
};

// Allow bitwise serialization
HPX_IS_BITWISE_SERIALIZABLE(point_member_serialization)
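
The benefit of bitwise serialization is easiest to see for a contiguous container: instead of visiting every member of every element, the whole block of memory can be copied at once. The sketch below illustrates this with std::memcpy and is not how HPX implements it internally; toy_point, pack, and unpack are hypothetical names, and the static_assert checks the property that makes such a copy valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

struct toy_point
{
    int x;
    int y;
};

// A bitwise copy is only meaningful for trivially copyable types.
static_assert(std::is_trivially_copyable<toy_point>::value,
    "toy_point must be trivially copyable for a bitwise copy");

// Copies the whole vector into a byte buffer with a single memcpy.
std::vector<unsigned char> pack(std::vector<toy_point> const& pts)
{
    std::vector<unsigned char> bytes(pts.size() * sizeof(toy_point));
    if (!pts.empty())
        std::memcpy(bytes.data(), pts.data(), bytes.size());
    return bytes;
}

// Restores the vector from the byte buffer, again with a single memcpy.
std::vector<toy_point> unpack(std::vector<unsigned char> const& bytes)
{
    assert(bytes.size() % sizeof(toy_point) == 0);
    std::vector<toy_point> pts(bytes.size() / sizeof(toy_point));
    if (!pts.empty())
        std::memcpy(pts.data(), bytes.data(), bytes.size());
    return pts;
}
```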

Manual#

The manual is your comprehensive guide to HPX. It contains detailed information on how to build and use HPX in different scenarios.

Prerequisites#

Supported platforms#

At this time, HPX supports the following platforms. Other platforms may work, but we do not test HPX with other platforms, so please be warned.

Table 1 Supported Platforms for HPX#

Name        Minimum Version      Architectures
----------  -------------------  --------------------
Linux       2.6                  x86-32, x86-64, k1om
BlueGeneQ   V1R2M0               PowerPC A2
Windows     Any Windows system   x86-32, x86-64
Mac OSX     Any OSX system       x86-64

Supported compilers#

The table below shows the supported compilers for HPX.

Table 2 Supported Compilers for HPX#

Name                                          Minimum Version
--------------------------------------------  ---------------
GNU Compiler Collection (g++)                 9.0
clang: a C language family frontend for LLVM  10.0
Visual C++ (x64)                              2019

Software and libraries#

The table below presents all the necessary prerequisites for building HPX.

Table 3 Software prerequisites for HPX#

Name                                  Minimum Version
------------------------------------  ---------------
Build System
  CMake                               3.18
Required Libraries
  Boost                               1.71.0
  Portable Hardware Locality (HWLOC)  1.5
  Asio                                1.12.0

The most important dependencies are Boost and Portable Hardware Locality (HWLOC). The installation of Boost is described in detail in Boost’s Getting Started document. A recent version of hwloc is required in order to support thread pinning and NUMA awareness and can be found in Hwloc Downloads.

HPX is written in 99.99% Standard C++ (the remaining 0.01% is platform specific assembly code). As such, HPX is compilable with almost any standards compliant C++ compiler. The code base takes advantage of C++ language and standard library features when available.

Note

When building Boost using gcc, please note that it is required to specify a cxxflags=-std=c++17 command line argument to b2 (bjam).

Note

In most configurations, HPX depends only on header-only Boost. Boost.Filesystem is required if the standard library does not support filesystem. The following are not needed by default, but are required in certain configurations: Boost.Chrono, Boost.DateTime, Boost.Log, Boost.LogSetup, Boost.Regex, and Boost.Thread.

Depending on the options you chose while building and installing HPX, you will find that HPX may depend on several other libraries such as those listed below.

Note

In order to use a high speed parcelport, we currently recommend configuring HPX to use MPI so that MPI can be used for communication between different localities. Please set the CMake variable MPI_CXX_COMPILER to your MPI C++ compiler wrapper if not detected automatically.

Table 4 Optional software prerequisites for HPX#

Name                                                  Minimum version
----------------------------------------------------  ---------------
google-perftools                                      1.7.1
jemalloc                                              2.1.0
mi-malloc                                             1.0.0
Performance Application Programming Interface (PAPI)

Getting HPX#

Download a tarball of the latest release from HPX Downloads and unpack it or clone the repository directly using git:

$ git clone https://github.com/STEllAR-GROUP/hpx.git

It is also recommended that you check out the latest stable tag:

$ cd hpx
$ git checkout 1.10.0

Building HPX#

Basic information#

The build system for HPX is based on CMake, a cross-platform build-generator tool which is not responsible for building the project but rather generates the files needed by your build tool (GNU make, Visual Studio, etc.) for building HPX. If CMake is not already installed in your system, you can download it and install it here: CMake Downloads.

Once CMake has been run, the build process can be started. The build process consists of the following parts:

  • The HPX core libraries (target core): This forms the basic set of HPX libraries.

  • HPX Examples (target examples): This target is enabled by default and builds all HPX examples (disable by setting HPX_WITH_EXAMPLES:BOOL=Off). HPX examples are part of the all target and are included in the installation if enabled.

  • HPX Tests (target tests): This target builds the HPX test suite and is enabled by default (disable by setting HPX_WITH_TESTS:BOOL=Off). The tests are not built by the all target and have to be built separately.

  • HPX Documentation (target docs): This target builds the documentation and is not enabled by default (enable by setting HPX_WITH_DOCUMENTATION:BOOL=On). For more information see Documentation.

The HPX build process is highly configurable through CMake, and various CMake variables influence the build process. A list with the most important CMake variables can be found in the section that follows, while the complete list of available CMake variables is in CMake options. These variables can be used to refine the recipes that can be found at Platform specific build recipes, a section that shows some basic steps on how to build HPX for a specific platform.

In order to use HPX, only the core libraries are required. In order to use the optional libraries, you need to specify them as link dependencies in your build (See Creating HPX projects).

Most important CMake options#

While building HPX, you are provided with multiple CMake options which correspond to different configurations. Below, there is a set of the most important and frequently used CMake options.

HPX_WITH_MALLOC#

Use a custom allocator. Using a custom allocator tuned for multithreaded applications is very important for the performance of HPX applications. When debugging applications, it’s useful to set this to system, as custom allocators can hide some memory-related bugs. Note that setting this to something other than system requires an external dependency.

HPX_WITH_CUDA#

Enable support for CUDA. Use CMAKE_CUDA_COMPILER to set the CUDA compiler. This is a standard CMake variable, like CMAKE_CXX_COMPILER.

HPX_WITH_PARCELPORT_MPI#

Enable the MPI parcelport. This enables the use of MPI for the networking operations in the HPX runtime. The default value is OFF because it’s not available on all systems and/or requires another dependency. However, it is the recommended parcelport.

HPX_WITH_PARCELPORT_TCP#

Enable the TCP parcelport. Enables the use of TCP for networking in the runtime. The default value is ON. However, it’s only recommended for debugging purposes, as it is slower than the MPI parcelport.

HPX_WITH_PARCELPORT_LCI#

Enable the LCI parcelport. This enables the use of LCI for the networking operations in the HPX runtime. The default value is OFF because it’s not available on all systems and/or requires another dependency. However, this experimental parcelport may provide better performance than the MPI parcelport. Please refer to Using the LCI parcelport for more information about the LCI parcelport.

HPX_WITH_APEX#

Enable APEX integration. APEX can be used to profile HPX applications. In particular, it provides information about individual tasks in the HPX runtime.

HPX_WITH_GENERIC_CONTEXT_COROUTINES#

Enable Boost.Context for task context switching. It must be enabled for non-x86 architectures such as ARM and Power.

HPX_WITH_MAX_CPU_COUNT#

Set the maximum CPU count supported by HPX. The default value is 64, and should be set to a number at least as high as the number of cores on a system including virtual cores such as hyperthreads.

HPX_WITH_CXX_STANDARD#

Set a specific C++ standard version e.g. HPX_WITH_CXX_STANDARD=20. The default and minimum value is 17.

HPX_WITH_EXAMPLES#

Build examples.

HPX_WITH_TESTS#

Build tests.

For a complete list of available CMake variables that influence the build of HPX, see CMake options.
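
Before choosing a value for HPX_WITH_MAX_CPU_COUNT, it can help to check how many processing units the target machine reports. The following sketch uses only the C++ standard library; the helper names detected_processing_units and exceeds_max_cpu_count are hypothetical, and the default of 64 is taken from the option description above.

```cpp
#include <cassert>
#include <thread>

// Number of hardware threads reported by the standard library.
// The standard allows this to return 0 if the value cannot be determined.
unsigned detected_processing_units()
{
    return std::thread::hardware_concurrency();
}

// True if the configured maximum is too small for the given machine,
// in which case HPX_WITH_MAX_CPU_COUNT should be raised at configure time.
bool exceeds_max_cpu_count(
    unsigned processing_units, unsigned configured_max = 64)
{
    return processing_units > configured_max;
}
```

For example, a node reporting 128 hardware threads exceeds the default of 64, whereas a node reporting exactly 64 does not.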

Build types#

CMake can be configured to generate project files suitable for builds that have enabled debugging support or for an optimized build (without debugging support). The CMake variable used to set the build type is CMAKE_BUILD_TYPE (for more information see the CMake Documentation). Available build types are:

  • Debug: Full debug symbols are available as well as additional assertions to help debugging. To enable the debug build type for the HPX API, the C++ Macro HPX_DEBUG is defined.

  • RelWithDebInfo: Release build with debugging symbols. This is most useful for profiling applications.

  • Release: Release build. This disables assertions and enables default compiler optimizations.

  • MinSizeRel: Release build with optimizations for small binary sizes.

Important

We currently don’t guarantee ABI compatibility between Debug and Release builds. Please make sure that applications built against HPX use the same build type as you used to build HPX. For CMake builds, this means that the CMAKE_BUILD_TYPE variables have to match and for projects not using CMake, the HPX_DEBUG macro has to be set in debug mode.

Platform specific build recipes#
Unix variants#

Once you have the source code and the dependencies and assuming all your dependencies are in paths known to CMake, the following gets you started:

  1. First, set up a separate build directory to configure the project:

    $ mkdir build && cd build
    
  2. To configure the project you have the following options:

    • To build the core HPX libraries and examples, and install them to your chosen location (recommended):

    $ cmake -DCMAKE_INSTALL_PREFIX=/install/path ..
    

    Tip

    If you want to change CMake variables for your build, it is usually a good idea to start with a clean build directory to avoid configuration problems. It is especially important that you use a clean build directory when changing between Release and Debug modes.

    • To install HPX to the default system folders, simply leave out the CMAKE_INSTALL_PREFIX option:

    $ cmake ..
    
    • If your dependencies are in custom locations, you may need to tell CMake where to find them by passing one or more options to CMake as shown below:

    $ cmake -DBOOST_ROOT=/path/to/boost \
          -DHWLOC_ROOT=/path/to/hwloc \
          -DTCMALLOC_ROOT=/path/to/tcmalloc \
          -DJEMALLOC_ROOT=/path/to/jemalloc \
          [other CMake variable definitions] \
          /path/to/source/tree
    

    For instance:

    $ cmake -DBOOST_ROOT=~/packages/boost -DHWLOC_ROOT=/packages/hwloc -DCMAKE_INSTALL_PREFIX=~/packages/hpx ~/downloads/hpx_1.5.1
    
    • If you want to try HPX without using a custom allocator pass -DHPX_WITH_MALLOC=system to CMake:

    $ cmake -DCMAKE_INSTALL_PREFIX=/install/path -DHPX_WITH_MALLOC=system ..
    

    Note

    Please pay special attention to the section about HPX_WITH_MALLOC:STRING as this is crucial for getting decent performance.

    Important

    If you are building HPX for a system with more than 64 processing units, you must change the CMake variable HPX_WITH_MAX_CPU_COUNT (to a value at least as big as the number of (virtual) cores on your system). Note that the default value is 64.

    Caution

    Compiling and linking HPX needs a considerable amount of memory. It is advisable that at least 2 GB of memory per parallel process is available.

  3. Once the configuration is complete, to build the project you run:

    $ cmake --build . --target install

Windows#

Note

The following build recipes are mostly user-contributed and may be outdated. We always welcome updated and new build recipes.

To build HPX under Windows 10 x64 with Visual Studio 2015:

  • Download the CMake V3.18.1 installer (or latest version) from here

  • Download the hwloc V1.11.0 (or the latest version) from here and unpack it.

  • Download the latest Boost libraries from here and unpack them.

  • Build the Boost DLLs and LIBs by using these commands from Command Line (or PowerShell). Open CMD/PowerShell inside the Boost dir and type in:

    .\bootstrap.bat
    

    This batch file will set up everything needed to create a successful build. Now execute:

    .\b2.exe link=shared variant=release,debug architecture=x86 address-model=64 threading=multi --build-type=complete install
    

    This command will start a (very long) build of all available Boost libraries. Please, be patient.

  • Open CMake-GUI.exe and set up your source directory (input field ‘Where is the source code’) to the base directory of the source code you downloaded from HPX’s GitHub pages. Here’s an example of CMake path settings, which point to the Documents/GitHub/hpx folder:

    _images/cmake_settings1.png

    Fig. 3 Example CMake path settings.#

    Inside ‘Where is the source-code’ enter the base directory of your HPX source directory (do not enter the “src” sub-directory!). Inside ‘Where to build the binaries’ you should put in the path where all the building processes will happen. This is important because the building machinery will do an “out-of-tree” build. CMake will not touch or change the original source files in any way. Instead, it will generate Visual Studio Solution Files, which will build HPX packages out of the HPX source tree.

  • Set four variables (in CMake, not in the Windows environment): BOOST_ROOT, HWLOC_ROOT, ASIO_ROOT, and CMAKE_INSTALL_PREFIX. The meaning of these variables is as follows:

    • BOOST_ROOT: the root directory of the unpacked Boost headers/cpp files.

    • HWLOC_ROOT: the root directory of the unpacked Portable Hardware Locality files.

    • ASIO_ROOT: the root directory of the unpacked Asio files. Alternatively, use HPX_WITH_FETCH_ASIO with value True.

    • CMAKE_INSTALL_PREFIX: the directory where the future builds of HPX should be installed.

      Note

      HPX is a very large software collection, so it is not recommended to use the default C:\Program Files\hpx. Many users may prefer to use simpler paths without whitespace, like C:\bin\hpx or D:\bin\hpx etc.

    To insert new env-vars click on “Add Entry” and then insert the name inside “Name”, select PATH as Type and put the path-name in the “Path” text field. Repeat this for the first three variables.

    This is how variable insertion will look:

    _images/cmake_settings2.png

    Fig. 4 Example CMake adding entry.#

    Alternatively, users could provide BOOST_LIBRARYDIR instead of BOOST_ROOT; the difference is that BOOST_LIBRARYDIR should point to the subdirectory inside the Boost root where all the compiled DLLs/LIBs are. For example, BOOST_LIBRARYDIR may point to the bin.v2 subdirectory under the Boost root directory. It is important to keep the meanings of these two variables separate: BOOST_ROOT points to the root folder of the Boost library, while BOOST_LIBRARYDIR points to the subdirectory inside the Boost root folder where the compiled binaries are.

  • Click the ‘Configure’ button of CMake-GUI. You will be immediately presented with a small window where you can select the C++ compiler to be used within Visual Studio. This has been tested using the latest v14 (a.k.a C++ 2015) but older versions should be sufficient too. Make sure to select the 64Bit compiler.

  • After the configure process has finished successfully, click the ‘Generate’ button. Now, CMake will put new VS Solution files into the BUILD folder you selected at the beginning.

  • Open Visual Studio and load the HPX.sln from your build folder.

  • Go to CMakePredefinedTargets and build the INSTALL project:

    _images/vs_targets_install.png

    Fig. 5 Visual Studio INSTALL target.#

    It will take some time to compile everything, and in the end you should see an output similar to this one:

    _images/vs_build_output.png

    Fig. 6 Visual Studio build output.#

CMake options#

In order to configure HPX, you can set a variety of options to allow CMake to generate your specific makefiles/project files. A list of the most important CMake options can be found in Most important CMake options, while this section includes the comprehensive list.

Variables that influence how HPX is built#

The options are split into these categories:

Generic options#
HPX_WITH_AUTOMATIC_SERIALIZATION_REGISTRATION:BOOL#

Use automatic serialization registration for actions and functions. This affects compatibility between HPX applications compiled with different compilers (default ON)

HPX_WITH_BENCHMARK_SCRIPTS_PATH:PATH#

Directory to place batch scripts in

HPX_WITH_BUILD_BINARY_PACKAGE:BOOL#

Build HPX on the build infrastructure on any LINUX distribution (default: OFF).

HPX_WITH_CHECK_MODULE_DEPENDENCIES:BOOL#

Verify that no modules are cross-referenced from a different module category (default: OFF)

HPX_WITH_COMPILER_WARNINGS:BOOL#

Enable compiler warnings (default: ON)

HPX_WITH_COMPILER_WARNINGS_AS_ERRORS:BOOL#

Turn compiler warnings into errors (default: OFF)

HPX_WITH_COMPRESSION_BZIP2:BOOL#

Enable bzip2 compression for parcel data (default: OFF).

HPX_WITH_COMPRESSION_SNAPPY:BOOL#

Enable snappy compression for parcel data (default: OFF).

HPX_WITH_COMPRESSION_ZLIB:BOOL#

Enable zlib compression for parcel data (default: OFF).

HPX_WITH_CUDA:BOOL#

Enable support for CUDA (default: OFF)

HPX_WITH_CXX_STANDARD:STRING#

Set the C++ standard to use when compiling HPX itself. (default: 17)

HPX_WITH_DATAPAR:BOOL#

Enable data parallel algorithm support using Vc library (default: ON)

HPX_WITH_DATAPAR_BACKEND:STRING#

Define which vectorization library should be used. Options are: VC, EVE, STD_EXPERIMENTAL_SIMD, SVE, NONE

HPX_WITH_DATAPAR_VC_NO_LIBRARY:BOOL#

Don’t link with the Vc static library (default: OFF)

HPX_WITH_DEPRECATION_WARNINGS:BOOL#

Enable warnings for deprecated facilities. (default: ON)

HPX_WITH_DISABLED_SIGNAL_EXCEPTION_HANDLERS:BOOL#

Disables the mechanism that produces debug output for caught signals and unhandled exceptions (default: OFF)

HPX_WITH_DYNAMIC_HPX_MAIN:BOOL#

Enable dynamic overload of system main() (Linux and Apple only, default: ON)

HPX_WITH_FAULT_TOLERANCE:BOOL#

Build HPX to tolerate failures of nodes, i.e. ignore errors in active communication channels (default: OFF)

HPX_WITH_FULL_RPATH:BOOL#

Build and link HPX libraries and executables with full RPATHs (default: ON)

HPX_WITH_GCC_VERSION_CHECK:BOOL#

Don’t ignore version reported by gcc (default: ON)

HPX_WITH_GENERIC_CONTEXT_COROUTINES:BOOL#

Use Boost.Context as the underlying coroutines context switch implementation.

HPX_WITH_HIDDEN_VISIBILITY:BOOL#

Use -fvisibility=hidden for builds on platforms which support it (default OFF)

HPX_WITH_HIP:BOOL#

Enable compilation with HIPCC (default: OFF)

HPX_WITH_HIPSYCL:BOOL#

Use hipsycl cmake integration (default: OFF)

HPX_WITH_LOGGING:BOOL#

Build HPX with logging enabled (default: ON).

HPX_WITH_MALLOC:STRING#

Define which allocator should be linked in. Options are: system, tcmalloc, jemalloc, mimalloc, tbbmalloc, and custom (default is: tcmalloc)

HPX_WITH_MODULES_AS_STATIC_LIBRARIES:BOOL#

Compile HPX modules as STATIC (whole-archive) libraries instead of OBJECT libraries (Default: ON)

HPX_WITH_NICE_THREADLEVEL:BOOL#

Set HPX worker threads to have high NICE level (may impact performance) (default: OFF)

HPX_WITH_PARCEL_COALESCING:BOOL#

Enable the parcel coalescing plugin (default: ON).

HPX_WITH_PKGCONFIG:BOOL#

Enable generation of pkgconfig files (default: ON on Linux without CUDA/HIP, otherwise OFF)

HPX_WITH_PRECOMPILED_HEADERS:BOOL#

Enable precompiled headers for certain build targets (experimental) (default OFF)

HPX_WITH_RUN_MAIN_EVERYWHERE:BOOL#

Run hpx_main by default on all localities (default: OFF).

HPX_WITH_STACKOVERFLOW_DETECTION:BOOL#

Enable stackoverflow detection for HPX threads/coroutines. (default: OFF, debug: ON)

HPX_WITH_STATIC_LINKING:BOOL#

Compile HPX statically linked libraries (Default: OFF)

HPX_WITH_SYCL:BOOL#

Enable support for Sycl (default: OFF)

HPX_WITH_SYCL_FLAGS:STRING#

Sycl compile flags for selecting specific targets (default: empty)

HPX_WITH_UNITY_BUILD:BOOL#

Enable unity build for certain build targets (default OFF)

HPX_WITH_VIM_YCM:BOOL#

Generate HPX completion file for VIM YouCompleteMe plugin

HPX_WITH_ZERO_COPY_SERIALIZATION_THRESHOLD:STRING#

The threshold in bytes at which zero copy optimizations are performed (default: 128)

Build Targets options#
HPX_WITH_ASIO_TAG:STRING#

Asio repository tag or branch

HPX_WITH_COMPILE_ONLY_TESTS:BOOL#

Create build system support for compile time only HPX tests (default ON)

HPX_WITH_DISTRIBUTED_RUNTIME:BOOL#

Enable the distributed runtime (default: ON). Turning off the distributed runtime completely disallows the creation and use of components and actions. Turning this option off is experimental!

HPX_WITH_DOCUMENTATION:BOOL#

Build the HPX documentation (default OFF).

HPX_WITH_DOCUMENTATION_OUTPUT_FORMATS:STRING#

List of documentation output formats to generate. Valid options are html;singlehtml;latexpdf;man. Multiple values can be separated with semicolons. (default html).

HPX_WITH_EXAMPLES:BOOL#

Build the HPX examples (default ON)

HPX_WITH_EXAMPLES_HDF5:BOOL#

Enable examples requiring HDF5 support (default: OFF).

HPX_WITH_EXAMPLES_OPENMP:BOOL#

Enable examples requiring OpenMP support (default: OFF).

HPX_WITH_EXAMPLES_QT4:BOOL#

Enable examples requiring Qt4 support (default: OFF).

HPX_WITH_EXAMPLES_QTHREADS:BOOL#

Enable examples requiring QThreads support (default: OFF).

HPX_WITH_EXAMPLES_TBB:BOOL#

Enable examples requiring TBB support (default: OFF).

HPX_WITH_EXECUTABLE_PREFIX:STRING#

Executable prefix (default none), ‘hpx_’ useful for system install.

HPX_WITH_FAIL_COMPILE_TESTS:BOOL#

Create build system support for fail compile HPX tests (default ON)

HPX_WITH_FETCH_ASIO:BOOL#

Use FetchContent to fetch Asio. By default an installed Asio will be used. (default: OFF)

HPX_WITH_FETCH_LCI:BOOL#

Use FetchContent to fetch LCI. By default an installed LCI will be used. (default: OFF)

HPX_WITH_IO_COUNTERS:BOOL#

Enable IO counters (default: ON)

HPX_WITH_LCI_TAG:STRING#

LCI repository tag or branch

Number of Parallel link jobs while building hpx (only for Ninja as generator) (default 2)

HPX_WITH_TESTS:BOOL#

Build the HPX tests (default ON)

HPX_WITH_TESTS_BENCHMARKS:BOOL#

Build HPX benchmark tests (default: ON)

HPX_WITH_TESTS_EXAMPLES:BOOL#

Add HPX examples as tests (default: ON)

HPX_WITH_TESTS_EXTERNAL_BUILD:BOOL#

Build external cmake build tests (default: ON)

HPX_WITH_TESTS_HEADERS:BOOL#

Build HPX header tests (default: OFF)

HPX_WITH_TESTS_REGRESSIONS:BOOL#

Build HPX regression tests (default: ON)

HPX_WITH_TESTS_UNIT:BOOL#

Build HPX unit tests (default: ON)

HPX_WITH_TOOLS:BOOL#

Build HPX tools (default: OFF)
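
When several of the options above need to be set together, passing each one with -D can get unwieldy. One possible approach, sketched here, is a CMake initial-cache script that is pre-loaded with CMake's -C option. The file name hpx-options.cmake and the chosen values are illustrative assumptions, not recommended settings; the option names are taken from the lists above.

```cmake
# hpx-options.cmake (hypothetical file name)
# Pre-load with: cmake -C hpx-options.cmake <path-to-hpx-source>

# Link against the system allocator instead of the default tcmalloc.
set(HPX_WITH_MALLOC "system" CACHE STRING "Allocator to link in")

# Skip examples and tests to shorten the build.
set(HPX_WITH_EXAMPLES OFF CACHE BOOL "Build the HPX examples")
set(HPX_WITH_TESTS OFF CACHE BOOL "Build the HPX tests")

# Let the build fetch Asio instead of using an installed copy.
set(HPX_WITH_FETCH_ASIO ON CACHE BOOL "Use FetchContent to fetch Asio")
```

Values pre-loaded this way populate the CMake cache before the HPX CMakeLists.txt is processed.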

Thread Manager options#
HPX_COROUTINES_WITH_SWAP_CONTEXT_EMULATION:BOOL#

Emulate SwapContext API for coroutines (Windows only, default: OFF)

HPX_WITH_COROUTINE_COUNTERS:BOOL#

Enable keeping track of coroutine creation and rebind counts (default: OFF)

HPX_WITH_IO_POOL:BOOL#

Disable internal IO thread pool, do not change if not absolutely necessary (default: ON)

HPX_WITH_MAX_CPU_COUNT:STRING#

HPX applications will not use more than this number of OS-Threads (empty string means dynamic) (default: “”)

HPX_WITH_MAX_NUMA_DOMAIN_COUNT:STRING#

HPX applications will not run on machines with more NUMA domains (default: 8)

HPX_WITH_SCHEDULER_LOCAL_STORAGE:BOOL#

Enable scheduler local storage for all HPX schedulers (default: OFF)

HPX_WITH_SPINLOCK_DEADLOCK_DETECTION:BOOL#

Enable spinlock deadlock detection (default: OFF)

HPX_WITH_SPINLOCK_POOL_NUM:STRING#

Number of elements a spinlock pool manages (default: 128)

HPX_WITH_STACKTRACES:BOOL#

Attach backtraces to HPX exceptions (default: ON)

HPX_WITH_STACKTRACES_DEMANGLE_SYMBOLS:BOOL#

Thread stack back trace symbols will be demangled (default: ON)

HPX_WITH_STACKTRACES_STATIC_SYMBOLS:BOOL#

Thread stack back trace will resolve static symbols (default: OFF)

HPX_WITH_THREAD_BACKTRACE_DEPTH:STRING#

Thread stack back trace depth being captured (default: 20)

HPX_WITH_THREAD_BACKTRACE_ON_SUSPENSION:BOOL#

Enable thread stack back trace being captured on suspension (default: OFF)

HPX_WITH_THREAD_CREATION_AND_CLEANUP_RATES:BOOL#

Enable measuring thread creation and cleanup times (default: OFF)

HPX_WITH_THREAD_CUMULATIVE_COUNTS:BOOL#

Enable keeping track of cumulative thread counts in the schedulers (default: ON)

HPX_WITH_THREAD_IDLE_RATES:BOOL#

Enable measuring the percentage of overhead times spent in the scheduler (default: OFF)

HPX_WITH_THREAD_LOCAL_STORAGE:BOOL#

Enable thread local storage for all HPX threads (default: OFF)

HPX_WITH_THREAD_MANAGER_IDLE_BACKOFF:BOOL#

HPX scheduler threads do exponential backoff on idle queues (default: ON)

HPX_WITH_THREAD_QUEUE_WAITTIME:BOOL#

Enable collecting queue wait times for threads (default: OFF)

HPX_WITH_THREAD_STACK_MMAP:BOOL#

Use mmap for stack allocation on appropriate platforms

HPX_WITH_THREAD_STEALING_COUNTS:BOOL#

Enable keeping track of counts of thread stealing incidents in the schedulers (default: OFF)

HPX_WITH_THREAD_TARGET_ADDRESS:BOOL#

Enable storing target address in thread for NUMA awareness (default: OFF)

HPX_WITH_TIMER_POOL:BOOL#

Disable internal timer thread pool, do not change if not absolutely necessary (default: ON)

AGAS options#
HPX_WITH_AGAS_DUMP_REFCNT_ENTRIES:BOOL#

Enable dumps of the AGAS refcnt tables to logs (default: OFF)

Parcelport options#
HPX_WITH_NETWORKING:BOOL#

Enable support for networking and multi-node runs (default: ON)

HPX_WITH_PARCELPORT_ACTION_COUNTERS:BOOL#

Enable performance counters reporting parcelport statistics on a per-action basis.

HPX_WITH_PARCELPORT_COUNTERS:BOOL#

Enable performance counters reporting parcelport statistics.

HPX_WITH_PARCELPORT_LCI:BOOL#

Enable the LCI based parcelport.

HPX_WITH_PARCELPORT_LIBFABRIC:BOOL#

Enable the libfabric based parcelport. This is currently an experimental feature

HPX_WITH_PARCELPORT_MPI:BOOL#

Enable the MPI based parcelport.

HPX_WITH_PARCELPORT_TCP:BOOL#

Enable the TCP based parcelport.

HPX_WITH_PARCEL_PROFILING:BOOL#

Enable profiling data for parcels

Profiling options#
HPX_WITH_APEX:BOOL#

Enable APEX instrumentation support.

HPX_WITH_ITTNOTIFY:BOOL#

Enable Amplifier (ITT) instrumentation support.

HPX_WITH_PAPI:BOOL#

Enable the PAPI based performance counter.

Debugging options#
HPX_WITH_ATTACH_DEBUGGER_ON_TEST_FAILURE:BOOL#

Break the debugger if a test has failed (default: OFF)

HPX_WITH_PARALLEL_TESTS_BIND_NONE:BOOL#

Pass --hpx:bind=none to tests that may run in parallel (cmake -j flag) (default: OFF)

HPX_WITH_SANITIZERS:BOOL#

Configure with sanitizer instrumentation support.

HPX_WITH_TESTS_DEBUG_LOG:BOOL#

Turn on debug logs (--hpx:debug-hpx-log) for tests (default: OFF)

HPX_WITH_TESTS_DEBUG_LOG_DESTINATION:STRING#

Destination for test debug logs (default: cout)

HPX_WITH_TESTS_MAX_THREADS_PER_LOCALITY:STRING#

Maximum number of threads to use for tests (default: 0, use the number of threads specified by the test)

HPX_WITH_THREAD_DEBUG_INFO:BOOL#

Enable thread debugging information (default: OFF, implicitly enabled in debug builds)

HPX_WITH_THREAD_DESCRIPTION_FULL:BOOL#

Use function address for thread description (default: OFF)

HPX_WITH_THREAD_GUARD_PAGE:BOOL#

Enable thread guard page (default: ON)

HPX_WITH_VALGRIND:BOOL#

Enable Valgrind instrumentation support.

HPX_WITH_VERIFY_LOCKS:BOOL#

Enable lock verification code (default: OFF, enabled in debug builds)

HPX_WITH_VERIFY_LOCKS_BACKTRACE:BOOL#

Enable thread stack back trace being captured on lock registration (to be used in combination with HPX_WITH_VERIFY_LOCKS=ON, default: OFF)
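
HPX_WITH_VERIFY_LOCKS_BACKTRACE is meant to be used in combination with HPX_WITH_VERIFY_LOCKS. A minimal sketch of an initial-cache script (for use with cmake -C) that enables both, along with thread debugging information, might look as follows. The file name is a hypothetical choice, and the selection of options is only an illustration.

```cmake
# hpx-debug-options.cmake (hypothetical file name)

# Lock verification, plus a backtrace captured at each lock registration.
set(HPX_WITH_VERIFY_LOCKS ON CACHE BOOL "Enable lock verification code")
set(HPX_WITH_VERIFY_LOCKS_BACKTRACE ON CACHE BOOL
    "Capture a backtrace on lock registration")

# Thread debugging information (implicitly enabled in debug builds).
set(HPX_WITH_THREAD_DEBUG_INFO ON CACHE BOOL "Enable thread debug info")
```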

Modules options#
HPX_DATASTRUCTURES_WITH_ADAPT_STD_TUPLE:BOOL#

Enable compatibility of hpx::get with std::tuple. (default: ON)

HPX_DATASTRUCTURES_WITH_ADAPT_STD_VARIANT:BOOL#

Enable compatibility of hpx::get with std::variant. (default: OFF)

HPX_FILESYSTEM_WITH_BOOST_FILESYSTEM_COMPATIBILITY:BOOL#

Enable Boost.FileSystem compatibility. (default: OFF)

HPX_ITERATOR_SUPPORT_WITH_BOOST_ITERATOR_TRAVERSAL_TAG_COMPATIBILITY:BOOL#

Enable Boost.Iterator traversal tag compatibility. (default: OFF)

HPX_LOGGING_WITH_SEPARATE_DESTINATIONS:BOOL#

Enable separate logging channels for AGAS, timing, and parcel transport. (default: ON)

HPX_SERIALIZATION_WITH_ALLOW_CONST_TUPLE_MEMBERS:BOOL#

Enable serializing std::tuple with const members. (default: OFF)

HPX_SERIALIZATION_WITH_ALLOW_RAW_POINTER_SERIALIZATION:BOOL#

Enable serializing raw pointers. (default: OFF)

HPX_SERIALIZATION_WITH_ALL_TYPES_ARE_BITWISE_SERIALIZABLE:BOOL#

Assume all types are bitwise serializable. (default: OFF)

HPX_SERIALIZATION_WITH_BOOST_TYPES:BOOL#

Enable serialization of certain Boost types. (default: OFF)

HPX_SERIALIZATION_WITH_SUPPORTS_ENDIANESS:BOOL#

Support endian conversion on input and output archives. (default: OFF)

HPX_TOPOLOGY_WITH_ADDITIONAL_HWLOC_TESTING:BOOL#

Enable HWLOC filtering that makes it report no cores. This is purely an option supporting better testing; do not enable it under normal circumstances. (default: OFF)

HPX_WITH_POWER_COUNTER:BOOL#

Enable use of performance counters based on pwr library (default: OFF)

Additional tools and libraries used by HPX#

Here is a list of additional libraries and tools that are either optionally supported by the build system or are optionally required for certain examples or tests. These libraries and tools can be detected by the HPX build system.

Each of the tools or libraries listed here will be automatically detected if they are installed in some standard location. If a tool or library is installed in a different location, you can specify its base directory by appending _ROOT to the variable name as listed below. For instance, to configure a custom directory for BOOST, specify BOOST_ROOT=/custom/boost/root.

BOOST_ROOT:PATH#

Specifies where to look for the Boost installation to be used for compiling HPX. Set this if CMake is not able to locate a suitable version of Boost. The directory specified here can be either the root of an installed Boost distribution or the directory where you unpacked and built Boost without installing it (with staged libraries).

HWLOC_ROOT:PATH#

Specifies where to look for the hwloc library. Set this if CMake is not able to locate a suitable version of hwloc. Hwloc provides platform-independent support for extracting information about the used hardware architecture (number of cores, number of NUMA domains, hyperthreading, etc.). HPX utilizes this information if available.

PAPI_ROOT:PATH#

Specifies where to look for the PAPI library. The PAPI library is needed to compile a special component exposing PAPI hardware events and counters as HPX performance counters. This is not available on the Windows platform.

AMPLIFIER_ROOT:PATH#

Specifies where to look for one of the tools of the Intel Parallel Studio product, either Intel Amplifier or Intel Inspector. This should be set if the CMake variable HPX_WITH_ITTNOTIFY is set to ON. Enabling ITT support in HPX will integrate any application with the mentioned Intel tools, which customizes the generated information for your application and improves the generated diagnostics.

In addition, some of the examples may need the following variables:

HDF5_ROOT:PATH#

Specifies where to look for the Hierarchical Data Format V5 (HDF5) include files and libraries.
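
The _ROOT variables can also be collected in an initial-cache script (pre-loaded with cmake -C). The following sketch assumes hypothetical installation directories under /opt; substitute the locations used on your system.

```cmake
# hpx-deps.cmake (hypothetical file name; all paths are placeholders)
set(BOOST_ROOT "/opt/boost" CACHE PATH "Root of the Boost installation")
set(HWLOC_ROOT "/opt/hwloc" CACHE PATH "Root of the hwloc installation")

# Only needed for the examples that require HDF5 support.
set(HDF5_ROOT "/opt/hdf5" CACHE PATH "Root of the HDF5 installation")
set(HPX_WITH_EXAMPLES_HDF5 ON CACHE BOOL "Enable examples requiring HDF5")
```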

Building tests and examples#

Tests#

To build the tests:

$ cmake --build . --target tests

To control which tests to run use ctest:

  • To run single tests, for example a test for for_loop:

$ ctest --output-on-failure -R tests.unit.modules.algorithms.algorithms.for_loop
  • To run a whole group of tests:

$ ctest --output-on-failure -R tests.unit

Examples#
  • To build (and install) all examples invoke:

$ cmake -DHPX_WITH_EXAMPLES=On .
$ make examples
$ make install
  • To build the hello_world_1 example run:

$ make hello_world_1

HPX executables end up in the bin directory in your build directory. You can now run hello_world_1 and should see the following output:

$ ./bin/hello_world_1
Hello World!

You’ve just run an example which prints Hello World! from the HPX runtime. The source for the example is in examples/quickstart/hello_world_1.cpp. The hello_world_distributed example (also available in the examples/quickstart directory) is a distributed hello world program, which is described in Remote execution with actions. It provides a gentle introduction to the distributed aspects of HPX.

Tip

Most build targets in HPX have two names: a simple name and a hierarchical name corresponding to what type of example or test the target is. If you are developing HPX it is often helpful to run make help to get a list of available targets. For example, make help | grep hello_world outputs the following:

... examples.quickstart.hello_world_2
... hello_world_2
... examples.quickstart.hello_world_1
... hello_world_1
... examples.quickstart.hello_world_distributed
... hello_world_distributed

It is also possible to build, for instance, all quickstart examples using make examples.quickstart.

Creating HPX projects#

Using HPX with pkg-config#
How to build HPX applications with pkg-config#

After you are done installing HPX, you should be able to build the following program. It prints Hello World! on the locality you run it on.

// Including 'hpx/hpx_main.hpp' instead of the usual 'hpx/hpx_init.hpp' lets you
// use the plain C main() below as the direct HPX entry point.
#include <hpx/hpx_main.hpp>
#include <hpx/iostream.hpp>

int main()
{
    // Say hello to the world!
    hpx::cout << "Hello World!\n" << std::flush;
    return 0;
}

Copy the text of this program into a file called hello_world.cpp.

Now, in the directory where you put hello_world.cpp, issue the following commands (where $HPX_LOCATION is the build directory or CMAKE_INSTALL_PREFIX you used while building HPX):

$ export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$HPX_LOCATION/lib/pkgconfig
$ c++ -o hello_world hello_world.cpp \
   `pkg-config --cflags --libs hpx_application`\
    -lhpx_iostreams -DHPX_APPLICATION_NAME=hello_world

Important

When using pkg-config with HPX, the pkg-config flags must go after the -o flag.

Note

HPX libraries have different names in debug and release mode. If you want to link against a debug HPX library, you need to use the _debug suffix for the pkg-config name. That means instead of hpx_application or hpx_component, you will have to use hpx_application_debug or hpx_component_debug. Moreover, all referenced HPX components need to have an appended d suffix. For example, instead of -lhpx_iostreams you will need to specify -lhpx_iostreamsd.

Important

If the HPX libraries are in a path that is not found by the dynamic linker, you will need to add the path $HPX_LOCATION/lib to your linker search path (for example LD_LIBRARY_PATH on Linux).

To test the program, type:

$ ./hello_world

which should print Hello World! and exit.

How to build HPX components with pkg-config#

Let’s try a more complex example involving an HPX component. An HPX component is a class that exposes HPX actions. HPX components are compiled into dynamically loaded modules called component libraries. Here’s the source code:

hello_world_component.cpp

#include <hpx/config.hpp>
#if !defined(HPX_COMPUTE_DEVICE_CODE)
#include <hpx/iostream.hpp>
#include "hello_world_component.hpp"

#include <iostream>

namespace examples { namespace server {
    void hello_world::invoke()
    {
        hpx::cout << "Hello HPX World!" << std::endl;
    }
}}    // namespace examples::server

HPX_REGISTER_COMPONENT_MODULE()

typedef hpx::components::component<examples::server::hello_world>
    hello_world_type;

HPX_REGISTER_COMPONENT(hello_world_type, hello_world)

HPX_REGISTER_ACTION(
    examples::server::hello_world::invoke_action, hello_world_invoke_action)
#endif

hello_world_component.hpp

#pragma once

#include <hpx/config.hpp>
#if !defined(HPX_COMPUTE_DEVICE_CODE)
#include <hpx/hpx.hpp>
#include <hpx/include/actions.hpp>
#include <hpx/include/components.hpp>
#include <hpx/include/lcos.hpp>
#include <hpx/serialization.hpp>

#include <utility>

namespace examples { namespace server {
    struct HPX_COMPONENT_EXPORT hello_world
      : hpx::components::component_base<hello_world>
    {
        void invoke();
        HPX_DEFINE_COMPONENT_ACTION(hello_world, invoke)
    };
}}    // namespace examples::server

HPX_REGISTER_ACTION_DECLARATION(
    examples::server::hello_world::invoke_action, hello_world_invoke_action)

namespace examples {
    struct hello_world
      : hpx::components::client_base<hello_world, server::hello_world>
    {
        typedef hpx::components::client_base<hello_world, server::hello_world>
            base_type;

        hello_world(hpx::future<hpx::id_type>&& f)
          : base_type(std::move(f))
        {
        }

        hello_world(hpx::id_type&& f)
          : base_type(std::move(f))
        {
        }

        void invoke()
        {
            hpx::async<server::hello_world::invoke_action>(this->get_id())
                .get();
        }
    };
}    // namespace examples

#endif

hello_world_client.cpp

#include <hpx/config.hpp>
#if defined(HPX_COMPUTE_HOST_CODE)
#include <hpx/wrap_main.hpp>

#include "hello_world_component.hpp"

int main()
{
    {
        // Create a single instance of the component on this locality.
        examples::hello_world client =
            hpx::new_<examples::hello_world>(hpx::find_here());

        // Invoke the component's action, which will print "Hello HPX World!".
        client.invoke();
    }

    return 0;
}
#endif

Copy the three source files above into three files (called hello_world_component.cpp, hello_world_component.hpp and hello_world_client.cpp, respectively).

Now, in the directory where you put the files, run the following command to build the component library (where $HPX_LOCATION is the build directory or CMAKE_INSTALL_PREFIX you used while building HPX):

$ export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$HPX_LOCATION/lib/pkgconfig
$ c++ -o libhpx_hello_world.so hello_world_component.cpp \
   `pkg-config --cflags --libs hpx_component` \
    -lhpx_iostreams -DHPX_COMPONENT_NAME=hpx_hello_world

Now pick a directory in which to install your HPX component libraries. For this example, we’ll choose a directory named my_hpx_libs:

$ mkdir ~/my_hpx_libs
$ mv libhpx_hello_world.so ~/my_hpx_libs

Note

HPX libraries have different names in debug and release mode. If you want to link against a debug HPX library, you need to use the _debug suffix for the pkg-config name. That means instead of hpx_application or hpx_component, you will have to use hpx_application_debug or hpx_component_debug. Moreover, all referenced HPX components need to have an appended d suffix. For example, instead of -lhpx_iostreams you will need to specify -lhpx_iostreamsd.

Important

If the HPX libraries are in a path that is not found by the dynamic linker, you will need to add the path $HPX_LOCATION/lib to your linker search path (for example LD_LIBRARY_PATH on Linux).

Now, to build the application that uses this component (hello_world_client.cpp), we do:

$ export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$HPX_LOCATION/lib/pkgconfig
$ c++ -o hello_world_client hello_world_client.cpp \
   `pkg-config --cflags --libs hpx_application`\
    -L${HOME}/my_hpx_libs -lhpx_hello_world -lhpx_iostreams

Important

When using pkg-config with HPX, the pkg-config flags must go after the -o flag.

Finally, you’ll need to set your LD_LIBRARY_PATH before you can run the program. To run the program, type:

$ export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$HOME/my_hpx_libs"
$ ./hello_world_client

which should print Hello HPX World! and exit.

Using HPX with CMake-based projects#

In addition to the pkg-config support discussed in the previous sections, HPX comes with full CMake support. To integrate HPX into an existing or new CMakeLists.txt, you can use CMake's find_package command. The following is a Hello World component example using CMake.

Let’s revisit what we have. We have three files that compose our example application:

  • hello_world_component.hpp

  • hello_world_component.cpp

  • hello_world_client.cpp

The basic structure to include HPX into your CMakeLists.txt is shown here:

# Require a recent version of cmake
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)

# This project is C++ based.
project(your_app CXX)

# Instruct cmake to find the HPX settings
find_package(HPX)

In order to have CMake find HPX, it needs to be told where to look for the HPXConfig.cmake file that is generated when HPX is built or installed. It is used by find_package(HPX) to set up all the necessary macros needed to use HPX in your project. The ways to achieve this are:

  • Set the HPX_DIR CMake variable to point to the directory containing the HPXConfig.cmake script on the command line when you invoke CMake:

    $ cmake -DHPX_DIR=$HPX_LOCATION/lib/cmake/HPX ...
    

    where $HPX_LOCATION is the build directory or CMAKE_INSTALL_PREFIX you used when building/configuring HPX.

  • Set the CMAKE_PREFIX_PATH variable to the root directory of your HPX build or install location on the command line when you invoke CMake:

    $ cmake -DCMAKE_PREFIX_PATH=$HPX_LOCATION ...
    

    The difference between CMAKE_PREFIX_PATH and HPX_DIR is that CMake will add common postfixes, such as lib/cmake/<project>, to the CMAKE_PREFIX_PATH and search in these locations too. Note that if your project uses HPX as well as other CMake-managed projects, the paths to the locations of these multiple projects may be concatenated in the CMAKE_PREFIX_PATH.

  • The variables above may be set in the CMake GUI or curses ccmake interface instead of the command line.

Additionally, if you wish to require HPX for your project, replace the find_package(HPX) line with find_package(HPX REQUIRED).

You can check if HPX was successfully found with the HPX_FOUND CMake variable.
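
As an illustration of the optional form, the sketch below checks HPX_FOUND explicitly and stops with a hint when HPX could not be located. The wording of the messages is an assumption made for this example.

```cmake
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
project(your_app CXX)

# Optional lookup: HPX_FOUND tells us whether HPXConfig.cmake was located.
find_package(HPX)

if(NOT HPX_FOUND)
  message(FATAL_ERROR
    "HPX was not found. Set HPX_DIR or CMAKE_PREFIX_PATH to your HPX "
    "build or install location.")
endif()

# After a successful lookup, HPX_DIR holds the directory of HPXConfig.cmake.
message(STATUS "Using HPX from: ${HPX_DIR}")
```

Using find_package(HPX REQUIRED) stops the configuration in the same situation without the explicit check.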

Using CMake targets#

The recommended way of setting up your targets to use HPX is to link to the HPX::hpx CMake target:

target_link_libraries(hello_world_component PUBLIC HPX::hpx)

This requires that you have already created the target like this:

add_library(hello_world_component SHARED hello_world_component.cpp)
target_include_directories(hello_world_component PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})

When you link your library to the HPX::hpx CMake target, you will be able to use HPX functionality in your library. To use main() as the implicit entry point in your application you must additionally link your application to the CMake target HPX::wrap_main. This target is automatically linked to executables if you are using the macros described below (Using macros to create new targets). See Re-use the main() function as the main HPX entry point for more information on implicitly using main() as the entry point.

Creating a component requires setting two additional compile definitions:

target_compile_definitions(hello_world_component PRIVATE
  HPX_COMPONENT_NAME=hello_world
  HPX_COMPONENT_EXPORTS)

Instead of setting these definitions manually you may link to the HPX::component target, which sets HPX_COMPONENT_NAME to hpx_<target_name>, where <target_name> is the target name of your library. Note that these definitions should be PRIVATE to make sure these definitions are not propagated transitively to dependent targets.

In addition to making your library a component you can make it a plugin. To do so link to the HPX::plugin target. Similarly to HPX::component this will set HPX_PLUGIN_NAME to hpx_<target_name>. This definition should also be PRIVATE. Unlike regular shared libraries, plugins are loaded at runtime from certain directories and will not be found without additional configuration. Plugins should be installed into a directory containing only plugins. For example, the plugins created by HPX itself are installed into the hpx subdirectory in the library install directory (typically lib or lib64). When using the HPX::plugin target you need to install your plugins into an appropriate directory. You may also want to set the location of your plugin in the build directory with the *_OUTPUT_DIRECTORY* CMake target properties to be able to load the plugins in the build directory. Once you’ve set the install or output directory of your plugin you need to tell your executable where to find it at runtime. You can do this either by setting the environment variable HPX_COMPONENT_PATHS or the ini setting hpx.component_paths (see --hpx:ini) to the directory containing your plugin.
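
Putting the pieces of this section together, a CMakeLists.txt for the Hello World component and its client might look like the sketch below. It assumes the three source files are in the current directory. The HPX::iostreams_component target is an assumption of this sketch (the component prints through hpx::cout); the other HPX targets are those named above.

```cmake
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
project(hello_world CXX)

find_package(HPX REQUIRED)

# The component library.
add_library(hello_world_component SHARED hello_world_component.cpp)
target_include_directories(hello_world_component
  PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(hello_world_component
  PUBLIC HPX::hpx
  # Assumed target name for the iostreams component used by hpx::cout.
  PUBLIC HPX::iostreams_component
  # Sets the component compile definitions; kept PRIVATE as advised above.
  PRIVATE HPX::component)

# The client executable, using main() as the implicit HPX entry point.
add_executable(hello_world_client hello_world_client.cpp)
target_link_libraries(hello_world_client
  PRIVATE HPX::hpx HPX::wrap_main hello_world_component)
```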

Using macros to create new targets#

In addition to the targets described above, HPX provides convenience macros to hide optional boilerplate code that may be useful for your project. They link to the targets described above. We recommend that you use the targets directly whenever possible as they tend to compose better with other targets.

The macro for adding an HPX component is add_hpx_component. It can be used in your CMakeLists.txt file like this:

# build your application using HPX
add_hpx_component(hello_world
    SOURCES hello_world_component.cpp
    HEADERS hello_world_component.hpp
    COMPONENT_DEPENDENCIES iostreams)

Note

add_hpx_component adds a _component suffix to the target name. In the example above, a hello_world_component target will be created.

The available options to add_hpx_component are:

  • SOURCES: The source files for that component

  • HEADERS: The header files for that component

  • DEPENDENCIES: Other libraries or targets this component depends on

  • COMPONENT_DEPENDENCIES: The components this component depends on

  • PLUGIN: Treats this component as a plugin-able library

  • COMPILE_FLAGS: Additional compiler flags

  • LINK_FLAGS: Additional linker flags

  • FOLDER: Adds the headers and source files to this Source Group folder

  • EXCLUDE_FROM_ALL: Do not build this component as part of the all target

After adding the component, the way you add the executable is as follows:

# build your application using HPX
add_hpx_executable(hello_world
    SOURCES hello_world_client.cpp
    COMPONENT_DEPENDENCIES hello_world)

Note

add_hpx_executable automatically adds a _component suffix to dependencies specified in COMPONENT_DEPENDENCIES, meaning you can directly use the name given when adding a component using add_hpx_component.

When you configure your application, all you need to do is set the HPX_DIR variable to point to the installation of HPX.

Note

All library targets built with HPX are exported and readily available to be used as arguments to target_link_libraries in your targets. The HPX include directories are available with the HPX_INCLUDE_DIRS CMake variable.
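
For reference, the macro calls shown above can be combined into one CMakeLists.txt. This is a sketch assembled from the snippets in this section; the project name is an arbitrary choice.

```cmake
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
project(hello_world CXX)

# Makes add_hpx_component and add_hpx_executable available.
find_package(HPX REQUIRED)

# Creates the target hello_world_component.
add_hpx_component(hello_world
    SOURCES hello_world_component.cpp
    HEADERS hello_world_component.hpp
    COMPONENT_DEPENDENCIES iostreams)

# The _component suffix is appended to hello_world automatically.
add_hpx_executable(hello_world
    SOURCES hello_world_client.cpp
    COMPONENT_DEPENDENCIES hello_world)
```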

Using the HPX compiler wrapper hpxcxx#

The hpxcxx compiler wrapper helps to compile an HPX component, application, or object file, based on the arguments passed to it.

$ hpxcxx [--exe=<APPLICATION_NAME> | --comp=<COMPONENT_NAME> | -c] FLAGS FILES

The hpxcxx command requires that either an application or a component is built, or that the -c flag is specified. When building against a debug build of HPX, the -g flag must be specified as well.

Optional FLAGS#
  • -l <LIBRARY> | -l<LIBRARY>: Links <LIBRARY> to the build

  • -g: Specifies that the application or component build is against a debug build

  • -rd: Sets release-with-debug-info option

  • -mr: Sets minsize-release option

All other flags (like -o OUTPUT_FILE) are directly passed to the underlying C++ compiler.

Using macros to set up existing targets to use HPX#

In addition to add_hpx_component and add_hpx_executable, you can use the hpx_setup_target macro to set up an already existing target for use with the HPX libraries:

hpx_setup_target(target)

Optional parameters are:

  • EXPORT: Adds it to the CMake export list HPXTargets

  • INSTALL: Generates an install rule for the target

  • PLUGIN: Treats this component as a plugin-able library

  • TYPE: The type can be: EXECUTABLE, LIBRARY or COMPONENT

  • DEPENDENCIES: Other libraries or targets this component depends on

  • COMPONENT_DEPENDENCIES: The components this component depends on

  • COMPILE_FLAGS: Additional compiler flags

  • LINK_FLAGS: Additional linker flags
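
As a sketch of how these parameters fit together, the following sets up a plain executable target for use with HPX. The target name my_app and the source file main.cpp are hypothetical, and the iostreams component dependency is an assumption for an application that prints through hpx::cout.

```cmake
find_package(HPX REQUIRED)

# An ordinary CMake target created without the HPX macros.
add_executable(my_app main.cpp)

# Configure the existing target to build against HPX.
hpx_setup_target(my_app
    TYPE EXECUTABLE
    COMPONENT_DEPENDENCIES iostreams)
```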

If you do not use CMake, you can still build against HPX, but you should refer to the section on How to build HPX components with pkg-config.

Note

Since HPX relies on dynamic libraries, the dynamic linker needs to know where to look for them. If HPX isn’t installed into a path that is configured as a linker search path, external projects need to either set RPATH or adapt LD_LIBRARY_PATH to point to where the HPX libraries reside. In order to set RPATHs, you can include HPX_SetFullRPATH in your project after all libraries you want to link against have been added. Please also consult the CMake documentation on RPATH handling.

Using HPX with Makefile#

A basic HPX project can also be built with a hand-written makefile. Writing one can get complex, depending on the CMake parameter HPX_WITH_HPX_MAIN (which defaults to ON).

How to build HPX applications with makefile#

If HPX is installed correctly, you should be able to build and run a simple Hello World program. It prints Hello World! on the locality you run it on.

// Including 'hpx/hpx_main.hpp' instead of the usual 'hpx/hpx_init.hpp' lets you
// use the plain C main() below as the direct HPX entry point.
#include <hpx/hpx_main.hpp>
#include <hpx/iostream.hpp>

int main()
{
    // Say hello to the world!
    hpx::cout << "Hello World!\n" << std::flush;
    return 0;
}

Copy the content of this program into a file called hello_world.cpp.

Now, in the directory where you put hello_world.cpp, create a Makefile. Add the following code:

# Set CXX to your preferred compiler, or leave it unset to use make's default.
# CXX=g++

CXXFLAGS=-O3 -std=c++17

BOOST_ROOT=/path/to/boost
HWLOC_ROOT=/path/to/hwloc
TCMALLOC_ROOT=/path/to/tcmalloc
HPX_ROOT=/path/to/hpx

INCLUDE_DIRECTIVES=-I$(HPX_ROOT)/include -I$(BOOST_ROOT)/include -I$(HWLOC_ROOT)/include

LIBRARY_DIRECTIVES=-L$(HPX_ROOT)/lib $(HPX_ROOT)/lib/libhpx_init.a $(HPX_ROOT)/lib/libhpx.so $(BOOST_ROOT)/lib/libboost_atomic-mt.so $(BOOST_ROOT)/lib/libboost_filesystem-mt.so $(BOOST_ROOT)/lib/libboost_program_options-mt.so $(BOOST_ROOT)/lib/libboost_regex-mt.so $(BOOST_ROOT)/lib/libboost_system-mt.so -lpthread $(TCMALLOC_ROOT)/libtcmalloc_minimal.so $(HWLOC_ROOT)/libhwloc.so -ldl -lrt

LINK_FLAGS=$(HPX_ROOT)/lib/libhpx_wrap.a -Wl,-wrap=main  # should be left empty for HPX_WITH_HPX_MAIN=OFF

# Recipe lines must start with a tab character.
hello_world: hello_world.o
	$(CXX) $(CXXFLAGS) -o hello_world hello_world.o $(LIBRARY_DIRECTIVES) $(LINK_FLAGS)

hello_world.o: hello_world.cpp
	$(CXX) $(CXXFLAGS) -c -o hello_world.o hello_world.cpp $(INCLUDE_DIRECTIVES)

Important

LINK_FLAGS should be left empty if HPX_WITH_HPX_MAIN is set to OFF. Boost in the above example is built with --layout=tagged. The actual Boost flags may vary with your build of Boost.

To build the program, type:

$ make

A successful build should result in a hello_world binary. To test, type:

$ ./hello_world

How to build HPX components with makefile#

Let’s try a more complex example involving an HPX component. An HPX component is a class that exposes HPX actions. HPX components are compiled into dynamically-loaded modules called component libraries. Here’s the source code:

hello_world_component.cpp

#include <hpx/config.hpp>
#if !defined(HPX_COMPUTE_DEVICE_CODE)
#include <hpx/iostream.hpp>
#include "hello_world_component.hpp"

#include <iostream>

namespace examples { namespace server {
    void hello_world::invoke()
    {
        hpx::cout << "Hello HPX World!" << std::endl;
    }
}}    // namespace examples::server

HPX_REGISTER_COMPONENT_MODULE()

typedef hpx::components::component<examples::server::hello_world>
    hello_world_type;

HPX_REGISTER_COMPONENT(hello_world_type, hello_world)

HPX_REGISTER_ACTION(
    examples::server::hello_world::invoke_action, hello_world_invoke_action)
#endif

hello_world_component.hpp

#pragma once

#include <hpx/config.hpp>
#if !defined(HPX_COMPUTE_DEVICE_CODE)
#include <hpx/hpx.hpp>
#include <hpx/include/actions.hpp>
#include <hpx/include/components.hpp>
#include <hpx/include/lcos.hpp>
#include <hpx/serialization.hpp>

#include <utility>

namespace examples { namespace server {
    struct HPX_COMPONENT_EXPORT hello_world
      : hpx::components::component_base<hello_world>
    {
        void invoke();
        HPX_DEFINE_COMPONENT_ACTION(hello_world, invoke)
    };
}}    // namespace examples::server

HPX_REGISTER_ACTION_DECLARATION(
    examples::server::hello_world::invoke_action, hello_world_invoke_action)

namespace examples {
    struct hello_world
      : hpx::components::client_base<hello_world, server::hello_world>
    {
        typedef hpx::components::client_base<hello_world, server::hello_world>
            base_type;

        hello_world(hpx::future<hpx::id_type>&& f)
          : base_type(std::move(f))
        {
        }

        hello_world(hpx::id_type&& f)
          : base_type(std::move(f))
        {
        }

        void invoke()
        {
            hpx::async<server::hello_world::invoke_action>(this->get_id())
                .get();
        }
    };
}    // namespace examples

#endif

hello_world_client.cpp

#include <hpx/config.hpp>
#if defined(HPX_COMPUTE_HOST_CODE)
#include <hpx/wrap_main.hpp>

#include "hello_world_component.hpp"

int main()
{
    {
        // Create a single instance of the component on this locality.
        examples::hello_world client =
            hpx::new_<examples::hello_world>(hpx::find_here());

        // Invoke the component's action, which will print "Hello HPX World!".
        client.invoke();
    }

    return 0;
}
#endif

Now, in the directory where you put these three files, create a Makefile. Add the following code:

CXX?=g++  # Set your preferred compiler here, or keep make's default.

CXXFLAGS=-O3 -std=c++17

BOOST_ROOT=/path/to/boost
HWLOC_ROOT=/path/to/hwloc
TCMALLOC_ROOT=/path/to/tcmalloc
HPX_ROOT=/path/to/hpx

INCLUDE_DIRECTIVES=-I$(HPX_ROOT)/include -I$(BOOST_ROOT)/include -I$(HWLOC_ROOT)/include

LIBRARY_DIRECTIVES=-L$(HPX_ROOT)/lib $(HPX_ROOT)/lib/libhpx_init.a $(HPX_ROOT)/lib/libhpx.so $(BOOST_ROOT)/lib/libboost_atomic-mt.so $(BOOST_ROOT)/lib/libboost_filesystem-mt.so $(BOOST_ROOT)/lib/libboost_program_options-mt.so $(BOOST_ROOT)/lib/libboost_regex-mt.so $(BOOST_ROOT)/lib/libboost_system-mt.so -lpthread $(TCMALLOC_ROOT)/libtcmalloc_minimal.so $(HWLOC_ROOT)/libhwloc.so -ldl -lrt

LINK_FLAGS=$(HPX_ROOT)/lib/libhpx_wrap.a -Wl,-wrap=main  # should be left empty for HPX_WITH_HPX_MAIN=OFF

hello_world_client: libhpx_hello_world hello_world_client.o
	$(CXX) $(CXXFLAGS) -o hello_world_client hello_world_client.o libhpx_hello_world $(LIBRARY_DIRECTIVES) $(LINK_FLAGS)

hello_world_client.o: hello_world_client.cpp
	$(CXX) $(CXXFLAGS) -c -o hello_world_client.o hello_world_client.cpp $(INCLUDE_DIRECTIVES)

# The component is built as a shared library, so it needs -shared here
# and position-independent code (-fPIC) when compiling its object file.
libhpx_hello_world: hello_world_component.o
	$(CXX) $(CXXFLAGS) -shared -o libhpx_hello_world hello_world_component.o $(LIBRARY_DIRECTIVES)

hello_world_component.o: hello_world_component.cpp
	$(CXX) $(CXXFLAGS) -fPIC -c -o hello_world_component.o hello_world_component.cpp $(INCLUDE_DIRECTIVES)

To build the program, type:

$ make

A successful build should result in a hello_world_client binary. To test, type:

$ ./hello_world_client

Note

Because CMake flags and library dependencies vary widely between builds, it is recommended to build HPX applications and components with pkg-config or CMakeLists.txt. A hand-written Makefile may result in broken builds if due care is not taken. The pkg-config files and CMake configuration files are generated by the CMake build of HPX. Hence, they are stable when used together with that build and provide better support overall.

Starting the HPX runtime#

In order to write an application that uses services from the HPX runtime system, you need to initialize the HPX library by inserting certain calls into the code of your application. Depending on your use case, this can be done in four different ways:

  • Minimally invasive: Re-use the main() function as the main HPX entry point.

  • Balanced use case: Supply your own main HPX entry point while blocking the main thread.

  • Most flexibility: Supply your own main HPX entry point while avoiding blocking the main thread.

  • Suspend and resume: As above but suspend and resume the HPX runtime to allow for other runtimes to be used.

Re-use the main() function as the main HPX entry point#

This method is the least intrusive to your code. However, it provides you with the least flexibility in terms of initializing the HPX runtime system. The following code snippet shows what a minimal HPX application using this technique looks like:

#include <hpx/hpx_main.hpp>

int main(int argc, char* argv[])
{
    return 0;
}

The only change to your code you have to make is to include the file hpx/hpx_main.hpp. In this case the function main() will be invoked as the first HPX thread of the application. The runtime system will be initialized behind the scenes before the function main() is executed and will automatically stop after main() has returned. For this method to work you must link your application to the CMake target HPX::wrap_main. This is done automatically if you are using the provided macros (Using macros to create new targets) to set up your application, but must be done explicitly if you are using targets directly (Using CMake targets). All HPX API functions can be used from within the main() function now.

Note

The function main() does not need to accept argc and argv as shown above; it may also use the signature int main(). This is consistent with the usually allowed prototypes for the function main() in C++ applications.

All command line arguments specific to HPX will still be processed by the HPX runtime system as usual. However, those command line options will be removed from the list of values passed to argc/argv of the function main(). The list of values passed to main() will hold only the commandline options that are not recognized by the HPX runtime system (see the section HPX Command Line Options for more details on what options are recognized by HPX).

Note

In this mode all one-letter shortcuts that are normally available on the HPX command line are disabled (such as -t or -l see HPX Command Line Options). This is done to minimize any possible interaction between the command line options recognized by the HPX runtime system and any command line options defined by the application.

The value returned from the function main() as shown above will be returned to the operating system as usual.
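The filtering of command line arguments described above can be pictured with a small standard C++ sketch. It is only a model, not the HPX implementation: the helper name strip_hpx_options is hypothetical, and the sketch assumes that every HPX-specific option starts with the prefix --hpx: (which matches the long options named in this section, since the one-letter shortcuts are disabled in this mode).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of HPX): returns the arguments that an
// application's main() would still see, assuming every HPX-specific
// option starts with the prefix "--hpx:".
std::vector<std::string> strip_hpx_options(
    std::vector<std::string> const& args)
{
    assert(!args.empty());    // args[0] is expected to be the program name

    std::string const prefix = "--hpx:";
    std::vector<std::string> remaining;
    for (std::string const& arg : args)
    {
        // compare() returns 0 only if arg begins with the prefix.
        if (arg.compare(0, prefix.size(), prefix) != 0)
            remaining.push_back(arg);
    }
    return remaining;
}
```

With the input app --hpx:threads=4 --input=data.txt, this model keeps app and --input=data.txt, which is what main() would receive.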

Important

To achieve this seamless integration, we use different implementations for different operating systems. On Linux and macOS, the code in hpx_wrap.cpp is put into action: we hook into the system function on Linux and provide an alternate entry point on macOS. For other operating systems we rely on a macro:

#define main hpx_startup::user_main

provided in the header file hpx/hpx_main.hpp. Because the macro replaces every occurrence of the token main, this implementation can result in unexpected behavior.

Caution

We make use of an override variable include_libhpx_wrap in the header file hpx/hpx_main.hpp to choose the function call stack at runtime. Therefore, the header file should only be included in the main executable. Including it in components will result in multiple definitions of the variable.

Supply your own main HPX entry point while blocking the main thread#

With this method you need to provide an explicit main-thread function named hpx_main at global scope. This function will be invoked as the main entry point of your HPX application on the console locality only (this function will be invoked as the first HPX thread of your application). All HPX API functions can be used from within this function.

The thread executing the function hpx::init will block waiting for the runtime system to exit. The value returned from hpx_main will be returned from hpx::init after the runtime system has stopped.

The function hpx::finalize has to be called on one of the HPX localities in order to signal that all work has been scheduled and the runtime system should be stopped after the scheduled work has been executed.

This method of invoking HPX has the advantage that the user can decide which version of hpx::init to call. This makes it possible to pass additional configuration parameters while initializing the HPX runtime system.

#include <hpx/hpx_init.hpp>

int hpx_main(int argc, char* argv[])
{
    // Any HPX application logic goes here...
    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    // Initialize HPX, run hpx_main as the first HPX thread, and
    // wait for hpx::finalize being called.
    return hpx::init(argc, argv);
}

Note

The function hpx_main does not need to accept argc/argv as shown above; it may use any one of the following signatures:

int hpx_main();
int hpx_main(int argc, char* argv[]);
int hpx_main(hpx::program_options::variables_map& vm);

This is consistent with (and extends) the usually allowed prototypes for the function main() in C++ applications.

The header file to include for this method of using HPX is hpx/hpx_init.hpp.

There are many additional overloads of hpx::init available, such as the ability to provide your own entry-point function instead of hpx_main. Please refer to the function documentation for more details (see: hpx/hpx_init.hpp).

Supply your own main HPX entry point while avoiding blocking the main thread#

With this method you need to provide an explicit main thread function named hpx_main at global scope. This function will be invoked as the main entry point of your HPX application on the console locality only (this function will be invoked as the first HPX thread of your application). All HPX API functions can be used from within this function.

The thread executing the function hpx::start will not block waiting for the runtime system to exit, but will return immediately. The function hpx::finalize has to be called on one of the HPX localities in order to signal that all work has been scheduled and the runtime system should be stopped after the scheduled work has been executed.

This method of invoking HPX is useful for applications where the main thread is used for special operations, such as GUIs. The function hpx::stop can be used to wait for the HPX runtime system to exit and should at least be used as the last function called in main(). The value returned from hpx_main will be returned from hpx::stop after the runtime system has stopped.

#include <hpx/hpx_start.hpp>

int hpx_main(int argc, char* argv[])
{
    // Any HPX application logic goes here...
    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    // Initialize HPX, run hpx_main.
    hpx::start(argc, argv);

    // ...Execute other code here...

    // Wait for hpx::finalize being called.
    return hpx::stop();
}

Note

The function hpx_main does not need to accept argc/argv as shown above; it may use any one of the following signatures:

int hpx_main();
int hpx_main(int argc, char* argv[]);
int hpx_main(hpx::program_options::variables_map& vm);

This is consistent with (and extends) the usually allowed prototypes for the function main() in C++ applications.

The header file to include for this method of using HPX is hpx/hpx_start.hpp.

There are many additional overloads of hpx::start available, such as the option for users to provide their own entry point function instead of hpx_main. Please refer to the function documentation for more details (see: hpx/hpx_start.hpp).

Supply your own explicit startup function as the main HPX entry point#

There is also a way to specify any function (besides hpx_main) to be used as the main entry point for your HPX application:

#include <hpx/hpx_init.hpp>

int application_entry_point(int argc, char* argv[])
{
    // Any HPX application logic goes here...
    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    // Initialize HPX, run application_entry_point as the first HPX thread,
    // and wait for hpx::finalize being called.
    return hpx::init(&application_entry_point, argc, argv);
}

Note

The function supplied to hpx::init must have one of the following prototypes:

int application_entry_point(int argc, char* argv[]);
int application_entry_point(hpx::program_options::variables_map& vm);

Note

If nullptr is used as the function argument, HPX will not run any startup function on this locality.

Suspending and resuming the HPX runtime#

In some applications it is required to combine HPX with other runtimes. To support this use case, HPX provides two functions: hpx::suspend and hpx::resume. hpx::suspend is a blocking call which will wait for all scheduled tasks to finish executing and then put the thread pool OS threads to sleep. hpx::resume simply wakes up the sleeping threads so that they are ready to accept new work. hpx::suspend and hpx::resume can be found in the header hpx/hpx_suspend.hpp.

#include <hpx/hpx_start.hpp>
#include <hpx/hpx_suspend.hpp>

int main(int argc, char* argv[])
{

    // Initialize HPX, don't run hpx_main
    hpx::start(nullptr, argc, argv);

    // Schedule a function on the HPX runtime
    hpx::post(&my_function, ...);

    // Wait for all tasks to finish, and suspend the HPX runtime
    hpx::suspend();

    // Execute non-HPX code here

    // Resume the HPX runtime
    hpx::resume();

    // Schedule more work on the HPX runtime

    // hpx::finalize has to be called from the HPX runtime before hpx::stop
    hpx::post([]() { hpx::finalize(); });
    return hpx::stop();
}

Note

hpx::suspend does not wait for hpx::finalize to be called. Only call hpx::finalize when you wish to fully stop the HPX runtime.

Warning

hpx::suspend only waits for local tasks, i.e. tasks on the current locality, to finish executing. When using hpx::suspend in a multi-locality scenario the user is responsible for ensuring that any work required from other localities has also finished.

HPX also supports suspending individual thread pools and threads. For details on how to do that, see the documentation for hpx::threads::thread_pool_base.

Automatically suspending worker threads#

The previous method guarantees that the worker threads are suspended when you ask for it and that they stay suspended. An alternative way to achieve the same effect is to tweak how quickly HPX suspends its worker threads when they run out of work. The following configuration values make sure that HPX idles very quickly:

hpx.max_idle_backoff_time = 1000
hpx.max_idle_loop_count = 0

They can be set on the command line using --hpx:ini=hpx.max_idle_backoff_time=1000 and --hpx:ini=hpx.max_idle_loop_count=0. See Launching and configuring HPX applications for more details on how to set configuration parameters.
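To make the shape of these --hpx:ini arguments concrete, here is a minimal standard C++ sketch that splits such an argument into a configuration key and a value. The helper parse_ini_option is hypothetical and only illustrates the syntax shown above; it is not how HPX processes its command line.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of HPX: splits an argument of the form
// --hpx:ini=<key>=<value> into key and value. Returns false for any
// argument that does not have that shape.
bool parse_ini_option(
    std::string const& arg, std::string& key, std::string& value)
{
    std::string const prefix = "--hpx:ini=";
    if (arg.compare(0, prefix.size(), prefix) != 0)
        return false;
    assert(arg.size() >= prefix.size());

    // The first '=' after the prefix separates key and value.
    std::size_t const pos = arg.find('=', prefix.size());
    if (pos == std::string::npos)
        return false;

    key = arg.substr(prefix.size(), pos - prefix.size());
    value = arg.substr(pos + 1);
    return true;
}
```

For the argument --hpx:ini=hpx.max_idle_backoff_time=1000, the sketch yields the key hpx.max_idle_backoff_time and the value 1000.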

After setting idling parameters the previous example could now be written like this instead:

#include <hpx/hpx_start.hpp>

int main(int argc, char* argv[])
{

    // Initialize HPX, don't run hpx_main
    hpx::start(nullptr, argc, argv);

    // Schedule some functions on the HPX runtime
    // NOTE: run_as_hpx_thread blocks until completion.
    hpx::run_as_hpx_thread(&my_function, ...);
    hpx::run_as_hpx_thread(&my_other_function, ...);

    // hpx::finalize has to be called from the HPX runtime before hpx::stop
    hpx::post([]() { hpx::finalize(); });
    return hpx::stop();
}

In this example each call to hpx::run_as_hpx_thread acts as a “parallel region”.

Working of hpx_main.hpp#

In order to initialize HPX from main(), we make use of linker tricks.

It is implemented differently for different operating systems. The method of implementation is as follows:

  • Linux: Using linker --wrap option.

  • Mac OSX: Using the linker -e option.

  • Windows: Using #define main hpx_startup::user_main

Linux implementation#

We make use of the --wrap option of the Linux linker ld to wrap the main() function. This way any calls to main() are redirected to our own implementation of main. It is here that we check whether hpx_main.hpp has been included by making use of a shadow variable include_libhpx_wrap. The value of this variable determines the function stack at runtime.

The implementation can be found in libhpx_wrap.a.

Important

hpx_main.hpp must not be included more than once. Multiple inclusions can result in multiple definitions of include_libhpx_wrap.

Mac OSX implementation#

Here we make use of yet another linker option -e to change the entry point to our custom entry function initialize_main. We initialize the HPX runtime system from this function and call main from the initialized system. We determine the function stack at runtime by making use of the shadow variable include_libhpx_wrap.

The implementation can be found in libhpx_wrap.a.

Important

hpx_main.hpp must not be included more than once. Multiple inclusions can result in multiple definitions of include_libhpx_wrap.

Windows implementation#

We make use of a macro #define main hpx_startup::user_main to take care of the initializations.

This implementation could result in unexpected behavior.

Launching and configuring HPX applications#

Configuring HPX applications#

All HPX applications can be configured using special command line options and/or using special configuration files. This section describes the available options, the configuration file format, and the algorithm used to locate possible predefined configuration files. Additionally, this section describes the defaults assumed if no external configuration information is supplied.

During startup any HPX application applies a predefined search pattern to locate one or more configuration files. All found files will be read and merged in the sequence they are found into one single internal database holding all configuration properties. This database is used during the execution of the application to configure different aspects of the runtime system.

In addition to the ini files, any application can supply its own configuration files, which will be merged with the configuration database as well. Moreover, the user can specify additional configuration parameters on the command line when executing an application. The HPX runtime system will merge all command line configuration options into the configuration database as well (see the description of the --hpx:ini, --hpx:config, and --hpx:app-config command line options).

The HPX ini file format#

All HPX applications can be configured using a special file format that is similar to the well-known Windows INI file format. This is a structured text format that allows users to group key/value pairs (properties) into sections. The basic element contained in an ini file is the property. Every property has a name and a value, delimited by an equal sign '='. The name appears to the left of the equal sign:

name=value

The value may contain equal signs as only the first '=' character is interpreted as the delimiter between name and value. Whitespace before the name, after the value and immediately before and after the delimiting equal sign is ignored. Whitespace inside the value is retained.
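The splitting and whitespace rules for a single property line can be sketched in standard C++ as follows. The helpers trim and parse_property are hypothetical names introduced for illustration; this is a model of the rules stated above, not the parser HPX uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: removes leading and trailing spaces and tabs.
std::string trim(std::string const& s)
{
    char const* ws = " \t";
    std::size_t const first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return std::string();
    std::size_t const last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: splits "name=value" at the first '=' only, so the
// value may itself contain further '=' characters.
std::pair<std::string, std::string> parse_property(std::string const& line)
{
    std::size_t const pos = line.find('=');
    assert(pos != std::string::npos);    // callers pass property lines only
    return {trim(line.substr(0, pos)), trim(line.substr(pos + 1))};
}
```

For the line "  cmd = a=b  c ", this sketch yields the name cmd and the value "a=b  c": the surrounding whitespace is dropped, while the second equal sign and the inner whitespace are retained.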

Properties may be grouped into arbitrarily named sections. The section name appears on a line by itself, in square brackets. All properties after the section declaration are associated with that section. There is no explicit “end of section” delimiter; sections end at the next section declaration or the end of the file:

[section]

In HPX, sections can be nested. A nested section has a name composed of all section names it is embedded in. The section names are concatenated using a dot '.':

[outer_section.inner_section]

Here, inner_section is logically nested within outer_section.

It is possible to use the full section name concatenated with the property name to refer to a particular property. For example, in:

[a.b.c]
d = e

the property value of d can be referred to as a.b.c.d=e.

HPX ini files can contain comments. Hash signs '#' at the beginning of a line indicate a comment. All characters starting with '#' until the end of the line are ignored.

If a property with the same name is reused inside a section, the second occurrence of this property name will override the first occurrence (discard the first value). Duplicate sections simply merge their properties together, as if they occurred contiguously.
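Taken together, the rules for sections, fully qualified property names, comments, and duplicates can be modeled by a short standard C++ sketch that flattens ini text into a map from full property name to value. The function parse_ini is hypothetical and makes simplifying assumptions: it treats any '#' as the start of a comment (so values cannot contain '#') and it silently skips lines that are neither sections nor properties.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: removes leading and trailing spaces and tabs.
std::string trim(std::string const& s)
{
    char const* ws = " \t";
    std::size_t const first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return std::string();
    std::size_t const last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical sketch, not the HPX parser: maps "section.name" to value.
std::map<std::string, std::string> parse_ini(std::string const& text)
{
    std::map<std::string, std::string> properties;
    std::string section;    // empty until the first [section] line
    std::istringstream input(text);
    std::string line;
    while (std::getline(input, line))
    {
        // Drop everything from '#' to the end of the line.
        line = trim(line.substr(0, line.find('#')));
        if (line.empty())
            continue;

        if (line.front() == '[' && line.back() == ']')
        {
            assert(line.size() >= 2);
            section = trim(line.substr(1, line.size() - 2));
            continue;
        }

        std::size_t const pos = line.find('=');
        if (pos == std::string::npos)
            continue;    // not a property; this sketch skips the line

        std::string const name = trim(line.substr(0, pos));
        std::string const key =
            section.empty() ? name : section + "." + name;

        // Assigning to an existing key discards the earlier value; a
        // repeated section simply keeps adding keys under the same prefix.
        properties[key] = trim(line.substr(pos + 1));
    }
    return properties;
}
```

In this model, declaring [a.b.c] twice with d = e in the first block and d = f in the second leaves a single entry a.b.c.d with the value f.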

In HPX ini files, a property value ${FOO:default} uses the value of the environment variable FOO if it is set and default otherwise. The default is optional: ${FOO} refers to the environment variable FOO and evaluates to an empty string if FOO is not set or empty. Likewise, a property value $[section.key:default] refers to the value held by the property section.key if it exists and default otherwise. Again, the default is optional: $[section.key] refers to the property section.key and evaluates to an empty string if that property is not set or empty.

Note

Any property $[section.key:default] is evaluated whenever it is queried and not when the configuration data is initialized. This allows for lazy evaluation and relaxes the initialization order of different sections. The only exceptions are recursive property values, e.g., values referring to the very key they are associated with. Those property values are evaluated at initialization time to avoid infinite recursion.
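The expansion of ${FOO:default} and $[section.key:default] at query time can be sketched in standard C++. This is a hypothetical model, not the HPX implementation: the environment is passed in as a map rather than read with std::getenv, property values are expanded recursively when queried, environment values are used as-is, and a simple depth limit stands in for the special handling of recursive values.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

using dict = std::map<std::string, std::string>;

// Hypothetical sketch. `env` stands in for the process environment and
// `props` for the configuration database.
std::string expand(std::string const& value, dict const& env,
    dict const& props, int depth = 0)
{
    if (depth > 16)
        return value;    // give up on (likely recursive) references

    std::string result;
    std::size_t i = 0;
    while (i < value.size())
    {
        bool const is_ref = value[i] == '$' && i + 1 < value.size() &&
            (value[i + 1] == '{' || value[i + 1] == '[');
        if (!is_ref)
        {
            result += value[i++];
            continue;
        }

        // Find the bracket closing this reference, counting nested
        // brackets of the same kind.
        char const open = value[i + 1];
        char const close = (open == '{') ? '}' : ']';
        std::size_t end = i + 2;
        for (int level = 1; end < value.size(); ++end)
        {
            if (value[end] == open)
                ++level;
            else if (value[end] == close && --level == 0)
                break;
        }
        if (end == value.size())
        {
            result += value[i++];    // unterminated: keep the text literally
            continue;
        }
        assert(end >= i + 2);

        // The first ':' separates the name from the optional default.
        std::string const inner = value.substr(i + 2, end - (i + 2));
        std::size_t const colon = inner.find(':');
        std::string const name = inner.substr(0, colon);
        std::string const fallback = (colon == std::string::npos) ?
            std::string() : inner.substr(colon + 1);

        dict const& source = (open == '{') ? env : props;
        dict::const_iterator const it = source.find(name);
        if (it == source.end())
            result += expand(fallback, env, props, depth + 1);
        else if (open == '[')
            result += expand(it->second, env, props, depth + 1);
        else
            result += it->second;    // environment values are used as-is

        i = end + 1;
    }
    return result;
}
```

With props holding system.prefix = /usr and hpx.location = ${HPX_LOCATION:$[system.prefix]}, querying $[hpx.location] in this model gives /usr when HPX_LOCATION is absent from env and the value of HPX_LOCATION otherwise.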

Built-in default configuration settings#

During startup any HPX application applies a predefined search pattern to locate one or more configuration files. All found files will be read and merged in the sequence they are found into one single internal data structure holding all configuration properties.

As a first step the internal configuration database is filled with a set of default configuration properties. Those settings are described on a section by section basis below.

Note

You can print the default configuration settings used for an executable by specifying the command line option --hpx:dump-config.

The system configuration section#
[system]
pid = <process-id>
prefix = <current prefix path of core HPX library>
executable_prefix = <current prefix path of executable>

Property

Description

system.pid

This is initialized to store the current OS-process id of the application instance.

system.prefix

This is initialized to the base directory HPX has been loaded from.

system.executable_prefix

This is initialized to the base directory the current executable has been loaded from.

The HPX configuration section#
[hpx]
location = ${HPX_LOCATION:$[system.prefix]}
component_path = $[hpx.location]/lib/hpx:$[system.executable_prefix]/lib/hpx:$[system.executable_prefix]/../lib/hpx
master_ini_path = $[hpx.location]/share/hpx-<version>:$[system.executable_prefix]/share/hpx-<version>:$[system.executable_prefix]/../share/hpx-<version>
ini_path = $[hpx.master_ini_path]/ini
os_threads = 1
cores = all
localities = 1
program_name =
cmd_line =
lock_detection = ${HPX_LOCK_DETECTION:0}
throw_on_held_lock = ${HPX_THROW_ON_HELD_LOCK:1}
minimal_deadlock_detection = <debug>
spinlock_deadlock_detection = <debug>
spinlock_deadlock_detection_limit = ${HPX_SPINLOCK_DEADLOCK_DETECTION_LIMIT:1000000}
max_background_threads = ${HPX_MAX_BACKGROUND_THREADS:$[hpx.os_threads]}
max_idle_loop_count = ${HPX_MAX_IDLE_LOOP_COUNT:<hpx_idle_loop_count_max>}
max_busy_loop_count = ${HPX_MAX_BUSY_LOOP_COUNT:<hpx_busy_loop_count_max>}
max_idle_backoff_time = ${HPX_MAX_IDLE_BACKOFF_TIME:<hpx_idle_backoff_time_max>}
exception_verbosity = ${HPX_EXCEPTION_VERBOSITY:2}
trace_depth = ${HPX_TRACE_DEPTH:20}
handle_signals = ${HPX_HANDLE_SIGNALS:1}

[hpx.stacks]
small_size = ${HPX_SMALL_STACK_SIZE:<hpx_small_stack_size>}
medium_size = ${HPX_MEDIUM_STACK_SIZE:<hpx_medium_stack_size>}
large_size = ${HPX_LARGE_STACK_SIZE:<hpx_large_stack_size>}
huge_size = ${HPX_HUGE_STACK_SIZE:<hpx_huge_stack_size>}
use_guard_pages = ${HPX_THREAD_GUARD_PAGE:1}

Property

Description

hpx.location

This is initialized to the base directory of the HPX installation. It defaults to the value of system.prefix and can be overridden with the environment variable HPX_LOCATION.

hpx.component_path

This is initialized to the list of directories where HPX will look for component libraries. Duplicates are discarded. This property can refer to a list of directories separated by ':' (Linux, Android, and MacOS) or by ';' (Windows).

hpx.master_ini_path

This is initialized to the list of default paths of the main hpx.ini configuration files. This property can refer to a list of directories separated by ':' (Linux, Android, and MacOS) or using ';' (Windows).

hpx.ini_path

This is initialized to the default path where HPX will look for more ini configuration files. This property can refer to a list of directories separated by ':' (Linux, Android, and MacOS) or using ';' (Windows).

hpx.os_threads

This setting reflects the number of OS threads used for running HPX threads. Defaults to number of detected cores (not hyperthreads/PUs).

hpx.cores

This setting reflects the number of cores used for running HPX threads. Defaults to number of detected cores (not hyperthreads/PUs).

hpx.localities

This setting reflects the number of localities the application is running on. Defaults to 1.

hpx.program_name

This setting reflects the program name of the application instance. Initialized from the command line argv[0].

hpx.cmd_line

This setting reflects the actual command line used to launch this application instance.

hpx.lock_detection

This setting verifies that no locks are being held while an HPX thread is suspended. This setting is applicable only if HPX_WITH_VERIFY_LOCKS is set during configuration in CMake.

hpx.throw_on_held_lock

This setting causes an exception to be thrown if during lock detection at least one lock is being held while an HPX thread is suspended. This setting is applicable only if HPX_WITH_VERIFY_LOCKS is set during configuration in CMake. This setting has no effect if hpx.lock_detection=0.

hpx.minimal_deadlock_detection

This setting enables support for minimal deadlock detection for HPX threads. By default this is set to 1 (for Debug builds) or to 0 (for Release, RelWithDebInfo, MinSizeRel builds). This setting is effective only if HPX_WITH_THREAD_DEADLOCK_DETECTION is set during configuration in CMake.

hpx.spinlock_deadlock_detection

This setting verifies that spinlocks don’t spin longer than specified using the hpx.spinlock_deadlock_detection_limit. This setting is applicable only if HPX_WITH_SPINLOCK_DEADLOCK_DETECTION is set during configuration in CMake. By default this is set to 1 (for Debug builds) or to 0 (for Release, RelWithDebInfo, MinSizeRel builds).

hpx.spinlock_deadlock_detection_limit

This setting specifies the upper limit of the allowed number of spins that spinlocks are allowed to perform. This setting is applicable only if HPX_WITH_SPINLOCK_DEADLOCK_DETECTION is set during configuration in CMake. By default this is set to 1000000.

hpx.max_background_threads

This setting defines the number of threads in the scheduler, which are used to execute background work. By default this is the same as the number of cores used for the scheduler.

hpx.max_idle_loop_count

This setting defines the maximum value of the idle-loop counter in the scheduler. By default this is defined by the preprocessor constant HPX_IDLE_LOOP_COUNT_MAX. This is an internal setting that you should change only if you know exactly what you are doing.

hpx.max_busy_loop_count

This setting defines the maximum value of the busy-loop counter in the scheduler. By default this is defined by the preprocessor constant HPX_BUSY_LOOP_COUNT_MAX. This is an internal setting that you should change only if you know exactly what you are doing.

hpx.max_idle_backoff_time

This setting defines the maximum time (in milliseconds) for the scheduler to sleep after being idle for hpx.max_idle_loop_count iterations. This setting is applicable only if HPX_WITH_THREAD_MANAGER_IDLE_BACKOFF is set during configuration in CMake. By default this is defined by the preprocessor constant HPX_IDLE_BACKOFF_TIME_MAX. This is an internal setting that you should change only if you know exactly what you are doing.

hpx.exception_verbosity

This setting defines the verbosity of exceptions. Valid values are integers. A setting of 2 or higher prints all available information. A setting of 1 leaves out the build configuration and environment variables. A setting of 0 or lower prints only the description of the thrown exception and the file name, function, and line number where the exception was thrown. The default value is 2 or the value of the environment variable HPX_EXCEPTION_VERBOSITY.

hpx.trace_depth

This setting defines the number of stack-levels printed in generated stack backtraces. This defaults to 20, but can be changed using the cmake HPX_WITH_THREAD_BACKTRACE_DEPTH configuration setting.

hpx.handle_signals

This setting defines whether HPX will register signal handlers that will print diagnostic information (stack backtrace, system information, etc.) whenever a signal is raised. The default is 1. Setting this value to 0 can be useful in cases when generating a core-dump on segmentation faults or similar signals is desired. This setting has no effect on non-Linux platforms.

hpx.stacks.small_size

This is initialized to the small stack size to be used by HPX threads. Set by default to the value of the compile time preprocessor constant HPX_SMALL_STACK_SIZE (defaults to 0x8000). This value is used for all HPX threads by default, except for the thread running hpx_main (which runs on a large stack).

hpx.stacks.medium_size

This is initialized to the medium stack size to be used by HPX threads. Set by default to the value of the compile time preprocessor constant HPX_MEDIUM_STACK_SIZE (defaults to 0x20000).

hpx.stacks.large_size

This is initialized to the large stack size to be used by HPX threads. Set by default to the value of the compile time preprocessor constant HPX_LARGE_STACK_SIZE (defaults to 0x200000). This setting is used by default for the thread running hpx_main only.

hpx.stacks.huge_size

This is initialized to the huge stack size to be used by HPX threads. Set by default to the value of the compile time preprocessor constant HPX_HUGE_STACK_SIZE (defaults to 0x2000000).

hpx.stacks.use_guard_pages

This entry controls whether the coroutine library will generate stack guard pages or not. This entry is applicable on Linux only and only if the HPX_USE_GENERIC_COROUTINE_CONTEXT option is not enabled and the HPX_WITH_THREAD_GUARD_PAGE is set to 1 while configuring the build system. It is set by default to 1.
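Several of the properties above (hpx.component_path, hpx.master_ini_path, hpx.ini_path) hold a list of directories joined by a platform-specific separator, and for hpx.component_path duplicates are discarded. The following standard C++ sketch shows one way to read such a list. The helper split_path_list is hypothetical, and keeping the first occurrence of each directory in its original order is an assumption of this sketch, not documented HPX behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of HPX: splits a directory list at the
// given separator (':' on Linux, Android, and MacOS; ';' on Windows),
// skipping empty entries and entries that were already seen.
std::vector<std::string> split_path_list(
    std::string const& list, char separator)
{
    std::vector<std::string> paths;
    std::size_t begin = 0;
    while (begin <= list.size())
    {
        std::size_t end = list.find(separator, begin);
        if (end == std::string::npos)
            end = list.size();
        assert(end >= begin);

        std::string const entry = list.substr(begin, end - begin);
        bool const seen =
            std::find(paths.begin(), paths.end(), entry) != paths.end();
        if (!entry.empty() && !seen)
            paths.push_back(entry);

        begin = end + 1;
    }
    return paths;
}
```

Called with "/opt/hpx/lib/hpx:/usr/lib/hpx:/opt/hpx/lib/hpx" and ':', the sketch returns the two distinct directories in their original order.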

The hpx.threadpools configuration section#
[hpx.threadpools]
io_pool_size = ${HPX_NUM_IO_POOL_SIZE:2}
parcel_pool_size = ${HPX_NUM_PARCEL_POOL_SIZE:2}
timer_pool_size = ${HPX_NUM_TIMER_POOL_SIZE:2}

Property

Description

hpx.threadpools.io_pool_size

The value of this property defines the number of OS threads created for the internal I/O thread pool.

hpx.threadpools.parcel_pool_size

The value of this property defines the number of OS threads created for the internal parcel thread pool.

hpx.threadpools.timer_pool_size

The value of this property defines the number of OS threads created for the internal timer thread pool.

The hpx.thread_queue configuration section#

Important

These settings control internal values used by the thread scheduling queues in the HPX scheduler. You should not modify these settings unless you know exactly what you are doing.

[hpx.thread_queue]
min_tasks_to_steal_pending = ${HPX_THREAD_QUEUE_MIN_TASKS_TO_STEAL_PENDING:0}
min_tasks_to_steal_staged = ${HPX_THREAD_QUEUE_MIN_TASKS_TO_STEAL_STAGED:0}
min_add_new_count = ${HPX_THREAD_QUEUE_MIN_ADD_NEW_COUNT:10}
max_add_new_count = ${HPX_THREAD_QUEUE_MAX_ADD_NEW_COUNT:10}
max_delete_count = ${HPX_THREAD_QUEUE_MAX_DELETE_COUNT:1000}

Property

Description

hpx.thread_queue.min_tasks_to_steal_pending

The value of this property defines the number of pending HPX threads that have to be available before neighboring cores are allowed to steal work. The default is to allow stealing always.

hpx.thread_queue.min_tasks_to_steal_staged

The value of this property defines the number of staged HPX tasks that need to be available before neighboring cores are allowed to steal work. The default is to allow stealing always.

hpx.thread_queue.min_add_new_count

The value of this property defines the minimal number of tasks to be converted into HPX threads whenever the thread queues for a core have run empty.

hpx.thread_queue.max_add_new_count

The value of this property defines the maximal number of tasks to be converted into HPX threads whenever the thread queues for a core have run empty.

hpx.thread_queue.max_delete_count

The value of this property defines the number of terminated HPX threads to discard during each invocation of the corresponding function.

The hpx.components configuration section#
[hpx.components]
load_external = ${HPX_LOAD_EXTERNAL_COMPONENTS:1}

Property

Description

hpx.components.load_external

This entry defines whether external components will be loaded on this locality. This entry is normally set to 1, and usually there is no need to directly change this value. It is automatically set to 0 for a dedicated AGAS server locality.

Additionally, the section hpx.components will be populated with the information gathered from all found components. The information loaded for each of the components will contain at least the following properties:

[hpx.components.<component_instance_name>]
name = <component_name>
path = <full_path_of_the_component_module>
enabled = $[hpx.components.load_external]

Property

Description

hpx.components.<component_instance_name>.name

This is the name of a component, usually the same as the second argument to the macro used while registering the component with HPX_REGISTER_COMPONENT. Set by the component factory.

hpx.components.<component_instance_name>.path

This is either the full path file name of the component module or the directory the component module is located in. In this case, the component module name will be derived from the property hpx.components.<component_instance_name>.name. Set by the component factory.

hpx.components.<component_instance_name>.enabled

This setting explicitly enables or disables the component. This is an optional property. HPX assumes that the component is enabled if it is not defined.

The value for <component_instance_name> is usually the same as for the corresponding name property. However, generally it can be defined to any arbitrary instance name. It is used to distinguish between different ini sections, one for each component.

The hpx.parcel configuration section#
[hpx.parcel]
address = ${HPX_PARCEL_SERVER_ADDRESS:<hpx_initial_ip_address>}
port = ${HPX_PARCEL_SERVER_PORT:<hpx_initial_ip_port>}
bootstrap = ${HPX_PARCEL_BOOTSTRAP:<hpx_parcel_bootstrap>}
max_connections = ${HPX_PARCEL_MAX_CONNECTIONS:<hpx_parcel_max_connections>}
max_connections_per_locality = ${HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY:<hpx_parcel_max_connections_per_locality>}
max_message_size = ${HPX_PARCEL_MAX_MESSAGE_SIZE:<hpx_parcel_max_message_size>}
max_outbound_message_size = ${HPX_PARCEL_MAX_OUTBOUND_MESSAGE_SIZE:<hpx_parcel_max_outbound_message_size>}
array_optimization = ${HPX_PARCEL_ARRAY_OPTIMIZATION:1}
zero_copy_optimization = ${HPX_PARCEL_ZERO_COPY_OPTIMIZATION:$[hpx.parcel.array_optimization]}
zero_copy_receive_optimization = ${HPX_PARCEL_ZERO_COPY_RECEIVE_OPTIMIZATION:$[hpx.parcel.array_optimization]}
async_serialization = ${HPX_PARCEL_ASYNC_SERIALIZATION:1}
message_handlers = ${HPX_PARCEL_MESSAGE_HANDLERS:0}

Property

Description

hpx.parcel.address

This property defines the default IP address to be used for the parcel layer to listen to. This IP address will be used as long as no other values are specified (for instance, using the --hpx:hpx command line option). The expected format is any valid IP address or domain name format that can be resolved into an IP address. The default depends on the compile time preprocessor constant HPX_INITIAL_IP_ADDRESS ("127.0.0.1").

hpx.parcel.port

This property defines the default IP port to be used for the parcel layer to listen to. This IP port will be used as long as no other values are specified (for instance, using the --hpx:hpx command line option). The default depends on the compile time preprocessor constant HPX_INITIAL_IP_PORT (7910).

hpx.parcel.bootstrap

This property defines which parcelport type should be used during application bootstrap. The default depends on the compile time preprocessor constant HPX_PARCEL_BOOTSTRAP ("tcp").

hpx.parcel.max_connections

This property defines how many network connections between different localities are overall kept alive by each locality. The default depends on the compile time preprocessor constant HPX_PARCEL_MAX_CONNECTIONS (512).

hpx.parcel.max_connections_per_locality

This property defines the maximum number of network connections that one locality will open to another locality. The default depends on the compile time preprocessor constant HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY (4).

hpx.parcel.max_message_size

This property defines the maximum allowed message size that will be transferrable through the parcel layer. The default depends on the compile time preprocessor constant HPX_PARCEL_MAX_MESSAGE_SIZE (1000000000 bytes).

hpx.parcel.max_outbound_message_size

This property defines the maximum allowed outbound coalesced message size that will be transferrable through the parcel layer. The default depends on the compile time preprocessor constant HPX_PARCEL_MAX_OUTBOUND_MESSAGE_SIZE (1000000 bytes).

hpx.parcel.array_optimization

This property defines whether this locality is allowed to utilize array optimizations during serialization of parcel data. The default is 1.

hpx.parcel.zero_copy_optimization

This property defines whether this locality is allowed to utilize zero copy optimizations during serialization of parcel data. The default is the same value as set for hpx.parcel.array_optimization.

hpx.parcel.zero_copy_receive_optimization

This property defines whether this locality is allowed to utilize zero copy optimizations on the receiving end during de-serialization of parcel data. The default is the same value as set for hpx.parcel.array_optimization.

hpx.parcel.zero_copy_serialization_threshold

This property defines the threshold value (in bytes) starting at which the serialization layer will apply zero-copy optimizations for serialized entities. The default value is defined by the preprocessor constant HPX_ZERO_COPY_SERIALIZATION_THRESHOLD.

hpx.parcel.async_serialization

This property defines whether this locality is allowed to spawn a new thread for serialization (this is both for encoding and decoding parcels). The default is 1.

hpx.parcel.message_handlers

This property defines whether message handlers are loaded. The default is 0.

hpx.parcel.max_background_threads

This property defines how many cores should be used to perform background operations. The default is -1 (all cores).

The following settings relate to the TCP/IP parcelport.

[hpx.parcel.tcp]
enable = ${HPX_HAVE_PARCELPORT_TCP:$[hpx.parcel.enabled]}
array_optimization = ${HPX_PARCEL_TCP_ARRAY_OPTIMIZATION:$[hpx.parcel.array_optimization]}
zero_copy_optimization = ${HPX_PARCEL_TCP_ZERO_COPY_OPTIMIZATION:$[hpx.parcel.zero_copy_optimization]}
zero_copy_receive_optimization = ${HPX_PARCEL_TCP_ZERO_COPY_RECEIVE_OPTIMIZATION:$[hpx.parcel.zero_copy_receive_optimization]}
zero_copy_serialization_threshold =  ${HPX_PARCEL_TCP_ZERO_COPY_SERIALIZATION_THRESHOLD:$[hpx.parcel.zero_copy_serialization_threshold]}
async_serialization = ${HPX_PARCEL_TCP_ASYNC_SERIALIZATION:$[hpx.parcel.async_serialization]}
parcel_pool_size = ${HPX_PARCEL_TCP_PARCEL_POOL_SIZE:$[hpx.threadpools.parcel_pool_size]}
max_connections =  ${HPX_PARCEL_TCP_MAX_CONNECTIONS:$[hpx.parcel.max_connections]}
max_connections_per_locality = ${HPX_PARCEL_TCP_MAX_CONNECTIONS_PER_LOCALITY:$[hpx.parcel.max_connections_per_locality]}
max_message_size =  ${HPX_PARCEL_TCP_MAX_MESSAGE_SIZE:$[hpx.parcel.max_message_size]}
max_outbound_message_size =  ${HPX_PARCEL_TCP_MAX_OUTBOUND_MESSAGE_SIZE:$[hpx.parcel.max_outbound_message_size]}
max_background_threads =  ${HPX_PARCEL_TCP_MAX_BACKGROUND_THREADS:$[hpx.parcel.max_background_threads]}

Property

Description

hpx.parcel.tcp.enable

Enables the use of the default TCP parcelport. Note that the initial bootstrap of the overall HPX application will be performed using the default TCP connections. This parcelport is enabled by default. This will be disabled only if MPI is enabled (see below).

hpx.parcel.tcp.array_optimization

This property defines whether this locality is allowed to utilize array optimizations in the TCP/IP parcelport during serialization of parcel data. The default is the same value as set for hpx.parcel.array_optimization.

hpx.parcel.tcp.zero_copy_optimization

This property defines whether this locality is allowed to utilize zero copy optimizations during serialization of parcel data. The default is the same value as set for hpx.parcel.zero_copy_optimization.

hpx.parcel.tcp.zero_copy_receive_optimization

This property defines whether this locality is allowed to utilize zero copy optimizations on the receiving end in the TCP/IP parcelport during de-serialization of parcel data. The default is the same value as set for hpx.parcel.zero_copy_receive_optimization.

hpx.parcel.tcp.zero_copy_serialization_threshold

This property defines the threshold value (in bytes) starting at which the serialization layer will apply zero-copy optimizations for serialized entities. The default is the same value as set for hpx.parcel.zero_copy_serialization_threshold.

hpx.parcel.tcp.async_serialization

This property defines whether this locality is allowed to spawn a new thread for serialization in the TCP/IP parcelport (this is both for encoding and decoding parcels). The default is the same value as set for hpx.parcel.async_serialization.

hpx.parcel.tcp.parcel_pool_size

The value of this property defines the number of OS threads created for the internal parcel thread pool of the TCP parcel port. The default is taken from hpx.threadpools.parcel_pool_size.

hpx.parcel.tcp.max_connections

This property defines how many network connections between different localities are overall kept alive by each locality. The default is taken from hpx.parcel.max_connections.

hpx.parcel.tcp.max_connections_per_locality

This property defines the maximum number of network connections that one locality will open to another locality. The default is taken from hpx.parcel.max_connections_per_locality.

hpx.parcel.tcp.max_message_size

This property defines the maximum allowed message size that will be transferrable through the parcel layer. The default is taken from hpx.parcel.max_message_size.

hpx.parcel.tcp.max_outbound_message_size

This property defines the maximum allowed outbound coalesced message size that will be transferrable through the parcel layer. The default is taken from hpx.parcel.max_outbound_message_size.

hpx.parcel.tcp.max_background_threads

This property defines how many cores should be used to perform background operations. The default is taken from hpx.parcel.max_background_threads.
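Most TCP settings above default to the value of a more general property through the $[section.key] notation, and some of those general properties in turn refer to yet another one. The following is a minimal sketch of how such a chain could be followed. It is not HPX code: resolve_property and config_map are hypothetical names, and the sketch deliberately ignores the surrounding ${ENV:...} part of the real entries.

```cpp
#include <cassert>
#include <map>
#include <string>

using config_map = std::map<std::string, std::string>;

// Hypothetical helper, not HPX code: follows "$[other.key]" references
// until a literal value is found. Unknown keys and reference cycles
// produce an empty string in this sketch.
std::string resolve_property(
    config_map const& cfg, std::string key, int max_depth = 16)
{
    assert(max_depth > 0 && "at least one lookup is required");

    for (int depth = 0; depth < max_depth; ++depth)
    {
        auto const it = cfg.find(key);
        if (it == cfg.end())
            return std::string();

        std::string const& value = it->second;
        bool const is_reference = value.size() > 3 &&
            value.compare(0, 2, "$[") == 0 && value.back() == ']';
        if (!is_reference)
            return value;    // literal value: the chain ends here

        // Strip "$[" and "]" and continue with the referenced key.
        key = value.substr(2, value.size() - 3);
    }
    return std::string();    // too many indirections, most likely a cycle
}
```

With a map in which hpx.parcel.tcp.zero_copy_optimization refers to hpx.parcel.zero_copy_optimization, which refers to hpx.parcel.array_optimization, changing the most general property changes the result for all three, as long as none of the more specific entries is set explicitly.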

The following settings relate to the MPI parcelport. These settings take effect only if the compile time constant HPX_HAVE_PARCELPORT_MPI is set (the equivalent CMake variable is HPX_WITH_PARCELPORT_MPI and has to be set to ON).

[hpx.parcel.mpi]
enable = ${HPX_HAVE_PARCELPORT_MPI:$[hpx.parcel.enabled]}
env = ${HPX_HAVE_PARCELPORT_MPI_ENV:MV2_COMM_WORLD_RANK,PMI_RANK,OMPI_COMM_WORLD_SIZE,ALPS_APP_PE,PALS_NODEID}
multithreaded = ${HPX_HAVE_PARCELPORT_MPI_MULTITHREADED:1}
rank = <MPI_rank>
processor_name = <MPI_processor_name>
array_optimization = ${HPX_HAVE_PARCEL_MPI_ARRAY_OPTIMIZATION:$[hpx.parcel.array_optimization]}
zero_copy_optimization = ${HPX_HAVE_PARCEL_MPI_ZERO_COPY_OPTIMIZATION:$[hpx.parcel.zero_copy_optimization]}
zero_copy_receive_optimization = ${HPX_HAVE_PARCEL_MPI_ZERO_COPY_RECEIVE_OPTIMIZATION:$[hpx.parcel.zero_copy_receive_optimization]}
zero_copy_serialization_threshold =  ${HPX_PARCEL_MPI_ZERO_COPY_SERIALIZATION_THRESHOLD:$[hpx.parcel.zero_copy_serialization_threshold]}
use_io_pool = ${HPX_HAVE_PARCEL_MPI_USE_IO_POOL:$1}
async_serialization = ${HPX_HAVE_PARCEL_MPI_ASYNC_SERIALIZATION:$[hpx.parcel.async_serialization]}
parcel_pool_size = ${HPX_HAVE_PARCEL_MPI_PARCEL_POOL_SIZE:$[hpx.threadpools.parcel_pool_size]}
max_connections =  ${HPX_HAVE_PARCEL_MPI_MAX_CONNECTIONS:$[hpx.parcel.max_connections]}
max_connections_per_locality = ${HPX_HAVE_PARCEL_MPI_MAX_CONNECTIONS_PER_LOCALITY:$[hpx.parcel.max_connections_per_locality]}
max_message_size =  ${HPX_HAVE_PARCEL_MPI_MAX_MESSAGE_SIZE:$[hpx.parcel.max_message_size]}
max_outbound_message_size =  ${HPX_HAVE_PARCEL_MPI_MAX_OUTBOUND_MESSAGE_SIZE:$[hpx.parcel.max_outbound_message_size]}
max_background_threads =  ${HPX_PARCEL_MPI_MAX_BACKGROUND_THREADS:$[hpx.parcel.max_background_threads]}

Property

Description

hpx.parcel.mpi.enable

Enables the use of the MPI parcelport. HPX tries to detect if the application was started within a parallel MPI environment. If the detection was successful, the MPI parcelport is enabled by default. To explicitly disable the MPI parcelport, set to 0. Note that the initial bootstrap of the overall HPX application will be performed using MPI as well.

hpx.parcel.mpi.env

This property influences which environment variables (separated by commas) will be analyzed to find out whether the application was invoked by MPI.

hpx.parcel.mpi.multithreaded

This property is used to determine what threading mode to use when initializing MPI. If this setting is 0, HPX will initialize MPI with MPI_THREAD_SINGLE. If the value is not equal to 0, HPX will initialize MPI with MPI_THREAD_MULTIPLE.

hpx.parcel.mpi.rank

This property will be initialized to the MPI rank of the locality.

hpx.parcel.mpi.processor_name

This property will be initialized to the MPI processor name of the locality.

hpx.parcel.mpi.array_optimization

This property defines whether this locality is allowed to utilize array optimizations in the MPI parcelport during serialization of parcel data. The default is the same value as set for hpx.parcel.array_optimization.

hpx.parcel.mpi.zero_copy_optimization

This property defines whether this locality is allowed to utilize zero copy optimizations in the MPI parcelport during serialization of parcel data. The default is the same value as set for hpx.parcel.zero_copy_optimization.

hpx.parcel.mpi.zero_copy_receive_optimization

This property defines whether this locality is allowed to utilize zero copy optimizations on the receiving end in the MPI parcelport during de-serialization of parcel data. The default is the same value as set for hpx.parcel.zero_copy_receive_optimization.

hpx.parcel.mpi.zero_copy_serialization_threshold

This property defines the threshold value (in bytes) starting at which the serialization layer will apply zero-copy optimizations for serialized entities. The default is the same value as set for hpx.parcel.zero_copy_serialization_threshold.

hpx.parcel.mpi.use_io_pool

This property can be set to run the progress thread inside of HPX threads instead of a separate thread pool. The default is 1.

hpx.parcel.mpi.async_serialization

This property defines whether this locality is allowed to spawn a new thread for serialization in the MPI parcelport (this is both for encoding and decoding parcels). The default is the same value as set for hpx.parcel.async_serialization.

hpx.parcel.mpi.parcel_pool_size

The value of this property defines the number of OS threads created for the internal parcel thread pool of the MPI parcel port. The default is taken from hpx.threadpools.parcel_pool_size.

hpx.parcel.mpi.max_connections

This property defines how many network connections between different localities are overall kept alive by each locality. The default is taken from hpx.parcel.max_connections.

hpx.parcel.mpi.max_connections_per_locality

This property defines the maximum number of network connections that one locality will open to another locality. The default is taken from hpx.parcel.max_connections_per_locality.

hpx.parcel.mpi.max_message_size

This property defines the maximum allowed message size that will be transferrable through the parcel layer. The default is taken from hpx.parcel.max_message_size.

hpx.parcel.mpi.max_outbound_message_size

This property defines the maximum allowed outbound coalesced message size that will be transferrable through the parcel layer. The default is taken from hpx.parcel.max_outbound_message_size.

hpx.parcel.mpi.max_background_threads

This property defines how many cores should be used to perform background operations. The default is taken from hpx.parcel.max_background_threads.
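The property hpx.parcel.mpi.env holds a comma-separated list of environment variable names that are inspected to decide whether the application was started by MPI. The following sketch shows one plausible reading of that rule: the application counts as MPI-launched if at least one of the listed variables is set. The names any_env_variable_set and process_environment are hypothetical, and the detection performed by HPX itself may involve additional checks.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <sstream>
#include <string>

// Hypothetical default lookup: reads the real process environment.
inline char const* process_environment(char const* name)
{
    return std::getenv(name);
}

// Hypothetical helper, not HPX code: returns true if at least one of the
// comma-separated variable names in 'names' is set according to 'lookup'.
bool any_env_variable_set(std::string const& names,
    std::function<char const*(char const*)> lookup = process_environment)
{
    assert(lookup && "lookup must be callable");

    std::istringstream stream(names);
    std::string name;
    while (std::getline(stream, name, ','))
    {
        // Empty list items (e.g. from a trailing comma) are skipped.
        if (!name.empty() && lookup(name.c_str()) != nullptr)
            return true;
    }
    return false;
}
```

Called as any_env_variable_set("MV2_COMM_WORLD_RANK,PMI_RANK,OMPI_COMM_WORLD_SIZE"), the function consults the real environment of the process.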

The hpx.agas configuration section#
[hpx.agas]
address = ${HPX_AGAS_SERVER_ADDRESS:<hpx_initial_ip_address>}
port = ${HPX_AGAS_SERVER_PORT:<hpx_initial_ip_port>}
service_mode = hosted
dedicated_server = 0
max_pending_refcnt_requests = ${HPX_AGAS_MAX_PENDING_REFCNT_REQUESTS:<hpx_initial_agas_max_pending_refcnt_requests>}
use_caching = ${HPX_AGAS_USE_CACHING:1}
use_range_caching = ${HPX_AGAS_USE_RANGE_CACHING:1}
local_cache_size = ${HPX_AGAS_LOCAL_CACHE_SIZE:<hpx_agas_local_cache_size>}

Property

Description

hpx.agas.address

This property defines the default IP address to be used for the AGAS root server. This IP address will be used as long as no other values are specified (for instance, using the --hpx:agas command line option). The expected format is any valid IP address or domain name format that can be resolved into an IP address. The default depends on the compile time preprocessor constant HPX_INITIAL_IP_ADDRESS ("127.0.0.1").

hpx.agas.port

This property defines the default IP port to be used for the AGAS root server. This IP port will be used as long as no other values are specified (for instance, using the --hpx:agas command line option). The default depends on the compile time preprocessor constant HPX_INITIAL_IP_PORT (7910).

hpx.agas.service_mode

This property specifies what type of AGAS service is running on this locality. Currently, two modes exist. The locality that acts as the AGAS server runs in bootstrap mode. All other localities are in hosted mode.

hpx.agas.dedicated_server

This property specifies whether the AGAS server is exclusively running AGAS services and not hosting any application components. It is a boolean value. Set to 1 if --hpx:run-agas-server-only is present.

hpx.agas.max_pending_refcnt_requests

This property defines the number of reference counting requests (increments or decrements) to buffer. The default depends on the compile time preprocessor constant HPX_INITIAL_AGAS_MAX_PENDING_REFCNT_REQUESTS (4096).

hpx.agas.use_caching

This property specifies whether a software address translation cache is used. It is a boolean value. Defaults to 1.

hpx.agas.use_range_caching

This property specifies whether range-based caching is used by the software address translation cache. This property is ignored if hpx.agas.use_caching is false. It is a boolean value. Defaults to 1.

hpx.agas.local_cache_size

This property defines the size of the software address translation cache for AGAS services. This property is ignored if hpx.agas.use_caching is false. Note that if hpx.agas.use_range_caching is true, this size will refer to the maximum number of ranges stored in the cache, not the number of entries spanned by the cache. The default depends on the compile time preprocessor constant HPX_AGAS_LOCAL_CACHE_SIZE (4096).

The hpx.commandline configuration section#

The following table lists the definition of all pre-defined command line option shortcuts. For more information about commandline options, see the section HPX Command Line Options.

[hpx.commandline]
aliasing = ${HPX_COMMANDLINE_ALIASING:1}
allow_unknown = ${HPX_COMMANDLINE_ALLOW_UNKNOWN:0}

[hpx.commandline.aliases]
-a = --hpx:agas
-c = --hpx:console
-h = --hpx:help
-I = --hpx:ini
-l = --hpx:localities
-p = --hpx:app-config
-q = --hpx:queuing
-r = --hpx:run-agas-server
-t = --hpx:threads
-v = --hpx:version
-w = --hpx:worker
-x = --hpx:hpx
-0 = --hpx:node=0
-1 = --hpx:node=1
-2 = --hpx:node=2
-3 = --hpx:node=3
-4 = --hpx:node=4
-5 = --hpx:node=5
-6 = --hpx:node=6
-7 = --hpx:node=7
-8 = --hpx:node=8
-9 = --hpx:node=9

Note

The short options listed above are disabled by default if the application is built using #include <hpx/hpx_main.hpp>. See Re-use the main() function as the main HPX entry point for more information. The rationale behind this is that in this case the user’s application may handle its own command line options, since HPX passes all unknown options to main(). Short options like -t are prone to create ambiguities regarding what the application will support. Hence, the user should instead rely on the corresponding long options like --hpx:threads in such a case.

Property

Description

hpx.commandline.aliasing

Enable command line aliases as defined in the section hpx.commandline.aliases (see below). Defaults to 1.

hpx.commandline.allow_unknown

Allow for unknown command line options to be passed through to hpx_main(). Defaults to 0.

hpx.commandline.aliases.-a

On the commandline -a expands to: --hpx:agas.

hpx.commandline.aliases.-c

On the commandline -c expands to: --hpx:console.

hpx.commandline.aliases.-h

On the commandline -h expands to: --hpx:help.

hpx.commandline.aliases.--help

On the commandline --help expands to: --hpx:help.

hpx.commandline.aliases.-I

On the commandline -I expands to: --hpx:ini.

hpx.commandline.aliases.-l

On the commandline -l expands to: --hpx:localities.

hpx.commandline.aliases.-p

On the commandline -p expands to: --hpx:app-config.

hpx.commandline.aliases.-q

On the commandline -q expands to: --hpx:queuing.

hpx.commandline.aliases.-r

On the commandline -r expands to: --hpx:run-agas-server.

hpx.commandline.aliases.-t

On the commandline -t expands to: --hpx:threads.

hpx.commandline.aliases.-v

On the commandline -v expands to: --hpx:version.

hpx.commandline.aliases.--version

On the commandline --version expands to: --hpx:version.

hpx.commandline.aliases.-w

On the commandline -w expands to: --hpx:worker.

hpx.commandline.aliases.-x

On the commandline -x expands to: --hpx:hpx.

hpx.commandline.aliases.-0

On the commandline -0 expands to: --hpx:node=0.

hpx.commandline.aliases.-1

On the commandline -1 expands to: --hpx:node=1.

hpx.commandline.aliases.-2

On the commandline -2 expands to: --hpx:node=2.

hpx.commandline.aliases.-3

On the commandline -3 expands to: --hpx:node=3.

hpx.commandline.aliases.-4

On the commandline -4 expands to: --hpx:node=4.

hpx.commandline.aliases.-5

On the commandline -5 expands to: --hpx:node=5.

hpx.commandline.aliases.-6

On the commandline -6 expands to: --hpx:node=6.

hpx.commandline.aliases.-7

On the commandline -7 expands to: --hpx:node=7.

hpx.commandline.aliases.-8

On the commandline -8 expands to: --hpx:node=8.

hpx.commandline.aliases.-9

On the commandline -9 expands to: --hpx:node=9.
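The alias table above amounts to a lookup from a short option to its long form, applied only while hpx.commandline.aliasing is enabled. The following is a minimal sketch of that idea and not the HPX command line parser: expand_aliases is a hypothetical name, and only arguments that match an alias exactly are replaced (forms such as a short option with an attached value are not handled).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper, not HPX code: replaces every argument that exactly
// matches a key of 'aliases' with the corresponding long option. If
// aliasing is disabled, the arguments are returned unchanged.
std::vector<std::string> expand_aliases(std::vector<std::string> const& args,
    std::map<std::string, std::string> const& aliases, bool aliasing_enabled)
{
    if (!aliasing_enabled)
        return args;

    std::vector<std::string> result;
    result.reserve(args.size());
    for (std::string const& arg : args)
    {
        auto const it = aliases.find(arg);
        result.push_back(it != aliases.end() ? it->second : arg);
    }
    return result;
}
```

With the aliases -t and -0 from the listing, the arguments -t 4 -0 become --hpx:threads 4 --hpx:node=0, while any argument without an alias is passed through as is.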

Loading INI files#

During startup and after the internal database has been initialized as described in the section Built-in default configuration settings, HPX will try to locate and load additional ini files to be used as a source for configuration properties. This allows for a wide spectrum of additional customization possibilities by the user and system administrators. The sequence of locations where HPX will try loading the ini files is well defined and documented in this section. All ini files found are merged into the internal configuration database. The merge operation itself conforms to the rules as described in the section The HPX ini file format.

  1. Load all component shared libraries found in the directories specified by the property hpx.component_path and retrieve their default configuration information (see section Loading components for more details). This property can refer to a list of directories separated by ':' (Linux, Android, and MacOS) or by ';' (Windows).

  2. Load all files named hpx.ini in the directories referenced by the property hpx.master_ini_path. This property can refer to a list of directories separated by ':' (Linux, Android, and MacOS) or by ';' (Windows).

  3. Load a file named .hpx.ini in the current working directory, i.e., the directory the application was invoked from.

  4. Load a file referenced by the environment variable HPX_INI. This variable is expected to provide the full path name of the ini configuration file (if any).

  5. Load a file named /etc/hpx.ini. This lookup is done on non-Windows systems only.

  6. Load a file named .hpx.ini in the home directory of the current user, i.e., the directory referenced by the environment variable HOME.

  7. Load a file named .hpx.ini in the directory referenced by the environment variable PWD.

  8. Load the file specified on the command line using the option --hpx:config.

  9. Load all properties specified on the command line using the option --hpx:ini. The properties will be added to the database in the same sequence as they are specified on the command line. The format for those options is, for instance, --hpx:ini=hpx.default_stack_size=0x4000. In addition to the explicit command line options, this will set the following properties as implied from other settings:

  10. Load files based on the pattern *.ini in all directories listed by the property hpx.ini_path. All files found during this search will be merged. The property hpx.ini_path can hold a list of directories separated by ':' (on Linux or Mac) or ';' (on Windows).

  11. Load the file specified on the command line using the option --hpx:app-config. Note that this file will be merged as the content for a top level section [application].
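Several of the steps above refer to properties that hold a list of directories separated by ':' or ';', depending on the platform. The following sketch shows how such a list could be split into individual directories. It is an illustration only: split_path_list and default_path_separator are hypothetical names, and skipping empty entries is an assumption of this sketch rather than documented HPX behavior.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Separator as described in the list above: ';' on Windows, ':' elsewhere.
#if defined(_WIN32)
constexpr char default_path_separator = ';';
#else
constexpr char default_path_separator = ':';
#endif

// Hypothetical helper, not HPX code: splits a directory list such as the
// value of hpx.ini_path into its entries. Empty entries are skipped.
std::vector<std::string> split_path_list(
    std::string const& list, char separator = default_path_separator)
{
    assert(separator != '\0' && "separator must be a printable character");

    std::vector<std::string> directories;
    std::istringstream stream(list);
    std::string directory;
    while (std::getline(stream, directory, separator))
    {
        if (!directory.empty())
            directories.push_back(directory);
    }
    return directories;
}
```

The separator is a parameter so that a list written for one platform can still be split explicitly on another.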

Note

Any changes made to the configuration database caused by one of the steps will influence the loading process for all subsequent steps. For instance, if one of the ini files loaded changes the property hpx.ini_path, this will influence the directories searched in step 10 as described above.

Important

The HPX core library checks all configuration settings specified on the command line (using the --hpx:ini option) for validity. That means that the library will accept only known configuration settings. This is to protect the user from unintentional typos while specifying those settings. This behavior can be overridden by appending a '!' to the configuration key, thus forcing the setting to be entered into the configuration database. For instance: --hpx:ini=hpx.foo! = 1
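The check described above can be pictured as follows. This is a sketch under stated assumptions, not the HPX implementation: apply_ini_option and trim are hypothetical names, the set of known keys is supplied by the caller, and storing a forced key without its trailing '!' is an assumption made for the illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Removes leading and trailing blanks and tabs.
std::string trim(std::string const& s)
{
    char const* const whitespace = " \t";
    std::size_t const first = s.find_first_not_of(whitespace);
    if (first == std::string::npos)
        return std::string();
    std::size_t const last = s.find_last_not_of(whitespace);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper, not HPX code: handles the "key=value" part of a
// --hpx:ini option. Unknown keys are rejected unless the key ends in '!'.
// Returns true if the setting was entered into 'database'.
bool apply_ini_option(std::string const& option,
    std::set<std::string> const& known_keys,
    std::map<std::string, std::string>& database)
{
    std::size_t const eq = option.find('=');
    if (eq == std::string::npos)
        return false;

    std::string key = trim(option.substr(0, eq));
    std::string const value = trim(option.substr(eq + 1));

    // A trailing '!' forces the entry; it is stripped before storing.
    bool const forced = !key.empty() && key.back() == '!';
    if (forced)
        key.pop_back();

    if (key.empty() || (!forced && known_keys.count(key) == 0))
        return false;

    database[key] = value;
    return true;
}
```

In this sketch, "hpx.foo=1" is rejected when hpx.foo is not a known key, whereas "hpx.foo! = 1" is accepted and stored.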

If any of the environment variables or files listed above are not found, the corresponding loading step will be silently skipped.

Loading components#

HPX relies on loading application specific components during the runtime of an application. Moreover, HPX comes with a set of preinstalled components supporting basic functionalities useful for almost every application. Any component in HPX is loaded from a shared library, where any of the shared libraries can contain more than one component type. During startup, HPX tries to locate all available components (i.e., their corresponding shared libraries) and creates an internal component registry for later use. This section describes the algorithm used by HPX to locate all relevant shared libraries on a system. As described, this algorithm is customizable by the configuration properties loaded from the ini files (see section Loading INI files).

Loading components is a two-stage process. First HPX tries to locate all component shared libraries, loads those, and generates a default configuration section in the internal configuration database for each component found. For each found component the following information is generated:

[hpx.components.<component_instance_name>]
name = <name_of_shared_library>
path = $[component_path]
enabled = $[hpx.components.load_external]
default = 1

The values in this section correspond to the expected configuration information for a component as described in the section Built-in default configuration settings.

In order to locate component shared libraries, HPX will try loading all shared libraries (files with the platform specific extension of a shared library, Linux: *.so, Windows: *.dll, MacOS: *.dylib) found in the directory referenced by the ini property hpx.component_path.
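The directory scan described above can be approximated with the C++17 filesystem library. The following sketch only collects candidate files by extension and does not load anything. The names find_shared_libraries, has_library_extension, and default_library_extension are hypothetical, and versioned file names (for instance a name ending in .so.1) are not recognized by this simple check.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Platform specific extension as listed in the paragraph above.
#if defined(_WIN32)
constexpr char const* default_library_extension = ".dll";
#elif defined(__APPLE__)
constexpr char const* default_library_extension = ".dylib";
#else
constexpr char const* default_library_extension = ".so";
#endif

// True if the file name ends in the given extension (including the dot).
bool has_library_extension(fs::path const& file,
    std::string const& extension = default_library_extension)
{
    return file.extension().string() == extension;
}

// Hypothetical helper, not HPX code: returns the sorted list of regular
// files in 'directory' that carry the shared library extension. A missing
// or unreadable directory yields an empty list.
std::vector<fs::path> find_shared_libraries(fs::path const& directory,
    std::string const& extension = default_library_extension)
{
    assert(!extension.empty() && "extension must include the leading dot");

    std::vector<fs::path> libraries;
    std::error_code ec;
    fs::directory_iterator it(directory, ec);
    if (ec)
        return libraries;

    for (fs::directory_entry const& entry : it)
    {
        if (entry.is_regular_file(ec) &&
            has_library_extension(entry.path(), extension))
        {
            libraries.push_back(entry.path());
        }
    }
    std::sort(libraries.begin(), libraries.end());
    return libraries;
}
```

A caller would invoke find_shared_libraries once for each directory listed in hpx.component_path and then attempt to load every file returned.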

This first step corresponds to step 1) during the process of filling the internal configuration database with default information as described in section Loading INI files.

After all of the configuration information has been loaded, HPX performs the second step in terms of loading components. During this step, HPX scans all existing configuration sections [hpx.components.<some_component_instance_name>] and instantiates a special factory object for each of the successfully located and loaded components. During the application’s lifetime, these factory objects are responsible for creating new and discarding old instances of the component they are associated with. This step is performed after step 11) of the process of filling the internal configuration database with default information as described in section Loading INI files.

Application specific component example#

This section assumes there is a simple application component that exposes one member function as a component action. The header file app_server.hpp declares the C++ type to be exposed as a component. This type has a member function print_greeting(), which is exposed as an action print_greeting_action. We assume the source files for this example are located in a directory referenced by $APP_ROOT:

// file: $APP_ROOT/app_server.hpp
#include <hpx/hpx.hpp>
#include <hpx/include/iostreams.hpp>

namespace app
{
    // Define a simple component exposing one action 'print_greeting'
    class HPX_COMPONENT_EXPORT server
      : public hpx::components::component_base<server>
    {
    public:
        void print_greeting()
        {
            hpx::cout << "Hey, how are you?\n" << std::flush;
        }

        // Component actions need to be declared, this also defines the
        // type 'print_greeting_action' representing the action.
        HPX_DEFINE_COMPONENT_ACTION(server, print_greeting, print_greeting_action);
    };
}

// Declare boilerplate code required for each of the component actions.
HPX_REGISTER_ACTION_DECLARATION(app::server::print_greeting_action);

The corresponding source file contains mainly macro invocations that define the boilerplate code needed for HPX to function properly:

// file: $APP_ROOT/app_server.cpp
#include "app_server.hpp"

// Define boilerplate required once per component module.
HPX_REGISTER_COMPONENT_MODULE();

// Define factory object associated with our component of type 'app::server'.
HPX_REGISTER_COMPONENT(app::server, app_server);

// Define boilerplate code required for each of the component actions. Use the
// same argument as used for HPX_REGISTER_ACTION_DECLARATION above.
HPX_REGISTER_ACTION(app::server::print_greeting_action);

The following gives an example of how the component can be used. Here, one instance of the app::server component is created on the current locality and the exposed action print_greeting_action is invoked using the global id of the newly created instance. Note that no special code is required to delete the component instance after it is not needed anymore. It will be deleted automatically when its last reference goes out of scope (shown in the example below at the closing brace of the block surrounding the code):

// file: $APP_ROOT/use_app_server_example.cpp
#include <hpx/hpx_init.hpp>
#include "app_server.hpp"

int hpx_main()
{
    {
        // Create an instance of the app_server component on the current locality.
        hpx::naming::id_type app_server_instance =
            hpx::create_component<app::server>(hpx::find_here());

        // Create an instance of the action 'print_greeting_action'.
        app::server::print_greeting_action print_greeting;

        // Invoke the action 'print_greeting' on the newly created component.
        print_greeting(app_server_instance);
    }
    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::init(argc, argv);
}

In order to make sure that the application will be able to use the component app::server, special configuration information must be passed to HPX. The simplest way to allow HPX to ‘find’ the component is to provide special ini configuration files that add the necessary information to the internal configuration database. The component should have a special ini file containing the information specific to the component app_server.

# file: $APP_ROOT/app_server.ini
[hpx.components.app_server]
name = app_server
path = $APP_LOCATION/

Here, $APP_LOCATION is the directory where the (binary) component shared library is located. HPX will attempt to load the shared library from there. The section name hpx.components.app_server reflects the instance name of the component (app_server is an arbitrary, but unique name). The property value for hpx.components.app_server.name should be the same as used for the second argument to the macro HPX_REGISTER_COMPONENT above.

Additionally, a file .hpx.ini, which could be located in the current working directory (see step 3 as described in the section Loading INI files), can be used to add to the ini search path for components:

# file: $PWD/.hpx.ini
[hpx]
ini_path = $[hpx.ini_path]:$APP_ROOT/

This assumes that the above ini file specific to the component is located in the directory $APP_ROOT.

Note

It is possible to reference the defined property from inside its value. HPX will gracefully use the previous value of hpx.ini_path for the reference on the right hand side and assign the overall (now expanded) value to the property.
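
The self-reference behaviour described in the note can be pictured with a small standalone sketch. It does not use HPX; the names config_db and assign_with_self_reference are hypothetical and exist only for this illustration, and the paths in the usage note are made up.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the configuration database: property -> value.
using config_db = std::map<std::string, std::string>;

// Assign `raw_value` to `key`, replacing every "$[key]" reference inside the
// new value with the value the property had before this assignment.
inline void assign_with_self_reference(
    config_db& db, std::string const& key, std::string raw_value)
{
    assert(!key.empty());

    std::string const reference = "$[" + key + "]";
    auto const it = db.find(key);
    std::string const previous =
        (it != db.end()) ? it->second : std::string();

    // Expand all occurrences of the self-reference, skipping over the
    // inserted text so that it is never expanded a second time.
    std::string::size_type pos = 0;
    while ((pos = raw_value.find(reference, pos)) != std::string::npos)
    {
        raw_value.replace(pos, reference.size(), previous);
        pos += previous.size();
    }

    db[key] = raw_value;
}
```

Starting from a database where hpx.ini_path is /usr/share/hpx/ini, assigning the value $[hpx.ini_path]:/home/user/app leaves the property set to /usr/share/hpx/ini:/home/user/app.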

Logging#

HPX uses a sophisticated logging framework, allowing users to follow in detail what operations have been performed inside the HPX library in what sequence. This information proves to be very useful for diagnosing problems or just for improving the understanding of what is happening in HPX as a consequence of invoking HPX API functionality.

Default logging#

Enabling default logging is a simple process. The detailed description in the remainder of this section explains different ways to customize the defaults. Default logging can be enabled by using one of the following:

  • A command line switch --hpx:debug-hpx-log, which will enable logging to the console terminal.

  • The command line switch --hpx:debug-hpx-log=<filename>, which enables logging to a given file <filename>.

  • Setting an environment variable HPX_LOGLEVEL=<loglevel> while running the HPX application. In this case <loglevel> should be a number between (or equal to) 1 and 5, where 1 means minimal logging and 5 causes all available messages to be logged. When setting the environment variable, the logs will be written to a file named hpx.<PID>.log in the current working directory, where <PID> is the process id of the console instance of the application.

Customizing logging#

Generally, logging can be customized either by using environment variable settings or by using an ini configuration file. Logging is generated in several categories, each of which can be customized independently. All customizable configuration parameters have reasonable defaults, allowing for the use of logging without any additional configuration effort. The following table lists the available categories.

Table 5 Logging categories#

| Category | Category shortcut | Information to be generated | Environment variable |
|---|---|---|---|
| General | None | Logging information generated by different subsystems of HPX, such as thread-manager, parcel layer, LCOs, etc. | HPX_LOGLEVEL |
| AGAS | AGAS | Logging output generated by the AGAS subsystem. | HPX_AGAS_LOGLEVEL |
| Application | APP | Logging generated by applications. | HPX_APP_LOGLEVEL |

By default, all logging output is redirected to the console instance of an application, where it is collected and written to a file, one file for each logging category.

Each logging category can be customized at two levels. The parameters for each are stored in the ini configuration sections hpx.logging.CATEGORY and hpx.logging.console.CATEGORY (where CATEGORY is the category shortcut as listed in the table above). The former influences logging at the source locality and the latter modifies the logging behaviour for each of the categories at the console instance of an application.

Levels#

All HPX logging output has six different logging levels. These levels can be set explicitly or through environment variables in the main HPX ini file as shown below. The logging levels and their associated integral values are shown in the table below, ordered from most verbose to least verbose. By default, all HPX logs are set to 0, i.e., all logging output is disabled by default.

Table 6 Logging levels#

| Logging level | Integral value |
|---|---|
| <debug> | 5 |
| <info> | 4 |
| <warning> | 3 |
| <error> | 2 |
| <fatal> | 1 |
| No logging | 0 |

Tip

The easiest way to enable logging output is to set the environment variable corresponding to the logging category to an integral value as described in the table above. For instance, setting HPX_LOGLEVEL=5 will enable full logging output for the general category. Please note that the syntax and means of setting environment variables vary between operating systems.
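
One way to read the integral values is as a threshold. The sketch below assumes that a message is emitted when its own level does not exceed the configured level; the text only states that 1 is minimal and 5 logs everything, so this rule is an inference, and the names log_level and is_enabled are hypothetical.

```cpp
#include <cassert>

// Integral values as listed in the logging levels table.
enum class log_level : int
{
    none = 0,
    fatal = 1,
    error = 2,
    warning = 3,
    info = 4,
    debug = 5
};

// Assumption of this sketch: a message passes when its level is less than
// or equal to the configured level, so 0 disables everything and 5 lets
// every message through.
inline bool is_enabled(log_level configured, log_level message)
{
    assert(message != log_level::none);    // messages carry a real level
    return static_cast<int>(message) <= static_cast<int>(configured);
}
```

Under this reading, a configured level of 3 would pass <fatal>, <error> and <warning> messages and drop <info> and <debug> messages.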

Configuration#

Logs will be saved to destinations as configured by the user. By default, logging output is saved on the console instance of an application to hpx.<CATEGORY>.<PID>.log (where <CATEGORY> and <PID> are placeholders for the category shortcut and the OS process id). The output for the general logging category is saved to hpx.<PID>.log. The default settings for the general logging category are shown here (the syntax is described in the section The HPX ini file format):

[hpx.logging]
level = ${HPX_LOGLEVEL:0}
destination = ${HPX_LOGDESTINATION:console}
format = ${HPX_LOGFORMAT:(T%locality%/%hpxthread%.%hpxphase%/%hpxcomponent%) P%parentloc%/%hpxparent%.%hpxparentphase% %time%($hh:$mm.$ss.$mili) [%idx%]|\\n}
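
The ${NAME:default} entries above read an environment variable and fall back to the text after the colon. A minimal standalone sketch of that lookup, assuming exactly this shape of entry, is shown here; env_or_default and resolve_env_entry are hypothetical helper names, not HPX functions.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Return the value of the environment variable `name`, or `fallback` if the
// variable is not set.
inline std::string env_or_default(
    std::string const& name, std::string const& fallback)
{
    assert(!name.empty());
    char const* value = std::getenv(name.c_str());
    return value != nullptr ? std::string(value) : fallback;
}

// Resolve an entry of the form "${NAME:default}". Entries that do not have
// this shape are returned unchanged.
inline std::string resolve_env_entry(std::string const& entry)
{
    if (entry.size() < 4 || entry.compare(0, 2, "${") != 0 ||
        entry.back() != '}')
    {
        return entry;
    }

    // Strip the leading "${" and the trailing "}".
    std::string const body = entry.substr(2, entry.size() - 3);
    std::string::size_type const colon = body.find(':');
    std::string const name = body.substr(0, colon);
    if (name.empty())
        return entry;

    std::string const fallback =
        (colon == std::string::npos) ? std::string() : body.substr(colon + 1);
    return env_or_default(name, fallback);
}
```

With HPX_LOGLEVEL unset, resolving ${HPX_LOGLEVEL:0} in this sketch gives 0; with HPX_LOGLEVEL=5 in the environment it gives 5.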

The logging level is taken from the environment variable HPX_LOGLEVEL and defaults to zero, i.e., no logging. The default logging destination is read from the environment variable HPX_LOGDESTINATION. On any of the localities it defaults to console, which redirects all generated logging output to the console instance of an application. The following table lists the possible destinations for any logging output. It is possible to specify more than one destination separated by whitespace.

Table 7 Logging destinations#

| Logging destination | Description |
|---|---|
| file(<filename>) | Directs all output to a file with the given <filename>. |
| cout | Directs all output to the local standard output of the application instance on this locality. |
| cerr | Directs all output to the local standard error output of the application instance on this locality. |
| console | Directs all output to the console instance of the application. The console instance has its logging destinations configured separately. |
| android_log | Directs all output to the (Android) system log (available on Android systems only). |

The logging format is read from the environment variable HPX_LOGFORMAT, and it defaults to a complex format description. This format consists of several placeholder fields (for instance %locality%), which will be replaced by concrete values when the logging output is generated. All other information is transferred verbatim to the output. The table below describes the available field placeholders. The separator character | separates the logging message prefix formatted as shown and the actual log message which will replace the separator.

Table 8 Available field placeholders#

| Name | Description |
|---|---|
| locality | The id of the locality on which the logging message was generated. |
| hpxthread | The id of the HPX thread generating this logging output. |
| hpxphase | The phase [1] of the HPX thread generating this logging output. |
| hpxcomponent | The local virtual address of the component which the current HPX thread is accessing. |
| parentloc | The id of the locality where the HPX thread was running that initiated the current HPX thread. The current HPX thread is generating this logging output. |
| hpxparent | The id of the HPX thread that initiated the current HPX thread. The current HPX thread is generating this logging output. |
| hpxparentphase | The phase of the HPX thread when it initiated the current HPX thread. The current HPX thread is generating this logging output. |
| time | The time stamp for this logging output line as generated by the source locality. |
| idx | The sequence number of the logging output line as generated on the source locality. |
| osthread | The sequence number of the OS thread that executes the current HPX thread. |

Note

Not all of the field placeholders may be expanded for all generated logging output. If no value is available for a particular field, it is replaced with a sequence of '-' characters.
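
The substitution of %name% fields, including the '-' filler for missing values, can be sketched in plain C++ as follows. This is an illustration only and not the HPX logging code: expand_placeholders is a hypothetical name, and the filler width of 8 characters is an arbitrary choice for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Replace every %name% field in `format` with its value from `fields`.
// A field without a value becomes a run of '-' characters. All other text
// is copied verbatim.
inline std::string expand_placeholders(std::string const& format,
    std::map<std::string, std::string> const& fields)
{
    std::string result;
    std::string::size_type pos = 0;
    while (pos < format.size())
    {
        std::string::size_type const open = format.find('%', pos);
        if (open == std::string::npos)
            break;    // no further fields
        std::string::size_type const close = format.find('%', open + 1);
        if (close == std::string::npos)
            break;    // unmatched '%': the rest is copied verbatim below
        assert(close > open);

        result.append(format, pos, open - pos);    // text before the field
        std::string const name = format.substr(open + 1, close - open - 1);
        auto const it = fields.find(name);
        result += (it != fields.end()) ? it->second : std::string(8, '-');
        pos = close + 1;
    }
    if (pos < format.size())
        result.append(format, pos, std::string::npos);
    return result;
}
```

Given values for locality and hpxthread only, the format (T%locality%/%hpxthread%) P%parentloc% expands to a prefix whose parent locality field consists of '-' characters.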

Here is an example line from a logging output generated by one of the HPX examples (please note that this is generated on a single line, without a line break):

(T00000000/0000000002d46f90.01/00000000009ebc10) P--------/0000000002d46f80.02 17:49.37.320 [000000000000004d]
    <info>  [RT] successfully created component {0000000100ff0001, 0000000000030002} of type: component_barrier[7(3)]

The default settings for the general logging category on the console are shown here:

[hpx.logging.console]
level = ${HPX_LOGLEVEL:$[hpx.logging.level]}
destination = ${HPX_CONSOLE_LOGDESTINATION:file(hpx.$[system.pid].log)}
format = ${HPX_CONSOLE_LOGFORMAT:|}

These settings define how the logging is customized once the logging output is received by the console instance of an application. The logging level is read from the environment variable HPX_LOGLEVEL (as set for the console instance of the application). The level defaults to the same value as the corresponding setting in the general logging configuration shown before. The destination on the console instance is set to be a file whose name is generated based on its OS process id. Setting the environment variable HPX_CONSOLE_LOGDESTINATION allows customization of the naming scheme for the output file. The logging format is set to leave the original logging output unchanged, as received from one of the localities the application runs on.

HPX Command Line Options#

The predefined command line options for any application using hpx::init are described in the following subsections.

HPX options (allowed on command line only)#
--hpx:help#

Print out program usage (default: this message). Possible values: full (additionally prints options from components).

--hpx:version#

Print out HPX version and copyright information.

--hpx:info#

Print out HPX configuration information.

--hpx:options-file arg#

Specify a file containing command line options (alternatively: @filepath).

HPX options (additionally allowed in an options file)#
--hpx:worker#

Run this instance in worker mode.

--hpx:console#

Run this instance in console mode.

--hpx:connect#

Run this instance in worker mode, but connecting late.

--hpx:run-agas-server#

Run AGAS server as part of this runtime instance.

--hpx:run-hpx-main#

Run the hpx_main function, regardless of locality mode.

--hpx:hpx arg#

The IP address the HPX parcelport is listening on, expected format: address:port (default: 127.0.0.1:7910).

--hpx:agas arg#

The IP address the AGAS root server is running on, expected format: address:port (default: 127.0.0.1:7910).

--hpx:run-agas-server-only#

Run only the AGAS server.

--hpx:nodefile arg#

The file name of a node file to use (list of nodes, one node name per line and core).

--hpx:nodes arg#

The (space separated) list of the nodes to use (usually this is extracted from a node file).

--hpx:endnodes#

This can be used to end the list of nodes specified using the option --hpx:nodes.

--hpx:ifsuffix arg#

Suffix to append to host names in order to resolve them to the proper network interconnect.

--hpx:ifprefix arg#

Prefix to prepend to host names in order to resolve them to the proper network interconnect.

--hpx:iftransform arg#

Sed-style search and replace (s/search/replace/) used to transform host names to the proper network interconnect.

--hpx:force_ipv4#

Network hostnames will be resolved to ipv4 addresses instead of using the first resolved endpoint. This is especially useful on Windows where the local hostname will resolve to an ipv6 address while remote network hostnames are commonly resolved to ipv4 addresses.

--hpx:localities arg#

The number of localities to wait for at application startup (default: 1).

--hpx:node arg#

Number of the node this locality is run on (must be unique).

--hpx:ignore-batch-env#

Ignore batch environment variables.

--hpx:expect-connecting-localities#

This locality expects other localities to dynamically connect (this is implied if the number of initial localities is larger than 1).

--hpx:pu-offset#

The first processing unit this instance of HPX should be run on (default: 0).

--hpx:pu-step#

The step between used processing unit numbers for this instance of HPX (default: 1).

--hpx:threads arg#

The number of operating system threads to spawn for this HPX locality. Possible values are: numeric values 1, 2, 3 and so on, all (which spawns one thread per processing unit, includes hyperthreads), or cores (which spawns one thread per core) (default: cores).

--hpx:cores arg#

The number of cores to utilize for this HPX locality (default: all, i.e., the number of cores is based on the number of threads --hpx:threads assuming --hpx:bind=compact).

--hpx:affinity arg#

The affinity domain the OS threads will be confined to, possible values: pu, core, numa, machine (default: pu).

--hpx:bind arg#

The detailed affinity description for the OS threads, see More details about HPX command line options for a detailed description of possible values. Do not use with --hpx:pu-step, --hpx:pu-offset or --hpx:affinity options. Implies --hpx:numa-sensitive (--hpx:bind=none disables defining thread affinities).

--hpx:use-process-mask#

Use the process mask to restrict available hardware resources (implies --hpx:ignore-batch-env).

--hpx:print-bind#

Print to the console the bit masks calculated from the arguments specified to all --hpx:bind options.

--hpx:queuing arg#

The queue scheduling policy to use. Options are local, local-priority-fifo, local-priority-lifo, static, static-priority, abp-priority-fifo and abp-priority-lifo (default: local-priority-fifo).

--hpx:high-priority-threads arg#

The number of operating system threads maintaining a high priority queue (default: number of OS threads), valid for --hpx:queuing=abp-priority, --hpx:queuing=static-priority and --hpx:queuing=local-priority only.

--hpx:numa-sensitive#

Makes the scheduler NUMA sensitive.

HPX configuration options#
--hpx:app-config arg#

Load the specified application configuration (ini) file.

--hpx:config arg#

Load the specified HPX configuration (ini) file.

--hpx:ini arg#

Add a configuration definition to the default runtime configuration.

--hpx:exit#

Exit after configuring the runtime.

HPX debugging options#
--hpx:list-symbolic-names#

List all registered symbolic names after startup.

--hpx:list-component-types#

List all dynamic component types after startup.

--hpx:dump-config-initial#

Print the initial runtime configuration.

--hpx:dump-config#

Print the final runtime configuration.

--hpx:debug-hpx-log [arg]#

Enable all messages on the HPX log channel and send all HPX logs to the target destination (default: cout).

--hpx:debug-agas-log [arg]#

Enable all messages on the AGAS log channel and send all AGAS logs to the target destination (default: cout).

--hpx:debug-parcel-log [arg]#

Enable all messages on the parcel transport log channel and send all parcel transport logs to the target destination (default: cout).

--hpx:debug-timing-log [arg]#

Enable all messages on the timing log channel and send all timing logs to the target destination (default: cout).

--hpx:debug-app-log [arg]#

Enable all messages on the application log channel and send all application logs to the target destination (default: cout).

--hpx:debug-clp#

Debug command line processing.

--hpx:attach-debugger arg#

Wait for a debugger to be attached, possible arg values: startup or exception (default: startup).

Command line argument shortcuts#

Additionally, the following shortcuts are available from every HPX application.

Table 9 Predefined command line option shortcuts#

| Shortcut option | Equivalent long option |
|---|---|
| -a | --hpx:agas |
| -c | --hpx:console |
| -h | --hpx:help |
| -I | --hpx:ini |
| -l | --hpx:localities |
| -p | --hpx:app-config |
| -q | --hpx:queuing |
| -r | --hpx:run-agas-server |
| -t | --hpx:threads |
| -v | --hpx:version |
| -w | --hpx:worker |
| -x | --hpx:hpx |
| -0 | --hpx:node=0 |
| -1 | --hpx:node=1 |
| -2 | --hpx:node=2 |
| -3 | --hpx:node=3 |
| -4 | --hpx:node=4 |
| -5 | --hpx:node=5 |
| -6 | --hpx:node=6 |
| -7 | --hpx:node=7 |
| -8 | --hpx:node=8 |
| -9 | --hpx:node=9 |

Note

The short options listed above are disabled by default if the application is built using #include <hpx/hpx_main.hpp>. See Re-use the main() function as the main HPX entry point for more information. The rationale behind this is that in this case the user’s application may handle its own command line options, since HPX passes all unknown options to main(). Short options like -t are prone to create ambiguities regarding what the application will support. Hence, the user should instead rely on the corresponding long options like --hpx:threads in such a case.

It is possible to define your own shortcut options. In fact, all of the shortcuts listed above are pre-defined using the technique described here. Also, it is possible to redefine any of the pre-defined shortcuts to expand differently as well.

Shortcut options are obtained from the internal configuration database. They are stored as key-value properties in a special properties section named hpx.commandline.aliases. You can define your own shortcuts by adding the corresponding definitions to one of the ini configuration files as described in the section Configuring HPX applications. For instance, in order to define a command line shortcut --pc, which should expand to --hpx:print-counter, the following configuration information needs to be added to one of the ini configuration files:

[hpx.commandline.aliases]
--pc = --hpx:print-counter

Note

Any arguments for shortcut options passed on the command line are retained and passed as arguments to the corresponding expanded option. For instance, given the definition above, the command line option:

--pc=/threads{locality#0/total}/count/cumulative

would be expanded to:

--hpx:print-counter=/threads{locality#0/total}/count/cumulative

Important

Any shortcut option should either start with a single '-' or with two '--' characters. Shortcuts starting with a single '-' are interpreted as short options (i.e., everything after the first character following the '-' is treated as the argument). Shortcuts starting with '--' are interpreted as long options. No other shortcut formats are supported.
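
The expansion rules for long and short shortcuts can be sketched without HPX as shown here. alias_map and expand_shortcut are hypothetical names, and joining a short option's argument to the expanded option with '=' is an assumption of this sketch rather than documented behaviour.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical alias table mirroring key-value pairs such as
// "--pc = --hpx:print-counter".
using alias_map = std::map<std::string, std::string>;

// Expand one command line argument. Long shortcuts ("--name[=value]") keep
// everything from '=' on. Short shortcuts ("-xVALUE") treat everything after
// the option character as the argument.
inline std::string expand_shortcut(
    alias_map const& aliases, std::string const& arg)
{
    if (arg.size() < 2 || arg[0] != '-')
        return arg;    // not an option
    assert(arg.size() >= 2);

    if (arg[1] == '-')
    {
        // Long form: look up the part before '=' and retain the argument.
        std::string::size_type const eq = arg.find('=');
        auto const it = aliases.find(arg.substr(0, eq));
        if (it == aliases.end())
            return arg;
        return eq == std::string::npos ? it->second :
                                         it->second + arg.substr(eq);
    }

    // Short form: look up "-x" and pass the remainder on as the argument.
    auto const it = aliases.find(arg.substr(0, 2));
    if (it == aliases.end())
        return arg;
    std::string const value = arg.substr(2);
    return value.empty() ? it->second : it->second + "=" + value;
}
```

With the alias --pc defined as above, the argument --pc=/threads{locality#0/total}/count/cumulative expands to --hpx:print-counter=/threads{locality#0/total}/count/cumulative, and unknown options are passed through unchanged.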

Specifying options for single localities only#

For runs involving more than one locality, it is sometimes desirable to supply specific command line options to single localities only. When the HPX application is launched using a scheduler (like PBS; for more details see section How to use HPX applications with PBS), specifying dedicated command line options for single localities may be desirable. For this reason all of the command line options that have the general format --hpx:<some_key> can be used in a more general form: --hpx:<N>:<some_key>, where <N> is the number of the locality this command line option will be applied to; all other localities will simply ignore the option. For instance, the following PBS script passes the option --hpx:pu-offset=4 to the locality '1' only.

#!/bin/bash
#
#PBS -l nodes=2:ppn=4

APP_PATH=~/packages/hpx/bin/hello_world_distributed
APP_OPTIONS=

pbsdsh -u $APP_PATH $APP_OPTIONS --hpx:1:pu-offset=4 --hpx:nodes=`cat $PBS_NODEFILE`
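
How a single locality might treat options of the form --hpx:<N>:<some_key> can be sketched in standalone C++. The function name select_for_locality is hypothetical, and the sketch only covers the rule stated above: the addressed locality sees the general form, all others ignore the option.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Decide how one locality treats an option. Returns false if the option is
// addressed to a different locality and must be ignored. Otherwise returns
// true and stores the general form ("--hpx:<some_key>") in `result`.
inline bool select_for_locality(
    std::string const& arg, unsigned long locality, std::string& result)
{
    std::string const prefix = "--hpx:";
    result = arg;
    if (arg.compare(0, prefix.size(), prefix) != 0)
        return true;    // not an HPX option: passed through unchanged

    // Collect the digits of <N>, if any, directly after the prefix.
    std::string::size_type end = prefix.size();
    while (end < arg.size() &&
        std::isdigit(static_cast<unsigned char>(arg[end])))
    {
        ++end;
    }

    if (end == prefix.size() || end == arg.size() || arg[end] != ':')
        return true;    // general form, applies to every locality
    assert(end > prefix.size());

    unsigned long const target =
        std::stoul(arg.substr(prefix.size(), end - prefix.size()));
    if (target != locality)
        return false;

    result = prefix + arg.substr(end + 1);
    return true;
}
```

For the script above, locality 1 would see --hpx:pu-offset=4, while locality 0 would drop the option.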

Caution

If the first application specific argument (inside $APP_OPTIONS) is a non-option (i.e., does not start with a - or a --), then it must be placed before the option --hpx:nodes, which, in this case, should be the last option on the command line.

Alternatively, use the option --hpx:endnodes to explicitly mark the end of the list of node names:

$ pbsdsh -u $APP_PATH --hpx:1:pu-offset=4 --hpx:nodes=`cat $PBS_NODEFILE` --hpx:endnodes $APP_OPTIONS

More details about HPX command line options#

This section documents the command line option --hpx:bind in more detail.

The command line option --hpx:bind#

This command line option allows one to specify the required affinity of the HPX worker threads to the underlying processing units. As a result the worker threads will run only on the processing units identified by the corresponding bind specification. The affinity settings are to be specified using --hpx:bind=<BINDINGS>, where <BINDINGS> have to be formatted as described below.

In addition to the syntax described below, one can use --hpx:bind=none to disable all binding of any threads to a particular core. This is mostly supported for debugging purposes.

The specified affinities refer to specific regions within a machine hardware topology. In order to understand the hardware topology of a particular machine, it may be useful to run the lstopo tool, which is part of Portable Hardware Locality (HWLOC), to see the reported topology tree. Seeing and understanding a topology tree will definitely help in understanding the concepts that are discussed below.

Affinities can be specified using hwloc tuples. Tuples of hwloc objects and associated indexes can be specified in the form object:index, object:index-index or object:index,...,index. Hwloc objects represent types of mapped items in a topology tree. Possible values for objects are socket, numanode, core and pu (processing unit). Indexes are non-negative integers that specify a unique physical object in a topology tree using its logical sequence number.

Chaining multiple tuples together in the more general form object1:index1[.object2:index2[...]] is permissible. While the first tuple’s object may appear anywhere in the topology, the Nth tuple’s object must have a shallower topology depth than the (N+1)th tuple’s object. Put simply: as you move right in a tuple chain, objects must go deeper in the topology tree. Indexes specified in chained tuples are relative to the scope of the parent object. For example, socket:0.core:1 refers to the second core in the first socket (all indices are zero based).

Multiple affinities can be specified using several --hpx:bind command line options or by appending several affinities separated by a ';'. By default, if multiple affinities are specified, they are added.

"all" is a special affinity consisting of the entire current topology.

Note

All “names” in an affinity specification, such as thread, socket, numanode, pu or all, can be abbreviated. Thus, the affinity specification thread:0-3=socket:0.core:1.pu:1 is fully equivalent to its shortened form t:0-3=s:0.c:1.p:1.

Here is a full grammar describing the possible format of mappings:

mappings     ::=  distribution | mapping (";" mapping)*
distribution ::=  "compact" | "scatter" | "balanced" | "numa-balanced"
mapping      ::=  thread_spec "=" pu_specs
thread_spec  ::=  "thread:" range_specs
pu_specs     ::=  pu_spec ("." pu_spec)*
pu_spec      ::=  type ":" range_specs | "~" pu_spec
range_specs  ::=  range_spec ("," range_spec)*
range_spec   ::=  int | int "-" int | "all"
type         ::=  "socket" | "numanode" | "core" | "pu"
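
To make the grammar more concrete, here is a minimal standalone parser for a single mapping. It is a sketch under stated simplifications and not the parser used by HPX: it does not handle the predefined distributions, the "all" keyword, the "~" prefix or abbreviated names, and the names spec, parse_ranges, parse_spec and parse_mapping are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One "type:range_specs" element, e.g. "core:0-3".
struct spec
{
    std::string type;    // "thread", "socket", "numanode", "core" or "pu"
    std::vector<std::pair<int, int>> ranges;    // inclusive [first, last]
};

// Parse range_specs such as "0-3,6" into inclusive index pairs.
inline std::vector<std::pair<int, int>> parse_ranges(std::string const& text)
{
    std::vector<std::pair<int, int>> ranges;
    std::istringstream in(text);
    std::string item;
    while (std::getline(in, item, ','))
    {
        std::string::size_type const dash = item.find('-');
        int const first = std::stoi(item.substr(0, dash));
        int const last = (dash == std::string::npos) ?
            first :
            std::stoi(item.substr(dash + 1));
        if (last < first)
            throw std::invalid_argument("descending range: " + item);
        ranges.emplace_back(first, last);
    }
    return ranges;
}

// Parse a single "type:range_specs" element.
inline spec parse_spec(std::string const& text)
{
    std::string::size_type const colon = text.find(':');
    if (colon == std::string::npos)
        throw std::invalid_argument("missing ':' in " + text);
    return spec{text.substr(0, colon), parse_ranges(text.substr(colon + 1))};
}

// Parse one mapping. The result holds the thread_spec first, followed by
// the pu_specs in left-to-right order.
inline std::vector<spec> parse_mapping(std::string const& mapping)
{
    std::string::size_type const eq = mapping.find('=');
    if (eq == std::string::npos)
        throw std::invalid_argument("missing '=' in " + mapping);

    std::vector<spec> specs;
    specs.push_back(parse_spec(mapping.substr(0, eq)));

    std::istringstream in(mapping.substr(eq + 1));
    std::string item;
    while (std::getline(in, item, '.'))
        specs.push_back(parse_spec(item));

    assert(!specs.empty());
    return specs;
}
```

Parsing thread:0-3=core:0-3.pu:0 with this sketch gives three elements: thread with range 0 to 3, core with range 0 to 3, and pu with the single index 0.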

The following example assumes a system with at least 4 cores, where each core has more than 1 processing unit (hardware threads). Running hello_world_distributed with 4 OS threads (on 4 processing units), where each of those threads is bound to the first processing unit of each of the cores, can be achieved by invoking:

$ hello_world_distributed -t4 --hpx:bind=thread:0-3=core:0-3.pu:0

Here, thread:0-3 specifies the OS threads used to define affinity bindings, and core:0-3.pu:0 defines that for each of the cores (core:0-3) only their first processing unit pu:0 should be used.

Note

The command line option --hpx:print-bind can be used to print the bitmasks generated from the affinity mappings as specified with --hpx:bind. For instance, on a system with hyperthreading enabled (i.e. 2 processing units per core), the command line:

$ hello_world_distributed -t4 --hpx:bind=thread:0-3=core:0-3.pu:0 --hpx:print-bind

will cause this output to be printed:

0: PU L#0(P#0), Core L#0, Socket L#0, Node L#0(P#0)
1: PU L#2(P#2), Core L#1, Socket L#0, Node L#0(P#0)
2: PU L#4(P#4), Core L#2, Socket L#0, Node L#0(P#0)
3: PU L#6(P#6), Core L#3, Socket L#0, Node L#0(P#0)

where each line lists the processing unit (together with its core, socket and NUMA node) that the corresponding worker thread will be bound to run on.

The difference between the four possible predefined distribution schemes (compact, scatter, balanced and numa-balanced) is best explained with an example. Imagine that we have a system with 4 cores and 4 hardware threads per core on 2 sockets. If we place 8 threads the assignments produced by the compact, scatter, balanced and numa-balanced types are shown in the figure below. Notice that compact does not fully utilize all the cores in the system. For this reason it is recommended that applications are run using the scatter or balanced/numa-balanced options in most cases.

Fig. 7 Schematic of thread affinity type distributions.#

In addition to the predefined distributions it is possible to restrict the resources used by HPX to the process CPU mask. The CPU mask is typically set by e.g. MPI and batch environments. Using the command line option --hpx:use-process-mask makes HPX act as if only the processing units in the CPU mask are available for use by HPX. The number of threads is automatically determined from the CPU mask. The number of threads can still be changed manually using this option, but only to a number less than or equal to the number of processing units in the CPU mask. The option --hpx:print-bind is useful in conjunction with --hpx:use-process-mask to make sure threads are placed as expected.

[1] The phase of an HPX thread counts how often this thread has been activated.

Writing single-node applications#

Being a C++ Standard Library for Concurrency and Parallelism, HPX implements all of the corresponding facilities as defined by the C++ Standard but also those which are proposed as part of the ongoing C++ standardization process. This section focuses on the features available in HPX for parallel and concurrent computation on a single node, although many of the features presented here are also implemented to work in the distributed case.

Synchronization objects#

The following objects provide synchronization for HPX applications:

  1. Barrier

  2. Condition variable

  3. Latch

  4. Mutex

  5. Shared mutex

  6. Semaphore

  7. Composable guards

Barrier#

Barriers are used for synchronizing multiple threads. They provide a synchronization point, where all threads must wait until they have all reached the barrier, before they can continue execution. This allows multiple threads to work together to solve a common task, and ensures that no thread starts working on the next task until all threads have completed the current task. This ensures that all threads are in the same state before performing any further operations, leading to a more consistent and accurate computation.

Unlike latches, barriers are reusable: once the participating threads are released from a barrier’s synchronization point, they can reuse the same barrier. Barriers are thus useful for managing repeated tasks, or phases of a larger task, that are handled by multiple threads. The code below shows how barriers can be used to synchronize two threads:

#include <hpx/barrier.hpp>
#include <hpx/future.hpp>
#include <hpx/init.hpp>

#include <iostream>

int hpx_main()
{
    hpx::barrier b(2);

    hpx::future<void> f1 = hpx::async([&b]() {
        std::cout << "Thread 1 started." << std::endl;
        // Do some computation
        b.arrive_and_wait();
        // Continue with next task
        std::cout << "Thread 1 finished." << std::endl;
    });

    hpx::future<void> f2 = hpx::async([&b]() {
        std::cout << "Thread 2 started." << std::endl;
        // Do some computation
        b.arrive_and_wait();
        // Continue with next task
        std::cout << "Thread 2 finished." << std::endl;
    });

    f1.get();
    f2.get();

    return hpx::local::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::local::init(hpx_main, argc, argv);
}

In this example, two hpx::future objects are created, each representing a separate thread of execution. The arrive_and_wait function of the hpx::barrier object is called by each thread. The threads will wait at the barrier until both have reached it. Once both threads have reached the barrier, they can continue with their next task.

Condition variable#

A condition variable is a synchronization primitive in HPX that allows a thread to wait for a specific condition to be satisfied before continuing execution. It is typically used in conjunction with a mutex or a lock to protect shared data that is being modified by multiple threads. Hence, it blocks one or more threads until another thread both modifies a shared variable (the condition) and notifies the condition_variable. The code below shows how two threads modifying the shared variable data can be synchronized using the condition_variable:

#include <hpx/condition_variable.hpp>
#include <hpx/init.hpp>
#include <hpx/mutex.hpp>
#include <hpx/thread.hpp>

#include <chrono>
#include <iostream>
#include <mutex>
#include <string>

hpx::condition_variable cv;
hpx::mutex m;
std::string data;
bool ready = false;
bool processed = false;

void worker_thread()
{
    // Wait until the main thread signals that data is ready
    std::unique_lock<hpx::mutex> lk(m);
    cv.wait(lk, [] { return ready; });

    // Access the shared resource
    std::cout << "Worker thread: Processing data...\n";
    data = "Test data after";

    // Send data back to the main thread
    processed = true;
    std::cout << "Worker thread: data processing is complete\n";

    // Manual unlocking is done before notifying, to avoid waking up
    // the waiting thread only to block again
    lk.unlock();
    cv.notify_one();
}

int hpx_main()
{
    hpx::thread worker(worker_thread);

    // Do some work
    std::cout << "Main thread: Preparing data...\n";
    data = "Test data before";
    hpx::this_thread::sleep_for(std::chrono::seconds(1));
    std::cout << "Main thread: Data before processing = " << data << '\n';

    // Signal that data is ready and send data to worker thread
    {
        std::lock_guard<hpx::mutex> lk(m);
        ready = true;
        std::cout << "Main thread: Data is ready...\n";
    }
    cv.notify_one();

    // Wait for the worker thread to finish
    {
        std::unique_lock<hpx::mutex> lk(m);
        cv.wait(lk, [] { return processed; });
    }
    std::cout << "Main thread: Data after processing = " << data << '\n';
    worker.join();

    return hpx::local::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::local::init(hpx_main, argc, argv);
}

The main thread of the code above starts by creating a worker thread and preparing the shared variable data. Once the data is ready, the main thread acquires a lock on the mutex m using std::lock_guard<hpx::mutex> lk(m) and sets the ready flag to true, then signals the worker thread to start processing by calling cv.notify_one(). The cv.wait() call in the main thread then blocks until the worker thread signals that processing is complete by setting the processed flag.

The worker thread starts by acquiring a lock on the mutex m to ensure exclusive access to the shared data. The cv.wait() call blocks the thread until the ready flag is set by the main thread. Once this is true, the worker thread accesses the shared data resource, processes it, and sets the processed flag to indicate completion. The mutex is then unlocked using lk.unlock() and the cv.notify_one() call signals the main thread to resume execution. Finally, the new data is printed by the main thread to the console.
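
The predicate overload of wait used above re-checks the condition every time the thread wakes up, which guards against spurious wake-ups. The following is a minimal sketch of that loop, written with the standard-library counterparts std::mutex and std::condition_variable so that it builds without HPX (assuming, as the matching interface suggests, that the hpx types behave the same way). The names one_shot_event and produce_and_wait are hypothetical and exist only for this illustration:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: a flag that one thread sets and another waits for.
class one_shot_event
{
public:
    void set()
    {
        {
            std::lock_guard<std::mutex> lk(m_);
            ready_ = true;
        }
        // Notify after unlocking so the woken thread can take the lock at once
        cv_.notify_all();
    }

    void wait()
    {
        std::unique_lock<std::mutex> lk(m_);
        // Same effect as cv_.wait(lk, [this] { return ready_; })
        while (!ready_)
            cv_.wait(lk);
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool ready_ = false;
};

// Starts a producer thread that writes a value and then signals the event.
int produce_and_wait()
{
    one_shot_event ev;
    int value = 0;

    std::thread producer([&] {
        value = 42;
        ev.set();
    });

    ev.wait();    // returns only after set() has been called
    int seen = value;
    producer.join();
    return seen;
}
```

Because the flag is only read and written while the mutex is held, the write to value in the producer is visible to the waiting thread once wait() returns.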

Latch#

A latch is a downward counter which can be used to synchronize threads. The value of the counter is initialized on creation. Threads may block on the latch until the counter is decremented to zero. There is no possibility to increase or reset the counter, which makes the latch a single-use barrier.

In HPX, a latch is implemented as a counting semaphore, which can be initialized with a specific count value and decremented each time a thread reaches the latch. When the count value reaches zero, all waiting threads are unblocked and allowed to continue execution. The code below shows how latch can be used to synchronize 16 threads:

std::ptrdiff_t num_threads = 16;

///////////////////////////////////////////////////////////////////////////////
void wait_for_latch(hpx::latch& l)
{
    l.arrive_and_wait();
}

///////////////////////////////////////////////////////////////////////////////
int hpx_main(hpx::program_options::variables_map& vm)
{
    num_threads = vm["num-threads"].as<std::ptrdiff_t>();

    hpx::latch l(num_threads + 1);

    std::vector<hpx::future<void>> results;
    for (std::ptrdiff_t i = 0; i != num_threads; ++i)
        results.push_back(hpx::async(&wait_for_latch, std::ref(l)));

    // Wait for all threads to reach this point.
    l.arrive_and_wait();

    hpx::wait_all(results);

    return hpx::local::finalize();
}

In the above code, the hpx_main function creates a latch object l with a count of num_threads + 1 and launches num_threads threads using hpx::async. Each of these threads runs wait_for_latch with a reference to the latch and calls arrive_and_wait on it, which decrements the count of the latch and blocks the thread until the count reaches zero. The main thread accounts for the extra count: it also calls arrive_and_wait, so all num_threads + 1 participants are released together once the last one has arrived. Finally, the main thread waits for all the threads to finish by calling the hpx::wait_all function.
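
To make the countdown semantics concrete, here is a small hand-written latch built from a standard mutex and condition variable. It is a sketch of the behavior described above, not the HPX implementation; simple_latch and all_arrived_before_release are hypothetical names:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Illustrative single-use latch; not the HPX implementation.
class simple_latch
{
public:
    explicit simple_latch(std::ptrdiff_t count)
      : count_(count)
    {
    }

    // Decrement the counter, then block until it has reached zero.
    void arrive_and_wait()
    {
        std::unique_lock<std::mutex> lk(m_);
        assert(count_ > 0);
        if (--count_ == 0)
            cv_.notify_all();    // the last arrival releases everyone
        else
            cv_.wait(lk, [this] { return count_ == 0; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::ptrdiff_t count_;
};

// Returns true if no participant passed the latch before all had arrived.
bool all_arrived_before_release(std::ptrdiff_t num_threads)
{
    simple_latch l(num_threads + 1);
    std::atomic<std::ptrdiff_t> arrived(0);
    std::atomic<bool> ok(true);

    auto participant = [&] {
        ++arrived;
        l.arrive_and_wait();
        if (arrived.load() != num_threads + 1)
            ok = false;
    };

    std::vector<std::thread> threads;
    for (std::ptrdiff_t i = 0; i != num_threads; ++i)
        threads.emplace_back(participant);

    participant();    // the calling thread is the extra participant

    for (auto& t : threads)
        t.join();
    return ok;
}
```

The counter is never incremented or reset, which is what makes the latch single-use.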

Mutex#

A mutex (short for “mutual exclusion”) is a synchronization primitive in HPX used to control access to a shared resource, ensuring that only one thread can access it at a time. A mutex is used to protect data structures from race conditions and other synchronization-related issues. When a thread acquires a mutex, other threads that try to access the same resource will be blocked until the mutex is released. The code below shows the basic use of mutexes:

#include <hpx/future.hpp>
#include <hpx/init.hpp>
#include <hpx/mutex.hpp>

#include <iostream>
#include <mutex>

int hpx_main()
{
    hpx::mutex m;

    hpx::future<void> f1 = hpx::async([&m]() {
        std::scoped_lock sl(m);
        std::cout << "Thread 1 acquired the mutex" << std::endl;
    });

    hpx::future<void> f2 = hpx::async([&m]() {
        std::scoped_lock sl(m);
        std::cout << "Thread 2 acquired the mutex" << std::endl;
    });

    hpx::wait_all(f1, f2);

    return hpx::local::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::local::init(hpx_main, argc, argv);
}

In this example, two HPX threads created using hpx::async each acquire the hpx::mutex m. std::scoped_lock sl(m) takes ownership of the given mutex m. When control leaves the scope in which the scoped_lock object was created, the scoped_lock is destroyed and the mutex is released.

Attention

A common way to acquire and release a mutex is to call m.lock() before accessing the shared resource and m.unlock() after the access is complete. However, this pattern is not exception-safe: if an exception is thrown while the mutex is locked, the code that unlocks it is never executed, the mutex remains held by the thread that acquired it, and every other thread that tries to acquire the same mutex blocks forever. For this reason, we suggest you use std::scoped_lock, which releases the lock when control leaves the scope in which the scoped_lock object was created, including when the scope is left because of an exception.
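
The following sketch shows the exception case in isolation. It uses std::mutex rather than hpx::mutex so that it builds without HPX, and wraps the mutex in a hypothetical tracking_mutex type that records whether it is currently held. guarded_update and released_after_exception are likewise names invented for this illustration:

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical wrapper that records whether the mutex is currently held.
struct tracking_mutex
{
    void lock()
    {
        m.lock();
        locked = true;
    }

    void unlock()
    {
        locked = false;
        m.unlock();
    }

    std::mutex m;
    bool locked = false;
};

// Throws while the lock is held; the scoped_lock destructor still runs
// during stack unwinding and releases the mutex.
void guarded_update(tracking_mutex& tm, bool fail)
{
    std::scoped_lock<tracking_mutex> sl(tm);
    assert(tm.locked);
    if (fail)
        throw std::runtime_error("update failed");
}

// Returns true if the mutex is free again after guarded_update threw.
bool released_after_exception()
{
    tracking_mutex tm;
    try
    {
        guarded_update(tm, true);
    }
    catch (std::runtime_error const&)
    {
        // the exception itself is not of interest here
    }
    return !tm.locked;
}
```

With a manual lock()/unlock() pair in guarded_update, the unlock() call would be skipped when the exception is thrown and the mutex would stay locked.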

Shared mutex#

A shared mutex is a synchronization primitive that can be used to protect shared data from being simultaneously accessed by multiple threads. In contrast to other mutex types which facilitate exclusive access, a shared_mutex has two levels of access:

  • Exclusive access prevents any other thread from acquiring the mutex, just as with the normal mutex. It does not matter if the other thread tries to acquire shared or exclusive access.

  • Shared access allows multiple threads to acquire the mutex, but all of them only in shared mode. Exclusive access is not granted until all of the previous shared holders have returned the mutex (typically, as long as an exclusive request is waiting, new shared ones are queued to be granted after the exclusive access).

Shared mutexes are especially useful when shared data can be safely read by any number of threads simultaneously, but a thread may only write the same data when no other thread is reading or writing at the same time. A typical scenario is a database: The data can be read simultaneously by different threads with no problem. However, modification of the database is critical: if some threads read data while another one is writing, the threads reading may receive inconsistent data. Hence, while a thread is writing, reading should not be allowed. After writing is complete, reads can occur simultaneously again. The code below shows how shared_mutex can be used to synchronize reads and writes:

int const writers = 3;
int const readers = 3;
int const cycles = 10;

using std::chrono::milliseconds;

int hpx_main()
{
    std::vector<hpx::thread> threads;
    std::atomic<bool> ready(false);
    hpx::shared_mutex stm;

    for (int i = 0; i < writers; ++i)
    {
        threads.emplace_back([&ready, &stm, i] {
            std::mt19937 urng(static_cast<std::uint32_t>(std::time(nullptr)));
            std::uniform_int_distribution<int> dist(1, 1000);

            while (!ready)
            { /*** wait... ***/
            }

            for (int j = 0; j < cycles; ++j)
            {
                // scope of unique_lock
                {
                    std::unique_lock<hpx::shared_mutex> ul(stm);

                    std::cout << "^^^ Writer " << i << " starting..."
                              << std::endl;
                    hpx::this_thread::sleep_for(milliseconds(dist(urng)));
                    std::cout << "vvv Writer " << i << " finished."
                              << std::endl;
                }

                hpx::this_thread::sleep_for(milliseconds(dist(urng)));
            }
        });
    }

    for (int i = 0; i < readers; ++i)
    {
        int k = writers + i;
        threads.emplace_back([&ready, &stm, k, i] {
            HPX_UNUSED(k);
            std::mt19937 urng(static_cast<std::uint32_t>(std::time(nullptr)));
            std::uniform_int_distribution<int> dist(1, 1000);

            while (!ready)
            { /*** wait... ***/
            }

            for (int j = 0; j < cycles; ++j)
            {
                // scope of shared_lock
                {
                    std::shared_lock<hpx::shared_mutex> sl(stm);

                    std::cout << "Reader " << i << " starting..." << std::endl;
                    hpx::this_thread::sleep_for(milliseconds(dist(urng)));
                    std::cout << "Reader " << i << " finished." << std::endl;
                }
                hpx::this_thread::sleep_for(milliseconds(dist(urng)));
            }
        });
    }

    ready = true;
    for (auto& t : threads)
        t.join();

    return hpx::local::finalize();
}

The above code creates writers and readers threads, each of which will perform cycles of operations. Both the writer and reader threads use the hpx::shared_mutex object stm to synchronize access to a shared resource.

  • For the writer threads, a unique_lock on the shared mutex is acquired before each write operation and is released after control leaves the scope in which the unique_lock object was created.

  • For the reader threads, a shared_lock on the shared mutex is acquired before each read operation and is released after control leaves the scope in which the shared_lock object was created.

While holding the lock, each reader and writer thread sleeps for a random time period, which is generated using a random number generator. This simulates the processing time of the operation. After releasing the lock, the thread sleeps for another random time period before it starts the next cycle.
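
The same reader/writer split can be shown on a smaller scale with a counter whose reads take a shared lock and whose updates take an exclusive lock. This sketch uses std::shared_mutex in place of hpx::shared_mutex so that it builds without HPX; guarded_counter and run_counter_demo are hypothetical names:

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// A counter that many threads may read at once but only one may modify.
class guarded_counter
{
public:
    int read() const
    {
        // Shared access: other readers may hold the lock at the same time
        std::shared_lock<std::shared_mutex> sl(m_);
        return value_;
    }

    void increment()
    {
        // Exclusive access: no reader or writer may hold the lock
        std::unique_lock<std::shared_mutex> ul(m_);
        ++value_;
    }

private:
    mutable std::shared_mutex m_;
    int value_ = 0;
};

// Runs the given number of writer and reader threads and returns the
// final counter value.
int run_counter_demo(int writers, int readers, int cycles)
{
    guarded_counter c;
    std::vector<std::thread> threads;

    for (int i = 0; i < writers; ++i)
        threads.emplace_back([&c, cycles] {
            for (int j = 0; j < cycles; ++j)
                c.increment();
        });

    for (int i = 0; i < readers; ++i)
        threads.emplace_back([&c, cycles] {
            for (int j = 0; j < cycles; ++j)
                (void) c.read();
        });

    for (auto& t : threads)
        t.join();
    return c.read();
}
```

Since every increment runs under the exclusive lock, no update is lost: the result is always writers * cycles.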

Semaphore#

Semaphores are a synchronization mechanism used to control concurrent access to a shared resource. The two types of semaphores are:

  • counting semaphore: it has a counter that never drops below zero. The counter is initialized in the constructor. Acquiring the semaphore decreases the counter and releasing the semaphore increases the counter. If a thread tries to acquire the semaphore when the counter is zero, the thread will block until another thread increments the counter by releasing the semaphore. Unlike an hpx::mutex, an hpx::counting_semaphore is not bound to a thread, which means that the acquire and release calls of a semaphore can happen on different threads.

  • binary semaphore: it is an alias for hpx::counting_semaphore<1>. In this case, the least maximal value is 1. hpx::binary_semaphore can be used to implement locks.

#include <hpx/init.hpp>
#include <hpx/semaphore.hpp>
#include <hpx/thread.hpp>

#include <chrono>
#include <iostream>

// initialize the semaphore with a count of 3
hpx::counting_semaphore<> semaphore(3);

void worker()
{
    semaphore.acquire();    // decrement the semaphore's count
    std::cout << "Entering critical section" << std::endl;
    hpx::this_thread::sleep_for(std::chrono::seconds(1));
    std::cout << "Exiting critical section" << std::endl;
    semaphore.release();    // increment the semaphore's count
}

int hpx_main()
{
    hpx::thread t1(worker);
    hpx::thread t2(worker);
    hpx::thread t3(worker);
    hpx::thread t4(worker);
    hpx::thread t5(worker);

    t1.join();
    t2.join();
    t3.join();
    t4.join();
    t5.join();

    return hpx::local::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::local::init(hpx_main, argc, argv);
}

In this example, the counting semaphore is initialized to the value of 3. This means that up to 3 threads can be inside the critical section (the code between acquire() and release() in the worker() function) at the same time. A thread acquires the semaphore when it enters the critical section, which decrements the count, and releases it when it exits, which increments the count again. The worker() function simulates work in the critical section by sleeping for 1 second.

In hpx_main, 5 worker threads are created and started, each trying to enter the critical section. If the count of the semaphore is already 0, a worker waits until another worker releases the semaphore (increasing its value).
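
The acquire/release bookkeeping can be sketched with a standard mutex and condition variable. The class below is a hand-written illustration of the counter semantics, not the HPX implementation, and the names simple_semaphore and max_concurrency are hypothetical. The helper measures how many workers were inside the critical section at the same time:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Illustrative counting semaphore; not the HPX implementation.
class simple_semaphore
{
public:
    explicit simple_semaphore(int count)
      : count_(count)
    {
        assert(count >= 0);
    }

    // Block while the counter is zero, then decrement it.
    void acquire()
    {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return count_ > 0; });
        --count_;
    }

    // Increment the counter and wake one waiting thread, if any.
    void release()
    {
        {
            std::lock_guard<std::mutex> lk(m_);
            ++count_;
        }
        cv_.notify_one();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Returns the largest number of workers observed inside the critical
// section at the same time.
int max_concurrency(int slots, int workers)
{
    simple_semaphore sem(slots);
    std::atomic<int> inside(0);
    std::atomic<int> peak(0);

    auto worker = [&] {
        sem.acquire();
        int now = ++inside;
        int seen = peak.load();
        // Raise the recorded peak if this thread observed a higher value
        while (now > seen && !peak.compare_exchange_weak(seen, now))
        {
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        --inside;
        sem.release();
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i)
        threads.emplace_back(worker);
    for (auto& t : threads)
        t.join();
    return peak;
}
```

With 3 slots and 5 workers the observed peak can never exceed 3, regardless of how the threads are scheduled.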

Composable guards#

Composable guards operate in a manner similar to locks, but are applied only to asynchronous functions. The guard (or guards) is automatically locked at the beginning of a specified task and automatically unlocked at the end. Because guards are never added to an existing task’s execution context, the calling of guards is freely composable and can never deadlock.

To run a task under a single guard, simply declare the guard and call run_guarded() with a function (task):

hpx::lcos::local::guard gu;
run_guarded(gu,task);

If a single method needs to run with multiple guards, use a guard set:

std::shared_ptr<hpx::lcos::local::guard> gu1(new hpx::lcos::local::guard());
std::shared_ptr<hpx::lcos::local::guard> gu2(new hpx::lcos::local::guard());
hpx::lcos::local::guard_set gs;    // the guard set that collects both guards
gs.add(*gu1);
gs.add(*gu2);
run_guarded(gs,task);

Guards use two atomic operations (which are not called repeatedly) to manage what they do, so overhead should be extremely low.

Execution control#

The following objects provide control of the execution in HPX applications:

  1. Futures

  2. Channels

  3. Task blocks

  4. Task groups

  5. Threads

Futures#

Futures are a mechanism to represent the result of a potentially asynchronous operation. A future is a type that represents a value that will become available at some point in the future, and it can be used to write asynchronous and parallel code. Futures can be returned from functions that perform time-consuming operations, allowing the calling code to continue executing while the function performs its work. The value of the future is set when the operation completes and can be accessed later. Below is an example demonstrating different features of futures:

#include <hpx/assert.hpp>
#include <hpx/future.hpp>
#include <hpx/hpx_main.hpp>
#include <hpx/tuple.hpp>

#include <iostream>
#include <utility>

int main()
{
    // Asynchronous execution with futures
    hpx::future<void> f1 = hpx::async(hpx::launch::async, []() {});
    hpx::shared_future<int> f2 =
        hpx::async(hpx::launch::async, []() { return 42; });
    hpx::future<int> f3 =
        f2.then([](hpx::shared_future<int>&& f) { return f.get() * 3; });

    hpx::promise<double> p;
    auto f4 = p.get_future();
    HPX_ASSERT(!f4.is_ready());
    p.set_value(123.45);
    HPX_ASSERT(f4.is_ready());

    hpx::packaged_task<int()> t([]() { return 43; });
    hpx::future<int> f5 = t.get_future();
    HPX_ASSERT(!f5.is_ready());
    t();
    HPX_ASSERT(f5.is_ready());

    // Fire-and-forget
    hpx::post([]() {
        std::cout << "This will be printed later\n" << std::flush;
    });

    // Synchronous execution
    hpx::sync([]() {
        std::cout << "This will be printed immediately\n" << std::flush;
    });

    // Combinators
    hpx::future<double> f6 = hpx::async([]() { return 3.14; });
    hpx::future<double> f7 = hpx::async([]() { return 42.0; });
    std::cout
        << hpx::when_all(f6, f7)
               .then([](hpx::future<
                         hpx::tuple<hpx::future<double>, hpx::future<double>>>
                             f) {
                   hpx::tuple<hpx::future<double>, hpx::future<double>> t =
                       f.get();
                   double pi = hpx::get<0>(t).get();
                   double r = hpx::get<1>(t).get();
                   return pi * r * r;
               })
               .get()
        << std::endl;

    // Easier continuations with dataflow; it waits for all future or
    // shared_future arguments before executing the continuation, and also
    // accepts non-future arguments
    hpx::future<double> f8 = hpx::async([]() { return 3.14; });
    hpx::future<double> f9 = hpx::make_ready_future(42.0);
    hpx::shared_future<double> f10 = hpx::async([]() { return 123.45; });
    hpx::future<hpx::tuple<double, double>> f11 = hpx::dataflow(
        [](hpx::future<double> a, hpx::future<double> b,
            hpx::shared_future<double> c, double d) {
            return hpx::make_tuple<>(a.get() + b.get(), c.get() / d);
        },
        f8, f9, f10, -3.9);

    // split_future gives a tuple of futures from a future of tuple
    hpx::tuple<hpx::future<double>, hpx::future<double>> f12 =
        hpx::split_future(std::move(f11));
    std::cout << hpx::get<1>(f12).get() << std::endl;

    return 0;
}

The first section of the main function demonstrates how to use futures for asynchronous execution. The first two lines create two futures, one for void and another for an integer, using the hpx::async() function. These futures are executed asynchronously in separate threads using the hpx::launch::async launch policy. The third future is created by chaining the second future using the then() member function. This future multiplies the result of the second future by 3.

The next part of the code demonstrates how to use promises and packaged tasks, which are constructs used for communicating data between threads. The promise class is used to store a value that can be retrieved later using a future. The packaged_task class represents a task that can be executed asynchronously, and its result can be obtained using a future. The packaged_task lines create a packaged task that returns an integer, obtain its future, execute the task, and check that the future is not ready before the task has run and is ready afterwards.

The code then demonstrates how to use the hpx::post() and hpx::sync() functions for fire-and-forget and synchronous execution, respectively. The hpx::post() function executes a given function asynchronously and returns immediately without waiting for the result. The hpx::sync() function executes a given function synchronously and waits for the result before returning.

Next, the code demonstrates the use of combinators, which are higher-order functions that combine two or more futures into a single future. The hpx::when_all() function combines two futures, which return double values, into a single future that holds a tuple of futures. The then() member function is then used to attach a continuation that computes the area of a circle from the values of the two futures. The get() member function is used to retrieve the result of the computation.

The last section demonstrates the use of hpx::dataflow(), which is a higher-order function that waits for all the future or shared_future arguments to be ready before executing the continuation. The hpx::make_ready_future() function is used to create a future with a given value. The hpx::split_future() function is used to split a future of a tuple into a tuple of futures. The last line retrieves the value of the second future in the tuple using hpx::get() and prints it to the console.

Extended facilities for futures#

Concurrency is about both decomposing and composing the program from the parts that work well individually and together. It is in the composition of connected and multicore components where today’s C++ libraries are still lacking.

The functionality of std::future offers a partial solution. It allows for the separation of the initiation of an operation and the act of waiting for its result; however, the act of waiting is synchronous. In communication-intensive code this act of waiting can be unpredictable, inefficient and simply frustrating. The example below illustrates a possible synchronous wait using futures:

#include <future>
using namespace std;
int main()
{
    future<int> f = async([]() { return 123; });
    int result = f.get(); // might block
}

For this reason, HPX implements a set of extensions to std::future (as proposed by N4313). This proposal introduces the following key asynchronous operations to hpx::future, hpx::shared_future and hpx::async, which enhance and enrich these facilities.

Table 10 Facilities extending std::future#

Facility

Description

hpx::future::then

In asynchronous programming, it is very common for one asynchronous operation, on completion, to invoke a second operation and pass data to it. The current C++ standard does not allow one to register a continuation to a future. With then, instead of waiting for the result, a continuation is “attached” to the asynchronous operation, which is invoked when the result is ready. Continuations registered using the then function help to avoid blocking waits or wasting threads on polling, greatly improving the responsiveness and scalability of an application.

unwrapping constructor for hpx::future

In some scenarios, you might want to create a future that returns another future, resulting in nested futures. Although it is possible to write code to unwrap the outer future and retrieve the nested future and its result, such code is not easy to write because users must handle exceptions and it may cause a blocking call. Unwrapping can allow users to mitigate this problem by doing an asynchronous call to unwrap the outermost future.

hpx::future::is_ready

There are often situations where a get() call on a future may not be a blocking call, or is only a blocking call under certain circumstances. This function gives the ability to test for early completion and allows us to avoid associating a continuation, which needs to be scheduled with some non-trivial overhead and near-certain loss of cache efficiency.

hpx::make_ready_future

Some functions may know the value at the point of construction. In these cases the value is immediately available, but needs to be returned as a future. By using hpx::make_ready_future a future can be created that holds a pre-computed result in its shared state. In the current standard it is non-trivial to create a future directly from a value. First a promise must be created, then the promise is set, and lastly the future is retrieved from the promise. This can now be done with one operation.
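
The three steps that hpx::make_ready_future replaces can be written out in standard C++ as follows. This is only a sketch of the manual approach described in the table. The names make_ready and is_ready are hypothetical helpers, the latter being a stand-in for hpx::future::is_ready, which std::future does not provide:

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <type_traits>
#include <utility>

// Stand-in for hpx::future::is_ready: poll the future without blocking.
template <typename T>
bool is_ready(std::future<T> const& f)
{
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Build an already-satisfied future by hand.
template <typename T>
std::future<std::decay_t<T>> make_ready(T&& value)
{
    std::promise<std::decay_t<T>> p;        // 1. create the promise
    p.set_value(std::forward<T>(value));    // 2. set its value
    return p.get_future();                  // 3. retrieve the future
}
```

A future obtained this way is ready immediately, so calling get() on it does not block.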

The standard also omits the ability to compose multiple futures. Composition is ubiquitous in other asynchronous frameworks and is absolutely necessary in order to make C++ a powerful asynchronous programming language. Not including these functions is comparable to Boolean algebra without AND/OR.

In addition to the extensions proposed by N4313, HPX adds functions allowing users to compose several futures in a more flexible way.

Table 11 Facilities for composing hpx::futures#

Facility

Description

hpx::when_any, hpx::when_any_n

Asynchronously wait for at least one of multiple future or shared_future objects to finish.

hpx::wait_any, hpx::wait_any_n

Synchronously wait for at least one of multiple future or shared_future objects to finish.

hpx::when_all, hpx::when_all_n

Asynchronously wait for all future and shared_future objects to finish.

hpx::wait_all, hpx::wait_all_n

Synchronously wait for all future and shared_future objects to finish.

hpx::when_some, hpx::when_some_n

Asynchronously wait for at least a given number of multiple future and shared_future objects to finish.

hpx::wait_some, hpx::wait_some_n

Synchronously wait for at least a given number of multiple future and shared_future objects to finish.

hpx::when_each

Asynchronously wait for multiple future and shared_future objects to finish and call a function for each of the future objects as soon as it becomes ready.

hpx::wait_each, hpx::wait_each_n

Synchronously wait for multiple future and shared_future objects to finish and call a function for each of the future objects as soon as it becomes ready.

Channels#

Channels combine communication (the exchange of a value) with synchronization (guaranteeing that two calculations (tasks) are in a known state). A channel can transport any number of values of a given type from a sender to a receiver:

    hpx::lcos::local::channel<int> c;
    hpx::future<int> f = c.get();
    HPX_ASSERT(!f.is_ready());
    c.set(42);
    HPX_ASSERT(f.is_ready());
    std::cout << f.get() << std::endl;

Channels can be handed to another thread (or in case of channel components, to other localities), thus establishing a communication channel between two independent places in the program:

void do_something(hpx::lcos::local::receive_channel<int> c,
    hpx::lcos::local::send_channel<> done)
{
    // prints 43
    std::cout << c.get(hpx::launch::sync) << std::endl;
    // signal back
    done.set();
}

void send_receive_channel()
{
    hpx::lcos::local::channel<int> c;
    hpx::lcos::local::channel<> done;

    hpx::post(&do_something, c, done);

    // send some value
    c.set(43);
    // wait for thread to be done
    done.get().wait();
}

Note how hpx::lcos::local::channel::get without any arguments returns a future which is ready when a value has been set on the channel. The launch policy hpx::launch::sync can be used to make hpx::lcos::local::channel::get block until a value is set and return the value directly.
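
The relationship between set and the future returned by get can be illustrated with a small hand-written channel based on std::promise. This is a sketch of the observable behavior only, under the assumption that values are delivered in FIFO order. It is not how HPX implements channels, and simple_channel is a hypothetical name:

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <future>
#include <mutex>
#include <utility>

// Illustrative unbounded local channel; not the HPX implementation.
template <typename T>
class simple_channel
{
public:
    // Returns a future that becomes ready once a value is available.
    std::future<T> get()
    {
        std::lock_guard<std::mutex> lk(m_);
        std::promise<T> p;
        std::future<T> f = p.get_future();
        if (!values_.empty())
        {
            // A value was sent earlier: hand it out right away
            p.set_value(std::move(values_.front()));
            values_.pop_front();
        }
        else
        {
            // No value yet: remember the promise for a later set()
            waiters_.push_back(std::move(p));
        }
        return f;
    }

    // Delivers a value to the oldest pending get(), or stores it.
    void set(T value)
    {
        std::lock_guard<std::mutex> lk(m_);
        if (!waiters_.empty())
        {
            waiters_.front().set_value(std::move(value));
            waiters_.pop_front();
        }
        else
        {
            values_.push_back(std::move(value));
        }
    }

private:
    std::mutex m_;
    std::deque<T> values_;
    std::deque<std::promise<T>> waiters_;
};
```

A future obtained from get() before any value was sent stays pending until the matching set() call, which mirrors the HPX_ASSERT checks in the first channel example above.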

A channel component is created on one locality and can be sent to another locality using an action. This example also demonstrates how a channel can be used as a range of values:

// channel components need to be registered for each used type (not needed
// for hpx::lcos::local::channel)
HPX_REGISTER_CHANNEL(double)

void channel_sender(hpx::lcos::channel<double> c)
{
    for (double d : c)
        hpx::cout << d << std::endl;
}
HPX_PLAIN_ACTION(channel_sender)

void channel()
{
    // create the channel on this locality
    hpx::lcos::channel<double> c(hpx::find_here());

    // pass the channel to a (possibly remote invoked) action
    hpx::post(channel_sender_action(), hpx::find_here(), c);

    // send some values to the receiver
    std::vector<double> v = {1.2, 3.4, 5.0};
    for (double d : v)
        c.set(d);

    // explicitly close the communication channel (implicit at destruction)
    c.close();
}

Task blocks#

Task blocks in HPX provide a way to structure and organize the execution of tasks in a parallel program, making it easier to manage dependencies between tasks. A task block is a group of tasks that can be executed in parallel. Tasks in a task block can depend on other tasks in the same task block. The task block allows the runtime to optimize the execution of tasks by scheduling them in an order based on the dependencies between them.

The define_task_block, run and wait functions, implemented following N4755, are based on the task_block concept that is part of the common subset of the Microsoft Parallel Patterns Library (PPL) and the Intel Threading Building Blocks (TBB) libraries.

These implementations adopt a simpler syntax than the one exposed by those libraries, one that is influenced by language-based concepts such as spawn and sync from Cilk++ and async and finish from X10. They improve on existing practice in the following ways:

  • The exception handling model is simplified and more consistent with normal C++ exceptions.

  • Most violations of strict fork-join parallelism can be enforced at compile time (with compiler assistance, in some cases).

  • The syntax allows scheduling approaches other than child stealing.

Consider an example of a parallel traversal of a tree, where a user-provided function compute is applied to each node of the tree, returning the sum of the results:

template <typename Func>
int traverse(node& n, Func && compute)
{
    int left = 0, right = 0;
    define_task_block(
        [&](task_block<>& tr) {
            if (n.left)
                tr.run([&] { left = traverse(*n.left, compute); });
            if (n.right)
                tr.run([&] { right = traverse(*n.right, compute); });
        });

    return compute(n) + left + right;
}

The example above demonstrates the use of two of the functions, hpx::experimental::define_task_block and the hpx::experimental::task_block::run member function of a hpx::experimental::task_block.

The define_task_block function delineates a region in a program code potentially containing invocations of threads spawned by the run member function of the task_block class. The run function spawns an HPX thread, a unit of work that is allowed to execute in parallel with respect to the caller. Any parallel tasks spawned by run within the task block are joined back to a single thread of execution at the end of the define_task_block. run takes a user-provided function object f and starts it asynchronously, i.e., it may return before the execution of f completes. The HPX scheduler may choose to run f immediately or delay running f until compute resources become available.
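
For comparison, the same fork-join shape can be written with std::async, where each spawned task returns a future and the join has to be spelled out with get(). This is a standard-library sketch for illustration only. It does not use the HPX task block API, and the node type shown here is an assumed minimal definition:

```cpp
#include <cassert>
#include <future>
#include <memory>

// Assumed minimal tree node for this illustration.
struct node
{
    explicit node(int v)
      : value(v)
    {
    }

    int value;
    std::unique_ptr<node> left;
    std::unique_ptr<node> right;
};

// Applies compute to every node and returns the sum of the results.
template <typename Func>
int traverse(node const& n, Func const& compute)
{
    std::future<int> left, right;

    // Fork: each subtree is traversed in its own task
    if (n.left)
        left = std::async(std::launch::async,
            [&] { return traverse(*n.left, compute); });
    if (n.right)
        right = std::async(std::launch::async,
            [&] { return traverse(*n.right, compute); });

    int sum = compute(n);

    // Join: wait for the children explicitly before returning
    if (left.valid())
        sum += left.get();
    if (right.valid())
        sum += right.get();
    return sum;
}
```

With define_task_block the join is implicit at the end of the block, so there are no futures to track.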

A task_block can be constructed only by define_task_block because it has no public constructors. Thus, run can be invoked directly or indirectly only from a user-provided function passed to define_task_block:

void g();

void f(task_block<>& tr)
{
    tr.run(g);          // OK, invoked from within task_block in h
}

void h()
{
    define_task_block(f);
}

int main()
{
    task_block<> tr;    // Error: no public constructor
    tr.run(g);          // No way to call run outside of a define_task_block
    return 0;
}

Extensions for task blocks#

Using execution policies with task blocks#

HPX implements some extensions for task_block beyond the actual standards proposal N4755. The main addition is that a task_block can be invoked with an execution policy as its first argument, very similar to the parallel algorithms.

An execution policy is an object that expresses the requirements on the ordering of functions invoked as a consequence of the invocation of a task block. Passing an execution policy to define_task_block gives the user control over the amount of parallelism employed by the created task_block. In the following example, the par execution policy states the user’s intent explicitly:

template <typename Func>
int traverse(node *n, Func&& compute)
{
    int left = 0, right = 0;

    define_task_block(
        execution::par,                // execution::parallel_policy
        [&](task_block<>& tb) {
            if (n->left)
                tb.run([&] { left = traverse(n->left, compute); });
            if (n->right)
                tb.run([&] { right = traverse(n->right, compute); });
        });

    return compute(n) + left + right;
}

This also causes the hpx::experimental::task_block object to be a template in our implementation. The template argument is the type of the execution policy used to create the task block. The template argument defaults to hpx::execution::parallel_policy.

HPX still supports calling hpx::experimental::define_task_block without an explicit execution policy. In this case the task block will run using the hpx::execution::parallel_policy.

HPX also adds the ability to access the execution policy that was used to create a given task_block.

Using executors to run tasks#

Often, users want to be able to not only define an execution policy to use by default for all spawned tasks inside the task block, but also to customize the execution context for one of the tasks executed by task_block::run. Adding an optionally passed executor instance to that function enables this use case:

template <typename Func>
int traverse(node *n, Func&& compute)
{
    int left = 0, right = 0;

    define_task_block(
        execution::par,                // execution::parallel_policy
        [&](auto& tb) {
            if (n->left)
            {
                // use explicitly specified executor to run this task
                tb.run(my_executor(), [&] { left = traverse(n->left, compute); });
            }
            if (n->right)
            {
                // use the executor associated with the par execution policy
                tb.run([&] { right = traverse(n->right, compute); });
            }
        });

    return compute(n) + left + right;
}

HPX still supports calling hpx::experimental::task_block::run without an explicit executor object. In this case the task will be run using the executor associated with the execution policy that was used to call hpx::experimental::define_task_block.

Task groups#

A task group in HPX is a synchronization primitive that allows you to execute a group of tasks concurrently and wait for their completion before continuing. The tasks in an hpx::experimental::task_group can be added dynamically. This is the HPX implementation of tbb::task_group of the Intel Threading Building Blocks (TBB) library.

The example below shows how to use a task group: create an hpx::experimental::task_group object and add tasks to it using the run() method. Once all the tasks have been added, call the wait() method to wait for them to complete.

#include <hpx/experimental/task_group.hpp>
#include <hpx/init.hpp>

#include <iostream>

void task1()
{
    std::cout << "Task 1 executed." << std::endl;
}

void task2()
{
    std::cout << "Task 2 executed." << std::endl;
}

int hpx_main()
{
    hpx::experimental::task_group tg;

    tg.run(task1);
    tg.run(task2);

    tg.wait();

    std::cout << "All tasks finished!" << std::endl;

    return hpx::local::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::local::init(hpx_main, argc, argv);
}
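
The run()/wait() pattern can also be sketched without HPX. The block below is a minimal illustration in standard C++, assuming one std::async task per run() call; simple_task_group, sum_in_two_tasks, and task_group_example are hypothetical names introduced here, and nothing in it reflects how hpx::experimental::task_group is implemented or scheduled.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for illustration only; this is not the HPX class.
class simple_task_group
{
public:
    // Start the task asynchronously and remember its future.
    template <typename F>
    void run(F&& f)
    {
        pending_.push_back(
            std::async(std::launch::async, std::forward<F>(f)));
    }

    // Block until every task added so far has finished.
    void wait()
    {
        for (auto& fut : pending_)
            fut.get();
        pending_.clear();
    }

private:
    std::vector<std::future<void>> pending_;
};

// Sum a vector by handing each half to its own task.
int sum_in_two_tasks(std::vector<int> const& v)
{
    auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
    int left = 0, right = 0;

    simple_task_group tg;
    tg.run([&] { left = std::accumulate(v.begin(), mid, 0); });
    tg.run([&] { right = std::accumulate(mid, v.end(), 0); });
    tg.wait();    // both partial sums are complete after this call

    return left + right;
}

// Usage: the two partial sums add up to the full sum.
void task_group_example()
{
    assert(sum_in_two_tasks({1, 2, 3, 4, 5}) == 15);
}
```

The key point is that the results written by the tasks are only read after wait() has returned.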

Note

Task groups and task blocks are both ways to group and synchronize parallel tasks, but they differ in how the tasks are scoped. If the difference is not clear yet, continue reading.

A task group is an object that allows multiple parallel tasks to be grouped together as a single unit. Tasks can be added to it dynamically with run(), and wait() synchronizes all the tasks in the group before continuing with the rest of the program.

A task block, on the other hand, is a structured fork-join region created by define_task_block. Tasks are spawned with task_block::run inside the callable passed to define_task_block, and all of them are joined before define_task_block returns.

Threads#

A thread in HPX is a sequence of instructions that can be executed concurrently with other such sequences in a multithreaded environment, while sharing the same address space. These threads can communicate with each other through various means, such as futures or shared data structures.

The example below demonstrates how to launch multiple threads and synchronize them using an hpx::latch object. It also shows how to query the state of threads and wait for futures to complete.

#include <hpx/future.hpp>
#include <hpx/init.hpp>
#include <hpx/thread.hpp>

#include <functional>
#include <iostream>
#include <vector>

int const num_threads = 10;

///////////////////////////////////////////////////////////////////////////////
void wait_for_latch(hpx::latch& l)
{
    l.arrive_and_wait();
}

int hpx_main()
{
    // Spawn a couple of threads
    hpx::latch l(num_threads + 1);

    std::vector<hpx::future<void>> results;
    results.reserve(num_threads);

    for (int i = 0; i != num_threads; ++i)
        results.push_back(hpx::async(&wait_for_latch, std::ref(l)));

    // Allow spawned threads to reach latch
    hpx::this_thread::yield();

    // Enumerate all suspended threads
    hpx::threads::enumerate_threads(
        [](hpx::threads::thread_id_type id) -> bool {
            std::cout << "thread " << hpx::thread::id(id) << " is "
                      << hpx::threads::get_thread_state_name(
                             hpx::threads::get_thread_state(id))
                      << std::endl;
            return true;    // always continue enumeration
        },
        hpx::threads::thread_schedule_state::suspended);

    // Wait for all threads to reach this point.
    l.arrive_and_wait();

    hpx::wait_all(results);

    return hpx::local::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::local::init(hpx_main, argc, argv);
}

In more detail, wait_for_latch() is a simple helper function that arrives at an hpx::latch object and blocks until the latch is released. Recall that hpx::latch is a synchronization primitive that allows multiple threads to wait for a common event to occur.

In the hpx_main() function, an hpx::latch object is created with a count of num_threads + 1: the num_threads spawned threads plus the thread running hpx_main() itself must all arrive at the latch before it is released. The loop that follows launches num_threads asynchronous operations, each of which calls the wait_for_latch function. The resulting futures are added to the results vector.

After the threads have been launched, hpx::this_thread::yield() is called to give them a chance to reach the latch before the program proceeds. Then, the hpx::threads::enumerate_threads function prints the state of each suspended thread, while the next call of l.arrive_and_wait() waits for all the threads to reach the latch. Finally, hpx::wait_all is called to wait for all the futures to complete.

Hint

An advantage of using hpx::thread over other threading libraries is that it is optimized for high-performance parallelism, with support for lightweight threads and task scheduling to minimize thread overhead and maximize parallelism. Additionally, hpx::thread integrates seamlessly with other features of HPX such as futures, promises, and task groups, making it a powerful tool for parallel programming.

Check out the examples of Shared mutex, Condition variable, and Semaphore to see how HPX threads are used in combination with other features.

High level parallel facilities#

In preparation for the upcoming C++ Standards, there are currently several proposals targeting different facilities supporting parallel programming. HPX implements (and extends) some of those proposals. This fits our strategy of aligning the APIs exposed by HPX with current and future C++ Standards.

At this point, HPX implements several of the C++ Standardization working papers, most notably N4409 (Working Draft, Technical Specification for C++ Extensions for Parallelism), N4755 (Task Blocks), and N4406 (Parallel Algorithms Need Executors).

Using parallel algorithms#

A parallel algorithm is a function template declared in the namespace hpx (earlier releases exposed these algorithms in the namespace hpx::parallel).

All parallel algorithms are very similar in semantics to their sequential counterparts (as defined in the namespace std) with an additional formal template parameter named ExecutionPolicy. The execution policy is generally passed as the first argument to any of the parallel algorithms and describes the manner in which the execution of these algorithms may be parallelized and the manner in which they apply user-provided function objects.

The applications of function objects in parallel algorithms invoked with an execution policy object of type hpx::execution::sequenced_policy or hpx::execution::sequenced_task_policy execute in sequential order. For hpx::execution::sequenced_policy the execution happens in the calling thread.

The applications of function objects in parallel algorithms invoked with an execution policy object of type hpx::execution::parallel_policy or hpx::execution::parallel_task_policy are permitted to execute in an unordered fashion in unspecified threads, and are indeterminately sequenced within each thread.

Important

It is the caller’s responsibility to ensure correctness, such as making sure that the invocation does not introduce data races or deadlocks.

The example below demonstrates how to run hpx::for_each over a vector of integers, first sequentially and then in parallel.

#include <hpx/algorithm.hpp>
#include <hpx/execution.hpp>
#include <hpx/init.hpp>

#include <iostream>
#include <vector>

int hpx_main()
{
    std::vector<int> v{1, 2, 3, 4, 5};

    auto print = [](const int& n) { std::cout << n << ' '; };

    std::cout << "Print sequential: ";
    hpx::for_each(v.begin(), v.end(), print);
    std::cout << '\n';

    std::cout << "Print parallel: ";
    hpx::for_each(hpx::execution::par, v.begin(), v.end(), print);
    std::cout << '\n';

    return hpx::local::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::local::init(hpx_main, argc, argv);
}

The above code uses hpx::for_each to print the elements of the vector v{1, 2, 3, 4, 5}. At first, hpx::for_each() is called without an execution policy, which means that it applies the lambda function print to each element in the vector sequentially. Hence, the elements are printed in order.

Next, hpx::for_each() is called with the hpx::execution::par execution policy, which applies the lambda function print to each element in the vector in parallel. Therefore, the output order of the elements in the vector is not deterministic and may vary from run to run.

Parallel exceptions#

During the execution of a standard parallel algorithm, if temporary memory resources are required by any of the algorithms and no memory is available, the algorithm throws a std::bad_alloc exception.

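During the execution of any of the parallel algorithms, if the application of a function object terminates with an uncaught exception, the behavior of the program is determined by the type of execution policy used to invoke the algorithm:

  • If the execution policy object is of type hpx::execution::parallel_unsequenced_policy, hpx::terminate is called.

  • If the execution policy object is of type hpx::execution::sequenced_policy, hpx::execution::sequenced_task_policy, hpx::execution::parallel_policy, or hpx::execution::parallel_task_policy, the execution of the algorithm terminates with an hpx::exception_list exception. All uncaught exceptions thrown during the application of user-provided function objects are contained in the hpx::exception_list.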

For example, the number of invocations of the user-provided function object in for_each is unspecified. When hpx::for_each is executed sequentially, only one exception will be contained in the hpx::exception_list object.

These guarantees imply that, unless the algorithm has failed to allocate memory and terminated with std::bad_alloc, all exceptions thrown during the execution of the algorithm are communicated to the caller. It is unspecified whether an algorithm implementation will “forge ahead” after encountering and capturing a user exception.

The algorithm may terminate with the std::bad_alloc exception even if one or more user-provided function objects have terminated with an exception. For example, this can happen when an algorithm fails to allocate memory while creating or adding elements to the hpx::exception_list object.
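
The idea of gathering user exceptions into a list, and the choice between stopping and "forging ahead", can be sketched in plain C++. The block below is only an illustration under stated assumptions: for_each_collect, count_failures, and exception_collection_example are hypothetical names, the loop is sequential, and a std::vector of std::exception_ptr stands in for hpx::exception_list. It does not show how HPX implements this.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <vector>

// Hypothetical helper: applies f to each element and captures every
// exception that escapes f instead of letting it propagate immediately.
// If forge_ahead is false, processing stops after the first failure.
template <typename Iter, typename F>
std::vector<std::exception_ptr> for_each_collect(
    Iter first, Iter last, F f, bool forge_ahead)
{
    std::vector<std::exception_ptr> errors;
    for (; first != last; ++first)
    {
        try
        {
            f(*first);
        }
        catch (...)
        {
            errors.push_back(std::current_exception());
            if (!forge_ahead)
                break;
        }
    }
    return errors;
}

// Counts how many negative inputs were rejected by the function object.
std::size_t count_failures(std::vector<int> const& v, bool forge_ahead)
{
    auto errors = for_each_collect(
        v.begin(), v.end(),
        [](int n) {
            if (n < 0)
                throw std::invalid_argument("negative value");
        },
        forge_ahead);
    return errors.size();
}

// Usage: the same input yields a different number of captured exceptions
// depending on whether processing continues after a failure.
void exception_collection_example()
{
    std::vector<int> const v{1, -2, 3, -4};
    assert(count_failures(v, true) == 2);     // kept going
    assert(count_failures(v, false) == 1);    // stopped at the first failure
}
```

Because it is unspecified whether an algorithm continues after a user exception, callers should not rely on a particular number of entries in the list.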

Parallel algorithms#

HPX provides implementations of the following parallel algorithms:

Table 12 Non-modifying parallel algorithms of header hpx/algorithm.hpp#

| Name | Description | C++ standard |
| --- | --- | --- |
| hpx::adjacent_find | Finds the first pair of adjacent elements that are equal (or that satisfy a given predicate). | adjacent_find |
| hpx::all_of | Checks if a predicate is true for all of the elements in a range. | all_any_none_of |
| hpx::any_of | Checks if a predicate is true for any of the elements in a range. | all_any_none_of |
| hpx::count | Returns the number of elements equal to a given value. | count |
| hpx::count_if | Returns the number of elements satisfying specific criteria. | count_if |
| hpx::equal | Determines if two sets of elements are the same. | equal |
| hpx::find | Finds the first element equal to a given value. | find |
| hpx::find_end | Finds the last sequence of elements in a certain range. | find_end |
| hpx::find_first_of | Searches for any one of a set of elements. | find_first_of |
| hpx::find_if | Finds the first element satisfying specific criteria. | find_if |
| hpx::find_if_not | Finds the first element not satisfying specific criteria. | find_if_not |
| hpx::for_each | Applies a function to a range of elements. | for_each |
| hpx::for_each_n | Applies a function to a number of elements. | for_each_n |
| hpx::lexicographical_compare | Checks if a range of values is lexicographically less than another range of values. | lexicographical_compare |
| hpx::mismatch | Finds the first position where two ranges differ. | mismatch |
| hpx::none_of | Checks if a predicate is true for none of the elements in a range. | all_any_none_of |
| hpx::search | Searches for a range of elements. | search |
| hpx::search_n | Searches for a number of consecutive copies of an element in a range. | search_n |
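
Since these algorithms mirror their namesakes in namespace std, their semantics can be tried out with the standard versions. The sketch below uses std::adjacent_find, std::mismatch, and std::search_n sequentially; the wrapper functions are hypothetical helpers introduced here, and the assumption is that the HPX versions return equivalent results when given the same ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Index of the first element that equals its successor, or -1 if none.
std::ptrdiff_t first_adjacent_pair(std::vector<int> const& v)
{
    auto it = std::adjacent_find(v.begin(), v.end());
    return it == v.end() ? -1 : std::distance(v.begin(), it);
}

// Index at which a and b first differ (a must not be longer than b).
// Returns a.size() if a is a prefix of b.
std::ptrdiff_t first_difference(
    std::vector<int> const& a, std::vector<int> const& b)
{
    assert(a.size() <= b.size());
    auto result = std::mismatch(a.begin(), a.end(), b.begin());
    return std::distance(a.begin(), result.first);
}

// Index where `count` consecutive copies of `value` start, or -1 if none.
std::ptrdiff_t first_run(std::vector<int> const& v, int count, int value)
{
    auto it = std::search_n(v.begin(), v.end(), count, value);
    return it == v.end() ? -1 : std::distance(v.begin(), it);
}
```

For the input {1, 2, 2, 3, 3, 3}, first_adjacent_pair returns 1 and first_run(v, 3, 3) returns 3.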


Table 13 Modifying parallel algorithms of header hpx/algorithm.hpp#

| Name | Description | C++ standard |
| --- | --- | --- |
| hpx::copy | Copies a range of elements to a new location. | copy |
| hpx::copy_n | Copies a number of elements to a new location. | copy_n |
| hpx::copy_if | Copies the elements from a range to a new location for which the given predicate is true. | copy |
| hpx::move | Moves a range of elements to a new location. | move |
| hpx::fill | Assigns a range of elements a certain value. | fill |
| hpx::fill_n | Assigns a value to a number of elements. | fill_n |
| hpx::generate | Saves the result of a function in a range. | generate |
| hpx::generate_n | Saves the result of N applications of a function. | generate_n |
| hpx::experimental::reduce_by_key | Performs an inclusive scan on consecutive elements with matching keys, with a reduction to output only the final sum for each key. The key sequence {1,1,1,2,3,3,3,3,1} and value sequence {2,3,4,5,6,7,8,9,10} would be reduced to keys={1,2,3,1}, values={9,5,30,10}. | |
| hpx::remove | Removes the elements from a range that are equal to the given value. | remove |
| hpx::remove_if | Removes the elements from a range for which the given predicate is true. | remove |
| hpx::remove_copy | Copies the elements from a range to a new location that are not equal to the given value. | remove_copy |
| hpx::remove_copy_if | Copies the elements from a range to a new location for which the given predicate is false. | remove_copy |
| hpx::replace | Replaces all values equal to a given value with another value. | replace |
| hpx::replace_if | Replaces all values satisfying specific criteria with another value. | replace |
| hpx::replace_copy | Copies a range, replacing elements equal to a given value with another value. | replace_copy |
| hpx::replace_copy_if | Copies a range, replacing elements satisfying specific criteria with another value. | replace_copy |
| hpx::reverse | Reverses the order of elements in a range. | reverse |
| hpx::reverse_copy | Creates a copy of a range that is reversed. | reverse_copy |
| hpx::rotate | Rotates the order of elements in a range. | rotate |
| hpx::rotate_copy | Copies and rotates a range of elements. | rotate_copy |
| hpx::shift_left | Shifts the elements in the range left by n positions. | shift_left |
| hpx::shift_right | Shifts the elements in the range right by n positions. | shift_right |
| hpx::swap_ranges | Swaps two ranges of elements. | swap_ranges |
| hpx::transform | Applies a function to a range of elements. | transform |
| hpx::unique | Eliminates all but the first element from every consecutive group of equivalent elements from a range. | unique |
| hpx::unique_copy | Copies the elements from one range to another in such a way that there are no consecutive equal elements. | unique_copy |
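
The reduce_by_key entry above has no standard counterpart, so its worked example is easiest to check with a small sequential reference. The sketch below is a hypothetical helper (reduce_by_key_reference is not an HPX function) that assumes addition as the reduction and plain equality for comparing keys.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sequential reference: sums the values of each run of consecutive
// equal keys and emits one (key, sum) pair per run.
std::pair<std::vector<int>, std::vector<int>> reduce_by_key_reference(
    std::vector<int> const& keys, std::vector<int> const& values)
{
    assert(keys.size() == values.size());

    std::vector<int> out_keys;
    std::vector<int> out_values;
    for (std::size_t i = 0; i != keys.size(); ++i)
    {
        if (out_keys.empty() || out_keys.back() != keys[i])
        {
            // A new run of keys starts here.
            out_keys.push_back(keys[i]);
            out_values.push_back(values[i]);
        }
        else
        {
            // Same key as the previous element: extend the running sum.
            out_values.back() += values[i];
        }
    }
    return {std::move(out_keys), std::move(out_values)};
}
```

Note that the trailing key 1 forms its own run: only consecutive matching keys are combined, so equal keys separated by other keys produce separate outputs.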


Table 14 Set operations on sorted sequences of header hpx/algorithm.hpp#

| Name | Description | C++ standard |
| --- | --- | --- |
| hpx::merge | Merges two sorted ranges. | merge |
| hpx::inplace_merge | Merges two ordered ranges in-place. | inplace_merge |
| hpx::includes | Returns true if one set is a subset of another. | includes |
| hpx::set_difference | Computes the difference between two sets. | set_difference |
| hpx::set_intersection | Computes the intersection of two sets. | set_intersection |
| hpx::set_symmetric_difference | Computes the symmetric difference between two sets. | set_symmetric_difference |
| hpx::set_union | Computes the union of two sets. | set_union |
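
All of these operations expect sorted input ranges. The sketch below illustrates that precondition with the std counterparts, on the assumption that the HPX versions share it; the wrapper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Elements present in both sorted inputs.
std::vector<int> sorted_intersection(
    std::vector<int> const& a, std::vector<int> const& b)
{
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));

    std::vector<int> out;
    std::set_intersection(
        a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

// Elements present in exactly one of the two sorted inputs.
std::vector<int> sorted_symmetric_difference(
    std::vector<int> const& a, std::vector<int> const& b)
{
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));

    std::vector<int> out;
    std::set_symmetric_difference(
        a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

// True if every element of `sub` also occurs in `super` (both sorted).
bool is_sorted_subset(
    std::vector<int> const& sub, std::vector<int> const& super)
{
    return std::includes(
        super.begin(), super.end(), sub.begin(), sub.end());
}
```

With a = {1, 2, 4, 5} and b = {2, 3, 5, 6}, the intersection is {2, 5} and the symmetric difference is {1, 3, 4, 6}.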


Table 15 Heap operations of header hpx/algorithm.hpp#

| Name | Description | C++ standard |
| --- | --- | --- |
| hpx::is_heap | Returns true if the range is a max heap. | is_heap |
| hpx::is_heap_until | Returns the first element that breaks a max heap. | is_heap_until |
| hpx::make_heap | Constructs a max heap in the range [first, last). | make_heap |


Table 16 Minimum/maximum operations of header hpx/algorithm.hpp#

| Name | Description | C++ standard |
| --- | --- | --- |
| hpx::max_element | Returns the largest element in a range. | max_element |
| hpx::min_element | Returns the smallest element in a range. | min_element |
| hpx::minmax_element | Returns the smallest and the largest element in a range. | minmax_element |


Table 17 Partitioning Operations of header hpx/algorithm.hpp#

| Name | Description | C++ standard |
| --- | --- | --- |
| hpx::nth_element | Partially sorts the given range making sure that it is partitioned by the given element. | nth_element |
| hpx::is_partitioned | Returns true if all elements for which the predicate is true precede the elements for which it is false. | is_partitioned |
| hpx::partition | Divides elements into two groups without preserving their relative order. | partition |
| hpx::partition_copy | Copies a range dividing the elements into two groups. | partition_copy |
| hpx::stable_partition | Divides elements into two groups while preserving their relative order. | stable_partition |
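
The difference between an order-preserving partition and a selection by position can be seen with the std counterparts. The sketch below is sequential and assumes the HPX versions behave equivalently; evens_first_stable and kth_smallest are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Moves even numbers to the front; within each group the original
// relative order is preserved.
std::vector<int> evens_first_stable(std::vector<int> v)
{
    std::stable_partition(
        v.begin(), v.end(), [](int n) { return n % 2 == 0; });
    return v;
}

// Returns the element that would sit at index k if v were fully sorted,
// without sorting the whole range.
int kth_smallest(std::vector<int> v, std::size_t k)
{
    assert(k < v.size());
    auto nth = v.begin() + static_cast<std::ptrdiff_t>(k);
    std::nth_element(v.begin(), nth, v.end());
    return *nth;
}
```

For {5, 2, 7, 4, 1, 6}, evens_first_stable yields {2, 4, 6, 5, 7, 1}. A plain partition would guarantee only that the even numbers come first, not their order.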


Table 18 Sorting Operations of header hpx/algorithm.hpp#

| Name | Description | C++ standard |
| --- | --- | --- |
| hpx::is_sorted | Returns true if the elements in a range are sorted. | is_sorted |
| hpx::is_sorted_until | Returns the first unsorted element. | is_sorted_until |
| hpx::sort | Sorts the elements in a range. | sort |
| hpx::stable_sort | Sorts the elements in a range while preserving the relative order of equal elements. | stable_sort |
| hpx::partial_sort | Sorts the first elements in a range. | partial_sort |
| hpx::partial_sort_copy | Sorts the first elements in a range, storing the result in another range. | partial_sort_copy |
| hpx::experimental::sort_by_key | Sorts one range of data using keys supplied in another range. | |
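
Sorting by key has no direct standard counterpart, so the following sequential sketch shows one way to read the description above: the keys are sorted and each value travels with its key. sort_by_key_reference is a hypothetical helper, and the use of a stable sort for equal keys is an assumption of this sketch rather than a documented property of hpx::experimental::sort_by_key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Reorders both ranges so that the keys are ascending and every value
// stays paired with the key it started with.
void sort_by_key_reference(
    std::vector<int>& keys, std::vector<std::string>& values)
{
    assert(keys.size() == values.size());

    // Sort a permutation of indices instead of the data itself.
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
        [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    // Apply the permutation to both ranges.
    std::vector<int> sorted_keys;
    std::vector<std::string> sorted_values;
    for (std::size_t i : order)
    {
        sorted_keys.push_back(keys[i]);
        sorted_values.push_back(values[i]);
    }
    keys = std::move(sorted_keys);
    values = std::move(sorted_values);
}
```

Sorting an index permutation keeps the pairing explicit and avoids having to swap elements of two containers in lockstep.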


Table 19 Numeric Parallel Algorithms of header hpx/numeric.hpp#

| Name | Description | C++ standard |
| --- | --- | --- |
| hpx::adjacent_difference | Calculates the difference between each element in an input range and the preceding element. | adjacent_difference |
| hpx::exclusive_scan | Does an exclusive parallel scan over a range of elements. | exclusive_scan |
| hpx::inclusive_scan | Does an inclusive parallel scan over a range of elements. | inclusive_scan |
| hpx::reduce | Sums up a range of elements. | reduce |
| hpx::transform_exclusive_scan | Does an exclusive parallel scan over a range of elements after applying a function. | transform_exclusive_scan |
| hpx::transform_inclusive_scan | Does an inclusive parallel scan over a range of elements after applying a function. | transform_inclusive_scan |
| hpx::transform_reduce | Sums up a range of elements after applying a function. Also, accumulates the inner products of two input ranges. | transform_reduce |
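
The difference between the inclusive and exclusive scans, and the inner-product form of transform_reduce, is easiest to see on a small input. The sketch below uses the sequential C++17 std counterparts and assumes the HPX versions compute the same results; the wrapper names are hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// out[i] = v[0] + ... + v[i] (the i-th element is included).
std::vector<int> running_totals_inclusive(std::vector<int> const& v)
{
    std::vector<int> out(v.size());
    std::inclusive_scan(v.begin(), v.end(), out.begin());
    return out;
}

// out[i] = init + v[0] + ... + v[i-1] (the i-th element is excluded).
std::vector<int> running_totals_exclusive(std::vector<int> const& v, int init)
{
    std::vector<int> out(v.size());
    std::exclusive_scan(v.begin(), v.end(), out.begin(), init);
    return out;
}

// Inner product: a[0]*b[0] + a[1]*b[1] + ...
int dot(std::vector<int> const& a, std::vector<int> const& b)
{
    assert(a.size() == b.size());
    return std::transform_reduce(a.begin(), a.end(), b.begin(), 0);
}
```

For {1, 2, 3, 4}, the inclusive scan yields {1, 3, 6, 10} and the exclusive scan with an initial value of 0 yields {0, 1, 3, 6}.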


Table 20 Dynamic Memory Management of header hpx/memory.hpp#

| Name | Description | C++ standard |
| --- | --- | --- |
| hpx::destroy | Destroys a range of objects. | destroy |
| hpx::destroy_n | Destroys a number of objects in a range. | destroy_n |
| hpx::uninitialized_copy | Copies a range of objects to an uninitialized area of memory. | uninitialized_copy |
| hpx::uninitialized_copy_n | Copies a number of objects to an uninitialized area of memory. | uninitialized_copy_n |
| hpx::uninitialized_default_construct | Constructs objects by default-initialization in an uninitialized area of memory, defined by a range. | uninitialized_default_construct |
| hpx::uninitialized_default_construct_n | Constructs objects by default-initialization in an uninitialized area of memory, defined by a start and a count. | uninitialized_default_construct_n |
| hpx::uninitialized_fill | Copies an object to an uninitialized area of memory, defined by a range. | uninitialized_fill |
| hpx::uninitialized_fill_n | Copies an object to an uninitialized area of memory, defined by a start and a count. | uninitialized_fill_n |
| hpx::uninitialized_move | Moves a range of objects to an uninitialized area of memory. | uninitialized_move |
| hpx::uninitialized_move_n | Moves a number of objects to an uninitialized area of memory. | uninitialized_move_n |
| hpx::uninitialized_value_construct | Constructs objects by value-initialization in an uninitialized area of memory, defined by a range. | uninitialized_value_construct |
| hpx::uninitialized_value_construct_n | Constructs objects by value-initialization in an uninitialized area of memory, defined by a start and a count. | uninitialized_value_construct_n |
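
These algorithms operate on raw storage in which no objects exist yet, so construction and destruction have to be paired explicitly. The sketch below walks through that lifecycle with the sequential C++17 std counterparts, assuming the HPX versions follow the same contract; total_length_via_raw_storage is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Copies the strings into raw storage, reads them back, and then ends
// their lifetimes explicitly before releasing the storage.
std::size_t total_length_via_raw_storage(std::vector<std::string> const& src)
{
    assert(!src.empty());

    // Raw, uninitialized storage: no std::string objects live here yet.
    std::allocator<std::string> alloc;
    std::string* raw = alloc.allocate(src.size());

    // Construct copies in place; plain std::copy would assign to
    // objects that do not exist.
    std::string* end = std::uninitialized_copy(src.begin(), src.end(), raw);

    std::size_t total = 0;
    for (std::string* p = raw; p != end; ++p)
        total += p->size();

    // Run the destructors, then return the memory.
    std::destroy(raw, end);
    alloc.deallocate(raw, src.size());
    return total;
}
```

Skipping the destroy step would leak whatever the strings own, because deallocate only releases the storage and does not run destructors.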


Table 21 Index-based for-loops of header hpx/algorithm.hpp#

| Name | Description |
| --- | --- |
| hpx::experimental::for_loop | Implements loop functionality over a range specified by integral or iterator bounds. |
| hpx::experimental::for_loop_strided | Implements loop functionality over a range specified by integral or iterator bounds, advancing by a given stride. |
| hpx::experimental::for_loop_n | Implements loop functionality over a range specified by an integral or iterator start and a number of iterations. |
| hpx::experimental::for_loop_n_strided | Implements loop functionality over a range specified by an integral or iterator start and a number of iterations, advancing by a given stride. |
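
As a rough picture of what the strided variants do, the following sequential sketch visits every stride-th index of an integral range. It assumes a positive stride and a half-open range [first, last); for_loop_strided_reference and visited_indices are hypothetical stand-ins and do not reproduce the HPX signatures, execution policies, or iterator support.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sequential stand-in: calls f(i) for
// i = first, first + stride, ... while i < last.
template <typename F>
void for_loop_strided_reference(int first, int last, int stride, F f)
{
    assert(stride > 0);
    for (int i = first; i < last; i += stride)
        f(i);
}

// Records which indices the loop body is invoked with.
std::vector<int> visited_indices(int first, int last, int stride)
{
    std::vector<int> out;
    for_loop_strided_reference(
        first, last, stride, [&](int i) { out.push_back(i); });
    return out;
}
```

With first = 0, last = 10, and stride = 3, the body runs for the indices 0, 3, 6, and 9.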

Executor parameters and executor parameter traits#

HPX introduces the notion of executor parameters and executor parameter traits. At this point, the only parameter that can be customized is the size of the chunks of work executed on a single HPX thread (such as the number of loop iterations combined to run as a single task).

An executor parameter object is responsible for exposing the calculation of the size of the chunks scheduled. It abstracts the (potentially platform-specific) algorithms of determining those chunk sizes.

The way executor parameters are implemented is aligned with the way executors are implemented. All functionalities of concrete executor parameter types are exposed and accessible through a corresponding customization point, e.g. get_chunk_size().

With executor_parameter_traits, clients access all types of executor parameters uniformly, e.g.:

std::size_t chunk_size =
    hpx::execution::get_chunk_size(my_parameter, my_executor,
        num_cores, num_tasks);

This call synchronously retrieves the size of a single chunk of loop iterations (or similar) to combine for execution on a single HPX thread, given the overall number of cores num_cores and the number of tasks to schedule num_tasks. The lambda function exposes a means of test-probing the execution of a single iteration for performance measurement purposes. The executor parameter type might dynamically determine the execution time of one or more tasks in order to calculate the chunk size; see hpx::execution::experimental::auto_chunk_size for an example of this executor parameter type.

Other functions in the interface exist to discover whether an executor parameter type should be invoked once (i.e., it returns a static chunk size; see hpx::execution::experimental::static_chunk_size) or whether it should be invoked for each scheduled chunk of work (i.e., it returns a variable chunk size; for an example, see hpx::execution::experimental::guided_chunk_size).

Although this interface appears to require executor parameter type authors to implement all different basic operations, none are required. In practice, all operations have sensible defaults. However, some executor parameter types will naturally specialize all operations for maximum efficiency.
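
To make the distinction between a chunk size computed once and one recomputed for every chunk more concrete, here is a small sketch in plain C++. The formulas are assumptions chosen for illustration (an even split, and an OpenMP-style shrinking share of the remaining work); static_chunk and shrinking_chunks are hypothetical functions and do not reproduce the calculations performed by the HPX executor parameter types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Computed once: every chunk has the same size, ceil(iterations / cores).
std::size_t static_chunk(std::size_t num_iterations, std::size_t num_cores)
{
    assert(num_cores > 0);
    return (num_iterations + num_cores - 1) / num_cores;
}

// Recomputed per chunk: each chunk takes a share of the iterations that
// are still left, but never fewer than min_chunk (except for the tail).
std::vector<std::size_t> shrinking_chunks(std::size_t num_iterations,
    std::size_t num_cores, std::size_t min_chunk)
{
    assert(num_cores > 0 && min_chunk > 0);

    std::vector<std::size_t> chunks;
    std::size_t remaining = num_iterations;
    while (remaining != 0)
    {
        std::size_t next = std::max(min_chunk, remaining / num_cores);
        next = std::min(next, remaining);    // do not overshoot the end
        chunks.push_back(next);
        remaining -= next;
    }
    return chunks;
}
```

For 100 iterations on 4 cores, static_chunk returns 25, while shrinking_chunks with a minimum of 10 produces the sequence 25, 18, 14, 10, 10, 10, 10, 3. Large chunks early keep scheduling overhead low, and small chunks late help balance the tail.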

HPX implements the following executor parameter types:

  • hpx::execution::experimental::auto_chunk_size: Loop iterations are divided into pieces and then assigned to threads. The number of loop iterations combined is determined based on measurements of how long the execution of 1% of the overall number of iterations takes. This executor parameter type makes sure that as many loop iterations are combined as necessary to run for the amount of time specified.

  • hpx::execution::experimental::static_chunk_size: Loop iterations are divided into pieces of a given size and then assigned to threads. If the size is not specified, the iterations are, if possible, evenly divided contiguously among the threads. This executor parameter type is equivalent to OpenMP’s STATIC scheduling directive.

  • hpx::execution::experimental::dynamic_chunk_size: Loop iterations are divided into pieces of a given size and then dynamically scheduled among the cores; when a core finishes one chunk, it is dynamically assigned another. If the size is not specified, the default chunk size is 1. This executor parameter type is equivalent to OpenMP’s DYNAMIC scheduling directive.

  • hpx::execution::experimental::guided_chunk_size: Iterations are dynamically assigned to cores in blocks as cores request them until no blocks remain to be assigned. This is similar to dynamic_chunk_size except that the block size decreases each time a number of loop iterations is given