Welcome to the HPX documentation!

If you’re new to HPX you can get started with the Quick start guide. Don’t forget to read the Terminology section to learn about the most important concepts in HPX. The Examples give you a feel for how it is to write real HPX applications and the Manual contains detailed information about everything from building HPX to debugging it. There are links to blog posts and videos about HPX in Additional material.

If you can’t find what you’re looking for in the documentation, please:

What is HPX?

HPX is a C++ Standard Library for Concurrency and Parallelism. It implements all of the corresponding facilities as defined by the C++ Standard. Additionally, in HPX we implement functionalities proposed as part of the ongoing C++ standardization process. We also extend the C++ Standard APIs to the distributed case. HPX is developed by the STE||AR group (see People).

The goal of HPX is to create a high quality, freely available, open source implementation of a new programming model for conventional systems, such as classic Linux based Beowulf clusters or multi-socket highly parallel SMP nodes. At the same time, we want to have a very modular and well designed runtime system architecture which would allow us to port our implementation onto new computer system architectures. We want to use real-world applications to drive the development of the runtime system, coining out required functionalities and converging onto a stable API which will provide a smooth migration path for developers.

The API exposed by HPX is not only modeled after the interfaces defined by the C++11/14/17/20 ISO standard. It also adheres to the programming guidelines used by the Boost collection of C++ libraries. We aim to improve the scalability of today’s applications and to expose new levels of parallelism which are necessary to take advantage of the exascale systems of the future.

What’s so special about HPX?

  • HPX exposes a uniform, standards-oriented API for ease of programming parallel and distributed applications.
  • It enables programmers to write fully asynchronous code using hundreds of millions of threads.
  • HPX provides unified syntax and semantics for local and remote operations.
  • HPX makes concurrency manageable with dataflow and future based synchronization.
  • It implements a rich set of runtime services supporting a broad range of use cases.
  • HPX exposes a uniform, flexible, and extendable performance counter framework which can enable runtime adaptivity
  • It is designed to solve problems conventionally considered to be scaling-impaired.
  • HPX has been designed and developed for systems of any scale, from hand-held devices to very large scale systems.
  • It is the first fully functional implementation of the ParalleX execution model.
  • HPX is published under a liberal open-source license and has an open, active, and thriving developer community.

Why HPX?

Current advances in high performance computing (HPC) continue to suffer from the issues plaguing parallel computation. These issues include, but are not limited to, ease of programming, inability to handle dynamically changing workloads, scalability, and efficient utilization of system resources. Emerging technological trends such as multi-core processors further highlight limitations of existing parallel computation models. To mitigate the aforementioned problems, it is necessary to rethink the approach to parallelization models. ParalleX contains mechanisms such as multi-threading, parcels, global name space support, percolation and local control objects (LCO). By design, ParalleX overcomes limitations of current models of parallelism by alleviating contention, latency, overhead and starvation. With ParalleX, it is further possible to increase performance by at least an order of magnitude on challenging parallel algorithms, e.g., dynamic directed graph algorithms and adaptive mesh refinement methods for astrophysics. An additional benefit of ParalleX is fine-grained control of power usage, enabling reductions in power consumption.

ParalleX—a new execution model for future architectures

ParalleX is a new parallel execution model that offers an alternative to the conventional computation models, such as message passing. ParalleX distinguishes itself by:

  • Split-phase transaction model
  • Message-driven
  • Distributed shared memory (not cache coherent)
  • Multi-threaded
  • Futures synchronization
  • Local Control Objects (LCOs)
  • Synchronization for anonymous producer-consumer scenarios
  • Percolation (pre-staging of task data)

The ParalleX model is intrinsically latency hiding, delivering an abundance of variable-grained parallelism within a hierarchical namespace environment. The goal of this innovative strategy is to enable future systems delivering very high efficiency, increased scalability and ease of programming. ParalleX can contribute to significant improvements in the design of all levels of computing systems and their usage from application algorithms and their programming languages to system architecture and hardware design together with their supporting compilers and operating system software.

What is HPX?

High Performance ParalleX (HPX) is the first runtime system implementation of the ParalleX execution model. The HPX runtime software package is a modular, feature-complete, and performance oriented representation of the ParalleX execution model targeted at conventional parallel computing architectures such as SMP nodes and commodity clusters. It is academically developed and freely available under an open source license. We provide HPX to the community for experimentation and application to achieve high efficiency and scalability for dynamic adaptive and irregular computational problems. HPX is a C++ library that supports a set of critical mechanisms for dynamic adaptive resource management and lightweight task scheduling within the context of a global address space. It is solidly based on many years of experience in writing highly parallel applications for HPC systems.

The two-decade success of the communicating sequential processes (CSP) execution model and its message passing interface (MPI) programming model has been seriously eroded by challenges of power, processor core complexity, multi-core sockets, and heterogeneous structures of GPUs. Both efficiency and scalability for some current (strong scaled) applications and future Exascale applications demand new techniques to expose new sources of algorithm parallelism and exploit unused resources through adaptive use of runtime information.

The ParalleX execution model replaces CSP to provide a new computing paradigm embodying the governing principles for organizing and conducting highly efficient scalable computations greatly exceeding the capabilities of today’s problems. HPX is the first practical, reliable, and performance-oriented runtime system incorporating the principal concepts of the ParalleX model publicly provided in open source release form.

HPX is designed by the STE||AR Group (Systems Technology, Emergent Parallelism, and Algorithm Research) at Louisiana State University (LSU)’s Center for Computation and Technology (CCT) to enable developers to exploit the full processing power of many-core systems with an unprecedented degree of parallelism. STE||AR is a research group focusing on system software solutions and scientific application development for hybrid and many-core hardware architectures.

For more information about the STE||AR Group, see People.

What makes our systems slow?

Estimates say that we currently run our computers at way below 100% efficiency. The theoretical peak performance (usually measured in FLOPS—floating point operations per second) is much higher than any practical peak performance reached by any application. This is particularly true for highly parallel hardware. The more hardware parallelism we provide to an application, the better the application must scale in order to efficiently use all the resources of the machine. Roughly speaking, we distinguish two forms of scalability: strong scaling (see Amdahl’s Law) and weak scaling (see Gustafson’s Law). Strong scaling is defined as how the solution time varies with the number of processors for a fixed total problem size. It gives an estimate of how much faster can we solve a particular problem by throwing more resources at it. Weak scaling is defined as how the solution time varies with the number of processors for a fixed problem size per processor. In other words, it defines how much more data can we process by using more hardware resources.

In order to utilize as much hardware parallelism as possible an application must exhibit excellent strong and weak scaling characteristics, which requires a high percentage of work executed in parallel, i.e. using multiple threads of execution. Optimally, if you execute an application on a hardware resource with N processors it either runs N times faster or it can handle N times more data. Both cases imply 100% of the work is executed on all available processors in parallel. However, this is just a theoretical limit. Unfortunately, there are more things which limit scalability, mostly inherent to the hardware architectures and the programming models we use. We break these limitations into four fundamental factors which make our systems SLOW:

  • Starvation occurs when there is insufficient concurrent work available to maintain high utilization of all resources.
  • Latencies are imposed by the time-distance delay intrinsic to accessing remote resources and services.
  • Overhead is work required for the management of parallel actions and resources on the critical execution path which is not necessary in a sequential variant.
  • Waiting for contention resolution is the delay due to the lack of availability of oversubscribed shared resources.

Each of those four factors manifests itself in multiple and different ways; each of the hardware architectures and programming models expose specific forms. However the interesting part is that all of them are limiting the scalability of applications no matter what part of the hardware jungle we look at. Hand-helds, PCs, supercomputers, or the cloud, all suffer from the reign of the 4 horsemen: Starvation, Latency, Overhead, and Contention. This realization is very important as it allows us to derive the criteria for solutions to the scalability problem from first principles, it allows us to focus our analysis on very concrete patterns and measurable metrics. Moreover, any derived results will be applicable to a wide variety of targets.

Technology demands new response

Today’s computer systems are designed based on the initial ideas of John von Neumann, as published back in 1945, and later extended by the Harvard architecture. These ideas form the foundation, the execution model of computer systems we use currently. But apparently a new response is required in the light of the demands created by today’s technology.

So, what are the overarching objectives for designing systems allowing for applications to scale as they should? In our opinion, the main objectives are:

  • Performance: as mentioned, scalability and efficiency are the main criteria people are interested in
  • Fault tolerance: the low expected mean time between failures (MTBF) of future systems requires embracing faults, not trying to avoid them
  • Power: minimizing energy consumption is a must as it is one of the major cost factors today, even more so in the future
  • Generality: any system should be usable for a broad set of use cases
  • Programmability: for me as a programmer this is a very important objective, ensuring long term platform stability and portability

What needs to be done to meet those objectives, to make applications scale better on tomorrow’s architectures? Well, the answer is almost obvious: we need to devise a new execution model—a set of governing principles for the holistic design of future systems—targeted at minimizing the effect of the outlined SLOW factors. Everything we create for future systems, every design decision we make, every criteria we apply, has to be validated against this single, uniform metric. This includes changes in the hardware architecture we prevalently use today, and it certainly involves new ways of writing software, starting from the operating system, runtime system, compilers, and at the application level. However the key point is that all those layers have to be co-designed, they are interdependent and cannot be seen as separate facets. The systems we have today have been evolving for over 50 years now. All layers function in a certain way relying on the other layers to do so as well. However, we do not have the time to wait for a coherent system to evolve for another 50 years. The new paradigms are needed now—therefore, co-design is the key.

Governing principles applied while developing HPX

As it turn out, we do not have to start from scratch. Not everything has to be invented and designed anew. Many of the ideas needed to combat the 4 horsemen have already been had, often more than 30 years ago. All it takes is to gather them into a coherent approach. We’ll highlight some of the derived principles we think to be crucial for defeating SLOW. Some of those are focused on high-performance computing, others are more general.

Focus on latency hiding instead of latency avoidance

It is impossible to design a system exposing zero latencies. In an effort to come as close as possible to this goal many optimizations are mainly targeted towards minimizing latencies. Examples for this can be seen everywhere, for instance low latency network technologies like InfiniBand, caching memory hierarchies in all modern processors, the constant optimization of existing MPI implementations to reduce related latencies, or the data transfer latencies intrinsic to the way we use GPGPUs today. It is important to note, that existing latencies are often tightly related to some resource having to wait for the operation to be completed. At the same time it would be perfectly fine to do some other, unrelated work in the meantime, allowing the system to hide the latencies by filling the idle-time with useful work. Modern systems already employ similar techniques (pipelined instruction execution in the processor cores, asynchronous input/output operations, and many more). What we propose is to go beyond anything we know today and to make latency hiding an intrinsic concept of the operation of the whole system stack.

Embrace fine-grained parallelism instead of heavyweight Threads

If we plan to hide latencies even for very short operations, such as fetching the contents of a memory cell from main memory (if it is not already cached), we need to have very lightweight threads with extremely short context switching times, optimally executable within one cycle. Granted, for mainstream architectures this is not possible today (even if we already have special machines supporting this mode of operation, such as the Cray XMT). For conventional systems however, the smaller the overhead of a context switch and the finer the granularity of the threading system, the better will be the overall system utilization and its efficiency. For today’s architectures we already see a flurry of libraries providing exactly this type of functionality: non-pre-emptive, task-queue based parallelization solutions, such as Intel Threading Building Blocks (TBB), Microsoft Parallel Patterns Library (PPL), Cilk++, and many others. The possibility to suspend a current task if some preconditions for its execution are not met (such as waiting for I/O or the result of a different task), seamlessly switching to any other task which can continue, and to reschedule the initial task after the required result has been calculated, which makes the implementation of latency hiding almost trivial.

Rediscover constraint-based synchronization to replace global Barriers

The code we write today is riddled with implicit (and explicit) global barriers. By global barrier we mean the synchronization of the control flow between several (very often all) threads (when using OpenMP) or processes (MPI). For instance, an implicit global barrier is inserted after each loop parallelized using OpenMP as the system synchronizes the threads used to execute the different iterations in parallel. In MPI each of the communication steps imposes an explicit barrier onto the execution flow as (often all) nodes have to be synchronized. Each of those barriers acts as an eye of the needle the overall execution is forced to be squeezed through. Even minimal fluctuations in the execution times of the parallel threads (jobs) causes them to wait. Additionally it is often only one of the threads executing doing the actual reduce operation, which further impedes parallelism. A closer analysis of a couple of key algorithms used in science applications reveals that these global barriers are not always necessary. In many cases it is sufficient to synchronize a small subset of the threads. Any operation should proceed whenever the preconditions for its execution are met, and only those. Usually there is no need to wait for iterations of a loop to finish before you could continue calculating other things, all you need is to have those iterations done which were producing the required results for a particular next operation. Good bye global barriers, hello constraint based synchronization! People have been trying to build this type of computing (and even computers) already back in the 1970’s. The theory behind what they did is based on ideas around static and dynamic dataflow. There are certain attempts today to get back to those ideas and to incorporate them with modern architectures. For instance, a lot of work is being done in the area of constructing dataflow oriented execution trees. Our results show that employing dataflow techniques in combination with the other ideas, as outlined herein, considerably improves scalability for many problems.

Adaptive Locality Control instead of Static Data Distribution

While this principle seems to be a given for single desktop or laptop computers (the operating system is your friend), it is everything but ubiquitous on modern supercomputers, which are usually built from a large number of separate nodes (i.e. Beowulf clusters), tightly interconnected by a high bandwidth, low latency network. Today’s prevalent programming model for those is MPI which does not directly help with proper data distribution, leaving it to the programmer to decompose the data to all of the nodes the application is running on. There are a couple of specialized languages and programming environments based on PGAS (Partitioned Global Address Space) designed to overcome this limitation, such as Chapel, X10, UPC, or Fortress. However all systems based on PGAS rely on static data distribution. This works fine as long as such a static data distribution does not result in homogeneous workload distributions or other resource utilization imbalances. In a distributed system these imbalances can be mitigated by migrating part of the application data to different localities (nodes). The only framework supporting (limited) migration today is Charm++. The first attempts towards solving related problem go back decades as well, a good example is the Linda coordination language. Nevertheless, none of the other mentioned systems support data migration today, which forces the users to either rely on static data distribution and live with the related performance hits or to implement everything themselves, which is very tedious and difficult. We believe that the only viable way to flexibly support dynamic and adaptive locality control is to provide a global, uniform address space to the applications, even on distributed systems.

Prefer moving work to the data over moving data to the work

For best performance it seems obvious to minimize the amount of bytes transferred from one part of the system to another. This is true on all levels. At the lowest level we try to take advantage of processor memory caches, thus minimizing memory latencies. Similarly, we try to amortize the data transfer time to and from GPGPUs as much as possible. At high levels we try to minimize data transfer between different nodes of a cluster or between different virtual machines on the cloud. Our experience (well, it’s almost common wisdom) show that the amount of bytes necessary to encode a certain operation is very often much smaller than the amount of bytes encoding the data the operation is performed upon. Nevertheless we still often transfer the data to a particular place where we execute the operation just to bring the data back to where it came from afterwards. As an example let me look at the way we usually write our applications for clusters using MPI. This programming model is all about data transfer between nodes. MPI is the prevalent programming model for clusters, it is fairly straightforward to understand and to use. Therefore, we often write the applications in a way accommodating this model, centered around data transfer. These applications usually work well for smaller problem sizes and for regular data structures. The larger the amount of data we have to churn and the more irregular the problem domain becomes, the worse are the overall machine utilization and the (strong) scaling characteristics. While it is not impossible to implement more dynamic, data driven, and asynchronous applications using MPI, it is overly difficult to so. At the same time, if we look at applications preferring to execute the code close the locality where the data was placed, i.e. utilizing active messages (for instance based on Charm++), we see better asynchrony, simpler application codes, and improved scaling.

Favor message driven computation over message passing

Today’s prevalently used programming model on parallel (multi-node) systems is MPI. It is based on message passing (as the name implies), which means that the receiver has to be aware of a message about to come in. Both codes, the sender and the receiver, have to synchronize in order to perform the communication step. Even the newer, asynchronous interfaces require explicitly coding the algorithms around the required communication scheme. As a result, any more than trivial MPI application spends a considerable amount of time waiting for incoming messages, thus causing starvation and latencies to impede full resource utilization. The more complex and more dynamic the data structures and algorithms become, the larger are the adverse effects. The community has discovered message-driven and (data-driven) methods of implementing algorithms a long time ago, and systems such as Charm++ already have integrated active messages demonstrating the validity of the concept. Message driven computation allows sending messages without requiring the receiver to actively wait for them. Any incoming message is handled asynchronously and triggers the encoded action by passing along arguments and—possibly—continuations. HPX combines this scheme with work queue-based scheduling as described above, which allows the system to overlap almost completely any communication with useful work, thereby minimizing latencies.

Quick start

This section is intended to get you to the point of running a basic HPX program as quickly as possible. To that end we skip many details but instead give you hints and links to more details along the way.

We assume that you are on a Unix system with access to reasonably recent packages. You should have cmake and make available for the build system (pkg-config is also supported, see Using HPX with pkg-config).

Getting HPX

Download a tarball of the latest release from HPX Downloads and unpack it or clone the repository directly using git:

git clone https://github.com/STEllAR-GROUP/hpx.git

It is also recommended that you check out the latest stable tag:

git checkout 1.3.0

HPX dependencies

The minimum dependencies needed to use HPX are Boost and Portable Hardware Locality (HWLOC). If these are not available through your system package manager, see Installing Boost and Installing Hwloc for instructions on how to build them yourself. In addition to Boost and Portable Hardware Locality (HWLOC), it is recommended that you don’t use the system allocator, but instead use either tcmalloc from google-perftools (default) or jemalloc for better performance. If you would like to try HPX without a custom allocator at this point you can configure HPX to use the system allocator in the next step.

A full list of required and optional dependencies, including recommended versions is available at Prerequisites.

Building HPX

Once you have the source code and the dependencies, set up a separate build directory and configure the project. Assuming all your dependencies are in paths known to CMake, the following gets you started:

# In the HPX source directory
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/install/path ..
make install

This will build the core HPX libraries and examples, and install them to your chosen location. If you want to install HPX to system folders simply leave out the CMAKE_INSTALL_PREFIX option. This may take a while. To speed up the process launch more jobs by passing the -jN option to make.

Tip

Do not set only -j (i.e. -j without an explicit number of jobs) unless you have a lot of memory available on your machine.

Tip

If you want to change CMake variables for your build it is usually a good idea to start with a clean build directory to avoid configuration problems. It is especially important that you use a clean build directory when changing between Release and Debug modes.

If your dependencies are in custom locations you may need to tell CMake where to find them by passing one or more of the following options to CMake:

-DBOOST_ROOT=/path/to/boost
-DHWLOC_ROOT=/path/to/hwloc
-DTCMALLOC_ROOT=/path/to/tcmalloc
-DJEMALLOC_ROOT=/path/to/jemalloc

If you want to try HPX without using a custom allocator pass -DHPX_WITH_MALLOC=system to CMake.

Important

If you are building HPX for a system with more than 64 processing units you must change the CMake variables HPX_WITH_MORE_THAN_64_THREADS (to On) and HPX_WITH_MAX_CPU_COUNT (to a value at least as big as the number of (virtual) cores on your system).

To build the tests run make tests. To run the tests run either make test or use ctest for more control over which tests to run. You can run single tests for example with ctest --output-on-failure -R tests.unit.parallel.algorithms.for_loop or a whole group of tests with ctest --output-on-failure -R tests.unit.

If you did not run make install earlier do so now or build the hello_world_1 example by running:

make hello_world_1

HPX executables end up in the bin directory in your build directory. You can now run hello_world_1 and should see the following output:

./bin/hello_world_1
Hello World!

You’ve just run an example which prints Hello World! from the HPX runtime. The source for the example is in examples/quickstart/hello_world_1.cpp. The hello_world_distributed example (also available in the examples/quickstart directory) is a distributed hello world program which is described in Remote execution with actions: Hello world. It provides a gentle introduction to the distributed aspects of HPX.

Tip

Most build targets in HPX have two names: a simple name and a hierarchical name corresponding to what type of example or test the target is. If you are developing HPX it is often helpful to run make help to get a list of available targets. For example, make help | grep hello_world outputs the following:

... examples.quickstart.hello_world_2
... hello_world_2
... examples.quickstart.hello_world_1
... hello_world_1
... examples.quickstart.hello_world_distributed
... hello_world_distributed

It is also possible to build e.g. all quickstart examples using make examples.quickstart.

Hello, World!

The following CMakeLists.txt is a minimal example of what you need in order to build an executable using CMake and HPX:

cmake_minimum_required(VERSION 3.3.2)
project(my_hpx_project CXX)
find_package(HPX REQUIRED)
add_hpx_executable(my_hpx_program
    SOURCES main.cpp
    COMPONENT_DEPENDENCIES iostreams)

Note

You will most likely have more than one main.cpp file in your project. See the section on Using HPX with CMake-based projects for more details on how to use add_hpx_executable.

Note

COMPONENT_DEPENDENCIES iostreams is optional for a minimal project but lets us use the HPX equivalent of std::cout, i.e. the HPX The HPX I/O-streams component functionality in our application.

Create a new project directory and a CMakeLists.txt with the contents above. Also create a main.cpp with the contents below.

// Including 'hpx/hpx_main.hpp' instead of the usual 'hpx/hpx_init.hpp' enables
// to use the plain C-main below as the direct main HPX entry point.
#include <hpx/hpx_main.hpp>
#include <hpx/include/iostreams.hpp>

int main()
{
    // Say hello to the world!
    hpx::cout << "Hello World!\n" << hpx::flush;
    return 0;
}

Then, in your project directory run the following:

mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH=/path/to/hpx/installation ..
make all
./my_hpx_program

The program looks almost like a regular C++ hello world with the exception of the two includes and hpx::cout. When you include hpx_main.hpp some things will be done behind the scenes to make sure that main actually gets launched on the HPX runtime. So while it looks almost the same you can now use futures, async, parallel algorithms and more which make use of the HPX runtime with lightweight threads. hpx::cout is a replacement for std::cout to make sure printing never blocks a lightweight thread. You can read more about hpx::cout in The HPX I/O-streams component. If you rebuild and run your program now you should see the familiar Hello World!:

./my_hpx_program
Hello World!

Note

You do not have to let HPX take over your main function like in the example. You can instead keep your normal main function, and define a separate hpx_main function which acts as the entry point to the HPX runtime. In that case you start the HPX runtime explicitly by calling hpx::init:

//  Copyright (c) 2007-2012 Hartmut Kaiser
//
//  Distributed under the Boost Software License, Version 1.0. (See accompanying
//  file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

///////////////////////////////////////////////////////////////////////////////
// The purpose of this example is to initialize the HPX runtime explicitly and
// execute a HPX-thread printing "Hello World!" once. That's all.

//[hello_world_2_getting_started
#include <hpx/hpx_init.hpp>
#include <hpx/include/iostreams.hpp>

int hpx_main(int, char**)
{
    // Say hello to the world!
    hpx::cout << "Hello World!\n" << hpx::flush;
    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::init(argc, argv);
}
//]

You can also use hpx::start and hpx::stop for a non-blocking alternative, or use hpx::resume and hpx::suspend if you need to combine HPX with other runtimes.

See Starting the HPX runtime for more details on how to initialize and run the HPX runtime.

Caution

When including hpx_main.hpp the user-defined main gets renamed and the real main function is defined by HPX. This means that the user-defined main must include a return statement, unlike the real main. If you do not include the return statement you may end up with confusing compile time errors mentioning user_main or even runtime errors.

Writing task-based applications

So far we haven’t done anything that can’t be done using the C++ standard library. In this section we will give a short overview of what you can do with HPX on a single node. The essence is to avoid global synchronization and break up your application into small, composable tasks whose dependencies control the flow of your application. Remember, however, that HPX allows you to write distributed applications similarly to how you would write applications for a single node (see Why HPX? and Writing distributed HPX applications).

If you are already familiar with async and futures from the C++ standard library, the same functionality is available in HPX.

The following terminology is essential when talking about task-based C++ programs:

  • lightweight thread: Essential for good performance with task-based programs. Lightweight refers to smaller stacks and faster context switching compared to OS-threads. Smaller overheads allow the program to be broken up into smaller tasks, which in turns helps the runtime fully utilize all processing units.
  • async: The most basic way of launching tasks asynchronously. Returns a future<T>.
  • future<T>: Represents a value of type T that will be ready in the future. The value can be retrieved with get (blocking) and one can check if the value is ready with is_ready (non-blocking).
  • shared_future<T>: Same as future<T> but can be copied (similar to std::unique_ptr vs std::shared_ptr).
  • continuation: A function that is to be run after a previous task has run (represented by a future). then is a method of future<T> that takes a function to run next. Used to build up dataflow DAGs (directed acyclic graphs). shared_futures help you split up nodes in the DAG and functions like when_all help you join nodes in the DAG.

The following example is a collection of the most commonly used functionality in HPX:

#include <hpx/hpx_main.hpp>
#include <hpx/include/iostreams.hpp>
#include <hpx/include/lcos.hpp>
#include <hpx/include/parallel_generate.hpp>
#include <hpx/include/parallel_sort.hpp>

#include <random>
#include <vector>

void final_task(hpx::future<hpx::util::tuple<hpx::future<double>, hpx::future<void>>>)
{
    hpx::cout << "in final_task" << hpx::endl;
}

// Avoid ABI incompatibilities between C++11/C++17 as std::rand has exception
// specification in libstdc++.
int rand_wrapper()
{
    return std::rand();
}

int main(int, char**)
{
    // A function can be launched asynchronously. The program will not block
    // here until the result is available.
    hpx::future<int> f = hpx::async([]() { return 42; });
    hpx::cout << "Just launched a task!" << hpx::endl;

    // Use get to retrieve the value from the future. This will block this task
    // until the future is ready, but the HPX runtime will schedule other tasks
    // if there are tasks available.
    hpx::cout << "f contains " << f.get() << hpx::endl;

    // Let's launch another task.
    hpx::future<double> g = hpx::async([]() { return 3.14; });

    // Tasks can be chained using the then method. The continuation takes the
    // future as an argument.
    hpx::future<double> result = g.then([](hpx::future<double>&& gg)
        {
            // This function will be called once g is ready. gg is g moved
            // into the continuation.
            return gg.get() * 42.0 * 42.0;
        });

    // You can check if a future is ready with the is_ready method.
    hpx::cout << "Result is ready? " << result.is_ready() << hpx::endl;

    // You can launch other work in the meantime. Let's sort a vector.
    std::vector<int> v(1000000);

    // We fill the vector synchronously and sequentially.
    hpx::parallel::generate(hpx::parallel::execution::seq,
                  std::begin(v), std::end(v), &rand_wrapper);

    // We can launch the sort in parallel and asynchronously.
    hpx::future<void> done_sorting =
        hpx::parallel::sort(
            hpx::parallel::execution::par( // In parallel.
                hpx::parallel::execution::task), // Asynchronously.
            std::begin(v),
            std::end(v));

    // We launch the final task when the vector has been sorted and result is
    // ready using when_all.
    auto all = hpx::when_all(result, done_sorting).then(&final_task);

    // We can wait for all to be ready.
    all.wait();

    // all must be ready at this point because we waited for it to be ready.
    hpx::cout <<
        (all.is_ready() ? "all is ready!" : "all is not ready...") << hpx::endl;

    return hpx::finalize();
}

Try copying the contents to your main.cpp file and look at the output. It can be a good idea to go through the program step by step with a debugger. You can also try changing the types or adding new arguments to functions to make sure you can get the types to match. The type of the then method can be especially tricky to get right (the continuation needs to take the future as an argument).

Note

HPX programs accept command line arguments. The most important one is --hpx:threads=N to set the number of OS-threads used by HPX. HPX uses one thread per core by default. Play around with the example above and see what difference the number of threads makes on the sort function. See Launching and configuring HPX applications for more details on how and what options you can pass to HPX.

Tip

The example above used the construction hpx::when_all(...).then(...). For convenience and performance it is a good idea to replace uses of hpx::when_all(...).then(...) with dataflow. See Dataflow: Interest calculator for more details on dataflow.

Tip

If possible, prefer to use the provided parallel algorithms instead of writing your own implementation. This can save you time and the resulting program is often faster.

Next steps

If you haven’t done so already, reading the Terminology section will help you get familiar with the terms used in HPX.

The Examples section contains small, self-contained walkthroughs of example HPX programs. The Local to remote: 1D stencil example is a thorough, realistic example starting from a single node implementation and going stepwise to a distributed implementation.

The Manual contains detailed information on writing, building and running HPX applications.

Terminology

This section gives definitions for some of the terms used throughout the HPX documentation and source code.

Locality
A locality in HPX describes a synchronous domain of execution, or the domain of bounded upper response time. This normally is just a single node in a cluster or a NUMA domain in a SMP machine.
Active Global Address Space
AGAS
HPX incorporates a global address space. Any executing thread can access any object within the domain of the parallel application with the caveat that it must have appropriate access privileges. The model does not assume that global addresses are cache coherent; all loads and stores will deal directly with the site of the target object. All global addresses within a Synchronous Domain are assumed to be cache coherent for those processor cores that incorporate transparent caches. The Active Global Address Space used by HPX differs from research PGAS models. Partitioned Global Address Space is passive in their means of address translation. Copy semantics, distributed compound operations, and affinity relationships are some of the global functionality supported by AGAS.
Process
The concept of the “process” in HPX is extended beyond that of either sequential execution or communicating sequential processes. While the notion of process suggests action (as do “function” or “subroutine”) it has a further responsibility of context, that is, the logical container of program state. It is this aspect of operation that process is employed in HPX. Furthermore, referring to “parallel processes” in HPX designates the presence of parallelism within the context of a given process, as well as the coarse grained parallelism achieved through concurrency of multiple processes of an executing user job. HPX processes provide a hierarchical name space within the framework of the active global address space and support multiple means of internal state access from external sources.
Parcel
The Parcel is a component in HPX that communicates data, invokes an action at a distance, and distributes flow-control through the migration of continuations. Parcels bridge the gap of asynchrony between synchronous domains while maintaining symmetry of semantics between local and global execution. Parcels enable message-driven computation and may be seen as a form of “active messages”. Other important forms of message-driven computation predating active messages include dataflow tokens, the J-machine’s support for remote method instantiation, and at the coarse grained variations of Unix remote procedure calls, among others. This enables work to be moved to the data as well as performing the more common action of bringing data to the work. A parcel can cause actions to occur remotely and asynchronously, among which are the creation of threads at different system nodes or synchronous domains.
Local Control Object
Lightweight Control Object
LCO

A local control object (sometimes called a lightweight control object) is a general term for the synchronization mechanisms used in HPX. Any object implementing a certain concept can be seen as an LCO. This concepts encapsulates the ability to be triggered by one or more events which when taking the object into a predefined state will cause a thread to be executed. This could either create a new thread or resume an existing thread.

The LCO is a family of synchronization functions potentially representing many classes of synchronization constructs, each with many possible variations and multiple instances. The LCO is sufficiently general that it can subsume the functionality of conventional synchronization primitives such as spinlocks, mutexes, semaphores, and global barriers. However due to the rich concept an LCO can represent powerful synchronization and control functionality not widely employed, such as dataflow and futures (among others), which open up enormous opportunities for rich diversity of distributed control and operation.

See Using LCOs for more details on how to use LCOs in HPX.

Action
An action is a function that can be invoked remotely. In HPX a plain function can be made into an action using a macro. See Applying actions for details on how to use actions in HPX.
Component
A component is a C++ object which can be accessed remotely. A component can also contain member functions which can be invoked remotely. These are referred to as component actions. See Writing components for details on how to use components in HPX.

Examples

The following sections analyze some examples to help you get familiar with the HPX style of programming. We start off with simple examples that utilize basic HPX elements and then begin to expose the reader to the more complex and powerful HPX concepts.

Asynchronous execution with hpx::async: Fibonacci

The Fibonacci sequence is a sequence of numbers starting with 0 and 1 where every subsequent number is the sum of the previous two numbers. In this example, we will use HPX to calculate the value of the n-th element of the Fibonacci sequence. In order to compute this problem in parallel, we will use a facility known as a future.

As shown in the Fig. 1 below, a future encapsulates a delayed computation. It acts as a proxy for a result initially not known, most of the time because the computation of the result has not completed yet. The future synchronizes the access of this value by optionally suspending any HPX-threads requesting the result until the value is available. When a future is created, it spawns a new HPX-thread (either remotely with a parcel or locally by placing it into the thread queue) which, when run, will execute the function associated with the future. The arguments of the function are bound when the future is created.

_images/future_schematics.png

Fig. 1 Schematic of a future execution.

Once the function has finished executing, a write operation is performed on the future. The write operation marks the future as completed, and optionally stores data returned by the function. When the result of the delayed computation is needed, a read operation is performed on the future. If the future’s function hasn’t completed when a read operation is performed on it, the reader HPX-thread is suspended until the future is ready. The future facility allows HPX to schedule work early in a program so that when the function value is needed it will already be calculated and available. We use this property in our Fibonacci example below to enable its parallel execution.

Setup

The source code for this example can be found here: fibonacci_local.cpp.

To compile this program, go to your HPX build directory (see HPX build system for information on configuring and building HPX) and enter:

make examples.quickstart.fibonacci_local

To run the program type:

./bin/fibonacci_local

This should print (time should be approximate):

fibonacci(10) == 55
elapsed time: 0.002430 [s]

This run used the default settings, which calculate the tenth element of the Fibonacci sequence. To declare which Fibonacci value you want to calculate, use the --n-value option. Additionally you can use the --hpx:threads option to declare how many OS-threads you wish to use when running the program. For instance, running:

./bin/fibonacci --n-value 20 --hpx:threads 4

Will yield:

fibonacci(20) == 6765
elapsed time: 0.062854 [s]
Walkthrough

Now that you have compiled and run the code, let’s look at how the code works. Since this code is written in C++, we will begin with the main() function. Here you can see that in HPX, main() is only used to initialize the runtime system. It is important to note that application-specific command line options are defined here. HPX uses Boost.Program Options for command line processing. You can see that our programs --n-value option is set by calling the add_options() method on an instance of boost::program_options::options_description. The default value of the variable is set to 10. This is why when we ran the program for the first time without using the --n-value option the program returned the 10th value of the Fibonacci sequence. The constructor argument of the description is the text that appears when a user uses the --hpx:help option to see what command line options are available. HPX_APPLICATION_STRING is a macro that expands to a string constant containing the name of the HPX application currently being compiled.

In HPX main() is used to initialize the runtime system and pass the command line arguments to the program. If you wish to add command line options to your program you would add them here using the instance of the Boost class options_description, and invoking the public member function .add_options() (see Boost Documentation for more details). hpx::init calls hpx_main() after setting up HPX, which is where the logic of our program is encoded.

int main(int argc, char* argv[])
{
    // Configure application-specific options
    boost::program_options::options_description
       desc_commandline("Usage: " HPX_APPLICATION_STRING " [options]");

    desc_commandline.add_options()
        ( "n-value",
          boost::program_options::value<std::uint64_t>()->default_value(10),
          "n value for the Fibonacci function")
        ;

    // Initialize and run HPX
    return hpx::init(desc_commandline, argc, argv);
}

The hpx::init function in main() starts the runtime system, and invokes hpx_main() as the first HPX-thread. Below we can see that the basic program is simple. The command line option --n-value is read in, a timer (hpx::util::high_resolution_timer) is set up to record the time it takes to do the computation, the fibonacci function is invoked synchronously, and the answer is printed out.

int hpx_main(boost::program_options::variables_map& vm)
{
    // extract command line argument, i.e. fib(N)
    std::uint64_t n = vm["n-value"].as<std::uint64_t>();

    {
        // Keep track of the time required to execute.
        hpx::util::high_resolution_timer t;

        std::uint64_t r = fibonacci(n);

        char const* fmt = "fibonacci({1}) == {2}\nelapsed time: {3} [s]\n";
        hpx::util::format_to(std::cout, fmt, n, r, t.elapsed());
    }

    return hpx::finalize(); // Handles HPX shutdown
}

The fibonacci function itself is synchronous as the work done inside is asynchronous. To understand what is happening we have to look inside the fibonacci function:

std::uint64_t fibonacci(std::uint64_t n)
{
    if (n < 2)
        return n;

    // Invoking the Fibonacci algorithm twice is inefficient.
    // However, we intentionally demonstrate it this way to create some
    // heavy workload.

    hpx::future<std::uint64_t> n1 = hpx::async(fibonacci, n - 1);
    hpx::future<std::uint64_t> n2 = hpx::async(fibonacci, n - 2);

    return n1.get() + n2.get();   // wait for the Futures to return their values
}

This block of code is looks similar to regular C++ code. First, if (n < 2), meaning n is 0 or 1, then we return 0 or 1 (recall the first element of the Fibonacci sequence is 0 and the second is 1). If n is larger than 1 we spawn two new tasks whose results are contained in n1 and n2. This is done using hpx::async which takes as arguments a function (function pointer, object or lambda) and the arguments to the function. Instead of returning a std::uint64_t like fibonacci does, hpx::async returns a future of a std::uint64_t, i.e. hpx::future<std::uint64_t>. Each of these futures represents an asynchronous, recursive call to fibonacci. After we’ve created the futures, we wait for both of them to finish computing, we add them together, and return that value as our result. We get the values from the futures using the get method. The recursive call tree will continue until n is equal to 0 or 1, at which point the value can be returned because it is implicitly known. When this termination condition is reached, the futures can then be added up, producing the n-th value of the Fibonacci sequence.

Note that calling get potentially blocks the calling HPX-thread, and lets other HPX-threads run in the meantime. There are, however, more efficient ways of doing this. examples/quickstart/fibonacci_futures.cpp contains many more variations of locally computing the Fibonacci numbers, where each method makes different tradeoffs in where asynchrony and parallelism is applied. To get started, however, the method above is sufficient and optimizations can be applied once you are more familiar with HPX. The example Dataflow: Interest calculator presents dataflow, which is a way to more efficiently chain together multiple tasks.

Asynchronous execution with hpx::async and actions: Fibonacci

This example extends the previous example by introducing actions: functions that can be run remotely. In this example, however, we will still only run the action locally. The mechanism to execute actions stays the same: hpx::async. Later examples will demonstrate running actions on remote localities (e.g. Remote execution with actions: Hello world).

Setup

The source code for this example can be found here: fibonacci.cpp.

To compile this program, go to your HPX build directory (see HPX build system for information on configuring and building HPX) and enter:

make examples.quickstart.fibonacci

To run the program type:

./bin/fibonacci

This should print (time should be approximate):

fibonacci(10) == 55
elapsed time: 0.00186288 [s]

This run used the default settings, which calculate the tenth element of the Fibonacci sequence. To declare which Fibonacci value you want to calculate, use the --n-value option. Additionally you can use the --hpx:threads option to declare how many OS-threads you wish to use when running the program. For instance, running:

./bin/fibonacci --n-value 20 --hpx:threads 4

Will yield:

fibonacci(20) == 6765
elapsed time: 0.233827 [s]
Walkthrough

The code needed to initialize the HPX runtime is the same as in the previous example:

int main(int argc, char* argv[])
{
    // Configure application-specific options
    boost::program_options::options_description
       desc_commandline("Usage: " HPX_APPLICATION_STRING " [options]");

    desc_commandline.add_options()
        ( "n-value",
          boost::program_options::value<std::uint64_t>()->default_value(10),
          "n value for the Fibonacci function")
        ;

    // Initialize and run HPX
    return hpx::init(desc_commandline, argc, argv);
}

The hpx::init function in main() starts the runtime system, and invokes hpx_main() as the first HPX-thread. The command line option --n-value is read in, a timer (hpx::util::high_resolution_timer) is set up to record the time it takes to do the computation, the fibonacci action is invoked synchronously, and the answer is printed out.

int hpx_main(boost::program_options::variables_map& vm)
{
    // extract command line argument, i.e. fib(N)
    std::uint64_t n = vm["n-value"].as<std::uint64_t>();

    {
        // Keep track of the time required to execute.
        hpx::util::high_resolution_timer t;

        // Wait for fib() to return the value
        fibonacci_action fib;
        std::uint64_t r = fib(hpx::find_here(), n);

        char const* fmt = "fibonacci({1}) == {2}\nelapsed time: {3} [s]\n";
        hpx::util::format_to(std::cout, fmt, n, r, t.elapsed());
    }

    return hpx::finalize(); // Handles HPX shutdown
}

Upon a closer look we see that we’ve created a std::uint64_t to store the result of invoking our fibonacci_action fib. This action will launch synchronously (as the work done inside of the action will be asynchronous itself) and return the result of the Fibonacci sequence. But wait, what is an action? And what is this fibonacci_action? For starters, an action is a wrapper for a function. By wrapping functions, HPX can send packets of work to different processing units. These vehicles allow users to calculate work now, later, or on certain nodes. The first argument to our action is the location where the action should be run. In this case, we just want to run the action on the machine that we are currently on, so we use hpx::find_here that we wish to calculate. To further understand this we turn to the code to find where fibonacci_action was defined:

// forward declaration of the Fibonacci function
std::uint64_t fibonacci(std::uint64_t n);

// This is to generate the required boilerplate we need for the remote
// invocation to work.
HPX_PLAIN_ACTION(fibonacci, fibonacci_action);

A plain action is the most basic form of action. Plain actions wrap simple global functions which are not associated with any particular object (we will discuss other types of actions in Components and actions: Accumulator). In this block of code the function fibonacci() is declared. After the declaration, the function is wrapped in an action in the declaration HPX_PLAIN_ACTION. This function takes two arguments: the name of the function that is to be wrapped and the name of the action that you are creating.

This picture should now start making sense. The function fibonacci() is wrapped in an action fibonacci_action, which was run synchronously but created asynchronous work, then returns a std::uint64_t representing the result of the function fibonacci(). Now, let’s look at the function fibonacci():

std::uint64_t fibonacci(std::uint64_t n)
{
    if (n < 2)
        return n;

    // We restrict ourselves to execute the Fibonacci function locally.
    hpx::naming::id_type const locality_id = hpx::find_here();

    // Invoking the Fibonacci algorithm twice is inefficient.
    // However, we intentionally demonstrate it this way to create some
    // heavy workload.

    fibonacci_action fib;
    hpx::future<std::uint64_t> n1 =
        hpx::async(fib, locality_id, n - 1);
    hpx::future<std::uint64_t> n2 =
        hpx::async(fib, locality_id, n - 2);

    return n1.get() + n2.get();   // wait for the Futures to return their values
}

This block of code is much more straightforward and should look familiar from the previous example. First, if (n < 2), meaning n is 0 or 1, then we return 0 or 1 (recall the first element of the Fibonacci sequence is 0 and the second is 1). If n is larger than 1 we spawn two tasks using hpx::async. Each of these futures represents an asynchronous, recursive call to fibonacci. As previously we wait for both futures to finish computing, get the results, add them together, and return that value as our result. The recursive call tree will continue until n is equal to 0 or 1, at which point the value can be returned because it is implicitly known. When this termination condition is reached, the futures can then be added up, producing the n-th value of the Fibonacci sequence.

Remote execution with actions: Hello world

This program will print out a hello world message on every OS-thread on every locality. The output will look something like this:

hello world from OS-thread 1 on locality 0
hello world from OS-thread 1 on locality 1
hello world from OS-thread 0 on locality 0
hello world from OS-thread 0 on locality 1
Setup

The source code for this example can be found here: hello_world_distributed.cpp.

To compile this program, go to your HPX build directory (see HPX build system for information on configuring and building HPX) and enter:

make examples.quickstart.hello_world_distributed

To run the program type:

./bin/hello_world_distributed

This should print:

hello world from OS-thread 0 on locality 0

To use more OS-threads use the command line option --hpx:threads and type the number of threads that you wish to use. For example, typing:

./bin/hello_world_distributed --hpx:threads 2

will yield:

hello world from OS-thread 1 on locality 0
hello world from OS-thread 0 on locality 0

Notice how the ordering of the two print statements will change with subsequent runs. To run this program on multiple localities please see the section How to use HPX applications with PBS.

Walkthrough

Now that you have compiled and run the code, let’s look at how the code works, beginning with main():

//` Here is the main entry point. By using the include 'hpx/hpx_main.hpp' HPX
//` will invoke the plain old C-main() as its first HPX thread.
int main()
{
    // Get a list of all available localities.
    std::vector<hpx::naming::id_type> localities =
        hpx::find_all_localities();

    // Reserve storage space for futures, one for each locality.
    std::vector<hpx::lcos::future<void> > futures;
    futures.reserve(localities.size());

    for (hpx::naming::id_type const& node : localities)
    {
        // Asynchronously start a new task. The task is encapsulated in a
        // future, which we can query to determine if the task has
        // completed.
        typedef hello_world_foreman_action action_type;
        futures.push_back(hpx::async<action_type>(node));
    }

    // The non-callback version of hpx::lcos::wait_all takes a single parameter,
    // a vector of futures to wait on. hpx::wait_all only returns when
    // all of the futures have finished.
    hpx::wait_all(futures);
    return 0;
}

In this excerpt of the code we again see the use of futures. This time the futures are stored in a vector so that they can easily be accessed. hpx::wait_all is a family of functions that wait on for an std::vector<> of futures to become ready. In this piece of code, we are using the synchronous version of hpx::wait_all, which takes one argument (the std::vector<> of futures to wait on). This function will not return until all the futures in the vector have been executed.

In Asynchronous execution with hpx::async and actions: Fibonacci we used hpx::find_here to specify the target of our actions. Here, we instead use hpx::find_all_localities, which returns an std::vector<> containing the identifiers of all the machines in the system, including the one that we are on.

As in Asynchronous execution with hpx::async and actions: Fibonacci our futures are set using hpx::async<>. The hello_world_foreman_action is declared here:

// Define the boilerplate code necessary for the function 'hello_world_foreman'
// to be invoked as an HPX action.
HPX_PLAIN_ACTION(hello_world_foreman, hello_world_foreman_action);

Another way of thinking about this wrapping technique is as follows: functions (the work to be done) are wrapped in actions, and actions can be executed locally or remotely (e.g. on another machine participating in the computation).

Now it is time to look at the hello_world_foreman() function which was wrapped in the action above:

void hello_world_foreman()
{
    // Get the number of worker OS-threads in use by this locality.
    std::size_t const os_threads = hpx::get_os_thread_count();

    // Find the global name of the current locality.
    hpx::naming::id_type const here = hpx::find_here();

    // Populate a set with the OS-thread numbers of all OS-threads on this
    // locality. When the hello world message has been printed on a particular
    // OS-thread, we will remove it from the set.
    std::set<std::size_t> attendance;
    for (std::size_t os_thread = 0; os_thread < os_threads; ++os_thread)
        attendance.insert(os_thread);

    // As long as there are still elements in the set, we must keep scheduling
    // HPX-threads. Because HPX features work-stealing task schedulers, we have
    // no way of enforcing which worker OS-thread will actually execute
    // each HPX-thread.
    while (!attendance.empty())
    {
        // Each iteration, we create a task for each element in the set of
        // OS-threads that have not said "Hello world". Each of these tasks
        // is encapsulated in a future.
        std::vector<hpx::lcos::future<std::size_t> > futures;
        futures.reserve(attendance.size());

        for (std::size_t worker : attendance)
        {
            // Asynchronously start a new task. The task is encapsulated in a
            // future, which we can query to determine if the task has
            // completed.
            typedef hello_world_worker_action action_type;
            futures.push_back(hpx::async<action_type>(here, worker));
        }

        // Wait for all of the futures to finish. The callback version of the
        // hpx::lcos::wait_each function takes two arguments: a vector of futures,
        // and a binary callback.  The callback takes two arguments; the first
        // is the index of the future in the vector, and the second is the
        // return value of the future. hpx::lcos::wait_each doesn't return until
        // all the futures in the vector have returned.
        hpx::lcos::local::spinlock mtx;
        hpx::lcos::wait_each(
            hpx::util::unwrapping([&](std::size_t t) {
                if (std::size_t(-1) != t)
                {
                    std::lock_guard<hpx::lcos::local::spinlock> lk(mtx);
                    attendance.erase(t);
                }
            }),
            futures);
    }
}

Now, before we discuss hello_world_foreman(), let’s talk about the hpx::wait_each function. hpx::lcos::wait_each for each one. The version of hpx::lcos::wait_each invokes a callback function provided by the user, supplying the callback function with the result of the future.

In hello_world_foreman(), an std::set<> called attendance keeps track of which OS-threads have printed out the hello world message. When the OS-thread prints out the statement, the future is marked as ready, and hpx::lcos::wait_each in hello_world_foreman(). If it is not executing on the correct OS-thread, it returns a value of -1, which causes hello_world_foreman() to leave the OS-thread id in attendance.

std::size_t hello_world_worker(std::size_t desired)
{
    // Returns the OS-thread number of the worker that is running this
    // HPX-thread.
    std::size_t current = hpx::get_worker_thread_num();
    if (current == desired)
    {
        // The HPX-thread has been run on the desired OS-thread.
        char const* msg = "hello world from OS-thread {1} on locality {2}\n";

        hpx::util::format_to(hpx::cout, msg, desired, hpx::get_locality_id())
            << hpx::flush;

        return desired;
    }

    // This HPX-thread has been run by the wrong OS-thread, make the foreman
    // try again by rescheduling it.
    return std::size_t(-1);
}

// Define the boilerplate code necessary for the function 'hello_world_worker'
// to be invoked as an HPX action (by a HPX future). This macro defines the
// type 'hello_world_worker_action'.
HPX_PLAIN_ACTION(hello_world_worker, hello_world_worker_action);

Because HPX features work stealing task schedulers, there is no way to guarantee that an action will be scheduled on a particular OS-thread. This is why we must use a guess-and-check approach.

Components and actions: Accumulator

The accumulator example demonstrates the use of components. Components are C++ classes that expose methods as a type of HPX action. These actions are called component actions.

Components are globally named, meaning that a component action can be called remotely (e.g. from another machine). There are two accumulator examples in HPX; accumulator.

In the Asynchronous execution with hpx::async and actions: Fibonacci and the Remote execution with actions: Hello world, we introduced plain actions, which wrapped global functions. The target of a plain action is an identifier which refers to a particular machine involved in the computation. For plain actions, the target is the machine where the action will be executed.

Component actions, however, do not target machines. Instead, they target component instances. The instance may live on the machine that we’ve invoked the component action from, or it may live on another machine.

The component in this example exposes three different functions:

  • reset() - Resets the accumulator value to 0.
  • add(arg) - Adds arg to the accumulators value.
  • query() - Queries the value of the accumulator.

This example creates an instance of the accumulator, and then allows the user to enter commands at a prompt, which subsequently invoke actions on the accumulator instance.

Setup

The source code for this example can be found here: accumulator_client.cpp.

To compile this program, go to your HPX build directory (see HPX build system for information on configuring and building HPX) and enter:

make examples.accumulators.accumulator

To run the program type:

./bin/accumulator_client

Once the program starts running, it will print the following prompt and then wait for input. An example session is given below:

commands: reset, add [amount], query, help, quit
> add 5
> add 10
> query
15
> add 2
> query
17
> reset
> add 1
> query
1
> quit
Walkthrough

Now, let’s take a look at the source code of the accumulator example. This example consists of two parts: an HPX component library (a library that exposes an HPX component) and a client application which uses the library. This walkthrough will cover the HPX component library. The code for the client application can be found here: accumulator_client.cpp.

An HPX component is represented by two C++ classes:

  • A server class - The implementation of the components functionality.
  • A client class - A high-level interface that acts as a proxy for an instance of the component.

Typically, these two classes all have the same name, but the server class usually lives in different sub-namespaces (server). For example, the full names of the two classes in accumulator are:

  • examples::server::accumulator (server class)
  • examples::accumulator (client class)
The server class

The following code is from: accumulator.hpp.

All HPX component server classes must inherit publicly from the HPX component base class: hpx::components::component_base

The accumulator component inherits from hpx::components::locking_hook. This allows the runtime system to ensure that all action invocations are serialized. That means that the system ensures that no two actions are invoked at the same time on a given component instance. This makes the component thread safe and no additional locking has to be implemented by the user. Moreover, accumulator component is a component, because it also inherits from hpx::components::component_base (the template argument passed to locking_hook is used as its base class). The following snippet shows the corresponding code:

    class accumulator
      : public hpx::components::locking_hook<
            hpx::components::component_base<accumulator> >

Our accumulator class will need a data member to store its value in, so let’s declare a data member:

        argument_type value_;

The constructor for this class simply initializes value_ to 0:

        accumulator() : value_(0) {}

Next, let’s look at the three methods of this component that we will be exposing as component actions:

        /// Reset the components value to 0.
        void reset()
        {
            //  set value_ to 0.
            value_ = 0;
        }

        /// Add the given number to the accumulator.
        void add(argument_type arg)
        {
            //  add value_ to arg, and store the result in value_.
            value_ += arg;
        }

        /// Return the current value to the caller.
        argument_type query() const
        {
            // Get the value of value_.
            return value_;
        }

Here are the action types. These types wrap the methods we’re exposing. The wrapping technique is very similar to the one used in the Asynchronous execution with hpx::async and actions: Fibonacci and the Remote execution with actions: Hello world:

        HPX_DEFINE_COMPONENT_ACTION(accumulator, reset);
        HPX_DEFINE_COMPONENT_ACTION(accumulator, add);
        HPX_DEFINE_COMPONENT_ACTION(accumulator, query);

The last piece of code in the server class header is the declaration of the action type registration code:

HPX_REGISTER_ACTION_DECLARATION(
    examples::server::accumulator::reset_action,
    accumulator_reset_action);

HPX_REGISTER_ACTION_DECLARATION(
    examples::server::accumulator::add_action,
    accumulator_add_action);

HPX_REGISTER_ACTION_DECLARATION(
    examples::server::accumulator::query_action,
    accumulator_query_action);

Note

The code above must be placed in the global namespace.

The rest of the registration code is in accumulator.cpp

///////////////////////////////////////////////////////////////////////////////
// Add factory registration functionality.
HPX_REGISTER_COMPONENT_MODULE();

///////////////////////////////////////////////////////////////////////////////
typedef hpx::components::component<
    examples::server::accumulator
> accumulator_type;

HPX_REGISTER_COMPONENT(accumulator_type, accumulator);

///////////////////////////////////////////////////////////////////////////////
// Serialization support for accumulator actions.
HPX_REGISTER_ACTION(
    accumulator_type::wrapped_type::reset_action,
    accumulator_reset_action);
HPX_REGISTER_ACTION(
    accumulator_type::wrapped_type::add_action,
    accumulator_add_action);
HPX_REGISTER_ACTION(
    accumulator_type::wrapped_type::query_action,
    accumulator_query_action);

Note

The code above must be placed in the global namespace.

The client class

The following code is from accumulator.hpp.

The client class is the primary interface to a component instance. Client classes are used to create components:

// Create a component on this locality.
examples::accumulator c = hpx::new_<examples::accumulator>(hpx::find_here());

and to invoke component actions:

c.add(hpx::launch::apply, 4);

Clients, like servers, need to inherit from a base class, this time, hpx::components::client_base:

    class accumulator
      : public hpx::components::client_base<
            accumulator, server::accumulator
        >

For readability, we typedef the base class like so:

        typedef hpx::components::client_base<
            accumulator, server::accumulator
        > base_type;

Here are examples of how to expose actions through a client class:

There are a few different ways of invoking actions:

  • Non-blocking: For actions which don’t have return types, or when we do not care about the result of an action, we can invoke the action using fire-and-forget semantics. This means that once we have asked HPX to compute the action, we forget about it completely and continue with our computation. We use hpx::apply to invoke an action in a non-blocking fashion.
        void reset(hpx::launch::apply_policy)
        {
            HPX_ASSERT(this->get_id());

            typedef server::accumulator::reset_action action_type;
            hpx::apply<action_type>(this->get_id());
        }
        hpx::future<argument_type> query(hpx::launch::async_policy)
        {
            HPX_ASSERT(this->get_id());

            typedef server::accumulator::query_action action_type;
            return hpx::async<action_type>(hpx::launch::async, this->get_id());
        }
  • Synchronous: To invoke an action in a fully synchronous manner, we can simply call hpx::async().get() (e.g., create a future and immediately wait on it to be ready). Here’s an example from the accumulator client class:
        void add(argument_type arg)
        {
            HPX_ASSERT(this->get_id());

            typedef server::accumulator::add_action action_type;
            action_type()(this->get_id(), arg);
        }

Note that this->get_id() references a data member of the hpx::components::client_base base class which identifies the server accumulator instance.

hpx::naming::id_type is a type which represents a global identifier in HPX. This type specifies the target of an action. This is the type that is returned by hpx::find_here in which case it represents the locality the code is running on.

Dataflow: Interest calculator

HPX provides its users with several different tools to simply express parallel concepts. One of these tools is a local control object (LCO) called dataflow. An LCO is a type of component that can spawn a new thread when triggered. They are also distinguished from other components by a standard interface which allow users to understand and use them easily. Dataflows, being a LCO, is triggered when the values it depends on become available. For instance, if you have a calculation X that depends on the result of three other calculations, you could set up a dataflow that would begin the calculation X as soon as the other three calculations have returned their values. Dataflows are set up to depend on other dataflows. It is this property that makes dataflow a powerful parallelization tool. If you understand the dependencies of your calculation, you can devise a simple algorithm which sets up a dependency tree to be executed. In this example, we calculate compound interest. To calculate compound interest, one must calculate the interest made in each compound period, and then add that interest back to the principal before calculating the interest made in the next period. A practical person would of course use the formula for compound interest:

\[F = P(1 + i) ^ n\]

where \(F\) is the future value, \(P\) is the principal value, \(i\) is the interest rate, and \(n\) is the number of compound periods.

Nevertheless, we have chosen for the sake of example to manually calculate the future value by iterating:

\[I = Pi\]

and

\[P = P + i\]
Setup

The source code for this example can be found here: interest_calculator.cpp.

To compile this program, go to your HPX build directory (see HPX build system for information on configuring and building HPX) and enter:

make examples.quickstart.interest_calculator

To run the program type:

./bin/interest_calculator --principal 100 --rate 5 --cp 6 --time 36

This should print:

Final amount: 134.01
Amount made: 34.0096
Walkthrough

Let us begin with main, here we can see that we again are using Boost.Program Options to set our command line variables (see Asynchronous execution with hpx::async and actions: Fibonacci for more details). These options set the principal, rate, compound period, and time. It is important to note that the units of time for cp and time must be the same.

int main(int argc, char ** argv)
{
    options_description cmdline("Usage: " HPX_APPLICATION_STRING " [options]");

    cmdline.add_options()
        ("principal", value<double>()->default_value(1000), "The principal [$]")
        ("rate", value<double>()->default_value(7), "The interest rate [%]")
        ("cp", value<int>()->default_value(12), "The compound period [months]")
        ("time", value<int>()->default_value(12*30),
            "The time money is invested [months]")
    ;

    return hpx::init(cmdline, argc, argv);
}

Next we look at hpx_main.

int hpx_main(variables_map & vm)
{
    {
        using hpx::shared_future;
        using hpx::make_ready_future;
        using hpx::dataflow;
        using hpx::util::unwrapping;
        hpx::naming::id_type here = hpx::find_here();

        double init_principal=vm["principal"].as<double>(); //Initial principal
        double init_rate=vm["rate"].as<double>(); //Interest rate
        int cp=vm["cp"].as<int>(); //Length of a compound period
        int t=vm["time"].as<int>(); //Length of time money is invested

        init_rate/=100; //Rate is a % and must be converted
        t/=cp; //Determine how many times to iterate interest calculation:
               //How many full compound periods can fit in the time invested

        // In non-dataflow terms the implemented algorithm would look like:
        //
        // int t = 5;    // number of time periods to use
        // double principal = init_principal;
        // double rate = init_rate;
        //
        // for (int i = 0; i < t; ++i)
        // {
        //     double interest = calc(principal, rate);
        //     principal = add(principal, interest);
        // }
        //
        // Please note the similarity with the code below!

        shared_future<double> principal = make_ready_future(init_principal);
        shared_future<double> rate = make_ready_future(init_rate);

        for (int i = 0; i < t; ++i)
        {
            shared_future<double> interest = dataflow(unwrapping(calc), principal, rate);
            principal = dataflow(unwrapping(add), principal, interest);
        }

        // wait for the dataflow execution graph to be finished calculating our
        // overall interest
        double result = principal.get();

        std::cout << "Final amount: " << result << std::endl;
        std::cout << "Amount made: " << result-init_principal << std::endl;
    }

    return hpx::finalize();
}

Here we find our command line variables read in, the rate is converted from a percent to a decimal, the number of calculation iterations is determined, and then our shared_futures are set up. Notice that we first place our principal and rate into shares futures by passing the variables init_principal and init_rate using hpx::make_ready_future.

In this way hpx::shared_future<double> principal and rate will be initialized to init_principal and init_rate when hpx::make_ready_future<double> returns a future containing those initial values. These shared futures then enter the for loop and are passed to interest. Next principal and interest are passed to the reassignment of principal using a hpx::dataflow. A dataflow will first wait for its arguments to be ready before launching any callbacks, so add in this case will not begin until both principal and interest are ready. This loop continues for each compound period that must be calculated. To see how interest and principal are calculated in the loop let us look at calc_action and add_action:

// Calculate interest for one period
double calc(double principal, double rate)
{
    return principal * rate;
}

///////////////////////////////////////////////////////////////////////////////
// Add the amount made to the principal
double add(double principal, double interest)
{
    return principal + interest;
}

After the shared future dependencies have been defined in hpx_main, we see the following statement:

double result = principal.get();

This statement calls hpx::future::get on the shared future principal which had its value calculated by our for loop. The program will wait here until the entire dataflow tree has been calculated and the value assigned to result. The program then prints out the final value of the investment and the amount of interest made by subtracting the final value of the investment from the initial value of the investment.

Local to remote: 1D stencil

When developers write code they typically begin with a simple serial code and build upon it until all of the required functionality is present. The following set of examples were developed to demonstrate this iterative process of evolving a simple serial program to an efficient, fully distributed HPX application. For this demonstration, we implemented a 1D heat distribution problem. This calculation simulates the diffusion of heat across a ring from an initialized state to some user defined point in the future. It does this by breaking each portion of the ring into discrete segments and using the current segment’s temperature and the temperature of the surrounding segments to calculate the temperature of the current segment in the next timestep as shown by Fig. 2 below.

_images/1d_stencil_program_flow.png

Fig. 2 Heat diffusion example program flow.

We parallelize this code over the following eight examples:

The first example is straight serial code. In this code we instantiate a vector U which contains two vectors of doubles as seen in the structure stepper.

struct stepper
{
    // Our partition type
    typedef double partition;

    // Our data for one time step
    typedef std::vector<partition> space;

    // Our operator
    static double heat(double left, double middle, double right)
    {
        return middle + (k*dt/(dx*dx)) * (left - 2*middle + right);
    }

    // do all the work on 'nx' data points for 'nt' time steps
    space do_work(std::size_t nx, std::size_t nt)
    {
        // U[t][i] is the state of position i at time t.
        std::vector<space> U(2);
        for (space& s : U)
            s.resize(nx);

        // Initial conditions: f(0, i) = i
        for (std::size_t i = 0; i != nx; ++i)
            U[0][i] = double(i);

        // Actual time step loop
        for (std::size_t t = 0; t != nt; ++t)
        {
            space const& current = U[t % 2];
            space& next = U[(t + 1) % 2];

            next[0] = heat(current[nx-1], current[0], current[1]);

            for (std::size_t i = 1; i != nx-1; ++i)
                next[i] = heat(current[i-1], current[i], current[i+1]);

            next[nx-1] = heat(current[nx-2], current[nx-1], current[0]);
        }

        // Return the solution at time-step 'nt'.
        return U[nt % 2];
    }
};

Each element in the vector of doubles represents a single grid point. To calculate the change in heat distribution, the temperature of each grid point, along with its neighbors, are passed to the function heat. In order to improve readability, references named current and next are created which, depending on the time step, point to the first and second vector of doubles. The first vector of doubles is initialized with a simple heat ramp. After calling the heat function with the data in the current vector, the results are placed into the next vector.

In example 2 we employ a technique called futurization. Futurization is a method by which we can easily transform a code which is serially executed into a code which creates asynchronous threads. In the simplest case this involves replacing a variable with a future to a variable, a function with a future to a function, and adding a .get() at the point where a value is actually needed. The code below shows how this technique was applied to the struct stepper.

struct stepper
{
    // Our partition type
    typedef hpx::shared_future<double> partition;

    // Our data for one time step
    typedef std::vector<partition> space;

    // Our operator
    static double heat(double left, double middle, double right)
    {
        return middle + (k*dt/(dx*dx)) * (left - 2*middle + right);
    }

    // do all the work on 'nx' data points for 'nt' time steps
    hpx::future<space> do_work(std::size_t nx, std::size_t nt)
    {
        using hpx::dataflow;
        using hpx::util::unwrapping;

        // U[t][i] is the state of position i at time t.
        std::vector<space> U(2);
        for (space& s : U)
            s.resize(nx);

        // Initial conditions: f(0, i) = i
        for (std::size_t i = 0; i != nx; ++i)
            U[0][i] = hpx::make_ready_future(double(i));

        auto Op = unwrapping(&stepper::heat);

        // Actual time step loop
        for (std::size_t t = 0; t != nt; ++t)
        {
            space const& current = U[t % 2];
            space& next = U[(t + 1) % 2];

            // WHEN U[t][i-1], U[t][i], and U[t][i+1] have been computed, THEN we
            // can compute U[t+1][i]
            for (std::size_t i = 0; i != nx; ++i)
            {
                next[i] = dataflow(
                        hpx::launch::async, Op,
                        current[idx(i, -1, nx)], current[i], current[idx(i, +1, nx)]
                    );
            }
        }

        // Now the asynchronous computation is running; the above for-loop does not
        // wait on anything. There is no implicit waiting at the end of each timestep;
        // the computation of each U[t][i] will begin when as soon as its dependencies
        // are ready and hardware is available.

        // Return the solution at time-step 'nt'.
        return hpx::when_all(U[nt % 2]);
    }
};

In example 2, we re-define our partition type as a shared_future and, in main, create the object result which is a future to a vector of partitions. We use result to represent the last vector in a string of vectors created for each timestep. In order to move to the next timestep, the values of a partition and its neighbors must be passed to heat once the futures that contain them are ready. In HPX, we have an LCO (Local Control Object) named Dataflow which assists the programmer in expressing this dependency. Dataflow allows us to pass the results of a set of futures to a specified function when the futures are ready. Dataflow takes three types of arguments, one which instructs the dataflow on how to perform the function call (async or sync), the function to call (in this case Op), and futures to the arguments that will be passed to the function. When called, dataflow immediately returns a future to the result of the specified function. This allows users to string dataflows together and construct an execution tree.

After the values of the futures in dataflow are ready, the values must be pulled out of the future container to be passed to the function heat. In order to do this, we use the HPX facility unwrapped, which underneath calls .get() on each of the futures so that the function heat will be passed doubles and not futures to doubles.

By setting up the algorithm this way, the program will be able to execute as quickly as the dependencies of each future are met. Unfortunately, this example runs terribly slow. This increase in execution time is caused by the overheads needed to create a future for each data point. Because the work done within each call to heat is very small, the overhead of creating and scheduling each of the three futures is greater than that of the actual useful work! In order to amortize the overheads of our synchronization techniques, we need to be able to control the amount of work that will be done with each future. We call this amount of work per overhead grain size.

In example 3, we return to our serial code to figure out how to control the grain size of our program. The strategy that we employ is to create “partitions” of data points. The user can define how many partitions are created and how many data points are contained in each partition. This is accomplished by creating the struct partition which contains a member object data_, a vector of doubles which holds the data points assigned to a particular instance of partition.

In example 4, we take advantage of the partition setup by redefining space to be a vector of shared_futures with each future representing a partition. In this manner, each future represents several data points. Because the user can define how many data points are contained in each partition (and therefore how many data points that are represented by one future) a user can now control the grainsize of the simulation. The rest of the code was then futurized in the same manner that was done in example 2. It should be noted how strikingly similar example 4 is to example 2.

Example 4 finally shows good results. This code scales equivalently to the OpenMP version. While these results are promising, there are more opportunities to improve the application’s scalability. Currently this code only runs on one locality, but to get the full benefit of HPX we need to be able to distribute the work to other machines in a cluster. We begin to add this functionality in example 5.

In order to run on a distributed system, a large amount of boilerplate code must be added. Fortunately, HPX provides us with the concept of a component which saves us from having to write quite as much code. A component is an object which can be remotely accessed using its global address. Components are made of two parts: a server and a client class. While the client class is not required, abstracting the server behind a client allows us to ensure type safety instead of having to pass around pointers to global objects. Example 5 renames example 4’s struct partition to partition_data and adds serialization support. Next we add the server side representation of the data in the structure partition_server. Partition_server inherits from hpx::components::component_base which contains a server side component boilerplate. The boilerplate code allows a component’s public members to be accessible anywhere on the machine via its Global Identifier (GID). To encapsulate the component, we create a client side helper class. This object allows us to create new instances of our component, and access its members without having to know its GID. In addition, we are using the client class to assist us with managing our asynchrony. For example, our client class partition’s member function get_data() returns a future to partition_data get_data(). This struct inherits its boilerplate code from hpx::components::client_base.

In the structure stepper, we have also had to make some changes to accommodate a distributed environment. In order to get the data from a neighboring partition, which could be remote, we must retrieve the data from the neighboring partitions. These retrievals are asynchronous and the function heat_part_data, which amongst other things calls heat, should not be called unless the data from the neighboring partitions have arrived. Therefore it should come as no surprise that we synchronize this operation with another instance of dataflow (found in heat_part). This dataflow is passed futures to the data in the current and surrounding partitions by calling get_data() on each respective partition. When these futures are ready dataflow passes then to the unwrapped function, which extracts the shared_array of doubles and passes them to the lambda. The lambda calls heat_part_data on the locality which the middle partition is on.

Although this example could run in distributed, it only runs on one locality as it always uses hpx::find_here() as the target for the functions to run on.

In example 6, we begin to distribute the partition data on different nodes. This is accomplished in stepper::do_work() by passing the GID of the locality where we wish to create the partition to the the partition constructor.

    for (std::size_t i = 0; i != np; ++i)
        U[0][i] = partition(localities[locidx(i, np, nl)], nx, double(i));

We distribute the partitions evenly based on the number of localities used, which is described in the function locidx. Because some of the data needed to update the partition in heat_part could now be on a new locality, we must devise a way of moving data to the locality of the middle partition. We accomplished this by adding a switch in the function get_data() which returns the end element of the buffer data_ if it is from the left partition or the first element of the buffer if the data is from the right partition. In this way only the necessary elements, not the whole buffer, are exchanged between nodes. The reader should be reminded that this exchange of end elements occurs in the function get_data() and therefore is executed asynchronously.

Now that we have the code running in distributed, it is time to make some optimizations. The function heat_part spends most of its time on two tasks: retrieving remote data and working on the data in the middle partition. Because we know that the data for the middle partition is local, we can overlap the work on the middle partition with that of the possibly remote call of get_data(). This algorithmic change which was implemented in example 7 can be seen below:

    // The partitioned operator, it invokes the heat operator above on all elements
    // of a partition.
    static partition heat_part(partition const& left,
        partition const& middle, partition const& right)
    {
        using hpx::dataflow;
        using hpx::util::unwrapping;

        hpx::shared_future<partition_data> middle_data =
            middle.get_data(partition_server::middle_partition);

        hpx::future<partition_data> next_middle = middle_data.then(
            unwrapping(
                [middle](partition_data const& m) -> partition_data
                {
                    HPX_UNUSED(middle);

                    // All local operations are performed once the middle data of
                    // the previous time step becomes available.
                    std::size_t size = m.size();
                    partition_data next(size);
                    for (std::size_t i = 1; i != size-1; ++i)
                        next[i] = heat(m[i-1], m[i], m[i+1]);
                    return next;
                }
            )
        );

        return dataflow(
            hpx::launch::async,
            unwrapping(
                [left, middle, right](partition_data next, partition_data const& l,
                    partition_data const& m, partition_data const& r) -> partition
                {
                    HPX_UNUSED(left);
                    HPX_UNUSED(right);

                    // Calculate the missing boundary elements once the
                    // corresponding data has become available.
                    std::size_t size = m.size();
                    next[0] = heat(l[size-1], m[0], m[1]);
                    next[size-1] = heat(m[size-2], m[size-1], r[0]);

                    // The new partition_data will be allocated on the same locality
                    // as 'middle'.
                    return partition(middle.get_id(), next);
                }
            ),
            std::move(next_middle),
            left.get_data(partition_server::left_partition),
            middle_data,
            right.get_data(partition_server::right_partition)
        );
    }

Example 8 completes the futurization process and utilizes the full potential of HPX by distributing the program flow to multiple localities, usually defined as nodes in a cluster. It accomplishes this task by running an instance of HPX main on each locality. In order to coordinate the execution of the program the struct stepper is wrapped into a component. In this way, each locality contains an instance of stepper which executes its own instance of the function do_work(). This scheme does create an interesting synchronization problem that must be solved. When the program flow was being coordinated on the head node the, GID of each component was known. However, when we distribute the program flow, each partition has no notion of the GID of its neighbor if the next partition is on another locality. In order to make the GIDs of neighboring partitions visible to each other, we created two buffers to store the GIDs of the remote neighboring partitions on the left and right respectively. These buffers are filled by sending the GID of a newly created edge partitions to the right and left buffers of the neighboring localities.

In order to finish the simulation the solution vectors named result are then gathered together on locality 0 and added into a vector of spaces overall_result using the HPX functions gather_id and gather_here.

Example 8 completes this example series which takes the serial code of example 1 and incrementally morphs it into a fully distributed parallel code. This evolution was guided by the simple principles of futurization, the knowledge of grainsize, and utilization of components. Applying these techniques easily facilitates the scalable parallelization of most applications.

Manual

The manual is your comprehensive guide to HPX. It contains detailed information on how to build and use HPX in different scenarios.

Getting HPX

There are HPX packages available for a few Linux distributions. The easiest way to get started with HPX is to use those packages. We keep an up-to-date list with instructions on the HPX Downloads page. If you use one of the available packages you can skip the next section, HPX build system, but we still recommend that you look through it as it contains useful information on how you can customize HPX at compile-time.

If there isn’t a package available for your platform you should either clone our repository:

or download a package with the source files from HPX Downloads.

HPX build system

The build system for HPX is based on CMake. CMake is a cross-platform build-generator tool. CMake does not build the project, it generates the files needed by your build tool (GNU make, Visual Studio, etc.) for building HPX.

This section gives an introduction on how to use our build system to build HPX and how to use HPX in your own projects.

CMake basics

CMake is a cross-platform build-generator tool. cmake does not build the project, it generates the files needed by your build tool (gnu make, visual studio, etc.) for building HPX.

in general, the hpx CMake scripts try to adhere to the general cmake policies on how to write CMake based projects.

Basic CMake usage

This section explains basic aspects of CMake, mostly for explaining those options which you may need on your day-to-day usage.

CMake comes with extensive documentation in the form of html files and on the cmake executable itself. Execute cmake --help for further help options.

CMake requires to know for which build tool it shall generate files (GNU make, Visual Studio, Xcode, etc.). If not specified on the command line, it tries to guess it based on you environment. Once identified the build tool, CMake uses the corresponding Generator for creating files for your build tool. You can explicitly specify the generator with the command line option -G "Name of the generator". For knowing the available generators on your platform, execute:

cmake --help

This will list the generator names at the end of the help text. Generator names are case-sensitive. Example:

cmake -G "Visual Studio 9 2008" path/to/hpx

For a given development platform there can be more than one adequate generator. If you use Visual Studio "NMake Makefiles" is a generator you can use for building with NMake. By default, CMake chooses the more specific generator supported by your development environment. If you want an alternative generator, you must tell this to CMake with the -G option.

Quick start

We use here the command-line, non-interactive CMake interface.

  1. Download and install CMake here: CMake Downloads. Version 3.3.2 is the minimally required version for HPX.

  2. Open a shell. Your development tools must be reachable from this shell through the PATH environment variable.

  3. Create a directory for containing the build. It is not supported to build HPX on the source directory. cd to this directory:

    mkdir mybuilddir
    cd mybuilddir
    
  4. Execute this command on the shell replacing path/to/hpx/ with the path to the root of your HPX source tree:

    cmake path/to/hpx
    

CMake will detect your development environment, perform a series of tests and will generate the files required for building HPX. CMake will use default values for all build parameters. See the CMake variables used to configure HPX section for fine-tuning your build.

This can fail if CMake can’t detect your toolset, or if it thinks that the environment is not sane enough. In this case make sure that the toolset that you intend to use is the only one reachable from the shell and that the shell itself is the correct one for you development environment. CMake will refuse to build MinGW makefiles if you have a POSIX shell reachable through the PATH environment variable, for instance. You can force CMake to use various compilers and tools. Please visit CMake Useful Variables for a detailed overview of specific CMake variables.

Options and variables

Variables customize how the build will be generated. Options are boolean variables, with possible values ON/OFF. Options and variables are defined on the CMake command line like this:

cmake -DVARIABLE=value path/to/hpx

You can set a variable after the initial CMake invocation for changing its value. You can also undefine a variable:

cmake -UVARIABLE path/to/hpx

Variables are stored on the CMake cache. This is a file named CMakeCache.txt on the root of the build directory. Do not hand-edit it.

Variables are listed here appending its type after a colon. It is correct to write the variable and the type on the CMake command line:

cmake -DVARIABLE:TYPE=value path/to/llvm/source

CMake supports the following variable types: BOOL (options), STRING (arbitrary string), PATH (directory name), FILEPATH (file name).

Prerequisites
Supported platforms

At this time, HPX supports the following platforms. Other platforms may work, but we do not test HPX with other platforms, so please be warned.

Table 1 Supported Platforms for HPX
Name Recommended Version Minimum Version Architectures
Linux 3.2 2.6 x86-32, x86-64, k1om
BlueGeneQ V1R2M0 V1R2M0 PowerPC A2
Windows 7, Server 2008 R2 Any Windows system x86-32, x86-64
Mac OSX   Any OSX system x86-64
Software and libraries

In the simplest case, HPX depends on Boost and Portable Hardware Locality (HWLOC). So, before you read further, please make sure you have a recent version of Boost installed on your target machine. HPX currently requires at least Boost V1.61.0 to work properly. It may build and run with older versions, but we do not test HPX with those versions, so please be warned.

Installing the Boost libraries is described in detail in Boost’s own Getting Started document. It is often possible to download the Boost libraries using the package manager of your distribution. Please refer to the corresponding documentation for your system for more information.

The installation of Boost is described in detail in Boost’s own Getting Started document. However, if you’ve never used the Boost libraries (or even if you have), here’s a quick primer: Installing Boost.

In addition, we require a recent version of hwloc in order to support thread pinning and NUMA awareness. See Installing Hwloc for instructions on building Portable Hardware Locality (HWLOC).

HPX is written in 99.99% Standard C++ (the remaining 0.01% is platform specific assembly code). As such HPX is compilable with almost any standards compliant C++ compiler. A compiler supporting the C++11 Standard is highly recommended. The code base takes advantage of C++11 language features when available (move semantics, rvalue references, magic statics, etc.). This may speed up the execution of your code significantly. We currently support the following C++ compilers: GCC, MSVC, ICPC and clang. For the status of your favorite compiler with HPX visit HPX Buildbot Website.

Table 2 Software prerequisites for HPX on Linux systems.
Name Recommended version Minimum version Notes
Compilers      
GNU Compiler Collection (g++) 4.9 or newer 4.9  
Intel Composer XE Suites 2014 or newer 2014  
clang: a C language family frontend for LLVM 3.8 or newer 3.8  
Build System      
CMake 3.9.0 3.3.2 Cuda support 3.9
Required Libraries      
Boost C++ Libraries 1.67.0 or newer 1.61.0  
Portable Hardware Locality (HWLOC) 1.11 1.2 (Xeon Phi: 1.6)  

Note

When compiling with the Intel Compiler on Linux systems, we only support C++ Standard Libraries provided by gcc 4.8 and upwards. If the g++ in your path is older than 4.8, please specify the path of a newer g++ by setting CMAKE_CXX_FLAGS='-gxx-name=/path/to/g++' via CMake.

Note

When building Boost using gcc please note that it is always a good idea to specify a cxxflags=-std=c++11 command line argument to b2 (bjam). Note however, that this is absolutely necessary when using gcc V5.2 and above.

Table 3 Software prerequisites for HPX on Windows systems
Name Recommended version Minimum version Notes
Compilers      
Visual C++ (x64) 2015 2015  
Build System      
CMake 3.9.0 3.3.2  
Required Libraries      
Boost 1.67.0 or newer 1.61.0  
Portable Hardware Locality (HWLOC) 1.11 1.5  

Note

You need to build the following Boost libraries for HPX: Boost.Filesystem, Boost.ProgramOptions, Boost.Regex, and Boost.System. The following are not needed by default, but are required in certain configurations: Boost.Chrono, Boost.DateTime, Boost.Log, Boost.LogSetup, and Boost.Thread.

Depending on the options you chose while building and installing HPX, you will find that HPX may depend on several other libraries such as those listed below.

Note

In order to use a high speed parcelport, we currently recommend configuring HPX to use MPI so that MPI can be used for communication between different localities. Please set the CMake variable MPI_CXX_COMPILER to your MPI C++ compiler wrapper if not detected automatically.

Table 4 Highly recommended optional software prerequisites for HPX on Linux systems
Name Recommended version Minimum version Notes
google-perftools 1.7.1 1.7.1 Used as a replacement for the system allocator, and for allocation diagnostics.
libunwind 0.99 0.97 Dependency of google-perftools on x86-64, used for stack unwinding.
Open MPI 1.10.1 1.8.0 Can be used as a highspeed communication library backend for the parcelport.

Note

When using OpenMPI please note that Ubuntu (notably 18.04 LTS) and older Debian ship an OpenMPI 2.x built with --enable-heterogeneous which may cause communication failures at runtime and should not be used.

Table 5 Optional software prerequisites for HPX on Linux systems
Name Recommended version Minimum version Notes
Performance Application Programming Interface (PAPI) Used for accessing hardware performance data.    
jemalloc 2.1.2 2.1.0 Used as a replacement for the system allocator.
Hierarchical Data Format V5 (HDF5) 1.8.7 1.6.7 Used for data I/O in some example applications. See important note below.
Table 6 Optional software prerequisites for HPX on Windows systems
Name Recommended version Minimum version Notes
Hierarchical Data Format V5 (HDF5) 1.8.7 1.6.7 Used for data I/O in some example applications. See important note below.

Important

The C++ HDF5 libraries must be compiled with enabled thread safety support. This has to be explicitly specified while configuring the HDF5 libraries as it is not the default. Additionally, you must set the following environment variables before configuring the HDF5 libraries (this part only needs to be done on Linux):

export CFLAGS='-DHDatexit=""'
export CPPFLAGS='-DHDatexit=""'
Documentation

To build the HPX documentation you need recent versions of the following packages:

  • python (2 or 3)
  • sphinx (Python package)
  • sphinx_rtd_theme (Python package)
  • breathe (Python package)
  • doxygen

If the Python dependencies are not available through your system package manager you can install them using the Python package manager pip:

pip install --user sphinx sphinx_rtd_theme breathe

You may need to set the following CMake variables to make sure CMake can find the required dependencies.

DOXYGEN_ROOT:PATH

Specifies where to look for the installation of the Doxygen tool.

SPHINX_ROOT:PATH

Specifies where to look for the installation of the Sphinx tool.

BREATHE_APIDOC_ROOT:PATH

Specifies where to look for the installation of the Breathe tool.

Installing Boost

Important

When building Boost using gcc please note that it is always a good idea to specify a cxxflags=-std=c++11 command line argument to b2 (bjam). Note however, that this is absolutely necessary when using gcc V5.2 and above.

Important

On Windows, depending on the installed versions of Visual Studio, you might also want to pass the correct toolset to the b2 command depending on which version of the IDE you want to use. In addition, passing address-model=64 is highly recommended. It might be also necessary to add command line argument --build-type=complete to the b2 command on the Windows platform.

The easiest way to create a working Boost installation is to compile Boost from sources yourself. This is particularly important as many high performance resources, even if they have Boost installed, usually only provide you with an older version of Boost. We suggest you download the most recent release of the Boost libraries from here: Boost Downloads. Unpack the downloaded archive into a directory of your choosing. We will refer to this directory a $BOOST.

Building and installing the Boost binaries is simple, regardless what platform you are on the basic instructions are as follows (with possible additional platform-dependent command line arguments):

cd $BOOST
bootstrap --prefix=<where to install boost>
./b2 -j<N>
./b2 install

where: <where to install boost> is the directory the built binaries will be installed to, and <N> is the number of cores to use to build the Boost binaries.

After the above sequence of commands has been executed (this may take a while!) you will need to specify the directory where Boost was installed as BOOST_ROOT (<where to install boost>) while executing cmake for HPX as explained in detail in the sections How to install HPX on Unix variants and How to install HPX on Windows.

Installing Hwloc

Note

These instructions are for everything except Windows. On Windows there is no need to build hwloc. Instead download the latest release, extract the files, and set HWLOC_ROOT during cmake configuration to the directory in which you extracted the files.

We suggest you download the most recent release of hwloc from here: Hwloc Downloads. Unpack the downloaded archive into a directory of your choosing. We will refer to this directory as $HWLOC.

To build hwloc run:

cd $HWLOC
./configure --prefix=<where to install hwloc>
make -j<N> install

where: <where to install hwloc> is the directory the built binaries will be installed to, and <N> is the number of cores to use to build hwloc.

After the above sequence of commands has been executed you will need to specify the directory where Hwloc was installed as HWLOC_ROOT (<where to install hwloc>) while executing cmake for HPX as explained in detail in the sections How to install HPX on Unix variants and How to install HPX on Windows.

Please see Hwloc Documentation for more information about Hwloc.

Building HPX
Basic information

Once CMake has been run, the build process can be started. The HPX build process is highly configurable through CMake and various CMake variables influence the build process. The build process consists of the following parts:

  • The HPX core libraries (target core): This forms the basic set of HPX libraries. The generated targets are:
    • hpx: The core HPX library (always enabled).
    • hpx_init: The HPX initialization library that applications need to link against to define the HPX entry points (disabled for static builds).
    • hpx_wrap: The HPX static library used to determine the runtime behavior of HPX code and respective entry points for hpx_main.h
    • iostreams_component: The component used for (distributed) IO (always enabled).
    • component_storage_component: The component needed for migration to persistent storage.
    • unordered_component: The component needed for a distributed (partitioned) hash table.
    • partioned_vector_component: The component needed for a distributed (partitioned) vector.
    • memory_component: A dynamically loaded plugin that exposed memory based performance counters (only available on Linux).
    • io_counter_component: A dynamically loaded plugin plugin that exposes I/O performance counters (only available on Linux).
    • papi_component: A dynamically loaded plugin that exposes PAPI performance counters (enabled with HPX_WITH_PAPI:BOOL, default is Off).
  • HPX Examples (target examples): This target is enabled by default and builds all HPX examples (disable by setting HPX_WITH_EXAMPLES:BOOL=Off). HPX examples are part of the all target and are included in the installation if enabled.
  • HPX Tests (target tests): This target builds the HPX test suite and is enabled by default (disable by setting HPX_WITH_TESTS:BOOL =Off). They are not built by the all target and have to be built separately.
  • HPX Documentation (target docs): This target builds the documentation, this is not enabled by default (enable by setting HPX_WITH_DOCUMENTATION:BOOL=On. For more information see Documentation.

For a complete list of available CMake variables that influence the build of HPX see CMake variables used to configure HPX.

The variables can be used to refine the recipes that can be found Platform specific build recipes which show some basic steps on how to build HPX for a specific platform.

In order to use HPX, only the core libraries are required (the ones marked as optional above are truly optional). When building against HPX, the CMake variable HPX_LIBRARIES will contain hpx and hpx_init (for pkgconfig, those are added to the Libs sections). In order to use the optional libraries, you need to specify them as link dependencies in your build (See Creating HPX projects).

As HPX is a modern C++ Library we require a certain minimal set of features from the C++11 standard. In addition, we make use of certain C++14 features if the used compiler supports them. This means that the HPX build system will try to determine the highest support C++ standard flavor and check for availability of those features. That is, the default will be the highest C++ standard version available. If you want to force HPX to use a specific C++ standard version you can use the following CMake variables:

  • HPX_WITH_CXX0X: Enables Pre-C++11 support (This is the minimal required mode on older gcc versions).
  • HPX_WITH_CXX11: Enables C++11 support
  • HPX_WITH_CXX14: Enables C++14 support
  • HPX_WITH_CXX17: Enables C++17 support
  • HPX_WITH_CXX2A: Enables (experimental) C++20 support
Build types

CMake can be configured to generate project files suitable for builds that have enabled debugging support or for an optimized build (without debugging support). The CMake variable used to set the build type is CMAKE_BUILD_TYPE (for more information see the CMake Documentation). Available build types are:

  • Debug: Full debug symbols available and additional assertions to help debugging. To enable the debug build type for the HPX API, the C++ Macro HPX_DEBUG is defined.
  • RelWithDebInfo: Release build with debugging symbols. This is most useful for profiling applications
  • Release: Release build. This disables assertions and enables default compiler optimizations.
  • RelMinSize: Release build with optimizations for small binary sizes.

Important

We currently don’t guarantee ABI compatibility between Debug and Release builds. Please make sure that applications built against HPX use the same build type as you used to build HPX. For CMake builds, this means that the CMAKE_BUILD_TYPE variables have to match and for projects not using CMake, the HPX_DEBUG macro has to be set in debug mode.

Platform specific notes

Some platforms require to have special link and/or compiler flags specified to build HPX. This is handled via CMake’s support for different toolchains (see cmake-toolchains(7) for more information). This is also used for cross compilation.

HPX ships with a set of toolchains that can be used for compilation of HPX itself and applications depending on HPX. Please see CMake toolchains shipped with HPX for more information.

In order to enable full static linking with the libraries, the CMake variable HPX_WITH_STATIC_LINKING:BOOL has to be set to On.

Debugging applications using core files

For HPX to generate useful core files, HPX has to be compiled without signal and exception handlers HPX_WITH_DISABLED_SIGNAL_EXCEPTION_HANDLERS:BOOL. If this option is not specified, the signal handlers change the application state. For example, after a segmentation fault the stack trace will show the signal handler. Similarly, unhandled exceptions are also caught by the these handlers and the stack trace will not point to the location where the unhandled exception was thrown.

In general, core files are a helpful tool to inspect the state of the application at the moment of the crash (post-mortem debugging), without the need of attaching a debugger beforehand. This approach to debugging is especially useful if the error cannot be reliably reproduced, as only a single crashed application run is required to gain potentially helpful information like a stacktrace.

To debug with core files, the operating system first has to be told to actually write them. On most unix systems this can be done by calling:

ulimit -c unlimited

in the shell. Now the debugger can be started up with:

gdb <application> <core file name>

The debugger should now display the last state of the application. The default file name for core files is core.

Platform specific build recipes

Note

The following build recipes are mostly user-contributed and may be outdated. We always welcome updated and new build recipes.

How to install HPX on Unix variants
  • Create a build directory. HPX requires an out-of-tree build. This means you will be unable to run CMake in the HPX source tree.

    cd hpx
    mkdir my_hpx_build
    cd my_hpx_build
    
  • Invoke CMake from your build directory, pointing the CMake driver to the root of your HPX source tree.

    cmake -DBOOST_ROOT=/root/of/boost/installation \
          -DHWLOC_ROOT=/root/of/hwloc/installation
          [other CMake variable definitions] \
          /path/to/source/tree
    

    for instance:

    cmake -DBOOST_ROOT=~/packages/boost -DHWLOC_ROOT=/packages/hwloc -DCMAKE_INSTALL_PREFIX=~/packages/hpx ~/downloads/hpx_0.9.10
    
  • Invoke GNU make. If you are on a machine with multiple cores, add the -jN flag to your make invocation, where N is the number of parallel processes HPX gets compiled with.

    gmake -j4
    

    Caution

    Compiling and linking HPX needs a considerable amount of memory. It is advisable that at least 2 GB of memory per parallel process is available.

    Note

    Many Linux distributions use make as an alias for gmake.

  • To complete the build and install HPX:

    gmake install
    

    Important

    These commands will build and install the essential core components of HPX only. In order to build and run the tests, please invoke:

    gmake tests && gmake test
    

    and in order to build (and install) all examples invoke:

    cmake -DHPX_WITH_EXAMPLES=On .
    gmake examples
    gmake install
    

For more detailed information about using CMake please refer its documentation and also the section Building HPX. Please pay special attention to the section about HPX_WITH_MALLOC:STRING as this is crucial for getting decent performance.

How to install HPX on OS X (Mac)

This section describes how to build HPX for OS X (Mac).

Build (and install) a recent version of Boost, using Clang and libc++

To build Boost with Clang and make it link to libc++ as standard library, you’ll need to set up either of the following in your ~/user-config.jam file:

# user-config.jam (put this file into your home directory)
# ...

using clang
    :
    : "/usr/bin/clang++"
    : <cxxflags>"-std=c++11 -fcolor-diagnostics"
      <linkflags>"-stdlib=libc++ -L/path/to/libcxx/lib"
    ;

(Again, remember to replace /path/to with whatever you used earlier.)

You can then use as build command either:

b2 --build-dir=/tmp/build-boost --layout=versioned toolset=clang install -j4

or:

b2 --build-dir=/tmp/build-boost --layout=versioned toolset=clang install -j4

We verified this using Boost V1.53. If you use a different version, just remember to replace /usr/local/include/boost-1_53 with whatever include prefix you had in your installation.

Build HPX, finally
cd /path/to
git clone https://github.com/STEllAR-GROUP/hpx.git
mkdir build-hpx && cd build-hpx

To build with Clang 3.2, execute:

cmake ../hpx \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DBOOST_INCLUDE_DIR=/usr/local/include/boost-1_53 \
    -DBOOST_LIBRARY_DIR=/usr/local/lib \
    -DBOOST_SUFFIX=-clang-darwin32-mt-1_53 \
make

To build with Clang 3.3 (trunk), execute:

cmake ../hpx \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DBOOST_INCLUDE_DIR=/usr/local/include/boost-1_53 \
    -DBOOST_LIBRARY_DIR=/usr/local/lib \
    -DBOOST_SUFFIX=-clang-darwin33-mt-1_53 \
make

For more detailed information about using CMake please refer its documentation and to the section Building HPX for.

Alternative installation method of HPX on OS X (Mac)

Alternatively, you can install a recent version of gcc as well as all required libraries via MacPorts:

  1. Install MacPorts

  2. Install CMake, gcc 4.8, and hwloc:

    sudo port install gcc48
    sudo port install hwloc
    

    You may also want:

    sudo port install cmake
    sudo port install git-core
    
  3. Make this version of gcc your default compiler:

    sudo port install gcc_select
    sudo port select gcc mp-gcc48
    
  4. Build Boost manually (the Boost package of MacPorts is built with Clang, and unfortunately doesn’t work with a GCC-build version of HPX):

    wget https://dl.bintray.com/boostorg/release/1.69.0/source/boost_1_69_0.tar.bz2
    tar xjf boost_1_69_0.tar.bz2
    pushd boost_1_69_0
    export BOOST_ROOT=$HOME/boost_1_69_0
    ./bootstrap.sh --prefix=$BOOST_DIR
    ./b2 -j8
    ./b2 -j8 install
    export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:$BOOST_ROOT/lib
    popd
    
  5. Build HPX:

    git clone https://github.com/STEllAR-GROUP/hpx.git
    mkdir hpx-build
    pushd hpx-build
    export HPX_ROOT=$HOME/hpx
    cmake -DCMAKE_C_COMPILER=gcc \
        -DCMAKE_CXX_COMPILER=g++ \
        -DCMAKE_FORTRAN_COMPILER=gfortran \
        -DCMAKE_C_FLAGS="-Wno-unused-local-typedefs" \
        -DCMAKE_CXX_FLAGS="-Wno-unused-local-typedefs" \
        -DBOOST_ROOT=$BOOST_ROOT \
        -DHWLOC_ROOT=/opt/local \
        -DCMAKE_INSTALL_PREFIX=$HOME/hpx \
             $(pwd)/../hpx
    make -j8
    make -j8 install
    export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:$HPX_ROOT/lib/hpx
    popd
    
  6. Note that you need to set BOOST_ROOT, HPX_ROOT and DYLD_LIBRARY_PATH (for both BOOST_ROOT and HPX_ROOT every time you configure, build, or run an HPX application.

  7. If you want to use HPX with MPI, you need to enable the MPI parcelport, and also specify the location of the MPI wrapper scripts. This can be done e.g. with the following command:

    cmake -DHPX_WITH_PARCELPORT_MPI=ON \
         -DCMAKE_C_COMPILER=gcc \
         -DCMAKE_CXX_COMPILER=g++ \
         -DCMAKE_FORTRAN_COMPILER=gfortran \
         -DMPI_C_COMPILER=openmpicc \
         -DMPI_CXX_COMPILER=openmpic++ \
         -DMPI_FORTRAN_COMPILER=openmpif90 \
         -DCMAKE_C_FLAGS="-Wno-unused-local-typedefs" \
         -DCMAKE_CXX_FLAGS="-Wno-unused-local-typedefs" \
         -DBOOST_ROOT=$BOOST_DIR \
         -DHWLOC_ROOT=/opt/local \
         -DCMAKE_INSTALL_PREFIX=$HOME/hpx
             $(pwd)/../hpx
    
How to install HPX on Windows
Installation of required prerequisites
  • Download the Boost c++ libraries from Boost Downloads
  • Install the boost library as explained in the section Installing Boost
  • Install the hwloc library as explained in the section Installing Hwloc
  • Download the latest version of CMake binaries, which are located under the platform section of the downloads page at CMake Downloads.
  • Download the latest version of HPX from the STE||AR website: HPX Downloads.
Installation of the HPX library
  • Create a build folder. HPX requires an out-of-tree-build. This means that you will be unable to run CMake in the HPX source folder.
  • Open up the CMake GUI. In the input box labelled “Where is the source code:”, enter the full path to the source folder. The source directory is one where the sources were checked out. CMakeLists.txt files in the source directory as well as the subdirectories describe the build to CMake. In addition to this, there are CMake scripts (usually ending in .cmake) stored in a special CMake directory. CMake does not alter any file in the source directory and doesn’t add new ones either. In the input box labelled “Where to build the binaries:”, enter the full path to the build folder you created before. The build directory is one where all compiler outputs are stored, which includes object files and final executables.
  • Add CMake variable definitions (if any) by clicking the “Add Entry” button. There are two required variables you need to define: BOOST_ROOT and HWLOC_ROOT These (PATH) variables need to be set to point to the root folder of your Boost and Portable Hardware Locality (HWLOC) installations. It is recommended to set the variable CMAKE_INSTALL_PREFIX as well. This determines where the HPX libraries will be built and installed. If this (PATH) variable is set, it has to refer to the directory where the built HPX files should be installed to.
  • Press the “Configure” button. A window will pop up asking you which compilers to use. Select the Visual Studio 10 (64Bit) compiler (it usually is the default if available). The Visual Studio 2012 (64Bit) and Visual Studio 2013 (64Bit) compilers are supported as well. Note that while it is possible to build HPX for x86, we don’t recommend doing so as 32 bit runs are severely restricted by a 32 bit Windows system limitation affecting the number of HPX threads you can create.
  • Press “Configure” again. Repeat this step until the “Generate” button becomes clickable (and until no variable definitions are marked red anymore).
  • Press “Generate”.
  • Open up the build folder, and double-click hpx.sln.
  • Build the INSTALL target.

For more detailed information about using CMake please refer its documentation and also the section Building HPX.

How to build HPX under Windows 10 x64 with Visual Studio 2015
  • Download the CMake V3.4.3 installer (or latest version) from here

  • Download the Portable Hardware Locality (HWLOC) V1.11.0 (or latest version) from here and unpack it.

  • Download the latest Boost libraries from here and unpack them.

  • Build the boost DLLs and LIBs by using these commands from Command Line (or PowerShell). Open CMD/PowerShell inside the Boost dir and type in:

    bootstrap.bat
    

    This batch file will set up everything needed to create a successful build. Now execute:

    b2.exe link=shared variant=release,debug architecture=x86 address-model=64 threading=multi --build-type=complete install
    

    This command will start a (very long) build of all available Boost libraries. Please, be patient.

  • Open CMake-GUI.exe and set up your source directory (input field ‘Where is the source code’) to the base directory of the source code you downloaded from HPX’s GitHub pages. Here’s an example of my CMake path settings which point to my Documents/GitHub/hpx folder:

    _images/cmake_settings1.png

    Fig. 3 Example CMake path settings.

    Inside the ‘Where is the source-code’ enter the base directory of your HPX source directory (do not enter the “src” sub-directory!) Inside ‘Where to build the binaries’ you should put in the path where all the building process will happen. This is important because the building machinery will do an “out-of-tree” build. CMake is not touching or changing in any way the original source files. Instead, it will generate Visual Studio Solution Files which will build HPX packages out of the HPX source tree.

  • Set three new environment variables (in CMake, not in Windows environment, by the way): BOOST_ROOT, HWLOC_ROOT, CMAKE_INSTALL_PREFIX. The meaning of these variables is as follows:

    • BOOST_ROOT the root directory of the unpacked Boost headers/cpp files.

    • HWLOC_ROOT the root directory of the unpacked Portable Hardware Locality files.

    • CMAKE_INSTALL_PREFIX the “root directory” where the future builds of HPX should be installed to.

      Note

      HPX is a BIG software collection and I really don’t recommend using the default C:\Program Files\hpx. I prefer simpler paths without white space, like C:\bin\hpx or D:\bin\hpx etc.

    To insert new env-vars click on “Add Entry” and then insert the name inside “Name”, select PATH as Type and put the path-name in “Path” text field. Repeat this for the first three variables.

    This is how variable insertion looks like:

    _images/cmake_settings2.png

    Fig. 4 Example CMake adding entry.

    Alternatively you could provide BOOST_LIBRARYDIR instead of BOOST_ROOT with a difference that BOOST_LIBRARYDIR should point to the subdirectory inside Boost root where all the compiled DLLs/LIBs are. I myself have used BOOST_LIBRARYDIR which pointed to the bin.v2 subdirectory under the Boost rootdir. Important is to keep the meanings of these two variables separated from each other: BOOST_DIR points to the ROOT folder of the boost library. BOOST_LIBRARYDIR points to the subdir inside Boost root folder where the compiled binaries are.

  • Click the ‘Configure’ button of CMake-GUI. You will be immediately presented a small window where you can select the C++ compiler to be used within Visual Studio. In my case I have used the latest v14 (a.k.a C++ 2015) but older versions should be sufficient too. Make sure to select the 64Bit compiler

  • After the generate process has finished successfully click the ‘Generate’ button. Now, CMake will put new VS Solution files into the BUILD folder you selected at the beginning.

  • Open Visual Studio and load the HPX.sln from your build folder.

  • Go to CMakePredefinedTargets and build the INSTALL project:

    _images/vs_targets_install.png

    Fig. 5 Visual Studio INSTALL target.

    It will take some time to compile everything and in the end you should see an output similar to this one:

    _images/vs_build_output.png

    Fig. 6 Visual Studio build output.

How to Install HPX on BlueGene/Q

So far we only support BGClang for compiling HPX on the BlueGene/Q.

  • Check if BGClang is available on your installation. If not obtain and install a copy from the BGClang trac page.

  • Build (and install) a recent version of Hwloc Downloads. With the following commands:

    ./configure \
      --host=powerpc64-bgq-linux \
      --prefix=$HOME/install/hwloc \
      --disable-shared \
      --enable-static \
      CPPFLAGS='-I/bgsys/drivers/ppcfloor -I/bgsys/drivers/ppcfloor/spi/include/kernel/cnk/'
    make
    make install
    
  • Build (and install) a recent version of Boost, using BGClang. To build Boost with BGClang, you’ll need to set up the following in your Boost ~/user-config.jam file:

    # user-config.jam (put this file into your home directory)
    using clang
      :
      : bgclang++11
      :
      ;
    

    You can then use this as your build command:

    ./bootstrap.sh
    ./b2 --build-dir=/tmp/build-boost --layout=versioned toolset=clang -j12
    
  • Clone the master HPX git repository (or a stable tag):

    git clone git://github.com/STEllAR-GROUP/hpx.git
    
  • Generate the HPX buildfiles using cmake:

    cmake -DHPX_PLATFORM=BlueGeneQ \
            -DCMAKE_TOOLCHAIN_FILE=/path/to/hpx/cmake/toolchains/BGQ.cmake \
            -DCMAKE_CXX_COMPILER=bgclang++11 \
            -DMPI_CXX_COMPILER=mpiclang++11 \
            -DHWLOC_ROOT=/path/to/hwloc/installation \
            -DBOOST_ROOT=/path/to/boost \
            -DHPX_WITH_MALLOC=system \
            /path/to/hpx
    
  • To complete the build and install HPX:

    make -j24
    make install
    

    This will build and install the essential core components of HPX only. Use:

    make -j24 examples
    make -j24 install
    

    to build and install the examples.

How to Install HPX on the Xeon Phi
Installation of the Boost Libraries
  • Download Boost Downloads for Linux and unpack the retrieved tarball.

  • Adapt your ~/user-config.jam to contain the following lines:

    ## Toolset to be used for compiling for the host
    using intel
        : host
        :
        : <cxxflags>"-std=c++0x"
        ;
    
    ## Toolset to be used for compiling for the Xeon Phi
    using intel
        : mic
        :
        : <cxxflags>"-std=c++0x -mmic"
          <linkflags>"-std=c++0x -mmic"
        ;
    
  • Change to the directory you unpacked boost in (from now on referred to as $BOOST_ROOT) and execute the following commands:

    ./bootstrap.sh
    ./b2 toolset=intel-mic -j<N>
    

    You should now have all the required boost libraries.

Installation of the Hwloc library
  • Download Hwloc Downloads, unpack the retrieved tarball and change to the newly created directory.

  • Run the configure-make-install procedure as follows:

    CC=icc CFLAGS=-mmic CXX=icpc CXXFLAGS=-mmic LDFLAGS=-mmic ./configure --host=x86_64-k1om-linux --prefix=$HWLOC_ROOT
    make
    make install
    

Important

The minimally required version of the Portable Hardware Locality (HWLOC) library on the Intel Xeon Phi is V1.6.

You now have a working hwloc installation in $HWLOC_ROOT.

Building HPX

After all the prerequisites have been successfully installed, we can now start building and installing HPX. The build procedure is almost the same as for How to install HPX on Unix variants with the sole difference that you have to enable the Xeon Phi in the CMake Build system. This is achieved by invoking CMake in the following way:

cmake                                             \
    -DCMAKE_TOOLCHAIN_FILE=/path/to/hpx/cmake/toolchains/XeonPhi.cmake \
    -DBOOST_ROOT=$BOOST_ROOT                      \
    -DHWLOC_ROOT=$HWLOC_ROOT                      \
    /path/to/hpx

For more detailed information about using CMake please refer to its documentation and to the section Building HPX. Please pay special attention to the section about HPX_WITH_MALLOC:STRING as this is crucial for getting decent performance on the Xeon Phi.

How to install HPX on Fedora distributions

Important

There are official HPX packages for Fedora. Unless you want to customize your build you may want to start off with the official packages. Instructions can be found on the HPX Downloads page.

Note

This section of the manual is based off of our collaborators Patrick Diehl’s blog post Installing HPX on Fedora 22.

  • Install all packages for minimal installation:

    sudo dnf install gcc-c++ cmake boost-build boost boost-devel hwloc-devel \
      hwloc gcc-gfortran papi-devel gperftools-devel docbook-dtds \
      docbook-style-xsl libsodium-devel doxygen boost-doc hdf5-devel \
      fop boost-devel boost-openmpi-devel boost-mpich-devel
    
  • Get the development branch of HPX:

    git clone https://github.com/STEllAR-GROUP/hpx.git
    
  • Configure it with CMake:

    cd hpx
    mkdir build
    cd build
    cmake -DCMAKE_INSTALL_PREFIX=/opt/hpx ..
    make -j
    make install
    

    Note

    To build HPX without examples use:

    cmake -DCMAKE_INSTALL_PREFIX=/opt/hpx -DHPX_WITH_EXAMPLES=Off ..
    
  • Add the library path of HPX to ldconfig:

    sudo echo /opt/hpx/lib > /etc/ld.so.conf.d/hpx.conf
    sudo ldconfig
    
How to install HPX on Arch distributions

Important

There are HPX packages for Arch in the AUR. Unless you want to customize your build you may want to start off with those. Instructions can be found on the HPX Downloads page.

  • Install all packages for a minimal installation:

    sudo pacman -S gcc clang cmake boost hwloc gperftools
    
  • For building the documentation you will need to further install the following:

    sudo pacman -S doxygen python-pip
    
    pip install --user sphinx sphinx_rtd_theme breathe
    

The rest of the installation steps are same as provided with Fedora or Unix variants.

How to install HPX on Debian-based distributions
  • Install all packages for a minimal installation:

    sudo apt install cmake libboost-all-dev hwloc libgoogle-perftools-dev
    
  • For building the documentation you will need to further install the following:

    sudo apt install doxygen python-pip
    
    pip install --user sphinx sphinx_rtd_theme breathe
    

    or the following if you prefer to get Python packages from the Debian repositories:

    sudo apt install doxygen python-sphinx python-sphinx-rtd-theme python-breathe
    

The rest of the installation steps are same as provided with Fedora or Unix variants.

CMake toolchains shipped with HPX

In order to compile HPX for various platforms, we provide a variety of toolchain files that take care of setting up various CMake variables like compilers etc. They are located in the cmake/toolchains directory:

To use them pass the -DCMAKE_TOOLCHAIN_FILE=<toolchain> argument to the cmake invocation.

ARM-gcc
# Copyright (c) 2015 Thomas Heller
#
# Distributed under the Boost Software License, Version 1.0. (See accompanying
# file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_CROSSCOMPILING ON)
# Set the gcc Compiler
set(CMAKE_CXX_COMPILER arm-linux-gnueabihf-g++-4.8)
set(CMAKE_C_COMPILER arm-linux-gnueabihf-gcc-4.8)
set(HPX_WITH_GENERIC_CONTEXT_COROUTINES ON CACHE BOOL "enable generic coroutines")
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
BGION-gcc
# Copyright (c) 2014 John Biddiscombe
#
# Distributed under the Boost Software License, Version 1.0. (See accompanying
# file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
# This is the default toolchain file to be used with CNK on a BlueGene/Q. It sets
# the appropriate compile flags and compiler such that HPX will compile.
# Note that you still need to provide Boost, hwloc and other utility libraries
# like a custom allocator yourself.
#
# Usage : cmake -DCMAKE_TOOLCHAIN_FILE=~/src/hpx/cmake/toolchains/BGION-gcc.cmake ~/src/hpx
#
set(CMAKE_SYSTEM_NAME Linux)
# Set the gcc Compiler
set(CMAKE_CXX_COMPILER g++)
set(CMAKE_C_COMPILER gcc)
#set(CMAKE_Fortran_COMPILER)
# Add flags we need for BGAS compilation
set(CMAKE_CXX_FLAGS_INIT
  "-D__powerpc__ -D__bgion__ -I/gpfs/bbp.cscs.ch/home/biddisco/src/bgas/rdmahelper "
  CACHE STRING "Initial compiler flags used to compile for BGAS"
)
# the V1R2M2 includes are necessary for some hardware specific features
#-DHPX_SMALL_STACK_SIZE=0x200000 -DHPX_MEDIUM_STACK_SIZE=0x200000 -DHPX_LARGE_STACK_SIZE=0x200000 -DHPX_HUGE_STACK_SIZE=0x200000
set(CMAKE_EXE_LINKER_FLAGS_INIT "-L/gpfs/bbp.cscs.ch/apps/bgas/tools/gcc/gcc-4.8.2/install/lib64 -latomic -lrt" CACHE STRING "BGAS flags")
set(CMAKE_C_FLAGS_INIT "-D__powerpc__ -I/gpfs/bbp.cscs.ch/home/biddisco/src/bgas/rdmahelper" CACHE STRING "BGAS flags")
# We do not perform cross compilation here ...
set(CMAKE_CROSSCOMPILING OFF)
# Set our platform name
set(HPX_PLATFORM "native")
# Disable generic coroutines (and use posix version)
set(HPX_WITH_GENERIC_CONTEXT_COROUTINES OFF CACHE BOOL "disable generic coroutines")
# BGAS nodes support ibverbs
set(HPX_WITH_PARCELPORT_IBVERBS ON CACHE BOOL "")
# Always disable the tcp parcelport as it is non-functional on the BGQ.
set(HPX_WITH_PARCELPORT_TCP ON CACHE BOOL "")
# Always enable the tcp parcelport as it is currently the only way to communicate on the BGQ.
set(HPX_WITH_PARCELPORT_MPI ON CACHE BOOL "")
# We have a bunch of cores on the A2 processor ...
set(HPX_WITH_MAX_CPU_COUNT "64" CACHE STRING "")
# We have no custom malloc yet
if(NOT DEFINED HPX_WITH_MALLOC)
  set(HPX_WITH_MALLOC "system" CACHE STRING "")
endif()
set(HPX_HIDDEN_VISIBILITY OFF CACHE BOOL "")
#
# Convenience setup for jb @ bbpbg2.cscs.ch
#
set(BOOST_ROOT "/gpfs/bbp.cscs.ch/home/biddisco/apps/gcc-4.8.2/boost_1_56_0")
set(HWLOC_ROOT "/gpfs/bbp.cscs.ch/home/biddisco/apps/gcc-4.8.2/hwloc-1.8.1")
set(CMAKE_BUILD_TYPE "Debug" CACHE STRING "Default build")
#
# Testing flags
#
set(BUILD_TESTING                  ON  CACHE BOOL "Testing enabled by default")
set(HPX_WITH_TESTS                ON  CACHE BOOL "Testing enabled by default")
set(HPX_WITH_TESTS_BENCHMARKS     ON  CACHE BOOL "Testing enabled by default")
set(HPX_WITH_TESTS_REGRESSIONS    ON  CACHE BOOL "Testing enabled by default")
set(HPX_WITH_TESTS_UNIT           ON  CACHE BOOL "Testing enabled by default")
set(HPX_WITH_TESTS_EXAMPLES       ON  CACHE BOOL "Testing enabled by default")
set(HPX_WITH_TESTS_EXTERNAL_BUILD OFF CACHE BOOL "Turn off build of cmake build tests")
set(DART_TESTING_TIMEOUT           45  CACHE STRING "Life is too short")
# HPX_WITH_STATIC_LINKING
BGQ
# Copyright (c) 2014 Thomas Heller
#
# Distributed under the Boost Software License, Version 1.0. (See accompanying
# file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
#
# This is the default toolchain file to be used with CNK on a BlueGene/Q. It sets
# the appropriate compile flags and compiler such that HPX will compile.
# Note that you still need to provide Boost, hwloc and other utility libraries
# like a custom allocator yourself.
#
set(CMAKE_SYSTEM_NAME Linux)
# Set the Intel Compiler
set(CMAKE_CXX_COMPILER bgclang++11)
set(CMAKE_C_COMPILER bgclang)
#set(CMAKE_Fortran_COMPILER)
set(MPI_CXX_COMPILER mpiclang++11)
set(MPI_C_COMPILER mpiclang)
#set(MPI_Fortran_COMPILER)
set(CMAKE_C_FLAGS_INIT "" CACHE STRING "")
set(CMAKE_C_COMPILE_OBJECT "<CMAKE_C_COMPILER> -fPIC <DEFINES> <FLAGS> -o <OBJECT> -c <SOURCE>" CACHE STRING "")
set(CMAKE_C_LINK_EXECUTABLE "<CMAKE_C_COMPILER> -fPIC -dynamic <FLAGS> <CMAKE_C_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>" CACHE STRING "")
set(CMAKE_C_CREATE_SHARED_LIBRARY "<CMAKE_C_COMPILER> -fPIC -shared <CMAKE_SHARED_LIBRARY_CXX_FLAGS> <LANGUAGE_COMPILE_FLAGS> <LINK_FLAGS> <CMAKE_SHARED_LIBRARY_CREATE_CXX_FLAGS> <SONAME_FLAG><TARGET_SONAME> -o <TARGET> <OBJECTS> <LINK_LIBRARIES> " CACHE STRING "")
set(CMAKE_CXX_FLAGS_INIT "" CACHE STRING "")
set(CMAKE_CXX_COMPILE_OBJECT "<CMAKE_CXX_COMPILER> -fPIC <DEFINES> <FLAGS> -o <OBJECT> -c <SOURCE>" CACHE STRING "")
set(CMAKE_CXX_LINK_EXECUTABLE "<CMAKE_CXX_COMPILER> -fPIC -dynamic <FLAGS> <CMAKE_CXX_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>" CACHE STRING "")
set(CMAKE_CXX_CREATE_SHARED_LIBRARY "<CMAKE_CXX_COMPILER> -fPIC -shared <CMAKE_SHARED_LIBRARY_CXX_FLAGS> <LANGUAGE_COMPILE_FLAGS> <LINK_FLAGS> <CMAKE_SHARED_LIBRARY_CREATE_CXX_FLAGS> <SONAME_FLAG><TARGET_SONAME> -o <TARGET> <OBJECTS> <LINK_LIBRARIES>" CACHE STRING "")
set(CMAKE_Fortran_FLAGS_INIT "" CACHE STRING "")
set(CMAKE_Fortran_COMPILE_OBJECT "<CMAKE_Fortran_COMPILER> -fPIC <DEFINES> <FLAGS> -o <OBJECT> -c <SOURCE>" CACHE STRING "")
set(CMAKE_Fortran_LINK_EXECUTABLE "<CMAKE_Fortran_COMPILER> -fPIC -dynamic <FLAGS> <CMAKE_Fortran_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>")
set(CMAKE_Fortran_CREATE_SHARED_LIBRARY "<CMAKE_Fortran_COMPILER> -fPIC -shared <CMAKE_SHARED_LIBRARY_Fortran_FLAGS> <LANGUAGE_COMPILE_FLAGS> <LINK_FLAGS> <CMAKE_SHARED_LIBRARY_CREATE_Fortran_FLAGS> <SONAME_FLAG><TARGET_SONAME> -o <TARGET> <OBJECTS> <LINK_LIBRARIES> " CACHE STRING "")
# Disable searches in the default system paths. We are cross compiling after all
# and cmake might pick up wrong libraries that way
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM BOTH)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
# We do a cross compilation here ...
set(CMAKE_CROSSCOMPILING ON)
# Set our platform name
set(HPX_PLATFORM "BlueGeneQ")
# Always disable the ibverbs parcelport as it is non-functional on the BGQ.
set(HPX_WITH_IBVERBS_PARCELPORT OFF)
# Always disable the tcp parcelport as it is non-functional on the BGQ.
set(HPX_WITH_TCP_PARCELPORT OFF)
# Always enable the tcp parcelport as it is currently the only way to communicate on the BGQ.
set(HPX_WITH_MPI_PARCELPORT ON)
# We have a bunch of cores on the BGQ ...
set(HPX_WITH_MAX_CPU_COUNT "64")
# We default to tbbmalloc as our allocator on the MIC
if(NOT DEFINED HPX_WITH_MALLOC)
  set(HPX_WITH_MALLOC "system" CACHE STRING "")
endif()
Cray
# Copyright (c) 2014 Thomas Heller
#
# Distributed under the Boost Software License, Version 1.0. (See accompanying
# file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
#
# This is the default toolchain file to be used with Intel Xeon PHIs. It sets
# the appropriate compile flags and compiler such that HPX will compile.
# Note that you still need to provide Boost, hwloc and other utility libraries
# like a custom allocator yourself.
#
#set(CMAKE_SYSTEM_NAME Cray-CNK-Intel)
if(HPX_WITH_STATIC_LINKING)
  set_property(GLOBAL PROPERTY TARGET_SUPPORTS_SHARED_LIBS FALSE)
else()
endif()
# Set the Cray Compiler Wrapper
set(CMAKE_CXX_COMPILER CC)
set(CMAKE_C_COMPILER cc)
set(CMAKE_Fortran_COMPILER ftn)
if (CMAKE_VERSION VERSION_GREATER 3.3.9)
  set(__includes "<INCLUDES>")
endif()
set(CMAKE_C_FLAGS_INIT "" CACHE STRING "")
set(CMAKE_SHARED_LIBRARY_C_FLAGS "-fPIC -shared" CACHE STRING "")
set(CMAKE_SHARED_LIBRARY_CREATE_C_FLAGS "-fPIC -shared" CACHE STRING "")
set(CMAKE_C_COMPILE_OBJECT "<CMAKE_C_COMPILER> -shared -fPIC <DEFINES> ${__includes} <FLAGS> -o <OBJECT> -c <SOURCE>" CACHE STRING "")
set(CMAKE_C_LINK_EXECUTABLE "<CMAKE_C_COMPILER> -fPIC -dynamic <FLAGS> <CMAKE_C_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>" CACHE STRING "")
set(CMAKE_C_CREATE_SHARED_LIBRARY "<CMAKE_C_COMPILER> -fPIC -shared <CMAKE_SHARED_LIBRARY_CXX_FLAGS> <LANGUAGE_COMPILE_FLAGS> <LINK_FLAGS> <CMAKE_SHARED_LIBRARY_CREATE_CXX_FLAGS> <SONAME_FLAG><TARGET_SONAME> -o <TARGET> <OBJECTS> <LINK_LIBRARIES> " CACHE STRING "")
set(CMAKE_CXX_FLAGS_INIT "" CACHE STRING "")
set(CMAKE_SHARED_LIBRARY_CXX_FLAGS "-fPIC -shared" CACHE STRING "")
set(CMAKE_SHARED_LIBRARY_CREATE_CXX_FLAGS "-fPIC -shared" CACHE STRING "")
set(CMAKE_SHARED_LIBRARY_CREATE_CXX_FLAGS "-fPIC -shared" CACHE STRING "")
set(CMAKE_CXX_COMPILE_OBJECT "<CMAKE_CXX_COMPILER> -shared -fPIC <DEFINES> ${__includes} <FLAGS> -o <OBJECT> -c <SOURCE>" CACHE STRING "")
set(CMAKE_CXX_LINK_EXECUTABLE "<CMAKE_CXX_COMPILER> -fPIC -dynamic <FLAGS> <CMAKE_CXX_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>" CACHE STRING "")
set(CMAKE_CXX_CREATE_SHARED_LIBRARY "<CMAKE_CXX_COMPILER> -fPIC -shared <CMAKE_SHARED_LIBRARY_CXX_FLAGS> <LANGUAGE_COMPILE_FLAGS> <LINK_FLAGS> <CMAKE_SHARED_LIBRARY_CREATE_CXX_FLAGS> <SONAME_FLAG><TARGET_SONAME> -o <TARGET> <OBJECTS> <LINK_LIBRARIES>" CACHE STRING "")
set(CMAKE_Fortran_FLAGS_INIT "" CACHE STRING "")
set(CMAKE_SHARED_LIBRARY_Fortran_FLAGS "-fPIC" CACHE STRING "")
set(CMAKE_SHARED_LIBRARY_CREATE_Fortran_FLAGS "-shared" CACHE STRING "")
set(CMAKE_Fortran_COMPILE_OBJECT "<CMAKE_Fortran_COMPILER> -shared -fPIC <DEFINES> ${__includes} <FLAGS> -o <OBJECT> -c <SOURCE>" CACHE STRING "")
set(CMAKE_Fortran_LINK_EXECUTABLE "<CMAKE_Fortran_COMPILER> -fPIC -dynamic <FLAGS> <CMAKE_Fortran_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>")
set(CMAKE_Fortran_CREATE_SHARED_LIBRARY "<CMAKE_Fortran_COMPILER> -fPIC -shared <CMAKE_SHARED_LIBRARY_Fortran_FLAGS> <LANGUAGE_COMPILE_FLAGS> <LINK_FLAGS> <CMAKE_SHARED_LIBRARY_CREATE_Fortran_FLAGS> <SONAME_FLAG><TARGET_SONAME> -o <TARGET> <OBJECTS> <LINK_LIBRARIES> " CACHE STRING "")
# Disable searches in the default system paths. We are cross compiling after all
# and cmake might pick up wrong libraries that way
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM BOTH)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
set(HPX_WITH_PARCELPORT_TCP ON CACHE BOOL "")
set(HPX_WITH_PARCELPORT_MPI ON CACHE BOOL "")
set(HPX_WITH_PARCELPORT_MPI_MULTITHREADED OFF CACHE BOOL "")
set(HPX_WITH_PARCELPORT_LIBFABRIC ON CACHE BOOL "")
set(HPX_PARCELPORT_LIBFABRIC_PROVIDER "gni" CACHE STRING
  "See libfabric docs for details, gni,verbs,psm2 etc etc")
set(HPX_PARCELPORT_LIBFABRIC_THROTTLE_SENDS "256" CACHE STRING
  "Max number of messages in flight at once")
set(HPX_PARCELPORT_LIBFABRIC_WITH_DEV_MODE OFF CACHE BOOL
  "Custom libfabric logging flag")
set(HPX_PARCELPORT_LIBFABRIC_WITH_LOGGING  OFF CACHE BOOL
  "Libfabric parcelport logging on/off flag")
set(HPX_WITH_ZERO_COPY_SERIALIZATION_THRESHOLD "4096" CACHE STRING
  "The threshhold in bytes to when perform zero copy optimizations (default: 128)")
# We do a cross compilation here ...
set(CMAKE_CROSSCOMPILING ON CACHE BOOL "")
CrayKNL
# Copyright (c) 2014 Thomas Heller
#
# Distributed under the Boost Software License, Version 1.0. (See accompanying
# file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
#
# This is the default toolchain file to be used with Intel Xeon PHIs. It sets
# the appropriate compile flags and compiler such that HPX will compile.
# Note that you still need to provide Boost, hwloc and other utility libraries
# like a custom allocator yourself.
#
if(HPX_WITH_STATIC_LINKING)
  set_property(GLOBAL PROPERTY TARGET_SUPPORTS_SHARED_LIBS FALSE)
else()
endif()
# Set the Cray Compiler Wrapper
set(CMAKE_CXX_COMPILER CC)
set(CMAKE_C_COMPILER cc)
set(CMAKE_Fortran_COMPILER ftn)
if (CMAKE_VERSION VERSION_GREATER 3.3.9)
  set(__includes "<INCLUDES>")
endif()
set(CMAKE_C_FLAGS_INIT "" CACHE STRING "")
set(CMAKE_SHARED_LIBRARY_C_FLAGS "-fPIC -shared" CACHE STRING "")
set(CMAKE_SHARED_LIBRARY_CREATE_C_FLAGS "-fPIC -shared" CACHE STRING "")
set(CMAKE_C_COMPILE_OBJECT "<CMAKE_C_COMPILER> -shared -fPIC <DEFINES> ${__includes} <FLAGS> -o <OBJECT> -c <SOURCE>" CACHE STRING "")
set(CMAKE_C_LINK_EXECUTABLE "<CMAKE_C_COMPILER> -fPIC <FLAGS> <CMAKE_C_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>" CACHE STRING "")
set(CMAKE_C_CREATE_SHARED_LIBRARY "<CMAKE_C_COMPILER> -fPIC -shared <CMAKE_SHARED_LIBRARY_CXX_FLAGS> <LANGUAGE_COMPILE_FLAGS> <LINK_FLAGS> <CMAKE_SHARED_LIBRARY_CREATE_CXX_FLAGS> <SONAME_FLAG><TARGET_SONAME> -o <TARGET> <OBJECTS> <LINK_LIBRARIES> " CACHE STRING "")
#
set(CMAKE_CXX_FLAGS_INIT "" CACHE STRING "")
set(CMAKE_SHARED_LIBRARY_CXX_FLAGS "-fPIC -shared" CACHE STRING "")
set(CMAKE_SHARED_LIBRARY_CREATE_CXX_FLAGS "-fPIC -shared" CACHE STRING "")
set(CMAKE_SHARED_LIBRARY_CREATE_CXX_FLAGS "-fPIC -shared" CACHE STRING "")
set(CMAKE_CXX_COMPILE_OBJECT "<CMAKE_CXX_COMPILER> -shared -fPIC <DEFINES> ${__includes} <FLAGS> -o <OBJECT> -c <SOURCE>" CACHE STRING "")
set(CMAKE_CXX_LINK_EXECUTABLE "<CMAKE_CXX_COMPILER> -fPIC -dynamic <FLAGS> <CMAKE_CXX_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>" CACHE STRING "")
set(CMAKE_CXX_CREATE_SHARED_LIBRARY "<CMAKE_CXX_COMPILER> -fPIC -shared <CMAKE_SHARED_LIBRARY_CXX_FLAGS> <LANGUAGE_COMPILE_FLAGS> <LINK_FLAGS> <CMAKE_SHARED_LIBRARY_CREATE_CXX_FLAGS> <SONAME_FLAG><TARGET_SONAME> -o <TARGET> <OBJECTS> <LINK_LIBRARIES>" CACHE STRING "")
#
set(CMAKE_Fortran_FLAGS_INIT "" CACHE STRING "")
set(CMAKE_SHARED_LIBRARY_Fortran_FLAGS "-fPIC" CACHE STRING "")
set(CMAKE_SHARED_LIBRARY_CREATE_Fortran_FLAGS "-shared" CACHE STRING "")
set(CMAKE_Fortran_COMPILE_OBJECT "<CMAKE_Fortran_COMPILER> -shared -fPIC <DEFINES> ${__includes} <FLAGS> -o <OBJECT> -c <SOURCE>" CACHE STRING "")
set(CMAKE_Fortran_LINK_EXECUTABLE "<CMAKE_Fortran_COMPILER> -fPIC <FLAGS> <CMAKE_Fortran_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>")
set(CMAKE_Fortran_CREATE_SHARED_LIBRARY "<CMAKE_Fortran_COMPILER> -fPIC -shared <CMAKE_SHARED_LIBRARY_Fortran_FLAGS> <LANGUAGE_COMPILE_FLAGS> <LINK_FLAGS> <CMAKE_SHARED_LIBRARY_CREATE_Fortran_FLAGS> <SONAME_FLAG><TARGET_SONAME> -o <TARGET> <OBJECTS> <LINK_LIBRARIES> " CACHE STRING "")
#
# Disable searches in the default system paths. We are cross compiling after all
# and cmake might pick up wrong libraries that way
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM BOTH)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
set(HPX_WITH_PARCELPORT_TCP ON CACHE BOOL "")
set(HPX_WITH_PARCELPORT_MPI ON CACHE BOOL "")
set(HPX_WITH_PARCELPORT_MPI_MULTITHREADED OFF CACHE BOOL "")
set(HPX_WITH_PARCELPORT_LIBFABRIC ON CACHE BOOL "")
set(HPX_PARCELPORT_LIBFABRIC_PROVIDER "gni" CACHE STRING
  "See libfabric docs for details, gni,verbs,psm2 etc etc")
set(HPX_PARCELPORT_LIBFABRIC_THROTTLE_SENDS "256" CACHE STRING
  "Max number of messages in flight at once")
set(HPX_PARCELPORT_LIBFABRIC_WITH_DEV_MODE OFF CACHE BOOL
  "Custom libfabric logging flag")
set(HPX_PARCELPORT_LIBFABRIC_WITH_LOGGING  OFF CACHE BOOL
  "Libfabric parcelport logging on/off flag")
set(HPX_WITH_ZERO_COPY_SERIALIZATION_THRESHOLD "4096" CACHE STRING
  "The threshhold in bytes to when perform zero copy optimizations (default: 128)")
# Set the TBBMALLOC_PLATFORM correctly so that find_package(TBBMalloc) sets the
# right hints
set(TBBMALLOC_PLATFORM "mic-knl" CACHE STRING "")
# We have a bunch of cores on the MIC ... increase the default
set(HPX_WITH_MAX_CPU_COUNT "512" CACHE STRING "")
# We do a cross compilation here ...
set(CMAKE_CROSSCOMPILING ON CACHE BOOL "")
# RDTSCP is available on Xeon/Phis
set(HPX_WITH_RDTSCP ON CACHE BOOL "")
CrayKNLStatic
# Copyright (c) 2014-2017 Thomas Heller
# Copyright (c) 2017      Bryce Adelstein Lelbach
#
# Distributed under the Boost Software License, Version 1.0. (See accompanying
# file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
set(HPX_WITH_STATIC_LINKING ON CACHE BOOL "")
set(HPX_WITH_STATIC_EXE_LINKING ON CACHE BOOL "")
set_property(GLOBAL PROPERTY TARGET_SUPPORTS_SHARED_LIBS FALSE)
# Set the Cray Compiler Wrapper
set(CMAKE_CXX_COMPILER CC)
set(CMAKE_C_COMPILER cc)
set(CMAKE_Fortran_COMPILER ftn)
if (CMAKE_VERSION VERSION_GREATER 3.3.9)
  set(__includes "<INCLUDES>")
endif()
set(CMAKE_C_FLAGS_INIT "" CACHE STRING "")
set(CMAKE_C_COMPILE_OBJECT "<CMAKE_C_COMPILER> -static -fPIC <DEFINES> ${__includes} <FLAGS> -o <OBJECT> -c <SOURCE>" CACHE STRING "")
set(CMAKE_C_LINK_EXECUTABLE "<CMAKE_C_COMPILER> -fPIC <FLAGS> <CMAKE_C_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>" CACHE STRING "")
set(CMAKE_CXX_FLAGS_INIT "" CACHE STRING "")
set(CMAKE_CXX_COMPILE_OBJECT "<CMAKE_CXX_COMPILER> -static -fPIC <DEFINES> ${__includes} <FLAGS> -o <OBJECT> -c <SOURCE>" CACHE STRING "")
set(CMAKE_CXX_LINK_EXECUTABLE "<CMAKE_CXX_COMPILER> -fPIC <FLAGS> <CMAKE_CXX_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>" CACHE STRING "")
set(CMAKE_Fortran_FLAGS_INIT "" CACHE STRING "")
set(CMAKE_Fortran_COMPILE_OBJECT "<CMAKE_Fortran_COMPILER> -static -fPIC <DEFINES> ${__includes} <FLAGS> -o <OBJECT> -c <SOURCE>" CACHE STRING "")
set(CMAKE_Fortran_LINK_EXECUTABLE "<CMAKE_Fortran_COMPILER> -fPIC <FLAGS> <CMAKE_Fortran_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>")
# Disable searches in the default system paths. We are cross compiling after all
# and cmake might pick up wrong libraries that way
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM BOTH)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
set(HPX_WITH_PARCELPORT_TCP ON CACHE BOOL "")
set(HPX_WITH_PARCELPORT_MPI ON CACHE BOOL "")
set(HPX_WITH_PARCELPORT_MPI_MULTITHREADED ON CACHE BOOL "")
set(HPX_WITH_PARCELPORT_LIBFABRIC ON CACHE BOOL "")
set(HPX_PARCELPORT_LIBFABRIC_PROVIDER "gni" CACHE STRING
  "See libfabric docs for details, gni,verbs,psm2 etc etc")
set(HPX_PARCELPORT_LIBFABRIC_THROTTLE_SENDS "256" CACHE STRING
  "Max number of messages in flight at once")
set(HPX_PARCELPORT_LIBFABRIC_WITH_DEV_MODE OFF CACHE BOOL
  "Custom libfabric logging flag")
set(HPX_PARCELPORT_LIBFABRIC_WITH_LOGGING  OFF CACHE BOOL
  "Libfabric parcelport logging on/off flag")
set(HPX_WITH_ZERO_COPY_SERIALIZATION_THRESHOLD "4096" CACHE STRING
  "The threshhold in bytes to when perform zero copy optimizations (default: 128)")
# Set the TBBMALLOC_PLATFORM correctly so that find_package(TBBMalloc) sets the
# right hints
set(TBBMALLOC_PLATFORM "mic-knl" CACHE STRING "")
# We have a bunch of cores on the MIC ... increase the default
set(HPX_WITH_MAX_CPU_COUNT "512" CACHE STRING "")
# We do a cross compilation here ...
set(CMAKE_CROSSCOMPILING ON CACHE BOOL "")
# RDTSCP is available on Xeon/Phis
set(HPX_WITH_RDTSCP ON CACHE BOOL "")
CrayStatic
# Copyright (c) 2014-2017 Thomas Heller
# Copyright (c) 2017      Bryce Adelstein Lelbach
#
# Distributed under the Boost Software License, Version 1.0. (See accompanying
# file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
set(HPX_WITH_STATIC_LINKING ON CACHE BOOL "")
set(HPX_WITH_STATIC_EXE_LINKING ON CACHE BOOL "")
set_property(GLOBAL PROPERTY TARGET_SUPPORTS_SHARED_LIBS FALSE)
# Set the Cray Compiler Wrapper
set(CMAKE_CXX_COMPILER CC)
set(CMAKE_C_COMPILER cc)
set(CMAKE_Fortran_COMPILER ftn)
if (CMAKE_VERSION VERSION_GREATER 3.3.9)
  set(__includes "<INCLUDES>")
endif()
set(CMAKE_C_FLAGS_INIT "" CACHE STRING "")
set(CMAKE_C_COMPILE_OBJECT "<CMAKE_C_COMPILER> -static -fPIC <DEFINES> ${__includes} <FLAGS> -o <OBJECT> -c <SOURCE>" CACHE STRING "")
set(CMAKE_C_LINK_EXECUTABLE "<CMAKE_C_COMPILER> -fPIC <FLAGS> <CMAKE_C_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>" CACHE STRING "")
set(CMAKE_CXX_FLAGS_INIT "" CACHE STRING "")
set(CMAKE_CXX_COMPILE_OBJECT "<CMAKE_CXX_COMPILER> -static -fPIC <DEFINES> ${__includes} <FLAGS> -o <OBJECT> -c <SOURCE>" CACHE STRING "")
set(CMAKE_CXX_LINK_EXECUTABLE "<CMAKE_CXX_COMPILER> -fPIC <FLAGS> <CMAKE_CXX_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>" CACHE STRING "")
set(CMAKE_Fortran_FLAGS_INIT "" CACHE STRING "")
set(CMAKE_Fortran_COMPILE_OBJECT "<CMAKE_Fortran_COMPILER> -static -fPIC <DEFINES> ${__includes} <FLAGS> -o <OBJECT> -c <SOURCE>" CACHE STRING "")
set(CMAKE_Fortran_LINK_EXECUTABLE "<CMAKE_Fortran_COMPILER> -fPIC <FLAGS> <CMAKE_Fortran_LINK_FLAGS> <LINK_FLAGS> <OBJECTS> -o <TARGET> <LINK_LIBRARIES>")
# Disable searches in the default system paths. We are cross compiling after all
# and cmake might pick up wrong libraries that way
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM BOTH)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
# We do a cross compilation here ...
set(CMAKE_CROSSCOMPILING ON CACHE BOOL "")
# RDTSCP is available on Xeon/Phis
set(HPX_WITH_RDTSCP ON CACHE BOOL "")
set(HPX_WITH_PARCELPORT_TCP ON CACHE BOOL "")
set(HPX_WITH_PARCELPORT_MPI ON CACHE BOOL "")
set(HPX_WITH_PARCELPORT_MPI_MULTITHREADED ON CACHE BOOL "")
set(HPX_WITH_PARCELPORT_LIBFABRIC ON CACHE BOOL "")
set(HPX_PARCELPORT_LIBFABRIC_PROVIDER "gni" CACHE STRING
  "See libfabric docs for details, gni,verbs,psm2 etc etc")
set(HPX_PARCELPORT_LIBFABRIC_THROTTLE_SENDS "256" CACHE STRING
  "Max number of messages in flight at once")
set(HPX_PARCELPORT_LIBFABRIC_WITH_DEV_MODE OFF CACHE BOOL
  "Custom libfabric logging flag")
set(HPX_PARCELPORT_LIBFABRIC_WITH_LOGGING  OFF CACHE BOOL
  "Libfabric parcelport logging on/off flag")
set(HPX_WITH_ZERO_COPY_SERIALIZATION_THRESHOLD "4096" CACHE STRING
  "The threshhold in bytes to when perform zero copy optimizations (default: 128)")
XeonPhi
# Copyright (c) 2014 Thomas Heller
#
# Distributed under the Boost Software License, Version 1.0. (See accompanying
# file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
#
# This is the default toolchain file to be used with Intel Xeon PHIs. It sets
# the appropriate compile flags and compiler such that HPX will compile.
# Note that you still need to provide Boost, hwloc and other utility libraries
# like a custom allocator yourself.
#
set(CMAKE_SYSTEM_NAME Linux)
# Set the Intel Compiler
set(CMAKE_CXX_COMPILER icpc)
set(CMAKE_C_COMPILER icc)
set(CMAKE_Fortran_COMPILER ifort)
# Add the -mmic compile flag such that everything will be compiled for the correct
# platform
set(CMAKE_CXX_FLAGS_INIT "-mmic" CACHE STRING "Initial compiler flags used to compile for the Xeon Phi")
set(CMAKE_C_FLAGS_INIT "-mmic" CACHE STRING "Initial compiler flags used to compile for the Xeon Phi")
set(CMAKE_Fortran_FLAGS_INIT "-mmic" CACHE STRING "Initial compiler flags used to compile for the Xeon Phi")
# Disable searches in the default system paths. We are cross compiling after all
# and cmake might pick up wrong libraries that way
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM BOTH)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
# We do a cross compilation here ...
set(CMAKE_CROSSCOMPILING ON)
# Set our platform name
set(HPX_PLATFORM "XeonPhi")
# Always disable the ibverbs parcelport as it is non-functional on the BGQ.
set(HPX_WITH_PARCELPORT_IBVERBS OFF CACHE BOOL "Enable the ibverbs based parcelport. This is currently an experimental feature")
# We have a bunch of cores on the MIC ... increase the default
set(HPX_WITH_MAX_CPU_COUNT "256" CACHE STRING "")
# We default to tbbmalloc as our allocator on the MIC
if(NOT DEFINED HPX_WITH_MALLOC)
  set(HPX_WITH_MALLOC "tbbmalloc" CACHE STRING "")
endif()
# Set the TBBMALLOC_PLATFORM correctly so that find_package(TBBMalloc) sets the
# right hints
set(TBBMALLOC_PLATFORM "mic" CACHE STRING "")
set(HPX_HIDDEN_VISIBILITY OFF CACHE BOOL "Use -fvisibility=hidden for builds on platforms which support it")
# RDTSC is available on Xeon/Phis
set(HPX_WITH_RDTSC ON CACHE BOOL "")
CMake variables used to configure HPX

In order to configure HPX, you can set a variety of options to allow cmake to generate your specific makefiles/project files.

Variables that influence how HPX is built

The options are split into these categories:

Generic options
HPX_WITH_AUTOMATIC_SERIALIZATION_REGISTRATION:BOOL

Use automatic serialization registration for actions and functions. This affects compatibility between HPX applications compiled with different compilers (default ON)

HPX_WITH_BENCHMARK_SCRIPTS_PATH:PATH

Directory to place batch scripts in

HPX_WITH_BUILD_BINARY_PACKAGE:BOOL

Build HPX on the build infrastructure on any LINUX distribution (default: OFF).

HPX_WITH_COMPILER_WARNINGS:BOOL

Enable compiler warnings (default: ON)

HPX_WITH_COMPILER_WARNINGS_AS_ERRORS:BOOL

Turn compiler warnings into errors (default: OFF)

HPX_WITH_COMPRESSION_BZIP2:BOOL

Enable bzip2 compression for parcel data (default: OFF).

HPX_WITH_COMPRESSION_SNAPPY:BOOL

Enable snappy compression for parcel data (default: OFF).

HPX_WITH_COMPRESSION_ZLIB:BOOL

Enable zlib compression for parcel data (default: OFF).

HPX_WITH_CUDA:BOOL

Enable CUDA support (default: OFF)

HPX_WITH_CUDA_CLANG:BOOL

Use clang to compile CUDA code (default: OFF)

HPX_WITH_CXX14_RETURN_TYPE_DEDUCTION:BOOL

Enable the use of auto as a return value in some places. Overriding this flag is only necessary if the C++ compiler is not standard compliant, e.g. nvcc.

HPX_WITH_DATAPAR_BOOST_SIMD:BOOL

Enable data parallel algorithm support using the external Boost.SIMD library (default: OFF)

HPX_WITH_DATAPAR_VC:BOOL

Enable data parallel algorithm support using the external Vc library (default: OFF)

HPX_WITH_DEPRECATION_WARNINGS:BOOL

Enable warnings for deprecated facilities. (default: ON)

HPX_WITH_DISABLED_SIGNAL_EXCEPTION_HANDLERS:BOOL

Disables the mechanism that produces debug output for caught signals and unhandled exceptions (default: OFF)

HPX_WITH_DYNAMIC_HPX_MAIN:BOOL

Enable dynamic overload of system main() (Linux only, default: ON)

HPX_WITH_FAULT_TOLERANCE:BOOL

Build HPX to tolerate failures of nodes, i.e. ignore errors in active communication channels (default: OFF)

HPX_WITH_FORTRAN:BOOL

Enable or disable the compilation of Fortran examples using HPX

HPX_WITH_FULL_RPATH:BOOL

Build and link HPX libraries and executables with full RPATHs (default: ON)

HPX_WITH_GCC_VERSION_CHECK:BOOL

Don’t ignore version reported by gcc (default: ON)

HPX_WITH_GENERIC_CONTEXT_COROUTINES:BOOL

Use Boost.Context as the underlying coroutines context switch implementation.

HPX_WITH_HCC:BOOL

Enable hcc support (default: OFF)

HPX_WITH_HIDDEN_VISIBILITY:BOOL

Use -fvisibility=hidden for builds on platforms which support it (default OFF)

HPX_WITH_INCLUSIVE_SCAN_COMPATIBILITY:BOOL

Enable old overloads for inclusive_scan (default: OFF)

HPX_WITH_LOGGING:BOOL

Build HPX with logging enabled (default: ON).

HPX_WITH_MALLOC:STRING

Define which allocator should be linked in. Options are: system, tcmalloc, jemalloc, tbbmalloc, and custom (default is: tcmalloc)

HPX_WITH_NATIVE_TLS:BOOL

Use native TLS support if available (default: ON)

HPX_WITH_NICE_THREADLEVEL:BOOL

Set HPX worker threads to have high NICE level (may impact performance) (default: OFF)

HPX_WITH_PARCEL_COALESCING:BOOL

Enable the parcel coalescing plugin (default: ON).

HPX_WITH_RUN_MAIN_EVERYWHERE:BOOL

Run hpx_main by default on all localities (default: OFF).

HPX_WITH_STACKOVERFLOW_DETECTION:BOOL

Enable stackoverflow detection for HPX threads/coroutines. (default: OFF, debug: ON)

HPX_WITH_STATIC_LINKING:BOOL

Compile HPX statically linked libraries (Default: OFF)

HPX_WITH_SYCL:BOOL

Enable sycl support (default: OFF)

HPX_WITH_THREAD_COMPATIBILITY:BOOL

Use a compatibility implementation of std::thread, i.e. fall back to Boost.Thread (default: OFF)

HPX_WITH_UNWRAPPED_COMPATIBILITY:BOOL

Enable the deprecated unwrapped function (default: OFF)

HPX_WITH_VIM_YCM:BOOL

Generate HPX completion file for VIM YouCompleteMe plugin

HPX_WITH_ZERO_COPY_SERIALIZATION_THRESHOLD:STRING

The threshhold in bytes to when perform zero copy optimizations (default: 128)

Build Targets options
HPX_WITH_COMPILE_ONLY_TESTS:BOOL

Create build system support for compile time only HPX tests (default ON)

HPX_WITH_DEFAULT_TARGETS:BOOL

Associate the core HPX library with the default build target (default: ON).

HPX_WITH_DOCUMENTATION:BOOL

Build the HPX documentation (default OFF).

HPX_WITH_DOCUMENTATION_OUTPUT_FORMATS:STRING

List of documentation output formats to generate. Valid options are html;singlehtml;latexpdf;man. Multiple values can be separated with semicolons. (default html).

HPX_WITH_EXAMPLES:BOOL

Build the HPX examples (default ON)

HPX_WITH_EXAMPLES_HDF5:BOOL

Enable examples requiring HDF5 support (default: OFF).

HPX_WITH_EXAMPLES_OPENMP:BOOL

Enable examples requiring OpenMP support (default: OFF).

HPX_WITH_EXAMPLES_QT4:BOOL

Enable examples requiring Qt4 support (default: OFF).

HPX_WITH_EXAMPLES_QTHREADS:BOOL

Enable examples requiring QThreads support (default: OFF).

HPX_WITH_EXAMPLES_TBB:BOOL

Enable examples requiring TBB support (default: OFF).

HPX_WITH_EXECUTABLE_PREFIX:STRING

Executable prefix (default none), ‘hpx_’ useful for system install.

HPX_WITH_FAIL_COMPILE_TESTS:BOOL

Create build system support for fail compile HPX tests (default ON)

HPX_WITH_IO_COUNTERS:BOOL

Build HPX runtime (default: ON)

HPX_WITH_PSEUDO_DEPENDENCIES:BOOL

Force creating pseudo targets and pseudo dependencies (default ON).

HPX_WITH_TESTS:BOOL

Build the HPX tests (default ON)

HPX_WITH_TESTS_BENCHMARKS:BOOL

Build HPX benchmark tests (default: ON)

HPX_WITH_TESTS_EXAMPLES:BOOL

Add HPX examples as tests (default: ON)

HPX_WITH_TESTS_EXTERNAL_BUILD:BOOL

Build external cmake build tests (default: ON)

HPX_WITH_TESTS_HEADERS:BOOL

Build HPX header tests (default: OFF)

HPX_WITH_TESTS_REGRESSIONS:BOOL

Build HPX regression tests (default: ON)

HPX_WITH_TESTS_UNIT:BOOL

Build HPX unit tests (default: ON)

HPX_WITH_TOOLS:BOOL

Build HPX tools (default: OFF)

Thread Manager options
HPX_SCHEDULER_MAX_TERMINATED_THREADS:STRING

Maximum number of terminated threads collected before those are cleaned up (default: 100)

HPX_WITH_IO_POOL:BOOL

Disable internal IO thread pool, do not change if not absolutely necessary (default: ON)

HPX_WITH_MAX_CPU_COUNT:STRING

HPX applications will not use more that this number of OS-Threads (empty string means dynamic) (default: 64)

HPX_WITH_MAX_NUMA_DOMAIN_COUNT:STRING

HPX applications will not run on machines with more NUMA domains (default: 8)

HPX_WITH_MORE_THAN_64_THREADS:BOOL

HPX applications will be able to run on more than 64 cores (default: OFF)

HPX_WITH_SCHEDULER_LOCAL_STORAGE:BOOL

Enable scheduler local storage for all HPX schedulers (default: OFF)

HPX_WITH_SPINLOCK_DEADLOCK_DETECTION:BOOL

Enable spinlock deadlock detection (default: OFF)

HPX_WITH_SPINLOCK_POOL_NUM:STRING

Number of elements a spinlock pool manages (default: 128)

HPX_WITH_STACKTRACES:BOOL

Attach backtraces to HPX exceptions (default: ON)

HPX_WITH_SWAP_CONTEXT_EMULATION:BOOL

Emulate SwapContext API for coroutines (default: OFF)

HPX_WITH_THREAD_BACKTRACE_DEPTH:STRING

Thread stack back trace depth being captured (default: 5)

HPX_WITH_THREAD_BACKTRACE_ON_SUSPENSION:BOOL

Enable thread stack back trace being captured on suspension (default: OFF)

HPX_WITH_THREAD_CREATION_AND_CLEANUP_RATES:BOOL

Enable measuring thread creation and cleanup times (default: OFF)

HPX_WITH_THREAD_CUMULATIVE_COUNTS:BOOL

Enable keeping track of cumulative thread counts in the schedulers (default: ON)

HPX_WITH_THREAD_IDLE_RATES:BOOL

Enable measuring the percentage of overhead times spent in the scheduler (default: OFF)

HPX_WITH_THREAD_LOCAL_STORAGE:BOOL

Enable thread local storage for all HPX threads (default: OFF)

HPX_WITH_THREAD_MANAGER_IDLE_BACKOFF:BOOL

HPX scheduler threads do exponential backoff on idle queues (default: ON)

HPX_WITH_THREAD_QUEUE_WAITTIME:BOOL

Enable collecting queue wait times for threads (default: OFF)

HPX_WITH_THREAD_SCHEDULERS:STRING

Which thread schedulers are built. Options are: all, abp-priority, local, static-priority, static, shared-priority. For multiple enabled schedulers, separate with a semicolon (default: all)

HPX_WITH_THREAD_STACK_MMAP:BOOL

Use mmap for stack allocation on appropriate platforms

HPX_WITH_THREAD_STEALING_COUNTS:BOOL

Enable keeping track of counts of thread stealing incidents in the schedulers (default: OFF)

HPX_WITH_THREAD_TARGET_ADDRESS:BOOL

Enable storing target address in thread for NUMA awareness (default: OFF)

HPX_WITH_TIMER_POOL:BOOL

Disable internal timer thread pool, do not change if not absolutely necessary (default: ON)

AGAS options
HPX_WITH_AGAS_DUMP_REFCNT_ENTRIES:BOOL

Enable dumps of the AGAS refcnt tables to logs (default: OFF)

Parcelport options
HPX_WITH_NETWORKING:BOOL

Enable support for networking and multi-node runs (default: ON)

HPX_WITH_PARCELPORT_ACTION_COUNTERS:BOOL

Enable performance counters reporting parcelport statistics on a per-action basis.

HPX_WITH_PARCELPORT_LIBFABRIC:BOOL

Enable the libfabric based parcelport. This is currently an experimental feature

HPX_WITH_PARCELPORT_MPI:BOOL

Enable the MPI based parcelport.

HPX_WITH_PARCELPORT_MPI_ENV:STRING

List of environment variables checked to detect MPI (default: MV2_COMM_WORLD_RANK;PMI_RANK;OMPI_COMM_WORLD_SIZE;ALPS_APP_PE;PMIX_RANK).

HPX_WITH_PARCELPORT_MPI_MULTITHREADED:BOOL

Turn on MPI multithreading support (default: ON).

HPX_WITH_PARCELPORT_TCP:BOOL

Enable the TCP based parcelport.

HPX_WITH_PARCELPORT_VERBS:BOOL

Enable the ibverbs based parcelport. This is currently an experimental feature

HPX_WITH_PARCEL_PROFILING:BOOL

Enable profiling data for parcels

Profiling options
HPX_WITH_APEX:BOOL

Enable APEX instrumentation support.

HPX_WITH_GOOGLE_PERFTOOLS:BOOL

Enable Google Perftools instrumentation support.

HPX_WITH_ITTNOTIFY:BOOL

Enable Amplifier (ITT) instrumentation support.

HPX_WITH_PAPI:BOOL

Enable the PAPI based performance counter.

Debugging options
HPX_WITH_ATTACH_DEBUGGER_ON_TEST_FAILURE:BOOL

Break the debugger if a test has failed (default: OFF)

HPX_WITH_SANITIZERS:BOOL

Configure with sanitizer instrumentation support.

HPX_WITH_TESTS_DEBUG_LOG:BOOL

Turn on debug logs (–hpx:debug-hpx-log) for tests (default: OFF)

HPX_WITH_TESTS_DEBUG_LOG_DESTINATION:STRING

Destination for test debug logs (default: cout)

HPX_WITH_THREAD_DEBUG_INFO:BOOL

Enable thread debugging information (default: OFF, implicitly enabled in debug builds)

HPX_WITH_THREAD_DESCRIPTION_FULL:BOOL

Use function address for thread description (default: OFF)

HPX_WITH_THREAD_GUARD_PAGE:BOOL

Enable thread guard page (default: ON)

HPX_WITH_VALGRIND:BOOL

Enable Valgrind instrumentation support.

HPX_WITH_VERIFY_LOCKS:BOOL

Enable lock verification code (default: OFF, implicitly enabled in debug builds)

HPX_WITH_VERIFY_LOCKS_BACKTRACE:BOOL

Enable thread stack back trace being captured on lock registration (to be used in combination with HPX_WITH_VERIFY_LOCKS=ON, default: OFF)

HPX_WITH_VERIFY_LOCKS_GLOBALLY:BOOL

Enable global lock verification code (default: OFF, implicitly enabled in debug builds)

Modules options
HPX_PREPROCESSOR_WITH_COMPATIBILITY_HEADERS:BOOL

Enable compatibility headers for old headers

HPX_PREPROCESSOR_WITH_DEPRECATION_WARNINGS:BOOL

Enable warnings for deprecated facilities. (default: Off)

HPX_PREPROCESSOR_WITH_TESTS:BOOL

Build HPX preprocessor module tests. (default: ON)

Additional tools and libraries used by HPX

Here is a list of additional libraries and tools which are either optionally supported by the build system or are optionally required for certain examples or tests. These libraries and tools can be detected by the HPX build system.

Each of the tools or libraries listed here will be automatically detected if they are installed in some standard location. If a tool or library is installed in a different location you can specify its base directory by appending _ROOT to the variable name as listed below. For instance, to configure a custom directory for BOOST, specify BOOST_ROOT=/custom/boost/root.

BOOST_ROOT:PATH

Specifies where to look for the Boost installation to be used for compiling HPX Set this if CMake is not able to locate a suitable version of Boost The directory specified here can be either the root of a installed Boost distribution or the directory where you unpacked and built Boost without installing it (with staged libraries).

HWLOC_ROOT:PATH

Specifies where to look for the Portable Hardware Locality (HWLOC) library. Set this if CMake is not able to locate a suitable version of Portable Hardware Locality (HWLOC) Portable Hardware Locality (HWLOC) provides platform independent support for extracting information about the used hardware architecture (number of cores, number of NUMA domains, hyperthreading, etc.). HPX utilizes this information if available.

PAPI_ROOT:PATH

Specifies where to look for the Performance Application Programming Interface (PAPI) library. The PAPI library is necessary to compile a special component exposing PAPI hardware events and counters as HPX performance counters. This is not available on the Windows platform.

AMPLIFIER_ROOT:PATH

Specifies where to look for one of the tools of the Intel Parallel Studio(tm) product, either Intel Amplifier(tm) or Intel Inspector(tm). This should be set if the CMake variable HPX_USE_ITT_NOTIFY is set to ON. Enabling ITT support in HPX will integrate any application with the mentioned Intel tools, which customizes the generated information for your application and improves the generated diagnostics.

In addition, some of the examples may need the following variables:

HDF5_ROOT:PATH

Specifies where to look for the Hierarchical Data Format V5 (HDF5) include files and libraries.

Creating HPX projects

Using HPX with pkg-config
How to build HPX applications with pkg-config

After you are done installing HPX, you should be able to build the following program. It prints Hello World! on the locality you run it on.

//  Copyright (c) 2007-2012 Hartmut Kaiser
//
//  Distributed under the Boost Software License, Version 1.0. (See accompanying
//  file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

///////////////////////////////////////////////////////////////////////////////
// The purpose of this example is to execute a HPX-thread printing
// "Hello World!" once. That's all.

//[hello_world_1_getting_started
// Including 'hpx/hpx_main.hpp' instead of the usual 'hpx/hpx_init.hpp' enables
// to use the plain C-main below as the direct main HPX entry point.
#include <hpx/hpx_main.hpp>
#include <hpx/include/iostreams.hpp>

int main()
{
    // Say hello to the world!
    hpx::cout << "Hello World!\n" << hpx::flush;
    return 0;
}
//]

Copy the text of this program into a file called hello_world.cpp.

Now, in the directory where you put hello_world.cpp, issue the following commands (where $HPX_LOCATION is the build directory or CMAKE_INSTALL_PREFIX you used while building HPX):

export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$HPX_LOCATION/lib/pkgconfig
c++ -o hello_world hello_world.cpp \
   `pkg-config --cflags --libs hpx_application`\
    -lhpx_iostreams -DHPX_APPLICATION_NAME=hello_world

Important

When using pkg-config with HPX, the pkg-config flags must go after the -o flag.

Note

HPX libraries have different names in debug and release mode. If you want to link against a debug HPX library, you need to use the _debug suffix for the pkg-config name. That means instead of hpx_application or hpx_component you will have to use hpx_application_debug or hpx_component_debug Moreover, all referenced HPX components need to have a appended d suffix, e.g. instead of -lhpx_iostreams you will need to specify -lhpx_iostreamsd.

Important

If the HPX libraries are in a path that is not found by the dynamic linker. You need to add the path $HPX_LOCATION/lib to your linker search path (for example LD_LIBRARY_PATH on Linux).

To test the program, type:

./hello_world

which should print Hello World! and exit.

How to build HPX components with pkg-config

Let’s try a more complex example involving an HPX component. An HPX component is a class which exposes HPX actions. HPX components are compiled into dynamically loaded modules called component libraries. Here’s the source code:

hello_world_component.cpp

#include "hello_world_component.hpp"
#include <hpx/include/iostreams.hpp>

#include <iostream>

namespace examples { namespace server
{
    void hello_world::invoke()
    {
        hpx::cout << "Hello HPX World!" << std::endl;
    }
}}

HPX_REGISTER_COMPONENT_MODULE();

typedef hpx::components::component<
    examples::server::hello_world
> hello_world_type;

HPX_REGISTER_COMPONENT(hello_world_type, hello_world);

HPX_REGISTER_ACTION(
    examples::server::hello_world::invoke_action, hello_world_invoke_action);

hello_world_component.hpp

#if !defined(HELLO_WORLD_COMPONENT_HPP)
#define HELLO_WORLD_COMPONENT_HPP

#include <hpx/hpx.hpp>
#include <hpx/include/actions.hpp>
#include <hpx/include/lcos.hpp>
#include <hpx/include/components.hpp>
#include <hpx/include/serialization.hpp>

#include <utility>

namespace examples { namespace server
{
    struct HPX_COMPONENT_EXPORT hello_world
        : hpx::components::component_base<hello_world>
    {
        void invoke();
        HPX_DEFINE_COMPONENT_ACTION(hello_world, invoke);
    };
}}

HPX_REGISTER_ACTION_DECLARATION(
    examples::server::hello_world::invoke_action, hello_world_invoke_action);

namespace examples
{
    struct hello_world
      : hpx::components::client_base<hello_world, server::hello_world>
    {
        typedef hpx::components::client_base<hello_world, server::hello_world>
            base_type;

        hello_world(hpx::future<hpx::naming::id_type> && f)
          : base_type(std::move(f))
        {}

        hello_world(hpx::naming::id_type && f)
          : base_type(std::move(f))
        {}

        void invoke()
        {
            hpx::async<server::hello_world::invoke_action>(this->get_id()).get();
        }
    };
}

#endif // HELLO_WORLD_COMPONENT_HPP

hello_world_client.cpp

//  Copyright (c) 2012 Bryce Lelbach
//
//  Distributed under the Boost Software License, Version 1.0. (See accompanying
//  file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

//[hello_world_client_getting_started
#include "hello_world_component.hpp"
#include <hpx/hpx_init.hpp>

int hpx_main(boost::program_options::variables_map&)
{
    {
        // Create a single instance of the component on this locality.
        examples::hello_world client =
            hpx::new_<examples::hello_world>(hpx::find_here());

        // Invoke the component's action, which will print "Hello World!".
        client.invoke();
    }

    return hpx::finalize(); // Initiate shutdown of the runtime system.
}

int main(int argc, char* argv[])
{
    return hpx::init(argc, argv); // Initialize and run HPX.
}
//]

Copy the three source files above into three files (called hello_world_component.cpp, hello_world_component.hpp and hello_world_client.cpp respectively).

Now, in the directory where you put the files, run the following command to build the component library. (where $HPX_LOCATION is the build directory or CMAKE_INSTALL_PREFIX you used while building HPX):

export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$HPX_LOCATION/lib/pkgconfig
c++ -o libhpx_hello_world.so hello_world_component.cpp \
   `pkg-config --cflags --libs hpx_component` \
    -lhpx_iostreams -DHPX_COMPONENT_NAME=hpx_hello_world

Now pick a directory in which to install your HPX component libraries. For this example, we’ll choose a directory named my_hpx_libs:

mkdir ~/my_hpx_libs
mv libhpx_hello_world.so ~/my_hpx_libs

Note

HPX libraries have different names in debug and release mode. If you want to link against a debug HPX library, you need to use the _debug suffix for the pkg-config name. That means instead of hpx_application or hpx_component you will have to use hpx_application_debug or hpx_component_debug. Moreover, all referenced HPX components need to have a appended d suffix, e.g. instead of -lhpx_iostreams you will need to specify -lhpx_iostreamsd.

Important

If the HPX libraries are in a path that is not found by the dynamic linker. You need to add the path $HPX_LOCATION/lib to your linker search path (for example LD_LIBRARY_PATH on Linux).

Now, to build the application that uses this component (hello_world_client.cpp), we do:

export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$HPX_LOCATION/lib/pkgconfig
c++ -o hello_world_client hello_world_client.cpp \
   ``pkg-config --cflags --libs hpx_application``\
    -L${HOME}/my_hpx_libs -lhpx_hello_world -lhpx_iostreams

Important

When using pkg-config with HPX, the pkg-config flags must go after the -o flag.

Finally, you’ll need to set your LD_LIBRARY_PATH before you can run the program. To run the program, type:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$HOME/my_hpx_libs"
./hello_world_client

which should print Hello HPX World! and exit.

Using HPX with CMake-based projects

In Addition to the pkg-config support discussed on the previous pages, HPX comes with full CMake support. In order to integrate HPX into your existing, or new CMakeLists.txt you can leverage the find_package command integrated into CMake. Following is a Hello World component example using CMake.

Let’s revisit what we have. We have three files which compose our example application:

  • hello_world_component.hpp
  • hello_world_component.cpp
  • hello_world_client.hpp

The basic structure to include HPX into your CMakeLists.txt is shown here:

# Require a recent version of cmake
cmake_minimum_required(VERSION 3.3.2 FATAL_ERROR)

# This project is C++ based.
project(your_app CXX)

# Instruct cmake to find the HPX settings
find_package(HPX)

In order to have CMake find HPX, it needs to be told where to look for the HPXConfig.cmake file that is generated when HPX is built or installed, it is used by find_package(HPX) to set up all the necessary macros needed to use HPX in your project. The ways to achieve this are:

  • set the HPX_DIR cmake variable to point to the directory containing the HPXConfig.cmake script on the command line when you invoke cmake:

    cmake -DHPX_DIR=$HPX_LOCATION/lib/cmake/HPX ...
    

    where $HPX_LOCATION is the build directory or CMAKE_INSTALL_PREFIX you used when building/configuring HPX.

  • set the CMAKE_PREFIX_PATH variable to the root directory of your HPX build or install location on the command line when you invoke cmake:

    cmake -DCMAKE_PREFIX_PATH=$HPX_LOCATION ...
    

    the difference between CMAKE_PREFIX_PATH and HPX_DIR is that cmake will add common postfixes such as lib/cmake/<project to the MAKE_PREFIX_PATH and search in these locations too. Note that if your project uses HPX as well as other cmake managed projects, the paths to the locations of these multiple projects may be concatenated in the CMAKE_PREFIX_PATH.

  • The variables above may be set in the CMake GUI or curses ccmake interface instead of the command line.

Additionally, if you wish to require HPX for your project, replace the find_package(HPX) line with find_package(HPX REQUIRED).

You can check if HPX was successfully found with the HPX_FOUND CMake variable.

The simplest way to add the HPX component is to use the add_hpx_component macro and add it to the CMakeLists.txt file:

# build your application using HPX
add_hpx_component(hello_world
    SOURCES hello_world_component.cpp
    HEADERS hello_world_component.hpp
    COMPONENT_DEPENDENCIES iostreams)

Note

add_hpx_component adds a _component suffix to the target name. In the example above a hello_world_component target will be created.

The available options to add_hpx_component are:

  • SOURCES: The source files for that component
  • HEADERS: The header files for that component
  • DEPENDENCIES: Other libraries or targets this component depends on
  • COMPONENT_DEPENDENCIES: The components this component depends on
  • PLUGIN: Treat this component as a plugin-able library
  • COMPILE_FLAGS: Additional compiler flags
  • LINK_FLAGS: Additional linker flags
  • FOLDER: Add the headers and source files to this Source Group folder
  • EXCLUDE_FROM_ALL: Do not build this component as part of the all target

After adding the component, the way you add the executable is as follows:

# build your application using HPX
add_hpx_executable(hello_world
    ESSENTIAL
    SOURCES hello_world_client.cpp
    COMPONENT_DEPENDENCIES hello_world)

Note

add_hpx_executable automatically adds a _component suffix to dependencies specified in COMPONENT_DEPENDENCIES, meaning you can directly use the name given when adding a component using add_hpx_component.

When you configure your application, all you need to do is set the HPX_DIR variable to point to the installation of HPX!

Note

All library targets built with HPX are exported and readily available to be used as arguments to target_link_libraries in your targets. The HPX include directories are available with the HPX_INCLUDE_DIRS CMake variable.

CMake macros to integrate HPX into existing applications

In addition to the add_hpx_component and add_hpx_executable you can use the hpx_setup_target macro to have an already existing target to be used with the HPX libraries:

hpx_setup_target(target)

Optional parameters are:

  • EXPORT: Adds it to the CMake export list HPXTargets
  • INSTALL: Generates a install rule for the target
  • PLUGIN: Treat this component as a plugin-able library
  • TYPE: The type can be: EXECUTABLE, LIBRARY or COMPONENT
  • DEPENDENCIES: Other libraries or targets this component depends on
  • COMPONENT_DEPENDENCIES: The components this component depends on
  • COMPILE_FLAGS: Additional compiler flags
  • LINK_FLAGS: Additional linker flags

If you do not use CMake, you can still build against HPX but you should refer to the section on How to build HPX components with pkg-config.

Note

Since HPX relies on dynamic libraries, the dynamic linker needs to know where to look for them. If HPX isn’t installed into a path which is configured as a linker search path, external projects need to either set RPATH or adapt LD_LIBRARY_PATH to point to where the hpx libraries reside. In order to set RPATHs, you can include HPX_SetFullRPATH in your project after all libraries you want to link against have been added. Please also consult the CMake documentation here.

Using HPX with Makefile

A basic project building with HPX is through creating makefiles. The process of creating one can get complex depending upon the use of cmake parameter HPX_WITH_HPX_MAIN (which defaults to ON).

How to build HPX applications with makefile

If HPX is installed correctly, you should be able to build and run a simple hello world program. It prints Hello World! on the locality you run it on.

//  Copyright (c) 2007-2012 Hartmut Kaiser
//
//  Distributed under the Boost Software License, Version 1.0. (See accompanying
//  file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

///////////////////////////////////////////////////////////////////////////////
// The purpose of this example is to execute a HPX-thread printing
// "Hello World!" once. That's all.

//[hello_world_1_getting_started
// Including 'hpx/hpx_main.hpp' instead of the usual 'hpx/hpx_init.hpp' enables
// to use the plain C-main below as the direct main HPX entry point.
#include <hpx/hpx_main.hpp>
#include <hpx/include/iostreams.hpp>

int main()
{
    // Say hello to the world!
    hpx::cout << "Hello World!\n" << hpx::flush;
    return 0;
}
//]

Copy the content of this program into a file called hello_world.cpp.

Now in the directory where you put hello_world.cpp, create a Makefile. Add the following code:

CXX=(CXX)  # Add your favourite compiler here or let makefile choose default.

CXXFLAGS=-O3 -std=c++17

BOOST_ROOT=/path/to/boost
HWLOC_ROOT=/path/to/hwloc
TCMALLOC_ROOT=/path/to/tcmalloc
HPX_ROOT=/path/to/hpx

INCLUDE_DIRECTIVES=$(HPX_ROOT)/include $(BOOST_ROOT)/include $(HWLOC_ROOT)/include

LIBRARY_DIRECTIVES=-L$(HPX_ROOT)/lib $(HPX_ROOT)/lib/libhpx_init.a $(HPX_ROOT)/lib/libhpx.so $(BOOST_ROOT)/lib/libboost_atomic-mt.so $(BOOST_ROOT)/lib/libboost_filesystem-mt.so $(BOOST_ROOT)/lib/libboost_program_options-mt.so $(BOOST_ROOT)/lib/libboost_regex-mt.so $(BOOST_ROOT)/lib/libboost_system-mt.so -lpthread $(TCMALLOC_ROOT)/libtcmalloc_minimal.so $(HWLOC_ROOT)/libhwloc.so -ldl -lrt

LINK_FLAGS=$(HPX_ROOT)/lib/libhpx_wrap.a -Wl,-wrap=main  # should be left empty for HPX_WITH_HPX_MAIN=OFF

hello_world: hello_world.o
   $(CXX) $(CXXFLAGS) -o hello_world hello_world.o $(LIBRARY_DIRECTIVES) $(LINK_FLAGS)

hello_world.o:
   $(CXX) $(CXXFLAGS) -c -o hello_world.o hello_world.cpp $(INCLUDE_DIRECTIVES)

Important

LINK_FLAGS should be left empty if HPX_WITH_HPX_MAIN is set to OFF. Boost in the above example is build with --layout=tagged. Actual boost flags may vary on your build of boost.

To build the program, type:

make

A successfull build should result in hello_world binary. To test, type:

./hello_world
How to build HPX components with makefile

Let’s try a more complex example involving an HPX component. An HPX component is a class which exposes HPX actions. HPX components are compiled into dynamically loaded modules called component libraries. Here’s the source code:

hello_world_component.cpp

#include "hello_world_component.hpp"
#include <hpx/include/iostreams.hpp>

#include <iostream>

namespace examples { namespace server
{
    void hello_world::invoke()
    {
        hpx::cout << "Hello HPX World!" << std::endl;
    }
}}

HPX_REGISTER_COMPONENT_MODULE();

typedef hpx::components::component<
    examples::server::hello_world
> hello_world_type;

HPX_REGISTER_COMPONENT(hello_world_type, hello_world);

HPX_REGISTER_ACTION(
    examples::server::hello_world::invoke_action, hello_world_invoke_action);

hello_world_component.hpp

#if !defined(HELLO_WORLD_COMPONENT_HPP)
#define HELLO_WORLD_COMPONENT_HPP

#include <hpx/hpx.hpp>
#include <hpx/include/actions.hpp>
#include <hpx/include/lcos.hpp>
#include <hpx/include/components.hpp>
#include <hpx/include/serialization.hpp>

#include <utility>

namespace examples { namespace server
{
    struct HPX_COMPONENT_EXPORT hello_world
        : hpx::components::component_base<hello_world>
    {
        void invoke();
        HPX_DEFINE_COMPONENT_ACTION(hello_world, invoke);
    };
}}

HPX_REGISTER_ACTION_DECLARATION(
    examples::server::hello_world::invoke_action, hello_world_invoke_action);

namespace examples
{
    struct hello_world
      : hpx::components::client_base<hello_world, server::hello_world>
    {
        typedef hpx::components::client_base<hello_world, server::hello_world>
            base_type;

        hello_world(hpx::future<hpx::naming::id_type> && f)
          : base_type(std::move(f))
        {}

        hello_world(hpx::naming::id_type && f)
          : base_type(std::move(f))
        {}

        void invoke()
        {
            hpx::async<server::hello_world::invoke_action>(this->get_id()).get();
        }
    };
}

#endif // HELLO_WORLD_COMPONENT_HPP

hello_world_client.cpp

//  Copyright (c) 2012 Bryce Lelbach
//
//  Distributed under the Boost Software License, Version 1.0. (See accompanying
//  file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

//[hello_world_client_getting_started
#include "hello_world_component.hpp"
#include <hpx/hpx_init.hpp>

int hpx_main(boost::program_options::variables_map&)
{
    {
        // Create a single instance of the component on this locality.
        examples::hello_world client =
            hpx::new_<examples::hello_world>(hpx::find_here());

        // Invoke the component's action, which will print "Hello World!".
        client.invoke();
    }

    return hpx::finalize(); // Initiate shutdown of the runtime system.
}

int main(int argc, char* argv[])
{
    return hpx::init(argc, argv); // Initialize and run HPX.
}
//]

Now in the directory, create a Makefile. Add the following code:

CXX=(CXX)  # Add your favourite compiler here or let makefile choose default.

CXXFLAGS=-O3 -std=c++17

BOOST_ROOT=/path/to/boost
HWLOC_ROOT=/path/to/hwloc
TCMALLOC_ROOT=/path/to/tcmalloc
HPX_ROOT=/path/to/hpx

INCLUDE_DIRECTIVES=$(HPX_ROOT)/include $(BOOST_ROOT)/include $(HWLOC_ROOT)/include

LIBRARY_DIRECTIVES=-L$(HPX_ROOT)/lib $(HPX_ROOT)/lib/libhpx_init.a $(HPX_ROOT)/lib/libhpx.so $(BOOST_ROOT)/lib/libboost_atomic-mt.so $(BOOST_ROOT)/lib/libboost_filesystem-mt.so $(BOOST_ROOT)/lib/libboost_program_options-mt.so $(BOOST_ROOT)/lib/libboost_regex-mt.so $(BOOST_ROOT)/lib/libboost_system-mt.so -lpthread $(TCMALLOC_ROOT)/libtcmalloc_minimal.so $(HWLOC_ROOT)/libhwloc.so -ldl -lrt

LINK_FLAGS=$(HPX_ROOT)/lib/libhpx_wrap.a -Wl,-wrap=main  # should be left empty for HPX_WITH_HPX_MAIN=OFF

hello_world_client: libhpx_hello_world hello_world_client.o
  $(CXX) $(CXXFLAGS) -o hello_world_client $(LIBRARY_DIRECTIVES) libhpx_hello_world $(LINK_FLAGS)

hello_world_client.o: hello_world_client.cpp
  $(CXX) $(CXXFLAGS) -o hello_world_client.o hello_world_client.cpp $(INCLUDE_DIRECTIVES)

libhpx_hello_world: hello_world_component.o
  $(CXX) $(CXXFLAGS) -o libhpx_hello_world hello_world_component.o $(LIBRARY_DIRECTIVES)

hello_world_component.o: hello_world_component.cpp
  $(CXX) $(CXXFLAGS) -c -o hello_world_component.o hello_world_component.cpp $(INCLUDE_DIRECTIVES)

To build the program, type:

make

A successfull build should result in hello_world binary. To test, type:

./hello_world

Note

Due to high variations in CMake flags and library dependencies, it is recommended to build HPX applications and components with pkg-config or CMakeLists.txt. Writing Makefile may result in broken builds if due care is not taken. pkg-config files and CMake systems are configured with CMake build of HPX. Hence, they are stable and provides with better support overall.

Starting the HPX runtime

In order to write an application which uses services from the HPX runtime system you need to initialize the HPX library by inserting certain calls into the code of your application. Depending on your use case, this can be done in 3 different ways:

  • Minimally invasive: Re-use the main() function as the main HPX entry point.
  • Balanced use case: Supply your own main HPX entry point while blocking the main thread.
  • Most flexibility: Supply your own main HPX entry point while avoiding to block the main thread.
  • Suspend and resume: As above but suspend and resume the HPX runtime to allow for other runtimes to be used.
Re-use the main() function as the main HPX entry point

This method is the least intrusive to your code. It however provides you with the smallest flexibility in terms of initializing the HPX runtime system. The following code snippet shows what a minimal HPX application using this technique looks like:

#include <hpx/hpx_main.hpp>

int main(int argc, char* argv[])
{
    return 0;
}

The only change to your code you have to make is to include the file hpx/hpx_main.hpp. In this case the function main() will be invoked as the first HPX thread of the application. The runtime system will be initialized behind the scenes before the function main() is executed and will automatically stop after main() has returned. All HPX API functions can be used from within this function now.

Note

The function main() does not need to expect receiving argc argv as shown above, but could expose the signature int main(). This is consistent with the usually allowed prototypes for the function main() in C++ applications.

All command line arguments specific to HPX will still be processed by the HPX runtime system as usual. However, those command line options will be removed from the list of values passed to argc/argv of the function main(). The list of values passed to main() will hold only the commandline options which are not recognized by the HPX runtime system (see the section HPX Command Line Options for more details on what options are recognized by HPX).

Note

In this mode all one-letter-shortcuts are disabled which are normally available on the HPX command line (such as -t or -l see HPX Command Line Options). This is done to minimize any possible interaction between the command line options recognized by the HPX runtime system and any command line options defined by the application.

The value returned from the function main() as shown above will be returned to the operating system as usual.

Important

To achieve this seamless integration, the header file hpx/hpx_main.hpp defines a macro:

#define main hpx_startup::user_main

which could result in unexpected behavior.

Important

To achieve this seamless integration, we use different implementations for different Operating Systems. In case of Linux or Mac OSX, the code present in hpx_wrap.cpp is put into action. We hook into the system function in case of Linux and provide alternate entry point in case of Mac OSX. For other Operating Systems we rely on a macro:

#define main hpx_startup::user_main

provided in the header file hpx/hpx_main.hpp. This implementation can result in unexpected behavior.

Caution

We make use of an override variable include_libhpx_wrap in the header file hpx/hpx_main.hpp to swiftly choose the function call stack at runtime. Therefore, the header file should only be included in the main executable. Including it in the components will result in multiple definition of the variable.

Supply your own main HPX entry point while blocking the main thread

With this method you need to provide an explicit main thread function named hpx_main at global scope. This function will be invoked as the main entry point of your HPX application on the console locality only (this function will be invoked as the first HPX thread of your application). All HPX API functions can be used from within this function.

The thread executing the function hpx::init will block waiting for the runtime system to exit. The value returned from hpx_main will be returned from hpx::init after the runtime system has stopped.

The function hpx::finalize has to be called on one of the HPX localities in order to signal that all work has been scheduled and the runtime system should be stopped after the scheduled work has been executed.

This method of invoking HPX has the advantage of you being able to decide which version of hpx::init to call. This allows to pass additional configuration parameters while initializing the HPX runtime system.

#include <hpx/hpx_init.hpp>

int hpx_main(int argc, char* argv[])
{
    // Any HPX application logic goes here...
    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    // Initialize HPX, run hpx_main as the first HPX thread, and
    // wait for hpx::finalize being called.
    return hpx::init(argc, argv);
}

Note

The function hpx_main does not need to expect receiving argc/argv as shown above, but could expose one of the following signatures:

int hpx_main();
int hpx_main(int argc, char* argv[]);
int hpx_main(boost::program_options::variables_map& vm);

This is consistent with (and extends) the usually allowed prototypes for the function main() in C++ applications.

The header file to include for this method of using HPX is hpx/hpx_init.hpp.

There are many additional overloads of hpx::init available, such as for instance to provide your own entry point function instead of hpx_main. Please refer to the function documentation for more details (see: hpx/hpx_init.hpp).

Supply your own main HPX entry point while avoiding to block the main thread

With this method you need to provide an explicit main thread function named hpx_main at global scope. This function will be invoked as the main entry point of your HPX application on the console locality only (this function will be invoked as the first HPX thread of your application). All HPX API functions can be used from within this function.

The thread executing the function hpx::start will not block waiting for the runtime system to exit, but will return immediately.

Important

You cannot use any of the HPX API functions other that hpx::stop from inside your main() function.

The function hpx::finalize has to be called on one of the HPX localities in order to signal that all work has been scheduled and the runtime system should be stopped after the scheduled work has been executed.

This method of invoking HPX is useful for applications where the main thread is used for special operations, such a GUIs. The function hpx::stop can be used to wait for the HPX runtime system to exit and should be at least used as the last function called in main(). The value returned from hpx_main will be returned from hpx::stop after the runtime system has stopped.

#include <hpx/hpx_start.hpp>

int hpx_main(int argc, char* argv[])
{
    // Any HPX application logic goes here...
    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    // Initialize HPX, run hpx_main.
    hpx::start(argc, argv);

    // ...Execute other code here...

    // Wait for hpx::finalize being called.
    return hpx::stop();
}

Note

The function hpx_main does not need to expect receiving argc/argv as shown above, but could expose one of the following signatures:

int hpx_main();
int hpx_main(int argc, char* argv[]);
int hpx_main(boost::program_options::variables_map& vm);

This is consistent with (and extends) the usually allowed prototypes for the function main() in C++ applications.

The header file to include for this method of using HPX is hpx/hpx_start.hpp.

There are many additional overloads of hpx::start available, such as for instance to provide your own entry point function instead of hpx_main. Please refer to the function documentation for more details (see: hpx/hpx_start.hpp).

Suspending and resuming the HPX runtime

In some applications it is required to combine HPX with other runtimes. To support this use case HPX provides two functions: hpx::suspend and hpx::resume. hpx::suspend is a blocking call which will wait for all scheduled tasks to finish executing and then put the thread pool OS threads to sleep. hpx::resume simply wakes up the sleeping threads so that they are ready to accept new work. hpx::suspend and hpx::resume can be found in the header hpx/hpx_suspend.hpp.

#include <hpx/hpx_start.hpp>
#include <hpx/hpx_suspend.hpp>

int main(int argc, char* argv[])
{

   // Initialize HPX, don't run hpx_main
    hpx::start(nullptr, argc, argv);

    // Schedule a function on the HPX runtime
    hpx::apply(&my_function, ...);

    // Wait for all tasks to finish, and suspend the HPX runtime
    hpx::suspend();

    // Execute non-HPX code here

    // Resume the HPX runtime
    hpx::resume();

    // Schedule more work on the HPX runtime

    // hpx::finalize has to be called from the HPX runtime before hpx::stop
    hpx::apply([]() { hpx::finalize(); });
    return hpx::stop();
}

Note

hpx::suspend does not wait for hpx::finalize to be called. Only call hpx::finalize when you wish to fully stop the HPX runtime.

HPX also supports suspending individual thread pools and threads. For details on how to do that see the documentation for hpx::threads::thread_pool_base.

Automatically suspending worker threads

The previous method guarantees that the worker threads are suspended when you ask for it and that they stay suspended. An alternative way to achieve the same effect is to tweak how quickly HPX suspends its worker threads when they run out of work. The following configuration values make sure that HPX idles very quickly:

hpx.max_idle_backoff_time = 1000
hpx.max_idle_loop_count = 0

They can be set on the command line using --hpx:ini=hpx.max_idle_backoff_time=1000 and --hpx:ini=hpx.max_idle_loop_count=0. See Launching and configuring HPX applications for more details on how to set configuration parameters.

After setting idling parameters the previous example could now be written like this instead:

#include <hpx/hpx_start.hpp>

int main(int argc, char* argv[])
{

   // Initialize HPX, don't run hpx_main
    hpx::start(nullptr, argc, argv);

    // Schedule some functions on the HPX runtime
    // NOTE: run_as_hpx_thread blocks until completion.
    hpx::run_as_hpx_thread(&my_function, ...);
    hpx::run_as_hpx_thread(&my_other_function, ...);

    // hpx::finalize has to be called from the HPX runtime before hpx::stop
    hpx::apply([]() { hpx::finalize(); });
    return hpx::stop();
}

In this example each call to hpx::run_as_hpx_thread acts as a “parallel region”.

Working of hpx_main.hpp

In order to initialize HPX from main(), we make use of linker tricks.

It is implemented differently for different Operating Systems. Method of implementation is as follows:

  • Linux: Using linker --wrap option.
  • Mac OSX: Using the linker -e option.
  • Windows: Using #define main hpx_startup::user_main
Linux implementation

We make use of the Linux linker ld’s --wrap option to wrap the main() function. This way any call to main() are redirected to our own implementation of main. It is here that we check for the existence of hpx_main.hpp by making use of a shadow variable include_libhpx_wrap. The value of this variable determines the function stack at runtime.

The implementation can be found in libhpx_wrap.a.

Important

It is necessary that hpx_main.hpp be not included more than once. Multiple inclusions can result in multiple definition of include_libhpx_wrap.

Mac OSX implementation

Here we make use of yet another linker option -e to change the entry point to our custom entry function initialize_main. We initialize the HPX runtime system from this function and call main from the initialized system. We determine the function stack at runtime by making use of the shadow variable include_libhpx_wrap.

The implementation can be found in libhpx_wrap.a.

Important

It is necessary that hpx_main.hpp be not included more than once. Multiple inclusions can result in multiple definition of include_libhpx_wrap.

Windows implementation

We make use of a macro #define main hpx_startup::user_main to take care of the initializations.

This implementation could result in unexpected behaviors.

Launching and configuring HPX applications

Configuring HPX applications

All HPX applications can be configured using special command line options and/or using special configuration files. This section describes the available options, the configuration file format, and the algorithm used to locate possible predefined configuration files. Additionally this section describes the defaults assumed if no external configuration information is supplied.

During startup any HPX application applies a predefined search pattern to locate one or more configuration files. All found files will be read and merged in the sequence they are found into one single internal database holding all configuration properties. This database is used during the execution of the application to configure different aspects of the runtime system.

In addition to the ini files, any application can supply its own configuration files, which will be merged with the configuration database as well. Moreover, the user can specify additional configuration parameters on the command line when executing an application. The HPX runtime system will merge all command line configuration options (see the description of the --hpx:ini, --hpx:config, and --hpx:app-config command line options).

The HPX INI File Format

All HPX applications can be configured using a special file format which is similar to the well-known Windows INI file format. This is a structured text format allowing to group key/value pairs (properties) into sections. The basic element contained in an ini file is the property. Every property has a name and a value, delimited by an equals sign '='. The name appears to the left of the equals sign:

name=value

The value may contain equal signs as only the first '=' character is interpreted as the delimiter between name and value Whitespace before the name, after the value and immediately before and after the delimiting equal sign is ignored. Whitespace inside the value is retained.

Properties may be grouped into arbitrarily named sections. The section name appears on a line by itself, in square brackets [ and ]. All properties after the section declaration are associated with that section. There is no explicit “end of section” delimiter; sections end at the next section declaration, or the end of the file:

[section]

In HPX sections can be nested. A nested section has a name composed of all section names it is embedded in. The section names are concatenated using a dot '.':

[outer_section.inner_section]

Here inner_section is logically nested within outer_section.

It is possible to use the full section name concatenated with the property name to refer to a particular property. For example in:

[a.b.c]
d = e

the property value of d can be referred to as a.b.c.d=e.

In HPX ini files can contain comments. Hash signs '#' at the beginning of a line indicate a comment. All characters starting with the '#' until the end of line are ignored.

If a property with the same name is reused inside a section, the second occurrence of this property name will override the first occurrence (discard the first value). Duplicate sections simply merge their properties together, as if they occurred contiguously.

In HPX ini files, a property value ${FOO:default} will use the environmental variable FOO to extract the actual value if it is set and default otherwise. No default has to be specified. Therefore ${FOO} refers to the environmental variable FOO. If FOO is not set or empty the overall expression will evaluate to an empty string. A property value $[section.key:default] refers to the value held by the property section.key if it exists and default otherwise. No default has to be specified. Therefore $[section.key] refers to the property section.key. If the property section.key is not set or empty, the overall expression will evaluate to an empty string.

Note

Any property $[section.key:default] is evaluated whenever it is queried and not when the configuration data is initialized. This allows for lazy evaluation and relaxes initialization order of different sections. The only exception are recursive property values, e.g. values referring to the very key they are associated with. Those property values are evaluated at initialization time to avoid infinite recursion.

Built-in Default Configuration Settings

During startup any HPX application applies a predefined search pattern to locate one or more configuration files. All found files will be read and merged in the sequence they are found into one single internal data structure holding all configuration properties.

As a first step the internal configuration database is filled with a set of default configuration properties. Those settings are described on a section by section basis below.

Note

You can print the default configuration settings used for an executable by specifying the command line option --hpx:dump-config.

The system configuration section
[system]
pid = <process-id>
prefix = <current prefix path of core HPX library>
executable = <current prefix path of executable>
Property Description
system.pid This is initialized to store the current OS-process id of the application instance.
system.prefix This is initialized to the base directory HPX has been loaded from.
system.executable_prefix This is initialized to the base directory the current executable has been loaded from.
The hpx configuration section
[hpx]
location = ${HPX_LOCATION:$[system.prefix]}
component_path = $[hpx.location]/lib/hpx:$[system.executable_prefix]/lib/hpx:$[system.executable_prefix]/../lib/hpx
master_ini_path = $[hpx.location]/share/hpx-<version>:$[system.executable_prefix]/share/hpx-<version>:$[system.executable_prefix]/../share/hpx-<version>
ini_path = $[hpx.master_ini_path]/ini
os_threads = 1
localities = 1
program_name =
cmd_line =
lock_detection = ${HPX_LOCK_DETECTION:0}
throw_on_held_lock = ${HPX_THROW_ON_HELD_LOCK:1}
minimal_deadlock_detection = <debug>
spinlock_deadlock_detection = <debug>
spinlock_deadlock_detection_limit = ${HPX_SPINLOCK_DEADLOCK_DETECTION_LIMIT:1000000}
max_background_threads = ${HPX_MAX_BACKGROUND_THREADS:$[hpx.os_threads]}
max_idle_loop_count = ${HPX_MAX_IDLE_LOOP_COUNT:<hpx_idle_loop_count_max>}
max_busy_loop_count = ${HPX_MAX_BUSY_LOOP_COUNT:<hpx_busy_loop_count_max>}
max_idle_backoff_time = ${HPX_MAX_IDLE_BACKOFF_TIME:<hpx_idle_backoff_time_max>}

[hpx.stacks]
small_size = ${HPX_SMALL_STACK_SIZE:<hpx_small_stack_size>}
medium_size = ${HPX_MEDIUM_STACK_SIZE:<hpx_medium_stack_size>}
large_size = ${HPX_LARGE_STACK_SIZE:<hpx_large_stack_size>}
huge_size = ${HPX_HUGE_STACK_SIZE:<hpx_huge_stack_size>}
use_guard_pages = ${HPX_THREAD_GUARD_PAGE:1}
Property Description
hpx.location This is initialized to the id of the locality this application instance is running on.
hpx.component_path Duplicates are discarded. This property can refer to a list of directories separated by ':' (Linux, Android, and MacOS) or using ';' (Windows).
hpx.master_ini_path This is initialized to the list of default paths of the main hpx.ini configuration files. This property can refer to a list of directories separated by ':' (Linux, Android, and MacOS) or using ';' (Windows).
hpx.ini_path This is initialized to the default path where HPX will look for more ini configuration files. This property can refer to a list of directories separated by ':' (Linux, Android, and MacOS) or using ';' (Windows).
hpx.os_threads This setting reflects the number of OS-threads used for running HPX-threads. Defaults to number of detected cores (not hyperthreads/PUs).
hpx.localities This setting reflects the number of localities the application is running on. Defaults to 1.
hpx.program_name This setting reflects the program name of the application instance. Initialized from the command line argv[0].
hpx.cmd_line This setting reflects the actual command line used to launch this application instance.
hpx.lock_detection This setting verifies that no locks are being held while a HPX thread is suspended. This setting is applicable only if HPX_WITH_VERIFY_LOCKS is set during configuration in CMake.
hpx.throw_on_held_lock This setting causes an exception if during lock detection at least one lock is being held while a HPX thread is suspended. This setting is applicable only if HPX_WITH_VERIFY_LOCKS is set during configuration in CMake. This setting has no effect if hpx.lock_detection=0.
hpx.minimal_deadlock_detection This setting enables support for minimal deadlock detection for HPX-threads. By default this is set to 1 (for Debug builds) or to 0 (for Release, RelWithDebInfo, RelMinSize builds), this setting is effective only if HPX_WITH_THREAD_DEADLOCK_DETECTION is set during configuration in CMake.
hpx.spinlock_deadlock_detection This setting verifies that spinlocks don’t spin longer than specified using the hpx.spinlock_deadlock_detection_limit. This setting is applicable only if HPX_WITH_SPINLOCK_DEADLOCK_DETECTION is set during configuration in CMake. By default this is set to 1 (for Debug builds) or to 0 (for Release, RelWithDebInfo, RelMinSize builds).
hpx.spinlock_deadlock_detection_limit This setting specifies the upper limit of allowed number of spins that spinlocks are allowed to perform. This setting is applicable only if HPX_WITH_SPINLOCK_DEADLOCK_DETECTION is set during configuration in CMake. By default this is set to 1000000.
hpx.max_background_threads This setting defines the number of threads in the scheduler which are used to execute background work. By default this is the same as the number of cores used for the scheduler.
hpx.max_idle_loop_count By default this is defined by the preprocessor constant HPX_IDLE_LOOP_COUNT_MAX. This is an internal setting which you should change only if you know exactly what you are doing.
hpx.max_busy_loop_count This setting defines the maximum value of the busy-loop counter in the scheduler. By default this is defined by the preprocessor constant HPX_BUSY_LOOP_COUNT_MAX. This is an internal setting which you should change only if you know exactly what you are doing.
hpx.max_idle_backoff_time This setting defines the maximum time (in milliseconds) for the scheduler to sleep after being idle for hpx.max_idle_loop_count iterations. This setting is applicable only if HPX_WITH_THREAD_MANAGER_IDLE_BACKOFF is set during configuration in CMake. By default this is defined by the preprocessor constant HPX_IDLE_BACKOFF_TIME_MAX. This is an internal setting which you should change only if you know exactly what you are doing.
hpx.stacks.small_size This is initialized to the small stack size to be used by HPX-threads. Set by default to the value of the compile time preprocessor constant HPX_SMALL_STACK_SIZE (defaults to 0x8000). This value is used for all HPX threads by default, except for the thread running hpx_main (which runs on a large stack).
hpx.stacks.medium_size This is initialized to the medium stack size to be used by HPX-threads. Set by default to the value of the compile time preprocessor constant HPX_MEDIUM_STACK_SIZE (defaults to 0x20000).
hpx.stacks.large_size This is initialized to the large stack size to be used by HPX-threads. Set by default to the value of the compile time preprocessor constant HPX_LARGE_STACK_SIZE (defaults to 0x200000). This setting is used by default for the thread running hpx_main only.
hpx.stacks.huge_size This is initialized to the huge stack size to be used by HPX-threads. Set by default to the value of the compile time preprocessor constant HPX_HUGE_STACK_SIZE (defaults to 0x2000000).
hpx.stacks.use_guard_pages This entry controls whether the coroutine library will generate stack guard pages or not. This entry is applicable on Linux only and only if the HPX_USE_GENERIC_COROUTINE_CONTEXT option is not enabled and the HPX_WITH_THREAD_GUARD_PAGE is set to 1 while configuring the build system. It is set by default to 1.
The hpx.threadpools configuration section
[hpx.threadpools]
io_pool_size = ${HPX_NUM_IO_POOL_SIZE:2}
parcel_pool_size = ${HPX_NUM_PARCEL_POOL_SIZE:2}
timer_pool_size = ${HPX_NUM_TIMER_POOL_SIZE:2}
Property Description
hpx.threadpools.io_pool_size The value of this property defines the number of OS-threads created for the internal I/O thread pool.
hpx.threadpools.parcel_pool_size The value of this property defines the number of OS-threads created for the internal parcel thread pool.
hpx.threadpools.timer_pool_size The value of this property defines the number of OS-threads created for the internal timer thread pool.
The hpx.thread_queue configuration section

Important

These setting control internal values used by the thread scheduling queues in the HPX scheduler. You should not modify these settings except if you know exactly what you are doing]

[hpx.thread_queue]
min_tasks_to_steal_pending = ${HPX_THREAD_QUEUE_MIN_TASKS_TO_STEAL_PENDING:0}
min_tasks_to_steal_staged = ${HPX_THREAD_QUEUE_MIN_TASKS_TO_STEAL_STAGED:10}
min_add_new_count = ${HPX_THREAD_QUEUE_MIN_ADD_NEW_COUNT:10}
max_add_new_count = ${HPX_THREAD_QUEUE_MAX_ADD_NEW_COUNT:10}
max_delete_count = ${HPX_THREAD_QUEUE_MAX_DELETE_COUNT:1000}
Property Description
hpx.thread_queue.min_tasks_to_steal_pending The value of this property defines the number of pending HPX threads which have to be available before neighboring cores are allowed to steal work. The default is to allow stealing always.
hpx.thread_queue.min_tasks_to_steal_staged The value of this property defines the number of staged HPX tasks have which to be available before neighboring cores are allowed to steal work. The default is to allow stealing only if there are more tan 10 tasks available.
hpx.thread_queue.min_add_new_count The value of this property defines the minimal number tasks to be converted into HPX threads whenever the thread queues for a core have run empty.
hpx.thread_queue.max_add_new_count The value of this property defines the maximal number tasks to be converted into HPX threads whenever the thread queues for a core have run empty.
hpx.thread_queue.max_delete_count The value of this property defines the number number of terminated HPX threads to discard during each invocation of the corresponding function.
The hpx.components configuration section
[hpx.components]
load_external = ${HPX_LOAD_EXTERNAL_COMPONENTS:1}
Property Description
hpx.components.load_external This entry defines whether external components will be loaded on this locality. This entry normally is set to 1 and usually there is no need to directly change this value. It is automatically set to 0 for a dedicated AGAS server locality.

Additionally, the section hpx.components will be populated with the information gathered from all found components. The information loaded for each of the components will contain at least the following properties:

[hpx.components.<component_instance_name>]
name = <component_name>
path = <full_path_of_the_component_module>
enabled = $[hpx.components.load_external]
Property Description
hpx.components.<component_instance_name>.name This is the name of a component, usually the same as the second argument to the macro used while registering the component with HPX_REGISTER_COMPONENT. Set by the component factory.
hpx.components.<component_instance_name>.path This is either the full path file name of the component module or the directory the component module is located in. In this case, the component module name will be derived from the property hpx.components.<component_instance_name>.name. Set by the component factory.
hpx.components.<component_instance_name>.enabled This setting explicitly enables or disables the component. This is an optional property, HPX assumed that the component is enabled if it is not defined.

The value for <component_instance_name> is usually the same as for the corresponding name property. However generally it can be defined to any arbitrary instance name. It is used to distinguish between different ini sections, one for each component.

The hpx.parcel configuration section
[hpx.parcel]
address = ${HPX_PARCEL_SERVER_ADDRESS:<hpx_initial_ip_address>}
port = ${HPX_PARCEL_SERVER_PORT:<hpx_initial_ip_port>}
bootstrap = ${HPX_PARCEL_BOOTSTRAP:<hpx_parcel_bootstrap>}
max_connections = ${HPX_PARCEL_MAX_CONNECTIONS:<hpx_parcel_max_connections>}
max_connections_per_locality = ${HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY:<hpx_parcel_max_connections_per_locality>}
max_message_size = ${HPX_PARCEL_MAX_MESSAGE_SIZE:<hpx_parcel_max_message_size>}
max_outbound_message_size = ${HPX_PARCEL_MAX_OUTBOUND_MESSAGE_SIZE:<hpx_parcel_max_outbound_message_size>}
array_optimization = ${HPX_PARCEL_ARRAY_OPTIMIZATION:1}
zero_copy_optimization = ${HPX_PARCEL_ZERO_COPY_OPTIMIZATION:$[hpx.parcel.array_optimization]}
async_serialization = ${HPX_PARCEL_ASYNC_SERIALIZATION:1}
message_handlers = ${HPX_PARCEL_MESSAGE_HANDLERS:0}
Property Description
hpx.parcel.address This property defines the default IP address to be used for the parcel layer to listen to. This IP address will be used as long as no other values are specified (for instance using the --hpx:hpx command line option). The expected format is any valid IP address or domain name format which can be resolved into an IP address. The default depends on the compile time preprocessor constant HPX_INITIAL_IP_ADDRESS ("127.0.0.1").
hpx.parcel.port This property defines the default IP port to be used for the parcel layer to listen to. This IP port will be used as long as no other values are specified (for instance using the --hpx:hpx command line option). The default depends on the compile time preprocessor constant HPX_INITIAL_IP_PORT (7910).
hpx.parcel.bootstrap This property defines which parcelport type should be used during application bootstrap. The default depends on the compile time preprocessor constant HPX_PARCEL_BOOTSTRAP ("tcp").
hpx.parcel.max_connections This property defines how many network connections between different localities are overall kept alive by each of locality. The default depends on the compile time preprocessor constant HPX_PARCEL_MAX_CONNECTIONS (512).
hpx.parcel.max_connections_per_locality This property defines the maximum number of network connections that one locality will open to another locality. The default depends on the compile time preprocessor constant HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY (4).
hpx.parcel.max_message_size This property defines the maximum allowed message size which will be transferrable through the parcel layer. The default depends on the compile time preprocessor constant HPX_PARCEL_MAX_MESSAGE_SIZE (1000000000 bytes).
hpx.parcel.max_outbound_message_size This property defines the maximum allowed outbound coalesced message size which will be transferrable through the parcel layer. The default depends on the compile time preprocessor constant HPX_PARCEL_MAX_OUTBOUND_MESSAGE_SIZE (1000000 bytes).
hpx.parcel.array_optimization This property defines whether this locality is allowed to utilize array optimizations during serialization of parcel data. The default is 1.
hpx.parcel.zero_copy_optimization This property defines whether this locality is allowed to utilize zero copy optimizations during serialization of parcel data. The default is the same value as set for hpx.parcel.array_optimization.
hpx.parcel.async_serialization This property defines whether this locality is allowed to spawn a new thread for serialization (this is both for encoding and decoding parcels). The default is 1.
hpx.parcel.message_handlers This property defines whether message handlers are loaded. The default is 0.

The following settings relate to the TCP/IP parcelport.

[hpx.parcel.tcp]
enable = ${HPX_HAVE_PARCELPORT_TCP:$[hpx.parcel.enabled]}
array_optimization = ${HPX_PARCEL_TCP_ARRAY_OPTIMIZATION:$[hpx.parcel.array_optimization]}
zero_copy_optimization = ${HPX_PARCEL_TCP_ZERO_COPY_OPTIMIZATION:$[hpx.parcel.zero_copy_optimization]}
async_serialization = ${HPX_PARCEL_TCP_ASYNC_SERIALIZATION:$[hpx.parcel.async_serialization]}
parcel_pool_size = ${HPX_PARCEL_TCP_PARCEL_POOL_SIZE:$[hpx.threadpools.parcel_pool_size]}
max_connections =  ${HPX_PARCEL_TCP_MAX_CONNECTIONS:$[hpx.parcel.max_connections]}
max_connections_per_locality = ${HPX_PARCEL_TCP_MAX_CONNECTIONS_PER_LOCALITY:$[hpx.parcel.max_connections_per_locality]}
max_message_size =  ${HPX_PARCEL_TCP_MAX_MESSAGE_SIZE:$[hpx.parcel.max_message_size]}
max_outbound_message_size =  ${HPX_PARCEL_TCP_MAX_OUTBOUND_MESSAGE_SIZE:$[hpx.parcel.max_outbound_message_size]}
Property Description
hpx.parcel.tcp.enable Enable the use of the default TCP parcelport. Note that the initial bootstrap of the overall HPX application will be performed using the default TCP connections. This parcelport is enabled by default. This will be disabled only if MPI is enabled (see below).
hpx.parcel.tcp.array_optimization This property defines whether this locality is allowed to utilize array optimizations in the TCP/IP parcelport during serialization of parcel data. The default is the same value as set for hpx.parcel.array_optimization.
hpx.parcel.tcp.zero_copy_optimization This property defines whether this locality is allowed to utilize zero copy optimizations in the TCP/IP parcelport during serialization of parcel data. The default is the same value as set for hpx.parcel.zero_copy_optimization.
hpx.parcel.tcp.async_serialization This property defines whether this locality is allowed to spawn a new thread for serialization in the TCP/IP parcelport (this is both for encoding and decoding parcels). The default is the same value as set for hpx.parcel.async_serialization.
hpx.parcel.tcp.parcel_pool_size The value of this property defines the number of OS-threads created for the internal parcel thread pool of the TCP parcel port. The default is taken from hpx.threadpools.parcel_pool_size.
hpx.parcel.tcp.max_connections This property defines how many network connections between different localities are overall kept alive by each of locality. The default is taken from hpx.parcel.max_connections.
hpx.parcel.tcp.max_connections_per_locality This property defines the maximum number of network connections that one locality will open to another locality. The default is taken from hpx.parcel.max_connections_per_locality.
hpx.parcel.tcp.max_message_size This property defines the maximum allowed message size which will be transferrable through the parcel layer. The default is taken from hpx.parcel.max_message_size.
hpx.parcel.tcp.max_outbound_message_size This property defines the maximum allowed outbound coalesced message size which will be transferrable through the parcel layer. The default is taken from hpx.parcel.max_outbound_connections.

The following settings relate to the MPI parcelport. These settings take effect only if the compile time constant HPX_HAVE_PARCELPORT_MPI is set (the equivalent cmake variable is HPX_WITH_PARCELPORT_MPI and has to be set to ON.

[hpx.parcel.mpi]
enable = ${HPX_HAVE_PARCELPORT_MPI:$[hpx.parcel.enabled]}
env = ${HPX_HAVE_PARCELPORT_MPI_ENV:MV2_COMM_WORLD_RANK,PMI_RANK,OMPI_COMM_WORLD_SIZE,ALPS_APP_PE}
multithreaded = ${HPX_HAVE_PARCELPORT_MPI_MULTITHREADED:0}
rank = <MPI_rank>
processor_name = <MPI_processor_name>
array_optimization = ${HPX_HAVE_PARCEL_MPI_ARRAY_OPTIMIZATION:$[hpx.parcel.array_optimization]}
zero_copy_optimization = ${HPX_HAVE_PARCEL_MPI_ZERO_COPY_OPTIMIZATION:$[hpx.parcel.zero_copy_optimization]}
use_io_pool = ${HPX_HAVE_PARCEL_MPI_USE_IO_POOL:$1}
async_serialization = ${HPX_HAVE_PARCEL_MPI_ASYNC_SERIALIZATION:$[hpx.parcel.async_serialization]}
parcel_pool_size = ${HPX_HAVE_PARCEL_MPI_PARCEL_POOL_SIZE:$[hpx.threadpools.parcel_pool_size]}
max_connections =  ${HPX_HAVE_PARCEL_MPI_MAX_CONNECTIONS:$[hpx.parcel.max_connections]}
max_connections_per_locality = ${HPX_HAVE_PARCEL_MPI_MAX_CONNECTIONS_PER_LOCALITY:$[hpx.parcel.max_connections_per_locality]}
max_message_size =  ${HPX_HAVE_PARCEL_MPI_MAX_MESSAGE_SIZE:$[hpx.parcel.max_message_size]}
max_outbound_message_size =  ${HPX_HAVE_PARCEL_MPI_MAX_OUTBOUND_MESSAGE_SIZE:$[hpx.parcel.max_outbound_message_size]}
Property Description
hpx.parcel.mpi.enable Enable the use of the MPI parcelport. HPX tries to detect if the application was started within a parallel MPI environment. If the detection was succesful, the MPI parcelport is enabled by default. To explicitly disable the MPI parcelport, set to 0. Note that the initial bootstrap of the overall HPX application will be performed using MPI as well.
hpx.parcel.mpi.env This property influences which environment variables (comma separated) will be analyzed to find out whether the application was invoked by MPI.
hpx.parcel.mpi.multithreaded This property is used to determine what threading mode to use when initializing MPI. If this setting is 0 HPX will initialize MPI with MPI_THREAD_SINGLE if the value is not equal to 0 HPX will initialize MPI with MPI_THREAD_MULTI.
hpx.parcel.mpi.rank This property will be initialized to the MPI rank of the locality.
hpx.parcel.mpi.processor_name This property will be initialized to the MPI processor name of the locality.
hpx.parcel.mpi.array_optimization This property defines whether this locality is allowed to utilize array optimizations in the MPI parcelport during serialization of parcel data. The default is the same value as set for hpx.parcel.array_optimization.
hpx.parcel.mpi.zero_copy_optimization This property defines whether this locality is allowed to utilize zero copy optimizations in the MPI parcelport during serialization of parcel data. The default is the same value as set for hpx.parcel.zero_copy_optimization.
hpx.parcel.mpi.use_io_pool This property can be set to run the progress thread inside of HPX threads instead of a separate thread pool. The default is 1.
hpx.parcel.mpi.async_serialization This property defines whether this locality is allowed to spawn a new thread for serialization in the MPI parcelport (this is both for encoding and decoding parcels). The default is the same value as set for hpx.parcel.async_serialization.
hpx.parcel.mpi.parcel_pool_size The value of this property defines the number of OS-threads created for the internal parcel thread pool of the MPI parcel port. The default is taken from hpx.threadpools.parcel_pool_size.
hpx.parcel.mpi.max_connections This property defines how many network connections between different localities are overall kept alive by each of locality. The default is taken from hpx.parcel.max_connections.
hpx.parcel.mpi.max_connections_per_locality This property defines the maximum number of network connections that one locality will open to another locality. The default is taken from hpx.parcel.max_connections_per_locality.
hpx.parcel.mpi.max_message_size This property defines the maximum allowed message size which will be transferrable through the parcel layer. The default is taken from hpx.parcel.max_message_size.
hpx.parcel.mpi.max_outbound_message_size This property defines the maximum allowed outbound coalesced message size which will be transferrable through the parcel layer. The default is taken from hpx.parcel.max_outbound_connections.
The hpx.agas configuration section
[hpx.agas]
address = ${HPX_AGAS_SERVER_ADDRESS:<hpx_initial_ip_address>}
port = ${HPX_AGAS_SERVER_PORT:<hpx_initial_ip_port>}
service_mode = hosted
dedicated_server = 0
max_pending_refcnt_requests = ${HPX_AGAS_MAX_PENDING_REFCNT_REQUESTS:<hpx_initial_agas_max_pending_refcnt_requests>}
use_caching = ${HPX_AGAS_USE_CACHING:1}
use_range_caching = ${HPX_AGAS_USE_RANGE_CACHING:1}
local_cache_size = ${HPX_AGAS_LOCAL_CACHE_SIZE:<hpx_agas_local_cache_size>}
Property Description
hpx.agas.address This property defines the default IP address to be used for the AGAS root server. This IP address will be used as long as no other values are specified (for instance using the --hpx:agas command line option). The expected format is any valid IP address or domain name format which can be resolved into an IP address. The default depends on the compile time preprocessor constant HPX_INITIAL_IP_ADDRESS ("127.0.0.1").
hpx.agas.port This property defines the default IP port to be used for the AGAS root server. This IP port will be used as long as no other values are specified (for instance using the --hpx:agas command line option). The default depends on the compile time preprocessor constant HPX_INITIAL_IP_PORT (7009).
hpx.agas.service_mode This property specifies what type of AGAS service is running on this locality. Currently, two modes exist. The locality that acts as the AGAS server runs in bootstrap mode. All other localities are in hosted mode.
hpx.agas.dedicated_server This property specifies whether the AGAS server is exclusively running AGAS services and not hosting any application components. It is a boolean value. Set to 1 if --hpx:run-agas-server-only is present.
hpx.agas.max_pending_refcnt_requests This property defines the number of reference counting requests (increments or decrements) to buffer. The default depends on the compile time preprocessor constant HPX_INITIAL_AGAS_MAX_PENDING_REFCNT_REQUESTS (4096).
hpx.agas.use_caching This property specifies whether a software address translation cache is used. It is a boolean value. Defaults to 1.
hpx.agas.use_range_caching This property specifies whether range-based caching is used by the software address translation cache. This property is ignored if hpx.agas.use_caching is false. It is a boolean value. Defaults to 1.
hpx.agas.local_cache_size This property defines the size of the software address translation cache for AGAS services. This property is ignored if hpx.agas.use_caching is false. Note that if hpx.agas.use_range_caching is true, this size will refer to the maximum number of ranges stored in the cache, not the number of entries spanned by the cache. The default depends on the compile time preprocessor constant HPX_AGAS_LOCAL_CACHE_SIZE (4096).
The hpx.commandline configuration section

The following table lists the definition of all pre-defined command line option shortcuts. For more information about commandline options see the section HPX Command Line Options.

[hpx.commandline]
aliasing = ${HPX_COMMANDLINE_ALIASING:1}
allow_unknown = ${HPX_COMMANDLINE_ALLOW_UNKNOWN:0}

[hpx.commandline.aliases]
-a = --hpx:agas
-c = --hpx:console
-h = --hpx:help
-I = --hpx:ini
-l = --hpx:localities
-p = --hpx:app-config
-q = --hpx:queuing
-r = --hpx:run-agas-server
-t = --hpx:threads
-v = --hpx:version
-w = --hpx:worker
-x = --hpx:hpx
-0 = --hpx:node=0
-1 = --hpx:node=1
-2 = --hpx:node=2
-3 = --hpx:node=3
-4 = --hpx:node=4
-5 = --hpx:node=5
-6 = --hpx:node=6
-7 = --hpx:node=7
-8 = --hpx:node=8
-9 = --hpx:node=9
Property Description
hpx.commandline.aliasing Enable command line aliases as defined in the section hpx.commandline.aliases (see below). Defaults to 1.
hpx.commandline.allow_unknown Allow for unknown command line options to be passed through to hpx_main() Defaults to 0.
hpx.commandline.aliases.-a On the commandline, -a expands to: --hpx:agas.
hpx.commandline.aliases.-c On the commandline, -c expands to: --hpx:console.
hpx.commandline.aliases.-h On the commandline, -h expands to: --hpx:help.
hpx.commandline.aliases.--help On the commandline, --help expands to: --hpx:help.
hpx.commandline.aliases.-I On the commandline, -I expands to: --hpx:ini.
hpx.commandline.aliases.-l On the commandline, -l expands to: --hpx:localities.
hpx.commandline.aliases.-p On the commandline, -p expands to: --hpx:app-config.
hpx.commandline.aliases.-q On the commandline, -q expands to: --hpx:queuing.
hpx.commandline.aliases.-r On the commandline, -r expands to: --hpx:run-agas-server.
hpx.commandline.aliases.-t On the commandline, -t expands to: --hpx:threads.
hpx.commandline.aliases.-v On the commandline, -v expands to: --hpx:version.
hpx.commandline.aliases.--version On the commandline, --version expands to: --hpx:version.
hpx.commandline.aliases.-w On the commandline, -w expands to: --hpx:worker.
hpx.commandline.aliases.-x On the commandline, -x expands to: --hpx:hpx.
hpx.commandline.aliases.-0 On the commandline, -0 expands to: --hpx:node=0.
hpx.commandline.aliases.-1 On the commandline, -1 expands to: --hpx:node=1.
hpx.commandline.aliases.-2 On the commandline, -2 expands to: --hpx:node=2.
hpx.commandline.aliases.-3 On the commandline, -3 expands to: --hpx:node=3.
hpx.commandline.aliases.-4 On the commandline, -4 expands to: --hpx:node=4.
hpx.commandline.aliases.-5 On the commandline, -5 expands to: --hpx:node=5.
hpx.commandline.aliases.-6 On the commandline, -6 expands to: --hpx:node=6.
hpx.commandline.aliases.-7 On the commandline, -7 expands to: --hpx:node=7.
hpx.commandline.aliases.-8 On the commandline, -8 expands to: --hpx:node=8.
hpx.commandline.aliases.-9 On the commandline, -9 expands to: --hpx:node=9.
Loading INI files

During startup and after the internal database has been initialized as described in the section Built-in Default Configuration Settings, HPX will try to locate and load additional ini files to be used as a source for configuration properties. This allows for a wide spectrum of additional customization possibilities by the user and system administrators. The sequence of locations where HPX will try loading the ini files is well defined and documented in this section. All ini files found are merged into the internal configuration database. The merge operation itself conforms to the rules as described in the section The HPX INI File Format.

  1. Load all component shared libraries found in the directories specified by the property hpx.component_path and retrieve their default configuration information (see section Loading components for more details). This property can refer to a list of directories separated by ':' (Linux, Android, and MacOS) or using ';' (Windows).
  2. Load all files named hpx.ini in the directories referenced by the property hpx.master_ini_path This property can refer to a list of directories separated by ':' (Linux, Android, and MacOS) or using ';' (Windows).
  3. Load a file named .hpx.ini in the current working directory, e.g. the directory the application was invoked from.
  4. Load a file referenced by the environment variable HPX_INI. This variable is expected to provide the full path name of the ini configuration file (if any).
  5. Load a file named /etc/hpx.ini. This lookup is done on non-Windows systems only.
  6. Load a file named .hpx.ini in the home directory of the current user, e.g. the directory referenced by the environment variable HOME.
  7. Load a file named .hpx.ini in the directory referenced by the environment variable PWD.
  8. Load the file specified on the command line using the option --hpx:config.
  9. Load all properties specified on the command line using the option --hpx:ini. The properties will be added to the database in the same sequence as they are specified on the command line. The format for those options is for instance --hpx:ini=hpx.default_stack_size=0x4000. In addition to the explicit command line options, this will set the following properties as implied from other settings:
  10. Load files based on the pattern *.ini in all directories listed by the property hpx.ini_path. All files found during this search will be merged. The property hpx.ini_path can hold a list of directories separated by ':' (on Linux or Mac) or ';' (on Windows).
  11. Load the file specified on the command line using the option --hpx:app-config. Note that this file will be merged as the content for a top level section [application].

Note

Any changes made to the configuration database caused by one of the steps will influence the loading process for all subsequent steps. For instance, if one of the ini files loaded changes the property hpx.ini_path this will influence the directories searched in step 9 as described above.

Important

The HPX core library will verify that all configuration settings specified on the command line (using the --hpx:ini option) will be checked for validity. That means that the library will accept only known configuration settings. This is to protect the user from unintentional typos while specifying those settings. This behavior can be overwritten by appending a '!' to the configuration key, thus forcing the setting to be entered into the configuration database, for instance: --hpx:ini=hpx.foo! = 1

If any of the environment variables or files listed above is not found the corresponding loading step will be silently skipped.

Loading components

HPX relies on loading application specific components during the runtime of an application. Moreover, HPX comes with a set of preinstalled components supporting basic functionalities useful for almost every application. Any component in HPX is loaded from a shared library, where any of the shared libraries can contain more than one component type. During startup, HPX tries to locate all available components (e.g. their corresponding shared libraries) and creates an internal component registry for later use. This section describes the algorithm used by HPX to locate all relevant shared libraries on a system. As described, this algorithm is customizable by the configuration properties loaded from the ini files (see section Loading INI files).

Loading components is a two stage process. First HPX tries to locate all component shared libraries, loads those, and generates default configuration section in the internal configuration database for each component found. For each found component the following information is generated:

[hpx.components.<component_instance_name>]
name = <name_of_shared_library>
path = $[component_path]
enabled = $[hpx.components.load_external]
default = 1

The values in this section correspond to the expected configuration information for a component as described in the section Built-in Default Configuration Settings.

In order to locate component shared libraries, HPX will try loading all shared libraries (files with the platform specific extension of a shared library, Linux: *.so, Windows: *.dll, MacOS: *.dylib found in the directory referenced by the ini property hpx.component_path).

This first step corresponds to step 1) during the process of filling the internal configuration database with default information as described in section Loading INI files.

After all of the configuration information has been loaded, HPX performs the second step in terms of loading components. During this step, HPX scans all existing configuration sections [hpx.component.<some_component_instance_name>] and instantiates a special factory object for each of the successfully located and loaded components. During the application’s life time, these factory objects will be responsible to create new and discard old instances of the component they are associated with. This step is performed after step 11) of the process of filling the internal configuration database with default information as described in section Loading INI files.

Application specific component example

In this section we assume to have a simple application component which exposes one member function as a component action. The header file app_server.hpp declares the C++ type to be exposed as a component. This type has a member function print_greeting() which is exposed as an action print_greeting_action. We assume the source files for this example are located in a directory referenced by $APP_ROOT:

// file: $APP_ROOT/app_server.hpp
#include <hpx/hpx.hpp>
#include <hpx/include/iostreams.hpp>

namespace app
{
    // Define a simple component exposing one action 'print_greeting'
    class HPX_COMPONENT_EXPORT server
      : public hpx::components::component_base<server>
    {
        void print_greeting ()
        {
            hpx::cout << "Hey, how are you?\n" << hpx::flush;
        }

        // Component actions need to be declared, this also defines the
        // type 'print_greeting_action' representing the action.
        HPX_DEFINE_COMPONENT_ACTION(server, print_greeting, print_greeting_action);
    };
}

// Declare boilerplate code required for each of the component actions.
HPX_REGISTER_ACTION_DECLARATION(app::server::print_greeting_action);

The corresponding source file contains mainly macro invocations which define boilerplate code needed for HPX to function properly:

// file: $APP_ROOT/app_server.cpp
#include "app_server.hpp"

// Define boilerplate required once per component module.
HPX_REGISTER_COMPONENT_MODULE();

// Define factory object associated with our component of type 'app::server'.
HPX_REGISTER_COMPONENT(app::server, app_server);

// Define boilerplate code required for each of the component actions. Use the
// same argument as used for HPX_REGISTER_ACTION_DECLARATION above.
HPX_REGISTER_ACTION(app::server::print_greeting_action);

The following gives an example of how the component can be used. We create one instance of the app::server component on the current locality and invoke the exposed action print_greeting_action using the global id of the newly created instance. Note, that no special code is required to delete the component instance after it is not needed anymore. It will be deleted automatically when its last reference goes out of scope, here at the closing brace of the block surrounding the code:

// file: $APP_ROOT/use_app_server_example.cpp
#include <hpx/hpx_init.hpp>
#include "app_server.hpp"

int hpx_main()
{
    {
        // Create an instance of the app_server component on the current locality.
        hpx::naming:id_type app_server_instance =
            hpx::create_component<app::server>(hpx::find_here());

        // Create an instance of the action 'print_greeting_action'.
        app::server::print_greeting_action print_greeting;

        // Invoke the action 'print_greeting' on the newly created component.
        print_greeting(app_server_instance);
    }
    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    return hpx::init(argc, argv);
}

In order to make sure that the application will be able to use the component app::server, special configuration information must be passed to HPX. The simples way to allow HPX to ‘find’ the component is to provide special ini configuration files, which add the necessary information to the internal configuration database. The component should have a special ini file containing the information specific to the component app_server.

# file: $APP_ROOT/app_server.ini
[hpx.components.app_server]
name = app_server
path = $APP_LOCATION/

Here $APP_LOCATION is the directory where the (binary) component shared library is located. HPX will attempt to load the shared library from there. The section name hpx.components.app_server reflects the instance name of the component (app_server is an arbitrary, but unique name). The property value for hpx.components.app_server.name should be the same as used for the second argument to the macro HPX_REGISTER_COMPONENT above.

Additionally a file .hpx.ini which could be located in the current working directory (see step 3 as described in the section Loading INI files) can be used to add to the ini search path for components:

# file: $PWD/.hpx.ini
[hpx]
ini_path = $[hpx.ini_path]:$APP_ROOT/

This assumes that the above ini file specific to the component is located in the directory $APP_ROOT.

Note

It is possible to reference the defined property from inside its value. HPX will gracefully use the previous value of hpx.ini_path for the reference on the right hand side and assign the overall (now expanded) value to the property.

Logging

HPX uses a sophisticated logging framework allowing to follow in detail what operations have been performed inside the HPX library in what sequence. This information proves to be very useful for diagnosing problems or just for improving the understanding what is happening in HPX as a consequence of invoking HPX API functionality.

Default logging

Enabling default logging is a simple process. The detailed description in the remainder of this section explains different ways to customize the defaults. Default logging can be enabled by using one of the following:

  • a command line switch --hpx:debug-hpx-log, which will enable logging to the console terminal
  • the command line switch --hpx:debug-hpx-log=<filename>, which enables logging to a given file <filename>, or
  • setting an environment variable HPX_LOGLEVEL=<loglevel> while running the HPX application. In this case <loglevel> should be a number between (or equal to) 1 and 5 where 1 means minimal logging and 5 causes to log all available messages. When setting the environment variable the logs will be written to a file named hpx.<PID>.lo in the current working directory, where <PID> is the process id of the console instance of the application.
Customizing logging

Generally, logging can be customized either using environment variable settings or using by an ini configuration file. Logging is generated in several categories, each of which can be customized independently. All customizable configuration parameters have reasonable defaults, allowing to use logging without any additional configuration effort. The following table lists the available categories.

Table 18 Logging categories
Category Category shortcut Information to be generated Environment variable
General None Logging information generated by different subsystems of HPX, such as thread-manager, parcel layer, LCOs, etc. HPX_LOGLEVEL
AGAS AGAS Logging output generated by the AGAS subsystem HPX_AGAS_LOGLEVEL
Application APP Logging generated by applications. HPX_APP_LOGLEVEL

By default, all logging output is redirected to the console instance of an application, where it is collected and written to a file, one file for each logging category.

Each logging category can be customized at two levels, the parameters for each are stored in the ini configuration sections hpx.logging.CATEGORY and hpx.logging.console.CATEGORY (where CATEGORY is the category shortcut as listed in the table above). The former influences logging at the source locality and the latter modifies the logging behaviour for each of the categories at the console instance of an application.

Levels

All HPX logging output have seven different logging levels. These levels can be set explicitly or through environmental variables in the main HPX ini file as shown below. The logging levels and their associated integral values are shown in the table below, ordered from most verbose to least verbose. By default, all HPX logs are set to 0, e.g. all logging output is disabled by default.

Table 19 Logging levels
Logging level Integral value
<debug> 5
<info> 4
<warning> 3
<error> 2
<fatal> 1
No logging 0

Tip

The easiest way to enable logging output is to set the environment variable corresponding to the logging category to an integral value as described in the table above. For instance, setting HPX_LOGLEVEL=5 will enable full logging output for the general category. Please note that the syntax and means of setting environment variables varies between operating systems.

Configuration

Logs will be saved to destinations as configured by the user. By default, logging output is saved on the console instance of an application to hpx.<CATEGORY>.<PID>.lo (where CATEGORY and PID> are placeholders for the category shortcut and the OS process id). The output for the general logging category is saved to hpx.<PID>.log. The default settings for the general logging category are shown here (the syntax is described in the section The HPX INI File Format):

[hpx.logging]
level = ${HPX_LOGLEVEL:0}
destination = ${HPX_LOGDESTINATION:console}
format = ${HPX_LOGFORMAT:(T%locality%/%hpxthread%.%hpxphase%/%hpxcomponent%) P%parentloc%/%hpxparent%.%hpxparentphase% %time%($hh:$mm.$ss.$mili) [%idx%]|\\n}

The logging level is taken from the environment variable HPX_LOGLEVEL and defaults to zero, e.g. no logging. The default logging destination is read from the environment variable HPX_LOGDESTINATION On any of the localities it defaults to console which redirects all generated logging output to the console instance of an application. The following table lists the possible destinations for any logging output. It is possible to specify more than one destination separated by whitespace.

Table 20 Logging destinations
Logging destination Description
file(<filename>) Direct all output to a file with the given <filename>.
cout Direct all output to the local standard output of the application instance on this locality.
cerr Direct all output to the local standard error output of the application instance on this locality.
console Direct all output to the console instance of the application. The console instance has its logging destinations configured separately.
android_log Direct all output to the (Android) system log (available on Android systems only).

The logging format is read from the environment variable HPX_LOGFORMAT and it defaults to a complex format description. This format consists of several placeholder fields (for instance %locality% which will be replaced by concrete values when the logging output is generated. All other information is transferred verbatim to the output. The table below describes the available field placeholders. The separator character | separates the logging message prefix formatted as shown and the actual log message which will replace the separator.

Table 21 Available field placeholders
Name Description
locality The id of the locality on which the logging message was generated.
hpxthread The id of the HPX-thread generating this logging output.
hpxphase The phase [1] of the HPX-thread generating this logging output.
hpxcomponent The local virtual address of the component which the current HPX-thread is accessing.
parentloc The id of the locality where the HPX thread was running which initiated the current HPX-thread. The current HPX-thread is generating this logging output.
hpxparent The id of the HPX-thread which initiated the current HPX-thread. The current HPX-thread is generating this logging output.
hpxparentphase The phase of the HPX-thread when it initiated the current HPX-thread. The current HPX-thread is generating this logging output.
time The time stamp for this logging outputline as generated by the source locality.
idx The sequence number of the logging output line as generated on the source locality.
osthread The sequence number of the OS-thread which executes the current HPX-thread.

Note

Not all of the field placeholder may be expanded for all generated logging output. If no value is available for a particular field it is replaced with a sequence of '-' characters.]

Here is an example line from a logging output generated by one of the HPX examples (please note that this is generated on a single line, without line break):

(T00000000/0000000002d46f90.01/00000000009ebc10) P--------/0000000002d46f80.02 17:49.37.320 [000000000000004d]
    <info>  [RT] successfully created component {0000000100ff0001, 0000000000030002} of type: component_barrier[7(3)]

The default settings for the general logging category on the console is shown here:

[hpx.logging.console]
level = ${HPX_LOGLEVEL:$[hpx.logging.level]}
destination = ${HPX_CONSOLE_LOGDESTINATION:file(hpx.$[system.pid].log)}
format = ${HPX_CONSOLE_LOGFORMAT:|}

These settings define how the logging is customized once the logging output is received by the console instance of an application. The logging level is read from the environment variable HPX_LOGLEVEL (as set for the console instance of the application). The level defaults to the same values as the corresponding settings in the general logging configuration shown before. The destination on the console instance is set to be a file which name is generated based from its OS process id. Setting the environment variable HPX_CONSOLE_LOGDESTINATION allows customization of the naming scheme for the output file. The logging format is set to leave the original logging output unchanged, as received from one of the localities the application runs on.

HPX Command Line Options

The predefined command line options for any application using hpx::init are described in the following subsections.

HPX options (allowed on command line only)
--hpx:help

print out program usage (default: this message), possible values: full (additionally prints options from components)

--hpx:version

print out HPX version and copyright information

--hpx:info

print out HPX configuration information

--hpx:options-file arg

specify a file containing command line options (alternatively: @filepath)

HPX options (additionally allowed in an options file)
--hpx:worker

run this instance in worker mode

--hpx:console

run this instance in console mode

--hpx:connect

run this instance in worker mode, but connecting late

--hpx:run-agas-server

run AGAS server as part of this runtime instance

--hpx:run-hpx-main

run the hpx_main function, regardless of locality mode

--hpx:hpx arg

the IP address the HPX parcelport is listening on, expected format: address:port (default: 127.0.0.1:7910)

--hpx:agas arg

the IP address the AGAS root server is running on, expected format: address:port (default: 127.0.0.1:7910)

--hpx:run-agas-server-only

run only the AGAS server

--hpx:nodefile arg

the file name of a node file to use (list of nodes, one node name per line and core)

--hpx:nodes arg

the (space separated) list of the nodes to use (usually this is extracted from a node file)

--hpx:endnodes

this can be used to end the list of nodes specified using the option --hpx:nodes

--hpx:ifsuffix arg

suffix to append to host names in order to resolve them to the proper network interconnect

--hpx:ifprefix arg

prefix to prepend to host names in order to resolve them to the proper network interconnect

--hpx:iftransform arg

sed-style search and replace (s/search/replace/) used to transform host names to the proper network interconnect

--hpx:localities arg

the number of localities to wait for at application startup (default: 1)

--hpx:node arg

number of the node this locality is run on (must be unique)

--hpx:ignore-batch-env

ignore batch environment variables

--hpx:expect-connecting-localities

this locality expects other localities to dynamically connect (this is implied if the number of initial localities is larger than 1)

--hpx:pu-offset

the first processing unit this instance of HPX should be run on (default: 0)

--hpx:pu-step

the step between used processing unit numbers for this instance of HPX (default: 1)

--hpx:threads arg

the number of operating system threads to spawn for this HPX locality. Possible values are: numeric values 1, 2, 3 and so on, all (which spawns one thread per processing unit, includes hyperthreads), or cores (which spawns one thread per core) (default: cores).

--hpx:cores arg

the number of cores to utilize for this HPX locality (default: all, i.e. the number of cores is based on the number of threads --hpx:threads assuming --hpx:bind=compact

--hpx:affinity arg

the affinity domain the OS threads will be confined to, possible values: pu, core, numa, machine (default: pu)

--hpx:bind arg

the detailed affinity description for the OS threads, see More details about HPX command line options for a detailed description of possible values. Do not use with --hpx:pu-step, --hpx:pu-offset or --hpx:affinity options. Implies --hpx:numa-sensitive (--hpx:bind=none) disables defining thread affinities).

--hpx:print-bind

print to the console the bit masks calculated from the arguments specified to all --hpx:bind options.

--hpx:queuing arg

the queue scheduling policy to use, options are local, local-priority-fifo, local-priority-lifo, static, static-priority, abp-priority-fifo and abp-priority-lifo (default: local-priority-fifo)

--hpx:high-priority-threads arg

the number of operating system threads maintaining a high priority queue (default: number of OS threads), valid for --hpx:queuing=abp-priority, --hpx:queuing=static-priority and --hpx:queuing=local-priority only

--hpx:numa-sensitive

makes the scheduler NUMA sensitive

HPX configuraton options
--hpx:app-config arg

load the specified application configuration (ini) file

--hpx:config arg

load the specified hpx configuration (ini) file

--hpx:ini arg

add a configuration definition to the default runtime configuration

--hpx:exit

exit after configuring the runtime

HPX debugging options
--hpx:list-symbolic-names

list all registered symbolic names after startup

--hpx:list-component-types

list all dynamic component types after startup

--hpx:dump-config-initial

print the initial runtime configuration

--hpx:dump-config

print the final runtime configuration

--hpx:debug-hpx-log [arg]

enable all messages on the HPX log channel and send all HPX logs to the target destination (default: cout)

--hpx:debug-agas-log [arg]

enable all messages on the AGAS log channel and send all AGAS logs to the target destination (default: cout)

--hpx:debug-parcel-log [arg]

enable all messages on the parcel transport log channel and send all parcel transport logs to the target destination (default: cout)

--hpx:debug-timing-log [arg]

enable all messages on the timing log channel and send all timing logs to the target destination (default: cout)

--hpx:debug-app-log [arg]

enable all messages on the application log channel and send all application logs to the target destination (default: cout)

--hpx:debug-clp

debug command line processing

--hpx:attach-debugger arg

wait for a debugger to be attached, possible arg values: startup or exception (default: startup)

Command line argument shortcuts

Additionally, the following shortcuts are available from every HPX application.

Table 22 Predefined command line option shortcuts
Shortcut option Equivalent long option
-a --hpx:agas
-c --hpx:console
-h --hpx:help
-I --hpx:ini
-l --hpx:localities
-p --hpx:app-config
-q --hpx:queuing
-r --hpx:run-agas-server
-t --hpx:threads
-v --hpx:version
-w --hpx:worker
-x --hpx:hpx
-0 --hpx:node=0
-1 --hpx:node=1
-2 --hpx:node=2
-3 --hpx:node=3
-4 --hpx:node=4
-5 --hpx:node=5
-6 --hpx:node=6
-7 --hpx:node=7
-8 --hpx:node=8
-9 --hpx:node=9

It is possible to define your own shortcut options. In fact, all of the shortcuts listed above are pre-defined using the technique described here. Also, it is possible to redefine any of the pre-defined shortcuts to expand differently as well.

Shortcut options are obtained from the internal configuration database. They are stored as key-value properties in a special properties section named hpx.commandline. You can define your own shortcuts by adding the corresponding definitions to one of the ini configuration files as described in the section Configuring HPX applications. For instance, in order to define a command line shortcut --p which should expand to -hpx:print-counter, the following configuration information needs to be added to one of the ini configuration files:

[hpx.commandline.aliases]
--pc = --hpx:print-counter

Note

Any arguments for shortcut options passed on the command line are retained and passed as arguments to the corresponding expanded option. For instance, given the definition above, the command line option:

--pc=/threads{locality#0/total}/count/cumulative

would be expanded to:

--hpx:print-counter=/threads{locality#0/total}/count/cumulative

Important

Any shortcut option should either start with a single '-' or with two '--' characters. Shortcuts starting with a single '-' are interpreted as short options (i.e. everything after the first character following the '-' is treated as the argument). Shortcuts starting with '--' are interpreted as long options. No other shortcut formats are supported.

Specifying options for single localities only

For runs involving more than one locality it is sometimes desirable to supply specific command line options to single localities only. When the HPX application is launched using a scheduler (like PBS, for more details see section How to use HPX applications with PBS), specifying dedicated command line options for single localities may be desirable. For this reason all of the command line options which have the general format --hpx:<some_key> can be used in a more general form: --hpx:<N>:<some_key>, where <N> is the number of the locality this command line options will be applied to, all other localities will simply ignore the option. For instance, the following PBS script passes the option --hpx:pu-offset=4 to the locality '1' only.

#!/bin/bash
#
#PBS -l nodes=2:ppn=4

APP_PATH=~/packages/hpx/bin/hello_world_distributed
APP_OPTIONS=

pbsdsh -u $APP_PATH $APP_OPTIONS --hpx:1:pu-offset=4 --hpx:nodes=`cat $PBS_NODEFILE`

Caution

If the first application specific argument (inside $APP_OPTIONS is a non-option (i.e. does not start with a - or a --, then it must be placed before the option --hpx:nodes, which, in this case, should be the last option on the command line.

Alternatively, use the option --hpx:endnodes to explicitly mark the end of the list of node names:

pbsdsh -u $APP_PATH --hpx:1:pu-offset=4 --hpx:nodes=`cat $PBS_NODEFILE` --hpx:endnodes $APP_OPTIONS
More details about HPX command line options

This section documents the following list of the command line options in more detail:

The command line option --hpx:bind

This command line option allows one to specify the required affinity of the HPX worker threads to the underlying processing units. As a result the worker threads will run only on the processing units identified by the corresponding bind specification. The affinity settings are to be specified using --hpx:bind=<BINDINGS>, where <BINDINGS> have to be formatted as described below.

In addition to the syntax described below one can use --hpx:bind=none to disable all binding of any threads to a particular core. This is mostly supported for debugging purposes.

The specified affinities refer to specific regions within a machine hardware topology. In order to understand the hardware topology of a particular machine it may be useful to run the lstopo tool which is part of Portable Hardware Locality (HWLOC) to see the reported topology tree. Seeing and understanding a topology tree will definitely help in understanding the concepts that are discussed below.

Affinities can be specified using HWLOC (Portable Hardware Locality (HWLOC)) tuples. Tuples of HWLOC objects and associated indexes can be specified in the form object:index, object:index-index or object:index,...,index. HWLOC objects represent types of mapped items in a topology tree. Possible values for objects are socket, numanode, core and pu (processing unit). Indexes are non-negative integers that specify a unique physical object in a topology tree using its logical sequence number.

Chaining multiple tuples together in the more general form object1:index1[.object2:index2[...]] is permissible. While the first tuple’s object may appear anywhere in the topology, the Nth tuple’s object must have a shallower topology depth than the (N+1)th tuple’s object. Put simply: as you move right in a tuple chain, objects must go deeper in the topology tree. Indexes specified in chained tuples are relative to the scope of the parent object. For example, socket:0.core:1 refers to the second core in the first socket (all indices are zero based).

Multiple affinities can be specified using several --hpx:bind command line options or by appending several affinities separated by a ';' By default, if multiple affinities are specified, they are added.

"all" is a special affinity consisting in the entire current topology.

Note

All ‘names’ in an affinity specification, such as thread, socket, numanode, pu or all can be abbreviated. Thus the affinity specification threads:0-3=socket:0.core:1.pu:1 is fully equivalent to its shortened form t:0-3=s:0.c:1.p:1.

Here is a full grammar describing the possible format of mappings:

mappings     ::=  distribution | mapping (";" mapping)*
distribution ::=  "compact" | "scatter" | "balanced" | "numa-balanced"
mapping      ::=  thread_spec "=" pu_specs
thread_spec  ::=  "thread:" range_specs
pu_specs     ::=  pu_spec ("." pu_spec)*
pu_spec      ::=  type ":" range_specs | "~" pu_spec
range_specs  ::=  range_spec ("," range_spec)*
range_spec   ::=  int | int "-" int | "all"
type         ::=  "socket" | "numanode" | "core" | "pu"

The following example assumes a system with at least 4 cores, where each core has more than 1 processing unit (hardware threads). Running hello_world_distributed with 4 OS-threads (on 4 processing units), where each of those threads is bound to the first processing unit of each of the cores, can be achieved by invoking:

hello_world_distributed -t4 --hpx:bind=thread:0-3=core:0-3.pu:0

Here thread:0-3 specifies the OS threads for which to define affinity bindings, and core:0-3.pu: defines that for each of the cores (core:0-3) only their first processing unit pu:0 should be used.

Note

The command line option --hpx:print-bind can be used to print the bitmasks generated from the affinity mappings as specified with --hpx:bind. For instance, on a system with hyperthreading enabled (i.e. 2 processing units per core), the command line:

hello_world_distributed -t4 --hpx:bind=thread:0-3=core:0-3.pu:0 --hpx:print-bind

will cause this output to be printed:

0: PU L#0(P#0), Core L#0, Socket L#0, Node L#0(P#0)
1: PU L#2(P#2), Core L#1, Socket L#0, Node L#0(P#0)
2: PU L#4(P#4), Core L#2, Socket L#0, Node L#0(P#0)
3: PU L#6(P#6), Core L#3, Socket L#0, Node L#0(P#0)

where each bit in the bitmasks corresponds to a processing unit the listed worker thread will be bound to run on.

The difference between the four possible predefined distribution schemes (compact, scatter, balanced and numa-balanced) is best explained with an example. Imagine that we have a system with 4 cores and 4 hardware threads per core on 2 sockets. If we place 8 threads the assignments produced by the compact, scatter, balanced and numa-balanced types are shown in the figure below. Notice that compact does not fully utilize all the cores in the system. For this reason it is recommended that applications are run using the scatter or balanced/numa-balanced options in most cases.

_images/affinities.png

Fig. 7 Schematic of thread affinity type distributions.

[1]The phase of a HPX-thread counts how often this thread has been activated.

Writing single-node HPX applications

HPX is a C++ Standard Library for Concurrency and Parallelism. This means that it implements all of the corresponding facilities as defined by the C++ Standard. Additionally, in HPX we implement functionalities proposed as part of the ongoing C++ standardization process. This section focuses on the features available in HPX for parallel and concurrent computation on a single node, although many of the features presented here are also implemented to work in the distributed case.

Using LCOs

Lightweight Control Objects provide synchronization for HPX applications. Most of them are familiar from other frameworks, but a few of them work in slightly special different ways adapted to HPX.

  1. future
  2. queue
  3. object_semaphore
  4. barrier
Channels

Channels combine communication (the exchange of a value) with synchronization (guaranteeing that two calculations (tasks) are in a known state). A channel can transport any number of values of a given type from a sender to a receiver:

hpx::lcos::local::channel<int> c;
c.set(42);
cout << c.get();      // will print '42'

Channels can be handed to another thread (or in case of channel components, to other localities), thus establishing a communication channel between two independent places in the program:

void do_something(
    hpx::lcos::local::receive_channel<int> c,
    hpx::lcos::local::send_channel<> done)
{
    cout << c.get();        // prints 42
    done.set();             // signal back
}

{
    hpx::lcos::local::channel<int> c;
    hpx::lcos::local::channel<> done;

    hpx::apply(&do_something, c, done);

    c.set(42);              // send some value
    done.get();             // wait for thread to be done
}

A channel component is created on one locality and can be send to another locality using an action. This example also demonstrates how a channel can be used as a range of values:

// channel components need to be registered for each used type (not needed
// for hpx::lcos::local::channel)
HPX_REGISTER_CHANNEL(double);

void some_action(hpx::lcos::channel<double> c)
{
    for (double d : c)
        hpx::cout << d << std::endl;
}
HPX_REGISTER_ACTION(some_action);

{
    // create the channel on this locality
    hpx::lcos::channel<double> c(hpx::find_here());

    // pass the channel to a (possibly remote invoked) action
    hpx::apply(some_action(), hpx::find_here(), c);

    // send some values to the receiver
    std::vector<double> v = { 1.2, 3.4, 5.0 };
    for (double d : v)
        c.set(d);

    // explicitly close the communication channel (implicit at destruction)
    c.close();
}
Composable guards

Composable guards operate in a manner similar to locks, but are applied only to asynchronous functions. The guard (or guards) is automatically locked at the beginning of a specified task and automatically unlocked at the end. Because guards are never added to an existing task’s execution context, the calling of guards is freely composable and can never deadlock.

To call an application with a single guard, simply declare the guard and call run_guarded() with a function (task):

hpx::lcos::local::guard gu;
run_guarded(gu,task);

If a single method needs to run with multiple guards, use a guard set:

boost::shared<hpx::lcos::local::guard> gu1(new hpx::lcos::local::guard());
boost::shared<hpx::lcos::local::guard> gu2(new hpx::lcos::local::guard());
gs.add(*gu1);
gs.add(*gu2);
run_guarded(gs,task);

Guards use two atomic operations (which are not called repeatedly) to manage what they do, so overhead should be extremely low.

  1. conditional_trigger
  2. counting_semaphore
  3. dataflow
  4. event
  5. mutex
  6. once
  7. recursive_mutex
  8. spinlock
  9. spinlock_no_backoff
  10. trigger
Extended facilities for futures

Concurrency is about both decomposing and composing the program from the parts that work well individually and together. It is in the composition of connected and multicore components where today’s C++ libraries are still lacking.

The functionality of std::future offers a partial solution. It allows for the separation of the initiation of an operation and the act of waiting for its result; however the act of waiting is synchronous. In communication-intensive code this act of waiting can be unpredictable, inefficient and simply frustrating. The example below illustrates a possible synchronous wait using futures:

#include <future>
using namespace std;
int main()
{
    future<int> f = async([]() { return 123; });
    int result = f.get(); // might block
}

For this reason, HPX implements a set of extensions to std::future (as proposed by __cpp11_n4107__). This proposal introduces the following key asynchronous operations to hpx::future, hpx::shared_future and hpx::async, which enhance and enrich these facilities.

Table 23 Facilities extending std::future
Facility Description
hpx::future::then In asynchronous programming, it is very common for one asynchronous operation, on completion, to invoke a second operation and pass data to it. The current C++ standard does not allow one to register a continuation to a future. With``then`` instead of waiting for the result, a continuation is “attached” to the asynchronous operation, which is invoked when the result is ready. Continuations registered using then function will help to avoid blocking waits or wasting threads on polling, greatly improving the responsiveness and scalability of an application.
unwrapping constructor for hpx::future In some scenarios, you might want to create a future that returns another future, resulting in nested futures. Although it is possible to write code to unwrap the outer future and retrieve the nested future and its result, such code is not easy to write because you must handle exceptions and it may cause a blocking call. Unwrapping can allow us to mitigate this problem by doing an asynchronous call to unwrap the outermost future.
hpx::future::is_ready There are often situations where a get() call on a future may not be a blocking call, or is only a blocking call under certain circumstances. This function gives the ability to test for early completion and allows us to avoid associating a continuation, which needs to be scheduled with some non-trivial overhead and near-certain loss of cache efficiency.
hpx::make_ready_future Some functions may know the value at the point of construction. In these cases the value is immediately available, but needs to be returned as a future. By using``hpx::make_ready_future``a future can be created which holds a pre-computed result in its shared state. In the current standard it is non-trivial to create a future directly from a value. First a promise must be created, then the promise is set, and lastly the future is retrieved from the promise. This can now be done with one operation.

The standard also omits the ability to compose multiple futures. This is a common pattern that is ubiquitous in other asynchronous frameworks and is absolutely necessary in order to make C++ a powerful asynchronous programming language. Not including these functions is synonymous to Boolean algebra without AND/OR.

In addition to the extensions proposed by N4313, HPX adds functions allowing to compose several futures in a more flexible way.

Table 24 Facilities for composing hpx::futures
Facility Description Comment
hpx::when_any, hpx::when_any_n Asynchronously wait for at least one of multiple future or shared_future objects to finish. N4313, ..._n versions are HPX only
hpx::wait_any, hpx::wait_any_n Synchronously wait for at least one of multiple future or shared_future objects to finish. HPX only
hpx::when_all, hpx::when_all_n Asynchronously wait for all future and shared_future objects to finish. N4313, ..._n versions are HPX only
hpx::wait_all, hpx::wait_all_n Synchronously wait for all future and shared_future objects to finish. HPX only
hpx::when_some, hpx::when_some_n Asynchronously wait for multiple future and shared_future objects to finish. HPX only
hpx::wait_some, hpx::wait_some_n Synchronously wait for multiple future and shared_future objects to finish. HPX only
hpx::when_each Asynchronously wait for multiple future and shared_future objects to finish and call a function for each of the future objects as soon as it becomes ready. HPX only
hpx::wait_each, hpx::wait_each_n Synchronously wait for multiple future and shared_future objects to finish and call a function for each of the future objects as soon as it becomes ready. HPX only
High level parallel facilities

In preparation for the upcoming C++ Standards we currently see several proposals targeting different facilities supporting parallel programming. HPX implements (and extends) some of those proposals. This is well aligned with our strategy to align the APIs exposed from HPX with current and future C++ Standards.

At this point, HPX implements several of the C++ Standardization working papers, most notably N4409 (Working Draft, Technical Specification for C++ Extensions for Parallelism), N4411 (Task Blocks), and N4406 (Parallel Algorithms Need Executors).

Using parallel algorithms

A parallel algorithm is a function template described by this document which is declared in the (inline) namespace hpx::parallel::v1.

Note

For compilers which do not support inline namespaces, all of the namespace v1 is imported into the namespace hpx::parallel. The effect is similar to what inline namespaces would do, namely all names defined in hpx::parallel::v1 are accessible from the namespace hpx::parallel as well.

All parallel algorithms are very similar in semantics to their sequential counterparts (as defined in the namespace std with an additional formal template parameter named ExecutionPolicy. The execution policy is generally passed as the first argument to any of the parallel algorithms and describes the manner in which the execution of these algorithms may be parallelized and the manner in which they apply user-provided function objects.

The applications of function objects in parallel algorithms invoked with an execution policy object of type hpx::parallel::execution::sequenced_policy or hpx::parallel::execution::sequenced_task_policy execute in sequential order. For hpx::parallel::execution::sequenced_policy the execution happens in the calling thread.

The applications of function objects in parallel algorithms invoked with an execution policy object of type hpx::parallel::execution::parallel_policy or hpx::parallel::execution::parallel_task_policy are permitted to execute in an unordered fashion in unspecified threads, and indeterminately sequenced within each thread.

Important

It is the caller’s responsibility to ensure correctness, for example that the invocation does not introduce data races or deadlocks.

The applications of function objects in parallel algorithms invoked with an execution policy of type hpx::parallel::execution::parallel_unsequenced_policy is in HPX equivalent to the use of the execution policy hpx::parallel::execution::parallel_policy.

Algorithms invoked with an execution policy object of type hpx::parallel::v1::execution_policy execute internally as if invoked with the contained execution policy object. No exception is thrown when an hpx::parallel::v1::execution_policy contains an execution policy of type hpx::parallel::execution::sequenced_task_policy or hpx::parallel::execution::parallel_task_policy (which normally turn the algorithm into its asynchronous version). In this case the execution is semantically equivalent to the case of passing a hpx::parallel::execution::sequenced_policy or hpx::parallel::execution::parallel_policy contained in the hpx::parallel::v1::execution_policy object respectively.

Parallel exceptions

During the execution of a standard parallel algorithm, if temporary memory resources are required by any of the algorithms and no memory are available, the algorithm throws a std::bad_alloc exception.

During the execution of any of the parallel algorithms, if the application of a function object terminates with an uncaught exception, the behavior of the program is determined by the type of execution policy used to invoke the algorithm:

For example, the number of invocations of the user-provided function object in for_each is unspecified. When hpx::parallel::v1::for_each is executed sequentially, only one exception will be contained in the hpx::exception_list object.

These guarantees imply that, unless the algorithm has failed to allocate memory and terminated with std::bad_alloc all exceptions thrown during the execution of the algorithm are communicated to the caller. It is unspecified whether an algorithm implementation will “forge ahead” after encountering and capturing a user exception.

The algorithm may terminate with the std::bad_alloc exception even if one or more user-provided function objects have terminated with an exception. For example, this can happen when an algorithm fails to allocate memory while creating or adding elements to the hpx::exception_list object.

Parallel algorithms

HPX provides implementations of the following parallel algorithms:

Table 25 Non-modifying parallel algorithms (in header: <hpx/include/parallel_algorithm.hpp>)
Name Description In header Algorithm page at cppreference.com
hpx::parallel::v1::adjacent_find Computes the differences between adjacent elements in a range. <hpx/include/parallel_adjacent_find.hpp> adjacent_find
hpx::parallel::v1::all_of Checks if a predicate is true for all of the elements in a range. <hpx/include/parallel_all_any_none.hpp> all_any_none_of
hpx::parallel::v1::any_of Checks if a predicate is true for any of the elements in a range. <hpx/include/parallel_all_any_none.hpp> all_any_none_of
hpx::parallel::v1::count Returns the number of elements equal to a given value. <hpx/include/parallel_count.hpp> count
hpx::parallel::v1::count_if Returns the number of elements satisfying a specific criteria. <hpx/include/parallel_count.hpp> count_if
hpx::parallel::v1::equal Determines if two sets of elements are the same. <hpx/include/parallel_equal.hpp> equal
hpx::parallel::v1::exclusive_scan Does an exclusive parallel scan over a range of elements. <hpx/include/parallel_scan.hpp> exclusive_scan
hpx::parallel::v1::find Finds the first element equal to a given value. <hpx/include/parallel_find.hpp> find
hpx::parallel::v1::find_end Finds the last sequence of elements in a certain range. <hpx/include/parallel_find.hpp> find_end
hpx::parallel::v1::find_first_of Searches for any one of a set of elements. <hpx/include/parallel_find.hpp> find_first_of
hpx::parallel::v1::find_if Finds the first element satisfying a specific criteria. <hpx/include/parallel_find.hpp> find
hpx::parallel::v1::find_if_not Finds the first element not satisfying a specific criteria. <hpx/include/parallel_find.hpp> find_if_not
hpx::parallel::v1::for_each Applies a function to a range of elements. <hpx/include/parallel_for_each.hpp> for_each
hpx::parallel::v1::for_each_n Applies a function to a number of elements. <hpx/include/parallel_for_each.hpp> for_each_n
hpx::parallel::v1::inclusive_scan Does an inclusive parallel scan over a range of elements. <hpx/include/parallel_scan.hpp> inclusive_scan
hpx::parallel::v1::lexicographical_compare Checks if a range of values is lexicographically less than another range of values. <hpx/include/parallel_lexicographical_compare.hpp> lexicographical_compare
hpx::parallel::v1::mismatch Finds the first position where two ranges differ. <hpx/include/parallel_mismatch.hpp> mismatch
hpx::parallel::v1::none_of Checks if a predicate is true for none of the elements in a range. <hpx/include/parallel_all_any_none.hpp> all_any_none_of
hpx::parallel::v1::search Searches for a range of elements. <hpx/include/parallel_search.hpp> search
hpx::parallel::v1::search_n Searches for a number consecutive copies of an element in a range. <hpx/include/parallel_search.hpp> search_n
Table 26 Modifying Parallel Algorithms (In Header: <hpx/include/parallel_algorithm.hpp>)
Name Description In header Algorithm page at cppreference.com
hpx::parallel::v1::copy Copies a range of elements to a new location. <hpx/include/parallel_copy.hpp> exclusive_scan
hpx::parallel::v1::copy_n Copies a number of elements to a new location. <hpx/include/parallel_copy.hpp> copy_n
hpx::parallel::v1::copy_if Copies the elements from a range to a new location for which the given predicate is true <hpx/include/parallel_copy.hpp> copy
hpx::parallel::v1::move Moves a range of elements to a new location. <hpx/include/parallel_fill.hpp> move
hpx::parallel::v1::fill Assigns a range of elements a certain value. <hpx/include/parallel_fill.hpp> fill
hpx::parallel::v1::fill_n Assigns a value to a number of elements. <hpx/include/parallel_fill.hpp> fill_n
hpx::parallel::v1::generate Saves the result of a function in a range. <hpx/include/parallel_generate.hpp> generate
hpx::parallel::v1::generate_n Saves the result of N applications of a function. <hpx/include/parallel_generate.hpp> generate_n
hpx::parallel::v1::remove Removes the elements from a range that are equal to the given value. <hpx/include/parallel_remove.hpp> remove
hpx::parallel::v1::remove_if Removes the elements from a range that are equal to the given predicate is false <hpx/include/parallel_remove.hpp> remove
hpx::parallel::v1::remove_copy Copies the elements from a range to a new location that are not equal to the given value. <hpx/include/parallel_remove_copy.hpp> remove_copy
hpx::parallel::v1::remove_copy_if Copies the elements from a range to a new location for which the given predicate is false <hpx/include/parallel_remove_copy.hpp> remove_copy
hpx::parallel::v1::replace Replaces all values satisfying specific criteria with another value. <hpx/include/parallel_replace.hpp> replace
hpx::parallel::v1::replace_if Replaces all values satisfying specific criteria with another value. <hpx/include/parallel_replace.hpp> replace
hpx::parallel::v1::replace_copy Copies a range, replacing elements satisfying specific criteria with another value. <hpx/include/parallel_replace.hpp> replace_copy
hpx::parallel::v1::replace_copy_if Copies a range, replacing elements satisfying specific criteria with another value. <hpx/include/parallel_replace.hpp> replace_copy
hpx::parallel::v1::reverse Reverses the order elements in a range. <hpx/include/parallel_reverse.hpp> reverse
hpx::parallel::v1::reverse_copy Creates a copy of a range that is reversed. <hpx/include/parallel_reverse.hpp> reverse_copy
hpx::parallel::v1::rotate Rotates the order of elements in a range. <hpx/include/parallel_rotate.hpp> rotate
hpx::parallel::v1::rotate_copy Copies and rotates a range of elements. <hpx/include/parallel_rotate.hpp> rotate_copy
hpx::parallel::v1::swap_ranges Swaps two ranges of elements. <hpx/include/parallel_swap_ranges.hpp> swap_ranges
hpx::parallel::v1::transform Applies a function to a range of elements. <hpx/include/parallel_transform.hpp> transform
hpx::parallel::v1::unique_copy Eliminates all but the first element from every consecutive group of equivalent elements from a range. <hpx/include/parallel_unique.hpp> unique
hpx::parallel::v1::unique_copy Eliminates all but the first element from every consecutive group of equivalent elements from a range. <hpx/include/parallel_unique.hpp> unique_copy
Table 27 Set operations on sorted sequences (In Header: <hpx/include/parallel_algorithm.hpp>)
Name Description In header Algorithm page at cppreference.com
hpx::parallel::v1::merge Merges two sorted ranges. <hpx/include/parallel_merge.hpp> merge
hpx::parallel::v1::inplace_merge Merges two ordered ranges in-place. <hpx/include/parallel_merge.hpp> inplace_merge
hpx::parallel::v1::includes Returns true if one set is a subset of another. <hpx/include/parallel_set_operations.hpp> includes
hpx::parallel::v1::set_difference Computes the difference between two sets. <hpx/include/parallel_set_operations.hpp> set_difference
hpx::parallel::v1::set_intersection Computes the intersection of two sets. <hpx/include/parallel_set_operations.hpp> set_intersection
hpx::parallel::v1::set_symmetric_difference Computes the symmetric difference between two sets. <hpx/include/parallel_set_operations.hpp> set_symmetric_difference
hpx::parallel::v1::set_union Computes the union of two sets. <hpx/include/parallel_set_operations.hpp> set_union
Table 28 Heap operations (In Header: <hpx/include/parallel_algorithm.hpp>)
Name Description In header Algorithm page at cppreference.com
hpx::parallel::v1::is_heap Returns true if the range is max heap. <hpx/include/is_heap.hpp> is_heap
hpx::parallel::v1::is_heap_until Returns the first element that breaks a max heap. <hpx/include/is_heap.hpp> is_heap_until
Table 29 Minimum/maximum operations (In Header: <hpx/include/parallel_algortithm.hpp>)
Name Description In header Algorithm page at cppreference.com
hpx::parallel::v1::max_element Returns the largest element in a range. <hpx/include/parallel_minmax.hpp> max_element
hpx::parallel::v1::min_element Returns the smallest element in a range. <hpx/include/parallel_minmax.hpp> min_element
hpx::parallel::v1::minmax_element Returns the smallest and the largest element in a range. <hpx/include/parallel_minmax.hpp> minmax_element
Table 30 Partitioning Operations (In Header: <hpx/include/parallel_algorithm.hpp>)
Name Description In header Algorithm page at cppreference.com
hpx::parallel::v1::is_partitioned Returns true if each true element for a predicate precedes the false elements in a range <hpx/include/parallel_is_partitioned.hpp> is_partitioned
hpx::parallel::v1::partition Divides elements into two groups while don’t preserve their relative order <hpx/include/parallel_partition.hpp> partition
hpx::parallel::v1::partition_copy Copies a range dividing the elements into two groups <hpx/include/parallel_partition.hpp> partition_copy
hpx::parallel::v1::stable_partition Divides elements into two groups while preserving their relative order <hpx/include/parallel_partition.hpp> stable_partition
Table 31 Sorting Operations (In Header: <hpx/include/parallel_algorithm.hpp>)
Name Description In header Algorithm page at cppreference.com
hpx::parallel::v1::is_sorted Returns true if each element in a range is sorted <hpx/include/parallel_is_sorted.hpp> is_sorted
hpx::parallel::v1::is_sorted_until Returns the first unsorted element <hpx/include/parallel_is_sorted.hpp> is_sorted_until
hpx::parallel::v1::sort Sorts the elements in a range <hpx/include/parallel_sort.hpp> sort
hpx::parallel::v1::sort_by_key Sorts one range of data using keys supplied in another range <hpx/include/parallel_sort.hpp>  
Table 32 Numeric Parallel Algorithms (In Header: <hpx/include/parallel_numeric.hpp>)
Name Description In header Algorithm page at cppreference.com
hpx::parallel::v1::adjacent_difference Calculates the difference between each element in an input range and the preceding element. <hpx/include/parallel_adjacent_difference.hpp> adjacent_difference
hpx::parallel::v1::reduce Sums up a range of elements. <hpx/include/parallel_reduce.hpp> reduce
hpx::parallel::v1::reduce_by_key Performs an inclusive scan on consecutive elements with matching keys, with a reduction to output only the final sum for each key. The key sequence {1,1,1,2,3,3,3,3,1} and value sequence {2,3,4,5,6,7,8,9,10} would be reduced to keys={1,2,3,1}, values={9,5,30,10} <hpx/include/parallel_reduce.hpp>  
hpx::parallel::v1::transform_reduce Sums up a range of elements after applying a function. Also, accumulates the inner products of two input ranges. <hpx/include/parallel_transform_reduce.hpp> transform_reduce
hpx::parallel::v1::transform_inclusive_scan Does an inclusive parallel scan over a range of elements after applying a function. <hpx/include/parallel_scan.hpp> transform_inclusive_scan
hpx::parallel::v1::transform_exclusive_scan Does an exclusive parallel scan over a range of elements after applying a function. <hpx/include/parallel_scan.hpp> transform_exclusive_scan
Table 33 Dynamic Memory Management (In Header: <hpx/include/parallel_memory.hpp>)
Name Description In header Algorithm page at cppreference.com
hpx::parallel::v1::destroy Destroys a range of objects. <hpx/include/parallel_destroy.hpp> destroy
hpx::parallel::v1::destroy_n Destroys a range of objects. <hpx/include/parallel_destroy.hpp> destroy_n
hpx::parallel::v1::uninitialized_copy Copies a range of objects to an uninitialized area of memory. <hpx/include/parallel_uninitialized_copy.hpp> uninitialized_copy
hpx::parallel::v1::uninitialized_copy_n Copies a number of objects to an uninitialized area of memory. <hpx/include/parallel_uninitialized_copy.hpp> uninitialized_copy_n
hpx::parallel::v1::uninitialized_default_construct Copies a range of objects to an uninitialized area of memory. <hpx/include/parallel_uninitialized_default_construct.hpp> uninitialized_default_construct
hpx::parallel::v1::uninitialized_default_construct_n Copies a number of objects to an uninitialized area of memory. <hpx/include/parallel_uninitialized_default_construct.hpp> uninitialized_default_construct_n
hpx::parallel::v1::uninitialized_fill Copies an object to an uninitialized area of memory. <hpx/include/parallel_uninitialized_fill.hpp> uninitialized_fill
hpx::parallel::v1::uninitialized_fill_n Copies an object to an uninitialized area of memory. <hpx/include/parallel_uninitialized_fill.hpp> uninitialized_fill_n
hpx::parallel::v1::uninitialized_move Moves a range of objects to an uninitialized area of memory. <hpx/include/parallel_uninitialized_move.hpp> uninitialized_move
hpx::parallel::v1::uninitialized_move_n Moves a number of objects to an uninitialized area of memory. <hpx/include/parallel_uninitialized_move.hpp> uninitialized_move_n
hpx::parallel::v1::uninitialized_value_construct Constructs objects in an uninitialized area of memory. <hpx/include/parallel_uninitialized_value_construct.hpp> uninitialized_value_construct
hpx::parallel::v1::uninitialized_value_construct_n Constructs objects in an uninitialized area of memory. <hpx/include/uninitialized_value_construct.hpp> uninitialized_value_construct_n
Table 34 Index-based for-loops (In Header: <hpx/include/parallel_algorithm.hpp>)
Name Description In header
hpx::parallel::v2::for_loop Implements loop functionality over a range specified by integral or iterator bounds. <hpx/include/parallel_for_loop.hpp>
hpx::parallel::v2::for_loop_strided Implements loop functionality over a range specified by integral or iterator bounds. <hpx/include/parallel_for_loop.hpp>
hpx::parallel::v2::for_loop_n Implements loop functionality over a range specified by integral or iterator bounds. <hpx/include/parallel_for_loop.hpp>
hpx::parallel::v2::for_loop_n_strided Implements loop functionality over a range specified by integral or iterator bounds. <hpx/include/parallel_for_loop.hpp>
Executor parameters and executor parameter traits

In HPX we introduce the notion of execution parameters and execution parameter traits. At this point, the only parameter which can be customized is the size of the chunks of work executed on a single HPX-thread (such as the number of loop iterations combined to run as a single task).

An executor parameter object is responsible for exposing the calculation of the size of the chunks scheduled. It abstracts the (potential platform-specific) algorithms of determining those chunks sizes.

The way executor parameters are implemented is aligned with the way executors are implemented. All functionalities of concrete executor parameter types are exposed and accessible through a corresponding hpx::parallel::executor_parameter_traits type.

With executor_parameter_traits clients access all types of executor parameters uniformly:

std::size_t chunk_size =
    executor_parameter_traits<my_parameter_t>::get_chunk_size(my_parameter,
        my_executor, [](){ return 0; }, num_tasks);

This call synchronously retrieves the size of a single chunk of loop iterations (or similar) to combine for execution on a single HPX-thread if the overall number of tasks to schedule is given by num_tasks. The lambda function exposes a means of test-probing the execution of a single iteration for performance measurement purposes (the execution parameter type might dynamically determine the execution time of one or more tasks in order to calculate the chunk size, see hpx::parallel::execution::auto_chunk_size for an example of such a executor parameter type).

Other functions in the interface exist to discover whether a executor parameter type should be invoked once (i.e. returns a static chunk size, see hpx::parallel::execution::static_chunk_size) or whether it should be invoked for each scheduled chunk of work (i.e. it returns a variable chunk size, for an example, see hpx::parallel::execution::guided_chunk_size).

Though this interface appears to require executor parameter type authors to implement all different basic operations, there is really none required. In practice, all operations have sensible defaults. However, some executor parameter types will naturally specialize all operations for maximum efficiency.

In HPX we have implemented the following executor parameter types:

  • hpx::parallel::execution::auto_chunk_size: Loop iterations are divided into pieces and then assigned to threads. The number of loop iterations combined is determined based on measurements of how long the execution of 1% of the overall number of iterations takes. This executor parameters type makes sure that as many loop iterations are combined as necessary to run for the amount of time specified.
  • hpx::parallel::execution::static_chunk_size: Loop iterations are divided into pieces of a given size and then assigned to threads. If the size is not specified, the iterations are evenly (if possible) divided contiguously among the threads. This executor parameters type is equivalent to OpenMP’s STATIC scheduling directive.
  • hpx::parallel::execution::dynamic_chunk_size: Loop iterations are divided into pieces of a given size and then dynamically scheduled among the cores; when an core finishes one chunk, it is dynamically assigned another If the size is not specified, the default chunk size is 1. This executor parameters type is equivalent to OpenMP’s DYNAMIC scheduling directive.
  • hpx::parallel::execution::guided_chunk_size: Iterations are dynamically assigned to cores in blocks as cores request them until no blocks remain to be assigned. Similar to dynamic_chunk_size except that the block size decreases each time a number of loop iterations is given to a thread. The size of the initial block is proportional to number_of_iterations / number_of_cores. Subsequent blocks are proportional to number_of_iterations_remaining / number_of_cores. The optional chunk size parameter defines the minimum block size. The default minimal chunk size is 1. This executor parameters type is equivalent to OpenMP’s GUIDED scheduling directive.
Using task blocks

The define_task_block, run and the wait functions implemented based on N4411 are based on the task_block concept that is a part of the common subset of the Microsoft Parallel Patterns Library (PPL) and the Intel Threading Building Blocks (TBB) libraries.

This implementations adopts a simpler syntax than exposed by those libraries— one that is influenced by language-based concepts such as spawn and sync from Cilk++ and async and finish from X10. It improves on existing practice in the following ways:

  • The exception handling model is simplified and more consistent with normal C++ exceptions.
  • Most violations of strict fork-join parallelism can be enforced at compile time (with compiler assistance, in some cases).
  • The syntax allows scheduling approaches other than child stealing.

Consider an example of a parallel traversal of a tree, where a user-provided function compute is applied to each node of the tree, returning the sum of the results:

template <typename Func>
int traverse(node& n, Func && compute)
{
    int left = 0, right = 0;
    define_task_block(
        [&](task_block<>& tr) {
            if (n.left)
                tr.run([&] { left = traverse(*n.left, compute); });
            if (n.right)
                tr.run([&] { right = traverse(*n.right, compute); });
        });

    return compute(n) + left + right;
}

The example above demonstrates the use of two of the functions, hpx::parallel::define_task_block and the hpx::parallel::task_block::run member function of a hpx::parallel::task_block.

The task_block function delineates a region in a program code potentially containing invocations of threads spawned by the run member function of the task_block class. The run function spawns an HPX thread, a unit of work that is allowed to execute in parallel with respect to the caller. Any parallel tasks spawned by run within the task block are joined back to a single thread of execution at the end of the define_task_block. run takes a user-provided function object f and starts it asynchronously—i.e. it may return before the execution of f completes. The HPX scheduler may choose to run f immediately or delay running f until compute resources become available.

A task_block can be constructed only by define_task_block because it has no public constructors. Thus, run can be invoked (directly or indirectly) only from a user-provided function passed to define_task_block:

void g();

void f(task_block<>& tr)
{
    tr.run(g);          // OK, invoked from within task_block in h
}

void h()
{
    define_task_block(f);
}

int main()
{
    task_block<> tr;    // Error: no public constructor
    tr.run(g);          // No way to call run outside of a define_task_block
    return 0;
}
Extensions for task blocks
Using execution policies with task blocks

In HPX we implemented some extensions for task_block beyond the actual standards proposal N4411. The main addition is that a task_block can be invoked with a execution policy as its first argument, very similar to the parallel algorithms.

An execution policy is an object that expresses the requirements on the ordering of functions invoked as a consequence of the invocation of a task block. Enabling passing an execution policy to define_task_block gives the user control over the amount of parallelism employed by the created task_block. In the following example the use of an explicit par execution policy makes the user’s intent explicit:

template <typename Func>
int traverse(node *n, Func&& compute)
{
    int left = 0, right = 0;

    define_task_block(
        execution::par,                // execution::parallel_policy
        [&](task_block<>& tb) {
            if (n->left)
                tb.run([&] { left = traverse(n->left, compute); });
            if (n->right)
                tb.run([&] { right = traverse(n->right, compute); });
        });

    return compute(n) + left + right;
}

This also causes the hpx::parallel::v2::task_block object to be a template in our implementation. The template argument is the type of the execution policy used to create the task block. The template argument defaults to hpx::parallel::execution::parallel_policy.

HPX still supports calling hpx::parallel::v2::define_task_block without an explicit execution policy. In this case the task block will run using the hpx::parallel::execution::parallel_policy.

HPX also adds the ability to access the execution policy which was used to create a given task_block.

Using executors to run tasks

Often, we want to be able to not only define an execution policy to use by default for all spawned tasks inside the task block, but in addition to customize the execution context for one of the tasks executed by task_block::run. Adding an optionally passed executor instance to that function enables this use case:

template <typename Func>
int traverse(node *n, Func&& compute)
{
    int left = 0, right = 0;

    define_task_block(
        execution::par,                // execution::parallel_policy
        [&](auto& tb) {
            if (n->left)
            {
                // use explicitly specified executor to run this task
                tb.run(my_executor(), [&] { left = traverse(n->left, compute); });
            }
            if (n->right)
            {
                // use the executor associated with the par execution policy
                tb.run([&] { right = traverse(n->right, compute); });
            }
        });

    return compute(n) + left + right;
}

HPX still supports calling hpx::parallel::v2::task_block::run without an explicit executor object. In this case the task will be run using the executor associated with the execution policy which was used to call hpx::parallel::v2::define_task_block.

Writing distributed HPX applications

This section focuses on the features of HPX needed to write distributed applications, namely the Active Global Address Space (AGAS), remotely executable functions (i.e. actions), and distributed objects (i.e. components).

Global names

HPX implements an Active Global Address Space (AGAS) which is exposing a single uniform address space spanning all localities an application runs on. AGAS is a fundamental component of the ParalleX execution model. Conceptually, there is no rigid demarcation of local or global memory in AGAS; all available memory is a part of the same address space. AGAS enables named objects to be moved (migrated) across localities without having to change the object’s name, i.e., no references to migrated objects have to be ever updated. This feature has significance for dynamic load balancing and in applications where the workflow is highly dynamic, allowing work to be migrated from heavily loaded nodes to less loaded nodes. In addition, immutability of names ensures that AGAS does not have to keep extra indirections (“bread crumbs”) when objects move, hence minimizing complexity of code management for system developers as well as minimizing overheads in maintaining and managing aliases.

The AGAS implementation in HPX does not automatically expose every local address to the global address space. It is the responsibility of the programmer to explicitly define which of the objects have to be globally visible and which of the objects are purely local.

In HPX global addresses (global names) are represented using the hpx::id_type data type. This data type is conceptually very similar to void* pointers as it does not expose any type information of the object it is referring to.

The only predefined global addresses are assigned to all localities. The following HPX API functions allow one to retrieve the global addresses of localities:

Additionally, the global addresses of localities can be used to create new instances of components using the following HPX API function:

  • hpx::components::new_: Create a new instance of the given Component type on the specified locality.

Note

HPX does not expose any functionality to delete component instances. All global addresses (as represented using hpx::id_type) are automatically garbage collected. When the last (global) reference to a particular component instance goes out of scope the corresponding component instance is automatically deleted.

Applying actions
Action type definition

Actions are special types we use to describe possibly remote operations. For every global function and every member function which has to be invoked distantly, a special type must be defined. For any global function the special macro HPX_PLAIN_ACTION can be used to define the action type. Here is an example demonstrating this:

namespace app
{
    void some_global_function(double d)
    {
        cout << d;
    }
}

// This will define the action type 'some_global_action' which represents
// the function 'app::some_global_function'.
HPX_PLAIN_ACTION(app::some_global_function, some_global_action);

Important

The macro HPX_PLAIN_ACTION has to be placed in global namespace, even if the wrapped function is located in some other namespace. The newly defined action type is placed in the global namespace as well.

If the action type should be defined somewhere not in global namespace, the action type definition has to be split into two macro invocations (HPX_DEFINE_PLAIN_ACTION and HPX_REGISTER_ACTION) as shown in the next example:

namespace app
{
    void some_global_function(double d)
    {
        cout << d;
    }

    // On conforming compilers the following macro expands to:
    //
    //    typedef hpx::actions::make_action<
    //        decltype(&some_global_function), &some_global_function
    //    >::type some_global_action;
    //
    // This will define the action type 'some_global_action' which represents
    // the function 'some_global_function'.
    HPX_DEFINE_PLAIN_ACTION(some_global_function, some_global_action);
}

// The following macro expands to a series of definitions of global objects
// which are needed for proper serialization and initialization support
// enabling the remote invocation of the function``some_global_function``
HPX_REGISTER_ACTION(app::some_global_action, app_some_global_action);

The shown code defines an action type some_global_action inside the namespace app.

Important

If the action type definition is split between two macros as shown above, the name of the action type to create has to be the same for both macro invocations (here some_global_action).

Important

The second argument passed to HPX_REGISTER_ACTION (app_some_global_action) has to comprise a globally unique C++ identifier representing the action. This is used for serialization purposes.

For member functions of objects which have been registered with AGAS (e.g. ‘components’) a different registration macro HPX_DEFINE_COMPONENT_ACTION has to be utilized. Any component needs to be declared in a header file and have some special support macros defined in a source file. Here is an example demonstrating this. The first snippet has to go into the header file:

namespace app
{
    struct some_component
      : hpx::components::component_base<some_component>
    {
        int some_member_function(std::string s)
        {
            return boost::lexical_cast<int>(s);
        }

        // This will define the action type 'some_member_action' which
        // represents the member function 'some_member_function' of the
        // object type 'some_component'.
        HPX_DEFINE_COMPONENT_ACTION(some_component, some_member_function,
            some_member_action);
    };
}

// Note: The second argument to the macro below has to be systemwide-unique
//       C++ identifiers
HPX_REGISTER_ACTION_DECLARATION(app::some_component::some_member_action, some_component_some_action);

The next snippet belongs into a source file (e.g. the main application source file) in the simplest case:

typedef hpx::components::component<app::some_component> component_type;
typedef app::some_component some_component;

HPX_REGISTER_COMPONENT(component_type, some_component);

// The parameters for this macro have to be the same as used in the corresponding
// HPX_REGISTER_ACTION_DECLARATION() macro invocation above
typedef some_component::some_member_action some_component_some_action;
HPX_REGISTER_ACTION(some_component_some_action);

Granted, these macro invocations are a bit more complex than for simple global functions, however we believe they are still manageable.

The most important macro invocation is the HPX_DEFINE_COMPONENT_ACTION in the header file as this defines the action type we need to invoke the member function. For a complete example of a simple component action see [hpx_link examples/quickstart/component_in_executable.cpp..component_in_executable.cpp]

Action invocation

The process of invoking a global function (or a member function of an object) with the help of the associated action is called ‘applying the action’. Actions can have arguments, which will be supplied while the action is applied. At the minimum, one parameter is required to apply any action - the id of the locality the associated function should be invoked on (for global functions), or the id of the component instance (for member functions). Generally, HPX provides several ways to apply an action, all of which are described in the following sections.

Generally, HPX actions are very similar to ‘normal’ C++ functions except that actions can be invoked remotely. Fig. 8 below shows an overview of the main API exposed by HPX. This shows the function invocation syntax as defined by the C++ language (dark gray), the additional invocation syntax as provided through C++ Standard Library features (medium gray), and the extensions added by HPX (light gray) where:

  • f function to invoke,
  • p..: (optional) arguments,
  • R: return type of f,
  • action: action type defined by, HPX_DEFINE_PLAIN_ACTION or HPX_DEFINE_COMPONENT_ACTION encapsulating f,
  • a: an instance of the type `action,
  • id: the global address the action is applied to.
_images/hpx_the_api.png

Fig. 8 Overview of the main API exposed by HPX.

This figure shows that HPX allows the user to apply actions with a syntax similar to the C++ standard. In fact, all action types have an overloaded function operator allowing to synchronously apply the action. Further, HPX implements hpx::async which semantically works similar to the way std::async works for plain C++ function.

Note

The similarity of applying an action to conventional function invocations extends even further. HPX implements hpx::bind and hpx::function two facilities which are semantically equivalent to the std::bind and std::function types as defined by the C++11 Standard. While hpx::async extends beyond the conventional semantics by supporting actions and conventional C++ functions, the HPX facilities hpx::bind and hpx::function extend beyond the conventional standard facilities too. The HPX facilities not only support conventional functions, but can be used for actions as well.

Additionally, HPX exposes hpx::apply and hpx::async_continue both of which refine and extend the standard C++ facilities.

The different ways to invoke a function in HPX will be explained in more detail in the following sections.

Applying an action asynchronously without any synchronization

This method (‘fire and forget’) will make sure the function associated with the action is scheduled to run on the target locality. Applying the action does not wait for the function to start running, instead it is a fully asynchronous operation. The following example shows how to apply the action as defined in the previous section on the local locality (the locality this code runs on):

some_global_action act;     // define an instance of some_global_action
hpx::apply(act, hpx::find_here(), 2.0);

(the function hpx::find_here() returns the id of the local locality, i.e. the locality this code executes on).

Any component member function can be invoked using the same syntactic construct. Given that id is the global address for a component instance created earlier, this invocation looks like:

some_component_action act;     // define an instance of some_component_action
hpx::apply(act, id, "42");

In this case any value returned from this action (e.g. in this case the integer 42 is ignored. Please look at Action type definition for the code defining the component action some_component_action used.

Applying an action asynchronously with synchronization

This method will make sure the action is scheduled to run on the target locality. Applying the action itself does not wait for the function to start running or to complete, instead this is a fully asynchronous operation similar to using hpx::apply as described above. The difference is that this method will return an instance of a hpx::future<> encapsulating the result of the (possibly remote) execution. The future can be used to synchronize with the asynchronous operation. The following example shows how to apply the action from above on the local locality:

some_global_action act;     // define an instance of some_global_action
hpx::future<void> f = hpx::async(act, hpx::find_here(), 2.0);
//
// ... other code can be executed here
//
f.get();    // this will possibly wait for the asynchronous operation to 'return'

(as before, the function hpx::find_here() returns the id of the local locality (the locality this code is executed on).

Note

The use of a hpx::future<void> allows the current thread to synchronize with any remote operation not returning any value.

Note

Any std::future<> returned from std::async() is required to block in its destructor if the value has not been set for this future yet. This is not true for hpx::future<> which will never block in its destructor, even if the value has not been returned to the future yet. We believe that consistency in the behavior of futures is more important than standards conformance in this case.

Any component member function can be invoked using the same syntactic construct. Given that id is the global address for a component instance created earlier, this invocation looks like:

some_component_action act;     // define an instance of some_component_action
hpx::future<int> f = hpx::async(act, id, "42");
//
// ... other code can be executed here
//
cout << f.get();    // this will possibly wait for the asynchronous operation to 'return' 42

Note

The invocation of f.get() will return the result immediately (without suspending the calling thread) if the result from the asynchronous operation has already been returned. Otherwise, the invocation of f.get() will suspend the execution of the calling thread until the asynchronous operation returns its result.

Applying an action synchronously

This method will schedule the function wrapped in the specified action on the target locality. While the invocation appears to be synchronous (as we will see), the calling thread will be suspended while waiting for the function to return. Invoking a plain action (e.g. a global function) synchronously is straightforward:

some_global_action act;     // define an instance of some_global_action
act(hpx::find_here(), 2.0);

While this call looks just like a normal synchronous function invocation, the function wrapped by the action will be scheduled to run on a new thread and the calling thread will be suspended. After the new thread has executed the wrapped global function, the waiting thread will resume and return from the synchronous call.

Equivalently, any action wrapping a component member function can be invoked synchronously as follows:

some_component_action act;     // define an instance of some_component_action
int result = act(id, "42");

The action invocation will either schedule a new thread locally to execute the wrapped member function (as before, id is the global address of the component instance the member function should be invoked on), or it will send a parcel to the remote locality of the component causing a new thread to be scheduled there. The calling thread will be suspended until the function returns its result. This result will be returned from the synchronous action invocation.

It is very important to understand that this ‘synchronous’ invocation syntax in fact conceals an asynchronous function call. This is beneficial as the calling thread is suspended while waiting for the outcome of a potentially remote operation. The HPX thread scheduler will schedule other work in the mean time, allowing the application to make further progress while the remote result is computed. This helps overlapping computation with communication and hiding communication latencies.

Note

The syntax of applying an action is always the same, regardless whether the target locality is remote to the invocation locality or not. This is a very important feature of HPX as it frees the user from the task of keeping track what actions have to be applied locally and which actions are remote. If the target for applying an action is local, a new thread is automatically created and scheduled. Once this thread is scheduled and run, it will execute the function encapsulated by that action. If the target is remote, HPX will send a parcel to the remote locality which encapsulates the action and its parameters. Once the parcel is received on the remote locality HPX will create and schedule a new thread there. Once this thread runs on the remote locality, it will execute the function encapsulated by the action.

Applying an action with a continuation but without any synchronization

This method is very similar to the method described in section Applying an action asynchronously without any synchronization. The difference is that it allows the user to chain a sequence of asynchronous operations, while handing the (intermediate) results from one step to the next step in the chain. Where hpx::apply invokes a single function using ‘fire and forget’ semantics, hpx::apply_continue asynchronously triggers a chain of functions without the need for the execution flow ‘to come back’ to the invocation site. Each of the asynchronous functions can be executed on a different locality.

Applying an action with a continuation and with synchronization

This method is very similar to the method described in section Applying an action asynchronously with synchronization. In addition to what hpx::async can do, the functions hpx::async_continue takes an additional function argument. This function will be called as the continuation of the executed action. It is expected to perform additional operations and to make sure that a result is returned to the original invocation site. This method chains operations asynchronously by providing a continuation operation which is automatically executed once the first action has finished executing.

As an example we chain two actions, where the result of the first action is forwarded to the second action and the result of the second action is sent back to the original invocation site:

// first action
std::int32_t action1(std::int32_t i)
{
    return i+1;
}
HPX_PLAIN_ACTION(action1);    // defines action1_type

// second action
std::int32_t action2(std::int32_t i)
{
    return i*2;
}
HPX_PLAIN_ACTION(action2);    // defines action2_type

// this code invokes 'action1' above and passes along a continuation
// function which will forward the result returned from 'action1' to
// 'action2'.
action1_type act1;     // define an instance of 'action1_type'
action2_type act2;     // define an instance of 'action2_type'
hpx::future<int> f =
    hpx::async_continue(act1, hpx::make_continuation(act2),
        hpx::find_here(), 42);
hpx::cout << f.get() << "\n";   // will print: 86 ((42 + 1) * 2)

By default, the continuation is executed on the same locality as hpx::async_continue is invoked from. If you want to specify the locality where the continuation should be executed, the code above has to be written as:

// this code invokes 'action1' above and passes along a continuation
// function which will forward the result returned from 'action1' to
// 'action2'.
action1_type act1;     // define an instance of 'action1_type'
action2_type act2;     // define an instance of 'action2_type'
hpx::future<int> f =
    hpx::async_continue(act1, hpx::make_continuation(act2, hpx::find_here()),
        hpx::find_here(), 42);
hpx::cout << f.get() << "\n";   // will print: 86 ((42 + 1) * 2)

Similarly, it is possible to chain more than 2 operations:

action1_type act1;     // define an instance of 'action1_type'
action2_type act2;     // define an instance of 'action2_type'
hpx::future<int> f =
    hpx::async_continue(act1,
        hpx::make_continuation(act2, hpx::make_continuation(act1)),
        hpx::find_here(), 42);
hpx::cout << f.get() << "\n";   // will print: 87 ((42 + 1) * 2 + 1)

The function hpx::make_continuation creates a special function object which exposes the following prototype:

struct continuation
{
    template <typename Result>
    void operator()(hpx::id_type id, Result&& result) const
    {
        ...
    }
};

where the parameters passed to the overloaded function operator operator()() are:

  • the id is the global id where the final result of the asynchronous chain of operations should be sent to (in most cases this is the id of the hpx::future returned from the initial call to hpx::async_continue. Any custom continuation function should make sure this id is forwarded to the last operation in the chain.
  • the result is the result value of the current operation in the asynchronous execution chain. This value needs to be forwarded to the next operation.

Note

All of those operations are implemented by the predefined continuation function object which is returned from hpx::make_continuation. Any (custom) function object used as a continuation should conform to the same interface.

Action error handling

Like in any other asynchronous invocation scheme it is important to be able to handle error conditions occurring while the asynchronous (and possibly remote) operation is executed. In HPX all error handling is based on standard C++ exception handling. Any exception thrown during the execution of an asynchronous operation will be transferred back to the original invocation locality, where it is rethrown during synchronization with the calling thread.

Important

Exceptions thrown during asynchronous execution can be transferred back to the invoking thread only for the synchronous and the asynchronous case with synchronization. Like with any other unhandled exception, any exception thrown during the execution of an asynchronous action without synchronization will result in calling hpx::terminate causing the running application to exit immediately.

Note

Even if error handling internally relies on exceptions, most of the API functions exposed by HPX can be used without throwing an exception. Please see Working with exceptions for more information.

As an example, we will assume that the following remote function will be executed:

namespace app
{
    void some_function_with_error(int arg)
    {
        if (arg < 0) {
            HPX_THROW_EXCEPTION(bad_parameter, "some_function_with_error",
                "some really bad error happened");
        }
        // do something else...
    }
}

// This will define the action type 'some_error_action' which represents
// the function 'app::some_function_with_error'.
HPX_PLAIN_ACTION(app::some_function_with_error, some_error_action);

The use of HPX_THROW_EXCEPTION to report the error encapsulates the creation of a hpx::exception which is initialized with the error code hpx::bad_parameter. Additionally it carries the passed strings, the information about the file name, line number, and call stack of the point the exception was thrown from.

We invoke this action using the synchronous syntax as described before:

// note: wrapped function will throw hpx::exception
some_error_action act;            // define an instance of some_error_action
try {
    act(hpx::find_here(), -3);    // exception will be rethrown from here
}
catch (hpx::exception const& e) {
    // prints: 'some really bad error happened: HPX(bad parameter)'
    cout << e.what();
}

If this action is invoked asynchronously with synchronization, the exception is propagated to the waiting thread as well and is re-thrown from the future’s function get():

// note: wrapped function will throw hpx::exception
some_error_action act;            // define an instance of some_error_action
hpx::future<void> f = hpx::async(act, hpx::find_here(), -3);
try {
    f.get();                      // exception will be rethrown from here
}
catch (hpx::exception const& e) {
    // prints: 'some really bad error happened: HPX(bad parameter)'
    cout << e.what();
}

For more information about error handling please refer to the section Working with exceptions. There we also explain how to handle error conditions without having to rely on exception.

Writing components

A component in HPX is a C++ class which can be created remotely and for which its member functions can be invoked remotely as well. The following sections highlight how components can be defined, created, and used.

Defining components

In order for a C++ class type to be managed remotely in HPX, the type must be derived from the hpx::components::component_base template type. We call such C++ class types ‘components’.

Note that the component type itself is passed as a template argument to the base class:

// header file some_component.hpp

#include <hpx/include/components.hpp>

namespace app
{
    // Define a new component type 'some_component'
    struct some_component
      : hpx::components::component_base<some_component>
    {
        // This member function is has to be invoked remotely
        int some_member_function(std::string const& s)
        {
            return boost::lexical_cast<int>(s);
        }

        // This will define the action type 'some_member_action' which
        // represents the member function 'some_member_function' of the
        // object type 'some_component'.
        HPX_DEFINE_COMPONENT_ACTION(some_component, some_member_function, some_member_action);
    };
}

// This will generate the necessary boiler-plate code for the action allowing
// it to be invoked remotely. This declaration macro has to be placed in the
// header file defining the component itself.
//
// Note: The second argument to the macro below has to be systemwide-unique
//       C++ identifiers
//
HPX_REGISTER_ACTION_DECLARATION(app::some_component::some_member_action, some_component_some_action);

There is more boiler plate code which has to be placed into a source file in order for the component to be usable. Every component type is required to have macros placed into its source file, one for each component type and one macro for each of the actions defined by the component type.

For instance:

// source file some_component.cpp

#include "some_component.hpp"

// The following code generates all necessary boiler plate to enable the
// remote creation of 'app::some_component' instances with 'hpx::new_<>()'
//
using some_component = app::some_component;
using some_component_type = hpx::components::component<some_component>;

// Please note that the second argument to this macro must be a
// (system-wide) unique C++-style identifier (without any namespaces)
//
HPX_REGISTER_COMPONENT(some_component_type, some_component);

// The parameters for this macro have to be the same as used in the corresponding
// HPX_REGISTER_ACTION_DECLARATION() macro invocation in the corresponding
// header file.
//
// Please note that the second argument to this macro must be a
// (system-wide) unique C++-style identifier (without any namespaces)
//
HPX_REGISTER_ACTION(app::some_component::some_member_action, some_component_some_action);
Defining client side representation classes

Often it is very convenient to define a separate type for a component which can be used on the client side (from where the component is instantiated and used). This step might seem as unnecessary duplicating code, however it significantly increases the type safety of the code.

A possible implementation of such a client side representation for the component described in the previous section could look like:

#include <hpx/include/components.hpp>

namespace app
{
    // Define a client side representation type for the component type
    // 'some_component' defined in the previous section.
    //
    struct some_component_client
      : hpx::components::client_base<some_component_client, some_component>
    {
        using base_type = hpx::components::client_base<
                some_component_client, some_component>;

        some_component_client(hpx::future<hpx::id_type> && id)
          : base_type(std::move(id))
        {}

        hpx::future<int> some_member_function(std::string const& s)
        {
            some_component::some_member_action act;
            return hpx::async(act, get_id(), s);
        }
    };
}

A client side object stores the global id of the component instance it represents. This global id is accessible by calling the function client_base<>::get_id(). The special constructor which is provided in the example allows to create this client side object directly using the API function hpx::new_.

Creating component instances

Instances of defined component types can be created in two different ways. If the component to create has a defined client side representation type, then this can be used, otherwise use the server type.

The following examples assume that some_component_type is the type of the server side implementation of the component to create. All additional arguments (see , ... notation below) are passed through to the corresponding constructor calls of those objects:

// create one instance on the given locality
hpx::id_type here = hpx::find_here();
hpx::future<hpx::id_type> f =
    hpx::new_<some_component_type>(here, ...);

// create one instance using the given distribution
// policy (here: hpx::colocating_distribution_policy)
hpx::id_type here = hpx::find_here();
hpx::future<hpx::id_type> f =
    hpx::new_<some_component_type>(hpx::colocated(here), ...);

// create multiple instances on the given locality
hpx::id_type here = find_here();
hpx::future<std::vector<hpx::id_type>> f =
    hpx::new_<some_component_type[]>(here, num, ...);

// create multiple instances using the given distribution
// policy (here: hpx::binpacking_distribution_policy)
hpx::future<std::vector<hpx::id_type>> f = hpx::new_<some_component_type[]>(
    hpx::binpacking(hpx::find_all_localities()), num, ...);

The examples below demonstrate the use of the same API functions for creating client side representation objects (instead of just plain ids). These examples assume that client_type is the type of the client side representation of the component type to create. As above, all additional arguments (see , ... notation below) are passed through to the corresponding constructor calls of the server side implementation objects corresponding to the client_type:

// create one instance on the given locality
hpx::id_type here = hpx::find_here();
client_type c = hpx::new_<client_type>(here, ...);

// create one instance using the given distribution
// policy (here: hpx::colocating_distribution_policy)
hpx::id_type here = hpx::find_here();
client_type c = hpx::new_<client_type>(hpx::colocated(here), ...);

// create multiple instances on the given locality
hpx::id_type here = hpx::find_here();
hpx::future<std::vector<client_type>> f =
    hpx::new_<client_type[]>(here, num, ...);

// create multiple instances using the given distribution
// policy (here: hpx::binpacking_distribution_policy)
hpx::future<std::vector<client_type>> f = hpx::new_<client_type[]>(
    hpx::binpacking(hpx::find_all_localities()), num, ...);
Using component instances
Segmented containers

In parallel programming, there is now a plethora of solutions aimed at implementing “partially contiguous” or segmented data structures, whether on shared memory systems or distributed memory systems. HPX implements such structures by drawing inspiration from Standard C++ containers.

Using segmented containers

A segmented container is a template class that is described in the namespace hpx. All segmented containers are very similar semantically to their sequential counterpart (defined in namespace std but with an additional template parameter named DistPolicy). The distribution policy is an optional parameter that is passed last to the segmented container constructor (after the container size when no default value is given, after the default value if not). The distribution policy describes the manner in which a container is segmented and the placement of each segment among the available runtime localities.

However, only a part of the std container member functions were reimplemented:

  • (constructor), (destructor), operator=
  • operator[]
  • begin, cbegin, end, cend
  • size

An example of how to use the partitioned_vector container would be:

#include <hpx/include/partitioned_vector.hpp>

// The following code generates all necessary boiler plate to enable the
// remote creation of 'partitioned_vector' segments
//
HPX_REGISTER_PARTITIONED_VECTOR(double);

// By default, the number of segments is equal to the current number of
// localities
//
hpx::partitioned_vector<double> va(50);
hpx::partitioned_vector<double> vb(50, 0.0);

An example of how to use the partitioned_vector container with distribution policies would be:

#include <hpx/include/partitioned_vector.hpp>
#include <hpx/runtime/find_localities.hpp>

// The following code generates all necessary boiler plate to enable the
// remote creation of 'partitioned_vector' segments
//
HPX_REGISTER_PARTITIONED_VECTOR(double);

std::size_t num_segments = 10;
std::vector<hpx::id_type> locs = hpx::find_all_localities()

auto layout =
        hpx::container_layout( num_segments, locs );

// The number of segments is 10 and those segments are spread across the
// localities collected in the variable locs in a Round-Robin manner
//
hpx::partitioned_vector<double> va(50, layout);
hpx::partitioned_vector<double> vb(50, 0.0, layout);

By definition, a segmented container must be accessible from any thread although its construction is synchronous only for the thread who has called its constructor. To overcome this problem, it is possible to assign a symbolic name to the segmented container:

#include <hpx/include/partitioned_vector.hpp>

// The following code generates all necessary boiler plate to enable the
// remote creation of 'partitioned_vector' segments
//
HPX_REGISTER_PARTITIONED_VECTOR(double);

hpx::future<void> fserver = hpx::async(
  [](){
    hpx::partitioned_vector<double> v(50);

    // Register the 'partitioned_vector' with the name "some_name"
    //
    v.register_as("some_name");

    /* Do some code  */
  });

hpx::future<void> fclient =
  hpx::async(
    [](){
      // Naked 'partitioned_vector'
      //
      hpx::partitioned_vector<double> v;

      // Now the variable v points to the same 'partitioned_vector' that has
      // been registered with the name "some_name"
      //
      v.connect_to("some_name");

      /* Do some code  */
    });
Segmented containers

HPX provides the following segmented containers:

Table 35 Sequence containers
Name Description In header Class page at cppreference.com
hpx::partitioned_vector Dynamic segmented contiguous array. <hpx/include/partitioned_vector.hpp> vector
Table 36 Unordered associative containers
Name Description In header Class page at cppreference.com
hpx::unordered_map Segmented collection of key-value pairs, hashed by keys, keys are unique. <hpx/include/unordered_map.hpp> unordered_map
Segmented iterators and segmented iterator traits

The basic iterator used in the STL library is only suitable for one-dimensional structures. The iterators we use in HPX must adapt to the segmented format of our containers. Our iterators are then able to know when incrementing themselves if the next element of type T is in the same data segment or in another segment. In this second case, the iterator will automatically point to the beginning of the next segment.

Note

Note that the dereference operation operator * does not directly return a reference of type T& but an intermediate object wrapping this reference. When this object is used as an l-value, a remote write operation is performed; When this object is used as an r-value, implicit conversion to T type will take care of performing remote read operation.

It is sometimes useful not only to iterate element by element, but also segment by segment, or simply get a local iterator in order to avoid additional construction costs at each deferencing operations. To mitigate this need, the hpx::traits::segmented_iterator_traits are used.

With segmented_iterator_traits users can uniformly get the iterators which specifically iterates over segments (by providing a segmented iterator as a parameter), or get the local begin/end iterators of the nearest local segment (by providing a per-segment iterator as a parameter):

#include <hpx/include/partitioned_vector.hpp>

// The following code generates all necessary boiler plate to enable the
// remote creation of 'partitioned_vector' segments
//
HPX_REGISTER_PARTITIONED_VECTOR(double);

using iterator = hpx::partitioned_vector<T>::iterator;
using traits   = hpx::traits::segmented_iterator_traits<iterator>;

hpx::partitioned_vector<T> v;
std::size_t count = 0;

auto seg_begin = traits::segment(v.begin());
auto seg_end   = traits::segment(v.end());

// Iterate over segments
for (auto seg_it = seg_begin; seg_it != seg_end; ++seg_it)
{
    auto loc_begin = traits::begin(seg_it)
    auto loc_end   = traits::end(seg_it);

    // Iterate over elements inside segments
    for (auto lit = loc_begin; lit != loc_end; ++lit, ++count)
    {
        *lit = count;
    }
}

Which is equivalent to:

hpx::partitioned_vector<T> v;
std::size_t count = 0;

auto begin = v.begin();
auto end   = v.end();

for (auto it = begin; it != end; ++it, ++count)
{
    *it = count;
}
Using views

The use of multidimensional arrays is quite common in the numerical field whether to perform dense matrix operations or to process images. It exist many libraries which implement such object classes overloading their basic operators (e.g.``+``, -, *, (), etc.). However, such operation becomes more delicate when the underlying data layout is segmented or when it is mandatory to use optimized linear algebra subroutines (i.e. BLAS subroutines).

Our solution is thus to relax the level of abstraction by allowing the user to work not directly on n-dimensionnal data, but on “n-dimensionnal collections of 1-D arrays”. The use of well-accepted techniques on contiguous data is thus preserved at the segment level, and the composability of the segments is made possible thanks to multidimensional array-inspired access mode.

Preface: Why SPMD?

Although HPX refutes by design this programming model, the locality plays a dominant role when it comes to implement vectorized code. To maximize local computations and avoid unneeded data transfers, a parallel section (or Single Programming Multiple Data section) is required. Because the use of global variables is prohibited, this parallel section is created via the RAII idiom.

To define a parallel section, simply write an action taking a spmd_block variable as a first parameter:

#include <hpx/lcos/spmd_block.hpp>

void bulk_function(hpx::lcos::spmd_block block /* , arg0, arg1, ... */)
{
    // Parallel section

    /* Do some code */
}
HPX_PLAIN_ACTION(bulk_function, bulk_action);

Note

In the following paragraphs, we will use the term “image” several times. An image is defined as a lightweight process whose entry point is a function provided by the user. It’s an “image of the function”.

The spmd_block class contains the following methods:

  • [def Team information] get_num_images, this_image, images_per_locality
  • [def Control statements] sync_all, sync_images

Here is a sample code summarizing the features offered by the spmd_block class:

#include <hpx/lcos/spmd_block.hpp>

void bulk_function(hpx::lcos::spmd_block block /* , arg0, arg1, ... */)
{
    std::size_t num_images = block.get_num_images();
    std::size_t this_image = block.this_image();
    std::size_t images_per_locality = block.images_per_locality();

    /* Do some code */

    // Synchronize all images in the team
    block.sync_all();

    /* Do some code */

    // Synchronize image 0 and image 1
    block.sync_images(0,1);

    /* Do some code */

    std::vector<std::size_t> vec_images = {2,3,4};

    // Synchronize images 2, 3 and 4
    block.sync_images(vec_images);

    // Alternative call to synchronize images 2, 3 and 4
    block.sync_images(vec_images.begin(), vec_images.end());

    /* Do some code */

    // Non-blocking version of sync_all()
    hpx::future<void> event =
        block.sync_all(hpx::launch::async);

    // Callback waiting for 'event' to be ready before being scheduled
    hpx::future<void> cb =
        event.then(
          [](hpx::future<void>)
          {

            /* Do some code */

          });

    // Finally wait for the execution tree to be finished
    cb.get();
}
HPX_PLAIN_ACTION(bulk_test_function, bulk_test_action);

Then, in order to invoke the parallel section, call the function define_spmd_block specifying an arbitrary symbolic name and indicating the number of images per locality to create:

void bulk_function(hpx::lcos::spmd_block block, /* , arg0, arg1, ... */)
{

}
HPX_PLAIN_ACTION(bulk_test_function, bulk_test_action);

int main()
{
    /* std::size_t arg0, arg1, ...; */

    bulk_action act;
    std::size_t images_per_locality = 4;

    // Instanciate the parallel section
    hpx::lcos::define_spmd_block(
        "some_name", images_per_locality, std::move(act) /*, arg0, arg1, ... */);

    return 0;
}

Note

In principle, the user should never call the spmd_block constructor. The define_spmd_block function is responsible of instantiating spmd_block objects and broadcasting them to each created image.

SPMD multidimensional views

Some classes are defined as “container views” when the purpose is to observe and/or modify the values of a container using another perspective than the one that characterizes the container. For example, the values of an std::vector object can be accessed via the expression [i]. Container views can be used, for example, when it is desired for those values to be “viewed” as a 2D matrix that would have been flattened in a std::vector. The values would be possibly accessible via the expression vv(i,j) which would call internally the expression v[k].

By default, the partitioned_vector class integrates 1-D views of its segments:

#include <hpx/include/partitioned_vector.hpp>

// The following code generates all necessary boiler plate to enable the
// remote creation of 'partitioned_vector' segments
//
HPX_REGISTER_PARTITIONED_VECTOR(double);

using iterator = hpx::partitioned_vector<double>::iterator;
using traits   = hpx::traits::segmented_iterator_traits<iterator>;

hpx::partitioned_vector<double> v;

// Create a 1-D view of the vector of segments
auto vv = traits::segment(v.begin());

// Access segment i
std::vector<double> v = vv[i];

Our views are called “multidimensional” in the sense that they generalize to N dimensions the purpose of segmented_iterator_traits::segment() in the 1-D case. Note that in a parallel section, the 2-D expression a(i,j) = b(i,j) is quite confusing because without convention, each of the images invoked will race to execute the statement. For this reason, our views are not only multidimensional but also “spmd-aware”.

Note

SPMD-awareness: The convention is simple. If an assignment statement contains a view subscript as an l-value, it is only and only the image holding the r-value who is evaluating the statement. (In MPI sense, it is called a Put operation).

Subscript-based operations

Here are some examples of using subscripts in the 2-D view case:

#include <hpx/components/containers/partitioned_vector/partitioned_vector_view.hpp>
#include <hpx/include/partitioned_vector.hpp>

// The following code generates all necessary boiler plate to enable the
// remote creation of 'partitioned_vector' segments
//
HPX_REGISTER_PARTITIONED_VECTOR(double);

using Vec = hpx::partitioned_vector<double>;
using View_2D = hpx::partitioned_vector_view<double,2>;

/* Do some code */

Vec v;

// Parallel section (suppose 'block' an spmd_block instance)
{
    std::size_t height, width;

    // Instanciate the view
    View_2D vv(block, v.begin(), v.end(), {height,width});

    // The l-value is a view subscript, the image that owns vv(1,0)
    // evaluates the assignment.
    vv(0,1) = vv(1,0);

    // The l-value is a view subscript, the image that owns the r-value
    // (result of expression 'std::vector<double>(4,1.0)') evaluates the
    // assignment : oops! race between all participating images.
    vv(2,3) = std::vector<double>(4,1.0);
}
Iterator-based operations

Here are some examples of using iterators in the 3-D view case:

#include <hpx/components/containers/partitioned_vector/partitioned_vector_view.hpp>
#include <hpx/include/partitioned_vector.hpp>

// The following code generates all necessary boiler plate to enable the
// remote creation of 'partitioned_vector' segments
//
HPX_REGISTER_PARTITIONED_VECTOR(int);

using Vec = hpx::partitioned_vector<int>;
using View_3D = hpx::partitioned_vector_view<int,3>;

/* Do some code */

Vec v1, v2;

// Parallel section (suppose 'block' an spmd_block instance)
{
    std::size_t sixe_x, size_y, size_z;

    // Instanciate the views
    View_3D vv1(block, v1.begin(), v1.end(), {sixe_x,size_y,size_z});
    View_3D vv2(block, v2.begin(), v2.end(), {sixe_x,size_y,size_z});

    // Save previous segments covered by vv1 into segments covered by vv2
    auto vv2_it = vv2.begin();
    auto vv1_it = vv1.cbegin();

    for(; vv2_it != vv2.end(); vv2_it++, vv1_it++)
    {
        // It's a Put operation
        *vv2_it = *vv1_it;
    }

    // Ensure that all images have performed their Put operations
    block.sync_all();

    // Ensure that only one image is putting updated data into the different
    // segments covered by vv1
    if(block.this_image() == 0)
    {
        int idx = 0;

        // Update all the segments covered by vv1
        for(auto i = vv1.begin(); i != vv1.end(); i++)
        {
            // It's a Put operation
            *i = std::vector<float>(elt_size,idx++);
        }
    }
}

Here is an example that shows how to iterate only over segments owned by the current image:

#include <hpx/components/containers/partitioned_vector/partitioned_vector_view.hpp>
#include <hpx/components/containers/partitioned_vector/partitioned_vector_local_view.hpp>
#include <hpx/include/partitioned_vector.hpp>

// The following code generates all necessary boiler plate to enable the
// remote creation of 'partitioned_vector' segments
//
HPX_REGISTER_PARTITIONED_VECTOR(float);

using Vec = hpx::partitioned_vector<float>;
using View_1D = hpx::partitioned_vector_view<float,1>;

/* Do some code */

Vec v;

// Parallel section (suppose 'block' an spmd_block instance)
{
    std::size_t num_segments;

    // Instanciate the view
    View_1D vv(block, v.begin(), v.end(), {num_segments});

    // Instanciate the local view from the view
    auto local_vv = hpx::local_view(vv);

    for ( auto i = localvv.begin(); i != localvv.end(); i++ )
    {
        std::vector<float> & segment = *i;

        /* Do some code */
    }

}
Instanciating sub-views

It is possible to construct views from other views: we call it sub-views. The constraint nevertheless for the subviews is to retain the dimension and the value type of the input view. Here is an example showing how to create a sub-view:

#include <hpx/components/containers/partitioned_vector/partitioned_vector_view.hpp>
#include <hpx/include/partitioned_vector.hpp>

// The following code generates all necessary boiler plate to enable the
// remote creation of 'partitioned_vector' segments
//
HPX_REGISTER_PARTITIONED_VECTOR(float);

using Vec = hpx::partitioned_vector<float>;
using View_2D = hpx::partitioned_vector_view<float,2>;

/* Do some code */

Vec v;

// Parallel section (suppose 'block' an spmd_block instance)
{
    std::size_t N = 20;
    std::size_t tilesize = 5;

    // Instanciate the view
    View_2D vv(block, v.begin(), v.end(), {N,N});

    // Instanciate the subview
    View_2D svv(
        block,&vv(tilesize,0),&vv(2*tilesize-1,tilesize-1),{tilesize,tilesize},{N,N});

    if(block.this_image() == 0)
    {
        // Equivalent to 'vv(tilesize,0) = 2.0f'
        svv(0,0) = 2.0f;

        // Equivalent to 'vv(2*tilesize-1,tilesize-1) = 3.0f'
        svv(tilesize-1,tilesize-1) = 3.0f;
    }

}

Note

The last parameter of the subview constructor is the size of the original view. If one would like to create a subview of the subview and so on, this parameter should stay unchanged. {N,N} for the above example).

C++ co-arrays

Fortran has extended its scalar element indexing approach to reference each segment of a distributed array. In this extension, a segment is attributed a ?co-index? and lives in a specific locality. A co-index provides the application with enough information to retrieve the corresponding data reference. In C++, containers present themselves as a ?smarter? alternative of Fortran arrays but there are still no corresponding standardized features similar to the Fortran co-indexing approach. We present here an implementation of such features in HPX.

Preface: co-array, a segmented container tied to a SPMD multidimensional views

As mentioned before, a co-array is a distributed array whose segments are accessible through an array-inspired access mode. We have previously seen that it is possible to reproduce such access mode using the concept of views. Nevertheless, the user must pre-create a segmented container to instanciate this view. We illustrate below how a single constructor call can perform those two operations:

#include <hpx/components/containers/coarray/coarray.hpp>
#include <hpx/lcos/spmd_block.hpp>

// The following code generates all necessary boiler plate to enable the
// co-creation of 'coarray'
//
HPX_REGISTER_COARRAY(double);

// Parallel section (suppose 'block' an spmd_block instance)
{
    using hpx::container::placeholders::_;

    std::size_t height=32, width=4, segment_size=10;

    hpx::coarray<double,3> a(block, "a", {height,width,_}, segment_size);

    /* Do some code */
}

Unlike segmented containers, a co-array object can only be instantiated within a parallel section. Here is the description of the parameters to provide to the coarray constructor:

Table 37 Parameters of coarray constructor
Parameter Description
block Reference to a spmd_block object
"a" Symbolic name of type std::string
{height,width,_} Dimensions of the coarray object
segment_size Size of a co-indexed element (i.e. size of the object referenced by the expression a(i,j,k))

Note that the “last dimension size” cannot be set by the user. It only accepts the constexpr variable hpx::container::placeholders::_. This size, which is considered private, is equal to the number of current images (value returned by block.get_num_images()).

Note

An important constraint to remember about coarray objects is that all segments sharing the same “last dimension index” are located in the same image.

Using co-arrays

The member functions owned by the coarray objects are exactly the same as those of spmd multidimensional views. These are:

* Subscript-based operations
* Iterator-based operations

However, one additional functionality is provided. Knowing that the element a(i,j,k) is in the memory of the kth image, the use of local subscripts is possible.

Note

For spmd multidimensional views, subscripts are only global as it still involves potential remote data transfers.

Here is an example of using local subscripts:

#include <hpx/components/containers/coarray/coarray.hpp>
#include <hpx/lcos/spmd_block.hpp>

// The following code generates all necessary boiler plate to enable the
// co-creation of 'coarray'
//
HPX_REGISTER_COARRAY(double);

// Parallel section (suppose 'block' an spmd_block instance)
{
    using hpx::container::placeholders::_;

    std::size_t height=32, width=4, segment_size=10;

    hpx::coarray<double,3> a(block, "a", {height,width,_}, segment_size);

    double idx = block.this_image()*height*width;

    for (std::size_t j = 0; j<width; j++)
    for (std::size_t i = 0; i<height; i++)
    {
        // Local write operation performed via the use of local subscript
        a(i,j,_) = std::vector<double>(elt_size,idx);
        idx++;
    }

    block.sync_all();
}

Note

When the “last dimension index” of a subscript is equal to hpx::container::placeholders::_, local subscript (and not global subscript) is used. It is equivalent to a global subscript used with a “last dimension index” equal to the value returned by block.this_image().

Running on batch systems

This section walks you through launching HPX applications on various batch systems.

How to use HPX applications with PBS

Most HPX applications are executed on parallel computers. These platforms typically provide integrated job management services that facilitate the allocation of computing resources for each parallel program. HPX includes out of the box support for one of the most common job management systems, the Portable Batch System (PBS).

All PBS jobs require a script to specify the resource requirements and other parameters associated with a parallel job. The PBS script is basically a shell script with PBS directives placed within commented sections at the beginning of the file. The remaining (not commented-out) portions of the file executes just like any other regular shell script. While the description of all available PBS options is outside the scope of this tutorial (the interested reader may refer to in-depth documentation for more information), below is a minimal example to illustrate the approach. As a test application we will use the multithreaded hello_world_distributed program, explained in the section Remote execution with actions: Hello world.

#!/bin/bash
#
#PBS -l nodes=2:ppn=4

APP_PATH=~/packages/hpx/bin/hello_world_distributed
APP_OPTIONS=

pbsdsh -u