Optimizing HPX applications¶
Performance counters¶
Performance counters in HPX are used to provide information as to how well the runtime system or an application is performing. The counter data can help determine system bottlenecks, and fine-tune system and application performance. The HPX runtime system, its networking, and other layers provide counter data that an application can consume to provide users with information about how well the application is performing.
Applications can also use counter data to determine how much system resources to consume. For example, an application that transfers data over the network could consume counter data from a network switch to determine how much data to transfer without competing for network bandwidth with other network traffic. The application could use the counter data to adjust its transfer rate as the bandwidth usage from other network traffic increases or decreases.
Performance counters are HPX parallel processes that expose a predefined interface. HPX exposes special API functions that allow one to create, manage, and read the counter data, and release instances of performance counters. Performance Counter instances are accessed by name, and these names have a predefined structure which is described in the section Performance counter names. The advantage of this is that any Performance Counter can be accessed remotely (from a different locality) or locally (from the same locality). Moreover, since all counters expose their data using the same API, any code consuming counter data can be utilized to access arbitrary system information with minimal effort.
Counter data may be accessed in real time. More information about how to consume counter data can be found in the section Consuming performance counter data.
All HPX applications provide command line options related to performance counters, such as the ability to list available counter types, or periodically query specific counters to be printed to the screen or save them in a file. For more information, please refer to the section HPX Command Line Options.
Performance counter names¶
All Performance Counter instances have a name uniquely identifying each instance. This name can be used to access the counter, retrieve all related meta data, and to query the counter data (as described in the section Consuming performance counter data). Counter names are strings with a predefined structure. The general form of a countername is:
/objectname{full_instancename}/countername@parameters
where full_instancename
could be either another (full) counter name or a
string formatted as:
parentinstancename#parentindex/instancename#instanceindex
Each separate part of a countername (e.g., objectname
, countername
parentinstancename
, instancename
, and parameters
) should start with
a letter ('a'
…'z'
, 'A'
…'Z'
) or an underscore
character ('_'
), optionally followed by letters, digits ('0'
…'9'
), hyphen ('-'
), or underscore characters. Whitespace is not allowed
inside a counter name. The characters '/'
, '{'
, '}'
, '#'
and
'@'
have a special meaning and are used to delimit the different parts of
the counter name.
The parts parentinstanceindex
and instanceindex
are integers. If an
index is not specified, HPX will assume a default of -1
.
Two counter name examples¶
This section gives examples of both simple counter names and aggregate counter names. For more information on simple and aggregate counter names, please see Performance counter instances.
An example of a well-formed (and meaningful) simple counter name would be:
/threads{locality#0/total}/count/cumulative
This counter returns the current cumulative number of executed (retired)
HPX threads for the locality 0
. The counter type of this counter
is /threads/count/cumulative
and the full instance name is
locality#0/total
. This counter type does not require an instanceindex
or
parameters
to be specified.
In this case, the parentindex
(the '0'
) designates the locality
for which the counter instance is created. The counter will return the number of
HPX threads retired on that particular locality.
Another example for a well formed (aggregate) counter name is:
/statistics{/threads{locality#0/total}/count/cumulative}/average@500
This counter takes the simple counter from the first example, samples its values
every 500
milliseconds, and returns the average of the value samples
whenever it is queried. The counter type of this counter is
/statistics/average
and the instance name is the full name of the counter
for which the values have to be averaged. In this case, the parameters
(the
'500'
) specify the sampling interval for the averaging to take place (in
milliseconds).
Performance counter types¶
Every performance counter belongs to a specific performance counter type which
classifies the counters into groups of common semantics. The type of a counter
is identified by the objectname
and the countername
parts of the name.
/objectname/countername
When an application starts HPX will register all available counter types on each of the localities. These counter types are held in a special performance counter registration database, which can be used to retrieve the meta data related to a counter type and to create counter instances based on a given counter instance name.
Performance counter instances¶
The full_instancename
distinguishes different counter instances of the same
counter type. The formatting of the full_instancename
depends on the counter
type. There are two types of counters: simple counters, which usually generate
the counter values based on direct measurements, and aggregate counters, which
take another counter and transform its values before generating their own
counter values. An example for a simple counter is given above:
counting retired HPX threads. An aggregate counter is shown as an example
above as well: calculating the average of the underlying
counter values sampled at constant time intervals.
While simple counters use instance names formatted as
parentinstancename#parentindex/instancename#instanceindex
, most aggregate
counters have the full counter name of the embedded counter as their instance
name.
Not all simple counter types require specifying all four elements of a full counter
instance name; some of the parts (parentinstancename
, parentindex
,
instancename
, and instanceindex
) are optional for specific counters.
Please refer to the documentation of a particular counter for more information
about the formatting requirements for the name of this counter (see
Existing HPX performance counters).
The parameters
are used to pass additional information to a counter at
creation time. They are optional, and they fully depend on the concrete counter.
Even if a specific counter type allows additional parameters to be given, those
usually are not required as sensible defaults will be chosen. Please refer to
the documentation of a particular counter for more information about what
parameters are supported, how to specify them, and what default values are
assumed (see also Existing HPX performance counters).
Every locality of an application exposes its own set of performance counter types and performance counter instances. The set of exposed counters is determined dynamically at application start based on the execution environment of the application. For instance, this set is influenced by the current hardware environment for the locality (such as whether the locality has access to accelerators), and the software environment of the application (such as the number of OS threads used to execute HPX threads).
Using wildcards in performance counter names¶
It is possible to use wildcard characters when specifying performance counter names. Performance counter names can contain two types of wildcard characters:
Wildcard characters in the performance counter type
Wildcard characters in the performance counter instance name
A wildcard character has a meaning which is very close to usual file name wildcard matching rules implemented by common shells (like bash).
Wildcard |
Description |
|
This wildcard character matches any number (zero or more) of arbitrary characters. |
|
This wildcard character matches any single arbitrary character. |
|
This wildcard character matches any single character from the list of specified within the square brackets. |
Wildcard |
Description |
|
This wildcard character matches any locality or any thread,
depending on whether it is used for |
Consuming performance counter data¶
You can consume performance data using either the command line interface, the HPX application or the HPX API. The command line interface is easier to use, but it is less flexible and does not allow one to adjust the behaviour of your application at runtime. The command line interface provides a convenience abstraction but simplified abstraction for querying and logging performance counter data for a set of performance counters.
Consuming performance counter data from the command line¶
HPX provides a set of predefined command line options for every application
that uses hpx::init
for its initialization. While there are many more
command line options available (see HPX Command Line Options), the set of options related
to performance counters allows one to list existing counters, and query existing
counters once at application termination or repeatedly after a constant time
interval.
The following table summarizes the available command line options:
Command line option |
Description |
|
Prints the specified performance counter either repeatedly and/or at the
times specified by |
|
Prints the specified performance counter either repeatedly and/or at the
times specified by |
|
Prints the performance counter(s) specified with |
|
Prints the performance counter(s) specified with |
|
Lists the names of all registered performance counters. |
|
Lists the description of all registered performance counters. |
|
Prints the performance counter(s) specified with |
|
Prints the performance counter(s) specified with |
|
Prints the performance counter(s) specified with |
|
Resets all performance counter(s) specified with |
|
Appends counter type description to generated output. |
|
Each locality prints only its own local counters. |
While the options --hpx:list-counters
and --hpx:list-counter-infos
give
a short list of all available counters, the full documentation for those can
be found in the section Existing HPX performance counters.
A simple example¶
All of the commandline options mentioned above can be tested using
the hello_world_distributed
example.
Listing all available counters hello_world_distributed --hpx:list-counters
yields:
List of available counter instances (replace * below with the appropriate
sequence number)
-------------------------------------------------------------------------
/agas/count/allocate /agas/count/bind /agas/count/bind_gid
/agas/count/bind_name ... /threads{locality#*/allocator#*}/count/objects
/threads{locality#*/total}/count/stack-recycles
/threads{locality#*/total}/idle-rate
/threads{locality#*/worker-thread#*}/idle-rate
Providing more information about all available counters,
hello_world_distributed --hpx:list-counter-infos
yields:
Information about available counter instances (replace * below with the
appropriate sequence number)
------------------------------------------------------------------------------
fullname: /agas/count/allocate helptext: returns the number of invocations of
the AGAS service 'allocate' type: counter_raw version: 1.0.0
------------------------------------------------------------------------------
------------------------------------------------------------------------------
fullname: /agas/count/bind helptext: returns the number of invocations of the
AGAS service 'bind' type: counter_raw version: 1.0.0
------------------------------------------------------------------------------
------------------------------------------------------------------------------
fullname: /agas/count/bind_gid helptext: returns the number of invocations of
the AGAS service 'bind_gid' type: counter_raw version: 1.0.0
------------------------------------------------------------------------------
...
This command will not only list the counter names but also a short description of the data exposed by this counter.
Note
The list of available counters may differ depending on the concrete execution environment (hardware or software) of your application.
Requesting the counter data for one or more performance counters can be achieved
by invoking hello_world_distributed
with a list of counter names:
hello_world_distributed \
--hpx:print-counter=/threads{locality#0/total}/count/cumulative \
--hpx:print-counter=/agas{locality#0/total}/count/bind
which yields for instance:
hello world from OS-thread 0 on locality 0
/threads{locality#0/total}/count/cumulative,1,0.212527,[s],33
/agas{locality#0/total}/count/bind,1,0.212790,[s],11
The first line is the normal output generated by hello_world_distributed
and
has no relation to the counter data listed. The last two lines contain the
counter data as gathered at application shutdown. These lines have six fields, the
counter name, the sequence number of the counter invocation, the time stamp at
which this information has been sampled, the unit of measure for the time stamp,
the actual counter value and an optional unit of measure for the counter value.
Note
The command line option --hpx:print-counter-types
will append a seventh
field to the generated output. This field will hold an abbreviated counter
type.
The actual counter value can be represented by a single number (for counters
returning singular values) or a list of numbers separated by ':'
(for
counters returning an array of values, like for instance a histogram).
Note
The name of the performance counter will be enclosed in double quotes '"'
if it contains one or more commas ','
.
Requesting to query the counter data once after a constant time interval with this command line:
hello_world_distributed \
--hpx:print-counter=/threads{locality#0/total}/count/cumulative \
--hpx:print-counter=/agas{locality#0/total}/count/bind \
--hpx:print-counter-interval=20
yields for instance (leaving off the actual console output of the
hello_world_distributed
example for brevity):
threads{locality#0/total}/count/cumulative,1,0.002409,[s],22
agas{locality#0/total}/count/bind,1,0.002542,[s],9
threads{locality#0/total}/count/cumulative,2,0.023002,[s],41
agas{locality#0/total}/count/bind,2,0.023557,[s],10
threads{locality#0/total}/count/cumulative,3,0.037514,[s],46
agas{locality#0/total}/count/bind,3,0.038679,[s],10
The command --hpx:print-counter-destination=<file>
will redirect all counter
data gathered to the specified file name, which avoids cluttering the console
output of your application.
The command line option --hpx:print-counter
supports using a limited set of
wildcards for a (very limited) set of use cases. In particular, all occurrences
of #*
as in locality#*
and in worker-thread#*
will be automatically
expanded to the proper set of performance counter names representing the actual
environment for the executed program. For instance, if your program is utilizing
four worker threads for the execution of HPX threads (see command line option
--hpx:threads
) the following command line
hello_world_distributed \
--hpx:threads=4 \
--hpx:print-counter=/threads{locality#0/worker-thread#*}/count/cumulative
will print the value of the performance counters monitoring each of the worker threads:
hello world from OS-thread 1 on locality 0
hello world from OS-thread 0 on locality 0
hello world from OS-thread 3 on locality 0
hello world from OS-thread 2 on locality 0
/threads{locality#0/worker-thread#0}/count/cumulative,1,0.0025214,[s],27
/threads{locality#0/worker-thread#1}/count/cumulative,1,0.0025453,[s],33
/threads{locality#0/worker-thread#2}/count/cumulative,1,0.0025683,[s],29
/threads{locality#0/worker-thread#3}/count/cumulative,1,0.0025904,[s],33
The command --hpx:print-counter-format
takes values csv
and
csv-short
to generate CSV formatted counter values with a header.
With format as csv:
hello_world_distributed \
--hpx:threads=2 \
--hpx:print-counter-format csv \
--hpx:print-counter /threads{locality#*/total}/count/cumulative \
--hpx:print-counter /threads{locality#*/total}/count/cumulative-phases
will print the values of performance counters in CSV format with the full countername as a header:
hello world from OS-thread 1 on locality 0
hello world from OS-thread 0 on locality 0
/threads{locality#*/total}/count/cumulative,/threads{locality#*/total}/count/cumulative-phases
39,93
With format csv-short:
hello_world_distributed \
--hpx:threads 2 \
--hpx:print-counter-format csv-short \
--hpx:print-counter cumulative,/threads{locality#*/total}/count/cumulative \
--hpx:print-counter phases,/threads{locality#*/total}/count/cumulative-phases
will print the values of performance counters in CSV format with the short countername as a header:
hello world from OS-thread 1 on locality 0
hello world from OS-thread 0 on locality 0
cumulative,phases
39,93
With format csv and csv-short when used with --hpx:print-counter-interval
:
hello_world_distributed \
--hpx:threads 2 \
--hpx:print-counter-format csv-short \
--hpx:print-counter cumulative,/threads{locality#*/total}/count/cumulative \
--hpx:print-counter phases,/threads{locality#*/total}/count/cumulative-phases \
--hpx:print-counter-interval 5
will print the header only once repeating the performance counter value(s) repeatedly:
cum,phases
25,42
hello world from OS-thread 1 on locality 0
hello world from OS-thread 0 on locality 0
44,95
The command --hpx:no-csv-header
can be used with
--hpx:print-counter-format
to print performance counter values in CSV format
without any header:
hello_world_distributed \
--hpx:threads 2 \
--hpx:print-counter-format csv-short \
--hpx:print-counter cumulative,/threads{locality#*/total}/count/cumulative \
--hpx:print-counter phases,/threads{locality#*/total}/count/cumulative-phases \
--hpx:no-csv-header
will print:
hello world from OS-thread 1 on locality 0
hello world from OS-thread 0 on locality 0
37,91
Consuming performance counter data using the HPX API¶
HPX provides an API that allows users to discover performance counters and to retrieve the current value of any existing performance counter from any application.
Discover existing performance counters¶
Retrieve the current value of any performance counter¶
Performance counters are specialized HPX components. In order to retrieve a counter value, the performance counter needs to be instantiated. HPX exposes a client component object for this purpose:
hpx::performance_counters::performance_counter counter(std::string const& name);
Instantiating an instance of this type will create the performance counter
identified by the given name
. Only the first invocation for any given counter
name will create a new instance of that counter. All following invocations for a
given counter name will reference the initially created instance. This ensures
that at any point in time there is never more than one active instance of
any of the existing performance counters.
In order to access the counter value (or to invoke any of the other functionality
related to a performance counter, like start
, stop
or reset
) member
functions of the created client component instance should be called:
// print the current number of threads created on locality 0
hpx::performance_counters::performance_counter count(
"/threads{locality#0/total}/count/cumulative");
hpx::cout << count.get_value<int>().get() << hpx::endl;
For more information about the client component type, see
hpx::performance_counters::performance_counter
Note
In the above example count.get_value()
returns a future. In order to print
the result we must append .get()
to retrieve the value. You could write the
above example like this for more clarity:
// print the current number of threads created on locality 0
hpx::performance_counters::performance_counter count(
"/threads{locality#0/total}/count/cumulative");
hpx::future<int> result = count.get_value<int>();
hpx::cout << result.get() << hpx::endl;
Providing performance counter data¶
HPX offers several ways by which you may provide your own data as a performance counter. This has the benefit of exposing additional, possibly application-specific information using the existing Performance Counter framework, unifying the process of gathering data about your application.
An application that wants to provide counter data can implement a performance counter to provide the data. When a consumer queries performance data, the HPX runtime system calls the provider to collect the data. The runtime system uses an internal registry to determine which provider to call.
Generally, there are two ways of exposing your own performance counter data: a simple, function-based way and a more complex, but more powerful way of implementing a full performance counter. Both alternatives are described in the following sections.
Exposing performance counter data using a simple function¶
The simplest way to expose arbitrary numeric data is to write a function which will then be called whenever a consumer queries this counter. Currently, this type of performance counter can only be used to expose integer values. The expected signature of this function is:
std::int64_t some_performance_data(bool reset);
The argument bool reset
(which is supplied by the runtime system when the
function is invoked) specifies whether the counter value should be reset after
evaluating the current value (if applicable).
For instance, here is such a function returning how often it was invoked:
// The atomic variable 'counter' ensures the thread safety of the counter.
boost::atomic<std::int64_t> counter(0);
std::int64_t some_performance_data(bool reset)
{
std::int64_t result = ++counter;
if (reset)
counter = 0;
return result;
}
This example function exposes a linearly-increasing value as our performance data. The value is incremented on each invocation, i.e., each time a consumer requests the counter data of this performance counter.
The next step in exposing this counter to the runtime system is to register the
function as a new raw counter type using the HPX API function
hpx::performance_counters::install_counter_type
. A counter type
represents certain common characteristics of counters, like their counter type
name and any associated description information. The following snippet shows an
example of how to register the function some_performance_data
, which is shown
above, for a counter type named "/test/data"
. This registration has to be
executed before any consumer instantiates, and queries an instance of this
counter type:
#include <hpx/include/performance_counters.hpp>
void register_counter_type()
{
// Call the HPX API function to register the counter type.
hpx::performance_counters::install_counter_type(
"/test/data", // counter type name
&some_performance_data, // function providing counter data
"returns a linearly increasing counter value" // description text (optional)
"" // unit of measure (optional)
);
}
Now it is possible to instantiate a new counter instance based on the naming
scheme "/test{locality#*/total}/data"
where *
is a zero-based integer
index identifying the locality for which the counter instance should be
accessed. The function
hpx::performance_counters::install_counter_type
enables users to
instantiate exactly one counter instance for each locality. Repeated
requests to instantiate such a counter will return the same instance, i.e., the
instance created for the first request.
If this counter needs to be accessed using the standard HPX command line
options, the registration has to be performed during application startup, before
hpx_main
is executed. The best way to achieve this is to register an HPX
startup function using the API function
hpx::register_startup_function
before calling hpx::init
to
initialize the runtime system:
int main(int argc, char* argv[])
{
// By registering the counter type we make it available to any consumer
// who creates and queries an instance of the type "/test/data".
//
// This registration should be performed during startup. The
// function 'register_counter_type' should be executed as an HPX thread right
// before hpx_main is executed.
hpx::register_startup_function(®ister_counter_type);
// Initialize and run HPX.
return hpx::init(argc, argv);
}
Please see the code in simplest_performance_counter.cpp
for a full example demonstrating this functionality.
Implementing a full performance counter¶
Sometimes, the simple way of exposing a single value as a performance counter is not sufficient. For that reason, HPX provides a means of implementing full performance counters which support:
Retrieving the descriptive information about the performance counter
Retrieving the current counter value
Resetting the performance counter (value)
Starting the performance counter
Stopping the performance counter
Setting the (initial) value of the performance counter
Every full performance counter will implement a predefined interface:
// Copyright (c) 2007-2020 Hartmut Kaiser
//
// SPDX-License-Identifier: BSL-1.0
// Distributed under the Boost Software License, Version 1.0. (See accompanying
// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
#pragma once
#include <hpx/config.hpp>
#include <hpx/async_base/launch_policy.hpp>
#include <hpx/components/client_base.hpp>
#include <hpx/functional/bind_front.hpp>
#include <hpx/futures/future.hpp>
#include <hpx/modules/execution.hpp>
#include <hpx/performance_counters/counters_fwd.hpp>
#include <hpx/performance_counters/server/base_performance_counter.hpp>
#include <string>
#include <utility>
#include <vector>
///////////////////////////////////////////////////////////////////////////////
namespace hpx { namespace performance_counters {
///////////////////////////////////////////////////////////////////////////
struct HPX_EXPORT performance_counter
: components::client_base<performance_counter,
server::base_performance_counter>
{
using base_type = components::client_base<performance_counter,
server::base_performance_counter>;
performance_counter() = default;
performance_counter(std::string const& name);
performance_counter(
std::string const& name, hpx::id_type const& locality);
performance_counter(id_type const& id)
: base_type(id)
{
}
performance_counter(future<id_type>&& id)
: base_type(std::move(id))
{
}
performance_counter(hpx::future<performance_counter>&& c)
: base_type(std::move(c))
{
}
///////////////////////////////////////////////////////////////////////
future<counter_info> get_info() const;
counter_info get_info(
launch::sync_policy, error_code& ec = throws) const;
future<counter_value> get_counter_value(bool reset = false);
counter_value get_counter_value(
launch::sync_policy, bool reset = false, error_code& ec = throws);
future<counter_value> get_counter_value() const;
counter_value get_counter_value(
launch::sync_policy, error_code& ec = throws) const;
future<counter_values_array> get_counter_values_array(
bool reset = false);
counter_values_array get_counter_values_array(
launch::sync_policy, bool reset = false, error_code& ec = throws);
future<counter_values_array> get_counter_values_array() const;
counter_values_array get_counter_values_array(
launch::sync_policy, error_code& ec = throws) const;
///////////////////////////////////////////////////////////////////////
future<bool> start();
bool start(launch::sync_policy, error_code& ec = throws);
future<bool> stop();
bool stop(launch::sync_policy, error_code& ec = throws);
future<void> reset();
void reset(launch::sync_policy, error_code& ec = throws);
future<void> reinit(bool reset = true);
void reinit(
launch::sync_policy, bool reset = true, error_code& ec = throws);
///////////////////////////////////////////////////////////////////////
future<std::string> get_name() const;
std::string get_name(
launch::sync_policy, error_code& ec = throws) const;
private:
template <typename T>
static T extract_value(future<counter_value>&& value)
{
return value.get().get_value<T>();
}
public:
template <typename T>
future<T> get_value(bool reset = false)
{
return get_counter_value(reset).then(hpx::launch::sync,
util::bind_front(&performance_counter::extract_value<T>));
}
template <typename T>
T get_value(
launch::sync_policy, bool reset = false, error_code& ec = throws)
{
return get_counter_value(launch::sync, reset).get_value<T>(ec);
}
template <typename T>
future<T> get_value() const
{
return get_counter_value().then(hpx::launch::sync,
util::bind_front(&performance_counter::extract_value<T>));
}
template <typename T>
T get_value(launch::sync_policy, error_code& ec = throws) const
{
return get_counter_value(launch::sync).get_value<T>(ec);
}
};
// Return all counters matching the given name (with optional wild cards).
HPX_EXPORT std::vector<performance_counter> discover_counters(
std::string const& name, error_code& ec = throws);
}} // namespace hpx::performance_counters
In order to implement a full performance counter, you have to create an HPX
component exposing this interface. To simplify this task, HPX provides a
ready-made base class which handles all the boiler plate of creating
a component for you. The remainder of this section will explain the process of creating a full
performance counter based on the Sine example, which you can find in the
directory examples/performance_counters/sine/
.
The base class is defined in the header file [hpx_link hpx/performance_counters/base_performance_counter.hpp..hpx/performance_counters/base_performance_counter.hpp] as:
// Copyright (c) 2007-2018 Hartmut Kaiser
//
// SPDX-License-Identifier: BSL-1.0
// Distributed under the Boost Software License, Version 1.0. (See accompanying
// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
#pragma once
#include <hpx/config.hpp>
#include <hpx/actions_base/component_action.hpp>
#include <hpx/components_base/component_type.hpp>
#include <hpx/components_base/server/component_base.hpp>
#include <hpx/performance_counters/counters.hpp>
#include <hpx/performance_counters/server/base_performance_counter.hpp>
///////////////////////////////////////////////////////////////////////////////
//[performance_counter_base_class
namespace hpx { namespace performance_counters {
template <typename Derived>
class base_performance_counter;
}} // namespace hpx::performance_counters
//]
///////////////////////////////////////////////////////////////////////////////
namespace hpx { namespace performance_counters {
template <typename Derived>
class base_performance_counter
: public hpx::performance_counters::server::base_performance_counter
, public hpx::components::component_base<Derived>
{
private:
typedef hpx::components::component_base<Derived> base_type;
public:
typedef Derived type_holder;
typedef hpx::performance_counters::server::base_performance_counter
base_type_holder;
base_performance_counter() {}
base_performance_counter(
hpx::performance_counters::counter_info const& info)
: base_type_holder(info)
{
}
// Disambiguate finalize() which is implemented in both base classes
void finalize()
{
base_type_holder::finalize();
base_type::finalize();
}
};
}} // namespace hpx::performance_counters
The single template parameter is expected to receive the type of the derived class implementing the performance counter. In the Sine example this looks like:
// Copyright (c) 2007-2012 Hartmut Kaiser
//
// SPDX-License-Identifier: BSL-1.0
// Distributed under the Boost Software License, Version 1.0. (See accompanying
// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
#pragma once
#include <hpx/config.hpp>
#if !defined(HPX_COMPUTE_DEVICE_CODE)
#include <hpx/hpx.hpp>
#include <hpx/include/lcos_local.hpp>
#include <hpx/include/performance_counters.hpp>
#include <hpx/include/util.hpp>
#include <cstdint>
namespace performance_counters { namespace sine { namespace server
{
///////////////////////////////////////////////////////////////////////////
//[sine_counter_definition
class sine_counter
: public hpx::performance_counters::base_performance_counter<sine_counter>
//]
{
public:
sine_counter() : current_value_(0), evaluated_at_(0) {}
explicit sine_counter(
hpx::performance_counters::counter_info const& info);
/// This function will be called in order to query the current value of
/// this performance counter
hpx::performance_counters::counter_value get_counter_value(bool reset);
/// The functions below will be called to start and stop collecting
/// counter values from this counter.
bool start();
bool stop();
/// finalize() will be called just before the instance gets destructed
void finalize();
protected:
bool evaluate();
private:
typedef hpx::lcos::local::spinlock mutex_type;
mutable mutex_type mtx_;
double current_value_;
std::uint64_t evaluated_at_;
hpx::util::interval_timer timer_;
};
}}}
#endif
i.e., the type sine_counter
is derived from the base class passing the type
as a template argument (please see simplest_performance_counter.cpp
for the full source code of the counter definition). For more information about this
technique (called Curiously Recurring Template Pattern - CRTP), please see for
instance the corresponding Wikipedia article. This base
class itself is derived from the performance_counter
interface described
above.
Additionally, a full performance counter implementation not only exposes the actual value but also provides information about:
The point in time a particular value was retrieved.
A (sequential) invocation count.
The actual counter value.
An optional scaling coefficient.
Information about the counter status.
Existing HPX performance counters¶
The HPX runtime system exposes a wide variety of predefined performance counters. These counters expose critical information about different modules of the runtime system. They can help determine system bottlenecks and fine-tune system and application performance.
Counter type |
Counter instance formatting |
Description |
Parameters |
where:
primary namespace services: component namespace services: locality namespace services: symbol namespace services: |
where:
The value for |
None |
Returns the total number of invocations of the specified AGAS service since its creation. |
where:
|
where:
|
None |
Returns the overall total number of invocations of all AGAS services provided by the given AGAS service category since its creation. |
where:
primary namespace services: component namespace services: locality namespace services: symbol namespace services: |
where:
The value for |
None |
Returns the overall execution time of the specified AGAS service since its creation (in nanoseconds). |
where:
|
where:
|
None |
Returns the overall execution time of all AGAS services provided by the given AGAS service category since its creation (in nanoseconds). |
|
where:
|
None |
Returns the number of cache entries resident in the AGAS cache of
the specified locality (see |
where:
|
where:
|
None |
Returns the number of cache events (evictions, hits, inserts, and misses)
in the AGAS cache of the specified locality (see
|
where:
|
where:
|
None |
Returns the number of invocations of the specified cache API function of the AGAS cache. |
where:
|
where:
|
None |
Returns the overall time spent executing of the specified API function of the AGAS cache. |
Counter type |
Counter instance formatting |
Description |
Parameters |
where:
|
where:
|
Returns the overall number of raw (uncompressed) bytes sent or received
(see The performance counters for the connection type Please see CMake variables used to configure HPX for more details. |
None |
where:
|
where:
|
Returns the total time (in nanoseconds) between the start of each
asynchronous transmission operation and the end of the corresponding
operation for the specified The performance counters for the connection type Please see CMake variables used to configure HPX for more details. |
None |
where:
|
where:
|
Returns the overall number of bytes transferred (see The performance counters for the connection type Please see CMake variables used to configure HPX for more details. |
If the configure-time option |
where:
|
where:
|
Returns the overall time spent performing outgoing data serialization for
the specified The performance counters for the connection type Please see CMake variables used to configure HPX for more details. |
If the configure-time option |
|
where:
|
Returns the overall number of routed (outbound) parcels transferred by the given locality. Routed parcels are those which cannot directly be delivered to its destination as the local AGAS is not able to resolve the destination address. In this case a parcel is sent to the AGAS service component which is responsible for creating the destination GID (and is responsible for resolving the destination address). This AGAS service component will deliver the parcel to its final target. |
If the configure-time option |
where:
|
where:
|
Returns the overall number of parcels transferred using the specified
The performance counters for the connection type Please see CMake variables used to configure HPX for more details. |
None |
where:
|
where:
|
Returns the overall number of messages 1 transferred using the
specified The performance counters for the connection type Please see CMake variables used to configure HPX for more details. |
None |
where:
<connection_type` is one of the following: |
where:
|
Returns the overall number cache events (evictions, hits, inserts,
misses, and reclaims) for the connection cache of the given connection
type on the given locality (see The performance counters for the connection type Please see CMake variables used to configure HPX for more details. |
None |
where:
|
where:
|
Returns the current number of parcels stored in the parcel queue (see
|
None |
Counter type |
Counter instance formatting |
Description |
Parameters |
|
where:
|
Returns the overall number of executed (retired) HPX-threads on the
given locality since application start. If the instance name is
|
None |
|
where:
|
Returns the average time spent executing one HPX-thread on the given
locality since application start. If the instance name is |
None |
|
where:
|
Returns the average time spent on overhead while executing one
HPX-thread on the given locality since application start. If
the instance name is |
None |
|
where:
|
Returns the overall number of executed HPX-thread phases (invocations)
on the given locality since application start. If the instance
name is |
None |
|
where:
|
Returns the average time spent executing one HPX-thread phase
(invocation) on the given locality since application start. If
the instance name is |
None |
|
where:
|
Returns the average time spent on overhead executing one HPX-thread
phase (invocation) on the given locality since application start.
If the instance name is |
None |
|
where:
|
Returns the overall time spent running the scheduler on the given
locality since application start. If the instance name is |
None |
|
where:
|
Returns the overall time spent executing all HPX-threads on the given
locality since application start. If the instance name is |
None |
|
where:
|
Returns the overall overhead time incurred executing all HPX-threads on
the given locality since application start. If the instance name
is |
None |
where:
|
where:
The |
Returns the current number of HPX-threads having the given thread state
on the given locality. If the instance name is |
None |
where:
|
where:
The |
Returns the average wait time of HPX-threads (if the thread state is
These counters are available only if the compile time constant
|
None |
|
where:
|
Returns the average idle rate for the given worker thread(s) on the given
locality. The idle rate is defined as the ratio of the time spent
on scheduling and management tasks and the overall time spent executing
work since the application started. This counter is available only if the
configuration time constant |
None |
|
where:
|
Returns the average idle rate for the given worker thread(s) on the given
locality which is caused by creating new threads. The creation idle
rate is defined as the ratio of the time spent on creating new threads and
the overall time spent executing work since the application started. This
counter is available only if the configuration time constants
|
None |
|
where:
|
Returns the average idle rate for the given worker thread(s) on the given
locality which is caused by cleaning up terminated threads. The
cleanup idle rate is defined as the ratio of the time spent on cleaning up
terminated thread objects and the overall time spent executing work since
the application started. This counter is available only if the
configuration time constants |
None |
|
where:
|
Returns the overall length of all queues for the given worker thread(s) on the given locality. |
None |
|
where:
|
Returns the total number of HPX-thread unbind (madvise) operations performed for the referenced locality. Note that this counter is not available on Windows based platforms. |
None |
|
where:
|
Returns the total number of HPX-thread recycling operations performed. |
None |
|
|
Returns the total number of HPX-threads ‘stolen’ from the pending
thread queue by a neighboring thread worker thread (these threads are
executed by a different worker thread than they were initially scheduled
on). This counter is available only if the configuration time constant
|
None |
|
where:
|
Returns the total number of times that the referenced worker-thread on
the referenced locality failed to find pending HPX-threads in
its associated queue. This counter is available only if the configuration
time constant |
None |
|
where:
|
Returns the total number of times that the referenced worker-thread on
the referenced locality looked for pending HPX-threads in its
associated queue. This counter is available only if the configuration
time constant |
None |
|
where:
|
Returns the total number of HPX-threads ‘stolen’ from the staged thread
queue by a neighboring worker thread (these threads are executed by a
different worker thread than they were initially scheduled on). This
counter is available only if the configuration time constant
|
None |
|
where:
|
Returns the total number of HPX-threads ‘stolen’ to the pending thread
queue of the worker thread (these threads are executed by a different
worker thread than they were initially scheduled on). This counter is
available only if the configuration time constant
|
None |
|
where:
|
Returns the total number of HPX-threads ‘stolen’ to the staged thread
queue of a neighboring worker thread (these threads are executed by a
different worker thread than they were initially scheduled on). This
counter is available only if the configuration time constant
|
None |
|
where:
|
Returns the total number of HPX-thread objects created. Note that thread objects are reused to improve system performance, thus this number does not reflect the number of actually executed (retired) HPX-threads. |
None |
|
where:
|
|
Percent |
|
where:
|
Returns the current (instantaneous) idle-loop count for the given HPX- worker thread or the accumulated value for all worker threads. |
None |
|
where:
|
Returns the current (instantaneous) busy-loop count for the given HPX- worker thread or the accumulated value for all worker threads. |
None |
|
where:
|
Returns the overall time spent performing background work on the given
locality since application start. If the instance name is |
None |
|
where:
|
Returns the background overhead on the given locality since application
start. If the instance name is |
None |
|
where:
|
Returns the overall time spent performing background work related to
sending parcels on the given locality since application start. If the
instance name is This counter will currently return meaningful values for the MPI parcelport only. |
None |
|
where:
|
Returns the background overhead related to sending parcels on the given
locality since application start. If the instance name is This counter will currently return meaningful values for the MPI parcelport only. |
None |
|
where:
|
Returns the overall time spent performing background work related to
receiving parcels on the given locality since application start. If the
instance name is This counter will currently return meaningful values for the MPI parcelport only. |
None |
|
where:
|
Returns the background overhead related to receiving parcels on the given
locality since application start. If the instance name is This counter will currently return meaningful values for the MPI parcelport only. |
None |
Counter type |
Counter instance formatting |
Description |
Parameters |
|
where:
|
Returns the overall number of currently active components of the specified type on the given locality. |
The type of the component. This is the string which has been used while
registering the component with HPX, e.g. which has been passed as the
second parameter to the macro |
|
|
Returns the overall (local) invocation count of the specified action type on the given locality. |
The action type. This is the string which has been used while registering
the action with HPX, e.g. which has been passed as the second parameter
to the macro |
|
where:
|
Returns the overall (remote) invocation count of the specified action type on the given locality. |
The action type. This is the string which has been used while registering
the action with HPX, e.g. which has been passed as the second parameter
to the macro |
|
where:
|
Returns the overall time since application start on the given locality in nanoseconds. |
None |
|
where:
|
Returns the amount of virtual memory currently allocated by the referenced locality (in bytes). |
None |
|
where:
|
Returns the amount of resident memory currently allocated by the referenced locality (in bytes). |
None |
|
where:
|
|
None |
|
where:
|
Returns the number of bytes read by the process (aggregate of count arguments passed to read() call or its analogues). This performance counter is available only on systems which expose the related data through the /proc file system. |
None |
|
where:
|
Returns the number of bytes written by the process (aggregate of count arguments passed to write() call or its analogues). This performance counter is available only on systems which expose the related data through the /proc file system. |
None |
|
where:
|
Returns the number of system calls that perform I/O reads. This performance counter is available only on systems which expose the related data through the /proc file system. |
None |
|
where:
|
Returns the number of system calls that perform I/O writes. This performance counter is available only on systems which expose the related data through the /proc file system. |
None |
|
where:
|
Returns the number of bytes retrieved from storage by I/O operations. This performance counter is available only on systems which expose the related data through the /proc file system. |
None |
|
where:
|
Returns the number of bytes retrieved from storage by I/O operations. This performance counter is available only on systems which expose the related data through the /proc file system. |
None |
|
where:
|
Returns the number of bytes accounted by write_bytes_transferred that has not been ultimately stored due to truncation or deletion. This performance counter is available only on systems which expose the related data through the /proc file system. |
None |
Counter type |
Counter instance formatting |
Description |
Parameters |
where:
For a full list of available PAPI events and their (short) description
use the |
where:
|
This counter returns the current count of occurrences of the specified
PAPI event. This counter is available only if the configuration time
constant |
None |
Counter type |
Counter instance formatting |
Description |
Parameters |
|
Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. |
Returns the current average (mean) value calculated based on the values queried from the underlying counter (the one specified as the instance name). |
Any parameter will be interpreted as a list of up to two comma separated
(integer) values, where the first is the time interval (in milliseconds)
at which the underlying counter should be queried. If no value is
specified, the counter will assume |
|
Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. |
Returns the current rolling average (mean) value calculated based on the values queried from the underlying counter (the one specified as the instance name). |
Any parameter will be interpreted as a list of up to three comma
separated (integer) values, where the first is the time interval (in
milliseconds) at which the underlying counter should be queried. If no
value is specified, the counter will assume |
|
Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. |
Returns the current standard deviation (stddev) value calculated based on the values queried from the underlying counter (the one specified as the instance name). |
Any parameter will be interpreted as a list of up to two comma separated
(integer) values, where the first is the time interval (in milliseconds)
at which the underlying counter should be queried. If no value is
specified, the counter will assume |
|
Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. |
Returns the current rolling variance (stddev) value calculated based on the values queried from the underlying counter (the one specified as the instance name). |
Any parameter will be interpreted as a list of up to three comma
separated (integer) values, where the first is the time interval (in
milliseconds) at which the underlying counter should be queried. If no
value is specified, the counter will assume |
|
Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. |
Returns the current (statistically estimated) median value calculated based on the values queried from the underlying counter (the one specified as the instance name). |
Any parameter will be interpreted as a list of up to two comma separated
(integer) values, where the first is the time interval (in milliseconds)
at which the underlying counter should be queried. If no value is
specified, the counter will assume |
|
Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. |
Returns the current maximum value calculated based on the values queried from the underlying counter (the one specified as the instance name). |
Any parameter will be interpreted as a list of up to two comma separated
(integer) values, where the first is the time interval (in milliseconds)
at which the underlying counter should be queried. If no value is
specified, the counter will assume |
|
Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. |
Returns the current rolling maximum value calculated based on the values queried from the underlying counter (the one specified as the instance name). |
Any parameter will be interpreted as a list of up to three comma
separated (integer) values, where the first is the time interval (in
milliseconds) at which the underlying counter should be queried. If no
value is specified, the counter will assume |
|
Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. |
Returns the current minimum value calculated based on the values queried from the underlying counter (the one specified as the instance name). |
Any parameter will be interpreted as a list of up to two comma separated
(integer) values, where the first is the time interval (in milliseconds)
at which the underlying counter should be queried. If no value is
specified, the counter will assume |
|
Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. |
Returns the current rolling minimum value calculated based on the values queried from the underlying counter (the one specified as the instance name). |
Any parameter will be interpreted as a list of up to three comma
separated (integer) values, where the first is the time interval (in
milliseconds) at which the underlying counter should be queried. If no
value is specified, the counter will assume |
Counter type |
Counter instance formatting |
Description |
Parameters |
|
None |
Returns the sum calculated based on the values queried from the underlying counters (the ones specified as the parameters). |
The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
|
None |
Returns the difference calculated based on the values queried from the underlying counters (the ones specified as the parameters). |
The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
|
None |
Returns the product calculated based on the values queried from the underlying counters (the ones specified as the parameters). |
The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
|
None |
Returns the result of division of the values queried from the underlying counters (the ones specified as the parameters). |
The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
|
None |
Returns the average value of all values queried from the underlying counters (the ones specified as the parameters). |
The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
|
None |
Returns the standard deviation of all values queried from the underlying counters (the ones specified as the parameters). |
The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
|
None |
Returns the median value of all values queried from the underlying counters (the ones specified as the parameters). |
The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
|
None |
Returns the minimum value of all values queried from the underlying counters (the ones specified as the parameters). |
The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
|
None |
Returns the maximum value of all values queried from the underlying counters (the ones specified as the parameters). |
The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
|
None |
Returns the count value of all values queried from the underlying counters (the ones specified as the parameters). |
The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
Note
The /arithmetics
counters can consume an arbitrary number of other
counters. For this reason those have to be specified as parameters (a comma
separated list of counters appended after a '@'
). For instance:
./bin/hello_world_distributed -t2 \
--hpx:print-counter=/threads{locality#0/worker-thread#*}/count/cumulative \
--hpx:print-counter=/arithmetics/add@/threads{locality#0/worker-thread#*}/count/cumulative
hello world from OS-thread 0 on locality 0
hello world from OS-thread 1 on locality 0
/threads{locality#0/worker-thread#0}/count/cumulative,1,0.515640,[s],25
/threads{locality#0/worker-thread#1}/count/cumulative,1,0.515520,[s],36
/arithmetics/add@/threads{locality#0/worker-thread#*}/count/cumulative,1,0.516445,[s],64
Since all wildcards in the parameters are expanded, this example is fully
equivalent to specifying both counters separately to /arithmetics/add
:
./bin/hello_world_distributed -t2 \
--hpx:print-counter=/threads{locality#0/worker-thread#*}/count/cumulative \
--hpx:print-counter=/arithmetics/add@\
/threads{locality#0/worker-thread#0}/count/cumulative,\
/threads{locality#0/worker-thread#1}/count/cumulative
Counter type |
Counter instance formatting |
Description |
Parameters |
|
where:
|
Returns the number of parcels handled by the message handler associated with the action which is given by the counter parameter. |
The action type. This is the string which has been used while registering
the action with HPX, e.g. which has been passed as the second parameter
to the macro |
|
where:
|
Returns the number of messages generated by the message handler associated with the action which is given by the counter parameter. |
The action type. This is the string which has been used while registering
the action with HPX, e.g. which has been passed as the second parameter
to the macro |
|
where:
|
Returns the average number of parcels sent in a message generated by the message handler associated with the action which is given by the counter parameter. |
The action type. This is the string which has been used while registering
the action with HPX, e.g. which has been passed as the second parameter
to the macro |
|
where:
|
Returns the average time between arriving parcels for the action which is given by the counter parameter. |
The action type. This is the string which has been used while registering
the action with HPX, e.g. which has been passed as the second parameter
to the macro |
|
where:
|
Returns a histogram representing the times between arriving parcels for the action which is given by the counter parameter. This counter returns an array of values, where the first three values represent the three parameters used for the histogram followed by one value for each of the histogram buckets. The first unit of measure displayed for this counter For each bucket the counter shows a value between |
The action type and optional histogram parameters. The action type is
the string which has been used while registering the action with HPX,
e.g. which has been passed as the second parameter to the macro
The action type may be followed by a comma separated list of up-to three
numbers: the lower and upper boundaries for the collected histogram, and
the number of buckets for the histogram to generate. By default these
three numbers will be assumed to be |
Note
The performance counters related to parcel coalescing are available only if
the configuration time constant HPX_WITH_PARCEL_COALESCING
is set to
ON
(default: ON
). However, even in this case it will be available
only for actions that are enabled for parcel coalescing (see the
macros HPX_ACTION_USES_MESSAGE_COALESCING
and
HPX_ACTION_USES_MESSAGE_COALESCING_NOTHROW
).
APEX integration¶
HPX provides integration with APEX, which is a framework for application
profiling using task timers and various performance counters. It can be added as
a git
submodule by turning on the option HPX_WITH_APEX:BOOL
during
CMake configuration. TAU is an optional dependency when using APEX.
To build HPX with APEX, add HPX_WITH_APEX
=ON
, and,
optionally, TAU_ROOT=$PATH_TO_TAU
to your CMake configuration. In
addition, you can override the tag used for APEX with the
HPX_WITH_APEX_TAG
option. Please see the APEX HPX documentation for detailed
instructions on using APEX with HPX.