HPX V1.7.0 (Jul 14, 2021)

This release is again focused on C++20 conformance of algorithms. Additionally, many new experimental sender-based algorithms have been added based on the latest proposals.
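
For readers less familiar with the algorithms adapted in this release, the following sketch shows the semantics of a few of them (remove_if, replace, reverse, and lexicographical_compare) using their standard-library counterparts. It only illustrates the behaviour that C++20 conformance targets and is not HPX code; the helper names drop_odd, replace_then_reverse, sorts_before, and check_algorithm_sketch are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Erase-remove idiom: remove_if moves the kept elements to the front and
// returns the new logical end; erase then drops the leftover tail.
std::vector<int> drop_odd(std::vector<int> v)
{
    auto new_end = std::remove_if(
        v.begin(), v.end(), [](int x) { return x % 2 != 0; });
    v.erase(new_end, v.end());
    return v;
}

// replace overwrites every element equal to old_value with new_value;
// reverse then flips the range in place.
std::vector<int> replace_then_reverse(
    std::vector<int> v, int old_value, int new_value)
{
    std::replace(v.begin(), v.end(), old_value, new_value);
    std::reverse(v.begin(), v.end());
    return v;
}

// lexicographical_compare returns true if the first range sorts strictly
// before the second one.
bool sorts_before(std::string const& a, std::string const& b)
{
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end());
}

// Hypothetical self-check exercising the helpers above.
void check_algorithm_sketch()
{
    assert((drop_odd({1, 2, 3, 4, 5, 6}) == std::vector<int>{2, 4, 6}));
    assert((replace_then_reverse({1, 2, 1, 3}, 1, 9) ==
        std::vector<int>{3, 9, 2, 9}));
    assert(sorts_before("apple", "apricot"));
    assert(!sorts_before("b", "a"));
}
```

Calling check_algorithm_sketch() from a main function runs the checks; the HPX overloads listed in these notes additionally accept an execution policy as their first argument.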

General changes

  • The following algorithms have been adapted to be C++20 conformant:

    • remove,

    • remove_if,

    • remove_copy,

    • remove_copy_if,

    • replace,

    • replace_if,

    • reverse, and

    • lexicographical_compare.

  • When the compiler and standard library support the standard execution policies std::execution::seq, std::execution::par, and std::execution::par_unseq, these policies can now be used in all HPX parallel algorithms. They behave like the non-task policies hpx::execution::seq, hpx::execution::par, and hpx::execution::par_unseq.

  • Vc support has been fixed, after being broken in 1.6.0. In addition, HPX now experimentally supports GCC’s SIMD implementation, when available. The implementation can be used through the hpx::execution::simd and hpx::execution::simdpar execution policies.

  • The customization points sync_execute, async_execute, then_execute, post, bulk_sync_execute, bulk_async_execute, and bulk_then_execute are now implemented using tag_dispatch (previously tag_invoke). Executors can still be implemented by providing the aforementioned functions as member functions of an executor.

  • New functionality, enhancements, and fixes based on P0443r14 (executors proposal) and P1897 (sender-based algorithms) have been added to the hpx::execution::experimental namespace. These can be accessed through the hpx/execution.hpp and hpx/local/execution.hpp headers. In particular, the following sender-based algorithms have been added:

    • detach,

    • ensure_started,

    • just,

    • just_on,

    • let_error,

    • let_value,

    • on,

    • transform, and

    • when_all.

    Additionally, futures now implement the sender concept. make_future can be used to turn a sender into a future. All functionality is experimental and can change without notice.

  • All hpx::init and hpx::start overloads now take std::functions instead of hpx::util::function_nonser. No changes should be required in user code to accommodate this change.

  • hpx::util::unwrapping and other related unwrapping functionality has been moved up into the hpx namespace. Names in hpx::util are still usable with a deprecation warning. This functionality can now be accessed through the hpx/unwrap.hpp and hpx/local/unwrap.hpp headers.

  • The default tag for APEX has been updated from 2.3.1 to 2.4.0. In particular, this fixes a bug which could lead to hangs in distributed runs.

  • The dependency on Boost.Asio has been replaced with the standalone Asio available at https://github.com/chriskohlhoff/asio. By default, a system-installed Asio will be used. ASIO_ROOT can be given as a hint to tell CMake where to find Asio. Alternatively, Asio can be fetched automatically using CMake’s FetchContent by setting HPX_WITH_FETCH_ASIO=ON. In general, dependencies on Boost have again been reduced.

  • Modularization of the library has continued. In this release almost all functionality has been moved into modules. These changes do not generally affect user code. Warnings are still issued for headers that have moved.

  • hipBLAS is now optional when compiling with hipcc. A warning instead of an error will be printed if hipBLAS is not found during configuration.

  • Previously HPX_COMPUTE_HOST_CODE was defined in host code only if HPX was configured with CUDA or HIP. In this release HPX_COMPUTE_HOST_CODE is always defined in host code.

  • An experimental HPX_WITH_PRECOMPILED_HEADERS CMake option has been added to use precompiled headers when building HPX. This option should not be used on Windows.

  • Numerous bug fixes.
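
To make the sender-based algorithms listed above more concrete, here is a minimal, self-contained toy model of the sender/receiver idea behind just and transform. It is a sketch under simplifying assumptions and not the HPX implementation: everything lives in a hypothetical toy namespace, connect and start are merged into a single hypothetical submit member, only the value channel is modelled (no error or done channel), and toy::sync_wait is a stand-in used here to run the chain and obtain its result.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>
#include <utility>

namespace toy {    // hypothetical namespace, not part of HPX

// A sender only describes work. Nothing runs until a receiver is submitted.
template <typename T>
struct just_sender
{
    using value_type = T;
    T value;

    template <typename Receiver>
    void submit(Receiver r)
    {
        r.set_value(std::move(value));    // deliver the stored value
    }
};

template <typename T>
just_sender<std::decay_t<T>> just(T&& t)
{
    return {std::forward<T>(t)};
}

// Receiver adapter: applies f to the incoming value and passes the result on.
template <typename Receiver, typename F>
struct transform_receiver
{
    Receiver r;
    F f;

    template <typename T>
    void set_value(T&& t)
    {
        r.set_value(f(std::forward<T>(t)));
    }
};

// Sender adapter: wraps the downstream receiver before submitting it upstream.
template <typename Sender, typename F>
struct transform_sender
{
    using value_type = std::invoke_result_t<F&, typename Sender::value_type>;
    Sender s;
    F f;

    template <typename Receiver>
    void submit(Receiver r)
    {
        s.submit(
            transform_receiver<Receiver, F>{std::move(r), std::move(f)});
    }
};

template <typename Sender, typename F>
transform_sender<Sender, F> transform(Sender s, F f)
{
    return {std::move(s), std::move(f)};
}

// Final receiver: stores the value so that sync_wait can return it.
template <typename T>
struct sync_wait_receiver
{
    std::optional<T>* result;
    void set_value(T t) { result->emplace(std::move(t)); }
};

template <typename Sender>
typename Sender::value_type sync_wait(Sender s)
{
    using T = typename Sender::value_type;
    std::optional<T> result;
    s.submit(sync_wait_receiver<T>{&result});
    assert(result.has_value());    // this toy always completes synchronously
    return std::move(*result);
}

}    // namespace toy

// Builds a lazy chain and runs it: (20 + 1) * 2.
int run_sender_sketch()
{
    auto work = toy::transform(
        toy::transform(toy::just(20), [](int x) { return x + 1; }),
        [](int x) { return x * 2; });
    return toy::sync_wait(std::move(work));
}
```

The point of the design is laziness: building the chain with just and transform runs no user code; the lambdas are only invoked once a receiver is submitted, which here happens inside toy::sync_wait.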

Breaking changes

  • The minimum required CMake version is now 3.17.

  • The minimum required Boost version is now 1.71.0.

  • The customization mechanism used to implement and extend sender functionality and algorithms has been renamed from tag_invoke to tag_dispatch. All customization of sender functionality should be done by overloading tag_dispatch.

  • The following compatibility options have been removed, along with their compatibility implementations:

    • HPX_PROGRAM_OPTIONS_WITH_BOOST_PROGRAM_OPTIONS_COMPATIBILITY

    • HPX_WITH_ACTION_BASE_COMPATIBILITY

    • HPX_WITH_EMBEDDED_THREAD_POOLS_COMPATIBILITY

    • HPX_WITH_POOL_EXECUTOR_COMPATIBILITY

    • HPX_WITH_PROMISE_ALIAS_COMPATIBILITY

    • HPX_WITH_REGISTER_THREAD_COMPATIBILITY

    • HPX_WITH_REGISTER_THREAD_OVERLOADS_COMPATIBILITY

    • HPX_WITH_THREAD_AWARE_TIMER_COMPATIBILITY

    • HPX_WITH_THREAD_EXECUTORS_COMPATIBILITY

    • HPX_WITH_THREAD_POOL_OS_EXECUTOR_COMPATIBILITY

  • The HPX_WITH_THREAD_SCHEDULERS CMake option has been removed. All schedulers are now enabled when possible.

  • HPX_WITH_INIT_START_OVERLOADS_COMPATIBILITY has been turned off by default.
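
The tag_dispatch rename listed above concerns a customization pattern that can be sketched in a few lines. The following is a generic, self-contained illustration of that pattern, assuming nothing about HPX internals: a customization point object forwards to an overload of a free function named tag_dispatch that is found by argument-dependent lookup, passing itself as the tag. The namespaces sketch and user, the describe customization point, and the widget type are all hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>

namespace sketch {    // hypothetical library namespace, not HPX

// The customization point object. Calling it forwards to a tag_dispatch
// overload found by argument-dependent lookup, with the object's own type
// acting as the tag that selects the right overload.
struct describe_t
{
    template <typename T>
    decltype(auto) operator()(T&& t) const
    {
        return tag_dispatch(*this, std::forward<T>(t));
    }
};

inline constexpr describe_t describe{};

}    // namespace sketch

namespace user {    // hypothetical user code

struct widget
{
    int id;
};

// Customization: overload tag_dispatch with the tag type as first parameter.
// It lives next to widget so that argument-dependent lookup can find it.
std::string tag_dispatch(sketch::describe_t, widget const& w)
{
    return "widget#" + std::to_string(w.id);
}

}    // namespace user

// Hypothetical self-check: the library-side call reaches the user overload.
void check_tag_dispatch_sketch()
{
    assert(sketch::describe(user::widget{7}) == "widget#7");
}
```

Code that previously customized sender functionality through overloads named tag_invoke follows the same shape after the rename; only the name of the overloaded function changes.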

Closed issues

  • Issue #5423 - Fix lvalue-ref qualified connect for when_all-sender

  • Issue #5412 - Link error

  • Issue #5397 - Performance regression in thread annotations

  • Issue #5395 - HPX 1.7.0-rc1 fails to build icw APEX + OTF2

  • Issue #5385 - HPX 1.7 crashes on Piz Daint > 64 nodes

  • Issue #5380 - CMake should search for asio package installed on the system

  • Issue #5378 - HPX 1.7.0 stopped building on Fedora

  • Issue #5369 - HPX 1.6 and master hangs on Summit for > 64 nodes

  • Issue #5358 - HPX init fails for single-core environments

  • Issue #5345 - Rename P2220 property CPOs?

  • Issue #5333 - HPX does not compile on the new Mac OSX using the M1 chip

  • Issue #5317 - Consider making hipblas optional

  • Issue #5306 - asio fails to build with CUDA 10.0

  • Issue #5294 - execution::on should be based on execution::schedule

  • Issue #5275 - HPX V1.6.0 fails on Fedora release

  • Issue #5270 - HPX-1.6.0 fails to build on Windows 10

  • Issue #5257 - Allow triggering the output of OS thread affinity from configuration settings

  • Issue #5246 - HPX fails to build on ppc64le

  • Issue #5232 - Annotation using hpx::util::annotated_function not working

  • Issue #5222 - Build and link errors with ittnotify enabled

  • Issue #5204 - Move algorithms to tag_fallback_dispatch

  • Issue #5163 - Remove module-specific compatibility and deprecation options

  • Issue #5161 - Bump required CMake version to 3.17

  • Issue #5143 - Searching for HPX-Application to generate work on multiple Nodes

Closed pull requests

  • PR #5438 - Delete datapar/foreach_tests.hpp

  • PR #5437 - Add back explicit -pthread flags when available

  • PR #5435 - This adds support for systems that assume all types are bitwise serializable by default

  • PR #5434 - Update CUDA polling logging to be more verbose

  • PR #5433 - Fix when_all_sender connect for references

  • PR #5432 - Add deprecation warnings for v1.8

  • PR #5431 - Rename the new P0443/P2300 executor to thread_pool_scheduler

  • PR #5430 - Revert “Adding the missing defined for HPX_HAVE_DEPRECATION_WARNINGS”

  • PR #5427 - Removing unneeded typedef

  • PR #5426 - Adding more concept checks for sender/receiver algorithms

  • PR #5425 - Adding the missing defined for HPX_HAVE_DEPRECATION_WARNINGS

  • PR #5424 - Disable Vc in final docker image created in CI

  • PR #5422 - Adding execution::experimental::bulk algorithm

  • PR #5420 - Update logic to find threading library

  • PR #5418 - Reduce max size and number of files in ccache cache

  • PR #5417 - Final release notes for 1.7.0

  • PR #5416 - Adapt uninitialized_value_construct and uninitialized_value_construct_n to C++ 20

  • PR #5415 - Adapt uninitialized_default_construct and uninitialized_default_construct_n to C++ 20

  • PR #5414 - Improve integration of futures and senders

  • PR #5413 - Fixing sender/receiver code base to compile with MSVC

  • PR #5407 - Handle exceptions thrown during initialization of parcel handler

  • PR #5406 - Simplify dispatching to annotation handlers

  • PR #5405 - Fetch Asio automatically in perftests CI

  • PR #5403 - Create generic executor that adds annotations to any other executor

  • PR #5402 - Adapt uninitialized_fill and uninitialized_fill_n to C++ 20

  • PR #5401 - Modernize a variety of facilities related to parallel algorithms

  • PR #5400 - Fix sliding semaphore test

  • PR #5399 - Rename leftover tag_fallback_invoke to tag_fallback_dispatch

  • PR #5398 - Improve logging in AGAS symbol namespace

  • PR #5396 - Introduce compatibility layer for collective operations

  • PR #5394 - Enable OTF2 in APEX CI configuration

  • PR #5393 - Update APEX tag

  • PR #5392 - Fixing wrong usage of std::forward

  • PR #5391 - Fix forwarding in transform_receiver constructor

  • PR #5390 - Make sure shared priority scheduler steals tasks on the current NUMA domain when (core) stealing is enabled

  • PR #5389 - Adapt uninitialized_move and uninitialized_move_n to C++ 20

  • PR #5388 - Fixing gather_there for used with lvalue reference argument

  • PR #5387 - Extend thread state logging and change default stealing parameters

  • PR #5386 - Attempt to fix the startup hang with nodes > 32

  • PR #5384 - Remove HPX 1.5.0 deprecations

  • PR #5382 - Prefer installed Asio before considering FetchContent

  • PR #5379 - Allow using pre-downloaded (not installed) versions of Asio and/or Apex

  • PR #5376 - Remove unnecessary explicit listing of library modules.rst files in CMakeLists.txt

  • PR #5375 - Slight performance improvement for hpx::copy and hpx::move et.al.

  • PR #5374 - Remove unnecessary moves from future sender implementations

  • PR #5373 - More changes to clang-cuda Jenkins configuration

  • PR #5372 - Slight improvements to min/max/minmax_element algorithms

  • PR #5371 - Adapt uninitialized_copy and uninitialized_copy_n to C++ 20

  • PR #5370 - Decay types in just_sender value_types to match stored types

  • PR #5367 - Disable pkgconfig by default again on macOS

  • PR #5365 - Use ccache for Jenkins builds on Piz Daint

  • PR #5363 - Update cudatoolkit module name in clang-cuda Jenkins configuration

  • PR #5362 - Adding channel_communicator

  • PR #5361 - Fix compilation with MPI enabled

  • PR #5360 - Update APEX and asio tags

  • PR #5359 - Fix check for pu-step in single-core case

  • PR #5357 - Making sure collective operations can be reused by preallocating communicator

  • PR #5356 - Update API documentation

  • PR #5355 - Make the sequenced_executor processing_units_count member function const

  • PR #5354 - Making sure default_stack_size is defined whenever declared

  • PR #5353 - Add CUDA timestamp support to HPX Hardware Clock

  • PR #5352 - Adding missing includes

  • PR #5351 - Adding enable_logging/disable_logging API functions

  • PR #5350 - Adapt lexicographical_compare to C++20

  • PR #5349 - Update minimum boost version needed on the docs

  • PR #5348 - Rename tag_invoke and related facilities to tag_dispatch

  • PR #5347 - Remove make_ prefix for executor properties

  • PR #5346 - Remove and disable compatibility options for 1.7.0

  • PR #5343 - Fix timed_executor static cast conversion

  • PR #5342 - Refactor CUDA event polling

  • PR #5341 - Adding make_with_annotation and get_annotation properties

  • PR #5339 - Making sure hpx::util::hardware::timestamp() is always defined

  • PR #5338 - Fixing timed_executor specializations of customization points

  • PR #5335 - Make partial_algorithm work with any number of arguments

  • PR #5334 - Follow up iter_sent include on #5225

  • PR #5332 - Simplify tag_invoke and friends

  • PR #5331 - More work on cleaning up executor CPOs

  • PR #5330 - Add option to disable pkgconfig generation

  • PR #5328 - Adapt data parallel support using std-simd

  • PR #5327 - Fix missing ifdef HPX_SMT_PAUSE

  • PR #5326 - Adding resize() to serialize_buffer allowing to shrink its size

  • PR #5324 - Add get member functions to async_rw_mutex proxy objects for explicitly getting the wrapped value

  • PR #5323 - Add keep_future algorithm

  • PR #5322 - Replace executor customization point implementations with tag_invoke

  • PR #5321 - Separate segmented algorithms for reduce

  • PR #5320 - Fix is_sender trait and other small fixes to p0443 traits

  • PR #5319 - gcc 11.1 c++20 build fixes

  • PR #5318 - Make hipblas dependency optional as not always available

  • PR #5316 - Attempt to fix checking for libatomic

  • PR #5315 - Add explicit keyword to fixture constructor

  • PR #5314 - Fix a race condition in async mpi affecting limiting executor

  • PR #5312 - Use local runtime and local headers in local-only modules and tests

  • PR #5311 - Add GCC 11 builder to jenkins

  • PR #5310 - Adding hpx::execution::experimental::task_group

  • PR #5309 - Separate datapar

  • PR #5308 - Separate segmented algorithms for find, find_if, find_if_not

  • PR #5307 - Separate segmented algorithms for fill and generate

  • PR #5304 - Fix compilation of sender CPOs with nvcc

  • PR #5300 - Remove PRIVATE flag that was propagated into the LANGUAGES

  • PR #5298 - Separate datapar

  • PR #5297 - Specify exact cmake and ninja versions when loading them in jenkins jobs

  • PR #5295 - Update clang-newest configuration to use clang 12 and Boost 1.76.0

  • PR #5293 - Fix Clang 11 cuda_future test bug

  • PR #5292 - Add async_rw_mutex based on senders

  • PR #5291 - “Fix” termination detection

  • PR #5290 - Fixed source file line statements in examples documentation

  • PR #5289 - Allow splitting of futures holding std::tuple

  • PR #5288 - Move algorithms to tag_fallback_invoke

  • PR #5287 - Move algorithms to tag_fallback_invoke

  • PR #5285 - Fix clang-format failure on master

  • PR #5284 - Replacing util::function_nonser on std::function in hpx_init

  • PR #5282 - Update Boost for daint 20.11 after update

  • PR #5281 - Fix Segmentation fault on foreach_datapar_zipiter

  • PR #5280 - Avoid modulo by zero in counting_iterator test

  • PR #5279 - Fix more GCC 10 deprecation warnings

  • PR #5277 - Small fixes and improvements to CUDA/MPI polling

  • PR #5276 - Fix typo in docs

  • PR #5274 - More P1897 algorithms

  • PR #5273 - Retry CDash submissions on failure

  • PR #5272 - Fix bogus deprecation warnings with GCC 10

  • PR #5271 - Correcting target ids for symbol_namespace::iterate

  • PR #5268 - Adding generic require, require_concept, and query properties

  • PR #5267 - Support annotations in hpx::transform_reduce

  • PR #5266 - Making late command line options available for local runtime

  • PR #5265 - Leverage no_unique_address for member_pack

  • PR #5264 - Adopt format in more places

  • PR #5262 - Install HPX in Rostam Jenkins jobs

  • PR #5261 - Limit Rostam Jenkins jobs to marvin partition temporarily

  • PR #5260 - Separate segmented algorithms for transform_reduce

  • PR #5259 - Making sure late command line options are recognized as configuration options

  • PR #5258 - Allow for HPX algorithms being invoked with std execution policies

  • PR #5256 - Separate segmented algorithms for transform

  • PR #5255 - Future/sender adapters

  • PR #5254 - Fixing datapar

  • PR #5253 - Add utility to format ranges

  • PR #5252 - Remove uses of Boost.Bimap

  • PR #5251 - Banish <iostream> from library headers

  • PR #5250 - Try fixing vc circle ci

  • PR #5249 - Adding missing header

  • PR #5248 - Use old Piz Daint modules after upgrade

  • PR #5247 - Significantly speedup simple for_each, for_loop, and transform

  • PR #5245 - P1897 operator| overloads

  • PR #5244 - P1897 when_all

  • PR #5243 - Make sure HPX_DEBUG is set based on HPX’s build type, not consuming project’s build type

  • PR #5242 - Moving last files unrelated to parcel layer to modules

  • PR #5240 - change namespace for transform_loop.hpp

  • PR #5238 - Make sure annotations are used in the binary transform

  • PR #5237 - Add P1897 just, just_on, and on algorithms

  • PR #5236 - Add an example demonstrating the use of the invoke_function_action facility

  • PR #5235 - Attempting to fix datapar compilation issues

  • PR #5234 - Fix small typo in --hpx:local option description

  • PR #5233 - Only find Boost.Iostreams if required for plugins

  • PR #5231 - Sort printed config options

  • PR #5230 - Fix C++20 replace algo adaptation misses

  • PR #5229 - Remove leftover Boost include from sync_wait.hpp

  • PR #5228 - Print module name only if it has custom configuration settings

  • PR #5227 - Update .codespell_whitelist

  • PR #5226 - Use new docker image in all CircleCI steps

  • PR #5225 - Adapt reverse to C++20

  • PR #5224 - Separate segmented algorithms for none_of, any_of and all_of

  • PR #5223 - Fixing build system for ittnotify

  • PR #5221 - Moving LCO related files to modules

  • PR #5220 - Separate segmented algorithms for count and count_if

  • PR #5218 - Separate segmented algorithms for adjacent_find

  • PR #5217 - Add a HIP github action

  • PR #5215 - Update ROCm to 4.0.1 on Rostam

  • PR #5214 - Fix clang-format error in sender.hpp

  • PR #5213 - Removing ESSENTIAL option to the doc example

  • PR #5212 - Separate segmented algorithms for for_each_n

  • PR #5211 - Minor adapted algos fixes

  • PR #5210 - Fixing is_invocable deprecation warnings

  • PR #5209 - Moving more files into modules (actions, components, init_runtime, etc.)

  • PR #5208 - Add examples and explanation on when tag_fallback/priority are useful

  • PR #5207 - Always define HPX_COMPUTE_HOST_CODE for host code

  • PR #5206 - Add formatting exceptions for libhpx to create_module_skeleton.py

  • PR #5205 - Moving all distribution policies into modules

  • PR #5203 - Move copy algorithms to tag_fallback_invoke

  • PR #5202 - Make HPX_WITH_PSEUDO_DEPENDENCIES a cache variable

  • PR #5201 - Replaced tag_invoke with tag_fallback_invoke for adjacent_find algorithm

  • PR #5200 - Moving files to (distributed) runtime module

  • PR #5199 - Update ICC module name on Piz Daint Jenkins configuration

  • PR #5198 - Add doxygen documentation for thread_schedule_hint

  • PR #5197 - Attempt to fix compilation of context implementations with unity build enabled

  • PR #5196 - Re-enable component tests

  • PR #5195 - Moving files related to colocation logic

  • PR #5194 - Another attempt at fixing the Fedora 35 problem

  • PR #5193 - Components module

  • PR #5192 - Adapt replace(_if) to C++20

  • PR #5190 - Set compatibility headers by default to on

  • PR #5188 - Bump Boost minimum version to 1.71.0

  • PR #5187 - Force CMake to set the -std=c++XX flag

  • PR #5186 - Remove message to print .cu extension whenever .cu files are encountered

  • PR #5185 - Remove some minor unnecessary CMake options

  • PR #5184 - Remove some leftover HPX_WITH_*_SCHEDULER uses

  • PR #5183 - Remove dependency on boost/iterators/iterator_categories.hpp

  • PR #5182 - Fixing Fedora 35 for Power architectures

  • PR #5181 - Bump version number and tag post 1.6.0 release

  • PR #5180 - Fix htts_v2 tests linking

  • PR #5179 - Make sure --hpx:local command line option is respected with networking is off but distributed runtime is on

  • PR #5177 - Remove module cmake options

  • PR #5176 - Starting to separate segmented algorithms: for_each

  • PR #5174 - Don’t run segmented algorithms twice on CircleCI

  • PR #5173 - Fetching APEX using cmake FetchContent

  • PR #5172 - Add separate local-only entry point

  • PR #5171 - Remove HPX_WITH_THREAD_SCHEDULERS CMake option

  • PR #5170 - Add HPX_WITH_PRECOMPILED_HEADERS option

  • PR #5166 - Moving some action tests to modules

  • PR #5165 - Require cmake 3.17

  • PR #5164 - Move thread_pool_suspension_helper files to small utility module

  • PR #5160 - Adding checks ensuring modules are not cross-referenced from other module categories

  • PR #5158 - Replace boost::asio with standalone asio

  • PR #5155 - Allow logging when distributed runtime is off

  • PR #5153 - Components module

  • PR #5152 - Move more files to performance counter module

  • PR #5150 - Adapt remove_copy(_if) to C++20

  • PR #5144 - AGAS module

  • PR #5125 - Adapt remove and remove_if to C++20

  • PR #5117 - Attempt to fix segfaults assumed to be caused by future_data instances going out of scope.

  • PR #5099 - Allow mixing debug and release builds

  • PR #5092 - Replace spirit.qi with x3

  • PR #5053 - Add P0443r14 executor and a few P1897 algorithms

  • PR #5044 - Add performance test in jenkins and reports