HPX V1.10.0 (May 29, 2024)

General changes

  • The HPX documentation has seen a major overhaul for this release. We finished documenting the public local HPX API and added migration guides from widely used parallelization platforms (OpenMP, TBB, and MPI) to HPX.

  • We have added facilities enabling optimizations for trivially-relocatable types (see P1144 for more details).

  • We have added (and use) the scope_xxx helper facilities as specified by the C++ library fundamentals TS v3 (see: N4948).

  • We have added configuration options that allow building HPX without pre-installing any prerequisites. Use HPX_WITH_FETCH_HWLOC=On to have Portable Hardware Locality (HWLOC) installed for you. Similarly, setting HPX_WITH_FETCH_BOOST=On at configuration time will install the necessary Boost libraries (currently V1.84.0).

  • We have performed a lot of code cleanup and refactoring to improve the overall code quality and decrease compile times.

  • The collective operations APIs have been unified, and we have fixed issues and performance problems in the collectives.

  • The HPX executors have been streamlined and made more consistent. We have applied many performance improvements to the executor implementations that directly improve the performance of our parallel algorithms.

  • We have added a new parcelport that allows using GASNet as a communication platform.

  • We have added optimizations to various parcelports that improve overall communication performance. These include, among other things, send-immediate optimizations and receiver-side zero-copy optimizations.

  • Futures can now execute the associated task eagerly and inline on any wait operation if the task has not started running yet. This feature is enabled with the HPX_COROUTINES_WITH_THREAD_SCHEDULE_HINT_RUNS_AS_CHILD=On configuration setting (which is Off by default).

  • JSON files can now be used to supply configuration information through the command line. This feature can be enabled with the configuration option HPX_COMMAND_LINE_HANDLING_WITH_JSON_CONFIGURATION_FILES=On. This functionality depends on an external JSON library, which can be built at configuration time by supplying HPX_WITH_FETCH_JSON=On to CMake.

  • We have applied many fixes to our CUDA, ROCm, and SYCL build environments.

Breaking changes

  • The CMake configuration keys SOMELIB_ROOT (e.g., BOOST_ROOT) have been renamed to Somelib_ROOT (e.g., Boost_ROOT) to avoid warnings when using newer versions of CMake. Please update your scripts accordingly. For now, the old variable names are re-assigned to the new names and unset in the CMake cache.

Closed issues

  • Issue #6466 - No access limitations to Wiki

  • Issue #6461 - handle_received_parcels may never return

  • Issue #6459 - Building HPX

  • Issue #6451 - HPX hangs at the very end

  • Issue #6446 - Issue on page /manual/getting_hpx.html

  • Issue #6443 - PR #6435 (parcel_layer_tweaks) broke Octo-Tiger

  • Issue #6440 - HPX does not compile with MSVC of Visual Studio 2022 17.9+

  • Issue #6437 - HPX 1.9.1 does not compile on Fedora with ‘#pragma message: [Parallel STL message]: “Vectorized algorithm unimplemented, redirected to serial

  • Issue #6419 - Enhancement of the macro functionalities within hpx

  • Issue #6417 - The current HPX master branch is still not compatible with Kokkos 4.0.1

  • Issue #6414 - Current HPX master causes segfaults within Octo-Tiger

  • Issue #6412 - Clangd (Language Server) throws error for __integer_pack at pack.hpp

  • Issue #6407 - Cannot build Kokkos 4.0.01 with current HPX master

  • Issue #6405 - Spack Build Error with ROCm 5.7.0

  • Issue #6398 - HPX sets affinity wrong with multiple processes per node and LCI parcelport enabled

  • Issue #6392 - [Feature] Install dependencies using CMake

  • Issue #6388 - HPX error: “Host not found” when running on Expanse with 128 nodes

  • Issue #6366 - serialize_buffer allocator support needs adjustments

  • Issue #6361 - HPX 1.9.1 does not compile on Fedora 40

  • Issue #6355 - Single page documentation is broken

  • Issue #6334 - Segmentation fault after adding a padding in one_size_heap_list

  • Issue #6329 - Log hpx threads on forced shutdown

  • Issue #6316 - Build breaks on FreeBSD

  • Issue #6299 - HPX does not use distributed localities on Fugaku

  • Issue #6298 - Update config for coroutines on ARM

  • Issue #6291 - Zero-copy receive optimization disabled the invocation of direct actions

  • Issue #6261 - Add optional reading of json files for command line options

  • Issue #6087 - Support for vcpkg on Linux is broken

  • Issue #5921 - hpx::info claims that async_mpi was not built, while cmake assures its existence

  • Issue #5893 - Tests fail on FreeBSD: Executable copyn_test does not exist

  • Issue #5833 - barrier lockup

  • Issue #5799 - Investigate CUDA compilation problems

  • Issue #5340 - Examples do not run on Mac OSX using the M1 chip

Closed pull requests

  • PR #6493 - Fix distributed latch documentation

  • PR #6492 - Fix kokkos hpx nvcc compilation

  • PR #6491 - More fixes to handling bool arguments for collective operations

  • PR #6490 - Remove the default max cpu count

  • PR #6489 - Ensure TCP parcelport is deactivated if not needed

  • PR #6488 - Fixing handling of bool value type for collective operations

  • PR #6485 - Destructive interference size

  • PR #6484 - Improve performance counter error handling

  • PR #6482 - Generalize the notion of bitwise serialization

  • PR #6481 - Fixing use of HPX_WITH_CXX_STANDARD

  • PR #6480 - Remove equal_to from hpx::any

  • PR #6479 - Remove optimizations for certain built-in compiler intrinsics

  • PR #6478 - Fixing issues on MacOS

  • PR #6477 - lci pp: lci’s github repo name changed from LC to lci

  • PR #6476 - Fixing binary filter test target names

  • PR #6475 - Fix mac os github actions

  • PR #6472 - Troubleshoot CI hangs

  • PR #6469 - improve(lci pp): more options to control the LCI parcelport

  • PR #6467 - Bump jwlawson/actions-setup-cmake from 1.14 to 2.0

  • PR #6464 - Update docs of “Writing distributed applications” page

  • PR #6463 - Revert “Always return outermost thread id”

  • PR #6458 - Reduce test workload to fix CI/CD time-out

  • PR #6457 - replace boost::array with std::array and update file name

  • PR #6456 - Move APEX CI to rostam

  • PR #6455 - Fixing compilation if HPX_HAVE_THREAD_QUEUE_WAITTIME is defined

  • PR #6454 - Update perftests reference measurements

  • PR #6453 - Update supported platforms of Manual/Prerequisites page

  • PR #6452 - Fix nvcc crashes in transform_stream.cu and synchronize.cu

  • PR #6450 - Fix git tag name in Getting HPX page

  • PR #6449 - LCI parcelport: add yield to potentially infinite retry loop

  • PR #6447 - Use compressed ptr in schedulers when 128 atomics are not lockfree

  • PR #6445 - Fix agas addressing cache

  • PR #6444 - Update CTestConfig.cmake

  • PR #6442 - Update CMakeLists.txt

  • PR #6441 - Minor documentation fixes

  • PR #6439 - Optimizing use of certain #includes

  • PR #6438 - Bump jwlawson/actions-setup-cmake from 1.14 to 2.0

  • PR #6436 - Update docs

  • PR #6435 - Parcel layer tweaks

  • PR #6434 - improve termination detection: removing lock from critical path

  • PR #6433 - Use shared mutex for resolve_locality procedure

  • PR #6432 - Module cleanup up to level 30

  • PR #6429 - Making sure HPX_WITH_ASYNC_MPI is reported properly

  • PR #6427 - Modifying CMakeLists to copy libhwloc-15.dll to the binary folder in Windows, independently

  • PR #6425 - Fix macOS failing test

  • PR #6424 - Adding option for downloading Boost using CMake FetchContent

  • PR #6423 - Move adjacent_difference to numeric header file

  • PR #6422 - Adding steal-half functionalities to work-requesting scheduler

  • PR #6421 - Bump actions/checkout from 2 to 4

  • PR #6418 - Working around nvcc problems to use CTAD

  • PR #6416 - Change run_as_os_thread deprecation forwarding due to hipcc compilation issue

  • PR #6415 - Attempting to avoid segfault in OctoTiger during initialization

  • PR #6413 - Always return outermost thread id

  • PR #6411 - Minor refactoring and fixes to the LCI parcelport and pingpong_performance2 benchmark

  • PR #6410 - Adding scope_xxx from library fundamentals TS v3

  • PR #6409 - Working around CUDA issue

  • PR #6408 - Tightening up collective operation semantics

  • PR #6406 - Working around ROCm compiler issue

  • PR #6404 - Allow to disable use of [[no_unique_address]] attribute

  • PR #6403 - Fixing copyright year

  • PR #6402 - fix(lci pp): fix deadlocks with too many failed sends

  • PR #6401 - fix(lci pp): fix the null_thread_id bug in the LCI parcelport

  • PR #6400 - Fix the affinity setting bug when using LCI pp and multiple localities per node

  • PR #6397 - Change API header titles and info

  • PR #6396 - Making is_bitwise_serializable SFINAE-friendly

  • PR #6395 - Adapt amount of collective testing

  • PR #6394 - Adding option for installing Hwloc using CMake FetchContent

  • PR #6393 - Optionally disable caching allocator

  • PR #6391 - Cleaning up collective operations

  • PR #6390 - Making function local constexpr variables non-static

  • PR #6389 - Disable resolving hostnames if TCP is disabled

  • PR #6387 - Need to break out of the loop when searching the suffixes.

  • PR #6384 - Fixing allocation/deallocation mismatch in serialize_buffer

  • PR #6383 - Enable fork_join_executor to handle return values from scheduled functions

  • PR #6381 - Consistently treat conflicting parameters provided by executors and parameter objects

  • PR #6380 - Fixing setting an annotation for an execution policy

  • PR #6378 - Allowing to disable signal handlers

  • PR #6377 - Fix gasnet-related test failures

  • PR #6375 - Update LSU Jenkins with 2023-10 libraries

  • PR #6374 - Investigate builder gasnet failure

  • PR #6373 - Fixing communicator API, adding docs

  • PR #6372 - Fix resource partitioner tests for small thread count

  • PR #6371 - Fix jacobi omp examples.

  • PR #6370 - improve one_size_heap_list: use rwlock to speedup the allocation/free

  • PR #6369 - working issue with MPI_CC / CC conflict in automake

  • PR #6368 - Making sure serialize_buffer properly destroys buffer, if needed.

  • PR #6367 - Fix parallel relocation test

  • PR #6364 - Relocation variants

  • PR #6363 - Update the lci parcelport to use LCI v1.7.6

  • PR #6362 - Fixing compilation problems on 32 Linux systems

  • PR #6360 - Fix broken links in docs: PDF, Single HTML page, Dependency report

  • PR #6359 - Fix header file links in Public API page

  • PR #6358 - Fix CMake find_library for HWLOC

  • PR #6357 - Replace Custom Benchmarking Code with Nanobench

  • PR #6356 - Fixed matrix multiplication example output

  • PR #6354 - Fix broken links for header files in Public API page

  • PR #6353 - Enable using std::reference_wrapper with executor parameters

  • PR #6352 - Add Public distributed API documentation

  • PR #6350 - Make coverage work with Jenkins Github Branch Source plugin

  • PR #6349 - Moving hpx::threads::run_as_xxx to namespace hpx

  • PR #6348 - Adding --exclusive to launching tests on rostam

  • PR #6346 - changed chat link to discord

  • PR #6344 - uninitialized_relocate w/ type_support primitive

  • PR #6343 - Bump actions/checkout from 3 to 4

  • PR #6342 - Fix HPX-APEX cmake integration

  • PR #6341 - Fix shared_future_continuation_order regression test

  • PR #6340 - Log alive hpx threads on exit

  • PR #6339 - Add coverage testing on Jenkins

  • PR #6338 - Fixing HPX_CURRENT_SOURCE_LOCATION when std::source_location exists

  • PR #6337 - Remove aurianer, biddisco, and msimberg from codeowners

  • PR #6336 - More cleaning up for module levels 19-20

  • PR #6335 - Finalize the MPI docs of the Migration Guide

  • PR #6332 - More fixes for CMake V3.27

  • PR #6330 - Adding basic logging to collective operations

  • PR #6328 - Cleanup previous patch adapting to CMake V3.27

  • PR #6327 - Modernize modules in level 17 and 18

  • PR #6324 - P1144 Relocation primitives

  • PR #6321 - Ensure hpx_main is a proper thread_function

  • PR #6320 - Fixing cyclic dependencies in naming and agas modules

  • PR #6319 - Generate git tag if needed but it is not available

  • PR #6317 - Fixing linker problem on FreeBSD

  • PR #6315 - acknowledge triv-rel and nothrow-rel types

  • PR #6314 - Relocation algorithms Clean

  • PR #6313 - Trivial relocation of c-v-ref-array types

  • PR #6312 - Fixing warning/error

  • PR #6311 - Adding executor parallel invoke CPOs

  • PR #6310 - Define HPX_COMPUTE_CODE in builds with SYCL

  • PR #6309 - Making sure changed number of cores is propagated to executor

  • PR #6308 - openshmem-parcelport initial import

  • PR #6306 - The hpxcxx script was broken such that it could only compile for _release

  • PR #6305 - Adapting build system for CMake V3.27

  • PR #6304 - Fixing an integral type mismatch warning

  • PR #6303 - omp for default vectorization

  • PR #6301 - Add MPI migration guide

  • PR #6294 - Add internal reference counting to semaphores

  • PR #6286 - Simd helpers

  • PR #6280 - Add TBB to HPX documentation in Migration Guide

  • PR #6276 - Add dependabot.yml

  • PR #6275 - Revert “Move dependabot.yml into correct directory”

  • PR #6272 - set thread name for linux

  • PR #6271 - Uninitialised algorithms, move using std::memcpy

  • PR #6270 - Bump jwlawson/actions-setup-cmake from 1.9 to 1.14

  • PR #6269 - Bump actions/checkout from 2 to 3

  • PR #6268 - Move dependabot.yml into correct directory

  • PR #6265 - Create dependabot.yml

  • PR #6264 - hpx::is_trivially_relocatable trait implementation

  • PR #6263 - Adding support for reading json configuration files for command line options

  • PR #6249 - Implement the send immediate optimization for the MPI parcelport.

  • PR #6237 - Improve compilation performance

  • PR #6234 - Adding release notes page for next release

  • PR #6233 - Moving is_relocatable to namespace hpx

  • PR #6230 - gasnet based parcelport

  • PR #6226 - Re-enable dependency on segmented algorithms on CircleCI

  • PR #6220 - Add execution on

  • PR #6212 - Initial trait definition for relocatable

  • PR #6199 - added support for unseq, par_unseq for hpx::make_heap algorithm

  • PR #6173 - C++ modules

  • PR #6122 - Add Module support

  • PR #6099 - Futures attempt to execute threads directly if those have not started executing

  • PR #6050 - Investigating partitioned_vector problems

  • PR #5988 - Adding CI configuration for DGX-A100 at LSU

  • PR #5910 - Improve MPI initialization

  • PR #5845 - Adding local work requesting scheduler that is based on message passing internally