HPX V1.0.0 (Apr 24, 2017)
General changes
Here are some of the main highlights and changes for this release (in no particular order):
- Added the facility hpx::split_future which makes it possible to convert a future<tuple<Ts...>> into a tuple<future<Ts>...>. This functionality is not available when compiling HPX with VS2012.
- Added a new type of performance counter which can return a list of values for each invocation. We also added a first counter of this type which collects a histogram of the times between parcels being created.
- Added new LCOs: hpx::lcos::channel and hpx::lcos::local::channel, which are very similar to the well-known channel constructs used in the Go language.
- Added new performance counters reporting the amount of data handled by the networking layer on an action-by-action basis (please see PR #2289 for more details).
- Added a new facility hpx::lcos::barrier, replacing the equally named older one. The new facility has a slightly changed API and is much more efficient. Most notably, the new facility exposes a (global) function hpx::lcos::barrier::synchronize() which represents a global barrier across all localities.
- We have started to add support for vectorization to our parallel algorithm implementations. This support depends on using an external library, currently either the Vc library or Boost.SIMD. Please see Issue #2333 for a list of currently supported algorithms. This is an experimental feature and its implementation and/or API might change in the future. Please see this blog-post for more information.
- The parameter sequence for the hpx::parallel::transform_reduce overload taking one iterator range has changed to match the changes this algorithm has undergone while being moved to C++17. The old overload can still be enabled at configure time by specifying -DHPX_WITH_TRANSFORM_REDUCE_COMPATIBILITY=On to CMake.
- The algorithm hpx::parallel::inner_product has been renamed to hpx::parallel::transform_reduce to match the changes this algorithm has undergone while being moved to C++17. The old inner_product names can still be enabled at configure time by specifying -DHPX_WITH_TRANSFORM_REDUCE_COMPATIBILITY=On to CMake.
- Added versions of hpx::get_ptr taking client side representations for component instances as their parameter (instead of a global id).
- Added the helper utility hpx::performance_counters::performance_counter_set, which helps to encapsulate a set of performance counters to be managed concurrently.
- All execution policies and related classes have been renamed to be consistent with the naming changes applied for C++17. All policies now live in the namespace hpx::parallel::execution. The old names can still be enabled at configure time by specifying -DHPX_WITH_EXECUTION_POLICY_COMPATIBILITY=On to CMake.
- The thread scheduling subsystem has undergone a major refactoring which results in significant performance improvements. We have also improved the performance of creating hpx::future and of various facilities handling those.
- We have consolidated all of the code in HPX.Compute related to the integration of CUDA. hpx::partitioned_vector has been enabled to be usable with hpx::compute::vector, which makes it possible to place the partitions on one or more GPU devices.
- Added new performance counters exposing various internals of the thread scheduling subsystem, such as the current idle- and busy-loop counters and instantaneous scheduler utilization.
- Extended and improved the use of the ITTNotify hooks, which allow performance counter data and function annotation information to be collected from within the Intel Amplifier tool.
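To make the conversion performed by hpx::split_future (first item above) more concrete, the following is a minimal sketch that mimics it with standard-library futures only. It is not HPX code and makes no claim about how HPX implements the facility: the names split_future_sketch, split_impl, element_future and make_ready_pair are hypothetical, the element types are assumed to be copyable, and each resulting future is evaluated lazily when it is first queried.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <tuple>
#include <utility>

// Hypothetical helper: a deferred future yielding element I of the shared tuple.
template <std::size_t I, typename Tuple>
std::future<std::tuple_element_t<I, Tuple>>
element_future(std::shared_future<Tuple> shared)
{
    // The lambda runs on the first get()/wait() and copies element I out.
    return std::async(std::launch::deferred,
        [shared]() -> std::tuple_element_t<I, Tuple> {
            return std::get<I>(shared.get());
        });
}

// Hypothetical helper: builds one element future per tuple index.
template <typename... Ts, std::size_t... Is>
std::tuple<std::future<Ts>...>
split_impl(std::shared_future<std::tuple<Ts...>> shared,
    std::index_sequence<Is...>)
{
    return std::tuple<std::future<Ts>...>(element_future<Is>(shared)...);
}

// Sketch of the conversion future<tuple<Ts...>> -> tuple<future<Ts>...>.
template <typename... Ts>
std::tuple<std::future<Ts>...>
split_future_sketch(std::future<std::tuple<Ts...>> f)
{
    assert(f.valid());    // the input future must refer to a shared state
    return split_impl(f.share(), std::index_sequence_for<Ts...>{});
}

// Example input (hypothetical): an already fulfilled future holding a tuple.
inline std::future<std::tuple<int, double>> make_ready_pair(int a, double b)
{
    std::promise<std::tuple<int, double>> p;
    p.set_value(std::make_tuple(a, b));
    return p.get_future();
}
```

Calling split_future_sketch(make_ready_pair(7, 2.5)) yields a tuple whose elements can be queried independently, for example with std::get<0>(parts).get(). The point of the real facility is the same: consumers interested in only one element no longer have to handle the whole tuple.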
Breaking changes
- We have dropped support for the gcc compiler versions V4.6 and 4.7. The minimal gcc version we now test on is gcc V4.8.
- We have removed (default) support for boost::chrono in interfaces; uses of it have been replaced with std::chrono. This facility can still be enabled at configure time by specifying -DHPX_WITH_BOOST_CHRONO_COMPATIBILITY=On to CMake.
- The parameter sequence for the hpx::parallel::transform_reduce overload taking one iterator range has changed to match the changes this algorithm has undergone while being moved to C++17.
- The algorithm hpx::parallel::inner_product has been renamed to hpx::parallel::transform_reduce to match the changes this algorithm has undergone while being moved to C++17.
- The build options HPX_WITH_COLOCATED_BACKWARDS_COMPATIBILITY and HPX_WITH_COMPONENT_GET_GID_COMPATIBILITY are now disabled by default. Please change any code that still depends on the deprecated interfaces.
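The C++17 shape that the two transform_reduce items above refer to can be illustrated with the standard-library algorithm itself. This is a sketch using std::transform_reduce from <numeric>, not the HPX versions; sum_of_squares and dot are hypothetical names chosen for the example. The assumption made here is that the HPX overloads mirror this argument order, with an execution policy passed as an additional first argument.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// One-range overload: the C++17 order is (first, last, init, reduce, transform).
int sum_of_squares(const std::vector<int>& v)
{
    return std::transform_reduce(v.begin(), v.end(), 0, std::plus<>{},
        [](int x) { return x * x; });
}

// Two-range overload: this is the operation formerly spelled inner_product.
// By default it multiplies element pairs and adds the products to init.
int dot(const std::vector<int>& a, const std::vector<int>& b)
{
    assert(a.size() == b.size());    // both ranges must have equal length
    return std::transform_reduce(a.begin(), a.end(), b.begin(), 0);
}
```

When porting code, the one-range call is where argument order matters most: the initial value and the reduction come before the transformation. The two-range call mostly needs the new name.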
Bug fixes (closed tickets)
Here is a list of the important tickets we closed for this release.
- PR #2596 - Adding apex data
- PR #2595 - Remove obsolete file
- Issue #2594 - FindOpenCL.cmake mismatch with the official cmake module
- PR #2592 - First attempt to introduce spmd_block in hpx
- Issue #2591 - Feature request: continuation (then) which does not require the callable object to take a future<R> as parameter
- PR #2588 - Daint fixes
- PR #2587 - Fixing transfer_(continuation)_action::schedule
- PR #2585 - Work around MSVC having an ICE when compiling with -Ob2
- PR #2583 - changing 7zip command to 7za in roll_release.sh
- PR #2582 - First attempt to introduce spmd_block in hpx
- PR #2581 - Enable annotated function for parallel algorithms
- PR #2580 - First attempt to introduce spmd_block in hpx
- PR #2579 - Make thread NICE level setting an option
- PR #2578 - Implementing enqueue instead of busy wait when no sender is available
- PR #2577 - Retrieve -std=c++11 consistent nvcc flag
- PR #2576 - Add missing dependencies of cuda based tests
- PR #2575 - Remove warnings due to some captured variables
- PR #2573 - Attempt to resolve resolve_locality
- PR #2572 - Adding APEX hooks to background thread
- PR #2571 - Pick up hpx.ignore_batch_env from config map
- PR #2570 - Add commandline options --hpx:print-counters-locally
- PR #2569 - Fix computeapi unit tests
- PR #2567 - This adds another barrier::synchronize before registering performance counters
- PR #2564 - Cray static toolchain support
- PR #2563 - Fixed unhandled exception during startup
- PR #2562 - Remove partitioned_vector.cu from build tree when nvcc is used
- Issue #2561 - octo-tiger crash with commit 6e921495ff6c26f125d62629cbaad0525f14f7ab
- PR #2560 - Prevent -Wundef warnings on Vc version checks
- PR #2559 - Allowing CUDA callback to set the future directly from an OS thread
- PR #2558 - Remove warnings due to float precisions
- PR #2557 - Removing bogus handling of compile flags for CUDA
- PR #2556 - Fixing scan partitioner
- PR #2554 - Add more diagnostics to error thrown from find_appropriate_destination
- Issue #2555 - No valid parcelport configured
- PR #2553 - Add cmake cuda_arch option
- PR #2552 - Remove incomplete datapar bindings to libflatarray
- PR #2551 - Rename hwloc_topology to hwloc_topology_info
- PR #2550 - Apex api updates
- PR #2549 - Pre-include defines.hpp to get the macro HPX_HAVE_CUDA value
- PR #2548 - Fixing issue with disconnect
- PR #2546 - Some fixes around cuda clang partitioned_vector example
- PR #2545 - Fix uses of the Vc2 datapar flags; the value, not the type, should be passed to functions
- PR #2542 - Make HPX_WITH_MALLOC easier to use
- PR #2541 - avoid recompiles when enabling/disabling examples
- PR #2540 - Fixing usage of target_link_libraries()
- PR #2539 - fix RPATH behaviour
- Issue #2538 - HPX_WITH_CUDA corrupts compilation flags
- PR #2537 - Add output of a Bazel Skylark extension for paths and compile options
- PR #2536 - Add counter exposing total available memory to Windows as well
- PR #2535 - Remove obsolete support for security
- Issue #2534 - Remove command line option --hpx:run-agas-server
- PR #2533 - Pre-cache locality endpoints during bootstrap
- PR #2532 - Fixing handling of GIDs during serialization preprocessing
- PR #2531 - Amend uses of the term “functor”
- PR #2529 - added counter for reading available memory
- PR #2527 - Facilities to create actions from lambdas
- PR #2526 - Updated docs: HPX_WITH_EXAMPLES
- PR #2525 - Remove warnings related to unused captured variables
- Issue #2524 - CMAKE failed because it is missing: TCMALLOC_LIBRARY TCMALLOC_INCLUDE_DIR
- PR #2523 - Fixing compose_cb stack overflow
- PR #2522 - Instead of unlocking, ignore the lock while creating the message handler
- PR #2521 - Create LPROGRESS_ logging macro to simplify progress tracking and timings
- PR #2520 - Intel 17 support
- PR #2519 - Fix components example
- PR #2518 - Fixing parcel scheduling
- Issue #2517 - Race condition during Parcel Coalescing Handler creation
- Issue #2516 - HPX locks up when using at least 256 localities
- Issue #2515 - error: Install cannot find “/lib/hpx/libparcel_coalescing.so.0.9.99” but I can see that file
- PR #2514 - Making sure that all continuations of a shared_future are invoked in order
- PR #2513 - Fixing locks held during suspension
- PR #2512 - MPI Parcelport improvements and fixes related to the background work changes
- PR #2511 - Fixing bit-wise (zero-copy) serialization
- Issue #2509 - Linking errors in hwloc_topology
- PR #2508 - Added documentation for debugging with core files
- PR #2506 - Fixing background work invocations
- PR #2505 - Fix tuple serialization
- Issue #2504 - Ensure continuations are called in the order they have been attached
- PR #2503 - Adding serialization support for Vc v2 (datapar)
- PR #2502 - Resolve various, minor compiler warnings
- PR #2501 - Some other fixes around cuda examples
- Issue #2500 - nvcc / cuda clang issue due to a missing -DHPX_WITH_CUDA flag
- PR #2499 - Adding support for std::array to wait_all and friends
- PR #2498 - Execute background work as HPX thread
- PR #2497 - Fixing configuration options for spinlock-deadlock detection
- PR #2496 - Accounting for different compilers in CrayKNL toolchain file
- PR #2494 - Adding component base class which ties a component instance to a given executor
- PR #2493 - Enable controlling amount of pending threads which must be available to allow thread stealing
- PR #2492 - Adding new command line option --hpx:print-counter-reset
- PR #2491 - Resolve ambiguities when compiling with APEX
- PR #2490 - Resuming threads waiting on future with higher priority
- Issue #2489 - nvcc issue because -std=c++11 appears twice
- PR #2488 - Adding performance counters exposing the internal idle and busy-loop counters
- PR #2487 - Allowing for plain suspend to reschedule thread right away
- PR #2486 - Only flag HPX code for CUDA if HPX_WITH_CUDA is set
- PR #2485 - Making thread-queue parameters runtime-configurable
- PR #2484 - Added atomic counter for parcel-destinations
- PR #2483 - Added priority-queue lifo scheduler
- PR #2482 - Changing scheduler to steal only if more than a minimal number of tasks are available
- PR #2481 - Extending command line option --hpx:print-counter-destination to support value ‘none’
- PR #2479 - Added option to disable signal handler
- PR #2478 - Making sure the sine performance counter module gets loaded only for the corresponding example
- Issue #2477 - Breaking at a throw statement
- PR #2476 - Annotated function
- PR #2475 - Ensure that using %osthread% during logging will not throw for non-hpx threads
- PR #2474 - Remove now superficial non_direct actions from base_lco and friends
- PR #2473 - Refining support for ITTNotify
- PR #2472 - Some fixes around hpx compute
- Issue #2470 - redefinition of boost::detail::spinlock
- Issue #2469 - Dataflow performance issue
- PR #2468 - Perf docs update
- PR #2466 - Guarantee to execute remote direct actions on HPX-thread
- PR #2465 - Improve demo : Async copy and fixed device handling
- PR #2464 - Adding performance counter exposing instantaneous scheduler utilization
- PR #2463 - Downcast to future<void>
- PR #2462 - Fixed usage of ITT-Notify API with Intel Amplifier
- PR #2461 - Cublas demo
- PR #2460 - Fixing thread bindings
- PR #2459 - Make -std=c++11 nvcc flag consistent for in-build and installed versions
- Issue #2457 - Segmentation fault when registering a partitioned vector
- PR #2452 - Properly releasing global barrier for unhandled exceptions
- PR #2451 - Fixing long shutdown times
- PR #2450 - Attempting to fix initialization errors on newer platforms (Boost V1.63)
- PR #2449 - Replace BOOST_COMPILER_FENCE with an HPX version
- PR #2448 - This fixes a possible race in the migration code
- PR #2445 - Fixing dataflow et.al. for futures or future-ranges wrapped into ref()
- PR #2444 - Fix segfaults
- PR #2443 - Issue 2442
- Issue #2442 - Mismatch between #if/#endif and namespace scope brackets in this_thread_executers.hpp
- Issue #2441 - undeclared identifier BOOST_COMPILER_FENCE
- PR #2440 - Knl build
- PR #2438 - Datapar backend
- PR #2437 - Adapt algorithm parameter sequence changes from C++17
- PR #2436 - Adapt execution policy name changes from C++17
- Issue #2435 - Trunk broken, undefined reference to hpx::thread::interrupt(hpx::thread::id, bool)
- PR #2434 - More fixes to resource manager
- PR #2433 - Added versions of hpx::get_ptr taking client side representations
- PR #2432 - Warning fixes
- PR #2431 - Adding facility representing set of performance counters
- PR #2430 - Fix parallel_executor thread spawning
- PR #2429 - Fix attribute warning for gcc
- Issue #2427 - Seg fault running octo-tiger with latest HPX commit
- Issue #2426 - Bug in 9592f5c0bc29806fce0dbe73f35b6ca7e027edcb causes immediate crash in Octo-tiger
- PR #2425 - Fix nvcc errors due to constexpr specifier
- Issue #2424 - Async action on component present on hpx::find_here is executing synchronously
- PR #2423 - Fix nvcc errors due to constexpr specifier
- PR #2422 - Implementing hpx::this_thread thread data functions
- PR #2421 - Adding benchmark for wait_all
- Issue #2420 - Returning object of a component client from another component action fails
- PR #2419 - Infiniband parcelport
- Issue #2418 - gcc + nvcc fails to compile code that uses partitioned_vector
- PR #2417 - Fixing context switching
- PR #2416 - Adding fixes and workarounds to allow compilation with nvcc/msvc (VS2015up3)
- PR #2415 - Fix errors coming from hpx compute examples
- PR #2414 - Fixing msvc12
- PR #2413 - Enable cuda/nvcc or cuda/clang when using add_hpx_executable()
- PR #2412 - Fix issue in HPX_SetupTarget.cmake when cuda is used
- PR #2411 - This fixes the core compilation issues with MSVC12
- Issue #2410 - undefined reference to opal_hwloc191_hwloc_.....
- PR #2409 - Fixing locking for channel and receive_buffer
- PR #2407 - Solving #2402 and #2403
- PR #2406 - Improve guards
- PR #2405 - Enable parallel::for_each for iterators returning proxy types
- PR #2404 - Forward the explicitly given result_type in the hpx invoke
- Issue #2403 - datapar_execution + zip iterator: lambda arguments aren’t references
- Issue #2402 - datapar algorithm instantiated with wrong type #2402
- PR #2401 - Added support for imported libraries to HPX_Libraries.cmake
- PR #2400 - Use CMake policy CMP0060
- Issue #2399 - Error trying to push back vector of futures to vector
- PR #2398 - Allow config #defines to be written out to custom config/defines.hpp
- Issue #2397 - CMake generated config defines can cause tedious rebuilds category
- Issue #2396 - BOOST_ROOT paths are not used at link time
- PR #2395 - Fix target_link_libraries() issue when HPX Cuda is enabled
- Issue #2394 - Template compilation error using HPX_WITH_DATAPAR_LIBFLATARRAY
- PR #2393 - Fixing lock registration for recursive mutex
- PR #2392 - Add keywords in target_link_libraries in hpx_setup_target
- PR #2391 - Clang goroutines
- Issue #2390 - Adapt execution policy name changes from C++17
- PR #2389 - Chunk allocator and pool are not used and are obsolete
- PR #2388 - Adding functionalities to datapar needed by octotiger
- PR #2387 - Fixing race condition for early parcels
- Issue #2386 - Lock registration broken for recursive_mutex
- PR #2385 - Datapar zip iterator
- PR #2384 - Fixing race condition in for_loop_reduction
- PR #2383 - Continuations
- PR #2382 - add LibFlatArray-based backend for datapar
- PR #2381 - remove unused typedef to get rid of compiler warnings
- PR #2380 - Tau cleanup
- PR #2379 - Can send immediate
- PR #2378 - Renaming copy_helper/copy_n_helper/move_helper/move_n_helper
- Issue #2376 - Boost trunk’s spinlock initializer fails to compile
- PR #2375 - Add support for minimal thread local data
- PR #2374 - Adding API functions set_config_entry_callback
- PR #2373 - Add a simple utility for debugging that gives suspended task backtraces
- PR #2372 - Barrier Fixes
- Issue #2370 - Can’t wait on a wrapped future
- PR #2369 - Fixing stable_partition
- PR #2367 - Fixing find_prefixes for Windows platforms
- PR #2366 - Testing for experimental/optional only in C++14 mode
- PR #2364 - Adding set_config_entry
- PR #2363 - Fix papi
- PR #2362 - Adding missing macros for new non-direct actions
- PR #2361 - Improve cmake output to help debug compiler incompatibility check
- PR #2360 - Fixing race condition in condition_variable
- PR #2359 - Fixing shutdown when parcels are still in flight
- Issue #2357 - failed to insert console_print_action into typename_to_id_t registry
- PR #2356 - Fixing return type of get_iterator_tuple
- PR #2355 - Fixing compilation against Boost 1 62
- PR #2354 - Adding serialization for mask_type if CPU_COUNT > 64
- PR #2353 - Adding hooks to tie in APEX into the parcel layer
- Issue #2352 - Compile errors when using intel 17 beta (for KNL) on edison
- PR #2351 - Fix function vtable get_function_address implementation
- Issue #2350 - Build failure - master branch (4de09f5) with Intel Compiler v17
- PR #2349 - Enabling zero-copy serialization support for std::vector<>
- PR #2348 - Adding test to verify #2334 is fixed
- PR #2347 - Bug fixes for hpx.compute and hpx::lcos::channel
- PR #2346 - Removing cmake “find” files that are in the APEX cmake Modules
- PR #2345 - Implemented parallel::stable_partition
- PR #2344 - Making hpx::lcos::channel usable with basename registration
- PR #2343 - Fix a couple of examples that failed to compile after recent api changes
- Issue #2342 - Enabling APEX causes link errors
- PR #2341 - Removing cmake “find” files that are in the APEX cmake Modules
- PR #2340 - Implemented all existing datapar algorithms using Boost.SIMD
- PR #2339 - Fixing 2338
- PR #2338 - Possible race in sliding semaphore
- PR #2337 - Adjust osu_latency test to measure window_size parcels in flight at once
- PR #2336 - Allowing remote direct actions to be executed without spawning a task
- PR #2335 - Making sure multiple components are properly initialized from arguments
- Issue #2334 - Cannot construct component with large vector on a remote locality
- PR #2332 - Fixing hpx::lcos::local::barrier
- PR #2331 - Updating APEX support to include OTF2
- PR #2330 - Support for data-parallelism for parallel algorithms
- Issue #2329 - Coordinate settings in cmake
- PR #2328 - fix LibGeoDecomp builds with HPX + GCC 5.3.0 + CUDA 8RC
- PR #2326 - Making scan_partitioner work (for now)
- Issue #2323 - Constructing a vector of components only correctly initializes the first component
- PR #2322 - Fix problems that bubbled up after merging #2278
- PR #2321 - Scalable barrier
- PR #2320 - Std flag fixes
- Issue #2319 - -std=c++14 and -std=c++1y with Intel can’t build recent Boost builds due to insufficient C++14 support; don’t enable these flags by default for Intel
- PR #2318 - Improve handling of --hpx:bind=<bind-spec>
- PR #2317 - Making sure command line warnings are printed once only
- PR #2316 - Fixing command line handling for default bind mode
- PR #2315 - Set id_retrieved if set_id is present
- Issue #2314 - Warning for requested/allocated thread discrepancy is printed twice
- Issue #2313 - --hpx:print-bind doesn’t work with --hpx:pu-step
- Issue #2312 - --hpx:bind range specifier restrictions are overly restrictive
- Issue #2311 - hpx_0.9.99 out of project build fails
- PR #2310 - Simplify function registration
- PR #2309 - Spelling and grammar revisions in documentation (and some code)
- PR #2306 - Correct minor typo in the documentation
- PR #2305 - Cleaning up and fixing parcel coalescing
- PR #2304 - Inspect checks for stream related includes
- PR #2303 - Add functionality allowing to enumerate threads of given state
- PR #2301 - Algorithm overloads fix for VS2013
- PR #2300 - Use <cstdint>, add inspect checks
- PR #2299 - Replace boost::[c]ref with std::[c]ref, add inspect checks
- PR #2297 - Fixing compilation with no hw_loc
- PR #2296 - Hpx compute
- PR #2295 - Making sure for_loop(execution::par, 0, N, …) is actually executed in parallel
- PR #2294 - Throwing exceptions if the runtime is not up and running
- PR #2293 - Removing unused parcel port code
- PR #2292 - Refactor function vtables
- PR #2291 - Fixing 2286
- PR #2290 - Simplify algorithm overloads
- PR #2289 - Adding performance counters reporting parcel related data on a per-action basis
- Issue #2288 - Remove dormant parcelports
- Issue #2286 - adjustments to parcel handling to support parcelports that do not need a connection cache
- PR #2285 - add CMake option to disable package export
- PR #2283 - Add more inspect checks for use of deprecated components
- Issue #2282 - Arithmetic exception in executor static chunker
- Issue #2281 - For loop doesn’t parallelize
- PR #2280 - Fixing 2277: build failure with PAPI
- PR #2279 - Child vs parent stealing
- Issue #2277 - master branch build failure (53c5b4f) with papi
- PR #2276 - Compile time launch policies
- PR #2275 - Replace boost::chrono with std::chrono in interfaces
- PR #2274 - Replace most uses of Boost.Assign with initializer list
- PR #2273 - Fixed typos
- PR #2272 - Inspect checks
- PR #2270 - Adding test verifying -Ihpx.os_threads=all
- PR #2269 - Added inspect check for now obsolete boost type traits
- PR #2268 - Moving more code into source files
- Issue #2267 - Add inspect support to deprecate Boost.TypeTraits
- PR #2265 - Adding channel LCO
- PR #2264 - Make support for std::ref mandatory
- PR #2263 - Constrain tuple_member forwarding constructor
- Issue #2262 - Test hpx.os_threads=all
- Issue #2261 - OS X: Error: no matching constructor for initialization of ‘hpx::lcos::local::condition_variable_any’
- Issue #2260 - Make support for std::ref mandatory
- PR #2259 - Remove most of Boost.MPL, Boost.EnableIf and Boost.TypeTraits
- PR #2258 - Fixing #2256
- PR #2257 - Fixing launch process
- Issue #2256 - Actions are not registered if not invoked
- PR #2255 - Coalescing histogram
- PR #2254 - Silence explicit initialization in copy-constructor warnings
- PR #2253 - Drop support for GCC 4.6 and 4.7
- PR #2252 - Prepare V1.0
- PR #2251 - Convert to 0.9.99
- PR #2249 - Adding iterator_facade and iterator_adaptor
- Issue #2248 - Need a feature to yield to a new task immediately
- PR #2246 - Adding split_future
- PR #2245 - Add an example for handing over a component instance to a dynamically launched locality
- Issue #2243 - Add example demonstrating AGAS symbolic name registration
- Issue #2242 - pkgconfig test broken on CentOS 7 / Boost 1.61
- Issue #2241 - Compilation error for partitioned vector in hpx_compute branch
- PR #2240 - Fixing termination detection on one locality
- Issue #2239 - Create a new facility lcos::split_all
- Issue #2236 - hpx::cout vs. std::cout
- PR #2232 - Implement local-only primary namespace service
- Issue #2147 - would like to know how much data is being routed by particular actions
- Issue #2109 - Warning while compiling hpx
- Issue #1973 - Setting INTERFACE_COMPILE_OPTIONS for hpx_init in CMake taints Fortran_FLAGS
- Issue #1864 - run_guarded using bound function ignores reference
- Issue #1754 - Running with TCP parcelport causes immediate crash or freeze
- Issue #1655 - Enable zip_iterator to be used with Boost traversal iterator categories
- Issue #1591 - Optimize AGAS for shared memory only operation
- Issue #1401 - Need an efficient infiniband parcelport
- Issue #1125 - Fix the IPC parcelport
- Issue #839 - Refactor ibverbs and shmem parcelport
- Issue #702 - Add instrumentation of parcel layer
- Issue #668 - Implement ispc task interface
- Issue #533 - Thread queue/deque internal parameters should be runtime configurable
- Issue #475 - Create a means of combining performance counters into querysets