HPX V1.0.0 (Apr 24, 2017)
General changes
Here are some of the main highlights and changes for this release (in no particular order):
Added the facility hpx::split_future, which allows one to convert a future<tuple<Ts...>> into a tuple<future<Ts>...>. This functionality is not available when compiling HPX with VS2012.
Added a new type of performance counter which allows one to return a list of values for each invocation. We also added a first counter of this type which collects a histogram of the times between parcels being created.
Added new LCOs: hpx::lcos::channel and hpx::lcos::local::channel, which are very similar to the well-known channel constructs used in the Go language.
Added new performance counters reporting the amount of data handled by the networking layer on an action-by-action basis (please see PR #2289 for more details).
Added a new facility hpx::lcos::barrier, replacing the equally named older one. The new facility has a slightly changed API and is much more efficient. Most notably, the new facility exposes a (global) function hpx::lcos::barrier::synchronize() which represents a global barrier across all localities.
We have started to add support for vectorization to our parallel algorithm implementations. This support depends on using an external library, currently either Vc Library or Boost.SIMD. Please see Issue #2333 for a list of currently supported algorithms. This is an experimental feature and its implementation and/or API might change in the future. Please see this blog-post for more information.
The parameter sequence for the hpx::parallel::transform_reduce overload taking one iterator range has changed to match the changes this algorithm has undergone while being moved to C++17. The old overload can still be enabled at configure time by specifying -DHPX_WITH_TRANSFORM_REDUCE_COMPATIBILITY=On to CMake.
The algorithm hpx::parallel::inner_product has been renamed to hpx::parallel::transform_reduce to match the changes this algorithm has undergone while being moved to C++17. The old inner_product names can still be enabled at configure time by specifying -DHPX_WITH_TRANSFORM_REDUCE_COMPATIBILITY=On to CMake.
Added versions of hpx::get_ptr taking client side representations for component instances as their parameter (instead of a global id).
Added the helper utility hpx::performance_counters::performance_counter_set helping to encapsulate a set of performance counters to be managed concurrently.
All execution policies and related classes have been renamed to be consistent with the naming changes applied for C++17. All policies now live in the namespace hpx::parallel::execution. The old names can still be enabled at configure time by specifying -DHPX_WITH_EXECUTION_POLICY_COMPATIBILITY=On to CMake.
The thread scheduling subsystem has undergone a major refactoring which results in significant performance improvements. We have also improved the performance of creating hpx::future and of various facilities handling those.
We have consolidated all of the code in HPX.Compute related to the integration of CUDA. hpx::partitioned_vector has been enabled to be usable with hpx::compute::vector which allows one to place the partitions on one or more GPU devices.
Added new performance counters exposing various internals of the thread scheduling subsystem, such as the current idle- and busy-loop counters and instantaneous scheduler utilization.
Extended and improved the use of the ITTNotify hooks, allowing one to collect performance counter data and function annotation information from within the Intel Amplifier tool.
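To illustrate the shape of the conversion that hpx::split_future performs, here is a minimal sketch written against the C++ standard library only. It is not the HPX implementation: the names split_future_sketch and split_future_impl are hypothetical, std::future stands in for hpx::future, and the sketch assumes every tuple element is copyable because each element is copied out of a shared tuple by a deferred task.

```cpp
#include <cstddef>
#include <future>
#include <tuple>
#include <utility>

// Hypothetical helper (not part of HPX): creates one deferred task per
// tuple element. Each task waits for the shared tuple and copies out
// its own element when the element future is queried.
template <typename... Ts, std::size_t... Is>
std::tuple<std::future<Ts>...> split_future_impl(
    std::shared_future<std::tuple<Ts...>> sf, std::index_sequence<Is...>)
{
    return std::tuple<std::future<Ts>...>{
        std::async(std::launch::deferred,
            [sf] { return std::get<Is>(sf.get()); })...};
}

// Hypothetical stand-in for the idea behind hpx::split_future:
// future<tuple<Ts...>> -> tuple<future<Ts>...>.
template <typename... Ts>
std::tuple<std::future<Ts>...> split_future_sketch(
    std::future<std::tuple<Ts...>> f)
{
    // share() lets every element future refer to the same result.
    return split_future_impl(f.share(), std::index_sequence_for<Ts...>{});
}
```

Given a std::future<std::tuple<int, double>> f, calling split_future_sketch(std::move(f)) yields a tuple whose elements can be waited on individually with std::get<0>(parts).get() and std::get<1>(parts).get(). Please refer to the HPX API documentation for the exact interface of hpx::split_future.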
Breaking changes
We have dropped support for the gcc compiler versions V4.6 and 4.7. The minimal gcc version we now test on is gcc V4.8.
We have removed (default) support for boost::chrono in interfaces; uses of it have been replaced with std::chrono. This facility can still be enabled at configure time by specifying -DHPX_WITH_BOOST_CHRONO_COMPATIBILITY=On to CMake.
The parameter sequence for the hpx::parallel::transform_reduce overload taking one iterator range has changed to match the changes this algorithm has undergone while being moved to C++17.
The algorithm hpx::parallel::inner_product has been renamed to hpx::parallel::transform_reduce to match the changes this algorithm has undergone while being moved to C++17.
The build options HPX_WITH_COLOCATED_BACKWARDS_COMPATIBILITY and HPX_WITH_COMPONENT_GET_GID_COMPATIBILITY are now disabled by default. Please update any code that still depends on the deprecated interfaces.
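The C++17 argument order that the transform_reduce changes above follow can be seen in the standard library itself. The sketch below uses std::transform_reduce and std::inner_product rather than the HPX algorithms; the function names are hypothetical, and it assumes that the HPX overloads use the same ordering after their execution policy argument, which should be checked against the HPX API reference.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// One iterator range: the C++17 order is
// (first, last, init, reduce_op, transform_op).
inline int sum_of_squares(std::vector<int> const& v)
{
    return std::transform_reduce(v.begin(), v.end(), 0,
        std::plus<>{}, [](int x) { return x * x; });
}

// Two iterator ranges: this overload takes over the role of
// inner_product, defaulting to std::plus and std::multiplies.
inline int dot_product(std::vector<int> const& a, std::vector<int> const& b)
{
    return std::transform_reduce(a.begin(), a.end(), b.begin(), 0);
}

// The older spelling of the same computation, shown for comparison.
inline int dot_product_legacy(
    std::vector<int> const& a, std::vector<int> const& b)
{
    return std::inner_product(a.begin(), a.end(), b.begin(), 0);
}
```

Code that previously called inner_product with two ranges would, under this assumption, only need the algorithm name changed, while calls to the single-range transform_reduce need their arguments reordered. The sketch requires a compiler in C++17 mode.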
Bug fixes (closed tickets)
Here is a list of the important tickets we closed for this release.
PR #2596 - Adding apex data
PR #2595 - Remove obsolete file
Issue #2594 - FindOpenCL.cmake mismatch with the official cmake module
PR #2592 - First attempt to introduce spmd_block in hpx
Issue #2591 - Feature request: continuation (then) which does not require the callable object to take a future<R> as parameter
PR #2588 - Daint fixes
PR #2587 - Fixing transfer_(continuation)_action::schedule
PR #2585 - Work around MSVC having an ICE when compiling with -Ob2
PR #2583 - changing 7zip command to 7za in roll_release.sh
PR #2582 - First attempt to introduce spmd_block in hpx
PR #2581 - Enable annotated function for parallel algorithms
PR #2580 - First attempt to introduce spmd_block in hpx
PR #2579 - Make thread NICE level setting an option
PR #2578 - Implementing enqueue instead of busy wait when no sender is available
PR #2577 - Retrieve -std=c++11 consistent nvcc flag
PR #2576 - Add missing dependencies of cuda based tests
PR #2575 - Remove warnings due to some captured variables
PR #2573 - Attempt to resolve resolve_locality
PR #2572 - Adding APEX hooks to background thread
PR #2571 - Pick up hpx.ignore_batch_env from config map
PR #2570 - Add commandline options --hpx:print-counters-locally
PR #2569 - Fix computeapi unit tests
PR #2567 - This adds another barrier::synchronize before registering performance counters
PR #2564 - Cray static toolchain support
PR #2563 - Fixed unhandled exception during startup
PR #2562 - Remove partitioned_vector.cu from build tree when nvcc is used
Issue #2561 - octo-tiger crash with commit 6e921495ff6c26f125d62629cbaad0525f14f7ab
PR #2560 - Prevent -Wundef warnings on Vc version checks
PR #2559 - Allowing CUDA callback to set the future directly from an OS thread
PR #2558 - Remove warnings due to float precisions
PR #2557 - Removing bogus handling of compile flags for CUDA
PR #2556 - Fixing scan partitioner
PR #2554 - Add more diagnostics to error thrown from find_appropriate_destination
Issue #2555 - No valid parcelport configured
PR #2553 - Add cmake cuda_arch option
PR #2552 - Remove incomplete datapar bindings to libflatarray
PR #2551 - Rename hwloc_topology to hwloc_topology_info
PR #2550 - Apex api updates
PR #2549 - Pre-include defines.hpp to get the macro HPX_HAVE_CUDA value
PR #2548 - Fixing issue with disconnect
PR #2546 - Some fixes around cuda clang partitioned_vector example
PR #2545 - Fix uses of the Vc2 datapar flags; the value, not the type, should be passed to functions
PR #2542 - Make HPX_WITH_MALLOC easier to use
PR #2541 - avoid recompiles when enabling/disabling examples
PR #2540 - Fixing usage of target_link_libraries()
PR #2539 - fix RPATH behaviour
Issue #2538 - HPX_WITH_CUDA corrupts compilation flags
PR #2537 - Add output of a Bazel Skylark extension for paths and compile options
PR #2536 - Add counter exposing total available memory to Windows as well
PR #2535 - Remove obsolete support for security
Issue #2534 - Remove command line option --hpx:run-agas-server
PR #2533 - Pre-cache locality endpoints during bootstrap
PR #2532 - Fixing handling of GIDs during serialization preprocessing
PR #2531 - Amend uses of the term “functor”
PR #2529 - added counter for reading available memory
PR #2527 - Facilities to create actions from lambdas
PR #2526 - Updated docs: HPX_WITH_EXAMPLES
PR #2525 - Remove warnings related to unused captured variables
Issue #2524 - CMAKE failed because it is missing: TCMALLOC_LIBRARY TCMALLOC_INCLUDE_DIR
PR #2523 - Fixing compose_cb stack overflow
PR #2522 - Instead of unlocking, ignore the lock while creating the message handler
PR #2521 - Create LPROGRESS_ logging macro to simplify progress tracking and timings
PR #2520 - Intel 17 support
PR #2519 - Fix components example
PR #2518 - Fixing parcel scheduling
Issue #2517 - Race condition during Parcel Coalescing Handler creation
Issue #2516 - HPX locks up when using at least 256 localities
Issue #2515 - error: Install cannot find “/lib/hpx/libparcel_coalescing.so.0.9.99” but I can see that file
PR #2514 - Making sure that all continuations of a shared_future are invoked in order
PR #2513 - Fixing locks held during suspension
PR #2512 - MPI Parcelport improvements and fixes related to the background work changes
PR #2511 - Fixing bit-wise (zero-copy) serialization
Issue #2509 - Linking errors in hwloc_topology
PR #2508 - Added documentation for debugging with core files
PR #2506 - Fixing background work invocations
PR #2505 - Fix tuple serialization
Issue #2504 - Ensure continuations are called in the order they have been attached
PR #2503 - Adding serialization support for Vc v2 (datapar)
PR #2502 - Resolve various, minor compiler warnings
PR #2501 - Some other fixes around cuda examples
Issue #2500 - nvcc / cuda clang issue due to a missing -DHPX_WITH_CUDA flag
PR #2499 - Adding support for std::array to wait_all and friends
PR #2498 - Execute background work as HPX thread
PR #2497 - Fixing configuration options for spinlock-deadlock detection
PR #2496 - Accounting for different compilers in CrayKNL toolchain file
PR #2494 - Adding component base class which ties a component instance to a given executor
PR #2493 - Enable controlling amount of pending threads which must be available to allow thread stealing
PR #2492 - Adding new command line option --hpx:print-counter-reset
PR #2491 - Resolve ambiguities when compiling with APEX
PR #2490 - Resuming threads waiting on future with higher priority
Issue #2489 - nvcc issue because -std=c++11 appears twice
PR #2488 - Adding performance counters exposing the internal idle and busy-loop counters
PR #2487 - Allowing for plain suspend to reschedule thread right away
PR #2486 - Only flag HPX code for CUDA if HPX_WITH_CUDA is set
PR #2485 - Making thread-queue parameters runtime-configurable
PR #2484 - Added atomic counter for parcel-destinations
PR #2483 - Added priority-queue lifo scheduler
PR #2482 - Changing scheduler to steal only if more than a minimal number of tasks are available
PR #2481 - Extending command line option --hpx:print-counter-destination to support value ‘none’
PR #2479 - Added option to disable signal handler
PR #2478 - Making sure the sine performance counter module gets loaded only for the corresponding example
Issue #2477 - Breaking at a throw statement
PR #2476 - Annotated function
PR #2475 - Ensure that using %osthread% during logging will not throw for non-hpx threads
PR #2474 - Remove now superficial non_direct actions from base_lco and friends
PR #2473 - Refining support for ITTNotify
PR #2472 - Some fixes around hpx compute
Issue #2470 - redefinition of boost::detail::spinlock
Issue #2469 - Dataflow performance issue
PR #2468 - Perf docs update
PR #2466 - Guarantee to execute remote direct actions on HPX-thread
PR #2465 - Improve demo : Async copy and fixed device handling
PR #2464 - Adding performance counter exposing instantaneous scheduler utilization
PR #2463 - Downcast to future<void>
PR #2462 - Fixed usage of ITT-Notify API with Intel Amplifier
PR #2461 - Cublas demo
PR #2460 - Fixing thread bindings
PR #2459 - Make -std=c++11 nvcc flag consistent for in-build and installed versions
Issue #2457 - Segmentation fault when registering a partitioned vector
PR #2452 - Properly releasing global barrier for unhandled exceptions
PR #2451 - Fixing long shutdown times
PR #2450 - Attempting to fix initialization errors on newer platforms (Boost V1.63)
PR #2449 - Replace BOOST_COMPILER_FENCE with an HPX version
PR #2448 - This fixes a possible race in the migration code
PR #2445 - Fixing dataflow et.al. for futures or future-ranges wrapped into ref()
PR #2444 - Fix segfaults
PR #2443 - Issue 2442
Issue #2442 - Mismatch between #if/#endif and namespace scope brackets in this_thread_executers.hpp
Issue #2441 - undeclared identifier BOOST_COMPILER_FENCE
PR #2440 - Knl build
PR #2438 - Datapar backend
PR #2437 - Adapt algorithm parameter sequence changes from C++17
PR #2436 - Adapt execution policy name changes from C++17
Issue #2435 - Trunk broken, undefined reference to hpx::thread::interrupt(hpx::thread::id, bool)
PR #2434 - More fixes to resource manager
PR #2433 - Added versions of hpx::get_ptr taking client side representations
PR #2432 - Warning fixes
PR #2431 - Adding facility representing set of performance counters
PR #2430 - Fix parallel_executor thread spawning
PR #2429 - Fix attribute warning for gcc
Issue #2427 - Seg fault running octo-tiger with latest HPX commit
Issue #2426 - Bug in 9592f5c0bc29806fce0dbe73f35b6ca7e027edcb causes immediate crash in Octo-tiger
PR #2425 - Fix nvcc errors due to constexpr specifier
Issue #2424 - Async action on component present on hpx::find_here is executing synchronously
PR #2423 - Fix nvcc errors due to constexpr specifier
PR #2422 - Implementing hpx::this_thread thread data functions
PR #2421 - Adding benchmark for wait_all
Issue #2420 - Returning object of a component client from another component action fails
PR #2419 - Infiniband parcelport
Issue #2418 - gcc + nvcc fails to compile code that uses partitioned_vector
PR #2417 - Fixing context switching
PR #2416 - Adding fixes and workarounds to allow compilation with nvcc/msvc (VS2015up3)
PR #2415 - Fix errors coming from hpx compute examples
PR #2414 - Fixing msvc12
PR #2413 - Enable cuda/nvcc or cuda/clang when using add_hpx_executable()
PR #2412 - Fix issue in HPX_SetupTarget.cmake when cuda is used
PR #2411 - This fixes the core compilation issues with MSVC12
Issue #2410 - undefined reference to opal_hwloc191_hwloc_.....
PR #2409 - Fixing locking for channel and receive_buffer
PR #2407 - Solving #2402 and #2403
PR #2406 - Improve guards
PR #2405 - Enable parallel::for_each for iterators returning proxy types
PR #2404 - Forward the explicitly given result_type in the hpx invoke
Issue #2403 - datapar_execution + zip iterator: lambda arguments aren’t references
Issue #2402 - datapar algorithm instantiated with wrong type #2402
PR #2401 - Added support for imported libraries to HPX_Libraries.cmake
PR #2400 - Use CMake policy CMP0060
Issue #2399 - Error trying to push back vector of futures to vector
PR #2398 - Allow config #defines to be written out to custom config/defines.hpp
Issue #2397 - CMake generated config defines can cause tedious rebuilds category
Issue #2396 - BOOST_ROOT paths are not used at link time
PR #2395 - Fix target_link_libraries() issue when HPX Cuda is enabled
Issue #2394 - Template compilation error using HPX_WITH_DATAPAR_LIBFLATARRAY
PR #2393 - Fixing lock registration for recursive mutex
PR #2392 - Add keywords in target_link_libraries in hpx_setup_target
PR #2391 - Clang goroutines
Issue #2390 - Adapt execution policy name changes from C++17
PR #2389 - Chunk allocator and pool are not used and are obsolete
PR #2388 - Adding functionalities to datapar needed by octotiger
PR #2387 - Fixing race condition for early parcels
Issue #2386 - Lock registration broken for recursive_mutex
PR #2385 - Datapar zip iterator
PR #2384 - Fixing race condition in for_loop_reduction
PR #2383 - Continuations
PR #2382 - add LibFlatArray-based backend for datapar
PR #2381 - remove unused typedef to get rid of compiler warnings
PR #2380 - Tau cleanup
PR #2379 - Can send immediate
PR #2378 - Renaming copy_helper/copy_n_helper/move_helper/move_n_helper
Issue #2376 - Boost trunk’s spinlock initializer fails to compile
PR #2375 - Add support for minimal thread local data
PR #2374 - Adding API functions set_config_entry_callback
PR #2373 - Add a simple utility for debugging that gives suspended task backtraces
PR #2372 - Barrier Fixes
Issue #2370 - Can’t wait on a wrapped future
PR #2369 - Fixing stable_partition
PR #2367 - Fixing find_prefixes for Windows platforms
PR #2366 - Testing for experimental/optional only in C++14 mode
PR #2364 - Adding set_config_entry
PR #2363 - Fix papi
PR #2362 - Adding missing macros for new non-direct actions
PR #2361 - Improve cmake output to help debug compiler incompatibility check
PR #2360 - Fixing race condition in condition_variable
PR #2359 - Fixing shutdown when parcels are still in flight
Issue #2357 - failed to insert console_print_action into typename_to_id_t registry
PR #2356 - Fixing return type of get_iterator_tuple
PR #2355 - Fixing compilation against Boost 1.62
PR #2354 - Adding serialization for mask_type if CPU_COUNT > 64
PR #2353 - Adding hooks to tie in APEX into the parcel layer
Issue #2352 - Compile errors when using intel 17 beta (for KNL) on edison
PR #2351 - Fix function vtable get_function_address implementation
Issue #2350 - Build failure - master branch (4de09f5) with Intel Compiler v17
PR #2349 - Enabling zero-copy serialization support for std::vector<>
PR #2348 - Adding test to verify #2334 is fixed
PR #2347 - Bug fixes for hpx.compute and hpx::lcos::channel
PR #2346 - Removing cmake “find” files that are in the APEX cmake Modules
PR #2345 - Implemented parallel::stable_partition
PR #2344 - Making hpx::lcos::channel usable with basename registration
PR #2343 - Fix a couple of examples that failed to compile after recent api changes
Issue #2342 - Enabling APEX causes link errors
PR #2341 - Removing cmake “find” files that are in the APEX cmake Modules
PR #2340 - Implemented all existing datapar algorithms using Boost.SIMD
PR #2339 - Fixing 2338
PR #2338 - Possible race in sliding semaphore
PR #2337 - Adjust osu_latency test to measure window_size parcels in flight at once
PR #2336 - Allowing remote direct actions to be executed without spawning a task
PR #2335 - Making sure multiple components are properly initialized from arguments
Issue #2334 - Cannot construct component with large vector on a remote locality
PR #2332 - Fixing hpx::lcos::local::barrier
PR #2331 - Updating APEX support to include OTF2
PR #2330 - Support for data-parallelism for parallel algorithms
Issue #2329 - Coordinate settings in cmake
PR #2328 - fix LibGeoDecomp builds with HPX + GCC 5.3.0 + CUDA 8RC
PR #2326 - Making scan_partitioner work (for now)
Issue #2323 - Constructing a vector of components only correctly initializes the first component
PR #2322 - Fix problems that bubbled up after merging #2278
PR #2321 - Scalable barrier
PR #2320 - Std flag fixes
Issue #2319 - -std=c++14 and -std=c++1y with Intel can’t build recent Boost builds due to insufficient C++14 support; don’t enable these flags by default for Intel
PR #2318 - Improve handling of --hpx:bind=<bind-spec>
PR #2317 - Making sure command line warnings are printed once only
PR #2316 - Fixing command line handling for default bind mode
PR #2315 - Set id_retrieved if set_id is present
Issue #2314 - Warning for requested/allocated thread discrepancy is printed twice
Issue #2313 - --hpx:print-bind doesn’t work with --hpx:pu-step
Issue #2312 - --hpx:bind range specifier restrictions are overly restrictive
Issue #2311 - hpx_0.9.99 out of project build fails
PR #2310 - Simplify function registration
PR #2309 - Spelling and grammar revisions in documentation (and some code)
PR #2306 - Correct minor typo in the documentation
PR #2305 - Cleaning up and fixing parcel coalescing
PR #2304 - Inspect checks for stream related includes
PR #2303 - Add functionality allowing to enumerate threads of given state
PR #2301 - Algorithm overloads fix for VS2013
PR #2300 - Use <cstdint>, add inspect checks
PR #2299 - Replace boost::[c]ref with std::[c]ref, add inspect checks
PR #2297 - Fixing compilation with no hw_loc
PR #2296 - Hpx compute
PR #2295 - Making sure for_loop(execution::par, 0, N, …) is actually executed in parallel
PR #2294 - Throwing exceptions if the runtime is not up and running
PR #2293 - Removing unused parcel port code
PR #2292 - Refactor function vtables
PR #2291 - Fixing 2286
PR #2290 - Simplify algorithm overloads
PR #2289 - Adding performance counters reporting parcel related data on a per-action basis
Issue #2288 - Remove dormant parcelports
Issue #2286 - adjustments to parcel handling to support parcelports that do not need a connection cache
PR #2285 - add CMake option to disable package export
PR #2283 - Add more inspect checks for use of deprecated components
Issue #2282 - Arithmetic exception in executor static chunker
Issue #2281 - For loop doesn’t parallelize
PR #2280 - Fixing 2277: build failure with PAPI
PR #2279 - Child vs parent stealing
Issue #2277 - master branch build failure (53c5b4f) with papi
PR #2276 - Compile time launch policies
PR #2275 - Replace boost::chrono with std::chrono in interfaces
PR #2274 - Replace most uses of Boost.Assign with initializer list
PR #2273 - Fixed typos
PR #2272 - Inspect checks
PR #2270 - Adding test verifying -Ihpx.os_threads=all
PR #2269 - Added inspect check for now obsolete boost type traits
PR #2268 - Moving more code into source files
Issue #2267 - Add inspect support to deprecate Boost.TypeTraits
PR #2265 - Adding channel LCO
PR #2264 - Make support for std::ref mandatory
PR #2263 - Constrain tuple_member forwarding constructor
Issue #2262 - Test hpx.os_threads=all
Issue #2261 - OS X: Error: no matching constructor for initialization of ‘hpx::lcos::local::condition_variable_any’
Issue #2260 - Make support for std::ref mandatory
PR #2259 - Remove most of Boost.MPL, Boost.EnableIf and Boost.TypeTraits
PR #2258 - Fixing #2256
PR #2257 - Fixing launch process
Issue #2256 - Actions are not registered if not invoked
PR #2255 - Coalescing histogram
PR #2254 - Silence explicit initialization in copy-constructor warnings
PR #2253 - Drop support for GCC 4.6 and 4.7
PR #2252 - Prepare V1.0
PR #2251 - Convert to 0.9.99
PR #2249 - Adding iterator_facade and iterator_adaptor
Issue #2248 - Need a feature to yield to a new task immediately
PR #2246 - Adding split_future
PR #2245 - Add an example for handing over a component instance to a dynamically launched locality
Issue #2243 - Add example demonstrating AGAS symbolic name registration
Issue #2242 - pkgconfig test broken on CentOS 7 / Boost 1.61
Issue #2241 - Compilation error for partitioned vector in hpx_compute branch
PR #2240 - Fixing termination detection on one locality
Issue #2239 - Create a new facility lcos::split_all
Issue #2236 - hpx::cout vs. std::cout
PR #2232 - Implement local-only primary namespace service
Issue #2147 - would like to know how much data is being routed by particular actions
Issue #2109 - Warning while compiling hpx
Issue #1973 - Setting INTERFACE_COMPILE_OPTIONS for hpx_init in CMake taints Fortran_FLAGS
Issue #1864 - run_guarded using bound function ignores reference
Issue #1754 - Running with TCP parcelport causes immediate crash or freeze
Issue #1655 - Enable zip_iterator to be used with Boost traversal iterator categories
Issue #1591 - Optimize AGAS for shared memory only operation
Issue #1401 - Need an efficient infiniband parcelport
Issue #1125 - Fix the IPC parcelport
Issue #839 - Refactor ibverbs and shmem parcelport
Issue #702 - Add instrumentation of parcel layer
Issue #668 - Implement ispc task interface
Issue #533 - Thread queue/deque internal parameters should be runtime configurable
Issue #475 - Create a means of combining performance counters into querysets