HPX V1.0.0 (Apr 24, 2017)

General changes

Here are some of the main highlights and changes for this release (in no particular order):

  • Added the facility hpx::split_future which allows one to convert a future<tuple<Ts...>> into a tuple<future<Ts>...>. This functionality is not available when compiling HPX with VS2012.

  • Added a new type of performance counter which allows one to return a list of values for each invocation. We also added a first counter of this type which collects a histogram of the times between parcels being created.

  • Added new LCOs: hpx::lcos::channel and hpx::lcos::local::channel, which are very similar to the well-known channel constructs used in the Go language.

  • Added new performance counters reporting the amount of data handled by the networking layer on an action-by-action basis (please see PR #2289 for more details).

  • Added a new facility hpx::lcos::barrier, replacing the equally named older one. The new facility has a slightly changed API and is much more efficient. Most notably, the new facility exposes a (global) function hpx::lcos::barrier::synchronize() which represents a global barrier across all localities.

  • We have started to add support for vectorization to our parallel algorithm implementations. This support depends on using an external library, currently either the Vc library or Boost.SIMD. Please see Issue #2333 for a list of currently supported algorithms. This is an experimental feature and its implementation and/or API might change in the future. Please see this blog post for more information.

  • The parameter sequence for the hpx::parallel::transform_reduce overload taking one iterator range has changed to match the changes this algorithm has undergone while being moved to C++17. The old overload can still be enabled at configure time by specifying -DHPX_WITH_TRANSFORM_REDUCE_COMPATIBILITY=On to CMake.

  • The algorithm hpx::parallel::inner_product has been renamed to hpx::parallel::transform_reduce to match the changes this algorithm has undergone while being moved to C++17. The old inner_product names can still be enabled at configure time by specifying -DHPX_WITH_TRANSFORM_REDUCE_COMPATIBILITY=On to CMake.

  • Added versions of hpx::get_ptr taking client side representations for component instances as their parameter (instead of a global id).

  • Added the helper utility hpx::performance_counters::performance_counter_set helping to encapsulate a set of performance counters to be managed concurrently.

  • All execution policies and related classes have been renamed to be consistent with the naming changes applied for C++17. All policies now live in the namespace hpx::parallel::execution. The old names can still be enabled at configure time by specifying -DHPX_WITH_EXECUTION_POLICY_COMPATIBILITY=On to CMake.

  • The thread scheduling subsystem has undergone a major refactoring which results in significant performance improvements. We have also improved the performance of creating hpx::future and of various facilities handling those.

  • We have consolidated all of the code in HPX.Compute related to the integration of CUDA. hpx::partitioned_vector has been enabled to be usable with hpx::compute::vector which allows one to place the partitions on one or more GPU devices.

  • Added new performance counters exposing various internals of the thread scheduling subsystem, such as the current idle- and busy-loop counters and instantaneous scheduler utilization.

  • Extended and improved the use of the ITTNotify hooks, allowing one to collect performance counter data and function annotation information from within the Intel Amplifier tool.
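
The hpx::split_future highlight above describes turning one future holding a tuple into a tuple of per-element futures. The following is a minimal sketch of that idea using only the C++14 standard library. It is not HPX's implementation: the names split_future_sketch, split_impl, and element_future are hypothetical, the element types are assumed to be copyable, and each per-element future is deferred (its value is extracted lazily when get() is called), which the HPX facility need not be.

```cpp
#include <cstddef>
#include <future>
#include <tuple>
#include <utility>

// Hypothetical helper: a deferred future that yields element I of the tuple
// once the shared tuple future is ready. The element is copied out.
template <std::size_t I, typename... Ts>
std::future<std::tuple_element_t<I, std::tuple<Ts...>>>
element_future(std::shared_future<std::tuple<Ts...>> shared)
{
    return std::async(std::launch::deferred,
        [shared]() { return std::get<I>(shared.get()); });
}

// Hypothetical helper: expand the index sequence into one future per element.
template <typename... Ts, std::size_t... Is>
std::tuple<std::future<Ts>...>
split_impl(std::shared_future<std::tuple<Ts...>> shared,
    std::index_sequence<Is...>)
{
    return std::make_tuple(element_future<Is>(shared)...);
}

// Sketch of the split operation: future<tuple<Ts...>> -> tuple<future<Ts>...>.
// The input future is shared so that every element future can read from it.
template <typename... Ts>
std::tuple<std::future<Ts>...>
split_future_sketch(std::future<std::tuple<Ts...>> f)
{
    return split_impl(f.share(), std::index_sequence_for<Ts...>{});
}
```

For example, passing a future that holds the tuple (42, 1.5) yields two futures whose get() calls return 42 and 1.5 respectively, so each consumer can wait on only the element it needs.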

Breaking changes

  • We have dropped support for the gcc compiler versions V4.6 and V4.7. The minimum gcc version we now test on is gcc V4.8.

  • We have removed (default) support for boost::chrono in interfaces; uses of it have been replaced with std::chrono. This facility can still be enabled at configure time by specifying -DHPX_WITH_BOOST_CHRONO_COMPATIBILITY=On to CMake.

  • The parameter sequence for the hpx::parallel::transform_reduce overload taking one iterator range has changed to match the changes this algorithm has undergone while being moved to C++17.

  • The algorithm hpx::parallel::inner_product has been renamed to hpx::parallel::transform_reduce to match the changes this algorithm has undergone while being moved to C++17.

  • The build options HPX_WITH_COLOCATED_BACKWARDS_COMPATIBILITY and HPX_WITH_COMPONENT_GET_GID_COMPATIBILITY are now disabled by default. Please change any code that still depends on the deprecated interfaces.
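
The two transform_reduce items above follow the C++17 standard library. As a rough illustration of the target shape, the sketch below uses std::transform_reduce and std::inner_product rather than the hpx::parallel versions (which additionally take an execution policy as their first argument). The function names sum_of_squares, dot, and dot_legacy are hypothetical, and a C++17 compiler and standard library are assumed. Please consult the HPX reference for the exact HPX signatures.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Single-range form in the C++17 parameter order:
// (first, last, init, reduce_op, transform_op).
inline int sum_of_squares(std::vector<int> const& v)
{
    return std::transform_reduce(v.begin(), v.end(), 0,
        std::plus<>{}, [](int x) { return x * x; });
}

// Two-range form: with the default operations (plus and multiplies) this
// computes the same result as inner_product.
inline int dot(std::vector<int> const& a, std::vector<int> const& b)
{
    return std::transform_reduce(a.begin(), a.end(), b.begin(), 0);
}

// The pre-C++17 spelling of the same computation, shown for comparison.
inline int dot_legacy(std::vector<int> const& a, std::vector<int> const& b)
{
    return std::inner_product(a.begin(), a.end(), b.begin(), 0);
}
```

With the inputs {1, 2, 3} and {4, 5, 6}, sum_of_squares on the first vector gives 14, and both dot and dot_legacy give 32. Code that called the old inner_product name can usually move to the two-range transform_reduce form without changing its arguments, while calls to the single-range overload need their arguments reordered.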

Bug fixes (closed tickets)

Here is a list of the important tickets we closed for this release.

  • PR #2596 - Adding apex data

  • PR #2595 - Remove obsolete file

  • Issue #2594 - FindOpenCL.cmake mismatch with the official cmake module

  • PR #2592 - First attempt to introduce spmd_block in hpx

  • Issue #2591 - Feature request: continuation (then) which does not require the callable object to take a future<R> as parameter

  • PR #2588 - Daint fixes

  • PR #2587 - Fixing transfer_(continuation)_action::schedule

  • PR #2585 - Work around MSVC having an ICE when compiling with -Ob2

  • PR #2583 - changing 7zip command to 7za in roll_release.sh

  • PR #2582 - First attempt to introduce spmd_block in hpx

  • PR #2581 - Enable annotated function for parallel algorithms

  • PR #2580 - First attempt to introduce spmd_block in hpx

  • PR #2579 - Make thread NICE level setting an option

  • PR #2578 - Implementing enqueue instead of busy wait when no sender is available

  • PR #2577 - Retrieve -std=c++11 consistent nvcc flag

  • PR #2576 - Add missing dependencies of cuda based tests

  • PR #2575 - Remove warnings due to some captured variables

  • PR #2573 - Attempt to resolve resolve_locality

  • PR #2572 - Adding APEX hooks to background thread

  • PR #2571 - Pick up hpx.ignore_batch_env from config map

  • PR #2570 - Add commandline options --hpx:print-counters-locally

  • PR #2569 - Fix computeapi unit tests

  • PR #2567 - This adds another barrier::synchronize before registering performance counters

  • PR #2564 - Cray static toolchain support

  • PR #2563 - Fixed unhandled exception during startup

  • PR #2562 - Remove partitioned_vector.cu from build tree when nvcc is used

  • Issue #2561 - octo-tiger crash with commit 6e921495ff6c26f125d62629cbaad0525f14f7ab

  • PR #2560 - Prevent -Wundef warnings on Vc version checks

  • PR #2559 - Allowing CUDA callback to set the future directly from an OS thread

  • PR #2558 - Remove warnings due to float precisions

  • PR #2557 - Removing bogus handling of compile flags for CUDA

  • PR #2556 - Fixing scan partitioner

  • PR #2554 - Add more diagnostics to error thrown from find_appropriate_destination

  • Issue #2555 - No valid parcelport configured

  • PR #2553 - Add cmake cuda_arch option

  • PR #2552 - Remove incomplete datapar bindings to libflatarray

  • PR #2551 - Rename hwloc_topology to hwloc_topology_info

  • PR #2550 - Apex api updates

  • PR #2549 - Pre-include defines.hpp to get the macro HPX_HAVE_CUDA value

  • PR #2548 - Fixing issue with disconnect

  • PR #2546 - Some fixes around cuda clang partitioned_vector example

  • PR #2545 - Fix uses of the Vc2 datapar flags; the value, not the type, should be passed to functions

  • PR #2542 - Make HPX_WITH_MALLOC easier to use

  • PR #2541 - avoid recompiles when enabling/disabling examples

  • PR #2540 - Fixing usage of target_link_libraries()

  • PR #2539 - fix RPATH behaviour

  • Issue #2538 - HPX_WITH_CUDA corrupts compilation flags

  • PR #2537 - Add output of a Bazel Skylark extension for paths and compile options

  • PR #2536 - Add counter exposing total available memory to Windows as well

  • PR #2535 - Remove obsolete support for security

  • Issue #2534 - Remove command line option --hpx:run-agas-server

  • PR #2533 - Pre-cache locality endpoints during bootstrap

  • PR #2532 - Fixing handling of GIDs during serialization preprocessing

  • PR #2531 - Amend uses of the term “functor”

  • PR #2529 - added counter for reading available memory

  • PR #2527 - Facilities to create actions from lambdas

  • PR #2526 - Updated docs: HPX_WITH_EXAMPLES

  • PR #2525 - Remove warnings related to unused captured variables

  • Issue #2524 - CMAKE failed because it is missing: TCMALLOC_LIBRARY TCMALLOC_INCLUDE_DIR

  • PR #2523 - Fixing compose_cb stack overflow

  • PR #2522 - Instead of unlocking, ignore the lock while creating the message handler

  • PR #2521 - Create LPROGRESS_ logging macro to simplify progress tracking and timings

  • PR #2520 - Intel 17 support

  • PR #2519 - Fix components example

  • PR #2518 - Fixing parcel scheduling

  • Issue #2517 - Race condition during Parcel Coalescing Handler creation

  • Issue #2516 - HPX locks up when using at least 256 localities

  • Issue #2515 - error: Install cannot find “/lib/hpx/libparcel_coalescing.so.0.9.99” but I can see that file

  • PR #2514 - Making sure that all continuations of a shared_future are invoked in order

  • PR #2513 - Fixing locks held during suspension

  • PR #2512 - MPI Parcelport improvements and fixes related to the background work changes

  • PR #2511 - Fixing bit-wise (zero-copy) serialization

  • Issue #2509 - Linking errors in hwloc_topology

  • PR #2508 - Added documentation for debugging with core files

  • PR #2506 - Fixing background work invocations

  • PR #2505 - Fix tuple serialization

  • Issue #2504 - Ensure continuations are called in the order they have been attached

  • PR #2503 - Adding serialization support for Vc v2 (datapar)

  • PR #2502 - Resolve various, minor compiler warnings

  • PR #2501 - Some other fixes around cuda examples

  • Issue #2500 - nvcc / cuda clang issue due to a missing -DHPX_WITH_CUDA flag

  • PR #2499 - Adding support for std::array to wait_all and friends

  • PR #2498 - Execute background work as HPX thread

  • PR #2497 - Fixing configuration options for spinlock-deadlock detection

  • PR #2496 - Accounting for different compilers in CrayKNL toolchain file

  • PR #2494 - Adding component base class which ties a component instance to a given executor

  • PR #2493 - Enable controlling amount of pending threads which must be available to allow thread stealing

  • PR #2492 - Adding new command line option --hpx:print-counter-reset

  • PR #2491 - Resolve ambiguities when compiling with APEX

  • PR #2490 - Resuming threads waiting on future with higher priority

  • Issue #2489 - nvcc issue because -std=c++11 appears twice

  • PR #2488 - Adding performance counters exposing the internal idle and busy-loop counters

  • PR #2487 - Allowing for plain suspend to reschedule thread right away

  • PR #2486 - Only flag HPX code for CUDA if HPX_WITH_CUDA is set

  • PR #2485 - Making thread-queue parameters runtime-configurable

  • PR #2484 - Added atomic counter for parcel-destinations

  • PR #2483 - Added priority-queue lifo scheduler

  • PR #2482 - Changing scheduler to steal only if more than a minimal number of tasks are available

  • PR #2481 - Extending command line option --hpx:print-counter-destination to support value ‘none’

  • PR #2479 - Added option to disable signal handler

  • PR #2478 - Making sure the sine performance counter module gets loaded only for the corresponding example

  • Issue #2477 - Breaking at a throw statement

  • PR #2476 - Annotated function

  • PR #2475 - Ensure that using %osthread% during logging will not throw for non-hpx threads

  • PR #2474 - Remove now superficial non_direct actions from base_lco and friends

  • PR #2473 - Refining support for ITTNotify

  • PR #2472 - Some fixes around hpx compute

  • Issue #2470 - redefinition of boost::detail::spinlock

  • Issue #2469 - Dataflow performance issue

  • PR #2468 - Perf docs update

  • PR #2466 - Guarantee to execute remote direct actions on HPX-thread

  • PR #2465 - Improve demo : Async copy and fixed device handling

  • PR #2464 - Adding performance counter exposing instantaneous scheduler utilization

  • PR #2463 - Downcast to future<void>

  • PR #2462 - Fixed usage of ITT-Notify API with Intel Amplifier

  • PR #2461 - Cublas demo

  • PR #2460 - Fixing thread bindings

  • PR #2459 - Make -std=c++11 nvcc flag consistent for in-build and installed versions

  • Issue #2457 - Segmentation fault when registering a partitioned vector

  • PR #2452 - Properly releasing global barrier for unhandled exceptions

  • PR #2451 - Fixing long shutdown times

  • PR #2450 - Attempting to fix initialization errors on newer platforms (Boost V1.63)

  • PR #2449 - Replace BOOST_COMPILER_FENCE with an HPX version

  • PR #2448 - This fixes a possible race in the migration code

  • PR #2445 - Fixing dataflow et.al. for futures or future-ranges wrapped into ref()

  • PR #2444 - Fix segfaults

  • PR #2443 - Issue 2442

  • Issue #2442 - Mismatch between #if/#endif and namespace scope brackets in this_thread_executers.hpp

  • Issue #2441 - undeclared identifier BOOST_COMPILER_FENCE

  • PR #2440 - Knl build

  • PR #2438 - Datapar backend

  • PR #2437 - Adapt algorithm parameter sequence changes from C++17

  • PR #2436 - Adapt execution policy name changes from C++17

  • Issue #2435 - Trunk broken, undefined reference to hpx::thread::interrupt(hpx::thread::id, bool)

  • PR #2434 - More fixes to resource manager

  • PR #2433 - Added versions of hpx::get_ptr taking client side representations

  • PR #2432 - Warning fixes

  • PR #2431 - Adding facility representing set of performance counters

  • PR #2430 - Fix parallel_executor thread spawning

  • PR #2429 - Fix attribute warning for gcc

  • Issue #2427 - Seg fault running octo-tiger with latest HPX commit

  • Issue #2426 - Bug in 9592f5c0bc29806fce0dbe73f35b6ca7e027edcb causes immediate crash in Octo-tiger

  • PR #2425 - Fix nvcc errors due to constexpr specifier

  • Issue #2424 - Async action on component present on hpx::find_here is executing synchronously

  • PR #2423 - Fix nvcc errors due to constexpr specifier

  • PR #2422 - Implementing hpx::this_thread thread data functions

  • PR #2421 - Adding benchmark for wait_all

  • Issue #2420 - Returning object of a component client from another component action fails

  • PR #2419 - Infiniband parcelport

  • Issue #2418 - gcc + nvcc fails to compile code that uses partitioned_vector

  • PR #2417 - Fixing context switching

  • PR #2416 - Adding fixes and workarounds to allow compilation with nvcc/msvc (VS2015up3)

  • PR #2415 - Fix errors coming from hpx compute examples

  • PR #2414 - Fixing msvc12

  • PR #2413 - Enable cuda/nvcc or cuda/clang when using add_hpx_executable()

  • PR #2412 - Fix issue in HPX_SetupTarget.cmake when cuda is used

  • PR #2411 - This fixes the core compilation issues with MSVC12

  • Issue #2410 - undefined reference to opal_hwloc191_hwloc_.....

  • PR #2409 - Fixing locking for channel and receive_buffer

  • PR #2407 - Solving #2402 and #2403

  • PR #2406 - Improve guards

  • PR #2405 - Enable parallel::for_each for iterators returning proxy types

  • PR #2404 - Forward the explicitly given result_type in the hpx invoke

  • Issue #2403 - datapar_execution + zip iterator: lambda arguments aren’t references

  • Issue #2402 - datapar algorithm instantiated with wrong type #2402

  • PR #2401 - Added support for imported libraries to HPX_Libraries.cmake

  • PR #2400 - Use CMake policy CMP0060

  • Issue #2399 - Error trying to push back vector of futures to vector

  • PR #2398 - Allow config #defines to be written out to custom config/defines.hpp

  • Issue #2397 - CMake generated config defines can cause tedious rebuilds category

  • Issue #2396 - BOOST_ROOT paths are not used at link time

  • PR #2395 - Fix target_link_libraries() issue when HPX Cuda is enabled

  • Issue #2394 - Template compilation error using HPX_WITH_DATAPAR_LIBFLATARRAY

  • PR #2393 - Fixing lock registration for recursive mutex

  • PR #2392 - Add keywords in target_link_libraries in hpx_setup_target

  • PR #2391 - Clang goroutines

  • Issue #2390 - Adapt execution policy name changes from C++17

  • PR #2389 - Chunk allocator and pool are not used and are obsolete

  • PR #2388 - Adding functionalities to datapar needed by octotiger

  • PR #2387 - Fixing race condition for early parcels

  • Issue #2386 - Lock registration broken for recursive_mutex

  • PR #2385 - Datapar zip iterator

  • PR #2384 - Fixing race condition in for_loop_reduction

  • PR #2383 - Continuations

  • PR #2382 - add LibFlatArray-based backend for datapar

  • PR #2381 - remove unused typedef to get rid of compiler warnings

  • PR #2380 - Tau cleanup

  • PR #2379 - Can send immediate

  • PR #2378 - Renaming copy_helper/copy_n_helper/move_helper/move_n_helper

  • Issue #2376 - Boost trunk’s spinlock initializer fails to compile

  • PR #2375 - Add support for minimal thread local data

  • PR #2374 - Adding API functions set_config_entry_callback

  • PR #2373 - Add a simple utility for debugging that gives suspended task backtraces

  • PR #2372 - Barrier Fixes

  • Issue #2370 - Can’t wait on a wrapped future

  • PR #2369 - Fixing stable_partition

  • PR #2367 - Fixing find_prefixes for Windows platforms

  • PR #2366 - Testing for experimental/optional only in C++14 mode

  • PR #2364 - Adding set_config_entry

  • PR #2363 - Fix papi

  • PR #2362 - Adding missing macros for new non-direct actions

  • PR #2361 - Improve cmake output to help debug compiler incompatibility check

  • PR #2360 - Fixing race condition in condition_variable

  • PR #2359 - Fixing shutdown when parcels are still in flight

  • Issue #2357 - failed to insert console_print_action into typename_to_id_t registry

  • PR #2356 - Fixing return type of get_iterator_tuple

  • PR #2355 - Fixing compilation against Boost 1 62

  • PR #2354 - Adding serialization for mask_type if CPU_COUNT > 64

  • PR #2353 - Adding hooks to tie in APEX into the parcel layer

  • Issue #2352 - Compile errors when using intel 17 beta (for KNL) on edison

  • PR #2351 - Fix function vtable get_function_address implementation

  • Issue #2350 - Build failure - master branch (4de09f5) with Intel Compiler v17

  • PR #2349 - Enabling zero-copy serialization support for std::vector<>

  • PR #2348 - Adding test to verify #2334 is fixed

  • PR #2347 - Bug fixes for hpx.compute and hpx::lcos::channel

  • PR #2346 - Removing cmake “find” files that are in the APEX cmake Modules

  • PR #2345 - Implemented parallel::stable_partition

  • PR #2344 - Making hpx::lcos::channel usable with basename registration

  • PR #2343 - Fix a couple of examples that failed to compile after recent api changes

  • Issue #2342 - Enabling APEX causes link errors

  • PR #2341 - Removing cmake “find” files that are in the APEX cmake Modules

  • PR #2340 - Implemented all existing datapar algorithms using Boost.SIMD

  • PR #2339 - Fixing 2338

  • PR #2338 - Possible race in sliding semaphore

  • PR #2337 - Adjust osu_latency test to measure window_size parcels in flight at once

  • PR #2336 - Allowing remote direct actions to be executed without spawning a task

  • PR #2335 - Making sure multiple components are properly initialized from arguments

  • Issue #2334 - Cannot construct component with large vector on a remote locality

  • PR #2332 - Fixing hpx::lcos::local::barrier

  • PR #2331 - Updating APEX support to include OTF2

  • PR #2330 - Support for data-parallelism for parallel algorithms

  • Issue #2329 - Coordinate settings in cmake

  • PR #2328 - fix LibGeoDecomp builds with HPX + GCC 5.3.0 + CUDA 8RC

  • PR #2326 - Making scan_partitioner work (for now)

  • Issue #2323 - Constructing a vector of components only correctly initializes the first component

  • PR #2322 - Fix problems that bubbled up after merging #2278

  • PR #2321 - Scalable barrier

  • PR #2320 - Std flag fixes

  • Issue #2319 - -std=c++14 and -std=c++1y with Intel can’t build recent Boost builds due to insufficient C++14 support; don’t enable these flags by default for Intel

  • PR #2318 - Improve handling of --hpx:bind=<bind-spec>

  • PR #2317 - Making sure command line warnings are printed once only

  • PR #2316 - Fixing command line handling for default bind mode

  • PR #2315 - Set id_retrieved if set_id is present

  • Issue #2314 - Warning for requested/allocated thread discrepancy is printed twice

  • Issue #2313 - --hpx:print-bind doesn’t work with --hpx:pu-step

  • Issue #2312 - --hpx:bind range specifier restrictions are overly restrictive

  • Issue #2311 - hpx_0.9.99 out of project build fails

  • PR #2310 - Simplify function registration

  • PR #2309 - Spelling and grammar revisions in documentation (and some code)

  • PR #2306 - Correct minor typo in the documentation

  • PR #2305 - Cleaning up and fixing parcel coalescing

  • PR #2304 - Inspect checks for stream related includes

  • PR #2303 - Add functionality allowing to enumerate threads of given state

  • PR #2301 - Algorithm overloads fix for VS2013

  • PR #2300 - Use <cstdint>, add inspect checks

  • PR #2299 - Replace boost::[c]ref with std::[c]ref, add inspect checks

  • PR #2297 - Fixing compilation with no hw_loc

  • PR #2296 - Hpx compute

  • PR #2295 - Making sure for_loop(execution::par, 0, N, …) is actually executed in parallel

  • PR #2294 - Throwing exceptions if the runtime is not up and running

  • PR #2293 - Removing unused parcel port code

  • PR #2292 - Refactor function vtables

  • PR #2291 - Fixing 2286

  • PR #2290 - Simplify algorithm overloads

  • PR #2289 - Adding performance counters reporting parcel related data on a per-action basis

  • Issue #2288 - Remove dormant parcelports

  • Issue #2286 - adjustments to parcel handling to support parcelports that do not need a connection cache

  • PR #2285 - add CMake option to disable package export

  • PR #2283 - Add more inspect checks for use of deprecated components

  • Issue #2282 - Arithmetic exception in executor static chunker

  • Issue #2281 - For loop doesn’t parallelize

  • PR #2280 - Fixing 2277: build failure with PAPI

  • PR #2279 - Child vs parent stealing

  • Issue #2277 - master branch build failure (53c5b4f) with papi

  • PR #2276 - Compile time launch policies

  • PR #2275 - Replace boost::chrono with std::chrono in interfaces

  • PR #2274 - Replace most uses of Boost.Assign with initializer list

  • PR #2273 - Fixed typos

  • PR #2272 - Inspect checks

  • PR #2270 - Adding test verifying -Ihpx.os_threads=all

  • PR #2269 - Added inspect check for now obsolete boost type traits

  • PR #2268 - Moving more code into source files

  • Issue #2267 - Add inspect support to deprecate Boost.TypeTraits

  • PR #2265 - Adding channel LCO

  • PR #2264 - Make support for std::ref mandatory

  • PR #2263 - Constrain tuple_member forwarding constructor

  • Issue #2262 - Test hpx.os_threads=all

  • Issue #2261 - OS X: Error: no matching constructor for initialization of ‘hpx::lcos::local::condition_variable_any’

  • Issue #2260 - Make support for std::ref mandatory

  • PR #2259 - Remove most of Boost.MPL, Boost.EnableIf and Boost.TypeTraits

  • PR #2258 - Fixing #2256

  • PR #2257 - Fixing launch process

  • Issue #2256 - Actions are not registered if not invoked

  • PR #2255 - Coalescing histogram

  • PR #2254 - Silence explicit initialization in copy-constructor warnings

  • PR #2253 - Drop support for GCC 4.6 and 4.7

  • PR #2252 - Prepare V1.0

  • PR #2251 - Convert to 0.9.99

  • PR #2249 - Adding iterator_facade and iterator_adaptor

  • Issue #2248 - Need a feature to yield to a new task immediately

  • PR #2246 - Adding split_future

  • PR #2245 - Add an example for handing over a component instance to a dynamically launched locality

  • Issue #2243 - Add example demonstrating AGAS symbolic name registration

  • Issue #2242 - pkgconfig test broken on CentOS 7 / Boost 1.61

  • Issue #2241 - Compilation error for partitioned vector in hpx_compute branch

  • PR #2240 - Fixing termination detection on one locality

  • Issue #2239 - Create a new facility lcos::split_all

  • Issue #2236 - hpx::cout vs. std::cout

  • PR #2232 - Implement local-only primary namespace service

  • Issue #2147 - would like to know how much data is being routed by particular actions

  • Issue #2109 - Warning while compiling hpx

  • Issue #1973 - Setting INTERFACE_COMPILE_OPTIONS for hpx_init in CMake taints Fortran_FLAGS

  • Issue #1864 - run_guarded using bound function ignores reference

  • Issue #1754 - Running with TCP parcelport causes immediate crash or freeze

  • Issue #1655 - Enable zip_iterator to be used with Boost traversal iterator categories

  • Issue #1591 - Optimize AGAS for shared memory only operation

  • Issue #1401 - Need an efficient infiniband parcelport

  • Issue #1125 - Fix the IPC parcelport

  • Issue #839 - Refactor ibverbs and shmem parcelport

  • Issue #702 - Add instrumentation of parcel layer

  • Issue #668 - Implement ispc task interface

  • Issue #533 - Thread queue/deque internal parameters should be runtime configurable

  • Issue #475 - Create a means of combining performance counters into querysets