Using the LCI parcelport#
Basic information#
The Lightweight Communication Interface (LCI) is an ongoing research project that aims to provide efficient communication support for applications with irregular and asynchronous communication patterns, such as graph analysis, sparse linear algebra, and task-based runtimes, on modern parallel architectures. Its features include (a) support for a richer set of communication primitives, such as two-sided send/recv and one-sided (dynamic or direct) remote put/get, (b) better multi-threaded performance, (c) explicit user control of communication resources, and (d) flexible signaling mechanisms such as synchronizers, completion queues, and active message handlers. It is designed as a low-level communication library to be used by higher-level libraries and frameworks.
The LCI parcelport is an experimental parcelport that aims to provide the best possible communication performance on high-performance computing platforms. Compared to the MPI parcelport, it uses far fewer messages and memory copies to transfer an HPX parcel over the network. Its message transmission path involves a minimal number of synchronization points and is almost lock-free. It is expected to be significantly faster than the MPI parcelport.
Build HPX with the LCI parcelport#
When building HPX, you can specify a set of CMake variables to enable and configure the LCI parcelport. The most important and frequently used variables are listed below; an example configure command follows the list.
- HPX_WITH_PARCELPORT_LCI#
Enable the LCI parcelport. This enables the use of LCI for networking operations in the HPX runtime. The default value is OFF because LCI is not available on all systems and/or requires an additional dependency. However, this experimental parcelport may provide better performance than the MPI parcelport. You must set this variable to ON in order to use the LCI parcelport. All the following variables only make sense when this variable is set to ON.
- HPX_WITH_FETCH_LCI#
Use FetchContent to fetch LCI. The default value is OFF. If this option is set to OFF, you need to install your own LCI library, and HPX will try to find it using CMake find_package. You can specify the location of the LCI installation through the environment variable LCI_ROOT. Refer to the LCI README for how to install LCI. If this option is set to ON, HPX will fetch and build LCI for you. You can use the following CMake variables to configure this behavior for your platform.
- HPX_WITH_LCI_TAG#
This variable only takes effect when HPX_WITH_FETCH_LCI is set to ON and FETCHCONTENT_SOURCE_DIR_LCI is not set. HPX will fetch LCI from its GitHub repository. This variable controls the branch or tag from which LCI is fetched. The default value is set conservatively for the stability of HPX, but users are welcome to set this variable to master for potentially better performance.
- FETCHCONTENT_SOURCE_DIR_LCI#
This variable only takes effect when HPX_WITH_FETCH_LCI is set to ON. When it is defined, HPX_WITH_LCI_TAG will be ignored. It accepts a path to a local copy of the LCI source code, and HPX will fetch and build LCI from there.
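For example, a minimal configure command that enables the LCI parcelport and lets HPX fetch and build LCI automatically could look like the following sketch (the HPX source path is a placeholder; add your usual compiler, Boost, and installation settings as needed):
$ cmake -DHPX_WITH_PARCELPORT_LCI=ON -DHPX_WITH_FETCH_LCI=ON /path/to/hpx
If you prefer to build LCI yourself, leave HPX_WITH_FETCH_LCI at OFF and point the environment variable LCI_ROOT at your LCI installation (again, the path is a placeholder):
$ export LCI_ROOT=/path/to/lci/install
$ cmake -DHPX_WITH_PARCELPORT_LCI=ON /path/to/hpx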
Run HPX with the LCI parcelport#
We use the same mechanisms as MPI to launch the LCI parcelport, so you can run the LCI parcelport in the same way you run the MPI parcelport: typically through hpxrun.py, mpirun, or srun. hpxrun.py serves as a wrapper around mpirun and srun.
If you are using hpxrun.py, pass -p lci to the script. You also need to pass either -r mpi or -r srun to select the correct run wrapper for your platform, as in the example below.
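For example, launching a hypothetical application binary my_hpx_app on two localities through srun might look like the following (the binary name and the -l locality count are illustrative assumptions, not part of this documentation):
$ hpxrun.py -p lci -r srun -l 2 my_hpx_app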
If you are using mpirun or srun directly, you can just pass --hpx:ini=hpx.parcel.lci.priority=1000, --hpx:ini=hpx.parcel.lci.enable=1, and --hpx:ini=hpx.parcel.bootstrap=lci to the HPX applications.
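For example, a direct srun launch of the same hypothetical binary on two tasks could look like this sketch (the task count and binary name are placeholders):
$ srun -n 2 my_hpx_app --hpx:ini=hpx.parcel.lci.priority=1000 --hpx:ini=hpx.parcel.lci.enable=1 --hpx:ini=hpx.parcel.bootstrap=lci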
The hpxrun.py argument -r none (the default option for the run wrapper) and its corresponding HPX arguments --hpx:hpx and --hpx:agas do not work for the MPI or the LCI parcelport.
Performance tuning of the LCI parcelport#
We encourage users to set the following environment variables when using the LCI parcelport to get better performance.
$ export LCI_SERVER_MAX_SENDS=1024
$ export LCI_SERVER_MAX_RECVS=4096
$ export LCI_SERVER_NUM_PKTS=65536
$ export LCI_SERVER_MAX_CQES=65536
$ export LCI_PACKET_SIZE=12288
This setting needs roughly 800 MB of memory per process. The memory consumption mainly comes from the packets, whose total size can be calculated as LCI_SERVER_NUM_PKTS x LCI_PACKET_SIZE.
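With the values above, that is 65536 x 12288 bytes = 805,306,368 bytes, i.e. 768 MiB (roughly 800 MB) per process.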