Using the LCI parcelport

Basic information

The Lightweight Communication Interface (LCI) is an ongoing research project that aims to provide efficient support for applications with irregular and asynchronous communication patterns, such as graph analysis, sparse linear algebra, and task-based runtimes, on modern parallel architectures. Its features include (a) support for more communication primitives, such as two-sided send/recv and one-sided (dynamic or direct) remote put/get; (b) better multi-threaded performance; (c) explicit user control of communication resources; and (d) flexible signaling mechanisms, such as synchronizers, completion queues, and active message handlers. It is designed as a low-level communication library to be used by higher-level libraries and frameworks.

The LCI parcelport is an experimental parcelport that aims to provide the best possible communication performance on high-performance computing platforms. Compared to the MPI parcelport, it uses far fewer messages and memory copies to transfer an HPX parcel over the network. Its message transmission path involves a minimal number of synchronization points and is almost lock-free. It is expected to be much faster than the MPI parcelport.

Build HPX with the LCI parcelport

While building HPX, you can specify a set of CMake variables to enable and configure the LCI parcelport. The most important and frequently used variables are listed below.

HPX_WITH_PARCELPORT_LCI

Enable the LCI parcelport, which lets the HPX runtime use LCI for networking operations. The default value is OFF because LCI is not available on all systems and adds an extra dependency. However, this experimental parcelport may provide better performance than the MPI parcelport. You must set this variable to ON in order to use the LCI parcelport. All of the following variables only take effect when this variable is set to ON.

HPX_WITH_FETCH_LCI

Use FetchContent to fetch LCI. The default value is OFF. If this option is set to OFF, you need to install the LCI library yourself, and HPX will try to find it using CMake find_package. You can specify the location of the LCI installation with the environment variable LCI_ROOT. Refer to the LCI README for how to install LCI. If this option is set to ON, HPX will fetch and build LCI for you. You can use the following CMake variables to configure this behavior for your platform.

HPX_WITH_LCI_TAG

This variable only takes effect when HPX_WITH_FETCH_LCI is set to ON and FETCHCONTENT_SOURCE_DIR_LCI is not set. In that case, HPX fetches LCI from its GitHub repository, and this variable controls which branch or tag of LCI is fetched.

FETCHCONTENT_SOURCE_DIR_LCI

This variable only takes effect when HPX_WITH_FETCH_LCI is set to ON. When it is defined, HPX_WITH_LCI_TAG is ignored. It accepts a path to a local copy of the LCI source code, and HPX will build LCI from there instead of downloading it. The default value of HPX_WITH_LCI_TAG is set conservatively for the stability of HPX, but users are welcome to set HPX_WITH_LCI_TAG to master for potentially better performance.
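As a rough illustration of how the CMake variables described in this section fit together, a configure step might look like the sketch below. The source, build, and install paths are placeholders chosen for this example, not locations HPX prescribes, and any other options your HPX build needs are omitted.

```shell
# Placeholder paths (assumptions): adjust to your HPX checkout and build directory.
HPX_SRC="$HOME/src/hpx"
HPX_BUILD="$HOME/build/hpx"

# Option 1: let HPX fetch and build LCI itself.
cmake -S "$HPX_SRC" -B "$HPX_BUILD" \
  -DHPX_WITH_PARCELPORT_LCI=ON \
  -DHPX_WITH_FETCH_LCI=ON

# Option 2: use an LCI library you installed yourself.
# LCI_ROOT points at its install prefix (placeholder path).
# export LCI_ROOT="$HOME/opt/lci"
# cmake -S "$HPX_SRC" -B "$HPX_BUILD" \
#   -DHPX_WITH_PARCELPORT_LCI=ON \
#   -DHPX_WITH_FETCH_LCI=OFF
```

With Option 1, adding -DHPX_WITH_LCI_TAG=master selects the master branch of LCI, and -DFETCHCONTENT_SOURCE_DIR_LCI=<path> builds from a local LCI source tree instead.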

Run HPX with the LCI parcelport

The LCI parcelport is launched with the same mechanisms as the MPI parcelport, so you can run it the same way. Typically, this means hpxrun.py, mpirun, or srun.

If you are using hpxrun.py, pass --parcelport lci to the script.

If you are using mpirun or srun, pass --hpx:ini=hpx.parcel.lci.priority=1000, --hpx:ini=hpx.parcel.lci.enable=1, and --hpx:ini=hpx.parcel.bootstrap=lci to the HPX application.
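For example, a launch with mpirun might look like the following sketch. The application name ./my_hpx_app and the process count are hypothetical; only the three --hpx:ini options come from the text above.

```shell
# Hypothetical application binary and process count; substitute your own.
APP=./my_hpx_app
NPROCS=2

# Select and enable the LCI parcelport, and use it for bootstrapping.
mpirun -n "$NPROCS" "$APP" \
  --hpx:ini=hpx.parcel.lci.priority=1000 \
  --hpx:ini=hpx.parcel.lci.enable=1 \
  --hpx:ini=hpx.parcel.bootstrap=lci

# The same options apply under Slurm:
# srun -n "$NPROCS" "$APP" \
#   --hpx:ini=hpx.parcel.lci.priority=1000 \
#   --hpx:ini=hpx.parcel.lci.enable=1 \
#   --hpx:ini=hpx.parcel.bootstrap=lci
```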

Performance tuning of the LCI parcelport

We encourage users to set the following environment variables when using the LCI parcelport to get better performance.

$ export LCI_SERVER_MAX_SENDS=1024
$ export LCI_SERVER_MAX_RECVS=4096
$ export LCI_SERVER_NUM_PKTS=65536
$ export LCI_SERVER_MAX_CQES=65536
$ export LCI_PACKET_SIZE=12288

This configuration needs roughly 800 MB of memory per process. Most of that memory is consumed by the packet pool, whose size can be calculated as LCI_SERVER_NUM_PKTS × LCI_PACKET_SIZE.
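The packet-pool estimate can be checked with plain shell arithmetic. The sketch below uses the two values from the recommended settings; the variable names packet_bytes and packet_mib are introduced here only for illustration.

```shell
# Values taken from the recommended settings above.
LCI_SERVER_NUM_PKTS=65536
LCI_PACKET_SIZE=12288

# Packet pool size = number of packets x size of each packet (in bytes).
packet_bytes=$((LCI_SERVER_NUM_PKTS * LCI_PACKET_SIZE))
packet_mib=$((packet_bytes / 1024 / 1024))

echo "packet pool: ${packet_bytes} bytes (${packet_mib} MiB)"
# prints "packet pool: 805306368 bytes (768 MiB)"
```

If memory is tight, the same calculation shows how lowering either value shrinks the pool; whether that affects performance on a given system would need to be measured.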