checkpoint

A common need of users is to periodically back up an application. This practice provides resiliency and potential restart points in code. We have developed the concept of a checkpoint to support this use case.

Defined in hpx/util/checkpoint.hpp, a checkpoint is an object that holds a serialized version of an object or set of objects at a particular moment in time. This representation can be kept in memory for later use, or it can be written to disk for storage and/or recovery at a later point. To create a checkpoint and fill it with data we use a function called save_checkpoint. In code the function looks like this:

hpx::future<hpx::util::checkpoint> hpx::util::save_checkpoint(a, b, c, ...);

save_checkpoint takes arbitrary data containers such as int, double, float, vector, and future and serializes them into a newly created checkpoint object. This function returns a future to a checkpoint containing the data. Let us look at a simple use case below:

using hpx::util::checkpoint;
using hpx::util::save_checkpoint;

std::vector<int> vec{1,2,3,4,5};
hpx::future<checkpoint> f = save_checkpoint(vec);

Once the future is ready, the checkpoint object will contain the vector vec and its five elements.

It is also possible to modify the launch policy used by save_checkpoint. This is accomplished by passing a launch policy as the first argument. It is important to note that passing hpx::launch::sync will cause save_checkpoint to return a checkpoint instead of a future to a checkpoint. All other policies passed to save_checkpoint will return a future to a checkpoint.

Sometimes checkpoints must be declared before they are used. save_checkpoint allows users to move pre-created checkpoints into the function as long as they are the first container passed to the function. (When a launch policy is used, the checkpoint immediately follows the launch policy.) An example of these features can be found below:

    char character = 'd';
    int integer = 10;
    float flt = 10.01f;
    bool boolean = true;
    std::string str = "I am a string of characters";
    std::vector<char> vec(str.begin(), str.end());
    checkpoint archive;

    // Save several objects into the pre-created checkpoint
    hpx::shared_future<checkpoint> f_archive = save_checkpoint(
        std::move(archive), character, integer, flt, boolean, str, vec);

Now that we can create checkpoints, we must be able to restore the objects they contain into memory. This is accomplished by the function restore_checkpoint. This function takes a checkpoint and fills its data into the containers it is provided. It is important to remember that the containers must be passed in the same order in which they were placed into the checkpoint. For clarity see the example below:

    char character2;
    int integer2;
    float flt2;
    bool boolean2;
    std::string str2;
    std::vector<char> vec2;

    restore_checkpoint(
        f_archive.get(), character2, integer2, flt2, boolean2, str2, vec2);
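
The ordering requirement is easier to see with a toy model. The sketch below is not how HPX serializes data; it is a minimal illustration, assuming a flat byte buffer and trivially copyable types only. All names in it (toy_archive, toy_put, toy_get, toy_save, toy_restore, toy_order_demo) are hypothetical. Because the buffer carries no type or name information, the reader can only recover values by consuming them in the order in which they were written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical toy archive: a plain byte buffer, NOT hpx::util::checkpoint.
using toy_archive = std::vector<char>;

// Append the raw bytes of one trivially copyable value to the buffer.
template <typename T>
void toy_put(toy_archive& ar, T const& value)
{
    static_assert(std::is_trivially_copyable<T>::value,
        "toy_put only handles trivially copyable types");
    char const* p = reinterpret_cast<char const*>(&value);
    ar.insert(ar.end(), p, p + sizeof(T));
}

// Copy one value out of the buffer at position pos, then advance pos.
template <typename T>
void toy_get(toy_archive const& ar, std::size_t& pos, T& value)
{
    static_assert(std::is_trivially_copyable<T>::value,
        "toy_get only handles trivially copyable types");
    if (pos + sizeof(T) > ar.size())
        throw std::out_of_range("toy_get: archive exhausted");
    std::memcpy(&value, ar.data() + pos, sizeof(T));
    pos += sizeof(T);
}

// Write all arguments, left to right, into a new buffer.
template <typename... Ts>
toy_archive toy_save(Ts const&... values)
{
    toy_archive ar;
    (toy_put(ar, values), ...);    // C++17 fold over the comma operator
    return ar;
}

// Read the arguments back, left to right, from the start of the buffer.
template <typename... Ts>
void toy_restore(toy_archive const& ar, Ts&... values)
{
    std::size_t pos = 0;
    (toy_get(ar, pos, values), ...);
}

// Usage sketch: restoring in the same order recovers the same values.
void toy_order_demo()
{
    toy_archive ar = toy_save('d', 10, 10.01f, true);
    char c = 0;
    int i = 0;
    float f = 0.0f;
    bool b = false;
    toy_restore(ar, c, i, f, b);
    assert(c == 'd' && i == 10 && f == 10.01f && b);
}
```

Passing the targets to toy_restore in a different order would reinterpret the same bytes as different types, which is the kind of mismatch the ordering rule for restore_checkpoint guards against.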

The core utility of checkpoint is its ability to make certain data persistent. Often this means that the data needs to be stored in an object, such as a file, for later use. For these cases we have provided two solutions: stream operator overloads and access iterators.

We have created two stream overloads, operator<< and operator>>, to stream data out of and into a checkpoint. You can see an example of the overloads in use below:

    double a9 = 1.0, b9 = 1.1, c9 = 1.2;
    std::ofstream test_file_9("test_file_9.txt");
    hpx::future<checkpoint> f_9 = save_checkpoint(a9, b9, c9);
    test_file_9 << f_9.get();
    test_file_9.close();

    double a9_1, b9_1, c9_1;
    std::ifstream test_file_9_1("test_file_9.txt");
    checkpoint archive9;
    test_file_9_1 >> archive9;
    restore_checkpoint(archive9, a9_1, b9_1, c9_1);

This is the primary way to move data into and out of a checkpoint. Users should be cautious, however, when using a stream operator to load data and another function to remove it (or vice versa). operator<< and operator>> rely on the stream's .write() and .read() functions, respectively. So that operator>> knows how much data to read from the std::istream, operator<< writes the size of the checkpoint before writing the checkpoint data. Correspondingly, operator>> reads the size of the stored data before reading the data into a new instance of checkpoint. As long as the user employs both operator<< and operator>> to stream the data, this detail can be ignored.

Important

Be careful when mixing operator<< and operator>> with other facilities for reading from and writing to a checkpoint. operator<< writes an extra value, the size of the checkpoint, and operator>> reads this value back separately. When the two operators are used together, the user will not encounter any issues and can safely ignore this detail.
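
To make the size prefix concrete, here is a minimal sketch of length-prefixed framing over standard streams. It only illustrates the idea described above and is not the HPX implementation: the helper names write_framed, read_framed and framed_round_trip_demo are hypothetical, a std::vector<char> stands in for the checkpoint's bytes, and the 64-bit width of the prefix is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical helper: write a 64-bit byte count, then the bytes themselves.
// The count is written in host byte order, so this toy format is not
// portable between machines with different endianness.
void write_framed(std::ostream& os, std::vector<char> const& bytes)
{
    std::int64_t const size = static_cast<std::int64_t>(bytes.size());
    os.write(reinterpret_cast<char const*>(&size), sizeof(size));
    os.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}

// Hypothetical helper: read the byte count first, then exactly that many bytes.
std::vector<char> read_framed(std::istream& is)
{
    std::int64_t size = 0;
    is.read(reinterpret_cast<char*>(&size), sizeof(size));
    if (!is || size < 0)
        throw std::runtime_error("read_framed: missing or invalid size prefix");

    std::vector<char> bytes(static_cast<std::size_t>(size));
    is.read(bytes.data(), static_cast<std::streamsize>(size));
    if (!is)
        throw std::runtime_error("read_framed: truncated payload");
    return bytes;
}

// Usage sketch: frame a payload into an in-memory stream and read it back.
void framed_round_trip_demo()
{
    std::vector<char> const payload{'a', 'b', '\0', 'c'};
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    write_framed(ss, payload);

    // The stream holds the prefix plus the payload, not just the payload.
    assert(ss.str().size() == sizeof(std::int64_t) + payload.size());
    assert(read_framed(ss) == payload);
}
```

In this sketch a reader that skips read_framed and copies the whole stream would also pick up the prefix bytes, while a reader that expects a prefix on an unframed stream would misinterpret the first payload bytes as a size. That is the mismatch the note above warns about.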

Users may also move the data into and out of a checkpoint using the exposed .begin() and .end() iterators. An example of this use case is illustrated below.

    std::ofstream test_file_7("checkpoint_test_file.txt");
    std::vector<float> vec7{1.02f, 1.03f, 1.04f, 1.05f};
    hpx::future<checkpoint> fut_7 = save_checkpoint(vec7);
    checkpoint archive7 = fut_7.get();
    std::copy(archive7.begin(),    // Write the checkpoint's bytes
        archive7.end(),            // to the ofstream, i.e. the file
        std::ostream_iterator<char>(test_file_7));
    test_file_7.close();

    std::vector<float> vec7_1;
    std::vector<char> char_vec;
    std::ifstream test_file_7_1("checkpoint_test_file.txt");
    if (test_file_7_1)
    {
        test_file_7_1.seekg(0, test_file_7_1.end);
        auto length = test_file_7_1.tellg();
        test_file_7_1.seekg(0, test_file_7_1.beg);
        char_vec.resize(length);
        test_file_7_1.read(char_vec.data(), length);
    }
    checkpoint archive7_1(std::move(char_vec));    // Construct a checkpoint from the raw bytes
    restore_checkpoint(archive7_1, vec7_1);
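
Serialized data is binary, so on platforms that translate line endings in text mode it can help to open the files with std::ios::binary. The following sketch shows one way to write and read a byte range exactly, using only the standard library. The helper names write_bytes, read_bytes and byte_round_trip_demo, and the file name, are hypothetical. A std::vector<char> stands in for the range between a checkpoint's .begin() and .end().

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: write a byte range to a file with no translation.
template <typename Iter>
bool write_bytes(std::string const& path, Iter first, Iter last)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        return false;
    // ostreambuf_iterator performs unformatted output and records failures.
    auto it = std::copy(first, last, std::ostreambuf_iterator<char>(out));
    return !it.failed();
}

// Hypothetical helper: read an entire file into a byte vector.
std::vector<char> read_bytes(std::string const& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("read_bytes: cannot open " + path);
    return std::vector<char>(
        std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>{});
}

// Usage sketch: bytes such as '\n' and '\0' survive the round trip unchanged.
void byte_round_trip_demo()
{
    std::vector<char> const bytes{'a', '\n', '\0', ' ', 'z'};
    std::string const path = "byte_round_trip_demo.bin";    // hypothetical name

    bool const ok = write_bytes(path, bytes.begin(), bytes.end());
    assert(ok);
    (void) ok;    // silence unused-variable warnings when NDEBUG is defined
    assert(read_bytes(path) == bytes);

    std::remove(path.c_str());    // clean up the temporary file
}
```

With a real checkpoint, the same pattern would pass its .begin() and .end() iterators to the writer and construct a checkpoint from the returned vector, as the example above does with char_vec.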

Checkpointing components

save_checkpoint and restore_checkpoint are also able to store components inside checkpoints. This can be done in one of two ways. First, a client of the component can be passed to save_checkpoint. To resurrect the component, the user passes a client instance to restore_checkpoint.

This technique is demonstrated below:

    // Try to checkpoint and restore a component with a client
    std::vector<int> vec3{10, 10, 10, 10, 10};

    // Create a component instance through client constructor
    data_client D(hpx::find_here(), std::move(vec3));
    hpx::future<checkpoint> f3 = save_checkpoint(D);

    // Create a new client
    data_client E;

    // Restore server inside client instance
    restore_checkpoint(f3.get(), E);

The second way a user can save a component is by passing a shared_ptr to the component to save_checkpoint. This component can be resurrected by creating a new instance of the component type and passing a shared_ptr to the new instance to restore_checkpoint.

This technique is demonstrated below:

    // Checkpoint a component using a shared_ptr
    std::vector<int> vec{1, 2, 3, 4, 5};
    data_client A(hpx::find_here(), std::move(vec));

    // Checkpoint Server
    hpx::id_type old_id = A.get_id();

    hpx::future<std::shared_ptr<data_server>> f_a_ptr =
        hpx::get_ptr<data_server>(A.get_id());
    std::shared_ptr<data_server> a_ptr = f_a_ptr.get();
    hpx::future<checkpoint> f = save_checkpoint(a_ptr);

    // Restore Server
    // Create a new server instance
    std::shared_ptr<data_server> b_server;
    restore_checkpoint(f.get(), b_server);