checkpoint
A common need of users is to periodically back up an application. This practice
provides resiliency and potential restart points in code. HPX utilizes the
concept of a checkpoint to support this use case.
Found in hpx/util/checkpoint.hpp, checkpoints are defined as objects that hold
a serialized version of an object or set of objects at a particular moment in
time. This representation can be stored in memory for later use or it can be
written to disk for storage and/or recovery at a later point. To create and
fill this object with data, users call a function named save_checkpoint. In
code the function looks like this:
hpx::future<hpx::util::checkpoint> hpx::util::save_checkpoint(a, b, c, ...);
save_checkpoint takes arbitrary data containers, such as int, double, float,
vector, and future, and serializes them into a newly created checkpoint object.
This function returns a future to a checkpoint containing the data. Here’s an
example of a simple use case:
using hpx::util::checkpoint;
using hpx::util::save_checkpoint;
std::vector<int> vec{1,2,3,4,5};
hpx::future<checkpoint> f = save_checkpoint(vec);
Once the future is ready, the checkpoint object will contain the vector vec and
its five elements.
prepare_checkpoint takes the same arbitrary data containers as save_checkpoint,
such as int, double, float, vector, and future, and calculates the buffer space
needed for the checkpoint that would be created if save_checkpoint were called
with the same arguments. This function returns a future to a checkpoint that is
appropriately initialized. Here’s an example of a simple use case:
using hpx::util::checkpoint;
using hpx::util::prepare_checkpoint;
std::vector<int> vec{1,2,3,4,5};
hpx::future<checkpoint> f = prepare_checkpoint(vec);
Once the future is ready, the checkpoint object will be initialized with an appropriately sized internal buffer.
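The idea of sizing a buffer before filling it can be illustrated without HPX. The following is a minimal sketch in standard C++, not HPX's implementation: the names toy_required_size and toy_save and the byte layout (an element count followed by the raw elements) are assumptions made up for this illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Pass 1 (hypothetical helper): count the bytes a vector<int> would occupy
// in a toy format made of an element-count header plus the raw elements.
inline std::size_t toy_required_size(std::vector<int> const& v)
{
    return sizeof(std::size_t) + v.size() * sizeof(int);
}

// Pass 2 (hypothetical helper): serialize into a buffer that was sized up
// front, so the buffer never has to grow while data is being written.
inline std::vector<char> toy_save(std::vector<int> const& v)
{
    std::vector<char> buffer(toy_required_size(v));
    std::size_t const count = v.size();
    std::memcpy(buffer.data(), &count, sizeof(count));
    if (!v.empty())
    {
        std::memcpy(buffer.data() + sizeof(count), v.data(),
            v.size() * sizeof(int));
    }
    return buffer;
}
```

Calling toy_required_size on a vector returns the same number of bytes that toy_save produces for it, which mirrors the relationship between prepare_checkpoint and save_checkpoint described above.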
It is also possible to modify the launch policy used by save_checkpoint. This
is accomplished by passing a launch policy as the first argument. It is
important to note that passing hpx::launch::sync will cause save_checkpoint to
return a checkpoint instead of a future to a checkpoint. All other policies
passed to save_checkpoint will return a future to a checkpoint.
Sometimes checkpoints must be declared before they are used. save_checkpoint
allows users to move pre-created checkpoints into the function as long as they
are the first container passed into the function (when a launch policy is
used, the checkpoint immediately follows the launch policy). An example of
these features can be found below:
char character = 'd';
int integer = 10;
float flt = 10.01f;
bool boolean = true;
std::string str = "I am a string of characters";
std::vector<char> vec(str.begin(), str.end());
checkpoint archive;
// Move the pre-created checkpoint in as the first argument
hpx::shared_future<checkpoint> f_archive = save_checkpoint(
    std::move(archive), character, integer, flt, boolean, str, vec);
Once users can create checkpoints, they must be able to restore the objects
they contain into memory. This is accomplished by the function
restore_checkpoint. This function takes a checkpoint and fills its data into
the containers it is provided. It is important to remember that the containers
must be passed in the same order in which they were placed into the checkpoint.
For clarity see the example below:
char character2;
int integer2;
float flt2;
bool boolean2;
std::string str2;
std::vector<char> vec2;
// f_archive is the shared_future returned by save_checkpoint above
restore_checkpoint(
    f_archive.get(), character2, integer2, flt2, boolean2, str2, vec2);
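Why the order matters can be seen in a toy archive that stores values back to back with no type tags. This is a minimal sketch in standard C++ under that assumption; toy_archive and round_trip_in_order are hypothetical names for this illustration and say nothing about how HPX lays out a checkpoint internally.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Toy byte archive (hypothetical, not an HPX type): values are appended
// back to back without type tags, so only the read order identifies them.
// No bounds checking is done; this is an illustration only.
struct toy_archive
{
    std::vector<char> bytes;
    std::size_t read_pos = 0;

    template <typename T>
    void put(T const& value)
    {
        static_assert(std::is_trivially_copyable<T>::value,
            "the toy archive only handles trivially copyable types");
        char const* p = reinterpret_cast<char const*>(&value);
        bytes.insert(bytes.end(), p, p + sizeof(T));
    }

    template <typename T>
    void get(T& value)
    {
        static_assert(std::is_trivially_copyable<T>::value,
            "the toy archive only handles trivially copyable types");
        std::memcpy(&value, bytes.data() + read_pos, sizeof(T));
        read_pos += sizeof(T);
    }
};

// Saves an int followed by a double, then restores them in the same order.
// Returns true when both values come back unchanged.
inline bool round_trip_in_order(int i, double d)
{
    toy_archive ar;
    ar.put(i);
    ar.put(d);

    int i2 = 0;
    double d2 = 0.0;
    ar.get(i2);    // first out: the int, because it was first in
    ar.get(d2);    // second out: the double
    return i2 == i && d2 == d;
}
```

Reading the double first in this toy format would consume the int's bytes as part of a double, which is the kind of mismatch the ordering rule for restore_checkpoint guards against.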
The core utility of checkpoint is in its ability to make certain data
persistent. Often, this means that the data needs to be stored in an object,
such as a file, for later use. HPX offers two ways to do this: stream operator
overloads and access iterators.
HPX contains two stream overloads, operator<< and operator>>, to stream data
out of and into a checkpoint. Here is an example of the overloads in use:
double a9 = 1.0, b9 = 1.1, c9 = 1.2;
std::ofstream test_file_9("test_file_9.txt");
hpx::future<checkpoint> f_9 = save_checkpoint(a9, b9, c9);
test_file_9 << f_9.get();
test_file_9.close();
double a9_1, b9_1, c9_1;
std::ifstream test_file_9_1("test_file_9.txt");
checkpoint archive9;
test_file_9_1 >> archive9;
restore_checkpoint(archive9, a9_1, b9_1, c9_1);
This is the primary way to move data into and out of a checkpoint. It is
important to note, however, that users should be cautious when using a stream
operator to load data and another function to remove it (or vice versa).
operator<< and operator>> rely on a .write() and a .read() function,
respectively. So that the reader knows how much data to take from the
std::istream, operator<< writes the size of the checkpoint before writing the
checkpoint data. Correspondingly, operator>> reads the size of the stored data
before reading the data into a new instance of checkpoint. As long as the user
employs both operator<< and operator>> to stream the data, this detail can be
ignored.
Important
Be careful when mixing operator<< and operator>> with other facilities to read
and write to a checkpoint. operator<< writes an extra variable, and operator>>
reads this variable back separately. When the two operators are used together,
the user will not encounter any issues and can safely ignore this detail.
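The size-then-data layout described above can be sketched in standard C++. This is only an illustration of the general technique: the function names, the 8-byte header width, and the use of native byte order are assumptions for this sketch, and HPX's actual stream format may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes an 8-byte size header followed by the payload bytes
// (hypothetical helper, header width chosen for this sketch).
inline void write_framed(std::ostream& os, std::vector<char> const& payload)
{
    std::uint64_t const size = payload.size();
    os.write(reinterpret_cast<char const*>(&size), sizeof(size));
    os.write(payload.data(), static_cast<std::streamsize>(payload.size()));
}

// Reads the size header first, then exactly that many payload bytes.
inline std::vector<char> read_framed(std::istream& is)
{
    std::uint64_t size = 0;
    is.read(reinterpret_cast<char*>(&size), sizeof(size));
    std::vector<char> payload(static_cast<std::size_t>(size));
    is.read(payload.data(), static_cast<std::streamsize>(size));
    return payload;
}

// Round trip through an in-memory stream; true if the payload survives.
inline bool framed_round_trip(std::string const& text)
{
    std::vector<char> const in(text.begin(), text.end());
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    write_framed(ss, in);
    return read_framed(ss) == in;
}

// Number of bytes that end up in the stream: header plus payload.
inline std::size_t framed_length(std::string const& text)
{
    std::vector<char> const in(text.begin(), text.end());
    std::ostringstream os(std::ios::out | std::ios::binary);
    write_framed(os, in);
    return os.str().size();
}
```

Because the stream holds more bytes than the payload alone, a reader that copies the raw stream contents straight into a buffer would also pick up the header, which is why the writing and reading sides need to agree on the same facility.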
Users may also move the data into and out of a checkpoint using the exposed
.begin() and .end() iterators. An example of this use case is illustrated
below.
std::ofstream test_file_7("checkpoint_test_file.txt");
std::vector<float> vec7{1.02f, 1.03f, 1.04f, 1.05f};
hpx::future<checkpoint> fut_7 = save_checkpoint(vec7);
checkpoint archive7 = fut_7.get();
std::copy(archive7.begin(),    // Write data to ofstream
    archive7.end(),            // i.e. the file
    std::ostream_iterator<char>(test_file_7));
test_file_7.close();
std::vector<float> vec7_1;
std::vector<char> char_vec;
std::ifstream test_file_7_1("checkpoint_test_file.txt");
if (test_file_7_1)
{
    test_file_7_1.seekg(0, test_file_7_1.end);
    auto length = test_file_7_1.tellg();
    test_file_7_1.seekg(0, test_file_7_1.beg);
    char_vec.resize(length);
    test_file_7_1.read(char_vec.data(), length);
}
checkpoint archive7_1(std::move(char_vec)); // Write data to checkpoint
restore_checkpoint(archive7_1, vec7_1);
Checkpointing components
save_checkpoint and restore_checkpoint are also able to store components inside
checkpoints. This can be done in one of two ways. First, a client of the
component can be passed to save_checkpoint. When the user wishes to resurrect
the component, they can pass a client instance to restore_checkpoint.
This technique is demonstrated below:
// Try to checkpoint and restore a component with a client
std::vector<int> vec3{10, 10, 10, 10, 10};
// Create a component instance through client constructor
data_client D(hpx::find_here(), std::move(vec3));
hpx::future<checkpoint> f3 = save_checkpoint(D);
// Create a new client
data_client E;
// Restore server inside client instance
restore_checkpoint(f3.get(), E);
The second way a user can save a component is by passing a shared_ptr to the
component to save_checkpoint. This component can be resurrected by creating a
new instance of the component type and passing a shared_ptr to the new instance
to restore_checkpoint.
This technique is demonstrated below:
// test checkpoint a component using a shared_ptr
std::vector<int> vec{1, 2, 3, 4, 5};
data_client A(hpx::find_here(), std::move(vec));
// Checkpoint Server
hpx::id_type old_id = A.get_id();
hpx::future<std::shared_ptr<data_server>> f_a_ptr =
hpx::get_ptr<data_server>(A.get_id());
std::shared_ptr<data_server> a_ptr = f_a_ptr.get();
hpx::future<checkpoint> f = save_checkpoint(a_ptr);
auto&& data = f.get();
// test prepare_checkpoint API
checkpoint c = prepare_checkpoint(hpx::launch::sync, a_ptr);
HPX_TEST(c.size() == data.size());
// Restore Server
// Create a new server instance
std::shared_ptr<data_server> b_server;
restore_checkpoint(data, b_server);