"Hello world!" initialization.
The Tpetra package provides next-generation distributed sparse linear algebra. It includes sparse matrices, vectors, and other linear algebra objects, along with computational kernels. This lesson shows the MPI (or non-MPI) initialization you need to do in order to start using Tpetra. The initialization procedure differs slightly, depending on whether you are writing a code from scratch, or introducing Tpetra into an existing code base. We will give example codes and discussion for the following three use cases: a code that uses MPI only through Trilinos, an existing code that uses MPI directly, and a code that does not use multiple MPI processes.
This section explains how to set up the distributed-memory parallel environment for using Tpetra in a code that uses MPI only through Trilinos. If you want to introduce Tpetra into an existing MPI application, please see the next section. This example works whether or not Trilinos was built with MPI support.
Tpetra was written for distributed-memory parallel programming. It uses MPI (the Message Passing Interface) for this. However, Tpetra will work correctly whether or not you have built Trilinos with MPI support. It does so by interacting with MPI through an interface called Teuchos::Comm. (If you are familiar with Epetra, this interface is analogous to Epetra_Comm.) If MPI is enabled, then this wraps an MPI_Comm. Otherwise, this is a "serial communicator" with one process, analogous to MPI_COMM_SELF.
Tpetra also provides an MPI initialization interface, Tpetra::ScopeGuard. If you built Trilinos with MPI enabled, but have not yet initialized MPI, ScopeGuard's constructor calls MPI_Init for you, and its destructor calls MPI_Finalize for you. The following code example shows how to initialize MPI (if available) and get a Teuchos::Comm corresponding to MPI_COMM_WORLD. The example works whether or not Trilinos was built with MPI support. If you prefer a C-style initialization and finalization interface, you may use Tpetra::initialize and Tpetra::finalize instead.
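A minimal sketch of this use case might look like the following. It assumes the Tpetra_Core.hpp header and the Tpetra::getDefaultComm() helper, which returns Tpetra's default communicator as an RCP<const Teuchos::Comm<int> >.

    #include <Tpetra_Core.hpp>
    #include <iostream>

    int main (int argc, char* argv[]) {
      // The constructor calls MPI_Init if Trilinos was built with MPI
      // and MPI has not yet been initialized; the destructor calls
      // MPI_Finalize when tpetraScope falls out of scope.
      Tpetra::ScopeGuard tpetraScope (&argc, &argv);
      {
        // Get Tpetra's default communicator.  With MPI, this wraps
        // MPI_COMM_WORLD; without MPI, it is a one-process communicator.
        Teuchos::RCP<const Teuchos::Comm<int> > comm = Tpetra::getDefaultComm ();
        if (comm->getRank () == 0) {
          std::cout << "Hello world! There are " << comm->getSize ()
                    << " process(es)." << std::endl;
        }
      }
      // tpetraScope's destructor runs here, before main() returns.
      return 0;
    }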
Tpetra also works fine in an existing MPI code. For this example, we assume that your code initializes MPI on its own by calling MPI_Init, and calls MPI_Finalize at the end. It also must get an MPI_Comm (an MPI communicator) somewhere, either by using a predefined communicator such as MPI_COMM_WORLD, or by creating a new one.
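As a sketch of this use case, assuming an MPI-enabled build and the Teuchos_DefaultMpiComm.hpp header, you can wrap your existing MPI_Comm in a Teuchos::MpiComm<int>, which implements the Teuchos::Comm interface:

    #include <mpi.h>
    #include <Teuchos_DefaultMpiComm.hpp>  // Teuchos::MpiComm
    #include <iostream>

    int main (int argc, char* argv[]) {
      // Your code initializes and finalizes MPI itself.
      MPI_Init (&argc, &argv);
      {
        MPI_Comm yourComm = MPI_COMM_WORLD;  // or any communicator you created
        // Wrap the raw MPI_Comm so that Trilinos objects can use it.
        Teuchos::RCP<const Teuchos::Comm<int> > comm =
          Teuchos::rcp (new Teuchos::MpiComm<int> (yourComm));
        if (comm->getRank () == 0) {
          std::cout << "Total number of processes: " << comm->getSize ()
                    << std::endl;
        }
      }
      MPI_Finalize ();
      return 0;
    }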
The first code example in this lesson uses Tpetra::ScopeGuard to initialize MPI. In a non-MPI build of Trilinos, it does nothing, so if you built Trilinos with MPI disabled, you should feel free to use Tpetra::ScopeGuard in your main() function anyway. However, if you are using a build of Trilinos that has MPI enabled, but you do not want to use multiple MPI processes in your application, you may use the three-argument version of Tpetra::ScopeGuard's constructor, which takes an MPI_Comm argument. Just pass in MPI_COMM_SELF to force Tpetra not to use multiple processes. The following example illustrates this.
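A minimal sketch of that use case (it assumes an MPI-enabled build, since MPI_COMM_SELF comes from mpi.h, and the three-argument ScopeGuard constructor described above):

    #include <mpi.h>
    #include <Tpetra_Core.hpp>

    int main (int argc, char* argv[]) {
      // Pass MPI_COMM_SELF so that Tpetra's default communicator
      // contains only the calling process.
      Tpetra::ScopeGuard tpetraScope (&argc, &argv, MPI_COMM_SELF);
      {
        Teuchos::RCP<const Teuchos::Comm<int> > comm = Tpetra::getDefaultComm ();
        // comm->getSize() is 1 here, even though Trilinos has MPI enabled.
      }
      return 0;
    }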
RCP stands for "reference-counted pointer." It lives in the Teuchos package of Trilinos, and is Trilinos' version of std::shared_ptr or boost::shared_ptr. (There are both historical and technical reasons why we use our own class instead of one of those.) For more details, please refer to the complete reference for the Teuchos memory management classes.
In brief: RCP lets you have the benefits of pointers (the lightweight sharing of data) without the headaches and risks (managing ownership and deallocation). It behaves like a pointer in terms of syntax, but handles deallocation for you. RCP is templated on the type of object to which it points. For example, RCP<T> x works very much like T* x (a pointer to a T), and RCP<const T> y works like const T* y (that is, a pointer to a const T). The dereference (*) and method call (->) operators work just like they do with regular pointers.
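For example, here is a minimal sketch, using std::string as the pointed-to type (any class type would do):

    #include <Teuchos_RCP.hpp>
    #include <iostream>
    #include <string>

    int main () {
      // rcp() wraps a newly allocated object; the RCP now owns it.
      Teuchos::RCP<std::string> s = Teuchos::rcp (new std::string ("Hello"));

      // operator* and operator-> behave just like a raw pointer's.
      std::cout << *s << " has " << s->size () << " characters." << std::endl;

      // A pointer-to-const view of the same object.
      Teuchos::RCP<const std::string> cs = s;
      std::cout << cs->size () << std::endl;

      return 0;  // s and cs fall out of scope; the string is deleted once.
    }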
The "reference-counted" part of RCP means that it automatically handles deallocation, if appropriate. Copying the RCP increments the reference count; allowing it to fall out of scope or assigning Teuchos::null to it decrements the reference count. When the reference count reaches zero, the deallocation function is called. By default, it calls delete
, but you can control this behavior. Reference counting allows you to share pointers between different parts of your code, without needing to worry about what part of the code deallocates the object, or when it gets deallocated.
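A short sketch of the counting behavior, using Teuchos::null as described above:

    #include <Teuchos_RCP.hpp>

    int main () {
      Teuchos::RCP<int> a = Teuchos::rcp (new int (42));  // reference count: 1
      Teuchos::RCP<int> b = a;   // copying increments the count to 2
      a = Teuchos::null;         // count drops to 1; the int is still alive
      b = Teuchos::null;         // count reaches 0; delete is called here
      return 0;
    }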
The Teuchos::rcp function is a "nonmember constructor" template function that returns a newly created RCP of something. Using this function to create an RCP saves some typing, and also may improve exception safety.
Teuchos::Comm is Trilinos' interface to distributed-memory parallel communication. It lives in the Teuchos package of Trilinos. Comm is an abstract base class. The Teuchos::MpiComm and Teuchos::SerialComm classes implement this interface. As the name indicates, MpiComm implements Comm by using MPI calls. SerialComm implements Comm without MPI, as a "communicator" with only one process. (This is more or less equivalent to MPI_COMM_SELF, except without actually using MPI.) All of these classes are templated on the integer type used for MPI function calls. Currently, this is always int, so you should always use int as the template parameter.
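For instance, the single-process implementation can be created without MPI at all. This sketch assumes the Teuchos_DefaultSerialComm.hpp header; an MpiComm wrapping a raw MPI_Comm appeared in an earlier sketch.

    #include <Teuchos_DefaultSerialComm.hpp>  // Teuchos::SerialComm
    #include <Teuchos_RCP.hpp>

    int main () {
      // A one-process "communicator" that does not use MPI at all.
      Teuchos::RCP<const Teuchos::Comm<int> > comm =
        Teuchos::rcp (new Teuchos::SerialComm<int> ());
      // comm->getSize() == 1 and comm->getRank() == 0.
      return 0;
    }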
Since Comm is an abstract base class, you must handle it by pointer or reference. The idiomatic way to create and handle a Comm is by RCP, as an RCP<const Comm<int> >. This expression should be considered Trilinos' version of the MPI_Comm opaque handle. Thus, declaring and filling in an RCP<const Comm<int> > is "Trilinos-speak" for declaring and filling in an MPI_Comm.
The normal way to use Comm is to call the nonmember functions in the Teuchos_CommHelpers.hpp header file. We won't cover this in the lesson today.