A lesson on parallel distributions and distributed objects.

Lesson topics

In this lesson, we will explain how to create the simplest kind of Tpetra linear algebra object: a Vector, whose entries are distributed over the process(es) in a communicator. The Map object describes this distribution of entries over processes. You create a Map to describe the distribution scheme you want, and then use the Map to create objects (such as Vectors) that have this distribution. We spend a little bit more time than you might initially wish explaining Map, but understanding it is important for getting the best performance out of Tpetra. We give examples of different distributions you can create, use their Maps to create Vectors, and then do some arithmetic with the Vectors. All of this gives us an opportunity to explain the various template parameters that are part of the type of nearly every Tpetra object.

Tpetra::Map

A Map instance describes a data distribution

Tpetra, like Epetra, uses objects called "Maps" to encapsulate the details of distributing data over MPI processes. Maps make data distribution into a first-class citizen. Each Map instance represents a particular data distribution.

You can think of a Map instance abstractly as representing a vector space. If two vectors have the same map, it's like they come from the same vector space. For example, you can add them together without performing communication. If they come from different vector spaces, then you need more information to know whether it is legal to add the vectors together.

You can find documentation for Tpetra's Map class here.

A Map assigns entries of a data structure to processes

Global indices matter to you

For you as the user, the fact that you might be parallelizing your application using MPI is really an implementation detail. You care about what we call global indices. These represent the entries of a distributed object (such as rows or columns of a sparse matrix, or entries of a vector) uniquely over the entire object. The object in turn may be distributed over multiple processes. Just about any data structure containing entries that can be assigned an integer index can be distributed using a Map. For most Tpetra users, this means entries of a vector, rows of a Tpetra::MultiVector, or rows or columns of a sparse matrix. However, it is not limited to these kinds of objects. You may even use Map for your own distributed objects.

A Map assigns global indices to parallel processes. If it assigns a global index G to a process P, we say that process P owns global index G. It is legal for multiple processes to own the same global index G. In fact, this is how we implement many useful communication patterns, including those in sparse matrix-vector multiply. We won't go into much detail in this lesson about that.

Local indices are an implementation detail

For efficiency, within a process, we refer to a global index using its "local index" on that process. Local indices are local to the process that owns them. If process P owns global index G, then there is a unique local index L on process P corresponding to G. If the local index L is valid on process P, then there is a unique global index G owned by P corresponding to the pair (L, P). However, multiple processes might own the same global index, so a global index G might correspond to multiple (L, P) pairs. In summary, local indices on a process correspond to object "entries" (e.g., sparse matrix rows or columns) owned by that process.

We expose local indices for performance reasons

Local indices matter to you because it may be more efficient to use them to access or modify local data than it is to use global indices. This is because distributed data structures must convert from global to local indices every time a user asks for an element by its global index. This requires a table lookup in general, since a process may own an arbitrary subset of all the global indices, in an arbitrary order. Even though local indices are an implementation detail, we expose them because avoiding that table lookup on each access can improve performance a lot.

Maps are themselves distributed data

If a Map has N global entries over P processes, and if no one process owns all the global entries, we never store all N global indices on a single process. Some kinds of Maps require storing all the global indices, but in this case, the indices are themselves distributed over processes. This ensures memory scalability (no one process has to store all the data).

Map compatibility

We mentioned above that a Map behaves much like a vector space. For instance, if two Vectors have the same Map, it is both legal and meaningful to add them together. This makes it useful to be able to compare Maps. Tpetra gives two ways to compare two Maps. Two Maps map1 and map2 may either be "compatible" (map1.isCompatible(map2)) or "the same" (map1.isSameAs(map2)).

Compatibility of two Maps corresponds to isomorphism of two vector spaces. Two Maps that are the same are always compatible. The isCompatible() criterion is less restrictive, and also less expensive to check (although checking for compatibility requires a reduction on a Boolean over all processes in the Map's communicator).

Adding together two vectors with compatible but not the same Maps is legal. It might not make mathematical sense, depending on your application. This is because entries of the vectors are ordered differently. (Also, just because two vector spaces are isomorphic, doesn't necessarily mean that adding entries of one to entries of another makes sense.) Adding together two vectors with the same Maps is both legal and mathematically sensible.

Both sameness and compatibility are commutative Boolean relations: for example, map1.isCompatible(map2) means map2.isCompatible(map1).

Two Maps are compatible when:

they have the same global number of entries
MPI processes in the Map's communicator that have the same MPI rank, own the same number of entries.

Two Maps are the same when:

their minimum and maximum global indices are the same
they have the same global number of entries
the Maps are both distributed over multiple processes, or both not distributed over multiple processes
the Maps have the same index base (this means the smallest legal global index value, more or less)
Processes that have the same rank, own the same number of entries.
Processes that have the same rank, own the same entries. That is, their entries have the same indices, in the same order.

You get to specify the types of local and global indices

In Tpetra, the types of local and global indices are template parameters of Map, Vector, CrsMatrix, and other distributed objects. Local indices have type LocalOrdinal, and global indices have type GlobalOrdinal. Both should be signed built-in C++ integer types. However, you get to pick their size, based on how big your problem is. If your problem has more than 2 billion entries, you will need a 64-bit integer type (such as long long or int64_t) for GlobalOrdinal, but if you have enough processes so that no one process stores more than 2 billion entries locally, then you may use a 32-bit integer type (such as int or int32_t) for LocalOrdinal. The default type of both LocalOrdinal and GlobalOrdinal is int.

You need not specify these types explicitly. If you do not specify them, Tpetra will pick default values. Furthermore, Tpetra objects have typedefs, so you can get these types even if you don't know what their default values are. This is the preferred way to find out their default values. The typedef local_ordinal_type tells you the local ordinal type, and the typedef global_ordinal_type tells you the global ordinal type.

It is usually more efficient to use the shortest integer type possible for both local and global indices. "Shortest" means fewest number of bits. Fewer bits mean you use less memory and thus can solve bigger problems or use higher-quality preconditioners that solve problems in fewer iterations. Shorter local indices can also mean better performance for local sparse matrix kernels, such as sparse matrix-vector multiply, sparse triangular solve, and smoothing (for algebraic multigrid).

Tpetra differs from Epetra in that you, the user, get to decide the types of local and global indices. In Epetra, local and global indices both used to have type int. With the latest Trilinos release, Epetra may use either 32-bit or 64-bit integers for global indices, and always uses 32-bit integers (int) for local indices. Tpetra lets you decide the types of each.

Different categories of Maps

One to one

A Map is one to one if each global index in the Map is owned by only one process. This means that the function from global index G to its local index and process rank (L,P) is one to one in a mathematical sense ("injective"). In this case, the function is only onto ("surjective") if there is only one process. Knowing whether a Map is one-to-one is important for data redistribution, which Tpetra exposes as the Import and Export operations. We will cover Import and Export in subsequent lessons.

An example of a one-to-one Map is a Map containing 101 global indices 0 .. 100 and distributed over four processes, where

Process 0 owns 0 .. 24
Process 1 owns 25 .. 49
Process 2 owns 50 .. 74
Process 3 owns 75 .. 100

An example of a not one-to-one Map is a Map containing 101 global indices 0 .. 100 and distributed over four processes, where

Process 0 owns 0 .. 25
Process 1 owns 25 .. 50
Process 2 owns 50 .. 75
Process 3 owns 75 .. 100

Note the overlap of one global index between each "adjacent" process. An example of a mathematical problem with an overlapping distribution like this would be a 1-D linear finite element or finite difference discretization, where entries are distributed with unique ownership among the processes, but the boundary node between two adjacent entries on different processes is shared among those two processes.

Contiguous or noncontiguous, uniform or not

A Map is contiguous when each process' list of global indices forms an interval and is strictly increasing, and the globally minimum global index equals the index base. Map optimizes for the contiguous case. In particular, noncontiguous Maps require communication in order to figure out which process owns a particular global index.

Note that in Tpetra, "contiguous" is an optimization, not a predicate. Tpetra may not necessarily work hard to check contiguity. The best way to ensure that your Map is contiguous is to use one of the two constructors that always make a contiguous Map.

An example of a contiguous Map is one containing 101 global indices 0 .. 100 and distributed over four processes, where

Process 0 owns 0 .. 24
Process 1 owns 25 .. 49
Process 2 owns 50 .. 74
Process 3 owns 75 .. 100

Note that Process 3 in this example owns 26 global indices, whereas the other processes each own 25. We say that a Map is uniform if each process owns the same number of global indices. The above Map is not uniform. Map includes both a constructor for uniform contiguous Maps, where you specify the total number of global indices, and a constructor for possibly nonuniform contiguous Maps, where you specify the number of global indices owned by each process.

Globally distributed or locally replicated

Globally distributed means that all of the following are true:

The Map's communicator has more than one process.
There is at least one process in the Map's communicator, whose local number of entries does not equal the number of global entries. (That is, not all the entries are replicated over all the processes.)

If at least one of the above are not true, then we call the Map locally replicated. The two terms are mutually exclusive.

Other differences between Tpetra and Epetra

Tpetra's maps look different than Epetra's maps because of all the template parameters, but they work similiarly. One difference is that Tpetra maps tend to be handled by Teuchos::RCP (reference-counted smart pointer) rather than copied or passed by const reference. Another difference is that Epetra_Map inherits from Epetra_BlockMap, whereas Tpetra's Map does not inherit from a corresponding block map class. Epetra_Map only has a SameAs() predicate, whereas Tpetra's Map class distinguishes between "compatibility" and "sameness" (see above). Finally, Epetra_Map's SameAs() means about the same thing as Tpetra's isSameAs().

Tpetra::Vector

Tpetra::Vector implements a finite-dimensional vector distributed over processes. Vector inherits from Tpetra's MultiVector class, which represents a collection of one or more vectors with the same Map. Tpetra favors block algorithms, so it favors MultiVectors over single Vectors. A single Vector is just a MultiVector containing one vector, with a few convenience methods. You'll find documentation for Tpetra's Vector class here.

Vector's interface contains some common linear algebra operations for vector-vector operations, including operations analogous to those in the BLAS 1 standard.

Tpetra objects' template parameters

Most Tpetra objects, including Map and Vector, take several different template parameters. Some of them have default values. For example, Vector has the following template parameters:

Scalar: The type of data stored in the vector
LocalOrdinal: The integer type of local indices
GlobalOrdinal: The integer type of global indices
Node: The implementation of intranode (within a node) parallelism

Map has the same template parameters, except for Scalar (since the same Map can be used to describe Vectors with different Scalar types).

Code example: Initialize Maps and Vectors

The following example follows the same initialization steps as in the previous lesson. It then creates two distributed Tpetra Maps and some Tpetra Vectors, and does a few computations with the vectors.

// @HEADER
// *****************************************************************************
//          Tpetra: Templated Linear Algebra Services Package
//
// Copyright 2008 NTESS and the Tpetra contributors.
// SPDX-License-Identifier: BSD-3-Clause
// *****************************************************************************
// @HEADER
#include <Tpetra_Core.hpp>
#include <Tpetra_Vector.hpp>
#include <Tpetra_Version.hpp>
#include <Teuchos_Comm.hpp>
#include <Teuchos_OrdinalTraits.hpp>
void
exampleRoutine (const Teuchos::RCP<const Teuchos::Comm<int> >& comm,
                std::ostream& out)
{
  using std::endl;
  using Teuchos::Array;
  using Teuchos::ArrayView;
  using Teuchos::RCP;
  using Teuchos::rcp;
  const int myRank = comm->getRank ();
  if (myRank == 0) {
    // Print out the Tpetra software version information.
    out << Tpetra::version () << endl << endl;
  }
  //
  // The first thing you might notice that makes Tpetra objects
  // different than their Epetra counterparts, is that Tpetra objects
  // take several template parameters.  These template parameters give
  // Tpetra its features of being able to solve very large problems
  // (of more than 2 billion unknowns) and to exploit intranode
  // parallelism.
  //
  // Most Tpetra objects come with default values of those template
  // parameters.  In many cases, you might not have to specify _any_
  // of those values explicitly!  You can also control some of their
  // default values, like that of the Node type, when configuring
  // Trilinos.
  //
  // It's common to begin a Tpetra application with some typedefs to
  // make the code more concise and readable.  They also make the code
  // more maintainable, since you can change the typedefs without
  // changing the rest of the program.  You may also want to
  // abbreviate the template parameters.  Standard abbreviations are
  // "LO" for local_ordinal_type and "GO" for global_ordinal_type.
  //
  // The "Scalar" type is the type of the values stored in the Tpetra
  // objects.  Valid Scalar types include real or complex
  // (std::complex<T>) floating-point types, or more exotic objects
  // with similar behavior.  We use the default type here, which we
  // get from Vector.  "Vector<>" means that we let all template
  // parameters' values revert to their defaults.  The default type is
  // 'double', a 64-bit double-precision binary floating-point value.
  typedef Tpetra::Vector<>::scalar_type scalar_type;
  // The "LocalOrdinal" (LO) type is the type of "local" indices.
  // Both Epetra and Tpetra index local entries differently than
  // global entries.  Tpetra exploits this so that you can use a
  // shorter integer type for local indices.  This saves bandwidth
  // when computing sparse matrix-vector products.  We use the default
  // LO type here, which we get from Tpetra::Vector.  We could also
  // get it from Tpetra::Map.
  // This line is commented out because we don't actually use this
  // type in the code below.  Leaving the typedef in that case will
  // make the compiler emit "unused typedef" warnings.
  //
  //typedef Tpetra::Vector<>::local_ordinal_type local_ordinal_type;
  // The "GlobalOrdinal" (GO) type is the type of "global" indices.
  // We use the default GO type here, which we get from
  // Tpetra::Vector.  We could also get it from Tpetra::Map.
  typedef Tpetra::Vector<>::global_ordinal_type global_ordinal_type;
  // The Kokkos "Node" type describes the type of shared-memory
  // parallelism that Tpetra will use _within_ an MPI process.  The
  // available Node types depend on Trilinos' build options and the
  // availability of certain third-party libraries.  In almost all
  // cases, the default setting will do.  You may set the default Node
  // type when configuring Trilinos.  In this case, we access the
  // default Node type using the typedef in Tpetra::Vector.  Almost
  // all Tpetra classes have default template parameter values.
  // This line is commented out because we don't actually use this
  // type in the code below.  Leaving the typedef in that case will
  // make the compiler emit "unused typedef" warnings.
  //typedef Tpetra::Vector<>::node_type node_type;
  // Maps know how to convert between local and global indices, so of
  // course they are templated on the local and global Ordinal types.
  // They are also templated on the Kokkos Node type, because Tpetra
  // objects that use Tpetra::Map are.  It's important not to mix up
  // Maps for different Kokkos Node types.  In this case, we use all
  // default template parameters, which are the same as the
  // corresponding template parameters of Vector.
  typedef Tpetra::Map<> map_type;
  // Create some Tpetra Map objects
  //
  // Like Epetra, Tpetra has local and global Maps.  Local maps
  // describe objects that are replicated over all participating MPI
  // processes.  Global maps describe distributed objects.  You can do
  // imports and exports between local and global maps; this is how
  // you would turn locally replicated objects into distributed
  // objects and vice versa.
  //
  // numLocalEntries: The local (on the calling MPI process) number of
  // entries (indices) in the first Map that we create.  Tpetra
  // expects a size_t for this value.
  const size_t numLocalEntries = 5;
  // numGlobalEntries: The total (global, i.e., over all MPI
  // processes) number of entries (indices) in the Map.  Tpetra
  // expects Tpetra::global_size_t for this value.  This type is at
  // least 64 bits long on 64-bit machines.
  //
  // For this example, we scale the global number of entries in the
  // Map with the number of MPI processes.  That way, you can run this
  // example with any number of MPI processes and every process will
  // still have a positive number of entries.
  const Tpetra::global_size_t numGlobalEntries =
    comm->getSize () * numLocalEntries;
  // Tpetra can index the entries of a Map starting with 0 (C style),
  // 1 (Fortran style), or any base you want.  1-based indexing is
  // handy when interfacing with Fortran.  We choose 0-based indexing
  // here.
  const global_ordinal_type indexBase = 0;
  //
  // Create some Maps.  All Map constructors must be called as a
  // collective over the input communicator.  Not all Map constructors
  // necessarily require communication, but some do, so it's best to
  // treat them all as collectives.
  //
  // Construct a Map that puts the same number of equations on each
  // processor.  The resulting Map is "contiguous and uniform."
  //
  // Maps should be considered immutable objects.  This is why we
  // create it as a "const map_type".  If you want a new data
  // distribution, create a new Map.
  RCP<const map_type> contigMap =
    rcp (new map_type (numGlobalEntries, indexBase, comm));
  // contigMap is contiguous by construction.  Test this at run time.
  // Lesson 01 introduced the TEUCHOS_TEST_FOR_EXCEPTION macro, which
  // throws an exception of the given type (second argument) with the
  // given message (third argument), if the first argument is true.
  TEUCHOS_TEST_FOR_EXCEPTION(
    ! contigMap->isContiguous (), std::logic_error,
    "The supposedly contiguous Map isn't contiguous.");
  // contigMap2: Create a Map which is the same as contigMap, but uses
  // a different Map constructor.  This one asks for the number of
  // entries on each MPI process.  The resulting Map is "contiguous"
  // but not necessarily uniform, since the numbers of entries on
  // different MPI processes may differ.  In this case, the number of
  // entries on each MPI process is the same, but that doesn't always
  // have to be the case.
  RCP<const map_type> contigMap2 =
    rcp (new map_type (numGlobalEntries, numLocalEntries, indexBase, comm));
  // Since contigMap and contigMap2 have the same communicators, and
  // the same number of entries on all MPI processes in their
  // communicators, they are "the same."
  TEUCHOS_TEST_FOR_EXCEPTION(
    ! contigMap->isSameAs (*contigMap2), std::logic_error,
    "contigMap should be the same as contigMap2, but it's not.");
  // contigMap3: Use the same Map constructor as contigMap3, but don't
  // specify the global number of entries.  This is helpful if you
  // only know how many entries each MPI process has, but don't know
  // the global number.  Instead of numGlobalEntries, we use the
  // equivalent of Epetra's -1 for Tpetra::global_size_t (which might
  // be unsigned, so don't use -1!!!), which we call "INVALID" (an
  // "invalid value" used as a flag).
  const Tpetra::global_size_t INVALID =
    Teuchos::OrdinalTraits<Tpetra::global_size_t>::invalid ();
  RCP<const map_type> contigMap3 =
    rcp (new map_type (INVALID, numLocalEntries, indexBase, comm));
  // Even though we made contigMap3 without specifying the global
  // number of entries, it should still be the same as contigMap2.
  TEUCHOS_TEST_FOR_EXCEPTION(
    ! contigMap2->isSameAs (*contigMap3), std::logic_error,
    "contigMap2 should be the same as contigMap3, but it's not.");
  // Create a Map which has the same number of global entries per
  // process as contigMap, but distributes them differently, in
  // round-robin (1-D cyclic) fashion instead of contiguously.
  RCP<const map_type> cyclicMap;
  {
    // We'll use the version of the Map constructor that takes, on
    // each MPI process, a list of the global entries in the Map
    // belonging to that process.  You can use this constructor to
    // construct an overlapping (also called "not 1-to-1") Map, in
    // which one or more entries are owned by multiple processes.  We
    // don't do that here; we make a nonoverlapping (also called
    // "1-to-1") Map.
    Array<global_ordinal_type>::size_type numEltsPerProc = 5;
    Array<global_ordinal_type> elementList (numEltsPerProc);
    const int numProcs = comm->getSize ();
    for (Array<global_ordinal_type>::size_type k = 0; k < numEltsPerProc; ++k) {
      elementList[k] = myRank + k*numProcs;
    }
    cyclicMap = rcp (new map_type (numGlobalEntries, elementList, indexBase, comm));
  }
  // If there's more than one MPI process in the communicator,
  // then cyclicMap is definitely NOT contiguous.
  TEUCHOS_TEST_FOR_EXCEPTION(
    comm->getSize () > 1 && cyclicMap->isContiguous (),
    std::logic_error,
    "The cyclic Map claims to be contiguous.");
  // contigMap and cyclicMap should always be compatible.  However, if
  // the communicator contains more than 1 process, then contigMap and
  // cyclicMap are NOT the same.
  TEUCHOS_TEST_FOR_EXCEPTION(! contigMap->isCompatible (*cyclicMap),
    std::logic_error,
    "contigMap should be compatible with cyclicMap, but it's not.");
  TEUCHOS_TEST_FOR_EXCEPTION(comm->getSize() > 1 && contigMap->isSameAs (*cyclicMap),
    std::logic_error,
    "contigMap should be compatible with cyclicMap, but it's not.");
  // We have maps now, so we can create vectors.
  // Since Tpetra::Vector takes four template parameters, its type is
  // long.  Here, we use all default values for the template
  // parameters, which helps with the length.  However, I still prefer
  // to use a typedef for encapsulation, so that we only have to
  // change one line of code if we decide to change the template
  // parameters of Vector.
  typedef Tpetra::Vector<> vector_type;
  // Create a Vector with the contiguous Map.  This version of the
  // constructor will fill in the vector with zeros.
  vector_type x (contigMap);
  // The two-argument copy constructor with second argument
  // Teuchos::Copy performs a deep copy.  x and y have the same Map.
  // The one-argument copy constructor does a _shallow_ copy.
  vector_type y (x, Teuchos::Copy);
  // Create a Vector with the 1-D cyclic Map.  Calling the constructor
  // with false for the second argument leaves the data uninitialized,
  // so that you can fill it later without paying the cost of
  // initially filling it with zeros.
  vector_type z (cyclicMap, false);
  // Set the entries of z to (pseudo)random numbers.  Please don't
  // consider this a good parallel pseudorandom number generator.
  z.randomize ();
  // Set the entries of x to all ones.
  //
  // The code below works because scalar_type=double.  In general, you
  // may use the commented-out line of code, if the conversion from
  // float to scalar_type is not defined for your scalar type.
  x.putScalar (1.0);
  //x.putScalar (Teuchos::ScalarTraits<scalar_type>::one());
  // See comment above about type conversions to scalar_type.
  const scalar_type alpha = 3.14159;
  const scalar_type beta = 2.71828;
  const scalar_type gamma = -10.0;
  // x = beta*x + alpha*z
  //
  // This is a legal operation!  Even though the Maps of x and z are
  // not the same, their Maps are compatible.  Whether it makes sense
  // or not depends on your application.
  x.update (alpha, z, beta);
  // See comment above about type conversions from float to scalar_type.
  y.putScalar (42.0);
  // y = gamma*y + alpha*x + beta*z
  y.update (alpha, x, beta, z, gamma);
  // Compute the 2-norm of y.
  //
  // The norm may have a different type than scalar_type.  For
  // example, if scalar_type is complex, then the norm is real.
  // Tpetra::MultiVector and Tpetra::Vector give us the type of the
  // norm.
  //
  // If you are using an older version of Tpetra, this code might not
  // work.  Try the commented-out line instead in that case.
  typedef Tpetra::Vector<>::mag_type mag_type;
  //typedef Teuchos::ScalarTraits<scalar_type>::magnitudeType mag_type;
  const mag_type theNorm = y.norm2 ();
  // Print the norm of y on Proc 0.
  if (myRank == 0) {
    out << "Norm of y: " << theNorm << endl;
  }
}
//
// The same main() driver routine as in the first Tpetra lesson.
//
int
main (int argc, char *argv[])
{
  Tpetra::ScopeGuard tpetraScope (&argc, &argv);
  {
    // Never allow Tpetra objects to persist past ScopeGuard's
    // destructor.
    auto comm = Tpetra::getDefaultComm ();
    exampleRoutine (comm, std::cout);
    // This tells the Trilinos test framework that the test passed.
    if (comm->getRank () == 0) {
      std::cout << "End Result: TEST PASSED" << std::endl;
    }
  }
  return 0;
}

Code example: Read and modify the entries of a Vector

The following example follows the same initialization steps as in the previous lesson. It then creates a distributed Tpetra Map and a Tpetra Vector, and shows how to read and modify the entries of the Vector.

// @HEADER
// *****************************************************************************
//          Tpetra: Templated Linear Algebra Services Package
//
// Copyright 2008 NTESS and the Tpetra contributors.
// SPDX-License-Identifier: BSD-3-Clause
// *****************************************************************************
// @HEADER
#include <Tpetra_Core.hpp>
#include <Tpetra_Vector.hpp>
#include <Tpetra_Version.hpp>
#include <Teuchos_CommHelpers.hpp>
void
exampleRoutine (const Teuchos::RCP<const Teuchos::Comm<int> >& comm,
                std::ostream& out)
{
  using std::endl;
  using Teuchos::Array;
  using Teuchos::ArrayRCP;
  using Teuchos::ArrayView;
  using Teuchos::outArg;
  using Teuchos::RCP;
  using Teuchos::rcp;
  using Teuchos::REDUCE_SUM;
  using Teuchos::reduceAll;
  const int myRank = comm->getRank ();
  // Print out the Tpetra software version information.
  if (myRank == 0) {
    out << Tpetra::version () << endl << endl;
  }
  // Type of the Tpetra::Map specialization to use.
  using map_type = Tpetra::Map<>;
  // The type of the Tpetra::Vector specialization to use.  The first
  // template parameter is the Scalar type.  The "Scalar" type is the
  // type of the values stored in the Tpetra::Vector.  You could use
  // Tpetra::Vector<>::scalar_type to get the default Scalar type.  We
  // will assume that it's double.
  //
  // using scalar_type = Tpetra::Vector<>::scalar_type;
  using vector_type = Tpetra::Vector<double>;
  // The "LocalOrdinal" (LO) type is the type of "local" indices.
  // The typedef is commented out to avoid "unused typedef" warnings.
  //
  //using local_ordinal_type = vector_type::local_ordinal_type;
  // The "GlobalOrdinal" (GO) type is the type of "global" indices.
  using global_ordinal_type = vector_type::global_ordinal_type;
  // Create a Tpetra Map
  // The total (global, i.e., over all MPI processes) number of
  // entries in the Map.
  //
  // For this example, we scale the global number of entries in the
  // Map with the number of MPI processes.  That way, you can run this
  // example with any number of MPI processes and every process will
  // still have a positive number of entries.
  const Tpetra::global_size_t numGlobalEntries = comm->getSize () * 5;
  // Index base of the Map.  We choose zero-based (C-style) indexing.
  const global_ordinal_type indexBase = 0;
  // Construct a Map that puts the same number of equations on each
  // MPI process.
  RCP<const map_type> contigMap =
    rcp (new map_type (numGlobalEntries, indexBase, comm));
  // Create a Tpetra Vector
  // Create a Vector with the Map we created above.
  // This version of the constructor will fill in the vector with zeros.
  vector_type x (contigMap);
  // Fill the Vector with a single number, or with random numbers
  // Set all entries of x to 42.0.
  x.putScalar (42.0);
  // norm2() is a collective, so we need to call it on all processes
  // in the Vector's communicator.
  auto x_norm2 = x.norm2 ();
  if (myRank == 0) {
    out << "Norm of x (all entries are 42.0): " << x_norm2 << endl;
  }
  // Set the entries of x to (pseudo)random numbers.  Please don't
  // consider this a good parallel pseudorandom number generator.
  x.randomize ();
  x_norm2 = x.norm2 ();
  if (myRank == 0) {
    out << "Norm of x (random numbers): " << x_norm2 << endl;
  }
  // Read the entries of the Vector
  {
    // Get a view of the Vector's entries.  The view has type
    // Kokkos::View.  Kokkos::View acts like an array, but is
    // reference-counted like std::shared_ptr or Teuchos::RCP.  This
    // means that it may persist beyond the lifetime of the Vector.  A
    // View is like a shallow copy of the data, so be careful
    // modifying the Vector while a view of it exists.  You may
    // decrement the reference count manually by assigning an empty
    // View to it.  We put this code in an inner scope (in an extra
    // pair of {}) so that the Kokkos::View will fall out of scope
    // before the next example, which modifies the entries of the
    // Vector.
    // We want a _host_ View.  Vector implements "dual view"
    // semantics.  This is really only relevant for architectures with
    // two memory spaces.
    auto x_2d = x.getLocalViewHost(Tpetra::Access::ReadOnly);
    // getLocalView returns a 2-D View by default.  We want a 1-D
    // View, so we take a subview.
    auto x_1d = Kokkos::subview (x_2d, Kokkos::ALL (), 0);
    // x_data.extent (0) may be longer than the number of local
    // rows in the Vector, so be sure to ask the Vector for its
    // dimensions, rather than the ArrayRCP.
    const size_t localLength = x.getLocalLength ();
    // Count the local number of entries less than 0.5.
    // Use local indices to access the entries of x_data.
    size_t localCount = 0;
    for (size_t k = 0; k < localLength; ++k) {
      if (x_1d(k) < 0.5) {
        ++localCount;
      }
    }
    // "reduceAll" is a type-safe templated version of MPI_Allreduce.
    // "outArg" is like taking the address using &, but makes it more
    // clear that its argument is an output argument of a function.
    size_t globalCount = 0;
    reduceAll<int, size_t> (*comm, REDUCE_SUM, localCount,
                            outArg (globalCount));
    // Find the total number of entries less than 0.5, over all
    // processes in the Vector's communicator.  Note the trick for
    // pluralizing the word "entry" conditionally on globalCount.
    if (myRank == 0) {
      out << "x has " << globalCount << " entr"
          << (globalCount != 1 ? "ies" : "y")
          << " less than 0.5." << endl;
    }
  }
  // Modify the entries of the Vector
  {
    // Get a nonconst persisting view of the entries in the Vector.
    // "Nonconst" means that you may modify the entries.  "Persisting"
    // means that the view persists beyond the lifetime of the Vector.
    // Even after the Vector's destructor is called, the view won't go
    // away.  If you create two nonconst persisting views of the same
    // Vector, and modify the entries of one view during the lifetime
    // of the other view, the entries of the other view are undefined.
    auto x_2d = x.getLocalViewHost(Tpetra::Access::ReadWrite);
    auto x_1d = Kokkos::subview (x_2d, Kokkos::ALL (), 0);
    // Use local indices to access the entries of x_data.
    // x_data.extent (0) may be longer than the number of local
    // rows in the Vector, so be sure to ask the Vector for its
    // dimensions.
    const size_t localLength = x.getLocalLength ();
    for (size_t k = 0; k < localLength; ++k) {
      // Add k (the local index) to every entry of x.  Treat 'double'
      // as a function to convert k (an integer) to double.
      x_1d(k) += double (k);
    }
  }
  // Print the norm of x.
  x_norm2 = x.norm2 ();
  if (myRank == 0) {
    out << "Norm of x (modified random numbers): " << x_norm2 << endl;
  }
}
//
// The same main() driver routine as in the first Tpetra lesson.
//
int
main (int argc, char *argv[])
{
  Tpetra::ScopeGuard tpetraScope (&argc, &argv);
  {
    auto comm = Tpetra::getDefaultComm ();
    exampleRoutine (comm, std::cout);
    // Tell the Trilinos test framework that the test passed.
    if (comm->getRank () == 0) {
      std::cout << "End Result: TEST PASSED" << std::endl;
    }
  }
  return 0;
}

Things not previously explained

This lesson introduces three new topics: the Node template parameter of Tpetra objects, the Teuchos::ScalarTraits scalar traits class, and Teuchos memory management classes like Teuchos::Array. We will explain them here.

Tpetra's Node template parameter

The Node template parameter governs the way the Tpetra objects do parallelism within a node ("intranode," as opposed to MPI's internode parallelism). We have implemented several different Node types. All you need to know for now is that it is part of the type of the object. You can't assign a matrix with one Node type to a matrix with a different Node type; they are incompatible. Usually, you will use one Node type and Node instance for all the Tpetra objects that you create.

The Teuchos::ScalarTraits scalar traits class

A traits class maps from a C++ type to attributes of that type. It is a standard C++ idiom for generic programming. The C++ Standard Library comes with a few different traits classes, such as std::numeric_traits. Teuchos::ScalarTraits is like std::numeric_traits, but offers more features. For a given scalar type S, Teuchos::ScalarTraits<S> can tell you the type of the magnitude of S (which is real if S is complex), how to compute the magnitude or extract the real or imaginary components, the definition of zero or one for S, and other useful information. Users may also define new specializations (definitions for new "input types") of Teuchos::ScalarTraits.

Teuchos memory management classes

Teuchos::Array is an array container, templated on the type of objects that it contains. It behaves much like std::vector. The difference is that Array interoperates with the other Teuchos memory management classes. For example, Teuchos::ArrayView is a nonowning, nonpersistent view of part or all of an Array. The std::vector class does not have nonowning views; passing std::vector by value copies the data, and there is no way to get a view of part of the std::vector. Array and ArrayView fix these deficiencies. Teuchos::ArrayRCP is the array analog of !RCP; it allows shared ownership of an array. For more details, please refer to the reference guide to the Teuchos Memory Management Classes.