Use Tpetra sparse matrix and dense vector objects to implement a simple iteration (the power method)
Lesson topics
This lesson demonstrates the following:
-
How to construct a Tpetra::CrsMatrix (a distributed sparse matrix)
-
How to modify the entries of a previously constructed CrsMatrix
-
How to use CrsMatrix and Vector to implement a simple iterative eigensolver (the power method)
This example is just like the Epetra power method example with which many users might be familiar, but uses Tpetra objects in place of Epetra objects.
Relation to other lessons
Before starting this lesson, it helps to have learned Tpetra Lesson 01: Initialization (how to initialize Tpetra) and Tpetra::Vector (how to create distributions and distributed vectors). After completing this lesson, you might want to learn about more efficient ways to add or modify entries in a Tpetra sparse matrix.
Creating and filling a sparse matrix
This is the first lesson in the usual sequence which covers adding entries to ("filling") a Tpetra sparse matrix, and modifying the values of those entries after creating the matrix. Creating and filling a Tpetra sparse matrix involves the following steps:
-
Create the CrsMatrix (by calling one of its constructors)
-
Call methods to add entries to the sparse matrix
-
Call the matrix's
fillComplete()
method
We will explain each of these steps in turn.
Creating the CrsMatrix
Tpetra's sparse matrices are distributed over one or more parallel processes, just like vectors or other distributed objects. Also just like vectors, you have to tell the sparse matrix its distribution on construction. Unlike vectors, though, sparse matrices have two dimensions over which to be distributed: rows and columns.
Many users are perfectly happy ignoring the column distribution and just distributing the matrix in "one-dimensional" fashion over rows. In that case, you need only supply the "row Map" to the constructor. This implies that for any row which a process owns, that process may insert entries in any column in that row.
Some users want to use the full flexibility of distributing both the rows and columns of the matrix over processes. This "two-dimensional" distribution, if chosen optimally, can significantly reduce the amount of communication needed for distributed-memory parallel sparse matrix-vector multiply. Trilinos packages like Zoltan and Zoltan2 can help you compute this distribution. In that case, you may give both the "row Map" and the "column Map" to the constructor. This implies that for any row which a process owns, that process may insert entries in any column in that row which that process owns in its column Map.
Finally, other users already know the structure of the sparse matrix, and just want to fill in values. These users should first create the graph (a CrsGraph), call fillComplete()
on the graph, and then give the graph to the constructor of CrsMatrix. The graph may have either a "1-D" or "2-D" distribution, as mentioned above.
Adding entries to the sparse matrix
Methods of CrsMatrix that start with "insert" actually change the structure of the sparse matrix. Methods that start with "replace" or "sumInto" only modify existing values.
Calling \c fillComplete()
Calling fillComplete()
signals that you are done changing the structure (if allowed) or values of the sparse matrix. This is an expensive operation, because it both rearranges local data, and communicates in order to build reusable communication patterns for sparse matrix-vector multiply. You should try to amortize the cost of this operation whenever possible over many sparse matrix-vector multiplies.
fillComplete()
takes two arguments:
-
the domain Map (the distribution of the input vector x in a sparse matrix-vector multiply y = A*x)
-
the range Map (the distribution of the output vector y in a sparse matrix-vector multiply y = A*x)
Both the domain and range Maps must be one-to-one: that is, each global index in the Map must be uniquely owned by one and only one process. You will need to supply these two arguments to fillComplete()
under any of the following conditions:
-
When the row Map is not the same as the range Map (it can't be if the row Map is not one to one)
-
When the domain and range Maps are not equal (e.g., if the matrix is not square)
-
When the domain or range Map as not the same as the row Map
If the domain and range Maps equal the row Map and the row Map is one-to-one, then you may call fillComplete()
with no arguments.
The most significant difference between Epetra and Tpetra sparse matrices, is that in order to modify the entries of a Tpetra::CrsMatrix once you have called fillComplete()
, you must first call resumeFill()
. Epetra_CrsMatrix has no corresponding "resume fill" method, and you may modify the values of entries after FillComplete()
has been called.
The reason for this difference is that Tpetra's fillComplete()
has the right to rearrange the matrix's data in ways that violate user expectations. For example, it may give the data to a third-party library that rearranges them in an opaque way, or even copy them into a different memory space (for example, into GPU memory). Calling resumeFill()
signals Tpetra that you want to change either the values or the structure.
Code example
The following code example shows how to fill and compute with a Tpetra sparse matrix, using the procedure discussed in the text above.
#include <Tpetra_CrsMatrix.hpp>
#include <Tpetra_Map.hpp>
#include <Tpetra_MultiVector.hpp>
#include <Tpetra_Vector.hpp>
#include <Tpetra_Version.hpp>
#include <Teuchos_Array.hpp>
#include <Teuchos_ScalarTraits.hpp>
#include <Teuchos_RCP.hpp>
template <class TpetraOperatorType>
class PowerMethod {
public:
typedef typename TpetraOperatorType::scalar_type scalar_type;
typedef typename TpetraOperatorType::global_ordinal_type global_ordinal_type;
typedef typename TpetraOperatorType::node_type
node_type;
typedef typename vec_type::mag_type magnitude_type;
static scalar_type
run (const TpetraOperatorType& A,
const int niters,
const magnitude_type tolerance,
std::ostream& out)
{
using std::endl;
typedef Teuchos::ScalarTraits<scalar_type> STS;
typedef Teuchos::ScalarTraits<magnitude_type> STM;
const int myRank = A.getMap ()->getComm ()->getRank ();
vec_type q (A.getDomainMap ());
vec_type z (A.getRangeMap ());
vec_type resid (A.getRangeMap ());
z.randomize ();
scalar_type lambda = STS::zero ();
magnitude_type normz = STM::zero ();
magnitude_type residual = STM::zero ();
const scalar_type one = STS::one ();
const scalar_type zero = STS::zero ();
const int reportFrequency = 10;
for (int iter = 0; iter < niters; ++iter) {
normz = z.norm2 ();
q.scale (one / normz, z);
A.apply (q, z);
lambda = q.dot (z);
if (iter % reportFrequency == 0 || iter + 1 == niters) {
resid.update (one, z, -lambda, q, zero);
residual = resid.norm2 ();
if (myRank == 0) {
out << "Iteration " << iter << ":" << endl
<< "- lambda = " << lambda << endl
<< "- ||A*q - lambda*q||_2 = " << residual << endl;
}
}
if (residual < tolerance) {
if (myRank == 0) {
out << "Converged after " << iter << " iterations" << endl;
}
break;
}
else if (iter+1 == niters) {
if (myRank == 0) {
out << "Failed to converge after " << niters
<< " iterations" << endl;
}
break;
}
}
return lambda;
}
};
int
main (int argc, char *argv[])
{
using Teuchos::Array;
using Teuchos::ArrayView;
using Teuchos::ArrayRCP;
using Teuchos::arcp;
using Teuchos::RCP;
using Teuchos::rcp;
using Teuchos::tuple;
using std::cout;
using std::endl;
{
const size_t myRank = comm->getRank();
if (myRank == 0) {
cout << Tpetra::version () << endl << endl;
}
const global_ordinal_type indexBase = 0;
RCP<const map_type> map =
rcp (new map_type (numGblIndices, indexBase, comm));
const size_t numMyElements = map->getNodeNumElements ();
if (myRank == 0) {
cout << endl << "Creating the sparse matrix" << endl;
}
RCP<crs_matrix_type> A (new crs_matrix_type (map, 3));
const scalar_type two = static_cast<scalar_type> (2.0);
const scalar_type negOne = static_cast<scalar_type> (-1.0);
for (local_ordinal_type lclRow = 0;
lclRow < static_cast<local_ordinal_type> (numMyElements);
++lclRow) {
const global_ordinal_type gblRow = map->getGlobalElement (lclRow);
if (gblRow == 0) {
A->insertGlobalValues (gblRow,
tuple<global_ordinal_type> (gblRow, gblRow + 1),
tuple<scalar_type> (two, negOne));
}
else if (static_cast<Tpetra::global_size_t> (gblRow) == numGblIndices - 1) {
A->insertGlobalValues (gblRow,
tuple<global_ordinal_type> (gblRow - 1, gblRow),
tuple<scalar_type> (negOne, two));
}
else {
A->insertGlobalValues (gblRow,
tuple<global_ordinal_type> (gblRow - 1, gblRow, gblRow + 1),
tuple<scalar_type> (negOne, two, negOne));
}
}
A->fillComplete ();
const int niters = 500;
const magnitude_type tolerance = 1.0e-2;
scalar_type lambda =
PowerMethod<crs_matrix_type>::run (*A, niters, tolerance, cout);
if (myRank == 0) {
cout << endl << "Estimated max eigenvalue: " << lambda << endl;
}
if (myRank == 0) {
cout << endl << "Increasing magnitude of A(0,0), "
"solving again" << endl;
}
A->resumeFill ();
if (A->getRowMap ()->isNodeGlobalElement (0)) {
const global_ordinal_type idOfFirstRow = 0;
size_t numEntriesInRow = A->getNumEntriesInGlobalRow (idOfFirstRow);
Array<scalar_type> rowvals (numEntriesInRow);
Array<global_ordinal_type> rowinds (numEntriesInRow);
A->getGlobalRowCopy (idOfFirstRow, rowinds (), rowvals (), numEntriesInRow);
for (size_t i = 0; i < numEntriesInRow; i++) {
if (rowinds[i] == idOfFirstRow) {
rowvals[i] *= 10.0;
}
}
A->replaceGlobalValues (idOfFirstRow, rowinds (), rowvals ());
}
A->fillComplete ();
lambda = PowerMethod<crs_matrix_type>::run (*A, niters, tolerance,
cout);
if (myRank == 0) {
cout << endl << "Estimated max eigenvalue: " << lambda << endl;
}
if (myRank == 0) {
cout << "End Result: TEST PASSED" << endl;
}
}
return 0;
}