Tpetra_DistributionLowerTriangularBlock.hpp
// @HEADER
// *****************************************************************************
// Tpetra: Templated Linear Algebra Services Package
//
// Copyright 2008 NTESS and the Tpetra contributors.
// SPDX-License-Identifier: BSD-3-Clause
// *****************************************************************************
// @HEADER

#ifndef __TPETRA_DISTRIBUTORLOWERTRIANGULARBLOCK_HPP
#define __TPETRA_DISTRIBUTORLOWERTRIANGULARBLOCK_HPP

// Needed by DistributionLowerTriangularBlock
#include "Tpetra_Distributor.hpp"

// Needed by LowerTriangularBlock operator
#include "Tpetra_Core.hpp"
#include "Tpetra_Map.hpp"
#include "Tpetra_Operator.hpp"
#include "Tpetra_Vector.hpp"
#include "Tpetra_CrsMatrix.hpp"

namespace Tpetra {
template <typename gno_t, typename scalar_t>
class DistributionLowerTriangularBlock : public Distribution<gno_t, scalar_t> {
  // Seher Acer's lower-triangular block decomposition for triangle counting.
  // See also: LowerTriangularBlockOperator below, which allows this
  // distribution to be used in Tpetra SpMV.
  //
  // Requirements:
  //   The matrix must be square (undirected graph).
  //   The number of processors np must satisfy np = q(q+1)/2 for some q.
  //
  // Only the lower triangular portion of the matrix is stored.
  // Processors are arranged logically as follows:
  //    0
  //    1 2
  //    3 4 5
  //    ...
  //
  // The lower triangular part of the matrix is stored in a 2D distribution.
  // For example, the dense 7x7 lower triangular matrix below would be assigned
  // to processors according to the numbers shown as follows:
  //    0 |   |
  //    00|   |
  //    ---------
  //    11|2  |
  //    11|22 |
  //    11|222|
  //    ---------
  //    33|444|5
  //    33|444|55
  //    ...
  // (Note that we expect the matrix to be sparse. For dense matrices,
  // CrsMatrix is the wrong tool.)
  //
  // Matrix rows are assigned to processor rows greedily to roughly balance
  //   (# nonzeros in processor row / # processors in processor row)
  // across processor rows.
  // The same cuts are used to divide rows and columns among processors
  // (that is, all processors have a square block).
  //
  // The lower triangular algorithm:
  // 1. Distribute all matrix entries via a 1D linear distribution.
  //    (This initial distribution is needed to avoid storing the entire
  //    matrix on one processor, while providing the info about the nonzeros
  //    per row needed in step 2.)
  // 2. (Optional) Sort rows in decreasing order wrt the number of nonzeros
  //    per row.
  // 3. Find "chunk cuts": divisions in row assignments such that
  //    (# nonzeros in processor row / # processors in processor row) is
  //    roughly equal for all processor rows.
  // 4. Send nonzeros to their new processor assignment.
  //
  // Known issues: (TODO)
  // - The sorting in Step 2 and the computation of chunk cuts in Step 3 are
  //   currently done in serial and require O(number of rows) storage on each
  //   processor. More effort could parallelize this computation, but parallel
  //   load-balancing algorithms are more appropriate in Zoltan2 than Tpetra.
  // - The sorting in Step 2 renumbers the rows (assigns new Global Ordinals to
  //   the rows) to make them contiguous, as needed in Acer's triangle counting
  //   algorithm.
  //   (Acer's algorithm relies on local indexing from the chunk boundaries to
  //   find neighbors needed for communication.)
  //   The class currently provides a permutation matrix P describing the
  //   reordering. Thus, the matrix stored in the lower triangular block
  //   distribution is actually P^T A P -- the row and column permutation of
  //   the matrix A in the MatrixMarket file.
  //   The fact that a permuted matrix is stored complicates use of the matrix
  //   in algorithms other than Acer's triangle counting. For SpMV with the
  //   vector numbered according to the MatrixMarket numbering, for example,
  //   P^T must be applied to the vector before SpMV, and P must be applied to
  //   the result of SpMV. See LowerTriangularBlockOperator to see how this
  //   permutation matrix is used.
  //
  // Before addressing these issues, we will decide (TODO):
  // - Is this Distribution general enough to be in Tpetra?
  // - Should we, instead, have a separate package for distributions (one that
  //   could use Zoltan2 and Tpetra without a circular dependence)?
  // - Or should we allow users (such as the triangle counting algorithm) to
  //   provide their own distributions (e.g., LowerTriangularBlock) that
  //   inherit from Tpetra's Distribution class?
  // For now, we will push this Distribution into Tpetra, but we will revisit
  // this decision.

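  // A small worked illustration of the requirements above (a sketch, not part
  // of the class interface): np = 3, 6, 10, 15, ... are valid processor
  // counts, giving q = 2, 3, 4, 5 chunk rows respectively. With np = 6 (q = 3)
  // and the 7x7 example pictured above, the rows are split into 3 chunks, and
  // the processor in chunk row m, chunk column n (n <= m, since only the lower
  // triangle is kept) is p = m(m+1)/2 + n; see procFromChunks below.
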
 public:
  using Distribution<gno_t, scalar_t>::me;
  using Distribution<gno_t, scalar_t>::np;
  using Distribution<gno_t, scalar_t>::comm;
  using Distribution<gno_t, scalar_t>::nrows;
  using typename Distribution<gno_t, scalar_t>::NZindex_t;
  using typename Distribution<gno_t, scalar_t>::LocalNZmap_t;

  using map_t = Tpetra::Map<>;
  using matrix_t = Tpetra::CrsMatrix<scalar_t>;

  DistributionLowerTriangularBlock(size_t nrows_,
                  const Teuchos::RCP<const Teuchos::Comm<int> > &comm_,
                  const Teuchos::ParameterList &params)
    : Distribution<gno_t, scalar_t>(nrows_, comm_, params)
    , initialDist(nrows_, comm_, params)
    , sortByDegree(false)
    , permMatrix(Teuchos::null)
    , redistributed(false)
    , chunksComputed(false)
    , nChunks(0) {
    int npnp = 2 * np;
    nChunks = int(std::sqrt(float(npnp)));
    while (nChunks * (nChunks + 1) < npnp) nChunks++;

    TEUCHOS_TEST_FOR_EXCEPTION(nChunks * (nChunks + 1) != npnp, std::logic_error,
      "Number of processors np = " << np << " must satisfy np = q(q+1)/2 for some q"
      << " for LowerTriangularBlock distribution");
    nChunksPerRow = double(nChunks) / double(nrows);

    const Teuchos::ParameterEntry *pe = params.getEntryPtr("sortByDegree");
    if (pe != NULL) sortByDegree = pe->getValue<bool>(&sortByDegree);

    pe = params.getEntryPtr("readPerProcess");
    if (pe != NULL) redistributed = pe->getValue<bool>(&redistributed);

    if (me == 0) std::cout << "\n LowerTriangularBlock Distribution: "
                           << "\n     np = " << np
                           << "\n     nChunks = " << nChunks
                           << std::endl;
  }
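
  // A numeric sketch of the q computation in the constructor above (assuming
  // np = q(q+1)/2): for np = 10, npnp = 20, int(sqrt(20)) = 4, and
  // 4 * 5 = 20 = npnp, so nChunks = q = 4. For np = 7, no integer q satisfies
  // q(q+1) = 14, and the TEUCHOS_TEST_FOR_EXCEPTION above fires.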

  enum DistributionType DistType() { return LowerTriangularBlock; }

  bool areChunksComputed() { return chunksComputed; }

  Teuchos::Array<gno_t> getChunkCuts() {
    if (chunksComputed)
      return chunkCuts;
    else {
      throw std::runtime_error("Error: Requested chunk cuts have not been computed yet.");
    }
  }

  // Return whether this rank owns vector entry i.
  // TODO: for now, use the same vector distribution as 1DLinear;
  // TODO: think about the best distribution of Vectors
  inline bool VecMine(gno_t i) { return initialDist.VecMine(i); }

  // Return whether this rank owns nonzero (i,j).
  // The vector map and row map are the same in the 1D distribution,
  // but keep only the lower triangular entries.
  bool Mine(gno_t i, gno_t j) {
    if (redistributed) {
      if (j > i)
        return false;  // Don't keep any upper triangular entries
      else
        return (procFromChunks(i, j) == me);
    } else
      return initialDist.Mine(i, j);
  }

  inline bool Mine(gno_t i, gno_t j, int p) { return Mine(i, j); }

  // How to redistribute according to the chunk-based row distribution
  void Redistribute(LocalNZmap_t &localNZ) {
    // Going to do chunking and sorting serially for now;
    // need to gather per-row information from each processor.
    // TODO: think about a parallel implementation

    gno_t myFirstRow = initialDist.getFirstRow(me);
    gno_t nMyRows = initialDist.getNumRow(me);
    Teuchos::Array<gno_t> nnzPerRow(nMyRows, 0);

    Teuchos::Array<int> rcvcnt(np);
    Teuchos::Array<int> disp(np);
    for (int sum = 0, p = 0; p < np; p++) {
      int prows = initialDist.getNumRow(p);
      rcvcnt[p] = prows;
      disp[p] = sum;
      sum += prows;
    }

    // If sortByDegree is requested, first sort with respect to ALL entries
    // in the matrix (not only the lower-triangular entries):
    // a decreasing sort by the number of entries per row in the global matrix.
    // Generate permuteIndex for the sorted rows.

    Teuchos::Array<gno_t> permuteIndex;  // This is the inverse permutation
    Teuchos::Array<gno_t> sortedOrder;   // This is the original permutation

    Teuchos::Array<gno_t> globalRowBuf;
    // TODO Dunno why there isn't a Teuchos::gatherAllv;
    // TODO for now, compute and broadcast
    if (me == 0) {
      globalRowBuf.resize(nrows, 0);  // TODO: Ick! Need parallel
    }

    if (sortByDegree) {
      // Compute nnzPerRow; the distribution is currently 1D and includes all nz
      for (auto it = localNZ.begin(); it != localNZ.end(); it++) {
        gno_t I = it->first.first;
        nnzPerRow[I - myFirstRow]++;
      }

      Teuchos::gatherv<int, gno_t>(nnzPerRow.getRawPtr(), nMyRows,
                                   globalRowBuf.getRawPtr(),
                                   rcvcnt.getRawPtr(), disp.getRawPtr(),
                                   0, *comm);

      permuteIndex.resize(nrows);  // TODO: Ick! Need parallel
      sortedOrder.resize(nrows);   // TODO: Ick! Need parallel

      if (me == 0) {  // TODO: do on all procs once have allgatherv

        for (size_t i = 0; i != nrows; i++) sortedOrder[i] = i;

        std::sort(sortedOrder.begin(), sortedOrder.end(),
                  [&](const size_t &a, const size_t &b) {
                    return (globalRowBuf[a] > globalRowBuf[b]);
                  });

        // Compute the inverse permutation; it is more useful for our needs
        for (size_t i = 0; i < nrows; i++) {
          permuteIndex[sortedOrder[i]] = i;
        }
      }

      Teuchos::broadcast<int, gno_t>(*comm, 0, permuteIndex(0, nrows));
      // Ick! Use a directory TODO

      // Sorting changes the global IDs associated
      // with rows/columns. To make this distribution applicable beyond
      // triangle counting (e.g., in a Tpetra operator), we need a way
      // to map from the original global IDs and back again.
      // Create a permutation matrix for use in the operator; use the
      // default Tpetra layout.
      Teuchos::Array<gno_t> myRows;
      for (size_t i = 0; i < nrows; i++) {
        if (VecMine(i)) myRows.push_back(i);
      }

      Tpetra::global_size_t dummy =
        Teuchos::OrdinalTraits<Tpetra::global_size_t>::invalid();
      Teuchos::RCP<const map_t> permMap =
        rcp(new map_t(dummy, myRows(), 0, comm));

      permMatrix = rcp(new matrix_t(permMap, 1));  // one nz / row in permMatrix

      Teuchos::Array<gno_t> cols(1);
      Teuchos::Array<scalar_t> vals(1);
      vals[0] = 1.;

      for (size_t i = 0; i < permMap->getLocalNumElements(); i++) {
        gno_t gid = permMap->getGlobalElement(i);
        cols[0] = permuteIndex[gid];
        permMatrix->insertGlobalValues(gid, cols(), vals());
      }

      permMatrix->fillComplete(permMap, permMap);
    }
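
    // What permMatrix looks like (a sketch of the construction just above):
    // each row gid holds a single 1 in column permuteIndex[gid], so applying
    // permMatrix with NO_TRANS gathers entries from the new (sorted) numbering
    // back to the original numbering, and TRANS maps the original numbering to
    // the sorted one. This is the P used by LowerTriangularBlockOperator below.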

    // Now, to determine the chunks, we care only about the number of
    // nonzeros in the lower triangular matrix.
    // Compute nnzPerRow; distribution is currently 1D
    nnzPerRow.assign(nMyRows, 0);
    size_t nnz = 0;
    for (auto it = localNZ.begin(); it != localNZ.end(); it++) {
      gno_t I = (sortByDegree ? permuteIndex[it->first.first]
                              : it->first.first);
      gno_t J = (sortByDegree ? permuteIndex[it->first.second]
                              : it->first.second);
      if (J <= I) {  // Lower-triangular part
        nnzPerRow[it->first.first - myFirstRow]++;
        nnz++;
      }
    }

    // TODO Dunno why there isn't a Teuchos::gatherAllv;
    // TODO for now, compute and broadcast

    Teuchos::gatherv<int, gno_t>(nnzPerRow.getRawPtr(), nMyRows,
                                 globalRowBuf.getRawPtr(),
                                 rcvcnt.getRawPtr(), disp.getRawPtr(),
                                 0, *comm);

    Teuchos::Array<int>().swap(rcvcnt);  // no longer needed
    Teuchos::Array<int>().swap(disp);    // no longer needed

    size_t gNnz;
    Teuchos::reduceAll(*comm, Teuchos::REDUCE_SUM, 1, &nnz, &gNnz);

    chunkCuts.resize(nChunks + 1, 0);

    if (me == 0) {  // TODO: when have allgatherv, can do on all procs
                    // TODO: or better, implement parallel version

      // Determine chunk cuts
      size_t target = gNnz / np;  // target nnz per processor
      size_t targetRunningTotal = 0;
      size_t currentRunningTotal = 0;
      gno_t I = gno_t(0);
      for (int chunkCnt = 0; chunkCnt < nChunks; chunkCnt++) {
        targetRunningTotal = (target * (chunkCnt + 1));
        currentRunningTotal = 0;
        while (I < static_cast<gno_t>(nrows)) {
          size_t nextNnz = (sortByDegree ? globalRowBuf[sortedOrder[I]]
                                         : globalRowBuf[I]);
          if (currentRunningTotal + nextNnz <= targetRunningTotal) {
            currentRunningTotal += nextNnz;
            I++;
          } else
            break;
        }
        chunkCuts[chunkCnt + 1] = I;
      }
      chunkCuts[nChunks] = static_cast<gno_t>(nrows);
    }
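
    // A worked example of the cut loop above (a sketch): with gNnz = 60 and
    // np = 6 (so nChunks = 3), target = 10. Chunk row 0 takes rows until its
    // nnz would exceed 1*10, chunk row 1 until 2*10, and chunk row 2 until
    // 3*10, so the chunk rows receive roughly 10, 20, and 30 nonzeros --
    // proportional to the 1, 2, and 3 processors they contain.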

    // Free memory associated with globalRowBuf
    Teuchos::Array<gno_t>().swap(globalRowBuf);

    Teuchos::broadcast<int, gno_t>(*comm, 0, chunkCuts(0, nChunks + 1));
    chunksComputed = true;

    // Determine the new owner of each nonzero; buffers for sending
    Kokkos::View<gno_t *, Kokkos::HostSpace> iOut("iOut", localNZ.size());
    Kokkos::View<gno_t *, Kokkos::HostSpace> jOut("jOut", localNZ.size());
    Kokkos::View<scalar_t *, Kokkos::HostSpace> vOut("vOut", localNZ.size());
    Teuchos::Array<int> pOut(localNZ.size());

    size_t sendCnt = 0;
    for (auto it = localNZ.begin(); it != localNZ.end(); it++) {
      iOut[sendCnt] = (sortByDegree ? permuteIndex[it->first.first]
                                    : it->first.first);
      jOut[sendCnt] = (sortByDegree ? permuteIndex[it->first.second]
                                    : it->first.second);
      if (jOut[sendCnt] <= iOut[sendCnt]) {  // keep only lower triangular entries
        vOut[sendCnt] = it->second;
        pOut[sendCnt] = procFromChunks(iOut[sendCnt], jOut[sendCnt]);

        sendCnt++;
      }
    }

    // Free memory associated with localNZ and permuteIndex
    LocalNZmap_t().swap(localNZ);
    if (sortByDegree) Teuchos::Array<gno_t>().swap(permuteIndex);

    // Use a Distributor to send nonzeros to new processors.
    Tpetra::Distributor plan(comm);
    size_t nrecvs = plan.createFromSends(pOut(0, sendCnt));
    Kokkos::View<gno_t *, Kokkos::HostSpace> iIn("iIn", nrecvs);
    Kokkos::View<gno_t *, Kokkos::HostSpace> jIn("jIn", nrecvs);
    Kokkos::View<scalar_t *, Kokkos::HostSpace> vIn("vIn", nrecvs);

    // TODO: With more clever packing, could do only one round of communication
    auto sendIndices = std::make_pair(static_cast<size_t>(0), sendCnt);
    plan.doPostsAndWaits(Kokkos::subview(iOut, sendIndices), 1, iIn);
    plan.doPostsAndWaits(Kokkos::subview(jOut, sendIndices), 1, jIn);
    plan.doPostsAndWaits(Kokkos::subview(vOut, sendIndices), 1, vIn);

    // Put received nonzeros in map
    for (size_t n = 0; n < nrecvs; n++) {
      NZindex_t nz(iIn[n], jIn[n]);
      localNZ[nz] = vIn[n];
    }

    redistributed = true;
  }

  Teuchos::RCP<matrix_t> getPermutationMatrix() const { return permMatrix; }

 private:
  // Initially distribute nonzeros with a 1D linear distribution
  Distribution1DLinear<gno_t, scalar_t> initialDist;

  // Flag indicating whether the matrix should be reordered and renumbered
  // in decreasing order of the number of nonzeros per row in the full matrix
  bool sortByDegree;

  // Permutation matrix, built only when sortByDegree = true
  Teuchos::RCP<matrix_t> permMatrix;

  // Flag indicating whether redistribution has occurred yet.
  // This is true
  //  i) after Tpetra performs the redistribution, or
  // ii) when Tpetra reads already-distributed nonzeros via the readPerProcess
  //     function
  bool redistributed;

  // If we read already-distributed nonzeros from per-process files,
  // this remains false until a triangle counting code actually computes
  // the chunks when the need arises.
  bool chunksComputed;

  int nChunks;  // in the np = q(q+1)/2 layout, nChunks = q
  double nChunksPerRow;
  Teuchos::Array<gno_t> chunkCuts;

  int findIdxInChunks(gno_t I) {
    int m = I * nChunksPerRow;
    while (I < chunkCuts[m]) m--;
    while (I >= chunkCuts[m + 1]) m++;
    return m;
  }

  int procFromChunks(gno_t I, gno_t J) {
    int m = findIdxInChunks(I);
    int n = findIdxInChunks(J);
    int p = m * (m + 1) / 2 + n;
    return p;
  }
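
  // A worked example of the two helpers above (a sketch): suppose nrows = 7,
  // nChunks = 3, and chunkCuts = {0, 2, 5, 7}, matching the 7x7 picture at the
  // top of this class. For the nonzero (5, 3), findIdxInChunks(5) starts from
  // m = int(5 * 3/7) = 2 and confirms chunkCuts[2] <= 5 < chunkCuts[3];
  // findIdxInChunks(3) yields n = 1. procFromChunks then returns
  // p = 2*3/2 + 1 = 4, the owner shown in the picture.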
};
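
// A minimal usage sketch for DistributionLowerTriangularBlock (hypothetical
// driver code, not part of this header). It assumes the nonzeros have already
// been read into the base class's LocalNZmap_t under the initial 1D
// distribution; Redistribute then moves them to the chunked layout:
//
//   Teuchos::ParameterList params;
//   params.set("sortByDegree", true);
//   Tpetra::DistributionLowerTriangularBlock<gno_t, scalar_t>
//     dist(nrows, comm, params);
//   // localNZ maps (i,j) -> value under the initial 1D distribution
//   dist.Redistribute(localNZ);
//   // localNZ now holds only the lower-triangular entries owned by this
//   // rank, as assigned by dist.Mine(i, j).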

// Tpetra::Operator that works with the DistributionLowerTriangularBlock

template <typename scalar_t,
          class Node = ::Tpetra::Details::DefaultTypes::node_type>
class LowerTriangularBlockOperator : public Tpetra::Operator<scalar_t, Tpetra::Map<>::local_ordinal_type,
                                                             Tpetra::Map<>::global_ordinal_type,
                                                             Node> {
 public:
  using lno_t = Tpetra::Map<>::local_ordinal_type;
  using gno_t = Tpetra::Map<>::global_ordinal_type;
  using map_t = Tpetra::Map<>;
  using import_t = Tpetra::Import<>;
  using export_t = Tpetra::Export<>;
  using vector_t = Tpetra::Vector<scalar_t>;
  using mvector_t = Tpetra::MultiVector<scalar_t>;
  using matrix_t = Tpetra::CrsMatrix<scalar_t>;
  using dist_t = Tpetra::DistributionLowerTriangularBlock<gno_t, scalar_t>;

  LowerTriangularBlockOperator(
    const Teuchos::RCP<const matrix_t> &lowerTriangularMatrix_,
    const dist_t &dist)
    : lowerTriangularMatrix(lowerTriangularMatrix_)
    , permMatrix(dist.getPermutationMatrix()) {
    // LowerTriangularBlockOperator requires the range map and domain map
    // to be the same. Check it here.
    TEUCHOS_TEST_FOR_EXCEPTION(
      !lowerTriangularMatrix->getRangeMap()->isSameAs(
        *lowerTriangularMatrix->getDomainMap()),
      std::logic_error,
      "The Domain and Range maps of the LowerTriangularBlock matrix "
      "must be the same");

    // Extract the diagonal entries

    vector_t diagByRowMap(lowerTriangularMatrix->getRowMap());
    lowerTriangularMatrix->getLocalDiagCopy(diagByRowMap);
    diag = Teuchos::rcp(new vector_t(lowerTriangularMatrix->getRangeMap()));
    Tpetra::Export<> exporter(lowerTriangularMatrix->getRowMap(),
                              lowerTriangularMatrix->getRangeMap());
    diag->doExport(diagByRowMap, exporter, Tpetra::ADD);
  }

  void apply(const mvector_t &x, mvector_t &y, Teuchos::ETransp mode,
             scalar_t alpha, scalar_t beta) const {
    scalar_t ZERO = Teuchos::ScalarTraits<scalar_t>::zero();
    scalar_t ONE = Teuchos::ScalarTraits<scalar_t>::one();
    if (alpha == ZERO) {
      if (beta == ZERO)
        y.putScalar(ZERO);
      else
        y.scale(beta);
      return;
    }

    if (permMatrix == Teuchos::null) {
      // Multiply by the lower triangular part
      lowerTriangularMatrix->apply(x, y, Teuchos::NO_TRANS, alpha, beta);

      // Multiply by the upper triangular part
      lowerTriangularMatrix->apply(x, y, Teuchos::TRANS, alpha, ONE);

      // Subtract out the duplicated diagonal terms
      y.elementWiseMultiply(-alpha, *diag, x, ONE);
    } else {
      // With sorting, the LowerTriangularBlock distribution stores (P^T A P)
      // in the CrsMatrix, for permutation matrix P.
      // Thus, apply must compute
      //   y = P (beta (P^T y) + alpha (P^T A P) (P^T x))

      mvector_t xtmp(x.getMap(), x.getNumVectors());
      mvector_t ytmp(y.getMap(), y.getNumVectors());

      permMatrix->apply(x, xtmp, Teuchos::TRANS);
      if (beta != ZERO) permMatrix->apply(y, ytmp, Teuchos::TRANS);

      // Multiply by the lower triangular part
      lowerTriangularMatrix->apply(xtmp, ytmp, Teuchos::NO_TRANS, alpha, beta);

      // Multiply by the upper triangular part
      lowerTriangularMatrix->apply(xtmp, ytmp, Teuchos::TRANS, alpha, ONE);

      // Subtract out the duplicated diagonal terms
      ytmp.elementWiseMultiply(-alpha, *diag, xtmp, ONE);

      permMatrix->apply(ytmp, y, Teuchos::NO_TRANS);
    }
  }
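
  // Why the three applications above give the full symmetric SpMV (a short
  // derivation, not extra functionality): only the lower triangle L of the
  // symmetric matrix A = L + L^T - D is stored, where D is the diagonal.
  // Applying the stored matrix with NO_TRANS contributes L*x, applying it with
  // TRANS contributes L^T*x, and the diagonal is counted in both, so the
  // elementWiseMultiply subtracts alpha*D*x once, leaving y = alpha*A*x + beta*y.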

  Teuchos::RCP<const map_t> getDomainMap() const {
    return lowerTriangularMatrix->getDomainMap();
  }

  Teuchos::RCP<const map_t> getRangeMap() const {
    return lowerTriangularMatrix->getRangeMap();
  }

  bool hasTransposeApply() const { return true; }  // Symmetric matrix

 private:
  const Teuchos::RCP<const matrix_t> lowerTriangularMatrix;
  const Teuchos::RCP<const matrix_t> permMatrix;
  Teuchos::RCP<vector_t> diag;
};
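
// A minimal usage sketch for LowerTriangularBlockOperator (hypothetical driver
// code, not part of this header): wrap the lower-triangular CrsMatrix that was
// built under the distribution, then use the operator like any symmetric A in
// y = alpha*A*x + beta*y:
//
//   Teuchos::RCP<const Tpetra::CrsMatrix<scalar_t> > L = ...;  // lower triangle
//   Tpetra::LowerTriangularBlockOperator<scalar_t> op(L, dist);
//   op.apply(x, y, Teuchos::NO_TRANS, alpha, beta);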

}  // namespace Tpetra
#endif