Tpetra parallel linear algebra
Version of the Day
Nonmember function that computes a residual. Computes R = B - A * X. More...
Namespaces | |
DefaultTypes | |
Declarations of values of Tpetra classes' default template parameters. | |
DeepCopyCounter | |
Counter for Kokkos::deep_copy calls. | |
FenceCounter | |
Counter for Kokkos::fence calls. | |
KokkosRegionCounter | |
Counter for Kokkos regions representing third-party library usage. | |
Classes | |
struct | AbsMax |
Functor for the ABSMAX CombineMode of Import and Export operations. More... | |
class | Behavior |
Description of Tpetra's behavior. More... | |
class | CrsPadding |
Keep track of how much more space a CrsGraph or CrsMatrix needs, when the graph or matrix is the target of a doExport or doImport. More... | |
struct | LocalTriangularStructureResult |
Return value of determineLocalTriangularStructure. More... | |
class | DistributorPlan |
struct | EquilibrationInfo |
Struct storing results of Tpetra::computeRowAndColumnOneNorms. More... | |
class | FixedHashTable |
struct | CrsMatrixGetDiagCopyFunctor |
Functor that implements much of the one-argument overload of Tpetra::CrsMatrix::getLocalDiagCopy, for the case where the matrix is fill complete. More... | |
struct | Hash |
The hash function for FixedHashTable. More... | |
struct | Hash< KeyType, DeviceType, OffsetType, int > |
Specialization for ResultType = int. More... | |
class | CommRequest |
Base class for the request (more or less a future) representing a pending nonblocking MPI operation. More... | |
struct | IntRowPtrHelper |
class | LeftScaleLocalCrsMatrix |
Kokkos::parallel_for functor that left-scales a KokkosSparse::CrsMatrix. More... | |
class | LocalMap |
"Local" part of Map suitable for Kokkos kernels. More... | |
class | OptColMap |
Implementation detail of makeOptimizedColMap and makeOptimizedColMapAndImport. More... | |
struct | MatrixApplyHelper |
struct | PackTraits |
Traits class for packing / unpacking data of type T. More... | |
class | ProfilingRegion |
Profile the given scope. More... | |
struct | LocalResidualFunctor |
Functor for computing the residual. More... | |
struct | OffRankUpdateFunctor |
Functor for computing R -= A_offRank*X_colmap. More... | |
class | RightScaleLocalCrsMatrix |
Kokkos::parallel_for functor that right-scales a KokkosSparse::CrsMatrix. More... | |
struct | ScalarViewTraits |
Traits class for allocating a Kokkos::View<T*, D>. More... | |
class | Transfer |
Common base class of Import and Export. More... | |
class | WrappedDualView |
A wrapper around Kokkos::DualView to safely manage data that might be replicated between host and device. More... | |
class | Directory |
Computes the local ID and process ID corresponding to given global IDs. More... | |
class | ReplicatedDirectory |
Implementation of Directory for a locally replicated Map. More... | |
class | ContiguousUniformDirectory |
Implementation of Directory for a contiguous, uniformly distributed Map. More... | |
class | DistributedContiguousDirectory |
Implementation of Directory for a distributed contiguous Map. More... | |
class | DistributedNoncontiguousDirectory |
Implementation of Directory for a distributed noncontiguous Map. More... | |
class | InvalidGlobalIndex |
Exception thrown by CrsMatrix on invalid global index. More... | |
class | InvalidGlobalRowIndex |
Exception thrown by CrsMatrix on invalid global row index. More... | |
class | HashTable |
class | TieBreak |
Interface for breaking ties in ownership. More... | |
class | CooMatrix |
Sparse matrix used only for file input / output. More... | |
Enumerations | |
enum | EStorageStatus |
Status of the graph's or matrix's storage, when not in a fill-complete state. More... | |
enum | EDistributorSendType |
The type of MPI send that Distributor should use. More... | |
enum | EDistributorHowInitialized |
Enum indicating how and whether a Distributor was initialized. More... | |
enum | EWhichNorm |
Input argument for normImpl(); see that function for details. More... | |
Functions | |
template<class DstViewType , class SrcViewType , class DstWhichVecsType , class SrcWhichVecsType > | |
void | localDeepCopy (const DstViewType &dst, const SrcViewType &src, const bool dstConstStride, const bool srcConstStride, const DstWhichVecsType &dstWhichVecs, const SrcWhichVecsType &srcWhichVecs) |
Implementation of Tpetra::MultiVector deep copy of local data. More... | |
template<class DstViewType , class SrcViewType > | |
void | localDeepCopyConstStride (const DstViewType &dst, const SrcViewType &src) |
Implementation of Tpetra::MultiVector deep copy of local data, for when both the source and destination MultiVector objects have constant stride (isConstantStride() is true). More... | |
template<class SC , class LO , class GO , class NT > | |
void | computeLocalRowScaledColumnNorms_RowMatrix (EquilibrationInfo< typename Kokkos::ArithTraits< SC >::val_type, typename NT::device_type > &result, const Tpetra::RowMatrix< SC, LO, GO, NT > &A) |
For a given Tpetra::RowMatrix that is not a Tpetra::CrsMatrix, assume that result.rowNorms has been computed (and globalized), and compute result.rowScaledColNorms. More... | |
template<class SC , class LO , class GO , class NT > | |
EquilibrationInfo< typename Kokkos::ArithTraits< SC > ::val_type, typename NT::device_type > | computeLocalRowOneNorms_RowMatrix (const Tpetra::RowMatrix< SC, LO, GO, NT > &A) |
Implementation of computeLocalRowOneNorms for a Tpetra::RowMatrix that is NOT a Tpetra::CrsMatrix. More... | |
template<class SC , class LO , class GO , class NT > | |
EquilibrationInfo< typename Kokkos::ArithTraits< SC > ::val_type, typename NT::device_type > | computeLocalRowAndColumnOneNorms_RowMatrix (const Tpetra::RowMatrix< SC, LO, GO, NT > &A, const bool assumeSymmetric) |
Implementation of computeLocalRowAndColumnOneNorms for a Tpetra::RowMatrix that is NOT a Tpetra::CrsMatrix. More... | |
template<class SC , class LO , class GO , class NT > | |
EquilibrationInfo< typename Kokkos::ArithTraits< SC > ::val_type, typename NT::device_type > | computeLocalRowOneNorms_CrsMatrix (const Tpetra::CrsMatrix< SC, LO, GO, NT > &A) |
Implementation of computeLocalRowOneNorms for a Tpetra::CrsMatrix. More... | |
template<class SC , class LO , class GO , class NT > | |
EquilibrationInfo< typename Kokkos::ArithTraits< SC > ::val_type, typename NT::device_type > | computeLocalRowAndColumnOneNorms_CrsMatrix (const Tpetra::CrsMatrix< SC, LO, GO, NT > &A, const bool assumeSymmetric) |
Implementation of computeLocalRowAndColumnOneNorms for a Tpetra::CrsMatrix. More... | |
template<class SC , class LO , class GO , class NT > | |
EquilibrationInfo< typename Kokkos::ArithTraits< SC > ::val_type, typename NT::device_type > | computeLocalRowOneNorms (const Tpetra::RowMatrix< SC, LO, GO, NT > &A) |
Compute LOCAL row one-norms ("row sums" etc.) of the input sparse matrix A. More... | |
template<class SC , class LO , class GO , class NT > | |
EquilibrationInfo< typename Kokkos::ArithTraits< SC > ::val_type, typename NT::device_type > | computeLocalRowAndColumnOneNorms (const Tpetra::RowMatrix< SC, LO, GO, NT > &A, const bool assumeSymmetric) |
Compute LOCAL row and column one-norms ("row sums" etc.) of the input sparse matrix A. Optionally, also compute row-scaled column norms (in the manner of LAPACK's DGEEQU routine). More... | |
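As an illustration of what "local row and column one-norms" means, here is a minimal standalone sketch. It takes a one-norm to be the sum of absolute values of a row's or column's entries. The `CsrMatrix` and `OneNorms` types and the function name are hypothetical stand-ins, not Tpetra or Kokkos types, and the sketch omits the row-scaled column norms and any MPI communication.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a local (single-process) CSR matrix.
struct CsrMatrix {
  std::size_t numRows;
  std::size_t numCols;
  std::vector<std::size_t> rowPtr;  // numRows + 1 row offsets
  std::vector<std::size_t> colInd;  // local column index of each entry
  std::vector<double> values;       // value of each entry
};

// Hypothetical result type; only loosely modeled on EquilibrationInfo.
struct OneNorms {
  std::vector<double> rowNorms;
  std::vector<double> colNorms;
};

// Accumulate |A(i,j)| into the norm of row i and the norm of column j.
OneNorms localRowAndColumnOneNormsSketch(const CsrMatrix& A) {
  assert(A.rowPtr.size() == A.numRows + 1);
  OneNorms result{std::vector<double>(A.numRows, 0.0),
                  std::vector<double>(A.numCols, 0.0)};
  for (std::size_t row = 0; row < A.numRows; ++row) {
    for (std::size_t k = A.rowPtr[row]; k < A.rowPtr[row + 1]; ++k) {
      const double absVal = std::abs(A.values[k]);
      result.rowNorms[row] += absVal;
      result.colNorms[A.colInd[k]] += absVal;
    }
  }
  return result;
}
```

In a distributed setting, the local column norms would still need to be combined across processes; that step is outside the scope of this sketch.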
template<class LO , class GO , class DT , class OffsetType , class NumEntType > | |
OffsetType | convertColumnIndicesFromGlobalToLocal (const Kokkos::View< LO *, DT > &lclColInds, const Kokkos::View< const GO *, DT > &gblColInds, const Kokkos::View< const OffsetType *, DT > &ptr, const LocalMap< LO, GO, DT > &lclColMap, const Kokkos::View< const NumEntType *, DT > &numRowEnt) |
Convert a CrsGraph's global column indices into local column indices. More... | |
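The conversion can be pictured with a small standalone sketch. It uses `std::unordered_map` in place of `LocalMap` and `std::vector` in place of `Kokkos::View`. The function name, the `-1` sentinel, and the choice to return the number of global indices that have no local counterpart are assumptions made for illustration, not a statement of the library's behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// For each row, convert the first numRowEnt[row] entries starting at
// ptr[row]. Entries past numRowEnt[row] are treated as unused padding and
// are left untouched. Returns how many global indices were not found.
std::size_t convertGlobalToLocalSketch(
    std::vector<int>& lclColInds,
    const std::vector<long long>& gblColInds,
    const std::vector<std::size_t>& ptr,
    const std::vector<std::size_t>& numRowEnt,
    const std::unordered_map<long long, int>& lclColMap) {
  assert(lclColInds.size() == gblColInds.size());
  assert(ptr.size() == numRowEnt.size() + 1);
  const int invalid = -1;  // hypothetical "no such local index" sentinel
  std::size_t numBad = 0;
  for (std::size_t row = 0; row < numRowEnt.size(); ++row) {
    for (std::size_t k = ptr[row]; k < ptr[row] + numRowEnt[row]; ++k) {
      const auto it = lclColMap.find(gblColInds[k]);
      if (it == lclColMap.end()) {
        lclColInds[k] = invalid;
        ++numBad;
      } else {
        lclColInds[k] = it->second;
      }
    }
  }
  return numBad;
}
```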
template<class SC , class LO , class GO , class NO > | |
void | residual (const Operator< SC, LO, GO, NO > &Aop, const MultiVector< SC, LO, GO, NO > &X_in, const MultiVector< SC, LO, GO, NO > &B_in, MultiVector< SC, LO, GO, NO > &R_in) |
Computes R = B - A * X. More... | |
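For readers who want to see the arithmetic behind R = B - A * X, the following standalone sketch computes it for a single vector and a local CSR matrix. `LocalCsr` and `residualSketch` are hypothetical names; the sketch does not use `Operator` or `MultiVector` and ignores off-process columns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical local CSR storage, not a Tpetra type.
struct LocalCsr {
  std::vector<std::size_t> rowPtr;  // numRows + 1 row offsets
  std::vector<std::size_t> colInd;  // column index of each entry
  std::vector<double> values;       // value of each entry
};

// r[i] = b[i] - sum_j A(i,j) * x[j], computed one row at a time.
std::vector<double> residualSketch(const LocalCsr& A,
                                   const std::vector<double>& x,
                                   const std::vector<double>& b) {
  const std::size_t numRows = b.size();
  assert(A.rowPtr.size() == numRows + 1);
  std::vector<double> r(numRows);
  for (std::size_t row = 0; row < numRows; ++row) {
    double sum = b[row];
    for (std::size_t k = A.rowPtr[row]; k < A.rowPtr[row + 1]; ++k) {
      sum -= A.values[k] * x[A.colInd[k]];
    }
    r[row] = sum;
  }
  return r;
}
```

Fusing the subtraction into the row loop avoids a temporary for A * X, which is one plausible motivation for offering a dedicated residual function.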
template<class InputViewType , class OutputViewType > | |
static void | allReduceView (const OutputViewType &output, const InputViewType &input, const Teuchos::Comm< int > &comm) |
All-reduce from input Kokkos::View to output Kokkos::View. More... | |
template<class ValueType , class DeviceType > | |
Kokkos::DualView< ValueType *, DeviceType > | castAwayConstDualView (const Kokkos::DualView< const ValueType *, DeviceType > &input_dv) |
Cast away const-ness of a 1-D Kokkos::DualView. More... | |
template<class DataType , class... Properties> | |
bool | checkLocalViewValidity (std::ostream *lclErrStrm, const int myMpiProcessRank, const Kokkos::View< DataType, Properties...> &view) |
Is the given View valid? More... | |
template<class DataType , class... Args> | |
bool | checkLocalDualViewValidity (std::ostream *const lclErrStrm, const int myMpiProcessRank, const Kokkos::DualView< DataType, Args...> &dv) |
Is the given Kokkos::DualView valid? More... | |
template<class DataType , class... Args> | |
bool | checkLocalWrappedDualViewValidity (std::ostream *const lclErrStrm, const int myMpiProcessRank, const Tpetra::Details::WrappedDualView< Kokkos::DualView< DataType, Args...> > &dv) |
Is the given Tpetra::WrappedDualView valid? More... | |
template<class ExecutionSpace , class OffsetsViewType , class CountsViewType , class SizeType = typename OffsetsViewType::size_type> | |
OffsetsViewType::non_const_value_type | computeOffsetsFromCounts (const ExecutionSpace &execSpace, const OffsetsViewType &ptr, const CountsViewType &counts) |
Compute offsets from counts. More... | |
template<class OffsetsViewType , class CountsViewType , class SizeType = typename OffsetsViewType::size_type> | |
OffsetsViewType::non_const_value_type | computeOffsetsFromCounts (const OffsetsViewType &ptr, const CountsViewType &counts) |
Overload that uses OffsetsViewType's execution space. More... | |
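Computing offsets from counts is an exclusive prefix sum. The sequential sketch below shows the idea with `std::vector`. The function name is hypothetical, and returning the total of all counts is an assumption about what a caller would want, not a description of the library's return value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fill ptr so that ptr[i] is the sum of counts[0 .. i-1].
// ptr must have one more entry than counts; the last entry is the total.
std::size_t computeOffsetsFromCountsSketch(
    std::vector<std::size_t>& ptr,
    const std::vector<std::size_t>& counts) {
  assert(ptr.size() == counts.size() + 1);
  std::size_t sum = 0;
  for (std::size_t i = 0; i < counts.size(); ++i) {
    ptr[i] = sum;
    sum += counts[i];
  }
  ptr[counts.size()] = sum;
  return sum;
}
```

With per-row entry counts {3, 0, 2}, this yields the CSR row offsets {0, 3, 3, 5}.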
template<class OffsetsViewType , class CountType , class SizeType = typename OffsetsViewType::size_type> | |
OffsetsViewType::non_const_value_type | computeOffsetsFromConstantCount (const OffsetsViewType &ptr, const CountType count) |
Compute offsets from a constant count. More... | |
template<class OutputViewType , class InputViewType > | |
void | copyConvert (const OutputViewType &dst, const InputViewType &src) |
Copy values from the 1-D Kokkos::View src, to the 1-D Kokkos::View dst, of the same length. The entries of src and dst may have different types, but it must be possible to copy-construct each entry of dst with its corresponding entry of src. More... | |
template<class OutputViewType , class InputViewType > | |
void | copyOffsets (const OutputViewType &dst, const InputViewType &src) |
Copy row offsets (in a sparse graph or matrix) from src to dst. The offsets may have different types. More... | |
template<class ValueType , class OutputDeviceType > | |
Impl::CreateMirrorViewFromUnmanagedHostArray < ValueType, OutputDeviceType > ::output_view_type | create_mirror_view_from_raw_host_array (const OutputDeviceType &, ValueType *inPtr, const size_t inSize, const bool copy=true, const char label[]="") |
Variant of Kokkos::create_mirror_view that takes a raw host 1-d array as input. More... | |
template<class SparseMatrixType , class ValsViewType > | |
KOKKOS_FUNCTION SparseMatrixType::ordinal_type | crsMatrixSumIntoValues_sortedSortedLinear (const SparseMatrixType &A, const typename SparseMatrixType::ordinal_type lclRow, const typename SparseMatrixType::ordinal_type lclColInds[], const typename SparseMatrixType::ordinal_type sortPerm[], const ValsViewType &vals, const typename SparseMatrixType::ordinal_type numEntInInput, const bool forceAtomic=false, const bool checkInputIndices=true) |
A(lclRow, lclColInds[sortPerm[j]]) += vals[sortPerm[j]], for all j in 0 .. numEntInInput-1. More... | |
template<class SparseMatrixType , class ValsViewType > | |
KOKKOS_FUNCTION SparseMatrixType::ordinal_type | crsMatrixReplaceValues_sortedSortedLinear (const SparseMatrixType &A, const typename SparseMatrixType::ordinal_type lclRow, const typename SparseMatrixType::ordinal_type lclColInds[], const typename SparseMatrixType::ordinal_type sortPerm[], const ValsViewType &vals, const typename SparseMatrixType::ordinal_type numEntInInput, const bool forceAtomic=false, const bool checkInputIndices=true) |
A(lclRow, lclColInds[sortPerm[j]]) = vals[sortPerm[j]], for all j in 0 .. numEntInInput-1. More... | |
template<class SparseMatrixType , class VectorViewType , class RhsViewType , class LhsViewType > | |
KOKKOS_FUNCTION SparseMatrixType::ordinal_type | crsMatrixAssembleElement_sortedLinear (const SparseMatrixType &A, const VectorViewType &x, typename SparseMatrixType::ordinal_type lids[], typename SparseMatrixType::ordinal_type sortPerm[], const RhsViewType &rhs, const LhsViewType &lhs, const bool forceAtomic=false, const bool checkInputIndices=true) |
A(lids[j], lids[j]) += lhs(j,j) and x(lids[j]) += rhs(j) , for all j in 0 .. eltDim-1. More... | |
template<class RowPtr , class Indices , class Padding > | |
void | padCrsArrays (const RowPtr &rowPtrBeg, const RowPtr &rowPtrEnd, Indices &indices_wdv, const Padding &padding, const int my_rank, const bool verbose) |
Determine if the row pointers and indices arrays need to be resized to accommodate new entries. If they do need to be resized, resize the indices arrays and shift the existing contents to accommodate new entries. Modify values in the row pointers array to point to the newly shifted locations in the indices arrays. More... | |
template<class Pointers , class InOutIndices , class InIndices > | |
size_t | insertCrsIndices (typename Pointers::value_type const row, Pointers const &rowPtrs, InOutIndices &curIndices, size_t &numAssigned, InIndices const &newIndices, std::function< void(const size_t, const size_t, const size_t)> cb=std::function< void(const size_t, const size_t, const size_t)>()) |
Insert new indices into the current list of indices. More... | |
template<class Pointers , class Indices1 , class Indices2 , class Callback > | |
size_t | findCrsIndices (typename Pointers::value_type const row, Pointers const &rowPtrs, const size_t curNumEntries, Indices1 const &curIndices, Indices2 const &newIndices, Callback &&cb) |
Find offsets into the current list of indices. More... | |
template<class LocalGraphType , class LocalMapType > | |
LocalTriangularStructureResult < typename LocalMapType::local_ordinal_type > | determineLocalTriangularStructure (const LocalGraphType &G, const LocalMapType &rowMap, const LocalMapType &colMap, const bool ignoreMapsForTriangularStructure) |
Count the local number of diagonal entries in a local sparse graph, and determine whether the local part of the graph is structurally lower or upper triangular (or neither). More... | |
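The check described above can be sketched for a plain CSR graph as follows. The sketch assumes that local row index i and local column index i refer to the same diagonal position (roughly the situation where the row and column Maps are ignored). The struct and function names are hypothetical, and the result carries fewer fields than the library's result type may.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch only.
struct TriangularStructureSketch {
  std::size_t diagCount;        // rows that have a diagonal entry
  bool couldBeLowerTriangular;  // no entry lies above the diagonal
  bool couldBeUpperTriangular;  // no entry lies below the diagonal
};

TriangularStructureSketch determineTriangularStructureSketch(
    const std::vector<std::size_t>& rowPtr,
    const std::vector<std::size_t>& colInd) {
  assert(!rowPtr.empty());
  TriangularStructureSketch result{0, true, true};
  const std::size_t numRows = rowPtr.size() - 1;
  for (std::size_t row = 0; row < numRows; ++row) {
    bool foundDiag = false;
    for (std::size_t k = rowPtr[row]; k < rowPtr[row + 1]; ++k) {
      const std::size_t col = colInd[k];
      if (col == row) {
        foundDiag = true;
      } else if (col > row) {
        result.couldBeLowerTriangular = false;  // entry above the diagonal
      } else {
        result.couldBeUpperTriangular = false;  // entry below the diagonal
      }
    }
    if (foundDiag) {
      ++result.diagCount;
    }
  }
  return result;
}
```

A diagonal graph comes back with both flags set, which is why the flags are phrased as "could be" rather than "is".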
std::string | DistributorSendTypeEnumToString (EDistributorSendType sendType) |
Convert an EDistributorSendType enum value to a string. More... | |
std::string | DistributorHowInitializedEnumToString (EDistributorHowInitialized how) |
Convert an EDistributorHowInitialized enum value to a string. More... | |
auto | view_alloc_no_init (const std::string &label) -> decltype(Kokkos::view_alloc(label, Kokkos::WithoutInitializing)) |
Use in place of the string label as the first argument of Kokkos::View's constructor, in case you want to allocate without initializing. More... | |
template<class ElementType , class DeviceType > | |
void | makeDualViewFromOwningHostView (Kokkos::DualView< ElementType *, DeviceType > &dv, const typename Kokkos::DualView< ElementType *, DeviceType >::t_host &hostView) |
Initialize dv such that its host View is hostView. More... | |
bool | teuchosCommIsAnMpiComm (const Teuchos::Comm< int > &comm) |
Is the given Comm a Teuchos::MpiComm<int> instance? More... | |
void | gathervPrint (std::ostream &out, const std::string &s, const Teuchos::Comm< int > &comm) |
On Process 0 in the given communicator, print strings from each process in that communicator, in rank order. More... | |
template<class DiagType , class LocalMapType , class CrsMatrixType > | |
static LocalMapType::local_ordinal_type | getDiagCopyWithoutOffsets (const DiagType &D, const LocalMapType &rowMap, const LocalMapType &colMap, const CrsMatrixType &A) |
Given a locally indexed, local sparse matrix, and corresponding local row and column Maps, extract the matrix's diagonal entries into a 1-D Kokkos::View. More... | |
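Extracting a diagonal without precomputed offsets means searching each row for its diagonal entry. The sketch below does this for a CSR matrix under the simplifying assumption that local row index i and local column index i name the same global index, so no Map lookup is needed. The function name and the choice to return the number of rows without a stored diagonal entry are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Write A(i,i) into diag[i], or 0 if row i stores no diagonal entry.
// Returns the number of rows with no stored diagonal entry.
std::size_t extractDiagonalSketch(std::vector<double>& diag,
                                  const std::vector<std::size_t>& rowPtr,
                                  const std::vector<std::size_t>& colInd,
                                  const std::vector<double>& values) {
  assert(diag.size() + 1 == rowPtr.size());
  std::size_t numMissing = 0;
  for (std::size_t row = 0; row < diag.size(); ++row) {
    diag[row] = 0.0;
    bool found = false;
    // Linear search: column indices are not assumed to be sorted.
    for (std::size_t k = rowPtr[row]; k < rowPtr[row + 1]; ++k) {
      if (colInd[k] == row) {
        diag[row] = values[k];
        found = true;
        break;
      }
    }
    if (!found) {
      ++numMissing;
    }
  }
  return numMissing;
}
```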
template<class SC , class LO , class GO , class NT > | |
LO | getLocalDiagCopyWithoutOffsetsNotFillComplete (::Tpetra::Vector< SC, LO, GO, NT > &diag, const ::Tpetra::RowMatrix< SC, LO, GO, NT > &A, const bool debug=false) |
Given a locally indexed, global sparse matrix, extract the matrix's diagonal entries into a Tpetra::Vector. More... | |
template<class CrsGraphType > | |
CrsGraphType::local_ordinal_type | getLocalNumDiags (const CrsGraphType &G) |
Number of populated diagonal entries in the given sparse graph, on the calling (MPI) process. More... | |
template<class CrsGraphType > | |
CrsGraphType::global_ordinal_type | getGlobalNumDiags (const CrsGraphType &G) |
Number of populated diagonal entries in the given sparse graph, over all processes in the graph's (MPI) communicator. More... | |
template<class InputViewType , class OutputViewType > | |
std::shared_ptr< CommRequest > | iallreduce (const InputViewType &sendbuf, const OutputViewType &recvbuf, const ::Teuchos::EReductionType op, const ::Teuchos::Comm< int > &comm) |
Nonblocking all-reduce, for either rank-1 or rank-0 Kokkos::View objects. More... | |
void | initializeKokkos () |
Initialize Kokkos, using command-line arguments (if any) given to Teuchos::GlobalMPISession. More... | |
bool | isInterComm (const Teuchos::Comm< int > &comm) |
Return true if and only if the input communicator wraps an MPI intercommunicator. More... | |
template<class LocalSparseMatrixType , class ScalingFactorsViewType > | |
void | leftScaleLocalCrsMatrix (const LocalSparseMatrixType &A_lcl, const ScalingFactorsViewType &scalingFactors, const bool assumeSymmetric, const bool divide=true) |
Left-scale a KokkosSparse::CrsMatrix. More... | |
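Left-scaling applies one factor per row to every entry in that row. A minimal sequential sketch, using plain CSR arrays instead of `KokkosSparse::CrsMatrix`, is shown below. The function name is hypothetical, and the `assumeSymmetric` parameter is deliberately not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scale every entry of row i by scalingFactors[i]: divide when `divide`
// is true, multiply otherwise.
void leftScaleSketch(const std::vector<std::size_t>& rowPtr,
                     std::vector<double>& values,
                     const std::vector<double>& scalingFactors,
                     const bool divide = true) {
  assert(rowPtr.size() == scalingFactors.size() + 1);
  for (std::size_t row = 0; row < scalingFactors.size(); ++row) {
    const double s = scalingFactors[row];
    for (std::size_t k = rowPtr[row]; k < rowPtr[row + 1]; ++k) {
      if (divide) {
        values[k] /= s;
      } else {
        values[k] *= s;
      }
    }
  }
}
```

Right-scaling would be the analogous operation with the factor chosen by each entry's column index rather than its row.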
template<class LO , class GO , class NT > | |
int | makeColMap (Teuchos::RCP< const Tpetra::Map< LO, GO, NT > > &colMap, Teuchos::Array< int > &remotePIDs, const Teuchos::RCP< const Tpetra::Map< LO, GO, NT > > &domMap, const RowGraph< LO, GO, NT > &graph, const bool sortEachProcsGids=true, std::ostream *errStrm=NULL) |
Make the graph's column Map. More... | |
template<class LO , class GO , class NT > | |
int | makeColMap (Teuchos::RCP< const Tpetra::Map< LO, GO, NT >> &colMap, const Teuchos::RCP< const Tpetra::Map< LO, GO, NT >> &domMap, Kokkos::View< GO *, typename NT::memory_space > gids, std::ostream *errStrm=NULL) |
Construct a column map for the given set of gids (always sorting remote GIDs within each remote process). More... | |
template<class MapType > | |
Teuchos::RCP< const MapType > | makeOptimizedColMap (std::ostream &errStream, bool &lclErr, const MapType &domMap, const MapType &colMap, const Tpetra::Import< typename MapType::local_ordinal_type, typename MapType::global_ordinal_type, typename MapType::node_type > *oldImport=nullptr) |
Return an optimized reordering of the given column Map. More... | |
template<class MapType > | |
std::pair< Teuchos::RCP< const MapType >, Teuchos::RCP < typename OptColMap< MapType > ::import_type > > | makeOptimizedColMapAndImport (std::ostream &errStream, bool &lclErr, const MapType &domMap, const MapType &colMap, const typename OptColMap< MapType >::import_type *oldImport=nullptr) |
Return an optimized reordering of the given column Map. Optionally, recompute an Import from the input domain Map to the new column Map. More... | |
template<class OrdinalType , class IndexType > | |
IndexType | countMergeUnsortedIndices (const OrdinalType curInds[], const IndexType numCurInds, const OrdinalType inputInds[], const IndexType numInputInds) |
Count the number of column indices that can be merged into the current row, assuming that both the current row's indices and the input indices are unsorted. More... | |
template<class OrdinalType , class IndexType > | |
IndexType | countMergeSortedIndices (const OrdinalType curInds[], const IndexType numCurInds, const OrdinalType inputInds[], const IndexType numInputInds) |
Count the number of column indices that can be merged into the current row, assuming that both the current row's indices and the input indices are sorted. More... | |
template<class OrdinalType , class IndexType > | |
std::pair< bool, IndexType > | mergeSortedIndices (OrdinalType curInds[], const IndexType midPos, const IndexType endPos, const OrdinalType inputInds[], const IndexType numInputInds) |
Attempt to merge the input indices into the current row's column indices, assuming that both the current row's indices and the input indices are sorted. More... | |
template<class OrdinalType , class IndexType > | |
std::pair< bool, IndexType > | mergeUnsortedIndices (OrdinalType curInds[], const IndexType midPos, const IndexType endPos, const OrdinalType inputInds[], const IndexType numInputInds) |
Attempt to merge the input indices into the current row's column indices, assuming that both the current row's indices and the input indices are unsorted. More... | |
template<class OrdinalType , class ValueType , class IndexType > | |
std::pair< bool, IndexType > | mergeUnsortedIndicesAndValues (OrdinalType curInds[], ValueType curVals[], const IndexType midPos, const IndexType endPos, const OrdinalType inputInds[], const ValueType inputVals[], const IndexType numInputInds) |
Attempt to merge the input indices and values into the current row's column indices and corresponding values, assuming that both the current row's indices and the input indices are unsorted. More... | |
bool | mpiIsInitialized () |
Has MPI_Init been called (on this process)? More... | |
bool | mpiIsFinalized () |
Has MPI_Finalize been called (on this process)? More... | |
template<class ValueType , class ArrayLayout , class DeviceType , class MagnitudeType > | |
void | normImpl (MagnitudeType norms[], const Kokkos::View< const ValueType **, ArrayLayout, DeviceType > &X, const EWhichNorm whichNorm, const Teuchos::ArrayView< const size_t > &whichVecs, const bool isConstantStride, const bool isDistributed, const Teuchos::Comm< int > *comm) |
Implementation of MultiVector norms. More... | |
template<typename LO , typename GO , typename NT > | |
void | packCrsGraph (const CrsGraph< LO, GO, NT > &sourceGraph, Teuchos::Array< typename CrsGraph< LO, GO, NT >::packet_type > &exports, const Teuchos::ArrayView< size_t > &numPacketsPerLID, const Teuchos::ArrayView< const LO > &exportLIDs, size_t &constantNumPackets) |
Pack specified entries of the given local sparse graph for communication. More... | |
template<typename LO , typename GO , typename NT > | |
void | packCrsGraphNew (const CrsGraph< LO, GO, NT > &sourceGraph, const Kokkos::DualView< const LO *, typename CrsGraph< LO, GO, NT >::buffer_device_type > &exportLIDs, const Kokkos::DualView< const int *, typename CrsGraph< LO, GO, NT >::buffer_device_type > &exportPIDs, Kokkos::DualView< typename CrsGraph< LO, GO, NT >::packet_type *, typename CrsGraph< LO, GO, NT >::buffer_device_type > &exports, Kokkos::DualView< size_t *, typename CrsGraph< LO, GO, NT >::buffer_device_type > numPacketsPerLID, size_t &constantNumPackets, const bool pack_pids) |
Pack specified entries of the given local sparse graph for communication, for "new" DistObject interface. More... | |
template<typename LO , typename GO , typename NT > | |
void | packCrsGraphWithOwningPIDs (const CrsGraph< LO, GO, NT > &sourceGraph, Kokkos::DualView< typename CrsGraph< LO, GO, NT >::packet_type *, typename CrsGraph< LO, GO, NT >::buffer_device_type > &exports_dv, const Teuchos::ArrayView< size_t > &numPacketsPerLID, const Teuchos::ArrayView< const LO > &exportLIDs, const Teuchos::ArrayView< const int > &sourcePIDs, size_t &constantNumPackets) |
Pack specified entries of the given local sparse graph for communication. More... | |
template<typename ST , typename LO , typename GO , typename NT > | |
void | packCrsMatrix (const CrsMatrix< ST, LO, GO, NT > &sourceMatrix, Teuchos::Array< char > &exports, const Teuchos::ArrayView< size_t > &numPacketsPerLID, const Teuchos::ArrayView< const LO > &exportLIDs, size_t &constantNumPackets) |
Pack specified entries of the given local sparse matrix for communication. More... | |
template<typename ST , typename LO , typename GO , typename NT > | |
void | packCrsMatrixNew (const CrsMatrix< ST, LO, GO, NT > &sourceMatrix, Kokkos::DualView< char *, typename DistObject< char, LO, GO, NT >::buffer_device_type > &exports, const Kokkos::DualView< size_t *, typename DistObject< char, LO, GO, NT >::buffer_device_type > &numPacketsPerLID, const Kokkos::DualView< const LO *, typename DistObject< char, LO, GO, NT >::buffer_device_type > &exportLIDs, size_t &constantNumPackets) |
Pack specified entries of the given local sparse matrix for communication, for "new" DistObject interface. More... | |
template<typename ST , typename LO , typename GO , typename NT > | |
void | packCrsMatrixWithOwningPIDs (const CrsMatrix< ST, LO, GO, NT > &sourceMatrix, Kokkos::DualView< char *, typename DistObject< char, LO, GO, NT >::buffer_device_type > &exports_dv, const Teuchos::ArrayView< size_t > &numPacketsPerLID, const Teuchos::ArrayView< const LO > &exportLIDs, const Teuchos::ArrayView< const int > &sourcePIDs, size_t &constantNumPackets) |
Pack specified entries of the given local sparse matrix for communication. More... | |
void | printOnce (std::ostream &out, const std::string &s, const Teuchos::Comm< int > *comm) |
Print on one process of the given communicator, or at least try to do so (if MPI is not initialized). More... | |
template<typename KeyType , typename ValueType , typename IndexType > | |
KOKKOS_INLINE_FUNCTION void | radixSortKeysAndValues (KeyType *keys, KeyType *keysAux, ValueType *values, ValueType *valuesAux, IndexType n, IndexType upperBound) |
Radix sort the input array keys, and permute values identically to the keys. More... | |
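To show how the auxiliary buffers in this signature might be used, here is a host-only sketch of a least-significant-digit radix sort that ping-pongs between the primary and auxiliary arrays. It assumes unsigned keys, 4-bit digits, and that every key is at most `upperBound`; these choices, and the function name, are assumptions for illustration and may differ from the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>
#include <utility>

template <class Key, class Value>
void radixSortKeysAndValuesSketch(Key* keys, Key* keysAux,
                                  Value* values, Value* valuesAux,
                                  std::size_t n, Key upperBound) {
  static_assert(std::is_unsigned<Key>::value, "sketch assumes unsigned keys");
  assert(n == 0 || (keys != nullptr && keysAux != nullptr));
  constexpr unsigned bitsPerPass = 4;
  constexpr std::size_t numBuckets = std::size_t(1) << bitsPerPass;
  const unsigned totalBits = std::numeric_limits<Key>::digits;
  Key* srcK = keys;
  Key* dstK = keysAux;
  Value* srcV = values;
  Value* dstV = valuesAux;
  // One stable counting-sort pass per digit; stop once no key can have a
  // nonzero digit at this position.
  for (unsigned shift = 0;
       shift < totalBits && (upperBound >> shift) != 0;
       shift += bitsPerPass) {
    std::size_t offsets[numBuckets + 1] = {0};
    for (std::size_t i = 0; i < n; ++i) {
      ++offsets[((srcK[i] >> shift) & (numBuckets - 1)) + 1];
    }
    for (std::size_t b = 0; b < numBuckets; ++b) {
      offsets[b + 1] += offsets[b];  // offsets[d] = start of bucket d
    }
    for (std::size_t i = 0; i < n; ++i) {
      const std::size_t d = (srcK[i] >> shift) & (numBuckets - 1);
      const std::size_t pos = offsets[d]++;
      dstK[pos] = srcK[i];
      dstV[pos] = srcV[i];
    }
    std::swap(srcK, dstK);
    std::swap(srcV, dstV);
  }
  // After an odd number of passes the sorted data sits in the aux buffers.
  if (srcK != keys) {
    std::copy(srcK, srcK + n, keys);
    std::copy(srcV, srcV + n, values);
  }
}
```

Because each pass is stable, entries with equal keys keep their original relative order.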
template<class ValueType , class DeviceType > | |
bool | reallocDualViewIfNeeded (Kokkos::DualView< ValueType *, DeviceType > &dv, const size_t newSize, const char newLabel[], const size_t tooBigFactor=2, const bool needFenceBeforeRealloc=true) |
Reallocate the DualView in/out argument, if needed. More... | |
template<class ValueType , class DeviceType > | |
bool | reallocDualViewIfNeeded (Kokkos::DualView< ValueType *, DeviceType > &exports, const size_t newSize, const std::string &newLabel, const size_t tooBigFactor=2, const bool needFenceBeforeRealloc=true) |
Like above, but with std::string label argument. More... | |
template<class LocalSparseMatrixType , class ScalingFactorsViewType > | |
void | rightScaleLocalCrsMatrix (const LocalSparseMatrixType &A_lcl, const ScalingFactorsViewType &scalingFactors, const bool assumeSymmetric, const bool divide=true) |
Right-scale a KokkosSparse::CrsMatrix. More... | |
template<class KeyType , class ValueType > | |
KOKKOS_FUNCTION void | shortSortKeysAndValues_2 (KeyType keys[2], ValueType values[2]) |
Sort keys and values jointly, by keys, for arrays of length 2. More... | |
template<class KeyType > | |
KOKKOS_FUNCTION void | shortSortKeys_2 (KeyType keys[2]) |
Sort length-2 array of keys. More... | |
template<class KeyType , class ValueType > | |
KOKKOS_FUNCTION void | shortSortKeysAndValues_3 (KeyType keys[3], ValueType values[3]) |
Sort keys and values jointly, by keys, for arrays of length 3. More... | |
template<class KeyType > | |
KOKKOS_FUNCTION void | shortSortKeys_3 (KeyType keys[3]) |
Sort length-3 array of keys. More... | |
template<class KeyType , class ValueType > | |
KOKKOS_FUNCTION void | shortSortKeysAndValues_4 (KeyType keys[4], ValueType values[4]) |
Sort keys and values jointly, by keys, for arrays of length 4. More... | |
template<class KeyType > | |
KOKKOS_FUNCTION void | shortSortKeys_4 (KeyType keys[4]) |
Sort length-4 array of keys. More... | |
template<class KeyType , class ValueType > | |
KOKKOS_FUNCTION void | shortSortKeysAndValues_8 (KeyType keys[8], ValueType values[8]) |
Sort keys and values jointly, by keys, for arrays of length 8. More... | |
template<class KeyType > | |
KOKKOS_FUNCTION void | shortSortKeys_8 (KeyType keys[8]) |
Sort length-8 array of keys. More... | |
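Fixed-length sorts like the ones above are commonly written as sorting networks: a fixed sequence of compare-and-swap steps with no loops or data-dependent control flow beyond each swap. The sketch below shows a standard five-comparator network for length 4. The function names are hypothetical, and the library's functions may use a different comparator sequence.

```cpp
#include <cassert>
#include <utility>

// Swap entries i and j of both arrays if their keys are out of order.
template <class KeyType, class ValueType>
void compareSwapSketch(KeyType keys[], ValueType values[], int i, int j) {
  assert(i < j);
  if (keys[j] < keys[i]) {
    std::swap(keys[i], keys[j]);
    std::swap(values[i], values[j]);
  }
}

// Five-comparator sorting network for exactly four entries.
template <class KeyType, class ValueType>
void sortKeysAndValues4Sketch(KeyType keys[4], ValueType values[4]) {
  compareSwapSketch(keys, values, 0, 1);
  compareSwapSketch(keys, values, 2, 3);
  compareSwapSketch(keys, values, 0, 2);
  compareSwapSketch(keys, values, 1, 3);
  compareSwapSketch(keys, values, 1, 2);
}
```

The fixed comparator sequence makes such routines easy to inline and unroll, which suits short rows processed inside a parallel kernel.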
template<class KeyType , class ValueType , class IndexType > | |
KOKKOS_FUNCTION void | shellSortKeysAndValues (KeyType keys[], ValueType values[], const IndexType n) |
Shellsort (yes, it's one word) the input array keys, and apply the resulting permutation to the input array values. More... | |
template<class KeyType , class IndexType > | |
KOKKOS_FUNCTION void | shellSortKeys (KeyType keys[], const IndexType n) |
Shellsort (yes, it's one word) the input array keys. More... | |
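For reference, Shellsort is an insertion sort run over progressively smaller gaps. The host-only sketch below sorts keys and moves values along with them. It uses the simple halving gap sequence (n/2, n/4, ..., 1), which is an assumption made for brevity; the library may use a different gap sequence. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

template <class KeyType, class ValueType>
void shellSortKeysAndValuesSketch(KeyType keys[], ValueType values[],
                                  const std::size_t n) {
  assert(n == 0 || (keys != nullptr && values != nullptr));
  for (std::size_t gap = n / 2; gap > 0; gap /= 2) {
    // Gapped insertion sort: each entry is inserted into the already
    // sorted subsequence of entries that are `gap` positions apart.
    for (std::size_t i = gap; i < n; ++i) {
      const KeyType key = keys[i];
      const ValueType val = values[i];
      std::size_t j = i;
      while (j >= gap && key < keys[j - gap]) {
        keys[j] = keys[j - gap];
        values[j] = values[j - gap];
        j -= gap;
      }
      keys[j] = key;
      values[j] = val;
    }
  }
}
```

Shellsort sorts in place and needs no auxiliary buffers, unlike the radix sort listed earlier, but it is not stable.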
template<class LO , class GO , class NT > | |
size_t | unpackAndCombineWithOwningPIDsCount (const CrsGraph< LO, GO, NT > &sourceGraph, const Teuchos::ArrayView< const LO > &importLIDs, const Teuchos::ArrayView< const typename CrsGraph< LO, GO, NT >::packet_type > &imports, const Teuchos::ArrayView< const size_t > &numPacketsPerLID, size_t constantNumPackets, CombineMode combineMode, size_t numSameIDs, const Teuchos::ArrayView< const LO > &permuteToLIDs, const Teuchos::ArrayView< const LO > &permuteFromLIDs) |
Special version of Tpetra::Details::unpackCrsGraphAndCombine that also unpacks owning process ranks. More... | |
template<class LO , class GO , class NT > | |
void | unpackAndCombineIntoCrsArrays (const CrsGraph< LO, GO, NT > &sourceGraph, const Teuchos::ArrayView< const LO > &importLIDs, const Teuchos::ArrayView< const typename CrsGraph< LO, GO, NT >::packet_type > &imports, const Teuchos::ArrayView< const size_t > &numPacketsPerLID, const size_t constantNumPackets, const CombineMode combineMode, const size_t numSameIDs, const Teuchos::ArrayView< const LO > &permuteToLIDs, const Teuchos::ArrayView< const LO > &permuteFromLIDs, size_t TargetNumRows, size_t TargetNumNonzeros, const int MyTargetPID, const Teuchos::ArrayView< size_t > &CRS_rowptr, const Teuchos::ArrayView< GO > &CRS_colind, const Teuchos::ArrayView< const int > &SourcePids, Teuchos::Array< int > &TargetPids) |
unpackAndCombineIntoCrsArrays More... | |
template<class LocalOrdinal , class GlobalOrdinal , class Node > | |
size_t | unpackAndCombineWithOwningPIDsCount (const CrsGraph< LocalOrdinal, GlobalOrdinal, Node > &sourceGraph, const Teuchos::ArrayView< const LocalOrdinal > &importLIDs, const Teuchos::ArrayView< const typename CrsGraph< LocalOrdinal, GlobalOrdinal, Node >::packet_type > &imports, const Teuchos::ArrayView< const size_t > &numPacketsPerLID, size_t, CombineMode, size_t numSameIDs, const Teuchos::ArrayView< const LocalOrdinal > &permuteToLIDs, const Teuchos::ArrayView< const LocalOrdinal > &permuteFromLIDs) |
Special version of Tpetra::Details::unpackCrsGraphAndCombine that also unpacks owning process ranks. More... | |
template<class LocalOrdinal , class GlobalOrdinal , class Node > | |
void | unpackAndCombineIntoCrsArrays (const CrsGraph< LocalOrdinal, GlobalOrdinal, Node > &sourceGraph, const Teuchos::ArrayView< const LocalOrdinal > &importLIDs, const Teuchos::ArrayView< const typename CrsGraph< LocalOrdinal, GlobalOrdinal, Node >::packet_type > &imports, const Teuchos::ArrayView< const size_t > &numPacketsPerLID, const size_t, const CombineMode, const size_t numSameIDs, const Teuchos::ArrayView< const LocalOrdinal > &permuteToLIDs, const Teuchos::ArrayView< const LocalOrdinal > &permuteFromLIDs, size_t TargetNumRows, size_t TargetNumNonzeros, const int MyTargetPID, const Teuchos::ArrayView< size_t > &CRS_rowptr, const Teuchos::ArrayView< GlobalOrdinal > &CRS_colind, const Teuchos::ArrayView< const int > &SourcePids, Teuchos::Array< int > &TargetPids) |
unpackAndCombineIntoCrsArrays More... | |
template<typename ST , typename LO , typename GO , typename NT > | |
void | unpackCrsMatrixAndCombine (const CrsMatrix< ST, LO, GO, NT > &sourceMatrix, const Teuchos::ArrayView< const char > &imports, const Teuchos::ArrayView< const size_t > &numPacketsPerLID, const Teuchos::ArrayView< const LO > &importLIDs, size_t constantNumPackets, CombineMode combineMode) |
Unpack the imported column indices and values, and combine into matrix. More... | |
template<typename Scalar , typename LocalOrdinal , typename GlobalOrdinal , typename Node > | |
size_t | unpackAndCombineWithOwningPIDsCount (const CrsMatrix< Scalar, LocalOrdinal, GlobalOrdinal, Node > &sourceMatrix, const Teuchos::ArrayView< const LocalOrdinal > &importLIDs, const Teuchos::ArrayView< const char > &imports, const Teuchos::ArrayView< const size_t > &numPacketsPerLID, size_t constantNumPackets, CombineMode combineMode, size_t numSameIDs, const Teuchos::ArrayView< const LocalOrdinal > &permuteToLIDs, const Teuchos::ArrayView< const LocalOrdinal > &permuteFromLIDs) |
Special version of Tpetra::Details::unpackCrsMatrixAndCombine that also unpacks owning process ranks. More... | |
template<typename Scalar , typename LocalOrdinal , typename GlobalOrdinal , typename Node > | |
void | unpackAndCombineIntoCrsArrays (const CrsMatrix< Scalar, LocalOrdinal, GlobalOrdinal, Node > &sourceMatrix, const Kokkos::View< LocalOrdinal const *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void >, const Kokkos::View< const char *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void >, const Kokkos::View< const size_t *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void >, const size_t numSameIDs, const Kokkos::View< LocalOrdinal const *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void >, const Kokkos::View< LocalOrdinal const *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void >, size_t TargetNumRows, const int MyTargetPID, Teuchos::ArrayRCP< size_t > &CRS_rowptr, Teuchos::ArrayRCP< GlobalOrdinal > &CRS_colind, Teuchos::ArrayRCP< Scalar > &CRS_vals, const Teuchos::ArrayView< const int > &SourcePids, Teuchos::Array< int > &TargetPids) |
unpackAndCombineIntoCrsArrays More... | |
template<typename Scalar , typename LocalOrdinal , typename GlobalOrdinal , typename Node > | |
void | unpackAndCombineIntoCrsArrays (const CrsMatrix< Scalar, LocalOrdinal, GlobalOrdinal, Node > &sourceMatrix, const Kokkos::View< LocalOrdinal const *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void > import_lids_d, const Kokkos::View< const char *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void > imports_d, const Kokkos::View< const size_t *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void > num_packets_per_lid_d, const size_t numSameIDs, const Kokkos::View< LocalOrdinal const *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void > permute_to_lids_d, const Kokkos::View< LocalOrdinal const *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void > permute_from_lids_d, size_t TargetNumRows, const int MyTargetPID, Kokkos::View< size_t *, typename Node::device_type > &crs_rowptr_d, Kokkos::View< GlobalOrdinal *, typename Node::device_type > &crs_colind_d, Kokkos::View< typename CrsMatrix< Scalar, LocalOrdinal, GlobalOrdinal, Node >::impl_scalar_type *, typename Node::device_type > &crs_vals_d, const Teuchos::ArrayView< const int > &SourcePids, Kokkos::View< int *, typename Node::device_type > &TargetPids) |
unpackAndCombineIntoCrsArrays More... | |
template<typename ST , typename LO , typename GO , typename Node > | |
void | unpackCrsMatrixAndCombine (const CrsMatrix< ST, LO, GO, Node > &sourceMatrix, const Teuchos::ArrayView< const char > &imports, const Teuchos::ArrayView< const size_t > &numPacketsPerLID, const Teuchos::ArrayView< const LO > &importLIDs, size_t, CombineMode combineMode) |
Unpack the imported column indices and values, and combine into matrix. More... | |
void | enableWDVTracking () |
Enable WrappedDualView reference-count tracking and syncing. Call this after exiting a host-parallel region that uses WrappedDualView. More... | |
void | disableWDVTracking () |
Disable WrappedDualView reference-count tracking and syncing. Call this before entering a host-parallel region that uses WrappedDualView. For each WrappedDualView used in the parallel region, its view must be accessed (e.g. getHostView...) before disabling the tracking, so that it may be synced and marked modified correctly. More... | |
template<class MV , class ResultView , bool runOnDevice> | |
void | idotLocal (const ResultView &localResult, const MV &X, const MV &Y) |
Compute dot product locally. Where the kernel runs is controlled by runOnDevice. More... | |
template<class MV , class ResultView > | |
std::shared_ptr < ::Tpetra::Details::CommRequest > | idotImpl (const ResultView &globalResult, const MV &X, const MV &Y) |
Internal (common) version of idot, a global dot product that uses a non-blocking MPI reduction. More... | |
bool | congruent (const Teuchos::Comm< int > &comm1, const Teuchos::Comm< int > &comm2) |
Whether the two communicators are congruent. More... | |
std::unique_ptr< std::string > | createPrefix (const int myRank, const char prefix[]) |
Create string prefix for each line of verbose output. More... | |
std::unique_ptr< std::string > | createPrefix (const Teuchos::Comm< int > *comm, const char functionName[]) |
Create string prefix for each line of verbose output, for a Tpetra function (not a class or instance method). More... | |
std::unique_ptr< std::string > | createPrefix (const Teuchos::Comm< int > *, const char className[], const char methodName[]) |
Create string prefix for each line of verbose output, for a method of a Tpetra class. More... | |
template<class DualViewType > | |
Teuchos::ArrayView< typename DualViewType::t_dev::value_type > | getArrayViewFromDualView (const DualViewType &x) |
Get a Teuchos::ArrayView which views the host Kokkos::View of the input 1-D Kokkos::DualView. More... | |
template<class T , class DT > | |
Kokkos::DualView< T *, DT > | getDualViewCopyFromArrayView (const Teuchos::ArrayView< const T > &x_av, const char label[], const bool leaveOnHost) |
Get a 1-D Kokkos::DualView which is a deep copy of the input Teuchos::ArrayView (which views host memory). More... | |
template<class DualViewType > | |
std::string | dualViewStatusToString (const DualViewType &dv, const char name[]) |
Return the status of the given Kokkos::DualView, as a human-readable string. More... | |
template<class ArrayType > | |
void | verbosePrintArray (std::ostream &out, const ArrayType &x, const char name[], const size_t maxNumToPrint) |
Print min(x.size(), maxNumToPrint) entries of x. More... | |
int | countPackTriplesCount (const ::Teuchos::Comm< int > &comm, int &size, std::ostream *errStrm=NULL) |
Compute the buffer size required by packTriples for packing the number of matrix entries ("triples"). More... | |
int | packTriplesCount (const int numEnt, char outBuf[], const int outBufSize, int &outBufCurPos, const ::Teuchos::Comm< int > &comm, std::ostream *errStrm=NULL) |
Pack the count (number) of matrix triples. More... | |
int | unpackTriplesCount (const char inBuf[], const int inBufSize, int &inBufCurPos, int &numEnt,const ::Teuchos::Comm< int > &comm, std::ostream *errStrm=NULL) |
Unpack just the count of triples from the given input buffer. More... | |
template<class ScalarType , class OrdinalType > | |
int | countPackTriples (const int numEnt, const ::Teuchos::Comm< int > &comm, int &size, std::ostream *errStrm=NULL) |
Compute the buffer size required by packTriples for packing numEnt number of (i,j,A(i,j)) matrix entries ("triples"). More... | |
template<class ScalarType , class OrdinalType > | |
int | packTriples (const OrdinalType[], const OrdinalType[], const ScalarType[], const int, char[], const int, int &, const ::Teuchos::Comm< int > &, std::ostream *errStrm=NULL) |
Pack matrix entries ("triples" (i, j, A(i,j))) into the given output buffer. More... | |
template<class ScalarType , class OrdinalType > | |
int | unpackTriples (const char[], const int, int &, OrdinalType[], OrdinalType[], ScalarType[], const int, const ::Teuchos::Comm< int > &, std::ostream *errStrm=NULL) |
Unpack matrix entries ("triples" (i, j, A(i,j))) from the given input buffer. More... | |
template<class SC , class GO > | |
int | readAndDealOutTriples (std::istream &inputStream, std::size_t &curLineNum, std::size_t &totalNumEntRead, std::function< int(const GO, const GO, const SC &)> processTriple, const std::size_t maxNumEntPerMsg, const ::Teuchos::Comm< int > &comm, const bool tolerant=false, std::ostream *errStrm=NULL, const bool debug=false) |
On Process 0 in the given communicator, read sparse matrix entries (in chunks of at most maxNumEntPerMsg entries at a time) from the input stream, and "deal them out" to all other processes in the communicator. More... | |
Variables | |
bool | wdvTrackingEnabled = true |
Whether WrappedDualView reference count checking is enabled. Initially true. Since the DualView sync functions are not thread-safe, tracking should be disabled during host-parallel regions where WrappedDualView is used. More... | |
Nonmember function that computes a residual. Computes R = B - A * X.
Namespace for Tpetra implementation details.
Status of the graph's or matrix's storage, when not in a fill-complete state.
When a CrsGraph or CrsMatrix is not fill complete and is allocated, then its data live in one of two storage formats:
"Unpacked 1-D storage": The graph uses a row offsets array, and stores column indices in a single array. The matrix also stores values in a single array. "Unpacked" means that there may be extra space in each row: that is, the row offsets array only says how much space there is in each row. The graph must use k_numRowEntries_ to find out how many entries there actually are in the row. A matrix with unpacked 1-D storage must own its graph, and the graph must have unpacked 1-D storage.
The phrase "When not in a fill-complete state" is important. When the graph is fill complete, it always uses 1-D "packed" storage. However, if storage is "not optimized," we retain the 1-D unpacked format, and thus retain this enum value.
Definition at line 120 of file Tpetra_CrsGraph_decl.hpp.
The type of MPI send that Distributor should use.
This is an implementation detail of Distributor. Please do not rely on these values in your code.
Definition at line 42 of file Tpetra_Details_DistributorPlan.hpp.
Enum indicating how and whether a Distributor was initialized.
This is an implementation detail of Distributor. Please do not rely on these values in your code.
Definition at line 64 of file Tpetra_Details_DistributorPlan.hpp.
Input argument for normImpl() (which see).
Definition at line 46 of file Tpetra_Details_normImpl.hpp.
void Tpetra::Details::localDeepCopy | ( | const DstViewType & | dst, |
const SrcViewType & | src, | ||
const bool | dstConstStride, | ||
const bool | srcConstStride, | ||
const DstWhichVecsType & | dstWhichVecs, | ||
const SrcWhichVecsType & | srcWhichVecs | ||
) |
Implementation of Tpetra::MultiVector deep copy of local data.
This implements Tpetra::MultiVector deep copy, as in:
Tpetra::deep_copy
Tpetra::MultiVector::assign
Tpetra::MultiVector::createCopy
The two-argument MultiVector copy constructor with Teuchos::Copy as the second argument
dst | [in/out] Rank-2 Kokkos::View ; destination of the copy |
src | [in] Rank-2 Kokkos::View ; source of the copy |
dstConstStride | [in] Whether dst is "constant stride." If so, then the j -th column of dst has index j . If not, then it has index dstWhichVecs[j] . |
srcConstStride | [in] Whether src is "constant stride." If so, then the j -th column of src has index j . If not, then it has index srcWhichVecs[j] . |
dstWhichVecs | [in] Host-readable Rank-1 array of some kind, corresponding to dst.whichVectors_ . Need only be readable (from host) if dstConstStride is false. |
srcWhichVecs | [in] Host-readable Rank-1 array of some kind, corresponding to src.whichVectors_ . Need only be readable (from host) if srcConstStride is false. |
Definition at line 100 of file Tpetra_KokkosRefactor_Details_MultiVectorLocalDeepCopy.hpp.
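The column-index convention behind dstConstStride and srcConstStride can be illustrated in plain C++, without Kokkos. This is only a minimal sketch: Dense2D, localDeepCopySketch, and the numVecs argument are hypothetical names for illustration, and column-major std::vector storage is an assumption, not how Tpetra stores its data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rank-2 view, stored column-major.
struct Dense2D {
  std::size_t numRows, numCols;
  std::vector<double> data;  // entry (i, j) lives at data[i + j * numRows]
  double& operator()(std::size_t i, std::size_t j) { return data[i + j * numRows]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i + j * numRows]; }
};

// Copy numVecs logical columns from src to dst. With constant stride,
// logical column j is physical column j; otherwise it is whichVecs[j].
void localDeepCopySketch(Dense2D& dst, const Dense2D& src,
                         bool dstConstStride, bool srcConstStride,
                         const std::vector<std::size_t>& dstWhichVecs,
                         const std::vector<std::size_t>& srcWhichVecs,
                         std::size_t numVecs) {
  assert(dst.numRows == src.numRows);
  for (std::size_t j = 0; j < numVecs; ++j) {
    // The whichVecs arrays are read only when the stride is not constant.
    const std::size_t dstCol = dstConstStride ? j : dstWhichVecs[j];
    const std::size_t srcCol = srcConstStride ? j : srcWhichVecs[j];
    for (std::size_t i = 0; i < dst.numRows; ++i) {
      dst(i, dstCol) = src(i, srcCol);
    }
  }
}
```

In this sketch, an empty whichVecs array may be passed on a side that has constant stride, since it is never read there.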
void Tpetra::Details::localDeepCopyConstStride | ( | const DstViewType & | dst, |
const SrcViewType & | src | ||
) |
Implementation of Tpetra::MultiVector deep copy of local data, for when both the source and destination MultiVector objects have constant stride (isConstantStride() is true).
Definition at line 137 of file Tpetra_KokkosRefactor_Details_MultiVectorLocalDeepCopy.hpp.
void Tpetra::Details::computeLocalRowScaledColumnNorms_RowMatrix | ( | EquilibrationInfo< typename Kokkos::ArithTraits< SC >::val_type, typename NT::device_type > & | result, |
const Tpetra::RowMatrix< SC, LO, GO, NT > & | A | ||
) |
For a given Tpetra::RowMatrix that is not a Tpetra::CrsMatrix, assume that result.rowNorms has been computed (and globalized), and compute result.rowScaledColNorms.
Definition at line 97 of file Tpetra_computeRowAndColumnOneNorms_def.hpp.
EquilibrationInfo<typename Kokkos::ArithTraits<SC>::val_type, typename NT::device_type> Tpetra::Details::computeLocalRowOneNorms_RowMatrix | ( | const Tpetra::RowMatrix< SC, LO, GO, NT > & | A | ) |
Implementation of computeLocalRowOneNorms for a Tpetra::RowMatrix that is NOT a Tpetra::CrsMatrix.
Definition at line 133 of file Tpetra_computeRowAndColumnOneNorms_def.hpp.
EquilibrationInfo<typename Kokkos::ArithTraits<SC>::val_type, typename NT::device_type> Tpetra::Details::computeLocalRowAndColumnOneNorms_RowMatrix | ( | const Tpetra::RowMatrix< SC, LO, GO, NT > & | A, |
const bool | assumeSymmetric | ||
) |
Implementation of computeLocalRowAndColumnOneNorms for a Tpetra::RowMatrix that is NOT a Tpetra::CrsMatrix.
Definition at line 202 of file Tpetra_computeRowAndColumnOneNorms_def.hpp.
EquilibrationInfo<typename Kokkos::ArithTraits<SC>::val_type, typename NT::device_type> Tpetra::Details::computeLocalRowOneNorms_CrsMatrix | ( | const Tpetra::CrsMatrix< SC, LO, GO, NT > & | A | ) |
Implementation of computeLocalRowOneNorms for a Tpetra::CrsMatrix.
Definition at line 586 of file Tpetra_computeRowAndColumnOneNorms_def.hpp.
EquilibrationInfo<typename Kokkos::ArithTraits<SC>::val_type, typename NT::device_type> Tpetra::Details::computeLocalRowAndColumnOneNorms_CrsMatrix | ( | const Tpetra::CrsMatrix< SC, LO, GO, NT > & | A, |
const bool | assumeSymmetric | ||
) |
Implementation of computeLocalRowAndColumnOneNorms for a Tpetra::CrsMatrix.
Definition at line 618 of file Tpetra_computeRowAndColumnOneNorms_def.hpp.
EquilibrationInfo<typename Kokkos::ArithTraits<SC>::val_type, typename NT::device_type> Tpetra::Details::computeLocalRowOneNorms | ( | const Tpetra::RowMatrix< SC, LO, GO, NT > & | A | ) |
Compute LOCAL row one-norms ("row sums" etc.) of the input sparse matrix A.
A | [in] The input sparse matrix A. |
Definition at line 653 of file Tpetra_computeRowAndColumnOneNorms_def.hpp.
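As a rough illustration of what a local "row sum" computation looks like, here is a minimal plain C++ sketch. It reads "row one-norm" as the sum of the magnitudes of the entries stored in each local row, which is an assumption based on the wording above rather than on the Tpetra source. LocalCsr and localRowOneNormsSketch are hypothetical names; no MPI communication happens, which is what "LOCAL" suggests.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR container for the local part of a matrix.
struct LocalCsr {
  std::vector<std::size_t> rowPtr;  // size numRows + 1
  std::vector<int> colInd;
  std::vector<double> vals;
};

// For each local row, sum the magnitudes of the stored entries.
std::vector<double> localRowOneNormsSketch(const LocalCsr& A) {
  assert(!A.rowPtr.empty());
  const std::size_t numRows = A.rowPtr.size() - 1;
  std::vector<double> rowNorms(numRows, 0.0);
  for (std::size_t i = 0; i < numRows; ++i) {
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
      rowNorms[i] += std::abs(A.vals[k]);
    }
  }
  return rowNorms;
}
```

A global result would additionally need the per-process values to be combined across processes; that step is outside this sketch.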
EquilibrationInfo<typename Kokkos::ArithTraits<SC>::val_type, typename NT::device_type> Tpetra::Details::computeLocalRowAndColumnOneNorms | ( | const Tpetra::RowMatrix< SC, LO, GO, NT > & | A, |
const bool | assumeSymmetric | ||
) |
Compute LOCAL row and column one-norms ("row sums" etc.) of the input sparse matrix A. Optionally, also compute row-scaled column norms (in the manner of LAPACK's DGEEQU routine).
A | [in] The input sparse matrix A. |
assumeSymmetric | [in] Whether to assume that the matrix A is (globally) symmetric. If so, don't compute row-scaled column norms separately from row norms. |
This function will only compute (local) row-scaled column norms in the same pass as row norms, if and only if BOTH of the following conditions hold:
Definition at line 689 of file Tpetra_computeRowAndColumnOneNorms_def.hpp.
OffsetType Tpetra::Details::convertColumnIndicesFromGlobalToLocal | ( | const Kokkos::View< LO *, DT > & | lclColInds, |
const Kokkos::View< const GO *, DT > & | gblColInds, | ||
const Kokkos::View< const OffsetType *, DT > & | ptr, | ||
const LocalMap< LO, GO, DT > & | lclColMap, | ||
const Kokkos::View< const NumEntType *, DT > & | numRowEnt | ||
) |
Convert a CrsGraph's global column indices into local column indices.
lclColInds | [out] On output: The graph's local column indices. This may alias gblColInds, if LO == GO. |
gblColInds | [in] On input: The graph's global column indices. This may alias lclColInds, if LO == GO. |
ptr | [in] The graph's row offsets. |
lclColMap | [in] "Local" (threaded-kernel-worthy) version of the column Map. |
numRowEnt | [in] Array with number of entries in each row. |
Definition at line 186 of file Tpetra_CrsGraph_def.hpp.
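The conversion can be pictured with a plain C++ sketch in which a std::unordered_map stands in for the "local" column Map. Everything here is hypothetical and for illustration only: the function name, the use of -1 as an invalid local index, and the choice to return the number of global indices that were not found (the meaning of the real function's return value is not stated above).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Convert global column indices to local ones, row by row.
// Returns how many global indices were missing from the column map
// (an assumed convention for this sketch).
std::size_t convertGlobalToLocalSketch(
    std::vector<int>& lclColInds,
    const std::vector<long long>& gblColInds,
    const std::vector<std::size_t>& ptr,
    const std::unordered_map<long long, int>& lclColMap,
    const std::vector<std::size_t>& numRowEnt) {
  assert(lclColInds.size() == gblColInds.size());
  const int invalid = -1;
  std::size_t numBad = 0;
  for (std::size_t row = 0; row < numRowEnt.size(); ++row) {
    // Only the first numRowEnt[row] slots of each row hold real entries;
    // the rest of the row (up to ptr[row + 1]) is unused space.
    for (std::size_t k = ptr[row]; k < ptr[row] + numRowEnt[row]; ++k) {
      const auto it = lclColMap.find(gblColInds[k]);
      if (it == lclColMap.end()) {
        lclColInds[k] = invalid;
        ++numBad;
      } else {
        lclColInds[k] = it->second;
      }
    }
  }
  return numBad;
}
```

The separate numRowEnt array matters because ptr alone only gives each row's capacity, not how many entries are actually in use.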
void Tpetra::Details::residual | ( | const Operator< SC, LO, GO, NO > & | A, |
const MultiVector< SC, LO, GO, NO > & | X, | ||
const MultiVector< SC, LO, GO, NO > & | B, | ||
MultiVector< SC, LO, GO, NO > & | R | ||
) |
Computes R = B - A * X.
Definition at line 522 of file Tpetra_Details_residual.hpp.
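For a single vector and a purely local matrix, R = B - A * X reduces to the loop below. This is a plain C++ sketch under stated assumptions: CsrOperator and residualSketch are hypothetical names, the operator is a CSR matrix held in std::vector, and there is no communication or multi-column handling as a distributed implementation would need.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR matrix standing in for the operator A.
struct CsrOperator {
  std::vector<std::size_t> rowPtr;  // size numRows + 1
  std::vector<std::size_t> colInd;
  std::vector<double> vals;
};

// Compute r = b - A * x in one pass over the rows, without forming
// A * x as a separate temporary vector.
std::vector<double> residualSketch(const CsrOperator& A,
                                   const std::vector<double>& x,
                                   const std::vector<double>& b) {
  const std::size_t numRows = b.size();
  assert(A.rowPtr.size() == numRows + 1);
  std::vector<double> r(numRows);
  for (std::size_t i = 0; i < numRows; ++i) {
    double ax = 0.0;
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
      ax += A.vals[k] * x[A.colInd[k]];
    }
    r[i] = b[i] - ax;
  }
  return r;
}
```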
|
static |
All-reduce from input Kokkos::View to output Kokkos::View.
The two Views may alias one another.
Definition at line 62 of file Tpetra_Details_allReduceView.hpp.
Kokkos::DualView<ValueType*, DeviceType> Tpetra::Details::castAwayConstDualView | ( | const Kokkos::DualView< const ValueType *, DeviceType > & | input_dv | ) |
Cast away const-ness of a 1-D Kokkos::DualView.
Kokkos::DualView<const ValueType*, DeviceType> forbids sync, at run time. If we want to sync it, we have to cast away const.
Definition at line 32 of file Tpetra_Details_castAwayConstDualView.hpp.
bool Tpetra::Details::checkLocalViewValidity | ( | std::ostream * | lclErrStrm, |
const int | myMpiProcessRank, | ||
const Kokkos::View< DataType, Properties...> & | view | ||
) |
Is the given View valid?
"Valid" means one of the following
lclErrStrm | [out] If the View is invalid, and this pointer is nonnull, then write a human-readable explanation of what's wrong with the View to the stream. |
myMpiProcessRank | [in] The rank of the calling MPI process, in whatever communicator is relevant to the caller. Only used as part of human-readable error output to *lclErrStrm . |
view | [in] The Kokkos::View to investigate. |
Definition at line 56 of file Tpetra_Details_checkView.hpp.
bool Tpetra::Details::checkLocalDualViewValidity | ( | std::ostream *const | lclErrStrm, |
const int | myMpiProcessRank, | ||
const Kokkos::DualView< DataType, Args...> & | dv | ||
) |
Is the given Kokkos::DualView valid?
A DualView is valid if both of its constituent Views are valid.
Definition at line 93 of file Tpetra_Details_checkView.hpp.
bool Tpetra::Details::checkLocalWrappedDualViewValidity | ( | std::ostream *const | lclErrStrm, |
const int | myMpiProcessRank, | ||
const Tpetra::Details::WrappedDualView< Kokkos::DualView< DataType, Args...> > & | dv | ||
) |
Is the given Tpetra::WrappedDualView valid?
A WrappedDualView is valid if both of its constituent Views are valid.
Definition at line 190 of file Tpetra_Details_checkView.hpp.
OffsetsViewType::non_const_value_type Tpetra::Details::computeOffsetsFromCounts | ( | const ExecutionSpace & | execSpace, |
const OffsetsViewType & | ptr, | ||
const CountsViewType & | counts | ||
) |
Compute offsets from counts.
Compute offsets from counts via prefix sum:
ptr[i+1] = \sum_{j=0}^{i} counts[j]
Thus, ptr[i+1] - ptr[i] = counts[i], so that ptr[i+1] = ptr[i] + counts[i]. If we stored counts[i] in ptr[i+1] on input, then the formula is ptr[i+1] += ptr[i].
ptr.
ExecutionSpace | Kokkos execution space instance on which to run. |
OffsetsViewType | Type of the Kokkos::View specialization used to store the offsets; the output array of this function. |
CountsViewType | Type of the Kokkos::View specialization used to store the counts; the input array of this function. |
SizeType | The parallel loop index type; a built-in integer type. Defaults to the type of the input View's dimension. You may use a shorter type to improve performance. |
The type of each entry of the ptr array must be able to store the sum of all the entries of counts. This functor makes no attempt to check for overflow in this sum.
Definition at line 216 of file Tpetra_Details_computeOffsets.hpp.
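The prefix-sum relation above can be written as a short sequential loop. This is a plain C++ sketch, not the Kokkos-parallel implementation; computeOffsetsFromCountsSketch is a hypothetical name, and returning the total (the last entry of ptr) is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fill ptr (size counts.size() + 1) so that ptr[0] = 0 and
// ptr[i+1] = ptr[i] + counts[i]. Returns the sum of all counts.
std::size_t computeOffsetsFromCountsSketch(
    std::vector<std::size_t>& ptr,
    const std::vector<std::size_t>& counts) {
  assert(ptr.size() == counts.size() + 1);
  ptr[0] = 0;
  for (std::size_t i = 0; i < counts.size(); ++i) {
    ptr[i + 1] = ptr[i] + counts[i];
  }
  return ptr.back();
}
```

For counts {3, 0, 2} this yields ptr = {0, 3, 3, 5}: row 1 is empty, so its begin and end offsets coincide.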
OffsetsViewType::non_const_value_type Tpetra::Details::computeOffsetsFromCounts | ( | const OffsetsViewType & | ptr, |
const CountsViewType & | counts | ||
) |
Overload that uses OffsetsViewType's execution space.
Definition at line 299 of file Tpetra_Details_computeOffsets.hpp.
OffsetsViewType::non_const_value_type Tpetra::Details::computeOffsetsFromConstantCount | ( | const OffsetsViewType & | ptr, |
const CountType | count | ||
) |
Compute offsets from a constant count.
Compute offsets from a constant count via prefix sum:
ptr[i+1] = \sum_{j=0}^{i} count
Thus, ptr[i+1] - ptr[i] = count, so that ptr[i+1] = ptr[i] + count.
ptr.
OffsetsViewType | Type of the Kokkos::View specialization used to store the offsets; the output array of this function. |
CountType | Type of the constant count; the input argument of this function. |
SizeType | The parallel loop index type; a built-in integer type. Defaults to the type of the output View's dimension. You may use a shorter type to improve performance. |
The type of each entry of the ptr array must be able to store ptr.extent (0) * count. This functor makes no attempt to check for overflow in this sum.
Definition at line 332 of file Tpetra_Details_computeOffsets.hpp.
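Since the function itself does not check for overflow, a caller who is unsure whether the offset type is wide enough could check beforehand. The plain C++ sketch below combines such a check with the constant-count fill; the function name, the use of int offsets, and the bool return value are all hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Fill ptr with ptr[i] = i * count. Returns false, leaving ptr
// untouched, if the largest offset would not fit in an int.
bool computeOffsetsFromConstantCountSketch(std::vector<int>& ptr, int count) {
  assert(count >= 0);
  if (ptr.empty()) return true;
  const std::size_t numRows = ptr.size() - 1;
  const std::size_t maxInt =
      static_cast<std::size_t>(std::numeric_limits<int>::max());
  // Check numRows * count <= maxInt without computing the product.
  if (count != 0 && numRows > maxInt / static_cast<std::size_t>(count)) {
    return false;
  }
  for (std::size_t i = 0; i <= numRows; ++i) {
    ptr[i] = static_cast<int>(i) * count;
  }
  return true;
}
```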
void Tpetra::Details::copyConvert | ( | const OutputViewType & | dst, |
const InputViewType & | src | ||
) |
Copy values from the 1-D Kokkos::View src, to the 1-D Kokkos::View dst, of the same length. The entries of src and dst may have different types, but it must be possible to copy-construct each entry of dst with its corresponding entry of src.
Everything above is an implementation detail of this function, copyConvert.
Definition at line 328 of file Tpetra_Details_copyConvert.hpp.
void Tpetra::Details::copyOffsets | ( | const OutputViewType & | dst, |
const InputViewType & | src | ||
) |
Copy row offsets (in a sparse graph or matrix) from src to dst. The offsets may have different types.
The implementation reserves the right to do bounds checking if the offsets in the two arrays have different types.
Everything above is an implementation detail of this function, copyOffsets. This function in turn is an implementation detail of FixedHashTable, in particular of the "copy constructor" that copies a FixedHashTable from one Kokkos device to another. copyOffsets copies the array of offsets (ptr_).
Definition at line 503 of file Tpetra_Details_copyOffsets.hpp.
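The kind of bounds checking mentioned above is relevant when the destination offset type is narrower than the source type. The following plain C++ sketch shows one way such a check could look; copyOffsetsSketch is a hypothetical name, and throwing std::overflow_error is an assumption made for this example, not a description of how Tpetra reports the problem.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Copy offsets from std::size_t to int, refusing values that do not fit.
std::vector<int> copyOffsetsSketch(const std::vector<std::size_t>& src) {
  std::vector<int> dst(src.size());
  const std::size_t maxInt =
      static_cast<std::size_t>(std::numeric_limits<int>::max());
  for (std::size_t i = 0; i < src.size(); ++i) {
    if (src[i] > maxInt) {
      throw std::overflow_error("offset does not fit in int");
    }
    dst[i] = static_cast<int>(src[i]);
  }
  assert(dst.size() == src.size());
  return dst;
}
```

When both arrays have the same entry type, no value can overflow and the check becomes unnecessary.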
Impl::CreateMirrorViewFromUnmanagedHostArray<ValueType, OutputDeviceType>::output_view_type Tpetra::Details::create_mirror_view_from_raw_host_array | ( | const OutputDeviceType & | , |
ValueType * | inPtr, | ||
const size_t | inSize, | ||
const bool | copy = true , |
||
const char | label[] = "" |
||
) |
Variant of Kokkos::create_mirror_view that takes a raw host 1-d array as input.
Given a pointer to a 1-D array in host memory, and the number of entries in the array, return a Kokkos::View that lives in OutputDeviceType, and that is a mirror view of the input array. By default, copy the host data to the output View, if necessary.
Definition at line 173 of file Tpetra_Details_createMirrorView.hpp.
KOKKOS_FUNCTION SparseMatrixType::ordinal_type Tpetra::Details::crsMatrixSumIntoValues_sortedSortedLinear | ( | const SparseMatrixType & | A, |
const typename SparseMatrixType::ordinal_type | lclRow, | ||
const typename SparseMatrixType::ordinal_type | lclColInds[], | ||
const typename SparseMatrixType::ordinal_type | sortPerm[], | ||
const ValsViewType & | vals, | ||
const typename SparseMatrixType::ordinal_type | numEntInInput, | ||
const bool | forceAtomic = false , |
||
const bool | checkInputIndices = true |
||
) |
A(lclRow, lclColInds[sortPerm[j]]) += vals[sortPerm[j]], for all j in 0 .. eltDim-1.
In the row of the matrix A with the local row index lclRow, find entries with column indices lclColInds, and sum into those entries with vals. Assume that lclColInds[sortPerm] is sorted, and that the column indices in that row of the matrix are sorted as well. Use linear search to find the entries in that row of the matrix.
SparseMatrixType | Specialization of KokkosSparse::CrsMatrix. |
ValsViewType | Specialization of a 1-D Kokkos::View. |
A | [in/out] Sparse matrix whose entries to modify. |
lclRow | [in] Local index of the row in the matrix A to modify. lclRow MUST be a valid local row index of A. |
lclColInds | [in] Local column indices to modify in that row. |
sortPerm | [in] Permutation that makes lclColInds sorted. That is, lclColInds[sortPerm] is sorted. |
vals | [in] Input 1-D Kokkos::View of the values to use. This is a Kokkos::View and not a raw 1-D array, because it may be strided, if the original element being used (see crsMatrixSumInElement) has a column-major layout. |
numEntInInput | [in] Number of entries in the input. This function will read the first numEntInInput entries of lclColInds, sortPerm, and vals. |
forceAtomic | [in] Whether to use atomic updates when modifying the entries of the matrix A. This MUST be a compile-time constant. It defaults to whether the matrix's Kokkos execution space is NOT Kokkos::Serial. |
checkInputIndices | [in] Whether to check whether the input indices are valid column indices before just using them. For forwards compatibility, this should always be a compile-time constant. Default is true, that is, always check. |
Definition at line 61 of file Tpetra_Details_crsMatrixAssembleElement.hpp.
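Because both the matrix row and lclColInds[sortPerm] are sorted, the linear search never has to move backward within the row. The plain C++ sketch below shows that idea for a single row. It is a simplified illustration: the function name is hypothetical, the row is passed as two std::vector arguments rather than a KokkosSparse::CrsMatrix, atomic updates are omitted, and returning the number of input entries that were found is an assumed convention.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// rowCols / rowVals: sorted column indices and values of one matrix row.
// inCols[sortPerm[j]] must be nondecreasing in j.
// Returns the number of input entries found in the row and summed in.
int sumIntoSortedRowSketch(const std::vector<int>& rowCols,
                           std::vector<double>& rowVals,
                           const std::vector<int>& inCols,
                           const std::vector<int>& sortPerm,
                           const std::vector<double>& vals) {
  assert(rowCols.size() == rowVals.size());
  int numFound = 0;
  std::size_t k = 0;  // current position in the matrix row
  for (std::size_t j = 0; j < sortPerm.size(); ++j) {
    const int p = sortPerm[j];
    const int col = inCols[p];
    // Skip row entries smaller than the column we are looking for.
    while (k < rowCols.size() && rowCols[k] < col) ++k;
    if (k < rowCols.size() && rowCols[k] == col) {
      rowVals[k] += vals[p];
      ++numFound;
    }
  }
  return numFound;
}
```

The total work is proportional to the row length plus the number of input entries, since k only ever increases.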
KOKKOS_FUNCTION SparseMatrixType::ordinal_type Tpetra::Details::crsMatrixReplaceValues_sortedSortedLinear | ( | const SparseMatrixType & | A, |
const typename SparseMatrixType::ordinal_type | lclRow, | ||
const typename SparseMatrixType::ordinal_type | lclColInds[], | ||
const typename SparseMatrixType::ordinal_type | sortPerm[], | ||
const ValsViewType & | vals, | ||
const typename SparseMatrixType::ordinal_type | numEntInInput, | ||
const bool | forceAtomic = false , |
||
const bool | checkInputIndices = true |
||
) |
A(lclRow, lclColInds[sortPerm[j]]) = vals[sortPerm[j]], for all j in 0 .. eltDim-1.
In the row of the matrix A with the local row index lclRow, find entries with column indices lclColInds, and replace those entries with vals. Assume that lclColInds[sortPerm] is sorted, and that the column indices in that row of the matrix are sorted as well. Use linear search to find the entries in that row of the matrix.
SparseMatrixType | Specialization of KokkosSparse::CrsMatrix. |
ValsViewType | Specialization of a 1-D Kokkos::View. |
A | [in/out] Sparse matrix whose entries to modify. |
lclRow | [in] Local index of the row in the matrix A to modify. lclRow MUST be a valid local row index of A. |
lclColInds | [in] Local column indices to modify in that row. |
sortPerm | [in] Permutation that makes lclColInds sorted. That is, lclColInds[sortPerm] is sorted. |
vals | [in] Input 1-D Kokkos::View of the values to use. This is a Kokkos::View and not a raw 1-D array, because it may be strided, if the original element being used (see crsMatrixSumInElement) has a column-major layout. |
numEntInInput | [in] Number of entries in the input. This function will read the first numEntInInput entries of lclColInds, sortPerm, and vals. |
forceAtomic | [in] Whether to use atomic updates when modifying the entries of the matrix A. For forwards compatibility, this should always be a compile-time constant. It defaults to whether the matrix's Kokkos execution space is NOT Kokkos::Serial. |
checkInputIndices | [in] Whether to check whether the input indices are valid column indices before just using them. This MUST be a compile-time constant. Default is true, that is, always check. |
Definition at line 185 of file Tpetra_Details_crsMatrixAssembleElement.hpp.
KOKKOS_FUNCTION SparseMatrixType::ordinal_type Tpetra::Details::crsMatrixAssembleElement_sortedLinear | ( | const SparseMatrixType & | A, |
const VectorViewType & | x, | ||
typename SparseMatrixType::ordinal_type | lids[], | ||
typename SparseMatrixType::ordinal_type | sortPerm[], | ||
const RhsViewType & | rhs, | ||
const LhsViewType & | lhs, | ||
const bool | forceAtomic = false , |
||
const bool | checkInputIndices = true |
||
) |
A(lids[j], lids[j]) += lhs(j,j) and x(lids[j]) += rhs(j), for all j in 0 .. eltDim-1.
Assume the following:
Sum the dense "element" matrix (2-D Kokkos::View) lhs into the entries of the sparse matrix A corresponding to the input row and column indices lids. Also, sum the dense "element" vector (1-D Kokkos::View) rhs into the entries of the dense vector x corresponding to the input row indices lids.
SparseMatrixType | Specialization of KokkosSparse::CrsMatrix. |
RhsViewType | Specialization of a 1-D Kokkos::View. |
LhsViewType | Specialization of a 2-D Kokkos::View. |
A | [in/out] Sparse matrix (KokkosSparse::CrsMatrix) to modify. |
x | [in/out] Dense vector (1-D Kokkos::View) to modify. |
lids | [in/out] Local row and column indices of A to modify. This function may sort this array, and output the permutation that makes it sorted to sortPerm . lids must have the same number of entries as rhs.extent(0) , lhs.extent(0) , and lhs.extent(1) . |
sortPerm | [out] Permutation that makes lids (on input) sorted. It must have the same number of writeable entries as lids (see above). |
rhs | [in] Dense "element" vector of input values to sum into the dense vector x; a 1-D Kokkos::View. It must have the same number of entries as each dimension of lhs . |
lhs | [in] Dense, square "element" matrix of input values to sum into the sparse matrix A; a 2-D Kokkos::View. Each of its dimensions must be the same as the number of entries in rhs . |
forceAtomic | [in] Whether to use atomic updates when modifying the entries of the matrix A and vector x. For forwards compatibility, this should always be a compile-time constant. It defaults to whether the matrix's Kokkos execution space is NOT Kokkos::Serial. |
checkInputIndices | [in] Whether to check whether the input indices are valid column indices before just using them. This MUST be a compile-time constant. Default is true, that is, always check. |
Definition at line 324 of file Tpetra_Details_crsMatrixAssembleElement.hpp.
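A rough picture of element assembly, in plain C++: for every pair of element indices (r, c), the entry lhs(r, c) is summed into A(lids[r], lids[c]), and rhs(r) is summed into x(lids[r]). The sketch below is an illustration under several assumptions: Csr and assembleElementSketch are hypothetical names, each matrix row stores its column indices in sorted order, atomics and index checking are omitted, and, unlike the function documented above, the sketch builds its own permutation and leaves lids unmodified.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical CSR matrix; column indices within each row are sorted.
struct Csr {
  std::vector<std::size_t> rowPtr;
  std::vector<int> colInd;
  std::vector<double> vals;
};

void assembleElementSketch(Csr& A, std::vector<double>& x,
                           const std::vector<int>& lids,
                           const std::vector<double>& rhs,
                           const std::vector<std::vector<double>>& lhs) {
  const std::size_t n = lids.size();
  assert(rhs.size() == n && lhs.size() == n);
  // sortPerm[j] is the position in lids of the j-th smallest index.
  std::vector<std::size_t> sortPerm(n);
  std::iota(sortPerm.begin(), sortPerm.end(), std::size_t{0});
  std::sort(sortPerm.begin(), sortPerm.end(),
            [&](std::size_t a, std::size_t b) { return lids[a] < lids[b]; });
  for (std::size_t r = 0; r < n; ++r) {
    const std::size_t row = static_cast<std::size_t>(lids[r]);
    x[row] += rhs[r];
    std::size_t k = A.rowPtr[row];
    const std::size_t rowEnd = A.rowPtr[row + 1];
    // Visit columns in sorted order so one forward scan of the row suffices.
    for (std::size_t j = 0; j < n; ++j) {
      const std::size_t c = sortPerm[j];
      while (k < rowEnd && A.colInd[k] < lids[c]) ++k;
      if (k < rowEnd && A.colInd[k] == lids[c]) A.vals[k] += lhs[r][c];
    }
  }
}
```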
void Tpetra::Details::padCrsArrays | ( | const RowPtr & | rowPtrBeg, |
const RowPtr & | rowPtrEnd, | ||
Indices & | indices_wdv, | ||
const Padding & | padding, | ||
const int | my_rank, | ||
const bool | verbose | ||
) |
Determine if the row pointers and indices arrays need to be resized to accommodate new entries. If they do need to be resized, resize the indices arrays and shift the existing contents to accommodate new entries. Modify values in the row pointers array to point to the newly shifted locations in the indices arrays.
This routine is called to resize/shift the CRS arrays before attempting to insert new values if the number of new values exceeds the amount of free space in the CRS arrays.
[in/out] | rowPtrBeg - rowPtrBeg[i] points to the first column index (in the indices array) of row i. |
[in/out] | rowPtrEnd - rowPtrEnd[i] points to the last column index (in the indices array) of row i. |
[in/out] | indices - array containing column indices of nonzeros in CRS representation. |
Definition at line 540 of file Tpetra_Details_crsUtils.hpp.
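The resize-and-shift step described above can be illustrated with a minimal standalone sketch. It uses std::vector instead of Kokkos Views, and the name padCrsArraysSketch and the extraPerRow argument are hypothetical stand-ins for the Padding object, not the Tpetra interface. The sketch also assumes that rowPtrEnd[i] is one past the last used entry of row i, and that rowPtrBeg has one extra trailing entry so that row capacities can be computed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: make sure row i has room for extraPerRow[i] more entries.
// Assumptions of this sketch (not taken from Tpetra):
//  - rowPtrBeg has numRows+1 entries; rowPtrBeg[i+1]-rowPtrBeg[i] is row i's capacity.
//  - rowPtrEnd[i] is one past the last used entry of row i.
void padCrsArraysSketch(std::vector<std::size_t>& rowPtrBeg,
                        std::vector<std::size_t>& rowPtrEnd,
                        std::vector<int>& indices,
                        const std::vector<std::size_t>& extraPerRow)
{
  const std::size_t numRows = rowPtrEnd.size();
  assert(rowPtrBeg.size() == numRows + 1 && extraPerRow.size() == numRows);

  // Compute the new row offsets and decide whether any row overflows.
  std::vector<std::size_t> newBeg(numRows + 1, 0), newEnd(numRows, 0);
  bool mustResize = false;
  for (std::size_t i = 0; i < numRows; ++i) {
    const std::size_t used = rowPtrEnd[i] - rowPtrBeg[i];
    const std::size_t cap = rowPtrBeg[i + 1] - rowPtrBeg[i];
    const std::size_t needed = used + extraPerRow[i];
    if (needed > cap) mustResize = true;
    newEnd[i] = newBeg[i] + used;
    newBeg[i + 1] = newBeg[i] + (needed > cap ? needed : cap);
  }
  if (!mustResize) return;  // enough free space already; leave everything alone

  // Shift the existing contents of each row to its new location.
  std::vector<int> newIndices(newBeg[numRows], 0);
  for (std::size_t i = 0; i < numRows; ++i) {
    const std::size_t used = rowPtrEnd[i] - rowPtrBeg[i];
    for (std::size_t k = 0; k < used; ++k)
      newIndices[newBeg[i] + k] = indices[rowPtrBeg[i] + k];
  }
  indices.swap(newIndices);
  rowPtrBeg.swap(newBeg);
  rowPtrEnd.swap(newEnd);
}
```

In this sketch, rows that already have enough free space keep their capacity; only rows that would overflow grow, and all later rows shift accordingly.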
size_t Tpetra::Details::insertCrsIndices | ( | typename Pointers::value_type const | row, |
Pointers const & | rowPtrs, | ||
InOutIndices & | curIndices, | ||
size_t & | numAssigned, | ||
InIndices const & | newIndices, | ||
std::function< void(const size_t, const size_t, const size_t)> | cb = std::function<void(const size_t, const size_t, const size_t)>() |
||
) |
Insert new indices into the current list of indices.
row | [in] The row in which to insert |
rowPtrs | [in] "Pointers" to beginning of each row |
curIndices | [in/out] The current indices |
numAssigned | [in/out] The number of currently assigned indices in row row |
newIndices | [in] The indices to insert |
map | [in] An optional function mapping newIndices[k] to its actual index |
cb | [in] An optional callback function called on every insertion at the local index and the offset in to the inserted location |
Notes: curIndices is the current list of CRS indices. It is not assumed to be sorted, but its entries are unique. For each newIndices[k], we look to see if the index exists in curIndices. If it does, we do not insert it (no repeats). If it does not exist, we first check that there is capacity in curIndices, and if there is, we insert it at the end.
The actual value of newIndices[k] that is inserted is the value returned from map(newIndices[k]). If an identity map is provided, newIndices[k] is directly inserted. However, any other map can be provided. For instance, for a locally indexed graph on which insertGlobalIndices is called, the curIndices array can be a view of the graph's local indices, the newIndices array holds the new global indices, and map is the graph's column map, which converts global indices to local. If this function is called through the overload below without the map argument, the identity map is provided.
The optional function cb is called on every valid index. cb is sent the current loop index k, rowPtrs[row] (the start of the row), and the relative offset from start into the curIndices array for newIndices[k]. This function could, for example, be used by CrsMatrix to fill the values array during sumInto*Values or replace*Values; e.g., CrsMatrix::sumIntoLocalValues might have the following:
CrsMatrix::sumIntoLocalValues(LO row, array<LO> cols, array<S> vals)
{
  this->graph_->insertLocalValues(row, cols, [&](size_t const k, size_t const start, size_t const offset){
    this->values_[start+offset] += vals[k];
  });
}
Definition at line 620 of file Tpetra_Details_crsUtils.hpp.
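The insertion rule in the notes above can be sketched with standard containers. This is only an illustration under stated assumptions: insertCrsIndicesSketch is a hypothetical name, the containers are std::vector rather than Kokkos Views, and the behavior when the row is full (skip the index) is a choice made for this sketch, not something the documentation above specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch of "insert if absent, append at end of row".
// Returns the number of indices newly inserted into the row.
std::size_t insertCrsIndicesSketch(
    std::size_t row,
    const std::vector<std::size_t>& rowPtrs,   // rowPtrs[row] is the start of row
    std::vector<int>& curIndices,
    std::size_t& numAssigned,                  // entries currently used in this row
    const std::vector<int>& newIndices,
    const std::function<int(int)>& map,        // e.g. global-to-local conversion
    const std::function<void(std::size_t, std::size_t, std::size_t)>& cb = {})
{
  assert(row + 1 < rowPtrs.size());
  const std::size_t start = rowPtrs[row];
  const std::size_t capacity = rowPtrs[row + 1] - start;
  std::size_t numInserted = 0;
  for (std::size_t k = 0; k < newIndices.size(); ++k) {
    const int idx = map(newIndices[k]);
    // Linear search: the row is not assumed to be sorted.
    std::size_t offset = 0;
    while (offset < numAssigned && curIndices[start + offset] != idx) ++offset;
    if (offset == numAssigned) {               // index not present yet
      if (numAssigned == capacity) continue;   // row full: skipped in this sketch
      curIndices[start + numAssigned] = idx;   // append at the end of the row
      ++numAssigned;
      ++numInserted;
    }
    if (cb) cb(k, start, offset);              // (loop index, row start, offset)
  }
  return numInserted;
}
```

A caller that keeps a parallel values array can use the callback to accumulate into position start + offset, in the spirit of the sumIntoLocalValues example above.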
size_t Tpetra::Details::findCrsIndices | ( | typename Pointers::value_type const | row, |
Pointers const & | rowPtrs, | ||
const size_t | curNumEntries, | ||
Indices1 const & | curIndices, | ||
Indices2 const & | newIndices, | ||
Callback && | cb | ||
) |
Finds offsets into the current list of indices.
row | [in] The row in which to insert |
rowPtrs | [in] "Pointers" to beginning of each row |
curIndices | [in] The current indices |
numAssigned | [in] The number of currently assigned indices in row row |
newIndices | [in] The indices to insert |
cb | [in] An optional function called on every insertion at the local index and the offset in to the inserted location |
Notes: curIndices is the current list of CRS indices. It is not assumed to be sorted, but its entries are unique. For each newIndices[k], we look to see if the index exists in curIndices. If it does, we do not insert it (no repeats). If it does not exist, we first check that there is capacity in curIndices, and if there is, we insert it at the end.
The actual value of newIndices[k] that is inserted is the value returned from map(newIndices[k]). If an identity map is provided, newIndices[k] is directly inserted. However, any other map can be provided. For instance, for a locally indexed graph on which insertGlobalIndices is called, the curIndices array can be a view of the graph's local indices, the newIndices array holds the new global indices, and map is the graph's column map, which converts global indices to local. If this function is called through the overload below without the map argument, the identity map is provided.
The function cb is called on every valid index.
Definition at line 689 of file Tpetra_Details_crsUtils.hpp.
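As a rough illustration of looking up offsets without modifying the row, the sketch below searches the used part of one row and reports each hit through a callback. The name findCrsIndicesSketch, the use of std::vector, and the choice to return the number of indices found are assumptions of this sketch rather than documented Tpetra behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: for each newIndices[k] already present in the row,
// call cb(k, start, offset). The row itself is never modified.
template <class Callback>
std::size_t findCrsIndicesSketch(std::size_t row,
                                 const std::vector<std::size_t>& rowPtrs,
                                 std::size_t curNumEntries,
                                 const std::vector<int>& curIndices,
                                 const std::vector<int>& newIndices,
                                 Callback&& cb)
{
  const std::size_t start = rowPtrs[row];
  assert(start + curNumEntries <= curIndices.size());
  std::size_t numFound = 0;
  for (std::size_t k = 0; k < newIndices.size(); ++k) {
    // Linear search over the used entries of the row (unsorted row assumed).
    for (std::size_t offset = 0; offset < curNumEntries; ++offset) {
      if (curIndices[start + offset] == newIndices[k]) {
        cb(k, start, offset);
        ++numFound;
        break;
      }
    }
  }
  return numFound;
}
```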
LocalTriangularStructureResult<typename LocalMapType::local_ordinal_type> Tpetra::Details::determineLocalTriangularStructure | ( | const LocalGraphType & | G, |
const LocalMapType & | rowMap, | ||
const LocalMapType & | colMap, | ||
const bool | ignoreMapsForTriangularStructure | ||
) |
Count the local number of diagonal entries in a local sparse graph, and determine whether the local part of the graph is structurally lower or upper triangular (or neither).
LocalGraphType | Kokkos::StaticCrsGraph specialization |
LocalMapType | Result of Tpetra::Map::getLocalMap() |
G | [in] The local sparse graph |
rowMap | [in] The graph's local row Map |
colMap | [in] The graph's local column Map |
ignoreMapsForTriangularStructure | [in] If true, ignore the Maps when determining whether the graph is structurally lower or upper triangular (or neither). See GitHub Issue #2658. Regardless, use the Maps to count diagonal entries. |
Definition at line 207 of file Tpetra_Details_determineLocalTriangularStructure.hpp.
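To make the structural test concrete, here is a minimal sketch for the case where the Maps are ignored and local row and column indices are compared directly. The struct and function names are hypothetical, the graph is given as plain CRS arrays instead of a Kokkos::StaticCrsGraph, and a row with a repeated diagonal index is counted once (an assumption of this sketch).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct TriangularSketchResult {
  int diagCount;       // number of rows that contain a diagonal entry
  bool couldBeLower;   // no entry lies above the diagonal
  bool couldBeUpper;   // no entry lies below the diagonal
};

// Hypothetical sketch: classify a local CRS graph by comparing local indices.
TriangularSketchResult triangularStructureSketch(
    const std::vector<std::size_t>& rowPtr, const std::vector<int>& colInd)
{
  assert(!rowPtr.empty());
  TriangularSketchResult r{0, true, true};
  const std::size_t numRows = rowPtr.size() - 1;
  for (std::size_t row = 0; row < numRows; ++row) {
    bool foundDiag = false;
    for (std::size_t k = rowPtr[row]; k < rowPtr[row + 1]; ++k) {
      const int col = colInd[k];
      const int lclRow = static_cast<int>(row);
      if (col == lclRow) foundDiag = true;
      else if (col > lclRow) r.couldBeLower = false;  // entry above the diagonal
      else r.couldBeUpper = false;                    // entry below the diagonal
    }
    if (foundDiag) ++r.diagCount;
  }
  return r;
}
```

A graph with only diagonal entries reports both flags as true, which matches the idea that a diagonal graph is structurally both lower and upper triangular.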
std::string Tpetra::Details::DistributorSendTypeEnumToString | ( | EDistributorSendType | sendType | ) |
Convert an EDistributorSendType enum value to a string.
This is an implementation detail of Distributor. Please do not rely on this function in your code.
Definition at line 21 of file Tpetra_Details_DistributorPlan.cpp.
std::string Tpetra::Details::DistributorHowInitializedEnumToString | ( | EDistributorHowInitialized | how | ) |
Convert an EDistributorHowInitialized enum value to a string.
This is an implementation detail of Distributor. Please do not rely on this function in your code.
Definition at line 47 of file Tpetra_Details_DistributorPlan.cpp.
auto Tpetra::Details::view_alloc_no_init | ( | const std::string & | label | ) | -> |
Use in place of the string label as the first argument of Kokkos::View's constructor, in case you want to allocate without initializing.
Definition at line 16 of file Tpetra_Details_DualViewUtil.cpp.
void Tpetra::Details::makeDualViewFromOwningHostView | ( | Kokkos::DualView< ElementType *, DeviceType > & | dv, |
const typename Kokkos::DualView< ElementType *, DeviceType >::t_host & | hostView | ||
) |
Initialize dv such that its host View is hostView.
This shallow copies the host View into the output DualView, and syncs the output DualView to device.
Definition at line 39 of file Tpetra_Details_DualViewUtil.hpp.
bool Tpetra::Details::teuchosCommIsAnMpiComm | ( | const Teuchos::Comm< int > & | ) |
Is the given Comm a Teuchos::MpiComm<int> instance?
Definition at line 59 of file Tpetra_Details_extractMpiCommFromTeuchos.cpp.
void Tpetra::Details::gathervPrint | ( | std::ostream & | out, |
const std::string & | s, | ||
const Teuchos::Comm< int > & | comm | ||
) |
On Process 0 in the given communicator, print strings from each process in that communicator, in rank order.
For each process in the given communicator comm, send its string s to Process 0 in that communicator. Process 0 prints the strings in rank order.
This is a collective over the given communicator comm. Process 0 promises not to store all the strings in its memory. This function's total memory usage on any process is proportional to the calling process' string length, plus the max string length over any process. This does NOT depend on the number of processes in the communicator. Thus, we call this a "memory-scalable" operation. While the function's name suggests MPI_Gatherv, the implementation may NOT use MPI_Gather or MPI_Gatherv, because neither of those is memory scalable.
Process 0 prints nothing other than what is in the string. It does not add an endline after each string, nor does it identify each string with its owning process' rank. If you want either of those in the string, you have to put it there yourself.
out | [out] The output stream to which to write. ONLY Process 0 in the given communicator will write to this. Thus, this stream need only be valid on Process 0. |
s | [in] The string to write. Each process in the given communicator has its own string. Strings may be different on different processes. Zero-length strings are OK. |
comm | [in] The communicator over which this operation is a collective. |
Definition at line 18 of file Tpetra_Details_gathervPrint.cpp.
|
static |
Given a locally indexed, local sparse matrix, and corresponding local row and column Maps, extract the matrix's diagonal entries into a 1-D Kokkos::View.
This function implements much of the one-argument overload of Tpetra::CrsMatrix::getLocalDiagCopy, for the case where the matrix is fill complete. The function computes offsets of diagonal entries inline, and does not store them. If you want to store the offsets, call computeOffsets() instead.
DiagType | 1-D nonconst Kokkos::View |
CrsMatrixType | Specialization of KokkosSparse::CrsMatrix |
LocalMapType | Specialization of Tpetra::Details::LocalMap; type of the "local" part of a Tpetra::Map |
D | [out] 1-D Kokkos::View to which to write the diagonal entries. |
rowMap | [in] "Local" part of the sparse matrix's row Map. |
colMap | [in] "Local" part of the sparse matrix's column Map. |
A | [in] The sparse matrix. |
Definition at line 148 of file Tpetra_Details_getDiagCopyWithoutOffsets_decl.hpp.
LO Tpetra::Details::getLocalDiagCopyWithoutOffsetsNotFillComplete | ( | ::Tpetra::Vector< SC, LO, GO, NT > & | diag, |
const ::Tpetra::RowMatrix< SC, LO, GO, NT > & | A, | ||
const bool | debug = false |
||
) |
Given a locally indexed, global sparse matrix, extract the matrix's diagonal entries into a Tpetra::Vector.
This function is a work-around for GitHub Issue #499. It implements one-argument Tpetra::CrsMatrix::getLocalDiagCopy for the case where the matrix is not fill complete. The function computes offsets of diagonal entries inline, and does not store them. If you want to store the offsets, call computeOffsets() instead.
SC | Same as first template parameter (Scalar) of Tpetra::CrsMatrix and Tpetra::Vector. |
LO | Same as second template parameter (LocalOrdinal) of Tpetra::CrsMatrix and Tpetra::Vector. |
GO | Same as third template parameter (GlobalOrdinal) of Tpetra::CrsMatrix and Tpetra::Vector. |
NT | Same as fourth template parameter (Node) of Tpetra::CrsMatrix and Tpetra::Vector. |
diag | [out] Tpetra::Vector to which to write the diagonal entries. Its Map must be the same (in the sense of Tpetra::Map::isSameAs()) as the row Map of A . |
A | [in] The sparse matrix. Must be a Tpetra::RowMatrix (the base class of Tpetra::CrsMatrix), must be locally indexed, and must have row views. |
debug | [in] Whether to do extra run-time checks. This costs MPI communication. The default is false in a release build, and true in a debug build. |
We pass in the sparse matrix as a Tpetra::RowMatrix because the implementation of Tpetra::CrsMatrix uses this function, and we want to avoid a circular header dependency. On the other hand, the implementation does not actually depend on Tpetra::CrsMatrix.
Definition at line 150 of file Tpetra_Details_getDiagCopyWithoutOffsets_def.hpp.
CrsGraphType::local_ordinal_type Tpetra::Details::getLocalNumDiags | ( | const CrsGraphType & | G | ) |
Number of populated diagonal entries in the given sparse graph, on the calling (MPI) process.
Definition at line 368 of file Tpetra_Details_getNumDiags.hpp.
CrsGraphType::global_ordinal_type Tpetra::Details::getGlobalNumDiags | ( | const CrsGraphType & | G | ) |
Number of populated diagonal entries in the given sparse graph, over all processes in the graph's (MPI) communicator.
Definition at line 377 of file Tpetra_Details_getNumDiags.hpp.
std::shared_ptr<CommRequest> Tpetra::Details::iallreduce | ( | const InputViewType & | sendbuf, |
const OutputViewType & | recvbuf, | ||
const ::Teuchos::EReductionType | op, | ||
const ::Teuchos::Comm< int > & | comm | ||
) |
Nonblocking all-reduce, for either rank-1 or rank-0 Kokkos::View objects.
InputViewType | Type of the send buffer |
OutputViewType | Type of the receive buffer |
This function wraps MPI_Iallreduce. It does a nonblocking all-reduce over the input communicator comm, from sendbuf into recvbuf, using op as the reduction operator. The function returns without blocking; the all-reduce only blocks for completion when one calls wait() on the returned request.
sendbuf | [in] Input buffer; must be either a rank-1 or rank-0 Kokkos::View, and must have the same rank as recvbuf. |
recvbuf | [in] Output buffer; must be either a rank-1 or rank-0 Kokkos::View, and must have the same rank as sendbuf. |
op | [in] Teuchos enum representing the reduction operator. |
comm | [in] Communicator over which to do the all-reduce. |
sendbuf and recvbuf must either be disjoint, or identical (point to the same array). They may not partially overlap. Furthermore, if they are identical, the input communicator must be an intracommunicator. It may not be an intercommunicator. If you don't know what an intercommunicator is, you probably just have an intracommunicator, so everything is fine.
Definition at line 271 of file Tpetra_Details_iallreduce.hpp.
void Tpetra::Details::initializeKokkos | ( | ) |
Initialize Kokkos, using command-line arguments (if any) given to Teuchos::GlobalMPISession.
Definition at line 29 of file Tpetra_Details_initializeKokkos.cpp.
bool Tpetra::Details::isInterComm | ( | const Teuchos::Comm< int > & | comm | ) |
Return true if and only if the input communicator wraps an MPI intercommunicator.
The most common MPI communicators are intracommunicators ("intra," not "inter"). This includes MPI_COMM_WORLD, MPI_COMM_SELF, and the results of MPI_Comm_dup and MPI_Comm_split. Intercommunicators come from MPI_Intercomm_create.
This distinction matters because only collectives over intracommunicators may use MPI_IN_PLACE, to let the send and receive buffers alias each other. Collectives over intercommunicators may not use MPI_IN_PLACE.
comm | [in] The input communicator. |
Definition at line 34 of file Tpetra_Details_isInterComm.cpp.
void Tpetra::Details::leftScaleLocalCrsMatrix | ( | const LocalSparseMatrixType & | A_lcl, |
const ScalingFactorsViewType & | scalingFactors, | ||
const bool | assumeSymmetric, | ||
const bool | divide = true |
||
) |
Left-scale a KokkosSparse::CrsMatrix.
LocalSparseMatrixType | KokkosSparse::CrsMatrix specialization. |
ScalingFactorsViewType | Kokkos::View specialization storing scaling factors by which to divide the rows of the local sparse matrix. |
A_lcl | [in/out] The local sparse matrix. |
scalingFactors | [in] Row scaling factors. |
assumeSymmetric | [in] If true, divide matrix entries by square roots of scaling factors; else, divide by the scaling factors themselves. |
divide | [in] If true, divide; else multiply. |
Definition at line 110 of file Tpetra_Details_leftScaleLocalCrsMatrix.hpp.
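The effect of the assumeSymmetric and divide flags can be shown with a small sequential sketch. It operates on plain CRS arrays rather than a KokkosSparse::CrsMatrix, runs a simple loop instead of a Kokkos::parallel_for, and the name leftScaleSketch is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: scale every entry of row i by scalingFactors[i]
// (or by its square root when assumeSymmetric is true).
void leftScaleSketch(const std::vector<std::size_t>& rowPtr,
                     std::vector<double>& values,
                     const std::vector<double>& scalingFactors,
                     bool assumeSymmetric,
                     bool divide = true)
{
  assert(rowPtr.size() == scalingFactors.size() + 1);
  const std::size_t numRows = scalingFactors.size();
  for (std::size_t row = 0; row < numRows; ++row) {
    const double s = assumeSymmetric ? std::sqrt(scalingFactors[row])
                                     : scalingFactors[row];
    for (std::size_t k = rowPtr[row]; k < rowPtr[row + 1]; ++k)
      values[k] = divide ? values[k] / s : values[k] * s;
  }
}
```

Calling it a second time with the same factors and divide set to false undoes the scaling, up to rounding.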
int Tpetra::Details::makeColMap | ( | Teuchos::RCP< const Tpetra::Map< LO, GO, NT > > & | colMap, |
Teuchos::Array< int > & | remotePIDs, | ||
const Teuchos::RCP< const Tpetra::Map< LO, GO, NT > > & | domMap, | ||
const RowGraph< LO, GO, NT > & | graph, | ||
const bool | sortEachProcsGids = true , |
||
std::ostream * | errStrm = NULL |
||
) |
Make the graph's column Map.
LO | Local ordinal type; the type of local indices in the graph. |
GO | Global ordinal type; the type of global indices in the graph. |
NT | Node type; the third template parameter of Tpetra::CrsGraph. |
colMap | [out] On output: pointer to the column Map for the given graph. This is only valid if the returned error code is zero on all processes in the communicator of domMap (see below). This may be the same (literally the same object) as domMap , depending on the graph. |
remotePIDs | [out] The process ranks corresponding to the column Map's "remote" (not on the calling process in the domain Map) indices. |
domMap | [in] The domain Map to use for creating the column Map. This need not be the same as graph.getDomainMap(). It's OK for the latter to be null, in fact. domMap needs to be passed in by RCP, because it's possible for the returned column Map colMap (see above) to equal domMap . |
graph | [in] The graph for which to make a column Map. This function does NOT modify the graph's column Map, if it happens to have one already. Thus, this function supports graph modification. |
sortEachProcsGids | [in] Whether to sort column Map GIDs associated with each remote process in ascending order. This is true by default. If false , leave GIDs in their original order as discovered in the graph by iterating in ascending order through the local rows of the graph. |
errStrm | [out] If nonnull, print error messages to this. |
This function always makes a column Map, even if the graph already has one. This makes it possible to change the graph's structure, and have its column Map and corresponding Import update in the same way.
The sortEachProcsGids argument corresponds to sortGhostsAssociatedWithEachProcessor_ in CrsGraph. This function always groups remote GIDs by process rank, so that all remote GIDs with the same owning rank occur contiguously. The sortEachProcsGids argument (see above) determines whether this function sorts remote GIDs in increasing order within those groups; true means "sort remote GIDs." This function sorts by default. This behavior differs from Epetra, which does not sort remote GIDs with the same owning process. If you don't want to sort, for compatibility with Epetra, set sortEachProcsGids to false.
Definition at line 295 of file Tpetra_Details_makeColMap_def.hpp.
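The grouping and optional sorting of remote GIDs described above might look like the following sketch. It is purely sequential and takes the owning rank of each GID as input, whereas the real function has to discover owners through the domain Map. The function name, the input representation, the discovery-order treatment of locally owned GIDs, and the ascending order of the rank groups are all assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical sketch: order column Map GIDs as "local first, then remotes
// grouped by owning rank". Input pairs are (GID, owning rank), in the order
// in which the GIDs were discovered; GIDs are assumed to be unique.
std::vector<long long> orderColMapGidsSketch(
    const std::vector<std::pair<long long, int>>& gidsWithOwner,
    int myRank,
    bool sortEachProcsGids = true)
{
  assert(myRank >= 0);
  std::vector<long long> result;                      // locally owned GIDs go first
  std::map<int, std::vector<long long>> remoteByRank; // rank -> GIDs, ascending rank
  for (const auto& p : gidsWithOwner) {
    if (p.second == myRank) result.push_back(p.first);
    else remoteByRank[p.second].push_back(p.first);
  }
  for (auto& kv : remoteByRank) {
    if (sortEachProcsGids) std::sort(kv.second.begin(), kv.second.end());
    result.insert(result.end(), kv.second.begin(), kv.second.end());
  }
  return result;
}
```

With sortEachProcsGids set to false, each rank's group keeps its discovery order, which is the Epetra-compatible behavior mentioned above.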
int Tpetra::Details::makeColMap | ( | Teuchos::RCP< const Tpetra::Map< LO, GO, NT >> & | colMap, |
const Teuchos::RCP< const Tpetra::Map< LO, GO, NT >> & | domMap, | ||
Kokkos::View< GO *, typename NT::memory_space > | gids, | ||
std::ostream * | errStrm = NULL |
||
) |
Construct a column map for the given set of gids (always sorting remote GIDs within each remote process).
colMap | [out] Will be set to the new column map. |
domMap | [in] The domain map, used to determine which global columns are locally owned. |
Definition at line 547 of file Tpetra_Details_makeColMap_def.hpp.
Teuchos::RCP<const MapType> Tpetra::Details::makeOptimizedColMap | ( | std::ostream & | errStream, |
bool & | lclErr, | ||
const MapType & | domMap, | ||
const MapType & | colMap, | ||
const Tpetra::Import< typename MapType::local_ordinal_type, typename MapType::global_ordinal_type, typename MapType::node_type > * | oldImport = nullptr |
||
) |
Return an optimized reordering of the given column Map.
MapType | A specialization of Map. |
errStream | [out] Output stream for human-readable error reporting. This is local to the calling process and may differ on different processes. |
lclErr | [out] On output: true if anything went wrong on the calling process. This value is local to the calling process and may differ on different processes. |
domMap | [in] Domain Map of a CrsGraph or CrsMatrix. |
colMap | [in] Original column Map of the same CrsGraph or CrsMatrix as domMap . |
oldImport | [in] Optional pointer to the "original" Import: an Import from domMap to colMap . This is not required, but if you supply this, this function may use it to avoid some communication and/or work when setting up the new Import object. |
Returns newColMap.
This is a convenience wrapper for makeOptimizedColMapAndImport(). (Please refer to that function's documentation in this file.) It does everything that that function does, except that it does not compute a new Import.
Definition at line 349 of file Tpetra_Details_makeOptimizedColMap.hpp.
std::pair<Teuchos::RCP<const MapType>, Teuchos::RCP<typename OptColMap<MapType>::import_type> > Tpetra::Details::makeOptimizedColMapAndImport | ( | std::ostream & | errStream, |
bool & | lclErr, | ||
const MapType & | domMap, | ||
const MapType & | colMap, | ||
const typename OptColMap< MapType >::import_type * | oldImport = nullptr |
||
) |
Return an optimized reordering of the given column Map. Optionally, recompute an Import from the input domain Map to the new column Map.
MapType | A specialization of Map. |
This function takes a domain Map and a column Map of a distributed graph (Tpetra::CrsGraph) or matrix (e.g., Tpetra::CrsMatrix). It then creates a new column Map, which optimizes the performance of an Import operation from the domain Map to the new column Map. This function also optionally creates that Import. Creating the new column Map and its Import at the same time saves some communication, since making the Import requires some of the same information that optimizing the column Map does.
errStream | [out] Output stream for human-readable error reporting. This is local to the calling process and may differ on different processes. |
lclErr | [out] On output: true if anything went wrong on the calling process. This value is local to the calling process and may differ on different processes. |
domMap | [in] Domain Map of a CrsGraph or CrsMatrix. |
colMap | [in] Original column Map of the same CrsGraph or CrsMatrix as domMap . |
oldImport | [in] Optional pointer to the "original" Import: an Import from domMap to colMap . This is not required, but if you supply this, this function may use it to avoid some communication and/or work when setting up the new Import object. |
Returns newColMap, and the corresponding Import from domMap to newColMap.
domMap and colMap must have the same or congruent communicators. The indices in colMap must be a subset of the indices in domMap.
The returned column Map's global indices (GIDs) will have the following order on all calling processes:
1. GIDs in both colMap and domMap (on the calling process) go first.
2. GIDs in colMap on the calling process, but not in the domain Map on the calling process, follow. They are ordered first contiguously by their owning process rank (in the domain Map), then in increasing order within that. This imitates the ordering used by AztecOO and Epetra. Storing indices contiguously that are owned by the same process (in the domain Map) permits the use of contiguous send and receive buffers in Distributor, which is used in an Import operation.
Definition at line 427 of file Tpetra_Details_makeOptimizedColMap.hpp.
IndexType Tpetra::Details::countMergeUnsortedIndices | ( | const OrdinalType | curInds[], |
const IndexType | numCurInds, | ||
const OrdinalType | inputInds[], | ||
const IndexType | numInputInds | ||
) |
Count the number of column indices that can be merged into the current row, assuming that both the current row's indices and the input indices are unsorted.
Neither the current row's entries, nor the input, are sorted. Return the number of input entries that can be merged into the current row. Don't actually merge them. 'numCurInds' corresponds to 'midPos' in mergeUnsortedIndices.
The current indices are NOT allowed to have repeats, but the input indices ARE allowed to have repeats. (The whole point of these methods is to keep the current entries without repeats – "merged in.") Repeats in the input are counted separately with respect to merges.
The unsorted case is bad for asymptotics, but the asymptotics only show up with dense or nearly dense rows, which are bad for other reasons.
Definition at line 42 of file Tpetra_Details_Merge.hpp.
IndexType Tpetra::Details::countMergeSortedIndices | ( | const OrdinalType | curInds[], |
const IndexType | numCurInds, | ||
const OrdinalType | inputInds[], | ||
const IndexType | numInputInds | ||
) |
Count the number of column indices that can be merged into the current row, assuming that both the current row's indices and the input indices are sorted.
Both the current row's entries and the input are sorted. Return the number of input entries that can be merged into the current row. Don't actually merge them. 'numCurInds' corresponds to 'midPos' in mergeSortedIndices.
The current indices are NOT allowed to have repeats, but the input indices ARE allowed to have repeats. (The whole point of these methods is to keep the current entries without repeats – "merged in.") Repeats in the input are counted separately with respect to merges.
The sorted case is good for asymptotics, but imposes an order on the entries of each row. Sometimes users don't want that.
Definition at line 101 of file Tpetra_Details_Merge.hpp.
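For the sorted case, the count can be computed in a single pass over both arrays. The sketch below is one way to do that under the stated preconditions; the name countMergeSortedSketch is hypothetical, and the reading of "can be merged" as "already occurs in the current row" is an interpretation made for this sketch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch: count input entries whose index already occurs in the
// (sorted, repeat-free) current row. Repeated input entries each count.
template <class OrdinalType, class IndexType>
IndexType countMergeSortedSketch(const OrdinalType curInds[],
                                 const IndexType numCurInds,
                                 const OrdinalType inputInds[],
                                 const IndexType numInputInds)
{
  assert(std::is_sorted(curInds, curInds + numCurInds));
  assert(std::is_sorted(inputInds, inputInds + numInputInds));
  IndexType mergeCount = 0;
  IndexType curPos = 0;
  for (IndexType inPos = 0; inPos < numInputInds; ++inPos) {
    const OrdinalType inVal = inputInds[inPos];
    // Advance through the current row until we reach or pass inVal.
    while (curPos < numCurInds && curInds[curPos] < inVal) ++curPos;
    // Do not advance past a match, so that a repeated input value counts again.
    if (curPos < numCurInds && curInds[curPos] == inVal) ++mergeCount;
  }
  return mergeCount;
}
```

Because neither position ever moves backwards, the work is linear in numCurInds + numInputInds, which is the asymptotic advantage of the sorted case.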
std::pair<bool, IndexType> Tpetra::Details::mergeSortedIndices | ( | OrdinalType | curInds[], |
const IndexType | midPos, | ||
const IndexType | endPos, | ||
const OrdinalType | inputInds[], | ||
const IndexType | numInputInds | ||
) |
Attempt to merge the input indices into the current row's column indices, assuming that both the current row's indices and the input indices are sorted.
Both the current row's entries and the input are sorted. If and only if the current row has enough space for the input (after merging), merge the input with the current row.
Assume that both curInds and inputInds are sorted. Current indices: curInds[0 .. midPos-1]. Extra space at end: curInds[midPos .. endPos-1]. Input indices to merge in: inputInds[0 .. numInputInds-1]. Any of those could be empty.
If the merge succeeded, return true and the new number of entries in the row. Else, return false and the new number of entries in the row required to fit the input.
The sorted case is good for asymptotics, but imposes an order on the entries of each row. Sometimes users don't want that.
Definition at line 173 of file Tpetra_Details_Merge.hpp.
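The return convention (success flag plus entry count) can be illustrated with a sketch that builds the merged row in a temporary buffer and only writes it back if it fits. The use of a temporary std::vector and the name mergeSortedSketch are assumptions of this sketch; nothing here claims to mirror how the Tpetra function does the merge internally.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical sketch: merge sorted input indices into a sorted row.
// Returns {true, newCount} on success; {false, requiredCount} if the row
// (capacity endPos) is too small, in which case curInds is left untouched.
template <class OrdinalType, class IndexType>
std::pair<bool, IndexType> mergeSortedSketch(OrdinalType curInds[],
                                             const IndexType midPos,
                                             const IndexType endPos,
                                             const OrdinalType inputInds[],
                                             const IndexType numInputInds)
{
  assert(midPos <= endPos);
  assert(std::is_sorted(curInds, curInds + midPos));
  assert(std::is_sorted(inputInds, inputInds + numInputInds));

  // Merge both sorted ranges, then drop repeats so each index appears once.
  std::vector<OrdinalType> merged;
  std::merge(curInds, curInds + midPos, inputInds, inputInds + numInputInds,
             std::back_inserter(merged));
  merged.erase(std::unique(merged.begin(), merged.end()), merged.end());

  const IndexType newCount = static_cast<IndexType>(merged.size());
  if (newCount > endPos) return {false, newCount};  // not enough space
  std::copy(merged.begin(), merged.end(), curInds);
  return {true, newCount};
}
```

On failure, the second value tells the caller how many entries the row would need, so the row can be enlarged and the merge retried.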
std::pair<bool, IndexType> Tpetra::Details::mergeUnsortedIndices | ( | OrdinalType | curInds[], |
const IndexType | midPos, | ||
const IndexType | endPos, | ||
const OrdinalType | inputInds[], | ||
const IndexType | numInputInds | ||
) |
Attempt to merge the input indices into the current row's column indices, assuming that both the current row's indices and the input indices are unsorted.
Neither the current row's entries nor the input are sorted. If and only if the current row has enough space for the input (after merging), merge the input with the current row.
Assume that neither curInds nor inputInds are sorted. Current indices: curInds[0 .. midPos-1]. Extra space at end: curInds[midPos .. endPos-1]. Input indices to merge in: inputInds[0 .. numInputInds-1]. Any of those could be empty.
If the merge succeeded, return true and the new number of entries in the row. Else, return false and the new number of entries in the row required to fit the input.
The unsorted case is bad for asymptotics, but the asymptotics only show up with dense or nearly dense rows, which are bad for other reasons.
Definition at line 293 of file Tpetra_Details_Merge.hpp.
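For the unsorted case, one simple approach is to collect the input indices that are not yet in the row, check the capacity, and then append them. The sketch below does this with linear searches, which is where the poor asymptotics for dense rows come from. The name mergeUnsortedSketch and the two-phase structure are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical sketch: append input indices that are not yet in the row.
// Returns {true, newCount} on success; {false, requiredCount} if the row
// (capacity endPos) is too small, in which case curInds is left untouched.
template <class OrdinalType, class IndexType>
std::pair<bool, IndexType> mergeUnsortedSketch(OrdinalType curInds[],
                                               const IndexType midPos,
                                               const IndexType endPos,
                                               const OrdinalType inputInds[],
                                               const IndexType numInputInds)
{
  assert(midPos <= endPos);
  // Phase 1: collect distinct input indices that the row does not contain.
  std::vector<OrdinalType> toAppend;
  for (IndexType k = 0; k < numInputInds; ++k) {
    const OrdinalType v = inputInds[k];
    const bool inRow = std::find(curInds, curInds + midPos, v) != curInds + midPos;
    const bool queued = std::find(toAppend.begin(), toAppend.end(), v) != toAppend.end();
    if (!inRow && !queued) toAppend.push_back(v);
  }
  // Phase 2: commit only if everything fits in the extra space.
  const IndexType newCount = midPos + static_cast<IndexType>(toAppend.size());
  if (newCount > endPos) return {false, newCount};
  std::copy(toAppend.begin(), toAppend.end(), curInds + midPos);
  return {true, newCount};
}
```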
std::pair<bool, IndexType> Tpetra::Details::mergeUnsortedIndicesAndValues | ( | OrdinalType | curInds[], |
ValueType | curVals[], | ||
const IndexType | midPos, | ||
const IndexType | endPos, | ||
const OrdinalType | inputInds[], | ||
const ValueType | inputVals[], | ||
const IndexType | numInputInds | ||
) |
Attempt to merge the input indices and values into the current row's column indices and corresponding values, assuming that both the current row's indices and the input indices are unsorted.
Neither the current row's entries nor the input are sorted. If and only if the current row has enough space for the input (after merging), merge the input with the current row.
Assume that neither curInds nor inputInds are sorted. Current indices: curInds[0 .. midPos-1]. Current values: curVals[0 .. midPos-1]. Extra space for indices at end: curInds[midPos .. endPos-1]. Extra space for values at end: curVals[midPos .. endPos-1]. Input indices to merge in: inputInds[0 .. numInputInds-1]. Input values to merge in: inputVals[0 .. numInputInds-1].
If the merge succeeded, return true and the new number of entries in the row. Else, return false and the new number of entries in the row required to fit the input.
The unsorted case is bad for asymptotics, but the asymptotics only show up with dense or nearly dense rows, which are bad for other reasons.
Definition at line 387 of file Tpetra_Details_Merge.hpp.
bool Tpetra::Details::mpiIsInitialized | ( | ) |
Has MPI_Init been called (on this process)?
If Tpetra was built with MPI support, then this wraps MPI_Initialized. If Tpetra was not built with MPI support, then this always returns false, regardless of whether the user has built with MPI.
MPI (at least 3.0) only permits MPI to be initialized once. After MPI_Init has been called on a process, MPI_Initialized always returns true on that process, regardless of whether MPI_Finalize has been called.
If you want to know whether MPI_Finalize has been called on this process, use mpiIsFinalized() (see below).
Definition at line 19 of file Tpetra_Details_mpiIsInitialized.cpp.
bool Tpetra::Details::mpiIsFinalized | ( | ) |
Has MPI_Finalize been called (on this process)?
If Tpetra was built with MPI support, then this wraps MPI_Finalized. If Tpetra was not built with MPI support, then this always returns false, regardless of whether the user has built with MPI.
MPI (at least 3.0) only permits MPI_Init to be called at most once on a process. After MPI_Finalize has been called successfully on a process, MPI_Finalized always returns true on that process.
If you want to know whether MPI_Init has been called on this process, use mpiIsInitialized() (see above).
Definition at line 32 of file Tpetra_Details_mpiIsInitialized.cpp.
void Tpetra::Details::normImpl | ( | MagnitudeType | norms[], |
const Kokkos::View< const ValueType **, ArrayLayout, DeviceType > & | X, | ||
const EWhichNorm | whichNorm, | ||
const Teuchos::ArrayView< const size_t > & | whichVecs, | ||
const bool | isConstantStride, | ||
const bool | isDistributed, | ||
const Teuchos::Comm< int > * | comm | ||
) |
Implementation of MultiVector norms.
Definition at line 282 of file Tpetra_Details_normImpl.hpp.
void Tpetra::Details::packCrsGraph | ( | const CrsGraph< LO, GO, NT > & | sourceGraph, |
Teuchos::Array< typename CrsGraph< LO, GO, NT >::packet_type > & | exports, | ||
const Teuchos::ArrayView< size_t > & | numPacketsPerLID, | ||
const Teuchos::ArrayView< const LO > & | exportLIDs, | ||
size_t & | constantNumPackets | ||
) |
Pack specified entries of the given local sparse graph for communication.
LO | The type of local indices. See the documentation of Map for requirements. |
GO | The type of global indices. See the documentation of Map for requirements. |
NT | The Kokkos Node type. See the documentation of Map for requirements. |
sourceGraph | [in] the CrsGraph source |
exports | [in/out] Output pack buffer; resized if needed. |
numPacketsPerLID | [out] Entry k gives the number of bytes packed for row exportLIDs[k] of the local graph. |
exportLIDs | [in] Local indices of the rows to pack. |
constantNumPackets | [out] Setting this to zero tells the caller to expect a possibly different ("nonconstant") number of packets per local index (i.e., a possibly different number of entries per row). |
This is the public interface to the pack machinery. It converts the passed Teuchos::ArrayView objects to Kokkos::View objects (and copies back into the Teuchos::ArrayView objects, if needed). When CrsGraph migrates fully to adopting Kokkos::DualView objects for its storage of data, this procedure could be bypassed.
Definition at line 734 of file Tpetra_Details_packCrsGraph_def.hpp.
void Tpetra::Details::packCrsGraphNew | ( | const CrsGraph< LO, GO, NT > & | sourceGraph, |
const Kokkos::DualView< const LO *, typename CrsGraph< LO, GO, NT >::buffer_device_type > & | exportLIDs, | ||
const Kokkos::DualView< const int *, typename CrsGraph< LO, GO, NT >::buffer_device_type > & | exportPIDs, | ||
Kokkos::DualView< typename CrsGraph< LO, GO, NT >::packet_type *, typename CrsGraph< LO, GO, NT >::buffer_device_type > & | exports, | ||
Kokkos::DualView< size_t *, typename CrsGraph< LO, GO, NT >::buffer_device_type > | numPacketsPerLID, | ||
size_t & | constantNumPackets, | ||
const bool | pack_pids | ||
) |
Pack specified entries of the given local sparse graph for communication, for "new" DistObject interface.
LO | The type of local indices. This must be the same as the LocalOrdinal template parameter of Tpetra::CrsGraph. |
GO | The type of global indices. This must be the same as the GlobalOrdinal template parameter of Tpetra::CrsGraph. |
NT | The Node type. This must be the same as the Node template parameter of Tpetra::CrsGraph. |
sourceGraph | [in] The "source" graph to pack. |
exports | [in/out] Output pack buffer; resized if needed. |
numPacketsPerLID | [out] On output, numPacketsPerLID.d_view[k] is the number of bytes packed for row exportLIDs.d_view[k] of the local graph. |
exportLIDs | [in] Local indices of the rows to pack. |
constantNumPackets | [out] Same as the constantNumPackets output argument of Tpetra::DistObject::packAndPrepare (which see). |
This method implements CrsGraph::packNew, and thus CrsGraph::packAndPrepare, for the case where the graph to pack has a valid KokkosSparse::CrsGraph.
Definition at line 828 of file Tpetra_Details_packCrsGraph_def.hpp.
void Tpetra::Details::packCrsGraphWithOwningPIDs | ( | const CrsGraph< LO, GO, NT > & | sourceGraph, |
Kokkos::DualView< typename CrsGraph< LO, GO, NT >::packet_type *, typename CrsGraph< LO, GO, NT >::buffer_device_type > & | exports_dv, | ||
const Teuchos::ArrayView< size_t > & | numPacketsPerLID, | ||
const Teuchos::ArrayView< const LO > & | exportLIDs, | ||
const Teuchos::ArrayView< const int > & | sourcePIDs, | ||
size_t & | constantNumPackets | ||
) |
Pack specified entries of the given local sparse graph for communication.
LO | The type of local indices. See the documentation of Map for requirements. |
GO | The type of global indices. See the documentation of Map for requirements. |
NT | The Kokkos Node type. See the documentation of Map for requirements. |
sourceGraph | [in] the CrsGraph source |
exports | [in/out] Output pack buffer; resized if needed. |
numPacketsPerLID | [out] Entry k gives the number of bytes packed for row exportLIDs[k] of the local graph. |
exportLIDs | [in] Local indices of the rows to pack. |
constantNumPackets | [out] Setting this to zero tells the caller to expect a possibly different ("nonconstant") number of packets per local index (i.e., a possibly different number of entries per row). |
This is the public interface to the pack machinery. It converts the passed Teuchos::ArrayView objects to Kokkos::View objects (and copies back into the Teuchos::ArrayView objects, if needed). When CrsGraph migrates fully to adopting Kokkos::DualView objects for its storage of data, this procedure could be bypassed.
Definition at line 926 of file Tpetra_Details_packCrsGraph_def.hpp.
void Tpetra::Details::packCrsMatrix | ( | const CrsMatrix< ST, LO, GO, NT > & | sourceMatrix, |
Teuchos::Array< char > & | exports, | ||
const Teuchos::ArrayView< size_t > & | numPacketsPerLID, | ||
const Teuchos::ArrayView< const LO > & | exportLIDs, | ||
size_t & | constantNumPackets | ||
) |
Pack specified entries of the given local sparse matrix for communication.
ST | The type of the numerical entries of the matrix. (You can use real-valued or complex-valued types here, unlike in Epetra, where the scalar type is always double .) |
LO | The type of local indices. See the documentation of Map for requirements. |
GO | The type of global indices. See the documentation of Map for requirements. |
NT | The Kokkos Node type. See the documentation of Map for requirements. |
sourceMatrix | [in] the CrsMatrix source |
exports | [in/out] Output pack buffer; resized if needed. |
numPacketsPerLID | [out] Entry k gives the number of bytes packed for row exportLIDs[k] of the local matrix. |
exportLIDs | [in] Local indices of the rows to pack. |
constantNumPackets | [out] Setting this to zero tells the caller to expect a possibly different ("nonconstant") number of packets per local index (i.e., a possibly different number of entries per row). |
This is the public interface to the pack machinery. It converts the passed Teuchos::ArrayView objects to Kokkos::View objects (and copies back into the Teuchos::ArrayView objects, if needed). When CrsMatrix migrates fully to adopting Kokkos::DualView objects for its storage of data, this procedure could be bypassed.
Definition at line 829 of file Tpetra_Details_packCrsMatrix_def.hpp.
void Tpetra::Details::packCrsMatrixNew | ( | const CrsMatrix< ST, LO, GO, NT > & | sourceMatrix, |
Kokkos::DualView< char *, typename DistObject< char, LO, GO, NT >::buffer_device_type > & | exports, | ||
const Kokkos::DualView< size_t *, typename DistObject< char, LO, GO, NT >::buffer_device_type > & | numPacketsPerLID, | ||
const Kokkos::DualView< const LO *, typename DistObject< char, LO, GO, NT >::buffer_device_type > & | exportLIDs, | ||
size_t & | constantNumPackets | ||
) |
Pack specified entries of the given local sparse matrix for communication, for "new" DistObject interface.
ST | The type of the entries of the matrix. This must be the same as the Scalar template parameter of Tpetra::CrsMatrix. |
LO | The type of local indices. This must be the same as the LocalOrdinal template parameter of Tpetra::CrsMatrix. |
GO | The type of global indices. This must be the same as the GlobalOrdinal template parameter of Tpetra::CrsMatrix. |
NT | The Node type. This must be the same as the Node template parameter of Tpetra::CrsMatrix. |
sourceMatrix | [in] The "source" matrix to pack. |
exports | [in/out] Output pack buffer; resized if needed. |
numPacketsPerLID | [out] On output, numPacketsPerLID.d_view[k] is the number of bytes packed for row exportLIDs.d_view[k] of the local matrix. |
exportLIDs | [in] Local indices of the rows to pack. |
constantNumPackets | [out] Same as the constantNumPackets output argument of Tpetra::DistObject::packAndPrepare (which see). |
This method implements CrsMatrix::packNew, and thus CrsMatrix::packAndPrepare, for the case where the matrix to pack has a valid KokkosSparse::CrsMatrix.
Definition at line 895 of file Tpetra_Details_packCrsMatrix_def.hpp.
void Tpetra::Details::packCrsMatrixWithOwningPIDs | ( | const CrsMatrix< ST, LO, GO, NT > & | sourceMatrix, |
Kokkos::DualView< char *, typename DistObject< char, LO, GO, NT >::buffer_device_type > & | exports_dv, | ||
const Teuchos::ArrayView< size_t > & | numPacketsPerLID, | ||
const Teuchos::ArrayView< const LO > & | exportLIDs, | ||
const Teuchos::ArrayView< const int > & | sourcePIDs, | ||
size_t & | constantNumPackets | ||
) |
Pack specified entries of the given local sparse matrix for communication.
ST | The type of the numerical entries of the matrix. (You can use real-valued or complex-valued types here, unlike in Epetra, where the scalar type is always double .) |
LO | The type of local indices. See the documentation of Map for requirements. |
GO | The type of global indices. See the documentation of Map for requirements. |
NT | The Kokkos Node type. See the documentation of Map for requirements. |
sourceMatrix | [in] the CrsMatrix source |
exports | [in/out] Output pack buffer; resized if needed. |
numPacketsPerLID | [out] Entry k gives the number of bytes packed for row exportLIDs[k] of the local matrix. |
exportLIDs | [in] Local indices of the rows to pack. |
constantNumPackets | [out] Setting this to zero tells the caller to expect a possibly different ("nonconstant") number of packets per local index (i.e., a possibly different number of entries per row). |
This is the public interface to the pack machinery. It converts the passed Teuchos::ArrayView objects to Kokkos::View objects (and copies back into the Teuchos::ArrayView objects, if needed). When CrsMatrix migrates fully to adopting Kokkos::DualView objects for its storage of data, this procedure could be bypassed.
Definition at line 930 of file Tpetra_Details_packCrsMatrix_def.hpp.
void Tpetra::Details::printOnce | ( | std::ostream & | out, |
const std::string & | s, | ||
const Teuchos::Comm< int > * | comm | ||
) |
Print on one process of the given communicator, or at least try to do so (if MPI is not initialized).
out | [out] Output stream to which to print. If MPI is initialized, then it need only be valid on Process 0 of the given communicator. Otherwise, it must be valid on all processes of the given communicator. |
s | [in] String to print. |
comm | [in] Communicator; if nullptr, print on all processes, else, print based on above rule. |
Definition at line 80 of file Tpetra_Details_printOnce.cpp.
KOKKOS_INLINE_FUNCTION void Tpetra::Details::radixSortKeysAndValues | ( | KeyType * | keys, |
KeyType * | keysAux, | ||
ValueType * | values, | ||
ValueType * | valuesAux, | ||
IndexType | n, | ||
IndexType | upperBound | ||
) |
Radix sort the input array keys, and permute values identically to the keys.
Radix sort may be significantly faster (60%) than Details::shellsort, but only works for integers.
keys | [in/out] Input array of keys to sort. |
keysAux | [in] Scratch space (double buffer) for keys (must be allocated to same size as keys) |
values | [in/out] Input array of values to permute (must have same number of elements as keys) |
valuesAux | [in] Scratch space (double buffer) for values (must be allocated to same size as values) |
n | [in] Length of all 4 input arrays keys, keysAux, values and valuesAux. |
Definition at line 34 of file Tpetra_Details_radixSort.hpp.
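As an illustration of the double-buffered technique, here is a minimal least-significant-digit radix sort that permutes values along with keys. It is a sketch, not the Tpetra routine: the name radixSortKeysAndValuesSketch, the 4-bit digit width, and the reading of upperBound as the largest key value that can occur (this page does not document that parameter) are all assumptions. Keys are assumed to be nonnegative integers.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Sketch of an LSD radix sort with scratch ("aux") buffers.
// Assumes nonnegative integer keys no larger than upperBound (an assumption;
// the meaning of upperBound is not documented on this page).
template <class KeyType, class ValueType>
void radixSortKeysAndValuesSketch(KeyType* keys, KeyType* keysAux,
                                  ValueType* values, ValueType* valuesAux,
                                  std::size_t n, KeyType upperBound) {
  assert(upperBound >= 0);
  constexpr int bitsPerPass = 4;
  constexpr int mask = (1 << bitsPerPass) - 1;
  constexpr std::size_t numBuckets = std::size_t(1) << bitsPerPass;

  KeyType* kIn = keys;
  KeyType* kOut = keysAux;
  ValueType* vIn = values;
  ValueType* vOut = valuesAux;

  // One stable counting-sort pass per base-16 digit of upperBound.
  int shift = 0;
  for (KeyType rest = upperBound; rest != 0;
       rest >>= bitsPerPass, shift += bitsPerPass) {
    std::size_t start[numBuckets + 1] = {};
    for (std::size_t i = 0; i < n; ++i) {
      ++start[static_cast<std::size_t>((kIn[i] >> shift) & mask) + 1];
    }
    // Prefix sum: start[d] becomes the first output slot for digit d.
    for (std::size_t b = 0; b < numBuckets; ++b) {
      start[b + 1] += start[b];
    }
    for (std::size_t i = 0; i < n; ++i) {
      const std::size_t digit =
          static_cast<std::size_t>((kIn[i] >> shift) & mask);
      const std::size_t pos = start[digit]++;
      kOut[pos] = kIn[i];
      vOut[pos] = vIn[i];
    }
    std::swap(kIn, kOut);
    std::swap(vIn, vOut);
  }

  // After an odd number of passes the result sits in the aux buffers.
  if (kIn != keys) {
    for (std::size_t i = 0; i < n; ++i) {
      keys[i] = kIn[i];
      values[i] = vIn[i];
    }
  }
}
```

Bounding the number of passes by upperBound, rather than by the full bit width of the key type, is what makes a radix sort cheap when keys are small local indices.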
bool Tpetra::Details::reallocDualViewIfNeeded | ( | Kokkos::DualView< ValueType *, DeviceType > & | dv, |
const size_t | newSize, | ||
const char | newLabel[], | ||
const size_t | tooBigFactor = 2 , |
||
const bool | needFenceBeforeRealloc = true |
||
) |
Reallocate the DualView in/out argument, if needed.
dv | [in/out] The DualView to reallocate, if needed. |
newSize | [in] New (requested) size of the DualView. |
newLabel | [in] New label for the DualView; only used if reallocating. |
tooBigFactor | [in] Factor for deciding whether to free and reallocate, or just take a subview, if dv is too big. Taking a subview avoids reallocation, which is expensive for some memory spaces. |
needFenceBeforeRealloc | [in] Whether we need to execute a fence before reallocation (see below). The fence will only happen if this function needs to reallocate. |
If dv is too small, reallocate it to the requested size. If it is too large, and at least tooBigFactor times bigger than it needs to be, free it and reallocate to the size we need, in order to save space. Otherwise, just set it to a subview of itself, so that the size is correct.
The fence, when needed, is DeviceType::execution_space().fence().
Definition at line 51 of file Tpetra_Details_reallocDualViewIfNeeded.hpp.
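The size policy described above can be written down as a small decision function. This sketch only models the decision, using plain sizes instead of a Kokkos::DualView; the names ResizeActionSketch and chooseResizeActionSketch are hypothetical, and the treatment of an exactly matching size as a (no-op) subview is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical outcome of the policy; not a Tpetra type.
enum class ResizeActionSketch { Reallocate, TakeSubview };

// Decide what to do with a buffer of curSize elements when newSize are needed.
inline ResizeActionSketch chooseResizeActionSketch(std::size_t curSize,
                                                   std::size_t newSize,
                                                   std::size_t tooBigFactor = 2) {
  assert(tooBigFactor >= 1);
  if (curSize < newSize) {
    return ResizeActionSketch::Reallocate;  // too small
  }
  if (curSize > newSize && curSize >= tooBigFactor * newSize) {
    return ResizeActionSketch::Reallocate;  // wastefully large: free and shrink
  }
  return ResizeActionSketch::TakeSubview;   // close enough: reuse the storage
}
```

With the default factor of 2, a buffer of 30 elements serving a request for 20 is reused through a subview, while a buffer of 40 or more is freed and reallocated.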
bool Tpetra::Details::reallocDualViewIfNeeded | ( | Kokkos::DualView< ValueType *, DeviceType > & | exports, |
const size_t | newSize, | ||
const std::string & | newLabel, | ||
const size_t | tooBigFactor = 2 , |
||
const bool | needFenceBeforeRealloc = true |
||
) |
Like above, but with std::string label argument.
Definition at line 112 of file Tpetra_Details_reallocDualViewIfNeeded.hpp.
void Tpetra::Details::rightScaleLocalCrsMatrix | ( | const LocalSparseMatrixType & | A_lcl, |
const ScalingFactorsViewType & | scalingFactors, | ||
const bool | assumeSymmetric, | ||
const bool | divide = true |
||
) |
Right-scale a KokkosSparse::CrsMatrix.
LocalSparseMatrixType | KokkosSparse::CrsMatrix specialization. |
ScalingFactorsViewType | Kokkos::View specialization storing scaling factors by which to scale the columns of the local sparse matrix. |
A_lcl | [in/out] The local sparse matrix. |
scalingFactors | [in] Column scaling factors. |
assumeSymmetric | [in] If true, divide matrix entries by square roots of scaling factors; else, divide by the scaling factors themselves. |
divide | [in] If true, divide; else multiply. |
Definition at line 113 of file Tpetra_Details_rightScaleLocalCrsMatrix.hpp.
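To make the roles of assumeSymmetric and divide concrete, here is a minimal sequential sketch of right (column) scaling on a CSR matrix. It uses std::vector rather than KokkosSparse::CrsMatrix and Kokkos::View, and the names CsrMatrixSketch and rightScaleSketch are hypothetical; it illustrates the arithmetic only, not the parallel kernel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CSR container for illustration (not a KokkosSparse type).
struct CsrMatrixSketch {
  std::vector<std::size_t> rowPtr;  // size numRows + 1
  std::vector<int> colInd;          // column index of each stored entry
  std::vector<double> values;       // value of each stored entry
};

// Scale each stored entry A(i,j) by the factor associated with column j.
inline void rightScaleSketch(CsrMatrixSketch& A,
                             const std::vector<double>& scalingFactors,
                             bool assumeSymmetric,
                             bool divide = true) {
  const std::size_t numRows = A.rowPtr.empty() ? 0 : A.rowPtr.size() - 1;
  for (std::size_t row = 0; row < numRows; ++row) {
    for (std::size_t k = A.rowPtr[row]; k < A.rowPtr[row + 1]; ++k) {
      const std::size_t col = static_cast<std::size_t>(A.colInd[k]);
      assert(col < scalingFactors.size());
      // Symmetric scaling uses the square root of the factor.
      const double s = assumeSymmetric ? std::sqrt(scalingFactors[col])
                                       : scalingFactors[col];
      if (divide) {
        A.values[k] /= s;
      } else {
        A.values[k] *= s;
      }
    }
  }
}
```

The factor is looked up by column index, which is what distinguishes right scaling from left scaling, where the factor would be looked up by row.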
KOKKOS_FUNCTION void Tpetra::Details::shortSortKeysAndValues_2 | ( | KeyType | keys[2], |
ValueType | values[2] | ||
) |
Sort keys and values jointly, by keys, for arrays of length 2.
KeyType | Greater-than comparable, copy constructible, assignable |
ValueType | Copy constructible, assignable |
keys | [in/out] Length 2 array of keys. This function sorts this keys array, and applies the same permutation to the values array. |
values | [in/out] Length 2 array of values. |
Definition at line 104 of file Tpetra_Details_shortSort.hpp.
KOKKOS_FUNCTION void Tpetra::Details::shortSortKeys_2 | ( | KeyType | keys[2] | ) |
Sort length-2 array of keys.
KeyType | Greater-than comparable, copy constructible, assignable |
keys | [in/out] Length-2 array of keys to sort. |
Definition at line 120 of file Tpetra_Details_shortSort.hpp.
KOKKOS_FUNCTION void Tpetra::Details::shortSortKeysAndValues_3 | ( | KeyType | keys[3], |
ValueType | values[3] | ||
) |
Sort keys and values jointly, by keys, for arrays of length 3.
KeyType | Greater-than comparable, copy constructible, assignable |
ValueType | Copy constructible, assignable |
keys | [in/out] Length 3 array of keys. This function sorts this keys array, and applies the same permutation to the values array. |
values | [in/out] Length 3 array of values. |
Definition at line 140 of file Tpetra_Details_shortSort.hpp.
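A fixed-length sort like this is typically written as a short sequence of compare-and-swap steps that needs only the greater-than comparison, copy construction, and assignment listed in the type requirements. The sketch below shows one valid sequence for length 3; the helper names are hypothetical, and the particular sequence is an assumption that may differ from the one Tpetra uses.

```cpp
#include <cassert>

// Swap entries i and j of both arrays if keys[i] > keys[j].
template <class KeyType, class ValueType>
void compareAndSwapSketch(KeyType keys[], ValueType values[], int i, int j) {
  assert(i < j);
  if (keys[i] > keys[j]) {
    const KeyType tmpKey = keys[i];
    keys[i] = keys[j];
    keys[j] = tmpKey;
    const ValueType tmpVal = values[i];
    values[i] = values[j];
    values[j] = tmpVal;
  }
}

// Sort a length-3 keys array and apply the same permutation to values.
template <class KeyType, class ValueType>
void sortKeysAndValues3Sketch(KeyType keys[3], ValueType values[3]) {
  compareAndSwapSketch(keys, values, 0, 1);
  compareAndSwapSketch(keys, values, 1, 2);  // largest key is now last
  compareAndSwapSketch(keys, values, 0, 1);  // order the remaining two
}
```

Because the sequence of comparisons is fixed and contains no loops, this style of sort suits very short rows inside device kernels.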
KOKKOS_FUNCTION void Tpetra::Details::shortSortKeys_3 | ( | KeyType | keys[3] | ) |
Sort length-3 array of keys.
KeyType | Greater-than comparable, copy constructible, assignable |
keys | [in/out] Length-3 array of keys to sort. |
Definition at line 162 of file Tpetra_Details_shortSort.hpp.
KOKKOS_FUNCTION void Tpetra::Details::shortSortKeysAndValues_4 | ( | KeyType | keys[4], |
ValueType | values[4] | ||
) |
Sort keys and values jointly, by keys, for arrays of length 4.
KeyType | Greater-than comparable, copy constructible, assignable |
ValueType | Copy constructible, assignable |
keys | [in/out] Length 4 array of keys. This function sorts this keys array, and applies the same permutation to the values array. |
values | [in/out] Length 4 array of values. |
Definition at line 188 of file Tpetra_Details_shortSort.hpp.
KOKKOS_FUNCTION void Tpetra::Details::shortSortKeys_4 | ( | KeyType | keys[4] | ) |
Sort length-4 array of keys.
KeyType | Greater-than comparable, copy constructible, assignable |
keys | [in/out] Length-4 array of keys to sort. |
Definition at line 212 of file Tpetra_Details_shortSort.hpp.
KOKKOS_FUNCTION void Tpetra::Details::shortSortKeysAndValues_8 | ( | KeyType | keys[8], |
ValueType | values[8] | ||
) |
Sort keys and values jointly, by keys, for arrays of length 8.
KeyType | Greater-than comparable, copy constructible, assignable |
ValueType | Copy constructible, assignable |
keys | [in/out] Length 8 array of keys. This function sorts this keys array, and applies the same permutation to the values array. |
values | [in/out] Length 8 array of values. |
Definition at line 240 of file Tpetra_Details_shortSort.hpp.
KOKKOS_FUNCTION void Tpetra::Details::shortSortKeys_8 | ( | KeyType | keys[8] | ) |
Sort length-8 array of keys.
KeyType | Greater-than comparable, copy constructible, assignable |
keys | [in/out] Length-8 array of keys to sort. |
Definition at line 278 of file Tpetra_Details_shortSort.hpp.
KOKKOS_FUNCTION void Tpetra::Details::shellSortKeysAndValues | ( | KeyType | keys[], |
ValueType | values[], | ||
const IndexType | n | ||
) |
Shellsort (yes, it's one word) the input array keys, and apply the resulting permutation to the input array values.
mfh 28 Nov 2016, 17 Dec 2016: I adapted this function from sh_sort2 in Tpetra_Util.hpp (in this directory).
Definition at line 315 of file Tpetra_Details_shortSort.hpp.
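For reference, a minimal Shellsort that carries values along with keys looks like the following. This is a generic textbook sketch, not the adapted sh_sort2 code: the function name is hypothetical and the halving gap sequence is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Shellsort keys in ascending order and apply the same permutation to values.
template <class KeyType, class ValueType>
void shellSortKeysAndValuesSketch(KeyType keys[], ValueType values[],
                                  std::size_t n) {
  assert(n == 0 || (keys != nullptr && values != nullptr));
  // Gapped insertion sort with gaps n/2, n/4, ..., 1.
  for (std::size_t gap = n / 2; gap > 0; gap /= 2) {
    for (std::size_t i = gap; i < n; ++i) {
      const KeyType key = keys[i];
      const ValueType val = values[i];
      std::size_t j = i;
      // Shift larger gap-separated entries up to make room for key.
      while (j >= gap && keys[j - gap] > key) {
        keys[j] = keys[j - gap];
        values[j] = values[j - gap];
        j -= gap;
      }
      keys[j] = key;
      values[j] = val;
    }
  }
}
```

Shellsort is not stable, so entries with equal keys may end up in a different relative order than they started in.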
KOKKOS_FUNCTION void Tpetra::Details::shellSortKeys | ( | KeyType | keys[], |
const IndexType | n | ||
) |
Shellsort (yes, it's one word) the input array keys.
keys | [in/out] Input array of keys to sort. |
n | [in] Length of the input array keys . |
Definition at line 356 of file Tpetra_Details_shortSort.hpp.
size_t Tpetra::Details::unpackAndCombineWithOwningPIDsCount | ( | const CrsGraph< LO, GO, NT > & | sourceGraph, |
const Teuchos::ArrayView< const LO > & | importLIDs, | ||
const Teuchos::ArrayView< const typename CrsGraph< LO, GO, NT >::packet_type > & | imports, | ||
const Teuchos::ArrayView< const size_t > & | numPacketsPerLID, | ||
size_t | constantNumPackets, | ||
CombineMode | combineMode, | ||
size_t | numSameIDs, | ||
const Teuchos::ArrayView< const LO > & | permuteToLIDs, | ||
const Teuchos::ArrayView< const LO > & | permuteFromLIDs | ||
) |
Special version of Tpetra::Details::unpackCrsGraphAndCombine that also unpacks owning process ranks.
Perform the count for unpacking the imported column indices and pids, and combining them into graph. Return (a ceiling on) the number of local stored entries ("nonzeros") in the graph. If there are no shared rows in the sourceGraph this count is exact.
Note: This routine also counts the copyAndPermute nonzeros in addition to those that come in via import.
LO | The type of local indices. See the documentation of Map for requirements. |
GO | The type of global indices. See the documentation of Map for requirements. |
NT | The Kokkos Node type. See the documentation of Map for requirements. |
sourceGraph | [in] the CrsGraph source |
imports | [in] Input pack buffer |
numPacketsPerLID | [in] Entry k gives the number of bytes packed for row importLIDs[k] of the local graph. |
importLIDs | [in] Local indices of the rows to unpack. |
constantNumPackets | [in] Zero tells the callee to expect a possibly different ("nonconstant") number of packets per local index (i.e., a possibly different number of entries per row). |
combineMode | [in] the mode to use for combining |
numSameIds | [in] |
permuteToLIDs | [in] |
permuteFromLIDs | [in] |
The allowed values of combineMode are: ADD, REPLACE, and ABSMAX. INSERT is not allowed.
Note: This is the public interface to the unpack and combine machinery. It converts the passed Teuchos::ArrayView objects to Kokkos::View objects (and copies back into the Teuchos::ArrayView objects, if needed). When CrsGraph migrates fully to adopting Kokkos::DualView objects for its storage of data, this procedure could be bypassed.
void Tpetra::Details::unpackAndCombineIntoCrsArrays | ( | const CrsGraph< LO, GO, NT > & | sourceGraph, |
const Teuchos::ArrayView< const LO > & | importLIDs, | ||
const Teuchos::ArrayView< const typename CrsGraph< LO, GO, NT >::packet_type > & | imports, | ||
const Teuchos::ArrayView< const size_t > & | numPacketsPerLID, | ||
const size_t | constantNumPackets, | ||
const CombineMode | combineMode, | ||
const size_t | numSameIDs, | ||
const Teuchos::ArrayView< const LO > & | permuteToLIDs, | ||
const Teuchos::ArrayView< const LO > & | permuteFromLIDs, | ||
size_t | TargetNumRows, | ||
size_t | TargetNumNonzeros, | ||
const int | MyTargetPID, | ||
const Teuchos::ArrayView< size_t > & | CRS_rowptr, | ||
const Teuchos::ArrayView< GO > & | CRS_colind, | ||
const Teuchos::ArrayView< const int > & | SourcePids, | ||
Teuchos::Array< int > & | TargetPids | ||
) |
unpackAndCombineIntoCrsArrays
Note: The SourcePids vector (on input) should contain owning PIDs for each column in the (source) ColMap, as from Tpetra::Import_Util::getPids, with the "-1 for local" option being used.
Note: The TargetPids vector (on output) will contain owning PIDs for each entry in the graph, with the "-1 for local" for locally owned entries.
size_t Tpetra::Details::unpackAndCombineWithOwningPIDsCount | ( | const CrsGraph< LocalOrdinal, GlobalOrdinal, Node > & | sourceGraph, |
const Teuchos::ArrayView< const LocalOrdinal > & | importLIDs, | ||
const Teuchos::ArrayView< const typename CrsGraph< LocalOrdinal, GlobalOrdinal, Node >::packet_type > & | imports, | ||
const Teuchos::ArrayView< const size_t > & | numPacketsPerLID, | ||
size_t | , | ||
CombineMode | , | ||
size_t | numSameIDs, | ||
const Teuchos::ArrayView< const LocalOrdinal > & | permuteToLIDs, | ||
const Teuchos::ArrayView< const LocalOrdinal > & | permuteFromLIDs | ||
) |
Special version of Tpetra::Details::unpackCrsGraphAndCombine that also unpacks owning process ranks.
Perform the count for unpacking the imported column indices and pids, and combining them into graph. Return (a ceiling on) the number of local stored entries ("nonzeros") in the graph. If there are no shared rows in the sourceGraph this count is exact.
Note: This routine also counts the copyAndPermute nonzeros in addition to those that come in via import.
LO | The type of local indices. See the documentation of Map for requirements. |
GO | The type of global indices. See the documentation of Map for requirements. |
Node | The Kokkos Node type. See the documentation of Map for requirements. |
sourceGraph | [in] the CrsGraph source |
imports | [in] Input pack buffer |
numPacketsPerLID | [in] Entry k gives the number of bytes packed for row importLIDs[k] of the local graph. |
importLIDs | [in] Local indices of the rows to unpack. |
constantNumPackets | [in] Zero tells the callee to expect a possibly different ("nonconstant") number of packets per local index (i.e., a possibly different number of entries per row). |
combineMode | [in] the mode to use for combining |
numSameIds | [in] |
permuteToLIDs | [in] |
permuteFromLIDs | [in] |
Note: This is the public interface to the unpack and combine machinery and converts passed Teuchos::ArrayView objects to Kokkos::View objects (and copies back in to the Teuchos::ArrayView objects, if needed). When CrsGraph migrates fully to adopting Kokkos::DualView objects for its storage of data, this procedure could be bypassed.
Definition at line 824 of file Tpetra_Details_unpackCrsGraphAndCombine_def.hpp.
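The "ceiling" in the description can be understood as a simple sum: entries from rows copied in place, entries from permuted rows, and entries arriving in the import buffer. The sketch below illustrates that arithmetic only. The function name and the input numImportedEntriesPerRow (assumed to be already decoded from the import buffer) are hypothetical, and the real routine reads these counts from packed data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Upper bound on the number of stored entries after unpack-and-combine.
// The bound is exact when no imported row overlaps a row that is also
// copied or permuted locally.
inline std::size_t countEntriesCeilingSketch(
    const std::vector<std::size_t>& sourceRowPtr,
    std::size_t numSameIDs,
    const std::vector<int>& permuteFromLIDs,
    const std::vector<std::size_t>& numImportedEntriesPerRow) {
  std::size_t count = 0;
  // Rows that are identical in source and target ("same IDs").
  for (std::size_t row = 0; row < numSameIDs; ++row) {
    assert(row + 1 < sourceRowPtr.size());
    count += sourceRowPtr[row + 1] - sourceRowPtr[row];
  }
  // Rows that are copied to a different local index (copyAndPermute).
  for (const int lid : permuteFromLIDs) {
    const std::size_t row = static_cast<std::size_t>(lid);
    assert(row + 1 < sourceRowPtr.size());
    count += sourceRowPtr[row + 1] - sourceRowPtr[row];
  }
  // Entries received from other processes.
  for (const std::size_t numEnt : numImportedEntriesPerRow) {
    count += numEnt;
  }
  return count;
}
```

When two contributions to the same target row share a column index, combining them stores a single entry, which is why the sum can overestimate the final count.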
void Tpetra::Details::unpackAndCombineIntoCrsArrays | ( | const CrsGraph< LocalOrdinal, GlobalOrdinal, Node > & | sourceGraph, |
const Teuchos::ArrayView< const LocalOrdinal > & | importLIDs, | ||
const Teuchos::ArrayView< const typename CrsGraph< LocalOrdinal, GlobalOrdinal, Node >::packet_type > & | imports, | ||
const Teuchos::ArrayView< const size_t > & | numPacketsPerLID, | ||
const size_t | , | ||
const CombineMode | , | ||
const size_t | numSameIDs, | ||
const Teuchos::ArrayView< const LocalOrdinal > & | permuteToLIDs, | ||
const Teuchos::ArrayView< const LocalOrdinal > & | permuteFromLIDs, | ||
size_t | TargetNumRows, | ||
size_t | TargetNumNonzeros, | ||
const int | MyTargetPID, | ||
const Teuchos::ArrayView< size_t > & | CRS_rowptr, | ||
const Teuchos::ArrayView< GlobalOrdinal > & | CRS_colind, | ||
const Teuchos::ArrayView< const int > & | SourcePids, | ||
Teuchos::Array< int > & | TargetPids | ||
) |
unpackAndCombineIntoCrsArrays
Note: The SourcePids vector (on input) should contain owning PIDs for each column in the (source) ColMap, as from Tpetra::Import_Util::getPids, with the "-1 for local" option being used.
Note: The TargetPids vector (on output) will contain owning PIDs for each entry in the graph, with the "-1 for local" for locally owned entries.
Definition at line 895 of file Tpetra_Details_unpackCrsGraphAndCombine_def.hpp.
void Tpetra::Details::unpackCrsMatrixAndCombine | ( | const CrsMatrix< ST, LO, GO, NT > & | sourceMatrix, |
const Teuchos::ArrayView< const char > & | imports, | ||
const Teuchos::ArrayView< const size_t > & | numPacketsPerLID, | ||
const Teuchos::ArrayView< const LO > & | importLIDs, | ||
size_t | constantNumPackets, | ||
CombineMode | combineMode | ||
) |
Unpack the imported column indices and values, and combine into matrix.
ST | The type of the numerical entries of the matrix. (You can use real-valued or complex-valued types here, unlike in Epetra, where the scalar type is always double .) |
LO | The type of local indices. See the documentation of Map for requirements. |
GO | The type of global indices. See the documentation of Map for requirements. |
NT | The Node type. See the documentation of Map for requirements. |
sourceMatrix | [in] the CrsMatrix source |
imports | [in] Input pack buffer |
numPacketsPerLID | [in] Entry k gives the number of bytes packed for row importLIDs[k] of the local matrix. |
importLIDs | [in] Local indices of the rows to unpack. |
constantNumPackets | [in] Zero tells the callee to expect a possibly different ("nonconstant") number of packets per local index (i.e., a possibly different number of entries per row). |
distor | [in] The distributor (not used) |
combineMode | [in] the mode to use for combining values |
The allowed values of combineMode are: ADD, REPLACE, and ABSMAX. INSERT is not allowed.
This is the public interface to the unpack and combine machinery. It converts the passed Teuchos::ArrayView objects to Kokkos::View objects (and copies back into the Teuchos::ArrayView objects, if needed). When CrsMatrix migrates fully to adopting Kokkos::DualView objects for its storage of data, this procedure could be bypassed.
size_t Tpetra::Details::unpackAndCombineWithOwningPIDsCount | ( | const CrsMatrix< Scalar, LocalOrdinal, GlobalOrdinal, Node > & | sourceMatrix, |
const Teuchos::ArrayView< const LocalOrdinal > & | importLIDs, | ||
const Teuchos::ArrayView< const char > & | imports, | ||
const Teuchos::ArrayView< const size_t > & | numPacketsPerLID, | ||
size_t | constantNumPackets, | ||
CombineMode | combineMode, | ||
size_t | numSameIDs, | ||
const Teuchos::ArrayView< const LocalOrdinal > & | permuteToLIDs, | ||
const Teuchos::ArrayView< const LocalOrdinal > & | permuteFromLIDs | ||
) |
Special version of Tpetra::Details::unpackCrsMatrixAndCombine that also unpacks owning process ranks.
Perform the count for unpacking the imported column indices, pids, and values, and combining them into the matrix. Return (a ceiling on) the number of local stored entries ("nonzeros") in the matrix. If there are no shared rows in the sourceMatrix, this count is exact.
Note: This routine also counts the copyAndPermute nonzeros in addition to those that come in via import.
ST | The type of the numerical entries of the matrix. (You can use real-valued or complex-valued types here, unlike in Epetra, where the scalar type is always double .) |
LO | The type of local indices. See the documentation of Map for requirements. |
GO | The type of global indices. See the documentation of Map for requirements. |
Node | The Kokkos Node type. See the documentation of Map for requirements. |
sourceMatrix | [in] the CrsMatrix source |
imports | [in] Input pack buffer |
numPacketsPerLID | [in] Entry k gives the number of bytes packed for row importLIDs[k] of the local matrix. |
importLIDs | [in] Local indices of the rows to unpack. |
constantNumPackets | [in] Zero tells the callee to expect a possibly different ("nonconstant") number of packets per local index (i.e., a possibly different number of entries per row). |
distor | [in] The distributor (not used) |
combineMode | [in] the mode to use for combining values |
numSameIds | [in] |
permuteToLIDs | [in] |
permuteFromLIDs | [in] |
The allowed values of combineMode are: ADD, REPLACE, and ABSMAX. INSERT is not allowed.
Note: This is the public interface to the unpack and combine machinery. It converts the passed Teuchos::ArrayView objects to Kokkos::View objects (and copies back into the Teuchos::ArrayView objects, if needed). When CrsMatrix migrates fully to adopting Kokkos::DualView objects for its storage of data, this procedure could be bypassed.
Definition at line 1324 of file Tpetra_Details_unpackCrsMatrixAndCombine_def.hpp.
void Tpetra::Details::unpackAndCombineIntoCrsArrays | ( | const CrsMatrix< Scalar, LocalOrdinal, GlobalOrdinal, Node > & | sourceMatrix, |
const Kokkos::View< LocalOrdinal const *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void > | import_lids_d, | ||
const Kokkos::View< const char *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void > | imports_d, | ||
const Kokkos::View< const size_t *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void > | num_packets_per_lid_d, | ||
const size_t | numSameIDs, | ||
const Kokkos::View< LocalOrdinal const *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void > | permute_to_lids_d, | ||
const Kokkos::View< LocalOrdinal const *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void > | permute_from_lids_d, | ||
size_t | TargetNumRows, | ||
const int | MyTargetPID, | ||
Teuchos::ArrayRCP< size_t > & | CRS_rowptr, | ||
Teuchos::ArrayRCP< GlobalOrdinal > & | CRS_colind, | ||
Teuchos::ArrayRCP< Scalar > & | CRS_vals, | ||
const Teuchos::ArrayView< const int > & | SourcePids, | ||
Teuchos::Array< int > & | TargetPids | ||
) |
unpackAndCombineIntoCrsArrays
Note: The SourcePids vector (on input) should contain owning PIDs for each column in the (source) ColMap, as from Tpetra::Import_Util::getPids, with the "-1 for local" option being used.
Note: The TargetPids vector (on output) will contain owning PIDs for each entry in the matrix, with the "-1 for local" for locally owned entries.
Note: This method does the work previously done in unpackAndCombineWithOwningPIDsCount, namely, calculating the local number of nonzeros, and allocates CRS arrays of the correct sizes.
Definition at line 1575 of file Tpetra_Details_unpackCrsMatrixAndCombine_def.hpp.
void Tpetra::Details::unpackAndCombineIntoCrsArrays | ( | const CrsMatrix< Scalar, LocalOrdinal, GlobalOrdinal, Node > & | sourceMatrix, |
const Kokkos::View< LocalOrdinal const *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void > | import_lids_d, | ||
const Kokkos::View< const char *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void > | imports_d, | ||
const Kokkos::View< const size_t *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void > | num_packets_per_lid_d, | ||
const size_t | numSameIDs, | ||
const Kokkos::View< LocalOrdinal const *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void > | permute_to_lids_d, | ||
const Kokkos::View< LocalOrdinal const *, Kokkos::Device< typename Node::device_type::execution_space, Tpetra::Details::DefaultTypes::comm_buffer_memory_space< typename Node::device_type >>, void, void > | permute_from_lids_d, | ||
size_t | TargetNumRows, | ||
const int | MyTargetPID, | ||
Kokkos::View< size_t *, typename Node::device_type > & | crs_rowptr_d, | ||
Kokkos::View< GlobalOrdinal *, typename Node::device_type > & | crs_colind_d, | ||
Kokkos::View< typename CrsMatrix< Scalar, LocalOrdinal, GlobalOrdinal, Node >::impl_scalar_type *, typename Node::device_type > & | crs_vals_d, | ||
const Teuchos::ArrayView< const int > & | SourcePids, | ||
Kokkos::View< int *, typename Node::device_type > & | TargetPids | ||
) |
unpackAndCombineIntoCrsArrays
Note: The SourcePids vector (on input) should contain owning PIDs for each column in the (source) ColMap, as from Tpetra::Import_Util::getPids, with the "-1 for local" option being used.
Note: The TargetPids vector (on output) will contain owning PIDs for each entry in the matrix, with the "-1 for local" for locally owned entries.
Definition at line 1400 of file Tpetra_Details_unpackCrsMatrixAndCombine_def.hpp.
void Tpetra::Details::unpackCrsMatrixAndCombine | ( | const CrsMatrix< ST, LO, GO, Node > & | sourceMatrix, |
const Teuchos::ArrayView< const char > & | imports, | ||
const Teuchos::ArrayView< const size_t > & | numPacketsPerLID, | ||
const Teuchos::ArrayView< const LO > & | importLIDs, | ||
size_t | , | ||
CombineMode | combineMode | ||
) |
Unpack the imported column indices and values, and combine into matrix.
ST | The type of the numerical entries of the matrix. (You can use real-valued or complex-valued types here, unlike in Epetra, where the scalar type is always double.) |
LO | The type of local indices. See the documentation of Map for requirements. |
GO | The type of global indices. See the documentation of Map for requirements. |
Node | The Kokkos Node type. See the documentation of Map for requirements. |
sourceMatrix | [in] the CrsMatrix source |
imports | [in] Input pack buffer |
numPacketsPerLID | [in] Entry k gives the number of bytes packed for row importLIDs[k] of the local matrix. |
importLIDs | [in] Local indices of the rows to unpack. |
constantNumPackets | [out] Setting this to zero tells the caller to expect a possibly different ("nonconstant") number of packets per local index (i.e., a possibly different number of entries per row). |
distor | [in] The distributor (not used) |
combineMode | [in] the mode to use for combining values |
atomic | [in] whether to do atomic adds/replaces into the matrix |
The allowed values of combineMode are: ADD, REPLACE, and ABSMAX. INSERT is not allowed.
This is the public interface to the unpack-and-combine machinery. It converts the passed Teuchos::ArrayView objects to Kokkos::View objects (and copies back into the Teuchos::ArrayView objects, if needed). When CrsMatrix migrates fully to Kokkos::DualView objects for its data storage, this procedure could be bypassed.
Definition at line 1167 of file Tpetra_Details_unpackCrsMatrixAndCombine_def.hpp.
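The effect of the three allowed combine modes on a single matrix entry can be sketched without Tpetra. In the sketch below, `Mode` and `combineValue` are hypothetical names, not part of Tpetra, and the ABSMAX rule (keep the larger of the two magnitudes) is stated as an assumption about that mode's semantics.

```cpp
#include <algorithm>
#include <cmath>
#include <stdexcept>

// Hypothetical stand-in for Tpetra::CombineMode. Only the modes that
// unpackCrsMatrixAndCombine accepts are listed; INSERT is deliberately absent.
enum class Mode { Add, Replace, AbsMax };

// Combine an incoming (unpacked) value into an existing matrix entry.
double combineValue(const double current, const double incoming, const Mode mode) {
  switch (mode) {
    case Mode::Add:
      return current + incoming;
    case Mode::Replace:
      return incoming;
    case Mode::AbsMax:
      // Assumed semantics: keep the maximum of the two magnitudes.
      return std::max(std::abs(current), std::abs(incoming));
  }
  throw std::invalid_argument("unsupported combine mode");
}
```

For example, combining an existing entry 1.0 with an incoming -2.0 gives -1.0 under Add, -2.0 under Replace, and 2.0 under AbsMax.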
void Tpetra::Details::enableWDVTracking | ( | ) |
Enable WrappedDualView reference-count tracking and syncing. Call this after exiting a host-parallel region that uses WrappedDualView.
Definition at line 17 of file Tpetra_Details_WrappedDualView.cpp.
void Tpetra::Details::disableWDVTracking | ( | ) |
Disable WrappedDualView reference-count tracking and syncing. Call this before entering a host-parallel region that uses WrappedDualView. For each WrappedDualView used in the parallel region, its view must be accessed (e.g. getHostView...) before disabling the tracking, so that it may be synced and marked modified correctly.
Definition at line 24 of file Tpetra_Details_WrappedDualView.cpp.
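One way to make sure tracking is always re-enabled after a host-parallel region is a scope guard. The sketch below is only an illustration: the `sketch` namespace holds stand-ins for `wdvTrackingEnabled`, `enableWDVTracking`, and `disableWDVTracking` so that it compiles without Tpetra, and `ScopedWDVTrackingDisabler` and `runWithTrackingDisabled` are hypothetical names. Any WrappedDualView used inside the region must still have its view accessed before the guard is constructed.

```cpp
namespace sketch {

// Stand-ins for the Tpetra::Details flag and functions documented here.
// Their bodies are assumptions made so the example is self-contained.
bool wdvTrackingEnabled = true;
inline void enableWDVTracking() { wdvTrackingEnabled = true; }
inline void disableWDVTracking() { wdvTrackingEnabled = false; }

// Hypothetical RAII guard: disables tracking on construction and
// re-enables it on destruction, even if the region exits by exception.
class ScopedWDVTrackingDisabler {
public:
  ScopedWDVTrackingDisabler() { disableWDVTracking(); }
  ~ScopedWDVTrackingDisabler() { enableWDVTracking(); }
  ScopedWDVTrackingDisabler(const ScopedWDVTrackingDisabler&) = delete;
  ScopedWDVTrackingDisabler& operator=(const ScopedWDVTrackingDisabler&) = delete;
};

// Runs `region` with tracking disabled and returns the flag's value
// as observed while the guard was still alive.
template <class Region>
bool runWithTrackingDisabled(Region region) {
  ScopedWDVTrackingDisabler guard;
  region();
  return wdvTrackingEnabled;
}

}  // namespace sketch
```

With real Tpetra code, the guard's constructor and destructor would call the Tpetra::Details functions directly instead of the stand-ins.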
void Tpetra::Details::idotLocal | ( | const ResultView & | localResult, |
const MV & | X, | ||
const MV & | Y | ||
) |
Compute dot product locally. Where the kernel runs is controlled by runOnDevice.
Definition at line 65 of file Tpetra_idot.hpp.
std::shared_ptr< ::Tpetra::Details::CommRequest> Tpetra::Details::idotImpl | ( | const ResultView & | globalResult, |
const MV & | X, | ||
const MV & | Y | ||
) |
Internal (common) version of idot, a global dot product that uses a non-blocking MPI reduction.
Definition at line 178 of file Tpetra_idot.hpp.
bool Tpetra::Details::congruent | ( | const Teuchos::Comm< int > & | comm1, |
const Teuchos::Comm< int > & | comm2 | ||
) |
Whether the two communicators are congruent.
Two communicators are congruent when they have the same number of processes, and those processes occur in the same rank order.
If both communicators are Teuchos::MpiComm instances, this function returns true exactly when MPI_Comm_compare returns MPI_IDENT (the communicators are handles for the same object) or MPI_CONGRUENT on their MPI_Comm handles. Any two Teuchos::SerialComm instances are always congruent. An MpiComm instance is congruent to a SerialComm instance if and only if the MpiComm has one process. This function is symmetric in its arguments.
If either Teuchos::Comm instance is neither an MpiComm nor a SerialComm, this function cannot do any better than to compare their process counts.
Definition at line 34 of file Tpetra_Util.cpp.
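The case analysis described for congruent can be written as a small decision function. This is a sketch of the documented rules only, not the Tpetra implementation: `CommKind`, `MpiCompare`, `CommInfo`, and `congruentSketch` are hypothetical names, and the communicators are reduced to plain values so that the example needs neither Teuchos nor MPI.

```cpp
// Hypothetical stand-ins. The real function inspects Teuchos::Comm objects
// and calls MPI_Comm_compare; here those inputs are plain values.
enum class CommKind { Mpi, Serial, Other };
enum class MpiCompare { Ident, Congruent, Similar, Unequal };

struct CommInfo {
  CommKind kind;
  int numProcs;
};

// `cmp` is only consulted when both communicators are of kind Mpi.
bool congruentSketch(const CommInfo& a, const CommInfo& b, const MpiCompare cmp) {
  if (a.kind == CommKind::Mpi && b.kind == CommKind::Mpi) {
    return cmp == MpiCompare::Ident || cmp == MpiCompare::Congruent;
  }
  if (a.kind == CommKind::Serial && b.kind == CommKind::Serial) {
    return true;  // any two SerialComm instances are congruent
  }
  if (a.kind != CommKind::Other && b.kind != CommKind::Other) {
    // One MpiComm and one SerialComm: congruent iff the MpiComm has one process.
    const CommInfo& mpi = (a.kind == CommKind::Mpi) ? a : b;
    return mpi.numProcs == 1;
  }
  // Unknown Comm subclass: fall back to comparing process counts.
  return a.numProcs == b.numProcs;
}
```

The function treats its two communicator arguments symmetrically, matching the documented behavior.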
std::unique_ptr< std::string > Tpetra::Details::createPrefix | ( | const int | myRank, |
const char | prefix[] | ||
) |
Create string prefix for each line of verbose output.
Definition at line 71 of file Tpetra_Util.cpp.
std::unique_ptr< std::string > Tpetra::Details::createPrefix | ( | const Teuchos::Comm< int > * | comm, |
const char | functionName[] | ||
) |
Create string prefix for each line of verbose output, for a Tpetra function (not a class or instance method).
comm | [in] May be null; if not, the communicator from which to draw the (MPI) process rank. |
functionName | [in] Name of the function. |
Definition at line 80 of file Tpetra_Util.cpp.
std::unique_ptr< std::string > Tpetra::Details::createPrefix | ( | const Teuchos::Comm< int > * | , |
const char | className[], | ||
const char | methodName[] | ||
) |
Create string prefix for each line of verbose output, for a method of a Tpetra class.
className | [in] Name of the class. |
methodName | [in] Name of the (class or instance) method. |
Definition at line 89 of file Tpetra_Util.cpp.
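A typical use of such a prefix is to build each line of verbose output completely before writing it, so that output from different processes is less likely to interleave mid-line. The sketch below does not call Tpetra: `createPrefixSketch` and `verboseLine` are hypothetical names, and the "Proc <rank>: <name>: " layout is an assumption about the prefix format.

```cpp
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stand-in for the (myRank, prefix) overload.
// The exact prefix layout is assumed, not taken from Tpetra.
std::unique_ptr<std::string> createPrefixSketch(const int myRank, const char prefix[]) {
  std::ostringstream os;
  os << "Proc " << myRank << ": " << prefix << ": ";
  return std::unique_ptr<std::string>(new std::string(os.str()));
}

// Assemble one complete line of verbose output, ready to be written
// to the output stream with a single insertion.
std::string verboseLine(const int myRank, const char functionName[], const std::string& message) {
  const std::unique_ptr<std::string> prefix = createPrefixSketch(myRank, functionName);
  std::ostringstream os;
  os << *prefix << message << '\n';
  return os.str();
}
```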
Teuchos::ArrayView<typename DualViewType::t_dev::value_type> Tpetra::Details::getArrayViewFromDualView | ( | const DualViewType & | x | ) |
Get a Teuchos::ArrayView which views the host Kokkos::View of the input 1-D Kokkos::DualView.
x | [in] A specialization of Kokkos::DualView. |
Definition at line 974 of file Tpetra_Util.hpp.
Kokkos::DualView<T*, DT> Tpetra::Details::getDualViewCopyFromArrayView | ( | const Teuchos::ArrayView< const T > & | x_av, |
const char | label[], | ||
const bool | leaveOnHost | ||
) |
Get a 1-D Kokkos::DualView which is a deep copy of the input Teuchos::ArrayView (which views host memory).
T | The type of the entries of the input Teuchos::ArrayView. |
DT | The Kokkos Device type. |
x_av | [in] The Teuchos::ArrayView to copy. |
label | [in] String label for the Kokkos::DualView. |
leaveOnHost | [in] If true, the host version of the returned Kokkos::DualView is most recently updated (and the DualView may need a sync to device). If false, the device version is most recently updated (and the DualView may need a sync to host). |
Definition at line 1012 of file Tpetra_Util.hpp.
std::string Tpetra::Details::dualViewStatusToString | ( | const DualViewType & | dv, |
const char | name[] | ||
) |
Return the status of the given Kokkos::DualView, as a human-readable string.
This is meant for Tpetra developers as a debugging aid.
dv | [in] Kokkos::DualView |
name | [in] Human-readable name of the Kokkos::DualView |
Definition at line 1045 of file Tpetra_Util.hpp.
void Tpetra::Details::verbosePrintArray | ( | std::ostream & | out, |
const ArrayType & | x, | ||
const char | name[], | ||
const size_t | maxNumToPrint | ||
) |
Print min(x.size(), maxNumToPrint) entries of x.
out is an std::ostringstream.
Definition at line 1062 of file Tpetra_Util.hpp.
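The truncation rule, print min(x.size(), maxNumToPrint) entries, can be sketched as follows. `printArraySketch` and `demoPrint` are hypothetical names, and the output layout ("name: [a, b, ...]") is an assumption for illustration rather than Tpetra's actual format.

```cpp
#include <algorithm>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Print at most maxNumToPrint entries of x, marking any omitted tail with "...".
template <class ArrayType>
void printArraySketch(std::ostream& out, const ArrayType& x,
                      const char name[], const std::size_t maxNumToPrint) {
  const std::size_t numEnt = static_cast<std::size_t>(x.size());
  const std::size_t numToPrint = std::min(numEnt, maxNumToPrint);
  out << name << ": [";
  for (std::size_t k = 0; k < numToPrint; ++k) {
    if (k != 0) out << ", ";
    out << x[k];
  }
  if (numToPrint < numEnt) {
    out << (numToPrint != 0 ? ", ..." : "...");
  }
  out << "]";
}

// Usage: write into an std::ostringstream and return the resulting text.
std::string demoPrint() {
  std::ostringstream os;
  const std::vector<int> lids{3, 1, 4, 1, 5};
  printArraySketch(os, lids, "lids", 3);
  return os.str();
}
```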
int Tpetra::Details::countPackTriplesCount | ( | const ::Teuchos::Comm< int > & | comm, |
int & | size, | ||
std::ostream * | errStrm = NULL |
||
) |
Compute the buffer size required by packTriples for packing the number of matrix entries ("triples").
countPackTriples tells me an upper bound on how much buffer space I need to hold numEnt triples. packTriplesCount actually packs numEnt, the number of triples. countPackTriplesCount tells me an upper bound on how much buffer space I need to hold the number of triples, not the triples themselves.
comm | [in] Communicator used in sending and receiving the packed entries. (MPI wants this, so we have to include it.). |
size | [out] Pack buffer size in bytes (sizeof(char)). |
errStrm | [out] If nonnull, print any error messages to this stream, else don't print error messages. |
Definition at line 141 of file Tpetra_Details_PackTriples.cpp.
int Tpetra::Details::packTriplesCount | ( | const int | numEnt, |
char | outBuf[], | ||
const int | outBufSize, | ||
int & | outBufCurPos, | ||
const ::Teuchos::Comm< int > & | comm, | ||
std::ostream * | errStrm = NULL |
||
) |
Pack the count (number) of matrix triples.
This function is NOT the same thing as countPackTriples. countPackTriples tells me an upper bound on how much buffer space I need to hold numEnt triples. packTriplesCount actually packs numEnt, the number of triples. countPackTriplesCount tells me an upper bound on how much buffer space I need to hold the number of triples, not the triples themselves.
numEnt | [in] Number of matrix entries ("triples") to pack. |
outBuf | [out] Output buffer. |
outBufSize | [in] Total output buffer size in bytes. |
outBufCurPos | [in/out] Current position from which to start writing to the output buffer. This corresponds to the 'position' in/out argument of MPI_Pack. |
comm | [in] Communicator used in sending and receiving the packed entries. (MPI wants this, so we have to include it.). |
errStrm | [out] If nonnull, print any error messages to this stream, else don't print error messages. |
Definition at line 169 of file Tpetra_Details_PackTriples.cpp.
int Tpetra::Details::unpackTriplesCount | ( | const char | inBuf[], |
const int | inBufSize, | ||
int & | inBufCurPos, | ||
int & | numEnt, | ||
const ::Teuchos::Comm< int > & | comm, | ||
std::ostream * | errStrm = NULL |
||
) |
Unpack just the count of triples from the given input buffer.
We store the count of triples as an int, because MPI buffer sizes are int.
inBuf | [in] Input buffer. |
inBufSize | [in] Total input buffer size in bytes. |
inBufCurPos | [in/out] Current position from which to start reading from the input buffer. This corresponds to the 'position' in/out argument of MPI_Unpack. |
numEnt | [out] Number of matrix entries ("triples") that were packed. |
comm | [in] Communicator used in sending and receiving the packed entries. (MPI wants this, so we have to include it.). |
errStrm | [out] If nonnull, print any error messages to this stream, else don't print error messages. |
Definition at line 200 of file Tpetra_Details_PackTriples.cpp.
int Tpetra::Details::countPackTriples | ( | const int | numEnt, |
const ::Teuchos::Comm< int > & | comm, | ||
int & | size, | ||
std::ostream * | errStrm = NULL |
||
) |
Compute the buffer size required by packTriples for packing numEnt number of (i,j,A(i,j)) matrix entries ("triples").
This function is NOT the same thing as packTriplesCount. countPackTriples tells me an upper bound on how much buffer space I need to hold numEnt triples. packTriplesCount actually packs numEnt, the number of triples. countPackTriplesCount tells me an upper bound on how much buffer space I need to hold the number of triples, not the triples themselves.
ScalarType | Type of each matrix entry A(i,j). |
OrdinalType | Type of each matrix index i or j. |
numEnt | [in] Number of matrix entries ("triples") to pack. |
comm | [in] Communicator used in sending and receiving the packed entries. (MPI wants this, so we have to include it.). |
size | [out] Pack buffer size in bytes (sizeof(char)). |
errStrm | [out] If nonnull, print any error messages to this stream, else don't print error messages. |
Definition at line 321 of file Tpetra_Details_PackTriples.hpp.
int Tpetra::Details::packTriples | ( | const OrdinalType | [], |
const OrdinalType | [], | ||
const ScalarType | [], | ||
const int | , | ||
char | [], | ||
const int | , | ||
int & | , | ||
const ::Teuchos::Comm< int > & | , | ||
std::ostream * | errStrm = NULL |
||
) |
Pack matrix entries ("triples" (i, j, A(i,j))) into the given output buffer.
ScalarType | Type of each matrix entry A(i,j). |
OrdinalType | Type of each matrix index i or j. |
gblRowInds | [in] Row indices to pack. |
gblColInds | [in] Column indices to pack. |
val | [in] Matrix values A(i,j) to pack. |
numEnt | [in] Number of matrix entries ("triples") to pack. |
outBuf | [out] Output buffer. |
outBufSize | [in] Total output buffer size in bytes. |
outBufCurPos | [in/out] Current position from which to start writing to the output buffer. This corresponds to the 'position' in/out argument of MPI_Pack. |
comm | [in] Communicator used in sending and receiving the packed entries. (MPI wants this, so we have to include it.). |
errStrm | [out] If nonnull, print any error messages to this stream, else don't print error messages. |
Definition at line 431 of file Tpetra_Details_PackTriples.hpp.
int Tpetra::Details::unpackTriples | ( | const char | [], |
const int | , | ||
int & | , | ||
OrdinalType | [], | ||
OrdinalType | [], | ||
ScalarType | [], | ||
const int | , | ||
const ::Teuchos::Comm< int > & | , | ||
std::ostream * | errStrm = NULL |
||
) |
Unpack matrix entries ("triples" (i, j, A(i,j))) from the given input buffer.
ScalarType | Type of each matrix entry A(i,j). |
OrdinalType | Type of each matrix index i or j. |
inBuf | [in] Input buffer. |
inBufSize | [in] Total pack buffer size in bytes (sizeof(char)). |
inBufCurPos | [in/out] Current position from which to start reading from the input buffer. This corresponds to the 'position' in/out argument of MPI_Unpack. |
gblRowInds | [out] Row indices unpacked. |
gblColInds | [out] Column indices unpacked. |
val | [out] Matrix values A(i,j) unpacked. |
numEnt | [in] Number of matrix entries ("triples") to unpack. If you don't know it, then you should have senders pack the triples count as the first thing in the buffer, and unpack it first via unpackTriplesCount(). |
comm | [in] Communicator used in sending and receiving the packed entries. (MPI wants this, so we have to include it.). |
errStrm | [out] If nonnull, print any error messages to this stream, else don't print error messages. |
Definition at line 547 of file Tpetra_Details_PackTriples.hpp.
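The "count first, then triples" protocol and the role of the in/out buffer position can be sketched without MPI. The code below uses std::memcpy in place of MPI_Pack and MPI_Unpack, so it shows only the buffer bookkeeping. All names (`packRaw`, `unpackRaw`, `Triples`, `packAll`, `unpackAll`, `roundTripOk`) are hypothetical, and the buffer layout (count, rows, columns, values) is an assumption.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Copy `count` objects of type T into buf at byte offset pos and advance pos.
// pos plays the role of the in/out `position` argument of MPI_Pack.
// Returns 0 on success, nonzero if the arguments are invalid or buf is too small.
template <class T>
int packRaw(const T in[], const int count, char buf[], const int bufSize, int& pos) {
  if (count < 0 || pos < 0 || pos > bufSize) return -1;
  const std::size_t numBytes = sizeof(T) * static_cast<std::size_t>(count);
  if (static_cast<std::size_t>(bufSize - pos) < numBytes) return -1;
  if (numBytes != 0) std::memcpy(buf + pos, in, numBytes);
  pos += static_cast<int>(numBytes);
  return 0;
}

// Inverse of packRaw: read `count` objects of type T from buf at pos.
template <class T>
int unpackRaw(const char buf[], const int bufSize, int& pos, T out[], const int count) {
  if (count < 0 || pos < 0 || pos > bufSize) return -1;
  const std::size_t numBytes = sizeof(T) * static_cast<std::size_t>(count);
  if (static_cast<std::size_t>(bufSize - pos) < numBytes) return -1;
  if (numBytes != 0) std::memcpy(out, buf + pos, numBytes);
  pos += static_cast<int>(numBytes);
  return 0;
}

struct Triples {
  std::vector<long long> rows, cols;  // global row and column indices
  std::vector<double> vals;           // matrix values A(i,j)
};

// Sender side: pack the count first, then rows, columns, and values.
int packAll(const Triples& t, std::vector<char>& buf) {
  if (t.rows.size() != t.vals.size() || t.cols.size() != t.vals.size()) return -1;
  const int numEnt = static_cast<int>(t.vals.size());
  buf.resize(sizeof(int) + t.vals.size() * (2 * sizeof(long long) + sizeof(double)));
  const int bufSize = static_cast<int>(buf.size());
  int pos = 0;
  int err = packRaw(&numEnt, 1, buf.data(), bufSize, pos);
  if (err == 0) err = packRaw(t.rows.data(), numEnt, buf.data(), bufSize, pos);
  if (err == 0) err = packRaw(t.cols.data(), numEnt, buf.data(), bufSize, pos);
  if (err == 0) err = packRaw(t.vals.data(), numEnt, buf.data(), bufSize, pos);
  return err;
}

// Receiver side: unpack the count first, size the outputs, then unpack the triples.
int unpackAll(const std::vector<char>& buf, Triples& t) {
  const int bufSize = static_cast<int>(buf.size());
  int pos = 0;
  int numEnt = 0;
  int err = unpackRaw(buf.data(), bufSize, pos, &numEnt, 1);
  if (err != 0 || numEnt < 0) return -1;
  t.rows.resize(numEnt);
  t.cols.resize(numEnt);
  t.vals.resize(numEnt);
  err = unpackRaw(buf.data(), bufSize, pos, t.rows.data(), numEnt);
  if (err == 0) err = unpackRaw(buf.data(), bufSize, pos, t.cols.data(), numEnt);
  if (err == 0) err = unpackRaw(buf.data(), bufSize, pos, t.vals.data(), numEnt);
  return err;
}

// Pack three triples and check that unpacking recovers them exactly.
bool roundTripOk() {
  const Triples in{{0, 0, 2}, {0, 1, 2}, {4.0, -1.0, 4.0}};
  std::vector<char> buf;
  if (packAll(in, buf) != 0) return false;
  Triples out;
  if (unpackAll(buf, out) != 0) return false;
  return out.rows == in.rows && out.cols == in.cols && out.vals == in.vals;
}
```

Because the receiver reads the count before anything else, it does not need to know numEnt in advance, which is the situation the numEnt parameter description addresses.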
int Tpetra::Details::readAndDealOutTriples | ( | std::istream & | inputStream, |
std::size_t & | curLineNum, | ||
std::size_t & | totalNumEntRead, | ||
std::function< int(const GO, const GO, const SC &)> | processTriple, | ||
const std::size_t | maxNumEntPerMsg, | ||
const ::Teuchos::Comm< int > & | comm, | ||
const bool | tolerant = false , |
||
std::ostream * | errStrm = NULL , |
||
const bool | debug = false |
||
) |
On Process 0 in the given communicator, read sparse matrix entries (in chunks of at most maxNumEntPerMsg entries at a time) from the input stream, and "deal them out" to all other processes in the communicator.
This is a collective over the communicator.
SC | The type of the value of each matrix entry. |
GO | The type of each (global) index of each matrix entry. |
inputStream | [in/out] Input stream from which to read Matrix Market - format matrix entries ("triples"). Only Process 0 in the communicator needs to be able to access this. |
curLineNum | [in/out] On both input and output, the current line number in the input stream. (In the Matrix Market format, sparse matrix entries cannot start until at least line 3 of the file.) This is only valid on Process 0. |
totalNumEntRead | [out] Total number of matrix entries (triples) read on Process 0. This is only valid on Process 0. |
processTriple | [in] Closure, generally with side effects, that takes in and stores off a sparse matrix entry. First argument is the (global) row index, second argument is the (global) column index, and third argument is the value of the entry. The closure must NOT do MPI communication. Return value is an error code, that is zero if and only if the closure succeeded. We intend for you to use this to call CooMatrix::insertEntry. |
comm | [in] Communicator to use for receiving the triples. |
tolerant | [in] Whether to read tolerantly. |
errStrm | [in] If not NULL, print any error messages to this stream. |
Definition at line 907 of file Tpetra_Details_ReadTriples.hpp.
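A processTriple closure of the documented shape might look like the sketch below. It is illustrative only: `GO` and `SC` are assumed to be `long long` and `double`, `CooSketch` is a hypothetical std::map stand-in for a COO container (it is not CooMatrix), and summing duplicate entries and rejecting negative indices are assumptions made for the example.

```cpp
#include <functional>
#include <map>
#include <utility>

using GO = long long;  // assumed global index type
using SC = double;     // assumed matrix value type

// Hypothetical stand-in for a COO-format container keyed by (row, column).
using CooSketch = std::map<std::pair<GO, GO>, SC>;

// Build a closure with the signature readAndDealOutTriples expects.
// It stores each entry as a side effect, does no communication, and
// returns zero if and only if it succeeded.
std::function<int(const GO, const GO, const SC&)> makeProcessTriple(CooSketch& coo) {
  return [&coo](const GO row, const GO col, const SC& val) -> int {
    if (row < 0 || col < 0) {
      return -1;  // nonzero error code: invalid index
    }
    coo[std::make_pair(row, col)] += val;  // assumed policy: sum duplicates
    return 0;
  };
}
```

The closure captures the container by reference, so the container must outlive every call made through the returned std::function.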
bool Tpetra::Details::wdvTrackingEnabled = true |
Whether WrappedDualView reference count checking is enabled. Initially true. Since the DualView sync functions are not thread-safe, tracking should be disabled during host-parallel regions where WrappedDualView is used.
Definition at line 15 of file Tpetra_Details_WrappedDualView.cpp.