Tpetra parallel linear algebra
Version of the Day
Description of Tpetra's behavior.
#include <Tpetra_Details_Behavior.hpp>
Static Public Member Functions
static bool debug ()
Whether Tpetra is in debug mode.
static bool debug (const char name[])
Whether the given Tpetra object is in debug mode.
static bool verbose ()
Whether Tpetra is in verbose mode.
static bool verbose (const char name[])
Whether the given Tpetra object is in verbose mode.
static void disable_verbose_behavior ()
Disable verbose mode, programmatically.
static void enable_verbose_behavior ()
Enable verbose mode, programmatically.
static bool timing ()
Whether Tpetra is in timing mode.
static bool timing (const char name[])
Whether the given Tpetra object is in timing mode.
static void disable_timing ()
Disable timing, programmatically.
static void enable_timing ()
Enable timing, programmatically.
static bool assumeMpiIsGPUAware ()
Whether to assume that MPI is CUDA aware.
static bool cudaLaunchBlocking ()
Whether the CUDA_LAUNCH_BLOCKING environment variable has been set.
static int TAFC_OptimizationCoreCount ()
MPI process count above which Tpetra::CrsMatrix::transferAndFillComplete will attempt to do advanced neighbor discovery.
static size_t verbosePrintCountThreshold ()
Number of entries below which arrays, lists, etc. will be printed in debug mode.
static size_t rowImbalanceThreshold ()
Threshold for deciding if a local matrix is "imbalanced" in the number of entries per row. The threshold is compared against the difference between maximum row length and average row length.
static bool useMergePathMultiVector ()
Whether to use the cuSPARSE merge path algorithm to perform sparse matrix-multivector products, one vector at a time. Depending on the matrix and the number of vectors in the multivector, this may be better than just applying the default SpMV algorithm to the entire multivector at once.
static bool hierarchicalUnpack ()
Unpack rows of a matrix using hierarchical unpacking.
static size_t hierarchicalUnpackBatchSize ()
Size of batch for hierarchical unpacking.
static size_t hierarchicalUnpackTeamSize ()
Size of team for hierarchical unpacking.
static size_t multivectorKernelLocationThreshold ()
The threshold for transitioning from device to host.
static bool profilingRegionUseTeuchosTimers ()
Use Teuchos::Timer in Tpetra::ProfilingRegion.
static bool profilingRegionUseKokkosProfiling ()
Use Kokkos::Profiling in Tpetra::ProfilingRegion.
static bool fusedResidual ()
Whether to fuse the SpMV and the update in the residual computation into one kernel instead of using two kernel launches. Fusing kernels implies that no TPLs (cuSPARSE, rocSPARSE, ...) will be used for the residual.
static bool skipCopyAndPermuteIfPossible ()
Skip copyAndPermute if possible.
static bool overlapCommunicationAndComputation ()
Overlap communication and computation.
static bool timeKokkosDeepCopy ()
Add Teuchos timers for all host calls to Kokkos::deep_copy(). This is especially useful for identifying host/device data transfers.
static bool timeKokkosDeepCopyVerbose1 ()
Adds verbose output to Kokkos deep_copy timers by appending source and destination. This is especially useful for identifying host/device data transfers.
static bool timeKokkosDeepCopyVerbose2 ()
Adds verbose output to Kokkos deep_copy timers by appending source, destination, and size. This is especially useful for identifying host/device data transfers.
static bool timeKokkosFence ()
Add Teuchos timers for all host calls to Kokkos::fence().
static bool timeKokkosFunctions ()
Add Teuchos timers for all host calls to Kokkos::parallel_for(), Kokkos::parallel_reduce() and Kokkos::parallel_scan().
static size_t spacesIdWarnLimit ()
Warn if more than this many Kokkos spaces are accessed.
static void reject_unrecognized_env_vars ()
Search the environment for TPETRA_ variables and reject unrecognized ones.
"Behavior" means things like whether to do extra debug checks or print debug output. These depend both on build options and on environment variables. Build options generally control the default behavior.
This class' methods have the following properties:
We intended for it to be inexpensive to call this class' methods repeatedly. The idea is that you don't have to cache variables; you should just call the functions freely. In the common case, the bool methods should just perform an 'if' test and return the bool value. We spent some time thinking about how to make the methods reentrant without a possibly expensive mutex-like pthread_once / std::call_once cost on each call.
Tpetra does not promise to see changes to environment variables made after using any Tpetra class or calling any Tpetra function. Best practice is to set any environment variables you need before starting the executable.
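The two properties above can be illustrated with a minimal sketch. This is not Tpetra's implementation: it swaps in a function-local static, which C++11 initializes once in a thread-safe way, and the names parseOnOff, exampleDebug, and EXAMPLE_DEBUG are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not Tpetra's code): interpret a raw environment
// value as ON/OFF, using defaultValue when it is unset or unrecognized.
inline bool parseOnOff(const char* value, bool defaultValue) {
  if (value == nullptr) return defaultValue;
  if (std::strcmp(value, "ON") == 0) return true;
  if (std::strcmp(value, "OFF") == 0) return false;
  return defaultValue;
}

// EXAMPLE_DEBUG is a made-up variable name. The environment is read only
// on the first call; later calls check the initialization guard and
// return the cached bool, so they are cheap to repeat.
inline bool exampleDebug() {
  static const bool cached = parseOnOff(std::getenv("EXAMPLE_DEBUG"), false);
  return cached;
}
```

Because the value is cached on first use, changing EXAMPLE_DEBUG after the first call to exampleDebug() has no effect in this sketch, which mirrors the caveat about environment variables changed after Tpetra has been used.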
Our main goal with this class is to give both users and developers more run-time control in determining Tpetra's behavior, by setting environment variables. This makes debugging much more efficient, since enabling debugging code previously required reconfiguring and recompiling. Not all of Tpetra has bought into this system yet; some debug code is still protected by macros like HAVE_TPETRA_DEBUG. However, our goal is that as much Tpetra debugging code as possible can be enabled or disabled via environment variable. This will have the additional advantage of avoiding errors due to only building and testing in debug or release mode, but not both.
The behavior of Tpetra can be modified at run time through three environment variables:
TPETRA_DEBUG: flags Tpetra to turn on debug checking.
TPETRA_VERBOSE: flags Tpetra to turn on debug output.
TPETRA_TIMING: flags Tpetra to turn on timing code.
These are three different things. For example, TPETRA_DEBUG may do extra MPI communication in order to ensure correct error state propagation, but TPETRA_DEBUG should never print copious debug output if no errors occurred. The idea is that if users get a mysterious error or hang, they can rerun with TPETRA_DEBUG set. TPETRA_VERBOSE is for Tpetra developers to use for debugging Tpetra. TPETRA_TIMING is for Tpetra developers to use for timing Tpetra.
The environment variables are understood to be "on" or "off" and are recognized if specified in one of two ways. The first is to specify the variable unconditionally ON or OFF, e.g., TPETRA_[VERBOSE,DEBUG,TIMING]=ON or TPETRA_[VERBOSE,DEBUG,TIMING]=OFF. The default value of TPETRA_VERBOSE and TPETRA_TIMING is always OFF. The default value for TPETRA_DEBUG is ON if Tpetra is configured with Tpetra_ENABLE_DEBUG; otherwise it is OFF.
The second is to specify the variable on a per class/object basis, e.g., TPETRA_VERBOSE=CrsGraph,CrsMatrix,Distributor means that verbose output will be enabled for the CrsGraph, CrsMatrix, and Distributor classes. For this second method, the default values of both TPETRA_VERBOSE and TPETRA_DEBUG are OFF.
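The two ways of specifying a variable can be sketched as a small parser. This is only an illustration under assumptions, not Tpetra's actual parsing code: the function name namedObjectEnabled is hypothetical, names are matched exactly, and a value of ON is assumed to enable every named object.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decide whether the object `name` is enabled, given
// the raw value of a variable such as TPETRA_VERBOSE (nullptr if unset).
inline bool namedObjectEnabled(const char* value, const std::string& name) {
  assert(!name.empty());
  if (value == nullptr) return false;  // unset: per-object default is OFF
  const std::string list(value);
  if (list == "ON") return true;       // assumption: ON applies to every object
  if (list == "OFF") return false;
  // Otherwise treat the value as a comma-separated list of object names.
  std::size_t begin = 0;
  while (begin <= list.size()) {
    std::size_t end = list.find(',', begin);
    if (end == std::string::npos) end = list.size();
    if (list.compare(begin, end - begin, name) == 0) return true;
    begin = end + 1;
  }
  return false;
}
```

With the value CrsGraph,CrsMatrix,Distributor, this sketch reports CrsMatrix as enabled and any class not in the list as disabled.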
Definition at line 91 of file Tpetra_Details_Behavior.hpp.
static bool debug ()
Whether Tpetra is in debug mode.
"Debug mode" means that Tpetra does extra error checks that may require more MPI communication or local computation. It may also produce more detailed error messages, and more copious debug output.
Definition at line 442 of file Tpetra_Details_Behavior.cpp.
static bool debug (const char name[])
Whether the given Tpetra object is in debug mode.
name | [in] Name of the Tpetra object. Typically, the object would be a class name, e.g., "CrsGraph" or method, e.g., "CrsGraph::insertLocalIndices". |
Definition at line 592 of file Tpetra_Details_Behavior.cpp.
static bool verbose ()
Whether Tpetra is in verbose mode.
"Verbose mode" means that Tpetra prints copious debug output to std::cerr on every MPI process. This is a LOT of output! You really don't want to do this when running on many MPI processes.
Definition at line 451 of file Tpetra_Details_Behavior.cpp.
static bool verbose (const char name[])
Whether the given Tpetra object is in verbose mode.
name | [in] Name of the Tpetra object. Typically, the object would be a class name, e.g., "CrsGraph" or method, e.g., "CrsGraph::insertLocalIndices". |
Definition at line 600 of file Tpetra_Details_Behavior.cpp.
static void disable_verbose_behavior ()
Disable verbose mode, programmatically.
Definition at line 615 of file Tpetra_Details_Behavior.cpp.
static void enable_verbose_behavior ()
Enable verbose mode, programmatically.
Definition at line 611 of file Tpetra_Details_Behavior.cpp.
static bool timing ()
Whether Tpetra is in timing mode.
"Timing mode" means that Tpetra enables code that instruments internal timing.
Definition at line 463 of file Tpetra_Details_Behavior.cpp.
static bool timing (const char name[])
Whether the given Tpetra object is in timing mode.
name | [in] Name of the Tpetra object. Typically, the object would be a class name, e.g., "CrsGraph" or method, e.g., "CrsGraph::insertLocalIndices". |
Definition at line 619 of file Tpetra_Details_Behavior.cpp.
static void disable_timing ()
Disable timing, programmatically.
Definition at line 632 of file Tpetra_Details_Behavior.cpp.
static void enable_timing ()
Enable timing, programmatically.
Definition at line 630 of file Tpetra_Details_Behavior.cpp.
static bool assumeMpiIsGPUAware ()
Whether to assume that MPI is CUDA aware.
An MPI implementation is "CUDA aware" if it can accept CUDA device buffers (Kokkos::CudaSpace) as send and receive buffers. You may control this behavior at run time via the TPETRA_ASSUME_GPU_AWARE_MPI environment variable.
For a discussion, see Trilinos GitHub issues #1571 and #1088.
Definition at line 475 of file Tpetra_Details_Behavior.cpp.
static bool cudaLaunchBlocking ()
Whether the CUDA_LAUNCH_BLOCKING environment variable has been set.
Definition at line 485 of file Tpetra_Details_Behavior.cpp.
static int TAFC_OptimizationCoreCount ()
MPI process count above which Tpetra::CrsMatrix::transferAndFillComplete will attempt to do advanced neighbor discovery.
This is platform dependent, and the user/developer should test each new platform for the correct value. You may control this at run time via the MM_TAFC_OptimizationCoreCount environment variable.
Definition at line 495 of file Tpetra_Details_Behavior.cpp.
static size_t verbosePrintCountThreshold ()
Number of entries below which arrays, lists, etc. will be printed in debug mode.
You may control this at run time via the TPETRA_VERBOSE_PRINT_COUNT_THRESHOLD environment variable.
Definition at line 504 of file Tpetra_Details_Behavior.cpp.
static size_t rowImbalanceThreshold ()
Threshold for deciding if a local matrix is "imbalanced" in the number of entries per row. The threshold is compared against the difference between maximum row length and average row length.
The threshold is measured in max number of entries in excess of the average (it is not a proportion between max and average).
If the "imbalance" of a local matrix is greater than this threshold, a different algorithm may be used for some operations like sparse matrix-vector multiply, packAndPrepare, and unpackAndCombine. You may control this at run time via the TPETRA_ROW_IMBALANCE_THRESHOLD environment variable.
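The comparison described above can be written out as a short sketch. The function name isImbalanced is hypothetical, and computing the average in floating point is an assumption; Tpetra's own arithmetic may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: a local matrix counts as "imbalanced" when its
// longest row exceeds the average row length by more than `threshold`
// entries. This is an absolute excess, not a ratio of max to average.
inline bool isImbalanced(const std::vector<std::size_t>& rowLengths,
                         std::size_t threshold) {
  assert(!rowLengths.empty());
  const std::size_t maxLen =
      *std::max_element(rowLengths.begin(), rowLengths.end());
  const std::size_t total =
      std::accumulate(rowLengths.begin(), rowLengths.end(), std::size_t{0});
  const double average =
      static_cast<double>(total) / static_cast<double>(rowLengths.size());
  return static_cast<double>(maxLen) - average >
         static_cast<double>(threshold);
}
```

For row lengths {3, 3, 3, 3, 103}, the maximum is 103 and the average is 23, so the excess is 80 entries: the matrix is imbalanced for a threshold of 50 but not for a threshold of 100.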
Definition at line 514 of file Tpetra_Details_Behavior.cpp.
static bool useMergePathMultiVector ()
Whether to use the cuSPARSE merge path algorithm to perform sparse matrix-multivector products, one vector at a time. Depending on the matrix and the number of vectors in the multivector, this may be better than just applying the default SpMV algorithm to the entire multivector at once.
Note: full support for merge path SpMV on multivectors is coming soon.
You may control this at run time via the TPETRA_MULTIVECTOR_USE_MERGE_PATH environment variable (default: false).
Definition at line 524 of file Tpetra_Details_Behavior.cpp.
static bool hierarchicalUnpack ()
Unpack rows of a matrix using hierarchical unpacking.
Definition at line 634 of file Tpetra_Details_Behavior.cpp.
static size_t hierarchicalUnpackBatchSize ()
Size of batch for hierarchical unpacking.
Definition at line 544 of file Tpetra_Details_Behavior.cpp.
static size_t hierarchicalUnpackTeamSize ()
Size of team for hierarchical unpacking.
Definition at line 559 of file Tpetra_Details_Behavior.cpp.
static size_t multivectorKernelLocationThreshold ()
The threshold for transitioning from device to host.
If the number of elements in the multivector does not exceed this threshold and the data is on host, then run the calculation on host. Otherwise, run on device. By default this is 10000, but it may be altered by the environment variable TPETRA_VECTOR_DEVICE_THRESHOLD.
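The rule above amounts to a two-condition test, sketched here for illustration. The enum KernelLocation and the function chooseKernelLocation are hypothetical names, not part of Tpetra; only the default of 10000 comes from the text.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration only.
enum class KernelLocation { Host, Device };

// Run on host only when the data already lives on host AND the multivector
// has no more than `threshold` elements; otherwise run on device.
inline KernelLocation chooseKernelLocation(std::size_t numElements,
                                           bool dataOnHost,
                                           std::size_t threshold = 10000) {
  if (dataOnHost && numElements <= threshold) {
    return KernelLocation::Host;
  }
  return KernelLocation::Device;
}
```

In this sketch, a multivector with exactly 10000 elements whose data is on host stays on host, while one with 10001 elements, or one whose data is not on host, runs on device.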
Definition at line 534 of file Tpetra_Details_Behavior.cpp.
static bool profilingRegionUseTeuchosTimers ()
Use Teuchos::Timer in Tpetra::ProfilingRegion.
This is disabled by default. You may control this at run time via the TPETRA_USE_TEUCHOS_TIMERS environment variable.
Definition at line 573 of file Tpetra_Details_Behavior.cpp.
static bool profilingRegionUseKokkosProfiling ()
Use Kokkos::Profiling in Tpetra::ProfilingRegion.
This is enabled by default if KOKKOS_ENABLE_PROFILING is defined. You may control this at run time via the TPETRA_USE_KOKKOS_PROFILING environment variable.
Definition at line 582 of file Tpetra_Details_Behavior.cpp.
static bool fusedResidual ()
Whether to fuse the SpMV and the update in the residual computation into one kernel instead of using two kernel launches. Fusing kernels implies that no TPLs (cuSPARSE, rocSPARSE, ...) will be used for the residual.
This is enabled by default. You may control this at run time via the TPETRA_FUSED_RESIDUAL environment variable.
Definition at line 653 of file Tpetra_Details_Behavior.cpp.
static bool skipCopyAndPermuteIfPossible ()
Skip copyAndPermute if possible.
This is disabled by default. You may control this at run time via the TPETRA_SKIP_COPY_AND_PERMUTE environment variable.
Definition at line 643 of file Tpetra_Details_Behavior.cpp.
static bool overlapCommunicationAndComputation ()
Overlap communication and computation.
This is disabled by default. You may control this at run time via the TPETRA_OVERLAP environment variable.
Definition at line 668 of file Tpetra_Details_Behavior.cpp.
static bool timeKokkosDeepCopy ()
Add Teuchos timers for all host calls to Kokkos::deep_copy(). This is especially useful for identifying host/device data transfers.
This is disabled by default. You may control this at run time via the TPETRA_TIME_KOKKOS_DEEP_COPY environment variable.
Definition at line 687 of file Tpetra_Details_Behavior.cpp.
static bool timeKokkosDeepCopyVerbose1 ()
Adds verbose output to Kokkos deep_copy timers by appending source and destination. This is especially useful for identifying host/device data transfers.
This is disabled by default. You may control this at run time via the TPETRA_TIME_KOKKOS_DEEP_COPY_VERBOSE1 environment variable.
Definition at line 697 of file Tpetra_Details_Behavior.cpp.
static bool timeKokkosDeepCopyVerbose2 ()
Adds verbose output to Kokkos deep_copy timers by appending source, destination, and size. This is especially useful for identifying host/device data transfers.
This is disabled by default. You may control this at run time via the TPETRA_TIME_KOKKOS_DEEP_COPY_VERBOSE2 environment variable.
Definition at line 707 of file Tpetra_Details_Behavior.cpp.
static bool timeKokkosFence ()
Add Teuchos timers for all host calls to Kokkos::fence().
This is disabled by default. You may control this at run time via the TPETRA_TIME_KOKKOS_FENCE environment variable.
Definition at line 717 of file Tpetra_Details_Behavior.cpp.
static bool timeKokkosFunctions ()
Add Teuchos timers for all host calls to Kokkos::parallel_for(), Kokkos::parallel_reduce() and Kokkos::parallel_scan().
This is disabled by default. You may control this at run time via the TPETRA_TIME_KOKKOS_FUNCTIONS environment variable.
Definition at line 726 of file Tpetra_Details_Behavior.cpp.
static size_t spacesIdWarnLimit ()
Warn if more than this many Kokkos spaces are accessed.
This is disabled by default. You may control this at run time via the TPETRA_SPACE_ID_WARN_LIMIT environment variable.
Definition at line 677 of file Tpetra_Details_Behavior.cpp.
static void reject_unrecognized_env_vars ()
Search the environment for TPETRA_ variables and reject unrecognized ones.
Definition at line 393 of file Tpetra_Details_Behavior.cpp.
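The idea of scanning for unrecognized TPETRA_ variables can be sketched as follows. This is a hypothetical illustration rather than the code at the location cited above: findUnrecognized is a made-up name, the environment is passed in as "NAME=VALUE" strings instead of being read from the process, and the sketch returns the offending names and leaves the choice of how to reject them to the caller.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: return the names of all TPETRA_-prefixed variables
// in `environment` (entries in "NAME=VALUE" form) that are not listed in
// `recognized`.
inline std::vector<std::string> findUnrecognized(
    const std::vector<std::string>& environment,
    const std::set<std::string>& recognized) {
  const std::string prefix = "TPETRA_";
  std::vector<std::string> unrecognized;
  for (const std::string& entry : environment) {
    // Everything before the first '=' is the variable name.
    const std::string name = entry.substr(0, entry.find('='));
    if (name.compare(0, prefix.size(), prefix) != 0) continue;  // not ours
    if (recognized.count(name) == 0) unrecognized.push_back(name);
  }
  return unrecognized;
}
```

A check of this kind catches misspellings such as TPETRA_DEBG=ON, which would otherwise be silently ignored.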