Tpetra parallel linear algebra  Version of the Day
Tpetra::Details::Behavior Class Reference

Description of Tpetra's behavior.

#include <Tpetra_Details_Behavior.hpp>

Static Public Member Functions

static bool debug ()
 Whether Tpetra is in debug mode.

static bool debug (const char name[])
 Whether the given Tpetra object is in debug mode.

static bool verbose ()
 Whether Tpetra is in verbose mode.

static bool verbose (const char name[])
 Whether the given Tpetra object is in verbose mode.

static void disable_verbose_behavior ()
 Disable verbose mode, programmatically.

static void enable_verbose_behavior ()
 Enable verbose mode, programmatically.

static bool timing ()
 Whether Tpetra is in timing mode.

static bool timing (const char name[])
 Whether the given Tpetra object is in timing mode.

static void disable_timing ()
 Disable timing, programmatically.

static void enable_timing ()
 Enable timing, programmatically.
 
static bool assumeMpiIsGPUAware ()
 Whether to assume that MPI is CUDA aware.

static bool cudaLaunchBlocking ()
 Whether the CUDA_LAUNCH_BLOCKING environment variable has been set.

static int TAFC_OptimizationCoreCount ()
 MPI process count above which Tpetra::CrsMatrix::transferAndFillComplete will attempt to do advanced neighbor discovery.

static size_t verbosePrintCountThreshold ()
 Number of entries below which arrays, lists, etc. will be printed in debug mode.

static size_t rowImbalanceThreshold ()
 Threshold for deciding if a local matrix is "imbalanced" in the number of entries per row. The threshold is compared against the difference between maximum row length and average row length.

static bool useMergePathMultiVector ()
 Whether to use the cuSPARSE merge path algorithm to perform sparse matrix-multivector products, one vector at a time. Depending on the matrix and the number of vectors in the multivector, this may be better than just applying the default SpMV algorithm to the entire multivector at once.

static bool hierarchicalUnpack ()
 Unpack rows of a matrix using hierarchical unpacking.

static size_t hierarchicalUnpackBatchSize ()
 Size of batch for hierarchical unpacking.

static size_t hierarchicalUnpackTeamSize ()
 Size of team for hierarchical unpacking.

static size_t multivectorKernelLocationThreshold ()
 Threshold for transitioning from device to host.
 
static bool profilingRegionUseTeuchosTimers ()
 Use Teuchos::Timer in Tpetra::ProfilingRegion.

static bool profilingRegionUseKokkosProfiling ()
 Use Kokkos::Profiling in Tpetra::ProfilingRegion.

static bool fusedResidual ()
 Fusing SpMV and update in residual instead of using 2 kernel launches. Fusing kernels implies that no TPLs (CUSPARSE, ROCSPARSE, ...) will be used for the residual.

static bool skipCopyAndPermuteIfPossible ()
 Skip copyAndPermute if possible.

static bool overlapCommunicationAndComputation ()
 Overlap communication and computation.

static bool timeKokkosDeepCopy ()
 Add Teuchos timers for all host calls to Kokkos::deep_copy(). This is especially useful for identifying host/device data transfers.

static bool timeKokkosDeepCopyVerbose1 ()
 Adds verbose output to Kokkos deep_copy timers by appending source and destination. This is especially useful for identifying host/device data transfers.

static bool timeKokkosDeepCopyVerbose2 ()
 Adds verbose output to Kokkos deep_copy timers by appending source, destination, and size. This is especially useful for identifying host/device data transfers.

static bool timeKokkosFence ()
 Add Teuchos timers for all host calls to Kokkos::fence().

static bool timeKokkosFunctions ()
 Add Teuchos timers for all host calls to Kokkos::parallel_for(), Kokkos::parallel_reduce() and Kokkos::parallel_scan().

static size_t spacesIdWarnLimit ()
 Warn if more than this many Kokkos spaces are accessed.

static void reject_unrecognized_env_vars ()
 Search the environment for TPETRA_ variables and reject unrecognized ones.
 

Detailed Description

Description of Tpetra's behavior.

"Behavior" means things like whether to do extra debug checks or print debug output. These depend both on build options and on environment variables. Build options generally control the default behavior.

This class' methods have the following properties:

We intended for it to be inexpensive to call this class' methods repeatedly. The idea is that you don't have to cache variables; you should just call the functions freely. In the common case, the bool methods should just perform an 'if' test and just return the bool value. We spent some time thinking about how to make the methods reentrant without a possibly expensive mutex-like pthread_once / std::call_once cost on each call.

Tpetra does not promise to see changes to environment variables made after using any Tpetra class or calling any Tpetra function. Best practice would be to set any environment variables that you want to set, before starting the executable.
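The two properties above (cheap repeated calls, and no promise to notice later changes to the environment) are what one would expect from a read-once cache. The following is a minimal sketch of that idea, not Tpetra's implementation: the function sketchDebug and the variable SKETCH_DEBUG are made up for illustration, and the function-local static used here carries its own one-time-initialization guard, which may or may not resemble what Tpetra actually does.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper, not part of Tpetra: reads the (made-up) SKETCH_DEBUG
// environment variable the first time it is called and caches the result.
bool sketchDebug() {
  // C++11 guarantees that this initializer runs exactly once, even if
  // several threads call sketchDebug() concurrently.
  static const bool value = [] {
    const char* raw = std::getenv("SKETCH_DEBUG");
    return raw != nullptr && std::strcmp(raw, "ON") == 0;
  }();
  return value;  // later calls only read the cached bool
}
```

With a cache like this, changing SKETCH_DEBUG after the first call has no effect on later calls, which is why setting variables before starting the executable is the safe choice.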

Our main goal with this class is to give both users and developers more run-time control in determining Tpetra's behavior, by setting environment variables. This makes debugging much more efficient, since before, enabling debugging code would have required reconfiguring and recompiling. Not all of Tpetra has bought into this system yet; some debug code is still protected by macros like HAVE_TPETRA_DEBUG. However, our goal is that as much Tpetra debugging code as possible can be enabled or disabled via environment variable. This will have the additional advantage of avoiding errors due to only building and testing in debug or release mode, but not both.

The behavior of Tpetra can be modified at run time through three environment variables:

TPETRA_DEBUG: flags Tpetra to turn on debug checking.
TPETRA_VERBOSE: flags Tpetra to turn on debug output.
TPETRA_TIMING: flags Tpetra to turn on timing code.

These are three different things. For example, TPETRA_DEBUG may do extra MPI communication in order to ensure correct error state propagation, but TPETRA_DEBUG should never print copious debug output if no errors occurred. The idea is that if users get a mysterious error or hang, they can rerun with TPETRA_DEBUG set. TPETRA_VERBOSE is for Tpetra developers to use for debugging Tpetra. TPETRA_TIMING is for Tpetra developers to use for timing Tpetra.

The environment variables are understood to be "on" or "off" and are recognized if specified in one of two ways. The first is to set the variable unconditionally to ON or OFF, e.g., TPETRA_[VERBOSE,DEBUG,TIMING]=ON or TPETRA_[VERBOSE,DEBUG,TIMING]=OFF. The default value of TPETRA_VERBOSE and TPETRA_TIMING is always OFF. The default value of TPETRA_DEBUG is ON if Tpetra is configured with Tpetra_ENABLE_DEBUG; otherwise it is OFF.

The second is to specify the variable on a per class/object basis, e.g., TPETRA_VERBOSE=CrsGraph,CrsMatrix,Distributor means that verbose output will be enabled for the CrsGraph, CrsMatrix, and Distributor classes. For this second method, the default values of both TPETRA_VERBOSE and TPETRA_DEBUG are OFF.
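The two ways of specifying a variable can be illustrated with a small parser. This is a sketch under stated assumptions, not Tpetra's parser: the helper enabledFor is hypothetical, it accepts only the exact spellings ON and OFF, it matches names case-sensitively, and it treats an empty value as OFF (so it ignores the Tpetra_ENABLE_DEBUG build default for TPETRA_DEBUG).

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not Tpetra's parser. `value` is the text of a variable
// such as TPETRA_VERBOSE; `name` is the object being queried, or "" for the
// global (unnamed) query.
bool enabledFor(const std::string& value, const std::string& name) {
  if (value == "ON") return true;                        // on for every name
  if (value == "OFF" || value.empty()) return false;     // off for every name
  if (name.empty()) return false;  // per-object list: the global default stays OFF

  // Otherwise treat the value as a comma-separated list of object names.
  std::istringstream items(value);
  std::string item;
  while (std::getline(items, item, ',')) {
    if (item == name) return true;
  }
  return false;
}
```

For example, enabledFor("CrsGraph,CrsMatrix,Distributor", "CrsMatrix") is true, while the same value queried with "MultiVector" or with the empty name is false.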

Definition at line 91 of file Tpetra_Details_Behavior.hpp.

Member Function Documentation

bool Tpetra::Details::Behavior::debug ( )
static

Whether Tpetra is in debug mode.

"Debug mode" means that Tpetra does extra error checks that may require more MPI communication or local computation. It may also produce more detailed error messages, and more copious debug output.

Definition at line 442 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::debug ( const char  name[])
static

Whether the given Tpetra object is in debug mode.

Parameters
name [in] Name of the Tpetra object. Typically this is a class name, e.g., "CrsGraph", or a method name, e.g., "CrsGraph::insertLocalIndices".

Definition at line 592 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::verbose ( )
static

Whether Tpetra is in verbose mode.

"Verbose mode" means that Tpetra prints copious debug output to std::cerr on every MPI process. This is a LOT of output! You really don't want to do this when running on many MPI processes.

Definition at line 451 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::verbose ( const char  name[])
static

Whether the given Tpetra object is in verbose mode.

Parameters
name [in] Name of the Tpetra object. Typically this is a class name, e.g., "CrsGraph", or a method name, e.g., "CrsGraph::insertLocalIndices".
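A call site might guard its output on the per-object query as sketched below. To keep the sketch compilable without Trilinos, the behavior class is a template parameter; reportIfVerbose and FakeBehavior are hypothetical names introduced only for illustration. In a Trilinos build, one would presumably pass Tpetra::Details::Behavior as BehaviorT and std::cerr as the stream.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Sketch of a call site. BehaviorT is any type with a static
// verbose(const char[]) member. Returns true if a message was written.
template <class BehaviorT>
bool reportIfVerbose(std::ostream& out, const char name[], const std::string& msg) {
  if (!BehaviorT::verbose(name)) {
    return false;  // common case: one cheap test, no formatting work
  }
  std::ostringstream os;  // build the whole line first, then write it once
  os << name << ": " << msg << '\n';
  out << os.str();
  return true;
}

// Hypothetical stand-in used only to exercise the sketch without Trilinos:
// verbose output is enabled for "CrsGraph" and nothing else.
struct FakeBehavior {
  static bool verbose(const char name[]) {
    return std::string(name) == "CrsGraph";
  }
};
```

Building the message in a string stream and writing it in a single call is a design choice of this sketch; it tends to reduce interleaving when many processes write to the same stream.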

Definition at line 600 of file Tpetra_Details_Behavior.cpp.

void Tpetra::Details::Behavior::disable_verbose_behavior ( )
static

Disable verbose mode, programmatically.

Definition at line 615 of file Tpetra_Details_Behavior.cpp.

void Tpetra::Details::Behavior::enable_verbose_behavior ( )
static

Enable verbose mode, programmatically.

Definition at line 611 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::timing ( )
static

Whether Tpetra is in timing mode.

"Timing mode" means that Tpetra enables code that instruments internal timing.

Definition at line 463 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::timing ( const char  name[])
static

Whether the given Tpetra object is in timing mode.

Parameters
name [in] Name of the Tpetra object. Typically this is a class name, e.g., "CrsGraph", or a method name, e.g., "CrsGraph::insertLocalIndices".

Definition at line 619 of file Tpetra_Details_Behavior.cpp.

void Tpetra::Details::Behavior::disable_timing ( )
static

Disable timing, programmatically.

Definition at line 632 of file Tpetra_Details_Behavior.cpp.

void Tpetra::Details::Behavior::enable_timing ( )
static

Enable timing, programmatically.

Definition at line 630 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::assumeMpiIsGPUAware ( )
static

Whether to assume that MPI is CUDA aware.

An MPI implementation is "CUDA aware" if it can accept CUDA device buffers (Kokkos::CudaSpace) as send and receive buffers. You may control this behavior at run time via the TPETRA_ASSUME_GPU_AWARE_MPI environment variable.

For a discussion, see Trilinos GitHub issues #1571 and #1088.

Definition at line 475 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::cudaLaunchBlocking ( )
static

Whether the CUDA_LAUNCH_BLOCKING environment variable has been set.

Definition at line 485 of file Tpetra_Details_Behavior.cpp.

int Tpetra::Details::Behavior::TAFC_OptimizationCoreCount ( )
static

MPI process count above which Tpetra::CrsMatrix::transferAndFillComplete will attempt to do advanced neighbor discovery.

This is platform dependent, and the user/developer should test each new platform for the correct value. You may control this at run time via the MM_TAFC_OptimizationCoreCount environment variable.

Definition at line 495 of file Tpetra_Details_Behavior.cpp.

size_t Tpetra::Details::Behavior::verbosePrintCountThreshold ( )
static

Number of entries below which arrays, lists, etc. will be printed in debug mode.

You may control this at run time via the TPETRA_VERBOSE_PRINT_COUNT_THRESHOLD environment variable.

Definition at line 504 of file Tpetra_Details_Behavior.cpp.

size_t Tpetra::Details::Behavior::rowImbalanceThreshold ( )
static

Threshold for deciding if a local matrix is "imbalanced" in the number of entries per row. The threshold is compared against the difference between maximum row length and average row length.

The threshold is measured in max number of entries in excess of the average (it is not a proportion between max and average).

If the "imbalance" of a local matrix is greater than this threshold, a different algorithm may be used for some operations like sparse matrix-vector multiply, packAndPrepare, and unpackAndCombine. You may control this at run time via the TPETRA_ROW_IMBALANCE_THRESHOLD environment variable.
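The comparison described above can be written out as follows. This is a sketch, not Tpetra code: the helper isImbalanced is hypothetical, and the use of a truncating integer average is an assumption, since the text does not say how the average is computed.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not Tpetra code. rowLengths[i] is the number of
// entries in local row i.
bool isImbalanced(const std::vector<std::size_t>& rowLengths, std::size_t threshold) {
  if (rowLengths.empty()) return false;
  const std::size_t maxLen =
      *std::max_element(rowLengths.begin(), rowLengths.end());
  const std::size_t total =
      std::accumulate(rowLengths.begin(), rowLengths.end(), std::size_t(0));
  // Assumption: truncating integer average. It never exceeds maxLen, so the
  // subtraction below cannot wrap around.
  const std::size_t avgLen = total / rowLengths.size();
  return (maxLen - avgLen) > threshold;  // absolute excess, not a ratio
}
```

For row lengths {1, 1, 1, 101}, the average is 26 and the excess is 75, so the matrix counts as imbalanced for a threshold of 50 but not for a threshold of 75 or 100.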

Definition at line 514 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::useMergePathMultiVector ( )
static

Whether to use the cuSPARSE merge path algorithm to perform sparse matrix-multivector products, one vector at a time. Depending on the matrix and the number of vectors in the multivector, this may be better than just applying the default SpMV algorithm to the entire multivector at once.

Note: full support for merge path SpMV on multivectors is coming soon.

You may control this at run time via the TPETRA_MULTIVECTOR_USE_MERGE_PATH environment variable (default: false).

Definition at line 524 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::hierarchicalUnpack ( )
static

Unpack rows of a matrix using hierarchical unpacking.

Definition at line 634 of file Tpetra_Details_Behavior.cpp.

size_t Tpetra::Details::Behavior::hierarchicalUnpackBatchSize ( )
static

Size of batch for hierarchical unpacking.

Definition at line 544 of file Tpetra_Details_Behavior.cpp.

size_t Tpetra::Details::Behavior::hierarchicalUnpackTeamSize ( )
static

Size of team for hierarchical unpacking.

Definition at line 559 of file Tpetra_Details_Behavior.cpp.

size_t Tpetra::Details::Behavior::multivectorKernelLocationThreshold ( )
static

Threshold for transitioning from device to host.

If the number of elements in the multivector does not exceed this threshold and the data is on host, then the calculation runs on host. Otherwise, it runs on device. The default is 10000; you may change it at run time via the TPETRA_VECTOR_DEVICE_THRESHOLD environment variable.
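The rule above amounts to a two-condition test. The sketch below spells it out; the enum KernelLocation and the function chooseKernelLocation are hypothetical names, not part of Tpetra, and "does not exceed" is read as less than or equal to.

```cpp
#include <cstddef>

enum class KernelLocation { Host, Device };

// Hypothetical helper, not Tpetra code. The default threshold of 10000 is the
// default stated in the text.
KernelLocation chooseKernelLocation(std::size_t numElements, bool dataOnHost,
                                    std::size_t threshold = 10000) {
  // Run on host only if the data is already there AND the problem is small.
  return (dataOnHost && numElements <= threshold) ? KernelLocation::Host
                                                  : KernelLocation::Device;
}
```

Note that a small multivector whose data is not on host still runs on device under this reading.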

Definition at line 534 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::profilingRegionUseTeuchosTimers ( )
static

Use Teuchos::Timer in Tpetra::ProfilingRegion.

This is disabled by default. You may control this at run time via the TPETRA_USE_TEUCHOS_TIMERS environment variable.

Definition at line 573 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::profilingRegionUseKokkosProfiling ( )
static

Use Kokkos::Profiling in Tpetra::ProfilingRegion.

This is enabled by default if KOKKOS_ENABLE_PROFILING is defined. You may control this at run time via the TPETRA_USE_KOKKOS_PROFILING environment variable.

Definition at line 582 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::fusedResidual ( )
static

Fusing SpMV and update in residual instead of using 2 kernel launches. Fusing kernels implies that no TPLs (CUSPARSE, ROCSPARSE, ...) will be used for the residual.

This is enabled by default. You may control this at run time via the TPETRA_FUSED_RESIDUAL environment variable.

Definition at line 653 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::skipCopyAndPermuteIfPossible ( )
static

Skip copyAndPermute if possible.

This is disabled by default. You may control this at run time via the TPETRA_SKIP_COPY_AND_PERMUTE environment variable.

Definition at line 643 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::overlapCommunicationAndComputation ( )
static

Overlap communication and computation.

This is disabled by default. You may control this at run time via the TPETRA_OVERLAP environment variable.

Definition at line 668 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::timeKokkosDeepCopy ( )
static

Add Teuchos timers for all host calls to Kokkos::deep_copy(). This is especially useful for identifying host/device data transfers.

This is disabled by default. You may control this at run time via the TPETRA_TIME_KOKKOS_DEEP_COPY environment variable.

Definition at line 687 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::timeKokkosDeepCopyVerbose1 ( )
static

Adds verbose output to Kokkos deep_copy timers by appending source and destination. This is especially useful for identifying host/device data transfers.

This is disabled by default. You may control this at run time via the TPETRA_TIME_KOKKOS_DEEP_COPY_VERBOSE1 environment variable.

Definition at line 697 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::timeKokkosDeepCopyVerbose2 ( )
static

Adds verbose output to Kokkos deep_copy timers by appending source, destination, and size. This is especially useful for identifying host/device data transfers.

This is disabled by default. You may control this at run time via the TPETRA_TIME_KOKKOS_DEEP_COPY_VERBOSE2 environment variable.

Definition at line 707 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::timeKokkosFence ( )
static

Add Teuchos timers for all host calls to Kokkos::fence().

This is disabled by default. You may control this at run time via the TPETRA_TIME_KOKKOS_FENCE environment variable.

Definition at line 717 of file Tpetra_Details_Behavior.cpp.

bool Tpetra::Details::Behavior::timeKokkosFunctions ( )
static

Add Teuchos timers for all host calls to Kokkos::parallel_for(), Kokkos::parallel_reduce() and Kokkos::parallel_scan().

This is disabled by default. You may control this at run time via the TPETRA_TIME_KOKKOS_FUNCTIONS environment variable.

Definition at line 726 of file Tpetra_Details_Behavior.cpp.

size_t Tpetra::Details::Behavior::spacesIdWarnLimit ( )
static

Warn if more than this many Kokkos spaces are accessed.

This is disabled by default. You may control this at run time via the TPETRA_SPACE_ID_WARN_LIMIT environment variable.

Definition at line 677 of file Tpetra_Details_Behavior.cpp.

void Tpetra::Details::Behavior::reject_unrecognized_env_vars ( )
static

Search the environment for TPETRA_ variables and reject unrecognized ones.
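The search itself can be pictured as a prefix filter over the environment followed by a lookup in a list of known names. The sketch below is an illustration only: unrecognizedTpetraVars is a hypothetical helper, it takes the environment as a vector of "NAME=VALUE" strings rather than reading it from the process, and it merely collects the offending names, since this page does not say what form the rejection takes.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not Tpetra's implementation. `env` holds entries of
// the form "NAME=VALUE"; the result lists every name that starts with
// "TPETRA_" but is not in `recognized`.
std::vector<std::string> unrecognizedTpetraVars(
    const std::vector<std::string>& env,
    const std::set<std::string>& recognized) {
  const std::string prefix = "TPETRA_";
  std::vector<std::string> unknown;
  for (const std::string& entry : env) {
    // Everything before the first '=' is the variable name.
    const std::string name = entry.substr(0, entry.find('='));
    const bool hasPrefix = name.compare(0, prefix.size(), prefix) == 0;
    if (hasPrefix && recognized.count(name) == 0) {
      unknown.push_back(name);
    }
  }
  return unknown;
}
```

A check like this catches misspellings such as TPETRA_VERBSE, which would otherwise be silently ignored.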

Definition at line 393 of file Tpetra_Details_Behavior.cpp.


The documentation for this class was generated from the following files: