Panzer
Version of the Day
panzer::HP Class Reference
Singleton class for accessing kokkos hierarchical parallelism parameters.
#include <Panzer_HierarchicParallelism.hpp>
Public Member Functions
void | overrideSizes (const int &team_size, const int &vector_size, const int &fad_vector_size, const bool force_override_safety=false)
void | resetSizes ()
Reset the sizes to default.
template<typename Scalar>
int | vectorSize () const
Returns the vector size. Specialized for AD scalar types.
void | setUseSharedMemory (const bool &use_shared_memory, const bool &fad_use_shared_memory)
Tell kokkos kernels if they should use shared memory. This is very problem dependent.
template<typename Scalar>
bool | useSharedMemory () const
template<typename ScalarT, typename... TeamPolicyProperties>
Kokkos::TeamPolicy<TeamPolicyProperties...> | teamPolicy (const int &league_size)
Returns a TeamPolicy for hierarchic parallelism.
template<typename ScalarT, typename... TeamPolicyProperties, typename ExecSpace>
Kokkos::TeamPolicy<ExecSpace, TeamPolicyProperties...> | teamPolicy (ExecSpace exec_space, const int &league_size)
Returns a TeamPolicy for hierarchic parallelism using an exec_space instance (for cuda streams).
Static Public Member Functions
static HP & | inst ()
Return singleton instance of this class.
Private Member Functions
HP ()
Private ctor.
Private Attributes
bool | use_auto_team_size_
If true, the team size is set with Kokkos::AUTO()
int | team_size_
User specified team size.
int | vector_size_
Default vector size for non-AD types.
int | fad_vector_size_
FAD vector size.
bool | use_shared_memory_
Use shared memory kokkos kernels for non-fad types.
bool | fad_use_shared_memory_
Use shared memory kokkos kernels for fad types.
Singleton class for accessing kokkos hierarchical parallelism parameters.
Definition at line 19 of file Panzer_HierarchicParallelism.hpp.
panzer::HP::HP () | private
Private ctor.
Definition at line 15 of file Panzer_HierarchicParallelism.cpp.
static HP & panzer::HP::inst () | static
Return singleton instance of this class.
Definition at line 33 of file Panzer_HierarchicParallelism.cpp.
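The combination of a private constructor and a static inst() accessor is the usual singleton arrangement. The following is a minimal sketch of that pattern only, assuming a function-local static instance; the class HPSketch and its setTeamSize/teamSize members are hypothetical stand-ins, not the actual panzer::HP implementation.

```cpp
#include <cassert>

// Hypothetical stand-in for panzer::HP; only the singleton mechanics are shown.
class HPSketch {
  int team_size_ = 0;  // illustrative state, not the real member layout

  HPSketch() = default;  // private: instances can only come from inst()

public:
  // Copying is disabled so that exactly one instance can exist.
  HPSketch(const HPSketch&) = delete;
  HPSketch& operator=(const HPSketch&) = delete;

  // Returns the single shared instance, constructed on first use.
  static HPSketch& inst() {
    static HPSketch instance;
    return instance;
  }

  void setTeamSize(int n) { team_size_ = n; }  // hypothetical setter
  int teamSize() const { return team_size_; }  // hypothetical getter
};
```

Every call to HPSketch::inst() refers to the same object, so a setting made in one place is visible to any later caller.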
void panzer::HP::overrideSizes (const int &team_size, const int &vector_size, const int &fad_vector_size, const bool force_override_safety = false)
Allows the user to override the Kokkos default team and vector sizes for kernel dispatch. The values are capped at hardware limits and rounded down to the nearest power of two.
If force_override_safety is true, the input values are set explicitly, without rounding down to the nearest power of two or capping at the hardware maximum.
team_size | Team size requested for hierarchic kernel |
vector_size | Vector size requested for hierarchic kernel for non-FAD scalar types |
fad_vector_size | Vector size requested for hierarchic kernel for FAD scalar types |
force_override_safety | Ignore the power of two and other checks |
Definition at line 49 of file Panzer_HierarchicParallelism.cpp.
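The capping and rounding described above can be sketched as follows. This is an illustration under stated assumptions, not the code in Panzer_HierarchicParallelism.cpp: the helper names roundDownToPowerOfTwo and sanitizeSize are hypothetical, and the hardware limit is passed in as a plain integer rather than queried from Kokkos.

```cpp
#include <algorithm>
#include <cassert>

// Largest power of two that is <= n (returns 1 for n <= 1).
inline int roundDownToPowerOfTwo(int n) {
  int p = 1;
  while (p <= n / 2) p *= 2;  // p * 2 <= n is guaranteed, so no overflow
  return p;
}

// Hypothetical helper: cap a requested size at hw_max, then round down to a
// power of two. With force_override_safety the request is returned unchanged.
inline int sanitizeSize(int requested, int hw_max,
                        bool force_override_safety = false) {
  if (force_override_safety) return requested;
  return roundDownToPowerOfTwo(std::min(requested, hw_max));
}
```

Under these assumptions, a request of 48 with a limit of 1024 becomes 32, a request of 2000 becomes 1024, and either request passes through unchanged when force_override_safety is true.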
void panzer::HP::resetSizes () | inline
Reset the sizes to default.
Definition at line 51 of file Panzer_HierarchicParallelism.hpp.
template<typename Scalar> int panzer::HP::vectorSize () const | inline
Returns the vector size. Specialized for AD scalar types.
NOTE: For hierarchic parallelism, if the same code is used for both Residual and Jacobian (as in most evaluators), the loop over the vector level is missing for Residual. For Jacobian, the loop is implemented internally in the AD types, where on CUDA the warp parallelizes over the derivative dimension. To prevent incorrect code, the vector size must be forced to 1 for non-AD scalar types. An eventual workaround is to use a SIMD data type with a similar hidden vector loop for Residual. In the meantime, this function returns the correct vector size of one for non-AD scalar types.
Definition at line 66 of file Panzer_HierarchicParallelism.hpp.
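The rule in the note (vector size 1 for non-AD scalars, the configured FAD vector size otherwise) can be sketched with a type trait. Everything named here is a hypothetical stand-in: FadSketch plays the role of an AD scalar type, IsADType the role of whatever trait the real code uses to detect one, and vectorSizeSketch takes the FAD vector size as an argument instead of reading it from the singleton.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a forward-mode AD scalar type.
struct FadSketch { double val; };

// Hypothetical trait: true only for AD scalar types.
template <typename Scalar> struct IsADType : std::false_type {};
template <> struct IsADType<FadSketch> : std::true_type {};

// Non-AD scalars always get a vector size of 1, because their kernels have
// no hidden vector-level loop. AD scalars get the configured FAD vector size.
template <typename Scalar>
int vectorSizeSketch(int fad_vector_size) {
  return IsADType<Scalar>::value ? fad_vector_size : 1;
}
```

With this sketch, vectorSizeSketch<double>(32) yields 1 while vectorSizeSketch<FadSketch>(32) yields 32.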
void panzer::HP::setUseSharedMemory (const bool &use_shared_memory, const bool &fad_use_shared_memory)
Tell kokkos kernels if they should use shared memory. This is very problem dependent.
If a panzer hierarchic kernel can use shared memory to speed up the calculation, it carries a second implementation that takes advantage of shared memory. Shared memory on the GPU is very limited. On some of the example problems, shared memory runs out if the basis is greater than order 2 on a hex mesh. Usage also depends strongly on the size of the derivative array: a large derivative array uses up memory much more quickly. By default, shared memory is always enabled for non-fad types and disabled for fad types, but this function can override those defaults for specific problems. For example, the adapters-stk/examples/MixedPoission problem can use shared memory for fad types for basis order 2 or less. It calls this function based on the basis order to improve performance.
Definition at line 73 of file Panzer_HierarchicParallelism.cpp.
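As an illustration of choosing the flags from the basis order, the sketch below mirrors the defaults described above in a small stand-in struct. The names SharedMemoryConfigSketch and configureForBasisOrder are hypothetical, the order-2 threshold is taken from the example in the text, and keeping shared memory enabled for non-fad types at every order is an assumption of this sketch rather than documented behavior.

```cpp
#include <cassert>

// Hypothetical stand-in holding the two flags described above.
struct SharedMemoryConfigSketch {
  bool use_shared_memory = true;       // default for non-fad types: enabled
  bool fad_use_shared_memory = false;  // default for fad types: disabled

  void setUseSharedMemory(bool non_fad, bool fad) {
    use_shared_memory = non_fad;
    fad_use_shared_memory = fad;
  }
};

// Hypothetical policy: enable shared memory for fad types only while the
// basis order is 2 or less. Non-fad types keep it enabled (an assumption).
inline void configureForBasisOrder(SharedMemoryConfigSketch& cfg,
                                   int basis_order) {
  const bool fad_fits = basis_order <= 2;
  cfg.setUseSharedMemory(true, fad_fits);
}
```

An application would make the equivalent call on the real singleton once, before dispatching any kernels, since the best setting depends on the problem.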
template<typename Scalar> bool panzer::HP::useSharedMemory () const | inline
Definition at line 92 of file Panzer_HierarchicParallelism.hpp.
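template<typename ScalarT, typename... TeamPolicyProperties> Kokkos::TeamPolicy<TeamPolicyProperties...> panzer::HP::teamPolicy (const int &league_size) | inline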
Returns a TeamPolicy for hierarchic parallelism.
Definition at line 99 of file Panzer_HierarchicParallelism.hpp.
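template<typename ScalarT, typename... TeamPolicyProperties, typename ExecSpace> Kokkos::TeamPolicy<ExecSpace, TeamPolicyProperties...> panzer::HP::teamPolicy (ExecSpace exec_space, const int &league_size) | inline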
Returns a TeamPolicy for hierarchic parallelism using an exec_space instance (for cuda streams).
Definition at line 112 of file Panzer_HierarchicParallelism.hpp.
bool panzer::HP::use_auto_team_size_ | private
If true, the team size is set with Kokkos::AUTO()
Definition at line 20 of file Panzer_HierarchicParallelism.hpp.
int panzer::HP::team_size_ | private
User specified team size.
Definition at line 21 of file Panzer_HierarchicParallelism.hpp.
int panzer::HP::vector_size_ | private
Default vector size for non-AD types.
Definition at line 22 of file Panzer_HierarchicParallelism.hpp.
int panzer::HP::fad_vector_size_ | private
FAD vector size.
Definition at line 23 of file Panzer_HierarchicParallelism.hpp.
bool panzer::HP::use_shared_memory_ | private
Use shared memory kokkos kernels for non-fad types.
Definition at line 24 of file Panzer_HierarchicParallelism.hpp.
bool panzer::HP::fad_use_shared_memory_ | private
Use shared memory kokkos kernels for fad types.
Definition at line 25 of file Panzer_HierarchicParallelism.hpp.