Singleton class for accessing Kokkos hierarchical parallelism parameters.

#include <Panzer_HierarchicParallelism.hpp>

Public Member Functions
void	overrideSizes (const int &team_size, const int &vector_size, const int &fad_vector_size, const bool force_override_safety=false)

void	resetSizes ()
	Reset the sizes to default.

template<typename Scalar >
int	vectorSize () const
	Returns the vector size. Specialized for AD scalar types.

void	setUseSharedMemory (const bool &use_shared_memory, const bool &fad_use_shared_memory)
	Tell Kokkos kernels whether they should use shared memory. The best choice is highly problem dependent.

template<typename Scalar >
bool	useSharedMemory () const

template<typename ScalarT , typename... TeamPolicyProperties>
Kokkos::TeamPolicy<TeamPolicyProperties...>	teamPolicy (const int &league_size)
	Returns a TeamPolicy for hierarchic parallelism.

template<typename ScalarT , typename... TeamPolicyProperties, typename ExecSpace >
Kokkos::TeamPolicy< ExecSpace, TeamPolicyProperties...>	teamPolicy (ExecSpace exec_space, const int &league_size)
	Returns a TeamPolicy for hierarchic parallelism using an exec_space instance (for example, to launch on a CUDA stream).

Static Public Member Functions
static HP &	inst ()
	Return singleton instance of this class.

Private Member Functions
	HP ()
	Private ctor.

Private Attributes
bool	use_auto_team_size_
	If true, the team size is set with Kokkos::AUTO().

int	team_size_
	User specified team size.

int	vector_size_
	Default vector size for non-AD types.

int	fad_vector_size_
	FAD vector size.

bool	use_shared_memory_
	Use shared memory Kokkos kernels for non-FAD types.

bool	fad_use_shared_memory_
	Use shared memory Kokkos kernels for FAD types.

Detailed Description

Singleton class for accessing Kokkos hierarchical parallelism parameters.

Definition at line 19 of file Panzer_HierarchicParallelism.hpp.

Constructor & Destructor Documentation

panzer::HP::HP ( )

private

Private ctor.

Definition at line 15 of file Panzer_HierarchicParallelism.cpp.

Member Function Documentation

HP & panzer::HP::inst ( )

static

Return singleton instance of this class.

Definition at line 33 of file Panzer_HierarchicParallelism.cpp.
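A private constructor paired with a static accessor such as inst() is commonly implemented with a function-local static (a "Meyers singleton"). The sketch below assumes that pattern; the class HPSketch and its teamSize()/setTeamSize() members are hypothetical, and the actual Panzer implementation may store its instance differently.

```cpp
#include <cassert>

// Hypothetical stand-in for panzer::HP, illustrating the singleton pattern only.
class HPSketch {
public:
  // Returns the one and only instance. Since C++11, a function-local static
  // is initialized exactly once, even if inst() is first called concurrently.
  static HPSketch& inst() {
    static HPSketch instance;
    return instance;
  }

  // Copying would create a second instance, so it is forbidden.
  HPSketch(const HPSketch&) = delete;
  HPSketch& operator=(const HPSketch&) = delete;

  int teamSize() const { return team_size_; }
  void setTeamSize(int team_size) { team_size_ = team_size; }

private:
  // Private constructor: only inst() can create the object.
  HPSketch() : team_size_(-1) {}

  int team_size_;  // hypothetical default value, for illustration only
};
```

Every call to HPSketch::inst() returns a reference to the same object, so a setting made in one place (for example by a driver) is visible to every kernel that later queries it.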

void panzer::HP::overrideSizes	(	const int &	team_size,
		const int &	vector_size,
		const int &	fad_vector_size,
		const bool	force_override_safety = false
	)

Allows the user to override the Kokkos default team and vector sizes for kernel dispatch. The values will be capped by hardware limits and rounded down to the nearest power of two.

The final argument, force_override_safety, forces the input values to be used exactly as given, without rounding down to the nearest power of two or capping at the hardware maximum.

Parameters

team_size	Team size requested for hierarchic kernel
vector_size	Vector size requested for hierarchic kernel for non-FAD scalar types
fad_vector_size	Vector size requested for hierarchic kernel for FAD scalar types
force_override_safety	If true, bypass the power-of-two rounding, the hardware-limit cap, and other safety checks

Definition at line 49 of file Panzer_HierarchicParallelism.cpp.
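The capping and rounding described above can be sketched as a small helper. The function name clampToPowerOfTwo and the explicit hw_max argument are hypothetical; Panzer obtains its limits from Kokkos, and its exact rules (for example, for non-positive requests) may differ.

```cpp
#include <cassert>

// Hypothetical helper: cap a requested size at a hardware maximum, then round
// it down to the nearest power of two. With force_override_safety set, the
// request is returned unchanged, mirroring the documented overrideSizes() flag.
int clampToPowerOfTwo(int requested, int hw_max, bool force_override_safety = false) {
  assert(requested > 0 && hw_max > 0);
  if (force_override_safety)
    return requested;
  const int capped = requested < hw_max ? requested : hw_max;
  int pow2 = 1;
  // Keep doubling while the next power of two still fits under the cap.
  while (pow2 <= capped / 2)
    pow2 *= 2;
  return pow2;
}
```

For a request of 48 with a limit of 1024, the helper returns 32; a request of 2048 is capped to 1024; with force_override_safety set, 48 is returned as is.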

void panzer::HP::resetSizes ( )

inline

Reset the sizes to default.

Definition at line 51 of file Panzer_HierarchicParallelism.hpp.

template<typename Scalar >

int panzer::HP::vectorSize ( ) const

inline

Returns the vector size. Specialized for AD scalar types.

NOTE: For hierarchic parallelism, most evaluators use the same code for both the Residual and the Jacobian. For the Jacobian, the vector-level loop is implemented internally by the AD types (on CUDA, the warp parallelizes over the derivative dimension), but for the Residual this loop is missing. To prevent incorrect results, the vector size must be forced to 1 for non-AD scalar types. An eventual workaround is to use a SIMD data type with a similar hidden vector loop for the Residual. In the meantime, this function returns the correct vector size of 1 for non-AD types.

Definition at line 66 of file Panzer_HierarchicParallelism.hpp.
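The rule in the note above (a vector size of 1 for non-AD scalars, the FAD vector size for AD scalars) can be expressed with a type trait. This sketch defines its own trait, IsADTypeSketch, and a placeholder type FakeFad; both are hypothetical stand-ins for the Sacado AD types Panzer works with, and VectorSizeSketch is not the Panzer class.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical placeholder for a forward-mode AD scalar (e.g. a FAD type).
struct FakeFad { double val; };

// Hypothetical trait: true only for AD scalar types.
template <typename T> struct IsADTypeSketch : std::false_type {};
template <> struct IsADTypeSketch<FakeFad> : std::true_type {};

class VectorSizeSketch {
public:
  explicit VectorSizeSketch(int fad_vector_size) : fad_vector_size_(fad_vector_size) {}

  // Non-AD scalars (Residual) have no hidden vector-level loop, so the vector
  // size is forced to 1. AD scalars (Jacobian) parallelize over the derivative
  // dimension inside the type, so they use the FAD vector size.
  template <typename Scalar>
  int vectorSize() const {
    if constexpr (IsADTypeSketch<Scalar>::value)
      return fad_vector_size_;
    else
      return 1;
  }

private:
  int fad_vector_size_;
};
```

Because the branch is resolved at compile time with if constexpr (C++17), each kernel instantiation sees a constant vector size for its scalar type.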

void panzer::HP::setUseSharedMemory	(	const bool &	use_shared_memory,
		const bool &	fad_use_shared_memory
	)

Tell Kokkos kernels whether they should use shared memory. The best choice is highly problem dependent.

If a Panzer hierarchic kernel can use shared memory to speed up the calculation, it carries a second implementation that takes advantage of shared memory. Shared memory on the GPU is very limited: on some of the example problems, it runs out if the basis order is greater than 2 on a hex mesh. Usage also depends strongly on the size of the derivative array, since a large derivative array uses up shared memory much more quickly.

By default, shared memory is always enabled for non-FAD types and disabled for FAD types; this function can override these defaults for specific problems. For example, the adapters-stk/examples/MixedPoisson problem can use shared memory for FAD types when the basis order is 2 or less, so it calls this function based on the basis order to improve performance.

Definition at line 73 of file Panzer_HierarchicParallelism.cpp.
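A kernel that carries two implementations might select between them from the flags set by setUseSharedMemory(), and a driver can set those flags from the basis order, in the spirit of the MixedPoisson example described above. Everything below is a hypothetical sketch: SharedMemFlagsSketch, evaluateKernel, configureForBasisOrder, and the FakeFad placeholder are illustrations, not Panzer code; only the defaults and the order-2 threshold come from the text above.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct FakeFad { double val; };  // hypothetical placeholder for a FAD scalar type

class SharedMemFlagsSketch {
public:
  // Mirrors setUseSharedMemory(): one flag for non-FAD, one for FAD types.
  void setUseSharedMemory(bool use_shared_memory, bool fad_use_shared_memory) {
    use_shared_memory_ = use_shared_memory;
    fad_use_shared_memory_ = fad_use_shared_memory;
  }

  // Hypothetical: only FakeFad is treated as a FAD type here.
  template <typename Scalar>
  bool useSharedMemory() const {
    return std::is_same_v<Scalar, FakeFad> ? fad_use_shared_memory_ : use_shared_memory_;
  }

private:
  bool use_shared_memory_ = true;       // documented default for non-FAD types
  bool fad_use_shared_memory_ = false;  // documented default for FAD types
};

// Hypothetical evaluator entry point choosing between its two implementations.
template <typename Scalar>
std::string evaluateKernel(const SharedMemFlagsSketch& hp) {
  return hp.useSharedMemory<Scalar>() ? "shared-memory kernel" : "default kernel";
}

// Hypothetical driver logic: enable shared memory for FAD types only when the
// basis order is low enough for the derivative arrays to fit.
void configureForBasisOrder(SharedMemFlagsSketch& hp, int basis_order) {
  hp.setUseSharedMemory(true, basis_order <= 2);
}
```

Keeping the decision in a single global setting lets each evaluator stay agnostic of the problem size while the driver, which knows the basis order, makes the call.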

template<typename Scalar >

bool panzer::HP::useSharedMemory ( ) const

inline

Definition at line 92 of file Panzer_HierarchicParallelism.hpp.

template<typename ScalarT , typename... TeamPolicyProperties>

Kokkos::TeamPolicy<TeamPolicyProperties...> panzer::HP::teamPolicy ( const int & league_size )

inline

Returns a TeamPolicy for hierarchic parallelism.

Definition at line 99 of file Panzer_HierarchicParallelism.hpp.
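Conceptually, teamPolicy<ScalarT>() combines the caller's league_size with the stored team size (or Kokkos::AUTO() when use_auto_team_size_ is set) and vectorSize<ScalarT>(). The sketch below models that combination without depending on Kokkos, under the assumption that this is how the sizes are combined: PolicyParamsSketch is a hypothetical stand-in for Kokkos::TeamPolicy, std::nullopt stands for Kokkos::AUTO(), and FakeFad is a placeholder AD type. With Kokkos available, the corresponding construction would be along the lines of Kokkos::TeamPolicy<>(league_size, Kokkos::AUTO(), vector_size).

```cpp
#include <cassert>
#include <optional>
#include <type_traits>

struct FakeFad { double val; };  // hypothetical placeholder AD scalar type

// Hypothetical stand-in for the TeamPolicy constructor arguments.
// An empty team_size plays the role of Kokkos::AUTO().
struct PolicyParamsSketch {
  int league_size;
  std::optional<int> team_size;
  int vector_size;
};

class TeamPolicySketch {
public:
  TeamPolicySketch(bool use_auto_team_size, int team_size, int fad_vector_size)
    : use_auto_team_size_(use_auto_team_size),
      team_size_(team_size),
      fad_vector_size_(fad_vector_size) {}

  // Non-AD scalars get a vector size of 1, as described for vectorSize().
  template <typename ScalarT>
  int vectorSize() const {
    return std::is_same_v<ScalarT, FakeFad> ? fad_vector_size_ : 1;
  }

  template <typename ScalarT>
  PolicyParamsSketch teamPolicy(int league_size) const {
    std::optional<int> team;  // empty means "let Kokkos choose" (AUTO)
    if (!use_auto_team_size_)
      team = team_size_;
    return {league_size, team, vectorSize<ScalarT>()};
  }

private:
  bool use_auto_team_size_;
  int team_size_;
  int fad_vector_size_;
};
```

The league size comes from the caller (typically the number of cells in a workset), while the team and vector sizes come from the singleton's stored settings.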

template<typename ScalarT , typename... TeamPolicyProperties, typename ExecSpace >