Intrepid2
Implementation of a general sum factorization algorithm, abstracted from the algorithm described by Mora and Demkowicz, for integration. Uses hierarchical parallelism.
#include <Intrepid2_IntegrationToolsDef.hpp>
Public Member Functions | |
F_Integrate (Data< Scalar, DeviceType > integralData, TensorData< Scalar, DeviceType > leftComponent, Data< Scalar, DeviceType > composedTransform, TensorData< Scalar, DeviceType > rightComponent, TensorData< Scalar, DeviceType > cellMeasures, int a_offset, int b_offset, int leftFieldOrdinalOffset, int rightFieldOrdinalOffset, bool forceNonSpecialized) | |
template<size_t maxComponents, size_t numComponents = maxComponents> | |
KOKKOS_INLINE_FUNCTION int | incrementArgument (Kokkos::Array< int, maxComponents > &arguments, const Kokkos::Array< int, maxComponents > &bounds) const |
KOKKOS_INLINE_FUNCTION int | incrementArgument (Kokkos::Array< int, Parameters::MaxTensorComponents > &arguments, const Kokkos::Array< int, Parameters::MaxTensorComponents > &bounds, const int &numComponents) const |
Runtime-sized variant of incrementArgument; used by approximateFlopCountPerCell(). | |
template<size_t maxComponents, size_t numComponents = maxComponents> | |
KOKKOS_INLINE_FUNCTION int | nextIncrementResult (const Kokkos::Array< int, maxComponents > &arguments, const Kokkos::Array< int, maxComponents > &bounds) const |
KOKKOS_INLINE_FUNCTION int | nextIncrementResult (const Kokkos::Array< int, Parameters::MaxTensorComponents > &arguments, const Kokkos::Array< int, Parameters::MaxTensorComponents > &bounds, const int &numComponents) const |
Runtime-sized variant of nextIncrementResult; used by approximateFlopCountPerCell(). | |
template<size_t maxComponents, size_t numComponents = maxComponents> | |
KOKKOS_INLINE_FUNCTION int | relativeEnumerationIndex (const Kokkos::Array< int, maxComponents > &arguments, const Kokkos::Array< int, maxComponents > &bounds, const int startIndex) const |
KOKKOS_INLINE_FUNCTION void | runSpecialized3 (const TeamMember &teamMember) const |
runSpecialized implementations are hand-coded variants of run() for a particular number of components. To allow comparisons with the generic implementation (both in terms of performance and for verification), we use the member variable forceNonSpecialized_ to determine whether runSpecialized is selected when a specialized implementation is available. | |
template<size_t numTensorComponents> | |
KOKKOS_INLINE_FUNCTION void | run (const TeamMember &teamMember) const |
KOKKOS_INLINE_FUNCTION void | operator() (const TeamMember &teamMember) const |
long | approximateFlopCountPerCell () const |
Returns an estimate of the number of floating-point operations per cell. Additions, subtractions, multiplications, and divisions each count as one operation. | |
int | teamSize (const int &maxTeamSizeFromKokkos) const |
Returns the team size that should be passed to the policy constructor, based on the maximum team size reported by Kokkos and the amount of thread parallelism available. | |
size_t | team_shmem_size (int team_size) const |
Returns the amount of team shared memory this functor requires for the given team size. | |
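The listing above gives only the signatures of incrementArgument, so the following is a minimal sketch of an odometer-style multi-index increment with the same shape, not the Intrepid2 implementation. It uses std::array in place of Kokkos::Array and omits KOKKOS_INLINE_FUNCTION so that it compiles without Kokkos. Two behaviors are assumptions not stated on this page: the last component varies fastest, and the return value is the index of the component that was incremented, or -1 when every component was already at its upper bound. The names incrementArgumentSketch and countEnumerated are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for incrementArgument (std::array instead of Kokkos::Array).
// Assumptions: the last component varies fastest; the return value is the index of
// the component that was incremented, or -1 if all components wrapped back to 0.
// Every entry of bounds is expected to be at least 1.
template <std::size_t maxComponents, std::size_t numComponents = maxComponents>
int incrementArgumentSketch(std::array<int, maxComponents>& arguments,
                            const std::array<int, maxComponents>& bounds)
{
  int r = static_cast<int>(numComponents) - 1;
  while (r >= 0 && arguments[r] + 1 >= bounds[r])
  {
    arguments[r] = 0;  // this component wraps; carry into the next slower component
    --r;
  }
  if (r >= 0) ++arguments[r];
  return r;
}

// Counts the multi-indices visited when iterating from all zeros until wrap-around.
template <std::size_t N>
int countEnumerated(const std::array<int, N>& bounds)
{
  std::array<int, N> arguments{};  // value-initialized to all zeros
  int count = 1;                   // the all-zeros multi-index itself
  while (incrementArgumentSketch<N>(arguments, bounds) >= 0) ++count;
  return count;
}
```

Under these assumptions, the returned index tells the caller which tensor component changed, so intermediate results that depend only on slower-varying components can be kept and only the faster-varying ones recomputed.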
Private Types | |
using | ExecutionSpace = typename DeviceType::execution_space |
using | TeamPolicy = Kokkos::TeamPolicy< ExecutionSpace > |
using | TeamMember = typename TeamPolicy::member_type |
using | IntegralViewType = Kokkos::View< typename RankExpander< Scalar, integralViewRank >::value_type, DeviceType > |
Private Attributes | |
IntegralViewType | integralView_ |
TensorData< Scalar, DeviceType > | leftComponent_ |
Data< Scalar, DeviceType > | composedTransform_ |
TensorData< Scalar, DeviceType > | rightComponent_ |
TensorData< Scalar, DeviceType > | cellMeasures_ |
int | a_offset_ |
int | b_offset_ |
int | leftComponentSpan_ |
int | rightComponentSpan_ |
int | numTensorComponents_ |
int | leftFieldOrdinalOffset_ |
int | rightFieldOrdinalOffset_ |
bool | forceNonSpecialized_ |
size_t | fad_size_output_ = 0 |
Kokkos::Array< int, 7 > | offsetsForComponentOrdinal_ |
Kokkos::Array< int, Parameters::MaxTensorComponents > | leftFieldBounds_ |
Kokkos::Array< int, Parameters::MaxTensorComponents > | rightFieldBounds_ |
Kokkos::Array< int, Parameters::MaxTensorComponents > | pointBounds_ |
Kokkos::Array< int, Parameters::MaxTensorComponents > | leftFieldRelativeEnumerationSpans_ |
Kokkos::Array< int, Parameters::MaxTensorComponents > | rightFieldRelativeEnumerationSpans_ |
int | maxFieldsLeft_ |
int | maxFieldsRight_ |
int | maxPointCount_ |
Definition at line 32 of file Intrepid2_IntegrationToolsDef.hpp.