Amesos_Merikos.h
This file is out of date.  It has not been refactored to use Amesos_Status.

/*
Task list:
  Amesos_Merikos.h
  Amesos_Merikos.cpp
    Partition the matrix - store L as L^T?
    Build tree
    Initial data redistribution
    Change row and column ownership (pass them up to the parent)
  Amesos_Component_Solver.h
  Amesos_BTF.h
  Amesos_BTF.cpp



  Communications issues/challenges:
  ** Redistributing the original matrix to the arrowhead form that we need;
     options:
     1)  Two matrices: L^T and U
     2)  One matrix: U | L^T
     3)  An intermediate "fat" matrix - the way that I did ScaLAPACK
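
     Schematically, for a one-level bisection, the arrowhead form is
     (a sketch; the off-diagonal block names E_i and F_i are illustrative,
     while A_2, SC_0 and SC_1 follow the notation used below):

         [ A_0   0    F_0 ]
         [  0   A_1   F_1 ]
         [ E_0  E_1   A_2 ]

     Process 0 factors A_0 and forms its Schur complement contribution
     SC_0 from E_0, A_0 and F_0; process 1 does the same for SC_1; the
     parent then factors the updated trailing matrix A_2 + SC_0 + SC_1.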

  ** Adding the Schur complements SC_0 and SC_1 to the original
     trailing matrix A_2, owned by the parent:
     1)  Precompute the final size and shape of A_2 + SC_0 + SC_1
     2)  Perform A_2 + SC_0 + SC_1 in an empty matrix of size n by n
         and extract the non-empty rows.
     CHALLENGES:
     A)  Only process 0/1 knows the size and map for SC_0/SC_1
     B)  It would be nice to allow SC_0 and SC_1 to be sent as soon as
         they are available
     C)  It would be nice to have just one copy of the matrix on the
         parent.  Hence, it would be nice to know the shape of
         A_2 + SC_0 + SC_1 in advance.
     D)  An import would do the trick provided that we know both maps
         in advance.  But neither map is available to us in advance.
         The original map (which has SC_0 on process 0 and SC_1 on
         process 1) is not known.
     QUESTION:
     Should the maps be in some global address space or should they be
     in a local address space?
     I'd like to keep them in the global address space as long as possible,
     but we can't do the import of A_2 + SC_0 + SC_1 in a global
     address space because that would require a map that changes at each
     step.
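
     A minimal sketch of the export in challenge D, using Epetra's Export
     machinery (AddSchurComplement, SC and A2 are hypothetical names; it
     assumes challenge C is solved, i.e. A_2's sparsity pattern was
     precomputed to cover the entries of SC_0 and SC_1, and that both
     maps live in the same global address space):

         #include "Epetra_CrsMatrix.h"
         #include "Epetra_Export.h"
         #include "Epetra_CombineMode.h"

         // Sum one child's Schur complement contribution into the
         // parent's trailing matrix with a single Export.
         int AddSchurComplement(const Epetra_CrsMatrix& SC,
                                Epetra_CrsMatrix& A2)
         {
           // Moves data from the child's row distribution to the parent's.
           Epetra_Export exporter(SC.RowMap(), A2.RowMap());

           // Add the child's values into A2's existing entries.
           return A2.Export(SC, exporter, Add);
         }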

  ** Redistributing the right hand side vector, b:
     If we create a map that reflects the post-pivoting reality, assigning
     each row of U and each column of L to the process that owns the diagonal
     entry, we can redistribute the right hand side vector, b, to the
     processes where the values of b will first be used, in a single,
     efficient import operation.
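
     A minimal sketch of that single import (RedistributeB and PostPivotMap
     are hypothetical names; PostPivotMap is the post-pivoting map described
     above, assigning each row to the owner of its diagonal entry):

         #include "Epetra_Import.h"
         #include "Epetra_Map.h"
         #include "Epetra_MultiVector.h"

         // Move b onto the processes that will consume its entries first.
         Epetra_MultiVector* RedistributeB(const Epetra_MultiVector& b,
                                           const Epetra_Map& PostPivotMap)
         {
           Epetra_Import importer(PostPivotMap, b.Map());
           Epetra_MultiVector* bRedist =
               new Epetra_MultiVector(PostPivotMap, b.NumVectors());
           bRedist->Import(b, importer, Insert);
           return bRedist;
         }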


Observations:
1)  Although this algorithm is recursive, a non-recursive implementation
    might be cleaner.  If it is done recursively, it should be done in place,
    i.e. any data movement of the matrix itself should have been done in
    advance.
2)  There are two choices for the basic paradigm for parallelism.  Consider
    a two-level bisection of the matrix, yielding seven tasks or diagonal
    blocks: T0, T1, T01, T2, T3, T23 and T0123 (see the task-tree sketch at
    the end of these notes).  In both paradigms, T0, T1, T2 and T3 would each
    be handled by a single process.  Up until now, we have been assuming
    that T01 would be handled by processes 0 and/or 1 while T23 would be
    handled by processes 2 and/or 3.  The other option is to arrange the
    tasks/diagonal blocks as follows: T0, T1, T2, T3, T01, T23, T0123 and
    treat the last three blocks, T01, T23 and T0123, as a single block to be
    handled by all four processes.  This second paradigm includes an
    additional synchronization, but may allow a better partitioning of
    the remaining matrix because the resulting Schur complement is now
    known.  This improved partitioning will also improve the refactorization
    (i.e. pivotless factorization).  The second paradigm may also allow for
    better load balancing.  For example, after using recursive minimum
    degree bisection (or any other scheme) to partition the matrix, one could
    run a peephole optimization pass that would look for individual blocks
    that could be moved from the largest partition to a smaller one.  Finally,
    if it is clear that a given partition is going to be the slowest, it might
    make sense to shift some rows/columns off of it into the splitter just
    for load balancing.
3)  It seems possible that Merikos would be cleaner code if rows
    which are shared by multiple processes were split such that each row
    resides entirely within a given process.

4)  Support for pivotless refactorization is important.
5)  There is no mention of the required row and column permutations.
6)  Amesos_Merikos only needs to support the Amesos_Component interface if
    it will call itself recursively on the subblocks.
7)  Perhaps Amesos_Component.h should be an added interface.  Instead
    of replacing Amesos_BaseSolver.h, maybe it should add functionality
    to it.
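
   For reference, a sketch of the task tree from observation 2 (two-level
   bisection, seven tasks; leaves at the bottom):

               T0123          top separator; in paradigm 2, T01, T23 and
              /     \         T0123 form a single block handled by all
            T01     T23       four processes after one synchronization
           /   \   /   \
          T0   T1 T2   T3     one process each in both paradigms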
*/
// @HEADER
// ***********************************************************************
//
//                 Amesos: Direct Sparse Solver Package
//                 Copyright (2004) Sandia Corporation
//
// Under terms of Contract DE-AC04-94AL85000, there is a non-exclusive
// license for use of this work by or on behalf of the U.S. Government.
//
// This library is free software; you can redistribute it and/or modify
// it under the terms of the GNU Lesser General Public License as
// published by the Free Software Foundation; either version 2.1 of the
// License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful, but
// WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
// USA
// Questions? Contact Michael A. Heroux (maherou@sandia.gov)
//
// ***********************************************************************
// @HEADER

#ifndef _AMESOS_MERIKOS_H_
#define _AMESOS_MERIKOS_H_

#include "Amesos_ConfigDefs.h"
#include "Amesos_BaseSolver.h"
#include "Epetra_LinearProblem.h"
#include "Epetra_Time.h"
#ifdef EPETRA_MPI
#include "Epetra_MpiComm.h"
#else
#include "Epetra_Comm.h"
#endif
#include "Epetra_CrsGraph.h"

class Epetra_Map;
class Epetra_CrsMatrix;

//! Amesos_Merikos:  A parallel divide and conquer solver.
class Amesos_Merikos: public Amesos_BaseSolver {

public:

  //! Amesos_Merikos Constructor.
  Amesos_Merikos(const Epetra_LinearProblem& LinearProblem);

  //! Amesos_Merikos Destructor.
  ~Amesos_Merikos(void);

  //! Redistributes the matrix A.
  int RedistributeA();

  int ConvertToScalapack();

  //! Performs SymbolicFactorization on the matrix A.
  int SymbolicFactorization();

  //! Performs NumericFactorization on the matrix A.
  int NumericFactorization();

  //! Solves L X = B.
  int LSolve();

  //! Solves U X = B.
  int USolve();

  //! Solves A X = B.
  int Solve();

  //! Get a pointer to the Problem.
  const Epetra_LinearProblem *GetProblem() const { return(Problem_); };

  //! Returns true if MERIKOS can handle this matrix shape.
  bool MatrixShapeOK() const;

  //! SetUseTranspose() controls whether to compute AX=B or A^T X = B.
  int SetUseTranspose(bool UseTranspose);

  //! Returns the current UseTranspose setting.
  bool UseTranspose() const { return(UseTranspose_); };

  //! Returns a reference to the Epetra_Comm communicator associated with this matrix.
  const Epetra_Comm & Comm() const { return(GetProblem()->GetOperator()->Comm()); };

  //! Updates internal variables.
  int SetParameters( Teuchos::ParameterList &ParameterList );

  //! Returns the number of symbolic factorizations performed by this object.
  int NumSymbolicFact() const { return( NumSymbolicFact_ ); }

  //! Returns the number of numeric factorizations performed by this object.
  int NumNumericFact() const { return( NumNumericFact_ ); }

  //! Returns the number of solves performed by this object.
  int NumSolve() const { return( NumSolve_ ); }

  //! Print timing information.
  void PrintTiming();

  //! Print information about the factorization and solution phases.
  void PrintStatus();

protected:

  int PerformNumericFactorization();

  bool UseTranspose_;
  const Epetra_LinearProblem *Problem_;

  Epetra_CrsMatrix *L;
  Epetra_CrsMatrix *U;
  Epetra_CrsMatrix *FatOut_;

  Epetra_Time *Time_;

  int verbose_;
  int debug_;

  //  Some timing internals, copied from MUMPS:
  double ConTime_;    // time to convert to MERIKOS format
  double SymTime_;    // time for symbolic factorization
  double NumTime_;    // time for numeric factorization
  double SolTime_;    // time for solution
  double VecTime_;    // time to redistribute vectors
  double MatTime_;    // time to redistribute matrix

  int NumSymbolicFact_;
  int NumNumericFact_;
  int NumSolve_;

  //
  //  These allow us to use the ScaLAPACK-based Merikos code.
  //
  Epetra_Map *ScaLAPACK1DMap_;            // Points to a 1D map which matches a ScaLAPACK 1D
                                          //   blocked (not block cyclic) distribution
  Epetra_CrsMatrix *ScaLAPACK1DMatrix_;   // Points to a matrix with a ScaLAPACK 1D
                                          //   blocked (not block cyclic) distribution
  Epetra_Map *VectorMap_;                 // Points to a map for the vectors X and B
  std::vector<double> DenseA_;            // The data in ScaLAPACK 1D blocked format
  std::vector<int> Ipiv_;                 // ScaLAPACK pivot information

  //
  //  Control of the data distribution:
  //
  bool TwoD_distribution_;  // True if the 2D data distribution is used
  int grid_nb_;             // Row and column blocking factor (only used in the 2D distribution)
  int mypcol_;              // Process column in the ScaLAPACK 2D grid
  int myprow_;              // Process row in the ScaLAPACK 2D grid

  //
  //  Blocking factors (for both the 1D and 2D data distributions):
  //
  int nb_;
  int lda_;

  int iam_;
  int nprow_;
  int npcol_;

};  // End of class Amesos_Merikos

#endif /* _AMESOS_MERIKOS_H_ */
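
/*
  A minimal usage sketch, assuming the standard Amesos_BaseSolver calling
  sequence; the Epetra matrix and vectors A, x and b are placeholders that
  the caller must construct and fill.

      Epetra_LinearProblem Problem(&A, &x, &b);
      Amesos_Merikos Solver(Problem);

      Teuchos::ParameterList Params;
      Solver.SetParameters(Params);

      if (Solver.SymbolicFactorization() == 0 &&
          Solver.NumericFactorization() == 0)
        Solver.Solve();   // on success, x holds the solution of A x = b
*/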