Original document: http://classic.chem.msu.su/gran/gamess/gcmcscf.pdf
Fast direct large-scale MCSCF code for Segmented and General Contraction Basis Sets
Alexander A. Granovsky

Laboratory of Chemical Cybernetics, M.V. Lomonosov Moscow State University, Moscow, Russia
September 14, 2005


Large-scale MCSCF
Main steps of MCSCF iteration ("unfolded two step" type)
Integral transformation
CI problem
DM1 & DM2 calculation
Orbital improvement
Multiple different strategies based on linear, quasi-linear, or quadratic minimization methods

Large basis sets, medium size active spaces
Performance limited by integral transformation

Large active spaces, small basis set
Performance limited by CI matrix diagonalization


Memory requirements
Integral transformation
$C \cdot N^3$ or $C \cdot N^2$, depending on the algorithm

CI matrix diagonalization:
$C \cdot N_{det}$

Orbital improvement
up to $C \cdot (N^2 + N_{det})^2$ ("folded" one- and two-step methods)
$C \cdot N^4$
$C \cdot N^3$
$C \cdot N^2$ (example: quasi-Newton type methods)
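
To give a feeling for these scalings, the sketch below converts element counts into memory sizes. It assumes a prefactor C of 1 and 8-byte double-precision storage; both are illustrative assumptions, as are the function names and the chosen basis size.

```python
def memory_gib(n_elements, bytes_per_element=8):
    """Memory in GiB for n_elements stored as 8-byte doubles (assumed)."""
    return n_elements * bytes_per_element / 2**30


def transformation_memory(n_basis):
    """Estimates for the two integral transformation scalings, with C = 1."""
    return {
        "N^3": memory_gib(n_basis**3),
        "N^2": memory_gib(n_basis**2),
    }


# Hypothetical basis set size chosen only for illustration.
estimate = transformation_memory(1000)
for scaling, gib in estimate.items():
    print(f"{scaling}: {gib:.4f} GiB")
```

For N = 1000 the cubic estimate is a factor of N larger than the quadratic one, which is why the quadratic variants matter for large basis sets.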


Classification of transformed 2-e integrals
Orbital types:
o - doubly occupied (core)
a - active space (valence)
v - virtual
p, q, r, s - arbitrary

(pq|rs) types:
(aa|aa) & Fock matrix - always required (CI step)
(aa|rs) - required for calculation of the diagonal part of the orbital Hessian and for quasi-Newton orbital improvement methods
(o+a,q|rs) - required for the full orbital Hessian and a true Newton-type orbital improvement step (integrals with three virtual indices are not needed)


Method selection for large-scale MCSCF
Memory requirements: $C \cdot N^2$ =>
A dedicated integral transformation code with low memory demands

Quasi-Newton orbital improvement step
Fast
Modest memory demands
Requires only a small subset of transformed integrals =>
simpler and more efficient integral transformation


Main problem
A special, efficient integral transformation code for (aa|rs)-type integrals is needed, with:
Quadratic memory demands
The ability to handle both segmented contraction (SC) and general contraction (GC) basis sets efficiently
High scalability in parallel mode


Integral transformation basics
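$(pq|rs) = \sum_{\mu\nu\lambda\sigma} C_{\mu p} C_{\nu q} C_{\lambda r} C_{\sigma s} (\mu\nu|\lambda\sigma)$
Usually considered as a sequence of four quarter-transformations:
$(p\nu|\lambda\sigma) = \sum_{\mu} C_{\mu p} (\mu\nu|\lambda\sigma)$
$(pq|\lambda\sigma) = \sum_{\nu} C_{\nu q} (p\nu|\lambda\sigma)$, etc.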

Alternative approach:
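$(pq|\lambda\sigma) = \sum_{\mu\nu} C_{\nu q} C_{\mu p} (\mu\nu|\lambda\sigma)$
$D(pq)_{\mu\nu} = C_{\nu q} C_{\mu p}$
$J(pq)_{\lambda\sigma} = (pq|\lambda\sigma) = \sum_{\mu\nu} D(pq)_{\mu\nu} (\mu\nu|\lambda\sigma)$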

Reminder: Fock matrix
$F_2(D) = J(D) - K(D)$
$J(D)_{\lambda\sigma} = \sum_{\mu\nu} (\mu\nu|\lambda\sigma) D_{\mu\nu}$
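
The equivalence of the two routes can be checked numerically. The following is a minimal pure-Python sketch, not the author's code: it uses random model integrals with eightfold symmetry in place of real ERIs, a random coefficient matrix, and hypothetical function names. It compares two sequential quarter-transformations with one Coulomb-like build J(pq) per active pair.

```python
import random


def model_eri(n, seed=1):
    """Random model AO integrals (mu nu|lam sig) with eightfold symmetry."""
    rng = random.Random(seed)
    unique = {}

    def value(m, v, l, s):
        # Canonical key: order indices inside bra and ket, then order the pairs.
        key = tuple(sorted((tuple(sorted((m, v))), tuple(sorted((l, s))))))
        if key not in unique:
            unique[key] = rng.uniform(-1.0, 1.0)
        return unique[key]

    r = range(n)
    return [[[[value(m, v, l, s) for s in r] for l in r] for v in r] for m in r]


def half_transform_sequential(g, c, n_act):
    """Two quarter-transformations: (p nu|lam sig), then (pq|lam sig)."""
    r, a = range(len(g)), range(n_act)
    t1 = [[[[sum(c[m][p] * g[m][v][l][s] for m in r) for s in r] for l in r]
           for v in r] for p in a]
    return [[[[sum(c[v][q] * t1[p][v][l][s] for v in r) for s in r] for l in r]
             for q in a] for p in a]


def half_transform_fock_like(g, c, n_act):
    """One Coulomb-like build J(pq) per pair, with D(pq)[mu][nu] = C[mu][p] C[nu][q]."""
    r, a = range(len(g)), range(n_act)
    out = [[None] * n_act for _ in a]
    for p in a:
        for q in a:
            d = [[c[m][p] * c[v][q] for v in r] for m in r]
            out[p][q] = [[sum(d[m][v] * g[m][v][l][s] for m in r for v in r)
                          for s in r] for l in r]
    return out


n_basis, n_active = 4, 2  # tiny illustrative sizes
rng = random.Random(7)
coeff = [[rng.uniform(-1.0, 1.0) for _ in range(n_active)] for _ in range(n_basis)]
eri = model_eri(n_basis)
seq = half_transform_sequential(eri, coeff, n_active)
fock = half_transform_fock_like(eri, coeff, n_active)
max_diff = max(abs(seq[p][q][l][s] - fock[p][q][l][s])
               for p in range(n_active) for q in range(n_active)
               for l in range(n_basis) for s in range(n_basis))
print("max difference between the two routes:", max_diff)
```

The Fock-like route never holds a three-index intermediate such as `t1`; it only needs one N x N density-like matrix and one N x N result per active pair.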


Approach comparison
Standard approach (four sequential quartertransformations):
Asymptotically $n_a N^2$ operations
Straightforward to utilize the eightfold permutation symmetry of ERIs
$N^3$ memory demands
Limited parallel scalability

Alternative approach:
Asymptotically $n_a^2 N^2$ operations
Straightforward to utilize the eightfold permutation symmetry of ERIs
$N^2$ memory demands
High degree of scalability
Implementation based on our direct Fock matrix construction code


Alternative approach: pros and cons
Pros
For small active spaces, $n_a$ is small => the additional overhead due to the worse asymptotic scaling can be neglected, as the dominant part of the calculation is the evaluation of ERIs in the AO basis
Modest memory requirements
Allows direct generalization to the GC case, based on our approach to Fock matrix construction for GC-type basis sets
High level of intrinsic parallelism

Cons
For larger active spaces, $n_a^2$ is significantly larger than $n_a$ => the additional overhead due to the different asymptotic scaling is considerable
For GC-type basis sets, the additional overhead is even more serious when using our strategy of Fock-like matrix builds


Optimal strategy
Small active spaces:
use the alternative approach for both SC and GC-type basis sets

Larger active spaces:
use something else (but not the standard approach in its straightforward implementation)


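Modification of the standard approach
Why does the standard approach require so much memory?
Because it utilizes the eightfold permutation symmetry of ERIs, so each unique integral contributes to four different quarter-transformed integrals:
$C_{\mu p} (\mu\nu|\lambda\sigma) \to (p\nu|\lambda\sigma)$
$C_{\nu p} (\mu\nu|\lambda\sigma) \to (p\mu|\lambda\sigma)$
$C_{\lambda p} (\mu\nu|\lambda\sigma) \to (p\sigma|\mu\nu)$
$C_{\sigma p} (\mu\nu|\lambda\sigma) \to (p\lambda|\mu\nu)$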

Solution:
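use only the fourfold permutation symmetry
$C_{\mu p} (\mu\nu|\lambda\sigma) \to (p\nu|\lambda\sigma)$
$C_{\nu p} (\mu\nu|\lambda\sigma) \to (p\mu|\lambda\sigma)$
Compute $(p\nu|\lambda\sigma)$ for all $\nu$ and fixed $\lambda\sigma$, then complete the half-transformation with a matrix multiplication, $(pq|\lambda\sigma) = \sum_{\nu} C_{\nu q} (p\nu|\lambda\sigma)$ ($\lambda\sigma$ fixed), and store the result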
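
The fixed-ket scheme above can be sketched as follows. This is an illustrative pure-Python model under stated assumptions, not the production code: `model_eri` is a hypothetical stand-in for on-the-fly ERI evaluation, the coefficient matrix is arbitrary, and all names are invented for the example. For each ket pair with lambda >= sigma, only bra pairs with mu >= nu are evaluated, each integral is scattered to two targets, and the working buffer is a single n_a x N block.

```python
import math


def pair_index(i, j):
    """Triangular index of an unordered pair, symmetric in i and j."""
    return max(i, j) * (max(i, j) + 1) // 2 + min(i, j)


def model_eri(m, v, l, s):
    """Hypothetical stand-in for an ERI routine; has eightfold symmetry."""
    b, k = pair_index(m, v), pair_index(l, s)
    return math.cos(b + k) + 0.1 * b * k


def half_transform_fixed_ket(eri, c, n, n_act):
    """(pq|lam sig) for all lam >= sig, using only fourfold symmetry."""
    result, n_eri = {}, 0
    for l in range(n):
        for s in range(l + 1):                      # ket symmetry: lam >= sig
            t = [[0.0] * n for _ in range(n_act)]   # (p nu|lam sig), lam sig fixed
            for m in range(n):
                for v in range(m + 1):              # bra symmetry: mu >= nu
                    x = eri(m, v, l, s)
                    n_eri += 1
                    for p in range(n_act):
                        t[p][v] += c[m][p] * x      # C[mu][p] (mu nu|..) -> (p nu|..)
                        if m != v:
                            t[p][m] += c[v][p] * x  # C[nu][p] (mu nu|..) -> (p mu|..)
            # Second step: matrix multiplication over nu, then store the block.
            result[l, s] = [[sum(c[v][q] * t[p][v] for v in range(n))
                             for q in range(n_act)] for p in range(n_act)]
    return result, n_eri


def reference(c, n, p, q, l, s):
    """Brute-force (pq|lam sig) without any symmetry, for checking."""
    return sum(c[m][p] * c[v][q] * model_eri(m, v, l, s)
               for m in range(n) for v in range(n))


n_basis, n_active = 4, 2  # tiny illustrative sizes
coeff = [[math.sin(1 + m + 2 * p) for p in range(n_active)] for m in range(n_basis)]
blocks, n_evaluated = half_transform_fixed_ket(model_eri, coeff, n_basis, n_active)
max_diff = max(abs(blocks[l, s][p][q] - reference(coeff, n_basis, p, q, l, s))
               for (l, s) in blocks
               for p in range(n_active) for q in range(n_active))
n_pairs = n_basis * (n_basis + 1) // 2
n_unique_eightfold = n_pairs * (n_pairs + 1) // 2
print("ERIs evaluated:", n_evaluated, "unique with eightfold symmetry:", n_unique_eightfold)
```

With N = 4 the loop evaluates 100 integrals, while eightfold symmetry would need only 55 unique ones; this is the reevaluation overhead that is traded for the smaller working buffer.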


Modified vs. standard approach
Larger overhead due to ERI reevaluation
Not significant for large active spaces

Requires much less memory (the same amount as the alternative approach)
Has the same parallel scaling properties as the alternative approach
Has the same good $n_a N^2$ asymptotic operation count as the standard approach
Allows efficient generalization to GC-type basis sets, based on our approach to Fock matrix construction


Generalization for GC basis sets
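$(pq|rs) = \sum_{\mu\nu\lambda\sigma} C_{\mu p} C_{\nu q} C_{\lambda r} C_{\sigma s} (\mu\nu|\lambda\sigma)$
$(\mu\nu|\lambda\sigma) = \sum_{MNLS} C_{M\mu} C_{N\nu} C_{L\lambda} C_{S\sigma} (MN|LS)$
=> the new transformation matrix is simply the product C*C of the contraction coefficient matrix and the MO coefficient matrix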

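$(pq|rs) = \sum_{\mu\nu\lambda\sigma} C_{\mu p} C_{\nu q} C_{\lambda r} C_{\sigma s} \sum_{MNLS} C_{M\mu} C_{N\nu} C_{L\lambda} C_{S\sigma} (MN|LS)$
$(pq|rs) = \sum_{MNLS} \left(\sum_{\mu} C_{M\mu} C_{\mu p}\right) \left(\sum_{\nu} C_{N\nu} C_{\nu q}\right) \left(\sum_{\lambda} C_{L\lambda} C_{\lambda r}\right) \left(\sum_{\sigma} C_{S\sigma} C_{\sigma s}\right) (MN|LS)$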

This is not efficient for the standard approach, as $N$ would be replaced by the much larger $N_{prim}$, dramatically increasing memory demands and computational cost
It is much more efficient for the modified approach and MCSCF, due to:
the different memory asymptotics
the small values of $n_a$ required for the MCSCF integral transformation
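
The statement that the combined transformation matrix is the product C*C can be verified on a toy model. The sketch below is illustrative only: the primitive integrals, the contraction matrix `contraction`, the MO matrix `mo_coeff`, and all function names are hypothetical. It compares the two-stage route (primitives to contracted AOs to MOs) with a single transformation that uses the product matrix.

```python
from itertools import product
import math


def transform4(ints, coef, n_in, n_out):
    """Brute-force four-index transformation of integrals stored in a dict."""
    out = {}
    for p, q, r, s in product(range(n_out), repeat=4):
        out[p, q, r, s] = sum(
            coef[a][p] * coef[b][q] * coef[c][r] * coef[d][s] * ints[a, b, c, d]
            for a, b, c, d in product(range(n_in), repeat=4))
    return out


def matmul(x, y):
    """Plain matrix product of two nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


n_prim, n_ao, n_act = 4, 3, 2  # tiny illustrative sizes

# Hypothetical model data: primitive integrals (MN|LS), contraction
# coefficients C[M][mu], and MO coefficients C[mu][p].
prim = {idx: math.cos(idx[0] + idx[1] + 2 * (idx[2] + idx[3]))
        for idx in product(range(n_prim), repeat=4)}
contraction = [[math.sin(1 + m + 3 * mu) for mu in range(n_ao)] for m in range(n_prim)]
mo_coeff = [[math.cos(1 + mu + 2 * p) for p in range(n_act)] for mu in range(n_ao)]

# Route 1: contract primitives to AOs, then transform AOs to MOs.
ao = transform4(prim, contraction, n_prim, n_ao)
mo_two_stage = transform4(ao, mo_coeff, n_ao, n_act)

# Route 2: a single transformation with the product matrix.
combined = matmul(contraction, mo_coeff)
mo_direct = transform4(prim, combined, n_prim, n_act)

max_diff = max(abs(mo_two_stage[k] - mo_direct[k]) for k in mo_direct)
print("max difference between the two routes:", max_diff)
```

In route 2 the transformation index runs over primitives, so every N-dependent cost of the chosen algorithm is governed by the number of primitives instead.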


Thank you for your attention!