I have done extensive research on similar problems in the past:
~50 species, Bott difference-based advection, Crank-Nicholson
diffusion, QSSA and EBI chemical solvers. That experience
is behind the MAQSIP-RT model we use for our air-quality
forecast work.
Based on that experience, I can give you this advice:
1) Subscript order is *really* important.
2) Relationship between loop order and subscript order is also
really important.
3) Subroutine-call structure can cost you some performance.
If you put species subscript as "fastest" (leftmost in Fortran), and
code all the processes with species-loop innermost, then you get at
least factor-of-three improvement over the species-subscript
"slowest"/species-loop outermost approach, in my experience. Note also
that having both the species and spatial loops visible at the same time
is also important; the structure you describe (where species loop
encloses subroutine calls that contain the spatial loops) prevents many
optimizations, and can easily cost factor-of-five performance. Or even
more-- MAQSIP-RT outperforms an equivalent STEM by a factor of 12 and
CAMx by a factor of 5, for typical problem-sizes...
You can find the notes for a seminar I presented to the US EPA's
Office of Research and Development in 2002 at URL
<
http://www.baronams.com/staff/coats/epa2002_seminar.html>
(The examples were deliberately chosen from met model code rather
than atmospheric chemistry-transport code, so as to avoid stepping
on anyone's toes...Sorry)
Carlie J. Coats, Jr.,Ph.D.
Chief Systems Architect
Baron Advanced Meteorological Systems, LLC.