>> 4) now I know what a "Profiler" is, and it has nothing to do with the
>> CIA or FBI.
>> The OS is Win XP.
>> 5) The g95 compiler has the following Option Synopsis:
>> g95 [-g] [-pg] ....Debug options; and
>> g95 [-O[n] ] ....Optimization level, n=0, 1,2,3
>> According to the ref. manual: "zero (default) means no optimization,
>> and 3 means more aggressive optimization, which makes the code
>> faster."
>
> -O and -O2 typically give the best performance. The most aggressive
> optimization turned on by -O3 is additional function inlining. g95,
> like gfortran, can only make limited use of the additional inlining.
>
>> 6) the program currently has NO optimization. Will use -O3 option
>> once I complete all the code improvements suggested by you and others.
>> I'll try to re-compile with:
>> ....g95 -pg -O3 -march=pentium4 -funroll-loops -o program program.for
>> and see what happens! Do you agree ?? (plse see item 19 below)
>
> -pg will absolutely kill performance. It inserts code to not only
> count the number of times a routine is called and its execution time,
> but also keep track of the call graph (i.e., the other routines that
> called the routine). -pg is simply a debugging aid. I'd suggest
> testing your code with the following sets of options:
>
> -O
> -O2
> -O -funroll-loops
> -O2 -funroll-loops
>
> You can also add -march=pentium4, but make sure you have the right
> arch.
>
>> 7) your suggestion reg picking a different compiler is a reasonable
>> one, yet I would like to stick with g95 for now.
>> There's a good chance that tweaking the code here and there, and
>> properly re-compiling with the optimization options (item 19 below)
>> would result in a faster code. How fast? Well, it's hard to tell in
>> advance, but it's worth trying! Agree ??
>
> One advantage of using more than one compiler is that each compiler
> has its own strengths and weaknesses in catching nonconforming
> code.
>
>> glen herrmannsfeldt:
>> 10) your advice is well taken that adding extra temporary variables for
>> the repeated subexpressions (as suggested earlier by Richard Maine,
>> item 1) might result in slower code.
>
> An issue that Richard hinted at that others have not discussed is
> the computation of polynomials in your code. IIRC, you have code
> of the form
>
> y = c0 + c1 * x + c2 * x**2 + c3 * x**3 + c4 * x**4
>
> It is usually a better idea to compute the above as
>
> y = c0 + x * (c1 + x * (c2 + x * (c3 + c4 * x)))
>
> The above re-arrangement of a polynomial is known as Horner's
> method. There are other algorithms.
>
>> 16) Thank you for your suggestion. The program currently has no
>> optimization. Once I complete the changes (items 1,2,3,8, etc.), I'll
>> re-compile the entire program with:
>> ....g95 -pg -O3 -march=pentium4 -funroll-loops -o program program.for
>> 17) Do you agree with the above options in the command line ??
>> Do they have to be in a certain order ?? (plse see item 19 below)
>
> Order doesn't matter. Drop the -pg if you aren't profiling the code.
>
>> linu:
>> 19) you're suggesting compilation with:
>> ....g95 -msse2 -mfpmath=sse -ftree-vectorize -O3
>> Are these options to be added to the command line (items 16 & 17
>> above)?
Steve;
>-pg will absolutely kill performance. -pg is simply a debugging aid. I'd suggest
>testing your code with the following sets of options:
>-O
>-O2
>-O -funroll-loops
>-O2 -funroll-loops
>You can also add -march=pentium4, but make sure you have the right arch.
Will do.
>An issue that Richard hinted at that others have not discussed is
>the computation of polynomials in your code. IIRC, you have code
>of the form:
> y = c0 + c1 * x + c2 * x**2 + c3 * x**3 + c4 * x**4
>It is usually a better idea to compute the above as
> y = c0 + x * (c1 + x * (c2 + x * (c3 + c4 * x)))
I'm sure I've used Horner's rearrangement throughout, but will
recheck to make sure.
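
A minimal free-form Fortran sketch of the two forms side by side, useful as a quick sanity check that a rearranged polynomial still gives the same value. The program name, the coefficient array c(0:4), and the value of x are made up for illustration and do not come from the actual program:

```fortran
program horner_demo
  implicit none
  ! Hypothetical coefficients and evaluation point, for illustration only.
  double precision, parameter :: c(0:4) = (/ 1.0d0, 2.0d0, 3.0d0, 4.0d0, 5.0d0 /)
  double precision :: x, y_direct, y_horner
  integer :: i

  x = 1.5d0

  ! Direct form: each higher-order term needs its own exponentiation.
  y_direct = c(0) + c(1)*x + c(2)*x**2 + c(3)*x**3 + c(4)*x**4

  ! Horner's form: start from the highest coefficient, then repeatedly
  ! multiply by x and add the next lower coefficient.
  y_horner = c(4)
  do i = 3, 0, -1
     y_horner = y_horner*x + c(i)
  end do

  ! The two results should agree to within rounding error.
  print *, 'direct :', y_direct
  print *, 'Horner :', y_horner
end program horner_demo
```

The loop form needs four multiplications and four additions for a degree-4 polynomial, and it generalizes to any degree by changing the array bounds. Saved as, say, horner_demo.f90, it should build with "g95 -O2 -o horner_demo horner_demo.f90".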
>Order (of options in the command line) doesn't matter.
Good!
>Check your version of g95. -ftree-vectorize may have no effect
>if you have a 4.0.x version of the backend.
It's g95 (Oct 2006) version 0.91.
So, basically I don't know! Will try the option anyway!
Thank you.
Monir