<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<title></title>
<meta name="GENERATOR" content="OpenOffice.org 2.0 (Linux)">
<meta name="AUTHOR" content="Patrice Martinez">
<meta name="CREATED" content="20070925;8402200">
<meta name="CHANGED" content="16010102;0">
<style type="text/css">
        <!--
                @page { size: 21cm 29.7cm; margin: 2cm }
                P { margin-bottom: 0.21cm }
        --></style>
<p style="margin-left: 1.27cm; margin-bottom: 0cm;">Hello,</p>
<p style="margin-left: 1.27cm; margin-bottom: 0cm;">I am running into a
problem when running the Linpack benchmark with mvapich2 configured with
BLCR support: the results are sometimes right and sometimes wrong.<br>
Let me describe the context:<br>
</p>
<br>
<blockquote>
<blockquote><u>Hardware used:<br>
<br>
</u></blockquote>
</blockquote>
<ol>
<ol>
<li>
<p style="margin-bottom: 0cm;"> Bull Novascale R422, 2 x Xeon Core
2 Duo 5150 @ 2.66 GHz, 8 GB of RAM</p>
</li>
<li>
<p style="margin-bottom: 0cm;"> IB HCA Mellanox MT25208 dual-port</p>
</li>
</ol>
</ol>
<blockquote>
<blockquote><u>Software used:<br>
<br>
</u></blockquote>
</blockquote>
<ol>
<ol start="3">
<li>
<p style="margin-bottom: 0cm;"> RHEL4 U4, kernel 2.6.9-42.ELsmp</p>
</li>
<li>
<p style="margin-bottom: 0cm;"> gcc-3.4.6</p>
</li>
<li>
<p style="margin-bottom: 0cm;"> Intel MKL 9.1</p>
</li>
<li>
<p style="margin-bottom: 0cm;"> blcr-0.6.0</p>
</li>
<li>
<p style="margin-bottom: 0cm;"> mvapich2-1.0</p>
</li>
<li>
<p style="margin-bottom: 0cm;"> OFED-1.2.5.1</p>
</li>
<li>
<p style="margin-bottom: 0cm;"> linpack-9.1</p>
</li>
</ol>
<br>
<br>
</ol>
<blockquote>
<blockquote><u>Tests</u></blockquote>
</blockquote>
<blockquote>
<blockquote>
<p style="margin-bottom: 0cm;"><u><br>
</u></p>
</blockquote>
</blockquote>
<p style="margin-left: 1.27cm; margin-bottom: 0cm;">- For this test, the
two ports of the IB HCA are connected to each other.<br>
</p>
<p style="margin-left: 1.27cm; margin-bottom: 0cm;">- I created the
following symbolic link to avoid problems with forwarding environment
variables:<br>
</p>
<p style="margin-left: 1.27cm; margin-bottom: 0cm;"># ls -l
/lib64/libcr.so.0 <br>
lrwxrwxrwx 1 root root 23 Sep 21 11:19 /lib64/libcr.so.0 -&gt;
/usr/local/lib/libcr.so<br>
</p>
<p style="margin-left: 1.27cm; margin-bottom: 0cm;">- The BLCR kernel
modules are loaded:</p>
<p style="margin-left: 1.27cm; margin-bottom: 0cm;"><font
face="Utah MT, sans-serif"><font style="font-size: 11pt;" size="2">service
blcr start</font></font></p>
<p style="margin-left: 1.27cm; margin-bottom: 0cm;">- The mpd daemon is started:</p>
<p style="margin-left: 1.27cm; margin-bottom: 0cm;"><font
style="font-size: 11pt;" size="2">mpdboot --ncpus=4 </font>
</p>
<p style="margin-left: 1.27cm; margin-bottom: 0cm;">- Finally,
Linpack is configured to solve a small problem (N=5000) and is then
run:<br>
</p>
<p style="margin-left: 1.27cm; margin-bottom: 0cm;"> <font
style="font-size: 11pt;" size="2">mpiexec
-n 4 ./xhpl</font> <u><br>
</u></p>
<p style="margin-left: 1.27cm; margin-bottom: 0cm;"><u><br>
</u></p>
<p style="margin-left: 1.27cm; margin-bottom: 0cm;"><u>Analysis</u></p>
<br>
<p style="margin-left: 1.27cm; margin-bottom: 0cm;">Depending on the
P and Q parameters given in the HPL.dat file, the results are either
always right or always wrong.<br>
With P=4, Q=1:</p>
<pre style="margin-left: 1.27cm;">============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
W00C2L4         5000   112     4     1               4.28          1.948e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 25110713646301407346688.0000000 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 155458419119.8088379 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 17288875125.5442734 ...... FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 17973740643825.015625
||A||_oo . . . . . . . . . . . . . . . . . . . = 1283.266028
||A||_1  . . . . . . . . . . . . . . . . . . . = 1289.434188
||x||_oo . . . . . . . . . . . . . . . . . . . = 1459401545070.356201
||x||_1  . . . . . . . . . . . . . . . . . . . = 807634407595160.750000
============================================================================
</pre>
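<p style="margin-left: 1.27cm; margin-bottom: 0cm;">The three ratios of
the P=4, Q=1 run can be cross-checked against the norms printed in the
same output. The following is a minimal Python sketch, not HPL's own
code. It assumes that eps is the double-precision unit roundoff 2^-53
and that the pass threshold is 16.0 (the real threshold is whatever
HPL.dat specifies). The extra factor N in the third ratio is inferred
from the printed numbers, because the label alone does not reproduce the
reported value.</p>

```python
import math

EPS = math.ldexp(1.0, -53)   # assumption: double-precision unit roundoff
N = 5000                     # problem size of the run
THRESHOLD = 16.0             # assumption: the real limit is set in HPL.dat

# Norms copied from the P=4, Q=1 output.
RESID_OO = 17973740643825.015625   # ||Ax-b||_oo
A_OO = 1283.266028                 # ||A||_oo
A_1 = 1289.434188                  # ||A||_1
X_OO = 1459401545070.356201        # ||x||_oo
X_1 = 807634407595160.750000       # ||x||_1


def scaled_residuals(resid, a_1, a_oo, x_1, x_oo, n, eps=EPS):
    """Recompute the three scaled residuals from the printed norms."""
    r1 = resid / (eps * a_1 * n)
    r2 = resid / (eps * a_1 * x_1)
    # Inferred from the data: the reported third value is only matched
    # when the denominator also contains N.
    r3 = resid / (eps * a_oo * x_oo * n)
    return r1, r2, r3


ratios = scaled_residuals(RESID_OO, A_1, A_OO, X_1, X_OO, N)
for r in ratios:
    verdict = "FAILED" if r > THRESHOLD else "PASSED"
    print(f"{r:.7f} ...... {verdict}")
```

<p style="margin-left: 1.27cm; margin-bottom: 0cm;">Under these
assumptions the recomputed values agree with the reported ones, so the
FAILED verdicts follow directly from the very large ||Ax-b||_oo and
||x|| norms.</p>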
<p style="margin-left: 1.27cm; margin-bottom: 0cm;">With P=2, Q=2:</p>
<pre style="margin-left: 1.27cm;">============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
W00C2L4         5000   112     2     2               3.39          2.459e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0420265 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0277438 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0054156 ...... PASSED
============================================================================
</pre>
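<p style="margin-left: 1.27cm; margin-bottom: 0cm;">Both runs use the
same four processes, and only the P x Q grid differs. As an
illustration, the sketch below lists where each rank would sit in the
two grids. It assumes a row-major rank-to-grid mapping (in HPL this is
selected by the PMAP entry of HPL.dat, whose value is not shown in this
report), and the helper names are hypothetical.</p>

```python
def grid_coords(rank, p, q, row_major=True):
    """Return the (row, col) position of `rank` in a p x q process grid."""
    if rank not in range(p * q):
        raise ValueError("rank outside the process grid")
    if row_major:
        # Ranks fill one grid row after another.
        return divmod(rank, q)
    # Column-major: ranks fill one grid column after another.
    col, row = divmod(rank, p)
    return row, col


def layout(p, q, row_major=True):
    """List the grid position of every rank, in rank order."""
    return [grid_coords(rank, p, q, row_major) for rank in range(p * q)]


for p, q in [(4, 1), (2, 2)]:
    print(f"P={p} Q={q}:", layout(p, q))
```

<p style="margin-left: 1.27cm; margin-bottom: 0cm;">In the 4 x 1 grid
all four processes share a single grid column, whereas the 2 x 2 grid
has two processes in each row and each column. This only shows how the
two configurations differ in layout. It does not by itself identify the
cause of the wrong results.</p>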
<br>
Interestingly, the run that produces correct results is also the faster
one (3.39 s versus 4.28 s).<br>
<br>
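<p style="margin-left: 1.27cm; margin-bottom: 0cm;">The Gflops column
of the two runs can be reproduced from N and the wall time alone. The
sketch below assumes the usual HPL operation count of 2/3 N^3 + 3/2 N^2,
which is consistent with both printed values. The function name is
hypothetical.</p>

```python
def hpl_gflops(n, seconds):
    """Gflops for an N x N solve, assuming 2/3 N^3 + 3/2 N^2 operations."""
    flops = (2.0 / 3.0) * n ** 3 + (3.0 / 2.0) * n ** 2
    return flops / seconds / 1e9


# Times taken from the two runs reported above.
for label, seconds in [("P=4, Q=1", 4.28), ("P=2, Q=2", 3.39)]:
    print(label, f"{hpl_gflops(5000, seconds):.3e}")
```

<p style="margin-left: 1.27cm; margin-bottom: 0cm;">Since the rate
depends only on the elapsed time, the Gflops difference says nothing
more than the 4.28 s versus 3.39 s timing does.</p>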
With mvapich2 compiled without BLCR support, the results are always
correct.<br>
Any ideas?<br>
<br>
<pre class="moz-signature" cols="72">--
Cordialement/Best regards
Patrice Martinez
Linux Kernel Architect.
OFFICE : B1-405
PHONE : +33 (0)4 76 29 74 69
EMAIL : <a class="moz-txt-link-abbreviated"
href="mailto:Patrice.martinez@bull.net">Patrice.martinez@bull.net</a>
ADDR : BULL, 1 rue de Provence, BP 208, 38432 Echirolles Cedex, FRANCE
</pre>
</body>
</html>