<br><font size=2 face="sans-serif">Hello,</font>
<p><font size=2 face="sans-serif">I'm running on a relatively large cluster
(160 nodes, dual-core dual-socket) with IB connecting all nodes.</font>
<p><font size=2 face="sans-serif">I recompiled MVAPICH 0.9.8 because I
wanted to run under IBM's batch scheduler, LoadLeveler, and that worked
fine.</font>
<p><font size=2 face="sans-serif">The IB implementation is with Voltaire
PCIe adapters and I compiled MVAPICH using the "make.mvapich.gen2"
script with appropriate modifications. I'm using Pathscale compilers, for
example.</font>
<p><font size=2 face="sans-serif">With anything like a "reasonable"
number of nodes (sometimes with as few as 16, and always with 64 or more) I'm getting failures:</font>
<p><tt><font size=2>[chpcc022:14] Got completion with error, code=12, dest
rank=78 at line 397 in file viacheck.c</font></tt>
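<p><font size=2 face="sans-serif">In case it helps anyone sift through similar output, here is a minimal Python sketch that pulls the fields out of a line like the one above. It rests on assumptions of mine: that the number after the host name is the reporting rank, and that the code is a libibverbs work-completion status (enum ibv_wc_status), in which case 12 would be IBV_WC_RETRY_EXC_ERR. The names parse_completion_error and WC_STATUS_NAMES are my own and not part of MVAPICH.</font>

```python
import re

# Assumption: "code" is an ibv_wc_status value from libibverbs.
# Only a few values are listed; anything else is reported as "unknown".
WC_STATUS_NAMES = {
    0: "IBV_WC_SUCCESS",
    5: "IBV_WC_WR_FLUSH_ERR",
    12: "IBV_WC_RETRY_EXC_ERR",
    13: "IBV_WC_RNR_RETRY_EXC_ERR",
}

# Matches the error line format seen in my job output.
ERROR_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<src>\d+)\] Got completion with error, "
    r"code=(?P<code>\d+), dest\s+rank=(?P<dest>\d+) "
    r"at line (?P<line>\d+) in file (?P<file>\S+)"
)


def parse_completion_error(text):
    """Return the fields of a completion-error line, or None if no match."""
    match = ERROR_RE.search(text)
    if match is None:
        return None
    info = match.groupdict()
    # Convert the numeric fields from strings to integers.
    for key in ("src", "code", "dest", "line"):
        info[key] = int(info[key])
    # Attach the assumed symbolic name for the status code.
    info["status"] = WC_STATUS_NAMES.get(info["code"], "unknown")
    return info


if __name__ == "__main__":
    sample = ("[chpcc022:14] Got completion with error, code=12, dest "
              "rank=78 at line 397 in file viacheck.c")
    print(parse_completion_error(sample))
```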
<br>
<br><font size=2 face="sans-serif">I have now recompiled MVAPICH with -DON_DEMAND
and, at run time, I set VIADEV_CM_TIMEOUT=5000000.</font>
<p><font size=2 face="sans-serif">[REQUEST: the documentation does not make
the units for this parameter clear; I believe the value has to be specified
in microseconds.]</font>
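<p><font size=2 face="sans-serif">To make the units explicit, here is a minimal Python sketch of how I arrive at that value, assuming the parameter really is read as microseconds (my belief, not something the documentation confirms). The helper name cm_timeout_env is hypothetical and not part of MVAPICH; it only builds the environment entry and does not launch anything.</font>

```python
MICROSECONDS_PER_SECOND = 1_000_000


def cm_timeout_env(seconds):
    """Return the environment entry for a timeout given in seconds.

    Assumption: VIADEV_CM_TIMEOUT is interpreted as microseconds.
    """
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    microseconds = int(round(seconds * MICROSECONDS_PER_SECOND))
    return {"VIADEV_CM_TIMEOUT": str(microseconds)}


if __name__ == "__main__":
    # 5 seconds corresponds to the 5000000 used in my run.
    print(cm_timeout_env(5))
```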
<p><font size=2 face="sans-serif">My job now runs, but it is probably
running very badly; in due course I plan to lower this timeout value
(while keeping it above the default).</font>
<p><font size=2 face="sans-serif">For now I'm just looking for any comments,
ideas, experiences or advice.</font>
<p><font size=2 face="sans-serif">Gratefully received of course,</font>
<p><font size=2 face="sans-serif">Thanks,</font>
<p><font size=2 face="sans-serif">Jonathan Follows<br>
Deep Computing, Consulting I/T Specialist<br>
IBM UK, Manchester [Internal 487099]<br>
POST: c/o IBM UK Limited, NHBR-1PH, Portsmouth PO6 3AU<br>
Tel: (+44) 1619057099 FAX: (+44) 870 1385642<br>
Mobile: (+44) 7764660714 MOBX 273842<br>
E-mail: Jonathan_Follows@uk.ibm.com<br>
Text messaging: http://www.jonathanfollows.com/pageme.html<br>
</font>