From vera_wx_cn at yahoo.com.cn Tue Sep 2 22:30:31 2008
From: vera_wx_cn at yahoo.com.cn (=?gb2312?q?=C7=BF=20=C2=ED?=)
Date: Tue Sep 2 22:30:50 2008
Subject: [mvapich-discuss] the kernel panic
Message-ID: <867459.72566.qm@web15301.mail.cnb.yahoo.com>

Hello.

I run my NAS programs with mvapich-1.0 (gen2) on an IA64 cluster. The kernel now panics every time, but the programs run well with mvapich-1.0 (tcp).

ibstat shows:
CA 'mthca0'
    CA type: MT25204
    Number of ports: 1
    Firmware version: 1.1.0
    Hardware version: a0

Panic information:

Kernel panic - not syncing: arch/ia64/hp/common/sba_iommu.c: I/O MMU @ c0000000fed01000 is out of mapping resources
kernel BUG at kernel/panic.c:75!
ft.C.4[3367]: bugcheck! 0 [1]
Modules linked in: blcr(U) blcr_vmadump(U) blcr_imports(U) nfs(U) lockd(U) nfs_acl(U) osc(U) mgc(U) lustre(U) lov(U) lquota(U) mdc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) rdma_cm(U) netconsole(U) ib_addr(U) netdump(U) md5(U) ipv6(U) parport_pc(U) lp(U) parport(U) autofs4(U) ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U) sunrpc(U) ib_ipoib(U) ds(U) yenta_socket(U) pcmcia_core(U) vfat(U) fat(U) dm_mirror(U) dm_multipath(U) dm_mod(U) button(U) ib_mthca(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) tg3(U) ext3(U) jbd(U) mptscsih(U) mptfc(U) mptsas(U) mptspi(U) mptscsi(U) mptbase(U) usb_storage(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U) sd_mod(U) scsi_mod(U)
Pid: 3367, CPU 3, comm: ft.C.4
psr : 0000101008122030 ifs : 8000000000000814 ip : [] Tainted: GF
ip is at panic+0x5f0/0x6a0
unat: 0000000000000000 pfs : 0000000000000814 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr : fa0166a6855a59a9
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a000000100077410 b6 : a00000010025ebe0 b7 : a00000010025ebe0
f6 : 1003e00000000000000a0 f7 : 1003e0000000000000001
f8 : 1003e00000000000000a0 f9 : 10002a000000000000000
f10 : 0fffeb33333332fa80000 f11 : 1003e0000000000000000
r1 : a0000001009cc240 r2 : 000000000005bac7 r3 : a0000001007cc898
r8 : 0000000000000021 r9 : a0000001007df5b0 r10 : 0000000000000fff
r11 : 0000000000ffffff r12 : e00001006797fd40 r13 : e000010067978000
r14 : 0000000000004000 r15 : a000000100778bd8 r16 : 0000000000000001
r17 : a0000001007e0108 r18 : ffffffffffc66d68 r19 : a000000100611258
r20 : a000000100611248 r21 : a0000001007dbd68 r22 : e0000000066e0404
r23 : e0000000066e0380 r24 : 0000000000000002 r25 : 0000000000000002
r26 : e0000000066e03d4 r27 : 0000001008122030 r28 : e0000000066e03d4
r29 : a000000100669e28 r30 : 0000000000000000 r31 : a0000001007df588

Call Trace:
[] show_stack+0x80/0xa0
    sp=e00001006797f8b0 bsp=e000010067979470
[] show_regs+0x890/0x8c0
    sp=e00001006797fa80 bsp=e000010067979428
[] die+0x150/0x240
    sp=e00001006797faa0 bsp=e0000100679793e0
[] die_if_kernel+0x40/0x60
    sp=e00001006797faa0 bsp=e0000100679793b0
[] ia64_bad_break+0x180/0x600
    sp=e00001006797faa0 bsp=e000010067979388
[] ia64_leave_kernel+0x0/0x260
    sp=e00001006797fb70 bsp=e000010067979388
[] panic+0x5f0/0x6a0
    sp=e00001006797fd40 bsp=e0000100679792e8
[] sba_alloc_range+0xa80/0x16e0
    sp=e00001006797fda0 bsp=e000010067979278
[] sba_map_sg+0x380/0x760
    sp=e00001006797fda0 bsp=e0000100679791e0
[] ib_umem_get+0x770/0xa80 [ib_uverbs]
    sp=e00001006797fdb0 bsp=e000010067979120
[] ib_uverbs_reg_mr+0x2a0/0x9a0 [ib_uverbs]
    sp=e00001006797fdb0 bsp=e0000100679790a8
[] ib_uverbs_write+0x210/0x280 [ib_uverbs]
    sp=e00001006797fe10 bsp=e000010067979078
[] vfs_write+0x290/0x360
    sp=e00001006797fe20 bsp=e000010067979028
[] sys_write+0x70/0xe0
    sp=e00001006797fe20 bsp=e000010067978fa8
[] ia64_ret_from_syscall+0x0/0x20
    sp=e00001006797fe30 bsp=e000010067978fa8
[] 0xa000000000010640
    sp=e000010067980000 bsp=e000010067978fa8

From vera_wx_cn at yahoo.com.cn Tue Sep 2 22:42:56 2008
From: vera_wx_cn at yahoo.com.cn (=?gb2312?q?=C7=BF=20=C2=ED?=)
Date: Tue Sep 2 22:43:14 2008
Subject: [mvapich-discuss] mvapich-1.0(gen2) panic our IA64 cluster, but mvapich-1.0(tcp) not
Message-ID: <394868.80731.qm@web15306.mail.cnb.yahoo.com>

Hello.

I run my NAS programs with mvapich-1.0 (gen2) on an IA64 cluster. The kernel now panics every time, but the programs run well with mvapich-1.0 (tcp).

ibstat shows:
CA 'mthca0'
    CA type: MT25204
    Number of ports: 1
    Firmware version: 1.1.0
    Hardware version: a0

Panic information:

Kernel panic - not syncing: arch/ia64/hp/common/sba_iommu.c: I/O MMU @ c0000000fed01000 is out of mapping resources
kernel BUG at kernel/panic.c:75!
ft.C.4[3367]: bugcheck! 0 [1]
Modules linked in: blcr(U) blcr_vmadump(U) blcr_imports(U) nfs(U) lockd(U) nfs_acl(U) osc(U) mgc(U) lustre(U) lov(U) lquota(U) mdc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) rdma_cm(U) netconsole(U) ib_addr(U) netdump(U) md5(U) ipv6(U) parport_pc(U) lp(U) parport(U) autofs4(U) ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U) sunrpc(U) ib_ipoib(U) ds(U) yenta_socket(U) pcmcia_core(U) vfat(U) fat(U) dm_mirror(U) dm_multipath(U) dm_mod(U) button(U) ib_mthca(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) tg3(U) ext3(U) jbd(U) mptscsih(U) mptfc(U) mptsas(U) mptspi(U) mptscsi(U) mptbase(U) usb_storage(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U) sd_mod(U) scsi_mod(U)
Pid: 3367, CPU 3, comm: ft.C.4
psr : 0000101008122030 ifs : 8000000000000814 ip : [] Tainted: GF
ip is at panic+0x5f0/0x6a0
unat: 0000000000000000 pfs : 0000000000000814 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr : fa0166a6855a59a9
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a000000100077410 b6 : a00000010025ebe0 b7 : a00000010025ebe0
f6 : 1003e00000000000000a0 f7 : 1003e0000000000000001
f8 : 1003e00000000000000a0 f9 : 10002a000000000000000
f10 : 0fffeb33333332fa80000 f11 : 1003e0000000000000000
r1 : a0000001009cc240 r2 : 000000000005bac7 r3 : a0000001007cc898
r8 : 0000000000000021 r9 : a0000001007df5b0 r10 : 0000000000000fff
r11 : 0000000000ffffff r12 : e00001006797fd40 r13 : e000010067978000
r14 : 0000000000004000 r15 : a000000100778bd8 r16 : 0000000000000001
r17 : a0000001007e0108 r18 : ffffffffffc66d68 r19 : a000000100611258
r20 : a000000100611248 r21 : a0000001007dbd68 r22 : e0000000066e0404
r23 : e0000000066e0380 r24 : 0000000000000002 r25 : 0000000000000002
r26 : e0000000066e03d4 r27 : 0000001008122030 r28 : e0000000066e03d4
r29 : a000000100669e28 r30 : 0000000000000000 r31 : a0000001007df588

Call Trace:
[] show_stack+0x80/0xa0
    sp=e00001006797f8b0 bsp=e000010067979470
[] show_regs+0x890/0x8c0
    sp=e00001006797fa80 bsp=e000010067979428
[] die+0x150/0x240
    sp=e00001006797faa0 bsp=e0000100679793e0
[] die_if_kernel+0x40/0x60
    sp=e00001006797faa0 bsp=e0000100679793b0
[] ia64_bad_break+0x180/0x600
    sp=e00001006797faa0 bsp=e000010067979388
[] ia64_leave_kernel+0x0/0x260
    sp=e00001006797fb70 bsp=e000010067979388
[] panic+0x5f0/0x6a0
    sp=e00001006797fd40 bsp=e0000100679792e8
[] sba_alloc_range+0xa80/0x16e0
    sp=e00001006797fda0 bsp=e000010067979278
[] sba_map_sg+0x380/0x760
    sp=e00001006797fda0 bsp=e0000100679791e0
[] ib_umem_get+0x770/0xa80 [ib_uverbs]
    sp=e00001006797fdb0 bsp=e000010067979120
[] ib_uverbs_reg_mr+0x2a0/0x9a0 [ib_uverbs]
    sp=e00001006797fdb0 bsp=e0000100679790a8
[] ib_uverbs_write+0x210/0x280 [ib_uverbs]
    sp=e00001006797fe10 bsp=e000010067979078
[] vfs_write+0x290/0x360
    sp=e00001006797fe20 bsp=e000010067979028
[] sys_write+0x70/0xe0
    sp=e00001006797fe20 bsp=e000010067978fa8
[] ia64_ret_from_syscall+0x0/0x20
    sp=e00001006797fe30 bsp=e000010067978fa8
[] 0xa000000000010640
    sp=e000010067980000 bsp=e000010067978fa8

From perkinjo at cse.ohio-state.edu Wed Sep 3 10:53:37 2008
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Wed Sep 3 10:54:51 2008
Subject: [mvapich-discuss] RE: MVAPICH 1.0.0 and stdin
In-Reply-To:
References: <6DB5B58A8E5AB846A7B3B3BFF1B4315A01EA86AD@AVEXCH1.qlogic.org>
Message-ID: <20080903145336.GC2908@cse.ohio-state.edu>

Mark:

Attached is a potential fix for this issue. Can you apply the patch and let us know whether it solves your problem? We'll make sure this is resolved in our next release.

On Fri, Aug 29, 2008 at 02:56:51PM -0400, Dhabaleswar Panda wrote:
> Hi Mark,
>
> Thanks for reporting this problem and the associated details regarding
> where things are failing. We will work on a fix for this for the upcoming
> 1.1 release.
>
> Thanks,
>
> DK
>
> On Fri, 29 Aug 2008, Mark Debbage wrote:
>
> > OK, this turns out to be pretty straightforward.
> > spawn_linear (the legacy spawner) arranges for stdin to
> > be propagated to just rank 0 and uses /dev/null for all
> > other ranks:
> >
> > if (i != 0) {
> >     int fd = open("/dev/null", O_RDWR, 0);
> >     (void) dup2(fd, STDIN_FILENO);
> > }
> >
> > spawn_fast (the new spawner) doesn't have any code to do
> > this. My guess is that the local ssh processes for the other
> > ranks are looking at stdin (maybe just polling it) and stealing
> > the stdin from rank 0.
> >
> > Can you include a fix for this in your next release? Thanks,
> >
> > Mark.
> >
> > -----Original Message-----
> > From: Mark Debbage
> > Sent: Fri 8/29/2008 10:45 AM
> > To: Mark Debbage; mvapich-discuss@cse.ohio-state.edu
> > Subject: RE: MVAPICH 1.0.0 and stdin
> >
> > This is a resend with in-line attachment. Also note that the
> > problem does not occur with MVAPICH 0.9.9.
> > If I use MVAPICH 1.0.0 and arrange to use the "legacy" start-up
> > mechanism then it also works reliably. For example:
> >
> > /usr/mpi/gcc/mvapich-1.0.0/bin/mpirun_rsh -legacy -np 2 -hostfile hosts /home/markdebbage/support/OU/./mpicat < input
> >
> > This makes me think that the new source code allowing multiple
> > MPI processes per ssh is the problem, though in this case there
> > is just one MPI process per node.
> >
> > Mark.
> >
> >
> > -----Original Message-----
> > From: Mark Debbage
> > Sent: Fri 8/29/2008 10:25 AM
> > To: mvapich-discuss@cse.ohio-state.edu
> > Subject: MVAPICH 1.0.0 and stdin
> >
> > We are having problems with stdin and MVAPICH 1.0.0 (from OFED 1.3).
> > I am running with the mpirun process and rank 0 on the same host
> > and expecting the stdin of the mpirun process to be available to
> > rank 0. This works reliably if there is just one process in the job,
> > or if all MPI processes are mapped to that same host. However, if
> > there are MPI processes on other hosts, then stdin becomes
> > intermittent - about 4 in 5 times it works fine, but 1 in 5 times
> > all reads on stdin return EOF.
> >
> > I've attached the example source code. It is a simple MPI version
> > of cat. I am building and running like this:
> >
> > markdebbage@perf-15:~/support/OU> /usr/mpi/gcc/mvapich-1.0.0/bin/mpicc mpicat.c -o mpicat
> >
> > markdebbage@perf-15:~/support/OU> cat hosts
> > perf-15
> > perf-16
> >
> > Here's a working run:
> >
> > markdebbage@perf-15:~/support/OU> /usr/mpi/gcc/mvapich-1.0.0/bin/mpirun -machinefile hosts -np 2 ./mpicat < input
> > This is rank 0 - start loop
> > 1
> > 2
> > 3
> > 4
> > 5
> > 6
> > 999
> > This is rank 0 - end loop
> >
> > Here's a non-working run:
> >
> > markdebbage@perf-15:~/support/OU> /usr/mpi/gcc/mvapich-1.0.0/bin/mpirun -machinefile hosts -np 2 ./mpicat < input
> > This is rank 0 - start loop
> > This is rank 0 - end loop
> > markdebbage@perf-15:~/support/OU>
> >
> > I've tried this with OFED 1.3 running on Mellanox and QLogic adapters,
> > and also with the PSM version of MVAPICH running on QLogic adapters.
> > It appears that this is independent of transport. I also tried the
> > -stdin option that appears on the mpirun help page. However, that
> > seems to be silently ignored. I can see the code in mpirun.args that
> > processes that option but it doesn't appear to be connected up to
> > anything.
> >
> > Cheers,
> >
> > Mark.
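
The stdin handling discussed in this thread can be sketched in isolation. What follows is a minimal illustration in C, assuming a launcher that forks one child per rank. It is not the MVAPICH source, and the helper names redirect_stdin_to_devnull and prepare_child_stdin are hypothetical. It follows the approach of the spawn_linear fragment quoted earlier in this message: every rank except 0 gets /dev/null as its stdin, so only rank 0 can consume the launcher's input.

```c
#include <fcntl.h>
#include <unistd.h>

/* Point this process's stdin at /dev/null.
 * Returns 0 on success, -1 on error. */
static int redirect_stdin_to_devnull(void)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;
    if (fd != STDIN_FILENO) {
        /* Replace descriptor 0, then drop the temporary descriptor. */
        if (dup2(fd, STDIN_FILENO) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    /* If stdin was already closed, open() returned descriptor 0 itself. */
    return 0;
}

/* Hypothetical hook for the child side of fork(), before exec():
 * rank 0 keeps the launcher's stdin, every other rank reads /dev/null. */
static int prepare_child_stdin(int rank)
{
    return (rank == 0) ? 0 : redirect_stdin_to_devnull();
}
```

A launcher would call prepare_child_stdin(rank) in each child between fork() and exec(). Redirecting to /dev/null rather than closing descriptor 0 keeps the descriptor occupied, so a later open() in the child cannot be handed descriptor 0 by accident.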
> >
> > #include <mpi.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> >
> > int main (int argc, char **argv)
> > {
> >     int rank;
> >     MPI_Init(&argc, &argv);
> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >     if (rank == 0) {
> >         printf("This is rank 0 - start loop\n");
> >         int c;
> >         while ((c = getchar()) != EOF) {
> >             putchar(c);
> >         }
> >         printf("This is rank 0 - end loop\n");
> >     }
> >     MPI_Finalize();
> >     return EXIT_SUCCESS;
> > }
> >
> >
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo
-------------- next part --------------
Index: mpid/ch_gen2/process/mpirun_rsh.c
===================================================================
--- mpid/ch_gen2/process/mpirun_rsh.c (revision 2965)
+++ mpid/ch_gen2/process/mpirun_rsh.c (working copy)
@@ -1969,6 +1969,10 @@
         exit(EXIT_SUCCESS);
     }
 
+    if(strcmp(pglist->data[i].hostname, plist[0].hostname)) {
+        close(STDIN_FILENO);
+    }
+
     execv(argv[0], (char* const*) argv);
     perror("execv");
 

From panda at cse.ohio-state.edu Wed Sep 3 12:22:52 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Wed Sep 3 12:23:07 2008
Subject: [mvapich-discuss] Process Termination Detection with mpirun_rsh
In-Reply-To:
Message-ID:

Hi Fred and Tom,

This is to let you know that we have come up with a solution for the "mpirun_rsh -rsh" problem. This solution also solves the hanging process issue. These solutions have been applied to the following versions:

MVAPICH 1.0 branch and trunk
MVAPICH2 1.2 trunk

These solutions will be reflected in tonight's nightly tarballs. Please try these latest versions (tarballs or directly from SVN) and let us know whether they solve all of your problems.

Thanks,

DK

On Wed, 20 Aug 2008, Stecher, Fred wrote:

> Tom,
> We use MVAPICH-1.0 which comes with mpirun_rsh. It has the same problem
> and we do not use a scheduler.
> We have to check the nodes when a run is
> aborted by the application. For a node that still has processes running
> even though they should have been aborted, we have to kill one process
> at a time to clear the node. I would think that this is a known problem
> and should be corrected soon.
>
>
> Fred
>
>
> -----Original Message-----
> From: mvapich-discuss-bounces@cse.ohio-state.edu
> [mailto:mvapich-discuss-bounces@cse.ohio-state.edu] On Behalf Of Tom
> Crockett
> Sent: Tuesday, August 19, 2008 6:20 PM
> To: mvapich-discuss@cse.ohio-state.edu
> Subject: [mvapich-discuss] Process Termination Detection with mpirun_rsh
>
> Hi,
>
> I've recently installed MVAPICH2 1.2rc1 on my cluster, and have been
> experimenting with the new mpirun_rsh job launcher. In general, I much
> prefer this simpler approach, and have found it to be faster and more
> reliable than MPD. However, I'm having one fairly serious problem
> relating to termination detection when processes abort.
>
> Here's the scenario:
>
> 1. Launch an MPI job on multiple nodes via "mpirun_rsh -rsh", typically
> with multiple processes per node (multi-process, multi-core).
>
> 2. One process dies, e.g., with a segmentation violation, on some random
> node.
>
> 3. The node with the offending process seems to notice this locally; all
> the sibling processes and the local mpispawn process terminate.
> However, the remaining nodes (including the master) don't seem to
> notice; their processes continue to run (or more likely stall, waiting
> on communication which will never arrive).
>
>
> If I run this experiment on two nodes (for example) and look at the
> process state on the master node before the process dies on the remote
> node, I see two sets of "rsh" processes, with one active process and one
> defunct process in each set. "ps" shows that each defunct "rsh" is a
> child of an active process.
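
The scenario above depends on the launcher noticing every kind of child termination. As a rough sketch only (this is not how mpirun_rsh or mpispawn is implemented, and ended_abnormally and supervise are hypothetical names), a POSIX parent can reap its children with waitpid(), treat death by any signal the same as a non-zero exit, and terminate the remaining children on the first failure:

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Returns 1 if a waitpid() status means the child ended abnormally:
 * killed by any signal (SIGSEGV as much as SIGTERM) or a non-zero exit. */
static int ended_abnormally(int status)
{
    if (WIFSIGNALED(status))
        return 1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status) != 0;
    return 0;
}

/* Reap all n children listed in pids. On the first abnormal end, send
 * SIGTERM to the children still running, then keep reaping so that none
 * is left defunct. Returns the number of children that ended abnormally
 * (including the ones terminated here), or -1 if waitpid() fails. */
static int supervise(pid_t *pids, int n)
{
    int remaining = n;
    int failures = 0;
    int torn_down = 0;

    while (remaining > 0) {
        int status;
        int i, ours = 0;
        pid_t done = waitpid(-1, &status, 0);

        if (done < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* Mark the reaped child as finished (slot set to 0). */
        for (i = 0; i < n; i++) {
            if (pids[i] == done) {
                pids[i] = 0;
                ours = 1;
                break;
            }
        }
        if (!ours)
            continue;   /* some other child of this process */
        remaining--;

        if (ended_abnormally(status)) {
            failures++;
            if (!torn_down) {
                for (i = 0; i < n; i++)
                    if (pids[i] > 0)
                        kill(pids[i], SIGTERM);
                torn_down = 1;
            }
        }
    }
    return failures;
}
```

Because waitpid() is called without WUNTRACED, only real terminations are reported, and the WIFSIGNALED branch makes a segfault count as a failure just as a kill does.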
>
> Following abnormal process termination on the remote node, there will be
> only one active rsh process and one defunct rsh process, confirming that
> the remote processes have cleaned up and exited. So it seems that
> mpirun_rsh is not responding properly to the death of a child process.
>
> Here's a concrete example showing the process state on the master node
> following termination of the processes on the remote node:
>
> 11 [ty10] /bin/ps -utom -o 'user pid ppid s nice vsz rss pmem time fname'
> USER       PID  PPID S  NI     VSZ   RSS %MEM     TIME COMMAND
> tom       6218 14345 S   0    9000  1984  0.0 00:00:00 tcsh
> tom       6219  6218 S   0    1772   428  0.0 00:00:00 pbs_demu
> tom       6251  6218 S   0    9368  1588  0.0 00:00:00 28027.ty
> tom       6252  6251 S   0   12216  3072  0.0 00:00:00 pbsmvp2
> tom       6257  6252 S   0    5288   676  0.0 00:00:00 mpirun_r
> tom       6258  6257 S   0    6396   692  0.0 00:00:00 rsh
> tom       6261  6260 S   0    9784  2096  0.0 00:00:00 tcsh
> tom       6262  6258 Z   0       0     0  0.0 00:00:00 rsh
> tom       6307  6261 S   0    5492   712  0.0 00:00:00 mpispawn
> tom       6308  6307 R   0 8038032 19576  0.2 00:04:48 rand4
> tom       6309  6307 R   0 8038036 14416  0.1 00:05:06 rand4
> tom       6310  6307 R   0 8037904 14264  0.1 00:05:07 rand4
> tom       6311  6307 R   0 8038032 14328  0.1 00:05:06 rand4
>
> Interestingly, whether the master node detects the remote process
> termination seems to depend on how the remote process dies. If I hit
> the remote process with a SIGTERM, mpirun_rsh seems to notice and things
> get cleaned up after a minute or two. If it terminates with something
> else (e.g., a SIGSEGV), the job will sit there forever.
>
> Finally, it's not just remote nodes that suffer from this problem. The
> behavior is the same if it's a local process on the master node that
> aborts -- the local rsh and its descendants disappear, but mpirun_rsh
> and processes on remote nodes persist.
>
> Now for a few more specifics about our environment:
>
> OS: SuSE Linux Enterprise Server 10 SP1
> Compiler: PGI 7.1-4
> InfiniBand: OFED 1.3
> Scheduler: TORQUE 2.2.1
> Hardware Platform: Dell SC1435 (Opteron 2218)
>
> Eventually, of course, the job scheduler will timeout the job and kill
> the master mpirun_rsh process, which seems to clean everything up OK.
> (In general, top-down kills by the scheduler seem to work fine. It's
> bottom-up termination that's problematic.) But much of our workload has
> very long runtimes (on the order of days to weeks), and my users don't
> want to wait that long only to find out that their job actually bombed
> with a segfault several days earlier.
>
> Any thoughts on what might be causing this and how to fix it?
>
> -Tom
>
> --
> Tom Crockett
>
> College of William and Mary            email: twcroc@wm.edu
> IT/High Performance Computing Group    phone: (757) 221-2762
> Savage House                           fax:   (757) 221-2023
> P.O. Box 8795
> Williamsburg, VA 23187-8795

From panda at cse.ohio-state.edu Wed Sep 3 13:49:29 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Wed Sep 3 13:49:43 2008
Subject: [mvapich-discuss] mvapich-1.0(gen2) panic our IA64 cluster, but mvapich-1.0(tcp) not
In-Reply-To: <394868.80731.qm@web15306.mail.cnb.yahoo.com>
Message-ID:

Are you able to run Gen2-level tests (not MPI-level tests) on this cluster for a long period of time without any panic? Your earlier posting indicates that you have many modules loaded (blcr, lustre, etc.).

When running mvapich 1.0 with TCP/IP mode, the IB adapters are being invoked through IPoIB.
When running mvapich 1.0 with gen2 mode, the IB adapters are being invoked through the native libibverbs library. Thus, it would be good for you to try the Gen2-level tests (rdma_latency, rdma_bandwidth, etc.) first for a long period of time to see whether they run smoothly without any panic.

DK

On Wed, 3 Sep 2008, [gb2312] 强 马 wrote:

> Hello.
>
> I run my NAS programs with mvapich-1.0 (gen2) on an IA64 cluster. The kernel now panics every time, but the programs run well with mvapich-1.0 (tcp).
>
> ibstat shows:
> CA 'mthca0'
>     CA type: MT25204
>     Number of ports: 1
>     Firmware version: 1.1.0
>     Hardware version: a0
> Panic information:
>
> Kernel panic - not syncing: arch/ia64/hp/common/sba_iommu.c: I/O MMU @ c0000000fed01000 is out of mapping resources
> kernel BUG at kernel/panic.c:75!
> ft.C.4[3367]: bugcheck! 0 [1]
> Modules linked in: blcr(U) blcr_vmadump(U) blcr_imports(U) nfs(U) lockd(U) nfs_acl(U) osc(U) mgc(U) lustre(U) lov(U) lquota(U) mdc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) rdma_cm(U) netconsole(U) ib_addr(U) netdump(U) md5(U) ipv6(U) parport_pc(U) lp(U) parport(U) autofs4(U) ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U) sunrpc(U) ib_ipoib(U) ds(U) yenta_socket(U) pcmcia_core(U) vfat(U) fat(U) dm_mirror(U) dm_multipath(U) dm_mod(U) button(U) ib_mthca(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) tg3(U) ext3(U) jbd(U) mptscsih(U) mptfc(U) mptsas(U) mptspi(U) mptscsi(U) mptbase(U) usb_storage(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U) sd_mod(U) scsi_mod(U)
> Pid: 3367, CPU 3, comm: ft.C.4
> psr : 0000101008122030 ifs : 8000000000000814 ip : [] Tainted: GF
> ip is at panic+0x5f0/0x6a0
> unat: 0000000000000000 pfs : 0000000000000814 rsc : 0000000000000003
> rnat: 0000000000000000 bsps: 0000000000000000 pr : fa0166a6855a59a9
> ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
> csd : 0000000000000000 ssd : 0000000000000000
> b0 : a000000100077410 b6 : a00000010025ebe0 b7 : a00000010025ebe0
> f6 : 1003e00000000000000a0 f7 : 1003e0000000000000001
> f8 : 1003e00000000000000a0 f9 : 10002a000000000000000
> f10 : 0fffeb33333332fa80000 f11 : 1003e0000000000000000
> r1 : a0000001009cc240 r2 : 000000000005bac7 r3 : a0000001007cc898
> r8 : 0000000000000021 r9 : a0000001007df5b0 r10 : 0000000000000fff
> r11 : 0000000000ffffff r12 : e00001006797fd40 r13 : e000010067978000
> r14 : 0000000000004000 r15 : a000000100778bd8 r16 : 0000000000000001
> r17 : a0000001007e0108 r18 : ffffffffffc66d68 r19 : a000000100611258
> r20 : a000000100611248 r21 : a0000001007dbd68 r22 : e0000000066e0404
> r23 : e0000000066e0380 r24 : 0000000000000002 r25 : 0000000000000002
> r26 : e0000000066e03d4 r27 : 0000001008122030 r28 : e0000000066e03d4
> r29 : a000000100669e28 r30 : 0000000000000000 r31 : a0000001007df588
> Call Trace:
> [] show_stack+0x80/0xa0
>     sp=e00001006797f8b0 bsp=e000010067979470
> [] show_regs+0x890/0x8c0
>     sp=e00001006797fa80 bsp=e000010067979428
> [] die+0x150/0x240
>     sp=e00001006797faa0 bsp=e0000100679793e0
> [] die_if_kernel+0x40/0x60
>     sp=e00001006797faa0 bsp=e0000100679793b0
> [] ia64_bad_break+0x180/0x600
>     sp=e00001006797faa0 bsp=e000010067979388
> [] ia64_leave_kernel+0x0/0x260
>     sp=e00001006797fb70 bsp=e000010067979388
> [] panic+0x5f0/0x6a0
>     sp=e00001006797fd40 bsp=e0000100679792e8
> [] sba_alloc_range+0xa80/0x16e0
>     sp=e00001006797fda0 bsp=e000010067979278
> [] sba_map_sg+0x380/0x760
>     sp=e00001006797fda0 bsp=e0000100679791e0
> [] ib_umem_get+0x770/0xa80 [ib_uverbs]
>     sp=e00001006797fdb0 bsp=e000010067979120
> [] ib_uverbs_reg_mr+0x2a0/0x9a0 [ib_uverbs]
>     sp=e00001006797fdb0 bsp=e0000100679790a8
> [] ib_uverbs_write+0x210/0x280 [ib_uverbs]
>     sp=e00001006797fe10 bsp=e000010067979078
> [] vfs_write+0x290/0x360
>     sp=e00001006797fe20 bsp=e000010067979028
> [] sys_write+0x70/0xe0
>     sp=e00001006797fe20 bsp=e000010067978fa8
> [] ia64_ret_from_syscall+0x0/0x20
>     sp=e00001006797fe30 bsp=e000010067978fa8
> [] 0xa000000000010640
>     sp=e000010067980000 bsp=e000010067978fa8

From vera_wx_cn at yahoo.com.cn Thu Sep 4 10:15:14 2008
From: vera_wx_cn at yahoo.com.cn (=?gb2312?q?=C7=BF=20=C2=ED?=)
Date: Thu Sep 4 10:15:31 2008
Subject: [mvapich-discuss] (no subject)
Message-ID: <708177.67629.qm@web15306.mail.cnb.yahoo.com>

Hello

I downloaded mvapich2-1.2rc2 and hope to run it with srun, so I built it as follows:
#./configure --with-device=ch3:mrail --with-rdma=gen2 --with-slurm=/usr/local/slurm --disable-romio --without-mpe
#make

slurm-1.2.25 is installed in /usr/local/slurm.
But srun reports:

Only 1 processors found
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0srun: error: xcn088: task3: Exited with exit code 1

Please tell me how to build mvapich2 if I want to run it with slurm.
Thanks.

From bfp at purdue.edu Thu Sep 4 10:36:24 2008
From: bfp at purdue.edu (Bryan Putnam)
Date: Thu Sep 4 10:36:38 2008
Subject: [mvapich-discuss] mvapich2-1.2rc2 and -hostfile
Message-ID:

I've been experimenting with mvapich2-1.2rc2, and I wonder if you could
tell me how to make, for example,

-hostfile $PBS_NODEFILE

the default specification on the mpirun_rsh command line.

Thanks,
Bryan Putnam

From perkinjo at cse.ohio-state.edu Thu Sep 4 11:01:35 2008
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Thu Sep 4 11:02:50 2008
Subject: [mvapich-discuss] (no subject)
In-Reply-To: <708177.67629.qm@web15306.mail.cnb.yahoo.com>
References: <708177.67629.qm@web15306.mail.cnb.yahoo.com>
Message-ID: <20080904150134.GB2880@cse.ohio-state.edu>

Hi:

This doesn't look like a build problem. Did you specify how many processes you wanted to use via the srun command?
I suggest looking at the documentation provided by the SLURM website for more information on how to use their process manager.

https://computing.llnl.gov/linux/slurm/documentation.html

On Thu, Sep 04, 2008 at 10:15:14PM +0800, 强 马 wrote:
> Hello
>
> I downloaded mvapich2-1.2rc2 and hope to run it with srun, so I built it as follows:
> #./configure --with-device=ch3:mrail --with-rdma=gen2 --with-slurm=/usr/local/slurm --disable-romio --without-mpe
> #make
>
> slurm-1.2.25 is installed in /usr/local/slurm.
> But srun reports:
>
> Only 1 processors found
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0srun: error: xcn088: task3: Exited with exit code 1
>
> Please tell me how to build mvapich2 if I want to run it with slurm.
> Thanks.

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From perkinjo at cse.ohio-state.edu Thu Sep 4 11:12:18 2008
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Thu Sep 4 11:13:34 2008
Subject: [mvapich-discuss] mvapich2-1.2rc2 and -hostfile
In-Reply-To:
References:
Message-ID: <20080904151218.GC2880@cse.ohio-state.edu>

Bryan:

This command line option is required when using a PBS system with mpirun_rsh.

On Thu, Sep 04, 2008 at 10:36:24AM -0400, Bryan Putnam wrote:
> I've been experimenting with mvapich2-1.2rc2, and I wonder if you could
> tell me how to make, for example,
>
> -hostfile $PBS_NODEFILE
>
> the default specification on the mpirun_rsh command line.
>
> Thanks,
> Bryan Putnam

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From kallies at zib.de Thu Sep 4 12:42:04 2008
From: kallies at zib.de (Bernd Kallies)
Date: Thu Sep 4 12:42:23 2008
Subject: [mvapich-discuss] mvapich2-1.2rc2 and -hostfile
In-Reply-To:
References:
Message-ID: <1220546524.21455.223.camel@kallies.zib.de>

On Thu, 2008-09-04 at 10:36 -0400, Bryan Putnam wrote:
> I've been experimenting with mvapich2-1.2rc2, and I wonder if you could
> tell me how to make, for example,
>
> -hostfile $PBS_NODEFILE
>
> the default specification on the mpirun_rsh command line.

Give Pete Wyckoff's mpiexec a try, it will handle this for you if you
use a PBS. In addition, it will protect you from all these zombie tasks
hanging around when batch jobs are aborted, cancelled, ...

See

http://www.osc.edu/~pw/mpiexec/index.php

> Thanks,
> Bryan Putnam

--
Dr. Bernd Kallies
Konrad-Zuse-Zentrum für Informationstechnik Berlin
Takustr. 7
14195 Berlin
Tel: +49-30-84185-270
Fax: +49-30-84185-311
e-mail: kallies@zib.de

From perkinjo at cse.ohio-state.edu Thu Sep 4 13:12:19 2008
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Thu Sep 4 13:13:36 2008
Subject: [mvapich-discuss] mvapich2-1.2rc2 and -hostfile
In-Reply-To: <1220546524.21455.223.camel@kallies.zib.de>
References: <1220546524.21455.223.camel@kallies.zib.de>
Message-ID: <20080904171219.GN2880@cse.ohio-state.edu>

On Thu, Sep 04, 2008 at 06:42:04PM +0200, Bernd Kallies wrote:
> On Thu, 2008-09-04 at 10:36 -0400, Bryan Putnam wrote:
> > I've been experimenting with mvapich2-1.2rc2, and I wonder if you could
> > tell me how to make, for example,
> >
> > -hostfile $PBS_NODEFILE
> >
> > the default specification on the mpirun_rsh command line.
>
> Give Pete Wyckoff's mpiexec a try, it will handle this for you if you
> use a PBS. In addition, it will protect you from all these zombie tasks
> hanging around when batch jobs are aborted, cancelled, ...

This is to clarify that the mpirun_rsh/mpispawn framework in MVAPICH2 1.2, together with the fixes posted recently, does not leave any processes behind when jobs get aborted. This new framework also provides much faster and more scalable job start-up. We recommend that MVAPICH2 users start moving to this new job start-up framework.

>
> See
>
> http://www.osc.edu/~pw/mpiexec/index.php
>
> > Thanks,
> > Bryan Putnam
>
> --
> Dr. Bernd Kallies
> Konrad-Zuse-Zentrum für Informationstechnik Berlin
> Takustr. 7
> 14195 Berlin
> Tel: +49-30-84185-270
> Fax: +49-30-84185-311
> e-mail: kallies@zib.de

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From vera_wx_cn at yahoo.com.cn Thu Sep 4 23:33:49 2008
From: vera_wx_cn at yahoo.com.cn (=?gb2312?q?=C7=BF=20=C2=ED?=)
Date: Thu Sep 4 23:34:07 2008
Subject: [mvapich-discuss] Can I run mvapich2-1.2rc2 with srun?
Message-ID: <753920.91983.qm@web15304.mail.cnb.yahoo.com>

If mvapich2-1.2rc2 can be run with srun, please tell me how to configure it.
Thanks

From mbozzore at platform.com Mon Sep 8 07:49:19 2008
From: mbozzore at platform.com (Mehdi Bozzo-Rey)
Date: Mon Sep 8 07:48:18 2008
Subject: [mvapich-discuss] mvapich 1.01 / scalapack: xcnep and xznep fail with MPI_RECV : Invalid buffer pointer
Message-ID: <531893A968B34D40B36C7A6445BC828A01E001D0@catoexm06.noam.corp.platform.com>

Skipped content of type multipart/alternative
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Bmake.inc
Type: application/octet-stream
Size: 11035 bytes
Desc: Bmake.inc
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080908/e068b604/Bmake-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SLmake.inc
Type: application/octet-stream
Size: 2994 bytes
Desc: SLmake.inc
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080908/e068b604/SLmake-0001.obj

From mbozzore at platform.com Mon Sep 8 08:00:41 2008
From: mbozzore at platform.com (Mehdi Bozzo-Rey)
Date: Mon Sep 8 07:59:36 2008
Subject: [mvapich-discuss] RE: mvapich 1.01 / scalapack: xcnep and xznep fail with MPI_RECV : Invalid buffer pointer
Message-ID: <531893A968B34D40B36C7A6445BC828A01E001D1@catoexm06.noam.corp.platform.com>

Judging from the archive
(http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2008-September/001888.html),
something is missing, so I am resending my original email.

Mehdi

=======================================

From: Mehdi Bozzo-Rey
Sent: September-08-08 7:48 AM
To: 'mvapich-discuss@cse.ohio-state.edu'
Subject: mvapich 1.01 / scalapack: xcnep and xznep fail with MPI_RECV : Invalid buffer pointer

Hello,

I recompiled mvapich 1.0.1, BLACS and ScaLAPACK.

- I am able to run the tests included in the BLACS distribution
- I am able to run most of the tests included in the ScaLAPACK
  distribution, except xcnep and xznep. They fail with the errors
  shown below.

Do you have any idea what the root cause could be?

xcnep:

--------------------------------------------------------------------------
[mbozzore@compute-00-02 TESTING]$ mpirun_rsh -ssh -np 4 -hostfile ./hosts GFORTRAN_UNBUFFERED_ALL=yes ./xcnep
ScaLAPACK QSQ^H by Schur Decomposition.
'MPI machine'

Tests of the parallel complex single precision Schur decomposition.
The following scaled residual checks will be computed:
 Residual               = ||H-QSQ^H|| / (||H|| * eps * N )
 Orthogonality residual = ||I - Q^HQ|| / ( eps * N )
The matrix A is randomly generated for each test.

An explanation of the input/output parameters follows:
TIME    : Indicates whether WALL or CPU time was used.
N       : The number of columns in the matrix A.
NB      : The size of the square blocks the matrix A is split into.
P       : The number of process rows.
Q       : The number of process columns.
THRESH  : If a residual value is less than THRESH, CHECK is flagged as PASSED
NEP time : Time in seconds to decompose the matrix
MFLOPS  : Rate of execution

The following parameter values will be used:
  N       :             1     2     3     4     6    10    50
  NB      :             6     8    17
  P       :             1     2
  Q       :             1     2

Relative machine precision (eps) is taken to be       0.596046E-07
Routines pass computational tests if scaled residual is less than   20.000

TIME     N  NB    P    Q NEP Time   MFLOPS  CHECK
---- ----- --- ---- ---- -------- -------- ------
WALL     1   6    1    1     0.00     1.06 PASSED
WALL     1   8    1    1     0.00    18.00 PASSED
WALL     1  17    1    1     0.00     9.00 PASSED
WALL     2   6    1    1     0.00     2.77 PASSED
WALL     2   8    1    1     0.00    20.57 PASSED
WALL     2  17    1    1     0.00    20.57 PASSED
WALL     3   6    1    1     0.00     4.26 PASSED
WALL     3   8    1    1     0.00    12.15 PASSED
WALL     3  17    1    1     0.00    12.15 PASSED
WALL     4   6    1    1     0.00    19.53 PASSED
WALL     4   8    1    1     0.00    21.33 PASSED
WALL     4  17    1    1     0.00    20.21 PASSED
WALL     6   6    1    1     0.00    30.61 PASSED
WALL     6   8    1    1     0.00    32.67 PASSED
WALL     6  17    1    1     0.00    29.91 PASSED
WALL    10   6    1    1     0.00    72.87 PASSED
WALL    10   8    1    1     0.00    80.00 PASSED
WALL    10  17    1    1     0.00    88.67 PASSED
WALL    50   6    1    1     0.01   408.57 PASSED
WALL    50   8    1    1     0.01   428.08 PASSED
WALL    50  17    1    1     0.00   481.49 PASSED
WALL     1   6    2    2     0.00     0.90 PASSED
WALL     1   8    2    2     0.00     1.50 PASSED
WALL     1  17    2    2     0.00     1.50 PASSED
WALL     2   6    2    2     0.00     1.22 PASSED
WALL     2   8    2    2     0.00     2.53 PASSED
WALL     2  17    2    2     0.00     2.62 PASSED
WALL     3   6    2    2     0.00     1.65 PASSED
WALL     3   8    2    2     0.00     2.09 PASSED
WALL     3  17    2    2     0.00     2.05 PASSED
WALL     4   6    2    2     0.00     4.04 PASSED
WALL     4   8    2    2     0.00     4.13 PASSED
WALL     4  17    2    2     0.00     4.07 PASSED
WALL     6   6    2    2     0.00     6.81 PASSED
WALL     6   8    2    2     0.00     7.51 PASSED
WALL     6  17    2    2     0.00     7.64 PASSED
0 - MPI_RECV : Invalid buffer pointer
2 - MPI_RECV : Invalid buffer pointer
[2] [] Aborting Program!
[0] [] Aborting Program!
Abort signaled by rank 0:  Aborting program !
Exit code -3 signaled from compute-00-02
Killing remote processes...Abort signaled by rank 2:  Aborting program !
MPI process terminated unexpectedly
DONE
[mbozzore@compute-00-02 TESTING]$ Signal 15 received.
Signal 15 received.
--------------------------------------------------------------------------

And xznep:

--------------------------------------------------------------------------
[mbozzore@compute-00-02 TESTING]$ mpirun_rsh -ssh -np 4 -hostfile ./hosts GFORTRAN_UNBUFFERED_ALL=yes ./xznep
ScaLAPACK QSQ^H by Schur Decomposition.
'MPI machine'

Tests of the parallel complex double precision Schur decomposition.
The following scaled residual checks will be computed:
 Residual               = ||H-QSQ^H|| / (||H|| * eps * N )
 Orthogonality residual = ||I - Q^HQ|| / ( eps * N )
The matrix A is randomly generated for each test.

An explanation of the input/output parameters follows:
TIME    : Indicates whether WALL or CPU time was used.
N       : The number of columns in the matrix A.
NB      : The size of the square blocks the matrix A is split into.
P       : The number of process rows.
Q       : The number of process columns.
THRESH  : If a residual value is less than THRESH, CHECK is flagged as PASSED
NEP time : Time in seconds to decompose the matrix
MFLOPS  : Rate of execution

The following parameter values will be used:
  N       :             1     2     3     4     6    10    50
  NB      :             6     8    17
  P       :             1     2
  Q       :             1     2

Relative machine precision (eps) is taken to be       0.111022E-15
Routines pass computational tests if scaled residual is less than   20.000

TIME     N  NB    P    Q NEP Time   MFLOPS  CHECK
---- ----- --- ---- ---- -------- -------- ------
WALL     1   6    1    1     0.00     1.50 PASSED
WALL     1   8    1    1     0.00    18.00 PASSED
WALL     1  17    1    1     0.00    18.00 PASSED
WALL     2   6    1    1     0.00     2.15 PASSED
WALL     2   8    1    1     0.00    16.00 PASSED
WALL     2  17    1    1     0.00    16.00 PASSED
WALL     3   6    1    1     0.00     3.80 PASSED
WALL     3   8    1    1     0.00     8.10 PASSED
WALL     3  17    1    1     0.00     8.24 PASSED
WALL     4   6    1    1     0.00    14.22 PASSED
WALL     4   8    1    1     0.00    15.16 PASSED
WALL     4  17    1    1     0.00    15.16 PASSED
WALL     6   6    1    1     0.00    23.01 PASSED
WALL     6   8    1    1     0.00    23.71 PASSED
WALL     6  17    1    1     0.00    22.74 PASSED
WALL    10   6    1    1     0.00    46.39 PASSED
WALL    10   8    1    1     0.00    51.14 PASSED
WALL    10  17    1    1     0.00    55.56 PASSED
WALL    50   6    1    1     0.01   263.62 PASSED
WALL    50   8    1    1     0.01   283.73 PASSED
WALL    50  17    1    1     0.01   328.23 PASSED
WALL     1   6    2    2     0.00     0.90 PASSED
WALL     1   8    2    2     0.00     1.50 PASSED
WALL     1  17    2    2     0.00     1.50 PASSED
WALL     2   6    2    2     0.00     1.04 PASSED
WALL     2   8    2    2     0.00     2.48 PASSED
WALL     2  17    2    2     0.00     2.53 PASSED
WALL     3   6    2    2     0.00     1.28 PASSED
WALL     3   8    2    2     0.00     1.51 PASSED
WALL     3  17    2    2     0.00     1.51 PASSED
WALL     4   6    2    2     0.00     2.92 PASSED
WALL     4   8    2    2     0.00     2.98 PASSED
WALL     4  17    2    2     0.00     2.95 PASSED
WALL     6   6    2    2     0.00     6.12 PASSED
WALL     6   8    2    2     0.00     6.52 PASSED
WALL     6  17    2    2     0.00     6.92 PASSED
0 - MPI_RECV : Invalid buffer pointer
2 - MPI_RECV : Invalid buffer pointer
[2] [] Aborting Program!
[0] [] Aborting Program!
Abort signaled by rank 2:  Aborting program !
Abort signaled by rank 0:  Aborting program !
Exit code -3 signaled from compute-00-02
Killing remote processes...MPI process terminated unexpectedly
DONE
[mbozzore@compute-00-02 TESTING]$ Signal 15 received.
Signal 15 received.
--------------------------------------------------------------------------

My Bmake.inc and SLmake.inc are attached to this email.

Note: mpich 1.27p1, Open MPI 1.2.4 (IB) and Open MPI 1.2.5 (IB) are OK. For example:

--------------------------------------------------------------------------
[mbozzore@compute-00-02 openmpi1.2.5]$ ompi_info | less
                Open MPI: 1.2.5
   Open MPI SVN revision: r16989
--------------------------------------------------------------------------

--------------------------------------------------------------------------
[mbozzore@compute-00-02 openmpi1.2.5]$ mpirun -np 4 --machinefile ./hosts --mca btl openib,self ./xcnep
ScaLAPACK QSQ^H by Schur Decomposition.
'MPI machine'

Tests of the parallel complex single precision Schur decomposition.
The following scaled residual checks will be computed:
 Residual               = ||H-QSQ^H|| / (||H|| * eps * N )
 Orthogonality residual = ||I - Q^HQ|| / ( eps * N )
The matrix A is randomly generated for each test.

An explanation of the input/output parameters follows:
TIME    : Indicates whether WALL or CPU time was used.
N       : The number of columns in the matrix A.
NB      : The size of the square blocks the matrix A is split into.
P       : The number of process rows.
Q       : The number of process columns.
THRESH  : If a residual value is less than THRESH, CHECK is flagged as PASSED
NEP time : Time in seconds to decompose the matrix
MFLOPS  : Rate of execution

The following parameter values will be used:
  N       :             1     2     3     4     6    10    50
  NB      :             6     8    17
  P       :             1     2
  Q       :             1     2

Relative machine precision (eps) is taken to be       0.596046E-07
Routines pass computational tests if scaled residual is less than   20.000

TIME     N  NB    P    Q NEP Time   MFLOPS  CHECK
---- ----- --- ---- ---- -------- -------- ------
WALL     1   6    1    1     0.00     1.51 PASSED
WALL     1   8    1    1     0.00    18.87 PASSED
WALL     1  17    1    1     0.00    18.87 PASSED
WALL     2   6    1    1     0.00     1.87 PASSED
WALL     2   8    1    1     0.00    10.24 PASSED
WALL     2  17    1    1     0.00    18.30 PASSED
WALL     3   6    1    1     0.00     4.19 PASSED
WALL     3   8    1    1     0.00    11.08 PASSED
WALL     3  17    1    1     0.00    11.52 PASSED
WALL     4   6    1    1     0.00    18.58 PASSED
WALL     4   8    1    1     0.00    20.22 PASSED
WALL     4  17    1    1     0.00    20.13 PASSED
WALL     6   6    1    1     0.00    27.78 PASSED
WALL     6   8    1    1     0.00    29.49 PASSED
WALL     6  17    1    1     0.00    28.61 PASSED
WALL    10   6    1    1     0.00    66.17 PASSED
WALL    10   8    1    1     0.00    72.04 PASSED
WALL    10  17    1    1     0.00    81.09 PASSED
WALL    50   6    1    1     0.01   392.33 PASSED
WALL    50   8    1    1     0.01   409.76 PASSED
WALL    50  17    1    1     0.00   463.93 PASSED
WALL     1   6    2    2     0.00     0.31 PASSED
WALL     1   8    2    2     0.00     0.72 PASSED
WALL     1  17    2    2     0.00     0.75 PASSED
WALL     2   6    2    2     0.00     0.76 PASSED
WALL     2   8    2    2     0.00     1.00 PASSED
WALL     2  17    2    2     0.00     1.11 PASSED
WALL     3   6    2    2     0.00     0.55 PASSED
WALL     3   8    2    2     0.00     1.12 PASSED
WALL     3  17    2    2     0.00     1.11 PASSED
WALL     4   6    2    2     0.00     2.20 PASSED
WALL     4   8    2    2     0.00     2.19 PASSED
WALL     4  17    2    2     0.00     2.19 PASSED
WALL     6   6    2    2     0.00     3.96 PASSED
WALL     6   8    2    2     0.00     4.13 PASSED
WALL     6  17    2    2     0.00     4.46 PASSED
WALL    10   6    2    2     0.02     1.16 PASSED
WALL    10   8    2    2     0.00     7.40 PASSED
WALL    10  17    2    2     0.00    13.08 PASSED
WALL    50   6    2    2     0.05    47.80 PASSED
WALL    50   8    2    2     0.04    57.51 PASSED
WALL    50  17    2    2     0.02   128.66 PASSED

Finished     42 tests, with the following results:
   42 tests completed and passed residual checks.
    0 tests completed and failed residual checks.
    0 tests skipped because of illegal input values.

END OF TESTS.
--------------------------------------------------------------------------

Thanks,

Mehdi

Mehdi Bozzo-Rey
HPC Solution Developer
Platform OCS5
Platform computing
Phone: +1 905 948 4649

From dzieko at wcss.pl Tue Sep 9 06:33:41 2008
From: dzieko at wcss.pl (Pawel Dziekonski)
Date: Tue Sep 9 06:34:03 2008
Subject: [mvapich-discuss] Upgrade of MVAPICH -> recompile all apps??
Message-ID: <20080909103341.GD26153@cefeid.wcss.wroc.pl>

Hi,

currently we are using MVAPICH 0.9.9 from OFED 1.2.5.5. We would like
to upgrade OFED to latest release. This would be probably OFED 1.3.2,
together with integrated MVAPICH mvapich-1.0.1-2533.

Question: do we have to recompile all our applications against new
MVAPICH?

thanks in advance, Pawel
--
Pawel Dziekonski
Wroclaw Centre for Networking & Supercomputing, HPC Department
Politechnika Wr., pl. Grunwaldzki 9, bud.
D2/101, 50-377 Wroclaw, POLAND
phone: +48 71 3202043, fax: +48 71 3225797, http://www.wcss.wroc.pl

From michael.heinz at qlogic.com Tue Sep 9 08:48:13 2008
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Tue Sep 9 08:48:30 2008
Subject: [mvapich-discuss] Problem with the newest mvapich RPM spec file?
In-Reply-To: <20080909103341.GD26153@cefeid.wcss.wroc.pl>
References: <20080909103341.GD26153@cefeid.wcss.wroc.pl>
Message-ID:

When building recent versions of the mvapich RPM, we are seeing the
following error when compiling on systems that still have g77 installed:

configure: error: Fortran 90 and Fortran 77 compilers are not compatible.

This was causing our automated build system to fail, even though the
make process seems to ignore the error. This raises a couple of
questions:

1. g77 doesn't support Fortran 90 as far as I know - is it correct to
   configure mpif90 to use g77?
2. Should this be an error or perhaps a warning?

We can work around the error message easily enough now that we
understand it - but I'm a little concerned about distributing an mpif90
command that won't be able to process real Fortran 90 programs - should
the spec file be altered to disable Fortran 90 in this case?

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

From panda at cse.ohio-state.edu Tue Sep 9 09:36:11 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Tue Sep 9 09:36:26 2008
Subject: [mvapich-discuss] Problem with the newest mvapich RPM spec file?
In-Reply-To:
Message-ID:

Mike,

Are you referring to the MVAPICH RPM spec file in OFED 1.3.1 or OFED 1.4?

Thanks,

DK

On Tue, 9 Sep 2008, Mike Heinz wrote:

> When building recent versions of the mvapich RPM, we are seeing the
> following error when compiling on systems that still have g77 installed:
>
> configure: error: Fortran 90 and Fortran 77 compilers are not
> compatible.
>
> This was causing our automated build system to fail, even though the
> make process seems to ignore the error. This raises a couple of
> questions:
>
> 1. g77 doesn't support Fortran 90 as far as I know - is it correct to
> configure mpif90 to use g77?
> 2. Should this be an error or perhaps a warning?
>
> We can work around the error message easily enough now that we
> understand it - but I'm a little concerned about distributing an mpif90
> command that won't be able to process real Fortran 90 programs - should
> the spec file be altered to disable Fortran 90 in this case?
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From michael.heinz at qlogic.com Tue Sep 9 09:37:55 2008
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Tue Sep 9 09:38:35 2008
Subject: [mvapich-discuss] Problem with the newest mvapich RPM spec file?
In-Reply-To:
References:
Message-ID:

Sorry, OFED 1.4. The RPM is mvapich-1.1.0-2931.src.rpm

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

-----Original Message-----
From: Dhabaleswar Panda [mailto:panda@cse.ohio-state.edu]
Sent: Tuesday, September 09, 2008 9:36 AM
To: Mike Heinz
Cc: mvapich-discuss@cse.ohio-state.edu; pasha@mellanox.co.il
Subject: Re: [mvapich-discuss] Problem with the newest mvapich RPM spec file?

Mike,

Are you referring to the MVAPICH RPM spec file in OFED 1.3.1 or OFED 1.4?

Thanks,

DK

On Tue, 9 Sep 2008, Mike Heinz wrote:

> When building recent versions of the mvapich RPM, we are seeing the
> following error when compiling on systems that still have g77 installed:
>
> configure: error: Fortran 90 and Fortran 77 compilers are not
> compatible.
>
> This was causing our automated build system to fail, even though the
> make process seems to ignore the error. This raises a couple of
> questions:
>
> 1. g77 doesn't support Fortran 90 as far as I know - is it correct to
> configure mpif90 to use g77?
> 2. Should this be an error or perhaps a warning?
>
> We can work around the error message easily enough now that we
> understand it - but I'm a little concerned about distributing an
> mpif90 command that won't be able to process real Fortran 90 programs
> - should the spec file be altered to disable Fortran 90 in this case?
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From panda at cse.ohio-state.edu Tue Sep 9 09:44:39 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Tue Sep 9 09:44:52 2008
Subject: [mvapich-discuss] Problem with the newest mvapich RPM spec file?
In-Reply-To:
Message-ID:

Mike - Thanks for the clarification.

Pasha - Can you take a look at it.

Thanks,

DK

> Sorry, OFED 1.4. The RPM is mvapich-1.1.0-2931.src.rpm
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
>
> -----Original Message-----
> From: Dhabaleswar Panda [mailto:panda@cse.ohio-state.edu]
> Sent: Tuesday, September 09, 2008 9:36 AM
> To: Mike Heinz
> Cc: mvapich-discuss@cse.ohio-state.edu; pasha@mellanox.co.il
> Subject: Re: [mvapich-discuss] Problem with the newest mvapich RPM spec file?
>
> Mike,
>
> Are you referring to the MVAPICH RPM spec file in OFED 1.3.1 or OFED 1.4?
>
> Thanks,
>
> DK
>
> On Tue, 9 Sep 2008, Mike Heinz wrote:
>
> > When building recent versions of the mvapich RPM, we are seeing the
> > following error when compiling on systems that still have g77 installed:
> >
> > configure: error: Fortran 90 and Fortran 77 compilers are not
> > compatible.
> >
> > This was causing our automated build system to fail, even though the
> > make process seems to ignore the error. This raises a couple of
> > questions:
> >
> > 1. g77 doesn't support Fortran 90 as far as I know - is it correct to
> > configure mpif90 to use g77?
> > 2. Should this be an error or perhaps a warning?
> >
> > We can work around the error message easily enough now that we
> > understand it - but I'm a little concerned about distributing an
> > mpif90 command that won't be able to process real Fortran 90 programs
> > - should the spec file be altered to disable Fortran 90 in this case?
> >
> > --
> > Michael Heinz
> > Principal Engineer, Qlogic Corporation
> > King of Prussia, Pennsylvania
> >
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss@cse.ohio-state.edu
> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
>

From perkinjo at cse.ohio-state.edu Tue Sep 9 10:34:27 2008
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Tue Sep 9 10:35:40 2008
Subject: [mvapich-discuss] Upgrade of MVAPICH -> recompile all apps??
In-Reply-To: <20080909103341.GD26153@cefeid.wcss.wroc.pl>
References: <20080909103341.GD26153@cefeid.wcss.wroc.pl>
Message-ID: <20080909143426.GB2920@cse.ohio-state.edu>

On Tue, Sep 09, 2008 at 12:33:41PM +0200, Pawel Dziekonski wrote:
> Hi,
>
> currently we are using MVAPICH 0.9.9 from OFED 1.2.5.5. We would like
> to upgrade OFED to latest release. This would be probably OFED 1.3.2,
> together with integrated MVAPICH mvapich-1.0.1-2533.
>
> Question: do we have to recompile all our applications against new
> MVAPICH?

Pawel: One of the enhancements in mvapich-1.0.1 over 0.9.9 is a new
interface for the mpirun_rsh process manager that allows us to bootstrap
mvapich programs in a more scalable manner. Because of this interface
change, MPI programs built against mvapich-0.9.9 cannot be launched by the
mvapich-1.0.1 mpirun_rsh and need to be recompiled.

>
> thanks in advance, Pawel
> --
> Pawel Dziekonski
> Wroclaw Centre for Networking & Supercomputing, HPC Department
> Politechnika Wr., pl. Grunwaldzki 9, bud. D2/101, 50-377 Wroclaw, POLAND
> phone: +48 71 3202043, fax: +48 71 3225797, http://www.wcss.wroc.pl
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From panda at cse.ohio-state.edu Tue Sep 9 11:09:58 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Tue Sep 9 11:10:12 2008
Subject: [mvapich-discuss] Upgrade of MVAPICH -> recompile all apps??
In-Reply-To: <20080909103341.GD26153@cefeid.wcss.wroc.pl>
Message-ID:

Hi,

> currently we are using MVAPICH 0.9.9 from OFED 1.2.5.5. We would like
> to upgrade OFED to latest release. This would be probably OFED 1.3.2,
> together with integrated MVAPICH mvapich-1.0.1-2533.
>
> Question: do we have to recompile all our applications against new
> MVAPICH?

In the latest MVAPICH 1.0 series (including 1.0.1), a new scalable job
start-up scheme has been introduced. There have also been multiple other
enhancements since 0.9.9. Thus, it will be good to recompile all your
applications to take advantage of these latest features and enhancements.

Also, there is no OFED 1.3.2. I believe you are referring to OFED 1.3.1?

Thanks,

DK

> thanks in advance, Pawel
> --
> Pawel Dziekonski
> Wroclaw Centre for Networking & Supercomputing, HPC Department
> Politechnika Wr., pl. Grunwaldzki 9, bud.
D2/101, 50-377 Wroclaw, POLAND
> phone: +48 71 3202043, fax: +48 71 3225797, http://www.wcss.wroc.pl
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From pasha at mellanox.co.il Tue Sep 9 10:45:47 2008
From: pasha at mellanox.co.il (Pavel Shamis)
Date: Tue Sep 9 12:10:38 2008
Subject: [mvapich-discuss] Problem with the newest mvapich RPM spec file?
In-Reply-To:
Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD75975E@mtlexch01.mtl.com>

Hi All,

I will check the issue after euroPVM conf.

Thanks,
Pasha

> -----Original Message-----
> From: Dhabaleswar Panda [mailto:panda@cse.ohio-state.edu]
> Sent: Tuesday, September 09, 2008 4:45 PM
> To: Mike Heinz; Pavel Shamis
> Cc: mvapich-discuss@cse.ohio-state.edu
> Subject: RE: [mvapich-discuss] Problem with the newest mvapich RPM spec file?
>
> Mike - Thanks for the clarification.
>
> Pasha - Can you take a look at it.
>
> Thanks,
>
> DK
>
> > Sorry, OFED 1.4. The RPM is mvapich-1.1.0-2931.src.rpm
> >
> > --
> > Michael Heinz
> > Principal Engineer, Qlogic Corporation
> > King of Prussia, Pennsylvania
> >
> > -----Original Message-----
> > From: Dhabaleswar Panda [mailto:panda@cse.ohio-state.edu]
> > Sent: Tuesday, September 09, 2008 9:36 AM
> > To: Mike Heinz
> > Cc: mvapich-discuss@cse.ohio-state.edu; pasha@mellanox.co.il
> > Subject: Re: [mvapich-discuss] Problem with the newest mvapich RPM spec file?
> >
> > Mike,
> >
> > Are you referring to the MVAPICH RPM spec file in OFED 1.3.1 or OFED 1.4?
> >
> > Thanks,
> >
> > DK
> >
> > On Tue, 9 Sep 2008, Mike Heinz wrote:
> >
> > > When building recent versions of the mvapich RPM, we are seeing the
> > > following error when compiling on systems that still have g77 installed:
> > >
> > > configure: error: Fortran 90 and Fortran 77 compilers are not
> > > compatible.
> > >
> > > This was causing our automated build system to fail, even though the
> > > make process seems to ignore the error. This raises a couple of
> > > questions:
> > >
> > > 1. g77 doesn't support Fortran 90 as far as I know - is it correct
> > > to configure mpif90 to use g77?
> > > 2. Should this be an error or perhaps a warning?
> > >
> > > We can work around the error message easily enough now that we
> > > understand it - but I'm a little concerned about distributing an
> > > mpif90 command that won't be able to process real Fortran 90
> > > programs - should the spec file be altered to disable Fortran 90
> > > in this case?
> > >
> > > --
> > > Michael Heinz
> > > Principal Engineer, Qlogic Corporation
> > > King of Prussia, Pennsylvania
> > >
> > > _______________________________________________
> > > mvapich-discuss mailing list
> > > mvapich-discuss@cse.ohio-state.edu
> > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> > >
> >
>

From pasha at mellanox.co.il Tue Sep 9 11:03:17 2008
From: pasha at mellanox.co.il (Pavel Shamis)
Date: Tue Sep 9 12:10:39 2008
Subject: [mvapich-discuss] Problem with the newest mvapich RPM spec file?
In-Reply-To:
Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD759788@mtlexch01.mtl.com>

It looks like there was an issue in the spec file. Can you please check the
new version:
http://www.openfabrics.org/~pasha/ofed_1_4/mvapich/mvapich-1.1.0-2977.src.rpm

Thanks,
Pasha.

> -----Original Message-----
> From: Dhabaleswar Panda [mailto:panda@cse.ohio-state.edu]
> Sent: Tuesday, September 09, 2008 4:45 PM
> To: Mike Heinz; Pavel Shamis
> Cc: mvapich-discuss@cse.ohio-state.edu
> Subject: RE: [mvapich-discuss] Problem with the newest mvapich RPM spec file?
>
> Mike - Thanks for the clarification.
>
> Pasha - Can you take a look at it.
>
> Thanks,
>
> DK
>
> > Sorry, OFED 1.4. The RPM is mvapich-1.1.0-2931.src.rpm
> >
> > --
> > Michael Heinz
> > Principal Engineer, Qlogic Corporation
> > King of Prussia, Pennsylvania
> >
> > -----Original Message-----
> > From: Dhabaleswar Panda [mailto:panda@cse.ohio-state.edu]
> > Sent: Tuesday, September 09, 2008 9:36 AM
> > To: Mike Heinz
> > Cc: mvapich-discuss@cse.ohio-state.edu; pasha@mellanox.co.il
> > Subject: Re: [mvapich-discuss] Problem with the newest mvapich RPM spec file?
> >
> > Mike,
> >
> > Are you referring to the MVAPICH RPM spec file in OFED 1.3.1 or OFED 1.4?
> >
> > Thanks,
> >
> > DK
> >
> > On Tue, 9 Sep 2008, Mike Heinz wrote:
> >
> > > When building recent versions of the mvapich RPM, we are seeing the
> > > following error when compiling on systems that still have g77 installed:
> > >
> > > configure: error: Fortran 90 and Fortran 77 compilers are not
> > > compatible.
> > >
> > > This was causing our automated build system to fail, even though the
> > > make process seems to ignore the error. This raises a couple of
> > > questions:
> > >
> > > 1. g77 doesn't support Fortran 90 as far as I know - is it correct
> > > to configure mpif90 to use g77?
> > > 2. Should this be an error or perhaps a warning?
> > >
> > > We can work around the error message easily enough now that we
> > > understand it - but I'm a little concerned about distributing an
> > > mpif90 command that won't be able to process real Fortran 90
> > > programs - should the spec file be altered to disable Fortran 90
> > > in this case?
> > >
> > > --
> > > Michael Heinz
> > > Principal Engineer, Qlogic Corporation
> > > King of Prussia, Pennsylvania
> > >
> > > _______________________________________________
> > > mvapich-discuss mailing list
> > > mvapich-discuss@cse.ohio-state.edu
> > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> > >
> >
>

From pawel.dziekonski at wcss.pl Tue Sep 9 14:26:57 2008
From: pawel.dziekonski at wcss.pl (Pawel Dziekonski)
Date: Tue Sep 9 14:44:45 2008
Subject: [mvapich-discuss] Upgrade of MVAPICH -> recompile all apps??
In-Reply-To:
References: <20080909103341.GD26153@cefeid.wcss.wroc.pl>
Message-ID: <20080909182657.GA741@cefeid.wcss.wroc.pl>

On Tue, 09 Sep 2008 at 11:09:58AM -0400, Dhabaleswar Panda wrote:
> Hi,
>
> > currently we are using MVAPICH 0.9.9 from OFED 1.2.5.5. We would
> > like to upgrade OFED to latest release. This would be probably
> > OFED 1.3.2, together with integrated MVAPICH mvapich-1.0.1-2533.
> >
> > Question: do we have to recompile all our applications against new
> > MVAPICH?
>
> In the latest MVAPICH 1.0 series (including 1.0.1), a new scalable
> job start-up scheme has been introduced. There have been also
> multiple other enhancements since 0.9.9. Thus, it will be good to
> recompile all your applications to take advantage of these latest
> features and enhancements.

painful but OK. :)

currently, we use mpiexec as a job start-up mechanism. also because of
good integration with PBS Pro batch queueing system.

Would it be a good time to switch to MVAPICH2? What do you and the list
suggest?

> Also, there is no OFED 1.3.2. I believe you are referring to OFED 1.3.1?

You are correct - I meant the daily build from
http://www.openfabrics.org/downloads/OFED/ofed-1.3.2-daily/

Anyway, OFED 1.3.1 also contains mvapich-1.0.1-2533.

thanks for all replies! Pawel
--
Pawel Dziekonski
Wroclaw Centre for Networking & Supercomputing, HPC Department
Politechnika Wr., pl. Grunwaldzki 9, bud.
D2/101, 50-377 Wroclaw, POLAND
phone: +48 71 3202043, fax: +48 71 3225797, http://www.wcss.wroc.pl

From ake.sandgren at hpc2n.umu.se Wed Sep 10 01:58:43 2008
From: ake.sandgren at hpc2n.umu.se (=?ISO-8859-1?Q?=C5ke?= Sandgren)
Date: Wed Sep 10 01:59:05 2008
Subject: [mvapich-discuss] Assertion failure in mvapich 1.2rc2
Message-ID: <1221026323.1719.2.camel@skalman.hpc2n.umu.se>

Hi!

I'm running VASP (4.6.35) built with mvapich 1.2rc2 built with the
pathscale compiler (3.2) and for some inputs I get

Assertion failed in file ibv_rndv.c at line 645:
sreq->mrail.rndv_buf_off == sreq->mrail.rndv_buf_sz

This is a bit hard to debug since it takes ~1.5 hours before it happens.

Any ideas of why this would occur?
And more importantly how to fix it.

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: ake@hpc2n.umu.se Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se

From panda at cse.ohio-state.edu Wed Sep 10 09:38:11 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Wed Sep 10 09:38:27 2008
Subject: [mvapich-discuss] Assertion failure in mvapich 1.2rc2
In-Reply-To: <1221026323.1719.2.camel@skalman.hpc2n.umu.se>
Message-ID:

Thanks for reporting this. Do you see this assertion error with any other
compiler? Could you also provide some more details on the set-up: how many
nodes you are using to run this application, type of computing platform,
type of IB adapter, etc. Also, are you using the default build of mvapich2
or with any runtime configuration options? You indicate getting this error
for some inputs. Are there any differences between the input files for which
you see the error and those for which you do not?

Thanks,

DK

On Wed, 10 Sep 2008, [ISO-8859-1] Åke Sandgren wrote:

> Hi!
>
> I'm running VASP (4.6.35) built with mvapich 1.2rc2 built with the
> pathscale compiler (3.2) and for some inputs I get
>
> Assertion failed in file ibv_rndv.c at line 645:
> sreq->mrail.rndv_buf_off == sreq->mrail.rndv_buf_sz
>
> This is a bit hard to debug since it takes ~1.5 hours before it happens.
>
> Any ideas of why this would occur?
> And more importantly how to fix it.
>
> --
> Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
> Internet: ake@hpc2n.umu.se Phone: +46 90 7866134 Fax: +46 90 7866126
> Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From ake.sandgren at hpc2n.umu.se Wed Sep 10 10:04:34 2008
From: ake.sandgren at hpc2n.umu.se (=?ISO-8859-1?Q?=C5ke?= Sandgren)
Date: Wed Sep 10 10:04:59 2008
Subject: [mvapich-discuss] Assertion failure in mvapich 1.2rc2
In-Reply-To:
References:
Message-ID: <1221055474.22648.25.camel@skalman.hpc2n.umu.se>

On Wed, 2008-09-10 at 09:38 -0400, Dhabaleswar Panda wrote:
> Thanks for reporting this. Do you see this assertion error with any other
> compiler? Could you also provide some more details on the set-up: how many
> nodes you are using to run this application, type of computing platform,
> type of IB adapter, etc. Also, are you using the default build of mvapich2
> or with any runtime configuration options. You indicate getting this error
> for some inputs. Are there any differences in the input file for which you
> see the error and for which you do not see the error.

I'll try :-)

I have only tried with pathscale (my top priority right now is to get the
VASP code running so I'll stick to pathscale a while longer).
The number of nodes varies from 8 to 25 (we have dual quad-core nodes so x8
for processors).
The system is an IBM blade system with Mellanox Technologies MT25208
InfiniHost III Ex adapters.
configure --enable-error-checking=all --enable-error-messages=all --enable-g=handle,dbg,mutex,meminit --enable-fast=none,O0 --with-rdma=gen2 --with-pm=mpd --without-mpe (with CC and so on set to pathcc etc) Running without any mvapich env vars set. As for the inputs to VASP i don't know enough about VASP to be able to guess why some generate the error and some not. Note also that mvapich2 1.0.2p1 gave the same assertion error when i used it. If you need more details just ask. -- Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden Internet: ake@hpc2n.umu.se Phone: +46 90 7866134 Fax: +46 90 7866126 Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se From panda at cse.ohio-state.edu Wed Sep 10 17:10:35 2008 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Wed Sep 10 17:10:56 2008 Subject: [mvapich-discuss] Upgrade of MVAPICH -> recompile all apps?? In-Reply-To: <20080909182657.GA741@cefeid.wcss.wroc.pl> Message-ID: > On Tue, 09 Sep 2008 at 11:09:58AM -0400, Dhabaleswar Panda wrote: > > Hi, > > > > > currently we are using MVAPICH 0.9.9 from OFED 1.2.5.5. We would > > > like to upgrade OFED to latest release. THis would be probably > > > OFED 1.3.2, together with integrated MVAPICH mvapich-1.0.1-2533. > > > > > > Question: do we have to recompile all our applications against new > > > MVAPICH? > > > > In the latest MVAPICH 1.0 series (including 1.0.1), a new scalable > > job start-up scheme has been introduced. There have been also > > multiple other enhancements since 0.9.9. Thus, it will be good to > > recompile all your applications to take advantage of these latest > > features and enhancements. > > painful but OK. :) You can also try to build MPI library with `shared library' feature. This will avoid the need to recompile your applications. Both MVAPICH and MVAPICH2 have these features. > currently, we use mpiexec as a job start-up mechanism. also because of > good integration with PBS Pro batch queueing system. 
> > would it be a good time to switch to MVAPICH2? what is your and list > suggestion? Since MVAPICH2 has more features and functionalities, we have been suggesting users to migrate to MVAPICH2. At many places, both MVAPICH and MVAPICH2 are being installed on the same system so that the users can easily switch between these two libraries. This helps the users to test their applications with MVAPICH2 and gradually move to MVAPICH2. Depending on the requirements from your users, you may consider this option. > > Also, there is no OFED 1.3.2. I believe you are referring to OFED 1.3.1? > > You are correct - I meant daily build from > http://www.openfabrics.org/downloads/OFED/ofed-1.3.2-daily/ > anyway, OFED 1.3.1 also contains mvapich-1.0.1-2533. Yes, you can use this. It is not a formal release yet. Once your OFED libraries are installed, you can also download the latest MVAPICH and MVAPICH2 released versions from the OSU site and install these. They will work. Thanks, DK > thanks for all replies! > Pawel > -- > Pawel Dziekonski > Wroclaw Centre for Networking & Supercomputing, HPC Department > Politechnika Wr., pl. Grunwaldzki 9, bud. D2/101, 50-377 Wroclaw, POLAND > phone: +48 71 3202043, fax: +48 71 3225797, http://www.wcss.wroc.pl > From twcroc at wm.edu Thu Sep 11 15:21:07 2008 From: twcroc at wm.edu (Tom Crockett) Date: Thu Sep 11 15:21:23 2008 Subject: [mvapich-discuss] Process Termination Detection with mpirun_rsh In-Reply-To: References: Message-ID: <48C96FA3.7050509@wm.edu> Dhabaleswar Panda wrote: > This is to let you know that we have come up with a solution for the > "mpirun_rsh -rsh" problem. This solution also solves the hanging process > issue. > Please > try these latest versions (tarballs or directly from SVN) and let us know > whether it solves all of your problems. My test runs are now detecting abnormal process termination from whatever cause and the mpirun_rsh/mpispawn framework cleans up quickly and reliably. 
I really like this new daemonless approach; it's a lot easier to work with in our batch environment, and ultimately I think it's going to be more dependable, too. Thank you very much, -Tom -- Tom Crockett College of William and Mary email: twcroc@wm.edu IT/High Performance Computing Group phone: (757) 221-2762 Savage House fax: (757) 221-2023 P.O. Box 8795 Williamsburg, VA 23187-8795 From hoot at ptpnow.com Fri Sep 12 10:49:25 2008 From: hoot at ptpnow.com (Hoot Thompson) Date: Fri Sep 12 10:49:43 2008 Subject: [mvapich-discuss] Mvapich2 on windows Message-ID: <450C368F5CE9444A887B6879CBB34AAE@ptpdesk> Is there a way to install mvapich2 on a Windows platform under cygwin or the Portland CDK? We're testing a Microsoft HPC Cluster with infiniband and the OFED for windows software is installed (IB interfaces appear to be working). The make.mvapich2.ofa script sets OPEN_IB_HOME to /usr/local/ofed. Can that be changed to be consistent with the Windows world and if so, to what? Thanks! From panda at cse.ohio-state.edu Fri Sep 12 11:49:51 2008 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Fri Sep 12 11:50:08 2008 Subject: [mvapich-discuss] Mvapich2 on windows In-Reply-To: <450C368F5CE9444A887B6879CBB34AAE@ptpdesk> Message-ID: Hi Hoot, > Is there a way to install mvapich2 on a Windows platform under cygwin or the > Portland CDK? We're testing a Microsoft HPC Cluster with infiniband and the > OFED for windows software is installed (IB interfaces appear to be working). > The make.mvapich2.ofa script sets OPEN_IB_HOME to /usr/local/ofed. Can that > be changed to be consistent with the Windows world and if so, to what? I do not think simply changing the OPEN_IB_HOME to the Windows world will enable it. We have not tried it. You can try it and let us know your experience. I believe it will require substantial changes to make MVAPICH2 work with the OFED version for Windows. Thanks, DK > Thanks! 
> > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From hoot at ptpnow.com Fri Sep 12 11:58:54 2008 From: hoot at ptpnow.com (Hoot Thompson) Date: Fri Sep 12 11:59:14 2008 Subject: [mvapich-discuss] Mvapich2 on windows In-Reply-To: References: <450C368F5CE9444A887B6879CBB34AAE@ptpdesk> Message-ID: How about with the Melanox drivers and Network Direct? Any easier? -----Original Message----- From: Dhabaleswar Panda [mailto:panda@cse.ohio-state.edu] Sent: Friday, September 12, 2008 11:50 AM To: Hoot Thompson Cc: mvapich-discuss@cse.ohio-state.edu; 'Kirk Hunter' Subject: Re: [mvapich-discuss] Mvapich2 on windows Hi Hoot, > Is there a way to install mvapich2 on a Windows platform under cygwin > or the Portland CDK? We're testing a Microsoft HPC Cluster with > infiniband and the OFED for windows software is installed (IB interfaces appear to be working). > The make.mvapich2.ofa script sets OPEN_IB_HOME to /usr/local/ofed. > Can that be changed to be consistent with the Windows world and if so, to what? I do not think simply changing the OPEN_IB_HOME to the Windows world will enable it. We have not tried it. You can try it and let us know your experience. I believe it will require substantial changes to make MVAPICH2 work with the OFED version for Windows. Thanks, DK > Thanks! > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From hoot at ptpnow.com Fri Sep 12 12:02:04 2008 From: hoot at ptpnow.com (Hoot Thompson) Date: Fri Sep 12 12:02:25 2008 Subject: [mvapich-discuss] Mvapich2 on windows In-Reply-To: References: <450C368F5CE9444A887B6879CBB34AAE@ptpdesk> Message-ID: Correction, NetworkDirect is actually a Microsoft product. So can mvapich2 talk to it? 
-----Original Message----- From: Dhabaleswar Panda [mailto:panda@cse.ohio-state.edu] Sent: Friday, September 12, 2008 11:50 AM To: Hoot Thompson Cc: mvapich-discuss@cse.ohio-state.edu; 'Kirk Hunter' Subject: Re: [mvapich-discuss] Mvapich2 on windows Hi Hoot, > Is there a way to install mvapich2 on a Windows platform under cygwin > or the Portland CDK? We're testing a Microsoft HPC Cluster with > infiniband and the OFED for windows software is installed (IB interfaces appear to be working). > The make.mvapich2.ofa script sets OPEN_IB_HOME to /usr/local/ofed. > Can that be changed to be consistent with the Windows world and if so, to what? I do not think simply changing the OPEN_IB_HOME to the Windows world will enable it. We have not tried it. You can try it and let us know your experience. I believe it will require substantial changes to make MVAPICH2 work with the OFED version for Windows. Thanks, DK > Thanks! > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From panda at cse.ohio-state.edu Fri Sep 12 14:38:03 2008 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Fri Sep 12 14:38:17 2008 Subject: [mvapich-discuss] Mvapich2 on windows In-Reply-To: Message-ID: > Correction, NetworkDirect is actually a Microsoft product. So can mvapich2 > talk to it? We have not tried it. However, I believe this NetworkDirect interface is different than the standard OFED libibverbs interface. Thus, MVAPICH2-ofa interface will not work with it directly unless changes are done to the MVAPICH2 stack. Might be some people from Microsoft can provide us with more details and insights to this issue. 
Thanks, DK > -----Original Message----- > From: Dhabaleswar Panda [mailto:panda@cse.ohio-state.edu] > Sent: Friday, September 12, 2008 11:50 AM > To: Hoot Thompson > Cc: mvapich-discuss@cse.ohio-state.edu; 'Kirk Hunter' > Subject: Re: [mvapich-discuss] Mvapich2 on windows > > Hi Hoot, > > > Is there a way to install mvapich2 on a Windows platform under cygwin > > or the Portland CDK? We're testing a Microsoft HPC Cluster with > > infiniband and the OFED for windows software is installed (IB interfaces > appear to be working). > > The make.mvapich2.ofa script sets OPEN_IB_HOME to /usr/local/ofed. > > Can that be changed to be consistent with the Windows world and if so, to > what? > > I do not think simply changing the OPEN_IB_HOME to the Windows world will > enable it. We have not tried it. You can try it and let us know your > experience. I believe it will require substantial changes to make MVAPICH2 > work with the OFED version for Windows. > > Thanks, > > DK > > > Thanks! > > > > > > _______________________________________________ > > mvapich-discuss mailing list > > mvapich-discuss@cse.ohio-state.edu > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > > > > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From hoot at ptpnow.com Fri Sep 12 14:40:21 2008 From: hoot at ptpnow.com (Hoot Thompson) Date: Fri Sep 12 14:40:41 2008 Subject: [mvapich-discuss] Mvapich2 on windows In-Reply-To: References: Message-ID: Thanks! -----Original Message----- From: Dhabaleswar Panda [mailto:panda@cse.ohio-state.edu] Sent: Friday, September 12, 2008 2:38 PM To: Hoot Thompson Cc: 'Kirk Hunter'; mvapich-discuss@cse.ohio-state.edu Subject: RE: [mvapich-discuss] Mvapich2 on windows > Correction, NetworkDirect is actually a Microsoft product. So can > mvapich2 talk to it? We have not tried it. 
However, I believe this NetworkDirect interface is different than the standard OFED libibverbs interface. Thus, MVAPICH2-ofa interface will not work with it directly unless changes are done to the MVAPICH2 stack. Might be some people from Microsoft can provide us with more details and insights to this issue. Thanks, DK > -----Original Message----- > From: Dhabaleswar Panda [mailto:panda@cse.ohio-state.edu] > Sent: Friday, September 12, 2008 11:50 AM > To: Hoot Thompson > Cc: mvapich-discuss@cse.ohio-state.edu; 'Kirk Hunter' > Subject: Re: [mvapich-discuss] Mvapich2 on windows > > Hi Hoot, > > > Is there a way to install mvapich2 on a Windows platform under > > cygwin or the Portland CDK? We're testing a Microsoft HPC Cluster > > with infiniband and the OFED for windows software is installed (IB > > interfaces > appear to be working). > > The make.mvapich2.ofa script sets OPEN_IB_HOME to /usr/local/ofed. > > Can that be changed to be consistent with the Windows world and if > > so, to > what? > > I do not think simply changing the OPEN_IB_HOME to the Windows world > will enable it. We have not tried it. You can try it and let us know > your experience. I believe it will require substantial changes to make > MVAPICH2 work with the OFED version for Windows. > > Thanks, > > DK > > > Thanks! 
> >
> >
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss@cse.ohio-state.edu
> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
>
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From drace at appro.com Mon Sep 15 19:45:43 2008
From: drace at appro.com (David Race)
Date: Mon Sep 15 19:46:06 2008
Subject: [mvapich-discuss] vbuf problem
Message-ID:

Hello,

We are using mvapich2-1.2rc2 with a system that has four mellanox DDR interfaces in each computer and 16 cpus in each computer. When we define

MV2_NUM_HCAS=4

we get a failure in line 230 of vbuf.c which indicates a failure in the following code

for (i = 0; i < rdma_num_hcas; ++i)
{
    reg->mem_handle[i] = ibv_reg_mr(
            ptag_save[i],
            vbuf_dma_buffer,
            nvbufs * rdma_vbuf_total_size,
            IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
    if (!reg->mem_handle[i])
    {
        fprintf(stderr, "[%s %d] Cannot register vbuf region\n", __FILE__, __LINE__);
        return -1;
    }
}
We get this failure in as few as 289 processors, has someone run across this problem before? Is there a suggested set of environment variables that might help prevent the failure?

Thanks

David Race, Ph.D.
Principle Engineer
Appro International, Inc.
25003 Pitkin Road, Suite F600
Spring, TX 77386
Phone: 469-212-4860
Email: drace@appro.com

From chai.15 at osu.edu Mon Sep 15 23:26:15 2008
From: chai.15 at osu.edu (Lei Chai)
Date: Mon Sep 15 23:26:00 2008
Subject: [mvapich-discuss] vbuf problem
In-Reply-To:
References:
Message-ID: <48CF2757.7010602@osu.edu>

Hi David,

Thanks for reporting the error. We have not tested it with 4 HCAs per node. Could you run the command "ulimit -l" on your system and let us know the output?
If it's not "unlimited", please follow the instructions in the userguide section 9.3.4 ( http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.2rc2.html#x1-530009.3.4 ) and set the limit to "unlimited" and try again. If you still see the error, then may I ask you the following questions: - Did you see the error with a benchmark or an application? And what benchmark/application is it? - What configure/make/run-time options did you use? - Do you see the error when using less than 4 HCAs? These will help us get more insight into the problem. Thanks, Lei David Race wrote: > Hello, > > We are using mvapich2-1.2rc2 with a system that has four mellanox DDR interfaces in each computer and 16 cpus in each computer. When we define > > MV2_NUM_HCAS=4 > > we get a failure in line 230 of vbuf.c which indicates a failure in the following code > > for (i = 0; i < rdma_num_hcas; ++i) > { > reg->mem_handle[i] = ibv_reg_mr( > ptag_save[i], > vbuf_dma_buffer, > nvbufs * rdma_vbuf_total_size, > IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE); > if (!reg->mem_handle[i]) > { > fprintf(stderr, "[%s %d] Cannot register vbuf region\n", __FILE__, __LINE__); > return -1; > } > } > We get this failure in as few as 289 processors, has someone run across this problem before? Is there a suggested set of environment variables that might help prevent the failure? > > Thanks > > David Race, Ph.D. > Principle Engineer > Appro International, Inc. 
> 25003 Pitkin Road, Suite F600
> Spring, TX 77386
> Phone: 469-212-4860
> Email: drace@appro.com
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From gopalakk at cse.ohio-state.edu Tue Sep 16 12:24:13 2008
From: gopalakk at cse.ohio-state.edu (Karthik Gopalakrishnan)
Date: Tue Sep 16 12:24:29 2008
Subject: [mvapich-discuss] Assertion failure in mvapich 1.2rc2 (fwd)
In-Reply-To:
References:
Message-ID: <92eddfb50809160924k6365f5bs1dfcbdb5da2441a0@mail.gmail.com>

Hello.

You can try adding "ulimit -c unlimited" to your .bashrc file. This should enable dumping core.

Regards,
Karthik

On Tue, Sep 16, 2008 at 12:17 PM, Hari Subramoni wrote:
>
>
> ---------- Forwarded message ----------
> Date: Thu, 11 Sep 2008 22:00:57 +0200
> From: "[ISO-8859-1] Åke Sandgren"
> To: Lei Chai
> Cc: Dhabaleswar Panda,
>     mvapich-core@cse.ohio-state.edu
> Subject: Re: [mvapich-discuss] Assertion failure in mvapich 1.2rc2
>
> On Thu, 2008-09-11 at 20:47 +0200, Åke Sandgren wrote:
>> On Thu, 2008-09-11 at 11:17 -0700, Lei Chai wrote:
>> > Hi,
>> >
>> > To use the debug mode of mvapich2, you can use this configure option:
>> >
>> > ./configure --enable-g=dbg
>> >
>> > Then when you run the program, you can use "mpirun_rsh -debug" or
>> > "mpiexec -gdb", then I think you will be able to get core dump by gdb
>> > commands like "generate-core-file". Alternatively, you can replace the
>> > assertion in the code with an artificial segfault, and see the backtrace.
>>
>> What I want is for the code to drop a core file by itself when running
>> through the batch system, not through use of any debugger. It's a real
>> pain running a debugger on anything with more than 4 tasks.
>> (And our ddt license is for max 16 tasks so I can't use that either)
>
> Perhaps I should have been a bit more specific.
I want the mvapich to > not catch SEGV itself and instead let the system drop core. > > From drace at appro.com Tue Sep 16 14:42:27 2008 From: drace at appro.com (David Race) Date: Tue Sep 16 14:42:57 2008 Subject: [mvapich-discuss] vbuf problem In-Reply-To: <48CF2757.7010602@osu.edu> References: , <48CF2757.7010602@osu.edu> Message-ID: <114ACC34-3020-466F-BFB2-A4AEDD1DA105@mimectl> The "ulimit -l" is unlimited on all of the compute nodes and management nodes. We saw this error with a benchmark. It was a transpose algorithm. (I have included the application in the attached tar file.) I have attached the configure file and the runtime files in the tar file also. I saw the error with 1024 cpus and two HCA with the same application. Do you need any more information? Thanks David Race, Ph.D. Principle Engineer Appro International, Inc. 25003 Pitkin Road, Suite F600 Spring, TX 77386 Phone: 469-212-4860 Email: drace@appro.com ________________________________ From: Lei Chai [chai.15@osu.edu] Sent: Monday, September 15, 2008 10:26 PM To: David Race Cc: mvapich-discuss@cse.ohio-state.edu Subject: Re: [mvapich-discuss] vbuf problem Hi David, Thanks for reporting the error. We have not tested it with 4 HCAs per node. Could you run the command "ulimit -l" on your system and let us know the output? If it's not "unlimited", please follow the instructions in the userguide section 9.3.4 ( http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.2rc2.html#x1-530009.3.4 ) and set the limit to "unlimited" and try again. If you still see the error, then may I ask you the following questions: - Did you see the error with a benchmark or an application? And what benchmark/application is it? - What configure/make/run-time options did you use? - Do you see the error when using less than 4 HCAs? These will help us get more insight into the problem. 
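
The locked-memory check that Lei asks for above can also be made from inside a process, which may help when the limit seen by a job launched through a daemon or batch system differs from the one in an interactive shell. The following is only a minimal sketch, not part of MVAPICH2: the helper names `limit_is_unlimited` and `report_memlock_limit` are hypothetical, and the code merely reads `RLIMIT_MEMLOCK` with the standard `getrlimit()` call. Note that `ulimit -l` reports kilobytes while `getrlimit()` reports bytes.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Returns 1 if the given limit value means "unlimited", otherwise 0. */
int limit_is_unlimited(rlim_t value)
{
    return value == RLIM_INFINITY;
}

/* Hypothetical helper (not an MVAPICH2 function): prints the soft
 * locked-memory limit of the calling process to 'out'.
 * Returns 0 on success and -1 if getrlimit() fails. */
int report_memlock_limit(FILE *out)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0)
        return -1;

    if (limit_is_unlimited(lim.rlim_cur))
        fprintf(out, "max locked memory: unlimited\n");
    else
        fprintf(out, "max locked memory: %llu bytes\n",
                (unsigned long long)lim.rlim_cur);
    return 0;
}
```

Calling `report_memlock_limit(stderr)` early in each rank, before any memory is registered, would show whether the limit really is "unlimited" on the nodes where the job runs.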
Thanks, Lei David Race wrote: > Hello, > > We are using mvapich2-1.2rc2 with a system that has four mellanox DDR interfaces in each computer and 16 cpus in each computer. When we define > > MV2_NUM_HCAS=4 > > we get a failure in line 230 of vbuf.c which indicates a failure in the following code > > for (i = 0; i < rdma_num_hcas; ++i) > { > reg->mem_handle[i] = ibv_reg_mr( > ptag_save[i], > vbuf_dma_buffer, > nvbufs * rdma_vbuf_total_size, > IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE); > if (!reg->mem_handle[i]) > { > fprintf(stderr, "[%s %d] Cannot register vbuf region\n", __FILE__, __LINE__); > return -1; > } > } > We get this failure in as few as 289 processors, has someone run across this problem before? Is there a suggested set of environment variables that might help prevent the failure? > > Thanks > > David Race, Ph.D. > Principle Engineer > Appro International, Inc. > 25003 Pitkin Road, Suite F600 > Spring, TX 77386 > Phone: 469-212-4860 > Email: drace@appro.com > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > -------------- next part -------------- A non-text attachment was scrubbed... Name: bug.tar Type: application/x-tar Size: 20480 bytes Desc: bug.tar Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080916/2d00e3d6/bug-0001.tar From chai.15 at osu.edu Wed Sep 17 16:16:40 2008 From: chai.15 at osu.edu (Lei Chai) Date: Wed Sep 17 16:16:11 2008 Subject: [mvapich-discuss] vbuf problem In-Reply-To: <114ACC34-3020-466F-BFB2-A4AEDD1DA105@mimectl> References: <48CF2757.7010602@osu.edu> <114ACC34-3020-466F-BFB2-A4AEDD1DA105@mimectl> Message-ID: <48D165A8.3080400@osu.edu> Thanks for sending us the program and the related files. We are taking a look at the problem and will get back to you. In the mean time, could you try mvapich-1.0.3? And also with 1 HCA do you see this error at all? 
Thanks, Lei David Race wrote: > The "ulimit -l" is unlimited on all of the compute nodes and management nodes. > > We saw this error with a benchmark. It was a transpose algorithm. (I have included the application in the attached tar file.) > > I have attached the configure file and the runtime files in the tar file also. > > I saw the error with 1024 cpus and two HCA with the same application. > > Do you need any more information? > > Thanks > > David Race, Ph.D. > Principle Engineer > Appro International, Inc. > 25003 Pitkin Road, Suite F600 > Spring, TX 77386 > Phone: 469-212-4860 > Email: drace@appro.com > ________________________________ > From: Lei Chai [chai.15@osu.edu] > Sent: Monday, September 15, 2008 10:26 PM > To: David Race > Cc: mvapich-discuss@cse.ohio-state.edu > Subject: Re: [mvapich-discuss] vbuf problem > > Hi David, > > Thanks for reporting the error. We have not tested it with 4 HCAs per node. Could you run the command "ulimit -l" on your system and let us know the output? If it's not "unlimited", please follow the instructions in the userguide section 9.3.4 ( > http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.2rc2.html#x1-530009.3.4 > ) and set the limit to "unlimited" and try again. > > If you still see the error, then may I ask you the following questions: > > - Did you see the error with a benchmark or an application? And what benchmark/application is it? > > - What configure/make/run-time options did you use? > > - Do you see the error when using less than 4 HCAs? > > These will help us get more insight into the problem. > > Thanks, > Lei > > > David Race wrote: > > >> Hello, >> >> We are using mvapich2-1.2rc2 with a system that has four mellanox DDR interfaces in each computer and 16 cpus in each computer. 
When we define >> >> MV2_NUM_HCAS=4 >> >> we get a failure in line 230 of vbuf.c which indicates a failure in the following code >> >> for (i = 0; i < rdma_num_hcas; ++i) >> { >> reg->mem_handle[i] = ibv_reg_mr( >> ptag_save[i], >> vbuf_dma_buffer, >> nvbufs * rdma_vbuf_total_size, >> IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE); >> if (!reg->mem_handle[i]) >> { >> fprintf(stderr, "[%s %d] Cannot register vbuf region\n", __FILE__, __LINE__); >> return -1; >> } >> } >> We get this failure in as few as 289 processors, has someone run across this problem before? Is there a suggested set of environment variables that might help prevent the failure? >> >> Thanks >> >> David Race, Ph.D. >> Principle Engineer >> Appro International, Inc. >> 25003 Pitkin Road, Suite F600 >> Spring, TX 77386 >> Phone: 469-212-4860 >> Email: drace@appro.com >> >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >> >> From panda at cse.ohio-state.edu Thu Sep 18 08:25:19 2008 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Thu Sep 18 08:25:24 2008 Subject: [mvapich-discuss] Q: MPI_ALLGATHERV causes Invalid communicator & comm=0x0 Message-ID: Manfred - Thanks for confirming that things are working well on your side wrt the modified application and MVAPICH2 library. For information to the list - An offline discussion was held on this issue with Manfred. With the help from ANL folks, an error was found in the MPI application (not in the MVAPICH2 library). Since things are working smooth with the modified application, we are closing this request. Thanks, DK ============================= Date: Thu, 3 Jul 2008 18:32:20 +0200 (CEST) From: Manfred Muecke To: mvapich-discuss@cse.ohio-state.edu Subject: Q: MPI_ALLGATHERV causes Invalid communicator & comm=0x0 Hi, I have the following problem and ran out of ideas. Maybe someone can help with some advice. 
I get the following error message from all instances of my MPI-program (FORTRAN90), using MVAPICH2 1.0 (compiled with "mpe=mpicheck"):

Invalid communicator, error stack:
MPI_Comm_rank(107): MPI_Comm_rank(comm=0x0, rank=fffffd7fffdfd2bc) failed
MPI_Comm_rank(65).: Invalid communicator

The error is caused by a call to MPI_ALLGATHERV. It was discussed here that a similar-looking error is caused by including the wrong mpi.h. This one differs, however, in that comm=0x0 (the actual value of the communicator was 1140850688).

"mpif90 -show" gives:

/opt/local/SunStudio12/SUNWspro/bin/f90 -xO3 -xtarget=opteron -m64 -I/opt/local/MVAPICH/mvapich2-1.0/include -xO3 -M/opt/local/MVAPICH/mvapich2-1.0/include -L/opt/local/MVAPICH/mvapich2-1.0/lib -lmpichf90 -lmpichf90 -lmpich -L/usr/lib/amd64 -L/usr/ucblib/amd64 -lsocket -lnsl -lresolv -lpthread -ldat -lrt -lnsl -lsocket

I have checked thoroughly and cannot find any mpi.h from other installations interfering.

Any other ideas?

Thanks for your help,
Manfred

--
Manfred Mücke
manfred.muecke@univie.ac.at
Research Lab Computational Technologies and Applications
rlcta.univie.ac.at
Lenaugasse 2, 1080 Wien, AUSTRIA

From michael.heinz at qlogic.com Thu Sep 18 09:07:08 2008
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Thu Sep 18 09:07:14 2008
Subject: [mvapich-discuss] Problem with the newest mvapich RPM spec file?
In-Reply-To: <5D49E7A8952DC44FB38C38FA0D758EAD759788@mtlexch01.mtl.com>
References: <5D49E7A8952DC44FB38C38FA0D758EAD759788@mtlexch01.mtl.com>
Message-ID:

Pavel,

Unfortunately, the modified spec file does not correct the behavior - configure still tries to create an f90 version. I believe the correct patch for configuring with gcc v3 is:

*** mvapich.spec.orig 2008-09-18 09:03:50.000000000 -0400
--- mvapich.spec.new 2008-09-18 09:04:27.000000000 -0400
***************
*** 120,126 ****
  export FC=g77
  export F77=g77
  export F90=g77
! CONFIG_ENABLE_F90=""
  fi
  export CFLAGS="-Wall"
  export FFLAGS="-fPIC"
--- 120,126 ----
  export FC=g77
  export F77=g77
  export F90=g77
! CONFIG_ENABLE_F90="--disable-f90"
  fi
  export CFLAGS="-Wall"
  export FFLAGS="-fPIC"

I've tested this patch and it correctly stops configure from trying to build mpif90.

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

-----Original Message-----
From: Pavel Shamis [mailto:pasha@mellanox.co.il]
Sent: Tuesday, September 09, 2008 11:03 AM
To: Dhabaleswar Panda; Mike Heinz
Cc: mvapich-discuss@cse.ohio-state.edu
Subject: RE: [mvapich-discuss] Problem with the newest mvapich RPM spec file?

It looks like there was an issue in the spec file; can you please check the new version:
http://www.openfabrics.org/~pasha/ofed_1_4/mvapich/mvapich-1.1.0-2977.src.rpm

Thanks,
Pasha.

> -----Original Message-----
> From: Dhabaleswar Panda [mailto:panda@cse.ohio-state.edu]
> Sent: Tuesday, September 09, 2008 4:45 PM
> To: Mike Heinz; Pavel Shamis
> Cc: mvapich-discuss@cse.ohio-state.edu
> Subject: RE: [mvapich-discuss] Problem with the newest mvapich RPM spec file?
>
> Mike - Thanks for the clarification.
>
> Pasha - Can you take a look at it.
>
> Thanks,
>
> DK
>
> > Sorry, OFED 1.4. The RPM is mvapich-1.1.0-2931.src.rpm
> >
> > --
> > Michael Heinz
> > Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania
> >
> > -----Original Message-----
> > From: Dhabaleswar Panda [mailto:panda@cse.ohio-state.edu]
> > Sent: Tuesday, September 09, 2008 9:36 AM
> > To: Mike Heinz
> > Cc: mvapich-discuss@cse.ohio-state.edu; pasha@mellanox.co.il
> > Subject: Re: [mvapich-discuss] Problem with the newest mvapich RPM spec file?
> >
> > Mike,
> >
> > Are you referring to the MVAPICH RPM spec file in OFED 1.3.1 or OFED 1.4?
> > > > Thanks, > > > > DK > > > > On Tue, 9 Sep 2008, Mike Heinz wrote: > > > > > When building recent versions of the mvapich RPM, we are > seeing the > > > following error when compiling on systems that still have g77 > > installed: > > > > > > configure: error: Fortran 90 and Fortran 77 compilers are not > > > compatible. > > > > > > This was causing our automated build system to fail, even > though the > > > make process seems to ignore the error. This raises a couple of > > > questions: > > > > > > 1. g77 doesn't support Fortran 90 as far as I know - is > it correct > > > to configure mpif90 to use g77? > > > 2. Should this be an error or perhaps a warning? > > > > > > We can work around the error message easily enough now that we > > > understand it - but I'm a little concerned about distributing an > > > mpif90 command that won't be able to process real Fortran 90 > > > programs > > > - should the spec file be altered to disable Fortran 90 > in this case? > > > > > > -- > > > Michael Heinz > > > Principal Engineer, Qlogic Corporation King of Prussia, > Pennsylvania > > > > > > _______________________________________________ > > > mvapich-discuss mailing list > > > mvapich-discuss@cse.ohio-state.edu > > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > > > > > > > > From dstuebe at umassd.edu Thu Sep 18 21:15:25 2008 From: dstuebe at umassd.edu (David Stuebe) Date: Thu Sep 18 21:15:32 2008 Subject: [mvapich-discuss] Fwd: fortran system calls In-Reply-To: <1f31dac10809181814h6869624di933eb89ce6f7612@mail.gmail.com> References: <1f31dac10809181814h6869624di933eb89ce6f7612@mail.gmail.com> Message-ID: <1f31dac10809181815u70d75ab9l520d42c084057dc@mail.gmail.com> Hello MVAPICH I am helping set up and new cluster and I have run into a problem using mvapich to compile and run a Fortran90 code which uses system calls. The program compiles, but will not run on more than one node, even though only one processor makes the system call. Very strange! 
All is well when run on only one node of the cluster.

Running:
mvapich2 1.0.2
Intel 10.1 compiler
OFED 1.2.5.3
Linux X86_64 2.6.9-67.0.7.ELsmp

Cluster built by aspen systems - dual processor Quad core hardware.

Has anyone seen anything similar - I am not sure it is worth trying to
fix, but if by posting it I save someone else some time, I will feel
warm and fuzzy inside...

!==================================================
program mpi_test
  USE MPI
  implicit none

  INTEGER:: MYID,NPROCS, IERR

  WRITE(6,*)"START TEST"
  CALL MPI_INIT(IERR)
  WRITE(6,*)"MPI_INIT: MPI_COMM_WORLD,IERR",MPI_COMM_WORLD,IERR

  CALL MPI_COMM_RANK(MPI_COMM_WORLD,MYID,IERR)
  WRITE(6,*)"MPI_COMM_RANK: MYID,IERR",MYID,IERR
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD,NPROCS,IERR)
  WRITE(6,*)"MPI_COMM_RANK: NPROCS,IERR",NPROCS,IERR

  CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)

  WRITE(6,*) "CALLED BARRIER: myid",myid,IERR

  IF(MYID==0) THEN
    CALL SYSTEM( "uptime > up_out" )
    WRITE(6,*) "CALLED SYSTEM: myid",myid
  END IF

  CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)

  WRITE(6,*) "CALLED BARRIER: myid",myid,IERR

  CALL MPI_FINALIZE(IERR)

end program mpi_test
!==================================================

RESULT FROM RUN: mpiexec -n 2 ./mpit

START TEST
START TEST
MPI_INIT: MPI_COMM_WORLD,IERR 1140850688 0
MPI_COMM_RANK: MYID,IERR 0 0
MPI_COMM_RANK: NPROCS,IERR 2 0
MPI_INIT: MPI_COMM_WORLD,IERR 1140850688 0
MPI_COMM_RANK: MYID,IERR 1 0
MPI_COMM_RANK: NPROCS,IERR 2 0
CALLED BARRIER: myid 1 0
CALLED BARRIER: myid 0 0
CALLED SYSTEM: myid 0
CALLED BARRIER: myid 0 0
send desc error
[0] Abort: [] Got completion with error 4, vendor code=52, dest rank=1
at line 513 in file ibv_channel_manager.c
rank 0 in job 50 cpr_52824 caused collective abort of all ranks
exit status of rank 0: killed by signal 9

Thanks so much

David

From koop at cse.ohio-state.edu Thu Sep 18 21:35:46 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Thu Sep 18 21:35:50 2008
Subject: [mvapich-discuss] Fwd: fortran system calls
In-Reply-To:
<1f31dac10809181815u70d75ab9l520d42c084057dc@mail.gmail.com> Message-ID: Hi David, This is a known problem with OFED. Your kernel is too old to support system calls and OFED at the same time. To have fork() and system call support you need to have a 2.6.16 or later kernel with OFED 1.2+ and also export the IBV_FORK_SAFE=1 environment variable. This is why it isn't having any problems on a single node since shared memory (and not IB) is being used for communication. Matt On Thu, 18 Sep 2008, David Stuebe wrote: > Hello MVAPICH > > I am helping set up and new cluster and I have run into a problem > using mvapich to compile and run a Fortran90 code which uses system > calls. The program compiles, but will not run on more than one node, > even though only one processor makes the system call. Very strange! > > All is well when run on only one node of the cluster. > > Running: > mpvapich2 1.0.2 > Intel 10.1 compiler > OFED 1.2.5.3 > Linux X86_64 2.6.9-67.0.7.ELsmp > > Cluster built by aspen systems - dual processor Quad core hardware. > > Has anyone seen anything similar - I am not sure it is worth trying to > fix, but if by posting it I save someones else some time, I will feel > warm and fuzzy inside... 
From L-marks at northwestern.edu Fri Sep 19 16:47:33 2008
From: L-marks at northwestern.edu (Laurence Marks)
Date: Fri Sep 19 16:47:38 2008
Subject: [mvapich-discuss] Possible mvapich bug (possibly not as well).
Message-ID: <876512660809191347i7eb20dcav8be27c88aceeff7d@mail.gmail.com>

I have a highly reproducible, but so far untraceable problem. It could
be due to mvapich, but also not.

In a code which calls the scalapack subroutine PDSYGST (which uses two
distributed matrices), if the matrices are 36927x36927 it works fine;
if they are 38381x38381 it runs forever, i.e., until I kill it.

This behavior occurs for the Intel mkl versions 10.0.3.020,
10.0.4.023, 10.1.0.009 and ifort/icc versions 10.1.015 and 10.1.018.
It occurs for both an April 2008 svn of mvapich, and an svn of a few
days ago. It also occurs with OFED-1.2.5.5 and OFED-1.3.

I would welcome any suggestions.

--
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/IUCR_CED

From panda at cse.ohio-state.edu Fri Sep 19 17:54:06 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri Sep 19 17:54:10 2008
Subject: [mvapich-discuss] Possible mvapich bug (possibly not as well).
In-Reply-To: <876512660809191347i7eb20dcav8be27c88aceeff7d@mail.gmail.com>
Message-ID:

Thanks for your report. What is your computing platform, and how much
memory do your compute nodes have (per processor/core)? Does your
application (using the 38381x38381 PDSYGST configuration) require more
memory than is available on these platforms? As you may know, if you
run an application which requires more memory than is available, a lot
of swapping will occur and your computation will not be able to make
progress. Not sure whether this is the situation you are encountering.
Does this problem happen for this exact matrix size? Are you able to
run your application with any larger matrix?
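
A rough back-of-the-envelope check on the memory question can be sketched as follows. This is only an estimate under stated assumptions: 8-byte double-precision entries (the D in PDSYGST), only the two N x N distributed matrices counted, and no allowance for ScaLAPACK workspace, MPI buffers or anything else the application allocates. The function names and the node count of 10 are illustrative, not taken from the report.

```python
BYTES_PER_DOUBLE = 8  # assumption: double-precision real entries


def matrix_bytes(n, n_matrices=2):
    """Bytes needed to hold n_matrices dense n x n double-precision matrices."""
    return n_matrices * n * n * BYTES_PER_DOUBLE


def per_node_gib(n, nodes, n_matrices=2):
    """Average GiB per node if the matrices are spread evenly over the nodes."""
    return matrix_bytes(n, n_matrices) / nodes / 2**30


if __name__ == "__main__":
    # The two matrix sizes mentioned in the report; 10 nodes is an example value.
    for n in (36927, 38381):
        total_gib = matrix_bytes(n) / 2**30
        print(f"N={n}: {total_gib:.1f} GiB total, "
              f"{per_node_gib(n, nodes=10):.2f} GiB per node on 10 nodes")
```

If the per-node figure comes out well below the installed RAM, swapping becomes a less likely explanation, although workspace and temporary copies can add substantially to this lower bound.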

If it is a multi-core-based cluster, can you run your application using
more nodes and fewer cores/node while keeping the total number of cores
for the application constant (such as using 8 nodes with 4 cores/node
vs. 4 nodes with 8 cores/node)? If the application runs with the first
configuration but not with the second, it will show that you are
constrained by the amount of memory available per core/node.

Thanks,

DK

On Fri, 19 Sep 2008, Laurence Marks wrote:

> I have a highly reproducible, but so far untraceable problem. It could
> be due to mvapich, but also not.
>
> In a code which calls the scalapack subroutine PDSYGST (which uses two
> distributed matrices), if the matrices are 36927x36927 it works fine;
> if they are 38381x38381 it runs forever, i.e.until I kill it.
>
> This behavior occurs for the Intel mkl versions 10.0.3.020,
> 10.0.4.023, 10.1.0.009 and ifort/icc versions 10.1.015 and 10.1.018.
> It occurs for both an April 2008 svn of mvapich, and an svn of a few
> days ago. It also occurs with OFED-1.2.5.5 and OFED-1.3.
>
> I would welcome any suggestions.
>
> --
> Laurence Marks
> Department of Materials Science and Engineering
> MSE Rm 2036 Cook Hall
> 2220 N Campus Drive
> Northwestern University
> Evanston, IL 60208, USA
> Tel: (847) 491-3996 Fax: (847) 491-7820
> email: L-marks at northwestern dot edu
> Web: www.numis.northwestern.edu
> Chair, Commission on Electron Crystallography of IUCR
> www.numis.northwestern.edu/IUCR_CED
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From mbkumar at gmail.com Fri Sep 19 18:30:12 2008
From: mbkumar at gmail.com (Bharat)
Date: Fri Sep 19 18:30:22 2008
Subject: [mvapich-discuss] Possible mvapich bug (possibly not as well).
In-Reply-To: <876512660809191347i7eb20dcav8be27c88aceeff7d@mail.gmail.com>
References: <876512660809191347i7eb20dcav8be27c88aceeff7d@mail.gmail.com>
Message-ID:

I observed a similar problem with a program which calls ScaLAPACK
routines. For certain job distribution patterns the program runs fine
up to a certain stage and then gives no further output, while the
threads keep running at 100% CPU utilization.
The same program would run to the end for the following cases:
64 threads on 16 nodes (8 cores/node), 64 threads on 8 machines and
some others, but gives problems for 32 threads, 16 threads on 16
machines....
I was using mvapich2-1.0.3 and the mkl 10.0.1.014 libraries.

After replacing mvapich2-1.0.3 with mvapich2-1.2RC2, the problem
disappeared.

Rgds,
Bharat

On Fri, 19 Sep 2008 14:47:33 -0600, Laurence Marks wrote:

> I have a highly reproducible, but so far untraceable problem. It could
> be due to mvapich, but also not.
>
> In a code which calls the scalapack subroutine PDSYGST (which uses two
> distributed matrices), if the matrices are 36927x36927 it works fine;
> if they are 38381x38381 it runs forever, i.e.until I kill it.
>
> This behavior occurs for the Intel mkl versions 10.0.3.020,
> 10.0.4.023, 10.1.0.009 and ifort/icc versions 10.1.015 and 10.1.018.
> It occurs for both an April 2008 svn of mvapich, and an svn of a few
> days ago. It also occurs with OFED-1.2.5.5 and OFED-1.3.
>
> I would welcome any suggestions.
>

--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

From L-marks at northwestern.edu Fri Sep 19 18:42:41 2008
From: L-marks at northwestern.edu (Laurence Marks)
Date: Fri Sep 19 18:42:46 2008
Subject: [mvapich-discuss] Possible mvapich bug (possibly not as well).
In-Reply-To:
References: <876512660809191347i7eb20dcav8be27c88aceeff7d@mail.gmail.com>
Message-ID: <876512660809191542q1823a99o388a553f99e986a3@mail.gmail.com>

The computing platform is dual quad-core Intel nodes with 8G per node.
Intel(R) Xeon(R) CPU E5410 @ 2.33GHz.
Linux version 2.6.18-8.1.15.el5 (mockbuild@builder6.centos.org) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-52)) #1 SMP Mon Oct 22 08:32:28 EDT 2007 I have been running this on between 80 and 96 cores (10-12 nodes), both with a single core only used on the master (first in the list) or with 8, the same result. It is not memory limited, only using ~6G/node and not getting into swap from what I can see -- I have run into this one and that's not it. (The total memory needed is around 50G.) I have not pinned down the exact size where this occurs, I just know that it's between 36927x36927 and 38381x38381 ; I did run one larger and it failed the same way. On Fri, Sep 19, 2008 at 4:54 PM, Dhabaleswar Panda wrote: > Thanks for your report. What is your computing platform and how much > memory your computing nodes have (per processor/core)? Does your > application (using the 38381x38381 PDSYGST configuration) require more > memory than being available on these platforms. As you might be knowing, > if you run an application which requires higher amount of memory than > being available, a lot of swapping will occur and your computation will > not be able to make progress. Not sure whether this is the situation you > are encountering. Does this problem happen for this exact matrix size? Are > you able to run your application with any higher sized matrix? > > If it is a multi-core-based cluster, can you run your application using > more nodes and less cores/node while keeping the total number of cores for > the application constant (such as using 8 nodes with 4 cores/node vs. 4 > nodes with 8 cores/node). If the application runs with the first > configuration but not with the second configuration, it will show that you > are getting constrained by the amount of memory being available per > core/node. > > > Thanks, > > DK > > On Fri, 19 Sep 2008, Laurence Marks wrote: > >> I have a highly reproducible, but so far untraceable problem. It could >> be due to mvapich, but also not. 
>> >> In a code which calls the scalapack subroutine PDSYGST (which uses two >> distributed matrices), if the matrices are 36927x36927 it works fine; >> if they are 38381x38381 it runs forever, i.e.until I kill it. >> >> This behavior occurs for the Intel mkl versions 10.0.3.020, >> 10.0.4.023, 10.1.0.009 and ifort/icc versions 10.1.015 and 10.1.018. >> It occurs for both an April 2008 svn of mvapich, and an svn of a few >> days ago. It also occurs with OFED-1.2.5.5 and OFED-1.3. >> >> I would welcome any suggestions. >> >> -- >> Laurence Marks >> Department of Materials Science and Engineering >> MSE Rm 2036 Cook Hall >> 2220 N Campus Drive >> Northwestern University >> Evanston, IL 60208, USA >> Tel: (847) 491-3996 Fax: (847) 491-7820 >> email: L-marks at northwestern dot edu >> Web: www.numis.northwestern.edu >> Chair, Commission on Electron Crystallography of IUCR >> www.numis.northwestern.edu/IUCR_CED >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >> > > -- Laurence Marks Department of Materials Science and Engineering MSE Rm 2036 Cook Hall 2220 N Campus Drive Northwestern University Evanston, IL 60208, USA Tel: (847) 491-3996 Fax: (847) 491-7820 email: L-marks at northwestern dot edu Web: www.numis.northwestern.edu Chair, Commission on Electron Crystallography of IUCR www.numis.northwestern.edu/IUCR_CED From L-marks at northwestern.edu Fri Sep 19 18:44:40 2008 From: L-marks at northwestern.edu (Laurence Marks) Date: Fri Sep 19 18:44:44 2008 Subject: [mvapich-discuss] Possible mvapich bug (possibly not as well). In-Reply-To: References: <876512660809191347i7eb20dcav8be27c88aceeff7d@mail.gmail.com> Message-ID: <876512660809191544j6343ef59i7d7053cf516d343d@mail.gmail.com> Hmmm. I will look at mvapich2, I've been using mvapich. N.B., mkl 10.0.1 was a bit buggy for my application. 
>>
>
>
>
> --
> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
>

--
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/IUCR_CED

From moody20 at llnl.gov Fri Sep 19 19:34:37 2008
From: moody20 at llnl.gov (Adam Moody)
Date: Fri Sep 19 19:34:43 2008
Subject: [mvapich-discuss] Got FATAL event 0
Message-ID: <48D4370D.4040506@llnl.gov>

Hello MVAPICH team,
I have a user hitting some errors, and I'm hoping you may have some
insight. When running with MVAPICH1-0.9.7, the user sees the following
non-fatal error message on occasion:

Error getting event!
[0] Got unknown event 1075841344 (Unknown) ... continuing ...

With 0.9.9 (and PTMALLOC disabled), the user sees the following fatal
error with the same frequency as the above message:

[0] Got FATAL event 0 (CQ Error)

This error is detected by the async_thread function in viachek.c. The
series of MPI calls the user app has made at this point looks like the
following:

MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
MPI_Initialized(&initialized);
MPI_Comm_size(MPI_COMM_WORLD, &d_size)
MPI_Comm_rank(MPI_COMM_WORLD, &d_rank)
MPI_Bcast(const_cast(d_key), SECURE_KEY_SIZE, MPI_CHAR,
0, MPI_COMM_WORLD);
MPI_Bcast(&length, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(const_cast(d_parentUrl.c_str()), length, MPI_CHAR, 0,
MPI_COMM_WORLD);
MPI_Bcast(&length, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(const_cast(d_rank0Url.c_str()), length, MPI_CHAR, 0,
MPI_COMM_WORLD);

Have others reported this problem before? Any idea on how to fix it?
Thanks again,
-Adam

From panda at cse.ohio-state.edu Fri Sep 19 20:17:33 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri Sep 19 20:17:39 2008
Subject: [mvapich-discuss] Got FATAL event 0
In-Reply-To: <48D4370D.4040506@llnl.gov>
Message-ID:

Hi Adam,

Thanks for reporting this. As you know, MVAPICH 0.9.7 and 0.9.9 are
older versions (almost 1.5 to 2.0 years old). We are getting close to
the 1.1 release. You need to upgrade your MVAPICH stack :-)

A lot of enhancements (feature-wise) and bug fixes have gone into the
MVAPICH library (including MPI_Bcast) in recent years. Can you verify
whether this error happens with the latest MVAPICH 1.0.1 release? If it
does happen with the 1.0.1 release, it will be much quicker to analyze
and debug.

Thanks,

DK

On Fri, 19 Sep 2008, Adam Moody wrote:

> Hello MVAPICH team,
> I have a user hitting some errors, and I'm hoping you may have some
> insight. When running with MVAPICH1-0.9.7, the user sees the following
> non-fatal error message on occasion:
>
> Error getting event!
> [0] Got unknown event 1075841344 (Unknown) ... continuing ...
>
> With 0.9.9 (and PTMALLOC disabled), the user sees the following fatal
> error with the same frequency as the above message:
>
> [0] Got FATAL event 0 (CQ Error)
>
> This error is detected by the async_thread function in viachek.c.
The > series of MPI calls the user app has made at this point looks like the > following: > > MPI_Init(&argc, &argv); > MPI_Comm_rank(MPI_COMM_WORLD, &myRank); > MPI_Initialized(&initialized); > MPI_Comm_size(MPI_COMM_WORLD, &d_size) > MPI_Comm_rank(MPI_COMM_WORLD, &d_rank) > MPI_Bcast(const_cast(d_key), SECURE_KEY_SIZE, MPI_CHAR, > 0, MPI_COMM_WORLD); > MPI_Bcast(&length, 1, MPI_INT, 0, MPI_COMM_WORLD); > MPI_Bcast(const_cast(d_parentUrl.c_str()), length, MPI_CHAR, 0, > MPI_COMM_WORLD); > MPI_Bcast(&length, 1, MPI_INT, 0, MPI_COMM_WORLD); > MPI_Bcast(const_cast(d_rank0Url.c_str()), length, MPI_CHAR, 0, > MPI_COMM_WORLD); > > Have others reported this problem before? Any idea on how to fix it? > Thanks again, > -Adam > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From thakur at mcs.anl.gov Mon Sep 22 12:05:19 2008 From: thakur at mcs.anl.gov (Rajeev Thakur) Date: Mon Sep 22 12:05:30 2008 Subject: [mvapich-discuss] FW: [mpich-discuss] Odd differences between runs on two clusters / Lost messages? Message-ID: <503D201CCC5E435987847312CC3FAECB@mcs.anl.gov> I am forwarding your note to the MVAPICH mailing list. Rajeev -----Original Message----- From: owner-mpich-discuss@mcs.anl.gov [mailto:owner-mpich-discuss@mcs.anl.gov] On Behalf Of Nicolas Rosner Sent: Monday, September 22, 2008 6:29 AM To: mpich-discuss@mcs.anl.gov Subject: [mpich-discuss] Odd differences between runs on two clusters / Lost messages? Hello, hope this is the right place to ask about this. 

I'm developing an MPI app using a "task pool" approach: there is a
"pool" process that essentially monitors a queue, allowing (safely
synchronized, hopefully) PUSH and POP messages on this data structure,
and there are N agents that:

1) POP a new task from the pool
2) try to solve it for a while
3) either declare it solved and go back to step 1, or
4) declare it "too hard", split it up into subtasks, and
5) PUSH each generated subtask back into the pool
6) go back to step 1.

The tasks have hierarchical IDs, the root task being "1". Thus, an
agent could obtain task 1.23.15 and, after massaging it for a while,
decide to partition it into 1.23.15.1, 1.23.15.2, 1.23.15.3, etc, all
of which would be queued at the central pool, waiting to be obtained
by other agents, and so on.

My program seems to run fine on my test cluster, which consists of 3
dual-core PCs in my office running MPICH2 1.0.7. But it's not working
well on the production one, consisting of 50 quad-core nodes, on an
InfiniBand network, running MVAPICH 1.0.

I have already asked the cluster admins whether it would be possible
to upgrade MPI on the cluster to the latest MVAPICH release (which
seems to be based on the same MPICH2 1.0.7 that is installed on the
test cluster). But the problem seems basic enough, and I'd be
surprised if the rather old MVAPICH version was to blame. (Meaning, my
guess is that I probably have some bug that shows itself quite easily
on the one platform, remains asymptomatic on the other one, yet is
still a bug).
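
The task-pool scheme described above can be sketched without MPI as a single-process simulation. This is only an illustration of the POP / solve-or-split / PUSH cycle and of the hierarchical IDs: `is_too_hard`, the fan-out of 3 and the depth limit are hypothetical stand-ins, not the real solver, and no message passing is modelled.

```python
from collections import deque


def split(task_id, fanout):
    """Subtask IDs of task_id, e.g. '1.23.15' -> '1.23.15.1', '1.23.15.2', ..."""
    return [f"{task_id}.{i}" for i in range(1, fanout + 1)]


def is_too_hard(task_id, max_depth):
    """Hypothetical difficulty test: tasks shallower than max_depth get split."""
    return task_id.count(".") + 1 < max_depth


def run_pool(fanout=3, max_depth=3):
    """Drain the pool starting from root task '1'; return solved IDs in order."""
    pool = deque(["1"])  # the central queue held by the pool process
    solved = []
    while pool:
        task = pool.popleft()  # an agent POPs a task
        if is_too_hard(task, max_depth):
            pool.extend(split(task, fanout))  # the agent PUSHes each subtask
        else:
            solved.append(task)  # the agent declares the task solved
    return solved


if __name__ == "__main__":
    print(run_pool())
```

Because the simulation is sequential, it cannot reproduce a lost message or a blocked send; it is useful mainly for checking that the splitting and ID bookkeeping terminate as expected before looking at the communication layer.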

You can see an example of a (very verbose) logfile showing the
unwanted behavior here:

http://neurus.dc.uba.ar/rsat/logs/publ/112/knine.o122

The three lines where we last hear about agents 2, 3 and 4 are

8.74 s -- Agent 3 sending PUSH message to pool for task 1.1.7
9.15 s -- Agent 4 sending PUSH message to pool for task 1.2.7
29.75 s -- Agent 2 sending PUSH message to pool for task 1.3.7

The agents are using fully synchronous Ssend()s for the PUSH messages,
and the pool process is using Iprobe() to test whether there is a new
message, and if that returns true, Recv() to get it.

Notice how several PUSHes in a row succeed fine, then suddenly one of
them gets lost somehow. The pool process doesn't seem to ever get the
message (i.e. Iprobe() never returns true for it) so, naturally, the
sending agent blocks forever on its Ssend() call. Once this happens to
all agents, progress stops.

If I try to run that same test, with same inputs, same parameters,
same PRNG seed, same number of agents, etc, but on the test cluster in
my office, it runs fine; messages are not lost, progress never stops
and the process eventually ends normally.

Any input, comments or suggestions would be greatly appreciated. I
can provide source code, more logs and further details upon request to
anyone interested in helping me out.

Thanks a lot in advance!

Nicolás

From panda at cse.ohio-state.edu Mon Sep 22 12:38:56 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon Sep 22 12:39:01 2008
Subject: [mvapich-discuss] FW: [mpich-discuss] Odd differences between runs on two clusters / Lost messages?
In-Reply-To: <503D201CCC5E435987847312CC3FAECB@mcs.anl.gov>
Message-ID:

Rajeev - Thanks for forwarding this note to mvapich-discuss list.

Nicolas - If your application runs well with MPICH2 1.0.7, you should
use MVAPICH2 1.2 (this is based on MPICH2 1.0.7). If this version does
not work, please let us know.
Please note that we have two versions: 1) MVAPICH (based on MPICH and supports MPI-1 functionalities only) and 2) MVAPICH2 (based on MPICH2 and supports both MPI-2 and MPI-1 functionalities). If you are using MPI-2 functionalities, please use MVAPICH2 (not MVAPICH). Thanks, DK On Mon, 22 Sep 2008, Rajeev Thakur wrote: > I am forwarding your note to the MVAPICH mailing list. > > Rajeev > > > -----Original Message----- > From: owner-mpich-discuss@mcs.anl.gov > [mailto:owner-mpich-discuss@mcs.anl.gov] On Behalf Of Nicolas Rosner > Sent: Monday, September 22, 2008 6:29 AM > To: mpich-discuss@mcs.anl.gov > Subject: [mpich-discuss] Odd differences between runs on two clusters / Lost > messages? > > Hello, hope this is the right place to ask about this. > > I'm developing an MPI app using a "task pool" approach: there is a > "pool" process that essentially monitors a queue, allowing (safely > synchronized, hopefully) PUSH and POP messages on this data structure, > and there are N agents that: > > 1) POP a new task from the pool > 2) try to solve it for a while > 3) either declare it solved and go back to step 1, or > 4) declare it "too hard", split it up into subtasks, and > 5) PUSH each generated subtask back into the pool > 6) go back to step 1. > > The tasks have hierarchical IDs, the root task being "1". Thus, an > agent could obtain task 1.23.15 and, after massaging it for a while, > decide to partition it into 1.23.15.1, 1.23.15.2, 1.23.15.3, etc, all > of which would be queued at the central pool, waiting to be obtained > by other agents, and so on. > > My program seems to run fine on my test cluster, which consists of 3 > dual-core PCs in my office running MPICH2 1.0.7. But it's not working > well on the production one, consisting of 50 quad-core nodes, on an > InfiniBand network, running MVAPICH 1.0. 
>
> Nicolás
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From nrosner at gmail.com Mon Sep 22 16:46:50 2008
From: nrosner at gmail.com (Nicolas Rosner)
Date: Mon Sep 22 16:51:29 2008
Subject: [mvapich-discuss] FW: [mpich-discuss] Odd differences between runs on two clusters / Lost messages?
In-Reply-To:
References: <503D201CCC5E435987847312CC3FAECB@mcs.anl.gov>
Message-ID: <152828240809221346m1d577810gd29a22636d37eb5d@mail.gmail.com>

Hello, thanks for your reply!

> Please note that we have two versions [...]

Sorry, my bad for omitting the 2 after MVAPICH. Yes, the production
cluster IS using MVAPICH2 version 1.0.

> If your application runs well with MPICH2 1.0.7, you should use
> MVAPICH2 1.2 (this is based on MPICH2 1.0.7). If this version does not
> work, please let us know.

As I said in my previous email, I'd love to do so, but the second
cluster's administration is beyond my control. I have already asked the
admins to do the upgrade; I guess they might do it sooner or later, or
not at all, depending on the kind of impact that would have on other
users of the cluster.

Was 1.0 a particularly flawed or quirky version of MVAPICH2? It seems
to be working fine for the rest of the cluster users. Then again, the
admin said I'm the only one doing actual MPI development right now (the
rest of the users tend to use "canned" MPI programs and feed them their
datasets).

On the other hand, how different can 1.2 be from 1.0, considering I'm
just using three or four primitives (Ssend, Send, Iprobe, Recv)? I still
think there must be some error in my code, rather than in the base
libraries.

If there's anything you think I should try while waiting for the
update, I'd be very interested in hearing about it.

Thanks again for your time,

Nicolás

> On Mon, 22 Sep 2008, Rajeev Thakur wrote:
>
> > I am forwarding your note to the MVAPICH mailing list.
> > > > Rajeev > > > > > > -----Original Message----- > > From: owner-mpich-discuss@mcs.anl.gov > > [mailto:owner-mpich-discuss@mcs.anl.gov] On Behalf Of Nicolas Rosner > > Sent: Monday, September 22, 2008 6:29 AM > > To: mpich-discuss@mcs.anl.gov > > Subject: [mpich-discuss] Odd differences between runs on two clusters / Lost > > messages? > > > > Hello, hope this is the right place to ask about this. > > > > I'm developing an MPI app using a "task pool" approach: there is a > > "pool" process that essentially monitors a queue, allowing (safely > > synchronized, hopefully) PUSH and POP messages on this data structure, > > and there are N agents that: > > > > 1) POP a new task from the pool > > 2) try to solve it for a while > > 3) either declare it solved and go back to step 1, or > > 4) declare it "too hard", split it up into subtasks, and > > 5) PUSH each generated subtask back into the pool > > 6) go back to step 1. > > > > The tasks have hierarchical IDs, the root task being "1". Thus, an > > agent could obtain task 1.23.15 and, after massaging it for a while, > > decide to partition it into 1.23.15.1, 1.23.15.2, 1.23.15.3, etc, all > > of which would be queued at the central pool, waiting to be obtained > > by other agents, and so on. > > > > My program seems to run fine on my test cluster, which consists of 3 > > dual-core PCs in my office running MPICH2 1.0.7. But it's not working > > well on the production one, consisting of 50 quad-core nodes, on an > > InfiniBand network, running MVAPICH 1.0. > > > > I have already asked the cluster admins whether it would be possible > > to upgrade MPI on the cluster to the latest MVAPICH release (which > > seems to be based on the same MPICH2 1.0.7 that is installed on the > > test cluster). But the problem seems basic enough, and I'd be > > surprised if the rather old MVAPICH version was to blame. 
(Meaning, my > > guess is that I probably have some bug that shows itself quite easily > > on the one platform, remains asymptomatic on the other one, yet is > > still a bug). > > > > You can see an example of a (very verbose) logfile showing the > > unwanted behavior here: > > > > http://neurus.dc.uba.ar/rsat/logs/publ/112/knine.o122 > > > > The three lines where we last hear about agents 2, 3 and 4 are > > > > 8.74 s -- Agent 3 sending PUSH message to pool for task 1.1.7 > > 9.15 s -- Agent 4 sending PUSH message to pool for task 1.2.7 > > 29.75 s -- Agent 2 sending PUSH message to pool for task 1.3.7 > > > > The agents are using fully synchronous Ssend()s for the PUSH messages, > > and the pool process is using Iprobe() to test whether there is a new > > message, and if that returns true, Recv() to get it. > > > > Notice how several PUSHes in a row succeed fine, then suddenly one of > > them gets lost somehow. The pool process doesn't seem to ever get the > > message (i.e. Iprobe() never returns true for it) so, naturally, the > > sending agent blocks forever on its Ssend() call. Once this happens to > > all agents, progress stops. > > > > If I try to run that same test, with same inputs, same parameters, > > same PRNG seed, same number of agents, etc, but on the test cluster in > > my office, it runs fine; messages are not lost, progress never stops > > and the process eventually ends normally. > > > > Any input, comments or suggestions would be greatly appreciated. I > > can provide source code, more logs and further details upon request to > > anyone interested in helping me out. > > > > Thanks a lot in advance! 
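The hierarchical task IDs described in the message above (task 1.23.15 splitting into 1.23.15.1, 1.23.15.2, and so on) could be produced along these lines. This is only a minimal sketch in C; the names `subtask_id` and `task_depth` are hypothetical and are not taken from the poster's program.

```c
#include <stdio.h>

/* Write the ID of the k-th subtask (1-based) of `parent` into `out`,
 * e.g. parent "1.23.15" and k = 2 give "1.23.15.2".
 * Returns 0 on success, -1 if k < 1 or `out` is too small. */
int subtask_id(const char *parent, int k, char *out, size_t outlen)
{
    if (k < 1)
        return -1;
    int n = snprintf(out, outlen, "%s.%d", parent, k);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Depth of a task in the tree: "1" is depth 1, "1.23.15" is depth 3. */
int task_depth(const char *id)
{
    int depth = 1;
    for (; *id != '\0'; ++id)
        if (*id == '.')
            ++depth;
    return depth;
}
```

An agent that declares a task "too hard" would call `subtask_id` once per generated subtask and PUSH each resulting ID to the pool.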
> >
> > Nicolás

From panda at cse.ohio-state.edu Mon Sep 22 17:32:07 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon Sep 22 17:32:10 2008
Subject: [mvapich-discuss] FW: [mpich-discuss] Odd differences between runs on two clusters / Lost messages?
In-Reply-To: <152828240809221346m1d577810gd29a22636d37eb5d@mail.gmail.com>
Message-ID:

Hi Nicolas,

Thanks for your reply and for confirming that you are running MVAPICH2. Does your cluster run MVAPICH2 1.0.0 or the latest bugfix version of MVAPICH2, which is 1.0.3?

Please note that when we make a release, we go through extensive testing. However, it is not possible to test it against all possible MPI programs in the world. As we get feedback from users about problems, we continuously fix bugs and make bugfix releases periodically. In addition, subsequent releases also add new features. On the MVAPICH2 front, new releases also get synced up with the latest MPICH2 releases from ANL. For example, the MVAPICH2 1.0 series is based on MPICH2 1.0.5p4, whereas the MVAPICH2 1.2 series is based on MPICH2 1.0.7. Thus, if your application works well with MPICH2 1.0.7, it is not guaranteed that it will work with the MVAPICH2 1.0 series (because it is based on MPICH2 1.0.5p4). That's why I suggested in my earlier e-mail that you test it against the MVAPICH2 1.2 series, because it is based on MPICH2 1.0.7.

Based on the above description, let me suggest that you proceed as follows:

1. Since your cluster has MVAPICH2 1.0 installed, please test your application using MPICH2 1.0.5p4. If it does not work with MPICH2 1.0.5p4, it will not work with MVAPICH2 1.0.

2. If your program works with MPICH2 1.0.5p4, but not with MVAPICH2 1.0.0, please tell your cluster administrator to update your installation to the latest bugfix release of the 1.0 series (MVAPICH2 1.0.3). Such bugfix updates are normal for clusters and your cluster administrator should be doing this periodically. Details of the changes in the MVAPICH2 1.0 series can be obtained from the following URL:

http://mvapich.cse.ohio-state.edu/download/mvapich2/changes.shtml

3. If your application still fails with MVAPICH2 1.0.3, you may try installing MVAPICH2 1.2Rc2 in your own directory. MVAPICH2 is a user-level library, so you should be able to install it in your home directory and try it out.

4. If it still does not work, let us know and we will take a look at the issue in more depth. If you can send us a sample of your application, it will be much faster for us to debug the problem and come up with a fix in the MVAPICH2 library.

Thanks,

DK

From nrosner at gmail.com Mon Sep 22 17:43:13 2008
From: nrosner at gmail.com (Nicolas Rosner)
Date: Mon Sep 22 17:50:11 2008
Subject: [mvapich-discuss] FW: [mpich-discuss] Odd differences between runs on two clusters / Lost messages?
In-Reply-To:
References: <152828240809221346m1d577810gd29a22636d37eb5d@mail.gmail.com>
Message-ID: <152828240809221443l3d03854am60e0adea07f0e53a@mail.gmail.com>

Hello Dhabaleswar,

Thanks so much for your prompt and detailed reply. I will carry out the steps as suggested, then get back to you with news soon.

Best regards from Buenos Aires,
Nicolás

From nrosner at gmail.com Mon Sep 22 22:08:41 2008
From: nrosner at gmail.com (Nicolas Rosner)
Date: Mon Sep 22 22:28:52 2008
Subject: [mvapich-discuss] FW: [mpich-discuss] Odd differences between runs on two clusters / Lost messages?
In-Reply-To:
References: <152828240809221346m1d577810gd29a22636d37eb5d@mail.gmail.com>
Message-ID: <152828240809221908i75dc17f3v5b3bca097f05cf67@mail.gmail.com>

Hi,

> 1. Since your cluster has MVAPICH2 1.0 installed, please test your
> application using MPICH2 1.0.5p4. If it does not work with MPICH2 1.0.5p4,
> it will not work with MVAPICH2 1.0.

My program seems to be working OK under MPICH2 1.0.5p4, which I just built and installed on the 3-machine test cluster.
I ran several test cases without any visible problems.

> 2. If your program works with MPICH2 1.0.5p4, but not with MVAPICH2 1.0.0,
> please tell your cluster administrator to update your installation to the
> latest bugfix release of 1.0 series (MVAPICH2 1.0.3).

Correct, except they already agreed to try and upgrade to MVAPICH2 1.2 altogether, at some point in the near future. Still, it's very nice to know that 1.0.3 might also be an option (in case upgrading to 1.2 isn't acceptable for some reason).

Thanks,
Nicolás

From howardp at cray.com Wed Sep 24 13:10:08 2008
From: howardp at cray.com (Howard Pritchard)
Date: Wed Sep 24 13:04:40 2008
Subject: [mvapich-discuss] mpi_allgather performance at high core counts question
Message-ID: <48DA7470.4060503@cray.com>

Hello Mvapich users,

I'd be curious to know if any unusual performance problems have been observed with the mpich2 mpi_allgather algorithm for small data sizes/rank (for example 8 bytes per rank) at high core counts.

For example, have any performance issues been observed on the TACC machine using mvapich2 at say 10000 ranks or so?

Thanks for any info,

Howard

--
Howard Pritchard
Cray Inc.

From chai.15 at osu.edu Wed Sep 24 17:14:47 2008
From: chai.15 at osu.edu (Lei Chai)
Date: Wed Sep 24 17:14:46 2008
Subject: [mvapich-discuss] RE: mvapich 1.01 / scalapack: xcnep and xznep fail with MPI_RECV : Invalid buffer pointer
In-Reply-To: <531893A968B34D40B36C7A6445BC828A01E001D1@catoexm06.noam.corp.platform.com>
References: <531893A968B34D40B36C7A6445BC828A01E001D1@catoexm06.noam.corp.platform.com>
Message-ID: <48DAADC7.3020205@osu.edu>

For the information of the mailing list, the problem has been solved offline. It is fixed by the patch below. The patch has been checked in to the trunk version of mvapich, and will also be in future releases.
Lei

Patch:

Index: mpid/ch_gen2/mpid_hrecv.c
===================================================================
--- mpid/ch_gen2/mpid_hrecv.c (revision 2989)
+++ mpid/ch_gen2/mpid_hrecv.c (working copy)
@@ -118,20 +118,6 @@
     }
 
 
-    /* We have a non-contiguous buffer.
-     * Normally we would check for a null user buffer inside
-     * MPID_VIA_Irecv, but in this case we will pass the allocated
-     * buffer, not the user buffer, so check the user buffer
-     * here.
-     */
-
-    if (Is_MPI_Bottom(buf, count, dtype_ptr)) {
-        /* do not have to adjust ptr here */
-    } else if (buf == 0 && count > 0) {
-        *error_code = MPI_ERR_BUFFER;
-        return;
-    }
-
     /* Increment reference count for this type */
     MPIR_Type_dup(dtype_ptr);
 
===========================================================

Mehdi Bozzo-Rey wrote:
> From what I can see in the archive
> (http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2008-September/001888.html),
> there is something missing, so I resend my original email.
>
> Mehdi
>
>
> =======================================
> From: Mehdi Bozzo-Rey
> Sent: September-08-08 7:48 AM
> To: 'mvapich-discuss@cse.ohio-state.edu'
> Subject: mvapich 1.01 / scalapack: xcnep and xznep fail with MPI_RECV : Invalid buffer pointer
>
> Hello,
>
> I recompiled mvapich 1.0.1, BLACS and ScaLAPACK.
>
> - I am able to run the tests included in the BLACS distribution
> - I am able to run most of the tests included in the ScaLAPACK distribution, except xcnep and xznep. They fail with the following errors:
>
> Do you have any idea what could be the root cause?
>
> xcnep:
>
> --------------------------------------------------------------------------
> [mbozzore@compute-00-02 TESTING]$ mpirun_rsh -ssh -np 4 -hostfile ./hosts GFORTRAN_UNBUFFERED_ALL=yes ./xcnep
> ScaLAPACK QSQ^H by Schur Decomposition.
> 'MPI machine'
>
> Tests of the parallel complex single precision Schur decomposition.
> The following scaled residual checks will be computed: > Residual = ||H-QSQ^H|| / (||H|| * eps * N ) > Orthogonality residual = ||I - Q^HQ|| / ( eps * N ) > The matrix A is randomly generated for each test. > > An explanation of the input/output parameters follows: > TIME : Indicates whether WALL or CPU time was used. > N : The number of columns in the matrix A. > NB : The size of the square blocks the matrix A is split into. > P : The number of process rows. > Q : The number of process columns. > THRESH : If a residual value is less than THRESH, CHECK is flagged as PASSED > NEP time : Time in seconds to decompose the matrix > MFLOPS : Rate of execution > > The following parameter values will be used: > N : 1 2 3 4 6 10 50 > NB : 6 8 17 > P : 1 2 > Q : 1 2 > > Relative machine precision (eps) is taken to be 0.596046E-07 > Routines pass computational tests if scaled residual is less than 20.000 > > TIME N NB P Q NEP Time MFLOPS CHECK > ---- ----- --- ---- ---- -------- -------- ------ > > WALL 1 6 1 1 0.00 1.06 PASSED > WALL 1 8 1 1 0.00 18.00 PASSED > WALL 1 17 1 1 0.00 9.00 PASSED > WALL 2 6 1 1 0.00 2.77 PASSED > WALL 2 8 1 1 0.00 20.57 PASSED > WALL 2 17 1 1 0.00 20.57 PASSED > WALL 3 6 1 1 0.00 4.26 PASSED > WALL 3 8 1 1 0.00 12.15 PASSED > WALL 3 17 1 1 0.00 12.15 PASSED > WALL 4 6 1 1 0.00 19.53 PASSED > WALL 4 8 1 1 0.00 21.33 PASSED > WALL 4 17 1 1 0.00 20.21 PASSED > WALL 6 6 1 1 0.00 30.61 PASSED > WALL 6 8 1 1 0.00 32.67 PASSED > WALL 6 17 1 1 0.00 29.91 PASSED > WALL 10 6 1 1 0.00 72.87 PASSED > WALL 10 8 1 1 0.00 80.00 PASSED > WALL 10 17 1 1 0.00 88.67 PASSED > WALL 50 6 1 1 0.01 408.57 PASSED > WALL 50 8 1 1 0.01 428.08 PASSED > WALL 50 17 1 1 0.00 481.49 PASSED > WALL 1 6 2 2 0.00 0.90 PASSED > WALL 1 8 2 2 0.00 1.50 PASSED > WALL 1 17 2 2 0.00 1.50 PASSED > WALL 2 6 2 2 0.00 1.22 PASSED > WALL 2 8 2 2 0.00 2.53 PASSED > WALL 2 17 2 2 0.00 2.62 PASSED > WALL 3 6 2 2 0.00 1.65 PASSED > WALL 3 8 2 2 0.00 2.09 PASSED > WALL 3 17 2 2 0.00 2.05 PASSED 
> WALL 4 6 2 2 0.00 4.04 PASSED > WALL 4 8 2 2 0.00 4.13 PASSED > WALL 4 17 2 2 0.00 4.07 PASSED > WALL 6 6 2 2 0.00 6.81 PASSED > WALL 6 8 2 2 0.00 7.51 PASSED > WALL 6 17 2 2 0.00 7.64 PASSED > 0 - MPI_RECV : Invalid buffer pointer > 2 - MPI_RECV : Invalid buffer pointer > [2] [] Aborting Program! > [0] [] Aborting Program! > Abort signaled by rank 0: Aborting program ! > Exit code -3 signaled from compute-00-02 > Killing remote processes...Abort signaled by rank 2: Aborting program ! > MPI process terminated unexpectedly > DONE > [mbozzore@compute-00-02 TESTING]$ Signal 15 received. > Signal 15 received. > -------------------------------------------------------------------------- > > > And xznep: > > -------------------------------------------------------------------------- > [mbozzore@compute-00-02 TESTING]$ mpirun_rsh -ssh -np 4 -hostfile ./hosts GFORTRAN_UNBUFFERED_ALL=yes ./xznep > ScaLAPACK QSQ^H by Schur Decomposition. > 'MPI machine' > > Tests of the parallel complex double precision Schur decomposition. > The following scaled residual checks will be computed: > Residual = ||H-QSQ^H|| / (||H|| * eps * N ) > Orthogonality residual = ||I - Q^HQ|| / ( eps * N ) > The matrix A is randomly generated for each test. > > An explanation of the input/output parameters follows: > TIME : Indicates whether WALL or CPU time was used. > N : The number of columns in the matrix A. > NB : The size of the square blocks the matrix A is split into. > P : The number of process rows. > Q : The number of process columns. 
> THRESH : If a residual value is less than THRESH, CHECK is flagged as PASSED > NEP time : Time in seconds to decompose the matrix > MFLOPS : Rate of execution > > The following parameter values will be used: > N : 1 2 3 4 6 10 50 > NB : 6 8 17 > P : 1 2 > Q : 1 2 > > Relative machine precision (eps) is taken to be 0.111022E-15 > Routines pass computational tests if scaled residual is less than 20.000 > > TIME N NB P Q NEP Time MFLOPS CHECK > ---- ----- --- ---- ---- -------- -------- ------ > > WALL 1 6 1 1 0.00 1.50 PASSED > WALL 1 8 1 1 0.00 18.00 PASSED > WALL 1 17 1 1 0.00 18.00 PASSED > WALL 2 6 1 1 0.00 2.15 PASSED > WALL 2 8 1 1 0.00 16.00 PASSED > WALL 2 17 1 1 0.00 16.00 PASSED > WALL 3 6 1 1 0.00 3.80 PASSED > WALL 3 8 1 1 0.00 8.10 PASSED > WALL 3 17 1 1 0.00 8.24 PASSED > WALL 4 6 1 1 0.00 14.22 PASSED > WALL 4 8 1 1 0.00 15.16 PASSED > WALL 4 17 1 1 0.00 15.16 PASSED > WALL 6 6 1 1 0.00 23.01 PASSED > WALL 6 8 1 1 0.00 23.71 PASSED > WALL 6 17 1 1 0.00 22.74 PASSED > WALL 10 6 1 1 0.00 46.39 PASSED > WALL 10 8 1 1 0.00 51.14 PASSED > WALL 10 17 1 1 0.00 55.56 PASSED > WALL 50 6 1 1 0.01 263.62 PASSED > WALL 50 8 1 1 0.01 283.73 PASSED > WALL 50 17 1 1 0.01 328.23 PASSED > WALL 1 6 2 2 0.00 0.90 PASSED > WALL 1 8 2 2 0.00 1.50 PASSED > WALL 1 17 2 2 0.00 1.50 PASSED > WALL 2 6 2 2 0.00 1.04 PASSED > WALL 2 8 2 2 0.00 2.48 PASSED > WALL 2 17 2 2 0.00 2.53 PASSED > WALL 3 6 2 2 0.00 1.28 PASSED > WALL 3 8 2 2 0.00 1.51 PASSED > WALL 3 17 2 2 0.00 1.51 PASSED > WALL 4 6 2 2 0.00 2.92 PASSED > WALL 4 8 2 2 0.00 2.98 PASSED > WALL 4 17 2 2 0.00 2.95 PASSED > WALL 6 6 2 2 0.00 6.12 PASSED > WALL 6 8 2 2 0.00 6.52 PASSED > WALL 6 17 2 2 0.00 6.92 PASSED > 0 - MPI_RECV : Invalid buffer pointer > 2 - MPI_RECV : Invalid buffer pointer > [2] [] Aborting Program! > [0] [] Aborting Program! > Abort signaled by rank 2: Aborting program ! > Abort signaled by rank 0: Aborting program ! 
> Exit code -3 signaled from compute-00-02 > Killing remote processes...MPI process terminated unexpectedly > DONE > [mbozzore@compute-00-02 TESTING]$ Signal 15 received. > Signal 15 received. > -------------------------------------------------------------------------- > > > > My Bmake.inc and SLmake.inc are attached to this email. > > Note: mpich 1.27p1, Open MPI 1.2.4 (IB) and Open MPI 1.2.5 (IB) are OK. > > For example: > -------------------------------------------------------------------------- > [mbozzore@compute-00-02 openmpi1.2.5]$ ompi_info | less > Open MPI: 1.2.5 > Open MPI SVN revision: r16989 > -------------------------------------------------------------------------- > > > -------------------------------------------------------------------------- > [mbozzore@compute-00-02 openmpi1.2.5]$ mpirun -np 4 --machinefile ./hosts --mca btl openib,self ./xcnep > ScaLAPACK QSQ^H by Schur Decomposition. > 'MPI machine' > > Tests of the parallel complex single precision Schur decomposition. > The following scaled residual checks will be computed: > Residual = ||H-QSQ^H|| / (||H|| * eps * N ) > Orthogonality residual = ||I - Q^HQ|| / ( eps * N ) > The matrix A is randomly generated for each test. > > An explanation of the input/output parameters follows: > TIME : Indicates whether WALL or CPU time was used. > N : The number of columns in the matrix A. > NB : The size of the square blocks the matrix A is split into. > P : The number of process rows. > Q : The number of process columns. 
> THRESH : If a residual value is less than THRESH, CHECK is flagged as PASSED > NEP time : Time in seconds to decompose the matrix > MFLOPS : Rate of execution > > The following parameter values will be used: > N : 1 2 3 4 6 10 50 > NB : 6 8 17 > P : 1 2 > Q : 1 2 > > Relative machine precision (eps) is taken to be 0.596046E-07 > Routines pass computational tests if scaled residual is less than 20.000 > > TIME N NB P Q NEP Time MFLOPS CHECK > ---- ----- --- ---- ---- -------- -------- ------ > > WALL 1 6 1 1 0.00 1.51 PASSED > WALL 1 8 1 1 0.00 18.87 PASSED > WALL 1 17 1 1 0.00 18.87 PASSED > WALL 2 6 1 1 0.00 1.87 PASSED > WALL 2 8 1 1 0.00 10.24 PASSED > WALL 2 17 1 1 0.00 18.30 PASSED > WALL 3 6 1 1 0.00 4.19 PASSED > WALL 3 8 1 1 0.00 11.08 PASSED > WALL 3 17 1 1 0.00 11.52 PASSED > WALL 4 6 1 1 0.00 18.58 PASSED > WALL 4 8 1 1 0.00 20.22 PASSED > WALL 4 17 1 1 0.00 20.13 PASSED > WALL 6 6 1 1 0.00 27.78 PASSED > WALL 6 8 1 1 0.00 29.49 PASSED > WALL 6 17 1 1 0.00 28.61 PASSED > WALL 10 6 1 1 0.00 66.17 PASSED > WALL 10 8 1 1 0.00 72.04 PASSED > WALL 10 17 1 1 0.00 81.09 PASSED > WALL 50 6 1 1 0.01 392.33 PASSED > WALL 50 8 1 1 0.01 409.76 PASSED > WALL 50 17 1 1 0.00 463.93 PASSED > WALL 1 6 2 2 0.00 0.31 PASSED > WALL 1 8 2 2 0.00 0.72 PASSED > WALL 1 17 2 2 0.00 0.75 PASSED > WALL 2 6 2 2 0.00 0.76 PASSED > WALL 2 8 2 2 0.00 1.00 PASSED > WALL 2 17 2 2 0.00 1.11 PASSED > WALL 3 6 2 2 0.00 0.55 PASSED > WALL 3 8 2 2 0.00 1.12 PASSED > WALL 3 17 2 2 0.00 1.11 PASSED > WALL 4 6 2 2 0.00 2.20 PASSED > WALL 4 8 2 2 0.00 2.19 PASSED > WALL 4 17 2 2 0.00 2.19 PASSED > WALL 6 6 2 2 0.00 3.96 PASSED > WALL 6 8 2 2 0.00 4.13 PASSED > WALL 6 17 2 2 0.00 4.46 PASSED > WALL 10 6 2 2 0.02 1.16 PASSED > WALL 10 8 2 2 0.00 7.40 PASSED > WALL 10 17 2 2 0.00 13.08 PASSED > WALL 50 6 2 2 0.05 47.80 PASSED > WALL 50 8 2 2 0.04 57.51 PASSED > WALL 50 17 2 2 0.02 128.66 PASSED > > Finished 42 tests, with the following results: > 42 tests completed and passed residual checks. 
> 0 tests completed and failed residual checks.
> 0 tests skipped because of illegal input values.
>
>
> END OF TESTS.
> --------------------------------------------------------------------------
>
>
> Thanks,
>
> Mehdi
>
>
> Mehdi Bozzo-Rey
> HPC Solution Developer
> Platform OCS5
> Platform computing
> Phone: +1 905 948 4649
>

From panda at cse.ohio-state.edu Thu Sep 25 00:25:46 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Thu Sep 25 00:25:49 2008
Subject: [mvapich-discuss] mpi_allgather performance at high core counts question
In-Reply-To: <48DA7470.4060503@cray.com>
Message-ID:

I am posting this note for the information of the mvapich-discuss users. I had an off-the-list discussion with Howard. We have not received any reports of performance issues on the TACC system for mpi_allgather with the mvapich2 stack.

DK

On Wed, 24 Sep 2008, Howard Pritchard wrote:

> Hello Mvapich users,
>
> I'd be curious to know if any unusual performance problems have
> been observed with the mpich2 mpi_allgather algorithm for small
> data sizes/rank (for example 8 bytes per rank) at high core counts.
>
> For example, have any performance issues been observed on the TACC machine
> using mvapich2 at say 10000 ranks or so?
>
> Thanks for any info,
>
> Howard
>
> --
>
> Howard Pritchard
> Cray Inc.
> > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From L-marks at northwestern.edu Thu Sep 25 11:48:58 2008 From: L-marks at northwestern.edu (Laurence Marks) Date: Thu Sep 25 11:49:02 2008 Subject: [mvapich-discuss] Possibly undesirable mvapich "feature" (was Possible bug) Message-ID: <876512660809250848r4b1434b0v7a63268f40e81498@mail.gmail.com> I think I may have partially resolved my previous problem (http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2008-September/001920.html ), but not completely. One of the engineers at the company that sold me the cluster pointed out that the first node running the job (using 8 cores) was doing a little swap, even though the mpi job itself was not requiring swap. I suspect that doing I/O and general other OS tasks associated with communicating from the 1st core to all the others was leading to this and causing problems. I can resolve this by running with the first entry in the machines file on my head node, then everything is OK. Unfortunately this leads to another problem. If I have two mpi jobs both using one core on the head node, instead of using separate cores they both use the same one! I suspect that this is a design feature, i.e. to use the first core unless something else has been specified with VIADEV_CPU_MAPPING or similar. I wonder if there is any way around this short of specifying different mappings for different jobs which would become a bit of a nightmare since individual users (i.e. my students) would have to get it right. An alternative is running with 7 cores on the first machine to leave some free CPU for OS operations, but this is inefficient. 
--
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/IUCR_CED

From yogyas at gmail.com Thu Sep 25 11:56:00 2008
From: yogyas at gmail.com (yogeshwar sonawane)
Date: Thu Sep 25 11:56:04 2008
Subject: [mvapich-discuss] How & where to set ulimit ?
Message-ID:

Hi all,

I am using mvapich2 on the OFED stack. Following the user guide, I have set the memlock limit to unlimited in /etc/security/limits.conf and /etc/init.d/sshd, and restarted the ssh service. Now, if I log in again, ulimit -a should show unlimited, but it shows only 32 KB. The account used is a user account, not the root account.
Am I following the correct steps?

I tried putting this command in the .bashrc of the user account, but at every login the following error appears:
-bash: ulimit: max locked memory: cannot modify limit: Operation not permitted

I tried setting a specific number, i.e. ulimit -l 4096. This succeeds, but setting it to 4097 fails.
Can somebody give me the correct steps and info?

Mainly, I am running HPL. Can these settings hamper the performance figures? Or what are the optimal settings?

Thanking you,
Yogeshwar

From chai.15 at osu.edu Thu Sep 25 12:54:48 2008
From: chai.15 at osu.edu (Lei Chai)
Date: Thu Sep 25 12:54:52 2008
Subject: [mvapich-discuss] Possibly undesirable mvapich "feature" (was Possible bug)
In-Reply-To: <876512660809250848r4b1434b0v7a63268f40e81498@mail.gmail.com>
References: <876512660809250848r4b1434b0v7a63268f40e81498@mail.gmail.com>
Message-ID: <48DBC258.305@osu.edu>

Hi Laurence,

By default, mvapich uses CPU affinity and tries to use CPUs starting from CPU 0. To solve your problem, there are two options:

- Use the VIADEV_CPU_MAPPING env variable as you mentioned.
Map to different cpu sets for different MPI jobs. - Use the VIADEV_USE_AFFINITY=0 env variable to disable cpu affinity. The OS will schedule the MPI jobs on different cpu's. Hope this helps. Lei Laurence Marks wrote: > I think I may have partially resolved my previous problem > (http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2008-September/001920.html > ), but not completely. > > One of the engineers at the company that sold me the cluster pointed > out that the first node running the job (using 8 cores) was doing a > little swap, even though the mpi job itself was not requiring swap. I > suspect that doing I/O and general other OS tasks associated with > communicating from the 1st core to all the others was leading to this > and causing problems. I can resolve this by running with the first > entry in the machines file on my head node, then everything is OK. > > Unfortunately this leads to another problem. If I have two mpi jobs > both using one core on the head node, instead of using separate cores > they both use the same one! I suspect that this is a design feature, > i.e. to use the first core unless something else has been specified > with VIADEV_CPU_MAPPING or similar. I wonder if there is any way > around this short of specifying different mappings for different jobs > which would become a bit of a nightmare since individual users (i.e. > my students) would have to get it right. An alternative is running > with 7 cores on the first machine to leave some free CPU for OS > operations, but this is inefficient. > > From L-marks at northwestern.edu Thu Sep 25 13:09:49 2008 From: L-marks at northwestern.edu (Laurence Marks) Date: Thu Sep 25 13:09:55 2008 Subject: [mvapich-discuss] Possibly undesirable mvapich "feature" (was Possible bug) In-Reply-To: <48DBC258.305@osu.edu> References: <876512660809250848r4b1434b0v7a63268f40e81498@mail.gmail.com> <48DBC258.305@osu.edu> Message-ID: <876512660809251009q387d2c82xe5bd41bf52566dcf@mail.gmail.com> Thanks. 
I thought something like this might be the answer. However, before I set this as a global option, will doing this lead to the mpi tasks on the compute nodes hopping among the cores and slowing the calculation down? (Since each job takes 60-90 minutes testing different options is tedious.) On Thu, Sep 25, 2008 at 11:54 AM, Lei Chai wrote: > Hi Laurence, > > By default, mvapich uses cpu affinity and tries to use cpu's starting from > cpu 0. To solve your problem, there are two options: > > - Use the VIADEV_CPU_MAPPING env variable as you mentioned. Map to different > cpu sets for different MPI jobs. > - Use the VIADEV_USE_AFFINITY=0 env variable to disable cpu affinity. The OS > will schedule the MPI jobs on different cpu's. > > Hope this helps. > > Lei > > > Laurence Marks wrote: >> >> I think I may have partially resolved my previous problem >> >> (http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2008-September/001920.html >> ), but not completely. >> >> One of the engineers at the company that sold me the cluster pointed >> out that the first node running the job (using 8 cores) was doing a >> little swap, even though the mpi job itself was not requiring swap. I >> suspect that doing I/O and general other OS tasks associated with >> communicating from the 1st core to all the others was leading to this >> and causing problems. I can resolve this by running with the first >> entry in the machines file on my head node, then everything is OK. >> >> Unfortunately this leads to another problem. If I have two mpi jobs >> both using one core on the head node, instead of using separate cores >> they both use the same one! I suspect that this is a design feature, >> i.e. to use the first core unless something else has been specified >> with VIADEV_CPU_MAPPING or similar. I wonder if there is any way >> around this short of specifying different mappings for different jobs >> which would become a bit of a nightmare since individual users (i.e. 
>> my students) would have to get it right. An alternative is running >> with 7 cores on the first machine to leave some free CPU for OS >> operations, but this is inefficient. >> >> > > -- Laurence Marks Department of Materials Science and Engineering MSE Rm 2036 Cook Hall 2220 N Campus Drive Northwestern University Evanston, IL 60208, USA Tel: (847) 491-3996 Fax: (847) 491-7820 email: L-marks at northwestern dot edu Web: www.numis.northwestern.edu Chair, Commission on Electron Crystallography of IUCR www.numis.northwestern.edu/IUCR_CED From perkinjo at cse.ohio-state.edu Thu Sep 25 13:25:20 2008 From: perkinjo at cse.ohio-state.edu (Jonathan Perkins) Date: Thu Sep 25 13:26:27 2008 Subject: [mvapich-discuss] How & where to set ulimit ? In-Reply-To: References: Message-ID: <20080925172518.GE2994@cse.ohio-state.edu> On Thu, Sep 25, 2008 at 09:26:00PM +0530, yogeshwar sonawane wrote: > Hi all, > > I am using mvapich2 on OFED stack. Referring to user guide, i have set > memlock limit to unlimited in /etc/security/limits.conf & > /etc/init.d/sshd. ssh service is restarted. Now, if i do a relogin, > ulimit -a should show unlimited. But it is showing 32Kb only. The > account used is user account, not root account. > Am i doing the correct steps ? Did you use 'ulimit -l unlimited' in the /etc/init.d/sshd file before it created the sshd daemon? If so, it seems that you're following the correct steps. > > I tried to set this command in .bashrc of the user account. But at > every login, following error comes:- > -bash: ulimit: max locked memory: cannot modify limit: Operation not permitted > > I tried to set some number, i.e. ulimit -l 4096. This is successful. > But setting to 4097 is failing. > Can somebody give me the correct steps & info ? > Is there a hard memlock limit set? You may want to try setting this as well to see if this gives you better results. Which Linux distribution are you using? 
I found that with RHEL5 I can simply set the soft and hard memlock limits in /etc/security/limits.conf without even touching /etc/init.d/sshd.

> Mainly, i am running HPL. So whether these settings can hamper the
> performance figures ?
> OR what are the optimal settings ?

I'm not sure there is such a thing as optimal settings for this. It depends on how much memory you want to allow user processes to lock compared to how much is left for the OS to do its work.

>
> Thanking you,
> Yogeshwar
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From chai.15 at osu.edu Thu Sep 25 13:28:11 2008
From: chai.15 at osu.edu (Lei Chai)
Date: Thu Sep 25 13:28:16 2008
Subject: [mvapich-discuss] Possibly undesirable mvapich "feature" (was Possible bug)
In-Reply-To: <876512660809251009q387d2c82xe5bd41bf52566dcf@mail.gmail.com>
References: <876512660809250848r4b1434b0v7a63268f40e81498@mail.gmail.com> <48DBC258.305@osu.edu> <876512660809251009q387d2c82xe5bd41bf52566dcf@mail.gmail.com>
Message-ID: <48DBCA2B.5040909@osu.edu>

Yes, disabling cpu affinity will lead to mpi tasks hopping among the cores. If you are using an Intel platform, the performance won't be affected much. If you are using an AMD NUMA architecture, there might be some performance difference, since some cores may sometimes need to access remote memory. It also depends on the application, for example its data access patterns. So I don't have an accurate estimate, but I guess the difference won't be large.

Lei

Laurence Marks wrote:
> Thanks. I thought something like this might be the answer.
>
> However, before I set this as a global option, will doing this lead to
> the mpi tasks on the compute nodes hopping among the cores and slowing
> the calculation down? (Since each job takes 60-90 minutes testing
> different options is tedious.)
> > On Thu, Sep 25, 2008 at 11:54 AM, Lei Chai wrote: > >> Hi Laurence, >> >> By default, mvapich uses cpu affinity and tries to use cpu's starting from >> cpu 0. To solve your problem, there are two options: >> >> - Use the VIADEV_CPU_MAPPING env variable as you mentioned. Map to different >> cpu sets for different MPI jobs. >> - Use the VIADEV_USE_AFFINITY=0 env variable to disable cpu affinity. The OS >> will schedule the MPI jobs on different cpu's. >> >> Hope this helps. >> >> Lei >> >> >> Laurence Marks wrote: >> >>> I think I may have partially resolved my previous problem >>> >>> (http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2008-September/001920.html >>> ), but not completely. >>> >>> One of the engineers at the company that sold me the cluster pointed >>> out that the first node running the job (using 8 cores) was doing a >>> little swap, even though the mpi job itself was not requiring swap. I >>> suspect that doing I/O and general other OS tasks associated with >>> communicating from the 1st core to all the others was leading to this >>> and causing problems. I can resolve this by running with the first >>> entry in the machines file on my head node, then everything is OK. >>> >>> Unfortunately this leads to another problem. If I have two mpi jobs >>> both using one core on the head node, instead of using separate cores >>> they both use the same one! I suspect that this is a design feature, >>> i.e. to use the first core unless something else has been specified >>> with VIADEV_CPU_MAPPING or similar. I wonder if there is any way >>> around this short of specifying different mappings for different jobs >>> which would become a bit of a nightmare since individual users (i.e. >>> my students) would have to get it right. An alternative is running >>> with 7 cores on the first machine to leave some free CPU for OS >>> operations, but this is inefficient. 
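
For the VIADEV_CPU_MAPPING option discussed in this thread, one way to spare individual users from writing mappings by hand is to generate a separate core list for each concurrent job. The Python sketch below is only an illustration. It assumes that VIADEV_CPU_MAPPING takes a colon-separated list of core IDs (please check the MVAPICH user guide for your version); the function names and the default of 8 cores per node are hypothetical and simply mirror the cluster described above.

```python
def cpu_mapping(first_core, nprocs_on_node, cores_per_node=8):
    """Return a colon-separated core list for one job on one node.

    Assumption: VIADEV_CPU_MAPPING accepts core IDs separated by ':'.
    """
    if first_core < 0 or nprocs_on_node < 1:
        raise ValueError("first_core must be >= 0 and nprocs_on_node >= 1")
    last = first_core + nprocs_on_node
    if last > cores_per_node:
        raise ValueError("job does not fit on the node")
    return ":".join(str(core) for core in range(first_core, last))


def disjoint_mappings(procs_per_job, cores_per_node=8):
    """Give each job its own contiguous, non-overlapping block of cores."""
    mappings = []
    next_core = 0
    for nprocs in procs_per_job:
        mappings.append(cpu_mapping(next_core, nprocs, cores_per_node))
        next_core += nprocs
    return mappings


if __name__ == "__main__":
    # Two jobs that each place one process on the head node.
    for job, mapping in enumerate(disjoint_mappings([1, 1])):
        print(f"job {job}: VIADEV_CPU_MAPPING={mapping}")
```

A batch wrapper could call something like this once per node and pass the resulting value to each job, so that two jobs sharing the head node no longer both land on core 0.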
From twcroc at wm.edu Thu Sep 25 17:11:57 2008
From: twcroc at wm.edu (Tom Crockett)
Date: Thu Sep 25 17:12:04 2008
Subject: [mvapich-discuss] Bug in MVAPICH2 SVN Trunk Configure Script
Message-ID: <48DBFE9D.7090808@wm.edu>

I think I've found a bug in "mvapich2-trunk-2008-09-09/src/binding/f90/configure".

Specifically, line 1370 reads:

    F90FLAGS="$MPICH2_INTERNAL_C90FLAGS"

I believe this should instead be:

    F90FLAGS="$MPICH2_INTERNAL_F90FLAGS"

This didn't cause me any trouble when I built the code with the Portland Group compiler suite, but it resulted in linking problems between gfortran and my OFED 1.3 libraries when I tried to build a 32-bit version with GCC 4.1.2.

I'm having a similar problem in test/mpi/configure, this time with CFLAGS not being set. I believe the offending section of code is the following:

    # If it is building with MPICH2, set xFLAGS to null, as mpiXX contains xFLAGS.
    if test "$FROM_MPICH2" = "yes" ; then
        CFLAGS=""
        CXXFLAGS=""
        FFLAGS=""
        F90FLAGS=""
    fi

If I comment this out, the configure step completes successfully.

For the record, I'm setting CFLAGS, FFLAGS, F90FLAGS, etc. to "-march=k8 -m32".

-Tom

--
Tom Crockett

College of William and Mary                 email:  twcroc@wm.edu
IT/High Performance Computing Group         phone:  (757) 221-2762
Savage House                                fax:    (757) 221-2023
P.O. Box 8795
Williamsburg, VA 23187-8795

From perkinjo at cse.ohio-state.edu Thu Sep 25 17:33:35 2008
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Thu Sep 25 17:34:42 2008
Subject: [mvapich-discuss] Bug in MVAPICH2 SVN Trunk Configure Script
In-Reply-To: <48DBFE9D.7090808@wm.edu>
References: <48DBFE9D.7090808@wm.edu>
Message-ID: <20080925213334.GB2955@cse.ohio-state.edu>

On Thu, Sep 25, 2008 at 05:11:57PM -0400, Tom Crockett wrote:
> I think I've found a bug in
> "mvapich2-trunk-2008-09-09/src/binding/f90/configure".
> > Specifically, line 1370 reads: > > F90FLAGS="$MPICH2_INTERNAL_C90FLAGS" > > I believe this should instead be: > > F90FLAGS="$MPICH2_INTERNAL_F90FLAGS" Thanks Tom, this looks like it can be problematic. Thanks for reporting it we'll take a look and incorporate this change. > > This didn't cause me any trouble when I built the code with the Portland > Group compiler suite, but it resulted in linking problems between > gfortran and my OFED 1.3 libraries when I tried to build a 32-bit > version with GCC 4.1.2. > > I'm having a similar problem in test/mpi/configure, this time with > CFLAGS not being set. I believe the offending section of code is the > following: > > # If it is building with MPICH2, set xFLAGS to null, as mpiXX contains > xFLAGS. > if test "$FROM_MPICH2" = "yes" ; then > CFLAGS="" > CXXFLAGS="" > FFLAGS="" > F90FLAGS="" > fi > > If I comment this out, the configure step completes successfully. We'll also take a look at this. We may still need to reset these flags but maybe it should still include various 'internal' flags that the user has specified. > > For the record, I'm setting CFLAGS, FFLAGS, F90FLAGS, etc. to "-march=k8 > -m32". > > -Tom > > -- > Tom Crockett > > College of William and Mary email: twcroc@wm.edu > IT/High Performance Computing Group phone: (757) 221-2762 > Savage House fax: (757) 221-2023 > P.O. 
Box 8795
> Williamsburg, VA 23187-8795
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From balaji at mcs.anl.gov Thu Sep 25 19:08:38 2008
From: balaji at mcs.anl.gov (Pavan Balaji)
Date: Thu Sep 25 19:08:45 2008
Subject: [mvapich-discuss] Bug in MVAPICH2 SVN Trunk Configure Script
In-Reply-To: <20080925213334.GB2955@cse.ohio-state.edu>
References: <48DBFE9D.7090808@wm.edu> <20080925213334.GB2955@cse.ohio-state.edu>
Message-ID: <48DC19F6.40901@mcs.anl.gov>

>> Specifically, line 1370 reads:
>>
>> F90FLAGS="$MPICH2_INTERNAL_C90FLAGS"
>>
>> I believe this should instead be:
>>
>> F90FLAGS="$MPICH2_INTERNAL_F90FLAGS"
>
> Thanks Tom, this looks like it can be problematic. Thanks for reporting
> it we'll take a look and incorporate this change.

This was a bug in MPICH2 that was fixed in r1000 shortly after the 1.0.7 release. However, this error should only mean that the Fortran bindings are not compiled with the internal optimizations, so there shouldn't be any noticeable performance impact.

>> # If it is building with MPICH2, set xFLAGS to null, as mpiXX contains
>> xFLAGS.
>> if test "$FROM_MPICH2" = "yes" ; then
>> CFLAGS=""
>> CXXFLAGS=""
>> FFLAGS=""
>> F90FLAGS=""
>> fi
>>
>> If I comment this out, the configure step completes successfully.
>
> We'll also take a look at this. We may still need to reset these flags
> but maybe it should still include various 'internal' flags that the user
> has specified.

The CFLAGS will be included within mpicc; they won't show up externally. You can check this using mpicc -show. Is this not what you noticed?

What exact CFLAGS and MPICH2LIB_CFLAGS did you use? Note that these are exclusive. CFLAGS sets flags that are used to compile MPICH2/MVAPICH2 as well as show up in mpicc.
MPICH2LIB_CFLAGS only sets flags that are used to compile MPICH2/MVAPICH2.

-- Pavan

--
Pavan Balaji
http://www.mcs.anl.gov/~balaji

From xmxmxie at gmail.com Fri Sep 26 02:07:14 2008
From: xmxmxie at gmail.com (Xie Min)
Date: Fri Sep 26 02:07:37 2008
Subject: [mvapich-discuss] send desc error?
Message-ID: <91bd441b0809252307v44e4ca99jc05064f8f608934d@mail.gmail.com>

We are using mvapich2 on an InfiniBand cluster; each node has two Quad-Core Intel Xeon 64 CPUs.

After installing mvapich2-1.0.3, we used NPB 3.3 to run some tests, but at least bt.C.64 cannot run: it exits with an error after PMI_Barrier().
Many tasks print similar error messages:

send desc error
[23] Abort: [] Got completion with error 5, vendor vcode=f9, dest rank = 40 (or error 9, vendor code=8a, etc)
at line 512 in file ibv_channel_manager.c

With mvapich2-1.2rc2, bt.C.64 runs to completion without error. Since mvapich2-1.0.3 appears to be a stable version, I am not sure whether our runtime environment has a problem.

We use OpenFabrics 1.3 on the cluster nodes.

BTW, mvapich-1.0.3 uses mpich2-1.0.5 as its base and mvapich2-1.2rc2 uses mpich2-1.0.7. What I want to know is: what is the difference in ROMIO between these two versions?

Thanks.

From weikuan.yu at gmail.com Fri Sep 26 12:03:48 2008
From: weikuan.yu at gmail.com (Weikuan Yu)
Date: Fri Sep 26 12:03:52 2008
Subject: [mvapich-discuss] send desc error?
In-Reply-To: <91bd441b0809252307v44e4ca99jc05064f8f608934d@mail.gmail.com>
References: <91bd441b0809252307v44e4ca99jc05064f8f608934d@mail.gmail.com>
Message-ID: <48DD07E4.9050605@gmail.com>

Hi, Xie,

> BTW, mvapich-1.0.3 use mpich2-1.0.5 as the base,
> mvapich2-1.2rc2 use mpich2-1.0.7,

Not sure how you got this impression. As far as I know, mvapich-1.0.3 is based on MPICH version 1.

> what I want to know is what is the difference of ROMIO
> in these two version?

MVAPICH1 has its ROMIO based on the original from MPICH1, and MVAPICH2 on the one from MPICH2.
In addition, ROMIO in MVAPICH1 has added support for Lustre ADIO driver. --Weikuan Xie Min wrote: > We are using mvapich2 on an infiniBand cluster, each node has two > Quad-Core Intel Xeon 64 CPU. > > After install mvapich2-1.0.3, we use NPB 3.3 to do some tests, but at > least the bt.C.64 cannot run, it will exit with error after > PMI_Barrier(). > Many tasks print the similar error messages: > > send desc error > [23] Abort: [] Got completion with error 5, vendor vcode=f9, dest rank > = 40 (or error 9, vendor code=8a, etc) > at line 512 in file ibv_channel_manager.c > > We tried mvapich2-1.2rc2, bt.C.64 can run to completion without error. > Because it seems mvapich2-1.0.3 is a stable version, so I am not sure > if our runtime environment has some problems. > > We use OpenFabrics 1.3 in the cluster nodes. > > BTW, mvapich-1.0.3 use mpich2-1.0.5 as the base, mvapich2-1.2rc2 use > mpich2-1.0.7, what I want to know is what is the difference of ROMIO > in these two version? > > Thanks. > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > -- Weikuan Yu <+> 1-865-574-7990 http://ft.ornl.gov/~wyu/ From weikuan.yu at gmail.com Fri Sep 26 13:14:47 2008 From: weikuan.yu at gmail.com (Weikuan Yu) Date: Fri Sep 26 13:14:50 2008 Subject: [mvapich-discuss] send desc error? In-Reply-To: <91bd441b0809252307v44e4ca99jc05064f8f608934d@mail.gmail.com> References: <91bd441b0809252307v44e4ca99jc05064f8f608934d@mail.gmail.com> Message-ID: <48DD1887.5010608@gmail.com> Hi, Xie, Sorry for the earlier typing errors. > BTW, mvapich-1.0.3 use mpich2-1.0.5 as the base, mvapich2-1.2rc2 use > mpich2-1.0.7, what I want to know is what is the difference of ROMIO > in these two version? I guess you meant mvapich2-1.0.3, as I could not find mvapich-1.0.3. A convenient link below may help you find out the ROMIO differences between the two. 
# https://svn.mcs.anl.gov/repos/mpi/mpich2/tags/release/MPICH2_1_0_7/CHANGES --Weikuan Xie Min wrote: > We are using mvapich2 on an infiniBand cluster, each node has two > Quad-Core Intel Xeon 64 CPU. > > After install mvapich2-1.0.3, we use NPB 3.3 to do some tests, but at > least the bt.C.64 cannot run, it will exit with error after > PMI_Barrier(). > Many tasks print the similar error messages: > > send desc error > [23] Abort: [] Got completion with error 5, vendor vcode=f9, dest rank > = 40 (or error 9, vendor code=8a, etc) > at line 512 in file ibv_channel_manager.c > > We tried mvapich2-1.2rc2, bt.C.64 can run to completion without error. > Because it seems mvapich2-1.0.3 is a stable version, so I am not sure > if our runtime environment has some problems. > > We use OpenFabrics 1.3 in the cluster nodes. > > BTW, mvapich-1.0.3 use mpich2-1.0.5 as the base, mvapich2-1.2rc2 use > mpich2-1.0.7, what I want to know is what is the difference of ROMIO > in these two version? > > Thanks. > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > -- Weikuan Yu <+> 1-865-574-7990 http://ft.ornl.gov/~wyu/ From yogyas at gmail.com Sat Sep 27 04:47:34 2008 From: yogyas at gmail.com (yogeshwar sonawane) Date: Sat Sep 27 04:47:40 2008 Subject: [mvapich-discuss] How & where to set ulimit ? In-Reply-To: <20080925172518.GE2994@cse.ohio-state.edu> References: <20080925172518.GE2994@cse.ohio-state.edu> Message-ID: Hi jonathan, Thanks for the reply. My comments are inline. On Thu, Sep 25, 2008 at 10:55 PM, Jonathan Perkins wrote: > On Thu, Sep 25, 2008 at 09:26:00PM +0530, yogeshwar sonawane wrote: >> Hi all, >> >> I am using mvapich2 on OFED stack. Referring to user guide, i have set >> memlock limit to unlimited in /etc/security/limits.conf & >> /etc/init.d/sshd. ssh service is restarted. Now, if i do a relogin, >> ulimit -a should show unlimited. 
But it is showing 32Kb only. The >> account used is user account, not root account. >> Am i doing the correct steps ? > > Did you use 'ulimit -l unlimited' in the /etc/init.d/sshd file before it > created the sshd daemon? If so, it seems that you're following the > correct steps. > I restart the sshd service >> >> I tried to set this command in .bashrc of the user account. But at >> every login, following error comes:- >> -bash: ulimit: max locked memory: cannot modify limit: Operation not permitted >> >> I tried to set some number, i.e. ulimit -l 4096. This is successful. >> But setting to 4097 is failing. >> Can somebody give me the correct steps & info ? >> > > Is there a hard memlock limit set? You may want to try setting this as > well to see if this gives you better results. i have added following line in /etc/security/limit.conf * hard memlock unlimited > > Which Linux distribution are you using? I found that with RHEL5 I can > simply set the soft and hard memlock limits in /etc/security/limits.conf > without even touching /etc/init.d/sshd. > my Linux distribution is RHEL 4- update 5-x86_64-Workstation >> Mainly, i am running HPL. So whether these settings can hamper the >> performance figures ? >> OR what are the optimal settings ? > > I'm not sure if there is a such thing as optimal settings for this. It > depends on how much memory you want to allow user processes to lock > compared to how much is left for the OS to do its work. 
> >> >> Thanking you, >> Yogeshwar >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > -- > Jonathan Perkins > http://www.cse.ohio-state.edu/~perkinjo > From mbkumar at gmail.com Sat Sep 27 16:37:43 2008 From: mbkumar at gmail.com (Bharat) Date: Sat Sep 27 16:38:11 2008 Subject: [mvapich-discuss] mvapich2 error Message-ID: Hi All, After several days of trying various things, I am posting my problem. We have 16node, dual processor, Quad Core Intel Xeon with 16GB RAM/node cluster interconnected with infiniband. I am using mvapich2-1.2RC2. And I am running an application compiled using ifort 10.1.017, intel mkl 10.0.1.014 (scalapack & blacs taken from intel libraries). The program runs fine for some time and then it stops with the error message like siesta: ============================== Begin CG move = 15 ============================== siesta: iscf Eharris(eV) E_KS(eV) FreeEng(eV) dDmax Ef(eV) siesta: 1 -110464.5442 -110476.9339 -110477.1312 0.1268 -4.4928 siesta: 2 -110507.6684 -110459.2304 -110459.4392 0.3223 -5.8411 siesta: 3 -110463.9960 -110472.4056 -110472.5206 0.0867 -4.6470 Fatal error in MPI_Bcast: Message truncated, error stack: MPI_Bcast(1144)...................: MPI_Bcast(buf=0x20c0fe0, count=1, dtype=USER, root=2, comm=0xc4000006) failed MPIR_Bcast(228)...................: MPIDI_CH3U_Receive_data_found(254): Message from rank 0 and tag 2 truncated; 31744 bytes received but buffer size is 1600 rank 5 in job 27 master_39065 caused collective abort of all ranks exit status of rank 5: killed by signal 9 I tried different compiler flags, and also tried gfortran, but the problem is still present. So I am thinking the error is related to mvapich2. And I am new to mvapich2. So can someone please help me in solving this issue. I did only default install of mvapich2 (i.e., ./configure CC=... F90=..., make, make install). 
Do I have to set any environment variables? I used the option of -heap_arrays during compiling to overcome stack size issue. The output of ibstatus is Infiniband device 'mthca0' port 1 status: default gid: fe80:0000:0000:0000:0002:c902:0027:da55 base lid: 0x13 sm lid: 0x13 state: 4: ACTIVE phys state: 5: LinkUp rate: 20 Gb/sec (4X DDR) The output of ibv_devinfo is hca_id: mthca0 fw_ver: 1.2.0 node_guid: 0002:c902:0027:da54 sys_image_guid: 0002:c902:0027:da57 vendor_id: 0x02c9 vendor_part_id: 25204 hw_ver: 0xA0 board_id: MT_03B0150002 phys_port_cnt: 1 port: 1 state: PORT_ACTIVE (4) max_mtu: 2048 (4) active_mtu: 2048 (4) sm_lid: 19 port_lid: 19 port_lmc: 0x00 Thanks, Bharat From panda at cse.ohio-state.edu Sun Sep 28 00:37:59 2008 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Sun Sep 28 00:38:05 2008 Subject: [mvapich-discuss] Announcing the release of MVAPICH 1.1RC1 Message-ID: The MVAPICH team is pleased to announce the release of MVAPICH 1.1RC1 with the following NEW features: - New Features for OpenFabrics Gen2-IB Interface - eXtended Reliable Connection (XRC) support - Lock-free design to provide support for asynchronous progress at both sender and receiver to overlap computation and communication - New OpenFabrics Gen2-Hybrid interface - Replaces the Gen2-UD interface of MVAPICH 1.0 series - Targeted for large-scale IB clusters (multi-thousand cores) to provide highest performance and minimal memory usage - Support for UD, RC and XRC transports - Adaptive selection during run-time (based on application and systems characteristics) to switch between RC and UD (or between XRC and UD) transports - Delivers performance and scalability with near constant memory footprint for communication contexts - Zero-copy protocol with UD for large data transfer - Multiple buffer organizations with XRC support - Shared memory communication between cores within a node - Multi-core optimized collectives (MPI_Bcast, MPI_Barrier, MPI_Reduce and MPI_Allreduce) - 
Enhanced MPI_Allgather collective For downloading MVAPICH 1.1RC1, associated user guide and accessing the SVN, please visit the following URL: http://mvapich.cse.ohio-state.edu This version is also being made available through OFED 1.4. All feedbacks, including bug reports and hints for performance tuning, patches and enhancements are welcome. Please post it to the mvapich-discuss mailing list. Thanks, The MVAPICH Team From poulson.jack at gmail.com Sun Sep 28 01:13:42 2008 From: poulson.jack at gmail.com (Jack Poulson) Date: Sun Sep 28 08:03:33 2008 Subject: [mvapich-discuss] Bug in Allreduce for user-defined ops Message-ID: <1ef1de420809272213i45d98b62oa90bf05d15594e34@mail.gmail.com> I believe I've run into a bug in the implementation of Allreduce for user-defined functions in MVAPICH 1.0 and 1.0.1 (0.9.8 works). In 0.9.8, for power-of-two processes, the user-op is called log2 times with the correct length. In the new versions, it appears to be called log2+2 times, where the first call to the user-op passes in a count of zero (I found this by simply printing it from within the user-op). I've looked through the intra_Allreduce routine in src/coll/intra_fns_new.c, but I don't see why the user-op is called more than log2 times for power-of-two processes. Should user-defined ops check to ensure the length is nonzero? I've attached a driver and output that demonstrate the problem. The issue causes problems in operations such as a custom pivoting operation in an LU factorization, where an integer is tacked onto the end of a set of doubles, and a zero length in bytes would cause the routine to decide negative doubles are being operated on. I've been working around the problem with a custom Allreduce implementation that uses a reduce-to-one/bcast, but I would like to take advantage of your team's multicore optimizations. Thank you, Jack Poulson -------------- next part -------------- A non-text attachment was scrubbed... 
Name: user_op.c
Type: text/x-csrc
Size: 1938 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080928/2878e0d7/user_op-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: user-op-0.9.8
Type: application/octet-stream
Size: 23200 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080928/2878e0d7/user-op-0.9-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: user-op-1.0
Type: application/octet-stream
Size: 2709 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080928/2878e0d7/user-op-1-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: user-op-1.0.1
Type: application/octet-stream
Size: 2692 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080928/2878e0d7/user-op-1.0-0001.obj

From chai.15 at osu.edu Mon Sep 29 21:14:06 2008
From: chai.15 at osu.edu (Lei Chai)
Date: Mon Sep 29 21:14:07 2008
Subject: [mvapich-discuss] mvapich2 error
In-Reply-To:
References:
Message-ID: <48E17D5E.8000301@osu.edu>

Hi Bharat,

Thanks for reporting the problem. Since we don't have the license for siesta we are not able to run it on our cluster. Could you try the following and let us know the results:

- Use the option MV2_USE_SHMEM_COLL=0 e.g.

  $ mpirun_rsh -np N -hostfile ./hosts MV2_USE_SHMEM_COLL=0 ./prog

- Try to run the program with MPICH2-1.0.7, since mvapich2-1.2rc2 is based on MPICH2-1.0.7

This will help us get more insight into the problem.

Thanks,
Lei

Bharat wrote:
> Hi All,
>
> After several days of trying various things, I am posting my problem.
> We have 16node, dual processor, Quad Core Intel Xeon with 16GB
> RAM/node cluster interconnected with infiniband. I am using
> mvapich2-1.2RC2.
And I am running an application compiled using ifort > 10.1.017, intel mkl 10.0.1.014 (scalapack & blacs taken from intel > libraries). The program runs fine for some time and then it stops with > the error message like > > siesta: ============================== > Begin CG move = 15 > ============================== > > > siesta: iscf Eharris(eV) E_KS(eV) FreeEng(eV) dDmax Ef(eV) > siesta: 1 -110464.5442 -110476.9339 -110477.1312 0.1268 -4.4928 > siesta: 2 -110507.6684 -110459.2304 -110459.4392 0.3223 -5.8411 > siesta: 3 -110463.9960 -110472.4056 -110472.5206 0.0867 -4.6470 > Fatal error in MPI_Bcast: > Message truncated, error stack: > MPI_Bcast(1144)...................: MPI_Bcast(buf=0x20c0fe0, count=1, > dtype=USER, root=2, comm=0xc4000006) failed > MPIR_Bcast(228)...................: > MPIDI_CH3U_Receive_data_found(254): Message from rank 0 and tag 2 > truncated; 31744 bytes received but buffer size is 1600 > rank 5 in job 27 master_39065 caused collective abort of all ranks > exit status of rank 5: killed by signal 9 > > I tried different compiler flags, and also tried gfortran, but the > problem is still present. So I am thinking > the error is related to mvapich2. And I am new to mvapich2. So can > someone please help me in solving this issue. > I did only default install of mvapich2 (i.e., ./configure CC=... > F90=..., make, make install). Do I have to > set any environment variables? I used the option of -heap_arrays > during compiling to overcome stack size issue. 
> The output of ibstatus is > > Infiniband device 'mthca0' port 1 status: > default gid: fe80:0000:0000:0000:0002:c902:0027:da55 > base lid: 0x13 > sm lid: 0x13 > state: 4: ACTIVE > phys state: 5: LinkUp > rate: 20 Gb/sec (4X DDR) > > The output of ibv_devinfo is > hca_id: mthca0 > fw_ver: 1.2.0 > node_guid: 0002:c902:0027:da54 > sys_image_guid: 0002:c902:0027:da57 > vendor_id: 0x02c9 > vendor_part_id: 25204 > hw_ver: 0xA0 > board_id: MT_03B0150002 > phys_port_cnt: 1 > port: 1 > state: PORT_ACTIVE (4) > max_mtu: 2048 (4) > active_mtu: 2048 (4) > sm_lid: 19 > port_lid: 19 > port_lmc: 0x00 > > > Thanks, > Bharat > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
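
The mpirun_rsh invocation suggested in this thread places NAME=VALUE settings between the launcher options and the executable. For anyone scripting such test runs, the Python sketch below assembles that argument list. It is a minimal illustration only: the helper name mpirun_rsh_argv and its parameters are hypothetical, and the only launcher options used are the ones that appear in the example above (-np, -hostfile).

```python
import shlex


def mpirun_rsh_argv(nprocs, hostfile, program, env=None, program_args=()):
    """Build an mpirun_rsh argument list.

    Environment settings go after the launcher options and before the
    executable, following the command line shown in this thread.
    """
    argv = ["mpirun_rsh", "-np", str(nprocs), "-hostfile", hostfile]
    for name, value in (env or {}).items():
        argv.append(f"{name}={value}")
    argv.append(program)
    argv.extend(program_args)
    return argv


if __name__ == "__main__":
    # Disable shared-memory collectives for a single test run.
    argv = mpirun_rsh_argv(16, "./hosts", "./prog",
                           env={"MV2_USE_SHMEM_COLL": 0})
    print(shlex.join(argv))
```

Building the list first and joining it only for display keeps the variable setting in the right position when the list is later handed to a process launcher.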