From ibatis2 at 163.com Tue Jan 1 07:12:05 2008
From: ibatis2 at 163.com (jetspeed)
Date: Tue Jan 1 07:23:01 2008
Subject: [mvapich-discuss] what package may I need?
Message-ID: <20080101201205.6c0a5aa8.ibatis2@163.com>

Hi all,

I use MVAPICH2 0.9.8 on PowerPC, RHEL4. When I use mpicc to compile HPL, I get many errors like the ones below. (MVAPICH2 successfully compiled a simple MPI program, and MPICH2 on this machine successfully compiled HPL.)

/usr/bin/ld: /usr/mpi/gcc/mvapich2-0.9.8-15/lib/libmpich.a(malloc.o)(.text+0x4ac0): unresolvable R_PPC64_REL24 relocation against symbol `pthread_mutex_trylock@@GLIBC_2.3'
/usr/bin/ld: /usr/mpi/gcc/mvapich2-0.9.8-15/lib/libmpich.a(malloc.o)(.text+0x4b10): unresolvable R_PPC64_REL24 relocation against symbol `pthread_mutex_unlock@@GLIBC_2.3'
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: ld returned 1 exit status
make[2]: *** [dexe.grd] Error 1

My glibc packages are glibc-2.3.4-2.19, glibc-devel-2.3.4-2.19.ppc, and glibc-devel-2.3.4-2.19.ppc64.

What package do I need for a successful compile? Any suggestions?

Thanks in advance

From nilesh_awate at yahoo.com Wed Jan 2 03:50:02 2008
From: nilesh_awate at yahoo.com (nilesh awate)
Date: Wed Jan 2 04:00:17 2008
Subject: [mvapich-discuss] different mpiexec options
Message-ID: <985896.2893.qm@web94101.mail.in2.yahoo.com>

Hi Lei,

Thanks a bunch, my problem has been solved. Actually, I had seen the machine file option in the help, but I didn't find much about it (how to specify it) in man mpiexec.

Thanks & regards,
Nilesh Awate
C-DAC R&D

----- Original Message ----
From: LEI CHAI
To: nilesh awate
Cc: mvapich-discuss@cse.ohio-state.edu
Sent: Tuesday, 1 January, 2008 12:37:56 AM
Subject: Re: [mvapich-discuss] different mpiexec options

Hi Nilesh,

You can map processes to machines by using the -machinefile option.
For example, suppose you have four nodes, m[1-4], and you want to run 4 processes on a single node, say m1, without modifying mpd.hosts. You can run the program like this:

$ mpiexec -machinefile ./mf -n 4 ./a.out

where mf is a file containing the machine mapping, e.g.

$ cat mf
m1
m1
m1
m1

And if you want to run 4 processes on 2 nodes, then mf may look like this:

$ cat mf
m1
m2
m1
m2

More information about running mvapich2 can be found in the mvapich2 user guide:
http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2.html

Thanks,
Lei

Hi all,

I'm using mvapich2-1.0.1 with the OFED 1.2 uDAPL stack. I've set up 4 nodes and am using them, but when I run the following command

mpiexec -n 4 ./mpitst

mpitst gets executed on all 4 nodes. Can I restrict its execution to only 2 nodes (without reducing the number of nodes in mpd.hosts) by specifying an option when running mpiexec? Which options can we give to mpiexec? Suppose I want to run 4 instances of an executable on a single node with a quad-core CPU: how can I tell mpiexec to run them on that single node and let the other nodes remain idle?

Waiting for a reply.

Regards,
Nilesh Awate
C-DAC R&D
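
The machine mapping described in Lei's reply (one host name per line, one line per process) can also be generated rather than typed by hand. The following is a minimal sketch in Python, assuming only the file layout shown in that reply; the helper names machinefile_lines and write_machinefile are hypothetical and not part of MVAPICH2 or mpiexec.

```python
from itertools import cycle, islice


def machinefile_lines(hosts, nprocs):
    """Return one host name per MPI process, cycling through `hosts`.

    With hosts=["m1"] every process lands on m1; with ["m1", "m2"]
    the processes alternate between the two nodes.
    """
    if not hosts:
        raise ValueError("need at least one host")
    if nprocs < 0:
        raise ValueError("nprocs must be non-negative")
    return list(islice(cycle(hosts), nprocs))


def write_machinefile(path, hosts, nprocs):
    """Write the mapping to `path`, one host name per line."""
    with open(path, "w") as fh:
        for host in machinefile_lines(hosts, nprocs):
            fh.write(host + "\n")


if __name__ == "__main__":
    # Reproduces the two example files from the reply.
    print("\n".join(machinefile_lines(["m1"], 4)))
    print("\n".join(machinefile_lines(["m1", "m2"], 4)))
```

A file written this way would then be passed to mpiexec exactly as in the reply, e.g. `mpiexec -machinefile ./mf -n 4 ./a.out`.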
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080102/4e9db8c5/attachment-0001.html

From ibatis2 at 163.com Thu Jan 3 05:54:03 2008
From: ibatis2 at 163.com (jetspeed)
Date: Thu Jan 3 05:59:24 2008
Subject: [mvapich-discuss] 64bit error?
Message-ID: <20080103185403.5b789147.ibatis2@163.com>

Hi all,

I use MVAPICH2 0.9.8 (OFED 1.2.5.4 for InfiniBand) on PowerPC, RHEL4. When I use mpicc to compile HPL, I get many errors like the ones below:

/usr/bin/ld: /usr/mpi/gcc/mvapich2-0.9.8-15/lib/libmpich.a(malloc.o)(.text+0x4ac0): unresolvable R_PPC64_REL24 relocation against symbol `pthread_mutex_trylock@@GLIBC_2.3'
/usr/bin/ld: /usr/mpi/gcc/mvapich2-0.9.8-15/lib/libmpich.a(malloc.o)(.text+0x4b10): unresolvable R_PPC64_REL24 relocation against symbol `pthread_mutex_unlock@@GLIBC_2.3'
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: ld returned 1 exit status
make[2]: *** [dexe.grd] Error 1

But after I set BINARY64 = 1 in the LAPACK makefile, which produces a 64-bit binary, and link lapack.a into HPL, the compile succeeds. Running the xhpl program, however, gives the following errors:

rank 4 in job 6 inode01_42535 caused collective abort of all ranks
  exit status of rank 4: killed by signal 9
rank 3 in job 6 inode01_42535 caused collective abort of all ranks
  exit status of rank 3: killed by signal 9
rank 1 in job 6 inode01_42535 caused collective abort of all ranks
  exit status of rank 1: killed by signal 9
rank 0 in job 6 inode01_42535 caused collective abort of all ranks
  exit status of rank 0: killed by signal 9

The question is: what does the "R_PPC64_REL24 relocation" mean? How can I compile and run the HPL tests using MVAPICH2? Has anyone done this?
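
On the question above: as far as the 64-bit PowerPC ELF ABI describes it, R_PPC64_REL24 is the relocation type for the 24-bit PC-relative target field of a branch instruction, so the linker is reporting function calls in malloc.o that it could not resolve. Since the link only succeeded once LAPACK was built with BINARY64 = 1, one thing worth ruling out is a mix of 32-bit and 64-bit objects. The following is a minimal sketch in Python, not part of MVAPICH2 or HPL, and it does not claim to identify the actual cause here. The function names are hypothetical, and it assumes plain ELF files (members of a .a archive would first need to be unpacked, for example with `ar x`).

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4  # index of the class byte inside e_ident
_CLASS_NAMES = {1: "ELF32", 2: "ELF64"}  # ELFCLASS32 / ELFCLASS64


def elf_class(header: bytes) -> str:
    """Return "ELF32" or "ELF64" for the leading bytes of an ELF file."""
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    try:
        return _CLASS_NAMES[header[EI_CLASS]]
    except KeyError:
        raise ValueError("unknown ELF class byte: %d" % header[EI_CLASS])


def elf_class_of_file(path: str) -> str:
    """Read just enough of `path` to classify it."""
    with open(path, "rb") as fh:
        return elf_class(fh.read(EI_CLASS + 1))


def mixed_classes(paths):
    """Map each class name to the files that have it.

    More than one key in the result means 32-bit and 64-bit
    objects are being combined in the same link.
    """
    result = {}
    for path in paths:
        result.setdefault(elf_class_of_file(path), []).append(path)
    return result
```

Running mixed_classes over the object files that go into the link (after unpacking the archives) and seeing both "ELF32" and "ELF64" in the result would point at a 32-/64-bit mismatch.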

From ibatis2 at 163.com Thu Jan 3 09:22:48 2008
From: ibatis2 at 163.com (jetspeed)
Date: Thu Jan 3 09:28:17 2008
Subject: [mvapich-discuss] 64bit error?
In-Reply-To:
References: <20080103185403.5b789147.ibatis2@163.com>
Message-ID: <20080103222248.a905506e.ibatis2@163.com>

I tried MVAPICH2 1.0.1, and it works! But mpicc can't use the -m64 option. (I compiled with the default make.mvapich2.ofa.) I guess the MVAPICH2 0.9.8 in my OFED 1.2.5.4 was compiled as a 64-bit version, so it should be used with a 64-bit LAPACK. Is that right? Is there a setting to select 32-bit or 64-bit? I see there is a script, make.mvapich2.def, that detects the architecture (my `uname -m` outputs ppc64).

On Thu, 3 Jan 2008 06:35:09 -0500 (EST) Dhabaleswar Panda wrote:

> Thanks for your note.
>
> Do you see the same problem with MVAPICH2 1.0.1 from our web site?
>
> Unfortunately, we do not have any working PowerPC system with RHEL4 to
> reproduce and analyze this problem.
>
> If you can provide us remote access to your system for some time, we will
> be happy to analyze and solve this. Let us know whether this will be
> feasible. Accordingly, I will ask one of my team members to be in touch
> with you regarding this.
>
> DK
>
> On Thu, 3 Jan 2008, jetspeed wrote:
>
> > Hi,all
> > I use Mvapich2 0.9.8(OFED1.2.5.4 for InfiniBand), on PowerPC, RHEL4 , when I use mpicc to compile hpl, I got many errors as below:
> >
> > /usr/bin/ld: /usr/mpi/gcc/mvapich2-0.9.8-15/lib/libmpich.a(malloc.o)(.text+0x4ac0): unresolvable R_PPC64_REL24 relocation against symbol `pthread_mutex_trylock@@GLIBC_2.3'
> > /usr/bin/ld: /usr/mpi/gcc/mvapich2-0.9.8-15/lib/libmpich.a(malloc.o)(.text+0x4b10): unresolvable R_PPC64_REL24 relocation against symbol `pthread_mutex_unlock@@GLIBC_2.3'
> > /usr/bin/ld: final link failed: Nonrepresentable section on output
> > collect2: ld returned 1 exit status
> > make[2]: *** [dexe.grd] Error 1
> >
> >
> > but after I set the BINARY64 = 1 in the Lapack makefile which produces 64 bit binary, and link the lapack.a in HPL, the compile will succeed. but running the xhpl program, got errors as follow:
> >
> > rank 4 in job 6 inode01_42535 caused collective abort of all ranks
> > exit status of rank 4: killed by signal 9
> > rank 3 in job 6 inode01_42535 caused collective abort of all ranks
> > exit status of rank 3: killed by signal 9
> > rank 1 in job 6 inode01_42535 caused collective abort of all ranks
> > exit status of rank 1: killed by signal 9
> > rank 0 in job 6 inode01_42535 caused collective abort of all ranks
> > exit status of rank 0: killed by signal 9
> >
> >
> > the question is what does the "R_PPC64_REL24 relocation" mean? How can I compile and run the HPL tests by using Mvapich2 ? anyone done this ?
> >
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss@cse.ohio-state.edu
> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
>

From eborisch at ieee.org Thu Jan 3 10:24:08 2008
From: eborisch at ieee.org (Eric A. Borisch)
Date: Thu Jan 3 10:24:18 2008
Subject: [mvapich-discuss] Odd behavior with memory registration / dreg / MVAPICH and MVAPICH2
In-Reply-To: <1a7dd1d151.1d1511a7dd@osu.edu>
References: <1a7dd1d151.1d1511a7dd@osu.edu>
Message-ID: <392f95800801030724l209990f1g8609288378f70dac@mail.gmail.com>

Lei,

Thanks for the information. I would suggest that, if this can't be fixed in the vapi version, then the LAZY_MEM_UNREGISTER define should be removed from the default compile options for the versions where it is (apparently) not fully supported.

This is a very nasty bug. The MPI layer reports back no errors, but the data isn't actually transferred successfully. In addition, it presents as a timing / waiting error to the user, as all of the local (shared mem) peers transfer data successfully, so significant time can be spent chasing down a suspected user oversight for what is actually an error within the MPI layer.

This would apply to the MVAPICH and MVAPICH2, in both the vapi and vapi_multirail makefiles.

In addition, it should be documented that the LAZY_MEM_UNREGISTER switch is NOT compatible with vapi-based channels.

Thanks,
Eric

On Dec 21, 2007 5:29 PM, LEI CHAI wrote:
> Hi Eric,
>
> Thanks for using mvapich/mvapich2. The problem you reported can be solved by using the PTMALLOC feature which is supported by the gen2 device but not vapi/vapi_multirail. Not much features have been added to vapi/vapi_multirail devices for the last few releases because not many people use them. Since you cannot move to gen2, we would suggest you disable LAZY_MEM_UNREGISTER for your tests.
>
> Thanks,
> Lei
>
>
>
> ----- Original Message -----
> From: "Eric A. Borisch"
> Date: Friday, December 21, 2007 10:23 am
> Subject: [mvapich-discuss] Odd behavior with memory registration / dreg / MVAPICH and MVAPICH2
>
> > I seem to be running into a memory registration issue.
> >
> > Observations:
> >
> > 1) During some transfers (MPI_Isend / MPI_Irecv / MPI_Waitall) into a
> > local buffer on the root rank, I receive all of the data from any
> > ranks that are running on the same machine, but only part (or none at
> > all) of the data from ranks running on external machines. The transfer
> > length is above the eager/rendezvous threshold.
> > 2) Once the problem occurs, it is persistent. However, if I force
> > MVAPICH to re-register by calling "while(dreg_evict())" at this point
> > and then re-transfer, the correct data is received. (Same memory being
> > transferred from / to.)
> > 3) I've only witnessed problems occurring above the 4G (as returned by
> > malloc()) memory range.
> > 4) When I receive partial data from ranks, the data ends on a (4k)
> > page bound. Data past this bound (which should have been updated) is
> > unchanged during the transfer, yet both the sender and receiver report
> > no errors. (This is very bad!)
> > 5) Stepping through the code on both ends of the transfer shows the
> > software agreeing on the (correct) length and location as far down as
> > I can follow it.
> > 6) Running against a compilation with no -DLAZY_MEM_UNREGISTER shows
> > no issues. (Other than the expected performance hit.)
> > 7) Occurs on both MVAPICH-1.0-beta (vapi_multirail) and mvapich2-1.0 (vapi)
> > 8) The user code is also sending data out (from a different buffer)
> > over ethernet to a remote gui from the root node.
> >
> > I can't move to gen2 at this point -- we are using a vendor library
> > for interfacing to another system, and this library uses VAPI.
> >
> > uname -a output:
> > Linux rt2.mayo.edu 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST
> > 2006 x86_64 x86_64 x86_64 GNU/Linux
> >
> > Intel SE7520JR2 motherboards. 4G physical ram on each node.
> >
> > It appears (perhaps this is obvious) that the assumption that memory
> > registered (by the dreg.c code) remains registered until explicitly
> > unregistered (again, by the dreg.c code) is being violated in some
> > way. This, however, is wading in to uncharted (for me, at least) linux
> > memory management waters. The user code is doing nothing to fiddle
> > with registration in any explicit way. (With the exception of as
> > mentioned in (2))
> >
> > Please let me know what other information I can provide to resolve
> > this. I'm still trying to put together a small test program to cause
> > the problem, but have been unsuccessful so far.
> >
> > Thanks,
> > Eric
> > --
> > Eric A. Borisch
> > eborisch@ieee.org
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss@cse.ohio-state.edu
> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
>
>

--
Eric A. Borisch
eborisch@ieee.org

From ben.held at staarinc.com Thu Jan 3 12:56:19 2008
From: ben.held at staarinc.com (Ben Held)
Date: Thu Jan 3 12:56:30 2008
Subject: [mvapich-discuss] Building mvapich-based applications without access to infiniband system
Message-ID: <00a401c84e31$f1eeeb30$d5ccc190$@held@staarinc.com>

Our company offers a commercial product that we currently build for standard MPICH-1 and LAM. We have a client that has a new InfiniBand Linux cluster that has MVAPICH installed on it. Our company does not own any InfiniBand hardware, but we are faced with providing an application for this customer's cluster. Is this possible, and if so, how do we proceed? It appears that the build process for MVAPICH automatically detects the hardware (that we don't have), so I have concerns that building MVAPICH here and then linking it into our app will result in a binary that will not run on their cluster.

Thanks,
Ben

Ben Held
Simulation Technology & Applied Research, Inc.
11520 N. Port Washington Rd., Suite 201
Mequon, WI 53092
P: 1.262.240.0291 x101
F: 1.262.240.0294
E: ben.held@staarinc.com
http://www.staarinc.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080103/ed499588/attachment.html

From jsquyres at cisco.com Thu Jan 3 13:29:08 2008
From: jsquyres at cisco.com (Jeff Squyres)
Date: Thu Jan 3 13:29:39 2008
Subject: [mvapich-discuss] Building mvapich-based applications without access to infiniband system
In-Reply-To: <00a401c84e31$f1eeeb30$d5ccc190$@held@staarinc.com>
References: <00a401c84e31$f1eeeb30$d5ccc190$@held@staarinc.com>
Message-ID: <3C002364-889A-49C3-9F20-674974357287@cisco.com>

Ben --

You might want to install the OFED stack, which comes with Open MPI, MVAPICH 1, and MVAPICH 2 pre-installed (all with IB support included). Cisco offers pre-made RPMs of these in its OFED distribution on cisco.com -- I believe that other vendors do as well, but I don't know the details. The pre-made binary RPMs avoid the problem of the build system trying to detect specific hardware when you have none. After installation, you can use the mpi-selector-menu command to select which MPI to use in OFED (see the man page for details).

However, you're still in a bit of an odd spot in that you want to ship a product but don't have the hardware to test it on. Even if you get it to compile/link, you don't have a way to test whether it actually works or not. That's a real bummer (and could be a support nightmare). :-(

FWIW, if budgets are tight, you could buy a pair of IB HCAs and connect them back-to-back without a switch for pretty cheap. This is nowhere near real testing, but at least it would give you some indication of whether your app works over an IB-enabled MPI or not.

On Jan 3, 2008, at 12:56 PM, Ben Held wrote:

> Our company offers a commercial product that we currently build for
> standard MPICH-1 and LAM.
We have a client that has a new
> Infiniband Linux cluster that has MVAPICH installed on it. Our
> company does not own any infiniband hardware, but we are faced with
> providing an application for this customer's cluster. Is this
> possible and how do we proceed. It appears that the build process
> for mvapich automatically detects the hardware (that we don't have),
> so I have concerns that building mvapich here and the linking it
> into our app will result in a binary that will not run on their
> cluster.
>
> Thanks,
> Ben
>
> Ben Held
> Simulation Technology & Applied Research, Inc.
> 11520 N. Port Washington Rd., Suite 201
> Mequon, WI 53092
> P: 1.262.240.0291 x101
> F: 1.262.240.0294
> E: ben.held@staarinc.com
> http://www.staarinc.com
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

--
Jeff Squyres
Cisco Systems

From tom.mitchell at qlogic.com Thu Jan 3 17:47:46 2008
From: tom.mitchell at qlogic.com (Tom Mitchell)
Date: Thu Jan 3 17:48:09 2008
Subject: [mvapich-discuss] Building mvapich-based applications without access to infiniband system
Message-ID: <20080103224746.GC29464@qlogic.com>

On Jan 03 11:56, Ben Held wrote:
>
> Our company offers a commercial product that we currently build for standard
> MPICH-1 and LAM. We have a client that has a new Infiniband Linux cluster
> that has MVAPICH installed on it. Our company does not own any infiniband
> hardware, but we are faced with providing an application for this
> customer's cluster. Is this possible and how do we proceed. It appears
> that the build process for mvapich automatically detects the hardware (that
> we don't have), so I have concerns that building mvapich here and the
> linking it into our app will result in a binary that will not run on their
> cluster.

Ben,

Jeff Squyres had very good advice.

I would like to add that MPI is an API not ABI.
As you branch out you will have to pay attention to the binary stacks that you target at compile time. Examples might be HP-MPI, Cisco's MPI, QLogic's MPI, Open MPI, LAM, Intel MPI... and more, including a customer's hand-crafted MPI.

The MVAPICH that your client built for his InfiniBand Linux cluster will have been compiled with a specific set of options and a specific compiler. Having built MVAPICH, the client will have versions of the helper scripts mpicc, mpif77, mpif90... These scripts match the correct compiler to the correct library and almost all the other moving parts.

If you look at the ABI issue for compilers in isolation you can find subtle things like Fortran logical True and False having underlying differences in the digital representation. For some Fortran compilers the logical .TRUE. and .FALSE. use the int pair 1 and 0, while others use 0 and -1.... getargs and memcpy are other places I know of where ABI mismatches can happen.

The logical .TRUE. and .FALSE. case is interesting because correct Boolean logic transformations by the compiler can convert working code to code that fails in strange ways after ABI cross linking or a change in optimization.... This can be critical for Basic Linear Algebra packages.... where a researcher finds that compiler A gives +5% on library foo.so and compiler B gives +5% on library bar.so and then MPI was built with compiler C. Or worse, the ld search order finds unexpected and different packages out on nodes in a cluster.

To research this a bit, look at Open MPI. The Open MPI configure script and README have comments that discuss and address the logical .TRUE. and .FALSE. issue. For Open MPI users, the ompi_info command is valuable for rediscovering which Fortran compiler Open MPI was configured with, but it may not show all the compiler flags (like "....Portland Group compilers provide the "-Munixlogical" option, and Intel compilers (version >= 8.) provide the "-fpscomp logicals" option....").
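
The .TRUE./.FALSE. mismatch described above can be made concrete with a small model. The following Python sketch is a simplified illustration only, not the actual code generation of any particular compiler: it assumes one convention that stores .TRUE. as 1 and negates by flipping the low bit, and another that stores .TRUE. as -1 and negates by complementing all bits, with both testing truth as "non-zero". All names are hypothetical.

```python
# Convention A (assumed): .TRUE. = 1, .FALSE. = 0, .NOT. flips the low bit.
TRUE_A, FALSE_A = 1, 0


def not_a(x):
    return x ^ 1


# Convention B (assumed): .TRUE. = -1, .FALSE. = 0, .NOT. complements all bits.
TRUE_B, FALSE_B = -1, 0


def not_b(x):
    return ~x


def is_true(x):
    """Both conventions in this model treat any non-zero value as true."""
    return x != 0


if __name__ == "__main__":
    # Within one convention, negation behaves as expected.
    print(is_true(not_a(TRUE_A)), is_true(not_b(TRUE_B)))
    # Cross-linked: a value made under one convention, negated under the
    # other, stays non-zero, so ".NOT. .TRUE." still tests as true.
    print(is_true(not_b(TRUE_A)), is_true(not_a(TRUE_B)))
```

In this model each convention is self-consistent, and the failure only appears when a logical produced by code built one way is negated by code built the other way, which is the cross-linking situation the paragraph above warns about.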
Also the environment (see also alternatives) can get in the mix.... Now with gcc we also have gcc3 and gcc4 versions to watch...

As you branch out, your build environments need full and detailed records so you can reproduce/debug these issues.

Since MPI is an API, you would do well to collect as many MPIs and compilers as you can find, then build and test with each.

In this case you only have the one additional customer's MVAPICH and cluster to work with. That has the potential of making your life easy as long as the customer can give you access. It is the next handful of customers that makes things interesting.

If you build your package on the customer's cluster, do log all you can about the cluster and build environment. A security fix, apt-get, yum update, emerge world or up2date can change things that you do not expect ;-)

Have fun,
mitch

>
>
> Thanks,
>
> Ben
>
>
> Ben Held
> Simulation Technology & Applied Research, Inc.
> 11520 N. Port Washington Rd., Suite 201
> Mequon, WI 53092
> P: 1.262.240.0291 x101
> F: 1.262.240.0294
> E: [1]ben.held@staarinc.com
> [2]http://www.staarinc.com
>
> References
>
> 1. mailto:ben.held@staarinc.com
> 2. http://www.staarinc.com/
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

--
T o m M i t c h e l l
Host Solutions Group, QLogic Corp.
http://www.qlogic.com http://support.qlogic.com

From brian.budge at gmail.com Thu Jan 3 20:46:15 2008
From: brian.budge at gmail.com (Brian Budge)
Date: Thu Jan 3 20:46:49 2008
Subject: [mvapich-discuss] unrecognized protocol for send/recv over 8KB
Message-ID: <5b7094580801031746h57c0e7f0i8f80b45e6f6918e7@mail.gmail.com>

Hi all -

I'm new to the list here... hi! I have been using OpenMPI for a while, and LAM before that, but new requirements keep pushing me to new implementations.
In particular, I was interested in using InfiniBand (using OFED 1.2.5.1) in a multi-threaded environment. It seems that MVAPICH is the library for that particular combination :)

In any case, I installed MVAPICH, and I can boot the daemons and run the ring speed test with no problems. When I run any programs with mpirun, however, I get an error when sending or receiving more than 8192 bytes.

For example, if I run the bandwidth test from the benchmarks page (osu_bw.c), I get the following:

---------------------------------------------------------------
budge@burn:~/tests/testMvapich2> mpirun -np 2 ./a.out          Thursday 06:16:00
burn burn-3
# OSU MPI Bandwidth Test v3.0
# Size        Bandwidth (MB/s)
1             1.24
2             2.72
4             5.44
8             10.18
16            19.09
32            29.69
64            65.01
128           147.31
256           244.61
512           354.32
1024          367.91
2048          451.96
4096          550.66
8192          598.35
[1][ch3_rndvtransfer.c:112] Unknown protocol 0 type from rndv req to send
Internal Error: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3_RndvSend:263
Fatal error in MPI_Waitall: Other MPI error, error stack:
MPI_Waitall(242): MPI_Waitall(count=64, req_array=0xdb21a0, status_array=0xdb3140) failed
(unknown)(): Other MPI error
rank 1 in job 4 burn_37156 caused collective abort of all ranks
  exit status of rank 1: killed by signal 9
---------------------------------------------------------------

I get a similar problem with the latency test; however, the protocol that is complained about is different:

--------------------------------------------------------------------
budge@burn:~/tests/testMvapich2> mpirun -np 2 ./a.out          Thursday 09:21:20
# OSU MPI Latency Test v3.0
# Size        Latency (us)
0             3.93
1             4.07
2             4.06
4             3.82
8             3.98
16            4.03
32            4.00
64            4.28
128           5.22
256           5.88
512           8.65
1024          9.11
2048          11.53
4096          16.17
8192          25.67
[1][ch3_rndvtransfer.c:112] Unknown protocol 8126589 type from rndv req to send
Internal Error: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3_RndvSend:263
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186): MPI_Recv(buf=0xa8ff80, count=16384, MPI_CHAR, src=0, tag=1, MPI_COMM_WORLD, status=0x7fff14c7bde0) failed
(unknown)(): Other MPI error
rank 1 in job 5 burn_37156 caused collective abort of all ranks
--------------------------------------------------------------------

The protocols (0 and 8126589) are consistent if I run the program multiple times.

Anyone have any ideas? If you need more info, please let me know.

Thanks,
Brian
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080103/28e0f7e4/attachment.html

From lexa at adam.botik.ru Fri Jan 4 09:03:22 2008
From: lexa at adam.botik.ru (Alexei I. Adamovich)
Date: Fri Jan 4 09:03:48 2008
Subject: [mvapich-discuss] Odd behavior with memory registration / dreg / MVAPICH and MVAPICH2
In-Reply-To: <392f95800801030724l209990f1g8609288378f70dac@mail.gmail.com>
References: <1a7dd1d151.1d1511a7dd@osu.edu> <392f95800801030724l209990f1g8609288378f70dac@mail.gmail.com>
Message-ID: <20080104140322.GA27951@adam.botik.ru>

Eric,

what version of glibc are you using?

I've found the following message on Wolfram Gloger's malloc homepage (http://www.malloc.de/en/index.html):

WG> ...
WG> New ptmalloc2 release Jun 5th, 2006!
WG>
WG> Here you can download the current snapshot of ptmalloc2 (C source
WG> code), the second version of ptmalloc based on Doug Lea's
WG> malloc-2.7.x. This code has already been included in
WG> glibc-2.3.x. In multi-thread Applications, ptmalloc2 is currently
WG> slightly more memory-efficient than ptmalloc3.
WG>
WG> ..

So I guess that using a newer glibc could be a solution. Please let me know if you have already evaluated this possibility.

If you have an RPM-based Linux distribution, you can find your current glibc version using the 'rpm -qa | grep -i libc' command.

Lei, am I wrong?
Is ptmalloc2 being used only as a thread-safe version of malloc, or is there a more substantial reason for using specifically the ptmalloc2 source code supplied?

Sincerely,
Alexei I. Adamovich

On Thu, Jan 03, 2008 at 09:24:08AM -0600, Eric A. Borisch wrote:
> Lei,
>
> Thanks for the information. I would suggest that, if this can't be
> fixed in the vapi version, then the LAZY_MEM_UNREGISTER define should
> be removed from the default compile options for the versions where it
> is (apparently) not fully supported.
>
> This is a very nasty bug. The MPI layer reports back no errors, but
> the data isn't actually transferred successfully. In addition, it
> presents as a timing / waiting error to the user, as all of the local
> (shared mem) peers transfer data successfully, so significant time can
> be spent chasing down a suspected user oversight for what is actually
> an error within the MPI layer.
>
> This would apply to the MVAPICH and MVAPICH2, in both the vapi and
> vapi_multirail makefiles.
>
> In addition, it should be documented that the LAZY_MEM_UNREGISTER
> switch is NOT compatible with vapi-based channels.
>
> Thanks,
> Eric
>
> On Dec 21, 2007 5:29 PM, LEI CHAI wrote:
> > Hi Eric,
> >
> > Thanks for using mvapich/mvapich2. The problem you reported can be solved by using the PTMALLOC feature which is supported by the gen2 device but not vapi/vapi_multirail. Not much features have been added to vapi/vapi_multirail devices for the last few releases because not many people use them. Since you cannot move to gen2, we would suggest you disable LAZY_MEM_UNREGISTER for your tests.
> >
> > Thanks,
> > Lei
> >
> >
> >
> > ----- Original Message -----
> > From: "Eric A. Borisch"
> > Date: Friday, December 21, 2007 10:23 am
> > Subject: [mvapich-discuss] Odd behavior with memory registration / dreg / MVAPICH and MVAPICH2
> >
> > > I seem to be running into a memory registration issue.
> > >
> > > Observations:
> > >
> > > 1) During some transfers (MPI_Isend / MPI_Irecv / MPI_Waitall) into a
> > > local buffer on the root rank, I receive all of the data from any
> > > ranks that are running on the same machine, but only part (or none at
> > > all) of the data from ranks running on external machines. The transfer
> > > length is above the eager/rendezvous threshold.
> > > 2) Once the problem occurs, it is persistent. However, if I force
> > > MVAPICH to re-register by calling "while(dreg_evict())" at this point
> > > and then re-transfer, the correct data is received. (Same memory being
> > > transferred from / to.)
> > > 3) I've only witnessed problems occurring above the 4G (as returned by
> > > malloc()) memory range.
> > > 4) When I receive partial data from ranks, the data ends on a (4k)
> > > page bound. Data past this bound (which should have been updated) is
> > > unchanged during the transfer, yet both the sender and receiver report
> > > no errors. (This is very bad!)
> > > 5) Stepping through the code on both ends of the transfer shows the
> > > software agreeing on the (correct) length and location as far down as
> > > I can follow it.
> > > 6) Running against a compilation with no -DLAZY_MEM_UNREGISTER shows
> > > no issues. (Other than the expected performance hit.)
> > > 7) Occurs on both MVAPICH-1.0-beta (vapi_multirail) and mvapich2-1.0 (vapi)
> > > 8) The user code is also sending data out (from a different buffer)
> > > over ethernet to a remote gui from the root node.
> > >
> > > I can't move to gen2 at this point -- we are using a vendor library
> > > for interfacing to another system, and this library uses VAPI.
> > >
> > > uname -a output:
> > > Linux rt2.mayo.edu 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST
> > > 2006 x86_64 x86_64 x86_64 GNU/Linux
> > >
> > > Intel SE7520JR2 motherboards. 4G physical ram on each node.
> > >
> > > It appears (perhaps this is obvious) that the assumption that memory
> > > registered (by the dreg.c code) remains registered until explicitly
> > > unregistered (again, by the dreg.c code) is being violated in some
> > > way. This, however, is wading in to uncharted (for me, at least) linux
> > > memory management waters. The user code is doing nothing to fiddle
> > > with registration in any explicit way. (With the exception of as
> > > mentioned in (2))
> > >
> > > Please let me know what other information I can provide to resolve
> > > this. I'm still trying to put together a small test program to cause
> > > the problem, but have been unsuccessful so far.
> > >
> > > Thanks,
> > > Eric
> > > --
> > > Eric A. Borisch
> > > eborisch@ieee.org
> > > _______________________________________________
> > > mvapich-discuss mailing list
> > > mvapich-discuss@cse.ohio-state.edu
> > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> > >
> >
> >
> >
>
>
> --
> Eric A. Borisch
> eborisch@ieee.org
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

From panda at cse.ohio-state.edu Fri Jan 4 13:23:20 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri Jan 4 13:23:26 2008
Subject: [mvapich-discuss] Odd behavior with memory registration / dreg / MVAPICH and MVAPICH2
In-Reply-To: <392f95800801030724l209990f1g8609288378f70dac@mail.gmail.com>
Message-ID:

Hi Eric,

Thanks for your suggestions. We will make these changes to vapi and vapi_multirail devices and add the information to the user guides too.

Thanks,

DK

> Lei,
>
> Thanks for the information. I would suggest that, if this can't be
> fixed in the vapi version, then the LAZY_MEM_UNREGISTER define should
> be removed from the default compile options for the versions where it
> is (apparently) not fully supported.
>
> This is a very nasty bug.
The MPI layer reports back no errors, but > the data isn't actually transferred successfully. In addition, it > presents as a timing / waiting error to the user, as all of the local > (shared mem) peers transfer data successfully, so significant time can > be spent chasing down a suspected user oversight for what is actually > an error within the MPI layer. > > This would apply to the MVAPICH and MVAPICH2, in both the vapi and > vapi_multirail makefiles. > > In addition, it should be documented that the LAZY_MEM_UNREGISTER > switch is NOT compatible with vapi-based channels. > > Thanks, > Eric > > On Dec 21, 2007 5:29 PM, LEI CHAI wrote: > > Hi Eric, > > > > Thanks for using mvapich/mvapich2. The problem you reported can be solved by using the PTMALLOC feature which is supported by the gen2 device but not vapi/vapi_multirail. Not much features have been added to vapi/vapi_multirail devices for the last few releases because not many people use them. Since you cannot move to gen2, we would suggest you disable LAZY_MEM_UNREGISTER for your tests. > > > > Thanks, > > Lei > > > > > > > > ----- Original Message ----- > > From: "Eric A. Borisch" > > Date: Friday, December 21, 2007 10:23 am > > Subject: [mvapich-discuss] Odd behavior with memory registration / dreg / MVAPICH and MVAPICH2 > > > > > I seem to be running into a memory registration issue. > > > > > > Observations: > > > > > > 1) During some transfers (MPI_Isend / MPI_Irecv / MPI_Waitall) > > > into a > > > local buffer on the root rank, I receive all of the data from any > > > ranks that are running on the same machine, but only part (or none at > > > all) of the data from ranks running on external machines. The transfer > > > length is above the eager/rendezvous threshold. > > > 2) Once the problem occurs, it is persistent. However, if I force > > > MVAPICH to re-register by calling "while(dreg_evict())" at this point > > > and then re-transfer, the correct data is received. 
(Same memory being > > > transferred from / to.) > > > 3) I've only witnessed problems occurring above the 4G (as > > > returned by > > > malloc()) memory range. > > > 4) When I receive partial data from ranks, the data ends on a (4k) > > > page bound. Data past this bound (which should have been updated) is > > > unchanged during the transfer, yet both the sender and receiver report > > > no errors. (This is very bad!) > > > 5) Stepping through the code on both ends of the transfer shows the > > > software agreeing on the (correct) length and location as far down as > > > I can follow it. > > > 6) Running against a compilation with no -DLAZY_MEM_UNREGISTER shows > > > no issues. (Other than the expected performance hit.) > > > 7) Occurs on both MVAPICH-1.0-beta (vapi_multirail) and mvapich2- > > > 1.0 (vapi) > > > 8) The user code is also sending data out (from a different buffer) > > > over ethernet to a remote gui from the root node. > > > > > > I can't move to gen2 at this point -- we are using a vendor library > > > for interfacing to another system, and this library uses VAPI. > > > > > > uname -a output: > > > Linux rt2.mayo.edu 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST > > > 2006 x86_64 x86_64 x86_64 GNU/Linux > > > > > > Intel SE7520JR2 motherboards. 4G physical ram on each node. > > > > > > It appears (perhaps this is obvious) that the assumption that memory > > > registered (by the dreg.c code) remains registered until explicitly > > > unregistered (again, by the dreg.c code) is being violated in some > > > way. This, however, is wading in to uncharted (for me, at least) linux > > > memory management waters. The user code is doing nothing to fiddle > > > with registration in any explicit way. (With the exception of as > > > mentioned in (2)) > > > > > > Please let me know what other information I can provide to resolve > > > this. I'm still trying to put together a small test program to cause > > > the problem, but have been unsuccessful so far. 
> > > > > > Thanks, > > > Eric > > > -- > > > Eric A. Borisch > > > eborisch@ieee.org > > > _______________________________________________ > > > mvapich-discuss mailing list > > > mvapich-discuss@cse.ohio-state.edu > > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > > > > > > > > > > -- > Eric A. Borisch > eborisch@ieee.org > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From koop at cse.ohio-state.edu Fri Jan 4 14:03:07 2008 From: koop at cse.ohio-state.edu (Matthew Koop) Date: Fri Jan 4 14:03:12 2008 Subject: [mvapich-discuss] Odd behavior with memory registration / dreg / MVAPICH and MVAPICH2 In-Reply-To: <20080104140322.GA27951@adam.botik.ru> Message-ID: Alexei, ptmalloc2 is being used in our case to provide enhanced performance and correctness. To speed up communications we cache registration of memory regions (a costly operation) that are being used for communication. To provide correct behavior we need to intercept malloc/free and friends so old registrations can be flushed (otherwise the virtual->physical mapping can change, leading to incorrect results). Matt On Fri, 4 Jan 2008, Alexei I. Adamovich wrote: > Eric, > > what is the version of glibc you are using? > > I've found the following message on Wolfram Gloger's malloc homepage > (http://www.malloc.de/en/index.html): > > WG> ... > WG> New ptmalloc2 release Jun 5th, 2006! > WG> > WG> Here you can download the current snapshot of ptmalloc2 (C source > WG> code), the second version of ptmalloc based on Doug Lea's > WG> malloc-2.7.x. This code has already been included in > WG> glibc-2.3.x. In multi-thread Applications, ptmalloc2 is currently > WG> slightly more memory-efficient than ptmalloc3. > WG> > WG> .. > > So, I guess, the usage of more fresh glibc could be a solution. > > Please, inform me if you have evaluated this possibility already. 
>
> In case you have an RPM-based Linux distribution, you can find
> your current glibc version using
>
> 'rpm -qa | grep -i libc'
>
> command.
>
>
> Lei,
>
> am I wrong? Is the ptmalloc2 being used only as a thread-safe version of malloc,
> or possibly there is a more sufficient reason for using just the ptmalloc2
> source code supplied?
>
> Sincerely,
>
> Alexei I. Adamovich

From eborisch at ieee.org Fri Jan 4 14:30:13 2008
From: eborisch at ieee.org (Eric A.
Borisch)
Date: Fri Jan 4 14:30:20 2008
Subject: [mvapich-discuss] Odd behavior with memory registration / dreg / MVAPICH and MVAPICH2
In-Reply-To:
References: <20080104140322.GA27951@adam.botik.ru>
Message-ID: <392f95800801041130v593fb6bax7ea02d0796bf5818@mail.gmail.com>

Matt,

I'm curious if this is something that could be correctly handled in
the vapi variants or if this is something that is fundamentally
possible with gen2 but not vapi.

In response to Alexei's question, I'm running glibc-2.3.4-2.25

Thanks,
Eric

On Jan 4, 2008 1:03 PM, Matthew Koop wrote:
> Alexei,
>
> ptmalloc2 is being used in our case to provide enhanced performance and
> correctness. To speed up communications we cache registration of memory
> regions (a costly operation) that are being used for communication. To
> provide correct behavior we need to intercept malloc/free and friends so
> old registrations can be flushed (otherwise the virtual->physical mapping
> can change, leading to incorrect results).
>
> Matt

--
Eric A. Borisch
eborisch@ieee.org

Howard Roark laughed.

From ben.held at staarinc.com Fri Jan 4 15:59:24 2008
From: ben.held at staarinc.com (Ben Held)
Date: Fri Jan 4 15:59:36 2008
Subject: [mvapich-discuss] Troubles building/installing OFEM 1.2 on Fedora Core 4 64-bit
Message-ID: <009c01c84f14$b00a89c0$101f9d40$@held@staarinc.com>

We are seeing a failure during the install process (out of rpmbuild) on a
Fedora Core 4 64-bit system. The tail of the log is here:

Hunk #1 succeeded at 456 (offset 156 lines).
Hunk #2 succeeded at 569 (offset 75 lines).
Hunk #3 succeeded at 672 (offset 157 lines).
Hunk #4 succeeded at 1444 (offset 281 lines).
Hunk #5 succeeded at 1340 (offset 157 lines).
Hunk #6 succeeded at 1791 with fuzz 1 (offset 498 lines).
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_patches/backport/2.6.11_FC4/user_mad_3935_to_2_6_11_FC4.patch
patching file drivers/infiniband/core/user_mad.c
patch: **** malformed patch at line 12: @@ -827,13 +952,13 @@ static int ib_umad_init_port(struct ib_d

Failed to apply patch: /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_patches/backport/2.6.11_FC4/user_mad_3935_to_2_6_11_FC4.patch
error: Bad exit status from /var/tmp/rpm-tmp.88475 (%install)

RPM build errors:
user vlad does not exist - using root
group vlad does not exist - using root
user vlad does not exist - using root
group vlad does not exist - using root
Bad exit status from /var/tmp/rpm-tmp.88475 (%install)
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-cxgb3-mod --with-ipoib-mod --with-mthca-mod --with-sdp-mod --with-srp-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-rds-mod ' --define 'KVERSION 2.6.11-1.1369_FC4smp' --define 'KSRC /lib/modules/2.6.11-1.1369_FC4smp/build' --define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 1' --define 'NETWORK_CONF_DIR /etc/sysconfig/network-scripts'
--define 'modprobe_update 1' --define 'include_ipoib_conf 1' /usr/etc/OFED-1.2-rc5/SRPMS/ofa_kernel-1.2-rc5.src.rpm"

Any ideas?

Regards,

Ben Held
Simulation Technology & Applied Research, Inc.
11520 N. Port Washington Rd., Suite 201
Mequon, WI 53092
P: 1.262.240.0291 x101
F: 1.262.240.0294
E: ben.held@staarinc.com
http://www.staarinc.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080104/28c0a6d1/attachment-0001.html

From koop at cse.ohio-state.edu Fri Jan 4 17:27:16 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Fri Jan 4 17:27:26 2008
Subject: [mvapich-discuss] Odd behavior with memory registration / dreg / MVAPICH and MVAPICH2
In-Reply-To: <392f95800801041130v593fb6bax7ea02d0796bf5818@mail.gmail.com>
Message-ID:

Eric,

The problem is not an inherent issue with VAPI. Similar support could be
ported to the VAPI device as well. Thus far, we have been including new
features in the OpenFabrics/Gen2 device as vendors have mostly moved
support to Gen2.

Matt

On Fri, 4 Jan 2008, Eric A. Borisch wrote:

> Matt,
>
> I'm curious if this is something that could be correctly handled in
> the vapi variants or if this is something that is fundamentally
> possible with gen2 but not vapi.
>
> In response to Alexei's question, I'm running glibc-2.3.4-2.25
>
> Thanks,
> Eric

From panda at cse.ohio-state.edu Fri Jan 4 17:31:53 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri Jan 4 17:31:58 2008
Subject: [mvapich-discuss] Troubles building/installing OFEM 1.2 on Fedora Core 4 64-bit
In-Reply-To: <009c01c84f14$b00a89c0$101f9d40$@held@staarinc.com>
Message-ID:

Ben -

Sorry to hear that you are experiencing problems in building/installing
OFED 1.2 on a Fedora Core 4 64-bit system. FYI, the latest released
version of OFED 1.2 is OFED 1.2.5.4.
Regarding your rpm build errors, I am forwarding your note to `ewg' and `general' lists of Open Fabrics. More experienced users on these two lists can give you prompt feedbacks and guidance on the basic OFED installation issues. Thanks, DK On Fri, 4 Jan 2008, Ben Held wrote: > We are seeing a failure during the install process (out of rpmbuild) on a > Fedora Core 4 64-bit system. The tail of the log is here: > > > > Hunk #1 succeeded at 456 (offset 156 lines). > > Hunk #2 succeeded at 569 (offset 75 lines). > > Hunk #3 succeeded at 672 (offset 157 lines). > > Hunk #4 succeeded at 1444 (offset 281 lines). > > Hunk #5 succeeded at 1340 (offset 157 lines). > > Hunk #6 succeeded at 1791 with fuzz 1 (offset 498 lines). > > > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_patches/backport/2.6.11_FC4/use > r_mad_3935_to_2_6_11_FC4.patch > > patching file drivers/infiniband/core/user_mad.c > > patch: **** malformed patch at line 12: @@ -827,13 +952,13 @@ static int > ib_umad_init_port(struct ib_d > > > > Failed to apply patch: > /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_patches/backport/2.6.11_FC4/use > r_mad_3935_to_2_6_11_FC4.patch > > error: Bad exit status from /var/tmp/rpm-tmp.88475 (%install) > > > > > > RPM build errors: > > user vlad does not exist - using root > > group vlad does not exist - using root > > user vlad does not exist - using root > > group vlad does not exist - using root > > Bad exit status from /var/tmp/rpm-tmp.88475 (%install) > > ERROR: Failed executing "rpmbuild --rebuild --define '_topdir > /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root > /var/tmp/OFED' --defi > > ne 'configure_options --with-cxgb3-mod --with-ipoib-mod --with-mthca-mod > --with-sdp-mod --with-srp-mod --with-core-mod --with-user_mad-mod --with- > > user_access-mod --with-addr_trans-mod --with-rds-mod ' --define 'KVERSION > 2.6.11-1.1369_FC4smp' --define 'KSRC /lib/modules/2.6.11-1.1369_FC4smp/b > > uild' --define 'build_kernel_ib 1' --define 
'build_kernel_ib_devel 1'
> --define 'NETWORK_CONF_DIR /etc/sysconfig/network-scripts'
> --define 'modprobe_update 1' --define 'include_ipoib_conf 1'
> /usr/etc/OFED-1.2-rc5/SRPMS/ofa_kernel-1.2-rc5.src.rpm"
>
> Any ideas?
>
> Regards,
>
> Ben Held
> Simulation Technology & Applied Research, Inc.
> 11520 N. Port Washington Rd., Suite 201
> Mequon, WI 53092
> P: 1.262.240.0291 x101
> F: 1.262.240.0294
> E: ben.held@staarinc.com
> http://www.staarinc.com
>

From jsquyres at cisco.com Fri Jan 4 17:48:41 2008
From: jsquyres at cisco.com (Jeff Squyres)
Date: Fri Jan 4 17:49:00 2008
Subject: [mvapich-discuss] Odd behavior with memory registration / dreg / MVAPICH and MVAPICH2
In-Reply-To:
References:
Message-ID: <247F99BD-D5E2-415E-9078-55FF4071756A@cisco.com>

On Jan 4, 2008, at 5:27 PM, Matthew Koop wrote:

> The problem is not an inherent issue with VAPI. Similar support could be
> ported to the VAPI device as well. Thus far, we have been including new
> features in the OpenFabrics/Gen2 device as vendors have mostly moved
> support to Gen2.

I'll second this: Cisco is doing all of its new HPC IB development with the OpenFabrics stack (and has been for over a year). Open MPI has dropped VAPI support in its upcoming v1.3 release.

We encourage all of our HPC customers to upgrade from VAPI-based stacks to OFED if possible.

--
Jeff Squyres
Cisco Systems

From eborisch at ieee.org Fri Jan 4 17:53:13 2008
From: eborisch at ieee.org (Eric A. Borisch)
Date: Fri Jan 4 17:53:21 2008
Subject: [mvapich-discuss] Odd behavior with memory registration / dreg / MVAPICH and MVAPICH2
In-Reply-To: <247F99BD-D5E2-415E-9078-55FF4071756A@cisco.com>
References: <247F99BD-D5E2-415E-9078-55FF4071756A@cisco.com>
Message-ID: <392f95800801041453n4d73c5bbxa6e21d02d0a3829b@mail.gmail.com>

I plan to move over and will as soon as the vendor we share an interface with (also via Infiniband, but not MPI) moves over. (Out of the scope of this discussion.
:) For now, it sounds like I'll be turning off this (LAZY_MEM_UNREGISTER) option in the vapi code and running in that fashion.

Thanks for the help,
Eric

On Jan 4, 2008 4:48 PM, Jeff Squyres wrote:

> On Jan 4, 2008, at 5:27 PM, Matthew Koop wrote:
>
> > The problem is not an inherent issue with VAPI. Similar support could be
> > ported to the VAPI device as well. Thus far, we have been including new
> > features in the OpenFabrics/Gen2 device as vendors have mostly moved
> > support to Gen2.
>
> I'll second this: Cisco is doing all of its new HPC IB development
> with the OpenFabrics stack (and has been for over a year). Open MPI
> has dropped VAPI support in its upcoming v1.3 release.
>
> We encourage all of our HPC customers to upgrade from VAPI-based
> stacks to OFED if possible.
>
> --
> Jeff Squyres
> Cisco Systems

--
Eric A. Borisch
eborisch@ieee.org

From brian.budge at gmail.com Fri Jan 4 18:04:33 2008
From: brian.budge at gmail.com (Brian Budge)
Date: Fri Jan 4 18:04:42 2008
Subject: [mvapich-discuss] Re: unrecognized protocol for send/recv over 8KB
In-Reply-To: <5b7094580801031746h57c0e7f0i8f80b45e6f6918e7@mail.gmail.com>
References: <5b7094580801031746h57c0e7f0i8f80b45e6f6918e7@mail.gmail.com>
Message-ID: <5b7094580801041504h392f7889vbe4712bfa8a71d46@mail.gmail.com>

Hi again -

I noticed this in the benchmark code:

int large_message_size = 8192;

Does MVAPICH internally treat messages over 8192 bytes differently than those under 8 KB? Could this be something wrong with how I've configured infiniband? I had a program running OpenMPI already over IB on the system, but maybe I need to configure something special for MVAPICH?

Sorry if I appear to be grasping at straws... but I am ;)

Thanks,
Brian

On Jan 3, 2008 5:46 PM, Brian Budge wrote:

> Hi all -
>
> I'm new to the list here... hi! I have been using OpenMPI for a while,
> and LAM before that, but new requirements keep pushing me to new
> implementations.
In particular, I was interested in using infiniband (using
> OFED 1.2.5.1) in a multi-threaded environment. It seems that MVAPICH is
> the library for that particular combination :)
>
> In any case, I installed MVAPICH, and I can boot the daemons, and run the
> ring speed test with no problems. When I run any programs with mpirun,
> however, I get an error when sending or receiving more than 8192 bytes.
>
> For example, if I run the bandwidth test from the benchmarks page
> (osu_bw.c), I get the following:
> ---------------------------------------------------------------
> budge@burn:~/tests/testMvapich2> mpirun -np 2 ./a.out
> Thursday 06:16:00
> burn
> burn-3
> # OSU MPI Bandwidth Test v3.0
> # Size        Bandwidth (MB/s)
> 1             1.24
> 2             2.72
> 4             5.44
> 8             10.18
> 16            19.09
> 32            29.69
> 64            65.01
> 128           147.31
> 256           244.61
> 512           354.32
> 1024          367.91
> 2048          451.96
> 4096          550.66
> 8192          598.35
> [1][ch3_rndvtransfer.c:112] Unknown protocol 0 type from rndv req to send
> Internal Error: invalid error code ffffffff (Ring Index out of range) in
> MPIDI_CH3_RndvSend:263
> Fatal error in MPI_Waitall:
> Other MPI error, error stack:
> MPI_Waitall(242): MPI_Waitall(count=64, req_array=0xdb21a0,
> status_array=0xdb3140) failed
> (unknown)(): Other MPI error
> rank 1 in job 4 burn_37156 caused collective abort of all ranks
> exit status of rank 1: killed by signal 9
> ---------------------------------------------------------------
>
> I get a similar problem with the latency test, however, the protocol that
> is complained about is different:
> --------------------------------------------------------------------
> budge@burn:~/tests/testMvapich2> mpirun -np 2 ./a.out
> Thursday 09:21:20
> # OSU MPI Latency Test v3.0
> # Size        Latency (us)
> 0             3.93
> 1             4.07
> 2             4.06
> 4             3.82
> 8             3.98
> 16            4.03
> 32            4.00
> 64            4.28
> 128           5.22
> 256           5.88
> 512           8.65
> 1024          9.11
> 2048          11.53
> 4096          16.17
> 8192          25.67
> [1][ch3_rndvtransfer.c:112] Unknown protocol 8126589 type from rndv req to send
> Internal
Error: invalid error code ffffffff (Ring Index out of range) in
> MPIDI_CH3_RndvSend:263
> Fatal error in MPI_Recv:
> Other MPI error, error stack:
> MPI_Recv(186): MPI_Recv(buf=0xa8ff80, count=16384, MPI_CHAR, src=0, tag=1,
> MPI_COMM_WORLD, status=0x7fff14c7bde0) failed
> (unknown)(): Other MPI error
> rank 1 in job 5 burn_37156 caused collective abort of all ranks
> --------------------------------------------------------------------
>
> The protocols (0 and 8126589) are consistent if I run the program multiple
> times.
>
> Anyone have any ideas? If you need more info, please let me know.
>
> Thanks,
> Brian

From huanwei at cse.ohio-state.edu Fri Jan 4 21:12:46 2008
From: huanwei at cse.ohio-state.edu (wei huang)
Date: Fri Jan 4 21:12:52 2008
Subject: [mvapich-discuss] unrecognized protocol for send/recv over 8KB (fwd)
In-Reply-To:
Message-ID:

Hi Brian,

Thanks for letting us know about this problem. Would you please let us know some more details to help us locate the issue?

1) More details on your platform.

2) Exact version of mvapich2 you are using. Is it from the OFED package, or some version from our website?

3) If it is from our website, did you change anything from the default compiling scripts?

Thanks.

-- Wei

> I'm new to the list here... hi! I have been using OpenMPI for a while, and
> LAM before that, but new requirements keep pushing me to new
> implementations. In particular, I was interested in using infiniband (using
> OFED 1.2.5.1) in a multi-threaded environment. It seems that MVAPICH is the
> library for that particular combination :)
>
> In any case, I installed MVAPICH, and I can boot the daemons, and run the
> ring speed test with no problems. When I run any programs with mpirun,
> however, I get an error when sending or receiving more than 8192 bytes.
> > For example, if I run the bandwidth test from the benchmarks page > (osu_bw.c), I get the following: > --------------------------------------------------------------- > budge@burn:~/tests/testMvapich2> mpirun -np 2 ./a.out > Thursday 06:16:00 > burn > burn-3 > # OSU MPI Bandwidth Test v3.0 > # Size Bandwidth (MB/s) > 1 1.24 > 2 2.72 > 4 5.44 > 8 10.18 > 16 19.09 > 32 29.69 > 64 65.01 > 128 147.31 > 256 244.61 > 512 354.32 > 1024 367.91 > 2048 451.96 > 4096 550.66 > 8192 598.35 > [1][ch3_rndvtransfer.c:112] Unknown protocol 0 type from rndv req to send > Internal Error: invalid error code ffffffff (Ring Index out of range) in > MPIDI_CH3_RndvSend:263 > Fatal error in MPI_Waitall: > Other MPI error, error stack: > MPI_Waitall(242): MPI_Waitall(count=64, req_array=0xdb21a0, > status_array=0xdb3140) failed > (unknown)(): Other MPI error > rank 1 in job 4 burn_37156 caused collective abort of all ranks > exit status of rank 1: killed by signal 9 > --------------------------------------------------------------- > > I get a similar problem with the latency test, however, the protocol that is > complained about is different: > -------------------------------------------------------------------- > budge@burn:~/tests/testMvapich2> mpirun -np 2 ./a.out > Thursday 09:21:20 > # OSU MPI Latency Test v3.0 > # Size Latency (us) > 0 3.93 > 1 4.07 > 2 4.06 > 4 3.82 > 8 3.98 > 16 4.03 > 32 4.00 > 64 4.28 > 128 5.22 > 256 5.88 > 512 8.65 > 1024 9.11 > 2048 11.53 > 4096 16.17 > 8192 25.67 > [1][ch3_rndvtransfer.c:112] Unknown protocol 8126589 type from rndv req to > send > Internal Error: invalid error code ffffffff (Ring Index out of range) in > MPIDI_CH3_RndvSend:263 > Fatal error in MPI_Recv: > Other MPI error, error stack: > MPI_Recv(186): MPI_Recv(buf=0xa8ff80, count=16384, MPI_CHAR, src=0, tag=1, > MPI_COMM_WORLD, status=0x7fff14c7bde0) failed > (unknown)(): Other MPI error > rank 1 in job 5 burn_37156 caused collective abort of all ranks > 
--------------------------------------------------------------------
>
> The protocols (0 and 8126589) are consistent if I run the program multiple
> times.
>
> Anyone have any ideas? If you need more info, please let me know.
>
> Thanks,
> Brian
>

From brian.budge at gmail.com Fri Jan 4 21:23:58 2008
From: brian.budge at gmail.com (Brian Budge)
Date: Fri Jan 4 21:24:07 2008
Subject: [mvapich-discuss] unrecognized protocol for send/recv over 8KB (fwd)
In-Reply-To:
References:
Message-ID: <5b7094580801041823y10e4a565x36833e66431b94e1@mail.gmail.com>

Hi Wei -

I am running gentoo linux on amd64, 2 or 4 opteron 8216 per node. Kernel is 2.6.23-gentoo-r4 SMP. I have infiniband built into the kernel:

CONFIG_INFINIBAND=y
CONFIG_INFINIBAND_USER_MAD=y
CONFIG_INFINIBAND_USER_ACCESS=y
CONFIG_INFINIBAND_USER_MEM=y
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_MTHCA=y
CONFIG_INFINIBAND_MTHCA_DEBUG=y
CONFIG_INFINIBAND_AMSO1100=y
CONFIG_MLX4_INFINIBAND=y
CONFIG_INFINIBAND_IPOIB=y
CONFIG_INFINIBAND_IPOIB_DEBUG=y

I am using the openib-mvapich2-1.0.1 package in the gentoo-science overlay addition to the standard gentoo packages. I have also tried 1.0 with the same results.

I compiled with multithreading turned on (haven't tried without this, but the sample codes I am initially testing are not multithreaded, although my application is). I also tried with or without rdma with no change. The script seems to be setting the build for SMALL_CLUSTER.

Let me know what other information would be useful.

Thanks,
Brian

On Jan 4, 2008 6:12 PM, wei huang wrote:

> Hi Brian,
>
> Thanks for letting us know this problem. Would you please let us know some
> more details to help us locate the issue.
>
> 1) More details on your platform.
>
> 2) Exact version of mvapich2 you are using. Is it from OFED package? or
> some version from our website.
>
> 3) If it is from our website, did you change anything from the default
> compiling scripts?
>
> Thanks.
>
> -- Wei
> > I'm new to the list here... hi!
I have been using OpenMPI for a while, > and > > LAM before that, but new requirements keep pushing me to new > > implementations. In particular, I was interested in using infiniband > (using > > OFED 1.2.5.1) in a multi-threaded environment. It seems that MVAPICH is > the > > library for that particular combination :) > > > > In any case, I installed MVAPICH, and I can boot the daemons, and run > the > > ring speed test with no problems. When I run any programs with mpirun, > > however, I get an error when sending or receiving more than 8192 bytes. > > > > For example, if I run the bandwidth test from the benchmarks page > > (osu_bw.c), I get the following: > > --------------------------------------------------------------- > > budge@burn:~/tests/testMvapich2> mpirun -np 2 ./a.out > > Thursday 06:16:00 > > burn > > burn-3 > > # OSU MPI Bandwidth Test v3.0 > > # Size Bandwidth (MB/s) > > 1 1.24 > > 2 2.72 > > 4 5.44 > > 8 10.18 > > 16 19.09 > > 32 29.69 > > 64 65.01 > > 128 147.31 > > 256 244.61 > > 512 354.32 > > 1024 367.91 > > 2048 451.96 > > 4096 550.66 > > 8192 598.35 > > [1][ch3_rndvtransfer.c:112] Unknown protocol 0 type from rndv req to > send > > Internal Error: invalid error code ffffffff (Ring Index out of range) in > > MPIDI_CH3_RndvSend:263 > > Fatal error in MPI_Waitall: > > Other MPI error, error stack: > > MPI_Waitall(242): MPI_Waitall(count=64, req_array=0xdb21a0, > > status_array=0xdb3140) failed > > (unknown)(): Other MPI error > > rank 1 in job 4 burn_37156 caused collective abort of all ranks > > exit status of rank 1: killed by signal 9 > > --------------------------------------------------------------- > > > > I get a similar problem with the latency test, however, the protocol > that is > > complained about is different: > > -------------------------------------------------------------------- > > budge@burn:~/tests/testMvapich2> mpirun -np 2 ./a.out > > Thursday 09:21:20 > > # OSU MPI Latency Test v3.0 > > # Size Latency (us) > > 0 3.93 > > 
1 4.07 > > 2 4.06 > > 4 3.82 > > 8 3.98 > > 16 4.03 > > 32 4.00 > > 64 4.28 > > 128 5.22 > > 256 5.88 > > 512 8.65 > > 1024 9.11 > > 2048 11.53 > > 4096 16.17 > > 8192 25.67 > > [1][ch3_rndvtransfer.c:112] Unknown protocol 8126589 type from rndv req > to > > send > > Internal Error: invalid error code ffffffff (Ring Index out of range) in > > MPIDI_CH3_RndvSend:263 > > Fatal error in MPI_Recv: > > Other MPI error, error stack: > > MPI_Recv(186): MPI_Recv(buf=0xa8ff80, count=16384, MPI_CHAR, src=0, > tag=1, > > MPI_COMM_WORLD, status=0x7fff14c7bde0) failed > > (unknown)(): Other MPI error > > rank 1 in job 5 burn_37156 caused collective abort of all ranks > > -------------------------------------------------------------------- > > > > The protocols (0 and 8126589) are consistent if I run the program > multiple > > times. > > > > Anyone have any ideas? If you need more info, please let me know. > > > > Thanks, > > Brian > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080104/86c75792/attachment.html From tziporet at dev.mellanox.co.il Sun Jan 6 06:34:07 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Sun Jan 6 07:11:00 2008 Subject: [ewg] Re: [mvapich-discuss] Troubles building/installing OFEM 1.2 on Fedora Core 4 64-bit In-Reply-To: References: Message-ID: <4780BCAF.2020806@mellanox.co.il> Dhabaleswar Panda wrote: > Ben - Sorry to know that you are experiencing problems in > building/installing OFED 1.2 on Fedora Core 4 64 bit system. > > FYI, the latest released version of OFED 1.2 is OFED 1.2.5.4. > > Regarding your rpm build errors, I am forwarding your note to `ewg' and > `general' lists of Open Fabrics. More experienced users on these two > lists can give you prompt feedbacks and guidance on the basic OFED > installation issues. 
>

We do not support Fedora Core 4 with OFED 1.2 and 1.2.5.
I suggest you move to Fedora Core 6.

Tziporet

From huanwei at cse.ohio-state.edu Sun Jan 6 09:38:20 2008
From: huanwei at cse.ohio-state.edu (wei huang)
Date: Sun Jan 6 09:38:25 2008
Subject: [mvapich-discuss] unrecognized protocol for send/recv over 8KB (fwd)
In-Reply-To: <5b7094580801041823y10e4a565x36833e66431b94e1@mail.gmail.com>
Message-ID:

Hi Brian,

> I am using the openib-mvapich2-1.0.1 package in the gentoo-science overlay
> addition to the standard gentoo packages. I have also tried 1.0 with the
> same results.
>
> I compiled with multithreading turned on (haven't tried without this, but
> the sample codes I am initially testing are not multithreaded, although my
> application is). I also tried with or without rdma with no change. The
> script seems to be setting the build for SMALL_CLUSTER.

So you are using make.mvapich2.ofa to compile the package? I am a bit confused about "I also tried with or without rdma with no change". What exact change did you make here? Also, SMALL_CLUSTER is obsolete for the ofa stack...

-- Wei

>
> Let me know what other information would be useful.
>
> Thanks,
> Brian
>
> On Jan 4, 2008 6:12 PM, wei huang wrote:
>
> > Hi Brian,
> >
> > Thanks for letting us know this problem. Would you please let us know some
> > more details to help us locate the issue.
> >
> > 1) More details on your platform.
> >
> > 2) Exact version of mvapich2 you are using. Is it from OFED package? or
> > some version from our website.
> >
> > 3) If it is from our website, did you change anything from the default
> > compiling scripts?
> >
> > Thanks.
> >
> > -- Wei
> > > I'm new to the list here... hi! I have been using OpenMPI for a while, and
> > > LAM before that, but new requirements keep pushing me to new
> > > implementations. In particular, I was interested in using infiniband (using
> > > OFED 1.2.5.1) in a multi-threaded environment.
It seems that MVAPICH is > > the > > > library for that particular combination :) > > > > > > In any case, I installed MVAPICH, and I can boot the daemons, and run > > the > > > ring speed test with no problems. When I run any programs with mpirun, > > > however, I get an error when sending or receiving more than 8192 bytes. > > > > > > For example, if I run the bandwidth test from the benchmarks page > > > (osu_bw.c), I get the following: > > > --------------------------------------------------------------- > > > budge@burn:~/tests/testMvapich2> mpirun -np 2 ./a.out > > > Thursday 06:16:00 > > > burn > > > burn-3 > > > # OSU MPI Bandwidth Test v3.0 > > > # Size Bandwidth (MB/s) > > > 1 1.24 > > > 2 2.72 > > > 4 5.44 > > > 8 10.18 > > > 16 19.09 > > > 32 29.69 > > > 64 65.01 > > > 128 147.31 > > > 256 244.61 > > > 512 354.32 > > > 1024 367.91 > > > 2048 451.96 > > > 4096 550.66 > > > 8192 598.35 > > > [1][ch3_rndvtransfer.c:112] Unknown protocol 0 type from rndv req to > > send > > > Internal Error: invalid error code ffffffff (Ring Index out of range) in > > > MPIDI_CH3_RndvSend:263 > > > Fatal error in MPI_Waitall: > > > Other MPI error, error stack: > > > MPI_Waitall(242): MPI_Waitall(count=64, req_array=0xdb21a0, > > > status_array=0xdb3140) failed > > > (unknown)(): Other MPI error > > > rank 1 in job 4 burn_37156 caused collective abort of all ranks > > > exit status of rank 1: killed by signal 9 > > > --------------------------------------------------------------- > > > > > > I get a similar problem with the latency test, however, the protocol > > that is > > > complained about is different: > > > -------------------------------------------------------------------- > > > budge@burn:~/tests/testMvapich2> mpirun -np 2 ./a.out > > > Thursday 09:21:20 > > > # OSU MPI Latency Test v3.0 > > > # Size Latency (us) > > > 0 3.93 > > > 1 4.07 > > > 2 4.06 > > > 4 3.82 > > > 8 3.98 > > > 16 4.03 > > > 32 4.00 > > > 64 4.28 > > > 128 5.22 > > > 256 5.88 > > > 512 8.65 > 
> > 1024          9.11
> > > 2048          11.53
> > > 4096          16.17
> > > 8192          25.67
> > > [1][ch3_rndvtransfer.c:112] Unknown protocol 8126589 type from rndv req to send
> > > Internal Error: invalid error code ffffffff (Ring Index out of range) in
> > > MPIDI_CH3_RndvSend:263
> > > Fatal error in MPI_Recv:
> > > Other MPI error, error stack:
> > > MPI_Recv(186): MPI_Recv(buf=0xa8ff80, count=16384, MPI_CHAR, src=0, tag=1,
> > > MPI_COMM_WORLD, status=0x7fff14c7bde0) failed
> > > (unknown)(): Other MPI error
> > > rank 1 in job 5 burn_37156 caused collective abort of all ranks
> > > --------------------------------------------------------------------
> > >
> > > The protocols (0 and 8126589) are consistent if I run the program multiple
> > > times.
> > >
> > > Anyone have any ideas? If you need more info, please let me know.
> > >
> > > Thanks,
> > > Brian
> > >
>

From nilesh_awate at yahoo.com Mon Jan 7 01:15:26 2008
From: nilesh_awate at yahoo.com (nilesh awate)
Date: Mon Jan 7 01:15:36 2008
Subject: [mvapich-discuss] protocol used for MPI_FInaize in mvapich2
Message-ID: <80583.69286.qm@web94115.mail.in2.yahoo.com>

Hi all,

I'm using mvapich2-1.0.1 with OFED 1.2 (udapl stack).

To understand the flow of MPI_Finalize, I put some debug statements in the source code and tried a simple MPI test program (only the init and finalize calls).

I observed that there is a shutdown/closing protocol in which every process does 2 DTOs. Could somebody please explain how these DTOs happen (a function trace of MPI_Finalize) and what exact protocol mvapich follows?

Thanks,
Nilesh

From methier at CGR.Harvard.edu Mon Jan 7 10:49:14 2008
From: methier at CGR.Harvard.edu (Michael Ethier)
Date: Mon Jan 7 10:49:22 2008
Subject: [mvapich-discuss] error IBV_WC_LOC_LEN_ERR and FATAL event IBV_EVENT_QP_LAST_WQE_REACHED
Message-ID:

Hello,

I am new to this forum and hoping someone can help solve the following problem for me.

We have a modeling application that initializes and runs fine using an ordinary Ethernet connection.

When we compile using the Infiniband software package (mvapich-0.9.9) and run, the application fails with the following at the end:

[0:moorcrofth] Abort: [moorcrofth:0] Got completion with error IBV_WC_LOC_LEN_ERR, code=1, dest rank=1
at line 388 in file viacheck.c
[0:moorcrofth] Abort: [0] Got FATAL event IBV_EVENT_QP_LAST_WQE_REACHED, code=16
at line 2552 in file viacheck.c
mpirun_rsh: Abort signaled from [0 : moorcrofth] remote host is [1 : moorcroft8 ]
forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)
done.

This seems to occur at the initialization phase, when communication starts between different nodes. If I set the hostfile to contain the same node, so that all the CPUs used are on 1 node, it initializes fine and runs.

We are using Redhat Enterprise 4 Update 5 on x86_64:

uname -a
Linux moorcrofth 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

In addition we are using mvapich-0.9.9 for our Infiniband software package, and Intel 9.1:

[gb16@moorcrofth 60]$ /usr/mpi/intel/mvapich-0.9.9/bin/mpicc --version
icc (ICC) 9.1 20070510
Copyright (C) 1985-2007 Intel Corporation. All rights reserved.

[gb16@moorcrofth 60]$ /usr/mpi/intel/mvapich-0.9.9/bin/mpif90 --version
ifort (IFORT) 9.1 20070510
Copyright (C) 1985-2007 Intel Corporation. All rights reserved.

We are using the rsh communication protocol for this:

/usr/mpi/intel/mvapich-0.9.9/bin/mpirun_rsh -rsh -np 3 ........

Can anyone suggest how this problem can be solved?

Thank you in advance,
Mike

From koop at cse.ohio-state.edu Mon Jan 7 12:26:01 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Mon Jan 7 12:26:08 2008
Subject: [mvapich-discuss] error IBV_WC_LOC_LEN_ERR and FATAL event IBV_EVENT_QP_LAST_WQE_REACHED
In-Reply-To:
Message-ID:

Michael,

Do other, simpler benchmarks work (e.g. osu_benchmarks/osu_bw)?

If they do, this is something we'd like to take a closer look at. I'd be interested to know if setting VIADEV_USE_COALESCE=0 resolves the issue:

e.g.
mpirun_rsh -np 2 h1 h2 VIADEV_USE_COALESCE=0 ./exec

Matt

On Mon, 7 Jan 2008, Michael Ethier wrote:

> Hello,
>
> I am new to this forum and hoping someone can help solve the following
> problem for me.
>
> We have a modeling application that initializes and runs fine using an
> ordinary Ethernet connection.
>
> When we compile using the Infiniband software package (mvapich-0.9.9)
> and run, the application fails with the following
> at the end:
>
> [0:moorcrofth] Abort: [moorcrofth:0] Got completion with error
> IBV_WC_LOC_LEN_ERR, code=1, dest rank=1
> at line 388 in file viacheck.c
> [0:moorcrofth] Abort: [0] Got FATAL event IBV_EVENT_QP_LAST_WQE_REACHED,
> code=16
> at line 2552 in file viacheck.c
> mpirun_rsh: Abort signaled from [0 : moorcrofth] remote host is [1 :
> moorcroft8 ]
> forrtl: error (78): process killed (SIGTERM)
> forrtl: error (78): process killed (SIGTERM)
> done.
>
> This occurs at the initialization phase it seems when communication
> starts between different nodes.
> > If I set the hostfile to contain the same node so that all the cpus used > are on 1 node, it initializes fine and runs. > > > > We are using Redhat Enterprise 4 Update 5 on x86_64 > > > > uname -a > > Linux moorcrofth 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 > x86_64 x86_64 x86_64 GNU/Linux > > > > In addition we are using mvapich-0.9.9 for our Infiniband software > package, and Intel 9.1: > > > > [gb16@moorcrofth 60]$ /usr/mpi/intel/mvapich-0.9.9/bin/mpicc --version > > icc (ICC) 9.1 20070510 > > Copyright (C) 1985-2007 Intel Corporation. All rights reserved. > > > > [gb16@moorcrofth 60]$ /usr/mpi/intel/mvapich-0.9.9/bin/mpif90 --version > > ifort (IFORT) 9.1 20070510 > > Copyright (C) 1985-2007 Intel Corporation. All rights reserved. > > > > We are using the rsh communication protocol for this: > > /usr/mpi/intel/mvapich-0.9.9/bin/mpirun_rsh -rsh -np 3 ........ > > > > Can anyone suggest how this problem can be solved ? > > > > Thank You in advance, > > Mike > > > > From koop at cse.ohio-state.edu Mon Jan 7 12:27:38 2008 From: koop at cse.ohio-state.edu (Matthew Koop) Date: Mon Jan 7 12:27:44 2008 Subject: [mvapich-discuss] error IBV_WC_LOC_LEN_ERR and FATAL event IBV_EVENT_QP_LAST_WQE_REACHED In-Reply-To: Message-ID: Michael, Also, is your code making any system calls or forking? Matt On Mon, 7 Jan 2008, Matthew Koop wrote: > Michael, > > Do other more simple benchmarks work (e.g. osu_benchmarks/osu_bw)? > > If they do, this is something we'd like to take a closer look at. I'd be > interested to know if setting VIADEV_USE_COALESCE=0 resolves the issue: > > e.g. > mpirun_rsh -np 2 h1 h2 VIADEV_USE_COALESCE=0 ./exec > > > Matt > > On Mon, 7 Jan 2008, Michael Ethier wrote: > > > Hello, > > > > > > > > I am new to this forum and hoping someone can help solve the following > > problem for me. > > > > > > > > We have a modeling application that initializes and runs fine using an > > ordinary Ethernet connection. 
> > > > > > > > When we compile using the Infiniband software package (mvapich-0.9.9) > > and run, the application fails with the following > > > > at then end: > > > > > > > > [0:moorcrofth] Abort: [moorcrofth:0] Got completion with error > > IBV_WC_LOC_LEN_ERR, code=1, dest rank=1 > > > > at line 388 in file viacheck.c > > > > [0:moorcrofth] Abort: [0] Got FATAL event IBV_EVENT_QP_LAST_WQE_REACHED, > > code=16 > > > > at line 2552 in file viacheck.c > > > > mpirun_rsh: Abort signaled from [0 : moorcrofth] remote host is [1 : > > moorcroft8 ] > > > > forrtl: error (78): process killed (SIGTERM) > > > > forrtl: error (78): process killed (SIGTERM) > > > > done. > > > > > > > > This occurs at the initialization phase it seems when communication > > starts between different nodes. > > > > If I set the hostfile to contain the same node so that all the cpus used > > are on 1 node, it initializes fine and runs. > > > > > > > > We are using Redhat Enterprise 4 Update 5 on x86_64 > > > > > > > > uname -a > > > > Linux moorcrofth 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 > > x86_64 x86_64 x86_64 GNU/Linux > > > > > > > > In addition we are using mvapich-0.9.9 for our Infiniband software > > package, and Intel 9.1: > > > > > > > > [gb16@moorcrofth 60]$ /usr/mpi/intel/mvapich-0.9.9/bin/mpicc --version > > > > icc (ICC) 9.1 20070510 > > > > Copyright (C) 1985-2007 Intel Corporation. All rights reserved. > > > > > > > > [gb16@moorcrofth 60]$ /usr/mpi/intel/mvapich-0.9.9/bin/mpif90 --version > > > > ifort (IFORT) 9.1 20070510 > > > > Copyright (C) 1985-2007 Intel Corporation. All rights reserved. > > > > > > > > We are using the rsh communication protocol for this: > > > > /usr/mpi/intel/mvapich-0.9.9/bin/mpirun_rsh -rsh -np 3 ........ > > > > > > > > Can anyone suggest how this problem can be solved ? 
> >
> > Thank You in advance,
> >
> > Mike
> >
>

From brian.budge at gmail.com Mon Jan 7 12:30:24 2008
From: brian.budge at gmail.com (Brian Budge)
Date: Mon Jan 7 12:32:49 2008
Subject: [mvapich-discuss] unrecognized protocol for send/recv over 8KB (fwd)
In-Reply-To:
References: <5b7094580801041823y10e4a565x36833e66431b94e1@mail.gmail.com>
Message-ID: <5b7094580801070930j26608c5qef31b73fa4d426e7@mail.gmail.com>

Hi Wei -

I changed from SMALL_CLUSTER to MEDIUM_CLUSTER, but it made no difference.

When I build with rdma, this adds the following:

export LIBS="${LIBS} -lrdmacm"
export CFLAGS="${CFLAGS} -DADAPTIVE_RDMA_FAST_PATH -DRDMA_CM"

It seems that I am using the make.mvapich2.detect script to build. It asks me for my interface, and gives me the option for the mellanox interface, which I choose.

I just tried a fresh install directly from the tarball instead of using the gentoo package. Now the program completes (goes beyond 8K message), but my bandwidth isn't very good. Running the osu_bw.c test, I get about 250 MB/s maximum. It seems like IB isn't being used.

I did the following:

./make.mvapich2.detect   # and chose the mellanox option
./configure --enable-threads=multiple
make
make install

So it seems that the package is doing something to enable infiniband that I am not doing with the tarball. Conversely, the tarball can run without crashing.

Advice?

Thanks,
Brian

On Jan 6, 2008 6:38 AM, wei huang <huanwei@cse.ohio-state.edu> wrote:

> Hi Brian,
>
> > I am using the openib-mvapich2-1.0.1 package in the gentoo-science overlay
> > addition to the standard gentoo packages. I have also tried 1.0 with the
> > same results.
> >
> > I compiled with multithreading turned on (haven't tried without this, but
> > the sample codes I am initially testing are not multithreaded, although my
> > application is). I also tried with or without rdma with no change. The
> > script seems to be setting the build for SMALL_CLUSTER.
> > So you are using make.mvapich2.ofa to compile the package? I am a bit > confused about ''I also tried with or without rdma with no change''. What > exact change you made here? Also, SMALL_CLUSTER is obsolete for ofa > stack... > > -- Wei > > > > > Let me know what other information would be useful. > > > > Thanks, > > Brian > > > > > > > > On Jan 4, 2008 6:12 PM, wei huang wrote: > > > > > Hi Brian, > > > > > > Thanks for letting us know this problem. Would you please let us know > some > > > more details to help us locate the issue. > > > > > > 1) More details on your platform. > > > > > > 2) Exact version of mvapich2 you are using. Is it from OFED package? > or > > > some version from our website. > > > > > > 3) If it is from our website, did you change anything from the default > > > > compiling scripts? > > > > > > Thanks. > > > > > > -- Wei > > > > I'm new to the list here... hi! I have been using OpenMPI for a > while, > > > and > > > > LAM before that, but new requirements keep pushing me to new > > > > implementations. In particular, I was interested in using > infiniband > > > (using > > > > OFED 1.2.5.1) in a multi-threaded environment. It seems that > MVAPICH is > > > the > > > > library for that particular combination :) > > > > > > > > In any case, I installed MVAPICH, and I can boot the daemons, and > run > > > the > > > > ring speed test with no problems. When I run any programs with > mpirun, > > > > however, I get an error when sending or receiving more than 8192 > bytes. 
> > > >
> > > > For example, if I run the bandwidth test from the benchmarks page
> > > > (osu_bw.c), I get the following:
> > > > ---------------------------------------------------------------
> > > > budge@burn:~/tests/testMvapich2> mpirun -np 2 ./a.out
> > > > Thursday 06:16:00
> > > > burn
> > > > burn-3
> > > > # OSU MPI Bandwidth Test v3.0
> > > > # Size      Bandwidth (MB/s)
> > > > 1           1.24
> > > > 2           2.72
> > > > 4           5.44
> > > > 8           10.18
> > > > 16          19.09
> > > > 32          29.69
> > > > 64          65.01
> > > > 128         147.31
> > > > 256         244.61
> > > > 512         354.32
> > > > 1024        367.91
> > > > 2048        451.96
> > > > 4096        550.66
> > > > 8192        598.35
> > > > [1][ch3_rndvtransfer.c:112] Unknown protocol 0 type from rndv req to send
> > > > Internal Error: invalid error code ffffffff (Ring Index out of range) in
> > > > MPIDI_CH3_RndvSend:263
> > > > Fatal error in MPI_Waitall:
> > > > Other MPI error, error stack:
> > > > MPI_Waitall(242): MPI_Waitall(count=64, req_array=0xdb21a0,
> > > > status_array=0xdb3140) failed
> > > > (unknown)(): Other MPI error
> > > > rank 1 in job 4 burn_37156 caused collective abort of all ranks
> > > > exit status of rank 1: killed by signal 9
> > > > ---------------------------------------------------------------
> > > >
> > > > I get a similar problem with the latency test, however, the protocol that is
> > > > complained about is different:
> > > > --------------------------------------------------------------------
> > > > budge@burn:~/tests/testMvapich2> mpirun -np 2 ./a.out
> > > > Thursday 09:21:20
> > > > # OSU MPI Latency Test v3.0
> > > > # Size      Latency (us)
> > > > 0           3.93
> > > > 1           4.07
> > > > 2           4.06
> > > > 4           3.82
> > > > 8           3.98
> > > > 16          4.03
> > > > 32          4.00
> > > > 64          4.28
> > > > 128         5.22
> > > > 256         5.88
> > > > 512         8.65
> > > > 1024        9.11
> > > > 2048        11.53
> > > > 4096        16.17
> > > > 8192        25.67
> > > > [1][ch3_rndvtransfer.c:112] Unknown protocol 8126589 type from rndv req to
> > > > send
> > > > Internal Error: invalid
error code ffffffff (Ring Index out of range) in
> > > > MPIDI_CH3_RndvSend:263
> > > > Fatal error in MPI_Recv:
> > > > Other MPI error, error stack:
> > > > MPI_Recv(186): MPI_Recv(buf=0xa8ff80, count=16384, MPI_CHAR, src=0, tag=1,
> > > > MPI_COMM_WORLD, status=0x7fff14c7bde0) failed
> > > > (unknown)(): Other MPI error
> > > > rank 1 in job 5 burn_37156 caused collective abort of all ranks
> > > > --------------------------------------------------------------------
> > > >
> > > > The protocols (0 and 8126589) are consistent if I run the program multiple
> > > > times.
> > > >
> > > > Anyone have any ideas? If you need more info, please let me know.
> > > >
> > > > Thanks,
> > > > Brian

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080107/4ae6be47/attachment-0001.html

From methier at CGR.Harvard.edu Mon Jan 7 13:10:03 2008
From: methier at CGR.Harvard.edu (Michael Ethier)
Date: Mon Jan 7 13:10:12 2008
Subject: [mvapich-discuss] error IBV_WC_LOC_LEN_ERR and FATAL event IBV_EVENT_QP_LAST_WQE_REACHED
In-Reply-To:
References:
Message-ID:

Hi Matthew,

The osu_bw test ran ok as seen below. I added the VIADEV_USE_COALESCE=0
variable to the command line and in the environment, and it made no
difference; I still get the same errors.

#!/bin/tcsh
setenv VIADEV_USE_COALESCE 0
/usr/mpi/intel/mvapich-0.9.9/bin/mpirun_rsh -rsh -np 3 -hostfile ./hostfile VIADEV_USE_COALESCE=0 ./raflesi -f ./EDRAFLES_IN

Thank You,
Mike

The benchmark test: foo.test script has in it

#!/bin/tcsh
/usr/mpi/intel/mvapich-0.9.9/bin/mpirun_rsh -rsh -np 3 -hostfile ./hostfile VIADEV_USE_COALESCE=0 /usr/mpi/intel/mvapich-0.9.9/tests/osu_benchmarks-2.2/osu_bw

[gb16@moorcrofth run]$ ./foo.test
# OSU MPI Bandwidth Test (Version 2.2)
# Size      Bandwidth (MB/s)
1           0.135198
2           0.273329
4           0.540415
8           1.087788
16          2.179976
32          4.371585
64          8.668233
128         17.290726
256         34.458536
512         68.269511
1024        129.384822
2048        239.992676
4096        392.348909
8192        542.819870
16384       452.196563
32768       625.604678
65536       764.094184
131072      836.010006
262144      871.899242
524288      890.772813
1048576     901.838432
2097152     906.494955
4194304     909.296621

[gb16@moorcrofth run]$ more ./hostfile
moorcrofth
moorcroft8
moorcroft11

-----Original Message-----
From: Matthew Koop [mailto:koop@cse.ohio-state.edu]
Sent: Monday, January 07, 2008 12:26 PM
To: Michael Ethier
Cc: mvapich-discuss@cse.ohio-state.edu
Subject: Re: [mvapich-discuss] error IBV_WC_LOC_LEN_ERR and FATAL event IBV_EVENT_QP_LAST_WQE_REACHED

Michael,

Do other more simple benchmarks work (e.g. osu_benchmarks/osu_bw)? If they
do, this is something we'd like to take a closer look at. I'd be interested
to know if setting VIADEV_USE_COALESCE=0 resolves the issue:

e.g.
mpirun_rsh -np 2 h1 h2 VIADEV_USE_COALESCE=0 ./exec

Matt

On Mon, 7 Jan 2008, Michael Ethier wrote:

> Hello,
>
> I am new to this forum and hoping someone can help solve the following
> problem for me.
>
> We have a modeling application that initializes and runs fine using an
> ordinary Ethernet connection.
>
> When we compile using the Infiniband software package (mvapich-0.9.9)
> and run, the application fails with the following at the end:
>
> [0:moorcrofth] Abort: [moorcrofth:0] Got completion with error IBV_WC_LOC_LEN_ERR, code=1, dest rank=1
> at line 388 in file viacheck.c
> [0:moorcrofth] Abort: [0] Got FATAL event IBV_EVENT_QP_LAST_WQE_REACHED, code=16
> at line 2552 in file viacheck.c
> mpirun_rsh: Abort signaled from [0 : moorcrofth] remote host is [1 : moorcroft8 ]
> forrtl: error (78): process killed (SIGTERM)
> forrtl: error (78): process killed (SIGTERM)
> done.
>
> This occurs at the initialization phase, it seems, when communication
> starts between different nodes.
> If I set the hostfile to contain the same node so that all the cpus used
> are on 1 node, it initializes fine and runs.
>
> We are using Redhat Enterprise 4 Update 5 on x86_64
>
> uname -a
> Linux moorcrofth 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
>
> In addition we are using mvapich-0.9.9 for our Infiniband software
> package, and Intel 9.1:
>
> [gb16@moorcrofth 60]$ /usr/mpi/intel/mvapich-0.9.9/bin/mpicc --version
> icc (ICC) 9.1 20070510
> Copyright (C) 1985-2007 Intel Corporation. All rights reserved.
>
> [gb16@moorcrofth 60]$ /usr/mpi/intel/mvapich-0.9.9/bin/mpif90 --version
> ifort (IFORT) 9.1 20070510
> Copyright (C) 1985-2007 Intel Corporation. All rights reserved.
>
> We are using the rsh communication protocol for this:
>
> /usr/mpi/intel/mvapich-0.9.9/bin/mpirun_rsh -rsh -np 3 ........
>
> Can anyone suggest how this problem can be solved ?
>
> Thank You in advance,
>
> Mike
>

From koop at cse.ohio-state.edu Mon Jan 7 16:21:24 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Mon Jan 7 16:21:33 2008
Subject: [mvapich-discuss] unrecognized protocol for send/recv over 8KB (fwd)
In-Reply-To: <5b7094580801070930j26608c5qef31b73fa4d426e7@mail.gmail.com>
Message-ID:

Brian,

The make.mvapich2.detect script is just a helper script (not meant to be
executed directly). You need to use the make.mvapich2.ofa script, which
will call configure and make for you with the correct arguments.

More information can be found in our MVAPICH2 user guide under
"4.4.1 Build MVAPICH2 with OpenFabrics Gen2-IB and iWARP"

https://mvapich.cse.ohio-state.edu/support/

Let us know if you have any other problems.

Matt

From brian.budge at gmail.com Mon Jan 7 19:15:09 2008
From: brian.budge at gmail.com (Brian Budge)
Date: Mon Jan 7 19:15:21 2008
Subject: [mvapich-discuss] unrecognized protocol for send/recv over 8KB (fwd)
In-Reply-To:
References: <5b7094580801070930j26608c5qef31b73fa4d426e7@mail.gmail.com>
Message-ID: <5b7094580801071615y29148164v469332e1e3e7aa83@mail.gmail.com>

Hi Matt -

I have now done the install from the ofa build file, and I can boot and run
the ring test, but now when I run the osu_bw.c benchmark, the executable
dies in MPI_Init().

The things I altered in make.mvapich2.ofa were:

OPEN_IB_HOME=${OPEN_IB_HOME:-/usr}
SHARED_LIBS=${SHARED_LIBS:-yes}

and on the configure line I added:

--disable-f77 --disable-f90

Here is the error message that I am getting:

rank 1 in job 1 burn_60139 caused collective abort of all ranks
exit status of rank 1: killed by signal 9

Thanks,
Brian

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080107/b3f5f91e/attachment-0001.html

From koop at cse.ohio-state.edu Mon Jan 7 20:12:26 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Mon Jan 7 20:12:33 2008
Subject: [mvapich-discuss] unrecognized protocol for send/recv over 8KB (fwd)
In-Reply-To: <5b7094580801071615y29148164v469332e1e3e7aa83@mail.gmail.com>
Message-ID:

Brian,

Can you try the ibv_rc_pingpong program, which is a low-level (non-MPI)
test that ships with OFED? This will make sure that your basic InfiniBand
setup is working properly.

Did any other error message print out other than the one you gave?

Matt

On Mon, 7 Jan 2008, Brian Budg