From rpsmic001 at gmail.com Wed Jul 1 10:11:20 2009
From: rpsmic001 at gmail.com (Michael Rapson)
Date: Wed Jul 1 10:11:43 2009
Subject: [mvapich-discuss] compiling MVAPICH against GCC-4.3.3
In-Reply-To: <20090630152143.GE2761@cse.ohio-state.edu>
References: <73bd53f00906282245w7e569ea7r5da5c14a9090d21e@mail.gmail.com>
 <20090629115951.GA2432@cse.ohio-state.edu>
 <73bd53f00906300611n47d7e8c3o3f966b890de68e8f@mail.gmail.com>
 <20090630135314.GD2761@cse.ohio-state.edu>
 <73bd53f00906300721r1b084a2bo8f2d752ccb61ed2f@mail.gmail.com>
 <20090630152143.GE2761@cse.ohio-state.edu>
Message-ID: <73bd53f00907010711s55df2cafq37697dc155a3b4ba@mail.gmail.com>

Hi Jonathan,

Thanks for those suggestions. I did get the latest tarball and installation went without a hitch. I didn't run into the PARAMETERS command problem at all. Incidentally, my gcc version is installed in my home directory, so I compiled with the -LLIBDIR switch and have exported the relevant LD_RUN_PATH and LD_LIBRARY_PATH environment variables. This leads into a weird question below, but for now just the update on what I have done.

I corrected the environment variables as you suggested and have tried a couple of variations of my submit script. The typical error for mpirun, whether using a single processor or 4 on a node, is:

Child exited abnormally!
Killing remote processes...DONE

If I use mpirun_rsh -rsh with more than one node I get the slightly worse error:

Permission denied.
Child exited abnormally!
Killing remote processes...Permission denied.
DONE

This does not occur with just one cpu requested. (I presume this may indicate that ssh is not allowed by the cluster?) I have tested the -legacy switch and this does not seem to make any difference.

Finally the weird question: PETSc detects that my MVAPICH installation is static only. GCC was built with shared libraries enabled, and I seem to remember that the GotoBLAS library that I built required the GCC shared libraries even for the static version.
(Which meant that I needed to provide LD_LIBRARY_PATH when running GotoBLAS tests). Is it possible that the MVAPICH library is dependent on the GCC shared libraries in a similar way? The previous installation of MVAPICH was also static only, apparently because that was all that they were able to get working with the ofed binaries that they have. If there is this problem, could it be causing the weird behavior?

Thanks,
Michael

From potluri at cse.ohio-state.edu Thu Jul 2 10:38:18 2009
From: potluri at cse.ohio-state.edu (sreeram potluri)
Date: Thu Jul 2 10:38:45 2009
Subject: Fwd: [mvapich-discuss] Problem Compiling MVAPICH2 using PGICompilers (fwd)
In-Reply-To:
References:
Message-ID: <23b6b0910907020738y5883e65k1c478e663e0b281b@mail.gmail.com>

Hi Saurabh,

We have a fix for this issue with PSM using PGI Compiler in our latest trunk available at https://mvapich.cse.ohio-state.edu/svn/mpi/mvapich2/trunk

(svn co https://mvapich.cse.ohio-state.edu/svn/mpi/mvapich2/trunk mvapich2)

Let us know if you see any further issues.

Thanks
Sreeram Potluri

---------- Forwarded message ----------
Date: Tue, 30 Jun 2009 08:56:31 -0700
From: Saurabh Barve
To: Jonathan Perkins
Cc: mvapich-discuss@cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Problem Compiling MVAPICH2 using PGICompilers

On Mon, Jun 29, 2009 at 10:16:01PM -0700, Saurabh Barve wrote:
> Hi,
>
> I'm trying to build MVAPICH2 (mvapich2-1.4rc1) on a CentOS Linux
> machine, and am running into errors.
>
> Here is how I run the configure script:
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> CC=pgcc CXX=pgCC F77=pgf77 F90=pgf90 CFLAGS="-D_X86_64_ -D_SMP_ -DCH_PSM" LIBS="-lpthread -lpsm_infinipath" ./configure --enable-f77 --enable-f90 --enable-cxx --with-device=ch3:psm --with-arch=LINUX --with-romio --without-mpe --prefix=/opt/mvapich2/pgi
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Don't set CFLAGS or LIBS, this is taken care of by the supplied configure options.
There also shouldn't be a need to specify the arch. Do you get the same error with the following command?

./configure --with-device=ch3:psm --with-romio --without-mpe --prefix=/opt/mvapich2/pgi CC=pgcc CXX=pgCC F77=pgf77 F90=pgf90

>
>
> This is the error I get when I run 'make':
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> ...
> ...
> make[4]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/pm/mpirun'
> make[3]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/pm'
> make[2]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/pm'
> make[1]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src'
> make[1]: Entering directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/examples'
> ../bin/mpicc -o cpi cpi.o -lm
> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/lib/libmpich.a(mpid_irecv.o): In function `MPID_Irecv':
> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_irecv.c:84: undefined reference to `MPIDI_CH3_iRecv'
> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_irecv.c:90: undefined reference to `MPIDI_CH3_iRecv'
> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/lib/libmpich.a(mpid_recv.o): In function `MPID_Recv':
> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_recv.c:62: undefined reference to `MPIDI_CH3_Recv'
> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_recv.c:72: undefined reference to `MPIDI_CH3_Recv'
> make[1]: *** [cpi] Error 2
> make[1]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/examples'
> make: *** [all-redirect] Error 2
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
> More details about the system I'm using:
>
> 1) Operating System - CentOS Linux 5.0
>
> 2) Kernel version - 2.6.18-8.1.14.el5
>
> 3) PGI
Compiler Suite - Version 8.0-2
>
> 4) MVAPICH2 version 1.4rc1
>
> MVAPICH2 builds fine for me when I use the Intel compilers (icc, icpc,
> ifort) and use the same configure options as above.
>
> What am I doing wrong?
>
> Thanks,
> Saurabh
>

--

Jonathan,

I get the same error when I try to use your configure command and then run 'make':

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
../bin/mpicc -I../src/include -I../src/include -c cpi.c
../bin/mpicc -o cpi cpi.o -lm
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/lib/libmpich.a(mpid_irecv.o): In function `MPID_Irecv':
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_irecv.c:84: undefined reference to `MPIDI_CH3_iRecv'
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_irecv.c:90: undefined reference to `MPIDI_CH3_iRecv'
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/lib/libmpich.a(mpid_recv.o): In function `MPID_Recv':
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_recv.c:62: undefined reference to `MPIDI_CH3_Recv'
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_recv.c:72: undefined reference to `MPIDI_CH3_Recv'
make[1]: *** [cpi] Error 2
make[1]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/examples'
make: *** [all-redirect] Error 2
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-Saurabh

--------------------------------------------------
From: "Jonathan Perkins"
Sent: Tuesday, June 30, 2009 5:00 AM
To: "Saurabh Barve"
Cc:
Subject: Re: [mvapich-discuss] Problem Compiling MVAPICH2 using PGICompilers

_______________________________________________
mvapich-discuss mailing list
mvapich-discuss@cse.ohio-state.edu
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090702/067559f0/attachment.html

From saurabh.barve at gmail.com Thu Jul 2 20:25:41 2009
From: saurabh.barve at gmail.com (Saurabh Barve)
Date: Thu Jul 2 20:26:14 2009
Subject: Fwd: [mvapich-discuss] Problem Compiling MVAPICH2 using PGICompilers (fwd)
In-Reply-To: <23b6b0910907020738y5883e65k1c478e663e0b281b@mail.gmail.com>
References: <23b6b0910907020738y5883e65k1c478e663e0b281b@mail.gmail.com>
Message-ID:

Sreeram,

Thanks. The latest trunk build fixed the problem. There were no errors during 'make' or 'make install'. All the binaries were generated.

-Saurabh

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090702/d818a8e7/attachment-0001.html

From rpsmic001 at gmail.com Mon Jul 6 06:49:11 2009
From: rpsmic001 at gmail.com (Michael Rapson)
Date: Mon Jul 6 06:49:36 2009
Subject: [mvapich-discuss] editing mpirun_rsh.c for use with Tivoli LoadLeveler
Message-ID: <73bd53f00907060349r6c52dbc9r15bd3d3c3eae4e1b@mail.gmail.com>

Hi all,

In tracking down a problem that I mentioned earlier causing the error message "Child exited abnormally!" I have come across a section in the Tivoli LoadLeveler manual entitled "Configuring LoadLeveler to support MVAPICH jobs". This section discusses how their preferred way to launch MPI processes is through the llspawn command and explains modifications that should be made to the mpirun_rsh.c program to call this command rather than "/usr/bin/rsh" and "/usr/bin/ssh". Unfortunately the steps given seem to be out of date (for MVAPICH version 1.1).

Has anyone successfully installed MVAPICH 1.1 with support for llspawn and could you let me know which files you needed to edit? For reference sake, I am pasting the relevant section from the LoadLeveler documentation below, I have also made some comments about where I found closest matches to the given advice.

Thanks for your help!
Michael

// section from documentation, my comments preceded by "//"

Configuring LoadLeveler to support MVAPICH jobs

To run MVAPICH jobs under LoadLeveler control, you must specify the llspawn command to replace the default RSHCOMMAND value during software configuration. The compiled MVAPICH implementation code uses the llspawn command to start tasks under LoadLeveler control. This allows LoadLeveler to have total control over the remote tasks for accounting and cleanup.

To configure the MVAPICH code to use the llspawn command as RSHCOMMAND, change the mpirun_rsh.c program source code by following these steps before compiling MVAPICH:

1. Replace:
   Void child_handler(int);
   // this is in the mpirun_rsh.c file, with void starting on a small letter obviously
   with:
   Void child_handler(int);
   Void term_handler(int);

2. For Linux, replace:
   #define RSH_CMD "/usr/bin/rsh" // these I found in mpirun_rsh.h file
   #define RSH_CMD "/usr/bin/ssh"
   with:
   #define RSH_CMD "/opt/ibmll/LoadL/full/bin/llspawn"
   #define SSH_CMD "/opt/ibmll/LoadL/full/bin/llspawn"

3. Replace:
   signal(SIGCHLD, child_handler);
   // this command I could not find, the closest matches came from the serv_p4.c files: signal(SIGCHLD, reaper);
   with:
   signal(SIGCHLD, SIG_IGN);
   signal(SIGTERM, term_handler);

4. Add the definition for term_handler function at the end:
   Void term_handler(int signal) { exit(0); }
   // presumably this could still be added to mpirun_rsh.c ? Where should signal(SIGTERM, term_handler); be added if this is the case?

From penoff at cs.ubc.ca Wed Jul 8 02:29:02 2009
From: penoff at cs.ubc.ca (Brad Penoff)
Date: Wed Jul 8 02:29:27 2009
Subject: [mvapich-discuss] OSU MVAPICH2 1.4-RC1-3378 (06/02/09) VPATH build and "debug" static?
Message-ID:

hey,

I'm not sure if these are known or legitimate issues or if it's particular to my system, but I was not able to do a VPATH build with your latest MVAPICH2 tarball nor build my application. These were fixed after two work-arounds.
I wondered if these work-arounds were known or if I did something wrong in the first place.

----Issue #1----

I downloaded http://mvapich.cse.ohio-state.edu/download/mvapich2/mvapich2-1.4rc1-3378.tgz and then ran tar zxf, cd mvapich2-1.4rc1, and mkdir build. From inside build, I did a VPATH build, configuring to create 32-bit libraries the following way on my 64-bit machine (Red Hat Enterprise Linux Server release 5.1 (Tikanga)):

$ ../configure CFLAGS=-m32 CPPFLAGS=-m32 FC=gfortran F90=gfortran FFLAGS=-m32 F90FLAGS=-m32 LDFLAGS=-m32 --prefix=/home/penoff/installs/mvapich2-1.4rc1

Eventually the "make" died with the error below. When I used the same configure line but did not do a VPATH build (that is, configured from within mvapich2-1.4rc1), the build succeeded, as did the "make install". I was just wondering if this was a known issue. The error I saw is below.

----- Issue #2 ----

Once installed, we compiled our code. We have a function called debug() in our code somewhere. It conflicted, when compiling, with an internal variable of your code. I'm not sure who is at fault here, but rather than renaming our function and adjusting all of our code in countless places, I fixed this by making the long variable "debug" in src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:503 static, putting that keyword at the beginning of the line.

Are these fixes necessary or am I doing something wrong to begin with?

Thanks,
brad

make[3]: Entering directory `/home/penoff/src/mvapich2-1.4rc1/build/src/pm/mpirun'
gcc -DHAVE_CONFIG_H -I.
-I../../../../src/pm/mpirun -I/home/penoff/src/mvapich2-1.4rc1/build/src/pm/mpirun/../../../../src/pm/mpirun/include -m32 -DNDEBUG -O2 -m32 -I/home/penoff/src/mvapich2-1.4rc1/build/src/mpid/ch3/include -I/home/penoff/src/mvapich2-1.4rc1/src/mpid/ch3/include -I/home/penoff/src/mvapich2-1.4rc1/build/src/mpid/common/datatype -I/home/penoff/src/mvapich2-1.4rc1/src/mpid/common/datatype -I/home/penoff/src/mvapich2-1.4rc1/build/src/mpid/common/locks -I/home/penoff/src/mvapich2-1.4rc1/src/mpid/common/locks -I/home/penoff/src/mvapich2-1.4rc1/build/src/mpid/ch3/channels/mrail/include -I/home/penoff/src/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include -I/home/penoff/src/mvapich2-1.4rc1/build/src/mpid/ch3/channels/mrail/src/gen2 -I/home/penoff/src/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 -I/home/penoff/src/mvapich2-1.4rc1/build/src/mpid/common/locks -I/home/penoff/src/mvapich2-1.4rc1/src/mpid/common/locks -c ../../../../src/pm/mpirun/mpirun_rsh.c
../../../../src/pm/mpirun/mpirun_rsh.c:27:24: error: mpirunconf.h: No such file or directory
../../../../src/pm/mpirun/mpirun_rsh.c:272: error: expected identifier or '(' before '__extension__'
make[3]: *** [mpirun_rsh.o] Error 1
make[3]: Leaving directory `/home/penoff/src/mvapich2-1.4rc1/build/src/pm/mpirun'
make[2]: *** [all-redirect] Error 2
make[2]: Leaving directory `/home/penoff/src/mvapich2-1.4rc1/build/src/pm'
make[1]: *** [all-redirect] Error 2
make[1]: Leaving directory `/home/penoff/src/mvapich2-1.4rc1/build/src'
make: *** [all-redirect] Error 2
[penoff@hpc0001 build]$ find .. -name mpirunconf.h
../build/src/pm/mpirun/include/mpirunconf.h
[penoff@hpc0001 build]$

From Craig.Tierney at noaa.gov Wed Jul 8 18:21:32 2009
From: Craig.Tierney at noaa.gov (Craig Tierney)
Date: Wed Jul 8 18:21:55 2009
Subject: [mvapich-discuss] Question on how to debug job start failures
In-Reply-To:
References:
Message-ID: <4A551BEC.6030207@noaa.gov>

I am running mvapich2 1.2, built with Ofed support (v1.3.1).
For large jobs, I am having problems where they do not start. I am using the mpirun_rsh launcher. When I try to start jobs with ~512 cores or larger, I can see the problem. The problem doesn't happen all the time.

I can't rule out quirky hardware. The IB tree seems to be clean (as reported by ibdiagnet). During my last hang, I looked to see if xhpl had started on all the nodes (8 cases for each node for dual-socket quad-core systems). I found that 7 of the 245 nodes (1960 core job) had no xhpl processes on them. So either the launching mechanism hung, or something was up with one of those nodes.

My question is, how should I start debugging this to understand what process is hanging?

Thanks,
Craig

--
Craig Tierney (craig.tierney@noaa.gov)

From panda at cse.ohio-state.edu Wed Jul 8 22:22:01 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Wed Jul 8 22:22:25 2009
Subject: [mvapich-discuss] Question on how to debug job start failures
In-Reply-To: <4A551BEC.6030207@noaa.gov>
Message-ID:

Are you able to run simple MPI programs (say MPI Hello World) or some IMB tests using ~512 cores or larger? This will help you to find out whether there are any issues when launching jobs and isolate any nodes which might be having problems.

Thanks,

DK

From alex.ninaber at clustervision.com Thu Jul 9 03:19:32 2009
From: alex.ninaber at clustervision.com (Alex Ninaber)
Date: Thu Jul 9 07:10:35 2009
Subject: [mvapich-discuss] mvapich2 1.4rc1 blcr and mpirun_rsh
Message-ID: <4A559A04.3010508@clustervision.com>

Dear list,

One of the new features in 1.4rc1 is listed as:

"o (NEW) Scalable Checkpoint-restart with mpirun_rsh framework "

Assuming this means that mpd is no longer needed, we get the following error:

mpirun_rsh -ssh -np 2 -hostfile ./nodes ./IMB-MPI1
[Rank 0][cr.c: line 186][Rank 0][cr.c: line 186]connect 0 failed
connect 1 failed
MPI process (rank: 1) terminated unexpectedly on quad01
Exit code -5 signaled from quad01
MPI process (rank: 0) terminated unexpectedly on quad01
Terminated

In .bashrc:
export MV2_CKPT_FILE=/home/user/chkpoint

Config:

./configure --prefix=/cvos/shared/apps/ofed/1.4/mpi/gcc/mvapich2-1.4rc1/ --with-rdma=gen2 --enable-blcr \
--disable-romio --disable-rdma-cm --with-ib-libpath=/cvos/shared/apps/ofed/1.4/lib64 \
--with-blcr-libpath=/cvos/shared/apps/blcr/0.8.2/lib/ --with-blcr-include=/cvos/shared/apps/blcr/0.8.2/include/ \
--with-ib-include=/cvos/shared/apps/ofed/1.4/include/ --enable-header-caching

Are we missing something? Are both the mpirun_rsh and mpd mechanisms available, or should we choose one during install?
Thanks & regards,

Alex

(please cc alex.ninaber@clustervision.com)

From gopalakk at cse.ohio-state.edu Thu Jul 9 07:46:46 2009
From: gopalakk at cse.ohio-state.edu (Karthik Gopalakrishnan)
Date: Thu Jul 9 07:47:33 2009
Subject: [mvapich-discuss] mvapich2 1.4rc1 blcr and mpirun_rsh
In-Reply-To: <4A559A04.3010508@clustervision.com>
References: <4A559A04.3010508@clustervision.com>
Message-ID: <92eddfb50907090446q473c70c0s7ac384ffce97364f@mail.gmail.com>

Hi Alex.

Your understanding is correct. You no longer need to use mpd for the Checkpoint / Restart feature. You can use mpirun_rsh. You can also switch between mpirun_rsh and mpd at runtime.

However, changes have been made in the MPI Library to support this feature. Looking at the line number in cr.c reported in the error message, it looks like IMB-MPI1 has been compiled with an older version of MVAPICH2. Please recompile IMB with MVAPICH2-1.4 RC1 and rerun IMB-MPI1.

Also note that mpirun_rsh treats MV2_CKPT_FILE as a parameter and does not read it from the environment variable. You need to run IMB as follows:

mpirun_rsh -ssh -np 2 -hostfile ./nodes MV2_CKPT_FILE=/home/user/chkpoint ./IMB-MPI1

Please let us know if this helps.
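
As a hedged illustration of the point about parameters (this is not part of MVAPICH2): because mpirun_rsh takes settings such as MV2_CKPT_FILE as NAME=VALUE arguments placed before the executable, a job script can assemble the command line explicitly rather than relying on exported variables. The sketch below uses Python; the helper name build_mpirun_rsh_command and its arguments are made up for this example, and it only uses the flags that appear in the command quoted in this message.

```python
import shlex


def build_mpirun_rsh_command(np, hostfile, executable, params=None, use_ssh=True):
    """Return an argv list for mpirun_rsh (hypothetical helper, illustration only).

    Runtime parameters are emitted as NAME=VALUE arguments directly before
    the executable, matching the command line shown in this message.
    """
    argv = ["mpirun_rsh"]
    if use_ssh:
        argv.append("-ssh")
    argv += ["-np", str(np), "-hostfile", hostfile]
    # Parameters go on the command line; an exported variable is not enough.
    for name, value in (params or {}).items():
        argv.append(f"{name}={value}")
    argv.append(executable)
    return argv


cmd = build_mpirun_rsh_command(
    np=2,
    hostfile="./nodes",
    executable="./IMB-MPI1",
    params={"MV2_CKPT_FILE": "/home/user/chkpoint"},
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd) as is; keeping it as a list avoids shell quoting problems if a checkpoint path ever contains spaces.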
Regards,
Karthik

From Craig.Tierney at noaa.gov Thu Jul 9 13:05:07 2009
From: Craig.Tierney at noaa.gov (Craig Tierney)
Date: Thu Jul 9 13:05:33 2009
Subject: [mvapich-discuss] Question on how to debug job start failures
In-Reply-To:
References:
Message-ID: <4A562343.8060507@noaa.gov>

Dhabaleswar Panda wrote:
> Are you able to run simple MPI programs (say MPI Hello World) or some IMB
> tests using ~512 cores or larger. This will help you to find out whether
> there are any issues when launching jobs and isolate any nodes which might
> be having problems.
>
> Thanks,
>

I have been using HPL to test, but I have also used IMB and a user code. I can't say for certain that 512 cores is the cut-off for the problem, but the user that gets bitten the most tries to use about 512 cores. If it happened more, I am sure users would complain.

I have used HPL to search for bad hardware. It has been a good technique in the past and I have used it to bring up several clusters. This one seems so random that I hoped to do something better. For example, I have seen that all the processes have started, but the code doesn't get out of MPI_Init. In this case, I was wondering if there was a way to debug one (or all) of the processes and see which process hadn't responded yet.

Craig

--
Craig Tierney (craig.tierney@noaa.gov)

From Craig.Tierney at noaa.gov Thu Jul 9 18:19:50 2009
From: Craig.Tierney at noaa.gov (Craig Tierney)
Date: Thu Jul 9 18:20:15 2009
Subject: [mvapich-discuss] Question on how to debug job start failures
In-Reply-To:
References:
Message-ID: <4A566D06.6020802@noaa.gov>

Dhabaleswar Panda wrote:
> Are you able to run simple MPI programs (say MPI Hello World) or some IMB
> tests using ~512 cores or larger. This will help you to find out whether
> there are any issues when launching jobs and isolate any nodes which might
> be having problems.
>
> Thanks,
>
> DK
>

I dug in further today while the system was offline, and this is what I found. The mpispawn process is hanging. When it hangs, it hangs on different nodes each time. What I see is that one side thinks the connection is closed, and the other side waits.
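
One way to look for a connection in this state across many nodes is to scan netstat output for TCP sockets in CLOSE_WAIT, the state a socket sits in after the peer has closed but the local process has not. The Python sketch below is only an illustration under stated assumptions: find_close_wait is a hypothetical helper (not part of MVAPICH2 or mpispawn), and it assumes the six-column Linux netstat layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State).

```python
def find_close_wait(netstat_text):
    """Return (local, remote, recv_q) for every TCP socket in CLOSE_WAIT.

    Hypothetical helper for illustration; expects lines with the columns
    Proto Recv-Q Send-Q Local-Address Foreign-Address State.
    """
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip banners, the column header, and anything that is not a TCP row.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        _proto, recv_q, _send_q, local, remote, state = fields
        if state == "CLOSE_WAIT":
            hits.append((local, remote, int(recv_q)))
    return hits


# Two rows copied from the netstat output in this message.
sample = """\
tcp 0 0 h43:49730 h6:56443 ESTABLISHED
tcp 31245 0 h43:49730 h4:41799 CLOSE_WAIT
"""

for local, remote, recv_q in find_close_wait(sample):
    print(f"{local} -> {remote}: peer closed, {recv_q} bytes unread")
```

Running something like this over the netstat output collected from each node of a hung job would list the node pairs worth attaching gdb to first.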
At one end:

[root@h43 ~]# netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 h43:50797 wms-sge:sge_qmaster ESTABLISHED
tcp 0 0 h43:816 jetsam1:nfs ESTABLISHED
tcp 0 0 h43:49730 h6:56443 ESTABLISHED
tcp 31245 0 h43:49730 h4:41799 CLOSE_WAIT
tcp 0 0 h43:ssh h1:35169 ESTABLISHED
tcp 0 0 h43:ssh wfe7-eth2:51964 ESTABLISHED

(gdb) bt
#0 0x00002b1284f0e950 in __read_nocancel () from /lib64/libc.so.6
#1 0x00000000004035ea in read_socket (socket=5, buffer=0x16dec8a0, bytes=640) at mpirun_util.c:97
#2 0x000000000040402f in mpispawn_tree_init (me=5, req_socket=383699104) at mpispawn_tree.c:190
#3 0x0000000000401a90 in main (argc=5, argv=0x16dec8a0) at mpispawn.c:496

At other end (node h4):

(gdb) bt
#0 0x00002b95b77308d3 in __select_nocancel () from /lib64/libc.so.6
#1 0x0000000000404379 in mtpmi_processops () at pmi_tree.c:754
#2 0x0000000000401c32 in main (argc=1024, argv=0x6101a0) at mpispawn.c:525

The netstat on h4 does not show any connections back to h43.

I tried the latest 1.4Beta from the website (not svn) and found that for large jobs mpirun_rsh will sometimes exit without running anything. The larger the job, the more likely it is not to start properly. The only difference is that it doesn't hang. I turned on debugging with MPISPAWN_DEBUG, but I didn't see anything interesting from that.

Craig

--
Craig Tierney (craig.tierney@noaa.gov)

From alex.ninaber at clustervision.com Fri Jul 10 04:35:40 2009
From: alex.ninaber at clustervision.com (Alex Ninaber)
Date: Fri Jul 10 04:36:09 2009
Subject: [mvapich-discuss] mvapich2 1.4rc1 blcr and mpirun_rsh
In-Reply-To: <92eddfb50907090446q473c70c0s7ac384ffce97364f@mail.gmail.com>
References: <4A559A04.3010508@clustervision.com> <92eddfb50907090446q473c70c0s7ac384ffce97364f@mail.gmail.com>
Message-ID: <4A56FD5C.7050201@clustervision.com>

Dear Karthik,

Indeed you're right; I was loading the wrong module.

Regards,

Alex
>> >> Thanks & regards, >> >> Alex >> >> >> (please cc alex.ninaber@clustervision.com) >> >> >> >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >> >> -- ------------------------------------------------------------------ Dr Alex Ninaber ClusterVision Technical Manager tel NL: +31 20 407 7557 http://www.ClusterVision.com tel UK: +44 870 080 1980 email:Alex.Ninaber@ClusterVision.com tel Mob: +31 61 650 4127 support: support@ClusterVision.com skype: AlexNinaber From panda at cse.ohio-state.edu Fri Jul 10 09:46:46 2009 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Fri Jul 10 09:47:14 2009 Subject: [mvapich-discuss] Question on how to debug job start failures In-Reply-To: <4A566D06.6020802@noaa.gov> Message-ID: Craig - Could you please tell us little more about the details on your system: node configuration (sockets and cores/socket, processor type), OS version, etc. What kind of Ethernet connectivity does your system have? FYI, mpirun_rsh framework launches the job using the standard TCP/IP calls. Thanks, DK On Thu, 9 Jul 2009, Craig Tierney wrote: > Dhabaleswar Panda wrote: > > Are you able to run simple MPI programs (say MPI Hello World) or some IMB > > tests using ~512 cores or larger. This will help you to find out whether > > there are any issues when launching jobs and isolate any nodes which might > > be having problems. > > > > Thanks, > > > > DK > > > > I dug in further today while the system was offline, and this > is what I found. The mpispawn process is hanging. When it hangs > it does hang on different nodes each time. What I see is that > one side thinks the connection is closed, and the other side waits. 
> > At one end: > > [root@h43 ~]# netstat > Active Internet connections (w/o servers) > Proto Recv-Q Send-Q Local Address Foreign Address State > tcp 0 0 h43:50797 wms-sge:sge_qmaster ESTABLISHED > tcp 0 0 h43:816 jetsam1:nfs ESTABLISHED > tcp 0 0 h43:49730 h6:56443 ESTABLISHED > tcp 31245 0 h43:49730 h4:41799 CLOSE_WAIT > tcp 0 0 h43:ssh h1:35169 ESTABLISHED > tcp 0 0 h43:ssh wfe7-eth2:51964 ESTABLISHED > > > (gdb) bt > #0 0x00002b1284f0e950 in __read_nocancel () from /lib64/libc.so.6 > #1 0x00000000004035ea in read_socket (socket=5, buffer=0x16dec8a0, bytes=640) at mpirun_util.c:97 > #2 0x000000000040402f in mpispawn_tree_init (me=5, req_socket=383699104) at mpispawn_tree.c:190 > #3 0x0000000000401a90 in main (argc=5, argv=0x16dec8a0) at mpispawn.c:496 > > At other end (node h4): > > (gdb) bt > #0 0x00002b95b77308d3 in __select_nocancel () from /lib64/libc.so.6 > #1 0x0000000000404379 in mtpmi_processops () at pmi_tree.c:754 > #2 0x0000000000401c32 in main (argc=1024, argv=0x6101a0) at mpispawn.c:525 > > The netstat on h4 does not show any connections back to h43. > > I tried the latest 1.4Beta from the website (not svn) I found that > for large jobs mpirun_rsh will sometimes exits without running anything. > The large the job, the more likely it is to not to start the job properly. > The only difference is that it doesn't hang. I turned on debugging with > MPISPAWN_DEBUG, but I didn't see anything interesting from that. > > Craig > > > > > > On Wed, 8 Jul 2009, Craig Tierney wrote: > > > >> I am running mvapich2 1.2, built with Ofed support (v1.3.1). > >> For large jobs, I am having problems where they do not start. > >> I am using the mpirun_rsh launcher. When I try to start jobs > >> with ~512 cores or larger, I can see the problem. The problem > >> doesn't happen all the time. > >> > >> I can't rule our quirky hardware. The IB tree seems to be > >> clean (as reported by ibdiagnet). 
My last hang, I looked to > >> see if xhpl had started on all the nodes (8 cases for each > >> node for dual-socket quad-core systems). I found that 7 of > >> the 245 nodes (1960 core job) had no xhpl processes on them. > >> So either the launching mechanism hung, or something was up with one of > >> those nodes. > >> > >> My question is, how should I start debugging this to understand > >> what process is hanging? > >> > >> Thanks, > >> Craig > >> > >> > >> -- > >> Craig Tierney (craig.tierney@noaa.gov) > >> _______________________________________________ > >> mvapich-discuss mailing list > >> mvapich-discuss@cse.ohio-state.edu > >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > >> > > > > > > > -- > Craig Tierney (craig.tierney@noaa.gov) > From RGigon at slb.com Fri Jul 10 11:29:22 2009 From: RGigon at slb.com (Roberta Gigon) Date: Fri Jul 10 12:08:24 2009 Subject: [mvapich-discuss] Segmentation fault when using MVAPICH 1.1 Message-ID: <9314796A0C8D864B8CAD7C3BFA9CD90C212F8464AE@NL0105EXC01V01.eur.slb.com> Hi there, One of my users has an MPI code that gives a segmentation fault when running under MVAPICH 1.1, but runs fine under 0.9.9. Does anyone have a suggestion on what could be causing this and what steps I might take to remedy it? I have a sample code I can send that reliably seg faults, if that is helpful. Best regards, Roberta --------------------------------------------------------------------------------------------- Roberta M. Gigon Schlumberger-Doll Research One Hampshire Street, MD-B253 Cambridge, MA 02139 617.768.2099 - phone 617.768.2381 - fax This message is considered Schlumberger CONFIDENTIAL. Please treat the information contained herein accordingly. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090710/31dffca8/attachment-0001.html

From panda at cse.ohio-state.edu  Fri Jul 10 12:17:08 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri Jul 10 12:17:31 2009
Subject: [mvapich-discuss] Segmentation fault when using MVAPICH 1.1
In-Reply-To: <9314796A0C8D864B8CAD7C3BFA9CD90C212F8464AE@NL0105EXC01V01.eur.slb.com>
Message-ID:

Hi,

> One of my users has an MPI code that gives a segmentation fault when
> running under MVAPICH 1.1, but runs fine under 0.9.9. Does anyone
> have a suggestion on what could be causing this and what steps I might
> take to remedy it? I have a sample code I can send that reliably seg
> faults, if that is helpful.

Thanks for your note. Have you tried your application with the MVAPICH 1.1
branch version? A couple of patches have gone in there recently, including
one related to ptmalloc. Please try this branch version and let us know
whether it solves the problem. If it does not, we would appreciate
receiving the sample code from you so that we can reproduce the problem.

Best Regards,

DK

> Best regards,
> Roberta
>
> ---------------------------------------------------------------------------------------------
> Roberta M. Gigon
> Schlumberger-Doll Research
> One Hampshire Street, MD-B253
> Cambridge, MA 02139
> 617.768.2099 - phone
> 617.768.2381 - fax
>
> This message is considered Schlumberger CONFIDENTIAL. Please treat the information contained herein accordingly.
> > From perkinjo at cse.ohio-state.edu Fri Jul 10 12:23:06 2009 From: perkinjo at cse.ohio-state.edu (Jonathan Perkins) Date: Fri Jul 10 12:23:32 2009 Subject: [mvapich-discuss] Segmentation fault when using MVAPICH 1.1 In-Reply-To: <9314796A0C8D864B8CAD7C3BFA9CD90C212F8464AE@NL0105EXC01V01.eur.slb.com> References: <9314796A0C8D864B8CAD7C3BFA9CD90C212F8464AE@NL0105EXC01V01.eur.slb.com> Message-ID: <20090710162306.GC4530@cse.ohio-state.edu> On Fri, Jul 10, 2009 at 05:29:22PM +0200, Roberta Gigon wrote: > Hi there, > One of my users has an MPI code that gives a segmentation fault when > running under MVAPICH 1.1, but runs fine under 0.9.9. Does anyone > have a suggestion on what could be causing this and what steps I might > take to remedy it? I have a sample code I can send that reliably seg > faults, if that is helpful. You can send the mpi program. However it will also be useful to send a backtrace from the generated core file(s). Also, are any environment variables set or did you modify the install script in any way other than setting the installation directory? -- Jonathan Perkins http://www.cse.ohio-state.edu/~perkinjo -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090710/ac61bb7d/attachment.bin From Craig.Tierney at noaa.gov Fri Jul 10 12:37:18 2009 From: Craig.Tierney at noaa.gov (Craig Tierney) Date: Fri Jul 10 12:37:42 2009 Subject: [mvapich-discuss] Question on how to debug job start failures In-Reply-To: References: Message-ID: <4A576E3E.3030508@noaa.gov> Dhabaleswar Panda wrote: > Craig - Could you please tell us little more about the details on your > system: node configuration (sockets and cores/socket, processor type), OS > version, etc. What kind of Ethernet connectivity does your system have? 
> FYI, mpirun_rsh framework launches the job using the standard TCP/IP
> calls.
>

The system is a cluster based on Supermicro motherboards. Each node is
a dual-socket, quad-core Harpertown, 2.8 GHz. Each node has 16 GB of RAM.
We are running CentOS 5.1. We are using the 2.6.18-92.1.13.el5 kernel and
the e1000e Intel GigE driver. The nodes boot over NFS. The entire OS image
is available via NFS.

About 30 nodes attach to each SMC8150L2 GigE switch. The 9 switches have
2 uplinks to a Force10 switch (not sure of the model number). The links
are bonded via a port-channel. Spanning tree is disabled.

We are testing a CentOS 5.3 image for a new Nehalem cluster, but I won't
have the hardware up until the end of next week.

Craig

> Thanks,
>
> DK
>
>
>
> On Thu, 9 Jul 2009, Craig Tierney wrote:
>
>> Dhabaleswar Panda wrote:
>>> Are you able to run simple MPI programs (say MPI Hello World) or some IMB
>>> tests using ~512 cores or larger. This will help you to find out whether
>>> there are any issues when launching jobs and isolate any nodes which might
>>> be having problems.
>>>
>>> Thanks,
>>>
>>> DK
>>>
>> I dug in further today while the system was offline, and this
>> is what I found. The mpispawn process is hanging. When it hangs
>> it does hang on different nodes each time. What I see is that
>> one side thinks the connection is closed, and the other side waits.
>> >> At one end: >> >> [root@h43 ~]# netstat >> Active Internet connections (w/o servers) >> Proto Recv-Q Send-Q Local Address Foreign Address State >> tcp 0 0 h43:50797 wms-sge:sge_qmaster ESTABLISHED >> tcp 0 0 h43:816 jetsam1:nfs ESTABLISHED >> tcp 0 0 h43:49730 h6:56443 ESTABLISHED >> tcp 31245 0 h43:49730 h4:41799 CLOSE_WAIT >> tcp 0 0 h43:ssh h1:35169 ESTABLISHED >> tcp 0 0 h43:ssh wfe7-eth2:51964 ESTABLISHED >> >> >> (gdb) bt >> #0 0x00002b1284f0e950 in __read_nocancel () from /lib64/libc.so.6 >> #1 0x00000000004035ea in read_socket (socket=5, buffer=0x16dec8a0, bytes=640) at mpirun_util.c:97 >> #2 0x000000000040402f in mpispawn_tree_init (me=5, req_socket=383699104) at mpispawn_tree.c:190 >> #3 0x0000000000401a90 in main (argc=5, argv=0x16dec8a0) at mpispawn.c:496 >> >> At other end (node h4): >> >> (gdb) bt >> #0 0x00002b95b77308d3 in __select_nocancel () from /lib64/libc.so.6 >> #1 0x0000000000404379 in mtpmi_processops () at pmi_tree.c:754 >> #2 0x0000000000401c32 in main (argc=1024, argv=0x6101a0) at mpispawn.c:525 >> >> The netstat on h4 does not show any connections back to h43. >> >> I tried the latest 1.4Beta from the website (not svn) I found that >> for large jobs mpirun_rsh will sometimes exits without running anything. >> The large the job, the more likely it is to not to start the job properly. >> The only difference is that it doesn't hang. I turned on debugging with >> MPISPAWN_DEBUG, but I didn't see anything interesting from that. >> >> Craig >> >> >> >> >>> On Wed, 8 Jul 2009, Craig Tierney wrote: >>> >>>> I am running mvapich2 1.2, built with Ofed support (v1.3.1). >>>> For large jobs, I am having problems where they do not start. >>>> I am using the mpirun_rsh launcher. When I try to start jobs >>>> with ~512 cores or larger, I can see the problem. The problem >>>> doesn't happen all the time. >>>> >>>> I can't rule our quirky hardware. The IB tree seems to be >>>> clean (as reported by ibdiagnet). 
My last hang, I looked to >>>> see if xhpl had started on all the nodes (8 cases for each >>>> node for dual-socket quad-core systems). I found that 7 of >>>> the 245 nodes (1960 core job) had no xhpl processes on them. >>>> So either the launching mechanism hung, or something was up with one of >>>> those nodes. >>>> >>>> My question is, how should I start debugging this to understand >>>> what process is hanging? >>>> >>>> Thanks, >>>> Craig >>>> >>>> >>>> -- >>>> Craig Tierney (craig.tierney@noaa.gov) >>>> _______________________________________________ >>>> mvapich-discuss mailing list >>>> mvapich-discuss@cse.ohio-state.edu >>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >>>> >>> >> >> -- >> Craig Tierney (craig.tierney@noaa.gov) >> > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > -- Craig Tierney (craig.tierney@noaa.gov) From Craig.Tierney at noaa.gov Fri Jul 10 13:43:07 2009 From: Craig.Tierney at noaa.gov (Craig Tierney) Date: Fri Jul 10 13:43:31 2009 Subject: [mvapich-discuss] Question about NUMA support in mvapich2-1.2 (and later) In-Reply-To: <9314796A0C8D864B8CAD7C3BFA9CD90C212F8464AE@NL0105EXC01V01.eur.slb.com> References: <9314796A0C8D864B8CAD7C3BFA9CD90C212F8464AE@NL0105EXC01V01.eur.slb.com> Message-ID: <4A577DAB.4040801@noaa.gov> The mvapich2 documentation states: Optimized for Bus-based SMP and NUMA-Based SMP systems. But I cannot find any other reference to what exactly mvapich2 does for NUMA-Based systems (like nehalem). Simple tests have shown that I cannot use numactl to explicitly lay out MPI processes to improve performance for memory bandwidth sensitive applications over just running the application directly with mpirun. But I would like to understand what is being done by mvapich2. 

Thanks,
Craig

--
Craig Tierney (craig.tierney@noaa.gov)

From isono at cray.com  Mon Jul 13 00:37:41 2009
From: isono at cray.com (Satoshi Isono)
Date: Mon Jul 13 00:38:41 2009
Subject: [mvapich-discuss] Needs firewall setting for MPI
Message-ID: <925346A443D4E340BEB20248BAFCDBDF0BC3084C@CFEVS1-IP.americas.cray.com>

Hello everyone,

Does anyone know a firewall configuration that allows MVAPICH code to
run? Which ports do I have to open? I am using mpirun_rsh over SSH.
Passwordless ssh works:

[craysp@t2k-0004 ~]$ ssh t2k-ps1 hostname
t2k-ps1
[craysp@t2k-0004 ~]$ ssh t2k-ps2 hostname
t2k-ps2

t2k-0004 is the login node, where everyone launches the mpirun_rsh
command. The two nodes t2k-ps1 and t2k-ps2 are the compute nodes.
However, launching mpirun_rsh fails with the following messages:

gethostbyname: Host name lookup failure
Child exited abnormally!
cleanupKilling remote processes...gethostbyname: Host name lookup failure
DONE

It seems that hostname resolution is failing. Can you please advise me,
or point me to the files I should check?

Regards,
Satoshi Isono

From isono at cray.com  Mon Jul 13 01:07:03 2009
From: isono at cray.com (Satoshi Isono)
Date: Mon Jul 13 01:07:30 2009
Subject: [mvapich-discuss] Disable interactive login to compute node
References: <925346A443D4E340BEB20248BAFCDBDF0B60B8CB@CFEVS1-IP.americas.cray.com>
    <925346A443D4E340BEB20248BAFCDBDF0B7AA96A@CFEVS1-IP.americas.cray.com>
    <0E07074B82CE4B4A9982802A8484B6968364AD4419@EXCHANGE2K7.tacc.utexas.edu>
Message-ID: <925346A443D4E340BEB20248BAFCDBDF0BC30856@CFEVS1-IP.americas.cray.com>

Hello Bill, everyone,

Sorry, this issue may not be an MVAPICH topic. Please let me have your
opinion on it.

The background is that I use MVAPICH with SSH authorization, and there
are more than one thousand users. In order to run MPI, I have set up SSH
so that login between compute nodes does not need a password. As we
know, this is a common setting for an MPI run environment.

On the other hand, anyone who has an account on a compute node can log
in to arbitrary nodes. This is not addressed on the system security
side. I think users should not be able to log in to a node while other
users' jobs are running there.

My question is how other sites control this. I know it may depend on the
system policy. At a big site like TACC, how is this restricted?

For example, could we automatically edit the password file before and
after each MPI run to set which users are allowed?

Best regards,
Satoshi Isono

From wgy at altair.com.cn  Mon Jul 13 02:14:43 2009
From: wgy at altair.com.cn (Guangyu Wu)
Date: Mon Jul 13 02:15:11 2009
Subject: [mvapich-discuss] Disable interactive login to compute node
In-Reply-To: <925346A443D4E340BEB20248BAFCDBDF0BC30856@CFEVS1-IP.americas.cray.com>
References: <925346A443D4E340BEB20248BAFCDBDF0B60B8CB@CFEVS1-IP.americas.cray.com>
    <925346A443D4E340BEB20248BAFCDBDF0B7AA96A@CFEVS1-IP.americas.cray.com>
    <0E07074B82CE4B4A9982802A8484B6968364AD4419@EXCHANGE2K7.tacc.utexas.edu>
    <925346A443D4E340BEB20248BAFCDBDF0BC30856@CFEVS1-IP.americas.cray.com>
Message-ID:

Hello:

This has been a common security/management issue raised by admins. I am
not sure if anyone has addressed it without using a WMS. The latest
version of PBS Pro provides a simple way around it: the WMS daemon on
each node scans process owners, and if an owner has no job (submitted
through the WMS) running on that node, it kills that owner's processes.
This way even remote login is not possible. Of course there are some
exceptions, e.g. a user can log in to a node while his job is running
there, and admins can exempt some users from this behavior.

HTH
Henry, Wu

On 7/13/09, Satoshi Isono wrote:
> Hello Bill, everyone,
>
> Sorry, this issue may not be a MVAPICH article. Please let me have your
> opinion on this.
>
> The background is I use MVAPICH based on SSH authorization. And the
> number of users is more one thousand.
In order to run MPI, I have done
> SSH setting. As a result of my configuration, SSH login between compute
> nodes does not need password. As we know, this is general setting for
> MPI run environment.
>
> On the other hand, anyone who has an account on compute node has done
> login for arbitrary nodes. This action is not cared from system security
> side. I think we should consider that all users aren't able to login
> during other users job running.
>
> My concern is how everyone control such as operations. I know this may
> depend on the system policy. On big system site like a TACC, how is this
> restricted?
>
> For example, before/after running MPI, to set available user, we are
> able to edit password file, automatically?
>
> Best regards,
> Satoshi Isono
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From bbarth at tacc.utexas.edu  Mon Jul 13 08:34:31 2009
From: bbarth at tacc.utexas.edu (Bill Barth)
Date: Mon Jul 13 08:35:03 2009
Subject: [mvapich-discuss] RE: Disable interactive login to compute node
In-Reply-To: <925346A443D4E340BEB20248BAFCDBDF0BC30856@CFEVS1-IP.americas.cray.com>
References: <925346A443D4E340BEB20248BAFCDBDF0B60B8CB@CFEVS1-IP.americas.cray.com>
    <925346A443D4E340BEB20248BAFCDBDF0B7AA96A@CFEVS1-IP.americas.cray.com>
    <0E07074B82CE4B4A9982802A8484B6968364AD4419@EXCHANGE2K7.tacc.utexas.edu>
    <925346A443D4E340BEB20248BAFCDBDF0BC30856@CFEVS1-IP.americas.cray.com>
Message-ID: <0E07074B82CE4B4A9982802A8484B6968425F366DE@EXCHANGE2K7.tacc.utexas.edu>

Satoshi,

We solve this problem by having a PAM module that checks an SGE spool
directory for whether the user has a job on the node. If so, it allows
access; otherwise it does not. We could also have used the SGE prolog
script to modify /etc/security/access.conf to allow access and the epilog
to modify it to remove access.

Regards,
Bill.
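
The prolog/epilog alternative mentioned above might look roughly like the sketch below. It is only an illustration under stated assumptions: the file follows pam_access syntax (`permission : users : origins`, first matching rule wins) and ends with a catch-all deny rule such as `- : ALL EXCEPT root : ALL`. The `# sge-job <id>` marker line and the function names `grant` and `revoke` are hypothetical, not SGE or PAM features.

```python
"""Sketch of prolog/epilog edits to a pam_access style access.conf.

Assumptions (not from the thread): grants are placed at the top of the
file so they match before a catch-all deny rule, and each grant is
preceded by a comment line naming the job that created it.
"""
from typing import List


def _marker(job_id: str) -> str:
    # Hypothetical tag used to find the entry again at epilog time.
    return f"# sge-job {job_id}"


def grant(conf: str, user: str, job_id: str) -> str:
    """Prolog step: prepend an allow rule for `user`, tagged with the job."""
    lines = conf.splitlines()
    if _marker(job_id) in lines:
        return conf  # already granted for this job
    entry = [_marker(job_id), f"+ : {user} : ALL"]
    return "\n".join(entry + lines) + "\n"


def revoke(conf: str, job_id: str) -> str:
    """Epilog step: drop the marker for `job_id` and the rule after it."""
    out: List[str] = []
    skip = 0
    for line in conf.splitlines():
        if skip:
            skip -= 1  # this is the rule that belonged to the marker
            continue
        if line == _marker(job_id):
            skip = 1
            continue
        out.append(line)
    return "\n".join(out) + "\n"
```

Tagging each rule with a job id keeps a user's access in place when one of two jobs on the same node finishes. A real prolog would also need file locking and an atomic rewrite of the file, which this sketch leaves out.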

--
Bill Barth, Ph.D., Director, HPC
bbarth@tacc.utexas.edu        |   Phone: (512) 232-7069
Office: ROC 1.435             |   Fax:   (512) 475-9445

> -----Original Message-----
> From: Satoshi Isono [mailto:isono@cray.com]
> Sent: Monday, July 13, 2009 12:07 AM
> To: Bill Barth
> Cc: mvapich-discuss@cse.ohio-state.edu
> Subject: Disable interactive login to compute node
>
> Hello Bill, everyone,
>
> Sorry, this issue may not be a MVAPICH article. Please let me have your
> opinion on this.
>
> The background is I use MVAPICH based on SSH authorization. And the
> number of users is more one thousand. In order to run MPI, I have done
> SSH setting. As a result of my configuration, SSH login between compute
> nodes does not need password. As we know, this is general setting for
> MPI run environment.
>
> On the other hand, anyone who has an account on compute node has done
> login for arbitrary nodes. This action is not cared from system security
> side. I think we should consider that all users aren't able to login
> during other users job running.
>
> My concern is how everyone control such as operations. I know this may
> depend on the system policy. On big system site like a TACC, how is this
> restricted?
>
> For example, before/after running MPI, to set available user, we are
> able to edit password file, automatically?
>
> Best regards,
> Satoshi Isono

From isono at cray.com  Tue Jul 14 00:18:37 2009
From: isono at cray.com (Satoshi Isono)
Date: Tue Jul 14 00:19:05 2009
Subject: [mvapich-discuss] RE: Disable interactive login to compute node
In-Reply-To: <0E07074B82CE4B4A9982802A8484B6968425F366DE@EXCHANGE2K7.tacc.utexas.edu>
References: <925346A443D4E340BEB20248BAFCDBDF0B60B8CB@CFEVS1-IP.americas.cray.com>
    <925346A443D4E340BEB20248BAFCDBDF0B7AA96A@CFEVS1-IP.americas.cray.com>
    <0E07074B82CE4B4A9982802A8484B6968364AD4419@EXCHANGE2K7.tacc.utexas.edu>
    <925346A443D4E340BEB20248BAFCDBDF0BC30856@CFEVS1-IP.americas.cray.com>
    <0E07074B82CE4B4A9982802A8484B6968425F366DE@EXCHANGE2K7.tacc.utexas.edu>
Message-ID: <925346A443D4E340BEB20248BAFCDBDF0BCD60CD@CFEVS1-IP.americas.cray.com>

Hello Bill,

Thank you very much for the information. I think this is a good idea.
I am also familiar with prolog/epilog scripts. I will modify these
scripts to edit the database of users who are allowed SSH access.

Regards,
Satoshi Isono

-----Original Message-----
From: Bill Barth [mailto:bbarth@tacc.utexas.edu]
Sent: Monday, July 13, 2009 9:35 PM
To: Satoshi Isono
Cc: mvapich-discuss@cse.ohio-state.edu
Subject: RE: Disable interactive login to compute node

Satoshi,

We solve this problem by having a PAM module that checks an SGE spool
directory for whether the user has a job on the node. If it does, it
allows access, otherwise it does not. We could also have used the SGE
prolog script to modify /etc/security/access.conf to allow access and
epilog to modify it to remove access.

Regards,
Bill.

--
Bill Barth, Ph.D., Director, HPC
bbarth@tacc.utexas.edu        |   Phone: (512) 232-7069
Office: ROC 1.435             |   Fax:   (512) 475-9445

> -----Original Message-----
> From: Satoshi Isono [mailto:isono@cray.com]
> Sent: Monday, July 13, 2009 12:07 AM
> To: Bill Barth
> Cc: mvapich-discuss@cse.ohio-state.edu
> Subject: Disable interactive login to compute node
>
> Hello Bill, everyone,
>
> Sorry, this issue may not be a MVAPICH article.
Please let me have your > opinion on this. > > The background is I use MVAPICH based on SSH authorization. And the > number of users is more one thousand. In order to run MPI, I have done > SSH setting. As a result of my configuration, SSH login between compute > nodes does not need password. As we know, this is general setting for > MPI run environment. > > On the other hand, anyone who has an account on compute node has done > login for arbitrary nodes. This action is not cared from system > security > side. I think we should consider that all users aren't able to login > during other users job running. > > My concern is how everyone control such as operations. I know this may > depend on the system policy. On big system site like a TACC, how is > this > restricted? > > For example, before/after running MPI, to set available user, we are > able to edit password file, automatically? > > Best regards, > Satoshi Isono From isono at cray.com Tue Jul 14 00:28:07 2009 From: isono at cray.com (Satoshi Isono) Date: Tue Jul 14 00:28:36 2009 Subject: [mvapich-discuss] Disable interactive login to compute node In-Reply-To: References: <925346A443D4E340BEB20248BAFCDBDF0B60B8CB@CFEVS1-IP.americas.cray.com> <925346A443D4E340BEB20248BAFCDBDF0B7AA96A@CFEVS1-IP.americas.cray.com> <0E07074B82CE4B4A9982802A8484B6968364AD4419@EXCHANGE2K7.tacc.utexas.edu> <925346A443D4E340BEB20248BAFCDBDF0BC30856@CFEVS1-IP.americas.cray.com> Message-ID: <925346A443D4E340BEB20248BAFCDBDF0BCD60D0@CFEVS1-IP.americas.cray.com> Dear Guangyu, Thanks for your advice. From your view, I am interested in WMS. Is WMS contained in latest PBS pro package? Can we use WMS on other job scheduler like Sun Grid Engine? 
Regards, Satoshi Isono -----Original Message----- From: henry.wuguangyu@gmail.com [mailto:henry.wuguangyu@gmail.com] On Behalf Of Guangyu Wu Sent: Monday, July 13, 2009 3:15 PM To: Satoshi Isono; Bill Barth; mvapich-discuss@cse.ohio-state.edu Subject: Re: [mvapich-discuss] Disable interactive login to compute node Hello: This has been a common security/management issue raised by admins. Not sure if anyone has addressed it without using a WMS. The latest version of PBS Pro provides a way around where the methodology is simple, I.e, having the WMS daemon on nodes scanning processes owner, if the owner doesn't has a job (submitted thru WMS) running then kill the processes. This way even reomote login is not possible. Of course some exception are there, e.g. User can login a node during the time his job is running there. Admins can except some users who is not limited by this behavior. HTH Henry, Wu On 7/13/09, Satoshi Isono wrote: > Hello Bill, everyone, > > Sorry, this issue may not be a MVAPICH article. Please let me have your > opinion on this. > > The background is I use MVAPICH based on SSH authorization. And the > number of users is more one thousand. In order to run MPI, I have done > SSH setting. As a result of my configuration, SSH login between compute > nodes does not need password. As we know, this is general setting for > MPI run environment. > > On the other hand, anyone who has an account on compute node has done > login for arbitrary nodes. This action is not cared from system security > side. I think we should consider that all users aren't able to login > during other users job running. > > My concern is how everyone control such as operations. I know this may > depend on the system policy. On big system site like a TACC, how is this > restricted? > > For example, before/after running MPI, to set available user, we are > able to edit password file, automatically? 
>
> Best regards,
> Satoshi Isono
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From wgy at altair.com.cn  Tue Jul 14 03:48:13 2009
From: wgy at altair.com.cn (Guangyu Wu)
Date: Tue Jul 14 03:48:41 2009
Subject: Re: [mvapich-discuss] Disable interactive login to compute node
In-Reply-To: <925346A443D4E340BEB20248BAFCDBDF0BCD60D0@CFEVS1-IP.americas.cray.com>
References: <925346A443D4E340BEB20248BAFCDBDF0B60B8CB@CFEVS1-IP.americas.cray.com>
    <925346A443D4E340BEB20248BAFCDBDF0B7AA96A@CFEVS1-IP.americas.cray.com>
    <0E07074B82CE4B4A9982802A8484B6968364AD4419@EXCHANGE2K7.tacc.utexas.edu>
    <925346A443D4E340BEB20248BAFCDBDF0BC30856@CFEVS1-IP.americas.cray.com>
    <925346A443D4E340BEB20248BAFCDBDF0BCD60D0@CFEVS1-IP.americas.cray.com>
Message-ID:

Hello, Satoshi:

WMS means workload management system, such as PBS Pro, LSF, SGE, OpenPBS,
etc.; it is not a module. The latest PBS Pro package provides
functionality to control access to compute nodes. It's not perfect, but
it works very well and provides great flexibility. If you would like to
learn more about this, you can find the PBS Pro admin guide via Google.

What Bill suggested should work. However, it would take considerable
scripting effort to gain robustness and the necessary flexibility.

Thanks
Wu

-----Original Message-----
From: Satoshi Isono [mailto:isono@cray.com]
Sent: July 14, 2009 12:28
To: Guangyu Wu
Cc: Bill Barth; mvapich-discuss@cse.ohio-state.edu
Subject: RE: [mvapich-discuss] Disable interactive login to compute node

Dear Guangyu,

Thanks for your advice. From your view, I am interested in WMS. Is WMS
contained in latest PBS pro package? Can we use WMS on other job
scheduler like Sun Grid Engine?
Regards,
Satoshi Isono

-----Original Message-----
From: henry.wuguangyu@gmail.com [mailto:henry.wuguangyu@gmail.com] On Behalf Of Guangyu Wu
Sent: Monday, July 13, 2009 3:15 PM
To: Satoshi Isono; Bill Barth; mvapich-discuss@cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Disable interactive login to compute node

Hello:

This has been a common security/management issue raised by admins. Not sure if anyone has addressed it without using a WMS. The latest version of PBS Pro provides a workaround whose methodology is simple, i.e., the WMS daemon on each node scans process owners, and if an owner doesn't have a job (submitted through the WMS) running there, it kills that owner's processes. This way even remote login is not possible. Of course there are some exceptions, e.g. a user can log in to a node while his job is running there. Admins can also exempt some users so that they are not limited by this behavior.

HTH
Henry, Wu

On 7/13/09, Satoshi Isono wrote:
> Hello Bill, everyone,
>
> Sorry, this issue may not be a MVAPICH article. Please let me have your
> opinion on this.
>
> The background is I use MVAPICH based on SSH authorization. And the
> number of users is more one thousand. In order to run MPI, I have done
> SSH setting. As a result of my configuration, SSH login between compute
> nodes does not need password. As we know, this is general setting for
> MPI run environment.
>
> On the other hand, anyone who has an account on compute node has done
> login for arbitrary nodes. This action is not cared from system security
> side. I think we should consider that all users aren't able to login
> during other users job running.
>
> My concern is how everyone control such as operations. I know this may
> depend on the system policy. On big system site like a TACC, how is this
> restricted?
>
> For example, before/after running MPI, to set available user, we are
> able to edit password file, automatically?
> > Best regards, > Satoshi Isono > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From leahy at crystal.harvard.edu Tue Jul 14 12:23:11 2009 From: leahy at crystal.harvard.edu (Jerry Leahy) Date: Tue Jul 14 13:31:44 2009 Subject: [mvapich-discuss] Problem compiling gromacs-4.0.5 against mvapich2-1.4rc1 Message-ID: <12a5e6520907140923o4dd786b5ya9dca1953a4db530@mail.gmail.com> Hello, I have built MVAPICH2-1.4rc1 on a CentOS 5.3 box and am trying to build gromacs-4.0.5 with MPI support. make is failing with the following error message: mpicc -O3 -fomit-frame-pointer -finline-functions -Wall -Wno-unused -funroll-all-loops -std=gnu99 -o grompp grompp.o ./.libs/libgmxpreprocess_mpi.a -L/usr/lib64 ../mdlib/.libs/libmd_mpi.a /opt/osg-shared/se/data/shared/install/gromacs-4.0.5/src/gmxlib/.libs/libgmx_mpi.a ../gmxlib/.libs/libgmx_mpi.a -lxml2 -lnsl -lfftw3f -lm -lX11 /usr/local/lib/libmpich.a(ibv_channel_manager.o):(.bss+0x10): multiple definition of `debug' <..snip..>/gromacs-4.0.5/src/gmxlib/.libs/libgmx_mpi.a(gmx_fatal.o):(.bss+0x0): first defined here collect2: ld returned 1 exit status make[3]: *** [grompp] Error 1 It looks like 'debug' is conflicting in both MVAPICH2 and in Gromacs. Any suggestions? Thanks, Jerry. Research Systems Administrator SBGrid -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090714/36bc39f7/attachment.html From michael.heinz at qlogic.com Tue Jul 14 17:04:29 2009 From: michael.heinz at qlogic.com (Mike Heinz) Date: Tue Jul 14 17:14:33 2009 Subject: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster. 
Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F296D@MNEXMB1.qlogic.org> We're having a very odd problem with our fabric, where, out of the entire cluster, machine "A" can't run mvapich2 programs with machine "B", and machine "C" can't run programs with machine "D" - even though "A" can run with "D" and "B" can run with "C" - and the rest of the fabric works fine. 1) There are no IB errors anywhere on the fabric that I can find, and the machines in question all work correctly with mvapich1 and low-level IB tests. 2) The problem occurs whether using mpd or rsh. 3) If I attach to the running processes, both machines appear to be waiting for a read operation to complete. (See below) Can anyone make a suggestion on how to debug this? Stack trace for node 0: #0 0x000000361160abb5 in pthread_spin_lock () from /lib64/libpthread.so.0 #1 0x00002aaaab08fb6c in mthca_poll_cq (ibcq=0x2060980, ne=1, wc=0x7fff9d835900) at src/cq.c:468 #2 0x00002aaaaab5d8d8 in MPIDI_CH3I_MRAILI_Cq_poll ( vbuf_handle=0x7fff9d8359d8, vc_req=0x0, receiving=0, is_blocking=1) at /usr/include/infiniband/verbs.h:934 #3 0x00002aaaaab177fa in MPIDI_CH3I_read_progress (vc_pptr=0x7fff9d8359e0, v_ptr=0x7fff9d8359d8, is_blocking=1) at ch3_read_progress.c:143 #4 0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1, state=) at ch3_progress.c:202 #5 0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800) at helper_fns.c:269 #6 0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x217fc50, sendcount=2, sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x217fc58, recvcount=2, recvtype=1275069445, source=1, recvtag=7, comm=1140850688, status=0x7fff9d835b60) at helper_fns.c:125 #7 0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=, sendcount=, sendtype=1275069445, recvbuf=0x217fc50, recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0) at allgather.c:192 #8 0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff, sendcount=2, sendtype=1275069445, recvbuf=0x217fc50, recvcount=2, recvtype=1275069445, 
comm=1140850688) at allgather.c:866 ---Type to continue, or q to quit--- #9 0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, key=0, newcomm=0x2aaaaae1c2f4) at comm_split.c:196 #10 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2, my_rank=) at create_2level_comm.c:142 #11 0x00002aaaaab6877d in PMPI_Init (argc=0x7fff9d835e7c, argv=0x7fff9d835e70) at init.c:146 #12 0x0000000000400b2f in main (argc=3, argv=0x7fff9d835fb8) at bw.c:27 Stack trace for node 1: #0 0x00002ac3cbdac2d2 in MPIDI_CH3I_read_progress (vc_pptr=0x7fffdee81020, v_ptr=0x7fffdee81018, is_blocking=1) at ch3_read_progress.c:143 #1 0x00002ac3cbdabf44 in MPIDI_CH3I_Progress (is_blocking=1, state=) at ch3_progress.c:202 #2 0x00002ac3cbdf060e in MPIC_Wait (request_ptr=0x2ac3cbfae2a0) at helper_fns.c:269 #3 0x00002ac3cbdf0a03 in MPIC_Sendrecv (sendbuf=0xf79028, sendcount=2, sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf79020, recvcount=4, recvtype=1275069445, source=0, recvtag=7, comm=1140850688, status=0x7fffdee811a0) at helper_fns.c:125 #4 0x00002ac3cbd92ddb in MPIR_Allgather (sendbuf=, sendcount=, sendtype=1275069445, recvbuf=0xf79020, recvcount=2, recvtype=1275069445, comm_ptr=0x2ac3cbfb0c80) at allgather.c:192 #5 0x00002ac3cbd93a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff, sendcount=2, sendtype=1275069445, recvbuf=0xf79020, recvcount=2, recvtype=1275069445, comm=1140850688) at allgather.c:866 #6 0x00002ac3cbdcf91b in PMPI_Comm_split (comm=1140850688, color=1, key=0, newcomm=0x2ac3cbfb0d94) at comm_split.c:196 #7 0x00002ac3cbdd18f4 in create_2level_comm (comm=1140850688, size=2, my_rank=) at create_2level_comm.c:142 #8 0x00002ac3cbdfd0a5 in PMPI_Init (argc=0x7fffdee814bc, argv=0x7fffdee814b0) at init.c:146 ---Type to continue, or q to quit--- #9 0x0000000000400bcf in main (argc=3, argv=0x7fffdee815f8) at bw.c:27 -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -------------- next part -------------- An HTML attachment was 
scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090714/51ae6f42/attachment-0001.html

From kandalla at cse.ohio-state.edu Tue Jul 14 18:38:40 2009
From: kandalla at cse.ohio-state.edu (Krishna Chaitanya)
Date: Tue Jul 14 18:39:04 2009
Subject: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
In-Reply-To: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F296D@MNEXMB1.qlogic.org>
References: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F296D@MNEXMB1.qlogic.org>
Message-ID:

Mike,

The hang seems to be occurring when the MPI library is trying to create the 2-level communicator, during the init phase. Can you try running the test with MV2_USE_SHMEM_COLL=0? This will ensure that a flat communicator is used for the subsequent MPI calls. This might help us isolate the problem.

Thanks,
Krishna

On Tue, Jul 14, 2009 at 5:04 PM, Mike Heinz wrote:
> We're having a very odd problem with our fabric, where, out of the entire
> cluster, machine "A" can't run mvapich2 programs with machine "B", and
> machine "C" can't run programs with machine "D" - even though "A" can run
> with "D" and "B" can run with "C" - and the rest of the fabric works fine.
>
>
>
> 1) There are no IB errors anywhere on the fabric that I can find, and
> the machines in question all work correctly with mvapich1 and low-level IB
> tests.
>
> 2) The problem occurs whether using mpd or rsh.
>
> 3) If I attach to the running processes, both machines appear to be
> waiting for a read operation to complete. (See below)
>
>
>
> Can anyone make a suggestion on how to debug this?
> > > > Stack trace for node 0: > > > > #0 0x000000361160abb5 in pthread_spin_lock () from /lib64/libpthread.so.0 > > #1 0x00002aaaab08fb6c in mthca_poll_cq (ibcq=0x2060980, ne=1, > > wc=0x7fff9d835900) at src/cq.c:468 > > #2 0x00002aaaaab5d8d8 in MPIDI_CH3I_MRAILI_Cq_poll ( > > vbuf_handle=0x7fff9d8359d8, vc_req=0x0, receiving=0, is_blocking=1) > > at /usr/include/infiniband/verbs.h:934 > > #3 0x00002aaaaab177fa in MPIDI_CH3I_read_progress (vc_pptr=0x7fff9d8359e0, > > v_ptr=0x7fff9d8359d8, is_blocking=1) at ch3_read_progress.c:143 > > #4 0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1, > > state=) at ch3_progress.c:202 > > #5 0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800) > > at helper_fns.c:269 > > #6 0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x217fc50, sendcount=2, > > sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x217fc58, recvcount=2, > > recvtype=1275069445, source=1, recvtag=7, comm=1140850688, > > status=0x7fff9d835b60) at helper_fns.c:125 > > #7 0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=, > > sendcount=, sendtype=1275069445, > recvbuf=0x217fc50, > > recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0) > > at allgather.c:192 > > #8 0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff, > > sendcount=2, sendtype=1275069445, recvbuf=0x217fc50, recvcount=2, > > recvtype=1275069445, comm=1140850688) at allgather.c:866 > > ---Type to continue, or q to quit--- > > #9 0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, key=0, > > newcomm=0x2aaaaae1c2f4) at comm_split.c:196 > > #10 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2, > > my_rank=) at create_2level_comm.c:142 > > #11 0x00002aaaaab6877d in PMPI_Init (argc=0x7fff9d835e7c, > argv=0x7fff9d835e70) > > at init.c:146 > > #12 0x0000000000400b2f in main (argc=3, argv=0x7fff9d835fb8) at bw.c:27 > > > > Stack trace for node 1: > > > > #0 0x00002ac3cbdac2d2 in MPIDI_CH3I_read_progress (vc_pptr=0x7fffdee81020, > > v_ptr=0x7fffdee81018, 
is_blocking=1) at ch3_read_progress.c:143 > > #1 0x00002ac3cbdabf44 in MPIDI_CH3I_Progress (is_blocking=1, > > state=) at ch3_progress.c:202 > > #2 0x00002ac3cbdf060e in MPIC_Wait (request_ptr=0x2ac3cbfae2a0) > > at helper_fns.c:269 > > #3 0x00002ac3cbdf0a03 in MPIC_Sendrecv (sendbuf=0xf79028, sendcount=2, > > sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf79020, recvcount=4, > > recvtype=1275069445, source=0, recvtag=7, comm=1140850688, > > status=0x7fffdee811a0) at helper_fns.c:125 > > #4 0x00002ac3cbd92ddb in MPIR_Allgather (sendbuf=, > > sendcount=, sendtype=1275069445, recvbuf=0xf79020, > > recvcount=2, recvtype=1275069445, comm_ptr=0x2ac3cbfb0c80) > > at allgather.c:192 > > #5 0x00002ac3cbd93a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff, > > sendcount=2, sendtype=1275069445, recvbuf=0xf79020, recvcount=2, > > recvtype=1275069445, comm=1140850688) at allgather.c:866 > > #6 0x00002ac3cbdcf91b in PMPI_Comm_split (comm=1140850688, color=1, key=0, > > newcomm=0x2ac3cbfb0d94) at comm_split.c:196 > > #7 0x00002ac3cbdd18f4 in create_2level_comm (comm=1140850688, size=2, > > my_rank=) at create_2level_comm.c:142 > > #8 0x00002ac3cbdfd0a5 in PMPI_Init (argc=0x7fffdee814bc, > argv=0x7fffdee814b0) > > at init.c:146 > > ---Type to continue, or q to quit--- > > #9 0x0000000000400bcf in main (argc=3, argv=0x7fffdee815f8) at bw.c:27 > > -- > > Michael Heinz > > Principal Engineer, Qlogic Corporation > > King of Prussia, Pennsylvania > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > -- In the middle of difficulty, lies opportunity -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090714/de258c01/attachment.html

From isono at cray.com Wed Jul 15 00:50:56 2009
From: isono at cray.com (Satoshi Isono)
Date: Wed Jul 15 00:51:33 2009
Subject: [mvapich-discuss] Disable interactive login to compute node
In-Reply-To:
References: <925346A443D4E340BEB20248BAFCDBDF0B60B8CB@CFEVS1-IP.americas.cray.com> <925346A443D4E340BEB20248BAFCDBDF0B7AA96A@CFEVS1-IP.americas.cray.com> <0E07074B82CE4B4A9982802A8484B6968364AD4419@EXCHANGE2K7.tacc.utexas.edu> <925346A443D4E340BEB20248BAFCDBDF0BC30856@CFEVS1-IP.americas.cray.com> <925346A443D4E340BEB20248BAFCDBDF0BCD60D0@CFEVS1-IP.americas.cray.com>
Message-ID: <925346A443D4E340BEB20248BAFCDBDF0BD46F4F@CFEVS1-IP.americas.cray.com>

Hello Guangyu,

I appreciate your advice. I had misunderstood. WMS is not a product name. WMS is short for Workload Management Solution.

Regards,
Satoshi Isono

-----Original Message-----
From: Guangyu Wu [mailto:wgy@altair.com.cn]
Sent: Tuesday, July 14, 2009 4:48 PM
To: Satoshi Isono
Cc: 'Bill Barth'; mvapich-discuss@cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Disable interactive login to compute node

Hello, Satoshi:

WMS means workload management system, such as PBS Pro, LSF, SGE, OpenPBS, etc. It is not a module, though. The latest PBS Pro package provides functionality to control access to compute nodes. It's not perfect, but it works very well and provides great flexibility. If you would like to learn more about this, you can find the PBS Pro admin guide via Google.

What Bill suggested should work. However, it would take considerable scripting effort to gain robustness and the necessary flexibility.

Thanks
Wu

-----Original Message-----
From: Satoshi Isono [mailto:isono@cray.com]
Sent: July 14, 2009 12:28
To: Guangyu Wu
Cc: Bill Barth; mvapich-discuss@cse.ohio-state.edu
Subject: RE: [mvapich-discuss] Disable interactive login to compute node

Dear Guangyu,

Thanks for your advice. From your view, I am interested in WMS.
Is WMS contained in latest PBS pro package? Can we use WMS on other job scheduler like Sun Grid Engine?

Regards,
Satoshi Isono

-----Original Message-----
From: henry.wuguangyu@gmail.com [mailto:henry.wuguangyu@gmail.com] On Behalf Of Guangyu Wu
Sent: Monday, July 13, 2009 3:15 PM
To: Satoshi Isono; Bill Barth; mvapich-discuss@cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Disable interactive login to compute node

Hello:

This has been a common security/management issue raised by admins. Not sure if anyone has addressed it without using a WMS. The latest version of PBS Pro provides a workaround whose methodology is simple, i.e., the WMS daemon on each node scans process owners, and if an owner doesn't have a job (submitted through the WMS) running there, it kills that owner's processes. This way even remote login is not possible. Of course there are some exceptions, e.g. a user can log in to a node while his job is running there. Admins can also exempt some users so that they are not limited by this behavior.

HTH
Henry, Wu

On 7/13/09, Satoshi Isono wrote:
> Hello Bill, everyone,
>
> Sorry, this issue may not be a MVAPICH article. Please let me have your
> opinion on this.
>
> The background is I use MVAPICH based on SSH authorization. And the
> number of users is more one thousand. In order to run MPI, I have done
> SSH setting. As a result of my configuration, SSH login between compute
> nodes does not need password. As we know, this is general setting for
> MPI run environment.
>
> On the other hand, anyone who has an account on compute node has done
> login for arbitrary nodes. This action is not cared from system security
> side. I think we should consider that all users aren't able to login
> during other users job running.
>
> My concern is how everyone control such as operations. I know this may
> depend on the system policy. On big system site like a TACC, how is this
> restricted?
> > For example, before/after running MPI, to set available user, we are > able to edit password file, automatically? > > Best regards, > Satoshi Isono > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From perkinjo at cse.ohio-state.edu Wed Jul 15 14:23:57 2009 From: perkinjo at cse.ohio-state.edu (Jonathan Perkins) Date: Wed Jul 15 14:24:22 2009 Subject: [mvapich-discuss] OSU MVAPICH2 1.4-RC1-3378 (06/02/09) VPATH build and "debug" static? In-Reply-To: References: Message-ID: <20090715182357.GD2455@cse.ohio-state.edu> Brad: Sorry for the late reply, my responses are inline. On Tue, Jul 07, 2009 at 11:29:02PM -0700, Brad Penoff wrote: > hey, > > I'm not sure if these is are known or legitimate issues or if it's > particular to my system, but I was not able to do a VPATH build with > your latest MVAPICH2 tarball nor build my application. These were > fixed after two work-arounds. I wondered if these work-arounds were > known or if I did something wrong in the first place. > > ----Issue #1---- > > I downloaded http://mvapich.cse.ohio-state.edu/download/mvapich2/mvapich2-1.4rc1-3378.tgz > and then tar zxf, cd mvapich2-1.4rc1, and then mkdir build. From > inside build, I did a VPATH build by configuring to create 32-bit > libraries the following way on my 64-bit machine (Red Hat Enterprise > Linux Server release 5.1 (Tikanga)): > > $ ../configure CFLAGS=-m32 CPPFLAGS=-m32 FC=gfortran F90=gfortran > FFLAGS=-m32 F90FLAGS=-m32 LDFLAGS=-m32 > --prefix=/home/penoff/installs/mvapich2-1.4rc1 > > Eventually the "make" died with the error below. When I did the same > configure line but instead did not to a VPATH build (so from > mvapich2-1.4rc1), the build succeeded as did the "make install". I was > just wondering if this was a known issue. The error I saw is below. I'll see if I can reproduce this and if so, resolve it. 
>
> ----- Issue #2 ----
> Once installed, we compiled our code. We have a function called
> debug() in our code somewhere. It was conflicting when compiling with
> an internal variable of your code. I'm not sure who is at fault here,
> but instead of renaming our function and adjusting all of our code in
> countless places, instead to fix this, I just made the long variable
> "debug" in src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:503
> to be static by putting that keyword at the beginning of the line.

I don't think this is anyone's "fault", but we can probably spare you and other users this problem on our side by avoiding unnecessary namespace pollution.

>
>
>
> Are these fixes necessary or am I doing something wrong to begin with?

It looks like they are in your case. I'll double check that there aren't any unwanted side effects of marking debug as static and get back to you.

>
> Thanks,
> brad
>
>
> make[3]: Entering directory
> `/home/penoff/src/mvapich2-1.4rc1/build/src/pm/mpirun'
> gcc -DHAVE_CONFIG_H -I.
-I../../../../src/pm/mpirun
> -I/home/penoff/src/mvapich2-1.4rc1/build/src/pm/mpirun/../../../../src/pm/mpirun/include
> -m32 -DNDEBUG -O2 -m32
> -I/home/penoff/src/mvapich2-1.4rc1/build/src/mpid/ch3/include
> -I/home/penoff/src/mvapich2-1.4rc1/src/mpid/ch3/include
> -I/home/penoff/src/mvapich2-1.4rc1/build/src/mpid/common/datatype
> -I/home/penoff/src/mvapich2-1.4rc1/src/mpid/common/datatype
> -I/home/penoff/src/mvapich2-1.4rc1/build/src/mpid/common/locks
> -I/home/penoff/src/mvapich2-1.4rc1/src/mpid/common/locks
> -I/home/penoff/src/mvapich2-1.4rc1/build/src/mpid/ch3/channels/mrail/include
> -I/home/penoff/src/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include
> -I/home/penoff/src/mvapich2-1.4rc1/build/src/mpid/ch3/channels/mrail/src/gen2
> -I/home/penoff/src/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2
> -I/home/penoff/src/mvapich2-1.4rc1/build/src/mpid/common/locks
> -I/home/penoff/src/mvapich2-1.4rc1/src/mpid/common/locks -c
> ../../../../src/pm/mpirun/mpirun_rsh.c
> ../../../../src/pm/mpirun/mpirun_rsh.c:27:24: error: mpirunconf.h: No
> such file or directory
> ../../../../src/pm/mpirun/mpirun_rsh.c:272: error: expected identifier
> or '(' before '__extension__'
> make[3]: *** [mpirun_rsh.o] Error 1
> make[3]: Leaving directory
> `/home/penoff/src/mvapich2-1.4rc1/build/src/pm/mpirun'
> make[2]: *** [all-redirect] Error 2
> make[2]: Leaving directory `/home/penoff/src/mvapich2-1.4rc1/build/src/pm'
> make[1]: *** [all-redirect] Error 2
> make[1]: Leaving directory `/home/penoff/src/mvapich2-1.4rc1/build/src'
> make: *** [all-redirect] Error 2
> [penoff@hpc0001 build]$ find ..
-name mpirunconf.h > ../build/src/pm/mpirun/include/mpirunconf.h > [penoff@hpc0001 build]$ > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss -- Jonathan Perkins http://www.cse.ohio-state.edu/~perkinjo -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090715/80e7dd9e/attachment-0001.bin From michael.heinz at qlogic.com Wed Jul 15 15:02:03 2009 From: michael.heinz at qlogic.com (Mike Heinz) Date: Wed Jul 15 15:03:41 2009 Subject: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster. In-Reply-To: References: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F296D@MNEXMB1.qlogic.org> Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A0F@MNEXMB1.qlogic.org> Krishna, thanks for the suggestion - but setting MV2_USE_SHMEM_COLL to zero did not seem to change the stack trace much: Node 0: 0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll (vbuf_handle=0x7fffcb46d698, vc_req=0x0, receiving=0, is_blocking=1) at ibv_channel_manager.c:529 529 for (; i < rdma_num_hcas; ++i) { (gdb) where #0 0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll ( vbuf_handle=0x7fffcb46d698, vc_req=0x0, receiving=0, is_blocking=1) at ibv_channel_manager.c:529 #1 0x00002aaaaab177fa in MPIDI_CH3I_read_progress (vc_pptr=0x7fffcb46d6a0, v_ptr=0x7fffcb46d698, is_blocking=1) at ch3_read_progress.c:143 #2 0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1, state=) at ch3_progress.c:202 #3 0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800) at helper_fns.c:269 #4 0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x10993a80, sendcount=2, sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x10993a88, recvcount=2, recvtype=1275069445, 
source=1, recvtag=7, comm=1140850688, status=0x7fffcb46d820) at helper_fns.c:125 #5 0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=, sendcount=, sendtype=1275069445, recvbuf=0x10993a80, recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0) at allgather.c:192 #6 0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff, sendcount=2, sendtype=1275069445, recvbuf=0x10993a80, recvcount=2, recvtype=1275069445, comm=1140850688) at allgather.c:866 #7 0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, key=0, newcomm=0x2aaaaae1c2f4) at comm_split.c:196 #8 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2, ---Type to continue, or q to quit--- my_rank=) at create_2level_comm.c:142 #9 0x00002aaaaab6877d in PMPI_Init (argc=0x7fffcb46db3c, argv=0x7fffcb46db30) at init.c:146 #10 0x0000000000400b2f in main (argc=3, argv=0x7fffcb46dc78) at bw.c:27 Node 1: MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, v_ptr=0x7fff0b10bb48, is_blocking=1) at ch3_read_progress.c:143 143 type = MPIDI_CH3I_MRAILI_Cq_poll(v_ptr, NULL, 0, is_blocking); (gdb) where #0 MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, v_ptr=0x7fff0b10bb48, is_blocking=1) at ch3_read_progress.c:143 #1 0x00002afc9fb21f44 in MPIDI_CH3I_Progress (is_blocking=1, state=) at ch3_progress.c:202 #2 0x00002afc9fb6660e in MPIC_Wait (request_ptr=0x2afc9fd242a0) at helper_fns.c:269 #3 0x00002afc9fb66a03 in MPIC_Sendrecv (sendbuf=0xf77028, sendcount=2, sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf77020, recvcount=4, recvtype=1275069445, source=0, recvtag=7, comm=1140850688, status=0x7fff0b10bcd0) at helper_fns.c:125 #4 0x00002afc9fb08ddb in MPIR_Allgather (sendbuf=, sendcount=, sendtype=1275069445, recvbuf=0xf77020, recvcount=2, recvtype=1275069445, comm_ptr=0x2afc9fd26c80) at allgather.c:192 #5 0x00002afc9fb09a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff, sendcount=2, sendtype=1275069445, recvbuf=0xf77020, recvcount=2, recvtype=1275069445, comm=1140850688) at allgather.c:866 #6 
0x00002afc9fb4591b in PMPI_Comm_split (comm=1140850688, color=1, key=0, newcomm=0x2afc9fd26d94) at comm_split.c:196 #7 0x00002afc9fb478f4 in create_2level_comm (comm=1140850688, size=2, my_rank=) at create_2level_comm.c:142 #8 0x00002afc9fb730a5 in PMPI_Init (argc=0x7fff0b10bfec, argv=0x7fff0b10bfe0) at init.c:146 ---Type to continue, or q to quit--- #9 0x0000000000400bcf in main (argc=3, argv=0x7fff0b10c128) at bw.c:27 Any suggestions would be appreciated. -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania From: kris.c1986@gmail.com [mailto:kris.c1986@gmail.com] On Behalf Of Krishna Chaitanya Sent: Tuesday, July 14, 2009 6:39 PM To: Mike Heinz Cc: Todd Rimmer; mvapich-discuss@cse.ohio-state.edu; mpich2-dev@mcs.anl.gov Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster. Mike, The hang seems to be occuring when the MPI library is trying to create the 2-level communicator, during the init phase. Can you try running the test with MV2_USE_SHMEM_COLL=0. This will ensure that a flat communicator is used for the subsequent MPI calls. This might help us isolate the problem. Thanks, Krishna On Tue, Jul 14, 2009 at 5:04 PM, Mike Heinz > wrote: We're having a very odd problem with our fabric, where, out of the entire cluster, machine "A" can't run mvapich2 programs with machine "B", and machine "C" can't run programs with machine "D" - even though "A" can run with "D" and "B" can run with "C" - and the rest of the fabric works fine. 1) There are no IB errors anywhere on the fabric that I can find, and the machines in question all work correctly with mvapich1 and low-level IB tests. 2) The problem occurs whether using mpd or rsh. 3) If I attach to the running processes, both machines appear to be waiting for a read operation to complete. (See below) Can anyone make a suggestion on how to debug this? 
Stack trace for node 0: #0 0x000000361160abb5 in pthread_spin_lock () from /lib64/libpthread.so.0 #1 0x00002aaaab08fb6c in mthca_poll_cq (ibcq=0x2060980, ne=1, wc=0x7fff9d835900) at src/cq.c:468 #2 0x00002aaaaab5d8d8 in MPIDI_CH3I_MRAILI_Cq_poll ( vbuf_handle=0x7fff9d8359d8, vc_req=0x0, receiving=0, is_blocking=1) at /usr/include/infiniband/verbs.h:934 #3 0x00002aaaaab177fa in MPIDI_CH3I_read_progress (vc_pptr=0x7fff9d8359e0, v_ptr=0x7fff9d8359d8, is_blocking=1) at ch3_read_progress.c:143 #4 0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1, state=) at ch3_progress.c:202 #5 0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800) at helper_fns.c:269 #6 0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x217fc50, sendcount=2, sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x217fc58, recvcount=2, recvtype=1275069445, source=1, recvtag=7, comm=1140850688, status=0x7fff9d835b60) at helper_fns.c:125 #7 0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=, sendcount=, sendtype=1275069445, recvbuf=0x217fc50, recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0) at allgather.c:192 #8 0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff, sendcount=2, sendtype=1275069445, recvbuf=0x217fc50, recvcount=2, recvtype=1275069445, comm=1140850688) at allgather.c:866 ---Type to continue, or q to quit--- #9 0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, key=0, newcomm=0x2aaaaae1c2f4) at comm_split.c:196 #10 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2, my_rank=) at create_2level_comm.c:142 #11 0x00002aaaaab6877d in PMPI_Init (argc=0x7fff9d835e7c, argv=0x7fff9d835e70) at init.c:146 #12 0x0000000000400b2f in main (argc=3, argv=0x7fff9d835fb8) at bw.c:27 Stack trace for node 1: #0 0x00002ac3cbdac2d2 in MPIDI_CH3I_read_progress (vc_pptr=0x7fffdee81020, v_ptr=0x7fffdee81018, is_blocking=1) at ch3_read_progress.c:143 #1 0x00002ac3cbdabf44 in MPIDI_CH3I_Progress (is_blocking=1, state=) at ch3_progress.c:202 #2 0x00002ac3cbdf060e in 
MPIC_Wait (request_ptr=0x2ac3cbfae2a0) at helper_fns.c:269 #3 0x00002ac3cbdf0a03 in MPIC_Sendrecv (sendbuf=0xf79028, sendcount=2, sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf79020, recvcount=4, recvtype=1275069445, source=0, recvtag=7, comm=1140850688, status=0x7fffdee811a0) at helper_fns.c:125 #4 0x00002ac3cbd92ddb in MPIR_Allgather (sendbuf=, sendcount=, sendtype=1275069445, recvbuf=0xf79020, recvcount=2, recvtype=1275069445, comm_ptr=0x2ac3cbfb0c80) at allgather.c:192 #5 0x00002ac3cbd93a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff, sendcount=2, sendtype=1275069445, recvbuf=0xf79020, recvcount=2, recvtype=1275069445, comm=1140850688) at allgather.c:866 #6 0x00002ac3cbdcf91b in PMPI_Comm_split (comm=1140850688, color=1, key=0, newcomm=0x2ac3cbfb0d94) at comm_split.c:196 #7 0x00002ac3cbdd18f4 in create_2level_comm (comm=1140850688, size=2, my_rank=) at create_2level_comm.c:142 #8 0x00002ac3cbdfd0a5 in PMPI_Init (argc=0x7fffdee814bc, argv=0x7fffdee814b0) at init.c:146 ---Type to continue, or q to quit--- #9 0x0000000000400bcf in main (argc=3, argv=0x7fffdee815f8) at bw.c:27 -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania _______________________________________________ mvapich-discuss mailing list mvapich-discuss@cse.ohio-state.edu http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss -- In the middle of difficulty, lies opportunity -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090715/6b85a1bc/attachment-0001.html From kandalla at cse.ohio-state.edu Wed Jul 15 16:13:12 2009 From: kandalla at cse.ohio-state.edu (Krishna Chaitanya Kandalla) Date: Wed Jul 15 16:13:43 2009 Subject: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster. 
In-Reply-To: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A2C@MNEXMB1.qlogic.org>
References: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F296D@MNEXMB1.qlogic.org>
    <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A0F@MNEXMB1.qlogic.org>
    <4A5E30F5.8070302@cse.ohio-state.edu>
    <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A2C@MNEXMB1.qlogic.org>
Message-ID: <4A5E3858.5010702@cse.ohio-state.edu>

Mike,
Thank you for providing the source code. I am able to reproduce the hang on our cluster as well. I will look into the issue.

Thanks,
Krishna

Mike Heinz wrote:
> I was wondering about that - I passed the parameter in a param file, using the -param argument to mpirun_rsh. I just tried passing it inline as well, here are the results:
>
> mpiexec -env MV2_USE_SHMEM_COLL 0 -np 2 /opt/iba/src/mpi_apps/bandwidth/bw 10 10
>
> node 0
>
> Loaded symbols for /lib64/libnss_files.so.2
> 0x00002aaaaaae5bf8 in MPIDI_CH3I_SMP_write_progress@plt ()
>    from /usr/mpi/gcc/mvapich2-1.2p1/lib/libmpich.so.1.1
> (gdb) where
> #0 0x00002aaaaaae5bf8 in MPIDI_CH3I_SMP_write_progress@plt ()
>    from /usr/mpi/gcc/mvapich2-1.2p1/lib/libmpich.so.1.1
> #1 0x00002aaaaab17536 in MPIDI_CH3I_Progress (is_blocking=1, state=0x1)
>     at ch3_progress.c:174
> #2 0x00002aaaaab98e14 in PMPI_Recv (buf=0xc50000, count=4,
>     datatype=1275068673, source=1, tag=101, comm=1140850688, status=0x601520)
>     at recv.c:156
> #3 0x0000000000400ea8 in main (argc=3, argv=0x7ffffe2de508) at bw.c:91
>
>
> (gdb) where
> #0 0x00002b9af218cd80 in mthca_poll_cq (ibcq=0xf5de80, ne=1,
>     wc=0x7fffb9786a60) at src/cq.c:470
> #1 0x00002b9af14ee2a8 in MPIDI_CH3I_MRAILI_Cq_poll (
>     vbuf_handle=0x7fffb9786b78, vc_req=0xf55d00, receiving=0, is_blocking=1)
>     at /usr/include/infiniband/verbs.h:934
> #2 0x00002b9af14ef2e5 in MPIDI_CH3I_MRAILI_Waiting_msg (vc=0xf55d00,
>     vbuf_handle=0x7fffb9786b78, blocking=1) at ibv_channel_manager.c:468
> #3 0x00002b9af14a8304 in MPIDI_CH3I_read_progress (vc_pptr=0x7fffb9786b80,
>     v_ptr=0x7fffb9786b78, is_blocking=<value optimized out>)
>     at ch3_read_progress.c:158
> #4 0x00002b9af14a7f44 in MPIDI_CH3I_Progress (is_blocking=1,
>     state=<value optimized out>) at ch3_progress.c:202
> #5 0x00002b9af14ec60e in MPIC_Wait (request_ptr=0xfc7978) at helper_fns.c:269
> #6 0x00002b9af14eca03 in MPIC_Sendrecv (sendbuf=0x0, sendcount=0,
>     sendtype=1275068685, dest=0, sendtag=1, recvbuf=0x0, recvcount=0,
>     recvtype=1275068685, source=0, recvtag=1, comm=1140850688, status=0x1)
>     at helper_fns.c:125
> #7 0x00002b9af149b07a in MPIR_Barrier (comm_ptr=<value optimized out>)
>     at barrier.c:82
> #8 0x00002b9af149b698 in PMPI_Barrier (comm=1140850688) at barrier.c:446
> #9 0x0000000000400ea3 in main (argc=3, argv=0x7fffb9786e88) at bw.c:81
>
> Bw.c is the old "bandwidth" benchmark. It looks like it actually gets out of MPI_Init() in this case, but then one side is waiting at a barrier while the other has already gone past the barrier. I've attached a copy of the program.
>
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
> -----Original Message-----
> From: Krishna Chaitanya Kandalla [mailto:kandalla@cse.ohio-state.edu]
> Sent: Wednesday, July 15, 2009 3:42 PM
> To: Mike Heinz
> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
>
> Mike,
> That's a little surprising. Setting this variable off ensures that a
> particular flag is set to 0. This flag is supposed to guard the piece of
> code that does the 2-level communicator creation. Just out of curiosity,
> can you also let me know the command that you are using to launch the
> job. The env variables need to be set before the executable is
> specified. If MV2_USE_SHMEM_COLL=0 appears after the executable name,
> the job launcher might not pick it up.
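
The placement rule described above can be illustrated with a toy shell function. This is only a sketch of the idea, assuming a launcher that reads NAME=VALUE tokens up to the executable name and hands everything after it to the program; `split_launch_args` is a hypothetical helper, not part of mpirun_rsh or mpiexec, and the real launchers may parse their command lines differently.

```shell
#!/bin/sh
# Toy model of a launcher command line (NOT mpirun_rsh's real parser):
# NAME=VALUE tokens that come before the executable are collected as
# environment settings; the first other token is taken as the executable,
# and everything from there on is passed through untouched.
split_launch_args() {
    env_part=""
    while [ "$#" -gt 0 ]; do
        case "$1" in
            [A-Za-z_]*=*) env_part="$env_part $1"; shift ;;
            *) break ;;
        esac
    done
    echo "env:${env_part:- (none)}"
    echo "cmd: $*"
}

# Variable before the executable: seen by the (toy) launcher.
split_launch_args MV2_USE_SHMEM_COLL=0 ./bw 10 10

# Variable after the executable: it is just another program argument.
split_launch_args ./bw 10 10 MV2_USE_SHMEM_COLL=0
```

In the first call the setting shows up under `env:`; in the second it ends up at the tail of `cmd:`, which is the situation the message warns about.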
>
> Thanks,
> Krishna
>
>
> Mike Heinz wrote:
>
>> Krishna, thanks for the suggestion - but setting MV2_USE_SHMEM_COLL to
>> zero did not seem to change the stack trace much:
>>
>> Node 0:
>>
>> 0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll (vbuf_handle=0x7fffcb46d698,
>>     vc_req=0x0, receiving=0, is_blocking=1) at ibv_channel_manager.c:529
>> 529 for (; i < rdma_num_hcas; ++i) {
>> (gdb) where
>> #0 0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll (
>>     vbuf_handle=0x7fffcb46d698, vc_req=0x0, receiving=0, is_blocking=1)
>>     at ibv_channel_manager.c:529
>> #1 0x00002aaaaab177fa in MPIDI_CH3I_read_progress (vc_pptr=0x7fffcb46d6a0,
>>     v_ptr=0x7fffcb46d698, is_blocking=1) at ch3_read_progress.c:143
>> #2 0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1,
>>     state=<value optimized out>) at ch3_progress.c:202
>> #3 0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800)
>>     at helper_fns.c:269
>> #4 0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x10993a80, sendcount=2,
>>     sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x10993a88, recvcount=2,
>>     recvtype=1275069445, source=1, recvtag=7, comm=1140850688,
>>     status=0x7fffcb46d820) at helper_fns.c:125
>> #5 0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=<value optimized out>,
>>     sendcount=<value optimized out>, sendtype=1275069445, recvbuf=0x10993a80,
>>     recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0)
>>     at allgather.c:192
>> #6 0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff,
>>     sendcount=2, sendtype=1275069445, recvbuf=0x10993a80, recvcount=2,
>>     recvtype=1275069445, comm=1140850688) at allgather.c:866
>> #7 0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, key=0,
>>     newcomm=0x2aaaaae1c2f4) at comm_split.c:196
>> #8 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2,
>> ---Type <return> to continue, or q <return> to quit---
>>     my_rank=<value optimized out>) at create_2level_comm.c:142
>> #9 0x00002aaaaab6877d in PMPI_Init (argc=0x7fffcb46db3c, argv=0x7fffcb46db30)
>>     at init.c:146
>> #10 0x0000000000400b2f in main (argc=3, argv=0x7fffcb46dc78) at bw.c:27
>>
>> Node 1:
>>
>> MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, v_ptr=0x7fff0b10bb48,
>>     is_blocking=1) at ch3_read_progress.c:143
>> 143 type = MPIDI_CH3I_MRAILI_Cq_poll(v_ptr, NULL, 0, is_blocking);
>> (gdb) where
>> #0 MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, v_ptr=0x7fff0b10bb48,
>>     is_blocking=1) at ch3_read_progress.c:143
>> #1 0x00002afc9fb21f44 in MPIDI_CH3I_Progress (is_blocking=1,
>>     state=<value optimized out>) at ch3_progress.c:202
>> #2 0x00002afc9fb6660e in MPIC_Wait (request_ptr=0x2afc9fd242a0)
>>     at helper_fns.c:269
>> #3 0x00002afc9fb66a03 in MPIC_Sendrecv (sendbuf=0xf77028, sendcount=2,
>>     sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf77020, recvcount=4,
>>     recvtype=1275069445, source=0, recvtag=7, comm=1140850688,
>>     status=0x7fff0b10bcd0) at helper_fns.c:125
>> #4 0x00002afc9fb08ddb in MPIR_Allgather (sendbuf=<value optimized out>,
>>     sendcount=<value optimized out>, sendtype=1275069445, recvbuf=0xf77020,
>>     recvcount=2, recvtype=1275069445, comm_ptr=0x2afc9fd26c80)
>>     at allgather.c:192
>> #5 0x00002afc9fb09a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff,
>>     sendcount=2, sendtype=1275069445, recvbuf=0xf77020, recvcount=2,
>>     recvtype=1275069445, comm=1140850688) at allgather.c:866
>> #6 0x00002afc9fb4591b in PMPI_Comm_split (comm=1140850688, color=1, key=0,
>>     newcomm=0x2afc9fd26d94) at comm_split.c:196
>> #7 0x00002afc9fb478f4 in create_2level_comm (comm=1140850688, size=2,
>>     my_rank=<value optimized out>) at create_2level_comm.c:142
>> #8 0x00002afc9fb730a5 in PMPI_Init (argc=0x7fff0b10bfec, argv=0x7fff0b10bfe0)
>>     at init.c:146
>> ---Type <return> to continue, or q <return> to quit---
>> #9 0x0000000000400bcf in main (argc=3, argv=0x7fff0b10c128) at bw.c:27
>>
>> Any suggestions would be appreciated.
>>
>> --
>> Michael Heinz
>> Principal Engineer, Qlogic Corporation
>> King of Prussia, Pennsylvania
>>
>> *From:* kris.c1986@gmail.com [mailto:kris.c1986@gmail.com] *On Behalf Of *Krishna Chaitanya
>> *Sent:* Tuesday, July 14, 2009 6:39 PM
>> *To:* Mike Heinz
>> *Cc:* Todd Rimmer; mvapich-discuss@cse.ohio-state.edu; mpich2-dev@mcs.anl.gov
>> *Subject:* Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
>>
>> Mike,
>> The hang seems to be occurring when the MPI library is trying to create
>> the 2-level communicator, during the init phase. Can you try running
>> the test with MV2_USE_SHMEM_COLL=0.
>> This will ensure that a flat communicator is used for the subsequent
>> MPI calls. This might help us isolate the problem.
>>
>> Thanks,
>> Krishna
>>
>> On Tue, Jul 14, 2009 at 5:04 PM, Mike Heinz wrote:
>>
>> We're having a very odd problem with our fabric, where, out of the
>> entire cluster, machine "A" can't run mvapich2 programs with machine
>> "B", and machine "C" can't run programs with machine "D" - even though
>> "A" can run with "D" and "B" can run with "C" - and the rest of the
>> fabric works fine.
>>
>> 1) There are no IB errors anywhere on the fabric that I can find, and
>> the machines in question all work correctly with mvapich1 and
>> low-level IB tests.
>>
>> 2) The problem occurs whether using mpd or rsh.
>>
>> 3) If I attach to the running processes, both machines appear to be
>> waiting for a read operation to complete. (See below)
>>
>> Can anyone make a suggestion on how to debug this?
>>
>> Stack trace for node 0:
>>
>> #0 0x000000361160abb5 in pthread_spin_lock () from /lib64/libpthread.so.0
>> #1 0x00002aaaab08fb6c in mthca_poll_cq (ibcq=0x2060980, ne=1,
>>     wc=0x7fff9d835900) at src/cq.c:468
>> #2 0x00002aaaaab5d8d8 in MPIDI_CH3I_MRAILI_Cq_poll (
>>     vbuf_handle=0x7fff9d8359d8, vc_req=0x0, receiving=0, is_blocking=1)
>>     at /usr/include/infiniband/verbs.h:934
>> #3 0x00002aaaaab177fa in MPIDI_CH3I_read_progress (vc_pptr=0x7fff9d8359e0,
>>     v_ptr=0x7fff9d8359d8, is_blocking=1) at ch3_read_progress.c:143
>> #4 0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1,
>>     state=<value optimized out>) at ch3_progress.c:202
>> #5 0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800)
>>     at helper_fns.c:269
>> #6 0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x217fc50, sendcount=2,
>>     sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x217fc58, recvcount=2,
>>     recvtype=1275069445, source=1, recvtag=7, comm=1140850688,
>>     status=0x7fff9d835b60) at helper_fns.c:125
>> #7 0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=<value optimized out>,
>>     sendcount=<value optimized out>, sendtype=1275069445, recvbuf=0x217fc50,
>>     recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0)
>>     at allgather.c:192
>> #8 0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff,
>>     sendcount=2, sendtype=1275069445, recvbuf=0x217fc50, recvcount=2,
>>     recvtype=1275069445, comm=1140850688) at allgather.c:866
>> ---Type <return> to continue, or q <return> to quit---
>> #9 0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, key=0,
>>     newcomm=0x2aaaaae1c2f4) at comm_split.c:196
>> #10 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2,
>>     my_rank=<value optimized out>) at create_2level_comm.c:142
>> #11 0x00002aaaaab6877d in PMPI_Init (argc=0x7fff9d835e7c, argv=0x7fff9d835e70)
>>     at init.c:146
>> #12 0x0000000000400b2f in main (argc=3, argv=0x7fff9d835fb8) at bw.c:27
>>
>> Stack trace for node 1:
>>
>> #0 0x00002ac3cbdac2d2 in MPIDI_CH3I_read_progress (vc_pptr=0x7fffdee81020,
>>     v_ptr=0x7fffdee81018, is_blocking=1) at ch3_read_progress.c:143
>> #1 0x00002ac3cbdabf44 in MPIDI_CH3I_Progress (is_blocking=1,
>>     state=<value optimized out>) at ch3_progress.c:202
>> #2 0x00002ac3cbdf060e in MPIC_Wait (request_ptr=0x2ac3cbfae2a0)
>>     at helper_fns.c:269
>> #3 0x00002ac3cbdf0a03 in MPIC_Sendrecv (sendbuf=0xf79028, sendcount=2,
>>     sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf79020, recvcount=4,
>>     recvtype=1275069445, source=0, recvtag=7, comm=1140850688,
>>     status=0x7fffdee811a0) at helper_fns.c:125
>> #4 0x00002ac3cbd92ddb in MPIR_Allgather (sendbuf=<value optimized out>,
>>     sendcount=<value optimized out>, sendtype=1275069445, recvbuf=0xf79020,
>>     recvcount=2, recvtype=1275069445, comm_ptr=0x2ac3cbfb0c80)
>>     at allgather.c:192
>> #5 0x00002ac3cbd93a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff,
>>     sendcount=2, sendtype=1275069445, recvbuf=0xf79020, recvcount=2,
>>     recvtype=1275069445, comm=1140850688) at allgather.c:866
>> #6 0x00002ac3cbdcf91b in PMPI_Comm_split (comm=1140850688, color=1, key=0,
>>     newcomm=0x2ac3cbfb0d94) at comm_split.c:196
>> #7 0x00002ac3cbdd18f4 in create_2level_comm (comm=1140850688, size=2,
>>     my_rank=<value optimized out>) at create_2level_comm.c:142
>> #8 0x00002ac3cbdfd0a5 in PMPI_Init (argc=0x7fffdee814bc, argv=0x7fffdee814b0)
>>     at init.c:146
>> ---Type <return> to continue, or q <return> to quit---
>> #9 0x0000000000400bcf in main (argc=3, argv=0x7fffdee815f8) at bw.c:27
>>
>> --
>> Michael Heinz
>> Principal Engineer, Qlogic Corporation
>> King of Prussia, Pennsylvania
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss@cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>> --
>> In the middle of difficulty, lies opportunity
>>

From kandalla at cse.ohio-state.edu Wed Jul 15 19:19:07 2009
From: kandalla at cse.ohio-state.edu (Krishna Chaitanya Kandalla)
Date: Wed Jul 15 19:19:40 2009
Subject: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
In-Reply-To: <4A5E3858.5010702@cse.ohio-state.edu>
References: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F296D@MNEXMB1.qlogic.org>
    <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A0F@MNEXMB1.qlogic.org>
    <4A5E30F5.8070302@cse.ohio-state.edu>
    <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A2C@MNEXMB1.qlogic.org>
    <4A5E3858.5010702@cse.ohio-state.edu>
Message-ID: <4A5E63EB.7000700@cse.ohio-state.edu>

Mike,
I guess I had mistakenly started the job on 3 processes earlier, and it hung. When run with 2 processes (the way it is supposed to be run), it executes correctly on our machines. Can you give us some more information about your hardware? You were speaking about some reachability issues between certain pairs of nodes. I am guessing that you are running tests on either:
1. Nodes "A" and "D" or
2. Nodes "B" and "C"

Also,
> "A" can't run mvapich2 programs with machine "B", and machine "C" can't run programs with machine "D"
What exactly is the kind of error message that you see in this case?

Thanks,
Krishna

Krishna Chaitanya Kandalla wrote:
> Mike,
> Thank you for providing the source code. I am able to
> reproduce the hang on our cluster, as well. I will look into the issue.
>
> Thanks,
> Krishna
>
> Mike Heinz wrote:
>> I was wondering about that - I passed the parameter in a param file,
>> using the -param argument to mpirun_rsh.
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From michael.heinz at qlogic.com Thu Jul 16 09:34:44 2009
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Thu Jul 16 09:35:12 2009
Subject: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
In-Reply-To: <4A5E63EB.7000700@cse.ohio-state.edu>
References: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F296D@MNEXMB1.qlogic.org>
    <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A0F@MNEXMB1.qlogic.org>
    <4A5E30F5.8070302@cse.ohio-state.edu>
    <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A2C@MNEXMB1.qlogic.org>
    <4A5E3858.5010702@cse.ohio-state.edu>
    <4A5E63EB.7000700@cse.ohio-state.edu>
Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A75@MNEXMB1.qlogic.org>

Krishna,

What I'm saying is that if I run the program between A & D or A & C it works, but if I run it between A & B it silently hangs, never making progress. Meanwhile, I can run the same program between C & B and C & A, but runs between C & D silently hang without making progress.

This problem only occurs with mvapich2, not with mvapich1 or openmpi. All other Infiniband operations appear to be working normally.

This behavior is repeatable for those two pairs of machines (A & B and C & D), but it has not been seen on any other machines on the fabric, and we have not seen it on any other fabric. If I had to guess, there's some kind of timing hole that's being exposed under very narrow conditions.

The fabric in question is actually used to test software before we release it, so it contains a mix of Linux distros, but all machines are x86_64 architecture. For the stack traces I sent you, node 0 is an 8-way Xeon E5320 at 1.86 GHz, and node 1 is a 2-way Opteron running at 2.4 GHz.
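
A pairwise sweep is one way to map out which host pairs hang, as in the A/B and C/D pattern described above. The following is a rough sketch only: the `mpirun_rsh -np 2 hostA hostB program` invocation form, the default program path, the time limit, and the helper names `check_pair` and `sweep_pairs` are all assumptions for illustration, and it relies on the coreutils `timeout` command (exit status 124 on timeout) being installed.

```shell
#!/bin/sh
# Sketch: run a 2-process job on every pair of hosts and flag the pairs
# that do not finish within a time limit. All defaults are placeholders.
LAUNCH=${LAUNCH:-"mpirun_rsh -np 2"}   # assumed launcher prefix
PROGRAM=${PROGRAM:-"./bw 10 10"}       # assumed benchmark command line
LIMIT=${LIMIT:-60}                     # seconds before a run counts as hung

check_pair() {
    # $1 and $2 are host names; prints "<a> <b> ok|hang|fail(rc)".
    rc=0
    # LAUNCH and PROGRAM are left unquoted on purpose so they split into words.
    timeout "$LIMIT" $LAUNCH "$1" "$2" $PROGRAM >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0)   echo "$1 $2 ok" ;;
        124) echo "$1 $2 hang" ;;      # timeout(1) reports 124 when it kills the job
        *)   echo "$1 $2 fail($rc)" ;;
    esac
}

sweep_pairs() {
    # Visit each unordered pair of the given hosts exactly once.
    for a in "$@"; do
        seen=0
        for b in "$@"; do
            if [ "$seen" -eq 1 ]; then check_pair "$a" "$b"; fi
            if [ "$a" = "$b" ]; then seen=1; fi
        done
    done
}
```

Calling `sweep_pairs A B C D` (with real host names substituted) would print one line per pair, so hanging pairs stand out without anyone having to watch each run by hand.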

I realize the symptoms are quite bizarre - we've had several Infiniband coders and testers investigating this for a couple of weeks now - I was just hoping you might be able to suggest a line of investigation.

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

-----Original Message-----
From: Krishna Chaitanya Kandalla [mailto:kandalla@cse.ohio-state.edu]
Sent: Wednesday, July 15, 2009 7:19 PM
To: Mike Heinz
Cc: mvapich-discuss@cse.ohio-state.edu; Todd Rimmer
Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.

Mike,
I guess I had mistakenly started the job on 3 processes earlier, and it hung. When run with 2 processes (the way it is supposed to be run), it executes correctly on our machines. Can you give us some more information about your hardware? You were speaking about some reachability issues between certain pairs of nodes. I am guessing that you are running tests on either:
1. Nodes "A" and "D" or
2. Nodes "B" and "C"

Also,
> "A" can't run mvapich2 programs with machine "B", and machine "C" can't run programs with machine "D"
What exactly is the kind of error message that you see in this case?

Thanks,
Krishna

Krishna Chaitanya Kandalla wrote:
> Mike,
> Thank you for providing the source code. I am able to
> reproduce the hang on our cluster, as well. I will look into the issue.
>
> Thanks,
> Krishna
>
> Mike Heinz wrote:
>> I was wondering about that - I passed the parameter in a param file,
>> using the -param argument to mpirun_rsh.
I just tried passing it >> inline as well, here are the results: >> >> mpiexec -env MV2_USE_SHMEM_COLL 0 -np 2 >> /opt/iba/src/mpi_apps/bandwidth/bw 10 10 >> >> node 0 >> >> Loaded symbols for /lib64/libnss_files.so.2 >> 0x00002aaaaaae5bf8 in MPIDI_CH3I_SMP_write_progress@plt () >> from /usr/mpi/gcc/mvapich2-1.2p1/lib/libmpich.so.1.1 >> (gdb) where >> #0 0x00002aaaaaae5bf8 in MPIDI_CH3I_SMP_write_progress@plt () >> from /usr/mpi/gcc/mvapich2-1.2p1/lib/libmpich.so.1.1 >> #1 0x00002aaaaab17536 in MPIDI_CH3I_Progress (is_blocking=1, state=0x1) >> at ch3_progress.c:174 >> #2 0x00002aaaaab98e14 in PMPI_Recv (buf=0xc50000, count=4, >> datatype=1275068673, source=1, tag=101, comm=1140850688, >> status=0x601520) >> at recv.c:156 >> #3 0x0000000000400ea8 in main (argc=3, argv=0x7ffffe2de508) at bw.c:91 >> >> >> (gdb) where >> #0 0x00002b9af218cd80 in mthca_poll_cq (ibcq=0xf5de80, ne=1, >> wc=0x7fffb9786a60) at src/cq.c:470 >> #1 0x00002b9af14ee2a8 in MPIDI_CH3I_MRAILI_Cq_poll ( >> vbuf_handle=0x7fffb9786b78, vc_req=0xf55d00, receiving=0, >> is_blocking=1) >> at /usr/include/infiniband/verbs.h:934 >> #2 0x00002b9af14ef2e5 in MPIDI_CH3I_MRAILI_Waiting_msg (vc=0xf55d00, >> vbuf_handle=0x7fffb9786b78, blocking=1) at ibv_channel_manager.c:468 >> #3 0x00002b9af14a8304 in MPIDI_CH3I_read_progress >> (vc_pptr=0x7fffb9786b80, >> v_ptr=0x7fffb9786b78, is_blocking=) >> at ch3_read_progress.c:158 >> #4 0x00002b9af14a7f44 in MPIDI_CH3I_Progress (is_blocking=1, >> state=) at ch3_progress.c:202 >> #5 0x00002b9af14ec60e in MPIC_Wait (request_ptr=0xfc7978) at >> helper_fns.c:269 >> #6 0x00002b9af14eca03 in MPIC_Sendrecv (sendbuf=0x0, sendcount=0, >> sendtype=1275068685, dest=0, sendtag=1, recvbuf=0x0, recvcount=0, >> recvtype=1275068685, source=0, recvtag=1, comm=1140850688, >> status=0x1) >> at helper_fns.c:125 >> #7 0x00002b9af149b07a in MPIR_Barrier (comm_ptr=) >> at barrier.c:82 >> #8 0x00002b9af149b698 in PMPI_Barrier (comm=1140850688) at >> barrier.c:446 >> #9 0x0000000000400ea3 in 
main (argc=3, argv=0x7fffb9786e88) at bw.c:81 >> >> Bw.c is the old "bandwidth" benchmark. It looks like it actually gets >> out of MPI_Init() in this case, but then one side is waiting at a >> barrier while the other has already gone past the barrier. I've >> attached a copy of the program. >> >> >> -- >> Michael Heinz >> Principal Engineer, Qlogic Corporation >> King of Prussia, Pennsylvania >> -----Original Message----- >> From: Krishna Chaitanya Kandalla [mailto:kandalla@cse.ohio-state.edu] >> Sent: Wednesday, July 15, 2009 3:42 PM >> To: Mike Heinz >> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging >> a problem that only affects a few machines in our cluster. >> >> Mike, >> Thats a little surprising. Setting this variable off ensures that a >> particular flag is set to 0. This flag is supposed to guard the piece >> of code that does the 2-level communicator creation. Just out of >> curiosity, can you also let me know the command that you are using to >> launch the job. The env variables need to be set before the >> executable is specified. If MV2_USE_SHMEM_COLL=0 appears after the >> executable name, the job launcher might not pick it up. 
>> >> Thanks, >> Krishna >> >> >> >> >> Mike Heinz wrote: >> >>> Krishna, thanks for the suggestion - but setting MV2_USE_SHMEM_COLL >>> to zero did not seem to change the stack trace much: >>> >>> Node 0: >>> >>> 0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll >>> (vbuf_handle=0x7fffcb46d698, >>> >>> vc_req=0x0, receiving=0, is_blocking=1) at ibv_channel_manager.c:529 >>> >>> 529 for (; i < rdma_num_hcas; ++i) { >>> >>> (gdb) where >>> >>> #0 0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll ( >>> >>> vbuf_handle=0x7fffcb46d698, vc_req=0x0, receiving=0, is_blocking=1) >>> >>> at ibv_channel_manager.c:529 >>> >>> #1 0x00002aaaaab177fa in MPIDI_CH3I_read_progress >>> (vc_pptr=0x7fffcb46d6a0, >>> >>> v_ptr=0x7fffcb46d698, is_blocking=1) at ch3_read_progress.c:143 >>> >>> #2 0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1, >>> >>> state=) at ch3_progress.c:202 >>> >>> #3 0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800) >>> >>> at helper_fns.c:269 >>> >>> #4 0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x10993a80, >>> sendcount=2, >>> >>> sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x10993a88, >>> recvcount=2, >>> >>> recvtype=1275069445, source=1, recvtag=7, comm=1140850688, >>> >>> status=0x7fffcb46d820) at helper_fns.c:125 >>> >>> #5 0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=, >>> >>> sendcount=, sendtype=1275069445, >>> recvbuf=0x10993a80, >>> >>> recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0) >>> >>> at allgather.c:192 >>> >>> #6 0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>> >>> sendcount=2, sendtype=1275069445, recvbuf=0x10993a80, recvcount=2, >>> >>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>> >>> #7 0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, >>> key=0, >>> >>> newcomm=0x2aaaaae1c2f4) at comm_split.c:196 >>> >>> #8 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2, >>> >>> ---Type to continue, or q to quit--- >>> >>> my_rank=) at 
create_2level_comm.c:142 >>> >>> #9 0x00002aaaaab6877d in PMPI_Init (argc=0x7fffcb46db3c, >>> argv=0x7fffcb46db30) >>> >>> at init.c:146 >>> >>> #10 0x0000000000400b2f in main (argc=3, argv=0x7fffcb46dc78) at bw.c:27 >>> >>> Node 1: >>> >>> MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, v_ptr=0x7fff0b10bb48, >>> >>> is_blocking=1) at ch3_read_progress.c:143 >>> >>> 143 type = MPIDI_CH3I_MRAILI_Cq_poll(v_ptr, NULL, 0, is_blocking); >>> >>> (gdb) where >>> >>> #0 MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, >>> v_ptr=0x7fff0b10bb48, >>> >>> is_blocking=1) at ch3_read_progress.c:143 >>> >>> #1 0x00002afc9fb21f44 in MPIDI_CH3I_Progress (is_blocking=1, >>> >>> state=) at ch3_progress.c:202 >>> >>> #2 0x00002afc9fb6660e in MPIC_Wait (request_ptr=0x2afc9fd242a0) >>> >>> at helper_fns.c:269 >>> >>> #3 0x00002afc9fb66a03 in MPIC_Sendrecv (sendbuf=0xf77028, sendcount=2, >>> >>> sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf77020, recvcount=4, >>> >>> recvtype=1275069445, source=0, recvtag=7, comm=1140850688, >>> >>> status=0x7fff0b10bcd0) at helper_fns.c:125 >>> >>> #4 0x00002afc9fb08ddb in MPIR_Allgather (sendbuf=, >>> >>> sendcount=, sendtype=1275069445, recvbuf=0xf77020, >>> >>> recvcount=2, recvtype=1275069445, comm_ptr=0x2afc9fd26c80) >>> >>> at allgather.c:192 >>> >>> #5 0x00002afc9fb09a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>> >>> sendcount=2, sendtype=1275069445, recvbuf=0xf77020, recvcount=2, >>> >>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>> >>> #6 0x00002afc9fb4591b in PMPI_Comm_split (comm=1140850688, color=1, >>> key=0, >>> >>> newcomm=0x2afc9fd26d94) at comm_split.c:196 >>> >>> #7 0x00002afc9fb478f4 in create_2level_comm (comm=1140850688, size=2, >>> >>> my_rank=) at create_2level_comm.c:142 >>> >>> #8 0x00002afc9fb730a5 in PMPI_Init (argc=0x7fff0b10bfec, >>> argv=0x7fff0b10bfe0) >>> >>> at init.c:146 >>> >>> ---Type to continue, or q to quit--- >>> >>> #9 0x0000000000400bcf in main (argc=3, argv=0x7fff0b10c128) at 
bw.c:27
>>>
>>> Any suggestions would be appreciated.
>>>
>>> --
>>>
>>> Michael Heinz
>>>
>>> Principal Engineer, Qlogic Corporation
>>>
>>> King of Prussia, Pennsylvania
>>>
>>> *From:* kris.c1986@gmail.com [mailto:kris.c1986@gmail.com] *On
>>> Behalf Of *Krishna Chaitanya
>>> *Sent:* Tuesday, July 14, 2009 6:39 PM
>>> *To:* Mike Heinz
>>> *Cc:* Todd Rimmer; mvapich-discuss@cse.ohio-state.edu;
>>> mpich2-dev@mcs.anl.gov
>>> *Subject:* Re: [mvapich-discuss] [mpich2-dev] Need a hint in
>>> debugging a problem that only affects a few machines in our cluster.
>>>
>>> Mike,
>>> The hang seems to be occurring when the MPI library is trying to
>>> create the 2-level communicator, during the init phase. Can you try
>>> running the test with MV2_USE_SHMEM_COLL=0?
>>> This will ensure that a flat communicator is used for the subsequent
>>> MPI calls. This might help us isolate the problem.
>>>
>>> Thanks,
>>> Krishna
>>>
>>> On Tue, Jul 14, 2009 at 5:04 PM, Mike Heinz wrote:
>>>
>>> We're having a very odd problem with our fabric, where, out of the
>>> entire cluster, machine "A" can't run mvapich2 programs with machine
>>> "B", and machine "C" can't run programs with machine "D" - even
>>> though "A" can run with "D" and "B" can run with "C" - and the rest
>>> of the fabric works fine.
>>>
>>> 1) There are no IB errors anywhere on the fabric that I can find,
>>> and the machines in question all work correctly with mvapich1 and
>>> low-level IB tests.
>>>
>>> 2) The problem occurs whether using mpd or rsh.
>>>
>>> 3) If I attach to the running processes, both machines appear to be
>>> waiting for a read operation to complete. (See below)
>>>
>>> Can anyone make a suggestion on how to debug this?
>>> >>> Stack trace for node 0: >>> >>> #0 0x000000361160abb5 in pthread_spin_lock () from >>> /lib64/libpthread.so.0 >>> >>> #1 0x00002aaaab08fb6c in mthca_poll_cq (ibcq=0x2060980, ne=1, >>> >>> wc=0x7fff9d835900) at src/cq.c:468 >>> >>> #2 0x00002aaaaab5d8d8 in MPIDI_CH3I_MRAILI_Cq_poll ( >>> >>> vbuf_handle=0x7fff9d8359d8, vc_req=0x0, receiving=0, is_blocking=1) >>> >>> at /usr/include/infiniband/verbs.h:934 >>> >>> #3 0x00002aaaaab177fa in MPIDI_CH3I_read_progress >>> (vc_pptr=0x7fff9d8359e0, >>> >>> v_ptr=0x7fff9d8359d8, is_blocking=1) at ch3_read_progress.c:143 >>> >>> #4 0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1, >>> >>> state=) at ch3_progress.c:202 >>> >>> #5 0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800) >>> >>> at helper_fns.c:269 >>> >>> #6 0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x217fc50, sendcount=2, >>> >>> sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x217fc58, recvcount=2, >>> >>> recvtype=1275069445, source=1, recvtag=7, comm=1140850688, >>> >>> status=0x7fff9d835b60) at helper_fns.c:125 >>> >>> #7 0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=, >>> >>> sendcount=, sendtype=1275069445, >>> recvbuf=0x217fc50, >>> >>> recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0) >>> >>> at allgather.c:192 >>> >>> #8 0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>> >>> sendcount=2, sendtype=1275069445, recvbuf=0x217fc50, recvcount=2, >>> >>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>> >>> ---Type to continue, or q to quit--- >>> >>> #9 0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, >>> key=0, >>> >>> newcomm=0x2aaaaae1c2f4) at comm_split.c:196 >>> >>> #10 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2, >>> >>> my_rank=) at create_2level_comm.c:142 >>> >>> #11 0x00002aaaaab6877d in PMPI_Init (argc=0x7fff9d835e7c, >>> argv=0x7fff9d835e70) >>> >>> at init.c:146 >>> >>> #12 0x0000000000400b2f in main (argc=3, argv=0x7fff9d835fb8) at bw.c:27 >>> 
>>> Stack trace for node 1: >>> >>> #0 0x00002ac3cbdac2d2 in MPIDI_CH3I_read_progress >>> (vc_pptr=0x7fffdee81020, >>> >>> v_ptr=0x7fffdee81018, is_blocking=1) at ch3_read_progress.c:143 >>> >>> #1 0x00002ac3cbdabf44 in MPIDI_CH3I_Progress (is_blocking=1, >>> >>> state=) at ch3_progress.c:202 >>> >>> #2 0x00002ac3cbdf060e in MPIC_Wait (request_ptr=0x2ac3cbfae2a0) >>> >>> at helper_fns.c:269 >>> >>> #3 0x00002ac3cbdf0a03 in MPIC_Sendrecv (sendbuf=0xf79028, sendcount=2, >>> >>> sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf79020, recvcount=4, >>> >>> recvtype=1275069445, source=0, recvtag=7, comm=1140850688, >>> >>> status=0x7fffdee811a0) at helper_fns.c:125 >>> >>> #4 0x00002ac3cbd92ddb in MPIR_Allgather (sendbuf=, >>> >>> sendcount=, sendtype=1275069445, recvbuf=0xf79020, >>> >>> recvcount=2, recvtype=1275069445, comm_ptr=0x2ac3cbfb0c80) >>> >>> at allgather.c:192 >>> >>> #5 0x00002ac3cbd93a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>> >>> sendcount=2, sendtype=1275069445, recvbuf=0xf79020, recvcount=2, >>> >>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>> >>> #6 0x00002ac3cbdcf91b in PMPI_Comm_split (comm=1140850688, color=1, >>> key=0, >>> >>> newcomm=0x2ac3cbfb0d94) at comm_split.c:196 >>> >>> #7 0x00002ac3cbdd18f4 in create_2level_comm (comm=1140850688, size=2, >>> >>> my_rank=) at create_2level_comm.c:142 >>> >>> #8 0x00002ac3cbdfd0a5 in PMPI_Init (argc=0x7fffdee814bc, >>> argv=0x7fffdee814b0) >>> >>> at init.c:146 >>> >>> ---Type to continue, or q to quit--- >>> >>> #9 0x0000000000400bcf in main (argc=3, argv=0x7fffdee815f8) at bw.c:27 >>> >>> -- >>> >>> Michael Heinz >>> >>> Principal Engineer, Qlogic Corporation >>> >>> King of Prussia, Pennsylvania >>> >>> >>> _______________________________________________ >>> mvapich-discuss mailing list >>> mvapich-discuss@cse.ohio-state.edu >>> >>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >>> >>> >>> >>> >>> -- >>> In the middle of difficulty, lies opportunity >>> >>> 
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>

From kandalla at cse.ohio-state.edu Thu Jul 16 12:33:50 2009
From: kandalla at cse.ohio-state.edu (Krishna Chaitanya Kandalla)
Date: Thu Jul 16 12:34:24 2009
Subject: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
In-Reply-To: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A75@MNEXMB1.qlogic.org>
References: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F296D@MNEXMB1.qlogic.org>
 <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A0F@MNEXMB1.qlogic.org>
 <4A5E30F5.8070302@cse.ohio-state.edu>
 <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A2C@MNEXMB1.qlogic.org>
 <4A5E3858.5010702@cse.ohio-state.edu>
 <4A5E63EB.7000700@cse.ohio-state.edu>
 <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A75@MNEXMB1.qlogic.org>
Message-ID: <4A5F566E.9050004@cse.ohio-state.edu>

Mike,
Can you also let us know the version numbers of the mvapich2 and mvapich1 stacks that you are using?

Thanks,
Krishna

Mike Heinz wrote:
> Krishna,
>
> What I'm saying is that if I run the program between A & D or A & C it works, but if I run it between A & B it silently hangs, never making progress. Meanwhile, I can run the same program between C & B and C & A, but runs between C & D silently hang without making progress. This problem only occurs with mvapich2, not with mvapich1 or openmpi. All other InfiniBand operations appear to be working normally.
>
> This behavior is repeatable for those two pairs of machines (A & B and C & D), but has not been seen on any other machines on the fabric, and we have not seen this on any other fabric - if I had to guess there's some kind of timing hole that's being exposed in very narrow conditions.
>
> The fabric in question is actually used to test software before we release it, so it contains a mix of Linux distros, but all machines are x86_64 architecture.
>
> For the stack traces I sent you, node 0 is an 8-way Xeon E5320 at 1.86 GHz, node 1 is a 2-way Opteron running at 2.4 GHz.
>
> I realize the symptoms are quite bizarre - we've had several InfiniBand coders and testers investigating this for a couple of weeks now - I was just hoping you might be able to suggest a line of investigation.
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
>
> -----Original Message-----
> From: Krishna Chaitanya Kandalla [mailto:kandalla@cse.ohio-state.edu]
> Sent: Wednesday, July 15, 2009 7:19 PM
> To: Mike Heinz
> Cc: mvapich-discuss@cse.ohio-state.edu; Todd Rimmer
> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
>
> Mike,
> I guess I had mistakenly started the job on 3 processes earlier
> and it had hung. On running with 2 processes (the way it is supposed to
> be run), it executes correctly on our machines. Can you give us some
> more information about your hardware? You were speaking about some
> reachability issues between certain pairs of nodes. I am guessing that you
> are running tests on either:
> 1. Nodes "A" and "D" or
> 2. Nodes "B" and "C"
>
> Also,
> > "A" can't run mvapich2 programs with machine "B", and machine "C"
> can't run programs with machine "D"
>
> What exactly is the kind of error message that you see in this case?
>
> Thanks,
> Krishna
>
> Krishna Chaitanya Kandalla wrote:
>
>> Mike,
>> Thank you for providing the source code. I am able to
>> reproduce the hang on our cluster, as well. I will look into the issue.
>>
>> Thanks,
>> Krishna
>>
>> Mike Heinz wrote:
>>
>>> I was wondering about that - I passed the parameter in a param file,
>>> using the -param argument to mpirun_rsh.
I just tried passing it >>> inline as well, here are the results: >>> >>> mpiexec -env MV2_USE_SHMEM_COLL 0 -np 2 >>> /opt/iba/src/mpi_apps/bandwidth/bw 10 10 >>> >>> node 0 >>> >>> Loaded symbols for /lib64/libnss_files.so.2 >>> 0x00002aaaaaae5bf8 in MPIDI_CH3I_SMP_write_progress@plt () >>> from /usr/mpi/gcc/mvapich2-1.2p1/lib/libmpich.so.1.1 >>> (gdb) where >>> #0 0x00002aaaaaae5bf8 in MPIDI_CH3I_SMP_write_progress@plt () >>> from /usr/mpi/gcc/mvapich2-1.2p1/lib/libmpich.so.1.1 >>> #1 0x00002aaaaab17536 in MPIDI_CH3I_Progress (is_blocking=1, state=0x1) >>> at ch3_progress.c:174 >>> #2 0x00002aaaaab98e14 in PMPI_Recv (buf=0xc50000, count=4, >>> datatype=1275068673, source=1, tag=101, comm=1140850688, >>> status=0x601520) >>> at recv.c:156 >>> #3 0x0000000000400ea8 in main (argc=3, argv=0x7ffffe2de508) at bw.c:91 >>> >>> >>> (gdb) where >>> #0 0x00002b9af218cd80 in mthca_poll_cq (ibcq=0xf5de80, ne=1, >>> wc=0x7fffb9786a60) at src/cq.c:470 >>> #1 0x00002b9af14ee2a8 in MPIDI_CH3I_MRAILI_Cq_poll ( >>> vbuf_handle=0x7fffb9786b78, vc_req=0xf55d00, receiving=0, >>> is_blocking=1) >>> at /usr/include/infiniband/verbs.h:934 >>> #2 0x00002b9af14ef2e5 in MPIDI_CH3I_MRAILI_Waiting_msg (vc=0xf55d00, >>> vbuf_handle=0x7fffb9786b78, blocking=1) at ibv_channel_manager.c:468 >>> #3 0x00002b9af14a8304 in MPIDI_CH3I_read_progress >>> (vc_pptr=0x7fffb9786b80, >>> v_ptr=0x7fffb9786b78, is_blocking=) >>> at ch3_read_progress.c:158 >>> #4 0x00002b9af14a7f44 in MPIDI_CH3I_Progress (is_blocking=1, >>> state=) at ch3_progress.c:202 >>> #5 0x00002b9af14ec60e in MPIC_Wait (request_ptr=0xfc7978) at >>> helper_fns.c:269 >>> #6 0x00002b9af14eca03 in MPIC_Sendrecv (sendbuf=0x0, sendcount=0, >>> sendtype=1275068685, dest=0, sendtag=1, recvbuf=0x0, recvcount=0, >>> recvtype=1275068685, source=0, recvtag=1, comm=1140850688, >>> status=0x1) >>> at helper_fns.c:125 >>> #7 0x00002b9af149b07a in MPIR_Barrier (comm_ptr=) >>> at barrier.c:82 >>> #8 0x00002b9af149b698 in PMPI_Barrier (comm=1140850688) at 
>>> barrier.c:446
>>> #9 0x0000000000400ea3 in main (argc=3, argv=0x7fffb9786e88) at bw.c:81
>>>
>>> bw.c is the old "bandwidth" benchmark. It looks like it actually gets
>>> out of MPI_Init() in this case, but then one side is waiting at a
>>> barrier while the other has already gone past the barrier. I've
>>> attached a copy of the program.
>>>
>>>
>>> --
>>> Michael Heinz
>>> Principal Engineer, Qlogic Corporation
>>> King of Prussia, Pennsylvania
>>> -----Original Message-----
>>> From: Krishna Chaitanya Kandalla [mailto:kandalla@cse.ohio-state.edu]
>>> Sent: Wednesday, July 15, 2009 3:42 PM
>>> To: Mike Heinz
>>> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging
>>> a problem that only affects a few machines in our cluster.
>>>
>>> Mike,
>>> That's a little surprising. Setting this variable to 0 ensures that a
>>> particular flag is set to 0. This flag is supposed to guard the piece
>>> of code that does the 2-level communicator creation. Just out of
>>> curiosity, can you also let me know the command that you are using to
>>> launch the job? The env variables need to be set before the
>>> executable is specified. If MV2_USE_SHMEM_COLL=0 appears after the
>>> executable name, the job launcher might not pick it up.
>>> >>> Thanks, >>> Krishna >>> >>> >>> >>> >>> Mike Heinz wrote: >>> >>> >>>> Krishna, thanks for the suggestion - but setting MV2_USE_SHMEM_COLL >>>> to zero did not seem to change the stack trace much: >>>> >>>> Node 0: >>>> >>>> 0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll >>>> (vbuf_handle=0x7fffcb46d698, >>>> >>>> vc_req=0x0, receiving=0, is_blocking=1) at ibv_channel_manager.c:529 >>>> >>>> 529 for (; i < rdma_num_hcas; ++i) { >>>> >>>> (gdb) where >>>> >>>> #0 0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll ( >>>> >>>> vbuf_handle=0x7fffcb46d698, vc_req=0x0, receiving=0, is_blocking=1) >>>> >>>> at ibv_channel_manager.c:529 >>>> >>>> #1 0x00002aaaaab177fa in MPIDI_CH3I_read_progress >>>> (vc_pptr=0x7fffcb46d6a0, >>>> >>>> v_ptr=0x7fffcb46d698, is_blocking=1) at ch3_read_progress.c:143 >>>> >>>> #2 0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1, >>>> >>>> state=) at ch3_progress.c:202 >>>> >>>> #3 0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800) >>>> >>>> at helper_fns.c:269 >>>> >>>> #4 0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x10993a80, >>>> sendcount=2, >>>> >>>> sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x10993a88, >>>> recvcount=2, >>>> >>>> recvtype=1275069445, source=1, recvtag=7, comm=1140850688, >>>> >>>> status=0x7fffcb46d820) at helper_fns.c:125 >>>> >>>> #5 0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=, >>>> >>>> sendcount=, sendtype=1275069445, >>>> recvbuf=0x10993a80, >>>> >>>> recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0) >>>> >>>> at allgather.c:192 >>>> >>>> #6 0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>>> >>>> sendcount=2, sendtype=1275069445, recvbuf=0x10993a80, recvcount=2, >>>> >>>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>>> >>>> #7 0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, >>>> key=0, >>>> >>>> newcomm=0x2aaaaae1c2f4) at comm_split.c:196 >>>> >>>> #8 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2, >>>> 
>>>> ---Type to continue, or q to quit--- >>>> >>>> my_rank=) at create_2level_comm.c:142 >>>> >>>> #9 0x00002aaaaab6877d in PMPI_Init (argc=0x7fffcb46db3c, >>>> argv=0x7fffcb46db30) >>>> >>>> at init.c:146 >>>> >>>> #10 0x0000000000400b2f in main (argc=3, argv=0x7fffcb46dc78) at bw.c:27 >>>> >>>> Node 1: >>>> >>>> MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, v_ptr=0x7fff0b10bb48, >>>> >>>> is_blocking=1) at ch3_read_progress.c:143 >>>> >>>> 143 type = MPIDI_CH3I_MRAILI_Cq_poll(v_ptr, NULL, 0, is_blocking); >>>> >>>> (gdb) where >>>> >>>> #0 MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, >>>> v_ptr=0x7fff0b10bb48, >>>> >>>> is_blocking=1) at ch3_read_progress.c:143 >>>> >>>> #1 0x00002afc9fb21f44 in MPIDI_CH3I_Progress (is_blocking=1, >>>> >>>> state=) at ch3_progress.c:202 >>>> >>>> #2 0x00002afc9fb6660e in MPIC_Wait (request_ptr=0x2afc9fd242a0) >>>> >>>> at helper_fns.c:269 >>>> >>>> #3 0x00002afc9fb66a03 in MPIC_Sendrecv (sendbuf=0xf77028, sendcount=2, >>>> >>>> sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf77020, recvcount=4, >>>> >>>> recvtype=1275069445, source=0, recvtag=7, comm=1140850688, >>>> >>>> status=0x7fff0b10bcd0) at helper_fns.c:125 >>>> >>>> #4 0x00002afc9fb08ddb in MPIR_Allgather (sendbuf=, >>>> >>>> sendcount=, sendtype=1275069445, recvbuf=0xf77020, >>>> >>>> recvcount=2, recvtype=1275069445, comm_ptr=0x2afc9fd26c80) >>>> >>>> at allgather.c:192 >>>> >>>> #5 0x00002afc9fb09a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>>> >>>> sendcount=2, sendtype=1275069445, recvbuf=0xf77020, recvcount=2, >>>> >>>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>>> >>>> #6 0x00002afc9fb4591b in PMPI_Comm_split (comm=1140850688, color=1, >>>> key=0, >>>> >>>> newcomm=0x2afc9fd26d94) at comm_split.c:196 >>>> >>>> #7 0x00002afc9fb478f4 in create_2level_comm (comm=1140850688, size=2, >>>> >>>> my_rank=) at create_2level_comm.c:142 >>>> >>>> #8 0x00002afc9fb730a5 in PMPI_Init (argc=0x7fff0b10bfec, >>>> argv=0x7fff0b10bfe0) >>>> >>>> at 
init.c:146
>>>>
>>>> ---Type <return> to continue, or q <return> to quit---
>>>>
>>>> #9 0x0000000000400bcf in main (argc=3, argv=0x7fff0b10c128) at bw.c:27
>>>>
>>>> Any suggestions would be appreciated.
>>>>
>>>> --
>>>>
>>>> Michael Heinz
>>>>
>>>> Principal Engineer, Qlogic Corporation
>>>>
>>>> King of Prussia, Pennsylvania
>>>>
>>>> *From:* kris.c1986@gmail.com [mailto:kris.c1986@gmail.com] *On
>>>> Behalf Of *Krishna Chaitanya
>>>> *Sent:* Tuesday, July 14, 2009 6:39 PM
>>>> *To:* Mike Heinz
>>>> *Cc:* Todd Rimmer; mvapich-discuss@cse.ohio-state.edu;
>>>> mpich2-dev@mcs.anl.gov
>>>> *Subject:* Re: [mvapich-discuss] [mpich2-dev] Need a hint in
>>>> debugging a problem that only affects a few machines in our cluster.
>>>>
>>>> Mike,
>>>> The hang seems to be occurring when the MPI library is trying to
>>>> create the 2-level communicator, during the init phase. Can you try
>>>> running the test with MV2_USE_SHMEM_COLL=0?
>>>> This will ensure that a flat communicator is used for the subsequent
>>>> MPI calls. This might help us isolate the problem.
>>>>
>>>> Thanks,
>>>> Krishna
>>>>
>>>> On Tue, Jul 14, 2009 at 5:04 PM, Mike Heinz wrote:
>>>>
>>>> We're having a very odd problem with our fabric, where, out of the
>>>> entire cluster, machine "A" can't run mvapich2 programs with machine
>>>> "B", and machine "C" can't run programs with machine "D" - even
>>>> though "A" can run with "D" and "B" can run with "C" - and the rest
>>>> of the fabric works fine.
>>>>
>>>> 1) There are no IB errors anywhere on the fabric that I can find,
>>>> and the machines in question all work correctly with mvapich1 and
>>>> low-level IB tests.
>>>>
>>>> 2) The problem occurs whether using mpd or rsh.
>>>>
>>>> 3) If I attach to the running processes, both machines appear to be
>>>> waiting for a read operation to complete. (See below)
>>>>
>>>> Can anyone make a suggestion on how to debug this?
>>>> >>>> Stack trace for node 0: >>>> >>>> #0 0x000000361160abb5 in pthread_spin_lock () from >>>> /lib64/libpthread.so.0 >>>> >>>> #1 0x00002aaaab08fb6c in mthca_poll_cq (ibcq=0x2060980, ne=1, >>>> >>>> wc=0x7fff9d835900) at src/cq.c:468 >>>> >>>> #2 0x00002aaaaab5d8d8 in MPIDI_CH3I_MRAILI_Cq_poll ( >>>> >>>> vbuf_handle=0x7fff9d8359d8, vc_req=0x0, receiving=0, is_blocking=1) >>>> >>>> at /usr/include/infiniband/verbs.h:934 >>>> >>>> #3 0x00002aaaaab177fa in MPIDI_CH3I_read_progress >>>> (vc_pptr=0x7fff9d8359e0, >>>> >>>> v_ptr=0x7fff9d8359d8, is_blocking=1) at ch3_read_progress.c:143 >>>> >>>> #4 0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1, >>>> >>>> state=) at ch3_progress.c:202 >>>> >>>> #5 0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800) >>>> >>>> at helper_fns.c:269 >>>> >>>> #6 0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x217fc50, sendcount=2, >>>> >>>> sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x217fc58, recvcount=2, >>>> >>>> recvtype=1275069445, source=1, recvtag=7, comm=1140850688, >>>> >>>> status=0x7fff9d835b60) at helper_fns.c:125 >>>> >>>> #7 0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=, >>>> >>>> sendcount=, sendtype=1275069445, >>>> recvbuf=0x217fc50, >>>> >>>> recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0) >>>> >>>> at allgather.c:192 >>>> >>>> #8 0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>>> >>>> sendcount=2, sendtype=1275069445, recvbuf=0x217fc50, recvcount=2, >>>> >>>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>>> >>>> ---Type to continue, or q to quit--- >>>> >>>> #9 0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, >>>> key=0, >>>> >>>> newcomm=0x2aaaaae1c2f4) at comm_split.c:196 >>>> >>>> #10 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2, >>>> >>>> my_rank=) at create_2level_comm.c:142 >>>> >>>> #11 0x00002aaaaab6877d in PMPI_Init (argc=0x7fff9d835e7c, >>>> argv=0x7fff9d835e70) >>>> >>>> at init.c:146 >>>> >>>> #12 
0x0000000000400b2f in main (argc=3, argv=0x7fff9d835fb8) at bw.c:27 >>>> >>>> Stack trace for node 1: >>>> >>>> #0 0x00002ac3cbdac2d2 in MPIDI_CH3I_read_progress >>>> (vc_pptr=0x7fffdee81020, >>>> >>>> v_ptr=0x7fffdee81018, is_blocking=1) at ch3_read_progress.c:143 >>>> >>>> #1 0x00002ac3cbdabf44 in MPIDI_CH3I_Progress (is_blocking=1, >>>> >>>> state=) at ch3_progress.c:202 >>>> >>>> #2 0x00002ac3cbdf060e in MPIC_Wait (request_ptr=0x2ac3cbfae2a0) >>>> >>>> at helper_fns.c:269 >>>> >>>> #3 0x00002ac3cbdf0a03 in MPIC_Sendrecv (sendbuf=0xf79028, sendcount=2, >>>> >>>> sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf79020, recvcount=4, >>>> >>>> recvtype=1275069445, source=0, recvtag=7, comm=1140850688, >>>> >>>> status=0x7fffdee811a0) at helper_fns.c:125 >>>> >>>> #4 0x00002ac3cbd92ddb in MPIR_Allgather (sendbuf=, >>>> >>>> sendcount=, sendtype=1275069445, recvbuf=0xf79020, >>>> >>>> recvcount=2, recvtype=1275069445, comm_ptr=0x2ac3cbfb0c80) >>>> >>>> at allgather.c:192 >>>> >>>> #5 0x00002ac3cbd93a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>>> >>>> sendcount=2, sendtype=1275069445, recvbuf=0xf79020, recvcount=2, >>>> >>>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>>> >>>> #6 0x00002ac3cbdcf91b in PMPI_Comm_split (comm=1140850688, color=1, >>>> key=0, >>>> >>>> newcomm=0x2ac3cbfb0d94) at comm_split.c:196 >>>> >>>> #7 0x00002ac3cbdd18f4 in create_2level_comm (comm=1140850688, size=2, >>>> >>>> my_rank=) at create_2level_comm.c:142 >>>> >>>> #8 0x00002ac3cbdfd0a5 in PMPI_Init (argc=0x7fffdee814bc, >>>> argv=0x7fffdee814b0) >>>> >>>> at init.c:146 >>>> >>>> ---Type to continue, or q to quit--- >>>> >>>> #9 0x0000000000400bcf in main (argc=3, argv=0x7fffdee815f8) at bw.c:27 >>>> >>>> -- >>>> >>>> Michael Heinz >>>> >>>> Principal Engineer, Qlogic Corporation >>>> >>>> King of Prussia, Pennsylvania >>>> >>>> >>>> _______________________________________________ >>>> mvapich-discuss mailing list >>>> mvapich-discuss@cse.ohio-state.edu >>>> >>>> 
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> In the middle of difficulty, lies opportunity
>>>>
>>>>
>>>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss@cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
>>
>
>
>

From michael.heinz at qlogic.com Thu Jul 16 12:37:06 2009
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Thu Jul 16 12:37:34 2009
Subject: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
In-Reply-To: <4A5F566E.9050004@cse.ohio-state.edu>
References: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F296D@MNEXMB1.qlogic.org>
 <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A0F@MNEXMB1.qlogic.org>
 <4A5E30F5.8070302@cse.ohio-state.edu>
 <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A2C@MNEXMB1.qlogic.org>
 <4A5E3858.5010702@cse.ohio-state.edu>
 <4A5E63EB.7000700@cse.ohio-state.edu>
 <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A75@MNEXMB1.qlogic.org>
 <4A5F566E.9050004@cse.ohio-state.edu>
Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2ABF@MNEXMB1.qlogic.org>

mvapich-1.1.0-3355.src.rpm
mvapich2-1.2p1-1.src.rpm

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

-----Original Message-----
From: Krishna Chaitanya Kandalla [mailto:kandalla@cse.ohio-state.edu]
Sent: Thursday, July 16, 2009 12:34 PM
To: Mike Heinz
Cc: mvapich-discuss@cse.ohio-state.edu
Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.

Mike,
Can you also let us know the version numbers of the mvapich2 and mvapich1 stacks that you are using?

Thanks,
Krishna

Mike Heinz wrote:
> Krishna,
>
> What I'm saying is that if I run the program between A & D or A & C it works, but if I run it between A & B it silently hangs, never making progress.
Meanwhile, I can run the same program between C & B and C & A, but runs between C & D silently hang without making progress. This problem only occurs with mvapich2, not with mvapich1 or openmpi. All other InfiniBand operations appear to be working normally.
>
> This behavior is repeatable for those two pairs of machines (A & B and C & D), but has not been seen on any other machines on the fabric, and we have not seen this on any other fabric - if I had to guess there's some kind of timing hole that's being exposed in very narrow conditions.
>
> The fabric in question is actually used to test software before we release it, so it contains a mix of Linux distros, but all machines are x86_64 architecture.
>
> For the stack traces I sent you, node 0 is an 8-way Xeon E5320 at 1.86 GHz, node 1 is a 2-way Opteron running at 2.4 GHz.
>
> I realize the symptoms are quite bizarre - we've had several InfiniBand coders and testers investigating this for a couple of weeks now - I was just hoping you might be able to suggest a line of investigation.
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
>
> -----Original Message-----
> From: Krishna Chaitanya Kandalla [mailto:kandalla@cse.ohio-state.edu]
> Sent: Wednesday, July 15, 2009 7:19 PM
> To: Mike Heinz
> Cc: mvapich-discuss@cse.ohio-state.edu; Todd Rimmer
> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
>
> Mike,
> I guess I had mistakenly started the job on 3 processes earlier
> and it had hung. On running with 2 processes (the way it is supposed to
> be run), it executes correctly on our machines. Can you give us some
> more information about your hardware? You were speaking about some
> reachability issues between certain pairs of nodes. I am guessing that you
> are running tests on either:
> 1. Nodes "A" and "D" or
> 2.
Nodes "B" and "C" > > Also, > > "A" can't run mvapich2 programs with machine "B", and machine "C" > can't run programs with machine "D" > > What exactly is the kind of error message that you see in this case? > > Thanks, > Krishna > > Krishna Chaitanya Kandalla wrote: > >> Mike, >> Thank you for providing the source code. I am able to >> reproduce the hang on our cluster, as well. I will look into the issue. >> >> Thanks, >> Krishna >> >> Mike Heinz wrote: >> >>> I was wondering about that - I passed the parameter in a param file, >>> using the -param argument to mpirun_rsh. I just tried passing it >>> inline as well, here are the results: >>> >>> mpiexec -env MV2_USE_SHMEM_COLL 0 -np 2 >>> /opt/iba/src/mpi_apps/bandwidth/bw 10 10 >>> >>> node 0 >>> >>> Loaded symbols for /lib64/libnss_files.so.2 >>> 0x00002aaaaaae5bf8 in MPIDI_CH3I_SMP_write_progress@plt () >>> from /usr/mpi/gcc/mvapich2-1.2p1/lib/libmpich.so.1.1 >>> (gdb) where >>> #0 0x00002aaaaaae5bf8 in MPIDI_CH3I_SMP_write_progress@plt () >>> from /usr/mpi/gcc/mvapich2-1.2p1/lib/libmpich.so.1.1 >>> #1 0x00002aaaaab17536 in MPIDI_CH3I_Progress (is_blocking=1, state=0x1) >>> at ch3_progress.c:174 >>> #2 0x00002aaaaab98e14 in PMPI_Recv (buf=0xc50000, count=4, >>> datatype=1275068673, source=1, tag=101, comm=1140850688, >>> status=0x601520) >>> at recv.c:156 >>> #3 0x0000000000400ea8 in main (argc=3, argv=0x7ffffe2de508) at bw.c:91 >>> >>> >>> (gdb) where >>> #0 0x00002b9af218cd80 in mthca_poll_cq (ibcq=0xf5de80, ne=1, >>> wc=0x7fffb9786a60) at src/cq.c:470 >>> #1 0x00002b9af14ee2a8 in MPIDI_CH3I_MRAILI_Cq_poll ( >>> vbuf_handle=0x7fffb9786b78, vc_req=0xf55d00, receiving=0, >>> is_blocking=1) >>> at /usr/include/infiniband/verbs.h:934 >>> #2 0x00002b9af14ef2e5 in MPIDI_CH3I_MRAILI_Waiting_msg (vc=0xf55d00, >>> vbuf_handle=0x7fffb9786b78, blocking=1) at ibv_channel_manager.c:468 >>> #3 0x00002b9af14a8304 in MPIDI_CH3I_read_progress >>> (vc_pptr=0x7fffb9786b80, >>> v_ptr=0x7fffb9786b78, is_blocking=) >>> at 
ch3_read_progress.c:158 >>> #4 0x00002b9af14a7f44 in MPIDI_CH3I_Progress (is_blocking=1, >>> state=) at ch3_progress.c:202 >>> #5 0x00002b9af14ec60e in MPIC_Wait (request_ptr=0xfc7978) at >>> helper_fns.c:269 >>> #6 0x00002b9af14eca03 in MPIC_Sendrecv (sendbuf=0x0, sendcount=0, >>> sendtype=1275068685, dest=0, sendtag=1, recvbuf=0x0, recvcount=0, >>> recvtype=1275068685, source=0, recvtag=1, comm=1140850688, >>> status=0x1) >>> at helper_fns.c:125 >>> #7 0x00002b9af149b07a in MPIR_Barrier (comm_ptr=) >>> at barrier.c:82 >>> #8 0x00002b9af149b698 in PMPI_Barrier (comm=1140850688) at >>> barrier.c:446 >>> #9 0x0000000000400ea3 in main (argc=3, argv=0x7fffb9786e88) at bw.c:81 >>> >>> Bw.c is the old "bandwidth" benchmark. It looks like it actually gets >>> out of MPI_Init() in this case, but then one side is waiting at a >>> barrier while the other has already gone past the barrier. I've >>> attached a copy of the program. >>> >>> >>> -- >>> Michael Heinz >>> Principal Engineer, Qlogic Corporation >>> King of Prussia, Pennsylvania >>> -----Original Message----- >>> From: Krishna Chaitanya Kandalla [mailto:kandalla@cse.ohio-state.edu] >>> Sent: Wednesday, July 15, 2009 3:42 PM >>> To: Mike Heinz >>> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging >>> a problem that only affects a few machines in our cluster. >>> >>> Mike, >>> Thats a little surprising. Setting this variable off ensures that a >>> particular flag is set to 0. This flag is supposed to guard the piece >>> of code that does the 2-level communicator creation. Just out of >>> curiosity, can you also let me know the command that you are using to >>> launch the job. The env variables need to be set before the >>> executable is specified. If MV2_USE_SHMEM_COLL=0 appears after the >>> executable name, the job launcher might not pick it up. 
>>> >>> Thanks, >>> Krishna >>> >>> >>> >>> >>> Mike Heinz wrote: >>> >>> >>>> Krishna, thanks for the suggestion - but setting MV2_USE_SHMEM_COLL >>>> to zero did not seem to change the stack trace much: >>>> >>>> Node 0: >>>> >>>> 0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll >>>> (vbuf_handle=0x7fffcb46d698, >>>> >>>> vc_req=0x0, receiving=0, is_blocking=1) at ibv_channel_manager.c:529 >>>> >>>> 529 for (; i < rdma_num_hcas; ++i) { >>>> >>>> (gdb) where >>>> >>>> #0 0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll ( >>>> >>>> vbuf_handle=0x7fffcb46d698, vc_req=0x0, receiving=0, is_blocking=1) >>>> >>>> at ibv_channel_manager.c:529 >>>> >>>> #1 0x00002aaaaab177fa in MPIDI_CH3I_read_progress >>>> (vc_pptr=0x7fffcb46d6a0, >>>> >>>> v_ptr=0x7fffcb46d698, is_blocking=1) at ch3_read_progress.c:143 >>>> >>>> #2 0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1, >>>> >>>> state=) at ch3_progress.c:202 >>>> >>>> #3 0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800) >>>> >>>> at helper_fns.c:269 >>>> >>>> #4 0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x10993a80, >>>> sendcount=2, >>>> >>>> sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x10993a88, >>>> recvcount=2, >>>> >>>> recvtype=1275069445, source=1, recvtag=7, comm=1140850688, >>>> >>>> status=0x7fffcb46d820) at helper_fns.c:125 >>>> >>>> #5 0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=, >>>> >>>> sendcount=, sendtype=1275069445, >>>> recvbuf=0x10993a80, >>>> >>>> recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0) >>>> >>>> at allgather.c:192 >>>> >>>> #6 0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>>> >>>> sendcount=2, sendtype=1275069445, recvbuf=0x10993a80, recvcount=2, >>>> >>>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>>> >>>> #7 0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, >>>> key=0, >>>> >>>> newcomm=0x2aaaaae1c2f4) at comm_split.c:196 >>>> >>>> #8 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2, >>>> 
>>>> ---Type to continue, or q to quit--- >>>> >>>> my_rank=) at create_2level_comm.c:142 >>>> >>>> #9 0x00002aaaaab6877d in PMPI_Init (argc=0x7fffcb46db3c, >>>> argv=0x7fffcb46db30) >>>> >>>> at init.c:146 >>>> >>>> #10 0x0000000000400b2f in main (argc=3, argv=0x7fffcb46dc78) at bw.c:27 >>>> >>>> Node 1: >>>> >>>> MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, v_ptr=0x7fff0b10bb48, >>>> >>>> is_blocking=1) at ch3_read_progress.c:143 >>>> >>>> 143 type = MPIDI_CH3I_MRAILI_Cq_poll(v_ptr, NULL, 0, is_blocking); >>>> >>>> (gdb) where >>>> >>>> #0 MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, >>>> v_ptr=0x7fff0b10bb48, >>>> >>>> is_blocking=1) at ch3_read_progress.c:143 >>>> >>>> #1 0x00002afc9fb21f44 in MPIDI_CH3I_Progress (is_blocking=1, >>>> >>>> state=) at ch3_progress.c:202 >>>> >>>> #2 0x00002afc9fb6660e in MPIC_Wait (request_ptr=0x2afc9fd242a0) >>>> >>>> at helper_fns.c:269 >>>> >>>> #3 0x00002afc9fb66a03 in MPIC_Sendrecv (sendbuf=0xf77028, sendcount=2, >>>> >>>> sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf77020, recvcount=4, >>>> >>>> recvtype=1275069445, source=0, recvtag=7, comm=1140850688, >>>> >>>> status=0x7fff0b10bcd0) at helper_fns.c:125 >>>> >>>> #4 0x00002afc9fb08ddb in MPIR_Allgather (sendbuf=, >>>> >>>> sendcount=, sendtype=1275069445, recvbuf=0xf77020, >>>> >>>> recvcount=2, recvtype=1275069445, comm_ptr=0x2afc9fd26c80) >>>> >>>> at allgather.c:192 >>>> >>>> #5 0x00002afc9fb09a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>>> >>>> sendcount=2, sendtype=1275069445, recvbuf=0xf77020, recvcount=2, >>>> >>>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>>> >>>> #6 0x00002afc9fb4591b in PMPI_Comm_split (comm=1140850688, color=1, >>>> key=0, >>>> >>>> newcomm=0x2afc9fd26d94) at comm_split.c:196 >>>> >>>> #7 0x00002afc9fb478f4 in create_2level_comm (comm=1140850688, size=2, >>>> >>>> my_rank=) at create_2level_comm.c:142 >>>> >>>> #8 0x00002afc9fb730a5 in PMPI_Init (argc=0x7fff0b10bfec, >>>> argv=0x7fff0b10bfe0) >>>> >>>> at 
init.c:146 >>>> >>>> ---Type to continue, or q to quit--- >>>> >>>> #9 0x0000000000400bcf in main (argc=3, argv=0x7fff0b10c128) at bw.c:27 >>>> >>>> Any suggestions would be appreciated. >>>> >>>> -- >>>> >>>> Michael Heinz >>>> >>>> Principal Engineer, Qlogic Corporation >>>> >>>> King of Prussia, Pennsylvania >>>> >>>> *From:* kris.c1986@gmail.com [mailto:kris.c1986@gmail.com] *On >>>> Behalf Of *Krishna Chaitanya >>>> *Sent:* Tuesday, July 14, 2009 6:39 PM >>>> *To:* Mike Heinz >>>> *Cc:* Todd Rimmer; mvapich-discuss@cse.ohio-state.edu; >>>> mpich2-dev@mcs.anl.gov >>>> *Subject:* Re: [mvapich-discuss] [mpich2-dev] Need a hint in >>>> debugging a problem that only affects a few machines in our cluster. >>>> >>>> Mike, >>>> The hang seems to be occuring when the MPI library is trying to >>>> create the 2-level communicator, during the init phase. Can you try >>>> running the test with MV2_USE_SHMEM_COLL >>>> =0. >>>> This will ensure that a flat communicator is used for the subsequent >>>> MPI calls. This might help us isolate the problem. >>>> >>>> Thanks, >>>> Krishna >>>> >>>> On Tue, Jul 14, 2009 at 5:04 PM, Mike Heinz >>>> > wrote: >>>> >>>> We're having a very odd problem with our fabric, where, out of the >>>> entire cluster, machine "A" can't run mvapich2 programs with machine >>>> "B", and machine "C" can't run programs with machine "D" - even >>>> though "A" can run with "D" and "B" can run with "C" - and the rest >>>> of the fabric works fine. >>>> >>>> 1) There are no IB errors anywhere on the fabric that I can find, >>>> and the machines in question all work correctly with mvapich1 and >>>> low-level IB tests. >>>> >>>> 2) The problem occurs whether using mpd or rsh. >>>> >>>> 3) If I attach to the running processes, both machines appear to be >>>> waiting for a read operation to complete. (See below) >>>> >>>> Can anyone make a suggestion on how to debug this? 
>>>> >>>> Stack trace for node 0: >>>> >>>> #0 0x000000361160abb5 in pthread_spin_lock () from >>>> /lib64/libpthread.so.0 >>>> >>>> #1 0x00002aaaab08fb6c in mthca_poll_cq (ibcq=0x2060980, ne=1, >>>> >>>> wc=0x7fff9d835900) at src/cq.c:468 >>>> >>>> #2 0x00002aaaaab5d8d8 in MPIDI_CH3I_MRAILI_Cq_poll ( >>>> >>>> vbuf_handle=0x7fff9d8359d8, vc_req=0x0, receiving=0, is_blocking=1) >>>> >>>> at /usr/include/infiniband/verbs.h:934 >>>> >>>> #3 0x00002aaaaab177fa in MPIDI_CH3I_read_progress >>>> (vc_pptr=0x7fff9d8359e0, >>>> >>>> v_ptr=0x7fff9d8359d8, is_blocking=1) at ch3_read_progress.c:143 >>>> >>>> #4 0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1, >>>> >>>> state=) at ch3_progress.c:202 >>>> >>>> #5 0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800) >>>> >>>> at helper_fns.c:269 >>>> >>>> #6 0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x217fc50, sendcount=2, >>>> >>>> sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x217fc58, recvcount=2, >>>> >>>> recvtype=1275069445, source=1, recvtag=7, comm=1140850688, >>>> >>>> status=0x7fff9d835b60) at helper_fns.c:125 >>>> >>>> #7 0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=, >>>> >>>> sendcount=, sendtype=1275069445, >>>> recvbuf=0x217fc50, >>>> >>>> recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0) >>>> >>>> at allgather.c:192 >>>> >>>> #8 0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>>> >>>> sendcount=2, sendtype=1275069445, recvbuf=0x217fc50, recvcount=2, >>>> >>>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>>> >>>> ---Type to continue, or q to quit--- >>>> >>>> #9 0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, >>>> key=0, >>>> >>>> newcomm=0x2aaaaae1c2f4) at comm_split.c:196 >>>> >>>> #10 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2, >>>> >>>> my_rank=) at create_2level_comm.c:142 >>>> >>>> #11 0x00002aaaaab6877d in PMPI_Init (argc=0x7fff9d835e7c, >>>> argv=0x7fff9d835e70) >>>> >>>> at init.c:146 >>>> >>>> #12 
0x0000000000400b2f in main (argc=3, argv=0x7fff9d835fb8) at bw.c:27 >>>> >>>> Stack trace for node 1: >>>> >>>> #0 0x00002ac3cbdac2d2 in MPIDI_CH3I_read_progress >>>> (vc_pptr=0x7fffdee81020, >>>> >>>> v_ptr=0x7fffdee81018, is_blocking=1) at ch3_read_progress.c:143 >>>> >>>> #1 0x00002ac3cbdabf44 in MPIDI_CH3I_Progress (is_blocking=1, >>>> >>>> state=) at ch3_progress.c:202 >>>> >>>> #2 0x00002ac3cbdf060e in MPIC_Wait (request_ptr=0x2ac3cbfae2a0) >>>> >>>> at helper_fns.c:269 >>>> >>>> #3 0x00002ac3cbdf0a03 in MPIC_Sendrecv (sendbuf=0xf79028, sendcount=2, >>>> >>>> sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf79020, recvcount=4, >>>> >>>> recvtype=1275069445, source=0, recvtag=7, comm=1140850688, >>>> >>>> status=0x7fffdee811a0) at helper_fns.c:125 >>>> >>>> #4 0x00002ac3cbd92ddb in MPIR_Allgather (sendbuf=, >>>> >>>> sendcount=, sendtype=1275069445, recvbuf=0xf79020, >>>> >>>> recvcount=2, recvtype=1275069445, comm_ptr=0x2ac3cbfb0c80) >>>> >>>> at allgather.c:192 >>>> >>>> #5 0x00002ac3cbd93a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>>> >>>> sendcount=2, sendtype=1275069445, recvbuf=0xf79020, recvcount=2, >>>> >>>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>>> >>>> #6 0x00002ac3cbdcf91b in PMPI_Comm_split (comm=1140850688, color=1, >>>> key=0, >>>> >>>> newcomm=0x2ac3cbfb0d94) at comm_split.c:196 >>>> >>>> #7 0x00002ac3cbdd18f4 in create_2level_comm (comm=1140850688, size=2, >>>> >>>> my_rank=) at create_2level_comm.c:142 >>>> >>>> #8 0x00002ac3cbdfd0a5 in PMPI_Init (argc=0x7fffdee814bc, >>>> argv=0x7fffdee814b0) >>>> >>>> at init.c:146 >>>> >>>> ---Type to continue, or q to quit--- >>>> >>>> #9 0x0000000000400bcf in main (argc=3, argv=0x7fffdee815f8) at bw.c:27 >>>> >>>> -- >>>> >>>> Michael Heinz >>>> >>>> Principal Engineer, Qlogic Corporation >>>> >>>> King of Prussia, Pennsylvania >>>> >>>> >>>> _______________________________________________ >>>> mvapich-discuss mailing list >>>> mvapich-discuss@cse.ohio-state.edu >>>> >>>> 
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>>
>>>> --
>>>> In the middle of difficulty, lies opportunity
>>>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss@cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>

From kandalla at cse.ohio-state.edu Thu Jul 16 13:04:29 2009
From: kandalla at cse.ohio-state.edu (Krishna Chaitanya Kandalla)
Date: Thu Jul 16 13:05:02 2009
Subject: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
In-Reply-To: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2ABF@MNEXMB1.qlogic.org>
References: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F296D@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A0F@MNEXMB1.qlogic.org> <4A5E30F5.8070302@cse.ohio-state.edu> <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A2C@MNEXMB1.qlogic.org> <4A5E3858.5010702@cse.ohio-state.edu> <4A5E63EB.7000700@cse.ohio-state.edu> <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A75@MNEXMB1.qlogic.org> <4A5F566E.9050004@cse.ohio-state.edu> <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2ABF@MNEXMB1.qlogic.org>
Message-ID: <4A5F5D9D.5080300@cse.ohio-state.edu>

Mike,
Can you also try out mvapich2-1.4 RC1? We have added a bunch of enhancements and bug fixes in this version.

Thanks,
Krishna

Mike Heinz wrote:
> mvapich-1.1.0-3355.src.rpm
>
> mvapich2-1.2p1-1.src.rpm
>
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
>
> -----Original Message-----
> From: Krishna Chaitanya Kandalla [mailto:kandalla@cse.ohio-state.edu]
> Sent: Thursday, July 16, 2009 12:34 PM
> To: Mike Heinz
> Cc: mvapich-discuss@cse.ohio-state.edu
> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
>
> Mike,
> Can you also let us know the version numbers of the mvapich2
> and mvapich1 stacks that you are using?
>
> Thanks,
> Krishna
>

From michael.heinz at qlogic.com Thu Jul 16 14:20:24 2009
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Thu Jul 16 14:20:57 2009
Subject: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
In-Reply-To: <4A5F5D9D.5080300@cse.ohio-state.edu>
References: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F296D@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A0F@MNEXMB1.qlogic.org> <4A5E30F5.8070302@cse.ohio-state.edu> <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A2C@MNEXMB1.qlogic.org> <4A5E3858.5010702@cse.ohio-state.edu> <4A5E63EB.7000700@cse.ohio-state.edu> <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A75@MNEXMB1.qlogic.org> <4A5F566E.9050004@cse.ohio-state.edu> <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2ABF@MNEXMB1.qlogic.org> <4A5F5D9D.5080300@cse.ohio-state.edu>
Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2ADE@MNEXMB1.qlogic.org>

Basically the same behavior - although the node that seems stuck in MPIR_Barrier() seems to have switched from 0 to 1:

Node 0:

(gdb) where
#0 0x000000361160abb5 in pthread_spin_lock () from /lib64/libpthread.so.0
#1 0x00002aaaab096b6c in mthca_poll_cq (ibcq=0x13c62980, ne=1, wc=0x7fffadb1bea0) at src/cq.c:468
#2 0x00002aaaaab617a4 in MPIDI_CH3I_MRAILI_Cq_poll ()
   from /usr/mpi/gcc/mvapich2-1.4rc1/lib/libmpich.so.1.1
#3 0x00002aaaaab1904d in MPIDI_CH3I_read_progress ()
   from /usr/mpi/gcc/mvapich2-1.4rc1/lib/libmpich.so.1.1
#4 0x00002aaaaab18cb4 in MPIDI_CH3I_Progress ()
   from /usr/mpi/gcc/mvapich2-1.4rc1/lib/libmpich.so.1.1
#5 0x00002aaaaab9e724 in PMPI_Recv ()
   from /usr/mpi/gcc/mvapich2-1.4rc1/lib/libmpich.so.1.1
#6 0x0000000000400ea8 in main (argc=3, argv=0x7fffadb1c208) at bw.c:91

Node 1:

#0 mthca_poll_cq (ibcq=0xf5cdb0, ne=1, wc=0x7fff24883b40) at src/cq.c:461
#1 0x00002b24863f3164 in MPIDI_CH3I_MRAILI_Cq_poll ()
   from /usr/mpi/gcc/mvapich2-1.4rc1/lib/libmpich.so.1.1
#2 0x00002b24863f4248 in MPIDI_CH3I_MRAILI_Waiting_msg ()
   from /usr/mpi/gcc/mvapich2-1.4rc1/lib/libmpich.so.1.1
#3 0x00002b24863aab4b in MPIDI_CH3I_read_progress ()
   from /usr/mpi/gcc/mvapich2-1.4rc1/lib/libmpich.so.1.1
#4 0x00002b24863aa784 in MPIDI_CH3I_Progress ()
   from /usr/mpi/gcc/mvapich2-1.4rc1/lib/libmpich.so.1.1
#5 0x00002b24863f13b7 in MPIC_Wait ()
   from /usr/mpi/gcc/mvapich2-1.4rc1/lib/libmpich.so.1.1
#6 0x00002b24863f17b3 in MPIC_Sendrecv ()
   from /usr/mpi/gcc/mvapich2-1.4rc1/lib/libmpich.so.1.1
#7 0x00002b248639cc6a in MPIR_Barrier ()
   from /usr/mpi/gcc/mvapich2-1.4rc1/lib/libmpich.so.1.1
#8 0x00002b248639d258 in PMPI_Barrier ()
   from /usr/mpi/gcc/mvapich2-1.4rc1/lib/libmpich.so.1.1
#9 0x0000000000400ea3 in main (argc=3, argv=0x7fff24883f68) at bw.c:81

I did some basic debugging; MPIR_Barrier() is being called correctly on both nodes, but it appears that one of them never receives a send from the other. I'm guessing that it's a dropped IB completion, but that's all I can say at this point.

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

-----Original Message-----
From: Krishna Chaitanya Kandalla [mailto:kandalla@cse.ohio-state.edu]
Sent: Thursday, July 16, 2009 1:04 PM
To: Mike Heinz
Cc: mvapich-discuss@cse.ohio-state.edu
Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.

Mike,
Can you also try out mvapich2-1.4 RC1. We have added a bunch of enhancements and bug-fixes in this version.

Thanks,
Krishna

Mike Heinz wrote:
> mvapich-1.1.0-3355.src.rpm
>
> mvapich2-1.2p1-1.src.rpm
>
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
>
> -----Original Message-----
> From: Krishna Chaitanya Kandalla [mailto:kandalla@cse.ohio-state.edu]
> Sent: Thursday, July 16, 2009 12:34 PM
> To: Mike Heinz
> Cc: mvapich-discuss@cse.ohio-state.edu
> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
>
> Mike,
> Can you also let us know the version numbers of the mvapich2
> and mvapich1 stacks that you are using?
> > Thanks, > Krishna > > Mike Heinz wrote: > >> Krishna, >> >> What I'm saying is that if I run the program between A & D or A & C it works, but if I run it between A & B it silently hangs, never making progress. Meanwhile, I can run the same program between C & B and C & A, but runs between C & D silently hang without making progress. This problem only occurs with mvapich2, not with mvapich1 or openmpi. All other Infiniband operations appear to be working normally. >> >> This behavior is repeatable for those two pairs of machines ( A & B and C & D), but has not been seen on any other machines on the fabric, and we have not seen this on any other fabric - if I had to guess there's some kind of timing hole that's being exposed in very narrow conditions. >> >> The fabric in question is actually used to test software before we release it, so it contains a mix of Linux distros, but all machines are X86_64 architecture. >> >> For the stack traces I sent you, node 0 is a 8-way Xeon E5320 1.86 gHZ, node 1 is a 2-way Opteron running at 2.4 GHz. >> >> I realize the symptoms are quite bizarre - we've had several Infiniband coders and testers investigating this for a couple of weeks now - I was just hoping you might be able to suggest a line of investigation. >> >> -- >> Michael Heinz >> Principal Engineer, Qlogic Corporation >> King of Prussia, Pennsylvania >> >> -----Original Message----- >> From: Krishna Chaitanya Kandalla [mailto:kandalla@cse.ohio-state.edu] >> Sent: Wednesday, July 15, 2009 7:19 PM >> To: Mike Heinz >> Cc: mvapich-discuss@cse.ohio-state.edu; Todd Rimmer >> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster. >> >> Mike, >> I guess, I had mistakenly started the job on 3 processes earlier >> and it had hung. On running with 2 processes, (the way it is supposed to >> be run), it executes correctly on our machines. Can you give us some >> more information about your hardware. 
You were speaking about some >> reachability issues between certain two nodes. I am guessing that you >> are running tests with on either : >> 1. Nodes "A" and "D" or >> 2. Nodes "B" and "C" >> >> Also, >> > "A" can't run mvapich2 programs with machine "B", and machine "C" >> can't run programs with machine "D" >> >> What exactly is the kind of error message that you see in this case? >> >> Thanks, >> Krishna >> >> Krishna Chaitanya Kandalla wrote: >> >> >>> Mike, >>> Thank you for providing the source code. I am able to >>> reproduce the hang on our cluster, as well. I will look into the issue. >>> >>> Thanks, >>> Krishna >>> >>> Mike Heinz wrote: >>> >>> >>>> I was wondering about that - I passed the parameter in a param file, >>>> using the -param argument to mpirun_rsh. I just tried passing it >>>> inline as well, here are the results: >>>> >>>> mpiexec -env MV2_USE_SHMEM_COLL 0 -np 2 >>>> /opt/iba/src/mpi_apps/bandwidth/bw 10 10 >>>> >>>> node 0 >>>> >>>> Loaded symbols for /lib64/libnss_files.so.2 >>>> 0x00002aaaaaae5bf8 in MPIDI_CH3I_SMP_write_progress@plt () >>>> from /usr/mpi/gcc/mvapich2-1.2p1/lib/libmpich.so.1.1 >>>> (gdb) where >>>> #0 0x00002aaaaaae5bf8 in MPIDI_CH3I_SMP_write_progress@plt () >>>> from /usr/mpi/gcc/mvapich2-1.2p1/lib/libmpich.so.1.1 >>>> #1 0x00002aaaaab17536 in MPIDI_CH3I_Progress (is_blocking=1, state=0x1) >>>> at ch3_progress.c:174 >>>> #2 0x00002aaaaab98e14 in PMPI_Recv (buf=0xc50000, count=4, >>>> datatype=1275068673, source=1, tag=101, comm=1140850688, >>>> status=0x601520) >>>> at recv.c:156 >>>> #3 0x0000000000400ea8 in main (argc=3, argv=0x7ffffe2de508) at bw.c:91 >>>> >>>> >>>> (gdb) where >>>> #0 0x00002b9af218cd80 in mthca_poll_cq (ibcq=0xf5de80, ne=1, >>>> wc=0x7fffb9786a60) at src/cq.c:470 >>>> #1 0x00002b9af14ee2a8 in MPIDI_CH3I_MRAILI_Cq_poll ( >>>> vbuf_handle=0x7fffb9786b78, vc_req=0xf55d00, receiving=0, >>>> is_blocking=1) >>>> at /usr/include/infiniband/verbs.h:934 >>>> #2 0x00002b9af14ef2e5 in 
MPIDI_CH3I_MRAILI_Waiting_msg (vc=0xf55d00, >>>> vbuf_handle=0x7fffb9786b78, blocking=1) at ibv_channel_manager.c:468 >>>> #3 0x00002b9af14a8304 in MPIDI_CH3I_read_progress >>>> (vc_pptr=0x7fffb9786b80, >>>> v_ptr=0x7fffb9786b78, is_blocking=) >>>> at ch3_read_progress.c:158 >>>> #4 0x00002b9af14a7f44 in MPIDI_CH3I_Progress (is_blocking=1, >>>> state=) at ch3_progress.c:202 >>>> #5 0x00002b9af14ec60e in MPIC_Wait (request_ptr=0xfc7978) at >>>> helper_fns.c:269 >>>> #6 0x00002b9af14eca03 in MPIC_Sendrecv (sendbuf=0x0, sendcount=0, >>>> sendtype=1275068685, dest=0, sendtag=1, recvbuf=0x0, recvcount=0, >>>> recvtype=1275068685, source=0, recvtag=1, comm=1140850688, >>>> status=0x1) >>>> at helper_fns.c:125 >>>> #7 0x00002b9af149b07a in MPIR_Barrier (comm_ptr=) >>>> at barrier.c:82 >>>> #8 0x00002b9af149b698 in PMPI_Barrier (comm=1140850688) at >>>> barrier.c:446 >>>> #9 0x0000000000400ea3 in main (argc=3, argv=0x7fffb9786e88) at bw.c:81 >>>> >>>> Bw.c is the old "bandwidth" benchmark. It looks like it actually gets >>>> out of MPI_Init() in this case, but then one side is waiting at a >>>> barrier while the other has already gone past the barrier. I've >>>> attached a copy of the program. >>>> >>>> >>>> -- >>>> Michael Heinz >>>> Principal Engineer, Qlogic Corporation >>>> King of Prussia, Pennsylvania >>>> -----Original Message----- >>>> From: Krishna Chaitanya Kandalla [mailto:kandalla@cse.ohio-state.edu] >>>> Sent: Wednesday, July 15, 2009 3:42 PM >>>> To: Mike Heinz >>>> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging >>>> a problem that only affects a few machines in our cluster. >>>> >>>> Mike, >>>> Thats a little surprising. Setting this variable off ensures that a >>>> particular flag is set to 0. This flag is supposed to guard the piece >>>> of code that does the 2-level communicator creation. Just out of >>>> curiosity, can you also let me know the command that you are using to >>>> launch the job. 
The env variables need to be set before the >>>> executable is specified. If MV2_USE_SHMEM_COLL=0 appears after the >>>> executable name, the job launcher might not pick it up. >>>> >>>> Thanks, >>>> Krishna >>>> >>>> >>>> >>>> >>>> Mike Heinz wrote: >>>> >>>> >>>> >>>>> Krishna, thanks for the suggestion - but setting MV2_USE_SHMEM_COLL >>>>> to zero did not seem to change the stack trace much: >>>>> >>>>> Node 0: >>>>> >>>>> 0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll >>>>> (vbuf_handle=0x7fffcb46d698, >>>>> >>>>> vc_req=0x0, receiving=0, is_blocking=1) at ibv_channel_manager.c:529 >>>>> >>>>> 529 for (; i < rdma_num_hcas; ++i) { >>>>> >>>>> (gdb) where >>>>> >>>>> #0 0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll ( >>>>> >>>>> vbuf_handle=0x7fffcb46d698, vc_req=0x0, receiving=0, is_blocking=1) >>>>> >>>>> at ibv_channel_manager.c:529 >>>>> >>>>> #1 0x00002aaaaab177fa in MPIDI_CH3I_read_progress >>>>> (vc_pptr=0x7fffcb46d6a0, >>>>> >>>>> v_ptr=0x7fffcb46d698, is_blocking=1) at ch3_read_progress.c:143 >>>>> >>>>> #2 0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1, >>>>> >>>>> state=) at ch3_progress.c:202 >>>>> >>>>> #3 0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800) >>>>> >>>>> at helper_fns.c:269 >>>>> >>>>> #4 0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x10993a80, >>>>> sendcount=2, >>>>> >>>>> sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x10993a88, >>>>> recvcount=2, >>>>> >>>>> recvtype=1275069445, source=1, recvtag=7, comm=1140850688, >>>>> >>>>> status=0x7fffcb46d820) at helper_fns.c:125 >>>>> >>>>> #5 0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=, >>>>> >>>>> sendcount=, sendtype=1275069445, >>>>> recvbuf=0x10993a80, >>>>> >>>>> recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0) >>>>> >>>>> at allgather.c:192 >>>>> >>>>> #6 0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>>>> >>>>> sendcount=2, sendtype=1275069445, recvbuf=0x10993a80, recvcount=2, >>>>> >>>>> recvtype=1275069445, comm=1140850688) 
at allgather.c:866 >>>>> >>>>> #7 0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, >>>>> key=0, >>>>> >>>>> newcomm=0x2aaaaae1c2f4) at comm_split.c:196 >>>>> >>>>> #8 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2, >>>>> >>>>> ---Type to continue, or q to quit--- >>>>> >>>>> my_rank=) at create_2level_comm.c:142 >>>>> >>>>> #9 0x00002aaaaab6877d in PMPI_Init (argc=0x7fffcb46db3c, >>>>> argv=0x7fffcb46db30) >>>>> >>>>> at init.c:146 >>>>> >>>>> #10 0x0000000000400b2f in main (argc=3, argv=0x7fffcb46dc78) at bw.c:27 >>>>> >>>>> Node 1: >>>>> >>>>> MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, v_ptr=0x7fff0b10bb48, >>>>> >>>>> is_blocking=1) at ch3_read_progress.c:143 >>>>> >>>>> 143 type = MPIDI_CH3I_MRAILI_Cq_poll(v_ptr, NULL, 0, is_blocking); >>>>> >>>>> (gdb) where >>>>> >>>>> #0 MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, >>>>> v_ptr=0x7fff0b10bb48, >>>>> >>>>> is_blocking=1) at ch3_read_progress.c:143 >>>>> >>>>> #1 0x00002afc9fb21f44 in MPIDI_CH3I_Progress (is_blocking=1, >>>>> >>>>> state=) at ch3_progress.c:202 >>>>> >>>>> #2 0x00002afc9fb6660e in MPIC_Wait (request_ptr=0x2afc9fd242a0) >>>>> >>>>> at helper_fns.c:269 >>>>> >>>>> #3 0x00002afc9fb66a03 in MPIC_Sendrecv (sendbuf=0xf77028, sendcount=2, >>>>> >>>>> sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf77020, recvcount=4, >>>>> >>>>> recvtype=1275069445, source=0, recvtag=7, comm=1140850688, >>>>> >>>>> status=0x7fff0b10bcd0) at helper_fns.c:125 >>>>> >>>>> #4 0x00002afc9fb08ddb in MPIR_Allgather (sendbuf=, >>>>> >>>>> sendcount=, sendtype=1275069445, recvbuf=0xf77020, >>>>> >>>>> recvcount=2, recvtype=1275069445, comm_ptr=0x2afc9fd26c80) >>>>> >>>>> at allgather.c:192 >>>>> >>>>> #5 0x00002afc9fb09a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>>>> >>>>> sendcount=2, sendtype=1275069445, recvbuf=0xf77020, recvcount=2, >>>>> >>>>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>>>> >>>>> #6 0x00002afc9fb4591b in PMPI_Comm_split 
(comm=1140850688, color=1, >>>>> key=0, >>>>> >>>>> newcomm=0x2afc9fd26d94) at comm_split.c:196 >>>>> >>>>> #7 0x00002afc9fb478f4 in create_2level_comm (comm=1140850688, size=2, >>>>> >>>>> my_rank=) at create_2level_comm.c:142 >>>>> >>>>> #8 0x00002afc9fb730a5 in PMPI_Init (argc=0x7fff0b10bfec, >>>>> argv=0x7fff0b10bfe0) >>>>> >>>>> at init.c:146 >>>>> >>>>> ---Type to continue, or q to quit--- >>>>> >>>>> #9 0x0000000000400bcf in main (argc=3, argv=0x7fff0b10c128) at bw.c:27 >>>>> >>>>> Any suggestions would be appreciated. >>>>> >>>>> -- >>>>> >>>>> Michael Heinz >>>>> >>>>> Principal Engineer, Qlogic Corporation >>>>> >>>>> King of Prussia, Pennsylvania >>>>> >>>>> *From:* kris.c1986@gmail.com [mailto:kris.c1986@gmail.com] *On >>>>> Behalf Of *Krishna Chaitanya >>>>> *Sent:* Tuesday, July 14, 2009 6:39 PM >>>>> *To:* Mike Heinz >>>>> *Cc:* Todd Rimmer; mvapich-discuss@cse.ohio-state.edu; >>>>> mpich2-dev@mcs.anl.gov >>>>> *Subject:* Re: [mvapich-discuss] [mpich2-dev] Need a hint in >>>>> debugging a problem that only affects a few machines in our cluster. >>>>> >>>>> Mike, >>>>> The hang seems to be occuring when the MPI library is trying to >>>>> create the 2-level communicator, during the init phase. Can you try >>>>> running the test with MV2_USE_SHMEM_COLL >>>>> =0. >>>>> This will ensure that a flat communicator is used for the subsequent >>>>> MPI calls. This might help us isolate the problem. >>>>> >>>>> Thanks, >>>>> Krishna >>>>> >>>>> On Tue, Jul 14, 2009 at 5:04 PM, Mike Heinz >>>>> > wrote: >>>>> >>>>> We're having a very odd problem with our fabric, where, out of the >>>>> entire cluster, machine "A" can't run mvapich2 programs with machine >>>>> "B", and machine "C" can't run programs with machine "D" - even >>>>> though "A" can run with "D" and "B" can run with "C" - and the rest >>>>> of the fabric works fine. 
>>>>> >>>>> 1) There are no IB errors anywhere on the fabric that I can find, >>>>> and the machines in question all work correctly with mvapich1 and >>>>> low-level IB tests. >>>>> >>>>> 2) The problem occurs whether using mpd or rsh. >>>>> >>>>> 3) If I attach to the running processes, both machines appear to be >>>>> waiting for a read operation to complete. (See below) >>>>> >>>>> Can anyone make a suggestion on how to debug this? >>>>> >>>>> Stack trace for node 0: >>>>> >>>>> #0 0x000000361160abb5 in pthread_spin_lock () from >>>>> /lib64/libpthread.so.0 >>>>> >>>>> #1 0x00002aaaab08fb6c in mthca_poll_cq (ibcq=0x2060980, ne=1, >>>>> >>>>> wc=0x7fff9d835900) at src/cq.c:468 >>>>> >>>>> #2 0x00002aaaaab5d8d8 in MPIDI_CH3I_MRAILI_Cq_poll ( >>>>> >>>>> vbuf_handle=0x7fff9d8359d8, vc_req=0x0, receiving=0, is_blocking=1) >>>>> >>>>> at /usr/include/infiniband/verbs.h:934 >>>>> >>>>> #3 0x00002aaaaab177fa in MPIDI_CH3I_read_progress >>>>> (vc_pptr=0x7fff9d8359e0, >>>>> >>>>> v_ptr=0x7fff9d8359d8, is_blocking=1) at ch3_read_progress.c:143 >>>>> >>>>> #4 0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1, >>>>> >>>>> state=) at ch3_progress.c:202 >>>>> >>>>> #5 0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800) >>>>> >>>>> at helper_fns.c:269 >>>>> >>>>> #6 0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x217fc50, sendcount=2, >>>>> >>>>> sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x217fc58, recvcount=2, >>>>> >>>>> recvtype=1275069445, source=1, recvtag=7, comm=1140850688, >>>>> >>>>> status=0x7fff9d835b60) at helper_fns.c:125 >>>>> >>>>> #7 0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=, >>>>> >>>>> sendcount=, sendtype=1275069445, >>>>> recvbuf=0x217fc50, >>>>> >>>>> recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0) >>>>> >>>>> at allgather.c:192 >>>>> >>>>> #8 0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>>>> >>>>> sendcount=2, sendtype=1275069445, recvbuf=0x217fc50, recvcount=2, >>>>> >>>>> recvtype=1275069445, 
comm=1140850688) at allgather.c:866 >>>>> >>>>> ---Type to continue, or q to quit--- >>>>> >>>>> #9 0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, >>>>> key=0, >>>>> >>>>> newcomm=0x2aaaaae1c2f4) at comm_split.c:196 >>>>> >>>>> #10 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2, >>>>> >>>>> my_rank=) at create_2level_comm.c:142 >>>>> >>>>> #11 0x00002aaaaab6877d in PMPI_Init (argc=0x7fff9d835e7c, >>>>> argv=0x7fff9d835e70) >>>>> >>>>> at init.c:146 >>>>> >>>>> #12 0x0000000000400b2f in main (argc=3, argv=0x7fff9d835fb8) at bw.c:27 >>>>> >>>>> Stack trace for node 1: >>>>> >>>>> #0 0x00002ac3cbdac2d2 in MPIDI_CH3I_read_progress >>>>> (vc_pptr=0x7fffdee81020, >>>>> >>>>> v_ptr=0x7fffdee81018, is_blocking=1) at ch3_read_progress.c:143 >>>>> >>>>> #1 0x00002ac3cbdabf44 in MPIDI_CH3I_Progress (is_blocking=1, >>>>> >>>>> state=) at ch3_progress.c:202 >>>>> >>>>> #2 0x00002ac3cbdf060e in MPIC_Wait (request_ptr=0x2ac3cbfae2a0) >>>>> >>>>> at helper_fns.c:269 >>>>> >>>>> #3 0x00002ac3cbdf0a03 in MPIC_Sendrecv (sendbuf=0xf79028, sendcount=2, >>>>> >>>>> sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf79020, recvcount=4, >>>>> >>>>> recvtype=1275069445, source=0, recvtag=7, comm=1140850688, >>>>> >>>>> status=0x7fffdee811a0) at helper_fns.c:125 >>>>> >>>>> #4 0x00002ac3cbd92ddb in MPIR_Allgather (sendbuf=, >>>>> >>>>> sendcount=, sendtype=1275069445, recvbuf=0xf79020, >>>>> >>>>> recvcount=2, recvtype=1275069445, comm_ptr=0x2ac3cbfb0c80) >>>>> >>>>> at allgather.c:192 >>>>> >>>>> #5 0x00002ac3cbd93a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>>>> >>>>> sendcount=2, sendtype=1275069445, recvbuf=0xf79020, recvcount=2, >>>>> >>>>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>>>> >>>>> #6 0x00002ac3cbdcf91b in PMPI_Comm_split (comm=1140850688, color=1, >>>>> key=0, >>>>> >>>>> newcomm=0x2ac3cbfb0d94) at comm_split.c:196 >>>>> >>>>> #7 0x00002ac3cbdd18f4 in create_2level_comm (comm=1140850688, size=2, >>>>> >>>>> 
my_rank=) at create_2level_comm.c:142 >>>>> >>>>> #8 0x00002ac3cbdfd0a5 in PMPI_Init (argc=0x7fffdee814bc, >>>>> argv=0x7fffdee814b0) >>>>> >>>>> at init.c:146 >>>>> >>>>> ---Type to continue, or q to quit--- >>>>> >>>>> #9 0x0000000000400bcf in main (argc=3, argv=0x7fffdee815f8) at bw.c:27 >>>>> >>>>> -- >>>>> >>>>> Michael Heinz >>>>> >>>>> Principal Engineer, Qlogic Corporation >>>>> >>>>> King of Prussia, Pennsylvania >>>>> >>>>> >>>>> _______________________________________________ >>>>> mvapich-discuss mailing list >>>>> mvapich-discuss@cse.ohio-state.edu >>>>> >>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> In the middle of difficulty, lies opportunity >>>>> >>>>> >>>>> >>>>> >>> _______________________________________________ >>> mvapich-discuss mailing list >>> mvapich-discuss@cse.ohio-state.edu >>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >>> >>> >>> >>> >> >> > > > From michael.heinz at qlogic.com Thu Jul 16 15:06:54 2009 From: michael.heinz at qlogic.com (Mike Heinz) Date: Thu Jul 16 15:07:24 2009 Subject: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster. 
In-Reply-To: <4A5F5D9D.5080300@cse.ohio-state.edu>
References: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F296D@MNEXMB1.qlogic.org>
    <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A0F@MNEXMB1.qlogic.org>
    <4A5E30F5.8070302@cse.ohio-state.edu>
    <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A2C@MNEXMB1.qlogic.org>
    <4A5E3858.5010702@cse.ohio-state.edu>
    <4A5E63EB.7000700@cse.ohio-state.edu>
    <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2A75@MNEXMB1.qlogic.org>
    <4A5F566E.9050004@cse.ohio-state.edu>
    <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2ABF@MNEXMB1.qlogic.org>
    <4A5F5D9D.5080300@cse.ohio-state.edu>
Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB453E3F2AE8@MNEXMB1.qlogic.org>

Krishna - just to be clear, this isn't just a problem with the bw program; it happens with all MPI programs tested on these machines. For example, when testing osu_bw, it hangs in Waitall(), polling the Cq, similar to the way bw hung in Barrier().

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

-----Original Message-----
From: Krishna Chaitanya Kandalla [mailto:kandalla@cse.ohio-state.edu]
Sent: Thursday, July 16, 2009 1:04 PM
To: Mike Heinz
Cc: mvapich-discuss@cse.ohio-state.edu
Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.

Mike,
Can you also try out mvapich2-1.4 RC1? We have added a bunch of enhancements and bug-fixes in this version.

Thanks,
Krishna

Mike Heinz wrote:
> mvapich-1.1.0-3355.src.rpm
>
> mvapich2-1.2p1-1.src.rpm
>
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
>
> -----Original Message-----
> From: Krishna Chaitanya Kandalla [mailto:kandalla@cse.ohio-state.edu]
> Sent: Thursday, July 16, 2009 12:34 PM
> To: Mike Heinz
> Cc: mvapich-discuss@cse.ohio-state.edu
> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
>
> Mike,
> Can you also let us know the version numbers of the mvapich2
> and mvapich1 stacks that you are using?
>
> Thanks,
> Krishna
>
> Mike Heinz wrote:
>
>> Krishna,
>>
>> What I'm saying is that if I run the program between A & D or A & C it works, but if I run it between A & B it silently hangs, never making progress. Meanwhile, I can run the same program between C & B and C & A, but runs between C & D silently hang without making progress. This problem only occurs with mvapich2, not with mvapich1 or openmpi. All other Infiniband operations appear to be working normally.
>>
>> This behavior is repeatable for those two pairs of machines (A & B and C & D), but has not been seen on any other machines on the fabric, and we have not seen this on any other fabric - if I had to guess there's some kind of timing hole that's being exposed in very narrow conditions.
>>
>> The fabric in question is actually used to test software before we release it, so it contains a mix of Linux distros, but all machines are X86_64 architecture.
>>
>> For the stack traces I sent you, node 0 is an 8-way Xeon E5320 1.86 GHz, node 1 is a 2-way Opteron running at 2.4 GHz.
>>
>> I realize the symptoms are quite bizarre - we've had several Infiniband coders and testers investigating this for a couple of weeks now - I was just hoping you might be able to suggest a line of investigation.
>>
>> --
>> Michael Heinz
>> Principal Engineer, Qlogic Corporation
>> King of Prussia, Pennsylvania
>>
>> -----Original Message-----
>> From: Krishna Chaitanya Kandalla [mailto:kandalla@cse.ohio-state.edu]
>> Sent: Wednesday, July 15, 2009 7:19 PM
>> To: Mike Heinz
>> Cc: mvapich-discuss@cse.ohio-state.edu; Todd Rimmer
>> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging a problem that only affects a few machines in our cluster.
>>
>> Mike,
>> I guess I had mistakenly started the job on 3 processes earlier
>> and it had hung.
On running with 2 processes, (the way it is supposed to >> be run), it executes correctly on our machines. Can you give us some >> more information about your hardware. You were speaking about some >> reachability issues between certain two nodes. I am guessing that you >> are running tests with on either : >> 1. Nodes "A" and "D" or >> 2. Nodes "B" and "C" >> >> Also, >> > "A" can't run mvapich2 programs with machine "B", and machine "C" >> can't run programs with machine "D" >> >> What exactly is the kind of error message that you see in this case? >> >> Thanks, >> Krishna >> >> Krishna Chaitanya Kandalla wrote: >> >> >>> Mike, >>> Thank you for providing the source code. I am able to >>> reproduce the hang on our cluster, as well. I will look into the issue. >>> >>> Thanks, >>> Krishna >>> >>> Mike Heinz wrote: >>> >>> >>>> I was wondering about that - I passed the parameter in a param file, >>>> using the -param argument to mpirun_rsh. I just tried passing it >>>> inline as well, here are the results: >>>> >>>> mpiexec -env MV2_USE_SHMEM_COLL 0 -np 2 >>>> /opt/iba/src/mpi_apps/bandwidth/bw 10 10 >>>> >>>> node 0 >>>> >>>> Loaded symbols for /lib64/libnss_files.so.2 >>>> 0x00002aaaaaae5bf8 in MPIDI_CH3I_SMP_write_progress@plt () >>>> from /usr/mpi/gcc/mvapich2-1.2p1/lib/libmpich.so.1.1 >>>> (gdb) where >>>> #0 0x00002aaaaaae5bf8 in MPIDI_CH3I_SMP_write_progress@plt () >>>> from /usr/mpi/gcc/mvapich2-1.2p1/lib/libmpich.so.1.1 >>>> #1 0x00002aaaaab17536 in MPIDI_CH3I_Progress (is_blocking=1, state=0x1) >>>> at ch3_progress.c:174 >>>> #2 0x00002aaaaab98e14 in PMPI_Recv (buf=0xc50000, count=4, >>>> datatype=1275068673, source=1, tag=101, comm=1140850688, >>>> status=0x601520) >>>> at recv.c:156 >>>> #3 0x0000000000400ea8 in main (argc=3, argv=0x7ffffe2de508) at bw.c:91 >>>> >>>> >>>> (gdb) where >>>> #0 0x00002b9af218cd80 in mthca_poll_cq (ibcq=0xf5de80, ne=1, >>>> wc=0x7fffb9786a60) at src/cq.c:470 >>>> #1 0x00002b9af14ee2a8 in MPIDI_CH3I_MRAILI_Cq_poll ( >>>> 
vbuf_handle=0x7fffb9786b78, vc_req=0xf55d00, receiving=0, >>>> is_blocking=1) >>>> at /usr/include/infiniband/verbs.h:934 >>>> #2 0x00002b9af14ef2e5 in MPIDI_CH3I_MRAILI_Waiting_msg (vc=0xf55d00, >>>> vbuf_handle=0x7fffb9786b78, blocking=1) at ibv_channel_manager.c:468 >>>> #3 0x00002b9af14a8304 in MPIDI_CH3I_read_progress >>>> (vc_pptr=0x7fffb9786b80, >>>> v_ptr=0x7fffb9786b78, is_blocking=) >>>> at ch3_read_progress.c:158 >>>> #4 0x00002b9af14a7f44 in MPIDI_CH3I_Progress (is_blocking=1, >>>> state=) at ch3_progress.c:202 >>>> #5 0x00002b9af14ec60e in MPIC_Wait (request_ptr=0xfc7978) at >>>> helper_fns.c:269 >>>> #6 0x00002b9af14eca03 in MPIC_Sendrecv (sendbuf=0x0, sendcount=0, >>>> sendtype=1275068685, dest=0, sendtag=1, recvbuf=0x0, recvcount=0, >>>> recvtype=1275068685, source=0, recvtag=1, comm=1140850688, >>>> status=0x1) >>>> at helper_fns.c:125 >>>> #7 0x00002b9af149b07a in MPIR_Barrier (comm_ptr=) >>>> at barrier.c:82 >>>> #8 0x00002b9af149b698 in PMPI_Barrier (comm=1140850688) at >>>> barrier.c:446 >>>> #9 0x0000000000400ea3 in main (argc=3, argv=0x7fffb9786e88) at bw.c:81 >>>> >>>> Bw.c is the old "bandwidth" benchmark. It looks like it actually gets >>>> out of MPI_Init() in this case, but then one side is waiting at a >>>> barrier while the other has already gone past the barrier. I've >>>> attached a copy of the program. >>>> >>>> >>>> -- >>>> Michael Heinz >>>> Principal Engineer, Qlogic Corporation >>>> King of Prussia, Pennsylvania >>>> -----Original Message----- >>>> From: Krishna Chaitanya Kandalla [mailto:kandalla@cse.ohio-state.edu] >>>> Sent: Wednesday, July 15, 2009 3:42 PM >>>> To: Mike Heinz >>>> Subject: Re: [mvapich-discuss] [mpich2-dev] Need a hint in debugging >>>> a problem that only affects a few machines in our cluster. >>>> >>>> Mike, >>>> Thats a little surprising. Setting this variable off ensures that a >>>> particular flag is set to 0. 
This flag is supposed to guard the piece >>>> of code that does the 2-level communicator creation. Just out of >>>> curiosity, can you also let me know the command that you are using to >>>> launch the job. The env variables need to be set before the >>>> executable is specified. If MV2_USE_SHMEM_COLL=0 appears after the >>>> executable name, the job launcher might not pick it up. >>>> >>>> Thanks, >>>> Krishna >>>> >>>> >>>> >>>> >>>> Mike Heinz wrote: >>>> >>>> >>>> >>>>> Krishna, thanks for the suggestion - but setting MV2_USE_SHMEM_COLL >>>>> to zero did not seem to change the stack trace much: >>>>> >>>>> Node 0: >>>>> >>>>> 0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll >>>>> (vbuf_handle=0x7fffcb46d698, >>>>> >>>>> vc_req=0x0, receiving=0, is_blocking=1) at ibv_channel_manager.c:529 >>>>> >>>>> 529 for (; i < rdma_num_hcas; ++i) { >>>>> >>>>> (gdb) where >>>>> >>>>> #0 0x00002aaaaab5d8b7 in MPIDI_CH3I_MRAILI_Cq_poll ( >>>>> >>>>> vbuf_handle=0x7fffcb46d698, vc_req=0x0, receiving=0, is_blocking=1) >>>>> >>>>> at ibv_channel_manager.c:529 >>>>> >>>>> #1 0x00002aaaaab177fa in MPIDI_CH3I_read_progress >>>>> (vc_pptr=0x7fffcb46d6a0, >>>>> >>>>> v_ptr=0x7fffcb46d698, is_blocking=1) at ch3_read_progress.c:143 >>>>> >>>>> #2 0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1, >>>>> >>>>> state=) at ch3_progress.c:202 >>>>> >>>>> #3 0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800) >>>>> >>>>> at helper_fns.c:269 >>>>> >>>>> #4 0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x10993a80, >>>>> sendcount=2, >>>>> >>>>> sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x10993a88, >>>>> recvcount=2, >>>>> >>>>> recvtype=1275069445, source=1, recvtag=7, comm=1140850688, >>>>> >>>>> status=0x7fffcb46d820) at helper_fns.c:125 >>>>> >>>>> #5 0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=, >>>>> >>>>> sendcount=, sendtype=1275069445, >>>>> recvbuf=0x10993a80, >>>>> >>>>> recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0) >>>>> >>>>> at allgather.c:192 
>>>>> >>>>> #6 0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>>>> >>>>> sendcount=2, sendtype=1275069445, recvbuf=0x10993a80, recvcount=2, >>>>> >>>>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>>>> >>>>> #7 0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, >>>>> key=0, >>>>> >>>>> newcomm=0x2aaaaae1c2f4) at comm_split.c:196 >>>>> >>>>> #8 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2, >>>>> >>>>> ---Type to continue, or q to quit--- >>>>> >>>>> my_rank=) at create_2level_comm.c:142 >>>>> >>>>> #9 0x00002aaaaab6877d in PMPI_Init (argc=0x7fffcb46db3c, >>>>> argv=0x7fffcb46db30) >>>>> >>>>> at init.c:146 >>>>> >>>>> #10 0x0000000000400b2f in main (argc=3, argv=0x7fffcb46dc78) at bw.c:27 >>>>> >>>>> Node 1: >>>>> >>>>> MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, v_ptr=0x7fff0b10bb48, >>>>> >>>>> is_blocking=1) at ch3_read_progress.c:143 >>>>> >>>>> 143 type = MPIDI_CH3I_MRAILI_Cq_poll(v_ptr, NULL, 0, is_blocking); >>>>> >>>>> (gdb) where >>>>> >>>>> #0 MPIDI_CH3I_read_progress (vc_pptr=0x7fff0b10bb50, >>>>> v_ptr=0x7fff0b10bb48, >>>>> >>>>> is_blocking=1) at ch3_read_progress.c:143 >>>>> >>>>> #1 0x00002afc9fb21f44 in MPIDI_CH3I_Progress (is_blocking=1, >>>>> >>>>> state=) at ch3_progress.c:202 >>>>> >>>>> #2 0x00002afc9fb6660e in MPIC_Wait (request_ptr=0x2afc9fd242a0) >>>>> >>>>> at helper_fns.c:269 >>>>> >>>>> #3 0x00002afc9fb66a03 in MPIC_Sendrecv (sendbuf=0xf77028, sendcount=2, >>>>> >>>>> sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf77020, recvcount=4, >>>>> >>>>> recvtype=1275069445, source=0, recvtag=7, comm=1140850688, >>>>> >>>>> status=0x7fff0b10bcd0) at helper_fns.c:125 >>>>> >>>>> #4 0x00002afc9fb08ddb in MPIR_Allgather (sendbuf=, >>>>> >>>>> sendcount=, sendtype=1275069445, recvbuf=0xf77020, >>>>> >>>>> recvcount=2, recvtype=1275069445, comm_ptr=0x2afc9fd26c80) >>>>> >>>>> at allgather.c:192 >>>>> >>>>> #5 0x00002afc9fb09a45 in PMPI_Allgather (sendbuf=0xffffffffffffffff, 
>>>>> >>>>> sendcount=2, sendtype=1275069445, recvbuf=0xf77020, recvcount=2, >>>>> >>>>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>>>> >>>>> #6 0x00002afc9fb4591b in PMPI_Comm_split (comm=1140850688, color=1, >>>>> key=0, >>>>> >>>>> newcomm=0x2afc9fd26d94) at comm_split.c:196 >>>>> >>>>> #7 0x00002afc9fb478f4 in create_2level_comm (comm=1140850688, size=2, >>>>> >>>>> my_rank=) at create_2level_comm.c:142 >>>>> >>>>> #8 0x00002afc9fb730a5 in PMPI_Init (argc=0x7fff0b10bfec, >>>>> argv=0x7fff0b10bfe0) >>>>> >>>>> at init.c:146 >>>>> >>>>> ---Type to continue, or q to quit--- >>>>> >>>>> #9 0x0000000000400bcf in main (argc=3, argv=0x7fff0b10c128) at bw.c:27 >>>>> >>>>> Any suggestions would be appreciated. >>>>> >>>>> -- >>>>> >>>>> Michael Heinz >>>>> >>>>> Principal Engineer, Qlogic Corporation >>>>> >>>>> King of Prussia, Pennsylvania >>>>> >>>>> *From:* kris.c1986@gmail.com [mailto:kris.c1986@gmail.com] *On >>>>> Behalf Of *Krishna Chaitanya >>>>> *Sent:* Tuesday, July 14, 2009 6:39 PM >>>>> *To:* Mike Heinz >>>>> *Cc:* Todd Rimmer; mvapich-discuss@cse.ohio-state.edu; >>>>> mpich2-dev@mcs.anl.gov >>>>> *Subject:* Re: [mvapich-discuss] [mpich2-dev] Need a hint in >>>>> debugging a problem that only affects a few machines in our cluster. >>>>> >>>>> Mike, >>>>> The hang seems to be occuring when the MPI library is trying to >>>>> create the 2-level communicator, during the init phase. Can you try >>>>> running the test with MV2_USE_SHMEM_COLL >>>>> =0. >>>>> This will ensure that a flat communicator is used for the subsequent >>>>> MPI calls. This might help us isolate the problem. 
>>>>> >>>>> Thanks, >>>>> Krishna >>>>> >>>>> On Tue, Jul 14, 2009 at 5:04 PM, Mike Heinz >>>>> > wrote: >>>>> >>>>> We're having a very odd problem with our fabric, where, out of the >>>>> entire cluster, machine "A" can't run mvapich2 programs with machine >>>>> "B", and machine "C" can't run programs with machine "D" - even >>>>> though "A" can run with "D" and "B" can run with "C" - and the rest >>>>> of the fabric works fine. >>>>> >>>>> 1) There are no IB errors anywhere on the fabric that I can find, >>>>> and the machines in question all work correctly with mvapich1 and >>>>> low-level IB tests. >>>>> >>>>> 2) The problem occurs whether using mpd or rsh. >>>>> >>>>> 3) If I attach to the running processes, both machines appear to be >>>>> waiting for a read operation to complete. (See below) >>>>> >>>>> Can anyone make a suggestion on how to debug this? >>>>> >>>>> Stack trace for node 0: >>>>> >>>>> #0 0x000000361160abb5 in pthread_spin_lock () from >>>>> /lib64/libpthread.so.0 >>>>> >>>>> #1 0x00002aaaab08fb6c in mthca_poll_cq (ibcq=0x2060980, ne=1, >>>>> >>>>> wc=0x7fff9d835900) at src/cq.c:468 >>>>> >>>>> #2 0x00002aaaaab5d8d8 in MPIDI_CH3I_MRAILI_Cq_poll ( >>>>> >>>>> vbuf_handle=0x7fff9d8359d8, vc_req=0x0, receiving=0, is_blocking=1) >>>>> >>>>> at /usr/include/infiniband/verbs.h:934 >>>>> >>>>> #3 0x00002aaaaab177fa in MPIDI_CH3I_read_progress >>>>> (vc_pptr=0x7fff9d8359e0, >>>>> >>>>> v_ptr=0x7fff9d8359d8, is_blocking=1) at ch3_read_progress.c:143 >>>>> >>>>> #4 0x00002aaaaab17464 in MPIDI_CH3I_Progress (is_blocking=1, >>>>> >>>>> state=) at ch3_progress.c:202 >>>>> >>>>> #5 0x00002aaaaab5bc4e in MPIC_Wait (request_ptr=0x2aaaaae19800) >>>>> >>>>> at helper_fns.c:269 >>>>> >>>>> #6 0x00002aaaaab5c043 in MPIC_Sendrecv (sendbuf=0x217fc50, sendcount=2, >>>>> >>>>> sendtype=1275069445, dest=1, sendtag=7, recvbuf=0x217fc58, recvcount=2, >>>>> >>>>> recvtype=1275069445, source=1, recvtag=7, comm=1140850688, >>>>> >>>>> status=0x7fff9d835b60) at 
helper_fns.c:125 >>>>> >>>>> #7 0x00002aaaaaafe387 in MPIR_Allgather (sendbuf=, >>>>> >>>>> sendcount=, sendtype=1275069445, >>>>> recvbuf=0x217fc50, >>>>> >>>>> recvcount=2, recvtype=1275069445, comm_ptr=0x2aaaaae1c1e0) >>>>> >>>>> at allgather.c:192 >>>>> >>>>> #8 0x00002aaaaaafeff9 in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>>>> >>>>> sendcount=2, sendtype=1275069445, recvbuf=0x217fc50, recvcount=2, >>>>> >>>>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>>>> >>>>> ---Type to continue, or q to quit--- >>>>> >>>>> #9 0x00002aaaaab3b00b in PMPI_Comm_split (comm=1140850688, color=0, >>>>> key=0, >>>>> >>>>> newcomm=0x2aaaaae1c2f4) at comm_split.c:196 >>>>> >>>>> #10 0x00002aaaaab3cd84 in create_2level_comm (comm=1140850688, size=2, >>>>> >>>>> my_rank=) at create_2level_comm.c:142 >>>>> >>>>> #11 0x00002aaaaab6877d in PMPI_Init (argc=0x7fff9d835e7c, >>>>> argv=0x7fff9d835e70) >>>>> >>>>> at init.c:146 >>>>> >>>>> #12 0x0000000000400b2f in main (argc=3, argv=0x7fff9d835fb8) at bw.c:27 >>>>> >>>>> Stack trace for node 1: >>>>> >>>>> #0 0x00002ac3cbdac2d2 in MPIDI_CH3I_read_progress >>>>> (vc_pptr=0x7fffdee81020, >>>>> >>>>> v_ptr=0x7fffdee81018, is_blocking=1) at ch3_read_progress.c:143 >>>>> >>>>> #1 0x00002ac3cbdabf44 in MPIDI_CH3I_Progress (is_blocking=1, >>>>> >>>>> state=) at ch3_progress.c:202 >>>>> >>>>> #2 0x00002ac3cbdf060e in MPIC_Wait (request_ptr=0x2ac3cbfae2a0) >>>>> >>>>> at helper_fns.c:269 >>>>> >>>>> #3 0x00002ac3cbdf0a03 in MPIC_Sendrecv (sendbuf=0xf79028, sendcount=2, >>>>> >>>>> sendtype=1275069445, dest=0, sendtag=7, recvbuf=0xf79020, recvcount=4, >>>>> >>>>> recvtype=1275069445, source=0, recvtag=7, comm=1140850688, >>>>> >>>>> status=0x7fffdee811a0) at helper_fns.c:125 >>>>> >>>>> #4 0x00002ac3cbd92ddb in MPIR_Allgather (sendbuf=, >>>>> >>>>> sendcount=, sendtype=1275069445, recvbuf=0xf79020, >>>>> >>>>> recvcount=2, recvtype=1275069445, comm_ptr=0x2ac3cbfb0c80) >>>>> >>>>> at allgather.c:192 >>>>> >>>>> #5 0x00002ac3cbd93a45 
in PMPI_Allgather (sendbuf=0xffffffffffffffff, >>>>> >>>>> sendcount=2, sendtype=1275069445, recvbuf=0xf79020, recvcount=2, >>>>> >>>>> recvtype=1275069445, comm=1140850688) at allgather.c:866 >>>>> >>>>> #6 0x00002ac3cbdcf91b in PMPI_Comm_split (comm=1140850688, color=1, >>>>> key=0, >>>>> >>>>> newcomm=0x2ac3cbfb0d94) at comm_split.c:196 >>>>> >>>>> #7 0x00002ac3cbdd18f4 in create_2level_comm (comm=1140850688, size=2, >>>>> >>>>> my_rank=) at create_2level_comm.c:142 >>>>> >>>>> #8 0x00002ac3cbdfd0a5 in PMPI_Init (argc=0x7fffdee814bc, >>>>> argv=0x7fffdee814b0) >>>>> >>>>> at init.c:146 >>>>> >>>>> ---Type to continue, or q to quit--- >>>>> >>>>> #9 0x0000000000400bcf in main (argc=3, argv=0x7fffdee815f8) at bw.c:27 >>>>> >>>>> -- >>>>> >>>>> Michael Heinz >>>>> >>>>> Principal Engineer, Qlogic Corporation >>>>> >>>>> King of Prussia, Pennsylvania >>>>> >>>>> >>>>> _______________________________________________ >>>>> mvapich-discuss mailing list >>>>> mvapich-discuss@cse.ohio-state.edu >>>>> >>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> In the middle of difficulty, lies opportunity >>>>> >>>>> >>>>> >>>>> >>> _______________________________________________ >>> mvapich-discuss mailing list >>> mvapich-discuss@cse.ohio-state.edu >>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >>> >>> >>> >>> >> >> > > > From subramon at cse.ohio-state.edu Fri Jul 17 15:23:30 2009 From: subramon at cse.ohio-state.edu (Hari Subramoni) Date: Fri Jul 17 15:23:55 2009 Subject: [mvapich-discuss] Question about NUMA support in mvapich2-1.2 (and later) In-Reply-To: <4A577DAB.4040801@noaa.gov> Message-ID: Hi Craig, MVAPICH2 supports user defined CPU mapping through Portable Linux Processor Affinity (PLPA) library. The MVAPICH2 user guide provides detailed instructions on how to run applications with user desired core mapping. Please refer the following link for more information... 
https://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc1.html#x1-360006.8 Please let us know if you face any issues with this. Thx, Hari. On Fri, 10 Jul 2009, Craig Tierney wrote: > The mvapich2 documentation states: > > Optimized for Bus-based SMP and NUMA-Based SMP systems. > > But I cannot find any other reference to what exactly > mvapich2 does for NUMA-Based systems (like nehalem). Simple > tests have shown that I cannot use numactl to explicitly > lay out MPI processes to improve performance for memory > bandwidth sensitive applications over just running > the application directly with mpirun. But I would like > to understand what is being done by mvapich2. > > Thanks, > Craig > -- > Craig Tierney (craig.tierney@noaa.gov) > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From perkinjo at cse.ohio-state.edu Mon Jul 20 08:58:53 2009 From: perkinjo at cse.ohio-state.edu (Jonathan Perkins) Date: Mon Jul 20 08:59:22 2009 Subject: [mvapich-discuss] OSU MVAPICH2 1.4-RC1-3378 (06/02/09) VPATH build and "debug" static? In-Reply-To: <20090715182357.GD2455@cse.ohio-state.edu> References: <20090715182357.GD2455@cse.ohio-state.edu> Message-ID: <20090720125853.GB2461@cse.ohio-state.edu> On Wed, Jul 15, 2009 at 02:23:57PM -0400, Jonathan Perkins wrote: > On Tue, Jul 07, 2009 at 11:29:02PM -0700, Brad Penoff wrote: > > ----Issue #1---- > > I downloaded > > http://mvapich.cse.ohio-state.edu/download/mvapich2/mvapich2-1.4rc1-3378.tgz > > and then tar zxf, cd mvapich2-1.4rc1, and then mkdir build. 
From > > inside build, I did a VPATH build by configuring to create 32-bit > > libraries the following way on my 64-bit machine (Red Hat Enterprise > > Linux Server release 5.1 (Tikanga)): > > > > $ ../configure CFLAGS=-m32 CPPFLAGS=-m32 FC=gfortran F90=gfortran > > FFLAGS=-m32 F90FLAGS=-m32 LDFLAGS=-m32 > > --prefix=/home/penoff/installs/mvapich2-1.4rc1 > > > > Eventually the "make" died with the error below. When I did the > > same configure line but instead did not to a VPATH build (so from > > mvapich2-1.4rc1), the build succeeded as did the "make install". I > > was just wondering if this was a known issue. The error I saw is > > below. Placed shortened build error in-line... > > make[3]: Entering directory > > `/home/penoff/src/mvapich2-1.4rc1/build/src/pm/mpirun' > > GCC [../../../../src/pm/mpirun/mpirun_rsh.c] > > ../../../../src/pm/mpirun/mpirun_rsh.c:27:24: error: mpirunconf.h: No > > such file or directory > > ../../../../src/pm/mpirun/mpirun_rsh.c:272: error: expected identifier > > or ?(? before ?__extension__? > > make[3]: *** [mpirun_rsh.o] Error 1 > > make[3]: Leaving directory > > `/home/penoff/src/mvapich2-1.4rc1/build/src/pm/mpirun' > > make[2]: *** [all-redirect] Error 2 > > make[2]: Leaving directory `/home/penoff/src/mvapich2-1.4rc1/build/src/pm' > > make[1]: *** [all-redirect] Error 2 > > make[1]: Leaving directory `/home/penoff/src/mvapich2-1.4rc1/build/src' > > make: *** [all-redirect] Error 2 > > I'll see if I can reproduce this and if so, resolve it. I've reproduced and resolved this on trunk today. Can you try this out and let us know if everything builds smoothly for you? > > ----- Issue #2 ---- > > Once installed, we compiled our code. We have a function called > > debug() in our code somewhere. It was conflicting when compiling > > with an internal variable of your code. 
I'm not sure who is at > > fault here, but instead of renaming our function and adjusting all > > of our code in countless places, instead to fix this, I just made > > the long variable "debug" in > > src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:503 to be > > static by putting that keyword at the beginning of the line. > > I don't think this is anyone's "fault" but can probably be avoided to > you and other users on our side if we can avoid unnecessary namespace > pollution. This is also included in the commits made to trunk this morning. -- Jonathan Perkins http://www.cse.ohio-state.edu/~perkinjo -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090720/8a267f1d/attachment.bin From perkinjo at cse.ohio-state.edu Mon Jul 20 09:02:26 2009 From: perkinjo at cse.ohio-state.edu (Jonathan Perkins) Date: Mon Jul 20 09:02:51 2009 Subject: [mvapich-discuss] Problem compiling gromacs-4.0.5 against mvapich2-1.4rc1 In-Reply-To: <12a5e6520907140923o4dd786b5ya9dca1953a4db530@mail.gmail.com> References: <12a5e6520907140923o4dd786b5ya9dca1953a4db530@mail.gmail.com> Message-ID: <20090720130226.GC2461@cse.ohio-state.edu> On Tue, Jul 14, 2009 at 12:23:11PM -0400, Jerry Leahy wrote: > /usr/local/lib/libmpich.a(ibv_channel_manager.o):(.bss+0x10): multiple > definition of `debug' > <..snip..>/gromacs-4.0.5/src/gmxlib/.libs/libgmx_mpi.a(gmx_fatal.o):(.bss+0x0): > first defined here > collect2: ld returned 1 exit status > make[3]: *** [grompp] Error 1 > > It looks like 'debug' is conflicting in both MVAPICH2 and in Gromacs. > > Any suggestions? We've committed a change to our source this morning that changes this definition of debug to static so that this shouldn't cause any conflicts in the shared namespace. 
Can you try building the latest from trunk and confirm that this resolves your problem? -- Jonathan Perkins http://www.cse.ohio-state.edu/~perkinjo -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090720/536c06dd/attachment.bin From j.r.jones at bath.ac.uk Mon Jul 20 11:16:36 2009 From: j.r.jones at bath.ac.uk (Jessica Jones) Date: Mon Jul 20 11:24:08 2009 Subject: [mvapich-discuss] Trouble attempting to use MVAPICH2 built with BLCR support Message-ID: <4A648A54.70408@bath.ac.uk> Hi I built my MVAPICH2 implementation with --enable-blcr but it doesn't seem to be behaving itself. I've tried rebuilding it with both the Intel Compiler Suite and also GNU GCC (4.1 in this case). Basically, even when I don't try to use any sort of checkpointing at all, I get this when trying to run a basic hello-world: Running helloworld .. mpiexec_node100 (mpiexec 532): mpiexec: from man, invalid msg=:{}: I managed to find where in the MVAPICH2 source this message comes from, but I'm not sure what is causing it. Does anyone have any ideas? I've been through the documentation for both BLCR and MVAPICH2 and I can't see anything that I'm missing. Thanks Jess From perkinjo at cse.ohio-state.edu Mon Jul 20 11:55:13 2009 From: perkinjo at cse.ohio-state.edu (Jonathan Perkins) Date: Mon Jul 20 11:55:42 2009 Subject: [mvapich-discuss] Trouble attempting to use MVAPICH2 built with BLCR support In-Reply-To: <4A648A54.70408@bath.ac.uk> References: <4A648A54.70408@bath.ac.uk> Message-ID: <20090720155513.GA2467@cse.ohio-state.edu> On Mon, Jul 20, 2009 at 04:16:36PM +0100, Jessica Jones wrote: > Hi > > I built my MVAPICH2 implementation with --enable-blcr but it doesn't > seem to be behaving itself. I've tried rebuilding it with both the > Intel Compiler Suite and also GNU GCC (4.1 in this case). 
>
> Basically, even when I don't try to use any sort of checkpointing at
> all, I get this when trying to run a basic hello-world:
>
> Running helloworld ..
> mpiexec_node100 (mpiexec 532): mpiexec: from man, invalid msg=:{}:
>
> I managed to find where in the MVAPICH2 source this message comes from,
> but I'm not sure what is causing it.
>
> Does anyone have any ideas? I've been through the documentation for
> both BLCR and MVAPICH2 and I can't see anything that I'm missing.

Can you let us know which version of MVAPICH2 you're using? Looking at
the message, it seems that the issue is probably due to using an mpiexec
that isn't a symbolic link to mpiexec_cr. Is there an mpiexec_cr in your
path or where you installed mvapich2? If you see one, try launching with
it instead.

>
> Thanks
>
> Jess
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090720/663f104b/attachment.bin
From leahy at crystal.harvard.edu Tue Jul 21 18:05:29 2009
From: leahy at crystal.harvard.edu (Jerry Leahy)
Date: Tue Jul 21 18:05:56 2009
Subject: [mvapich-discuss] Problem compiling gromacs-4.0.5 against mvapich2-1.4rc1
In-Reply-To: <20090720130226.GC2461@cse.ohio-state.edu>
References: <12a5e6520907140923o4dd786b5ya9dca1953a4db530@mail.gmail.com> <20090720130226.GC2461@cse.ohio-state.edu>
Message-ID: <12a5e6520907211505r4e927d65u64dd52dee9560f6e@mail.gmail.com>

Hi Jonathan,

Gromacs 4.0.5 compiles AOK after checking out mvapich2-trunk.

Thanks!
Jerry.
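
The "multiple definition of 'debug'" link error discussed in this thread arises when two static archives each export a global symbol with the same name. A minimal sketch for spotting such a clash before relinking, assuming `nm` from binutils is installed; the helper name `list_global_defs` is hypothetical and is not part of MVAPICH2 or Gromacs:

```shell
# list_global_defs SYMBOL
# Reads `nm -A` output on stdin and prints the archive:member of every
# object that defines SYMBOL as a global (uppercase nm type letter).
# File-local definitions (lowercase type, e.g. "b" for a static variable)
# and undefined references ("U") are skipped.
list_global_defs() {
    awk -v sym="$1" '
        NF == 3 && $3 == sym && length($2) == 1 &&
        $2 != "U" && $2 != tolower($2) {
            sub(/:[0-9a-fA-F]+$/, "", $1)   # drop the trailing address
            print $1
        }'
}
```

For example, `nm -A /usr/local/lib/libmpich.a | list_global_defs debug` would be expected to list `ibv_channel_manager.o` for the 1.4rc1 tarball. With `debug` declared `static`, `nm` reports it with a lowercase type letter, so the helper should no longer print that object.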
On Mon, Jul 20, 2009 at 9:02 AM, Jonathan Perkins < perkinjo@cse.ohio-state.edu> wrote: > On Tue, Jul 14, 2009 at 12:23:11PM -0400, Jerry Leahy wrote: > > /usr/local/lib/libmpich.a(ibv_channel_manager.o):(.bss+0x10): multiple > > definition of `debug' > > > <..snip..>/gromacs-4.0.5/src/gmxlib/.libs/libgmx_mpi.a(gmx_fatal.o):(.bss+0x0): > > first defined here > > collect2: ld returned 1 exit status > > make[3]: *** [grompp] Error 1 > > > > It looks like 'debug' is conflicting in both MVAPICH2 and in Gromacs. > > > > Any suggestions? > > We've committed a change to our source this morning that changes this > definition of debug to static so that this shouldn't cause any conflicts > in the shared namespace. Can you try building the latest from trunk and > confirm that this resolves your problem? > > -- > Jonathan Perkins > http://www.cse.ohio-state.edu/~perkinjo > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090721/1e25b49d/attachment.html From xlffei at gmail.com Wed Jul 22 02:27:29 2009 From: xlffei at gmail.com (lei xu) Date: Wed Jul 22 09:48:11 2009 Subject: [mvapich-discuss] Why cannot I get the mpirun_rsh command? Message-ID: <7f9dfe490907212327o71c13803p70f155c1903b9cec@mail.gmail.com> Hi, there I try to build mvapich-1.1 by using make.mvapich.tcp. GCC and gfortran are used to compile the source file. I work on the vritual machine created by vmware workstation on vista. The guest os is SLES-11, gcc-4.3.2,Ethern I test the mpirun command after building. It works fine. But I cannot find the mpirun_rsh command in bin folder. Can anybody show me why? thanks Best regards ----------------------------------------------- lxu,shanghai,china -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090722/96138c1d/attachment.html From perkinjo at cse.ohio-state.edu Wed Jul 22 12:53:35 2009 From: perkinjo at cse.ohio-state.edu (Jonathan Perkins) Date: Wed Jul 22 12:54:03 2009 Subject: [mvapich-discuss] Why cannot I get the mpirun_rsh command? In-Reply-To: <7f9dfe490907212327o71c13803p70f155c1903b9cec@mail.gmail.com> References: <7f9dfe490907212327o71c13803p70f155c1903b9cec@mail.gmail.com> Message-ID: <20090722165335.GC3023@cse.ohio-state.edu> On Wed, Jul 22, 2009 at 02:27:29PM +0800, lei xu wrote: > Hi, there > I try to build mvapich-1.1 by using make.mvapich.tcp. GCC and gfortran are > used to compile the source file. > I work on the vritual machine created by vmware workstation on vista. The > guest os is SLES-11, gcc-4.3.2,Ethern > > I test the mpirun command after building. It works fine. > But I cannot find the mpirun_rsh command in bin folder. > > Can anybody show me why? The mpirun_rsh command is not built with the tcp interface. You should continue using mpirun with this build. > > thanks > > Best regards > > ----------------------------------------------- > lxu,shanghai,china > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss -- Jonathan Perkins http://www.cse.ohio-state.edu/~perkinjo -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090722/040fa91a/attachment-0001.bin
From perkinjo at cse.ohio-state.edu Wed Jul 22 13:01:38 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Wed Jul 22 13:02:04 2009
Subject: [mvapich-discuss] Needs firewall setting for MPI
In-Reply-To: <925346A443D4E340BEB20248BAFCDBDF0BC3084C@CFEVS1-IP.americas.cray.com>
References: <925346A443D4E340BEB20248BAFCDBDF0BC3084C@CFEVS1-IP.americas.cray.com>
Message-ID: <20090722170137.GD3023@cse.ohio-state.edu>

On Sun, Jul 12, 2009 at 11:37:41PM -0500, Satoshi Isono wrote:
> Hello everyone,
>
> Does anyone know a firewall configuration that allows MVAPICH code to
> run? Which port do I have to open? I am using mpirun_rsh based on SSH.
> The ssh command certainly works without a password.
>
> [craysp@t2k-0004 ~]$ ssh t2k-ps1 hostname
> t2k-ps1
>
> [craysp@t2k-0004 ~]$ ssh t2k-ps2 hostname
> t2k-ps2
>
> t2k-0004 is the login node, where everyone launches the mpirun_rsh
> command. The two nodes t2k-ps1 and t2k-ps2 are the actual compute
> nodes. When I launch mpirun_rsh, it fails with the following messages.
>
> gethostbyname: Host name lookup failure
>
> Child exited abnormally!
> cleanupKilling remote processes...gethostbyname: Host name lookup
> failure
> DONE
>
> It seems that resolving the hostname fails. Can you please advise me or
> point me to the files I should check?

This seems to be more of an issue with the network configuration for
resolving names than a firewall issue.

Snippet from manpage...

The function gethostname(2) is used to get the hostname. Only when the
hostname -s is called will gethostbyname(3) be called.
The difference in gethostname(2) and gethostbyname(3) is that
gethostbyname(3) is network aware, so it consults /etc/nsswitch.conf
and /etc/host.conf to decide whether to read information in
/etc/sysconfig/network or /etc/hosts
the hostname is also set when the network interface is brought up.

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090722/af86a74b/attachment.bin
From perkinjo at cse.ohio-state.edu Fri Jul 24 15:42:57 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Fri Jul 24 15:43:25 2009
Subject: [mvapich-discuss] Re: [mpich-discuss] Problem compiling gromacs-4.0.5 against mvapich2-1.4rc1
In-Reply-To: <20090724114354.1fr3xjkurk4k8kgw@webmail.utoronto.ca>
Message-ID: <20090724194257.GA9052@cse.ohio-state.edu>

My responses to your questions are inline. Please post questions
related to MVAPICH or MVAPICH2 to mvapich-discuss@cse.ohio-state.edu.
Thanks.

At some point chris.neale wrote:
> Hello,
>
> I was having the same problem with gromacs mvapich2 until I found this
> post. I would rather not use the branch version for production
> simulations right now, so I have taken the mvapich2-1.4rc1-3378.tar.gz
> distribution and modified
>
> src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c
> at line 503
> from:
> unsigned long debug = 0;
> to:
> static unsigned long debug = 0;
>
> as per mvapich2-trunk-2009-07-23
>
> The compilation runs to completion and I am hoping that you can offer
> a bit of advice.
>
> 1) Is this the only change that was completed in order to address
> Jerry's original problem?

Yes.

>
> 2) Is this fix stable with the mvapich2-1.4rc1-3378.tar.gz distribution?

Yes.

>
> Thank you,
> Chris.
> > > On Tue, Jul 14, 2009 at 12:23:11PM -0400, Jerry Leahy wrote: > > /usr/local/lib/libmpich.a(ibv_channel_manager.o):(.bss+0x10): multiple > > definition of `debug' > > <..snip..>/gromacs-4.0.5/src/gmxlib/.libs/libgmx_mpi.a(gmx_fatal.o):(.bss+0x0): > > first defined here > > collect2: ld returned 1 exit status > > make[3]: *** [grompp] Error 1 > > > > It looks like 'debug' is conflicting in both MVAPICH2 and in Gromacs. > > > > Any suggestions? > > We've committed a change to our source this morning that changes this > definition of debug to static so that this shouldn't cause any conflicts > in the shared namespace. Can you try building the latest from trunk and > confirm that this resolves your problem? > > -- original message -- > http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2009-July/002403.html -- Jonathan Perkins http://www.cse.ohio-state.edu/~perkinjo -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090724/28a2b551/attachment.bin From yjlim at samboo.co.kr Mon Jul 27 01:38:50 2009 From: yjlim at samboo.co.kr (=?ks_c_5601-1987?B?wNO/68HY?=) Date: Mon Jul 27 09:54:03 2009 Subject: Subject: [mvapich-discuss]ibv_recv.c error Message-ID: <27F3665DDBE6428FB0DE1A3659490745@tech> Hello, All Our environment is below O/S : Fedora Core 10 Kernel Ver. 
: 2.6.27.5-117
HCA : QLE7240
Driver : OFED-1.4
Application : mvapich2 (1.2 version)

We use a QLogic QLE7240 HCA, a QLogic 9040 InfiniBand switch, and MVAPICH2.

The server prints the error message below when it runs an MPI process
(MVAPICH2), and then the MPI process stops.

I don't know why this error happens.

Please help.

---------------------------error message------------------------------

[2] Abort: Control shouldn't reach here in prototype, header 240

at line 276 in file ibv_recv.c

rank 2 in job 6 master_43626 caused collective abort of all ranks

exit status of rank 2: killed by signal 9

-----------------------------------------------------------------------

Thank you

Jun.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090727/9894a6bd/attachment.html
From panda at cse.ohio-state.edu Mon Jul 27 10:16:44 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon Jul 27 10:17:10 2009
Subject: Subject: [mvapich-discuss]ibv_recv.c error
In-Reply-To: <27F3665DDBE6428FB0DE1A3659490745@tech>
Message-ID:

Please note that QLogic-PSM support is not available in MVAPICH2 1.2.
It is available in MVAPICH2 1.4. You can follow the steps mentioned in
the MVAPICH2 1.4 user guide (as indicated below) to build and use
MVAPICH2 1.4 with QLogic PSM.

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc1.html#x1-120004.6

DK

On Mon, 27 Jul 2009, 임용준 wrote:

> Hello, All
>
> Our environment is below
>
> O/S : Fedora Core 10
> Kernel Ver.
: 2.6.27.5-117
> HCA : QLE7240
> Driver : OFED-1.4
> Application : mvapich2 (1.2 version)
>
> We use a QLogic QLE7240 HCA, a QLogic 9040 InfiniBand switch, and MVAPICH2.
>
> The server prints the error message below when it runs an MPI process
> (MVAPICH2), and then the MPI process stops.
>
> I don't know why this error happens.
>
> Please help.
>
> ---------------------------error message------------------------------
>
> [2] Abort: Control shouldn't reach here in prototype, header 240
>
> at line 276 in file ibv_recv.c
>
> rank 2 in job 6 master_43626 caused collective abort of all ranks
>
> exit status of rank 2: killed by signal 9
>
> -----------------------------------------------------------------------
>
> Thank you
>
> Jun.
>
>
From Craig.Tierney at noaa.gov Mon Jul 27 13:36:41 2009
From: Craig.Tierney at noaa.gov (Craig Tierney)
Date: Mon Jul 27 13:37:08 2009
Subject: [mvapich-discuss] Question on how to debug job start failures
In-Reply-To: <4A576E3E.3030508@noaa.gov>
References: <4A576E3E.3030508@noaa.gov>
Message-ID: <4A6DE5A9.7090506@noaa.gov>

Craig Tierney wrote:
> Dhabaleswar Panda wrote:
>> Craig - Could you please tell us a little more about the details of your
>> system: node configuration (sockets and cores/socket, processor type), OS
>> version, etc. What kind of Ethernet connectivity does your system have?
>> FYI, the mpirun_rsh framework launches the job using standard TCP/IP
>> calls.
>>
>
> The system is a cluster based on Supermicro motherboards. Each node
> is a dual-socket, quad-core Harpertown, 2.8 GHz. Each node has 16 GB
> of RAM.
>
> We are running CentOS 5.1. We are using the 2.6.18-92.1.13.el5 kernel
> and the e1000e Intel GigE driver. The nodes boot over NFS. The entire
> OS image is available via NFS.
>
> About 30 nodes each attach to an SMC8150L2 GigE switch. The 9 switches
> have 2 uplinks to a Force10 switch (not sure of the model number). The
> links are bonded via a port-channel. Spanning tree is disabled.
> > We are testing a Centos 5.3 image for a new Nehalemm cluster, but I won't > have the hardware up until the end of next week. > > Craig > > A follow-up to my problem. On the new Nehalem cluster (QDR, Centos 5.3, OFED-1.4.1, Mvapich-1.2p1), I am still having applications hang when using mpirun_rsh. The problem seems to start around 512 cores, but it isn't exact. Not sure if this helps, but Openmpi does not have an issue (but I know has a completely different launching mechanism). The one similarity is that both systems are using SMC Tiger switch Gige switches within the racks and uplink to a Force10 GigE switch (although the behavior was repeated when the core switch was a Cisco unit). I have tried messing with MV2_MT_DEGREE. Setting this low, 4, seems to help large jobs start, but it does not solve the problem. So the problem could be hardware or a race condition caused in the software. Any ideas of how to debug the software side (or both) would be appreciated). Thanks, Craig > > > >> Thanks, >> >> DK >> >> >> >> On Thu, 9 Jul 2009, Craig Tierney wrote: >> >>> Dhabaleswar Panda wrote: >>>> Are you able to run simple MPI programs (say MPI Hello World) or some IMB >>>> tests using ~512 cores or larger. This will help you to find out whether >>>> there are any issues when launching jobs and isolate any nodes which might >>>> be having problems. >>>> >>>> Thanks, >>>> >>>> DK >>>> >>> I dug in further today while the system was offline, and this >>> is what I found. The mpispawn process is hanging. When it hangs >>> it does hang on different nodes each time. What I see is that >>> one side thinks the connection is closed, and the other side waits. 
>>> >>> At one end: >>> >>> [root@h43 ~]# netstat >>> Active Internet connections (w/o servers) >>> Proto Recv-Q Send-Q Local Address Foreign Address State >>> tcp 0 0 h43:50797 wms-sge:sge_qmaster ESTABLISHED >>> tcp 0 0 h43:816 jetsam1:nfs ESTABLISHED >>> tcp 0 0 h43:49730 h6:56443 ESTABLISHED >>> tcp 31245 0 h43:49730 h4:41799 CLOSE_WAIT >>> tcp 0 0 h43:ssh h1:35169 ESTABLISHED >>> tcp 0 0 h43:ssh wfe7-eth2:51964 ESTABLISHED >>> >>> >>> (gdb) bt >>> #0 0x00002b1284f0e950 in __read_nocancel () from /lib64/libc.so.6 >>> #1 0x00000000004035ea in read_socket (socket=5, buffer=0x16dec8a0, bytes=640) at mpirun_util.c:97 >>> #2 0x000000000040402f in mpispawn_tree_init (me=5, req_socket=383699104) at mpispawn_tree.c:190 >>> #3 0x0000000000401a90 in main (argc=5, argv=0x16dec8a0) at mpispawn.c:496 >>> >>> At other end (node h4): >>> >>> (gdb) bt >>> #0 0x00002b95b77308d3 in __select_nocancel () from /lib64/libc.so.6 >>> #1 0x0000000000404379 in mtpmi_processops () at pmi_tree.c:754 >>> #2 0x0000000000401c32 in main (argc=1024, argv=0x6101a0) at mpispawn.c:525 >>> >>> The netstat on h4 does not show any connections back to h43. >>> >>> I tried the latest 1.4Beta from the website (not svn) I found that >>> for large jobs mpirun_rsh will sometimes exits without running anything. >>> The large the job, the more likely it is to not to start the job properly. >>> The only difference is that it doesn't hang. I turned on debugging with >>> MPISPAWN_DEBUG, but I didn't see anything interesting from that. >>> >>> Craig >>> >>> >>> >>> >>>> On Wed, 8 Jul 2009, Craig Tierney wrote: >>>> >>>>> I am running mvapich2 1.2, built with Ofed support (v1.3.1). >>>>> For large jobs, I am having problems where they do not start. >>>>> I am using the mpirun_rsh launcher. When I try to start jobs >>>>> with ~512 cores or larger, I can see the problem. The problem >>>>> doesn't happen all the time. >>>>> >>>>> I can't rule our quirky hardware. 
The IB tree seems to be >>>>> clean (as reported by ibdiagnet). My last hang, I looked to >>>>> see if xhpl had started on all the nodes (8 cases for each >>>>> node for dual-socket quad-core systems). I found that 7 of >>>>> the 245 nodes (1960 core job) had no xhpl processes on them. >>>>> So either the launching mechanism hung, or something was up with one of >>>>> those nodes. >>>>> >>>>> My question is, how should I start debugging this to understand >>>>> what process is hanging? >>>>> >>>>> Thanks, >>>>> Craig >>>>> >>>>> >>>>> -- >>>>> Craig Tierney (craig.tierney@noaa.gov) >>>>> _______________________________________________ >>>>> mvapich-discuss mailing list >>>>> mvapich-discuss@cse.ohio-state.edu >>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >>>>> >>> -- >>> Craig Tierney (craig.tierney@noaa.gov) >>> >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >> > > -- Craig Tierney (craig.tierney@noaa.gov) From panda at cse.ohio-state.edu Mon Jul 27 15:50:45 2009 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Mon Jul 27 15:51:13 2009 Subject: [mvapich-discuss] Question on how to debug job start failures In-Reply-To: <4A6DE5A9.7090506@noaa.gov> Message-ID: Craig, > A follow-up to my problem. On the new Nehalem cluster (QDR, Centos 5.3, > OFED-1.4.1, Mvapich-1.2p1), I am still having applications hang when using > mpirun_rsh. The problem seems to start around 512 cores, but it isn't exact. > Not sure if this helps, but Openmpi does not have an issue (but I know has > a completely different launching mechanism). Does this happen with OFED 1.4. As you might have seen from the OFA mailing lists, there have been some issues related to NFS traffic with OFED 1.4.1. 
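
One way to look for the half-closed mpispawn connection described earlier in this thread (the CLOSE_WAIT socket on h43 with 31245 bytes sitting in Recv-Q) is to filter `netstat` output on each node. A minimal sketch, assuming the net-tools column layout shown in that output; `stuck_close_wait` is a hypothetical helper name, not an MVAPICH2 tool:

```shell
# stuck_close_wait
# Reads `netstat` TCP lines on stdin (columns: Proto Recv-Q Send-Q
# Local-Address Foreign-Address State) and prints every connection that
# is in CLOSE_WAIT while unread bytes are still queued in Recv-Q.
stuck_close_wait() {
    awk '$1 == "tcp" && $6 == "CLOSE_WAIT" && $2 > 0 {
        print $4, "<-", $5, "unread=" $2
    }'
}
```

Something like `ssh h43 netstat -t | stuck_close_wait` would check a single node. This sketch only matches lines whose protocol column is exactly `tcp`, so IPv6 sockets reported as `tcp6` are ignored.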

> The one similarity is that both systems are using SMC Tiger switch Gige switches
> within the racks and uplink to a Force10 GigE switch (although the behavior
> was repeated when the core switch was a Cisco unit).
>
> I have tried messing with MV2_MT_DEGREE. Setting this low, 4, seems to help
> large jobs start, but it does not solve the problem.

This is good to know. What happens if you reduce MV2_MT_DEGREE to 2. The
job start-up might be slower. However, we need to see whether it is able
to start the large-scale jobs.

> So the problem could be hardware or a race condition caused in the software.
> Any ideas of how to debug the software side (or both) would be appreciated).

Thanks,

DK

> Thanks,
> Craig

From Craig.Tierney at noaa.gov Mon Jul 27 16:19:28 2009
From: Craig.Tierney at noaa.gov (Craig Tierney)
Date: Mon Jul 27 16:19:56 2009
Subject: [mvapich-discuss] Question on how to debug job start failures
In-Reply-To:
References:
Message-ID: <4A6E0BD0.2000408@noaa.gov>

Dhabaleswar Panda wrote:
> Craig,
>
>> A follow-up to my problem. On the new Nehalem cluster (QDR, Centos 5.3,
>> OFED-1.4.1, Mvapich-1.2p1), I am still having applications hang when using
>> mpirun_rsh. The problem seems to start around 512 cores, but it isn't exact.
>> Not sure if this helps, but Openmpi does not have an issue (but I know has
>> a completely different launching mechanism).
>
> Does this happen with OFED 1.4. As you might have seen from the OFA
> mailing lists, there have been some issues related to NFS traffic with
> OFED 1.4.1.
>

I haven't tested with OFED 1.4, but the other system (which generated
the original post) is running OFED 1.3.1.

>> The one similarity is that both systems are using SMC Tiger switch Gige switches
>> within the racks and uplink to a Force10 GigE switch (although the behavior
>> was repeated when the core switch was a Cisco unit).
>>
>> I have tried messing with MV2_MT_DEGREE. Setting this low, 4, seems to help
>> large jobs start, but it does not solve the problem.
>
> This is good to know. What happens if you reduce MV2_MT_DEGREE to 2. The
> job start-up might be slower. However, we need to see whether it is able
> to start the large-scale jobs.
>

I will try it. For some reason I thought 4 was the smallest value.

Craig

>> So the problem could be hardware or a race condition caused in the software.
>> Any ideas of how to debug the software side (or both) would be appreciated).
>
> Thanks,
>
> DK
>

--
Craig Tierney (craig.tierney@noaa.gov)

From chris.neale at utoronto.ca Mon Jul 27 21:04:10 2009
From: chris.neale at utoronto.ca (chris.neale@utoronto.ca)
Date: Mon Jul 27 21:41:29 2009
Subject: [mvapich-discuss] Problem compiling gromacs-4.0.5 against mvapich2-1.4rc1
Message-ID: <20090727210410.p42xnp4qpwgk008c@webmail.utoronto.ca>

Thank you very much Jonathan. This was very helpful.

Chris.

-- original message --

My responses to your questions are inline. Please post questions
related to MVAPICH or MVAPICH2 to mvapich-discuss at cse.ohio-state.edu.
Thanks.
At some point chris.neale wrote:
> Hello,
>
> I was having the same problem with gromacs mvapich2 until I found
> this post. I would rather not use the branch version for production
> simulations right now, so I have taken the
> mvapich2-1.4rc1-3378.tar.gz distribution and modified
>
> src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c
> at line 503
> from:
> unsigned long debug = 0;
> to:
> static unsigned long debug = 0;
>
> as per mvapich2-trunk-2009-07-23
>
> The compilation runs to completion and I am hoping that you can
> offer a bit of advice.
>
> 1) Is this the only change that was completed in order to address
> Jerry's original problem?

Yes.

>
> 2) Is this fix stable with the mvapich2-1.4rc1-3378.tar.gz distribution?

Yes.

>
> Thank you,
> Chris.
>
>
> On Tue, Jul 14, 2009 at 12:23:11PM -0400, Jerry Leahy wrote:
> > /usr/local/lib/libmpich.a(ibv_channel_manager.o):(.bss+0x10): multiple
> > definition of `debug'
> > > <..snip..>/gromacs-4.0.5/src/gmxlib/.libs/libgmx_mpi.a(gmx_fatal.o):(.bss+0x0):
> > first defined here
> > collect2: ld returned 1 exit status
> > make[3]: *** [grompp] Error 1
> >
> > It looks like 'debug' is conflicting in both MVAPICH2 and in Gromacs.
> >
> > Any suggestions?
>
> We've committed a change to our source this morning that changes this
> definition of debug to static so that this shouldn't cause any conflicts
> in the shared namespace. Can you try building the latest from trunk and
> confirm that this resolves your problem?
>
> -- original message --
> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2009-July/002403.html

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From Craig.Tierney at noaa.gov Wed Jul 29 16:36:41 2009
From: Craig.Tierney at noaa.gov (Craig Tierney)
Date: Wed Jul 29 16:37:07 2009
Subject: [mvapich-discuss] Can I start a job using mpirun_rsh without using ethernet?
In-Reply-To: <20090722165335.GC3023@cse.ohio-state.edu>
References: <7f9dfe490907212327o71c13803p70f155c1903b9cec@mail.gmail.com>
 <20090722165335.GC3023@cse.ohio-state.edu>
Message-ID: <4A70B2D9.9060804@noaa.gov>

I am trying to figure out how to launch a job with mpirun_rsh
without using the ethernet. If I specify the IPoIB addresses
in my machine file, then mpispawn is launched over the IB.
However, mpispawn still connects using the ethernet host names.

Thanks,
Craig

--
Craig Tierney (craig.tierney@noaa.gov)

From perkinjo at cse.ohio-state.edu Wed Jul 29 17:04:11 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Wed Jul 29 17:04:38 2009
Subject: [mvapich-discuss] Can I start a job using mpirun_rsh without using ethernet?
In-Reply-To: <4A70B2D9.9060804@noaa.gov>
References: <7f9dfe490907212327o71c13803p70f155c1903b9cec@mail.gmail.com>
 <20090722165335.GC3023@cse.ohio-state.edu>
 <4A70B2D9.9060804@noaa.gov>
Message-ID: <20090729210411.GT2447@cse.ohio-state.edu>

On Wed, Jul 29, 2009 at 02:36:41PM -0600, Craig Tierney wrote:
>
> I am trying to figure out how to launch a job with mpirun_rsh
> without using the ethernet. If I specify the IPoIB addresses
> in my machine file, then mpispawn is launched over the IB.
> However, mpispawn still connects using the ethernet host names.

In order to do this then the hostname returned by gethostbyname would
have to negotiate to the IPoIB address for that host. It sounds like
this is not the case in your setup.

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From Craig.Tierney at noaa.gov Wed Jul 29 17:45:14 2009
From: Craig.Tierney at noaa.gov (Craig Tierney)
Date: Wed Jul 29 17:45:40 2009
Subject: [mvapich-discuss] Can I start a job using mpirun_rsh without using ethernet?
In-Reply-To: <20090729210411.GT2447@cse.ohio-state.edu>
References: <7f9dfe490907212327o71c13803p70f155c1903b9cec@mail.gmail.com>
 <20090722165335.GC3023@cse.ohio-state.edu>
 <4A70B2D9.9060804@noaa.gov>
 <20090729210411.GT2447@cse.ohio-state.edu>
Message-ID: <4A70C2EA.2050009@noaa.gov>

Jonathan Perkins wrote:
> On Wed, Jul 29, 2009 at 02:36:41PM -0600, Craig Tierney wrote:
>> I am trying to figure out how to launch a job with mpirun_rsh
>> without using the ethernet. If I specify the IPoIB addresses
>> in my machine file, then mpispawn is launched over the IB.
>> However, mpispawn still connects using the ethernet host names.
>
> In order to do this then the hostname returned by gethostbyname would
> have to negotiate to the IPoIB address for that host. It sounds like
> this is not the case in your setup.

Is it that gethostbyname needs to resolve properly, or gethostname
does? I wrote a small test and if you pass an IB host name, such as
h1-ib0, it resolves correctly. There is a place in mpispawn that
calls gethostname and that would resolve to the ethernet address
given the current configuration. I can't tell from the code path if
that is used to communicate between hosts during the mpispawn setup or
not.

Thanks,
Craig

--
Craig Tierney (craig.tierney@noaa.gov)

From perkinjo at cse.ohio-state.edu Wed Jul 29 19:29:17 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Wed Jul 29 19:29:44 2009
Subject: [mvapich-discuss] Can I start a job using mpirun_rsh without using ethernet?
In-Reply-To: <4A70C2EA.2050009@noaa.gov>
References: <7f9dfe490907212327o71c13803p70f155c1903b9cec@mail.gmail.com>
 <20090722165335.GC3023@cse.ohio-state.edu>
 <4A70B2D9.9060804@noaa.gov>
 <20090729210411.GT2447@cse.ohio-state.edu>
 <4A70C2EA.2050009@noaa.gov>
Message-ID: <20090729232917.GB2532@cse.ohio-state.edu>

On Wed, Jul 29, 2009 at 03:45:14PM -0600, Craig Tierney wrote:
> Jonathan Perkins wrote:
> > On Wed, Jul 29, 2009 at 02:36:41PM -0600, Craig Tierney wrote:
> >> I am trying to figure out how to launch a job with mpirun_rsh
> >> without using the ethernet. If I specify the IPoIB addresses
> >> in my machine file, then mpispawn is launched over the IB.
> >> However, mpispawn still connects using the ethernet host names.
> >
> > In order to do this then the hostname returned by gethostbyname would
> > have to negotiate to the IPoIB address for that host. It sounds like
> > this is not the case in your setup.
>
> Is it that gethostbyname needs to resolve properly, or gethostname
> does? I wrote a small test and if you pass an IB host name, such as
> h1-ib0, it resolves correctly. There is a place in mpispawn that
> calls gethostname and that would resolve to the ethernet address
> given the current configuration. I can't tell from the code path if
> that is used to communicate between hosts during the mpispawn setup or
> not.

I guess I meant the hostname passed to gethostbyname would have to
resolve to the IPoIB address for that host. The function gethostname is
used in a couple of locations to determine how things should connect to
each other when bootstrapping the MPI application. These portions of
code will lead to traffic on whichever network the hostname returned by
gethostname is on.

>
> Thanks,
> Craig

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From nathan.baca at gmail.com Thu Jul 30 11:22:23 2009
From: nathan.baca at gmail.com (Nathan Baca)
Date: Thu Jul 30 11:22:50 2009
Subject: [mvapich-discuss] mvapich2-1.4 full release?
Message-ID:

Are there plans to do a full release of mvapich2-1.4? It looks like the
current download (mvapich2 1.4-RC1-3378) is a release candidate.

Thanks,
Nate

--
Nathan Baca
nathan.baca@gmail.com

From panda at cse.ohio-state.edu Thu Jul 30 11:39:52 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Thu Jul 30 11:40:19 2009
Subject: [mvapich-discuss] mvapich2-1.4 full release?
In-Reply-To:
Message-ID:

> Are there plans to do a full release of mvapich2-1.4? It looks like the
> current download (mvapich2 1.4-RC1-3378) is a release candidate. Thanks,
> Nate

Yes, coming to you very soon. We have already incorporated all the
comments, feedback and bug-fixes received from the community into the
trunk version. Some more fixes have also been applied based on our
internal testing.
We are going through one round of final testing with the latest trunk
version to make sure that everything is super-stable. We plan to bring out
RC2 during the coming week. Then we plan to carry out performance tuning
for various platforms and configurations. The final version is expected to
be released two weeks after that (around the second/third week of August).

If you want the latest update, please take the trunk version. This is what
we have been testing currently. Alternatively, you can wait for the RC2 to
come out next week. This RC2 version will be very close to the final
version.

Hope this helps.

Thanks,

DK

> --
> Nathan Baca
> nathan.baca@gmail.com
>