From dog at lanl.gov Tue Jul 1 10:40:12 2008
From: dog at lanl.gov (David Gunter)
Date: Tue Jul 1 10:40:23 2008
Subject: [mvapich-discuss] Trouble building MVAPICH2-1.0.3 with any compiler other than gcc
Message-ID: <196B4114-4507-44B9-A0DE-4B6AEBB21BF0@lanl.gov>

I have run into this problem previously with the PGI compilers and was
once able to work around it; however, it seems to have reared its ugly
head again and I'm hoping someone on the list knows of a solution.

The problem is that we need to build MVAPICH2 using the Intel,
PathScale and PGI compilers in addition to the GCC compilers. Even
though the documentation states that it has been tested with these
other compilers, I think that such tests were not done with Totalview
support in mind.

What happens during the build is that src/pm/mpd/mtv_setup.py is
invoked. This causes the Python Distutils to try to create a
Totalview module, but Distutils only knows to put in flags for the GCC
compilers.

I have found switches for PGI and PathScale to ignore "invalid" flags,
but any code compiled with the resulting build does nothing but
segfault. I have yet to get Intel to compile the source code.

This leaves us with two options: give up on MVAPICH2 in favor of
Open-MPI, which means having only one MPI implementation on a system
where we'd prefer to have two, or give up on Totalview support, which
is not going to fly with our user base.

Does anyone know enough about Distutils to work around this problem?

-david
--
David Gunter
HPC-3: Parallel Tools Team
Los Alamos National Laboratory

From Craig.Tierney at noaa.gov Tue Jul 1 14:58:40 2008
From: Craig.Tierney at noaa.gov (Craig Tierney)
Date: Tue Jul 1 14:58:46 2008
Subject: [mvapich-discuss] Question about launch times with mpd (fwd)
In-Reply-To:
References:
Message-ID: <486A7E60.4090808@noaa.gov>

wei huang wrote:
> Hi Craig,
>
> The current mpd based startup can take longer as the system size
> grows.
> The main reason for this is that mpd needs to exchange connection
> information through a TCP/IP based ring structure. It may also take some
> time to launch the processes (from the moment you type the mpiexec
> command until the processes reach their mpd_init stage).
>
> The number you see, however, is too large compared with what we
> observe on our system. Could you please confirm that you are on our latest
> release version (mvapich2-1.0.3)? We have some mpd-related patches in that
> version.
>

The same problem exists with mvapich2-1.0.3. I ensured that I was
using the mpiexec from this distribution (because multiple versions exist).

Craig

> Another piece of news that may interest you is that we are releasing
> mvapich2-1.2-rc1 either today or tomorrow. In this version, we have much
> improved scalability in job launching using a new startup mechanism. You
> may want to try this version out.
>
> Thanks.
>
> Regards,
> Wei Huang
>
> 774 Dreese Lab, 2015 Neil Ave,
> Dept. of Computer Science and Engineering
> Ohio State University
> OH 43210
> Tel: (614)292-8501
>
>
>> Date: Mon, 30 Jun 2008 10:17:51 -0600
>> From: Craig Tierney
>> To: mvapich-discuss@cse.ohio-state.edu
>> Subject: [mvapich-discuss] Question about launch times with mpd
>>
>> I am trying to benchmark some applications on my system
>> and I have found something I did not expect with regard
>> to launch time of applications.
>>
>> All jobs are launched through a batch system (Sun Gridengine).
>> SGE is configured without tight-integration. Since mpd has
>> to be set up for each user in each job, I have a wrapper script
>> that does roughly the following:
>>
>> -----------------------
>> $me=`uname -n`
>> $port=`mpd --ncpus=4 --echo --daemon --ifhn=$me-ib0`
>>
>> for every other node
>> ssh $node mpd --ncpus=4 -h $me -p $port --daemon --ifhn=$node-ib0 &
>> end
>> waitall
>>
>> mpiexec -machinefile $machine_file $EXE
>> -----------------------
>>
>> I am running an MPI program that does very little.
>> It calls mpi_init, writes the hostname, then calls mpi_finalize.
>>
>> I measured the time it takes to launch mpd, the time it
>> takes for the program to execute (after MPI_init to just before
>> MPI_Finalize), and the time to call MPI_Init.
>>
>> Cores   mpd   complete   runjob
>> --------------------------------
>>     4   ~0.0    ~0.0        0.6
>>    16    0.6    ~0.0        0.8
>>    64    0.7    ~0.0        3.5
>>   128    1.0     0.2       11.2
>>   256    1.7     0.6       47.0
>>   324    2.1     0.1       76.4
>>   512    3.1     0.5      202.4
>>
>> - All timings are in seconds.
>>
>> mpd      - the time to launch the mpd processes in parallel
>> complete - the time to run the application
>> runjob   - the time to execute mpiexec
>>
>> My question is, why does mpd take so long to launch a job?
>> Am I doing something wrong? Is there something I can do
>> to minimize the startup time?
>>
>> Thanks,
>> Craig
>>
>> --
>> Craig Tierney (craig.tierney@noaa.gov)
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss@cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>

--
Craig Tierney (craig.tierney@noaa.gov)

From yogyas at gmail.com Wed Jul 2 07:26:31 2008
From: yogyas at gmail.com (yogeshwar sonawane)
Date: Wed Jul 2 07:26:38 2008
Subject: [mvapich-discuss] SDR & DDR selection in MVAPICH2-uDAPL compilation
Message-ID:

Hi all,

While compiling MVAPICH2 for the udapl device, the selection menu asks
for link speed. Two options are available: SDR and DDR.
Generally these are properties of the IB HCA.

Now, what is this selection based on?
On the IB HCA characteristics? Anything else?
And in the MVAPICH2-uDAPL code, how does this selection matter?
In other words, how does it change the behaviour of MVAPICH2-uDAPL?

Thanks,
Yogeshwar

From chai.15 at osu.edu Wed Jul 2 12:52:27 2008
From: chai.15 at osu.edu (Lei Chai)
Date: Wed Jul 2 12:52:27 2008
Subject: [mvapich-discuss] SDR & DDR selection in MVAPICH2-uDAPL compilation
In-Reply-To:
References:
Message-ID: <486BB24B.5010607@osu.edu>

Hi Yogeshwar,

The selection is based on your IB card, whether it is SDR or DDR. Inside
the mvapich2-udapl code this information is used for parameter tuning. We
have observed before that different parameter values are needed to yield
the best performance for these two kinds of cards.

On a separate topic, if you are using IB, we suggest you use the
OpenFabrics interface in mvapich2, which provides the best performance,
scalability, and fault tolerance.

Lei

yogeshwar sonawane wrote:
> Hi all,
>
> While compiling MVAPICH2 for the udapl device, the selection menu asks
> for link speed. Two options are available: SDR and DDR.
> Generally these are properties of the IB HCA.
>
> Now, what is this selection based on?
> On the IB HCA characteristics? Anything else?
> And in the MVAPICH2-uDAPL code, how does this selection matter?
> In other words, how does it change the behaviour of MVAPICH2-uDAPL?
>
> Thanks,
> Yogeshwar

From panda at cse.ohio-state.edu Wed Jul 2 23:08:15 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Wed Jul 2 23:08:20 2008
Subject: [mvapich-discuss] Announcing the release of MVAPICH2 1.2RC1
Message-ID:

The MVAPICH team is pleased to announce the release of MVAPICH2 1.2RC1
with the following NEW features:

- Based on MPICH2 1.0.7
- Scalable and robust daemon-less job startup
   - Enhanced and robust mpirun_rsh framework (non-MPD-based) to
     provide scalable job launching on multi-thousand core clusters
   - Available for OpenFabrics (IB and iWARP) and uDAPL interfaces
     (including Solaris)
- Checkpoint-restart with intra-node shared memory support
   - Allows best performance and scalability with fault-tolerance
     support
- Enhancement to software installation
   - Full autoconf-based configuration
   - An application (mpiname) for querying the MVAPICH2 library
     version and configuration information
- Enhanced processor affinity using PLPA for multi-core architectures
   - Allows user-defined flexible processor affinity
- Enhanced scalability for RDMA-based direct one-sided communication
  with less communication resource
- Shared memory optimized MPI_Bcast operations
- Optimized and tuned MPI_Alltoall

To download MVAPICH2 1.2RC1 and the associated user guide, and to
access the SVN repository, please visit the following URL:

http://mvapich.cse.ohio-state.edu

All feedback, including bug reports, hints for performance tuning,
patches and enhancements, is welcome. Please post it to the
mvapich-discuss mailing list.

Thanks,

The MVAPICH Team

From panda at cse.ohio-state.edu Thu Jul 3 09:12:05 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Thu Jul 3 09:12:09 2008
Subject: [mvapich-discuss] Trouble building MVAPICH2-1.0.3 with any compiler other than gcc
In-Reply-To: <196B4114-4507-44B9-A0DE-4B6AEBB21BF0@lanl.gov>
Message-ID:

Hi David,

You might have noticed that we made a release of MVAPICH2 1.2RC1 last
night. This has multiple start-up schemes, including the traditional
MPD-based one and a new scalable mpirun_rsh-based one (similar to
MVAPICH). We have verified that the MPD-based startup works with
TotalView for all compilers. Complete TotalView support with the new
mpirun_rsh-based scheme is not there yet. We are working on it and plan
to have it in the final release version.

You can check the MPD-based startup + TotalView for all compilers with
this release and let us know if it works from your point of view.

Thanks,

DK

On Tue, 1 Jul 2008, David Gunter wrote:

> I have run into this problem previously with the PGI compilers and was
> once able to work around it; however, it seems to have reared its ugly
> head again and I'm hoping someone on the list knows of a solution.
>
> The problem is that we need to build MVAPICH2 using the Intel,
> PathScale and PGI compilers in addition to the GCC compilers. Even
> though the documentation states that it has been tested with these
> other compilers, I think that such tests were not done with Totalview
> support in mind.
>
> What happens during the build is that src/pm/mpd/mtv_setup.py is
> invoked. This causes the Python Distutils to try to create a
> Totalview module, but Distutils only knows to put in flags for the GCC
> compilers.
>
> I have found switches for PGI and PathScale to ignore "invalid" flags,
> but any code compiled with the resulting build does nothing but
> segfault. I have yet to get Intel to compile the source code.
>
> This leaves us with two options: give up on MVAPICH2 in favor of
> Open-MPI, which means having only one MPI implementation on a system
> where we'd prefer to have two, or give up on Totalview support, which
> is not going to fly with our user base.
>
> Does anyone know enough about Distutils to work around this problem?
>
> -david
> --
> David Gunter
> HPC-3: Parallel Tools Team
> Los Alamos National Laboratory

From manfred.muecke at univie.ac.at Thu Jul 3 12:32:20 2008
From: manfred.muecke at univie.ac.at (Manfred Muecke)
Date: Thu Jul 3 12:36:02 2008
Subject: [mvapich-discuss] Q: MPI_ALLGATHERV causes Invalid communicator & comm=0x0
Message-ID: <43450.129.27.140.172.1215102740.squirrel@webmail.univie.ac.at>

Hi,

I have the following problem and ran out of ideas. Maybe someone can help
with some advice.

I get the following error message from all instances of my MPI program
(FORTRAN90), using MVAPICH2 1.0 (compiled with "mpe=mpicheck"):

Invalid communicator, error stack:
MPI_Comm_rank(107): MPI_Comm_rank(comm=0x0, rank=fffffd7fffdfd2bc) failed
MPI_Comm_rank(65).: Invalid communicator

The error is caused by a call to MPI_ALLGATHERV. It was discussed here
that a similar-looking error is caused by including the wrong mpi.h. This
one differs, however, in that comm=0x0 (the actual value of the
communicator was 1140850688).

"mpif90 -show" gives:

/opt/local/SunStudio12/SUNWspro/bin/f90
-xO3 -xtarget=opteron -m64
-I/opt/local/MVAPICH/mvapich2-1.0/include
-xO3
-M/opt/local/MVAPICH/mvapich2-1.0/include
-L/opt/local/MVAPICH/mvapich2-1.0/lib
-lmpichf90 -lmpichf90 -lmpich
-L/usr/lib/amd64
-L/usr/ucblib/amd64
-lsocket -lnsl -lresolv -lpthread -ldat -lrt -lnsl -lsocket

I have checked thoroughly and cannot find any mpi.h from other
installations interfering. Any other ideas?
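
[Archive note on the communicator value quoted above: the decimal value 1140850688 is easier to compare with the comm=0x0 in the error stack once it is written in hex. The sketch below only does that conversion. The constant MPICH_COMM_WORLD is an assumption, based on how MPICH2-derived mpi.h headers usually define MPI_COMM_WORLD, and should be checked against the mpi.h that is actually installed.]

```python
# Value of the communicator as printed by the application (from the message above).
reported_comm = 1140850688

# Assumption: MPICH2-derived mpi.h headers define MPI_COMM_WORLD as 0x44000000.
# Verify this against the installed header before relying on it.
MPICH_COMM_WORLD = 0x44000000


def as_hex(handle):
    """Format an integer handle as zero-padded 32-bit hex."""
    return f"0x{handle:08x}"


print(as_hex(reported_comm))              # hex form of the value the program held
print(reported_comm == MPICH_COMM_WORLD)  # does it match the assumed handle?
print(as_hex(0))                          # the handle the failing call reported
```

[If the two values match, the program held a plausible MPI_COMM_WORLD handle and the zero seen by MPI_Comm_rank was introduced somewhere between the caller and the library. That narrows the search but does not by itself identify the cause.]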

Thanks for your help, Manfred

--
Manfred Mücke manfred.muecke@univie.ac.at
Research Lab Computational Technologies and Applications
rlcta.univie.ac.at Lenaugasse 2, 1080 Wien, AUSTRIA

From thakur at mcs.anl.gov Thu Jul 3 13:27:58 2008
From: thakur at mcs.anl.gov (Rajeev Thakur)
Date: Thu Jul 3 13:28:08 2008
Subject: [mvapich-discuss] Trouble building MVAPICH2-1.0.3 with any compiler other than gcc
In-Reply-To: <200807031653.m63GrMof016224@cse.ohio-state.edu>
References: <200807031653.m63GrMof016224@cse.ohio-state.edu>
Message-ID: <003401c8dd32$23330a70$860add8c@mcs.anl.gov>

David,

One way around that problem is to edit the Makefile created by
configure in src/pm/mpd and change the compiler to gcc (just for that
directory). The rest of MPICH2 will get built with the Intel or other
compiler chosen.

Rajeev

> On Tue, 1 Jul 2008, David Gunter wrote:
>
> > I have run into this problem previously with the PGI compilers and was
> > once able to work around it; however, it seems to have reared its ugly
> > head again and I'm hoping someone on the list knows of a solution.
> >
> > The problem is that we need to build MVAPICH2 using the Intel,
> > PathScale and PGI compilers in addition to the GCC compilers. Even
> > though the documentation states that it has been tested with these
> > other compilers, I think that such tests were not done with Totalview
> > support in mind.
> >
> > What happens during the build is that src/pm/mpd/mtv_setup.py is
> > invoked. This causes the Python Distutils to try to create a
> > Totalview module, but Distutils only knows to put in flags for the GCC
> > compilers.
> >
> > I have found switches for PGI and PathScale to ignore "invalid" flags,
> > but any code compiled with the resulting build does nothing but
> > segfault. I have yet to get Intel to compile the source code.
> >
> > This leaves us with two options: give up on MVAPICH2 in favor of
> > Open-MPI, which means having only one MPI implementation on a system
> > where we'd prefer to have two, or give up on Totalview support, which
> > is not going to fly with our user base.
> >
> > Does anyone know enough about Distutils to work around this problem?
> >
> > -david
> > --
> > David Gunter
> > HPC-3: Parallel Tools Team
> > Los Alamos National Laboratory
> >

From curtisbr at cse.ohio-state.edu Thu Jul 3 14:34:44 2008
From: curtisbr at cse.ohio-state.edu (Brian Curtis)
Date: Thu Jul 3 14:34:49 2008
Subject: [mvapich-discuss] Q: MPI_ALLGATHERV causes Invalid communicator & comm=0x0
In-Reply-To: <43450.129.27.140.172.1215102740.squirrel@webmail.univie.ac.at>
References: <43450.129.27.140.172.1215102740.squirrel@webmail.univie.ac.at>
Message-ID: <8D311EC4-491A-4277-8099-F4D546A0C4B3@cse.ohio-state.edu>

Manfred,

Do you see this problem when MPE is disabled? Also, we released
MVAPICH2 1.2rc1 yesterday. It contains numerous bug fixes and
enhancements for improved scalability and performance. Can you try it
out and see if you still experience this problem?

Brian

On Jul 3, 2008, at 12:32 PM, Manfred Muecke wrote:

> Hi,
>
> I have the following problem and ran out of ideas. Maybe someone
> can help with some advice.
>
> I get the following error message from all instances of my MPI program
> (FORTRAN90), using MVAPICH2 1.0 (compiled with "mpe=mpicheck"):
>
> Invalid communicator, error stack:
> MPI_Comm_rank(107): MPI_Comm_rank(comm=0x0, rank=fffffd7fffdfd2bc) failed
> MPI_Comm_rank(65).: Invalid communicator
>
> The error is caused by a call to MPI_ALLGATHERV. It was discussed here
> that a similar-looking error is caused by including the wrong
> mpi.h. This one differs, however, in that comm=0x0 (the actual value
> of the communicator was 1140850688).
>
> "mpif90 -show" gives:
>
> /opt/local/SunStudio12/SUNWspro/bin/f90
> -xO3 -xtarget=opteron -m64
> -I/opt/local/MVAPICH/mvapich2-1.0/include
> -xO3
> -M/opt/local/MVAPICH/mvapich2-1.0/include
> -L/opt/local/MVAPICH/mvapich2-1.0/lib
> -lmpichf90 -lmpichf90 -lmpich
> -L/usr/lib/amd64
> -L/usr/ucblib/amd64
> -lsocket -lnsl -lresolv -lpthread -ldat -lrt -lnsl -lsocket
>
> I have checked thoroughly and cannot find any mpi.h from other
> installations interfering. Any other ideas?
>
> Thanks for your help, Manfred
>
> --
> Manfred Mücke manfred.muecke@univie.ac.at
> Research Lab Computational Technologies and Applications
> rlcta.univie.ac.at Lenaugasse 2, 1080 Wien, AUSTRIA

From worleys at gmail.com Fri Jul 4 11:21:33 2008
From: worleys at gmail.com (Chris Worley)
Date: Fri Jul 4 11:21:39 2008
Subject: [mvapich-discuss] SEGFAULT: mpispawn.c line 303
Message-ID:

Using OFED 1.3 and the MVAPICH 1.0 included in OFED's download.

Intermittently (not always) I get a segfault in mpispawn line 303 at
large (i.e. 2048) core counts.

Chris

From Marcos.Verissimo at uclouvain.be Fri Jul 4 11:21:58 2008
From: Marcos.Verissimo at uclouvain.be (Marcos Verissimo Alves)
Date: Fri Jul 4 11:22:10 2008
Subject: [mvapich-discuss] Trouble configuring MVAPICH2 1.2RC1
Message-ID: <62345.189.68.75.35.1215184918.squirrel@mmp-3-1.sipr-dc.ucl.ac.be>

Hi all,

I am trying to build the new release candidate of mvapich on our cluster,
but I get an error:

(...)
checking for the InfiniBand includes path... default
checking for the InfiniBand library path... default
checking for library containing umad_init... no
configure: error: 'libibumad not found. Did you specify --with-ib-libpath=?'
configure: error: /bin/sh '/home/pcpm/mverissi/test/mvapich2-1.2rc1/src/mpid/ch3/channels/mrail/configure' failed for channels/mrail
configure: error: Configure of src/mpid/ch3 failed!

What is the usual name of the library file? Is there any way of getting
around this if I can't find the library containing umad_init?

Cheers,

Marcos

--
Dr. Marcos Verissimo Alves
Post-Doctoral Fellow
Unité de Physico-Chimie et de Physique des Matériaux (PCPM)
Université Catholique de Louvain
1 Place Croix du Sud, B-1348
Louvain-la-Neuve
Belgique

------

Gort, Klaatu barada nikto. Klaatu barada nikto. Klaatu barada nikto.

Free translation:

Gort, Google is your friend. Google is your friend. Google is your friend.

From curtisbr at cse.ohio-state.edu Sat Jul 5 00:51:25 2008
From: curtisbr at cse.ohio-state.edu (Brian Curtis)
Date: Sat Jul 5 00:51:36 2008
Subject: [mvapich-discuss] Trouble configuring MVAPICH2 1.2RC1
In-Reply-To: <62345.189.68.75.35.1215184918.squirrel@mmp-3-1.sipr-dc.ucl.ac.be>
References: <62345.189.68.75.35.1215184918.squirrel@mmp-3-1.sipr-dc.ucl.ac.be>
Message-ID:

Marcos,

The error indicates that the libibumad library cannot be found in
the default search path. This is a required library. If you do not
have this library installed, it can be obtained from the OpenFabrics
Alliance software stack (http://www.openfabrics.org/). If you do not
install the InfiniBand libraries in the system's library path
(/usr/lib or /usr/local/lib is recommended), please specify the path to
this library during configuration with
--with-ib-libpath={path to InfiniBand libraries}.

Brian

On Jul 4, 2008, at 11:21 AM, Marcos Verissimo Alves wrote:

> Hi all,
>
> I am trying to build the new release candidate of mvapich on our
> cluster, but I get an error:
>
> (...)
> checking for the InfiniBand includes path... default
> checking for the InfiniBand library path... default
> checking for library containing umad_init... no
> configure: error: 'libibumad not found.
> Did you specify --with-ib-libpath=?'
> configure: error: /bin/sh
> '/home/pcpm/mverissi/test/mvapich2-1.2rc1/src/mpid/ch3/channels/mrail/configure'
> failed for channels/mrail
> configure: error: Configure of src/mpid/ch3 failed!
>
> What is the usual name of the library file? Is there any way of getting
> around this if I can't find the library containing umad_init?
>
> Cheers,
>
> Marcos
>
> --
> Dr. Marcos Verissimo Alves
> Post-Doctoral Fellow
> Unité de Physico-Chimie et de Physique des Matériaux (PCPM)
> Université Catholique de Louvain
> 1 Place Croix du Sud, B-1348
> Louvain-la-Neuve
> Belgique
>
> ------
>
> Gort, Klaatu barada nikto. Klaatu barada nikto. Klaatu barada nikto.
>
> Free translation:
>
> Gort, Google is your friend. Google is your friend. Google is your
> friend.

From yogyas at gmail.com Sat Jul 5 04:23:47 2008
From: yogyas at gmail.com (yogeshwar sonawane)
Date: Sat Jul 5 04:23:54 2008
Subject: [mvapich-discuss] MPD related error
Message-ID:

Hi all,

I am trying to run 64 processes using MVAPICH2-1.0.1-uDAPL on 8 nodes.
Every node has 8 cores/cpus.

Out of 64, sometimes one or more processes get killed or closed.
The node on which fewer than 8 processes are running has the following
message in its /var/log/messages file:

Jul 4 13:23:05 pn02 mpdman: pn02_mpdman_12: mpd_uncaught_except_tb handling:
  exceptions.AttributeError: 'int' object has no attribute 'send_dict_msg'
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpdman.py 652 handle_lhs_input
        self.ring.rhsSock.send_dict_msg(msg)
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpdlib.py 743 handle_active_streams
        handler(stream,*args)
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpdman.py 481 run
        rv = self.streamHandler.handle_active_streams(timeout=5.0)
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py 1408 launch_mpdman_via_fork
        mpdman.run()
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py 1325 run_one_cli
        (manPid,toManSock) = self.launch_mpdman_via_fork(msg,man_env)
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py 1199 do_mpdrun
        self.run_one_cli(lorank,msg)
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py 854 handle_lhs_input
        self.do_mpdrun(msg)
    /home/htdg

Can anybody give me some more info about this?
Is this some kind of setup/settings issue on the nodes?

Thanks,
Yogeshwar

From panda at cse.ohio-state.edu Sat Jul 5 08:18:06 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Sat Jul 5 08:18:15 2008
Subject: [mvapich-discuss] MPD related error
In-Reply-To:
Message-ID:

Thanks for your note. You might have noticed that a new version of
MVAPICH2 (1.2RC1) was released a few days ago. This release has a non-MPD
(daemon-less) startup scheme. This new start-up scheme is applicable to
all interfaces, including uDAPL. I suggest you upgrade your software
stack to this release. It will give you faster start-up and you need not
worry about the MPD-related issues.

DK

On Sat, 5 Jul 2008, yogeshwar sonawane wrote:

> Hi all,
>
> I am trying to run 64 processes using MVAPICH2-1.0.1-uDAPL on 8 nodes.
> Every node has 8 cores/cpus.
>
> Out of 64, sometimes one or more processes get killed or closed. The
> node on which fewer than 8 processes are running has the following
> message in its /var/log/messages file:
>
> Jul 4 13:23:05 pn02 mpdman: pn02_mpdman_12: mpd_uncaught_except_tb handling:
>   exceptions.AttributeError: 'int' object has no attribute 'send_dict_msg'
>     /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpdman.py 652 handle_lhs_input
>         self.ring.rhsSock.send_dict_msg(msg)
>     /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpdlib.py 743 handle_active_streams
>         handler(stream,*args)
>     /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpdman.py 481 run
>         rv = self.streamHandler.handle_active_streams(timeout=5.0)
>     /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py 1408 launch_mpdman_via_fork
>         mpdman.run()
>     /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py 1325 run_one_cli
>         (manPid,toManSock) = self.launch_mpdman_via_fork(msg,man_env)
>     /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py 1199 do_mpdrun
>         self.run_one_cli(lorank,msg)
>     /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py 854 handle_lhs_input
>         self.do_mpdrun(msg)
>     /home/htdg
>
> Can anybody give me some more info about this?
> Is this some kind of setup/settings issue on the nodes?
>
> Thanks,
> Yogeshwar

From singh.jasjit at yahoo.co.in Mon Jul 7 09:44:11 2008
From: singh.jasjit at yahoo.co.in (jasjit singh)
Date: Mon Jul 7 09:44:24 2008
Subject: [mvapich-discuss] EP attributes' values
Message-ID: <179419.97136.qm@web94005.mail.in2.yahoo.com>

Hi

I am using mvapich2-1.0.1

While running more than 64 processes on 8 nodes (each with 8 cores,
64-bit, RHEL-2.6.9-42.ELsmp), I have observed some changes in certain
attributes.

1)
Value of max_rdma_write_iov changes from 0 to 42.
Value of max_rdma_read_iov also changes from 0 to a non-zero value.
I want to know why there is such a dramatic change in these values. How
should we proceed if we want to run more than 64 processes successfully?

2)
Value of the max_message_size attribute in our stack is 4294967296 (i.e.
4GB), which is returned by dat_ia_query(). So we are expecting MVAPICH
to set the same value for max_message_size while setting DAT_EP_ATTR
in EP creation. It does so if we run up to 64 processes. But if the
number of processes exceeds 64, MVAPICH sets this value to 1024 (i.e.
1K). This is again a drastic change. And what is more surprising is that
it does post recv for sizes larger than 1K. MVAPICH, it seems, is on one
hand limiting MAX MESSAGE SIZE and on the other hand posting larger
data sizes.

I am sure that the changes in these values have nothing to do with the
number of nodes (or oversubscription, I essentially mean). (CMIIW)
These changes are only due to the increase in the number of processes.
One more thing I want to confirm is that this has nothing to do with the
cluster type (small, medium or large), as the limit on the number of
processes for a small cluster is 128.

Regards,
Jasjit Singh

From alex.theodore at hp.com Mon Jul 7 12:21:39 2008
From: alex.theodore at hp.com (Theodore, Alex)
Date: Mon Jul 7 12:25:37 2008
Subject: [mvapich-discuss] MVAPICH2 Installation / Configuration help
Message-ID: <2B7BF477E7CCD44892CBB596329A10A228D07E8EB7@GVW0432EXB.americas.hpqcorp.net>

I've installed MVAPICH2 (mvapich2-1.2rc1) with the following options:

./configure --prefix=/opt/mvapich2 --with-rdma=gen2 --enable-f90 --enable-f77 --enable-mpe
make
make install

Then I configured the following environment variables in ~/.bashrc:

export MVAPICH2_HOME=/opt/mvapich2
export PATH=$MVAPICH2_HOME/bin:$MVAPICH2_HOME/sbin:$PATH
export MANPATH=$MVAPICH2_HOME/man:$MANPATH

When I try to run the application it doesn't seem to work properly:

1) Created a file called "hosts1" with four compute nodes, each host's
   hostname on one line
2) Created .mpd.conf and .mpdpasswd with a password on the head node, and
   distributed them to the compute nodes
3) Ran "mpirun_rsh -ssh -np 4 -hostfile /root/MPI/hosts1 ./test-alex-bcast"
   with the following output:

/usr/bin/env: mpispawn: No such file or directory
Child exited abnormally!
cleanupKilling remote processes.../usr/bin/env: mpispawn: No such file or directory
/usr/bin/env: mpispawn: No such file or directory
/usr/bin/env: mpispawn: No such file or directory
DONE

What am I missing? I'm sure this is likely a configuration issue. Any
help or guidance would be greatly appreciated.

Thanks,

Alex
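
[Archive note: the message "/usr/bin/env: mpispawn: No such file or directory" is what env prints when it cannot find the named program in any PATH directory. The following is a minimal sketch of a check that could be run on each compute node, ideally through the same ssh mechanism that mpirun_rsh uses, since a non-interactive shell may not read ~/.bashrc. The helper names find_on_path and dirs_containing are made up for this note; only the Python standard library is used.]

```python
import os
import shutil


def find_on_path(name, path=None):
    """Return the full path of executable `name` found via PATH, or None."""
    return shutil.which(name, path=path)


def dirs_containing(name, path=None):
    """List the PATH directories that hold an executable file called `name`."""
    raw = path if path is not None else os.environ.get("PATH", "")
    hits = []
    for directory in raw.split(os.pathsep):
        candidate = os.path.join(directory, name)
        # Skip empty PATH entries; require a regular file with the execute bit.
        if directory and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # Tool names taken from the messages in this thread.
    for tool in ("mpirun_rsh", "mpispawn"):
        print(tool, "->", find_on_path(tool) or "NOT FOUND on PATH")
```

[If mpirun_rsh is found but mpispawn is not, either the install directory is incomplete on that node or PATH is not being propagated to the remote shell.]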

From sridharj at cse.ohio-state.edu Mon Jul 7 12:36:55 2008
From: sridharj at cse.ohio-state.edu (Jaidev Sridhar)
Date: Mon Jul 7 12:35:35 2008
Subject: [mvapich-discuss] MVAPICH2 Installation / Configuration help
In-Reply-To: <2B7BF477E7CCD44892CBB596329A10A228D07E8EB7@GVW0432EXB.americas.hpqcorp.net>
References: <2B7BF477E7CCD44892CBB596329A10A228D07E8EB7@GVW0432EXB.americas.hpqcorp.net>
Message-ID: <1215448615.13766.4.camel@t13.nowlab.cis.ohio-state.edu>

Hi Alex,

mpispawn is a utility that mpirun_rsh starts on all nodes. Can you check
whether
a) mpispawn is installed in $MVAPICH2_HOME/bin (i.e., the same directory
   as mpirun_rsh) on all nodes, and
b) $PATH (including $MVAPICH2_HOME/bin) is being propagated correctly?

-Jaidev

On Mon, 2008-07-07 at 16:21 +0000, Theodore, Alex wrote:
> I've installed MVAPICH2 (mvapich2-1.2rc1) with the following options:
>
> ./configure --prefix=/opt/mvapich2 --with-rdma=gen2 --enable-f90
> --enable-f77 --enable-mpe
> make
> make install
>
> Then I configured the following environment variables in ~/.bashrc:
>
> export MVAPICH2_HOME=/opt/mvapich2
> export PATH=$MVAPICH2_HOME/bin:$MVAPICH2_HOME/sbin:$PATH
> export MANPATH=$MVAPICH2_HOME/man:$MANPATH
>
> When I try to run the application it doesn't seem to work properly:
>
> 1) Created a file called "hosts1" with four compute nodes, each
>    host's hostname on one line
> 2) Created .mpd.conf and .mpdpasswd with a password on the head node,
>    and distributed them to the compute nodes
> 3) Ran "mpirun_rsh -ssh -np 4
>    -hostfile /root/MPI/hosts1 ./test-alex-bcast" with the following output:
>
> /usr/bin/env: mpispawn: No such file or directory
> Child exited abnormally!
> cleanupKilling remote processes.../usr/bin/env: mpispawn: No such file
> or directory
> /usr/bin/env: mpispawn: No such file or directory
> /usr/bin/env: mpispawn: No such file or directory
> DONE
>
> What am I missing? I'm sure this is likely a configuration issue. Any
> help or guidance would be greatly appreciated.
>
> Thanks,
>
> Alex

From sridharj at cse.ohio-state.edu Mon Jul 7 14:03:04 2008
From: sridharj at cse.ohio-state.edu (Jaidev Sridhar)
Date: Mon Jul 7 14:01:44 2008
Subject: [Fwd: RE: [mvapich-discuss] SEGFAULT: mpispawn.c line 303]
Message-ID: <1215453784.13766.12.camel@t13.nowlab.cis.ohio-state.edu>

Looks like I missed the mailing list in my original response.

Chris, in addition, we think the user application is segfaulting or
terminating unexpectedly. In some versions of mvapich, you'd see
"mpispawn.c:303 Unexpected exit status" when an application segfaults.

-Jaidev

-------- Forwarded Message --------
> From: Jaidev Sridhar
> Reply-To: Jaidev Sridhar
> To: Chris Worley
> Subject: RE: [mvapich-discuss] SEGFAULT: mpispawn.c line 303
> Date: Sat, 5 Jul 2008 10:20:05 -0400
>
> Hi Chris,
>
> Do you see any messages on the console after the failure? If yes, they
> would help us nail down this issue.
>
> -Jaidev
>
> - original message -
> Subject: [mvapich-discuss] SEGFAULT: mpispawn.c line 303
> From: "Chris Worley"
> Date: 07-04-2008 15:26
>
> Using OFED 1.3 and the MVAPICH 1.0 included in OFED's download.
>
> Intermittently (not always) I get a segfault in mpispawn line 303 at
> large (i.e. 2048) core counts.
>
> Chris

From chai.15 at osu.edu Mon Jul 7 14:45:01 2008
From: chai.15 at osu.edu (Lei Chai)
Date: Mon Jul 7 14:45:05 2008
Subject: [mvapich-discuss] EP attributes' values
In-Reply-To: <179419.97136.qm@web94005.mail.in2.yahoo.com>
References: <179419.97136.qm@web94005.mail.in2.yahoo.com>
Message-ID: <4872642D.5010509@osu.edu>

Hi Jasjit,

Thanks for using mvapich2. I believe you are using the udapl interface.

When the number of processes is larger than 64, on demand connection establishment model is used for better scalability and thus the attribute values are different. If this is a problem on your stack, could you try to disable on demand by setting the threshold to be larger than the number of processes, e.g.

$ mpiexec -n 64 -env MV2_ON_DEMAND_THRESHOLD 1024 ./a.out

FYI, since the udapl interface in mvapich2 doesn't support the blocking progress mode yet, it will not be beneficial to use over-subscription. If you are using InfiniBand as the network we recommend you use the OFED interface in mvapich2, which provides the best performance, scalability, and features, such as blocking mode for over-subscription etc. The latest release is mvapich2-1.2rc1.

Lei

jasjit singh wrote:
> Hi
>
> I am using mvapich2-1.0.1
>
> While running more than 64 processes on 8 nodes (each with 8 cores, 64-bit, RHEL-2.6.9-42.ELsmp), I have observed some changes in certain attributes.
>
> 1)
> Value of max_rdma_write_iov changes from 0 to 42.
> Value of max_rdma_read_iov also changes from 0 to a non-zero value..
> I want to know why there is such a dramatic change in these values. How should we proceed if we want to run more than 64 processes successfully ?
>
> 2)
> Value of max_message_size attribute in our stack is 4294967296 (i.e 4GB) that is returned in dat_ia_query().
> So we are expecting MVAPICH to set the same value for max_message_size while setting DAT_EP_ATTR in EP creation. It is doing so if we run up to 64 processes. But if the number of processes exceeds 64, MVAPICH sets this value to 1024 (i.e 1K). This is again a drastic change. And what is more surprising is it does post recv for size larger than 1K. MVAPICH, it seems, is on one hand limiting MAX MESSAGE SIZE and on the other hand posting larger data size.
>
> I am sure that changes in these values have nothing to do with the number of nodes (or oversubscription, I essentially mean).(CMIIW)
> These changes are only due to increase in number of processes. And one thing more I want to confirm is this has nothing to do with cluster type whether this is small, medium or large as the limit for number of processes for small cluster is 128.
>
> Regards,
> Jasjit Singh

From yogyas at gmail.com Tue Jul 8 09:46:21 2008
From: yogyas at gmail.com (yogeshwar sonawane)
Date: Tue Jul 8 09:46:29 2008
Subject: [mvapich-discuss] EP attributes' values
In-Reply-To: <4872642D.5010509@osu.edu>
References: <179419.97136.qm@web94005.mail.in2.yahoo.com> <4872642D.5010509@osu.edu>
Message-ID:

Hi lei,

On 7/8/08, Lei Chai wrote:
> Hi Jasjit,
>
> Thanks for using mvapich2. I believe you are using the udapl interface.
>
> When the number of processes is larger than 64, on demand connection
> establishment model is used for better scalability and thus the attribute
> values are different.
> If this is a problem on your stack, could you try to

Can you elaborate more on this:- better scalability & change in the attribute values.
Any link or reference will be also helpful.

> disable on demand by setting the threshold to be larger than the number of
> processes, e.g.
>
> $ mpiexec -n 64 -env MV2_ON_DEMAND_THRESHOLD 1024 ./a.out
>
> FYI, since the udapl interface in mvapich2 doesn't support the blocking
> progress mode yet, it will not be beneficial to use over-subscription. If
> you are using InfiniBand as the network we recommend you use the OFED
> interface in mvapich2, which provides the best performance, scalability, and
> features, such as blocking mode for over-subscription etc. The latest
> release is mvapich2-1.2rc1.
>
> Lei
>
> jasjit singh wrote:
> > Hi
> >
> > I am using mvapich2-1.0.1
> >
> > While running more than 64 processes on 8 nodes (each with 8 cores, 64-bit, RHEL-2.6.9-42.ELsmp), I have observed some changes in certain attributes.
> >
> > 1)
> > Value of max_rdma_write_iov changes from 0 to 42.
> > Value of max_rdma_read_iov also changes from 0 to a non-zero value..
> > I want to know why there is such a dramatic change in these values. How should we proceed if we want to run more than 64 processes successfully ?
> >
> > 2)
> > Value of max_message_size attribute in our stack is 4294967296 (i.e 4GB) that is returned in dat_ia_query(). So we are expecting MVAPICH to set the same value for max_message_size while setting DAT_EP_ATTR in EP creation. It is doing so if we run upto 64 processes. But if number of processes exceed 64, MVAPICH sets this value to 1024(i.e 1K). This is again a drastic change. And what is more surprising is it does post recv for size larger than 1K. MVAPICH, it seems, is on one hand limiting MAX MESSAGE SIZE and on the other hand posting larger data size.
> >
> > I am sure that changes in these values have nothing to do with the number of nodes (or oversubscription, I essentially mean).(CMIIW)
> > These changes are only due to increase in number of processes. And one thing more I want to confirm is this has nothing to do with cluster type whether this is small, medium or large as the limit for number of processes for small cluster is 128.
> >
> > Regards,
> > Jasjit Singh

with regards,
Yogeshwar

From kus at free.net Tue Jul 8 09:57:32 2008
From: kus at free.net (Mikhail Kuzminsky)
Date: Tue Jul 8 09:57:40 2008
Subject: [mvapich-discuss] mvapich over OFED *and* over IBGD IB stack
In-Reply-To: <200807071403.m67E3AMm008795@cse.ohio-state.edu>
Message-ID:

AFAIK mvapich versions somewhere around version 0.9.5 were "switched" from support of the IBGD Mellanox IB stack to support of OFED.

But were there some mvapich versions which might work over IBGD *and* OFED ? I'm especially interested in support of IBGD 1.8.0 and OFED-1.2 or 1.3.

Mikhail Kuzminsky
Computer Assistance to Chemical Research Center
Zelinsky Institute of Organic Chemistry
Moscow

From chai.15 at osu.edu Tue Jul 8 16:49:17 2008
From: chai.15 at osu.edu (Lei Chai)
Date: Tue Jul 8 16:49:22 2008
Subject: [mvapich-discuss] EP attributes' values
In-Reply-To:
References: <179419.97136.qm@web94005.mail.in2.yahoo.com> <4872642D.5010509@osu.edu>
Message-ID: <4873D2CD.6050602@osu.edu>

To clarify, I mean when on demand is used, the code path is different and the values are set in different places, that's why you have observed different values for on demand and non on demand cases. These values are not directly related to scalability.

Regarding the specific parameters Jasjit mentioned, we found that the values were not explicitly set for on demand. I'm attaching a patch below. Could you try and see if it solves your problem. The patch has also been checked in to the latest trunk version.

Regards,
Lei

---------------------------------------------------------
Index: src/mpid/osu_ch3/channels/mrail/src/udapl/rdma_udapl_priv.c
===================================================================
--- src/mpid/osu_ch3/channels/mrail/src/udapl/rdma_udapl_priv.c (revision 2839)
+++ src/mpid/osu_ch3/channels/mrail/src/udapl/rdma_udapl_priv.c (working copy)
@@ -1067,6 +1067,7 @@
 {
     ep_attr.service_type = DAT_SERVICE_TYPE_RC;
     ep_attr.max_mtu_size = rdma_default_mtu_size;
+    ep_attr.max_message_size = ia_attr.max_message_size;
     ep_attr.max_rdma_size = ia_attr.max_rdma_size;
     ep_attr.qos = DAT_QOS_BEST_EFFORT;
     ep_attr.recv_completion_flags = DAT_COMPLETION_DEFAULT_FLAG;
@@ -1081,6 +1082,8 @@
     ep_attr.max_request_iov = MIN (rdma_default_max_sg_list,
                                    ia_attr.max_iov_segments_per_dto);
+    ep_attr.max_rdma_write_iov = 0;
+    ep_attr.max_rdma_read_iov = 0;
     ep_attr.max_rdma_read_in = DAPL_DEFAULT_MAX_RDMA_IN;
     ep_attr.max_rdma_read_out = DAPL_DEFAULT_MAX_RDMA_OUT;
-----------------------------------------------------------------------------

yogeshwar sonawane wrote:
> Hi lei,
>
> On 7/8/08, Lei Chai wrote:
>> Hi Jasjit,
>>
>> Thanks for using mvapich2. I believe you are using the udapl interface.
>>
>> When the number of processes is larger than 64, on demand connection
>> establishment model is used for better scalability and thus the attribute
>> values are different. If this is a problem on your stack, could you try to
>
> Can you elaborate more on this:- better scalability & change in the
> attribute values.
> Any link or reference will be also helpful.
>
>> disable on demand by setting the threshold to be larger than the number of
>> processes, e.g.
>>
>> $ mpiexec -n 64 -env MV2_ON_DEMAND_THRESHOLD 1024 ./a.out
>>
>> FYI, since the udapl interface in mvapich2 doesn't support the blocking
>> progress mode yet, it will not be beneficial to use over-subscription. If
>> you are using InfiniBand as the network we recommend you use the OFED
>> interface in mvapich2, which provides the best performance, scalability, and
>> features, such as blocking mode for over-subscription etc. The latest
>> release is mvapich2-1.2rc1.
>>
>> Lei
>>
>> jasjit singh wrote:
>>> Hi
>>>
>>> I am using mvapich2-1.0.1
>>>
>>> While running more than 64 processes on 8 nodes (each with 8 cores, 64-bit, RHEL-2.6.9-42.ELsmp), I have observed some changes in certain attributes.
>>>
>>> 1)
>>> Value of max_rdma_write_iov changes from 0 to 42.
>>> Value of max_rdma_read_iov also changes from 0 to a non-zero value..
>>> I want to know why there is such a dramatic change in these values. How should we proceed if we want to run more than 64 processes successfully ?
>>>
>>> 2)
>>> Value of max_message_size attribute in our stack is 4294967296 (i.e 4GB) that is returned in dat_ia_query(). So we are expecting MVAPICH to set the same value for max_message_size while setting DAT_EP_ATTR in EP creation. It is doing so if we run upto 64 processes. But if number of processes exceed 64, MVAPICH sets this value to 1024(i.e 1K).
>>> This is again a drastic change. And what is more surprising is it does post recv for size larger than 1K. MVAPICH, it seems, is on one hand limiting MAX MESSAGE SIZE and on the other hand posting larger data size.
>>>
>>> I am sure that changes in these values have nothing to do with the number of nodes (or oversubscription, I essentially mean).(CMIIW)
>>> These changes are only due to increase in number of processes. And one thing more I want to confirm is this has nothing to do with cluster type whether this is small, medium or large as the limit for number of processes for small cluster is 128.
>>>
>>> Regards,
>>> Jasjit Singh
>
> with regards,
> Yogeshwar

From nilesh_awate at yahoo.com Wed Jul 9 08:41:53 2008
From: nilesh_awate at yahoo.com (nilesh awate)
Date: Wed Jul 9 08:42:02 2008
Subject: [mvapich-discuss] Couldn't run mpi program with mvapich2-1.2rc1
Message-ID: <98078.59087.qm@web94115.mail.in2.yahoo.com>

Hi all,

I downloaded mvapich2-1.2rc1 (as it runs without mpd daemons) for a trial.
I configured it for udapl with prefix ~/mpi_bin_rc1, then set the path to ~/mpi_bin_rc1/bin:$PATH (both mpirun_rsh & mpispawn are present).

To run the mpi code I executed the following command:

./mpirun_rsh -np 2 node1 node2 ./mpicode

(ssh without password is enabled, with an nfs share)

First I got the following error:

/usr/bin/env mpispawn: no such file

Then I searched the mailing list; there was a reply from Karl suggesting to run from the installed mvapich bin directory because "mpispawn is being invoked without execvp".

I tried that, and then I got the following error:

Child exited abnormally!
cleanupKilling remote processes...DONE

Then I looked at the output of ps -eaf on both nodes. I observed that mpispawn was running on the remote node and that the process

"/usr/bin/ssh -q sun00 cd /home/nilesha; /usr/bin/env LD_LIBRARY_PATH=/usr/mvapich/lib/share"

was hanging on the executing node.

I don't know what is missing from my side. Please tell me if there is anything more I need to do.

Waiting for reply,
Nilesh

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080709/4ad26ea7/attachment.html

From nilesh_awate at yahoo.com Wed Jul 9 09:05:55 2008
From: nilesh_awate at yahoo.com (nilesh awate)
Date: Wed Jul 9 09:06:08 2008
Subject: [mvapich-discuss] dat_evd_dequeue erroneous condition is not handled
Message-ID: <243435.83777.qm@web94111.mail.in2.yahoo.com>

Hi lei,

I have created a small patch which takes care of the transport error; it aborts the mpi application and comes out of it.
I have tried it on mvapich2-1.0.1 & mvapich2-1.0.3.

Here is the patch:

--- orig_mvapich2-1.0.1/src/mpid/osu_ch3/channels/mrail/src/udapl/udapl_channel_manager.c 2007-09-06 02:14:15.000000000 +0530
+++ mvapich2-1.0.1_patched/src/mpid/osu_ch3/channels/mrail/src/udapl/udapl_channel_manager.c 2008-07-02 15:30:45.000000000 +0530
@@ -455,6 +455,8 @@
     int i, j, needed;
     static int last_poll = 0;
     int type = T_CHANNEL_NO_ARRIVE;
+    int rank;
+    PMI_Get_rank(&rank);

     *vbuf_handle = NULL;
     for (i = last_poll, j = 0;
@@ -467,6 +469,16 @@
         {
             DEBUG_PRINT ("[poll cq]: get complete queue entry\n");
             assert (event.event_number == DAT_DTO_COMPLETION_EVENT);
+
+            /* Following is the patch to come out in case of fatal error like
+               DAT_DTO_ERR_TRANSPORT (occures when network disfunction) */
+
+            if (event.event_data.dto_completion_event_data.status != DAT_DTO_SUCCESS)
+            {
+                udapl_error_abort(UDAPL_STATUS_ERR,"[%d]DAT_EVD_ERROR in Consume_signals %x \n",rank,
+                                  event.event_data.dto_completion_event_data.status);
+            }
+
             sc = ((struct vbuf *) event.event_data.
                   dto_completion_event_data.user_cookie.as_ptr)->desc;
             v = (vbuf *) ((aint_t) sc.cookie.as_ptr);

regards
Nilesh

----- Original Message ----
From: LEI CHAI
To: nilesh awate
Cc: MVAPICH2
Sent: Wednesday, 18 June, 2008 2:27:32 AM
Subject: Re: [mvapich-discuss] dat_evd_dequeue erroneous condition is not handled

Hi,

We have never got the DAT_DTO_ERR_TRANSPORT error before. This error usually means the network has problem and is not functional well. I think a proper way to handle it is to report the error and abort the mpi program since it is kind of a fatal error.

Lei

----- Original Message -----
From: nilesh awate
Date: Tuesday, June 17, 2008 10:58 am
Subject: [mvapich-discuss] dat_evd_dequeue erroneous condition is not handled
To: MVAPICH2

> Hi All,
> I am using mvapich2-1.0.1 over udapl stack.
> I am getting DAT_DTO_ERR_TRANSPORT error at udapl level, but mpi application is not terminating with some error
> as i browse through the code i observe following thing.
>
> ret1 = dat_evd_dequeue (MPIDI_CH3I_RDMA_Process.cq_hndl[i], &event);
> if (ret1 == DAT_SUCCESS) {
>     assert (event.event_number == DAT_DTO_COMPLETION_EVENT);
>     /* but there is no check for event.event_data.dto_completion_event_data.status */
>     . . . .
>     . . . .
> }
>
> but above condition is handled in rdma_udapl_1sc.c file while dequeuing
> what is expected behavior of mpi when udapl throws error like DAT_DTO_ERR_TRANSPORT ?
> How this kind of error going to be handled at mpi level?
> OR
> How underlying udapl errors are reflected by mpi ?
> I am using pallas as an application for testing purpose
> waiting for reply
> thanking
> Nilesh

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080709/2b217c9a/attachment-0001.html

From koop at cse.ohio-state.edu Wed Jul 9 11:17:48 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Wed Jul 9 11:17:55 2008
Subject: [mvapich-discuss] mvapich over OFED *and* over IBGD IB stack
In-Reply-To:
Message-ID:

Mikhail,

MVAPICH has had support for both OpenFabrics and VAPI (IBGD) for quite some time although we are now phasing out support of VAPI since vendors now suggest OFED.

MVAPICH 1.0 has support for both OFED and IBGD. It will require two different compiled versions, however.

Let me know if this doesn't answer your question.

Matt

On Tue, 8 Jul 2008, Mikhail Kuzminsky wrote:
> AFAIK mvapich versions something around 0.9.5 version were "switched"
> from support of IBGD Mellanox IB stack to support of OFED.
>
> But was there some mvapich versions which might work over IBGD *and*
> OFED ? I'm especially interesting for support of IBGD 1.8.0 and
> OFED-1.2 or 1.3.
>
> Mikhail Kuzminsky
> Computer Assistance to Chemical Research Center
> Zelinsky Institute of Organic Chemistry
> Moscow

From koop at cse.ohio-state.edu Wed Jul 9 11:23:36 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Wed Jul 9 11:23:41 2008
Subject: [mvapich-discuss] Couldn't run mpi program with mvapich2-1.2rc1
In-Reply-To: <98078.59087.qm@web94115.mail.in2.yahoo.com>
Message-ID:

Nilesh,

How are you setting the path? Is it in your .bashrc (or shell equivalent) or did you just set it in the current environment?

Also, is mpispawn in the same directory where you are doing the

./mpirun_rsh -np 2 node1 node2 ./mpicode

?

Matt

On Wed, 9 Jul 2008, nilesh awate wrote:
> Hi all, I downloaded mvapich2-1.2rc1 (as it runs without mpd daemons) for a trial.
> i configure it for udapl with prefix ~/mpi_bin_rc1 then set path to ~/mpi_bin_rc1/bin:$PATH (both mpirun_rsh & mpispawn present)
> to run mpi code i executed following command
> ./mpirun_rsh -np 2 node1 node2 ./mpicode
> (ssh wo passwd is enabled with nfs share)
> first i face foll. error
> /usr/bin/env mpispawn: no such file
> then i search on mailing list, there was a reply from Karl that try to run from installed path of mvapich bin directory because "mpispawn is being invoked without execvp"
> i tried that, then i got following error
> Child exited abnormally!
> cleanupKilling remote processes...DONE
> then i saw the output of ps -eaf on both the node
> i observe mpispawn was running on remote node & "/usr/bin/ssh -q sun00 cd /home/nilesha; /usr/bin/env LD_LIBRARY_PATH=/usr/mvapich/lib/share" process was hanging on executing node
> I don't know what is missing from my side ?
> please tell me if any thing more i need to do,
> waiting for reply,
> Nilesh

From chai.15 at osu.edu Wed Jul 9 15:01:47 2008
From: chai.15 at osu.edu (Lei Chai)
Date: Wed Jul 9 15:01:50 2008
Subject: [mvapich-discuss] dat_evd_dequeue erroneous condition is not handled
In-Reply-To: <243435.83777.qm@web94111.mail.in2.yahoo.com>
References: <243435.83777.qm@web94111.mail.in2.yahoo.com>
Message-ID: <48750B1B.2080806@osu.edu>

Hi Nilesh,

Thanks for the patch. It has been applied to the latest mvapich2 svn trunk with minor enhancement.

Lei

nilesh awate wrote:
> Hi lei,
>
> i have created a small patch which take care of transport error;
> abort the mpi appliaction
> and come out of it.
> i have tried it on mvapich2-1.0.1 & mvapich2-1.0.3
>
> here is the patch
>
> --- orig_mvapich2-1.0.1/src/mpid/osu_ch3/channels/mrail/src/udapl/udapl_channel_manager.c 2007-09-06 02:14:15.000000000 +0530
> +++ mvapich2-1.0.1_patched/src/mpid/osu_ch3/channels/mrail/src/udapl/udapl_channel_manager.c 2008-07-02 15:30:45.000000000 +0530
> @@ -455,6 +455,8 @@
>      int i, j, needed;
>      static int last_poll = 0;
>      int type = T_CHANNEL_NO_ARRIVE;
> +    int rank;
> +    PMI_Get_rank(&rank);
>
>      *vbuf_handle = NULL;
>      for (i = last_poll, j = 0;
> @@ -467,6 +469,16 @@
>          {
>              DEBUG_PRINT ("[poll cq]: get complete queue entry\n");
>              assert (event.event_number == DAT_DTO_COMPLETION_EVENT);
> +
> +            /* Following is the patch to come out in case of fatal error like
> +               DAT_DTO_ERR_TRANSPORT (occures when network disfunction) */
> +
> +            if (event.event_data.dto_completion_event_data.status != DAT_DTO_SUCCESS)
> +            {
> +                udapl_error_abort(UDAPL_STATUS_ERR,"[%d]DAT_EVD_ERROR in Consume_signals %x \n",rank,
> +                                  event.event_data.dto_completion_event_data.status);
> +            }
> +
>              sc = ((struct vbuf *) event.event_data.
>                    dto_completion_event_data.user_cookie.as_ptr)->desc;
>              v = (vbuf *) ((aint_t) sc.cookie.as_ptr);
>
> regards
>
> Nilesh
>
> ----- Original Message ----
> From: LEI CHAI
> To: nilesh awate
> Cc: MVAPICH2
> Sent: Wednesday, 18 June, 2008 2:27:32 AM
> Subject: Re: [mvapich-discuss] dat_evd_dequeue erroneous condition is not handled
>
> Hi,
>
> We have never got the DAT_DTO_ERR_TRANSPORT error before. This error
> usually means the network has problem and is not functional well. I
> think a proper way to handle it is to report the error and abort the
> mpi program since it is kind of a fatal error.
>
> Lei
>
> ----- Original Message -----
> From: nilesh awate
> Date: Tuesday, June 17, 2008 10:58 am
> Subject: [mvapich-discuss] dat_evd_dequeue erroneous condition is not handled
> To: MVAPICH2
>
> > Hi All,
> >
> > I am using mvapich2-1.0.1 over udapl stack.
> >
> > I am getting DAT_DTO_ERR_TRANSPORT error at udapl level, but mpi application is not terminating with some error
> >
> > as i browse through the code i observe following thing.
> >
> > ret1 = dat_evd_dequeue (MPIDI_CH3I_RDMA_Process.cq_hndl[i], &event);
> > if (ret1 == DAT_SUCCESS)
> > {
> >     assert (event.event_number == DAT_DTO_COMPLETION_EVENT);
> >     /* but there is no check for event.event_data.dto_completion_event_data.status */
> >     . . . .
> >     . . . .
> > }
> >
> > but above condition is handled in rdma_udapl_1sc.c file while dequeuing
> >
> > what is expected behavior of mpi when udapl throws error like DAT_DTO_ERR_TRANSPORT ?
> >
> > How this kind of error going to be handled at mpi level?
> > OR
> > How underlying udapl errors are reflected by mpi ?
> >
> > I am using pallas as an application for testing purpose
> >
> > waiting for reply
> > thanking
> > Nilesh

From yogyas at gmail.com Sun Jul 13 07:22:37 2008
From: yogyas at gmail.com (yogeshwar sonawane)
Date: Sun Jul 13 07:22:46 2008
Subject: [mvapich-discuss] HPL with mvapich2-1.0.1 issue.
Message-ID:

Hi all,

I am using mvapich2-1.0.1 with uDAPL as the device configured with default settings like shared memory support.
I am running HPL compiled with these mpi binaries. HPL version 1.0a downloaded from www.netlib.org is used. ATLAS-3.8.1, which is required for HPL, is used. I am using udapl from OFED-1.2 on an IB card.

Now, when I run HPL with 16 processes on a single node (quad-core, quad-socket, with 64 GB of RAM), the machine gets stuck/hangs after the first HPL reading. This is not a kernel-panic condition.

I made some observations. The HPL problem sizes are for 65 %, 70 % & 75 % of total memory with multiple NB values. When HPL is fired, everything is smooth; around 14 GB out of 64 GB is free. After the first reading with some combination is displayed, memory usage increases to full. Then swap space is also consumed to nearly full. This happens very quickly. Then the machine becomes unresponsive to commands, but it still responds to ping from other nodes.

After around 2 hrs, HPL exited with a "caused collective abort of all ranks exit status of rank 14: killed by signal 9" error. There were kernel messages "out of memory, killing xhpl..."

Multiple runs with different N have shown similar behaviour. One point to note is that the problem starts only after the first reading.

I tried providing an HPL.dat which produces only a single reading. That run was successful. Multiple such HPL runs, each producing only a single reading/combination, were done. All were successful. The problem seems to be there when an HPL.dat with multiple combinations/readings is used.

I did the same HPL run with MVAPICH2-1.0.1 compiled for TCP/IP, again on a single node. This run was successful, with all readings displayed, no swap usage & normal closure of HPL.

Can anybody help me to solve the issue ? Any links or references are welcome.

I am not sure whether this list is the correct one for an HPL-related query. So, kindly guide me on this also.

Thanks,
Yogeshwar

From noam.bernstein at nrl.navy.mil Wed Jul 16 10:14:27 2008
From: noam.bernstein at nrl.navy.mil (Noam Bernstein)
Date: Wed Jul 16 10:13:34 2008
Subject: [mvapich-discuss] mvapich 1 vs. mvapich 2 performance
Message-ID:

Should I be surprised at this gap in bandwidth between mvapich 1 and mvapich 2 (OSU benchmarks 3.0, osu_bibw)? The mpi1 version is quite close to the expected maximum for IB (8 Gb/s each way), but mpi2 is 25% lower.

Our cluster uses dual processor single core Opterons, Mellanox Infiniband HCAs with OFED 1.2.5.1, only 1 processor on each node in use.

Below, mpi1 is
  mvapich 1.0.1 compiled with make.mvapich.gen2
mpi2 is
  mvapich2 1.0.3 compiled with make.mvapich2.ofa

No other flags at compile or run time, everything compiled with gcc.

thanks,
Noam

bibw.mpi1.stdout:Warning: no access to tty (Bad file descriptor).
bibw.mpi1.stdout:Thus no job control in this shell.
bibw.mpi1.stdout:orig machines
bibw.mpi1.stdout:edited machines
bibw.mpi1.stdout:# OSU MPI Bi-Directional Bandwidth Test v3.0
bibw.mpi1.stdout:# Size Bi-Bandwidth (MB/s)
bibw.mpi1.stdout:1 1.51
bibw.mpi1.stdout:2 3.12
bibw.mpi1.stdout:4 6.15
bibw.mpi1.stdout:8 11.83
bibw.mpi1.stdout:16 23.46
bibw.mpi1.stdout:32 41.45
bibw.mpi1.stdout:64 81.72
bibw.mpi1.stdout:128 156.60
bibw.mpi1.stdout:256 264.41
bibw.mpi1.stdout:512 423.20
bibw.mpi1.stdout:1024 604.07
bibw.mpi1.stdout:2048 772.51
bibw.mpi1.stdout:4096 883.79
bibw.mpi1.stdout:8192 1029.38
bibw.mpi1.stdout:16384 1469.52
bibw.mpi1.stdout:32768 1666.29
bibw.mpi1.stdout:65536 1784.16
bibw.mpi1.stdout:131072 1685.49
bibw.mpi1.stdout:262144 1883.22
bibw.mpi1.stdout:524288 1901.34
bibw.mpi1.stdout:1048576 1910.08
bibw.mpi1.stdout:2097152 1917.89
bibw.mpi1.stdout:4194304 1919.68

bibw.mpi2.stdout:Warning: no access to tty (Bad file descriptor).
bibw.mpi2.stdout:Thus no job control in this shell.
bibw.mpi2.stdout:orig machines
bibw.mpi2.stdout:edited machines
bibw.mpi2.stdout:# OSU MPI Bi-Directional Bandwidth Test v3.0
bibw.mpi2.stdout:# Size Bi-Bandwidth (MB/s)
bibw.mpi2.stdout:1 1.10
bibw.mpi2.stdout:2 2.21
bibw.mpi2.stdout:4 4.04
bibw.mpi2.stdout:8 8.33
bibw.mpi2.stdout:16 16.07
bibw.mpi2.stdout:32 30.32
bibw.mpi2.stdout:64 62.31
bibw.mpi2.stdout:128 121.45
bibw.mpi2.stdout:256 216.58
bibw.mpi2.stdout:512 373.28
bibw.mpi2.stdout:1024 568.49
bibw.mpi2.stdout:2048 739.37
bibw.mpi2.stdout:4096 878.16
bibw.mpi2.stdout:8192 889.26
bibw.mpi2.stdout:16384 1079.31
bibw.mpi2.stdout:32768 1164.42
bibw.mpi2.stdout:65536 1226.60
bibw.mpi2.stdout:131072 1227.85
bibw.mpi2.stdout:262144 1265.47
bibw.mpi2.stdout:524288 1262.38
bibw.mpi2.stdout:1048576 1747.40
bibw.mpi2.stdout:2097152 1582.24
bibw.mpi2.stdout:4194304 1543.45

From huanwei at cse.ohio-state.edu Wed Jul 16 12:26:07 2008
From: huanwei at cse.ohio-state.edu (wei huang)
Date: Wed Jul 16 12:26:15 2008
Subject: [mvapich-discuss] mvapich 1 vs. mvapich 2 performance
In-Reply-To:
Message-ID:

Hi Noam,

mvapich and mvapich2 should have very close performance and we have never seen such a difference between the peak bandwidths reported by the OSU benchmarks.

May I ask what HCA you are using on your systems? And are there multiple HCAs on each node?

CPU affinity can also play a role here. Can you manually set CPU mappings? You can do that by setting environment variables:

mvapich1:

mpirun_rsh -np 2 h1 h2 VIADEV_CPU_MAPPING=0 ./a.out

(for detail, see http://mvapich.cse.ohio-state.edu/support/mvapich_user_guide.html#x1-1440009.6.6)

mvapich2-1.0.3 does not support manual mapping. You can switch VIADEV_CPU_MAPPING between 0 and 1 above and see if CPU mapping is playing a role here. However, we just released mvapich2-1.2rc1, which will support cpu mappings. We suggest you try this version as well.
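
A minimal sketch of the experiment suggested above, assuming Python 3 is available on the launch node. It only prints one mpirun_rsh command line per CPU core so the runs can be compared side by side; it does not launch anything. The variable name VIADEV_CPU_MAPPING and the argument order are taken from the message above; the host names h1 and h2, the ./osu_bibw path, and the helper name mapping_commands are placeholders chosen for illustration.

```python
def mapping_commands(hosts, binary, env_var="VIADEV_CPU_MAPPING", cores=(0, 1)):
    """Return one mpirun_rsh command line per core to pin the processes to.

    hosts, binary and this helper itself are illustrative placeholders;
    only the env_var name and the argument order follow the message above.
    """
    commands = []
    for core in cores:
        argv = ["mpirun_rsh", "-np", str(len(hosts)), *hosts,
                f"{env_var}={core}", binary]
        commands.append(" ".join(argv))
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of launching anything.
    for line in mapping_commands(["h1", "h2"], "./osu_bibw"):
        print(line)
```

Running each printed command by hand and comparing the peak bandwidth it reports would show whether the mapping matters on a given node. For mvapich2-1.2 the same helper could presumably be called with env_var="MV2_CPU_MAPPING".
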

If you use mvapich2-1.2, then you can set the mapping as follows (this version supports mpirun_rsh startup like mvapich1):

mpirun_rsh -np 2 h1 h2 MV2_CPU_MAPPING=0 ./a.out

(http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.2rc1.html#x1-320006.8)

Hope this helps.

-- Wei

> ---------- Forwarded message ----------
> Date: Wed, 16 Jul 2008 10:14:27 -0400
> From: Noam Bernstein
> To: mvapich-discuss@cse.ohio-state.edu
> Subject: [mvapich-discuss] mvapich 1 vs. mvapich 2 performance
>
> Should I be surprised as this gap in bandwidth between mvapich 1 and
> mvapich 2
> (OSU benchmarks 3.0, osu_bibw)? mpi1 version is quite close to
> expected maximum for IB (8 Gb/s each way), but mpi2 is 25% lower.
>
> Our cluster uses dual processor single core Opterons, Mellanox
> Infiniband
> HCAs with OFED 1.2.5.1, only 1 processor on each node in use.
>
> Below, mpi1 is
> mvapich 1.0.1 compiled with make.mvapich.gen2
> mpi2 is
> mvapich2 1.0.3 compiled with make.mvapich2.ofa
>
> No other flags at compile or run time, everything compiled with gcc.
>
> thanks,
> Noam
>
> bibw.mpi1.stdout:Warning: no access to tty (Bad file descriptor).
> bibw.mpi1.stdout:Thus no job control in this shell.
> bibw.mpi1.stdout:orig machines > bibw.mpi1.stdout:edited machines > bibw.mpi1.stdout:# OSU MPI Bi-Directional Bandwidth Test v3.0 > bibw.mpi1.stdout:# Size Bi-Bandwidth (MB/s) > bibw.mpi1.stdout:1 1.51 > bibw.mpi1.stdout:2 3.12 > bibw.mpi1.stdout:4 6.15 > bibw.mpi1.stdout:8 11.83 > bibw.mpi1.stdout:16 23.46 > bibw.mpi1.stdout:32 41.45 > bibw.mpi1.stdout:64 81.72 > bibw.mpi1.stdout:128 156.60 > bibw.mpi1.stdout:256 264.41 > bibw.mpi1.stdout:512 423.20 > bibw.mpi1.stdout:1024 604.07 > bibw.mpi1.stdout:2048 772.51 > bibw.mpi1.stdout:4096 883.79 > bibw.mpi1.stdout:8192 1029.38 > bibw.mpi1.stdout:16384 1469.52 > bibw.mpi1.stdout:32768 1666.29 > bibw.mpi1.stdout:65536 1784.16 > bibw.mpi1.stdout:131072 1685.49 > bibw.mpi1.stdout:262144 1883.22 > bibw.mpi1.stdout:524288 1901.34 > bibw.mpi1.stdout:1048576 1910.08 > bibw.mpi1.stdout:2097152 1917.89 > bibw.mpi1.stdout:4194304 1919.68 > > bibw.mpi2.stdout:Warning: no access to tty (Bad file descriptor). > bibw.mpi2.stdout:Thus no job control in this shell. 
> bibw.mpi2.stdout:orig machines > bibw.mpi2.stdout:edited machines > bibw.mpi2.stdout:# OSU MPI Bi-Directional Bandwidth Test v3.0 > bibw.mpi2.stdout:# Size Bi-Bandwidth (MB/s) > bibw.mpi2.stdout:1 1.10 > bibw.mpi2.stdout:2 2.21 > bibw.mpi2.stdout:4 4.04 > bibw.mpi2.stdout:8 8.33 > bibw.mpi2.stdout:16 16.07 > bibw.mpi2.stdout:32 30.32 > bibw.mpi2.stdout:64 62.31 > bibw.mpi2.stdout:128 121.45 > bibw.mpi2.stdout:256 216.58 > bibw.mpi2.stdout:512 373.28 > bibw.mpi2.stdout:1024 568.49 > bibw.mpi2.stdout:2048 739.37 > bibw.mpi2.stdout:4096 878.16 > bibw.mpi2.stdout:8192 889.26 > bibw.mpi2.stdout:16384 1079.31 > bibw.mpi2.stdout:32768 1164.42 > bibw.mpi2.stdout:65536 1226.60 > bibw.mpi2.stdout:131072 1227.85 > bibw.mpi2.stdout:262144 1265.47 > bibw.mpi2.stdout:524288 1262.38 > bibw.mpi2.stdout:1048576 1747.40 > bibw.mpi2.stdout:2097152 1582.24 > bibw.mpi2.stdout:4194304 1543.45 > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From thakur at mcs.anl.gov Thu Jul 17 13:23:59 2008 From: thakur at mcs.anl.gov (Rajeev Thakur) Date: Thu Jul 17 13:24:10 2008 Subject: [mvapich-discuss] FW: Proposed patches to MVAPICH and MVAPICH2 rpm spec files Message-ID: <002a01c8e831$e6b98aa0$860add8c@mcs.anl.gov> -----Original Message----- From: owner-mpich2-dev@mcs.anl.gov [mailto:owner-mpich2-dev@mcs.anl.gov] Sent: Thursday, July 17, 2008 12:14 PM To: owner-mpich2-dev@mcs.anl.gov Subject: BOUNCE mpich2-dev@mcs.anl.gov: Non-member submission from ["Mike Heinz" ] >From owner-mpich2-dev@mcs.anl.gov Thu Jul 17 12:14:11 2008 Received: from mailgw.mcs.anl.gov (mailgw.mcs.anl.gov [140.221.9.4]) by mcs.anl.gov (8.11.6/8.9.3) with ESMTP id m6HHE8l21800 for ; Thu, 17 Jul 2008 12:14:11 -0500 Received: from localhost (localhost [127.0.0.1]) by mailgw.mcs.anl.gov (Postfix) with ESMTP id C7E60348004 for ; Thu, 17 Jul 2008 12:14:08 -0500 (CDT) X-Greylist: delayed 60 
seconds by postgrey-1.21 at mailgw.mcs.anl.gov; Thu, 17 Jul 2008 12:14:06 CDT
Received: from EPEXCH1.qlogic.org (eppat.qlogic.com [198.186.5.11]) by mailgw.mcs.anl.gov (Postfix) with ESMTP id BEA6E348002 for ; Thu, 17 Jul 2008 12:14:06 -0500 (CDT)
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C8E830.6055FDD4"
Subject: Proposed patches to MVAPICH and MVAPICH2 rpm spec files
Date: Thu, 17 Jul 2008 12:13:03 -0500
Message-ID:
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: Proposed patches to MVAPICH and MVAPICH2 rpm spec files
Thread-Index: AcjoMF88YEzoPuNcSj6yftFO624GxA==
From: "Mike Heinz"
To: ,
X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at mailgw.mcs.anl.gov
X-Spam-Status: No, hits=0.2 required=5.0 tests=HTML_30_40,HTML_MESSAGE,PATCH_UNIFIED_DIFF version=2.55
X-Spam-Level:
X-Spam-Checker-Version: SpamAssassin 2.55 (1.174.2.19-2003-05-19-exp)
X-MCS-Mail-Loop: mpich2-dev

This is a multi-part message in MIME format.

------_=_NextPart_001_01C8E830.6055FDD4
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

I'm not sure who the best person is to receive these changes: We've been encountering complications when converting users to OFED 1.3 because the scripts provided for configuring the shell (mpivars.sh and mpivars.csh) don't update the library path. This can lead to MPI programs failing to link or failing to run. The fix is to modify the spec files for the RPMs for these packages so that they set the LD_LIBRARY_PATH as well as the PATH.

The fix for MVAPICH-1.0.1 is this:

--- mvapich.spec.orig 2008-07-16 17:06:44.000000000 -0400
+++ mvapich.spec 2008-07-16 16:49:27.000000000 -0400
@@ -300,17 +300,25 @@
 if ! echo \${PATH} | grep -q %{_prefix}/bin ; then
 export PATH=%{_prefix}/bin:\${PATH}
 fi
+if ! echo \${LD_LIBRARY_PATH} | grep -q %{_prefix}/lib ; then
+ export LD_LIBRARY_PATH=%{_prefix}/lib:%{_prefix}/lib:/shared:\${LD_LIBRARY_PATH}
+fi
 EOF

 # Script for csh
 cat < %{build_root}/%{_prefix}/bin/%{shell_scripts_basename}.csh
-if (?$path) then
- if ( "\${path}" !~ *%{_prefix}/bin* ) then
- setenv path %{_prefix}/bin:\$path
+if ("\$path" !~ *%{_prefix}/bin) then
+ set path=(%{_prefix}/bin \$path)
+endif
+
+if ("1" == "\$?LD_LIBRARY_PATH") then
+ if ("\$LD_LIBRARY_PATH" !~ *%{_prefix}/lib) then
+ setenv LD_LIBRARY_PATH %{_prefix}/lib:%{_prefix}/lib/shared:\${LD_LIBRARY_PATH}
 endif
 else
- setenv path %{_prefix}/bin:
+ setenv LD_LIBRARY_PATH %{_prefix}/lib:%{_prefix}/lib/shared
 endif
+
 EOF

and the fix for MVAPICH2-1.0.3 is this:

--- ../mvapich2.spec.orig 2008-07-16 17:17:10.000000000 -0400
+++ mvapich2.spec 2008-07-17 09:03:19.000000000 -0400
@@ -261,12 +261,16 @@

 # Additionally, create the mpivars.[c]sh files.
 cat >bin/mpivars.csh <

Message-ID:

Hi Mike,

Thanks for posting these patches. In the future, please feel free to post patches related to MVAPICH and MVAPICH2 to the mvapich-discuss list (cc'ed in this e-mail).

Pasha (cc'ed here) will take care of the changes to the MVAPICH rpm spec file. Jonathan (cc'ed here) will take care of the changes to the MVAPICH2 rpm spec file.

Thanks,

DK

On Thu, 17 Jul 2008, Mike Heinz wrote:

> I'm not sure who the best person is to receive these changes: We've been
> encountering complications when converting users to OFED 1.3 because the
> scripts provided for configuring the shell (mpivars.sh and mpivars.csh)
> don't update the library path. This can lead to MPI programs failing to
> link or failing to run. The fix is to modify the spec files for the RPMs
> for these packages so that they set the LD_LIBRARY_PATH as well as the
> PATH.
> > The fix for MVAPICH-1.0.1 is this: > > --- mvapich.spec.orig 2008-07-16 17:06:44.000000000 -0400 > +++ mvapich.spec 2008-07-16 16:49:27.000000000 -0400 > @@ -300,17 +300,25 @@ > if ! echo \${PATH} | grep -q %{_prefix}/bin ; then > export PATH=%{_prefix}/bin:\${PATH} > fi > +if ! echo \${LD_LIBRARY_PATH} | grep -q %{_prefix}/lib ; then > + export > LD_LIBRARY_PATH=%{_prefix}/lib:%{_prefix}/lib:/shared:\${LD_LIBRARY_PATH > } > +fi > EOF > > # Script for csh > cat < %{build_root}/%{_prefix}/bin/%{shell_scripts_basename}.csh > -if (?$path) then > - if ( "\${path}" !~ *%{_prefix}/bin* ) then > - setenv path %{_prefix}/bin:\$path > +if ("\$path" !~ *%{_prefix}/bin) then > + set path=(%{_prefix}/bin \$path) > +endif > + > +if ("1" == "\$?LD_LIBRARY_PATH") then > + if ("\$LD_LIBRARY_PATH" !~ *%{_prefix}/lib) then > + setenv LD_LIBRARY_PATH > %{_prefix}/lib:%{_prefix}/lib/shared:\${LD_LIBRARY_PATH} > endif > else > - setenv path %{_prefix}/bin: > + setenv LD_LIBRARY_PATH %{_prefix}/lib:%{_prefix}/lib/shared > endif > + > EOF > > > and the fix for MVAPICH2-1.0.3 is this: > > > --- ../mvapich2.spec.orig 2008-07-16 17:17:10.000000000 -0400 > +++ mvapich2.spec 2008-07-17 09:03:19.000000000 -0400 > @@ -261,12 +261,16 @@ > > # Additionally, create the mpivars.[c]sh files. > cat >bin/mpivars.csh < -if (\$?path) then > - if ( "\${path}" !~ *%{_prefix}/bin* ) then > +if ("\$path" !~ *%{_prefix}/bin) then > set path = ( %{_prefix}/bin \$path ) > endif > + > +if ("1" == "\$?LD_LIBRARY_PATH") then > + if ("\$LD_LIBRARY_PATH" !~ *%{_prefix}/lib) then > + setenv LD_LIBRARY_PATH %{_prefix}/lib:\${LD_LIBRARY_PATH} > + endif > else > - set path = ( %{_prefix}/bin ) > + setenv LD_LIBRARY_PATH %{_prefix}/lib:%{_prefix}/lib/shared > endif > > if (\$?MANPATH) then > @@ -282,7 +286,9 @@ > if ! echo \${PATH} | grep -q %{_prefix}/bin ; then > PATH=%{_prefix}/bin:\${PATH} > fi > - > +if ! 
echo \${LD_LIBRARY_PATH} | grep -q %{_prefix}/lib ; then
> + export LD_LIBRARY_PATH=%{_prefix}/lib:\${LD_LIBRARY_PATH}
> +fi
> if ! echo \${MANPATH} | grep -q %{_prefix}/man ; then
> MANPATH=%{_prefix}/man:\${MANPATH}
> fi
>
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
>
>

From David_Kewley at Dell.com Thu Jul 17 21:47:46 2008
From: David_Kewley at Dell.com (David_Kewley@Dell.com)
Date: Thu Jul 17 21:49:00 2008
Subject: [mvapich-discuss] uninitialized struct member leading to MVAPICH 1.0 segfault?
Message-ID:

I have an MVAPICH 1.0 program segfaulting, and I think I may have traced it back to MVAPICH's failure to initialize a struct member before using it. We are testing a speculative fix right now. The full story follows; let me know what you think.

The shmem_comm_rank member of struct MPIR_COMMUNICATOR is set in only one place, as far as I can see:

src/context/create_2level_comm.c:
100 void create_2level_comm (struct MPIR_COMMUNICATOR* comm_ptr, int size, int my_rank){
...
208     if (shmem_comm_count < shmem_coll_blocks){
209         shmem_ptr->shmem_comm_rank = shmem_comm_count;
210         input_flag = 1;
211     }
212     else{
213         input_flag = 0;
214     }
...
277 }

Note that shmem_comm_rank is set only if the condition holds; if the condition does not hold, then the value of shmem_comm_rank is whatever happened to be in memory at that point.

So, what might that value be? As best I can figure out, memory for a struct MPIR_COMMUNICATOR is always allocated using malloc(). My manpage for malloc says that malloc() does not clear the memory it allocates, which I take to mean it does not set the memory contents to zero, but simply leaves them as they were. So if malloc() chooses to allocate memory which was previously free()'d, then the memory handed to the requester may have inappropriate, nonzero data in it.
I do not know for sure what happens if the memory happens to be freshly granted by the kernel, but I suspect in this case it is guaranteed to be zeroed by the kernel.

So... If the condition (shmem_comm_count < shmem_coll_blocks) does not hold, then shmem_comm_rank is not initialized. If it is later referenced, its value is meaningless and may lead to an error. I believe that is what is happening to us; the major unknown at this point is whether we are in fact hitting the "else" part of the above clause. I'd love your comments about what is likely the case, and how we can tell without doing a printf() or similar. :)

Eventually we see a segfault in free_2level_comm():

src/context/create_2level_comm.c:
 62 void free_2level_comm (struct MPIR_COMMUNICATOR* comm_ptr)
 63 {
...
 87     if (comm_ptr->shmem_comm != MPI_COMM_NULL) {
 88         struct MPIR_COMMUNICATOR* shmem_ptr;
 89         shmem_ptr= MPIR_GET_COMM_PTR(comm_ptr->shmem_comm);
 90         pthread_spin_lock(&shmem_coll->shmem_coll_lock);
 91         shmem_coll_obj.shmem_avail[shmem_ptr->shmem_comm_rank] = 1;
 92         pthread_spin_unlock(&shmem_coll->shmem_coll_lock);
 93         MPI_Comm_free(&(comm_ptr->shmem_comm));
 94     }
...
 98 }

The segfault happens at line 91, because it appears that shmem_ptr->shmem_comm_rank is a large negative number. I suspect in fact shmem_comm_rank was never initialized (see above), which means the negative number is an "accidental" value [1].

We only see this segfault in around 1 out of 20 runs of a particular application. I suspect the ~1/20 hit rate is simply accidents of how memory gets allocated in each run. Sometimes shmem_ptr->shmem_comm_rank probably happens to sit in a memory location that contains 0, so the above code does not cause a segfault.

I suspect the fact that we've only noticed this in one code may be an accident; I do not assume it is significant. We may not have visibility to whether other codes are hitting this segfault mechanism.

Do you agree that this failure to initialize shmem_comm_rank is a bug?
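The failure mode described above can be illustrated with a minimal C sketch. It is not MVAPICH code: the struct, the block count, the sentinel value and every sketch_* name are hypothetical stand-ins, and the 0xAA fill only imitates malloc() handing back memory that still holds stale data. The bounds check in sketch_release() is an assumed way of making an unset rank detectable, not something taken from the MVAPICH sources.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_BLOCKS 4          /* hypothetical stand-in for shmem_coll_blocks */
#define SKETCH_RANK_UNSET (-1)   /* hypothetical sentinel, not from MVAPICH */

/* Hypothetical, cut-down stand-in for struct MPIR_COMMUNICATOR. */
struct sketch_comm {
    int shmem_comm_rank;
};

/* Stand-in for the shmem_avail[] array indexed at line 91. */
int sketch_avail[SKETCH_BLOCKS];

/* Allocate a communicator and fill it with a non-zero byte pattern, to
 * imitate malloc() returning memory that was previously free()'d. With
 * the usual 32-bit two's-complement int, the member then reads back as
 * a large negative number. */
struct sketch_comm *sketch_alloc_dirty(void)
{
    struct sketch_comm *c = malloc(sizeof *c);
    if (c != NULL)
        memset(c, 0xAA, sizeof *c);
    return c;
}

/* Mirrors the if/else quoted from create_2level_comm(). When init_else
 * is non-zero, the else branch also assigns the member; when it is zero,
 * the member keeps whatever bytes were already there. Returns the value
 * the original code stores in input_flag. */
int sketch_create(struct sketch_comm *c, int comm_count, int init_else)
{
    if (comm_count < SKETCH_BLOCKS) {
        c->shmem_comm_rank = comm_count;
        return 1;
    }
    if (init_else)
        c->shmem_comm_rank = SKETCH_RANK_UNSET;
    return 0;
}

/* Guarded version of the array update: refuses to index with a rank that
 * is outside the table. Returns 1 if a slot was marked available, else 0. */
int sketch_release(const struct sketch_comm *c)
{
    int r = c->shmem_comm_rank;
    if (r < 0 || r >= SKETCH_BLOCKS)
        return 0;
    sketch_avail[r] = 1;
    return 1;
}
```

Calling sketch_create(c, SKETCH_BLOCKS, 0) on a "dirty" allocation leaves the stale value in place; without the range check in sketch_release(), that value would be used directly as an array index, which is the kind of access that faults at line 91.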
If so, probably the right fix is to add "shmem_ptr->shmem_comm_rank = 0;" to the "else" clause in the first code snippet above. Would you agree? That is the fix we are testing right now. Or should it be done in a structure-initialization operation somehow?

Mind you, I don't know whether it is *semantically* correct to set shmem_comm_rank to 0 by default. I am doing it simply because it replicates the likely common case (~19 out of 20 runs) where the contents of that memory location often just happen to be cleared to zero.

Finding this bug raises a question: How do we guarantee that there are not other unrecognized problems like this one? How do we check for use of uninitialized variables (e.g. structure members) allocated by malloc()? Is it best practice to do a memset(x, 0, sizeof(x))? This is a C-coding best-practices question, and also a question about how MPICH and MVAPICH are coded.

Thanks,
David

[1] On x86_64 an int is 4 bytes and a pointer is 8 bytes. Looking at the contents of the 8 bytes starting at &(shmem_ptr->shmem_comm_rank), they appear to be a valid pointer value similar to other pointer values I see in this core dump. I do not know what this pointer points to (or pointed to in the past). We get shmem_comm_rank interpreted as a large negative number simply because the MSbit of the first four bytes happens to be set.

I think it is incontrovertible that these eight bytes hold a pointer value that was at some point valid. This value could have been written to memory before the *MPIR_COMMUNICATOR was allocated (presumably part of an object that was free()'d). This is the hypothesis I explore above.

It's also possible that this pointer was written to those eight bytes *after* the *MPIR_COMMUNICATOR was created. That is, someone is stomping on our structure. If that is the case, we should still see segfaults after fixing the failure to initialize shmem_comm_rank.
We're doing runs right now in which shmem_comm_rank is also initialized (to 0) in the "else" clause, to check this possibility.

The final possibility is that a legitimate user of this structure is writing this pointer value inappropriately. I think this is very unlikely, assuming this problem is not caused by a compiler bug, because the source code only writes to shmem_comm_rank in one place that I can see, and the code logically can only write an integer value.

Regardless of the outcome of those tests, however, it is definitely a bug not to initialize shmem_comm_rank before it is used, unless I'm missing something.

David Kewley
Dell Infrastructure Consulting Services
Onsite Engineer at the Maui HPC Center
Cell: 602-460-7617
David_Kewley@Dell.com
Dell Services: http://www.dell.com/services/
How am I doing? Email my manager Russell_Kelly@Dell.com with any feedback.

From kernel at tekno-soft.it Fri Jul 18 12:04:57 2008
From: kernel at tekno-soft.it (Roberto Fichera)
Date: Fri Jul 18 12:06:39 2008
Subject: [mvapich-discuss] Races with MPI_THREAD_MULTI
Message-ID: <4880BF29.8010003@tekno-soft.it>

Hi all on the list,

I'm trying to use mvapich2 v1.2rc1 in a multithreaded application, initialized using MPI_THREAD_MULTIPLE. The master application does the following: it starts several threads, depending on the nodes assigned to it, and each thread spawns a slave application on its node using MPI_Comm_spawn(). Before calling MPI_Comm_spawn() I prepare an MPI_Info struct, one for each thread, setting all the keys (host and wdir) needed to get the desired behaviour. As soon as the master application starts, it locks up immediately with 4 nodes, 1 master and 3 slaves. Below you can see the state of the master application at the time of the lockup. It seems stuck in PMIU_readline(), which never returns, so the global lock is never released.
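The symptom reported above, one thread blocked in a read while other threads wait on a mutex, can be reproduced in miniature with plain pthreads. The sketch below is only an illustration of that general mechanism under stated assumptions: it does not use or model MPICH2/MVAPICH2 internals, and the pipe, the lock and all sketch_* names are hypothetical.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical stand-ins; none of this is MPICH2 or MVAPICH2 code. */
static pthread_mutex_t sketch_global_lock = PTHREAD_MUTEX_INITIALIZER;
static int sketch_pipe[2];

/* Worker: takes the global lock, then blocks in read() while still
 * holding it, loosely analogous to a call that waits for a reply from
 * another process inside a critical section. */
static void *sketch_blocking_worker(void *arg)
{
    char byte;
    (void)arg;
    pthread_mutex_lock(&sketch_global_lock);
    while (read(sketch_pipe[0], &byte, 1) < 0 && errno == EINTR)
        ;  /* retry if interrupted by a signal */
    pthread_mutex_unlock(&sketch_global_lock);
    return NULL;
}

/* Returns 1 if the lock was observed busy while the worker waited for
 * input and free again once the input arrived, 0 if not, and -1 on a
 * setup error. */
int sketch_lock_held_across_read(void)
{
    pthread_t tid;
    ssize_t n;
    int rc, saw_busy, free_after = 0;

    if (pipe(sketch_pipe) != 0)
        return -1;
    if (pthread_create(&tid, NULL, sketch_blocking_worker, NULL) != 0) {
        close(sketch_pipe[0]);
        close(sketch_pipe[1]);
        return -1;
    }

    /* Spin until the worker owns the lock. From then on it cannot release
     * the lock before a byte (or end-of-file) arrives on the pipe. */
    while ((rc = pthread_mutex_trylock(&sketch_global_lock)) == 0) {
        pthread_mutex_unlock(&sketch_global_lock);
        sched_yield();
    }
    saw_busy = (rc == EBUSY);

    /* Unblock the worker. If the write fails, closing the write end still
     * delivers end-of-file, so the worker wakes up either way. */
    n = write(sketch_pipe[1], "x", 1);
    (void)n;
    close(sketch_pipe[1]);
    pthread_join(tid, NULL);
    close(sketch_pipe[0]);

    if (pthread_mutex_trylock(&sketch_global_lock) == 0) {
        free_after = 1;
        pthread_mutex_unlock(&sketch_global_lock);
    }
    return saw_busy && free_after;
}
```

If the byte were never written and the pipe never closed, the worker would stay in read() and any thread calling pthread_mutex_lock() on the same mutex would wait indefinitely, which matches the shape of the backtraces that follow.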
MVAPICH2 is compiled with: PKG_PATH=/HRI/External/mvapich2/1.2rc1 ./configure --prefix=$PKG_PATH \ --bindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ --sbindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ --libdir=$PKG_PATH/lib/linux-x86_64-gcc-glibc2.3.4 \ --enable-sharedlibs=gcc \ --enable-f90 \ --enable-threads=multiple \ --enable-g=-ggdb \ --enable-debuginfo \ --with-device=ch3:sock \ --datadir=$PKG_PATH/data \ --with-htmldir=$PKG_PATH/doc/html \ --with-docdir=$PKG_PATH/doc \ LDFLAGS='-Wl,-z,noexecstack' so I'm using the ch3:sock device. -----Thread 2 [Switching to thread 2 (Thread 1115699536 (LWP 29479))]#0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 (gdb) bt #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 --->>#2 0x00000033ca408390 in pthread_mutex_lock () from /lib64/libpthread.so.0 --->>#3 0x00002aaaab382654 in PMPI_Info_set () from /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=, key=0x0, value=0x33ca40ff58 "!\204??\r\206??\030\204??3\206??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\177\205??\177\205??\177\205??\177\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??\033\205??\033\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??"...) 
at ParallelWorker.c:664 #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62ff50) at ParallelWorker.c:719 #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ff50) at ParallelWorker.c:504 #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 -----Thread 3 [Switching to thread 3 (Thread 1105209680 (LWP 29478))]#0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 (gdb) bt #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 --->>#2 0x00000033ca408390 in pthread_mutex_lock () from /lib64/libpthread.so.0 --->>#3 0x00002aaaab382654 in PMPI_Info_set () from /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=, key=0x0, value=0x33ca40ff58 "!\204??\r\206??\030\204??3\206??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\177\205??\177\205??\177\205??\177\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??\033\205??\033\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??"...) 
at ParallelWorker.c:664 #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62f270) at ParallelWorker.c:719 #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62f270) at ParallelWorker.c:504 #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 -----Thread 4 [Switching to thread 4 (Thread 1094719824 (LWP 29477))]#0 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 (gdb) bt #0 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 --->>#1 0x00002aaaab3db84a in PMIU_readline () from /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 --->>#2 0x00002aaaab3d9d37 in PMI_Spawn_multiple () from /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #3 0x00002aaaab333893 in MPIDI_Comm_spawn_multiple () from /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #4 0x00002aaaab38bcf6 in MPID_Comm_spawn_multiple () from /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #5 0x00002aaaab355a10 in PMPI_Comm_spawn () from /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #6 0x00000000004189d8 in ParallelWorker_handleParallel (self=0x62ad40) at ParallelWorker.c:754 #7 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ad40) at ParallelWorker.c:504 #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 I also tried to run against MPICH2 v1.0.7, but here I got a similar scenery which show up after between 1 - 2 hours of execution, see below: ----- thread 2 [Switching to thread 2 (Thread 1094719824 (LWP 1279))]#0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 (gdb) bt #0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 #1 0x00002aaaab5a3d2f in MPIDU_Sock_wait () from 
/home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #2 0x00002aaaab52bdc7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #3 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #4 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #5 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #6 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6358e0) at ParallelWorker.c:819 #7 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6358e0) at ParallelWorker.c:515 #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 ----- thread 3 [Switching to thread 3 (Thread 1084229968 (LWP 1278))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) bt #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x634d20) at ParallelWorker.c:819 #6 
0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x634d20) at ParallelWorker.c:515
#7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0
#8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6

----- thread 4
[Switching to thread 4 (Thread 1115699536 (LWP 1277))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1
#2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1
#3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1
#4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1
#5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6341a0) at ParallelWorker.c:819
#6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6341a0) at ParallelWorker.c:515
#7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0
#8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6

Here thread 2 is in a poll() that never returns, so it never signals the poll() completion, and all the other waiters on the MPIDI_CH3I_Progress() condition never wake up.

Is anyone else seeing the same problem?

Thanks in advance,
Roberto Fichera.

From panda at cse.ohio-state.edu Fri Jul 18 12:39:17 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri Jul 18 12:39:24 2008
Subject: [mvapich-discuss] Races with MPI_THREAD_MULTI
In-Reply-To: <4880BF29.8010003@tekno-soft.it>
Message-ID:

Hi Roberto,

Thanks for your note. You are using the ch3:sock device in MVAPICH2, which is the same as in MPICH2. You are also seeing similar failure scenarios (but in different forms) with MPICH2 1.0.7. I am cc'ing this message to the mpich2 mailing list. One of the MPICH2 developers will be able to help with this issue faster.

Thanks,

DK

On Fri, 18 Jul 2008, Roberto Fichera wrote:

> Hi All on the list,
>
> I'm trying to use mvapich2 v1.2rc1 in a multithreaded application,
> initialize using MPI_THREAD_MULTI.
> I've the master application doing the following thing, start several
> thread depending by the assigned nodes,
> on each node a slave application is spawned using the MPI_Comm_spawn().
> Before to call the
> MPI_Comm_spawn() I prepare the given MPI_Info struct, one for each
> thread, in order to set the all keys
> (host and wdir) for addressing the wanted behaviour. So, as sooner as
> the master application starts, it races
> immediately with 4 nodes, 1 master and 3 slaves. Below you can see the
> status of the master application at race
> time. It seems stuck on the PMIU_readline() which never returns so the
> global lock is never released.
MVAPICH2 > is compiled with: > > PKG_PATH=/HRI/External/mvapich2/1.2rc1 > > ./configure --prefix=$PKG_PATH \ > --bindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ > --sbindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ > --libdir=$PKG_PATH/lib/linux-x86_64-gcc-glibc2.3.4 \ > --enable-sharedlibs=gcc \ > --enable-f90 \ > --enable-threads=multiple \ > --enable-g=-ggdb \ > --enable-debuginfo \ > --with-device=ch3:sock \ > --datadir=$PKG_PATH/data \ > --with-htmldir=$PKG_PATH/doc/html \ > --with-docdir=$PKG_PATH/doc \ > LDFLAGS='-Wl,-z,noexecstack' > > so I'm using the ch3:sock device. > > -----Thread 2 > [Switching to thread 2 (Thread 1115699536 (LWP 29479))]#0 > 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > (gdb) bt > #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 > --->>#2 0x00000033ca408390 in pthread_mutex_lock () from > /lib64/libpthread.so.0 > --->>#3 0x00002aaaab382654 in PMPI_Info_set () from > /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self= optimized out>, key=0x0, value=0x33ca40ff58 > "!\204ÿÿ\r\206ÿÿ\030\204ÿÿ3\206ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\177\205ÿÿ\177\205ÿÿ\177\205ÿÿ\177\205ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\033\205ÿÿ\033\205ÿÿ\033\205ÿÿ\033\205ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\033\205ÿÿ\033\205ÿÿ"...) 
> at ParallelWorker.c:664 > #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62ff50) > at ParallelWorker.c:719 > #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ff50) at > ParallelWorker.c:504 > #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > > -----Thread 3 > [Switching to thread 3 (Thread 1105209680 (LWP 29478))]#0 > 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > (gdb) bt > #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 > --->>#2 0x00000033ca408390 in pthread_mutex_lock () from > /lib64/libpthread.so.0 > --->>#3 0x00002aaaab382654 in PMPI_Info_set () from > /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self= optimized out>, key=0x0, value=0x33ca40ff58 > "!\204ÿÿ\r\206ÿÿ\030\204ÿÿ3\206ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\177\205ÿÿ\177\205ÿÿ\177\205ÿÿ\177\205ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\033\205ÿÿ\033\205ÿÿ\033\205ÿÿ\033\205ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\033\205ÿÿ\033\205ÿÿ"...) 
> at ParallelWorker.c:664 > #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62f270) > at ParallelWorker.c:719 > #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62f270) at > ParallelWorker.c:504 > #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > > -----Thread 4 > [Switching to thread 4 (Thread 1094719824 (LWP 29477))]#0 > 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 > (gdb) bt > #0 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 > --->>#1 0x00002aaaab3db84a in PMIU_readline () from > /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > --->>#2 0x00002aaaab3d9d37 in PMI_Spawn_multiple () from > /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #3 0x00002aaaab333893 in MPIDI_Comm_spawn_multiple () from > /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #4 0x00002aaaab38bcf6 in MPID_Comm_spawn_multiple () from > /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #5 0x00002aaaab355a10 in PMPI_Comm_spawn () from > /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #6 0x00000000004189d8 in ParallelWorker_handleParallel (self=0x62ad40) > at ParallelWorker.c:754 > #7 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ad40) at > ParallelWorker.c:504 > #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > > I also tried to run against MPICH2 v1.0.7, but here I got a similar > scenery which show up after between 1 - 2 hours of execution, > see below: > > ----- thread 2 > [Switching to thread 2 (Thread 1094719824 (LWP 1279))]#0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 > (gdb) bt > #0 0x00000033c94cbd66 in poll () 
from /lib64/libc.so.6 > #1 0x00002aaaab5a3d2f in MPIDU_Sock_wait () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #2 0x00002aaaab52bdc7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #3 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #4 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #5 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #6 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6358e0) at ParallelWorker.c:819 > #7 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6358e0) at ParallelWorker.c:515 > #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > > ----- thread 3 > [Switching to thread 3 (Thread 1084229968 (LWP 1278))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > (gdb) bt > #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 
> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x634d20) at ParallelWorker.c:819 > #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x634d20) at ParallelWorker.c:515 > #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > > > ----- thread 4 > [Switching to thread 4 (Thread 1115699536 (LWP 1277))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > (gdb) bt > #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6341a0) at ParallelWorker.c:819 > #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6341a0) at ParallelWorker.c:515 > #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > > where the thread 2 is poll()ing never never returns, so never signals > the poll() completion and than all the others > waiters in the MPIDI_CH3I_Progress() condition will never wake up. > > Does anyone is having the same problem? > > Thanks in advance, > Roberto Fichera. 
> From kernel at tekno-soft.it Fri Jul 18 12:49:48 2008 From: kernel at tekno-soft.it (Roberto Fichera) Date: Fri Jul 18 12:51:28 2008 Subject: [mvapich-discuss] Races with MPI_THREAD_MULTI In-Reply-To: References: Message-ID: <4880C9AC.2050001@tekno-soft.it> Dhabaleswar Panda wrote: > Hi Roberto, > > Thanks for your note. You are using the ch3:sock device in MVAPICH2 which > is the same as MPICH2. You are also seeing similar failure scenarios (but > in different forms) with MPICH2 1.0.7. I am cc'ing this message to mpich2 > mailing list. One of the MPICH2 developers will be able to extend help on > this issue faster. > Thanks for that. As for the mpich2 problem, I have already sent an email about that issue. The strange thing is that when linking against mpich2 the race does not show up nearly as quickly as it does with mvapich2: in the mpich2 case I had to wait 1 or 2 hours before the lockup. > Thanks, > > DK > > > On Fri, 18 Jul 2008, Roberto Fichera wrote: > > >> Hi All on the list, >> >> I'm trying to use mvapich2 v1.2rc1 in a multithreaded application, >> initialize using MPI_THREAD_MULTI. >> I've the master application doing the following thing, start several >> thread depending by the assigned nodes, >> on each node a slave application is spawned using the MPI_Comm_spawn(). >> Before to call the >> MPI_Comm_spawn() I prepare the given MPI_Info struct, one for each >> thread, in order to set the all keys >> (host and wdir) for addressing the wanted behaviour. So, as sooner as >> the master application starts, it races >> immediately with 4 nodes, 1 master and 3 slaves. Below you can see the >> status of the master application at race >> time. It seems stuck on the PMIU_readline() which never returns so the >> global lock is never relesead.
MVAPICH2 >> is compiled with: >> >> PKG_PATH=/HRI/External/mvapich2/1.2rc1 >> >> ./configure --prefix=$PKG_PATH \ >> --bindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ >> --sbindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ >> --libdir=$PKG_PATH/lib/linux-x86_64-gcc-glibc2.3.4 \ >> --enable-sharedlibs=gcc \ >> --enable-f90 \ >> --enable-threads=multiple \ >> --enable-g=-ggdb \ >> --enable-debuginfo \ >> --with-device=ch3:sock \ >> --datadir=$PKG_PATH/data \ >> --with-htmldir=$PKG_PATH/doc/html \ >> --with-docdir=$PKG_PATH/doc \ >> LDFLAGS='-Wl,-z,noexecstack' >> >> so I'm using the ch3:sock device. >> >> -----Thread 2 >> [Switching to thread 2 (Thread 1115699536 (LWP 29479))]#0 >> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 >> (gdb) bt >> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 >> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 >> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from >> /lib64/libpthread.so.0 >> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=<value >> optimized out>, key=0x0, value=0x33ca40ff58 >> "!\204??\r\206??\030\204??3\206??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\177\205??\177\205??\177\205??\177\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??\033\205??\033\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??"...) 
>> at ParallelWorker.c:664 >> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62ff50) >> at ParallelWorker.c:719 >> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ff50) at >> ParallelWorker.c:504 >> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >> >> -----Thread 3 >> [Switching to thread 3 (Thread 1105209680 (LWP 29478))]#0 >> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 >> (gdb) bt >> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 >> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 >> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from >> /lib64/libpthread.so.0 >> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=<value >> optimized out>, key=0x0, value=0x33ca40ff58 >> "!\204??\r\206??\030\204??3\206??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\177\205??\177\205??\177\205??\177\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??\033\205??\033\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??"...) 
>> at ParallelWorker.c:664 >> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62f270) >> at ParallelWorker.c:719 >> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62f270) at >> ParallelWorker.c:504 >> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >> >> -----Thread 4 >> [Switching to thread 4 (Thread 1094719824 (LWP 29477))]#0 >> 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 >> (gdb) bt >> #0 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 >> --->>#1 0x00002aaaab3db84a in PMIU_readline () from >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> --->>#2 0x00002aaaab3d9d37 in PMI_Spawn_multiple () from >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #3 0x00002aaaab333893 in MPIDI_Comm_spawn_multiple () from >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #4 0x00002aaaab38bcf6 in MPID_Comm_spawn_multiple () from >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #5 0x00002aaaab355a10 in PMPI_Comm_spawn () from >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #6 0x00000000004189d8 in ParallelWorker_handleParallel (self=0x62ad40) >> at ParallelWorker.c:754 >> #7 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ad40) at >> ParallelWorker.c:504 >> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >> >> I also tried to run against MPICH2 v1.0.7, but here I got a similar >> scenery which show up after between 1 - 2 hours of execution, >> see below: >> >> ----- thread 2 >> [Switching to thread 2 (Thread 1094719824 (LWP 1279))]#0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 >> (gdb) bt 
>> #0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 >> #1 0x00002aaaab5a3d2f in MPIDU_Sock_wait () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #2 0x00002aaaab52bdc7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #3 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #4 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #5 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #6 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6358e0) at ParallelWorker.c:819 >> #7 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6358e0) at ParallelWorker.c:515 >> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >> >> ----- thread 3 >> [Switching to thread 3 (Thread 1084229968 (LWP 1278))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >> (gdb) bt >> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >> #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from 
/home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x634d20) at ParallelWorker.c:819 >> #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x634d20) at ParallelWorker.c:515 >> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >> >> >> ----- thread 4 >> [Switching to thread 4 (Thread 1115699536 (LWP 1277))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >> (gdb) bt >> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >> #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6341a0) at ParallelWorker.c:819 >> #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6341a0) at ParallelWorker.c:515 >> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >> >> where the thread 2 is poll()ing never never returns, so never signals >> the poll() completion and than all the others >> waiters in the MPIDI_CH3I_Progress() condition will never wake up. >> >> Does anyone is having the same problem? >> >> Thanks in advance, >> Roberto Fichera. 
>> >> > > > From koop at cse.ohio-state.edu Fri Jul 18 14:14:56 2008 From: koop at cse.ohio-state.edu (Matthew Koop) Date: Fri Jul 18 14:15:05 2008 Subject: [mvapich-discuss] Races with MPI_THREAD_MULTI In-Reply-To: <4880C9AC.2050001@tekno-soft.it> Message-ID: Hi Roberto, Are you using the new 'mpirun_rsh' command for launching your job? If so, that would explain the hang you see in the PMI calls (and why it happens at the spawn). We currently do not support spawn functionality in this release for mpirun_rsh. You will need to use MPD if your application needs spawn functionality until we release an updated version of mpirun_rsh. Thanks, Matt On Fri, 18 Jul 2008, Roberto Fichera wrote: > Dhabaleswar Panda wrote: > > Hi Roberto, > > > > Thanks for your note. You are using the ch3:sock device in MVAPICH2 which > > is the same as MPICH2. You are also seeing similar failure scenarios (but > > in different forms) with MPICH2 1.0.7. I am cc'ing this message to mpich2 > > mailing list. One of the MPICH2 developers will be able to extend help on > > this issue faster. > > > Thanks for that. As for the mpich2 problem, I have already sent an email > about that issue. > The strange thing is that when linking against mpich2 the race does not > show up nearly as quickly as it does with mvapich2: in the mpich2 case > I had to wait 1 or 2 hours before the lockup. > > Thanks, > > > > DK > > > > > > On Fri, 18 Jul 2008, Roberto Fichera wrote: > > > > > >> Hi All on the list, > >> > >> I'm trying to use mvapich2 v1.2rc1 in a multithreaded application, > >> initialize using MPI_THREAD_MULTI. > >> I've the master application doing the following thing, start several > >> thread depending by the assigned nodes, > >> on each node a slave application is spawned using the MPI_Comm_spawn().
> >> Before to call the > >> MPI_Comm_spawn() I prepare the given MPI_Info struct, one for each > >> thread, in order to set the all keys > >> (host and wdir) for addressing the wanted behaviour. So, as sooner as > >> the master application starts, it races > >> immediately with 4 nodes, 1 master and 3 slaves. Below you can see the > >> status of the master application at race > >> time. It seems stuck on the PMIU_readline() which never returns so the > >> global lock is never relesead. MVAPICH2 > >> is compiled with: > >> > >> PKG_PATH=/HRI/External/mvapich2/1.2rc1 > >> > >> ./configure --prefix=$PKG_PATH \ > >> --bindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ > >> --sbindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ > >> --libdir=$PKG_PATH/lib/linux-x86_64-gcc-glibc2.3.4 \ > >> --enable-sharedlibs=gcc \ > >> --enable-f90 \ > >> --enable-threads=multiple \ > >> --enable-g=-ggdb \ > >> --enable-debuginfo \ > >> --with-device=ch3:sock \ > >> --datadir=$PKG_PATH/data \ > >> --with-htmldir=$PKG_PATH/doc/html \ > >> --with-docdir=$PKG_PATH/doc \ > >> LDFLAGS='-Wl,-z,noexecstack' > >> > >> so I'm using the ch3:sock device. 
> >> > >> -----Thread 2 > >> [Switching to thread 2 (Thread 1115699536 (LWP 29479))]#0 > >> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > >> (gdb) bt > >> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > >> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 > >> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from > >> /lib64/libpthread.so.0 > >> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from > >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=<value > >> optimized out>, key=0x0, value=0x33ca40ff58 > >> "!\204??\r\206??\030\204??3\206??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\177\205??\177\205??\177\205??\177\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??\033\205??\033\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??"...) 
> >> at ParallelWorker.c:664 > >> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62ff50) > >> at ParallelWorker.c:719 > >> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ff50) at > >> ParallelWorker.c:504 > >> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >> > >> -----Thread 3 > >> [Switching to thread 3 (Thread 1105209680 (LWP 29478))]#0 > >> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > >> (gdb) bt > >> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > >> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 > >> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from > >> /lib64/libpthread.so.0 > >> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from > >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=<value > >> optimized out>, key=0x0, value=0x33ca40ff58 > >> "!\204??\r\206??\030\204??3\206??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\177\205??\177\205??\177\205??\177\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??\033\205??\033\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??"...) 
> >> at ParallelWorker.c:664 > >> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62f270) > >> at ParallelWorker.c:719 > >> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62f270) at > >> ParallelWorker.c:504 > >> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >> > >> -----Thread 4 > >> [Switching to thread 4 (Thread 1094719824 (LWP 29477))]#0 > >> 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 > >> (gdb) bt > >> #0 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 > >> --->>#1 0x00002aaaab3db84a in PMIU_readline () from > >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> --->>#2 0x00002aaaab3d9d37 in PMI_Spawn_multiple () from > >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #3 0x00002aaaab333893 in MPIDI_Comm_spawn_multiple () from > >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #4 0x00002aaaab38bcf6 in MPID_Comm_spawn_multiple () from > >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #5 0x00002aaaab355a10 in PMPI_Comm_spawn () from > >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #6 0x00000000004189d8 in ParallelWorker_handleParallel (self=0x62ad40) > >> at ParallelWorker.c:754 > >> #7 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ad40) at > >> ParallelWorker.c:504 > >> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >> > >> I also tried to run against MPICH2 v1.0.7, but here I got a similar > >> scenery which show up after between 1 - 2 hours of execution, > >> see below: > >> > >> ----- thread 2 > >> [Switching to thread 2 (Thread 1094719824 (LWP 
1279))]#0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 > >> (gdb) bt > >> #0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 > >> #1 0x00002aaaab5a3d2f in MPIDU_Sock_wait () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #2 0x00002aaaab52bdc7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #3 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #4 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #5 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #6 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6358e0) at ParallelWorker.c:819 > >> #7 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6358e0) at ParallelWorker.c:515 > >> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >> > >> ----- thread 3 > >> [Switching to thread 3 (Thread 1084229968 (LWP 1278))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >> (gdb) bt > >> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >> #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from 
/home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x634d20) at ParallelWorker.c:819 > >> #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x634d20) at ParallelWorker.c:515 > >> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >> > >> > >> ----- thread 4 > >> [Switching to thread 4 (Thread 1115699536 (LWP 1277))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >> (gdb) bt > >> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >> #1 0x00002aaaab52bec7 in MPIDI_CH3I