From dog at lanl.gov Tue Jul 1 10:40:12 2008
From: dog at lanl.gov (David Gunter)
Date: Tue Jul 1 10:40:23 2008
Subject: [mvapich-discuss] Trouble building MVAPICH2-1.0.3 with any compiler other than gcc
Message-ID: <196B4114-4507-44B9-A0DE-4B6AEBB21BF0@lanl.gov>

I have run into this problem previously with the PGI compilers and was
once able to work around it; however, it seems to have reared its ugly
head again and I'm hoping someone on the list knows of a solution.

The problem is that we need to build MVAPICH2 using the Intel,
PathScale and PGI compilers in addition to the GCC compilers. Even
though the documentation states that it has been tested with these
other compilers, I think that such tests were not done with Totalview
support in mind.

What happens during the build is that src/pm/mpd/mtv_setup.py is
invoked. This causes the Python Distutils to try to create a
Totalview module, but Distutils only knows to put in flags for the GCC
compilers.

I have found switches for PGI and PathScale to ignore "invalid" flags,
but any code compiled with the resulting build does nothing but
segfault. I have yet to get Intel to compile the source code.

This leaves us with two options: give up on MVAPICH2 in favor of
Open-MPI, which means having only one MPI implementation on a system
where we'd prefer to have two, or give up on Totalview support, which
is not going to fly with our user base.

Does anyone know enough about Distutils to work around this problem?

-david
--
David Gunter
HPC-3: Parallel Tools Team
Los Alamos National Laboratory

From Craig.Tierney at noaa.gov Tue Jul 1 14:58:40 2008
From: Craig.Tierney at noaa.gov (Craig Tierney)
Date: Tue Jul 1 14:58:46 2008
Subject: [mvapich-discuss] Question about launch times with mpd (fwd)
In-Reply-To:
References:
Message-ID: <486A7E60.4090808@noaa.gov>

wei huang wrote:
> Hi Craig,
>
> The current mpd based startup can take longer as the system size
> grows.
> The main reason for this is that mpd needs to exchange connection
> information through a TCP/IP based ring structure. It may also take some
> time to launch the processes (from the time you type in the mpiexec
> command to the time the processes reach their mpd_init stage).
>
> The number you see, however, is too large compared with what we
> observe on our system. Could you please confirm that you are at our latest
> release version (mvapich2-1.0.3)? We have some mpd related patches in that
> version.
>

The same problem exists with mvapich2-1.0.3. I ensured that I was using
the mpiexec from this distribution (because multiple versions exist).

Craig

> Another piece of news that may interest you is that we are releasing
> mvapich2-1.2-rc1 either today or tomorrow. In this version, we have much
> improved scalability in job launching using a new startup mechanism. You
> may want to try this version out.
>
> Thanks.
>
> Regards,
> Wei Huang
>
> 774 Dreese Lab, 2015 Neil Ave,
> Dept. of Computer Science and Engineering
> Ohio State University
> OH 43210
> Tel: (614)292-8501
>
>
>> Date: Mon, 30 Jun 2008 10:17:51 -0600
>> From: Craig Tierney
>> To: mvapich-discuss@cse.ohio-state.edu
>> Subject: [mvapich-discuss] Question about launch times with mpd
>>
>> I am trying to benchmark some applications on my system
>> and I have found something I did not expect with regards
>> to launch time of applications.
>>
>> All jobs are launched through a batch system (Sun Gridengine).
>> SGE is configured without tight-integration. Since mpd has
>> to be set up for each user in each job, I have a wrapper script
>> that does roughly the following:
>>
>> -----------------------
>> $me=`uname -n`
>> $port=`mpd --ncpus=4 --echo --daemon --ifhn=$me-ib0`
>>
>> for every other node
>>    ssh $node mpd --ncpus=4 -h $me -p $port --daemon --ifhn=$node-ib0 &
>> end
>> waitall
>>
>> mpiexec -machinefile $machine_file $EXE
>> -----------------------
>>
>> I am running an MPI program that does very little.
>> It calls mpi_init, writes the hostname, then calls mpi_finalize.
>>
>> I measured the time it takes to launch mpd, the time it
>> takes for the program to execute (after MPI_Init to just before
>> MPI_Finalize), and the time to call MPI_Init.
>>
>> Cores    mpd    complete    runjob
>> ----------------------------------
>>     4   ~0.0      ~0.0         0.6
>>    16    0.6      ~0.0         0.8
>>    64    0.7      ~0.0         3.5
>>   128    1.0       0.2        11.2
>>   256    1.7       0.6        47.0
>>   324    2.1       0.1        76.4
>>   512    3.1       0.5       202.4
>>
>> - All timings are in seconds.
>>
>> mpd      - the time to launch the mpd processes in parallel
>> complete - the time to run the application
>> runjob   - the time to execute mpiexec
>>
>> My question is, why does mpd take so long to launch a job?
>> Am I doing something wrong? Is there something I can do
>> to minimize the startup time?
>>
>> Thanks,
>> Craig
>>
>> --
>> Craig Tierney (craig.tierney@noaa.gov)
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss@cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>

--
Craig Tierney (craig.tierney@noaa.gov)

From yogyas at gmail.com Wed Jul 2 07:26:31 2008
From: yogyas at gmail.com (yogeshwar sonawane)
Date: Wed Jul 2 07:26:38 2008
Subject: [mvapich-discuss] SDR & DDR selection in MVAPICH2-uDAPL compilation
Message-ID:

Hi all,

While compiling MVAPICH2 for the udapl device, the selection menu asks
for link speed. Two options are available: SDR and DDR.
Generally these are properties of the IB HCA.

Now, what is this selection based on?
The IB HCA characteristics? Anything else?

And in the MVAPICH2-uDAPL code, how does this selection matter?
That is, how is this selection going to change the behaviour of
MVAPICH2-uDAPL?

Thanks,
Yogeshwar

From chai.15 at osu.edu Wed Jul 2 12:52:27 2008
From: chai.15 at osu.edu (Lei Chai)
Date: Wed Jul 2 12:52:27 2008
Subject: [mvapich-discuss] SDR & DDR selection in MVAPICH2-uDAPL compilation
In-Reply-To:
References:
Message-ID: <486BB24B.5010607@osu.edu>

Hi Yogeshwar,

The selection is based on your IB card, whether it is SDR or DDR. Inside
the mvapich2-udapl code this information is used for parameter tuning. We
have observed before that different parameter values are needed to
yield the best performance for these two kinds of cards.

On a separate topic, if you are using IB, we suggest you use the
OpenFabrics interface in mvapich2, which provides the best performance,
scalability, and fault tolerance.

Lei

From panda at cse.ohio-state.edu Wed Jul 2 23:08:15 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Wed Jul 2 23:08:20 2008
Subject: [mvapich-discuss] Announcing the release of MVAPICH2 1.2RC1
Message-ID:

The MVAPICH team is pleased to announce the release of MVAPICH2 1.2RC1
with the following NEW features:

- Based on MPICH2 1.0.7
- Scalable and robust daemon-less job startup
    - Enhanced and robust mpirun_rsh framework (non-MPD-based) to
      provide scalable job launching on multi-thousand core clusters
    - Available for OpenFabrics (IB and iWARP) and uDAPL interfaces
      (including Solaris)
- Checkpoint-restart with intra-node shared memory support
    - Allows best performance and scalability with fault-tolerance
      support
- Enhancement to software installation
    - Full autoconf-based configuration
    - An application (mpiname) for querying the MVAPICH2 library
      version and configuration information
- Enhanced processor affinity using PLPA for multi-core architectures
    - Allows user-defined flexible processor affinity
- Enhanced scalability for RDMA-based direct one-sided communication
  with less communication resource
- Shared memory optimized MPI_Bcast operations
- Optimized and tuned MPI_Alltoall

For downloading MVAPICH2 1.2RC1, the associated user guide, and accessing
the SVN, please visit the following URL:

http://mvapich.cse.ohio-state.edu

All feedback, including bug reports, hints for performance tuning,
patches and enhancements, is welcome. Please post it to the
mvapich-discuss mailing list.

Thanks,

The MVAPICH Team

From panda at cse.ohio-state.edu Thu Jul 3 09:12:05 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Thu Jul 3 09:12:09 2008
Subject: [mvapich-discuss] Trouble building MVAPICH2-1.0.3 with any compiler other than gcc
In-Reply-To: <196B4114-4507-44B9-A0DE-4B6AEBB21BF0@lanl.gov>
Message-ID:

Hi David,

You might have noticed that we made a release of MVAPICH2 1.2RC1 last
night. This has multiple start-up schemes, including the traditional
MPD-based one and also a new scalable mpirun_rsh-based one (similar to
MVAPICH). We have verified that the MPD-based startup works with
TotalView for all compilers. Complete TotalView support with the new
mpirun_rsh-based scheme is not there yet. We are working on it and
plan to have it in the final release version.

You can check the MPD-based startup + TotalView for all compilers with
this release and let us know if it works from your point of view.

Thanks,

DK

From manfred.muecke at univie.ac.at Thu Jul 3 12:32:20 2008
From: manfred.muecke at univie.ac.at (Manfred Muecke)
Date: Thu Jul 3 12:36:02 2008
Subject: [mvapich-discuss] Q: MPI_ALLGATHERV causes Invalid communicator & comm=0x0
Message-ID: <43450.129.27.140.172.1215102740.squirrel@webmail.univie.ac.at>

Hi,

I have the following problem and ran out of ideas. Maybe someone can help
with some advice.

I get the following error message from all instances of my MPI program
(FORTRAN90), using MVAPICH2 1.0 (compiled with "mpe=mpicheck"):

Invalid communicator, error stack:
MPI_Comm_rank(107): MPI_Comm_rank(comm=0x0, rank=fffffd7fffdfd2bc) failed
MPI_Comm_rank(65).: Invalid communicator

The error is caused by a call to MPI_ALLGATHERV. It was discussed here
that a similar looking error is caused by including the wrong mpi.h. This
one differs, however, in that comm=0x0 (the actual value of the
communicator was 1140850688).

"mpif90 -show" gives:

/opt/local/SunStudio12/SUNWspro/bin/f90
-xO3 -xtarget=opteron -m64
-I/opt/local/MVAPICH/mvapich2-1.0/include
-xO3
-M/opt/local/MVAPICH/mvapich2-1.0/include
-L/opt/local/MVAPICH/mvapich2-1.0/lib
-lmpichf90 -lmpichf90 -lmpich
-L/usr/lib/amd64
-L/usr/ucblib/amd64
-lsocket -lnsl -lresolv -lpthread -ldat -lrt -lnsl -lsocket

I have checked thoroughly and cannot find any mpi.h from other
installations interfering. Any other ideas?

Thanks for your help, Manfred

--
Manfred Mücke                     manfred.muecke@univie.ac.at
Research Lab Computational Technologies and Applications
rlcta.univie.ac.at          Lenaugasse 2, 1080 Wien, AUSTRIA

From thakur at mcs.anl.gov Thu Jul 3 13:27:58 2008
From: thakur at mcs.anl.gov (Rajeev Thakur)
Date: Thu Jul 3 13:28:08 2008
Subject: [mvapich-discuss] Trouble building MVAPICH2-1.0.3 with any compiler other than gcc
In-Reply-To: <200807031653.m63GrMof016224@cse.ohio-state.edu>
References: <200807031653.m63GrMof016224@cse.ohio-state.edu>
Message-ID: <003401c8dd32$23330a70$860add8c@mcs.anl.gov>

David,

One way around that problem is to edit the Makefile created by
configure in src/pm/mpd and change the compiler to gcc (just for that
directory). The rest of MPICH2 will get built with the Intel or other
compiler chosen.

Rajeev

From curtisbr at cse.ohio-state.edu Thu Jul 3 14:34:44 2008
From: curtisbr at cse.ohio-state.edu (Brian Curtis)
Date: Thu Jul 3 14:34:49 2008
Subject: [mvapich-discuss] Q: MPI_ALLGATHERV causes Invalid communicator & comm=0x0
In-Reply-To: <43450.129.27.140.172.1215102740.squirrel@webmail.univie.ac.at>
References: <43450.129.27.140.172.1215102740.squirrel@webmail.univie.ac.at>
Message-ID: <8D311EC4-491A-4277-8099-F4D546A0C4B3@cse.ohio-state.edu>

Manfred,

Do you see this problem when MPE is disabled? Also, we released
MVAPICH2 1.2rc1 yesterday. It contains numerous bug fixes and
enhancements for improved scalability and performance. Can you try it
out and see if you still experience this problem?

Brian

From worleys at gmail.com Fri Jul 4 11:21:33 2008
From: worleys at gmail.com (Chris Worley)
Date: Fri Jul 4 11:21:39 2008
Subject: [mvapich-discuss] SEGFAULT: mpispawn.c line 303
Message-ID:

Using OFED 1.3 and the MVAPICH 1.0 included on OFED's download.

Intermittently (not always) get segfault in mpispawn line 303 at large
(i.e. 2048) core counts.

Chris

From Marcos.Verissimo at uclouvain.be Fri Jul 4 11:21:58 2008
From: Marcos.Verissimo at uclouvain.be (Marcos Verissimo Alves)
Date: Fri Jul 4 11:22:10 2008
Subject: [mvapich-discuss] Trouble configuring MVAPICH2 1.2RC1
Message-ID: <62345.189.68.75.35.1215184918.squirrel@mmp-3-1.sipr-dc.ucl.ac.be>

Hi all,

I am trying to build the new release candidate of mvapich in our cluster,
but I get an error:

(...)
checking for the InfiniBand includes path... default
checking for the InfiniBand library path... default
checking for library containing umad_init... no
configure: error: 'libibumad not found. Did you specify --with-ib-libpath=?'
configure: error: /bin/sh
'/home/pcpm/mverissi/test/mvapich2-1.2rc1/src/mpid/ch3/channels/mrail/configure'
failed for channels/mrail
configure: error: Configure of src/mpid/ch3 failed!

What is the usual name of the library file? Is there any way of getting
around this if I can't find the library containing umad_init?

Cheers,

Marcos

--
Dr. Marcos Verissimo Alves
Post-Doctoral Fellow
Unité de Physico-Chimie et de Physique des Matériaux (PCPM)
Université Catholique de Louvain
1 Place Croix du Sud, B-1348
Louvain-la-Neuve
Belgique

------

Gort, Klaatu barada nikto. Klaatu barada nikto. Klaatu barada nikto.

Free translation:

Gort, Google is your friend. Google is your friend. Google is your friend.

From curtisbr at cse.ohio-state.edu Sat Jul 5 00:51:25 2008
From: curtisbr at cse.ohio-state.edu (Brian Curtis)
Date: Sat Jul 5 00:51:36 2008
Subject: [mvapich-discuss] Trouble configuring MVAPICH2 1.2RC1
In-Reply-To: <62345.189.68.75.35.1215184918.squirrel@mmp-3-1.sipr-dc.ucl.ac.be>
References: <62345.189.68.75.35.1215184918.squirrel@mmp-3-1.sipr-dc.ucl.ac.be>
Message-ID:

Marcos,

The error indicates that the libibumad library cannot be found in
the default search path. This is a required library. If you do not
have this library installed, it can be obtained from the OpenFabrics
Alliance software stack (http://www.openfabrics.org/). If you do not
install the InfiniBand libraries in the system's library path
(/usr/lib or /usr/local/lib is recommended), please specify the path to
this library during configuration with
--with-ib-libpath={path to InfiniBand libraries}.

Brian

From yogyas at gmail.com Sat Jul 5 04:23:47 2008
From: yogyas at gmail.com (yogeshwar sonawane)
Date: Sat Jul 5 04:23:54 2008
Subject: [mvapich-discuss] MPD related error
Message-ID:

Hi all,

I am trying to run 64 processes using MVAPICH2-1.0.1-uDAPL on 8 nodes.
Every node has 8 cores/cpus.

Out of 64, sometimes one or more processes get killed or closed.
The node on which fewer than 8 processes are running has the following
message in the /var/log/messages file:

Jul 4 13:23:05 pn02 mpdman: pn02_mpdman_12: mpd_uncaught_except_tb handling:
  exceptions.AttributeError: 'int' object has no attribute 'send_dict_msg'
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpdman.py 652 handle_lhs_input
        self.ring.rhsSock.send_dict_msg(msg)
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpdlib.py 743 handle_active_streams
        handler(stream,*args)
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpdman.py 481 run
        rv = self.streamHandler.handle_active_streams(timeout=5.0)
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py 1408 launch_mpdman_via_fork
        mpdman.run()
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py 1325 run_one_cli
        (manPid,toManSock) = self.launch_mpdman_via_fork(msg,man_env)
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py 1199 do_mpdrun
        self.run_one_cli(lorank,msg)
    /home/htdg/pn_mpi/mpi-bin_send-recv_pnet3/bin/mpd.py 854 handle_lhs_input
        self.do_mpdrun(msg)
    /home/htdg

Can anybody give me some more info about this?
Is this some kind of setup/settings issue on the nodes?

Thanks,
Yogeshwar

From panda at cse.ohio-state.edu Sat Jul 5 08:18:06 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Sat Jul 5 08:18:15 2008
Subject: [mvapich-discuss] MPD related error
In-Reply-To:
Message-ID:

Thanks for your note. You might have noticed that a new version of
MVAPICH2 (1.2RC1) was released a few days back. This release has a
non-MPD (daemon-less) startup scheme. This new start-up scheme is
applicable to all interfaces, including uDAPL. I suggest you upgrade
your software stack to this release. It will give you faster
start-up and you need not worry about the MPD-related issues.

DK

From singh.jasjit at yahoo.co.in Mon Jul 7 09:44:11 2008
From: singh.jasjit at yahoo.co.in (jasjit singh)
Date: Mon Jul 7 09:44:24 2008
Subject: [mvapich-discuss] EP attributes' values
Message-ID: <179419.97136.qm@web94005.mail.in2.yahoo.com>

Hi

I am using mvapich2-1.0.1

While running more than 64 processes on 8 nodes (each with 8 cores,
64-bit, RHEL-2.6.9-42.ELsmp), I have observed some changes in certain
attributes.

1)
Value of max_rdma_write_iov changes from 0 to 42.
Value of max_rdma_read_iov also changes from 0 to a non-zero value.
I want to know why there is such a dramatic change in these values. How
should we proceed if we want to run more than 64 processes successfully?

2)
Value of the max_message_size attribute in our stack is 4294967296 (i.e.
4GB), which is returned in dat_ia_query(). So we are expecting MVAPICH
to set the same value for max_message_size while setting DAT_EP_ATTR
in EP creation. It does so if we run up to 64 processes. But if the
number of processes exceeds 64, MVAPICH sets this value to 1024 (i.e.
1K). This is again a drastic change. And what is more surprising is that
it does post recv for sizes larger than 1K. MVAPICH, it seems, is on one
hand limiting MAX MESSAGE SIZE and on the other hand posting larger
data sizes.

I am sure that changes in these values have nothing to do with the
number of nodes (or oversubscription, I essentially mean). (CMIIW)
These changes are only due to the increase in the number of processes. And
one more thing I want to confirm is that this has nothing to do with the
cluster type, whether small, medium or large, as the limit for the number
of processes for a small cluster is 128.

Regards,
Jasjit Singh

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080707/22842a17/attachment-0001.html

From alex.theodore at hp.com Mon Jul 7 12:21:39 2008
From: alex.theodore at hp.com (Theodore, Alex)
Date: Mon Jul 7 12:25:37 2008
Subject: [mvapich-discuss] MVAPICH2 Installation / Configuration help
Message-ID: <2B7BF477E7CCD44892CBB596329A10A228D07E8EB7@GVW0432EXB.americas.hpqcorp.net>

I've installed MVAPICH2 (mvapich2-1.2rc1) with the following options:

./configure --prefix=/opt/mvapich2 --with-rdma=gen2 --enable-f90 --enable-f77 --enable-mpe
make
make install

Then I configured the following environment variables in ~/.bashrc:

export MVAPICH2_HOME=/opt/mvapich2
export PATH=$MVAPICH2_HOME/bin:$MVAPICH2_HOME/sbin:$PATH
export MANPATH=$MVAPICH2_HOME/man:$MANPATH

When I try to run the application it doesn't seem to work properly:

1) Created file called "hosts1" with four compute nodes... each host's
   hostname on one line
2) Created .mpd.conf and .mpdpasswd with password on head node, and
   distributed to compute nodes
3) Ran "mpirun_rsh -ssh -np 4 -hostfile /root/MPI/hosts1 ./test-alex-bcast"
   with following output

/usr/bin/env: mpispawn: No such file or directory

Child exited abnormally!
cleanupKilling remote processes.../usr/bin/env: mpispawn: No such file or directory
/usr/bin/env: mpispawn: No such file or directory
/usr/bin/env: mpispawn: No such file or directory
DONE

What am I missing? I'm sure this is likely a configuration issue.. any
help / guidance would be greatly appreciated.

Thanks,

Alex

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080707/a73f3800/attachment.html

From sridharj at cse.ohio-state.edu Mon Jul 7 12:36:55 2008
From: sridharj at cse.ohio-state.edu (Jaidev Sridhar)
Date: Mon Jul 7 12:35:35 2008
Subject: [mvapich-discuss] MVAPICH2 Installation / Configuration help
In-Reply-To: <2B7BF477E7CCD44892CBB596329A10A228D07E8EB7@GVW0432EXB.americas.hpqcorp.net>
References: <2B7BF477E7CCD44892CBB596329A10A228D07E8EB7@GVW0432EXB.americas.hpqcorp.net>
Message-ID: <1215448615.13766.4.camel@t13.nowlab.cis.ohio-state.edu>

Hi Alex,

mpispawn is a utility that mpirun_rsh starts on all nodes. Can you check
if

a) mpispawn is installed in $MVAPICH2_HOME/bin (i.e., the same directory
   as mpirun_rsh) on all nodes, and
b) $PATH (including $MVAPICH2_HOME/bin) is being propagated correctly

-Jaidev

From sridharj at cse.ohio-state.edu Mon Jul 7 14:03:04 2008
From: sridharj at cse.ohio-state.edu (Jaidev Sridhar)
Date: Mon Jul 7 14:01:44 2008
Subject: [Fwd: RE: [mvapich-discuss] SEGFAULT: mpispawn.c line 303]
Message-ID: <1215453784.13766.12.camel@t13.nowlab.cis.ohio-state.edu>

Looks like I missed the mailing list in my original response.

Chris, in addition, we think the user application is segfaulting or
terminating unexpectedly. In some versions of mvapich, you'd see

mpispawn.c:303 Unexpected exit status

when an application seg-faults.

-Jaidev

-------- Forwarded Message --------
> From: Jaidev Sridhar
> Reply-To: Jaidev Sridhar
> To: Chris Worley
> Subject: RE: [mvapich-discuss] SEGFAULT: mpispawn.c line 303
> Date: Sat, 5 Jul 2008 10:20:05 -0400
>
> Hi Chris,
>
> Do you see any messages on the console after failure? If yes, they
> would help us nail down this issue.
>
> -Jaidev
>
> - original message -
> Subject: [mvapich-discuss] SEGFAULT: mpispawn.c line 303
> From: "Chris Worley"
> Date: 07-04-2008 15:26
>
> Using OFED 1.3 and the MVAPICH 1.0 included on OFED's download.
>
> Intermittently (not always) get segfault in mpispawn line 303 at large
> (i.e. 2048) core counts.
> > Chris > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > > From chai.15 at osu.edu Mon Jul 7 14:45:01 2008 From: chai.15 at osu.edu (Lei Chai) Date: Mon Jul 7 14:45:05 2008 Subject: [mvapich-discuss] EP attributes' values In-Reply-To: <179419.97136.qm@web94005.mail.in2.yahoo.com> References: <179419.97136.qm@web94005.mail.in2.yahoo.com> Message-ID: <4872642D.5010509@osu.edu> Hi Jasjit, Thanks for using mvapich2. I believe you are using the udapl interface. When the number of processes is larger than 64, on demand connection establishment model is used for better scalability and thus the attribute values are different. If this is a problem on your stack, could you try to disable on demand by setting the threshold to be larger than the number of processes, e.g. $ mpiexec -n 64 -env MV2_ON_DEMAND_THRESHOLD 1024 ./a.out FYI, since the udapl interface in mvapich2 doesn't support the blocking progress mode yet, it will not be beneficial to use over-subscription. If you are using InfiniBand as the network we recommend you use the OFED interface in mvapich2, which provides the best performance, scalability, and features, such as blocking mode for over-subscription etc. The latest release is mvapich2-1.2rc1. Lei jasjit singh wrote: > Hi > > I am using mvapich2-1.0.1 > > While running more than 64 processes on 8 nodes (each with 8 cores, > 64-bit, RHEL-2.6.9-42.ELsmp), I have observed some changes in certain > attributes. > > 1) > Value of max_rdma_write_iov changes from 0 to 42. > Value of max_rdma_read_iov also changes from 0 to a non-zero value.. > I want to know why there is such a dramatic change in these values.How > should we proceed if we want to run more than 64 processes successfully ? > > 2) > Value of max_message_size attribute in our stack is 4294967296 (i.e > 4GB) that is returned in dat_ia_query(). 
So we are expecting MVAPICH > to set the same value for max_message_size while setting DAT_EP_ATTR > in EP creation. It is doing so if we run upto 64 processes. But if > number of processes exceed 64, MVAPICH sets this value to 1024(i.e > 1K). This is again a drastic change. And what is more surprising is it > does post recv for size larger than 1K. MVAPICH, it seems, is on one > hand limiting MAX MESSAGE SIZE and on the other hand posting larger > data size. > > I am sure that changes in these values have nothing to do with the > number of nodes (or oversubscription, I essentially mean).(CMIIW) > These changes are only due to increase in number of processes. And one > thing more I want to confirm is this has nothing to do with cluster > type whether this is small, medium or large as the limit for number of > processes for small cluster is 128. > > Regards, > Jasjit Singh > > > ------------------------------------------------------------------------ > Not happy with your email address? > Get the one you really want > - millions of new email addresses available now at Yahoo! > > ------------------------------------------------------------------------ > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From yogyas at gmail.com Tue Jul 8 09:46:21 2008 From: yogyas at gmail.com (yogeshwar sonawane) Date: Tue Jul 8 09:46:29 2008 Subject: [mvapich-discuss] EP attributes' values In-Reply-To: <4872642D.5010509@osu.edu> References: <179419.97136.qm@web94005.mail.in2.yahoo.com> <4872642D.5010509@osu.edu> Message-ID: Hi lei, On 7/8/08, Lei Chai wrote: > Hi Jasjit, > > Thanks for using mvapich2. I believe you are using the udapl interface. > > When the number of processes is larger than 64, on demand connection > establishment model is used for better scalability and thus the attribute > values are different. 
If this is a problem on your stack, could you try to Can you elaborate more on this:- better scalability & change in the attribute values. Any link or reference will be also helpful. > disable on demand by setting the threshold to be larger than the number of > processes, e.g. > > $ mpiexec -n 64 -env MV2_ON_DEMAND_THRESHOLD 1024 ./a.out > > FYI, since the udapl interface in mvapich2 doesn't support the blocking > progress mode yet, it will not be beneficial to use over-subscription. If > you are using InfiniBand as the network we recommend you use the OFED > interface in mvapich2, which provides the best performance, scalability, and > features, such as blocking mode for over-subscription etc. The latest > release is mvapich2-1.2rc1. > > Lei > > > jasjit singh wrote: > > > > > Hi > > > > I am using mvapich2-1.0.1 > > > > While running more than 64 processes on 8 nodes (each with 8 cores, > 64-bit, RHEL-2.6.9-42.ELsmp), I have observed some changes in certain > attributes. > > > > 1) > > Value of max_rdma_write_iov changes from 0 to 42. > > Value of max_rdma_read_iov also changes from 0 to a non-zero value.. > > I want to know why there is such a dramatic change in these values.How > should we proceed if we want to run more than 64 processes successfully ? > > > > 2) > > Value of max_message_size attribute in our stack is 4294967296 (i.e 4GB) > that is returned in dat_ia_query(). So we are expecting MVAPICH to set the > same value for max_message_size while setting DAT_EP_ATTR in EP creation. It > is doing so if we run upto 64 processes. But if number of processes exceed > 64, MVAPICH sets this value to 1024(i.e 1K). This is again a drastic change. > And what is more surprising is it does post recv for size larger than 1K. > MVAPICH, it seems, is on one hand limiting MAX MESSAGE SIZE and on the other > hand posting larger data size. 
> > > > I am sure that changes in these values have nothing to do with the number > of nodes (or oversubscription, I essentially mean).(CMIIW) > > These changes are only due to increase in number of processes. And one > thing more I want to confirm is this has nothing to do with cluster type > whether this is small, medium or large as the limit for number of processes > for small cluster is 128. > > > > Regards, > > Jasjit Singh > > > > > > > ------------------------------------------------------------------------ > > Not happy with your email address? > > Get the one you really want > - millions of new > email addresses available now at Yahoo! > > > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > mvapich-discuss mailing list > > mvapich-discuss@cse.ohio-state.edu > > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > > > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > with regards, Yogeshwar From kus at free.net Tue Jul 8 09:57:32 2008 From: kus at free.net (Mikhail Kuzminsky) Date: Tue Jul 8 09:57:40 2008 Subject: [mvapich-discuss] mvapich over OFED *and* over IBGD IB stack In-Reply-To: <200807071403.m67E3AMm008795@cse.ohio-state.edu> Message-ID: AFAIK mvapich versions something around 0.9.5 version were "switched" from support of IBGD Mellanox IB stack to support of OFED. But was there some mvapich versions which might work over IBGD *and* OFED ? I'm especially interesting for support of IBGD 1.8.0 and OFED-1.2 or 1.3. 
Mikhail Kuzminsky
Computer Assistance to Chemical Research Center
Zelinsky Institute of Organic Chemistry
Moscow

From chai.15 at osu.edu Tue Jul 8 16:49:17 2008
From: chai.15 at osu.edu (Lei Chai)
Date: Tue Jul 8 16:49:22 2008
Subject: [mvapich-discuss] EP attributes' values
In-Reply-To:
References: <179419.97136.qm@web94005.mail.in2.yahoo.com> <4872642D.5010509@osu.edu>
Message-ID: <4873D2CD.6050602@osu.edu>

To clarify: when on-demand connection establishment is used, the code path is different and the values are set in different places. That is why you observed different values for the on-demand and non-on-demand cases. These values are not directly related to scalability.

Regarding the specific parameters Jasjit mentioned, we found that the values were not explicitly set for on demand. I'm attaching a patch below. Could you try it and see if it solves your problem? The patch has also been checked in to the latest trunk version.

Regards,
Lei

---------------------------------------------------------
Index: src/mpid/osu_ch3/channels/mrail/src/udapl/rdma_udapl_priv.c
===================================================================
--- src/mpid/osu_ch3/channels/mrail/src/udapl/rdma_udapl_priv.c (revision 2839)
+++ src/mpid/osu_ch3/channels/mrail/src/udapl/rdma_udapl_priv.c (working copy)
@@ -1067,6 +1067,7 @@
 {
     ep_attr.service_type = DAT_SERVICE_TYPE_RC;
     ep_attr.max_mtu_size = rdma_default_mtu_size;
+    ep_attr.max_message_size = ia_attr.max_message_size;
     ep_attr.max_rdma_size = ia_attr.max_rdma_size;
     ep_attr.qos = DAT_QOS_BEST_EFFORT;
     ep_attr.recv_completion_flags = DAT_COMPLETION_DEFAULT_FLAG;
@@ -1081,6 +1082,8 @@
     ep_attr.max_request_iov =
         MIN (rdma_default_max_sg_list,
              ia_attr.max_iov_segments_per_dto);
+    ep_attr.max_rdma_write_iov = 0;
+    ep_attr.max_rdma_read_iov = 0;
     ep_attr.max_rdma_read_in = DAPL_DEFAULT_MAX_RDMA_IN;
     ep_attr.max_rdma_read_out = DAPL_DEFAULT_MAX_RDMA_OUT;
-----------------------------------------------------------------------------

yogeshwar sonawane wrote:
> Hi
lei, > > On 7/8/08, Lei Chai wrote: > >> Hi Jasjit, >> >> Thanks for using mvapich2. I believe you are using the udapl interface. >> >> When the number of processes is larger than 64, on demand connection >> establishment model is used for better scalability and thus the attribute >> values are different. If this is a problem on your stack, could you try to >> > > Can you elaborate more on this:- better scalability & change in the > attribute values. > Any link or reference will be also helpful. > > >> disable on demand by setting the threshold to be larger than the number of >> processes, e.g. >> >> $ mpiexec -n 64 -env MV2_ON_DEMAND_THRESHOLD 1024 ./a.out >> >> FYI, since the udapl interface in mvapich2 doesn't support the blocking >> progress mode yet, it will not be beneficial to use over-subscription. If >> you are using InfiniBand as the network we recommend you use the OFED >> interface in mvapich2, which provides the best performance, scalability, and >> features, such as blocking mode for over-subscription etc. The latest >> release is mvapich2-1.2rc1. >> >> Lei >> >> >> jasjit singh wrote: >> >> >>> Hi >>> >>> I am using mvapich2-1.0.1 >>> >>> While running more than 64 processes on 8 nodes (each with 8 cores, >>> >> 64-bit, RHEL-2.6.9-42.ELsmp), I have observed some changes in certain >> attributes. >> >>> 1) >>> Value of max_rdma_write_iov changes from 0 to 42. >>> Value of max_rdma_read_iov also changes from 0 to a non-zero value.. >>> I want to know why there is such a dramatic change in these values.How >>> >> should we proceed if we want to run more than 64 processes successfully ? >> >>> 2) >>> Value of max_message_size attribute in our stack is 4294967296 (i.e 4GB) >>> >> that is returned in dat_ia_query(). So we are expecting MVAPICH to set the >> same value for max_message_size while setting DAT_EP_ATTR in EP creation. It >> is doing so if we run upto 64 processes. But if number of processes exceed >> 64, MVAPICH sets this value to 1024(i.e 1K). 
This is again a drastic change. >> And what is more surprising is it does post recv for size larger than 1K. >> MVAPICH, it seems, is on one hand limiting MAX MESSAGE SIZE and on the other >> hand posting larger data size. >> >>> I am sure that changes in these values have nothing to do with the number >>> >> of nodes (or oversubscription, I essentially mean).(CMIIW) >> >>> These changes are only due to increase in number of processes. And one >>> >> thing more I want to confirm is this has nothing to do with cluster type >> whether this is small, medium or large as the limit for number of processes >> for small cluster is 128. >> >>> Regards, >>> Jasjit Singh >>> >>> >>> >>> >> ------------------------------------------------------------------------ >> >>> Not happy with your email address? >>> Get the one you really want >>> >> - millions of new >> email addresses available now at Yahoo! >> >> >> ------------------------------------------------------------------------ >> >>> _______________________________________________ >>> mvapich-discuss mailing list >>> mvapich-discuss@cse.ohio-state.edu >>> >>> >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >> >>> >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >> >> > > with regards, > Yogeshwar > From nilesh_awate at yahoo.com Wed Jul 9 08:41:53 2008 From: nilesh_awate at yahoo.com (nilesh awate) Date: Wed Jul 9 08:42:02 2008 Subject: [mvapich-discuss] Couldn't run mpi program with mvapich2-1.2rc1 Message-ID: <98078.59087.qm@web94115.mail.in2.yahoo.com> Hi all, I downloaded mvapich2-1.2rc1 (as it runs without mpd daemons) for a trial. 
I configured it for uDAPL with prefix ~/mpi_bin_rc1, then set the path to ~/mpi_bin_rc1/bin:$PATH (both mpirun_rsh and mpispawn are present there).

To run the MPI code I executed the following command:

./mpirun_rsh -np 2 node1 node2 ./mpicode

(Passwordless ssh is enabled, with an NFS share.)

First I got the following error:

/usr/bin/env mpispawn: no such file

Then I searched the mailing list. There was a reply from Karl suggesting to run from the installed mvapich bin directory, because "mpispawn is being invoked without execvp". I tried that, and then got the following error:

Child exited abnormally!
cleanupKilling remote processes...DONE

Then I looked at the output of ps -eaf on both nodes. mpispawn was running on the remote node, and the process

"/usr/bin/ssh -q sun00 cd /home/nilesha; /usr/bin/env LD_LIBRARY_PATH=/usr/mvapich/lib/share"

was hanging on the launching node.

I don't know what is missing on my side. Please tell me if there is anything more I need to do.

Waiting for reply,
Nilesh

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080709/4ad26ea7/attachment.html

From nilesh_awate at yahoo.com Wed Jul 9 09:05:55 2008
From: nilesh_awate at yahoo.com (nilesh awate)
Date: Wed Jul 9 09:06:08 2008
Subject: [mvapich-discuss] dat_evd_dequeue erroneous condition is not handled
Message-ID: <243435.83777.qm@web94111.mail.in2.yahoo.com>

Hi Lei,

I have created a small patch which takes care of the transport error: it aborts the MPI application and exits.
I have tried it on mvapich2-1.0.1 and mvapich2-1.0.3. Here is the patch:

--- orig_mvapich2-1.0.1/src/mpid/osu_ch3/channels/mrail/src/udapl/udapl_channel_manager.c 2007-09-06 02:14:15.000000000 +0530
+++ mvapich2-1.0.1_patched/src/mpid/osu_ch3/channels/mrail/src/udapl/udapl_channel_manager.c 2008-07-02 15:30:45.000000000 +0530
@@ -455,6 +455,8 @@
     int i, j, needed;
     static int last_poll = 0;
     int type = T_CHANNEL_NO_ARRIVE;
+    int rank;
+    PMI_Get_rank(&rank);
 
     *vbuf_handle = NULL;
     for (i = last_poll, j = 0;
@@ -467,6 +469,16 @@
         {
             DEBUG_PRINT ("[poll cq]: get complete queue entry\n");
             assert (event.event_number == DAT_DTO_COMPLETION_EVENT);
+
+            /* Following is the patch to come out in case of fatal error like
+               DAT_DTO_ERR_TRANSPORT (occures when network disfunction) */
+
+            if (event.event_data.dto_completion_event_data.status != DAT_DTO_SUCCESS)
+            {
+                udapl_error_abort(UDAPL_STATUS_ERR,"[%d]DAT_EVD_ERROR in Consume_signals %x \n",rank,
+                                  event.event_data.dto_completion_event_data.status);
+            }
+
             sc = ((struct vbuf *) event.event_data.
                   dto_completion_event_data.user_cookie.as_ptr)->desc;
             v = (vbuf *) ((aint_t) sc.cookie.as_ptr);

regards
Nilesh

----- Original Message ----
From: LEI CHAI
To: nilesh awate
Cc: MVAPICH2
Sent: Wednesday, 18 June, 2008 2:27:32 AM
Subject: Re: [mvapich-discuss] dat_evd_dequeue erroneous condition is not handled

Hi,

We have never got the DAT_DTO_ERR_TRANSPORT error before. This error usually means the network has problem and is not functional well. I think a proper way to handle it is to report the error and abort the mpi program since it is kind of a fatal error.

Lei

----- Original Message -----
From: nilesh awate
Date: Tuesday, June 17, 2008 10:58 am
Subject: [mvapich-discuss] dat_evd_dequeue erroneous condition is not handled
To: MVAPICH2

> Hi All,
> I am using mvapich2-1.0.1 over udapl stack.
> I am getting DAT_DTO_ERR_TRANSPORT error at udapl level, but mpi application is not terminating with some error
> as i browse through the code i observe following thing.
> ret1 = dat_evd_dequeue (MPIDI_CH3I_RDMA_Process.cq_hndl[i], &event);
> if (ret1 == DAT_SUCCESS) {
>     assert (event.event_number == DAT_DTO_COMPLETION_EVENT);
>     /* but there is no check for event.event_data.dto_completion_event_data.status */
>     . . . .
>     . . . .
> }
> but above condition is handled in rdma_udapl_1sc.c file while dequeuing
> what is expected behavior of mpi when udapl throws error like DAT_DTO_ERR_TRANSPORT ?
> How this kind of error going to be handled at mpi level?
> OR
> How underlying udapl errors are reflected by mpi ?
> I am using pallas as an application for testing purpose
> waiting for reply
> thanking
> Nilesh

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080709/2b217c9a/attachment-0001.html

From koop at cse.ohio-state.edu Wed Jul 9 11:17:48 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Wed Jul 9 11:17:55 2008
Subject: [mvapich-discuss] mvapich over OFED *and* over IBGD IB stack
In-Reply-To:
Message-ID:

Mikhail,

MVAPICH has had support for both OpenFabrics and VAPI (IBGD) for quite some time although we are now phasing out support of VAPI since vendors now suggest OFED.

MVAPICH 1.0 has support for both OFED and IBGD. It will require two different compiled versions, however.

Let me know if this doesn't answer your question.
Matt On Tue, 8 Jul 2008, Mikhail Kuzminsky wrote: > AFAIK mvapich versions something around 0.9.5 version were "switched" > from support of IBGD Mellanox IB stack to support of OFED. > > But was there some mvapich versions which might work over IBGD *and* > OFED ? I'm especially interesting for support of IBGD 1.8.0 and > OFED-1.2 or 1.3. > > Mikhail Kuzminsky > Computer Assistance to Chemical Research Center > Zelinsky Institute of Organic Chemistry > Moscow > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From koop at cse.ohio-state.edu Wed Jul 9 11:23:36 2008 From: koop at cse.ohio-state.edu (Matthew Koop) Date: Wed Jul 9 11:23:41 2008 Subject: [mvapich-discuss] Couldn't run mpi program with mvapich2-1.2rc1 In-Reply-To: <98078.59087.qm@web94115.mail.in2.yahoo.com> Message-ID: Nilesh, How are you setting the path? Is it in your .bashrc (or shell equivalent) or did you just set it the current environment? Also, is mpispawn also in the same directory where you are doing the ./mpirun_rsh -np 2 node1 node2 ./mpicode ? Matt On Wed, 9 Jul 2008, nilesh awate wrote: > Hi all, I downloaded mvapich2-1.2rc1 (as it runs without mpd daemons) for a trial. i configure it for udapl with prefix ~/mpi_bin_rc1 then set path to ~/mpi_bin_rc1/bin:$PATH (both mpirun_rsh & mpispawn present) to run mpi code i executed following command ./mpirun_rsh -np 2 node1 node2 ./mpicode (ssh wo passwd is enabled with nfs share) first i face foll. error /usr/bin/env mpispawn: no such file then i search on mailing list, there was a reply from Karl that try to run from installed path of mvapich bin directory because "mpispawn is being invoked without execvp" i tried that, then i got following error Child exited abnormally! 
cleanupKilling remote processes...DONE then i saw the output of ps -eaf on both the node i observe mpispawn was running on remote node & "/usr/bin/ssh -q sun00 cd /home/nilesha; /usr/bin/env LD_LIBRARY_PATH=/usr/mvapich/lib/share" process was hanging on executing node I don't know what is missing from my side ? please tell me if any thing more i need to do, waiting for reply, Nilesh Download prohibited? No problem. CHAT from any browser, without download. Go to http://in.messenger.yahoo.com/webmessengerpromo.php/ From chai.15 at osu.edu Wed Jul 9 15:01:47 2008 From: chai.15 at osu.edu (Lei Chai) Date: Wed Jul 9 15:01:50 2008 Subject: [mvapich-discuss] dat_evd_dequeue erroneous condition is not handled In-Reply-To: <243435.83777.qm@web94111.mail.in2.yahoo.com> References: <243435.83777.qm@web94111.mail.in2.yahoo.com> Message-ID: <48750B1B.2080806@osu.edu> Hi Nilesh, Thanks for the patch. It has been applied to the latest mvapich2 svn trunk with minor enhancement. Lei nilesh awate wrote: > Hi lei, > > i have created a small patch which take care of transport error; > abort the mpi appliaction > and come out of it. 
> i have tried it on mvapich2-1.0.1 & mvapich2-1.0.3 > > here is the patch > > --- > orig_mvapich2-1.0.1/src/mpid/osu_ch3/channels/mrail/src/udapl/udapl_channel_manager.c > 2007-09-06 02:14:15.000000000 +0530 > +++ > mvapich2-1.0.1_patched/src/mpid/osu_ch3/channels/mrail/src/udapl/udapl_channel_manager.c > 2008-07-02 15:30:45.000000000 +0530 > @@ -455,6 +455,8 @@ > int i, j, needed; > static int last_poll = 0; > int type = T_CHANNEL_NO_ARRIVE; > + int rank; > + PMI_Get_rank(&rank); > > *vbuf_handle = NULL; > for (i = last_poll, j = 0; > @@ -467,6 +469,16 @@ > { > DEBUG_PRINT ("[poll cq]: get complete queue entry\n"); > assert (event.event_number == DAT_DTO_COMPLETION_EVENT); > + > + /* Following is the patch to come out in case of fatal > error like > + DAT_DTO_ERR_TRANSPORT (occures when network > disfunction) */ > + > + if (event.event_data.dto_completion_event_data.status > != DAT_DTO_SUCCESS) > + { > + > udapl_error_abort(UDAPL_STATUS_ERR,"[%d]DAT_EVD_ERROR in > Consume_signals %x \n",rank, > + > event.event_data.dto_completion_event_data.status); > + } > + > sc = ((struct vbuf *) event.event_data. > > dto_completion_event_data.user_cookie.as_ptr)->desc; > v = (vbuf *) ((aint_t) sc.cookie.as_ptr); > > > regards > > Nilesh > > > ----- Original Message ---- > From: LEI CHAI > To: nilesh awate > Cc: MVAPICH2 > Sent: Wednesday, 18 June, 2008 2:27:32 AM > Subject: Re: [mvapich-discuss] dat_evd_dequeue erroneous condition is > not handled > > Hi, > > We have never got the DAT_DTO_ERR_TRANSPORT error before. This error > usually means the network has problem and is not functional well. I > think a proper way to handle it is to report the error and abort the > mpi program since it is kind of a fatal error. 
> > Lei > > > ----- Original Message ----- > From: nilesh awate > Date: Tuesday, June 17, 2008 10:58 am > Subject: [mvapich-discuss] dat_evd_dequeue erroneous condition is not > handled > To: MVAPICH2 > > > > > Hi All, > > > I am using mvapich2-1.0.1 over udapl stack. > > > I am getting DAT_DTO_ERR_TRANSPORT error at udapl level, but mpi > application is not terminating with some error > > > as i browse through the code i observe following thing. > > > ret1 = dat_evd_dequeue (MPIDI_CH3I_RDMA_Process.cq_hndl[i], &event); > > if (ret1 == DAT_SUCCESS) > { > > assert (event.event_number == DAT_DTO_COMPLETION_EVENT); > > /* but there is no check for > event.event_data.dto_completion_event_data.status */ > > . . . . > > . . . . > > } > > > but above condition is handled in rdma_udapl_1sc.c file while dequeuing > > > what is expected behavior of mpi when udapl throws error like > DAT_DTO_ERR_TRANSPORT ? > > > How this kind of error going to be handled at mpi level? > > OR > > How underlying udapl errors are reflected by mpi ? > > > I am using pallas as an application for testing purpose > > > waiting for reply > > thanking > > Nilesh > > > > > > > > ------------------------------------------------------------------------ > > Bring your gang together. Do your thing. Find your favourite Yahoo! > Group. > > > > _______________________________________________ > > mvapich-discuss mailing list > > mvapich-discuss@cse.ohio-state.edu > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > ------------------------------------------------------------------------ > Bollywood, fun, friendship, sports and more. You name it, we have it. > From yogyas at gmail.com Sun Jul 13 07:22:37 2008 From: yogyas at gmail.com (yogeshwar sonawane) Date: Sun Jul 13 07:22:46 2008 Subject: [mvapich-discuss] HPL with mvapich2-1.0.1 issue. Message-ID: Hi all, I am using mvapich2-1.0.1 with uDAPL as the device configured with default settings like shared memory support. 
I am running HPL compiled with these MPI binaries. HPL version 1.0a, downloaded from www.netlib.org, is used, together with ATLAS-3.8.1, which HPL requires. I am using uDAPL from OFED-1.2 on an IB card.

When I run HPL with 16 processes on a single node (quad-core, quad-socket, 64 GB of RAM), the machine hangs after the first HPL reading. This is not a kernel panic.

Some observations:

The HPL problem sizes are chosen to use 65%, 70% and 75% of total memory, with multiple NB values. When HPL starts, everything is smooth and around 14 GB out of 64 GB is free. After the first reading (for one combination) is displayed, memory usage grows until memory is full, and then swap space is also consumed almost completely. This happens very quickly. The machine then becomes unresponsive to commands, although it still responds to ping from other nodes.

After around 2 hours, HPL exited with the error "caused collective abort of all ranks exit status of rank 14: killed by signal 9". There were kernel messages "out of memory, killing xhpl..."

Multiple runs with different N have shown similar behaviour. One point to note is that the problem starts only after the first reading. I tried an HPL.dat that produces only a single reading, and that run was successful. I did several such runs, each producing only a single reading/combination, and all were successful. The problem seems to appear only when an HPL.dat with multiple combinations/readings is used.

I did the same HPL run with MVAPICH2-1.0.1 compiled for TCP/IP, again on a single node. This run was successful, with all readings displayed, no swap usage and a normal exit of HPL.

Can anybody help me solve the issue? Any links or references are welcome. I am not sure whether this list is the correct one for an HPL-related query, so kindly guide me on this also.

Thanks,
Yogeshwar

From noam.bernstein at nrl.navy.mil Wed Jul 16 10:14:27 2008
From: noam.bernstein at nrl.navy.mil (Noam Bernstein)
Date: Wed Jul 16 10:13:34 2008
Subject: [mvapich-discuss] mvapich 1 vs.
        mvapich 2 performance
Message-ID:

Should I be surprised at this gap in bandwidth between mvapich 1 and mvapich 2 (OSU benchmarks 3.0, osu_bibw)? The mpi1 version is quite close to the expected maximum for IB (8 Gb/s each way), but mpi2 is 25% lower.

Our cluster uses dual-processor, single-core Opterons and Mellanox Infiniband HCAs with OFED 1.2.5.1; only 1 processor on each node is in use.

Below, mpi1 is
mvapich 1.0.1 compiled with make.mvapich.gen2
mpi2 is
mvapich2 1.0.3 compiled with make.mvapich2.ofa

No other flags at compile or run time, everything compiled with gcc.

thanks,
Noam

bibw.mpi1.stdout:Warning: no access to tty (Bad file descriptor).
bibw.mpi1.stdout:Thus no job control in this shell.
bibw.mpi1.stdout:orig machines
bibw.mpi1.stdout:edited machines
bibw.mpi1.stdout:# OSU MPI Bi-Directional Bandwidth Test v3.0
bibw.mpi1.stdout:# Size Bi-Bandwidth (MB/s)
bibw.mpi1.stdout:1 1.51
bibw.mpi1.stdout:2 3.12
bibw.mpi1.stdout:4 6.15
bibw.mpi1.stdout:8 11.83
bibw.mpi1.stdout:16 23.46
bibw.mpi1.stdout:32 41.45
bibw.mpi1.stdout:64 81.72
bibw.mpi1.stdout:128 156.60
bibw.mpi1.stdout:256 264.41
bibw.mpi1.stdout:512 423.20
bibw.mpi1.stdout:1024 604.07
bibw.mpi1.stdout:2048 772.51
bibw.mpi1.stdout:4096 883.79
bibw.mpi1.stdout:8192 1029.38
bibw.mpi1.stdout:16384 1469.52
bibw.mpi1.stdout:32768 1666.29
bibw.mpi1.stdout:65536 1784.16
bibw.mpi1.stdout:131072 1685.49
bibw.mpi1.stdout:262144 1883.22
bibw.mpi1.stdout:524288 1901.34
bibw.mpi1.stdout:1048576 1910.08
bibw.mpi1.stdout:2097152 1917.89
bibw.mpi1.stdout:4194304 1919.68

bibw.mpi2.stdout:Warning: no access to tty (Bad file descriptor).
bibw.mpi2.stdout:Thus no job control in this shell.
bibw.mpi2.stdout:orig machines
bibw.mpi2.stdout:edited machines
bibw.mpi2.stdout:# OSU MPI Bi-Directional Bandwidth Test v3.0
bibw.mpi2.stdout:# Size Bi-Bandwidth (MB/s)
bibw.mpi2.stdout:1 1.10
bibw.mpi2.stdout:2 2.21
bibw.mpi2.stdout:4 4.04
bibw.mpi2.stdout:8 8.33
bibw.mpi2.stdout:16 16.07
bibw.mpi2.stdout:32 30.32
bibw.mpi2.stdout:64 62.31
bibw.mpi2.stdout:128 121.45
bibw.mpi2.stdout:256 216.58
bibw.mpi2.stdout:512 373.28
bibw.mpi2.stdout:1024 568.49
bibw.mpi2.stdout:2048 739.37
bibw.mpi2.stdout:4096 878.16
bibw.mpi2.stdout:8192 889.26
bibw.mpi2.stdout:16384 1079.31
bibw.mpi2.stdout:32768 1164.42
bibw.mpi2.stdout:65536 1226.60
bibw.mpi2.stdout:131072 1227.85
bibw.mpi2.stdout:262144 1265.47
bibw.mpi2.stdout:524288 1262.38
bibw.mpi2.stdout:1048576 1747.40
bibw.mpi2.stdout:2097152 1582.24
bibw.mpi2.stdout:4194304 1543.45

From huanwei at cse.ohio-state.edu Wed Jul 16 12:26:07 2008
From: huanwei at cse.ohio-state.edu (wei huang)
Date: Wed Jul 16 12:26:15 2008
Subject: [mvapich-discuss] mvapich 1 vs. mvapich 2 performance
In-Reply-To:
Message-ID:

Hi Noam,

mvapich and mvapich2 should have very similar performance, and we have never seen such a difference in the peak bandwidth reported by the OSU benchmarks. May I ask which HCA you are using on your systems? And are there multiple HCAs on each node?

CPU affinity can also play a role here. Can you manually set the CPU mappings? You can do that by setting environment variables:

mvapich1:
mpirun_rsh -np 2 h1 h2 VIADEV_CPU_MAPPING=0 ./a.out
(for details, see http://mvapich.cse.ohio-state.edu/support/mvapich_user_guide.html#x1-1440009.6.6)

mvapich2-1.0.3 does not support manual mapping. You can switch VIADEV_CPU_MAPPING between 0 and 1 in the command above and see whether CPU mapping is playing a role here.

However, we just released mvapich2-1.2rc1, which supports CPU mappings. We suggest you try this version as well.
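
Whether a CPU mapping setting such as VIADEV_CPU_MAPPING actually took effect can be checked independently of the MPI library by asking the operating system which cores a process is allowed to run on. The following is only a sketch and is not part of MVAPICH: it assumes Linux and Python 3 (os.sched_getaffinity is not available on other platforms), and the helper name allowed_cpus is made up for illustration.

```python
import os


def allowed_cpus(pid=0):
    """Return the sorted CPU ids the process may run on.

    pid=0 means the calling process. Linux-only: relies on
    os.sched_getaffinity, which wraps the sched_getaffinity(2) call.
    """
    return sorted(os.sched_getaffinity(pid))


if __name__ == "__main__":
    # Print this process's own affinity; pass another pid to
    # allowed_cpus() to inspect an already running benchmark rank.
    print("pid", os.getpid(), "may run on CPUs", allowed_cpus())
```

If a rank that was supposed to be pinned to core 0 reports more than one allowed CPU, the mapping was probably not applied to that process.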
If you use mvapich2-1.2, you can set the mapping as follows (this version supports mpirun_rsh startup, like mvapich1):

mpirun_rsh -np 2 h1 h2 MV2_CPU_MAPPING=0 ./a.out
(http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.2rc1.html#x1-320006.8)

Hope this helps.

-- Wei

> ---------- Forwarded message ----------
> Date: Wed, 16 Jul 2008 10:14:27 -0400
> From: Noam Bernstein
> To: mvapich-discuss@cse.ohio-state.edu
> Subject: [mvapich-discuss] mvapich 1 vs. mvapich 2 performance
>
> Should I be surprised as this gap in bandwidth between mvapich 1 and mvapich 2
> (OSU benchmarks 3.0, osu_bibw)? mpi1 version is quite close to
> expected maximum for IB (8 Gb/s each way), but mpi2 is 25% lower.
>
> Our cluster uses dual processor single core Opterons, Mellanox Infiniband
> HCAs with OFED 1.2.5.1, only 1 processor on each node in use.
>
> Below, mpi1 is
> mvapich 1.0.1 compiled with make.mvapich.gen2
> mpi2 is
> mvapich2 1.0.3 compiled with make.mvapich2.ofa
>
> No other flags at compile or run time, everything compiled with gcc.
>
> thanks,
> Noam
>
> bibw.mpi1.stdout:Warning: no access to tty (Bad file descriptor).
> bibw.mpi1.stdout:Thus no job control in this shell.
> bibw.mpi1.stdout:orig machines > bibw.mpi1.stdout:edited machines > bibw.mpi1.stdout:# OSU MPI Bi-Directional Bandwidth Test v3.0 > bibw.mpi1.stdout:# Size Bi-Bandwidth (MB/s) > bibw.mpi1.stdout:1 1.51 > bibw.mpi1.stdout:2 3.12 > bibw.mpi1.stdout:4 6.15 > bibw.mpi1.stdout:8 11.83 > bibw.mpi1.stdout:16 23.46 > bibw.mpi1.stdout:32 41.45 > bibw.mpi1.stdout:64 81.72 > bibw.mpi1.stdout:128 156.60 > bibw.mpi1.stdout:256 264.41 > bibw.mpi1.stdout:512 423.20 > bibw.mpi1.stdout:1024 604.07 > bibw.mpi1.stdout:2048 772.51 > bibw.mpi1.stdout:4096 883.79 > bibw.mpi1.stdout:8192 1029.38 > bibw.mpi1.stdout:16384 1469.52 > bibw.mpi1.stdout:32768 1666.29 > bibw.mpi1.stdout:65536 1784.16 > bibw.mpi1.stdout:131072 1685.49 > bibw.mpi1.stdout:262144 1883.22 > bibw.mpi1.stdout:524288 1901.34 > bibw.mpi1.stdout:1048576 1910.08 > bibw.mpi1.stdout:2097152 1917.89 > bibw.mpi1.stdout:4194304 1919.68 > > bibw.mpi2.stdout:Warning: no access to tty (Bad file descriptor). > bibw.mpi2.stdout:Thus no job control in this shell. 
> bibw.mpi2.stdout:orig machines > bibw.mpi2.stdout:edited machines > bibw.mpi2.stdout:# OSU MPI Bi-Directional Bandwidth Test v3.0 > bibw.mpi2.stdout:# Size Bi-Bandwidth (MB/s) > bibw.mpi2.stdout:1 1.10 > bibw.mpi2.stdout:2 2.21 > bibw.mpi2.stdout:4 4.04 > bibw.mpi2.stdout:8 8.33 > bibw.mpi2.stdout:16 16.07 > bibw.mpi2.stdout:32 30.32 > bibw.mpi2.stdout:64 62.31 > bibw.mpi2.stdout:128 121.45 > bibw.mpi2.stdout:256 216.58 > bibw.mpi2.stdout:512 373.28 > bibw.mpi2.stdout:1024 568.49 > bibw.mpi2.stdout:2048 739.37 > bibw.mpi2.stdout:4096 878.16 > bibw.mpi2.stdout:8192 889.26 > bibw.mpi2.stdout:16384 1079.31 > bibw.mpi2.stdout:32768 1164.42 > bibw.mpi2.stdout:65536 1226.60 > bibw.mpi2.stdout:131072 1227.85 > bibw.mpi2.stdout:262144 1265.47 > bibw.mpi2.stdout:524288 1262.38 > bibw.mpi2.stdout:1048576 1747.40 > bibw.mpi2.stdout:2097152 1582.24 > bibw.mpi2.stdout:4194304 1543.45 > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From thakur at mcs.anl.gov Thu Jul 17 13:23:59 2008 From: thakur at mcs.anl.gov (Rajeev Thakur) Date: Thu Jul 17 13:24:10 2008 Subject: [mvapich-discuss] FW: Proposed patches to MVAPICH and MVAPICH2 rpm spec files Message-ID: <002a01c8e831$e6b98aa0$860add8c@mcs.anl.gov> -----Original Message----- From: owner-mpich2-dev@mcs.anl.gov [mailto:owner-mpich2-dev@mcs.anl.gov] Sent: Thursday, July 17, 2008 12:14 PM To: owner-mpich2-dev@mcs.anl.gov Subject: BOUNCE mpich2-dev@mcs.anl.gov: Non-member submission from ["Mike Heinz" ] >From owner-mpich2-dev@mcs.anl.gov Thu Jul 17 12:14:11 2008 Received: from mailgw.mcs.anl.gov (mailgw.mcs.anl.gov [140.221.9.4]) by mcs.anl.gov (8.11.6/8.9.3) with ESMTP id m6HHE8l21800 for ; Thu, 17 Jul 2008 12:14:11 -0500 Received: from localhost (localhost [127.0.0.1]) by mailgw.mcs.anl.gov (Postfix) with ESMTP id C7E60348004 for ; Thu, 17 Jul 2008 12:14:08 -0500 (CDT) X-Greylist: delayed 60 
seconds by postgrey-1.21 at mailgw.mcs.anl.gov; Thu, 17 Jul 2008 12:14:06 CDT Received: from EPEXCH1.qlogic.org (eppat.qlogic.com [198.186.5.11]) by mailgw.mcs.anl.gov (Postfix) with ESMTP id BEA6E348002 for ; Thu, 17 Jul 2008 12:14:06 -0500 (CDT) X-MimeOLE: Produced By Microsoft Exchange V6.5 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C8E830.6055FDD4" Subject: Proposed patches to MVAPICH and MVAPICH2 rpm spec files Date: Thu, 17 Jul 2008 12:13:03 -0500 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: Proposed patches to MVAPICH and MVAPICH2 rpm spec files Thread-Index: AcjoMF88YEzoPuNcSj6yftFO624GxA== From: "Mike Heinz" To: , X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at mailgw.mcs.anl.gov X-Spam-Status: No, hits=0.2 required=5.0 tests=HTML_30_40,HTML_MESSAGE,PATCH_UNIFIED_DIFF version=2.55 X-Spam-Level: X-Spam-Checker-Version: SpamAssassin 2.55 (1.174.2.19-2003-05-19-exp) X-MCS-Mail-Loop: mpich2-dev This is a multi-part message in MIME format. ------_=_NextPart_001_01C8E830.6055FDD4 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable I'm not sure who the best person is to receive these changes: We've been encountering complications when converting users to OFED 1.3 because the scripts provided for configuring the shell (mpivars.sh and mpivars.csh) don't update the library path. This can lead to MPI programs failing to link or failing to run. The fix is to modify the spec files for the RPMs for these packages so that they set the LD_LIBRARY_PATH as well as the PATH. The fix for MVAPICH-1.0.1 is this: --- mvapich.spec.orig 2008-07-16 17:06:44.000000000 -0400 +++ mvapich.spec 2008-07-16 16:49:27.000000000 -0400 @@ -300,17 +300,25 @@ if ! echo \${PATH} | grep -q %{_prefix}/bin ; then export PATH=%{_prefix}/bin:\${PATH} fi +if !
echo \${LD_LIBRARY_PATH} | grep -q %{_prefix}/lib ; then + export LD_LIBRARY_PATH=%{_prefix}/lib:%{_prefix}/lib:/shared:\${LD_LIBRARY_PATH} +fi EOF # Script for csh cat < %{build_root}/%{_prefix}/bin/%{shell_scripts_basename}.csh -if (?$path) then - if ( "\${path}" !~ *%{_prefix}/bin* ) then - setenv path %{_prefix}/bin:\$path +if ("\$path" !~ *%{_prefix}/bin) then + set path=(%{_prefix}/bin \$path) +endif + +if ("1" == "\$?LD_LIBRARY_PATH") then + if ("\$LD_LIBRARY_PATH" !~ *%{_prefix}/lib) then + setenv LD_LIBRARY_PATH %{_prefix}/lib:%{_prefix}/lib/shared:\${LD_LIBRARY_PATH} endif else - setenv path %{_prefix}/bin: + setenv LD_LIBRARY_PATH %{_prefix}/lib:%{_prefix}/lib/shared endif + EOF and the fix for MVAPICH2-1.0.3 is this: --- ../mvapich2.spec.orig 2008-07-16 17:17:10.000000000 -0400 +++ mvapich2.spec 2008-07-17 09:03:19.000000000 -0400 @@ -261,12 +261,16 @@ # Additionally, create the mpivars.[c]sh files. cat >bin/mpivars.csh < Message-ID: Hi Mike, Thanks for posting these patches. In future, please feel free to post patches related to MVAPICH and MVAPICH2 to the mvapich-discuss list (cc'ed in this e-mail). Pasha (cc'ed here) will take care of the changes to the MVAPICH rpm spec file. Jonathan (cc'ed here) will take care of the changes to the MVAPICH2 rpm spec file. Thanks, DK On Thu, 17 Jul 2008, Mike Heinz wrote: > I'm not sure who the best person is to receive these changes: We've been > encountering complications when converting users to OFED 1.3 because the > scripts provided for configuring the shell (mpivars.sh and mpivars.csh) > don't update the library path. This can lead to MPI programs failing to > link or failing to run. The fix is to modify the spec files for the RPMs > for these packages so that they set the LD_LIBRARY_PATH as well as the > PATH.
> > The fix for MVAPICH-1.0.1 is this: > > --- mvapich.spec.orig 2008-07-16 17:06:44.000000000 -0400 > +++ mvapich.spec 2008-07-16 16:49:27.000000000 -0400 > @@ -300,17 +300,25 @@ > if ! echo \${PATH} | grep -q %{_prefix}/bin ; then > export PATH=%{_prefix}/bin:\${PATH} > fi > +if ! echo \${LD_LIBRARY_PATH} | grep -q %{_prefix}/lib ; then > + export > LD_LIBRARY_PATH=%{_prefix}/lib:%{_prefix}/lib:/shared:\${LD_LIBRARY_PATH > } > +fi > EOF > > # Script for csh > cat < %{build_root}/%{_prefix}/bin/%{shell_scripts_basename}.csh > -if (?$path) then > - if ( "\${path}" !~ *%{_prefix}/bin* ) then > - setenv path %{_prefix}/bin:\$path > +if ("\$path" !~ *%{_prefix}/bin) then > + set path=(%{_prefix}/bin \$path) > +endif > + > +if ("1" == "\$?LD_LIBRARY_PATH") then > + if ("\$LD_LIBRARY_PATH" !~ *%{_prefix}/lib) then > + setenv LD_LIBRARY_PATH > %{_prefix}/lib:%{_prefix}/lib/shared:\${LD_LIBRARY_PATH} > endif > else > - setenv path %{_prefix}/bin: > + setenv LD_LIBRARY_PATH %{_prefix}/lib:%{_prefix}/lib/shared > endif > + > EOF > > > and the fix for MVAPICH2-1.0.3 is this: > > > --- ../mvapich2.spec.orig 2008-07-16 17:17:10.000000000 -0400 > +++ mvapich2.spec 2008-07-17 09:03:19.000000000 -0400 > @@ -261,12 +261,16 @@ > > # Additionally, create the mpivars.[c]sh files. > cat >bin/mpivars.csh < -if (\$?path) then > - if ( "\${path}" !~ *%{_prefix}/bin* ) then > +if ("\$path" !~ *%{_prefix}/bin) then > set path = ( %{_prefix}/bin \$path ) > endif > + > +if ("1" == "\$?LD_LIBRARY_PATH") then > + if ("\$LD_LIBRARY_PATH" !~ *%{_prefix}/lib) then > + setenv LD_LIBRARY_PATH %{_prefix}/lib:\${LD_LIBRARY_PATH} > + endif > else > - set path = ( %{_prefix}/bin ) > + setenv LD_LIBRARY_PATH %{_prefix}/lib:%{_prefix}/lib/shared > endif > > if (\$?MANPATH) then > @@ -282,7 +286,9 @@ > if ! echo \${PATH} | grep -q %{_prefix}/bin ; then > PATH=%{_prefix}/bin:\${PATH} > fi > - > +if ! 
echo \${LD_LIBRARY_PATH} | grep -q %{_prefix}/lib ; then > + export LD_LIBRARY_PATH=%{_prefix}/lib:\${LD_LIBRARY_PATH} > +fi > if ! echo \${MANPATH} | grep -q %{_prefix}/man ; then > MANPATH=%{_prefix}/man:\${MANPATH} > fi > > > -- > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > > From David_Kewley at Dell.com Thu Jul 17 21:47:46 2008 From: David_Kewley at Dell.com (David_Kewley@Dell.com) Date: Thu Jul 17 21:49:00 2008 Subject: [mvapich-discuss] uninitialized struct member leading to MVAPICH 1.0 segfault? Message-ID: I have an MVAPICH 1.0 program segfaulting, and I think I may have traced it back to MVAPICH's failure to initialize a struct member before using it. We are testing a speculative fix right now. The full story follows; let me know what you think. struct MPIR_COMMUNICATOR member shmem_comm_rank is only set in one place as far as I can see: src/context/create_2level_comm.c: 100 void create_2level_comm (struct MPIR_COMMUNICATOR* comm_ptr, int size, int my_rank){ ... 208 if (shmem_comm_count < shmem_coll_blocks){ 209 shmem_ptr->shmem_comm_rank = shmem_comm_count; 210 input_flag = 1; 211 } 212 else{ 213 input_flag = 0; 214 } ... 277 } Note that shmem_comm_rank is set only if the condition holds; if the condition does not hold, then the value of shmem_comm_rank is whatever happened to be in memory at that point. So, what might that value be? Best I can figure out, memory for a struct MPIR_COMMUNICATOR is always allocated using malloc(). My manpage for malloc says that malloc() does not clear the memory it allocates, which I take to mean it does not set the memory contents to zero, but simply leaves it as it was. So if malloc() chooses to allocate memory which was previously free()'d, then the memory handed to the requester may have inappropriate, nonzero data in it.
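A minimal sketch of the allocation point above, under stated assumptions: struct demo_comm and demo_comm_new() are hypothetical stand-ins, not the real MPIR_COMMUNICATOR layout or any MVAPICH function. It only illustrates that calloc(), unlike malloc(), hands back zero-filled memory, so an integer member that no code path assigns still starts at 0.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the communicator structure; the real
 * struct MPIR_COMMUNICATOR has many more members. */
struct demo_comm {
    int shmem_comm_rank;
    int other_state;
};

/* Allocate one structure with all bytes set to zero.
 * malloc() would leave the contents indeterminate; calloc() does not.
 * Returns NULL if the allocation fails. */
static struct demo_comm *demo_comm_new(void)
{
    struct demo_comm *c = calloc(1, sizeof *c);
    return c;
}
```

A caller would use demo_comm_new() wherever it would otherwise call malloc(sizeof(struct demo_comm)) and release the result with free(). Note that all-bytes-zero is a well-defined 0 for integer members; whether 0 is a meaningful default for a given member is a separate question.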
I do not know for sure what happens if the memory happens to be freshly granted by the kernel, but I suspect in this case it is guaranteed to be zeroed by the kernel. So... If the condition (shmem_comm_count < shmem_coll_blocks) does not hold, then shmem_comm_rank is not initialized. If it is later referenced, its value is meaningless and may lead to an error. I believe that is what is happening to us; the major unknown at this point is whether we are in fact hitting the "else" part of the above clause. I'd love your comments about what is likely the case, and how we can tell without doing a printf() or similar. :) Eventually we see a segfault in free_2level_comm(): src/context/create_2level_comm.c: 62 void free_2level_comm (struct MPIR_COMMUNICATOR* comm_ptr) 63 { ... 87 if (comm_ptr->shmem_comm != MPI_COMM_NULL) { 88 struct MPIR_COMMUNICATOR* shmem_ptr; 89 shmem_ptr= MPIR_GET_COMM_PTR(comm_ptr->shmem_comm); 90 pthread_spin_lock(&shmem_coll->shmem_coll_lock); 91 shmem_coll_obj.shmem_avail[shmem_ptr->shmem_comm_rank] = 1; 92 pthread_spin_unlock(&shmem_coll->shmem_coll_lock); 93 MPI_Comm_free(&(comm_ptr->shmem_comm)); 94 } ... 98 } The segfault happens at line 91, because it appears that shmem_ptr->shmem_comm_rank is a large negative number. I suspect in fact shmem_comm_rank was never initialized (see above), which means the negative number is an "accidental" value [1]. We only see this segfault in around 1 out of 20 runs of a particular application. I suspect the ~1/20 hit rate is simply accidents of how memory gets allocated in each run. Sometimes shmem_ptr->shmem_comm_rank probably happens to sit in a memory location that contains 0, so the above code does not cause a segfault. I suspect the fact that we've only noticed this in one code may be an accident; I do not assume it is significant. We may not have visibility to whether other codes are hitting this segfault mechanism. Do you agree that this failure to initialize shmem_comm_rank is a bug? 
If so, probably the right fix is to add "shmem_ptr->shmem_comm_rank = 0;" to the "else" clause in the first code snippet above. Would you agree? That is the fix we are testing right now. Or should it be done in a structure-initialization operation somehow? Mind you, I don't know whether it is *semantically* correct to set shmem_comm_rank to 0 by default. I am doing it simply because it replicates the likely common case (~19 out of 20 runs) where the contents of that memory location often just happen to be cleared to zero. Finding this bug raises a question: How do we guarantee that there are not other unrecognized problems like this one? How do we check for use of uninitialized variables (e.g. structure members) allocated by malloc()? Is it best practice to do a memset(x, 0, sizeof(x))? This is a C-coding best-practices question, and also a question about how MPICH and MVAPICH are coded. Thanks, David [1] On x86_64 an int is 4 bytes and a pointer is 8 bytes. Looking at the contents of the 8 bytes starting at &(shmem_ptr->shmem_comm_rank), they appear to be a valid pointer value similar to other pointer values I see in this core dump. I do not know what this pointer points to (or pointed to in the past). We get shmem_comm_rank interpreted as a large negative number simply because the MSbit of the first four bytes happens to be set. I think it is incontrovertible that these eight bytes hold a pointer value that was at some point valid. This value could have been written to memory before the *MPIR_COMMUNICATOR was allocated (presumably part of an object that was free()'d). This is the hypothesis I explore above. It's also possible that this pointer was written to those eight bytes *after* the *MPIR_COMMUNICATOR was created. That is, someone is stomping on our structure. If that is the case, we should still see segfaults after fixing the failure to initialize shmem_comm_rank.
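The if/else from create_2level_comm() and the indexing at line 91 of free_2level_comm() can be sketched together, under assumptions: demo_pick_rank(), demo_release_slot() and DEMO_RANK_UNSET are hypothetical names that do not exist in MVAPICH, and the -1 sentinel is only one possible alternative to defaulting to 0, not the change being tested here.

```c
#include <stddef.h>

/* Hypothetical marker meaning "no shared-memory slot was assigned". */
#define DEMO_RANK_UNSET (-1)

/* Mirrors the quoted if/else, but produces a value on both paths so
 * the rank is never left uninitialized. */
static int demo_pick_rank(int shmem_comm_count, int shmem_coll_blocks)
{
    if (shmem_comm_count < shmem_coll_blocks) {
        return shmem_comm_count;
    }
    return DEMO_RANK_UNSET;
}

/* Marks a slot as available again only when rank is a valid index.
 * Returns 1 if a slot was released, 0 if the rank was rejected, so a
 * stray value cannot index outside the array. */
static int demo_release_slot(int *shmem_avail, size_t nslots, int rank)
{
    if (rank < 0 || (size_t)rank >= nslots) {
        return 0;
    }
    shmem_avail[rank] = 1;
    return 1;
}
```

With a sentinel, the "no slot" case is distinguishable from slot 0, and the range check at the point of use also catches a value that was overwritten after initialization, which is the second hypothesis in the footnote.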
We're doing runs right now in which shmem_comm_rank is also initialized (to 0) in the "else" clause, to check this possibility. The final possibility is that a legitimate user of this structure is writing this pointer value inappropriately. I think this is very unlikely, assuming this problem is not caused by a compiler bug, because the source code only writes to shmem_comm_rank in one place that I can see, and the code logically can only write an integer value. Regardless of the outcome of those tests, however, it is definitely a bug not to initialize shmem_comm_rank before it is used, unless I'm missing something. David Kewley Dell Infrastructure Consulting Services Onsite Engineer at the Maui HPC Center Cell: 602-460-7617 David_Kewley@Dell.com Dell Services: http://www.dell.com/services/ How am I doing? Email my manager Russell_Kelly@Dell.com with any feedback. From kernel at tekno-soft.it Fri Jul 18 12:04:57 2008 From: kernel at tekno-soft.it (Roberto Fichera) Date: Fri Jul 18 12:06:39 2008 Subject: [mvapich-discuss] Races with MPI_THREAD_MULTI Message-ID: <4880BF29.8010003@tekno-soft.it> Hi all on the list, I'm trying to use mvapich2 v1.2rc1 in a multithreaded application, initialized with MPI_THREAD_MULTIPLE. The master application does the following: it starts several threads, depending on the assigned nodes, and on each node a slave application is spawned using MPI_Comm_spawn(). Before calling MPI_Comm_spawn() I prepare an MPI_Info struct, one for each thread, in order to set all the keys (host and wdir) needed to get the desired behaviour. As soon as the master application starts, it hits the race immediately with 4 nodes, 1 master and 3 slaves. Below you can see the state of the master application at the time of the race. It seems to be stuck in PMIU_readline(), which never returns, so the global lock is never released.
MVAPICH2 is compiled with: PKG_PATH=/HRI/External/mvapich2/1.2rc1 ./configure --prefix=$PKG_PATH \ --bindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ --sbindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ --libdir=$PKG_PATH/lib/linux-x86_64-gcc-glibc2.3.4 \ --enable-sharedlibs=gcc \ --enable-f90 \ --enable-threads=multiple \ --enable-g=-ggdb \ --enable-debuginfo \ --with-device=ch3:sock \ --datadir=$PKG_PATH/data \ --with-htmldir=$PKG_PATH/doc/html \ --with-docdir=$PKG_PATH/doc \ LDFLAGS='-Wl,-z,noexecstack' so I'm using the ch3:sock device. -----Thread 2 [Switching to thread 2 (Thread 1115699536 (LWP 29479))]#0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 (gdb) bt #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 --->>#2 0x00000033ca408390 in pthread_mutex_lock () from /lib64/libpthread.so.0 --->>#3 0x00002aaaab382654 in PMPI_Info_set () from /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=, key=0x0, value=0x33ca40ff58 "!\204??\r\206??\030\204??3\206??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\177\205??\177\205??\177\205??\177\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??\033\205??\033\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??"...) 
at ParallelWorker.c:664 #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62ff50) at ParallelWorker.c:719 #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ff50) at ParallelWorker.c:504 #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 -----Thread 3 [Switching to thread 3 (Thread 1105209680 (LWP 29478))]#0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 (gdb) bt #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 --->>#2 0x00000033ca408390 in pthread_mutex_lock () from /lib64/libpthread.so.0 --->>#3 0x00002aaaab382654 in PMPI_Info_set () from /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=, key=0x0, value=0x33ca40ff58 "!\204??\r\206??\030\204??3\206??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\177\205??\177\205??\177\205??\177\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??\033\205??\033\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??"...) 
at ParallelWorker.c:664 #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62f270) at ParallelWorker.c:719 #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62f270) at ParallelWorker.c:504 #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 -----Thread 4 [Switching to thread 4 (Thread 1094719824 (LWP 29477))]#0 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 (gdb) bt #0 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 --->>#1 0x00002aaaab3db84a in PMIU_readline () from /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 --->>#2 0x00002aaaab3d9d37 in PMI_Spawn_multiple () from /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #3 0x00002aaaab333893 in MPIDI_Comm_spawn_multiple () from /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #4 0x00002aaaab38bcf6 in MPID_Comm_spawn_multiple () from /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #5 0x00002aaaab355a10 in PMPI_Comm_spawn () from /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #6 0x00000000004189d8 in ParallelWorker_handleParallel (self=0x62ad40) at ParallelWorker.c:754 #7 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ad40) at ParallelWorker.c:504 #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 I also tried to run against MPICH2 v1.0.7, but here I got a similar scenery which show up after between 1 - 2 hours of execution, see below: ----- thread 2 [Switching to thread 2 (Thread 1094719824 (LWP 1279))]#0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 (gdb) bt #0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 #1 0x00002aaaab5a3d2f in MPIDU_Sock_wait () from 
/home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #2 0x00002aaaab52bdc7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #3 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #4 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #5 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #6 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6358e0) at ParallelWorker.c:819 #7 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6358e0) at ParallelWorker.c:515 #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 ----- thread 3 [Switching to thread 3 (Thread 1084229968 (LWP 1278))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) bt #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x634d20) at ParallelWorker.c:819 #6 
0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x634d20) at ParallelWorker.c:515 #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 ----- thread 4 [Switching to thread 4 (Thread 1115699536 (LWP 1277))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) bt #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6341a0) at ParallelWorker.c:819 #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6341a0) at ParallelWorker.c:515 #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 Here thread 2 is in poll(), which never returns, so it never signals the poll() completion, and therefore all the other waiters on the MPIDI_CH3I_Progress() condition never wake up. Is anyone else seeing the same problem? Thanks in advance, Roberto Fichera. -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080718/e5357966/attachment-0001.html From panda at cse.ohio-state.edu Fri Jul 18 12:39:17 2008 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Fri Jul 18 12:39:24 2008 Subject: [mvapich-discuss] Races with MPI_THREAD_MULTI In-Reply-To: <4880BF29.8010003@tekno-soft.it> Message-ID: Hi Roberto, Thanks for your note. You are using the ch3:sock device in MVAPICH2, which is the same as in MPICH2. You are also seeing similar failure scenarios (but in different forms) with MPICH2 1.0.7. I am cc'ing this message to the mpich2 mailing list. One of the MPICH2 developers will be able to help with this issue faster. Thanks, DK On Fri, 18 Jul 2008, Roberto Fichera wrote: > Hi all on the list, > > I'm trying to use mvapich2 v1.2rc1 in a multithreaded application, > initialized with MPI_THREAD_MULTIPLE. > The master application does the following: it starts several > threads, depending on the assigned nodes, and > on each node a slave application is spawned using MPI_Comm_spawn(). > Before calling > MPI_Comm_spawn() I prepare an MPI_Info struct, one for each > thread, in order to set all the keys > (host and wdir) needed to get the desired behaviour. As soon as > the master application starts, it hits the race > immediately with 4 nodes, 1 master and 3 slaves. Below you can see the > state of the master application at the time of the race. > It seems to be stuck in PMIU_readline(), which never returns, so the > global lock is never released.
MVAPICH2 > is compiled with: > > PKG_PATH=/HRI/External/mvapich2/1.2rc1 > > ./configure --prefix=$PKG_PATH \ > --bindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ > --sbindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ > --libdir=$PKG_PATH/lib/linux-x86_64-gcc-glibc2.3.4 \ > --enable-sharedlibs=gcc \ > --enable-f90 \ > --enable-threads=multiple \ > --enable-g=-ggdb \ > --enable-debuginfo \ > --with-device=ch3:sock \ > --datadir=$PKG_PATH/data \ > --with-htmldir=$PKG_PATH/doc/html \ > --with-docdir=$PKG_PATH/doc \ > LDFLAGS='-Wl,-z,noexecstack' > > so I'm using the ch3:sock device. > > -----Thread 2 > [Switching to thread 2 (Thread 1115699536 (LWP 29479))]#0 > 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > (gdb) bt > #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 > --->>#2 0x00000033ca408390 in pthread_mutex_lock () from > /lib64/libpthread.so.0 > --->>#3 0x00002aaaab382654 in PMPI_Info_set () from > /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self= optimized out>, key=0x0, value=0x33ca40ff58 > "!\204ÿÿ\r\206ÿÿ\030\204ÿÿ3\206ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\177\205ÿÿ\177\205ÿÿ\177\205ÿÿ\177\205ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\033\205ÿÿ\033\205ÿÿ\033\205ÿÿ\033\205ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\033\205ÿÿ\033\205ÿÿ"...) 
> at ParallelWorker.c:664 > #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62ff50) > at ParallelWorker.c:719 > #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ff50) at > ParallelWorker.c:504 > #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > > -----Thread 3 > [Switching to thread 3 (Thread 1105209680 (LWP 29478))]#0 > 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > (gdb) bt > #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 > --->>#2 0x00000033ca408390 in pthread_mutex_lock () from > /lib64/libpthread.so.0 > --->>#3 0x00002aaaab382654 in PMPI_Info_set () from > /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self= optimized out>, key=0x0, value=0x33ca40ff58 > "!\204ÿÿ\r\206ÿÿ\030\204ÿÿ3\206ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\177\205ÿÿ\177\205ÿÿ\177\205ÿÿ\177\205ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\033\205ÿÿ\033\205ÿÿ\033\205ÿÿ\033\205ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\n\204ÿÿ\033\205ÿÿ\033\205ÿÿ"...) 
> at ParallelWorker.c:664 > #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62f270) > at ParallelWorker.c:719 > #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62f270) at > ParallelWorker.c:504 > #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > > -----Thread 4 > [Switching to thread 4 (Thread 1094719824 (LWP 29477))]#0 > 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 > (gdb) bt > #0 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 > --->>#1 0x00002aaaab3db84a in PMIU_readline () from > /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > --->>#2 0x00002aaaab3d9d37 in PMI_Spawn_multiple () from > /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #3 0x00002aaaab333893 in MPIDI_Comm_spawn_multiple () from > /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #4 0x00002aaaab38bcf6 in MPID_Comm_spawn_multiple () from > /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #5 0x00002aaaab355a10 in PMPI_Comm_spawn () from > /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #6 0x00000000004189d8 in ParallelWorker_handleParallel (self=0x62ad40) > at ParallelWorker.c:754 > #7 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ad40) at > ParallelWorker.c:504 > #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > > I also tried to run against MPICH2 v1.0.7, but here I got a similar > scenery which show up after between 1 - 2 hours of execution, > see below: > > ----- thread 2 > [Switching to thread 2 (Thread 1094719824 (LWP 1279))]#0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 > (gdb) bt > #0 0x00000033c94cbd66 in poll () 
from /lib64/libc.so.6 > #1 0x00002aaaab5a3d2f in MPIDU_Sock_wait () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #2 0x00002aaaab52bdc7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #3 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #4 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #5 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #6 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6358e0) at ParallelWorker.c:819 > #7 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6358e0) at ParallelWorker.c:515 > #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > > ----- thread 3 > [Switching to thread 3 (Thread 1084229968 (LWP 1278))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > (gdb) bt > #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 
> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x634d20) at ParallelWorker.c:819 > #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x634d20) at ParallelWorker.c:515 > #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > > > ----- thread 4 > [Switching to thread 4 (Thread 1115699536 (LWP 1277))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > (gdb) bt > #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6341a0) at ParallelWorker.c:819 > #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6341a0) at ParallelWorker.c:515 > #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > > where the thread 2 is poll()ing never never returns, so never signals > the poll() completion and than all the others > waiters in the MPIDI_CH3I_Progress() condition will never wake up. > > Does anyone is having the same problem? > > Thanks in advance, > Roberto Fichera. 
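The MPICH2 hang described above has one thread blocked in poll() inside the progress engine while the others sit in pthread_cond_wait() waiting for it. That pattern can be shown in miniature. What follows is only a sketch in Python, not MPICH2 code: the names run_progress_demo, socket_ready and poll_done are made up for illustration, and the waiters use a timeout so the demo terminates, whereas the waits in the traces are untimed.

```python
import threading


def run_progress_demo(poll_completes, wait_timeout=0.5):
    """One thread 'polls'; two others wait on a condition for its signal.

    Returns one boolean per waiter: True if it was woken by the poller,
    False if it gave up after wait_timeout seconds.
    """
    cond = threading.Condition()
    poll_done = [False]
    socket_ready = threading.Event()  # hypothetical stand-in for socket activity
    results = []

    def poller():
        # Stands in for the thread blocked in poll(): it can only signal
        # the others once the blocking call has returned.
        socket_ready.wait()
        with cond:
            poll_done[0] = True
            cond.notify_all()

    def waiter():
        # Stands in for a thread parked in the progress engine's condition wait.
        with cond:
            woke = cond.wait_for(lambda: poll_done[0], timeout=wait_timeout)
        results.append(woke)

    poll_thread = threading.Thread(target=poller)
    waiters = [threading.Thread(target=waiter) for _ in range(2)]
    for t in [poll_thread] + waiters:
        t.start()

    if poll_completes:
        socket_ready.set()  # the blocking call returns normally

    for t in waiters:
        t.join()

    socket_ready.set()  # release the poller so the demo always exits
    poll_thread.join()
    return results
```

With poll_completes=True both waiters are woken; with poll_completes=False both time out, which is the bounded analogue of the permanent hang seen in threads 3 and 4 of the trace.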
>

From kernel at tekno-soft.it Fri Jul 18 12:49:48 2008
From: kernel at tekno-soft.it (Roberto Fichera)
Date: Fri Jul 18 12:51:28 2008
Subject: [mvapich-discuss] Races with MPI_THREAD_MULTI
In-Reply-To:
References:
Message-ID: <4880C9AC.2050001@tekno-soft.it>

Dhabaleswar Panda wrote:
> Hi Roberto,
>
> Thanks for your note. You are using the ch3:sock device in MVAPICH2 which
> is the same as MPICH2. You are also seeing similar failure scenarios (but
> in different forms) with MPICH2 1.0.7. I am cc'ing this message to mpich2
> mailing list. One of the MPICH2 developers will be able to extend help on
> this issue faster.
>
Thanks for that. I have already sent a separate email about the mpich2
problem. The strange thing is that when linking against mpich2 the race
does not show up nearly as fast as it does with mvapich2: in the mpich2
case I had to wait 1 or 2 hours before the lock-up.
> Thanks,
>
> DK
>
>
> On Fri, 18 Jul 2008, Roberto Fichera wrote:
>
>
>> Hi All on the list,
>>
>> I'm trying to use mvapich2 v1.2rc1 in a multithreaded application,
>> initialized with MPI_THREAD_MULTIPLE.
>> The master application starts several threads, depending on the
>> assigned nodes; on each node a slave application is spawned using
>> MPI_Comm_spawn().
>> Before calling MPI_Comm_spawn() I prepare an MPI_Info struct, one for each
>> thread, in order to set all the keys (host and wdir) needed for the wanted
>> behaviour. As soon as the master application starts, it races
>> immediately with 4 nodes, 1 master and 3 slaves. Below you can see the
>> status of the master application at race
>> time. It seems stuck in PMIU_readline(), which never returns, so the
>> global lock is never released.
MVAPICH2 >> is compiled with: >> >> PKG_PATH=/HRI/External/mvapich2/1.2rc1 >> >> ./configure --prefix=$PKG_PATH \ >> --bindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ >> --sbindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ >> --libdir=$PKG_PATH/lib/linux-x86_64-gcc-glibc2.3.4 \ >> --enable-sharedlibs=gcc \ >> --enable-f90 \ >> --enable-threads=multiple \ >> --enable-g=-ggdb \ >> --enable-debuginfo \ >> --with-device=ch3:sock \ >> --datadir=$PKG_PATH/data \ >> --with-htmldir=$PKG_PATH/doc/html \ >> --with-docdir=$PKG_PATH/doc \ >> LDFLAGS='-Wl,-z,noexecstack' >> >> so I'm using the ch3:sock device. >> >> -----Thread 2 >> [Switching to thread 2 (Thread 1115699536 (LWP 29479))]#0 >> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 >> (gdb) bt >> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 >> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 >> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from >> /lib64/libpthread.so.0 >> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=> optimized out>, key=0x0, value=0x33ca40ff58 >> "!\204??\r\206??\030\204??3\206??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\177\205??\177\205??\177\205??\177\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??\033\205??\033\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??"...) 
>> at ParallelWorker.c:664 >> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62ff50) >> at ParallelWorker.c:719 >> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ff50) at >> ParallelWorker.c:504 >> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >> >> -----Thread 3 >> [Switching to thread 3 (Thread 1105209680 (LWP 29478))]#0 >> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 >> (gdb) bt >> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 >> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 >> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from >> /lib64/libpthread.so.0 >> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=> optimized out>, key=0x0, value=0x33ca40ff58 >> "!\204??\r\206??\030\204??3\206??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\177\205??\177\205??\177\205??\177\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??\033\205??\033\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??"...) 
>> at ParallelWorker.c:664 >> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62f270) >> at ParallelWorker.c:719 >> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62f270) at >> ParallelWorker.c:504 >> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >> >> -----Thread 4 >> [Switching to thread 4 (Thread 1094719824 (LWP 29477))]#0 >> 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 >> (gdb) bt >> #0 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 >> --->>#1 0x00002aaaab3db84a in PMIU_readline () from >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> --->>#2 0x00002aaaab3d9d37 in PMI_Spawn_multiple () from >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #3 0x00002aaaab333893 in MPIDI_Comm_spawn_multiple () from >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #4 0x00002aaaab38bcf6 in MPID_Comm_spawn_multiple () from >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #5 0x00002aaaab355a10 in PMPI_Comm_spawn () from >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #6 0x00000000004189d8 in ParallelWorker_handleParallel (self=0x62ad40) >> at ParallelWorker.c:754 >> #7 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ad40) at >> ParallelWorker.c:504 >> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >> >> I also tried to run against MPICH2 v1.0.7, but here I got a similar >> scenery which show up after between 1 - 2 hours of execution, >> see below: >> >> ----- thread 2 >> [Switching to thread 2 (Thread 1094719824 (LWP 1279))]#0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 >> (gdb) bt 
>> #0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 >> #1 0x00002aaaab5a3d2f in MPIDU_Sock_wait () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #2 0x00002aaaab52bdc7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #3 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #4 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #5 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #6 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6358e0) at ParallelWorker.c:819 >> #7 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6358e0) at ParallelWorker.c:515 >> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >> >> ----- thread 3 >> [Switching to thread 3 (Thread 1084229968 (LWP 1278))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >> (gdb) bt >> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >> #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from 
/home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x634d20) at ParallelWorker.c:819 >> #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x634d20) at ParallelWorker.c:515 >> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >> >> >> ----- thread 4 >> [Switching to thread 4 (Thread 1115699536 (LWP 1277))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >> (gdb) bt >> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >> #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6341a0) at ParallelWorker.c:819 >> #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6341a0) at ParallelWorker.c:515 >> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >> >> where the thread 2 is poll()ing never never returns, so never signals >> the poll() completion and than all the others >> waiters in the MPIDI_CH3I_Progress() condition will never wake up. >> >> Does anyone is having the same problem? >> >> Thanks in advance, >> Roberto Fichera. 
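The MVAPICH2 traces earlier in this message have a different shape: threads 2 and 3 are blocked in pthread_mutex_lock() under PMPI_Info_set(), while thread 4 sits in read() under PMIU_readline(). The report's reading is that thread 4 holds a global lock while waiting for a reply that never arrives. The sketch below illustrates that reading only. It is Python, it is not taken from MVAPICH2, every name in it (run_global_lock_demo, global_lock, reply) is hypothetical, and lock acquisition is timed so that the demo exits.

```python
import threading


def run_global_lock_demo(reply_arrives, lock_timeout=0.3):
    """One thread blocks on a 'read' while holding a lock two others need.

    Returns one boolean per contending thread: True if it obtained the
    lock within lock_timeout seconds, False otherwise.
    """
    global_lock = threading.Lock()  # hypothetical stand-in for a library-wide mutex
    reply = threading.Event()       # hypothetical stand-in for the launcher's reply
    in_spawn = threading.Event()    # set once the spawner owns the lock
    acquired = []

    def spawner():
        # Stands in for the thread inside the spawn call: it takes the lock
        # and then blocks reading the reply without releasing it.
        with global_lock:
            in_spawn.set()
            reply.wait()

    def info_setter():
        # Stands in for a thread calling another library routine that
        # needs the same lock.
        in_spawn.wait()
        got = global_lock.acquire(timeout=lock_timeout)
        if got:
            global_lock.release()
        acquired.append(got)

    spawn_thread = threading.Thread(target=spawner)
    setters = [threading.Thread(target=info_setter) for _ in range(2)]
    for t in [spawn_thread] + setters:
        t.start()

    if reply_arrives:
        reply.set()  # the reply comes back, so the lock is released

    for t in setters:
        t.join()

    reply.set()  # unblock the spawner so the demo always exits
    spawn_thread.join()
    return acquired
```

When the reply arrives, both contending threads get the lock; when it does not, both give up, mirroring threads 2 and 3 waiting behind thread 4 in the trace.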
>>
>>
>
>
>

From koop at cse.ohio-state.edu Fri Jul 18 14:14:56 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Fri Jul 18 14:15:05 2008
Subject: [mvapich-discuss] Races with MPI_THREAD_MULTI
In-Reply-To: <4880C9AC.2050001@tekno-soft.it>
Message-ID:

Hi Roberto,

Are you using the new 'mpirun_rsh' command for launching your job? If so,
that would explain the hang you see in the PMI calls (and why they happen
at the spawn).

We currently do not support spawn functionality in this release for
mpirun_rsh. You will need to use MPD if your application needs spawn
functionality until we release an updated version of mpirun_rsh.

Thanks,

Matt

On Fri, 18 Jul 2008, Roberto Fichera wrote:

> Dhabaleswar Panda wrote:
> > Hi Roberto,
> >
> > Thanks for your note. You are using the ch3:sock device in MVAPICH2 which
> > is the same as MPICH2. You are also seeing similar failure scenarios (but
> > in different forms) with MPICH2 1.0.7. I am cc'ing this message to mpich2
> > mailing list. One of the MPICH2 developers will be able to extend help on
> > this issue faster.
> >
> Thanks for that. I have already sent a separate email about the mpich2
> problem. The strange thing is that when linking against mpich2 the race
> does not show up nearly as fast as it does with mvapich2: in the mpich2
> case I had to wait 1 or 2 hours before the lock-up.
> > Thanks,
> >
> > DK
> >
> >
> > On Fri, 18 Jul 2008, Roberto Fichera wrote:
> >
> >
> >> Hi All on the list,
> >>
> >> I'm trying to use mvapich2 v1.2rc1 in a multithreaded application,
> >> initialized with MPI_THREAD_MULTIPLE.
> >> The master application starts several threads, depending on the
> >> assigned nodes; on each node a slave application is spawned using
> >> MPI_Comm_spawn().
> >> Before to call the > >> MPI_Comm_spawn() I prepare the given MPI_Info struct, one for each > >> thread, in order to set the all keys > >> (host and wdir) for addressing the wanted behaviour. So, as sooner as > >> the master application starts, it races > >> immediately with 4 nodes, 1 master and 3 slaves. Below you can see the > >> status of the master application at race > >> time. It seems stuck on the PMIU_readline() which never returns so the > >> global lock is never relesead. MVAPICH2 > >> is compiled with: > >> > >> PKG_PATH=/HRI/External/mvapich2/1.2rc1 > >> > >> ./configure --prefix=$PKG_PATH \ > >> --bindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ > >> --sbindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ > >> --libdir=$PKG_PATH/lib/linux-x86_64-gcc-glibc2.3.4 \ > >> --enable-sharedlibs=gcc \ > >> --enable-f90 \ > >> --enable-threads=multiple \ > >> --enable-g=-ggdb \ > >> --enable-debuginfo \ > >> --with-device=ch3:sock \ > >> --datadir=$PKG_PATH/data \ > >> --with-htmldir=$PKG_PATH/doc/html \ > >> --with-docdir=$PKG_PATH/doc \ > >> LDFLAGS='-Wl,-z,noexecstack' > >> > >> so I'm using the ch3:sock device. 
> >>
> >> -----Thread 2
> >> [Switching to thread 2 (Thread 1115699536 (LWP 29479))]#0
> >> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0
> >> (gdb) bt
> >> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0
> >> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0
> >> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from
> >> /lib64/libpthread.so.0
> >> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from
> >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1
> >> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=<value
> >> optimized out>, key=0x0, value=0x33ca40ff58
> >> "!\204??\r\206??\030\204??3\206??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\177\205??\177\205??\177\205??\177\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??\033\205??\033\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??"...)
> >> at ParallelWorker.c:664
> >> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62ff50)
> >> at ParallelWorker.c:719
> >> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ff50) at
> >> ParallelWorker.c:504
> >> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0
> >> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6
> >>
> >> -----Thread 3
> >> [Switching to thread 3 (Thread 1105209680 (LWP 29478))]#0
> >> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0
> >> (gdb) bt
> >> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0
> >> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0
> >> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from
> >> /lib64/libpthread.so.0
> >> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from
> >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1
> >> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=<value
> >> optimized out>, key=0x0, value=0x33ca40ff58
> >> "!\204??\r\206??\030\204??3\206??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\177\205??\177\205??\177\205??\177\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??\033\205??\033\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??"...)
> >> at ParallelWorker.c:664 > >> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62f270) > >> at ParallelWorker.c:719 > >> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62f270) at > >> ParallelWorker.c:504 > >> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >> > >> -----Thread 4 > >> [Switching to thread 4 (Thread 1094719824 (LWP 29477))]#0 > >> 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 > >> (gdb) bt > >> #0 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 > >> --->>#1 0x00002aaaab3db84a in PMIU_readline () from > >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> --->>#2 0x00002aaaab3d9d37 in PMI_Spawn_multiple () from > >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #3 0x00002aaaab333893 in MPIDI_Comm_spawn_multiple () from > >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #4 0x00002aaaab38bcf6 in MPID_Comm_spawn_multiple () from > >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #5 0x00002aaaab355a10 in PMPI_Comm_spawn () from > >> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #6 0x00000000004189d8 in ParallelWorker_handleParallel (self=0x62ad40) > >> at ParallelWorker.c:754 > >> #7 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ad40) at > >> ParallelWorker.c:504 > >> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >> > >> I also tried to run against MPICH2 v1.0.7, but here I got a similar > >> scenery which show up after between 1 - 2 hours of execution, > >> see below: > >> > >> ----- thread 2 > >> [Switching to thread 2 (Thread 1094719824 (LWP 
1279))]#0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 > >> (gdb) bt > >> #0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 > >> #1 0x00002aaaab5a3d2f in MPIDU_Sock_wait () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #2 0x00002aaaab52bdc7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #3 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #4 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #5 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #6 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6358e0) at ParallelWorker.c:819 > >> #7 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6358e0) at ParallelWorker.c:515 > >> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >> > >> ----- thread 3 > >> [Switching to thread 3 (Thread 1084229968 (LWP 1278))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >> (gdb) bt > >> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >> #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from 
/home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x634d20) at ParallelWorker.c:819 > >> #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x634d20) at ParallelWorker.c:515 > >> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >> > >> > >> ----- thread 4 > >> [Switching to thread 4 (Thread 1115699536 (LWP 1277))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >> (gdb) bt > >> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >> #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6341a0) at ParallelWorker.c:819 > >> #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6341a0) at ParallelWorker.c:515 > >> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >> > >> where the thread 2 is poll()ing never never returns, so never signals > >> the poll() completion and than all the others > >> waiters in the 
MPIDI_CH3I_Progress() condition will never wake up.
> >>
> >> Is anyone else seeing the same problem?
> >>
> >> Thanks in advance,
> >> Roberto Fichera.
> >>
> >>
> >
> >
> >
> >

From kernel at tekno-soft.it Fri Jul 18 14:17:55 2008
From: kernel at tekno-soft.it (Roberto Fichera)
Date: Fri Jul 18 14:19:28 2008
Subject: [mvapich-discuss] Races with MPI_THREAD_MULTI
In-Reply-To:
References:
Message-ID: <4880DE53.8060400@tekno-soft.it>

Matthew Koop wrote:
> Hi Roberto,
>
> Are you using the new 'mpirun_rsh' command for launching your job? If so,
> that would explain the hang you see in the PMI calls (and why they happen
> at the spawn).
>
> We currently do not support spawn functionality in this release for
> mpirun_rsh. You will need to use MPD if your application needs spawn
> functionality until we release an updated version of mpirun_rsh.
>
No! I launch it either through a Torque script or interactively through
Torque (qsub -I TestParallel1.pbs). Below is the relevant part of the PBS
script:

## Compute the number of associated nodes
NODES=`wc -l < $PBS_NODEFILE`

# Arrange the PBS host file for MPI handling
TMPFILE=`mktemp` || exit 1
sort $PBS_NODEFILE | uniq -c | awk '{ printf("%s:%s\n", $2, $1); }' > $TMPFILE

## start MPI with the requested nodes
$HGR/External/mvapich2/1.2/bin/$MAKEFILE_PLATFORM/mpdboot -n $NODES -f $TMPFILE

## Run the application
/data/roberto/newBST/Libraries/Parallelization/Parallel/1.0/examples/TestParallel1.sh

# remove the temporary file
rm -f $TMPFILE

## Quit from MPI
$HGR/External/mvapich2/1.2/bin/$MAKEFILE_PLATFORM/mpdallexit

> Thanks,
>
> Matt
>
> On Fri, 18 Jul 2008, Roberto Fichera wrote:
>
>
>> Dhabaleswar Panda wrote:
>>
>>> Hi Roberto,
>>>
>>> Thanks for your note. You are using the ch3:sock device in MVAPICH2 which
>>> is the same as MPICH2. You are also seeing similar failure scenarios (but
>>> in different forms) with MPICH2 1.0.7. I am cc'ing this message to mpich2
>>> mailing list.
One of the MPICH2 developers will be able to extend help on >>> this issue faster. >>> >>> >> Thanks for that. About the mpich2 problem, I already sent an email >> regarding its related issue. >> But the strange thing is that when linking against mpich2 I don't see a >> so fast race as I see in the >> mvapich2. In the mpich2 case I had to wait 1 or 2 hours before the lock. >> >>> Thanks, >>> >>> DK >>> >>> >>> On Fri, 18 Jul 2008, Roberto Fichera wrote: >>> >>> >>> >>>> Hi All on the list, >>>> >>>> I'm trying to use mvapich2 v1.2rc1 in a multithreaded application, >>>> initialize using MPI_THREAD_MULTI. >>>> I've the master application doing the following thing, start several >>>> thread depending by the assigned nodes, >>>> on each node a slave application is spawned using the MPI_Comm_spawn(). >>>> Before to call the >>>> MPI_Comm_spawn() I prepare the given MPI_Info struct, one for each >>>> thread, in order to set the all keys >>>> (host and wdir) for addressing the wanted behaviour. So, as sooner as >>>> the master application starts, it races >>>> immediately with 4 nodes, 1 master and 3 slaves. Below you can see the >>>> status of the master application at race >>>> time. It seems stuck on the PMIU_readline() which never returns so the >>>> global lock is never relesead. MVAPICH2 >>>> is compiled with: >>>> >>>> PKG_PATH=/HRI/External/mvapich2/1.2rc1 >>>> >>>> ./configure --prefix=$PKG_PATH \ >>>> --bindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ >>>> --sbindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ >>>> --libdir=$PKG_PATH/lib/linux-x86_64-gcc-glibc2.3.4 \ >>>> --enable-sharedlibs=gcc \ >>>> --enable-f90 \ >>>> --enable-threads=multiple \ >>>> --enable-g=-ggdb \ >>>> --enable-debuginfo \ >>>> --with-device=ch3:sock \ >>>> --datadir=$PKG_PATH/data \ >>>> --with-htmldir=$PKG_PATH/doc/html \ >>>> --with-docdir=$PKG_PATH/doc \ >>>> LDFLAGS='-Wl,-z,noexecstack' >>>> >>>> so I'm using the ch3:sock device. 
>>>> >>>> -----Thread 2 >>>> [Switching to thread 2 (Thread 1115699536 (LWP 29479))]#0 >>>> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 >>>> (gdb) bt >>>> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 >>>> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 >>>> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from >>>> /lib64/libpthread.so.0 >>>> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from >>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=>>> optimized out>, key=0x0, value=0x33ca40ff58 >>>> "!\204??\r\206??\030\204??3\206??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\177\205??\177\205??\177\205??\177\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??\033\205??\033\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??"...) 
>>>> at ParallelWorker.c:664 >>>> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62ff50) >>>> at ParallelWorker.c:719 >>>> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ff50) at >>>> ParallelWorker.c:504 >>>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >>>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >>>> >>>> -----Thread 3 >>>> [Switching to thread 3 (Thread 1105209680 (LWP 29478))]#0 >>>> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 >>>> (gdb) bt >>>> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 >>>> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 >>>> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from >>>> /lib64/libpthread.so.0 >>>> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from >>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=>>> optimized out>, key=0x0, value=0x33ca40ff58 >>>> "!\204??\r\206??\030\204??3\206??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\177\205??\177\205??\177\205??\177\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??\033\205??\033\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??"...) 
>>>> at ParallelWorker.c:664 >>>> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62f270) >>>> at ParallelWorker.c:719 >>>> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62f270) at >>>> ParallelWorker.c:504 >>>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >>>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >>>> >>>> -----Thread 4 >>>> [Switching to thread 4 (Thread 1094719824 (LWP 29477))]#0 >>>> 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 >>>> (gdb) bt >>>> #0 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 >>>> --->>#1 0x00002aaaab3db84a in PMIU_readline () from >>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> --->>#2 0x00002aaaab3d9d37 in PMI_Spawn_multiple () from >>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #3 0x00002aaaab333893 in MPIDI_Comm_spawn_multiple () from >>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #4 0x00002aaaab38bcf6 in MPID_Comm_spawn_multiple () from >>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #5 0x00002aaaab355a10 in PMPI_Comm_spawn () from >>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #6 0x00000000004189d8 in ParallelWorker_handleParallel (self=0x62ad40) >>>> at ParallelWorker.c:754 >>>> #7 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ad40) at >>>> ParallelWorker.c:504 >>>> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >>>> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >>>> >>>> I also tried to run against MPICH2 v1.0.7, but here I got a similar >>>> scenery which show up after between 1 - 2 hours of execution, >>>> see below: >>>> >>>> ----- thread 2 >>>> [Switching to thread 2 (Thread 1094719824 (LWP 
1279))]#0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 >>>> (gdb) bt >>>> #0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 >>>> #1 0x00002aaaab5a3d2f in MPIDU_Sock_wait () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #2 0x00002aaaab52bdc7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #3 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #4 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #5 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #6 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6358e0) at ParallelWorker.c:819 >>>> #7 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6358e0) at ParallelWorker.c:515 >>>> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >>>> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >>>> >>>> ----- thread 3 >>>> [Switching to thread 3 (Thread 1084229968 (LWP 1278))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>>> (gdb) bt >>>> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>>> #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from 
/home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x634d20) at ParallelWorker.c:819 >>>> #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x634d20) at ParallelWorker.c:515 >>>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >>>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >>>> >>>> >>>> ----- thread 4 >>>> [Switching to thread 4 (Thread 1115699536 (LWP 1277))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>>> (gdb) bt >>>> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>>> #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6341a0) at ParallelWorker.c:819 >>>> #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6341a0) at ParallelWorker.c:515 >>>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >>>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >>>> >>>> where the thread 2 is poll()ing never never returns, so never signals >>>> the poll() completion and than all the others >>>> waiters in the 
MPIDI_CH3I_Progress() condition will never wake up.
>>>>
>>>> Is anyone else having the same problem?
>>>>
>>>> Thanks in advance,
>>>> Roberto Fichera.
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080718/af1f7f6c/attachment-0001.html

From noam.bernstein at nrl.navy.mil Fri Jul 18 15:39:34 2008
From: noam.bernstein at nrl.navy.mil (Noam Bernstein)
Date: Fri Jul 18 15:39:40 2008
Subject: [mvapich-discuss] mpi_ssend and system call (fortran) conflict
Message-ID:

I've been seeing a problem with mvapich with Intel ifort (10.1.015),
where a call to system will cause subsequent calls to mpi_ssend to hang
or crash. Using mpi_send seems to be fine, and removing the call to
system also fixes the problem.

This is on a dual Opteron, Infiniband system (Mellanox HCAs), with
OFED 1.2.5.1. There A relaxed error (related to fork) was supposedly
fixed in this version of OFED (1.2.1 according to the release notes
http://www.open-mpi.org/svn/new.php).

mvapich-1.0.1 with make.mvapich.gen2 hangs

mvapich2-1.0.3 with make.mvapich2.ofa crashes, with the message:

send desc error
[0] Abort: [] Got completion with error 1, vendor code=69, dest rank=1
 at line 519 in file ibv_channel_manager.c

(rank 0 is the sender).

I'm starting with this list, since the fact that mpi_ssend has problems
but mpi_send does not makes me think that it's either an mvapich problem
or an mvapich-specific interaction with OFED.

thanks,
Noam

-------------- next part --------------
A non-text attachment was scrubbed...
Name: summary.tgz
Type: application/octet-stream
Size: 2177 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080718/9753d963/summary.obj

From koop at cse.ohio-state.edu Fri Jul 18 15:51:01 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Fri Jul 18 15:51:08 2008
Subject: [mvapich-discuss] mpi_ssend and system call (fortran) conflict
In-Reply-To:
Message-ID:

Hi Noam,

By default we do not turn on fork-safe support, since it carries a
slight performance overhead. To enable it, set the IBV_FORK_SAFE=1
environment variable, e.g.

mpirun_rsh -np X -hostfile Y IBV_FORK_SAFE=1 ./exec

This is in section 7.1.2 of the MVAPICH user guide as well.

Let us know if this helps or not. Thanks!

Matt

On Fri, 18 Jul 2008, Noam Bernstein wrote:

> I've been seeing a problem with mvapich with Intel ifort (10.1.015),
> where a call to system will cause subsequent calls to mpi_ssend to
> hang or crash. Using mpi_send seems to be fine, and removing the call
> to system also fixes the problem.
>
> This is on a dual Opteron, Infiniband system (Mellanox HCAs), with
> OFED 1.2.5.1. There A relaxed error (related to fork) was supposedly
> fixed in this version of OFED (1.2.1 according to the release notes
> http://www.open-mpi.org/svn/new.php).
>
> mvapich-1.0.1 with make.mvapich.gen2 hangs
>
> mvapich2-1.0.3 with make.mvapich2.ofa crashes, with the message:
>
> send desc error
> [0] Abort: [] Got completion with error 1, vendor code=69, dest rank=1
> at line 519 in file ibv_channel_manager.c
>
> (rank 0 is the sender).
>
> I'm starting with this list, since the fact that mpi_ssend has
> problems but mpi_send does not makes me think that it's either an
> mvapich problem or an mvapich specific interaction with OFED.
>
> thanks,
> Noam
>
>

From panda at cse.ohio-state.edu Fri Jul 18 15:59:52 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri Jul 18 16:00:00 2008
Subject: [mvapich-discuss] mpi_ssend and system call (fortran) conflict
In-Reply-To:
Message-ID:

Noam,

Section 8.1.2 of the MVAPICH2 1.0.3 user guide (a similar section exists
for MVAPICH 1.0.1 too) says the following about the use of fork() and
system() calls.

=================================================================
fork() and system() calls are supported for the OpenFabrics device as
long as the kernel being used is Linux 2.6.16 or newer. Additionally,
the version of OFED used should be 1.2 or higher. The environment
variable IBV_FORK_SAFE=1 must also be set to enable fork support.
=================================================================

Are all these constraints being satisfied?

Thanks,

DK

On Fri, 18 Jul 2008, Noam Bernstein wrote:

> I've been seeing a problem with mvapich with Intel ifort (10.1.015),
> where a call to system will cause subsequent calls to mpi_ssend to
> hang or crash. Using mpi_send seems to be fine, and removing the call
> to system also fixes the problem.
>
> This is on a dual Opteron, Infiniband system (Mellanox HCAs), with
> OFED 1.2.5.1. There A relaxed error (related to fork) was supposedly
> fixed in this version of OFED (1.2.1 according to the release notes
> http://www.open-mpi.org/svn/new.php).
>
> mvapich-1.0.1 with make.mvapich.gen2 hangs
>
> mvapich2-1.0.3 with make.mvapich2.ofa crashes, with the message:
>
> send desc error
> [0] Abort: [] Got completion with error 1, vendor code=69, dest rank=1
> at line 519 in file ibv_channel_manager.c
>
> (rank 0 is the sender).
>
> I'm starting with this list, since the fact that mpi_ssend has
> problems but mpi_send does not makes me think that it's either an
> mvapich problem or an mvapich specific interaction with OFED.
>
> thanks,
> Noam
>
>

From noam.bernstein at nrl.navy.mil Fri Jul 18 16:37:46 2008
From: noam.bernstein at nrl.navy.mil (Noam Bernstein)
Date: Fri Jul 18 16:38:13 2008
Subject: [mvapich-discuss] mpi_ssend and system call (fortran) conflict
In-Reply-To:
References:
Message-ID:

Aah - I missed the IBV_FORK_SAFE requirement. I've activated that, but
it's complaining that it can't do it:

jellium.nrl.navy.mil 1224 : cat ssend_forksafe.mvapich2-1.0.3.stderr
libibverbs: Warning: fork()-safety requested but init failed
libibverbs: Warning: fork()-safety requested but init failed
libibverbs: Warning: fork()-safety requested but init failed
libibverbs: Warning: fork()-safety requested but init failed
send desc error
[0] Abort: [] Got completion with error 1, vendor code=69, dest rank=1
 at line 519 in file ibv_channel_manager.c

Same error messages from mvapich-1.0.1, and both mvapich and mvapich2
fail in the same way as before (since presumably they're not _actually_
running in fork-safe mode).

It's perhaps looking more like an OFED problem now, I suppose, but if it
is, maybe you can give me some clues as to what to look for?

Noam

From kumarra at cse.ohio-state.edu Fri Jul 18 16:49:16 2008
From: kumarra at cse.ohio-state.edu (rahul kumar)
Date: Fri Jul 18 16:49:25 2008
Subject: [mvapich-discuss] uninitialized struct member leading to MVAPICH 1.0 segfault?
In-Reply-To:
Message-ID:

Hi David,
Thanks for reporting this and for your analysis.
You are correct that the variable shmem_comm_rank is not initialized in
the else branch of the statements below:

    if (shmem_comm_count < shmem_coll_blocks){
        shmem_ptr->shmem_comm_rank = shmem_comm_count;
        input_flag = 1;
    }
    else{
        input_flag = 0;
    }

However, that might not actually be required. Although shmem_comm_rank
is not initialized, input_flag is set to 0.
Follow the variable input_flag into the following statement:

    MPI_Allreduce(&input_flag, &output_flag, 1, MPI_INT, MPI_LAND,
                  comm_ptr->self);

The statement above sets output_flag to 0 if any one of the processes
has input_flag set to 0. Based on that, the variable shmem_coll_ok is
set to 0 in the following part of the same create_2level_comm()
function:

    if (output_flag == 1){
        comm_ptr->shmem_coll_ok = 1;
    }
    else{
        comm_ptr->shmem_coll_ok = 0;
    }

Now look at the place in free_2level_comm() where the variable
shmem_comm_rank is used. That happens only when shmem_coll_ok is 1,
which would not be the case here:

    if ((my_local_id == 0)&&(comm_ptr->shmem_coll_ok == 1)){
        pthread_spin_lock(&shmem_coll->shmem_coll_lock);
        shmem_coll_obj.shmem_avail[shmem_ptr->shmem_comm_rank] = 1;
        pthread_spin_unlock(&shmem_coll->shmem_coll_lock);
    }

Therefore, not initializing the variable shmem_comm_rank should not
cause a problem.

If you could send us a reproducer and/or a backtrace of the segfault, we
will be happy to help you.

Regards,
rahul.

On Thu, 17 Jul 2008 David_Kewley@Dell.com wrote:

> I have an MVAPICH 1.0 program segfaulting, and I think I may have
> traced it back to MVAPICH's failure to initialize a struct member
> before using it. We are testing a speculative fix right now. The full
> story follows; let me know what you think.
>
> struct MPI_COMMUNICATOR member shmem_comm_rank is only set in one
> place as far as I can see:
>
> src/context/create_2level_comm.c:
>
> 100 void create_2level_comm (struct MPIR_COMMUNICATOR* comm_ptr, int size, int my_rank){
> ...
> 208 if (shmem_comm_count < shmem_coll_blocks){
> 209 shmem_ptr->shmem_comm_rank = shmem_comm_count;
> 210 input_flag = 1;
> 211 }
> 212 else{
> 213 input_flag = 0;
> 214 }
> ...
> 277 } > > Note that shmem_comm_rank is set only if the condition holds; if the > condition > does not hold, then the value of shmem_comm_rank is whatever happened to > be > in memory at that point. So, what might that value be? > > Best I can figure out, memory for a struct MPIR_COMMUNICATOR is always > allocated using malloc(). My manpage for malloc says that malloc() does > not > clear the memory it allocates, which I take to mean it does not set the > memory contents to zero, but simply leaves it as it was. So if malloc() > > chooses to allocate memory which was previously free()'d, then the > memory > handed to the requester may have inappropriate, nonzero data in it. I > do not > know for sure what happens if the memory happens to be freshly granted > by the > kernel, but I suspect in this case it is guaranteed to be zeroed by the > kernel. > > So... If the condition (shmem_comm_count < shmem_coll_blocks) does not > hold, > then shmem_comm_rank is not initialized. If it is later referenced, its > > value is meaningless and may lead to an error. > > I believe that is what is happening to us; the major unknown at this > point is > whether we are in fact hitting the "else" part of the above clause. I'd > love > your comments about what is likely the case, and how we can tell > without > doing a printf() or similar. :) > > Eventually we see a segfault in free_2level_comm(): > > src/context/create_2level_comm.c: > > 62 void free_2level_comm (struct MPIR_COMMUNICATOR* comm_ptr) > 63 { > ... > 87 if (comm_ptr->shmem_comm != MPI_COMM_NULL) { > 88 struct MPIR_COMMUNICATOR* shmem_ptr; > 89 shmem_ptr= MPIR_GET_COMM_PTR(comm_ptr->shmem_comm); > 90 pthread_spin_lock(&shmem_coll->shmem_coll_lock); > 91 > shmem_coll_obj.shmem_avail[shmem_ptr->shmem_comm_rank] = 1; > 92 pthread_spin_unlock(&shmem_coll->shmem_coll_lock); > 93 MPI_Comm_free(&(comm_ptr->shmem_comm)); > 94 } > ... 
> 98 } > > The segfault happens at line 91, because it appears that > shmem_ptr->shmem_comm_rank is a large negative number. I suspect in > fact > shmem_comm_rank was never initialized (see above), which means the > negative > number is an "accidental" value [1]. > > We only see this segfault in around 1 out of 20 runs of a particular > application. I suspect the ~1/20 hit rate is simply accidents of how > memory > gets allocated in each run. Sometimes shmem_ptr->shmem_comm_rank > probably > happens to sit in a memory location that contains 0, so the above code > does > not cause a segfault. > > I suspect the fact that we've only noticed this in one code may be an > accident; I do not assume it is significant. We may not have visibility > to > whether other codes are hitting this segfault mechanism. > > Do you agree that this failure to initialize shmem_comm_rank is a bug? > If so, > probably the right fix is to add "shmem_ptr->shmem_comm_rank = 0;" to > the "else" clause in the first code snippet above. Would you agree? > That is > the fix we are testing right now. Or should it be done in a > structure-initialization operation somehow? > > Mind you, I don't know whether it is *semantically* correct to set > shmem_comm_rank to 0 by default. I am doing it simply because it > replicates > the likely common case (~19 out of 20 runs) where the contents of that > memory > location often just happen to be cleared to zero. > > Finding this bug raises a question: How do we guarantee that there are > not > other unrecognized problems like this one? How to we check for use of > uninitialized variables (e.g. structure members) allocated by malloc()? > Is > it best practice to do a memset(x, 0, sizeof(x))? This is a C-coding > best-practices question, and also a question about how MPICH and MVAPICH > are > coded. > > Thanks, > David > > > [1] On x86_64 an int is 4 bytes and a pointer is 8 bytes. 
Looking at > the > contents of the 8 bytes starting at &(shmem_ptr->shmem_comm_rank), they > appear to be a valid pointer value similar to other pointer values I see > in > this core dump. I do not know what this pointer points to (or pointed > to in > the past). We get shmem_comm_rank interpreted as a large negative > number > simply because the MSbit of the first four bytes happens to be set. > > I think it is incontrovertible that these eight bytes hold a pointer > value > that was at some point valid. This value could have been written to > memory > before the *MPIR_COMMUNICATOR was allocated (presumably part of an > object > that was free()'d). This is the hypothesis I explore above. > > It's also possible that this pointer was written to those eight bytes > *after* > the *MPIR_COMMUNICATOR was created. That is, someone is stomping on our > > structure. If that is the case, we should still see segfaults after > fixing > the failure to initialize shmem_comm_rank. We're doing runs right now > in > which shmem_comm_rank is also initialized (to 0) in the "else" clause, > to > check this possibility. > > The final possibility is that a legitimate user of this structure is > writing > this pointer value inappropriately. I think this is very unlikely, > assuming > this problem is not caused by a compiler bug, because the source code > only > writes to shmem_coll_rank in one place that I can see, and the code > logically > can only write an integer value. > > Regardless of the outcome of those tests, however, it is definitely a > bug not > to initialize shmem_comm_rank before it is used, unless I'm missing > something. > > > David Kewley > Dell Infrastructure Consulting Services > Onsite Engineer at the Maui HPC Center > Cell: 602-460-7617 > David_Kewley@Dell.com > > Dell Services: http://www.dell.com/services/ > How am I doing? Email my manager Russell_Kelly@Dell.com with any > feedback. 
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From David_Kewley at Dell.com Fri Jul 18 17:32:56 2008
From: David_Kewley at Dell.com (David_Kewley@Dell.com)
Date: Fri Jul 18 17:34:07 2008
Subject: [mvapich-discuss] uninitialized struct member leading to MVAPICH 1.0 segfault?
In-Reply-To:
References:
Message-ID:

Rahul,

Thanks for your detailed analysis. It looks like we are looking at
different versions of free_2level_comm(). Mine (mvapich-1.0) has no
mention of shmem_coll_ok whatsoever, but I see that mvapich-1.0.1 has
the check on shmem_coll_ok, which I agree should avoid this issue.

It appears that 1.0.1 fixed this particular problem that exists in 1.0.

Looks like it's time for us to figure out how to organize an upgrade.
We're generally a bit slow in order to minimize churn for our users.

How does one best avoid or detect this class of problems in general
(using uninitialized variables)? Seems too much to ask the programmer
to be careful about each and every possible instance of this problem in
detail. Is memset(x,0,sizeof(x)) the general answer?

Thanks!
David

> -----Original Message-----
> From: rahul kumar [mailto:kumarra@cse.ohio-state.edu]
> Sent: Friday, July 18, 2008 10:49 AM
> To: Kewley, David
> Cc: mvapich-discuss@cse.ohio-state.edu
> Subject: Re: [mvapich-discuss] uninitialized struct member leading to
> MVAPICH 1.0 segfault?
>
> Hi David,
> Thanks for reporting this and for your analysis.
> You are correct that the variable shmem_comm_rank is not being initialized
> in else part of the below statements
> if (shmem_comm_count < shmem_coll_blocks){
> shmem_ptr->shmem_comm_rank = shmem_comm_count;
> input_flag = 1;
> }
> else{
> input_flag = 0;
> }
> However, that actually might not be required. Although shmem_comm_rank is
> not initialized but input_flag is set 0.
If you follow the variable > input_flag in the following statements: > MPI_Allreduce(&input_flag, &output_flag, 1, MPI_INT, MPI_LAND, > comm_ptr->self); > > The above statement would set output_flag as 0 if any one of the processes > has input_flag as 0. Based on that the variable shmem_coll_ok is set to 0 > in the following part of the code in the same create_2level_comm() > function. > if (output_flag == 1){ > comm_ptr->shmem_coll_ok = 1; > } > else{ > comm_ptr->shmem_coll_ok = 0; > } > > If you see in the free_2level_comm() function at the place where the > variable shmem_comm_rank is dereferenced. The dereferencing happens only > when shmem_coll_ok is 1 which would not be in our case. > if ((my_local_id == 0)&&(comm_ptr->shmem_coll_ok == 1)){ > pthread_spin_lock(&shmem_coll->shmem_coll_lock); > shmem_coll_obj.shmem_avail[shmem_ptr->shmem_comm_rank] = 1; > pthread_spin_unlock(&shmem_coll->shmem_coll_lock); > } > > So therefore, not initializing the variable shmem_comm_rank should not > cause a problem. > > If you could send us a reproducer and/or backtrace of the segfault. We > will be happy to help you. > > Regards, > rahul. > > > On Thu, 17 Jul 2008 David_Kewley@Dell.com wrote: > > > I have an MVAPICH 1.0 program segfaulting, and I think I may have traced > > it > > back to MVAPICH's failure to initialize a struct member before using it. > > We > > are testing a speculative fix right now. The full story follows; let > > me > > know what you think. > > > > struct MPI_COMMUNICATOR member shmem_comm_rank is only set in one place > > as far > > as I can see: > > > > src/context/create_2level_comm.c: > > > > 100 void create_2level_comm (struct MPIR_COMMUNICATOR* comm_ptr, int > > size, int my_rank){ > > ... > > 208 if (shmem_comm_count < shmem_coll_blocks){ > > 209 shmem_ptr->shmem_comm_rank = shmem_comm_count; > > 210 input_flag = 1; > > 211 } > > 212 else{ > > 213 input_flag = 0; > > 214 } > > ... 
> > 277 } > > > > Note that shmem_comm_rank is set only if the condition holds; if the > > condition > > does not hold, then the value of shmem_comm_rank is whatever happened to > > be > > in memory at that point. So, what might that value be? > > > > Best I can figure out, memory for a struct MPIR_COMMUNICATOR is always > > allocated using malloc(). My manpage for malloc says that malloc() does > > not > > clear the memory it allocates, which I take to mean it does not set the > > memory contents to zero, but simply leaves it as it was. So if malloc() > > > > chooses to allocate memory which was previously free()'d, then the > > memory > > handed to the requester may have inappropriate, nonzero data in it. I > > do not > > know for sure what happens if the memory happens to be freshly granted > > by the > > kernel, but I suspect in this case it is guaranteed to be zeroed by the > > kernel. > > > > So... If the condition (shmem_comm_count < shmem_coll_blocks) does not > > hold, > > then shmem_comm_rank is not initialized. If it is later referenced, its > > > > value is meaningless and may lead to an error. > > > > I believe that is what is happening to us; the major unknown at this > > point is > > whether we are in fact hitting the "else" part of the above clause. I'd > > love > > your comments about what is likely the case, and how we can tell > > without > > doing a printf() or similar. :) > > > > Eventually we see a segfault in free_2level_comm(): > > > > src/context/create_2level_comm.c: > > > > 62 void free_2level_comm (struct MPIR_COMMUNICATOR* comm_ptr) > > 63 { > > ... 
> > 87 if (comm_ptr->shmem_comm != MPI_COMM_NULL) { > > 88 struct MPIR_COMMUNICATOR* shmem_ptr; > > 89 shmem_ptr= MPIR_GET_COMM_PTR(comm_ptr->shmem_comm); > > 90 pthread_spin_lock(&shmem_coll->shmem_coll_lock); > > 91 > > shmem_coll_obj.shmem_avail[shmem_ptr->shmem_comm_rank] = 1; > > 92 pthread_spin_unlock(&shmem_coll->shmem_coll_lock); > > 93 MPI_Comm_free(&(comm_ptr->shmem_comm)); > > 94 } > > ... > > 98 } > > > > The segfault happens at line 91, because it appears that > > shmem_ptr->shmem_comm_rank is a large negative number. I suspect in > > fact > > shmem_comm_rank was never initialized (see above), which means the > > negative > > number is an "accidental" value [1]. > > > > We only see this segfault in around 1 out of 20 runs of a particular > > application. I suspect the ~1/20 hit rate is simply accidents of how > > memory > > gets allocated in each run. Sometimes shmem_ptr->shmem_comm_rank > > probably > > happens to sit in a memory location that contains 0, so the above code > > does > > not cause a segfault. > > > > I suspect the fact that we've only noticed this in one code may be an > > accident; I do not assume it is significant. We may not have visibility > > to > > whether other codes are hitting this segfault mechanism. > > > > Do you agree that this failure to initialize shmem_comm_rank is a bug? > > If so, > > probably the right fix is to add "shmem_ptr->shmem_comm_rank = 0;" to > > the "else" clause in the first code snippet above. Would you agree? > > That is > > the fix we are testing right now. Or should it be done in a > > structure-initialization operation somehow? > > > > Mind you, I don't know whether it is *semantically* correct to set > > shmem_comm_rank to 0 by default. I am doing it simply because it > > replicates > > the likely common case (~19 out of 20 runs) where the contents of that > > memory > > location often just happen to be cleared to zero. 
> > > > Finding this bug raises a question: How do we guarantee that there are > > not > > other unrecognized problems like this one? How to we check for use of > > uninitialized variables (e.g. structure members) allocated by malloc()? > > Is > > it best practice to do a memset(x, 0, sizeof(x))? This is a C-coding > > best-practices question, and also a question about how MPICH and MVAPICH > > are > > coded. > > > > Thanks, > > David > > > > > > [1] On x86_64 an int is 4 bytes and a pointer is 8 bytes. Looking at > > the > > contents of the 8 bytes starting at &(shmem_ptr->shmem_comm_rank), they > > appear to be a valid pointer value similar to other pointer values I see > > in > > this core dump. I do not know what this pointer points to (or pointed > > to in > > the past). We get shmem_comm_rank interpreted as a large negative > > number > > simply because the MSbit of the first four bytes happens to be set. > > > > I think it is incontrovertible that these eight bytes hold a pointer > > value > > that was at some point valid. This value could have been written to > > memory > > before the *MPIR_COMMUNICATOR was allocated (presumably part of an > > object > > that was free()'d). This is the hypothesis I explore above. > > > > It's also possible that this pointer was written to those eight bytes > > *after* > > the *MPIR_COMMUNICATOR was created. That is, someone is stomping on our > > > > structure. If that is the case, we should still see segfaults after > > fixing > > the failure to initialize shmem_comm_rank. We're doing runs right now > > in > > which shmem_comm_rank is also initialized (to 0) in the "else" clause, > > to > > check this possibility. > > > > The final possibility is that a legitimate user of this structure is > > writing > > this pointer value inappropriately. 
I think this is very unlikely, assuming
> > this problem is not caused by a compiler bug, because the source code
> > only writes to shmem_comm_rank in one place that I can see, and the
> > code logically can only write an integer value.
> >
> > Regardless of the outcome of those tests, however, it is definitely a
> > bug not to initialize shmem_comm_rank before it is used, unless I'm
> > missing something.
> >
> >
> > David Kewley
> > Dell Infrastructure Consulting Services
> > Onsite Engineer at the Maui HPC Center
> > Cell: 602-460-7617
> > David_Kewley@Dell.com
> >
> > Dell Services: http://www.dell.com/services/
> > How am I doing? Email my manager Russell_Kelly@Dell.com with any
> > feedback.
> >
> >
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss@cse.ohio-state.edu
> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >

From kumarra at cse.ohio-state.edu Fri Jul 18 18:28:22 2008
From: kumarra at cse.ohio-state.edu (Rahul Kumar)
Date: Fri Jul 18 18:23:07 2008
Subject: [mvapich-discuss] uninitialized struct member leading to MVAPICH 1.0 segfault?
In-Reply-To:
References:
Message-ID: <48811906.2090702@cse.ohio-state.edu>

Hi David,
Thanks for verifying the differences between 1.0 and 1.0.1. If you plan
to upgrade to 1.0.1, please let us know if the failure persists.
Yeah, memset could be a good practice. However, it does not solve the
problem in this particular case.
Thanks again for reporting failures with MVAPICH.

Rahul.

David_Kewley@Dell.com wrote:
> Rahul,
>
> Thanks for your detailed analysis. It looks like we are looking at
> different versions of free_2level_comm(). Mine (mvapich-1.0) has no
> mention of shmem_coll_ok whatsoever, but I see that mvapich-1.0.1 has
> the check on shmem_coll_ok, which I agree should avoid this issue.
>
> It appears that 1.0.1 fixed this particular problem that exists in 1.0.
>
> Looks like it's time for us to figure out how to organize an upgrade.
> We're generally a bit slow in order to minimize churn for our users. > > How does one best avoid or detect this class of problems in general > (using uninitialized variables)? Seems too much to ask the programmer > to be careful about each and every possible instance of this problem in > detail. Is memset(x,0,sizeof(x)) the general answer? > > Thanks! > David > > >> -----Original Message----- >> From: rahul kumar [mailto:kumarra@cse.ohio-state.edu] >> Sent: Friday, July 18, 2008 10:49 AM >> To: Kewley, David >> Cc: mvapich-discuss@cse.ohio-state.edu >> Subject: Re: [mvapich-discuss] uninitialized struct member leading to >> MVAPICH 1.0 segfault? >> >> Hi David, >> Thanks for reporting this and for your analysis. >> You are correct that the variable shmem_comm_rank is not being >> > initialized > >> in else part of the below statements >> if (shmem_comm_count < shmem_coll_blocks){ >> shmem_ptr->shmem_comm_rank = shmem_comm_count; >> input_flag = 1; >> } >> else{ >> input_flag = 0; >> } >> However, that actually might not be required. Although shmem_comm_rank >> > is > >> not initialized but input_flag is set 0. If you follow the variable >> input_flag in the following statements: >> MPI_Allreduce(&input_flag, &output_flag, 1, MPI_INT, MPI_LAND, >> comm_ptr->self); >> >> The above statement would set output_flag as 0 if any one of the >> > processes > >> has input_flag as 0. Based on that the variable shmem_coll_ok is set >> > to 0 > >> in the following part of the code in the same create_2level_comm() >> function. >> if (output_flag == 1){ >> comm_ptr->shmem_coll_ok = 1; >> } >> else{ >> comm_ptr->shmem_coll_ok = 0; >> } >> >> If you see in the free_2level_comm() function at the place where the >> variable shmem_comm_rank is dereferenced. The dereferencing happens >> > only > >> when shmem_coll_ok is 1 which would not be in our case. 
>> if ((my_local_id == 0)&&(comm_ptr->shmem_coll_ok == 1)){ >> pthread_spin_lock(&shmem_coll->shmem_coll_lock); >> shmem_coll_obj.shmem_avail[shmem_ptr->shmem_comm_rank] = >> > 1; > >> pthread_spin_unlock(&shmem_coll->shmem_coll_lock); >> } >> >> So therefore, not initializing the variable shmem_comm_rank should not >> cause a problem. >> >> If you could send us a reproducer and/or backtrace of the segfault. We >> will be happy to help you. >> >> Regards, >> rahul. >> >> >> On Thu, 17 Jul 2008 David_Kewley@Dell.com wrote: >> >> >>> I have an MVAPICH 1.0 program segfaulting, and I think I may have >>> > traced > >>> it >>> back to MVAPICH's failure to initialize a struct member before using >>> > it. > >>> We >>> are testing a speculative fix right now. The full story follows; >>> > let > >>> me >>> know what you think. >>> >>> struct MPI_COMMUNICATOR member shmem_comm_rank is only set in one >>> > place > >>> as far >>> as I can see: >>> >>> src/context/create_2level_comm.c: >>> >>> 100 void create_2level_comm (struct MPIR_COMMUNICATOR* comm_ptr, >>> > int > >>> size, int my_rank){ >>> ... >>> 208 if (shmem_comm_count < shmem_coll_blocks){ >>> 209 shmem_ptr->shmem_comm_rank = shmem_comm_count; >>> 210 input_flag = 1; >>> 211 } >>> 212 else{ >>> 213 input_flag = 0; >>> 214 } >>> ... >>> 277 } >>> >>> Note that shmem_comm_rank is set only if the condition holds; if the >>> condition >>> does not hold, then the value of shmem_comm_rank is whatever >>> > happened to > >>> be >>> in memory at that point. So, what might that value be? >>> >>> Best I can figure out, memory for a struct MPIR_COMMUNICATOR is >>> > always > >>> allocated using malloc(). My manpage for malloc says that malloc() >>> > does > >>> not >>> clear the memory it allocates, which I take to mean it does not set >>> > the > >>> memory contents to zero, but simply leaves it as it was. 
So if malloc() >>> chooses to allocate memory which was previously free()'d, then the memory >>> handed to the requester may have inappropriate, nonzero data in it. I do not >>> know for sure what happens if the memory happens to be freshly granted by the >>> kernel, but I suspect in this case it is guaranteed to be zeroed by the kernel. >>> >>> So... If the condition (shmem_comm_count < shmem_coll_blocks) does not hold, >>> then shmem_comm_rank is not initialized. If it is later referenced, its >>> value is meaningless and may lead to an error. >>> >>> I believe that is what is happening to us; the major unknown at this point is >>> whether we are in fact hitting the "else" part of the above clause. I'd love >>> your comments about what is likely the case, and how we can tell without >>> doing a printf() or similar. :) >>> >>> Eventually we see a segfault in free_2level_comm(): >>> >>> src/context/create_2level_comm.c: >>> >>> 62 void free_2level_comm (struct MPIR_COMMUNICATOR* comm_ptr) >>> 63 { >>> ... >>> 87 if (comm_ptr->shmem_comm != MPI_COMM_NULL) { >>> 88 struct MPIR_COMMUNICATOR* shmem_ptr; >>> 89 shmem_ptr= MPIR_GET_COMM_PTR(comm_ptr->shmem_comm); >>> 90 pthread_spin_lock(&shmem_coll->shmem_coll_lock); >>> 91 shmem_coll_obj.shmem_avail[shmem_ptr->shmem_comm_rank] = 1; >>> 92 pthread_spin_unlock(&shmem_coll->shmem_coll_lock); >>> 93 MPI_Comm_free(&(comm_ptr->shmem_comm)); >>> 94 } >>> ... >>> 98 } >>> >>> The segfault happens at line 91, because it appears that >>> shmem_ptr->shmem_comm_rank is a large negative number. I suspect in fact >>> shmem_comm_rank was never initialized (see above), which means the negative >>> number is an "accidental" value [1]. >>> >>> We only see this segfault in around 1 out of 20 runs of a particular >>> application.
I suspect the ~1/20 hit rate is simply accidents of how memory >>> gets allocated in each run. Sometimes shmem_ptr->shmem_comm_rank probably >>> happens to sit in a memory location that contains 0, so the above code does >>> not cause a segfault. >>> >>> I suspect the fact that we've only noticed this in one code may be an >>> accident; I do not assume it is significant. We may not have visibility to >>> whether other codes are hitting this segfault mechanism. >>> >>> Do you agree that this failure to initialize shmem_comm_rank is a bug? If so, >>> probably the right fix is to add "shmem_ptr->shmem_comm_rank = 0;" to >>> the "else" clause in the first code snippet above. Would you agree? That is >>> the fix we are testing right now. Or should it be done in a >>> structure-initialization operation somehow? >>> >>> Mind you, I don't know whether it is *semantically* correct to set >>> shmem_comm_rank to 0 by default. I am doing it simply because it replicates >>> the likely common case (~19 out of 20 runs) where the contents of that memory >>> location often just happen to be cleared to zero. >>> >>> Finding this bug raises a question: How do we guarantee that there are not >>> other unrecognized problems like this one? How do we check for use of >>> uninitialized variables (e.g. structure members) allocated by malloc()? Is >>> it best practice to do a memset(x, 0, sizeof(x))? This is a C-coding >>> best-practices question, and also a question about how MPICH and MVAPICH are >>> coded. >>> >>> Thanks, >>> David >>> >>> >>> [1] On x86_64 an int is 4 bytes and a pointer is 8 bytes. Looking at the >>> contents of the 8 bytes starting at &(shmem_ptr->shmem_comm_rank), they >>> appear to be a valid pointer value similar to other pointer values I see in >>> this core dump.
I do not know what this pointer points to (or pointed to in >>> the past). We get shmem_comm_rank interpreted as a large negative number >>> simply because the MSbit of the first four bytes happens to be set. >>> >>> I think it is incontrovertible that these eight bytes hold a pointer value >>> that was at some point valid. This value could have been written to memory >>> before the *MPIR_COMMUNICATOR was allocated (presumably part of an object >>> that was free()'d). This is the hypothesis I explore above. >>> >>> It's also possible that this pointer was written to those eight bytes *after* >>> the *MPIR_COMMUNICATOR was created. That is, someone is stomping on our >>> structure. If that is the case, we should still see segfaults after fixing >>> the failure to initialize shmem_comm_rank. We're doing runs right now in >>> which shmem_comm_rank is also initialized (to 0) in the "else" clause, to >>> check this possibility. >>> >>> The final possibility is that a legitimate user of this structure is writing >>> this pointer value inappropriately. I think this is very unlikely, assuming >>> this problem is not caused by a compiler bug, because the source code only >>> writes to shmem_comm_rank in one place that I can see, and the code logically >>> can only write an integer value. >>> >>> Regardless of the outcome of those tests, however, it is definitely a bug not >>> to initialize shmem_comm_rank before it is used, unless I'm missing >>> something. >>> >>> >>> David Kewley >>> Dell Infrastructure Consulting Services >>> Onsite Engineer at the Maui HPC Center >>> Cell: 602-460-7617 >>> David_Kewley@Dell.com >>> >>> Dell Services: http://www.dell.com/services/ >>> How am I doing? Email my manager Russell_Kelly@Dell.com with any >>> feedback.
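The initialization question raised in this thread can be sketched in plain C. This is a minimal illustration only, not MVAPICH code: struct demo_comm, demo_comm_create(), demo_comm_release_slot(), DEMO_BLOCKS and the DEMO_RANK_UNSET sentinel are hypothetical names invented for the example. It assumes the two remedies discussed above (zero-filled allocation, here via calloc(), and assigning the member in both branches), plus a range check before the member is used as an array index.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_BLOCKS 4           /* hypothetical stand-in for shmem_coll_blocks */
#define DEMO_RANK_UNSET (-1)    /* hypothetical sentinel: no slot was assigned */

/* Hypothetical, cut-down stand-in for struct MPIR_COMMUNICATOR. */
struct demo_comm {
    int shmem_comm_rank;
    int shmem_coll_ok;
};

/* Hypothetical stand-in for the shmem_avail[] availability table. */
int demo_avail[DEMO_BLOCKS];

/* calloc() returns zero-filled memory, so no member starts out holding
 * stale heap contents; both branches then assign the rank explicitly. */
struct demo_comm *demo_comm_create(int comm_count)
{
    struct demo_comm *c = calloc(1, sizeof *c);
    if (c == NULL)
        return NULL;
    if (comm_count < DEMO_BLOCKS) {
        c->shmem_comm_rank = comm_count;
        c->shmem_coll_ok = 1;
    } else {
        c->shmem_comm_rank = DEMO_RANK_UNSET;
        c->shmem_coll_ok = 0;
    }
    return c;
}

/* Marks the slot as available again. Returns 1 if a slot was released,
 * 0 if the communicator never owned a valid slot. */
int demo_comm_release_slot(const struct demo_comm *c)
{
    assert(c != NULL);
    int r = c->shmem_comm_rank;
    if (!c->shmem_coll_ok || r < 0 || r >= DEMO_BLOCKS)
        return 0;               /* never index the table with a bad rank */
    demo_avail[r] = 1;
    return 1;
}
```

One caveat on memset(x, 0, sizeof(x)): when x is a pointer, sizeof(x) is the size of the pointer rather than of the object, so memset(x, 0, sizeof *x) is the form that clears the whole structure. A memory checker such as Valgrind's memcheck can also report branches or addresses that depend on uninitialized heap memory, without adding printf() calls.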
>>> >>> >>> _______________________________________________ >>> mvapich-discuss mailing list >>> mvapich-discuss@cse.ohio-state.edu >>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >>> >>> > > > > From hoot at ptpnow.com Sun Jul 20 13:29:44 2008 From: hoot at ptpnow.com (Hoot Thompson) Date: Sun Jul 20 13:29:55 2008 Subject: [mvapich-discuss] Mvapich in xen enviroment Message-ID: I'm looking for confirmation that someone has gotten ofed/mvapich to work on a virtual xen node.. And if so, a tutorial or at least hints would be useful :-) Thanks in advance, Hoot From koop at cse.ohio-state.edu Mon Jul 21 14:15:09 2008 From: koop at cse.ohio-state.edu (Matthew Koop) Date: Mon Jul 21 14:15:18 2008 Subject: [mvapich-discuss] mpi_ssend and system call (fortran) conflict In-Reply-To: Message-ID: Sorry to hear it still is not working. This error normally will be printed if the kernel being used is not new enough. What kernel are you running where this getting printed? Matt On Fri, 18 Jul 2008, Noam Bernstein wrote: > Aah - I missed the IBV_FORK_SAFE requirement. I've activated that, > but it's complaining that it can't do it: > > jellium.nrl.navy.mil 1224 : cat ssend_forksafe.mvapich2-1.0.3.stderr > libibverbs: Warning: fork()-safety requested but init failed > libibverbs: Warning: fork()-safety requested but init failed > libibverbs: Warning: fork()-safety requested but init failed > libibverbs: Warning: fork()-safety requested but init failed > send desc error > [0] Abort: [] Got completion with error 1, vendor code=69, dest rank=1 > at line 519 in file ibv_channel_manager.c > > Same error messages from mvapich-1.0.1, and both mvapich and mvapich2 > fail in the same way as before (since presumably they're not _actually_ > running in fork-safe mode).
> > It's perhaps looking more like an OFED problem now, I suppose, but if it > is, maybe you can give me some clues as to what to look for? > > Noam > From noam.bernstein at nrl.navy.mil Mon Jul 21 14:38:10 2008 From: noam.bernstein at nrl.navy.mil (Noam Bernstein) Date: Mon Jul 21 14:38:09 2008 Subject: [mvapich-discuss] mpi_ssend and system call (fortran) conflict In-Reply-To: References: Message-ID: On Jul 21, 2008, at 2:15 PM, Matthew Koop wrote: > > Sorry to hear it still is not working. This error normally will be > printed > if the kernel being used is not new enough. What kernel are you > running > where this getting printed? That explains the CPU affinity failure. We're running 2.6.9 - for some reason I thought it was higher than that. At some point in the next few months we'll go from CentOS 4 to CentOS 5, and then I'll revisit this issue. But it still doesn't explain why mvapich 2 is doing worse than mvapich when I don't turn on CPU affinity. Noam From kernel at tekno-soft.it Tue Jul 22 04:25:22 2008 From: kernel at tekno-soft.it (Roberto Fichera) Date: Tue Jul 22 04:27:01 2008 Subject: [mvapich-discuss] Races with MPI_THREAD_MULTI In-Reply-To: <4880C9AC.2050001@tekno-soft.it> References: <4880C9AC.2050001@tekno-soft.it> Message-ID: <48859972.1040907@tekno-soft.it> Roberto Fichera wrote: > Dhabaleswar Panda wrote: >> Hi Roberto, >> >> Thanks for your note. You are using the ch3:sock device in MVAPICH2 which >> is the same as MPICH2. You are also seeing similar failure scenarios (but >> in different forms) with MPICH2 1.0.7. I am cc'ing this message to mpich2 >> mailing list. One of the MPICH2 developers will be able to extend help on >> this issue faster. >> > Thanks for that. About the mpich2 problem, I already sent an email > regarding its related issue.
> But the strange thing is that when linking against mpich2 I don't see > a so fast race as I see in the > mvapich2. In the mpich2 case I had to wait 1 or 2 hours before the lock. Just an update about the problem I got. After replacing all the MPI_Send() to MPI_Ssend() everything seems working well with mpich2 v1.0.7. My application doesn't race anymore at least after dispatching 50.000 jobs across 4 nodes, but trying to execute the same application against the last mvapich2 1.2rc1 I'm still getting the same problem as shown below. I've another question, since this multithreaded application has to run into a cluster with 1024 nodes equiped with Mellanox IB card, I really like to know if the OpenFabrics-IB interface does support the MPI_THREAD_MULTIPLE initialization and also the MPI_Comm_spawn() implementation. Thanks a lot for the feedback. >> Thanks, >> >> DK >> >> >> On Fri, 18 Jul 2008, Roberto Fichera wrote: >> >> >>> Hi All on the list, >>> >>> I'm trying to use mvapich2 v1.2rc1 in a multithreaded application, >>> initialize using MPI_THREAD_MULTI. >>> I've the master application doing the following thing, start several >>> thread depending by the assigned nodes, >>> on each node a slave application is spawned using the MPI_Comm_spawn(). >>> Before to call the >>> MPI_Comm_spawn() I prepare the given MPI_Info struct, one for each >>> thread, in order to set the all keys >>> (host and wdir) for addressing the wanted behaviour. So, as sooner as >>> the master application starts, it races >>> immediately with 4 nodes, 1 master and 3 slaves. Below you can see the >>> status of the master application at race >>> time. It seems stuck on the PMIU_readline() which never returns so the >>> global lock is never relesead. 
MVAPICH2 >>> is compiled with: >>> >>> PKG_PATH=/HRI/External/mvapich2/1.2rc1 >>> >>> ./configure --prefix=$PKG_PATH \ >>> --bindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ >>> --sbindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ >>> --libdir=$PKG_PATH/lib/linux-x86_64-gcc-glibc2.3.4 \ >>> --enable-sharedlibs=gcc \ >>> --enable-f90 \ >>> --enable-threads=multiple \ >>> --enable-g=-ggdb \ >>> --enable-debuginfo \ >>> --with-device=ch3:sock \ >>> --datadir=$PKG_PATH/data \ >>> --with-htmldir=$PKG_PATH/doc/html \ >>> --with-docdir=$PKG_PATH/doc \ >>> LDFLAGS='-Wl,-z,noexecstack' >>> >>> so I'm using the ch3:sock device. >>> >>> -----Thread 2 >>> [Switching to thread 2 (Thread 1115699536 (LWP 29479))]#0 >>> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 >>> (gdb) bt >>> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 >>> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 >>> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from >>> /lib64/libpthread.so.0 >>> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from >>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=<value >>> optimized out>, key=0x0, value=0x33ca40ff58 >>> "!\204??\r\206??\030\204??3\206??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\177\205??\177\205??\177\205??\177\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??\033\205??\033\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??"...)
>>> at ParallelWorker.c:664 >>> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62ff50) >>> at ParallelWorker.c:719 >>> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ff50) at >>> ParallelWorker.c:504 >>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >>> >>> -----Thread 3 >>> [Switching to thread 3 (Thread 1105209680 (LWP 29478))]#0 >>> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 >>> (gdb) bt >>> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 >>> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 >>> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from >>> /lib64/libpthread.so.0 >>> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from >>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=<value >>> optimized out>, key=0x0, value=0x33ca40ff58 >>> "!\204??\r\206??\030\204??3\206??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\177\205??\177\205??\177\205??\177\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??\033\205??\033\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??"...)
>>> at ParallelWorker.c:664 >>> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62f270) >>> at ParallelWorker.c:719 >>> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62f270) at >>> ParallelWorker.c:504 >>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >>> >>> -----Thread 4 >>> [Switching to thread 4 (Thread 1094719824 (LWP 29477))]#0 >>> 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 >>> (gdb) bt >>> #0 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 >>> --->>#1 0x00002aaaab3db84a in PMIU_readline () from >>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> --->>#2 0x00002aaaab3d9d37 in PMI_Spawn_multiple () from >>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #3 0x00002aaaab333893 in MPIDI_Comm_spawn_multiple () from >>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #4 0x00002aaaab38bcf6 in MPID_Comm_spawn_multiple () from >>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #5 0x00002aaaab355a10 in PMPI_Comm_spawn () from >>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #6 0x00000000004189d8 in ParallelWorker_handleParallel (self=0x62ad40) >>> at ParallelWorker.c:754 >>> #7 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ad40) at >>> ParallelWorker.c:504 >>> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >>> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >>> >>> I also tried to run against MPICH2 v1.0.7, but here I got a similar >>> scenery which show up after between 1 - 2 hours of execution, >>> see below: >>> >>> ----- thread 2 >>> [Switching to thread 2 (Thread 1094719824 (LWP 1279))]#0 0x00000033c94cbd66 in poll 
() from /lib64/libc.so.6 >>> (gdb) bt >>> #0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 >>> #1 0x00002aaaab5a3d2f in MPIDU_Sock_wait () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #2 0x00002aaaab52bdc7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #3 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #4 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #5 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #6 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6358e0) at ParallelWorker.c:819 >>> #7 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6358e0) at ParallelWorker.c:515 >>> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >>> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >>> >>> ----- thread 3 >>> [Switching to thread 3 (Thread 1084229968 (LWP 1278))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>> (gdb) bt >>> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>> #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #4 0x00002aaaab5417ec in 
PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x634d20) at ParallelWorker.c:819 >>> #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x634d20) at ParallelWorker.c:515 >>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >>> >>> >>> ----- thread 4 >>> [Switching to thread 4 (Thread 1115699536 (LWP 1277))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>> (gdb) bt >>> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>> #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6341a0) at ParallelWorker.c:819 >>> #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6341a0) at ParallelWorker.c:515 >>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >>> >>> where the thread 2 is poll()ing never never returns, so never signals >>> the poll() completion and than all the others >>> waiters in the MPIDI_CH3I_Progress() condition will never wake up. >>> >>> Does anyone is having the same problem? >>> >>> Thanks in advance, >>> Roberto Fichera. 
>>> >>> >> >> >> > From huanwei at cse.ohio-state.edu Tue Jul 22 10:39:06 2008 From: huanwei at cse.ohio-state.edu (wei huang) Date: Tue Jul 22 10:39:15 2008 Subject: [mvapich-discuss] Mvapich in xen enviroment (fwd) Message-ID: Hi Hoot, Yes, in our group we have mvapich and mvapich2 running with ofed stack. Here are the basic steps you need to follow: 1) Install Xen (para-virtualized version) properly; 2) Create your domUs properly; 3) Install OFED-1.2 for Xen version (in both dom0 and domU), which you can download from this link: http://www.mellanox.com/products/xen.php 4) Get mvapich/mvapich2 properly installed in your domUs; 5) Use them as in native environments; Please let us know if this works for you. Thanks. Regards, Wei Huang 774 Dreese Lab, 2015 Neil Ave, Dept.
of Computer Science and Engineering Ohio State University OH 43210 Tel: (614)292-8501 On Sun, 20 Jul 2008, Hoot Thompson wrote: > I'm looking for confirmation that someone has gotten ofed/mvapich to work on > a virtual xen node.. And if so, a tutorial or at least hints would be > useful :-) > > Thanks in advance, > > Hoot > From huanwei at cse.ohio-state.edu Tue Jul 22 10:47:59 2008 From: huanwei at cse.ohio-state.edu (wei huang) Date: Tue Jul 22 10:48:07 2008 Subject: [mvapich-discuss] Mvapich in xen enviroment In-Reply-To: <017a01c8ec09$9fa439e0$640fa8c0@ptpdesk> Message-ID: I believe the package offers both rpm and source. It should work if you build from source. Regards, Wei Huang 774 Dreese Lab, 2015 Neil Ave, Dept. of Computer Science and Engineering Ohio State University OH 43210 Tel: (614)292-8501 On Tue, 22 Jul 2008, Hoot Thompson wrote: > Thank you for the response. I am running on a rocks cluster which as I am > sure you know is based on Centos 5 distribution. Given the Mellanox OFED > version is targeted for SLES will it still work? > > Hoot > > > -----Original Message----- > From: wei huang [mailto:huanwei@cse.ohio-state.edu] > Sent: Tuesday, July 22, 2008 10:38 AM > To: Hoot Thompson > Subject: Re: [mvapich-discuss] Mvapich in xen enviroment > > Hi Hoot, > > Yes, in our group we have mvapich and mvapich2 running with ofed stack. > Here are the basic steps you need to follow: > > 1) Install Xen (para-virtualized version) properly; > > 2) Create your domUs properly; > > 3) Install OFED-1.2 for Xen version (in both dom0 and domU), which you can > download from this link: http://www.mellanox.com/products/xen.php > > 4) Get mvapich/mvapich2 properly installed in your domUs; > > 5) Use them as in native environments; > > Please let us know if this works for you. Thanks. > > Regards, > Wei Huang > > 774 Dreese Lab, 2015 Neil Ave, > Dept. of Computer Science and Engineering Ohio State University OH 43210 > Tel: (614)292-8501 > > > On Sun, 20 Jul 2008, Hoot Thompson wrote: > > > I'm looking for confirmation that someone has gotten ofed/mvapich to > > work on a virtual xen node.. And if so, a tutorial or at least hints > > would be useful :-) > > > > Thanks in advance, > > > > Hoot > > > > > > > From hoot at ptpnow.com Tue Jul 22 10:50:33 2008 From: hoot at ptpnow.com (Hoot Thompson) Date: Tue Jul 22 10:50:44 2008 Subject: [mvapich-discuss] Mvapich in xen enviroment In-Reply-To: References: <017a01c8ec09$9fa439e0$640fa8c0@ptpdesk> Message-ID: <017b01c8ec0a$4b845150$640fa8c0@ptpdesk> Cool... I'll give it a try.
Hoot -----Original Message----- From: wei huang [mailto:huanwei@cse.ohio-state.edu] Sent: Tuesday, July 22, 2008 10:48 AM To: Hoot Thompson Cc: mvapich-discuss@cse.ohio-state.edu Subject: RE: [mvapich-discuss] Mvapich in xen enviroment I believe the package offers both rpm and source. It should work if you build from source. Regards, Wei Huang 774 Dreese Lab, 2015 Neil Ave, Dept. of Computer Science and Engineering Ohio State University OH 43210 Tel: (614)292-8501 On Tue, 22 Jul 2008, Hoot Thompson wrote: > Thank you for the response. I am running on a rocks cluster which as > I am sure you know is based on Centos 5 distribution. Given the > Mellanox OFED version is targeted for SLES will it still work? > > Hoot > > > -----Original Message----- > From: wei huang [mailto:huanwei@cse.ohio-state.edu] > Sent: Tuesday, July 22, 2008 10:38 AM > To: Hoot Thompson > Subject: Re: [mvapich-discuss] Mvapich in xen enviroment > > Hi Hoot, > > Yes, in our group we have mvapich and mvapich2 running with ofed stack. > Here are the basic steps you need to follow: > > 1) Install Xen (para-virtualized version) properly; > > 2) Create your domUs properly; > > 3) Install OFED-1.2 for Xen version (in both dom0 and domU), which you > can download from this link: http://www.mellanox.com/products/xen.php > > 4) Get mvapich/mvapich2 properly installed in your domUs; > > 5) Use them as in native environments; > > Please let us know if this works for you. Thanks. > > Regards, > Wei Huang > > 774 Dreese Lab, 2015 Neil Ave, > Dept. of Computer Science and Engineering Ohio State University OH > 43210 > Tel: (614)292-8501 > > > On Sun, 20 Jul 2008, Hoot Thompson wrote: > > > I'm looking for confirmation that someone has gotten ofed/mvapich to > > work on a virtual xen node..
And if so, a tutorial or at least > > hints would be useful :-) > > > > Thanks in advance, > > > > Hoot > > > > > > > From huanwei at cse.ohio-state.edu Tue Jul 22 13:54:42 2008 From: huanwei at cse.ohio-state.edu (wei huang) Date: Tue Jul 22 13:54:49 2008 Subject: [mvapich-discuss] mvapich 1 vs. mvapich 2 performance In-Reply-To: <993CF8B3-9FF0-4B82-9A96-FC32C6C652B2@nrl.navy.mil> Message-ID: Hi Noam, We have incorporated a patch in mvapich2-1.2rc1, and its bandwidth should increase. You can update from our trunk and try again. Regarding the performance compared with mvapich, there is a small gap in the bandwidth achieved for small and medium messages due to the internal architecture and features like message coalescing. However, we believe these differences should be hardly visible at application level. And for large message bandwidth, mvapich2-1.0.3, mvapich2-1.2rc1 and mvapich1 should show exactly the same bandwidth. This is consistent across all of our testing platforms. Thanks. Regards, Wei Huang 774 Dreese Lab, 2015 Neil Ave, Dept. of Computer Science and Engineering Ohio State University OH 43210 Tel: (614)292-8501 On Thu, 17 Jul 2008, Noam Bernstein wrote: > > On Jul 17, 2008, at 1:58 PM, wei huang wrote: > > > Hi Noam, > > > > It seems that the pdf file you attached plots a different set of > > data than > > the numbers you posted. Could you verify that? (There is a dip > > between 10k > > to 100k in the figure, but i cannot see the same in the numbers). > > The graph shows results without CPU affinity for mvapich2 1.0.1 > and mvapich2 1.2RC1. > > > > > Also, are these trends constant throughout multiple runs? > > Trends, yes. mvapich 1.0.3 is somewhat faster and more consistent than > mvapich2 1.0.1 or 1.2RC1. > > I've attached a new graph, with a second run of mvapich2 1.2RC1 (so you > can see the variability), and a tar file of all the data.
> > Noam > > From huanwei at cse.ohio-state.edu Tue Jul 22 14:21:08 2008 From: huanwei at cse.ohio-state.edu (wei huang) Date: Tue Jul 22 14:21:15 2008 Subject: [mvapich-discuss] Races with MPI_THREAD_MULTI (fwd) In-Reply-To: Message-ID: Hi Roberto, > MPI_Send() to MPI_Ssend() everything > seems working well with mpich2 v1.0.7. My application doesn't race > anymore at least after dispatching > 50.000 jobs across 4 nodes, but trying to execute the same application > against the last mvapich2 1.2rc1 > I'm still getting the same problem as shown below. Good to know that it works with mpich2. How do you configure/compile mvapich2-1.2rc1? Are you using the ofed interface (default configuration), or using the TCP/IP stack (adding --with-device=ch3:sock in the configure option)? The TCP/IP stack should be the same as mpich2, in other words, it should work with your application. The implementation on ofed interface does not support spawn yet, thus there will be issues. > I've another question, since this multithreaded application has to run > into a cluster with 1024 nodes equiped > with Mellanox IB card, I really like to know if the OpenFabrics-IB > interface does support the MPI_THREAD_MULTIPLE > initialization and also the MPI_Comm_spawn() implementation. mvapich2 on the ofed (IB) interface supports MPI_THREAD_MULTIPLE. MPI_Comm_spawn() is not yet supported, however. We are working on this support and it will be available in our next release. Thanks. -- Wei > >> On Fri, 18 Jul 2008, Roberto Fichera wrote: > >> > >> > >>> Hi All on the list, > >>> > >>> I'm trying to use mvapich2 v1.2rc1 in a multithreaded application, > >>> initialize using MPI_THREAD_MULTI. > >>> I've the master application doing the following thing, start several > >>> thread depending by the assigned nodes, > >>> on each node a slave application is spawned using the MPI_Comm_spawn(). 
> >>> Before to call the > >>> MPI_Comm_spawn() I prepare the given MPI_Info struct, one for each > >>> thread, in order to set the all keys > >>> (host and wdir) for addressing the wanted behaviour. So, as sooner as > >>> the master application starts, it races > >>> immediately with 4 nodes, 1 master and 3 slaves. Below you can see the > >>> status of the master application at race > >>> time. It seems stuck on the PMIU_readline() which never returns so the > >>> global lock is never relesead. MVAPICH2 > >>> is compiled with: > >>> > >>> PKG_PATH=/HRI/External/mvapich2/1.2rc1 > >>> > >>> ./configure --prefix=$PKG_PATH \ > >>> --bindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ > >>> --sbindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ > >>> --libdir=$PKG_PATH/lib/linux-x86_64-gcc-glibc2.3.4 \ > >>> --enable-sharedlibs=gcc \ > >>> --enable-f90 \ > >>> --enable-threads=multiple \ > >>> --enable-g=-ggdb \ > >>> --enable-debuginfo \ > >>> --with-device=ch3:sock \ > >>> --datadir=$PKG_PATH/data \ > >>> --with-htmldir=$PKG_PATH/doc/html \ > >>> --with-docdir=$PKG_PATH/doc \ > >>> LDFLAGS='-Wl,-z,noexecstack' > >>> > >>> so I'm using the ch3:sock device. 
> >>> > >>> -----Thread 2 > >>> [Switching to thread 2 (Thread 1115699536 (LWP 29479))]#0 > >>> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > >>> (gdb) bt > >>> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > >>> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 > >>> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from > >>> /lib64/libpthread.so.0 > >>> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from > >>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=<value > >>> optimized out>, key=0x0, value=0x33ca40ff58 > >>> "!\204??\r\206??\030\204??3\206??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\177\205??\177\205??\177\205??\177\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??\033\205??\033\205??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\n\204??\033\205??\033\205??"...)
> >>> at ParallelWorker.c:664 > >>> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62ff50) > >>> at ParallelWorker.c:719 > >>> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ff50) at > >>> ParallelWorker.c:504 > >>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >>> > >>> -----Thread 3 > >>> [Switching to thread 3 (Thread 1105209680 (LWP 29478))]#0 > >>> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > >>> (gdb) bt > >>> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > >>> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 > >>> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from > >>> /lib64/libpthread.so.0 > >>> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from > >>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self= >>> optimized out>, key=0x0, value=0x33ca40ff58 > >>> "!\204��\r\206��\030\204��3\206��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\177\205��\177\205��\177\205��\177\205��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\033\205��\033\205��\033\205��\033\205��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\033\205��\033\205��"...) 
> >>> at ParallelWorker.c:664 > >>> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62f270) > >>> at ParallelWorker.c:719 > >>> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62f270) at > >>> ParallelWorker.c:504 > >>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >>> > >>> -----Thread 4 > >>> [Switching to thread 4 (Thread 1094719824 (LWP 29477))]#0 > >>> 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 > >>> (gdb) bt > >>> #0 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 > >>> --->>#1 0x00002aaaab3db84a in PMIU_readline () from > >>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> --->>#2 0x00002aaaab3d9d37 in PMI_Spawn_multiple () from > >>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #3 0x00002aaaab333893 in MPIDI_Comm_spawn_multiple () from > >>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #4 0x00002aaaab38bcf6 in MPID_Comm_spawn_multiple () from > >>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #5 0x00002aaaab355a10 in PMPI_Comm_spawn () from > >>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #6 0x00000000004189d8 in ParallelWorker_handleParallel (self=0x62ad40) > >>> at ParallelWorker.c:754 > >>> #7 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ad40) at > >>> ParallelWorker.c:504 > >>> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >>> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >>> > >>> I also tried to run against MPICH2 v1.0.7, but here I got a similar > >>> scenery which show up after between 1 - 2 hours of execution, > >>> see below: > >>> > >>> ----- thread 2 > >>> [Switching to 
thread 2 (Thread 1094719824 (LWP 1279))]#0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 > >>> (gdb) bt > >>> #0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 > >>> #1 0x00002aaaab5a3d2f in MPIDU_Sock_wait () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #2 0x00002aaaab52bdc7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #3 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #4 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #5 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #6 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6358e0) at ParallelWorker.c:819 > >>> #7 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6358e0) at ParallelWorker.c:515 > >>> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >>> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >>> > >>> ----- thread 3 > >>> [Switching to thread 3 (Thread 1084229968 (LWP 1278))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >>> (gdb) bt > >>> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >>> #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from 
/home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x634d20) at ParallelWorker.c:819 > >>> #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x634d20) at ParallelWorker.c:515 > >>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >>> > >>> > >>> ----- thread 4 > >>> [Switching to thread 4 (Thread 1115699536 (LWP 1277))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >>> (gdb) bt > >>> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >>> #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6341a0) at ParallelWorker.c:819 > >>> #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6341a0) at ParallelWorker.c:515 > >>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >>> > >>> where the thread 2 is poll()ing never never returns, so never signals > >>> the poll() completion and than all the others > >>> waiters 
in the MPIDI_CH3I_Progress() condition will never wake up.
> >>>
> >>> Is anyone else having the same problem?
> >>>
> >>> Thanks in advance,
> >>> Roberto Fichera.
> >>>
> >>>
> >>
> >>
> >>
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss@cse.ohio-state.edu
> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
> >

From kernel at tekno-soft.it Tue Jul 22 14:27:50 2008
From: kernel at tekno-soft.it (Roberto Fichera)
Date: Tue Jul 22 14:29:29 2008
Subject: [mvapich-discuss] Races with MPI_THREAD_MULTI (fwd)
In-Reply-To:
References:
Message-ID: <488626A6.4090105@tekno-soft.it>

wei huang wrote:
> Hi Roberto,
>
>
>> MPI_Send() to MPI_Ssend() everything
>> seems to work well with mpich2 v1.0.7. My application doesn't race
>> anymore, at least after dispatching
>> 50,000 jobs across 4 nodes, but trying to execute the same application
>> against the latest mvapich2 1.2rc1
>> I'm still getting the same problem as shown below.
>>
>
> Good to know that it works with mpich2. How do you configure/compile
> mvapich2-1.2rc1? Are you using the ofed interface (default configuration),
> or using the TCP/IP stack (adding --with-device=ch3:sock in the configure
> option)? The TCP/IP stack should be the same as mpich2, in other words, it
> should work with your application. The implementation on ofed interface
> does not support spawn yet, thus there will be issues.
>
Here is how I configured it, which is basically the same as for mpich2:

PKG_PATH=/HRI/External/mvapich2/1.2rc1

./configure --prefix=$PKG_PATH \
    --bindir=$PKG_PATH/bin/${MAKEFILE_PLATFORM} \
    --sbindir=$PKG_PATH/bin/${MAKEFILE_PLATFORM} \
    --libdir=$PKG_PATH/lib/${MAKEFILE_PLATFORM} \
    --enable-sharedlibs=gcc \
    --enable-f90 \
    --enable-cxx \
    --enable-romio \
    --enable-threads=multiple \
    --enable-g=gdb \
    --enable-debuginfo \
    --with-device=ch3:sock \
    --datadir=$PKG_PATH/data \
    --with-htmldir=$PKG_PATH/doc/html \
    --with-docdir=$PKG_PATH/doc \
    LDFLAGS='-Wl,-z,noexecstack'

-----------------------------------------------------------

PKG_PATH=/HRI/External/mpich2/1.0.7

./configure --prefix=$PKG_PATH \
    --bindir=$PKG_PATH/bin/$MAKEFILE_PLATFORM \
    --sbindir=$PKG_PATH/bin/$MAKEFILE_PLATFORM \
    --libdir=$PKG_PATH/lib/$MAKEFILE_PLATFORM \
    --enable-sharedlibs=gcc \
    --enable-f90 \
    --enable-cxx \
    --enable-romio \
    --enable-mpe \
    --enable-threads=multiple \
    --enable-debuginfo \
    --enable-g=gdb \
    --with-device=ch3:sock \
    --datadir=$PKG_PATH/data \
    --with-htmldir=$PKG_PATH/doc/html \
    --with-docdir=$PKG_PATH/doc \
    LDFLAGS='-Wl,-z,noexecstack'

>
>> I've another question: since this multithreaded application has to run
>> on a cluster with 1024 nodes equipped
>> with Mellanox IB cards, I'd really like to know if the OpenFabrics-IB
>> interface supports the MPI_THREAD_MULTIPLE
>> initialization and also the MPI_Comm_spawn() implementation.
>>
>
> mvapich2 on the ofed (IB) interface supports MPI_THREAD_MULTIPLE.
> MPI_Comm_spawn() is not yet supported, however. We are working on this
> support and it will be available in our next release.
>
So the question now becomes: when do you plan to release the ofed version
with MPI_Comm_spawn() support ;-)?

> Thanks.
>
> -- Wei
>
>
>
>>>> On Fri, 18 Jul 2008, Roberto Fichera wrote:
>>>>
>>>>
>>>>
>>>>> Hi All on the list,
>>>>>
>>>>> I'm trying to use mvapich2 v1.2rc1 in a multithreaded application,
>>>>> initialized using MPI_THREAD_MULTIPLE.
>>>>> The master application does the following: it starts several threads,
>>>>> depending on the assigned nodes, and on each node a slave application
>>>>> is spawned using MPI_Comm_spawn(). Before calling MPI_Comm_spawn() I
>>>>> prepare the MPI_Info struct, one for each thread, in order to set all
>>>>> the keys (host and wdir) needed for the wanted behaviour. As soon as
>>>>> the master application starts, it races immediately with 4 nodes,
>>>>> 1 master and 3 slaves. Below you can see the status of the master
>>>>> application at race time. It seems stuck in PMIU_readline(), which
>>>>> never returns, so the global lock is never released. MVAPICH2 is
>>>>> compiled with:
>>>>>
>>>>> PKG_PATH=/HRI/External/mvapich2/1.2rc1
>>>>>
>>>>> ./configure --prefix=$PKG_PATH \
>>>>>     --bindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \
>>>>>     --sbindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \
>>>>>     --libdir=$PKG_PATH/lib/linux-x86_64-gcc-glibc2.3.4 \
>>>>>     --enable-sharedlibs=gcc \
>>>>>     --enable-f90 \
>>>>>     --enable-threads=multiple \
>>>>>     --enable-g=-ggdb \
>>>>>     --enable-debuginfo \
>>>>>     --with-device=ch3:sock \
>>>>>     --datadir=$PKG_PATH/data \
>>>>>     --with-htmldir=$PKG_PATH/doc/html \
>>>>>     --with-docdir=$PKG_PATH/doc \
>>>>>     LDFLAGS='-Wl,-z,noexecstack'
>>>>>
>>>>> so I'm using the ch3:sock device.
>>>>>
>>>>> -----Thread 2
>>>>> [Switching to thread 2 (Thread 1115699536 (LWP 29479))]#0
>>>>> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0
>>>>> (gdb) bt
>>>>> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0
>>>>> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0
>>>>> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from
>>>>> /lib64/libpthread.so.0
>>>>> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from
>>>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1
>>>>> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=<value
>>>>> optimized out>, key=0x0, value=0x33ca40ff58
>>>>> "!\204"...)
>>>>> at ParallelWorker.c:664
>>>>> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62ff50)
>>>>> at ParallelWorker.c:719
>>>>> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ff50) at
>>>>> ParallelWorker.c:504
>>>>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0
>>>>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6
>>>>>
>>>>> -----Thread 3
>>>>> [Switching to thread 3 (Thread 1105209680 (LWP 29478))]#0
>>>>> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0
>>>>> (gdb) bt
>>>>> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0
>>>>> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0
>>>>> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from
>>>>> /lib64/libpthread.so.0
>>>>> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from
>>>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1
>>>>> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self=<value
>>>>> optimized out>, key=0x0, value=0x33ca40ff58
>>>>> "!\204"...)
>>>>> at ParallelWorker.c:664 >>>>> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62f270) >>>>> at ParallelWorker.c:719 >>>>> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62f270) at >>>>> ParallelWorker.c:504 >>>>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >>>>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >>>>> >>>>> -----Thread 4 >>>>> [Switching to thread 4 (Thread 1094719824 (LWP 29477))]#0 >>>>> 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 >>>>> (gdb) bt >>>>> #0 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 >>>>> --->>#1 0x00002aaaab3db84a in PMIU_readline () from >>>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> --->>#2 0x00002aaaab3d9d37 in PMI_Spawn_multiple () from >>>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> #3 0x00002aaaab333893 in MPIDI_Comm_spawn_multiple () from >>>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> #4 0x00002aaaab38bcf6 in MPID_Comm_spawn_multiple () from >>>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> #5 0x00002aaaab355a10 in PMPI_Comm_spawn () from >>>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> #6 0x00000000004189d8 in ParallelWorker_handleParallel (self=0x62ad40) >>>>> at ParallelWorker.c:754 >>>>> #7 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ad40) at >>>>> ParallelWorker.c:504 >>>>> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >>>>> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >>>>> >>>>> I also tried to run against MPICH2 v1.0.7, but here I got a similar >>>>> scenery which show up after between 1 - 2 hours of execution, >>>>> see below: >>>>> >>>>> ----- thread 2 >>>>> [Switching to 
thread 2 (Thread 1094719824 (LWP 1279))]#0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 >>>>> (gdb) bt >>>>> #0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 >>>>> #1 0x00002aaaab5a3d2f in MPIDU_Sock_wait () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> #2 0x00002aaaab52bdc7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> #3 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> #4 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> #5 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> #6 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6358e0) at ParallelWorker.c:819 >>>>> #7 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6358e0) at ParallelWorker.c:515 >>>>> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >>>>> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >>>>> >>>>> ----- thread 3 >>>>> [Switching to thread 3 (Thread 1084229968 (LWP 1278))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>>>> (gdb) bt >>>>> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>>>> #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from 
/home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x634d20) at ParallelWorker.c:819 >>>>> #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x634d20) at ParallelWorker.c:515 >>>>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >>>>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >>>>> >>>>> >>>>> ----- thread 4 >>>>> [Switching to thread 4 (Thread 1115699536 (LWP 1277))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>>>> (gdb) bt >>>>> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 >>>>> #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 >>>>> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6341a0) at ParallelWorker.c:819 >>>>> #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6341a0) at ParallelWorker.c:515 >>>>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 >>>>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 >>>>> >>>>> where the thread 2 is poll()ing never never returns, so never signals >>>>> the poll() completion and than all the others >>>>> waiters 
in the MPIDI_CH3I_Progress() condition will never wake up.
>>>>>
>>>>> Is anyone else having the same problem?
>>>>>
>>>>> Thanks in advance,
>>>>> Roberto Fichera.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss@cse.ohio-state.edu
>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>
>>>
>>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080722/ccca9f3d/attachment-0001.html

From christian.guggenberger at rzg.mpg.de Tue Jul 22 14:55:13 2008
From: christian.guggenberger at rzg.mpg.de (Christian Guggenberger)
Date: Tue Jul 22 14:55:24 2008
Subject: [mvapich-discuss] mvapich 1 vs. mvapich 2 performance
In-Reply-To:
References:
Message-ID: <20080722185513.GA20837@daltons.rzg.mpg.de>

On Wed, Jul 16, 2008 at 10:14:27AM -0400, Noam Bernstein wrote:
> Should I be surprised at this gap in bandwidth between mvapich 1 and
> mvapich 2 (OSU benchmarks 3.0, osu_bibw)? mpi1 version is quite close to
> expected maximum for IB (8 Gb/s each way), but mpi2 is 25% lower.
>
> Our cluster uses dual processor single core Opterons, Mellanox Infiniband
> HCAs with OFED 1.2.5.1, only 1 processor on each node in use.
>

Just curious, because in a different thread (about fork() et al.) you also
talked about performance problems which were solvable using CPU
affinity/mappings. Is this also the case here?

(Background: I am also looking into a real-app performance drop with
mvapich2 vs. mvapich. This particular code even shows the degradation when
run with only one MPI task and, in that case, even with mpich2.
Furthermore, I cannot reproduce these results on Core 2-based Xeons. So far
I was only able to reproduce it on Opterons with PCI-X HCAs (Tavor-based).
I would thus suspect NUMA vs. SMP here...)

cheers.
 - Christian

From Craig.Tierney at noaa.gov Tue Jul 22 15:02:09 2008
From: Craig.Tierney at noaa.gov (Craig Tierney)
Date: Tue Jul 22 15:02:18 2008
Subject: [mvapich-discuss] Help with polled desc error revisited
Message-ID: <48862EB1.9090804@noaa.gov>

Back in January 2008, there was a thread about some users getting
the following error message:

[9] Abort: Error code in polled desc!
at line 1229 in file rdma_iba_priv.c

I did not see a resolution to this problem. Did anyone
find a solution?

I am trying to run some applications that were compiled under
the following setup:

Dual-core Woodcrest, RHAS 4.4, Intel 9.1, OFED 1.2.5.1, Mvapich2-1.0

I am now trying to run these applications under a new environment:

Quad-core Harpertown, Centos 5.1, Intel 9.1, OFED 1.3.1, Mvapich2-1.0

Our goal is to minimize the changes in the environment while moving
to a new OS base.

Some executables that ran properly on the Woodcrest system do not
run on the Harpertown system. Not all codes exhibit this problem.
If the codes are run on 4 cores per host (vs. 8), they launch correctly.

I have tried tweaking the limit for stacksize (because we generally set
it to unlimited), but this did not help.

I did find that using MV2_USE_SHAM=1 on the Harpertown image caused codes
to fail when sending messages of certain sizes. I wonder if there is
another variable that could affect startup.

Thanks,
Craig

--
Craig Tierney (craig.tierney@noaa.gov)

From Craig.Tierney at noaa.gov Tue Jul 22 17:53:17 2008
From: Craig.Tierney at noaa.gov (Craig Tierney)
Date: Tue Jul 22 17:53:30 2008
Subject: [mvapich-discuss] Help with polled desc error revisited
In-Reply-To: <48862EB1.9090804@noaa.gov>
References: <48862EB1.9090804@noaa.gov>
Message-ID: <488656CD.7080908@noaa.gov>

Sorry to follow up on my own message. I found the resolution in the
previous thread: clearing MV2_USE_RING_STARTUP solves the issue.

Craig Craig Tierney wrote: > Back in January 2008, there was a thread about some users getting > the following error messages: > > [9] Abort: Error code in polled desc! > at line 1229 in file rdma_iba_priv.c > > I did not see a resolution to this problem. Did anyone > find a solution? > > I am trying to run some applications that were compiled under > the following setup: > > Dual-core Woodcrest, RHAS 4.4, Intel 9.1, OFED 1.2.5.1, Mvapich2-1.0 > > I am now trying to run these applications under a new environment: > > Quad-core Harpertown, Centos 5.1, Intel 9.1, OFED 1.3.1, Mvapich2-1.0 > > Our goal is to minimize the changes in the environment, while getting > on a new OS base. > > Some executables that ran properly on the Woodcrest system do not > run on the Harpertown system. Not all codes exhibit this problem. > If the codes are run on 4 cores per host (vs. 8), they launch correctly. > > I have tried tweaking the limit for stacksize (because we generally set > them to unlimited), but this did not help. > > I did find that using MV2_USE_SHAM=1 on the Harpertown image caused codes > to fail when sending certain sized messages. I wonder if there is > another variable that could affect startup. > > Thanks, > Craig > > > -- Craig Tierney (craig.tierney@noaa.gov) From kernel at tekno-soft.it Fri Jul 25 13:40:52 2008 From: kernel at tekno-soft.it (Roberto Fichera) Date: Fri Jul 25 13:42:34 2008 Subject: [mvapich-discuss] Races with MPI_THREAD_MULTI In-Reply-To: References: Message-ID: <488A1024.60902@tekno-soft.it> Skipped content of type multipart/alternative-------------- next part -------------- A non-text attachment was scrubbed... 
Name: TestMPICH2.zip
Type: application/x-zip-compressed
Size: 5073 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080725/a24d8580/TestMPICH2-0001.bin

From panda at cse.ohio-state.edu Sat Jul 26 23:36:20 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Sat Jul 26 23:36:32 2008
Subject: [mvapich-discuss] Races with MPI_THREAD_MULTI
In-Reply-To: <488A1024.60902@tekno-soft.it>
Message-ID:

Roberto - Thanks for sending us the code. We will take a look at it.

Thanks,

DK

On Fri, 25 Jul 2008, Roberto Fichera wrote:

> Dhabaleswar Panda wrote:
> > Hi Roberto,
> >
> > We have done several rounds of checks and do not see any difference
> > between MPICH2 1.0.7 and the TCP/IP interface of MVAPICH2 1.2. Both these
> > should perform exactly the same. We are continuing our investigation.
> >
> > We are wondering whether you can send us a sample code piece to reproduce
> > the problem you are indicating across these two interfaces. This will
> > help us to debug this problem faster and help you to solve your problem.
> >
> I've added other CCs to this email; maybe other people are interested in
> having a look.
>
> Attached you will find the test program I'm working on to reproduce the
> problem. I'm not completely sure that it works perfectly, since I wasn't
> able to complete a run, but please let me know if I did something wrong
> in the code. The testmaster is quite simple: you must provide the number
> of jobs to simulate (say 50000) and the node file that the resource
> manager provides for its schedule. The node that matches the master is
> excluded from the slave nodes.
>
> The testmain creates a ring of threads from the assigned nodes. Walking
> the ring, it starts a thread for each free node it finds, so you should
> have as many threads running concurrently as there are assigned nodes.
> To simulate some work, each thread generates a random integer, sets some
> MPI_Info keys (host and pwd), spawns the testslave job, sends it the
> random number, and waits for the testslave to receive it and send it
> back. The sent and received numbers are compared to verify that they
> match. The slave then sends an empty MPI_Send() to signal its
> termination, the thread calls MPI_Comm_disconnect() to close the slave
> connection, and finally all the MPI_Info objects are cleared. At this
> point the thread terminates. When the requested number of jobs has been
> processed, the application should terminate, but without cleaning up
> (too tired, sorry ;-), so it just waits a bit and finalizes MPI.
>
> So far I haven't been able to complete a single run. The application
> currently still crashes with the backtrace you find below. Only once was
> I able to reach 3500 jobs, but then one thread was stuck in a mutex.
> In the backtrace you can see the same race I'm getting in my
> applications.
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 1087666512 (LWP 18231)] > 0x00000000006a3902 in MPIDI_PG_Dup_vcr () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > Missing separate debuginfos, use: debuginfo-install glibc.x86_64 > (gdb) info threads > 29 Thread 1121462608 (LWP 18232) 0x0000003465a0a8f9 in > pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > * 28 Thread 1087666512 (LWP 18231) 0x00000000006a3902 in > MPIDI_PG_Dup_vcr () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > 27 Thread 1142442320 (LWP 18230) 0x0000003464ecbd66 in poll () from > /lib64/libc.so.6 > 26 Thread 1098156368 (LWP 18229) 0x0000003464e9ac61 in nanosleep () > from /lib64/libc.so.6 > 1 Thread 140135980537584 (LWP 18029) main (argc=3, > argv=0x7ffffb5992d8) at testmaster.c:437 > > (gdb) bt > #0 0x00000000006a3902 in MPIDI_PG_Dup_vcr () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #1 0x0000000000668012 in SetupNewIntercomm () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #2 0x00000000006682c8 in MPIDI_Comm_accept () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #3 0x00000000006a6617 in MPID_Comm_accept () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #4 0x000000000065ec5f in MPIDI_Comm_spawn_multiple () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #5 0x00000000006a17e6 in MPID_Comm_spawn_multiple () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #6 0x00000000006783fd in PMPI_Comm_spawn () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #7 0x00000000004017de 
in NodeThread_threadMain (arg=0x120a790) at > testmaster.c:314 > #8 0x0000003465a06407 in start_thread () from /lib64/libpthread.so.0 > #9 0x0000003464ed4b0d in clone () from /lib64/libc.so.6 > (gdb) thread 29 > > [Switching to thread 29 (Thread 1121462608 (LWP 18232))]#0 > 0x0000003465a0a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > (gdb) bt > #0 0x0000003465a0a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x000000000065e2e7 in MPIDI_CH3I_Progress () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #2 0x00000000006675ca in FreeNewVC () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #3 0x0000000000668302 in MPIDI_Comm_accept () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #4 0x00000000006a6617 in MPID_Comm_accept () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #5 0x000000000065ec5f in MPIDI_Comm_spawn_multiple () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #6 0x00000000006a17e6 in MPID_Comm_spawn_multiple () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #7 0x00000000006783fd in PMPI_Comm_spawn () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #8 0x00000000004017de in NodeThread_threadMain (arg=0x120d590) at > testmaster.c:314 > #9 0x0000003465a06407 in start_thread () from /lib64/libpthread.so.0 > #10 0x0000003464ed4b0d in clone () from /lib64/libc.so.6 > (gdb) thread 27 > > [Switching to thread 27 (Thread 1142442320 (LWP 18230))]#0 > 0x0000003464ecbd66 in poll () from /lib64/libc.so.6 > (gdb) bt > #0 0x0000003464ecbd66 in poll () from /lib64/libc.so.6 > #1 0x00000000006d63bf in 
MPIDU_Sock_wait () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #2 0x000000000065e1e7 in MPIDI_CH3I_Progress () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #3 0x00000000006cf87c in PMPI_Send () from > /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > #4 0x0000000000401831 in NodeThread_threadMain (arg=0x120a6f0) at > testmaster.c:480 > #5 0x0000003465a06407 in start_thread () from /lib64/libpthread.so.0 > #6 0x0000003464ed4b0d in clone () from /lib64/libc.so.6 > > (gdb) thread 26 > [Switching to thread 26 (Thread 1098156368 (LWP 18229))]#0 > 0x0000003464e9ac61 in nanosleep () from /lib64/libc.so.6 > (gdb) bt > #0 0x0000003464e9ac61 in nanosleep () from /lib64/libc.so.6 > #1 0x0000003464e9aa84 in sleep () from /lib64/libc.so.6 > #2 0x000000000040197c in NodeThread_threadMain (arg=0x120d630) at > testmaster.c:505 > #3 0x0000003465a06407 in start_thread () from /lib64/libpthread.so.0 > #4 0x0000003464ed4b0d in clone () from /lib64/libc.so.6 > (gdb) > > > Thanks, > > > > DK > > > > On Tue, 22 Jul 2008, Roberto Fichera wrote: > > > > > >> Roberto Fichera ha scritto: > >> > >>> Dhabaleswar Panda ha scritto: > >>> > >>>> Hi Roberto, > >>>> > >>>> Thanks for your note. You are using the ch3:sock device in MVAPICH2 which > >>>> is the same as MPICH2. You are also seeing similar failure scenarios (but > >>>> in different forms) with MPICH2 1.0.7. I am cc'ing this message to mpich2 > >>>> mailing list. One of the MPICH2 developers will be able to extend help on > >>>> this issue faster. > >>>> > >>>> > >>> Thanks for that. About the mpich2 problem, I already sent an email > >>> regarding its related issue. > >>> But the strange thing is that when linking against mpich2 I don't see > >>> a so fast race as I see in the > >>> mvapich2. In the mpich2 case I had to wait 1 or 2 hours before the lock. 
> >>>
> >> Just an update on the problem. After replacing every
> >> MPI_Send() with MPI_Ssend(), everything
> >> seems to work well with mpich2 v1.0.7. My application no longer locks up,
> >> at least after dispatching
> >> 50,000 jobs across 4 nodes, but when I run the same application
> >> against the latest mvapich2 1.2rc1
> >> I still get the same problem, as shown below.
> >>
> >> I have another question: since this multithreaded application has to run
> >> on a cluster with 1024 nodes equipped
> >> with Mellanox IB cards, I would really like to know whether the OpenFabrics-IB
> >> interface supports MPI_THREAD_MULTIPLE
> >> initialization and also MPI_Comm_spawn().
> >>
> >> Thanks a lot for the feedback.
> >>
> >>>> Thanks,
> >>>>
> >>>> DK
> >>>>
> >>>>
> >>>> On Fri, 18 Jul 2008, Roberto Fichera wrote:
> >>>>
> >>>>
> >>>>
> >>>>> Hi All on the list,
> >>>>>
> >>>>> I'm trying to use mvapich2 v1.2rc1 in a multithreaded application,
> >>>>> initialized with MPI_THREAD_MULTIPLE.
> >>>>> The master application does the following: it starts several
> >>>>> threads, depending on the assigned nodes, and
> >>>>> on each node a slave application is spawned using MPI_Comm_spawn().
> >>>>> Before calling
> >>>>> MPI_Comm_spawn() I prepare an MPI_Info struct, one for each
> >>>>> thread, in order to set all the keys
> >>>>> (host and wdir) needed to get the wanted behaviour. As soon as
> >>>>> the master application starts, it locks up
> >>>>> immediately with 4 nodes, 1 master and 3 slaves. Below you can see the
> >>>>> status of the master application at the time of
> >>>>> the hang. It seems stuck in PMIU_readline(), which never returns, so the
> >>>>> global lock is never released.
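The hang pattern reported here (one thread blocked in a read while holding a library-wide lock, other threads queued on that lock) can be reproduced in miniature. The sketch below is only a toy model in Python and is not MPICH2 or MVAPICH2 code: `global_lock`, `spawner` and `info_setter` are hypothetical stand-ins for the library's global mutex, the thread inside PMI_Spawn_multiple and a thread calling MPI_Info_set, and a pipe stands in for the PMI socket.

```python
import os
import threading

# Hypothetical stand-ins: one process-wide lock and a pipe playing the PMI socket.
global_lock = threading.Lock()
read_fd, write_fd = os.pipe()
holder_ready = threading.Event()


def spawner():
    """Take the global lock, then block reading a reply that has not arrived."""
    with global_lock:
        holder_ready.set()
        os.read(read_fd, 1)  # blocks until something is written to the pipe


def info_setter(result):
    """Try to enter the 'library' while the spawner still holds the lock."""
    holder_ready.wait()
    acquired = global_lock.acquire(timeout=0.2)
    result.append(acquired)
    if acquired:
        global_lock.release()


result = []
t_spawn = threading.Thread(target=spawner, daemon=True)
t_info = threading.Thread(target=info_setter, args=(result,))
t_spawn.start()
t_info.start()
t_info.join()

# The second thread could not get the lock while the first was stuck in read().
starved = result == [False]

# Once the awaited reply shows up, the lock is released and progress resumes.
os.write(write_fd, b"x")
t_spawn.join()
recovered = global_lock.acquire(timeout=1.0)
if recovered:
    global_lock.release()

os.close(read_fd)
os.close(write_fd)
print("starved while reader blocked:", starved)
print("lock available after reply:", recovered)
```

In the toy the reply eventually arrives; in the report the read never returns, so the waiting threads stay blocked for good.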
MVAPICH2 > >>>>> is compiled with: > >>>>> > >>>>> PKG_PATH=/HRI/External/mvapich2/1.2rc1 > >>>>> > >>>>> ./configure --prefix=$PKG_PATH \ > >>>>> --bindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ > >>>>> --sbindir=$PKG_PATH/bin/linux-x86_64-gcc-glibc2.3.4 \ > >>>>> --libdir=$PKG_PATH/lib/linux-x86_64-gcc-glibc2.3.4 \ > >>>>> --enable-sharedlibs=gcc \ > >>>>> --enable-f90 \ > >>>>> --enable-threads=multiple \ > >>>>> --enable-g=-ggdb \ > >>>>> --enable-debuginfo \ > >>>>> --with-device=ch3:sock \ > >>>>> --datadir=$PKG_PATH/data \ > >>>>> --with-htmldir=$PKG_PATH/doc/html \ > >>>>> --with-docdir=$PKG_PATH/doc \ > >>>>> LDFLAGS='-Wl,-z,noexecstack' > >>>>> > >>>>> so I'm using the ch3:sock device. > >>>>> > >>>>> -----Thread 2 > >>>>> [Switching to thread 2 (Thread 1115699536 (LWP 29479))]#0 > >>>>> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > >>>>> (gdb) bt > >>>>> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > >>>>> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 > >>>>> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from > >>>>> /lib64/libpthread.so.0 > >>>>> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from > >>>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self= >>>>> optimized out>, key=0x0, value=0x33ca40ff58 > >>>>> "!\204��\r\206��\030\204��3\206��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\177\205��\177\205��\177\205��\177\205��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\033\205��\033\205��\033\205��\033\205��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\033\205��\033\205��"...) 
> >>>>> at ParallelWorker.c:664 > >>>>> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62ff50) > >>>>> at ParallelWorker.c:719 > >>>>> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ff50) at > >>>>> ParallelWorker.c:504 > >>>>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >>>>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >>>>> > >>>>> -----Thread 3 > >>>>> [Switching to thread 3 (Thread 1105209680 (LWP 29478))]#0 > >>>>> 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > >>>>> (gdb) bt > >>>>> #0 0x00000033ca40cef4 in __lll_lock_wait () from /lib64/libpthread.so.0 > >>>>> #1 0x00000033ca408915 in _L_lock_102 () from /lib64/libpthread.so.0 > >>>>> --->>#2 0x00000033ca408390 in pthread_mutex_lock () from > >>>>> /lib64/libpthread.so.0 > >>>>> --->>#3 0x00002aaaab382654 in PMPI_Info_set () from > >>>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #4 0x0000000000417627 in ParallelWorker_setSlaveInfo (self= >>>>> optimized out>, key=0x0, value=0x33ca40ff58 > >>>>> "!\204��\r\206��\030\204��3\206��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\177\205��\177\205��\177\205��\177\205��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\033\205��\033\205��\033\205��\033\205��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\n\204��\033\205��\033\205��"...) 
> >>>>> at ParallelWorker.c:664 > >>>>> #5 0x0000000000418905 in ParallelWorker_handleParallel (self=0x62f270) > >>>>> at ParallelWorker.c:719 > >>>>> #6 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62f270) at > >>>>> ParallelWorker.c:504 > >>>>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >>>>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >>>>> > >>>>> -----Thread 4 > >>>>> [Switching to thread 4 (Thread 1094719824 (LWP 29477))]#0 > >>>>> 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 > >>>>> (gdb) bt > >>>>> #0 0x00000033ca40d34b in read () from /lib64/libpthread.so.0 > >>>>> --->>#1 0x00002aaaab3db84a in PMIU_readline () from > >>>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> --->>#2 0x00002aaaab3d9d37 in PMI_Spawn_multiple () from > >>>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #3 0x00002aaaab333893 in MPIDI_Comm_spawn_multiple () from > >>>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #4 0x00002aaaab38bcf6 in MPID_Comm_spawn_multiple () from > >>>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #5 0x00002aaaab355a10 in PMPI_Comm_spawn () from > >>>>> /home/roberto/.HRI/Proxy/HRI/External/mvapich2/1.2/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #6 0x00000000004189d8 in ParallelWorker_handleParallel (self=0x62ad40) > >>>>> at ParallelWorker.c:754 > >>>>> #7 0x000000000041b39e in ParallelWorker_threadMain (arg=0x62ad40) at > >>>>> ParallelWorker.c:504 > >>>>> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >>>>> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >>>>> > >>>>> I also tried to run against MPICH2 v1.0.7, but here I got a similar > >>>>> scenery which show up after between 1 - 2 hours of execution, 
> >>>>> see below: > >>>>> > >>>>> ----- thread 2 > >>>>> [Switching to thread 2 (Thread 1094719824 (LWP 1279))]#0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 > >>>>> (gdb) bt > >>>>> #0 0x00000033c94cbd66 in poll () from /lib64/libc.so.6 > >>>>> #1 0x00002aaaab5a3d2f in MPIDU_Sock_wait () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #2 0x00002aaaab52bdc7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #3 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #4 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #5 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #6 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6358e0) at ParallelWorker.c:819 > >>>>> #7 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6358e0) at ParallelWorker.c:515 > >>>>> #8 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >>>>> #9 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >>>>> > >>>>> ----- thread 3 > >>>>> [Switching to thread 3 (Thread 1084229968 (LWP 1278))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >>>>> (gdb) bt > >>>>> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >>>>> #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from 
/home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x634d20) at ParallelWorker.c:819 > >>>>> #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x634d20) at ParallelWorker.c:515 > >>>>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >>>>> #8 0x00000033c94d4b0d in clone () from /lib64/libc.so.6 > >>>>> > >>>>> > >>>>> ----- thread 4 > >>>>> [Switching to thread 4 (Thread 1115699536 (LWP 1277))]#0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >>>>> (gdb) bt > >>>>> #0 0x00000033ca40a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 > >>>>> #1 0x00002aaaab52bec7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #2 0x00002aaaab5301a7 in MPIDI_CH3U_VC_WaitForClose () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #3 0x00002aaaab56f162 in MPID_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #4 0x00002aaaab5417ec in PMPI_Comm_disconnect () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glibc2.3.4/libmpich.so.1.1 > >>>>> #5 0x00002aaaabda5a99 in ParallelWorker_destroySlave (self=0x6341a0) at ParallelWorker.c:819 > >>>>> #6 0x00002aaaabda6223 in ParallelWorker_threadMain (arg=0x6341a0) at ParallelWorker.c:515 > >>>>> #7 0x00000033ca406407 in start_thread () from /lib64/libpthread.so.0 > >>>>> #8 
0x00000033c94d4b0d in clone () from /lib64/libc.so.6
> >>>>>
> >>>>> where thread 2 is in poll() and never returns, so it never signals
> >>>>> the poll() completion, and then all the other
> >>>>> waiters on the MPIDI_CH3I_Progress() condition never wake up.
> >>>>>
> >>>>> Is anyone else having the same problem?
> >>>>>
> >>>>> Thanks in advance,
> >>>>> Roberto Fichera.
> >>>>>
> >>>>
> >>> ------------------------------------------------------------------------
> >>>
> >>> _______________________________________________
> >>> mvapich-discuss mailing list
> >>> mvapich-discuss@cse.ohio-state.edu
> >>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >>>
> >>
> >

From henrybooks at yahoo.com.cn Mon Jul 28 07:38:50 2008
From: henrybooks at yahoo.com.cn (Peter Zhu)
Date: Mon Jul 28 08:33:01 2008
Subject: [mvapich-discuss] build problem(Jul 28, 2008)
Message-ID: <255974.42605.qm@web15006.mail.cnb.yahoo.com>

Hi All,
I downloaded mvapich-1.0.1.tar.gz and installed it on a cluster system (with OFED 1.3.1 / Red Hat AS4u6 x86-64 / Intel Fortran Compiler 9.0):
1) tar -zxvf mvapich-1.0.1.tar.gz
2) cd mvapich-1.0.1
3) ./configure -prefix=/usr/local/mvapich1_0 --enable-weak-symbols --enable-sharedlibs --enable-f90modules -fc=/usr/local/intel/fce/9.0/bin/ifort -f90=/usr/local/intel/fce/9.0/bin/ifort
4) make
5) make install
but I did not find mpirun_rsh in the directory /usr/local/mvapich1_0/bin. Why?

Regards.

peter2008

From koop at cse.ohio-state.edu Mon Jul 28 10:11:27 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Mon Jul 28 10:11:39 2008
Subject: [mvapich-discuss] build problem(Jul 28, 2008)
In-Reply-To: <255974.42605.qm@web15006.mail.cnb.yahoo.com>
Message-ID:

Peter,

MVAPICH at this point requires you to set some additional CFLAGS as well as a different device.
To assist with this, we provide a make script, 'make.mvapich.gen2', that will configure, compile and install MVAPICH. This script is in the base directory.

Additional information is in the user guide:
http://mvapich.cse.ohio-state.edu/support/mvapich_user_guide.pdf

Let us know if you have any additional questions.

Thanks,
Matt

On Mon, 28 Jul 2008, Peter Zhu wrote:

> Hi All,
> I downloaded mvapich mvapich-1.0.1.tar.gz, and
> installed it on a Cluster system(with OFED
> 1.3.1/Red Hat AS4u6-x86-64/Intel Fortran Compiler 9.0).
> 1) tar -zxvf mvapich-1.0.1.tar.gz
> 2) cd mvapich-1.0.1
> 3) ./configure -prefix=/usr/local/mvapich1_0
> --enable-weak-symbols --enable-sharedlibs
> --enable-f90modules
> -fc=/usr/local/intel/fce/9.0/bin/ifort
> -f90=/usr/local/intel/fce/9.0/bin/ifort
> 4) make
> 5) make install
> but I did not find mpirun_rsh in directory
> /usr/local/mvapich1_0/bin,
> why?
>
>
> Regards.
>
> peter2008
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From Terrence.LIAO at total.com Mon Jul 28 16:38:36 2008
From: Terrence.LIAO at total.com (Terrence.LIAO@total.com)
Date: Mon Jul 28 16:38:52 2008
Subject: [mvapich-discuss] Question on "Couldn't create RC QP"
Message-ID:

Dear mvapich,

I encounter this "Couldn't create RC QP" problem from time to time. When a run fails with this message, I can usually succeed by running it again. Could anyone explain why the second attempt has no problem? I know this might be linked to the locked memory limit being too small, but I am puzzled why the run does not fail every time.

Thank you very much.

-- Terrence
--------------------------------------------------------
Terrence Liao, Ph.D.
Research Computer Scientist
TOTAL E&P RESEARCH & TECHNOLOGY USA, LLC
1201 Louisiana, Suite 1800, Houston, TX 77002
Tel: 713.647.3498 Fax: 713.647.3638
Email: terrence.liao@total.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080728/355b6e23/attachment-0001.html

From johnip at sgi.com Mon Jul 28 12:17:23 2008
From: johnip at sgi.com (John Partridge)
Date: Tue Jul 29 08:24:41 2008
Subject: [mvapich-discuss] Hard coded /tmp patch for shared memory files
Message-ID: <488DF113.1020102@sgi.com>

We recently had a customer issue with shared memory files being
hard-coded to /tmp. The system was
a diskless cluster with /tmp being an in-memory file system.

The /tmp file system was not large enough to hold the shared
memory files, so the customer asked if we could make mvapich use
an alternative path for them.

The version the customer is using is mvapich-0.9.9-1326 (from ofed-1.3),
and we produced a patch that takes an alternative path from an environment
variable. The patch is attached in case you might want to include it
in a future release of mvapich/mvapich2.

Regards
John

--
John Partridge
MPI Engineering Group

Silicon Graphics Inc
Tel: 651-683-3428
Vnet: 233-3428
E-Mail: johnip@sgi.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tmppath.patch
Type: text/x-patch
Size: 10205 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080728/0bf6fb4b/tmppath.bin

From kallies at zib.de Tue Jul 29 12:45:38 2008
From: kallies at zib.de (Bernd Kallies)
Date: Tue Jul 29 12:45:54 2008
Subject: [mvapich-discuss] Performance differences between mvapich2-1.0 and mvapich2-1.2
Message-ID: <1217349938.3562.470.camel@kallies.zib.de>

It seems to me that mvapich2-1.2rc1 is slower than previous versions when compiling/using defaults.
I'd like to know if I forgot some secret preprocessor flag or configure option for 1.2.

I compiled the nightly build for mvapich2-1.0 as of July 28 (I guess it is something like mvapich2-1.0.5) with the following settings:

export CC=icc
export CXX=icpc
export F77=ifort
export F90=ifort
export CFLAGS='-D_EM64T_ -D_SMP_ -DUSE_HEADER_CACHING -DONE_SIDED -DMPIDI_CH3_CHANNEL_RNDV -DMPID_USE_SEQUENCE_NUMBERS -DRDMA_CM -O2'
configure --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd --disable-romio --enable-sharedlibs=gcc --without-mpe

I compiled the tarball source of mvapich2-1.2rc1 with

unset CFLAGS
./configure --enable-romio --with-file-system=lustre+nfs --enable-fast=defopt --with-rdma=gen2 --with-thread-package --enable-sharedlibs=gcc --without-mpe

I get the following when running osu_alltoall with 1 task per node on two nodes after setting MV2_NUM_PORTS=2 MV2_ENABLE_AFFINITY=0:

mvapich2-1.0.5-intel:
# OSU MPI All-to-All Personalized Exchange Latency Test v3.1
# Size      Latency (us)
1                 1.62
2                 1.71
4                 1.66
8                 1.64
16                1.68
32                1.74
64                1.97
128               3.04
256               3.42
512               4.01
1024              5.26
2048              6.62
4096              9.45
8192             15.20
16384            17.76
32768            23.21
65536            38.60
131072           76.32
262144          151.70
524288          296.74
1048576         591.68

mvapich2-1.2rc1-intel:
# OSU MPI All-to-All Personalized Exchange Latency Test v3.1
# Size      Latency (us)
1                 1.87
2                 1.80
4                 1.81
8                 1.82
16                1.86
32                1.92
64                2.10
128               3.16
256               3.53
512               4.07
1024              5.33
2048              6.79
4096              9.54
8192             15.34
16384            17.48
32768            22.88
65536            38.78
131072           76.55
262144          149.74
524288          297.11
1048576         591.25

Other OSU benchmarks yield no visible differences between the two builds, e.g. osu_mbw_mr with 2 nodes and 4 tasks per node:

mvapich2-1.0.5-intel:
# OSU MPI Multiple Bandwidth / Message Rate Test v3.1
# [ pairs: 4 ] [ window size: 64 ]
# Size         MB/s     Messages/s
1              3.45     3447336.26
2              6.93     3463236.43
4             13.83     3458551.26
8             27.68     3460000.08
16            62.91     3931824.03
32           109.74     3429389.41
64           213.14     3330258.12
128          353.90     2764881.74
256          624.27     2438548.84
512          980.57     1915173.15
1024        1241.38     1212281.33
2048        1463.71      714703.42
4096        1612.25      393616.25
8192        1721.11      210096.00
16384       1851.29      112993.94
32768       2051.28       62600.09
65536       2062.08       31464.92
131072      2065.59       15759.17
262144      2074.04        7911.82
524288      2082.66        3972.35
1048576     2087.94        1991.22
2097152     2090.20         996.69
4194304     2075.23         494.77

mvapich2-1.2rc1-intel:
# OSU MPI Multiple Bandwidth / Message Rate Test v3.1
# [ pairs: 4 ] [ window size: 64 ]
# Size         MB/s     Messages/s
1              3.42     3424686.07
2              6.92     3459442.70
4             13.73     3431691.09
8             27.59     3449218.84
16            62.63     3914337.15
32           108.91     3403302.14
64           210.89     3295101.65
128          347.89     2717920.88
256          621.49     2427687.32
512          982.32     1918595.24
1024        1246.40     1217187.35
2048        1490.18      727625.11
4096        1684.54      411264.55
8192        1768.11      215833.58
16384       1852.36      113059.37
32768       2048.83       62525.18
65536       2062.01       31463.76
131072      2066.38       15765.20
262144      2074.90        7915.12
524288      2082.75        3972.54
1048576     2088.07        1991.34
2097152     2090.04         996.61
4194304     2077.47         495.31

I also compiled the quantum chemistry code CPMD 3.11.1 with both libs. The code has its own profiling. A benchmark run with 64 nodes, 1 task per node, 1 thread per task, application-defined task pinning, MV2_NUM_PORTS=2 MV2_ENABLE_AFFINITY=0 yields:

mvapich2-1.0.5-intel:
...
CPU TIME     : 0 HOURS 17 MINUTES  7.53 SECONDS
ELAPSED TIME : 0 HOURS 17 MINUTES 40.26 SECONDS
...
================================================================
= COMMUNICATION TASK   AVERAGE MESSAGE LENGTH  NUMBER OF CALLS =
= SEND/RECEIVE                  36385. BYTES          722421.  =
= BROADCAST                     37880. BYTES             368.  =
= GLOBAL SUMMATION             393974. BYTES           10556.  =
= GLOBAL MULTIPLICATION             0. BYTES               1.  =
= ALL TO ALL COMM              484310. BYTES           46464.  =
=                             PERFORMANCE          TOTAL TIME  =
= SEND/RECEIVE                681.133 MB/S         38.591 SEC  =
= BROADCAST                    87.115 MB/S          0.160 SEC  =
= GLOBAL SUMMATION           1520.563 MB/S         16.410 SEC  =
= GLOBAL MULTIPLICATION         0.000 MB/S          0.001 SEC  =
= ALL TO ALL COMM              86.898 MB/S        258.959 SEC  =
= SYNCHRONISATION                                   1.750 SEC  =
================================================================

mvapich2-1.2rc1-intel:
...
CPU TIME     : 0 HOURS 18 MINUTES 59.23 SECONDS
ELAPSED TIME : 0 HOURS 19 MINUTES 31.68 SECONDS
...
================================================================
= COMMUNICATION TASK   AVERAGE MESSAGE LENGTH  NUMBER OF CALLS =
= SEND/RECEIVE                  36385. BYTES          722421.  =
= BROADCAST                     37880. BYTES             368.  =
= GLOBAL SUMMATION             393974. BYTES           10556.  =
= GLOBAL MULTIPLICATION             0. BYTES               1.  =
= ALL TO ALL COMM              484310. BYTES           46464.  =
=                             PERFORMANCE          TOTAL TIME  =
= SEND/RECEIVE                699.651 MB/S         37.570 SEC  =
= BROADCAST                    87.114 MB/S          0.160 SEC  =
= GLOBAL SUMMATION           1557.608 MB/S         16.020 SEC  =
= GLOBAL MULTIPLICATION         0.000 MB/S          0.001 SEC  =
= ALL TO ALL COMM              61.302 MB/S        367.082 SEC  =
= SYNCHRONISATION                                   1.950 SEC  =
================================================================

The difference is reproducible (mvapich2-1.2rc1-intel is slower, apparently because of slow all-to-all communication), also compared to mvapich2-1.0.3 from tarball, or mvapich2-1.0.1 and mvapich-0.9.9 (both precompiled and available from SGI). Note that the benchmarks are run with no intra-node communication.

Sincerely, BK

--
Dr. Bernd Kallies
Konrad-Zuse-Zentrum für Informationstechnik Berlin
Takustr.
7 14195 Berlin Tel: +49-30-84185-270 Fax: +49-30-84185-311 e-mail: kallies@zib.de From chai.15 at osu.edu Tue Jul 29 13:47:27 2008 From: chai.15 at osu.edu (Lei Chai) Date: Tue Jul 29 13:47:35 2008 Subject: [mvapich-discuss] Hard coded /tmp patch for shared memory files In-Reply-To: <488DF113.1020102@sgi.com> References: <488DF113.1020102@sgi.com> Message-ID: <488F57AF.3000704@osu.edu> Hi John, Thanks for reporting the problem and sending the patch to us. We have also realized the limitation, and have come up with a solution that does not require an actual file path for shared memory communication (by using shmget and shmat function calls, thanks to suggestions from TACC). The new solution will be available in the next mvapich2 release. Thanks again, Lei John Partridge wrote: > We recently had a customer issue with shared memory files being > hard coded to /tmp. The circumstances were that the system was > a diskless cluster with /tmp being an in memory files system. > > The /tmp file system was not large enough to support the shared > memory files. So, the customer asked if we could make mvapich use > an alternative path for the shared memory files. > > The version the customer is using is mvapich-0.9.9-1326 (from ofed-1.3) > and we produced a patch to get an alternative path via an environment > variable. 
The patch is attached in case you might want to include it > in a future release of mvapich/mvapich2 > > Regards > John > > ------------------------------------------------------------------------ > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss From koop at cse.ohio-state.edu Tue Jul 29 13:47:50 2008 From: koop at cse.ohio-state.edu (Matthew Koop) Date: Tue Jul 29 13:47:59 2008 Subject: [mvapich-discuss] Hard coded /tmp patch for shared memory files In-Reply-To: <488DF113.1020102@sgi.com> Message-ID: John, Thanks for the patch, we'll try to include this functionality in future releases. We're also looking into not having to create the temp file at all, which will be another way to resolve this issue on diskless clusters. Thanks again for the feedback and patch, Matt On Mon, 28 Jul 2008, John Partridge wrote: > We recently had a customer issue with shared memory files being > hard coded to /tmp. The circumstances were that the system was > a diskless cluster with /tmp being an in memory files system. > > The /tmp file system was not large enough to support the shared > memory files. So, the customer asked if we could make mvapich use > an alternative path for the shared memory files. > > The version the customer is using is mvapich-0.9.9-1326 (from ofed-1.3) > and we produced a patch to get an alternative path via an environment > variable. 
The patch is attached in case you might want to include it
> in a future release of mvapich/mvapich2
>
> Regards
> John
>
> --
> John Partridge
> MPI Engineering Group
>
> Silicon Graphics Inc
> Tel: 651-683-3428
> Vnet: 233-3428
> E-Mail: johnip@sgi.com
>

From moody20 at llnl.gov Tue Jul 29 17:16:35 2008
From: moody20 at llnl.gov (Adam Moody)
Date: Tue Jul 29 17:16:47 2008
Subject: [mvapich-discuss] Hard coded /tmp patch for shared memory files
In-Reply-To: <488F57AF.3000704@osu.edu>
References: <488DF113.1020102@sgi.com> <488F57AF.3000704@osu.edu>
Message-ID: <488F88B3.5020903@llnl.gov>

Hi Lei,
In practice, we found there are some disadvantages to using shared memory segments as well. Some codes may segfault or be killed early by the user, which leaves their shared memory segments orphaned. Over time, the cluster runs into problems with resource exhaustion. It's difficult to know which segments can be freed, especially on nodes which may be running several jobs. We encountered such problems with another MPI implementation on a cluster which is cpu-scheduled, such that each node may run multiple jobs at once.

We don't see this problem when using files in /tmp, since they are unlinked very soon after they are created (so that the OS will do the cleanup) and before MPI returns control to the user application from MPI_Init. It may be good to keep both methods available. I think we'd prefer the /tmp files here.
-Adam Moody
Lawrence Livermore National Laboratory

Lei Chai wrote:

> Hi John,
>
> Thanks for reporting the problem and sending the patch to us. We have
> also realized the limitation, and have come up with a solution that
> does not require an actual file path for shared memory communication
> (by using shmget and shmat function calls, thanks to suggestions from
> TACC). The new solution will be available in the next mvapich2 release.
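The /tmp behaviour described in Adam Moody's message (create the backing file, map it, unlink it right away so the OS reclaims it when the last mapping goes away) can be sketched in a few lines. This is a single-process Python illustration for POSIX systems, not the MVAPICH implementation: `create_unlinked_shared_region` and its `directory` parameter are hypothetical names, and a real MPI library would need every local process to open and map the file before the unlink.

```python
import mmap
import os
import tempfile


def create_unlinked_shared_region(size, directory=None):
    """Map a file-backed shared region, then unlink the file immediately.

    `directory` is a hypothetical knob standing in for "somewhere other
    than /tmp"; None means the system default temporary directory.
    """
    fd, path = tempfile.mkstemp(prefix="shmem-", dir=directory)
    try:
        os.ftruncate(fd, size)        # give the file its final size
        region = mmap.mmap(fd, size)  # shared mapping (the default on Unix)
    finally:
        os.unlink(path)               # the name disappears, the mapping stays valid
        os.close(fd)                  # mmap keeps its own duplicate descriptor
    return region, path


region, old_path = create_unlinked_shared_region(4096)
region[:5] = b"hello"                     # the mapping is still usable
still_on_disk = os.path.exists(old_path)  # the name is already gone
payload = bytes(region[:5])
region.close()                            # last reference: the kernel frees the storage

print("file still visible:", still_on_disk)
print("data read back:", payload)
```

Because the name is removed before any user code runs, a crash or an early kill leaves nothing behind to clean up, which is the property contrasted in the message with orphaned shared memory segments.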
> > Thanks again, > Lei > > > John Partridge wrote: > >> We recently had a customer issue with shared memory files being >> hard coded to /tmp. The circumstances were that the system was >> a diskless cluster with /tmp being an in memory files system. >> >> The /tmp file system was not large enough to support the shared >> memory files. So, the customer asked if we could make mvapich use >> an alternative path for the shared memory files. >> >> The version the customer is using is mvapich-0.9.9-1326 (from ofed-1.3) >> and we produced a patch to get an alternative path via an environment >> variable. The patch is attached in case you might want to include it >> in a future release of mvapich/mvapich2 >> >> Regards >> John >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http:// mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http:// mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > From chai.15 at osu.edu Tue Jul 29 17:56:40 2008 From: chai.15 at osu.edu (Lei Chai) Date: Tue Jul 29 17:56:49 2008 Subject: [mvapich-discuss] Hard coded /tmp patch for shared memory files In-Reply-To: <488F88B3.5020903@llnl.gov> References: <488DF113.1020102@sgi.com> <488F57AF.3000704@osu.edu> <488F88B3.5020903@llnl.gov> Message-ID: <488F9218.5020203@osu.edu> Hi Adam, Thank you for your feedbacks on using shared memory segments. They are helpful. We will investigate on the resource cleanup issue and before we find a single perfect solution we will keep both methods available. Thanks, Lei Adam Moody wrote: > Hi Lei, > In practice, we found there are some disadvantages with using shared > memory segments, as well. 
Some codes may seg fault or be killed early > by the user, which then leaves its shared memory segment orphaned. > Over time, the cluster runs into problems with resource exhaustion. > It's difficult to know which segments can be freed, especially on > nodes which may be running sereral jobs. We encountered such problems > with another MPI implementation on a cluster which is cpu-scheduled, > such that each node may run multiple jobs at once. > > We don't see this problem when using files in /tmp, since they are > unlinked very soon after they are created (so that the OS will do the > cleanup) and before MPI returns control to the user application from > MPI_Init. It may be good to keep both methods available. I think > we'd prefer the /tmp files here. > -Adam Moody > Lawrence Livermore National Laboratory > > > Lei Chai wrote: > >> Hi John, >> >> Thanks for reporting the problem and sending the patch to us. We have >> also realized the limitation, and have come up with a solution that >> does not require an actual file path for shared memory communication >> (by using shmget and shmat function calls, thanks to suggestions from >> TACC). The new solution will be available in the next mvapich2 release. >> >> Thanks again, >> Lei >> >> >> John Partridge wrote: >> >>> We recently had a customer issue with shared memory files being >>> hard coded to /tmp. The circumstances were that the system was >>> a diskless cluster with /tmp being an in memory files system. >>> >>> The /tmp file system was not large enough to support the shared >>> memory files. So, the customer asked if we could make mvapich use >>> an alternative path for the shared memory files. >>> >>> The version the customer is using is mvapich-0.9.9-1326 (from ofed-1.3) >>> and we produced a patch to get an alternative path via an environment >>> variable. 
The patch is attached in case you might want to include it >>> in a future release of mvapich/mvapich2 >>> >>> Regards >>> John >>> >>> ------------------------------------------------------------------------ >>> >>> >>> _______________________________________________ >>> mvapich-discuss mailing list >>> mvapich-discuss@cse.ohio-state.edu >>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >> >> >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >> >> From peter.cebull at inl.gov Wed Jul 30 10:45:18 2008 From: peter.cebull at inl.gov (Peter Cebull) Date: Wed Jul 30 10:45:58 2008 Subject: [mvapich-discuss] MVAPICH2 Allreduce Performance Message-ID: <48907E7E.1050009@inl.gov> We are looking at some scalability issues for a particular application on one of our clusters. Specifically, I plotted the MPI_Allreduce performance of MVAPICH2, MVAPICH, Intel MPI, and Open MPI as measured by the Intel MPI Allreduce Benchmark. The plot shows average time in microseconds vs the number of processes from 2 to 512 for a message size of 4 kB. The results show MVAPICH2 performing very well up to 128 processes, but for 256 and 512 processes the performance drops off by an order of magnitude to match the performance of MVAPICH and Intel MPI. Is this expected behavior, and is there a way to improve the scalability for 256+ processes? I didn't see this topic in the archive; I apologize if it's been discussed before. We are running dual quad-core EM64T nodes, OFED 1.2, Mellanox Technologies MT25204 [InfiniHost III Lx HCA]. This machine is an SGI Altix ICE with ProPack 5 SP3. The timing data are listed below.
mpich2version
Version: mvapich2-1.0
Device: osu_ch3:mrail
Configure Options: '--prefix=/usr/local/mvapich2/mvapich2-1.0.2/intel-opt' '--with-device=osu_ch3:mrail' '--with-rdma=gen2' '--with-pm=mpd' '--enable-shared=gcc' '--enable-sharedlibs=gcc' '--disable-romio' '--without-mpe' 'CC=icc' 'CFLAGS=-fPIC -D_EM64T_ -D_SMP_ -DUSE_HEADER_CACHING -DONE_SIDED -DMPIDI_CH3_CHANNEL_RNDV -DMPID_USE_SEQUENCE_NUMBERS -DRDMA_CM -I/usr/include -fPIC -O2' 'CXX=icpc' 'F77=ifort' 'F90=ifort' 'FFLAGS=-L/usr/lib64 -fPIC'
CC: icc -fPIC -D_EM64T_ -D_SMP_ -DUSE_HEADER_CACHING -DONE_SIDED -DMPIDI_CH3_CHANNEL_RNDV -DMPID_USE_SEQUENCE_NUMBERS -DRDMA_CM -I/usr/include -fPIC -O2
CXX: icpc
F77: ifort -L/usr/lib64 -fPIC
F90: ifort

Thanks,
Peter

# processes vs time in us
# Procs  Intel MPI 3.1  MVAPICH 0.9.9  MVAPICH2 1.0.2  Open MPI 1.2.6
      2           7.12          13.44           11.76           13.23
      4          14.82          20.72           19.16           30.25
      8          26.07          37.08           37.26           63.63
     16          83.85          84.59           80.09           95.66
     32         543.00         545.56          105.88          155.05
     64        1025.87        1018.50          111.21          272.42
    128        1492.71        1509.70          126.11          512.11
    256        1957.55        1959.09         1942.33          752.29
    512        2445.58        2481.70         2434.15          999.50

--
Peter Cebull
Idaho National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AR4096.jpg
Type: image/jpeg
Size: 41827 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080730/80c55bf7/AR4096-0001.jpg
From huanwei at cse.ohio-state.edu Wed Jul 30 10:47:16 2008 From: huanwei at cse.ohio-state.edu (wei huang) Date: Wed Jul 30 10:47:25 2008 Subject: [mvapich-discuss] Performance differences between mvapich2-1.0 and mvapich2-1.2 (fwd) In-Reply-To: Message-ID: Hi Bernd, Thanks for trying out mvapich2-1.2rc1 and letting us know about the problem. We are in the process of performance tuning and are looking at this issue. We will get back to you soon. Thanks.
-- Wei
> ---------- Forwarded message ----------
> Date: Tue, 29 Jul 2008 18:45:38 +0200
> From: Bernd Kallies
> To: mvapich-discuss@cse.ohio-state.edu
> Subject: [mvapich-discuss] Performance differences between mvapich2-1.0 and
> mvapich2-1.2
>
> It seems to me that mvapich2-1.2rc1 is slower than previous
> versions when compiling/using defaults. I'd like to know if I forgot
> some secret preprocessor flag or configure option for 1.2.
>
> I compiled the nightly build for mvapich2-1.0 as of July 28 (I guess it
> is something like mvapich2-1.0.5) with the following settings:
>
> export CC=icc
> export CXX=icpc
> export F77=ifort
> export F90=ifort
> export CFLAGS='-D_EM64T_ -D_SMP_ -DUSE_HEADER_CACHING -DONE_SIDED
> -DMPIDI_CH3_CHANNEL_RNDV -DMPID_USE_SEQUENCE_NUMBERS -DRDMA_CM -O2'
> configure --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd
> --disable-romio --enable-sharedlibs=gcc --without-mpe
>
> I compiled the tarball source of mvapich2-1.2rc1 with
> unset CFLAGS
> ./configure --enable-romio --with-file-system=lustre+nfs
> --enable-fast=defopt --with-rdma=gen2 --with-thread-package
> --enable-sharedlibs=gcc --without-mpe
>
> I get the following when running osu_alltoall with 1 task per node on
> two nodes after setting MV2_NUM_PORTS=2 MV2_ENABLE_AFFINITY=0:
>
> mvapich2-1.0.5-intel:
> # OSU MPI All-to-All Personalized Exchange Latency Test v3.1
> # Size       Latency (us)
> 1                    1.62
> 2                    1.71
> 4                    1.66
> 8                    1.64
> 16                   1.68
> 32                   1.74
> 64                   1.97
> 128                  3.04
> 256                  3.42
> 512                  4.01
> 1024                 5.26
> 2048                 6.62
> 4096                 9.45
> 8192                15.20
> 16384               17.76
> 32768               23.21
> 65536               38.60
> 131072              76.32
> 262144             151.70
> 524288             296.74
> 1048576            591.68
>
> mvapich2-1.2rc1-intel:
> # OSU MPI All-to-All Personalized Exchange Latency Test v3.1
> # Size       Latency (us)
> 1                    1.87
> 2                    1.80
> 4                    1.81
> 8                    1.82
> 16                   1.86
> 32                   1.92
> 64                   2.10
> 128                  3.16
> 256                  3.53
> 512                  4.07
> 1024                 5.33
> 2048                 6.79
> 4096                 9.54
> 8192                15.34
> 16384               17.48
> 32768               22.88
> 65536               38.78
> 131072              76.55
> 262144             149.74
> 524288             297.11
> 1048576            591.25
>
> Other OSU benchmarks yield no visible differences between the two
> builds, e.g. osu_mbw_mr with 2 nodes and 4 tasks per node:
>
> mvapich2-1.0.5-intel:
> # OSU MPI Multiple Bandwidth / Message Rate Test v3.1
> # [ pairs: 4 ] [ window size: 64 ]
> # Size           MB/s     Messages/s
> 1                3.45     3447336.26
> 2                6.93     3463236.43
> 4               13.83     3458551.26
> 8               27.68     3460000.08
> 16              62.91     3931824.03
> 32             109.74     3429389.41
> 64             213.14     3330258.12
> 128            353.90     2764881.74
> 256            624.27     2438548.84
> 512            980.57     1915173.15
> 1024          1241.38     1212281.33
> 2048          1463.71      714703.42
> 4096          1612.25      393616.25
> 8192          1721.11      210096.00
> 16384         1851.29      112993.94
> 32768         2051.28       62600.09
> 65536         2062.08       31464.92
> 131072        2065.59       15759.17
> 262144        2074.04        7911.82
> 524288        2082.66        3972.35
> 1048576       2087.94        1991.22
> 2097152       2090.20         996.69
> 4194304       2075.23         494.77
>
> mvapich2-1.2rc1-intel:
> # OSU MPI Multiple Bandwidth / Message Rate Test v3.1
> # [ pairs: 4 ] [ window size: 64 ]
> # Size           MB/s     Messages/s
> 1                3.42     3424686.07
> 2                6.92     3459442.70
> 4               13.73     3431691.09
> 8               27.59     3449218.84
> 16              62.63     3914337.15
> 32             108.91     3403302.14
> 64             210.89     3295101.65
> 128            347.89     2717920.88
> 256            621.49     2427687.32
> 512            982.32     1918595.24
> 1024          1246.40     1217187.35
> 2048          1490.18      727625.11
> 4096          1684.54      411264.55
> 8192          1768.11      215833.58
> 16384         1852.36      113059.37
> 32768         2048.83       62525.18
> 65536         2062.01       31463.76
> 131072        2066.38       15765.20
> 262144        2074.90        7915.12
> 524288        2082.75        3972.54
> 1048576       2088.07        1991.34
> 2097152       2090.04         996.61
> 4194304       2077.47         495.31
>
> I also compiled the quantum chemistry code CPMD 3.11.1 with both libs.
> The code has its own profiling. A benchmark run with 64 nodes,
> 1 task per node, 1 thread per task, application-defined task
> pinning, MV2_NUM_PORTS=2 MV2_ENABLE_AFFINITY=0 yields:
>
> mvapich2-1.0.5-intel:
> ..
> CPU TIME     : 0 HOURS 17 MINUTES  7.53 SECONDS
> ELAPSED TIME : 0 HOURS 17 MINUTES 40.26 SECONDS
> ..
> ================================================================
> = COMMUNICATION TASK  AVERAGE MESSAGE LENGTH  NUMBER OF CALLS  =
> = SEND/RECEIVE                  36385. BYTES           722421. =
> = BROADCAST                     37880. BYTES              368. =
> = GLOBAL SUMMATION             393974. BYTES            10556. =
> = GLOBAL MULTIPLICATION             0. BYTES                1. =
> = ALL TO ALL COMM              484310. BYTES            46464. =
> = PERFORMANCE                                       TOTAL TIME =
> = SEND/RECEIVE                  681.133 MB/S        38.591 SEC =
> = BROADCAST                      87.115 MB/S         0.160 SEC =
> = GLOBAL SUMMATION             1520.563 MB/S        16.410 SEC =
> = GLOBAL MULTIPLICATION           0.000 MB/S         0.001 SEC =
> = ALL TO ALL COMM                86.898 MB/S       258.959 SEC =
> = SYNCHRONISATION                                    1.750 SEC =
> ================================================================
>
> mvapich2-1.2rc1-intel:
> ..
> CPU TIME     : 0 HOURS 18 MINUTES 59.23 SECONDS
> ELAPSED TIME : 0 HOURS 19 MINUTES 31.68 SECONDS
> ..
> ================================================================
> = COMMUNICATION TASK  AVERAGE MESSAGE LENGTH  NUMBER OF CALLS  =
> = SEND/RECEIVE                  36385. BYTES           722421. =
> = BROADCAST                     37880. BYTES              368. =
> = GLOBAL SUMMATION             393974. BYTES            10556. =
> = GLOBAL MULTIPLICATION             0. BYTES                1. =
> = ALL TO ALL COMM              484310. BYTES            46464. =
> = PERFORMANCE                                       TOTAL TIME =
> = SEND/RECEIVE                  699.651 MB/S        37.570 SEC =
> = BROADCAST                      87.114 MB/S         0.160 SEC =
> = GLOBAL SUMMATION             1557.608 MB/S        16.020 SEC =
> = GLOBAL MULTIPLICATION           0.000 MB/S         0.001 SEC =
> = ALL TO ALL COMM                61.302 MB/S       367.082 SEC =
> = SYNCHRONISATION                                    1.950 SEC =
> ================================================================
>
> The difference is reproducible (mvapich2-1.2rc1-intel is slower,
> apparently because of slow all-to-all communication), also compared to
> mvapich2-1.0.3 from tarball, or mvapich2-1.0.1 and mvapich-0.9.9 (both
> precompiled by SGI and available from SGI). Note that the benchmarks are
> run with no intra-node communication.
>
> Sincerely, BK
> --
> Dr. Bernd Kallies
> Konrad-Zuse-Zentrum für Informationstechnik Berlin
> Takustr.
7 > 14195 Berlin > Tel: +49-30-84185-270 > Fax: +49-30-84185-311 > e-mail: kallies@zib.de > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > From timholme at gmail.com Wed Jul 30 11:47:15 2008 From: timholme at gmail.com (Tim Holme) Date: Thu Jul 31 00:02:05 2008 Subject: [mvapich-discuss] PMGR_COLLECTIVE ERROR: unitialized MPI task Message-ID: <8aeeb60807300847o50ef1564q4b46efa40ee77313@mail.gmail.com> I am running x86_64 GNU/Linux. After compiling a fortran program with mvapich-1.0/pgi/bin/mpif90 , I run the program with mpiexec and get the error: PMGR_COLLECTIVE ERROR: unitialized MPI task: Missing required environment variable: MPIRUN_RANK Alternatively, if I compile the program with mvapich/pgi/bin/mpif77, I run the program with mpiexec and get the error: Can't read MPIRUN_HOST I think the program is not getting the MPI environment variables passed. How can I fix this? Any help would be appreciated. Thanks, Tim From koop at cse.ohio-state.edu Thu Jul 31 11:21:21 2008 From: koop at cse.ohio-state.edu (Matthew Koop) Date: Thu Jul 31 11:21:30 2008 Subject: [mvapich-discuss] PMGR_COLLECTIVE ERROR: unitialized MPI task In-Reply-To: <8aeeb60807300847o50ef1564q4b46efa40ee77313@mail.gmail.com> Message-ID: Hi Tim, I assume you are using the OSC mpiexec for PBS? If so, it looks like you need to update to the 0.83 version of mpiexec. In the 1.0 release we updated the startup protocol to a more flexible and scalable protocol for startup. You can download the new version here: http://www.osc.edu/~pw/mpiexec/ Let us know if this solves your problem. Thanks, Matt On Wed, 30 Jul 2008, Tim Holme wrote: > I am running x86_64 GNU/Linux. 
After compiling a fortran program with > mvapich-1.0/pgi/bin/mpif90 , I run the program with mpiexec and get > the error: > > PMGR_COLLECTIVE ERROR: unitialized MPI task: Missing required > environment variable: MPIRUN_RANK > > > Alternatively, if I compile the program with mvapich/pgi/bin/mpif77, I > run the program with mpiexec and get the error: > > Can't read MPIRUN_HOST > > > I think the program is not getting the MPI environment variables > passed. How can I fix this? Any help would be appreciated. > > Thanks, > Tim > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From stevejones at stanford.edu Thu Jul 31 12:00:05 2008 From: stevejones at stanford.edu (Steve Jones) Date: Thu Jul 31 12:00:18 2008 Subject: [mvapich-discuss] PMGR_COLLECTIVE ERROR: unitialized MPI task In-Reply-To: References: Message-ID: <20080731090005.tblwb5g60cosokgg@webmail.stanford.edu> Hi Matt. 0.83 gives us this error: [smjones@compute-5-31 ~]$ /share/apps/mpiexec-test/bin/mpiexec bounce.intel.mvapich-1.0 mpiexec: Warning: read_ib_one: protocol version 8 not known, but might still work. mpiexec: Error: read_ib_one: mixed version executables (6 and 8), no hope. I thought we were waiting on 0.84 to solve this? Steve Quoting Matthew Koop : > Hi Tim, > > I assume you are using the OSC mpiexec for PBS? If so, it looks like you > need to update to the 0.83 version of mpiexec. In the 1.0 release we > updated the startup protocol to a more flexible and scalable protocol for > startup. > > You can download the new version here: > http://www.osc.edu/~pw/mpiexec/ > > Let us know if this solves your problem. > > Thanks, > Matt > > > > On Wed, 30 Jul 2008, Tim Holme wrote: > >> I am running x86_64 GNU/Linux. 
After compiling a fortran program with >> mvapich-1.0/pgi/bin/mpif90 , I run the program with mpiexec and get >> the error: >> >> PMGR_COLLECTIVE ERROR: unitialized MPI task: Missing required >> environment variable: MPIRUN_RANK >> >> >> Alternatively, if I compile the program with mvapich/pgi/bin/mpif77, I >> run the program with mpiexec and get the error: >> >> Can't read MPIRUN_HOST >> >> >> I think the program is not getting the MPI environment variables >> passed. How can I fix this? Any help would be appreciated. >> >> Thanks, >> Tim >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >> > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From koop at cse.ohio-state.edu Thu Jul 31 12:58:16 2008 From: koop at cse.ohio-state.edu (Matthew Koop) Date: Thu Jul 31 12:58:26 2008 Subject: [mvapich-discuss] PMGR_COLLECTIVE ERROR: unitialized MPI task In-Reply-To: <20080731090005.tblwb5g60cosokgg@webmail.stanford.edu> Message-ID: It looks like to get current support you may need the SVN version then (future 0.84). I was going from the release notes there. Matt On Thu, 31 Jul 2008, Steve Jones wrote: > Hi Matt. > > 0.83 gives us this error: > > [smjones@compute-5-31 ~]$ /share/apps/mpiexec-test/bin/mpiexec > bounce.intel.mvapich-1.0 > mpiexec: Warning: read_ib_one: protocol version 8 not known, but might > still work. > mpiexec: Error: read_ib_one: mixed version executables (6 and 8), no hope. > > I thought we were waiting on 0.84 to solve this? > > Steve > > Quoting Matthew Koop : > > > Hi Tim, > > > > I assume you are using the OSC mpiexec for PBS? If so, it looks like you > > need to update to the 0.83 version of mpiexec. 
In the 1.0 release we > > updated the startup protocol to a more flexible and scalable protocol for > > startup. > > > > You can download the new version here: > > http://www.osc.edu/~pw/mpiexec/ > > > > Let us know if this solves your problem. > > > > Thanks, > > Matt > > > > > > > > On Wed, 30 Jul 2008, Tim Holme wrote: > > > >> I am running x86_64 GNU/Linux. After compiling a fortran program with > >> mvapich-1.0/pgi/bin/mpif90 , I run the program with mpiexec and get > >> the error: > >> > >> PMGR_COLLECTIVE ERROR: unitialized MPI task: Missing required > >> environment variable: MPIRUN_RANK > >> > >> > >> Alternatively, if I compile the program with mvapich/pgi/bin/mpif77, I > >> run the program with mpiexec and get the error: > >> > >> Can't read MPIRUN_HOST > >> > >> > >> I think the program is not getting the MPI environment variables > >> passed. How can I fix this? Any help would be appreciated. > >> > >> Thanks, > >> Tim > >> _______________________________________________ > >> mvapich-discuss mailing list > >> mvapich-discuss@cse.ohio-state.edu > >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > >> > > > > _______________________________________________ > > mvapich-discuss mailing list > > mvapich-discuss@cse.ohio-state.edu > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > > > From michael.heinz at qlogic.com Thu Jul 31 14:18:43 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Thu Jul 31 14:18:55 2008 Subject: [mvapich-discuss] Problems with mvapich, gfortan and rhel4? Message-ID: I've been building test clusters using OFED 1.3.1, which includes mvapich. One of the problems I've been running into is odd behavior with Fortran90 programs. These programs fail to compile with odd messages that make me think that mvapich failed to build correctly. For example: Fatal Error: Reading module mpi at line 4 column 61: Expected left parenthesis From the message, it would appear that mpi.f90 is faulty.
But mpi.f90 appears to be created by mvapich during the build process (of mvapich) and then deleted, which makes it hard to determine for sure. The one thing I can say is that these machines have both gcc version 3 and gcc version 4 compilers installed and that the default gcc and fortran compilers are the version 3 ones. Is it possible that this is causing the problem? -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080731/51f7b3fc/attachment-0001.html
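
One possible cause of the module error above, offered as an assumption and not a diagnosis confirmed in this thread, is that the mpi module was written by one Fortran compiler release and is being read by another, since compiled module files are generally not portable across compiler versions. The Python sketch below is a minimal sanity check along those lines: it compares the version banner of the compiler used for the mvapich build with the banner of the compiler used for the application. The function names compiler_version and same_compiler_series and the sample banner strings are hypothetical, made up for illustration.

```python
import re


def compiler_version(banner):
    """Return (major, minor) from the first line of a `--version` banner.

    Returns None when no dotted version number is found.
    """
    first_line = banner.splitlines()[0] if banner else ""
    match = re.search(r"(\d+)\.(\d+)(?:\.\d+)?", first_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def same_compiler_series(build_banner, user_banner):
    """True when both banners report the same major.minor version."""
    build = compiler_version(build_banner)
    user = compiler_version(user_banner)
    return build is not None and build == user


# Hypothetical banners: one from the compiler that built the MPI library,
# one from the compiler the user invokes for the application.
BUILD_BANNER = "GNU Fortran (GCC) 3.4.6 20060404 (Red Hat 3.4.6-9)"
USER_BANNER = "GNU Fortran (GCC) 4.1.2 20071124 (Red Hat 4.1.2-42)"

if not same_compiler_series(BUILD_BANNER, USER_BANNER):
    print("compiler mismatch:",
          compiler_version(BUILD_BANNER), "vs", compiler_version(USER_BANNER))
```

In practice the two banners could be captured by running each compiler with --version. If they differ, rebuilding mvapich with the same compiler that is used for the application would be a reasonable first thing to try.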