From andrey.slepuhin at t-platforms.ru Mon Jul 2 12:41:43 2007
From: andrey.slepuhin at t-platforms.ru (Andrey Slepuhin)
Date: Mon Jul 2 12:42:20 2007
Subject: [mvapich-discuss] MVAPICH and HugeTLB
Message-ID: <46892AC7.6020305@t-platforms.ru>

Dear folks,

I am playing with libhugetlb to see the performance difference with
different applications, but in order to get it to work correctly I need
1) to disable PTMALLOC (otherwise the malloc hooks from libhugetlb do not
   work and the application uses standard pages)
2) to turn off the memory registration cache (otherwise I get incorrect
   results or the application simply fails)
Has anybody else tried libhugetlb with MVAPICH? Are there any plans to
implement "transparent" support for libhugetlb that will not require any
special configuration for MVAPICH?

Thanks,
Andrey

From koop at cse.ohio-state.edu Mon Jul 2 17:46:54 2007
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Mon Jul 2 17:47:10 2007
Subject: [mvapich-discuss] process limits in MVAPICH
In-Reply-To: <4686542F.3050204@hpcapplications.com>
Message-ID:

> Is there a hard limit on the number of processes or on the number
> of compute nodes that can be used in a single ch_gen2 MVAPICH
> 0.9.9 job? Any advice on known practical limits?

There shouldn't be any hard limits on the number of processes or nodes
in the code in the latest release. Practically speaking, it has already
been used with 8K+ processes at LLNL, so I am not sure what scale we are
discussing here.

Let us know if you have any additional questions,

Matt

From koop at cse.ohio-state.edu Tue Jul 3 00:12:39 2007
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Tue Jul 3 00:12:58 2007
Subject: [mvapich-discuss] MVAPICH and HugeTLB
In-Reply-To: <46892AC7.6020305@t-platforms.ru>
Message-ID:

Andrey,

As you found, for optimal performance MVAPICH caches memory
registrations. Unfortunately, due to inherent issues in current
InfiniBand drivers, to do this properly we need to override malloc and free.
MVAPICH currently will automatically turn off the registration cache if
other malloc hooks are detected. We have found this to be sufficient for
other applications that override malloc.

Does simply turning off the registration cache (VIADEV_USE_DREG_CACHE=0),
without also compiling with PTMALLOC off, work for you? If not, can you
send more information on your compilation parameters for the applications
as well as the MPI library?

Thanks,

Matt

From pnguyen9 at houston.oilfield.slb.com Tue Jul 3 08:25:44 2007
From: pnguyen9 at houston.oilfield.slb.com (phan nguyen)
Date: Tue Jul 3 08:49:31 2007
Subject: [mvapich-discuss] Cannot run test mpirun after install mvapich.
In-Reply-To:
Message-ID: <000601c7bd6d$4772f6b0$5055bca3@nam.slb.com>

Hi Matthew,

Which version do you recommend?
OSU MVAPICH 0.9.9 + InfiniPath or OSU MVAPICH 0.9.9
I still have trouble installing with my IB path?

Cheers,

Phan Nguyen
Work: 713 513 2550
Mobile: 281 650 8280

-----Original Message-----
From: Matthew Koop [mailto:koop@cse.ohio-state.edu]
Sent: Wednesday, June 27, 2007 8:54 PM
To: phan nguyen
Cc: mvapich-discuss@cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Cannot run test mpirun after install mvapich.

Phan,

Just to get some details -- I have a few questions that will help us
diagnose the issue:

- Do you have VAPI installed? (e.g. Topspin or IB-Gold)
- Does compilation finish successfully?
- What OS/platform is this being installed on?
- How are you compiling? Is it with the mpicc in $PREFIX/bin?

Also, if you want to run a serial MPI job, you must still use the
mpirun_rsh command to launch it. Just run it with only one process.

Matt

On Wed, 27 Jun 2007, phan nguyen wrote:

> Hi,
>
> I tried to install mvapich on my cluster (version mvapich-0.9.9+psm)
>
> For installation, I use the custom make file make.mvapich.vapi where I
> change only the $PREFIX to point to the MPI location on my machine.
>
> # Mandatory variables. All are checked except CXX and F90.
> MTHOME=/usr/local/ibgd/driver/infinihost
> #PREFIX=/usr/local/mvapich
> PREFIX=/ixdata/IX_Software/mvapich
>
> This IB path remained intact.
>
> After the install, I cannot run any test program under /examples (after
> compiling)
>
> When building with my application and running some serial test (without
> mpirun) I got this error
>
> error while loading shared libraries: libpmpich++.so.1.0: cannot open
> shared object file: No such file or directory
>
> I am sure this lib is in my LD_LIBRARY_PATH.
>
> Please help since this is my first time installing MPICH over InfiniBand,
>
> Regards,
>
> Phan Nguyen
> Work: 713 513 2550
> Mobile: 281 650 8280

From koop at cse.ohio-state.edu Tue Jul 3 15:44:19 2007
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Tue Jul 3 15:44:38 2007
Subject: [mvapich-discuss] Cannot run test mpirun after install mvapich.
In-Reply-To: <000601c7bd6d$4772f6b0$5055bca3@nam.slb.com>
Message-ID:

> Which version do you recommend?
> OSU MVAPICH 0.9.9 + InfiniPath or OSU MVAPICH 0.9.9
> I still have trouble installing with my IB path?

Either of them will work for you. "OSU MVAPICH 0.9.9 + InfiniPath" is the
MVAPICH 0.9.9 release along with a new ch_psm device specifically for the
InfiniPath HCAs from QLogic/PathScale.

So, you are now compiling with $PREFIX/bin/mpicc and running with
$PREFIX/bin/mpirun_rsh?

Can you try something like:

cd $SRC/osu_benchmarks
$PREFIX/bin/mpicc osu_bw.c -o bw
$PREFIX/bin/mpirun_rsh -np 2 host1 host2 ./bw
(where host1 and host2 are machines you want to run on)

In order to run with this setup you will need to be able to ssh from the
launching node to the compute nodes without a password. So, make sure you
have password-less keys set up for ssh before you try this out; 'ssh
host1' should not prompt for a password.

Let us know what happens if you follow the above steps.

Thanks,

Matt
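
A minimal pre-flight sketch for the password-less ssh requirement described above. It is not part of MVAPICH: the function name check_ssh_hosts and the host names are placeholders chosen for illustration. It relies only on the standard OpenSSH BatchMode option, which makes ssh fail instead of prompting for a password.

```shell
# check_ssh_hosts: hypothetical helper, not shipped with MVAPICH.
# Prints every host that cannot be reached over ssh without a prompt
# and returns non-zero if at least one host fails.
check_ssh_hosts() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail rather than ask for a password.
        if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
                </dev/null >/dev/null 2>&1; then
            echo "$host"
            failed=1
        fi
    done
    return "$failed"
}

# Example use (host1 and host2 are placeholders):
#   check_ssh_hosts host1 host2 || echo "set up ssh keys for the hosts listed"
```

Any host printed by the function would presumably make mpirun_rsh stall or prompt when it uses its default ssh launcher.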

From pnguyen9 at houston.oilfield.slb.com Tue Jul 3 17:47:59 2007
From: pnguyen9 at houston.oilfield.slb.com (phan nguyen)
Date: Tue Jul 3 17:50:44 2007
Subject: [mvapich-discuss] Cannot run test mpirun after install mvapich.
In-Reply-To:
Message-ID: <000e01c7bdbb$d2ac4df0$5055bca3@nam.slb.com>

Hi Matt,

I have now tried both versions, OSU MVAPICH 0.9.9 + InfiniPath and OSU
MVAPICH 0.9.9, and installed them under different $PREFIX directories:

OSU MVAPICH 0.9.9 + InfiniPath --> /ixdata/IX_Software/mvapich
OSU MVAPICH 0.9.9 --> /ixdata/IX_Software/mvapich1

I compiled the programs as root and ran the test as root:

mpirun_rsh -np 2 htcixcluster htcixcluster ./bw

It seems to run after prompting for the passwords.
But when I run the same program with my regular id, it does not return any
message.
Note that root's default shell is bash and root has ssh authentication set
up to all the compute nodes, while my id's default shell is tcsh and my id
has rsh authentication set up.

Clearly, when running with my id, the installed mpirun does not work for
some reason.

To simplify matters, let's focus on one version only (mvapich-0.9.9+psm,
as stated in the previous email).

Phan Nguyen
Work: 713 513 2550
Mobile: 281 650 8280

From pnguyen9 at houston.oilfield.slb.com Tue Jul 3 18:02:42 2007
From: pnguyen9 at houston.oilfield.slb.com (phan nguyen)
Date: Tue Jul 3 18:10:50 2007
Subject: [mvapich-discuss] Cannot run test mpirun after install mvapich.
Message-ID: <000f01c7bdbd$e14ac1f0$5055bca3@nam.slb.com>

Hi Matt,

Following up on the previous email:

When running as root (which has password-less ssh set up):
IT'S WORKING.
When running with my regular id (only password-less rsh is set up): NOT
WORKING.
This is what we expect, isn't it?

Phan Nguyen
Work: 713 513 2550
Mobile: 281 650 8280

From koop at cse.ohio-state.edu Tue Jul 3 18:17:27 2007
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Tue Jul 3 18:17:44 2007
Subject: [mvapich-discuss] Cannot run test mpirun after install mvapich.
In-Reply-To: <000f01c7bdbd$e14ac1f0$5055bca3@nam.slb.com>
Message-ID:

> Follow up the previous email
> When running as root (which has ssh keyless setup). IT'S WORKING
> When running with my regular id (only rsh keyless is setup). NOT WORKING.
> which is what we expect isn't it?

Yes, this is to be expected. By default 'ssh' is used, so as your user
mpirun_rsh is trying to use ssh, for which you do not have the required
password-less login set up.

You can try using 'rsh' as your own user by passing the "-rsh" flag:

mpirun_rsh -rsh -np 2 host1 host2 ./exec

Can you give this a try?

Thanks,

Matt
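
The choice between the default ssh launcher and the "-rsh" flag explained above can be sketched as a small shell helper. This is only an illustration under stated assumptions: pick_launcher_flag is a hypothetical name, not an MVAPICH tool, and it assumes that rsh, like ssh in BatchMode, exits non-zero when password-less access is not configured.

```shell
# pick_launcher_flag: hypothetical helper, not part of MVAPICH.
# Prints "-rsh" when password-less ssh to the host fails but rsh works,
# prints nothing when ssh works, and returns 1 when neither works.
pick_launcher_flag() {
    host=$1
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
            </dev/null >/dev/null 2>&1; then
        return 0    # the default ssh launcher is usable
    fi
    if rsh "$host" true </dev/null >/dev/null 2>&1; then
        echo "-rsh" # fall back to the rsh launcher
        return 0
    fi
    return 1
}

# Example use (host1, host2 and ./exec are placeholders):
#   flag=$(pick_launcher_flag host1) && mpirun_rsh $flag -np 2 host1 host2 ./exec
```

Probing a single host keeps the sketch short; a cluster where only some nodes have keys installed would need the probe repeated for every host.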

From pnguyen9 at houston.oilfield.slb.com Tue Jul 3 19:43:59 2007
From: pnguyen9 at houston.oilfield.slb.com (phan nguyen)
Date: Tue Jul 3 23:22:11 2007
Subject: [mvapich-discuss] Cannot run test mpirun after install mvapich.
In-Reply-To:
Message-ID: <001001c7bdcc$0797dec0$5055bca3@nam.slb.com>

Matt,

Ah... It's working now...
Let me go compile my code and see how things go...

Phan Nguyen
Work: 713 513 2550
Mobile: 281 650 8280

From THOMAS.T.O'SHEA at saic.com Thu Jul 5 16:58:50 2007
From: THOMAS.T.O'SHEA at saic.com (OShea, Thomas T.)
Date: Thu Jul 5 17:04:27 2007
Subject: [mvapich-discuss] MVAPICH Assertion Error
Message-ID: <3A8D5723B7BEC34C88B5506F25F3FA4607437BF3@0599-its-exmb02.us.saic.com>

Any luck debugging this problem?

Thanks,
Tom

--------------------------------------------------------------------------------

Hi,

I posted about a problem with mvapich2 and communicating inside a node
using smp. My computer died and along with it my email, so I'm using this
email account to post a test code that I put together to illustrate the
error I'm getting.

This code builds 2 arrays on each processor, and then tries to have the
master process (rank = 0) grab the arrays from each of the other ranks
using remote memory communication with passive synchronization.

On our system this code will run until it gets to a process that is on the
same node, and then it gives the SMP assertion error. It was a hard bug to
track down because so many things will make it go away, such as switching
the order of the 2 arrays that are passed. If you pass the larger array
first (buff_x), there is no error.

Hope this helps, and thanks for your time in looking into this.

Thomas O'Shea
SAIC
-------------- next part --------------
      program smp_test
c     use mpi
      include 'mpif.h'
      parameter(iu_bnd=20,ju_bnd=20,ku_bnd=20,maxblocks=2)
      parameter(len_x=iu_bnd*ju_bnd*ku_bnd,len_x2=iu_bnd)
      integer mype,nprocs,jproc,i,j,k,mb,ierr,winx,winx2
      integer(kind=MPI_ADDRESS_KIND) winsize,targ_disp
      common/var/x(iu_bnd,ju_bnd,ku_bnd,maxblocks),
     1           x2(iu_bnd,maxblocks)
      real*8 ,target :: x
      real*8 ,target :: x2

      call MPI_INIT(ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, mype, ierr)

      do mb = 1,maxblocks
        do k = 1,ku_bnd
          do j = 1,ju_bnd
            do i = 1,iu_bnd
              x(i,j,k,mb) =10*mb+ mype+0.01*i+0.0001*j+.000001*k
            enddo
          enddo
          x2(k,mb) = 10*mb+mype+0.01*k
        enddo
      enddo

      call test(mype)

      call MPI_BARRIER(MPI_COMM_WORLD,ierr)
      print *,'Final Barrier',mype
      CALL MPI_FINALIZE(ierr)
      end
c ---------------------------------------------------------------------
      subroutine test(mype)
c     use mpi
      include 'mpif.h'
      parameter(iu_bnd=20,ju_bnd=20,ku_bnd=20,maxblocks=2)
      parameter(len_x=iu_bnd*ju_bnd*ku_bnd,len_x2=iu_bnd)
      integer mype,nprocs,jproc,i,j,k,mb,ierr,winx,winx2
      integer(kind=MPI_ADDRESS_KIND) winsize,targ_disp
      common/var/x(iu_bnd,ju_bnd,ku_bnd,maxblocks),
     1           x2(iu_bnd,maxblocks)
      real*8 ,target :: x
      real*8 ,target :: x2
      real*8 x0(iu_bnd,ju_bnd,ku_bnd)
      real*8 x20(iu_bnd)
      real*8 buff_x,buff_x2
      pointer(p_x,buff_x(iu_bnd,ju_bnd,ku_bnd,maxblocks))
      pointer(p_x2,buff_x2(iu_bnd,maxblocks))

      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

      winsize = 8*len_x*maxblocks
      CALL MPI_ALLOC_MEM(winsize, MPI_INFO_NULL, p_x, ierr)
      winsize = 8*len_x2*maxblocks
      CALL MPI_ALLOC_MEM(winsize, MPI_INFO_NULL, p_x2, ierr)

      winsize = 8*len_x*maxblocks
      CALL MPI_WIN_CREATE(buff_x,winsize,8,MPI_INFO_NULL,
     &     MPI_COMM_WORLD,winx,ierr)
      winsize = 8*len_x2*maxblocks
      CALL MPI_WIN_CREATE(buff_x2,winsize,8,MPI_INFO_NULL,
     &     MPI_COMM_WORLD,winx2,ierr)

      buff_x = x
      buff_x2 = x2

      if(mype.eq.0) then ! collect arrays from other ranks
        do jproc=0,(nprocs-1)
          do mb = 1,2
            targ_disp = len_x2*(mb-1)
            CALL MPI_WIN_LOCK(MPI_LOCK_SHARED,jproc,0,winx2,ierr)
            CALL MPI_GET(x20,len_x2,MPI_DOUBLE_PRECISION,jproc,
     &           targ_disp,len_x2,MPI_DOUBLE_PRECISION,winx2,ierr)
            CALL MPI_WIN_UNLOCK(jproc,winx2,ierr)
            print *,'2nd RMA: jproc, mb',jproc,mb
            targ_disp = len_x*(mb-1)
            CALL MPI_WIN_LOCK(MPI_LOCK_SHARED,jproc,0,winx,ierr)
            CALL MPI_GET(x0,len_x,MPI_DOUBLE_PRECISION,jproc,
     &           targ_disp,len_x,MPI_DOUBLE_PRECISION,winx,ierr)
            CALL MPI_WIN_UNLOCK(jproc,winx,ierr)
          enddo ! mb
        enddo ! jproc
      endif ! mype=0

! Freeing Windows and Memory
      CALL MPI_WIN_FREE(winx,ierr)
      CALL MPI_WIN_FREE(winx2,ierr)
      CALL MPI_FREE_MEM(buff_x,ierr)
      CALL MPI_FREE_MEM(buff_x2,ierr)
      end subroutine test
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20070705/82734987/attachment-0001.html

From pnguyen9 at houston.oilfield.slb.com Thu Jul 5 17:54:17 2007
From: pnguyen9 at houston.oilfield.slb.com (phan nguyen)
Date: Fri Jul 6 00:39:31 2007
Subject: [mvapich-discuss] Cannot run test mpirun after install mvapich.
In-Reply-To:
Message-ID: <008601c7bf4f$0963b740$5055bca3@nam.slb.com>

Hi Matt,

I have another problem with MPICH+infinipath.
This is the error I get when building my code:

/usr/bin/ld: /ixdata/IX_Software/mvapich/lib/libpmpich++.a(intercepts.o): relocation R_X86_64_32S against `a local symbol' can not be used when making a shared object; recompile with -fPIC
/ixdata/IX_Software/mvapich/lib/libpmpich++.a: could not read symbols: Bad value
collect2: ld returned 1 exit status

I know that this library has to be compiled with the -fPIC flag somewhere
in the make file. I tried adding it to the CFLAGS section, but it still
doesn't work.
Please help P.S I have the attachment for the make.mvapich.vapi file Phan Nguyen Work: 713 513 2550 Mobile: 281 650 8280 -----Original Message----- From: Matthew Koop [mailto:koop@cse.ohio-state.edu] Sent: Tuesday, July 03, 2007 5:17 PM To: phan nguyen Cc: mvapich-discuss@cse.ohio-state.edu Subject: RE: [mvapich-discuss] Cannot run test mpirun after install mvapich. > Follow up the previous email > When running as root (which has ssh keyless setup). IT'S WORKING > When running with my regular id (only rsh keyless is setup). NOT WORKING. > which is what we expect isn't it? Yes, this is to be expected. By default 'ssh' is used -- so as your user it is trying to use ssh, which you do not have setup for password-less login (and is required). You can try using 'rsh' as your own user by using the "-rsh" flag: mpirun_rsh -rsh -np 2 host1 host2 ./exec Can you give this a try? Thanks, Matt > -----Original Message----- > From: phan nguyen [mailto:pnguyen9@houston.oilfield.slb.com] > Sent: Tuesday, July 03, 2007 4:48 PM > To: 'Matthew Koop' > Cc: 'mvapich-discuss@cse.ohio-state.edu' > Subject: RE: [mvapich-discuss] Cannot run test mpirun after install mvapich. > > Hi Mat, > Now that I tried on both version OSU MVAPICH 0.9.9 + InfiniPath or OSU > MVAPICH 0.9.9 and have it installed under different $PREFIX > > OSU MVAPICH 0.9.9 + InfiniPath -- > /ixdata/IX_Sodtware/mvapich > And OSU MVAPICH 0.9.9 --> /ixdata/IX_Sodtware/mvapich1 > > I compiled the programs as root and run the test as root > mpirun_rsh -np 2 htcixcluster htcixcluster ./bw > It seems to run after prompting the passwords > But when I run the same program with my regular id, it did not return any > message. > Note that root default shell is bash and it has ssh authentication setup to > all the compute nodes, and my id's default shell is tcsh. My id has rsh > authentication setup > > Definitely, when running with my id, the installed mpirun did not work for > some reason.. 
> > To simplify the matter, Let's focus on one version only - (version > mvapich-0.9.9+psm as stated in the previous email) > > > Phan Nguyen > Work: 713 513 2550 > Mobile: 281 650 8280 > > -----Original Message----- > From: Matthew Koop [mailto:koop@cse.ohio-state.edu] > Sent: Tuesday, July 03, 2007 2:44 PM > To: phan nguyen > Cc: mvapich-discuss@cse.ohio-state.edu > Subject: RE: [mvapich-discuss] Cannot run test mpirun after install mvapich. > > > > Which version do you recommend? > > OSU MVAPICH 0.9.9 + InfiniPath or OSU MVAPICH 0.9.9 > > I still have trouble installing with my IB path? > > Either of them will work for you. "OSU MVAPICH 0.9.9 + InfiniPath" is the > MVAPICH 0.9.9 release along with a new ch_psm device specifically for the > InfiniPath HCAs from QLogic/PathScale. > > So, you are now compiling with $PREFIX/bin/mpicc and running with > $PREFIX/bin/mpirun_rsh? > > Can you try something like: > > cd $SRC/osu_benchmarks > $PREFIX/bin/mpicc osu_bw.c -o bw > $PREFIX/bin/mpirun_rsh -np 2 host1 host2 ./bw > (where host1 and host2 are machines you want to run on) > > In order to run with this setup you will need to be able to ssh from the > launching node to the compute nodes without a password. So, make sure you > have password-less keys setup for ssh before you try this out. So 'ssh > host1' should not prompt for a password. > > Let us know what happens if you follow the above steps. > > Thanks, > > Matt > > > > > > > > -----Original Message----- > > From: Matthew Koop [mailto:koop@cse.ohio-state.edu] > > Sent: Wednesday, June 27, 2007 8:54 PM > > To: phan nguyen > > Cc: mvapich-discuss@cse.ohio-state.edu > > Subject: Re: [mvapich-discuss] Cannot run test mpirun after install > mvapich. > > > > Phan, > > > > Just to get some details here -- I have a few questions that will help us > > diagnose the issue here: > > > > - Do you have VAPI installed? (e.g. Topspin or IB-Gold) > > - Does compilation finish successfully? 
> > - What OS/platform is this being installed on? > > - How are you compiling? Is it with the mpicc in $PREFIX/bin? > > > > Also, if you want to run a serial MPI job, you must still use the > > mpirun_rsh command to launch it. Just run it with only one process. > > > > Matt > > > > > > > > On Wed, 27 Jun 2007, phan nguyen wrote: > > > > > Hi, > > > > > > I tried to install mvapich on my cluster (version mvapich-0.9.9+psm) > > > > > > For installation, I use the custom make file make.mvapich.vapi where I > > > change only the $PREFIX to point to MPI the location on my machine. > > > > > > > > > > > > # Mandatory variables. All are checked except CXX and F90. > > > > > > MTHOME=/usr/local/ibgd/driver/infinihost > > > > > > #PREFIX=/usr/local/mvapich > > > > > > PREFIX=/ixdata/IX_Software/mvapich > > > > > > > > > > > > This IB path remained intact. > > > > > > > > > > > > After the install, I cannot run any test program under /examples (after > > > compiling) > > > > > > When building with my application and run some serial test (without > > mpirun) > > > I got this error > > > > > > error while loading shared libraries: libpmpich++.so.1.0: cannot open > > shared > > > object file: No such file or directory > > > > > > > > > > > > I am sure this lib is in my LD_LIBRARY_PATH. > > > > > > Please help since this is my first time install MPICH over Infiniband, > > > > > > Regards, > > > > > > > > > > > > Phan Nguyen > > > > > > Work: 713 513 2550 > > > > > > Mobile: 281 650 8280 > > > > > > > > > > > > > > > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: make.mvapich.vapi Type: application/octet-stream Size: 3089 bytes Desc: not available Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20070705/eeffeccc/make.mvapich.obj From rowland at cse.ohio-state.edu Fri Jul 6 02:08:12 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Fri Jul 6 02:08:39 2007 Subject: [mvapich-discuss] Cannot run test mpirun after install mvapich. In-Reply-To: <008601c7bf4f$0963b740$5055bca3@nam.slb.com> References: <008601c7bf4f$0963b740$5055bca3@nam.slb.com> Message-ID: <468DDC4C.5090203@cse.ohio-state.edu> phan nguyen wrote: > HI Mat, > I have another problem with MPICH+infinipath > > This is the error when compiling with my code > > /usr/bin/ld: /ixdata/IX_Software/mvapich/lib/libpmpich++.a(intercepts.o): > relocation R_X86_64_32S against `a local symbol' can not be used when making > a shared object; recompile with -fPIC > /ixdata/IX_Software/mvapich/lib/libpmpich++.a: could not read symbols: Bad > value > collect2: ld returned 1 exit status > > I know that this library has to be compiled with -fPIC flag some where in > the make file, try to add this to the CFLAGS section but still doesn't work. > Please help Is your code trying to create its own shared library, or is this the result of building a standalone program using mpicxx/mpiCC? It seems you built MVAPICH with static libraries. Does it work if you edit mpicxx and mpiCC and change the following (it should be line 83, or close): CXXFLAGS="-I${includedir}/mpi2c++ -fexceptions" to: CXXFLAGS="-I${includedir}/mpi2c++ -fexceptions -fPIC" thus adding -fPIC there? 
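A minimal sketch of that wrapper edit, assuming the CXXFLAGS line reads exactly as quoted above. The function name add_fpic and the scratch file are illustrative only; nothing here ships with MVAPICH, and the demo edits a stand-in file rather than the installed mpicxx/mpiCC:

```shell
# add_fpic FILE: append -fPIC to the CXXFLAGS line of an mpicxx-style wrapper.
# Hypothetical helper, not part of MVAPICH.
add_fpic() {
    file=$1
    # Leave the file alone if the flag is already present.
    if grep -q -- '-fexceptions -fPIC"' "$file"; then
        return 0
    fi
    # Rewrite through a temporary copy, then write back in place so the
    # wrapper keeps its original permissions (the executable bit).
    sed 's|-fexceptions"$|-fexceptions -fPIC"|' "$file" > "$file.new" &&
        cat "$file.new" > "$file" &&
        rm -f "$file.new"
}

# Demo on a scratch stand-in for the wrapper, not the installed script.
scratch=$(mktemp)
printf '%s\n' 'CXXFLAGS="-I${includedir}/mpi2c++ -fexceptions"' > "$scratch"
add_fpic "$scratch"
cat "$scratch"
rm -f "$scratch"
```

For a real installation the same function would presumably be pointed at $PREFIX/bin/mpicxx and $PREFIX/bin/mpiCC, after taking a backup of both wrappers.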
Another option might be to try building MVAPICH with shared libraries by adding: --enable-sharedlib to the configure line in make.mvapich.vapi and then applying the following patch to configure: --- CUT HERE --- Index: configure =================================================================== --- configure (revision 1361) +++ configure (working copy) @@ -6674,6 +6674,7 @@ # This isn't quite right, but it will work for some systems # Export the shared option to the MPI-2-C++ configure CXXFLAGS_FOR_SHARED=$CC_SHARED_OPT + CXXFLAGS="$CXXFLAGS $CC_SHARED_OPT" fi if test "$SHAREDKIND" != "ignore" ; then # Fortran choices --- CUT HERE --- That should result in mpicxx/mpiCC having the same fix as in the static case I mentioned above. If you could try either of these and let us know if that works - and also if you are trying to build a shared library against the static C++ library that's generated, that would be very helpful. -- Shaun Rowland rowland@cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From pnguyen9 at houston.oilfield.slb.com Fri Jul 6 10:51:03 2007 From: pnguyen9 at houston.oilfield.slb.com (phan nguyen) Date: Fri Jul 6 12:38:31 2007 Subject: [mvapich-discuss] Cannot run test mpirun after install mvapich. In-Reply-To: <468DDC4C.5090203@cse.ohio-state.edu> Message-ID: <00a001c7bfdd$13272a80$5055bca3@nam.slb.com> Shaun, I followed your suggestions and did the following: 1. Added --enable-sharedlib to the configure line in make.mvapich.vapi and then applied the following patch to configure: --- CUT HERE --- # This isn't quite right, but it will work for some systems # Export the shared option to the MPI-2-C++ configure CXXFLAGS_FOR_SHARED=$CC_SHARED_OPT + CXXFLAGS="$CXXFLAGS $CC_SHARED_OPT" fi if test "$SHAREDKIND" != "ignore" ; then # Fortran choices --- CUT HERE --- It seems to work now.
Thanks so much for your help Regards, Phan Nguyen Work: 713 513 2550 Mobile: 281 650 8280 -----Original Message----- From: mvapich-discuss-bounces@cse.ohio-state.edu [mailto:mvapich-discuss-bounces@cse.ohio-state.edu] On Behalf Of Shaun Rowland Sent: Friday, July 06, 2007 1:08 AM To: phan nguyen Cc: mvapich-discuss@cse.ohio-state.edu Subject: Re: [mvapich-discuss] Cannot run test mpirun after install mvapich. phan nguyen wrote: > HI Mat, > I have another problem with MPICH+infinipath > > This is the error when compiling with my code > > /usr/bin/ld: /ixdata/IX_Software/mvapich/lib/libpmpich++.a(intercepts.o): > relocation R_X86_64_32S against `a local symbol' can not be used when making > a shared object; recompile with -fPIC > /ixdata/IX_Software/mvapich/lib/libpmpich++.a: could not read symbols: Bad > value > collect2: ld returned 1 exit status > > I know that this library has to be compiled with -fPIC flag some where in > the make file, try to add this to the CFLAGS section but still doesn't work. > Please help Is your code trying to create its own shared library, or is this the result of building a standalone program using mpicxx/mpiCC? It seems you built MVAPICH with static libraries. Does it work if you edit mpicxx and mpiCC and change the following (it should be line 83, or close): CXXFLAGS="-I${includedir}/mpi2c++ -fexceptions" to: CXXFLAGS="-I${includedir}/mpi2c++ -fexceptions -fPIC" thus adding -fPIC there? 
Another option might be to try building MVAPICH with shared libraries by adding: --enable-sharedlib to the configure line in make.mvapich.vapi and then applying the following patch to configure: --- CUT HERE --- Index: configure =================================================================== --- configure (revision 1361) +++ configure (working copy) @@ -6674,6 +6674,7 @@ # This isn't quite right, but it will work for some systems # Export the shared option to the MPI-2-C++ configure CXXFLAGS_FOR_SHARED=$CC_SHARED_OPT + CXXFLAGS="$CXXFLAGS $CC_SHARED_OPT" fi if test "$SHAREDKIND" != "ignore" ; then # Fortran choices --- CUT HERE --- That should result in mpicxx/mpiCC having the same fix as in the static case I mentioned above. If you could try either of these and let us know if that works - and also if you are trying to build a shared library against the static C++ library that's generated, that would be very helpful. -- Shaun Rowland rowland@cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ _______________________________________________ mvapich-discuss mailing list mvapich-discuss@cse.ohio-state.edu http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss From santhana at cse.ohio-state.edu Sat Jul 7 05:54:22 2007 From: santhana at cse.ohio-state.edu (Gopal Santhanaraman) Date: Sat Jul 7 05:54:40 2007 Subject: [mvapich-discuss] MVAPICH Assertion Error In-Reply-To: <3A8D5723B7BEC34C88B5506F25F3FA4607437BF3@0599-its-exmb02.us.saic.com> Message-ID: Hi Thomas, We have debugged the problem further with respect to the fortran test code you had sent earlier. please find attached a patch against mvapich2-0.9.8p2 . Could you please try it out and let us know how it goes. Thanks Gopal On Thu, 5 Jul 2007, OShea, Thomas T. wrote: > Any luck debugging this problem? 
> > > > Thanks, > > Tom > > > > > > > > ------------------------------------------------------------------------ > -------------------------------- > > Hi, I posted about a problem with mvapich2 and communicating inside a > > node using smp. My computer died and along with it my email, so I'm > > using this email account to post a test code that I put together to > > illustrate the error I'm getting. > > > > This code builds 2 arrays on each processor, and then tries to have > > the master process (rank = 0) grab the arrays from each of the other ranks > > using remote memory communication with passive synchronization. > > > > On our system this code will run until it gets to a process that is on > > the same node and then it gives the SMP assertion error. > > > > It was a hard bug to track down because so many things will make it go > > away, such as switching the order of the 2 arrays that are passed. If > > you pass the larger array first (buff_x), there is no error. > > > > Hope this helps, and thanks for your time in looking into this.
> > > > Thomas O'Shea > > SAIC > > -------------- next part -------------- > > program smp_test > > > > c use mpi > > include 'mpif.h' > > parameter(iu_bnd=20,ju_bnd=20,ku_bnd=20,maxblocks=2) > > parameter(len_x=iu_bnd*ju_bnd*ku_bnd,len_x2=iu_bnd) > > integer mype,nprocs,jproc,i,j,k,mb,ierr,winx,winx2 > > integer(kind=MPI_ADDRESS_KIND) winsize,targ_disp > > > > common/var/x(iu_bnd,ju_bnd,ku_bnd,maxblocks), > > 1 x2(iu_bnd,maxblocks) > > real*8 ,target :: x > > real*8 ,target :: x2 > > > > call MPI_INIT(ierr) > > call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr) > > call MPI_COMM_RANK(MPI_COMM_WORLD, mype, ierr) > > > > do mb = 1,maxblocks > > do k = 1,ku_bnd > > do j = 1,ju_bnd > > do i = 1,iu_bnd > > x(i,j,k,mb) =10*mb+ mype+0.01*i+0.0001*j+.000001*k > > enddo > > enddo > > x2(k,mb) = 10*mb+mype+0.01*k > > enddo > > enddo > > > > call test(mype) > > > > call MPI_BARRIER(MPI_COMM_WORLD,ierr) > > print *,'Final Barrier',mype > > CALL MPI_FINALIZE() > > end > > > > c --------------------------------------------------------------------- > > > > subroutine test(mype) > > c use mpi > > include 'mpif.h' > > parameter(iu_bnd=20,ju_bnd=20,ku_bnd=20,maxblocks=2) > > parameter(len_x=iu_bnd*ju_bnd*ku_bnd,len_x2=iu_bnd) > > integer mype,nprocs,jproc,i,j,k,mb,ierr,winx,winx2 > > integer(kind=MPI_ADDRESS_KIND) winsize,targ_disp > > > > common/var/x(iu_bnd,ju_bnd,ku_bnd,maxblocks), > > 1 x2(iu_bnd,maxblocks) > > real*8 ,target :: x > > real*8 ,target :: x2 > > > > real*8 x0(iu_bnd,ju_bnd,ku_bnd) > > real*8 x20(iu_bnd) > > > > real*8 buff_x,buff_x2 > > pointer(p_x,buff_x(iu_bnd,ju_bnd,ku_bnd,maxblocks)) > > pointer(p_x2,buff_x2(iu_bnd,maxblocks)) > > > > call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr) > > > > winsize = 8*len_x*maxblocks > > CALL MPI_ALLOC_MEM(winsize, MPI_INFO_NULL, p_x, ierr) > > > > winsize = 8*len_x2*maxblocks > > CALL MPI_ALLOC_MEM(winsize, MPI_INFO_NULL, p_x2, ierr) > > > > > > winsize = 8*len_x*maxblocks > > CALL 
MPI_WIN_CREATE(buff_x,winsize,8,MPI_INFO_NULL, > > & MPI_COMM_WORLD,winx,ierr) > > > > winsize = 8*len_x2*maxblocks > > CALL MPI_WIN_CREATE(buff_x2,winsize,8,MPI_INFO_NULL, > > & MPI_COMM_WORLD,winx2,ierr) > > > > buff_x = x > > buff_x2 = x2 > > > > > > if(mype.eq.0) then ! collect arrays from other ranks > > do jproc=0,(nprocs-1) > > do mb = 1,2 > > targ_disp = len_x2*(mb-1) > > CALL MPI_WIN_LOCK(MPI_LOCK_SHARED,jproc,0,winx2,ierr) > > CALL MPI_GET(x20,len_x2,MPI_DOUBLE_PRECISION,jproc,targ_disp, > > & len_x2,MPI_DOUBLE_PRECISION,winx2,ierr) > > CALL MPI_WIN_UNLOCK(jproc,winx2,ierr) > > > > print *,'2nd RMA: jproc, mb',jproc,mb > > targ_disp = len_x*(mb-1) > > CALL MPI_WIN_LOCK(MPI_LOCK_SHARED,jproc,0,winx,ierr) > > CALL MPI_GET(x0,len_x,MPI_DOUBLE_PRECISION,jproc,targ_disp, > > & len_x,MPI_DOUBLE_PRECISION,winx,ierr) > > CALL MPI_WIN_UNLOCK(jproc,winx,ierr) > > > > > > enddo ! mb > > enddo ! jproc > > endif ! mype=0 > > > > ! Freeing Windows and Memory > > CALL MPI_WIN_FREE(winx,ierr) > > CALL MPI_WIN_FREE(winx2,ierr) > > CALL MPI_FREE_MEM(buff_x,ierr) > > CALL MPI_FREE_MEM(buff_x2,ierr) > > > > end subroutine test > > -------------- next part -------------- diff -ruN mvapich2-0.9.8p2/src/mpid/osu_ch3/src/ch3u_rma_sync.c mvapich2-0.9.8p2/src/mpid/osu_ch3/src/ch3u_rma_sync.c --- mvapich2-0.9.8p2/src/mpid/osu_ch3/src/ch3u_rma_sync.c 2006-10-03 14:22:56.000000000 -0400 +++ mvapich2-0.9.8p2/src/mpid/osu_ch3/src/ch3u_rma_sync.c 2007-07-07 04:42:32.000000000 -0400 @@ -1980,6 +1980,7 @@ if ((HANDLE_GET_KIND(curr_op->target_datatype) == HANDLE_KIND_BUILTIN) && MPIDI_CH3_Eager_ok(vc, type_size * curr_op->origin_count)) { +#if 0 single_op_opt = 1; /* Set the lock granted flag to 1 */ win_ptr->lock_granted = 1; @@ -1993,6 +1994,7 @@ if (mpi_errno) { MPIU_ERR_POP(mpi_errno); } +#endif } } From surs at cse.ohio-state.edu Tue Jul 10 13:58:55 2007 From: surs at cse.ohio-state.edu (Sayantan Sur) Date: Tue Jul 10 13:59:15 2007 Subject: [mvapich-discuss] mvapich jobs cleanup 
In-Reply-To: <466FF588.8050204@hpcapplications.com> References: <466FF588.8050204@hpcapplications.com> Message-ID: <4693C8DF.9000509@cse.ohio-state.edu> Hi Mark, We have a patch to solve this stray process issue with MVAPICH-0.9.9. I'm attaching the patch with this email. To apply the patch please follow these steps: $ cd mvapich-0.9.9 $ #save mpirun_rsh_patch to this directory $ patch -p1 < mpirun_rsh_patch Could you please let us know if this patch solves the problem for you? Thanks, Sayantan. Mark Potts wrote: > Hi, > We are observing a number of cases in which MVAPICH-0.9.9 > jobs launched with mpirun_rsh leave stray processes on some > nodes when the job terminates abnormally. Those stray > processes continue to run forever and require recognition > and killing. > > Is there a reason this happens with MVAPICH, and is there a > way to prevent it. This doesn't seem to be the behavior > that occurs for abnormally terminated Voltaire MPI or Intel > MPI jobs. > regards, -- http://www.cse.ohio-state.edu/~surs -------------- next part -------------- diff -ruN 0.9.9/mpid/ch_gen2/process/mpirun_rsh.c exp1/mpid/ch_gen2/process/mpirun_rsh.c --- 0.9.9/mpid/ch_gen2/process/mpirun_rsh.c 2007-05-29 03:47:10.000000000 -0400 +++ exp1/mpid/ch_gen2/process/mpirun_rsh.c 2007-07-09 11:56:32.000000000 -0400 @@ -59,6 +59,7 @@ #define _GNU_SOURCE #include #include +#include #include #include #include @@ -91,20 +92,34 @@ typedef struct { char *hostname; char *device; - int pid; + pid_t pid; + pid_t remote_pid; int port; int control_socket; process_state state; } process; +typedef struct { + const char * hostname; + pid_t * pids; + size_t npids, npids_allocated; +} process_group; + +typedef struct { + process_group * data; + process_group ** index; + size_t npgs, npgs_allocated; +} process_groups; + #define RUNNING(i) ((plist[i].state == P_STARTED || \ plist[i].state == P_CONNECTED || \ plist[i].state == P_RUNNING) ? 1 : 0) /* other information: a.out and rank are implicit. 
*/ -process *plist; -int nprocs; +process_groups * pglist = NULL; +process * plist = NULL; +int nprocs = 0; int aout_index, port; #define MAX_WD_LEN 256 char wd[MAX_WD_LEN]; /* working directory of current process */ @@ -112,11 +127,19 @@ char mpirun_host[MAX_HOST_LEN]; /* hostname of current process */ /* xxx need to add checking for string overflow, do this more carefully ... */ +/* + * Message notifying user of what timed out + */ +static const char * alarm_msg = NULL; #define COMMAND_LEN 2000 #define SEPARATOR ':' - +void free_memory(void); +void pglist_print(void); +void pglist_insert(const char * const, const pid_t const); +void rkill_fast(void); +void rkill_linear(void); void cleanup_handler(int); void nostop_handler(int); void alarm_handler(int); @@ -239,15 +262,19 @@ int hostname_len = 0; totalview_cmd[199] = 0; display[0]='\0'; - + pidglen = sizeof(pid_t); + /* mpirun [-debug] [-xterm] -np N [-hostfile hfile | h1 h2 h3 ... hN] a.out [args] */ + atexit(free_memory); + do { c = getopt_long_only(argc, argv, "+", option_table, &option_index); switch (c) { case '?': case ':': usage(); + exit(EXIT_FAILURE); break; case EOF: break; @@ -255,8 +282,10 @@ switch (option_index) { case 0: nprocs = atoi(optarg); - if (nprocs < 1) + if (nprocs < 1) { usage(); + exit(EXIT_FAILURE); + } break; case 1: debug_on = 1; @@ -290,11 +319,11 @@ case 8: show_version(); usage(); - exit(0); + exit(EXIT_SUCCESS); break; case 9: show_version(); - exit(0); + exit(EXIT_SUCCESS); break; case 10: use_totalview = 1; @@ -311,17 +340,19 @@ break; case 11: usage(); - exit(0); + exit(EXIT_SUCCESS); break; default: fprintf(stderr, "Unknown option\n"); usage(); + exit(EXIT_FAILURE); break; } break; default: fprintf(stderr, "Unreachable statement!\n"); usage(); + exit(EXIT_FAILURE); break; } } while (c != EOF); @@ -332,7 +363,7 @@ fprintf(stderr, "Without hostfile option, hostnames must be " "specified on command line.\n"); usage(); - exit(1); + exit(EXIT_FAILURE); } aout_index = nprocs + 
optind; } else { @@ -361,13 +392,14 @@ plist = malloc(nprocs * sizeof(process)); if (plist == NULL) { perror("malloc"); - exit(1); + exit(EXIT_FAILURE); } for (i = 0; i < nprocs; i++) { plist[i].state = P_NOTSTARTED; plist[i].device = NULL; plist[i].port = -1; + plist[i].remote_pid = 0; } /* grab hosts from command line or file */ @@ -376,7 +408,7 @@ hostname_len = read_hostfile(hostfile); } else { for (i = 0; i < nprocs; i++) { - plist[i].hostname = argv[optind + i]; + plist[i].hostname = (char *)strndup(argv[optind + i], 100); hostname_len = hostname_len > strlen(plist[i].hostname) ? hostname_len : strlen(plist[i].hostname); } @@ -388,7 +420,7 @@ if (!mpirun_processes) { perror("malloc"); - exit(1); + exit(EXIT_FAILURE); } else { memset(mpirun_processes, 0, nprocs * (hostname_len + 4)); } @@ -412,18 +444,18 @@ s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); if (s < 0) { perror("socket"); - exit(1); + exit(EXIT_FAILURE); } sockaddr.sin_addr.s_addr = INADDR_ANY; sockaddr.sin_port = 0; if (bind(s, (struct sockaddr *) &sockaddr, sockaddr_len) < 0) { perror("bind"); - exit(1); + exit(EXIT_FAILURE); } if (getsockname(s, (struct sockaddr *) &sockaddr, &sockaddr_len) < 0) { perror("getsockname"); - exit(1); + exit(EXIT_FAILURE); } port = (int) ntohs(sockaddr.sin_port); @@ -431,14 +463,31 @@ if (!show_on) { - signal(SIGHUP, cleanup_handler); - signal(SIGINT, cleanup_handler); - signal(SIGTSTP, nostop_handler); - signal(SIGCHLD, child_handler); - signal(SIGALRM, alarm_handler); + struct sigaction signal_handler; + signal_handler.sa_handler = cleanup_handler; + sigfillset(&signal_handler.sa_mask); + signal_handler.sa_flags = 0; + + sigaction(SIGHUP, &signal_handler, NULL); + sigaction(SIGINT, &signal_handler, NULL); + sigaction(SIGTERM, &signal_handler, NULL); + + signal_handler.sa_handler = nostop_handler; + + sigaction(SIGTSTP, &signal_handler, NULL); + + signal_handler.sa_handler = alarm_handler; + + sigaction(SIGALRM, &signal_handler, NULL); + + signal_handler.sa_handler 
= child_handler; + sigemptyset(&signal_handler.sa_mask); + + sigaction(SIGCHLD, &signal_handler, NULL); } alarm(1000); + alarm_msg = "Timeout during client startup.\n"; /* long timeout for testing, where process may be stopped in debugger */ #ifdef USE_DDD @@ -511,7 +560,7 @@ } if (show_on) - exit(0); + exit(EXIT_SUCCESS); /*Hostid exchange start */ /* accept incoming connections, read port numbers */ @@ -522,6 +571,9 @@ ACCEPT_HID: sockaddr_len = sizeof(sockaddr); s1 = accept(s, (struct sockaddr *) &sockaddr, &sockaddr_len); + + alarm_msg = "Timeout during hostid exchange.\n"; + if (s1 < 0) { if (errno == EINTR) goto ACCEPT_HID; @@ -592,7 +644,7 @@ hostids = (int *) malloc(hostidlen * nprocs); if (hostids == NULL) { perror("malloc"); - exit(1); + exit(EXIT_FAILURE); } } @@ -626,66 +678,33 @@ } } - /* close all opend sockets */ - for (i = 0; i < nprocs; i++) { - close(plist[i].control_socket); - } - alarm(1000); - /* let enbale the timer again*/ + alarm_msg = "Timeout during address exchange.\n"; + /* lets enable the timer again*/ /* Lets read all other information, LID QP,etc..*/ /* accept incoming connections, read port numbers */ for (i = 0; i < nprocs; i++) { - int version, rank, nread; - char pidstr[12]; -ACCEPT: - sockaddr_len = sizeof(sockaddr); - s1 = accept(s, (struct sockaddr *) &sockaddr, &sockaddr_len); - if (s1 < 0) { - if (errno == EINTR) - goto ACCEPT; - perror("accept"); - cleanup(); - } + int nread; /* * protocol: - * We don't need version number, - * 0. read rank of process - * 1. read address length - * 2. read address itself - * 3. send array of all addresses + * We don't need the version number or the rank, + * 0. read address length + * 1. read address itself + * 2. send array of all addresses */ - /* 0. 
Find out who we're talking to */ - nread = read(s1, &rank, sizeof(rank)); - if (nread != sizeof(rank)) { - perror("read"); - cleanup(); - } - - if (rank < 0 || rank >= nprocs - || plist[rank].state != P_STARTED) { - fprintf(stderr, "mpirun: invalid rank received. \n"); - cleanup(); - } - - plist[rank].control_socket = s1; - plist[rank].state = P_CONNECTED; + plist[i].state = P_CONNECTED; /* Let us know connection was established * printf("MPIRUN_RSH: Process rank %d connected\n",rank); */ /* 1. Find out length of the data */ - nread = read(s1, &addrlen, sizeof(addrlen)); + nread = read(plist[i].control_socket, &addrlen, sizeof(addrlen)); if (nread != sizeof(addrlen)) { - /* nread == 0 is not actually an error! */ - if (nread == 0) - continue; - perror("read"); cleanup(); } @@ -707,21 +726,20 @@ alladdrs = (int *) malloc(addrlen * nprocs); if (alladdrs == NULL) { perror("malloc"); - exit(1); + exit(EXIT_FAILURE); } } /* 2. Read info from each process */ /* for byte location */ - alladdrs_char = (char *) &alladdrs[rank * addrlen / sizeof(int)]; + alladdrs_char = (char *) &alladdrs[i * addrlen / sizeof(int)]; tot_nread = 0; while (tot_nread < addrlen) { - nread = - read(s1, (void *) (alladdrs_char + tot_nread), - addrlen - tot_nread); + nread = read(plist[i].control_socket, + (void *) (alladdrs_char + tot_nread), addrlen - tot_nread); if (nread < 0) { perror("read"); @@ -733,36 +751,32 @@ read_pid: /* 3. 
Find out length of the data */ - nread = read(s1, &pidlen, sizeof(pidlen)); + nread = read(plist[i].control_socket, &pidlen, sizeof(pidlen)); if (nread != sizeof(pidlen)) { perror("read"); cleanup(); } /*fprintf(stderr, "read Pid lengths %d and %d \n", pidlen, nread);*/ - if (i == 0) { - pidglen = pidlen; - } else { - if (pidlen != pidglen) { - fprintf(stderr, "Pid lengths %d and %d do not match\n", - pidlen, pidglen); - cleanup(); - } - } + if (pidlen != pidglen) { + fprintf(stderr, "Pid lengths %d and %d do not match\n", + pidlen, pidglen); + cleanup(); + } if (i == 0) { - /* allocate as soon as we know the address length */ + /* allocate as soon as we know the pid length */ allpids = (char *)malloc(pidlen * nprocs); if (allpids == NULL) { perror("malloc"); - exit(1); + exit(EXIT_FAILURE); } } tot_nread=0; while(tot_nread < pidlen) { - nread = read(s1, (void*)(allpids+rank*pidlen+tot_nread), - pidlen - tot_nread); + nread = read(plist[i].control_socket, + (void*)(allpids+i*pidlen+tot_nread), pidlen - tot_nread); /*fprintf(stderr, "read length %d \n", nread);*/ if(nread < 0) { perror("read"); @@ -770,6 +784,9 @@ } tot_nread += nread; } + + plist[i].remote_pid = *((pid_t *)(allpids+i*pidlen)); + pglist_insert(plist[i].hostname, plist[i].remote_pid); } @@ -795,7 +812,7 @@ out_addrs = (int *) malloc(out_addrs_len); if (out_addrs == NULL) { perror("malloc"); - exit(1); + exit(EXIT_FAILURE); } for (i = 0; i < nprocs; i++) { @@ -876,8 +893,7 @@ sleep(100); } close(s); - exit(0); - + exit(EXIT_SUCCESS); } int start_process(int i, char *command_name, char *env) @@ -925,12 +941,12 @@ if ((remote_command = malloc(str_len)) == NULL) { fprintf(stderr, "Failed to malloc %d bytes for remote_command\n", str_len); - exit(1); + exit(EXIT_FAILURE); } if ((xterm_command = malloc(str_len)) == NULL) { fprintf(stderr, "Failed to malloc %d bytes for xterm_command\n", str_len); - exit(1); + exit(EXIT_FAILURE); } @@ -1010,7 +1026,7 @@ if (!show_on) { perror("RSH/SSH command failed!"); } - 
exit(1); + exit(EXIT_FAILURE); } free(remote_command); @@ -1189,8 +1205,6 @@ fprintf(stderr, "\ta.out => " "name of MPI binary\n"); fprintf(stderr, "\targs => " "arguments for MPI binary\n"); fprintf(stderr, "\n"); - - exit(1); } /* finds first non-whitespace char in input string */ @@ -1221,7 +1235,7 @@ if (hf == NULL) { fprintf(stderr, "Can't open hostfile %s\n", hostfile_name); perror("open"); - exit(1); + exit(EXIT_FAILURE); } for (i = 0; i < nprocs; i++) { @@ -1287,7 +1301,7 @@ } else { fprintf(stderr, "End of file reached on " "hostfile at %d of %d hostnames\n", i, nprocs); - exit(1); + exit(EXIT_FAILURE); } } fclose(hf); @@ -1321,14 +1335,14 @@ if ((pf = fopen(paramfile, "r")) == NULL) { sprintf(errstr, "Cant open paramfile = %s", paramfile); perror(errstr); - exit(1); + exit(EXIT_FAILURE); } if ( strlen(env) == 0 ){ /* Allocating space for env first time */ if ((env = malloc(ENV_LEN)) == NULL) { fprintf(stderr, "Malloc of env failed in read_param_file\n"); - exit(1); + exit(EXIT_FAILURE); } env_left = ENV_LEN - 1; }else{ @@ -1367,7 +1381,7 @@ (ENV_LEN > e_len + 1 ? 
ENV_LEN : e_len + 1) + strlen(env); if ((env = realloc(env, newlen)) == NULL) { fprintf(stderr, "realloc failed in read_param_file\n"); - exit(1); + exit(EXIT_FAILURE); } if (param_debug) { printf("realloc to %d\n", newlen); @@ -1395,15 +1409,213 @@ } cleanup(); - exit(1); + exit(EXIT_FAILURE); +} + +void pglist_print(void) { + if(pglist) { + int i, j; + size_t npids = 0, npids_allocated = 0; + + fprintf(stderr, "\n--pglist--\ndata:\n"); + for(i = 0; i < pglist->npgs; i++) { + fprintf(stderr, "%p - %s:", &pglist->data[i], + pglist->data[i].hostname); + + for(j = 0; j < pglist->data[i].npids; j++) { + fprintf(stderr, " %d", pglist->data[i].pids[j]); + } + + fprintf(stderr, "\n"); + npids += pglist->data[i].npids; + npids_allocated += pglist->data[i].npids_allocated; + } + + fprintf(stderr, "\nindex:"); + for(i = 0; i < pglist->npgs; i++) { + fprintf(stderr, " %p", pglist->index[i]); + } + + fprintf(stderr, "\nnpgs/allocated: %d/%d (%d%%)\n", pglist->npgs, + pglist->npgs_allocated, (int)(pglist->npgs_allocated ? 100. * + pglist->npgs / pglist->npgs_allocated : 100.)); + fprintf(stderr, "npids/allocated: %d/%d (%d%%)\n", npids, + npids_allocated, (int)(npids_allocated ? 100. * npids / + npids_allocated : 100.)); + fprintf(stderr, "--pglist--\n\n"); + } +} + +void pglist_insert(const char * const hostname, const pid_t const pid) { + const size_t increment = nprocs > 4 ? 
nprocs / 4 : 1; + size_t index = 0, bottom = 0, top; + static size_t alloc_error = 0; + int i, strcmp_result; + process_group * pg; + void * backup_ptr; + + if(alloc_error) return; + if(pglist == NULL) goto init_pglist; + + top = pglist->npgs - 1; + index = (top + bottom) / 2; + + while(strcmp_result = strcmp(hostname, pglist->index[index]->hostname)) { + if(bottom >= top) break; + + if(strcmp_result > 0) { + bottom = index + 1; + } + + else { + top = index - 1; + } + + index = (top + bottom) / 2; + } + + if(!strcmp_result) goto insert_pid; + if(strcmp_result > 0) index++; + + goto add_process_group; + +init_pglist: + pglist = malloc(sizeof(process_groups)); + + if(pglist) { + pglist->data = NULL; + pglist->index = NULL; + pglist->npgs = 0; + pglist->npgs_allocated = 0; + } + + else { + goto register_alloc_error; + } + +add_process_group: + if(pglist->npgs == pglist->npgs_allocated) { + process_group * pglist_data_backup = pglist->data; + process_group ** pglist_index_backup = pglist->index; + ptrdiff_t offset; + + pglist->npgs_allocated += increment; + + backup_ptr = pglist->data; + pglist->data = realloc(pglist->data, sizeof(process_group) * + pglist->npgs_allocated); + + if(pglist->data == NULL) { + pglist->data = backup_ptr; + goto register_alloc_error; + } + + backup_ptr = pglist->index; + pglist->index = realloc(pglist->index, sizeof(process_group *) * + pglist->npgs_allocated); + + if(pglist->index == NULL) { + pglist->index = backup_ptr; + goto register_alloc_error; + } + + if(offset = (size_t)pglist->data - (size_t)pglist_data_backup) { + for(i = 0; i < pglist->npgs; i++) { + pglist->index[i] = (process_group *)((size_t)pglist->index[i] + + offset); + } + } + } + + for(i = pglist->npgs; i > index; i--) { + pglist->index[i] = pglist->index[i-1]; + } + + pglist->data[pglist->npgs].hostname = hostname; + pglist->data[pglist->npgs].pids = NULL; + pglist->data[pglist->npgs].npids = 0; + pglist->data[pglist->npgs].npids_allocated = 0; + + pglist->index[index] = 
&pglist->data[pglist->npgs++]; + +insert_pid: + pg = pglist->index[index]; + + if(pg->npids == pg->npids_allocated) { + if(pg->npids_allocated) { + pg->npids_allocated <<= 1; + + if(pg->npids_allocated < pg->npids) pg->npids_allocated = SIZE_MAX; + if(pg->npids_allocated > nprocs) pg->npids_allocated = nprocs; + } + + else { + pg->npids_allocated = 1; + } + + backup_ptr = pg->pids; + pg->pids = realloc(pg->pids, pg->npids_allocated * sizeof(pid_t)); + + if(pg->pids == NULL) { + pg->pids = backup_ptr; + goto register_alloc_error; + } + } + + pg->pids[pg->npids++] = pid; + + return; + +register_alloc_error: + if(pglist) { + if(pglist->data) { + process_group * pg = pglist->data; + + while(pglist->npgs--) { + if(pg->pids) free((pg++)->pids); + } + + free(pglist->data); + } + + if(pglist->index) free(pglist->index); + + free(pglist); + } + + alloc_error = 1; +} + +void free_memory(void) { + if(pglist) { + if(pglist->data) { + process_group * pg = pglist->data; + + while(pglist->npgs--) { + if(pg->pids) free((pg++)->pids); + } + + free(pglist->data); + } + + if(pglist->index) free(pglist->index); + + free(pglist); + } + + if(plist) { + while(nprocs--) { + if(plist[nprocs].device) free(plist[nprocs].device); + if(plist[nprocs].hostname) free(plist[nprocs].hostname); + } + + free(plist); + } } void cleanup(void) { int i; - /* could walk through list of processes, but it looks - like we can just send the signal to the process group - */ if (use_totalview) { fprintf(stderr, "Cleaning up all processes ..."); @@ -1417,36 +1629,180 @@ } for (i = 0; i < nprocs; i++) { - if (RUNNING(i)) { - /* send terminal interrupt, which will hopefully - propagate to the other side. (not sure what xterm will - do here. - */ - kill(plist[i].pid, SIGINT); - } + if (RUNNING(i)) { + /* send terminal interrupt, which will hopefully + propagate to the other side. (not sure what xterm will + do here. 
+ */ + kill(plist[i].pid, SIGINT); + } } + sleep(1); for (i = 0; i < nprocs; i++) { - if (plist[i].state != P_NOTSTARTED) { - /* send regular interrupt to rsh */ - kill(plist[i].pid, SIGTERM); - } + if (plist[i].state != P_NOTSTARTED) { + /* send regular interrupt to rsh */ + kill(plist[i].pid, SIGTERM); + } } sleep(1); for (i = 0; i < nprocs; i++) { - if (plist[i].state != P_NOTSTARTED) { - /* Kill the processes */ - kill(plist[i].pid, SIGKILL); - } + if (plist[i].state != P_NOTSTARTED) { + /* Kill the processes */ + kill(plist[i].pid, SIGKILL); + } + } + + if(pglist) { + rkill_fast(); + } + + else { + rkill_linear(); + } + + exit(EXIT_FAILURE); +} + +void rkill_fast(void) { + int i, j, tryagain, spawned_pid[pglist->npgs]; + + fprintf(stderr, "Killing remote processes..."); + + for(i = 0; i < pglist->npgs; i++) { + if(0 == (spawned_pid[i] = fork())) { + if(pglist->index[i]->npids) { + const size_t bufsize = 40 + 10 * pglist->index[i]->npids; + const process_group * pg = pglist->index[i]; + char kill_cmd[bufsize], tmp[10]; + + kill_cmd[0] = '\0'; + strcat(kill_cmd, "kill -s SIGKILL"); + + for(j = 0; j < pg->npids; j++) { + snprintf(tmp, 10, " %d", pg->pids[j]); + strcat(kill_cmd, tmp); + } + + strcat(kill_cmd, " >&/dev/null"); + + if(use_rsh) { + execl(RSH_CMD, RSH_CMD, pg->hostname, kill_cmd, NULL); + } + + else { + execl(SSH_CMD, SSH_CMD, SSH_ARG, "-x", pg->hostname, + kill_cmd, NULL); + } + + perror(NULL); + exit(EXIT_FAILURE); + } + + else { + exit(EXIT_SUCCESS); + } + } + } + + while(1) { + static int iteration = 0; + tryagain = 0; + + sleep(1 << iteration); + + for (i = 0; i < pglist->npgs; i++) { + if(spawned_pid[i]) { + if(!(spawned_pid[i] = waitpid(spawned_pid[i], NULL, WNOHANG))) { + tryagain = 1; + } + } + } + + if(++iteration == 5 || !tryagain) { + fprintf(stderr, "DONE\n"); + break; + } + } + + if(tryagain) { + fprintf(stderr, "The following processes may have not been killed:\n"); + for (i = 0; i < pglist->npgs; i++) { + if(spawned_pid[i]) { + const 
process_group * pg = pglist->index[i]; + + fprintf(stderr, "%s:", pg->hostname); + + for (j = 0; j < pg->npids; j++) { + fprintf(stderr, " %d", pg->pids[j]); + } + + fprintf(stderr, "\n"); + } + } + } +} + +void rkill_linear(void) { + int i, j, tryagain, spawned_pid[nprocs]; + + fprintf(stderr, "Killing remote processes..."); + + for (i = 0; i < nprocs; i++) { + if(0 == (spawned_pid[i] = fork())) { + char kill_cmd[80]; + + if(!plist[i].remote_pid) exit(EXIT_SUCCESS); + + snprintf(kill_cmd, 80, "kill -s SIGKILL %d >&/dev/null", + plist[i].remote_pid); + + if(use_rsh) { + execl(RSH_CMD, RSH_CMD, plist[i].hostname, kill_cmd, NULL); + } + + else { + execl(SSH_CMD, SSH_CMD, SSH_ARG, "-x", + plist[i].hostname, kill_cmd, NULL); + } + + perror(NULL); + exit(EXIT_FAILURE); + } } - fprintf(stderr, "done.\n"); + while(1) { + static int iteration = 0; + tryagain = 0; + + sleep(1 << iteration); + + for (i = 0; i < nprocs; i++) { + if(spawned_pid[i]) { + if(!(spawned_pid[i] = waitpid(spawned_pid[i], NULL, WNOHANG))) { + tryagain = 1; + } + } + } - exit(1); + if(++iteration == 5 || !tryagain) { + fprintf(stderr, "DONE\n"); + break; + } + } + if(tryagain) { + fprintf(stderr, "The following processes may have not been killed:\n"); + for (i = 0; i < nprocs; i++) { + if(spawned_pid[i]) { + fprintf(stderr, "%s [%d]\n", plist[i].hostname, + plist[i].remote_pid); + } + } + } } @@ -1457,9 +1813,13 @@ void alarm_handler(int signal) { + extern const char * alarm_msg; + if (use_totalview) { fprintf(stderr, "Timeout alarm signaled\n"); } + + if(alarm_msg) fprintf(stderr, alarm_msg); cleanup(); } @@ -1467,19 +1827,21 @@ void child_handler(int signal) { int status, i, child, pid; - int exitstatus = 0; + int exitstatus = EXIT_SUCCESS; if (use_totalview) { fprintf(stderr, "mpirun: child died. Waiting for others.\n"); } alarm(10); + alarm_msg = "Child died. 
Timeout while waiting for others.\n"; + for (i = 0; i < nprocs; i++) { pid = wait(&status); if (pid == -1) { perror("wait"); - exitstatus = 1; + exitstatus = EXIT_FAILURE; } else if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) { - exitstatus = 1; + exitstatus = EXIT_FAILURE; } for (child = 0; child < nprocs; child++) { if (plist[child].pid == pid) { @@ -1489,9 +1851,11 @@ } if (child == nprocs) { fprintf(stderr, "Unable to find child %d!\n", pid); - exitstatus = 1; + exitstatus = EXIT_FAILURE; } } alarm(0); exit(exitstatus); } + +/* vi:set sw=4 sts=4 tw=80: */ diff -ruN 0.9.9/mpid/ch_gen2/process/pmgr_client.h exp1/mpid/ch_gen2/process/pmgr_client.h --- 0.9.9/mpid/ch_gen2/process/pmgr_client.h 2007-05-29 03:47:10.000000000 -0400 +++ exp1/mpid/ch_gen2/process/pmgr_client.h 2007-07-02 12:59:51.000000000 -0400 @@ -108,6 +108,6 @@ * of the spawner, e.g. mpirun_rsh, to check that it understands * the version of the executable. */ -#define PMGR_VERSION 5 +#define PMGR_VERSION 6 #endif diff -ruN 0.9.9/mpid/ch_gen2/process/pmgr_client_mpirun_rsh.c exp1/mpid/ch_gen2/process/pmgr_client_mpirun_rsh.c --- 0.9.9/mpid/ch_gen2/process/pmgr_client_mpirun_rsh.c 2007-05-29 03:47:10.000000000 -0400 +++ exp1/mpid/ch_gen2/process/pmgr_client_mpirun_rsh.c 2007-07-02 12:59:51.000000000 -0400 @@ -171,6 +171,9 @@ int nwritten; int version; struct sockaddr_in sockaddr; + + if(phase != 0) return; + /* * Exchange information with the mpirun program. Send it our * socket address, get back addresses for our siblings. 
@@ -208,14 +211,12 @@ */ version = PMGR_VERSION; - if (0 == phase) { - /* first, send a version number */ - nwritten = write(mpirun_socket, &version, sizeof(version)); - if (nwritten != sizeof(version)) { - sleep(2); - perror("write"); - exit(1); - } + /* first, send a version number */ + nwritten = write(mpirun_socket, &version, sizeof(version)); + if (nwritten != sizeof(version)) { + sleep(2); + perror("write"); + exit(1); } /* next, send our rank */ @@ -264,7 +265,6 @@ tot_nread = tot_nread + nread; } fflush(stdout); - close(mpirun_socket); return 1; } @@ -280,7 +280,6 @@ pid_t *ppids = (pid_t *)pallpids; pid_t *allpids = NULL; - pmgr_init_connection(1); /* next, send size of addr */ nwritten = write(mpirun_socket, &addrlen, sizeof(addrlen)); if (nwritten != sizeof(addrlen)) { @@ -314,7 +313,7 @@ exit(1); } - /* next, send size of addr */ + /* next, send size of pid */ nwritten = write(mpirun_socket, &pidlen, sizeof(pidlen)); if (nwritten != sizeof(mypid_len)) { sleep(2); @@ -322,6 +321,7 @@ exit(1); } + /* next, send our pid */ if (pidlen != 0) { nwritten = write(mpirun_socket, &my_pid_int, (size_t) pidlen); if (nwritten != pidlen) { @@ -345,7 +345,7 @@ if (pidlen != 0) { tot_nread=0; - /* finally, read addresses from all processes */ + /* finally, read pids from all processes */ while (tot_nread < pmgr_nprocs*pidlen) { nread = read(mpirun_socket, (void*)((char *)allpids+tot_nread), (size_t) ((pmgr_nprocs*pidlen)-tot_nread)); From jayesh at mcs.anl.gov Wed Jul 11 16:14:37 2007 From: jayesh at mcs.anl.gov (Jayesh Krishna) Date: Wed Jul 11 16:22:15 2007 Subject: [mvapich-discuss] RE: [MPICH2 Req #3227] FW: Bug in smpd In-Reply-To: <001601c759c6$5d96a960$860add8c@mcs.anl.gov> References: <001601c759c6$5d96a960$860add8c@mcs.anl.gov> Message-ID: <00fd01c7c3f8$1b1d28b0$9d09dd8c@mcs.anl.gov> Hi, Are you now able to use the "-smpdfile" option with mpiexec/smpd ? (PS: In the MPICH2 distribution from ANL, the "-smpdfile" option works correctly.) 
Regards,
Jayesh

----------------------------------
Jayesh Krishna
Argonne National Laboratory
Mathematics and Computer Science
Email: jayesh@mcs.anl.gov
----------------------------------

-----Original Message-----
From: Rajeev Thakur [mailto:thakur@mcs.anl.gov]
Sent: Monday, February 26, 2007 10:52 AM
To: mpich2-maint@mcs.anl.gov
Cc: mpich2-maint@mcs.anl.gov
Subject: [MPICH2 Req #3227] FW: Bug in smpd

Date: Fri, 23 Feb 2007 17:53:01 +0100
From: Luis Kornblueh
Subject: [mvapich-discuss] Bug in smpd
To: mvapich-discuss@cse.ohio-state.edu
Message-ID: <20070223165301.GB26201@creus.mpi.zmaw.de>
Content-Type: text/plain; charset=us-ascii

Hi,

sorry, if this is the wrong place.

I try to get mvapich2 running in tight integration with SGE. It is
recommended to use smpd for this. On our cluster are no home directories
available. I tried to use the -smpdfile option. The problem is that in
smpd_connect the clean call for smpd_open_xxx is not used but some - it
looks like - quick hack code. So smpd is not using the command line
option smpdfile. You can get smpd coming up as daemons, but mpiexec is
bailing out.

My target is to distribute the smpd file in the SGE TMPDIR which makes
it available to a full job and gets cleaned up at the end.

Hope someone can help - thanks a lot,
Luis

--
                             \\\\\\
                             (-0^0-)
--------------------------oOO--(_)--OOo-----------------------------
Luis Kornblueh                          Tel. : +49-40-41173289
Max-Planck-Institute for Meteorology    Fax. : +49-40-41173298
Bundesstr. 53
D-20146 Hamburg                         Email: luis.kornblueh@zmaw.de
Federal Republic of Germany

------------------------------

From MZebrowski at x-iss.com Wed Jul 11 15:55:18 2007
From: MZebrowski at x-iss.com (Michael Zebrowski)
Date: Wed Jul 11 16:22:26 2007
Subject: [mvapich-discuss] mvapich: using IPoIB or direct IB?
Message-ID: <98E55D6E1B3CFD43BDA59EEB56DD7D72275836@sbs01.xiss.private>

Does anyone know a good way to determine whether mvapich is utilizing
IPoIB or direct IB during its run?
I have validated that the packet counters increase significantly
(using OFED's perfquery) on nodes that are referenced during the mpi
run (and do not increase when they are not referenced), but I wanted to
confirm that this is an accurate approach in determining this.

Also, what exactly does the 'tdevice' executable report?

- Michael

NOTICE:
This message may contain privileged or otherwise confidential information.
If you are not the intended recipient, please immediately advise the sender
by reply email and delete the message and any attachments without using,
copying or disclosing the contents.

From MZebrowski at x-iss.com Wed Jul 11 16:23:39 2007
From: MZebrowski at x-iss.com (Michael Zebrowski)
Date: Wed Jul 11 16:24:12 2007
Subject: [mvapich-discuss] mvapich: using IPoIB or direct IB?
Message-ID: <98E55D6E1B3CFD43BDA59EEB56DD7D7227583E@sbs01.xiss.private>

Does anyone know a good way to determine whether mvapich is utilizing
IPoIB or direct IB during its run? I have validated that the IB packet
counters increase significantly (using OFED's perfquery) on nodes that
are referenced during the mpi run (and do not increase when they are
not referenced), but I wanted to confirm that this is an accurate
approach in determining this.

Also, what exactly does the 'tdevice' executable report?

- Michael

NOTICE:
This message may contain privileged or otherwise confidential information.
If you are not the intended recipient, please immediately advise the sender
by reply email and delete the message and any attachments without using,
copying or disclosing the contents.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20070711/629b42eb/attachment.html

From surs at cse.ohio-state.edu Wed Jul 11 16:56:31 2007
From: surs at cse.ohio-state.edu (Sayantan Sur)
Date: Wed Jul 11 16:56:50 2007
Subject: [mvapich-discuss] mvapich: using IPoIB or direct IB?
In-Reply-To: <98E55D6E1B3CFD43BDA59EEB56DD7D7227583E@sbs01.xiss.private>
References: <98E55D6E1B3CFD43BDA59EEB56DD7D7227583E@sbs01.xiss.private>
Message-ID: <469543FF.4040100@cse.ohio-state.edu>

Hi Michael,

Michael Zebrowski wrote:
>
> Does anyone know a good way to determine whether mvapich is utilizing
> IPoIB or direct IB during its run? I have validated that the IB packet
> counters increase significantly (using OFED's perfquery) on nodes that
> are referenced during the mpi run (and do not increase when they are
> not referenced), but I wanted to confirm that this is an accurate
> approach in determining this.
>
>
>
> Also, what exactly does the 'tdevice' executable report?
>

There are multiple ways to determine this:

1) $MVAPICH/bin/mpirun_rsh -v

For the latest MVAPICH version, it should print "OSU MVAPICH VERSION
0.9.9-SingleRail". If you use `mpirun_rsh' to launch any MPI program,
then you can be sure that it will use direct IB for the run.

2) Build and execute the OSU benchmarks. For modern servers equipped
with Mellanox DDR cards, you should get results around 1400 MB/s from
osu_bw.c and around 2500 for osu_bibw.c. These numbers are possible only
using direct IB.

The perf counters are unfortunately not a good way to determine whether
it is using direct IB or not, since in both modes (IPoIB and direct IB),
packets are sent over the IB device and counters may change.

Currently the 'tdevice' script just prints ch_p4 (which is the TCP/IP
support from MPICH).

Please let us know if you have questions.

Thanks,
Sayantan.

>
>
> - Michael
>
>
>
>
> NOTICE:
> This message may contain privileged or otherwise confidential information.
> If you are not the intended recipient, please immediately advise the sender
> by reply email and delete the message and any attachments without using,
> copying or disclosing the contents.
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

--
http://www.cse.ohio-state.edu/~surs

From MZebrowski at x-iss.com Wed Jul 11 17:08:19 2007
From: MZebrowski at x-iss.com (Michael Zebrowski)
Date: Wed Jul 11 17:08:46 2007
Subject: [mvapich-discuss] mvapich: using IPoIB or direct IB?
References: <98E55D6E1B3CFD43BDA59EEB56DD7D7227583E@sbs01.xiss.private>
 <469543FF.4040100@cse.ohio-state.edu>
Message-ID: <98E55D6E1B3CFD43BDA59EEB56DD7D72275848@sbs01.xiss.private>

Thanks very much Sayantan. I will give those commands a try. By the way
we are using mpirun_rsh...

Regards,
Michael

-----Original Message-----
From: Sayantan Sur [mailto:surs@cse.ohio-state.edu]
Sent: Wednesday, July 11, 2007 3:57 PM
To: Michael Zebrowski
Cc: mvapich-discuss@cse.ohio-state.edu
Subject: Re: [mvapich-discuss] mvapich: using IPoIB or direct IB?

Hi Michael,

Michael Zebrowski wrote:
>
> Does anyone know a good way to determine whether mvapich is utilizing
> IPoIB or direct IB during its run? I have validated that the IB packet
> counters increase significantly (using OFED's perfquery) on nodes that
> are referenced during the mpi run (and do not increase when they are
> not referenced), but I wanted to confirm that this is an accurate
> approach in determining this.
>
>
>
> Also, what exactly does the 'tdevice' executable report?
>

There are multiple ways to determine this:

1) $MVAPICH/bin/mpirun_rsh -v

For the latest MVAPICH version, it should print "OSU MVAPICH VERSION
0.9.9-SingleRail". If you use `mpirun_rsh' to launch any MPI program,
then you can be sure that it will use direct IB for the run.

2) Build and execute the OSU benchmarks. For modern servers equipped
with Mellanox DDR cards, you should get results around 1400 MB/s from
osu_bw.c and around 2500 for osu_bibw.c.
These numbers are possible only using direct IB.

The perf counters are unfortunately not a good way to determine whether
it is using direct IB or not, since in both modes (IPoIB and direct IB),
packets are sent over the IB device and counters may change.

Currently the 'tdevice' script just prints ch_p4 (which is the TCP/IP
support from MPICH).

Please let us know if you have questions.

Thanks,
Sayantan.

>
>
> - Michael
>
>
>
>
> NOTICE:
> This message may contain privileged or otherwise confidential information.
> If you are not the intended recipient, please immediately advise the sender
> by reply email and delete the message and any attachments without using,
> copying or disclosing the contents.
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

--
http://www.cse.ohio-state.edu/~surs

NOTICE:
This message may contain privileged or otherwise confidential information.
If you are not the intended recipient, please immediately advise the sender
by reply email and delete the message and any attachments without using,
copying or disclosing the contents.

From potts at hpcapplications.com Wed Jul 11 23:01:17 2007
From: potts at hpcapplications.com (Mark Potts)
Date: Wed Jul 11 23:01:21 2007
Subject: [mvapich-discuss] MVAPICH problem in MPI_Finalize
Message-ID: <4695997D.8050500@hpcapplications.com>

Hi,

I've finally tracked an intermittent problem that causes MVAPICH
processes to generate segmentation faults during their shutdown. It
seems to only happen on fairly large jobs on a 256 node cluster
(8-32 cores/node). The following is the backtrace from the core file
of one of the failed processes from a purposely simple pgm.
(simpleprint_c). This particular job ran with 1024 processes. We are
using ch_gen2 MVAPICH 0.9.9 singlerail with _SMP turned on.
This segmentation fault occurs across a host of different pgms. but
never on all processes and randomly(?) from one run to the next.

From the core dump, the seg fault occurs as a result of the call to
MPI_Finalize() but ultimately lies in the free() function of
ptmalloc2/malloc.c. From some cursory code examination it appears that
the error is hit when trying to unmap a memory segment. Since the seg
fault occurrence is seemingly random, is this perhaps a timing issue in
which processes within an SMP node get confused about who should be
unmapping/freeing memory?

gdb simpleprint_c core.9334
:
:
Core was generated by `/var/tmp/mjpworkspace/simpleprint_c'.
Program terminated with signal 11, Segmentation fault.
#0 free (mem=0xfa00940af900940a) at ptmalloc2/malloc.c:3455
3455 ptmalloc2/malloc.c: No such file or directory.
 in ptmalloc2/malloc.c
(gdb) bt
#0 free (mem=0xfa00940af900940a) at ptmalloc2/malloc.c:3455
#1 0x00002b70b40489c5 in free_2level_comm (comm_ptr=0x57a720) at create_2level_comm.c:49
#2 0x00002b70b40461af in PMPI_Comm_free (commp=0x7ffff6bb4e44) at comm_free.c:187
#3 0x00002b70b404604f in PMPI_Comm_free (commp=0x7ffff6bb4e70) at comm_free.c:217
#4 0x00002b70b404d56e in PMPI_Finalize () at finalize.c:159
#5 0x0000000000400814 in main (argc=1, argv=0x7ffff6bb4fa8) at simple.c:18
(gdb)

regards,
--
***********************************
 >> Mark J. Potts, PhD
 >>
 >> HPC Applications Inc.
 >> phone: 410-992-8360 Bus
 >>        410-313-9318 Home
 >>        443-418-4375 Cell
 >> email: potts@hpcapplications.com
 >>        potts@excray.com
***********************************

From potts at hpcapplications.com Wed Jul 11 23:03:05 2007
From: potts at hpcapplications.com (Mark Potts)
Date: Wed Jul 11 23:03:04 2007
Subject: [mvapich-discuss] mvapich jobs cleanup
In-Reply-To: <4693C8DF.9000509@cse.ohio-state.edu>
References: <466FF588.8050204@hpcapplications.com>
 <4693C8DF.9000509@cse.ohio-state.edu>
Message-ID: <469599E9.8020703@hpcapplications.com>

Sayantan,

Thanks.
This may take a couple of days, but I've got the patch and I'll get word back to you about its success. regards, Sayantan Sur wrote: > Hi Mark, > > We have a patch to solve this stray process issue with MVAPICH-0.9.9. > I'm attaching the patch with this email. To apply the patch please > follow these steps: > > $ cd mvapich-0.9.9 > $ #save mpirun_rsh_patch to this directory > $ patch -p1 < mpirun_rsh_patch > > Could you please let us know if this patch solves the problem for you? > > Thanks, > Sayantan. > > > Mark Potts wrote: >> Hi, >> We are observing a number of cases in which MVAPICH-0.9.9 >> jobs launched with mpirun_rsh leave stray processes on some >> nodes when the job terminates abnormally. Those stray >> processes continue to run forever and require recognition >> and killing. >> >> Is there a reason this happens with MVAPICH, and is there a >> way to prevent it. This doesn't seem to be the behavior >> that occurs for abnormally terminated Voltaire MPI or Intel >> MPI jobs. >> regards, > > > > ------------------------------------------------------------------------ > > diff -ruN 0.9.9/mpid/ch_gen2/process/mpirun_rsh.c exp1/mpid/ch_gen2/process/mpirun_rsh.c > --- 0.9.9/mpid/ch_gen2/process/mpirun_rsh.c 2007-05-29 03:47:10.000000000 -0400 > +++ exp1/mpid/ch_gen2/process/mpirun_rsh.c 2007-07-09 11:56:32.000000000 -0400 > @@ -59,6 +59,7 @@ > #define _GNU_SOURCE > #include > #include > +#include > #include > #include > #include > @@ -91,20 +92,34 @@ > typedef struct { > char *hostname; > char *device; > - int pid; > + pid_t pid; > + pid_t remote_pid; > int port; > int control_socket; > process_state state; > } process; > > +typedef struct { > + const char * hostname; > + pid_t * pids; > + size_t npids, npids_allocated; > +} process_group; > + > +typedef struct { > + process_group * data; > + process_group ** index; > + size_t npgs, npgs_allocated; > +} process_groups; > + > #define RUNNING(i) ((plist[i].state == P_STARTED || \ > plist[i].state == P_CONNECTED 
|| \ > plist[i].state == P_RUNNING) ? 1 : 0) > > /* other information: a.out and rank are implicit. */ > > -process *plist; > -int nprocs; > +process_groups * pglist = NULL; > +process * plist = NULL; > +int nprocs = 0; > int aout_index, port; > #define MAX_WD_LEN 256 > char wd[MAX_WD_LEN]; /* working directory of current process */ > @@ -112,11 +127,19 @@ > char mpirun_host[MAX_HOST_LEN]; /* hostname of current process */ > /* xxx need to add checking for string overflow, do this more carefully ... */ > > +/* > + * Message notifying user of what timed out > + */ > +static const char * alarm_msg = NULL; > > #define COMMAND_LEN 2000 > #define SEPARATOR ':' > > - > +void free_memory(void); > +void pglist_print(void); > +void pglist_insert(const char * const, const pid_t const); > +void rkill_fast(void); > +void rkill_linear(void); > void cleanup_handler(int); > void nostop_handler(int); > void alarm_handler(int); > @@ -239,15 +262,19 @@ > int hostname_len = 0; > totalview_cmd[199] = 0; > display[0]='\0'; > - > + pidglen = sizeof(pid_t); > + > /* mpirun [-debug] [-xterm] -np N [-hostfile hfile | h1 h2 h3 ... 
hN] a.out [args] */ > > + atexit(free_memory); > + > do { > c = getopt_long_only(argc, argv, "+", option_table, &option_index); > switch (c) { > case '?': > case ':': > usage(); > + exit(EXIT_FAILURE); > break; > case EOF: > break; > @@ -255,8 +282,10 @@ > switch (option_index) { > case 0: > nprocs = atoi(optarg); > - if (nprocs < 1) > + if (nprocs < 1) { > usage(); > + exit(EXIT_FAILURE); > + } > break; > case 1: > debug_on = 1; > @@ -290,11 +319,11 @@ > case 8: > show_version(); > usage(); > - exit(0); > + exit(EXIT_SUCCESS); > break; > case 9: > show_version(); > - exit(0); > + exit(EXIT_SUCCESS); > break; > case 10: > use_totalview = 1; > @@ -311,17 +340,19 @@ > break; > case 11: > usage(); > - exit(0); > + exit(EXIT_SUCCESS); > break; > default: > fprintf(stderr, "Unknown option\n"); > usage(); > + exit(EXIT_FAILURE); > break; > } > break; > default: > fprintf(stderr, "Unreachable statement!\n"); > usage(); > + exit(EXIT_FAILURE); > break; > } > } while (c != EOF); > @@ -332,7 +363,7 @@ > fprintf(stderr, "Without hostfile option, hostnames must be " > "specified on command line.\n"); > usage(); > - exit(1); > + exit(EXIT_FAILURE); > } > aout_index = nprocs + optind; > } else { > @@ -361,13 +392,14 @@ > plist = malloc(nprocs * sizeof(process)); > if (plist == NULL) { > perror("malloc"); > - exit(1); > + exit(EXIT_FAILURE); > } > > for (i = 0; i < nprocs; i++) { > plist[i].state = P_NOTSTARTED; > plist[i].device = NULL; > plist[i].port = -1; > + plist[i].remote_pid = 0; > } > > /* grab hosts from command line or file */ > @@ -376,7 +408,7 @@ > hostname_len = read_hostfile(hostfile); > } else { > for (i = 0; i < nprocs; i++) { > - plist[i].hostname = argv[optind + i]; > + plist[i].hostname = (char *)strndup(argv[optind + i], 100); > hostname_len = hostname_len > strlen(plist[i].hostname) ? 
> hostname_len : strlen(plist[i].hostname); > } > @@ -388,7 +420,7 @@ > > if (!mpirun_processes) { > perror("malloc"); > - exit(1); > + exit(EXIT_FAILURE); > } else { > memset(mpirun_processes, 0, nprocs * (hostname_len + 4)); > } > @@ -412,18 +444,18 @@ > s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); > if (s < 0) { > perror("socket"); > - exit(1); > + exit(EXIT_FAILURE); > } > sockaddr.sin_addr.s_addr = INADDR_ANY; > sockaddr.sin_port = 0; > if (bind(s, (struct sockaddr *) &sockaddr, sockaddr_len) < 0) { > perror("bind"); > - exit(1); > + exit(EXIT_FAILURE); > } > > if (getsockname(s, (struct sockaddr *) &sockaddr, &sockaddr_len) < 0) { > perror("getsockname"); > - exit(1); > + exit(EXIT_FAILURE); > } > > port = (int) ntohs(sockaddr.sin_port); > @@ -431,14 +463,31 @@ > > > if (!show_on) { > - signal(SIGHUP, cleanup_handler); > - signal(SIGINT, cleanup_handler); > - signal(SIGTSTP, nostop_handler); > - signal(SIGCHLD, child_handler); > - signal(SIGALRM, alarm_handler); > + struct sigaction signal_handler; > + signal_handler.sa_handler = cleanup_handler; > + sigfillset(&signal_handler.sa_mask); > + signal_handler.sa_flags = 0; > + > + sigaction(SIGHUP, &signal_handler, NULL); > + sigaction(SIGINT, &signal_handler, NULL); > + sigaction(SIGTERM, &signal_handler, NULL); > + > + signal_handler.sa_handler = nostop_handler; > + > + sigaction(SIGTSTP, &signal_handler, NULL); > + > + signal_handler.sa_handler = alarm_handler; > + > + sigaction(SIGALRM, &signal_handler, NULL); > + > + signal_handler.sa_handler = child_handler; > + sigemptyset(&signal_handler.sa_mask); > + > + sigaction(SIGCHLD, &signal_handler, NULL); > } > > alarm(1000); > + alarm_msg = "Timeout during client startup.\n"; > /* long timeout for testing, where process may be stopped in debugger */ > > #ifdef USE_DDD > @@ -511,7 +560,7 @@ > } > > if (show_on) > - exit(0); > + exit(EXIT_SUCCESS); > > /*Hostid exchange start */ > /* accept incoming connections, read port numbers */ > @@ -522,6 +571,9 @@ > 
ACCEPT_HID: > sockaddr_len = sizeof(sockaddr); > s1 = accept(s, (struct sockaddr *) &sockaddr, &sockaddr_len); > + > + alarm_msg = "Timeout during hostid exchange.\n"; > + > if (s1 < 0) { > if (errno == EINTR) > goto ACCEPT_HID; > @@ -592,7 +644,7 @@ > hostids = (int *) malloc(hostidlen * nprocs); > if (hostids == NULL) { > perror("malloc"); > - exit(1); > + exit(EXIT_FAILURE); > } > } > > @@ -626,66 +678,33 @@ > } > } > > - /* close all opend sockets */ > - for (i = 0; i < nprocs; i++) { > - close(plist[i].control_socket); > - } > - > alarm(1000); > - /* let enbale the timer again*/ > + alarm_msg = "Timeout during address exchange.\n"; > + /* lets enable the timer again*/ > > /* Lets read all other information, LID QP,etc..*/ > > /* accept incoming connections, read port numbers */ > for (i = 0; i < nprocs; i++) { > - int version, rank, nread; > - char pidstr[12]; > -ACCEPT: > - sockaddr_len = sizeof(sockaddr); > - s1 = accept(s, (struct sockaddr *) &sockaddr, &sockaddr_len); > - if (s1 < 0) { > - if (errno == EINTR) > - goto ACCEPT; > - perror("accept"); > - cleanup(); > - } > + int nread; > > /* > * protocol: > - * We don't need version number, > - * 0. read rank of process > - * 1. read address length > - * 2. read address itself > - * 3. send array of all addresses > + * We don't need the version number or the rank, > + * 0. read address length > + * 1. read address itself > + * 2. send array of all addresses > */ > > - /* 0. Find out who we're talking to */ > - nread = read(s1, &rank, sizeof(rank)); > - if (nread != sizeof(rank)) { > - perror("read"); > - cleanup(); > - } > - > - if (rank < 0 || rank >= nprocs > - || plist[rank].state != P_STARTED) { > - fprintf(stderr, "mpirun: invalid rank received. \n"); > - cleanup(); > - } > - > - plist[rank].control_socket = s1; > - plist[rank].state = P_CONNECTED; > + plist[i].state = P_CONNECTED; > > /* Let us know connection was established > * printf("MPIRUN_RSH: Process rank %d connected\n",rank); > */ > > /* 1. 
Find out length of the data */ > - nread = read(s1, &addrlen, sizeof(addrlen)); > + nread = read(plist[i].control_socket, &addrlen, sizeof(addrlen)); > if (nread != sizeof(addrlen)) { > - /* nread == 0 is not actually an error! */ > - if (nread == 0) > - continue; > - > perror("read"); > cleanup(); > } > @@ -707,21 +726,20 @@ > alladdrs = (int *) malloc(addrlen * nprocs); > if (alladdrs == NULL) { > perror("malloc"); > - exit(1); > + exit(EXIT_FAILURE); > } > } > > /* 2. Read info from each process */ > > /* for byte location */ > - alladdrs_char = (char *) &alladdrs[rank * addrlen / sizeof(int)]; > + alladdrs_char = (char *) &alladdrs[i * addrlen / sizeof(int)]; > > tot_nread = 0; > > while (tot_nread < addrlen) { > - nread = > - read(s1, (void *) (alladdrs_char + tot_nread), > - addrlen - tot_nread); > + nread = read(plist[i].control_socket, > + (void *) (alladdrs_char + tot_nread), addrlen - tot_nread); > > if (nread < 0) { > perror("read"); > @@ -733,36 +751,32 @@ > > read_pid: > /* 3. 
Find out length of the data */ > - nread = read(s1, &pidlen, sizeof(pidlen)); > + nread = read(plist[i].control_socket, &pidlen, sizeof(pidlen)); > if (nread != sizeof(pidlen)) { > perror("read"); > cleanup(); > } > > /*fprintf(stderr, "read Pid lengths %d and %d \n", pidlen, nread);*/ > - if (i == 0) { > - pidglen = pidlen; > - } else { > - if (pidlen != pidglen) { > - fprintf(stderr, "Pid lengths %d and %d do not match\n", > - pidlen, pidglen); > - cleanup(); > - } > - } > + if (pidlen != pidglen) { > + fprintf(stderr, "Pid lengths %d and %d do not match\n", > + pidlen, pidglen); > + cleanup(); > + } > > if (i == 0) { > - /* allocate as soon as we know the address length */ > + /* allocate as soon as we know the pid length */ > allpids = (char *)malloc(pidlen * nprocs); > if (allpids == NULL) { > perror("malloc"); > - exit(1); > + exit(EXIT_FAILURE); > } > } > > tot_nread=0; > while(tot_nread < pidlen) { > - nread = read(s1, (void*)(allpids+rank*pidlen+tot_nread), > - pidlen - tot_nread); > + nread = read(plist[i].control_socket, > + (void*)(allpids+i*pidlen+tot_nread), pidlen - tot_nread); > /*fprintf(stderr, "read length %d \n", nread);*/ > if(nread < 0) { > perror("read"); > @@ -770,6 +784,9 @@ > } > tot_nread += nread; > } > + > + plist[i].remote_pid = *((pid_t *)(allpids+i*pidlen)); > + pglist_insert(plist[i].hostname, plist[i].remote_pid); > } > > > @@ -795,7 +812,7 @@ > out_addrs = (int *) malloc(out_addrs_len); > if (out_addrs == NULL) { > perror("malloc"); > - exit(1); > + exit(EXIT_FAILURE); > } > > for (i = 0; i < nprocs; i++) { > @@ -876,8 +893,7 @@ > sleep(100); > } > close(s); > - exit(0); > - > + exit(EXIT_SUCCESS); > } > > int start_process(int i, char *command_name, char *env) > @@ -925,12 +941,12 @@ > if ((remote_command = malloc(str_len)) == NULL) { > fprintf(stderr, "Failed to malloc %d bytes for remote_command\n", > str_len); > - exit(1); > + exit(EXIT_FAILURE); > } > if ((xterm_command = malloc(str_len)) == NULL) { > fprintf(stderr, "Failed 
to malloc %d bytes for xterm_command\n", > str_len); > - exit(1); > + exit(EXIT_FAILURE); > } > > > @@ -1010,7 +1026,7 @@ > if (!show_on) { > perror("RSH/SSH command failed!"); > } > - exit(1); > + exit(EXIT_FAILURE); > } > > free(remote_command); > @@ -1189,8 +1205,6 @@ > fprintf(stderr, "\ta.out => " "name of MPI binary\n"); > fprintf(stderr, "\targs => " "arguments for MPI binary\n"); > fprintf(stderr, "\n"); > - > - exit(1); > } > > /* finds first non-whitespace char in input string */ > @@ -1221,7 +1235,7 @@ > if (hf == NULL) { > fprintf(stderr, "Can't open hostfile %s\n", hostfile_name); > perror("open"); > - exit(1); > + exit(EXIT_FAILURE); > } > > for (i = 0; i < nprocs; i++) { > @@ -1287,7 +1301,7 @@ > } else { > fprintf(stderr, "End of file reached on " > "hostfile at %d of %d hostnames\n", i, nprocs); > - exit(1); > + exit(EXIT_FAILURE); > } > } > fclose(hf); > @@ -1321,14 +1335,14 @@ > if ((pf = fopen(paramfile, "r")) == NULL) { > sprintf(errstr, "Cant open paramfile = %s", paramfile); > perror(errstr); > - exit(1); > + exit(EXIT_FAILURE); > } > > if ( strlen(env) == 0 ){ > /* Allocating space for env first time */ > if ((env = malloc(ENV_LEN)) == NULL) { > fprintf(stderr, "Malloc of env failed in read_param_file\n"); > - exit(1); > + exit(EXIT_FAILURE); > } > env_left = ENV_LEN - 1; > }else{ > @@ -1367,7 +1381,7 @@ > (ENV_LEN > e_len + 1 ? 
ENV_LEN : e_len + 1) + strlen(env); > if ((env = realloc(env, newlen)) == NULL) { > fprintf(stderr, "realloc failed in read_param_file\n"); > - exit(1); > + exit(EXIT_FAILURE); > } > if (param_debug) { > printf("realloc to %d\n", newlen); > @@ -1395,15 +1409,213 @@ > } > cleanup(); > > - exit(1); > + exit(EXIT_FAILURE); > +} > + > +void pglist_print(void) { > + if(pglist) { > + int i, j; > + size_t npids = 0, npids_allocated = 0; > + > + fprintf(stderr, "\n--pglist--\ndata:\n"); > + for(i = 0; i < pglist->npgs; i++) { > + fprintf(stderr, "%p - %s:", &pglist->data[i], > + pglist->data[i].hostname); > + > + for(j = 0; j < pglist->data[i].npids; j++) { > + fprintf(stderr, " %d", pglist->data[i].pids[j]); > + } > + > + fprintf(stderr, "\n"); > + npids += pglist->data[i].npids; > + npids_allocated += pglist->data[i].npids_allocated; > + } > + > + fprintf(stderr, "\nindex:"); > + for(i = 0; i < pglist->npgs; i++) { > + fprintf(stderr, " %p", pglist->index[i]); > + } > + > + fprintf(stderr, "\nnpgs/allocated: %d/%d (%d%%)\n", pglist->npgs, > + pglist->npgs_allocated, (int)(pglist->npgs_allocated ? 100. * > + pglist->npgs / pglist->npgs_allocated : 100.)); > + fprintf(stderr, "npids/allocated: %d/%d (%d%%)\n", npids, > + npids_allocated, (int)(npids_allocated ? 100. * npids / > + npids_allocated : 100.)); > + fprintf(stderr, "--pglist--\n\n"); > + } > +} > + > +void pglist_insert(const char * const hostname, const pid_t const pid) { > + const size_t increment = nprocs > 4 ? 
nprocs / 4 : 1; > + size_t index = 0, bottom = 0, top; > + static size_t alloc_error = 0; > + int i, strcmp_result; > + process_group * pg; > + void * backup_ptr; > + > + if(alloc_error) return; > + if(pglist == NULL) goto init_pglist; > + > + top = pglist->npgs - 1; > + index = (top + bottom) / 2; > + > + while(strcmp_result = strcmp(hostname, pglist->index[index]->hostname)) { > + if(bottom >= top) break; > + > + if(strcmp_result > 0) { > + bottom = index + 1; > + } > + > + else { > + top = index - 1; > + } > + > + index = (top + bottom) / 2; > + } > + > + if(!strcmp_result) goto insert_pid; > + if(strcmp_result > 0) index++; > + > + goto add_process_group; > + > +init_pglist: > + pglist = malloc(sizeof(process_groups)); > + > + if(pglist) { > + pglist->data = NULL; > + pglist->index = NULL; > + pglist->npgs = 0; > + pglist->npgs_allocated = 0; > + } > + > + else { > + goto register_alloc_error; > + } > + > +add_process_group: > + if(pglist->npgs == pglist->npgs_allocated) { > + process_group * pglist_data_backup = pglist->data; > + process_group ** pglist_index_backup = pglist->index; > + ptrdiff_t offset; > + > + pglist->npgs_allocated += increment; > + > + backup_ptr = pglist->data; > + pglist->data = realloc(pglist->data, sizeof(process_group) * > + pglist->npgs_allocated); > + > + if(pglist->data == NULL) { > + pglist->data = backup_ptr; > + goto register_alloc_error; > + } > + > + backup_ptr = pglist->index; > + pglist->index = realloc(pglist->index, sizeof(process_group *) * > + pglist->npgs_allocated); > + > + if(pglist->index == NULL) { > + pglist->index = backup_ptr; > + goto register_alloc_error; > + } > + > + if(offset = (size_t)pglist->data - (size_t)pglist_data_backup) { > + for(i = 0; i < pglist->npgs; i++) { > + pglist->index[i] = (process_group *)((size_t)pglist->index[i] + > + offset); > + } > + } > + } > + > + for(i = pglist->npgs; i > index; i--) { > + pglist->index[i] = pglist->index[i-1]; > + } > + > + pglist->data[pglist->npgs].hostname = 
hostname; > + pglist->data[pglist->npgs].pids = NULL; > + pglist->data[pglist->npgs].npids = 0; > + pglist->data[pglist->npgs].npids_allocated = 0; > + > + pglist->index[index] = &pglist->data[pglist->npgs++]; > + > +insert_pid: > + pg = pglist->index[index]; > + > + if(pg->npids == pg->npids_allocated) { > + if(pg->npids_allocated) { > + pg->npids_allocated <<= 1; > + > + if(pg->npids_allocated < pg->npids) pg->npids_allocated = SIZE_MAX; > + if(pg->npids_allocated > nprocs) pg->npids_allocated = nprocs; > + } > + > + else { > + pg->npids_allocated = 1; > + } > + > + backup_ptr = pg->pids; > + pg->pids = realloc(pg->pids, pg->npids_allocated * sizeof(pid_t)); > + > + if(pg->pids == NULL) { > + pg->pids = backup_ptr; > + goto register_alloc_error; > + } > + } > + > + pg->pids[pg->npids++] = pid; > + > + return; > + > +register_alloc_error: > + if(pglist) { > + if(pglist->data) { > + process_group * pg = pglist->data; > + > + while(pglist->npgs--) { > + if(pg->pids) free((pg++)->pids); > + } > + > + free(pglist->data); > + } > + > + if(pglist->index) free(pglist->index); > + > + free(pglist); > + } > + > + alloc_error = 1; > +} > + > +void free_memory(void) { > + if(pglist) { > + if(pglist->data) { > + process_group * pg = pglist->data; > + > + while(pglist->npgs--) { > + if(pg->pids) free((pg++)->pids); > + } > + > + free(pglist->data); > + } > + > + if(pglist->index) free(pglist->index); > + > + free(pglist); > + } > + > + if(plist) { > + while(nprocs--) { > + if(plist[nprocs].device) free(plist[nprocs].device); > + if(plist[nprocs].hostname) free(plist[nprocs].hostname); > + } > + > + free(plist); > + } > } > > void cleanup(void) > { > int i; > - /* could walk through list of processes, but it looks > - like we can just send the signal to the process group > - */ > > if (use_totalview) { > fprintf(stderr, "Cleaning up all processes ..."); > @@ -1417,36 +1629,180 @@ > } > > for (i = 0; i < nprocs; i++) { > - if (RUNNING(i)) { > - /* send terminal interrupt, which 
will hopefully > - propagate to the other side. (not sure what xterm will > - do here. > - */ > - kill(plist[i].pid, SIGINT); > - } > + if (RUNNING(i)) { > + /* send terminal interrupt, which will hopefully > + propagate to the other side. (not sure what xterm will > + do here. > + */ > + kill(plist[i].pid, SIGINT); > + } > } > + > sleep(1); > > for (i = 0; i < nprocs; i++) { > - if (plist[i].state != P_NOTSTARTED) { > - /* send regular interrupt to rsh */ > - kill(plist[i].pid, SIGTERM); > - } > + if (plist[i].state != P_NOTSTARTED) { > + /* send regular interrupt to rsh */ > + kill(plist[i].pid, SIGTERM); > + } > } > > sleep(1); > > for (i = 0; i < nprocs; i++) { > - if (plist[i].state != P_NOTSTARTED) { > - /* Kill the processes */ > - kill(plist[i].pid, SIGKILL); > - } > + if (plist[i].state != P_NOTSTARTED) { > + /* Kill the processes */ > + kill(plist[i].pid, SIGKILL); > + } > + } > + > + if(pglist) { > + rkill_fast(); > + } > + > + else { > + rkill_linear(); > + } > + > + exit(EXIT_FAILURE); > +} > + > +void rkill_fast(void) { > + int i, j, tryagain, spawned_pid[pglist->npgs]; > + > + fprintf(stderr, "Killing remote processes..."); > + > + for(i = 0; i < pglist->npgs; i++) { > + if(0 == (spawned_pid[i] = fork())) { > + if(pglist->index[i]->npids) { > + const size_t bufsize = 40 + 10 * pglist->index[i]->npids; > + const process_group * pg = pglist->index[i]; > + char kill_cmd[bufsize], tmp[10]; > + > + kill_cmd[0] = '\0'; > + strcat(kill_cmd, "kill -s SIGKILL"); > + > + for(j = 0; j < pg->npids; j++) { > + snprintf(tmp, 10, " %d", pg->pids[j]); > + strcat(kill_cmd, tmp); > + } > + > + strcat(kill_cmd, " >&/dev/null"); > + > + if(use_rsh) { > + execl(RSH_CMD, RSH_CMD, pg->hostname, kill_cmd, NULL); > + } > + > + else { > + execl(SSH_CMD, SSH_CMD, SSH_ARG, "-x", pg->hostname, > + kill_cmd, NULL); > + } > + > + perror(NULL); > + exit(EXIT_FAILURE); > + } > + > + else { > + exit(EXIT_SUCCESS); > + } > + } > + } > + > + while(1) { > + static int iteration = 0; > + 
tryagain = 0; > + > + sleep(1 << iteration); > + > + for (i = 0; i < pglist->npgs; i++) { > + if(spawned_pid[i]) { > + if(!(spawned_pid[i] = waitpid(spawned_pid[i], NULL, WNOHANG))) { > + tryagain = 1; > + } > + } > + } > + > + if(++iteration == 5 || !tryagain) { > + fprintf(stderr, "DONE\n"); > + break; > + } > + } > + > + if(tryagain) { > + fprintf(stderr, "The following processes may have not been killed:\n"); > + for (i = 0; i < pglist->npgs; i++) { > + if(spawned_pid[i]) { > + const process_group * pg = pglist->index[i]; > + > + fprintf(stderr, "%s:", pg->hostname); > + > + for (j = 0; j < pg->npids; j++) { > + fprintf(stderr, " %d", pg->pids[j]); > + } > + > + fprintf(stderr, "\n"); > + } > + } > + } > +} > + > +void rkill_linear(void) { > + int i, j, tryagain, spawned_pid[nprocs]; > + > + fprintf(stderr, "Killing remote processes..."); > + > + for (i = 0; i < nprocs; i++) { > + if(0 == (spawned_pid[i] = fork())) { > + char kill_cmd[80]; > + > + if(!plist[i].remote_pid) exit(EXIT_SUCCESS); > + > + snprintf(kill_cmd, 80, "kill -s SIGKILL %d >&/dev/null", > + plist[i].remote_pid); > + > + if(use_rsh) { > + execl(RSH_CMD, RSH_CMD, plist[i].hostname, kill_cmd, NULL); > + } > + > + else { > + execl(SSH_CMD, SSH_CMD, SSH_ARG, "-x", > + plist[i].hostname, kill_cmd, NULL); > + } > + > + perror(NULL); > + exit(EXIT_FAILURE); > + } > } > > - fprintf(stderr, "done.\n"); > + while(1) { > + static int iteration = 0; > + tryagain = 0; > + > + sleep(1 << iteration); > + > + for (i = 0; i < nprocs; i++) { > + if(spawned_pid[i]) { > + if(!(spawned_pid[i] = waitpid(spawned_pid[i], NULL, WNOHANG))) { > + tryagain = 1; > + } > + } > + } > > - exit(1); > + if(++iteration == 5 || !tryagain) { > + fprintf(stderr, "DONE\n"); > + break; > + } > + } > > + if(tryagain) { > + fprintf(stderr, "The following processes may have not been killed:\n"); > + for (i = 0; i < nprocs; i++) { > + if(spawned_pid[i]) { > + fprintf(stderr, "%s [%d]\n", plist[i].hostname, > + plist[i].remote_pid); > + 
} > + } > + } > } > > > @@ -1457,9 +1813,13 @@ > > void alarm_handler(int signal) > { > + extern const char * alarm_msg; > + > if (use_totalview) { > fprintf(stderr, "Timeout alarm signaled\n"); > } > + > + if(alarm_msg) fprintf(stderr, alarm_msg); > cleanup(); > } > > @@ -1467,19 +1827,21 @@ > void child_handler(int signal) > { > int status, i, child, pid; > - int exitstatus = 0; > + int exitstatus = EXIT_SUCCESS; > > if (use_totalview) { > fprintf(stderr, "mpirun: child died. Waiting for others.\n"); > } > alarm(10); > + alarm_msg = "Child died. Timeout while waiting for others.\n"; > + > for (i = 0; i < nprocs; i++) { > pid = wait(&status); > if (pid == -1) { > perror("wait"); > - exitstatus = 1; > + exitstatus = EXIT_FAILURE; > } else if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) { > - exitstatus = 1; > + exitstatus = EXIT_FAILURE; > } > for (child = 0; child < nprocs; child++) { > if (plist[child].pid == pid) { > @@ -1489,9 +1851,11 @@ > } > if (child == nprocs) { > fprintf(stderr, "Unable to find child %d!\n", pid); > - exitstatus = 1; > + exitstatus = EXIT_FAILURE; > } > } > alarm(0); > exit(exitstatus); > } > + > +/* vi:set sw=4 sts=4 tw=80: */ > diff -ruN 0.9.9/mpid/ch_gen2/process/pmgr_client.h exp1/mpid/ch_gen2/process/pmgr_client.h > --- 0.9.9/mpid/ch_gen2/process/pmgr_client.h 2007-05-29 03:47:10.000000000 -0400 > +++ exp1/mpid/ch_gen2/process/pmgr_client.h 2007-07-02 12:59:51.000000000 -0400 > @@ -108,6 +108,6 @@ > * of the spawner, e.g. mpirun_rsh, to check that it understands > * the version of the executable. 
> */ > -#define PMGR_VERSION 5 > +#define PMGR_VERSION 6 > > #endif > diff -ruN 0.9.9/mpid/ch_gen2/process/pmgr_client_mpirun_rsh.c exp1/mpid/ch_gen2/process/pmgr_client_mpirun_rsh.c > --- 0.9.9/mpid/ch_gen2/process/pmgr_client_mpirun_rsh.c 2007-05-29 03:47:10.000000000 -0400 > +++ exp1/mpid/ch_gen2/process/pmgr_client_mpirun_rsh.c 2007-07-02 12:59:51.000000000 -0400 > @@ -171,6 +171,9 @@ > int nwritten; > int version; > struct sockaddr_in sockaddr; > + > + if(phase != 0) return; > + > /* > * Exchange information with the mpirun program. Send it our > * socket address, get back addresses for our siblings. > @@ -208,14 +211,12 @@ > */ > > version = PMGR_VERSION; > - if (0 == phase) { > - /* first, send a version number */ > - nwritten = write(mpirun_socket, &version, sizeof(version)); > - if (nwritten != sizeof(version)) { > - sleep(2); > - perror("write"); > - exit(1); > - } > + /* first, send a version number */ > + nwritten = write(mpirun_socket, &version, sizeof(version)); > + if (nwritten != sizeof(version)) { > + sleep(2); > + perror("write"); > + exit(1); > } > > /* next, send our rank */ > @@ -264,7 +265,6 @@ > tot_nread = tot_nread + nread; > } > fflush(stdout); > - close(mpirun_socket); > return 1; > } > > @@ -280,7 +280,6 @@ > pid_t *ppids = (pid_t *)pallpids; > pid_t *allpids = NULL; > > - pmgr_init_connection(1); > /* next, send size of addr */ > nwritten = write(mpirun_socket, &addrlen, sizeof(addrlen)); > if (nwritten != sizeof(addrlen)) { > @@ -314,7 +313,7 @@ > exit(1); > } > > - /* next, send size of addr */ > + /* next, send size of pid */ > nwritten = write(mpirun_socket, &pidlen, sizeof(pidlen)); > if (nwritten != sizeof(mypid_len)) { > sleep(2); > @@ -322,6 +321,7 @@ > exit(1); > } > > + /* next, send our pid */ > if (pidlen != 0) { > nwritten = write(mpirun_socket, &my_pid_int, (size_t) pidlen); > if (nwritten != pidlen) { > @@ -345,7 +345,7 @@ > > if (pidlen != 0) { > tot_nread=0; > - /* finally, read addresses from all processes */ > + /* 
finally, read pids from all processes */
>          while (tot_nread < pmgr_nprocs*pidlen) {
>              nread = read(mpirun_socket, (void*)((char *)allpids+tot_nread),
>                      (size_t) ((pmgr_nprocs*pidlen)-tot_nread));

-- 
***********************************
 >> Mark J. Potts, PhD
 >>
 >> HPC Applications Inc.
 >> phone: 410-992-8360 Bus
 >>        410-313-9318 Home
 >>        443-418-4375 Cell
 >> email: potts@hpcapplications.com
 >>        potts@excray.com
***********************************

From cap at nsc.liu.se Thu Jul 12 03:29:08 2007
From: cap at nsc.liu.se (Peter Kjellstrom)
Date: Thu Jul 12 03:30:04 2007
Subject: [mvapich-discuss] mvapich: using IPoIB or direct IB?
In-Reply-To: <98E55D6E1B3CFD43BDA59EEB56DD7D72275836@sbs01.xiss.private>
References: <98E55D6E1B3CFD43BDA59EEB56DD7D72275836@sbs01.xiss.private>
Message-ID: <200707120929.12570.cap@nsc.liu.se>

On Wednesday 11 July 2007, Michael Zebrowski wrote:
> Does anyone know a good way to determine whether mvapich is utilizing
> IPoIB or direct IB during its run?

Two out-of-band sanity checks:
1) If the counters on ib0 (ifconfig ib0) don't tick up, then you're
   running over verbs.
2) If the latency you see is <10 us, then you're running over verbs.

> I have validated that the packet
> counters increase significantly (using OFED's perfquery) on nodes that
> are referenced during the mpi run (and do not increase when they are not
> referenced), but I wanted to confirm that this is an accurate approach
> in determining this.

It is not: perfquery will show both verbs and IPoIB traffic; it will show
all IB traffic, afaik.

/Peter
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part.
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20070712/65ad9df4/attachment.bin

From bfp at purdue.edu Thu Jul 12 10:23:04 2007
From: bfp at purdue.edu (Bryan Putnam)
Date: Thu Jul 12 10:23:25 2007
Subject: [mvapich-discuss] mvapich2 and malloc
In-Reply-To: <4496CBA4.8050203@mellanox.co.il>
References: <000001c69347$065babf0$6500a8c0@Rowley> <16df8ae40606190437g7768d256j1df417617e773a37@mail.gmail.com> <4496CBA4.8050203@mellanox.co.il>
Message-ID:

Hi,

I've run across a piece of code that fails (hangs) with mvapich2, but runs
successfully with mpich2 and mpich. The problem seems to occur when the
number of bytes being allocated is greater than the largest 32-bit integer.
So, even though we used 64-bit compilers to build this version of mvapich2,
it appears that mvapich2 may be using its own version of malloc that isn't
able to handle 64-bit addresses.

Is this a known problem?

Thanks,
Bryan

I've appended the code for your enjoyment in case you'd like to experiment
with it. It works OK with ncol=nrow=nsec=812, but fails with 813. In the
latter case, the # of bytes exceeds the max 32-bit integer.

      program alloc3
      use mpi
c     include 'mpif.h'
      integer me, nt, mpierr, status(MPI_STATUS_SIZE)
      integer*4 allocate_stat
      real*4, allocatable :: x(:,:,:)
c     real*8, allocatable :: x(:,:,:)
      call MPI_INIT(mpierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nt, mpierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, me, mpierr)
c     max int = 2147483647
c     813**3 * 4 = 2149471188
c     812**3 * 4 = 2141549312
      NCOL = 813
      NROW = 813
      NSEC = 813
c     NCOL = 812
c     NROW = 812
c     NSEC = 812
      ALLOCATE(X(NCOL,NROW,NSEC), STAT=ALLOCATE_STAT)
      IF( ALLOCATE_STAT .NE. 0)THEN
         print *,'Can not allocated memory in GET_PRJSPFTS_G'
      ENDIF
      DO I = 1,NSEC
         DO J = 1,NROW
            DO K = 1, NCOL
               X(K,J,I) = 0.0
            ENDDO
         ENDDO
         if(me.eq.0)print *,'finish initilize map3d sec',I
      ENDDO
      dealloca