From divi at ncat.edu Tue Jun 3 13:36:17 2008
From: divi at ncat.edu (Divi Venkateswarlu)
Date: Tue Jun 3 14:21:05 2008
Subject: [mvapich-discuss] how to set "ulimit -l unlimited" at user level?
Message-ID: <000501c8c5a0$552163f0$080aa8c0@DJ85D2C1>

Hello:

I am running ROCKS-5 on two DP quad-core machines with mellanox IB HCA cards.
I compiled mvapich with ifort without any problems.

I am able to run at root level with NO problems. I could set ulimit -l unlimited to increase
RLIMIT_MEMLOCK size. My program (PMEMD of AMBER package) runs on all 16-cores with no hiccups.

When I try to set ulimit -l unlimited at user level, I get the following error message.

-bash: ulimit: max locked memory: cannot modify limit: Operation not permitted.

Can somebody help me how to fix this problem? I am running mvapich-1.0

Thanks a lot for your help

Divi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080603/ec7dfc54/attachment.html

From gopalakk at cse.ohio-state.edu Tue Jun 3 14:47:47 2008
From: gopalakk at cse.ohio-state.edu (Karthik Gopalakrishnan)
Date: Tue Jun 3 14:48:03 2008
Subject: [mvapich-discuss] how to set "ulimit -l unlimited" at user level?
In-Reply-To: <000501c8c5a0$552163f0$080aa8c0@DJ85D2C1>
References: <000501c8c5a0$552163f0$080aa8c0@DJ85D2C1>
Message-ID: <92eddfb50806031147p2a2e4303jbbca9ee3be397d52@mail.gmail.com>

Try adding the "ulimit -l unlimited" line to /etc/profile first.

Regards,
Karthik

On Tue, Jun 3, 2008 at 1:36 PM, Divi Venkateswarlu wrote:
>
> Hello:
>
> I am running ROCKS-5 on two DP quad-core machines with mellanox IB HCA
> cards.
> I compiled mvapich with ifort without any problems.
>
> I am able to run at root level with NO problems. I could set ulimit -l
> unlimited to increase
> RLIMIT_MEMLOCK size. My program (PMEMD of AMBER package) runs on all
> 16-cores with no hiccups.
>
> When I try to set ulimit -l unlimited at user level, I get the
> following error message.
>
> -bash: ulimit: max locked memory: cannot modify limit: Operation not
> permitted.
>
> Can somebody help me how to fix this problem? I am running mvapich-1.0
>
> Thanks a lot for your help
>
> Divi
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>

From Lenny_Neil at raytheon.com Tue Jun 3 22:44:08 2008
From: Lenny_Neil at raytheon.com (Lenny Neil)
Date: Tue Jun 3 22:58:40 2008
Subject: [mvapich-discuss] problems with OFED installation with MVAPICH2 RPM
Message-ID:

Good evening

The installation of OFED 1.3 and 1.3.1 - I have been running into some issues installing on RHEL 5.1 kernel-2.6.18-53.el5.*64. I'm also seeing installation failures with OFED with the latest RHEL 5.1 kernel-2.6.18-92, which your documentation says it's not covered as of yet.

Do you know when you'll have a updated OFED package that will install on the latest RHEL 5.1 kernel?

Any ideas why we are seeing the error below? The RHEL 5.1 system running kernel-2.6.18-53 has gcc and gcc++ installed with the development package.

Here is the error

Building the MVAPICH2 RPM in the OFA configuration. Please wait...
Running rpmbuild --rebuild --define '_topdir /var/tmp/OFED_topdir' --define 'dist ' --target x86_64 --define '_name mvapich2_gcc' --define 'impl ofa' --define 'rdma_cm 0' --define 'blcr_home /usr/ofed' --define 'ckpt 1' --define 'open_ib_home /usr' --define '_usr /usr' --define 'shared_libs 1' --define 'romio 1' --define 'comp_env CC=gcc CXX=g++ F77=gfortran F90=gfortran' --define 'auto_req 0' --define 'mpi_selector /usr/bin/mpi-selector' --define '_prefix /usr/mpi/gcc/mvapich2-1.0.3' --define 'ofa_build 0' /local/OFED-1.3.1/SRPMS/mvapich2-1.0.3-1.src.rpm
Failed to build mvapich2 RPM
See /tmp/OFED.4596.logs/mvapich2.rpmbuild.log

checking for C compiler default output file name... configure: error: C compiler cannot create executables

Failure in configuration.
Please file an error report to mvapich-discuss@cse.ohio-state.edu with all your log files.
error: Bad exit status from /var/tmp/rpm-tmp.75105 (%install)

Any help would be greatly appreciated

Thanks,
Lenny Neil
Sudbury Site Support Group Lead
SRP Computer Systems Lead
Raytheon Integrated Defense Systems
SSC - Sudbury, MA
Work 978-440-2876
Cell 978-423-5120
Pager 978-245-1941

From bboas at systemfabricworks.com Wed Jun 4 08:02:21 2008
From: bboas at systemfabricworks.com (Bill Boas)
Date: Wed Jun 4 09:23:16 2008
Subject: [mvapich-discuss] RE: problems with OFED installation with MVAPICH2 RPM
In-Reply-To:
References:
Message-ID: <003801c8c63a$d87a0fa0$020da8c0@YOURCB10AA3FFD>

Lenny,

The membership mail-list at OFA comes to me, amongst others.

The mail list on which you will get responses to your questions is the Enterprise Working Group as they plan, create, test and release OFED on behalf of the Open Fabrics Alliance (OFA).

Let me know if there are other issues or questions that you are aware of.

Thanks.

Bill.
Bill Boas
VP, Business Development
Vice Chair, OFA
System Fabric Works
510-375-8840
bboas@systemfabricworks.com
www.systemfabricworks.com

From kus at free.net Wed Jun 4 12:15:50 2008
From: kus at free.net (Mikhail Kuzminsky)
Date: Wed Jun 4 12:16:09 2008
Subject: [mvapich-discuss] Re: how to set "ulimit -l unlimited" at user level?
In-Reply-To: <200806041351.m54Dp0oa029739@cse.ohio-state.edu>
Message-ID:

I don't know about Rocks-5 itself, but in general you should add 2 lines to the /etc/security/limits.conf file:

* soft memlock unlimited
* hard memlock unlimited

But for some older Linux kernels you should recompile the kernel after changing the RLIMIT_MEMLOCK value in /include/linux/resource.h :-(

Mikhail Kuzminsky,
Computer Assistance to Chemical Research Center
Zelinsky Institute of Organic Chemistry Russ. Ac. Sci.
Moscow

From gopalakk at cse.ohio-state.edu Wed Jun 4 17:13:13 2008
From: gopalakk at cse.ohio-state.edu (Karthik Gopalakrishnan)
Date: Wed Jun 4 17:13:31 2008
Subject: [mvapich-discuss] problems with OFED installation with MVAPICH2 RPM
In-Reply-To:
References:
Message-ID: <92eddfb50806041413g138dadf6vac87b1d4db38c44e@mail.gmail.com>

Hi Lenny.

I see that you are trying to compile MVAPICH2 with Checkpoint / Restart support. This requires you to install BLCR and set the "BLCR_HOME" environment variable to the path of the BLCR installation. Please refer to http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2.html#x1-290006.3 for more information.

However, the error message indicates that your C compiler is unable to generate a.out's.
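
The configure failure above can be reproduced outside rpmbuild by compiling and running a trivial C program. A minimal sketch, assuming Python 3.7+ and a compiler reachable on PATH as gcc; the helper name compiler_can_build is hypothetical and is not part of OFED or MVAPICH2:

```python
import os
import shutil
import subprocess
import tempfile

# Trivial C source; the raw string keeps "\n" as a C escape sequence.
HELLO_C = r'''#include <stdio.h>
int main(void) { printf("Hello World\n"); return 0; }
'''


def compiler_can_build(cc="gcc"):
    """Return True if `cc` compiles a trivial C program and the result runs."""
    if shutil.which(cc) is None:
        return False  # compiler is not on PATH at all
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hello.c")
        exe = os.path.join(tmp, "hello")
        with open(src, "w") as f:
            f.write(HELLO_C)
        try:
            build = subprocess.run([cc, src, "-o", exe],
                                   capture_output=True, text=True)
            if build.returncode != 0:
                print(build.stderr)  # show why the compiler gave up
                return False
            run = subprocess.run([exe], capture_output=True, text=True)
        except OSError:
            return False  # e.g. the temporary directory is mounted noexec
        return run.returncode == 0 and run.stdout.strip() == "Hello World"


if __name__ == "__main__":
    ok = compiler_can_build("gcc")
    print("gcc OK" if ok else "gcc cannot create executables")
```

If this reports a failure, the compiler's own error output is usually more informative than the configure summary, and the problem lies in the build environment rather than in the MVAPICH2 source RPM.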
Can you please try to compile a simple "Hello World" program to ensure that you have a sane build environment.

Regards,
Karthik

On Tue, Jun 3, 2008 at 10:44 PM, Lenny Neil wrote:
>
> Good evening
>
> The installation of OFED 1.3 and 1.3.1 - I have been running into some
> issues installing on RHEL 5.1 kernel-2.6.18-53.el5.*64. I'm also seeing
> installation failures with OFED with the latest RHEL 5.1 kernel-2.6.18-92,
> which your documentation says it's not covered as of yet.
>
> Do you know when you'll have a updated OFED package that will install on the
> latest RHEL 5.1 kernel?
>
> Any ideas why we are seeing the error below? The RHEL 5.1 system running
> kernel-2.6.18-53 has gcc and gcc++ installed with the development package.
>
> Here is the error
>
> Building the MVAPICH2 RPM in the OFA configuration. Please wait...
> Running rpmbuild --rebuild --define '_topdir /var/tmp/OFED_topdir' --define
> 'dist ' --target x86_64 --define '_name mvapich2_gcc' --define 'impl ofa'
> --define 'rdma_cm 0' --define 'blcr_home /usr/ofed' --define 'ckpt 1'
> --define 'open_ib_home /usr' --define '_usr /usr' --define 'shared_libs 1'
> --define 'romio 1' --define 'comp_env CC=gcc CXX=g++ F77=gfortran
> F90=gfortran' --define 'auto_req 0' --define 'mpi_selector
> /usr/bin/mpi-selector' --define '_prefix /usr/mpi/gcc/mvapich2-1.0.3'
> --define 'ofa_build 0' /local/OFED-1.3.1/SRPMS/mvapich2-1.0.3-1.src.rpm
> Failed to build mvapich2 RPM
> See /tmp/OFED.4596.logs/mvapich2.rpmbuild.log
>
> checking for C compiler default output file name... configure: error: C
> compiler cannot create executables
>
> Failure in configuration.
> Please file an error report to mvapich-discuss@cse.ohio-state.edu with all
> your log files.
> error: Bad exit status from /var/tmp/rpm-tmp.75105 (%install)
>
> Any help would be greatly appreciated
>
> Thanks,
> Lenny Neil
> Sudbury Site Support Group Lead
> SRP Computer Systems Lead
> Raytheon Integrated Defense Systems
> SSC - Sudbury, MA
> Work 978-440-2876
> Cell 978-423-5120
> Pager 978-245-1941
>

From rafaarco at ugr.es Thu Jun 5 04:48:20 2008
From: rafaarco at ugr.es (Rafael Arco Arredondo)
Date: Thu Jun 5 04:48:40 2008
Subject: [mvapich-discuss] mpiexec: unable to post a write of the barrier command
Message-ID: <1212655700.3878.13.camel@boabdilmec.ugr.es>

Hello,

We are having some issues with MVAPICH2-1.0.2 and the OFA drivers for InfiniBand. The compilation process of MVAPICH2 ends successfully, and the applications compile with (apparently) no problems with mpicc. However, mpiexec fails when programs are executed on more than one computer. Particularly, MPI_Finalize reports an error which comes from MPIDI_CH3I_RMDA_finalize. The error occurs both for OFA-Gen2 and uDAPL. We are using daemonless smpd

Here is the command executed and its output:
mpiexec -rsh -nopm -n 10 -machinefile ./hosts /home/rafaarco/mmul

Task 0 of 10
Task 3 of 10
mpi_matrix_mult_slave()
Task 4 of 10
mpi_matrix_mult_slave()
Task 5 of 10
mpi_matrix_mult_slave()
Task 7 of 10
mpi_matrix_mult_slave()
Task 8 of 10
mpi_matrix_mult_slave()
Task 9 of 10
mpi_matrix_mult_slave()
Task 1 of 10
mpi_matrix_mult_slave()
Task 2 of 10
mpi_matrix_mult_slave()
Task 6 of 10
mpi_matrix_mult_slave()
mpi_matrix_mult_master()
Exiting task 1 of 10
Exiting task 2 of 10
Exiting task 3 of 10
Exiting task 4 of 10
Exiting task 5 of 10
Exiting task 6 of 10
Exiting task 7 of 10
Exiting task 8 of 10
Exiting task 9 of 10
Time: 3.258242
Exiting task 0 of 10
[0] unable to post a write of the barrier command.
[0] PMI_Barrier failed.
Fatal error in MPI_Finalize:
Other MPI error, error stack:
MPI_Finalize(234)............: MPI_Finalize failed
MPI_Finalize(154)............:
MPID_Finalize(132)...........:
MPIDI_CH3_Finalize(87).......: MPI_Finalize failed
MPIDI_CH3_Finalize(70).......:
MPIDI_CH3I_RMDA_finalize(736): PMI_Barrier returned -1

Any clues about what the problem may be?

Thanks in advance,

Rafa

--
Rafael Arco Arredondo
Centro de Servicios de Informática y Redes de Comunicaciones
Campus de Fuentenueva - Edificio Mecenas
Universidad de Granada

From Lenny_Neil at raytheon.com Wed Jun 4 08:09:00 2008
From: Lenny_Neil at raytheon.com (Lenny Neil)
Date: Thu Jun 5 19:23:41 2008
Subject: [mvapich-discuss] RE: problems with OFED installation with MVAPICH2 RPM
In-Reply-To: <003801c8c63a$d87a0fa0$020da8c0@YOURCB10AA3FFD>
Message-ID:

Good Morning Bill

Thank you for forwarding my email and for the fast response.

Thanks,
Lenny Neil
Sudbury Site Support Group Lead
SRP Computer Systems Lead
Raytheon Integrated Defense Systems
SSC - Sudbury, MA
Work 978-440-2876
Cell 978-423-5120
Pager 978-245-1941

From moshek at voltaire.com Thu Jun 5 01:46:13 2008
From: moshek at voltaire.com (Moshe Kazir)
Date: Thu Jun 5 19:23:42 2008
Subject: [mvapich-discuss] RE: [ewg] RE: problems with OFED installation with MVAPICH2 RPM
In-Reply-To:
References: <003801c8c63a$d87a0fa0$020da8c0@YOURCB10AA3FFD>
Message-ID: <39C75744D164D948A170E9792AF8E7CAC5AFBC@exil.voltaire.com>

OFED-1.3.1-rc2 was tested in Voltaire on RHEL5 U 1 and RHEL5 U 2 on x86_64 AMD

I tested using ./install.pl --all

everything passed with no errors.

Moshe
____________________________________________________________
Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m)
Voltaire - The Grid Backbone
www.voltaire.com

From Lenny_Neil at raytheon.com Thu Jun 5 08:17:18 2008
From: Lenny_Neil at raytheon.com (Lenny Neil)
Date: Thu Jun 5 19:23:43 2008
Subject: [mvapich-discuss] RE: [ewg] RE: problems with OFED installation with MVAPICH2 RPM
In-Reply-To: <39C75744D164D948A170E9792AF8E7CAC5AFBC@exil.voltaire.com>
Message-ID:

Good Morning Moshe

Which RHEL 5.1 kernel version are you running ?
Can you please check and let me know, Thanks in advance

Lenny

To find out do cat /var/log/rpmpkgs | grep kernel

Thanks,
Lenny Neil
Sudbury Site Support Group Lead
SRP Computer Systems Lead
Raytheon Integrated Defense Systems
SSC - Sudbury, MA
Work 978-440-2876
Cell 978-423-5120
Pager 978-245-1941

From Lenny_Neil at raytheon.com Thu Jun 5 12:04:54 2008
From: Lenny_Neil at raytheon.com (Lenny Neil)
Date: Thu Jun 5 19:23:43 2008
Subject: [mvapich-discuss] RE: [ewg] RE: problems with OFED installation with MVAPICH2 RPM
In-Reply-To: <39C75744D164D948A170E9792AF8E7CAC5AFBC@exil.voltaire.com>
Message-ID:

Good news --- installed OFED 1.3.1 on cancun9 successfully..

After seeing this email I tried installing OFED 1.3.1 using ./install.pl --all and not going through the installation menu's where I selected to install all through a menu selection.

You guessed it.. It installed. !!! even with the latest updates on the systems YEAH !!!

Sean please test cancun9 stuff.

Lenny

From panda at cse.ohio-state.edu Thu Jun 5 23:27:02 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Thu Jun 5 23:27:18 2008
Subject: [mvapich-discuss] Followup: mvapich2 issue regarding mpd timeout in mpiexec
In-Reply-To:
Message-ID:

Hi David,

Thanks for your note here.
Please feel free to use a higher value of 200 for larger clusters. We are exploring whether we can dynamically adjust this value based on the system size. We are also forwarding this note to MPICH2 folks.

FYI, in the upcoming MVAPICH2 release, we will be providing a non-MPD-based scalable startup mechanism (mpirun_rsh, similar to the one used in MVAPICH). This will help to launch MPI jobs on multi-thousand node clusters with very little overhead. The upcoming release will be available in a few weeks.

Thanks,

DK

On Thu, 29 May 2008 David_Kewley@Dell.com wrote:

> This is a followup to this thread:
>
> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2007-May/000834.html
>
> between Greg Bauer and Qi Gao.
>
> We had the same problem that Greg saw -- failure of mpiexec, with the
> characteristic error message "no msg recvd from mpd when expecting ack
> of request". It was resolved for us by setting recvTimeout in
> mpiexec.py to a higher value, just as Greg suggested and Qi concurred.
> The default value is 20; we chose 200 (we did not experiment with values
> between these two, so lower may work in many cases).
>
> I think this change should be made permanent in MVAPICH2. I do not
> think it will negatively impact anyone, because in the four cases where
> this timeout is used, if the timeout expires mpiexec immediately makes
> an error exit anyway. So the worst consequence is that mpiexec would
> take longer to fail (3 minutes longer if 200 is used instead of 20).
> The user who encounters this timeout has to fix the root cause of the
> timeout in order to get any work done, so they are not likely to
> encounter it repeatedly and thereby lose lots of runtime simply because
> the timeout is large. Is this analysis correct?
>
> Meanwhile, this change would clearly help at least some people with
> large clusters. We see failure with the default recvTimeout between 900
> and 1000 processes; larger recvTimeout allows us to scale to 3000
> processes and beyond.
>
> The default setting does not cause failure if I make a simple, direct
> call to mpiexec. I only see it when I use mpirun.lsf to launch a large
> job. I think the failure in the LSF case is due to the longer time it
> presumably takes to launch LSF's TaskStarter for every process, etc.
> The time required seems to be O(#processes) in the LSF case. (We have
> LSF 6.2, with a local custom wrapper script for TaskStarter).
>
> If you agree that this change to the value of recvTimeout is OK, please
> implement this one-line change in MVAPICH2, and consider contributing it
> upstream to MPICH2 as well.
>
> If you decline to make this change, at least it's now on the web that
> this change does fix the problem. :)
>
> Thanks,
> David
>
> David Kewley
> Dell Infrastructure Consulting Services
> Onsite Engineer at the Maui HPC Center
> Cell: 602-460-7617
> David_Kewley@Dell.com
>
> Dell Services: http://www.dell.com/services/
> How am I doing? Email my manager Russell_Kelly@Dell.com with any
> feedback.
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From huanwei at cse.ohio-state.edu Thu Jun 5 23:36:15 2008
From: huanwei at cse.ohio-state.edu (wei huang)
Date: Thu Jun 5 23:36:32 2008
Subject: [mvapich-discuss] mpiexec: unable to post a write of the barrier command (fwd)
In-Reply-To:
Message-ID:

Hi Rafael,

smpd is not the default launcher for mpich2, on which our mvapich2 is based. Thus, there can be issues and instability with that. We strongly recommend using mpd-based startup. Is there any specific reason that you want to use daemonless startup?

FYI, we are working on a new mvapich2 release which will have our own daemonless startup support. It will be available in a couple of weeks.
Maybe you can use that once it is released. Thanks. -- Wei > ---------- Forwarded message ---------- > Date: Thu, 05 Jun 2008 10:48:20 +0200 > From: Rafael Arco Arredondo > To: mvapich-discuss@cse.ohio-state.edu > Subject: [mvapich-discuss] mpiexec: unable to post a write of the barrier > command > > Hello, > > We are having some issues with MVAPICH2-1.0.2 and the OFA drivers for > InfiniBand. The compilation process of MVAPICH2 ends successfully, and > the applications compile with (apparently) no problems with mpicc. > However, mpiexec fails when programs are executed on more than one > computer. Particularly, MPI_Finalize reports an error which comes from > MPIDI_CH3I_RMDA_finalize. The error occurs both for OFA-Gen2 and uDAPL. > We are using daemonless smpd > > Here is the command executed and its output: > mpiexec -rsh -nopm -n 10 -machinefile ./hosts /home/rafaarco/mmul > > Task 0 of 10 > Task 3 of 10 > mpi_matrix_mult_slave() > Task 4 of 10 > mpi_matrix_mult_slave() > Task 5 of 10 > mpi_matrix_mult_slave() > Task 7 of 10 > mpi_matrix_mult_slave() > Task 8 of 10 > mpi_matrix_mult_slave() > Task 9 of 10 > mpi_matrix_mult_slave() > Task 1 of 10 > mpi_matrix_mult_slave() > Task 2 of 10 > mpi_matrix_mult_slave() > Task 6 of 10 > mpi_matrix_mult_slave() > mpi_matrix_mult_master() > Exiting task 1 of 10 > Exiting task 2 of 10 > Exiting task 3 of 10 > Exiting task 4 of 10 > Exiting task 5 of 10 > Exiting task 6 of 10 > Exiting task 7 of 10 > Exiting task 8 of 10 > Exiting task 9 of 10 > Time: 3.258242 > Exiting task 0 of 10 > [0] unable to post a write of the barrier command. > [0] PMI_Barrier failed. > Fatal error in MPI_Finalize: > Other MPI error, error stack: > MPI_Finalize(234)............: MPI_Finalize failed > MPI_Finalize(154)............: > MPID_Finalize(132)...........: > MPIDI_CH3_Finalize(87).......: MPI_Finalize failed > MPIDI_CH3_Finalize(70).......: > MPIDI_CH3I_RMDA_finalize(736): PMI_Barrier returned -1 > > Any clues about what the problem may be? 
> > Thanks in advance, > > Rafa > > -- > Rafael Arco Arredondo > Centro de Servicios de Informática y Redes de Comunicaciones > Campus de Fuentenueva - Edificio Mecenas > Universidad de Granada > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > From jbernstein at penguincomputing.com Fri Jun 6 16:59:34 2008 From: jbernstein at penguincomputing.com (Joshua Bernstein) Date: Fri Jun 6 16:59:50 2008 Subject: [mvapich-discuss] how to set "ulimit -l unlimited" at user level? In-Reply-To: <92eddfb50806031147p2a2e4303jbbca9ee3be397d52@mail.gmail.com> References: <000501c8c5a0$552163f0$080aa8c0@DJ85D2C1> <92eddfb50806031147p2a2e4303jbbca9ee3be397d52@mail.gmail.com> Message-ID: <4849A536.80606@penguincomputing.com> Also, If you are running AMBER with using SSH, you will want to add the ulimit -l command to your /etc/init.d/sshd startup script on the nodes. That way any proces forked by SSH on the compute node will inherit that setting and hence allow AMBER to run. -Joshua Bernstein Software Engineer Penguin Computing Karthik Gopalakrishnan wrote: > Try adding the "ulimit -c unlimited" line to /etc/profile first. > > Regards, > Karthik > > On Tue, Jun 3, 2008 at 1:36 PM, Divi Venkateswarlu wrote: >> Hello: >> >> I am running ROCKS-5 on two DP quad-core machines with mellanox IB HCA >> cards. >> I compiled mvapich with ifort without any problems. >> >> I am able to run at root level with NO problems. I could set ulimit -l >> unlimited to increase >> RLIMIT_MEMLOCK size. My program (PMEMD of AMBER package) runs on all >> 16-cores with no hiccups. >> >> When I try to set ulimit -l unlimited at user level, I get the >> following error message. >> >> -bash: ulimit: max locked memory: cannot modify limit: Operation not >> permitted. >> >> Can somebody help me how to fix this problem? 
I am running mvapich-1.0 >> >> Thanks a lot for your help >> >> Divi >> >> >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >> >> > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss From koop at cse.ohio-state.edu Fri Jun 6 22:38:58 2008 From: koop at cse.ohio-state.edu (Matthew Koop) Date: Fri Jun 6 22:39:15 2008 Subject: [mvapich-discuss] Bug: deadlock between ibv_destroy_srq and async_thread In-Reply-To: Message-ID: We've done some offline discussion and the patch appears to have resolved the issue. This is now checked into our SVN and will be in future MVAPICH2 releases. Matt On Wed, 28 May 2008 David_Kewley@Dell.com wrote: > Matt, > > Thank you very much. Glancing at the patch, looks good to me. I'll > apply and test it ASAP, should be able to report the results tomorrow or > Friday. > > David > > > -----Original Message----- > > From: Matthew Koop [mailto:koop@cse.ohio-state.edu] > > Sent: Wednesday, May 28, 2008 11:35 AM > > To: Kewley, David > > Cc: mvapich-discuss@cse.ohio-state.edu > > Subject: Re: [mvapich-discuss] Bug: deadlock between ibv_destroy_srq > and > > async_thread > > > > David, > > > > Can you try the attached patch and let us know if it solves your > problem? > > This adds some synchronization between the threads. > > > > Thanks, > > > > Matt > > > > On Fri, 23 May 2008 David_Kewley@Dell.com wrote: > > > > > I have a user running a 192-way job using MVAPICH2 1.0.1 and OFED > > > 1.2.5.5, > > > where MPI_Finalize() does not return. In the two example jobs I've > > > examined, > > > 189 processes exited, but the other three hung. The ranks that hung > > > were > > > different in the two examples, so I don't think the "3" is > significant. 
> > > > > > All processes I've looked at appear to be stuck in the same way. In > > > normal > > > running, each process has four threads. When the process gets > stuck, > > > only the > > > original thread remains. Here is a gdb backtrace from one: > > > > > > #0 0x00000036b2608b3a in pthread_cond_wait@@GLIBC_2.3.2 () from > > > /lib64/tls/libpthread.so.0 > > > #1 0x0000002a9595405b in ibv_cmd_destroy_srq (srq=0x82b370) at > > > src/cmd.c:582 > > > #2 0x0000002a962b5419 in mthca_destroy_srq (srq=0x82b3bc) at > > > src/verbs.c:475 > > > #3 0x0000002a9564878e in MPIDI_CH3I_CM_Finalize () from > > > /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so > > > #4 0x0000002a955c053b in MPIDI_CH3_Finalize () from > > > /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so > > > #5 0x0000002a95626202 in MPID_Finalize () from > > > /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so > > > #6 0x0000002a955f7fee in PMPI_Finalize () from > > > /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so > > > #7 0x0000002a955f7eae in pmpi_finalize_ () from > > > /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so > > > #8 0x0000000000459ff8 in stoprog_ () > > > #9 0x000000000047afa6 in MAIN__ () > > > #10 0x0000000000405d62 in main () > > > > > > After hours of opportunity to study the MVAPICH2 code :), I think I > > > tracked it > > > down to lines 1302-1306 in rdma_iba_init.c: > > > > > > if (MPIDI_CH3I_RDMA_Process.has_srq) { > > > > pthread_cancel(MPIDI_CH3I_RDMA_Process.async_thread[i]); > > > > pthread_join(MPIDI_CH3I_RDMA_Process.async_thread[i], > > > NULL); > > > > ibv_destroy_srq(MPIDI_CH3I_RDMA_Process.srq_hndl[i]); > > > } > > > > > > Consider what would happen if async_thread() were processing a > > > IBV_EVENT_SRQ_LIMIT_REACHED event when pthread_cancel() was called > on > > > async_thread(). async_thread() has already called > ibv_get_async_event() > > > for this event, but it has not yet called ibv_ack_async_event(). 
> The > > > result would be the observed deadlock in this part of > > > ibv_cmd_destroy_srq(): > > > > > > pthread_mutex_lock(&srq->mutex); > > > while (srq->events_completed != resp.events_reported) > > > pthread_cond_wait(&srq->cond, &srq->mutex); > > > pthread_mutex_unlock(&srq->mutex); > > > > > > That is, events_completed == events_reported-1 at this point. The > > > pthread_cond_signal() would be called, and events_completed could be > > > made > > > equal to events_reported, only by by calling ibv_ack_async_event() > on > > > this > > > event. But that will never happen because async_thread() is the > only > > > code > > > that would have done that, and it's already been pthread_cancel()'d > and > > > pthread_join()'d before ibv_destroy_srq() is called. > > > > > > I think the fix is to add some sort of synchronization between > > > async_thread() and the code that calls the pthread_cancel() on it. > To > > > the > > > MVAPICH developers: Do you think you can work up a fix soon, and > forward > > > the patch for testing? > > > > > > Thanks, > > > David > > > > > > > > > David Kewley > > > Dell Infrastructure Consulting Services > > > Onsite Engineer at the Maui HPC Center > > > Cell: 602-460-7617 > > > David_Kewley@Dell.com > > > > > > Dell Services: http://www.dell.com/services/ > > > How am I doing? Email my manager Russell_Kelly@Dell.com with any > > > feedback. 
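Editor's note: the deadlock described in the quoted report can be pictured with a small toy model. The sketch below is not the MVAPICH2 patch (the thread does not show what the patch contains) and it does not call libibverbs; the class `ToySrq` and the functions `async_thread`, `destroy_srq` and `shutdown_demo` are hypothetical names invented for illustration. It only shows the general idea of letting the event thread stop at a point where no event is outstanding, so that the destroy step never waits for an acknowledgement that cannot arrive.

```python
import threading


class ToySrq:
    """Toy stand-in for the SRQ event counters described in the report."""

    def __init__(self):
        self.cond = threading.Condition()
        self.events_reported = 0   # bumped when an event is fetched
        self.events_completed = 0  # bumped when that event is acknowledged


def async_thread(srq, stop, n_events):
    """Fetch and acknowledge events, honouring `stop` only between events."""
    for _ in range(n_events):
        if stop.is_set():
            return  # safe point: no event is outstanding here
        with srq.cond:
            srq.events_reported += 1    # models ibv_get_async_event()
        # ... event handling would happen here ...
        with srq.cond:
            srq.events_completed += 1   # models ibv_ack_async_event()
            srq.cond.notify_all()


def destroy_srq(srq, timeout=5.0):
    """Wait until every reported event is acknowledged; False on timeout."""
    with srq.cond:
        return srq.cond.wait_for(
            lambda: srq.events_completed == srq.events_reported, timeout
        )


def shutdown_demo(n_events=1000):
    """Stop the event thread cooperatively, then run the destroy step."""
    srq = ToySrq()
    stop = threading.Event()
    worker = threading.Thread(target=async_thread, args=(srq, stop, n_events))
    worker.start()
    stop.set()     # request shutdown instead of cancelling mid-event
    worker.join()
    return destroy_srq(srq), srq


if __name__ == "__main__":
    ok, srq = shutdown_demo()
    print("destroy completed:", ok)
```

If the worker were instead killed between the two counter updates, the counters would differ by one and `destroy_srq` would only return after its timeout, which is the toy equivalent of the hang in the backtrace.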
> > > > > > _______________________________________________ > > > mvapich-discuss mailing list > > > mvapich-discuss@cse.ohio-state.edu > > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > > > From christian.guggenberger at rzg.mpg.de Sat Jun 7 01:46:36 2008 From: christian.guggenberger at rzg.mpg.de (Christian Guggenberger) Date: Sat Jun 7 01:46:57 2008 Subject: [mvapich-discuss] Bug: deadlock between ibv_destroy_srq and async_thread In-Reply-To: References: Message-ID: <20080607054635.GA31970@daltons.rzg.mpg.de> On Fri, Jun 06, 2008 at 10:38:58PM -0400, Matthew Koop wrote: > > We've done some offline discussion and the patch appears to have resolved > the issue. This is now checked into our SVN and will be in future > MVAPICH2 releases. thanks for the feedback, Matthew. I know, there's a major mvapich2-release planned for the next weeks - will there also be some kind of bugfix release to the 1.0.? release series ? cheers. - Christian From divi at ncat.edu Fri Jun 6 19:03:53 2008 From: divi at ncat.edu (Divi Venkateswarlu) Date: Sat Jun 7 08:03:12 2008 Subject: [mvapich-discuss] Poor scale-up from 16 to 64 processors!! Message-ID: <005e01c8c829$9785c5c0$080aa8c0@DJ85D2C1> Hello all: I have just built a 64-core cluster and the following is my setup. 8 DP quad-core machines running ROCKS-5 and MVAPICH-1.0 I am using 8-port flextronic switch (it is SDR switch) and the cards are MHES18 (10 GB/sec HCA cards) I have the following questions. How do I know if my computation is using IB network or ethernetwork? I named each IB card "fast1 .... fast8. I created a host file with 8-copies of each of fast1, fast2.....fast8. The details of 8 nodes with IB config below (only two shown).. HOST SUBNET IFACE IP NETMASK NAME divilab: ibnet ib0 20.1.1.1 255.255.255.0 fast1 ................................................................................................................ 
................................................................................................................ compute-0-6: ibnet ib0 20.1.1.8 255.255.255.0 fast8 I do not see any scale-up from 16 to 32 to 64 processes. One benchmark of MD simulation (for one picosecond) of a protein (FIXa) is given below: The MD code is PMEMD/MVAPICH with IFORT/MKL compilation. # of CPUs/cores Time (sec) Nodes (load-balanced) 8 82 8 16 49 8 32 42 8 64 39 8 I am suspecting that I might have not set up something right or SDR switch/card limitations... definitely not happy with poor scale-up... I used all default values of make.mvapich.gen2 (with intel fortran 9.0). There seems too many options in this script. Not sure what most of them would do, therefore, just let the script run as such. Could somebody offer some help on how to fix/improve the scaling? Thanks a lot... Divi ----- Original Message ----- From: "Joshua Bernstein" To: "Karthik Gopalakrishnan" Cc: "Divi Venkateswarlu" ; Sent: Friday, June 06, 2008 4:59 PM Subject: Re: [mvapich-discuss] how to set "ulimit -l unlimited" at user level? > Also, > > If you are running AMBER with using SSH, you will want to add the > ulimit -l command to your /etc/init.d/sshd startup script on the nodes. > That way any proces forked by SSH on the compute node will inherit that > setting and hence allow AMBER to run. > > -Joshua Bernstein > Software Engineer > Penguin Computing > > Karthik Gopalakrishnan wrote: >> Try adding the "ulimit -c unlimited" line to /etc/profile first. >> >> Regards, >> Karthik >> >> On Tue, Jun 3, 2008 at 1:36 PM, Divi Venkateswarlu wrote: >>> Hello: >>> >>> I am running ROCKS-5 on two DP quad-core machines with mellanox IB >>> HCA >>> cards. >>> I compiled mvapich with ifort without any problems. >>> >>> I am able to run at root level with NO problems. I could set >>> ulimit -l >>> unlimited to increase >>> RLIMIT_MEMLOCK size. My program (PMEMD of AMBER package) runs on >>> all >>> 16-cores with no hiccups. 
>>> >>> When I try to set ulimit -l unlimited at user level, I get the >>> following error message. >>> >>> -bash: ulimit: max locked memory: cannot modify limit: Operation >>> not >>> permitted. >>> >>> Can somebody help me how to fix this problem? I am running >>> mvapich-1.0 >>> >>> Thanks a lot for your help >>> >>> Divi >>> >>> >>> _______________________________________________ >>> mvapich-discuss mailing list >>> mvapich-discuss@cse.ohio-state.edu >>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >>> >>> >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080606/04a339c4/attachment-0001.html From divi at ncat.edu Fri Jun 6 22:30:37 2008 From: divi at ncat.edu (Divi Venkateswarlu) Date: Sat Jun 7 08:03:13 2008 Subject: [mvapich-discuss] How to check MPI traffic going through Infiniband ports? Message-ID: <003301c8c846$789cd870$080aa8c0@DJ85D2C1> Hello all: I have just built a 64-core cluster and the following is my setup. 8 DP quad-core machines running ROCKS-5 and MVAPICH-1.0 I am using 8-port flextronic switch (it is SDR switch) and the cards are MHES18 (10 GB/sec HCA cards) I have the following questions. How do I know if my computation is using IB network or ethernetwork? I named each IB card "fast1 .... fast8. I created a host file with 8-copies of each of fast1, fast2.....fast8. The details of 8 nodes with IB config below (only two shown).. HOST SUBNET IFACE IP NETMASK NAME divilab: ibnet ib0 20.1.1.1 255.255.255.0 fast1 ................................................................................................................ ................................................................................................................ 
compute-0-6: ibnet ib0 20.1.1.8 255.255.255.0 fast8 I do not see any scale-up from 16 to 32 to 64 processes. One benchmark of MD simulation (for one picosecond) of a protein (FIXa) is given below: The MD code is PMEMD/MVAPICH with IFORT/MKL compilation. # of CPUs/cores Time (sec) Nodes (load-balanced) 8 82 8 16 49 8 32 42 8 64 39 8 I am suspecting that I might have not set up something right or SDR switch/card limitations... definitely not happy with poor scale-up... I used all default values of make.mvapich.gen2 (with intel fortran 9.0). There seems too many options in this script. Not sure what most of them would do, therefore, just let the script run as such. Could somebody offer some help on how to fix/improve the scaling? Thanks a lot... Divi -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080606/fd8a350a/attachment.html From Hung-Sheng.Tsao at Sun.COM Fri Jun 6 19:13:05 2008 From: Hung-Sheng.Tsao at Sun.COM (Dr. Hung-Sheng Tsao(LaoTsao)) Date: Sat Jun 7 08:03:55 2008 Subject: [mvapich-discuss] Re: [Rocks-Discuss] Poor scale-up from 16 to 64 processors!! In-Reply-To: <005e01c8c829$9785c5c0$080aa8c0@DJ85D2C1> References: <005e01c8c829$9785c5c0$080aa8c0@DJ85D2C1> Message-ID: <4849C481.8040002@sun.com> hi Is this intel or amd QC? Divi Venkateswarlu wrote: > Hello all: > > I have just built a 64-core cluster and the following is my setup. > > 8 DP quad-core machines running ROCKS-5 and MVAPICH-1.0 > I am using 8-port flextronic switch (it is SDR switch) and the cards > are MHES18 (10 GB/sec HCA cards) > > I have the following questions. > > How do I know if my computation is using IB network or ethernetwork? > I named each IB card "fast1 .... fast8. > I created a host file with 8-copies of each of fast1, fast2.....fast8. > > The details of 8 nodes with IB config below (only two shown).. 
> > HOST SUBNET IFACE IP NETMASK NAME > divilab: ibnet ib0 20.1.1.1 255.255.255.0 fast1 > ................................................................................................................ > ................................................................................................................ > compute-0-6: ibnet ib0 20.1.1.8 255.255.255.0 fast8 > > I do not see any scale-up from 16 to 32 to 64 processes. > > One benchmark of MD simulation (for one picosecond) of a protein (FIXa) is > given below: > > The MD code is PMEMD/MVAPICH with IFORT/MKL compilation. > > # of CPUs/cores Time (sec) Nodes (load-balanced) > 8 82 8 > 16 49 8 > 32 42 8 > 64 39 8 > > I am suspecting that I might have not set up something right or SDR switch/card limitations... > definitely not happy with poor scale-up... > > I used all default values of make.mvapich.gen2 (with intel fortran 9.0). > There seems too many options in this script. Not sure what most of them would do, therefore, just > let the script run as such. > > Could somebody offer some help on how to fix/improve the scaling? > > Thanks a lot... > Divi > > ----- Original Message ----- > From: "Joshua Bernstein" > To: "Karthik Gopalakrishnan" > Cc: "Divi Venkateswarlu" ; > > Sent: Friday, June 06, 2008 4:59 PM > Subject: Re: [mvapich-discuss] how to set "ulimit -l unlimited" at user > level? > > > >> Also, >> >> If you are running AMBER with using SSH, you will want to add the >> ulimit -l command to your /etc/init.d/sshd startup script on the nodes. >> That way any proces forked by SSH on the compute node will inherit that >> setting and hence allow AMBER to run. >> >> -Joshua Bernstein >> Software Engineer >> Penguin Computing >> >> Karthik Gopalakrishnan wrote: >> >>> Try adding the "ulimit -c unlimited" line to /etc/profile first. 
>>> >>> Regards, >>> Karthik >>> >>> On Tue, Jun 3, 2008 at 1:36 PM, Divi Venkateswarlu wrote: >>> >>>> Hello: >>>> >>>> I am running ROCKS-5 on two DP quad-core machines with mellanox IB >>>> HCA >>>> cards. >>>> I compiled mvapich with ifort without any problems. >>>> >>>> I am able to run at root level with NO problems. I could set >>>> ulimit -l >>>> unlimited to increase >>>> RLIMIT_MEMLOCK size. My program (PMEMD of AMBER package) runs on >>>> all >>>> 16-cores with no hiccups. >>>> >>>> When I try to set ulimit -l unlimited at user level, I get the >>>> following error message. >>>> >>>> -bash: ulimit: max locked memory: cannot modify limit: Operation >>>> not >>>> permitted. >>>> >>>> Can somebody help me how to fix this problem? I am running >>>> mvapich-1.0 >>>> >>>> Thanks a lot for your help >>>> >>>> Divi >>>> >>>> >>>> _______________________________________________ >>>> mvapich-discuss mailing list >>>> mvapich-discuss@cse.ohio-state.edu >>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >>>> >>>> >>>> >>> _______________________________________________ >>> mvapich-discuss mailing list >>> mvapich-discuss@cse.ohio-state.edu >>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >>> > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20080606/04a339c4/attachment.html > -- Hung-Sheng Tsao, Ph.D. (LaoTsao) Sr. System Engineer US, GEH East Data Center Ambassador 400 Atrium Dr, 1ST FLOOR P/F:1877 319 0460 (x67079) Somerset, NJ 08873 C: 973 495 0840 http://blogs.sun.com/hstsao/ E:Hung-Sheng.Tsao@sun.com ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ NOTICE: This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. 
If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From panda at cse.ohio-state.edu Sat Jun 7 08:09:38 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Sat Jun 7 08:09:55 2008
Subject: [mvapich-discuss] Bug: deadlock between ibv_destroy_srq and async_thread
In-Reply-To: <20080607054635.GA31970@daltons.rzg.mpg.de>
Message-ID:

> > We've done some offline discussion and the patch appears to have resolved
> > the issue. This is now checked into our SVN and will be in future
> > MVAPICH2 releases.
>
> thanks for the feedback, Matthew.
>
> I know, there's a major mvapich2-release planned for the next weeks -
> will there also be some kind of bugfix release to the 1.0.? release
> series ?

Yes, a bugfix release (MVAPICH2 1.0.3) is planned for the coming week.

Thanks,

DK

> cheers.
> - Christian
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From vera_wx_cn at yahoo.com.cn Sat Jun 7 08:42:17 2008
From: vera_wx_cn at yahoo.com.cn (=?gb2312?q?=C7=BF=20=C2=ED?=)
Date: Sat Jun 7 08:42:38 2008
Subject: [mvapich-discuss] How far can mvapich go?
Message-ID: <771908.10094.qm@web15312.mail.cnb.yahoo.com>

How many tasks can mvapich-1.0 support on a job, with MPD, on-demand or without MPD?
Why doesn't slurm support the mvapich with MPD?
How many connections can a task keep with others if based on IBCM?
How far can mvapich go?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080607/d28dec16/attachment.html

From divi at ncat.edu Sat Jun 7 09:02:12 2008
From: divi at ncat.edu (Divi Venkateswarlu)
Date: Sat Jun 7 09:02:30 2008
Subject: [mvapich-discuss] How to check if MVAPICH is using IB network but not ethernetwork?
Message-ID: <007e01c8c89e$b3b32830$080aa8c0@DJ85D2C1>

Hello all:

Good morning!
I set up a 64-core cluster based on ROCKS-5.0 using eight Dell PE2900 boxes.
All are dual-processor QC machines.

Compiled MVAPICH-1.0 (using Intel compiler) with default parameters in make.mvapich.gen2.
IB stack is OFED-1.2.5.5.

My MD program (PMEMD/AMBER) is compiled with no errors with IFORT/MKL libraries and I could run the code on all 64 cores, but the scaling from 16 to 32 to 64 is terrible. I am enclosing the benchmarks on a test run.

# of CPUs/cores   Time (sec)   Nodes (load-balanced)   Scaling (%)
        8             82                8                  100
       16             49                8                   84
       32             42                8                   49
       64             39                8                   26

In contrast, on a single box, I get a reasonable scaling.

# cores   time (sec)
   2      284 (100%)
   4      164 (87%)
   8      107 (65%)

For some reason, I suspect, MPI traffic is not going over IB net.

MVAPICH is built using make.mvapich.gen2 with F77=ifort and CC=gcc

mpif77 -link_info is:

/state/partition1/fc91052/bin/ifort -L/usr/local/ofed/lib64 -L/usr/local/mvapich/lib -lmpich -L/usr/local/ofed/lib64 -Wl,-rpath=/usr/local/ofed/lib64 -libverbs -libumad -lpthread -lpthread -lrt

How can I be sure that MPI traffic is going through IB network rather than ethernet?
Are there any specific checks I should perform?

Thanks a lot for your help.

Divi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080607/0ed904bc/attachment-0001.html

From divi at ncat.edu Sat Jun 7 09:07:12 2008
From: divi at ncat.edu (Divi Venkateswarlu)
Date: Sat Jun 7 09:07:30 2008
Subject: [mvapich-discuss] Re: How to check if MVAPICH is using IB network but not ethernetwork?
References:
Message-ID: <00b801c8c89f$66969900$080aa8c0@DJ85D2C1>

one more to add:

My IB hardware
  8-port Flextronics SDR switch
  MHES18 Mellanox HCA cards

ibchecknet shows the following

[root@divilab bin]# ibchecknet
# Checking Ca: nodeguid 0x0002c90200244228
# Checking Ca: nodeguid 0x0002c90200244230
# Checking Ca: nodeguid 0x0002c902002740fc
# Checking Ca: nodeguid 0x0002c902002441a4
# Checking Ca: nodeguid 0x0002c902002441c4
# Checking Ca: nodeguid 0x0002c9020024422c
# Checking Ca: nodeguid 0x0002c902002441ac
# Checking Ca: nodeguid 0x0002c9020024418c
## Summary: 9 nodes checked, 0 bad nodes found
## 16 ports checked, 0 bad ports found
## 0 ports have errors beyond threshold

----- Original Message -----
From: Divi Venkateswarlu
To: mvapich-discuss@cse.ohio-state.edu
Sent: Saturday, June 07, 2008 9:02 AM
Subject: How to check if MVAPICH is using IB network but not ethernetwork?

Hello all:

Good morning!
I set up a 64-core cluster based on ROCKS-5.0 using eight Dell PE2900 boxes.
All are dual-processor QC machines.

Compiled MVAPICH-1.0 (using Intel compiler) with default parameters in make.mvapich.gen2.
IB stack is OFED-1.2.5.5.

My MD program (PMEMD/AMBER) is compiled with no errors with IFORT/MKL libraries and I could run the code on all 64 cores, but the scaling from 16 to 32 to 64 is terrible. I am enclosing the benchmarks on a test run.

# of CPUs/cores   Time (sec)   Nodes (load-balanced)   Scaling (%)
        8             82                8                  100
       16             49                8                   84
       32             42                8                   49
       64             39                8                   26

In contrast, on a single box, I get a reasonable scaling.

# cores   time (sec)
   2      284 (100%)
   4      164 (87%)
   8      107 (65%)

For some reason, I suspect, MPI traffic is not going over IB net.
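Editor's note: the "Scaling (%)" figures quoted above look like parallel efficiency relative to the smallest run in each table, that is, (T_base x cores_base) / (T_n x cores_n). That reading is an assumption, since the message does not say how the percentages were computed. A minimal Python sketch under that assumption, using only the timings from the message (the function names `efficiency` and `report` are made up for this note):

```python
def efficiency(baseline, run):
    """Parallel efficiency (%) of `run` relative to `baseline`.

    Each argument is a (cores, seconds) pair.
    """
    base_cores, base_time = baseline
    cores, seconds = run
    speedup = base_time / seconds
    return 100.0 * speedup * base_cores / cores


def report(runs):
    """Efficiency of every run relative to the first entry in `runs`."""
    return [(cores, round(efficiency(runs[0], (cores, secs)), 1))
            for cores, secs in runs]


# Timings copied from the message: (cores, wall-clock seconds).
cluster = [(8, 82), (16, 49), (32, 42), (64, 39)]
single_box = [(2, 284), (4, 164), (8, 107)]

if __name__ == "__main__":
    for label, runs in (("cluster", cluster), ("single box", single_box)):
        for cores, pct in report(runs):
            print(f"{label:10s} {cores:3d} cores  {pct:5.1f} %")
```

Under this definition the cluster rows round to the 84, 49 and 26 percent given in the message; the single-box 8-core row comes out about one point higher than the 65% quoted, so the original may have been rounded differently.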
MVAPICH is built using make.mvapich.gen2 with F77=ifort and CC=gcc mpif77 -link_info is: /state/partition1/fc91052/bin/ifort -L/usr/local/ofed/lib64 -L/usr/local/mvapich/lib -lmpich -L/usr/local/ofed/lib64 -Wl,-rpath=/usr/local/ofed/lib64 -libverbs -libumad -lpthread -lpthread -lrt How can I be sure that MPI traffic is going through IB network rather than ethernet? Are there any specific checks I should perform? Thanks a lot for your help. Divi -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080607/a86b1dc3/attachment.html From panda at cse.ohio-state.edu Sat Jun 7 09:40:32 2008 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Sat Jun 7 09:40:49 2008 Subject: [mvapich-discuss] How far can mvapich go? In-Reply-To: <771908.10094.qm@web15312.mail.cnb.yahoo.com> Message-ID: > How many tasks can mvapich-1.0 support on a job, with MPD, on-demand > or without MPD? A new scalable and high performance start-up procedure (mpirun_rsh) has been added to mvapich-1.0. Several installations have used this start-up procedure to run `tens of thousands' of tasks. You should use this start-up procedure and should not see any limitation on very large-scale systems. > Why does't slurm support the mvapich with MPD? There is a support for slurm with mvapich. Several production installations also use mvapich with slurm. Please refer to the section 5.3 in mvapich 1.0 user guide about how to use this support: http://mvapich.cse.ohio-state.edu/support/mvapich_user_guide.html#x1-220005.3 > How many connections can a task keep with others if based on IBCM? RDMA_CM is the latest connection management with the Open Fabrics stack. If you are interested to use this connection management, you should use MVAPICH2 (instead of MVAPICH). MVAPICH2 has more advanced and latest features than MVAPICH. > How far can mvapich go? 
As indicated above, applications are being run with mvapich on tens of thousands of cores.

Hope this helps.

DK

From divi at ncat.edu Sat Jun 7 09:42:24 2008
From: divi at ncat.edu (Divi Venkateswarlu)
Date: Sat Jun 7 09:42:44 2008
Subject: [mvapich-discuss] Re: How to check if MVAPICH is using IB networkbut not ethernetwork?
References: <00b801c8c89f$66969900$080aa8c0@DJ85D2C1>
Message-ID: <010e01c8c8a4$512d63a0$080aa8c0@DJ85D2C1>

One more to add:

OSU_BENCHMARKS are as follows:

[root@divilab osu_benchmarks]# mpirun_rsh -np 64 -hostfile ./host1 osu_alltoall
# OSU MPI All-to-All Personalized Exchange Latency Test v3.0
# Size          Latency (us)
1                     651.27
2                     650.83
4                     647.61
8                     669.23
16                    658.48
32                    652.27
64                    663.42
128                   698.62
256                   795.00
512                  2298.14
1024                 2894.17
2048                 4510.67
4096                 8636.66
8192                18353.73
16384               34942.05
32768               43737.38
65536               71297.06
131072             138771.72
262144             273233.08
524288             543174.56
1048576           1086598.00

Are these as expected?

----- Original Message -----
From: Divi Venkateswarlu
To: mvapich-discuss@cse.ohio-state.edu
Sent: Saturday, June 07, 2008 9:07 AM
Subject: [mvapich-discuss] Re: How to check if MVAPICH is using IB networkbut not ethernetwork?

one more to add:

My IB hardware
  8-port Flextronics SDR switch
  MHES18 Mellanox HCA cards

ibchecknet shows the following

[root@divilab bin]# ibchecknet
# Checking Ca: nodeguid 0x0002c90200244228
# Checking Ca: nodeguid 0x0002c90200244230
# Checking Ca: nodeguid 0x0002c902002740fc
# Checking Ca: nodeguid 0x0002c902002441a4
# Checking Ca: nodeguid 0x0002c902002441c4
# Checking Ca: nodeguid 0x0002c9020024422c
# Checking Ca: nodeguid 0x0002c902002441ac
# Checking Ca: nodeguid 0x0002c9020024418c
## Summary: 9 nodes checked, 0 bad nodes found
## 16 ports checked, 0 bad ports found
## 0 ports have errors beyond threshold

----- Original Message -----
From: Divi Venkateswarlu
To: mvapich-discuss@cse.ohio-state.edu
Sent: Saturday, June 07, 2008 9:02 AM
Subject: How to check if MVAPICH is using IB network but not ethernetwork?

Hello all:

Good morning!
I set up a 64-core cluster based on ROCKS-5.0 using eight Dell PE2900 boxes.
All are dual-processor QC machines.

Compiled MVAPICH-1.0 (using Intel compiler) with default parameters in make.mvapich.gen2.
IB stack is OFED-1.2.5.5.

My MD program (PMEMD/AMBER) is compiled with no errors with IFORT/MKL libraries and I could run the code on all 64 cores, but the scaling from 16 to 32 to 64 is terrible. I am enclosing the benchmarks on a test run.

# of CPUs/cores   Time (sec)   Nodes (load-balanced)   Scaling (%)
        8             82                8                  100
       16             49                8                   84
       32             42                8                   49
       64             39                8                   26

In contrast, on a single box, I get a reasonable scaling.

# cores   time (sec)
   2      284 (100%)
   4      164 (87%)
   8      107 (65%)

For some reason, I suspect, MPI traffic is not going over IB net.

MVAPICH is built using make.mvapich.gen2 with F77=ifort and CC=gcc

mpif77 -link_info is:

/state/partition1/fc91052/bin/ifort -L/usr/local/ofed/lib64 -L/usr/local/mvapich/lib -lmpich -L/usr/local/ofed/lib64 -Wl,-rpath=/usr/local/ofed/lib64 -libverbs -libumad -lpthread -lpthread -lrt

How can I be sure that MPI traffic is going through IB network rather than ethernet?
Are there any specific checks I should perform?
Thanks a lot for your help.

Divi

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080607/4d5c9074/attachment-0001.html

From panda at cse.ohio-state.edu Sat Jun 7 13:59:14 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Sat Jun 7 13:59:29 2008
Subject: [mvapich-discuss] Re: How to check if MVAPICH is using IB networkbut not ethernetwork?
In-Reply-To: <010e01c8c8a4$512d63a0$080aa8c0@DJ85D2C1>
Message-ID:

All-to-all is a much more complex operation. These numbers will not quickly
tell you whether your configuration is correct.

Instead, run the osu_latency, osu_bw and osu_bibw tests between two processes
(across nodes and within a node). Then compare your inter-node and intra-node
performance numbers with the numbers and graphs available from the performance
page of the mvapich web site for different platforms, IB cards and devices.

http://mvapich.cse.ohio-state.edu/performance/

If your numbers match or come close, you can be confident that MVAPICH is
using the IB network on your system. Otherwise, there could be some issues in
your installation.
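A minimal sketch of the two-process runs suggested above. It only prints the commands so they can be reviewed before running. The host names node1 and node2 are placeholders, the benchmarks are assumed to sit in the current directory as in the earlier osu_alltoall run, and the helper name `p2p_commands` is made up for illustration. Hosts are given directly on the mpirun_rsh command line instead of through -hostfile.

```shell
# Print (not run) the point-to-point benchmark commands for a pair of hosts.
# usage: p2p_commands HOST_A HOST_B
p2p_commands() {
    for bench in osu_latency osu_bw osu_bibw; do
        echo "mpirun_rsh -np 2 $1 $2 ./$bench"
    done
}

# Inter-node: two different hosts (placeholder names).
p2p_commands node1 node2

# Intra-node: the same host listed twice.
p2p_commands node1 node1
```

Comparing the inter-node results with the published numbers for similar hardware is what indicates whether traffic is on IB; the intra-node run gives a shared-memory reference point.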
DK

From rafaarco at ugr.es Mon Jun 9 03:01:26 2008
From: rafaarco at ugr.es (Rafael Arco Arredondo)
Date: Mon Jun 9 03:01:48 2008
Subject: [mvapich-discuss] mpiexec: unable to post a write of the barriercommand (fwd)
In-Reply-To:
References:
Message-ID: <1212994886.3899.7.camel@boabdilmec.ugr.es>

Hi Wei,

Thanks for your reply. We'll try the following release when it's
available.

We have to use smpd in our scenario. We use Sun Grid Engine to launch
the MPI processes, and so far it doesn't support mpd.
Best regards,

Rafa

On Thu, 05-06-2008 at 23:36 -0400, wei huang wrote:
> Hi Rafael,
>
> smpd is not the default launcher for mpich2, on which our mvapich2 is
> based. Thus, there can be issues and instability with that. We strongly
> recommend using mpd based startup.
>
> Is there any specific reason that you want to use daemonless startup?
>
> FYI, we are working on a new mvapich2 release which will have our own
> daemonless startup support. It will be available in a couple of weeks.
> Maybe you can use that once it is released.
>
> Thanks.
>
> -- Wei
>
> > ---------- Forwarded message ----------
> > Date: Thu, 05 Jun 2008 10:48:20 +0200
> > From: Rafael Arco Arredondo
> > To: mvapich-discuss@cse.ohio-state.edu
> > Subject: [mvapich-discuss] mpiexec: unable to post a write of the barrier
> >     command
> >
> > Hello,
> >
> > We are having some issues with MVAPICH2-1.0.2 and the OFA drivers for
> > InfiniBand. The compilation process of MVAPICH2 ends successfully, and
> > the applications compile with (apparently) no problems with mpicc.
> > However, mpiexec fails when programs are executed on more than one
> > computer. Particularly, MPI_Finalize reports an error which comes from
> > MPIDI_CH3I_RMDA_finalize. The error occurs both for OFA-Gen2 and uDAPL.
> > We are using daemonless smpd
> >
> > Here is the command executed and its output:
> > mpiexec -rsh -nopm -n 10 -machinefile ./hosts /home/rafaarco/mmul
> >
> > Task 0 of 10
> > Task 3 of 10
> > mpi_matrix_mult_slave()
> > Task 4 of 10
> > mpi_matrix_mult_slave()
> > Task 5 of 10
> > mpi_matrix_mult_slave()
> > Task 7 of 10
> > mpi_matrix_mult_slave()
> > Task 8 of 10
> > mpi_matrix_mult_slave()
> > Task 9 of 10
> > mpi_matrix_mult_slave()
> > Task 1 of 10
> > mpi_matrix_mult_slave()
> > Task 2 of 10
> > mpi_matrix_mult_slave()
> > Task 6 of 10
> > mpi_matrix_mult_slave()
> > mpi_matrix_mult_master()
> > Exiting task 1 of 10
> > Exiting task 2 of 10
> > Exiting task 3 of 10
> > Exiting task 4 of 10
> > Exiting task 5 of 10
> > Exiting task 6 of 10
> > Exiting task 7 of 10
> > Exiting task 8 of 10
> > Exiting task 9 of 10
> > Time: 3.258242
> > Exiting task 0 of 10
> > [0] unable to post a write of the barrier command.
> > [0] PMI_Barrier failed.
> > Fatal error in MPI_Finalize:
> > Other MPI error, error stack:
> > MPI_Finalize(234)............: MPI_Finalize failed
> > MPI_Finalize(154)............:
> > MPID_Finalize(132)...........:
> > MPIDI_CH3_Finalize(87).......: MPI_Finalize failed
> > MPIDI_CH3_Finalize(70).......:
> > MPIDI_CH3I_RMDA_finalize(736): PMI_Barrier returned -1
> >
> > Any clues about what the problem may be?
> >
> > Thanks in advance,
> >
> > Rafa
> >
> > --
> > Rafael Arco Arredondo
> > Centro de Servicios de Informática y Redes de Comunicaciones
> > Campus de Fuentenueva - Edificio Mecenas
> > Universidad de Granada

-- 
Rafael Arco Arredondo
Centro de Servicios de Informática y Redes de Comunicaciones
Campus de Fuentenueva - Edificio Mecenas
Universidad de Granada
E-18071 Granada Spain
Tel: +34 958 241010   Ext:31114   E-mail: rafaarco@ugr.es

From curtisbr at cse.ohio-state.edu Mon Jun 9 12:02:53 2008
From: curtisbr at cse.ohio-state.edu (Brian Curtis)
Date: Mon Jun 9 12:03:13 2008
Subject: [mvapich-discuss] Followup: mvapich2 issue regarding mpd timeout in mpiexec
In-Reply-To:
References:
Message-ID:

David,

Thank you for this suggestion. We have enhanced the MPD mpiexec.py so
that the timeout is based on a multiplier (default=0.05) and the number
of processes. Further, this can be configured by setting the environment
variable MV2_MPD_RECVTIMEOUT_MULTIPLIER. This enhancement is now
available in our MVAPICH2 svn trunk (r2668) and 1.0 branch (r2669).

Brian

On May 29, 2008, at 11:06 PM, wrote:

> This is a followup to this thread:
>
> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2007-May/000834.html
>
> between Greg Bauer and Qi Gao.
>
> We had the same problem that Greg saw -- failure of mpiexec, with the
> characteristic error message "no msg recvd from mpd when expecting ack
> of request". It was resolved for us by setting recvTimeout in
> mpiexec.py to a higher value, just as Greg suggested and Qi concurred.
> The default value is 20; we chose 200 (we did not experiment with values
> between these two, so lower may work in many cases).
>
> I think this change should be made permanent in MVAPICH2.
> I do not think it will negatively impact anyone, because in the four
> cases where this timeout is used, if the timeout expires mpiexec
> immediately makes an error exit anyway. So the worst consequence is that
> mpiexec would take longer to fail (3 minutes longer if 200 is used
> instead of 20). The user who encounters this timeout has to fix the root
> cause of the timeout in order to get any work done, so they are not
> likely to encounter it repeatedly and thereby lose lots of runtime
> simply because the timeout is large. Is this analysis correct?
>
> Meanwhile, this change would clearly help at least some people with
> large clusters. We see failure with the default recvTimeout between 900
> and 1000 processes; larger recvTimeout allows us to scale to 3000
> processes and beyond.
>
> The default setting does not cause failure if I make a simple, direct
> call to mpiexec. I only see it when I use mpirun.lsf to launch a large
> job. I think the failure in the LSF case is due to the longer time it
> presumably takes to launch LSF's TaskStarter for every process, etc.
> The time required seems to be O(#processes) in the LSF case. (We have
> LSF 6.2, with a local custom wrapper script for TaskStarter).
>
> If you agree that this change to the value of recvTimeout is OK, please
> implement this one-line change in MVAPICH2, and consider contributing it
> upstream to MPICH2 as well.
>
> If you decline to make this change, at least it's now on the web that
> this change does fix the problem. :)
>
> Thanks,
> David
>
> David Kewley
> Dell Infrastructure Consulting Services
> Onsite Engineer at the Maui HPC Center
> Cell: 602-460-7617
> David_Kewley@Dell.com
>
> Dell Services: http://www.dell.com/services/
> How am I doing? Email my manager Russell_Kelly@Dell.com with any
> feedback.
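Brian's reply describes the new timeout as depending on a multiplier (default 0.05) and the number of processes, but the thread does not give the exact formula. The sketch below assumes the simplest reading, timeout = multiplier * processes, purely to show the order of magnitude; the helper name `estimated_recv_timeout` is made up, and the real mpiexec.py may apply a minimum or a different expression.

```shell
# ASSUMPTION: timeout (seconds) = multiplier * number of processes.
# The thread does not state the formula actually used by mpiexec.py.
# usage: estimated_recv_timeout NPROCS
estimated_recv_timeout() {
    awk -v n="$1" -v m="${MV2_MPD_RECVTIMEOUT_MULTIPLIER:-0.05}" \
        'BEGIN { printf "%.0f\n", n * m }'
}

# Default multiplier, job sizes mentioned in the thread.
estimated_recv_timeout 1000
estimated_recv_timeout 3000

# A larger multiplier, set only for this one call.
( MV2_MPD_RECVTIMEOUT_MULTIPLIER=0.1; estimated_recv_timeout 3000 )
```

If the product reading is right, the default gives a shorter timeout at 3000 processes than the fixed value of 200 that David reports working, so raising MV2_MPD_RECVTIMEOUT_MULTIPLIER may be worth trying if the "no msg recvd from mpd" error persists on large LSF-launched jobs.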
From panda at cse.ohio-state.edu Tue Jun 10 17:58:25 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Tue Jun 10 17:58:40 2008
Subject: [mvapich-discuss] Announcing the release of MVAPICH2 1.0.3
Message-ID:

The MVAPICH team is pleased to announce the release of MVAPICH2 1.0.3.
This is a bug-fix release. Specific changes (compared to MVAPICH2 1.0.2)
are as follows:

- Add additional synchronization before the pthread_cancel() call in
  finalization to avoid killing the thread that is supposed to
  acknowledge outstanding IB events. Thanks to David Kewley from Dell
  for reporting this issue.

- Post buffers before accepting QP connections for iWARP mode.
  Thanks to Steve Wise for reporting this issue.

- Fix a startup performance issue when on-demand connection setup is
  not used.

- Make the MPD mpiexec timeout configurable and scale it with the
  number of processes.

A detailed CHANGELOG for MVAPICH2 versions can be obtained by visiting the
following URL:

http://mvapich.cse.ohio-state.edu/download/mvapich2/changes.shtml

We strongly encourage MVAPICH2 users to update their installations to
this latest version.

To download MVAPICH2 1.0.3 and the associated user guide, and to access
the SVN, please visit the following URL:

http://mvapich.cse.ohio-state.edu

This version is also being made available through OFED 1.3.1.

All feedback, including bug reports, hints for performance tuning,
patches and enhancements, is welcome. Please post it to the
mvapich-discuss mailing list.

Thanks,

The MVAPICH Team

From biswajit at crlindia.com Wed Jun 11 04:40:43 2008
From: biswajit at crlindia.com (biswajit@crlindia.com)
Date: Wed Jun 11 04:42:05 2008
Subject: [mvapich-discuss] checkpointing failure ...
Message-ID:

While running HPL with checkpointing-enabled MVAPICH2 1.0.2, the program
crashed with the following errors:

1.
2: reregister dentry 0x642010, addr 0x2bc6578000 pagebase_low_p,
10121216 register_nbytes
[2] Abort: reregister fails
 at line 1104 in file dreg.c
rank 2 in job 1 n163_32790 caused collective abort of all ranks
  exit status of rank 2: killed by signal 9

[mpiexec_cr][/home/biswajit/mvapichBlcrInstall/mvapich2-1.0.2ckpt1/src/pm/mpd/mpiexec_cr.c:
line 196]abort: checkpoint failed

When restarting, the restart fails with the following error:

cri_syscall(CR_OP_RSTRT_PROCS): Invalid argument

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080611/7d708726/attachment.html

From gopalakk at cse.ohio-state.edu Wed Jun 11 17:23:24 2008
From: gopalakk at cse.ohio-state.edu (Karthik Gopalakrishnan)
Date: Thu Jun 12 02:17:39 2008
Subject: [mvapich-discuss] checkpointing failure ...
In-Reply-To:
References:
Message-ID: <92eddfb50806111423ic984e5cj326bf7cac814c9bb@mail.gmail.com>

Hi Biswajit.

You are seeing this error due to a memory allocation failure. MVAPICH2
calls an OFED function, ibv_reg_mr(), to re-register a memory region with
the HCA after taking a checkpoint. This function call is returning an
error. This is a failure in the OFED stack. Please let us know if you
are seeing this error for all problem sizes or only large ones. Also
let us know the amount of memory you have on your processing nodes.
Maybe you are running low on memory, which is causing the allocation
failure.

Thanks & Regards,
Karthik

From yiannis.georgiou at imag.fr Thu Jun 12 04:33:19 2008
From: yiannis.georgiou at imag.fr (yiannis georgiou)
Date: Thu Jun 12 04:33:10 2008
Subject: [mvapich-discuss] checkpointing failure ...
In-Reply-To: <92eddfb50806111423ic984e5cj326bf7cac814c9bb@mail.gmail.com>
References: <92eddfb50806111423ic984e5cj326bf7cac814c9bb@mail.gmail.com>
Message-ID: <20080612103319.eqr2pqznxck8wsc4@webmail.imag.fr>

Hello,

I got the same error several times, mainly with large problem sizes and
a large number of nodes, although I have sometimes seen it with a small
number of nodes as well. Most of the time the fatal error was
produced just before the end of the checkpointing procedure. The
strange thing is that the error doesn't appear every time. After a
number of repeats under the same configuration (HPL problem size,
number of cluster nodes), we observe that sometimes we get this
error but sometimes the checkpoints can be successfully taken and
restarted.

The explanation seems reasonable. Is this a known OFED bug? Is there
a way to avoid it?

The nodes used in my experiments consist of:
Intel Xeon EM64T 3GHz 2CPU , 1CORE
3 GHz / 1 MB L2 cache

and memory 2 GB (4x512MB) / 400MHz(2.5ns)

Thanks!!

regards,
Yiannis

-- 
Yiannis Georgiou                 LIG Laboratory / MESCAL Project
Yiannis.Georgiou@imag.fr         http://mescal.imag.fr/
FRANCE

From narravul at cse.ohio-state.edu Thu Jun 12 16:05:08 2008
From: narravul at cse.ohio-state.edu (Sundeep Narravula)
Date: Thu Jun 12 16:05:12 2008
Subject: [mvapich-discuss] checkpointing failure ...
In-Reply-To: <20080612103319.eqr2pqznxck8wsc4@webmail.imag.fr>
Message-ID:

Hi Yiannis,

If you are operating close to the limit for memory pinning, I would
suspect that fluctuations in system memory usage could cause this error
when you attempt to deregister and reregister the memory needed.

Can you try running the tests with slightly smaller problem sizes? i.e.
Same number of nodes but smaller HPL problem size.

Regards.
--Sundeep.

From rafaarco at ugr.es Mon Jun 16 06:55:56 2008
From: rafaarco at ugr.es (Rafael Arco Arredondo)
Date: Mon Jun 16 06:56:06 2008
Subject: [mvapich-discuss] mpiexec: unable to post a write of the barrier command
In-Reply-To: <1212994886.3899.7.camel@boabdilmec.ugr.es>
References: <1212994886.3899.7.camel@boabdilmec.ugr.es>
Message-ID: <1213613756.11581.5.camel@boabdilmec.ugr.es>

Hi,

We just tried with version 1.0.3 and we are still getting the same
behavior.
Do you know about any successful cases of mvapich2 with daemonless smpd?

Thanks again,

Rafa

-- 
Rafael Arco Arredondo
Centro de Servicios de Informática y Redes de Comunicaciones
Campus de Fuentenueva - Edificio Mecenas
Universidad de Granada
E-18071 Granada Spain
Tel: +34 958 241010   Ext:31114   E-mail: rafaarco@ugr.es

From panda at cse.ohio-state.edu Mon Jun 16 07:58:08 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon Jun 16 07:58:12 2008
Subject: [mvapich-discuss] mpiexec: unable to post a write of the barrier command
In-Reply-To: <1213613756.11581.5.camel@boabdilmec.ugr.es>
Message-ID:

> Hi,
>
> We just tried with version 1.0.3 and we are still getting the same
> behavior.

Please note that 1.0.3 is a bug-fix release. It is not a major release.

> Do you know about any successful cases of mvapich2 with daemonless smpd?

We will be releasing 1.2 shortly. This is the release which was mentioned
earlier. This will have the non-MPD-based startup which you are looking
for.

Thanks,

DK

From biswajit at crlindia.com Tue Jun 17 06:54:45 2008
From: biswajit at crlindia.com (biswajit@crlindia.com)
Date: Tue Jun 17 06:54:31 2008
Subject: [mvapich-discuss] problem with running mvapich
Message-ID:

When I ran a simple MPI application with mvapich2-1.0.2, I got the
following error messages:

Unknown Mellanox PCI-Express HCA best guess as Mellanox PCI-Express SDR
[3] Abort: Not enough ports are in active stateneeded active ports 1
 at line 424 in file rdma_iba_priv.c
rank 3 in job 1 n23_32790 caused collective abort of all ranks
  exit status of rank 3: return code 252

But there is an active port in each node. See the 'ibstat' output below.

CA 'mthca0'
        CA type: MT25204
        Number of ports: 1
        Firmware version: 1.1.0
        Hardware version: a0
        Node GUID: 0x0019bbfffff70cb8
        System image GUID: 0x0019bbfffff70cbb
        Port 1:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510a68
                Port GUID: 0x0019bbfffff70cb9
CA 'mthca1'
        CA type: MT25204
        Number of ports: 1
        Firmware version: 1.1.0
        Hardware version: a0
        Node GUID: 0x0019bbfffff7fbe8
        System image GUID: 0x0019bbfffff7fbeb
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 226
                LMC: 0
                SM lid: 117
                Capability mask: 0x02510a68
                Port GUID: 0x0019bbfffff7fbe9

And whenever I run the same job on nodes with IB port 1 active, it works
properly.

Is there any option in MVAPICH to select which IB port should be used?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080617/b6b609f4/attachment-0001.html

From nilesh_awate at yahoo.com Tue Jun 17 10:57:06 2008
From: nilesh_awate at yahoo.com (nilesh awate)
Date: Tue Jun 17 10:57:12 2008
Subject: [mvapich-discuss] dat_evd_dequeue erroneous condition is not handled
Message-ID: <683632.36410.qm@web94114.mail.in2.yahoo.com>

Hi All,

I am using mvapich2-1.0.1 over the uDAPL stack.
I am getting a DAT_DTO_ERR_TRANSPORT error at the uDAPL level, but the MPI
application does not terminate with an error.
Browsing through the code, I observed the following:

    ret1 = dat_evd_dequeue (MPIDI_CH3I_RDMA_Process.cq_hndl[i], &event);
    if (ret1 == DAT_SUCCESS)
      {
        assert (event.event_number == DAT_DTO_COMPLETION_EVENT);
        /* but there is no check for
           event.event_data.dto_completion_event_data.status */
        . . . .
        . . . .
      }

However, this condition is handled in the rdma_udapl_1sc.c file when
dequeuing.

What is the expected behavior of MPI when uDAPL throws an error like
DAT_DTO_ERR_TRANSPORT?
How is this kind of error going to be handled at the MPI level?
Or, how are underlying uDAPL errors reflected by MPI?
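
The check that this message says is missing can be modelled in a few lines. This is only a Python sketch of the control flow, not uDAPL or MVAPICH2 code: DtoEvent, DtoError and drain are hypothetical stand-ins, the status values are placeholder strings named after the uDAPL status codes mentioned above, and raising an exception is just one plausible policy for a failed transfer, not a description of what MVAPICH2 does.

```python
from dataclasses import dataclass

# Placeholder status values; only the names follow the uDAPL status codes.
DAT_DTO_SUCCESS = "DAT_DTO_SUCCESS"
DAT_DTO_ERR_TRANSPORT = "DAT_DTO_ERR_TRANSPORT"


@dataclass
class DtoEvent:
    """Hypothetical stand-in for a dequeued DTO completion event."""
    cookie: int   # identifies the transfer that completed
    status: str   # completion status reported with the event


class DtoError(RuntimeError):
    """Raised when a completion carries a non-success status."""


def drain(events):
    """Process dequeued completions and stop at the first failed one."""
    completed = []
    for ev in events:
        # The extra step: inspect the completion status instead of
        # assuming that every dequeued completion succeeded.
        if ev.status != DAT_DTO_SUCCESS:
            raise DtoError(f"DTO {ev.cookie} failed with {ev.status}")
        completed.append(ev.cookie)
    return completed
```

With this policy a transport error surfaces to the caller immediately instead of being silently treated as a successful completion.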

I am using Pallas as the test application.

Waiting for your reply.

Thanks,
Nilesh

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080617/84cdcd4b/attachment.html

From chai.15 at osu.edu Tue Jun 17 16:57:32 2008
From: chai.15 at osu.edu (LEI CHAI)
Date: Tue Jun 17 16:58:26 2008
Subject: [mvapich-discuss] dat_evd_dequeue erroneous condition is not handled
In-Reply-To: <683632.36410.qm@web94114.mail.in2.yahoo.com>
References: <683632.36410.qm@web94114.mail.in2.yahoo.com>
Message-ID:

An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080617/a52cff39/attachment.html

From chai.15 at osu.edu Tue Jun 17 17:31:59 2008
From: chai.15 at osu.edu (LEI CHAI)
Date: Tue Jun 17 17:33:01 2008
Subject: [mvapich-discuss] problem with running mvapich
In-Reply-To:
References:
Message-ID:

An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080617/1e447b5b/attachment-0001.html

From biswajit at crlindia.com Wed Jun 18 01:22:11 2008
From: biswajit at crlindia.com (biswajit@crlindia.com)
Date: Wed Jun 18 01:22:07 2008
Subject: [mvapich-discuss] problem with running mvapich
In-Reply-To:
References:
Message-ID:

Hi Lei,

This is working.
MVAPICH2 is supposed to detect the active port automatically, so why is
that not working?

LEI CHAI
06/18/2008 03:03 AM

To: biswajit@crlindia.com
cc: mvapich-discuss@cse.ohio-state.edu
Subject: Re: [mvapich-discuss] problem with running mvapich

Hi,

MVAPICH2 is supposed to detect the active port automatically for you.
Could you try the following options:

$ mpiexec -n 2 -env MV2_IBA_HCA mthca1 -env MV2_DEFAULT_PORT 1 ./a.out

and see if it works for you?
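
The manual selection suggested above could be scripted. What follows is a minimal Python sketch, not part of MVAPICH2: it assumes ibstat prints blocks shaped like the output quoted in this thread, takes the first port whose State is Active, and builds the same mpiexec line with MV2_IBA_HCA and MV2_DEFAULT_PORT. The helper names active_ports and mpiexec_command and the trimmed SAMPLE text are hypothetical.

```python
import re


def active_ports(ibstat_text):
    """Return (ca_name, port_number) pairs whose State is Active."""
    found = []
    ca = port = None
    for raw in ibstat_text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '(.+)'$", line)
        if m:                       # start of a new adapter block
            ca, port = m.group(1), None
            continue
        m = re.match(r"Port (\d+):$", line)
        if m:                       # start of a port block
            port = int(m.group(1))
            continue
        # "Physical state: ..." lines do not match this exact string.
        if line == "State: Active" and ca is not None and port is not None:
            found.append((ca, port))
    return found


def mpiexec_command(ca, port, nprocs, program):
    """Build the argument list used in the reply above."""
    return ["mpiexec", "-n", str(nprocs),
            "-env", "MV2_IBA_HCA", ca,
            "-env", "MV2_DEFAULT_PORT", str(port),
            program]


# Trimmed copy of the ibstat output from this thread (assumed layout).
SAMPLE = """\
CA 'mthca0'
    Port 1:
        State: Down
        Physical state: Polling
CA 'mthca1'
    Port 1:
        State: Active
        Physical state: LinkUp
"""

if __name__ == "__main__":
    ca, port = active_ports(SAMPLE)[0]
    print(" ".join(mpiexec_command(ca, port, 2, "./a.out")))
    # prints: mpiexec -n 2 -env MV2_IBA_HCA mthca1 -env MV2_DEFAULT_PORT 1 ./a.out
```

On a real node the text would come from running ibstat (for example through subprocess.run) rather than from SAMPLE, and the check would have to run on every node in the job, since the active adapter may differ from node to node.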

Lei

----- Original Message -----
From: biswajit@crlindia.com
Date: Tuesday, June 17, 2008 6:55 am
Subject: [mvapich-discuss] problem with running mvapich
To: mvapich-discuss@cse.ohio-state.edu

> When I ran a simple MPI application with mvapich2-1.0.2, I got the following error messages:
> Unknown Mellanox PCI-Express HCA best guess as Mellanox PCI-Express SDR
> [3] Abort: Not enough ports are in active stateneeded active ports 1
> at line 424 in file rdma_iba_priv.c
> rank 3 in job 1 n23_32790 caused collective abort of all ranks
> exit status of rank 3: return code 252
> But there is a active port in each node. See the below 'ibstat' output.
> CA 'mthca0'
>         CA type: MT25204
>         Number of ports: 1
>         Firmware version: 1.1.0
>         Hardware version: a0
>         Node GUID: 0x0019bbfffff70cb8
>         System image GUID: 0x0019bbfffff70cbb
>         Port 1:
>                 State: Down
>                 Physical state: Polling
>                 Rate: 10
>                 Base lid: 0
>                 LMC: 0
>                 SM lid: 0
>                 Capability mask: 0x02510a68
>                 Port GUID: 0x0019bbfffff70cb9
> CA 'mthca1'
>         CA type: MT25204
>         Number of ports: 1
>         Firmware version: 1.1.0
>         Hardware version: a0
>         Node GUID: 0x0019bbfffff7fbe8
>         System image GUID: 0x0019bbfffff7fbeb
>         Port 1:
>                 State: Active
>                 Physical state: LinkUp
>                 Rate: 20
>                 Base lid: 226
>                 LMC: 0
>                 SM lid: 117
>                 Capability mask: 0x02510a68
>                 Port GUID: 0x0019bbfffff7fbe9
> And, whenever I run same job in nodes with IB port 1 active, it works properly.
> Is there any option in MVAPICH to select the IB port which should be used ?
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20080618/ad048d21/attachment.html

From tewo0001 at tc.umn.edu Thu Jun 19 14:14:31 2008
From: tewo0001 at tc.umn.edu (Minassie Y Tewoldebrhan)
Date: Thu Jun 19 14:35:02 2008
Subject: [mvapich-discuss] Found '/var/tmp/mvapich-1.0.0-2106-root' in installed files; aborting
Message-ID:

Any ideas?

Found '/var/tmp/mvapich-1.0.0-2106-root' in installed files; aborting
error : Bad exit status from /var/tmp/rpm-tmp.91591 (%install)

Regards,

M

From David_Kewley at Dell.com Thu Jun 19 17:33:15 2008
From: David_Kewley at Dell.com (David_Kewley@Dell.com)
Date: Thu Jun 19 17:34:09 2008
Subject: [mvapich-discuss] -DDISABLE_PTMALLOC, MPI_Bcast, and -DMCST_SUPPORT
Message-ID:

Topic #1:

What are the likely performance impacts of using -DDISABLE_PTMALLOC
(including memory use)? Does this differ between MVAPICH and MVAPICH2?
We are considering seeing what effect this has on certain applications
that have seen problems with realloc.

Topic #2:

We are using the OpenIB components of OFED 1.2.5.5, and are building our
own MVAPICH and MVAPICH2, with various versions of MV* and compiler.

We have an application apparently failing during an MVAPICH MPI_Bcast of
many MB of data to dozens to hundreds of MPI ranks. (Actually I believe
it's Fortran, so I guess MPI_BCAST.) We have already implemented
VIADEV_USE_SHMEM_BCAST=0 just in case, but we are still having problems.
(I'm not 100% reassured by the user's reports that the problem is still
in MPI_Bcast, but I think it's likely.)

If you have any thoughts on the crashes, please do comment. Meanwhile
I'm continuing to chase it down locally.

Topic #3:

As I looked through the MVAPICH code to see how MPI_Bcast is implemented
for ch_gen2, I see MCST_SUPPORT repeatedly checked. It appears this is
not set by default (by make.mvapich.gen2).

If MCST_SUPPORT is disabled, what algorithm is used to implement
MPI_Bcast? If MCST_SUPPORT is enabled, does MPI_Bcast use IB multicast?
Should it greatly speed up MPI_Bcast if enabled?

It seems like MCST_SUPPORT would be beneficial, but the fact that it is
not enabled by default makes me wonder what the risks of enabling it are.

Thanks,
David

David Kewley
Dell Infrastructure Consulting Services
Onsite Engineer at the Maui HPC Center
Cell: 602-460-7617
David_Kewley@Dell.com

Dell Services: http://www.dell.com/services/
How am I doing? Email my manager Russell_Kelly@Dell.com with any feedback.

From perkinjo at cse.ohio-state.edu Fri Jun 20 08:36:56 2008
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Fri Jun 20 08:37:09 2008
Subject: [mvapich-discuss] Found '/var/tmp/mvapich-1.0.0-2106-root' in installed files; aborting
In-Reply-To:
References:
Message-ID: <20080620123655.GA5381@cse.ohio-state.edu>

On Thu, Jun 19, 2008 at 01:14:31PM -0500, Minassie Y Tewoldebrhan wrote:
> Any ideas?
>
> Found '/var/tmp/mvapich-1.0.0-2106-root' in installed files; aborting
> error : Bad exit status from /var/tmp/rpm-tmp.91591 (%install)
>

It looks like you're either installing from rpm or building from an srpm.
Can you tell us what your installation source is, whether there have been
any modifications, and in what environment (such as OS and compiler)
you're performing the installation?

>
> Regards,
>
> M
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From koop at cse.ohio-state.edu Fri Jun 20 10:14:38 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Fri Jun 20 10:14:44 2008
Subject: [mvapich-discuss] -DDISABLE_PTMALLOC, MPI_Bcast, and -DMCST_SUPPORT
In-Reply-To:
Message-ID:

David,

I'll answer your questions inline:

> What are the likely performance impacts of using -DDISABLE_PTMALLOC
> (including memory use)? Does this differ between MVAPICH and MVAPICH2?
> We are considering seeing what effect this has on certain applications
> that have seen problems with realloc.

The effects of turning off PTMALLOC (using -DDISABLE_PTMALLOC) will be the
same between MVAPICH and MVAPICH2.

The point of using the PTMALLOC library is to allow caching of InfiniBand
memory registrations. To ensure correctness we need to know if memory is
being freed, etc. Since registration for InfiniBand is very expensive, we
attempt to cache these registrations so that if the same buffer is re-used
for communication it will already be registered (speeding up the
application).

So the performance change will be application-dependent. If the
application makes frequent re-use of buffers for communication, the
performance will likely be hurt. On the flip side, if the application has
very poor buffer re-use, the performance may actually be better without
the registration cache (you can always turn it off at runtime with
VIADEV_USE_DREG_CACHE=0 on MVAPICH). When the registration cache is
turned off, a copy-based approach is used for messages under a certain
size -- so the usual zero-copy path is not taken, but no registration is
needed either.

I hope this helps. Please let me know if you need additional
clarification.

> Topic #2:
>
> We are using the OpenIB components of OFED 1.2.5.5, and are building our
> own MVAPICH and MVAPICH2, with various versions of MV* and compiler.
>
> We have an application apparently failing during MVAPICH MPI_Bcast of a
> many MB of data to dozens to hundreds of MPI ranks. (Actually I believe
> it's Fortran, so I guess MPI_BCAST.) We have already implemented
> VIADEV_USE_SHMEM_BCAST=0 just in case, but we are still having problems.
> (I'm not 100% reassured by the user's reports that the problem is still
> in MPI_Bcast, but I think it's likely.)

We have not seen this error before, so we're very interested to track this
down. If there is a reproducer for this we would be very interested to try
it out here.
Does the same error occur with MVAPICH2 as with MVAPICH? Also, does
turning off all shared memory collectives avoid the error?
(VIADEV_USE_SHMEM_COLL=0)

> Topic #3:
>
> As I looked through the MVAPICH code to see how MPI_Bcast is implemented
> for ch_gen2, I see MCST_SUPPORT repeatedly checked. It appears this is
> not set by default (by make.mvapich.gen2).
>
> If MCST_SUPPORT is disabled, what algorithm is used to implement
> MPI_Bcast? If MCST_SUPPORT is enabled, does MPI_Bcast use IB multicast?
> Should it greatly speed up MPI_Bcast if enabled?
>
> It seems like MCST_SUPPORT would be beneficial, but the fact that it is
> not enabled by default makes me wonder what the risks are of enabling
> it?

MCST (hardware-based multicast) is not supported right now. InfiniBand
multicast is unreliable and supports sending only in 2KB chunks, and we
haven't seen good performance from it on large systems. Mellanox is
planning on adding reliable multicast support to the ConnectX adapter
soon, at which point we'll re-evaluate the benefits. So at this point
MCST support should not be enabled.

Let us know if you have any more questions.

Thanks,

Matt

From David_Kewley at Dell.com Mon Jun 23 22:28:31 2008
From: David_Kewley at Dell.com (David_Kewley@Dell.com)
Date: Mon Jun 23 22:29:02 2008
Subject: [mvapich-discuss] -DDISABLE_PTMALLOC, MPI_Bcast, and -DMCST_SUPPORT
In-Reply-To:
References:
Message-ID:

Matt,

Thanks for clarifying the effect of -DDISABLE_PTMALLOC, and the fact
that hardware-based multicast is not enabled right now. I think that's
all I need to know on those topics for now.

I have a reproducer and observations about the apparent MPI_Bcast
segfault bug. This is on x86_64, using Intel Fortran 10.1.015 (Build
20080312), and the executable ends up using the Intel implementation of
memcpy(), in case that's significant -- see the backtrace below. This
is with MVAPICH 1.0.

The segfault occurs whenever these two conditions both hold:

1) length of the character array sent is > 8MB-11kB
2) #procs is > (7 nodes) * (N procs per node)

For the second condition I tested with N=1,2,4 procs per node, in which
cases the segfault occurred when the number of procs in the job exceeded
7,14,28 respectively.

If either of the conditions does not hold, the segfault does not occur.

The threshold is exactly 8MB-11kB. If the length of the char array is
8MB-11kB, it's fine, but if it's 8MB-11kB+1, it segfaults.

The segfault occurs in the memcpy function (again, it's the Intel
memcpy), when it tries to copy into the rhandle->buf beyond the 8MB-11kB
mark.

The backtrace is, for example:

#0  0x00000000004045c1 in __intel_new_memcpy ()
#1  0x0000000000401ee8 in _intel_fast_memcpy.J ()
#2  0x0000002a9560010e in MPID_VIA_self_start ()
   from /opt/mvapich/1.0/intel/10.1.015/lib/shared/libmpich.so.1.0
#3  0x0000002a955d8e82 in MPID_IsendContig ()
   from /opt/mvapich/1.0/intel/10.1.015/lib/shared/libmpich.so.1.0
#4  0x0000002a955d7564 in MPID_IsendDatatype ()
   from /opt/mvapich/1.0/intel/10.1.015/lib/shared/libmpich.so.1.0
#5  0x0000002a955cc4d6 in PMPI_Isend ()
   from /opt/mvapich/1.0/intel/10.1.015/lib/shared/libmpich.so.1.0
#6  0x0000002a955e95d2 in PMPI_Sendrecv ()
   from /opt/mvapich/1.0/intel/10.1.015/lib/shared/libmpich.so.1.0
#7  0x0000002a955bf7e9 in intra_Bcast_Large ()
   from /opt/mvapich/1.0/intel/10.1.015/lib/shared/libmpich.so.1.0
#8  0x0000002a955bcfa0 in intra_newBcast ()
   from /opt/mvapich/1.0/intel/10.1.015/lib/shared/libmpich.so.1.0
#9  0x0000002a95594e00 in PMPI_Bcast ()
   from /opt/mvapich/1.0/intel/10.1.015/lib/shared/libmpich.so.1.0
#10 0x0000000000401e3d in main ()

Attached find a simple reproducer C program.
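
For anyone re-testing the boundary, the two conditions above can be written out as arithmetic. This is a small Python sketch under two assumptions that the message does not state: that "MB" and "kB" are binary units (1024*1024 and 1024 bytes), and that each element of the character array is one byte. The names THRESHOLD_BYTES and expect_segfault are hypothetical.

```python
MB = 1024 * 1024     # assumption: binary megabyte
KB = 1024            # assumption: binary kilobyte
NODE_THRESHOLD = 7   # from condition 2: more than 7 nodes' worth of procs

# Largest message length reported to work: 8MB-11kB.
THRESHOLD_BYTES = 8 * MB - 11 * KB


def expect_segfault(msg_bytes, nprocs, procs_per_node):
    """True when both reported conditions hold at the same time."""
    too_long = msg_bytes > THRESHOLD_BYTES
    too_many = nprocs > NODE_THRESHOLD * procs_per_node
    return too_long and too_many


if __name__ == "__main__":
    print(THRESHOLD_BYTES)                               # prints: 8377344
    print(expect_segfault(THRESHOLD_BYTES + 1, 29, 4))   # prints: True
    print(expect_segfault(THRESHOLD_BYTES, 29, 4))       # prints: False
```

Under those assumptions the last safe length is 8377344 bytes, and the smallest failing job sizes for N=1,2,4 are 8, 15 and 29 procs.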
David > -----Original Message----- > From: Matthew Koop [mailto:koop@cse.ohio-state.edu] > Sent: Friday, June 20, 2008 4:15 AM > To: Kewley, David > Cc: mvapich-discuss@cse.ohio-state.edu > Subject: Re: [mvapich-discuss] -DDISABLE_PTMALLOC, MPI_Bcast, and - > DMCST_SUPPORT > > David, > > I'll answer your questions inline: > > > What are the likely performance impacts of using -DDISABLE_PTMALLOC > > (including memory use)? Does this differ between MVAPICH and MVAPICH2? > > We are considering seeing what effect this has on certain applications > > that have seen problems with realloc. > > The effects of turning off PTMALLOC (using -DDISABLE_PTMALLOC) will be the > same between MVAPICH and MVAPICH2. > > The point of using the PTMALLOC library is to allow caching of InfiniBand > memory registrations. To ensure correctness we need to know if memory is > being free'd, etc. Since registration for InfiniBand is very expensive we > attempt to cache these registrations so if the same buffer is re-used > again for communication it will already be registered (speeding up the > application). > > So the performance change will be application-dependent. If the > application makes frequent re-use of buffers for communication the > performance will likely be hurt. On the flip side, if the application has > very poor buffer re-use the performance may actually be better by not > using the registration cache (you can always turn it off at runtime with > VIADEV_USE_DREG_CACHE=0 on MVAPICH). When the registration cache is not > turned on a copy-based approach is used for messages under a certain size > -- so no zero-copy that is normally used, but registration is not used. > > I hope this helps. Please let me know if you need additional > clarification. > > > Topic #2: > > > > We are using the OpenIB components of OFED 1.2.5.5, and are building our > > own MVAPICH and MVAPICH2, with various versions of MV* and compil