From paravpandit at yahoo.com Mon Aug 3 12:37:15 2009
From: paravpandit at yahoo.com (Parav Pandit)
Date: Mon Aug 3 12:37:43 2009
Subject: [mvapich-discuss] How to configure MPD for using RDMA through Chelsio iWarp adapters
Message-ID: <114581.58704.qm@web30101.mail.mud.yahoo.com>

Hi,

I am using MVAPICH2 version 1.2p1 as part of OFED-1.4 with two cxgb3 Chelsio iWARP cards.

I am trying to run srtest.c with mpiexec between these two adapters, one as sender and the other as receiver. I am unable to configure the MPD daemon. Can someone suggest the simplest way to configure MPD and run srtest with mpiexec? I want one system to be the front node and the other the compute node.

I have done the following configuration:

System-1:
echo 192.168.1.1 > /etc/mv2.conf

System-2:
echo 192.168.1.2 > /etc/mv2.conf

Do I have to start MPD on both systems first, and what parameters should I give it?

Regards,
Parav Pandit

From perkinjo at cse.ohio-state.edu Mon Aug 3 13:09:29 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Mon Aug 3 13:09:56 2009
Subject: [mvapich-discuss] How to configure MPD for using RDMA through Chelsio iWarp adapters
In-Reply-To: <114581.58704.qm@web30101.mail.mud.yahoo.com>
References: <114581.58704.qm@web30101.mail.mud.yahoo.com>
Message-ID: <20090803170929.GC2617@cse.ohio-state.edu>

On Mon, Aug 03, 2009 at 09:37:15AM -0700, Parav Pandit wrote:
> Do I have to start MPD on both systems first, and what parameters
> should I give it?

Yes. Please take a look at the following section of our user guide for MVAPICH2 1.2:
http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.2.html#x1-180005.2.3

> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From subramon at cse.ohio-state.edu Mon Aug 3 13:12:38 2009
From: subramon at cse.ohio-state.edu (Hari Subramoni)
Date: Mon Aug 3 13:13:04 2009
Subject: [mvapich-discuss] How to configure MPD for using RDMA through Chelsio iWarp adapters
In-Reply-To: <114581.58704.qm@web30101.mail.mud.yahoo.com>
Message-ID:

Hi Parav,

The information you need is available in our user guide at the following link:

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.2.html#x1-180005.2.3

The following link gives information on how to launch jobs using mpiexec:

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.2.html#x1-200005.2.5

MVAPICH2 1.2 also supports a new job launcher, mpirun_rsh, which does not require you to set up MPD rings and gives improved startup performance. The following link gives detailed information on how to launch applications with it:

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.2.html#x1-160005.2.1

Let us know if you have any more questions.

Thx,
Hari.

From paravpandit at yahoo.com Mon Aug 3 14:04:27 2009
From: paravpandit at yahoo.com (Parav Pandit)
Date: Mon Aug 3 14:04:56 2009
Subject: [mvapich-discuss] How to configure MPD for using RDMA through Chelsio iWarp adapters
In-Reply-To:
Message-ID: <122766.141.qm@web30101.mail.mud.yahoo.com>

Thanks for the quick input. I had been referring to the user guide and couldn't get it working, so I posted a generic question. Let me ask a more specific one now.

On System-1 (front node) I started mpd with:

#mpd -n 1

The user guide doesn't say whether to start mpd on the front node; it only says to start it on the compute node. I assumed it should run on the front node as well, and I started mpd on System-2 with the same command.

Then on System-1 I ran:

#mpiexec -machinefile mf_list -n 1 -env MV2_USE_RDMA_CM 1 -env MV2_USE_SRQ 0 ./srtest

mf_list contains the following:

192.168.1.2 ifhn=192.168.1.1

192.168.1.1 is the IP of the iWARP RNIC on the front node; 192.168.1.2 is the remote end (compute node).

With this I get the following error:

mpiexec: unable to start all procs; may have invalid machine names
remaining specified hosts:
192.168.1.2

I am unable to figure out what's wrong with the configuration. I haven't tried mpirun_rsh yet; I would like to go step by step and understand the functionality as I build up.

Regards,
Parav Pandit
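
Starting mpd separately on each system, as described above, gives two independent single-host rings rather than one ring spanning both machines; that would be consistent with mpiexec rejecting 192.168.1.2, though this is an inference, not a confirmed diagnosis. The shell sketch below shows one way to bring up a single two-host ring with mpdboot and check it with mpdtrace. The helper names, the mpd.hosts file name, and the placeholder secret word are hypothetical; the script assumes passwordless ssh between the hosts, the same ~/.mpd.conf on both, and MVAPICH2's bin directory in PATH.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): boot a two-host MPD ring for MVAPICH2 1.2.
# Assumptions: passwordless ssh from the front node to the compute node,
# the same ~/.mpd.conf on both hosts, and MVAPICH2's bin/ directory in PATH.

# Hosts other than the one mpdboot runs on (here: the compute node).
COMPUTE_HOSTS=(192.168.1.2)

# mpd refuses to start unless ~/.mpd.conf exists and is readable only by you.
write_mpd_conf() {
  local conf="$HOME/.mpd.conf"
  if [ ! -f "$conf" ]; then
    # Placeholder secret word (assumption); choose your own.
    (umask 077 && echo "secretword=change-me" > "$conf")
  fi
  chmod 600 "$conf"
}

# One remote host per line; mpdboot starts the local mpd itself.
write_hosts_file() {
  printf '%s\n' "${COMPUTE_HOSTS[@]}" > "$1"
}

boot_ring() {
  write_mpd_conf
  write_hosts_file mpd.hosts
  # -n counts the local mpd plus one per host listed in the file.
  mpdboot -n $((1 + ${#COMPUTE_HOSTS[@]})) -f mpd.hosts || return 1
  mpdtrace -l   # should list both hosts once the ring is up
}
```

For example, run "boot_ring" on the front node, then launch two ranks with "mpiexec -n 2 -env MV2_USE_RDMA_CM 1 -env MV2_USE_SRQ 0 ./srtest", and shut the ring down afterwards with "mpdallexit".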

From jeff at haferman.com Mon Aug 3 15:01:14 2009
From: jeff at haferman.com (Jeff Haferman)
Date: Mon Aug 3 15:01:44 2009
Subject: [mvapich-discuss] mvapich 1.0.1 Got error polling CQ
Message-ID: <20090803190114.8781F1D91614@adint.net>

Hi -

This is our first try with IB and MVAPICH, and yes, I will install a more up-to-date version, but could someone explain the following:

mpirun -np 2 -hostfile hostfile ./osu_latency
Abort signaled by rank 0: [compute-0-0.local:0] Got error polling CQ

Abort signaled by rank 1: Error polling CQ

Exit code -3 signaled from compute-ib-0-0
Killing remote processes...MPI process terminated unexpectedly
MPI process terminated unexpectedly
DONE

This is MVAPICH 1.0.1 compiled with GCC 4.1.2 on CentOS 5.2 with Linux kernel 2.6.18-92.1.26 and OFED 1.3.1.

What is "Error polling CQ"? I did a search and read the manual but couldn't find anything helpful.

Jeff

From paravpandit at yahoo.com Mon Aug 3 15:23:12 2009
From: paravpandit at yahoo.com (Parav Pandit)
Date: Mon Aug 3 15:23:41 2009
Subject: [mvapich-discuss] How to configure MPD for using RDMA through Chelsio iWarp adapters
Message-ID: <199900.35736.qm@web30103.mail.mud.yahoo.com>

Hi Hari,

I am trying mpirun_rsh.

I ran (on System-1, 192.168.1.1):

../bin/mpirun_rsh -np 1 system-2 LD_LIBRARY_PATH=/usr/local/lib MV2_USE_IWARP_MODE=1 MV2_USE_SRQ=0 ./srtest

I get the following output:

Process 0 on system-2
Process 0 of 1
0 sending 'hello there'

and then it waits in MPI_Send().

I have a basic question: should I run with -np 2 or -np 1? I believe 1. If I run just one process, what will run on the compute node and what will run on the front node?

Is there any other example with which I can exercise the traffic and the various iWARP verbs through MVAPICH2?

Regards,
Parav Pandit

From panda at cse.ohio-state.edu Mon Aug 3 21:19:58 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon Aug 3 21:20:24 2009
Subject: [mvapich-discuss] mvapich 1.0.1 Got error polling CQ
In-Reply-To: <20090803190114.8781F1D91614@adint.net>
Message-ID:

This error indicates that when a process polls the Completion Queue (CQ) of the InfiniBand network, it gets an error back.

On your IB setup, are you able to carry out IB native-level (verbs-level, not MPI-level) tests across the nodes? Please make sure that the IB setup is correct first; then you can carry out MPI-level tests.

DK
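
Following the suggestion to verify the fabric below MPI first, a quick preliminary check is whether any HCA port is up at all. The function below is a minimal sketch (count_active_ports is a hypothetical name) that reads ibv_devinfo output, from OFED's libibverbs utilities, on stdin and counts ports whose state is PORT_ACTIVE. An active port is necessary but not sufficient for verbs traffic to work.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): count HCA ports in PORT_ACTIVE state
# from `ibv_devinfo` output read on stdin.
# Prints the count; returns non-zero when no port is active.
count_active_ports() {
  awk '
    $1 == "state:" && $2 == "PORT_ACTIVE" { n++ }   # ignores phys_state lines
    END { print n + 0; exit (n > 0 ? 0 : 1) }
  '
}
```

For example, run "ibv_devinfo | count_active_ports" on each node. If the ports are active, a verbs-level ping-pong such as ibv_rc_pingpong (started with no arguments on one node as the server, then with that node's host name on the other) exercises the fabric without involving MPI.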

From panda at cse.ohio-state.edu Mon Aug 3 21:29:03 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon Aug 3 21:29:29 2009
Subject: [mvapich-discuss] How to configure MPD for using RDMA through Chelsio iWarp adapters
In-Reply-To: <199900.35736.qm@web30103.mail.mud.yahoo.com>
Message-ID:

You need to use -np 2 to run your application on two processes. A send/receive test such as srtest needs at least two processes: one to send and one to receive.

For detailed usage of mpirun_rsh with MVAPICH2 1.2, take a look at Section 5.2.1 of the MVAPICH2 1.2 user guide at the following URL:

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.2.html#x1-160005.2.1

DK
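
To make the -np 2 point concrete, here is a minimal sketch (run_srtest and its DRY_RUN switch are hypothetical) that builds the mpirun_rsh command with one host argument per rank, reusing the host names and MV2_* settings from the earlier attempt in this thread. mpirun_rsh expects the host list after -np, then any NAME=value environment settings, then the executable.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): launch srtest with one rank per listed host.
# -np is derived from the host list, so two hosts always means two ranks.
# Set DRY_RUN=1 to print the command instead of running it.
run_srtest() {
  local prog=$1; shift                      # remaining args: one host per rank
  local cmd=(mpirun_rsh -np "$#" "$@"
             LD_LIBRARY_PATH=/usr/local/lib MV2_USE_IWARP_MODE=1 MV2_USE_SRQ=0
             "$prog")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, "run_srtest ./srtest system-1 system-2" starts one rank on each host; with DRY_RUN=1 it only prints the command. The sketch assumes mpirun_rsh is in PATH (the earlier attempt invoked it as ../bin/mpirun_rsh).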

From jeff at haferman.com Mon Aug 3 22:38:16 2009
From: jeff at haferman.com (Jeff Haferman)
Date: Mon Aug 3 22:38:44 2009
Subject: [mvapich-discuss] mvapich 1.0.1 Got error polling CQ
In-Reply-To:
References: <20090803190114.8781F1D91614@adint.net>
Message-ID: <20090804023816.0B83B1D91627@adint.net>

Thank you, you are correct: verbs-level tests are failing. They were working earlier, so something broke.

From polk678 at gmail.com Tue Aug 4 02:29:45 2009
From: polk678 at gmail.com (gossips J)
Date: Tue Aug 4 02:30:13 2009
Subject: [mvapich-discuss] loopback with mvapich2-1.2p1
Message-ID:

Hi,

I would like to know whether mvapich2-1.2p1 makes loopback connections or not. Does anybody have an idea?

I could not find these details in the FAQ or in the user guide.

Thanks,
Polk.

From pashash at gmail.com Tue Aug 4 04:10:51 2009
From: pashash at gmail.com (Pavel Shamis (Pasha))
Date: Tue Aug 4 04:11:21 2009
Subject: [mvapich-discuss] mvapich 1.0.1 Got error polling CQ
In-Reply-To: <20090804023816.0B83B1D91627@adint.net>
References: <20090803190114.8781F1D91614@adint.net> <20090804023816.0B83B1D91627@adint.net>
Message-ID: <4A77ED0B.7050004@dev.mellanox.co.il>

Jeff,

You can use the ibdiagnet OpenFabrics tool to debug the IB network:
http://linux.die.net/man/1/ibdiagnet

Pasha

From polk678 at gmail.com Tue Aug 4 04:19:08 2009
From: polk678 at gmail.com (gossips J)
Date: Tue Aug 4 04:19:34 2009
Subject: [mvapich-discuss] Re: loopback with mvapich2-1.2p1
In-Reply-To:
References:
Message-ID:

To rephrase the question: does mvapich2-1.2p1 support loopback connections?

With Intel MPI we can specify the device as "rdma" or "rdssm": with "rdssm" no loopback connections are made, whereas "rdma" makes both loopback and non-loopback connections, with both IMB-MPI1 and IMB-EXT.

From panda at cse.ohio-state.edu Tue Aug 4 09:54:58 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Tue Aug 4 09:55:24 2009
Subject: [mvapich-discuss] Re: loopback with mvapich2-1.2p1
In-Reply-To:
Message-ID:

> To rephrase the question: does mvapich2-1.2p1 support loopback connections?

Yes.

> With Intel MPI we can specify the device as "rdma" or "rdssm": with
> "rdssm" no loopback connections are made, whereas "rdma" makes both
> loopback and non-loopback connections, with both IMB-MPI1 and IMB-EXT.

The runtime environment parameter MV2_USE_SHARED_MEM controls this. By default it is set to 1, which uses intra-node shared memory for communication. If it is set to 0, intra-node communication takes place over a loopback connection.

More details on this parameter are available in the MVAPICH2 user guide at the following URL (the link points to the 1.4rc1 edition of the guide):

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc1.html#x1-15600011.70

However, before setting this parameter to 0, make sure that the underlying network adapter supports loopback connections. Some adapters do not, and this option (MV2_USE_SHARED_MEM=0) may not work with them.

Hope this helps.

DK
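
To observe the effect of MV2_USE_SHARED_MEM described above, one option is to run osu_latency twice with both ranks on the same node, once with the default and once with the parameter set to 0, and compare the two outputs per message size. The awk-based helper below is a minimal sketch (compare_latency is a hypothetical name, not part of the OSU benchmarks) that prints the size, both latencies, and the ratio of the second latency to the first, skipping the "#" header lines.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): compare two osu_latency outputs by message size.
# $1: run with MV2_USE_SHARED_MEM=1, $2: run with MV2_USE_SHARED_MEM=0.
# Output columns: size, latency_shm, latency_loopback, loopback/shm ratio.
compare_latency() {
  awk '
    /^#/ || NF < 2 { next }                   # skip headers and blank lines
    FNR == NR { shm[$1] = $2; next }          # first file: remember shm latency
    ($1 in shm) && shm[$1] > 0 {
      printf "%s %s %s %.2f\n", $1, shm[$1], $2, $2 / shm[$1]
    }
  ' "$1" "$2"
}
```

For example: "mpirun_rsh -np 2 node1 node1 MV2_USE_SHARED_MEM=1 ./osu_latency > shm.txt", the same command with MV2_USE_SHARED_MEM=0 redirected to loop.txt, then "compare_latency shm.txt loop.txt" (node1 is a placeholder host name). A ratio consistently above 1 suggests the second run went through the adapter rather than shared memory.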

From jeff at haferman.com Tue Aug 4 10:29:10 2009
From: jeff at haferman.com (Jeff Haferman)
Date: Tue Aug 4 10:29:39 2009
Subject: [mvapich-discuss] mvapich 1.0.1 Got error polling CQ
In-Reply-To: <4A77ED0B.7050004@dev.mellanox.co.il>
References: <20090803190114.8781F1D91614@adint.net> <20090804023816.0B83B1D91627@adint.net> <4A77ED0B.7050004@dev.mellanox.co.il>
Message-ID: <20090804142910.A4BAD1D91630@adint.net>

Thank you, Pasha -

It looks like I have these OFED tools installed as part of the OFED "ibutils" RPM. ibdiagnet passes, but my ibping is not resolving node names properly. I think I can solve this now.

Jeff

From polk678 at gmail.com Thu Aug 6 01:55:47 2009
From: polk678 at gmail.com (gossips J)
Date: Thu Aug 6 01:56:23 2009
Subject: [mvapich-discuss] Re: loopback with mvapich2-1.2p1
In-Reply-To:
References:
Message-ID:

Setting this environment variable does not produce loopback connections. I ran with an adapter that supports loopback connections, and it still makes only non-loopback connections.
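
One way to check which path a run actually takes is to snapshot the adapter's statistics with "ethtool -S" before and after the benchmark and compare them. The helper below is a minimal sketch (diff_counters is a hypothetical name) that prints each numeric counter whose value changed, together with the difference. Counter names are driver-specific, so it makes no assumption about which counter, if any, tracks loopback traffic.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print counters that changed between two
# `ethtool -S` snapshots. $1: snapshot before the run, $2: snapshot after.
diff_counters() {
  awk -F: '
    {
      key = $1; val = $2
      gsub(/^[ \t]+|[ \t]+$/, "", key)         # trim the counter name
      gsub(/[ \t]/, "", val)                   # trim the value
    }
    val !~ /^[0-9]+$/ { next }                 # skip "NIC statistics:" etc.
    FNR == NR { before[key] = val; next }      # first file: baseline values
    (key in before) && val != before[key] { print key, val - before[key] }
  ' "$1" "$2"
}
```

For example: "ethtool -S eth2 > before.txt", run the benchmark, "ethtool -S eth2 > after.txt", then "diff_counters before.txt after.txt" (eth2 is a placeholder interface name).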

From panda at cse.ohio-state.edu Thu Aug 6 13:14:48 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Thu Aug 6 13:15:14 2009
Subject: [mvapich-discuss] Re: loopback with mvapich2-1.2p1
In-Reply-To:
Message-ID:

> Setting this environment variable does not produce loopback connections.
> I ran with an adapter that supports loopback connections, and it still
> makes only non-loopback connections.

How are you detecting that it is not working? Have you run the OSU latency test on your adapter with the shared-memory variable turned on and off? You should see higher latency numbers when the loopback connection is used (i.e., with shared memory off).

Here are some sample numbers with and without shared memory.

When shared memory is on (MV2_USE_SHARED_MEM=1):

# OSU MPI Latency Test v3.1.1
# Size        Latency (us)
0             0.79
1             1.00
2             0.99
4             0.99
8             0.99
16            0.99
32            1.01
64            1.05
128           1.14
256           1.25
512           1.39
1024          1.63
2048          1.83
4096          2.48
8192          3.86
16384         7.08
32768         13.79
65536         27.05
131072        44.38
262144        77.17
524288        137.95
1048576       262.82
2097152       499.93
4194304       1187.30

When shared memory is off (MV2_USE_SHARED_MEM=0):

# OSU MPI Latency Test v3.1.1
# Size        Latency (us)
0             1.57
1             1.52
2             1.52
4             1.52
8             1.52
16            1.52
32            1.58
64            1.76
128           2.92
256           3.23
512           3.63
1024          4.39
2048          6.02
4096          7.20
8192          9.76
16384         15.77
32768         21.92
65536         33.20
131072        54.12
262144        99.65
524288        187.62
1048576       367.76
2097152       728.32
4194304       1557.02

DK

From moody20 at llnl.gov Thu Aug 6 20:55:25 2009
From: moody20 at llnl.gov (Adam Moody)
Date: Thu Aug 6 20:55:52 2009
Subject: [mvapich-discuss] SLURM with MVAPICH2-1.4-RC1
Message-ID: <4A7B7B7D.6070302@llnl.gov>

Hello MVAPICH team,

I'm trying to build and test MVAPICH2-1.4-RC1-r3378. I'd like to configure it to use SLURM (i.e., link against SLURM's PMI library) so that I can launch jobs with something like "srun -n2 ./hellompi". According to the User Guide there is a --with-slurm configure parameter, but I can't get it to work. First, it's not clear which directory I should name here. We have pmi.h in /usr/include/slurm and libpmi.so in /usr/lib64. I tried --with-slurm=/usr/lib64 on the top-level configure, but that didn't work. Can you tell me what I need to do?

Thanks,
-Adam

From perkinjo at cse.ohio-state.edu Thu Aug 6 21:18:27 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Thu Aug 6 21:19:03 2009
Subject: [mvapich-discuss] SLURM with MVAPICH2-1.4-RC1
In-Reply-To: <4A7B7B7D.6070302@llnl.gov>
References: <4A7B7B7D.6070302@llnl.gov>
Message-ID: <20090807011827.GA27360@cse.ohio-state.edu>

On Thu, Aug 06, 2009 at 05:55:25PM -0700, Adam Moody wrote:
> First, it's not clear which directory I should name here. We have pmi.h
> in /usr/include/slurm and libpmi.so in /usr/lib64. I tried
> --with-slurm=/usr/lib64 on the top-level configure, but that didn't work.

The path that you provide to --with-slurm should be /usr, if it is needed at all. Is /usr/lib64 part of your system's standard library search path? If so, this should work; otherwise the build system won't work in your case, because it explicitly appends /lib to the path given. You may want to try editing src/pmi/slurm/configure.in:44, re-running ./maint/updatefiles, and then running configure again to see if you get better results.

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo
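
Jonathan's explanation implies that the value given to --with-slurm should be the prefix above a lib directory that contains libpmi.so. The helper below is a minimal sketch (slurm_prefix_hint is a hypothetical name) that derives that prefix from the library's location and flags the case from this thread, where the library sits in lib64 rather than <prefix>/lib. It does not inspect what configure actually tests; it only encodes the /lib-appending behavior described above.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): suggest a --with-slurm value from the
# location of libpmi.so. Assumes, per this thread, that configure looks
# for the library in <prefix>/lib.
slurm_prefix_hint() {
  local lib=$1                   # e.g. /usr/lib64/libpmi.so
  local libdir=${lib%/*}         # directory holding the library
  local prefix=${libdir%/*}      # its parent, e.g. /usr
  if [ "${libdir##*/}" = lib ]; then
    echo "--with-slurm=$prefix"
  else
    echo "--with-slurm=$prefix (note: library is in $libdir, not $prefix/lib)"
  fi
}
```

For example, "slurm_prefix_hint /usr/lib64/libpmi.so" prints --with-slurm=/usr together with a note that the library is in /usr/lib64, which is the situation where the configure.in edit mentioned above may be needed.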
From moody20 at llnl.gov Thu Aug 6 21:29:32 2009
From: moody20 at llnl.gov (Adam Moody)
Date: Thu Aug 6 21:30:05 2009
Subject: [mvapich-discuss] Re: SLURM with MVAPICH2-1.4-RC1
In-Reply-To: <4A7B7B7D.6070302@llnl.gov>
References: <4A7B7B7D.6070302@llnl.gov>
Message-ID: <4A7B837C.2040500@llnl.gov>

I may have found the issue. I added "--with-pmi=slurm" and "--with-pm=no" to my configure. Here is my configure:

./configure \
  --prefix=$(PREFIXDIR) \
  --enable-sharedlibs=gcc \
  --enable-f77 --enable-f90 --enable-cxx \
  --enable-fast=O0 --enable-error-checking=all --enable-error-messages=all --disable-nmpi-as-mpi \
  --enable-g=all --enable-debuginfo \
  --with-pmi=slurm --with-pm=no --with-slurm=/usr/lib64 \
  --with-rdma=gen2 --with-ib-include=/usr/include --with-ib-lib=/usr/lib64 \
  --enable-romio --with-file-system=lustre+nfs+ufs \
  --disable-mpe --without-mpe

After configure/make/make install, the following command successfully ran two processes as part of the same MPI job:

srun -n2 -ppdebug --mpi=none ./mpiBench_mvapich2 Barrier

Assuming this is the right approach, you may want to list the --with-pmi and --with-pm settings in Section 4.3.1 "Using SLURM" of the User Guide so others don't hit the same issue.

Please let me know if I'm still missing anything.
Thanks,
-Adam

From polk678 at gmail.com Fri Aug 7 05:46:01 2009
From: polk678 at gmail.com (gossips J)
Date: Fri Aug 7 05:46:31 2009
Subject: [mvapich-discuss] Re: loopback with mvapich2-1.2p1
In-Reply-To:
References:
Message-ID:

Does high latency indicate loopback connections?

I am referring to the ethtool logs for my adapter interface, which has a counter that is incremented for every loopback connection made during the test (either IMB-MPI1 or IMB-EXT). I can see these loopback counters increment with Intel MPI using RDMA as the device, but not with RDSSM as the device. The same thing should be seen with MVAPICH2.

Am I referring to something other than loopback?

Polk.

On Thu, Aug 6, 2009 at 10:44 PM, Dhabaleswar Panda wrote:

> > This env variable does not result in loopback connections. I ran with
> > the adapter which supports loopback connection as well.
> > It makes non-loopback connections only.
>
> How are you detecting that it is not working? Have you run the OSU latency
> test on your adapter with the shared memory variable turned `on' and
> `off'? You should see a higher latency number when the loopback connection
> is used (i.e., with shared memory `off').

From bfp at purdue.edu Fri Aug 7 12:30:17 2009
From: bfp at purdue.edu (Bryan Putnam)
Date: Fri Aug 7 12:30:45 2009
Subject: [mvapich-discuss] mvapich2 and mpirun_rsh
Message-ID:

Hi All,

I've built mvapich2-1.4rc1 with configure

$MPI_SRC/configure \
  --with-rdma=gen2 \
  --with-ib-libpath=/usr/lib64 \
  --enable-fast \
  --enable-debuginfo \
  --enable-sharedlibs=gcc \
  --enable-f77 \
  --enable-f90 \
  --enable-cxx \
  --enable-romio \
  --with-pm=mpd \
  --without-mpe \
  --prefix=$MPI_INSTALL/$CVER \
  > configure_$CVER.log 2>&1

and everything works normally if I do

mpdboot -n #nodes -f $PBS_NODEFILE
mpiexec -n #procs ./a.out

However, if I attempt to use ssh instead of mpd, as in

mpirun_rsh -np #procs -hostfile $PBS_NODEFILE ./a.out

I see the errors

Fatal error in MPI_Init: Error message texts are not available
Fatal error in MPI_Init: Error message texts are not available
MPI process (rank: 1) terminated unexpectedly on coates-e000.rcac.purdue.edu

Note that rsh/ssh itself appears to be working OK.

Any help appreciated!
Thanks,
Bryan

From benbeeler at gatech.edu Fri Aug 7 12:35:29 2009
From: benbeeler at gatech.edu (benbeeler@gatech.edu)
Date: Fri Aug 7 12:52:58 2009
Subject: [mvapich-discuss] pgi compilation errors
In-Reply-To: <1317641513.883961249662885111.JavaMail.root@mail1.gatech.edu>
Message-ID: <517389526.884111249662929602.JavaMail.root@mail1.gatech.edu>

Greetings,

I am compiling mvapich with PGI and am getting an error when configure checks the preprocessor: it fails the sanity check that tests whether the preprocessor detects nonexistent headers. PGI is installed correctly on our system, so we shouldn't be getting an error here.

Attached are my config.log, config-mine.log, and the config.log from the romio subdirectory, which is where it fails. Any help is much appreciated.

Thanks.

Benjamin Warren Beeler
Nuclear and Radiological Engineering
Georgia Institute of Technology
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.log
Type: text/x-log
Size: 7362 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090807/9a282f4c/config-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config_romio.log
Type: text/x-log
Size: 14345 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090807/9a282f4c/config_romio-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config-mine.log
Type: text/x-log
Size: 13595 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090807/9a282f4c/config-mine-0001.bin

From Craig.Tierney at noaa.gov Mon Aug 10 13:37:55 2009
From: Craig.Tierney at noaa.gov (Craig Tierney)
Date: Mon Aug 10 13:38:50 2009
Subject: [mvapich-discuss] Can I start a job using mpirun_rsh without using ethernet?
In-Reply-To: <20090729210411.GT2447@cse.ohio-state.edu>
References: <7f9dfe490907212327o71c13803p70f155c1903b9cec@mail.gmail.com> <20090722165335.GC3023@cse.ohio-state.edu> <4A70B2D9.9060804@noaa.gov> <20090729210411.GT2447@cse.ohio-state.edu>
Message-ID: <4A805AF3.7040004@noaa.gov>

Jonathan Perkins wrote:
> On Wed, Jul 29, 2009 at 02:36:41PM -0600, Craig Tierney wrote:
>> I am trying to figure out how to launch a job with mpirun_rsh
>> without using the ethernet. If I specify the IPoIB addresses
>> in my machine file, then mpispawn is launched over the IB.
>> However, mpispawn still connects using the ethernet host names.
>
> In order to do this, the hostname returned by gethostbyname would
> have to resolve to the IPoIB address for that host. It sounds like
> this is not the case in your setup.
>

It turns out that the reason I couldn't launch jobs over the IB
was that mpirun_rsh.c uses gethostname to set the host
for the MPISPAWN_MPIRUN_HOST setting, which gets passed to all children.

I put a hack in there to use the IB interface, and I get what I want.

One more question... If mpirun_rsh is supposed to use a tree to launch
the job, why does every child first connect back to the MPISPAWN_MPIRUN_HOST
host directly and pass information before the tree part starts up?

Craig

-- 
Craig Tierney (craig.tierney@noaa.gov)

From Craig.Tierney at noaa.gov Mon Aug 10 13:40:41 2009
From: Craig.Tierney at noaa.gov (Craig Tierney)
Date: Mon Aug 10 13:41:14 2009
Subject: [mvapich-discuss] Question on how to debug job start failures
In-Reply-To:
References:
Message-ID: <4A805B99.2060100@noaa.gov>

I have determined that my problem was related to hardware that
was dropping or corrupting data that was being sent between mpispawn
processes for jobs larger than 512 cores (64 nodes). The magic
number may have to do with the fact that transfers started to exceed
8192 bytes, which is the setting of the MTU on our network.
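Craig's workaround replaces the gethostname() result that mpirun_rsh.c stores in MPISPAWN_MPIRUN_HOST with an address on the IB interface; his actual change was not posted. As a minimal sketch of one way to do that lookup, assuming a Linux/glibc system with getifaddrs(3), the hypothetical helper below returns the IPv4 address of a named interface. The interface name (for example "ib0") is an assumption and varies between systems.

```c
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not MVAPICH2 code): write the first IPv4
 * address assigned to interface `ifname` (e.g. an IPoIB interface
 * such as "ib0") into `out` in dotted-quad form.
 * Returns 0 on success, -1 if the interface has no IPv4 address
 * or the lookup fails. */
static int ipv4_of_interface(const char *ifname, char *out, socklen_t outlen)
{
    struct ifaddrs *ifs, *ifa;
    int rc = -1;

    if (getifaddrs(&ifs) != 0)
        return -1;

    for (ifa = ifs; ifa != NULL; ifa = ifa->ifa_next) {
        /* Skip entries without an address and non-IPv4 families. */
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        if (strcmp(ifa->ifa_name, ifname) != 0)
            continue;

        const struct sockaddr_in *sin =
            (const struct sockaddr_in *)(void *)ifa->ifa_addr;
        if (inet_ntop(AF_INET, &sin->sin_addr, out, outlen) != NULL)
            rc = 0;
        break;
    }

    freeifaddrs(ifs);
    return rc;
}
```

A launcher patched along these lines could try ipv4_of_interface("ib0", buf, sizeof buf) first and fall back to the gethostname() value when it returns -1, so nodes without IPoIB configured would still use the Ethernet path.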
We are working with SMC to find a solution to the problem. For now,
we hacked mpirun_rsh to launch the jobs over the IB.

Thanks for the help,
Craig


Dhabaleswar Panda wrote:
> Craig,
>
>> A follow-up to my problem. On the new Nehalem cluster (QDR, CentOS 5.3,
>> OFED-1.4.1, Mvapich-1.2p1), I am still having applications hang when using
>> mpirun_rsh. The problem seems to start around 512 cores, but it isn't exact.
>> Not sure if this helps, but Openmpi does not have an issue (but I know it has
>> a completely different launching mechanism).
>
> Does this happen with OFED 1.4? As you might have seen from the OFA
> mailing lists, there have been some issues related to NFS traffic with
> OFED 1.4.1.
>
>> The one similarity is that both systems are using SMC Tiger switch GigE switches
>> within the racks and uplink to a Force10 GigE switch (although the behavior
>> was repeated when the core switch was a Cisco unit).
>>
>> I have tried messing with MV2_MT_DEGREE. Setting this low, 4, seems to help
>> large jobs start, but it does not solve the problem.
>
> This is good to know. What happens if you reduce MV2_MT_DEGREE to 2? The
> job start-up might be slower. However, we need to see whether it is able
> to start the large-scale jobs.
>
>> So the problem could be hardware or a race condition caused in the software.
>> Any ideas of how to debug the software side (or both) would be appreciated.
>
> Thanks,
>
> DK
>
>> Thanks,
>> Craig
>>
>>>> Thanks,
>>>>
>>>> DK
>>>>
>>>> On Thu, 9 Jul 2009, Craig Tierney wrote:
>>>>
>>>>> Dhabaleswar Panda wrote:
>>>>>> Are you able to run simple MPI programs (say MPI Hello World) or some IMB
>>>>>> tests using ~512 cores or larger? This will help you to find out whether
>>>>>> there are any issues when launching jobs and isolate any nodes which might
>>>>>> be having problems.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> DK
>>>>>>
>>>>> I dug in further today while the system was offline, and this
>>>>> is what I found.
>>>>> The mpispawn process is hanging. When it hangs
>>>>> it does hang on different nodes each time. What I see is that
>>>>> one side thinks the connection is closed, and the other side waits.
>>>>>
>>>>> At one end:
>>>>>
>>>>> [root@h43 ~]# netstat
>>>>> Active Internet connections (w/o servers)
>>>>> Proto Recv-Q Send-Q Local Address      Foreign Address         State
>>>>> tcp        0      0 h43:50797          wms-sge:sge_qmaster     ESTABLISHED
>>>>> tcp        0      0 h43:816            jetsam1:nfs             ESTABLISHED
>>>>> tcp        0      0 h43:49730          h6:56443                ESTABLISHED
>>>>> tcp    31245      0 h43:49730          h4:41799                CLOSE_WAIT
>>>>> tcp        0      0 h43:ssh            h1:35169                ESTABLISHED
>>>>> tcp        0      0 h43:ssh            wfe7-eth2:51964         ESTABLISHED
>>>>>
>>>>> (gdb) bt
>>>>> #0  0x00002b1284f0e950 in __read_nocancel () from /lib64/libc.so.6
>>>>> #1  0x00000000004035ea in read_socket (socket=5, buffer=0x16dec8a0, bytes=640) at mpirun_util.c:97
>>>>> #2  0x000000000040402f in mpispawn_tree_init (me=5, req_socket=383699104) at mpispawn_tree.c:190
>>>>> #3  0x0000000000401a90 in main (argc=5, argv=0x16dec8a0) at mpispawn.c:496
>>>>>
>>>>> At other end (node h4):
>>>>>
>>>>> (gdb) bt
>>>>> #0  0x00002b95b77308d3 in __select_nocancel () from /lib64/libc.so.6
>>>>> #1  0x0000000000404379 in mtpmi_processops () at pmi_tree.c:754
>>>>> #2  0x0000000000401c32 in main (argc=1024, argv=0x6101a0) at mpispawn.c:525
>>>>>
>>>>> The netstat on h4 does not show any connections back to h43.
>>>>>
>>>>> I tried the latest 1.4Beta from the website (not svn), and I found that
>>>>> for large jobs mpirun_rsh will sometimes exit without running anything.
>>>>> The larger the job, the more likely it is not to start the job properly.
>>>>> The only difference is that it doesn't hang. I turned on debugging with
>>>>> MPISPAWN_DEBUG, but I didn't see anything interesting from that.
>>>>>
>>>>> Craig
>>>>>
>>>>>> On Wed, 8 Jul 2009, Craig Tierney wrote:
>>>>>>
>>>>>>> I am running mvapich2 1.2, built with Ofed support (v1.3.1).
>>>>>>> For large jobs, I am having problems where they do not start.
>>>>>>> I am using the mpirun_rsh launcher. When I try to start jobs
>>>>>>> with ~512 cores or larger, I can see the problem. The problem
>>>>>>> doesn't happen all the time.
>>>>>>>
>>>>>>> I can't rule out quirky hardware. The IB tree seems to be
>>>>>>> clean (as reported by ibdiagnet). On my last hang, I looked to
>>>>>>> see if xhpl had started on all the nodes (8 cores for each
>>>>>>> node for dual-socket quad-core systems). I found that 7 of
>>>>>>> the 245 nodes (1960 core job) had no xhpl processes on them.
>>>>>>> So either the launching mechanism hung, or something was up with one of
>>>>>>> those nodes.
>>>>>>>
>>>>>>> My question is, how should I start debugging this to understand
>>>>>>> what process is hanging?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Craig

-- 
Craig Tierney (craig.tierney@noaa.gov)

From panda at cse.ohio-state.edu Mon Aug 10 13:53:41 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon Aug 10 13:54:09 2009
Subject: [mvapich-discuss] Question on how to debug job start failures
In-Reply-To: <4A805B99.2060100@noaa.gov>
Message-ID:

Craig,

> I have determined that my problem was related to hardware that
> was dropping or corrupting data that was being sent between mpispawn
> processes for jobs larger than 512 cores (64 nodes). The magic
> number may have to do with the fact that transfers started to exceed
> 8192 bytes, which is the setting of the MTU on our network.

Thanks for the insights here.

> We are working with SMC to find a solution to the problem. For now,
> we hacked mpirun_rsh to launch the jobs over the IB.

Glad to know that things are working now.

Thanks,

DK

From perkinjo at cse.ohio-state.edu Mon Aug 10 14:18:50 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Mon Aug 10 14:19:18 2009
Subject: [mvapich-discuss] Can I start a job using mpirun_rsh without using ethernet?
In-Reply-To: <4A805AF3.7040004@noaa.gov>
References: <7f9dfe490907212327o71c13803p70f155c1903b9cec@mail.gmail.com> <20090722165335.GC3023@cse.ohio-state.edu> <4A70B2D9.9060804@noaa.gov> <20090729210411.GT2447@cse.ohio-state.edu> <4A805AF3.7040004@noaa.gov>
Message-ID: <20090810181850.GF2534@cse.ohio-state.edu>

On Mon, Aug 10, 2009 at 11:37:55AM -0600, Craig Tierney wrote:
> Jonathan Perkins wrote:
> > On Wed, Jul 29, 2009 at 02:36:41PM -0600, Craig Tierney wrote:
> >> I am trying to figure out how to launch a job with mpirun_rsh
> >> without using the ethernet.
> >> If I specify the IPoIB addresses
> >> in my machine file, then mpispawn is launched over the IB.
> >> However, mpispawn still connects using the ethernet host names.
> >
> > In order to do this, the hostname returned by gethostbyname would
> > have to resolve to the IPoIB address for that host. It sounds like
> > this is not the case in your setup.
> >
>
> It turns out that the reason I couldn't launch jobs over the IB
> was that mpirun_rsh.c uses gethostname to set the host
> for the MPISPAWN_MPIRUN_HOST setting, which gets passed to all children.
>
> I put a hack in there to use the IB interface, and I get what I want.
>
> One more question... If mpirun_rsh is supposed to use a tree to launch
> the job, why does every child first connect back to the MPISPAWN_MPIRUN_HOST
> host directly and pass information before the tree part starts up?

When mpispawn launches further mpispawns (creating a tree), it overrides
the value of MPISPAWN_MPIRUN_HOST so that its children will talk to it
instead of the root node.

-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From polk678 at gmail.com Tue Aug 11 01:18:18 2009
From: polk678 at gmail.com (gossips J)
Date: Tue Aug 11 01:18:47 2009
Subject: [mvapich-discuss] BUG in MVAPICH2-1.2p1 - OFA (RDMA) inside vbuf.c file while calling deallocate_vbufs()
Message-ID:

Hi,

It has been observed that deallocate_vbufs() has error handling for the ibv_dereg_mr() call.
If that call fails, mvapich2 calls ibv_error_abort(). However, before doing any of this, deallocate_vbufs() acquires a spin lock for the vbufs:

++++
pthread_spin_lock(&vbuf_lock);
++++

So ideally, before calling ibv_error_abort(), it should release this spin lock as well. If this is not done and the MR deregistration fails, the OS gives a kernel panic, since the spin lock has not been released.

This seems to be a bug in mvapich2-1.2p1-1.src.rpm shipped with OFED-1.4.1-GA. The following patch should fix it:

++++++++
--- src/mpid/ch3/channels/mrail/src/gen2/vbuf.c
+++ src/mpid/ch3/channels/mrail/src/gen2/vbuf_fixed.c
@@ -105,6 +105,7 @@ int init_vbuf_lock()
 void deallocate_vbufs(int hca_num)
 {
     vbuf_region *r = vbuf_region_head;
+    int err = 0;
 
 #if !defined(CKPT)
     if (MPIDI_CH3I_RDMA_Process.has_srq
@@ -122,7 +123,8 @@ void deallocate_vbufs(int hca_num)
         if (r->mem_handle[hca_num] != NULL
             && ibv_dereg_mr(r->mem_handle[hca_num]))
         {
-            ibv_error_abort(IBV_RETURN_ERR, "could not deregister MR");
+            err = -1;
+            break;
         }
 
         DEBUG_PRINT("deregister vbufs\n");
@@ -139,6 +141,9 @@ void deallocate_vbufs(int hca_num)
     {
         pthread_spin_unlock(&vbuf_lock);
     }
+
+    if (err < 0)
+        ibv_error_abort(IBV_RETURN_ERR, "could not deregister MR");
 }
 
 static int allocate_vbuf_region(int nvbufs)
++++++++

Thanks,
Polk

From polk678 at gmail.com Tue Aug 11 02:01:49 2009
From: polk678 at gmail.com (gossips J)
Date: Tue Aug 11 02:02:18 2009
Subject: [mvapich-discuss] handling of RDMA_CM_EVENT_REJECTED event in MVAPICH2-1.2p1
Message-ID:

Hi,

It looks like there is no handling for the RDMA_CM_EVENT_REJECTED event in "" file.

Shouldn't there be some sort of handling?

Is this feature (CM_REJECT) supported in mvapich2-1.2p1 over OFA?

In the current scenario, mvapich2 simply gets stuck during such an event, since there is no handling in MPI (mvapich2).
Thoughts?

Thanks,
Polk.

From subramon at cse.ohio-state.edu Tue Aug 11 09:03:36 2009
From: subramon at cse.ohio-state.edu (Hari Subramoni)
Date: Tue Aug 11 09:04:02 2009
Subject: [mvapich-discuss] handling of RDMA_CM_EVENT_REJECTED event in MVAPICH2-1.2p1
In-Reply-To:
Message-ID:

Hi Polk,

The RDMA_CM_EVENT_REJECTED event is handled inside mvapich2-1.2p1: on receiving this event, the task aborts. This is done in 'src/mpid/ch3/channels/mrail/src/gen2/rdma_cm.c', in the function 'ib_cma_event_handler'.

Please let us know if you have any further questions.

Thx,
Hari.

On Tue, 11 Aug 2009, gossips J wrote:

> It looks like there is no handling for RDMA_CM_EVENT_REJECTED event in ""
> file.

From doriankrause at web.de Tue Aug 11 17:36:53 2009
From: doriankrause at web.de (Dorian Krause)
Date: Tue Aug 11 17:38:17 2009
Subject: [mvapich-discuss] segmentation falut in MPI_Win_fence with #PE = 96
In-Reply-To: <517389526.884111249662929602.JavaMail.root@mail1.gatech.edu>
References: <517389526.884111249662929602.JavaMail.root@mail1.gatech.edu>
Message-ID: <4A81E475.2040802@web.de>

Dear list members,

I have a code which uses MPI_Put + MPI_Win_fence for communication. The code runs fine with OpenMPI (tested for 8, 16, 32, 48, 64, 96 processors without problems) and with mvapich for fewer than 96 processors (the maximum number I currently have access to).
The core I got shows me the following: #0 Post_Put_Put_Get_List (winptr=0x6e06a0, size=-1, dreg_tmp=, vc_ptr=0x10013c60, local_buf=0x7ffffac96e10, remote_buf=0x7ffffac96e08, length=4, lkeys=0x7ffffac96e1c, rkeys=0x7ffffac96e18, use_multi=0) at rdma_iba_1sc.c:1137 1137 ++(vc_ptr->mrail.rails[rail].postsend_times_1sc); (gdb) p rail No symbol "rail" in current context. (gdb) p vc_ptr $1 = (MPIDI_VC_t *) 0x10013c60 Current language: auto; currently c (gdb) p vc_ptr->mrail $2 = {num_rails = 1, rails = 0x0, next_packet_expected = 0, next_packet_tosend = 0, outstanding_eager_vbufs = 0, coalesce_vbuf = 0x0, rfp = {RDMA_send_buf_DMA = 0x0, RDMA_recv_buf_DMA = 0x0, RDMA_send_buf = 0x0, RDMA_recv_buf = 0x0, RDMA_send_buf_mr = {0x0, 0x0, 0x0, 0x0}, RDMA_recv_buf_mr = {0x0, 0x0, 0x0, 0x0}, RDMA_remote_buf_rkey = {0, 0, 0, 0}, rdma_credit = 0 '\0', remote_RDMA_buf = 0x0, phead_RDMA_send = 0, ptail_RDMA_send = 0, p_RDMA_recv = 0, p_RDMA_recv_tail = 0, eager_start_cnt = 0, in_polling_set = 0, cached_outgoing = 0x0, cached_incoming = 0x0, cached_hit = 0, cached_miss = 0}, srp = {credits = 0x0}, cmanager = {num_channels = 0, num_local_pollings = 0, msg_channels = 0x0, next_arriving = 0x0, inqueue = 0, prev = 0x0, next = 0x0, pending_vbuf = 0, vc = 0x0}, packetized_recv = 0x0, sreq_head = 0x0, sreq_tail = 0x0, nextflow = 0x0, inflow = 0, remote_vc_addr = 0} (gdb) p vc_ptr->mrail.rails $3 = (struct mrail_rail *) 0x0 (gdb) bt #0 Post_Put_Put_Get_List (winptr=0x6e06a0, size=-1, dreg_tmp=, vc_ptr=0x10013c60, local_buf=0x7ffffac96e10, remote_buf=0x7ffffac96e08, length=4, lkeys=0x7ffffac96e1c, rkeys=0x7ffffac96e18, use_multi=0) at rdma_iba_1sc.c:1137 #1 0x000000000044a09a in MPIDI_CH3I_RDMA_post (win_ptr=0x6e06a0, target_rank=0) at rdma_iba_1sc.c:476 #2 0x000000000045f434 in MPIDI_Win_fence (assert=12288, win_ptr=0x6e06a0) at ch3u_rma_sync.c:165 #3 0x000000000041fecd in PMPI_Win_fence (assert=12288, win=-1610612736) at win_fence.c:108 #4 0x0000000000409dfc in 
hgc::OscPt2PtCommunicationGraph::sendP2M (this=0x10806650, list=@0x10278fe0) at comm/Window.hh:81 #5 0x0000000000404a5d in main (argc=2, argv=0x7ffffac97398) at Scale4Bonn/scale.cc:129 Obviously vc_ptr->mrail.rails is NULL. Can you help me to understand why? The relevant code snippet is mWindow.fence(MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE); for(int k = 0; k < mTop.numprocs(); ++k) { if(1 == mMustResend[k]) { mWindow.put(&mSendBuf[k], 1, MPI_INT, k, mLocalGroup.myrank(), 1, MPI_INT); } } mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED | MPI_MODE_NOPUT); and on the receiver side I just have mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOPRECEDE); mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED | MPI_MODE_NOPUT); mWindow is an instance of a wrapper class about MPI_Window, The functions put and fence directly map to MPI_Win_put and MPI_Win_fence ... For this test I used mvapich2 1.4 rc1 configured with ./configure --prefix=/home/kraused/mvapich2/1.4rc1/gcc-4.1.2/ CFLAGS=-O0 -ggdb CXXFLAGS=-ggdb FCFLAGS=-ggdb Thanks for your help! Regards, Dorian From panda at cse.ohio-state.edu Tue Aug 11 17:55:00 2009 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Tue Aug 11 17:55:27 2009 Subject: [mvapich-discuss] segmentation falut in MPI_Win_fence with #PE = 96 In-Reply-To: <4A81E475.2040802@web.de> Message-ID: Dorian, Thanks for your report. Do you see this error with the latest trunk version of MVAPICH2 1.4? After the RC1 release, some fixes have gone into the trunk. We are in preparation to bring out RC2. We will also be taking a look at this issue in the mean time. Thanks, DK On Tue, 11 Aug 2009, Dorian Krause wrote: > Dear list members, > > I have a code which uses MPI_Put + MPI_Win_fence for communication. The > code runs fine with OpenMPI (tested for 8, 16, 32, 48, 64, 96 processors > without problems) and with mvapich for less than 96 processors (the > maximal number I have currently access to). 
The core I got shows me the > following: > > #0 Post_Put_Put_Get_List (winptr=0x6e06a0, size=-1, dreg_tmp= optimized out>, vc_ptr=0x10013c60, local_buf=0x7ffffac96e10, > remote_buf=0x7ffffac96e08, length=4, lkeys=0x7ffffac96e1c, > rkeys=0x7ffffac96e18, use_multi=0) at rdma_iba_1sc.c:1137 > 1137 ++(vc_ptr->mrail.rails[rail].postsend_times_1sc); > (gdb) p rail > No symbol "rail" in current context. > (gdb) p vc_ptr > $1 = (MPIDI_VC_t *) 0x10013c60 > Current language: auto; currently c > (gdb) p vc_ptr->mrail > $2 = {num_rails = 1, rails = 0x0, next_packet_expected = 0, > next_packet_tosend = 0, outstanding_eager_vbufs = 0, coalesce_vbuf = > 0x0, rfp = {RDMA_send_buf_DMA = 0x0, RDMA_recv_buf_DMA = 0x0, > RDMA_send_buf = 0x0, RDMA_recv_buf = 0x0, RDMA_send_buf_mr = {0x0, > 0x0, 0x0, 0x0}, RDMA_recv_buf_mr = {0x0, 0x0, 0x0, 0x0}, > RDMA_remote_buf_rkey = {0, 0, 0, 0}, rdma_credit = 0 '\0', > remote_RDMA_buf = 0x0, phead_RDMA_send = 0, ptail_RDMA_send = 0, > p_RDMA_recv = 0, p_RDMA_recv_tail = 0, eager_start_cnt = 0, > in_polling_set = 0, cached_outgoing = 0x0, cached_incoming = 0x0, > cached_hit = 0, cached_miss = 0}, srp = {credits = 0x0}, cmanager = > {num_channels = 0, num_local_pollings = 0, msg_channels = 0x0, > next_arriving = 0x0, inqueue = 0, prev = 0x0, next = 0x0, > pending_vbuf = 0, vc = 0x0}, packetized_recv = 0x0, sreq_head = 0x0, > sreq_tail = 0x0, nextflow = 0x0, inflow = 0, remote_vc_addr = 0} > (gdb) p vc_ptr->mrail.rails > $3 = (struct mrail_rail *) 0x0 > (gdb) bt > #0 Post_Put_Put_Get_List (winptr=0x6e06a0, size=-1, dreg_tmp= optimized out>, vc_ptr=0x10013c60, local_buf=0x7ffffac96e10, > remote_buf=0x7ffffac96e08, length=4, lkeys=0x7ffffac96e1c, > rkeys=0x7ffffac96e18, use_multi=0) at rdma_iba_1sc.c:1137 > #1 0x000000000044a09a in MPIDI_CH3I_RDMA_post (win_ptr=0x6e06a0, > target_rank=0) at rdma_iba_1sc.c:476 > #2 0x000000000045f434 in MPIDI_Win_fence (assert=12288, > win_ptr=0x6e06a0) at ch3u_rma_sync.c:165 > #3 0x000000000041fecd in PMPI_Win_fence 
(assert=12288, win=-1610612736) > at win_fence.c:108 > #4 0x0000000000409dfc in hgc::OscPt2PtCommunicationGraph::sendP2M > (this=0x10806650, list=@0x10278fe0) at comm/Window.hh:81 > #5 0x0000000000404a5d in main (argc=2, argv=0x7ffffac97398) at > Scale4Bonn/scale.cc:129 > > > Obviously vc_ptr->mrail.rails is NULL. Can you help me to understand why? > > The relevant code snippet is > > mWindow.fence(MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE); > for(int k = 0; k < mTop.numprocs(); ++k) { > if(1 == mMustResend[k]) { > mWindow.put(&mSendBuf[k], 1, MPI_INT, k, > mLocalGroup.myrank(), 1, MPI_INT); > } > } > mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED | > MPI_MODE_NOPUT); > > and on the receiver side I just have > > mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOPRECEDE); > mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED | > MPI_MODE_NOPUT); > > mWindow is an instance of a wrapper class about MPI_Window, The > functions put and fence directly map to MPI_Win_put and MPI_Win_fence ... > > For this test I used mvapich2 1.4 rc1 configured with > > ./configure --prefix=/home/kraused/mvapich2/1.4rc1/gcc-4.1.2/ CFLAGS=-O0 > -ggdb CXXFLAGS=-ggdb FCFLAGS=-ggdb > > Thanks for your help! > > Regards, > Dorian > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From gopalakk at cse.ohio-state.edu Wed Aug 12 00:03:18 2009 From: gopalakk at cse.ohio-state.edu (Karthik Gopalakrishnan) Date: Wed Aug 12 00:04:20 2009 Subject: [mvapich-discuss] BUG in MVAPICH2-1.2p1 - OFA (RDMA) inside vbuf.c file while calling deallocate_vbufs() In-Reply-To: References: Message-ID: <92eddfb50908112103r227d9dd8j6124345357ce85b5@mail.gmail.com> Hi Polk. Thanks for the patch. We will test and apply this to the final release of MVAPICH2-1.4. I agree that it is good programming practice to release locks that have been previously acquired, before you exit. 
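The practice Karthik refers to, as applied in Polk's patch quoted below (record the failure while the lock is held, break out of the loop, unlock, and only then abort), can be sketched in isolation. This is a minimal, hypothetical sketch and not MVAPICH2 code: struct region, fake_dereg() and deallocate_all() are stand-ins for vbuf_region, ibv_dereg_mr() and deallocate_vbufs(), and a C11 atomic_flag spinlock replaces pthread_spin_lock() so the example needs no extra libraries.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered buffer region (not MVAPICH2's vbuf_region). */
struct region {
    bool registered;
    bool dereg_fails;          /* test hook: make deregistration fail */
    struct region *next;
};

/* C11 spinlock standing in for pthread_spin_lock(&vbuf_lock). */
static atomic_flag list_lock = ATOMIC_FLAG_INIT;

static void spin_lock(atomic_flag *l)   { while (atomic_flag_test_and_set(l)) { /* spin */ } }
static void spin_unlock(atomic_flag *l) { atomic_flag_clear(l); }

/* Hypothetical stand-in for ibv_dereg_mr(): 0 on success, nonzero on failure. */
static int fake_dereg(struct region *r)
{
    if (r->dereg_fails)
        return -1;
    r->registered = false;
    return 0;
}

/* Deregister all regions under the lock. A failure is recorded and the loop
 * is left with break, so the unlock below always runs; the caller may abort
 * only after the lock has been released. Returns 0 on success, -1 on failure. */
int deallocate_all(struct region *head)
{
    int err = 0;

    spin_lock(&list_lock);
    for (struct region *r = head; r != NULL; r = r->next) {
        if (r->registered && fake_dereg(r) != 0) {
            err = -1;
            break;
        }
    }
    spin_unlock(&list_lock);

    return err;
}
```

A caller would invoke its abort routine (ibv_error_abort() in the patch) only when deallocate_all() returns nonzero, at which point the lock is already free.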
However, it should not make a difference even if we exit from the library without releasing the spinlock protecting the vbuf list head, which is internal to the MVAPICH2 stack. The MVAPICH2-1.2 library runs in user space. It *should* not result in a kernel panic. If it does, it means that some bug in the IB core / driver code has been exposed, and it should be fixed there. Can you please tell me how the function call stack appears during the panic? Does this happen consistently? Does your patch fix the said issue?

Regards,
Karthik

On Tue, Aug 11, 2009 at 1:18 AM, gossips J wrote:
> Hi,
> It is observed that while deallocate_vbufs() there is error handling for
> ibv_dereg_mr() API.
> This, if it fails, mvapich2 goes for ibv_error_abort() call.
> Now before doing all these stuff it has been observed that there is spin
> lock acquired for vBUF.
> ++++
> pthread_spin_lock(&vbuf_lock);
> ++++
> So ideally before calling ibv_error_abort(), it should release this spin
> lock as well.
> If this is not done and MR dereg fails, OS gives kernel panic since spin
> lock has not been released.
> This seems BUG in mvapich2-1.2p1-1.src.rpm coming with OFED-1.4.1-GA.
> Following patch should fix this:
> ++++++++
> --- src/mpid/ch3/channels/mrail/src/gen2/vbuf.c
> +++ src/mpid/ch3/channels/mrail/src/gen2/vbuf_fixed.c
> @@ -105,6 +105,7 @@ int init_vbuf_lock()
>  void deallocate_vbufs(int hca_num)
>  {
>      vbuf_region *r = vbuf_region_head;
> +    int err = 0;
>  #if !defined(CKPT)
>      if (MPIDI_CH3I_RDMA_Process.has_srq
> @@ -122,7 +123,8 @@ void deallocate_vbufs(int hca_num)
>          if (r->mem_handle[hca_num] != NULL
>              && ibv_dereg_mr(r->mem_handle[hca_num]))
>          {
> -            ibv_error_abort(IBV_RETURN_ERR, "could not deregister MR");
> +            err = -1;
> +            break;
>          }
>          DEBUG_PRINT("deregister vbufs\n");
> @@ -139,6 +141,9 @@ void deallocate_vbufs(int hca_num)
>      {
>          pthread_spin_unlock(&vbuf_lock);
>      }
> +
> +    if (err < 0)
> +        ibv_error_abort(IBV_RETURN_ERR, "could not deregister MR");
>  }
>  static int allocate_vbuf_region(int nvbufs)
> ++++++++
> Thanks,
> Polk
>

From polk678 at gmail.com Wed Aug 12 00:07:57 2009 From: polk678 at gmail.com (gossips J) Date: Wed Aug 12 00:08:55 2009 Subject: [mvapich-discuss] handling of RDMA_CM_EVENT_REJECTED event in MVAPICH2-1.2p1 In-Reply-To: References: Message-ID:

Hi Hari,
thanks for the response.

Yes, this event is considered as CONNECT_ERROR and the task aborts, but REJECT has no handling, so it does nothing.

Two questions:
1. Why should the task get aborted on a Connect Error event? As I can predict, the side effect would be affecting other connections with the same src-dst pair in a data transfer operation.

2. What happens in case of a Reject event? Mvapich simply has a DEBUG_PRINT and breaks out of the switch case.

Thanks,
Polk.

On Tue, Aug 11, 2009 at 6:33 PM, Hari Subramoni wrote:
> Hi Polk,
>
> The RDMA_CM_EVENT_REJECTED event is being handled inside mvapich2-1.2p1.
> On receiving this event, the task will abort.
>
> This is done in 'src/mpid/ch3/channels/mrail/src/gen2/rdma_cm.c' in the
> function 'ib_cma_event_handler'.
>
> Please let us know if you have any further questions.
>
> Thx,
> Hari.
>
> On Tue, 11 Aug 2009, gossips J wrote:
>
> > Hi,
> >
> > It looks like there is no handling for RDMA_CM_EVENT_REJECTED event in ""
> > file.
> >
> > There has to be some sort of handling, isnt it?
> >
> > Is this feature (CM_REJECT) supported in mvapich2-1.2p1 over OFA???
> >
> > In current scenario mvapich2 simply stuck during such event since there is
> > no handling in MPI (mvapich2).
> >
> > Thoughts???
> >
> > Thanks,
> > Polk.
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090812/ecbe2412/attachment-0001.html From subramon at cse.ohio-state.edu Wed Aug 12 08:05:01 2009 From: subramon at cse.ohio-state.edu (Hari Subramoni) Date: Wed Aug 12 08:05:29 2009 Subject: [mvapich-discuss] handling of RDMA_CM_EVENT_REJECTED event in MVAPICH2-1.2p1 In-Reply-To: Message-ID: Hi Polk, I'm sorry for saying that the job will be aborted. It will not get aborted when we get an RDMA_CM_EVENT_REJECTED. My bad. Rejecting the connection is done with the knowledge of the src/dest process and is not due to some fault in the system. We reject the connection only when the server and the client try to make a connection to each other at the same time. In this scenario, the connection request from the server process is rejected by the client. Hope this clarifies. Thx, Hari. On Wed, 12 Aug 2009, gossips J wrote: > Hi Hari, > thanks for the response. > > Yes, this event is considered as CONNECT_ERROR and task aborts but REJECT > has no handling so it does nothing. > > Two question: > 1. Why should task gets aborted on Connect Error event. As i can predict the > side effect would be affecting other connections with same src-dst pair in > data transfer operation. > > 2. What happens in case of Reject event? Mvapich simply has DEBUG_PRINT and > break out of switch case. > > Thanks, > Polk. > > On Tue, Aug 11, 2009 at 6:33 PM, Hari Subramoni > wrote: > > > Hi Polk, > > > > The RDMA_CM_EVENT_REJECTED event is being handled inside mvapich2-1.2p1. > > On receiving this event, the task will abort. > > > > This is done in 'src/mpid/ch3/channels/mrail/src/gen2/rdma_cm.c' in the > > function 'ib_cma_event_handler'. > > > > Please let us know if you have any further questions. > > > > Thx, > > Hari. > > > > On Tue, 11 Aug 2009, gossips J wrote: > > > > > Hi, > > > > > > It looks like there is no handling for RDMA_CM_EVENT_REJECTED event in "" > > > file.
> > > > > > There has to be some sort of handling, isnt it? > > > > > > Is this feature (CM_REJECT) supported in mvapich2-1.2p1 over OFA??? > > > > > > In current scenario mvapich2 simply stuck during such event since there > > is > > > no handling in MPI (mvapich2). > > > > > > Thoughts??? > > > > > > Thanks, > > > Polk. > > > > > > > > From michael.heinz at qlogic.com Wed Aug 12 11:40:27 2009 From: michael.heinz at qlogic.com (Mike Heinz) Date: Wed Aug 12 11:59:19 2009 Subject: [mvapich-discuss] [mpich2-dev] MVAPICH2 does not work with specified PKEYs. Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB45E7DF4E42@MNEXMB1.qlogic.org> My testers are reporting further problems with mvapich2. On a fabric where the use of pkeys is required, mvapich2 is failing. 1) The MV2_DEFAULT_PKEY parameter does not appear to be supported when using mpirun_rsh. 2) When using mpd and mpiexec, the MV2_DEFAULT_PKEY parameter gets passed, but then fails. For example: [root@homer mpi_apps]# export MV2_DEFAULT_PKEY=0xffff [root@homer mpi_apps]# /usr/mpi/gcc/mvapich2-1.2p1/bin/mpiexec -machinefile /opt/iba/src/mpi_apps/mpi_hosts -n 2 osu2/osu_bw [0] Abort: Can't find PKEY INDEX according to given PKEY at line 1190 in file rdma_iba_priv.c rank 0 in job 6 homer.dev.silverstorm.com_33133 caused collective abort of all ranks exit status of rank 0: killed by signal 9 (Note that 0xffff is actually the default PKEY). 
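For reference, an InfiniBand P_Key is a 16-bit value whose low 15 bits name the partition and whose high bit (0x8000) marks full membership, so 0xffff is the full-member form of the default partition and 0x7fff its limited-member form. The sketch below assumes the PKEY_MASK value of 0x7fff from the ibv_param.c excerpt Mike quotes next, and shows what strtol(value, NULL, 0) followed by that mask does to MV2_DEFAULT_PKEY=0xffff; parse_pkey_masked() and is_full_member() are hypothetical helpers, not MVAPICH2 functions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define PKEY_MASK        0x7fff  /* value from the quoted ibv_param.c excerpt */
#define PKEY_FULL_MEMBER 0x8000  /* high bit: 1 = full member, 0 = limited member */

/* Mirrors the excerpt: base 0 lets strtol accept "0xffff", then the mask
 * keeps only the low 15 bits (the partition number). */
uint16_t parse_pkey_masked(const char *value)
{
    return (uint16_t)((uint16_t)strtol(value, NULL, 0) & PKEY_MASK);
}

/* True if the membership bit is set. */
bool is_full_member(uint16_t pkey)
{
    return (pkey & PKEY_FULL_MEMBER) != 0;
}
```

After masking, the 0xffff given in the environment becomes 0x7fff, which on its own no longer carries the full-membership bit; whether that matters depends on how the masked value is later matched against the port's P_Key table.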
A quick saquery reveals that the pkey is, in fact, in the table:

[root@homer mpi_apps]# iba_saquery -o pkey -l 1
LID: 0x0001 PortNum: 1 BlockNum: 0
  0-  7: 0x9001 0xffff 0x9002 0x0000 0x0000 0x0000 0x0000 0x0000
  8- 15: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
 16- 23: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
 24- 31: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

When I examined ibv_param.c to see what was going on, here is what I found:

if ((value = getenv("MV2_DEFAULT_PKEY")) != NULL) {
    rdma_default_pkey = (uint16_t)strtol(value, (char **) NULL,0) & PKEY_MASK;
}

And...

#define PKEY_MASK 0x7fff /* the last bit is reserved */

This makes it clear that mpiexec is doing bad things to the pkey. If nothing else, the high bit must be set in order for the connection to have full membership in an InfiniBand partition. Without this bit set, a node has only "limited membership", and limited members are not permitted to talk to each other.

I'm going to see if I can quickly put together a patch for you that fixes the problems with mpiexec, but I'm not sure what the correct fix is for mpirun_rsh.

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

From doriankrause at web.de Wed Aug 12 12:04:09 2009 From: doriankrause at web.de (Dorian Krause) Date: Wed Aug 12 12:04:56 2009 Subject: [mvapich-discuss] segmentation falut in MPI_Win_fence with #PE = 96 In-Reply-To: References: Message-ID: <4A82E7F9.3010304@web.de>

Hi,

thanks for the note. The error is not present with the trunk version!

Btw: I had a hard time configuring the trunk with autoconf-2.6.4 (which seems to be the newest version). The maint/updatefiles script always failed in the F90 binding folder.
You can see the error below:

~/mvapich2/src/binding/f90 kraused$ autoconf -I ../../../confdb
configure.in:73: error: AC_LANG_CONFTEST: unknown language: Fortran 90
autoconf/lang.m4:215: AC_LANG_CONFTEST is expanded from...
autoconf/general.m4:2585: _AC_COMPILE_IFELSE is expanded from...
../../lib/m4sugar/m4sh.m4:624: AS_IF is expanded from...
autoconf/general.m4:2033: AC_CACHE_VAL is expanded from...
autoconf/general.m4:2046: AC_CACHE_CHECK is expanded from...
fortran90.m4:332: AC_PROG_F90 is expanded from...
fortran90.m4:953: PAC_PROG_F90 is expanded from...
configure.in:73: the top level
autom4te: /usr/bin/m4 failed with exit status: 1

I solved it by reverting to autoconf 2.6.3.

Should I post this to the mpich2 mailing list, or is there a difference between the mpich2 and mvapich2 configure scripts?

Thanks,
Dorian

Dhabaleswar Panda wrote: > Dorian, > > Thanks for your report. Do you see this error with the latest trunk > version of MVAPICH2 1.4? After the RC1 release, some fixes have gone into > the trunk. We are in preparation to bring out RC2. > > We will also be taking a look at this issue in the mean time. > > Thanks, > > DK > > On Tue, 11 Aug 2009, Dorian Krause wrote: > > >> Dear list members, >> >> I have a code which uses MPI_Put + MPI_Win_fence for communication. The >> code runs fine with OpenMPI (tested for 8, 16, 32, 48, 64, 96 processors >> without problems) and with mvapich for less than 96 processors (the >> maximal number I have currently access to). The core I got shows me the >> following: >> >> #0 Post_Put_Put_Get_List (winptr=0x6e06a0, size=-1, dreg_tmp=<value optimized out>, vc_ptr=0x10013c60, local_buf=0x7ffffac96e10, >> remote_buf=0x7ffffac96e08, length=4, lkeys=0x7ffffac96e1c, >> rkeys=0x7ffffac96e18, use_multi=0) at rdma_iba_1sc.c:1137 >> 1137 ++(vc_ptr->mrail.rails[rail].postsend_times_1sc); >> (gdb) p rail >> No symbol "rail" in current context.
>> (gdb) p vc_ptr >> $1 = (MPIDI_VC_t *) 0x10013c60 >> Current language: auto; currently c >> (gdb) p vc_ptr->mrail >> $2 = {num_rails = 1, rails = 0x0, next_packet_expected = 0, >> next_packet_tosend = 0, outstanding_eager_vbufs = 0, coalesce_vbuf = >> 0x0, rfp = {RDMA_send_buf_DMA = 0x0, RDMA_recv_buf_DMA = 0x0, >> RDMA_send_buf = 0x0, RDMA_recv_buf = 0x0, RDMA_send_buf_mr = {0x0, >> 0x0, 0x0, 0x0}, RDMA_recv_buf_mr = {0x0, 0x0, 0x0, 0x0}, >> RDMA_remote_buf_rkey = {0, 0, 0, 0}, rdma_credit = 0 '\0', >> remote_RDMA_buf = 0x0, phead_RDMA_send = 0, ptail_RDMA_send = 0, >> p_RDMA_recv = 0, p_RDMA_recv_tail = 0, eager_start_cnt = 0, >> in_polling_set = 0, cached_outgoing = 0x0, cached_incoming = 0x0, >> cached_hit = 0, cached_miss = 0}, srp = {credits = 0x0}, cmanager = >> {num_channels = 0, num_local_pollings = 0, msg_channels = 0x0, >> next_arriving = 0x0, inqueue = 0, prev = 0x0, next = 0x0, >> pending_vbuf = 0, vc = 0x0}, packetized_recv = 0x0, sreq_head = 0x0, >> sreq_tail = 0x0, nextflow = 0x0, inflow = 0, remote_vc_addr = 0} >> (gdb) p vc_ptr->mrail.rails >> $3 = (struct mrail_rail *) 0x0 >> (gdb) bt >> #0 Post_Put_Put_Get_List (winptr=0x6e06a0, size=-1, dreg_tmp=> optimized out>, vc_ptr=0x10013c60, local_buf=0x7ffffac96e10, >> remote_buf=0x7ffffac96e08, length=4, lkeys=0x7ffffac96e1c, >> rkeys=0x7ffffac96e18, use_multi=0) at rdma_iba_1sc.c:1137 >> #1 0x000000000044a09a in MPIDI_CH3I_RDMA_post (win_ptr=0x6e06a0, >> target_rank=0) at rdma_iba_1sc.c:476 >> #2 0x000000000045f434 in MPIDI_Win_fence (assert=12288, >> win_ptr=0x6e06a0) at ch3u_rma_sync.c:165 >> #3 0x000000000041fecd in PMPI_Win_fence (assert=12288, win=-1610612736) >> at win_fence.c:108 >> #4 0x0000000000409dfc in hgc::OscPt2PtCommunicationGraph::sendP2M >> (this=0x10806650, list=@0x10278fe0) at comm/Window.hh:81 >> #5 0x0000000000404a5d in main (argc=2, argv=0x7ffffac97398) at >> Scale4Bonn/scale.cc:129 >> >> >> Obviously vc_ptr->mrail.rails is NULL. Can you help me to understand why? 
>> >> The relevant code snippet is >> >> mWindow.fence(MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE); >> for(int k = 0; k < mTop.numprocs(); ++k) { >> if(1 == mMustResend[k]) { >> mWindow.put(&mSendBuf[k], 1, MPI_INT, k, >> mLocalGroup.myrank(), 1, MPI_INT); >> } >> } >> mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED | >> MPI_MODE_NOPUT); >> >> and on the receiver side I just have >> >> mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOPRECEDE); >> mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED | >> MPI_MODE_NOPUT); >> >> mWindow is an instance of a wrapper class about MPI_Window, The >> functions put and fence directly map to MPI_Win_put and MPI_Win_fence ... >> >> For this test I used mvapich2 1.4 rc1 configured with >> >> ./configure --prefix=/home/kraused/mvapich2/1.4rc1/gcc-4.1.2/ CFLAGS=-O0 >> -ggdb CXXFLAGS=-ggdb FCFLAGS=-ggdb >> >> Thanks for your help! >> >> Regards, >> Dorian >> >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >> >> > > > From balaji at mcs.anl.gov Wed Aug 12 13:54:36 2009 From: balaji at mcs.anl.gov (Pavan Balaji) Date: Wed Aug 12 13:55:24 2009 Subject: [mvapich-discuss] segmentation falut in MPI_Win_fence with #PE = 96 In-Reply-To: <4A82E7F9.3010304@web.de> References: <4A82E7F9.3010304@web.de> Message-ID: <4A8301DC.6010402@mcs.anl.gov> > Btw: I had a hard time to configure the trunk with autoconf-2.6.4 (which > seems to be the newest version). The maint/updatefiles always failed in > the F90 binding folder. You can see the error below: Yeah, I can reproduce this error with the mpich2 trunk. 
I've filed a ticket for this: https://trac.mcs.anl.gov/projects/mpich2/ticket/791

--
Pavan

--
Pavan Balaji
http://www.mcs.anl.gov/~balaji

From mohammad.rashti at queensu.ca Wed Aug 12 12:38:48 2009 From: mohammad.rashti at queensu.ca (Mohammad Javad Rashti) Date: Wed Aug 12 14:29:44 2009 Subject: [mvapich-discuss] Unknown Error in MVAPCH2 Message-ID: <3aea68870908120938h2251974dw663524b1f2c9678@mail.gmail.com>

Hi,
I am using MVAPICH2 1.0.3, osu_ch3 channel, over IB ConnectX cards. When running any MPI program, I get the following error:

Got unknown event 17 ... continuing ...

To me it appears to be an SM-related error, but I do not know the cause. Can anyone please help in this regard?

Thanks
Mohammad

From panda at cse.ohio-state.edu Wed Aug 12 14:53:07 2009 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Wed Aug 12 14:53:36 2009 Subject: [mvapich-discuss] segmentation falut in MPI_Win_fence with #PE = 96 In-Reply-To: <4A82E7F9.3010304@web.de> Message-ID:

> thanks for the note. The error is not present with the trunk version!

Thanks for confirming that it works with the trunk version. We also verified it. FYI, for the RC1 version, it will also work if you increase the on-demand threshold to the number of processes. This issue has been fixed in the trunk, and the fix will be available in the RC2 version.

> configure.in:73: the top level
> autom4te: /usr/bin/m4 failed with exit status: 1
>
> I solved by reverting to autoconf 2.6.3 ...
>
> Should I post this to the mpich2 mailing list or is there a difference
> between the mpich2 and mvapich2 configure scripts?

Thanks for indicating this. Pavan has created a trac entry to resolve this.
Since mvapich2 releases are typically based on previous mpich2 releases (for example, mvapich2 1.4 is based on mpich2 1.0.8, not 1.1.1), you may see some differences in the mpich2 and mvapich2 configure scripts. Thanks, DK > Thanks, > Dorian > > > Dhabaleswar Panda wrote: > > Dorian, > > > > Thanks for your report. Do you see this error with the latest trunk > > version of MVAPICH2 1.4? After the RC1 release, some fixes have gone into > > the trunk. We are in preparation to bring out RC2. > > > > We will also be taking a look at this issue in the mean time. > > > > Thanks, > > > > DK > > > > On Tue, 11 Aug 2009, Dorian Krause wrote: > > > > > >> Dear list members, > >> > >> I have a code which uses MPI_Put + MPI_Win_fence for communication. The > >> code runs fine with OpenMPI (tested for 8, 16, 32, 48, 64, 96 processors > >> without problems) and with mvapich for less than 96 processors (the > >> maximal number I have currently access to). The core I got shows me the > >> following: > >> > >> #0 Post_Put_Put_Get_List (winptr=0x6e06a0, size=-1, dreg_tmp= >> optimized out>, vc_ptr=0x10013c60, local_buf=0x7ffffac96e10, > >> remote_buf=0x7ffffac96e08, length=4, lkeys=0x7ffffac96e1c, > >> rkeys=0x7ffffac96e18, use_multi=0) at rdma_iba_1sc.c:1137 > >> 1137 ++(vc_ptr->mrail.rails[rail].postsend_times_1sc); > >> (gdb) p rail > >> No symbol "rail" in current context. 
> >> (gdb) p vc_ptr > >> $1 = (MPIDI_VC_t *) 0x10013c60 > >> Current language: auto; currently c > >> (gdb) p vc_ptr->mrail > >> $2 = {num_rails = 1, rails = 0x0, next_packet_expected = 0, > >> next_packet_tosend = 0, outstanding_eager_vbufs = 0, coalesce_vbuf = > >> 0x0, rfp = {RDMA_send_buf_DMA = 0x0, RDMA_recv_buf_DMA = 0x0, > >> RDMA_send_buf = 0x0, RDMA_recv_buf = 0x0, RDMA_send_buf_mr = {0x0, > >> 0x0, 0x0, 0x0}, RDMA_recv_buf_mr = {0x0, 0x0, 0x0, 0x0}, > >> RDMA_remote_buf_rkey = {0, 0, 0, 0}, rdma_credit = 0 '\0', > >> remote_RDMA_buf = 0x0, phead_RDMA_send = 0, ptail_RDMA_send = 0, > >> p_RDMA_recv = 0, p_RDMA_recv_tail = 0, eager_start_cnt = 0, > >> in_polling_set = 0, cached_outgoing = 0x0, cached_incoming = 0x0, > >> cached_hit = 0, cached_miss = 0}, srp = {credits = 0x0}, cmanager = > >> {num_channels = 0, num_local_pollings = 0, msg_channels = 0x0, > >> next_arriving = 0x0, inqueue = 0, prev = 0x0, next = 0x0, > >> pending_vbuf = 0, vc = 0x0}, packetized_recv = 0x0, sreq_head = 0x0, > >> sreq_tail = 0x0, nextflow = 0x0, inflow = 0, remote_vc_addr = 0} > >> (gdb) p vc_ptr->mrail.rails > >> $3 = (struct mrail_rail *) 0x0 > >> (gdb) bt > >> #0 Post_Put_Put_Get_List (winptr=0x6e06a0, size=-1, dreg_tmp= >> optimized out>, vc_ptr=0x10013c60, local_buf=0x7ffffac96e10, > >> remote_buf=0x7ffffac96e08, length=4, lkeys=0x7ffffac96e1c, > >> rkeys=0x7ffffac96e18, use_multi=0) at rdma_iba_1sc.c:1137 > >> #1 0x000000000044a09a in MPIDI_CH3I_RDMA_post (win_ptr=0x6e06a0, > >> target_rank=0) at rdma_iba_1sc.c:476 > >> #2 0x000000000045f434 in MPIDI_Win_fence (assert=12288, > >> win_ptr=0x6e06a0) at ch3u_rma_sync.c:165 > >> #3 0x000000000041fecd in PMPI_Win_fence (assert=12288, win=-1610612736) > >> at win_fence.c:108 > >> #4 0x0000000000409dfc in hgc::OscPt2PtCommunicationGraph::sendP2M > >> (this=0x10806650, list=@0x10278fe0) at comm/Window.hh:81 > >> #5 0x0000000000404a5d in main (argc=2, argv=0x7ffffac97398) at > >> Scale4Bonn/scale.cc:129 > >> > >> > >> Obviously 
vc_ptr->mrail.rails is NULL. Can you help me to understand why? > >> > >> The relevant code snippet is > >> > >> mWindow.fence(MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE); > >> for(int k = 0; k < mTop.numprocs(); ++k) { > >> if(1 == mMustResend[k]) { > >> mWindow.put(&mSendBuf[k], 1, MPI_INT, k, > >> mLocalGroup.myrank(), 1, MPI_INT); > >> } > >> } > >> mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED | > >> MPI_MODE_NOPUT); > >> > >> and on the receiver side I just have > >> > >> mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOPRECEDE); > >> mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED | > >> MPI_MODE_NOPUT); > >> > >> mWindow is an instance of a wrapper class about MPI_Window, The > >> functions put and fence directly map to MPI_Win_put and MPI_Win_fence ... > >> > >> For this test I used mvapich2 1.4 rc1 configured with > >> > >> ./configure --prefix=/home/kraused/mvapich2/1.4rc1/gcc-4.1.2/ CFLAGS=-O0 > >> -ggdb CXXFLAGS=-ggdb FCFLAGS=-ggdb > >> > >> Thanks for your help! > >> > >> Regards, > >> Dorian > >> > >

From subramon at cse.ohio-state.edu Thu Aug 13 00:01:16 2009 From: subramon at cse.ohio-state.edu (Hari Subramoni) Date: Thu Aug 13 00:01:44 2009 Subject: [mvapich-discuss] Unknown Error in MVAPCH2 In-Reply-To: <3aea68870908120938h2251974dw663524b1f2c9678@mail.gmail.com> Message-ID:

Hi Mohammad,

I believe this error message is printed by the asynchronous progress thread because it received an IBV_EVENT_CLIENT_REREGISTER event (event #17). Please note that MVAPICH2 1.0.3 is a relatively old version.
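The number in "Got unknown event 17" is a value of libibverbs' enum ibv_event_type, where 17 is IBV_EVENT_CLIENT_REREGISTER, the event raised when the subnet manager asks clients on a port to re-register (which fits Mohammad's guess that the SM is involved). Below is a minimal sketch of an async-event helper that prints a name instead of "unknown" and does not treat such port notifications as fatal. Real code would call ibv_event_type_str() from <infiniband/verbs.h>; to compile without libibverbs, this sketch keeps a local copy of the first 19 enum names (assumed to match the verbs.h order), and its is_fatal() policy is hypothetical.

```c
#include <stdbool.h>

/* Local copy of libibverbs' enum ibv_event_type names, values 0..18
 * (assumed verbs.h order; real code should use ibv_event_type_str()). */
static const char *const event_names[] = {
    "IBV_EVENT_CQ_ERR", "IBV_EVENT_QP_FATAL", "IBV_EVENT_QP_REQ_ERR",
    "IBV_EVENT_QP_ACCESS_ERR", "IBV_EVENT_COMM_EST", "IBV_EVENT_SQ_DRAINED",
    "IBV_EVENT_PATH_MIG", "IBV_EVENT_PATH_MIG_ERR", "IBV_EVENT_DEVICE_FATAL",
    "IBV_EVENT_PORT_ACTIVE", "IBV_EVENT_PORT_ERR", "IBV_EVENT_LID_CHANGE",
    "IBV_EVENT_PKEY_CHANGE", "IBV_EVENT_SM_CHANGE", "IBV_EVENT_SRQ_ERR",
    "IBV_EVENT_SRQ_LIMIT_REACHED", "IBV_EVENT_QP_LAST_WQE_REACHED",
    "IBV_EVENT_CLIENT_REREGISTER", "IBV_EVENT_GID_CHANGE",
};

/* Local numeric values used by the policy below (same assumption). */
enum {
    EV_CQ_ERR = 0, EV_QP_FATAL = 1, EV_QP_REQ_ERR = 2,
    EV_QP_ACCESS_ERR = 3, EV_DEVICE_FATAL = 8
};

/* Name for an event number, or "unknown" if it is outside the table. */
const char *event_name(int ev)
{
    int n = (int)(sizeof event_names / sizeof event_names[0]);
    return (ev >= 0 && ev < n) ? event_names[ev] : "unknown";
}

/* Hypothetical policy: QP/CQ/device errors are fatal; port and SM
 * notifications such as CLIENT_REREGISTER are logged and ignored. */
bool is_fatal(int ev)
{
    switch (ev) {
    case EV_CQ_ERR:
    case EV_QP_FATAL:
    case EV_QP_REQ_ERR:
    case EV_QP_ACCESS_ERR:
    case EV_DEVICE_FATAL:
        return true;
    default:
        return false;
    }
}
```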
You should upgrade to the latest MVAPICH2 1.4 release for better performance and features. Thx, Hari. On Wed, 12 Aug 2009, Mohammad Javad Rashti wrote: > Hi, > I am using MVAPICh2 1.0.3, osu_ch3 channel over IB ConnectX cards. > When running any MPI program, I get the following error: > > Got unknown event 17 ... continuing ... > > To me it appears to be an SM related error but I do not know the cause. > Can anyone please help in this regard? > > Thanks > Mohammad > From polk678 at gmail.com Thu Aug 13 00:27:36 2009 From: polk678 at gmail.com (gossips J) Date: Thu Aug 13 00:28:35 2009 Subject: [mvapich-discuss] handling of RDMA_CM_EVENT_REJECTED event in MVAPICH2-1.2p1 In-Reply-To: References: Message-ID: Hmm got the clarification. Thanks a lot. On Wed, Aug 12, 2009 at 5:35 PM, Hari Subramoni wrote: > Hi Polk, > > I'm sorry about saying that the job will be aborted. It will not get > aborted when we get a RDMA_CM_EVENT_REJECTED. My bad. > > Rejecting the connection is done with the knowledge of the src/dest > process and is not due to some fault in the system. > > We reject the connection only when the server and the client try to make a > connection to each other at the same time. In this secnario, the > connection request from the server process is rejected by the client. > > Hope this clarifies. > > Thx, > Hari. > > On Wed, 12 Aug 2009, gossips J wrote: > > > Hi Hari, > > thanks for the response. > > > > Yes, this event is considered as CONNECT_ERROR and task aborts but REJECT > > has no handling so it does nothing. > > > > Two question: > > 1. Why should task gets aborted on Connect Error event. As i can predict > the > > side effect would be affecting other connections with same src-dst pair > in > > data transfer operation. > > > > 2. What happens in case of Reject event? Mvapich simply has DEBUG_PRINT > and > > break out of switch case. > > > > Thanks, > > Polk. 
> > > > > > > > > > > > > > > >

From doriankrause at web.de Thu Aug 13 12:47:16 2009 From: doriankrause at web.de (Dorian Krause) Date: Thu Aug 13 12:48:22 2009 Subject: [mvapich-discuss] Hang in MPI_Isend/MPI_Recv combination Message-ID: <4A844394.80607@web.de>

Hi,

again these 96 processors ...

My application hangs in a communication step which looks like this:

---------
Group A:

for all neighbors {
    MPI_Isend(...);
}
MPI_Waitall(...);

MPI_Barrier();
----
Group B:
while(#messages to receive > 0) {
    MPI_Probe(MPI_ANY_SOURCE, &stat);
    q = stat.MPI_SOURCE
    /* in subfunction: */
    MPI_Probe(q, &stat)
    q = stat.MPI_COUNT;
    MPI_Recv(q, ...);
}
MPI_Barrier();
----

With 96 processes, this application hangs. Since I can't debug at this scale, I used gdb to get backtraces.
It turned out that 94 processes are waiting in the barrier, one process is trying to receive a message (stuck in MPI_Recv), and one other is waiting in MPI_Waitall(...). This looks fine; however, the ranks do not match:

On the PE with rank 83, I have

#3 0x00000000004349b9 in PMPI_Recv (buf=0x1bd96010, count=202, datatype=-1946157051, source=40, tag=374, comm=-1006632954, status=0x1) at recv.c:156

and on the PE with rank *12* I have

#3 0x00000000004368f4 in PMPI_Waitall (count=8, array_of_requests=0x197e6b10, array_of_statuses=0x1) at waitall.c:191

It seems that rank 40 slipped through the MPI_Waitall even though it was not supposed to do so ...

Please find attached the output files. There are three processes which seem not to be in the barrier (2 on compute-0-3 and 1 on compute-0-13), but the one with the short backtrace on compute-0-3 is also in the barrier, as I could confirm by hand.

Any hints what might cause this error?

I'm using the trunk version of mvapich2 (checked out yesterday) and the cluster consists of 14 LS22 blades (Opteron) with 4x DDR InfiniBand. I'm not quite sure which OFED version it is (it is delivered with the Rocks distribution, and they are typically not very verbose concerning version numbers ...).

Thanks for your help,
Dorian

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gdbout.tar.gz
Type: application/x-gzip
Size: 17140 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090813/417da3f3/gdbout.tar-0001.bin

From kandalla at cse.ohio-state.edu Thu Aug 13 14:24:51 2009 From: kandalla at cse.ohio-state.edu (Krishna Chaitanya Kandalla) Date: Thu Aug 13 14:25:24 2009 Subject: [mvapich-discuss] Hang in MPI_Isend/MPI_Recv combination In-Reply-To: <4A844394.80607@web.de> References: <4A844394.80607@web.de> Message-ID: <4A845A73.4000205@cse.ohio-state.edu>

Dorian,

I have taken a quick look at the set of back-traces.
Is it possible to give us a copy of the application that you are running? I noticed that the application is possibly changing the topology before it gets inside the MPI layer and hangs. I am also guessing that the code snippet you provided is related to what is going on inside hgc::comm::Topology::barrier. But we don't quite know how the set "all neighbors" has been set up. If we can run the application on our systems here, it would be easier to figure out what is going on.

Thanks,
Krishna

Dorian Krause wrote: > Hi, > > again these 96 processors ... > > My application hangs in a communication step which looks like this: > > --------- > Group A: > > for all neighbors { > MPI_Isend(...); > } > MPI_Waitall(...); > > MPI_Barrier(); > ---- > Group B: > while(#messages to receive > 0) { > MPI_Probe(MPI_ANY_SOURCE, &stat); > q = stat.MPI_SOURCE > /* in subfunction: */ > MPI_Probe(q, &stat) > q = stat.MPI_COUNT; > MPI_Recv(q, ...); > } > MPI_Barrier(); > ---- > > for more 96 processes this application hangs. Since I can't debug on > this scale, I used gdb to get backtraces. It tourned out that 94 > processes are waiting in the barrier, One processor is trying to > receive a message (stuck in MPI_Recv) and one other is waiting in > MPI_Waitall(...). This looks fine, however the ranks do not match: > > On the PE with rank 83, I have > > #3 0x00000000004349b9 in PMPI_Recv (buf=0x1bd96010, count=202, > datatype=-1946157051, source=40, tag=374, comm=-1006632954, > status=0x1) > at recv.c:156 > > and on PE with rank *12* I have > > #3 0x00000000004368f4 in PMPI_Waitall (count=8, > array_of_requests=0x197e6b10, array_of_statuses=0x1) > at waitall.c:191 > > It seems that rank 40 slipped throught the MPI_Waitall eventhough he > was not supposed to do so ... > > Please find attached the output files.
There are three processes which > seem to be not in the barrier (2 on compute-0-3 and 1 on compute-0-13 > but the one with the short backtrace on compute-0-3 is also in the > barrier as I could confirm by hand). > > Any hints what might cause this error? > > I'm using the trunk version of mvapich2 (check-out yesterday) and the > cluster consists of 14 LS22 blades (opteron) with 4x DDR Infiniband. > I'm not quite sure which ofed version it is (it is delivered with the > rocks distribution and they are typically not very verbose concerning > version numbers ...). > > Thanks for your help, > Dorian > > > > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From subramon at cse.ohio-state.edu Fri Aug 14 10:27:30 2009 From: subramon at cse.ohio-state.edu (Hari Subramoni) Date: Fri Aug 14 10:27:57 2009 Subject: [mvapich-discuss] [mpich2-dev] MVAPICH2 does not work with specified PKEYs. In-Reply-To: <4C2744E8AD2982428C5BFE523DF8CDCB45E7DF4E42@MNEXMB1.qlogic.org> Message-ID: Hi Mike, I just verified with the latest trunk version of the MVAPICH2 stack (it should be the same for older revisions as well) that the MV2_DEFAULT_PKEY parameter is passed correctly by mpirun_rsh to the spawned processes. Environment variables such as 'MV2_DEFAULT_PKEY' are passed differently in the mpirun_rsh framework and in mpiexec. For mpirun_rsh, we need to give them along with the command itself, as shown below. ./bin/mpirun_rsh -np 2 amd1 amd2 MV2_DEFAULT_PKEY=0xffff ./a.out For mpiexec, we generally export them to the environment. Could this be the reason you observed that mpirun_rsh is not passing the environment variable properly to the processes?
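Because mpirun_rsh takes MV2_* settings as NAME=VALUE words placed after the host list and before the executable (as in Hari's example above), while mpiexec picks them up from the exported environment, a job script that already exports its MV2_* variables could forward them to mpirun_rsh by rebuilding the command line. The C sketch below only illustrates that idea under stated assumptions: collect_mv2_vars() and build_mpirun_rsh_cmd() are hypothetical helpers, not part of MVAPICH2, and values are not shell-quoted, so it assumes they contain no spaces or shell metacharacters.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy pointers to every "MV2_NAME=value" entry of envp
 * into out (at most max entries). Returns the number of entries copied. */
size_t collect_mv2_vars(char *const envp[], const char *out[], size_t max)
{
    size_t n = 0;
    for (size_t i = 0; envp[i] != NULL && n < max; i++) {
        if (strncmp(envp[i], "MV2_", 4) == 0 && strchr(envp[i], '=') != NULL)
            out[n++] = envp[i];
    }
    return n;
}

/* Append one word to buf, separated from the previous one by a space.
 * Returns -1 if the word (plus the terminating '\0') does not fit. */
static int append_word(char *buf, size_t len, size_t *pos, const char *word)
{
    int w = snprintf(buf + *pos, len - *pos, "%s%s", *pos ? " " : "", word);
    if (w < 0 || (size_t)w >= len - *pos)
        return -1;
    *pos += (size_t)w;
    return 0;
}

/* Hypothetical helper: build "mpirun_rsh -np N host... MV2_X=... prog" into
 * buf. Values are NOT shell-quoted. Returns 0 on success, -1 on truncation. */
int build_mpirun_rsh_cmd(char *buf, size_t len, int np,
                         const char *const hosts[], size_t nhosts,
                         const char *const vars[], size_t nvars,
                         const char *prog)
{
    char npstr[16];
    size_t pos = 0;

    if (len == 0)
        return -1;
    snprintf(npstr, sizeof npstr, "%d", np);
    if (append_word(buf, len, &pos, "mpirun_rsh") ||
        append_word(buf, len, &pos, "-np") ||
        append_word(buf, len, &pos, npstr))
        return -1;
    for (size_t i = 0; i < nhosts; i++)
        if (append_word(buf, len, &pos, hosts[i]))
            return -1;
    /* MV2_* settings go after the host list and before the executable. */
    for (size_t i = 0; i < nvars; i++)
        if (append_word(buf, len, &pos, vars[i]))
            return -1;
    return append_word(buf, len, &pos, prog);
}
```

In a real wrapper the environment would come from the POSIX `environ` array (declared as `extern char **environ;`), and the resulting string would be handed to a shell or split back into an argv for execvp().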
As to the error you see with mpiexec (which you should also see with mpirun_rsh) when setting MV2_DEFAULT_PKEY to some user-defined value, it is caused by a small coding error in the PKEY comparison. The following one-line patch should fix it. Could you please apply it to your trunk version of MVAPICH2 and let us know if things work fine? Index: src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_priv.c =================================================================== --- src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_priv.c (revision 3451) +++ src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_priv.c (working copy) @@ -1590,7 +1590,7 @@ uint16_t curr_pkey; ibv_query_pkey(MPIDI_CH3I_RDMA_Process.nic_context[hca_num], (uint8_t)port_num, (int)i ,&curr_pkey); - if (pkey == ntohs(curr_pkey) & PKEY_MASK) { + if (pkey == (ntohs(curr_pkey) & PKEY_MASK)) { *index = i; return 1; } I am not very familiar with PKEYs, so I will have to look it up and get back to you on the reason for using the PKEY_MASK. Sorry about the delay. Thx, Hari. On Wed, 12 Aug 2009, Mike Heinz wrote: > My testers are reporting further problems with mvapich2. On a fabric where the use of pkeys is required, mvapich2 is failing. > > > 1) The MV2_DEFAULT_PKEY parameter does not appear to be supported when using mpirun_rsh. > > 2) When using mpd and mpiexec, the MV2_DEFAULT_PKEY parameter gets passed, but then fails. For example: > > [root@homer mpi_apps]# export MV2_DEFAULT_PKEY=0xffff > [root@homer mpi_apps]# /usr/mpi/gcc/mvapich2-1.2p1/bin/mpiexec -machinefile /opt/iba/src/mpi_apps/mpi_hosts -n 2 osu2/osu_bw > [0] Abort: Can't find PKEY INDEX according to given PKEY > at line 1190 in file rdma_iba_priv.c > rank 0 in job 6 homer.dev.silverstorm.com_33133 caused collective abort of all ranks > exit status of rank 0: killed by signal 9 > > (Note that 0xffff is actually the default PKEY).
> > A quick saquery reveals that the pkey is, in fact in the table: > > [root@homer mpi_apps]# iba_saquery -o pkey -l 1 > LID: 0x0001 PortNum: 1 BlockNum: 0 > 0- 7: 0x9001 0xffff 0x9002 0x0000 0x0000 0x0000 0x0000 0x0000 > 8- 15: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 > 16- 23: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 > 24- 31: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 > > When I examine ibv_param.c to see what was going on, here is what I found: > > if ((value = getenv("MV2_DEFAULT_PKEY")) != NULL) { > rdma_default_pkey = (uint16_t)strtol(value, (char **) NULL,0) & PKEY_MASK; > } > And... > > #define PKEY_MASK 0x7fff /* the last bit is reserved */ > > This makes it clear that mpiexec is doing bad things to the pkey - if nothing else, the high bit must be set in order for the connection to have full membership in an Infiniband partition. Without setting this bit, a node will only have "limited membership", and limited nodes are not permitted to talk to each other. > > I'm going to try and see if I can quickly put together a patch for you that fixes the problems with mpiexec - but I'm not sure what the correct fix is for mpirun_rsh. > > -- > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > From michael.heinz at qlogic.com Fri Aug 14 12:06:57 2009 From: michael.heinz at qlogic.com (Mike Heinz) Date: Fri Aug 14 12:29:43 2009 Subject: [mvapich-discuss] MVAPICH2-1.2, MVAPICH2-1.4 do not work with specified PKEYs. Proposed patch included Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB45E7DF4F22@MNEXMB1.qlogic.org> My testers are reporting further problems with mvapich2. On a fabric where the use of pkeys is required, mvapich2 is failing. This has two causes: 1) The MV2_DEFAULT_PKEY parameter does not appear to be supported when using mpirun_rsh. Actually, it does not appear that mpirun_rsh supports any MV2 parameters. 
2) When using mpd and mpiexec, the MV2_DEFAULT_PKEY parameter gets passed, but then fails. For example: [root@homer mpi_apps]# export MV2_DEFAULT_PKEY=0xffff [root@homer mpi_apps]# /usr/mpi/gcc/mvapich2-1.2p1/bin/mpiexec -machinefile /opt/iba/src/mpi_apps/mpi_hosts -n 2 osu2/osu_bw [0] Abort: Can't find PKEY INDEX according to given PKEY at line 1190 in file rdma_iba_priv.c rank 0 in job 6 homer.dev.silverstorm.com_33133 caused collective abort of all ranks exit status of rank 0: killed by signal 9 (Note that 0xffff is actually the default PKEY). A quick saquery reveals that the pkey is, in fact in the table: [root@homer mpi_apps]# iba_saquery -o pkey -l 1 LID: 0x0001 PortNum: 1 BlockNum: 0 0- 7: 0x9001 0xffff 0x9002 0x0000 0x0000 0x0000 0x0000 0x0000 8- 15: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 16- 23: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 24- 31: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 When I examined ibv_param.c to see what was going on, here is what I found: if ((value = getenv("MV2_DEFAULT_PKEY")) != NULL) { rdma_default_pkey = (uint16_t)strtol(value, (char **) NULL,0) & PKEY_MASK; } And... #define PKEY_MASK 0x7fff /* the last bit is reserved */ This makes it clear that mpiexec is doing bad things to the pkey - if nothing else, the high bit must be set in order for the connection to have full membership in an Infiniband partition. Without setting this bit, a node will only have "limited membership", and limited nodes are not permitted to talk to each other. The following patch fixes the errors in masking and comparing pkeys in mvapich2-1.2p1. The patch also works for mvapich2-1.4rc1, but with considerable fuzz.
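The two problems discussed in this thread (the membership bit being cleared when MV2_DEFAULT_PKEY is parsed in ibv_param.c, and the missing parentheses in the PKEY comparison in rdma_iba_priv.c) can be reproduced in isolation with the standalone C sketch below. It is not MVAPICH2 code: the function names are hypothetical, ntohs() is left out (table entries are assumed to already be in host byte order), and the table is the one from the iba_saquery output above. PKEY_MASK and PKEY_FULL_MEMBERSHIP use the values quoted in the thread. The buggy comparison is written with the parentheses the compiler implicitly applies, because `==` binds more tightly than `&`.

```c
#include <stdint.h>
#include <stdlib.h>

#define PKEY_MASK            0x7fffu /* low 15 bits: partition key value */
#define PKEY_FULL_MEMBERSHIP 0x8000u /* high bit: full (not limited) member */

/* Parsing as in the original ibv_param.c: the membership bit is cleared. */
uint16_t parse_pkey_masked(const char *s)
{
    return (uint16_t)(strtol(s, NULL, 0) & PKEY_MASK);
}

/* Parsing as in the proposed patch: the membership bit is forced on. */
uint16_t parse_pkey_full(const char *s)
{
    return (uint16_t)(strtol(s, NULL, 0) | PKEY_FULL_MEMBERSHIP);
}

/* The original test `pkey == entry & PKEY_MASK` parses like this, so the
 * table entry is never masked before the comparison. */
int pkey_match_buggy(uint16_t pkey, uint16_t entry)
{
    return (pkey == entry) & PKEY_MASK;
}

/* Mask both sides, then compare (the comparison from the proposed patch). */
int pkey_match_fixed(uint16_t pkey, uint16_t entry)
{
    return (pkey & PKEY_MASK) == (entry & PKEY_MASK);
}

/* Return the first table index whose entry matches pkey, or -1, in the
 * spirit of the lookup loop in rdma_iba_priv.c. */
int find_pkey_index(const uint16_t table[], int n, uint16_t pkey,
                    int (*match)(uint16_t, uint16_t))
{
    for (int i = 0; i < n; i++)
        if (match(pkey, table[i]))
            return i;
    return -1;
}
```

Run against the iba_saquery table above, the original combination (masked parse plus unparenthesized comparison) finds no index, which is consistent with the "Can't find PKEY INDEX" abort, while masking both sides of the comparison finds 0xffff at index 1.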
################################################################################################3 diff -rwud mvapich2-1.2p1.orig/src/mpid/ch3/channels/mrail/src/gen2/ibv_param.c mvapich2-1.2p1/src/mpid/ch3/channels/mrail/src/gen2/ibv_param.c --- mvapich2-1.2p1.orig/src/mpid/ch3/channels/mrail/src/gen2/ibv_param.c 2008-11-02 14:44:32.000000000 -0500 +++ mvapich2-1.2p1/src/mpid/ch3/channels/mrail/src/gen2/ibv_param.c 2009-08-14 09:35:07.000000000 -0400 @@ -984,7 +984,7 @@ } if ((value = getenv("MV2_DEFAULT_PKEY")) != NULL) { - rdma_default_pkey = (uint16_t)strtol(value, (char **) NULL,0) & PKEY_MASK; + rdma_default_pkey = (uint16_t)strtol(value, (char **) NULL,0) | PKEY_FULL_MEMBERSHIP; } if ((value = getenv("MV2_DEFAULT_MIN_RNR_TIMER")) != NULL) { diff -rwud mvapich2-1.2p1.orig/src/mpid/ch3/channels/mrail/src/gen2/ibv_param.h mvapich2-1.2p1/src/mpid/ch3/channels/mrail/src/gen2/ibv_param.h --- mvapich2-1.2p1.orig/src/mpid/ch3/channels/mrail/src/gen2/ibv_param.h 2008-10-29 12:55:43.000000000 -0400 +++ mvapich2-1.2p1/src/mpid/ch3/channels/mrail/src/gen2/ibv_param.h 2009-08-12 12:24:12.000000000 -0400 @@ -99,7 +99,8 @@ extern unsigned long rdma_spin_count; extern int USE_SMP; -#define PKEY_MASK 0x7fff /* the last bit is reserved */ +#define PKEY_MASK 0x7fff /* don't use the high bit when looking up pkeys. */ +#define PKEY_FULL_MEMBERSHIP 0x8000 /* MPI apps must be full members. 
*/ #define RDMA_PIN_POOL_SIZE (2*1024*1024) #define RDMA_DEFAULT_MAX_CQ_SIZE (40000) #define RDMA_DEFAULT_PORT (-1) diff -rwud mvapich2-1.2p1.orig/src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_priv.c mvapich2-1.2p1/src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_priv.c --- mvapich2-1.2p1.orig/src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_priv.c 2008-10-29 12:55:43.000000000 -0400 +++ mvapich2-1.2p1/src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_priv.c 2009-08-14 09:35:25.000000000 -0400 @@ -1161,7 +1161,7 @@ uint16_t curr_pkey; ibv_query_pkey(MPIDI_CH3I_RDMA_Process.nic_context[hca_num], (uint8_t)port_num, (int)i ,&curr_pkey); - if (pkey == ntohs(curr_pkey) & PKEY_MASK) { + if ((pkey & PKEY_MASK) == (ntohs(curr_pkey) & PKEY_MASK)) { *index = i; return 1; } ################################################################### On the subject of mpirun_rsh, it would be easy enough to patch it so that it respects MV2_* variables the way it currently respects VIADEV_* variables, but I'd like to understand why it doesn't already do that - is there a reason mpirun_rsh requires you to specify MV2_* variables on the command line instead of in environment or the parameter file? -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania From michael.heinz at qlogic.com Fri Aug 14 13:17:33 2009 From: michael.heinz at qlogic.com (Mike Heinz) Date: Fri Aug 14 13:18:36 2009 Subject: [mvapich-discuss] MVAPICH2-1.2, MVAPICH2-1.4 do not work with specified PKEYs. Proposed patch included In-Reply-To: <4C2744E8AD2982428C5BFE523DF8CDCB45E7DF4F22@MNEXMB1.qlogic.org> References: <4C2744E8AD2982428C5BFE523DF8CDCB45E7DF4F22@MNEXMB1.qlogic.org> Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB45E7DF4F2F@MNEXMB1.qlogic.org> It looks like Hari and I crossed in the email today. 
My previously submitted patch still holds, but I'd still like to understand the philosophy behind letting some MPI settings be passed in the old parameter file or in the environment while others have to be passed on the command line. From my point of view, it would be much easier on the job scheduling software if mpiexec and mpirun_rsh used the same methods to pass parameters. -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania From doriankrause at web.de Fri Aug 14 18:44:49 2009 From: doriankrause at web.de (Dorian Krause) Date: Fri Aug 14 18:45:49 2009 Subject: [mvapich-discuss] Hang in MPI_Isend/MPI_Recv combination Message-ID: <1335902065@web.de> Hi Krishna, > Dorian, > Good to know that the temporary work around has worked for > you too. But, this indicates that there is still something wrong with > our library. We will try to figure out a more concrete fix in the coming > few days. Thanks for your help with this! > Thanks for sending the profiling information. I will take a > look at it. Also, I was also wondering along these lines : With my > understanding of the application so far, the code snippet that you had > sent us separates two communication phases of the application. You are referring to the first code snippet I sent? This is the second communication step. You're right: There are two steps. 1. Exchange Graph edges -> Here implemented by an MPI_Alltoall call on an intercommunicator with Group A sending and Group B only receiving (i.e. sendcounts all zero) 2. Exchange Data along the edges -> Done by the MPI_Isend/MPI_Probe in this case (so I actually don't use all the information available, but the receiver only needs to know the number of senders to know how many messages it has to probe for). During > the execution of this code, most of the processes are either waiting > inside a barrier or the waitall calls. I hope not. Group A and Group B are disjoint and their union is the group of all processors.
Therefore all processes are either sending or receiving. The barrier is only used for my timing ... Since there are so many processes > involved, is it possible that we missed at least one process that was > still in the previous phase of the application? I was wondering if we > could have each process make a call to barrier at the beginning of this > code so that we can know for sure that all the processes have completed > executing up to this phase. Please let me know if this is feasible and if > you make such a change in the code and re-send it. In principle it shouldn't be necessary, as there is an allreduce in the methods which I inserted to check that the number of sent messages and the number of (expected) messages to receive match. However, to be sure, I added the barrier. Please find attached the patch for this (let me know if it works, I'm not used to creating patches ...). In my test, the outcome is the same. > And we are also interested in looking at the performance > comparisons that you were speaking about. Perfect. I will take some new measurements with the parameters you sent me and will prepare some graphs ... Thanks, Dorian > > Thanks, > Krishna > > Dorian Krause wrote: > > Hi Krishna, > > > > sorry, I always forget to send to the list in cc ... > > > > I have tested the code with open-mpi and varying the eager size limit > > (which is 12kb for the openib btl by default) down to 1kb. It still > > works, > > > > Thanks, > > Dorian > > > > Krishna Chaitanya Kandalla wrote: > >> Dorian, > >> On our systems, by tweaking a few parameters, I was able to get > >> the application to complete up to 128 processes. You can probably run > >> your application in the following manner and let us know if it works > >> for you too. > >> > >> mpirun_rsh -np 128 -hostfile ./hosts MV2_IBA_EAGER_THRESHOLD=16384 > >> MV2_VBUF_TOTAL_SIZE=16384 scale_Trans_AlltoalPt2Pt abcdefg > >> > >> > How is the nonblocking communication implemented?
 > >> > >> Non-blocking calls are designed to provide overlap between > >> communication and computation. Calls to MPI_Isend and MPI_Irecv > >> return without waiting for a confirmation from the library if the > >> message has actually been sent/received. The applications are > >> supposed to do an MPI_Wait later to make sure that the exchange has > >> been completed. So, as long as the user does not touch the buffers > >> that were used for the Isend and Irecv calls, things should be ok. > >> In MVAPICH2, the pt2pt calls use the "eager" protocol for messages of > >> size less than about 8K and the rendezvous protocol for larger > >> messages. By using the above run-time flags, we can alter the > >> threshold between eager and rndv messages. It's not clear as to how > >> the application passes when this threshold is set to 16K. Do you > >> have any profiling information regarding the size of messages > >> exchanged? Also, I noticed a lot of calls to Alltoall being made. It > >> will help if you can provide us some information about the size of > >> the buffers for the alltoall operations too. > >> > >> > >> > >> Thanks, > >> Krishna > >> > >> > >> Dorian Krause wrote: > >>> Hi Krishna, > >>> > >>> Krishna Chaitanya Kandalla wrote: > >>>> Dorian, > >>>> Were you able to run your application with open-mpi as well? > >>> > >>> Yes, I have no problem running it with open-mpi (version 1.3.2). > >>> > >>>> If it is passing with both mpich2 and open-mpi, it indicates that > >>>> the mvapich2 library is doing something wrong. > >>> > >>> I don't know how I should interpret the program behavior. As you > >>> have pointed out, the crucial question is how the set of neighbors > >>> is constructed. You might have seen that I have inserted a small > >>> check in the code to check if the number of sends and the number of > >>> (expected) sends match on the other side. This is the case.
> >>> Since the hang occurs with all three methods to construct the > >>> neighbor set, either all of them are wrong, or the hang is not > >>> directly related to this. > >>> > >>> For me it looks like the following: > >>> Processor 40 sends the envelope to PE 12. PE 12 probes the message > >>> and issues a recv. In the meantime however, PE 40 somehow slipped > >>> through the MPI_Waitall function and so there is no matching send > >>> operation. > >>> > >>> Could this be the case? (I'm just speculating.) How is the nonblocking > >>> communication implemented? > >>> > >>> > >>> Thanks, > >>> Dorian > >>> > >>>> I tried toggling some of the mvapich2 related parameters, but the > >>>> hang doesn't seem to go away. > >>>> > >>>> Thanks, > >>>> Krishna > >>>> > >>>> > >>>> > >>>> > >>>> Dorian Krause wrote: > >>>>> Hi Krishna, > >>>>> > >>>>> thanks for your tests. If I can be of any help in finding the bug, > >>>>> please let me know ... > >>>>> > >>>>> Thanks, > >>>>> Dorian > >>>>> > >>>>> Krishna Chaitanya Kandalla wrote: > >>>>>> Dorian, > >>>>>> I am able to reproduce the hang with 96 processes on > >>>>>> our systems. I also checked that it runs correctly with > >>>>>> MPICH2-1.0.8. We will try to find a fix soon. > >>>>>> > >>>>>> Thanks, > >>>>>> Krishna > >>>>>> > >>>>>> > >>>>>> Krishna Chaitanya Kandalla wrote: > >>>>>>> Dorian, > >>>>>>> I have taken a quick look at the set of back-traces. > >>>>>>> Is it possible to give us a copy of the application that you are > >>>>>>> running? > >>>>>>> I noticed that the application is possibly changing > >>>>>>> the topology before it gets inside the MPI layer and hangs. I am > >>>>>>> also guessing that the code snippet that you provided is related > >>>>>>> to what is going on inside hgc::comm::Topology::barrier. But, > >>>>>>> we don't quite know how the set "all neighbors" has been set up. > >>>>>>> If we can run the application on our systems here, it would be > >>>>>>> easier to figure out what is going on.
> >>>>>>> > >>>>>>> Thanks, > >>>>>>> Krishna > >>>>>>> > >>>>>>> Dorian Krause wrote: > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> again these 96 processors ... > >>>>>>>> > >>>>>>>> My application hangs in a communication step which looks like > >>>>>>>> this: > >>>>>>>> > >>>>>>>> --------- > >>>>>>>> Group A: > >>>>>>>> > >>>>>>>> for all neighbors { > >>>>>>>> MPI_Isend(...); > >>>>>>>> } > >>>>>>>> MPI_Waitall(...); > >>>>>>>> > >>>>>>>> MPI_Barrier(); > >>>>>>>> ---- > >>>>>>>> Group B: > >>>>>>>> while(#messages to receive > 0) { > >>>>>>>> MPI_Probe(MPI_ANY_SOURCE, &stat); > >>>>>>>> q = stat.MPI_SOURCE > >>>>>>>> /* in subfunction: */ > >>>>>>>> MPI_Probe(q, &stat) > >>>>>>>> q = stat.MPI_COUNT; > >>>>>>>> MPI_Recv(q, ...); > >>>>>>>> } > >>>>>>>> MPI_Barrier(); > >>>>>>>> ---- > >>>>>>>> > >>>>>>>> for more 96 processes this application hangs. Since I can't > >>>>>>>> debug on this scale, I used gdb to get backtraces. It turned > >>>>>>>> out that 94 processes are waiting in the barrier, one process > >>>>>>>> is trying to receive a message (stuck in MPI_Recv) and one > >>>>>>>> other is waiting in MPI_Waitall(...). This looks fine, however > >>>>>>>> the ranks do not match: > >>>>>>>> > >>>>>>>> On the PE with rank 83, I have > >>>>>>>> > >>>>>>>> #3 0x00000000004349b9 in PMPI_Recv (buf=0x1bd96010, count=202, > >>>>>>>> datatype=-1946157051, source=40, tag=374, comm=-1006632954, > >>>>>>>> status=0x1) > >>>>>>>> at recv.c:156 > >>>>>>>> > >>>>>>>> and on PE with rank *12* I have > >>>>>>>> > >>>>>>>> #3 0x00000000004368f4 in PMPI_Waitall (count=8, > >>>>>>>> array_of_requests=0x197e6b10, array_of_statuses=0x1) > >>>>>>>> at waitall.c:191 > >>>>>>>> > >>>>>>>> It seems that rank 40 slipped through the MPI_Waitall > >>>>>>>> even though he was not supposed to do so ... > >>>>>>>> > >>>>>>>> Please find attached the output files.
There are three > >>>>>>>> processes which seem to be not in the barrier (2 on compute-0-3 > >>>>>>>> and 1 on compute-0-13 but the one with the short backtrace on > >>>>>>>> compute-0-3 is also in the barrier as I could confirm by hand). > >>>>>>>> > >>>>>>>> Any hints what might cause this error? > >>>>>>>> > >>>>>>>> I'm using the trunk version of mvapich2 (check-out yesterday) > >>>>>>>> and the cluster consists of 14 LS22 blades (opteron) with 4x > >>>>>>>> DDR Infiniband. I'm not quite sure which ofed version it is (it > >>>>>>>> is delivered with the rocks distribution and they are typically > >>>>>>>> not very verbose concerning version numbers ...). > >>>>>>>> > >>>>>>>> Thanks for your help, > >>>>>>>> Dorian > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> ------------------------------------------------------------------------ > >>>>>>>> > >>>>>>>> > >>>>>>>> _______________________________________________ > >>>>>>>> mvapich-discuss mailing list > >>>>>>>> mvapich-discuss@cse.ohio-state.edu > >>>>>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>>> > >>>> > >>> > >>> > >> > > > > >
-------------- next part -------------- --- /home/kraused/Devel/HGC/ParticleSubCommunicationManagerPt2Pt.cc 2009-08-13 17:10:42.000000000 +0200 +++ ParticleSubCommunicationManagerPt2Pt.cc 2009-08-15 00:08:20.000000000 +0200 @@ -16,6 +16,8 @@ int nremote, const int *sendcount) { + comm::SET_WORLD.barrier(); + ArrayBase::Request *req = new ArrayBase::Request[list.size()]; @@ -50,6 +52,8 @@ const int *recvcount, const ParticleForMeshInfo *info) { + comm::SET_WORLD.barrier(); + int n = 0; if(info) { n = info->numparticles(); From bjoern.olausson at biochemtech.uni-halle.de Mon Aug 17 05:09:25 2009 From: bjoern.olausson at biochemtech.uni-halle.de (Bjoern Olausson) Date: Mon Aug 17 05:10:23 2009 Subject: [mvapich-discuss] Can't charm++ with mpicxx Message-ID: <200908171109.36553.bjoern.olausson@biochemtech.uni-halle.de> Hi all, it's my first post here and it starts with a curious problem. I can successfully compile v1.2, v1.4rc1 and v1.4_trunk, but the resulting mpicc bin fails to compile charm++. Version 1.1 fails to compile itself. Chao Mei told me that those might be related to "multiple definitions" during the linking stage.
Google comes up with the following: http://software.intel.com/en-us/forums/intel-c-compiler/topic/46414/ Please find attached the build logs: mvapich-1.1.log http://daten-transport.de/download.php?id=baknrBs6wNkv&dateinummer=4305 -------------------------------------------------------------------------------- viainit.c(1417): warning #167: argument of type "void *" is incompatible with parameter of type "void *(*)(void *)" (void *) async_thread, (void *) viadev.context); ^ compilation aborted for viainit.c (code 2) make[3]: *** [viainit.o] Error 2 Exit status from make was 2 make[2]: *** [mpilib] Error 1 make[1]: *** [mpi-modules] Error 2 make: *** [mpi] Error 2 -------------------------------------------------------------------------------- mvapich2-1.2p1.log http://daten-transport.de/download.php?id=baknrBs6wNkv&dateinummer=4304 -------------------------------------------------------------------------------- Compiles correctly, but I can't compile charm++ with it. See the following charm++ log charm++2.6.3_mpi-linux-x86_64-mvapich2-1.2p1.log http://daten-transport.de/download.php?id=baknrBs6wNkv&dateinummer=4300 -------------------------------------------------------------------------------- xi-symbol.h(1281): warning #592: variable "rtn" is used before its value is set if (next) { rtn += next->genAccels_spe_c_funcBodies(str); } ^ ../bin/charmc -host -language c++ -cp ../bin/ -o charmxi xi-main.o xi- symbol.o xi-grammar.tab.o xi-scan.o xi-util.o sdag-globals.o CSdagConstruct.o CEntry.o /cvos/shared/apps/mvapich2/intel/64/1.2_custom/lib/libmpich.a(parser.o): (.bss+0x18): multiple definition of `yyin' xi-scan.o:(.bss+0x38): first defined here /cvos/shared/apps/mvapich2/intel/64/1.2_custom/lib/libmpich.a(parser.o): (.bss+0x20): multiple definition of `yyout' xi-scan.o:(.bss+0x40): first defined here ld: Warning: alignment 8 of symbol `yylval' in xi-grammar.tab.o is smaller than 32 in /cvos/shared/apps/mvapich2/intel/64/1.2_custom/lib/libmpich.a(tokens.o) Fatal Error
by charmc in directory /root/NAMD_2.7b1_Source/charm-6.1.2/mpi- linux-x86_64-mpicxx/tmp Command mpicxx -o charmxi xi-main.o xi-symbol.o xi-grammar.tab.o xi-scan.o xi-util.o sdag-globals.o CSdagConstruct.o CEntry.o -L../bin/../lib returned error code 1 charmc exiting... gmake[1]: *** [../bin/charmxi] Error 1 gmake[1]: Leaving directory `/root/NAMD_2.7b1_Source/charm-6.1.2/mpi-linux- x86_64-mpicxx/tmp' gmake: *** [headers] Error 2 -------------------------------------------------------------------------------- mvapich2-1.4_trunk-20090812.log http://daten-transport.de/download.php?id=baknrBs6wNkv&dateinummer=4303 -------------------------------------------------------------------------------- Compiles correctly, but I can't compile charm++ with it. See the following charm++ log -------------------------------------------------------------------------------- charm++2.6.3_mpi-linux-x86_64-mvapich2-1.4_trunk-20090812.log http://daten-transport.de/download.php?id=baknrBs6wNkv&dateinummer=4302 -------------------------------------------------------------------------------- xi-symbol.h(1281): warning #592: variable "rtn" is used before its value is set if (next) { rtn += next->genAccels_spe_c_funcBodies(str); } ^ ../bin/charmc -host -language c++ -cp ../bin/ -o charmxi xi-main.o xi- symbol.o xi-grammar.tab.o xi-scan.o xi-util.o sdag-globals.o CSdagConstruct.o CEntry.o /cvos/shared/apps/mvapich2/intel/64/1.4/20090812/lib/libmpich.a(parser.o): (.bss+0x18): multiple definition of `yyin' xi-scan.o:(.bss+0x38): first defined here /cvos/shared/apps/mvapich2/intel/64/1.4/20090812/lib/libmpich.a(parser.o): (.bss+0x20): multiple definition of `yyout' xi-scan.o:(.bss+0x40): first defined here ld: Warning: alignment 8 of symbol `yylval' in xi-grammar.tab.o is smaller than 32 in /cvos/shared/apps/mvapich2/intel/64/1.4/20090812/lib/libmpich.a(tokens.o) Fatal Error by charmc in directory /root/NAMD_2.7b1_Source/charm-6.1.2/mpi- linux-x86_64-mpicxx/tmp Command mpicxx -o charmxi xi-main.o
xi-symbol.o xi-grammar.tab.o xi-scan.o xi-util.o sdag-globals.o CSdagConstruct.o CEntry.o -L../bin/../lib returned error code 1 charmc exiting... gmake[1]: *** [../bin/charmxi] Error 1 gmake[1]: Leaving directory `/root/NAMD_2.7b1_Source/charm-6.1.2/mpi-linux- x86_64-mpicxx/tmp' gmake: *** [headers] Error 2 -------------------------------------------------------------------------------- Kind regards Bjoern -- Bjoern Olausson Martin-Luther-Universität Halle-Wittenberg Fachbereich Biochemie/Biotechnologie Kurt-Mothes-Str. 3 06120 Halle/Saale Phone: +49-345-55-24942 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090817/2196720b/attachment.bin From mancini at cse.ohio-state.edu Mon Aug 17 19:13:12 2009 From: mancini at cse.ohio-state.edu (Emilio Pasquale Mancini) Date: Mon Aug 17 19:13:41 2009 Subject: [mvapich-discuss] Can't charm++ with mpicxx In-Reply-To: <200908171109.36553.bjoern.olausson@biochemtech.uni-halle.de> References: <200908171109.36553.bjoern.olausson@biochemtech.uni-halle.de> Message-ID: <4A89E408.8010109@cse.ohio-state.edu> Hi Bjoern, Take a look at: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2009-March/002183.html and http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2006-June/000189.html Let me know if this helps. Bye Emilio Bjoern Olausson wrote: > Hi all, > > it's my first post here and it starts with a curious problem. > I can successfully compile v1.2, v1.4rc1 and v1.4_trunk, but the resulting > mpicc bin fails to compile charm++. > > Version 1.1 fails to compile itself. > > Chao Mei told me that those might be related to "multiple definitions" during > the linking stage.
Google comes up with the following: > http://software.intel.com/en-us/forums/intel-c-compiler/topic/46414/ > > Please find attached the build logs: > > > mvapich-1.1.log > http://daten-transport.de/download.php?id=baknrBs6wNkv&dateinummer=4305 > -------------------------------------------------------------------------------- > viainit.c(1417): warning #167: argument of type "void *" is incompatible with > parameter of type "void *(*)(void *)" > (void *) async_thread, (void *) viadev.context); > ^ > > compilation aborted for viainit.c (code 2) > make[3]: *** [viainit.o] Error 2 > Exit status from make was 2 > make[2]: *** [mpilib] Error 1 > make[1]: *** [mpi-modules] Error 2 > make: *** [mpi] Error 2 > -------------------------------------------------------------------------------- > > > > mvapich2-1.2p1.log > http://daten-transport.de/download.php?id=baknrBs6wNkv&dateinummer=4304 > -------------------------------------------------------------------------------- > Compiles correctly, but I can't compile charm++ with it.
> See the following charm++ log > > charm++2.6.3_mpi-linux-x86_64-mvapich2-1.2p1.log > http://daten-transport.de/download.php?id=baknrBs6wNkv&dateinummer=4300 > -------------------------------------------------------------------------------- > xi-symbol.h(1281): warning #592: variable "rtn" is used before its value is > set > if (next) { rtn += next->genAccels_spe_c_funcBodies(str); } > ^ > > ../bin/charmc -host -language c++ -cp ../bin/ -o charmxi xi-main.o xi- > symbol.o xi-grammar.tab.o xi-scan.o xi-util.o sdag-globals.o CSdagConstruct.o > CEntry.o > /cvos/shared/apps/mvapich2/intel/64/1.2_custom/lib/libmpich.a(parser.o): > (.bss+0x18): multiple definition of `yyin' > xi-scan.o:(.bss+0x38): first defined here > /cvos/shared/apps/mvapich2/intel/64/1.2_custom/lib/libmpich.a(parser.o): > (.bss+0x20): multiple definition of `yyout' > xi-scan.o:(.bss+0x40): first defined here > ld: Warning: alignment 8 of symbol `yylval' in xi-grammar.tab.o is smaller > than 32 in > /cvos/shared/apps/mvapich2/intel/64/1.2_custom/lib/libmpich.a(tokens.o) > Fatal Error by charmc in directory /root/NAMD_2.7b1_Source/charm-6.1.2/mpi- > linux-x86_64-mpicxx/tmp > Command mpicxx -o charmxi xi-main.o xi-symbol.o xi-grammar.tab.o xi-scan.o > xi-util.o sdag-globals.o CSdagConstruct.o CEntry.o -L../bin/../lib returned > error code 1 > charmc exiting... > gmake[1]: *** [../bin/charmxi] Error 1 > gmake[1]: Leaving directory `/root/NAMD_2.7b1_Source/charm-6.1.2/mpi-linux- > x86_64-mpicxx/tmp' > gmake: *** [headers] Error 2 > -------------------------------------------------------------------------------- > > > > mvapich2-1.4_trunk-20090812.log > http://daten-transport.de/download.php?id=baknrBs6wNkv&dateinummer=4303 > -------------------------------------------------------------------------------- > Compiles correctly, but I can't compile charm++ with it.
> See the following charm++ log > -------------------------------------------------------------------------------- > > charm++2.6.3_mpi-linux-x86_64-mvapich2-1.4_trunk-20090812.log > http://daten-transport.de/download.php?id=baknrBs6wNkv&dateinummer=4302 > -------------------------------------------------------------------------------- > xi-symbol.h(1281): warning #592: variable "rtn" is used before its value is > set > if (next) { rtn += next->genAccels_spe_c_funcBodies(str); } > ^ > > ../bin/charmc -host -language c++ -cp ../bin/ -o charmxi xi-main.o xi- > symbol.o xi-grammar.tab.o xi-scan.o xi-util.o sdag-globals.o CSdagConstruct.o > CEntry.o > /cvos/shared/apps/mvapich2/intel/64/1.4/20090812/lib/libmpich.a(parser.o): > (.bss+0x18): multiple definition of `yyin' > xi-scan.o:(.bss+0x38): first defined here > /cvos/shared/apps/mvapich2/intel/64/1.4/20090812/lib/libmpich.a(parser.o): > (.bss+0x20): multiple definition of `yyout' > xi-scan.o:(.bss+0x40): first defined here > ld: Warning: alignment 8 of symbol `yylval' in xi-grammar.tab.o is smaller > than 32 in > /cvos/shared/apps/mvapich2/intel/64/1.4/20090812/lib/libmpich.a(tokens.o) > Fatal Error by charmc in directory /root/NAMD_2.7b1_Source/charm-6.1.2/mpi- > linux-x86_64-mpicxx/tmp > Command mpicxx -o charmxi xi-main.o xi-symbol.o xi-grammar.tab.o xi-scan.o > xi-util.o sdag-globals.o CSdagConstruct.o CEntry.o -L../bin/../lib returned > error code 1 > charmc exiting... 
> gmake[1]: *** [../bin/charmxi] Error 1 > gmake[1]: Leaving directory `/root/NAMD_2.7b1_Source/charm-6.1.2/mpi-linux- > x86_64-mpicxx/tmp' > gmake: *** [headers] Error 2 > -------------------------------------------------------------------------------- > > Kind regards > Bjoern From pralle at welfenlab.de Tue Aug 18 11:52:07 2009 From: pralle at welfenlab.de (pralle@welfenlab.de) Date: Tue Aug 18 13:24:57 2009 Subject: [mvapich-discuss] x86 and IBM QS22 blades together Message-ID: <20090818175207.im25ua7teu8ko8sk@mail.gdv.uni-hannover.de> Hello, is it possible to use x86 nodes and QS22 blades in a heterogeneous cluster? In the manpage I found: SPECIFYING HETEROGENEOUS SYSTEMS Multiple architectures may be handled by giving multiple -arch and -np arguments. For example, to run a program on 2 sun4s and 3 rs6000s, with the local machine being a sun4, use mpirun -arch sun4 -np 2 -arch rs6000 -np 3 program [...] This doesn't work here. I tried: > mpirun -arch cbea -np 1 -arch x86_64 -np 1 -hostfile > util/machines/machines.%a osu/osu_bw.%a and got: Can't open hostfile util/machines/machines.%a As I understood it, the "%a" should have been replaced by the value of the "-arch" arguments. Any help appreciated, daniel
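The manpage paragraph quoted above implies that the launcher substitutes each `-arch` value for `%a` in the hostfile and program names. A minimal sketch of that substitution, assuming plain textual replacement is all the manpage means, is shown below. `expand_arch` is a hypothetical helper, not part of MVAPICH or its `mpirun`; it only illustrates which concrete file names the failing command expected to open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of MVAPICH or any mpirun): copy tmpl into
 * out, replacing every "%a" with arch. Returns 0 on success, or -1 if out
 * is too small for the expanded string plus its terminating NUL (the
 * contents of out are then unspecified). */
int expand_arch(const char *tmpl, const char *arch, char *out, size_t out_len)
{
    size_t used = 0;
    size_t arch_len;

    assert(tmpl != NULL && arch != NULL && out != NULL && out_len > 0);
    arch_len = strlen(arch);

    for (const char *p = tmpl; *p != '\0'; ++p) {
        if (p[0] == '%' && p[1] == 'a') {
            if (used + arch_len >= out_len)
                return -1;
            memcpy(out + used, arch, arch_len);
            used += arch_len;
            ++p; /* also consume the 'a' of "%a" */
        } else {
            if (used + 1 >= out_len)
                return -1;
            out[used++] = *p; /* copy ordinary characters, including a lone '%' */
        }
    }
    out[used] = '\0';
    return 0;
}
```

For example, expanding `util/machines/machines.%a` with `cbea` yields `util/machines/machines.cbea`, and `osu/osu_bw.%a` with `x86_64` yields `osu/osu_bw.x86_64`. Generating such names by hand only sidesteps the missing substitution; whether a given launcher can then start a mixed-architecture job is a separate question.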
From michael.heinz at qlogic.com Tue Aug 18 14:05:45 2009 From: michael.heinz at qlogic.com (Mike Heinz) Date: Tue Aug 18 14:06:58 2009 Subject: [mvapich-discuss] Hang in MPI_Isend/MPI_Recv combination In-Reply-To: <1335902065@web.de> References: <1335902065@web.de> Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB45E9062FFD@MNEXMB1.qlogic.org> Just an FYI to the list: I have also been seeing mysterious hangs, even on simple 2-machine fabrics (see "Need a hint in debugging a problem that only affects a few machines in our cluster"), and this work-around fixed my problems as well. -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -----Original Message----- From: mvapich-discuss-bounces@cse.ohio-state.edu [mailto:mvapich-discuss-bounces@cse.ohio-state.edu] On Behalf Of Dorian Krause Sent: Friday, August 14, 2009 6:45 PM To: Krishna Chaitanya Kandalla Cc: mvapich-discuss@cse.ohio-state.edu Subject: Re: [mvapich-discuss] Hang in MPI_Isend/MPI_Recv combination Hi Krishna, > Dorian, > Good to know that the temporary work around has worked for > you too. But, this indicates that there is still something wrong with > our library. We will try to figure out a more concrete fix in the > coming few days. Thanks for your help with this! > Thanks for sending the profiling information. I will take > a look at it. Also, I was wondering along these lines: With my > understanding of the application so far, the code snippet that you > had sent us separates two communication phases of the application. You are referring to the first code snippet I sent? This is the second communication step. You're right: There are two steps. 1. Exchange Graph edges -> Here implemented by an MPI_Alltoall call on an intercommunicator with -> Group A sending and Group B only receiving (i.e. sendcounts all zero) 2.
Exchange Data along the edges -> Done by the MPI_Isend/MPI_Probe in this case (so I actually don't use all the information available, but the receiver only needs to know the number of senders to know how many messages it has to probe for). > During > the execution of this code, most of the processes are either waiting > inside a barrier or the waitall calls. I hope not. Group A and Group B are disjoint and their union is the group of all processors. Therefore all processes are either sending or receiving. The barrier is only used for my timing ... > Since there are so many processes > involved, is it possible that we missed at least one process that was > still in the previous phase of the application? I was wondering if we > could have each process make a call to barrier at the beginning of > this code so that we can know for sure that all the processes have > completed executing up to this phase. Please let me know if this is > feasible and, if you make such a change in the code, re-send it. In principle it shouldn't be necessary, as there is an allreduce in the methods which I inserted to check that the number of sent messages and the number of (expected) messages to receive match. However - to be sure - I added the barrier. Please find attached the patch for this (let me know if it works, I'm not used to creating patches ...). In my test, the outcome is the same. > And we are also interested in looking at the performance > comparisons that you were speaking about. Perfect. I will take some new measurements with the parameters you sent me and will prepare some graphs ... Thanks, Dorian > > Thanks, > Krishna > > Dorian Krause wrote: > > Hi Krishna, > > > > sorry, I always forget to send to the list in cc ... > > > > I have tested the code with open-mpi and varying the eager size > > limit (which is 12kb for the openib btl by default) down to 1kb.
It > > still works. > > > > Thanks, > > Dorian > > > > Krishna Chaitanya Kandalla wrote: > >> Dorian, > >> On our systems, by tweaking a few parameters, I was able to get > >> the application to complete with up to 128 processes. You can probably > >> run your application in the following manner and let us know if it > >> works for you too. > >> > >> mpirun_rsh -np 128 -hostfile ./hosts MV2_IBA_EAGER_THRESHOLD=16384 > >> MV2_VBUF_TOTAL_SIZE=16384 scale_Trans_AlltoalPt2Pt abcdefg > >> > >> > How is the nonblocking communication implemented? > >> > >> Non-blocking calls are designed to provide overlap between > >> communication and computation. Calls to MPI_Isend and MPI_Irecv > >> return without waiting for a confirmation from the library that the > >> message has actually been sent/received. The applications are > >> supposed to do an MPI_Wait later to make sure that the exchange has > >> been completed. So, as long as the user does not touch the buffers > >> that were used for the Isend and Irecv calls, things should be ok. > >> In MVAPICH2, the pt2pt calls use the "eager" protocol for messages > >> of size less than about 8K and the rendezvous protocol for larger > >> messages. By using the above run-time flags, we can alter the > >> threshold between eager and rndv messages. It's not clear why > >> the application passes when this threshold is set to 16K. Do you > >> have any profiling information regarding the size of messages > >> exchanged? Also, I noticed a lot of calls to Alltoall being made. > >> It will help if you can provide us some information about the size > >> of the buffers for the alltoall operations too. > >> > >> > >> > >> Thanks, > >> Krishna > >> > >> > >> Dorian Krause wrote: > >>> Hi Krishna, > >>> > >>> Krishna Chaitanya Kandalla wrote: > >>>> Dorian, > >>>> Were you able to run your application with open-mpi as well? > >>> > >>> Yes, I have no problem running it with open-mpi (version 1.3.2).
> >>> > >>>> If it is passing with both mpich2 and open-mpi, it indicates > >>>> that the mvapich2 library is doing something wrong. > >>> > >>> I don't know how I should interpret the program behavior. As you > >>> have pointed out, the crucial question is how the set of neighbors > >>> is constructed. You might have seen that I have inserted a small > >>> check in the code to check if the number of sends and the number > >>> of > >>> (expected) sends match on the other side. This is the case. > >>> Since the hang occurs with all three methods to construct the > >>> neighbor set, either all of them are wrong, or the hang is not > >>> directly related to this. > >>> > >>> For me it looks like the following: > >>> Processor 40 sends the envelope to PE 12. PE 12 probes the message > >>> and issues a recv. In the meantime however, PE 40 somehow slipped > >>> through the MPI_Waitall function and so there is no matching send > >>> operation. > >>> > >>> Could that be the case? (I'm just speculating.) How is the > >>> nonblocking communication implemented? > >>> > >>> > >>> Thanks, > >>> Dorian > >>> > >>>> I tried toggling some of the mvapich2-related parameters, but the > >>>> hang doesn't seem to go away. > >>>> > >>>> Thanks, > >>>> Krishna > >>>> > >>>> > >>>> > >>>> > >>>> Dorian Krause wrote: > >>>>> Hi Krishna, > >>>>> > >>>>> thanks for your tests. If I can be of any help in finding the > >>>>> bug, please let me know ... > >>>>> > >>>>> Thanks, > >>>>> Dorian > >>>>> > >>>>> Krishna Chaitanya Kandalla wrote: > >>>>>> Dorian, > >>>>>> I am able to reproduce the hang with 96 processes on > >>>>>> our systems. I also checked that it runs correctly with > >>>>>> MPICH2-1.0.8. We will try to find a fix soon. > >>>>>> > >>>>>> Thanks, > >>>>>> Krishna > >>>>>> > >>>>>> > >>>>>> Krishna Chaitanya Kandalla wrote: > >>>>>>> Dorian, > >>>>>>> I have taken a quick look at the set of back-traces.
> >>>>>>> Is it possible to give us a copy of the application that you > >>>>>>> are running? > >>>>>>> I noticed that the application is possibly changing > >>>>>>> the topology before it gets inside the MPI layer and hangs. I > >>>>>>> am also guessing that the code snippet that you provided is > >>>>>>> related to what is going on inside > >>>>>>> hgc::comm::Topology::barrier. But, we don't quite know how the set "all neighbors" has been set up. > >>>>>>> If we can run the application on our systems here, it would be > >>>>>>> easier to figure out what is going on. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Krishna > >>>>>>> > >>>>>>> Dorian Krause wrote: > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> again these 96 processors ... > >>>>>>>> > >>>>>>>> My application hangs in a communication step which looks like > >>>>>>>> this: > >>>>>>>> > >>>>>>>> --------- > >>>>>>>> Group A: > >>>>>>>> > >>>>>>>> for all neighbors { > >>>>>>>> MPI_Isend(...); > >>>>>>>> } > >>>>>>>> MPI_Waitall(...); > >>>>>>>> > >>>>>>>> MPI_Barrier(); > >>>>>>>> ---- > >>>>>>>> Group B: > >>>>>>>> while(#messages to receive > 0) { > >>>>>>>> MPI_Probe(MPI_ANY_SOURCE, &stat); > >>>>>>>> q = stat.MPI_SOURCE > >>>>>>>> /* in subfunction: */ > >>>>>>>> MPI_Probe(q, &stat) > >>>>>>>> q = stat.MPI_COUNT; > >>>>>>>> MPI_Recv(q, ...); > >>>>>>>> } > >>>>>>>> MPI_Barrier(); > >>>>>>>> ---- > >>>>>>>> > >>>>>>>> With 96 or more processes this application hangs. Since I can't > >>>>>>>> debug on this scale, I used gdb to get backtraces. It turned > >>>>>>>> out that 94 processes are waiting in the barrier, one > >>>>>>>> processor is trying to receive a message (stuck in MPI_Recv) > >>>>>>>> and one other is waiting in MPI_Waitall(...).
This looks > >>>>>>>> fine; however, the ranks do not match: > >>>>>>>> > >>>>>>>> On the PE with rank 83, I have > >>>>>>>> > >>>>>>>> #3 0x00000000004349b9 in PMPI_Recv (buf=0x1bd96010, count=202, > >>>>>>>> datatype=-1946157051, source=40, tag=374, > >>>>>>>> comm=-1006632954, > >>>>>>>> status=0x1) > >>>>>>>> at recv.c:156 > >>>>>>>> > >>>>>>>> and on PE with rank *12* I have > >>>>>>>> > >>>>>>>> #3 0x00000000004368f4 in PMPI_Waitall (count=8, > >>>>>>>> array_of_requests=0x197e6b10, array_of_statuses=0x1) > >>>>>>>> at waitall.c:191 > >>>>>>>> > >>>>>>>> It seems that rank 40 slipped through the MPI_Waitall > >>>>>>>> even though he was not supposed to do so ... > >>>>>>>> > >>>>>>>> Please find attached the output files. There are three > >>>>>>>> processes which seem not to be in the barrier (2 on > >>>>>>>> compute-0-3 and 1 on compute-0-13 but the one with the short > >>>>>>>> backtrace on > >>>>>>>> compute-0-3 is also in the barrier as I could confirm by hand). > >>>>>>>> > >>>>>>>> Any hints as to what might cause this error? > >>>>>>>> > >>>>>>>> I'm using the trunk version of mvapich2 (checked out yesterday) > >>>>>>>> and the cluster consists of 14 LS22 blades (opteron) with 4x > >>>>>>>> DDR Infiniband. I'm not quite sure which ofed version it is > >>>>>>>> (it is delivered with the rocks distribution and they are > >>>>>>>> typically not very verbose concerning version numbers ...).
> >>>>>>>> > >>>>>>>> Thanks for your help, > >>>>>>>> Dorian From panda at cse.ohio-state.edu Tue Aug 18 14:48:53 2009 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Tue Aug 18 14:49:20 2009 Subject: [mvapich-discuss] x86 and IBM QS22 blades together In-Reply-To: <20090818175207.im25ua7teu8ko8sk@mail.gdv.uni-hannover.de> Message-ID: > Hello, > > is it possible to use x86 nodes and QS22 blades in a heterogeneous cluster? We have not tried this combination. However, MVAPICH supports heterogeneous architectures, so it should work. > In the manpage I found: > > SPECIFYING HETEROGENEOUS SYSTEMS > Multiple architectures may be handled by giving multiple -arch > and -np arguments. For example, to run a program > on 2 sun4s and 3 rs6000s, with the local machine being a sun4, use > mpirun -arch sun4 -np 2 -arch rs6000 -np 3 program > [...] The above paragraph is not from the MVAPICH user guide. Please check with the vendor of your system; they will be able to provide more details on this. > This doesn't work here. > I tried: > > mpirun -arch cbea -np 1 -arch x86_64 -np 1 -hostfile > > util/machines/machines.%a osu/osu_bw.%a > > and got: > Can't open hostfile util/machines/machines.%a > > As I understood it, the "%a" should have been replaced by the value of > the "-arch" arguments.
> > > Any help appreciated, > daniel Thanks, DK From mohammad.rashti at ece.queensu.ca Tue Aug 18 14:51:51 2009 From: mohammad.rashti at ece.queensu.ca (Mohammad Javad Rashti) Date: Tue Aug 18 14:52:57 2009 Subject: [mvapich-discuss] Unknown Error in MVAPCH2 In-Reply-To: References: <3aea68870908120938h2251974dw663524b1f2c9678@mail.gmail.com> Message-ID: <3aea68870908181151k6d4bebd4y2b8b05b57f1f25b0@mail.gmail.com> Hi Hari, Thanks for the reply. Do you know what would cause such an error? Is there any action to avoid it? I have my code in MVAPICH2 1.0.3 and prefer to work there if possible. Do you think this is a problem with MVAPICH2 or with my system? Thank you Mohammad On Thu, Aug 13, 2009 at 12:01 AM, Hari Subramoni < subramon@cse.ohio-state.edu> wrote: > Hi Mohammad, > > I believe this error message is being printed by the asynchronous progress > thread because of receiving an IBV_EVENT_CLIENT_REREGISTER event (event > #17). > > Please note that MVAPICH2 1.0.3 is a relatively old version. You should > upgrade to the latest MVAPICH2 1.4 release for better performance and > features. > > Thx, > Hari. > > On Wed, 12 Aug 2009, Mohammad Javad Rashti wrote: > > > Hi, > > I am using MVAPICH2 1.0.3, osu_ch3 channel over IB ConnectX cards. > > When running any MPI program, I get the following error: > > > > Got unknown event 17 ... continuing ... > > > > To me it appears to be an SM-related error but I do not know the cause. > > Can anyone please help in this regard? > > > > Thanks > > Mohammad
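Hari's diagnosis maps the raw number in "Got unknown event 17" onto the libibverbs asynchronous event enum; the `ibv_get_async_event` man page describes `IBV_EVENT_CLIENT_REREGISTER` as the subnet manager sending a client re-register request to the port, which fits the guess that the message is SM-related. The sketch below mirrors the first 18 values of `enum ibv_event_type` in declaration order, as found in `<infiniband/verbs.h>`, so that such numbers can be decoded without libibverbs installed. That mirroring is an assumption about the header layout (newer releases append further values), and real code linked against libibverbs should prefer `ibv_event_type_str()`. The function names are hypothetical and not part of MVAPICH2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Names of enum ibv_event_type values 0..17, in declaration order as in
 * <infiniband/verbs.h>. Mirrored here (an assumption about the header
 * layout) so the sketch builds without libibverbs installed. */
static const char *const async_event_names[] = {
    "IBV_EVENT_CQ_ERR",              /*  0 */
    "IBV_EVENT_QP_FATAL",            /*  1 */
    "IBV_EVENT_QP_REQ_ERR",          /*  2 */
    "IBV_EVENT_QP_ACCESS_ERR",       /*  3 */
    "IBV_EVENT_COMM_EST",            /*  4 */
    "IBV_EVENT_SQ_DRAINED",          /*  5 */
    "IBV_EVENT_PATH_MIG",            /*  6 */
    "IBV_EVENT_PATH_MIG_ERR",        /*  7 */
    "IBV_EVENT_DEVICE_FATAL",        /*  8 */
    "IBV_EVENT_PORT_ACTIVE",         /*  9 */
    "IBV_EVENT_PORT_ERR",            /* 10 */
    "IBV_EVENT_LID_CHANGE",          /* 11 */
    "IBV_EVENT_PKEY_CHANGE",         /* 12 */
    "IBV_EVENT_SM_CHANGE",           /* 13 */
    "IBV_EVENT_SRQ_ERR",             /* 14 */
    "IBV_EVENT_SRQ_LIMIT_REACHED",   /* 15 */
    "IBV_EVENT_QP_LAST_WQE_REACHED", /* 16 */
    "IBV_EVENT_CLIENT_REREGISTER",   /* 17 */
};

#define ASYNC_EVENT_COUNT (sizeof async_event_names / sizeof async_event_names[0])

/* Map a raw event number (e.g. the 17 in "Got unknown event 17") to its
 * symbolic name. Returns NULL for numbers outside the mirrored table. */
const char *async_event_name(int event)
{
    if (event < 0 || (size_t)event >= ASYNC_EVENT_COUNT)
        return NULL;
    return async_event_names[event];
}

/* Reverse lookup: symbolic name to number, or -1 if the name is unknown. */
int async_event_number(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < ASYNC_EVENT_COUNT; ++i)
        if (strcmp(async_event_names[i], name) == 0)
            return (int)i;
    return -1;
}
```

With this table, `async_event_name(17)` returns `"IBV_EVENT_CLIENT_REREGISTER"` and `async_event_number("IBV_EVENT_SM_CHANGE")` returns 13.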
From bjoern.olausson at biochemtech.uni-halle.de Thu Aug 20 09:25:50 2009 From: bjoern.olausson at biochemtech.uni-halle.de (Bjoern Olausson) Date: Thu Aug 20 09:27:05 2009 Subject: [mvapich-discuss] Can't charm++ with mpicxx In-Reply-To: <4A89E408.8010109@cse.ohio-state.edu> References: <200908171109.36553.bjoern.olausson@biochemtech.uni-halle.de> <4A89E408.8010109@cse.ohio-state.edu> Message-ID: <200908201525.55464.bjoern.olausson@biochemtech.uni-halle.de> On Tuesday 18 August 2009 01:13:12 Emilio Pasquale Mancini wrote: > Hi Bjoern, > Take a look at: > http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2009-March/002183.html > Mmmh, actually I have OFED 1.3.2 installed (Scientific Linux SL release 5.3 (Boron)). But it seems as if removing -DXRC from CFLAGS made it at least compile. Anyway, this does not help compiling charm with any version of mvapich I compiled on my own. > and > http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2006-June/000189.html > #define CMK_MALLOC_USE_GNU_MALLOC 0 #define CMK_MALLOC_USE_OS_BUILTIN 1 is default in charm 2.1.2 This problem remains unsolved (with all versions I compiled, including v1.1). > Let me know if this helps > No, sorry, it didn't help, except that I am now able to compile mvapich v1.1. Cheers Bjoern -- Bjoern Olausson Martin-Luther-Universität Halle-Wittenberg Fachbereich Biochemie/Biotechnologie Kurt-Mothes-Str. 3 06120 Halle/Saale Phone: +49-345-55-24942
From mancini at cse.ohio-state.edu Thu Aug 20 13:57:39 2009 From: mancini at cse.ohio-state.edu (Emilio Pasquale Mancini) Date: Thu Aug 20 13:58:16 2009 Subject: [mvapich-discuss] Can't charm++ with mpicxx In-Reply-To: <200908201525.55464.bjoern.olausson@biochemtech.uni-halle.de> References: <200908171109.36553.bjoern.olausson@biochemtech.uni-halle.de> <4A89E408.8010109@cse.ohio-state.edu> <200908201525.55464.bjoern.olausson@biochemtech.uni-halle.de> Message-ID: <4A8D8E93.1070402@cse.ohio-state.edu> Hi Bjoern, On 08/20/2009 09:25 AM, Bjoern Olausson wrote: > On Tuesday 18 August 2009 01:13:12 Emilio Pasquale Mancini wrote: > >> Hi Bjoern, >> Take a look at: >> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2009-March/002183.html >> >> > Mmmh, actually I have OFED 1.3.2 installed (Scientific Linux SL release 5.3 > (Boron)). > > But it seems as if removing -DXRC from CFLAGS made it at least compile. > > Anyway, this does not help compiling charm with any version of mvapich I > compiled on my own. > > >> and >> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2006-June/000189.html >> >> > #define CMK_MALLOC_USE_GNU_MALLOC 0 > #define CMK_MALLOC_USE_OS_BUILTIN 1 > > is default in charm 2.1.2 > > You mean 6.1.2? > This problem remains unsolved (with all versions I compiled, including v1.1). > > >> Let me know if this helps >> >> > No, sorry, it didn't help, except that I am now able to compile mvapich v1.1. > > I reproduced your error; it seems to be an issue with the icc compiler. Using gcc it works fine.
With icc you should disable optimizations; I used the following command line: ./build charm++ mpi-linux-x86_64-mvapich icc -O -I/include -L/lib --no-build-shared Here I summarize the charm++ compilation process for the mailing list: cd src/arch cp -r mpi-linux-x86_64 mpi-linux-x86_64-mvapich cd mpi-linux-x86_64-mvapich Edit conv-mach.h and change (not necessary in Charm++ 6.1.2): #define CMK_MALLOC_USE_GNU_MALLOC 1 #define CMK_MALLOC_USE_OS_BUILTIN 0 to: #define CMK_MALLOC_USE_GNU_MALLOC 0 #define CMK_MALLOC_USE_OS_BUILTIN 1 Make sure the MVAPICH mpicc and mpiCC are first in your path. Otherwise, add the full path to the mpicc and mpiCC commands in conv-mach.sh. If the "build" command does not recognize the mpiCC compiler, you can edit the conv-mach.sh file, changing the line MPICXX_DEF=mpicxx to: #MPICXX_DEF=mpiCC MPICXX_DEF=mpicxx cd ../../.. ./build charm++ mpi-linux-x86_64-mvapich --no-build-shared or, for the icc compiler: ./build charm++ mpi-linux-x86_64-mvapich icc -O -I/include -L/lib --no-build-shared > Cheers > Bjoern > > Bye Emilio From bjoern.olausson at biochemtech.uni-halle.de Mon Aug 24 09:04:22 2009 From: bjoern.olausson at biochemtech.uni-halle.de (Bjoern Olausson) Date: Mon Aug 24 09:18:48 2009 Subject: [mvapich-discuss] Can't charm++ with mpicxx In-Reply-To: <4A8D8E93.1070402@cse.ohio-state.edu> References: <200908171109.36553.bjoern.olausson@biochemtech.uni-halle.de> <200908201525.55464.bjoern.olausson@biochemtech.uni-halle.de> <4A8D8E93.1070402@cse.ohio-state.edu> Message-ID: <200908241504.34453.bjoern.olausson@biochemtech.uni-halle.de> On Thursday 20 August 2009 19:57:39 Emilio Pasquale Mancini wrote: > Hi Bjoern, > > On 08/20/2009 09:25 AM, Bjoern Olausson wrote: > > On Tuesday 18 August 2009 01:13:12 Emilio Pasquale Mancini wrote: > >> Hi Bjoern, > >> Take a look at: > >> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2009-March/002183.html > > > > Mmmh, actually I have OFED 1.3.2 installed (Scientific Linux SL release > > 5.3 (Boron)). > > > > But it seems as if removing -DXRC from CFLAGS made it at least compile. > > > > Anyway, this does not help compiling charm with any version of mvapich I > > compiled on my own. > > > >> and > >> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2006-June/000189.html > > > > #define CMK_MALLOC_USE_GNU_MALLOC 0 > > #define CMK_MALLOC_USE_OS_BUILTIN 1 > > > > is default in charm 2.1.2 > > You mean 6.1.2? > Yes, sorry for the typo. > > This problem remains unsolved (with all versions I compiled, including > > v1.1). > > > >> Let me know if this helps > > > > No, sorry, it didn't help, except that I am now able to compile mvapich > > v1.1. > > I reproduced your error; it seems to be an issue with the icc compiler. Using > gcc it works fine. > With icc you should disable optimizations; I used the following command > line: > > ./build charm++ mpi-linux-x86_64-mvapich icc -O -I/include > -L/lib --no-build-shared > > Here I summarize the charm++ compilation process for the mailing list: > > cd src/arch > cp -r mpi-linux-x86_64 mpi-linux-x86_64-mvapich > cd mpi-linux-x86_64-mvapich > > Edit conv-mach.h and change (not necessary in Charm++ 6.1.2): > > #define CMK_MALLOC_USE_GNU_MALLOC 1 > #define CMK_MALLOC_USE_OS_BUILTIN 0 > > to: > > #define CMK_MALLOC_USE_GNU_MALLOC 0 > #define CMK_MALLOC_USE_OS_BUILTIN 1 > > Make sure the MVAPICH mpicc and mpiCC are first in your path. Otherwise, > add the full path to the mpicc and mpiCC commands in conv-mach.sh. If > the "build" command does not recognize the mpiCC compiler, you can > edit the conv-mach.sh file, changing the line > > MPICXX_DEF=mpicxx > > to: > > #MPICXX_DEF=mpiCC > MPICXX_DEF=mpicxx > > > cd ../../..
> ./build charm++ mpi-linux-x86_64-mvapich --no-build-shared > > or for icc compiler: > > ./build charm++ mpi-linux-x86_64-mvapich icc -O -I/include > -L/lib --no-build-shared > Okay, now this made charm++ compile, but I can't produce any binary with charm++. Even the megatest binary fails to build. The error I get when compiling megatest is the following: ------------------------------------------------------------------------------ Fatal Error by charmc in directory /home/blub/src/NAMD_2.7b1_Source/charm-6.1.2/mpi-linux-x86_64-mvapich1_1- icc/tests/charm++/megatest Command icpc -shared-intel -xO -rdynamic -o pgm -L../../../bin/../lib - I../../../bin/../include ../../../bin/../lib/libldb-rand.o megatest.o groupring.o nodering.o varsizetest.o varraystest.o groupcast.o nodecast.o synctest.o fib.o arrayring.o tempotest.o packtest.o queens.o migration.o marshall.o priomsg.o priotest.o rotest.o statistics.o templates.o inherit.o reduction.o bitvector.o immediatering.o callback.o moduleinit5242.o ../../../bin/../lib/libmemory-default.o ../../../bin/../lib/libthreads- default.o -lck -lconv-cplus-y -lconv-core -lconv-util -lckqt -ldl -lm returned error code 1 charmc exiting... 
make: *** [pgm] Error 1 ------------------------------------------------------------------------------ For NAMD the error is the following: ------------------------------------------------------------------------------ Fatal Error by charmc in directory /home/blub/src/NAMD_2.7b1_Source/Linux- x86_64-icc.mpi-linux-x86_64-mvapich1_1-icc Command icpc -shared-intel -xO -rdynamic -L.rootdir/tcl/lib - L.rootdir/fftw/lib -I/home/blub/src/NAMD_2.7b1_Source/charm-6.1.2/mpi-linux- x86_64-mvapich1_1-icc/include -DCMK_OPTIMIZE=1 -Iinc -Isrc -i-static -O2 -ip - fno-rtti -o namd2 -L/home/blub/src/NAMD_2.7b1_Source/charm-6.1.2/mpi-linux- x86_64-mvapich1_1-icc/bin/../lib - I/home/blub/src/NAMD_2.7b1_Source/charm-6.1.2/mpi-linux-x86_64-mvapich1_1- icc/bin/../include /home/blub/src/NAMD_2.7b1_Source/charm-6.1.2/mpi-linux- x86_64-mvapich1_1-icc/bin/../lib/libldb-rand.o obj/buildinfo.o obj/common.o obj/dcdlib.o obj/erf.o obj/fitrms.o obj/main.o obj/mainfunc.o obj/memusage.o obj/strlib.o obj/AlgSeven.o obj/AlgRecBisection.o obj/AlgNbor.o obj/AtomMap.o obj/BackEnd.o obj/BroadcastMgr.o obj/BroadcastClient.o obj/CollectionMaster.o obj/CollectionMgr.o obj/Communicate.o obj/Compute.o obj/ComputeAngles.o obj/ComputeBonds.o obj/ComputeConsForce.o obj/ComputeConsForceMsgs.o obj/ComputeCrossterms.o obj/ComputeCylindricalBC.o obj/ComputeDihedrals.o obj/ComputeDPME.o obj/ComputeDPMEMsgs.o obj/ComputeDPMTA.o obj/ComputeEField.o obj/ComputeEwald.o obj/ComputeExt.o obj/ComputeFullDirect.o obj/ComputeHomePatch.o obj/ComputeHomePatches.o obj/ComputeImpropers.o obj/ComputeGlobal.o obj/ComputeGlobalMsgs.o obj/ComputeGridForce.o obj/ComputeMap.o obj/ComputeMgr.o obj/ComputeNonbondedSelf.o obj/ComputeNonbondedPair.o obj/ComputeNonbondedUtil.o obj/ComputeNonbondedStd.o obj/ComputeNonbondedFEP.o obj/ComputeNonbondedLES.o obj/ComputeNonbondedPProf.o obj/ComputeNonbondedCUDA.o obj/ComputePatch.o obj/ComputePatchPair.o obj/ComputePme.o obj/OptPme.o obj/OptPmeRealSpace.o obj/ComputeRestraints.o 
obj/ComputeSphericalBC.o obj/ComputeStir.o obj/ComputeTclBC.o obj/ConfigList.o obj/Controller.o obj/ccsinterface.o obj/DataStream.o obj/DumpBench.o obj/FreeEnergyAssert.o obj/FreeEnergyGroup.o obj/FreeEnergyLambda.o obj/FreeEnergyLambdMgr.o obj/FreeEnergyParse.o obj/FreeEnergyRestrain.o obj/FreeEnergyRMgr.o obj/FreeEnergyVector.o obj/GlobalMaster.o obj/GlobalMasterServer.o obj/GlobalMasterTest.o obj/GlobalMasterIMD.o obj/GlobalMasterTcl.o obj/GlobalMasterSMD.o obj/GlobalMasterTMD.o obj/GlobalMasterFreeEnergy.o obj/GlobalMasterEasy.o obj/GlobalMasterMisc.o obj/colvarmodule.o obj/colvarparse.o obj/colvar.o obj/colvarvalue.o obj/colvarbias.o obj/colvarbias_abf.o obj/colvarbias_meta.o obj/colvaratoms.o obj/colvarcomp.o obj/colvarcomp_angles.o obj/colvarcomp_coordnums.o obj/colvarcomp_distances.o obj/colvarcomp_protein.o obj/colvarcomp_rotations.o obj/colvarproxy_namd.o obj/GridForceGrid.o obj/GromacsTopFile.o obj/heap.o obj/HomePatch.o obj/IMDOutput.o obj/InfoStream.o obj/LdbCoordinator.o obj/LJTable.o obj/Measure.o obj/MGridforceParams.o obj/MStream.o obj/MigrateAtomsMsg.o obj/Molecule.o obj/Molecule2.o obj/NamdCentLB.o obj/NamdNborLB.o obj/NamdState.o obj/NamdOneTools.o obj/Node.o obj/Output.o obj/Parameters.o obj/ParseOptions.o obj/Patch.o obj/PatchMgr.o obj/PatchMap.o obj/PDB.o obj/PDBData.o obj/PmeBase.o obj/PmeKSpace.o obj/PmeRealSpace.o obj/ProcessorPrivate.o obj/ProxyMgr.o obj/ProxyPatch.o obj/Rebalancer.o obj/RecBisection.o obj/ReductionMgr.o obj/RefineOnly.o obj/RefineTorusLB.o obj/ScriptTcl.o obj/Sequencer.o obj/Set.o obj/Settle.o obj/SimParameters.o obj/Sync.o obj/TclCommands.o obj/TorusLB.o obj/WorkDistrib.o obj/pub3dfft.o obj/vmdsock.o obj/parm.o obj/imd.o obj/CompressPsf.o obj/PluginIOMgr.o obj/AtomsDisInfo.o obj/FileIO.o obj/dcdplugin.o obj/jsplugin.o obj/pdbplugin.o obj/psfplugin.o moduleinit4100.o -lmoduleNeighborLB -lmodulecommlib /home/blub/src/NAMD_2.7b1_Source/charm-6.1.2/mpi-linux-x86_64-mvapich1_1- icc/bin/../lib/libmemory-default.o 
/home/blub/src/NAMD_2.7b1_Source/charm-6.1.2/mpi-linux-x86_64-mvapich1_1- icc/bin/../lib/libthreads-default.o -lck -lconv-cplus-y -lconv-core -lconv- util -lckqt -ldl -ltcl8.3 -ldl -lsrfftw -lsfftw -lm -lmoduleNeighborLB - lmodulecommlib -lm returned error code 1 charmc exiting... rm -f moduleinit4100.C moduleinit4100.o make: *** [namd2] Error 1 ------------------------------------------------------------------------------ Please find the complete logs here: http://daten-transport.de/ansicht.php?id=zEuKTtBxBPf8 Direct Link: http://daten-transport.de/download.php?id=zEuKTtBxBPf8&dateinummer=4419 Thanks for your help. Kind regards, Bjoern -- Bjoern Olausson Martin-Luther-Universität Halle-Wittenberg Fachbereich Biochemie/Biotechnologie Kurt-Mothes-Str. 3 06120 Halle/Saale Phone: +49-345-55-24942 From funketob at gmail.com Thu Aug 27 15:08:04 2009 From: funketob at gmail.com (Tobias Funke) Date: Thu Aug 27 15:08:51 2009 Subject: [mvapich-discuss] Mvapich 1.4 Message-ID: Hello MVAPICH team, I am using an MVAPICH2 1.4 RC downloaded from Ohio. I would like to know when you will release the final MVAPICH2. I face some issues in the RC; maybe they are solved in the release? Funke T From panda at cse.ohio-state.edu Thu Aug 27 15:20:04 2009 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Thu Aug 27 15:20:34 2009 Subject: [mvapich-discuss] Mvapich 1.4 In-Reply-To: Message-ID: Since the RC1 release, many additional fixes have gone into the MVAPICH2 1.4 codebase.
All these fixes are available from the `trunk' version. We had planned to push out RC2 two weeks back, and things got delayed based on our testing results. The RC2 version will be coming out during the next 1-2 days and will have all of these fixes. The final release will be after that. In the meantime, you are welcome to use the `trunk' version and let us know whether your issues have been solved or not. Thanks, DK On Fri, 28 Aug 2009, Tobias Funke wrote: > Hello MVAPICH team, > > I am using an MVAPICH2 1.4 RC downloaded from Ohio. > I would like to know when you will release the final > MVAPICH2. I face some issues in the RC; maybe they are solved in the release? > > Funke T > From eimamagi at srce.hr Sat Aug 29 09:10:51 2009 From: eimamagi at srce.hr (Emir Imamagic) Date: Sat Aug 29 10:07:13 2009 Subject: [mvapich-discuss] Problem with more MPI jobs on the same node Message-ID: <4A9928DB.8030401@srce.hr> Hello, we have a problem running multiple MPI jobs on the same node. We're using mvapich 1.1.0 on CentOS 5.3, compiled with Intel 11.1. Nodes are 32-core Opterons. We used the NPB LU benchmark compiled for 8 processes. With each new job started on the node, the CPU usage of all processes decreases (as reported by top). It seems that individual MPI processes are assigned to the same core (as described). The slowdown scales consistently with the number of jobs: 2 jobs - 50% CPU usage (2*app runtime) 3 jobs - 33% CPU usage (3*app runtime) 4 jobs - 25% CPU usage (4*app runtime) The problem is also described in this thread: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2009-April/002251.html However, the suggested solution does not solve the problem. We set VIADEV_USE_AFFINITY=0. We even changed the source code (mpid/ch_gen2/viaparam.h): #define _AFFINITY_ 0 Nothing helped. Thanks in advance, emir
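Before comparing `top` output, it can help to confirm which cores each process is actually allowed to run on, for instance whether two concurrently started jobs really ended up on overlapping cores, and whether `VIADEV_USE_AFFINITY=0` left each process free to use every core. The sketch below is a hypothetical diagnostic (the function names are illustrative, not part of MVAPICH). It relies on the Linux/glibc `sched_getaffinity()` call and the `CPU_*` macros from `<sched.h>`, so it assumes a Linux node. Calling `print_own_affinity()` once per rank, for example right after `MPI_Init`, would print each rank's allowed core list.

```c
#define _GNU_SOURCE /* needed for sched_getaffinity() and the CPU_* macros */
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Write the CPUs contained in *set as a list such as "0,3,8".
 * Returns the number of CPUs listed, or -1 if buf is too small. */
int format_cpu_list(const cpu_set_t *set, char *buf, size_t len)
{
    size_t used = 0;
    int count = 0;

    assert(set != NULL && buf != NULL && len > 0);
    buf[0] = '\0';
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, set))
            continue;
        int n = snprintf(buf + used, len - used, "%s%d",
                         count > 0 ? "," : "", cpu);
        if (n < 0 || (size_t)n >= len - used)
            return -1; /* output would be truncated */
        used += (size_t)n;
        ++count;
    }
    return count;
}

/* Print the cores the calling process may currently run on (pid 0 = self).
 * Returns 0 on success, -1 on error. */
int print_own_affinity(const char *label)
{
    cpu_set_t set;
    char list[8192]; /* large enough for all CPU_SETSIZE (1024) CPUs */

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) != 0) {
        perror("sched_getaffinity");
        return -1;
    }
    if (format_cpu_list(&set, list, sizeof list) < 0)
        return -1;
    printf("%s: allowed CPUs %s\n", label, list);
    return 0;
}
```

A process pinned by the library would report a single core, while an unpinned one would list all 32. For an already running process, `taskset -cp <pid>` from util-linux reports the same information without recompiling the benchmark.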
Name: smime.p7s Type: application/x-pkcs7-signature Size: 3283 bytes Desc: S/MIME Cryptographic Signature Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090829/71f2d639/smime.bin From panda at cse.ohio-state.edu Sat Aug 29 10:31:31 2009 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Sat Aug 29 10:32:01 2009 Subject: [mvapich-discuss] Problem with more MPI jobs on the same node In-Reply-To: <4A9928DB.8030401@srce.hr> Message-ID: Which interface of mvapich 1.1.0 you are using - Gen2 or Gen2-hybrid? If you are using `Gen2' interface, VIADEV_USE_AFFINITY=0 should be disabling affinity. For Gen2-Hybrid, the variable is MV_USE_AFFINITY. Also, for Gen2 interface, there is a CPU mapping option VIADEV_CPU_MAPPING through which you can actually run an MPI job on a specified set of cores. Can you try this option to make sure that different MPI jobs can explicitly get mapped to different cores. DK On Sat, 29 Aug 2009, Emir Imamagic wrote: > Hello, > > we have a problem with running multiple MPI jobs on the same node. We're > using mvapich 1.1.0 on CentOS 5.3 compiled with Intel 11.1. Nodes are 32 > core Opterons. > > We used NPB LU benchmark compiled for 8 processes. With each new job > started on the node, CPU usage of all processes decreases (we retrieved > it by top). It seems that individual MPI processes are assigned to the > same core (as described). This behaves consistently with the increase of > jobs: > 2 jobs - 50% CPU usage (2*app runtime) > 3 jobs - 33% CPU usage (3*app runtime) > 4 jobs - 25% CPU usage (4*app runtime) > > Problem is also described in this thread: > http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2009-April/002251.html > However, suggested solution does not solve the problem. We set the > VIADEV_USE_AFFINITY=0. We even changed the source code > (mpid/ch_gen2/viaparam.h): > #define _AFFINITY_ 0 > Nothing helped. 
>
> Thanks in advance,
> emir

From eimamagi at srce.hr  Sat Aug 29 11:18:26 2009
From: eimamagi at srce.hr (Emir Imamagic)
Date: Sat Aug 29 11:19:49 2009
Subject: [mvapich-discuss] Problem with more MPI jobs on the same node
In-Reply-To:
References:
Message-ID: <4A9946C2.5090902@srce.hr>

Dhabaleswar Panda wrote:
> Which interface of mvapich 1.1.0 are you using - Gen2 or Gen2-hybrid? If
> you are using the `Gen2' interface, VIADEV_USE_AFFINITY=0 should disable
> affinity. For Gen2-Hybrid, the variable is MV_USE_AFFINITY. Also, for the Gen2
> interface, there is a CPU mapping option, VIADEV_CPU_MAPPING, through which
> you can run an MPI job on a specified set of cores. Can you try
> this option to make sure that different MPI jobs get explicitly mapped
> to different cores?

I'm using Gen2, and I tried both:
- VIADEV_USE_AFFINITY=0, and
- VIADEV_USE_AFFINITY=1 with VIADEV_CPU_MAPPING:

mpirun_rsh -ssh -np 8 -hostfile ./machines VIADEV_CPU_MAPPING=0,1,2,3,4,5,6,7 VIADEV_USE_AFFINITY=1 ./lu.C.8.mvapich
mpirun_rsh -ssh -np 8 -hostfile ./machines VIADEV_CPU_MAPPING=8,9,10,11,12,13,14,15 VIADEV_USE_AFFINITY=1 ./lu.C.8.mvapich

The result was the same. Below is the output of top and mpstat when VIADEV_CPU_MAPPING was used.
Cheers,
emir

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30434 eimamagi 25 0 164m 117m 11m R 50.2 0.2 0:56.78 lu.C.8.mvapich
30435 eimamagi 25 0 164m 111m 5044 R 50.2 0.2 0:56.79 lu.C.8.mvapich
30436 eimamagi 25 0 164m 111m 5156 R 50.2 0.2 0:56.75 lu.C.8.mvapich
30437 eimamagi 25 0 164m 109m 3200 R 50.2 0.2 0:56.80 lu.C.8.mvapich
30440 eimamagi 25 0 164m 110m 4360 R 50.2 0.2 0:56.77 lu.C.8.mvapich
30441 eimamagi 25 0 164m 109m 3168 R 50.2 0.2 0:56.78 lu.C.8.mvapich
30692 eimamagi 25 0 164m 109m 3132 R 50.2 0.2 0:29.20 lu.C.8.mvapich
30693 eimamagi 25 0 164m 111m 5080 R 50.2 0.2 0:29.39 lu.C.8.mvapich
30438 eimamagi 25 0 164m 109m 3172 R 49.8 0.2 0:56.74 lu.C.8.mvapich
30439 eimamagi 25 0 164m 111m 5068 R 49.8 0.2 0:56.54 lu.C.8.mvapich
30688 eimamagi 25 0 164m 117m 11m R 49.8 0.2 0:29.15 lu.C.8.mvapich
30689 eimamagi 25 0 164m 110m 4152 R 49.8 0.2 0:29.15 lu.C.8.mvapich
30690 eimamagi 25 0 164m 111m 5092 R 49.8 0.2 0:29.15 lu.C.8.mvapich
30691 eimamagi 25 0 164m 109m 3360 R 49.8 0.2 0:29.15 lu.C.8.mvapich
30694 eimamagi 25 0 164m 110m 4568 R 49.8 0.2 0:29.16 lu.C.8.mvapich
30695 eimamagi 25 0 164m 109m 3116 R 49.8 0.2 0:29.15 lu.C.8.mvapich

And mpstat -P ALL:
$ mpstat -P ALL
Linux 2.6.18-128.1.16.el5 08/29/2009

05:17:48 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
05:17:48 PM all 15.18 0.00 1.39 0.01 0.00 0.00 0.00 83.42 133.77
05:17:48 PM 0 33.55 0.01 5.40 0.04 0.00 0.02 0.00 60.98 133.77
05:17:48 PM 1 43.95 0.00 5.43 0.00 0.00 0.00 0.00 50.61 0.00
05:17:48 PM 2 50.77 0.00 5.39 0.00 0.00 0.00 0.00 43.83 0.00
05:17:48 PM 3 49.93 0.00 5.46 0.00 0.00 0.00 0.00 44.60 0.00
05:17:48 PM 4 38.46 0.00 5.28 0.00 0.00 0.00 0.00 56.26 0.00
05:17:48 PM 5 33.77 0.00 5.29 0.00 0.00 0.00 0.00 60.93 0.00
05:17:48 PM 6 45.39 0.00 5.25 0.00 0.00 0.00 0.00 49.36 0.00
05:17:48 PM 7 35.32 0.00 5.30 0.00 0.00 0.00 0.00 59.38 0.00
05:17:48 PM 8 16.53 0.00 0.05 0.02 0.00 0.00 0.00 83.40 0.00
05:17:48 PM 9 5.67 0.00 0.05 0.03 0.00 0.00 0.00 94.25 0.00
05:17:48 PM 10 0.81 0.00 0.06 0.02 0.00 0.00 0.00 99.11 0.00
05:17:48 PM 11 0.81 0.00 0.07 0.02 0.00 0.00 0.00 99.09 0.00
05:17:48 PM 12 32.88 0.00 0.13 0.00 0.00 0.00 0.00 66.99 0.00
05:17:48 PM 13 0.94 0.00 0.06 0.00 0.00 0.00 0.00 99.00 0.00
05:17:48 PM 14 5.91 0.00 0.05 0.00 0.00 0.00 0.00 94.04 0.00
05:17:48 PM 15 0.82 0.00 0.10 0.00 0.00 0.00 0.00 99.08 0.00
05:17:48 PM 16 26.38 0.00 0.13 0.00 0.00 0.00 0.00 73.49 0.00
05:17:48 PM 17 1.83 0.00 0.04 0.00 0.00 0.00 0.00 98.13 0.00
05:17:48 PM 18 1.98 0.00 0.03 0.00 0.00 0.00 0.00 97.99 0.00
05:17:48 PM 19 0.80 0.01 0.24 0.00 0.00 0.00 0.00 98.95 0.00
05:17:48 PM 20 15.50 0.00 0.06 0.00 0.00 0.00 0.00 84.44 0.00
05:17:48 PM 21 3.34 0.00 0.07 0.00 0.00 0.00 0.00 96.58 0.00
05:17:48 PM 22 2.97 0.00 0.03 0.00 0.00 0.00 0.00 97.00 0.00
05:17:48 PM 23 2.35 0.00 0.13 0.00 0.00 0.00 0.00 97.52 0.00
05:17:48 PM 24 10.61 0.00 0.06 0.00 0.00 0.00 0.00 89.33 0.00
05:17:48 PM 25 4.52 0.01 0.24 0.02 0.00 0.00 0.00 95.21 0.00
05:17:48 PM 26 2.44 0.00 0.02 0.00 0.00 0.00 0.00 97.54 0.00
05:17:48 PM 27 1.54 0.00 0.02 0.00 0.00 0.00 0.00 98.43 0.00
05:17:48 PM 28 8.58 0.00 0.05 0.01 0.00 0.00 0.00 91.37 0.00
05:17:48 PM 29 4.75 0.00 0.02 0.01 0.00 0.00 0.00 95.23 0.00
05:17:48 PM 30 0.80 0.00 0.01 0.01 0.00 0.00 0.00 99.18 0.00
05:17:48 PM 31 1.99 0.00 0.01 0.01 0.00 0.00 0.00 97.99 0.00

From panda at cse.ohio-state.edu  Sat Aug 29 11:39:22 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Sat Aug 29 11:39:54 2009
Subject: [mvapich-discuss] Problem with more MPI jobs on the same node
In-Reply-To: <4A9946C2.5090902@srce.hr>
Message-ID:

What is the output of top and mpstat when you run a 16-process LU job on the same 16 cores (0-15)?
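Besides sampling top and mpstat, a process can ask the Linux kernel directly which cores it is allowed to run on. The C sketch below is illustrative only and not part of MVAPICH; it assumes Linux with glibc's sched_getaffinity, and the helper names format_cpuset and report_placement are hypothetical. Because the affinity mask is inherited across fork and exec, the output would also reveal any restriction already present in the shell or daemon that launched the job.

```c
#define _GNU_SOURCE /* needed for cpu_set_t and sched_getaffinity in glibc */
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write the CPUs contained in `set` into `buf` as a
 * comma-separated list such as "0,3,15". The result is truncated, but
 * still NUL-terminated, if `len` is too small. */
void format_cpuset(const cpu_set_t *set, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return;
    buf[0] = '\0';
    for (int cpu = 0; cpu < CPU_SETSIZE && used < len; cpu++) {
        if (!CPU_ISSET(cpu, set))
            continue;
        /* Prefix every entry except the first with a comma. */
        int n = snprintf(buf + used, len - used, "%s%d", used ? "," : "", cpu);
        if (n < 0)
            break;
        used += (size_t)n;
    }
}

/* Hypothetical helper: print the calling process's PID and the CPUs the
 * kernel allows it to run on. Returns 0 on success, -1 on failure. */
int report_placement(FILE *out)
{
    cpu_set_t set;
    char cpus[1024];

    CPU_ZERO(&set);
    /* pid 0 means "the calling process". */
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_getaffinity");
        return -1;
    }
    format_cpuset(&set, cpus, sizeof(cpus));
    fprintf(out, "pid %d allowed cpus: %s\n", (int)getpid(), cpus);
    return 0;
}
```

For example, each rank could call report_placement(stdout) right after MPI_Init. Assuming the library pins ranks through the standard Linux affinity interface, a working VIADEV_CPU_MAPPING should give a single core per line, while an unpinned process on these nodes would list all 32 cores. For a process that is already running, taskset -p <pid> (util-linux) prints the same mask in hexadecimal.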
You also indicated in your original e-mail that a single node has 32 cores. I am assuming that it has eight sockets of four cores each. Are these Opterons or any other processor type?

DK

On Sat, 29 Aug 2009, Emir Imamagic wrote:

> Dhabaleswar Panda wrote:
> > Which interface of mvapich 1.1.0 are you using - Gen2 or Gen2-hybrid? If
> > you are using the `Gen2' interface, VIADEV_USE_AFFINITY=0 should disable
> > affinity. For Gen2-Hybrid, the variable is MV_USE_AFFINITY. Also, for the Gen2
> > interface, there is a CPU mapping option, VIADEV_CPU_MAPPING, through which
> > you can run an MPI job on a specified set of cores. Can you try
> > this option to make sure that different MPI jobs get explicitly mapped
> > to different cores?
>
> I'm using Gen2, and I tried both:
> - VIADEV_USE_AFFINITY=0, and
> - VIADEV_USE_AFFINITY=1 with VIADEV_CPU_MAPPING:
>
> mpirun_rsh -ssh -np 8 -hostfile ./machines VIADEV_CPU_MAPPING=0,1,2,3,4,5,6,7 VIADEV_USE_AFFINITY=1 ./lu.C.8.mvapich
> mpirun_rsh -ssh -np 8 -hostfile ./machines VIADEV_CPU_MAPPING=8,9,10,11,12,13,14,15 VIADEV_USE_AFFINITY=1 ./lu.C.8.mvapich
>
> The result was the same. Below is the output of top and mpstat when
> VIADEV_CPU_MAPPING was used.
>
> Cheers,
> emir
>
> [...]

From eimamagi at srce.hr  Sat Aug 29 14:48:40 2009
From: eimamagi at srce.hr (Emir Imamagic)
Date: Sat Aug 29 14:49:56 2009
Subject: [mvapich-discuss] Problem with more MPI jobs on the same node
In-Reply-To:
References:
Message-ID: <4A997808.7060806@srce.hr>

Dhabaleswar Panda wrote:
> What is the output of top and mpstat when you run a 16-process LU job on
> the same 16-cores (0-15)?
Command:
mpirun_rsh -ssh -np 16 -hostfile ./machines VIADEV_USE_AFFINITY=0 ./lu.C.16

TOP:
top - 20:45:42 up 56 days, 15:18, 2 users, load average: 8.55, 5.76, 4.46
Tasks: 484 total, 17 running, 467 sleeping, 0 stopped, 0 zombie
Cpu(s): 15.2%us, 1.4%sy, 0.0%ni, 83.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 66072240k total, 9708912k used, 56363328k free, 336556k buffers
Swap: 7999992k total, 0k used, 7999992k free, 7728032k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32508 eimamagi 25 0 140m 79m 19m R 99.1 0.1 0:42.09 lu.C.16
32509 eimamagi 25 0 140m 63m 4176 R 99.1 0.1 0:42.10 lu.C.16
32510 eimamagi 25 0 140m 63m 3792 R 99.1 0.1 0:42.08 lu.C.16
32511 eimamagi 25 0 140m 63m 3332 R 99.1 0.1 0:42.09 lu.C.16
32512 eimamagi 25 0 140m 63m 4228 R 99.1 0.1 0:42.11 lu.C.16
32513 eimamagi 25 0 140m 64m 5148 R 99.1 0.1 0:42.11 lu.C.16
32514 eimamagi 25 0 140m 64m 4772 R 99.1 0.1 0:42.11 lu.C.16
32515 eimamagi 25 0 140m 63m 4232 R 99.1 0.1 0:42.11 lu.C.16
32516 eimamagi 25 0 140m 63m 4052 R 99.1 0.1 0:42.11 lu.C.16
32517 eimamagi 25 0 140m 64m 4716 R 99.1 0.1 0:42.10 lu.C.16
32518 eimamagi 25 0 140m 63m 4544 R 99.1 0.1 0:42.10 lu.C.16
32519 eimamagi 25 0 140m 63m 4060 R 99.1 0.1 0:42.11 lu.C.16
32520 eimamagi 25 0 140m 62m 3892 R 99.1 0.1 0:42.10 lu.C.16
32521 eimamagi 25 0 140m 63m 4428 R 99.1 0.1 0:42.11 lu.C.16
32522 eimamagi 25 0 140m 63m 4428 R 99.1 0.1 0:42.11 lu.C.16
32523 eimamagi 25 0 140m 62m 3392 R 99.1 0.1 0:42.11 lu.C.16

MPSTAT:
20:45:23 CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
20:45:25 all 50.02 0.00 0.03 0.00 0.00 0.00 0.00 49.95 1005.00
20:45:25 0 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1005.00
20:45:25 1 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:45:25 2 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:45:25 3 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:45:25 4 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:45:25 5 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:45:25 6 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:45:25 7 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:45:25 8 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:45:25 9 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:45:25 10 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:45:25 11 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:45:25 12 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:45:25 13 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:45:25 14 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:45:25 15 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:45:25 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:45:25 17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:45:25 18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:45:25 19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:45:25 20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:45:25 21 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:45:25 22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:45:25 23 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:45:25 24 0.50 0.00 0.50 0.00 0.00 0.00 0.00 99.00 0.00
20:45:25 25 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:45:25 26 0.50 0.00 0.50 0.00 0.00 0.00 0.00 99.00 0.00
20:45:25 27 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:45:25 28 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:45:25 29 0.00 0.00 0.50 0.00 0.00 0.00 0.00 99.50 0.00
20:45:25 30 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:45:25 31 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00

Just for comparison, here's the output when I run 2 instances of lu.C.16. It is pretty obvious that only the first 16 CPUs are used, no matter how many jobs I start.

TOP:
top - 20:47:06 up 56 days, 15:19, 3 users, load average: 16.74, 8.87, 5.66
Tasks: 509 total, 33 running, 476 sleeping, 0 stopped, 0 zombie
Cpu(s): 50.0%us, 0.1%sy, 0.0%ni, 49.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 66072240k total, 10744044k used, 55328196k free, 336564k buffers
Swap: 7999992k total, 0k used, 7999992k free, 7769652k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32671 eimamagi 25 0 140m 62m 3464 R 50.5 0.1 0:05.25 lu.C.16
32673 eimamagi 25 0 140m 63m 3996 R 50.5 0.1 0:05.26 lu.C.16
32510 eimamagi 25 0 140m 63m 3892 R 50.2 0.1 2:01.07 lu.C.16
32511 eimamagi 25 0 140m 63m 3380 R 50.2 0.1 2:01.13 lu.C.16
32512 eimamagi 25 0 140m 64m 4228 R 50.2 0.1 2:01.15 lu.C.16
32513 eimamagi 25 0 140m 64m 5148 R 50.2 0.1 2:01.15 lu.C.16
32514 eimamagi 25 0 140m 64m 4860 R 50.2 0.1 2:01.15 lu.C.16
32516 eimamagi 25 0 141m 63m 4084 R 50.2 0.1 2:01.13 lu.C.16
32519 eimamagi 25 0 140m 63m 4152 R 50.2 0.1 2:01.14 lu.C.16
32521 eimamagi 25 0 140m 63m 4468 R 50.2 0.1 2:01.14 lu.C.16
32523 eimamagi 25 0 140m 62m 3756 R 50.2 0.1 2:01.13 lu.C.16
32659 eimamagi 25 0 140m 79m 19m R 50.2 0.1 0:05.25 lu.C.16
32660 eimamagi 25 0 140m 63m 4160 R 50.2 0.1 0:05.26 lu.C.16
32662 eimamagi 25 0 140m 63m 3280 R 50.2 0.1 0:05.27 lu.C.16
32664 eimamagi 25 0 141m 64m 5140 R 50.2 0.1 0:05.27 lu.C.16
32665 eimamagi 25 0 140m 64m 4876 R 50.2 0.1 0:05.27 lu.C.16
32666 eimamagi 25 0 140m 64m 4348 R 50.2 0.1 0:05.27 lu.C.16
32668 eimamagi 25 0 140m 64m 4688 R 50.2 0.1 0:05.26 lu.C.16
32669 eimamagi 25 0 140m 63m 4416 R 50.2 0.1 0:05.26 lu.C.16
32672 eimamagi 25 0 140m 63m 4396 R 50.2 0.1 0:05.26 lu.C.16
32508 eimamagi 25 0 140m 79m 19m R 49.8 0.1 2:01.14 lu.C.16
32509 eimamagi 25 0 140m 63m 4176 R 49.8 0.1 2:01.13 lu.C.16
32515 eimamagi 25 0 140m 64m 4404 R 49.8 0.1 2:01.14 lu.C.16
32517 eimamagi 25 0 141m 64m 4716 R 49.8 0.1 2:01.14 lu.C.16
32518 eimamagi 25 0 140m 64m 4660 R 49.8 0.1 2:01.13 lu.C.16
32520 eimamagi 25 0 140m 63m 3960 R 49.8 0.1 2:01.15 lu.C.16
32522 eimamagi 25 0 140m 63m 4484 R 49.8 0.1 2:01.14 lu.C.16
32661 eimamagi 25 0 140m 63m 3776 R 49.8 0.1 0:05.27 lu.C.16
32663 eimamagi 25 0 140m 64m 4216 R 49.8 0.1 0:05.26 lu.C.16
32667 eimamagi 25 0 140m 63m 3896 R 49.8 0.1 0:05.27 lu.C.16
32670 eimamagi 25 0 140m 63m 3956 R 49.8 0.1 0:05.26 lu.C.16
32674 eimamagi 25 0 140m 62m 3408 R 49.8 0.1 0:05.27 lu.C.16

MPSTAT:
20:47:35 CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
20:47:37 all 50.00 0.02 0.08 0.00 0.00 0.00 0.00 49.91 1004.50
20:47:37 0 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1004.50
20:47:37 1 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:47:37 2 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:47:37 3 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:47:37 4 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:47:37 5 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:47:37 6 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:47:37 7 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:47:37 8 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:47:37 9 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:47:37 10 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:47:37 11 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:47:37 12 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:47:37 13 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:47:37 14 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:47:37 15 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:47:37 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:47:37 17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:47:37 18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:47:37 19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:47:37 20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:47:37 21 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:47:37 22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:47:37 23 0.00 0.00 0.50 0.00 0.00 0.00 0.00 99.50 0.00
20:47:37 24 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:47:37 25 0.00 0.00 0.50 0.00 0.00 0.00 0.00 99.50 0.00
20:47:37 26 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:47:37 27 0.00 0.00 0.50 0.00 0.00 0.00 0.00 99.50 0.00
20:47:37 28 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:47:37 29 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:47:37 30 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 0.00
20:47:37 31 0.00 0.00 0.50 0.00 0.00 0.00 0.00 99.50 0.00

> You also indicated in your original e-mail that a single node has 32
> cores. I am assuming that it has eight sockets of four cores each. Are
> these Opterons or any other processor type?

Quad-Core AMD Opteron(tm) Processor 8384.

Thanks,
emir

From kandalla at cse.ohio-state.edu  Sat Aug 29 17:16:04 2009
From: kandalla at cse.ohio-state.edu (Krishna Chaitanya Kandalla)
Date: Sat Aug 29 17:16:58 2009
Subject: [mvapich-discuss] Problem with more MPI jobs on the same node
In-Reply-To: <4A997808.7060806@srce.hr>
References: <4A997808.7060806@srce.hr>
Message-ID: <4A999A94.5050103@cse.ohio-state.edu>

Emir,

> mpirun_rsh -ssh -np 8 -hostfile ./machines VIADEV_CPU_MAPPING=0,1,2,3,4,5,6,7 VIADEV_USE_AFFINITY=1 ./lu.C.8.mvapich
> mpirun_rsh -ssh -np 8 -hostfile ./machines VIADEV_CPU_MAPPING=8,9,10,11,12,13,14,15 VIADEV_USE_AFFINITY=1 ./lu.C.8.mvapich

This should ensure that the processes get mapped to the core IDs that you have specified. It is a little strange that this is not happening on your systems. You can tweak the "top" output to also show the "last used cpu" information for each process running within a node. This information will help us confirm that the 16 processes are indeed being mapped onto the first 8 cores and that nothing else is going on. To do this:

1. Open the top interface, hit the "f" key, hit the "j" key and press Return.
2. Optionally, you can then hit the "o" key and hold the Shift and "j" keys so that the "J" and "A" fields are juxtaposed; this makes them easier to compare visually.

Thanks,
Krishna

Emir Imamagic wrote:
> Dhabaleswar Panda wrote:
>> What is the output of top and mpstat when you run a 16-process LU job on
>> the same 16 cores (0-15)?
>
> Command:
> mpirun_rsh -ssh -np 16 -hostfile ./machines VIADEV_USE_AFFINITY=0
> ./lu.C.16
>
> [...]
>
>> You also indicated in your original e-mail that a single node has 32
>> cores. I am assuming that it has eight sockets of four cores each. Are
>> these Opterons or any other processor type?
>
> Quad-Core AMD Opteron(tm) Processor 8384.
>
> Thanks,
> emir

From eimamagi at srce.hr  Sat Aug 29 17:25:04 2009
From: eimamagi at srce.hr (Emir Imamagic)
Date: Sat Aug 29 17:26:44 2009
Subject: [mvapich-discuss] Problem with more MPI jobs on the same node
In-Reply-To: <4A999A94.5050103@cse.ohio-state.edu>
References: <4A997808.7060806@srce.hr> <4A999A94.5050103@cse.ohio-state.edu>
Message-ID: <4A999CB0.9070304@srce.hr>

Krishna Chaitanya Kandalla wrote:
> 2. Optionally, you can then hit the "o" key and hold the Shift and "j"
> keys so that the "J" and "A" fields are juxtaposed; this makes them
> easier to compare visually.
top - 23:24:01 up 56 days, 17:56, 3 users, load average: 15.17, 7.18, 2.78 Tasks: 493 total, 17 running, 476 sleeping, 0 stopped, 0 zombie Cpu(s): 25.0%us, 0.1%sy, 0.0%ni, 74.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 66072240k total, 10470960k used, 55601280k free, 336688k buffers Swap: 7999992k total, 0k used, 7999992k free, 7723584k cached P PID USER PR NI VIRT SHR RES %CPU %MEM S TIME+ COMMAND 0 1267 eimamagi 25 0 164m 11m 117m 50.2 0.2 R 1:28.06 lu.C.8.mvapich 2 1269 eimamagi 25 0 164m 5092 111m 50.2 0.2 R 1:28.38 lu.C.8.mvapich 5 1272 eimamagi 25 0 164m 4988 111m 50.2 0.2 R 1:28.05 lu.C.8.mvapich 6 1273 eimamagi 25 0 164m 4220 110m 50.2 0.2 R 1:28.06 lu.C.8.mvapich 1 1338 eimamagi 25 0 164m 5040 111m 50.2 0.2 R 1:28.45 lu.C.8.mvapich 3 1382 eimamagi 25 0 164m 3352 109m 50.2 0.2 R 1:28.09 lu.C.8.mvapich 4 1383 eimamagi 25 0 164m 3184 109m 50.2 0.2 R 1:28.15 lu.C.8.mvapich 7 1386 eimamagi 25 0 164m 3356 109m 50.2 0.2 R 1:28.15 lu.C.8.mvapich 1 1268 eimamagi 25 0 164m 4976 111m 49.8 0.2 R 1:27.74 lu.C.8.mvapich 3 1270 eimamagi 25 0 164m 3180 109m 49.8 0.2 R 1:28.10 lu.C.8.mvapich 4 1271 eimamagi 25 0 164m 3124 109m 49.8 0.2 R 1:28.04 lu.C.8.mvapich 7 1274 eimamagi 25 0 164m 3744 109m 49.8 0.2 R 1:28.04 lu.C.8.mvapich 0 1337 eimamagi 25 0 164m 11m 117m 49.8 0.2 R 1:28.11 lu.C.8.mvapich 2 1381 eimamagi 25 0 164m 4956 111m 49.8 0.2 R 1:27.75 lu.C.8.mvapich 5 1384 eimamagi 25 0 164m 4992 111m 49.8 0.2 R 1:28.13 lu.C.8.mvapich 6 1385 eimamagi 25 0 164m 4232 110m 49.8 0.2 R 1:28.14 lu.C.8.mvapich ID's of cores which are used are consistent with output of mpstat below. 
Here is also the output of mpstat -P ALL 2 5, which clearly shows what is going on:

23:21:45  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
23:21:47  all   25.01    0.00    0.05    0.00    0.00    0.00    0.00   74.94   1004.00
23:21:47    0  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   1004.00
23:21:47    1  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
23:21:47    2  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
23:21:47    3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
23:21:47    4  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
23:21:47    5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
23:21:47    6  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
23:21:47    7  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
23:21:47    8    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47    9    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   10    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   11    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   12    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   13    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   14    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   15    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   16    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   17    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   18    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   19    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   20    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   21    0.00    0.00    0.50    0.00    0.00    0.00    0.00   99.50      0.00
23:21:47   22    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   23    0.50    0.00    1.00    0.00    0.00    0.00    0.00   98.51      0.00
23:21:47   24    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   25    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   26    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   27    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   28    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   29    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   30    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
23:21:47   31    0.00    0.00    0.50    0.00    0.00    0.00    0.00   99.50      0.00

Thanks,
emir
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3283 bytes
Desc: S/MIME Cryptographic Signature
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090829/0a0dc530/smime.bin

From kubota at cray.com Mon Aug 31 00:59:29 2009
From: kubota at cray.com (Yutaka Kubota)
Date: Mon Aug 31 01:00:23 2009
Subject: [mvapich-discuss] MV2_USE_BLOCKING option trouble on MVAPICH2 all version
Message-ID: <3B7D8CBBF8049C4C9746728929189A920EBF2DE3@CFEVS1-IP.americas.cray.com>

Dear MVAPICH2 Members,

How do you do. This is Yutaka Kubota from Cray Japan.
This is the first time I am submitting an MVAPICH2 issue that we found.

We found a problem with blocking mode using cpi.c, the MVAPICH2 sample
program. The problem appears when the "MV2_USE_BLOCKING=1" option is used
with 65 or more cores: the program prints the message
"Created comp channel 0x1ff6ce10" and then stops partway through. We think
it is an MVAPICH2 issue, because the problem does not appear with the
"VIADEV_USE_BLOCKING=1" option and 65 or more cores on MVAPICH.

$ mpirun_rsh -np 65 -hostfile hostfile.txt MV2_USE_BLOCKING=0 ./a.out
Created comp channel 0x1ff6ce10
* With "-np 64" this message does not appear.

I also found a compile issue with the latest MVAPICH2 versions.
I suspected this issue was limited to version 1.2-10-01; however, when we
compiled the latest versions, the following message still appeared during
compilation:

-1.2p1 and 1.4rc1
/opt/intel/cce/10.1.015/lib/libimf.so: warning: warning: feupdateenv is not implemented and will always fail
* But a.out was still produced.

If you need this source code, I will send it to you. The user has consented
to our sending the user program to the developer team.
Best regards

Yutaka Kubota, Cray Japan Inc

From kubota at cray.com Mon Aug 31 02:43:07 2009
From: kubota at cray.com (Yutaka Kubota)
Date: Mon Aug 31 02:43:55 2009
Subject: [mvapich-discuss] Global Array Programing Issue on the MVAPICH2
Message-ID: <3B7D8CBBF8049C4C9746728929189A920EBF2DF9@CFEVS1-IP.americas.cray.com>

Dear MVAPICH2 members,

This is Yutaka Kubota from Cray Japan Inc.

One of our users wrote a program using Global Arrays functions on MVAPICH2.
The program runs to completion on 1 node with 16 cores. However, on 4 nodes
with 16 cores in total (4 cores per node), the following error message
appears and the program stops partway through:

[4] Abort: Control shouldn't reach here in prototype, header 172 at line 274 in file ibv_recv.c

I recompiled MVAPICH2 and the user program with the debug option. We traced
the problem to the MVAPICH2 function "MPIDI_CH3I_MRAIL_Parse_header" in the
source file ibv_recv.c. As we understand it, this function handles 37 header
types, and if the header type matches none of those 37 types, it prints this
error message and stops. We suspect this is a problem in MVAPICH2.

Could you investigate why the "MPIDI_CH3I_MRAIL_Parse_header" function
receives a header type other than these 37? And why does this program get a
3-digit number as the header type (e.g. 172, 274, ...)?

If you need the user program, I can send it to you; the user has already
consented to our sending it to the developer team.
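Per the description above, MPIDI_CH3I_MRAIL_Parse_header dispatches on the packet type and aborts when the value matches none of the known types. In the abort message itself, 172 is the reported header value, while 274 is the line number in ibv_recv.c at which the abort is reported. The minimal C sketch below models that dispatch pattern with a hypothetical, cut-down enum (pkt_type_t and parse_header_type are illustrative names, not MVAPICH2 code): any value outside the listed cases, such as 172, lands in the default branch. It only shows where such a message comes from; it does not explain why an unexpected header value arrives in the first place.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the packet-type enum. The real MVAPICH2 enum
 * has the 37 MPIDI_CH3_PKT_* entries listed in this report; their numeric
 * values are not given here, so only a few illustrative members appear. */
typedef enum {
    PKT_EAGER_SEND = 0,
    PKT_RNDV_REQ_TO_SEND,
    PKT_RNDV_CLR_TO_SEND,
    PKT_NOOP,
    PKT_NUM_TYPES /* sentinel: number of known types, not a real packet */
} pkt_type_t;

/* Dispatch on the header type. A value that matches no case falls into
 * the default branch, which is where an "unknown header" abort fires. */
int parse_header_type(int type)
{
    switch (type) {
    case PKT_EAGER_SEND:
    case PKT_RNDV_REQ_TO_SEND:
    case PKT_RNDV_CLR_TO_SEND:
    case PKT_NOOP:
        return 0; /* recognised packet type */
    default:
        fprintf(stderr, "unknown header type %d\n", type);
        return -1; /* the real code aborts here instead of returning */
    }
}
```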
* Source file that checks the header type:
  mvapich2-1.2p1/src/mpid/ch3/channels/mrail/src/gen2/ibv_recv.c

* The 37 header types:
  MPIDI_CH3_PKT_FAST_EAGER_SEND
  MPIDI_CH3_PKT_FAST_EAGER_SEND_WITH_REQ
  MPIDI_CH3_PKT_EAGER_SEND
  MPIDI_CH3_PKT_RNDV_REQ_TO_SEND
  MPIDI_CH3_PKT_RNDV_READY_REQ_TO_SEND
  MPIDI_CH3_PKT_RNDV_CLR_TO_SEND
  MPIDI_CH3_PKT_RMA_RNDV_CLR_TO_SEND
  MPIDI_CH3_PKT_RPUT_FINISH
  MPIDI_CH3_PKT_NOOP
  MPIDI_CH3_PKT_EAGER_SYNC_ACK
  MPIDI_CH3_PKT_CANCEL_SEND_REQ
  MPIDI_CH3_PKT_CANCEL_SEND_RESP
  MPIDI_CH3_PKT_CM_SUSPEND
  MPIDI_CH3_PKT_CM_REACTIVATION_DONE
  MPIDI_CH3_PKT_CR_REMOTE_UPDATE
  MPIDI_CH3_PKT_ADDRESS
  MPIDI_CH3_PKT_PACKETIZED_SEND_START
  MPIDI_CH3_PKT_PACKETIZED_SEND_DATA
  MPIDI_CH3_PKT_RNDV_R3_DATA
  MPIDI_CH3_PKT_EAGER_SYNC_SEND
  MPIDI_CH3_PKT_READY_SEND
  MPIDI_CH3_PKT_PUT
  MPIDI_CH3_PKT_PUT_RNDV
  MPIDI_CH3_PKT_GET_RNDV
  MPIDI_CH3_PKT_GET
  MPIDI_CH3_PKT_GET_RESP
  MPIDI_CH3_PKT_ACCUMULATE
  MPIDI_CH3_PKT_ACCUMULATE_RNDV
  MPIDI_CH3_PKT_LOCK
  MPIDI_CH3_PKT_LOCK_GRANTED
  MPIDI_CH3_PKT_PT_RMA_DONE
  MPIDI_CH3_PKT_LOCK_PUT_UNLOCK
  MPIDI_CH3_PKT_LOCK_GET_UNLOCK
  MPIDI_CH3_PKT_LOCK_ACCUM_UNLOCK
  MPIDI_CH3_PKT_FLOW_CNTL_UPDATE
  MPIDI_CH3_PKT_CLOSE
  MPIDI_CH3_PKT_RGET_FINISH

Best regards

Yutaka Kubota, Cray Japan Inc.

From potluri at cse.ohio-state.edu Mon Aug 31 14:19:04 2009
From: potluri at cse.ohio-state.edu (sreeram potluri)
Date: Mon Aug 31 14:19:50 2009
Subject: [mvapich-discuss] MV2_USE_BLOCKING option trouble on MVAPICH2 all version
In-Reply-To: <23b6b0910908311117k694c8c48w5d4da0de19f796d0@mail.gmail.com>
References: <3B7D8CBBF8049C4C9746728929189A920EBF2DE3@CFEVS1-IP.americas.cray.com> <23b6b0910908311117k694c8c48w5d4da0de19f796d0@mail.gmail.com>
Message-ID: <23b6b0910908311119u40157f92y7ede2b5512861bfc@mail.gmail.com>

Hi Yutaka,

Regarding the issue you are seeing with BLOCKING mode: beyond 64 processes,
MVAPICH2 uses on-demand connection setup, and the combination of on-demand
connection setup and blocking mode is not supported yet. We plan to add this
in our next release.
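The rule described in this reply can be written down as a small predicate. The minimal C sketch below is a model under stated assumptions, not MVAPICH2 code: the helper names are hypothetical, the default of 64 is taken from the observation that -np 64 works while -np 65 hangs, and the way the MV2_ON_DEMAND_THRESHOLD value is parsed here is illustrative only. It encodes two statements from the reply: on-demand connection setup is used when the process count exceeds the threshold, and blocking mode is not supported together with on-demand setup.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Default taken from this thread: on-demand setup starts beyond 64 processes. */
#define DEFAULT_ON_DEMAND_THRESHOLD 64

/* Hypothetical helper: effective threshold, overridden by the
 * MV2_ON_DEMAND_THRESHOLD environment variable when it holds a
 * positive integer. The real library's parsing may differ. */
int on_demand_threshold(void)
{
    const char *s = getenv("MV2_ON_DEMAND_THRESHOLD");
    char *end = NULL;

    if (s == NULL || *s == '\0')
        return DEFAULT_ON_DEMAND_THRESHOLD;
    long v = strtol(s, &end, 10);
    if (*end != '\0' || v <= 0 || v > INT_MAX)
        return DEFAULT_ON_DEMAND_THRESHOLD;
    return (int)v;
}

/* On-demand connection management is used only above the threshold. */
bool uses_on_demand(int nprocs, int threshold)
{
    return nprocs > threshold;
}

/* Blocking mode is unsupported together with on-demand setup (per the reply). */
bool blocking_mode_supported(int nprocs, int threshold)
{
    assert(nprocs > 0);
    return !uses_on_demand(nprocs, threshold);
}
```

With the default threshold, 64 processes can use blocking mode but 65 cannot; raising the threshold to 128, as in the workaround that follows, makes a 65-process job acceptable again. On the mpirun_rsh command line from the original report, that corresponds to passing MV2_ON_DEMAND_THRESHOLD=128 alongside MV2_USE_BLOCKING=1.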
As a workaround, you can set MV2_ON_DEMAND_THRESHOLD=128 at run time. This
turns off on-demand connection management for jobs of up to 128 processes,
so blocking mode should work fine. To run with a larger number of processes,
set MV2_ON_DEMAND_THRESHOLD accordingly.

We are still looking into the compile-time issue you are seeing. It would be
helpful if you could forward us the source code.

Please let us know if you face any other issues.

Thanks
Sreeram Potluri

On Mon, Aug 31, 2009 at 12:59 AM, Yutaka Kubota wrote:

> Dear MVAPICH2 Members,
>
> How do you do. This is Yutaka Kubota from Cray Japan.
> This is the first time I am submitting an MVAPICH2 issue that we found.
>
> We found a problem with blocking mode using cpi.c, the MVAPICH2 sample
> program. The problem appears when the "MV2_USE_BLOCKING=1" option is used
> with 65 or more cores: the program prints the message
> "Created comp channel 0x1ff6ce10" and then stops partway through. We think
> it is an MVAPICH2 issue, because the problem does not appear with the
> "VIADEV_USE_BLOCKING=1" option and 65 or more cores on MVAPICH.
>
> $ mpirun_rsh -np 65 -hostfile hostfile.txt MV2_USE_BLOCKING=0 ./a.out
> Created comp channel 0x1ff6ce10
> * With "-np 64" this message does not appear.
>
> I also found a compile issue with the latest MVAPICH2 versions.
> I suspected this issue was limited to version 1.2-10-01; however, when we
> compiled the latest versions, the following message still appeared during
> compilation:
>
> -1.2p1 and 1.4rc1
> /opt/intel/cce/10.1.015/lib/libimf.so: warning: warning: feupdateenv is not implemented and will always fail
> * But a.out was still produced.
>
> If you need this source code, I will send it to you. The user has consented
> to our sending the user program to the developer team.
>
> Best regards
>
> Yutaka Kubota, Cray Japan Inc
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090831/0ecfd652/attachment-0001.html

From panda at cse.ohio-state.edu Mon Aug 31 21:22:03 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon Aug 31 21:22:38 2009
Subject: [mvapich-discuss] Problem with more MPI jobs on the same node
In-Reply-To: <4A9C6FBF.8030006@srce.hr>
Message-ID:

I am very glad to know that your problem is solved with the latest mvapich 1.1
nightly build from the mvapich web site. I am posting this reply to the
mvapich-discuss list for other users.

We do not test the mvapich versions included in RHEL releases; Red Hat does
this testing. I am cc'ing this note to Doug Ledford.

Doug - Could you or your team members take a look at this issue? It looks like
the affinity-related functionality is not working correctly with the mvapich
1.1 RPM version included in RHEL 5.3. There is a thread of discussion on this
issue on the mvapich-discuss list.

Thanks,

DK

On Tue, 1 Sep 2009, Emir Imamagic wrote:

> Dhabaleswar Panda wrote:
> > Thanks for the update. Would it be possible for you to download the mvapich
> > 1.1.0 `branch' version from our web site and let us know whether it
> > exhibits the same behavior or not? This will help us to isolate whether
> > it is a problem with the specific version in the SRPM or not.
>
> Very good, this solved the problem. I downloaded the latest nightly
> build (mvapich-1.1-2009-08-30.tar.gz), rebuilt the RPM and voila, all my
> CPUs are completely utilized (see top output below).
>
> It is probably a good idea for you to test the version distributed with
> the latest RHEL 5.3 and try to reproduce the error.
> I guess RHEL users would want to know that they are getting a problematic
> version.
>
> Cheers,
> emir
>
> top - 02:46:42 up 58 days, 21:19,  2 users,  load average: 30.02, 13.39, 5.11
> Tasks: 509 total,  33 running, 476 sleeping,   0 stopped,   0 zombie
> Cpu(s): 99.9%us,  0.1%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  66072240k total, 10799684k used, 55272556k free,   337052k buffers
> Swap:  7999992k total,        0k used,  7999992k free,  7831652k cached
>
>  P   PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
>  0 18716 eimamagi 25  0  138m  79m  19m R 99.9  0.1  2:42.29 lu.C.16
> 26 18721 eimamagi 25  0  138m  65m 5312 R 99.9  0.1  2:42.40 lu.C.16
> 24 18723 eimamagi 25  0  138m  64m 4512 R 99.9  0.1  2:42.36 lu.C.16
>  2 18724 eimamagi 25  0  138m  64m 4744 R 99.9  0.1  2:42.40 lu.C.16
> 16 18726 eimamagi 25  0  138m  64m 5252 R 99.9  0.1  2:42.35 lu.C.16
>  1 18728 eimamagi 25  0  138m  63m 4164 R 99.9  0.1  2:42.42 lu.C.16
> 23 18729 eimamagi 25  0  138m  63m 4708 R 99.9  0.1  2:42.38 lu.C.16
>  6 18730 eimamagi 25  0  138m  63m 4696 R 99.9  0.1  2:42.41 lu.C.16
> 17 18753 eimamagi 25  0  138m  63m 3548 R 99.9  0.1  2:41.14 lu.C.16
> 19 18761 eimamagi 25  0  138m  64m 4700 R 99.9  0.1  2:41.41 lu.C.16
> 20 18762 eimamagi 25  0  138m  63m 4112 R 99.9  0.1  2:41.35 lu.C.16
> 15 18763 eimamagi 25  0  138m  63m 4632 R 99.9  0.1  2:41.43 lu.C.16
> 27 18764 eimamagi 25  0  138m  63m 4696 R 99.9  0.1  2:41.42 lu.C.16
>  3 18717 eimamagi 25  0  138m  64m 4560 R 99.6  0.1  2:42.41 lu.C.16
> 12 18718 eimamagi 25  0  138m  64m 4324 R 99.6  0.1  2:42.34 lu.C.16
> 21 18719 eimamagi 25  0  138m  63m 3652 R 99.6  0.1  2:42.39 lu.C.16
>  4 18720 eimamagi 25  0  138m  64m 4640 R 99.6  0.1  2:42.39 lu.C.16
> 18 18722 eimamagi 25  0  138m  64m 5152 R 99.6  0.1  2:42.40 lu.C.16
> 13 18727 eimamagi 25  0  138m  64m 4692 R 99.6  0.1  2:42.41 lu.C.16
> 28 18731 eimamagi 25  0  138m  63m 4136 R 99.6  0.1  2:42.35 lu.C.16
> 11 18750 eimamagi 25  0  138m  79m  19m R 99.6  0.1  2:41.40 lu.C.16
>  7 18751 eimamagi 25  0  138m  64m 4824 R 99.6  0.1  2:41.18 lu.C.16
> 22 18752 eimamagi 25  0  138m  64m 4532 R 99.6  0.1  2:41.41 lu.C.16
> 30 18754 eimamagi 25  0  138m  64m 4760 R 99.6  0.1  2:41.41 lu.C.16
>  5 18755 eimamagi 25  0  138m  65m 5312 R 99.6  0.1  2:41.42 lu.C.16
>  8 18757 eimamagi 25  0  138m  64m 4536 R 99.6  0.1  2:41.34 lu.C.16
> 14 18758 eimamagi 25  0  138m  64m 4676 R 99.6  0.1  2:41.41 lu.C.16
> 25 18759 eimamagi 25  0  138m  64m 5296 R 99.6  0.1  2:41.39 lu.C.16
> 10 18765 eimamagi 25  0  138m  62m 3784 R 99.6  0.1  2:41.27 lu.C.16
> 29 18725 eimamagi 25  0  138m  64m 5284 R 99.3  0.1  2:41.56 lu.C.16
>  9 18756 eimamagi 25  0  138m  65m 5288 R 99.3  0.1  2:40.30 lu.C.16
> 31 18760 eimamagi 25  0  138m  64m 5260 R 97.0  0.1  2:36.96 lu.C.16
>
>

From panda at cse.ohio-state.edu Mon Aug 31 21:42:58 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon Aug 31 21:43:30 2009
Subject: [mvapich-discuss] Announcing the release of MVAPICH2 1.4RC2
Message-ID:

The MVAPICH team is pleased to announce the release of MVAPICH2 1.4RC2
with the following added features and bug fixes (compared to the RC1
release):

- Added Feature: Check-point Restart with Fault-Tolerant Backplane
  Support (FTB_CR)
- Added Feature: Multiple CQ-based design for Chelsio iWARP
- Fix for hang with packetized send using RDMA Fast path
- Fix to allow the use of user-specified P_Keys (thanks to Mike Heinz
  @ QLogic)
- Fix to allow mpirun_rsh to accept parameters through the parameters
  file (thanks to Mike Heinz @ QLogic)
- Distribute LiMIC2-0.5.2 with MVAPICH2.
  Added flexibility for selecting and using a pre-existing installation
  of LiMIC2
- Modify the default value of shmem_bcast_leaders to 4K
- Fix for one-sided with XRC support
- Fix hang with XRC
- Fix to always enable MVAPICH2_Sync_Checkpoint functionality
- Increase the command-line length that mpirun_rsh can handle (thanks to
  Bill Barth @ TACC for the suggestion)
- Fix build error on RHEL 4 systems (reported by Nathan Baca and
  Jonathan Atencio)
- Fix issue with PGI compilation for PSM interface
- Fix for one-sided accumulate function with user-defined contiguous
  datatypes
- Fix linear/hierarchical switching logic and reduce threshold for the
  enhanced mpirun_rsh framework
- Clean up intra-node connection management code for iWARP
- Fix --enable-g=all issue with uDAPL interface
- Fix one-sided operation with on-demand CM
- Fix VPATH build

We strongly encourage MVAPICH2 1.4 users to update to the new RC2 version.

To download MVAPICH2 1.4RC2 and the associated user guide, or to access the
SVN repository, please visit the following URL:

http://mvapich.cse.ohio-state.edu

All feedback, including bug reports, hints for performance tuning, patches
and enhancements, is welcome. Please post it to the mvapich-discuss mailing
list.

Thanks,

The MVAPICH Team

From panda at cse.ohio-state.edu Mon Aug 31 23:10:22 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon Aug 31 23:10:52 2009
Subject: [mvapich-discuss] Hang in MPI_Isend/MPI_Recv combination
In-Reply-To: <4A844394.80607@web.de>
Message-ID:

Hi,

We released MVAPICH2 1.4RC2 today. We have run your application with this
version and it seems to work fine. Could you double-check your application
with this version?

Thanks,

DK

On Thu, 13 Aug 2009, Dorian Krause wrote:

> Hi,
>
> Again, these 96 processors ...
>
> My application hangs in a communication step that looks like this:
>
> ---------
> Group A:
>
> for all neighbors {
>     MPI_Isend(...);
> }
> MPI_Waitall(...);
>
> MPI_Barrier();
> ----
> Group B:
>
> while(#messages to receive > 0) {
>     MPI_Probe(MPI_ANY_SOURCE, &stat);
>     q = stat.MPI_SOURCE
>     /* in subfunction: */
>     MPI_Probe(q, &stat)
>     q = stat.MPI_COUNT;
>     MPI_Recv(q, ...);
> }
> MPI_Barrier();
> ----
>
> With more than 96 processes, this application hangs. Since I can't debug at
> this scale, I used gdb to get backtraces. It turned out that 94
> processes are waiting in the barrier, one process is trying to receive
> a message (stuck in MPI_Recv) and one other is waiting in
> MPI_Waitall(...). This looks fine; however, the ranks do not match:
>
> On the PE with rank 83, I have
>
> #3  0x00000000004349b9 in PMPI_Recv (buf=0x1bd96010, count=202,
>     datatype=-1946157051, source=40, tag=374, comm=-1006632954, status=0x1)
>     at recv.c:156
>
> and on the PE with rank *12* I have
>
> #3  0x00000000004368f4 in PMPI_Waitall (count=8,
>     array_of_requests=0x197e6b10, array_of_statuses=0x1)
>     at waitall.c:191
>
> It seems that rank 40 slipped through the MPI_Waitall even though it was
> not supposed to do so ...
>
> Please find the output files attached. There are three processes that
> seem not to be in the barrier (2 on compute-0-3 and 1 on compute-0-13),
> but the one with the short backtrace on compute-0-3 is also in the
> barrier, as I could confirm by hand.
>
> Any hints as to what might cause this error?
>
> I'm using the trunk version of mvapich2 (checked out yesterday) and the
> cluster consists of 14 LS22 blades (Opteron) with 4x DDR InfiniBand. I'm
> not quite sure which OFED version it is (it is delivered with the Rocks
> distribution and they are typically not very verbose concerning version
> numbers ...).
>
> Thanks for your help,
> Dorian