From Sergey.Shalnov at intel.com Fri Jun 1 02:36:33 2007
From: Sergey.Shalnov at intel.com (Shalnov, Sergey)
Date: Fri Jun 1 02:36:55 2007
Subject: [mvapich-discuss] DDR Infiniband configuration for Mvapich-9.9.9
Message-ID:

Gentlemen,
I used mvapich-0.9.9 to work with my applications. I have upgraded my
cluster network from _SDR_ to _DDR_ links and found that mvapich has the
same performance. I use ofed-1.2-rc2. I built mvapich with the
make.mvapich.gen2 script and did not find any configuration macro to
indicate that I have DDR IB. Also, I found that the make.mvapich.vapi
script can be specially configured for DDR IB.

How should I compile mvapich-0.9.9 to achieve maximum performance for a
DDR IB network?

Sergey

PS. I have no vapi library installed on my cluster.

From surs at cse.ohio-state.edu Fri Jun 1 17:22:18 2007
From: surs at cse.ohio-state.edu (Sayantan Sur)
Date: Fri Jun 1 17:22:31 2007
Subject: [mvapich-discuss] DDR Infiniband configuration for Mvapich-9.9.9
In-Reply-To:
References:
Message-ID: <46608E0A.3000805@cse.ohio-state.edu>

Hi Sergey,

Thanks for trying out MVAPICH-0.9.9. You should just compile
MVAPICH-0.9.9 using the make.mvapich.gen2 script. If the OFED drivers are
loaded on the machine on which you are compiling, MVAPICH should
automatically detect all optimal settings.

For which application did you not see any benefit? The OSU bandwidth and
bi-directional bandwidth tests should always show a difference if you
upgraded your links from SDR to DDR.

Thanks,
Sayantan.

Shalnov, Sergey wrote:
> Gentlemen,
> I used mvapich-0.9.9 to work with my applications. I have upgraded my
> cluster network from _SDR_ to _DDR_ links and found that mvapich has the
> same performance. I use ofed-1.2-rc2. I built mvapich with the
> make.mvapich.gen2 script and did not find any configuration macro to
> indicate that I have DDR IB. Also, I found that the make.mvapich.vapi
> script can be specially configured for DDR IB.
>
> How should I compile mvapich-0.9.9 to achieve maximum performance for a
> DDR IB network?
>
> Sergey
>
> PS. I have no vapi library installed on my cluster.
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

--
http://www.cse.ohio-state.edu/~surs

From potts at hpcapplications.com Sun Jun 3 10:04:04 2007
From: potts at hpcapplications.com (Mark Potts)
Date: Sun Jun 3 10:04:21 2007
Subject: [mvapich-discuss] MVAPICH IB /tmp files
Message-ID: <4662CA54.7020609@hpcapplications.com>

Hi,
   I'm observing files created during MVAPICH app startup and need
   to understand their purpose. Their existence seems related to a
   configuration problem on an IB x86_64 cluster system. I'm
   using MVAPICH-0.9.9 ch_gen2 singlerail and the app is quite simple.

   The 3 files are named:
      /tmp/ib_pool-NNN-nodename.tmp
      /tmp/ib_shmem-NNN-nodename.tmp
      /tmp/ib_shmem_coll-NNN-nodename.tmp
   where "NNN" is the MPIRUN_ID of the app and "nodename" is the name
   of the compute node on which the files are found. On some nodes
   the files are empty and on others they have significant binary
   (non-text) content.

   Since these files are not created for all run-time selections
   of compute nodes, I'm trying to understand two things:
   (1) Under what circumstances are these files generated or not, and
   (2) what is the intended content of these files that leaves some
       empty and others "filled"?

   Thanks.
        regards,
--
***********************************
 >> Mark J. Potts, PhD
 >>
 >> HPC Applications Inc.
 >> phone: 410-992-8360 Bus
 >>        410-313-9318 Home
 >>        443-418-4375 Cell
 >> email: potts@hpcapplications.com
 >>        potts@excray.com
***********************************

From t3dinh at yahoo.com Mon Jun 4 04:55:33 2007
From: t3dinh at yahoo.com (phuong dinh)
Date: Mon Jun 4 04:55:46 2007
Subject: [mvapich-discuss] viainit.c errors while building mvapich-0.9.9 with topspin
Message-ID: <928217.11691.qm@web31611.mail.mud.yahoo.com>

I got errors while building mvapich-0.9.9 with topspin and the Intel
compiler on an x86_64 cluster system (Rocks Cent-OS). The errors are the
same as in the thread below.

http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2006-April/000111.html

I thought there was already a patch for this kind of problem. Any thoughts?

Thanks

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20070604/48292009/attachment.html

From THOMAS.T.O'SHEA at saic.com Mon Jun 4 11:37:23 2007
From: THOMAS.T.O'SHEA at saic.com (Thomas O'Shea)
Date: Mon Jun 4 11:37:48 2007
Subject: [mvapich-discuss] MVAPICH2 Error - Assertion 'current_bytes[vc->smp.local_nodes]==0' failed.
References:
Message-ID: <055401c7a6be$3f55e7a0$9b66798b@us.saic.com>

We migrated over to gen2 (OpenFabrics) and we are still getting the same
errors. I was wondering if you found anything, or have any ideas of what to
try next.

Thanks,
Tom

----- Original Message -----
From: "wei huang"
To: "Thomas O'Shea"
Cc:
Sent: Friday, May 04, 2007 3:06 PM
Subject: Re: [mvapich-discuss] MVAPICH2 Error - Assertion 'current_bytes[vc->smp.local_nodes]==0' failed.

> Hi Thomas,
>
> Thanks for your reply.
>
> Because the source code of your application is not available to us, we
> will do a code review of our code (or do you have a piece of code which
> shows the problem that can be sent to us?)
>
> The reason I ask you to try the gen2 (OpenFabrics) stack is because the
> whole InfiniBand community is moving towards this. So actually most of our
> effort is spent on this front (though we still maintain certain necessary
> maintenance and bug fixes for the vapi stack). You can find useful
> information to install the OFED stack (OpenFabrics Enterprise
> Distribution) here:
>
> http://www.openfabrics.org/downloads.htm
>
> And the information to compile mvapich2 with the OFED stack is available
> through our website.
>
> Anyway, we will get back to you once we find something.
>
> Thanks.
>
> Regards,
> Wei Huang
>
> 774 Dreese Lab, 2015 Neil Ave,
> Dept. of Computer Science and Engineering
> Ohio State University
> OH 43210
> Tel: (614)292-8501
>
>
> On Fri, 4 May 2007, Thomas O'Shea wrote:
>
> > Thanks for the response.
> >
> > 1) Turns out we are using mvapich2-0.9.8p1 already.
> >
> > 2) Yes, the standard compiling scripts were used.
> >
> > 3) You are correct, most of the communication involves one sided
> > operations with passive synchronization. The code also uses a few other
> > MPI commands.
> >
> > We define MPI vector types:
> >
> >       CALL MPI_TYPE_VECTOR(xlen,nguard,iu_bnd,MPI_DOUBLE_PRECISION,
> >      &                     xtype,ierr)
> >
> >       CALL MPI_TYPE_COMMIT(xtype,ierr)
> >
> > Create MPI Windows:
> >
> >       CALL MPI_WIN_CREATE(work,winsize,8,MPI_INFO_NULL,
> >      &                    MPI_COMM_WORLD,win,ierr)
> >
> > Synch our gets with lock and unlock:
> >
> >       CALL MPI_WIN_LOCK(MPI_LOCK_SHARED,get_pe,0,win,ierr)
> >       CALL MPI_GET(wget,1,xtype,get_pe,
> >      &             targ_disp,1,xtype,win,ierr)
> >       CALL MPI_WIN_UNLOCK(get_pe,win,ierr)
> >
> > We use one broadcast call
> >
> >       call MPI_BCAST(qxyz,3*maxpan,MPI_DOUBLE_PRECISION,0,
> >      1               MPI_COMM_WORLD,ierr)
> >
> > And of course barriers and freeing the windows and vector types.
> >
> > The error we are getting happens on a MPI_WIN_UNLOCK after a GET call
> > that does not use the MPI_TYPE_VECTOR that we created though. The ierr
> > from the GET call is 0 as well.
> >
> >
> > 4) I talked with the IT person in charge of this cluster and he said
> > that we could try that, but he said the documentation he found on gen2
> > and udapl was somewhat sparse in that he wasn't sure exactly how to set
> > that up and what the different compilations actually do differently. Is
> > there any resource you can point us towards?
> >
> > Thanks,
> > Tom
> >
> >
> > > Hi Thomas,
> > >
> > > We will look into this issue. Would you please let us know the
> > > following:
> > >
> > > 1) We have recently made a couple of bug fixes and released
> > > mvapich2-0.9.8p1. Would you first try that version?
> > >
> > > And if it is not working:
> > >
> > > 2) Did you use the standard compiling scripts (you mentioned ib gold
> > > release, is it on vapi? And did you use make.mvapich2.vapi?)
> > >
> > > 3) Would you provide us some information on the communication patterns
> > > of your application? It seems like one sided operations with passive
> > > synchronization (lock, get, unlock). Did you use other operations?
> > >
> > > 4) Will it be possible for you to try gen2 (make.mvapich2.ofa) or udapl
> > > on your stack, if they are available on your systems?
> > >
> > > Thanks.
> > >
> > > Regards,
> > > Wei Huang
> > >
> > > 774 Dreese Lab, 2015 Neil Ave,
> > > Dept. of Computer Science and Engineering
> > > Ohio State University
> > > OH 43210
> > > Tel: (614)292-8501
> > >
> > >
> > > On Thu, 3 May 2007, Thomas O'Shea wrote:
> > >
> > > > Hello,
> > > >
> > > > I'm running the MVAPICH2-0.9.8 using the IB Gold Release. I've got 2
> > > > 16 processor nodes (each has 8 dual-core AMD Opterons) hooked up
> > > > through infiniband. I started off running this parallel Fortran code
> > > > on just one node with MPICH2 and had no problems. It scaled decently
> > > > to 8 processors but didn't see much improvement with the jump to 16
> > > > (possibly due to cache coherency or something). Now, when trying to
> > > > get it running across the infiniband connect I get this error:
> > > >
> > > > current bytes 4, total bytes 28, remote id 1
> > > > nfa_opt: ch3_smp_progress.c:2075: MPIDI_CH3I_SMP_pull_header: Assertion 'current_bytes[vc->smp.local_nodes] == 0' failed.
> > > > rank 0 in job 1 nessie_32906 caused collective abort of all ranks
> > > > exit status of rank 0: killed by signal 9
> > > >
> > > > This happens right after a one sided communication (MPI_GET) but
> > > > before the MPI_WIN_UNLOCK call that follows. Also this is only with a
> > > > process that is on the same node as the calling process. The MPI_GET
> > > > call exits with no errors also.
> > > >
> > > > All the osu_benchmarks run with no problems. There were also no
> > > > problems if I make a local mpd (mpd &) ring on a single node and run
> > > > the code with MVAPICH2 with 2, 4, 8, or 16 processors. If I compile
> > > > with the MPICH2 libraries there are no problems on a single node or
> > > > running processes spread out on both nodes.
> > > >
> > > > Ever seen this before? Any help would be greatly appreciated.
> > > >
> > > > Thanks,
> > > > Thomas O'Shea
> > > > SAIC
> >

From huanwei at cse.ohio-state.edu Mon Jun 4 11:50:25 2007
From: huanwei at cse.ohio-state.edu (wei huang)
Date: Mon Jun 4 11:50:39 2007
Subject: [mvapich-discuss] MVAPICH2 Error - Assertion 'current_bytes[vc->smp.local_nodes]==0' failed.
In-Reply-To: <055401c7a6be$3f55e7a0$9b66798b@us.saic.com>
Message-ID:

Hi,

We've been carrying out thorough testing on our code base. Up to now, we
did not find any outstanding error in the MPI one sided code. Can we get
access to your source code or get a small program showing the problem? It
will be the easiest way for us to find the problem.

Also, since this is an assertion failure in the SMP part of the code, you
can also try compiling mvapich2 without SMP by removing the -D_SMP_ flag
from your CFLAGS. Corresponding changes can be made in our
make.mvapich2.ofa.
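For anyone scripting the rebuild, the flag removal suggested here can be sketched as below. This is only an illustration in Python: the helper name `drop_flag` and the sample flag string are hypothetical, and the real CFLAGS line in make.mvapich2.ofa may look different. Whole tokens are compared so that a longer flag starting with the same characters is left alone.

```python
def drop_flag(cflags, flag="-D_SMP_"):
    """Return cflags with every exact occurrence of `flag` removed.

    Tokens are compared whole, so a longer flag such as -D_SMP_RNDV_
    is not affected when -D_SMP_ is removed.
    """
    return " ".join(tok for tok in cflags.split() if tok != flag)


if __name__ == "__main__":
    # Hypothetical flag string, for illustration only.
    sample = "-O2 -D_SMP_ -D_SMP_RNDV_ -DCH_GEN2"
    print(drop_flag(sample))
```

The resulting string would then be pasted back into the build script by hand; the sketch deliberately does not edit any file.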
Let's see if your program runs successfully with that change.

Thanks.

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501


On Mon, 4 Jun 2007, Thomas O'Shea wrote:

> We migrated over to gen2 (OpenFabrics) and we are still getting the same
> errors. I was wondering if you found anything, or have any ideas of what to
> try next.
>
> Thanks,
> Tom

From sylvain.jeaugey at bull.net Mon Jun 4 12:29:58 2007
From: sylvain.jeaugey at bull.net (Sylvain Jeaugey)
Date: Mon Jun 4 12:30:01 2007
Subject: [mvapich-discuss] MVAPICH2 Error - Assertion 'current_bytes[vc->smp.local_nodes]==0' failed.
In-Reply-To: <055401c7a6be$3f55e7a0$9b66798b@us.saic.com>
References: <055401c7a6be$3f55e7a0$9b66798b@us.saic.com>
Message-ID:

Hi all,

For the record, this is an error I already encountered. [I didn't report
it since I'm still using an old mvapich tree.] Unfortunately, we also
don't have a simple way to reproduce it.

Sylvain

On Mon, 4 Jun 2007, Thomas O'Shea wrote:

> We migrated over to gen2 (OpenFabrics) and we are still getting the same
> errors. I was wondering if you found anything, or have any ideas of what to
> try next.
>
> Thanks,
> Tom

From chai.15 at osu.edu Mon Jun 4 14:21:30 2007
From: chai.15 at osu.edu (LEI CHAI)
Date: Mon Jun 4 14:21:46 2007
Subject: [mvapich-discuss] MVAPICH IB /tmp files
Message-ID: <53962953823d.53823d539629@osu.edu>

Dr. Potts,

The files you have seen are not a problem :-) MVAPICH uses shared memory
for intra-node communication. All the processes on the same node mmap some
temporary files, and use the mapped memory regions for intra-node
point-to-point and collective communication. As long as you compile
MVAPICH with the _SMP_ flag, you will see the files during startup. The
files will be cleaned up after the program finishes.
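The file-backed sharing described here can be sketched in a few lines. This is a minimal illustration in Python, not MVAPICH's actual code: the function name, the file prefix and the 4096-byte size are assumptions, and two mappings inside one process stand in for two MPI processes on the same node. It shows a backing file that holds no data when created, is then sized and mapped, and carries bytes from one mapping to the other.

```python
import mmap
import os
import tempfile


def share_through_file(payload, size=4096):
    """Write payload through one mapping and read it back through another.

    Returns (file size right after creation, bytes seen by the reader).
    """
    # Hypothetical file name; MVAPICH's real naming scheme is not reproduced.
    fd, path = tempfile.mkstemp(prefix="ib_shmem-demo-", suffix=".tmp")
    try:
        size_at_creation = os.fstat(fd).st_size  # the new file holds no data yet
        os.ftruncate(fd, size)                   # give the file the size to be mapped
        writer = mmap.mmap(fd, size)             # read/write mapping backed by the file
        reader = mmap.mmap(fd, size)             # stands in for a second local process
        try:
            writer[:len(payload)] = payload      # touch the region through one mapping
            seen = bytes(reader[:len(payload)])  # the other mapping sees the same bytes
        finally:
            reader.close()
            writer.close()
        return size_at_creation, seen
    finally:
        os.close(fd)
        os.unlink(path)                          # remove the temporary file


if __name__ == "__main__":
    print(share_through_file(b"hello from rank 0"))
```

In a real multi-process setting each process would open and map the same path itself; the single-process version only keeps the sketch self-contained.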
The files are initially created empty, and then the processes touch the
files to bring them into memory. So depending on when you look at the
files, right after the empty files are created or after they have been
touched by the processes, the files may be "empty" or "filled".

Lei

----- Original Message -----
From: Mark Potts
Date: Sunday, June 3, 2007 10:04 am
Subject: [mvapich-discuss] MVAPICH IB /tmp files

> Hi,
>    I'm observing files created during MVAPICH app startup and need
>    to understand their purpose. Their existence seems related to a
>    configuration problem on an IB x86_64 cluster system. I'm
>    using MVAPICH-0.9.9 ch_gen2 singlerail and the app is quite simple.
>
>    The 3 files are named:
>       /tmp/ib_pool-NNN-nodename.tmp
>       /tmp/ib_shmem-NNN-nodename.tmp
>       /tmp/ib_shmem_coll-NNN-nodename.tmp
>    where "NNN" is the MPIRUN_ID of the app and "nodename" is the name
>    of the compute node on which the files are found. On some nodes
>    the files are empty and on others they have significant binary
>    (non-text) content.
>
>    Since these files are not created for all run-time selections
>    of compute nodes, I'm trying to understand two things:
>    (1) Under what circumstances are these files generated or not, and
>    (2) what is the intended content of these files that leaves some
>        empty and others "filled"?
>
>    Thanks.
>         regards,
> --
> ***********************************
>  >> Mark J. Potts, PhD
>  >>
>  >> HPC Applications Inc.
>  >> phone: 410-992-8360 Bus
>  >>        410-313-9318 Home
>  >>        443-418-4375 Cell
>  >> email: potts@hpcapplications.com
>  >>        potts@excray.com
> ***********************************

From t3dinh at yahoo.com Mon Jun 4 20:25:49 2007
From: t3dinh at yahoo.com (phuong dinh)
Date: Mon Jun 4 20:26:06 2007
Subject: Fwd: [mvapich-discuss] viainit.c errors while building mvapich-0.9.9 with topspin
Message-ID: <633231.32490.qm@web31604.mail.mud.yahoo.com>

Skipped content of type multipart/alternative
-------------- next part --------------
An embedded message was scrubbed...
From: phuong dinh
Subject: [mvapich-discuss] viainit.c errors while building mvapich-0.9.9 with topspin
Date: Mon, 4 Jun 2007 01:55:33 -0700 (PDT)
Size: 5889
Url: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20070604/b8fce88b/attachment-0001.mht

From bfp at purdue.edu Mon Jun 4 21:05:17 2007
From: bfp at purdue.edu (Bryan Putnam)
Date: Mon Jun 4 21:04:40 2007
Subject: Fwd: [mvapich-discuss] viainit.c errors while building mvapich-0.9.9 with topspin
In-Reply-To: <633231.32490.qm@web31604.mail.mud.yahoo.com>
References: <633231.32490.qm@web31604.mail.mud.yahoo.com>
Message-ID:

On Mon, 4 Jun 2007, phuong dinh wrote:

>
>
> Had anyone success in building mvapich 0.9.9 with topspin?
>
> -Thanks,
> Phong

Phong,

I too ran into viainit.c problems about a year ago when building mvapich
with topspin drivers. I eventually gave up on it, and switched to
mvapich2, where I had no problems.

Bryan

From zhangwl at ncic.ac.cn Mon Jun 4 21:09:00 2007
From: zhangwl at ncic.ac.cn (zhangwl)
Date: Mon Jun 4 21:18:09 2007
Subject: [mvapich-discuss] Fw: Re: Fw: Question for help, thanks a lot!
References:
Message-ID: <200706050909003437148@ncic.ac.cn>

A question on the MPI over InfiniBand bandwidth test.
Thanks for your help!

zhangwl
2007-06-05

From: Jiuxing Liu
Sent: 2007-06-04 22:05:10
To: zhangwl@ncic.ac.cn
Cc:
Subject: Re: Fw: Question for help, thanks a lot!

Hi,

I have left Ohio State and do not maintain MVAPICH any more. Can you
forward your message to the MVAPICH mailing list
(mvapich-discuss@cse.ohio-state.edu)? I think that people there will be
more than happy to help you.

Thanks,
-Jiuxing

"zhangwl"
06/04/2007 07:58 AM
To: Jiuxing Liu/Watson/IBM@IBMUS
cc:
Subject: Fw: Question for help, thanks a lot!

Hello, Dr. Jiuxing Liu,

When testing MPI over InfiniBand performance, I found that bandwidth for
larger messages is lower rather than higher, both intra-node and
inter-node. The detailed tests and results follow.

My system is:
-- 2.2GHz Dual Core AMD Opteron(tm) Processor 275, 8GB Mem
-- Linux 2.6.9-42.ELsmp x86_64
-- openib-1.1 { Detected the following HCAs: 1) mthca0 [ Mellanox PCI-X ] }

1. Test inter-node bandwidth with -DVIADEV_RGET_SUPPORT .
setup_ch_gen2 starts... -D_X86_64_ -DEARLY_SEND_COMPLETION -DMEMORY_SCALE
-DVIADEV_RGET_SUPPORT -DLAZY_MEM_UNREGISTER -DCH_GEN2 -D_SMP_ -D_SMP_RNDV_
-D_MLX_PCI_X_ -I/usr/local/ofed/include -O3

$ mpirun_rsh -rsh -np 2 inode28 inode30 ./osu_bw
# OSU MPI Bandwidth Test (Version 2.2)
# Size          Bandwidth (MB/s)
1               0.243180
2               0.507795
4               1.008787
8               2.030054
16              4.008455
32              8.113140
64              16.160978
128             33.764735
256             67.708075
512             161.522157
1024            335.222506
2048            491.421716
4096            568.259955
8192            606.043232
16384           662.063392
32768           738.589843
65536           783.586601
131072          807.462616
262144          820.750931
524288          685.880335
1048576         660.237959
2097152         659.233480
4194304         659.946110

2. Test inter-node bandwidth with -DVIADEV_RPUT_SUPPORT .
setup_ch_gen2 starts... -D_X86_64_ -DEARLY_SEND_COMPLETION -DMEMORY_SCALE
-DVIADEV_RPUT_SUPPORT -DLAZY_MEM_UNREGISTER -DCH_GEN2 -D_SMP_ -D_SMP_RNDV_
-D_MLX_PCI_X_ -I/usr/local/ofed/include -O3

$ mpirun_rsh -rsh -np 2 inode28 inode30 ./osu_bw
# OSU MPI Bandwidth Test (Version 2.2)
# Size          Bandwidth (MB/s)
1               0.248081
2               0.516046
4               1.034260
8               2.069607
16              4.110799
32              8.282444
64              16.593745
128             34.620911
256             69.113305
512             163.455879
1024            341.066875
2048            496.503655
4096            569.049428
8192            606.183374
16384           624.840449
32768           713.280615
65536           769.011487
131072          800.359506
262144          814.869019
524288          679.025085
1048576         652.137840
2097152         650.207077
4194304         650.629356

3. Test intra-node bandwidth with -DVIADEV_RPUT_SUPPORT .

$ mpirun_rsh -rsh -np 2 inode28 inode28 ./osu_bw
# OSU MPI Bandwidth Test (Version 2.2)
# Size          Bandwidth (MB/s)
1               2.173175
2               4.449079
4               9.049134
8               20.301348
16              42.489627
32              85.085168
64              153.869271
128             286.734337
256             480.187573
512             741.525232
1024            932.896797
2048            1145.834426
4096            1291.731546
8192            1388.989562
16384           1428.285773
32768           1453.529249
65536           1431.307671
131072          1445.227803
262144          1393.404399
524288          1168.315567
1048576         1071.952093
2097152         1072.327638
4194304         1064.196619

I have seen the test results on your homepage
(http://mvapich.cse.ohio-state.edu/performance/mvapich/opteron/MVAPICH-opteron-gen2-DDR.shtml,
http://mvapich.cse.ohio-state.edu/performance/mvapich/intra_opteron.shtml).
There the inter-node bandwidth results seem normal, but the intra-node
bandwidth results are like mine. The bandwidth results in your paper
"BUILDING MULTIRAIL INFINIBAND CLUSTERS: MPI-LEVEL DESIGN AND PERFORMANCE
EVALUATION", SC2004 (Fig. 9), suggest that the striping or binding
optimization would mitigate this problem.

What do you think is the source of the problem in my bandwidth tests? To
get the optimal bandwidth, what should I modify relative to the default
options in the original MVAPICH 0.9.8 package? Have STRIPING and BINDING
not been added to mvapich 0.9.8?

Thanks a lot for any reply!
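The size of the drop can be quantified directly from the osu_bw output. The sketch below is an illustration in Python; the helper names are hypothetical, and the sample is the tail of the first inter-node run (-DVIADEV_RGET_SUPPORT) reported above. It finds the peak bandwidth and how far the result for the largest message size falls below it.

```python
SAMPLE = """\
# OSU MPI Bandwidth Test (Version 2.2)
# Size Bandwidth (MB/s)
131072 807.462616
262144 820.750931
524288 685.880335
1048576 660.237959
2097152 659.233480
4194304 659.946110
"""


def parse_osu_bw(text):
    """Turn osu_bw output into a list of (message size, MB/s) pairs."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the benchmark's header comments
        size, bandwidth = line.split()
        rows.append((int(size), float(bandwidth)))
    return rows


def peak_and_drop(rows):
    """Return (size at peak, peak MB/s, percent drop at the last size)."""
    peak_size, peak_bw = max(rows, key=lambda row: row[1])
    last_bw = rows[-1][1]
    drop_pct = 100.0 * (peak_bw - last_bw) / peak_bw
    return peak_size, peak_bw, drop_pct


if __name__ == "__main__":
    size, bw, drop = peak_and_drop(parse_osu_bw(SAMPLE))
    print(f"peak {bw:.1f} MB/s at {size} bytes; largest message is {drop:.1f}% lower")
```

For this run the peak is at 262144 bytes, and the 4 MB result is roughly 20% below it.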
Best Regards,
Wenli

zhangwl
2007-06-04

=====================================================
Zhang Wenli,
NCIC, Institute of Computing Technology
Chinese Academy of Sciences
NO. 6, Ke Xue Yuan South Road, Zhongguancun, Beijing, P.R.China
NCIC, P.O.Box 2704
Zip Code 100080
Tel: 86-10-62601041
Fax: 86-10-62527487
Email: zhangwl@ncic.ac.cn
=====================================================

From vishnu at cse.ohio-state.edu Tue Jun 5 09:53:19 2007
From: vishnu at cse.ohio-state.edu (Abhinav Vishnu)
Date: Tue Jun 5 09:54:59 2007
Subject: [mvapich-discuss] Re: Question on bandwidth test
In-Reply-To:
References:
Message-ID: <20070605135318.GA18647@cse.ohio-state.edu>

Hi Wenli,

Thanks for using MVAPICH and reporting the performance issue to us. IMHO, this is not a problem at the MPI layer; the performance degradation should be visible in tests at the verbs layer too. I am assuming that you are using OFED-1.1 and HCA firmware version 3.3.3 or greater. To see whether this is the case, may I ask you to do the following.

Say you want to run the tests on inode28 and inode30:

1. On inode28:
   % ib_rdma_bw -s1048576 -n100

2. On inode30:
   % ib_rdma_bw -s1048576 -n100 inode28

I expect you will see a performance degradation similar to the one you are seeing at the MPI layer.

My answers regarding the multi-rail paper are inline; please scroll down.

>
>
> My system is:
> -- 2.2GHz Dual Core AMD Opteron(tm) Processor 275, 8GB Mem
> -- Linux 2.6.9-42.ELsmp x86_64
> -- openib-1.1 { Detected the following HCAs: 1) mthca0 [ Mellanox PCI-X ] }
>
> 1.
Test inter-node bandwidth with -DVIADEV_RGET_SUPPORT . > setup_ch_gen2 starts... -D_X86_64_ -DEARLY_SEND_COMPLETION -DMEMORY_SCALE > -DVIADEV_RGET_SUPPORT -DLAZY_MEM_UNREGISTER -DCH_GEN2 -D_SMP_ -D_SMP_RNDV_ > -D_MLX_PCI_X_ -I/usr/local/ofed/include -O3 > > $ mpirun_rsh -rsh -np 2 inode28 inode30 ./osu_bw > # OSU MPI Bandwidth Test (Version 2.2) > # Size Bandwidth (MB/s) > 1 0.243180 > 2 0.507795 > 4 1.008787 > 8 2.030054 > 16 4.008455 > 32 8.113140 > 64 16.160978 > 128 33.764735 > 256 67.708075 > 512 161.522157 > 1024 335.222506 > 2048 491.421716 > 4096 568.259955 > 8192 606.043232 > 16384 662.063392 > 32768 738.589843 > 65536 783.586601 > 131072 807.462616 > 262144 820.750931 > 524288 685.880335 > 1048576 660.237959 > 2097152 659.233480 > 4194304 659.946110 > 2. Test inter-node bandwidth with -DVIADEV_RPUT_SUPPORT . > setup_ch_gen2 starts... -D_X86_64_ -DEARLY_SEND_COMPLETION -DMEMORY_SCALE > -DVIADEV_RPUT_SUPPORT -DLAZY_MEM_UNREGISTER -DCH_GEN2 -D_SMP_ -D_SMP_RNDV_ > -D_MLX_PCI_X_ -I/usr/local/ofed/include -O3 > > $ mpirun_rsh -rsh -np 2 inode28 inode30 ./osu_bw > # OSU MPI Bandwidth Test (Version 2.2) > # Size Bandwidth (MB/s) > 1 0.248081 > 2 0.516046 > 4 1.034260 > 8 2.069607 > 16 4.110799 > 32 8.282444 > 64 16.593745 > 128 34.620911 > 256 69.113305 > 512 163.455879 > 1024 341.066875 > 2048 496.503655 > 4096 569.049428 > 8192 606.183374 > 16384 624.840449 > 32768 713.280615 > 65536 769.011487 > 131072 800.359506 > 262144 814.869019 > 524288 679.025085 > 1048576 652.137840 > 2097152 650.207077 > 4194304 650.629356 > 3. Test intra-node bandwidth with -DVIADEV_RPUT_SUPPORT . 
> $ mpirun_rsh -rsh -np 2 inode28 inode28 ./osu_bw > # OSU MPI Bandwidth Test (Version 2.2) > # Size Bandwidth (MB/s) > 1 2.173175 > 2 4.449079 > 4 9.049134 > 8 20.301348 > 16 42.489627 > 32 85.085168 > 64 153.869271 > 128 286.734337 > 256 480.187573 > 512 741.525232 > 1024 932.896797 > 2048 1145.834426 > 4096 1291.731546 > 8192 1388.989562 > 16384 1428.285773 > 32768 1453.529249 > 65536 1431.307671 > 131072 1445.227803 > 262144 1393.404399 > 524288 1168.315567 > 1048576 1071.952093 > 2097152 1072.327638 > 4194304 1064.196619 > > I have seen test results on your homepage (http://mvapich.cse.ohio-state.edu/ > performance/mvapich/opteron/MVAPICH-opteron-gen2-DDR.shtml, http:// > mvapich.cse.ohio-state.edu/performance/mvapich/intra_opteron.shtml), that > inter-node bandwidth results seem normal but intra-node bandwidth results are > like mine. And bandwidth results in your paper BUILDING MULTIRAIL INFINIBAND > CLUSTERS: MPI-LEVEL DESIGN AND PERFORMANCE EVALUATION: SC2004(Fig. 9) seem that > striping or binding optimization will remove improve this problem. Yes, actually striping the data on multiple paths helps the performance of microbenchmarks and applications as shown in the paper. However, as per your system, you are using only one HCA and one port for communication. Hence, these scheduling policies are unlikely to solve the situation. I think we will have a clearer idea about the point of performance degradation, once you have the results from ib_rdma_bw. Please let us know the outcome of your experimentation. Thanks, :- Abhinav > > What do you think will be the problem source for my bandwidth tests? In order > to get optimal bandwidth value, what do you think I should modify based on > default options in original MVAPICH 0.9.8 packet? > > > Any reply is appreciated! 
>
> Thanks,
> Wenli
>

From t3dinh at yahoo.com Tue Jun 5 10:01:07 2007
From: t3dinh at yahoo.com (phuong dinh)
Date: Tue Jun 5 10:01:23 2007
Subject: Fwd: [mvapich-discuss] viainit.c errors while building mvapich-0.9.9 with topspin
In-Reply-To:
Message-ID: <691460.49864.qm@web31605.mail.mud.yahoo.com>

I managed to get it running with the Intel compiler (vapi interface only) but had no luck with the Sun Studio compiler. Thanks, Bryan, for your input; I will try mvapich2.

-Phong

Bryan Putnam wrote:

On Mon, 4 Jun 2007, phuong dinh wrote:
>
>
> Has anyone succeeded in building mvapich 0.9.9 with topspin?
>
> -Thanks,
> Phong

Phong,

I too ran into viainit.c problems about a year ago when building mvapich with topspin drivers. I eventually gave up on it and switched to mvapich2, where I had no problems.

Bryan

From fjblas at arcos.inf.uc3m.es Tue Jun 5 10:45:45 2007
From: fjblas at arcos.inf.uc3m.es (Francisco Javier García Blas)
Date: Tue Jun 5 10:46:01 2007
Subject: [mvapich-discuss] MVAPICH2 spawn problem
Message-ID: <46657719.4010604@arcos.inf.uc3m.es>

Hi all,

I compiled mvapich2 0.9.8 with ./make-mvapich.ofa and all seemed to be well, but when I tried to run the spawn merge example something went wrong. The result of the execution is this:

sh-3.00$ ../bin/mpdboot
sh-3.00$ ../bin/mpiexec -n 2 spawn_merge_parent
Parents spawning 2 children...
Fatal error in MPI_Comm_spawn: Other MPI error, error stack:
MPI_Comm_spawn(128)...........: MPI_Comm_spawn(cmd="spawn_merge_child1",
argv=(nil), maxprocs=2, MPI_INFO_NULL, root=1, MPI_COMM_WORLD,
intercomm=0x7fbfffdb48, errors=0x7fbfffdb60) failed
MPID_Comm_spawn_multiple(56)..:
MPIDI_Comm_spawn_multiple(213):
MPID_Comm_accept(153).........: Function not implementedFatal error in
MPI_Comm_spawn: Other MPI error, error stack:
MPI_Comm_spawn(128)...........: MPI_Comm_spawn(cmd="spawn_merge_child1",
argv=(nil), maxprocs=2, MPI_INFO_NULL, root=1, MPI_COMM_WORLD,
intercomm=0x7fbfffdb48, errors=0x7fbfffdb60) failed
MPID_Comm_spawn_multiple(56)..:
MPIDI_Comm_spawn_multiple(169):
MPID_Open_port(71)............:
MPID_Open_port(65)............: Function not implementedrank 1 in job 1
lslogin2_33241 caused collective abort of all ranks
exit status of rank 1: killed by signal 9

The spawn functionality works well with TCP/IP sockets over IB; it fails only with the InfiniBand library. The InfiniBand library is OFED 1.1.

Do you have any idea what the problem might be?

Many thanks

From THOMAS.T.O'SHEA at saic.com Wed Jun 6 19:05:02 2007
From: THOMAS.T.O'SHEA at saic.com (Thomas O'Shea)
Date: Wed Jun 6 19:05:31 2007
Subject: [mvapich-discuss] MVAPICH2 Error - Assertion 'current_bytes[vc->smp.local_nodes]==0' failed.
References: <055401c7a6be$3f55e7a0$9b66798b@us.saic.com>
Message-ID: <065201c7a88f$1ea862d0$9b66798b@us.saic.com>

Did you ever find a work-around?

Thanks,
Tom

----- Original Message -----
From: "Sylvain Jeaugey"
To: "Thomas O'Shea"
Cc: "wei huang" ;
Sent: Monday, June 04, 2007 9:29 AM
Subject: Re: [mvapich-discuss] MVAPICH2 Error - Assertion 'current_bytes[vc->smp.local_nodes]==0' failed.

> Hi all,
>
> For the record, this is an error I already encountered. [I didn't report
> it since I'm still using an old mvapich tree.]
> Unfortunately, we also don't have a simple way to reproduce it.
> > Sylvain > > On Mon, 4 Jun 2007, Thomas O'Shea wrote: > > > We migrated over to gen2 (OpenFabrics ) and we are still getting the same > > errors. I was wondering if you found anything, or have any ideas of what to > > try next. > > > > Thanks, > > Tom > > ----- Original Message ----- > > From: "wei huang" > > To: "Thomas O'Shea" > > Cc: > > Sent: Friday, May 04, 2007 3:06 PM > > Subject: Re: [mvapich-discuss] MVAPICH2 Error - Assertion > > 'current_bytes[vc->smp.local_nodes]==0' failed. > > > > > >> Hi Thomas, > >> > >> Thanks for your reply. > >> > >> Because the source code of your application is not available to us, we > >> will do a code review of our code (or do you have a piece of code which > >> shows the problem that can be sent to us?) > >> > >> The reason I ask you to try gen2 (OpenFabrics) stack is because the whole > >> InfiniBand community is moving towards this. So actually most of our > >> efforts is spent on this front (though we still maintain certain necessary > >> maintenance and bug fixes for the vapi stack). You can find useful > >> information to install the OFED stack (OpenFabrics Enterprise > >> Distribution) here: > >> > >> http://www.openfabrics.org/downloads.htm > >> > >> And the information to compile mvapich2 with OFED stack is avaialable > >> through our website. > >> > >> Anyway, we will get back to you once we find something. > >> > >> Thanks. > >> > >> Regards, > >> Wei Huang > >> > >> 774 Dreese Lab, 2015 Neil Ave, > >> Dept. of Computer Science and Engineering > >> Ohio State University > >> OH 43210 > >> Tel: (614)292-8501 > >> > >> > >> On Fri, 4 May 2007, Thomas O'Shea wrote: > >> > >>> Thanks for the response. > >>> > >>> 1) Turns out we are using mvapich2-0.9.8p1 already. > >>> > >>> 2) Yes, the standard compiling scripts were used. > >>> > >>> 3) You are correct, most of the communication involves one sided > > operations > >>> with passive synchronization. The code also uses a few other MPI > > commands. 
> >>> > >>> We define MPI vector types: > >>> > >>> CALL MPI_TYPE_VECTOR(xlen,nguard,iu_bnd,MPI_DOUBLE_PRECISION, > >>> & xtype,ierr) > >>> > >>> CALL MPI_TYPE_COMMIT(xtype,ierr) > >>> > >>> Create MPI Windows: > >>> > >>> CALL MPI_WIN_CREATE(work,winsize,8,MPI_INFO_NULL, > >>> & MPI_COMM_WORLD,win,ierr) > >>> > >>> Synch our gets with lock and unlock: > >>> > >>> CALL MPI_WIN_LOCK(MPI_LOCK_SHARED,get_pe,0,win,ierr) > >>> CALL MPI_GET(wget,1,xtype,get_pe, > >>> & targ_disp,1,xtype,win,ierr) > >>> CALL MPI_WIN_UNLOCK(get_pe,win,ierr) > >>> > >>> We use one broadcast call > >>> > >>> call MPI_BCAST(qxyz,3*maxpan,MPI_DOUBLE_PRECISION,0, > >>> 1 MPI_COMM_WORLD,ierr) > >>> > >>> And of course barriers and freeing the windows and vector types. > >>> > >>> The error we are getting happens on a MPI_WIN_UNLOCK after a GET call > > that > >>> does not use the MPI_TYPE_VECTOR that we created though. The ierr from > > the > >>> GET call is 0 as well. > >>> > >>> > >>> 4) I talked with the IT person in charge of this cluster and he said > > that we > >>> could try that, but he said the documentation he found on gen2 and udapl > > was > >>> somewhat sparse in that he wasn't sure exactly how to set that up and > > what > >>> the different compilations actually do differently. Is there any > > resource > >>> you can point us towards? > >>> > >>> Thanks, > >>> Tom > >>> > >>> > >>>> Hi Thomas, > >>>> > >>>> We will look into this issue. Would you please let us know the > > following: > >>>> > >>>> 1) We have recently made a couple of bug fixes and released > >>>> mvapich2-0.9.8p1. Would you first try that version? > >>>> > >>>> And if it is not working: > >>>> > >>>> 2) Did you use the standard compiling scripts (you mentioned ib gold > >>>> release, is it on vapi? And did you use make.mvapich2.vapi?) > >>>> > >>>> 3) Would you provide us some information on how the comunication > > patterns > >>>> of your application are? 
It seems like one sided operations with > > passive > >>>> synchronization (lock, get, unlock). Did you use other operations? > >>>> > >>>> 4) Will it possible for you to try gen2 (make.mvapich2.ofa) or udapl > > on > >>>> your stack, if they are available on your systems? > >>>> > >>>> Thanks. > >>>> > >>>> Regards, > >>>> Wei Huang > >>>> > >>>> 774 Dreese Lab, 2015 Neil Ave, > >>>> Dept. of Computer Science and Engineering > >>>> Ohio State University > >>>> OH 43210 > >>>> Tel: (614)292-8501 > >>>> > >>>> > >>>> On Thu, 3 May 2007, Thomas O'Shea wrote: > >>>> > >>>>> Hello, > >>>>> > >>>>> I'm running the MVAPICH2-0.9.8 using the IB Gold Release. I've got 2 > >>>>> 16 processor nodes (each has 8 dual-core AMD Opterons) hooked up > >>>>> through infiniband. I started off running this parallel Fortran code > >>>>> on just one node with MPICH2 and had no problems. It scaled decently > >>>>> to 8 processors but didn't see much improvement with the jump to 16 > >>>>> (possibly due to cache coherency or something). Now, when trying to > >>>>> get it running across the infiniband connect I get this error: > >>>>> > >>>>> current bytes 4, total bytes 28, remote id 1 > >>>>> nfa_opt: ch3_smp_progress.c:2075: MPIDI_CH3I_SMP_pull_header: > > Assertion > >>> 'current_bytes[vc->smp.local_nodes] == 0' failed. > >>>>> rank 0 in job 1 nessie_32906 caused collective abort of all ranks > >>>>> exit status of rank 0: killed by signal 9 > >>>>> > >>>>> This happens right after a one sided communication (MPI_GET) but > >>>>> before the MPI_WIN_UNLOCK call that follows. Also this is only with > > a > >>>>> process that is on the same node as the calling process, The MPI_GET > >>>>> call exits with no errors also. > >>>>> > >>>>> All the osu_benchmarks run with no problems. There were also no > >>>>> problems if I make a local mpd (mpd &) ring on a single node and run > >>>>> the code with MVAPICH2 with 2,4,8,or 16 processors. 
If I compile > > with > >>>>> the MPICH2 libraries there are no problems on a single node or > > running > >>>>> processes spread out on both nodes. > >>>>> > >>>>> Ever seen this before? Any help would be greatly appreciated. > >>>>> > >>>>> Thanks, > >>>>> Thomas O'Shea > >>>>> SAIC > >>> > > > > _______________________________________________ > > mvapich-discuss mailing list > > mvapich-discuss@cse.ohio-state.edu > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > From THOMAS.T.O'SHEA at saic.com Wed Jun 6 19:58:52 2007 From: THOMAS.T.O'SHEA at saic.com (Thomas O'Shea) Date: Wed Jun 6 19:59:26 2007 Subject: [mvapich-discuss] MVAPICH2 Error - Assertion 'current_bytes[vc->smp.local_nodes]==0' failed. References: Message-ID: <065b01c7a896$a2c9e690$9b66798b@us.saic.com> Hello, We just recompiled without the -D_SMP_flag and the code runs with no errors. Did this change the way mvapich communicates on a local node? How much slower do you think it will be? We're running some scaling tests now. Anything else you want me to try? Thanks, Tom > Hi, > > We've been carrying thorough testing on our code base. Up to now, we did > not find any outstanding error on MPI one sided code. Can we get access to > your source code or get a small program showing the problem? It will be > the easiest way for us to find the problem. > > Also, since this is an assertion failure in the SMP part of code, you can > also try compiling mvapich2 without SMP by removing the -D_SMP_ flag from > your CFLAGS. Corresponding changes can be made in our make.mvapich2.ofa. > Let's see if your program runs successfully with that change. > > Thanks. > > Regards, > Wei Huang > > 774 Dreese Lab, 2015 Neil Ave, > Dept. of Computer Science and Engineering > Ohio State University > OH 43210 > Tel: (614)292-8501 > > > On Mon, 4 Jun 2007, Thomas O'Shea wrote: > > > We migrated over to gen2 (OpenFabrics ) and we are still getting the same > > errors. 
I was wondering if you found anything, or have any ideas of what to > > try next. > > > > Thanks, > > Tom > > ----- Original Message ----- > > From: "wei huang" > > To: "Thomas O'Shea" > > Cc: > > Sent: Friday, May 04, 2007 3:06 PM > > Subject: Re: [mvapich-discuss] MVAPICH2 Error - Assertion > > 'current_bytes[vc->smp.local_nodes]==0' failed. > > > > > > > Hi Thomas, > > > > > > Thanks for your reply. > > > > > > Because the source code of your application is not available to us, we > > > will do a code review of our code (or do you have a piece of code which > > > shows the problem that can be sent to us?) > > > > > > The reason I ask you to try gen2 (OpenFabrics) stack is because the whole > > > InfiniBand community is moving towards this. So actually most of our > > > efforts is spent on this front (though we still maintain certain necessary > > > maintenance and bug fixes for the vapi stack). You can find useful > > > information to install the OFED stack (OpenFabrics Enterprise > > > Distribution) here: > > > > > > http://www.openfabrics.org/downloads.htm > > > > > > And the information to compile mvapich2 with OFED stack is avaialable > > > through our website. > > > > > > Anyway, we will get back to you once we find something. > > > > > > Thanks. > > > > > > Regards, > > > Wei Huang > > > > > > 774 Dreese Lab, 2015 Neil Ave, > > > Dept. of Computer Science and Engineering > > > Ohio State University > > > OH 43210 > > > Tel: (614)292-8501 > > > > > > > > > On Fri, 4 May 2007, Thomas O'Shea wrote: > > > > > > > Thanks for the response. > > > > > > > > 1) Turns out we are using mvapich2-0.9.8p1 already. > > > > > > > > 2) Yes, the standard compiling scripts were used. > > > > > > > > 3) You are correct, most of the communication involves one sided > > operations > > > > with passive synchronization. The code also uses a few other MPI > > commands. 
> > > > > > > > We define MPI vector types: > > > > > > > > CALL MPI_TYPE_VECTOR(xlen,nguard,iu_bnd,MPI_DOUBLE_PRECISION, > > > > & xtype,ierr) > > > > > > > > CALL MPI_TYPE_COMMIT(xtype,ierr) > > > > > > > > Create MPI Windows: > > > > > > > > CALL MPI_WIN_CREATE(work,winsize,8,MPI_INFO_NULL, > > > > & MPI_COMM_WORLD,win,ierr) > > > > > > > > Synch our gets with lock and unlock: > > > > > > > > CALL MPI_WIN_LOCK(MPI_LOCK_SHARED,get_pe,0,win,ierr) > > > > CALL MPI_GET(wget,1,xtype,get_pe, > > > > & targ_disp,1,xtype,win,ierr) > > > > CALL MPI_WIN_UNLOCK(get_pe,win,ierr) > > > > > > > > We use one broadcast call > > > > > > > > call MPI_BCAST(qxyz,3*maxpan,MPI_DOUBLE_PRECISION,0, > > > > 1 MPI_COMM_WORLD,ierr) > > > > > > > > And of course barriers and freeing the windows and vector types. > > > > > > > > The error we are getting happens on a MPI_WIN_UNLOCK after a GET call > > that > > > > does not use the MPI_TYPE_VECTOR that we created though. The ierr from > > the > > > > GET call is 0 as well. > > > > > > > > > > > > 4) I talked with the IT person in charge of this cluster and he said > > that we > > > > could try that, but he said the documentation he found on gen2 and udapl > > was > > > > somewhat sparse in that he wasn't sure exactly how to set that up and > > what > > > > the different compilations actually do differently. Is there any > > resource > > > > you can point us towards? > > > > > > > > Thanks, > > > > Tom > > > > > > > > > > > > > Hi Thomas, > > > > > > > > > > We will look into this issue. Would you please let us know the > > following: > > > > > > > > > > 1) We have recently made a couple of bug fixes and released > > > > > mvapich2-0.9.8p1. Would you first try that version? > > > > > > > > > > And if it is not working: > > > > > > > > > > 2) Did you use the standard compiling scripts (you mentioned ib gold > > > > > release, is it on vapi? And did you use make.mvapich2.vapi?) 
> > > > > > > > > > 3) Would you provide us some information on how the comunication > > patterns > > > > > of your application are? It seems like one sided operations with > > passive > > > > > synchronization (lock, get, unlock). Did you use other operations? > > > > > > > > > > 4) Will it possible for you to try gen2 (make.mvapich2.ofa) or udapl > > on > > > > > your stack, if they are available on your systems? > > > > > > > > > > Thanks. > > > > > > > > > > Regards, > > > > > Wei Huang > > > > > > > > > > 774 Dreese Lab, 2015 Neil Ave, > > > > > Dept. of Computer Science and Engineering > > > > > Ohio State University > > > > > OH 43210 > > > > > Tel: (614)292-8501 > > > > > > > > > > > > > > > On Thu, 3 May 2007, Thomas O'Shea wrote: > > > > > > > > > > > Hello, > > > > > > > > > > > > I'm running the MVAPICH2-0.9.8 using the IB Gold Release. I've got 2 > > > > > > 16 processor nodes (each has 8 dual-core AMD Opterons) hooked up > > > > > > through infiniband. I started off running this parallel Fortran code > > > > > > on just one node with MPICH2 and had no problems. It scaled decently > > > > > > to 8 processors but didn't see much improvement with the jump to 16 > > > > > > (possibly due to cache coherency or something). Now, when trying to > > > > > > get it running across the infiniband connect I get this error: > > > > > > > > > > > > current bytes 4, total bytes 28, remote id 1 > > > > > > nfa_opt: ch3_smp_progress.c:2075: MPIDI_CH3I_SMP_pull_header: > > Assertion > > > > 'current_bytes[vc->smp.local_nodes] == 0' failed. > > > > > > rank 0 in job 1 nessie_32906 caused collective abort of all ranks > > > > > > exit status of rank 0: killed by signal 9 > > > > > > > > > > > > This happens right after a one sided communication (MPI_GET) but > > > > > > before the MPI_WIN_UNLOCK call that follows. Also this is only with > > a > > > > > > process that is on the same node as the calling process, The MPI_GET > > > > > > call exits with no errors also. 
> > > > > >
> > > > > > All the osu_benchmarks run with no problems. There were also no
> > > > > > problems if I make a local mpd (mpd &) ring on a single node and run
> > > > > > the code with MVAPICH2 with 2,4,8,or 16 processors. If I compile
> > with
> > > > > > the MPICH2 libraries there are no problems on a single node or
> > running
> > > > > > processes spread out on both nodes.
> > > > > >
> > > > > > Ever seen this before? Any help would be greatly appreciated.
> > > > > >
> > > > > > Thanks,
> > > > > > Thomas O'Shea
> > > > > > SAIC
> > > >
> >

From sylvain.jeaugey at bull.net Thu Jun 7 03:28:08 2007
From: sylvain.jeaugey at bull.net (Sylvain Jeaugey)
Date: Thu Jun 7 03:28:29 2007
Subject: [mvapich-discuss] MVAPICH2 Error - Assertion 'current_bytes[vc->smp.local_nodes]==0' failed.
In-Reply-To: <065201c7a88f$1ea862d0$9b66798b@us.saic.com>
References: <055401c7a6be$3f55e7a0$9b66798b@us.saic.com> <065201c7a88f$1ea862d0$9b66798b@us.saic.com>
Message-ID:

The workaround is to use another way to communicate inside a node. The first and possibly only idea is to disable _SMP_, of course, but performance may be worse (or better, depending on your architecture and the size of your nodes).

We were able to work around this problem because we use a Bull-customized version of MVAPICH2 that embeds another shared-memory mechanism for intra-node communication. Using this "other" device also made the problem disappear, which confirms that this problem is related to the _SMP_ code.

I don't know if MVAPICH 1 would perform better, but you may want to give it a try.

Sylvain

On Wed, 6 Jun 2007, Thomas O'Shea wrote:

> Did you ever find a work-around?
>
> Thanks,
> Tom
> ----- Original Message -----
> From: "Sylvain Jeaugey"
> To: "Thomas O'Shea"
> Cc: "wei huang" ;
>
> Sent: Monday, June 04, 2007 9:29 AM
> Subject: Re: [mvapich-discuss] MVAPICH2 Error - Assertion
> 'current_bytes[vc->smp.local_nodes]==0' failed.
>
>
>> Hi all,
>>
>> For the record, this is an error I already encountered.
[I didn't report >> it since I'm still using an old mvapich tree.] >> Unfortunately, we also don't have a simple way to reproduce it. >> >> Sylvain >> >> On Mon, 4 Jun 2007, Thomas O'Shea wrote: >> >>> We migrated over to gen2 (OpenFabrics ) and we are still getting the > same >>> errors. I was wondering if you found anything, or have any ideas of what > to >>> try next. >>> >>> Thanks, >>> Tom >>> ----- Original Message ----- >>> From: "wei huang" >>> To: "Thomas O'Shea" >>> Cc: >>> Sent: Friday, May 04, 2007 3:06 PM >>> Subject: Re: [mvapich-discuss] MVAPICH2 Error - Assertion >>> 'current_bytes[vc->smp.local_nodes]==0' failed. >>> >>> >>>> Hi Thomas, >>>> >>>> Thanks for your reply. >>>> >>>> Because the source code of your application is not available to us, we >>>> will do a code review of our code (or do you have a piece of code which >>>> shows the problem that can be sent to us?) >>>> >>>> The reason I ask you to try gen2 (OpenFabrics) stack is because the > whole >>>> InfiniBand community is moving towards this. So actually most of our >>>> efforts is spent on this front (though we still maintain certain > necessary >>>> maintenance and bug fixes for the vapi stack). You can find useful >>>> information to install the OFED stack (OpenFabrics Enterprise >>>> Distribution) here: >>>> >>>> http://www.openfabrics.org/downloads.htm >>>> >>>> And the information to compile mvapich2 with OFED stack is avaialable >>>> through our website. >>>> >>>> Anyway, we will get back to you once we find something. >>>> >>>> Thanks. >>>> >>>> Regards, >>>> Wei Huang >>>> >>>> 774 Dreese Lab, 2015 Neil Ave, >>>> Dept. of Computer Science and Engineering >>>> Ohio State University >>>> OH 43210 >>>> Tel: (614)292-8501 >>>> >>>> >>>> On Fri, 4 May 2007, Thomas O'Shea wrote: >>>> >>>>> Thanks for the response. >>>>> >>>>> 1) Turns out we are using mvapich2-0.9.8p1 already. >>>>> >>>>> 2) Yes, the standard compiling scripts were used. 
>>>>> >>>>> 3) You are correct, most of the communication involves one sided >>> operations >>>>> with passive synchronization. The code also uses a few other MPI >>> commands. >>>>> >>>>> We define MPI vector types: >>>>> >>>>> CALL MPI_TYPE_VECTOR(xlen,nguard,iu_bnd,MPI_DOUBLE_PRECISION, >>>>> & xtype,ierr) >>>>> >>>>> CALL MPI_TYPE_COMMIT(xtype,ierr) >>>>> >>>>> Create MPI Windows: >>>>> >>>>> CALL MPI_WIN_CREATE(work,winsize,8,MPI_INFO_NULL, >>>>> & MPI_COMM_WORLD,win,ierr) >>>>> >>>>> Synch our gets with lock and unlock: >>>>> >>>>> CALL MPI_WIN_LOCK(MPI_LOCK_SHARED,get_pe,0,win,ierr) >>>>> CALL MPI_GET(wget,1,xtype,get_pe, >>>>> & targ_disp,1,xtype,win,ierr) >>>>> CALL MPI_WIN_UNLOCK(get_pe,win,ierr) >>>>> >>>>> We use one broadcast call >>>>> >>>>> call MPI_BCAST(qxyz,3*maxpan,MPI_DOUBLE_PRECISION,0, >>>>> 1 MPI_COMM_WORLD,ierr) >>>>> >>>>> And of course barriers and freeing the windows and vector types. >>>>> >>>>> The error we are getting happens on a MPI_WIN_UNLOCK after a GET call >>> that >>>>> does not use the MPI_TYPE_VECTOR that we created though. The ierr from >>> the >>>>> GET call is 0 as well. >>>>> >>>>> >>>>> 4) I talked with the IT person in charge of this cluster and he said >>> that we >>>>> could try that, but he said the documentation he found on gen2 and > udapl >>> was >>>>> somewhat sparse in that he wasn't sure exactly how to set that up and >>> what >>>>> the different compilations actually do differently. Is there any >>> resource >>>>> you can point us towards? >>>>> >>>>> Thanks, >>>>> Tom >>>>> >>>>> >>>>>> Hi Thomas, >>>>>> >>>>>> We will look into this issue. Would you please let us know the >>> following: >>>>>> >>>>>> 1) We have recently made a couple of bug fixes and released >>>>>> mvapich2-0.9.8p1. Would you first try that version? >>>>>> >>>>>> And if it is not working: >>>>>> >>>>>> 2) Did you use the standard compiling scripts (you mentioned ib gold >>>>>> release, is it on vapi? And did you use make.mvapich2.vapi?) 
>>>>>> >>>>>> 3) Would you provide us some information on how the comunication >>> patterns >>>>>> of your application are? It seems like one sided operations with >>> passive >>>>>> synchronization (lock, get, unlock). Did you use other operations? >>>>>> >>>>>> 4) Will it possible for you to try gen2 (make.mvapich2.ofa) or udapl >>> on >>>>>> your stack, if they are available on your systems? >>>>>> >>>>>> Thanks. >>>>>> >>>>>> Regards, >>>>>> Wei Huang >>>>>> >>>>>> 774 Dreese Lab, 2015 Neil Ave, >>>>>> Dept. of Computer Science and Engineering >>>>>> Ohio State University >>>>>> OH 43210 >>>>>> Tel: (614)292-8501 >>>>>> >>>>>> >>>>>> On Thu, 3 May 2007, Thomas O'Shea wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I'm running the MVAPICH2-0.9.8 using the IB Gold Release. I've got 2 >>>>>>> 16 processor nodes (each has 8 dual-core AMD Opterons) hooked up >>>>>>> through infiniband. I started off running this parallel Fortran code >>>>>>> on just one node with MPICH2 and had no problems. It scaled decently >>>>>>> to 8 processors but didn't see much improvement with the jump to 16 >>>>>>> (possibly due to cache coherency or something). Now, when trying to >>>>>>> get it running across the infiniband connect I get this error: >>>>>>> >>>>>>> current bytes 4, total bytes 28, remote id 1 >>>>>>> nfa_opt: ch3_smp_progress.c:2075: MPIDI_CH3I_SMP_pull_header: >>> Assertion >>>>> 'current_bytes[vc->smp.local_nodes] == 0' failed. >>>>>>> rank 0 in job 1 nessie_32906 caused collective abort of all ranks >>>>>>> exit status of rank 0: killed by signal 9 >>>>>>> >>>>>>> This happens right after a one sided communication (MPI_GET) but >>>>>>> before the MPI_WIN_UNLOCK call that follows. Also this is only with >>> a >>>>>>> process that is on the same node as the calling process, The MPI_GET >>>>>>> call exits with no errors also. >>>>>>> >>>>>>> All the osu_benchmarks run with no problems. 
There were also no
>>>>>>> problems if I make a local mpd (mpd &) ring on a single node and run
>>>>>>> the code with MVAPICH2 with 2,4,8,or 16 processors. If I compile
>>> with
>>>>>>> the MPICH2 libraries there are no problems on a single node or
>>> running
>>>>>>> processes spread out on both nodes.
>>>>>>>
>>>>>>> Ever seen this before? Any help would be greatly appreciated.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Thomas O'Shea
>>>>>>> SAIC
>>>>>
>>>
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss@cse.ohio-state.edu
>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>
>
>

From huanwei at cse.ohio-state.edu Thu Jun 7 10:12:43 2007
From: huanwei at cse.ohio-state.edu (wei huang)
Date: Thu Jun 7 10:12:57 2007
Subject: [mvapich-discuss] MVAPICH2 spawn problem
In-Reply-To: <46657719.4010604@arcos.inf.uc3m.es>
Message-ID:

Hi,

Thanks for using mvapich2. Unfortunately, the spawn support is not there yet with the native IB stack. We are taking a look at this issue and plan to support this in our future releases soon. TCP/IP stack using IPoIB should always work for you.

Thanks.

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501

On Tue, 5 Jun 2007, Francisco Javier García Blas wrote:

> Hi all,
>
> I compiled mvapich2 0.9.8 with ./make-mvapich.ofa and all seems to be
> well, but when i tried to run the spawn merge example something goes
> wrong. The result of the execution is this:
>
> sh-3.00$ ../bin/mpdboot
> sh-3.00$ ../bin/mpiexec -n 2 spawn_merge_parent
> Parents spawning 2 children...
> Fatal error in MPI_Comm_spawn: Other MPI error, error stack:
> MPI_Comm_spawn(128)...........: MPI_Comm_spawn(cmd="spawn_merge_child1",
> argv=(nil), maxprocs=2, MPI_INFO_NULL, root=1, MPI_COMM_WORLD,
> intercomm=0x7fbfffdb48, errors=0x7fbfffdb60) failed
> MPID_Comm_spawn_multiple(56)..:
> MPIDI_Comm_spawn_multiple(213):
> MPID_Comm_accept(153).........: Function not implementedFatal error in
> MPI_Comm_spawn: Other MPI error, error stack:
> MPI_Comm_spawn(128)...........: MPI_Comm_spawn(cmd="spawn_merge_child1",
> argv=(nil), maxprocs=2, MPI_INFO_NULL, root=1, MPI_COMM_WORLD,
> intercomm=0x7fbfffdb48, errors=0x7fbfffdb60) failed
> MPID_Comm_spawn_multiple(56)..:
> MPIDI_Comm_spawn_multiple(169):
> MPID_Open_port(71)............:
> MPID_Open_port(65)............: Function not implementedrank 1 in job 1
> lslogin2_33241 caused collective abort of all ranks
> exit status of rank 1: killed by signal 9
>
> The spawn functionality works well for TCP/IP sockets over IB; it fails
> only with the InfiniBand library. The InfiniBand library is ofed1.1
>
> Do you have any idea what the possible problem might be?
>
> Many thanks
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From santhana at cse.ohio-state.edu Sun Jun 10 14:28:37 2007
From: santhana at cse.ohio-state.edu (Gopal Santhanaraman)
Date: Sun Jun 10 18:03:55 2007
Subject: [mvapich-discuss] MVAPICH2 Error - Assertion 'current_bytes[vc->smp.local_nodes]==0' failed. (fwd)
In-Reply-To:
Message-ID:

Hi Thomas

Thanks for your reply.

We have tried out the communication patterns that you had reported and they run fine even with the case where the two processes are running on the same node.

I have attached the tests with this mail. These tests are from the mpich2 test suite (test4.c test4_am.c transpose.c). You can also try out these tests on your system.
We have not been able to reproduce the error that you are reporting. Can you give more insights into the application code that you are running or if it is possible give us the application code. Thanks Gopal > Date: Mon, 4 Jun 2007 08:37:23 -0700 > From: Thomas O'Shea > Reply-To: Thomas O'Shea > To: wei huang > Cc: mvapich-discuss@cse.ohio-state.edu > Subject: Re: [mvapich-discuss] MVAPICH2 Error - Assertion > 'current_bytes[vc->smp.local_nodes]==0' failed. > > We migrated over to gen2 (OpenFabrics ) and we are still getting the same > errors. I was wondering if you found anything, or have any ideas of what to > try next. > > Thanks, > Tom > ----- Original Message ----- > From: "wei huang" > To: "Thomas O'Shea" > Cc: > Sent: Friday, May 04, 2007 3:06 PM > Subject: Re: [mvapich-discuss] MVAPICH2 Error - Assertion > 'current_bytes[vc->smp.local_nodes]==0' failed. > > > > Hi Thomas, > > > > Thanks for your reply. > > > > Because the source code of your application is not available to us, we > > will do a code review of our code (or do you have a piece of code which > > shows the problem that can be sent to us?) > > > > The reason I ask you to try gen2 (OpenFabrics) stack is because the whole > > InfiniBand community is moving towards this. So actually most of our > > efforts is spent on this front (though we still maintain certain necessary > > maintenance and bug fixes for the vapi stack). You can find useful > > information to install the OFED stack (OpenFabrics Enterprise > > Distribution) here: > > > > http://www.openfabrics.org/downloads.htm > > > > And the information to compile mvapich2 with OFED stack is avaialable > > through our website. > > > > Anyway, we will get back to you once we find something. > > > > Thanks. > > > > Regards, > > Wei Huang > > > > 774 Dreese Lab, 2015 Neil Ave, > > Dept. 
of Computer Science and Engineering > > Ohio State University > > OH 43210 > > Tel: (614)292-8501 > > > > > > On Fri, 4 May 2007, Thomas O'Shea wrote: > > > > > Thanks for the response. > > > > > > 1) Turns out we are using mvapich2-0.9.8p1 already. > > > > > > 2) Yes, the standard compiling scripts were used. > > > > > > 3) You are correct, most of the communication involves one sided > operations > > > with passive synchronization. The code also uses a few other MPI > commands. > > > > > > We define MPI vector types: > > > > > > CALL MPI_TYPE_VECTOR(xlen,nguard,iu_bnd,MPI_DOUBLE_PRECISION, > > > & xtype,ierr) > > > > > > CALL MPI_TYPE_COMMIT(xtype,ierr) > > > > > > Create MPI Windows: > > > > > > CALL MPI_WIN_CREATE(work,winsize,8,MPI_INFO_NULL, > > > & MPI_COMM_WORLD,win,ierr) > > > > > > Synch our gets with lock and unlock: > > > > > > CALL MPI_WIN_LOCK(MPI_LOCK_SHARED,get_pe,0,win,ierr) > > > CALL MPI_GET(wget,1,xtype,get_pe, > > > & targ_disp,1,xtype,win,ierr) > > > CALL MPI_WIN_UNLOCK(get_pe,win,ierr) > > > > > > We use one broadcast call > > > > > > call MPI_BCAST(qxyz,3*maxpan,MPI_DOUBLE_PRECISION,0, > > > 1 MPI_COMM_WORLD,ierr) > > > > > > And of course barriers and freeing the windows and vector types. > > > > > > The error we are getting happens on a MPI_WIN_UNLOCK after a GET call > that > > > does not use the MPI_TYPE_VECTOR that we created though. The ierr from > the > > > GET call is 0 as well. > > > > > > > > > 4) I talked with the IT person in charge of this cluster and he said > that we > > > could try that, but he said the documentation he found on gen2 and udapl > was > > > somewhat sparse in that he wasn't sure exactly how to set that up and > what > > > the different compilations actually do differently. Is there any > resource > > > you can point us towards? > > > > > > Thanks, > > > Tom > > > > > > > > > > Hi Thomas, > > > > > > > > We will look into this issue. 
Would you please let us know the > following: > > > > > > > > 1) We have recently made a couple of bug fixes and released > > > > mvapich2-0.9.8p1. Would you first try that version? > > > > > > > > And if it is not working: > > > > > > > > 2) Did you use the standard compiling scripts (you mentioned ib gold > > > > release, is it on vapi? And did you use make.mvapich2.vapi?) > > > > > > > > 3) Would you provide us some information on how the comunication > patterns > > > > of your application are? It seems like one sided operations with > passive > > > > synchronization (lock, get, unlock). Did you use other operations? > > > > > > > > 4) Will it possible for you to try gen2 (make.mvapich2.ofa) or udapl > on > > > > your stack, if they are available on your systems? > > > > > > > > Thanks. > > > > > > > > Regards, > > > > Wei Huang > > > > > > > > 774 Dreese Lab, 2015 Neil Ave, > > > > Dept. of Computer Science and Engineering > > > > Ohio State University > > > > OH 43210 > > > > Tel: (614)292-8501 > > > > > > > > > > > > On Thu, 3 May 2007, Thomas O'Shea wrote: > > > > > > > > > Hello, > > > > > > > > > > I'm running the MVAPICH2-0.9.8 using the IB Gold Release. I've got 2 > > > > > 16 processor nodes (each has 8 dual-core AMD Opterons) hooked up > > > > > through infiniband. I started off running this parallel Fortran code > > > > > on just one node with MPICH2 and had no problems. It scaled decently > > > > > to 8 processors but didn't see much improvement with the jump to 16 > > > > > (possibly due to cache coherency or something). Now, when trying to > > > > > get it running across the infiniband connect I get this error: > > > > > > > > > > current bytes 4, total bytes 28, remote id 1 > > > > > nfa_opt: ch3_smp_progress.c:2075: MPIDI_CH3I_SMP_pull_header: > Assertion > > > 'current_bytes[vc->smp.local_nodes] == 0' failed. 
> > > > > rank 0 in job 1 nessie_32906 caused collective abort of all ranks > > > > > exit status of rank 0: killed by signal 9 > > > > > > > > > > This happens right after a one sided communication (MPI_GET) but > > > > > before the MPI_WIN_UNLOCK call that follows. Also this is only with > a > > > > > process that is on the same node as the calling process, The MPI_GET > > > > > call exits with no errors also. > > > > > > > > > > All the osu_benchmarks run with no problems. There were also no > > > > > problems if I make a local mpd (mpd &) ring on a single node and run > > > > > the code with MVAPICH2 with 2,4,8,or 16 processors. If I compile > with > > > > > the MPICH2 libraries there are no problems on a single node or > running > > > > > processes spread out on both nodes. > > > > > > > > > > Ever seen this before? Any help would be greatly appreciated. > > > > > > > > > > Thanks, > > > > > Thomas O'Shea > > > > > SAIC > > > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > -------------- next part -------------- /* -*- Mode: C; c-basic-offset:4 ; -*- */ /* * (C) 2001 by Argonne National Laboratory. * See COPYRIGHT in top-level directory. */ #include "mpi.h" #include "stdio.h" #include "stdlib.h" #include "mpitest.h" /* tests passive target RMA on 2 processes. tests the lock-single_op-unlock optimization. */ #define SIZE1 100 #define SIZE2 200 int main(int argc, char *argv[]) { int rank, nprocs, A[SIZE2], B[SIZE2], i; MPI_Win win; int errs = 0; MTest_Init(&argc,&argv); MPI_Comm_size(MPI_COMM_WORLD,&nprocs); MPI_Comm_rank(MPI_COMM_WORLD,&rank); if (nprocs != 2) { printf("Run this program with 2 processes\n"); MPI_Abort(MPI_COMM_WORLD,1); } if (rank == 0) { for (i=0; i Hi all, We have an 8 dual quad-core node HP cluster connected via Infiniband. We use Voltaire DDR cards and 24-port switch. 
We also use OFED 1.1 and MVAPICH 0.9.7. We have two interesting problems that we could not overcome yet:

1. In our test program which mimics the communications in our code, the nodes are paired as follows: (0 and 1), (2 and 3), (4 and 5), (6 and 7). We perform one to one communications between these pairs of nodes simultaneously. We use blocking MPI send and receive commands to communicate an integer array of various sizes. In addition, we consider different numbers of processes:
(a) 1 process per node, 8 processes overall: One link is established between the pairs of nodes.
(b) 2 processes per node, 16 processes overall: Two links are established between the pairs of nodes.
(c) 4 processes per node, 32 processes overall: Four links are established between the pairs of nodes.
(d) 8 processes per node, 64 processes overall: Eight links are established between the pairs of nodes.

We obtain logical timings, except for the following interesting comparison:

For 32 processes (4 processes per node), the arrays with 512-Byte size are communicated slower than the 4096-Byte size arrays. For both of them, we send/receive 1,000,000 arrays and take the average to find the time per package. Only package size changes. We have made many trials and confirmed this abnormal case is persistent. More specifically, communication of 4k-Byte packages is 2 times faster than the communication of 512-Byte packages.

The OSU bandwidth and latency test around these points shows:

Byte    MB/s
256     417.53
512     592.34
1024    691.02
2048    857.35
4096    906.04
8192    1022.52

Byte    Time (usec)
256     4.79
512     5.48
1024    6.60
2048    8.30
4096    11.02

So this behavior does not seem reasonable to us.

2. SOMETIMES, after the test with overall 32 processes, one of the four processes at node3 hangs in TASK_UNINTERRUPTABLE "D" state. Hence, the test program shows a "done." and waits for some time. We can neither kill the process nor soft reboot the node. We have to wait for that process to terminate, which can last long.
Does anybody have some comments in these issues? Thanks in advance, Tahir Malas Bilkent University Electrical and Electronics Engineering Department From surs at cse.ohio-state.edu Tue Jun 12 11:09:01 2007 From: surs at cse.ohio-state.edu (Sayantan Sur) Date: Tue Jun 12 11:09:15 2007 Subject: [mvapich-discuss] Two problems related to slowness and TASK_UNINTERRUPTABLE process In-Reply-To: <01ae01c7acc2$dfa8e810$d80cb38b@bs> References: <01ae01c7acc2$dfa8e810$d80cb38b@bs> Message-ID: <466EB70D.2000306@cse.ohio-state.edu> Hi Tahir, Thanks for sharing this data and your observations. It is interesting. We have a more recent release, MVAPICH-0.9.9 which is available from our website (mvapich.cse.ohio-state.edu) as well as with OFED-1.2 distribution. Could you please try out our newer release and see if the results change/remain the same? Thanks, Sayantan. Tahir Malas wrote: > Hi all, > We have an 8 dual quad-core node HP cluster connected via Infiniband. We use > Voltaire DDR cards and 24-port switch. We also use OFED 1.1 and MVAPICH > 0.9.7. We have two interesting problems that we could not overcome yet: > > 1. In our test program which mimics the communications in our code, the > nodes are paired as follows: (0 and 1), (2 and 3), (4 and 5), (6 and 7). We > perform one to one communications between these pairs of nodes > simultaneously. We use blocking MPI send and receive commands to communicate > an integer array of various sizes. In addition, we consider different > numbers of processes: > (a) 1 process per node, 8 processes overall: One link is established between > the pairs of nodes. > (b) 2 process per node, 16 processes overall: Two links are established > between the pairs of nodes. > (c) 4 process per node, 32 processes overall: Four links are established > between the pairs of nodes. > (d) 8 process per node, 64 processes overall: Eight links are established > between the pairs of nodes. 
> > We obtain logical timings, except for the following interesting comparison: > > For 32 processes (4 process per node), the arrays with 512-Byte size are > communicated slower than the 4096-Byte size arrays. For both of them, we > send/receive 1,000,000 arrays and take the average to find the time per > package. Only package size changes. We have made many trials and confirmed > this abnormal case is persistent. More specifically, communication of > 4k-Byte packages are 2 times faster than the communication of 512-Byte > packages. > > The OSU bandwidth and latency test around these points shows: > Byte MB/s > 256 417.53 > 512 592.34 > 1024 691.02 > 2048 857.35 > 4096 906.04 > 8192 1022.52 > Time (usec) > 256 4.79 > 512 5.48 > 1024 6.60 > 2048 8.30 > 4096 11.02 > So this behavior does not seem reasonable to us. > > 2. SOMETIMES, after the test with overall 32 processes, one of the four > processes at node3 hangs in TASK_UNINTERRUPTABLE "D" state. Hence, the test > program shows a "done." and waits for sometime. We can neither kill the > process nor soft reboot the node. We have to wait for that process to > terminate, which can last long. > > Does anybody have some comments in these issues? 
> Thanks in advance, > Tahir Malas > Bilkent University > Electrical and Electronics Engineering Department > > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > -- http://www.cse.ohio-state.edu/~surs From hahn at mcmaster.ca Tue Jun 12 11:14:55 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Tue Jun 12 11:46:52 2007 Subject: [mvapich-discuss] Re: [Beowulf] Two problems related to slowness and TASK_UNINTERRUPTABLE process In-Reply-To: <01ae01c7acc2$dfa8e810$d80cb38b@bs> References: <01ae01c7acc2$dfa8e810$d80cb38b@bs> Message-ID: > For 32 processes (4 process per node), the arrays with 512-Byte size are > communicated slower than the 4096-Byte size arrays. For both of them, we do you mean that this is not the case in other configurations? an interconnect _should_ have some steep rise in effective bandwidth as packet size is increased. it's a useful metric to know the packet size at which half-peak bandwidth is achieved, since this offers some "sense of scale" to programmers judging whether their own packet sizes are appropriate. > this abnormal case is persistent. More specifically, communication of > 4k-Byte packages are 2 times faster than the communication of 512-Byte > packages. perhaps I'm dense this morning, but what's unexpected about that? > The OSU bandwidth and latency test around these points shows: > Byte MB/s > 256 417.53 > 512 592.34 > 1024 691.02 > 2048 857.35 > 4096 906.04 > 8192 1022.52 the osu_bw test is a streaming, fire-and-forget one which strongly rewards message aggregation. (this is not necessarily deceptive - it's measuring a real communication pattern, though it's not the only way to quantify bandwidth.) you can see that it's aggregating because the reported bandwidth for small packets is much higher than you'd expect if each packet took the latency reported below. 
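As a rough check of the aggregation argument, here is a minimal Python sketch that uses only the osu_bw and osu_latency figures quoted in this thread. It assumes a strict ping-pong model in which every message costs one full round trip (twice the reported latency), the same simplification as the estimate that follows. The function and variable names are illustrative and are not part of the OSU benchmarks.

```python
# Figures quoted in this thread, keyed by message size in bytes.
osu_latency_usec = {256: 4.79, 512: 5.48, 1024: 6.60, 2048: 8.30, 4096: 11.02}
osu_bw_mb_per_s = {256: 417.53, 512: 592.34, 1024: 691.02, 2048: 857.35, 4096: 906.04}


def pingpong_bandwidth(size_bytes, latency_usec):
    """MB/s (1 MB = 1e6 bytes) if each message waited for a full round trip.

    Assumption: round trip = 2 * reported latency.
    """
    round_trip_sec = 2 * latency_usec * 1e-6
    return size_bytes / round_trip_sec / 1e6


def aggregation_gain(size_bytes):
    """Ratio of the streaming osu_bw figure to the ping-pong estimate."""
    estimate = pingpong_bandwidth(size_bytes, osu_latency_usec[size_bytes])
    return osu_bw_mb_per_s[size_bytes] / estimate


if __name__ == "__main__":
    for size in sorted(osu_latency_usec):
        estimate = pingpong_bandwidth(size, osu_latency_usec[size])
        print(f"{size:5d} B  ping-pong {estimate:7.1f} MB/s  "
              f"osu_bw {osu_bw_mb_per_s[size]:7.2f} MB/s  "
              f"ratio {aggregation_gain(size):4.1f}x")
```

Under that assumption the streaming figure is roughly 16 times the ping-pong estimate at 256 bytes and roughly 5 times at 4096 bytes, so the gap narrows as messages grow.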
(unless my math is wrong, 256/(2*4.79e-6) = 26.7 MB/s) > Time (usec) > 256 4.79 > 512 5.48 > 1024 6.60 > 2048 8.30 > 4096 11.02 > So this behavior does not seem reasonable to us. > > 2. SOMETIMES, after the test with overall 32 processes, one of the four > processes at node3 hangs in TASK_UNINTERRUPTABLE "D" state. Hence, the test > program shows a "done." and waits for sometime. We can neither kill the > process nor soft reboot the node. We have to wait for that process to > terminate, which can last long. does /proc/$pid/wchan (on the 'D' state process) tell you anything? do all the ranks return from MPI_Finalize? regards, mark hahn. From THOMAS.T.O'SHEA at saic.com Tue Jun 12 15:52:57 2007 From: THOMAS.T.O'SHEA at saic.com (Thomas O'Shea) Date: Tue Jun 12 15:53:19 2007 Subject: [mvapich-discuss] MVAPICH2 Error - Assertion'current_bytes[vc->smp.local_nodes]==0' failed. (fwd) References: Message-ID: <007501c7ad2b$460d6370$9b66798b@us.saic.com> Thanks for taking a look at this. I think we've narrowed it down to using the MPI_TYPE_VECTOR to make some derived datatypes. When we use the code without them it seems to function fine. There are some issues with handing out the code so it may take a while to boil it down to a simple section that I can pull out and post here. Are there any known limits to the size of MPI_TYPE_VECTOR datatypes? We are using these in conjunction with one sided communications, and I know some implementations require the memory to be used in RMA needs to be allocated using MPI_ALLOC_MEM when using derived datatypes, but I didn't think MPICH2 was one of them. Thanks, Tom > > Hi Thomas > > Thanks for your reply. > > We have tried out the communication patterns that you had reported > and they run fine even with the case where the two processes are > running on the same node. > > I have attached the tests with this mail. These tests are > are from the mpich2 test suite (test4.c test4_am.c transpose.c). 
From vishwas.vasisht at Locuz.com Wed Jun 13 03:59:18 2007
From: vishwas.vasisht at Locuz.com (Vishwas)
Date: Wed Jun 13 07:50:01 2007
Subject: [mvapich-discuss] Cluster hanged with mpdallexit message in log
Message-ID:

Hi,

Whole of my cluster got hanged. I could ping the machines but could not login. I have the following message in log

Jun 12 14:45:46 node000 python2.4: mpdallexit: mpd_uncaught_except_tb handling:
exceptions.TypeError: not all arguments converted during string formatting
/usr/local/mvapich2/bin/mpdlib.py 899 __init__
mpd_print(1,'forked process failed; status=' % status)
/usr/local/mvapich2/bin/mpdallexit 44 mpdallexit
conSock = MPDConClientSock(mpdroot=mpdroot,secretword=parmdb['MPD_SECRETWORD'])
/usr/local/mvapich2/bin/mpdallexit 59 ?
mpdallexit()

Can somebody help me out in this regard.

Thanks
Vishwas

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20070613/006e508b/attachment.html

From potts at hpcapplications.com Wed Jun 13 09:47:52 2007
From: potts at hpcapplications.com (Mark Potts)
Date: Wed Jun 13 09:48:11 2007
Subject: [mvapich-discuss] mvapich jobs cleanup
Message-ID: <466FF588.8050204@hpcapplications.com>

Hi,

We are observing a number of cases in which MVAPICH-0.9.9 jobs launched with mpirun_rsh leave stray processes on some nodes when the job terminates abnormally.
Those stray processes continue to run forever and have to be found and killed by hand. Is there a reason this happens with MVAPICH, and is there a way to prevent it? This doesn't seem to be the behavior that occurs for abnormally terminated Voltaire MPI or Intel MPI jobs.

regards,
--
***********************************
>> Mark J. Potts, PhD
>>
>> HPC Applications Inc.
>> phone: 410-992-8360 Bus
>> 410-313-9318 Home
>> 443-418-4375 Cell
>> email: potts@hpcapplications.com
>> potts@excray.com
***********************************

From santhana at cse.ohio-state.edu Wed Jun 13 10:21:45 2007
From: santhana at cse.ohio-state.edu (Gopal Santhanaraman)
Date: Wed Jun 13 10:22:00 2007
Subject: [mvapich-discuss] MVAPICH2 Error - Assertion'current_bytes[vc->smp.local_nodes]==0' failed. (fwd)
In-Reply-To: <007501c7ad2b$460d6370$9b66798b@us.saic.com>
Message-ID:

Hi Thomas,

Thanks for your feedback.

I don't know of any upper limits on the size of MPI_TYPE_VECTOR datatypes. Can you let us know how big the count and blocklength of the vector datatypes that you are using are?

Also, whenever passive synchronization is used, it is recommended to use MPI_ALLOC_MEM.

Thanks
Gopal

On Tue, 12 Jun 2007, Thomas O'Shea wrote:

> Thanks for taking a look at this. I think we've narrowed it down to using
> the MPI_TYPE_VECTOR to make some derived datatypes. When we use the code
> without them it seems to function fine. There are some issues with handing
> out the code so it may take a while to boil it down to a simple section that
> I can pull out and post here.
>
> Are there any known limits to the size of MPI_TYPE_VECTOR datatypes? We are
> using these in conjunction with one sided communications, and I know some
> implementations require the memory to be used in RMA needs to be allocated
> using MPI_ALLOC_MEM when using derived datatypes, but I didn't think MPICH2
> was one of them.
>
> Thanks,
> Tom
>
> >
> > Hi Thomas
> >
> > Thanks for your reply.
> >
> > We have tried out the communication patterns that you had reported
> > and they run fine even with the case where the two processes are
> > running on the same node.
> >
> > I have attached the tests with this mail. These tests are from the
> > mpich2 test suite (test4.c test4_am.c transpose.c).
> > You can also try out these tests on your system.
> >
> > We have not been able to reproduce the error that you are reporting.
> > Can you give more insight into the application code that you are
> > running or, if it is possible, give us the application code?
> >
> > Thanks
> > Gopal
> >
> > > Date: Mon, 4 Jun 2007 08:37:23 -0700
> > > From: Thomas O'Shea
> > > Reply-To: Thomas O'Shea
> > > To: wei huang
> > > Cc: mvapich-discuss@cse.ohio-state.edu
> > > Subject: Re: [mvapich-discuss] MVAPICH2 Error - Assertion
> > > 'current_bytes[vc->smp.local_nodes]==0' failed.
> > >
> > > We migrated over to gen2 (OpenFabrics) and we are still getting the
> > > same errors. I was wondering if you found anything, or have any ideas
> > > of what to try next.
> > >
> > > Thanks,
> > > Tom
> > > ----- Original Message -----
> > > From: "wei huang"
> > > To: "Thomas O'Shea"
> > > Cc:
> > > Sent: Friday, May 04, 2007 3:06 PM
> > > Subject: Re: [mvapich-discuss] MVAPICH2 Error - Assertion
> > > 'current_bytes[vc->smp.local_nodes]==0' failed.
> > >
> > >
> > > > Hi Thomas,
> > > >
> > > > Thanks for your reply.
> > > >
> > > > Because the source code of your application is not available to us, we
> > > > will do a code review of our code (or do you have a piece of code which
> > > > shows the problem that can be sent to us?)
> > > >
> > > > The reason I ask you to try gen2 (OpenFabrics) stack is because the whole
> > > > InfiniBand community is moving towards this. So actually most of our
> > > > efforts is spent on this front (though we still maintain certain necessary
> > > > maintenance and bug fixes for the vapi stack).
> > > > You can find useful information to install the OFED stack
> > > > (OpenFabrics Enterprise Distribution) here:
> > > >
> > > > http://www.openfabrics.org/downloads.htm
> > > >
> > > > And the information to compile mvapich2 with OFED stack is available
> > > > through our website.
> > > >
> > > > Anyway, we will get back to you once we find something.
> > > >
> > > > Thanks.
> > > >
> > > > Regards,
> > > > Wei Huang
> > > >
> > > > 774 Dreese Lab, 2015 Neil Ave,
> > > > Dept. of Computer Science and Engineering
> > > > Ohio State University
> > > > OH 43210
> > > > Tel: (614)292-8501
> > > >
> > > >
> > > > On Fri, 4 May 2007, Thomas O'Shea wrote:
> > > >
> > > > > Thanks for the response.
> > > > >
> > > > > 1) Turns out we are using mvapich2-0.9.8p1 already.
> > > > >
> > > > > 2) Yes, the standard compiling scripts were used.
> > > > >
> > > > > 3) You are correct, most of the communication involves one sided
> > > > > operations with passive synchronization. The code also uses a few
> > > > > other MPI commands.
> > > > >
> > > > > We define MPI vector types:
> > > > >
> > > > > CALL MPI_TYPE_VECTOR(xlen,nguard,iu_bnd,MPI_DOUBLE_PRECISION,
> > > > > & xtype,ierr)
> > > > >
> > > > > CALL MPI_TYPE_COMMIT(xtype,ierr)
> > > > >
> > > > > Create MPI Windows:
> > > > >
> > > > > CALL MPI_WIN_CREATE(work,winsize,8,MPI_INFO_NULL,
> > > > > & MPI_COMM_WORLD,win,ierr)
> > > > >
> > > > > Synch our gets with lock and unlock:
> > > > >
> > > > > CALL MPI_WIN_LOCK(MPI_LOCK_SHARED,get_pe,0,win,ierr)
> > > > > CALL MPI_GET(wget,1,xtype,get_pe,
> > > > > & targ_disp,1,xtype,win,ierr)
> > > > > CALL MPI_WIN_UNLOCK(get_pe,win,ierr)
> > > > >
> > > > > We use one broadcast call
> > > > >
> > > > > call MPI_BCAST(qxyz,3*maxpan,MPI_DOUBLE_PRECISION,0,
> > > > > 1 MPI_COMM_WORLD,ierr)
> > > > >
> > > > > And of course barriers and freeing the windows and vector types.
> > > > >
> > > > > The error we are getting happens on a MPI_WIN_UNLOCK after a GET call
> > > > > that does not use the MPI_TYPE_VECTOR that we created though. The ierr
> > > > > from the GET call is 0 as well.
> > > > >
> > > > >
> > > > > 4) I talked with the IT person in charge of this cluster and he said
> > > > > that we could try that, but he said the documentation he found on gen2
> > > > > and udapl was somewhat sparse in that he wasn't sure exactly how to set
> > > > > that up and what the different compilations actually do differently.
> > > > > Is there any resource you can point us towards?
> > > > >
> > > > > Thanks,
> > > > > Tom
> > > > >
> > > > >
> > > > > > Hi Thomas,
> > > > > >
> > > > > > We will look into this issue. Would you please let us know the
> > > > > > following:
> > > > > >
> > > > > > 1) We have recently made a couple of bug fixes and released
> > > > > > mvapich2-0.9.8p1. Would you first try that version?
> > > > > >
> > > > > > And if it is not working:
> > > > > >
> > > > > > 2) Did you use the standard compiling scripts (you mentioned ib gold
> > > > > > release, is it on vapi? And did you use make.mvapich2.vapi?)
> > > > > >
> > > > > > 3) Would you provide us some information on how the communication
> > > > > > patterns of your application are? It seems like one sided operations
> > > > > > with passive synchronization (lock, get, unlock). Did you use other
> > > > > > operations?
> > > > > >
> > > > > > 4) Will it be possible for you to try gen2 (make.mvapich2.ofa) or
> > > > > > udapl on your stack, if they are available on your systems?
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > Regards,
> > > > > > Wei Huang
> > > > > >
> > > > > > 774 Dreese Lab, 2015 Neil Ave,
> > > > > > Dept.
> > > > > > of Computer Science and Engineering
> > > > > > Ohio State University
> > > > > > OH 43210
> > > > > > Tel: (614)292-8501
> > > > > >
> > > > > >
> > > > > > On Thu, 3 May 2007, Thomas O'Shea wrote:
> > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > I'm running the MVAPICH2-0.9.8 using the IB Gold Release. I've got 2
> > > > > > > 16 processor nodes (each has 8 dual-core AMD Opterons) hooked up
> > > > > > > through infiniband. I started off running this parallel Fortran code
> > > > > > > on just one node with MPICH2 and had no problems. It scaled decently
> > > > > > > to 8 processors but didn't see much improvement with the jump to 16
> > > > > > > (possibly due to cache coherency or something). Now, when trying to
> > > > > > > get it running across the infiniband connect I get this error:
> > > > > > >
> > > > > > > current bytes 4, total bytes 28, remote id 1
> > > > > > > nfa_opt: ch3_smp_progress.c:2075: MPIDI_CH3I_SMP_pull_header: Assertion 'current_bytes[vc->smp.local_nodes] == 0' failed.
> > > > > > > rank 0 in job 1 nessie_32906 caused collective abort of all ranks
> > > > > > > exit status of rank 0: killed by signal 9
> > > > > > >
> > > > > > > This happens right after a one sided communication (MPI_GET) but
> > > > > > > before the MPI_WIN_UNLOCK call that follows. Also this is only with a
> > > > > > > process that is on the same node as the calling process, The MPI_GET
> > > > > > > call exits with no errors also.
> > > > > > >
> > > > > > > All the osu_benchmarks run with no problems. There were also no
> > > > > > > problems if I make a local mpd (mpd &) ring on a single node and run
> > > > > > > the code with MVAPICH2 with 2,4,8,or 16 processors. If I compile with
> > > > > > > the MPICH2 libraries there are no problems on a single node or running
> > > > > > > processes spread out on both nodes.
> > > > > > >
> > > > > > > Ever seen this before?
> > > > > > > Any help would be greatly appreciated.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Thomas O'Shea
> > > > > > > SAIC
> > > > > > >

From panda at cse.ohio-state.edu Fri Jun 15 14:54:04 2007
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri Jun 15 14:54:20 2007
Subject: [mvapich-discuss] Announcing the availability of MVAPICH support for QLogic InfiniPath adapters
Message-ID: <200706151854.l5FIs4lV008726@xi.cse.ohio-state.edu>

The MVAPICH team is pleased to announce the availability of MVAPICH
native support for QLogic InfiniPath adapters.

Sample performance numbers include:

  - Opteron single-core with HT and InfiniPath-SDR:
      - 1.26 microsec one-way latency (4 bytes)
      - 953 MB/sec unidirectional bandwidth
      - 1889 MB/sec bidirectional bandwidth

  - EM64T quad-core with PCIe and InfiniPath-SDR:
      - 1.91 microsec one-way latency (4 bytes)
      - 957 MB/sec unidirectional bandwidth
      - 1565 MB/sec bidirectional bandwidth

More detailed performance numbers can be viewed by visiting
`Performance' section of the project's web page.

For downloading this new support and accessing the anonymous SVN,
please visit the following URL:

http://mvapich.cse.ohio-state.edu/

Please post your feedback to mvapich-discuss mailing list.

Thanks,

MVAPICH Team

======================================================================
MVAPICH/MVAPICH2 project is currently supported with funding from
U.S. National Science Foundation, U.S.
DOE Office of Science, Mellanox, Intel, Cisco Systems, QLogic, Sun
Microsystems and Linux Networx; and with equipment support from
Advanced Clustering, AMD, Apple, Appro, Chelsio, Dell, Fujitsu,
Fulcrum, IBM, Intel, Mellanox, Microway, NetEffect, QLogic and Sun
Microsystems. Other technology partner includes Etnus.
======================================================================

From tmalas at ee.bilkent.edu.tr Mon Jun 18 12:23:04 2007
From: tmalas at ee.bilkent.edu.tr (Tahir Malas)
Date: Mon Jun 18 12:23:12 2007
Subject: [mvapich-discuss] Two problems related to slowness and TASK_UNINTERRUPTABLE process
In-Reply-To: <466EB70D.2000306@cse.ohio-state.edu>
References: <01ae01c7acc2$dfa8e810$d80cb38b@bs> <466EB70D.2000306@cse.ohio-state.edu>
Message-ID: <00da01c7b1c4$f2e2ba80$d80cb38b@bs>

Hi Sayantan,

We have installed OFED 1.2, and both of our problems are gone! Now there
are neither hanging processes nor inconsistent communication times:

PACKAGE SIZE 512 BYTES     1.76
PACKAGE SIZE 4096 BYTES   13.83

With OFED 1.1, our test gave:

512:   29.434
4096:  16.209

Thanks and regards,
Tahir Malas
Bilkent University
Electrical and Electronics Engineering Department
Phone: +90 312 290 1385

> -----Original Message-----
> From: Sayantan Sur [mailto:surs@cse.ohio-state.edu]
> Sent: Tuesday, June 12, 2007 6:09 PM
> To: Tahir Malas
> Cc: mvapich-discuss@cse.ohio-state.edu; beowulf@beowulf.org;
> teoman.terzi@gmail.com; 'Ozgur Ergul'
> Subject: Re: [mvapich-discuss] Two problems related to slowness and
> TASK_UNINTERRUPTABLE process
>
> Hi Tahir,
>
> Thanks for sharing this data and your observations. It is interesting.
> We have a more recent release, MVAPICH-0.9.9 which is available from our
> website (mvapich.cse.ohio-state.edu) as well as with OFED-1.2
> distribution. Could you please try out our newer release and see if the
> results change/remain the same?
>
> Thanks,
> Sayantan.
>
> Tahir Malas wrote:
> > Hi all,
> > We have an 8 dual quad-core node HP cluster connected via Infiniband. We
> > use Voltaire DDR cards and 24-port switch. We also use OFED 1.1 and
> > MVAPICH 0.9.7. We have two interesting problems that we could not
> > overcome yet:
> >
> > 1. In our test program which mimics the communications in our code, the
> > nodes are paired as follows: (0 and 1), (2 and 3), (4 and 5), (6 and 7).
> > We perform one to one communications between these pairs of nodes
> > simultaneously. We use blocking MPI send and receive commands to
> > communicate an integer array of various sizes. In addition, we consider
> > different numbers of processes:
> > (a) 1 process per node, 8 processes overall: One link is established
> > between the pairs of nodes.
> > (b) 2 process per node, 16 processes overall: Two links are established
> > between the pairs of nodes.
> > (c) 4 process per node, 32 processes overall: Four links are established
> > between the pairs of nodes.
> > (d) 8 process per node, 64 processes overall: Eight links are
> > established between the pairs of nodes.
> >
> > We obtain logical timings, except for the following interesting
> > comparison:
> >
> > For 32 processes (4 process per node), the arrays with 512-Byte size are
> > communicated slower than the 4096-Byte size arrays. For both of them, we
> > send/receive 1,000,000 arrays and take the average to find the time per
> > package. Only package size changes. We have made many trials and
> > confirmed this abnormal case is persistent. More specifically,
> > communication of 4k-Byte packages are 2 times faster than the
> > communication of 512-Byte packages.
> >
> > The OSU bandwidth and latency test around these points shows:
> > Byte    MB/s
> > 256     417.53
> > 512     592.34
> > 1024    691.02
> > 2048    857.35
> > 4096    906.04
> > 8192    1022.52
> >
> > Byte    Time (usec)
> > 256     4.79
> > 512     5.48
> > 1024    6.60
> > 2048    8.30
> > 4096    11.02
> > So this behavior does not seem reasonable to us.
> >
> > 2. SOMETIMES, after the test with overall 32 processes, one of the four
> > processes at node3 hangs in TASK_UNINTERRUPTABLE "D" state. Hence, the
> > test program shows a "done." and waits for sometime. We can neither kill
> > the process nor soft reboot the node. We have to wait for that process to
> > terminate, which can last long.
> >
> > Does anybody have some comments in these issues?
> > Thanks in advance,
> > Tahir Malas
> > Bilkent University
> > Electrical and Electronics Engineering Department
> >
>
> --
> http://www.cse.ohio-state.edu/~surs
>

From yuan.65 at osu.edu Mon Jun 18 15:51:37 2007
From: yuan.65 at osu.edu (YANG YUAN)
Date: Mon Jun 18 15:51:52 2007
Subject: [mvapich-discuss] errors when using MVAPICH on P4 Cluster@OSC
Message-ID: <19f92f919feed2.19feed219f92f9@osu.edu>

Hi there,

I am a new user of MVAPICH, currently i am working on a parallel
application using MPI on P4 cluster@OSC. the following error is what i've
experienced after successfully compiling and linking.

/***
[0] Abort: Error: EVAPI_list_hcas: Error in an underlying O/S call
at line 258 in file viaparam.c
mpiexec: Error: read_full: EOF, only 0 of 4 bytes.
***/

Could you please tell me how to get rid of this error? Many thanks.

Yang

p.s. I couldn't use mpicc, mpiCC or mpicxx to compile my code, so
currently, the code is compiled and linked using icpc together with
$MPI_LIBS and $MPI_CFLAGS.

*************************************
Yang Yuan
Ph.D.
student in Operations Research
IWSE, The Ohio State University
210 Baker Systems
1971 Neil Ave
Columbus, OH 43210

From vishnu at cse.ohio-state.edu Wed Jun 20 10:43:02 2007
From: vishnu at cse.ohio-state.edu (Abhinav Vishnu)
Date: Wed Jun 20 10:44:57 2007
Subject: [mvapich-discuss] errors when using MVAPICH on P4 Cluster@OSC
In-Reply-To: <19f92f919feed2.19feed219f92f9@osu.edu>
References: <19f92f919feed2.19feed219f92f9@osu.edu>
Message-ID: <20070620144258.GA15162@cse.ohio-state.edu>

Hi,

Thanks for using MVAPICH and reporting the problem to us.

>
> Hi there,
>
> I am a new user of MVAPICH, currently i am working on a parallel
> application using MPI on P4 cluster@OSC. the following error is what i've
> experienced after successfully compiling and linking.
>
> /***
> [0] Abort: Error: EVAPI_list_hcas: Error in an underlying O/S call
> at line 258 in file viaparam.c
> mpiexec: Error: read_full: EOF, only 0 of 4 bytes.
> ***/
>
> Could you please tell me how to get rid of this error? Many thanks.
>
> Yang
>
> p.s. I couldn't use mpicc, mpiCC or mpicxx to compile my code, so
> currently, the code is compiled and linked using icpc together with
> $MPI_LIBS and $MPI_CFLAGS.
>

This is strange. I would not expect to be able to run an MPI application
without compiling with mpicc in general.

Can you possibly try a dummy program which calls MPI_Init and follows it
with MPI_Finalize, with no other communication or computation?

If you are not able to compile this program, then I think it would be
best to contact your system administrator regarding the MPI installation
and its usage.

Thanks much,

:- Abhinav

>
> *************************************
> Yang Yuan
> Ph.D.
> student in Operations Research
> IWSE, The Ohio State University
> 210 Baker Systems
> 1971 Neil Ave
> Columbus, OH 43210
>

From yuan.65 at osu.edu Wed Jun 20 10:56:06 2007
From: yuan.65 at osu.edu (YANG YUAN)
Date: Wed Jun 20 10:56:21 2007
Subject: [mvapich-discuss] errors when using MVAPICH on P4 Cluster@OSC
Message-ID: <1e49b4b1e48a7f.1e48a7f1e49b4b@osu.edu>

Hi Abhinav,

Thanks for your reply. I tried a dummy program, and it works fine with
both mpicc and icpc. The only reason I am using icpc is that a shared
open source library I am using now cannot be compiled and linked
together with my MPI program when using mpicc or mpicxx. I suspect this
is the main cause of my program's failure, but I don't know how to fix
it.

Another question, bear with me if it is too simple. I am not sure whether
I have to make local copies of the shared library I am using or not.
Since it is a distributed system, I don't think it is accessible to all
processors. Does MPI automatically do the copying?

Again, appreciated!

yang

----- Original Message -----
From: Abhinav Vishnu
Date: Wednesday, June 20, 2007 10:43 am
Subject: Re: [mvapich-discuss] errors when using MVAPICH on P4 Cluster@OSC

> Hi,
>
> Thanks for using MVAPICH and reporting the problem to us.
>
> >
> > Hi there,
> >
> > I am a new user of MVAPICH, currently i am working on a parallel
> > application using MPI on P4 cluster@OSC. the following error is
> > what i've experienced after successfully compiling and linking.
> >
> > /***
> > [0] Abort: Error: EVAPI_list_hcas: Error in an underlying O/S call
> > at line 258 in file viaparam.c
> > mpiexec: Error: read_full: EOF, only 0 of 4 bytes.
> > ***/
> >
> > Could you please tell me how to get rid of this error? Many thanks.
> >
> > Yang
> >
> > p.s.
> > I couldn't use mpicc, mpiCC or mpicxx to compile my code,
> > so currently, the code is compiled and linked using icpc together
> > with $MPI_LIBS and $MPI_CFLAGS.
> >
>
> This is strange. I would not expect to be able to run an MPI
> application without compiling with mpicc in general.
>
> Can you possibly use a dummy program which calls MPI_Init and
> follows