From koop at cse.ohio-state.edu Tue May 1 01:48:15 2007
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Tue May 1 01:48:25 2007
Subject: [mvapich-discuss] viarecv.c:613: viadev_eager_pull: Assertion `rhandle->vbuf_head != ((void *)0)' failed.
In-Reply-To: <46368425.8050005@llnl.gov>
Message-ID:

Adam,

Based on your line number, it appears that this is the 0.9.7-mlx that
shipped as a part of OFED 1.1. Is this correct? If so, it will be very
hard for us to determine if the issue is still there since it is a
different codebase than the 0.9.7 shipped from OSU.

If you are able to reproduce in any mode on 0.9.9 please let us know and
we'll be very interested to investigate further.

Thanks,
Matt

On Mon, 30 Apr 2007, Adam Moody wrote:

> Hello all,
> One user's code will sometimes die with MVAPICH-1 0.9.7. On a given
> run, it will randomly lead to one of three outcomes:
>   #1) viarecv.c:613: viadev_eager_pull: Assertion `rhandle->vbuf_head
>       != ((void *)0)' failed.
>   #2) MPI_IRECV : Invalid count argument
>   #3) the code runs without error
> From what I can tell, in case #1, the message that leads to the
> assertion failure is an unexpected eager message ~1700 bytes from an
> off-node task. The rhandle shows that vbufs_expected=1, but both
> vbuf_head and vbuf_tail are NULL.
>
> So far, this code runs without error in 0.9.9. I'd like to determine
> whether 0.9.9 fixes the problem, or whether it's still out there, but
> that the new optimizations in 0.9.9 affect timings in such a way so as
> to increase our odds of avoiding it. Are there any particular fixes in
> 0.9.9 which address the race condition described above?
> Thanks,
> -Adam Moody
> DEG/LLNL
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From pasha at dev.mellanox.co.il Tue May 1 04:48:30 2007
From: pasha at dev.mellanox.co.il (Pavel Shamis (Pasha))
Date: Tue May 1 04:48:55 2007
Subject: [mvapich-discuss] viarecv.c:613: viadev_eager_pull: Assertion `rhandle->vbuf_head != ((void *)0)' failed.
In-Reply-To:
References:
Message-ID: <4636FEDE.3010808@dev.mellanox.co.il>

Hi All,
viarecv.c:613 is code from mvapich-0.9.7-mlx2.2.0.
The same code exists in 0.9.9, at viarecv.c line 555.
I will try to analyze the issue in mlx2.2.0.

Regards,
Pasha

From jonathan_follows at uk.ibm.com Tue May 1 07:05:31 2007
From: jonathan_follows at uk.ibm.com (Jonathan Follows)
Date: Tue May 1 07:05:43 2007
Subject: [mvapich-discuss] Jonathan is on holiday until Tuesday May 8th.
Message-ID:

I will be out of the office starting 04/29/2007 and will not return until
05/08/2007.

I will not be back at work until Tuesday May 8th.

From moody20 at llnl.gov Tue May 1 14:14:02 2007
From: moody20 at llnl.gov (Adam Moody)
Date: Tue May 1 14:14:12 2007
Subject: [mvapich-discuss] viarecv.c:613: viadev_eager_pull: Assertion `rhandle->vbuf_head != ((void *)0)' failed.
In-Reply-To: <4636FEDE.3010808@dev.mellanox.co.il>
References: <4636FEDE.3010808@dev.mellanox.co.il>
Message-ID: <4637836A.2020709@llnl.gov>

Yes, thanks. I haven't yet reproduced it in 0.9.9. I suppose we could
try a pure OSU 0.9.7, but I don't think I have a copy around. Maybe I
could pull one off of a svn branch or off the trunk given a revision number.
-Adam

From vishnu at cse.ohio-state.edu Wed May 2 12:52:52 2007
From: vishnu at cse.ohio-state.edu (Abhinav Vishnu)
Date: Wed May 2 12:53:53 2007
Subject: [mvapich-discuss] apparent error
In-Reply-To: <46366B07.40508@hpcapplications.com>
References: <46366B07.40508@hpcapplications.com>
Message-ID: <20070502165251.GA7356@cse.ohio-state.edu>

Hi Mark,

Thanks for your mail.

> Hi,
>    I've noticed that in the MVAPICH 0.9.9 package included with OFED 1.2
>    the source file mpid/ch_gen2_multirail/vbuf.h contains the following
>    at line 51:
>       #define VBUF_TOTAL_SIZE (1024*16)
>    However, all the similar code examples in this and related files
>    indicates the intention was to assume _SMALL_CLUSTER when the cluster
>    size is not specified. Thus, I believe line 51 should be:
>       #define VBUF_TOTAL_SIZE (1024*12)
>
>    I have _not_ encountered a problem as a result of this apparent
>    error, but it seems worthy of correcting.

We chose this value after rigorous analysis of performance with
multi-rail device on our clusters. As you have mentioned, this
is definitely not a problem. Please let us know if you have some
data, which shows otherwise. We will be happy to take a look at it.

Thanks and best regards,

:- Abhinav

>
>    regards,
> --
> ***********************************
> >> Mark J. Potts, PhD
> >>
> >> HPC Applications Inc.
> >> phone: 410-992-8360 Bus
> >>        410-313-9318 Home
> >>        443-418-4375 Cell
> >> email: potts@hpcapplications.com
> >>        potts@excray.com
> ***********************************

From potts at hpcapplications.com Wed May 2 13:08:32 2007
From: potts at hpcapplications.com (Mark Potts)
Date: Wed May 2 13:08:47 2007
Subject: [mvapich-discuss] apparent error
In-Reply-To: <20070502165251.GA7356@cse.ohio-state.edu>
References: <46366B07.40508@hpcapplications.com> <20070502165251.GA7356@cse.ohio-state.edu>
Message-ID: <4638C590.9030502@hpcapplications.com>

Abhinav,
   You may be right, but I cannot see how rigorous analysis could
   reveal that the value for the default case for VBUF_TOTAL_SIZE should
   not be the same as that for either _SMALL_, _MEDIUM_, or
   _LARGE_CLUSTER. In all other system types, you chose a value that was
   the same as _SMALL_CLUSTER as the default value. EM64T (at line 51)
   seems to be the odd man out.

   regards,

--
***********************************
 >> Mark J. Potts, PhD
 >>
 >> HPC Applications Inc.
 >> phone: 410-992-8360 Bus
 >>        410-313-9318 Home
 >>        443-418-4375 Cell
 >> email: potts@hpcapplications.com
 >>        potts@excray.com
***********************************

From gbauer at ncsa.uiuc.edu Thu May 3 11:45:51 2007
From: gbauer at ncsa.uiuc.edu (Gregory Bauer)
Date: Thu May 3 11:45:43 2007
Subject: [mvapich-discuss] time spent in mpi_init and MV2_ON_DEMAND_THRESHOLD
Message-ID: <463A03AF.9040305@ncsa.uiuc.edu>

We are new at running MVAPICH2 and are seeing some scaling issues with
application start-up on an Intel dual quad-core with Infiniband
interconnect.

Using the typical MPI 'hello world' we see that when running 512 tasks
(nodes=64:ppn=8) and greater, the application makes very slow progress
in mpi_init(), unless MV2_ON_DEMAND_THRESHOLD is set to a count larger
than the task count. So it appears that static connection start-up is
faster than 'on demand', which seems counter-intuitive. At task counts
greater than 512, the 'on demand' scheme is too slow to be practical.

We must have something incorrectly configured, but what?

Thanks.
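
The behaviour described above (static connection set-up when MV2_ON_DEMAND_THRESHOLD exceeds the task count, 'on demand' otherwise) can be sketched in C. This is only an illustration under stated assumptions: `use_on_demand` is a hypothetical helper, not MVAPICH2 source; the exact comparison the library performs and its built-in default are not given in this thread, so the fallback value is supplied by the caller.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not MVAPICH2 source.
 * ASSUMPTION: on-demand connection set-up is chosen once the task count
 * reaches the threshold; the real library may compare differently.
 * threshold_str is the raw value of MV2_ON_DEMAND_THRESHOLD (may be NULL);
 * fallback is a caller-supplied placeholder for the library default.
 * Returns 1 for on-demand, 0 for static connection set-up.
 */
static int use_on_demand(int ntasks, const char *threshold_str, int fallback)
{
    int threshold = fallback;

    assert(ntasks > 0);
    if (threshold_str != NULL && *threshold_str != '\0') {
        char *end = NULL;
        long v = strtol(threshold_str, &end, 10);
        /* Accept only a clean positive integer; otherwise keep the fallback. */
        if (end != threshold_str && *end == '\0' && v > 0 && v <= INT_MAX)
            threshold = (int)v;
    }
    return ntasks >= threshold;
}
```

A caller would pass `getenv("MV2_ON_DEMAND_THRESHOLD")` as the second argument. Under the assumption above, a threshold larger than the task count selects the static path that the report found faster at this scale.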
-Greg

stack trace of the hello process

#0  0x0000002a9588a0df in __read_nocancel () from /lib64/tls/libpthread.so.0
#1  0x000000000044f513 in PMIU_readline ()
#2  0x0000000000443e51 in PMI_KVS_Get ()
#3  0x00000000004158b2 in MPIDI_CH3I_SMP_init ()
#4  0x00000000004456f9 in MPIDI_CH3_Init ()
#5  0x000000000042dab6 in MPID_Init ()
#6  0x000000000040f469 in MPIR_Init_thread ()
#7  0x000000000040f0bc in PMPI_Init ()
#8  0x0000000000403347 in main (argc=Variable "argc" is not available.
) at hello.c:17

strace snippet of the associated python process

select(16, [4 5 6 7 8 10 12 13 15], [], [], {5, 0}) = 1 (in [7], left {5, 0})
recvfrom(7, "00000094", 8, 0, NULL, NULL) = 8
recvfrom(7, "(dp1\nS\'cmd\'\np2\nS\'response_to_pmi"..., 94, 0, NULL, NULL) = 94
sendto(8, "00000094(dp1\nS\'cmd\'\np2\nS\'respons"..., 102, 0, NULL, 0) = 102
select(16, [4 5 6 7 8 10 12 13 15], [], [], {5, 0}) = 1 (in [7], left {5, 0})
recvfrom(7, "00000094", 8, 0, NULL, NULL) = 8
recvfrom(7, "(dp1\nS\'cmd\'\npProcess 28455 detached

Configuration information:

$ /usr/local/mvapich2-0.9.8p1-gcc/bin/mpich2version
Version:           MVAPICH2-0.9.8
Device:            osu_ch3:mrail
Configure Options: --prefix=/usr/local/mvapich2-0.9.8p1-gcc --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd --enable-romio --without-mpe

$ cat /usr/local/ofed/BUILD_ID
OFED-1.1

openib-1.1 (REV=9905)

$ dmesg | grep ib_
ib_core: no version for "struct_module" found: kernel tainted.
ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
ib_mthca: Initializing 0000:0a:00.0

$ uname -a
Linux honest1 2.6.9-42.0.8.EL_lustre.1.4.9smp #1 SMP Fri Feb 16 01:23:52 MST 2007 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/*release
Red Hat Enterprise Linux AS release 4 (Nahant Update 4)

From gaoq at cse.ohio-state.edu Thu May 3 12:11:28 2007
From: gaoq at cse.ohio-state.edu (Qi Gao)
Date: Thu May 3 12:11:52 2007
Subject: [mvapich-discuss] time spent in mpi_init and MV2_ON_DEMAND_THRESHOLD
References: <463A03AF.9040305@ncsa.uiuc.edu>
Message-ID: <002b01c78d9d$bb9478c0$0763a8c0@GAO>

Hi Greg,

Thanks for trying our MVAPICH2 and letting us know the problem. We are glad
to work with you to solve the problem.

I assume this stack trace represents the place where most of the time is
spent. It seems to be in the SMP channel initialization function, which
somehow gets stuck in the PMI calls to the process manager.

> #3  0x00000000004158b2 in MPIDI_CH3I_SMP_init ()
> #4  0x00000000004456f9 in MPIDI_CH3_Init ()

To help us narrow down the problem, would you try disabling the SMP channel
to see whether that helps?

To disable the SMP channel, you need to patch the default make script
"make.mvapich2.ofa". Here is the patch:

========
--- make.mvapich2.ofa.nosmp     2007-05-03 11:44:53.000000000 -0400
+++ make.mvapich2.ofa   2007-03-08 22:47:23.000000000 -0500
@@ -129,8 +129,6 @@
         SHARED_LIBS=""
 fi

+export SMP_FLAG=""
+
 export LD_LIBRARY_PATH=$OPEN_IB_LIB:$LD_LIBRARY_PATH
 export LIBS=${LIBS:--L${OPEN_IB_LIB} ${BLCR_LIB} ${RDMA_CM_LIBS} -libverbs -libumad -lpthread}
 export FFLAGS=${FFLAGS:--L${OPEN_IB_LIB}}
========

Thanks!
--Qi

From gbauer at ncsa.uiuc.edu Thu May 3 14:55:38 2007
From: gbauer at ncsa.uiuc.edu (Gregory Bauer)
Date: Thu May 3 14:55:32 2007
Subject: [mvapich-discuss] time spent in mpi_init and MV2_ON_DEMAND_THRESHOLD
In-Reply-To: <002b01c78d9d$bb9478c0$0763a8c0@GAO>
References: <463A03AF.9040305@ncsa.uiuc.edu> <002b01c78d9d$bb9478c0$0763a8c0@GAO>
Message-ID: <463A302A.3080501@ncsa.uiuc.edu>

Qi-

We have rebuilt with the changes you provided.

The time it took for mpiexec to run MPI 'Hello, world' at 1024 cores
(nodes=128:ppn=8) was approximately 25 minutes. (The mpdboot and
mpdtrace took only a minute or two each.)

To compare, with MV2_ON_DEMAND_THRESHOLD set to 2100, it took only 3
minutes for mpiexec to complete MPI 'Hello, world'.

The SMP channel is no longer in the picture. Using gdb to take a look at
where an MPI process is, I see

#0  0x0000002a9588a0df in __read_nocancel () from /lib64/tls/libpthread.so.0
#1  0x0000000000444da3 in PMIU_readline ()
#2  0x000000000043cb81 in PMI_KVS_Get ()
#3  0x0000000000415d9f in MPIDI_CH3I_CM_Init ()
#4  0x000000000043e410 in MPIDI_CH3_Init ()
#5  0x0000000000428926 in MPID_Init ()
#6  0x000000000040eca9 in MPIR_Init_thread ()
#7  0x000000000040ea4d in PMPI_Init ()
#8  0x00000000004030e7 in main (argc=Variable "argc" is not available.

I also have strace output for that process, and gdb and strace output
for the associated python process. (I simply attach to the processes
once in a while to see what is happening.)

-Greg

From gaoq at cse.ohio-state.edu Thu May 3 15:04:32 2007
From: gaoq at cse.ohio-state.edu (Qi Gao)
Date: Thu May 3 15:04:57 2007
Subject: [mvapich-discuss] time spent in mpi_init and MV2_ON_DEMAND_THRESHOLD
References: <463A03AF.9040305@ncsa.uiuc.edu> <002b01c78d9d$bb9478c0$0763a8c0@GAO> <463A302A.3080501@ncsa.uiuc.edu>
Message-ID: <006601c78db5$e8de0590$0763a8c0@GAO>

Hi Greg,

Thanks for verifying this. In both cases, the program blocks at a PMI call,
PMI_KVS_Get(). We will look into this problem further and get back to you.

Thanks!

--Qi

From kschoche at scl.ameslab.gov Thu May 3 16:48:51 2007
From: kschoche at scl.ameslab.gov (Kyle Schochenmaier)
Date: Thu May 3 15:47:56 2007
Subject: Fwd: [mvapich-discuss] errors building mvapich2-0.9.8p1 on ppc64
In-Reply-To: <4636A32C.1010208@scl.ameslab.gov>
References: <4636A32C.1010208@scl.ameslab.gov>
Message-ID: <463A4AB3.6090803@scl.ameslab.gov>

What's the next step to debug this compile error?

thanks,
Kyle

Kyle Schochenmaier wrote:
> Yes, my system has uint16_t in stdint.h
>
> Kyle
>
> wei huang wrote:
>> Could you verify if your system recognizes uint16_t?
>>
>> Regards,
>> Wei Huang
>>
>> 774 Dreese Lab, 2015 Neil Ave,
>> Dept. of Computer Science and Engineering
>> Ohio State University
>> OH 43210
>> Tel: (614)292-8501
>>
>>
>> On Fri, 27 Apr 2007, Kyle Schochenmaier wrote:
>>
>>> I've applied that patch, but it doesnt seem to have fixed the problem
>>> for some reason.
>>> Looks like I'm still getting the exact same error?
>>>
>>> Kyle
>>>
>>>
>>> wei huang wrote:
>>>
>>>> Hi Kyle,
>>>>
>>>> Sorry we do not have any working ppc64 platforms here to exactly
>>>> reproduce your problem. However, by code review we find that the
>>>> following patch could solve this compilation error. Would you please
>>>> try it and let us know how it works out:
>>>>
>>>> Index: src/mpid/osu_ch3/include/mpidpre.h
>>>> ===================================================================
>>>> --- src/mpid/osu_ch3/include/mpidpre.h (revision 1200)
>>>> +++ src/mpid/osu_ch3/include/mpidpre.h (working copy)
>>>> @@ -46,7 +46,7 @@
>>>>
>>>>  #if defined(MPID_USE_SEQUENCE_NUMBERS)
>>>>  /* Reduced bit for seqnum, origin value unsigned long */
>>>> -typedef u_int16_t MPID_Seqnum_t;
>>>> +typedef uint16_t MPID_Seqnum_t;
>>>>  /* End of OSU-MPI2 */
>>>>  /* typedef unsigned long MPID_Seqnum_t; */
>>>>  #endif
>>>>
>>>>
>>>> Thanks.
>>>>
>>>> -- Wei
>>>>
>>>>
>>>> [ Part 1: "Included Message" ]
>>>>
>>>> Date: Thu, 26 Apr 2007 15:43:40 -0600
>>>> From: Kyle Schochenmaier
>>>> To: mvapich-discuss@cse.ohio-state.edu
>>>> Subject: [mvapich-discuss] errors building mvapich2-0.9.8p1 on ppc64
>>>>
>>>> My goal is to build mvapich2 support for both the openfabrics stack
>>>> and romio with pvfs2 support, and I've run into problems building it
>>>> on ppc64.
>>>> I was able to build at least the ofa support on an amd64 system, so I
>>>> would think this isnt related to our openfabrics release.. but no luck
>>>> yet on the ppc64.
>>>>
>>>> I'm using the following configure line to try to build on my setup:
>>>> OPEN_IB_HOME=/usr/local/ LIBS="-lpvfs2" LDFLAGS="-m64
>>>> -L/usr/src/pvfs2-4.21.07/Bppc64/lib" CFLAGS="-m64
>>>> -I/usr/src/pvfs2-4.21.07/Bppc64/include" ./make.mvapich2.ofa
>>>> --enable-romio --with-file-system=pvfs2
>>>>
>>>> First off, is this the correct way to build pvfs2 support? The
>>>> README's describe how to build pvfs support manually, this is of
>>>> course pvfs2.
>>>>
>>>> I've also tried to build using just the make.mvapich2.ofa script
>>>> and it seems to fail in the same places:
>>>>
>>>> In file included from ../../../../../../include/mpiimpl.h:91,
>>>>   from /usr/src/mvapich2-0.9.8p1/src/mpid/osu_ch3/include/mpidimpl.h:47,
>>>>   from /usr/src/mvapich2-0.9.8p1/src/mpid/osu_ch3/channels/mrail/include/mpidi_ch3_impl.h:23,
>>>>   from ch3_finalize.c:20:
>>>> /usr/src/mvapich2-0.9.8p1/src/mpid/osu_ch3/include/mpidpre.h:163: error: expected specifier-qualifier-list before 'MPID_Seqnum_t'
>>>> /usr/src/mvapich2-0.9.8p1/src/mpid/osu_ch3/include/mpidpre.h:189: error: expected specifier-qualifier-list before 'MPID_Seqnum_t'
>>>> /usr/src/mvapich2-0.9.8p1/src/mpid/osu_ch3/include/mpidpre.h:209: error: expected specifier-qualifier-list before 'MPID_Seqnum_t'
>>>> /usr/src/mvapich2-0.9.8p1/src/mpid/osu_ch3/include/mpidpre.h:216: error: expected specifier-qualifier-list before 'MPID_Seqnum_t'
>>>> /usr/src/mvapich2-0.9.8p1/src/mpid/osu_ch3/include/mpidpre.h:236: error: expected specifier-qualifier-list before 'MPID_Seqnum_t'
>>>> /usr/src/mvapich2-0.9.8p1/src/mpid/osu_ch3/include/mpidpre.h:247: error: expected specifier-qualifier-list before 'MPID_Seqnum_t'
>>>> /usr/src/mvapich2-0.9.8p1/src/mpid/osu_ch3/include/mpidpre.h:262: error: expected specifier-qualifier-list before 'MPID_Seqnum_t'
>>>>
>>>>
>>>> Regards,
>>>> Wei Huang
>>>>
>>>> 774 Dreese Lab, 2015 Neil Ave,
>>>> Dept. of Computer Science and Engineering
>>>> Ohio State University
>>>> OH 43210
>>>> Tel: (614)292-8501
>>>>
>>>>
>>>> On Fri, 27 Apr 2007, LEI CHAI wrote:
>>>>
>>>> [NON-Text Body part not included]
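
GCC reports "expected specifier-qualifier-list before 'MPID_Seqnum_t'" when a struct member is declared with a type name that has not been defined at that point. The patch quoted above replaces the BSD-style `u_int16_t` with the C99 `uint16_t`. A minimal sketch in C of a sequence-number typedef that depends only on `<stdint.h>` follows; `seqnum_t` and `seqnum_next` are hypothetical names for illustration, not MVAPICH2 identifiers, and the sketch does not claim to identify the cause of the ppc64 failure.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MPID_Seqnum_t, using the C99 fixed-width type. */
typedef uint16_t seqnum_t;

/* Compile-time check that the type really is 16 bits wide (C11). */
static_assert(sizeof(seqnum_t) == 2, "seqnum_t must be 16 bits wide");

/* Advance a 16-bit sequence number; 65535 wraps around to 0. */
static seqnum_t seqnum_next(seqnum_t s)
{
    return (seqnum_t)(s + 1u);
}
```

If a header uses such a typedef in struct definitions, the `#include <stdint.h>` and the typedef both have to be visible before the first struct that mentions the type.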

From THOMAS.T.O'SHEA at saic.com Thu May 3 20:45:57 2007
From: THOMAS.T.O'SHEA at saic.com (Thomas O'Shea)
Date: Thu May 3 20:46:28 2007
Subject: [mvapich-discuss] MVAPICH2 Error - Assertion 'current_bytes[vc->smp.local_nodes]==0' failed.
Message-ID: <022c01c78de5$942bbcb0$9b66798b@us.saic.com>

Hello,

I'm running MVAPICH2-0.9.8 using the IB Gold Release. I've got two
16-processor nodes (each has 8 dual-core AMD Opterons) hooked up through
infiniband. I started off running this parallel Fortran code on just one
node with MPICH2 and had no problems. It scaled decently to 8 processors
but didn't see much improvement with the jump to 16 (possibly due to
cache coherency or something). Now, when trying to get it running across
the infiniband connect I get this error:

current bytes 4, total bytes 28, remote id 1
nfa_opt: ch3_smp_progress.c:2075: MPIDI_CH3I_SMP_pull_header: Assertion 'current_bytes[vc->smp.local_nodes] == 0' failed.
rank 0 in job 1 nessie_32906 caused collective abort of all ranks
exit status of rank 0: killed by signal 9

This happens right after a one-sided communication (MPI_GET) but before
the MPI_WIN_UNLOCK call that follows. It also happens only when the
target is a process on the same node as the calling process. The MPI_GET
call itself exits with no errors.

All the osu_benchmarks run with no problems. There were also no problems
if I make a local mpd (mpd &) ring on a single node and run the code
with MVAPICH2 with 2, 4, 8, or 16 processors. If I compile with the
MPICH2 libraries there are no problems on a single node or running
processes spread out on both nodes.

Ever seen this before? Any help would be greatly appreciated.

Thanks,
Thomas O'Shea
SAIC
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20070503/4f4800e2/attachment.html

From huanwei at cse.ohio-state.edu Fri May 4 10:40:36 2007
From: huanwei at cse.ohio-state.edu (wei huang)
Date: Fri May 4 10:40:45 2007
Subject: [mvapich-discuss] MVAPICH2 Error - Assertion 'current_bytes[vc->smp.local_nodes]==0' failed.
In-Reply-To: <022c01c78de5$942bbcb0$9b66798b@us.saic.com>
Message-ID:

Hi Thomas,

We will look into this issue. Would you please let us know the following:

1) We have recently made a couple of bug fixes and released
mvapich2-0.9.8p1. Would you first try that version?

And if it is not working:

2) Did you use the standard compiling scripts (you mentioned ib gold
release, is it on vapi? And did you use make.mvapich2.vapi?)

3) Would you provide us some information on the communication patterns
of your application? It seems like one-sided operations with passive
synchronization (lock, get, unlock). Did you use other operations?

4) Will it be possible for you to try gen2 (make.mvapich2.ofa) or udapl
on your stack, if they are available on your systems?

Thanks.

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501

From THOMAS.T.O'SHEA at saic.com Fri May 4 16:34:10 2007
From: THOMAS.T.O'SHEA at saic.com (Thomas O'Shea)
Date: Fri May 4 16:34:25 2007
Subject: [mvapich-discuss] MVAPICH2 Error - Assertion 'current_bytes[vc->smp.local_nodes]==0' failed.
References:
Message-ID: <028901c78e8b$9220cd10$9b66798b@us.saic.com>

Thanks for the response.

1) Turns out we are using mvapich2-0.9.8p1 already.

2) Yes, the standard compiling scripts were used.

3) You are correct, most of the communication involves one-sided operations
with passive synchronization. The code also uses a few other MPI commands.

We define MPI vector types:

      CALL MPI_TYPE_VECTOR(xlen,nguard,iu_bnd,MPI_DOUBLE_PRECISION,
     &     xtype,ierr)

      CALL MPI_TYPE_COMMIT(xtype,ierr)

Create MPI Windows:

      CALL MPI_WIN_CREATE(work,winsize,8,MPI_INFO_NULL,
     &     MPI_COMM_WORLD,win,ierr)

Synch our gets with lock and unlock:

      CALL MPI_WIN_LOCK(MPI_LOCK_SHARED,get_pe,0,win,ierr)
      CALL MPI_GET(wget,1,xtype,get_pe,
     &     targ_disp,1,xtype,win,ierr)
      CALL MPI_WIN_UNLOCK(get_pe,win,ierr)

We use one broadcast call

      call MPI_BCAST(qxyz,3*maxpan,MPI_DOUBLE_PRECISION,0,
     1     MPI_COMM_WORLD,ierr)

And of course barriers and freeing the windows and vector types.
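
MPI_TYPE_VECTOR(count, blocklength, stride, oldtype, ...) describes `count` blocks of `blocklength` elements whose starting positions are `stride` elements apart. The sketch below is plain C with no MPI calls; it lists which element offsets such a layout selects, assuming `xlen`, `nguard` and `iu_bnd` play the roles of count, block length and stride as in the call above. `vector_offsets` is a hypothetical helper for illustration and is not part of MPI or of the application in this thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: fill `out` with the element offsets (in units of the
 * base type) selected by a vector layout of `count` blocks, `blocklen`
 * elements each, with block starts `stride` elements apart.
 * `out` must have room for count * blocklen entries.
 * Returns the number of offsets written.
 */
static size_t vector_offsets(size_t count, size_t blocklen, size_t stride,
                             size_t *out)
{
    size_t n = 0;

    assert(out != NULL);
    for (size_t b = 0; b < count; ++b) {
        for (size_t e = 0; e < blocklen; ++e) {
            out[n++] = b * stride + e;  /* e-th element of the b-th block */
        }
    }
    return n;
}
```

With made-up values count = 3, blocklen = 2 and stride = 5, the selected offsets are 0, 1, 5, 6, 10, 11. A single MPI_GET of one such type therefore moves count * blocklen elements gathered from non-contiguous locations in the target window.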
The error we are getting happens on a MPI_WIN_UNLOCK after a GET call that does not use the MPI_TYPE_VECTOR that we created though. The ierr from the GET call is 0 as well. 4) I talked with the IT person in charge of this cluster and he said that we could try that, but he said the documentation he found on gen2 and udapl was somewhat sparse in that he wasn't sure exactly how to set that up and what the different compilations actually do differently. Is there any resource you can point us towards? Thanks, Tom > Hi Thomas, > > We will look into this issue. Would you please let us know the following: > > 1) We have recently made a couple of bug fixes and released > mvapich2-0.9.8p1. Would you first try that version? > > And if it is not working: > > 2) Did you use the standard compiling scripts (you mentioned ib gold > release, is it on vapi? And did you use make.mvapich2.vapi?) > > 3) Would you provide us some information on how the comunication patterns > of your application are? It seems like one sided operations with passive > synchronization (lock, get, unlock). Did you use other operations? > > 4) Will it possible for you to try gen2 (make.mvapich2.ofa) or udapl on > your stack, if they are available on your systems? > > Thanks. > > Regards, > Wei Huang > > 774 Dreese Lab, 2015 Neil Ave, > Dept. of Computer Science and Engineering > Ohio State University > OH 43210 > Tel: (614)292-8501 > > > On Thu, 3 May 2007, Thomas O'Shea wrote: > > > Hello, > > > > I'm running the MVAPICH2-0.9.8 using the IB Gold Release. I've got 2 > > 16 processor nodes (each has 8 dual-core AMD Opterons) hooked up > > through infiniband. I started off running this parallel Fortran code > > on just one node with MPICH2 and had no problems. It scaled decently > > to 8 processors but didn't see much improvement with the jump to 16 > > (possibly due to cache coherency or something). 
> > Now, when trying to
> > get it running across the infiniband connect I get this error:
> >
> > current bytes 4, total bytes 28, remote id 1
> > nfa_opt: ch3_smp_progress.c:2075: MPIDI_CH3I_SMP_pull_header: Assertion 'current_bytes[vc->smp.local_nodes] == 0' failed.
> > rank 0 in job 1 nessie_32906 caused collective abort of all ranks
> > exit status of rank 0: killed by signal 9
> >
> > This happens right after a one sided communication (MPI_GET) but
> > before the MPI_WIN_UNLOCK call that follows. Also this is only with a
> > process that is on the same node as the calling process, The MPI_GET
> > call exits with no errors also.
> >
> > All the osu_benchmarks run with no problems. There were also no
> > problems if I make a local mpd (mpd &) ring on a single node and run
> > the code with MVAPICH2 with 2,4,8,or 16 processors. If I compile with
> > the MPICH2 libraries there are no problems on a single node or running
> > processes spread out on both nodes.
> >
> > Ever seen this before? Any help would be greatly appreciated.
> >
> > Thanks,
> > Thomas O'Shea
> > SAIC

From huanwei at cse.ohio-state.edu Fri May 4 18:06:39 2007
From: huanwei at cse.ohio-state.edu (wei huang)
Date: Fri May 4 18:06:48 2007
Subject: [mvapich-discuss] MVAPICH2 Error - Assertion 'current_bytes[vc->smp.local_nodes]==0' failed.
In-Reply-To: <028901c78e8b$9220cd10$9b66798b@us.saic.com>
Message-ID:

Hi Thomas,

Thanks for your reply. Because the source code of your application is
not available to us, we will do a code review of our code (or do you
have a piece of code which shows the problem that can be sent to us?).

The reason I asked you to try the gen2 (OpenFabrics) stack is that the
whole InfiniBand community is moving towards it. So most of our effort
is actually spent on this front (though we still provide necessary
maintenance and bug fixes for the vapi stack).
You can find useful information on installing the OFED stack
(OpenFabrics Enterprise Distribution) here:

http://www.openfabrics.org/downloads.htm

And the information on compiling mvapich2 with the OFED stack is
available through our website.

Anyway, we will get back to you once we find something.

Thanks.

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501


On Fri, 4 May 2007, Thomas O'Shea wrote:

> Thanks for the response.
>
> 1) Turns out we are using mvapich2-0.9.8p1 already.
>
> 2) Yes, the standard compiling scripts were used.
>
> 3) You are correct, most of the communication involves one sided operations
> with passive synchronization. The code also uses a few other MPI commands.
>
> We define MPI vector types:
>
>       CALL MPI_TYPE_VECTOR(xlen,nguard,iu_bnd,MPI_DOUBLE_PRECISION,
>      &                     xtype,ierr)
>
>       CALL MPI_TYPE_COMMIT(xtype,ierr)
>
> Create MPI Windows:
>
>       CALL MPI_WIN_CREATE(work,winsize,8,MPI_INFO_NULL,
>      &                    MPI_COMM_WORLD,win,ierr)
>
> Synch our gets with lock and unlock:
>
>       CALL MPI_WIN_LOCK(MPI_LOCK_SHARED,get_pe,0,win,ierr)
>       CALL MPI_GET(wget,1,xtype,get_pe,
>      &             targ_disp,1,xtype,win,ierr)
>       CALL MPI_WIN_UNLOCK(get_pe,win,ierr)
>
> We use one broadcast call
>
>       call MPI_BCAST(qxyz,3*maxpan,MPI_DOUBLE_PRECISION,0,
>      1               MPI_COMM_WORLD,ierr)
>
> And of course barriers and freeing the windows and vector types.
>
> The error we are getting happens on a MPI_WIN_UNLOCK after a GET call that
> does not use the MPI_TYPE_VECTOR that we created though. The ierr from the
> GET call is 0 as well.
>
>
> 4) I talked with the IT person in charge of this cluster and he said that we
> could try that, but he said the documentation he found on gen2 and udapl was
> somewhat sparse in that he wasn't sure exactly how to set that up and what
> the different compilations actually do differently. Is there any resource
> you can point us towards?
>
> Thanks,
> Tom
>
>
> > Hi Thomas,
> >
> > We will look into this issue. Would you please let us know the following:
> >
> > 1) We have recently made a couple of bug fixes and released
> > mvapich2-0.9.8p1. Would you first try that version?
> >
> > And if it is not working:
> >
> > 2) Did you use the standard compiling scripts (you mentioned ib gold
> > release, is it on vapi? And did you use make.mvapich2.vapi?)
> >
> > 3) Would you provide us some information on how the communication patterns
> > of your application are? It seems like one sided operations with passive
> > synchronization (lock, get, unlock). Did you use other operations?
> >
> > 4) Will it be possible for you to try gen2 (make.mvapich2.ofa) or udapl on
> > your stack, if they are available on your systems?
> >
> > Thanks.
> >
> > Regards,
> > Wei Huang
> >
> > 774 Dreese Lab, 2015 Neil Ave,
> > Dept. of Computer Science and Engineering
> > Ohio State University
> > OH 43210
> > Tel: (614)292-8501
> >
> >
> > On Thu, 3 May 2007, Thomas O'Shea wrote:
> >
> > > Hello,
> > >
> > > I'm running the MVAPICH2-0.9.8 using the IB Gold Release. I've got 2
> > > 16 processor nodes (each has 8 dual-core AMD Opterons) hooked up
> > > through infiniband. I started off running this parallel Fortran code
> > > on just one node with MPICH2 and had no problems. It scaled decently
> > > to 8 processors but didn't see much improvement with the jump to 16
> > > (possibly due to cache coherency or something). Now, when trying to
> > > get it running across the infiniband connect I get this error:
> > >
> > > current bytes 4, total bytes 28, remote id 1
> > > nfa_opt: ch3_smp_progress.c:2075: MPIDI_CH3I_SMP_pull_header: Assertion 'current_bytes[vc->smp.local_nodes] == 0' failed.
> > > rank 0 in job 1 nessie_32906 caused collective abort of all ranks
> > > exit status of rank 0: killed by signal 9
> > >
> > > This happens right after a one sided communication (MPI_GET) but
> > > before the MPI_WIN_UNLOCK call that follows. Also this is only with a
> > > process that is on the same node as the calling process, The MPI_GET
> > > call exits with no errors also.
> > >
> > > All the osu_benchmarks run with no problems. There were also no
> > > problems if I make a local mpd (mpd &) ring on a single node and run
> > > the code with MVAPICH2 with 2,4,8,or 16 processors. If I compile with
> > > the MPICH2 libraries there are no problems on a single node or running
> > > processes spread out on both nodes.
> > >
> > > Ever seen this before? Any help would be greatly appreciated.
> > >
> > > Thanks,
> > > Thomas O'Shea
> > > SAIC
>

From dog at lanl.gov Wed May 9 13:46:54 2007
From: dog at lanl.gov (David Gunter)
Date: Wed May 9 13:47:11 2007
Subject: [mvapich-discuss] MVAPICH2-0.9.8 internal errors
In-Reply-To: <41803E3B-99E2-4214-AC1D-A710D1DB0598@lanl.gov>
References: <200703221537.l2MFbup1010721@xi.cse.ohio-state.edu> <41803E3B-99E2-4214-AC1D-A710D1DB0598@lanl.gov>
Message-ID: <2048E353-6CAC-4B3C-A62A-D769149AE504@lanl.gov>

I am curious to know if you were able to reproduce this problem, and
whether it has been fixed.

Thanks,
david
--
David Gunter
HPC-4: HPC Environments: Parallel Tools Team
Los Alamos National Laboratory


On Mar 22, 2007, at 11:20 AM, David Gunter wrote:

> I am using the released 0.9.8 version.
>
> -david
>
> On Mar 22, 2007, at 9:37 AM, Dhabaleswar Panda wrote:
>
>> Hi David,
>>
>> Thanks for this information. One more question - are you using
>> MVAPICH2 0.9.8 `released' version or the `branch' version (with some
>> recent fixes). If you can let us know this information, it will help
>> us.
>>
>> Thanks,
>>
>> DK
>>
>>
>>> I have recompiled mvapich2 without using the --enable-debuginfo flag
>>> and the problem has gone away.
>>> However, I wish to have debuginfo
>>> available to our TotalView users so hopefully this can be resolved.
>>>
>>> Here is the configuration that generates the error message I saw
>>> previously:
>>>
>>> ./configure --prefix=/opt/mvapich2/mvapich2-0.9.8-gcc/ib
>>> --with-openib=/usr/local/ofed --enable-romio
>>> --with-file-system=ufs+nfs+panfs --enable-sharedlibs=gcc
>>> --enable-debuginfo --enable-fast --with-mpe
>>>
>>> OFED is ofed-1.1
>>>
>>> CFLAGS=-D_X86_64_ -D_SMP_ -DUSE_HEADER_CACHING -DONE_SIDED
>>> -DMPID_USE_SEQUENCE_NUMBERS -I/usr/local/ofed/include -O2
>>> -D_SHMEM_COLL_
>>>
>>> CC=/usr/bin/gcc
>>> CXX=/usr/bin/g++
>>> FC=/usr/bin/gfortran
>>> F77=/usr/bin/gfortran
>>> F90=/usr/bin/gfortran
>>>
>>> The test was run on 32, 16, 8 and 4 processes, all of which
>>> generated this error message.
>>>
>>> Thanks,
>>> david
>>>
>>>
>>>
>>> --
>>> David Gunter
>>> HPC-4: HPC Environments: Parallel Tools Team
>>> Los Alamos National Laboratory
>>>
>>>
>>> On Mar 21, 2007, at 4:50 PM, wei huang wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks for letting us know about this problem.
>>>>
>>>> We will try to reproduce the problem on our cluster. To help us
>>>> look into this problem, would you please let us know the following:
>>>>
>>>> 1) The exact CFLAGS you are using when configuring mvapich2 (are
>>>> you using default scripts provided by us?)
>>>> 2) Any runtime environment variables you have set up?
>>>> 3) On how many nodes do you run the test?
>>>>
>>>> Thanks.
>>>>
>>>> Regards,
>>>> Wei Huang
>>>>
>>>> 774 Dreese Lab, 2015 Neil Ave,
>>>> Dept. of Computer Science and Engineering
>>>> Ohio State University
>>>> OH 43210
>>>> Tel: (614)292-8501
>>>>
>>>>
>>>> On Wed, 21 Mar 2007, David Gunter wrote:
>>>>
>>>>> I have built mvapich2 for an OFED-based IB Opteron cluster.
>>>>> When running the Intel MPI Benchmarks (IMB3) I keep seeing the
>>>>> following error messages in many spots, although the tests run to
>>>>> completion:
>>>>>
>>>>> Internal error: communicator is already on free list
>>>>>
>>>>> What is this referring to?
>>>>>
>>>>> Thanks.
>>>>> --david
>>>>>
>>>>> --
>>>>> David Gunter
>>>>> HPC-4: HPC Environments: Parallel Tools Team
>>>>> Los Alamos National Laboratory

From koop at cse.ohio-state.edu Wed May 9 14:56:01 2007
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Wed May 9 14:56:12 2007
Subject: [mvapich-discuss] mvapich & mpiexec
In-Reply-To:
Message-ID:

Egor,

> Does mvapich work with mpiexec? I use MVAPICH 0.9.9-beta and mpiexec 0.82.
> When I try to run tasks I see the message:
> mpiexec: Warning: read_ib_one: protocol version 5 not known, but
> might still work.
> But nothing happens. Is the latest mvapich version not
> supported by mpiexec? Or are there other methods to run MPI tasks with
> a batch system? I use the latest Torque batch system. Batch scripts with
> mpirun work, but that requires installing the mvapich distribution on
> all nodes.

Currently there is no way to enable the old startup protocol in 0.9.9.
mpiexec will need to be updated to accommodate the new startup protocol
that is used in 0.9.9.

Matt

From panda at cse.ohio-state.edu Wed May 9 16:02:00 2007
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Wed May 9 16:02:11 2007
Subject: [mvapich-discuss] MVAPICH2-0.9.8 internal errors
In-Reply-To: <2048E353-6CAC-4B3C-A62A-D769149AE504@lanl.gov> from "David Gunter" at May 09, 2007 11:46:54 AM
Message-ID: <200705092002.l49K20k5000023@xi.cse.ohio-state.edu>

Hi David,

Do you see this problem with the MVAPICH2 0.9.8p2 version? We have made
several fixes to the released version over the last few months. This
latest version (0.9.8p2) is available from the mvapich page. This
version is also included in OFED 1.2 and is being tested with it. We
have also tested this latest version with OFED 1.1.

Please let us know if you still see the error with this latest version
(p2) on your system and we will investigate this issue further.

Thanks,

DK

> I am curious to know if you were able to reproduce this problem, and
> whether it has been fixed.
>
> Thanks,
> david
> --
> David Gunter
> HPC-4: HPC Environments: Parallel Tools Team
> Los Alamos National Laboratory
>
>
> On Mar 22, 2007, at 11:20 AM, David Gunter wrote:
>
> > I am using the released 0.9.8 version.
> >
> > -david
> >
> > On Mar 22, 2007, at 9:37 AM, Dhabaleswar Panda wrote:
> >
> >> Hi David,
> >>
> >> Thanks for this information. One more question - are you using
> >> MVAPICH2 0.9.8 `released' version or the `branch' version (with some
> >> recent fixes). If you can let us know this information, it will help
> >> us.
> >>
> >> Thanks,
> >>
> >> DK
> >>
> >>
> >>> I have recompiled mvapich2 without using the --enable-debuginfo flag
> >>> and the problem has gone away. However, I wish to have debuginfo
> >>> available to our TotalView users so hopefully this can be resolved.
> >>>
> >>> Here is the configuration that generates the error message I saw
> >>> previously:
> >>>
> >>> ./configure --prefix=/opt/mvapich2/mvapich2-0.9.8-gcc/ib
> >>> --with-openib=/usr/local/ofed --enable-romio
> >>> --with-file-system=ufs+nfs+panfs --enable-sharedlibs=gcc
> >>> --enable-debuginfo --enable-fast --with-mpe
> >>>
> >>> OFED is ofed-1.1
> >>>
> >>> CFLAGS=-D_X86_64_ -D_SMP_ -DUSE_HEADER_CACHING -DONE_SIDED
> >>> -DMPID_USE_SEQUENCE_NUMBERS -I/usr/local/ofed/include -O2
> >>> -D_SHMEM_COLL_
> >>>
> >>> CC=/usr/bin/gcc
> >>> CXX=/usr/bin/g++
> >>> FC=/usr/bin/gfortran
> >>> F77=/usr/bin/gfortran
> >>> F90=/usr/bin/gfortran
> >>>
> >>> The test was run on 32, 16, 8 and 4 processes, all of which
> >>> generated this error message.
> >>>
> >>> Thanks,
> >>> david
> >>>
> >>>
> >>>
> >>> --
> >>> David Gunter
> >>> HPC-4: HPC Environments: Parallel Tools Team
> >>> Los Alamos National Laboratory
> >>>
> >>>
> >>> On Mar 21, 2007, at 4:50 PM, wei huang wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Thanks for letting us know about this problem.
> >>>>
> >>>> We will try to reproduce the problem on our cluster. To help us
> >>>> look into this problem, would you please let us know the following:
> >>>>
> >>>> 1) The exact CFLAGS you are using when configuring mvapich2 (are
> >>>> you using default scripts provided by us?)
> >>>> 2) Any runtime environment variables you have set up?
> >>>> 3) On how many nodes do you run the test?
> >>>>
> >>>> Thanks.
> >>>>
> >>>> Regards,
> >>>> Wei Huang
> >>>>
> >>>> 774 Dreese Lab, 2015 Neil Ave,
> >>>> Dept. of Computer Science and Engineering
> >>>> Ohio State University
> >>>> OH 43210
> >>>> Tel: (614)292-8501
> >>>>
> >>>>
> >>>> On Wed, 21 Mar 2007, David Gunter wrote:
> >>>>
> >>>>> I have built mvapich2 for an OFED-based IB Opteron cluster.
> >>>>> When running the Intel MPI Benchmarks (IMB3) I keep seeing the
> >>>>> following error messages in many spots, although the tests run to
> >>>>> completion:
> >>>>>
> >>>>> Internal error: communicator is already on free list
> >>>>>
> >>>>> What is this referring to?
> >>>>>
> >>>>> Thanks.
> >>>>> --david
> >>>>>
> >>>>> --
> >>>>> David Gunter
> >>>>> HPC-4: HPC Environments: Parallel Tools Team
> >>>>> Los Alamos National Laboratory
>>> margin-bottom: 0px; margin-left: 0px; min-height: 14px; ">
>>> DIV>
>>> style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > >>> margin-left: 0px; min-height: 14px; ">
>>> style=3D"margin-top: = > >>> 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; = > >>> min-height: 14px; ">
>>> top: = > >>> 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; = > >>> min-height: 14px; ">
= > >>>

> I am curious to know if you were able to reproduce this problem, whether
> it has been fixed or not.
>
> Thanks,
> david
>
> --
> David Gunter
> HPC-4: HPC Environments: Parallel Tools Team
> Los Alamos National Laboratory
>
> On Mar 22, 2007, at 11:20 AM, David Gunter wrote:
>
> > I am using the released 0.9.8 version.
> >
> > -david
> >
> > On Mar 22, 2007, at 9:37 AM, Dhabaleswar Panda wrote:
> >
> > Hi David,
> >
> > Thanks for this information. One more question - are you using
> > MVAPICH2 0.9.8 `released' version or the `branch' version (with some
> > recent fixes)? If you can let us know this information, it will help
> > us.
> >
> > Thanks,
> >
> > DK
> >
> > > I have recompiled mvapich2 without using the --enable-debuginfo flag
> > > and the problem has gone away. However, I wish to have debuginfo
> > > available to our TotalView users so hopefully this can be resolved.
> > >
> > > Here is the configuration that generates the error message I saw
> > > previously:
> > >
> > > ./configure --prefix=/opt/mvapich2/mvapich2-0.9.8-gcc/ib
> > > --with-openib=/usr/local/ofed --enable-romio
> > > --with-file-system=ufs+nfs+panfs --enable-sharedlibs=gcc
> > > --enable-debuginfo --enable-fast --with-mpe
> > >
> > > OFED is ofed-1.1
> > >
> > > CFLAGS=-D_X86_64_ -D_SMP_ -DUSE_HEADER_CACHING -DONE_SIDED
> > > -DMPID_USE_SEQUENCE_NUMBERS -I/usr/local/ofed/include -O2
> > > -D_SHMEM_COLL_
> > >
> > > CC=/usr/bin/gcc
> > > CXX=/usr/bin/g++
> > > FC=/usr/bin/gfortran
> > > F77=/usr/bin/gfortran
> > > F90=/usr/bin/gfortran
> > >
> > > The test was run on 32, 16, 8 and 4 processes, all of which
> > > generated this error message.
> > >
> > > Thanks,
> > > david
> > >
> > > --
> > > David Gunter
> > > HPC-4: HPC Environments: Parallel Tools Team
> > > Los Alamos National Laboratory
> > >
> > > On Mar 21, 2007, at 4:50 PM, wei huang wrote:
> > >
> > > Hi,
> > >
> > > Thanks for letting us know about this problem.
> > >
> > > We will try to reproduce the problem on our cluster. To help us look
> > > into this problem, would you please let us know the following:
> > >
> > > 1) The exact CFLAGS you are using when configuring mvapich2 (are
> > > you using default scripts provided by us?)
> > > 2) Any runtime environment variables you have set up?
> > > 3) On how many nodes do you run the test?
> > >
> > > Thanks.
> > >
> > > Regards,
> > > Wei Huang
> > >
> > > 774 Dreese Lab, 2015 Neil Ave,
> > > Dept. of Computer Science and Engineering
> > > Ohio State University
> > > OH 43210
> > > Tel: (614)292-8501
> > >
> > > On Wed, 21 Mar 2007, David Gunter wrote:
> > >
> > > I have built mvapich2 for an OFED-based IB Opteron cluster. When
> > > running the Intel MPI Benchmarks (IMB3) I keep seeing the following
> > > error messages in many spots, although the tests run to completion:
> > >
> > > Internal error: communicator is already on free list
> > >
> > > What is this referring to?
> > >
> > > Thanks.
> > > --david
> > >
> > > --
> > > David Gunter
> > > HPC-4: HPC Environments: Parallel Tools Team
> > > Los Alamos National Laboratory
> > >
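The thread above reports that the only known difference between the failing and the working build is the --enable-debuginfo flag. A minimal sketch of how the two configure invocations could be generated side by side for comparison follows. It only prints the command lines and does not run configure. The flags are the ones quoted in the thread; the function name configure_args and the "-nodebug" prefix suffix are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: print the configure command lines for two build variants,
# one with --enable-debuginfo (reported to show the error message) and
# one without it (reported to run cleanly).
# configure_args and the "-nodebug" suffix are hypothetical.

configure_args() {
    variant=$1                                   # "debug" or "nodebug"
    prefix=/opt/mvapich2/mvapich2-0.9.8-gcc/ib   # prefix quoted in the thread
    extra=--enable-debuginfo
    if [ "$variant" = nodebug ]; then
        prefix="${prefix}-nodebug"               # keep the two installs apart
        extra=
    fi
    # $extra is left unquoted on purpose so that it vanishes when empty.
    echo "--prefix=$prefix --with-openib=/usr/local/ofed --enable-romio" \
        "--with-file-system=ufs+nfs+panfs --enable-sharedlibs=gcc" \
        $extra "--enable-fast --with-mpe"
}

for v in debug nodebug; do
    echo "./configure $(configure_args "$v")"
done
```

Installing each variant under its own prefix would let the same IMB run be repeated against both libraries, which is one way to check whether the message follows the flag.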
margin-bottom: 0px; margin-left: 0px; ">0px; margin-right: 0px; = > margin-bottom: 0px; margin-left: 0px; ">Internal =3D
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; ">error: communicator is already on free = > list</DIV><DIV style=3D3D"margin-top:=3D
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; ">=A00px; = > margin-right: 0px; margin-bottom: 0px; margin-left: 0px; =3D
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; ">min-height: 14px; "><BR></DIV><DIV = > style=3D3D"margin-top: 0px; =3D
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">margin-right: = > 0px; margin-bottom: 0px; margin-left: 0px; ">What is this =3D
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; ">referring to?</DIV><DIV = > style=3D3D"margin-top: 0px; margin-right: 0px; =3D
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; ">margin-bottom: 0px; margin-left: 0px; min-height: = > 14px; "><BR></DIV><DIV =3D
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; ">style=3D3D"margin-top: 0px; margin-right: 0px; = > margin-bottom: 0px; =3D
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">margin-left: = > 0px; ">Thanks.</DIV><DIV style=3D3D"margin-top: 0px; = > =3D
margin-bottom: 0px; margin-left: 0px; ">margin-right: 0px; = > margin-bottom: 0px; margin-left: 0px; =3D
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; = > ">">--david</DIV><DIV style=3D3D"margin-top: 0px; = > margin-right: 0px; =3D
0px; margin-bottom: 0px; margin-left: 0px; ">margin-bottom: 0px; = > margin-left: 0px; min-height: 14px; "><BR></DIV><DIV = > =3D
margin-bottom: 0px; margin-left: 0px; ">style=3D3D"margin-top: 0px; = > margin-right: 0px; margin-bottom: 0px; =3D
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; = > ">margin-left: 0px; ">--</DIV><DIV style=3D3D"margin-top: = > 0px; margin-right: =3D
0px; margin-bottom: 0px; margin-left: 0px; ">0px; margin-bottom: 0px; = > margin-left: 0px; ">David Gunter</DIV><DIV =3D
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; ">style=3D3D"margin-top: 0px; margin-right: 0px; = > margin-bottom: 0px; =3D
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">margin-left: = > 0px; ">HPC-4: HPC Environments: Parallel Tools =3D
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; ">Team</DIV><DIV style=3D3D"margin-top: 0px; = > margin-right: 0px; =3D
0px; margin-bottom: 0px; margin-left: 0px; ">margin-bottom: 0px; = > margin-left: 0px; ">Los Alamos National =3D
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; ">Laboratory</DIV><DIV style=3D3D"margin-top: = > 0px; margin-right: 0px; =3D
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; = > ">margin-bottom: 0px; margin-left: 0px; min-height: 14px; = > "><BR></DIV><DIV =3D
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; = > ">style=3D3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > =3D
margin-bottom: 0px; margin-left: 0px; ">margin-left: 0px; min-height: = > 14px; "><BR></DIV><DIV style=3D3D"margin-top: = > =3D
margin-bottom: 0px; margin-left: 0px; ">0px; margin-right: 0px; = > margin-bottom: 0px; margin-left: 0px; =3D
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; = > ">min-height: 14px; "><BR></DIV> = > </BLOCKQUOTE><DIV style=3D3D"margin-top: =3D
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; ">0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; =3D
0px; margin-bottom: 0px; margin-left: 0px; ">min-height: 14px; = > "><BR></DIV> =3D
margin-right: 0px; margin-bottom: 0px; margin-left: 0px; = > "></BLOCKQUOTE></DIV><BR></DIV></BODY></H= > TML>=3D
margin-bottom: 0px; margin-left: 0px; min-height: 14px; ">
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; ">--Apple-Mail-1-849288998--
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; min-height: 14px; ">
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; = > ">--=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D1750880957=3D=3D
V style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; ">Content-Type: text/plain; = > charset=3D"us-ascii"
0px; margin-bottom: 0px; margin-left: 0px; ">MIME-Version: 1.0
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; ">Content-Transfer-Encoding: 7bit
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; ">Content-Disposition: inline
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; min-height: 14px; ">
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; = > ">_______________________________________________
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; ">mvapich-discuss mailing list
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; "> href=3D"mailto:mvapich-discuss@cse.ohio-state.edu">mvapich-discuss@cse.ohi= > o-state.edu
IV style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; min-height: 14px; ">
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; = > ">--=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D1750880957=3D=3D--
<= > DIV style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; min-height: 14px; ">
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; min-height: 14px; ">
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; min-height: 14px; ">
0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; = > ">_______________________________________________
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; ">mvapich-discuss mailing list
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; = > margin-left: 0px; "> href=3D"mailto:mvapich-discuss@cse.ohio-state.edu">mvapich-discuss@cse.ohi= > o-state.edu
= >
From christian.guggenberger at rzg.mpg.de Thu May 10 04:32:54 2007
From: christian.guggenberger at rzg.mpg.de (Christian Guggenberger)
Date: Thu May 10 07:57:13 2007
Subject: [mvapich-discuss] MVAPICH2-0.9.8 internal errors
In-Reply-To: <200705092002.l49K20k5000023@xi.cse.ohio-state.edu>
References: <2048E353-6CAC-4B3C-A62A-D769149AE504@lanl.gov> <200705092002.l49K20k5000023@xi.cse.ohio-state.edu>
Message-ID: <20070510083253.GA3678@daltons.rzg.mpg.de>

Hi,

>
> Do you see this problem with MVAPICH2 0.9.8p2 version? We had several
> fixes to the released version during the last months. This latest
> version (0.9.8p2) is available from the mvapich page. This version is
> also included and is being tested with OFED 1.2. We have also done
> testing of this latest version with OFED 1.1.
>
> Please let us know if you still see the error with this latest version
> (p2) on your system and we will investigate this issue further.
>

btw - would it be possible to put some (short) changelog on the
release pages? i.e.

p2 <- p1 <- Main-Release

Or is this easily possible with svn?

thanks a lot,
 - Christian

From huanwei at cse.ohio-state.edu Thu May 10 09:34:05 2007
From: huanwei at cse.ohio-state.edu (wei huang)
Date: Thu May 10 09:34:17 2007
Subject: [mvapich-discuss] MVAPICH2-0.9.8 internal errors
In-Reply-To: <20070510083253.GA3678@daltons.rzg.mpg.de>
Message-ID:

Hi Christian,

You can simply check out our 0.9.8 branch from svn:

https://mvapich.cse.ohio-state.edu/svn/mpi/mvapich2/branches/0.9.8/

Thanks.
Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501

On Thu, 10 May 2007, Christian Guggenberger wrote:

> Hi,
>
> >
> > Do you see this problem with MVAPICH2 0.9.8p2 version? We had several
> > fixes to the released version during the last months. This latest
> > version (0.9.8p2) is available from the mvapich page. This version is
> > also included and is being tested with OFED 1.2. We have also done
> > testing of this latest version with OFED 1.1.
> >
> > Please let us know if you still see the error with this latest version
> > (p2) on your system and we will investigate this issue further.
> >
>
> btw - would it be possible to put some (short) changelog on the
> release pages? i.e.
>
> p2 <- p1 <- Main-Release
>
> Or is this easily possible with svn?
>
> thanks a lot,
>  - Christian
>

From dog at lanl.gov Thu May 10 14:18:35 2007
From: dog at lanl.gov (David Gunter)
Date: Thu May 10 14:22:15 2007
Subject: [mvapich-discuss] MVAPICH2-0.9.8 internal errors
In-Reply-To: <200705092002.l49K20k5000023@xi.cse.ohio-state.edu>
References: <200705092002.l49K20k5000023@xi.cse.ohio-state.edu>
Message-ID: <9B31E954-1544-483F-A674-50CBFA26673D@lanl.gov>

I had a chance to download mvapich2-0.9.8p2 this morning and retest
and yes, the problem still exists.
I am running the IMB 3.0 suite and I see the following type of behavior:

#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 2
# ( 2 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.03         0.03
            4         1000         1.79         1.79         1.79
            8         1000         1.81         1.82         1.81
           16         1000         1.84         1.84         1.84
           32         1000         1.93         1.93         1.93
           64         1000         1.97         1.97         1.97
          128         1000         2.12         2.13         2.13
          256         1000         2.39         2.39         2.39
          512         1000         2.87         2.88         2.87
         1024         1000         3.90         3.90         3.90
         2048         1000         5.97         5.97         5.97
         4096         1000        10.38        10.38        10.38
         8192         1000        19.20        19.20        19.20
        16384         1000        36.35        36.35        36.35
        32768         1000        69.93        69.93        69.93
        65536          640       142.94       142.94       142.94
       131072          320       281.42       281.42       281.42
       262144          160       607.62       607.66       607.64
       524288 8