From kubota at cray.com Tue Sep 1 04:14:56 2009
From: kubota at cray.com (Yutaka Kubota)
Date: Tue Sep 1 04:15:51 2009
Subject: [mvapich-discuss] MV2_USE_BLOCKING option trouble on MVAPICH2 all version
In-Reply-To: <23b6b0910908311117k694c8c48w5d4da0de19f796d0@mail.gmail.com>
References: <3B7D8CBBF8049C4C9746728929189A920EBF2DE3@CFEVS1-IP.americas.cray.com> <23b6b0910908311117k694c8c48w5d4da0de19f796d0@mail.gmail.com>
Message-ID: <3B7D8CBBF8049C4C9746728929189A920EC9E2F0@CFEVS1-IP.americas.cray.com>

Hi Sreeram,

We were able to resolve this issue using the "MV2_ON_DEMAND_THRESHOLD=128" option.
Thank you very much. Please close this question.

Best regards
Yutaka Kubota

From: sreeram potluri [mailto:sreeram.chowdary@gmail.com]
Sent: Tuesday, September 01, 2009 3:17 AM
To: Yutaka Kubota
Cc: mvapich-discuss@cse.ohio-state.edu
Subject: Re: [mvapich-discuss] MV2_USE_BLOCKING option trouble on MVAPICH2 all version

Hi Yutaka,

Regarding the issue you are seeing with BLOCKING mode: beyond 64 processes, mvapich2 uses the on-demand mode of connection setup, and the combination of on-demand + blocking is not supported yet. We plan to add this in our next release.

As a workaround, you can set MV2_ON_DEMAND_THRESHOLD=128 at runtime. This will turn off on-demand connection management for up to 128 processes, and blocking mode should work fine. To run with a larger number of processes, you can set MV2_ON_DEMAND_THRESHOLD accordingly.

We are still looking into the compile-time issue you are seeing. It would be helpful if you could forward us the source code.

Please let us know if you face any other issues.

Thanks
Sreeram Potluri

On Mon, Aug 31, 2009 at 12:59 AM, Yutaka Kubota wrote:

Dear MVAPICH2 Members,

This is Yutaka Kubota from Cray Japan. This is the first time I am reporting an MVAPICH2 issue.

We found a problem with blocking mode using cpi.c, the MVAPICH2 sample program. The problem appears when the "MV2_USE_BLOCKING=1" option is set and 65 or more cores are specified.
When the program is run with this option, the message "Created comp channel 0x1ff6ce10" appears and the program stops partway. We think this is an MVAPICH2 issue, because the problem does not appear on MVAPICH with the "VIADEV_USE_BLOCKING=1" option and 65 or more cores.

$ mpirun_rsh -np 65 -hostfile hostfile.txt MV2_USE_BLOCKING=0 ./a.out
Created comp channel 0x1ff6ce10

* With "-np 64" this message does not appear.

I also found a compile issue with the latest MVAPICH2 versions. I first suspected that this issue was specific to version 1.2-10-01, so we also tried compiling with the latest versions. The following message appears during compilation.

- 1.2p1 and 1.4rc1
/opt/intel/cce/10.1.015/lib/libimf.so: warning: warning: feupdateenv is not implemented and will always fail
* However, a.out was still produced.

If you need the source code, I will send it to you. The user has consented to our sending the user program to the developer team.

Best regards
Yutaka Kubota, Cray Japan Inc

_______________________________________________
mvapich-discuss mailing list
mvapich-discuss@cse.ohio-state.edu
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

From doriankrause at web.de Tue Sep 1 14:41:40 2009
From: doriankrause at web.de (Dorian Krause)
Date: Tue Sep 1 14:42:28 2009
Subject: [mvapich-discuss] Hang in MPI_Isend/MPI_Recv combination
In-Reply-To:
References:
Message-ID: <4A9D6AE4.50200@web.de>

Hi,

thanks. Unfortunately, we have a downtime due to server-room maintenance. I will test asap ...

Dorian

Dhabaleswar Panda wrote:
> Hi,
>
> We have made MVAPICH2 1.4RC2 release today. We have run your application
> with this version and it seems to be working fine. Can you double-check
> your application with this version.
>
> Thanks,
>
> DK
>
> On Thu, 13 Aug 2009, Dorian Krause wrote:
>
>> Hi,
>>
>> again these 96 processors ...
>>
>> My application hangs in a communication step which looks like this:
>>
>> ---------
>> Group A:
>>
>> for all neighbors {
>>     MPI_Isend(...);
>> }
>> MPI_Waitall(...);
>>
>> MPI_Barrier();
>> ----
>> Group B:
>>
>> while(#messages to receive > 0) {
>>     MPI_Probe(MPI_ANY_SOURCE, &stat);
>>     q = stat.MPI_SOURCE
>>     /* in subfunction: */
>>     MPI_Probe(q, &stat)
>>     q = stat.MPI_COUNT;
>>     MPI_Recv(q, ...);
>> }
>> MPI_Barrier();
>> ----
>>
>> for more 96 processes this application hangs. Since I can't debug on
>> this scale, I used gdb to get backtraces. It turned out that 94
>> processes are waiting in the barrier, one processor is trying to receive
>> a message (stuck in MPI_Recv) and one other is waiting in
>> MPI_Waitall(...). This looks fine, however the ranks do not match:
>>
>> On the PE with rank 83, I have
>>
>> #3 0x00000000004349b9 in PMPI_Recv (buf=0x1bd96010, count=202,
>> datatype=-1946157051, source=40, tag=374, comm=-1006632954, status=0x1)
>> at recv.c:156
>>
>> and on PE with rank *12* I have
>>
>> #3 0x00000000004368f4 in PMPI_Waitall (count=8,
>> array_of_requests=0x197e6b10, array_of_statuses=0x1)
>> at waitall.c:191
>>
>> It seems that rank 40 slipped through the MPI_Waitall even though it was
>> not supposed to do so ...
>>
>> Please find attached the output files. There are three processes which
>> seem to be not in the barrier (2 on compute-0-3 and 1 on compute-0-13
>> but the one with the short backtrace on compute-0-3 is also in the
>> barrier as I could confirm by hand).
>>
>> Any hints what might cause this error?
>>
>> I'm using the trunk version of mvapich2 (checked out yesterday) and the
>> cluster consists of 14 LS22 blades (opteron) with 4x DDR Infiniband. I'm
>> not quite sure which ofed version it is (it is delivered with the rocks
>> distribution and they are typically not very verbose concerning version
>> numbers ...).
>>
>> Thanks for your help,
>> Dorian

From dog at lanl.gov Wed Sep 9 16:06:27 2009
From: dog at lanl.gov (David Gunter)
Date: Wed Sep 9 23:32:55 2009
Subject: [mvapich-discuss] Working link for download
Message-ID:

Can someone provide a working URL from which the latest MVAPICH2 RC
may be downloaded?

Thanks.
-david
--
David Gunter
HPC-3: Parallel Tools Team
Los Alamos National Laboratory

From panda at cse.ohio-state.edu Thu Sep 10 07:47:37 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Thu Sep 10 07:48:07 2009
Subject: [mvapich-discuss] Working link for download
In-Reply-To:
Message-ID:

David - Our department network went through maintenance yesterday and all systems were down. Similar things happened last week for two days also. Now everything should be operational. You should be able to download RC2 from the mvapich site (http://mvapich.cse.ohio-state.edu).

Thanks,

DK

From Craig.Tierney at noaa.gov Mon Sep 14 15:46:46 2009
From: Craig.Tierney at noaa.gov (Craig Tierney)
Date: Mon Sep 14 15:47:19 2009
Subject: [mvapich-discuss] Unable to build mvapich2 1.2p1 for use with totalview
In-Reply-To:
References:
Message-ID: <4AAE9DA6.7070908@noaa.gov>

I am trying to build Mvapich2 1.2p1 so that I can use Totalview.
The docs say that I am supposed to add the following options:

--enable-g=dbg --enable-sharedlibs=gcc --enable-debuginfo

My complete configure line is:

./configure CC="icc -L/usr/lib64 -L/lib64" CXX="icpc -L/usr/lib64 -L/lib64" F77="ifort -L/usr/lib64 -L/lib64" FC="ifort -L/usr/lib64 -L/lib64" F90="ifort -L/usr/lib64 -L/lib64" \
    --with-ib-libpath=/usr/lib64 \
    --with-ib-include=/usr/include \
    --prefix=/opt/hjet/mvapich2/1.2p1-intel \
    --enable-romio=yes --with-file-system=lustre \
    --with-pm=remshell \
    --enable-g=dbg --enable-sharedlibs=gcc --enable-debuginfo \
    --enable-threads=multiple

The problem is that when it tries to link the system tools, it fails.
For example, when it tries to link mpiexec, I get:

icc -L/usr/lib64 -L/lib64 -g -L/usr/lib64 -static -o mpiexec mpiexec.o -L../util \
    -lmpiexec -L../../../lib -L/home/admin/hjet/software/opt/mvapich/mvapich2-1.2p1/lib -lmpich -lpthread -lrdmacm -libverbs -libumad -lrt
../util/libmpiexec.a(pmiport.o): In function `MPIE_GetMyHostName':
/misc/whome/admin/hjet/software/opt/mvapich/mvapich2-1.2p1/src/pm/util/pmiport.c:200: warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/lib64/libc.a(malloc.o): In function `__malloc_check_init':
(.text+0xb00): multiple definition of `__malloc_check_init'
../../../lib/libmpich.a(mvapich_malloc.o):/misc/whome/admin/hjet/software/opt/mvapich/mvapich2-1.2p1/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/hooks.c:83: first defined here
ld: Warning: size of symbol `__malloc_check_init' changed from 122 in ../../../lib/libmpich.a(mvapich_malloc.o) to 105 in /usr/lib64/libc.a(malloc.o)
/usr/lib64/libc.a(malloc.o): In function `_int_free':
(.text+0x21f0): multiple definition of `_int_free'
../../../lib/libmpich.a(mvapich_malloc.o):/misc/whome/admin/hjet/software/opt/mvapich/mvapich2-1.2p1/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/mvapich_malloc.c:4307: first defined here
ld: Warning: size of symbol `_int_free' changed from 778 in ../../../lib/libmpich.a(mvapich_malloc.o) to 2413 in /usr/lib64/libc.a(malloc.o)
/usr/lib64/libc.a(malloc.o): In function `_int_malloc':
(.text+0x2b60): multiple definition of `_int_malloc'

........

And problems with IB:

(.text+0xbb): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/lib64/libibverbs.a(src_libibverbs_la-verbs.o): In function `ibv_create_comp_channel':
(.text+0x9b6): undefined reference to `pthread_mutex_trylock'
/usr/lib64/libibumad.a(libibumad_la-umad.o): In function `umad_get_fd':
(.text+0xdc): undefined reference to `ibwarn'
/usr/lib64/libibumad.a(libibumad_la-umad.o): In function `umad_done':
(.text+0x10d): undefined reference to `ibwarn'
/usr/lib64/libibumad.a(libibumad_la-umad.o): In function `umad_addr_dump':

This doesn't happen when the binary is linked dynamically (remove -static).

Am I missing an option for getting everything built cleanly?

Thanks,
Craig

--
Craig Tierney (craig.tierney@noaa.gov)

From perkinjo at cse.ohio-state.edu Thu Sep 17 13:13:54 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Thu Sep 17 13:14:35 2009
Subject: [mvapich-discuss] Unable to build mvapich2 1.2p1 for use with totalview
In-Reply-To: <4AAE9DA6.7070908@noaa.gov>
References: <4AAE9DA6.7070908@noaa.gov>
Message-ID: <20090917171354.GJ2695@cse.ohio-state.edu>

Hi Craig. We haven't seen this type of issue come up before. We'll take a look to see if we can reproduce this issue.

In the meantime, can you try building after removing the -L/usr/lib64 and -L/lib64 options from your compiler variables? If you really need these options you should add them to the CFLAGS variable.

Can you also try leaving the --with-pm option unset (this allows for mpd and mpirun_rsh)? We perform the majority of our testing with mpirun_rsh as it scales and performs better than the other pm options.
Let us know if any of these actions allows your build to proceed successfully.

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From kubota at cray.com Fri Sep 18 01:15:18 2009
From: kubota at cray.com (Yutaka Kubota)
Date: Fri Sep 18 01:16:00 2009
Subject: [mvapich-discuss] One of the program always stop on the way using MVAPICH2
Message-ID: <3B7D8CBBF8049C4C9746728929189A920F11F692@CFEVS1-IP.americas.cray.com>

Dear MVAPICH2 Discuss Member,

This is Yutaka Kubota from Cray Japan.

We have found another problem with MVAPICH2. Could you investigate it?

One of our user's programs, which uses MVAPICH2 and ScaLAPACK, always stops partway through. With MVAPICH, however, the same program does not stop and runs to completion. So I guess the problem lies only in MVAPICH2, not in ScaLAPACK or the user program. I also compiled the user program against several MVAPICH2 versions (1.2p1, 1.4rc1, 1.4rc2 and the 9/15/2009 trunk version). The program stops at a different point depending on the version, and core files are produced with some versions but not others.

I already have the user's agreement to send you his program. I will send the download URL only to the person in charge, once I receive an email from that person.
------------<< our investigation follows >>--------------
- mvapich2_1.4rc2 trunk 9/15/2009 version
  Stop point: line number 3637
  Core files: output

- mvapich2_1.4rc2
  Stop point: line number 3704
  Core files: output

- mvapich2_1.4rc1
  Stop point: line number 3558
  Core files: none

- mvapich2_1.2p1
  Stop point: line number 3558
  Core files: none

- mvapich2_1.2-10-01
  Stop point: line number 3637
  Core files: output

From panda at cse.ohio-state.edu Fri Sep 18 08:20:08 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri Sep 18 08:20:39 2009
Subject: [mvapich-discuss] One of the program always stop on the way using MVAPICH2
In-Reply-To: <3B7D8CBBF8049C4C9746728929189A920F11F692@CFEVS1-IP.americas.cray.com>
Message-ID:

Yutaka - Thanks for this report. Please send us the user's program and we will take a look at it.

Thanks,

DK

From bfp at purdue.edu Fri Sep 18 10:01:51 2009
From: bfp at purdue.edu (Bryan Putnam)
Date: Fri Sep 18 10:02:28 2009
Subject: [mvapich-discuss] mvapich2 and Intel MPI Benchmarks
In-Reply-To:
References:
Message-ID:

Hi All,

I just wondered if you'd had a chance to try out the Intel MPI Benchmarks with mvapich2. IMB-3.2 is what I'm currently running.

We've had no problems with the latest mpich2-1.1.1p1, however when using mvapich2-1.4rc2 or mvapich2-1.4rc1 (on both our IB and iWARP clusters) the benchmarks are failing, for example even a simple

mpiexec -np 8 ./IMB-MPI1

will result in errors such as

rank 6 in job 1 coates-a279.rcac.purdue.edu_51100 caused collective abort of all ranks
  exit status of rank 6: killed by signal 9
rank 5 in job 1 coates-a279.rcac.purdue.edu_51100 caused collective abort of all ranks
  exit status of rank 5: killed by signal 11
rank 4 in job 1 coates-a279.rcac.purdue.edu_51100 caused collective abort of all ranks
  exit status of rank 4: killed by signal 11
rank 3 in job 1 coates-a279.rcac.purdue.edu_51100 caused collective abort of all ranks
  exit status of rank 3: killed by signal 11
rank 2 in job 1 coates-a279.rcac.purdue.edu_51100 caused collective abort of all ranks
  exit status of rank 2: kille

Thanks,
Bryan

--
Bryan Putnam
Rosen Center for Advanced Computing, Purdue University
Young Hall (Rm.
519)
302 Wood Street
West Lafayette, IN 47907-2108
Ph 765-496-8225 Fax 765-494-0566
bfp@purdue.edu
http://www.rcac.purdue.edu

From bfp at purdue.edu Sat Sep 19 00:00:04 2009
From: bfp at purdue.edu (Bryan Putnam)
Date: Sat Sep 19 00:00:41 2009
Subject: [mvapich-discuss] Re: mvapich2 and Intel MPI Benchmarks
In-Reply-To:
References:
Message-ID:

On Fri, 18 Sep 2009, Bryan Putnam wrote:

> Hi All,
>
> I just wondered if you'd had a chance to try out the Intel MPI Benchmarks
> with mvapich2. IMB-3.2 is what I'm currently running.
>
> We've had no problems with the latest mpich2-1.1.1p1, however when using
> mvapich2-1.4rc2 or mvapich2-1.4rc1 (on both our IB and iWARP clusters)
> the benchmarks are failing, for example even a simple
>
> mpiexec -np 8 ./IMB-MPI1
>
> will result in errors such as

I've discovered that these errors don't occur if each of the 8 processors is on a separate node. In fact if I disable "shared memory collectives" by setting the variable

MV2_USE_SHMEM_COLL=0

then the Intel MPI Benchmark seems to be happy even when the processors are on a single node, and things work fine on both the IB and iWARP clusters.

Bryan

From panda at cse.ohio-state.edu Sat Sep 19 07:28:44 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Sat Sep 19 07:29:16 2009
Subject: [mvapich-discuss] Re: mvapich2 and Intel MPI Benchmarks
In-Reply-To:
Message-ID:

Bryan,

Thanks for your postings. We have been running IMB 3.2 without any problem. When configuring the mvapich2 library, are you disabling shared memory communication?

Are you seeing this problem with `mpiexec' only, or with `mpirun_rsh' also? I would suggest that you start using the `mpirun_rsh' framework for scalable job-launching.

Thanks,

DK

From Craig.Tierney at noaa.gov Mon Sep 21 16:27:02 2009
From: Craig.Tierney at noaa.gov (Craig Tierney)
Date: Mon Sep 21 16:27:37 2009
Subject: [mvapich-discuss] Buffer size of unexpected receive queue, is there a way to increase the buffer?
In-Reply-To: <20090917171354.GJ2695@cse.ohio-state.edu>
References: <4AAE9DA6.7070908@noaa.gov> <20090917171354.GJ2695@cse.ohio-state.edu>
Message-ID: <4AB7E196.8020705@noaa.gov>

I am trying to figure out a way to increase the unexpected receive queue, if it is even possible or necessary.
I have code that looks a bit like:

for i=0,npes-1
    msgsize=Some function to determine how much data (even 0) should be sent to another PE
    mpi_send(send to process i, msgsize)
    if (msgsize > 0)
        mpi_send(send to process i, data of msgsize bytes)
done

for i=0,npes-1
    mpi_recv(receive from process i, msgsize)
    if (msgsize > 0)
        mpi_recv(receive from process i, data of msgsize bytes)
done

On my job with 864 cores, the code is blocking in here. I believe that, since the receives are not posted, I am most likely hitting some internal buffer limits. This code works on other systems, but they have had to tweak internal buffer sizes to get it to work.

I cannot change the code in any way. So I am looking for how the size of the unexpected receive queue is set and if/how it can be changed.

Thanks,
Craig

--
Craig Tierney (craig.tierney@noaa.gov)

From panda at cse.ohio-state.edu Mon Sep 21 18:03:45 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon Sep 21 18:04:18 2009
Subject: [mvapich-discuss] Buffer size of unexpected receive queue, is there a way to increase the buffer?
In-Reply-To: <4AB7E196.8020705@noaa.gov>
Message-ID:

Craig - Thanks for your query. We are examining this situation and will send some suggestions soon.

Thanks,

DK

From balaji at mcs.anl.gov Wed Sep 23 06:19:08 2009
From: balaji at mcs.anl.gov (balaji@mcs.anl.gov)
Date: Wed Sep 23 06:19:45 2009
Subject: [mvapich-discuss] Buffer size of unexpected receive queue, is there a way to increase the buffer?
In-Reply-To: <19407440.888551253701143990.JavaMail.root@zimbra>
Message-ID: <706935.888571253701148347.JavaMail.root@zimbra>

Just as an FYI -- the below code creates zero unexpected messages. Everything is expected. And it's a perfectly synchronous application, so there's not much pressure on the internal buffers either (though the performance might not be as great).

-- Pavan

----- "Craig Tierney" wrote:

> I am trying to figure out a way to increase the
> unexpected receive queue, if it is even possible
> or necessary.
>
> I have code that looks a bit like:
>
> for i=0,npes-1
>     msgsize=Some function to determine how much data (even 0) should be
>     sent to another PE
>     mpi_send(send to process i, msgsize)
>     if (msgsize > 0)
>         mpi_send(send to process i, data of msgsize bytes)
> done
>
> for i=0,npes-1
>     mpi_recv(receive from process i, msgsize)
>     if (msgsize > 0)
>         mpi_recv(receive from process i, data of msgsize bytes)
> done
>
> On my job with 864 cores, the code is blocking in here.
> I believe that since the receives are not posted, that I
> am most likely hitting some internal buffer limits. This
> code works on other systems, but they have had to tweak
> internal buffer sizes to get it to work.
>
> I cannot change the code in any way. So I am looking
> for how the size of the unexpected receive queue is set
> and if/how it can be changed.
>
> Thanks,
> Craig

From balaji at mcs.anl.gov Wed Sep 23 06:21:30 2009
From: balaji at mcs.anl.gov (Pavan Balaji)
Date: Wed Sep 23 06:22:07 2009
Subject: [mvapich-discuss] Buffer size of unexpected receive queue, is there a way to increase the buffer?
In-Reply-To: <4AB7E196.8020705@noaa.gov>
Message-ID: <9251517.888611253701290037.JavaMail.root@zimbra>

I'm sorry, I'm jet-lagged. I didn't notice that the source rank is changing in the receives. So yes, you would have unexpected messages. Back to bed!

-- Pavan

From panda at cse.ohio-state.edu Wed Sep 23 08:54:27 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Wed Sep 23 08:55:00 2009
Subject: [mvapich-discuss] Reply to "Global Array Programing Issue on the MVAPICH2"
Message-ID:

This reported issue has been resolved in the latest trunk version of MVAPICH2 1.4. The fix will be a part of the MVAPICH2 1.4 final release.

Yutaka - Thanks for reporting the issue for your application and verifying the fix. I am cc'ing this note to mvapich-discuss for everybody's information about this fix.

Thanks,

DK

>I confirm this problem was resolved on the 1.4rc2 trunk version. I told
>the programmer that this problem was resolved and that we confirmed
>it.
>
>Best regards
>
>Yutaka Kubota

From srb at osc.edu Wed Sep 23 19:07:26 2009
From: srb at osc.edu (Scott Brozell)
Date: Wed Sep 23 21:41:57 2009
Subject: [mvapich-discuss] thread safety
Message-ID: <20090923230726.GD21013@phaze.osc.edu>

Hi,

Which versions of mvapich are thread safe?

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc2.html
"1.4 provides support and optimizations for other MPI-2 features, multi-threading"
Does that mean thread safe?

MPICH2 supports thread safety:
www.mcs.anl.gov/~thakur/papers/mpi-threads.pdf
"In the next release, 1.0.4, the default build of the ch3:sock channel will support thread safety, but thread safety will be enabled only if the user calls MPI_Init_thread with MPI_THREAD_MULTIPLE."
Does mvapich use the same thread safety mechanism?

thanks,
Scott

Scott Brozell, Ph.D.
Senior Systems Developer/Engineer
Science and Technology Support
Ohio Supercomputer Center

From panda at cse.ohio-state.edu Wed Sep 23 22:27:57 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Wed Sep 23 22:28:30 2009
Subject: [mvapich-discuss] thread safety
In-Reply-To: <20090923230726.GD21013@phaze.osc.edu>
Message-ID:

> Hi,
>
> Which versions of mvapich are thread safe?
>
> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc2.html
> "1.4 provides support and optimizations for other MPI-2 features, multi-threading"
> Does that mean thread safe?

Yes. MVAPICH2 1.4 is based on MPICH2 1.0.8p1. You should use the
MPI_THREAD_MULTIPLE feature.

Thanks,

DK

> MPICH2 supports thread safety:
> www.mcs.anl.gov/~thakur/papers/mpi-threads.pdf
> "In the next release, 1.0.4, the default build of the ch3:sock channel will support thread safety,
> but thread safety will be enabled only if the user calls MPI_Init_thread with
> MPI_THREAD_MULTIPLE."
> Does mvapich use the same thread safety mechanism?
>
> thanks,
> Scott

From kubota at cray.com Thu Sep 24 03:42:13 2009
From: kubota at cray.com (Yutaka Kubota)
Date: Thu Sep 24 03:42:54 2009
Subject: [mvapich-discuss] When will relase MVAPICH2 1.4 official version?
Message-ID: <3B7D8CBBF8049C4C9746728929189A920F241A48@CFEVS1-IP.americas.cray.com>

Dear MVAPICH2 discussion mailing list,

This is Yutaka Kubota from Cray Japan.

I would like to know when the official MVAPICH2 1.4 version will be
released. If no date has been decided yet, we will try to use the RC2 or
RC3 version. We would just like to know whether a release is planned.

Best regards

Yutaka Kubota

From wangm9 at cardiff.ac.uk Thu Sep 24 11:14:08 2009
From: wangm9 at cardiff.ac.uk (Manhui Wang)
Date: Thu Sep 24 11:14:45 2009
Subject: [mvapich-discuss] fail to link with mvapich2-1.2p1/mvapich2-1.4rc2
In-Reply-To: <3B7D8CBBF8049C4C9746728929189A920F241A48@CFEVS1-IP.americas.cray.com>
References: <3B7D8CBBF8049C4C9746728929189A920F241A48@CFEVS1-IP.americas.cray.com>
Message-ID: <4ABB8CC0.3010301@cardiff.ac.uk>

Dear MVAPICH2 developers,

I tried to build Molpro (see http://www.molpro.net/) with the MVAPICH2
1.2p1 library (as well as mvapich2-1.4rc2), but linking failed. The
reason is that both Molpro and MVAPICH2 use the same generic function
names. The source code of the MVAPICH2 library (tokens.c, parser.c)
contains some yacc-like parsing code. Its three functions (yylex,
yyparse, yyerror) happen to have the same names as those in the parser
files of Molpro. I renamed these three functions with the prefix
mvapich_ in the MVAPICH2 source code and recompiled the mvapich2 library.
Now it works fine. Could you please change these names in the next
release to avoid potential conflicts with other applications? Of
course, Molpro could also choose more specific function names to
avoid such problems.
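
The renaming described above can be sketched in a few lines of C. This
is only a minimal illustration under stated assumptions, not the actual
change made to tokens.c and parser.c: the mvapich_ prefix comes from this
message, while the use of preprocessor macros, the function bodies, and
the helper check_no_collision() are hypothetical and exist only to show
that two parsers with the same generic entry points can then coexist.

```c
#include <assert.h>
#include <stdio.h>

/* ---- Library side: hypothetical stand-in for tokens.c / parser.c ---- */
/* Map the generic yacc names to prefixed ones before the parser code is
 * compiled. Using macros for this is an assumption of this sketch. */
#define yylex   mvapich_yylex
#define yyparse mvapich_yyparse
#define yyerror mvapich_yyerror

int yylex(void) { return 0; }              /* compiled as mvapich_yylex */
void yyerror(const char *msg) { fprintf(stderr, "library parser: %s\n", msg); }
int yyparse(void) { return yylex() + 1; }  /* calls mvapich_yylex, returns 1 */

#undef yylex
#undef yyparse
#undef yyerror

/* ---- Application side: stand-in for a program with its own parser ---- */
int yylex(void) { return 10; }
void yyerror(const char *msg) { fprintf(stderr, "application parser: %s\n", msg); }
int yyparse(void) { return yylex() + 1; }  /* calls the application's yylex, returns 11 */

/* Both sets of symbols now exist side by side without a
 * "multiple definition" error at link time. */
void check_no_collision(void)
{
    assert(mvapich_yyparse() == 1);
    assert(yyparse() == 11);
}
```

Calling check_no_collision() from a main() exercises both parsers in one
program. Where the parser source is generated by yacc or bison, the
generator's own symbol-prefix option (-p) may be a cleaner way to get the
same effect; whether the MVAPICH2 build can use it is not something this
thread establishes.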

The following is the error message:

/home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(parser.o): In function `yylex':
parser.c:(.text+0xbfc): multiple definition of `yylex'
../lib/libmolpro.a(licence.o):parse.c:(.text+0x9ccf): first defined here
ld: Warning: size of symbol `yylex' changed from 3091 in ../lib/libmolpro.a(licence.o) to 3336 in /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(parser.o)
/home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o): In function `yyparse':
tokens.c:(.text+0x0): multiple definition of `yyparse'
../lib/libmolpro.a(licence.o):parse.c:(.text+0xdb): first defined here
ld: Warning: size of symbol `yyparse' changed from 22803 in ../lib/libmolpro.a(licence.o) to 3110 in /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o)
/home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o): In function `yyerror':
tokens.c:(.text+0x1e5c): multiple definition of `yyerror'
../lib/libmolpro.a(licence.o):parse.c:(.text+0x5edd): first defined here
ld: Warning: size of symbol `yyerror' changed from 934 in ../lib/libmolpro.a(licence.o) to 26 in /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o)
failure
/home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(parser.o): In function `yylex':
parser.c:(.text+0xbfc): multiple definition of `yylex'
../lib/libmolpro.a(licence.o):parse.c:(.text+0x9ccf): first defined here
ld: Warning: size of symbol `yylex' changed from 3091 in ../lib/libmolpro.a(licence.o) to 3336 in /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(parser.o)
/home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o): In function `yyparse':
tokens.c:(.text+0x0): multiple definition of `yyparse'
../lib/libmolpro.a(licence.o):parse.c:(.text+0xdb): first defined here
ld: Warning: size of symbol `yyparse' changed from 22803 in ../lib/libmolpro.a(licence.o) to 3110 in /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o)
/home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o): In function `yyerror':
tokens.c:(.text+0x1e5c): multiple definition of `yyerror'
../lib/libmolpro.a(licence.o):parse.c:(.text+0x5edd): first defined here

Thanks
Manhui

--
-----------
Manhui Wang
School of Chemistry, Cardiff University,
Main Building, Park Place,
Cardiff CF10 3AT, UK
Telephone: +44 (0)29208 76637

From perkinjo at cse.ohio-state.edu Thu Sep 24 12:56:11 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Thu Sep 24 12:56:46 2009
Subject: [mvapich-discuss] fail to link with mvapich2-1.2p1/mvapich2-1.4rc2
In-Reply-To: <4ABB8CC0.3010301@cardiff.ac.uk>
References: <3B7D8CBBF8049C4C9746728929189A920F241A48@CFEVS1-IP.americas.cray.com> <4ABB8CC0.3010301@cardiff.ac.uk>
Message-ID: <20090924165611.GH2346@cse.ohio-state.edu>

On Thu, Sep 24, 2009 at 04:14:08PM +0100, Manhui Wang wrote:
> Dear MVAPICH2 developers,
> I tried to build Molpro (see http://www.molpro.net/) with the MVAPICH2
> 1.2p1 library (as well as mvapich2-1.4rc2), but linking failed. The
> reason is that both Molpro and MVAPICH2 use the same generic function
> names. The source code of the MVAPICH2 library (tokens.c, parser.c)
> contains some yacc-like parsing code. Its three functions (yylex,
> yyparse, yyerror) happen to have the same names as those in the parser
> files of Molpro. I renamed these three functions with the prefix
> mvapich_ in the MVAPICH2 source code and recompiled the mvapich2 library.
> Now it works fine. Could you please change these names in the next
> release to avoid potential conflicts with other applications? Of
> course, Molpro could also choose more specific function names to
> avoid such problems.

Thank you for the suggestion. We'll take a look at this and have it
resolved before our final release.

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From perkinjo at cse.ohio-state.edu Thu Sep 24 12:58:03 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Thu Sep 24 12:58:37 2009
Subject: [mvapich-discuss] When will relase MVAPICH2 1.4 official version?
In-Reply-To: <3B7D8CBBF8049C4C9746728929189A920F241A48@CFEVS1-IP.americas.cray.com>
References: <3B7D8CBBF8049C4C9746728929189A920F241A48@CFEVS1-IP.americas.cray.com>
Message-ID: <20090924165803.GI2346@cse.ohio-state.edu>

On Thu, Sep 24, 2009 at 02:42:13AM -0500, Yutaka Kubota wrote:
> Dear MVAPICH2 discussion mailing list,
>
> This is Yutaka Kubota from Cray Japan.
>
> I would like to know when the official MVAPICH2 1.4 version will be
> released. If no date has been decided yet, we will try to use the RC2 or
> RC3 version. We would just like to know whether a release is planned.

There will be an official version, but we do not have a set date for it.
I suggest using either rc2 or our trunk version in the meantime, as they
may contain bug fixes or performance enhancements not seen in the
earlier releases.

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From srb at osc.edu Wed Sep 23 22:44:15 2009
From: srb at osc.edu (Scott Brozell)
Date: Thu Sep 24 14:42:35 2009
Subject: [mvapich-discuss] thread safety
In-Reply-To:
References: <20090923230726.GD21013@phaze.osc.edu>
Message-ID: <20090924024414.GF21013@phaze.osc.edu>

Hi,

How about MVAPICH2 1.2?

thanks,
Scott

On Wed, Sep 23, 2009 at 10:27:57PM -0400, Dhabaleswar Panda wrote:
> > Which versions of mvapich are thread safe?
> >
> > http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc2.html
> > "1.4 provides support and optimizations for other MPI-2 features, multi-threading"
> > Does that mean thread safe?
>
> Yes. MVAPICH2 1.4 is based on MPICH2 1.0.8p1. You should use the
> MPI_THREAD_MULTIPLE feature.

From perkinjo at cse.ohio-state.edu Thu Sep 24 15:39:09 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Thu Sep 24 15:39:43 2009
Subject: [mvapich-discuss] thread safety
In-Reply-To: <20090924024414.GF21013@phaze.osc.edu>
References: <20090923230726.GD21013@phaze.osc.edu> <20090924024414.GF21013@phaze.osc.edu>
Message-ID: <20090924193909.GN2346@cse.ohio-state.edu>

On Wed, Sep 23, 2009 at 10:44:15PM -0400, Scott Brozell wrote:
> Hi,
>
> How about MVAPICH2 1.2?

This also allows MPI_THREAD_MULTIPLE. I'd have to check to see when the
ability was first introduced, but it has been available for some time now.

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From wangm9 at cardiff.ac.uk Tue Sep 29 06:22:22 2009
From: wangm9 at cardiff.ac.uk (Manhui Wang)
Date: Tue Sep 29 06:23:14 2009
Subject: [mvapich-discuss] fail to link with mvapich2-1.2p1/mvapich2-1.4rc2
In-Reply-To: <20090924165611.GH2346@cse.ohio-state.edu>
References: <3B7D8CBBF8049C4C9746728929189A920F241A48@CFEVS1-IP.americas.cray.com> <4ABB8CC0.3010301@cardiff.ac.uk> <20090924165611.GH2346@cse.ohio-state.edu>
Message-ID: <4AC1DFDE.7080106@cardiff.ac.uk>

Dear MVAPICH2 developers,

Following up on my last mail, I tried to see whether the change could be
made in Molpro instead. That seems to be impossible, since Molpro
directly calls these functions, which come from the yacc or bison
libraries. This means the names conflict with those in the yacc or bison
libraries. I have made two small patches (see the attached files) for
MVAPICH2; with these patches Molpro works fine with MVAPICH2. I would
appreciate it if you could include these changes (you will probably
consider further changes to avoid conflicts with other programs, but
this works for Molpro at least) in the MVAPICH2 development version at
your earliest convenience.
So I can test the updated mvapich2 before the final release. Thank you very much. Manhui Jonathan Perkins wrote: > On Thu, Sep 24, 2009 at 04:14:08PM +0100, Manhui Wang wrote: >> Dear MVAPICH2 developers, >> I tried to build Molpro (see http://www.molpro.net/) with MVAPICH2 >> 1.2p1 (as well as mvapich2-1.4rc2) library, but failed. The reason is >> that both Molpro and Mvapich use some general function names. The >> source code of mvapich library (tokens.c parser.c) contains >> some yacc-like parsing code. These three >> functions(yylex yyparse yyerror) happen to have the same names as those >> in parse files of Molpro. I renamed these three function to those with a >> prefix mvapich_* in MVAPICH source code, and recompiled the mvapich2 >> library. Now it works fine. Could you please slightly change these >> names to avoid potential conflict with other application program in next >> release version? Surely, Molpro could also choose other specific >> function names to avoid such problems. > > Thank you for the suggestion. We'll take a look at this and have it > resolved before our final release. 
> >> The following is the error message: >> >> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(parser.o): In >> function `yylex': >> parser.c:(.text+0xbfc): multiple definition of `yylex' >> ../lib/libmolpro.a(licence.o):parse.c:(.text+0x9ccf): first defined here >> ld: Warning: size of symbol `yylex' changed from 3091 in >> ../lib/libmolpro.a(licence.o) to 3336 in >> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(parser.o) >> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o): In >> function `yyparse': >> tokens.c:(.text+0x0): multiple definition of `yyparse' >> ../lib/libmolpro.a(licence.o):parse.c:(.text+0xdb): first defined here >> ld: Warning: size of symbol `yyparse' changed from 22803 in >> ../lib/libmolpro.a(licence.o) to 3110 in >> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o) >> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o): In >> function `yyerror': >> tokens.c:(.text+0x1e5c): multiple definition of `yyerror' >> ../lib/libmolpro.a(licence.o):parse.c:(.text+0x5edd): first defined here >> ld: Warning: size of symbol `yyerror' changed from 934 in >> ../lib/libmolpro.a(licence.o) to 26 in >> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o) >> failure >> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(parser.o): In >> function `yylex': >> parser.c:(.text+0xbfc): multiple definition of `yylex' >> ../lib/libmolpro.a(licence.o):parse.c:(.text+0x9ccf): first defined here >> ld: Warning: size of symbol `yylex' changed from 3091 in >> ../lib/libmolpro.a(licence.o) to 3336 in >> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(parser.o) >> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o): In >> function `yyparse': >> tokens.c:(.text+0x0): multiple definition of `yyparse' >> ../lib/libmolpro.a(licence.o):parse.c:(.text+0xdb): first defined here >> ld: Warning: size of symbol `yyparse' changed from 22803 in >> ../lib/libmolpro.a(licence.o) to 3110 
in >> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o) >> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o): In >> function `yyerror': >> tokens.c:(.text+0x1e5c): multiple definition of `yyerror' >> ../lib/libmolpro.a(licence.o):parse.c:(.text+0x5edd): first defined here >> >> Thanks >> Manhui >> >> Yutaka Kubota wrote: >>> Dear MVAPICH2 discussion Mailing list, >>> >>> This is Yutaka Kubota from Cray Japan. >>> >>> I would like to know when will release MVAPICH2 1.4 official version. If >>> this plan was not decided, we will try to use RC2 or RC3 version. We >>> just would like to know this plan is exist or not. >>> >>> Best regards >>> >>> Yutaka Kubota >>> >>> >>> _______________________________________________ >>> mvapich-discuss mailing list >>> mvapich-discuss@cse.ohio-state.edu >>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >> -- >> ----------- >> Manhui Wang >> School of Chemistry, Cardiff University, >> Main Building, Park Place, >> Cardiff CF10 3AT, UK >> Telephone: +44 (0)29208 76637 >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > -- ----------- Manhui Wang School of Chemistry, Cardiff University, Main Building, Park Place, Cardiff CF10 3AT, UK Telephone: +44 (0)29208 76637 -------------- next part -------------- A non-text attachment was scrubbed... Name: parser.c.diff Type: text/x-patch Size: 1422 bytes Desc: not available Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090929/431ffdfe/parser.c.bin -------------- next part -------------- A non-text attachment was scrubbed... 
Name: tokens.c.diff Type: text/x-patch Size: 2896 bytes Desc: not available Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090929/431ffdfe/tokens.c.bin From wangm9 at cardiff.ac.uk Tue Sep 29 09:42:07 2009 From: wangm9 at cardiff.ac.uk (Manhui Wang) Date: Tue Sep 29 09:42:48 2009 Subject: [mvapich-discuss] fail to link with mvapich2-1.2p1/mvapich2-1.4rc2 In-Reply-To: <4AC1DFDE.7080106@cardiff.ac.uk> References: <3B7D8CBBF8049C4C9746728929189A920F241A48@CFEVS1-IP.americas.cray.com> <4ABB8CC0.3010301@cardiff.ac.uk> <20090924165611.GH2346@cse.ohio-state.edu> <4AC1DFDE.7080106@cardiff.ac.uk> Message-ID: <4AC20EAF.9060100@cardiff.ac.uk> I made a mistake in producing the previous uploaded patches, here attached is the updated one, which seems to work fine with Molpro. Manhui Wang wrote: > Dear MVAPICH2 developers, > Following last mail, I tried to see whether it is possible to > make some change in Molpro. But it seems to be impossible since Molpro > directly calls these functions, which are from yacc or bison libraries. > This means these name conflicts with those in yacc or bison libraries. I > have made two small patches(see files attached) for mvapich2, with these > patches Molpro works fine with Mvapich2. I would be very appreciated if > you can include these changes (probably you will consider more changes > to avoid conflicting with other programs, but this works for molpro at > least) in the mvapich2 development version at your earliest convenience. > So I can test the updated mvapich2 before the final release. > > Thank you very much. > Manhui > > Jonathan Perkins wrote: >> On Thu, Sep 24, 2009 at 04:14:08PM +0100, Manhui Wang wrote: >>> Dear MVAPICH2 developers, >>> I tried to build Molpro (see http://www.molpro.net/) with MVAPICH2 >>> 1.2p1 (as well as mvapich2-1.4rc2) library, but failed. The reason is >>> that both Molpro and Mvapich use some general function names. 
The >>> source code of mvapich library (tokens.c parser.c) contains >>> some yacc-like parsing code. These three >>> functions(yylex yyparse yyerror) happen to have the same names as those >>> in parse files of Molpro. I renamed these three function to those with a >>> prefix mvapich_* in MVAPICH source code, and recompiled the mvapich2 >>> library. Now it works fine. Could you please slightly change these >>> names to avoid potential conflict with other application program in next >>> release version? Surely, Molpro could also choose other specific >>> function names to avoid such problems. >> Thank you for the suggestion. We'll take a look at this and have it >> resolved before our final release. >> >>> The following is the error message: >>> >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(parser.o): In >>> function `yylex': >>> parser.c:(.text+0xbfc): multiple definition of `yylex' >>> ../lib/libmolpro.a(licence.o):parse.c:(.text+0x9ccf): first defined here >>> ld: Warning: size of symbol `yylex' changed from 3091 in >>> ../lib/libmolpro.a(licence.o) to 3336 in >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(parser.o) >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o): In >>> function `yyparse': >>> tokens.c:(.text+0x0): multiple definition of `yyparse' >>> ../lib/libmolpro.a(licence.o):parse.c:(.text+0xdb): first defined here >>> ld: Warning: size of symbol `yyparse' changed from 22803 in >>> ../lib/libmolpro.a(licence.o) to 3110 in >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o) >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o): In >>> function `yyerror': >>> tokens.c:(.text+0x1e5c): multiple definition of `yyerror' >>> ../lib/libmolpro.a(licence.o):parse.c:(.text+0x5edd): first defined here >>> ld: Warning: size of symbol `yyerror' changed from 934 in >>> ../lib/libmolpro.a(licence.o) to 26 in >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o) >>> failure 
>>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(parser.o): In >>> function `yylex': >>> parser.c:(.text+0xbfc): multiple definition of `yylex' >>> ../lib/libmolpro.a(licence.o):parse.c:(.text+0x9ccf): first defined here >>> ld: Warning: size of symbol `yylex' changed from 3091 in >>> ../lib/libmolpro.a(licence.o) to 3336 in >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(parser.o) >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o): In >>> function `yyparse': >>> tokens.c:(.text+0x0): multiple definition of `yyparse' >>> ../lib/libmolpro.a(licence.o):parse.c:(.text+0xdb): first defined here >>> ld: Warning: size of symbol `yyparse' changed from 22803 in >>> ../lib/libmolpro.a(licence.o) to 3110 in >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o) >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o): In >>> function `yyerror': >>> tokens.c:(.text+0x1e5c): multiple definition of `yyerror' >>> ../lib/libmolpro.a(licence.o):parse.c:(.text+0x5edd): first defined here >>> >>> Thanks >>> Manhui >>> >>> Yutaka Kubota wrote: >>>> Dear MVAPICH2 discussion Mailing list, >>>> >>>> This is Yutaka Kubota from Cray Japan. >>>> >>>> I would like to know when will release MVAPICH2 1.4 official version. If >>>> this plan was not decided, we will try to use RC2 or RC3 version. We >>>> just would like to know this plan is exist or not. 
>>>> >>>> Best regards >>>> >>>> Yutaka Kubota >>>> >>>> >>>> _______________________________________________ >>>> mvapich-discuss mailing list >>>> mvapich-discuss@cse.ohio-state.edu >>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >>> -- >>> ----------- >>> Manhui Wang >>> School of Chemistry, Cardiff University, >>> Main Building, Park Place, >>> Cardiff CF10 3AT, UK >>> Telephone: +44 (0)29208 76637 >>> _______________________________________________ >>> mvapich-discuss mailing list >>> mvapich-discuss@cse.ohio-state.edu >>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > > ------------------------------------------------------------------------ > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss -- ----------- Manhui Wang School of Chemistry, Cardiff University, Main Building, Park Place, Cardiff CF10 3AT, UK Telephone: +44 (0)29208 76637 -------------- next part -------------- A non-text attachment was scrubbed... Name: mvapich2-1.4rc2-yacc.patch Type: text/x-patch Size: 7788 bytes Desc: not available Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090929/f0ec6d04/mvapich2-1.4rc2-yacc.bin From panda at cse.ohio-state.edu Tue Sep 29 09:46:27 2009 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Tue Sep 29 09:47:02 2009 Subject: [mvapich-discuss] fail to link with mvapich2-1.2p1/mvapich2-1.4rc2 In-Reply-To: <4AC20EAF.9060100@cardiff.ac.uk> Message-ID: Thanks for sending us the patch. We will take a look at it. Thanks, DK On Tue, 29 Sep 2009, Manhui Wang wrote: > I made a mistake in producing the previous uploaded patches, here > attached is the updated one, which seems to work fine with Molpro. > > Manhui Wang wrote: > > Dear MVAPICH2 developers, > > Following last mail, I tried to see whether it is possible to > > make some change in Molpro. 
But it seems to be impossible since Molpro > > directly calls these functions, which are from yacc or bison libraries. > > This means these name conflicts with those in yacc or bison libraries. I > > have made two small patches(see files attached) for mvapich2, with these > > patches Molpro works fine with Mvapich2. I would be very appreciated if > > you can include these changes (probably you will consider more changes > > to avoid conflicting with other programs, but this works for molpro at > > least) in the mvapich2 development version at your earliest convenience. > > So I can test the updated mvapich2 before the final release. > > > > Thank you very much. > > Manhui > > > > Jonathan Perkins wrote: > >> On Thu, Sep 24, 2009 at 04:14:08PM +0100, Manhui Wang wrote: > >>> Dear MVAPICH2 developers, > >>> I tried to build Molpro (see http://www.molpro.net/) with MVAPICH2 > >>> 1.2p1 (as well as mvapich2-1.4rc2) library, but failed. The reason is > >>> that both Molpro and Mvapich use some general function names. The > >>> source code of mvapich library (tokens.c parser.c) contains > >>> some yacc-like parsing code. These three > >>> functions(yylex yyparse yyerror) happen to have the same names as those > >>> in parse files of Molpro. I renamed these three function to those with a > >>> prefix mvapich_* in MVAPICH source code, and recompiled the mvapich2 > >>> library. Now it works fine. Could you please slightly change these > >>> names to avoid potential conflict with other application program in next > >>> release version? Surely, Molpro could also choose other specific > >>> function names to avoid such problems. > >> Thank you for the suggestion. We'll take a look at this and have it > >> resolved before our final release. 
> >> > >>> The following is the error message: > >>> > >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(parser.o): In > >>> function `yylex': > >>> parser.c:(.text+0xbfc): multiple definition of `yylex' > >>> ../lib/libmolpro.a(licence.o):parse.c:(.text+0x9ccf): first defined here > >>> ld: Warning: size of symbol `yylex' changed from 3091 in > >>> ../lib/libmolpro.a(licence.o) to 3336 in > >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(parser.o) > >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o): In > >>> function `yyparse': > >>> tokens.c:(.text+0x0): multiple definition of `yyparse' > >>> ../lib/libmolpro.a(licence.o):parse.c:(.text+0xdb): first defined here > >>> ld: Warning: size of symbol `yyparse' changed from 22803 in > >>> ../lib/libmolpro.a(licence.o) to 3110 in > >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o) > >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o): In > >>> function `yyerror': > >>> tokens.c:(.text+0x1e5c): multiple definition of `yyerror' > >>> ../lib/libmolpro.a(licence.o):parse.c:(.text+0x5edd): first defined here > >>> ld: Warning: size of symbol `yyerror' changed from 934 in > >>> ../lib/libmolpro.a(licence.o) to 26 in > >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o) > >>> failure > >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(parser.o): In > >>> function `yylex': > >>> parser.c:(.text+0xbfc): multiple definition of `yylex' > >>> ../lib/libmolpro.a(licence.o):parse.c:(.text+0x9ccf): first defined here > >>> ld: Warning: size of symbol `yylex' changed from 3091 in > >>> ../lib/libmolpro.a(licence.o) to 3336 in > >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(parser.o) > >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o): In > >>> function `yyparse': > >>> tokens.c:(.text+0x0): multiple definition of `yyparse' > >>> ../lib/libmolpro.a(licence.o):parse.c:(.text+0xdb): first defined 
here > >>> ld: Warning: size of symbol `yyparse' changed from 22803 in > >>> ../lib/libmolpro.a(licence.o) to 3110 in > >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o) > >>> /home/sacmw4/soft/mvapich2-1.2p1-install/lib/libmpich.a(tokens.o): In > >>> function `yyerror': > >>> tokens.c:(.text+0x1e5c): multiple definition of `yyerror' > >>> ../lib/libmolpro.a(licence.o):parse.c:(.text+0x5edd): first defined here > >>> > >>> Thanks > >>> Manhui > >>> > >>> Yutaka Kubota wrote: > >>>> Dear MVAPICH2 discussion Mailing list, > >>>> > >>>> This is Yutaka Kubota from Cray Japan. > >>>> > >>>> I would like to know when will release MVAPICH2 1.4 official version. If > >>>> this plan was not decided, we will try to use RC2 or RC3 version. We > >>>> just would like to know this plan is exist or not. > >>>> > >>>> Best regards > >>>> > >>>> Yutaka Kubota > >>>> > >>>> > >>>> _______________________________________________ > >>>> mvapich-discuss mailing list > >>>> mvapich-discuss@cse.ohio-state.edu > >>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > >>> -- > >>> ----------- > >>> Manhui Wang > >>> School of Chemistry, Cardiff University, > >>> Main Building, Park Place, > >>> Cardiff CF10 3AT, UK > >>> Telephone: +44 (0)29208 76637 > >>> _______________________________________________ > >>> mvapich-discuss mailing list > >>> mvapich-discuss@cse.ohio-state.edu > >>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > > > > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > mvapich-discuss mailing list > > mvapich-discuss@cse.ohio-state.edu > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > -- > ----------- > Manhui Wang > School of Chemistry, Cardiff University, > Main Building, Park Place, > Cardiff CF10 3AT, UK > Telephone: +44 (0)29208 76637 > From Craig.Tierney at noaa.gov Tue Sep 29 12:38:18 2009 From: 
Craig.Tierney at noaa.gov (Craig Tierney) Date: Tue Sep 29 12:38:54 2009 Subject: [mvapich-discuss] Unable to build mvapich2 1.2p1 for use with totalview In-Reply-To: <20090917171354.GJ2695@cse.ohio-state.edu> References: <4AAE9DA6.7070908@noaa.gov> <20090917171354.GJ2695@cse.ohio-state.edu> Message-ID: <4AC237FA.2080607@noaa.gov> Jonathan Perkins wrote: > Hi Craig. We haven't seen this type of issue come up before. We'll > take a look to see if we can reproduce this issue. In the meantime can > you try building while removing the -L/usr/lib64 and -L/lib64 options > from your compiler variables. If you really need these options you > should add them to the CFLAGS variable. > > Can you also try leaving the --with-pm option unset (this allows for mpd > and mpirun_rsh). We perform the majority of our testing with mpirun_rsh > as it scales and performs better than the other pm options. > > Let us know if any of these actions allows your build to proceed > successfully. > Sorry about the delay. The reason I include the flags above (which can be done in LDFLAGS) is so that I don't have to fight the OS with the library search path and have it try the 32-bit libraries first, which causes warnings. Removing that doesn't fix the problem. The error I reported doesn't seem to exist in mvapich2-1.4rc2. However, the problem that I really had is that I cannot debug with totalview. When I start the debugger, it tries to debug mpirun_rsh, not the application. I built mvapich2-1.4rc2 as: ./configure LDFLAGS=-L/usr/lib64 -L/lib64 CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort --with-ib-libpath=/usr/lib64 --with-ib-include=/usr/include --prefix=/opt/hjet/mvapich2/1.4rc2-intel --enable-romio=yes --with-file-system=lustre --enable-g=dbg --enable-sharedlibs=gcc --enable-debuginfo And I have the following environment variables set: TVDSVRLAUNCHCMD=ssh TOTALVIEW=/opt/toolworks/totalview.8.6.2-0/bin/totalview But my problem is that Totalview is debugging mpirun_rsh, not the MPI application.
I am launching the program with: $MPICH/bin/mpirun_rsh -hostfile $MACHINE_FILE -tv -np 8 ./osu_alltoall The program does run, but it just isn't debugged. Thanks, Craig > On Mon, Sep 14, 2009 at 01:46:46PM -0600, Craig Tierney wrote: >> I am trying to build Mvapich2 1.2p1 so that I can use >> Totalview. The docs say that I am supposed to add the >> following options: >> >> --enable-g=dbg --enable-sharedlibs=gcc --enable-debuginfo >> >> My complete configure line is: >> >> ./configure CC="icc -L/usr/lib64 -L/lib64" CXX="icpc -L/usr/lib64 -L/lib64" F77="ifort -L/usr/lib64 -L/lib64" FC="ifort -L/usr/lib64 -L/lib64" F90="ifort -L/usr/lib64 -L/lib64" \ >> --with-ib-libpath=/usr/lib64 \ >> --with-ib-include=/usr/include \ >> --prefix=/opt/hjet/mvapich2/1.2p1-intel \ >> --enable-romio=yes --with-file-system=lustre \ >> --with-pm=remshell \ >> --enable-g=dbg --enable-sharedlibs=gcc --enable-debuginfo \ >> --enable-threads=multiple >> >> The problem is that when it tries to link the system tools, it fails. 
>> For example, when it tries to link mpiexec, I get: >> >> icc -L/usr/lib64 -L/lib64 -g -L/usr/lib64 -static -o mpiexec mpiexec.o -L../util \ >> -lmpiexec -L../../../lib -L/home/admin/hjet/software/opt/mvapich/mvapich2-1.2p1/lib -lmpich -lpthread -lrdmacm -libverbs -libumad -lrt >> ../util/libmpiexec.a(pmiport.o): In function `MPIE_GetMyHostName': >> /misc/whome/admin/hjet/software/opt/mvapich/mvapich2-1.2p1/src/pm/util/pmiport.c:200: warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the >> glibc version used for linking >> /usr/lib64/libc.a(malloc.o): In function `__malloc_check_init': >> (.text+0xb00): multiple definition of `__malloc_check_init' >> ../../../lib/libmpich.a(mvapich_malloc.o):/misc/whome/admin/hjet/software/opt/mvapich/mvapich2-1.2p1/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/hooks.c:83: first defined here >> ld: Warning: size of symbol `__malloc_check_init' changed from 122 in ../../../lib/libmpich.a(mvapich_malloc.o) to 105 in /usr/lib64/libc.a(malloc.o) >> /usr/lib64/libc.a(malloc.o): In function `_int_free': >> (.text+0x21f0): multiple definition of `_int_free' >> ../../../lib/libmpich.a(mvapich_malloc.o):/misc/whome/admin/hjet/software/opt/mvapich/mvapich2-1.2p1/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/mvapich_malloc.c:4307: first defined here >> ld: Warning: size of symbol `_int_free' changed from 778 in ../../../lib/libmpich.a(mvapich_malloc.o) to 2413 in /usr/lib64/libc.a(malloc.o) >> /usr/lib64/libc.a(malloc.o): In function `_int_malloc': >> (.text+0x2b60): multiple definition of `_int_malloc' >> >> ........ 
>> >> And problems with IB: >> >> (.text+0xbb): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking >> /usr/lib64/libibverbs.a(src_libibverbs_la-verbs.o): In function `ibv_create_comp_channel': >> (.text+0x9b6): undefined reference to `pthread_mutex_trylock' >> /usr/lib64/libibumad.a(libibumad_la-umad.o): In function `umad_get_fd': >> (.text+0xdc): undefined reference to `ibwarn' >> /usr/lib64/libibumad.a(libibumad_la-umad.o): In function `umad_done': >> (.text+0x10d): undefined reference to `ibwarn' >> /usr/lib64/libibumad.a(libibumad_la-umad.o): In function `umad_addr_dump': >> >> >> This doesn't happen when the binary is linked dynamically (remove -static). >> >> Am I missing an option from getting everything built cleanly? >> >> Thanks, >> Craig >> >> -- >> Craig Tierney (craig.tierney@noaa.gov) >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > > ------------------------------------------------------------------------ > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss -- Craig Tierney (craig.tierney@noaa.gov)
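A note on the mvapich2-1.4rc2 configure line in the message above: as archived, LDFLAGS=-L/usr/lib64 -L/lib64 appears without quotes. The archive does not show whether quotes were present in the command that was actually run. If they were not, a POSIX shell would hand -L/lib64 to configure as a separate argument instead of as part of LDFLAGS. The sketch below only counts arguments to illustrate that difference; count_args is a hypothetical helper introduced here, not part of MVAPICH2 or configure.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

# Unquoted: the shell splits at the space, so -L/lib64 becomes a second argument.
unquoted=$(count_args LDFLAGS=-L/usr/lib64 -L/lib64)

# Quoted: both -L options stay inside the single LDFLAGS=... argument.
quoted=$(count_args LDFLAGS="-L/usr/lib64 -L/lib64")

echo "unquoted: $unquoted, quoted: $quoted"
```

Under that assumption, writing the option as LDFLAGS="-L/usr/lib64 -L/lib64" keeps both search paths in the one variable. This concerns only how the flags reach configure and is not claimed to explain the TotalView behaviour described in the message.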