From xavier.bru at bull.net Fri Mar 2 08:09:05 2007
From: xavier.bru at bull.net (xb)
Date: Fri Mar 2 09:43:15 2007
Subject: [mvapich-discuss] mpiexec_cr (small) issue
Message-ID: <45E821F1.6060005@bull.net>

An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20070302/bab80008/attachment.html

From gaoq at cse.ohio-state.edu Fri Mar 2 09:58:25 2007
From: gaoq at cse.ohio-state.edu (Qi Gao)
Date: Fri Mar 2 09:59:41 2007
Subject: [mvapich-discuss] mpiexec_cr (small) issue
References: <45E821F1.6060005@bull.net>
Message-ID: <003e01c75cdb$546d2220$8d02a8c0@GAO>

Hi,

Thanks for trying MVAPICH2 and providing this patch to fix the problem!
I'll incorporate this patch to our svn.

Thanks,
--Qi

----- Original Message -----
From: xb
To: mvapich-discuss@cse.ohio-state.edu
Sent: Friday, March 02, 2007 8:09 AM
Subject: [mvapich-discuss] mpiexec_cr (small) issue

Hello all,

Trying to run mpiexec with cr enabled, I hit a (small) problem where
running mpiexec without specifying a path to the command, I get a
"command line too long" message.
Command works OK if we specify a path before the command name.
Hereafter a patch that fixes this (small) problem.

xavier

traces ----------------------------------------------------------------------------------
# mpiexec -n 2 ./cpi
command line too long
# /usr/local/mvapich2/bin/mpiexec -n 2 ./cpi
Process 0 of 2 is on woodcr1
Process 1 of 2 is on woodcr1
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.223271
#

Patched command:
# mpiexec -n 2 ./cpi
Process 0 of 2 is on woodcr1
Process 1 of 2 is on woodcr1
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.224422
#

Patch ---------------------------------------------------------------------------------------
--- mvapich2-trunk-2007-03-01/src/pm/mpd/mpiexec_cr.c~  2007-02-16 21:17:51.000000000 +0100
+++ mvapich2-trunk-2007-03-01/src/pm/mpd/mpiexec_cr.c   2007-03-02 13:51:11.000000000 +0100
@@ -435,7 +435,8 @@
     CR_Init();
     /*replace the command line to mpiexec*/
     rchar = strrchr(argv[0],'/');
-    length = rchar+1-argv[0];
+    /* strrchr returns NULL if char is not found */
+    length = rchar ? rchar+1-argv[0] : 0;
     if (length+strlen(MPIRUN)>MAX_PATH_LEN) {
         fprintf(stderr,"command line too long\n");
         exit (1);
------------------------------------------------------------------------------
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss@cse.ohio-state.edu
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
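
[Editor's sketch] The patch above guards against strrchr() returning NULL when argv[0] contains no '/', which is the case when mpiexec is found through PATH. Doing pointer arithmetic on that NULL result is undefined behavior in C, and the garbage length it produced presumably tripped the "command line too long" check. The following is a minimal standalone sketch of the same guard, not the actual mpiexec_cr.c code: the helper names dir_prefix_len and prefix_fits and their parameters are hypothetical, standing in for the MPIRUN and MAX_PATH_LEN macros used in the real source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from mpiexec_cr.c): length of the directory
 * prefix of argv0, including the trailing '/', or 0 when argv0 has no
 * '/' at all (strrchr returns NULL in that case). */
size_t dir_prefix_len(const char *argv0)
{
    const char *slash = strrchr(argv0, '/');
    return slash ? (size_t)(slash + 1 - argv0) : 0;
}

/* Hypothetical helper: same bound check as in the patch, with the
 * replacement command name and the maximum length passed in instead
 * of coming from the MPIRUN and MAX_PATH_LEN macros.
 * Returns 1 if the rebuilt path fits, 0 otherwise. */
int prefix_fits(const char *argv0, const char *tool, size_t max_len)
{
    return dir_prefix_len(argv0) + strlen(tool) <= max_len;
}
```

With this guard, "mpiexec" yields a prefix length of 0 and "/usr/local/mvapich2/bin/mpiexec" yields 24, so both invocations shown in the traces pass the length check.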

From rowland at cse.ohio-state.edu Fri Mar 2 16:09:30 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Fri Mar 2 16:10:58 2007
Subject: [mvapich-discuss] deadlock with g95/gfortran
In-Reply-To: <1172466230.45e26a36737b2@webmail.mail.gatech.edu>
References: <1172466230.45e26a36737b2@webmail.mail.gatech.edu>
Message-ID: <45E8928A.6040003@cse.ohio-state.edu>

Aliva Pattnaik wrote:
> Hi,
>
> I am trying to run the fortran example problem(fpi.f) that comes with
> mvapich2-0.9.8. I am using g95 to compile it. But while running it with
> mpiexec its getting deadlock, though in the "Top" output I can see the
> processes taking 99% of CPU time. The same situation is arising while
> using gfortran. But I am able to run c example problems compiled with
> gcc, successfully.
>
> The cluster that I am using is 64 bit AMD opteron with infiniband.
>
> I will really appreciate if someone can help me in fixing this problem.
>
> Thank you very much for your help,
> Aliva

Hello Aliva. I've been looking into this problem with a variety of
compilers:

Intel
PGI
Pathscale
GCC (gfortran - also latest from gcc SVN trunk)
GCC (g77)

I tested with the fpi.f and pi3f90.f90 examples. All cases work as
expected except those using gfortran. I only tested fpi.f with g77.
When running with two processes, it appeared as if one of the processes
was stuck in libgfortran:

(gdb) bt
#0  0x000000342e00b0df in __read_nocancel () from /lib64/tls/libpthread.so.0
#1  0x0000002a9580b79a in find_or_create_unit () from /usr/lib64/libgfortran.so.1
#2  0x0000002a95807b2e in _gfortran_transfer_real () from /usr/lib64/libgfortran.so.1
#3  0x0000002a9580800d in _gfortran_transfer_real () from /usr/lib64/libgfortran.so.1
#4  0x0000002a95806288 in _gfortran_st_open () from /usr/lib64/libgfortran.so.1
#5  0x0000002a9580956b in _gfortran_st_read_done () from /usr/lib64/libgfortran.so.1
#6  0x0000000000403b92 in MAIN__ () at fpi.f:46
#7  0x000000000047342e in main ()

while the other was waiting for it after getting to the MPI_BCAST call:

(gdb) bt
#0  0x0000002a95b3adeb in mthca_poll_cq (ibcq=0x5eb060, ne=1, wc=0x7fbffff080) at src/cq.c:482
#1  0x000000000042ca3b in ibv_poll_cq (cq=0x5eb060, num_entries=1, wc=0x7fbffff080) at /usr/local/ofed/include/infiniband/verbs.h:815
#2  0x000000000042bbf6 in MPIDI_CH3I_MRAILI_Cq_poll (vbuf_handle=0x7fbffff140, vc_req=0x0, receiving=0) at ibv_channel_manager.c:456
#3  0x0000000000414eef in MPIDI_CH3I_read_progress (vc_pptr=0x7fbffff158, v_ptr=0x7fbffff140) at ch3_read_progress.c:110
#4  0x0000000000413b56 in MPIDI_CH3I_Progress (is_blocking=1, state=0x7fbffff1a0) at ch3_progress.c:158
#5  0x000000000040a7dd in MPIC_Wait (request_ptr=0x5bde60) at helper_fns.c:316
#6  0x0000000000409db3 in MPIC_Recv (buf=0x7fbffff664, count=1, datatype=1275069467, source=0, tag=2, comm=1140850688, status=0x1) at helper_fns.c:86
#7  0x000000000040426f in MPIR_Bcast (buffer=0x7fbffff664, count=1, datatype=1275069467, root=0, comm_ptr=0x5a4940) at bcast.c:208
#8  0x000000000040594b in PMPI_Bcast (buffer=0x7fbffff664, count=1, datatype=1275069467, root=0, comm=1140850688) at bcast.c:785
#9  0x0000000000403bfd in pmpi_bcast_ (v1=0x7fbffff664, v2=0x4735ac, v3=0x4735a8, v4=0x4735a4, v5=0x473550, ierr=0x7fbffff66c) at bcastf.c:119
#10 0x0000000000403991 in MAIN__ () at fpi.f:50
#11 0x000000000047342e in main ()

The process stuck in libgfortran is the one with myid of 0, and should
be prompting for the number of intervals. I believe this is where it is
stuck. However, I can make it go if I do something like this:

[rowland@ro0-oib examples]$ ../bin/mpiexec -n 2 ./fpi
10
100
1000
10000
10
0
 Process 1 of 2 is alive
 Process 0 of 2 is alive
Enter the number of intervals: (0 quits)
 pi is approximately: 3.1424259850010983 Error is: 0.0008333314113051
Enter the number of intervals: (0 quits)
 pi is approximately: 3.1416009869231241 Error is: 0.0000083333333309
Enter the number of intervals: (0 quits)
 pi is approximately: 3.1415927369231254 Error is: 0.0000000833333322
Enter the number of intervals: (0 quits)
 pi is approximately: 3.1415926544231318 Error is: 0.0000000008333387
Enter the number of intervals: (0 quits)
 pi is approximately: 3.1424259850010983 Error is: 0.0008333314113051
Enter the number of intervals: (0 quits)

This seems only necessary with gfortran. It looks like there is some
input/output buffering issue or something. I see the same behavior with
the pi3f90 example. Other than this issue, it seems gfortran is actually
working correctly. Can you let us know if you can duplicate these same
results?
--
Shaun Rowland rowland@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/

From rowland at cse.ohio-state.edu Fri Mar 2 16:22:56 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Fri Mar 2 16:24:25 2007
Subject: [mvapich-discuss] deadlock with g95/gfortran
In-Reply-To: <1172466230.45e26a36737b2@webmail.mail.gatech.edu>
References: <1172466230.45e26a36737b2@webmail.mail.gatech.edu>
Message-ID: <45E895B0.2050007@cse.ohio-state.edu>

Could you also please try setting this environment variable first:

export GFORTRAN_UNBUFFERED_ALL=y

It seems this fixes the issue.
--
Shaun Rowland rowland@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/

From ppk at ats.ucla.edu Fri Mar 2 15:46:24 2007
From: ppk at ats.ucla.edu (Korambath, Prakashan)
Date: Fri Mar 2 16:38:41 2007
Subject: [mvapich-discuss] Failed to Initialize HCA type for mvapich2-0.9.8
Message-ID: <43F64E86355A744E9D51506B6C6783B9014CE0D9@EM2.ad.ucla.edu>

Hi,
I just setup two nodes connected through an IB cable running Fedora
Core6 OS kernel 2.6.19-1.2911.fc6 and OFED-1.1. ibstat and ibnodes
outputs are below. I ran make.mvapich2.gen2 file in order to create the
mpi related files. I am getting following error when I am running
mpiexec. Could you please tell me what I am doing wrong? The configure
is using --with-device=osu_ch3:mrail inside make.mvapich2.gen2 . I
don't know whether I have wrong device or something. Also ulimit -l
shows unlimited. Thanks for your help.

Prakashan Korambath
UCLA

------------------------------------------

-bash-3.1$ mpd &
[1] 13652
-bash-3.1$ !mpdboot
mpdboot -n 2 -f hostfile
[1]+ Done mpd
-bash-3.1$ mpicc -o bones bones.c
-bash-3.1$ which mpicc
~/mvapich2/bin/mpicc
-bash-3.1$ mpiexec -n 2 ./bones
cannot create cq
Failed to Initialize HCA type
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(230): Initialization failed
MPID_Init(81)........: channel initialization failed
(unknown)(): Other MPI errorrank 1 in job 1 grid4.ats.ucla.edu_33136 caused collective abort of all ranks
exit status of rank 1: killed by signal 9
-bash-3.1$
-bash-3.1$ mpdtrace
grid4
n11

-----------------------
[root@grid4 ~]# ibstat
CA 'mthca0'
        CA type: MT25204
        Number of ports: 1
        Firmware version: 1.0.800
        Hardware version: a0
        Node GUID: 0x00066a0098007a39
        System image GUID: 0x00066a0098007a39
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 1
                LMC: 0
                SM lid: 2
                Capability mask: 0x02510a6a
                Port GUID: 0x00066a00a0007a39
[root@grid4 ~]# ibnodes
Ca : 0x00066a0098007a25 ports 1 "n11 HCA-1"
Ca : 0x00066a0098007a39 ports 1 "grid4 HCA-1"

From rowland at cse.ohio-state.edu Fri Mar 2 16:53:17 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Fri Mar 2 16:54:46 2007
Subject: [mvapich-discuss] Failed to Initialize HCA type for mvapich2-0.9.8
In-Reply-To: <43F64E86355A744E9D51506B6C6783B9014CE0D9@EM2.ad.ucla.edu>
References: <43F64E86355A744E9D51506B6C6783B9014CE0D9@EM2.ad.ucla.edu>
Message-ID: <45E89CCD.5050609@cse.ohio-state.edu>

Thanks for trying MVAPICH2. Have you tried putting the following in your
/etc/init.d/sshd files:

ulimit -l unlimited

and then restarting sshd first on both machines? This is noted in
section 7.2.4 of the MVAPICH2 User Guide:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/index.html

We've found this necessary, in addition to modifying
/etc/security/limits.conf.
--
Shaun Rowland rowland@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/

From huanwei at cse.ohio-state.edu Fri Mar 2 16:54:51 2007
From: huanwei at cse.ohio-state.edu (wei huang)
Date: Fri Mar 2 16:55:20 2007
Subject: [mvapich-discuss] Failed to Initialize HCA type for mvapich2-0.9.8
In-Reply-To: <43F64E86355A744E9D51506B6C6783B9014CE0D9@EM2.ad.ucla.edu>
Message-ID:

Hi Prakashan,

Thanks for using mvapich2.

This is pretty weird because the ulimit is typically the reason when you
see create cq failure. May I ask you to make sure that ulimit is
unlimited on both nodes? Also, it will be good if you verify using the
following commands (so that ulimit is actually unlimited when you run
the program):

ssh n11 ulimit -l
ssh grid4 ulimit -l

Also, would you please verify on both machines that port is active.

Finally, if all of them are fine, would you please make sure ib level
micro-benchmarks run successfully?

Thanks.

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501

From rowland at cse.ohio-state.edu Fri Mar 2 16:56:10 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Fri Mar 2 16:57:39 2007
Subject: [mvapich-discuss] Failed to Initialize HCA type for mvapich2-0.9.8
In-Reply-To: <43F64E86355A744E9D51506B6C6783B9014CE0D9@EM2.ad.ucla.edu>
References: <43F64E86355A744E9D51506B6C6783B9014CE0D9@EM2.ad.ucla.edu>
Message-ID: <45E89D7A.9090004@cse.ohio-state.edu>

This is the User Guide link I intended to send:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html#x1-360007.2.4
--
Shaun Rowland rowland@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/

From ppk at ats.ucla.edu Fri Mar 2 17:06:43 2007
From: ppk at ats.ucla.edu (Korambath, Prakashan)
Date: Fri Mar 2 17:09:26 2007
Subject: [mvapich-discuss] Failed to Initialize HCA type for mvapich2-0.9.8
References: <43F64E86355A744E9D51506B6C6783B9014CE0D9@EM2.ad.ucla.edu> <45E89CCD.5050609@cse.ohio-state.edu>
Message-ID: <43F64E86355A744E9D51506B6C6783B9014CE0DB@EM2.ad.ucla.edu>

Thanks Shaun. That did solve the problem. I typed the ulimit -l
unlimited in my window, but when one does the ssh it gets system default
value. Having that limit set in sshd resolves that issue.

Prakashan

From ppk at ats.ucla.edu Fri Mar 2 17:09:04 2007
From: ppk at ats.ucla.edu (Korambath, Prakashan)
Date: Fri Mar 2 17:10:45 2007
Subject: [mvapich-discuss] Failed to Initialize HCA type for mvapich2-0.9.8
References:
Message-ID: <43F64E86355A744E9D51506B6C6783B9014CE0DC@EM2.ad.ucla.edu>

Hi Wei,

It was getting the default value of 32. Now that I added 'ulimit -l
unlimited' into /etc/init.d/sshd itself, it is ok. Thanks a lot for the
help.

Prakashan
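
[Editor's sketch] The thread above turned on the fact that an interactive shell reported "unlimited" for ulimit -l while processes started through sshd still inherited the default of 32. One way to see which limit a launched process really has is to query it from inside the process. The sketch below is an illustration only, not part of MVAPICH2: the function names format_memlock and print_memlock_limit are hypothetical, and it assumes a Linux-like system where RLIMIT_MEMLOCK is available through getrlimit(). The value is printed in kbytes to match what bash's ulimit -l reports.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: render a locked-memory limit the way
 * `ulimit -l` shows it, i.e. in kbytes, or "unlimited". */
void format_memlock(rlim_t bytes, char *buf, size_t buflen)
{
    if (bytes == RLIM_INFINITY)
        snprintf(buf, buflen, "unlimited");
    else
        snprintf(buf, buflen, "%llu", (unsigned long long)(bytes / 1024));
}

/* Hypothetical helper: print the soft RLIMIT_MEMLOCK of the calling
 * process. Returns 0 on success, -1 if getrlimit() fails. */
int print_memlock_limit(void)
{
    struct rlimit rl;
    char text[32];

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    format_memlock(rl.rlim_cur, text, sizeof text);
    printf("max locked memory (kbytes): %s\n", text);
    return 0;
}
```

Calling print_memlock_limit() early in a small test program started with mpiexec on each node would show whether the sshd change has taken effect for launched processes, complementing the "ssh n11 ulimit -l" check suggested in the thread.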

From bhartner at us.ibm.com Tue Mar 6 11:17:35 2007
From: bhartner at us.ibm.com (Bill Hartner)
Date: Tue Mar 6 11:18:11 2007
Subject: [mvapich-discuss] multirail and ofed
Message-ID:

MVAPICH Team,

I was reviewing the multirail code and have a question.

When using the OFED stack and enabling the use of librdmacm,
it appears as though multirail can not be used. Assuming
my code review is correct, can you explain why this is the
case and are there any plans to get librdmacm and
multirail working together ?

Thanks,

Bill Hartner
IBM Systems and Technology Group

From panda at cse.ohio-state.edu Tue Mar 6 11:32:28 2007
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Tue Mar 6 11:32:56 2007
Subject: [mvapich-discuss] multirail and ofed
In-Reply-To: from "Bill Hartner" at Mar 06, 2007 10:17:35 AM
Message-ID: <200703061632.l26GWSSZ020432@xi.cse.ohio-state.edu>

Hi Bill,

> I was reviewing the multirail code and have a question.
>
> When using the OFED stack and enabling the use of librdmacm,
> it appears as though multirail can not be used. Assuming
> my code review is correct, can you explain why this is the
> case and are there any plans to get librdmacm and
> multirail working together ?

I am assuming here that you are reviewing the MVAPICH2 code.

Your code review is correct. We just didn't have enough time during
the 0.9.8 release time frame to incorporate this feature. We are
currently working on this and the next release will have this support.

Best Regards,

DK

From aliva at gatech.edu Tue Mar 6 17:01:05 2007
From: aliva at gatech.edu (Aliva Pattnaik)
Date: Tue Mar 6 17:01:38 2007
Subject: [mvapich-discuss] deadlock with g95/gfortran
In-Reply-To: <45E895B0.2050007@cse.ohio-state.edu>
References: <1172466230.45e26a36737b2@webmail.mail.gatech.edu> <45E895B0.2050007@cse.ohio-state.edu>
Message-ID: <1173218465.45ede4a14f184@webmail.mail.gatech.edu>

Hi Shaun,

I am now able to run the fortran examples using gfortran after setting
the environment variable as suggested by you:

setenv GFORTRAN_UNBUFFERED_ALL y

Thanks a lot for your help,
Aliva

From Christian.Boehme at gwdg.de Wed Mar 7 06:05:23 2007
From: Christian.Boehme at gwdg.de (Christian Boehme)
Date: Wed Mar 7 06:55:14 2007
Subject: [mvapich-discuss] mvapich2-0.9.8 problem with rsh
Message-ID: <45EE9C73.1090606@gwdg.de>

Hello!

We have a problem when using rsh to start the mpd's with mvapich2
version 0.9.8. We use an LSF-wrapper script to execute:

$path_to_mpdboot --rsh=rsh --remcons -n $np -f $hostfile

but we get errors like this:

> mpdboot_gwdm006 (handle_mpd_output 368): failed to connect to mpd on gwdm103
> Failed to start MPD daemons

Using

$path_to_mpdboot --remcons -n $np -f $hostfile

(i.e. ssh) _does_ work, however. We also use similar wrappers for
mvapich (mpirun_rsh mechanism) and openmpi, and they work with _both_
rsh and ssh. I admit that we have not done much debugging, but I thought
that someone here might hint where to best start looking for problems,
as both rsh and mvapich2 principally work, but not together. Thanks!

Christian Boehme

--
Dr. Christian Boehme
email: Christian.Boehme@gwdg.de
phone: +49 (0)551 201-1839
---------------------------------------
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
Am Fassberg 11, 37077 Göttingen
URL: http://www.gwdg.de
E-Mail: gwdg@gwdg.de
Tel.: +49 (0)551 201-1510
Fax: +49 (0)551 201-2150
Managing Director: Prof. Dr. Bernhard Neumair
Chairman of the Supervisory Board: Prof. Dr. Christian Griesinger
Registered office: Göttingen
Court of registration: Göttingen
Commercial register no. B 598
---------------------------------------

From basv at sara.nl Thu Mar 8 07:56:31 2007
From: basv at sara.nl (Bas van der Vlies)
Date: Thu Mar 8 07:57:03 2007
Subject: [mvapich-discuss] mvapich2-0.9.8 blacs problems
Message-ID: <45F007FF.30901@sara.nl>

Hello,

SARA is testing the openib stack with mvapich2.

We have problems with installing the blacs library
(www.netlib.org/blacs) it compiles and links correct, some tests namely:
* The BSBR test with double precision fails with certain topologies.

We tried the GNU compilers (gcc, g77) and the INTEL compilers (icc,
ifort).

Please help?

PS)
On our topspin/cisco infiniband stack with their mpi implementation
there are no problems.

--
********************************************************************
*                                                                  *
*  Bas van der Vlies                     e-mail: basv@sara.nl      *
*  SARA - Academic Computing Services    phone:  +31 20 592 8012   *
*  Kruislaan 415                         fax:    +31 20 6683167    *
*  1098 SJ Amsterdam                                               *
*                                                                  *
********************************************************************

From basv at sara.nl Thu Mar 8 09:11:59 2007
From: basv at sara.nl (Bas van der Vlies)
Date: Thu Mar 8 09:12:33 2007
Subject: [mvapich-discuss] mvapich2-0.9.8 blacs problems
In-Reply-To: <45F007FF.30901@sara.nl>
References: <45F007FF.30901@sara.nl>
Message-ID: <45F019AF.2050003@sara.nl>

Bas van der Vlies wrote:
> Hello,
>
> SARA is testing the openib stack with mvapich2.
>
> We have problems with installing the blacs library
> (www.netlib.org/blacs) it compiles and links correct, some tests namely:
> * The BSBR test with double precision fails with certain topologies.
>
> We tried the GNU compilers (gcc, g77) and the INTEL compilers (icc,
> ifort).
>
> Please help?
>
> PS)
> On our topspin/cisco infiniband stack with their mpi implementation
> there are no problems.
>

Just an update: I tried mvapich-0.9.9-beta with openib and blacs, and
this runs without any problems.

From huanwei at cse.ohio-state.edu Thu Mar 8 09:44:16 2007
From: huanwei at cse.ohio-state.edu (wei huang)
Date: Thu Mar 8 09:44:45 2007
Subject: [mvapich-discuss] mvapich2-0.9.8 problem with rsh (fwd)
In-Reply-To:
Message-ID:

Hi Christian,

Thanks for using mvapich2. We will try to reproduce it on our system and
get back to you.

Thanks.

-- Wei

From basv at sara.nl Fri Mar 9 02:55:03 2007
From: basv at sara.nl (Bas van der Vlies)
Date: Fri Mar 9 02:55:42 2007
Subject: [mvapich-discuss] mvapich2-0.9.8 blacs problems
In-Reply-To: <45F019AF.2050003@sara.nl>
References: <45F007FF.30901@sara.nl> <45F019AF.2050003@sara.nl>
Message-ID: <45F112D7.5010504@sara.nl>

We also tried the MPICH2 version 1.0.5p3 without any problems. I will
now test the mvapich2 trunk version and I will let you know.

Regards

From basv at sara.nl Fri Mar 9 03:42:06 2007
From: basv at sara.nl (Bas van der Vlies)
Date: Fri Mar 9 03:42:37 2007
Subject: [mvapich-discuss] mvapich2-0.9.8 blacs problems
In-Reply-To: <45F112D7.5010504@sara.nl>
References: <45F007FF.30901@sara.nl> <45F019AF.2050003@sara.nl> <45F112D7.5010504@sara.nl>
Message-ID: <45F11DDE.8000002@sara.nl>

Just tested the MVAPICH2 trunk version and this fails also at the double
precision tests. I am clueless at this point. Are there sites using
blacs/scalapack with mvapich2? We are going back to MVAPICH1 for blacs
and scalapack.

{{{
INTEGER BSBR TESTS: BEGIN.
INTEGER BSBR TESTS: 9600 TESTS; 3600 PASSED, 6000 SKIPPED, 0 FAILED.

REAL BSBR TESTS: BEGIN.
REAL BSBR TESTS: 9600 TESTS; 3600 PASSED, 6000 SKIPPED, 0 FAILED.

DOUBLE PRECISION BSBR TESTS: BEGIN.
PROCESS { 0, 0} REPORTS ERRORS IN TEST# 1370:
   Invalid element at A( 2, 1):
   Expected=-.1559522301235248 ; Received=-.2000000000000000E-01
   Invalid element at A( 3, 1):
   Expected=-.2849901817584808 ; Received=-.2000000000000000E-01
   Invalid element at A( 4, 1):
   Expected=-.1751754689686109 ; Received=-.2000000000000000E-01
   Invalid element at A( 5, 1):
   Expected=-.8236521034076603 ; Received=-.2000000000000000E-01
}}}

From panda at cse.ohio-state.edu Fri Mar 9 07:56:28 2007
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri Mar 9 07:56:57 2007
Subject: [mvapich-discuss] mvapich2-0.9.8 blacs problems
In-Reply-To: <45F11DDE.8000002@sara.nl> from "Bas van der Vlies" at Mar 09, 2007 09:42:06 AM
Message-ID: <200703091256.l29CuSUq003049@xi.cse.ohio-state.edu>

Hi,

> Just tested the MVAPICH2 trunk version and this fails also at the double
> precision tests.
I am clueless at this point. Are there sites using > blacs/scalapack with mvapich2? Thanks for reporting these issues. Since yesterday, we have been looking at this issue. We hope to keep you updated on our findings and fixes as soon as possible. Best Regards, DK > > we going back to MVAPICH1 for blacs and acalapack. > > {{{ > INTEGER BSBR TESTS: BEGIN. > INTEGER BSBR TESTS: 9600 TESTS; 3600 PASSED, 6000 SKIPPED, 0 FAILED. > > > REAL BSBR TESTS: BEGIN. > REAL BSBR TESTS: 9600 TESTS; 3600 PASSED, 6000 SKIPPED, 0 FAILED. > > > DOUBLE PRECISION BSBR TESTS: BEGIN. > > PROCESS { 0, 0} REPORTS ERRORS IN TEST# 1370: > Invalid element at A( 2, 1): > Expected=-.1559522301235248 ; Received=-.2000000000000000E-01 > Invalid element at A( 3, 1): > Expected=-.2849901817584808 ; Received=-.2000000000000000E-01 > Invalid element at A( 4, 1): > Expected=-.1751754689686109 ; Received=-.2000000000000000E-01 > Invalid element at A( 5, 1): > Expected=-.8236521034076603 ; Received=-.2000000000000000E-01 > }}} > > -- > ******************************************************************** > * * > * Bas van der Vlies e-mail: basv@sara.nl * > * SARA - Academic Computing Services phone: +31 20 592 8012 * > * Kruislaan 415 fax: +31 20 6683167 * > * 1098 SJ Amsterdam * > * * > ******************************************************************** > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From basv at sara.nl Fri Mar 9 08:43:21 2007 From: basv at sara.nl (Bas van der Vlies) Date: Fri Mar 9 08:44:03 2007 Subject: [mvapich-discuss] mvapich2-0.9.8 blacs problems In-Reply-To: <200703091256.l29CuSUq003049@xi.cse.ohio-state.edu> References: <200703091256.l29CuSUq003049@xi.cse.ohio-state.edu> Message-ID: <45F16479.90201@sara.nl> Dhabaleswar Panda wrote: > Hi, > >>>>> SARA is testing the openib stack with mvapich2. 
>>>>> >>>>> We have problems with installing the blacs library >>>>> (www.netlib.org/blacs) it compiles and links correct, some tests namely: >>>>> * The BSBR test with double precision fails with certain topologies. >>>>> >>>>> We tried the GNU compilers (gcc, g77) and the INTEL compilers (icc, >>>>> ifort). >>>>> >>>>> Please help? >>>>> >>>>> PS) >>>>> On our topspin/cisco infiniband stack with their mpi implementation >>>>> there are no problems. >>>>> >>>> Just an update i tried mvapich-0.9.9-beta with openib and blacs. and >>>> this runs without any problems. >>>> >>>> >>> We also tried the MPICH2 version 1.0.5p3 without any problems. I will >>> now test the mvapich2 trunk version and i will let you know. >>> >>> Regards >>> >> Just tested the MVAPICH2 trunk version and this fails also at the double >> precision tests. I am clueless at this point. Are there sites using >> blacs/scalapack with mvapich2? > > Thanks for reporting these issues. Since yesterday, we have been > looking at this issue. We hope to keep you updated on our findings and > fixes as soon as possible. > > Best Regards, > Thanks we also trying to find it. But first i have to roll back to mvapich1 to solve the problem -- ******************************************************************** * * * Bas van der Vlies e-mail: basv@sara.nl * * SARA - Academic Computing Services phone: +31 20 592 8012 * * Kruislaan 415 fax: +31 20 6683167 * * 1098 SJ Amsterdam * * * ******************************************************************** From basv at sara.nl Fri Mar 9 10:29:59 2007 From: basv at sara.nl (Bas van der Vlies) Date: Fri Mar 9 10:30:36 2007 Subject: [mvapich-discuss] mvapich1-0.9.9-beta suggestion for F90 Message-ID: <45F17D77.9070405@sara.nl> Hello, I could not build the fortran90 mpich modules because we got an error: {{{ checking whether Fortran 90 is compatible with Fortran 77... no configure: error: Fortran 90 and Fortran 77 compilers are not compatible. 
They generate external symbol names that are different. }}} the following F90 flag solves the problem: F90=gfortran F90FLAGS="-ff2c" ./make.mvapich.gen2 Just a tip ;-) -- ******************************************************************** * * * Bas van der Vlies e-mail: basv@sara.nl * * SARA - Academic Computing Services phone: +31 20 592 8012 * * Kruislaan 415 fax: +31 20 6683167 * * 1098 SJ Amsterdam * * * ******************************************************************** From surs at cse.ohio-state.edu Fri Mar 9 11:03:03 2007 From: surs at cse.ohio-state.edu (Sayantan Sur) Date: Fri Mar 9 11:03:32 2007 Subject: [mvapich-discuss] mvapich1-0.9.9-beta suggestion for F90 In-Reply-To: <45F17D77.9070405@sara.nl> References: <45F17D77.9070405@sara.nl> Message-ID: <20070309160300.GB18726@cse.ohio-state.edu> Hello, * On Mar,1 Bas van der Vlies wrote : > Hello, > > I could not build the fortran90 mpich modules because we got an error: > {{{ > checking whether Fortran 90 is compatible with Fortran 77... no > configure: error: Fortran 90 and Fortran 77 compilers are not compatible. > They generate external symbol names that are different. > }}} > > the following F90 flag solves the problem: > F90=gfortran F90FLAGS="-ff2c" ./make.mvapich.gen2 > > Just a tip ;-) Thanks for this tip! We'll include it in our troubleshooting guide. Thanks, Sayantan. 
> > -- > ******************************************************************** > * * > * Bas van der Vlies e-mail: basv@sara.nl * > * SARA - Academic Computing Services phone: +31 20 592 8012 * > * Kruislaan 415 fax: +31 20 6683167 * > * 1098 SJ Amsterdam * > * * > ******************************************************************** > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss -- http://www.cse.ohio-state.edu/~surs From rowland at cse.ohio-state.edu Fri Mar 9 12:22:21 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Fri Mar 9 12:23:55 2007 Subject: [mvapich-discuss] mvapich1-0.9.9-beta suggestion for F90 In-Reply-To: <45F17D77.9070405@sara.nl> References: <45F17D77.9070405@sara.nl> Message-ID: <45F197CD.9000909@cse.ohio-state.edu> Bas van der Vlies wrote: > Hello, > > I could not build the fortran90 mpich modules because we got an error: > {{{ > checking whether Fortran 90 is compatible with Fortran 77... no > configure: error: Fortran 90 and Fortran 77 compilers are not compatible. > They generate external symbol names that are different. > }}} > > the following F90 flag solves the problem: > F90=gfortran F90FLAGS="-ff2c" ./make.mvapich.gen2 > > Just a tip ;-) What are you using for the F77 compiler? If you use gfortran for both, then I would think you wouldn't need F90FLAGS. The latest versions of gcc only have gfortran.
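[Archive note] To answer the question above, it can help to list which GNU Fortran compilers are actually on the PATH. The following is a minimal sketch, assuming a POSIX shell; the function name report_fortran_compilers is hypothetical and not part of MVAPICH:

```shell
# Print one line per compiler: its version banner, or a note that it is missing.
report_fortran_compilers() {
    for fc in g77 gfortran; do
        if command -v "$fc" >/dev/null 2>&1; then
            # GNU compilers print their version on the first line of --version.
            echo "$fc: $("$fc" --version | head -n 1)"
        else
            echo "$fc: not installed"
        fi
    done
}

report_fortran_compilers
```

If g77 and gfortran report different GCC release series, the symbol-name mismatch reported by configure becomes more plausible.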
-- Shaun Rowland rowland@cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From rowland at cse.ohio-state.edu Fri Mar 9 13:51:11 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Fri Mar 9 13:52:46 2007 Subject: [mvapich-discuss] mvapich1-0.9.9-beta suggestion for F90 In-Reply-To: <45F17D77.9070405@sara.nl> References: <45F17D77.9070405@sara.nl> Message-ID: <45F1AC9F.1000200@cse.ohio-state.edu> Bas van der Vlies wrote: > Hello, > > I could not build the fortran90 mpich modules because we got an error: > {{{ > checking whether Fortran 90 is compatible with Fortran 77... no > configure: error: Fortran 90 and Fortran 77 compilers are not compatible. > They generate external symbol names that are different. > }}} > > the following F90 flag solves the problem: > F90=gfortran F90FLAGS="-ff2c" ./make.mvapich.gen2 > > Just a tip ;-) Hi. I've looked into this some more. I think the F90FLAGS you mention might only be necessary depending on the versions of g77 and gfortran you are trying to use. On our systems, I can use g77 and gfortran (for F77 and F90) without setting F90FLAGS. I assume you are trying to do this and getting the error because the two compilers come from different versions of GCC. So, if the above works and lets you use g77 and gfortran, that's useful information. What I suggested previously was using gfortran for both F77 and F90, as later versions of GCC only have that. This will also mean the Fortran compilers are the same, so they should be "compatible". In order to do this, you need to set the following environment variables: F77=gfortran F90=gfortran F77_GETARGDECL=" " We will put something in the User Guide about this.
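[Archive note] The settings listed above could be applied roughly as follows. This is a sketch only, assuming a POSIX shell started in the MVAPICH source directory; the check for the build script is an addition for illustration and is not part of the original message:

```shell
# Use gfortran for both Fortran 77 and Fortran 90 so that both
# compilers generate the same external symbol names.
export F77=gfortran
export F90=gfortran
# As given in the message, F77_GETARGDECL is set to a single space.
export F77_GETARGDECL=" "

# Run the build script only if it is present (guard added for illustration).
if [ -x ./make.mvapich.gen2 ]; then
    ./make.mvapich.gen2
else
    echo "make.mvapich.gen2 not found; run this from the MVAPICH source directory" >&2
fi
```

With both variables pointing at gfortran, the F90FLAGS="-ff2c" workaround from the earlier message should not be needed.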
-- Shaun Rowland rowland@cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From basv at sara.nl Mon Mar 12 04:45:01 2007 From: basv at sara.nl (Bas van der Vlies) Date: Mon Mar 12 04:45:09 2007 Subject: [mvapich-discuss] mvapich1-0.9.9-beta suggestion for F90 In-Reply-To: <45F1AC9F.1000200@cse.ohio-state.edu> References: <45F17D77.9070405@sara.nl> <45F1AC9F.1000200@cse.ohio-state.edu> Message-ID: <45F5130D.1050408@sara.nl> Shaun Rowland wrote: > Bas van der Vlies wrote: >> Hello, >> >> I could not build the fortran90 mpich modules because we got an error: >> {{{ >> checking whether Fortran 90 is compatible with Fortran 77... no >> configure: error: Fortran 90 and Fortran 77 compilers are not compatible. >> They generate external symbol names that are different. >> }}} >> >> the following F90 flag solves the problem: >> F90=gfortran F90FLAGS="-ff2c" ./make.mvapich.gen2 >> >> Just a tip ;-) > > Hi. I've looked into this some more. I think the F90FLAGS you mention > might only be necessary depending on the version of g77 and gfortran you > are trying to use. On our systems, I can use g77 and gfortran (for F77 > and F90) without setting F90FLAGS. I assume you are trying to do this > and getting the error because the program should be from different > versions of GCC. So, if the above works and lets you use g77 and > gfortran, that's useful information. > > What I suggested previously was using gfortran for both F77 and F90 as > later version of GCC just have that. This also will mean the Fortran > compilers are the same, so they should be "compatible". In order to do > this, you need to set the following environment variables: > > F77=gfortran > F90=gfortran > F77_GETARGDECL=" " > > We will put something in the User Guide about this. 
Thanks, we are using debian etch standards: gfortran = 4.1.1 g77 = 3.4 -- ******************************************************************** * * * Bas van der Vlies e-mail: basv@sara.nl * * SARA - Academic Computing Services phone: +31 20 592 8012 * * Kruislaan 415 fax: +31 20 6683167 * * 1098 SJ Amsterdam * * * ******************************************************************** From basv at sara.nl Mon Mar 12 05:21:04 2007 From: basv at sara.nl (Bas van der Vlies) Date: Mon Mar 12 05:21:10 2007 Subject: [mvapich-discuss] mvapich1-0.9.9-beta suggestion for F90 In-Reply-To: <45F5130D.1050408@sara.nl> References: <45F17D77.9070405@sara.nl> <45F1AC9F.1000200@cse.ohio-state.edu> <45F5130D.1050408@sara.nl> Message-ID: <45F51B80.1050003@sara.nl> > > Thanks, we are using debian etch standards: > gfortran = 4.1.1 > g77 = 3.4 > > With mvapich2 we do not have that issue. Forgot to mention it. -- ******************************************************************** * * * Bas van der Vlies e-mail: basv@sara.nl * * SARA - Academic Computing Services phone: +31 20 592 8012 * * Kruislaan 415 fax: +31 20 6683167 * * 1098 SJ Amsterdam * * * ******************************************************************** From david-m at orbotech.com Tue Mar 13 11:20:11 2007 From: david-m at orbotech.com (David Minor) Date: Tue Mar 13 11:21:16 2007 Subject: [mvapich-discuss] Problem running directly on IB with ofed Message-ID: Hi, My jobs are running fine in IPoIB but I cant' get them to run straight over any of the ib direct protocols. I build mvapich2-0.9.8 with the following directives: ./configure --prefix=/usr/local/mvapich2 --enable-threads=multiple --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd --disable-romio --without-mpe I'm launching mpd using rsh and giving hostnames that I've mapped to the IP addresses of the HCA's that interest me. I then do an mpirun giving these same hostnames. 
I think the problem is that I'm using the standard hostnames which are mapped to ip addresses not LID or GID's of the HCA's, but I don't know what else to do. This is my first time trying to get things running in IB, all the tests seem to show my config is OK. Advice? Thanks, David Minor Orbotech -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20070313/fd902daa/attachment.html From vishnu at cse.ohio-state.edu Tue Mar 13 15:04:50 2007 From: vishnu at cse.ohio-state.edu (Abhinav Vishnu) Date: Tue Mar 13 15:04:53 2007 Subject: [mvapich-discuss] Problem running directly on IB with ofed In-Reply-To: Message-ID: Hi David, Thanks for trying MVAPICH2 and reporting the problem to us. The configuration file which you are using does not specify all the CFLAGS which are necessary for compiling MVAPICH2. To address this issues, we have provided a variety of compilation scripts in the top MVAPICH2 directory. As an example, make.mvapich2.gen script will help you compile MVAPICH2 with the OFED libraries. May i suggest you to use this script and report the results of your experimentation back to us. Thanks and regards, :- Abhinav ------------------------------- Abhinav Vishnu, Graduate Research Associate, Department Of Comp. Sc. & Engg. The Ohio State University. ------------------------------- On Tue, 13 Mar 2007, David Minor wrote: > Date: Tue, 13 Mar 2007 17:20:11 +0200 > From: David Minor > To: mvapich-discuss@cse.ohio-state.edu > Subject: [mvapich-discuss] Problem running directly on IB with ofed > > Hi, > > My jobs are running fine in IPoIB but I cant' get them to run straight > over any of the ib direct protocols. 
I build mvapich2-0.9.8 with the > following directives: > > ./configure --prefix=/usr/local/mvapich2 --enable-threads=multiple > --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd > --disable-romio --without-mpe > > > > I'm launching mpd using rsh and giving hostnames that I've mapped to the > IP addresses of the HCA's that interest me. > > I then do an mpirun giving these same hostnames. I think the problem is > that I'm using the standard hostnames which are mapped to ip addresses > not LID or GID's of the HCA's, but I don't know what else to do. This is > my first time trying to get things running in IB, all the tests seem to > show my config is OK. Advice? > > Thanks, > > David Minor > > Orbotech > > > > From vishnu at cse.ohio-state.edu Tue Mar 13 16:17:00 2007 From: vishnu at cse.ohio-state.edu (Abhinav Vishnu) Date: Tue Mar 13 16:19:07 2007 Subject: [mvapich-discuss] Problem running directly on IB with ofed In-Reply-To: References: Message-ID: <20070313201659.GA18399@cse.ohio-state.edu> Hi David, In my previous mail, I forgot to mention the references to the user guide, which should give you more information about compiling and running programs with MVAPICH2. For more details on compiling MVAPICH2, please refer to: http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html#x1-100004.4.1 For usage instructions, please refer to the following: http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html#x1-150005 There are various sections for easy troubleshooting, which are present at the following URL: http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html#x1-270007 Thanks again, :- Abhinav * On Mar,2 Abhinav Vishnu wrote : > Hi David, > > Thanks for trying MVAPICH2 and reporting the problem to us. > The configuration file which you are using does not specify all the CFLAGS > which are necessary for compiling MVAPICH2.
> To address this issues, we have provided a variety of compilation scripts > in the top MVAPICH2 directory. As an example, > make.mvapich2.gen script will help you compile MVAPICH2 with the OFED > libraries. > > May i suggest you to use this script and report the results of your > experimentation back to us. > > Thanks and regards, > > :- Abhinav > > ------------------------------- > Abhinav Vishnu, > Graduate Research Associate, > Department Of Comp. Sc. & Engg. > The Ohio State University. > ------------------------------- > > On Tue, 13 Mar 2007, David Minor wrote: > > > Date: Tue, 13 Mar 2007 17:20:11 +0200 > > From: David Minor > > To: mvapich-discuss@cse.ohio-state.edu > > Subject: [mvapich-discuss] Problem running directly on IB with ofed > > > > Hi, > > > > My jobs are running fine in IPoIB but I cant' get them to run straight > > over any of the ib direct protocols. I build mvapich2-0.9.8 with the > > following directives: > > > > ./configure --prefix=/usr/local/mvapich2 --enable-threads=multiple > > --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd > > --disable-romio --without-mpe > > > > > > > > I'm launching mpd using rsh and giving hostnames that I've mapped to the > > IP addresses of the HCA's that interest me. > > > > I then do an mpirun giving these same hostnames. I think the problem is > > that I'm using the standard hostnames which are mapped to ip addresses > > not LID or GID's of the HCA's, but I don't know what else to do. This is > > my first time trying to get things running in IB, all the tests seem to > > show my config is OK. Advice? 
> > > > Thanks, > > > > David Minor > > > > Orbotech > > > > > > > > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss From aliva at gatech.edu Tue Mar 13 16:36:36 2007 From: aliva at gatech.edu (Aliva Pattnaik) Date: Tue Mar 13 16:35:49 2007 Subject: [mvapich-discuss] How to make sure application is running on IB In-Reply-To: <20070313201659.GA18399@cse.ohio-state.edu> References: <20070313201659.GA18399@cse.ohio-state.edu> Message-ID: <45F70B54.5070109@gatech.edu> Hi, I am using mvapich2, compiled with the make.mvapich2.vapi script file, in which I provide the path to the ibgd library from Mellanox. The cluster that I am using has an InfiniBand network. But I am wondering how to make sure that my application is running over InfiniBand and not TCP/IP? Thank you very much for your answer, Aliva From huanwei at cse.ohio-state.edu Tue Mar 13 16:57:22 2007 From: huanwei at cse.ohio-state.edu (wei huang) Date: Tue Mar 13 16:57:24 2007 Subject: [mvapich-discuss] How to make sure application is running on IB In-Reply-To: <45F70B54.5070109@gatech.edu> Message-ID: Hi, You can just compile and run osu_latency.c in the osu_benchmarks directory and check the latency you observe. For InfiniBand you should definitely observe <10us for 1 byte messages (you will probably get far lower if your machines have PCI-Ex NICs). Otherwise something is wrong with the setup. Thanks. Regards, Wei Huang 774 Dreese Lab, 2015 Neil Ave, Dept. of Computer Science and Engineering Ohio State University OH 43210 Tel: (614)292-8501 On Tue, 13 Mar 2007, Aliva Pattnaik wrote: > Hi, > > I am using mvapich2 which is compiled using the make.mvapich2.vapi > script file where I am providing the path to ibgd library from mellanox > . The cluster that I am using has infiniband network.
But I am wondering > how to make sure that my application is running over infiniband not tcp/ip? > > Thank you very much for your answer, > Aliva > > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From Durga.Choudhury at drs-ss.com Tue Mar 13 17:15:01 2007 From: Durga.Choudhury at drs-ss.com (Choudhury, Durga) Date: Tue Mar 13 17:15:07 2007 Subject: [mvapich-discuss] How to make sure application is running on IB In-Reply-To: Message-ID: Hi Wei There has to be something a bit more reliable than this; for example, if the machine has 10G Ethernet running TCP/IP, I suspect the latencies will be comparable, if not lower. Regards Durga -----Original Message----- From: mvapich-discuss-bounces@cse.ohio-state.edu [mailto:mvapich-discuss-bounces@cse.ohio-state.edu] On Behalf Of wei huang Sent: Tuesday, March 13, 2007 4:57 PM To: Aliva Pattnaik Cc: mvapich-discuss@cse.ohio-state.edu Subject: Re: [mvapich-discuss] How to make sure application is running on IB Hi, You can just compile and run osu_latency.c in the osu_benchmarks directory and see the latency you observed. For Infiniband you should definitely observe <10us for 1 byte messages (you probabaly will get far lower if your machines are with PCI-Ex NIC). Otherwise something is wrong with the setup. Thanks. Regards, Wei Huang 774 Dreese Lab, 2015 Neil Ave, Dept. of Computer Science and Engineering Ohio State University OH 43210 Tel: (614)292-8501 On Tue, 13 Mar 2007, Aliva Pattnaik wrote: > Hi, > > I am using mvapich2 which is compiled using the make.mvapich2.vapi > script file where I am providing the path to ibgd library from mellanox > . The cluster that I am using has infiniband network. But I am wondering > how to make sure that my application is running over infiniband not tcp/ip? 
> > Thank you very much for your answer, > Aliva > > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > _______________________________________________ mvapich-discuss mailing list mvapich-discuss@cse.ohio-state.edu http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss From david-m at orbotech.com Wed Mar 14 03:22:07 2007 From: david-m at orbotech.com (David Minor) Date: Wed Mar 14 03:23:16 2007 Subject: [mvapich-discuss] Problem running directly on IB with ofed Message-ID: Hi Abhinav, But these are precisely the config results I get from running make.mvapich2.gen2! The only thing I changed was setting MULTI_THREAD to "yes" in my environment. The link you sent me to documentation mentions a make.mvapich2.ofa script which doesn't exist in the 0.9.8 distribution. I have a feeling I'm not giving the correct hostnames or I didn't map them correctly in /etc/hosts. I assigned them like this: 10.1.111.23 student1 student1.orbotech.com 2.2.2.1 student1_ib0 2.2.2.2 student2_ib0 That is the 2.2.2.? numbers are the ib0 devices in ifconfig. Am I supposed to use ib LID or GID addresses here? Is there a way to tell the system to map the ip addresses straight to an ib device, instead of ipoib? Please advise. Regards, David -----Original Message----- From: Abhinav Vishnu [mailto:vishnu@cse.ohio-state.edu] Sent: Tuesday, March 13, 2007 9:05 PM To: David Minor Cc: mvapich-discuss@cse.ohio-state.edu Subject: Re: [mvapich-discuss] Problem running directly on IB with ofed Hi David, Thanks for trying MVAPICH2 and reporting the problem to us. The configuration file which you are using does not specify all the CFLAGS which are necessary for compiling MVAPICH2. To address this issues, we have provided a variety of compilation scripts in the top MVAPICH2 directory. 
As an example, make.mvapich2.gen script will help you compile MVAPICH2 with the OFED libraries. May i suggest you to use this script and report the results of your experimentation back to us. Thanks and regards, :- Abhinav ------------------------------- Abhinav Vishnu, Graduate Research Associate, Department Of Comp. Sc. & Engg. The Ohio State University. ------------------------------- On Tue, 13 Mar 2007, David Minor wrote: > Date: Tue, 13 Mar 2007 17:20:11 +0200 > From: David Minor > To: mvapich-discuss@cse.ohio-state.edu > Subject: [mvapich-discuss] Problem running directly on IB with ofed > > Hi, > > My jobs are running fine in IPoIB but I cant' get them to run straight > over any of the ib direct protocols. I build mvapich2-0.9.8 with the > following directives: > > ./configure --prefix=/usr/local/mvapich2 --enable-threads=multiple > --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd > --disable-romio --without-mpe > > > > I'm launching mpd using rsh and giving hostnames that I've mapped to the > IP addresses of the HCA's that interest me. > > I then do an mpirun giving these same hostnames. I think the problem is > that I'm using the standard hostnames which are mapped to ip addresses > not LID or GID's of the HCA's, but I don't know what else to do. This is > my first time trying to get things running in IB, all the tests seem to > show my config is OK. Advice? > > Thanks, > > David Minor > > Orbotech > > > > From david-m at orbotech.com Wed Mar 14 07:09:23 2007 From: david-m at orbotech.com (David Minor) Date: Wed Mar 14 07:10:32 2007 Subject: [mvapich-discuss] Problem running directly on IB with ofed Message-ID: Hi Abhinav, I've got it MULTI_THREADED was the problem! I take it the openib driver is not multi-threaded and so when you do MPI::Init with the MULTIPLE_THREAD options it chooses the ipoib driver. Now does this mean that I have NO thread-safety whatsoever for MPI calls? Is it possible to compile openib drivers thread-safe? 
I've got a lot of questions about this. Regards, David -----Original Message----- From: mvapich-discuss-bounces@cse.ohio-state.edu [mailto:mvapich-discuss-bounces@cse.ohio-state.edu] On Behalf Of David Minor Sent: Wednesday, March 14, 2007 9:22 AM To: Abhinav Vishnu Cc: mvapich-discuss@cse.ohio-state.edu Subject: RE: [mvapich-discuss] Problem running directly on IB with ofed Hi Abhinav, But these are precisely the config results I get from running make.mvapich2.gen2! The only thing I changed was setting MULTI_THREAD to "yes" in my environment. The link you sent me to documentation mentions a make.mvapich2.ofa script which doesn't exist in the 0.9.8 distribution. I have a feeling I'm not giving the correct hostnames or I didn't map them correctly in /etc/hosts. I assigned them like this: 10.1.111.23 student1 student1.orbotech.com 2.2.2.1 student1_ib0 2.2.2.2 student2_ib0 That is the 2.2.2.? numbers are the ib0 devices in ifconfig. Am I supposed to use ib LID or GID addresses here? Is there a way to tell the system to map the ip addresses straight to an ib device, instead of ipoib? Please advise. Regards, David -----Original Message----- From: Abhinav Vishnu [mailto:vishnu@cse.ohio-state.edu] Sent: Tuesday, March 13, 2007 9:05 PM To: David Minor Cc: mvapich-discuss@cse.ohio-state.edu Subject: Re: [mvapich-discuss] Problem running directly on IB with ofed Hi David, Thanks for trying MVAPICH2 and reporting the problem to us. The configuration file which you are using does not specify all the CFLAGS which are necessary for compiling MVAPICH2. To address this issues, we have provided a variety of compilation scripts in the top MVAPICH2 directory. As an example, make.mvapich2.gen script will help you compile MVAPICH2 with the OFED libraries. May i suggest you to use this script and report the results of your experimentation back to us. Thanks and regards, :- Abhinav ------------------------------- Abhinav Vishnu, Graduate Research Associate, Department Of Comp. Sc. 
& Engg. The Ohio State University. ------------------------------- On Tue, 13 Mar 2007, David Minor wrote: > Date: Tue, 13 Mar 2007 17:20:11 +0200 > From: David Minor > To: mvapich-discuss@cse.ohio-state.edu > Subject: [mvapich-discuss] Problem running directly on IB with ofed > > Hi, > > My jobs are running fine in IPoIB but I cant' get them to run straight > over any of the ib direct protocols. I build mvapich2-0.9.8 with the > following directives: > > ./configure --prefix=/usr/local/mvapich2 --enable-threads=multiple > --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd > --disable-romio --without-mpe > > > > I'm launching mpd using rsh and giving hostnames that I've mapped to the > IP addresses of the HCA's that interest me. > > I then do an mpirun giving these same hostnames. I think the problem is > that I'm using the standard hostnames which are mapped to ip addresses > not LID or GID's of the HCA's, but I don't know what else to do. This is > my first time trying to get things running in IB, all the tests seem to > show my config is OK. Advice? > > Thanks, > > David Minor > > Orbotech > > > > _______________________________________________ mvapich-discuss mailing list mvapich-discuss@cse.ohio-state.edu http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss From koop at cse.ohio-state.edu Wed Mar 14 09:44:06 2007 From: koop at cse.ohio-state.edu (Matthew Koop) Date: Wed Mar 14 09:44:11 2007 Subject: [mvapich-discuss] Problem running directly on IB with ofed In-Reply-To: Message-ID: > I've got it MULTI_THREADED was the problem! I take it the openib > driver is not multi-threaded and so when you do MPI::Init with the > MULTIPLE_THREAD options it chooses the ipoib driver. Now does this > mean that I have NO thread-safety whatsoever for MPI calls? Is it > possible to compile openib drivers thread-safe? I've got a lot of > questions about this. Regards, David David, Sorry to see the trouble you've been having. 
There was an issue with multithreading and some optimizations we had made
for collectives using shared memory. We have since fixed this; however, the
main tarball has not yet been updated. Can you try this updated version?

You can download an updated tarball from the following location:

http://mvapich.cse.ohio-state.edu/nightly/mvapich2/branches/0.9.8/mvapich2-0.9.8-2007-03-13.tar.gz

Or, you can make the following change in your existing copy of
make.mvapich2.gen2 (line 118):

COLL_FLAG="-D_SHMEM_COLL_"

to

COLL_FLAG="-"

The OpenFabrics drivers are thread-safe. At this point they are not
fork-safe though. We have tested with multi-threaded MPI tests and things
should operate correctly. Let us know if this does not solve your issue.

Thanks,

Matt

From aquarijen at gmail.com Wed Mar 14 11:43:08 2007
From: aquarijen at gmail.com (Aquarijen)
Date: Wed Mar 14 11:43:12 2007
Subject: [mvapich-discuss] Can't read MPIRUN_PROCESSES
Message-ID: <2e5ad1b10703140843v768171acod2d714d055cc240c@mail.gmail.com>

Hi All,

As you know, I had some difficulty getting mvapich 0.9.8 to work on my
system, so I decided to try 0.9.9. I upgraded OFED to 1.1 and installed
mvapich 0.9.9 with no problems. I can use the mpicc from mvapich to compile
code without any errors.

The problem comes when I try to run a job. Even cpi reports the following
error for each processor requested:

Can't read MPIRUN_PROCESSES

What am I doing wrong? I'd be happy to provide any additional information
about my cluster that you need.

Thank You!!!!

Frustrated Newbie,
Jennifer

-- 
When I play with my cat, who knows whether she is not amusing herself with me more than I with her.
Michel de Montaigne From koop at cse.ohio-state.edu Wed Mar 14 12:43:37 2007 From: koop at cse.ohio-state.edu (Matthew Koop) Date: Wed Mar 14 12:43:40 2007 Subject: [mvapich-discuss] Can't read MPIRUN_PROCESSES In-Reply-To: <2e5ad1b10703140843v768171acod2d714d055cc240c@mail.gmail.com> Message-ID: Jennifer, > As you know I had had some difficulty with getting mvapich 0.9.8 to > work on my system. So I decided to try 0.9.9. I upgraded OFED to 1.1 > and installed mvapich 0.9.9 with no problems. I can use the mpicc > from mvapich to compile code without any errors. > > The problem comes when I try to run a job. Even cpi reports the > following error for each processor requested: > > Can't read MPIRUN_PROCESSES How are you launching the jobs? The problem you are describing sounds like you may be using the mpirun_rsh from 0.9.8 to launch a program compiled with 0.9.9. Can you check to see that you are using the 0.9.9 mpirun_rsh? Thanks, Matt From aquarijen at gmail.com Wed Mar 14 13:33:39 2007 From: aquarijen at gmail.com (Aquarijen) Date: Wed Mar 14 13:33:42 2007 Subject: [mvapich-discuss] Can't read MPIRUN_PROCESSES In-Reply-To: References: <2e5ad1b10703140843v768171acod2d714d055cc240c@mail.gmail.com> Message-ID: <2e5ad1b10703141033l4df1478ds507cad2c3433e74b@mail.gmail.com> Yes, I've double checked - I am using mpirun from 0.9.9. Incidentally, I get the same exact error when I run using "mpiexec -comm mpich-ib ./cpi" Maybe I have some other problem with my setup...? If I do a ibstat on each of the nodes, all of them output Active and LinkUp... Thanks for responding, Matt. -Jen On 3/14/07, Matthew Koop wrote: > > Jennifer, > > > As you know I had had some difficulty with getting mvapich 0.9.8 to > > work on my system. So I decided to try 0.9.9. I upgraded OFED to 1.1 > > and installed mvapich 0.9.9 with no problems. I can use the mpicc > > from mvapich to compile code without any errors. > > > > The problem comes when I try to run a job. 
Even cpi reports the
> > following error for each processor requested:
> >
> > Can't read MPIRUN_PROCESSES
>
> How are you launching the jobs? The problem you are describing sounds like
> you may be using the mpirun_rsh from 0.9.8 to launch a program compiled
> with 0.9.9. Can you check to see that you are using the 0.9.9 mpirun_rsh?
>
> Thanks,
>
> Matt
>
>
>

-- 
When I play with my cat, who knows whether she is not amusing herself with me more than I with her.
Michel de Montaigne

From dog at lanl.gov Wed Mar 14 13:26:53 2007
From: dog at lanl.gov (David Gunter)
Date: Wed Mar 14 13:41:35 2007
Subject: [mvapich-discuss] Building Panasis support for MVAPICH2, possible issue with included files
Message-ID:

I believe I have found a bug in the mvapich2 build process when enabling
Panasas file system support. I followed the advice in the README.romio file
but I was getting an error associated with being unable to find mpi.h.

Along with my other configure options, I've included the
"--enable-romio --with-file-system=panfs+ufs+nfs" part. I have also made
sure that CFLAGS contains a "-I/usr/include" since the panasas header files
are located there.

The build follows the configure set up as normal until it gets to building
the panfs bit:

...
compiling ROMIO in directory adio/ad_panfs
make[5]: Entering directory `/net/scratch1/dog/rpm/BUILD/mvapich2-0.9.8/src/mpi/romio/adio/ad_panfs'
/usr/bin/gcc -D_X86_64_ -D_SMP_ -DUSE_HEADER_CACHING -DONE_SIDED -DMPID_USE_SEQUENCE_NUMBERS -I/usr/local/ofed/include -I/usr/include -O2 -D_SHMEM_COLL_ -D_X86_64_ -D_SMP_ -DUSE_HEADER_CACHING -DONE_SIDED -DMPID_USE_SEQUENCE_NUMBERS -I/usr/local/ofed/include -I/usr/include -O2 -D_SHMEM_COLL_ -DFORTRANUNDERSCORE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -DHAVE_ROMIOCONF_H -I.
-I/net/scratch1/dog/rpm/BUILD/mvapich2-0.9.8/src/mpi/romio/adio/ad_panfs/../include -I../include -I../../include -I/opt/panfs/include -c ad_panfs.c
In file included from ad_panfs.h:15,
                 from ad_panfs.c:9:
/net/scratch1/dog/rpm/BUILD/mvapich2-0.9.8/src/mpi/romio/adio/ad_panfs/../include/adio.h:72:17: mpi.h: No such file or directory
In file included from /net/scratch1/dog/rpm/BUILD/mvapich2-0.9.8/src/mpi/romio/adio/ad_panfs/../include/adio.h:73,
                 from ad_panfs.h:15,
                 from ad_panfs.c:9:
../../include/mpio.h:13:17: mpi.h: No such file or directory
In file included from /net/scratch1/dog/rpm/BUILD/mvapich2-0.9.8/src/mpi/romio/adio/ad_panfs/../include/adio.h:73,
                 from ad_panfs.h:15,
                 from ad_panfs.c:9:
../../include/mpio.h:47: error: syntax error before "MPI_Datatype"
../../include/mpio.h:49: error: syntax error before "datatype"
../../include/mpio.h:119: error: syntax error before "char"
../../include/mpio.h:121: error: syntax error before "MPI_Info"
../../include/mpio.h:125: error: syntax error before "MPI_Group"
...
<100's of lines dealing with MPI_xxx defs deleted>

It appears that mvapich2-0.9.8/src/mpi/romio/adio/include/adio.h requires
mpi.h, but there is no mpi.h to be found. I changed a line in
mvapich2-0.9.8/src/mpi/romio/adio/ad_panfs/Makefile.in to point back to
mvapich2-0.9.8/src/include and it then completes without error.

Here is the change I made:

From:
INCLUDE_DIR = -I@MPI_INCLUDE_DIR@ -I${srcdir}/../include -I../include -I../../include -I/opt/panfs/include

To:
INCLUDE_DIR = -I@MPI_INCLUDE_DIR@ -I${srcdir}/../include -I../include -I../../include -I../../../../include -I/opt/panfs/include

Thanks,
david

-- 
David Gunter
HPC-4: HPC Environments: Parallel Tools Team
Los Alamos National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20070314/04ad3377/attachment.html From jonathan_follows at uk.ibm.com Wed Mar 14 14:50:33 2007 From: jonathan_follows at uk.ibm.com (Jonathan Follows) Date: Wed Mar 14 14:50:40 2007 Subject: [mvapich-discuss] Jonathan is out of the office on Wednesday Message-ID: I will be out of the office starting 03/14/2007 and will not return until 03/15/2007. From rowland at cse.ohio-state.edu Wed Mar 14 15:53:37 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Wed Mar 14 15:53:40 2007 Subject: [mvapich-discuss] Building Panasis support for MVAPICH2, possible issue with included files In-Reply-To: References: Message-ID: <45F852C1.5020409@cse.ohio-state.edu> David Gunter wrote: > > It appears that mvapich2-0.9.8/src/mpi/romio/adio/include/adio.h > requires mpi.h, but there is no mpi.h to be found. I changed a line in > mvapich2-0.9.8/src/mpi/romio/adio/ad_panfs/Makefile.in to point back to > mvapich2-0.9.8/src/include and it then completes without error. > > Here is the change I made: > > From: > INCLUDE_DIR = -I@MPI_INCLUDE_DIR@ -I${srcdir}/../include -I../include > -I../../include -I/opt/panfs/include > > To: > INCLUDE_DIR = -I@MPI_INCLUDE_DIR@ -I${srcdir}/../include -I../include > -I../../include -I../../../../include -I/opt/panfs > /include Hi David. Thank you very much for reporting this problem and providing a solution. I will apply this fix to our SVN for MVAPICH2. -- Shaun Rowland rowland@cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From huanwei at cse.ohio-state.edu Wed Mar 14 16:49:02 2007 From: huanwei at cse.ohio-state.edu (wei huang) Date: Wed Mar 14 16:49:05 2007 Subject: [mvapich-discuss] mvapich2-0.9.8 blacs problems In-Reply-To: <45F16479.90201@sara.nl> Message-ID: Hi, Thanks for letting us know the problem. We have generated a patch to address this problem, and have applied it to both the trunk and our svn 0.9.8 branch. 
Please feel free to update your local version. Thanks. Regards, Wei Huang 774 Dreese Lab, 2015 Neil Ave, Dept. of Computer Science and Engineering Ohio State University OH 43210 Tel: (614)292-8501 On Fri, 9 Mar 2007, Bas van der Vlies wrote: > Dhabaleswar Panda wrote: > > Hi, > > > >>>>> SARA is testing the openib stack with mvapich2. > >>>>> > >>>>> We have problems with installing the blacs library > >>>>> (www.netlib.org/blacs) it compiles and links correct, some tests namely: > >>>>> * The BSBR test with double precision fails with certain topologies. > >>>>> > >>>>> We tried the GNU compilers (gcc, g77) and the INTEL compilers (icc, > >>>>> ifort). > >>>>> > >>>>> Please help? > >>>>> > >>>>> PS) > >>>>> On our topspin/cisco infiniband stack with their mpi implementation > >>>>> there are no problems. > >>>>> > >>>> Just an update i tried mvapich-0.9.9-beta with openib and blacs. and > >>>> this runs without any problems. > >>>> > >>>> > >>> We also tried the MPICH2 version 1.0.5p3 without any problems. I will > >>> now test the mvapich2 trunk version and i will let you know. > >>> > >>> Regards > >>> > >> Just tested the MVAPICH2 trunk version and this fails also at the double > >> precision tests. I am clueless at this point. Are there sites using > >> blacs/scalapack with mvapich2? > > > > Thanks for reporting these issues. Since yesterday, we have been > > looking at this issue. We hope to keep you updated on our findings and > > fixes as soon as possible. > > > > Best Regards, > > > > Thanks we also trying to find it. 
But first i have to roll back to
> mvapich1 to solve the problem
>
>
>
> --
> ********************************************************************
> *                                                                  *
> * Bas van der Vlies                     e-mail: basv@sara.nl      *
> * SARA - Academic Computing Services    phone:  +31 20 592 8012   *
> * Kruislaan 415                         fax:    +31 20 6683167    *
> * 1098 SJ Amsterdam                                                *
> *                                                                  *
> ********************************************************************
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From aquarijen at gmail.com Thu Mar 15 12:51:45 2007
From: aquarijen at gmail.com (Aquarijen)
Date: Thu Mar 15 12:51:49 2007
Subject: [mvapich-discuss] New errors (was can't read MPIRUN_PROCESSES)
Message-ID: <2e5ad1b10703150951v3ec83823j5c3106eee9ba9998@mail.gmail.com>

Hi Matt,

Wow, Yes, that certainly changes things and I get different errors now.

In my error file:
-----------------------------------------
[0:b09n010.xxx.gov] Abort:
[2:b09n008.xxx.gov] Abort:
[1:b09n009.xxx.gov] Abort:
[3:b09n007.xxx.gov] Abort:
[b09n010.xxx.gov:0] Got completion with error IBV_WC_LOC_LEN_ERR, code=1 at line 2381 in file viacheck.c
[b09n007.xxx.gov:3] Got completion with error IBV_WC_LOC_LEN_ERR, code=1 at line 2381 in file viacheck.c
[3:b09n007.xxx.gov] Abort: [3] Got FATAL event IBV_EVENT_QP_LAST_WQE_REACHED, code=16 at line 2561 in file viacheck.c
[1:b09n009.xxx.gov] Abort: [1] Got FATAL event IBV_EVENT_QP_LAST_WQE_REACHED, code=16 at line 2561 in file viacheck.c
[b09n009.xxx.gov:1] Got completion with error IBV_WC_LOC_LEN_ERR, code=1 at line 2381 in file viacheck.c
[2:b09n008.xxx.gov] Abort: [2] Got FATAL event IBV_EVENT_QP_LAST_WQE_REACHED, code=16 at line 2561 in file viacheck.c
[b09n008.xxx.gov:2] Got completion with error IBV_WC_LOC_LEN_ERR, code=1 at line 2381 in file viacheck.c
done.
------------------------------------------------------------

Have you seen this before?
This is now similar to the problem that I had been having in 0.9.8 that I was unable to figure out. :( Thanks for all of your help!!! -Jennifer On 3/14/07, Matthew Koop wrote: > > > Maybe I should triple check from now on... > > I was using the "mpirun" from 0.9.9, but not the "mpirun_rsh" from > > 0.9.9. Changing this got rid of the error message. Now I get: > > OSU MVAPICH VERSION 0.9.9-SingleRail > > Build-ID: custom > > in the error file - and this seems better! > > But Now I get no output from the cpi or the hello++ program. :( This > > is what I get: > > > echo "Node file: $PBS_NODEFILE :" > > Jen, > > I think if you remove the "-v" from your mpirun_rsh line things should > work for you. Also, you may need to be a bit careful with the > $PBS_NODEFILE if you run more processes than node. With your example it > should be fine, however. > > Let me know if this works. > > Matt > > -- When I play with my cat, who knows whether she is not amusing herself with me more than I with her. Michel de Montaigne From aquarijen at gmail.com Thu Mar 15 16:31:59 2007 From: aquarijen at gmail.com (Aquarijen) Date: Thu Mar 15 16:32:17 2007 Subject: [mvapich-discuss] viacheck.c error? In-Reply-To: <45DE1024.4030009@cse.ohio-state.edu> References: <2e5ad1b10701191137y7e91389fv9561c47d207d61d7@mail.gmail.com> <20070119201522.GA19063@cse.ohio-state.edu> <2e5ad1b10701231238i594604e6w3c48fd23e053827a@mail.gmail.com> <45B6DF54.2070204@cse.ohio-state.edu> <2e5ad1b10702221306m1df8118avcd50cb579bf6b303@mail.gmail.com> <45DE0883.2060409@cse.ohio-state.edu> <2e5ad1b10702221339y2d4814cfx7341c6dccf04cf0b@mail.gmail.com> <45DE1024.4030009@cse.ohio-state.edu> Message-ID: <2e5ad1b10703151331v6a174f10nf1f765da5b217c41@mail.gmail.com> Hi Abhinav, Matt and Everyone, As was suggested as a starting point, I upgraded to OFED 1.1 and made sure that the IBV_EVENT_CLIENT_REREGISTER event was in verbs.h. I also installed mvapich 0.9.9, but had difficulty running any programs - including cpi. 
So I went back to 0.9.7 which is included with OFED 1.1. I am getting very similar errors as I was getting before the OFED upgrade. I can run cpi, cpip, hello++ and simpleio with no problems and I get expected output. None of the osu benchmarks work for me, however. All of the osu benchmarks give me the following error message: [0] Abort: [b09n010.oic.ornl.gov:0] Got completion with error, code=4 at line 2365 in file viacheck.c done. Has anyone seen this before? I'm puzzled. If it is a connectivity issue, how do I know, and why would cpi and the other very simple programs run? Non-ib batch programs using mpich run successfully. We use ssh for everything and have disabled rsh. The average user cannot ssh to a compute node from the head node, but, once on a compute node, he/she can ssh to other compute nodes without supplying passwords. For mpich, we use mpiexec and this works well for us, but we'd really like to use the infiniband. :( Please point me in the right direction if this is not a mvapich related problem. I administer this cluster, so asking my sysadmin is not an option. ;) Thanks so much for any assistance you can give! Jennifer On 2/22/07, Abhinav Vishnu wrote: > Hi jen, > > > I'm not 100% sure of what information will be most helpful, but the > > error output for osu_bw (as an example) is: > > -------------------------------------------------- > > Connection closed by 172.16.4.36^M > > [0] Abort: [b09n040.oic.ornl.gov:0] Got completion with error, code=1 > > at line 2355 in file viacheck.c > > done. > > --------------------------------------------------- > > > I think the problem is occuring, because your ssh connection > got terminated during the execution of the application. As a result, > any process which tries to communicate with the process present > on the node which died, it will get the "completion with error" during > data transmission. IMHO, your sysadmin should be able to help you > with respect to terminating ssh connection. 
> > > > > The BUILD_ID file of my ofed is: > > --------------------------------------------------------------------- > > OFED-1.0 > > > > openib-1.0 (REV=8031) > > # User space > > https://openib.org/svn/gen2/branches/1.0/src/userspace > > # Kernel space > > https://openib.org/svn/gen2/branches/1.0/ofed/tags/1.0/linux-kernel > > Git: > > ref: refs/heads/for-2.6.17 > > commit 959eb39297e8c82f61fbfc283ad4ff11c883bf1e > > > > # MPI > > mpi_osu-0.9.7-mlx2.1.0.tgz > > openmpi-1.1b1-1.src.rpm > > mpitests-1.0-0.src.rpm > > --------------------------------------------------------------------------- > > > > > > so that may be a problem - that it is ofed 1.0? > > > > > > ------------------------------------------------------------------------------- > > > > enum ibv_event_type { > > IBV_EVENT_CQ_ERR, > > IBV_EVENT_QP_FATAL, > > IBV_EVENT_QP_REQ_ERR, > > IBV_EVENT_QP_ACCESS_ERR, > > IBV_EVENT_COMM_EST, > > IBV_EVENT_SQ_DRAINED, > > IBV_EVENT_PATH_MIG, > > IBV_EVENT_PATH_MIG_ERR, > > IBV_EVENT_DEVICE_FATAL, > > IBV_EVENT_PORT_ACTIVE, > > IBV_EVENT_PORT_ERR, > > IBV_EVENT_LID_CHANGE, > > IBV_EVENT_PKEY_CHANGE, > > IBV_EVENT_SM_CHANGE, > > IBV_EVENT_SRQ_ERR, > > IBV_EVENT_SRQ_LIMIT_REACHED, > > IBV_EVENT_QP_LAST_WQE_REACHED > > }; > > ------------------------------------------------------------------------------------ > > > > > > Soooo, I assume my new mission is to get ofed 1.1? :} > > > > Yes, i guess this should be the safest bet. Please let us know > the outcome of your experimentation. > > Thanks, > > :- Abhinav > > Thanks!!!! > > Jen > > > > > > -- When I play with my cat, who knows whether she is not amusing herself with me more than I with her. Michel de Montaigne From koop at cse.ohio-state.edu Thu Mar 15 23:52:39 2007 From: koop at cse.ohio-state.edu (Matthew Koop) Date: Thu Mar 15 23:52:41 2007 Subject: [mvapich-discuss] viacheck.c error? In-Reply-To: <2e5ad1b10703151331v6a174f10nf1f765da5b217c41@mail.gmail.com> Message-ID: > Has anyone seen this before? I'm puzzled. 
If it is a connectivity
> issue, how do I know, and why would cpi and the other very simple
> programs run? Non-ib batch programs using mpich run successfully.

I'm not completely sure why the other programs are not showing the
problems -- perhaps it is related to the osu_benchmarks stressing the
network more than cpi, which is minimal in communication.

> We use ssh for everything and have disabled rsh. The average user
> cannot ssh to a compute node from the head node, but, once on a
> compute node, he/she can ssh to other compute nodes without supplying
> passwords. For mpich, we use mpiexec and this works well for us, but
> we'd really like to use the infiniband. :( Please point me in the
> right direction if this is not a mvapich related problem. I
> administer this cluster, so asking my sysadmin is not an option. ;)

This does not seem like an MVAPICH issue, so perhaps testing the
lower-level InfiniBand layer directly will be beneficial to rule out setup
and connectivity issues.

It'd be helpful if you could try running 'ibv_rc_pingpong', which is
shipped with OFED. That will allow us to see if the network between at
least two nodes is working properly. Start the server side on host_a first,
then point the client on host_b at it:

On host_a:
ibv_rc_pingpong

On host_b:
ibv_rc_pingpong host_a

Just a couple of other questions:
- What is the OS being used?
- What HCA type is being used? (Mellanox or Pathscale, PCI-Express or
PCI-X)

Thanks,

Matt

From basv at sara.nl Mon Mar 19 04:06:35 2007
From: basv at sara.nl (Bas van der Vlies)
Date: Mon Mar 19 04:06:48 2007
Subject: [mvapich-discuss] mvapich2-0.9.8 blacs problems
In-Reply-To:
References:
Message-ID: <45FE448B.5040507@sara.nl>

wei huang wrote:
> Hi,
>
> Thanks for letting us know the problem. We have generated a patch to
> address this problem, and have applied it to both the trunk and our svn
> 0.9.8 branch.
>
>

We have done some more tests and found some other problems using mvapich2
and blacs. These problems are encountered by user programs.
We get reports from our users that they get wrong answers from their
programs.

We have made a small Fortran (g77) program to illustrate a problem.
It calls the same ScaLAPACK routine a number of times. Independent of
the size of the problem, the program hangs after 8 or 31 iterations,
except when the number of processes is a square, e.g. 1x1, 2x2, ...

How to compile the program:
mpif77 -Wall -g -O0 -o scal scal.f -lscalapack -lfblacs -lcblacs -lblacs -llapack -latlas

The program expects on standard input:


for example:
echo '100 16 100' | mpiexec -n ./scal

Regards


PS) This program behaves correctly with the Topspin/Cisco software, which
is based on their InfiniBand stack and on an mvapich1 version.

We are going to test the program with mvapich1 from OSU.
-- 
********************************************************************
*                                                                  *
* Bas van der Vlies                     e-mail: basv@sara.nl      *
* SARA - Academic Computing Services    phone:  +31 20 592 8012   *
* Kruislaan 415                         fax:    +31 20 6683167    *
* 1098 SJ Amsterdam                                                *
*                                                                  *
********************************************************************
-------------- next part --------------
A non-text attachment was scrubbed...
Name: scal.f
Type: text/x-fortran
Size: 7095 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20070319/a3e6b856/scal.bin

From huanwei at cse.ohio-state.edu Mon Mar 19 09:22:35 2007
From: huanwei at cse.ohio-state.edu (wei huang)
Date: Mon Mar 19 09:22:38 2007
Subject: [mvapich-discuss] mvapich2-0.9.8 blacs problems
In-Reply-To: <45FE448B.5040507@sara.nl>
Message-ID:

Hi,

Thanks for letting us know the problem.

We will take a look at it and get back to you.

Thanks.

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501


On Mon, 19 Mar 2007, Bas van der Vlies wrote:

> wei huang wrote:
> > Hi,
> >
> > Thanks for letting us know the problem.
We have generated a patch to > > address this problem, and have applied it to both the trunk and our svn > > 0.9.8 branch. > > > > > We have done some more tests and found some other problem using mvapich2 > and blacs. This problems are encountered by user programs. We get > reports from our users that they get wrong answers from their programs. > > We have made a small fortran (g77) to illustrate a problem. > The calls a number of times the same scalapack routine. Independent of > the size of the problem the program hangs after 8 or 31 iterations > except when number of processes is a square, eg 1x1, 2x2, ... > > How to compile the program: > mpif77 -Wall -g -O0 -o scal scal.f -lscalapack -lfblacs -lcblacs -lblacs > -llapack -latlas > > The program expects on standard input: > > > for example: > echo '100 16 100' | mpiexec -n ./scal > > Regards > > > PS) This program behaves correctly with topspin/ciso software which is > based on their infiniband stack and bases on mvapich1 version. > > We gona test the program in mvapiach1 from OSU > -- > ******************************************************************** > * * > * Bas van der Vlies e-mail: basv@sara.nl * > * SARA - Academic Computing Services phone: +31 20 592 8012 * > * Kruislaan 415 fax: +31 20 6683167 * > * 1098 SJ Amsterdam * > * * > ******************************************************************** > From basv at sara.nl Mon Mar 19 09:47:28 2007 From: basv at sara.nl (Bas van der Vlies) Date: Mon Mar 19 09:47:41 2007 Subject: [mvapich-discuss] mvapich2-0.9.8 blacs problems In-Reply-To: References: Message-ID: <45FE9470.7090406@sara.nl> wei huang wrote: > Hi, > > Thanks for letting us know the problem. > > We will take a look at it and get back to you. > Wei, We have done some further testing with mvapich version: * 0.9.8 everything works * 0.9.9-beta it very slow and it hangs also like mvapich2 Regards > Thanks. > > Regards, > Wei Huang > > 774 Dreese Lab, 2015 Neil Ave, > Dept. 
of Computer Science and Engineering > Ohio State University > OH 43210 > Tel: (614)292-8501 > > > On Mon, 19 Mar 2007, Bas van der Vlies wrote: > >> wei huang wrote: >>> Hi, >>> >>> Thanks for letting us know the problem. We have generated a patch to >>> address this problem, and have applied it to both the trunk and our svn >>> 0.9.8 branch. >>> >>> >> We have done some more tests and found some other problem using mvapich2 >> and blacs. This problems are encountered by user programs. We get >> reports from our users that they get wrong answers from their programs. >> >> We have made a small fortran (g77) to illustrate a problem. >> The calls a number of times the same scalapack routine. Independent of >> the size of the problem the program hangs after 8 or 31 iterations >> except when number of processes is a square, eg 1x1, 2x2, ... >> >> How to compile the program: >> mpif77 -Wall -g -O0 -o scal scal.f -lscalapack -lfblacs -lcblacs -lblacs >> -llapack -latlas >> >> The program expects on standard input: >> >> >> for example: >> echo '100 16 100' | mpiexec -n ./scal >> >> Regards >> >> >> PS) This program behaves correctly with topspin/ciso software which is >> based on their infiniband stack and bases on mvapich1 version. 
>> >> We gona test the program in mvapiach1 from OSU >> -- >> ******************************************************************** >> * * >> * Bas van der Vlies e-mail: basv@sara.nl * >> * SARA - Academic Computing Services phone: +31 20 592 8012 * >> * Kruislaan 415 fax: +31 20 6683167 * >> * 1098 SJ Amsterdam * >> * * >> ******************************************************************** >> > -- ******************************************************************** * * * Bas van der Vlies e-mail: basv@sara.nl * * SARA - Academic Computing Services phone: +31 20 592 8012 * * Kruislaan 415 fax: +31 20 6683167 * * 1098 SJ Amsterdam * * * ******************************************************************** From panda at cse.ohio-state.edu Mon Mar 19 10:30:12 2007 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Mon Mar 19 10:30:17 2007 Subject: [mvapich-discuss] mvapich2-0.9.8 blacs problems In-Reply-To: <45FE9470.7090406@sara.nl> from "Bas van der Vlies" at Mar 19, 2007 02:47:28 PM Message-ID: <200703191430.l2JEUCHW006572@xi.cse.ohio-state.edu> Hi Bas, > We have done some further testing with mvapich version: > * 0.9.8 everything works > > * 0.9.9-beta it very slow and it hangs also like mvapich2 Thanks for reporting this. Just to check ... are you using the latest mvapich 0.9.9 from the trunk or the beta tarball released on 02/09/07. A lot of successive fixes and tunings have gone since the beta version was released. You can get the latest version of the trunk through SVN checkout or downloading the nightly tarballs of the trunk. Best Regards, DK > Regards > > > Thanks. > > > > Regards, > > Wei Huang > > > > 774 Dreese Lab, 2015 Neil Ave, > > Dept. of Computer Science and Engineering > > Ohio State University > > OH 43210 > > Tel: (614)292-8501 > > > > > > On Mon, 19 Mar 2007, Bas van der Vlies wrote: > > > >> wei huang wrote: > >>> Hi, > >>> > >>> Thanks for letting us know the problem. 
We have generated a patch to > >>> address this problem, and have applied it to both the trunk and our svn > >>> 0.9.8 branch. > >>> > >>> > >> We have done some more tests and found some other problem using mvapich2 > >> and blacs. This problems are encountered by user programs. We get > >> reports from our users that they get wrong answers from their programs. > >> > >> We have made a small fortran (g77) to illustrate a problem. > >> The calls a number of times the same scalapack routine. Independent of > >> the size of the problem the program hangs after 8 or 31 iterations > >> except when number of processes is a square, eg 1x1, 2x2, ... > >> > >> How to compile the program: > >> mpif77 -Wall -g -O0 -o scal scal.f -lscalapack -lfblacs -lcblacs -lblacs > >> -llapack -latlas > >> > >> The program expects on standard input: > >> > >> > >> for example: > >> echo '100 16 100' | mpiexec -n ./scal > >> > >> Regards > >> > >> > >> PS) This program behaves correctly with topspin/ciso software which is > >> based on their infiniband stack and bases on mvapich1 version. 
> >> > >> We gona test the program in mvapiach1 from OSU > >> -- > >> ******************************************************************** > >> * * > >> * Bas van der Vlies e-mail: basv@sara.nl * > >> * SARA - Academic Computing Services phone: +31 20 592 8012 * > >> * Kruislaan 415 fax: +31 20 6683167 * > >> * 1098 SJ Amsterdam * > >> * * > >> ******************************************************************** > >> > > > > > -- > ******************************************************************** > * * > * Bas van der Vlies e-mail: basv@sara.nl * > * SARA - Academic Computing Services phone: +31 20 592 8012 * > * Kruislaan 415 fax: +31 20 6683167 * > * 1098 SJ Amsterdam * > * * > ******************************************************************** > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From basv at sara.nl Mon Mar 19 10:44:57 2007 From: basv at sara.nl (Bas van der Vlies) Date: Mon Mar 19 10:45:09 2007 Subject: [mvapich-discuss] mvapich2-0.9.8 blacs problems In-Reply-To: <200703191430.l2JEUCHW006572@xi.cse.ohio-state.edu> References: <200703191430.l2JEUCHW006572@xi.cse.ohio-state.edu> Message-ID: <45FEA1E9.50402@sara.nl> Dhabaleswar Panda wrote: > Hi Bas, > >> We have done some further testing with mvapich version: >> * 0.9.8 everything works >> >> * 0.9.9-beta it very slow and it hangs also like mvapich2 > > Thanks for reporting this. Just to check ... are you using the latest > mvapich 0.9.9 from the trunk or the beta tarball released on 02/09/07. > A lot of successive fixes and tunings have gone since the beta version > was released. You can get the latest version of the trunk through SVN > checkout or downloading the nightly tarballs of the trunk. > I downloaded the tarball. We will test the latest trunk version. Regards > Best Regards, > > DK > >> Regards >> >>> Thanks. 
>>> >>> Regards, >>> Wei Huang >>> >>> 774 Dreese Lab, 2015 Neil Ave, >>> Dept. of Computer Science and Engineering >>> Ohio State University >>> OH 43210 >>> Tel: (614)292-8501 >>> >>> >>> On Mon, 19 Mar 2007, Bas van der Vlies wrote: >>> >>>> wei huang wrote: >>>>> Hi, >>>>> >>>>> Thanks for letting us know the problem. We have generated a patch to >>>>> address this problem, and have applied it to both the trunk and our svn >>>>> 0.9.8 branch. >>>>> >>>>> >>>> We have done some more tests and found some other problem using mvapich2 >>>> and blacs. This problems are encountered by user programs. We get >>>> reports from our users that they get wrong answers from their programs. >>>> >>>> We have made a small fortran (g77) to illustrate a problem. >>>> The calls a number of times the same scalapack routine. Independent of >>>> the size of the problem the program hangs after 8 or 31 iterations >>>> except when number of processes is a square, eg 1x1, 2x2, ... >>>> >>>> How to compile the program: >>>> mpif77 -Wall -g -O0 -o scal scal.f -lscalapack -lfblacs -lcblacs -lblacs >>>> -llapack -latlas >>>> >>>> The program expects on standard input: >>>> >>>> >>>> for example: >>>> echo '100 16 100' | mpiexec -n ./scal >>>> >>>> Regards >>>> >>>> >>>> PS) This program behaves correctly with topspin/ciso software which is >>>> based on their infiniband stack and bases on mvapich1 version. 
>>>>
>>>> We are going to test the program with mvapich1 from OSU.

--
********************************************************************
*                                                                  *
*  Bas van der Vlies                      e-mail: basv@sara.nl     *
*  SARA - Academic Computing Services     phone:  +31 20 592 8012  *
*  Kruislaan 415                          fax:    +31 20 6683167   *
*  1098 SJ Amsterdam                                               *
*                                                                  *
********************************************************************

From mamidala at cse.ohio-state.edu Mon Mar 19 22:06:46 2007
From: mamidala at cse.ohio-state.edu (amith rajith mamidala)
Date: Mon Mar 19 22:06:51 2007
Subject: [mvapich-discuss] mvapich2-0.9.8 blacs problems
In-Reply-To: <45FEA1E9.50402@sara.nl>
Message-ID:

Hi Bas,

Can you please apply the one-line patches below and let us know the
outcome? I have tried a couple of cases and the patch is working fine.

Also, can you let us know the nature of this application (scal.f)?
It seems to be using several hundred MPI_Comm_split operations. Is this
the typical application pattern?

For mvapich-0.9.9-beta:

Index: create_2level_comm.c (In $HOME/src/context)
===================================================================
--- create_2level_comm.c        (revision 1102)
+++ create_2level_comm.c        (working copy)
@@ -56,7 +56,6 @@
     struct MPIR_COMMUNICATOR* comm_world_ptr;
     comm_world_ptr = MPIR_GET_COMM_PTR(MPI_COMM_WORLD);
 
-    if (comm_count > MAX_ALLOWED_COMM) return;
 
     int* shmem_group = malloc(sizeof(int) * size);
     if (NULL == shmem_group){


For mvapich2-0.9.8:

Index: create_2level_comm.c (In $HOME/src/mpi/comm)
===================================================================
--- create_2level_comm.c        (revision 1104)
+++ create_2level_comm.c        (working copy)
@@ -33,7 +33,6 @@
     MPID_Comm_get_ptr( comm, comm_ptr );
     MPID_Comm_get_ptr( MPI_COMM_WORLD, comm_world_ptr );
 
-    if (comm_count > MAX_ALLOWED_COMM) return;
 
     MPIR_Nest_incr();
 

Thanks,
Amith

From basv at sara.nl Tue Mar 20 03:42:45 2007
From: basv at sara.nl (Bas van der Vlies)
Date: Tue Mar 20 03:42:58 2007
Subject: [mvapich-discuss] mvapich2-0.9.8 blacs problems
In-Reply-To:
References:
Message-ID: <45FF9075.4050605@sara.nl>

amith rajith mamidala wrote:
> Hi Bas,
>
Hi Amith,

> Can you please apply the one line patches below and let us know the
> outcome? I have tried a couple of cases and the patch is working fine.
> Also, can you let us know the nature of this application (scal.f).
> It seems to be using several hundred of MPI_Comm_split operations. Is this
> the typical application pattern?
>
We have some users who do a lot of matrix calculations in runs that last a
long time. These programs hang, so we made a small example scalapack
program that behaves the same way. MPI_Comm_split is called by the blacs
library.

I have applied the patch for mvapich2-0.9.8 and it no longer hangs, but it
still gives errors. When the error occurs, and what kind of error it is,
depends on the size of the matrix.

The outcome on 4 nodes with 2 CPUs each
(number of processes : number of runs before the error):
  1 : 63
  2 : 63
  3 : 63
  4 : 16
  5 : 63
  6 : 16
  7 : 63
  8 : 16

after 16 runs (8 procs):
{{{
loop, n, mb, nprocs, nprow, npcol: 16 100 16 8 2 4
Fatal error in MPI_Comm_create: Other MPI error, error stack:
MPI_Comm_create(266): MPI_Comm_create(comm=0xc4000012, group=0xc80300ec, new_comm=0xbf9b2d64) failed
MPI_Comm_create(143): Too many communicators
Fatal error in MPI_Comm_create: Other MPI error, error stack:
MPI_Comm_create(266): MPI_Comm_create(comm=0xc4000012, group=0xc80300ec, new_comm=0xbfd450f4) failed
MPI_Comm_create(143): Too many communicators
Fatal error in MPI_Comm_create: Other MPI error, error stack:
MPI_Comm_create(266): MPI_Comm_create(comm=0xc4000002, group=0xc803006c, new_comm=0xbfdf39a4) failed
MPI_Comm_create(143): Too many communicators
Fatal error in MPI_Comm_create: Other MPI error, error stack:
MPI_Comm_create(266): MPI_Comm_create(comm=0xc4000002, group=0xc803006c, new_comm=0xbfc38ff4) failed
MPI_Comm_create(143): Too many communicators
Fatal error in MPI_Comm_create: Other MPI error, error stack:
MPI_Comm_create(266): MPI_Comm_create(comm=0xc4000012, group=0xc80300ec, new_comm=0xbfcfd0b4) failed
MPI_Comm_create(143): Too many communicators
Fatal error in MPI_Comm_create: Other MPI error, error stack:
MPI_Comm_create(266): MPI_Comm_create(comm=0xc4000002, group=0xc803006c, new_comm=0xbfcd6084) failed
MPI_Comm_create(143): Too many communicators
Fatal error in MPI_Comm_create: Other MPI error, error stack:
MPI_Comm_create(266): MPI_Comm_create(comm=0xc4000012, group=0xc80300ec, new_comm=0xbff62b14) failed
MPI_Comm_create(143): Too many communicators
Fatal error in MPI_Comm_create: Other MPI error, error stack:
MPI_Comm_create(266): MPI_Comm_create(comm=0xc4000002, group=0xc803006c, new_comm=0xbfb2fee4) failed
MPI_Comm_create(143): Too many communicators
rank 7 in job 8 ib-r6n20.irc.sara.nl_11382 caused collective abort of all ranks
  exit status of rank 7: killed by signal 9
rank 6 in job 8 ib-r6n20.irc.sara.nl_11382 caused collective abort of all ranks
  exit status of rank 6: killed by signal 9
rank 5 in job 8 ib-r6n20.irc.sara.nl_11382 caused collective abort of all ranks
  exit status of rank 5: killed by signal 9
rank 4 in job 8 ib-r6n20.irc.sara.nl_11382 caused collective abort of all ranks
  exit status of rank 4: killed by signal 9
end 8
}}}

after 63 runs (7 procs):
{{{
loop, n, mb, nprocs, nprow, npcol: 63 100 16 7 1 7
Fatal error in MPI_Comm_split: Other MPI error, error stack:
MPI_Comm_split(290).: MPI_Comm_split(comm=0xc40300f4, color=0, key=0, new_comm=0x13535864) failed
MPIR_Comm_create(90): Too many communicators
Fatal error in MPI_Comm_split: Other MPI error, error stack:
MPI_Comm_split(290).: MPI_Comm_split(comm=0xc40300f4, color=2, key=0, new_comm=0x13533734) failed
MPIR_Comm_create(90): Too many communicators
Fatal error in MPI_Comm_split: Other MPI error, error stack:
MPI_Comm_split(290).: MPI_Comm_split(comm=0xc401003c, color=2, key=1, new_comm=0x134fea44) failed
MPIR_Comm_create(90): Too many communicators
Fatal error in MPI_Comm_split: Other MPI error, error stack:
MPI_Comm_split(290).: MPI_Comm_split(comm=0xc401003c, color=1, key=1, new_comm=0x134fe404) failed
MPIR_Comm_create(90): Too many communicators
Fatal error in MPI_Comm_split: Other MPI error, error stack:
MPI_Comm_split(290).: MPI_Comm_split(comm=0xc40300f4, color=3, key=0, new_comm=0x135f3de4) failed
MPIR_Comm_create(90): Too many communicators
Fatal error in MPI_Comm_split: Other MPI error, error stack:
MPI_Comm_split(290).: MPI_Comm_split(comm=0xc40300f4, color=1, key=0, new_comm=0x13532184) failed
MPIR_Comm_create(90): Too many communicators
Fatal error in MPI_Comm_split: Other MPI error, error stack:
MPI_Comm_split(290).: MPI_Comm_split(comm=0xc401003c, color=0, key=1, new_comm=0x134fe404) failed
MPIR_Comm_create(90): Too many communicators
rank 6 in job 7 ib-r6n20.irc.sara.nl_11382 caused collective abort of all ranks
  exit status of rank 6: killed by signal 9
rank 5 in job 7 ib-r6n20.irc.sara.nl_11382 caused collective abort of all ranks
  exit status of rank 5: killed by signal 9
rank 4 in job 7 ib-r6n20.irc.sara.nl_11382 caused collective abort of all ranks
  exit status of rank 4: killed by signal 9
end 7
}}}

From basv at sara.nl Tue Mar 20 05:21:16 2007
From: basv at sara.nl (Bas van der Vlies)
Date: Tue Mar 20 05:21:23 2007
Subject:
[mvapich-discuss] mvapich2-0.9.8 blacs problems
In-Reply-To: <45FF9075.4050605@sara.nl>
References: <45FF9075.4050605@sara.nl>
Message-ID: <45FFA78C.3010603@sara.nl>

Bas van der Vlies wrote:
> amith rajith mamidala wrote:
>> Hi Bas,
>>
> Hi Amit,
>
Amith, the patch for mvapich1 0.9.9-trunk works ;-) and with the new 0.9.9
version from the svn trunk it is not slow anymore.

Regards

From luis.kornblueh at zmaw.de Tue Mar 20 12:00:50 2007
From: luis.kornblueh at zmaw.de (Luis Kornblueh)
Date: Tue Mar 20 12:00:55 2007
Subject: [mvapich-discuss] Problem building mvapich-0.9.9-beta2
Message-ID: <20070320160050.GA5672@landru.zmaw.de>

Hi,

I tried to compile mvapich-0.9.9-beta2 and ran into a problem (gcc 4.1.1):

gcc -DHAVE_CONFIG_H -I. -I/sw/src/mvapich/mvapich-0.9.9-beta2/mpid/vapi_multirail -I/sw/src/mvapich/mvapich-0.9.9-beta2/include -I/sw/src/mvapich/mvapich-0.9.9-beta2/include -I/sw/src/mvapich/mvapich-0.9.9-beta2/mpid/vapi_multirail -I/sw/src/mvapich/mvapich-0.9.9-beta2/mpid/util -DMPID_DEVICE_CODE -DHAVE_UNAME=1 -DHAVE_NETDB_H=1 -DHAVE_GETHOSTBYNAME=1 -DMPID_DEBUG_NONE -DMPID_STAT_NONE -D_X86_64_ -DUSE_INLINE -DUSE_MRAIL -DEARLY_SEND_COMPLETION -DRDMA_FAST_PATH -DVIADEV_RPUT_SUPPORT -DLAZY_MEM_UNREGISTER -D_SMP_ -D_SMP_RNDV_ -I/usr/include -O3 -DHAVE_MPICHCONF_H -I/sw/src/mvapich/mvapich-0.9.9-beta2 -I/sw/src/mvapich/mvapich-0.9.9-beta2/mpid/vapi_multirail -I. -Wall -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -c vbuf.c
vbuf.c:86: error: static declaration free_vbuf_ follows non-static declaration
vbuf.h:147: error: previous declaration free_vbuf_ was here

Cheerio,
Luis

--
                             \\\\\\
                             (-0^0-)
--------------------------oOO--(_)--OOo-----------------------------
 Luis Kornblueh                           Tel. : +49-40-41173289
 Max-Planck-Institute for Meteorology     Fax. : +49-40-41173298
 Bundesstr. 53
 D-20146 Hamburg                          Email: luis.kornblueh@zmaw.de
 Federal Republic of Germany

From vishnu at cse.ohio-state.edu Tue Mar 20 12:19:33 2007
From: vishnu at cse.ohio-state.edu (Abhinav Vishnu)
Date: Tue Mar 20 12:21:45 2007
Subject: [mvapich-discuss] Problem building mvapich-0.9.9-beta2
In-Reply-To: <20070320160050.GA5672@landru.zmaw.de>
References: <20070320160050.GA5672@landru.zmaw.de>
Message-ID: <20070320161933.GA13250@cse.ohio-state.edu>

Hi Luis,

Thanks for using MVAPICH-0.9.9 and reporting the problem. We will fix this
problem soon and notify you.

Thanks,

:- Abhinav

* On Mar,1 Luis Kornblueh wrote :
> Hi,
>
> I tried to compile mvapich-0.9.9-beta2 and did run into a problem (gcc 4.1.1):
>
> gcc -DHAVE_CONFIG_H -I. -I/sw/src/mvapich/mvapich-0.9.9-beta2/mpid/vapi_multirail -I/sw/src/mvapich/mvapich-0.9.9-beta2/include -I/sw/src/mvapich/mvapich-0.9.9-beta2/include -I/sw/src/mvapich/mvapich-0.9.9-beta2/mpid/vapi_multirail -I/sw/src/mvapich/mvapich-0.9.9-beta2/mpid/util -DMPID_DEVICE_CODE -DHAVE_UNAME=1 -DHAVE_NETDB_H=1 -DHAVE_GETHOSTBYNAME=1 -DMPID_DEBUG_NONE -DMPID_STAT_NONE -D_X86_64_ -DUSE_INLINE -DUSE_MRAIL -DEARLY_SEND_COMPLETION -DRDMA_FAST_PATH -DVIADEV_RPUT_SUPPORT -DLAZY_MEM_UNREGISTER -D_SMP_ -D_SMP_RNDV_ -I/usr/include -O3 -DHAVE_MPICHCONF_H -I/sw/src/mvapich/mvapich-0.9.9-beta2 -I/sw/src/mvapich/mvapich-0.9.9-beta2/mpid/vapi_multirail -I. -Wall -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -c vbuf.c
> vbuf.c:86: error: static declaration free_vbuf_ follows non-static declaration
> vbuf.h:147: error: previous declaration free_vbuf_ was here
>
> Cheerio,
> Luis
>
> --
> \\\\\\
> (-0^0-)
> ---------