From mhouston at graphics.stanford.edu Sun Apr 1 06:03:31 2007
From: mhouston at graphics.stanford.edu (Mike Houston)
Date: Sun Apr 1 09:16:47 2007
Subject: [mvapich-discuss] mvapich2 IB problems with transfers over ~10KB.
Message-ID: <460F8373.9070609@graphics.stanford.edu>

We've hit an odd snag with mvapich2. We can't seem to reliably send
messages > 10KB. If we break up all large messages into 8KB blocks and
send those, things work just fine, but as expected, performance is awful.
Under mpich2 with GigE and IPoIB, large messages seem to work just fine.
Both MPI_Send and MPI_Put seem to exhibit the same behavior. I should
note that the one oddity of our system implementation is that we have a
posted MPI_Irecv waiting while doing the large transfers. Open-MPI flips
out when we do this, even in TCP mode.

We have PCI-X SDR 4X boards, running the latest IB Gold release (1.8.3)
on top of the latest RHEL4 SMP x86 kernel (32-bit). The boards have
slightly older firmware, 3.3.3, but I'm hesitant to flash up unless there
are known issues with that firmware... We built using the defaults in
make.mvapich2.vapi. Any suggestions on where to look or what to update?
It seems *very* odd that large transfers aren't working for us...

Thanks!

-Mike

From huanwei at cse.ohio-state.edu Mon Apr 2 14:49:50 2007
From: huanwei at cse.ohio-state.edu (wei huang)
Date: Mon Apr 2 14:49:55 2007
Subject: [mvapich-discuss] mvapich2 IB problems with transfers over ~10KB.
In-Reply-To:
Message-ID:

Hi Mike,

Thanks for letting us know about the problem. However, to help us
understand what is going on, would you please let us know the following?

1) Which version of mvapich2 are you using? The latest release version
now should be mvapich2-0.9.8.

2) Could you try running osu_benchmarks and see if they all pass on your
system? The benchmarks are distributed with the package and are in the
`osu_benchmarks' directory. You should not experience problems with them
if your systems are set up correctly.

Thanks.
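For reference, the workaround described in the original report (splitting
every large message into 8KB blocks before sending) amounts to roughly the
following. This is only a sketch in Python, not code from the poster's
application: plan_blocks, send_in_blocks and the send_block callback are
hypothetical names, and send_block merely stands in for whatever transport
call (for example a wrapper around MPI_Send) is actually used.

```python
from typing import Callable, List, Tuple

BLOCK_SIZE = 8 * 1024  # 8KB, the block size reported to work in this thread


def plan_blocks(total_bytes: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover total_bytes in order."""
    if total_bytes < 0 or block_size <= 0:
        raise ValueError("total_bytes must be >= 0 and block_size must be > 0")
    return [
        (offset, min(block_size, total_bytes - offset))
        for offset in range(0, total_bytes, block_size)
    ]


def send_in_blocks(payload: bytes, send_block: Callable[[bytes], None],
                   block_size: int = BLOCK_SIZE) -> int:
    """Hand payload to send_block one block at a time; return the block count.

    send_block is a hypothetical stand-in for the real transport call;
    it is not part of any MPI library.
    """
    blocks = plan_blocks(len(payload), block_size)
    for offset, length in blocks:
        send_block(payload[offset:offset + length])
    return len(blocks)
```

Every message larger than one block turns into several sends, each with
its own per-message overhead, which is consistent with the poor
performance mentioned in the report.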
Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501

From mhouston at graphics.stanford.edu Mon Apr 2 14:51:43 2007
From: mhouston at graphics.stanford.edu (Mike Houston)
Date: Mon Apr 2 14:51:50 2007
Subject: [mvapich-discuss] mvapich2 IB problems with transfers over ~10KB.
In-Reply-To:
References:
Message-ID: <461150BF.6080401@graphics.stanford.edu>

wei huang wrote:
> 1) Which version of mvapich2 are you using? The latest release version
> now should be mvapich2-0.9.8.

Yes, 0.9.8

> 2) Could you try running osu_benchmarks and see if they all pass on your
> system? The benchmarks are distributed with the package and are in the
> `osu_benchmarks' directory.

I'll give these a go, but they look like they don't verify results.

From huanwei at cse.ohio-state.edu Mon Apr 2 15:00:57 2007
From: huanwei at cse.ohio-state.edu (wei huang)
Date: Mon Apr 2 15:01:01 2007
Subject: [mvapich-discuss] mvapich2 IB problems with transfers over ~10KB.
In-Reply-To: <461150BF.6080401@graphics.stanford.edu>
Message-ID:

Hi Mike,

Could you please run the IMB tests with the -DCHECK option (compile IMB
with this flag)? This checks all collective operations with data
verification.

Also, what exactly do you mean by ``cannot reliably'' send messages? Do
you see data corruption, or are there other error symptoms?

Thanks.

Regards,
Wei Huang

From mhouston at graphics.stanford.edu Mon Apr 2 15:03:59 2007
From: mhouston at graphics.stanford.edu (Mike Houston)
Date: Mon Apr 2 15:04:09 2007
Subject: [mvapich-discuss] mvapich2 IB problems with transfers over ~10KB.
In-Reply-To:
References:
Message-ID: <4611539F.1040100@graphics.stanford.edu>

wei huang wrote:
> Could you please run IMB test with -DCHECK (Compile IMB with this flag)
> option? This checks all collective operations with data verification.

Will do. I need to finish some large runs on the cluster with the
workarounds before I can run this.

> Also, what exactly do you mean by ``cannot reliably'' send message? Do
> you see data corruption, or are there other error symptoms?

The messages get corrupted or are corrupting data in other windows
(windows as in MPI_Win). We don't see this behavior with mpich2 over GigE
or IPoIB. The latter seems to be what we are generally seeing. This may
well be a bug on our side; we don't have issues with other MPI versions,
but we could be getting lucky.

From Durga.Choudhury at drs-ss.com Mon Apr 2 18:51:07 2007
From: Durga.Choudhury at drs-ss.com (Choudhury, Durga)
Date: Mon Apr 2 18:51:18 2007
Subject: [mvapich-discuss] mvapich2 IB problems with transfers over ~10KB.
In-Reply-To: <4611539F.1040100@graphics.stanford.edu>
Message-ID:

This might be an irrelevant comment, but let me say this anyway. Some
GigE device drivers have bugs and cannot handle jumbo frames above 8K
correctly. Since you say your messages work below 8K, this rang a bell in
my mind.

Since you say you are running an IB interconnect and not GigE, this
should be irrelevant to you, but if you are running IPoIB, please check
your routing table to make sure you are actually going over IB and not
over GigE (in fact, another user expressed precisely the same concern
recently). If the latter, see if your card is set to handle jumbo frames
and try reducing the MTU (1500 bytes is guaranteed to work with all
drivers).

Best regards
Durga

From basv at sara.nl Wed Apr 4 02:18:03 2007
From: basv at sara.nl (Bas van der Vlies)
Date: Wed Apr 4 02:18:17 2007
Subject: [mvapich-discuss] xcbrd tests
Message-ID: <4613431B.1080002@sara.nl>

Hello,

From our users we still get reports that there are errors with mvapich
versions 1 and 2. So we did some tests with the ScaLAPACK test suite.
Attached you find the output of the xcbrd test program, part of the
ScaLAPACK suite, in the TESTING directory.

I attached 4 cases:

topspin-g77: topspin mpich1 and infiniband, g77 compiler
  This test runs ok.
  This is the infiniband stack from topspin/cisco and their
  implementation of mvapich based on version

mpich1-gfortran: mvapich-0.9.8 with gfortran compiler
  This test gives incorrect output, and takes a long time to complete.

mpich1-g77: mvapich-0.9.8 with g77 compiler
  This test gives one line of output and hangs.

mpich2-gfortran: mvapich2-0.9.8
  Same result as mpich1-gfortran.

In all gfortran cases, the environment variable GFORTRAN_UNBUFFERED_ALL
was set to 'y'.

Other tests give various results; examples for mpich2-gfortran:
  xsinv runs, but part of the output contains errors
  xsqr seems to run fine.

Can you comment on this?
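The xcbrd output attached to this message marks each run PASSED or FAILED
by comparing a scaled residual with THRESH. As a rough illustration only
(this is not the ScaLAPACK driver code; scaled_residual and check are
made-up names, the norms are assumed to be computed elsewhere, and the
strict "less than" follows the wording of the output header rather than
the ScaLAPACK source), the check can be sketched in Python. It also shows
why a NaN residual cannot pass: every ordered comparison with NaN is
false.

```python
THRESH = 10.0     # threshold printed in the xcbrd output
EPS = 2.0 ** -24  # matches the printed eps value 0.596046E-07


def scaled_residual(diff_norm: float, a_norm: float, n: int, eps: float = EPS) -> float:
    """Compute ||A - Q B P'|| / (||A|| * eps * N) from precomputed norms."""
    return diff_norm / (a_norm * eps * n)


def check(residual: float, thresh: float = THRESH) -> str:
    """Return 'PASSED' only if residual < thresh; NaN compares false, so it fails."""
    return "PASSED" if residual < thresh else "FAILED"
```

With these definitions, check(0.70) gives "PASSED" and
check(float("nan")) gives "FAILED", which corresponds to the rows marked
"NaN FAILED" in the gfortran outputs.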
--
Willem Vermin         tel (31)20 5923054/5923000
SARA, Kruislaan 415   fax (31)20 6683167
1098 SJ Amsterdam     willem@sara.nl
Nederland

--
********************************************************************
*                                                                  *
*  Bas van der Vlies                    e-mail: basv@sara.nl       *
*  SARA - Academic Computing Services   phone:  +31 20 592 8012    *
*  Kruislaan 415                        fax:    +31 20 6683167     *
*  1098 SJ Amsterdam                                               *
*                                                                  *
********************************************************************
-------------- next part --------------
SCALAPACK Bidiagonal reduction
'MPI machine'

Tests of the parallel complex single precision bidiagonal reduction routines.
The following scaled residual checks will be computed:
 ||A - Q B P'|| / (||A|| * eps * N)
The matrix A is randomly generated for each test.

An explanation of the input/output parameters follows:
TIME     : Indicates whether WALL or CPU time was used.
M        : The number of rows of the matrix A.
N        : The number of columns of the matrix A.
NB       : The size of the square blocks the matrix A is split into.
P        : The number of process rows.
Q        : The number of process columns.
THRESH   : If a residual value is less than THRESH, CHECK is flagged as PASSED
BRD time : Time in seconds to reduce the matrix
MFLOPS   : Rate of execution for the bidiagonal reduction.

The following parameter values will be used:
  M  :     4    10    17    13
  N  :     4    12    13    13
  NB :     2     3     4     5
  P  :     1     2     1     4
  Q  :     1     2     4     1

Relative machine precision (eps) is taken to be 0.596046E-07
Routines pass computational tests if scaled residual is less than 10.000

TIME      M      N  NB     P     Q  BRD Time      MFLOPS Residual  CHECK
---- ------ ------ --- ----- ----- --------- ----------- -------- ------
WALL      4      4   2     1     1      0.00        3.40     0.70 PASSED
WALL      4      4   3     1     1      0.00        6.56     0.66 PASSED
WALL      4      4   4     1     1      0.00       16.65     1.02 PASSED
WALL      4      4   5     1     1      0.00       17.07     1.02 PASSED
WALL     10     12   2     1     1      0.00       45.61     0.57 PASSED
WALL     10     12   3     1     1      0.00       42.93     0.59 PASSED
WALL     10     12   4     1     1      0.00       44.88     0.62 PASSED
WALL     10     12   5     1     1      0.00       54.59     0.51 PASSED
WALL     17     13   2     1     1      0.00       76.28     0.37 PASSED
WALL     17     13   3     1     1      0.00       73.34     0.37 PASSED
WALL     17     13   4     1     1      0.00       72.41     0.48 PASSED
WALL     17     13   5     1     1      0.00       75.44     0.46 PASSED
WALL     13     13   2     1     1      0.00       55.53     0.43 PASSED
WALL     13     13   3     1     1      0.00       52.43     0.43 PASSED
WALL     13     13   4     1     1      0.00       52.43     0.50 PASSED
WALL     13     13   5     1     1      0.00       57.02     0.59 PASSED
WALL      4      4   2     2     2      0.00        1.32     0.66 PASSED
WALL      4      4   3     2     2      0.00        1.50     1.12 PASSED
WALL      4      4   4     2     2      0.00        3.38     1.02 PASSED
WALL      4      4   5     2     2      0.00        3.65     1.02 PASSED
WALL     10     12   2     2     2      0.00       11.59     0.47 PASSED
WALL     10     12   3     2     2      0.00       11.55     0.84 PASSED
WALL     10     12   4     2     2      0.00       12.83     0.68 PASSED
WALL     10     12   5     2     2      0.00       16.31     0.84 PASSED
WALL     17     13   2     2     2      0.00       20.66     0.35 PASSED
WALL     17     13   3     2     2      0.00       22.04     0.44 PASSED
WALL     17     13   4     2     2      0.00       22.10     0.41 PASSED
WALL     17     13   5     2     2      0.00       24.38     0.39 PASSED
WALL     13     13   2     2     2      0.00       15.96     0.57 PASSED
WALL     13     13   3     2     2      0.00       15.48     0.40 PASSED
WALL     13     13   4     2     2      0.00       15.37     0.49 PASSED
WALL     13     13   5     2     2      0.00       17.46     0.52 PASSED
WALL      4      4   2     1     4      0.00        2.25     0.70 PASSED
WALL      4      4   3     1     4      0.00        1.95     0.90 PASSED
WALL      4      4   4     1     4      0.00        4.46     1.02 PASSED
WALL      4      4   5     1     4      0.00        4.35     1.02 PASSED
WALL     10     12   2     1     4      0.00       12.40     0.46 PASSED
WALL     10     12   3     1     4      0.00       11.72     0.57 PASSED
WALL     10     12   4     1     4      0.00       11.89     0.60 PASSED
WALL     10     12   5     1     4      0.00       15.07     0.71 PASSED
WALL     17     13   2     1     4      0.00       20.68     0.40 PASSED
WALL     17     13   3     1     4      0.00       20.05     0.38 PASSED
WALL     17     13   4     1     4      0.00       19.74     0.44 PASSED
WALL     17     13   5     1     4      0.00       22.88     0.42 PASSED
WALL     13     13   2     1     4      0.00       14.51     0.51 PASSED
WALL     13     13   3     1     4      0.00       14.26     0.45 PASSED
WALL     13     13   4     1     4      0.00       14.73     0.45 PASSED
WALL     13     13   5     1     4      0.00       16.00     0.49 PASSED
WALL      4      4   2     4     1      0.00        1.94     1.00 PASSED
WALL      4      4   3     4     1      0.00        1.43     0.68 PASSED
WALL      4      4   4     4     1      0.00        4.09     1.02 PASSED
WALL      4      4   5     4     1      0.00        3.71     1.02 PASSED
WALL     10     12   2     4     1      0.00       10.03     0.56 PASSED
WALL     10     12   3     4     1      0.00        9.64     0.67 PASSED
WALL     10     12   4     4     1      0.00       10.12     0.52 PASSED
WALL     10     12   5     4     1      0.00       13.81     0.61 PASSED
WALL     17     13   2     4     1      0.00       17.76     0.53 PASSED
WALL     17     13   3     4     1      0.00       16.10     0.46 PASSED
WALL     17     13   4     4     1      0.00       16.19     0.35 PASSED
WALL     17     13   5     4     1      0.00       18.28     0.42 PASSED
WALL     13     13   2     4     1      0.00       12.42     0.51 PASSED
WALL     13     13   3     4     1      0.00       10.87     0.43 PASSED
WALL     13     13   4     4     1      0.00       11.58     0.54 PASSED
WALL     13     13   5     4     1      0.00       13.09     0.51 PASSED

Finished 64 tests, with the following results:
   64 tests completed and passed residual checks.
    0 tests completed and failed residual checks.
    0 tests skipped because of illegal input values.

END OF TESTS.

Takes a second or so to complete
-------------- next part --------------
SCALAPACK Bidiagonal reduction
'MPI machine'

Tests of the parallel complex single precision bidiagonal reduction routines.
The following scaled residual checks will be computed:
 ||A - Q B P'|| / (||A|| * eps * N)
The matrix A is randomly generated for each test.

An explanation of the input/output parameters follows:
TIME     : Indicates whether WALL or CPU time was used.
M        : The number of rows of the matrix A.
N        : The number of columns of the matrix A.
NB       : The size of the square blocks the matrix A is split into.
P        : The number of process rows.
Q        : The number of process columns.
THRESH   : If a residual value is less than THRESH, CHECK is flagged as PASSED
BRD time : Time in seconds to reduce the matrix
MFLOPS   : Rate of execution for the bidiagonal reduction.

The following parameter values will be used:
  M  :     4    10    17    13
  N  :     4    12    13    13
  NB :     2     3     4     5
  P  :     1     2     1     4
  Q  :     1     2     4     1

Relative machine precision (eps) is taken to be 0.596046E-07
Routines pass computational tests if scaled residual is less than 10.000

TIME      M      N  NB     P     Q  BRD Time      MFLOPS Residual  CHECK
---- ------ ------ --- ----- ----- --------- ----------- -------- ------
WALL      4      4   2     1     1      0.00        0.00     0.79 PASSED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL      4      4   3     1     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL      4      4   4     1     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL      4      4   5     1     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     10     12   2     1     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     10     12   3     1     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     10     12   4     1     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     10     12   5     1     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     17     13   2     1     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     17     13   3     1     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     17     13   4     1     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     17     13   5     1     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     13     13   2     1     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     13     13   3     1     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     13     13   4     1     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     13     13   5     1     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL      4      4   2     2     2      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL      4      4   3     2     2      0.00        0.00      NaN FAILED
{ 0, 1}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 0, 1}: Memory overwrite in PCGEBRD
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL      4      4   4     2     2      0.00        0.00      NaN FAILED
{ 1, 1}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 0, 1}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 1, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 1, 1}: Memory overwrite in PCGEBRD
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL      4      4   5     2     2      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     10     12   2     2     2      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     10     12   3     2     2      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     10     12   4     2     2      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     10     12   5     2     2      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     17     13   2     2     2      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     17     13   3     2     2      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     17     13   4     2     2      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     17     13   5     2     2      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     13     13   2     2     2      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     13     13   3     2     2      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     13     13   4     2     2      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     13     13   5     2     2      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL      4      4   2     1     4      0.00        0.00      NaN FAILED
{ 0, 3}: Memory overwrite in PCGEBRD
{ 0, 2}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 0, 3}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL      4      4   3     1     4      0.00        0.00      NaN FAILED
{ 0, 2}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 0, 3}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 0, 3}: Memory overwrite in PCGEBRD
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL      4      4   4     1     4      0.00        0.00      NaN FAILED
{ 0, 3}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 0, 2}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 0, 1}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 0, 3}: Memory overwrite in PCGEBRD
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL      4      4   5     1     4      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     10     12   2     1     4      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     10     12   3     1     4      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     10     12   4     1     4      0.00        0.00      NaN FAILED
{ 0, 3}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 0, 3}: Memory overwrite in PCGEBRD
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     10     12   5     1     4      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     17     13   2     1     4      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     17     13   3     1     4      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     17     13   4     1     4      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     17     13   5     1     4      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     13     13   2     1     4      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     13     13   3     1     4      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     13     13   4     1     4      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     13     13   5     1     4      0.00        0.00      NaN FAILED
{ 3, 0}: Memory overwrite in PCGEBRD
{ 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL      4      4   2     4     1      0.00        0.00      NaN FAILED
{ 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 2, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 3, 0}: Memory overwrite in PCGEBRD
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL      4      4   3     4     1      0.00        0.00      NaN FAILED
{ 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 3, 0}: Memory overwrite in PCGEBRD
{ 2, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL      4      4   4     4     1      0.00        0.00      NaN FAILED
{ 2, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 1, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 3, 0}: Memory overwrite in PCGEBRD
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL      4      4   5     4     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     10     12   2     4     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     10     12   3     4     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     10     12   4     4     1      0.00        0.00      NaN FAILED
{ 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923.
{ 3, 0}: Memory overwrite in PCGEBRD
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     10     12   5     4     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     17     13   2     4     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     17     13   3     4     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     17     13   4     4     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     17     13   5     4     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     13     13   2     4     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     13     13   3     4     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     13     13   4     4     1      0.00        0.00      NaN FAILED
||A - Q*B*P|| / (||A|| * N * eps) = NaN
WALL     13     13   5     4     1      0.00        0.00      NaN FAILED

Finished 64 tests, with the following results:
    1 tests completed and passed residual checks.
   63 tests completed and failed residual checks.
    0 tests skipped because of illegal input values.

END OF TESTS.

This takes a long time to end
-------------- next part --------------
scalapack-1.7.5/TESTING/xcbrd mvapich1/g77:

SCALAPACK Bidiagonal reduction
'MPI machine'

Tests of the parallel complex single precision bidiagonal reduction routines.
The following scaled residual checks will be computed:
 ||A - Q B P'|| / (||A|| * eps * N)
The matrix A is randomly generated for each test.

An explanation of the input/output parameters follows:
TIME     : Indicates whether WALL or CPU time was used.
M        : The number of rows of the matrix A.
N        : The number of columns of the matrix A.
NB       : The size of the square blocks the matrix A is split into.
P        : The number of process rows.
Q        : The number of process columns.
THRESH   : If a residual value is less than THRESH, CHECK is flagged as PASSED
BRD time : Time in seconds to reduce the matrix
MFLOPS   : Rate of execution for the bidiagonal reduction.

The following parameter values will be used:
  M  :     4    10    17    13
  N  :     4    12    13    13
  NB :     2     3     4     5
  P  :     1     2     1     4
  Q  :     1     2     4     1

Relative machine precision (eps) is taken to be 0.596046E-07
Routines pass computational tests if scaled residual is less than 10.000

TIME      M      N  NB     P     Q  BRD Time      MFLOPS Residual  CHECK
---- ------ ------ --- ----- ----- --------- ----------- -------- ------
WALL      4      4   2     1     1      0.00        0.00     0.70 PASSED

and hangs
-------------- next part --------------
SCALAPACK Bidiagonal reduction
'MPI machine'

Tests of the parallel complex single precision bidiagonal reduction routines.
The following scaled residual checks will be computed:
 ||A - Q B P'|| / (||A|| * eps * N)
The matrix A is randomly generated for each test.

An explanation of the input/output parameters follows:
TIME     : Indicates whether WALL or CPU time was used.
M        : The number of rows of the matrix A.
N        : The number of columns of the matrix A.
NB       : The size of the square blocks the matrix A is split into.
P        : The number of process rows.
Q        : The number of process columns.
THRESH   : If a residual value is less than THRESH, CHECK is flagged as PASSED
BRD time : Time in seconds to reduce the matrix
MFLOPS   : Rate of execution for the bidiagonal reduction.
The following parameter values will be used: M : 4 10 17 13 N : 4 12 13 13 NB : 2 3 4 5 P : 1 2 1 4 Q : 1 2 4 1 Relative machine precision (eps) is taken to be 0.596046E-07 Routines pass computational tests if scaled residual is less than 10.000 TIME M N NB P Q BRD Time MFLOPS Residual CHECK ---- ------ ------ --- ----- ----- --------- ----------- -------- ------ WALL 4 4 2 1 1 0.00 0.00 0.79 PASSED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 3 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 4 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 5 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 2 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 3 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 4 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 5 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 2 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 3 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 4 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 5 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 2 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 3 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 4 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 5 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 2 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 3 2 2 0.00 0.00 NaN FAILED { 0, 1}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 1}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 4 2 2 0.00 0.00 NaN FAILED { 0, 1}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. 
{ 1, 1}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 1, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 1, 1}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 5 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 2 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 3 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 4 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 5 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 2 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 3 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 4 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 5 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 2 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 3 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 4 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 5 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 2 1 4 0.00 0.00 NaN FAILED { 0, 2}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 3}: Memory overwrite in PCGEBRD { 0, 3}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 3 1 4 0.00 0.00 NaN FAILED { 0, 2}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 3}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 3}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 4 1 4 0.00 0.00 NaN FAILED { 0, 2}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 1}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. 
{ 0, 3}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 3}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 5 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 2 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 3 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 4 1 4 0.00 0.00 NaN FAILED { 0, 3}: Memory overwrite in PCGEBRD { 0, 3}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 5 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 2 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 3 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 4 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 5 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 2 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 3 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 4 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 5 1 4 0.00 0.00 NaN FAILED { 3, 0}: Memory overwrite in PCGEBRD { 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 2 4 1 0.00 0.00 NaN FAILED { 2, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 3, 0}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 3 4 1 0.00 0.00 NaN FAILED { 2, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. 
{ 3, 0}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 4 4 1 0.00 0.00 NaN FAILED { 2, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 1, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 3, 0}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 5 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 2 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 3 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 4 4 1 0.00 0.00 NaN FAILED { 3, 0}: Memory overwrite in PCGEBRD { 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 5 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 2 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 3 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 4 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 5 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 2 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 3 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 4 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 5 4 1 0.00 0.00 NaN FAILED Finished 64 tests, with the following results: 1 tests completed and passed residual checks. 63 tests completed and failed residual checks. 0 tests skipped because of illegal input values. END OF TESTS. 
Takes a long time to complete

From basv at sara.nl Wed Apr 4 02:31:41 2007
From: basv at sara.nl (Bas van der Vlies)
Date: Wed Apr 4 02:31:52 2007
Subject: [mvapich-discuss] xcbrd tests
Message-ID: <4613464D.1040307@sara.nl>

Hello,

Our users still report problems with mvapich versions 1 and 2, so we did some more testing with some programs from the Scalapack suite. Attached you will find the output of the xcbrd test program, part of the Scalapack suite, in the TESTING directory.

I attached 4 cases:

topspin-g77: topspin mpich1 and infiniband, g77 compiler
  This test runs ok.
  This is the installation software stack from topspin/cisco based on their implementation of the mvapich software.

mpich1-gfortran: mvapich-0.9.8 with gfortran compiler
  This test gives incorrect output, and takes a long time to complete.

mpich1-g77: mvapich-0.9.8 with g77 compiler
  This test gives one line of output and hangs.

mpich2-gfortran: mvapich2-0.9.8
  Same result as mpich1-gfortran.

In all gfortran cases, the environment variable GFORTRAN_UNBUFFERED_ALL was set to 'y'.

Other tests give various results. Examples for mpich2-gfortran:
  xsinv runs, but part of the output contains errors.
  xsqr seems to run fine.

Can you comment on this?

--
Willem Vermin tel (31)20 5923054/5923000
SARA, Kruislaan 415 fax (31)20 6683167
1098 SJ Amsterdam willem@sara.nl
Nederland

--
********************************************************************
* *
* Bas van der Vlies e-mail: basv@sara.nl *
* SARA - Academic Computing Services phone: +31 20 592 8012 *
* Kruislaan 415 fax: +31 20 6683167 *
* 1098 SJ Amsterdam *
* *
********************************************************************

-------------- next part --------------
SCALAPACK Bidiagonal reduction 'MPI machine' Tests of the parallel complex single precision bidiagonal reduction routines. The following scaled residual checks will be computed: ||A - Q B P'|| / (||A|| * eps * N) The matrix A is randomly generated for each test.
An explanation of the input/output parameters follows: TIME : Indicates whether WALL or CPU time was used. M : The number of rows of the matrix A. N : The number of columns of the matrix A. NB : The size of the square blocks the matrix A is split into. P : The number of process rows. Q : The number of process columns. THRESH : If a residual value is less than THRESH, CHECK is flagged as PASSED BRD time : Time in seconds to reduce the matrix MFLOPS : Rate of execution for the bidiagonal reduction. The following parameter values will be used: M : 4 10 17 13 N : 4 12 13 13 NB : 2 3 4 5 P : 1 2 1 4 Q : 1 2 4 1 Relative machine precision (eps) is taken to be 0.596046E-07 Routines pass computational tests if scaled residual is less than 10.000 TIME M N NB P Q BRD Time MFLOPS Residual CHECK ---- ------ ------ --- ----- ----- --------- ----------- -------- ------ WALL 4 4 2 1 1 0.00 3.40 0.70 PASSED WALL 4 4 3 1 1 0.00 6.56 0.66 PASSED WALL 4 4 4 1 1 0.00 16.65 1.02 PASSED WALL 4 4 5 1 1 0.00 17.07 1.02 PASSED WALL 10 12 2 1 1 0.00 45.61 0.57 PASSED WALL 10 12 3 1 1 0.00 42.93 0.59 PASSED WALL 10 12 4 1 1 0.00 44.88 0.62 PASSED WALL 10 12 5 1 1 0.00 54.59 0.51 PASSED WALL 17 13 2 1 1 0.00 76.28 0.37 PASSED WALL 17 13 3 1 1 0.00 73.34 0.37 PASSED WALL 17 13 4 1 1 0.00 72.41 0.48 PASSED WALL 17 13 5 1 1 0.00 75.44 0.46 PASSED WALL 13 13 2 1 1 0.00 55.53 0.43 PASSED WALL 13 13 3 1 1 0.00 52.43 0.43 PASSED WALL 13 13 4 1 1 0.00 52.43 0.50 PASSED WALL 13 13 5 1 1 0.00 57.02 0.59 PASSED WALL 4 4 2 2 2 0.00 1.32 0.66 PASSED WALL 4 4 3 2 2 0.00 1.50 1.12 PASSED WALL 4 4 4 2 2 0.00 3.38 1.02 PASSED WALL 4 4 5 2 2 0.00 3.65 1.02 PASSED WALL 10 12 2 2 2 0.00 11.59 0.47 PASSED WALL 10 12 3 2 2 0.00 11.55 0.84 PASSED WALL 10 12 4 2 2 0.00 12.83 0.68 PASSED WALL 10 12 5 2 2 0.00 16.31 0.84 PASSED WALL 17 13 2 2 2 0.00 20.66 0.35 PASSED WALL 17 13 3 2 2 0.00 22.04 0.44 PASSED WALL 17 13 4 2 2 0.00 22.10 0.41 PASSED WALL 17 13 5 2 2 0.00 24.38 0.39 PASSED WALL 13 13 2 2 2 0.00 15.96 0.57 
PASSED WALL 13 13 3 2 2 0.00 15.48 0.40 PASSED WALL 13 13 4 2 2 0.00 15.37 0.49 PASSED WALL 13 13 5 2 2 0.00 17.46 0.52 PASSED WALL 4 4 2 1 4 0.00 2.25 0.70 PASSED WALL 4 4 3 1 4 0.00 1.95 0.90 PASSED WALL 4 4 4 1 4 0.00 4.46 1.02 PASSED WALL 4 4 5 1 4 0.00 4.35 1.02 PASSED WALL 10 12 2 1 4 0.00 12.40 0.46 PASSED WALL 10 12 3 1 4 0.00 11.72 0.57 PASSED WALL 10 12 4 1 4 0.00 11.89 0.60 PASSED WALL 10 12 5 1 4 0.00 15.07 0.71 PASSED WALL 17 13 2 1 4 0.00 20.68 0.40 PASSED WALL 17 13 3 1 4 0.00 20.05 0.38 PASSED WALL 17 13 4 1 4 0.00 19.74 0.44 PASSED WALL 17 13 5 1 4 0.00 22.88 0.42 PASSED WALL 13 13 2 1 4 0.00 14.51 0.51 PASSED WALL 13 13 3 1 4 0.00 14.26 0.45 PASSED WALL 13 13 4 1 4 0.00 14.73 0.45 PASSED WALL 13 13 5 1 4 0.00 16.00 0.49 PASSED WALL 4 4 2 4 1 0.00 1.94 1.00 PASSED WALL 4 4 3 4 1 0.00 1.43 0.68 PASSED WALL 4 4 4 4 1 0.00 4.09 1.02 PASSED WALL 4 4 5 4 1 0.00 3.71 1.02 PASSED WALL 10 12 2 4 1 0.00 10.03 0.56 PASSED WALL 10 12 3 4 1 0.00 9.64 0.67 PASSED WALL 10 12 4 4 1 0.00 10.12 0.52 PASSED WALL 10 12 5 4 1 0.00 13.81 0.61 PASSED WALL 17 13 2 4 1 0.00 17.76 0.53 PASSED WALL 17 13 3 4 1 0.00 16.10 0.46 PASSED WALL 17 13 4 4 1 0.00 16.19 0.35 PASSED WALL 17 13 5 4 1 0.00 18.28 0.42 PASSED WALL 13 13 2 4 1 0.00 12.42 0.51 PASSED WALL 13 13 3 4 1 0.00 10.87 0.43 PASSED WALL 13 13 4 4 1 0.00 11.58 0.54 PASSED WALL 13 13 5 4 1 0.00 13.09 0.51 PASSED Finished 64 tests, with the following results: 64 tests completed and passed residual checks. 0 tests completed and failed residual checks. 0 tests skipped because of illegal input values. END OF TESTS. Takes a second or so to complete -------------- next part -------------- SCALAPACK Bidiagonal reduction 'MPI machine' Tests of the parallel complex single precision bidiagonal reduction routines. The following scaled residual checks will be computed: ||A - Q B P'|| / (||A|| * eps * N) The matrix A is randomly generated for each test. 
An explanation of the input/output parameters follows: TIME : Indicates whether WALL or CPU time was used. M : The number of rows of the matrix A. N : The number of columns of the matrix A. NB : The size of the square blocks the matrix A is split into. P : The number of process rows. Q : The number of process columns. THRESH : If a residual value is less than THRESH, CHECK is flagged as PASSED BRD time : Time in seconds to reduce the matrix MFLOPS : Rate of execution for the bidiagonal reduction. The following parameter values will be used: M : 4 10 17 13 N : 4 12 13 13 NB : 2 3 4 5 P : 1 2 1 4 Q : 1 2 4 1 Relative machine precision (eps) is taken to be 0.596046E-07 Routines pass computational tests if scaled residual is less than 10.000 TIME M N NB P Q BRD Time MFLOPS Residual CHECK ---- ------ ------ --- ----- ----- --------- ----------- -------- ------ WALL 4 4 2 1 1 0.00 0.00 0.79 PASSED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 3 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 4 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 5 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 2 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 3 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 4 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 5 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 2 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 3 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 4 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 5 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 2 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 3 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 4 1 1 0.00 0.00 NaN FAILED ||A - 
Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 5 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 2 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 3 2 2 0.00 0.00 NaN FAILED { 0, 1}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 1}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 4 2 2 0.00 0.00 NaN FAILED { 1, 1}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 1}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 1, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 1, 1}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 5 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 2 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 3 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 4 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 5 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 2 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 3 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 4 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 5 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 2 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 3 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 4 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 5 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 2 1 4 0.00 0.00 NaN FAILED { 0, 3}: Memory overwrite in PCGEBRD { 0, 2}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 3}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. 
||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 3 1 4 0.00 0.00 NaN FAILED { 0, 2}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 3}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 3}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 4 1 4 0.00 0.00 NaN FAILED { 0, 3}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 2}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 1}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 3}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 5 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 2 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 3 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 4 1 4 0.00 0.00 NaN FAILED { 0, 3}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 3}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 5 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 2 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 3 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 4 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 5 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 2 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 3 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 4 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 5 1 4 0.00 0.00 NaN FAILED { 3, 0}: Memory overwrite in PCGEBRD { 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. 
||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 2 4 1 0.00 0.00 NaN FAILED { 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 2, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 3, 0}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 3 4 1 0.00 0.00 NaN FAILED { 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 3, 0}: Memory overwrite in PCGEBRD { 2, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 4 4 1 0.00 0.00 NaN FAILED { 2, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 1, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 3, 0}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 5 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 2 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 3 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 4 4 1 0.00 0.00 NaN FAILED { 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. 
{ 3, 0}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 5 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 2 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 3 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 4 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 5 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 2 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 3 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 4 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 5 4 1 0.00 0.00 NaN FAILED Finished 64 tests, with the following results: 1 tests completed and passed residual checks. 63 tests completed and failed residual checks. 0 tests skipped because of illegal input values. END OF TESTS. This takes a long time to end -------------- next part -------------- scalapack-1.7.5/TESTING/xcbrd mvapich1/g77: SCALAPACK Bidiagonal reduction 'MPI machine' Tests of the parallel complex single precision bidiagonal reduction routines. The following scaled residual checks will be computed: ||A - Q B P'|| / (||A|| * eps * N) The matrix A is randomly generated for each test. An explanation of the input/output parameters follows: TIME : Indicates whether WALL or CPU time was used. M : The number of rows of the matrix A. N : The number of columns of the matrix A. NB : The size of the square blocks the matrix A is split into. P : The number of process rows. Q : The number of process columns. THRESH : If a residual value is less than THRESH, CHECK is flagged as PASSED BRD time : Time in seconds to reduce the matrix MFLOPS : Rate of execution for the bidiagonal reduction. 
The following parameter values will be used: M : 4 10 17 13 N : 4 12 13 13 NB : 2 3 4 5 P : 1 2 1 4 Q : 1 2 4 1 Relative machine precision (eps) is taken to be 0.596046E-07 Routines pass computational tests if scaled residual is less than 10.000 TIME M N NB P Q BRD Time MFLOPS Residual CHECK ---- ------ ------ --- ----- ----- --------- ----------- -------- ------ WALL 4 4 2 1 1 0.00 0.00 0.70 PASSED and hangs -------------- next part -------------- SCALAPACK Bidiagonal reduction 'MPI machine' Tests of the parallel complex single precision bidiagonal reduction routines. The following scaled residual checks will be computed: ||A - Q B P'|| / (||A|| * eps * N) The matrix A is randomly generated for each test. An explanation of the input/output parameters follows: TIME : Indicates whether WALL or CPU time was used. M : The number of rows of the matrix A. N : The number of columns of the matrix A. NB : The size of the square blocks the matrix A is split into. P : The number of process rows. Q : The number of process columns. THRESH : If a residual value is less than THRESH, CHECK is flagged as PASSED BRD time : Time in seconds to reduce the matrix MFLOPS : Rate of execution for the bidiagonal reduction. 
The following parameter values will be used: M : 4 10 17 13 N : 4 12 13 13 NB : 2 3 4 5 P : 1 2 1 4 Q : 1 2 4 1 Relative machine precision (eps) is taken to be 0.596046E-07 Routines pass computational tests if scaled residual is less than 10.000 TIME M N NB P Q BRD Time MFLOPS Residual CHECK ---- ------ ------ --- ----- ----- --------- ----------- -------- ------ WALL 4 4 2 1 1 0.00 0.00 0.79 PASSED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 3 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 4 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 5 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 2 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 3 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 4 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 5 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 2 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 3 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 4 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 5 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 2 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 3 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 4 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 5 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 2 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 3 2 2 0.00 0.00 NaN FAILED { 0, 1}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 1}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 4 2 2 0.00 0.00 NaN FAILED { 0, 1}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. 
{ 1, 1}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 1, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 1, 1}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 5 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 2 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 3 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 4 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 5 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 2 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 3 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 4 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 5 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 2 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 3 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 4 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 5 2 2 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 2 1 4 0.00 0.00 NaN FAILED { 0, 2}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 3}: Memory overwrite in PCGEBRD { 0, 3}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 3 1 4 0.00 0.00 NaN FAILED { 0, 2}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 3}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 3}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 4 1 4 0.00 0.00 NaN FAILED { 0, 2}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 1}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. 
{ 0, 3}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 0, 3}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 5 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 2 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 3 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 4 1 4 0.00 0.00 NaN FAILED { 0, 3}: Memory overwrite in PCGEBRD { 0, 3}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 5 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 2 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 3 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 4 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 5 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 2 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 3 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 4 1 4 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 5 1 4 0.00 0.00 NaN FAILED { 3, 0}: Memory overwrite in PCGEBRD { 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 2 4 1 0.00 0.00 NaN FAILED { 2, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 3, 0}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 3 4 1 0.00 0.00 NaN FAILED { 2, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. 
{ 3, 0}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 4 4 1 0.00 0.00 NaN FAILED { 2, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 1, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. { 3, 0}: Memory overwrite in PCGEBRD ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 5 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 2 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 3 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 4 4 1 0.00 0.00 NaN FAILED { 3, 0}: Memory overwrite in PCGEBRD { 3, 0}: PCGEBRD memory overwrite in pre-guardzone: loc( 1) = -9923. + i* -9923. ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 10 12 5 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 2 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 3 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 4 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 17 13 5 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 2 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 3 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 4 4 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 13 13 5 4 1 0.00 0.00 NaN FAILED Finished 64 tests, with the following results: 1 tests completed and passed residual checks. 63 tests completed and failed residual checks. 0 tests skipped because of illegal input values. END OF TESTS. 
Takes a long time to complete

From sylvain.jeaugey at bull.net Wed Apr 4 08:30:11 2007
From: sylvain.jeaugey at bull.net (Sylvain Jeaugey)
Date: Wed Apr 4 08:34:02 2007
Subject: [mvapich-discuss] MVAPICH 2 Progress Code improvement for RDMA_FAST_PATH
In-Reply-To:
References:
Message-ID:

Hi all,

Sorry for the delay, I was preempted by other tasks. I just took a few minutes to send you this patch. It is certainly not the best way to do it, but it works fine. Feel free to adapt it to fit with the rest, as long as the performance is preserved.

The patch applies on top of mvapich2 0.9.8.

Sylvain

On Sun, 25 Mar 2007, wei huang wrote:

> Hi Sylvain,
>
> Thanks for your effort in helping us improve the rdma fast path code. Your
> proposal looks good to us. There may be some corner cases in the progress
> engine that need to be considered, but we should be able to take care of
> them later. We are actually working on a later 1.0 release, which will
> have more features including enhanced messaging rate, enhanced
> collectives, etc. Now should be the right time to incorporate such an
> enhancement. We will thus have time to systematically test and evaluate
> the changes. A patch from you will definitely help us move faster in
> this direction. A patch against 0.9.8 should work fine.
>
> Thanks again and looking forward to discussing with you further.
>
> -- Wei
>
>> [ADAPTIVE_]RDMA_FAST_PATH is an optimization to provide low latency on
>> mvapich2. The issue is, latency increases as the number of total processes
>> grows. Finally, when you launch a job with over 32 processes, latency is
>> worse than the standard send/recv protocol.
>>
>> The reason for that is very simple. Contrary to the send/recv protocol
>> which gets its receives in a single completion queue, the RDMA fast path
>> has to poll _every_ RDMA queue to find out from which queue to receive
>> data.
>> >> My first try to improve that was to poll only on the VCs associated to >> requests passed to MPID_Progress. That didn't work well because >> unfortunately, well-written MPI applications are scarce, and calling >> MPI_Wait on the wrong request resulted in a deadlock. >> >> My second try is a lot better. The RDMA polling set is now restrained to : >> * VCs on which we have waiting posted receives; >> * VCs on which we have a rendez-vous send in progress. >> .. and it seems to work fine and quickly, since polling is quite always >> directed to the right VC. >> >> Has anyone already a good (better) solution for that ? Am I totally >> mistaken in my understanding of the MVAPICH 2 code ? If I'm not, I will >> consider cleaning things and proposing a patch against 0.9.8, unless I >> should wait until 0.9.9 ? >> >> Thanks in advance for your opinions/comments/flames on that, >> >> Sylvain >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >> > -------------- next part -------------- diff -ur mvapich2-0.9.8/src/mpid/osu_ch3/channels/mrail/src/gen2/ibv_channel_manager.c mvapich2-0.9.8-tpp/src/mpid/osu_ch3/channels/mrail/src/gen2/ibv_channel_manager.c --- mvapich2-0.9.8/src/mpid/osu_ch3/channels/mrail/src/gen2/ibv_channel_manager.c 2006-10-13 16:50:28.000000000 +0200 +++ mvapich2-0.9.8-tpp/src/mpid/osu_ch3/channels/mrail/src/gen2/ibv_channel_manager.c 2007-03-26 17:04:40.000000000 +0200 @@ -240,16 +240,73 @@ return type; } +int MPIDI_CH3I_MRAILI_Get_next_vbuf_on_vc(MPIDI_VC_t * vc, MPIDI_VC_t ** vc_ptr, vbuf ** vbuf_ptr) { + vbuf *v; + int seq; + volatile VBUF_FLAG_TYPE *tail; + + v = NULL; + + if (vc->mrail.rfp.RDMA_recv_buf == NULL) { + vbuf_fast_rdma_alloc(vc, 1); + vbuf_address_send(vc); + } + + seq = GetSeqNumVbuf(vc->mrail.cmanager.msg_channels[INDEX_LOCAL(&vc->mrail.cmanager,0)].v_queue_head); + if (seq == PKT_IS_NULL) { + v = 
&(vc->mrail.rfp.RDMA_recv_buf[vc->mrail.rfp.p_RDMA_recv]); + tail = v->head_flag; + + if (*tail && vc->mrail.rfp.p_RDMA_recv != vc->mrail.rfp.p_RDMA_recv_tail) { + DEBUG_PRINT("Get one!!!!!!!!!!!!!!\n"); + if (++(vc->mrail.rfp.p_RDMA_recv) >= num_rdma_buffer) + vc->mrail.rfp.p_RDMA_recv = 0; + MRAILI_FAST_RDMA_VBUF_START(v, *tail, v->pheader) + v->content_size = *v->head_flag; + + seq = GetSeqNumVbuf(v); + if (seq == vc->seqnum_recv) { + DEBUG_PRINT("Get one exact seq: %d\n", seq); + vc->seqnum_recv ++; + *vbuf_ptr = v; + *vc_ptr = v->vc; + return T_CHANNEL_EXACT_ARRIVE; + } else if( seq == PKT_NO_SEQ_NUM) { + DEBUG_PRINT("[vbuf_local]: get control msg\n"); + *vbuf_ptr = v; + *vc_ptr = v->vc; + return T_CHANNEL_CONTROL_MSG_ARRIVE; + } else { + DEBUG_PRINT("Get one out of order seq: %d, expecting %d\n", + seq, vc->seqnum_recv); + VQUEUE_ENQUEUE(&vc->mrail.cmanager, + INDEX_LOCAL(&vc->mrail.cmanager,0), v); + return T_CHANNEL_NO_ARRIVE; + } + } else + return T_CHANNEL_NO_ARRIVE; + } + + if (seq == vc->seqnum_recv) { + *vbuf_ptr = VQUEUE_DEQUEUE(&vc->mrail.cmanager, INDEX_LOCAL(&vc->mrail.cmanager,0)); + *vc_ptr = (*vbuf_ptr)->vc; + vc->seqnum_recv ++; + return T_CHANNEL_EXACT_ARRIVE; + } else if (seq == PKT_NO_SEQ_NUM) { + *vbuf_ptr = VQUEUE_DEQUEUE(&vc->mrail.cmanager, INDEX_LOCAL(&vc->mrail.cmanager,0)); + *vc_ptr = (*vbuf_ptr)->vc; + return T_CHANNEL_CONTROL_MSG_ARRIVE; + } + return T_CHANNEL_NO_ARRIVE; +} + +extern MPIDI_VC_t * rvip_list; + int MPIDI_CH3I_MRAILI_Get_next_vbuf(MPIDI_VC_t ** vc_ptr, vbuf ** vbuf_ptr) { - MPIDI_VC_t *vc; int type = T_CHANNEL_NO_ARRIVE; int i; - int seq; - vbuf *v; - volatile VBUF_FLAG_TYPE *tail; - v = NULL; *vc_ptr = NULL; *vbuf_ptr = NULL; @@ -265,58 +322,44 @@ if (num_rdma_buffer == 0) goto fn_exit; - /* no msg is queued, poll rdma polling set */ - for (i = 0; i < MPIDI_CH3I_RDMA_Process.polling_group_size; i++) { - vc = MPIDI_CH3I_RDMA_Process.polling_set[i]; - seq = 
GetSeqNumVbuf(vc->mrail.cmanager.msg_channels[INDEX_LOCAL(&vc->mrail.cmanager,0)].v_queue_head); - if (seq == PKT_IS_NULL) { - v = &(vc->mrail.rfp.RDMA_recv_buf[vc->mrail.rfp.p_RDMA_recv]); - tail = v->head_flag; - - if (*tail && vc->mrail.rfp.p_RDMA_recv != vc->mrail.rfp.p_RDMA_recv_tail) { - DEBUG_PRINT("Get one!!!!!!!!!!!!!!\n"); - if (++(vc->mrail.rfp.p_RDMA_recv) >= num_rdma_buffer) - vc->mrail.rfp.p_RDMA_recv = 0; - MRAILI_FAST_RDMA_VBUF_START(v, *tail, v->pheader) - v->content_size = *v->head_flag; - - seq = GetSeqNumVbuf(v); - if (seq == vc->seqnum_recv) { - DEBUG_PRINT("Get one exact seq: %d\n", seq); - type = T_CHANNEL_EXACT_ARRIVE; - vc->seqnum_recv ++; - *vbuf_ptr = v; - *vc_ptr = v->vc; - goto fn_exit; - } else if( seq == PKT_NO_SEQ_NUM) { - type = T_CHANNEL_CONTROL_MSG_ARRIVE; - DEBUG_PRINT("[vbuf_local]: get control msg\n"); - *vbuf_ptr = v; - *vc_ptr = v->vc; - goto fn_exit; - } else { - DEBUG_PRINT("Get one out of order seq: %d, expecting %d\n", - seq, vc->seqnum_recv); - VQUEUE_ENQUEUE(&vc->mrail.cmanager, - INDEX_LOCAL(&vc->mrail.cmanager,0), v); - continue; - } - } else - continue; - } - - if (seq == vc->seqnum_recv) { - *vbuf_ptr = VQUEUE_DEQUEUE(&vc->mrail.cmanager, INDEX_LOCAL(&vc->mrail.cmanager,0)); - *vc_ptr = (*vbuf_ptr)->vc; - vc->seqnum_recv ++; - type = T_CHANNEL_EXACT_ARRIVE; - goto fn_exit; - } else if (seq == PKT_NO_SEQ_NUM) { - *vbuf_ptr = VQUEUE_DEQUEUE(&vc->mrail.cmanager, INDEX_LOCAL(&vc->mrail.cmanager,0)); - *vc_ptr = (*vbuf_ptr)->vc; - type = T_CHANNEL_CONTROL_MSG_ARRIVE; - goto fn_exit; - } + if (rdma_targeted_polling) { + /* New Progress Path */ + /* no msg is queued, poll rdma polling set */ + MPIDI_VC_t ** vcs; + int count; + MPIDI_CH3U_Recvq_get_AVT(&vcs, &count); + + if (!vcs && count == -2) { /* any source receives : poll everything */ + MPIDI_CH3U_Recvq_release_AVT(); + for (i = 0; i < MPIDI_CH3I_RDMA_Process.polling_group_size; i++) + if ((type = 
MPIDI_CH3I_MRAILI_Get_next_vbuf_on_vc(MPIDI_CH3I_RDMA_Process.polling_set[i], vc_ptr, vbuf_ptr)) != T_CHANNEL_NO_ARRIVE) + goto fn_exit; + goto fn_exit; + } else { + /* progress on rendez-vous transfers */ + { + MPIDI_VC_t * vc = rvip_list; + while (vc) { + if ((type = MPIDI_CH3I_MRAILI_Get_next_vbuf_on_vc(vc, vc_ptr, vbuf_ptr)) != T_CHANNEL_NO_ARRIVE) { + MPIDI_CH3U_Recvq_release_AVT(); + goto fn_exit; + } + vc = vc->mrail.next_rvip_vc; + } + } + /* progress on receives */ + for (i = 0; i < count; i++) { + if ((type = MPIDI_CH3I_MRAILI_Get_next_vbuf_on_vc(vcs[i], vc_ptr, vbuf_ptr)) != T_CHANNEL_NO_ARRIVE) { + MPIDI_CH3U_Recvq_release_AVT(); + goto fn_exit; + } + } + MPIDI_CH3U_Recvq_release_AVT(); + } + } else { + for (i = 0; i < MPIDI_CH3I_RDMA_Process.polling_group_size; i++) + if ((type = MPIDI_CH3I_MRAILI_Get_next_vbuf_on_vc(MPIDI_CH3I_RDMA_Process.polling_set[i], vc_ptr, vbuf_ptr)) != T_CHANNEL_NO_ARRIVE) + goto fn_exit; } fn_exit: return type; diff -ur mvapich2-0.9.8/src/mpid/osu_ch3/channels/mrail/src/gen2/ibv_param.c mvapich2-0.9.8-tpp/src/mpid/osu_ch3/channels/mrail/src/gen2/ibv_param.c --- mvapich2-0.9.8/src/mpid/osu_ch3/channels/mrail/src/gen2/ibv_param.c 2006-11-10 20:07:37.000000000 +0100 +++ mvapich2-0.9.8-tpp/src/mpid/osu_ch3/channels/mrail/src/gen2/ibv_param.c 2007-03-26 17:26:56.000000000 +0200 @@ -59,6 +59,7 @@ int rdma_max_inline_size; unsigned int rdma_ndreg_entries = RDMA_NDREG_ENTRIES; int num_rdma_buffer; +int rdma_targeted_polling; /* max (total) number of vbufs to allocate, after which process * terminates with a fatal error. 
@@ -447,6 +448,8 @@ } else { rdma_credit_preserve = 3; } + + rdma_targeted_polling = 1; } void rdma_get_user_parameters(int num_proc, int me) @@ -508,6 +511,9 @@ if ((value = getenv("MV2_NUM_RDMA_BUFFER")) != NULL) { num_rdma_buffer = (int)atoi(value); } + if ((value = getenv("MV2_TARGETED_POLLING")) != NULL) { + rdma_targeted_polling = (int)atoi(value); + } if ((value = getenv("MV2_POLLING_SET_THRESHOLD")) != NULL && MPIDI_CH3I_RDMA_Process.has_adaptive_fast_path) { rdma_polling_set_threshold = atoi(value); diff -ur mvapich2-0.9.8/src/mpid/osu_ch3/channels/mrail/src/gen2/ibv_param.h mvapich2-0.9.8-tpp/src/mpid/osu_ch3/channels/mrail/src/gen2/ibv_param.h --- mvapich2-0.9.8/src/mpid/osu_ch3/channels/mrail/src/gen2/ibv_param.h 2006-10-03 20:22:56.000000000 +0200 +++ mvapich2-0.9.8-tpp/src/mpid/osu_ch3/channels/mrail/src/gen2/ibv_param.h 2007-03-26 17:02:05.000000000 +0200 @@ -42,6 +42,7 @@ extern int rdma_read_reserve; extern float rdma_credit_update_threshold; extern int num_rdma_buffer; +extern int rdma_targeted_polling; extern int rdma_iba_eager_threshold; extern char rdma_iba_hca[32]; extern unsigned int rdma_ndreg_entries; diff -ur mvapich2-0.9.8/src/mpid/osu_ch3/channels/mrail/src/gen2/mpidi_ch3_rdma_post.h mvapich2-0.9.8-tpp/src/mpid/osu_ch3/channels/mrail/src/gen2/mpidi_ch3_rdma_post.h --- mvapich2-0.9.8/src/mpid/osu_ch3/channels/mrail/src/gen2/mpidi_ch3_rdma_post.h 2006-10-03 20:22:56.000000000 +0200 +++ mvapich2-0.9.8-tpp/src/mpid/osu_ch3/channels/mrail/src/gen2/mpidi_ch3_rdma_post.h 2007-03-26 17:02:05.000000000 +0200 @@ -126,6 +126,51 @@ } \ } +extern MPIDI_VC_t *rvip_list; + +/* Maintain rvip list */ +#define ENQUEUE_RVIP_LIST(v) { \ + int already_in_list = 0; \ + MPIDI_VC_t *last_v_, *v_ = rvip_list; \ + while (v_) { \ + if (v_ == v) {\ + already_in_list = 1; \ + break; \ + } \ + last_v_ = v_; \ + v_ = v_->mrail.next_rvip_vc; \ + } \ + if (!already_in_list) { \ + if (rvip_list) \ + last_v_->mrail.next_rvip_vc = v; \ + else \ + rvip_list = v; \ + 
v->mrail.next_rvip_vc = NULL; \ + v->mrail.rvip_count = 1; \ + } else { \ + v->mrail.rvip_count++; \ + } \ +} + +#define DEQUEUE_RVIP_LIST(v) { \ + MPIDI_VC_t *last_v_, *v_ = rvip_list; \ + while (v_) { \ + if (v_ == v) {\ + if (v->mrail.rvip_count > 1) { \ + v->mrail.rvip_count--; \ + } else { \ + if (v == rvip_list) \ + rvip_list = v->mrail.next_rvip_vc; \ + else \ + last_v_->mrail.next_rvip_vc = v->mrail.next_rvip_vc; \ + } \ + break; \ + } \ + last_v_ = v_; \ + v_ = v_->mrail.next_rvip_vc; \ + } \ +} + /* * Attached to each connection is a list of send handles that * represent rendezvous sends that have been started and acked but not @@ -166,6 +211,8 @@ if (NULL == (c)->mrail.sreq_head) { \ (c)->mrail.sreq_tail = NULL; \ } \ + if ((c)->mrail.sreq_head == NULL) \ + DEQUEUE_RVIP_LIST(c); \ } #define MPIDI_CH3I_MRAIL_SET_PKT_RNDV(_pkt, _req) \ diff -ur mvapich2-0.9.8/src/mpid/osu_ch3/channels/mrail/src/gen2/mpidi_ch3_rdma_pre.h mvapich2-0.9.8-tpp/src/mpid/osu_ch3/channels/mrail/src/gen2/mpidi_ch3_rdma_pre.h --- mvapich2-0.9.8/src/mpid/osu_ch3/channels/mrail/src/gen2/mpidi_ch3_rdma_pre.h 2006-11-10 20:07:37.000000000 +0100 +++ mvapich2-0.9.8-tpp/src/mpid/osu_ch3/channels/mrail/src/gen2/mpidi_ch3_rdma_pre.h 2007-03-26 17:02:05.000000000 +0200 @@ -273,6 +273,11 @@ */ void *nextflow; int inflow; + + /* rvip_list construction */ + void *next_rvip_vc; + int rvip_count; + /* used to distinguish which VIA barrier synchronozations have * completed on this connection. Currently, only used during * process teardown. 
diff -ur mvapich2-0.9.8/src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_istartrndvmsg.c mvapich2-0.9.8-tpp/src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_istartrndvmsg.c --- mvapich2-0.9.8/src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_istartrndvmsg.c 2006-11-10 20:07:37.000000000 +0100 +++ mvapich2-0.9.8-tpp/src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_istartrndvmsg.c 2007-03-26 17:02:05.000000000 +0200 @@ -168,6 +168,9 @@ DEBUG_PRINT("[send rts]successful complete\n"); MPIDI_DBG_PRINTF((50, FCNAME, "exiting")); MPIDI_FUNC_EXIT(MPID_STATE_MPIDI_CH3_ISTARTRNDVMSG); + + ENQUEUE_RVIP_LIST(vc); + return mpi_errno; } @@ -243,6 +246,9 @@ MPIDI_CH3I_CR_unlock(); #endif DEBUG_PRINT("[send rts]successful complete\n"); + + ENQUEUE_RVIP_LIST(vc); + return mpi_errno; } @@ -286,6 +292,8 @@ MPIDI_CH3I_CR_unlock(); #endif + ENQUEUE_RVIP_LIST(vc); + return mpi_errno; } diff -ur mvapich2-0.9.8/src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_rndvtransfer.c mvapich2-0.9.8-tpp/src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_rndvtransfer.c --- mvapich2-0.9.8/src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_rndvtransfer.c 2006-11-10 20:07:37.000000000 +0100 +++ mvapich2-0.9.8-tpp/src/mpid/osu_ch3/channels/mrail/src/rdma/ch3_rndvtransfer.c 2007-03-26 17:02:05.000000000 +0200 @@ -28,6 +28,9 @@ MPIDI_VC_t *flowlist; +/* rendez-vous in progress list */ +MPIDI_VC_t *rvip_list = NULL; + #undef DEBUG_PRINT #ifdef DEBUG #define DEBUG_PRINT(args...) \ diff -ur mvapich2-0.9.8/src/mpid/osu_ch3/src/ch3u_recvq.c mvapich2-0.9.8-tpp/src/mpid/osu_ch3/src/ch3u_recvq.c --- mvapich2-0.9.8/src/mpid/osu_ch3/src/ch3u_recvq.c 2006-08-04 16:03:58.000000000 +0200 +++ mvapich2-0.9.8-tpp/src/mpid/osu_ch3/src/ch3u_recvq.c 2007-03-26 17:06:22.000000000 +0200 @@ -503,3 +503,105 @@ MPIDI_FUNC_EXIT(MPID_STATE_MPIDI_CH3U_RECVQ_FDP_OR_AEU); return rreq; } + +/* + * MPIDI_CH3U_Recvq_get_AVT() + * + * Get the Active VCs Table to know which VCs to poll. + * The table is global, MPIDI_CH3U_Recvq_release_AVT must be called after use. 
+ */ +#undef FUNCNAME +#define FUNCNAME MPIDI_CH3U_Recvq_get_vcs +#undef FCNAME +#define FCNAME MPIDI_QUOTE(FUNCNAME) +void MPIDI_CH3U_Recvq_get_AVT(MPIDI_VC_t *** vcs, int *count) +{ + static MPIDI_VC_t ** vcs_array = NULL; +#ifdef VCS_DEBUG + static MPIDI_VC_t ** last_vcs_array = NULL; +#endif + static int vcs_array_size = 16; + MPID_Request * rreq; + int i, n_vcs = 0; + MPIDI_VC_t *vc; + + MPIDI_STATE_DECL(MPID_STATE_MPIDI_CH3U_RECVQ_GET_AVT); + + MPIDI_FUNC_ENTER(MPID_STATE_MPIDI_CH3U_RECVQ_GET_AVT); + + /* FIXME Lock */ + + if (vcs_array == NULL) { /* Done once */ + vcs_array = (MPIDI_VC_t **)(malloc(sizeof(MPIDI_VC_t *) * vcs_array_size)); +#ifdef VCS_DEBUG + last_vcs_array = (MPIDI_VC_t **)(malloc(sizeof(MPIDI_VC_t *) * vcs_array_size)); +#endif + } + + rreq = recvq_posted_head; + while (rreq) { + if (rreq->dev.match.rank == -2) { + /* ANY_SOURCE ! */ + *vcs = NULL; + *count = -2; + goto fn_exit; + } + + /* request is not any_source */ + MPIDI_Comm_get_vc(rreq->comm, rreq->dev.match.rank, &vc); + + for (i=0; i vcs_array_size) { + vcs_array_size *= 2; + vcs_array = (MPIDI_VC_t **)(realloc(vcs_array, sizeof(MPIDI_VC_t *) * vcs_array_size)); +#ifdef VCS_DEBUG + last_vcs_array = (MPIDI_VC_t **)(realloc(last_vcs_array, sizeof(MPIDI_VC_t *) * vcs_array_size)); +#endif + } +next_req: + rreq=rreq->dev.next; + } + *count = n_vcs; + *vcs = vcs_array; + +#ifdef VCS_DEBUG + for (i=0; ipg_rank); + break; + } + } + + for (i=0; i References: <4613464D.1040307@sara.nl> Message-ID: <46151019.3050608@sara.nl> Just tried mvapich1 version 0.9.9 fro svn and this also fails the xcbrd test: Relative machine precision (eps) is taken to be 0.596046E-07 Routines pass computational tests if scaled residual is less than 10.000 TIME M N NB P Q BRD Time MFLOPS Residual CHECK ---- ------ ------ --- ----- ----- --------- ----------- -------- ------ WALL 4 4 2 1 1 0.00 0.00 0.58 PASSED ||A - Q*B*P|| / (||A|| * N * eps) = NaN WALL 4 4 3 1 1 0.00 0.00 NaN FAILED ||A - Q*B*P|| / (||A|| * N * 
eps) = NaN WALL 4 4 4 1 1 0.00 0.00 NaN FAILED -- ******************************************************************** * * * Bas van der Vlies e-mail: basv@sara.nl * * SARA - Academic Computing Services phone: +31 20 592 8012 * * Kruislaan 415 fax: +31 20 6683167 * * 1098 SJ Amsterdam * * * ******************************************************************** From panda at cse.ohio-state.edu Thu Apr 5 11:11:48 2007 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Thu Apr 5 11:11:53 2007 Subject: [mvapich-discuss] xcbrd tests In-Reply-To: <46151019.3050608@sara.nl> from "Bas van der Vlies" at Apr 05, 2007 05:04:57 PM Message-ID: <200704051511.l35FBmex026424@xi.cse.ohio-state.edu> Hi Bas, Thanks for your notes. We are taking a look at this issue and hope to get back to you shortly. Thanks, DK > Just tried mvapich1 version 0.9.9 fro svn and this also fails the xcbrd > test: > > Relative machine precision (eps) is taken to be 0.596046E-07 > Routines pass computational tests if scaled residual is less than > 10.000 > > TIME M N NB P Q BRD Time MFLOPS Residual CHECK > ---- ------ ------ --- ----- ----- --------- ----------- -------- ------ > > WALL 4 4 2 1 1 0.00 0.00 0.58 PASSED > ||A - Q*B*P|| / (||A|| * N * eps) = NaN > WALL 4 4 3 1 1 0.00 0.00 NaN FAILED > ||A - Q*B*P|| / (||A|| * N * eps) = NaN > WALL 4 4 4 1 1 0.00 0.00 NaN FAILED > > > > -- > ******************************************************************** > * * > * Bas van der Vlies e-mail: basv@sara.nl * > * SARA - Academic Computing Services phone: +31 20 592 8012 * > * Kruislaan 415 fax: +31 20 6683167 * > * 1098 SJ Amsterdam * > * * > ******************************************************************** > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From rowland at cse.ohio-state.edu Sat Apr 7 00:03:08 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) 
Date: Sat Apr 7 00:03:28 2007 Subject: [mvapich-discuss] xcbrd tests In-Reply-To: <46151019.3050608@sara.nl> References: <4613464D.1040307@sara.nl> <46151019.3050608@sara.nl> Message-ID: <461717FC.1040709@cse.ohio-state.edu> Bas van der Vlies wrote: > Just tried mvapich1 version 0.9.9 fro svn and this also fails the xcbrd > test: > > Relative machine precision (eps) is taken to be 0.596046E-07 > Routines pass computational tests if scaled residual is less than 10.000 > > TIME M N NB P Q BRD Time MFLOPS Residual CHECK > ---- ------ ------ --- ----- ----- --------- ----------- -------- ------ > > WALL 4 4 2 1 1 0.00 0.00 0.58 PASSED > ||A - Q*B*P|| / (||A|| * N * eps) = NaN > WALL 4 4 3 1 1 0.00 0.00 NaN FAILED > ||A - Q*B*P|| / (||A|| * N * eps) = NaN > WALL 4 4 4 1 1 0.00 0.00 NaN FAILED Hi Bas. I've been looking into this issue for a while. I believe I know what the problem is. I built the following packages: BLACS ATLAS ScaLAPACK (using the two above) I built these four times for the following MPI installations: MVAPICH 0.9.9 with gfortran MVAPICH 0.9.9 with g77 MVAPICH2 0.9.8 with gfortran MVAPICH2 0.9.8 with g77 The only time I had a problem was when I accidentally built the ATLAS package with g77 and tried to use it with the other packages that were built with gfortran. I thought this was probably the problem here anyway, but since I accidentally did this the first time - I could see that I got the same errors as you had reported for that one case. I believe your problem is that all of your packages, including MVAPICH/MVAPICH2, were not built with the same Fortran compiler. The Fortran compiler needs to be common in all builds or you will run into problems, and the problems won't be apparent until you try to use the libraries and get strange results. This is exactly the type of situation you've reported. 
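One well-known way a g77/gfortran mix can produce garbage like this (offered only as a possible mechanism; nothing in this thread confirms it is the cause here) is the calling convention for functions that return COMPLEX. g77 follows the f2c convention and stores the result through a hidden extra argument, while gfortran by default returns it by value unless built with -ff2c. The sketch below models the two conventions in C. The names `cfloat`, `dotc_by_value`, and `dotc_f2c` are hypothetical, and the struct is only a stand-in for a Fortran COMPLEX.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a single-precision Fortran COMPLEX value. */
typedef struct {
    float re;
    float im;
} cfloat;

/* Modelled "return by value" convention (gfortran default):
 * computes sum(conj(x[i]) * y[i]) and returns it directly. */
cfloat dotc_by_value(int n, const cfloat *x, const cfloat *y)
{
    cfloat acc = {0.0f, 0.0f};
    assert(n >= 0 && x != NULL && y != NULL);
    for (int i = 0; i < n; i++) {
        /* conj(a + bi) * (c + di) = (ac + bd) + (ad - bc)i */
        acc.re += x[i].re * y[i].re + x[i].im * y[i].im;
        acc.im += x[i].re * y[i].im - x[i].im * y[i].re;
    }
    return acc;
}

/* Modelled f2c/g77 convention: the result is written through a hidden
 * first argument and the function itself returns nothing. */
void dotc_f2c(cfloat *result, int n, const cfloat *x, const cfloat *y)
{
    assert(result != NULL);
    *result = dotc_by_value(n, x, y);
}
```

Each function gives the right answer when called through its own prototype. If a caller compiled for one convention reaches a routine compiled for the other, the result slot is either never written or read from the wrong place, so the caller computes with uninitialised data. That would be consistent with NaN residuals appearing only in a complex-valued test, but again this is an assumption about the mechanism, not something established in the thread.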
If you make sure that ScaLAPACK is built with either g77 or gfortran, matching the MVAPICH/MVAPICH2 build you want to test - and also any of the dependencies of ScaLAPACK as well - then these strange problems should go away. The Fortran compiler needs to be the same all around. I have not yet run every ScaLAPACK test, but I ran the ones you reported. I had no issues when the Fortran compilers were uniform. Only when there was a cross g77/gfortran built library introduced did I see the same behavior. I have notes on how I built the packages listed above: BLACS ----- http://www.cse.ohio-state.edu/~rowland/work/blacs.html ATLAS ----- http://www.cse.ohio-state.edu/~rowland/work/atlas.html ScaLAPACK --------- http://www.cse.ohio-state.edu/~rowland/work/scalapack.html Maybe those notes will be useful to compare with. On a side note: if you are using shared library builds of MVAPICH/MVAPICH2 to test, be sure that the path to libmpich.a is not used in any of these configuration files because the mpicc and mpif77 commands place the path of the shared library right into the binary result, and this will cause problems if the programs are statically linked in a weird way like this. No one would or should do that sort of thing, but with these configuration files you need to edit - it's possible to make a mistake here. If you are using a static library build of MVAPICH/MVAPICH2, this does not matter. The steps I've outlined note this appropriately and do the right thing. Also, you do need GFORTRAN_UNBUFFERED_ALL=y set in your environment for the gfortran cases. For MVAPICH2, simply export that variable. For MVAPICH it needs to be specified on the mpirun_rsh command line: mpirun_rsh -np 4 host1 host2 host3 host4 GFORTRAN_UNBUFFERED_ALL=y ./test for example. This is noted on the web pages above too. 
-- Shaun Rowland rowland@cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From basv at sara.nl Sat Apr 7 03:31:11 2007 From: basv at sara.nl (Bas van der Vlies) Date: Sat Apr 7 03:31:20 2007 Subject: [mvapich-discuss] xcbrd tests In-Reply-To: <461717FC.1040709@cse.ohio-state.edu> References: <4613464D.1040307@sara.nl> <46151019.3050608@sara.nl> <461717FC.1040709@cse.ohio-state.edu> Message-ID: <17F55245-49AC-45C3-9907-714CF22AE37A@sara.nl> Shaun, First of all, we use the same Fortran compiler for all the packages; that is why we sent the results to this list. I will read your web pages and build the libraries as you suggested and let you know what the result is. My question is: which version of mvapich2 did you use? We have version 0.9.8 with 4 patches applied. Which gfortran/gcc version? Ours is 4.1.1. Regards, and a happy Easter On Apr 7, 2007, at 6:03 AM, Shaun Rowland wrote: > Bas van der Vlies wrote: >> Just tried mvapich1 version 0.9.9 fro svn and this also fails the >> xcbrd test: >> Relative machine precision (eps) is taken to be 0.596046E-07 >> Routines pass computational tests if scaled residual is less than >> 10.000 >> TIME M N NB P Q BRD Time MFLOPS Residual >> CHECK >> ---- ------ ------ --- ----- ----- --------- ----------- -------- >> ------ >> WALL 4 4 2 1 1 0.00 0.00 0.58 >> PASSED >> ||A - Q*B*P|| / (||A|| * N * eps) = NaN >> WALL 4 4 3 1 1 0.00 0.00 NaN >> FAILED >> ||A - Q*B*P|| / (||A|| * N * eps) = NaN >> WALL 4 4 4 1 1 0.00 0.00 NaN >> FAILED > > Hi Bas. I've been looking into this issue for a while. I believe I > know what the problem is. 
I built the following packages: > > BLACS > ATLAS > ScaLAPACK (using the two above) > > I built these four times for the following MPI installations: > > MVAPICH 0.9.9 with gfortran > MVAPICH 0.9.9 with g77 > MVAPICH2 0.9.8 with gfortran > MVAPICH2 0.9.8 with g77 > > The only time I had a problem was when I accidentally built the ATLAS > package with g77 and tried to use it with the other packages that were > built with gfortran. I thought this was probably the problem here > anyway, but since I accidentally did this the first time - I could see > that I got the same errors as you had reported for that one case. I > believe your problem is that all of your packages, including > MVAPICH/MVAPICH2, were not built with the same Fortran compiler. The > Fortran compiler needs to be common in all builds or you will run into > problems, and the problems won't be apparent until you try to use the > libraries and get strange results. This is exactly the type of > situation > you've reported. > > If you make sure that ScaLAPACK is built with either g77 or gfortran, > matching the MVAPICH/MVAPICH2 build you want to test - and also any of > the dependencies of ScaLAPACK as well - then these strange problems > should go away. The Fortran compiler needs to be the same all > around. I > have not yet run every ScaLAPACK test, but I ran the ones you > reported. > I had no issues when the Fortran compilers were uniform. Only when > there > was a cross g77/gfortran built library introduced did I see the same > behavior. > > I have notes on how I built the packages listed above: > > BLACS > ----- > http://www.cse.ohio-state.edu/~rowland/work/blacs.html > > ATLAS > ----- > http://www.cse.ohio-state.edu/~rowland/work/atlas.html > > ScaLAPACK > --------- > http://www.cse.ohio-state.edu/~rowland/work/scalapack.html > > Maybe those notes will be useful to compare with. 
On a side note: > if you > are using shared library builds of MVAPICH/MVAPICH2 to test, be sure > that the path to libmpich.a is not used in any of these configuration > files because the mpicc and mpif77 commands place the path of the > shared > library right into the binary result, and this will cause problems if > the programs are statically linked in a weird way like this. No one > would or should do that sort of thing, but with these configuration > files you need to edit - it's possible to make a mistake here. If you > are using a static library build of MVAPICH/MVAPICH2, this does not > matter. The steps I've outlined note this appropriately and do the > right > thing. > > Also, you do need GFORTRAN_UNBUFFERED_ALL=y set in your environment > for > the gfortran cases. For MVAPICH2, simply export that variable. For > MVAPICH it needs to be specified on the mpirun_rsh command line: > > mpirun_rsh -np 4 host1 host2 host3 host4 > GFORTRAN_UNBUFFERED_ALL=y ./test > > for example. This is noted on the web pages above too. > -- > Shaun Rowland rowland@cse.ohio-state.edu > http://www.cse.ohio-state.edu/~rowland/ -- Bas van der Vlies basv@sara.nl From rowland at cse.ohio-state.edu Sat Apr 7 06:32:35 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Sat Apr 7 06:32:55 2007 Subject: [mvapich-discuss] xcbrd tests In-Reply-To: <17F55245-49AC-45C3-9907-714CF22AE37A@sara.nl> References: <4613464D.1040307@sara.nl> <46151019.3050608@sara.nl> <461717FC.1040709@cse.ohio-state.edu> <17F55245-49AC-45C3-9907-714CF22AE37A@sara.nl> Message-ID: <46177343.3010505@cse.ohio-state.edu> Bas van der Vlies wrote: > Shaun, > > First of all we use the same fortran compilers for all the packages > that is why we send the results to this list. I will read your web pages > and build the libraries as you suggested and let you know what the > result is. My question is what version of mvapich2 did you use. we have > version 0.9.8 with 4 patches applied. 
Do you have separate builds of BLACS, ATLAS, and ScaLAPACK for each of the MPI implementations and the Fortran compiler used in each? In addition to the Fortran issue, BLACS and ScaLAPACK also link to the specific MPI library. My point about the Fortran compiler possibility was only because I saw similar problems when I incorrectly built ATLAS with the wrong Fortran compiler; otherwise I saw no issues with the tests you mentioned. Please double check the ATLAS package build if you are using that - that seems to be a possible problem to me. There is a "sanity check" target as well, to make sure there are no problems with the build. As for the version of MVAPICH2, I am using the current revision as of today from: https://mvapich.cse.ohio-state.edu/svn/mpi/mvapich2/branches/0.9.8 which should be revision 1158. We are only applying bug fixes there. > Which gfortran/gcc version. Ours is 4.1.1 I used gfortran 4.1.0 and gcc version 3.4.6 (this is RHEL AS 4 Update 4). The version difference between the C and Fortran compilers should not matter in this case. However, I just built with "gcc4" on RHEL 4 as well, so that gcc and gfortran would both be version 4.1.0. In order to do this I had to move /usr/bin/gcc out of the way and copy /usr/bin/gcc4 to /usr/bin/gcc because the ATLAS build system is, well... odd. It went from using the full path I had specified, /usr/bin/gcc4, to just "gcc," hence the need to make sure "gcc" was always "gcc4." It also did not like gcc 4.1.0, but I told it to proceed anyway. I had no problems with the tests you reported. 
The online documentation shows an example of building with gcc and ifort, and mentions using a non-default Fortran compiler, so using an alternative Fortran compiler should not be an issue. Again, the only time the tests you mentioned failed in such a way for me was when I accidentally built ATLAS with g77 instead of gfortran while everything else was built with gfortran. Otherwise I made sure the Fortran compilers were identical in all the packages, and I had no issues with those tests. In addition, BLACS and ScaLAPACK were built with the same MPI implementation's mpicc/mpif77 commands too. When you build ATLAS, does the sanity check pass all the way when using gcc 4.1.1 and your chosen Fortran compiler? Note: when I tried forcing gcc 4.1.0, the ATLAS sanity check did not build properly - which leaves me at a loss as to whether the library is really correct. But, as it seems from the documentation, building with an alternative Fortran compiler (using gcc 3.x and gfortran 4.x) should be valid. In any case, my gcc 4.1.0 and gfortran 4.1.0 build did work for MVAPICH2 here. All other cases I tried were gcc 3.4.6 and gfortran 4.1.0. > Regards and a happy easter Thanks, and happy easter to you. -- Shaun Rowland rowland@cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From basv at sara.nl Tue Apr 10 07:45:56 2007 From: basv at sara.nl (Bas van der Vlies) Date: Tue Apr 10 07:46:05 2007 Subject: [mvapich-discuss] xcbrd tests In-Reply-To: <46177343.3010505@cse.ohio-state.edu> References: <4613464D.1040307@sara.nl> <46151019.3050608@sara.nl> <461717FC.1040709@cse.ohio-state.edu> <17F55245-49AC-45C3-9907-714CF22AE37A@sara.nl> <46177343.3010505@cse.ohio-state.edu> Message-ID: <461B78F4.3090009@sara.nl> Shaun Rowland wrote: > Bas van der Vlies wrote: >> Shaun, >> >> First of all we use the same fortran compilers for all the packages >> that is why we send the results to this list. 
I will read your web >> pages and build the libraries as you suggested and let you know what >> the result is. My question is what version of mvapich2 did you use. we >> have version 0.9.8 with 4 patches applied. > > Do you have separate builds of BLACS, ATLAS, and ScaLAPACK for each of > the MPI implementations and the Fortran compiler used in each? In > addition to the Fortran issue, BLACS and ScaLAPACK also link to the > specific MPI library. > Shaun, we use BLACS, BLAS (instead of ATLAS, as it is easier to build and test) and ScaLAPACK. For each MPI implementation we have built versions. We build everything with gcc/gfortran version 4.1.1. We are using Debian Etch. Now for mvapich1-0.9.9 I have built ScaLAPACK and BLACS with different optimization flags (-O3 for Fortran and -O4 for cc) and the xcbrd test stops after several runs. It just hangs. We have also built ScaLAPACK/BLACS/BLAS for mpich-p4 and these runs are correct. Another solution was to use openmpi-1.2 and when we use -O3/-O4 for ScaLAPACK and BLACS all tests succeeded with the -O3/-O4 some test fails. Maybe it is the version of the compilers. We will now try to build some versions with gcc version 3.4 and find out if it is a compiler problem. 
Regards -- ******************************************************************** * * * Bas van der Vlies e-mail: basv@sara.nl * * SARA - Academic Computing Services phone: +31 20 592 8012 * * Kruislaan 415 fax: +31 20 6683167 * * 1098 SJ Amsterdam * * * ******************************************************************** From rowland at cse.ohio-state.edu Tue Apr 10 12:21:54 2007 From: rowland at cse.ohio-state.edu (Shaun Rowland) Date: Tue Apr 10 12:21:53 2007 Subject: [mvapich-discuss] xcbrd tests In-Reply-To: <461B78F4.3090009@sara.nl> References: <4613464D.1040307@sara.nl> <46151019.3050608@sara.nl> <461717FC.1040709@cse.ohio-state.edu> <17F55245-49AC-45C3-9907-714CF22AE37A@sara.nl> <46177343.3010505@cse.ohio-state.edu> <461B78F4.3090009@sara.nl> Message-ID: <461BB9A2.7030201@cse.ohio-state.edu> Bas van der Vlies wrote: > Shaun we use BLACS, BLAS (instead of ATLAS is easier to build and test) > and we use SCALAPACK. For each MPI implementation we have build versions. I will try BLAS then as well. Thanks for confirming the different build versions being used. > We build everything with gcc/gfortran 4.1.1 version. We are Using Debian > Etch. > > Now for mvapich1-0.9.9 i have made SCALAPACK and BLACS with different > optimization flags (-03 for fortran and -04 for cc) and the xcbrd test > stops after serveral runs. It just hangs > > We have also made SCALAPACK/BLACS/BLAS for mpich-p4 and these runs are > correct. > > Another solution was to use openmpi-1.2 and when we use -03/-04 for > scalapack and BLACS all tests succeeded with the -03/-04 some test fails I am confused on the very last sentence. There are cases where there are failures with openmpi as well? > Maybe it is the version of the compilers. We now try to build some > versions with gcc version 3.4 and find out if it is a compiler problem. I've since tried using the Intel compiler for everything (icc and ifort both), and it still works for me. 
I'll take a look at BLAS instead of ATLAS and see if there's a difference. -- Shaun Rowland rowland@cse.ohio-state.edu http://www.cse.ohio-state.edu/~rowland/ From jsquyres at cisco.com Tue Apr 10 12:41:41 2007 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue Apr 10 12:43:00 2007 Subject: [mvapich-discuss] xcbrd tests In-Reply-To: <461BB9A2.7030201@cse.ohio-state.edu> References: <4613464D.1040307@sara.nl> <46151019.3050608@sara.nl> <461717FC.1040709@cse.ohio-state.edu> <17F55245-49AC-45C3-9907-714CF22AE37A@sara.nl> <46177343.3010505@cse.ohio-state.edu> <461B78F4.3090009@sara.nl> <461BB9A2.7030201@cse.ohio-state.edu> Message-ID: <252836E5-1647-4062-9C09-081427589F99@cisco.com> On Apr 10, 2007, at 12:21 PM, Shaun Rowland wrote: >> Another solution was to use openmpi-1.2 and when we use -03/