From michael.heinz at qlogic.com Mon Jun 1 11:08:28 2009
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Mon Jun 1 11:09:38 2009
Subject: [mvapich-discuss] Odd behavior for Pallas Send/Receive Benchmark in MVAPICH 1.1
Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB43E9069BD7@MNEXMB1.qlogic.org>

Skipped content of type multipart/alternative
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 46912 bytes
Desc: image002.png
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090601/ba339278/image002-0001.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image005.png
Type: image/png
Size: 45281 bytes
Desc: image005.png
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090601/ba339278/image005-0001.png

From panda at cse.ohio-state.edu Mon Jun 1 15:02:40 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon Jun 1 15:03:00 2009
Subject: [mvapich-discuss] Odd behavior for Pallas Send/Receive Benchmark in MVAPICH 1.1
In-Reply-To: <4C2744E8AD2982428C5BFE523DF8CDCB43E9069BD7@MNEXMB1.qlogic.org>
Message-ID:

Hi Mike,

Thanks for your report. We tried running PMB (as well as IMB, the latest one) on both the released version of MVAPICH 1.1.0 and the branch version. We are getting the peak bandwidth to be in the range of 2400-2600 MB/s consistently. The experiments were done with Mellanox-IB cards, a DDR switch and Intel Clovertown platforms. We are not able to reproduce the problem you are mentioning.

Could you please provide more details on the platform, adapter, switch, etc.? Also, let us know if you are using any specific optimization level.

Thanks,

DK

On Mon, 1 Jun 2009, Mike Heinz wrote:

> We had a customer report what they thought was a hardware problem, and I was assigned to investigate. Basically, they were claiming odd variations in performance during PALLAS runs to test their InfiniBand fabric.
>
> What I discovered, however, was a much more interesting problem that could be duplicated on any fabric, as long as I was using MVAPICH 1.1.0.
>
> Basically, what I saw was that, given two hosts and a switch, the Pallas Send Receive benchmark compiled with MVAPICH 1.1.0 would report a performance of EITHER about 2600 MB/s OR 1850 MB/s with little variation otherwise. Moreover, this behavior is unique to MVAPICH 1.1.0 - switching to MVAPICH 2 eliminated the variation. I've attached a chart so you can see what I mean.
>
> [cid:image002.png@01C9E2A9.4A349440]
>
> I realize that, looking at the chart, your first instinct is to announce "clearly there was other traffic on the fabric that was interfering with the benchmark" - but I assure you that was not the case. Moreover, using the same nodes and same switch, but compiling with MVAPICH2, shows a complete elimination of the effect:
>
> [cid:image005.png@01C9E2A9.4A349440]
>
> Does anyone have any ideas what's going on? If anyone wants to replicate this test, all I did was to perform 100 runs of
>
> ./PMB2.2.1/SRC_PMB/PMB-MPI1 Sendrecv
>
> I only used the 4 meg message size for these charts, but that is just for clarity. The issue appears to affect shorter messages as well.
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
>

From michael.heinz at qlogic.com Mon Jun 1 15:15:36 2009
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Mon Jun 1 15:16:44 2009
Subject: [mvapich-discuss] Odd behavior for Pallas Send/Receive Benchmark in MVAPICH 1.1
In-Reply-To:
References: <4C2744E8AD2982428C5BFE523DF8CDCB43E9069BD7@MNEXMB1.qlogic.org>
Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB43E9069C28@MNEXMB1.qlogic.org>

Interesting.

For this test, we're using a couple of AMD Opterons, running at 2.4 GHz, and RHEL 4u6, a pair of Mellanox DDR HCAs and a Qlogic 9xxx switch.
We took the default when installing OFED and, looking at the build log, it appears that OFED used OPTIMIZATION_FLAG='-O3 -fno-strict-aliasing' when compiling mvapich. No optimization was chosen when compiling Pallas.

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

From kandalla at cse.ohio-state.edu Tue Jun 2 14:38:04 2009
From: kandalla at cse.ohio-state.edu (Krishna Chaitanya)
Date: Tue Jun 2 14:38:42 2009
Subject: [mvapich-discuss] Odd behavior for Pallas Send/Receive Benchmark in MVAPICH 1.1
In-Reply-To: <4C2744E8AD2982428C5BFE523DF8CDCB43E9069C28@MNEXMB1.qlogic.org>
References: <4C2744E8AD2982428C5BFE523DF8CDCB43E9069BD7@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB43E9069C28@MNEXMB1.qlogic.org>
Message-ID:

Mike,

We have run tests on Intel Clovertown and AMD Barcelona machines with MVAPICH-1.1 and we have compiled the library with the flags that you mentioned in your previous mail. Unfortunately, we are not able to reproduce the issue. We have run the complete Sendrecv Pallas benchmark about 100 times in a loop and we see that the peak bandwidth is in the 2400-2600 MB/s range consistently.

Could you try running the benchmark on two nodes connected back to back? This will eliminate any network or switch issues.

Thanks,
Krishna
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090602/c42e431c/attachment.html

From michael.heinz at qlogic.com Tue Jun 2 15:04:07 2009
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Tue Jun 2 15:05:28 2009
Subject: [mvapich-discuss] Odd behavior for Pallas Send/Receive Benchmark in MVAPICH 1.1
In-Reply-To:
References: <4C2744E8AD2982428C5BFE523DF8CDCB43E9069BD7@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB43E9069C28@MNEXMB1.qlogic.org>
Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB43E9069D24@MNEXMB1.qlogic.org>

Krishna,

I did just as you suggested, but the results were the same. I'm going to try rebuilding the MVAPICH installs themselves (instead of just pallas) and see what happens.

4194304   10   3109.00   3109.70   3109.35   2572.60
4194304   10   3396.10   3396.80   3396.45   2355.16
4194304   10   3113.20   3113.90   3113.55   2569.13
4194304   10   3104.90   3105.60   3105.25   2575.99
4194304   10   3109.30   3110.10   3109.70   2572.26
4194304   10   4402.20   4403.00   4402.60   1816.94
4194304   10   4400.70   4401.40   4401.05   1817.60
4194304   10   4404.30   4405.00   4404.65   1816.12
4194304   10   4408.50   4409.80   4409.15   1814.14
4194304   10   4405.40   4406.00   4405.70   1815.71
4194304   10   4408.90   4409.50   4409.20   1814.26
4194304   10   4400.10   4400.60   4400.35   1817.93
4194304   10   3107.90   3108.50   3108.20   2573.59
4194304   10   3395.10   3395.60   3395.35   2355.99
4194304   10   3111.50   3112.80   3112.15   2570.03
4194304   10   3110.30   3110.80   3110.55   2571.69
4194304   10   3106.60   3108.50   3107.55   2573.59
4194304   10   3115.60   3116.20   3115.90   2567.23
4194304   10   3107.70   3108.20   3107.95   2573.84
4194304   10   3107.40   3107.90   3107.65   2574.09
4194304   10   3109.00   3109.40   3109.20   2572.84
4194304   10   4406.70   4407.30   4407.00   1815.17
4194304   10   4408.50   4409.20   4408.85   1814.39
4194304   10   4412.30   4412.80   4412.55   1812.91
4194304   10   4404.40   4405.00   4404.70   1816.12
4194304   10   4432.10   4432.80   4432.45   1804.73
4194304   10   4430.40   4430.90   4430.65   1805.50
4194304   10   4616.30   4616.80   4616.55   1732.80
4194304   10   3107.60   3108.20   3107.90   2573.84
4194304   10   3103.90   3104.30   3104.10   2577.07
4194304   10   3104.00   3104.60   3104.30   2576.82
4194304   10   3108.50   3109.10   3108.80   2573.09
4194304   10   3108.60   3109.00   3108.80   2573.17
4194304   10   3111.00   3111.60   3111.30   2571.02
4194304   10   3112.40   3113.10   3112.75   2569.79
4194304   10   3688.60   3689.30   3688.95   2168.43
4194304   10   4399.60   4400.20   4399.90   1818.10
4194304   10   4407.00   4407.70   4407.35   1815.01
4194304   10   4448.30   4448.90   4448.60   1798.20
4194304   10   4413.10   4413.90   4413.50   1812.46
4194304   10   4403.70   4404.20   4403.95   1816.45
4194304   10   4407.30   4407.80   4407.55   1814.96
4194304   10   4405.00   4405.60   4405.30   1815.87
4194304   10   4420.20   4420.90   4420.55   1809.59
4194304   10   3105.30   3105.80   3105.55   2575.83
4194304   10   3109.60   3110.10   3109.85   2572.26
4194304   10   3109.10   3109.50   3109.30   2572.76
4194304   10   3109.80   3110.40   3110.10   2572.02
4194304   10   3112.60   3113.30   3112.95   2569.62
4194304   10   3107.40   3107.80   3107.60   2574.17
4194304   10   3111.20   3112.00   3111.60   2570.69
4194304   10   3385.60   3386.10   3385.85   2362.60
4194304   10   4404.70   4405.40   4405.05   1815.95
4194304   10   4411.60   4412.20   4411.90   1813.15
4194304   10   4409.90   4410.60   4410.25   1813.81
4194304   10   4415.10   4415.80   4415.45   1811.68
4194304   10   4406.00   4406.40   4406.20   1815.54
4194304   10   4428.00   4428.50   4428.25   1806.48
4194304   10   4405.50   4406.10   4405.80   1815.66
4194304   10   4410.90   4411.80   4411.35   1813.32
4194304   10   3107.90   3108.60   3108.25   2573.51
4194304   10   3388.60   3389.00   3388.80   2360.58
4194304   10   3451.40   3452.50   3451.95   2317.16
4194304   10   3113.70   3114.00   3113.85   2569.04
4194304   10   3107.20   3107.90   3107.55   2574.09
4194304   10   3107.00   3107.60   3107.30   2574.33
4194304   10   3112.00   3112.60   3112.30   2570.20
4194304   10   3688.00   3688.70   3688.35   2168.79
4194304   10   4427.60   4428.30   4427.95   1806.56
4194304   10   4413.70   4414.30   4414.00   1812.29
4194304   10   4419.60   4420.30   4419.95   1809.83
4194304   10   4411.00   4411.50   4411.25   1813.44
4194304   10   4393.80   4394.40   4394.10   1820.50
4194304   10   4397.60   4398.00   4397.80   1819.01
4194304   10   4408.10   4408.60   4408.35   1814.64
4194304   10   3107.20   3107.70   3107.45   2574.25
4194304   10   3109.10   3109.50   3109.30   2572.76
4194304   10   3102.80   3103.40   3103.10   2577.82
4194304   10   3101.90   3102.30   3102.10   2578.73
4194304   10   3107.50   3108.00   3107.75   2574.00
4194304   10   3102.30   3102.70   3102.50   2578.40
4194304   10   3108.60   3109.40   3109.00   2572.84
4194304   10   3105.30   3105.80   3105.55   2575.83
4194304   10   4389.00   4389.80   4389.40   1822.41
4194304   10   4397.50   4398.20   4397.85   1818.93
4194304   10   4404.60   4405.10   4404.85   1816.08
4194304   10   4423.50   4424.00   4423.75   1808.32
4194304   10   4435.90   4436.50   4436.20   1803.22
4194304   10   4422.20   4422.70   4422.45   1808.85
4194304   10   4408.30   4408.90   4408.60   1814.51
4194304   10   4389.80   4390.40   4390.10   1822.16
4194304   10   3103.90   3104.40   3104.15   2576.99
4194304   10   3103.60   3104.20   3103.90   2577.15
4194304   10   3102.30   3102.90   3102.60   2578.23
4194304   10   3104.60   3105.10   3104.85   2576.41
4194304   10   3103.40   3103.80   3103.60   2577.49
4194304   10   3106.50   3107.00   3106.75   2574.83
4194304   10   3109.80   3110.40   3110.10   2572.02
4194304   10   3120.00   3120.50   3120.25   2563.69
4194304   10   4404.10   4404.70   4404.40   1816.24

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090602/5e4f1457/attachment-0001.html

From michael.heinz at qlogic.com Tue Jun 2 16:45:06 2009
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Tue Jun 2 16:46:16 2009
Subject: [mvapich-discuss] Odd behavior for Pallas Send/Receive Benchmark in MVAPICH 1.1
In-Reply-To:
References: <4C2744E8AD2982428C5BFE523DF8CDCB43E9069BD7@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB43E9069C28@MNEXMB1.qlogic.org>
Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB43E9069D63@MNEXMB1.qlogic.org>

We are also seeing this behavior when we installed "vanilla" OFED rather than QLogic's pre-packaged binaries.
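
For anyone checking the Sendrecv rows posted earlier in this thread, here is a minimal sketch of how the numbers relate. It assumes the usual PMB Sendrecv column order (#bytes, #repetitions, t_min, t_max, t_avg in usec, Mbytes/sec) and that the last column equals 2 x message size / t_max with "Mbyte" meaning 2^20 bytes; that reading reproduces the posted values but is not stated in the thread. The function names and the 2100 MB/s cut-off between the two groups are hypothetical, chosen only for illustration.

```python
MIB = 1 << 20  # assumption: PMB counts "Mbytes" in units of 2**20 bytes


def sendrecv_mbytes_per_sec(msg_bytes, t_max_usec):
    """Throughput as Sendrecv appears to report it.

    Each rank both sends and receives msg_bytes per iteration,
    so 2 * msg_bytes are moved in t_max_usec microseconds.
    """
    return (2 * msg_bytes / MIB) / (t_max_usec * 1e-6)


def split_modes(bandwidths, threshold):
    """Partition runs into (fast, slow) around a hand-picked threshold."""
    fast = [b for b in bandwidths if b >= threshold]
    slow = [b for b in bandwidths if b < threshold]
    return fast, slow


# A few rows copied from the posted output: (#bytes, t_max usec, Mbytes/sec)
rows = [
    (4194304, 3109.70, 2572.60),
    (4194304, 3396.80, 2355.16),
    (4194304, 4403.00, 1816.94),
    (4194304, 4616.80, 1732.80),
]

for msg_bytes, t_max, reported in rows:
    computed = sendrecv_mbytes_per_sec(msg_bytes, t_max)
    print(f"t_max={t_max:8.2f} usec  reported={reported:8.2f}  computed={computed:8.2f}")

# Hypothetical cut-off of 2100 MB/s between the two clusters of runs
fast, slow = split_modes([r[2] for r in rows], threshold=2100.0)
print(f"fast runs: {len(fast)}, slow runs: {len(slow)}")
```

The same split can be fed the full Mbytes/sec column to count how many of the 100 runs land in each group.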
--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

_______________________________________________
mvapich-discuss mailing list
mvapich-discuss@cse.ohio-state.edu
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090602/7a285445/attachment.html

From panda at cse.ohio-state.edu Tue Jun 2 21:58:21 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Tue Jun 2 21:58:40 2009
Subject: [mvapich-discuss] Announcing the Release of MVAPICH2 1.4RC1
Message-ID:

The MVAPICH team is pleased to announce the release of MVAPICH2 1.4RC1
with the following NEW features:

- MPI 2.1 standard compliant

- Based on MPICH2 1.0.8p1

- Dynamic Process Management (DPM) Support with mpirun_rsh and MPD
    - Available for OpenFabrics (IB) interface

- Support for eXtended Reliable Connection (XRC)
    - Available for OpenFabrics (IB) interface

- Kernel-level single-copy intra-node communication support based on
  LiMIC2
    - Delivers superior intra-node performance for medium and
      large messages
    - Available for all interfaces (IB, iWARP and uDAPL)

- Enhancement to mpirun_rsh framework for faster job startup
  on large clusters
    - Hierarchical ssh to nodes to speedup job startup
    - Available for OpenFabrics (IB and iWARP), uDAPL interfaces
      (including Solaris) and the New QLogic-InfiniPath interface

- Scalable checkpoint-restart with mpirun_rsh framework
- Checkpoint-restart with intra-node shared memory (kernel-level with
  LiMIC2) support
    - Available for OpenFabrics (IB) Interface

- K-nomial tree-based solution together with shared memory-based
  broadcast for scalable MPI_Bcast operation
    - Available for all interfaces (IB, iWARP and uDAPL)

- Native support for QLogic InfiniPath
    - Provides support over PSM interface

This release also contains multiple bug fixes since MVAPICH2-1.2p1.
A summary of the major fixes is as follows:

- Changed parameters for iWARP for increased scalability

- Fix error with derived datatypes and Put and Accumulate operations

- Unregister stale memory registrations earlier to prevent
  malloc failures

- Fix for compilation issues with --enable-g=mem and --enable-g=all

- Change dapl_prepost_noop_extra value from 5 to 8 to prevent
  credit flow issues

- Re-enable RGET (RDMA Read) functionality

- Fix SRQ Finalize error

- Fix a multi-rail one-sided error when multiple QPs are used

- PMI Lookup name failure with SLURM

- Port auto-detection failure when the 1st HCA did
  not have an active failure

- MPE support for shared memory collectives now available

For downloading MVAPICH2 1.4RC1, associated user guide and accessing
the SVN, please visit the following URL:

http://mvapich.cse.ohio-state.edu

All feedback, including bug reports and hints for performance tuning,
patches and enhancements, is welcome. Please post it to the
mvapich-discuss mailing list.

Thanks,

The MVAPICH Team

From nilesha at cdac.in Wed Jun 3 02:25:17 2009
From: nilesha at cdac.in (Nilesh Awate)
Date: Wed Jun 3 02:28:11 2009
Subject: [mvapich-discuss] Re: [mvapich] Announcing the Release of MVAPICH2 1.4RC1
In-Reply-To:
References:
Message-ID: <4A26174D.1070901@cdac.in>

Dear Sir,

Regarding our previous mailing-list conversation about the compilation error with --enable-g=all:

mvapich2-1.2-2009-05-12 was supposed to have fixed that bug, but I still encountered the following error while compiling for the udapl device with that flag.

/home/htdg/pn_mpi/mvapich2-1.2-2009-05-12/lib/libmpich.a(ch3u_handle_connection.o)(.text+0x8e): In function `MPIDI_CH3U_Handle_connection':
/home/htdg/pn_mpi/mvapich2-1.2-2009-05-12/src/mpid/ch3/src/ch3u_handle_connection.c:63: undefined reference to `MPIDI_CH3_VC_GetStateString'

The error is due to the call to "MPIDI_CH3_VC_GetStateString", a function that is not defined anywhere.

I then found that its definition should be in the following file:

"mrail/src/udapl/udapl_channel_manager.c"

#ifdef USE_DBG_LOGGING
const char *MPIDI_CH3_VC_GetStateString(MPIDI_VC_t *vc)
{
    return NULL;
}
#endif

This definition may not be correct, but it suppressed the error.

I hope the appropriate definition is added in the released RC version.

With best regards,
Nilesh Awate

From panda at cse.ohio-state.edu Wed Jun 3 07:40:17 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Wed Jun 3 07:40:35 2009
Subject: [mvapich-discuss] Re: [mvapich] Announcing the Release of MVAPICH2 1.4RC1
In-Reply-To: <4A26174D.1070901@cdac.in>
Message-ID:

Do you see this issue with the latest released 1.4 version? The fixes related to --enable-g=all have gone into the 1.4 version. Please check this version and let us know.

DK

From nathan.baca at gmail.com Wed Jun 3 15:43:40 2009
From: nathan.baca at gmail.com (Nathan Baca)
Date: Wed Jun 3 15:44:00 2009
Subject: [mvapich-discuss] Build error with mvapich2-1.4-RC1-3378
Message-ID:

Hello,

I am trying to build the new release candidate and am getting a consistent build error. It seems to happen at the very end of the build and has failed with: gcc3.4, gcc4.1, intel10.1.015, and pathscale3.2. The same build process successfully builds mvapich2-1.2p1. Has anybody else seen this?

My configure line and error are as follows:

./configure --prefix=1.4rc1-pathscale-3.2 --with-slurm=/opt/hptc/slurm --enable-romio --with-file-system=lustre CC=pathcc CXX=pathCC F77=pathf90 FC=pathf90

make[5]: Entering directory `/ram/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/common/locks'
pathcc -DHAVE_CONFIG_H -I. -I. -I.
-I../../../include -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/include -DNDEBUG -O2 -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/include -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/include -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/common/datatype -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/common/datatype -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/common/locks -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/common/locks -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/common/locks -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/common/locks -c mpidu_process_locks.c In file included from /tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2/ibv_param.h:18, from /tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2/vbuf.h:32, from /tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2/mpidi_ch3_rdma_pre.h:23, from /tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include/mpidi_ch3_pre.h:23, from /tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/include/mpidpre.h:43, from ../../../include/mpiimpl.h:115, from /tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/include/mpidimpl.h:36, from mpidu_process_locks.h:32, from mpidu_process_locks.c:6: /tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2/../rdma/coll_shmem.h:100: error: syntax error before "pthread_spinlock_t" -- Nathan Baca nathan.baca@gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090603/d93474d3/attachment.html From lyan1 at cct.lsu.edu Thu Jun 4 12:34:21 2009 From: lyan1 at cct.lsu.edu (Le Yan) Date: Thu Jun 4 12:34:42 2009 Subject: [mvapich-discuss] Build error with mvapich2-1.4-RC1-3378 In-Reply-To: References: Message-ID: <1244133261.23894.3.camel@lyan1-1.lsu.edu> Hi there, I have exactly the same error with GNU and Intel compilers, at the very beginning though. The building options are the same, except the "slurm" part. Cheers, Le
From panda at cse.ohio-state.edu
Thu Jun 4 12:48:26 2009 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Thu Jun 4 12:48:46 2009 Subject: [mvapich-discuss] Build error with mvapich2-1.4-RC1-3378 In-Reply-To: <1244133261.23894.3.camel@lyan1-1.lsu.edu> Message-ID: We are taking a look at these issues and will get back to you soon. DK
From jatencio at gmail.com Thu Jun 4 13:36:48 2009 From: jatencio at gmail.com (Jonathan G. Atencio) Date: Thu Jun 4 13:37:11 2009 Subject: [mvapich-discuss] Build error with mvapich2-1.4-RC1-3378 In-Reply-To: <1244133261.23894.3.camel@lyan1-1.lsu.edu> References: <1244133261.23894.3.camel@lyan1-1.lsu.edu> Message-ID: <58a35db40906041036h1c3eba5cnb3b1473d6799aae3@mail.gmail.com> I am running RHEl4 with Infiniband. I see similar errors as well. If I pass "-D_XOPEN_SOURCE=600" to my CFLAGS, then run configure, I am able to get past the original error, however, I encounter another error: % export CFLAGS="-D_XOPEN_SOURCE=600" % ./configure ... % make ... gcc -DHAVE_CONFIG_H -I. -I. -I/home/atencio/mvapich2-1.4rc1/src/pm/mpirun/include -D_XOPEN_SOURCE=600 -DNDEBUG -O2 -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/include -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/include -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/datatype -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/datatype -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/locks -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/locks -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/locks -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/locks -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/include -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/include -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/datatype -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/datatype
-I/home/atencio/mvapich2-1.4rc1/src/mpid/common/locks -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/locks -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/locks -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/locks -c mpispawn_tree.c mpispawn_tree.c: In function `mpispawn_tree_init': mpispawn_tree.c:176: error: `fd_set' undeclared (first use in this function) mpispawn_tree.c:176: error: (Each undeclared identifier is reported only once mpispawn_tree.c:176: error: for each function it appears in.) mpispawn_tree.c:176: error: syntax error before "set" mpispawn_tree.c:177: error: storage size of 'tv' isn't known mpispawn_tree.c:191: error: `set' undeclared (first use in this function) make[3]: *** [mpispawn_tree.o] Error 1 make[3]: Leaving directory `/mscf/home/atencio/mvapich2-1.4rc1/src/pm/mpirun' make[2]: *** [all-redirect] Error 2 make[2]: Leaving directory `/mscf/home/atencio/mvapich2-1.4rc1/src/pm' make[1]: *** [all-redirect] Error 2 make[1]: Leaving directory `/mscf/home/atencio/mvapich2-1.4rc1/src' make: *** [all-redirect] Error 2 Regards, Jonathan
From perkinjo at cse.ohio-state.edu Thu Jun 4 14:12:46 2009 From: perkinjo at cse.ohio-state.edu (Jonathan Perkins) Date: Thu Jun 4 14:13:30 2009 Subject: [mvapich-discuss] Build error with mvapich2-1.4-RC1-3378 In-Reply-To: <58a35db40906041036h1c3eba5cnb3b1473d6799aae3@mail.gmail.com> References: <1244133261.23894.3.camel@lyan1-1.lsu.edu> <58a35db40906041036h1c3eba5cnb3b1473d6799aae3@mail.gmail.com> Message-ID: <20090604181246.GI3079@cse.ohio-state.edu> On Thu, Jun 04, 2009 at 11:36:48AM -0600, Jonathan G. Atencio wrote: > I am running RHEl4 with Infiniband. I see similar errors as well. This may be a rhel4 specific issue. Can Nathan and Le provide details of their OS as well as cpu architecture? -- Jonathan Perkins http://www.cse.ohio-state.edu/~perkinjo -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090604/b40001b4/attachment.bin
From lyan1 at cct.lsu.edu Thu Jun 4 15:09:49 2009 From: lyan1 at cct.lsu.edu (Le Yan) Date: Thu Jun 4 15:10:10 2009 Subject: [mvapich-discuss] Build error with mvapich2-1.4-RC1-3378 In-Reply-To: <20090604181246.GI3079@cse.ohio-state.edu> References: <1244133261.23894.3.camel@lyan1-1.lsu.edu> <58a35db40906041036h1c3eba5cnb3b1473d6799aae3@mail.gmail.com> <20090604181246.GI3079@cse.ohio-state.edu> Message-ID: <1244142589.16749.2.camel@lyan1-1.lsu.edu> Ours is an x86_64 cluster running rhel4 (kernel 2.6.9-55). Cheers, Le
From perkinjo at cse.ohio-state.edu Thu Jun 4 15:49:24 2009 From: perkinjo at cse.ohio-state.edu (Jonathan Perkins) Date: Thu Jun 4 15:49:43 2009 Subject: [mvapich-discuss] Build error with mvapich2-1.4-RC1-3378 In-Reply-To: <20090604181246.GI3079@cse.ohio-state.edu> References: <1244133261.23894.3.camel@lyan1-1.lsu.edu> <58a35db40906041036h1c3eba5cnb3b1473d6799aae3@mail.gmail.com> <20090604181246.GI3079@cse.ohio-state.edu> Message-ID: <20090604194924.GN3079@cse.ohio-state.edu> On Thu, Jun 04, 2009 at 02:12:46PM -0400, Jonathan Perkins wrote: > On Thu, Jun 04, 2009 at 11:36:48AM -0600, Jonathan G. Atencio wrote: > > I am running RHEl4 with Infiniband. I see similar errors as well. > > This may be a rhel4 specific issue. Can Nathan and Le provide details > of their OS as well as cpu architecture? We've reproduced the error on rhel4 machine and found a couple resolutions. Let me suggest that users on rhel4 add '-D_GNU_SOURCE' to their CFLAGS until we provide a new version that takes care of this internally. The following should work...
% ./configure CFLAGS='-D_GNU_SOURCE' % make > > > > > If I pass "-D_XOPEN_SOURCE=600" to my CFLAGS, then run configure, I am > > able to get past the original error, however, I encounter another > > error: > > > > % export CFLAGS="-D_XOPEN_SOURCE=600" > > % ./configure > > ... > > % make > > ... > > gcc -DHAVE_CONFIG_H -I. -I. > > -I/home/atencio/mvapich2-1.4rc1/src/pm/mpirun/include > > -D_XOPEN_SOURCE=600 -DNDEBUG -O2 > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/include > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/include > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/datatype > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/datatype > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/locks > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/locks > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/locks > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/locks > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/include > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/include > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/datatype > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/datatype > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/locks > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/locks > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/locks > > -I/home/atencio/mvapich2-1.4rc1/src/mpid/common/locks -c > > mpispawn_tree.c > > mpispawn_tree.c: In function 
`mpispawn_tree_init': > > mpispawn_tree.c:176: error: `fd_set' undeclared (first use in this function) > > mpispawn_tree.c:176: error: (Each undeclared identifier is reported only once > > mpispawn_tree.c:176: error: for each function it appears in.) > > mpispawn_tree.c:176: error: syntax error before "set" > > mpispawn_tree.c:177: error: storage size of 'tv' isn't known > > mpispawn_tree.c:191: error: `set' undeclared (first use in this function) > > make[3]: *** [mpispawn_tree.o] Error 1 > > make[3]: Leaving directory `/mscf/home/atencio/mvapich2-1.4rc1/src/pm/mpirun' > > make[2]: *** [all-redirect] Error 2 > > make[2]: Leaving directory `/mscf/home/atencio/mvapich2-1.4rc1/src/pm' > > make[1]: *** [all-redirect] Error 2 > > make[1]: Leaving directory `/mscf/home/atencio/mvapich2-1.4rc1/src' > > make: *** [all-redirect] Error 2 > > > > > > Regards, > > > > Jonathan > > > > On Thu, Jun 4, 2009 at 10:34 AM, Le Yan wrote: > > > Hi there, > > > > > > I have exactly the same error with GNU and Intel compilers, at the very > > > beginning though. The building options are the same, except the "slurm" > > > part. > > > > > > Cheers, > > > Le > > > > > > On Wed, 2009-06-03 at 13:43 -0600, Nathan Baca wrote: > > >> Hello, > > >> > > >> I am trying to build the new release candidate and am getting a > > >> consistent build error. It seems to happen at the very end of the > > >> build and has failed with: gcc3.4, gcc4.1, intel10.1.015, and > > >> pathscale3.2. The same build process successfully builds > > >> mvapich2-1.2p1. Anybody else seen this? > > >> > > >> My configure line and error is as follows: > > >> > > >> ./configure --prefix=1.4rc1-pathscale-3.2 --with-slurm=/opt/hptc/slurm > > >> --enable-romio --with-file-system=lustre CC=pathcc CXX=pathCC > > >> F77=pathf90 FC=pathf90 > > >> > > >> make[5]: Entering directory > > >> `/ram/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/common/locks' > > >> pathcc -DHAVE_CONFIG_H -I. -I. -I. 
-I../../../include > > >> -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/include -DNDEBUG -O2 > > >> -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/include > > >> -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/include > > >> -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/common/datatype > > >> -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/common/datatype > > >> -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/common/locks > > >> -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/common/locks > > >> -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/common/locks -I/tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/common/locks -c mpidu_process_locks.c > > >> In file included > > >> from /tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2/ibv_param.h:18, > > >> > > >> from /tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2/vbuf.h:32, > > >> > > >> from /tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2/mpidi_ch3_rdma_pre.h:23, > > >> > > >> from /tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include/mpidi_ch3_pre.h:23, > > >> > > >> from /tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/include/mpidpre.h:43, > > >> from ../../../include/mpiimpl.h:115, > > >> > > >> from /tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/include/mpidimpl.h:36, > > >> from mpidu_process_locks.h:32, > > >>
from mpidu_process_locks.c:6: > > >> /tmp/mvapich2-build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2/../rdma/coll_shmem.h:100: error: syntax error before "pthread_spinlock_t" > > >> > > >> > > >> -- > > >> Nathan Baca > > >> nathan.baca@gmail.com > > >> _______________________________________________ > > >> mvapich-discuss mailing list > > >> mvapich-discuss@cse.ohio-state.edu > > >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > > > > > _______________________________________________ > > > mvapich-discuss mailing list > > > mvapich-discuss@cse.ohio-state.edu > > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > > > > > > _______________________________________________ > > mvapich-discuss mailing list > > mvapich-discuss@cse.ohio-state.edu > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > -- > Jonathan Perkins > http://www.cse.ohio-state.edu/~perkinjo -- Jonathan Perkins http://www.cse.ohio-state.edu/~perkinjo -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090604/c2513d50/attachment.bin From jatencio at gmail.com Thu Jun 4 16:35:55 2009 From: jatencio at gmail.com (Jonathan G. Atencio) Date: Thu Jun 4 16:36:19 2009 Subject: [mvapich-discuss] Build error with mvapich2-1.4-RC1-3378 In-Reply-To: <20090604194924.GN3079@cse.ohio-state.edu> References: <1244133261.23894.3.camel@lyan1-1.lsu.edu> <58a35db40906041036h1c3eba5cnb3b1473d6799aae3@mail.gmail.com> <20090604181246.GI3079@cse.ohio-state.edu> <20090604194924.GN3079@cse.ohio-state.edu> Message-ID: <58a35db40906041335s600a83c1n9b6dcade2a977f@mail.gmail.com> Hello Jonathan, This "fix" worked for me.
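For anyone wondering why the flag might help, here is a minimal sketch. It rests on an assumption that is not confirmed anywhere in this thread: that the older glibc headers on RHEL4 only declare pthread_spinlock_t when a POSIX 2001 (or later) feature level is requested, and that -D_XOPEN_SOURCE=600 on its own hides declarations (fd_set, struct timeval) that mpispawn_tree.c was picking up implicitly. Defining _GNU_SOURCE asks for all of them at once. The helper name feature_macro_smoke_test is hypothetical and is not part of MVAPICH2.

```c
/* Sketch only: the macro has to be defined before the first system header. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif

#include <pthread.h>     /* pthread_spinlock_t */
#include <sys/select.h>  /* fd_set, FD_ZERO, FD_SET, FD_ISSET */
#include <sys/time.h>    /* struct timeval */

/*
 * Hypothetical helper: it only needs the three declarations that the
 * quoted build logs complained about to be visible at compile time.
 * Returns 0 on success, -1 otherwise.
 */
int feature_macro_smoke_test(void)
{
    pthread_spinlock_t lock;      /* "syntax error before pthread_spinlock_t" in the pathcc log */
    fd_set set;                   /* "fd_set undeclared" in the gcc log with -D_XOPEN_SOURCE=600 */
    struct timeval tv = { 0, 0 }; /* "storage size of 'tv' isn't known" in the same gcc log */

    FD_ZERO(&set);
    FD_SET(0, &set);

    /* sizeof does not evaluate its operand, so the lock is never touched. */
    return (sizeof lock > 0 && FD_ISSET(0, &set) && tv.tv_sec == 0) ? 0 : -1;
}
```

On an affected system, compiling this file with and without the define (or with -D_XOPEN_SOURCE=600 instead) would be expected to reproduce the two kinds of errors quoted above; that has not been checked against RHEL4 here, and recent glibc versions accept the file either way.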
Thanks, Jonathan On Thu, Jun 4, 2009 at 1:49 PM, Jonathan Perkins wrote: > We've reproduced the error on rhel4 machine and found a couple > resolutions. Let me suggest that users on rhel4 add '-D_GNU_SOURCE' to > their CFLAGS until we provide a new version that takes care of this > internally. > > The following should work... > % ./configure CFLAGS='-D_GNU_SOURCE' > % make
From lyan1 at cct.lsu.edu Thu Jun 4 17:13:46 2009 From: lyan1 at cct.lsu.edu (Le Yan) Date: Thu Jun 4 17:14:06 2009 Subject: [mvapich-discuss] Build error with mvapich2-1.4-RC1-3378 In-Reply-To: <58a35db40906041335s600a83c1n9b6dcade2a977f@mail.gmail.com> References: <1244133261.23894.3.camel@lyan1-1.lsu.edu> <58a35db40906041036h1c3eba5cnb3b1473d6799aae3@mail.gmail.com> <20090604181246.GI3079@cse.ohio-state.edu> <20090604194924.GN3079@cse.ohio-state.edu> <58a35db40906041335s600a83c1n9b6dcade2a977f@mail.gmail.com> Message-ID: <1244150026.28963.0.camel@lyan1-1.lsu.edu> Same here. Thank you. Cheers, Le On Thu, 2009-06-04 at 14:35 -0600, Jonathan G. Atencio wrote: > Hello Jonathan, > > This "fix" worked for me. > > Thanks, > > Jonathan > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu >
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss From divi at ncat.edu Sun Jun 7 20:43:38 2009 From: divi at ncat.edu (Divi Venkateswarlu) Date: Sun Jun 7 20:44:47 2009 Subject: [mvapich-discuss] How do I build mvapich2 on QLOGIC IB? Message-ID: Dear all: I am trying to install MVAPICH and MVAPICH2 on a SUN cluster with QLOGIC IB hardware. The system has qlogic mpi compiled with gnu compilers. I have problems using this setup for the chemistry package I am interested in. (This package compiles well on my in-house cluster using mellanox IB/OFED/MVAPICH/INTEL combination with pretty good scaling) So I am interested in installing MVAPICH/MVAPICH2 with intel compilers on the new cluster I got access to. I have the following questions: 1. The machine does not have ibverbs (that I am familiar with in my in-house cluster with OFED) 2. What are my options to get mvapich compiled on this Qlogic fabric? 3. I am not sure what options I choose during configure script of mvapich2 (or mvapich) The following are some concerned files for mpi on this machine.
ldd libvapi.so libibt.so.0.0 => /lib64/libibt.so.0.0 (0x000000333d200000) libpublic.so.0.0 => /lib64/libpublic.so.0.0 (0x000000333d600000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00000032d4200000) libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000032d5e00000) libm.so.6 => /lib64/libm.so.6 (0x00000032d3e00000) libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000032d5600000) libc.so.6 => /lib64/libc.so.6 (0x00000032d3600000) libdl.so.2 => /lib64/libdl.so.2 (0x00000032d3a00000) /lib64/ld-linux-x86-64.so.2 (0x00000032d3000000) ldd libmosal.so libpthread.so.0 => /lib64/libpthread.so.0 (0x00000032d4200000) libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000032d5e00000) libm.so.6 => /lib64/libm.so.6 (0x00000032d3e00000) libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000032d5600000) libc.so.6 => /lib64/libc.so.6 (0x00000032d3600000) /lib64/ld-linux-x86-64.so.2 (0x00000032d3000000) ldd ./libmtl_common.so libpthread.so.0 => /lib64/libpthread.so.0 (0x00000032d4200000) libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000032d5e00000) libm.so.6 => /lib64/libm.so.6 (0x00000032d3e00000) libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000032d5600000) libc.so.6 => /lib64/libc.so.6 (0x00000032d3600000) /lib64/ld-linux-x86-64.so.2 (0x00000032d3000000) Thanks for your input Divi -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090607/ed495f14/attachment.html From panda at cse.ohio-state.edu Sun Jun 7 21:48:57 2009 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Sun Jun 7 21:49:16 2009 Subject: [mvapich-discuss] How do I build mvapich2 on QLOGIC IB? In-Reply-To: Message-ID: With the latest MVAPICH2 1.4RC1, both MVAPICH and MVAPICH2 have now support for Qlogic InfiniPath adapters at the PSM layer. You need to have the latest PSM layer available. 
For building MVAPICH 1.1 with the PSM layer, please take a look at the MVAPICH 1.1 user guide at the following URL: http://mvapich.cse.ohio-state.edu/support/mvapich_user_guide-1.1.html#x1-120004.4.3 For building MVAPICH2 1.4RC1 with the PSM layer, please take a look at the MVAPICH2 1.4 user guide at the following URL: http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc1.html#x1-120004.6 You can also refer to these user guides for additional information on running MPI applications at the PSM layer, troubleshooting, etc. Hope this helps. DK On Sun, 7 Jun 2009, Divi Venkateswarlu wrote: > Dear all: > > I am trying to install MVAPICH and MVAPICH2 on a SUN cluster with QLOGIC IB hardware. > > The system has qlogic mpi compiled with gnu compilers. I have problems using this setup > for the chemistry package I am interested in. (This package compiles well on my in-house cluster > using mellanox IB/OFED/MVAPICH/INTEL combination with pretty good scaling) > > So I am interested in installing MVAPICH/MVAPICH2 with intel compilers on the new cluster I got > access to. > > I have the following questions: > > 1. The machine does not have ibverbs (that I am familiar with in my in-house cluster with OFED) > > 2. What are my options to get mvapich compiled on this Qlogic fabric? > > 3. I am not sure what options I choose during configure script of mvapich2 (or mvapich) > > The following are some concerned files for mpi on this machine.
> > ldd libvapi.so > libibt.so.0.0 => /lib64/libibt.so.0.0 (0x000000333d200000) > libpublic.so.0.0 => /lib64/libpublic.so.0.0 (0x000000333d600000) > libpthread.so.0 => /lib64/libpthread.so.0 (0x00000032d4200000) > libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000032d5e00000) > libm.so.6 => /lib64/libm.so.6 (0x00000032d3e00000) > libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000032d5600000) > libc.so.6 => /lib64/libc.so.6 (0x00000032d3600000) > libdl.so.2 => /lib64/libdl.so.2 (0x00000032d3a00000) > /lib64/ld-linux-x86-64.so.2 (0x00000032d3000000) > > ldd libmosal.so > libpthread.so.0 => /lib64/libpthread.so.0 (0x00000032d4200000) > libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000032d5e00000) > libm.so.6 => /lib64/libm.so.6 (0x00000032d3e00000) > libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000032d5600000) > libc.so.6 => /lib64/libc.so.6 (0x00000032d3600000) > /lib64/ld-linux-x86-64.so.2 (0x00000032d3000000) > > ldd ./libmtl_common.so > libpthread.so.0 => /lib64/libpthread.so.0 (0x00000032d4200000) > libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000032d5e00000) > libm.so.6 => /lib64/libm.so.6 (0x00000032d3e00000) > libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000032d5600000) > libc.so.6 => /lib64/libc.so.6 (0x00000032d3600000) > /lib64/ld-linux-x86-64.so.2 (0x00000032d3000000) > > Thanks for your input > Divi From kandalla at cse.ohio-state.edu Sun Jun 7 22:26:20 2009 From: kandalla at cse.ohio-state.edu (Krishna Chaitanya) Date: Sun Jun 7 22:26:44 2009 Subject: [mvapich-discuss] Odd behavior for Pallas Send/Receive Benchmark in MVAPICH 1.1 In-Reply-To: <4C2744E8AD2982428C5BFE523DF8CDCB43E9069D63@MNEXMB1.qlogic.org> References: <4C2744E8AD2982428C5BFE523DF8CDCB43E9069BD7@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB43E9069C28@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB43E9069D63@MNEXMB1.qlogic.org> Message-ID: Mike, Can you also let us know about the OFED version that you are using. We are having OFED 1.4 on our systems.
Thanks, Krishna On Tue, Jun 2, 2009 at 4:45 PM, Mike Heinz wrote: > We are also seeing this behavior when we installed "vanilla" OFED rather > than QLogic's pre-packaged binaries. > > -- > > Michael Heinz > > Principal Engineer, Qlogic Corporation > > King of Prussia, Pennsylvania > > From: kris.c1986@gmail.com [mailto:kris.c1986@gmail.com] On Behalf Of Krishna > Chaitanya > Sent: Tuesday, June 02, 2009 2:38 PM > To: Mike Heinz > Cc: Dhabaleswar Panda; Todd Rimmer; mvapich-discuss@cse.ohio-state.edu > > Subject: Re: [mvapich-discuss] Odd behavior for Pallas Send/Receive > Benchmark in MVAPICH 1.1 > > Mike, > We have run tests on Intel Clovertown and AMD Barcelona machines > with MVAPICH-1.1 and we have compiled the library with the flags that you > mentioned in your previous mail. Unfortunately, we are not able to reproduce > the issue. We have run the complete sendrecv pallas benchmark about 100 > times in a loop and we see that the peak bandwidth is in the 2400 - 2600 > MB/s range consistently. > Could you try running the benchmark on two nodes connected back > to back? This will eliminate any network or switch issues. > > Thanks, > Krishna > > On Mon, Jun 1, 2009 at 3:15 PM, Mike Heinz > wrote: > > Interesting. > > For this test, we're using a couple of AMD opterons, running at 2.4 ghz, > and RHEL 4u6, a pair of Mellanox DDR HCAs and a Qlogic 9xxx switch. > > We took the default when installing OFED and, looking at the build log, it > appears that OFED used OPTIMIZATION_FLAG='-O3 -fno-strict-aliasing' when > compiling mvapich. No optimization was chosen when compiling Pallas.
> > > > -- > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > > -----Original Message----- > From: Dhabaleswar Panda [mailto:panda@cse.ohio-state.edu] > Sent: Monday, June 01, 2009 3:03 PM > To: Mike Heinz > Cc: mvapich-discuss@cse.ohio-state.edu; mwheinz@me.com; John Russo; Todd > Rimmer > Subject: Re: [mvapich-discuss] Odd behavior for Pallas Send/Receive > Benchmark in MVAPICH 1.1 > > Hi Mike, > > Thanks for your report. We tried running PMB (as well IMB, the latest one) > on both the released version of MVAPICH 1.1.0 and the branch version. We > are getting the peak bandwidth to be in the range of 2400-2600 MB/s > consistently. The experiments were done with Mellanox-IB cards, DDR switch > and Intel Colvertown platforms. We are not able to reproduce the problem > you are mentioning. > > Could you please provide more details on the platform, adapter, switch, > etc. Also, let us know if you are using any specific optimization level. > > Thanks, > > DK > > On Mon, 1 Jun 2009, Mike Heinz wrote: > > > We had a customer report what they thought was a hardware problem, and I > was assigned to investigate. Basically, they were claiming odd variations in > performance during PALLAS runs to test their Infiniband fabric. > > > > What I discovered, however, was a much more interesting problem could be > duplicated on any fabric, as long as I was using MVAPICH 1.1.0. > > > > Basically, what I saw was that, given two hosts and a switch, the Pallas > Send Receive benchmark compiled with MVAPICH 1.1.0 would report a > performance of EITHER about 2600 MB/S OR 1850 MB/S with little variation > otherwise. Moreover, this behavior is unique to MVAPICH 1.1.0 - switching to > MVAPICH 2 eliminated the variation. I've attached a chart so you can see > what I mean. 
> > > > [cid:image002.png@01C9E2A9.4A349440] > > > > I realize that, looking at the chart, your first instinct is to announce > "clearly there was other traffic on the fabric that was interfering with the > benchmark" - but I assure you that was not the case. Moreover, using the > same nodes and same switch, but compiling with MVAPICH2, shows a complete > elimination of the effect: > > > > [cid:image005.png@01C9E2A9.4A349440] > > > > Does anyone have any ideas what's going on? If anyone wants to replicate > this test, all I did was to perform 100 runs of > > > > ./PMB2.2.1/SRC_PMB/PMB-MPI1 Sendrecv > > > > I only used the 4 meg message size for these charts, but that is just for > clarity. The issue appears to affect shorter messages as well. > > > > -- > > Michael Heinz > > Principal Engineer, Qlogic Corporation > > King of Prussia, Pennsylvania > > > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > -- In the middle of difficulty, lies opportunity -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090607/521fcba1/attachment-0001.html From michael.heinz at qlogic.com Mon Jun 8 09:22:03 2009 From: michael.heinz at qlogic.com (Mike Heinz) Date: Mon Jun 8 09:23:22 2009 Subject: [mvapich-discuss] Odd behavior for Pallas Send/Receive Benchmark in MVAPICH 1.1 In-Reply-To: References: <4C2744E8AD2982428C5BFE523DF8CDCB43E9069BD7@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB43E9069C28@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB43E9069D63@MNEXMB1.qlogic.org> Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB43E990A6F7@MNEXMB1.qlogic.org> Krishna, I'll give you an update ASAP; unfortunately, it turns out that this wasn't the same problem the customer was experiencing, so I had to re-focus. However, the common points in all my tests: OFED 1.4.1 (QLogic's latest pull from the OFED git repository), mvapich 1.1.0-3355 Redhat Enterprise, either RHEL 4u6 or, just Friday, RHEL 5u3. Mellanox MHGH28 HCAs of various revisions. -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania From: kris.c1986@gmail.com [mailto:kris.c1986@gmail.com] On Behalf Of Krishna Chaitanya Sent: Sunday, June 07, 2009 10:26 PM To: Mike Heinz Cc: mvapich-discuss@cse.ohio-state.edu; Dhabaleswar Panda; Todd Rimmer Subject: Re: [mvapich-discuss] Odd behavior for Pallas Send/Receive Benchmark in MVAPICH 1.1 Mike, Can you also let us know about the OFED version that you are using. We are having OFED 1.4 on our systems. Thanks, Krishna On Tue, Jun 2, 2009 at 4:45 PM, Mike Heinz > wrote: We are also seeing this behavior when we installed "vanilla" OFED rather than QLogic's pre-packaged binaries. 
--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

From: kris.c1986@gmail.com [mailto:kris.c1986@gmail.com] On Behalf Of Krishna Chaitanya
Sent: Tuesday, June 02, 2009 2:38 PM
To: Mike Heinz
Cc: Dhabaleswar Panda; Todd Rimmer; mvapich-discuss@cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Odd behavior for Pallas Send/Receive Benchmark in MVAPICH 1.1

Mike,

We have run tests on Intel Clovertown and AMD Barcelona machines with MVAPICH-1.1, and we have compiled the library with the flags that you mentioned in your previous mail. Unfortunately, we are not able to reproduce the issue. We have run the complete sendrecv pallas benchmark about 100 times in a loop, and we see that the peak bandwidth is in the 2400-2600 MB/s range consistently.

Could you try running the benchmark on two nodes connected back to back? This will eliminate any network or switch issues.

Thanks,
Krishna

On Mon, Jun 1, 2009 at 3:15 PM, Mike Heinz wrote:

Interesting. For this test, we're using a couple of AMD opterons, running at 2.4 ghz, and RHEL 4u6, a pair of Mellanox DDR HCAs and a Qlogic 9xxx switch.

We took the default when installing OFED and, looking at the build log, it appears that OFED used OPTIMIZATION_FLAG='-O3 -fno-strict-aliasing' when compiling mvapich. No optimization was chosen when compiling Pallas.

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

-----Original Message-----
From: Dhabaleswar Panda [mailto:panda@cse.ohio-state.edu]
Sent: Monday, June 01, 2009 3:03 PM
To: Mike Heinz
Cc: mvapich-discuss@cse.ohio-state.edu; mwheinz@me.com; John Russo; Todd Rimmer
Subject: Re: [mvapich-discuss] Odd behavior for Pallas Send/Receive Benchmark in MVAPICH 1.1

Hi Mike,

Thanks for your report. We tried running PMB (as well IMB, the latest one) on both the released version of MVAPICH 1.1.0 and the branch version. We are getting the peak bandwidth to be in the range of 2400-2600 MB/s consistently.
The experiments were done with Mellanox-IB cards, DDR switch and Intel Colvertown platforms. We are not able to reproduce the problem you are mentioning. Could you please provide more details on the platform, adapter, switch, etc. Also, let us know if you are using any specific optimization level. Thanks, DK On Mon, 1 Jun 2009, Mike Heinz wrote: > We had a customer report what they thought was a hardware problem, and I was assigned to investigate. Basically, they were claiming odd variations in performance during PALLAS runs to test their Infiniband fabric. > > What I discovered, however, was a much more interesting problem could be duplicated on any fabric, as long as I was using MVAPICH 1.1.0. > > Basically, what I saw was that, given two hosts and a switch, the Pallas Send Receive benchmark compiled with MVAPICH 1.1.0 would report a performance of EITHER about 2600 MB/S OR 1850 MB/S with little variation otherwise. Moreover, this behavior is unique to MVAPICH 1.1.0 - switching to MVAPICH 2 eliminated the variation. I've attached a chart so you can see what I mean. > > [cid:image002.png@01C9E2A9.4A349440] > > I realize that, looking at the chart, your first instinct is to announce "clearly there was other traffic on the fabric that was interfering with the benchmark" - but I assure you that was not the case. Moreover, using the same nodes and same switch, but compiling with MVAPICH2, shows a complete elimination of the effect: > > [cid:image005.png@01C9E2A9.4A349440] > > Does anyone have any ideas what's going on? If anyone wants to replicate this test, all I did was to perform 100 runs of > > ./PMB2.2.1/SRC_PMB/PMB-MPI1 Sendrecv > > I only used the 4 meg message size for these charts, but that is just for clarity. The issue appears to affect shorter messages as well. 
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
>

_______________________________________________
mvapich-discuss mailing list
mvapich-discuss@cse.ohio-state.edu
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

--
In the middle of difficulty, lies opportunity
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090608/985210eb/attachment.html

From cap at nsc.liu.se Mon Jun 8 10:46:09 2009
From: cap at nsc.liu.se (Peter Kjellstrom)
Date: Mon Jun 8 10:46:36 2009
Subject: [mvapich-discuss] How do I build mvapich2 on QLOGIC IB?
In-Reply-To:
References:
Message-ID: <200906081646.13198.cap@nsc.liu.se>

On Monday 08 June 2009, Divi Venkateswarlu wrote:
> Dear all:
>
> I am trying to install MVAPICH and MVAPICH2 on a SUN cluster with QLOGIC
> IB hardware.
>
> The system has qlogic mpi compiled with gnu compilers. I have problems
> using this setup for the chemistry package I am interested in. (This
> package compiles well on my in-house cluster using mellanox
> IB/OFED/MVAPICH/INTEL combination with pretty good scaling)
>
> So I am interested in installing MVAPICH/MVAPICH2 with intel compilers on
> the new cluster I got access to.
>
> I have the following questions:
>
> 1. The machine does not have ibverbs (that I am familiar with in my
> in-house cluster with OFED)
>
> 2. What are my options to get mvapich compiled on this Qlogic fabric?

As an alternative to DK's suggestion, qlogic provides pre-compiled mvapich packages for customers, so you don't really _have_ to do this yourself.

/Peter

> 3. I am not sure what options I choose during configure script of
> mvapich2 (or mvapich) ...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part.
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090608/690bb36c/attachment.bin

From dzieko at wcss.pl Tue Jun 9 07:37:22 2009
From: dzieko at wcss.pl (Pawel Dziekonski)
Date: Tue Jun 9 07:37:46 2009
Subject: [mvapich-discuss] MVAPICH and MOLCAS
Message-ID: <20090609113722.GB14398@cefeid.wcss.wroc.pl>

Hi,

has anybody been successful in compiling and running MOLCAS
(http://www.teokem.lu.se/molcas/) with MVAPICH?

In my case paraops.exe hangs or eats all memory.

I'm in direct contact with the MOLCAS team, but the issue seems to be pretty
difficult.

regards, Pawel
--
Pawel Dziekonski
Wroclaw Centre for Networking & Supercomputing, HPC Department
Politechnika Wr., pl. Grunwaldzki 9, bud. D2/101, 50-377 Wroclaw, POLAND
phone: +48 71 3202043, fax: +48 71 3225797, http://www.wcss.wroc.pl

From panda at cse.ohio-state.edu Tue Jun 9 10:33:54 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Tue Jun 9 10:34:15 2009
Subject: [mvapich-discuss] MVAPICH and MOLCAS
In-Reply-To: <20090609113722.GB14398@cefeid.wcss.wroc.pl>
Message-ID:

Which version of MVAPICH are you using? Have you tested your application with the latest MVAPICH 1.1 version - the version from 1.1 bugfix branch. You can access it from MVAPICH download page. This version is also available with OFED 1.4.1. The latest version includes a fix related to unregistering stale memory registration. You can try your application with this version and let us know whether it works or not.

Thanks,

DK

On Tue, 9 Jun 2009, Pawel Dziekonski wrote:

> Hi,
>
> has anybody been successful in compiling and running MOLCAS
> (http://www.teokem.lu.se/molcas/) with MVAPICH?
>
> In my case paraops.exe hangs or eats all memory.
>
> I'm in direct contact with the MOLCAS team, but the issue seems to be pretty
> difficult.
>
> regards, Pawel
> --
> Pawel Dziekonski
> Wroclaw Centre for Networking & Supercomputing, HPC Department
> Politechnika Wr., pl. Grunwaldzki 9, bud. D2/101, 50-377 Wroclaw, POLAND
> phone: +48 71 3202043, fax: +48 71 3225797, http://www.wcss.wroc.pl
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From dzieko at wcss.pl Tue Jun 9 10:56:32 2009
From: dzieko at wcss.pl (Pawel Dziekonski)
Date: Tue Jun 9 10:56:56 2009
Subject: [mvapich-discuss] MVAPICH and MOLCAS
In-Reply-To:
References: <20090609113722.GB14398@cefeid.wcss.wroc.pl>
Message-ID: <20090609145632.GK14398@cefeid.wcss.wroc.pl>

On Tue, 09 Jun 2009 at 10:33:54AM -0400, Dhabaleswar Panda wrote:
> Which version of MVAPICH are you using? Have you tested your application
> with the latest MVAPICH 1.1 version - the version from 1.1 bugfix branch.
> You can access it from MVAPICH download page. This version is also
> available with OFED 1.4.1. The latest version includes a fix related to
> unregistering stale memory registration.

Can you write more about this problem or point me to some bug reports?
I was recently observing a kind of lost memory after specific jobs.
This memory can't be allocated by subsequent jobs. Pure malloc clears it.

Regarding MOLCAS - after further investigation I think it's probably
a problem with Global Arrays.

regards, P
--
Pawel Dziekonski
Wroclaw Centre for Networking & Supercomputing, HPC Department
Politechnika Wr., pl. Grunwaldzki 9, bud. D2/101, 50-377 Wroclaw, POLAND
phone: +48 71 3202043, fax: +48 71 3225797, http://www.wcss.wroc.pl

From koop at cse.ohio-state.edu Tue Jun 9 15:02:34 2009
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Tue Jun 9 15:02:53 2009
Subject: [mvapich-discuss] MVAPICH and MOLCAS
In-Reply-To: <20090609145632.GK14398@cefeid.wcss.wroc.pl>
Message-ID:

Pawel,

By 'subsequent jobs', what do you mean?
If the process has exited and the memory is still free'd we have not heard of such a problem with our library.

In the 1.1 branch we have made changes to help with issues where the MPI library was shown to be using too much memory (and failing in some cases). In this case a 'free' was not really unpinning memory immediately, so a 'free' immediately followed by a large malloc (and touching of that buffer) could run the machine out of memory.

If you think this might be the problem then the following place is where you can download the latest tarball:

http://mvapich.cse.ohio-state.edu/nightly/mvapich/branches/1.1

From your description though I think it may be a GA issue.

Matt

On Tue, 9 Jun 2009, Pawel Dziekonski wrote:

> On Tue, 09 Jun 2009 at 10:33:54AM -0400, Dhabaleswar Panda wrote:
> > Which version of MVAPICH are you using? Have you tested your application
> > with the latest MVAPICH 1.1 version - the version from 1.1 bugfix branch.
> > You can access it from MVAPICH download page. This version is also
> > available with OFED 1.4.1. The latest version includes a fix related to
> > unregistering stale memory registration.
>
> Can you write more about this problem or point me to some bug reports?
> I was recently observing a kind of lost memory after specific jobs.
> This memory can't be allocated by subsequent jobs. Pure malloc clears it.
>
> Regarding MOLCAS - after further investigation I think it's probably
> a problem with Global Arrays.
>
> regards, P
> --
> Pawel Dziekonski
> Wroclaw Centre for Networking & Supercomputing, HPC Department
> Politechnika Wr., pl. Grunwaldzki 9, bud.
D2/101, 50-377 Wroclaw, POLAND > phone: +48 71 3202043, fax: +48 71 3225797, http://www.wcss.wroc.pl > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From sumit.panchasara at einfochips.com Thu Jun 11 07:36:59 2009 From: sumit.panchasara at einfochips.com (Sumit Panchasara) Date: Thu Jun 11 07:37:29 2009 Subject: [mvapich-discuss] Mvapich2 over OFA: number of CQ entries Message-ID: <007c01c9ea88$ef1c7040$cd5550c0$@panchasara@einfochips.com> Hello Mvapich2 Team, Recently we have observed that MVAPICH2 over OFA does not run any test (i.e. IMB-MPI1, IMB-EXT, OSU, rotate etc.) due to CQEs being passed in "ibv_create_cq()". Issue: Our provider supports only 32K (0x8000) CQEs where as MVAPICH2 calls create_cq() with 0x9c40 CQEs. This causes failure and not able to run any tests further. Is there any workaround for the same which limits CQEs being created from Mvapich2? Thanks, Sumit Panchasara Embedded Engineer - eInfochips Ltd., India. www.einfochips.com -- _____________________________________________________________________ Disclaimer: This e-mail message and all attachments transmitted with it are intended solely for the use of the addressee and may contain legally privileged and confidential information. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution, copying, or other use of this message or its attachments is strictly prohibited. If you have received this message in error, please notify the sender immediately by replying to this message and please delete it from your computer. Any views expressed in this message are those of the individual sender unless otherwise stated.Company has taken enough precautions to prevent the spread of viruses. 
However the company accepts no liability for any damage caused by any virus transmitted by this email. _____________________________________________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090611/84b509cc/attachment.html From panda at cse.ohio-state.edu Thu Jun 11 08:13:21 2009 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Thu Jun 11 08:13:42 2009 Subject: [mvapich-discuss] Mvapich2 over OFA: number of CQ entries In-Reply-To: <007c01c9ea88$ef1c7040$cd5550c0$@panchasara@einfochips.com> Message-ID: > Hello Mvapich2 Team, > > Recently we have observed that MVAPICH2 over OFA does not run any test (i.e. > IMB-MPI1, IMB-EXT, OSU, rotate etc.) due to CQEs being passed in > "ibv_create_cq()". > > Issue: > > Our provider supports only 32K (0x8000) CQEs where as MVAPICH2 calls > create_cq() with 0x9c40 CQEs. Can you provide more details on your adapter and the characteristics of the provider library. It does not appear to be a standard InfiniBand adapter. Is it an iWARP adapter? > This causes failure and not able to run any tests further. > > Is there any workaround for the same which limits CQEs being created from > Mvapich2? Since we do not have access to such adapters, it is hard to determine what is going on here. More details about the adapter and its characteristics will be helpful. Thanks, DK > > > Thanks, > > > > Sumit Panchasara > > Embedded Engineer - eInfochips Ltd., India. > > www.einfochips.com > > > > > -- > _____________________________________________________________________ > Disclaimer: This e-mail message and all attachments transmitted with it > are intended solely for the use of the addressee and may contain legally > privileged and confidential information. 
If the reader of this message > is not the intended recipient, or an employee or agent responsible for > delivering this message to the intended recipient, you are hereby > notified that any dissemination, distribution, copying, or other use of > this message or its attachments is strictly prohibited. If you have > received this message in error, please notify the sender immediately by > replying to this message and please delete it from your computer. Any > views expressed in this message are those of the individual sender > unless otherwise stated.Company has taken enough precautions to prevent > the spread of viruses. However the company accepts no liability for any > damage caused by any virus transmitted by this email. > _____________________________________________________________________ > > From L-marks at northwestern.edu Thu Jun 11 23:36:53 2009 From: L-marks at northwestern.edu (Laurence Marks) Date: Thu Jun 11 23:37:16 2009 Subject: [mvapich-discuss] Processor affinity (slightly special case) Message-ID: <876512660906112036t7248be0as830ee1ef95d12cc3@mail.gmail.com> I have some jobs (large, typically > 64 cores on a dual-quadcore cluster) where if I run 8x8 (for instance) the master node runs into problems because (I think, although it's hard to be 100% certain) the mpi task is using too many resources, OS operations on it stop working or it runs out of swap and the job never completes. One solution is to run, for instance, 1+7+7x8 with one mpi on a single node, 7 on one and 8 on the rest. This is OK (inefficient) because I can run the single on my head node. However, then I come to a limitation because processor affinity on the head node means I can only run one of these at a time, again inefficient. I am interested in suggestions of ways around this. Some possibilities that I can think of: a) Is there a switch which would turn off processor affinity if only one core is being used? 
(I could write a shell script to do this but that would not be so elegant.) b) Is there some other way to structure the job or some alternative switches that might help. -- Laurence Marks Department of Materials Science and Engineering MSE Rm 2036 Cook Hall 2220 N Campus Drive Northwestern University Evanston, IL 60208, USA Tel: (847) 491-3996 Fax: (847) 491-7820 email: L-marks at northwestern dot edu Web: www.numis.northwestern.edu Chair, Commission on Electron Crystallography of IUCR www.numis.northwestern.edu/ Electron crystallography is the branch of science that uses electron scattering to study the structure of matter. From polk678 at gmail.com Fri Jun 12 01:45:43 2009 From: polk678 at gmail.com (gossips J) Date: Fri Jun 12 01:46:09 2009 Subject: [mvapich-discuss] Mvapich2 over OFA: number of CQ entries In-Reply-To: References: Message-ID: I guess you need to use "mv2_default_max_cq_size" param and see if it solves your problem or not. -Polk. On Thu, Jun 11, 2009 at 5:43 PM, Dhabaleswar Panda wrote: > > Hello Mvapich2 Team, > > > > Recently we have observed that MVAPICH2 over OFA does not run any test > (i.e. > > IMB-MPI1, IMB-EXT, OSU, rotate etc.) due to CQEs being passed in > > "ibv_create_cq()". > > > > Issue: > > > > Our provider supports only 32K (0x8000) CQEs where as MVAPICH2 calls > > create_cq() with 0x9c40 CQEs. > > Can you provide more details on your adapter and the characteristics of > the provider library. It does not appear to be a standard InfiniBand > adapter. Is it an iWARP adapter? > > > This causes failure and not able to run any tests further. > > > > Is there any workaround for the same which limits CQEs being created from > > Mvapich2? > > Since we do not have access to such adapters, it is hard to determine what > is going on here. More details about the adapter and its > characteristics will be helpful. > > Thanks, > > DK > > > > > > > Thanks, > > > > > > > > Sumit Panchasara > > > > Embedded Engineer - eInfochips Ltd., India. 
> > > > www.einfochips.com > > > > > > > > > > -- > > _____________________________________________________________________ > > Disclaimer: This e-mail message and all attachments transmitted with it > > are intended solely for the use of the addressee and may contain legally > > privileged and confidential information. If the reader of this message > > is not the intended recipient, or an employee or agent responsible for > > delivering this message to the intended recipient, you are hereby > > notified that any dissemination, distribution, copying, or other use of > > this message or its attachments is strictly prohibited. If you have > > received this message in error, please notify the sender immediately by > > replying to this message and please delete it from your computer. Any > > views expressed in this message are those of the individual sender > > unless otherwise stated.Company has taken enough precautions to prevent > > the spread of viruses. However the company accepts no liability for any > > damage caused by any virus transmitted by this email. > > _____________________________________________________________________ > > > > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090612/35732621/attachment-0001.html From sumit.panchasara at einfochips.com Fri Jun 12 04:43:27 2009 From: sumit.panchasara at einfochips.com (Sumit Panchasara) Date: Fri Jun 12 04:43:58 2009 Subject: [mvapich-discuss] Mvapich2 over OFA: number of CQ entries In-Reply-To: References: Message-ID: <009901c9eb39$db62d380$92287a80$@panchasara@einfochips.com> From: gossips J [mailto:polk678@gmail.com] Sent: Friday, June 12, 2009 11:16 AM To: Dhabaleswar Panda Cc: Sumit Panchasara; mvapich-discuss@cse.ohio-state.edu Subject: Re: [mvapich-discuss] Mvapich2 over OFA: number of CQ entries I guess you need to use "mv2_default_max_cq_size" param and see if it solves your problem or not. [[SP]] Thanks a lot,.. it has worked for me. -Polk. -- _____________________________________________________________________ Disclaimer: This e-mail message and all attachments transmitted with it are intended solely for the use of the addressee and may contain legally privileged and confidential information. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution, copying, or other use of this message or its attachments is strictly prohibited. If you have received this message in error, please notify the sender immediately by replying to this message and please delete it from your computer. Any views expressed in this message are those of the individual sender unless otherwise stated.Company has taken enough precautions to prevent the spread of viruses. However the company accepts no liability for any damage caused by any virus transmitted by this email. _____________________________________________________________________ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090612/d3eb7b8a/attachment.html

From panda at cse.ohio-state.edu Fri Jun 12 07:35:06 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri Jun 12 07:35:25 2009
Subject: [mvapich-discuss] Processor affinity (slightly special case)
In-Reply-To: <876512660906112036t7248be0as830ee1ef95d12cc3@mail.gmail.com>
Message-ID:

Without knowledge of your application, it is very hard to know what is going on here. However, both MVAPICH and MVAPICH2 have an option called `blocking progress mode'. This allows you to run more processes than the number of cores available.

For MVAPICH2 1.4, the runtime environment variable to select this mode is MV2_USE_BLOCKING. More details are available in the user guide at the following location:

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc1.html#x1-14600011.60

For MVAPICH, the runtime environment variable is VIADEV_USE_BLOCKING. More details on this flag are available in the user guide at the following location:

http://mvapich.cse.ohio-state.edu/support/mvapich_user_guide-1.1.html#x1-1040009.3.19

See if this mode helps.

Thanks,

DK

On Thu, 11 Jun 2009, Laurence Marks wrote:

> I have some jobs (large, typically > 64 cores on a dual-quadcore
> cluster) where if I run 8x8 (for instance) the master node runs into
> problems because (I think, although it's hard to be 100% certain) the
> mpi task is using too many resources, OS operations on it stop working
> or it runs out of swap and the job never completes. One solution is to
> run, for instance, 1+7+7x8 with one mpi on a single node, 7 on one and
> 8 on the rest. This is OK (inefficient) because I can run the single
> on my head node. However, then I come to a limitation because
> processor affinity on the head node means I can only run one of these
> at a time, again inefficient.
>
> I am interested in suggestions of ways around this.
Some possibilities
> that I can think of:
> a) Is there a switch which would turn off processor affinity if only
> one core is being used? (I could write a shell script to do this but
> that would not be so elegant.)
> b) Is there some other way to structure the job or some alternative
> switches that might help.
>
> --
> Laurence Marks
> Department of Materials Science and Engineering
> MSE Rm 2036 Cook Hall
> 2220 N Campus Drive
> Northwestern University
> Evanston, IL 60208, USA
> Tel: (847) 491-3996 Fax: (847) 491-7820
> email: L-marks at northwestern dot edu
> Web: www.numis.northwestern.edu
> Chair, Commission on Electron Crystallography of IUCR
> www.numis.northwestern.edu/
> Electron crystallography is the branch of science that uses electron
> scattering to study the structure of matter.
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From isono at cray.com Mon Jun 15 00:04:45 2009
From: isono at cray.com (Satoshi Isono)
Date: Mon Jun 15 00:05:07 2009
Subject: [mvapich-discuss] How to keep gid status
Message-ID: <925346A443D4E340BEB20248BAFCDBDF0B60B8CB@CFEVS1-IP.americas.cray.com>

Dear all,

I would like to know how to keep the gid status when launching MPI processes. We know that putting the sg command on the mpirun_rsh command line works in this case. Could you please advise me? I show an example below.

Most users belong to multiple groups, and the accounting system is managed based on group ID (GID). So all files created by each user must be owned by the appropriate group. The problem here is that the GID state is not preserved. I will show you an example. Please read it in numbered order.

1) The user logs in to a login node.

$ id
uid=1002(craysp) gid=1002(cray) groups=10(wheel),1002(cray),8001(GAUSSIAN)

This shows that the default gid is 1002(cray). This "cray" is the primary group ID.
2) The user changes to an arbitrary group with the newgrp command.

$ newgrp GAUSSIAN
$ id
uid=1002(craysp) gid=8001(GAUSSIAN) groups=10(wheel),1002(cray),8001(GAUSSIAN)

In this case the user wants to switch to another group, "GAUSSIAN". I confirmed that the gid changed from cray to GAUSSIAN.

3) The user runs an MPI job with mpirun_rsh.

This is a simple MPI code which generates an output file.

$ cat gid.c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc,char *argv[])
{
    int rank,size,namelen;
    char name[MPI_MAX_PROCESSOR_NAME],comm[512];

    MPI_Init(&argc,&argv);
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);
    MPI_Comm_size(MPI_COMM_WORLD,&size);
    MPI_Get_processor_name(name,&namelen);

    /* create one empty file per rank; its group owner shows the gid in effect */
    sprintf(comm,"touch testfile_%s_%d",name,rank);
    system(comm);

    MPI_Finalize();
    return 0;
}

After running this code, I want the output file to be owned by the "GAUSSIAN" group, but that is not what happens. Below is a run script that calls mpirun_rsh.

$ cat run_i.sh
#!/bin/bash
. /opt/Modules/init/bash
module load pgi mvapich2/pgi
mpirun_rsh -np 1 com-0644 ./gid-mv2

4) The user confirms that the created file is not owned by the appropriate group.

$ ls -l testfile_com-0644_0
-rw-r--r-- 1 craysp cray 0 Jun 8 17:50 testfile_com-0644_0

You can see that this file is owned by "cray", not "GAUSSIAN". I think this problem is caused by the mpirun_rsh command or by the SSH server configuration.

5) The way to solve it.

I think a better way is to insert the "sg" command just before a.out on the mpirun_rsh command line. Here is an example.

$ grep mpirun_rsh run_i.sh
mpirun_rsh -np 1 com-0644 /usr/bin/sg `id -gn` ./gid-mv2

With the sg command specified just before a.out, it works well.

$ ls -l testfile_com-0644_0
-rw-r--r-- 1 craysp GAUSSIAN 0 Jun 8 18:33 testfile_com-0644_0

6) Request to you

At first I thought of creating a wrapper script for mpirun_rsh. But it is difficult to locate the executable file on the command line, because users write the mpirun_rsh line in various patterns.
For example:

mpirun_rsh -np 2048 -hostfile hosts.txt ./a.out Inputfile | tee -a Outputfile
mpirun_rsh -np 256 -hostfile hostlist ./a.out input >> log
mpirun_rsh -np 8 -hostfile hostfile MV2_ENABLE_AFFINITY=0 MV2_NUM_HCAS=4 ./numarun_mv2.sh ./a.out
...

And we can take a look at line 1607.

1607 /* add the arguments */
1608 for (i = aout_index + 1; i < argc; i++) {
1609     strcat(command_name, " ");
1610     strcat(command_name, argv[i]);
1611 }

An example of an edit:

1607 /* add the arguments */
1608 strcat(command_name, " /usr/bin/sg $(id -gn)");
1609 for (i = aout_index + 1; i < argc; i++) {
1610     strcat(command_name, " ");
1611     strcat(command_name, argv[i]);
1612 }

I made the edit shown above and recompiled, but it does not take effect. If you know another way to solve this problem, could you please tell me?

Best regards,
Satoshi Isono

From cco2 at cray.com Mon Jun 15 18:03:11 2009
From: cco2 at cray.com (Christopher Co)
Date: Mon Jun 15 18:03:33 2009
Subject: [mvapich-discuss] Shared Memory Performance
Message-ID: <4A36C51F.2010304@cray.com>

Hi,

I am doing performance analysis on a Cray CX1 machine. I have run the Pallas MPI benchmark and have noticed a considerable performance difference between MVAPICH2 and Intel MPI on all the tests when shared memory is used. I have also run the benchmark for non-shared memory and the two performed nearly the same (MVAPICH2 was slightly faster). Is this slowdown on shared memory a known issue and/or are there fixes or switches I can enable or disable to get more speed?
To give an idea of what I'm seeing, for the simple Ping Pong test for two processes on the same chip, the numbers look like:

Processes  #repetitions    #bytes  Intel MPI time (usec)  MVAPICH2 time (usec)
        2          1000         0                   0.35                  0.94
                   1000         1                   0.44                  1.24
                   1000         2                   0.45                  1.17
                   1000         4                   0.45                  1.08
                   1000         8                   0.45                  1.11
                   1000        16                   0.44                  1.13
                   1000        32                   0.45                  1.21
                   1000        64                   0.47                  1.35
                   1000       128                   0.48                  1.75
                   1000       256                   0.51                  2.92
                   1000       512                   0.57                  3.41
                   1000      1024                   0.76                  3.85
                   1000      2048                   0.98                  4.27
                   1000      4096                   1.53                  5.14
                   1000      8192                   2.59                  8.04
                   1000     16384                   4.86                 14.34
                   1000     32768                   7.17                 33.92
                    640     65536                  11.65                 43.27
                    320    131072                  20.97                 66.98
                    160    262144                  39.64                118.58
                     80    524288                  84.91                224.40
                     40   1048576                 212.76                461.80
                     20   2097152                 458.55               1053.67
                     10   4194304                1738.30               2649.30

Hopefully the table came out clear. MVAPICH2 always lags behind by a considerable amount. Any insight is much appreciated. Thanks!

Chris Co

From panda at cse.ohio-state.edu Mon Jun 15 18:25:01 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon Jun 15 18:25:21 2009
Subject: [mvapich-discuss] Shared Memory Performance
In-Reply-To: <4A36C51F.2010304@cray.com>
Message-ID:

Can you tell us which version of MVAPICH2 you are using and which option(s) are configured? Are you using correct CPU mapping in both cases?

DK

On Mon, 15 Jun 2009, Christopher Co wrote:

> Hi,
>
> I am doing performance analysis on a Cray CX1 machine. I have run the
> Pallas MPI benchmark and have noticed a considerable performance
> difference between MVAPICH2 and Intel MPI on all the tests when shared
> memory is used. I have also run the benchmark for non-shared memory and
> the two performed nearly the same (MVAPICH2 was slightly faster). Is
> this slowdown on shared memory a known issue and/or are there fixes or
> switches I can enable or disable to get more speed?
> > To give an idea of what I'm seeing, for the simple Ping Pong test for > two processes on the same chip, the numbers looks like: > > Processes # repetitions > #bytes Intel MPI time (usec) MVAPICH2 > time (usec) > 2 1000 0 0.35 0.94 > > 1000 1 0.44 1.24 > > 1000 2 0.45 1.17 > > 1000 4 0.45 1.08 > > 1000 8 0.45 1.11 > > 1000 16 0.44 1.13 > > 1000 32 0.45 1.21 > > 1000 64 0.47 1.35 > > 1000 128 0.48 1.75 > > 1000 256 0.51 2.92 > > 1000 512 0.57 3.41 > > 1000 1024 0.76 3.85 > > 1000 2048 0.98 4.27 > > 1000 4096 1.53 5.14 > > 1000 8192 2.59 8.04 > > 1000 16384 4.86 14.34 > > 1000 32768 7.17 33.92 > > 640 65536 11.65 43.27 > > 320 131072 20.97 66.98 > > 160 262144 39.64 118.58 > > 80 524288 84.91 224.40 > > 40 1048576 212.76 461.80 > > 20 2097152 458.55 1053.67 > > 10 4194304 1738.30 2649.30 > > > Hopefully the table came out clear. MVAPICH2 always lags behind by a > considerable amount. Any insight is much appreciated. Thanks! > > > Chris Co > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From cco2 at cray.com Mon Jun 15 18:50:50 2009 From: cco2 at cray.com (Christopher Co) Date: Mon Jun 15 18:51:14 2009 Subject: [mvapich-discuss] Shared Memory Performance In-Reply-To: References: Message-ID: <4A36D04A.5020905@cray.com> I am using MVAPICH2 1.4 with the default configuration (since the CX-1 uses Mellanox Infiniband). I am fairly certain my CPU mapping was on-node for both cases (curiously, is there a way for MVAPICH2 to print out the nodes/cores running). I have the numbers for Ping Pong for the off-node case. 
I should have included this in my earlier message:

Processes  # repetitions    #bytes  Intel MPI time (usec)  MVAPICH2 time (usec)
2                   1000         0                   4.16                   3.4
                    1000         1                   4.67                  3.56
                    1000         2                   4.21                  3.56
                    1000         4                   4.23                  3.62
                    1000         8                   4.33                  3.63
                    1000        16                   4.33                  3.64
                    1000        32                   4.38                  3.73
                    1000        64                   4.44                  3.92
                    1000       128                   5.61                  4.71
                    1000       256                   5.92                  5.23
                    1000       512                   6.52                  5.79
                    1000      1024                   7.68                  7.06
                    1000      2048                   9.97                  9.36
                    1000      4096                  12.39                 11.97
                    1000      8192                  17.86                 22.53
                    1000     16384                  27.44                 28.27
                    1000     32768                  40.32                 39.82
                     640     65536                  63.61                 62.97
                     320    131072                 109.69                110.01
                     160    262144                 204.71                 206.9
                      80    524288                 400.72                 397.1
                      40   1048576                 775.64                776.45
                      20   2097152                1523.95               1535.65
                      10   4194304                3018.84               3054.89

Chris

Dhabaleswar Panda wrote:
> Can you tell us which version of MVAPICH2 you are using and which
> option(s) are configured? Are you using correct CPU mapping in both
> cases?
>
> DK
>
> On Mon, 15 Jun 2009, Christopher Co wrote:
>
>
>> Hi,
>>
>> I am doing performance analysis on a Cray CX1 machine. I have run the
>> Pallas MPI benchmark and have noticed a considerable performance
>> difference between MVAPICH2 and Intel MPI on all the tests when shared
>> memory is used. I have also run the benchmark for non-shared memory and
>> the two performed nearly the same (MVAPICH2 was slightly faster). Is
>> this slowdown on shared memory a known issue and/or are there fixes or
>> switches I can enable or disable to get more speed?
>> >> To give an idea of what I'm seeing, for the simple Ping Pong test for >> two processes on the same chip, the numbers looks like: >> >> Processes # repetitions >> #bytes Intel MPI time (usec) MVAPICH2 >> time (usec) >> 2 1000 0 0.35 0.94 >> >> 1000 1 0.44 1.24 >> >> 1000 2 0.45 1.17 >> >> 1000 4 0.45 1.08 >> >> 1000 8 0.45 1.11 >> >> 1000 16 0.44 1.13 >> >> 1000 32 0.45 1.21 >> >> 1000 64 0.47 1.35 >> >> 1000 128 0.48 1.75 >> >> 1000 256 0.51 2.92 >> >> 1000 512 0.57 3.41 >> >> 1000 1024 0.76 3.85 >> >> 1000 2048 0.98 4.27 >> >> 1000 4096 1.53 5.14 >> >> 1000 8192 2.59 8.04 >> >> 1000 16384 4.86 14.34 >> >> 1000 32768 7.17 33.92 >> >> 640 65536 11.65 43.27 >> >> 320 131072 20.97 66.98 >> >> 160 262144 39.64 118.58 >> >> 80 524288 84.91 224.40 >> >> 40 1048576 212.76 461.80 >> >> 20 2097152 458.55 1053.67 >> >> 10 4194304 1738.30 2649.30 >> >> >> Hopefully the table came out clear. MVAPICH2 always lags behind by a >> considerable amount. Any insight is much appreciated. Thanks! >> >> >> Chris Co >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >> >> > > From panda at cse.ohio-state.edu Mon Jun 15 20:25:13 2009 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Mon Jun 15 20:25:33 2009 Subject: [mvapich-discuss] Shared Memory Performance In-Reply-To: <4A36D04A.5020905@cray.com> Message-ID: Thanks for letting us know that you are using MVAPICH2 1.4. I believe you are taking numbers on Intel systems. Please note that on Intel systems, two cores next to each other within the same chip are numbered as 0 and 4 (not 0 and 1). Thus, the default setting (with processes 0 and 1) run across the chips and thus, you are seeing worse performance. Please run your tests across cores 0 and 4 and you should be able to see better performance. 
Depending on which pairs of processes you use, you may see some differences in performance for short and large messages (depends on whether these cores are within the same chip, same socket or across sockets). I am attaching some numbers below on our Nehalem system with these two CPU mappings and you can see the performance difference.

MVAPICH2 provides flexible mapping of MPI processes to cores within a node. You can try out performance across various pairs and you will see performance difference. More details on such mapping are available from here:

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc1.html#x1-360006.8

Also, starting from MVAPICH2 1.4, a new single-copy kernel-based shared-memory scheme (LiMIC2) is introduced. This is `off' by default. You can use it to get better performance for larger message sizes. You need to configure with enable-limic2 and you also need to use MV2_SMP_USE_LIMIC2=1. More details are available from here:

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc1.html#x1-370006.9

Here are some performance numbers with different CPU mappings.

OSU MPI latency with Default CPU mapping (LiMIC2 is off)
--------------------------------------------------------

# OSU MPI Latency Test v3.1.1
# Size      Latency (us)
0                   0.77
1                   0.95
2                   0.95
4                   0.94
8                   0.94
16                  0.94
32                  0.96
64                  0.99
128                 1.09
256                 1.22
512                 1.37
1024                1.61
2048                1.79
4096                2.43
8192                5.42
16384               6.73
32768               9.57
65536              15.34
131072             28.71
262144             53.13
524288            100.24
1048576           199.98
2097152           387.28
4194304           991.68

OSU MPI latency with CPU mapping 0:4 (LiMIC2 is off)
----------------------------------------------------

# OSU MPI Latency Test v3.1.1
# Size      Latency (us)
0                   0.34
1                   0.40
2                   0.40
4                   0.40
8                   0.40
16                  0.40
32                  0.42
64                  0.42
128                 0.45
256                 0.50
512                 0.55
1024                0.67
2048                0.91
4096                1.35
8192                3.66
16384               5.01
32768               7.41
65536              12.90
131072             25.21
262144             49.71
524288             97.17
1048576           187.50
2097152           465.57
4194304          1196.31

Let us know if you get better performance with appropriate CPU mapping.

Thanks,

DK

On Mon, 15 Jun 2009, Christopher Co wrote:

> I am using MVAPICH2 1.4 with the default configuration (since the CX-1
> uses Mellanox Infiniband). I am fairly certain my CPU mapping was
> on-node for both cases (curiously, is there a way for MVAPICH2 to print
> out the nodes/cores running). I have the numbers for Ping Pong for the
> off-node case.
I should have included this in my earlier message: > Processes # repetitions #bytes Intel MPI time (usec)] MVAPICH2 time > (usec) > 2 1000 0 4.16 3.4 > > 1000 1 4.67 3.56 > > 1000 2 4.21 3.56 > > 1000 4 4.23 3.62 > > 1000 8 4.33 3.63 > > 1000 16 4.33 3.64 > > 1000 32 4.38 3.73 > > 1000 64 4.44 3.92 > > 1000 128 5.61 4.71 > > 1000 256 5.92 5.23 > > 1000 512 6.52 5.79 > > 1000 1024 7.68 7.06 > > 1000 2048 9.97 9.36 > > 1000 4096 12.39 11.97 > > 1000 8192 17.86 22.53 > > 1000 16384 27.44 28.27 > > 1000 32768 40.32 39.82 > > 640 65536 63.61 62.97 > > 320 131072 109.69 110.01 > > 160 262144 204.71 206.9 > > 80 524288 400.72 397.1 > > 40 1048576 775.64 776.45 > > 20 2097152 1523.95 1535.65 > > 10 4194304 3018.84 3054.89 > > > > Chris > > > Dhabaleswar Panda wrote: > > Can you tell us which version of MVAPICH2 you are using and which > > option(s) are configured? Are you using correct CPU mapping in both > > cases? > > > > DK > > > > On Mon, 15 Jun 2009, Christopher Co wrote: > > > > > >> Hi, > >> > >> I am doing performance analysis on a Cray CX1 machine. I have run the > >> Pallas MPI benchmark and have noticed a considerable performance > >> difference between MVAPICH2 and Intel MPI on all the tests when shared > >> memory is used. I have also run the benchmark for non-shared memory and > >> the two performed nearly the same (MVAPICH2 was slightly faster). Is > >> this slowdown on shared memory a known issue and/or are there fixes or > >> switches I can enable or disable to get more speed? 
> >> > >> To give an idea of what I'm seeing, for the simple Ping Pong test for > >> two processes on the same chip, the numbers looks like: > >> > >> Processes # repetitions > >> #bytes Intel MPI time (usec) MVAPICH2 > >> time (usec) > >> 2 1000 0 0.35 0.94 > >> > >> 1000 1 0.44 1.24 > >> > >> 1000 2 0.45 1.17 > >> > >> 1000 4 0.45 1.08 > >> > >> 1000 8 0.45 1.11 > >> > >> 1000 16 0.44 1.13 > >> > >> 1000 32 0.45 1.21 > >> > >> 1000 64 0.47 1.35 > >> > >> 1000 128 0.48 1.75 > >> > >> 1000 256 0.51 2.92 > >> > >> 1000 512 0.57 3.41 > >> > >> 1000 1024 0.76 3.85 > >> > >> 1000 2048 0.98 4.27 > >> > >> 1000 4096 1.53 5.14 > >> > >> 1000 8192 2.59 8.04 > >> > >> 1000 16384 4.86 14.34 > >> > >> 1000 32768 7.17 33.92 > >> > >> 640 65536 11.65 43.27 > >> > >> 320 131072 20.97 66.98 > >> > >> 160 262144 39.64 118.58 > >> > >> 80 524288 84.91 224.40 > >> > >> 40 1048576 212.76 461.80 > >> > >> 20 2097152 458.55 1053.67 > >> > >> 10 4194304 1738.30 2649.30 > >> > >> > >> Hopefully the table came out clear. MVAPICH2 always lags behind by a > >> considerable amount. Any insight is much appreciated. Thanks! > >> > >> > >> Chris Co > >> _______________________________________________ > >> mvapich-discuss mailing list > >> mvapich-discuss@cse.ohio-state.edu > >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > >> > >> > > > > > From L-marks at northwestern.edu Tue Jun 16 13:52:51 2009 From: L-marks at northwestern.edu (Laurence Marks) Date: Tue Jun 16 13:53:13 2009 Subject: [mvapich-discuss] Options for mpispawn Message-ID: <876512660906161052j4342c886rd283f42e0fc0abb1@mail.gmail.com> Are there any environment/startup files for mpispawn which are node specific? 
-- Laurence Marks Department of Materials Science and Engineering MSE Rm 2036 Cook Hall 2220 N Campus Drive Northwestern University Evanston, IL 60208, USA Tel: (847) 491-3996 Fax: (847) 491-7820 email: L-marks at northwestern dot edu Web: www.numis.northwestern.edu Chair, Commission on Electron Crystallography of IUCR www.numis.northwestern.edu/ Electron crystallography is the branch of science that uses electron scattering to study the structure of matter. From perkinjo at cse.ohio-state.edu Tue Jun 16 14:18:39 2009 From: perkinjo at cse.ohio-state.edu (Jonathan Perkins) Date: Tue Jun 16 14:19:01 2009 Subject: [mvapich-discuss] Options for mpispawn In-Reply-To: <876512660906161052j4342c886rd283f42e0fc0abb1@mail.gmail.com> References: <876512660906161052j4342c886rd283f42e0fc0abb1@mail.gmail.com> Message-ID: <20090616181839.GG3120@cse.ohio-state.edu> On Tue, Jun 16, 2009 at 12:52:51PM -0500, Laurence Marks wrote: > Are there any environment/startup files for mpispawn which are node specific? Are you asking whether there are any files created by mpispawn temporarily when mpispawn is invoked or are you asking whether there are any options that changes the behavior of mpispawn? > > -- > Laurence Marks > Department of Materials Science and Engineering > MSE Rm 2036 Cook Hall > 2220 N Campus Drive > Northwestern University > Evanston, IL 60208, USA > Tel: (847) 491-3996 Fax: (847) 491-7820 > email: L-marks at northwestern dot edu > Web: www.numis.northwestern.edu > Chair, Commission on Electron Crystallography of IUCR > www.numis.northwestern.edu/ > Electron crystallography is the branch of science that uses electron > scattering to study the structure of matter. 
> _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss -- Jonathan Perkins http://www.cse.ohio-state.edu/~perkinjo -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090616/30122a61/attachment.bin From jimg at osc.edu Tue Jun 16 13:53:02 2009 From: jimg at osc.edu (James Giuliani) Date: Tue Jun 16 14:22:41 2009 Subject: [mvapich-discuss] Problem with MVAPICH and Intel compilers on Glenn and BALE Message-ID: <4A37A3C2.AD06.00A2.0@osc.edu> I have finally been able to reproduce a memory problem that occurs with MVAPICH and the Intel compilers. I have been able to reproduce the problem down to a series of F90 allocates and deallocates in the attached f90 code. To summarize the behavior, with the following environment: mvapich (0.9.9 or 1.1) Intel compilers (v10) the attached program will hang after 70,520 loop iterations. If you set the loop logic to exit after 500 iterations, MPI_FINALIZE dies with a segmentation violation. mpiexec -n 1 ./test going back in loop 1 going back in loop 2 .. .. 
going back in loop 499
going back in loop 500
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image            PC                Routine  Line     Source
libpthread.so.0  00002AC8C53FFE70  Unknown  Unknown  Unknown
libmpich.so.1.0  00002AC8C4D0AB91  Unknown  Unknown  Unknown
libmpich.so.1.0  00002AC8C4D0940D  Unknown  Unknown  Unknown
libmpich.so.1.0  00002AC8C4D22A91  Unknown  Unknown  Unknown
libmpich.so.1.0  00002AC8C4CE0BBB  Unknown  Unknown  Unknown
libmpich.so.1.0  00002AC8C4CE0912  Unknown  Unknown  Unknown
test             0000000000405A17  Unknown  Unknown  Unknown
test             0000000000403D42  Unknown  Unknown  Unknown
libc.so.6        00002AC8C5AB68B4  Unknown  Unknown  Unknown
test             0000000000403C69  Unknown  Unknown  Unknown

This problem does not appear if:

1) You compile with the Portland Group compiler
2) You move the MPI_INIT call to after the loop

Is anyone aware why this may be a problem?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.f90
Type: application/octet-stream
Size: 2833 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090616/bc38aa29/test.obj

From panda at cse.ohio-state.edu Tue Jun 16 15:17:14 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Tue Jun 16 15:17:35 2009
Subject: [mvapich-discuss] Problem with MVAPICH and Intel compilers on Glenn and BALE
In-Reply-To: <4A37A3C2.AD06.00A2.0@osc.edu>
Message-ID:

Jim,

The following note was posted on the mvapich-discuss list on June 9th as a response to another query. I am not sure whether the symptoms you are seeing are caused by the back-to-back allocation and deallocation and by memory not being returned properly. A patch was applied to the 1.1 branch version last month to solve this problem. Can you try the 1.1 branch version (the URL specified below) and see whether the problem persists or goes away?

Thanks,

DK

==============

In the 1.1 branch we have made changes to help with issues where the MPI library was shown to be using too much memory (and failing in some cases).
In this case a 'free' was not really unpinning memory immediately, so a 'free' immediately followed by a large malloc (and touching of that buffer) could run the machine out of memory. If you think this might be the problem then the following place is where you can download the latest tarball: http://mvapich.cse.ohio-state.edu/nightly/mvapich/branches/1.1 ================================= On Tue, 16 Jun 2009, James Giuliani wrote: > I have finally been able to reproduce a memory problem that occurs with MVAPICH and the Intel compilers. > > I have been able to reproduce the problem down to a series of F90 allocates and deallocates in the attached f90 code. > > To summarize the behavior, with the following environment: > > mvapich (0.9.9 or 1.1) > Intel compilers (v10) > > the attached program will hang after 70,520 loop iterations. If you set the loop logic to exit after 500 iterations, MPI_FINALIZE dies with a segmentation violation. > > mpiexec -n 1 ./test > going back in loop 1 > going back in loop 2 > .. > .. > going back in loop 499 > going back in loop 500 > forrtl: severe (174): SIGSEGV, segmentation fault occurred > Image PC Routine Line Source > libpthread.so.0 00002AC8C53FFE70 Unknown Unknown Unknown > libmpich.so.1.0 00002AC8C4D0AB91 Unknown Unknown Unknown > libmpich.so.1.0 00002AC8C4D0940D Unknown Unknown Unknown > libmpich.so.1.0 00002AC8C4D22A91 Unknown Unknown Unknown > libmpich.so.1.0 00002AC8C4CE0BBB Unknown Unknown Unknown > libmpich.so.1.0 00002AC8C4CE0912 Unknown Unknown Unknown > test 0000000000405A17 Unknown Unknown Unknown > test 0000000000403D42 Unknown Unknown Unknown > libc.so.6 00002AC8C5AB68B4 Unknown Unknown Unknown > test 0000000000403C69 Unknown Unknown Unknown > > This problem does not appear if: > > 1) You compile with Portland group compiler > 2) You move the MPI_INIT until after the loop > > Is anyone aware why this may be a problem? 
>
>

From panda at cse.ohio-state.edu Tue Jun 16 15:34:30 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Tue Jun 16 15:34:51 2009
Subject: [mvapich-discuss] How to keep gid status
In-Reply-To: <925346A443D4E340BEB20248BAFCDBDF0B60B8CB@CFEVS1-IP.americas.cray.com>
Message-ID:

Hi,

Thanks for your note. I shared your question with Dr. Bill Barth from TACC. Folks from TACC have been using MVAPICH with mpirun_rsh in their production environment on Ranger for quite some time. I am including his reply below. I hope his suggested approach will work for you. Let us know. I am cc'ing Dr. Barth on this e-mail also. If there are any additional questions, the two of you might exchange additional information on this issue.

Thanks,

DK

====================================================================

As you may recall, we have wrapper scripts that we use on Ranger and Lonestar to hide the details of the mpirun_rsh command line from the users. We call it 'ibrun'. It interacts with the scheduler (through the environment) to generate the host list and establish the number of tasks to start. I don't see why it would be hard to add a call to /usr/bin/sg in there.

If the user would have invoked

mpirun_rsh -np 5 -hostfile hosts ./foo

he simply runs

ibrun ./foo

on Ranger or Lonestar. 'ibrun' is basically structured as:

#!/bin/bash
....find NP from the environment....
....find host list....
$MPICH_HOME/bin/mpirun_rsh -np $NP -hostfile $HOSTFILE "$@"

So it just takes the command line args of ibrun and passes them directly to mpirun_rsh.

There's no reason it couldn't do

#!/bin/bash
....find NP....
....find host list....
GROUP_ID=`id -gn`
$MPICH_HOME/bin/mpirun_rsh -np $NP -hostfile $HOSTFILE /usr/bin/sg $GROUP_ID "$@"

It should be this straightforward.

Bill.

======

On Sun, 14 Jun 2009, Satoshi Isono wrote:

> Dear all,
>
> I would like to know how to keep gid status when launching MPI
> processes.
We know that, with sg command in mpirun_rsh command line, it > is successful in this case. Can you please advise me. I show a example > as below. > > Most of users belong multiple group. And accounting system is managed > based on a group ID (GID). So, all files created from each user must be > owned with appropriate group owner information. > > A problem here is that the state of GID not saved. I would show you a > example. Could you read it, according to numbers. > > 1) User logins into a login node. > > $ id > uid=1002(craysp) gid=1002(cray) > groups=10(wheel),1002(cray),8001(GAUSSIAN) > > This is showing default gid is 1002(cray). This "cray" is primary group > ID. > > 2) User changes arbitrary group with newgrp command. > > $ newgrp GAUSSIAN > $ id > uid=1002(craysp) gid=8001(GAUSSIAN) > groups=10(wheel),1002(cray),8001(GAUSSIAN) > > This case is that a user wants to change another group like "GAUSSIAN". > Certainly, I make sure it was changed to GAUSSIAN from cray. > > 3) User runs a MPI job with mpirun_rsh > > This is the simple MPI code which generates a output file. > > $ cat gid.c > #include > #include > #include > > int main(int argc,char *argv[]) > { > int rank,size,namelen; > char name[MPI_MAX_PROCESSOR_NAME],comm[512]; > > MPI_Init(&argc,&argv); > > MPI_Comm_rank(MPI_COMM_WORLD,&rank); > MPI_Comm_size(MPI_COMM_WORLD,&size); > MPI_Get_processor_name(name,&namelen); > > sprintf(comm,"touch testfile_%s_%d",name,rank); > system(comm); > > MPI_Finalize(); > return 0; > } > > After running this code, I want that a output file was owned by > "GAUSSIAN" group. But it was different from that I want. Below is a run > script including mpirun_rsh. > > $ cat run_i.sh > #!/bin/bash > . /opt/Modules/init/bash > module load pgi mvapich2/pgi > mpirun_rsh -np 1 com-0644 ./gid-mv2 > > 4) User confirms that a created file doesn't owned appropriate group ID. 
> > $ ls -l testfile_com-0644_0 > -rw-r--r-- 1 craysp cray 0 Jun 8 17:50 testfile_com-0644_0 > > You can confirm that this file is owned "cray" not "GAUSSIAN". This > problem is caused on mpirun_rsh command or SSH server configuration, I > think. > > 5) The way to solve it. > > I am considering that better way is inserting "sg" command just before > a.out in mpirun_rsh command line. I would show you a example. > > $ grep mpirun_rsh run_i.sh > mpirun_rsh -np 1 com-0644 /usr/bin/sg `id -gn` ./gid-mv2 > > By specifying sg command just before a.out, It works well. > > $ ls -l testfile_com-0644_0 > -rw-r--r-- 1 craysp GAUSSIAN 0 Jun 8 18:33 testfile_com-0644_0 > > 6) Request to you > > I thought that the wrapper script of mpirun_rsh would be created at > first. But it is difficult to specify executable file location on > command lines. There are various patterns that user describes in > mpirun_rsh line. For example: > > mpirun_rsh -np 2048 -hostfile hosts.txt ./a.out Inputfile | tee -a > Outputfile > mpirun_rsh -np 256 -hostfile hostlist ./a.out input >> log > mpirun_rsh -np 8 -hostfile hostfile MV2_ENABLE_AFFINITY=0 > MV2_NUM_HCAS=4 ./numarun_mv2.sh ./a.out > ... > > And we can take a look on line 1607. > > 1607 /* add the arguments */ > 1608 for (i = aout_index + 1; i < argc; i++) { > 1609 strcat(command_name, " "); > 1610 strcat(command_name, argv[i]); > 1611 } > > An example of edit: > > 1607 /* add the arguments */ > 1608 strcat(command_name, " /usr/bin/sg $(id -gn)"); > 1609 for (i = aout_index + 1; i < argc; i++) { > 1610 strcat(command_name, " "); > 1611 strcat(command_name, argv[i]); > 1612 } > > I have edited showing above and done recompile it, but it doesn't apply. > If you know other way which is able to solve this problem, can you > please tell me? 
> > Best regards, > Satoshi Isono > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From L-marks at northwestern.edu Tue Jun 16 16:17:00 2009 From: L-marks at northwestern.edu (Laurence Marks) Date: Tue Jun 16 16:17:22 2009 Subject: [mvapich-discuss] Options for mpispawn In-Reply-To: <20090616181839.GG3120@cse.ohio-state.edu> References: <876512660906161052j4342c886rd283f42e0fc0abb1@mail.gmail.com> <20090616181839.GG3120@cse.ohio-state.edu> Message-ID: <876512660906161317t6cb716b3lb6d55b3116b84fb8@mail.gmail.com> Here is what I would like to do, either:a) Set processor affinity off if mpispawn is only spawning 1 mpi (MPISPAWN_LOCAL_NPROCS=1) b) Set processor affinity off if mpispawn in running on a specific computer, e.g. by looking in /etc/mpispawn.rc or similar. On Tue, Jun 16, 2009 at 1:18 PM, Jonathan Perkins < perkinjo@cse.ohio-state.edu> wrote: > On Tue, Jun 16, 2009 at 12:52:51PM -0500, Laurence Marks wrote: > > Are there any environment/startup files for mpispawn which are node > specific? > > Are you asking whether there are any files created by mpispawn > temporarily when mpispawn is invoked or are you asking whether there are > any options that changes the behavior of mpispawn? > > > > > -- > > Laurence Marks > > Department of Materials Science and Engineering > > MSE Rm 2036 Cook Hall > > 2220 N Campus Drive > > Northwestern University > > Evanston, IL 60208, USA > > Tel: (847) 491-3996 Fax: (847) 491-7820 > > email: L-marks at northwestern dot edu > > Web: www.numis.northwestern.edu > > Chair, Commission on Electron Crystallography of IUCR > > www.numis.northwestern.edu/ > > Electron crystallography is the branch of science that uses electron > > scattering to study the structure of matter. 
> > _______________________________________________ > > mvapich-discuss mailing list > > mvapich-discuss@cse.ohio-state.edu > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > -- > Jonathan Perkins > http://www.cse.ohio-state.edu/~perkinjo > -- Laurence Marks Department of Materials Science and Engineering MSE Rm 2036 Cook Hall 2220 N Campus Drive Northwestern University Evanston, IL 60208, USA Tel: (847) 491-3996 Fax: (847) 491-7820 email: L-marks at northwestern dot edu Web: www.numis.northwestern.edu Chair, Commission on Electron Crystallography of IUCR www.numis.northwestern.edu/ Electron crystallography is the branch of science that uses electron scattering to study the structure of matter. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090616/dec18cd2/attachment.html From cco2 at cray.com Tue Jun 16 18:55:11 2009 From: cco2 at cray.com (Christopher Co) Date: Tue Jun 16 18:55:39 2009 Subject: [mvapich-discuss] Shared Memory Performance In-Reply-To: References: Message-ID: <4A3822CF.4070800@cray.com> I am having issues with running processes on the cores I specify using MV2_CPU_MAPPING. Is the PLPA support for mapping MPI processes to cores embedded in MVAPICH2 or does it link to an existing PLPA on configure/install? Also, I want to confirm that no extra configure options are needed to enable this feature. Thanks, Chris Dhabaleswar Panda wrote: > Thanks for letting us know that you are using MVAPICH2 1.4. I believe you > are taking numbers on Intel systems. Please note that on Intel systems, > two cores next to each other within the same chip are numbered as 0 and 4 > (not 0 and 1). Thus, the default setting (with processes 0 and 1) run > across the chips and thus, you are seeing worse performance. Please run > your tests across cores 0 and 4 and you should be able to see better > performance. 
Depending on which pairs of processes you use, you may see > some differences in performance for short and large messages (depends on > whether these cores are within the same chip, same socket or across > sockets). I am attaching some numbers below on our Nehalem system with > these two CPU mappings and you can see the performance difference. > > MVAPICH2 provides flexible mapping of MPI processes to cores within a > node. You can try out performance across various pairs and you will see > performance difference. More details on such mapping are available from > here: > > http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc1.html#x1-360006.8 > > Also, starting from MVAPICH2 1.4, a new single-copy kernel-based > shared-memory scheme (LiMIC2) is introduced. This is `off' by default. > You can use it to get better performance for larger message sizes. You > need to configure with enable-limic2 and you also need to use > MV2_SMP_USE_LIMIC2=1. More details are available from here: > > http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc1.html#x1-370006.9 > > Here are some performance numbers with different CPU mappings. 
> > OSU MPI latency with Default CPU mapping (LiMIC2 is off) > -------------------------------------------------------- > > # OSU MPI Latency Test v3.1.1 > # Size Latency (us) > 0 0.77 > 1 0.95 > 2 0.95 > 4 0.94 > 8 0.94 > 16 0.94 > 32 0.96 > 64 0.99 > 128 1.09 > 256 1.22 > 512 1.37 > 1024 1.61 > 2048 1.79 > 4096 2.43 > 8192 5.42 > 16384 6.73 > 32768 9.57 > 65536 15.34 > 131072 28.71 > 262144 53.13 > 524288 100.24 > 1048576 199.98 > 2097152 387.28 > 4194304 991.68 > > OSU MPI latency with CPU mapping 0:4 (LiMIC2 is off) > ---------------------------------------------------- > > # OSU MPI Latency Test v3.1.1 > # Size Latency (us) > 0 0.34 > 1 0.40 > 2 0.40 > 4 0.40 > 8 0.40 > 16 0.40 > 32 0.42 > 64 0.42 > 128 0.45 > 256 0.50 > 512 0.55 > 1024 0.67 > 2048 0.91 > 4096 1.35 > 8192 3.66 > 16384 5.01 > 32768 7.41 > 65536 12.90 > 131072 25.21 > 262144 49.71 > 524288 97.17 > 1048576 187.50 > 2097152 465.57 > 4194304 1196.31 > > Let us know if you get better performance with appropriate CPU mapping. > > Thanks, > > DK > > > On Mon, 15 Jun 2009, Christopher Co wrote: > > >> I am using MVAPICH2 1.4 with the default configuration (since the CX-1 >> uses Mellanox Infiniband). I am fairly certain my CPU mapping was >> on-node for both cases (curiously, is there a way for MVAPICH2 to print >> out the nodes/cores running). I have the numbers for Ping Pong for the >> off-node case. 
I should have included this in my earlier message: >> Processes # repetitions #bytes Intel MPI time (usec)] MVAPICH2 time >> (usec) >> 2 1000 0 4.16 3.4 >> >> 1000 1 4.67 3.56 >> >> 1000 2 4.21 3.56 >> >> 1000 4 4.23 3.62 >> >> 1000 8 4.33 3.63 >> >> 1000 16 4.33 3.64 >> >> 1000 32 4.38 3.73 >> >> 1000 64 4.44 3.92 >> >> 1000 128 5.61 4.71 >> >> 1000 256 5.92 5.23 >> >> 1000 512 6.52 5.79 >> >> 1000 1024 7.68 7.06 >> >> 1000 2048 9.97 9.36 >> >> 1000 4096 12.39 11.97 >> >> 1000 8192 17.86 22.53 >> >> 1000 16384 27.44 28.27 >> >> 1000 32768 40.32 39.82 >> >> 640 65536 63.61 62.97 >> >> 320 131072 109.69 110.01 >> >> 160 262144 204.71 206.9 >> >> 80 524288 400.72 397.1 >> >> 40 1048576 775.64 776.45 >> >> 20 2097152 1523.95 1535.65 >> >> 10 4194304 3018.84 3054.89 >> >> >> >> Chris >> >> >> Dhabaleswar Panda wrote: >> >>> Can you tell us which version of MVAPICH2 you are using and which >>> option(s) are configured? Are you using correct CPU mapping in both >>> cases? >>> >>> DK >>> >>> On Mon, 15 Jun 2009, Christopher Co wrote: >>> >>> >>> >>>> Hi, >>>> >>>> I am doing performance analysis on a Cray CX1 machine. I have run the >>>> Pallas MPI benchmark and have noticed a considerable performance >>>> difference between MVAPICH2 and Intel MPI on all the tests when shared >>>> memory is used. I have also run the benchmark for non-shared memory and >>>> the two performed nearly the same (MVAPICH2 was slightly faster). Is >>>> this slowdown on shared memory a known issue and/or are there fixes or >>>> switches I can enable or disable to get more speed? 
>>>> >>>> To give an idea of what I'm seeing, for the simple Ping Pong test for >>>> two processes on the same chip, the numbers looks like: >>>> >>>> Processes # repetitions >>>> #bytes Intel MPI time (usec) MVAPICH2 >>>> time (usec) >>>> 2 1000 0 0.35 0.94 >>>> >>>> 1000 1 0.44 1.24 >>>> >>>> 1000 2 0.45 1.17 >>>> >>>> 1000 4 0.45 1.08 >>>> >>>> 1000 8 0.45 1.11 >>>> >>>> 1000 16 0.44 1.13 >>>> >>>> 1000 32 0.45 1.21 >>>> >>>> 1000 64 0.47 1.35 >>>> >>>> 1000 128 0.48 1.75 >>>> >>>> 1000 256 0.51 2.92 >>>> >>>> 1000 512 0.57 3.41 >>>> >>>> 1000 1024 0.76 3.85 >>>> >>>> 1000 2048 0.98 4.27 >>>> >>>> 1000 4096 1.53 5.14 >>>> >>>> 1000 8192 2.59 8.04 >>>> >>>> 1000 16384 4.86 14.34 >>>> >>>> 1000 32768 7.17 33.92 >>>> >>>> 640 65536 11.65 43.27 >>>> >>>> 320 131072 20.97 66.98 >>>> >>>> 160 262144 39.64 118.58 >>>> >>>> 80 524288 84.91 224.40 >>>> >>>> 40 1048576 212.76 461.80 >>>> >>>> 20 2097152 458.55 1053.67 >>>> >>>> 10 4194304 1738.30 2649.30 >>>> >>>> >>>> Hopefully the table came out clear. MVAPICH2 always lags behind by a >>>> considerable amount. Any insight is much appreciated. Thanks! >>>> >>>> >>>> Chris Co >>>> _______________________________________________ >>>> mvapich-discuss mailing list >>>> mvapich-discuss@cse.ohio-state.edu >>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >>>> >>>> >>>> >>> > > From panda at cse.ohio-state.edu Tue Jun 16 21:43:59 2009 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Tue Jun 16 21:44:19 2009 Subject: [mvapich-discuss] Shared Memory Performance In-Reply-To: <4A3822CF.4070800@cray.com> Message-ID: Could you let us know what issues you are seeing when using MV2_CPU_MAPPING. The PLPA support is embedded in MVAPICH2 code. It does not require any additional configure/install. I am assuming that you are using the Gen2 (OFED) interface with mpirun_rsh and your systems are Linux-based. 
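
For reference, here is a minimal sketch (not taken from the MVAPICH2 distribution) of how one might check which logical CPUs share a physical package on a Linux node before choosing values for MV2_CPU_MAPPING. The function names and the optional sysfs-root argument are assumptions made for illustration. The topology files under /sys/devices/system/cpu are a standard Linux interface, and the mpirun_rsh line is only printed, following the command form already used in this thread (hosts after -np, environment settings before the executable).

```shell
#!/bin/bash
# Sketch only: list the physical package (socket) of each logical CPU using the
# Linux sysfs topology files. The optional first argument (an alternative sysfs
# root) is an assumption added so the function can be pointed at a test tree.
list_cpu_packages() {
    local root="${1:-/sys/devices/system/cpu}"
    local dir id
    for dir in "$root"/cpu[0-9]*; do
        # Skip entries that expose no topology information.
        [ -r "$dir/topology/physical_package_id" ] || continue
        id=$(cat "$dir/topology/physical_package_id")
        echo "${dir##*/} package=$id"
    done
}

# Print (do not run) an mpirun_rsh command line that places two ranks on one
# host with the given MV2_CPU_MAPPING value, e.g. "0:4" as discussed above.
# Arguments (hypothetical helper): host, mapping, program.
show_mapping_cmd() {
    local host="$1" mapping="$2" prog="$3"
    echo "mpirun_rsh -np 2 $host $host MV2_CPU_MAPPING=$mapping $prog"
}
```

Running list_cpu_packages with no argument on a compute node shows which CPU numbers report the same package, and show_mapping_cmd node1 0:4 ./osu_latency prints the corresponding command to try. Core numbering differs between systems, so the 0:4 pair is only the example from this thread, and a shared package does not by itself guarantee a shared cache.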
Thanks,

DK

On Tue, 16 Jun 2009, Christopher Co wrote:

> I am having issues with running processes on the cores I specify using
> MV2_CPU_MAPPING. Is the PLPA support for mapping MPI processes to cores
> embedded in MVAPICH2 or does it link to an existing PLPA on
> configure/install? Also, I want to confirm that no extra configure
> options are needed to enable this feature.
>
> Thanks,
> Chris
>
> Dhabaleswar Panda wrote:
> > Thanks for letting us know that you are using MVAPICH2 1.4. I believe you
> > are taking numbers on Intel systems. Please note that on Intel systems,
> > two cores next to each other within the same chip are numbered as 0 and 4
> > (not 0 and 1). Thus, the default setting (with processes 0 and 1) run
> > across the chips and thus, you are seeing worse performance. Please run
> > your tests across cores 0 and 4 and you should be able to see better
> > performance. Depending on which pairs of processes you use, you may see
> > some differences in performance for short and large messages (depends on
> > whether these cores are within the same chip, same socket or across
> > sockets). I am attaching some numbers below on our Nehalem system with
> > these two CPU mappings and you can see the performance difference.
> >
> > MVAPICH2 provides flexible mapping of MPI processes to cores within a
> > node. You can try out performance across various pairs and you will see
> > performance difference. More details on such mapping are available from
> > here:
> >
> > http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc1.html#x1-360006.8
> >
> > Also, starting from MVAPICH2 1.4, a new single-copy kernel-based
> > shared-memory scheme (LiMIC2) is introduced. This is `off' by default.
> > You can use it to get better performance for larger message sizes. You
> > need to configure with enable-limic2 and you also need to use
> > MV2_SMP_USE_LIMIC2=1. More details are available from here:
> >
> > http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc1.html#x1-370006.9
> >
> > Here are some performance numbers with different CPU mappings.
> >
> > OSU MPI latency with Default CPU mapping (LiMIC2 is off)
> > --------------------------------------------------------
> >
> > # OSU MPI Latency Test v3.1.1
> > # Size        Latency (us)
> > 0                    0.77
> > 1                    0.95
> > 2                    0.95
> > 4                    0.94
> > 8                    0.94
> > 16                   0.94
> > 32                   0.96
> > 64                   0.99
> > 128                  1.09
> > 256                  1.22
> > 512                  1.37
> > 1024                 1.61
> > 2048                 1.79
> > 4096                 2.43
> > 8192                 5.42
> > 16384                6.73
> > 32768                9.57
> > 65536               15.34
> > 131072              28.71
> > 262144              53.13
> > 524288             100.24
> > 1048576            199.98
> > 2097152            387.28
> > 4194304            991.68
> >
> > OSU MPI latency with CPU mapping 0:4 (LiMIC2 is off)
> > ----------------------------------------------------
> >
> > # OSU MPI Latency Test v3.1.1
> > # Size        Latency (us)
> > 0                    0.34
> > 1                    0.40
> > 2                    0.40
> > 4                    0.40
> > 8                    0.40
> > 16                   0.40
> > 32                   0.42
> > 64                   0.42
> > 128                  0.45
> > 256                  0.50
> > 512                  0.55
> > 1024                 0.67
> > 2048                 0.91
> > 4096                 1.35
> > 8192                 3.66
> > 16384                5.01
> > 32768                7.41
> > 65536               12.90
> > 131072              25.21
> > 262144              49.71
> > 524288              97.17
> > 1048576            187.50
> > 2097152            465.57
> > 4194304           1196.31
> >
> > Let us know if you get better performance with appropriate CPU mapping.
> >
> > Thanks,
> >
> > DK
> >
> > On Mon, 15 Jun 2009, Christopher Co wrote:
> >
> >> I am using MVAPICH2 1.4 with the default configuration (since the CX-1
> >> uses Mellanox Infiniband). I am fairly certain my CPU mapping was
> >> on-node for both cases (curiously, is there a way for MVAPICH2 to print
> >> out the nodes/cores running). I have the numbers for Ping Pong for the
> >> off-node case.

From perkinjo at cse.ohio-state.edu Wed Jun 17 08:43:12 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Wed Jun 17 08:43:48 2009
Subject: [mvapich-discuss] Options for mpispawn
In-Reply-To: <876512660906161317t6cb716b3lb6d55b3116b84fb8@mail.gmail.com>
References: <876512660906161052j4342c886rd283f42e0fc0abb1@mail.gmail.com>
 <20090616181839.GG3120@cse.ohio-state.edu>
 <876512660906161317t6cb716b3lb6d55b3116b84fb8@mail.gmail.com>
Message-ID: <20090617124312.GE3072@cse.ohio-state.edu>

On Tue, Jun 16, 2009 at 03:17:00PM -0500, Laurence Marks wrote:
> On Tue, Jun 16, 2009 at 1:18 PM, Jonathan Perkins <
> perkinjo@cse.ohio-state.edu> wrote:
>
> > On Tue, Jun 16, 2009 at 12:52:51PM -0500, Laurence Marks wrote:
> > > Are there any environment/startup files for mpispawn which are node
> > > specific?
> >
> > Are you asking whether there are any files created by mpispawn
> > temporarily when mpispawn is invoked or are you asking whether there are
> > any options that change the behavior of mpispawn?
>
> Here is what I would like to do, either:
> a) Set processor affinity off if mpispawn is only spawning 1 MPI process
>    (MPISPAWN_LOCAL_NPROCS=1)
> b) Set processor affinity off if mpispawn is running on a specific computer,
>    e.g. by looking in /etc/mpispawn.rc or similar.

Okay, you're asking for some sort of configuration options based on
either runtime conditions or preset configuration file(s). We don't
currently support this but can consider it for future releases. Based
on my first glance it appears that option a) should be possible.

As you may already know, you can disable processor affinity by setting
MV2_ENABLE_AFFINITY=0 on the mpirun_rsh command line if you want to
globally disable this.
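
The global switch described above can also be applied selectively from a launcher script. The following is a minimal sketch, assuming the `mpirun_rsh -np N host... VAR=value ./program` command form referred to in this thread. The function name `build_mpirun_rsh_command` and the "one rank per host" policy are illustrative assumptions, not an MVAPICH feature, and the sketch only covers the uniform case (affinity off for the whole job), not a per-node decision inside mpispawn.

```python
from collections import Counter


def build_mpirun_rsh_command(hosts, program, program_args=()):
    """Build an mpirun_rsh argument list (hypothetical helper).

    `hosts` holds one entry per MPI rank, so a host listed twice runs
    two ranks. Affinity is switched off only when no host runs more
    than one rank, which mirrors option a) from the thread.
    """
    if not hosts:
        raise ValueError("need at least one host")
    ranks_per_host = Counter(hosts)
    cmd = ["mpirun_rsh", "-np", str(len(hosts)), *hosts]
    if max(ranks_per_host.values()) == 1:
        # Global setting: applies to every rank in the job.
        cmd.append("MV2_ENABLE_AFFINITY=0")
    cmd.append(program)
    cmd.extend(program_args)
    return cmd
```

The returned list can be passed to `subprocess.run`; keeping it as a list avoids shell quoting problems with host names or program arguments.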
Either way, I'll discuss this with my team members to see whether we can
take these situations into account.

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From L-marks at northwestern.edu Wed Jun 17 09:03:09 2009
From: L-marks at northwestern.edu (Laurence Marks)
Date: Wed Jun 17 09:03:30 2009
Subject: [mvapich-discuss] Options for mpispawn
In-Reply-To: <20090617124312.GE3072@cse.ohio-state.edu>
References: <876512660906161052j4342c886rd283f42e0fc0abb1@mail.gmail.com>
 <20090616181839.GG3120@cse.ohio-state.edu>
 <876512660906161317t6cb716b3lb6d55b3116b84fb8@mail.gmail.com>
 <20090617124312.GE3072@cse.ohio-state.edu>
Message-ID: <876512660906170603m76370780i2b532ec4ebb7f993@mail.gmail.com>

Thanks. This would cure several problems for me.

--
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Electron crystallography is the branch of science that uses electron
scattering to study the structure of matter.

From karl at tacc.utexas.edu Wed Jun 17 09:33:21 2009
From: karl at tacc.utexas.edu (Karl W. Schulz)
Date: Wed Jun 17 09:33:54 2009
Subject: [mvapich-discuss] Options for mpispawn
In-Reply-To: <876512660906170603m76370780i2b532ec4ebb7f993@mail.gmail.com>
References: <876512660906161052j4342c886rd283f42e0fc0abb1@mail.gmail.com>
 <20090616181839.GG3120@cse.ohio-state.edu>
 <876512660906161317t6cb716b3lb6d55b3116b84fb8@mail.gmail.com>
 <20090617124312.GE3072@cse.ohio-state.edu>
 <876512660906170603m76370780i2b532ec4ebb7f993@mail.gmail.com>
Message-ID: <9AE028DB-86D5-400E-B481-0A4D29D7CE76@tacc.utexas.edu>

Laurence,

Note that you could also leverage the environment variable mentioned below
to disable affinity in MVAPICH and then use numactl directly in a wrapper
script to launch jobs. Then you can embed local policy decisions based on
the number of tasks spawned per node. We do this locally to provide
reasonable affinity settings on QS/QC compute nodes based on how many MPI
tasks are being spawned per node.

Regards,

Karl

On Jun 17, 2009, at 8:03 AM, Laurence Marks wrote:

> Thanks. This would cure several problems for me.
>
> On Wed, Jun 17, 2009 at 7:43 AM, Jonathan Perkins wrote:
>> As you may already know, you can disable processor affinity by
>> setting MV2_ENABLE_AFFINITY=0 on the mpirun_rsh command line if you
>> want to globally disable this.

From cco2 at cray.com Wed Jun 17 12:13:48 2009
From: cco2 at cray.com (Christopher Co)
Date: Wed Jun 17 12:14:17 2009
Subject: [mvapich-discuss] Shared Memory Performance
In-Reply-To:
References:
Message-ID: <4A39163C.7060404@cray.com>

Those specifications are correct. I am seeing that the MV2_CPU_MAPPING
option does not have an effect on which cores are chosen so when I
launch a Ping-Pong, 2 cores are arbitrarily chosen by mpirun_rsh.
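
One way to check whether a requested mapping took effect is to inspect the affinity mask of the benchmark processes while they are running. The following is a minimal sketch, assuming Linux and Python 3.3 or later (`os.sched_getaffinity`). The helper name `describe_affinity` is illustrative, and the PIDs to pass in are whatever `ps` reports for the benchmark processes on the node in question.

```python
import os
import socket


def describe_affinity(pid):
    """Return a one-line summary of the CPUs that process `pid` may run on."""
    # os.sched_getaffinity is Linux-only and returns a set of CPU indices.
    cpus = sorted(os.sched_getaffinity(pid))
    return f"{socket.gethostname()} pid={pid} allowed_cpus={cpus}"
```

A process pinned to one core would typically report a single CPU index, while an unpinned process reports every CPU it may be scheduled on. Because the mask is read from outside the process, this works without modifying or relinking the benchmark.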
One thing that might be hindering PLPA support is that I do not have
sudo/root access on the machine. I installed everything into my home
directory. Could this be the issue?

Chris

Dhabaleswar Panda wrote:
> Could you let us know what issues you are seeing when using
> MV2_CPU_MAPPING. The PLPA support is embedded in MVAPICH2 code. It does
> not require any additional configure/install. I am assuming that you are
> using the Gen2 (OFED) interface with mpirun_rsh and your systems are
> Linux-based.
>
> Thanks,
>
> DK

From jgatenc at sandia.gov Wed Jun 17 17:08:25 2009
From: jgatenc at sandia.gov (Atencio, Jonathan Gerald)
Date: Wed Jun 17 17:08:55 2009
Subject: [mvapich-discuss] Building mvapich with GNU GCC 4.3 or higher.
Message-ID: <6DBFA9068040C94484B97A16CF47C39A7E3FAFADC9@ES03SNLNT.srn.sandia.gov>

Hello,

I am noticing that when I attempt to build mvapich using gcc 4.3.2, I no
longer see the mpicxx or mpiCC wrappers for C++. Instead, I now see the
following error in my configure log:

...
checking whether selected C++ compiler can compile iostream.h... no!...Cannot use g++
...

It would appear that since g++ could not compile iostream.h, it does not
build the C++ portion of mvapich. I can build mvapich with gcc 4.1 with
no issues. I notice in the GCC 4.3 release notes
(http://gcc.gnu.org/gcc-4.3/porting_to.html) that the iostream.h header
was removed.

Is there a way to build mvapich using gcc 4.3 or higher?

Thank you,
Jonathan

From cco2 at cray.com Wed Jun 17 17:21:07 2009
From: cco2 at cray.com (Christopher Co)
Date: Wed Jun 17 17:21:34 2009
Subject: [mvapich-discuss] Shared Memory Performance
In-Reply-To: <4A39163C.7060404@cray.com>
References: <4A39163C.7060404@cray.com>
Message-ID: <4A395E43.3060802@cray.com>

I have found that the CX-1 I am running on has two Intel Xeon E5472 3 GHz
processors (Harpertown). Your test results were on Nehalem processors.
When I have received the correct CPU mapping, I've gotten roughly 0.8 usec
to Ping Pong 8 bytes. I wonder if this can account for the discrepancy.
Anyways, I'll investigate this further and get more data but I wanted to
throw this information out there in case it can be helpful.

Chris

Christopher Co wrote:
> Those specifications are correct. I am seeing that the MV2_CPU_MAPPING
> option does not have an effect on which cores are chosen so when I
> launch a Ping-Pong, 2 cores are arbitrarily chosen by mpirun_rsh. One
> thing that might be hindering PLPA support is that I do not have
> sudo/root access on the machine. I installed everything into my home
> directory. Could this be the issue?
>
> Chris
>
> Dhabaleswar Panda wrote:
>> Could you let us know what issues you are seeing when using
>> MV2_CPU_MAPPING.
>> The PLPA support is embedded in MVAPICH2 code. It does
>> not require any additional configure/install. I am assuming that you are
>> using the Gen2 (OFED) interface with mpirun_rsh and your systems are
>> Linux-based.
>>
>> Thanks,
>>
>> DK

From isono at cray.com Thu Jun 18 03:47:14 2009
From: isono at cray.com (Satoshi Isono)
Date: Thu Jun 18 03:47:48 2009
Subject: [mvapich-discuss] What's a cause?
Message-ID: <925346A443D4E340BEB20248BAFCDBDF0B7174E5@CFEVS1-IP.americas.cray.com>

Hello everyone,

When I used MVAPICH 1.0.1, I got the errors below after two minutes. The
MPI job size is 2,560 processes. I think this problem was caused by system
trouble on each compute node, but I would like to know everyone's thoughts.
The messages show that some of the shared libraries cannot be loaded. Are
there any key items in the error messages below?
MPI process terminated unexpectedly
MPI process terminated unexpectedly
MPI process terminated unexpectedly
MPI process terminated unexpectedly
MPI process terminated unexpectedly
MPI process terminated unexpectedly
MPI process terminated unexpectedly
forrtl: error (69): process interrupted (SIGINT)
Image              PC                Routine            Line        Source
libpthread.so.0    0000003D3FE0DE60  Unknown               Unknown  Unknown
libpthread.so.0    0000003D3FE0CC79  Unknown               Unknown  Unknown
libibverbs.so.1    0000003D3F606B2F  Unknown               Unknown  Unknown
nhm_driver-2       0000000000CB3642  Unknown               Unknown  Unknown
libpthread.so.0    0000003D3FE062E7  Unknown               Unknown  Unknown
libc.so.6          0000003D3F2CE3BD  Unknown               Unknown  Unknown
forrtl: error (69): process interrupted (SIGINT)
Image              PC                Routine            Line        Source
nhm_driver-2       000000000068AC82  Unknown               Unknown  Unknown
nhm_driver-2       00000000004059D6  Unknown               Unknown  Unknown
nhm_driver-2       0000000000405942  Unknown               Unknown  Unknown
libc.so.6          0000003D3F21D8A4  Unknown               Unknown  Unknown
nhm_driver-2       0000000000405869  Unknown               Unknown  Unknown
MPI process terminated unexpectedly

Regards,
Satoshi Isono

From panda at cse.ohio-state.edu Thu Jun 18 07:18:28 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Thu Jun 18 07:18:47 2009
Subject: [mvapich-discuss] What's a cause?
In-Reply-To: <925346A443D4E340BEB20248BAFCDBDF0B7174E5@CFEVS1-IP.americas.cray.com>
Message-ID:

Please see that all nodes are configured uniformly with respect to
libraries and drivers. Also, make sure that all nodes are accessible
through the rsh/ssh mechanism you are using with mpirun. Otherwise, your
job might be getting aborted during the job launch phase or immediately
after it. Something like this seems to be happening here.

I would also like to indicate that MVAPICH 1.0.1 version (you are using)
is more than one year old. Please use the latest 1.1 branch version from
the following location. This version has multiple bugfixes and additional
optimizations/features compared to the 1.0.1 version.
http://mvapich.cse.ohio-state.edu/nightly/mvapich/branches/1.1/ DK On Thu, 18 Jun 2009, Satoshi Isono wrote: > Hello everyone, > > When I used MVAPICH 1.0.1, I got errors as below after two minutes. MPI > size is 2,560 processes. I think this problem was caused system trouble > on each compute node. I would like to know everyone's thought. Messages > shows that some of shared libraries cannot load. Are there any key items > as below error messages? > > MPI process terminated unexpectedly > MPI process terminated unexpectedly > MPI process terminated unexpectedly > MPI process terminated unexpectedly > MPI process terminated unexpectedly > MPI process terminated unexpectedly > MPI process terminated unexpectedly > forrtl: error (69): process interrupted (SIGINT) > Image PC Routine Line > Source > libpthread.so.0 0000003D3FE0DE60 Unknown Unknown > Unknown > libpthread.so.0 0000003D3FE0CC79 Unknown Unknown > Unknown > libibverbs.so.1 0000003D3F606B2F Unknown Unknown > Unknown > nhm_driver-2 0000000000CB3642 Unknown Unknown > Unknown > libpthread.so.0 0000003D3FE062E7 Unknown Unknown > Unknown > libc.so.6 0000003D3F2CE3BD Unknown Unknown > Unknown > forrtl: error (69): process interrupted (SIGINT) > Image PC Routine Line > Source > nhm_driver-2 000000000068AC82 Unknown Unknown > Unknown > nhm_driver-2 00000000004059D6 Unknown Unknown > Unknown > nhm_driver-2 0000000000405942 Unknown Unknown > Unknown > libc.so.6 0000003D3F21D8A4 Unknown Unknown > Unknown > nhm_driver-2 0000000000405869 Unknown Unknown > Unknown > MPI process terminated unexpectedly > > > Regards, > Satoshi Isono > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From isono at cray.com Thu Jun 18 09:52:23 2009 From: isono at cray.com (Satoshi Isono) Date: Thu Jun 18 09:52:45 2009 Subject: [mvapich-discuss] What's a cause? 
In-Reply-To: References: <925346A443D4E340BEB20248BAFCDBDF0B7174E5@CFEVS1-IP.americas.cray.com> Message-ID: <925346A443D4E340BEB20248BAFCDBDF0B71761D@CFEVS1-IP.americas.cray.com> Dear DK Panda, Thanks for your advice. At first, I try to make sure all nodes have been configured MPI environment. Regards, Satoshi Isono -----Original Message----- From: Dhabaleswar Panda [mailto:panda@cse.ohio-state.edu] Sent: Thursday, June 18, 2009 8:18 PM To: Satoshi Isono Cc: mvapich-discuss@cse.ohio-state.edu Subject: Re: [mvapich-discuss] What's a cause? Please see that all nodes are configured uniformly with respect to libraries and drivers. Also, make sure that all nodes are accessible through the rsh/ssh mechanism you are using with mpirun. Otherwise, your job might be getting aborted during the job launch phase or immediately after it. Something like this seems to be happening here. I would also like to indicate that MVAPICH 1.0.1 version (you are using) is more than one year old. Please use the latest 1.1 branch version from the following location. This version has multiple bugfixes and additional optimizations/features compared to the 1.0.1 version. http://mvapich.cse.ohio-state.edu/nightly/mvapich/branches/1.1/ DK On Thu, 18 Jun 2009, Satoshi Isono wrote: > Hello everyone, > > When I used MVAPICH 1.0.1, I got errors as below after two minutes. MPI > size is 2,560 processes. I think this problem was caused system trouble > on each compute node. I would like to know everyone's thought. Messages > shows that some of shared libraries cannot load. Are there any key items > as below error messages? 
> > MPI process terminated unexpectedly > MPI process terminated unexpectedly > MPI process terminated unexpectedly > MPI process terminated unexpectedly > MPI process terminated unexpectedly > MPI process terminated unexpectedly > MPI process terminated unexpectedly > forrtl: error (69): process interrupted (SIGINT) > Image PC Routine Line > Source > libpthread.so.0 0000003D3FE0DE60 Unknown Unknown > Unknown > libpthread.so.0 0000003D3FE0CC79 Unknown Unknown > Unknown > libibverbs.so.1 0000003D3F606B2F Unknown Unknown > Unknown > nhm_driver-2 0000000000CB3642 Unknown Unknown > Unknown > libpthread.so.0 0000003D3FE062E7 Unknown Unknown > Unknown > libc.so.6 0000003D3F2CE3BD Unknown Unknown > Unknown > forrtl: error (69): process interrupted (SIGINT) > Image PC Routine Line > Source > nhm_driver-2 000000000068AC82 Unknown Unknown > Unknown > nhm_driver-2 00000000004059D6 Unknown Unknown > Unknown > nhm_driver-2 0000000000405942 Unknown Unknown > Unknown > libc.so.6 0000003D3F21D8A4 Unknown Unknown > Unknown > nhm_driver-2 0000000000405869 Unknown Unknown > Unknown > MPI process terminated unexpectedly > > > Regards, > Satoshi Isono > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From perkinjo at cse.ohio-state.edu Thu Jun 18 15:52:22 2009 From: perkinjo at cse.ohio-state.edu (Jonathan Perkins) Date: Thu Jun 18 15:52:45 2009 Subject: [mvapich-discuss] Building mvapich with GNU GCC 4.3 or higher. In-Reply-To: <6DBFA9068040C94484B97A16CF47C39A7E3FAFADC9@ES03SNLNT.srn.sandia.gov> References: <6DBFA9068040C94484B97A16CF47C39A7E3FAFADC9@ES03SNLNT.srn.sandia.gov> Message-ID: <20090618195222.GD3073@cse.ohio-state.edu> Skipped content of type multipart/mixed-------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090618/93cea879/attachment-0001.bin

From ce107 at MIT.EDU Thu Jun 18 17:41:34 2009
From: ce107 at MIT.EDU (Constantinos Evangelinos)
Date: Thu Jun 18 23:04:23 2009
Subject: [mvapich-discuss] Limic2 on RHEL4 AS
Message-ID: <200906181741.35450.ce107@mit.edu>

Hi - I'm trying to build and test MVAPICH2 1.4rc1 on a small cluster of
RHEL4 boxes with Infiniband (using Intel compilers 10.1). Unfortunately it
appears that Limic2 has been written assuming a kernel version > 2.6.9 or
2.6.10 when the transition from the old ioctl happened.

[root@bowditch limic]# gmake
gmake -C /lib/modules/2.6.9-55.0.9.ELsmp/build SUBDIRS=/root/build/mvapich2-1.4rc1/limic modules
gmake[1]: Entering directory `/usr/src/kernels/2.6.9-55.0.9.EL-smp-x86_64'
  CC [M] /root/build/mvapich2-1.4rc1/limic/limic.o
In file included from /root/build/mvapich2-1.4rc1/limic/limic.c:17:
include/linux/cdev.h:24: warning: "struct inode" declared inside parameter list
include/linux/cdev.h:24: warning: its scope is only this definition or declaration, which is probably not what you want
/root/build/mvapich2-1.4rc1/limic/limic.c:103: error: unknown field `unlocked_ioctl' specified in initializer
/root/build/mvapich2-1.4rc1/limic/limic.c:103: warning: initialization from incompatible pointer type
gmake[2]: *** [/root/build/mvapich2-1.4rc1/limic/limic.o] Error 1
gmake[1]: *** [_module_/root/build/mvapich2-1.4rc1/limic] Error 2
gmake[1]: Leaving directory `/usr/src/kernels/2.6.9-55.0.9.EL-smp-x86_64'
gmake: *** [all] Error 2

I was able to build the limic kernel module by making the following changes:

*** limic.c 2009-06-18 17:32:44.000000000 -0400
--- limic.c.new 2009-06-18 17:35:45.000000000 -0400
***************
*** 14,23 ****
  #include
  #include
  #include
  #include
  #include
- #include
  #include "limic_internal.h"
--- 14,23 ----
  #include
  #include
+ #include
  #include
  #include
  #include
  #include "limic_internal.h"
***************
*** 32,37 ****
--- 32,38 ----
  int limic_ioctl(
+ struct inode *inode,
  struct file *fp,
  unsigned int op_code,
  unsigned long arg)
***************
*** 100,106 ****
  static struct file_operations limic_fops={
! unlocked_ioctl: limic_ioctl,
  open: limic_open,
  release: limic_release
  };
--- 101,107 ----
  static struct file_operations limic_fops={
! ioctl: limic_ioctl,
  open: limic_open,
  release: limic_release
  };

The kernel module then builds fine but when I proceed to build the rest of
the MVAPICH2 code I get in trouble with the instances of use of ioctl in
limic_lib.h - the old ioctl has an extra inode argument.

So the question becomes - is Limic2 restricted to more recent kernel
versions (despite the popularity of RHEL4) or can a fix be put in 1.4 in
time for the final release?

Also - there is nothing that claims it cannot be done but maybe it's an
oversight - can CR work in conjunction with Limic2?

Constantinos
--
Dr. Constantinos Evangelinos                    Room 54-1518, EAPS/MIT
Earth, Atmospheric and Planetary Sciences       77 Massachusetts Avenue
Massachusetts Institute of Technology           Cambridge, MA 02139
+1-617-324-3386/+1-617-253-4464 (fax)           USA

From jgatenc at sandia.gov Thu Jun 18 17:53:33 2009
From: jgatenc at sandia.gov (Atencio, Jonathan Gerald)
Date: Thu Jun 18 23:15:23 2009
Subject: [mvapich-discuss] Building mvapich with GNU GCC 4.3 or higher.
In-Reply-To: <20090618195222.GD3073@cse.ohio-state.edu>
References: <6DBFA9068040C94484B97A16CF47C39A7E3FAFADC9@ES03SNLNT.srn.sandia.gov> <20090618195222.GD3073@cse.ohio-state.edu>
Message-ID: <6DBFA9068040C94484B97A16CF47C39A7E3FED2849@ES03SNLNT.srn.sandia.gov>

Hello Jonathan,

I started from a clean mvapich 1.1 directory. Patched using c++.patch. Configured, notice that C++ bindings would be built. However, when I make, the first compile line errors out.
Please see the following:

% cd mvapich-1.1
% patch -p1 < c++.patch
patching file MPI-2-C++/contrib/examples/chapter_10_mpi2.cc
patching file MPI-2-C++/contrib/examples/hello_world.cc
patching file MPI-2-C++/contrib/examples/pi.cc
patching file MPI-2-C++/contrib/examples/ring.cc
patching file MPI-2-C++/contrib/examples/topology.cc
patching file MPI-2-C++/contrib/examples/user_bcast.cc
patching file MPI-2-C++/contrib/test_suite/cancel.cc
patching file MPI-2-C++/contrib/test_suite/errhandler.cc
patching file MPI-2-C++/contrib/test_suite/messages.cc
patching file MPI-2-C++/contrib/test_suite/mpi2c++_test.cc
patching file MPI-2-C++/contrib/test_suite/mpi2c++_test.h
patching file MPI-2-C++/contrib/test_suite/range.cc
patching file MPI-2-C++/contrib/test_suite/signal.cc
patching file configure
patching file configure.in
patching file examples/basic/hello++.cc
patching file examples/perftest/config/confdb/aclocal_cxx.m4
patching file examples/test/command/runtests.in
patching file installtest/hello++.cc
patching file src/cxx/test/basic.cxx
patching file src/cxx/test/errtest.cxx
patching file src/cxx/test/send1.cxx
% export CC=gcc CXX=g++ F77=gfortran F90=gfortran F77_GETARGDECL=" "
% ./configure --with-device=ch_gen2 --with-arch=LINUX -prefix=/gscratch1/jgatenc/mpi/mvapich/1.1 --enable-romio --enable-f77 --enable-sharedlib --enable-cxx -lib="-L/usr/ofed/lib64 -libverbs -libumad -ldl" 2>&1 | tee ../mvapich-build
...
checking whether selected C++ compiler can compile iostream... yes

Include C++ bindings for MPI from http://www.osl.iu.edu/research/mpi2c++
Send bug reports about the C++ to mpi2cpp-devel@osl.iu.edu
...
% make
...
gcc -DHAVE_CONFIG_H -I.
-I/gscratch1/jgatenc/mvapich-1.1/mpid/ch_gen2 -I/gscratch1/jgatenc/mvapich-1.1/include -I/gscratch1/jgatenc/mvapich-1.1/include -I/gscratch1/jgatenc/mvapich-1.1/mpid/ch_gen2 -I/gscratch1/jgatenc/mvapich-1.1/mpid/util -DMPID_DEVICE_CODE -DHAVE_UNAME=1 -DHAVE_NETDB_H=1 -DHAVE_GETHOSTBYNAME=1 -DMPID_DEBUG_NONE -DMPID_STAT_NONE -fPIC -DHAVE_MPICHCONF_H -D_GNU_SOURCE -I/gscratch1/jgatenc/mvapich-1.1 -I/gscratch1/jgatenc/mvapich-1.1/mpid/ch_gen2 -I. -c viainit.c
In file included from ibverbs_header.h:30,
                 from viapacket.h:40,
                 from req.h:48,
                 from mpid.h:226,
                 from viadev.h:30,
                 from viainit.c:52:
via64.h:51:2: error: #error Either _IA32_ or _IA64_ or _X86_64_ or _EM64T_ or _PPC64_ must be defined
via64.h:100:2: error: #error Either _IA32_ or _IA64_ or _X86_64_ or _EM64T_ or _PPC64_ must be defined.
In file included from viainit.c:57:
mpid_smpi.h:66:2: error: #error "No architecture defined !!"
In file included from viainit.c:57:
mpid_smpi.h:203: error: 'SMP_SEND_BUF_SIZE' undeclared here (not in a function)
make[3]: *** [viainit.o] Error 1
Exit status from make was 2
make[2]: *** [mpilib] Error 1
make[1]: *** [mpi-modules] Error 2
make: *** [mpi] Error 2

Thank you,

Jonathan

_____________________________________________
From: Jonathan Perkins [mailto:perkinjo@cse.ohio-state.edu]
Sent: Thursday, June 18, 2009 1:52 PM
To: Atencio, Jonathan Gerald
Cc: mvapich-discuss@cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Building mvapich with GNU GCC 4.3 or higher.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090618/18a0bef0/attachment.html

From ce107 at MIT.EDU Thu Jun 18 17:51:07 2009
From: ce107 at MIT.EDU (Constantinos Evangelinos)
Date: Thu Jun 18 23:47:02 2009
Subject: [mvapich-discuss] MVAPICH2 1.4rc1 with CR on RHEL4 AS
Message-ID: <200906181751.07726.ce107@mit.edu>

Another problem that I'm facing with MVAPICH2 1.4rc1 on RHEL4 Intel compilers 10.1.
I'm trying to build it (without Limic2) with support for CR. I've
installed the latest BLCR rpms etc.

This is my configure line:
./configure --with-rdma=gen2 --enable-blcr --enable-romio --with-file-system=ufs+pvfs2
CPPFLAGS="-I/opt/pvfs-2.8.1/include" LDFLAGS=-L/opt/pvfs-2.8.1/lib
LIBS="-lpvfs2 -lcr -pthread" MPICH2LIB_CFLAGS="-xT -fPIC -D_GNU_SOURCE"
MPICH2LIB_CXXFLAGS="-xT -fPIC" MPICH2LIB_FFLAGS="-xT -fPIC"
MPICH2LIB_F90FLAGS="-xT -fPIC" CC=icc CXX=icpc F77=ifort
F90=ifort --prefix=/opt/mvapich2-1.4rc1/icc

In successfully building MVAPICH2 1.2 on the very same platform with the same
compilers I discovered I had to (a) add a -lcr to LIBS and (b) add a -pthread
to LIBS (-lpthread will not do). I've also had to add -D_GNU_SOURCE as
already discussed here.

However I hit a snag:

make[8]: Entering directory
`/root/build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2'
icc -DHAVE_CONFIG_H -I. -I. -I/root/build/mvapich2-1.4rc1/src/include -I../../../../../../include -DNDEBUG -O2 -xT -fPIC -D_GNU_SOURCE -D_GNU_SOURCE -I/opt/pvfs-2.8.1/include -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/include -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/include -I/root/build/mvapich2-1.4rc1/src/mpid/common/datatype -I/root/build/mvapich2-1.4rc1/src/mpid/common/datatype -I/root/build/mvapich2-1.4rc1/src/mpid/common/locks -I/root/build/mvapich2-1.4rc1/src/mpid/common/locks -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 -I/root/build/mvapich2-1.4rc1/src/mpid/common/locks -I/root/build/mvapich2-1.4rc1/src/mpid/common/locks -c cr.c
cr.c(277): error: struct "_pthread_rwlock_t" has no field "__data"
     if (MPICR_cs_lock.__data.__writer == syscall(SYS_gettid))
                       ^

cr.c(289): error: struct "_pthread_rwlock_t" has no field "__data"
     if (MPICR_cs_lock.__data.__writer ==
syscall(SYS_gettid))
                       ^

cr.c(963): warning #167: argument of type "struct _IO_FILE *" is incompatible
with parameter of type "const char *"
         MPIU_Error_printf(stderr, "rdma_open_hca failed\n");
                           ^

cr.c(1005): warning #167: argument of type "void *" is incompatible with
parameter of type "void *(*)(void *)"
                 (void*) async_thread,
                 ^

cr.c(924): warning #589: transfer of control bypasses initialization of:
           variable "pg" (declared at line 938)
           variable "pg_rank" (declared at line 939)
           variable "pg_size" (declared at line 940)
           variable "ud_qpn_all" (declared at line 942)
           variable "lid_all" (declared at line 943)
           variable "vc" (declared at line 950)
           variable "i" (declared at line 951)
       MPIU_ERR_SETFATALANDJUMP1(
       ^

cr.c(1268): warning #144: a value of type "MPIDI_CH3_PktGeneric_t *" cannot be
used to initialize an entity of type "MPIDI_CH3_Pkt_t *"
                 MPIDI_CH3_Pkt_t *upkt = &(req->dev.pending_pkt);
                                         ^

compilation aborted for cr.c (code 2)

at which point I cannot proceed any further. Any ideas on what can be done?

Also in the past there was a performance penalty for shared memory operations
when using CR - am I to understand that is not the case anymore?

Thanks in advance,

Constantinos
--
Dr. Constantinos Evangelinos                    Room 54-1518, EAPS/MIT
Earth, Atmospheric and Planetary Sciences       77 Massachusetts Avenue
Massachusetts Institute of Technology           Cambridge, MA 02139
+1-617-324-3386/+1-617-253-4464 (fax)           USA

From perkinjo at cse.ohio-state.edu Fri Jun 19 08:04:23 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Fri Jun 19 08:04:45 2009
Subject: [mvapich-discuss] Building mvapich with GNU GCC 4.3 or higher.
In-Reply-To: <6DBFA9068040C94484B97A16CF47C39A7E3FED2849@ES03SNLNT.srn.sandia.gov> References: <6DBFA9068040C94484B97A16CF47C39A7E3FAFADC9@ES03SNLNT.srn.sandia.gov> <20090618195222.GD3073@cse.ohio-state.edu> <6DBFA9068040C94484B97A16CF47C39A7E3FED2849@ES03SNLNT.srn.sandia.gov> Message-ID: <20090619120423.GC3084@cse.ohio-state.edu> On Thu, Jun 18, 2009 at 03:53:33PM -0600, Atencio, Jonathan Gerald wrote: > Hello Jonathan, > > I started from a clean mvapich 1.1 directory. Patched using c++.patch. Configured, notice that C++ bindings would be built. However, when I make, the first compile line errors out. Please see the following: > > % cd mvapich-1.1 > % patch -p1 < c++.patch > patching file MPI-2-C++/contrib/examples/chapter_10_mpi2.cc > patching file MPI-2-C++/contrib/examples/hello_world.cc > patching file MPI-2-C++/contrib/examples/pi.cc > patching file MPI-2-C++/contrib/examples/ring.cc > patching file MPI-2-C++/contrib/examples/topology.cc > patching file MPI-2-C++/contrib/examples/user_bcast.cc > patching file MPI-2-C++/contrib/test_suite/cancel.cc > patching file MPI-2-C++/contrib/test_suite/errhandler.cc > patching file MPI-2-C++/contrib/test_suite/messages.cc > patching file MPI-2-C++/contrib/test_suite/mpi2c++_test.cc > patching file MPI-2-C++/contrib/test_suite/mpi2c++_test.h > patching file MPI-2-C++/contrib/test_suite/range.cc > patching file MPI-2-C++/contrib/test_suite/signal.cc > patching file configure > patching file configure.in > patching file examples/basic/hello++.cc > patching file examples/perftest/config/confdb/aclocal_cxx.m4 > patching file examples/test/command/runtests.in > patching file installtest/hello++.cc > patching file src/cxx/test/basic.cxx > patching file src/cxx/test/errtest.cxx > patching file src/cxx/test/send1.cxx > % export CC=gcc CXX=g++ F77=gfortran F90=gfortran F77_GETARGDECL=" " > % ./configure --with-device=ch_gen2 --with-arch=LINUX -prefix=/gscratch1/jgatenc/mpi/mvapich/1.1 --enable-romio --enable-f77 
--enable-sharedlib --enable-cxx -lib="-L/usr/ofed/lib64 -libverbs -libumad -ldl" 2>&1 | tee ../mvapich-build
> ...
> checking whether selected C++ compiler can compile iostream... yes
>
> Include C++ bindings for MPI from http://www.osl.iu.edu/research/mpi2c++
> Send bug reports about the C++ to mpi2cpp-devel@osl.iu.edu
> ...
> % make
> ...
> gcc -DHAVE_CONFIG_H -I. -I/gscratch1/jgatenc/mvapich-1.1/mpid/ch_gen2 -I/gscratch1/jgatenc/mvapich-1.1/include -I/gscratch1/jgatenc/mvapich-1.1/include -I/gscratch1/jgatenc/mvapich-1.1/mpid/ch_gen2 -I/gscratch1/jgatenc/mvapich-1.1/mpid/util -DMPID_DEVICE_CODE -DHAVE_UNAME=1 -DHAVE_NETDB_H=1 -DHAVE_GETHOSTBYNAME=1 -DMPID_DEBUG_NONE -DMPID_STAT_NONE -fPIC -DHAVE_MPICHCONF_H -D_GNU_SOURCE -I/gscratch1/jgatenc/mvapich-1.1 -I/gscratch1/jgatenc/mvapich-1.1/mpid/ch_gen2 -I. -c viainit.c
> In file included from ibverbs_header.h:30,
>                  from viapacket.h:40,
>                  from req.h:48,
>                  from mpid.h:226,
>                  from viadev.h:30,
>                  from viainit.c:52:
> via64.h:51:2: error: #error Either _IA32_ or _IA64_ or _X86_64_ or _EM64T_ or _PPC64_ must be defined
> via64.h:100:2: error: #error Either _IA32_ or _IA64_ or _X86_64_ or _EM64T_ or _PPC64_ must be defined.
> In file included from viainit.c:57:
> mpid_smpi.h:66:2: error: #error "No architecture defined !!"
> In file included from viainit.c:57:
> mpid_smpi.h:203: error: 'SMP_SEND_BUF_SIZE' undeclared here (not in a function)
> make[3]: *** [viainit.o] Error 1
> Exit status from make was 2
> make[2]: *** [mpilib] Error 1
> make[1]: *** [mpi-modules] Error 2
> make: *** [mpi] Error 2

It's weird that the architecture isn't defined. I don't see any changes
that would affect this detection and I'm a bit unsure why this didn't
show up initially with the earlier version of the gnu compilers. Can
you check whether this patched version builds with the older compilers?

Also, I'd like for you to send me the output of 'uname -a' and 'cat
/proc/cpuinfo'. Thanks.
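
The #error text quoted above names the macros that via64.h checks for. As a rough sketch only (not part of MVAPICH), the shell function below maps the output of `uname -m` to one of those names. The function name `suggest_arch_define` and the mapping itself are assumptions made for illustration; in particular, `uname -m` cannot tell _X86_64_ from _EM64T_, so the sketch always reports _X86_64_ on a 64-bit x86 host.

```shell
#!/bin/sh
# Hypothetical helper: guess which architecture macro from the via64.h
# #error message matches this host. The mapping is an assumption, not
# taken from the MVAPICH build scripts.
suggest_arch_define() {
    case "$(uname -m)" in
        x86_64|amd64) echo "_X86_64_" ;;   # could also be _EM64T_ on Intel parts
        i?86)         echo "_IA32_" ;;
        ia64)         echo "_IA64_" ;;
        ppc64*)       echo "_PPC64_" ;;
        *)            echo "UNKNOWN" ;;    # not one of the names in the #error text
    esac
}

arch_define=$(suggest_arch_define)
echo "machine: $(uname -m) -> ${arch_define}"
```

Whether passing the reported name to the compiler (for example as a -D flag) is the right fix for this particular build is not established in the thread; the sketch only shows which of the listed macros describes the host.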
> > > Thank you, > > Jonathan > > _____________________________________________ > From: Jonathan Perkins [mailto:perkinjo@cse.ohio-state.edu] > Sent: Thursday, June 18, 2009 1:52 PM > To: Atencio, Jonathan Gerald > Cc: mvapich-discuss@cse.ohio-state.edu > Subject: Re: [mvapich-discuss] Building mvapich with GNU GCC 4.3 or higher. > > > -- Jonathan Perkins http://www.cse.ohio-state.edu/~perkinjo -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090619/81ef51a9/attachment-0001.bin From jgatenc at sandia.gov Fri Jun 19 10:21:10 2009 From: jgatenc at sandia.gov (Atencio, Jonathan Gerald) Date: Fri Jun 19 10:21:44 2009 Subject: [mvapich-discuss] Building mvapich with GNU GCC 4.3 or higher. In-Reply-To: <20090619120423.GC3084@cse.ohio-state.edu> References: <6DBFA9068040C94484B97A16CF47C39A7E3FAFADC9@ES03SNLNT.srn.sandia.gov> <20090618195222.GD3073@cse.ohio-state.edu> <6DBFA9068040C94484B97A16CF47C39A7E3FED2849@ES03SNLNT.srn.sandia.gov>, <20090619120423.GC3084@cse.ohio-state.edu> Message-ID: <6DBFA9068040C94484B97A16CF47C39A7E3FAFADCC@ES03SNLNT.srn.sandia.gov> >________________________________________ >From: Jonathan Perkins [perkinjo@cse.ohio-state.edu] >Sent: Friday, June 19, 2009 6:04 AM >To: Atencio, Jonathan Gerald >Cc: mvapich-discuss@cse.ohio-state.edu >Subject: Re: [mvapich-discuss] Building mvapich with GNU GCC 4.3 or higher. > >On Thu, Jun 18, 2009 at 03:53:33PM -0600, Atencio, Jonathan Gerald wrote: >> Hello Jonathan, >> >> I started from a clean mvapich 1.1 directory. Patched using c++.patch. Configured, notice that C++ bindings would be built. However, when I make, the first compile line errors out. 
Please see the following: >> >> % cd mvapich-1.1 >> % patch -p1 < c++.patch >> patching file MPI-2-C++/contrib/examples/chapter_10_mpi2.cc >> patching file MPI-2-C++/contrib/examples/hello_world.cc >> patching file MPI-2-C++/contrib/examples/pi.cc >> patching file MPI-2-C++/contrib/examples/ring.cc >> patching file MPI-2-C++/contrib/examples/topology.cc >> patching file MPI-2-C++/contrib/examples/user_bcast.cc >> patching file MPI-2-C++/contrib/test_suite/cancel.cc >> patching file MPI-2-C++/contrib/test_suite/errhandler.cc >> patching file MPI-2-C++/contrib/test_suite/messages.cc >> patching file MPI-2-C++/contrib/test_suite/mpi2c++_test.cc >> patching file MPI-2-C++/contrib/test_suite/mpi2c++_test.h >> patching file MPI-2-C++/contrib/test_suite/range.cc >> patching file MPI-2-C++/contrib/test_suite/signal.cc >> patching file configure >> patching file configure.in >> patching file examples/basic/hello++.cc >> patching file examples/perftest/config/confdb/aclocal_cxx.m4 >> patching file examples/test/command/runtests.in >> patching file installtest/hello++.cc >> patching file src/cxx/test/basic.cxx >> patching file src/cxx/test/errtest.cxx >> patching file src/cxx/test/send1.cxx >> % export CC=gcc CXX=g++ F77=gfortran F90=gfortran F77_GETARGDECL=" " >> % ./configure --with-device=ch_gen2 --with-arch=LINUX -prefix=/gscratch1/jgatenc/mpi/mvapich/1.1 --enable-romio --enable-f77 --enable-sharedlib --enable-cxx -lib="-L/usr/ofed/lib64 -libverbs -libumad -ldl" 2>&1 | tee ../mvapich-build >> ... >> checking whether selected C++ compiler can compile iostream... yes >> >> Include C++ bindings for MPI from http://www.osl.iu.edu/research/mpi2c++ >> Send bug reports about the C++ to mpi2cpp-devel@osl.iu.edu >> ... >> % make >> ... >> gcc -DHAVE_CONFIG_H -I. 
-I/gscratch1/jgatenc/mvapich-1.1/mpid/ch_gen2 -I/gscratch1/jgatenc/mvapich-1.1/include -I/gscratch1/jgatenc/mvapich-1.1/include -I/gscratch1/jgatenc/mvapich-1.1/mpid/ch_gen2 -I/gscratch1/jgatenc/mvapich-1.1/mpid/util -DMPID_DEVICE_CODE -DHAVE_UNAME=1 -DHAVE_NETDB_H=1 -DHAVE_GETHOSTBYNAME=1 -DMPID_DEBUG_NONE -DMPID_STAT_NONE -fPIC -DHAVE_MPICHCONF_H -D_GNU_SOURCE -I/gscratch1/jgatenc/mvapich-1.1 -I/gscratch1/jgatenc/mvapich-1.1/mpid/ch_gen2 -I. -c viainit.c
>> In file included from ibverbs_header.h:30,
>>                  from viapacket.h:40,
>>                  from req.h:48,
>>                  from mpid.h:226,
>>                  from viadev.h:30,
>>                  from viainit.c:52:
>> via64.h:51:2: error: #error Either _IA32_ or _IA64_ or _X86_64_ or _EM64T_ or _PPC64_ must be defined
>> via64.h:100:2: error: #error Either _IA32_ or _IA64_ or _X86_64_ or _EM64T_ or _PPC64_ must be defined.
>> In file included from viainit.c:57:
>> mpid_smpi.h:66:2: error: #error "No architecture defined !!"
>> In file included from viainit.c:57:
>> mpid_smpi.h:203: error: 'SMP_SEND_BUF_SIZE' undeclared here (not in a function)
>> make[3]: *** [viainit.o] Error 1
>> Exit status from make was 2
>> make[2]: *** [mpilib] Error 1
>> make[1]: *** [mpi-modules] Error 2
>> make: *** [mpi] Error 2
>
>It's weird that the architecture isn't defined. I don't see any changes
>that would affect this detection and I'm a bit unsure why this didn't
>show up initially with the earlier version of the gnu compilers. Can
>you check whether this patched version builds with the older compilers?
>
>Also, I'd like for you to send me the output of 'uname -a' and 'cat
>/proc/cpuinfo'. Thanks.
>

I get the same error using gcc 4.1.2 now.
% uname -a
Linux glory144 2.6.18-63chaos #1 SMP Fri Dec 19 15:37:16 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
% cat /proc/cpu
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : Quad-Core AMD Opteron(tm) Processor 8354
stepping        : 3
cpu MHz         : 1100.000
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
bogomips        : 4424.25
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate [8]
,,,
processor       : 15
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : Quad-Core AMD Opteron(tm) Processor 8354
stepping        : 3
cpu MHz         : 1100.000
cache size      : 512 KB
physical id     : 3
siblings        : 4
core id         : 3
cpu cores       : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
bogomips        : 4422.72
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate [8]

>>
>>
>> Thank you,
>>
>> Jonathan
>>
>> _____________________________________________
>> From: Jonathan Perkins [mailto:perkinjo@cse.ohio-state.edu]
>> Sent: Thursday, June 18, 2009 1:52 PM
>> To: Atencio, Jonathan Gerald
>> Cc: mvapich-discuss@cse.ohio-state.edu
>> Subject: Re: [mvapich-discuss] Building mvapich with GNU GCC 4.3 or higher.
>>
>>
>>
>
>--
>Jonathan Perkins
>http://www.cse.ohio-state.edu/~perkinjo
>

From gopalakk at cse.ohio-state.edu Fri Jun 19 10:22:03 2009
From: gopalakk at cse.ohio-state.edu (Karthik Gopalakrishnan)
Date: Fri Jun 19 10:22:25 2009
Subject: [mvapich-discuss] MVAPICH2 1.4rc1 with CR on RHEL4 AS
In-Reply-To: <200906181751.07726.ce107@mit.edu>
References: <200906181751.07726.ce107@mit.edu>
Message-ID: <92eddfb50906190722g249898fdo1142df70a5f1b09@mail.gmail.com>

Hi Constantinos.

2009/6/18 Constantinos Evangelinos :
> Another problem that I'm facing with MVAPICH2 1.4rc1 on RHEL4 Intel compilers
> 10.1. I'm trying to build it (without Limic2) with support for CR. I've
> installed the latest BLCR rpms etc.
>
> This is my configure line:
> ./configure --with-rdma=gen2 --enable-blcr --enable-romio --with-file-system=ufs+pvfs2
> CPPFLAGS="-I/opt/pvfs-2.8.1/include" LDFLAGS=-L/opt/pvfs-2.8.1/lib
> LIBS="-lpvfs2 -lcr -pthread" MPICH2LIB_CFLAGS="-xT -fPIC -D_GNU_SOURCE"
> MPICH2LIB_CXXFLAGS="-xT -fPIC" MPICH2LIB_FFLAGS="-xT -fPIC"
> MPICH2LIB_F90FLAGS="-xT -fPIC" CC=icc CXX=icpc F77=ifort
> F90=ifort --prefix=/opt/mvapich2-1.4rc1/icc
>
> In successfully building MVAPICH2 1.2 on the very same platform with the same
> compilers I discovered I had to (a) add a -lcr to LIBS and (b) add a -pthread
> to LIBS (-lpthread will not do). I've also had to add -D_GNU_SOURCE as
> already discussed here.
>
> However I hit a snag:
>
> make[8]: Entering directory
> `/root/build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2'
> icc -DHAVE_CONFIG_H -I. -I. -I/root/build/mvapich2-1.4rc1/src/include -I../../../../../../include -DNDEBUG -O2 -xT -fPIC -D_GNU_SOURCE -D_GNU_SOURCE -I/opt/pvfs-2.8.1/include -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/include -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/include -I/root/build/mvapich2-1.4rc1/src/mpid/common/datatype -I/root/build/mvapich2-1.4rc1/src/mpid/common/datatype -I/root/build/mvapich2-1.4rc1/src/mpid/common/locks -I/root/build/mvapich2-1.4rc1/src/mpid/common/locks -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 -I/root/build/mvapich2-1.4rc1/src/mpid/common/locks -I/root/build/mvapich2-1.4rc1/src/mpid/common/locks -c
> cr.c
> cr.c(277): error: struct "_pthread_rwlock_t" has no field "__data"
>      if (MPICR_cs_lock.__data.__writer == syscall(SYS_gettid))
>                        ^
>
> cr.c(289): error: struct "_pthread_rwlock_t" has no field "__data"
>      if (MPICR_cs_lock.__data.__writer == syscall(SYS_gettid))
>                        ^
>
> cr.c(963): warning #167: argument of type "struct _IO_FILE *" is incompatible
> with parameter of type "const char *"
>          MPIU_Error_printf(stderr, "rdma_open_hca failed\n");
>                            ^
>
> cr.c(1005): warning #167: argument of type "void *" is incompatible with
> parameter of type "void *(*)(void *)"
>                  (void*) async_thread,
>                  ^
>
> cr.c(924): warning #589: transfer of control bypasses initialization of:
>            variable "pg" (declared at line 938)
>            variable "pg_rank" (declared at line 939)
>            variable "pg_size" (declared at line 940)
>            variable "ud_qpn_all" (declared at line 942)
>            variable "lid_all" (declared at line 943)
>            variable "vc" (declared at line 950)
>            variable "i" (declared at line 951)
>        MPIU_ERR_SETFATALANDJUMP1(
>        ^
>
> cr.c(1268): warning #144: a value of type "MPIDI_CH3_PktGeneric_t *" cannot be
> used to initialize an entity of type "MPIDI_CH3_Pkt_t *"
>                  MPIDI_CH3_Pkt_t *upkt = &(req->dev.pending_pkt);
>                                          ^
>
> compilation aborted for cr.c (code 2)
>
> at which point I cannot proceed any further. Any ideas on what can be done?
>
Thank You for your report. We are working on a fix for the compilation
error. I will provide you with a patch ASAP.

> Also in the past there was a performance penalty for shared memory operations
> when using CR - am I to understand that is not the case anymore?
>
Your understanding is correct. CR of the intra-node shared memory
channel is supported as of MVAPICH2 1.2. So there is no longer a
performance penalty.

Thanks & Regards,
Karthik

> Thanks in advance,
>
> Constantinos
> --
> Dr. Constantinos Evangelinos                    Room 54-1518, EAPS/MIT
> Earth, Atmospheric and Planetary Sciences       77 Massachusetts Avenue
> Massachusetts Institute of Technology           Cambridge, MA 02139
> +1-617-324-3386/+1-617-253-4464 (fax)           USA
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From perkinjo at cse.ohio-state.edu Fri Jun 19 10:43:42 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Fri Jun 19 10:44:05 2009
Subject: [mvapich-discuss] Building mvapich with GNU GCC 4.3 or higher.
In-Reply-To: <6DBFA9068040C94484B97A16CF47C39A7E3FAFADCC@ES03SNLNT.srn.sandia.gov> References: <6DBFA9068040C94484B97A16CF47C39A7E3FAFADC9@ES03SNLNT.srn.sandia.gov> <20090618195222.GD3073@cse.ohio-state.edu> <6DBFA9068040C94484B97A16CF47C39A7E3FAFADCC@ES03SNLNT.srn.sandia.gov> Message-ID: <20090619144342.GF3066@cse.ohio-state.edu> On Fri, Jun 19, 2009 at 08:21:10AM -0600, Atencio, Jonathan Gerald wrote: > >________________________________________ > >From: Jonathan Perkins [perkinjo@cse.ohio-state.edu] > >Sent: Friday, June 19, 2009 6:04 AM > >To: Atencio, Jonathan Gerald > >Cc: mvapich-discuss@cse.ohio-state.edu > >Subject: Re: [mvapich-discuss] Building mvapich with GNU GCC 4.3 or higher. > > > >On Thu, Jun 18, 2009 at 03:53:33PM -0600, Atencio, Jonathan Gerald wrote: > >> Hello Jonathan, > >> > >> I started from a clean mvapich 1.1 directory. Patched using c++.patch. Configured, notice that C++ bindings would be built. However, when I make, the first compile line errors out. Please see the following: > >> > >> % cd mvapich-1.1 > >> % patch -p1 < c++.patch > >> patching file MPI-2-C++/contrib/examples/chapter_10_mpi2.cc > >> patching file MPI-2-C++/contrib/examples/hello_world.cc > >> patching file MPI-2-C++/contrib/examples/pi.cc > >> patching file MPI-2-C++/contrib/examples/ring.cc > >> patching file MPI-2-C++/contrib/examples/topology.cc > >> patching file MPI-2-C++/contrib/examples/user_bcast.cc > >> patching file MPI-2-C++/contrib/test_suite/cancel.cc > >> patching file MPI-2-C++/contrib/test_suite/errhandler.cc > >> patching file MPI-2-C++/contrib/test_suite/messages.cc > >> patching file MPI-2-C++/contrib/test_suite/mpi2c++_test.cc > >> patching file MPI-2-C++/contrib/test_suite/mpi2c++_test.h > >> patching file MPI-2-C++/contrib/test_suite/range.cc > >> patching file MPI-2-C++/contrib/test_suite/signal.cc > >> patching file configure > >> patching file configure.in > >> patching file examples/basic/hello++.cc > >> patching file 
examples/perftest/config/confdb/aclocal_cxx.m4 > >> patching file examples/test/command/runtests.in > >> patching file installtest/hello++.cc > >> patching file src/cxx/test/basic.cxx > >> patching file src/cxx/test/errtest.cxx > >> patching file src/cxx/test/send1.cxx > >> % export CC=gcc CXX=g++ F77=gfortran F90=gfortran F77_GETARGDECL=" " > >> % ./configure --with-device=ch_gen2 --with-arch=LINUX -prefix=/gscratch1/jgatenc/mpi/mvapich/1.1 --enable-romio --enable-f77 --enable-sharedlib --enable-cxx -lib="-L/usr/ofed/lib64 -libverbs -libumad -ldl" 2>&1 | tee ../mvapich-build > >> ... > >> checking whether selected C++ compiler can compile iostream... yes > >> > >> Include C++ bindings for MPI from http://www.osl.iu.edu/research/mpi2c++ > >> Send bug reports about the C++ to mpi2cpp-devel@osl.iu.edu > >> ... > >> % make > >> ... > >> gcc -DHAVE_CONFIG_H -I. -I/gscratch1/jgatenc/mvapich-1.1/mpid/ch_gen2 -I/gscratch1/jgatenc/mvapich-1.1/include -I/gscratch1/jgatenc/mvapich-1.1/include -I/gscratch1/jgatenc/mvapich-1.1/mpid/ch_gen2 -I/gscratch1/jgatenc/mvapich-1.1/mpid/util -DMPID_DEVICE_CODE -DHAVE_UNAME=1 -DHAVE_NETDB_H=1 -DHAVE_GETHOSTBYNAME=1 -DMPID_DEBUG_NONE -DMPID_STAT_NONE -fPIC -DHAVE_MPICHCONF_H -D_GNU_SOURCE -I/gscratch1/jgatenc/mvapich-1.1 -I/gscratch1/jgatenc/mvapich-1.1/mpid/ch_gen2 -I. -c viainit.c > >> In file included from ibverbs_header.h:30, > >> from viapacket.h:40, > >> from req.h:48, > >> from mpid.h:226, > >> from viadev.h:30, > >> from viainit.c:52: > >> via64.h:51:2: error: #error Either _IA32_ or _IA64_ or _X86_64_ or _EM64T_ or _PPC64_ must be defined > >> via64.h:100:2: error: #error Either _IA32_ or _IA64_ or _X86_64_ or _EM64T_ or _PPC64_ must be defined. > >> In file included from viainit.c:57: > >> mpid_smpi.h:66:2: error: #error "No architecture defined !!" 
> >> In file included from viainit.c:57:
> >> mpid_smpi.h:203: error: 'SMP_SEND_BUF_SIZE' undeclared here (not in a function)
> >> make[3]: *** [viainit.o] Error 1
> >> Exit status from make was 2
> >> make[2]: *** [mpilib] Error 1
> >> make[1]: *** [mpi-modules] Error 2
> >> make: *** [mpi] Error 2
> >
> >It's weird that the architecture isn't defined. I don't see any changes
> >that would affect this detection and I'm a bit unsure why this didn't
> >show up initially with the earlier version of the gnu compilers. Can
> >you check whether this patched version builds with the older compilers?
> >
> >Also, I'd like for you to send me the output of 'uname -a' and 'cat
> >/proc/cpuinfo'. Thanks.
> >
>
> I get the same error using gcc 4.1.2 now.
>
> % uname -a
> Linux glory144 2.6.18-63chaos #1 SMP Fri Dec 19 15:37:16 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
> % cat /proc/cpu
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 16
> model : 2
> model name : Quad-Core AMD Opteron(tm) Processor 8354

Thanks for the information. I'm about to try this out on a system with
a more similar model name for its cpu. I'll get back to you shortly.

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090619/92e4a5be/attachment.bin

From perkinjo at cse.ohio-state.edu Fri Jun 19 11:01:54 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Fri Jun 19 11:02:17 2009
Subject: [mvapich-discuss] Building mvapich with GNU GCC 4.3 or higher.
In-Reply-To: <20090619144342.GF3066@cse.ohio-state.edu> References: <6DBFA9068040C94484B97A16CF47C39A7E3FAFADC9@ES03SNLNT.srn.sandia.gov> <20090618195222.GD3073@cse.ohio-state.edu> <6DBFA9068040C94484B97A16CF47C39A7E3FAFADCC@ES03SNLNT.srn.sandia.gov> <20090619144342.GF3066@cse.ohio-state.edu> Message-ID: <20090619150154.GH3066@cse.ohio-state.edu> On Fri, Jun 19, 2009 at 10:43:42AM -0400, Jonathan Perkins wrote: > On Fri, Jun 19, 2009 at 08:21:10AM -0600, Atencio, Jonathan Gerald wrote: > > >________________________________________ > > >From: Jonathan Perkins [perkinjo@cse.ohio-state.edu] > > >Sent: Friday, June 19, 2009 6:04 AM > > >To: Atencio, Jonathan Gerald > > >Cc: mvapich-discuss@cse.ohio-state.edu > > >Subject: Re: [mvapich-discuss] Building mvapich with GNU GCC 4.3 or higher. > > > > > >On Thu, Jun 18, 2009 at 03:53:33PM -0600, Atencio, Jonathan Gerald wrote: > > >> Hello Jonathan, > > >> > > >> I started from a clean mvapich 1.1 directory. Patched using c++.patch. Configured, notice that C++ bindings would be built. However, when I make, the first compile line errors out. 
Please see the following: > > >> > > >> % cd mvapich-1.1 > > >> % patch -p1 < c++.patch > > >> patching file MPI-2-C++/contrib/examples/chapter_10_mpi2.cc > > >> patching file MPI-2-C++/contrib/examples/hello_world.cc > > >> patching file MPI-2-C++/contrib/examples/pi.cc > > >> patching file MPI-2-C++/contrib/examples/ring.cc > > >> patching file MPI-2-C++/contrib/examples/topology.cc > > >> patching file MPI-2-C++/contrib/examples/user_bcast.cc > > >> patching file MPI-2-C++/contrib/test_suite/cancel.cc > > >> patching file MPI-2-C++/contrib/test_suite/errhandler.cc > > >> patching file MPI-2-C++/contrib/test_suite/messages.cc > > >> patching file MPI-2-C++/contrib/test_suite/mpi2c++_test.cc > > >> patching file MPI-2-C++/contrib/test_suite/mpi2c++_test.h > > >> patching file MPI-2-C++/contrib/test_suite/range.cc > > >> patching file MPI-2-C++/contrib/test_suite/signal.cc > > >> patching file configure > > >> patching file configure.in > > >> patching file examples/basic/hello++.cc > > >> patching file examples/perftest/config/confdb/aclocal_cxx.m4 > > >> patching file examples/test/command/runtests.in > > >> patching file installtest/hello++.cc > > >> patching file src/cxx/test/basic.cxx > > >> patching file src/cxx/test/errtest.cxx > > >> patching file src/cxx/test/send1.cxx > > >> % export CC=gcc CXX=g++ F77=gfortran F90=gfortran F77_GETARGDECL=" " > > >> % ./configure --with-device=ch_gen2 --with-arch=LINUX -prefix=/gscratch1/jgatenc/mpi/mvapich/1.1 --enable-romio --enable-f77 --enable-sharedlib --enable-cxx -lib="-L/usr/ofed/lib64 -libverbs -libumad -ldl" 2>&1 | tee ../mvapich-build > > >> ... > > >> checking whether selected C++ compiler can compile iostream... yes > > >> > > >> Include C++ bindings for MPI from http://www.osl.iu.edu/research/mpi2c++ > > >> Send bug reports about the C++ to mpi2cpp-devel@osl.iu.edu > > >> ... > > >> % make > > >> ... > > >> gcc -DHAVE_CONFIG_H -I. 
-I/gscratch1/jgatenc/mvapich-1.1/mpid/ch_gen2 -I/gscratch1/jgatenc/mvapich-1.1/include -I/gscratch1/jgatenc/mvapich-1.1/include -I/gscratch1/jgatenc/mvapich-1.1/mpid/ch_gen2 -I/gscratch1/jgatenc/mvapich-1.1/mpid/util -DMPID_DEVICE_CODE -DHAVE_UNAME=1 -DHAVE_NETDB_H=1 -DHAVE_GETHOSTBYNAME=1 -DMPID_DEBUG_NONE -DMPID_STAT_NONE -fPIC -DHAVE_MPICHCONF_H -D_GNU_SOURCE -I/gscratch1/jgatenc/mvapich-1.1 -I/gscratch1/jgatenc/mvapich-1.1/mpid/ch_gen2 -I. -c viainit.c > > >> In file included from ibverbs_header.h:30, > > >> from viapacket.h:40, > > >> from req.h:48, > > >> from mpid.h:226, > > >> from viadev.h:30, > > >> from viainit.c:52: > > >> via64.h:51:2: error: #error Either _IA32_ or _IA64_ or _X86_64_ or _EM64T_ or _PPC64_ must be defined > > >> via64.h:100:2: error: #error Either _IA32_ or _IA64_ or _X86_64_ or _EM64T_ or _PPC64_ must be defined. > > >> In file included from viainit.c:57: > > >> mpid_smpi.h:66:2: error: #error "No architecture defined !!" > > >> In file included from viainit.c:57: > > >> mpid_smpi.h:203: error: 'SMP_SEND_BUF_SIZE' undeclared here (not in a function) > > >> make[3]: *** [viainit.o] Error 1 > > >> Exit status from make was 2 > > >> make[2]: *** [mpilib] Error 1 > > >> make[1]: *** [mpi-modules] Error 2 > > >> make: *** [mpi] Error 2 > > > > > >It's weird that the architecture isn't defined. I don't see any changes > > >that would effect this detection and I'm a bit unsure why this didn't > > >show up initially with the earlier version of the gnu compilers. Can > > >you check whether this patched version builds with the older compilers? > > > > > >Also, I'd like for you to send me the output of 'uname -a' and 'cat > > >/proc/cpuinfo'. Thanks. > > > > > > > I get the same error using gcc 4.1,2 now. 
> > > > % uname -a > > Linux glory144 2.6.18-63chaos #1 SMP Fri Dec 19 15:37:16 EST 2008 x86_64 x86_64 x86_64 GNU/Linux > > % cat /proc/cpu > > processor : 0 > > vendor_id : AuthenticAMD > > cpu family : 16 > > model : 2 > > model name : Quad-Core AMD Opteron(tm) Processor 8354 > > Thanks for the information. I'm about to try this out on a system with > a more similar model name for its cpu. I'll get back to you shortly. The patched build works fine for me on an identical system (as far as the arch function in make.mvapich.def is concerned). Are you running make.mvapich.gen2 or trying the commands manually? There are some CFLAGS that are set by our build scripts that don't look to be set from your output above. -- Jonathan Perkins http://www.cse.ohio-state.edu/~perkinjo -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090619/f446664a/attachment-0001.bin From gopalakk at cse.ohio-state.edu Sun Jun 21 12:59:31 2009 From: gopalakk at cse.ohio-state.edu (Karthik Gopalakrishnan) Date: Sun Jun 21 12:59:52 2009 Subject: [mvapich-discuss] MVAPICH2 1.4rc1 with CR on RHEL4 AS In-Reply-To: <92eddfb50906190722g249898fdo1142df70a5f1b09@mail.gmail.com> References: <200906181751.07726.ce107@mit.edu> <92eddfb50906190722g249898fdo1142df70a5f1b09@mail.gmail.com> Message-ID: <92eddfb50906210959neaadddex44515dfa462af077@mail.gmail.com> Hi Constantinos. Please check if the attached patch fixes the CR build issue on RHEL4. Regards, Karthik On Fri, Jun 19, 2009 at 10:22 AM, Karthik Gopalakrishnan wrote: > Hi Constantinos. > > 2009/6/18 Constantinos Evangelinos : >> Another problem that I'm facing with MVAPICH2 1.4rc1 on RHEL4 Intel compilers >> 10.1. I'm trying to build it (without Limic2) with support for CR. I've >> installed the latest BLCR rpms etc. 
>> >> This is my configure line: >> ./configure --with-rdma=gen2 --enable-blcr --enable-romio --with-file-system=ufs+pvfs2 >> CPPFLAGS="-I/opt/pvfs-2.8.1/include" LDFLAGS=-L/opt/pvfs-2.8.1/lib >> LIBS="-lpvfs2 -lcr -pthread" MPICH2LIB_CFLAGS="-xT -fPIC -D_GNU_SOURCE" >> MPICH2LIB_CXXFLAGS="-xT -fPIC" MPICH2LIB_FFLAGS="-xT -fPIC" >> MPICH2LIB_F90FLAGS="-xT -fPIC" CC=icc CXX=icpc F77=ifort >> F90=ifort --prefix=/opt/mvapich2-1.4rc1/icc >> >> In successfully building MVAPICH2 1.2 on the very same platform with the same >> compilers I discovered I had to (a) add a -lcr to LIBS and (b) add a -pthread >> to LIBS (-lpthread will not do). I've also had to add -D_GNU_SOURCE as >> already discussed here. >> >> However I hit a snag: >> >> make[8]: Entering directory >> `/root/build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2' >> icc -DHAVE_CONFIG_H -I. -I. -I/root/build/mvapich2-1.4rc1/src/include -I../../../../../../include -DNDEBUG -O2 -xT -fPIC -D_GNU_SOURCE -D_GNU_SOURCE -I/opt/pvfs-2.8.1/include -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/include -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/include -I/root/build/mvapich2-1.4rc1/src/mpid/common/datatype -I/root/build/mvapich2-1.4rc1/src/mpid/common/datatype -I/root/build/mvapich2-1.4rc1/src/mpid/common/locks -I/root/build/mvapich2-1.4rc1/src/mpid/common/locks -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/include -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 -I/root/build/mvapich2-1.4rc1/src/mpid/ch3/channels/mrail/src/gen2 -I/root/build/mvapich2-1.4rc1/src/mpid/common/locks -I/root/build/mvapich2-1.4rc1/src/mpid/common/locks -c >> cr.c >> cr.c(277): error: struct "_pthread_rwlock_t" has no field "__data" >> ? ? ?if (MPICR_cs_lock.__data.__writer == syscall(SYS_gettid)) >> ? ? ? ? ? ? ? ? ? ? ? ?^ >> >> cr.c(289): error: struct "_pthread_rwlock_t" has no field "__data" >> ? ? 
?if (MPICR_cs_lock.__data.__writer == syscall(SYS_gettid)) >> ? ? ? ? ? ? ? ? ? ? ? ?^ >> >> cr.c(963): warning #167: argument of type "struct _IO_FILE *" is incompatible >> with parameter of type "const char *" >> ? ? ? ? ?MPIU_Error_printf(stderr, "rdma_open_hca failed\n"); >> ? ? ? ? ? ? ? ? ? ? ? ? ? ?^ >> >> cr.c(1005): warning #167: argument of type "void *" is incompatible with >> parameter of type "void *(*)(void *)" >> ? ? ? ? ? ? ? ? ?(void*) async_thread, >> ? ? ? ? ? ? ? ? ?^ >> >> cr.c(924): warning #589: transfer of control bypasses initialization of: >> ? ? ? ? ? ?variable "pg" (declared at line 938) >> ? ? ? ? ? ?variable "pg_rank" (declared at line 939) >> ? ? ? ? ? ?variable "pg_size" (declared at line 940) >> ? ? ? ? ? ?variable "ud_qpn_all" (declared at line 942) >> ? ? ? ? ? ?variable "lid_all" (declared at line 943) >> ? ? ? ? ? ?variable "vc" (declared at line 950) >> ? ? ? ? ? ?variable "i" (declared at line 951) >> ? ? ? ?MPIU_ERR_SETFATALANDJUMP1( >> ? ? ? ?^ >> >> cr.c(1268): warning #144: a value of type "MPIDI_CH3_PktGeneric_t *" cannot be >> used to initialize an entity of type "MPIDI_CH3_Pkt_t *" >> ? ? ? ? ? ? ? ? ?MPIDI_CH3_Pkt_t *upkt = &(req->dev.pending_pkt); >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?^ >> >> compilation aborted for cr.c (code 2) >> >> at which point I cannot proceed any further. Any ideas on what can be done? >> > Thank You for your report. We are working on a fix for the compilation > error. I will provide you with a patch ASAP. > >> Also in the past there was a performance penalty for shared memory operations >> when using CR - am I to understand that is not the case anymore? >> > Your understanding is correct. CR of the intra-node shared memory > channel is supported as of MVAPICH2 1.2. So there is no longer a > performance penalty. > > Thanks & Regards, > Karthik > >> Thanks in advance, >> >> Constantinos >> -- >> Dr. Constantinos Evangelinos ? ? ? ? ? ? ? ? ? 
Room 54-1518, EAPS/MIT
>> Earth, Atmospheric and Planetary Sciences       77 Massachusetts Avenue
>> Massachusetts Institute of Technology           Cambridge, MA 02139
>> +1-617-324-3386/+1-617-253-4464 (fax)           USA
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss@cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cr_rhel4.patch
Type: application/octet-stream
Size: 3495 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090621/b9504a47/cr_rhel4.obj

From jgatenc at sandia.gov Mon Jun 22 12:41:10 2009
From: jgatenc at sandia.gov (Atencio, Jonathan Gerald)
Date: Mon Jun 22 12:41:42 2009
Subject: [mvapich-discuss] Building mvapich with GNU GCC 4.3 or higher.
In-Reply-To: <20090619150154.GH3066@cse.ohio-state.edu>
References: <6DBFA9068040C94484B97A16CF47C39A7E3FAFADC9@ES03SNLNT.srn.sandia.gov> <20090618195222.GD3073@cse.ohio-state.edu> <6DBFA9068040C94484B97A16CF47C39A7E3FAFADCC@ES03SNLNT.srn.sandia.gov> <20090619144342.GF3066@cse.ohio-state.edu> <20090619150154.GH3066@cse.ohio-state.edu>
Message-ID: <6DBFA9068040C94484B97A16CF47C39A7E40EB2F2F@ES03SNLNT.srn.sandia.gov>

Hello Jonathan,

You are correct. I was forgetting several flags when I was building manually. This patch works for me. Thank you very much for your help.

Regards,

Jonathan

_____________________________________________
From: Jonathan Perkins [mailto:perkinjo@cse.ohio-state.edu]
Sent: Friday, June 19, 2009 9:02 AM
To: Atencio, Jonathan Gerald; mvapich-discuss@cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Building mvapich with GNU GCC 4.3 or higher.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090622/912bdcc8/attachment.html

From isono at cray.com Tue Jun 23 03:59:52 2009
From: isono at cray.com (Satoshi Isono)
Date: Tue Jun 23 04:00:27 2009
Subject: [mvapich-discuss] How to keep gid status
In-Reply-To:
References: <925346A443D4E340BEB20248BAFCDBDF0B60B8CB@CFEVS1-IP.americas.cray.com>
Message-ID: <925346A443D4E340BEB20248BAFCDBDF0B7AA96A@CFEVS1-IP.americas.cray.com>

Dear Prof. Panda, Dr. Barth,

Thanks for your advices. I have edited mpirun_rsh.c directly in order to
keep the order of options on mpirun_rsh command. I show you differential
lines as below. My MVAPICH2 version is mvapich2-1.2p1.

$ diff mpirun_rsh.c mpirun_rsh.c.org
67d66
< #include <grp.h>
260d258
<     struct group *grpptr;
275,276d272
<     int sg_index;
<
279d274
<     grpptr = getgrgid(getgid());
417,456d411
<     //isono
<     for (i = aout_index; i < argc; i++) {
<         if (strchr(argv[i], '=') == NULL) {
<             sg_index = i;
<             break;
<         }
<     }
<     fprintf(stdout, "\n# INPUT PARAMETERS\n");
<     fprintf(stdout, "%15s = %d\n", "argc", argc);
<     fprintf(stdout, "%15s = %d\n", "option_index", option_index);
<     fprintf(stdout, "%15s = %d\n", "aout_index", aout_index);
<     fprintf(stdout, "%15s = %d\n", "sg_index", sg_index);
<     for (i = 0; i < argc; i++) {
<         fprintf(stdout, "%7sargv[%2d] = %s\n", " ", i, argv[i]);
<     }
<     char add_argv[argc+2][31];
<     for (i = 0; i < argc; i++) {
<         strcpy(add_argv[i], argv[i]);
<     }
<     for (i = 0; i < argc+2; i++) {
<         if (i < sg_index) {
<             argv[i]=add_argv[i];
<         } else if (i == sg_index) {
<             strcpy(argv[i], "/usr/bin/sg");
<             i++;
<             argv[i] = grpptr->gr_name;
<         } else {
<             argv[i]=add_argv[i-2];
<         }
<     }
<     argc = argc + 2;
<     fprintf(stdout, "\n# RUNNING PARAMETERS\n");
<     fprintf(stdout, "%15s = %d\n", "argc", argc);
<     fprintf(stdout, "%15s = %d\n", "option_index", option_index);
<     fprintf(stdout, "%15s = %d\n", "aout_index", aout_index);
<     fprintf(stdout, "%15s = %d\n", "sg_index", sg_index);
<     for (i = 0; i < argc; i++) {
<         fprintf(stdout, "%7sargv[%2d] = %s\n", " ", i, argv[i]);
<     }
<

And then, this is test result using new mpirun_rsh.

$ mpirun_rsh -np 4 -hostfile hostfile MV2_NUM_HCAS=2 MV2_SM_SCHEDULING=ROUND_ROBIN ./gid-mv2-itl

# INPUT PARAMETERS
           argc = 8
   option_index = 3
     aout_index = 5
       sg_index = 7
       argv[ 0] = mpirun_rsh
       argv[ 1] = -np
       argv[ 2] = 4
       argv[ 3] = -hostfile
       argv[ 4] = hostfile
       argv[ 5] = MV2_NUM_HCAS=2
       argv[ 6] = MV2_SM_SCHEDULING=ROUND_ROBIN
       argv[ 7] = ./gid-mv2-itl

# RUNNING PARAMETERS
           argc = 10
   option_index = 3
     aout_index = 5
       sg_index = 7
       argv[ 0] = mpirun_rsh
       argv[ 1] = -np
       argv[ 2] = 4
       argv[ 3] = -hostfile
       argv[ 4] = hostfile
       argv[ 5] = MV2_NUM_HCAS=2
       argv[ 6] = MV2_SM_SCHEDULING=ROUND_ROBIN
       argv[ 7] = /usr/bin/sg
       argv[ 8] = GAUSSIAN
       argv[ 9] = ./gid-mv2-itl

However, there is a problem. When we embed /usr/bin/sg command in the
line of mpirun_rsh, how can we deal with an input file? In case of using
your wrapper script, it also occurs.

I show you a simple example with test code.

1) mpirun_rsh -np 4 -hostfile hostfile ./gid3 ./data

And the following is gid3 code.

$ cat gid3.c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define MAX_DATA_SIZE 1000000
double a[MAX_DATA_SIZE];

int main(int argc,char *argv[])
{
    int rank,size,namelen;
    char name[MPI_MAX_PROCESSOR_NAME],comm[512];
    int i,ret,dsize;
    char str[80];
    FILE *fp;

    MPI_Init(&argc,&argv);

    MPI_Comm_rank(MPI_COMM_WORLD,&rank);
    MPI_Comm_size(MPI_COMM_WORLD,&size);
    MPI_Get_processor_name(name,&namelen);

    //printf("%4d/%-d: %s\n",rank,size,name);
    fp=fopen(argv[1],"r");
    for(i=0;i

> Dear all,
>
> I would like to know how to keep gid status when launching MPI
> processes. We know that, with sg command in mpirun_rsh command line, it
> is successful in this case. Can you please advise me. I show a example
> as below.
>
> Most of users belong multiple group. And accounting system is managed
> based on a group ID (GID). So, all files created from each user must be
> owned with appropriate group owner information.
>
> A problem here is that the state of GID not saved. I would show you a
> example. Could you read it, according to numbers.
>
> 1) User logins into a login node.
>
> $ id
> uid=1002(craysp) gid=1002(cray)
> groups=10(wheel),1002(cray),8001(GAUSSIAN)
>
> This is showing default gid is 1002(cray). This "cray" is primary group
> ID.
>
> 2) User changes arbitrary group with newgrp command.
>
> $ newgrp GAUSSIAN
> $ id
> uid=1002(craysp) gid=8001(GAUSSIAN)
> groups=10(wheel),1002(cray),8001(GAUSSIAN)
>
> This case is that a user wants to change another group like "GAUSSIAN".
> Certainly, I make sure it was changed to GAUSSIAN from cray.
>
> 3) User runs a MPI job with mpirun_rsh
>
> This is the simple MPI code which generates a output file.
>
> $ cat gid.c
> #include <stdio.h>
> #include <stdlib.h>
> #include <mpi.h>
>
> int main(int argc,char *argv[])
> {
>     int rank,size,namelen;
>     char name[MPI_MAX_PROCESSOR_NAME],comm[512];
>
>     MPI_Init(&argc,&argv);
>
>     MPI_Comm_rank(MPI_COMM_WORLD,&rank);
>     MPI_Comm_size(MPI_COMM_WORLD,&size);
>     MPI_Get_processor_name(name,&namelen);
>
>     sprintf(comm,"touch testfile_%s_%d",name,rank);
>     system(comm);
>
>     MPI_Finalize();
>     return 0;
> }
>
> After running this code, I want that a output file was owned by
> "GAUSSIAN" group. But it was different from that I want. Below is a run
> script including mpirun_rsh.
>
> $ cat run_i.sh
> #!/bin/bash
> . /opt/Modules/init/bash
> module load pgi mvapich2/pgi
> mpirun_rsh -np 1 com-0644 ./gid-mv2
>
> 4) User confirms that a created file doesn't owned appropriate group ID.
>
> $ ls -l testfile_com-0644_0
> -rw-r--r-- 1 craysp cray 0 Jun 8 17:50 testfile_com-0644_0
>
> You can confirm that this file is owned "cray" not "GAUSSIAN". This
> problem is caused on mpirun_rsh command or SSH server configuration, I
> think.
>
> 5) The way to solve it.
>
> I am considering that better way is inserting "sg" command just before
> a.out in mpirun_rsh command line. I would show you a example.
>
> $ grep mpirun_rsh run_i.sh
> mpirun_rsh -np 1 com-0644 /usr/bin/sg `id -gn` ./gid-mv2
>
> By specifying sg command just before a.out, It works well.
>
> $ ls -l testfile_com-0644_0
> -rw-r--r-- 1 craysp GAUSSIAN 0 Jun 8 18:33 testfile_com-0644_0
>
> 6) Request to you
>
> I thought that the wrapper script of mpirun_rsh would be created at
> first. But it is difficult to specify executable file location on
> command lines. There are various patterns that user describes in
> mpirun_rsh line. For example:
>
> mpirun_rsh -np 2048 -hostfile hosts.txt ./a.out Inputfile | tee -a Outputfile
> mpirun_rsh -np 256 -hostfile hostlist ./a.out input >> log
> mpirun_rsh -np 8 -hostfile hostfile MV2_ENABLE_AFFINITY=0 MV2_NUM_HCAS=4 ./numarun_mv2.sh ./a.out
> ...
>
> And we can take a look on line 1607.
>
> 1607 /* add the arguments */
> 1608 for (i = aout_index + 1; i < argc; i++) {
> 1609     strcat(command_name, " ");
> 1610     strcat(command_name, argv[i]);
> 1611 }
>
> An example of edit:
>
> 1607 /* add the arguments */
> 1608 strcat(command_name, " /usr/bin/sg $(id -gn)");
> 1609 for (i = aout_index + 1; i < argc; i++) {
> 1610     strcat(command_name, " ");
> 1611     strcat(command_name, argv[i]);
> 1612 }
>
> I have edited showing above and done recompile it, but it doesn't apply.
> If you know other way which is able to solve this problem, can you
> please tell me?
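
[Editorial sketch, not part of the original message.] The wrapper-script difficulty described in step 6 comes down to handing sg the program and its arguments as one string, which a reply further down this thread demonstrates with sg bbarth "echo foo". A minimal sketch of that joining step, assuming a POSIX shell; quote_arg and quote_command are hypothetical helper names, not part of mpirun_rsh or MVAPICH:

```shell
#!/bin/sh
# Sketch only: quote_arg and quote_command are illustrative names.

# Wrap one argument in single quotes so a shell reads it back as one word.
# An embedded single quote becomes '\'' (close quote, escaped quote, reopen).
quote_arg() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Join a program and its arguments into a single string that a shell
# splits back into exactly the same words.
quote_command() {
    out=""
    for arg in "$@"; do
        out="$out${out:+ }$(quote_arg "$arg")"
    done
    printf '%s\n' "$out"
}

# Local demonstration without sg: sh -c re-parses the joined string.
cmd=$(quote_command printf '%s\n' 'two words')
sh -c "$cmd"

# Assumed usage with sg (not executed here):
#   /usr/bin/sg "$(id -gn)" "$(quote_command ./gid3 ./data)"
```

How many levels of quoting are needed when this goes through mpirun_rsh depends on how mpirun_rsh assembles the remote command line; the strcat loop quoted above suggests arguments are joined with plain spaces, so an extra round of quote_arg may be required. That part is an assumption and would need testing on the cluster.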
>
> Best regards,
> Satoshi Isono
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From bbarth at tacc.utexas.edu Tue Jun 23 10:43:15 2009
From: bbarth at tacc.utexas.edu (Bill Barth)
Date: Tue Jun 23 10:43:48 2009
Subject: [mvapich-discuss] How to keep gid status
In-Reply-To: <925346A443D4E340BEB20248BAFCDBDF0B7AA96A@CFEVS1-IP.americas.cray.com>
References: <925346A443D4E340BEB20248BAFCDBDF0B60B8CB@CFEVS1-IP.americas.cray.com> <925346A443D4E340BEB20248BAFCDBDF0B7AA96A@CFEVS1-IP.americas.cray.com>
Message-ID: <0E07074B82CE4B4A9982802A8484B6968364AD4419@EXCHANGE2K7.tacc.utexas.edu>

Satoshi,

You can always quote the command. Compare

rusk(2)$ /usr/bin/sg bbarth echo foo

which is broken in the manner you suggest, to

rusk(3)$ /usr/bin/sg bbarth "echo foo"
foo

which does the right thing. Of course, once you bring quoting into the mix, you'll have to be more careful b/c the user might have put quotes on his command line.

Bill.
--
Bill Barth, Ph.D., Assistant Director, HPC (interim)
bbarth@tacc.utexas.edu        |   Phone: (512) 232-7069
Office: ROC 1.405             |   Fax:   (512) 475-9445

> -----Original Message-----
> From: Satoshi Isono [mailto:isono@cray.com]
> Sent: Tuesday, June 23, 2009 3:00 AM
> To: Dhabaleswar Panda; Bill Barth
> Cc: mvapich-discuss@cse.ohio-state.edu
> Subject: RE: [mvapich-discuss] How to keep gid status
>
> Dear Prof. Panda, Dr. Barth,
>
> Thanks for your advices. I have edited mpirun_rsh.c directly in order to
> keep the order of options on mpirun_rsh command. I show you differential
> lines as below. My MVAPICH2 version is mvapich2-1.2p1.
> > $ diff mpirun_rsh.c mpirun_rsh.c.org > 67d66 > < #include > 260d258 > < struct group *grpptr; > 275,276d272 > < int sg_index; > < > 279d274 > < grpptr = getgrgid(getgid()); > 417,456d411 > < //isono > < for (i = aout_index; i < argc; i++) { > < if (strchr(argv[i], '=') == NULL) { > < sg_index = i; > < break; > < } > < } > < fprintf(stdout, "\n# INPUT PARAMETERS\n"); > < fprintf(stdout, "%15s = %d\n", "argc", argc); > < fprintf(stdout, "%15s = %d\n", "option_index", option_index); > < fprintf(stdout, "%15s = %d\n", "aout_index", aout_index); > < fprintf(stdout, "%15s = %d\n", "sg_index", sg_index); > < for (i = 0; i < argc; i++) { > < fprintf(stdout, "%7sargv[%2d] = %s\n", " ", i, argv[i]); > < } > < char add_argv[argc+2][31]; > < for (i = 0; i < argc; i++) { > < strcpy(add_argv[i], argv[i]); > < } > < for (i = 0; i < argc+2; i++) { > < if (i < sg_index) { > < argv[i]=add_argv[i]; > < } else if (i == sg_index) { > < strcpy(argv[i], "/usr/bin/sg"); > < i++; > < argv[i] = grpptr->gr_name; > < } else { > < argv[i]=add_argv[i-2]; > < } > < } > < argc = argc + 2; > < fprintf(stdout, "\n# RUNNING PARAMETERS\n"); > < fprintf(stdout, "%15s = %d\n", "argc", argc); > < fprintf(stdout, "%15s = %d\n", "option_index", option_index); > < fprintf(stdout, "%15s = %d\n", "aout_index", aout_index); > < fprintf(stdout, "%15s = %d\n", "sg_index", sg_index); > < for (i = 0; i < argc; i++) { > < fprintf(stdout, "%7sargv[%2d] = %s\n", " ", i, argv[i]); > < } > < > > And then, this is test result using new mpirun_rsh. 
>
> $ mpirun_rsh -np 4 -hostfile hostfile MV2_NUM_HCAS=2 MV2_SM_SCHEDULING=ROUND_ROBIN ./gid-mv2-itl
>
> # INPUT PARAMETERS
>            argc = 8
>    option_index = 3
>      aout_index = 5
>        sg_index = 7
>        argv[ 0] = mpirun_rsh
>        argv[ 1] = -np
>        argv[ 2] = 4
>        argv[ 3] = -hostfile
>        argv[ 4] = hostfile
>        argv[ 5] = MV2_NUM_HCAS=2
>        argv[ 6] = MV2_SM_SCHEDULING=ROUND_ROBIN
>        argv[ 7] = ./gid-mv2-itl
>
> # RUNNING PARAMETERS
>            argc = 10
>    option_index = 3
>      aout_index = 5
>        sg_index = 7
>        argv[ 0] = mpirun_rsh
>        argv[ 1] = -np
>        argv[ 2] = 4
>        argv[ 3] = -hostfile
>        argv[ 4] = hostfile
>        argv[ 5] = MV2_NUM_HCAS=2
>        argv[ 6] = MV2_SM_SCHEDULING=ROUND_ROBIN
>        argv[ 7] = /usr/bin/sg
>        argv[ 8] = GAUSSIAN
>        argv[ 9] = ./gid-mv2-itl
>
> However, there is a problem. When we embed the /usr/bin/sg command in the
> mpirun_rsh line, how can we deal with an input file? The same problem
> occurs with your wrapper script.
>
> Here is a simple example with test code.
>
> 1) mpirun_rsh -np 4 -hostfile hostfile ./gid3 ./data
>
> And the following is the gid3 code.
>
> $ cat gid3.c
> #include <stdio.h>
> #include <stdlib.h>
> #include <mpi.h>
> #define MAX_DATA_SIZE 1000000
> double a[MAX_DATA_SIZE];
>
> int main(int argc,char *argv[])
> {
>     int rank,size,namelen;
>     char name[MPI_MAX_PROCESSOR_NAME],comm[512];
>     int i,ret,dsize;
>     char str[80];
>     FILE *fp;
>
>     MPI_Init(&argc,&argv);
>
>     MPI_Comm_rank(MPI_COMM_WORLD,&rank);
>     MPI_Comm_size(MPI_COMM_WORLD,&size);
>     MPI_Get_processor_name(name,&namelen);
>
>     //printf("%4d/%-d: %s\n",rank,size,name);
>     fp=fopen(argv[1],"r");
>     for(i=0;i<MAX_DATA_SIZE;i++){
>         if(fgets(str,80,fp)==NULL) break;
>         ret=sscanf(str,"%lf",&a[i]);
>         fprintf(stdout,"%s_%d: data = %lf\n",name,rank,a[i]);
>     }
>     fclose(fp);
>     dsize=i;
>     fprintf(stdout,"n = %d\n",i);
>
>     sprintf(comm,"touch testfile_%s_%d",name,rank);
>     system(comm);
>
>     MPI_Finalize();
>     return 0;
> }
>
> When I try to run this code using the new mpirun_rsh, it fails with the
> errors below.
>
> $ mpirun_rsh -np 4 -hostfile hostfile ./gid3 ./data
>
> # INPUT PARAMETERS
>            argc = 7
>    option_index = 3
>      aout_index = 5
>        sg_index = 5
>        argv[ 0] = mpirun_rsh
>        argv[ 1] = -np
>        argv[ 2] = 4
>        argv[ 3] = -hostfile
>        argv[ 4] = hostfile
>        argv[ 5] = ./gid3
>        argv[ 6] = ./data
>
> # RUNNING PARAMETERS
>            argc = 9
>    option_index = 3
>      aout_index = 5
>        sg_index = 5
>        argv[ 0] = mpirun_rsh
>        argv[ 1] = -np
>        argv[ 2] = 4
>        argv[ 3] = -hostfile
>        argv[ 4] = hostfile
>        argv[ 5] = /usr/bin/sg
>        argv[ 6] = GAUSSIAN
>        argv[ 7] = ./gid3
>        argv[ 8] = ./data
> MPI process terminated unexpectedly
> Exit code -5 signaled from com-0643
> cleanupKilling remote processes...DONE
> Signal 15 received.
>
> For additional information, I was able to run it in the other ways shown
> below. Both (2) and (3) require editing the user's source code.
>
> 2) mpirun_rsh -np 4 -hostfile hostfile ./gid4 < ./data
> 3) mpirun_rsh -np 4 -hostfile hostfile INPUT_FILENAME=data ./gid5
>
> In (2), the input file is supplied on stdin. For (3), gid5.c calls
> getenv("INPUT_FILENAME"), and I passed this environment variable on the
> option line of mpirun_rsh.
>
> Sorry for my long explanation. My question is how to handle case (1).
> Do you have any ideas to solve it? I think it is NOT good to modify each
> user's code.
>
> Please let me know if you have any advice.
>
> Best regards,
> Satoshi Isono
>
> -----Original Message-----
> From: Dhabaleswar Panda [mailto:panda@cse.ohio-state.edu]
> Sent: Wednesday, June 17, 2009 4:35 AM
> To: Satoshi Isono
> Cc: mvapich-discuss@cse.ohio-state.edu; Bill Barth; Dhabaleswar Panda
> Subject: Re: [mvapich-discuss] How to keep gid status
>
> Hi,
>
> Thanks for your note. I shared your question with Dr. Bill Barth from
> TACC. Folks from TACC have been using MVAPICH with mpirun_rsh in their
> production environment on Ranger for quite some time. I am including his
> reply below. I hope his suggested approach will work for you.
> Let us know.
>
> I am cc'ing Dr. Barth on this e-mail also. If there are any additional
> questions, the two of you might exchange additional information on this
> issue.
>
> Thanks,
>
> DK
>
> ====================================================================
>
> As you may recall, we have wrapper scripts that we use on Ranger and
> Lonestar to hide the details of the mpirun_rsh command line from the
> users. We call it 'ibrun'. It interacts with the scheduler (through the
> environment) to generate the host list and establish the number of tasks
> to start. I don't see why it would be hard to add a call to /usr/bin/sg
> in there.
>
> If the user would have invoked
>
> mpirun_rsh -np 5 -hostfile hosts ./foo
>
> he simply runs
>
> ibrun ./foo
>
> on Ranger or Lonestar. 'ibrun' is basically structured as:
>
> #!/bin/bash
> ....find NP from the environment....
> ....find host list....
> $MPICH_HOME/bin/mpirun_rsh -np $NP -hostfile $HOSTFILE "$@"
>
> So it just takes the command line args of ibrun and passes them directly
> to mpirun_rsh.
>
> There's no reason it couldn't do
>
> #!/bin/bash
> ....find NP....
> ....find host list....
> GROUP_ID=`id -gn`
> $MPICH_HOME/bin/mpirun_rsh -np $NP -hostfile $HOSTFILE /usr/bin/sg $GROUP_ID "$@"
>
> It should be this straightforward.
>
> Bill.
>
> ======
>
>
> On Sun, 14 Jun 2009, Satoshi Isono wrote:
>
> > Dear all,
> >
> > I would like to know how to keep the GID state when launching MPI
> > processes. We know that it works when the sg command is included in the
> > mpirun_rsh command line. Could you please advise me? An example is
> > shown below.
> >
> > Most users belong to multiple groups, and the accounting system is
> > managed by group ID (GID). So every file a user creates must carry the
> > appropriate group ownership.
> >
> > The problem is that the GID state is not preserved. Here is an example;
> > please read it in numbered order.
> >
> > 1) User logs in to a login node.
> >
> > $ id
> > uid=1002(craysp) gid=1002(cray) groups=10(wheel),1002(cray),8001(GAUSSIAN)
> >
> > This shows that the default gid is 1002(cray). "cray" is the primary
> > group.
> >
> > 2) User switches to another group with the newgrp command.
> >
> > $ newgrp GAUSSIAN
> > $ id
> > uid=1002(craysp) gid=8001(GAUSSIAN) groups=10(wheel),1002(cray),8001(GAUSSIAN)
> >
> > Here the user wants to switch to another group, "GAUSSIAN". I
> > confirmed that the gid changed from cray to GAUSSIAN.
> >
> > 3) User runs an MPI job with mpirun_rsh.
> >
> > This is a simple MPI code which generates an output file.
> >
> > $ cat gid.c
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <mpi.h>
> >
> > int main(int argc,char *argv[])
> > {
> >     int rank,size,namelen;
> >     char name[MPI_MAX_PROCESSOR_NAME],comm[512];
> >
> >     MPI_Init(&argc,&argv);
> >
> >     MPI_Comm_rank(MPI_COMM_WORLD,&rank);
> >     MPI_Comm_size(MPI_COMM_WORLD,&size);
> >     MPI_Get_processor_name(name,&namelen);
> >
> >     sprintf(comm,"touch testfile_%s_%d",name,rank);
> >     system(comm);
> >
> >     MPI_Finalize();
> >     return 0;
> > }
> >
> > After running this code, I want the output file to be owned by the
> > "GAUSSIAN" group, but it is not. Below is the run script that calls
> > mpirun_rsh.
> >
> > $ cat run_i.sh
> > #!/bin/bash
> > . /opt/Modules/init/bash
> > module load pgi mvapich2/pgi
> > mpirun_rsh -np 1 com-0644 ./gid-mv2
> >
> > 4) User confirms that the created file is not owned by the appropriate
> > group.
> >
> > $ ls -l testfile_com-0644_0
> > -rw-r--r-- 1 craysp cray 0 Jun 8 17:50 testfile_com-0644_0
> >
> > You can see that this file is owned by "cray", not "GAUSSIAN". I think
> > this problem is caused by the mpirun_rsh command or by the SSH server
> > configuration.
> >
> > 5) The way to solve it.
> >
> > I think a better way is to insert the "sg" command just before a.out
> > on the mpirun_rsh command line. Here is an example.
> >
> > $ grep mpirun_rsh run_i.sh
> > mpirun_rsh -np 1 com-0644 /usr/bin/sg `id -gn` ./gid-mv2
> >
> > With the sg command specified just before a.out, it works well.
> >
> > $ ls -l testfile_com-0644_0
> > -rw-r--r-- 1 craysp GAUSSIAN 0 Jun 8 18:33 testfile_com-0644_0
> >
> > 6) Request to you
> >
> > At first I thought of writing a wrapper script around mpirun_rsh, but
> > it is difficult to locate the executable on the command line, because
> > users write the mpirun_rsh line in many different ways. For example:
> >
> > mpirun_rsh -np 2048 -hostfile hosts.txt ./a.out Inputfile | tee -a Outputfile
> > mpirun_rsh -np 256 -hostfile hostlist ./a.out input >> log
> > mpirun_rsh -np 8 -hostfile hostfile MV2_ENABLE_AFFINITY=0 MV2_NUM_HCAS=4 ./numarun_mv2.sh ./a.out
> > ...
> >
> > And we can take a look at line 1607.
> >
> > 1607 /* add the arguments */
> > 1608 for (i = aout_index + 1; i < argc; i++) {
> > 1609     strcat(command_name, " ");
> > 1610     strcat(command_name, argv[i]);
> > 1611 }
> >
> > An example of an edit:
> >
> > 1607 /* add the arguments */
> > 1608 strcat(command_name, " /usr/bin/sg $(id -gn)");
> > 1609 for (i = aout_index + 1; i < argc; i++) {
> > 1610     strcat(command_name, " ");
> > 1611     strcat(command_name, argv[i]);
> > 1612 }
> >
> > I made the edit shown above and recompiled, but it has no effect.
> > If you know another way to solve this problem, can you please tell me?
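
One reason splicing /usr/bin/sg into the argument list is awkward is that sg takes the command as a single argument, so the program and its arguments have to be joined into one string first. The sketch below is an illustration only and is not part of mpirun_rsh: `quote_join` is a hypothetical helper, and it assumes that a POSIX shell will eventually parse the resulting string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not mpirun_rsh code): join argv[first..argc-1] into
 * one string in which every word is wrapped in single quotes, so that a
 * POSIX shell splits it back into the same words. An embedded ' is written
 * as '\'' . The caller frees the result. Returns NULL if malloc fails. */
static char *quote_join(int argc, char *argv[], int first)
{
    size_t len = 1;                          /* terminating NUL */
    for (int i = first; i < argc; i++)
        len += 4 * strlen(argv[i]) + 3;      /* worst case + quotes + space */

    char *out = malloc(len);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (int i = first; i < argc; i++) {
        if (i > first)
            *p++ = ' ';
        *p++ = '\'';
        for (const char *s = argv[i]; *s != '\0'; s++) {
            if (*s == '\'') {
                memcpy(p, "'\\''", 4);       /* close, escaped quote, reopen */
                p += 4;
            } else {
                *p++ = *s;
            }
        }
        *p++ = '\'';
    }
    *p = '\0';
    return out;
}
```

The joined string is what sg would need to receive as its one command argument. Every additional shell between the launcher and sg (for example the remote shell started over ssh) strips one level of quoting, so the string may need to be quoted once more per layer; how many layers apply on a given system is something to check rather than assume.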
> >
> > Best regards,
> > Satoshi Isono
> >

From isono at cray.com Wed Jun 24 03:42:50 2009
From: isono at cray.com (Satoshi Isono)
Date: Wed Jun 24 03:43:17 2009
Subject: [mvapich-discuss] How to keep gid status
In-Reply-To: <0E07074B82CE4B4A9982802A8484B6968364AD4419@EXCHANGE2K7.tacc.utexas.edu>
References: <925346A443D4E340BEB20248BAFCDBDF0B60B8CB@CFEVS1-IP.americas.cray.com> <925346A443D4E340BEB20248BAFCDBDF0B7AA96A@CFEVS1-IP.americas.cray.com> <0E07074B82CE4B4A9982802A8484B6968364AD4419@EXCHANGE2K7.tacc.utexas.edu>
Message-ID: <925346A443D4E340BEB20248BAFCDBDF0B87537E@CFEVS1-IP.americas.cray.com>

Hello Bill,

Thank you for your kindness. I understand I can quote input file name. With command line given below, it was able to bring success.

mpirun_rsh -np 4 -hostfile hostfile '"./a.out ./data"'

Regards,
Satoshi Isono

-----Original Message-----
From: Bill Barth [mailto:bbarth@tacc.utexas.edu]
Sent: Tuesday, June 23, 2009 11:43 PM
To: Satoshi Isono; Dhabaleswar Panda
Cc: mvapich-discuss@cse.ohio-state.edu
Subject: RE: [mvapich-discuss] How to keep gid status

Satoshi,

You can always quote the command. Compare

rusk(2)$ /usr/bin/sg bbarth echo foo

which is broken in the manner you suggest, to

rusk(3)$ /usr/bin/sg bbarth "echo foo"
foo

which does the right thing. Of course, once you bring quoting into the mix, you'll have to be more careful b/c the user might have put quotes on his command line.

Bill.
--
Bill Barth, Ph.D., Assistant Director, HPC (interim)
bbarth@tacc.utexas.edu        |   Phone: (512) 232-7069
Office: ROC 1.405             |   Fax:   (512) 475-9445

From forum.san at gmail.com Fri Jun 26 10:27:22 2009
From: forum.san at gmail.com (Sangamesh B)
Date: Fri Jun 26 10:27:46 2009
Subject: [mvapich-discuss] High latency on IB?
Message-ID:

Dear Mvapich2,

The following are the osu latency tests taken with mvapich2 and mpich2.

On IB:

[user@cluster IB_MVAPICH2]$ /opt/mvapich2/bin/mpirun -machinefile ibmachines -np 2 ./osu_latency_MVAPICH2
# OSU MPI Latency Test (Version 2.2)
# Size          Latency (us)
0               20.84
1               21.74
2               21.74
4               21.69
8               21.62
16              21.67
32              21.74
64              21.75
128             21.79
256             22.65
512             23.38
1024            24.79
2048            27.43
4096            31.25
8192            38.43
16384           55.92
32768           88.96
65536           160.26
131072          240.20
262144          434.30
524288          753.20
1048576         1400.61
2097152         2619.34
4194304         5014.10
[locuz@cluster IB_MVAPICH2]$

With MPICH2 on ethernet:

[user@cluster ETH_MPICH2]$ /opt/mpich2/bin/mpirun -machinefile mpich2macfile -np 2 ./osu_latency_MPICH2
# OSU MPI Latency Test (Version 2.2)
# Size          Latency (us)
0               62.47
1               62.53
2               62.49
4               62.48
8               62.45
16              62.47
32              62.89
64              63.60
128             123.22
256             124.83
512             124.91
1024            124.98
2048            124.92
4096            124.97
8192            187.37
16384           201.73
32768           374.72
65536           685.11
131072          1186.62
262144          2435.70
524288          4629.19
1048576         9057.72
2097152         17981.10
4194304         35723.62
[user@cluster ETH_MPICH2]$

The latency values are very high. Are these right? I ask because the osu benchmarks taken earlier on other clusters started from 3.

What could be the reason for this? Is there any way to improve it?

I've taken care that mpdboot uses the proper interface (i.e. IB or ethernet) for mvapich2 and mpich2 respectively (the same applies to the machinefile).

Thank you

From panda at cse.ohio-state.edu Fri Jun 26 10:58:52 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri Jun 26 10:59:14 2009
Subject: [mvapich-discuss] High latency on IB?
In-Reply-To:
Message-ID:

These numbers are too high for MVAPICH2. Which version of MVAPICH2 and which interface (gen2, uDAPL, etc.) are you using? What computing platform, network adapter and switch are you using? Check whether you are configuring the stack properly and whether your systems (platforms, adapters, switches and cables) are stable.

DK

On Fri, 26 Jun 2009, Sangamesh B wrote:

> Dear Mvapich2,
>
> The following are the osu latency tests taken with mvapich2 and mpich2.
>
> On IB:
>
> [user@cluster IB_MVAPICH2]$ /opt/mvapich2/bin/mpirun -machinefile ibmachines -np 2 ./osu_latency_MVAPICH2
> # OSU MPI Latency Test (Version 2.2)
> # Size          Latency (us)
> 0               20.84
> 1               21.74
> 2               21.74
> 4               21.69
> 8               21.62
> 16              21.67
> 32              21.74
> 64              21.75
> 128             21.79
> 256             22.65
> 512             23.38
> 1024            24.79
> 2048            27.43
> 4096            31.25
> 8192            38.43
> 16384           55.92
> 32768           88.96
> 65536           160.26
> 131072          240.20
> 262144          434.30
> 524288          753.20
> 1048576         1400.61
> 2097152         2619.34
> 4194304         5014.10
> [locuz@cluster IB_MVAPICH2]$
>
> With MPICH2 on ethernet:
>
> [user@cluster ETH_MPICH2]$ /opt/mpich2/bin/mpirun -machinefile mpich2macfile -np 2 ./osu_latency_MPICH2
> # OSU MPI Latency Test (Version 2.2)
> # Size          Latency (us)
> 0               62.47
> 1               62.53
> 2               62.49
> 4               62.48
> 8               62.45
> 16              62.47
> 32              62.89
> 64              63.60
> 128             123.22
> 256             124.83
> 512             124.91
> 1024            124.98
> 2048            124.92
> 4096            124.97
> 8192            187.37
> 16384           201.73
> 32768           374.72
> 65536           685.11
> 131072          1186.62
> 262144          2435.70
> 524288          4629.19
> 1048576         9057.72
> 2097152         17981.10
> 4194304         35723.62
> [user@cluster ETH_MPICH2]$
>
> The latency value is very high. Are these right? Because, the osu
> benchmarks taken earlier on other clusters, were starting from 3.
>
> What could be the reason for this? Is there any way to improve it?
>
> I've taken care of mpdboot to use proper interface (i.e. IB or
> ethernet) wrt mvapich2 and mpich2 (the same applies to machinefile
> also).
>
> Thank you

From cco2 at cray.com Fri Jun 26 15:01:09 2009
From: cco2 at cray.com (Christopher Co)
Date: Fri Jun 26 15:01:49 2009
Subject: [mvapich-discuss] Shared Memory Performance
In-Reply-To: <4A395E43.3060802@cray.com>
References: <4A39163C.7060404@cray.com> <4A395E43.3060802@cray.com>
Message-ID: <4A451AF5.4090407@cray.com>

I have found the source of the problem with the shared memory latency in the IMB Ping Pong test. After a lot of digging, I found that IMB by default initializes MPI with MPI_THREAD_MULTIPLE, and that the Ping Pong source code passes MPI_ANY_SOURCE as the "source" argument of its MPI send/receive calls. These two factors were skewing all the results except for Intel's MPI. After changing IMB to initialize with MPI_THREAD_SINGLE and changing source to the correct value, I got similar numbers (using cores 5 and 7 for further increased performance) from IMB Ping Pong, OSU Latency, and my own basic Ping Pong timing. The numbers are below.

There is still an unexplained issue: the 0 and 1 byte latencies are off (and it looks like the OSU Latency numbers are correct here). From my testing, I noticed that the first part of the 1000 repetitions IMB did for 0 byte latencies gave extremely high values. IMB does do an MPI_Barrier before it starts, to ensure that the send/receive start together.
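
For reading the IMB output that follows: its Mbytes/sec column looks consistent with the message size divided by the reported time, counted in units of 2^20 bytes per second. The sketch below only reproduces that arithmetic. The name `mbytes_per_sec` is made up, and the formula is inferred from the numbers in this thread, not taken from the IMB source.

```c
/* Assumption: bandwidth = bytes / 2^20 / seconds. This is inferred from the
 * table in this thread, not from the IMB source code.
 * Returns 0 for a non-positive time to avoid dividing by zero. */
static double mbytes_per_sec(double bytes, double t_usec)
{
    if (t_usec <= 0.0)
        return 0.0;
    return bytes / 1048576.0 / (t_usec * 1e-6);
}
```

For the last row of the table, mbytes_per_sec(4194304, 2689.10) comes out close to the reported 1487.49; small differences on other rows are expected because t[usec] is printed with only two decimals.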

#---------------------------------------------------
# Intel (R) MPI Benchmark Suite V3.0, MPI-1 part
#---------------------------------------------------
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.37         0.00
            1         1000         0.43         2.19
            2         1000         0.40         4.82
            4         1000         0.46         8.21
            8         1000         0.44        17.16
           16         1000         0.45        34.10
           32         1000         0.47        65.42
           64         1000         0.48       127.55
          128         1000         0.51       237.48
          256         1000         0.56       435.19
          512         1000         0.66       742.70
         1024         1000         0.83      1171.62
         2048         1000         1.17      1669.28
         4096         1000         1.84      2124.07
         8192         1000         3.24      2409.85
        16384         1000         6.25      2501.23
        32768         1000        10.77      2901.97
        65536          640        16.68      3747.09
       131072          320        25.72      4860.57
       262144          160        43.62      5730.71
       524288           80        81.07      6167.53
      1048576           40       173.55      5762.10
      2097152           20      1165.35      1716.23
      4194304           10      2689.10      1487.49

# OSU MPI Latency Test v3.1.1
# Size          Latency (us)
0               0.30
1               0.38
2               0.39
4               0.46
8               0.44
16              0.44
32              0.46
64              0.47
128             0.49
256             0.53
512             0.63
1024            0.79
2048            1.11
4096            1.80
8192            3.24
16384           6.36
32768           10.99
65536           16.34
131072          24.75
262144          41.51
524288          75.74
1048576         157.31
2097152         1159.87
4194304         2696.29

Christopher Co wrote:
> I have found that the CX-1 I am running on has two Intel Xeon E5472 3
> GHz processors (Harpertown). Your test results were on Nehalem
> processors. When I have received the correct CPU mapping, I've gotten
> roughly 0.8 usec to Ping Pong 8 bytes. I wonder if this can account for
> the discrepancy. Anyways, I'll investigate this further and get more
> data but I wanted to throw this information out there in case it can be
> helpful.
>
>
> Chris
>
> Christopher Co wrote:
>
>> Those specifications are correct. I am seeing that the MV2_CPU_MAPPING
>> option does not have an effect on which cores are chosen so when I
>> launch a Ping-Pong, 2 cores are arbitrarily chosen by mpirun_rsh. One
>> thing that might be hindering PLPA support is that I do not have
>> sudo/root access on the machine.
>> I installed everything into my home directory. Could this be the issue?
>>
>>
>> Chris
>>
>> Dhabaleswar Panda wrote:
>>
>>
>>> Could you let us know what issues you are seeing when using
>>> MV2_CPU_MAPPING? The PLPA support is embedded in the MVAPICH2 code. It
>>> does not require any additional configure/install. I am assuming that
>>> you are using the Gen2 (OFED) interface with mpirun_rsh and your
>>> systems are Linux-based.
>>>
>>> Thanks,
>>>
>>> DK
>>>
>>>
>>> On Tue, 16 Jun 2009, Christopher Co wrote:
>>>
>>>
>>>> I am having issues with running processes on the cores I specify using
>>>> MV2_CPU_MAPPING. Is the PLPA support for mapping MPI processes to cores
>>>> embedded in MVAPICH2 or does it link to an existing PLPA on
>>>> configure/install? Also, I want to confirm that no extra configure
>>>> options are needed to enable this feature.
>>>>
>>>>
>>>> Thanks,
>>>> Chris
>>>>
>>>> Dhabaleswar Panda wrote:
>>>>
>>>>
>>>>> Thanks for letting us know that you are using MVAPICH2 1.4. I believe
>>>>> you are taking numbers on Intel systems. Please note that on Intel
>>>>> systems, two cores next to each other within the same chip are
>>>>> numbered as 0 and 4 (not 0 and 1). Thus, the default setting (with
>>>>> processes 0 and 1) runs across the chips, and you are seeing worse
>>>>> performance. Please run your tests across cores 0 and 4 and you
>>>>> should be able to see better performance. Depending on which pairs of
>>>>> processes you use, you may see some differences in performance for
>>>>> short and large messages (it depends on whether these cores are
>>>>> within the same chip, same socket or across sockets). I am attaching
>>>>> some numbers below from our Nehalem system with these two CPU
>>>>> mappings and you can see the performance difference.
>>>>>
>>>>> MVAPICH2 provides flexible mapping of MPI processes to cores within a
>>>>> node. You can try out performance across various pairs and you will
>>>>> see performance difference.
>>>>> More details on such mapping are available from here:
>>>>>
>>>>> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc1.html#x1-360006.8
>>>>>
>>>>> Also, starting from MVAPICH2 1.4, a new single-copy kernel-based
>>>>> shared-memory scheme (LiMIC2) is introduced. This is `off' by default.
>>>>> You can use it to get better performance for larger message sizes. You
>>>>> need to configure with enable-limic2 and you also need to use
>>>>> MV2_SMP_USE_LIMIC2=1. More details are available from here:
>>>>>
>>>>> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc1.html#x1-370006.9
>>>>>
>>>>> Here are some performance numbers with different CPU mappings.
>>>>>
>>>>> OSU MPI latency with Default CPU mapping (LiMIC2 is off)
>>>>> --------------------------------------------------------
>>>>>
>>>>> # OSU MPI Latency Test v3.1.1
>>>>> # Size          Latency (us)
>>>>> 0               0.77
>>>>> 1               0.95
>>>>> 2               0.95
>>>>> 4               0.94
>>>>> 8               0.94
>>>>> 16              0.94
>>>>> 32              0.96
>>>>> 64              0.99
>>>>> 128             1.09
>>>>> 256             1.22
>>>>> 512             1.37
>>>>> 1024            1.61
>>>>> 2048            1.79
>>>>> 4096            2.43
>>>>> 8192            5.42
>>>>> 16384           6.73
>>>>> 32768           9.57
>>>>> 65536           15.34
>>>>> 131072          28.71
>>>>> 262144          53.13
>>>>> 524288          100.24
>>>>> 1048576         199.98
>>>>> 2097152         387.28
>>>>> 4194304         991.68
>>>>>
>>>>> OSU MPI latency with CPU mapping 0:4 (LiMIC2 is off)
>>>>> ----------------------------------------------------
>>>>>
>>>>> # OSU MPI Latency Test v3.1.1
>>>>> # Size          Latency (us)
>>>>> 0               0.34
>>>>> 1               0.40
>>>>> 2               0.40
>>>>> 4               0.40
>>>>> 8               0.40
>>>>> 16              0.40
>>>>> 32              0.42
>>>>> 64              0.42
>>>>> 128             0.45
>>>>> 256             0.50
>>>>> 512             0.55
>>>>> 1024            0.67
>>>>> 2048            0.91
>>>>> 4096            1.35
>>>>> 8192            3.66
>>>>> 16384           5.01
>>>>> 32768           7.41
>>>>> 65536           12.90
>>>>> 131072          25.21
>>>>> 262144          49.71
>>>>> 524288          97.17
>>>>> 1048576         187.50
>>>>> 2097152         465.57
>>>>> 4194304         1196.31
>>>>>
>>>>> Let us know if you get better performance with appropriate CPU mapping.
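
To make the "0:4" mapping above concrete: as far as the user guide linked above describes it, MV2_CPU_MAPPING is a colon-separated list of core IDs, one entry per local rank, so "0:4" would place local rank 0 on core 0 and local rank 1 on core 4. The sketch below only illustrates that reading of the format. `parse_cpu_mapping` is a hypothetical helper, not MVAPICH2 code, and MVAPICH2's own parsing may differ in details.

```c
#include <stdlib.h>

/* Hypothetical helper, not MVAPICH2 code: split a mapping such as "0:4"
 * into core IDs, where entry i is the core intended for local rank i.
 * Returns the number of entries, or -1 on malformed input or when more
 * than max entries are present. */
static int parse_cpu_mapping(const char *s, int *cores, int max)
{
    int n = 0;
    while (*s != '\0') {
        char *end;
        long v = strtol(s, &end, 10);
        if (end == s || v < 0 || n == max)
            return -1;          /* no digits, negative ID, or too many */
        cores[n++] = (int)v;
        if (*end == ':')
            s = end + 1;        /* step over the separator */
        else if (*end == '\0')
            s = end;            /* reached the end of the list */
        else
            return -1;          /* unexpected character */
    }
    return n;
}
```

On the command line the variable would go before the executable, in the same position as the other MV2_ settings shown in this archive, for example MV2_CPU_MAPPING=0:4 followed by the benchmark binary.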
>>>>> >>>>> Thanks, >>>>> >>>>> DK >>>>> >>>>> >>>>> On Mon, 15 Jun 2009, Christopher Co wrote: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> I am using MVAPICH2 1.4 with the default configuration (since the CX-1 >>>>>> uses Mellanox Infiniband). I am fairly certain my CPU mapping was >>>>>> on-node for both cases (curiously, is there a way for MVAPICH2 to print >>>>>> out the nodes/cores running). I have the numbers for Ping Pong for the >>>>>> off-node case. I should have included this in my earlier message: >>>>>> Processes # repetitions #bytes Intel MPI time (usec)] MVAPICH2 time >>>>>> (usec) >>>>>> 2 1000 0 4.16 3.4 >>>>>> >>>>>> 1000 1 4.67 3.56 >>>>>> >>>>>> 1000 2 4.21 3.56 >>>>>> >>>>>> 1000 4 4.23 3.62 >>>>>> >>>>>> 1000 8 4.33 3.63 >>>>>> >>>>>> 1000 16 4.33 3.64 >>>>>> >>>>>> 1000 32 4.38 3.73 >>>>>> >>>>>> 1000 64 4.44 3.92 >>>>>> >>>>>> 1000 128 5.61 4.71 >>>>>> >>>>>> 1000 256 5.92 5.23 >>>>>> >>>>>> 1000 512 6.52 5.79 >>>>>> >>>>>> 1000 1024 7.68 7.06 >>>>>> >>>>>> 1000 2048 9.97 9.36 >>>>>> >>>>>> 1000 4096 12.39 11.97 >>>>>> >>>>>> 1000 8192 17.86 22.53 >>>>>> >>>>>> 1000 16384 27.44 28.27 >>>>>> >>>>>> 1000 32768 40.32 39.82 >>>>>> >>>>>> 640 65536 63.61 62.97 >>>>>> >>>>>> 320 131072 109.69 110.01 >>>>>> >>>>>> 160 262144 204.71 206.9 >>>>>> >>>>>> 80 524288 400.72 397.1 >>>>>> >>>>>> 40 1048576 775.64 776.45 >>>>>> >>>>>> 20 2097152 1523.95 1535.65 >>>>>> >>>>>> 10 4194304 3018.84 3054.89 >>>>>> >>>>>> >>>>>> >>>>>> Chris >>>>>> >>>>>> >>>>>> Dhabaleswar Panda wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Can you tell us which version of MVAPICH2 you are using and which >>>>>>> option(s) are configured? Are you using correct CPU mapping in both >>>>>>> cases? >>>>>>> >>>>>>> DK >>>>>>> >>>>>>> On Mon, 15 Jun 2009, Christopher Co wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I am doing performance analysis on a Cray CX1 machine. 
I have run the >>>>>>>> Pallas MPI benchmark and have noticed a considerable performance >>>>>>>> difference between MVAPICH2 and Intel MPI on all the tests when shared >>>>>>>> memory is used. I have also run the benchmark for non-shared memory and >>>>>>>> the two performed nearly the same (MVAPICH2 was slightly faster). Is >>>>>>>> this slowdown on shared memory a known issue and/or are there fixes or >>>>>>>> switches I can enable or disable to get more speed? >>>>>>>> >>>>>>>> To give an idea of what I'm seeing, for the simple Ping Pong test for >>>>>>>> two processes on the same chip, the numbers looks like: >>>>>>>> >>>>>>>> Processes # repetitions >>>>>>>> #bytes Intel MPI time (usec) MVAPICH2 >>>>>>>> time (usec) >>>>>>>> 2 1000 0 0.35 0.94 >>>>>>>> >>>>>>>> 1000 1 0.44 1.24 >>>>>>>> >>>>>>>> 1000 2 0.45 1.17 >>>>>>>> >>>>>>>> 1000 4 0.45 1.08 >>>>>>>> >>>>>>>> 1000 8 0.45 1.11 >>>>>>>> >>>>>>>> 1000 16 0.44 1.13 >>>>>>>> >>>>>>>> 1000 32 0.45 1.21 >>>>>>>> >>>>>>>> 1000 64 0.47 1.35 >>>>>>>> >>>>>>>> 1000 128 0.48 1.75 >>>>>>>> >>>>>>>> 1000 256 0.51 2.92 >>>>>>>> >>>>>>>> 1000 512 0.57 3.41 >>>>>>>> >>>>>>>> 1000 1024 0.76 3.85 >>>>>>>> >>>>>>>> 1000 2048 0.98 4.27 >>>>>>>> >>>>>>>> 1000 4096 1.53 5.14 >>>>>>>> >>>>>>>> 1000 8192 2.59 8.04 >>>>>>>> >>>>>>>> 1000 16384 4.86 14.34 >>>>>>>> >>>>>>>> 1000 32768 7.17 33.92 >>>>>>>> >>>>>>>> 640 65536 11.65 43.27 >>>>>>>> >>>>>>>> 320 131072 20.97 66.98 >>>>>>>> >>>>>>>> 160 262144 39.64 118.58 >>>>>>>> >>>>>>>> 80 524288 84.91 224.40 >>>>>>>> >>>>>>>> 40 1048576 212.76 461.80 >>>>>>>> >>>>>>>> 20 2097152 458.55 1053.67 >>>>>>>> >>>>>>>> 10 4194304 1738.30 2649.30 >>>>>>>> >>>>>>>> >>>>>>>> Hopefully the table came out clear. MVAPICH2 always lags behind by a >>>>>>>> considerable amount. Any insight is much appreciated. Thanks! 
>>>>>>>> >>>>>>>> >>>>>>>> Chris Co >>>>>>>> _______________________________________________ >>>>>>>> mvapich-discuss mailing list >>>>>>>> mvapich-discuss@cse.ohio-state.edu >>>>>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>> >>>>> >>>>> >>> >>> >>> >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >> >> > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From panda at cse.ohio-state.edu Fri Jun 26 16:50:29 2009 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Fri Jun 26 16:50:54 2009 Subject: [mvapich-discuss] Shared Memory Performance In-Reply-To: <4A451AF5.4090407@cray.com> Message-ID: Chris - Thanks for the detailed investigation of the issues and insights here. Glad to know that you are getting the desired performance with MVAPICH2 now. For 0 and 1 byte, the MPI_Barrier used in IMB might be skewing the results. IMB folks can provide more insights here. Thanks, DK On Fri, 26 Jun 2009, Christopher Co wrote: > I have found the source of the problem with the shared memory latency > through the IMB Ping Pong test. After a lot of digging, I found that > the default initialization of IMB is to enable MPI_THREAD_MULTIPLE and > in the Ping Pong source code, the "source" of the MPI_Send/Receive > functions uses MPI_ANY_SOURCE. These two factors were skewing all the > results except for Intel's MPI. After fixing IMB to be initialized to > MPI_THREAD_SINGLE and changing source to be the correct value, I > produced similar numbers (using cores 5 and 7 for further increased > performance) between IMB Ping Pong, OSU Latency, and my own basic Ping > Pong timing. The numbers are below. 
There is still an unknown issue > about the 0 and 1 byte latencies are off (and it looks like OSU Latency > numbers are correct here). From my testing, I noticed that the first > part of the 1000 repetitions IMB did for 0 byte latencies were extremely > high values. IMB does do an MPI_Barrier before it starts to ensure that > the send/receive start together. > > > #--------------------------------------------------- > # Intel (R) MPI Benchmark Suite V3.0, MPI-1 part > #--------------------------------------------------- > #--------------------------------------------------- > # Benchmarking PingPong > # #processes = 2 > #--------------------------------------------------- > #bytes #repetitions t[usec] Mbytes/sec > 0 1000 0.37 0.00 > 1 1000 0.43 2.19 > 2 1000 0.40 4.82 > 4 1000 0.46 8.21 > 8 1000 0.44 17.16 > 16 1000 0.45 34.10 > 32 1000 0.47 65.42 > 64 1000 0.48 127.55 > 128 1000 0.51 237.48 > 256 1000 0.56 435.19 > 512 1000 0.66 742.70 > 1024 1000 0.83 1171.62 > 2048 1000 1.17 1669.28 > 4096 1000 1.84 2124.07 > 8192 1000 3.24 2409.85 > 16384 1000 6.25 2501.23 > 32768 1000 10.77 2901.97 > 65536 640 16.68 3747.09 > 131072 320 25.72 4860.57 > 262144 160 43.62 5730.71 > 524288 80 81.07 6167.53 > 1048576 40 173.55 5762.10 > 2097152 20 1165.35 1716.23 > 4194304 10 2689.10 1487.49 > > # OSU MPI Latency Test v3.1.1 > # Size Latency (us) > 0 0.30 > 1 0.38 > 2 0.39 > 4 0.46 > 8 0.44 > 16 0.44 > 32 0.46 > 64 0.47 > 128 0.49 > 256 0.53 > 512 0.63 > 1024 0.79 > 2048 1.11 > 4096 1.80 > 8192 3.24 > 16384 6.36 > 32768 10.99 > 65536 16.34 > 131072 24.75 > 262144 41.51 > 524288 75.74 > 1048576 157.31 > 2097152 1159.87 > 4194304 2696.29 > > > > > Christopher Co wrote: > > I have found that the CX-1 I am running on has two Intel Xeon E5472 3 > > GHz processors (Harpertown). Your test results were on Nehalem > > processors. When I have received the correct CPU mapping, I've gotten > > roughly 0.8 usec to Ping Pong 8 bytes. I wonder if this can account for > > the discrepancy. 
Anyways, I'll investigate this further and get more > > data but I wanted to throw this information out there in case it can be > > helpful. > > > > > > Chris > > > > Christopher Co wrote: > > > >> Those specifications are correct. I am seeing that the MV2_CPU_MAPPING > >> option does not have an effect on which cores are chosen so when I > >> launch a Ping-Pong, 2 cores are arbitrarily chosen by mpirun_rsh. One > >> thing that might be hindering PLPA support is that I do not have > >> sudo/root access on the machine. I installed everything into my home > >> directory. Could this be the issue? > >> > >> > >> Chris > >> > >> Dhabaleswar Panda wrote: > >> > >> > >>> Could you let us know what issues you are seeing when using > >>> MV2_CPU_MAPPING. The PLPA support is embedded in MVAPICH2 code. It does > >>> not require any additional configure/install. I am assuming that you are > >>> using the Gen2 (OFED) interface with mpirun_rsh and your systems are > >>> Linux-based. > >>> > >>> Thanks, > >>> > >>> DK > >>> > >>> > >>> On Tue, 16 Jun 2009, Christopher Co wrote: > >>> > >>> > >>> > >>> > >>>> I am having issues with running processes on the cores I specify using > >>>> MV2_CPU_MAPPING. Is the PLPA support for mapping MPI processes to cores > >>>> embedded in MVAPICH2 or does it link to an existing PLPA on > >>>> configure/install? Also, I want to confirm that no extra configure > >>>> options are needed to enable this feature. > >>>> > >>>> > >>>> Thanks, > >>>> Chris > >>>> > >>>> Dhabaleswar Panda wrote: > >>>> > >>>> > >>>> > >>>>> Thanks for letting us know that you are using MVAPICH2 1.4. I believe you > >>>>> are taking numbers on Intel systems. Please note that on Intel systems, > >>>>> two cores next to each other within the same chip are numbered as 0 and 4 > >>>>> (not 0 and 1). Thus, the default setting (with processes 0 and 1) run > >>>>> across the chips and thus, you are seeing worse performance. 
Please run > >>>>> your tests across cores 0 and 4 and you should be able to see better > >>>>> performance. Depending on which pairs of processes you use, you may see > >>>>> some differences in performance for short and large messages (depends on > >>>>> whether these cores are within the same chip, same socket or across > >>>>> sockets). I am attaching some numbers below on our Nehalem system with > >>>>> these two CPU mappings and you can see the performance difference. > >>>>> > >>>>> MVAPICH2 provides flexible mapping of MPI processes to cores within a > >>>>> node. You can try out performance across various pairs and you will see > >>>>> performance difference. More details on such mapping are available from > >>>>> here: > >>>>> > >>>>> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc1.html#x1-360006.8 > >>>>> > >>>>> Also, starting from MVAPICH2 1.4, a new single-copy kernel-based > >>>>> shared-memory scheme (LiMIC2) is introduced. This is `off' by default. > >>>>> You can use it to get better performance for larger message sizes. You > >>>>> need to configure with enable-limic2 and you also need to use > >>>>> MV2_SMP_USE_LIMIC2=1. More details are available from here: > >>>>> > >>>>> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4rc1.html#x1-370006.9 > >>>>> > >>>>> Here are some performance numbers with different CPU mappings. 
> >>>>> > >>>>> OSU MPI latency with Default CPU mapping (LiMIC2 is off) > >>>>> -------------------------------------------------------- > >>>>> > >>>>> # OSU MPI Latency Test v3.1.1 > >>>>> # Size Latency (us) > >>>>> 0 0.77 > >>>>> 1 0.95 > >>>>> 2 0.95 > >>>>> 4 0.94 > >>>>> 8 0.94 > >>>>> 16 0.94 > >>>>> 32 0.96 > >>>>> 64 0.99 > >>>>> 128 1.09 > >>>>> 256 1.22 > >>>>> 512 1.37 > >>>>> 1024 1.61 > >>>>> 2048 1.79 > >>>>> 4096 2.43 > >>>>> 8192 5.42 > >>>>> 16384 6.73 > >>>>> 32768 9.57 > >>>>> 65536 15.34 > >>>>> 131072 28.71 > >>>>> 262144 53.13 > >>>>> 524288 100.24 > >>>>> 1048576 199.98 > >>>>> 2097152 387.28 > >>>>> 4194304 991.68 > >>>>> > >>>>> OSU MPI latency with CPU mapping 0:4 (LiMIC2 is off) > >>>>> ---------------------------------------------------- > >>>>> > >>>>> # OSU MPI Latency Test v3.1.1 > >>>>> # Size Latency (us) > >>>>> 0 0.34 > >>>>> 1 0.40 > >>>>> 2 0.40 > >>>>> 4 0.40 > >>>>> 8 0.40 > >>>>> 16 0.40 > >>>>> 32 0.42 > >>>>> 64 0.42 > >>>>> 128 0.45 > >>>>> 256 0.50 > >>>>> 512 0.55 > >>>>> 1024 0.67 > >>>>> 2048 0.91 > >>>>> 4096 1.35 > >>>>> 8192 3.66 > >>>>> 16384 5.01 > >>>>> 32768 7.41 > >>>>> 65536 12.90 > >>>>> 131072 25.21 > >>>>> 262144 49.71 > >>>>> 524288 97.17 > >>>>> 1048576 187.50 > >>>>> 2097152 465.57 > >>>>> 4194304 1196.31 > >>>>> > >>>>> Let us know if you get better performance with appropriate CPU mapping. > >>>>> > >>>>> Thanks, > >>>>> > >>>>> DK > >>>>> > >>>>> > >>>>> On Mon, 15 Jun 2009, Christopher Co wrote: > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>>> I am using MVAPICH2 1.4 with the default configuration (since the CX-1 > >>>>>> uses Mellanox Infiniband). I am fairly certain my CPU mapping was > >>>>>> on-node for both cases (curiously, is there a way for MVAPICH2 to print > >>>>>> out the nodes/cores running). I have the numbers for Ping Pong for the > >>>>>> off-node case. 
I should have included this in my earlier message: > >>>>>> Processes # repetitions #bytes Intel MPI time (usec)] MVAPICH2 time > >>>>>> (usec) > >>>>>> 2 1000 0 4.16 3.4 > >>>>>> > >>>>>> 1000 1 4.67 3.56 > >>>>>> > >>>>>> 1000 2 4.21 3.56 > >>>>>> > >>>>>> 1000 4 4.23 3.62 > >>>>>> > >>>>>> 1000 8 4.33 3.63 > >>>>>> > >>>>>> 1000 16 4.33 3.64 > >>>>>> > >>>>>> 1000 32 4.38 3.73 > >>>>>> > >>>>>> 1000 64 4.44 3.92 > >>>>>> > >>>>>> 1000 128 5.61 4.71 > >>>>>> > >>>>>> 1000 256 5.92 5.23 > >>>>>> > >>>>>> 1000 512 6.52 5.79 > >>>>>> > >>>>>> 1000 1024 7.68 7.06 > >>>>>> > >>>>>> 1000 2048 9.97 9.36 > >>>>>> > >>>>>> 1000 4096 12.39 11.97 > >>>>>> > >>>>>> 1000 8192 17.86 22.53 > >>>>>> > >>>>>> 1000 16384 27.44 28.27 > >>>>>> > >>>>>> 1000 32768 40.32 39.82 > >>>>>> > >>>>>> 640 65536 63.61 62.97 > >>>>>> > >>>>>> 320 131072 109.69 110.01 > >>>>>> > >>>>>> 160 262144 204.71 206.9 > >>>>>> > >>>>>> 80 524288 400.72 397.1 > >>>>>> > >>>>>> 40 1048576 775.64 776.45 > >>>>>> > >>>>>> 20 2097152 1523.95 1535.65 > >>>>>> > >>>>>> 10 4194304 3018.84 3054.89 > >>>>>> > >>>>>> > >>>>>> > >>>>>> Chris > >>>>>> > >>>>>> > >>>>>> Dhabaleswar Panda wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>> Can you tell us which version of MVAPICH2 you are using and which > >>>>>>> option(s) are configured? Are you using correct CPU mapping in both > >>>>>>> cases? > >>>>>>> > >>>>>>> DK > >>>>>>> > >>>>>>> On Mon, 15 Jun 2009, Christopher Co wrote: > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> I am doing performance analysis on a Cray CX1 machine. I have run the > >>>>>>>> Pallas MPI benchmark and have noticed a considerable performance > >>>>>>>> difference between MVAPICH2 and Intel MPI on all the tests when shared > >>>>>>>> memory is used. I have also run the benchmark for non-shared memory and > >>>>>>>> the two performed nearly the same (MVAPICH2 was slightly faster). 
Is > >>>>>>>> this slowdown on shared memory a known issue and/or are there fixes or > >>>>>>>> switches I can enable or disable to get more speed? > >>>>>>>> > >>>>>>>> To give an idea of what I'm seeing, for the simple Ping Pong test for > >>>>>>>> two processes on the same chip, the numbers looks like: > >>>>>>>> > >>>>>>>> Processes # repetitions > >>>>>>>> #bytes Intel MPI time (usec) MVAPICH2 > >>>>>>>> time (usec) > >>>>>>>> 2 1000 0 0.35 0.94 > >>>>>>>> > >>>>>>>> 1000 1 0.44 1.24 > >>>>>>>> > >>>>>>>> 1000 2 0.45 1.17 > >>>>>>>> > >>>>>>>> 1000 4 0.45 1.08 > >>>>>>>> > >>>>>>>> 1000 8 0.45 1.11 > >>>>>>>> > >>>>>>>> 1000 16 0.44 1.13 > >>>>>>>> > >>>>>>>> 1000 32 0.45 1.21 > >>>>>>>> > >>>>>>>> 1000 64 0.47 1.35 > >>>>>>>> > >>>>>>>> 1000 128 0.48 1.75 > >>>>>>>> > >>>>>>>> 1000 256 0.51 2.92 > >>>>>>>> > >>>>>>>> 1000 512 0.57 3.41 > >>>>>>>> > >>>>>>>> 1000 1024 0.76 3.85 > >>>>>>>> > >>>>>>>> 1000 2048 0.98 4.27 > >>>>>>>> > >>>>>>>> 1000 4096 1.53 5.14 > >>>>>>>> > >>>>>>>> 1000 8192 2.59 8.04 > >>>>>>>> > >>>>>>>> 1000 16384 4.86 14.34 > >>>>>>>> > >>>>>>>> 1000 32768 7.17 33.92 > >>>>>>>> > >>>>>>>> 640 65536 11.65 43.27 > >>>>>>>> > >>>>>>>> 320 131072 20.97 66.98 > >>>>>>>> > >>>>>>>> 160 262144 39.64 118.58 > >>>>>>>> > >>>>>>>> 80 524288 84.91 224.40 > >>>>>>>> > >>>>>>>> 40 1048576 212.76 461.80 > >>>>>>>> > >>>>>>>> 20 2097152 458.55 1053.67 > >>>>>>>> > >>>>>>>> 10 4194304 1738.30 2649.30 > >>>>>>>> > >>>>>>>> > >>>>>>>> Hopefully the table came out clear. MVAPICH2 always lags behind by a > >>>>>>>> considerable amount. Any insight is much appreciated. Thanks! 
> >>>>>>>> > >>>>>>>> > >>>>>>>> Chris Co > >>>>>>>> _______________________________________________ > >>>>>>>> mvapich-discuss mailing list > >>>>>>>> mvapich-discuss@cse.ohio-state.edu > >>>>>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>> > >>>>> > >>>>> > >>> > >>> > >>> > >> _______________________________________________ > >> mvapich-discuss mailing list > >> mvapich-discuss@cse.ohio-state.edu > >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > >> > >> > > > > _______________________________________________ > > mvapich-discuss mailing list > > mvapich-discuss@cse.ohio-state.edu > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > > > From forum.san at gmail.com Sat Jun 27 10:39:46 2009 From: forum.san at gmail.com (Sangamesh B) Date: Sat Jun 27 10:40:08 2009 Subject: [mvapich-discuss] High latency on IB? In-Reply-To: References: Message-ID: Hi, I had used mvapich2-0.9.8. Voltaire switch, Mellanox cards. The reason for higher latency, mvapich2 was not linked with ofed libraries. Now Mvapich2-1.2rc1 is installed, and got following latency: # OSU MPI Latency Test (Version 2.2) # Size Latency (us) 0 3.96 1 4.04 2 4.04 4 4.06 8 4.04 16 4.14 32 4.17 64 4.45 128 5.51 256 6.11 512 6.98 1024 8.50 2048 10.42 4096 14.26 8192 21.76 16384 39.10 32768 59.67 65536 101.40 131072 185.33 262144 351.24 524288 684.08 1048576 1350.54 2097152 2681.89 4194304 5341.90 Another question: Some of the CFD applications (binary distribution) work with VAPI interface. To build MVAPICH2-0.9.8 for VAPI (new MVAPICH2 not supporting VAPI), what drivers have to be used? Is OFED sufficient? If its not, then mention which package has to be used with the URL to download. Lets know what are all the possibilities to get mvapich2 installed with VAPI.. -- Thank you On Fri, Jun 26, 2009 at 8:28 PM, Dhabaleswar Panda wrote: > These numbers are too high for MVAPICH2. 
Which version of MVAPICH2 and
> which interface (gen2, uDAPL, etc.) you are using. What computing
> platform, network adapter and switch you are using? Check to see whether
> you are configuring the stack properly and whether your systems
> (platforms, adapters, switches and cables) are stable.
>
> DK
>
> On Fri, 26 Jun 2009, Sangamesh B wrote:
>
>> Dear Mvapich2,
>>
>>       The following are the osu latency tests taken with mvapich2 and mpich2.
>>
>> On IB:
>>
>> [user@cluster IB_MVAPICH2]$ /opt/mvapich2/bin/mpirun -machinefile
>> ibmachines -np 2 ./osu_latency_MVAPICH2
>> # OSU MPI Latency Test (Version 2.2)
>> # Size          Latency (us)
>> 0               20.84
>> 1               21.74
>> 2               21.74
>> 4               21.69
>> 8               21.62
>> 16              21.67
>> 32              21.74
>> 64              21.75
>> 128             21.79
>> 256             22.65
>> 512             23.38
>> 1024            24.79
>> 2048            27.43
>> 4096            31.25
>> 8192            38.43
>> 16384           55.92
>> 32768           88.96
>> 65536           160.26
>> 131072          240.20
>> 262144          434.30
>> 524288          753.20
>> 1048576         1400.61
>> 2097152         2619.34
>> 4194304         5014.10
>> [locuz@cluster IB_MVAPICH2]$
>>
>> With MPICH2 on ethernet:
>>
>> [user@cluster ETH_MPICH2]$ /opt/mpich2/bin/mpirun -machinefile
>> mpich2macfile -np 2 ./osu_latency_MPICH2
>> # OSU MPI Latency Test (Version 2.2)
>> # Size          Latency (us)
>> 0               62.47
>> 1               62.53
>> 2               62.49
>> 4               62.48
>> 8               62.45
>> 16              62.47
>> 32              62.89
>> 64              63.60
>> 128             123.22
>> 256             124.83
>> 512             124.91
>> 1024            124.98
>> 2048            124.92
>> 4096            124.97
>> 8192            187.37
>> 16384           201.73
>> 32768           374.72
>> 65536           685.11
>> 131072          1186.62
>> 262144          2435.70
>> 524288          4629.19
>> 1048576         9057.72
>> 2097152         17981.10
>> 4194304         35723.62
>> [user@cluster ETH_MPICH2]$
>>
>> The latency value is very high. Are these right? Because, the osu
>> benchmarks taken earlier on other clusters, were starting from 3.
>>
>> What could be the reason for this? Is there any way to improve it?
>>
>> I've taken care of mpdboot to use proper interface (i.e. IB or
>> ethernet) wrt mvapich2 and mpich2(the same applies to machinefile
>> also).
>>
>> Thank you
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss@cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>
>

From panda at cse.ohio-state.edu Sat Jun 27 11:16:33 2009
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Sat Jun 27 11:16:55 2009
Subject: [mvapich-discuss] High latency on IB?
In-Reply-To:
Message-ID:

Good to know that you found the problem.

VAPI interface has been deprecated from both MVAPICH and MVAPICH2 since
people are not using it any longer. You need to move to OFED Gen2
interface. That is the most commonly used interface these days. Also, the
latest version of MVAPICH2 is 1.4RC1. You need to move to that to get the
best set of features, performance, scalability and fault tolerance.

DK

On Sat, 27 Jun 2009, Sangamesh B wrote:

> Hi,
>
> I had used mvapich2-0.9.8. Voltaire switch, Mellanox cards.
>
> The reason for higher latency, mvapich2 was not linked with ofed libraries.
> Now Mvapich2-1.2rc1 is installed, and got following latency: > > # OSU MPI Latency Test (Version 2.2) > # Size Latency (us) > 0 3.96 > 1 4.04 > 2 4.04 > 4 4.06 > 8 4.04 > 16 4.14 > 32 4.17 > 64 4.45 > 128 5.51 > 256 6.11 > 512 6.98 > 1024 8.50 > 2048 10.42 > 4096 14.26 > 8192 21.76 > 16384 39.10 > 32768 59.67 > 65536 101.40 > 131072 185.33 > 262144 351.24 > 524288 684.08 > 1048576 1350.54 > 2097152 2681.89 > 4194304 5341.90 > > Another question: Some of the CFD applications (binary distribution) > work with VAPI interface. To build MVAPICH2-0.9.8 for VAPI (new > MVAPICH2 not supporting VAPI), what drivers have to be used? > > Is OFED sufficient? If its not, then mention which package has to be > used with the URL to download. > > Lets know what are all the possibilities to get mvapich2 installed with VAPI.. > > -- > Thank you > > On Fri, Jun 26, 2009 at 8:28 PM, Dhabaleswar > Panda wrote: > > These numbers are too high for MVAPICH2. Which version of MVAPICH2 and > > which interface (gen2, uDAPL, etc.) you are using. What computing > > platform, network adapter and switch you are using? Check to see whether > > you are configuring the stack properly and whether your systems > > (platforms, adapters, switches and cables) are stable. > > > > DK > > > > On Fri, 26 Jun 2009, Sangamesh B wrote: > > > >> Dear Mvapich2, > >> > >>       The following are the osu latency tests taken with mvapich2 and mpich2. 
> >> > >> On IB: > >> > >> [user@cluster IB_MVAPICH2]$ /opt/mvapich2/bin/mpirun -machinefile > >> ibmachines -np 2 ./osu_latency_MVAPICH2 > >> # OSU MPI Latency Test (Version 2.2) > >> # Size          Latency (us) > >> 0               20.84 > >> 1               21.74 > >> 2               21.74 > >> 4               21.69 > >> 8               21.62 > >> 16              21.67 > >> 32              21.74 > >> 64              21.75 > >> 128             21.79 > >> 256             22.65 > >> 512             23.38 > >> 1024            24.79 > >> 2048            27.43 > >> 4096            31.25 > >> 8192            38.43 > >> 16384           55.92 > >> 32768           88.96 > >> 65536           160.26 > >> 131072          240.20 > >> 262144          434.30 > >> 524288          753.20 > >> 1048576         1400.61 > >> 2097152         2619.34 > >> 4194304         5014.10 > >> [locuz@cluster IB_MVAPICH2]$ > >> > >> With MPICH2 on ethernet: > >> > >> [user@cluster ETH_MPICH2]$ /opt/mpich2/bin/mpirun -machinefile > >> mpich2macfile -np 2 ./osu_latency_MPICH2 > >> # OSU MPI Latency Test (Version 2.2) > >> # Size          Latency (us) > >> 0               62.47 > >> 1               62.53 > >> 2               62.49 > >> 4               62.48 > >> 8               62.45 > >> 16              62.47 > >> 32              62.89 > >> 64              63.60 > >> 128             123.22 > >> 256             124.83 > >> 512             124.91 > >> 1024            124.98 > >> 2048            124.92 > >> 4096            124.97 > >> 8192            187.37 > >> 16384           201.73 > >> 32768           374.72 > >> 65536           685.11 > >> 131072          1186.62 > >> 262144          2435.70 > >> 524288          4629.19 > >> 1048576         9057.72 > >> 2097152         17981.10 > >> 4194304         35723.62 > >> [user@cluster ETH_MPICH2]$ > >> > >> The latency value is very high. Are these right? Because, the osu > >> benchmarks taken earlier on other clusters, were starting from 3. 
> >> > >> What could be the reason for this? Is there any way to improve it? > >> > >> I've taken care of mpdboot to use proper interface (i.e. IB or > >> ethernet) wrt mvapich2 and mpich2(the same applies to machinefile > >> also). > >> > >> Thank  you > >> _______________________________________________ > >> mvapich-discuss mailing list > >> mvapich-discuss@cse.ohio-state.edu > >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > >> > > > > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From rpsmic001 at gmail.com Mon Jun 29 01:45:01 2009 From: rpsmic001 at gmail.com (Michael Rapson) Date: Mon Jun 29 01:45:25 2009 Subject: [mvapich-discuss] compiling MVAPICH against GCC-4.3.3 Message-ID: <73bd53f00906282245w7e569ea7r5da5c14a9090d21e@mail.gmail.com> Hi all, I am coming in at the end of a conversation between Atencio and Jonathan discussing a problem compiling against GCC-4.3.3. I am also compiling against GCC-4.3.3 and am running into the same issue with the iostream.h header file (and I presume many similar files since this is just a testcase). I am new to the MVAPICH mailing list and it seems like a patch for the problem has been written, but it hasn't made it onto the archives of mvapich-discuss. Does someone have a copy of the patch that they could send me or where else could I download it from? I have been trying to install mvapich-1.1-2009-06-21. Is it likely that the patch would have already been incorporated into the newer daily tarballs? Thanks for your help. 
Regards,
Michael

From perkinjo at cse.ohio-state.edu Mon Jun 29 07:59:51 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Mon Jun 29 08:00:21 2009
Subject: [mvapich-discuss] compiling MVAPICH against GCC-4.3.3
In-Reply-To: <73bd53f00906282245w7e569ea7r5da5c14a9090d21e@mail.gmail.com>
References: <73bd53f00906282245w7e569ea7r5da5c14a9090d21e@mail.gmail.com>
Message-ID: <20090629115951.GA2432@cse.ohio-state.edu>

Skipped content of type multipart/mixed-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090629/3f0aa7e1/attachment-0001.bin

From saurabh.barve at gmail.com Tue Jun 30 01:16:01 2009
From: saurabh.barve at gmail.com (Saurabh Barve)
Date: Tue Jun 30 01:16:25 2009
Subject: [mvapich-discuss] Problem Compiling MVAPICH2 using PGI Compilers
Message-ID:

Hi,

I'm trying to build MVAPICH2 (mvapich2-1.4rc1) on a CentOS Linux
machine, and am running into errors.

Here is how I run the configure script:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CC=pgcc CXX=pgCC F77=pgf77 F90=pgf90 CFLAGS="-D_X86_64_ -D_SMP_ -DCH_PSM" \
    LIBS="-lpthread -lpsm_infinipath" ./configure --enable-f77 --enable-f90 \
    --enable-cxx --with-device=ch3:psm --with-arch=LINUX --with-romio \
    --without-mpe --prefix=/opt/mvapich2/pgi
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

This is the error I get when I run 'make':

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
...
...
make[4]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/pm/mpirun'
make[3]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/pm'
make[2]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/pm'
make[1]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src'
make[1]: Entering directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/examples'
../bin/mpicc -o cpi cpi.o -lm
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/lib/libmpich.a(mpid_irecv.o): In function `MPID_Irecv':
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_irecv.c:84: undefined reference to `MPIDI_CH3_iRecv'
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_irecv.c:90: undefined reference to `MPIDI_CH3_iRecv'
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/lib/libmpich.a(mpid_recv.o): In function `MPID_Recv':
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_recv.c:62: undefined reference to `MPIDI_CH3_Recv'
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_recv.c:72: undefined reference to `MPIDI_CH3_Recv'
make[1]: *** [cpi] Error 2
make[1]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/examples'
make: *** [all-redirect] Error 2

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

More details about the system I'm using:

1) Operating System - CentOS Linux 5.0

2) Kernel version - 2.6.18-8.1.14.el5

3) PGI Compiler Suite - Version 8.0-2

4) MVAPICH2 version 1.4rc1

MVAPICH2 builds fine for me when I use the Intel compilers (icc, icpc,
ifort) and use the same configure options as above.

What am I doing wrong?
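
For anyone triaging a similar failure, here is a minimal sketch of how the distinct unresolved symbols could be pulled out of a saved make log. It is only an illustration: the `make_log` string is an excerpt of the linker output in this message (file paths shortened), and the helper name `undefined_symbols` is hypothetical, not part of MVAPICH2 or any build tool.

```python
import re

# Excerpt of the linker output from this message (paths shortened for brevity).
make_log = """\
mpid_irecv.c:84: undefined reference to `MPIDI_CH3_iRecv'
mpid_irecv.c:90: undefined reference to `MPIDI_CH3_iRecv'
mpid_recv.c:62: undefined reference to `MPIDI_CH3_Recv'
mpid_recv.c:72: undefined reference to `MPIDI_CH3_Recv'
"""


def undefined_symbols(log):
    """Return the distinct unresolved symbols, in first-seen order."""
    seen = []
    # GNU ld quotes the symbol name as `name' in this message format.
    for match in re.finditer(r"undefined reference to `([^']+)'", log):
        symbol = match.group(1)
        if symbol not in seen:
            seen.append(symbol)
    return seen


print(undefined_symbols(make_log))
```

In this log the four error lines reduce to two symbols, both referenced from the ch3 device sources (mpid_irecv.c and mpid_recv.c).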

Thanks,
Saurabh
--
Fortune favors the Barve

From perkinjo at cse.ohio-state.edu Tue Jun 30 08:00:06 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Tue Jun 30 08:00:49 2009
Subject: [mvapich-discuss] Problem Compiling MVAPICH2 using PGI Compilers
In-Reply-To:
References:
Message-ID: <20090630120005.GA2761@cse.ohio-state.edu>

On Mon, Jun 29, 2009 at 10:16:01PM -0700, Saurabh Barve wrote:
> Hi,
>
> I'm trying to build MVAPICH2 (mvapich2-1.4rc1) on a CentOS Linux
> machine, and am running into errors.
>
> Here is how I run the configure script:
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> CC=pgcc CXX=pgCC F77=pgf77 F90=pgf90 CFLAGS="-D_X86_64_ -D_SMP_ -
> DCH_PSM" LIBS="-lpthread -lpsm_infinipath" ./configure --enable-f77 --
> enable-f90 --enable-cxx --with-device=ch3:psm --with-arch=LINUX --with-
> romio --without-mpe --prefix=/opt/mvapich2/pgi
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Don't set CFLAGS or LIBS; this is taken care of by the supplied configure
options. There also shouldn't be a need to specify the arch. Do you get
the same error with the following command?

./configure --with-device=ch3:psm --with-romio --without-mpe --prefix=/opt/mvapich2/pgi CC=pgcc CXX=pgCC F77=pgf77 F90=pgf90

>
>
> This is the error I get when I run 'make':
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> ...
> ...
> make[4]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/ > mvapich2-1.4rc1/src/pm/mpirun' > make[3]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/ > mvapich2-1.4rc1/src/pm' > make[2]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/ > mvapich2-1.4rc1/src/pm' > make[1]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/ > mvapich2-1.4rc1/src' > make[1]: Entering directory `/usr/src/redhat/SOURCES/mvapich2/pgi/ > mvapich2-1.4rc1/examples' > ../bin/mpicc -o cpi cpi.o -lm > /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/lib/ > libmpich.a(mpid_irecv.o): In function `MPID_Irecv': > /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/ > mpid_irecv.c:84: undefined reference to `MPIDI_CH3_iRecv' > /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/ > mpid_irecv.c:90: undefined reference to `MPIDI_CH3_iRecv' > /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/lib/ > libmpich.a(mpid_recv.o): In function `MPID_Recv': > /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/ > mpid_recv.c:62: undefined reference to `MPIDI_CH3_Recv' > /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/ > mpid_recv.c:72: undefined reference to `MPIDI_CH3_Recv' > make[1]: *** [cpi] Error 2 > make[1]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/ > mvapich2-1.4rc1/examples' > make: *** [all-redirect] Error 2 > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > More details about the system I'm using: > > 1) Operating System - CentOS Linux 5.0 > > 2) Kernel version - 2.6.18-8.1.14.el5 > > 3) PGI Compiler Suite - Version 8.0-2 > > 4) MVAPICH2 version 1.4rc1 > > MVAPICH2 builds fine for me when I use the Intel compilers (icc, icpc, > ifort) and use the same configure options as above. > > What am I doing wrong? 
>
> Thanks,
> Saurabh
> --
> Fortune favors the Barve
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090630/7ddef1ff/attachment.bin

From rpsmic001 at gmail.com Tue Jun 30 09:11:01 2009
From: rpsmic001 at gmail.com (Michael Rapson)
Date: Tue Jun 30 09:11:29 2009
Subject: [mvapich-discuss] compiling MVAPICH against GCC-4.3.3
In-Reply-To: <20090629115951.GA2432@cse.ohio-state.edu>
References: <73bd53f00906282245w7e569ea7r5da5c14a9090d21e@mail.gmail.com> <20090629115951.GA2432@cse.ohio-state.edu>
Message-ID: <73bd53f00906300611n47d7e8c3o3f966b890de68e8f@mail.gmail.com>

Hi there Jonathan,

Thanks for the patch. I applied it and it fixed the C++ problem.

I was able to build the library, but needed a workaround for what I
think is an unrelated problem. For some reason the values of
MPI_ADDRESS_KIND and MPI_OFFSET_KIND in mpif.h (all copies) are not
determined correctly. I edited all versions of mpif.h by hand and gave
these terms the value 8 (found in another MVAPICH install), then ran
make mpi-modules, make install, and make mpi-lib-test.

The library passed all its internal checks, and I was planning to send
you a note letting you know that it worked once I had run some of the
tests in packages depending on MPI (PETSc and Trilinos). Cracking the
llsubmit script is taking longer than I thought, though: I am getting
"Child exited abnormally!" errors, which I see from the archives can be
related to the scheduler (the cluster uses Tivoli Load Leveler
software).

So in summary: thanks, I am 90% sure the patch worked on my system, but
I am tracking down the correct submission script before I can be
certain. Thanks for the help!

Cheers,
Michael

On Mon, Jun 29, 2009 at 1:59 PM, Jonathan Perkins wrote:
> On Mon, Jun 29, 2009 at 07:45:01AM +0200, Michael Rapson wrote:
>> Hi all,
>>
>> I am coming in at the end of a conversation between Atencio and
>> Jonathan discussing a problem compiling against GCC-4.3.3. I am also
>> compiling against GCC-4.3.3 and am running into the same issue with
>> the iostream.h header file (and I presume many similar files since
>> this is just a testcase). I am new to the MVAPICH mailing list and it
>> seems like a patch for the problem has been written, but it hasn't
>> made it onto the archives of mvapich-discuss. Does someone have a copy
>> of the patch that they could send me or where else could I download
>> it from?
>
> I'm attaching it in this reply, it won't show on the list but you'll get
> it directly.
>
>>
>> I have been trying to install mvapich-1.1-2009-06-21. Is it likely
>> that the patch would have already been incorporated into the newer
>> daily tarballs?
>
> It should have been, this is an oversight on my part. It'll be in
> tonight's nightly tarball.
>
>>
>> Thanks for your help.
>>
>> Regards,
>> Michael
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss@cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
> --
> Jonathan Perkins
> http://www.cse.ohio-state.edu/~perkinjo
>

From perkinjo at cse.ohio-state.edu Tue Jun 30 09:53:14 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Tue Jun 30 09:53:40 2009
Subject: [mvapich-discuss] compiling MVAPICH against GCC-4.3.3
In-Reply-To: <73bd53f00906300611n47d7e8c3o3f966b890de68e8f@mail.gmail.com>
References: <73bd53f00906282245w7e569ea7r5da5c14a9090d21e@mail.gmail.com> <20090629115951.GA2432@cse.ohio-state.edu> <73bd53f00906300611n47d7e8c3o3f966b890de68e8f@mail.gmail.com>
Message-ID: <20090630135314.GD2761@cse.ohio-state.edu>

On Tue, Jun 30, 2009 at 03:11:01PM +0200, Michael Rapson wrote:
> Hi there Jonathan,
>
> Thanks for the patch, I applied it and it fixed the c++ problem.

Glad to hear this.

>
> I was able to build the library, but needed to use a work around for
> an unrelated problem (I think). For some reason the value of
> MPI_ADDRESS_KIND and MPI_OFFSET_KIND in mpif.h (all copies) is not
> determined correctly. I edited all versions of mpif.h by hand and
> gave these terms the value 8 (found in other mvapich install) then ran
> make mpi-modules, make install, and make mpi-lib-test.

That's odd. Is there anything exotic about your system (architecture)?

>
> The library passed all its internal checks and I was planning to send
> you a note letting you know that it worked once I had run some of the
> tests in packages depending on mpi. (PETSc and Trilinos) cracking the
> llsubmit script is taking longer than I thought though (I am getting
> "Child exited abnormally!" errors which I see from the archives can be
> related to the scheduler (the cluster uses Tivoli Load Leveler
> software).
>
> So summary, thanks I am 90% sure the patch worked on my system but am
> tracking down the correct submission script before I can be certain.
> Thanks for the help!

Thanks for the feedback.

>
> Cheers,
> Michael

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090630/cf8b6488/attachment.bin

From rpsmic001 at gmail.com Tue Jun 30 10:21:08 2009
From: rpsmic001 at gmail.com (Michael Rapson)
Date: Tue Jun 30 10:21:40 2009
Subject: [mvapich-discuss] compiling MVAPICH against GCC-4.3.3
In-Reply-To: <20090630135314.GD2761@cse.ohio-state.edu>
References: <73bd53f00906282245w7e569ea7r5da5c14a9090d21e@mail.gmail.com> <20090629115951.GA2432@cse.ohio-state.edu> <73bd53f00906300611n47d7e8c3o3f966b890de68e8f@mail.gmail.com> <20090630135314.GD2761@cse.ohio-state.edu>
Message-ID: <73bd53f00906300721r1b084a2bo8f2d752ccb61ed2f@mail.gmail.com>

Hi there,

The exotic thing about the architecture is that everything dates from
2006. It is an IBM e1350 Cluster with AMD Opteron processors (x86_64)
but it was commissioned in 2006 and the software has been left as is
since then. (Only the original software is supported fully). In
particular it is still using OFED version 1.1.

There is an existing MVAPICH installation on the machine (version
0.9.8 apparently) which I have used to compare my new build against,
in particular getting those parameters I mentioned, but I need to use
more recent gcc compilers for some of my code. (The gcc version on the
machine is 3.3.3.)
I have figured out / borrowed from other applications a good submit
script for the old MVAPICH install (pasted below), but as I say this
method of running the executable gives a 'Child exited abnormally!'
error (slightly better than when I use the mpirun_rsh -rsh alternative,
where I get permission denied messages).

So regarding the unusual architecture, it's mainly that all the system
files are very old compared to the gcc-4.3.3 compiler. (I have used
one of the other tips for linking to an old OFED package, removing the
-DXRC flag as suggested in
http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2009-March/002184.html.)
Perhaps this explains why the PARAMETER statement's values aren't
determined correctly?

Any ideas about whether some of the environment variables below are
necessary or could be causing problems would be most appreciated.

Thanks,

Michael

# @ shell = /usr/bin/ksh
# @ output = $(Executable).$(Cluster).out
# @ error = $(Executable).$(Cluster).err
# @ wall_clock_limit = 12:00:00
# @ class= UAT
# @ node = 4
# @ node_usage = not_shared
# @ job_type = MPICH
# @ notification = error
# @ resources = ConsumableCpus(1)
# @ tasks_per_node = 4
# @ environment = GOTO_NUM_THREADS=1; OMP_NUM_THREADS=1; VIADEV_CLUSTER_SIZE=AUTO; VIADEV_DEFAULT_RETRY_COUNT=15; VIADEV_DEFAULT_TIME_OUT=22; VIADEV_NUM_RDMA_BUFFER=4; VIADEV_ADAPTIVE_RDMA_LIMIT=2; VIADEV_SQ_SIZE_MAX=64; VIADEV_DEFAULT_MAX_SG_LIST=1; VIADEV_MAX_INLINE_SIZE=80; VIADEV_SRQ_SIZE=2048; VIADEV_VBUF_TOTAL_SIZE=2048; VIADEV_VBUF_POOL_SIZE=512; VIADEV_VBUF_SECONDARY_POOL_SIZE=128; VIADEV_ENABLE_AFFINITY=0; DISABLE_RDMA_ALLTOALL=1; DISABLE_RDMA_ALLGATHER=1; DISABLE_RDMA_BARRIER=1

# @ queue
echo "++++++++++"
echo "host files is:"
echo " "
cat $LOADL_HOSTFILE
cp $LOADL_HOSTFILE $LOADL_STEP_OUT.hostfile
echo " "
echo "++++++++++"

/CHPC/usr/local/mvapich/bin/mpirun \
-np $LOADL_TOTAL_TASKS \
-hostfile $LOADL_HOSTFILE \
src/snes/examples/tutorials/ex5f
#src/snes/examples/tutorials/ex19

From perkinjo at cse.ohio-state.edu Tue Jun 30 11:21:43 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Tue Jun 30 11:22:07 2009
Subject: [mvapich-discuss] compiling MVAPICH against GCC-4.3.3
In-Reply-To: <73bd53f00906300721r1b084a2bo8f2d752ccb61ed2f@mail.gmail.com>
References: <73bd53f00906282245w7e569ea7r5da5c14a9090d21e@mail.gmail.com> <20090629115951.GA2432@cse.ohio-state.edu> <73bd53f00906300611n47d7e8c3o3f966b890de68e8f@mail.gmail.com> <20090630135314.GD2761@cse.ohio-state.edu> <73bd53f00906300721r1b084a2bo8f2d752ccb61ed2f@mail.gmail.com>
Message-ID: <20090630152143.GE2761@cse.ohio-state.edu>

On Tue, Jun 30, 2009 at 04:21:08PM +0200, Michael Rapson wrote:
> Hi there,
>
> The exotic thing about the architecture is that everything dates from
> 2006. It is an IBM e1350 Cluster with AMD Opteron processors (x86_64)
> but it was commissioned in 2006 and the software has been left as is
> since then. (Only the original software is supported fully). In
> particular it is still using OFED version 1.1.

I wouldn't think that this would cause a problem. Can you try using a
fresh copy of the source (perhaps last night's tarball) and give it a
run through?

>
> There is an existing MVAPICH installation on the machine (version
> 0.9.8 apparently) which I have used to compare my new build against,
> in particular getting those parameters I mentioned, but I need to use
> more recent gcc compilers for some of my code. (The gcc version on the
> machine is 3.3.3.) I have figured out / borrowed from other
> applications a good submit script for the old MVAPICH install (pasted
> below) but as I say this method of running the executable gives a
> 'Child exited abnormally!' error (slightly better than when I use
> mpirun_rsh -rsh alternative where I get permission denied messages).

It may be easier to debug the mpirun_rsh alternative. Is ssh enabled on
your system? Was there a particular file or executable listed when it
said permission denied?

>
> So regarding the unusual architecture, its mainly that all the system
> files are very old compared to the gcc-4.3.3 compiler. (I have used
> one of the other tips for linking to an old ofed package, removing the
> -DXRC flag as suggested in
> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2009-March/002184.html.)
> Perhaps this explains why the PARAMETER statement's values aren't
> determined correctly?

This is unrelated.

>
> Any ideas about whether some of the environment variables below are
> necessary or could be causing problems would be most appreciated.
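The reply that follows names six variables that are no longer used and two that were renamed. As a rough illustration only, the cleanup of the "# @ environment" value from the submit script could be sketched as below. This assumes a POSIX shell with tr, sed and awk; the function name update_viadev_env is made up for this example and is not part of MVAPICH or LoadLeveler, and the variable lists are copied from the reply rather than checked against any MVAPICH release.

```shell
#!/bin/sh
# Sketch: rewrite a LoadLeveler "environment" value for a newer MVAPICH 1.x.
# The obsolete names and the two renames come from the reply in this thread;
# update_viadev_env is a hypothetical helper name.

update_viadev_env() {
    # $1: semicolon-separated NAME=VALUE list, as on the "# @ environment" line
    printf '%s\n' "$1" | tr ';' '\n' |
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
    awk -F= '
        BEGIN {
            # Variables reported as no longer used
            n = split("VIADEV_DEFAULT_RETRY_COUNT VIADEV_DEFAULT_TIME_OUT " \
                      "VIADEV_DEFAULT_MAX_SG_LIST DISABLE_RDMA_ALLTOALL " \
                      "DISABLE_RDMA_ALLGATHER DISABLE_RDMA_BARRIER", names, " ")
            for (i = 1; i <= n; i++) obsolete[names[i]] = 1
            # Variables reported as renamed
            rename["VIADEV_SQ_SIZE_MAX"] = "VIADEV_SQ_SIZE"
            rename["VIADEV_ENABLE_AFFINITY"] = "VIADEV_USE_AFFINITY"
        }
        $1 == "" { next }          # skip empty entries
        $1 in obsolete { next }    # drop variables that are no longer used
        {
            name = ($1 in rename) ? rename[$1] : $1
            value = substr($0, length($1) + 2)   # text after the first "="
            out = out (out == "" ? "" : "; ") name "=" value
        }
        END { print out }
    '
}
```

Calling update_viadev_env with the quoted environment string as its single argument prints the cleaned list on one line, ready to paste back into the job command file.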
The following variables are no longer used:

VIADEV_DEFAULT_RETRY_COUNT
VIADEV_DEFAULT_TIME_OUT
VIADEV_DEFAULT_MAX_SG_LIST
DISABLE_RDMA_ALLTOALL
DISABLE_RDMA_ALLGATHER
DISABLE_RDMA_BARRIER

These should be changed:

VIADEV_SQ_SIZE_MAX=64; --> VIADEV_SQ_SIZE=64;
VIADEV_ENABLE_AFFINITY=0; --> VIADEV_USE_AFFINITY=0;

If a fresh build with the edits to the environment variables doesn't
work, I wonder whether you can isolate the run to a single node to see
if the error shows in this case as well.

>
>
> Thanks,
>
> Michael

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20090630/186d25b9/attachment.bin

From saurabh.barve at gmail.com Tue Jun 30 11:56:31 2009
From: saurabh.barve at gmail.com (Saurabh Barve)
Date: Tue Jun 30 11:56:58 2009
Subject: [mvapich-discuss] Problem Compiling MVAPICH2 using PGICompilers
In-Reply-To: <20090630120005.GA2761@cse.ohio-state.edu>
References: <20090630120005.GA2761@cse.ohio-state.edu>
Message-ID: <580D1A32C30545B6A8DB7CB941CAC33F@LENOVO>

Jonathan,

I get the same error when I try to use your configure command and then
run 'make':

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
../bin/mpicc -I../src/include -I../src/include -c cpi.c
../bin/mpicc -o cpi cpi.o -lm
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/lib/libmpich.a(mpid_irecv.o): In function `MPID_Irecv':
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_irecv.c:84: undefined reference to `MPIDI_CH3_iRecv'
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_irecv.c:90: undefined reference to `MPIDI_CH3_iRecv'
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/lib/libmpich.a(mpid_recv.o): In function `MPID_Recv':
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_recv.c:62: undefined reference to `MPIDI_CH3_Recv'
/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_recv.c:72: undefined reference to `MPIDI_CH3_Recv'
make[1]: *** [cpi] Error 2
make[1]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/examples'
make: *** [all-redirect] Error 2
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-Saurabh

--------------------------------------------------
From: "Jonathan Perkins"
Sent: Tuesday, June 30, 2009 5:00 AM
To: "Saurabh Barve"
Cc:
Subject: Re: [mvapich-discuss] Problem Compiling MVAPICH2 using PGICompilers

From karl at tacc.utexas.edu Tue Jun 30 12:46:49 2009
From: karl at tacc.utexas.edu (Karl W. Schulz)
Date: Tue Jun 30 12:47:20 2009
Subject: [mvapich-discuss] Problem Compiling MVAPICH2 using PGICompilers
In-Reply-To: <580D1A32C30545B6A8DB7CB941CAC33F@LENOVO>
References: <20090630120005.GA2761@cse.ohio-state.edu> <580D1A32C30545B6A8DB7CB941CAC33F@LENOVO>
Message-ID: <15DB2DD0-0F62-44AF-8866-0D51D2A3B234@tacc.utexas.edu>

Have you tried adding "-D_GNU_SOURCE" to your CFLAGS?
Not sure if that's the issue here, but we did have to do this to get a
PGI build set up locally.

Karl

From saurabh.barve at gmail.com Tue Jun 30 13:47:37 2009
From: saurabh.barve at gmail.com (Saurabh Barve)
Date: Tue Jun 30 13:49:46 2009
Subject: [mvapich-discuss] Problem Compiling MVAPICH2 using PGICompilers
In-Reply-To: <15DB2DD0-0F62-44AF-8866-0D51D2A3B234@tacc.utexas.edu>
References: <20090630120005.GA2761@cse.ohio-state.edu> <580D1A32C30545B6A8DB7CB941CAC33F@LENOVO> <15DB2DD0-0F62-44AF-8866-0D51D2A3B234@tacc.utexas.edu>
Message-ID: <4C662271-1D10-4FBE-846C-3575D619EA86@gmail.com>

On Jun 30, 2009, at 9:46 AM, Karl W. Schulz wrote:

> Have you tried adding "-D_GNU_SOURCE" to your CFLAGS? Not sure if
> that's the issue here, but we did have to do this to get a PGI build
> setup locally.
>
> Karl
>

Yes, I tried adding "-D_GNU_SOURCE" to my CFLAGS. I had seen that 'fix' posted to the mailing list, but it didn't work for me; I keep getting the same error.

Did you use gcc/g++ as the C/C++ compilers, or pgcc/pgCC? I would like to compile everything with the PGI compilers.

---------------------------------------------------------------
Fortune Favors the Barve

> On Jun 30, 2009, at 10:56 AM, Saurabh Barve wrote:
>
>> On Mon, Jun 29, 2009 at 10:16:01PM -0700, Saurabh Barve wrote:
>>> Hi,
>>>
>>> I'm trying to build MVAPICH2 (mvapich2-1.4rc1) on a CentOS Linux
>>> machine, and am running into errors.
>>>
>>> Here is how I run the configure script:
>>>
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> CC=pgcc CXX=pgCC F77=pgf77 F90=pgf90 CFLAGS="-D_X86_64_ -D_SMP_ -DCH_PSM" \
>>>   LIBS="-lpthread -lpsm_infinipath" ./configure --enable-f77 --enable-f90 \
>>>   --enable-cxx --with-device=ch3:psm --with-arch=LINUX --with-romio \
>>>   --without-mpe --prefix=/opt/mvapich2/pgi
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>> Don't set CFLAGS or LIBS; this is taken care of by the supplied
>> configure options. There also shouldn't be a need to specify the
>> arch.
>>
>> Do you get the same error with the following command?
>>
>> ./configure --with-device=ch3:psm --with-romio --without-mpe \
>>   --prefix=/opt/mvapich2/pgi CC=pgcc CXX=pgCC F77=pgf77 F90=pgf90
>>
>>>
>>>
>>> This is the error I get when I run 'make':
>>>
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> ...
>>> ...
>>> make[4]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/pm/mpirun'
>>> make[3]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/pm'
>>> make[2]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/pm'
>>> make[1]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src'
>>> make[1]: Entering directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/examples'
>>> ../bin/mpicc -o cpi cpi.o -lm
>>> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/lib/libmpich.a(mpid_irecv.o): In function `MPID_Irecv':
>>> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_irecv.c:84: undefined reference to `MPIDI_CH3_iRecv'
>>> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_irecv.c:90: undefined reference to `MPIDI_CH3_iRecv'
>>> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/lib/libmpich.a(mpid_recv.o): In function `MPID_Recv':
>>> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_recv.c:62: undefined reference to `MPIDI_CH3_Recv'
>>> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_recv.c:72: undefined reference to `MPIDI_CH3_Recv'
>>> make[1]: *** [cpi] Error 2
>>> make[1]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/examples'
>>> make: *** [all-redirect] Error 2
>>>
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>>> More details about the system I'm using:
>>>
>>> 1) Operating System - CentOS Linux 5.0
>>>
>>> 2) Kernel version - 2.6.18-8.1.14.el5
>>>
>>> 3) PGI Compiler Suite - Version 8.0-2
>>>
>>> 4) MVAPICH2 version 1.4rc1
>>>
>>> MVAPICH2 builds fine for me when I use the Intel compilers (icc, icpc,
>>> ifort) and use the same configure options as above.
>>>
>>> What am I doing wrong?
>>>
>>> Thanks,
>>> Saurabh
>>> --
>>
>>
>> Jonathan,
>>
>> I get the same error when I try to use your configure command and
>> then run 'make':
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> ../bin/mpicc -I../src/include -I../src/include -c cpi.c
>> ../bin/mpicc -o cpi cpi.o -lm
>> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/lib/libmpich.a(mpid_irecv.o): In function `MPID_Irecv':
>> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_irecv.c:84: undefined reference to `MPIDI_CH3_iRecv'
>> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_irecv.c:90: undefined reference to `MPIDI_CH3_iRecv'
>> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/lib/libmpich.a(mpid_recv.o): In function `MPID_Recv':
>> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_recv.c:62: undefined reference to `MPIDI_CH3_Recv'
>> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_recv.c:72: undefined reference to `MPIDI_CH3_Recv'
>> make[1]: *** [cpi] Error 2
>> make[1]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/examples'
>> make: *** [all-redirect] Error 2
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>> -Saurabh
>> --------------------------------------------------
>> From: "Jonathan Perkins"
>> Sent: Tuesday, June 30, 2009 5:00 AM
>> To: "Saurabh Barve"
>> Cc:
>> Subject: Re: [mvapich-discuss] Problem Compiling MVAPICH2 using PGICompilers
>>
>>
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss@cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From perkinjo at cse.ohio-state.edu Tue Jun 30 17:40:14 2009
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Tue Jun 30 17:40:38 2009
Subject: [mvapich-discuss] Problem Compiling MVAPICH2 using PGICompilers
In-Reply-To: <580D1A32C30545B6A8DB7CB941CAC33F@LENOVO>
References: <20090630120005.GA2761@cse.ohio-state.edu> <580D1A32C30545B6A8DB7CB941CAC33F@LENOVO>
Message-ID: <20090630214014.GF2476@cse.ohio-state.edu>

On Tue, Jun 30, 2009 at 08:56:31AM -0700, Saurabh Barve wrote:
> Manually quoting what Jonathan Perkins wrote in an intermediate message:
> > On Mon, Jun 29, 2009 at 10:16:01PM -0700, Saurabh Barve wrote:
> >> Hi,
> >>
> >> I'm trying to build MVAPICH2 (mvapich2-1.4rc1) on a CentOS Linux
> >> machine, and am running into errors.
> >>
> >> Here is how I run the configure script:
> >>
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> CC=pgcc CXX=pgCC F77=pgf77 F90=pgf90 CFLAGS="-D_X86_64_ -D_SMP_ -DCH_PSM" \
> >>   LIBS="-lpthread -lpsm_infinipath" ./configure --enable-f77 --enable-f90 \
> >>   --enable-cxx --with-device=ch3:psm --with-arch=LINUX --with-romio \
> >>   --without-mpe --prefix=/opt/mvapich2/pgi
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> > Don't set CFLAGS or LIBS; this is taken care of by the supplied
> > configure options. There also shouldn't be a need to specify the
> > arch.
> >
> > Do you get the same error with the following command?
> >
> > ./configure --with-device=ch3:psm --with-romio --without-mpe \
> >   --prefix=/opt/mvapich2/pgi CC=pgcc CXX=pgCC F77=pgf77 F90=pgf90
> >
>
> I get the same error when I try to use your configure command and then
> run 'make':
>

Thanks for this report. We have reproduced this issue and are investigating it further. We'll get back to you once we find a solution.

> >>
> >>
> >> This is the error I get when I run 'make':
> >>
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> ...
> >> ...
> >> make[4]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/pm/mpirun'
> >> make[3]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/pm'
> >> make[2]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/pm'
> >> make[1]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src'
> >> make[1]: Entering directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/examples'
> >> ../bin/mpicc -o cpi cpi.o -lm
> >> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/lib/libmpich.a(mpid_irecv.o): In function `MPID_Irecv':
> >> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_irecv.c:84: undefined reference to `MPIDI_CH3_iRecv'
> >> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_irecv.c:90: undefined reference to `MPIDI_CH3_iRecv'
> >> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/lib/libmpich.a(mpid_recv.o): In function `MPID_Recv':
> >> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_recv.c:62: undefined reference to `MPIDI_CH3_Recv'
> >> /usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/src/mpid/ch3/src/mpid_recv.c:72: undefined reference to `MPIDI_CH3_Recv'
> >> make[1]: *** [cpi] Error 2
> >> make[1]: Leaving directory `/usr/src/redhat/SOURCES/mvapich2/pgi/mvapich2-1.4rc1/examples'
> >> make: *** [all-redirect] Error 2
> >>
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>
> >>
> >> More details about the system I'm using:
> >>
> >> 1) Operating System - CentOS Linux 5.0
> >>
> >> 2) Kernel version - 2.6.18-8.1.14.el5
> >>
> >> 3) PGI Compiler Suite - Version 8.0-2
> >>
> >> 4) MVAPICH2 version 1.4rc1
> >>
> >> MVAPICH2 builds fine for me when I use the Intel compilers (icc, icpc,
> >> ifort) and use the same configure options as above.
> >>
> >> What am I doing wrong?
> >>
> >> Thanks,
> >> Saurabh
> >> --

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo