From luitjens at cs.utah.edu Mon Nov 3 14:17:46 2008
From: luitjens at cs.utah.edu (Justin)
Date: Mon Nov 3 14:18:42 2008
Subject: [mvapich-discuss] hang at large numbers of processors
Message-ID: <490F4E5A.8050606@cs.utah.edu>

We are running into hangs on Ranger using mvapich that are not present on other machines. These hangs seem to occur only on large problems with large numbers of processors. We have run into similar problems on some LLNL machines in the past and were able to get around them by disabling the shared memory optimizations. In these cases the problem had to do with fixed-size buffers used in the shared memory optimizations.

We would like to disable shared memory on Ranger but are confused by all the different parameters dealing with shared memory optimizations. How do we know which parameters affect the run? For example, do we use the parameters that begin with MV_ or VIADEV_? From past conversations I have had with support teams, the parameters that have an effect vary according to the hardware/MPI build. What is the best way to determine which parameters are active?

Also here is a stacktrace from one of our hangs:

.stack.i132-112.ranger.tacc.utexas.edu.16033
Intel(R) Debugger for applications running on Intel(R) 64, Version 10.1-35 , Build 20080310
Attaching to program: /work/00975/luitjens/SCIRun/optimized/Packages/Uintah/StandAlone/sus, process 16033
Reading symbols from /work/00975/luitjens/SCIRun/optimized/Packages/Uintah/StandAlone/sus...(no debugging symbols found)...done.
smpi_net_lookup () at mpid_smpi.c:1381
#0 0x00002ada6b4d8510 in smpi_net_lookup () at mpid_smpi.c:1381
#1 0x00002ada6b4d8414 in MPID_SMP_Check_incoming () at mpid_smpi.c:1360
#2 0x00002ada6b4f293c in MPID_DeviceCheck (blocking=7154160) at viacheck.c:505
#3 0x00002ada6b4d600b in MPID_RecvComplete (request=0x6d29f0, status=0x10, error_code=0x4) at mpid_recv.c:106
#4 0x00002ada6b4fe2f7 in MPI_Waitall (count=7154160, array_of_requests=0x10, array_of_statuses=0x4) at waitall.c:190
#5 0x00002ada6b4e6d3c in MPI_Sendrecv (sendbuf=0x6d29f0, sendcount=16, sendtype=4, dest=14, sendtag=22045696, recvbuf=0x1506680, recvcount=1, recvtype=6, source=2278, recvtag=14, comm=130, status=0x7fff4385028c) at sendrecv.c:98
#6 0x00002ada6b4c4d2d in intra_Allreduce (sendbuf=0x6d29f0, recvbuf=0x10, count=4, datatype=0xe, op=22045696, comm=0x1506680) at intra_fns_new.c:5682
#7 0x00002ada6b4c4516 in intra_shmem_Allreduce (sendbuf=0x6d29f0, recvbuf=0x10, count=1, datatype=0xe, op=22045696, comm=0x1506680) at intra_fns_new.c:6014
#8 0x00002ada6b48f286 in MPI_Allreduce (sendbuf=0x6d29f0, recvbuf=0x10, count=4, datatype=14, op=22045696, comm=22046336) at allreduce.c:83
#9 0x00002ada6a67a4f8 in _ZN6Uintah12MPIScheduler7executeEii () in /work/00975/luitjens/SCIRun/optimized/lib/libPackages_Uintah_CCA_Components_Schedulers.so

In this case, what would be the likely parameter I could adjust to potentially stop a hang in MPI_Allreduce?

Thanks,
Justin

From panda at cse.ohio-state.edu Mon Nov 3 15:38:20 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon Nov 3 15:38:31 2008
Subject: [mvapich-discuss] hang at large numbers of processors
In-Reply-To: <490F4E5A.8050606@cs.utah.edu>
Message-ID:

Justin,

Could you let us know which stack (MVAPICH or MVAPICH2) you are using on Ranger? These two stacks name their parameters differently. Also, at what exact process count do you see this problem?
If you can also let us know the version number of mvapich/mvapich2 stack and/or the path of the MPI library on Ranger, it will be helpful.

Thanks,

DK

From luitjens at cs.utah.edu Mon Nov 3 15:43:26 2008
From: luitjens at cs.utah.edu (Justin)
Date: Mon Nov 3 15:44:24 2008
Subject: [mvapich-discuss] hang at large numbers of processors
In-Reply-To:
References:
Message-ID: <490F626E.5000808@cs.utah.edu>

Hi,

We are using mvapich_devel_1.0 on Ranger.
I am seeing my current lockup at 16,384 processors at the following stacktrace:

#0 0x00002b015c4f85ff in poll_rdma_buffer (vbuf_addr=0x7fff52849020, out_of_order=0x7fff52849030) at viacheck.c:206
#1 0x00002b015c4f79ed in MPID_DeviceCheck (blocking=1384419360) at viacheck.c:505
#2 0x00002b015c4db00b in MPID_RecvComplete (request=0x7fff52849020, status=0x7fff52849030, error_code=0x2b) at mpid_recv.c:106
#3 0x00002b015c5032f7 in MPI_Waitall (count=1384419360, array_of_requests=0x7fff52849030, array_of_statuses=0x2b) at waitall.c:190
#4 0x00002b015c4ebd3c in MPI_Sendrecv (sendbuf=0x7fff52849020, sendcount=1384419376, sendtype=43, dest=35, sendtag=64, recvbuf=0x2aaaad75d000, recvcount=1, recvtype=6, source=3585, recvtag=14, comm=130, status=0x7fff528491fc) at sendrecv.c:98
#5 0x00002b015c4c9d2d in intra_Allreduce (sendbuf=0x7fff52849020, recvbuf=0x7fff52849030, count=4, datatype=0x23, op=64, comm=0x2aaaad75d000) at intra_fns_new.c:5682
#6 0x00002b015c4c9516 in intra_shmem_Allreduce (sendbuf=0x7fff52849020, recvbuf=0x7fff52849030, count=1, datatype=0x23, op=64, comm=0x2aaaad75d000) at intra_fns_new.c:6014
#7 0x00002b015c494286 in MPI_Allreduce (sendbuf=0x7fff52849020, recvbuf=0x7fff52849030, count=43, datatype=35, op=64, comm=-1384787968) at allreduce.c:83
#8 0x00002b015b67f4f8 in _ZN6Uintah12MPIScheduler7executeEii () in /work/00975/luitjens/SCIRun/optimized/lib/libPackages_Uintah_CCA_Components_Schedulers.so

I was seeing lockups at smaller powers of two, but adding the following seemed to stop those:

export VIADEV_USE_SHMEM_COLL=0
export VIADEV_USE_SHMEM_ALLREDUCE=0

Now I am just seeing it at 16K. What is odd to me is that if the two commands above stop the shared memory optimizations, then why does the stacktrace still show 'intra_shmem_Allreduce' being called?

Here is some other info that might be useful:

login3:/scratch/00975/luitjens/scalingice/ranger.med/ %mpirun_rsh -v
OSU MVAPICH VERSION 1.0-SingleRail
Build-ID: custom

MPI Path:
lrwxrwxrwx 1 tg802225 G-800594 46 May 27 14:29 include -> /opt/apps/intel10_1/mvapich-devel/1.0/include/
lrwxrwxrwx 1 tg802225 G-800594 49 May 27 14:29 lib -> /opt/apps/intel10_1/mvapich-devel/1.0/lib/shared/

Thanks,
Justin

From koop at cse.ohio-state.edu Mon Nov 3 16:50:53 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Mon Nov 3 16:51:05 2008
Subject: [mvapich-discuss] hang at large numbers of processors
In-Reply-To: <490F626E.5000808@cs.utah.edu>
Message-ID:

Justin,

I think there are a couple of things here:

1.) Simply exporting the variables is not sufficient for the setup at TACC. You'll need to set them the following way:

ibrun VIADEV_USE_SHMEM_COLL=0 ./executable_name

Since the ENVs weren't being propagated, the setting wasn't taking effect (and that is why you still saw the shmem functions in the backtrace).

2.) There was a limitation in the 1.0 versions where the shared memory bcast implementation would hang when run on more than 1K nodes. Since the shared memory allreduce uses a bcast internally, it is also hanging. You can try just disabling the bcast:

ibrun VIADEV_USE_SHMEM_BCAST=0 ./executable_name

Let us know if this works or if you have additional questions.

Thanks,
Matt

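The point about exported variables not reaching the ranks can be illustrated with a minimal shell sketch. It makes no claim about how ibrun itself forwards variables: `env -i` is used here only as a stand-in for a launcher that starts the remote process without the caller's environment, and the `probe`, `dropped` and `passed` names are illustrative.

```shell
#!/bin/sh
# Command run in the child: print VIADEV_USE_SHMEM_BCAST as the freshly
# started process sees it, or "unset" if the variable never arrived.
probe='echo "${VIADEV_USE_SHMEM_BCAST:-unset}"'

# Setting made in the submitting shell, as in the "export" attempt above.
export VIADEV_USE_SHMEM_BCAST=0

# Stand-in for a launcher that does not forward the caller's environment
# (env -i starts the child with an empty environment).
dropped=$(env -i /bin/sh -c "$probe")

# Stand-in for giving the setting on the launch line, in the style of
# "ibrun VIADEV_USE_SHMEM_BCAST=0 ./executable_name".
passed=$(env -i VIADEV_USE_SHMEM_BCAST=0 /bin/sh -c "$probe")

echo "exported only:  $dropped"
echo "on launch line: $passed"
```

If the first line reports "unset" for a real launcher, the library is running with its defaults no matter what was exported in the job script.
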
From luitjens at cs.utah.edu Mon Nov 3 18:09:34 2008
From: luitjens at cs.utah.edu (Justin)
Date: Mon Nov 3 18:10:31 2008
Subject: [mvapich-discuss] hang at large numbers of processors
In-Reply-To:
References:
Message-ID: <490F84AE.6000301@cs.utah.edu>

Are there similar hangs when using mvapich2? A coworker of mine is reporting similar hangs on Abe using mvapich2. I'm not sure of the version.

Justin

From luitjens at cs.utah.edu Mon Nov 3 19:54:45 2008
From: luitjens at cs.utah.edu (Justin)
Date: Mon Nov 3 19:55:45 2008
Subject: [mvapich-discuss] hang at large numbers of processors
In-Reply-To:
References:
Message-ID: <490F9D55.5030901@cs.utah.edu>

Here is an update:

I am running on Ranger with the following ibrun command:

ibrun VIADEV_USE_SHMEM_BCAST=0 VIADEV_USE_SHMEM_ALLREDUCE=0 ../sus

where sus is our executable. With this I'm still occasionally seeing a hang at large numbers of processors at this stack trace:

#0 0x00002abc19a38510 in smpi_net_lookup () at mpid_smpi.c:1381
#1 0x00002abc19a38414 in MPID_SMP_Check_incoming () at mpid_smpi.c:1360
#2 0x00002abc19a5293c in MPID_DeviceCheck (blocking=7154160) at viacheck.c:505
#3 0x00002abc19a3600b in MPID_RecvComplete (request=0x6d29f0, status=0x10, error_code=0xb) at mpid_recv.c:106
#4 0x00002abc19a5e2f7 in MPI_Waitall (count=7154160, array_of_requests=0x10, array_of_statuses=0xb) at waitall.c:190
#5 0x00002abc19a46d3c in MPI_Sendrecv (sendbuf=0x6d29f0, sendcount=16, sendtype=11, dest=11, sendtag=22046016, recvbuf=0x1506810, recvcount=1, recvtype=6, source=2912, recvtag=14, comm=130, status=0x7fff952efd2c) at sendrecv.c:98
#6 0x00002abc19a24d2d in intra_Allreduce (sendbuf=0x6d29f0, recvbuf=0x10, count=4, datatype=0xb, op=22046016, comm=0x1506810) at intra_fns_new.c:5682
#7 0x00002abc19a24516 in intra_shmem_Allreduce (sendbuf=0x6d29f0, recvbuf=0x10, count=1, datatype=0xb, op=22046016, comm=0x1506810) at intra_fns_new.c:6014
#8 0x00002abc199ef286 in MPI_Allreduce (sendbuf=0x6d29f0, recvbuf=0x10, count=11, datatype=11, op=22046016, comm=22046736) at allreduce.c:83
#9 0x00002abc18bda4f8 in _ZN6Uintah12MPIScheduler7executeEii () in /work/00975/luitjens/SCIRun/optimized/lib/libPackages_Uintah_CCA_Components_Schedulers.so
#10 0x0000000007d0db10

Allreduce is still using shared memory. Do you have any more suggestions?

Thanks,
Justin

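Since the thread uses the presence of intra_shmem_Allreduce in a backtrace as the sign that a setting did not take effect, that check can be scripted. The following is only a sketch: `has_shmem_frames` is a hypothetical helper, and the sample file holds two frames abbreviated from the trace above (arguments replaced by "(...)").

```shell
#!/bin/sh
# has_shmem_frames FILE: succeed if the saved backtrace in FILE contains a
# shared-memory collective frame such as intra_shmem_Allreduce.
has_shmem_frames() {
    grep -q 'intra_shmem_' "$1"
}

# Illustrative sample file; a real run would point at a saved stack file.
trace=$(mktemp)
cat > "$trace" <<'EOF'
#6 0x00002abc19a24d2d in intra_Allreduce (...) at intra_fns_new.c:5682
#7 0x00002abc19a24516 in intra_shmem_Allreduce (...) at intra_fns_new.c:6014
EOF

if has_shmem_frames "$trace"; then
    result="shmem collective still in use"
else
    result="no shmem collective frames"
fi
echo "$result"
rm -f "$trace"
```

The pattern only covers the collective entry points named in this thread; frames such as smpi_net_lookup belong to the shared-memory point-to-point path and are not matched.
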
to potentially stop a hang in MPI_Allreduce? >>>> >>>> Thanks, >>>> Justin >>>> _______________________________________________ >>>> mvapich-discuss mailing list >>>> mvapich-discuss@cse.ohio-state.edu >>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >>>> >>>> >>>> >> _______________________________________________ >> mvapich-discuss mailing list >> mvapich-discuss@cse.ohio-state.edu >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >> >> From panda at cse.ohio-state.edu Mon Nov 3 21:25:18 2008 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Mon Nov 3 21:25:30 2008 Subject: [mvapich-discuss] hang at large numbers of processors In-Reply-To: <490F84AE.6000301@cs.utah.edu> Message-ID: > Are there similar hangs when using mvapich2? A coworker of mine is > reporting similar hangs on Abe using mvapich2. I'm not sure of the version. It will be good to know what version of MVAPICH2 is running on Abe. Also, it will be helpful to get a backtrace of the hang. This will help us to determine whether the causes are same or not. Thanks, DK > Justin > > Matthew Koop wrote: > > Justin, > > > > I think there are a couple things here: > > > > 1.) Simply exporting the variables is not sufficient for the setup at > > TACC. You'll need to set it the following way: > > > > ibrun VIADEV_USE_SHMEM_COLL=0 ./executable_name > > > > Since the ENVs weren't being propogated the setting wasn't taking effect > > (and that is why you still saw the shmem functions in the backtrace). > > > > 2.) There was a limitation in the 1.0 versions where when the > > shared memory bcast implementation was run on more than 1K nodes there > > would be a hang. Since the shared memory allreduce uses a bcast internally > > it is also hanging you can try just disabling the bcast: > > > > ibrun VIADEV_USE_SHMEM_BCAST=0 ./executable_name > > > > Let us know if this works or if you have additional questions. 
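Whether a setting such as VIADEV_USE_SHMEM_COLL=0 actually reaches the launched processes can be checked directly, independently of what the backtrace shows. The following is only a minimal sketch, assuming a POSIX shell is available on the compute nodes; the function name report_viadev and the file name check_viadev.sh are hypothetical and not part of MVAPICH or ibrun.

```shell
#!/bin/sh
# Hypothetical helper (not part of MVAPICH or ibrun): list the VIADEV_*
# variables visible in this process's environment, sorted, one per line.
report_viadev() {
    # sed -n prints only matching lines and exits 0 even when nothing matches.
    found=$(env | sed -n '/^VIADEV_/p' | sort)
    if [ -n "$found" ]; then
        printf '%s\n' "$found"
    else
        echo "no VIADEV_* variables set"
    fi
}

# Print the node name first so reports from different nodes can be told apart.
echo "host: $(uname -n)"
report_viadev
```

Saved as check_viadev.sh and launched the same way as the application at a small process count (for example `ibrun ./check_viadev.sh`, once with the variables exported in the job script and once with them given on the ibrun command line), the output would show which form reaches the processes. That ibrun accepts a shell script in place of the executable is an assumption here.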
> >
> > Thanks,
> > Matt

From luitjens at cs.utah.edu Mon Nov 3 21:32:10 2008
From: luitjens at cs.utah.edu (Justin)
Date: Mon Nov 3 21:33:08 2008
Subject: [mvapich-discuss] hang at large numbers of processors
In-Reply-To:
References:
Message-ID: <490FB42A.5070708@cs.utah.edu>

OK, I will work with my co-worker to get this information. It may take a
few days as I don't have an account on Abe and will have to relay
everything through him.

Justin

Dhabaleswar Panda wrote:
>> Are there similar hangs when using mvapich2? A coworker of mine is
>> reporting similar hangs on Abe using mvapich2. I'm not sure of the version.
>
> It will be good to know what version of MVAPICH2 is running on Abe. Also,
> it will be helpful to get a backtrace of the hang. This will help us to
> determine whether the causes are the same or not.
>
> Thanks,
>
> DK

From koop at cse.ohio-state.edu Mon Nov 3 22:04:54 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Mon Nov 3 22:05:05 2008
Subject: [mvapich-discuss] hang at large numbers of processors
In-Reply-To: <490F9D55.5030901@cs.utah.edu>
Message-ID:

Justin,

Thanks for this update. Even though the backtrace shows
'intra_shmem_Allreduce', it is not following the shared memory path;
within that function a fallback is called.

A couple things:

- Does it work if all shared memory collectives are turned off?
(VIADEV_USE_SHMEM_COLL=0)

- Have you tried the 1.0.1 installed on TACC at all?

Matt

On Mon, 3 Nov 2008, Justin wrote:

> Here is an update:
>
> I am running on Ranger with the following ibrun command:
>
> ibrun VIADEV_USE_SHMEM_BCAST=0 VIADEV_USE_SHMEM_ALLREDUCE=0 ../sus
>
> where sus is our executable. With this I'm still occasionally seeing a
> hang at large numbers of processors at this stack trace:
>
> #0 0x00002abc19a38510 in smpi_net_lookup () at mpid_smpi.c:1381
> #1 0x00002abc19a38414 in MPID_SMP_Check_incoming () at mpid_smpi.c:1360
> #2 0x00002abc19a5293c in MPID_DeviceCheck (blocking=7154160) at viacheck.c:505
> #3 0x00002abc19a3600b in MPID_RecvComplete (request=0x6d29f0, status=0x10, error_code=0xb) at mpid_recv.c:106
> #4 0x00002abc19a5e2f7 in MPI_Waitall (count=7154160, array_of_requests=0x10, array_of_statuses=0xb) at waitall.c:190
> #5 0x00002abc19a46d3c in MPI_Sendrecv (sendbuf=0x6d29f0, sendcount=16, sendtype=11, dest=11, sendtag=22046016, recvbuf=0x1506810, recvcount=1, recvtype=6, source=2912, recvtag=14, comm=130, status=0x7fff952efd2c) at sendrecv.c:98
> #6 0x00002abc19a24d2d in intra_Allreduce (sendbuf=0x6d29f0, recvbuf=0x10, count=4, datatype=0xb, op=22046016, comm=0x1506810) at intra_fns_new.c:5682
> #7 0x00002abc19a24516 in intra_shmem_Allreduce (sendbuf=0x6d29f0, recvbuf=0x10, count=1, datatype=0xb, op=22046016, comm=0x1506810) at intra_fns_new.c:6014
> #8 0x00002abc199ef286 in MPI_Allreduce (sendbuf=0x6d29f0, recvbuf=0x10, count=11, datatype=11, op=22046016, comm=22046736) at allreduce.c:83
> #9 0x00002abc18bda4f8 in _ZN6Uintah12MPIScheduler7executeEii () in /work/00975/luitjens/SCIRun/optimized/lib/libPackages_Uintah_CCA_Components_Schedulers.so
> #10 0x0000000007d0db10
>
> Allreduce is still using shared memory.
>
> Do you have any more suggestions?
>
> Thanks,
> Justin

From karl at tacc.utexas.edu Tue Nov 4 08:00:06 2008
From: karl at tacc.utexas.edu (Karl W. Schulz)
Date: Tue Nov 4 08:00:22 2008
Subject: [mvapich-discuss] hang at large numbers of processors
In-Reply-To:
References:
Message-ID: <6F274BA7-9057-437F-AD91-AA6DB941EA4A@tacc.utexas.edu>

Just FYI so that everyone is aware: we actually do propagate all user
environment variables on Ranger, so it is sufficient to simply set VIADEV
parameters in your job script as long as jobs are launched with ibrun.

Karl

On Nov 3, 2008, at 9:04 PM, Matthew Koop wrote:

> Justin,
>
> Thanks for this update. Even though the backtrace shows
> 'intra_shmem_Allreduce', it is not following the shared memory path;
> within that function a fallback is called.
>
> A couple things:
>
> - Does it work if all shared memory collectives are turned off?
> (VIADEV_USE_SHMEM_COLL=0)
>
> - Have you tried the 1.0.1 installed on TACC at all?
>
> Matt
>
> On Mon, 3 Nov 2008, Justin wrote:
>
>> Here is an update:
>>
>> I am running on Ranger with the following ibrun command:
>>
>> ibrun VIADEV_USE_SHMEM_BCAST=0 VIADEV_USE_SHMEM_ALLREDUCE=0 ../sus
>>
>> where sus is our executable.

From panda at cse.ohio-state.edu Tue Nov 4 09:18:34 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Tue Nov 4 09:18:47 2008
Subject: [mvapich-discuss] hang at large numbers of processors
In-Reply-To: <6F274BA7-9057-437F-AD91-AA6DB941EA4A@tacc.utexas.edu>
Message-ID:

Karl,

> Just FYI so that everyone is aware, we actually do propagate all user
> environment variables on Ranger so it is sufficient to simply set
> VIADEV parameters in your job script as long as jobs are launched with
> ibrun.

Thanks for the clarification here.

DK

From luitjens at cs.utah.edu Tue Nov 4 10:16:28 2008
From: luitjens at cs.utah.edu (Justin)
Date: Tue Nov 4 10:17:30 2008
Subject: [mvapich-discuss] hang at large numbers of processors
In-Reply-To: <6F274BA7-9057-437F-AD91-AA6DB941EA4A@tacc.utexas.edu>
References: <6F274BA7-9057-437F-AD91-AA6DB941EA4A@tacc.utexas.edu>
Message-ID: <4910674C.5080703@cs.utah.edu>

Thanks, I thought this was the case but I wasn't positive.

It appears my hangs have been resolved by doing two things:

1) update from 1.0 to 1.0.1
2) disable shared memory broadcast (would hang on 16K in 1.0.1).

Is number 2 fixed in 1.1? If so when is 1.1's release date?

I will contact TACC and let them know the solution to my problem so they
can relay it to others who have a similar problem.

Thanks,
Justin

From koop at cse.ohio-state.edu Tue Nov 4 15:12:20 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Tue Nov 4 15:12:32 2008
Subject: [mvapich-discuss] hang at large numbers of processors
In-Reply-To: <4910674C.5080703@cs.utah.edu>
Message-ID:

> It appears my hangs have been resolved by doing two things:
>
> 1) update from 1.0 to 1.0.1
> 2) disable shared memory broadcast (would hang on 16K in 1.0.1).

Good to hear that it is working for you now.

> Is number 2 fixed in 1.1? If so when is 1.1's release date?

This is fixed in 1.1 and will be released next week.
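
For reference, the runtime workaround confirmed in this thread can be sketched as a job-script fragment. This is a minimal sketch, assuming an MVAPICH 1.0.1 build that honors the VIADEV_ variables and a launcher that forwards the user environment, as described for ibrun on Ranger. The name ./executable_name is the placeholder used earlier in the thread; the dry-run guard and the child-process check are illustrative additions and are not part of MVAPICH or the TACC tooling.

```shell
#!/bin/sh
# Disable only the shared-memory broadcast, the setting reported in
# this thread to resolve the hang at 16K processors.
export VIADEV_USE_SHMEM_BCAST=0

# Illustrative check: a child process should inherit the exported value.
child_value=$(sh -c 'printf %s "$VIADEV_USE_SHMEM_BCAST"')
echo "child process sees VIADEV_USE_SHMEM_BCAST=$child_value"

# Placeholder binary name (assumption); replace with the real executable.
app=./executable_name

# Launch only when both the launcher and the binary are present;
# otherwise just print the command that would be run.
if command -v ibrun >/dev/null 2>&1 && [ -x "$app" ]; then
    ibrun "$app"
else
    echo "dry run: ibrun $app"
fi
```

The same setting can also be passed on the launcher line, in the form used earlier in the thread: ibrun VIADEV_USE_SHMEM_BCAST=0 ./executable_name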

You can also raise the shared memory bcast limit at compile time in the
1.0.1 release by changing

#define DEFAULT_SHMEM_BCAST_LEADERS 1024

to a higher value (however many nodes are used at maximum) in
src/env/initutil.c. The MPI library would have to be recompiled for this
change to take effect, though.

Thanks,
Matt

> I will contact TACC and let them know the solution to my problem so they
> can relay it to others who have a similar problem.
>
> Karl W. Schulz wrote:
> > Just FYI so that everyone is aware, we actually do propagate all user
> > environment variables on Ranger so it is sufficient to simply set
> > VIADEV parameters in your job script as long as jobs are launched with
> > ibrun.
> >
> > Karl
> >
> > On Nov 3, 2008, at 9:04 PM, Matthew Koop wrote:
> >
> >> Justin,
> >>
> >> Thanks for this update. Even though the backtrace shows
> >> 'intra_shmem_Allreduce' it is not following the shared memory path, within
> >> that function a fallback is called.
> >>
> >> A couple things:
> >>
> >> - Does it work if all shared memory collectives are turned off?
> >> (VIADEV_USE_SHMEM_COLL=0)
> >>
> >> - Have you tried the 1.0.1 installed on TACC at all?
> >>
> >> Matt
> >>
> >> On Mon, 3 Nov 2008, Justin wrote:
> >>
> >>> Here is an update:
> >>>
> >>> I am running on ranger with the following ibrun command:
> >>>
> >>> ibrun VIADEV_USE_SHMEM_BCAST=0 VIADEV_USE_SHMEM_ALLREDUCE=0 ../sus
> >>>
> >>> where sus is our executable.

From karl at tacc.utexas.edu Tue Nov 4 15:42:24 2008
From: karl at tacc.utexas.edu (Karl W. Schulz)
Date: Tue Nov 4 15:42:37 2008
Subject: [mvapich-discuss] hang at large numbers of processors
In-Reply-To:
References:
Message-ID:

Matt,

Just to make sure I'm clear, I assume we should probably use a value
of 3936 for DEFAULT_SHMEM_BCAST_LEADERS for Ranger to support runs
across the entire system (ie. our max # of compute nodes)?

Thanks,

Karl

From panda at cse.ohio-state.edu Tue Nov 4 16:55:46 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Tue Nov 4 16:55:59 2008
Subject: [mvapich-discuss] hang at large numbers of processors
In-Reply-To:
Message-ID:

Karl,

> Just to make sure I'm clear, I assume we should probably use a value
> of 3936 for DEFAULT_SHMEM_BCAST_LEADERS for Ranger to support runs
> across the entire system (ie. our max # of compute nodes)?

In MVAPICH 1.1 and MVAPICH2, we have set it to 4096 (reflecting 4K nodes)
to take care of most of the large-scale IB clusters out there today.
Setting it to 3936 should work. However, it is a non-power-of-2 number. We
have not done in-depth testing whether it will lead to any side effect or
not. To be safe, please use 4096 for the time being.
This parameter is defined as a run-time environmental variable in MVAPICH
1.1 and MVAPICH2. Once these new releases are made (to happen soon) and
you install these versions on Ranger, one can run some experiments with a
value of 3936 and then this value can be adjusted.

Thanks,

DK
> >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Justin > >>>>>>>>> _______________________________________________ > >>>>>>>>> mvapich-discuss mailing list > >>>>>>>>> mvapich-discuss@cse.ohio-state.edu > >>>>>>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich- > >>>>>>>>> discuss > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> mvapich-discuss mailing list > >>>>>>> mvapich-discuss@cse.ohio-state.edu > >>>>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > >>>>>>> > >>>>>>> > >>>>> > >>>> > >>>> _______________________________________________ > >>>> mvapich-discuss mailing list > >>>> mvapich-discuss@cse.ohio-state.edu > >>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > >> > > > From daniel.s.kokron at nasa.gov Wed Nov 5 15:11:43 2008 From: daniel.s.kokron at nasa.gov (Dan Kokron) Date: Wed Nov 5 16:16:47 2008 Subject: [mvapich-discuss] error report Message-ID: <1225915903.17790.338.camel@outfield.gsfc.nasa.gov> MvaPICH-1.1rc1 failed to compile. The build logs are attached. uname -a Linux discover02 2.6.16.53-0.16-smp #1 SMP Tue Oct 2 16:57:49 UTC 2007 x86_64 x86_64 x86_64 GNU/Linux -- Dan Kokron Global Modeling and Assimilation Office NASA Goddard Space Flight Center Greenbelt, MD 20771 Daniel.S.Kokron@nasa.gov Phone: (301) 614-5192 Fax: (301) 614-5304 -------------- next part -------------- A non-text attachment was scrubbed... Name: mvapichlogs.tar Type: application/x-tar Size: 491520 bytes Desc: not available Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20081105/24952da5/mvapichlogs-0001.tar From koop at cse.ohio-state.edu Wed Nov 5 17:33:01 2008 From: koop at cse.ohio-state.edu (Matthew Koop) Date: Wed Nov 5 17:33:11 2008 Subject: [mvapich-discuss] error report In-Reply-To: <1225915903.17790.338.camel@outfield.gsfc.nasa.gov> Message-ID: Dan, Thanks for this feedback. 
It seems that not all of the hooks had been included for all of the debug
flags that you specified in configure. I've checked the necessary changes
into the trunk of MVAPICH (they will be available in tonight's nightly
tarball). These changes will also be reflected in the release version.

I've also attached a patch to this email with the required changes.

Thanks, and let us know if you find any other issues.

Matt

On Wed, 5 Nov 2008, Dan Kokron wrote:

> MvaPICH-1.1rc1 failed to compile. The build logs are attached.
>
> uname -a
> Linux discover02 2.6.16.53-0.16-smp #1 SMP Tue Oct 2 16:57:49 UTC 2007
> x86_64 x86_64 x86_64 GNU/Linux
>
> --
> Dan Kokron
> Global Modeling and Assimilation Office
> NASA Goddard Space Flight Center
> Greenbelt, MD 20771
> Daniel.S.Kokron@nasa.gov
> Phone: (301) 614-5192
> Fax: (301) 614-5304
>
-------------- next part --------------
Index: mpid/ch_hybrid/Makefile.in
===================================================================
--- mpid/ch_hybrid/Makefile.in (revision 3116)
+++ mpid/ch_hybrid/Makefile.in (working copy)
@@ -80,7 +80,7 @@
         mv_send_xrc.c shmem_coll.c mv_init_rc_qp.c mv_rndv.c \
         mv_rndv_rput.c mv_rndv_r3.c mv_rndv_rput_rel.c mv_rndv_rput_unrel.c \
         mv_init_rc_fp.c mv_connect.c \
-        mv_rpool.c mv_init_xrc_qp.c
+        mv_rpool.c mv_init_xrc_qp.c objtrace.c calltrace.c
 
 VIAOBJECTS = mv_init.o mv_finalize.o \
         mv_send.o mv_recv.o mv_check.o mv_threads.o \
@@ -94,7 +94,7 @@
         mv_send_xrc.o shmem_coll.o mv_init_rc_qp.o mv_rndv.o \
         mv_rndv_rput.o mv_rndv_r3.o mv_rndv_rput_rel.o mv_rndv_rput_unrel.o \
         mv_init_rc_fp.o mv_connect.o \
-        mv_rpool.o mv_init_xrc_qp.o
+        mv_rpool.o mv_init_xrc_qp.o objtrace.o calltrace.o
 
 # default_all is the target used by the MPICH build. It can be optimized
 # to not to the ranlib that default does. Is this necessary on modern machines?
Index: mpid/ch_hybrid/objtrace.c
===================================================================
--- mpid/ch_hybrid/objtrace.c (revision 0)
+++ mpid/ch_hybrid/objtrace.c (revision 3117)
@@ -0,0 +1,23 @@
+#include "mpiimpl.h"
+
+#include <stdio.h>
+
+FILE *MPIR_Ref_fp = 0;
+int MPIR_Ref_flags = 0;
+
+void MPIR_Ref_init(
+        int flag,
+        char *filename)
+{
+    MPIR_Ref_flags = flag;
+    if (flag) {
+        if (filename) {
+            MPIR_Ref_fp = fopen( filename, "w" );
+            if (!MPIR_Ref_fp) {
+                MPIR_Ref_flags = 0;
+            }
+        }
+        else
+            MPIR_Ref_fp = stdout;
+    }
+}
Index: mpid/ch_hybrid/calltrace.c
===================================================================
--- mpid/ch_hybrid/calltrace.c (revision 0)
+++ mpid/ch_hybrid/calltrace.c (revision 3117)
@@ -0,0 +1,60 @@
+
+#include <stdio.h>
+#ifdef HAVE_STDLIB_H
+#include <stdlib.h>
+#endif
+
+#ifndef FPRINTF
+#define FPRINTF fprintf
+#endif
+
+#if defined(NEEDS_STDLIB_PROTOTYPES)
+#include
+/*
+   Some gcc installations have out-of-date include files and need these
+   definitions to handle the "missing" prototypes. This is NOT
+   autodetected, but is provided and can be selected by using a switch
+   on the options line.
+
+   These are from stdlib.h, stdio.h, and unistd.h
+ */
+extern int FPRINTF(FILE*,const char*,...);
+extern int fflush(FILE *);
+#endif
+
+#include "calltrace.h"
+
+/* Declarations */
+#ifdef DEBUG_TRACE
+char *(TR_stack[TR_MAX_STACK]);
+int TR_stack_sp = 0, TR_stack_debug = 0;
+
+void TR_stack_init( int flag )
+{
+    TR_stack_debug = flag;
+}
+
+/* Generate a stack trace */
+void TR_stack_print(
+        FILE *fp,
+        int dir )
+{
+    int i;
+
+    if (dir == 1) {
+        for (i=0; i<TR_stack_sp; i++) {
+            FPRINTF( fp, "(%d) %s\n", i, TR_stack[i] );
+        }
+    }
+    else {
+        for (i=TR_stack_sp-1; i>=0; i--) {
+            FPRINTF( fp, "(%d) %s\n", i, TR_stack[i] );
+        }
+    }
+}
+
+#else
+void TR_stack_init( int flag )
+{
+}
+#endif
Index: mpid/ch_gen2/Makefile.in
===================================================================
--- mpid/ch_gen2/Makefile.in (revision 3116)
+++ mpid/ch_gen2/Makefile.in (working copy)
@@ -68,7 +68,7 @@
         collutils.c intra_rdma_barrier.c \
         mpid_mcast.c mcst_grp.c ibmcgrp.c \
         crc32h.c avl.c mem_hooks.c viacoalesce.c shmem_coll.c\
-        async_progress.c
+        async_progress.c calltrace.c objtrace.c
 
 VIAOBJECTS = viainit.o viasend.o viarecv.o viacheck.o \
         viapriv.o viaparam.o viutil.o vbuf.o cm_user.o\
@@ -76,7 +76,8 @@
         mpid_hsend.o mpid_hrecv.o mpid_pack.o cmnargs.o \
         mpid_misc.o dreg.o mpid_smpi.o collutils.o intra_rdma_barrier.o \
         mpid_mcast.o mcst_grp.o ibmcgrp.o crc32h.o avl.o \
-        mem_hooks.o viacoalesce.o shmem_coll.o async_progress.o
+        mem_hooks.o viacoalesce.o shmem_coll.o async_progress.o calltrace.o \
+        objtrace.o
 
 
 # default_all is the target used by the MPICH build. It can be optimized
@@ -157,7 +158,7 @@
         process/pmgr_client.h process/minidaemon.h mpid_misc.c viaparam.h viaconfig.h viadev.h \
         process/common_pmgr_collective.h process/client_pmgr_collective.h \
         cmnargs.c dreg.h mpid.h mpid_smpi.h mpid_smpi.c \
-        process/mpirun_util.c
+        process/mpirun_util.c calltrace.c
 
 #
 # Files from the original ADI that we need, but didn't change,
Index: mpid/ch_gen2/viapriv.h
===================================================================
--- mpid/ch_gen2/viapriv.h (revision 3116)
+++ mpid/ch_gen2/viapriv.h (working copy)
@@ -30,6 +30,7 @@
 
 #include
 
+#include "calltrace.h"
 #include "viaparam.h"
 #include "ibverbs_header.h"
 
Index: mpid/ch_gen2/objtrace.c
===================================================================
--- mpid/ch_gen2/objtrace.c (revision 0)
+++ mpid/ch_gen2/objtrace.c (revision 3117)
@@ -0,0 +1,23 @@
+#include "mpiimpl.h"
+
+#include <stdio.h>
+
+FILE *MPIR_Ref_fp = 0;
+int MPIR_Ref_flags = 0;
+
+void MPIR_Ref_init(
+        int flag,
+        char *filename)
+{
+    MPIR_Ref_flags = flag;
+    if (flag) {
+        if (filename) {
+            MPIR_Ref_fp = fopen( filename, "w" );
+            if (!MPIR_Ref_fp) {
+                MPIR_Ref_flags = 0;
+            }
+        }
+        else
+            MPIR_Ref_fp = stdout;
+    }
+}
Index: mpid/ch_gen2/calltrace.c
===================================================================
--- mpid/ch_gen2/calltrace.c (revision 0)
+++ mpid/ch_gen2/calltrace.c (revision 3117)
@@ -0,0 +1,60 @@
+
+#include <stdio.h>
+#ifdef HAVE_STDLIB_H
+#include <stdlib.h>
+#endif
+
+#ifndef FPRINTF
+#define FPRINTF fprintf
+#endif
+
+#if defined(NEEDS_STDLIB_PROTOTYPES)
+#include
+/*
+   Some gcc installations have out-of-date include files and need these
+   definitions to handle the "missing" prototypes. This is NOT
+   autodetected, but is provided and can be selected by using a switch
+   on the options line.
+
+   These are from stdlib.h, stdio.h, and unistd.h
+ */
+extern int FPRINTF(FILE*,const char*,...);
+extern int fflush(FILE *);
+#endif
+
+#include "calltrace.h"
+
+/* Declarations */
+#ifdef DEBUG_TRACE
+char *(TR_stack[TR_MAX_STACK]);
+int TR_stack_sp = 0, TR_stack_debug = 0;
+
+void TR_stack_init( int flag )
+{
+    TR_stack_debug = flag;
+}
+
+/* Generate a stack trace */
+void TR_stack_print(
+        FILE *fp,
+        int dir )
+{
+    int i;
+
+    if (dir == 1) {
+        for (i=0; i<TR_stack_sp; i++) {
+            FPRINTF( fp, "(%d) %s\n", i, TR_stack[i] );
+        }
+    }
+    else {
+        for (i=TR_stack_sp-1; i>=0; i--) {
+            FPRINTF( fp, "(%d) %s\n", i, TR_stack[i] );
+        }
+    }
+}
+
+#else
+void TR_stack_init( int flag )
+{
+}
+#endif
Index: mpid/ch_smp/Makefile.in
===================================================================
--- mpid/ch_smp/Makefile.in (revision 3116)
+++ mpid/ch_smp/Makefile.in (working copy)
@@ -64,14 +64,14 @@
         mpid_init.c mpid_recv.c mpid_send.c \
         mpid_hsend.c mpid_hrecv.c mpid_pack.c \
         cmnargs.c mpid_misc.c mpid_smpi.c \
-        shmem_coll.c
+        shmem_coll.c objtrace.c calltrace.c
 
 VIAOBJECTS = viainit.o viasend.o viarecv.o viacheck.o \
         viaparam.o viutil.o \
         mpid_init.o mpid_send.o mpid_recv.o \
         mpid_hsend.o mpid_hrecv.o mpid_pack.o cmnargs.o\
         mpid_misc.o mpid_smpi.o \
-        shmem_coll.o
+        shmem_coll.o objtrace.o calltrace.o
 
 # default_all is the target used by the MPICH build. It can be optimized
 # to not to the ranlib that default does. Is this necessary on modern machines?
Index: mpid/ch_smp/objtrace.c
===================================================================
--- mpid/ch_smp/objtrace.c (revision 0)
+++ mpid/ch_smp/objtrace.c (revision 3117)
@@ -0,0 +1,23 @@
+#include "mpiimpl.h"
+
+#include <stdio.h>
+
+FILE *MPIR_Ref_fp = 0;
+int MPIR_Ref_flags = 0;
+
+void MPIR_Ref_init(
+        int flag,
+        char *filename)
+{
+    MPIR_Ref_flags = flag;
+    if (flag) {
+        if (filename) {
+            MPIR_Ref_fp = fopen( filename, "w" );
+            if (!MPIR_Ref_fp) {
+                MPIR_Ref_flags = 0;
+            }
+        }
+        else
+            MPIR_Ref_fp = stdout;
+    }
+}
Index: mpid/ch_smp/calltrace.c
===================================================================
--- mpid/ch_smp/calltrace.c (revision 0)
+++ mpid/ch_smp/calltrace.c (revision 3117)
@@ -0,0 +1,60 @@
+
+#include <stdio.h>
+#ifdef HAVE_STDLIB_H
+#include <stdlib.h>
+#endif
+
+#ifndef FPRINTF
+#define FPRINTF fprintf
+#endif
+
+#if defined(NEEDS_STDLIB_PROTOTYPES)
+#include
+/*
+   Some gcc installations have out-of-date include files and need these
+   definitions to handle the "missing" prototypes. This is NOT
+   autodetected, but is provided and can be selected by using a switch
+   on the options line.
+
+   These are from stdlib.h, stdio.h, and unistd.h
+ */
+extern int FPRINTF(FILE*,const char*,...);
+extern int fflush(FILE *);
+#endif
+
+#include "calltrace.h"
+
+/* Declarations */
+#ifdef DEBUG_TRACE
+char *(TR_stack[TR_MAX_STACK]);
+int TR_stack_sp = 0, TR_stack_debug = 0;
+
+void TR_stack_init( int flag )
+{
+    TR_stack_debug = flag;
+}
+
+/* Generate a stack trace */
+void TR_stack_print(
+        FILE *fp,
+        int dir )
+{
+    int i;
+
+    if (dir == 1) {
+        for (i=0; i<TR_stack_sp; i++) {
+            FPRINTF( fp, "(%d) %s\n", i, TR_stack[i] );
+        }
+    }
+    else {
+        for (i=TR_stack_sp-1; i>=0; i--) {
+            FPRINTF( fp, "(%d) %s\n", i, TR_stack[i] );
+        }
+    }
+}
+
+#else
+void TR_stack_init( int flag )
+{
+}
+#endif
Index: mpid/ch_psm/Makefile.in
===================================================================
--- mpid/ch_psm/Makefile.in (revision 3116)
+++ mpid/ch_psm/Makefile.in (working copy)
@@ -63,13 +63,14 @@
         psmpriv.c psmparam.c \
         mpid_init.c mpid_recv.c mpid_send.c \
         mpid_hsend.c mpid_hrecv.c mpid_pack.c \
-        cmnargs.c mpid_misc.c shmem_coll.c mpid_smpi.c
+        cmnargs.c mpid_misc.c shmem_coll.c mpid_smpi.c objtrace.c calltrace.c
 
 VIAOBJECTS = psmprobe.o psmwait.o psminit.o psmsend.o psmrecv.o psmcheck.o \
         psmpriv.o psmparam.o \
         mpid_init.o mpid_send.o mpid_recv.o \
         mpid_hsend.o mpid_hrecv.o mpid_pack.o cmnargs.o \
-        mpid_misc.o shmem_coll.o mpid_smpi.o
+        mpid_misc.o shmem_coll.o mpid_smpi.o \
+        objtrace.o calltrace.o
 
 # default_all is the target used by the MPICH build. It can be optimized
 # to not to the ranlib that default does. Is this necessary on modern machines?
Index: mpid/ch_psm/objtrace.c
===================================================================
--- mpid/ch_psm/objtrace.c (revision 0)
+++ mpid/ch_psm/objtrace.c (revision 3117)
@@ -0,0 +1,23 @@
+#include "mpiimpl.h"
+
+#include <stdio.h>
+
+FILE *MPIR_Ref_fp = 0;
+int MPIR_Ref_flags = 0;
+
+void MPIR_Ref_init(
+        int flag,
+        char *filename)
+{
+    MPIR_Ref_flags = flag;
+    if (flag) {
+        if (filename) {
+            MPIR_Ref_fp = fopen( filename, "w" );
+            if (!MPIR_Ref_fp) {
+                MPIR_Ref_flags = 0;
+            }
+        }
+        else
+            MPIR_Ref_fp = stdout;
+    }
+}
Index: mpid/ch_psm/calltrace.c
===================================================================
--- mpid/ch_psm/calltrace.c (revision 0)
+++ mpid/ch_psm/calltrace.c (revision 3117)
@@ -0,0 +1,60 @@
+
+#include <stdio.h>
+#ifdef HAVE_STDLIB_H
+#include <stdlib.h>
+#endif
+
+#ifndef FPRINTF
+#define FPRINTF fprintf
+#endif
+
+#if defined(NEEDS_STDLIB_PROTOTYPES)
+#include
+/*
+   Some gcc installations have out-of-date include files and need these
+   definitions to handle the "missing" prototypes. This is NOT
+   autodetected, but is provided and can be selected by using a switch
+   on the options line.
+
+   These are from stdlib.h, stdio.h, and unistd.h
+ */
+extern int FPRINTF(FILE*,const char*,...);
+extern int fflush(FILE *);
+#endif
+
+#include "calltrace.h"
+
+/* Declarations */
+#ifdef DEBUG_TRACE
+char *(TR_stack[TR_MAX_STACK]);
+int TR_stack_sp = 0, TR_stack_debug = 0;
+
+void TR_stack_init( int flag )
+{
+    TR_stack_debug = flag;
+}
+
+/* Generate a stack trace */
+void TR_stack_print(
+        FILE *fp,
+        int dir )
+{
+    int i;
+
+    if (dir == 1) {
+        for (i=0; i<TR_stack_sp; i++) {
+            FPRINTF( fp, "(%d) %s\n", i, TR_stack[i] );
+        }
+    }
+    else {
+        for (i=TR_stack_sp-1; i>=0; i--) {
+            FPRINTF( fp, "(%d) %s\n", i, TR_stack[i] );
+        }
+    }
+}
+
+#else
+void TR_stack_init( int flag )
+{
+}
+#endif

From luitjens at cs.utah.edu Thu Nov 6 16:21:15 2008
From: luitjens at cs.utah.edu (Justin)
Date: Thu Nov 6 16:22:22 2008
Subject: [mvapich-discuss] hang at large numbers of processors
In-Reply-To: <490FB42A.5070708@cs.utah.edu>
References: <490FB42A.5070708@cs.utah.edu>
Message-ID: <49135FCB.2040000@cs.utah.edu>

The build is mvapich2-1.0-intel-ofed-1.2. I don't have a stacktrace yet,
as Abe has very slow queue times.

Justin

Justin wrote:
> Ok, I will work with my co-worker to get this information. It may
> take a few days as I don't have an account on Abe and will have to
> relay everything through him.
>
> Justin
>
> Dhabaleswar Panda wrote:
>>> Are there similar hangs when using mvapich2? A coworker of mine is
>>> reporting similar hangs on Abe using mvapich2. I'm not sure of the
>>> version.
>>>
>>
>> It will be good to know what version of MVAPICH2 is running on Abe.
>> Also,
>> it will be helpful to get a backtrace of the hang. This will help us to
>> determine whether the causes are the same or not.
>>
>> Thanks,
>>
>> DK
>>
>>
>>> Justin
>>>
>>> Matthew Koop wrote:
>>>
>>>> Justin,
>>>>
>>>> I think there are a couple things here:
>>>>
>>>> 1.) Simply exporting the variables is not sufficient for the setup at
>>>> TACC.
You'll need to set it the following way:
>>>>
>>>> ibrun VIADEV_USE_SHMEM_COLL=0 ./executable_name
>>>>
>>>> Since the ENVs weren't being propagated, the setting wasn't taking
>>>> effect
>>>> (and that is why you still saw the shmem functions in the backtrace).
>>>>
>>>> 2.) There was a limitation in the 1.0 versions where, when the
>>>> shared memory bcast implementation was run on more than 1K nodes, there
>>>> would be a hang. Since the shared memory allreduce uses a bcast
>>>> internally,
>>>> it is also hanging; you can try just disabling the bcast:
>>>>
>>>> ibrun VIADEV_USE_SHMEM_BCAST=0 ./executable_name
>>>>
>>>> Let us know if this works or if you have additional questions.
>>>>
>>>> Thanks,
>>>> Matt
>>>>
>>>> On Mon, 3 Nov 2008, Justin wrote:
>>>>
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> We are using mvapich_devel_1.0 on Ranger. I am seeing my current
>>>>> lockup
>>>>> at 16,384 processors at the following stacktrace:
>>>>>
>>>>> #0 0x00002b015c4f85ff in poll_rdma_buffer (vbuf_addr=0x7fff52849020,
>>>>> out_of_order=0x7fff52849030) at viacheck.c:206
>>>>> #1 0x00002b015c4f79ed in MPID_DeviceCheck (blocking=1384419360) at
>>>>> viacheck.c:505
>>>>> #2 0x00002b015c4db00b in MPID_RecvComplete (request=0x7fff52849020,
>>>>> status=0x7fff52849030, error_code=0x2b) at mpid_recv.c:106
>>>>> #3 0x00002b015c5032f7 in MPI_Waitall (count=1384419360,
>>>>> array_of_requests=0x7fff52849030, array_of_statuses=0x2b) at
>>>>> waitall.c:190
>>>>> #4 0x00002b015c4ebd3c in MPI_Sendrecv (sendbuf=0x7fff52849020,
>>>>> sendcount=1384419376, sendtype=43, dest=35, sendtag=64,
>>>>> recvbuf=0x2aaaad75d000, recvcount=1, recvtype=6, source=3585,
>>>>> recvtag=14, comm=130, status=0x7fff528491fc) at sendrecv.c:98
>>>>> #5 0x00002b015c4c9d2d in intra_Allreduce (sendbuf=0x7fff52849020,
>>>>> recvbuf=0x7fff52849030, count=4, datatype=0x23, op=64,
>>>>> comm=0x2aaaad75d000) at intra_fns_new.c:5682
>>>>> #6 0x00002b015c4c9516 in intra_shmem_Allreduce
>>>>> (sendbuf=0x7fff52849020,
>>>>> recvbuf=0x7fff52849030, count=1, datatype=0x23, op=64,
>>>>> comm=0x2aaaad75d000) at intra_fns_new.c:6014
>>>>> #7 0x00002b015c494286 in MPI_Allreduce (sendbuf=0x7fff52849020,
>>>>> recvbuf=0x7fff52849030, count=43, datatype=35, op=64,
>>>>> comm=-1384787968)
>>>>> at allreduce.c:83
>>>>> #8 0x00002b015b67f4f8 in _ZN6Uintah12MPIScheduler7executeEii () in
>>>>> /work/00975/luitjens/SCIRun/optimized/lib/libPackages_Uintah_CCA_Components_Schedulers.so
>>>>>
>>>>>
>>>>> I was seeing lockups at smaller powers of two but adding the
>>>>> following
>>>>> seemed to stop those:
>>>>>
>>>>> export VIADEV_USE_SHMEM_COLL=0
>>>>> export VIADEV_USE_SHMEM_ALLREDUCE=0
>>>>>
>>>>> Now I am just seeing it at 16K. What is odd to me is that if the 2
>>>>> commands above stop the shared memory optimizations then why does the
>>>>> stacktrace still show 'intra_shmem_Allreduce' being called?
>>>>>
>>>>> Here is some other info that might be useful:
>>>>>
>>>>> login3:/scratch/00975/luitjens/scalingice/ranger.med/ %mpirun_rsh -v
>>>>> OSU MVAPICH VERSION 1.0-SingleRail
>>>>> Build-ID: custom
>>>>>
>>>>> MPI Path:
>>>>> lrwxrwxrwx 1 tg802225 G-800594 46 May 27 14:29 include ->
>>>>> /opt/apps/intel10_1/mvapich-devel/1.0/include/
>>>>> lrwxrwxrwx 1 tg802225 G-800594 49 May 27 14:29 lib ->
>>>>> /opt/apps/intel10_1/mvapich-devel/1.0/lib/shared/
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Justin
>>>>>
>>>>> Dhabaleswar Panda wrote:
>>>>>
>>>>>
>>>>>> Justin,
>>>>>>
>>>>>> Could you let us know which stack (MVAPICH or MVAPICH2) you are
>>>>>> using on
>>>>>> Ranger. These two stacks have the parameters named differently.
>>>>>> Also, on
>>>>>> what exact process count you see this problem. If you can also
>>>>>> let us know
>>>>>> the version number of mvapich/mvapich2 stack and/or the path of
>>>>>> the MPI
>>>>>> library on Ranger, it will be helpful.
>>>>>> >>>>>> Thanks, >>>>>> >>>>>> DK >>>>>> >>>>>> On Mon, 3 Nov 2008, Justin wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> We are running into hangs on Ranger using mvapich that are not >>>>>>> present >>>>>>> on other machines. These hangs seem to only occur on arge >>>>>>> problems with >>>>>>> large numbers of processors. We have ran into similar problems >>>>>>> on some >>>>>>> LLNL machines in the past and were able to get around them by >>>>>>> disabling >>>>>>> the shared memory optimizations. In these cases the problem had >>>>>>> to do >>>>>>> with fixed sized buffers used in the shared memory optimizations. >>>>>>> >>>>>>> We would like to disable shared memory on Ranger but are >>>>>>> confused with >>>>>>> all the different parameters dealing with shared memory >>>>>>> optimizations. >>>>>>> How do we know which parameters affect the run? For example do >>>>>>> we use >>>>>>> the parameters that begin with MV_ or VIADEV_? From past >>>>>>> conversations >>>>>>> I have had with support teams the parameters that have an effect >>>>>>> vary >>>>>>> according to the hardware/mpi build. What is the best way to >>>>>>> determine >>>>>>> which parameters are active? >>>>>>> >>>>>>> Also here is a stacktrace from one of our hangs: >>>>>>> >>>>>>> .stack.i132-112.ranger.tacc.utexas.edu.16033 >>>>>>> Intel(R) Debugger for applications running on Intel(R) 64, Version >>>>>>> 10.1-35 , Build 20080310 >>>>>>> Attaching to program: >>>>>>> /work/00975/luitjens/SCIRun/optimized/Packages/Uintah/StandAlone/sus, >>>>>>> >>>>>>> process 16033 >>>>>>> Reading symbols from >>>>>>> /work/00975/luitjens/SCIRun/optimized/Packages/Uintah/StandAlone/sus...(no >>>>>>> >>>>>>> debugging symbols found)...done. 
>>>>>>> smpi_net_lookup () at mpid_smpi.c:1381 >>>>>>> #0 0x00002ada6b4d8510 in smpi_net_lookup () at mpid_smpi.c:1381 >>>>>>> #1 0x00002ada6b4d8414 in MPID_SMP_Check_incoming () at >>>>>>> mpid_smpi.c:1360 >>>>>>> #2 0x00002ada6b4f293c in MPID_DeviceCheck (blocking=7154160) at >>>>>>> viacheck.c:505 >>>>>>> #3 0x00002ada6b4d600b in MPID_RecvComplete (request=0x6d29f0, >>>>>>> status=0x10, error_code=0x4) at mpid_recv.c:106 >>>>>>> #4 0x00002ada6b4fe2f7 in MPI_Waitall (count=7154160, >>>>>>> array_of_requests=0x10, array_of_statuses=0x4) at waitall.c:190 >>>>>>> #5 0x00002ada6b4e6d3c in MPI_Sendrecv (sendbuf=0x6d29f0, >>>>>>> sendcount=16, >>>>>>> sendtype=4, dest=14, sendtag=22045696, recvbuf=0x1506680, >>>>>>> recvcount=1, >>>>>>> recvtype=6, source=2278, recvtag=14, comm=130, >>>>>>> status=0x7fff4385028c) at >>>>>>> sendrecv.c:98 >>>>>>> #6 0x00002ada6b4c4d2d in intra_Allreduce (sendbuf=0x6d29f0, >>>>>>> recvbuf=0x10, count=4, datatype=0xe, op=22045696, >>>>>>> comm=0x1506680) at >>>>>>> intra_fns_new.c:5682 >>>>>>> #7 0x00002ada6b4c4516 in intra_shmem_Allreduce (sendbuf=0x6d29f0, >>>>>>> recvbuf=0x10, count=1, datatype=0xe, op=22045696, >>>>>>> comm=0x1506680) at >>>>>>> intra_fns_new.c:6014 >>>>>>> #8 0x00002ada6b48f286 in MPI_Allreduce (sendbuf=0x6d29f0, >>>>>>> recvbuf=0x10, >>>>>>> count=4, datatype=14, op=22045696, comm=22046336) at allreduce.c:83 >>>>>>> #9 0x00002ada6a67a4f8 in _ZN6Uintah12MPIScheduler7executeEii () in >>>>>>> /work/00975/luitjens/SCIRun/optimized/lib/libPackages_Uintah_CCA_Components_Schedulers.so >>>>>>> >>>>>>> >>>>>>> In this case what would be the likely parameter I could play >>>>>>> with in >>>>>>> order to potentially stop a hang in MPI_Allreduce? 
>>>>>>> >>>>>>> Thanks, >>>>>>> Justin >>>>>>> _______________________________________________ >>>>>>> mvapich-discuss mailing list >>>>>>> mvapich-discuss@cse.ohio-state.edu >>>>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>> _______________________________________________ >>>>> mvapich-discuss mailing list >>>>> mvapich-discuss@cse.ohio-state.edu >>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss >>>>> >>>>> >>>>> > > From daniel.s.kokron at nasa.gov Thu Nov 6 16:24:55 2008 From: daniel.s.kokron at nasa.gov (Dan Kokron) Date: Thu Nov 6 17:39:52 2008 Subject: [mvapich-discuss] should I worry? Message-ID: <1226006695.17790.382.camel@outfield.gsfc.nasa.gov> I see the following message near the end of my build of mvapich-1.1rc1. Should I be concerned? Do I need MPD in order to run an MPI application? "MPD is not installed since you opt not to have it." -- Dan Kokron Global Modeling and Assimilation Office NASA Goddard Space Flight Center Greenbelt, MD 20771 Daniel.S.Kokron@nasa.gov Phone: (301) 614-5192 Fax: (301) 614-5304 From koop at cse.ohio-state.edu Thu Nov 6 18:02:29 2008 From: koop at cse.ohio-state.edu (Matthew Koop) Date: Thu Nov 6 18:02:39 2008 Subject: [mvapich-discuss] should I worry? In-Reply-To: <1226006695.17790.382.camel@outfield.gsfc.nasa.gov> Message-ID: MPD is not required and we do not support it either so you can ignore the message. The MVAPICH mpirun_rsh framework is much more scalable and user-friendly. Matt On Thu, 6 Nov 2008, Dan Kokron wrote: > I see the following message near the end of my build of mvapich-1.1rc1. > Should I be concerned? Do I need MPD in order to run an MPI > application? > > > "MPD is not installed since you opt not to have it." 
> --
> Dan Kokron
> Global Modeling and Assimilation Office
> NASA Goddard Space Flight Center
> Greenbelt, MD 20771
> Daniel.S.Kokron@nasa.gov
> Phone: (301) 614-5192
> Fax: (301) 614-5304
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

From panda at cse.ohio-state.edu Fri Nov 7 01:54:39 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri Nov 7 01:54:51 2008
Subject: [mvapich-discuss] Announcing the release of MVAPICH2 1.2
Message-ID:

The MVAPICH team is pleased to announce the availability of MVAPICH2-1.2
with the following NEW features:

- Scalable and robust daemon-less job startup
    - Enhanced and robust mpirun_rsh framework (non-MPD-based) to
      provide scalable job launching on multi-thousand core clusters
    - Available for OpenFabrics (IB and iWARP) and uDAPL interfaces
      (including Solaris)

- Support for Totalview debugger

- Checkpoint-restart with intra-node shared memory support
    - Allows best performance and scalability with fault-tolerance
      support

- Enhancement to software installation
    - Full autoconf-based configuration
    - Automatically detects system architecture and adapter types and
      optimizes MVAPICH2 for any particular installation
    - An application (mpiname) for querying the MVAPICH2 library
      version and configuration information

- Enhanced processor affinity using PLPA for multi-core architectures
    - Allows user-defined flexible processor affinity

- Enhanced scalability for RDMA-based direct one-sided communication
  with less communication resource
    - Available for OpenFabrics (IB and iWARP) interfaces

- Shared memory optimized algorithm for MPI_Bcast operation

- Optimized and tuned MPI_Alltoall

- Based on MPICH2 1.0.7

More details on all features and supported platforms can be obtained
by visiting the following URL:

http://mvapich.cse.ohio-state.edu/overview/mvapich2/features.shtml

MVAPICH2 1.2 is being made available with OFED 1.4. It is also tested
with OFED 1.3. It continues to deliver excellent performance. Sample
performance numbers include:

OpenFabrics/Gen2 on EM64T quad-core with PCIe-Gen2 and ConnectX-QDR:

    Two-sided operations:
      - 1.25 microsec one-way latency (4 bytes)
      - 2573 MB/sec unidirectional bandwidth
      - 5037 MB/sec bidirectional bandwidth

    One-sided operations:
      - 2.73 microsec Put latency (4 bytes)
      - 2576 MB/sec unidirectional Put bandwidth
      - 4921 MB/sec bidirectional Put bandwidth

Performance numbers for several other platforms, system configurations
and operations can be viewed by visiting the `Performance' section of the
project's web page.

To download the MVAPICH2 1.2 package and access the anonymous SVN,
please visit the following URL:

http://mvapich.cse.ohio-state.edu/

All feedback, including bug reports, hints for performance tuning,
patches and enhancements, is welcome. Please post it to the
mvapich-discuss mailing list.

Thanks,

The MVAPICH Team

From panda at cse.ohio-state.edu Fri Nov 7 10:29:00 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri Nov 7 10:29:11 2008
Subject: [mvapich-discuss] hang at large numbers of processors
In-Reply-To: <49135FCB.2040000@cs.utah.edu>
Message-ID:

Justin,

Thanks for sending us this information. This problem could be the same
one you experienced earlier. A few things related to shared memory
broadcast were resolved in MVAPICH2 1.0.3. We released MVAPICH2 1.2
yesterday. Abe folks should upgrade their stack to this latest release.
Let us know if the problem still persists with MVAPICH2 1.2 and we will
take a deeper look at it.

Thanks,

DK

> The build is mvapich2-1.0-intel-ofed-1.2. I don't have a stacktrace yet
> as Abe has very slow queue times.
>
> Justin
>
>
> Justin wrote:
> > Ok, I will work with my co-worker to get this information. It may
> > take a few days as I don't have an account on Abe and will have to
> > relay everything through him.
> >
> > Justin
> >
> > Dhabaleswar Panda wrote:
> >>> Are there similar hangs when using mvapich2? A coworker of mine is
> >>> reporting similar hangs on Abe using mvapich2. I'm not sure of the
> >>> version.
> >>>
> >>
> >> It will be good to know what version of MVAPICH2 is running on Abe.
> >> Also,
> >> it will be helpful to get a backtrace of the hang. This will help us to
> >> determine whether the causes are the same or not.
> >>
> >> Thanks,
> >>
> >> DK
> >>
> >>
> >>> Justin
> >>>
> >>> Matthew Koop wrote:
> >>>
> >>>> Justin,
> >>>>
> >>>> I think there are a couple things here:
> >>>>
> >>>> 1.) Simply exporting the variables is not sufficient for the setup at
> >>>> TACC. You'll need to set it the following way:
> >>>>
> >>>> ibrun VIADEV_USE_SHMEM_COLL=0 ./executable_name
> >>>>
> >>>> Since the ENVs weren't being propagated, the setting wasn't taking
> >>>> effect
> >>>> (and that is why you still saw the shmem functions in the backtrace).
> >>>>
> >>>> 2.) There was a limitation in the 1.0 versions where, when the
> >>>> shared memory bcast implementation was run on more than 1K nodes, there
> >>>> would be a hang. Since the shared memory allreduce uses a bcast
> >>>> internally,
> >>>> it is also hanging; you can try just disabling the bcast:
> >>>>
> >>>> ibrun VIADEV_USE_SHMEM_BCAST=0 ./executable_name
> >>>>
> >>>> Let us know if this works or if you have additional questions.
> >>>>
> >>>> Thanks,
> >>>> Matt
> >>>>
> >>>> On Mon, 3 Nov 2008, Justin wrote:
> >>>>
> >>>>
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> We are using mvapich_devel_1.0 on Ranger.
I am seeing my current > >>>>> lockup > >>>>> at 16,384 processors at the following stacktrace: > >>>>> > >>>>> #0 0x00002b015c4f85ff in poll_rdma_buffer (vbuf_addr=0x7fff52849020, > >>>>> out_of_order=0x7fff52849030) at viacheck.c:206 > >>>>> #1 0x00002b015c4f79ed in MPID_DeviceCheck (blocking=1384419360) at > >>>>> viacheck.c:505 > >>>>> #2 0x00002b015c4db00b in MPID_RecvComplete (request=0x7fff52849020, > >>>>> status=0x7fff52849030, error_code=0x2b) at mpid_recv.c:106 > >>>>> #3 0x00002b015c5032f7 in MPI_Waitall (count=1384419360, > >>>>> array_of_requests=0x7fff52849030, array_of_statuses=0x2b) at > >>>>> waitall.c:190 > >>>>> #4 0x00002b015c4ebd3c in MPI_Sendrecv (sendbuf=0x7fff52849020, > >>>>> sendcount=1384419376, sendtype=43, dest=35, sendtag=64, > >>>>> recvbuf=0x2aaaad75d000, recvcount=1, recvtype=6, source=3585, > >>>>> recvtag=14, comm=130, status=0x7fff528491fc) at sendrecv.c:98 > >>>>> #5 0x00002b015c4c9d2d in intra_Allreduce (sendbuf=0x7fff52849020, > >>>>> recvbuf=0x7fff52849030, count=4, datatype=0x23, op=64, > >>>>> comm=0x2aaaad75d000) at intra_fns_new.c:5682 > >>>>> #6 0x00002b015c4c9516 in intra_shmem_Allreduce > >>>>> (sendbuf=0x7fff52849020, > >>>>> recvbuf=0x7fff52849030, count=1, datatype=0x23, op=64, > >>>>> comm=0x2aaaad75d000) at intra_fns_new.c:6014 > >>>>> #7 0x00002b015c494286 in MPI_Allreduce (sendbuf=0x7fff52849020, > >>>>> recvbuf=0x7fff52849030, count=43, datatype=35, op=64, > >>>>> comm=-1384787968) > >>>>> at allreduce.c:83 > >>>>> #8 0x00002b015b67f4f8 in _ZN6Uintah12MPIScheduler7executeEii () in > >>>>> /work/00975/luitjens/SCIRun/optimized/lib/libPackages_Uintah_CCA_Components_Schedulers.so > >>>>> > >>>>> > >>>>> I was seeing lockups at smaller powers of two but adding the > >>>>> following > >>>>> seemed to stop those: > >>>>> > >>>>> export VIADEV_USE_SHMEM_COLL=0 > >>>>> export VIADEV_USE_SHMEM_ALLREDUCE=0 > >>>>> > >>>>> Now I am just seeing it at 16K. 
What is odd to me is that if the 2
> >>>>> commands above stop the shared memory optimizations then why does the
> >>>>> stacktrace still show 'intra_shmem_Allreduce' being called?
> >>>>>
> >>>>> Here is some other info that might be useful:
> >>>>>
> >>>>> login3:/scratch/00975/luitjens/scalingice/ranger.med/ %mpirun_rsh -v
> >>>>> OSU MVAPICH VERSION 1.0-SingleRail
> >>>>> Build-ID: custom
> >>>>>
> >>>>> MPI Path:
> >>>>> lrwxrwxrwx 1 tg802225 G-800594 46 May 27 14:29 include -> /opt/apps/intel10_1/mvapich-devel/1.0/include/
> >>>>> lrwxrwxrwx 1 tg802225 G-800594 49 May 27 14:29 lib -> /opt/apps/intel10_1/mvapich-devel/1.0/lib/shared/
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Justin
> >>>>>
> >>>>> Dhabaleswar Panda wrote:
> >>>>>
> >>>>>> Justin,
> >>>>>>
> >>>>>> Could you let us know which stack (MVAPICH or MVAPICH2) you are
> >>>>>> using on Ranger? These two stacks have the parameters named
> >>>>>> differently. Also, at what exact process count do you see this
> >>>>>> problem? If you can also let us know the version number of the
> >>>>>> mvapich/mvapich2 stack and/or the path of the MPI library on
> >>>>>> Ranger, it will be helpful.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> DK
> >>>>>>
> >>>>>> On Mon, 3 Nov 2008, Justin wrote:
> >>>>>>
> >>>>>>> We are running into hangs on Ranger using mvapich that are not
> >>>>>>> present on other machines. These hangs seem to only occur on
> >>>>>>> large problems with large numbers of processors. We have run
> >>>>>>> into similar problems on some LLNL machines in the past and were
> >>>>>>> able to get around them by disabling the shared memory
> >>>>>>> optimizations. In these cases the problem had to do with
> >>>>>>> fixed-size buffers used in the shared memory optimizations.
> >>>>>>> > >>>>>>> We would like to disable shared memory on Ranger but are > >>>>>>> confused with > >>>>>>> all the different parameters dealing with shared memory > >>>>>>> optimizations. > >>>>>>> How do we know which parameters affect the run? For example do > >>>>>>> we use > >>>>>>> the parameters that begin with MV_ or VIADEV_? From past > >>>>>>> conversations > >>>>>>> I have had with support teams the parameters that have an effect > >>>>>>> vary > >>>>>>> according to the hardware/mpi build. What is the best way to > >>>>>>> determine > >>>>>>> which parameters are active? > >>>>>>> > >>>>>>> Also here is a stacktrace from one of our hangs: > >>>>>>> > >>>>>>> .stack.i132-112.ranger.tacc.utexas.edu.16033 > >>>>>>> Intel(R) Debugger for applications running on Intel(R) 64, Version > >>>>>>> 10.1-35 , Build 20080310 > >>>>>>> Attaching to program: > >>>>>>> /work/00975/luitjens/SCIRun/optimized/Packages/Uintah/StandAlone/sus, > >>>>>>> > >>>>>>> process 16033 > >>>>>>> Reading symbols from > >>>>>>> /work/00975/luitjens/SCIRun/optimized/Packages/Uintah/StandAlone/sus...(no > >>>>>>> > >>>>>>> debugging symbols found)...done. 
> >>>>>>> smpi_net_lookup () at mpid_smpi.c:1381 > >>>>>>> #0 0x00002ada6b4d8510 in smpi_net_lookup () at mpid_smpi.c:1381 > >>>>>>> #1 0x00002ada6b4d8414 in MPID_SMP_Check_incoming () at > >>>>>>> mpid_smpi.c:1360 > >>>>>>> #2 0x00002ada6b4f293c in MPID_DeviceCheck (blocking=7154160) at > >>>>>>> viacheck.c:505 > >>>>>>> #3 0x00002ada6b4d600b in MPID_RecvComplete (request=0x6d29f0, > >>>>>>> status=0x10, error_code=0x4) at mpid_recv.c:106 > >>>>>>> #4 0x00002ada6b4fe2f7 in MPI_Waitall (count=7154160, > >>>>>>> array_of_requests=0x10, array_of_statuses=0x4) at waitall.c:190 > >>>>>>> #5 0x00002ada6b4e6d3c in MPI_Sendrecv (sendbuf=0x6d29f0, > >>>>>>> sendcount=16, > >>>>>>> sendtype=4, dest=14, sendtag=22045696, recvbuf=0x1506680, > >>>>>>> recvcount=1, > >>>>>>> recvtype=6, source=2278, recvtag=14, comm=130, > >>>>>>> status=0x7fff4385028c) at > >>>>>>> sendrecv.c:98 > >>>>>>> #6 0x00002ada6b4c4d2d in intra_Allreduce (sendbuf=0x6d29f0, > >>>>>>> recvbuf=0x10, count=4, datatype=0xe, op=22045696, > >>>>>>> comm=0x1506680) at > >>>>>>> intra_fns_new.c:5682 > >>>>>>> #7 0x00002ada6b4c4516 in intra_shmem_Allreduce (sendbuf=0x6d29f0, > >>>>>>> recvbuf=0x10, count=1, datatype=0xe, op=22045696, > >>>>>>> comm=0x1506680) at > >>>>>>> intra_fns_new.c:6014 > >>>>>>> #8 0x00002ada6b48f286 in MPI_Allreduce (sendbuf=0x6d29f0, > >>>>>>> recvbuf=0x10, > >>>>>>> count=4, datatype=14, op=22045696, comm=22046336) at allreduce.c:83 > >>>>>>> #9 0x00002ada6a67a4f8 in _ZN6Uintah12MPIScheduler7executeEii () in > >>>>>>> /work/00975/luitjens/SCIRun/optimized/lib/libPackages_Uintah_CCA_Components_Schedulers.so > >>>>>>> > >>>>>>> > >>>>>>> In this case what would be the likely parameter I could play > >>>>>>> with in > >>>>>>> order to potentially stop a hang in MPI_Allreduce? 
> >>>>>>> > >>>>>>> Thanks, > >>>>>>> Justin > >>>>>>> _______________________________________________ > >>>>>>> mvapich-discuss mailing list > >>>>>>> mvapich-discuss@cse.ohio-state.edu > >>>>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>> _______________________________________________ > >>>>> mvapich-discuss mailing list > >>>>> mvapich-discuss@cse.ohio-state.edu > >>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > >>>>> > >>>>> > >>>>> > > > > > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From dkokron at toad.net Sat Nov 8 20:36:40 2008 From: dkokron at toad.net (Dan Kokron) Date: Sun Nov 9 09:15:53 2008 Subject: [mvapich-discuss] Bus error from mpirun_rsh Message-ID: <49163EA8.5030505@toad.net> Following is a minimal trace from a core file dropped by mpirun_rsh while attempting to run an application on 120 nodes of an SGI ICE system. The application and MVAPICH were both compiled with the Intel-10.1.015 compiler suite. Does this look familiar to anyone? Some system config Linux p4fe1 2.6.16.60-0.27schamp-nasa #1 SMP Sat Sep 13 20:37:07 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux Command line mpirun_rsh -ssh -np 480 -hostfile machinefile VIADEV_USE_SHMEM_COLL=0 VIADEV_CLUSTER_SIZE=MEDIUM ./Application.x which mpirun_rsh /u/dkokron/play/mvapich-1.1rc1/bin/mpirun_rsh p4fe1.dkokron 269> gdb -c core.10742 /u/dkokron/play/mvapich-1.1rc1/bin/mpirun_rsh GNU gdb 6.6 Copyright (C) 2006 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "x86_64-suse-linux"... 
Using host libthread_db library "/lib64/libthread_db.so.1".
Reading symbols from /nasa/intel/cce/10.1.015/lib/libimf.so...done.
Loaded symbols for /nasa/intel/cce/10.1.015/lib/libimf.so
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libgcc_s.so.1...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `mpirun_rsh -ssh -np 480 -hostfile machinefile VIADEV_USE_SHMEM_COLL=0 VIADEV_CL'.
Program terminated with signal 7, Bus error.
#0 0x0000000000402b15 in main ()

--
Dan Kokron

From dkokron at toad.net Sat Nov 8 21:42:03 2008
From: dkokron at toad.net (Dan Kokron)
Date: Sun Nov 9 09:15:53 2008
Subject: [mvapich-discuss] is something else turned off by VIADEV_USE_SHMEM_COLL=0 ?
Message-ID: <49164DFB.7080103@toad.net>

I have an application that runs okay with the following command line
using mvapich-1.1rc1 (disabling the SHMEM collectives is necessary to
get it to run).

runs;
mpirun_rsh -ssh -np $NPES -hostfile machinefile VIADEV_USE_SHMEM_COLL=0 ./Application

however if I disable each collective individually on a single command
line, it fails in the same manner as when the collectives are defaulted.

e.g.
mpirun_rsh -ssh -np $NPES -hostfile machinefile VIADEV_USE_SHMEM_ALLREDUCE=0 VIADEV_USE_SHMEM_REDUCE=0 VIADEV_USE_SHMEM_BARRIER=0 VIADEV_USE_SHMEM_BCAST=0 VIADEV_USE_NEW_ALLGATHER=0 ./Application

fails just like ...
mpirun_rsh -ssh -np $NPES -hostfile machinefile ./Application

Does setting VIADEV_USE_SHMEM_COLL=0 disable more code paths than
indicated in the documentation?
Sec 5.4 of http://mvapich.cse.ohio-state.edu/support/mvapich_user_guide-1.1.pdf -- Dan Kokron From panda at cse.ohio-state.edu Sun Nov 9 13:26:52 2008 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Sun Nov 9 13:27:07 2008 Subject: [mvapich-discuss] Bus error from mpirun_rsh In-Reply-To: <49163EA8.5030505@toad.net> Message-ID: > Following is a minimal trace from a core file dropped by mpirun_rsh > while attempting to run an application on 120 nodes of an SGI ICE > system. The application and MVAPICH were both compiled with the > Intel-10.1.015 compiler suite. Does this look familiar to anyone? We have not seen this error. Will it be possible for you to send us backtrace of this error? Do you see this error with the latest mvapich trunk version also? A couple of fixes have gone into the codebase since RC1 was released. We are currently testing the trunk version for the final release. You can get the nightly tarball of the latest trunk version from mvapich download page. Thanks, DK > Some system config > Linux p4fe1 2.6.16.60-0.27schamp-nasa #1 SMP Sat Sep 13 20:37:07 UTC > 2008 x86_64 x86_64 x86_64 GNU/Linux > > Command line > mpirun_rsh -ssh -np 480 -hostfile machinefile VIADEV_USE_SHMEM_COLL=0 > VIADEV_CLUSTER_SIZE=MEDIUM ./Application.x > > which mpirun_rsh > /u/dkokron/play/mvapich-1.1rc1/bin/mpirun_rsh > > p4fe1.dkokron 269> gdb -c core.10742 > /u/dkokron/play/mvapich-1.1rc1/bin/mpirun_rsh > GNU gdb 6.6 > Copyright (C) 2006 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain > conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "x86_64-suse-linux"... > Using host libthread_db library "/lib64/libthread_db.so.1". > Reading symbols from /nasa/intel/cce/10.1.015/lib/libimf.so...done. 
> Loaded symbols for /nasa/intel/cce/10.1.015/lib/libimf.so > Reading symbols from /lib64/libm.so.6...done. > Loaded symbols for /lib64/libm.so.6 > Reading symbols from /lib64/libgcc_s.so.1...done. > Loaded symbols for /lib64/libgcc_s.so.1 > Reading symbols from /lib64/libc.so.6...done. > Loaded symbols for /lib64/libc.so.6 > Reading symbols from /lib64/libdl.so.2...done. > Loaded symbols for /lib64/libdl.so.2 > Reading symbols from /lib64/ld-linux-x86-64.so.2...done. > Loaded symbols for /lib64/ld-linux-x86-64.so.2 > Core was generated by `mpirun_rsh -ssh -np 480 -hostfile machinefile > VIADEV_USE_SHMEM_COLL=0 VIADEV_CL'. > Program terminated with signal 7, Bus error. > #0 0x0000000000402b15 in main () > > -- > Dan Kokron > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From panda at cse.ohio-state.edu Sun Nov 9 13:31:20 2008 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Sun Nov 9 13:31:32 2008 Subject: [mvapich-discuss] is something else turned off by VIADEV_USE_SHMEM_COLL=0 ? In-Reply-To: <49164DFB.7080103@toad.net> Message-ID: > I have an application that runs okay with the following command line > using mvapich-1.1rc1 (disabling the SHMEM collectives is necessary to > get it to run). > > runs; > mpirun_rsh -ssh -np $NPES -hostfile machinefile VIADEV_USE_SHMEM_COLL=0 > ./Application > > however if I disable each collective individually on a single command > line, it fails in the same manner as when the collectives are defaulted. > > e.g. > mpirun_rsh -ssh -np $NPES -hostfile machinefile > VIADEV_USE_SHMEM_ALLREDUCE=0 VIADEV_USE_SHMEM_REDUCE=0 > VIADEV_USE_SHMEM_BARRIER=0 VIADEV_USE_SHMEM_BCAST=0 > VIADEV_USE_NEW_ALLGATHER=0 ./Application > > fails just like ... 
> mpirun_rsh -ssh -np $NPES -hostfile machinefile ./Application
>
> Does setting VIADEV_USE_SHMEM_COLL=0 disable more code paths than
> indicated in the documentation?

The design and use of shmem-based collectives requires modifications to
the way communicators are created and handled. Some parts of this
communicator creation/handling could be the cause of these differences.
We will take a look at it.

Thanks,

DK

> Sec 5.4 of
> http://mvapich.cse.ohio-state.edu/support/mvapich_user_guide-1.1.pdf
>
> --
> Dan Kokron
>

From christian.guggenberger at rzg.mpg.de Tue Nov 11 10:14:17 2008
From: christian.guggenberger at rzg.mpg.de (Christian Guggenberger)
Date: Tue Nov 11 10:14:31 2008
Subject: [mvapich-discuss] Announcing the release of MVAPICH2 1.2
In-Reply-To:
References:
Message-ID: <20081111151416.GD32449@daltons.rzg.mpg.de>

Hi,

> - Scalable and robust daemon-less job startup
> - Enhanced and robust mpirun_rsh framework (non-MPD-based) to
>   provide scalable job launching on multi-thousand core clusters
> - Available for OpenFabrics (IB and iWARP) and uDAPL interfaces
>   (including Solaris)
> - Support for Totalview debugger
>

Is there a way to tell 'mpirun_rsh' to look for a default hostfile (i.e.
when no '-hostfile' option is supplied)?

thanks,
 - Christian

From art702 at jaguar1.usouthal.edu Tue Nov 11 10:48:21 2008
From: art702 at jaguar1.usouthal.edu (art702@jaguar1.usouthal.edu)
Date: Tue Nov 11 10:48:33 2008
Subject: [mvapich-discuss] MPI programs on Intel vTune Performance analyzer
Message-ID:

I am new to MVAPICH and am trying to evaluate the performance of MPI
programs with the Intel VTune Performance Analyzer. Can anybody help me
run MPI programs under VTune?

-arvind.
From sridharj at cse.ohio-state.edu Tue Nov 11 11:19:54 2008
From: sridharj at cse.ohio-state.edu (Jaidev Sridhar)
Date: Tue Nov 11 11:20:06 2008
Subject: [mvapich-discuss] Announcing the release of MVAPICH2 1.2
In-Reply-To: <20081111151416.GD32449@daltons.rzg.mpg.de>
References: <20081111151416.GD32449@daltons.rzg.mpg.de>
Message-ID: <20081111161954.GA19542@kappa.cse.ohio-state.edu>

Hi Christian,

On Tue, Nov 11, 2008 at 04:14:17PM +0100, Christian Guggenberger wrote:
> > - Scalable and robust daemon-less job startup
> > - Enhanced and robust mpirun_rsh framework (non-MPD-based) to
> >   provide scalable job launching on multi-thousand core clusters
> > - Available for OpenFabrics (IB and iWARP) and uDAPL interfaces
> >   (including Solaris)
> > - Support for Totalview debugger
> >
>
> is there a way to tell 'mpirun_rsh' to look for a default hostfile (i.e.
> when no '-hostfile' option is supplied) ?
>

If a hostfile is not specified, mpirun_rsh expects hostnames on the
command line. For example,

    mpirun_rsh -np 2 host1 host2 ./a.out

It doesn't look for a default hostfile.

-Jaidev

--
You can rent this space for only $5 a week.

From spiglerg at gmail.com Tue Nov 11 12:51:26 2008
From: spiglerg at gmail.com (SpiglerG)
Date: Tue Nov 11 12:51:38 2008
Subject: [mvapich-discuss] mvapich2 crash
Message-ID:

Hi.

Since I started using the BASS cluster, I've been porting my
applications to the new system (mainly library compatibility), but I
have run into some weird problems with the installed MPI library.

After some coding, I finally reduced the problem to a simple case. It
seems that running an MPI program that uses the pthread library, with
threads issuing malloc/free calls, leads to crashes with
`munmap_chunk(): invalid pointer` errors. This could depend on some
memory locking or strict memory handling in the MPI system; could
someone help me solve it?

I'm attaching the source code I'm using (the stripped-down version),
along with an example of a crash.
I'm compiling with `mpicc -o pt pt.c -lpthread -fPIC` (-fPIC just to get some more debug information), and running with `mpirun -np 1 -machinefile machines $(pwd)/pt` (where machines contains a single line with an allocated machine, eg using `qlogin -l gpus=4`, [as I'm working on GPU nodes for my apps]). Hope someone can help me. Giacomo Spigler -------------- next part -------------- In main: creating thread 0 In main: creating thread 1 In main: creating thread 2 In main: creating thread 3 In main: creating thread 4 Hello, World! It's me, thread #5! Hello, World! It's me, thread #5! Hello, World! It's me, thread #5! Hello, World! It's me, thread #5! Hello, World! It's me, thread #5! *** glibc detected *** /home/spiglerg/bugtest/pt: munmap_chunk(): invalid pointer: 0x00002aaab3e80020 *** ======= Backtrace: ========= /lib64/libc.so.6(cfree+0x1b6)[0x3376e74d86] /home/spiglerg/bugtest/pt(PrintHello+0x6b)[0x400911] /lib64/libpthread.so.0[0x3377a062f7] /lib64/libc.so.6(clone+0x6d)[0x3376ed1b6d] ======= Memory map: ======== 00400000-00401000 r-xp 00000000 00:16 151175235 /home/spiglerg/bugtest/pt 00600000-00601000 rw-p 00000000 00:16 151175235 /home/spiglerg/bugtest/pt 19b8a000-19bc4000 rw-p 19b8a000 00:00 0 19bc4000-19bc5000 rw-p 19bc4000 00:00 0 19bc5000-19bf6000 rw-p 19bc5000 00:00 0 41417000-41418000 ---p 41417000 00:00 0 41418000-49418000 rw-p 41418000 00:00 0 49418000-49419000 ---p 49418000 00:00 0 49419000-51419000 rw-p 49419000 00:00 0 51419000-5141a000 ---p 51419000 00:00 0 5141a000-5941a000 rw-p 5141a000 00:00 0 5941a000-5941b000 ---p 5941a000 00:00 0 5941b000-6141b000 rw-p 5941b000 00:00 0 6141b000-6141c000 ---p 6141b000 00:00 0 6141c000-6941c000 rw-p 6141c000 00:00 0 6941c000-6941d000 ---p 6941c000 00:00 0 6941d000-7141d000 rw-p 6941d000 00:00 0 7141d000-7141e000 ---p 7141d000 00:00 0 7141e000-7941e000 rw-p 7141e000 00:00 0 3375600000-337561a000 r-xp 00000000 fd:00 965129 /lib64/ld-2.5.so 337581a000-337581b000 r--p 0001a000 fd:00 965129 /lib64/ld-2.5.so 
337581b000-337581c000 rw-p 0001b000 fd:00 965129 /lib64/ld-2.5.so 3375a00000-3375a27000 r-xp 00000000 fd:00 965160 /lib64/libibt.so.0.0 3375a27000-3375c27000 ---p 00027000 fd:00 965160 /lib64/libibt.so.0.0 3375c27000-3375c29000 rw-p 00027000 fd:00 965160 /lib64/libibt.so.0.0 3375e00000-3375e20000 r-xp 00000000 fd:00 964873 /lib64/libpublic.so.0.0 3375e20000-3376020000 ---p 00020000 fd:00 964873 /lib64/libpublic.so.0.0 3376020000-3376021000 rw-p 00020000 fd:00 964873 /lib64/libpublic.so.0.0 3376200000-3376201000 r-xp 00000000 fd:00 965087 /lib64/libmosal.so.0.0 3376201000-3376400000 ---p 00001000 fd:00 965087 /lib64/libmosal.so.0.0 3376400000-3376401000 rw-p 00000000 fd:00 965087 /lib64/libmosal.so.0.0 3376600000-337660f000 r-xp 00000000 fd:00 964860 /lib64/libvapi.so.0.0 337660f000-337680e000 ---p 0000f000 fd:00 964860 /lib64/libvapi.so.0.0 337680e000-337680f000 rw-p 0000e000 fd:00 964860 /lib64/libvapi.so.0.0 3376a00000-3376a02000 r-xp 00000000 fd:00 965091 /lib64/libmtl_common.so.0.0 3376a02000-3376c01000 ---p 00002000 fd:00 965091 /lib64/libmtl_common.so.0.0 3376c01000-3376c02000 rw-p 00001000 fd:00 965091 /lib64/libmtl_common.so.0.0 3376e00000-3376f4a000 r-xp 00000000 fd:00 965132 /lib64/libc-2.5.so 3376f4a000-3377149000 ---p 0014a000 fd:00 965132 /lib64/libc-2.5.so 3377149000-337714d000 r--p 00149000 fd:00 965132 /lib64/libc-2.5.so 337714d000-337714e000 rw-p 0014d000 fd:00 965132 /lib64/libc-2.5.so 337714e000-3377153000 rw-p 337714e000 00:00 0 3377200000-3377202000 r-xp 00000000 fd:00 965004 /lib64/libdl-2.5.so 3377202000-3377402000 ---p 00002000 fd:00 965004 /lib64/libdl-2.5.so 3377402000-3377403000 r--p 00002000 fd:00 965004 /lib64/libdl-2.5.so 3377403000-3377404000 rw-p 00003000 fd:00 965004 /lib64/libdl-2.5.so 3377600000-3377682000 r-xp 00000000 fd:00 965138 /lib64/libm-2.5.so 3377682000-3377881000 ---p 00082000 fd:00 965138 /lib64/libm-2.5.so 3377881000-3377882000 r--p 00081000 fd:00 965138 /lib64/libm-2.5.so 3377882000-3377883000 rw-p 00082000 fd:00 
965138 /lib64/libm-2.5.so 3377a00000-3377a15000 r-xp 00000000 fd:00 965134 /lib64/libpthread-2.5.so 3377a15000-3377c14000 ---p 00015000 fd:00 965134 /lib64/libpthread-2.5.so 3377c14000-3377c15000 r--p 00014000 fd:00 965134 /lib64/libpthread-2.5.so 3377c15000-3377c16000 rw-p 00015000 fd:00 965134 /lib64/libpthread-2.5.so 3377c16000-3377c1a000 rw-p 3377c16000 00:00 0 3377e00000-3377e01000 r-xp 00000000 fd:00 965111 /lib64/libmpga.so.0.0 3377e01000-3378000000 ---p 00001000 fd:00 965111 /lib64/libmpga.so.0.0 3378000000-3378001000 rw-p 00000000 fd:00 965111 /lib64/libmpga.so.0.0 3378e00000-3378e0d000 r-xp 00000000 fd:00 965068 /lib64/libgcc_s-4.1.2-20080102.so.1 3378e0d000-337900d000 ---p 0000d000 fd:00 965068 /lib64/libgcc_s-4.1.2-20080102.so.1 337900d000-337900e000 rw-p 0000d000 fd:00 965068 /lib64/libgcc_s-4.1.2-20080102.so.1 3379200000-33792e6000 r-xp 00000000 fd:02 461360 /usr/lib64/libstdc++.so.6.0.8 33792e6000-33794e5000 ---p 000e6000 fd:02 461360 /usr/lib64/libstdc++.so.6.0.8 33794e5000-33794eb000 r--p 000e5000 fd:02 461360 /usr/lib64/libstdc++.so.6.0.8 33794eb000-33794ee000 rw-p 000eb000 fd:02 461360 /usr/lib64/libstdc++.so.6.0.8 33794ee000-3379500000 rw-p 33794ee000 00:00 0 2aaaaaaab000-2aaaaaab4000 r-xp 00000000 fd:00 965103 /lib64/libmt25218vpd.so.0.0 2aaaaaab4000-2aaaaacb3000 ---p 00009000 fd:00 965103 /lib64/libmt25218vpd.so.0.0 2aaaaacb3000-2aaaaacb4000 rw-p 00008000 fd:00 965103 /lib64/libmt25218vpd.so.0.0 2aaaaacb4000-2aaaaacb5000 rw-s 00000000 00:10 7445 /dev/SysIbt 2aaaaacb5000-2aaaaacb6000 rw-s 00000000 00:10 7445 /dev/SysIbt 2aaaaacb6000-2aaaaacb7000 rw-s ffffc20000a5a000 00:10 7445 /dev/SysIbt 2aaaaacb7000-2aaaaacb8000 rw-s ffff81024bced000 00:10 7445 /dev/SysIbt 2aaaaacb8000-2aaaaacb9000 rw-s ffff8102459da000 00:10 7445 /dev/SysIbt 2aaaaacb9000-2aaaaacba000 rw-p 2aaaaacb9000 00:00 0 2aaaaacba000-2aaaaaeba000 rw-p 2aaaaacba000 00:00 0 2aaaaaeba000-2aaaaaebb000 rw-p 2aaaaaeba000 00:00 0 2aaaaaede000-2aaaaaee8000 r-xp 00000000 fd:00 964828 
/lib64/libnss_files-2.5.so 2aaaaaee8000-2aaaab0e7000 ---p 0000a000 fd:00 964828 /lib64/libnss_files-2.5.so 2aaaab0e7000-2aaaab0e8000 r--p 00009000 fd:00 964828 /lib64/libnss_files-2.5.so 2aaaab0e8000-2aaaab0e9000 rw-p 0000a000 fd:00 964828 /lib64/libnss_files-2.5.so 2aaaab0e9000-2aaaabaae000 rw-p 2aaaab0e9000 00:00 0 2aaaabaae000-2aaaabaaf000 ---p 2aaaabaae000 00:00 0 2aaaabaaf000-2aaab3aaf000 rwxp 2aaaabaaf000 00:00 0 2aaab3e80000-2aaab4622000 rw-p 2aaab3e80000 00:00 0 2aaab8000000-2aaab83f2000 rw-p 2aaab8000000 00:00 0 2aaab83f2000-2aaabc000000 ---p 2aaab83f2000 00:00 0 2aaabc000000-2aaabc021000 rw-p 2aaabc000000 00:00 0 2aaabc021000-2aaac0000000 ---p 2aaabc021000 00:00 0 2b222e787000-2b222e789000 rw-p 2b222e787000 00:00 0 2b222e789000-2b222e78a000 rw-s 00000000 00:10 7445 /dev/SysIbt 2b222e7ab000-2b222e7ac000 rw-p 2b222e7ab000 00:00 0 2b222e7ac000-2b222e832000 r-xp 00000000 fd:00 1029187 /opt/iba/lib64/shared/libmpich.so.1.0 2b222e832000-2b222ea31000 ---p 00086000 fd:00 1029187 /opt/iba/lib64/shared/libmpich.so.1.0 2b222ea31000-2b222ea36000 rw-p 00085000 fd:00 1029187 /opt/iba/lib64/shared/libmpich.so.1.0 2b222ea36000-2b222ea87000 rw-p 2b222ea36000 00:00 0 2b222ea87000-2b222ea9a000 r-xp 00000000 fd:00 965128 /lib64/libmpicm.so.1.0 2b222ea9a000-2b222ec99000 ---p 00013000 fd:00 965128 /lib64/libmpicm.so.1.0 2b222ec99000-2b222ec9b000 rw-p 00012000 fd:00 965128 /lib64/libmpicm.so.1.0 2b222ec9b000-2b222ecbe000 rw-p 2b222ec9b000 00:00 0 7fff7c30d000-7fff7c323000 rw-p 7fff7c30d000 00:00 0 [stack] ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 [vdso] bash: line 1: 5887 Aborted /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=bass-gpu26.cs.unc.edu MPIRUN_PORT=47559 MPIRUN_PROCESSES='bass-gpu26:' MPIRUN_RANK=0 MPIRUN_NPROCS=1 MPIRUN_ID=5881 /home/spiglerg/bugtest/pt -------------- next part -------------- A non-text attachment was scrubbed... 
Name: pt.c Type: text/x-csrc Size: 823 bytes Desc: not available Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20081111/115df7f0/pt-0001.bin From christian.guggenberger at rzg.mpg.de Tue Nov 11 12:57:27 2008 From: christian.guggenberger at rzg.mpg.de (Christian Guggenberger) Date: Tue Nov 11 12:57:40 2008 Subject: [mvapich-discuss] Announcing the release of MVAPICH2 1.2 In-Reply-To: <20081111161954.GA19542@kappa.cse.ohio-state.edu> References: <20081111151416.GD32449@daltons.rzg.mpg.de> <20081111161954.GA19542@kappa.cse.ohio-state.edu> Message-ID: <20081111175727.GG32449@daltons.rzg.mpg.de> Hi Jaidev, > > > - Scalable and robust daemon-less job startup > > > - Enhanced and robust mpirun_rsh framework (non-MPD-based) to > > > provide scalable job launching on multi-thousand core clusters > > > - Available for OpenFabrics (IB and iWARP) and uDAPL interfaces > > > (including Solaris) > > > - Support for Totalview debugger > > > > > > > is there a way to tell 'mpirun_rsh' to look for a default hostfile (i.e. > > when no '-hostfile' option is supplied) ? > > > > If a hostfile is not specified, mpirun_rsh expects hostnames on the > command-line. For example, > mpirun_rsh -np 2 host1 host2 ./a.out > > It doesn't look for a default hostfile. > that's sad - is such a feature planned for a future release ? cheers. - Christian From sridharj at cse.ohio-state.edu Tue Nov 11 14:49:07 2008 From: sridharj at cse.ohio-state.edu (Jaidev Sridhar) Date: Tue Nov 11 14:49:16 2008 Subject: [mvapich-discuss] Announcing the release of MVAPICH2 1.2 In-Reply-To: <20081111175727.GG32449@daltons.rzg.mpg.de> References: <20081111151416.GD32449@daltons.rzg.mpg.de> <20081111161954.GA19542@kappa.cse.ohio-state.edu> <20081111175727.GG32449@daltons.rzg.mpg.de> Message-ID: <20081111194907.GA24603@kappa.cse.ohio-state.edu> Hi Christian, On Tue, Nov 11, 2008 at 06:57:27PM +0100, Christian Guggenberger wrote: > > > > It doesn't look for a default hostfile. 
> > > > that's sad - is such a feature planned for a future release ? > Thanks for your suggestion, this feature would be good to have. We'll work on adding this in the next release of MVAPICH and MVAPICH2. -Jaidev -- You can rent this space for only $5 a week. From daniel.s.kokron at nasa.gov Tue Nov 11 17:37:07 2008 From: daniel.s.kokron at nasa.gov (Dan Kokron) Date: Tue Nov 11 20:17:47 2008 Subject: [mvapich-discuss] invalid communicator Message-ID: <1226443027.17790.498.camel@outfield.gsfc.nasa.gov> I am seeing the following error message while running under mvapich-1.1rc1. 151 - MPI_SCATTERV : Communicator argument is not a valid communicator Special bit pattern 37c00000 in communicator is incorrect. May indicate an out-of-order argument or a freed communicator I noticed that a similar issue was raised in this forum in July and was followed up in Sept. with http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2008-September/001916.html The followup does not indicate the coding error that was fixed. What coding error should I be looking for in my code? Any other suggestions to try? mpif90 -show ln -s /home/dkokron/play/mvapich-1.1rc1/include/mpif.h mpif.h /usr/local/intel/comp/9.1.052/bin/ifort -L/usr/lib64 -L/home/dkokron/play/mvapich-1.1rc1/lib -lmpichf90nc -lmpich -L/usr/lib64 -Wl,-rpath=/usr/lib64 -libverbs -libumad -lpthread -lpthread -lrt -- Dan Kokron Global Modeling and Assimilation Office NASA Goddard Space Flight Center Greenbelt, MD 20771 Daniel.S.Kokron@nasa.gov Phone: (301) 614-5192 Fax: (301) 614-5304 From panda at cse.ohio-state.edu Wed Nov 12 00:10:18 2008 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Wed Nov 12 00:10:29 2008 Subject: [mvapich-discuss] invalid communicator In-Reply-To: <1226443027.17790.498.camel@outfield.gsfc.nasa.gov> Message-ID: Dan - The reported error was because of some incorrect datatype format being used by the application. The user code was failing with MVAPICH2 1.2RC2 and also with MPICH2. 
After taking care of the datatype formatting issue, the application was
able to run successfully with MVAPICH2 1.2RC2.

Here is a short explanation of this issue, as analyzed by ANL researchers.

>The problem is that in Allgather, the contribution from rank 0 is stored
>at outbuf, the contribution from rank 1 is stored at outbuf + recvcount *
>extent(recvtype), the contribution from rank 2 is stored at outbuf + 2 *
>recvcount * extent(recvtype), and so on. He has neglected to take the
>extent of the recvtype into account, and hence expects the data to be
>placed elsewhere. Ask him to look at the definition of MPI_Gather, which
>explains how the received data is placed. I didn't check his test
>program, but I suspect it writes outside the allocated buffer as a
>result.

Please check your application to see whether this situation is arising.
You can also verify your application with the standard MPICH or MPICH2
stack with the TCP/IP interface (over Ethernet). This will rule out any
IB-related issues.

Thanks,

DK

On Tue, 11 Nov 2008, Dan Kokron wrote:

> I am seeing the following error message while running under mvapich-1.1rc1.
>
> 151 - MPI_SCATTERV : Communicator argument is not a valid communicator
> Special bit pattern 37c00000 in communicator is incorrect. May indicate an
> out-of-order argument or a freed communicator
>
> I noticed that a similar issue was raised in this forum in July and was followed up in Sept. with
> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2008-September/001916.html
>
> The followup does not indicate the coding error that was fixed. What
> coding error should I be looking for in my code? Any other suggestions
> to try?
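
A minimal sketch of the placement rule quoted above, in case it helps
when auditing application code: the block contributed by rank i starts
i * recvcount * extent(recvtype) bytes into the receive buffer. This is
plain C with no MPI calls. The function names gather_block_offset and
gather_buffer_bytes are made up for illustration, and `extent` is
assumed to be the value the MPI library reports for the receive
datatype (for example via MPI_Type_get_extent).

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset, from the start of the receive buffer, of the block
 * contributed by `rank` in a gather-style collective.
 * Hypothetical helper: `extent` stands for extent(recvtype) in bytes. */
size_t gather_block_offset(int rank, int recvcount, size_t extent)
{
    assert(rank >= 0 && recvcount >= 0);
    return (size_t)rank * (size_t)recvcount * extent;
}

/* Bytes spanned by the blocks of all `nprocs` ranks, assuming each
 * element's data lies within the extent of the datatype.
 * A receive buffer smaller than this will be overrun. */
size_t gather_buffer_bytes(int nprocs, int recvcount, size_t extent)
{
    assert(nprocs > 0 && recvcount >= 0);
    return (size_t)nprocs * (size_t)recvcount * extent;
}
```

For example, with recvcount = 4 and an 8-byte extent, rank 2's block
starts at byte 64 and a 4-rank gather spans 128 bytes. Code that indexes
or sizes the buffer without the extent factor will look for the data in
the wrong place.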
> > mpif90 -show > ln -s /home/dkokron/play/mvapich-1.1rc1/include/mpif.h mpif.h > /usr/local/intel/comp/9.1.052/bin/ifort -L/usr/lib64 > -L/home/dkokron/play/mvapich-1.1rc1/lib -lmpichf90nc -lmpich > -L/usr/lib64 -Wl,-rpath=/usr/lib64 -libverbs -libumad -lpthread > -lpthread -lrt > > > -- > Dan Kokron > Global Modeling and Assimilation Office > NASA Goddard Space Flight Center > Greenbelt, MD 20771 > Daniel.S.Kokron@nasa.gov > Phone: (301) 614-5192 > Fax: (301) 614-5304 > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From michael.heinz at qlogic.com Wed Nov 12 11:52:29 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Wed Nov 12 11:52:59 2008 Subject: [mvapich-discuss] fork() failing in mvapich1 and mvapich2, using OFED 1.4 Message-ID: I'm not sure when this stopped working, but I'm getting a complaint from our QA people that our fork() test program is failing with mvapich1 and mvapich2 when tested with OFED 1.4. 
When I tested with OFED 1.3.1, I got a similar result:

[root@panic mpi_fork]$ mpirun_rsh -np 2 panic homer mpi_fork 128 1024
Exit code -3 signaled from homer
Abort signaled by rank 0: [panic:0] Got completion with error
IBV_WC_LOC_LEN_ERR, code=1, dest rank=1

Killing remote processes...MPI process terminated unexpectedly
DONE

This is the program that generates the failure:

/* The header names were stripped by the archive; these are the headers
   the code below needs. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <assert.h>
#include <sys/wait.h>

#define MYBUFSIZE (4*1024*1028)
#define MAX_REQ_NUM 100000

char s_buf1[MYBUFSIZE];
char r_buf1[MYBUFSIZE];

MPI_Request request[MAX_REQ_NUM];
MPI_Status my_stat[MAX_REQ_NUM];

int main(int argc,char *argv[])
{
    int myid, numprocs, i;
    int size, loop, page_size;
    char *s_buf, *r_buf;
    double t_start=0.0, t_end=0.0, t=0.0;

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);

    if ( argc < 3 ) {
        fprintf(stderr, "Usage: mpi_fork loop msg_size\n");
        MPI_Finalize();
        return 0;
    }
    size=atoi(argv[2]);
    loop = atoi(argv[1]);

    if(size > MYBUFSIZE){
        fprintf(stderr, "Maximum message size is %d\n",MYBUFSIZE);
        MPI_Finalize();
        return 0;
    }

    if(loop > MAX_REQ_NUM){
        fprintf(stderr, "Maximum number of iterations is %d\n",MAX_REQ_NUM);
        MPI_Finalize();
        return 0;
    }

    page_size = getpagesize();

    /* round both buffers up to the next page boundary */
    s_buf = (char*)(((unsigned long)s_buf1 + (page_size -1))/page_size * page_size);
    r_buf = (char*)(((unsigned long)r_buf1 + (page_size -1))/page_size * page_size);

    assert( (s_buf != NULL) && (r_buf != NULL) );

    /* The loop bound was lost in the archive; "size" is assumed here. */
    for ( i=0; i<size; i++ ){
        s_buf[i]='a';
        r_buf[i]='b';
    }

    /*warmup */
    if (myid == 0)
    {
        for ( i=0; i< loop; i++ ) {
            MPI_Isend(s_buf, size, MPI_CHAR, 1, 100, MPI_COMM_WORLD, request+i);
        }

        MPI_Waitall(loop, request, my_stat);
        MPI_Recv(r_buf, 4, MPI_CHAR, 1, 101, MPI_COMM_WORLD, &my_stat[0]);

    }else{
        for ( i=0; i< loop; i++ ) {
            MPI_Irecv(r_buf, size, MPI_CHAR, 0, 100, MPI_COMM_WORLD, request+i);
        }
        MPI_Waitall(loop, request, my_stat);
        MPI_Send(s_buf, 4, MPI_CHAR, 0, 101, MPI_COMM_WORLD);
    }
    // fork a child process and make sure it lives beyond parent touching pages
    // if fork is not properly handled in stack, parent would get a copy
    // of its registered/locked pages (such as qp wqes) on 1st access
    // and problems such as Local Length Error would be reported by HCA
    if (fork() == 0) {
        // child exists but doesn't touch anything, parent still owns pages
        sleep(10);
        // exec another program
        execlp("date", "date", NULL);
        // just in case exec fails
        exit(0);
    }

    MPI_Barrier(MPI_COMM_WORLD);

    if (myid == 0)
    {
        t_start=MPI_Wtime();
        for ( i=0; i< loop; i++ ) {
            MPI_Isend(s_buf, size, MPI_CHAR, 1, 100, MPI_COMM_WORLD, request+i);
        }

        MPI_Waitall(loop, request, my_stat);
        MPI_Recv(r_buf, 4, MPI_CHAR, 1, 101, MPI_COMM_WORLD, &my_stat[0]);

        t_end=MPI_Wtime();
        t = t_end - t_start;

    }else{
        for ( i=0; i< loop; i++ ) {
            MPI_Irecv(r_buf, size, MPI_CHAR, 0, 100, MPI_COMM_WORLD, request+i);
        }
        MPI_Waitall(loop, request, my_stat);
        MPI_Send(s_buf, 4, MPI_CHAR, 0, 101, MPI_COMM_WORLD);
    }

    if ( myid == 0 ) {
        double tmp;
        tmp = ((size*1.0)/1.0e6)*loop;
        fprintf(stdout,"%d\t%f\n", size, tmp/t);
    }
    {
        int status;
        int ret;

        /* reap the forked child and report any abnormal exit */
        ret = wait(&status);
        if (ret == -1 || ! WIFEXITED(status) || WEXITSTATUS(status) != 0)
        {
            fprintf(stdout,"ERROR: child failure: ret=%d, status=0x%x, exit_status=%d\n",
                    ret, status, WEXITSTATUS(status));
        }
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

From: koop at cse.ohio-state.edu (Matthew Koop)
Subject: [mvapich-discuss] fork() failing in mvapich1 and mvapich2, using OFED 1.4
Message-ID:

Hi Mike,

In order to have the fork support enabled you need to set an additional
environment variable. See Section 7.1.2 in the User Guide for more
information:

http://mvapich.cse.ohio-state.edu/support/mvapich_user_guide.html#x1-350007.1.2

Thanks,

Matt

On Wed, 12 Nov 2008, Mike Heinz wrote:

> I'm not sure when this stopped working, but I'm getting a complaint from
> our QA people that our fork() test program is failing with mvapich1 and
> mvapich2 when tested with OFED 1.4. When I tested with OFED 1.3.1, I got
> a similar result:
>
> [root@panic mpi_fork]$ mpirun_rsh -np 2 panic homer mpi_fork 128 1024
> Exit code -3 signaled from homer
> Abort signaled by rank 0: [panic:0] Got completion with error
> IBV_WC_LOC_LEN_ERR, code=1, dest rank=1
>
> Killing remote processes...MPI process terminated unexpectedly
> DONE

From michael.heinz at qlogic.com Wed Nov 12 12:22:22 2008
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Wed Nov 12 12:22:53 2008
Subject: [mvapich-discuss] fork() failing in mvapich1 and mvapich2, using OFED 1.4
In-Reply-To:
References:
Message-ID:

Thanks for the reply, Matt.

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

-----Original Message-----
From: Matthew Koop [mailto:koop@cse.ohio-state.edu]
Sent: Wednesday, November 12, 2008 12:13 PM
To: Mike Heinz
Cc: mvapich-discuss@cse.ohio-state.edu; general@lists.openfabrics.org
Subject: Re: [mvapich-discuss] fork() failing in mvapich1 and mvapich2, using OFED 1.4

Hi Mike,

In order to have the fork support enabled you need to set an additional
environment variable. See Section 7.1.2 in the User Guide for more
information:

http://mvapich.cse.ohio-state.edu/support/mvapich_user_guide.html#x1-350007.1.2

Thanks,

Matt

From spiglerg at gmail.com Thu Nov 13 13:17:58 2008
From: spiglerg at gmail.com (SpiglerG)
Date: Thu Nov 13 13:18:13 2008
Subject: [mvapich-discuss] Re: mvapich2 crash
In-Reply-To:
References:
Message-ID:

> Hi.
> Since I started using the BASS cluster, I've worked on porting my
> applications to the new system (mainly library compatibility), but I
> had some weird problems using the installed MPI library.
> After some coding, I could finally narrow the problem down to a simple case.
> It seems that running an MPI program which uses the pthread library, and
> has threads making malloc/free calls, leads to program crashes due
> to `munmap_chunk() invalid pointer`s.
> This could depend on some memory locking or strict memory handling
> in the MPI system; could someone help me solve it?
> I'm attaching the source code I'm using (the stripped-down one), along
> with an example of a crash.
> I'm compiling with `mpicc -o pt pt.c -lpthread -fPIC` (-fPIC just to
> get some more debug information), and running with `mpirun -np 1
> -machinefile machines $(pwd)/pt` (where machines contains a single
> line with an allocated machine, e.g. using `qlogin -l gpus=4`, [as I'm
> working on GPU nodes for my apps]).
>
> Hope someone can help me.
> Giacomo Spigler
>

I've downloaded and compiled the latest version of MVAPICH2, configuring
with `./configure --prefix... --enable-cxx --enable-threads=multiple`,
and now the problem doesn't persist anymore.
It could be that it was related to a now-fixed MVAPICH2 bug, or to a
missing `enable-threads` (as the problem was related to threads), but
it's solved.
I would like to ask whether it is possible to do this officially (that
is, without having to keep my own mpicc executable to compile my
programs), and upgrade the system this way.
Thanks.

Giacomo Spigler

From koop at cse.ohio-state.edu Thu Nov 13 21:39:40 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Thu Nov 13 21:39:51 2008
Subject: [mvapich-discuss] Re: mvapich2 crash
In-Reply-To:
Message-ID:

This is somewhat strange. Does using the latest MVAPICH2 without explicitly
setting --enable-threads=multiple work? There has been a bug fix to
ptmalloc with the latest release as well.

Your example works fine on my system here with the latest MVAPICH2:

[koop@wci1-oib mvapich2-trunk3]$ ./bin/mpirun_rsh -np 1 wci1 ./pt
In main: creating thread 0
In main: creating thread 1
In main: creating thread 2
In main: creating thread 3
In main: creating thread 4
Hello, World! It's me, thread #5!
Hello, World! It's me, thread #5!
Hello, World! It's me, thread #5!
Hello, World! It's me, thread #5!
Hello, World! It's me, thread #5!

Unless you are making MPI calls from the threads, using MPI_THREAD_MULTIPLE
should not be required (and would require MPI_Init_thread).

Matt

On Thu, 13 Nov 2008, SpiglerG wrote:

> I've downloaded and compiled latest version of MVAPICH2, compiling
> with `./configure --prefix... --enable-cxx --enable-threads=multiple`
> and now the problems doesn't persist anymore.
> It could be that it was related to a now-fixed MVAPICH2 bug, or to a
> missing `enable-threads` (as the problem was related to threads), but
> it's solved.
> I would like to ask whether it is possible to do this officially (that
> is, without having to keep my mpicc executable to compile my
> programs), and upgrade the system this way.
> Thanks.
>
> Giacomo Spigler

From spiglerg at gmail.com Fri Nov 14 07:38:17 2008
From: spiglerg at gmail.com (SpiglerG)
Date: Fri Nov 14 07:38:29 2008
Subject: [mvapich-discuss] Re: mvapich2 crash
In-Reply-To:
References:
Message-ID:

> This is somewhat strange. Does using the latest MVAPICH2 without explicitly
> setting --enable-threads=multiple work? There has been a bug fix to
> ptmalloc with the latest release as well.
>
> Your example works fine on my system here with the latest MVAPICH2:
> [koop@wci1-oib mvapich2-trunk3]$ ./bin/mpirun_rsh -np 1 wci1 ./pt
> In main: creating thread 0
> In main: creating thread 1
> In main: creating thread 2
> In main: creating thread 3
> In main: creating thread 4
> Hello, World! It's me, thread #5!
> Hello, World! It's me, thread #5!
> Hello, World! It's me, thread #5!
> Hello, World! It's me, thread #5!
> Hello, World! It's me, thread #5!
>
> Unless you are making MPI calls from the threads, using MPI_THREAD_MULTIPLE
> should not be required (and would require MPI_Init_thread).
>

I'm not calling MPI functions from within threads, just mallocing/freeing.
I think the problem was with the ptmalloc library, as this was already
suggested by a few other coders.

Giacomo

From panda at cse.ohio-state.edu Fri Nov 14 22:54:42 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri Nov 14 22:54:55 2008
Subject: [mvapich-discuss] Announcing the release of MVAPICH 1.1
Message-ID:

The MVAPICH team is pleased to announce the availability of MVAPICH-1.1
with the following NEW features:

- New Features for OpenFabrics Gen2-IB Interface
    - eXtended Reliable Connection (XRC) support
    - Lock-free design to provide support for asynchronous
      progress at both sender and receiver to overlap
      computation and communication
    - Optimized MPI_Allgather collective
    - Efficient intra-node shared memory communication
      support for diskless clusters
    - Enhanced Totalview Support with the new mpirun_rsh framework

- New OpenFabrics Gen2-Hybrid Interface
    - Replaces the Gen2-UD interface of MVAPICH 1.0 series
    - Targeted for large-scale IB clusters (multi-thousand cores)
      to provide highest performance and minimal memory usage
    - Support for UD, RC and XRC transports
    - Adaptive selection during run-time (based on application
      and systems characteristics) to switch between RC and UD
      (or between XRC and UD) transports
    - Delivers performance and scalability with near constant
      memory footprint for communication contexts
    - Zero-copy protocol with UD for large data transfer
    - Multiple buffer organizations with XRC support
    - Shared memory communication between cores within a node
    - Efficient intra-node shared memory communication
      support for diskless clusters
    - Multi-core optimized collectives
      (MPI_Bcast, MPI_Barrier, MPI_Reduce and MPI_Allreduce)
    - Optimized MPI_Allgather collective
    - Enhanced Totalview Support with the new mpirun_rsh framework

- New Features for MVAPICH-InfiniPath (QLogic) Interface
    - Enhanced Totalview Support with the new mpirun_rsh framework

- New Features for Shared-Memory only Interface
    - Enhanced Totalview Support with the new mpirun_rsh framework

More details on all features and supported platforms can be obtained
by visiting the following URL:

http://mvapich.cse.ohio-state.edu/overview/mvapich/features.shtml

MVAPICH 1.1 is being made available with OFED 1.4. It is also tested
with OFED 1.3. It continues to deliver excellent performance. Sample
performance numbers include:

  OpenFabrics/Gen2-IB on EM64T quad-core with PCIe2 and ConnectX-QDR:
        - 1.17 microsec one-way latency (4 bytes)
        - 2569 MB/sec unidirectional bandwidth
        - 5025 MB/sec bidirectional bandwidth

  OpenFabrics/Gen2-Hybrid on EM64T quad-core with PCIe2 and ConnectX-QDR:
        - 1.18 microsec one-way latency (4 bytes)
        - 2571 MB/sec unidirectional bandwidth
        - 5027 MB/sec bidirectional bandwidth

  OpenFabrics/Gen2-IB on Opteron quad-core with PCIe and ConnectX-DDR:
        - 1.62 microsec one-way latency (4 bytes)
        - 1628 MB/sec unidirectional bandwidth
        - 2889 MB/sec bidirectional bandwidth

  InfiniPath on EM64T quad-core with PCIe2 and QLogic-DDR:
        - 1.28 microsec one-way latency (4 bytes)
        - 1953 MB/sec unidirectional bandwidth

Performance numbers for several other platforms, system configurations
and operations can be viewed by visiting the `Performance' section of the
project's web page.

For downloading the MVAPICH 1.1 package and accessing the anonymous SVN,
please visit the following URL:

http://mvapich.cse.ohio-state.edu/

All feedback, including bug reports, hints for performance tuning,
patches and enhancements, is welcome. Please post it to the
mvapich-discuss mailing list.
Thanks,

The MVAPICH Team

From daniel.s.kokron at nasa.gov Tue Nov 18 13:35:19 2008
From: daniel.s.kokron at nasa.gov (Dan Kokron)
Date: Tue Nov 18 13:39:58 2008
Subject: [mvapich-discuss] failure during startup when using mvapich2-1.2
Message-ID: <1227033319.13762.39.camel@outfield.gsfc.nasa.gov>

My first attempt to use mvapich2-1.2 fails with the following errors.
Any ideas why this is failing? The config.log is attached.

mpirun_rsh -ssh -np 240 -hostfile machinefile ./GEOSgcm.x
[cli_14]: readline failed
Segmentation fault
[cli_10]: readline failed
[cli_11]: readline failed
[cli_12]: readline failed
[cli_8]: readline failed
[cli_15]: readline failed
[cli_9]: readline failed
[cli_13]: readline failed
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
[cli_0]: readline failed
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Child exited abnormally!
cleanupKilling remote processes...[cli_27]: readline failed
Broken pipe
[cli_4]: readline failed
[cli_6]: [cli_3]: readline failed
[cli_7]: readline failed
readline failed
[cli_1]: readline failed
[cli_2]: readline failed
[cli_5]: readline failed
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Broken pipe
[cli_24]: readline failed
[cli_30]: [cli_28]: readline failed
readline failed
[cli_25]: readline failed
[cli_29]: readline failed
[cli_31]: readline failed
[cli_26]: readline failed
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310)...........: Initialization failed
MPID_Init(113)..................: channel initialization failed
MPIDI_CH3_Init(182).............:
MPIDI_CH3I_SMP_Init(652)........:
MPIDI_CH3I_SMP_pull_header(2599): PMI_KVS_Get returned 0

--
Dan Kokron
Global Modeling and Assimilation Office
NASA Goddard Space Flight Center
Greenbelt, MD 20771
Daniel.S.Kokron@nasa.gov
Phone: (301) 614-5192
Fax: (301) 614-5304
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.log
Type: text/x-log
Size: 204629 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20081118/34d9e2a5/config-0001.bin

From nilesh_awate at yahoo.com Wed Nov 19 09:30:14 2008
From: nilesh_awate at yahoo.com (nilesh awate)
Date: Wed Nov 19 09:30:31 2008
Subject: [mvapich-discuss] messege truncated
Message-ID: <732636.48804.qm@web94102.mail.in2.yahoo.com>

Hi all,

I am using mvapich2-1.0.3 with a DAPL interconnect (it's a proprietary NIC
and DAPL library). I got the following error while running Pallas on a
5-node cluster (AMD dual core).

Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(186)..........................: MPI_Recv(buf=0x7fff24744cec, count=952788905, MPI_INT, src=2, tag=1000,MPI_COMM_WORLD, status=0x7fff24744cd0) failed
MPIDI_CH3U_Post_data_receive_found(243): Message from rank 2 and tag 1000 truncated; 4 bytes received but buffersize is -483811676
rank 0 in job 2 test01_40634 caused collective abort of all ranks
exit status of rank 0: killed by signal 9

Could you suggest where we should look to solve the above error?
What can we interpret from the above message?

Waiting for your reply.

Thanks,
Nilesh

From panda at cse.ohio-state.edu Wed Nov 19 10:57:36 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Wed Nov 19 10:57:50 2008
Subject: [mvapich-discuss] messege truncated
In-Reply-To: <732636.48804.qm@web94102.mail.in2.yahoo.com>
Message-ID:

MVAPICH2 1.2 was released around two weeks back. Can you try the latest
version?

DK

On Wed, 19 Nov 2008, nilesh awate wrote:

> Hi all,
>
> I am using mvapich2-1.0.3 with a DAPL interconnect (it's a proprietary NIC
> and DAPL library). I got the following error while running Pallas on a
> 5-node cluster (AMD dual core).
>
> Fatal error in MPI_Recv: Message truncated, error stack:
> MPI_Recv(186)..........................: MPI_Recv(buf=0x7fff24744cec, count=952788905, MPI_INT, src=2, tag=1000,MPI_COMM_WORLD, status=0x7fff24744cd0) failed
> MPIDI_CH3U_Post_data_receive_found(243): Message from rank 2 and tag 1000 truncated; 4 bytes received but buffersize is -483811676
> rank 0 in job 2 test01_40634 caused collective abort of all ranks
> exit status of rank 0: killed by signal 9
>
> Could you suggest where we should look to solve the above error?
> What can we interpret from the above message?
>
> Waiting for your reply.
>
> Thanks,
> Nilesh

From nilesh_awate at yahoo.com Thu Nov 20 08:02:29 2008
From: nilesh_awate at yahoo.com (nilesh awate)
Date: Thu Nov 20 08:02:46 2008
Subject: [mvapich-discuss] messege truncated
References:
Message-ID: <696356.96646.qm@web94105.mail.in2.yahoo.com>

Thanks for the suggestion (use mvapich2-1.2), sir.

I have tried that, but we are still facing the same problem:

Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(186).......................: MPI_Recv(buf=0x7fff1faf6008, count=945075466, MPI_INT, src=2, tag=1000, MPI_COMM_WORLD, status=0x7fff1faf5fe0) failed
MPIDI_CH3U_Request_unpack_uebuf(590): Message truncated; 4 bytes received but buffer size is -514665432
rank 0 in job 4 test01_52519 caused collective abort of all ranks
exit status of rank 0: killed by signal 9

Is there any suggestion?

What does this error mean?

Is this a result of data corruption or a missing packet, or something else?

Waiting for your reply.
Nilesh Awate

________________________________
From: Dhabaleswar Panda
To: nilesh awate
Cc: MVAPICH2
Sent: Wednesday, 19 November, 2008 9:27:36 PM
Subject: Re: [mvapich-discuss] messege truncated

MVAPICH2 1.2 was released around two weeks back. Can you try the latest
version?

DK

From luitjens at cs.utah.edu Thu Nov 20 10:57:42 2008
From: luitjens at cs.utah.edu (Justin)
Date: Thu Nov 20 10:58:10 2008
Subject: [mvapich-discuss] messege truncated
In-Reply-To: <696356.96646.qm@web94105.mail.in2.yahoo.com>
References: <696356.96646.qm@web94105.mail.in2.yahoo.com>
Message-ID: <492588F6.20704@cs.utah.edu>

The message means MPI received a message larger than the buffer size you
specified.
Namely in this case the buffer length is '-514665432' thus any length of message would be bigger than it. What I find odd is the parameters you are sending MPI_Recv. You are sending a count of '945075466' are you really sending a message that is a gigabyte in size? It might be possible that the count is being converted to a signed int causing it to wrap to a negative number. Check the size that you are specifying for the buffer. It is odd that you have it specified to be a GB in size when you are only receiving 2 bytes. nilesh awate wrote: > > Thanks for suggestion (use mvapich2-1.2) sir, > > I have tried the same but still we are facing same problem > > Fatal error in MPI_Recv: > Message truncated, error stack: > MPI_Recv(186).......................: MPI_Recv(buf=0x7fff1faf6008, > count=945075466, MPI_INT, src=2, tag=1000, MPI_COMM_WORLD, > status=0x7fff1faf5fe0) failed > MPIDI_CH3U_Request_unpack_uebuf(590): Message truncated; 4 bytes > received but buffer size is -514665432 > rank 0 in job 4 test01_52519 caused collective abort of all ranks > exit status of rank 0: killed by signal 9 > > is there any suggestion ? > > what does this error mean mean ? > > is this a result of data curruption/packet missing, or something else ? > > wating for reply > Nilesh Awate > > > > ------------------------------------------------------------------------ > *From:* Dhabaleswar Panda > *To:* nilesh awate > *Cc:* MVAPICH2 > *Sent:* Wednesday, 19 November, 2008 9:27:36 PM > *Subject:* Re: [mvapich-discuss] messege truncated > > MVAPICH2 1.2 was released around two weeks back. Can you try the latest > version. > > DK > > On Wed, 19 Nov 2008, nilesh awate wrote: > > > Hi all, > I am using mvapich2-1.0.3 with dapl interconnect (its a > proprietary nic & dapl library) > I got following error while running pallas over (amd dual core) 5 > nodes cluster. 
> > Fatal error in MPI_Recv: > Message truncated, error stack: > MPI_Recv(186)..........................: MPI_Recv(buf=0x7fff24744cec, > count=952788905, MPI_INT, src=2, tag=1000,MPI_COMM_WORLD, > status=0x7fff24744cd0) failed > MPIDI_CH3U_Post_data_receive_found(243): Message from rank 2 and tag > 1000 truncated; 4 bytes received but buffersize is -483811676 > rank 0 in job 2 test01_40634 caused collective abort of all ranks > exit status of rank 0: killed by signal 9 > > > will you suggest where we should look for solving above error ? > what can we interpret from above message ? > > wating for reply > thanking > Nilesh > > > Bring your gang together. Do your thing. Find your favourite > Yahoo! group at http://in.promos.yahoo.com/groups/ > > > ------------------------------------------------------------------------ > Add more friends to your messenger and enjoy! Invite them now. > > > ------------------------------------------------------------------------ > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss@cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From nilesh_awate at yahoo.com Fri Nov 21 08:23:07 2008 From: nilesh_awate at yahoo.com (nilesh awate) Date: Fri Nov 21 08:23:26 2008 Subject: [mvapich-discuss] messege truncated References: <696356.96646.qm@web94105.mail.in2.yahoo.com> <492588F6.20704@cs.utah.edu> Message-ID: <596548.81154.qm@web94104.mail.in2.yahoo.com> Hi Justine, We are running Pallas over mpi( dapl interconnect), I got the same error while running Pallas with tcp-ip(ethernet) network. 

Fatal error in MPI_Recv:
Message truncated, error stack:
MPI_Recv(186)..........................: MPI_Recv(buf=0x7fff23cdd22c, count=976479459, MPI_INT, src=2, tag=1000,MPI_COMM_WORLD, status=0x7fff23cdd210) failed
MPIDI_CH3U_Post_data_receive_found(163): Message from rank 2 and tag 1000 truncated; 4 bytes received but buffersize is -389049460

I am running it on a 5-node AMD cluster with this configuration: 1 GHz Dual-Core AMD Opteron Processor 1216.

I don't know how MPI_Recv got such a huge count when Pallas sends at most 4194304 bytes. Is this some garbage value that it receives?

Waiting for your reply,
Nilesh

________________________________
From: Justin
To: nilesh awate
Cc: Dhabaleswar Panda ; MVAPICH2
Sent: Thursday, 20 November, 2008 9:27:42 PM
Subject: Re: [mvapich-discuss] messege truncated

The message means MPI received a message larger than the buffer size you specified. In this case the buffer length is '-514665432', so a message of any length would be larger than it. What I find odd are the parameters you are passing to MPI_Recv. You are passing a count of '945075466'; are you really expecting a message that is gigabytes in size? It is possible that the size is being converted to a signed int, causing it to wrap to a negative number. Check the size that you are specifying for the buffer. It is odd that you have it specified to be gigabytes in size when you are only receiving 4 bytes.
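
A quick check on the numbers in the error stacks in this thread: each negative buffer size is exactly what you get when count * 4 (assuming a 4-byte MPI_INT) is squeezed into a signed 32-bit integer. The sketch below is plain C++ with no MPI calls. The helper names are made up for illustration, and whether the library really does its size arithmetic this way is an assumption, not something confirmed in this thread. It also says nothing about where the huge count itself comes from.

```cpp
#include <cstdint>

// Hypothetical helper: buffer size in bytes, computed in 64 bits so it cannot wrap.
std::int64_t buffer_bytes(std::int64_t count, std::int64_t elem_size) {
    return count * elem_size;
}

// Hypothetical helper: the value a signed 32-bit int would hold if the
// 64-bit byte count were truncated to its low 32 bits (two's complement).
std::int32_t as_int32(std::int64_t bytes) {
    const std::int64_t two_to_32 = 4294967296LL;
    std::int64_t low = bytes % two_to_32;        // keep the low 32 bits
    if (low < 0) low += two_to_32;               // normalise to [0, 2^32)
    if (low >= two_to_32 / 2) low -= two_to_32;  // upper half maps to negatives
    return static_cast<std::int32_t>(low);
}
```

For the three reports above, as_int32(buffer_bytes(952788905, 4)) gives -483811676, as_int32(buffer_bytes(945075466, 4)) gives -514665432, and as_int32(buffer_bytes(976479459, 4)) gives -389049460, matching the "buffer size" printed in each error stack. So the negative size looks like a side effect of the oversized count rather than a separate problem.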

From luitjens at cs.utah.edu Fri Nov 21 10:39:51 2008
From: luitjens at cs.utah.edu (Justin)
Date: Fri Nov 21 10:40:23 2008
Subject: [mvapich-discuss] messege truncated
In-Reply-To: <596548.81154.qm@web94104.mail.in2.yahoo.com>
References: <696356.96646.qm@web94105.mail.in2.yahoo.com> <492588F6.20704@cs.utah.edu> <596548.81154.qm@web94104.mail.in2.yahoo.com>
Message-ID: <4926D647.7010703@cs.utah.edu>

One thing that I have used in the past to track down bugs of this nature is the MPI_Errhandler functionality. Try placing this in your code after MPI_Init:

    // MPI-2 also offers MPI_Comm_set_errhandler, which takes the same arguments
    MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

Then put an if around each of your MPI_Recv calls and add some debugging output:

    // needs <iostream> and <unistd.h> (gethostname, getpid, sleep)
    if (MPI_Recv(...) != MPI_SUCCESS) {
        char hostname[100];
        gethostname(hostname, 100);
        cout << "MPI Recv returned error on " << hostname << ":" << getpid() << endl;
        cout << "Waiting for a debugger\n";
        while (1) sleep(1);   // park here without spinning until a debugger attaches
    }

From there you should be able to ssh into the back-end node doing the processing (given by the hostname above) and attach gdb to the process (given by the pid above). Make sure you have compiled with -g. Then look at the parameters to MPI_Recv and see if something doesn't look right.
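
The wait-for-a-debugger step described above can also be packaged so that the job does not hang forever if nobody attaches. This is only a sketch under a few assumptions: it is plain POSIX C++ with no MPI calls, the names debugger_banner and wait_for_debugger are hypothetical, and the release-flag approach is a common debugging idiom rather than anything specific to MVAPICH.

```cpp
#include <unistd.h>   // gethostname, getpid, sleep (POSIX)
#include <sstream>
#include <string>

// Hypothetical helper: builds the "host:pid" line that tells you where to attach gdb.
std::string debugger_banner(const std::string& what) {
    char hostname[256] = {0};
    gethostname(hostname, sizeof(hostname) - 1);
    std::ostringstream out;
    out << what << " on " << hostname << ":" << getpid()
        << " (waiting for a debugger)";
    return out.str();
}

// Hypothetical helper: sleeps in one-second steps until *release becomes
// non-zero or max_seconds have elapsed. Returns the number of seconds waited.
int wait_for_debugger(volatile int* release, int max_seconds) {
    int waited = 0;
    while (*release == 0 && waited < max_seconds) {
        sleep(1);
        ++waited;
    }
    return waited;
}
```

In the failing branch you would print debugger_banner("MPI_Recv failed") and then call wait_for_debugger(&go, 600) with a local volatile int go = 0. After attaching gdb to the printed pid, select the frame that owns go, run set var go = 1, and continue. The timeout is a design choice: a forgotten debug build then releases its nodes instead of spinning until the batch system kills it.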

Good luck,
Justin

From mbozzore at platform.com Sat Nov 22 22:44:55 2008
From: mbozzore at platform.com (Mehdi Bozzo-Rey)
Date: Sat Nov 22 22:45:07 2008
Subject: [mvapich-discuss] mvapich2 / rpm specfile / OFED 1.4 / blcr
Message-ID: <531893A968B34D40B36C7A6445BC828A0210602B@catoexm06.noam.corp.platform.com>

Hello,

I used your MVAPICH2 spec file (the one available in the OFED distribution) and I noticed the following in the logs:

* Thu Oct 09 2008 Jonathan Perkins
- Change MV2_DEFAULT_MAX_WQE from 200 to 64 to reduce memory usage.
- Fix mpirun_rsh ssh stdin bug.
- Always build and install mpirun_rsh in addition to the process manager(s)
  selected through the --with-pm mechanism.
- Remove various compilation warnings.

I enabled the use of the BLCR library and noticed that (as mentioned in the logs) mpirun_rsh is also built and installed, but it does not work (because I use BLCR, only the MPD framework is supported and should be used). Am I right?

More precisely, the error I got was:

[mbozzore@compute-0-0 examples]$ /opt/mvapich2/gnu/bin/mpirun_rsh -ssh -np 2 -hostfile ./hostfile ./cpi
[Rank 0][cr.c: line 601]MV2_CKPT_MPD_BASE_PORT is not set
Exit code -5 signaled from compute-0-0
cleanupKilling remote processes...[Rank 0][cr.c: line 601]MV2_CKPT_MPD_BASE_PORT is not set
MPI process terminated unexpectedly
MPI process terminated unexpectedly
DONE
[mbozzore@compute-0-0 examples]$ Signal 15 received.
Signal 15 received.

[mbozzore@compute-0-0 examples]$ /opt/mvapich2/gnu/bin/mpirun_rsh -ssh -np 1 -hostfile ./hostfile ./cpi
Exit code -5 signaled from compute-0-0
cleanupKilling remote processes...[Rank 0][cr.c: line 601]MV2_CKPT_MPD_BASE_PORT is not set
MPI process terminated unexpectedly
DONE
[mbozzore@compute-0-0 examples]$ Signal 15 received.

So, should mpirun_rsh be skipped during the install when BLCR is used?

Best regards,
Mehdi

Mehdi Bozzo-Rey
HPC Solution Developer
Platform OCS5
Platform computing
Phone: +1 905 948 4649

From panda at cse.ohio-state.edu Sat Nov 22 23:15:14 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Sat Nov 22 23:15:29 2008
Subject: [mvapich-discuss] mvapich2 / rpm specfile / OFED 1.4 / blcr
In-Reply-To: <531893A968B34D40B36C7A6445BC828A0210602B@catoexm06.noam.corp.platform.com>

Hi Mehdi,

Jonathan will send a detailed reply regarding the rpm specfile. FYI, BLCR in MVAPICH2 1.2 is supported only with MPD, not with mpirun_rsh. This restriction will go away in the next release. For all other interfaces/modes, both mpirun_rsh and MPD are built and available for use. We strongly recommend that users use mpirun_rsh for its performance and scalability.

Thanks,
DK

From noam.bernstein at nrl.navy.mil Mon Nov 24 09:54:17 2008
From: noam.bernstein at nrl.navy.mil (Noam Bernstein)
Date: Mon Nov 24 09:54:32 2008
Subject: [mvapich-discuss] ifort 11.0.069 mvapich-1.1 weird conflict
Message-ID: <18E8049F-BCFE-476D-A32D-A0E39B5B2406@nrl.navy.mil>

I'm seeing a very strange problem, which I have no idea how to debug, in the following simple Fortran 90 program:

    program test
    implicit none
    include 'mpif.h'

    logical iop
    integer stat

       call mpi_init(stat)

       ! first open/close of the file
       open(unit=7,file="bulk.Cr.xyz",form="formatted",status="OLD", iostat=stat)
       print *, stat
       close(unit=7)

       inquire(7,opened=iop)
       print *, iop

       ! second open of the same file on the same unit
       open(unit=7,file="bulk.Cr.xyz",form="formatted",status="OLD", iostat=stat)
       print *, stat

       call mpi_finalize(stat)
    end program

I've compiled mvapich-1.1 with Intel 10.1.018. When I compile the program with Intel ifort 10.1.018, it works fine (i.e. both opens return stat=0). When I compile the program with ifort 11.0.069, the second open fails with status 40, which is recursive I/O (an I/O statement executed from inside another I/O statement). I'd blame Intel first, of course, but this only happens when I use MPI. If I strip out the MPI calls, the program works fine.

Any ideas how to proceed?

thanks,
Noam

From bachth at uni-mainz.de Mon Nov 24 10:46:14 2008
From: bachth at uni-mainz.de (Thomas Bach)
Date: Mon Nov 24 10:46:31 2008
Subject: [mvapich-discuss] Error while compiling mvapich-1.1 with sunstudio
Message-ID: <492ACC46.9050600@uni-mainz.de>

Hi all,

I'm trying to compile mvapich-1.1 with the Sun Studio compiler (version 12.0, most current patch level). See the attached files for error messages.
# rpm -q -a | egrep 'sun-(f90|cc)-12' sun-cc-12.0-8 sun-f90-12.0-5 # cat /etc/SuSE-release SUSE Linux Enterprise Server 10 (x86_64) VERSION = 10 PATCHLEVEL = 1 # suncc -V cc: Sun C 5.9 Linux_i386 Patch 124871-07 2008/10/08 # sunCC -V sunCC: Sun C++ 5.9 Linux_i386 Patch 124865-06 2008/08/22 # sunf77 -V f90: Sun Fortran 95 8.3 Linux_i386 Patch 127145-04 2008/04/17 Any suggestions how to solve this? Greets, Thomas Bach. -------------- next part -------------- cleaning directory mpi-io cleaning directory adio/common cleaning directory mpi-io/glue/mpich1 cleaning directory adio/ad_testfs cleaning directory adio/ad_ufs cleaning directory adio/ad_nfs cleaning directory mpi-io/fortran cleaning directory test rm -f .P* PI* *.o rm -f simple perf async coll_test coll_perf misc file_info excl large_array atomicity noncontig i_noncontig noncontig_coll split_coll shared_fp large_file psimple error status noncontig_coll2 fcoll_test fperf fmisc pfcoll_test cleaning src/pt2pt rm -f *.o *~ /usr/local/cluster/src/mvapich/mvapich-1.1-sun/src/pt2pt/*.o cleaning src/env rm -f *.o *~ /usr/local/cluster/src/mvapich/mvapich-1.1-sun/src/env/*.o cleaning src/dmpi rm -f *.o *~ /usr/local/cluster/src/mvapich/mvapich-1.1-sun/src/dmpi/*.o cleaning src/util rm -f *.o *~ /usr/local/cluster/src/mvapich/mvapich-1.1-sun/src/util/*.o cleaning src/context rm -f *.o *~ /usr/local/cluster/src/mvapich/mvapich-1.1-sun/src/context/*.o cleaning src/coll rm -f *.o *~ /usr/local/cluster/src/mvapich/mvapich-1.1-sun/src/coll/*.o cleaning src/topol rm -f *.o *~ /usr/local/cluster/src/mvapich/mvapich-1.1-sun/src/topol/*.o cleaning src/profile rm -f *.o *~ /usr/local/cluster/src/mvapich/mvapich-1.1-sun/src/profile/*.o cleaning src/misc2 rm -f *.o *~ /usr/local/cluster/src/mvapich/mvapich-1.1-sun/src/misc2/*.o cleaning src/external rm -f *.o *~ /usr/local/cluster/src/mvapich/mvapich-1.1-sun/src/external/*.o cleaning mpid/ch_gen2 /bin/rm -f *.o *.d *~ viainit.o viasend.o viarecv.o viacheck.o viapriv.o viaparam.o 
viutil.o vbuf.o cm_user.o mpid_init.o mpid_send.o mpid_recv.o cm.o mpid_hsend.o mpid_hrecv.o mpid_pack.o cmnargs.o mpid_misc.o dreg.o mpid_smpi.o collutils.o intra_rdma_barrier.o mpid_mcast.o mcst_grp.o ibmcgrp.o crc32h.o avl.o mem_hooks.o viacoalesce.o shmem_coll.o async_progress.o calltrace.o objtrace.o \ queue.o sbcnst2.o tr2.o queue.c sbcnst2.c tr2.c sbcnst2.h tr2.h core ibmcgrp/*.o ibmcgrp/ibmcgrp \ /usr/local/cluster/src/mvapich/mvapich-1.1-sun/bin/ibmcgrp make --no-print-directory -C process clean rm -f *.o mpirun_rsh mpispawn minidaemon_client pmgr_collective_client.o pmgr_collective_common.o minidaemon.o mpirun_util.o cleaning examples rm -f sendchar *.o Making clean in directory test Making clean in directory pt2pt Making clean in directory coll Making clean in directory topol Making clean in directory env Making clean in directory context Making clean in directory profile Making clean in directory io Making clean in directory perftest cd /usr/local/cluster/src/mvapich/mvapich-1.1-sun/examples/perftest && true --foreign --include-deps Makefile cd /usr/local/cluster/src/mvapich/mvapich-1.1-sun/examples/perftest && true test -z "mpptest goptest buflimit " || rm -f mpptest goptest buflimit rm -f *.o core *.core rm -f tunepkt pktuse copytest vectest stress ctest cluster tcomm rm -f work.pc work.pcl cleaning examples/basic rm -f work.pc work.pcl rm -f *.o *~ PI* cpi systest srtest fpi cpilog cpi_autolog hello++ iotest pi3f90 upshot rdb.* startup.* core rm -f hello++.ti hello++.ii cleaning examples/test/pt2pt cleaning examples/test/coll cleaning examples/test/topol cleaning examples/test/context cleaning examples/test/env cleaning examples/test/profile cleaning examples/test Making clean in directory pt2pt Making clean in directory coll Making clean in directory topol Making clean in directory env Making clean in directory context Making clean in directory profile Making clean in directory io cleaning examples/perftest cd 
/usr/local/cluster/src/mvapich/mvapich-1.1-sun/examples/perftest && true --foreign --include-deps Makefile cd /usr/local/cluster/src/mvapich/mvapich-1.1-sun/examples/perftest && true test -z "mpptest goptest buflimit " || rm -f mpptest goptest buflimit rm -f *.o core *.core rm -f tunepkt pktuse copytest vectest stress ctest cluster tcomm rm -f work.pc work.pcl cleaning src/infoexport rm -f *.o rm -f *.i rm -f *.s rm -f *.cxx.log rm -f *.cxx.errors rm -f libtvmpich.so.1.0 rm -f libtvmpich.so* rm -f /usr/local/cluster/src/mvapich/mvapich-1.1-sun/src/infoexport/*.o rm -f *~ *.o aditest1 aditest2 aditest3 aditest4 aditest5 aditest6 aditest7 aditest8 aditest9 aditest10 aditest11 aditest12 aditest13 timers trunc rm -f /usr/local/cluster/src/mvapich/mvapich-1.1-sun/lib/lib*.a rm -f /usr/local/cluster/src/mvapich/mvapich-1.1-sun/lib/shared/lib*.so* make --no-print-directory mpi-modules make --no-print-directory mpilib for file in queue.c sbcnst2.c tr2.c sbcnst2.h tr2.h ; do \ if [ ! -s $file ] ; then \ ln -s ../util/$file; \ fi; \ done making mpir in directory mpid/ch_gen2 for file in queue.c sbcnst2.c tr2.c sbcnst2.h tr2.h ; do \ if [ ! -s $file ] ; then \ ln -s ../util/$file; \ fi; \ done suncc -DHAVE_CONFIG_H -I. -I/usr/local/cluster/src/mvapich/mvapich-1.1-sun/mpid/ch_gen2 -I/usr/local/cluster/src/mvapich/mvapich-1.1-sun/include -I/usr/local/cluster/src/mvapich/mvapich-1.1-sun/include -I/usr/local/cluster/src/mvapich/mvapich-1.1-sun/mpid/ch_gen2 -I/usr/local/cluster/src/mvapich/mvapich-1.1-sun/mpid/util -DMPID_DEVICE_CODE -DHAVE_UNAME=1 -DHAVE_NETDB_H=1 -DHAVE_GETHOSTBYNAME=1 -DMPID_DEBUG_NONE -DMPID_STAT_NONE -D_GNU_SOURCE -D_X86_64_ -DEARLY_SEND_COMPLETION -DMEMORY_SCALE -DVIADEV_RPUT_SUPPORT -D_SMP_ -D_SMP_RNDV_ -DXRC -DCH_GEN2 -D_GNU_SOURCE -I/opt/ofed//include -O3 -DHAVE_MPICHCONF_H -I/usr/local/cluster/src/mvapich/mvapich-1.1-sun -I/usr/local/cluster/src/mvapich/mvapich-1.1-sun/mpid/ch_gen2 -I. 
-c viainit.c "ib_init.h", line 15: warning: initializer does not fit or is out of range: -1 "ib_init.h", line 37: warning: improper pointer/integer combination: op "<" "viainit.c", line 426: warning: implicit function declaration: ibv_create_xrc_srq "viainit.c", line 427: warning: improper pointer/integer combination: op "=" "viainit.c", line 428: undefined struct/union member: xrc_srq_num "viainit.c", line 428: improper member use: xrc_srq_num "viainit.c", line 692: warning: argument #3 is incompatible with prototype: prototype: pointer to function(pointer to void) returning pointer to void : "/usr/include/pthread.h", line 221 argument : pointer to void "viainit.c", line 1134: warning: argument mismatch "viainit.c", line 1144: undefined symbol: IBV_DEVICE_XRC "viainit.c", line 1160: warning: implicit function declaration: ibv_open_xrc_domain "viainit.c", line 1161: warning: improper pointer/integer combination: op "=" "viainit.c", line 1417: warning: argument #3 is incompatible with prototype: prototype: pointer to function(pointer to void) returning pointer to void : "/usr/include/pthread.h", line 221 argument : pointer to void "viainit.c", line 1541: warning: argument mismatch "viainit.c", line 1836: warning: implicit function declaration: ibv_close_xrc_domain cc: acomp failed for viainit.c make[3]: *** [viainit.o] Error 2 Exit status from make was 2 make[2]: *** [mpilib] Error 1 make[1]: *** [mpi-modules] Error 2 make: *** [mpi] Error 2 -------------- next part -------------- configure:suncc -D_X86_64_ -DEARLY_SEND_COMPLETION -DMEMORY_SCALE -DVIADEV_RPUT_SUPPORT -D_SMP_ -D_SMP_RNDV_ -DXRC -DCH_GEN2 -D_GNU_SOURCE -I/opt/ofed//include -O3 conftest.c -o conftest -L/opt/ofed/lib64 -Wl,-rpath=/opt/ofed/lib64 -libverbs -libumad -lpthread configure: failed program was: #define MPE_USE_EXTENSIONS 1 #define HAS_VOLATILE 1 #include "confdefs.h" #include int main() { exit(0); } int t() { /* The GNU C library defines this for functions which it implements to always fail 
with ENOSYS. Some functions are actually named something starting with __ and the normal name is an alias. */ #if defined (__stub_gethrtime) || defined (__stub___gethrtime) choke me #else /* Override any gcc2 internal prototype to avoid an error. */ extern char gethrtime(); gethrtime(); #endif } "conftest.c", line 3: warning: implicit function declaration: exit conftest.o: In function `t': conftest.c:(.text+0x28): undefined reference to `gethrtime' configure:suncc -D_X86_64_ -DEARLY_SEND_COMPLETION -DMEMORY_SCALE -DVIADEV_RPUT_SUPPORT -D_SMP_ -D_SMP_RNDV_ -DXRC -DCH_GEN2 -D_GNU_SOURCE -I/opt/ofed//include -O3 conftest.c -o conftest -L/opt/ofed/lib64 -Wl,-rpath=/opt/ofed/lib64 -libverbs -libumad -lpthread configure: failed program was: #define MPE_USE_EXTENSIONS 1 #define HAS_VOLATILE 1 #include "confdefs.h" #include int main() { exit(0); } int t() { /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. Some functions are actually named something starting with __ and the normal name is an alias. */ #if defined (__stub_clock_gettime) || defined (__stub___clock_gettime) choke me #else /* Override any gcc2 internal prototype to avoid an error. */ extern char clock_gettime(); clock_gettime(); #endif } "conftest.c", line 3: warning: implicit function declaration: exit conftest.o: In function `t': conftest.c:(.text+0x28): undefined reference to `clock_gettime' configure:suncc -D_X86_64_ -DEARLY_SEND_COMPLETION -DMEMORY_SCALE -DVIADEV_RPUT_SUPPORT -D_SMP_ -D_SMP_RNDV_ -DXRC -DCH_GEN2 -D_GNU_SOURCE -I/opt/ofed//include -O3 conftest.c -o conftest -L/opt/ofed/lib64 -Wl,-rpath=/opt/ofed/lib64 -libverbs -libumad -lpthread configure: failed program was: #define MPE_USE_EXTENSIONS 1 #define HAS_VOLATILE 1 #include "confdefs.h" #include int main() { exit(0); } int t() { /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. 
Some functions are actually named something starting with __ and the normal name is an alias. */ #if defined (__stub_clock_gettime) || defined (__stub___clock_gettime) choke me #else /* Override any gcc2 internal prototype to avoid an error. */ extern char clock_gettime(); clock_gettime(); #endif } "conftest.c", line 3: warning: implicit function declaration: exit conftest.o: In function `t': conftest.c:(.text+0x28): undefined reference to `clock_gettime' configure:suncc -D_X86_64_ -DEARLY_SEND_COMPLETION -DMEMORY_SCALE -DVIADEV_RPUT_SUPPORT -D_SMP_ -D_SMP_RNDV_ -DXRC -DCH_GEN2 -D_GNU_SOURCE -I/opt/ofed//include -O3 conftest.c -o conftest -L/opt/ofed/lib64 -Wl,-rpath=/opt/ofed/lib64 -libverbs -libumad -lpthread configure: failed program was: #define MPE_USE_EXTENSIONS 1 #define HAS_VOLATILE 1 #include "confdefs.h" #include int main() { exit(0); } int t() { /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. Some functions are actually named something starting with __ and the normal name is an alias. */ #if defined (__stub_clock_getres) || defined (__stub___clock_getres) choke me #else /* Override any gcc2 internal prototype to avoid an error. */ extern char clock_getres(); clock_getres(); #endif } "conftest.c", line 3: warning: implicit function declaration: exit conftest.o: In function `t': conftest.c:(.text+0x28): undefined reference to `clock_getres' configure:suncc -D_X86_64_ -DEARLY_SEND_COMPLETION -DMEMORY_SCALE -DVIADEV_RPUT_SUPPORT -D_SMP_ -D_SMP_RNDV_ -DXRC -DCH_GEN2 -D_GNU_SOURCE -I/opt/ofed//include -O3 conftest.c -o conftest -L/opt/ofed/lib64 -Wl,-rpath=/opt/ofed/lib64 -libverbs -libumad -lpthread configure: failed program was: #define MPE_USE_EXTENSIONS 1 #define HAS_VOLATILE 1 #include "confdefs.h" #include int main() { exit(0); } int t() { /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. 
Some functions are actually named something starting with __ and the normal name is an alias. */
#if defined (__stub_clock_getres) || defined (__stub___clock_getres)
choke me
#else
/* Override any gcc2 internal prototype to avoid an error. */
extern char clock_getres(); clock_getres();
#endif
}
"conftest.c", line 3: warning: implicit function declaration: exit
conftest.o: In function `t':
conftest.c:(.text+0x28): undefined reference to `clock_getres'
"conftest1.c", line 4: warning: implicit function declaration: gethostbyname
NOTICE: Invoking /opt/sunstudio/sunstudio12/bin/f90 -f77 -ftrap=%none -o conftest conftest.f conftest1.o
conftest.f:
MAIN main:
#include "confdefs.h"
#include
#include
int func( int a, ... ){
int b;
va_list ap;
va_start( ap );
b = va_arg(ap, int);
printf( "%d-%d\n", a, b );
va_end(ap);
fflush(stdout);
return 0;
}
int main() { func( 1, 2 ); return 0;}
"conftest.c", line 7: warning: argument mismatch
configure:suncc -D_X86_64_ -DEARLY_SEND_COMPLETION -DMEMORY_SCALE -DVIADEV_RPUT_SUPPORT -D_SMP_ -D_SMP_RNDV_ -DXRC -DCH_GEN2 -D_GNU_SOURCE -I/opt/ofed//include -O3 conftest.c -o conftest -L/opt/ofed/lib64 -Wl,-rpath=/opt/ofed/lib64 -libverbs -libumad -lpthread
configure: failed program was:
#define MPE_USE_EXTENSIONS 1
#define HAS_VOLATILE 1
#define HAVE_SIGNAL_H 1
#define HAVE_SIGACTION 1
#define HAVE_PRAGMA_WEAK 1
#define HAVE_WEAK_SYMBOLS 1
#define HAVE_UNAME 1
#define HAVE_NETDB_H 1
#define HAVE_GETHOSTBYNAME 1
#define HAVE_CATOPEN 1
#define HAVE_CATCLOSE 1
#define HAVE_CATGETS 1
#define HAVE_GENCAT 1
#define HAVE_NL_TYPES_H 1
#define STDC_HEADERS 1
#define HAVE_STDLIB_H 1
#define HAVE_STRING_H 1
#define HAVE_UNISTD_H 1
#define HAVE_STDARG_H 1
#define USE_STDARG 1
#define MALLOC_RET_VOID 1
#define HAVE_SYSTEM 1
#define HAVE_NICE 1
#define HAVE_STRDUP 1
#define HAVE_MEMORY_H 1
#define HAVE_SYS_IOCTL_H 1
#include "confdefs.h"
#include
int main() { exit(0); }
int t() { main(); }
"conftest.c", line 3: cannot find include file:
"conftest.c", line 4:
warning: implicit function declaration: exit
cc: acomp failed for conftest.c
suncc -D_X86_64_ -DEARLY_SEND_COMPLETION -DMEMORY_SCALE -DVIADEV_RPUT_SUPPORT -D_SMP_ -D_SMP_RNDV_ -DXRC -DCH_GEN2 -D_GNU_SOURCE -I/opt/ofed//include -O3 ... test for quotes in defn
"conftest.c", line 3: warning: old-style declaration or incorrect type for: main
suncc -D_X86_64_ -DEARLY_SEND_COMPLETION -DMEMORY_SCALE -DVIADEV_RPUT_SUPPORT -D_SMP_ -D_SMP_RNDV_ -DXRC -DCH_GEN2 -D_GNU_SOURCE -I/opt/ofed//include -O3 conftest.o -o conftest foo.a -L/opt/ofed/lib64 -Wl,-rpath=/opt/ofed/lib64 -libverbs -libumad -lpthread

From koop at cse.ohio-state.edu Mon Nov 24 11:21:28 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Mon Nov 24 11:21:40 2008
Subject: [mvapich-discuss] Error while compiling mvapich-1.1 with sunstudio
In-Reply-To: <492ACC46.9050600@uni-mainz.de>
Message-ID:

Hi Thomas,

What version of OFED do you have installed on your machine? (you may have
the ofed_info command installed that can tell you what version is
installed).

The error you describe looks to be related to support for eXtended
Reliable Connection (XRC) mode, which allows greater scalability for
ConnectX cards. This requires OFED 1.3 or higher to be installed.

You can compile without this support by removing the "-DXRC" part of the
CFLAGS line within the make.mvapich.gen2 script.

Let us know if this helps.

Thanks,

Matt

On Mon, 24 Nov 2008, Thomas Bach wrote:

> Hi all,
>
> I'm trying to compile mvapich-1.1 with the sunstudio compiler (version
> 12.0 - most current patch level).
> See attached files for error messages.
>
> # rpm -q -a | egrep 'sun-(f90|cc)-12'
> sun-cc-12.0-8
> sun-f90-12.0-5
>
> # cat /etc/SuSE-release
> SUSE Linux Enterprise Server 10 (x86_64)
> VERSION = 10
> PATCHLEVEL = 1
>
> # suncc -V
> cc: Sun C 5.9 Linux_i386 Patch 124871-07 2008/10/08
> # sunCC -V
> sunCC: Sun C++ 5.9 Linux_i386 Patch 124865-06 2008/08/22
> # sunf77 -V
> f90: Sun Fortran 95 8.3 Linux_i386 Patch 127145-04 2008/04/17
>
> Any suggestions how to solve this?
>
> Greets,
> Thomas Bach.
>

From perkinjo at cse.ohio-state.edu Mon Nov 24 14:39:21 2008
From: perkinjo at cse.ohio-state.edu (Jonathan Perkins)
Date: Mon Nov 24 14:39:35 2008
Subject: [mvapich-discuss] ifort 11.0.069 mvapich-1.1 weird conflict
In-Reply-To: <18E8049F-BCFE-476D-A32D-A0E39B5B2406@nrl.navy.mil>
References: <18E8049F-BCFE-476D-A32D-A0E39B5B2406@nrl.navy.mil>
Message-ID: <20081124193921.GF2881@cse.ohio-state.edu>

Noam:
It appears that you are having this problem when using mvapich compiled
with 10.1 and the mpi programs compiled with 11.0. When both the library
and the program use 10.1 there is no problem. Can you make sure that
both the MPI library and your application are compiled with the same
compiler? This should not lead to any problems.

On Mon, Nov 24, 2008 at 09:54:17AM -0500, Noam Bernstein wrote:
> I'm seeing a very strange problem, that I have no idea how to debug in
> the following simple FORTRAN 90 program:
>
> program test
> implicit none
> include 'mpif.h'
>
> logical iop
> integer stat
>
> call mpi_init(stat)
>
> open(unit=7,file="bulk.Cr.xyz",form="formatted",status="OLD",iostat=stat)
> print *, stat
> close(unit=7)
>
> inquire(7,opened=iop)
> print *, iop
> open(unit=7,file="bulk.Cr.xyz",form="formatted",status="OLD",iostat=stat)
> print *, stat
>
> call mpi_finalize(stat)
>
> end program
>
> I've compiled mvapich-1.1 with Intel 10.1.018.
>
> When I compile the program with Intel ifort 10.1.018, it works fine
> (i.e. both opens return stat=0).
> When I compile the program with ifort 11.0.069, the
> second open fails with status 40, which is recursive I/O (I/O statement
> executed from inside another I/O statement). I'd blame Intel first, of course,
> but this only happens when I use MPI. If I strip out the MPI things, the
> program works fine.
>
> Any ideas how to proceed?
>
> thanks,
> Noam
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss@cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

From noam.bernstein at nrl.navy.mil Mon Nov 24 15:15:30 2008
From: noam.bernstein at nrl.navy.mil (Noam Bernstein)
Date: Mon Nov 24 15:15:47 2008
Subject: [mvapich-discuss] ifort 11.0.069 mvapich-1.1 weird conflict
In-Reply-To: <20081124193921.GF2881@cse.ohio-state.edu>
References: <18E8049F-BCFE-476D-A32D-A0E39B5B2406@nrl.navy.mil> <20081124193921.GF2881@cse.ohio-state.edu>
Message-ID:

On Nov 24, 2008, at 2:39 PM, Jonathan Perkins wrote:

> Noam:
> It appears that you are having this problem when using mvapich compiled
> with 10.1 and the mpi programs compiled with 11.0. When both the library
> and the program use 10.1 there is no problem. Can you make sure that
> both the MPI library and your application are compiled with the same
> compiler? This should not lead to any problems.

Looks like you're right. I thought I was being careful with that, but
apparently not enough. Perhaps I was lulled into a false confidence
because different 10.1 subversions worked OK with the same MPI libraries.
I'll just have to maintain differently compiled versions for 10.1 and 11.0
compiler versions.
Noam

From nilesh_awate at yahoo.com Tue Nov 25 09:09:07 2008
From: nilesh_awate at yahoo.com (nilesh awate)
Date: Tue Nov 25 09:16:09 2008
Subject: [mvapich-discuss] messege truncated
References: <696356.96646.qm@web94105.mail.in2.yahoo.com> <492588F6.20704@cs.utah.edu> <596548.81154.qm@web94104.mail.in2.yahoo.com> <4926D647.7010703@cs.utah.edu>
Message-ID: <422288.41824.qm@web94107.mail.in2.yahoo.com>

Hi all,

I want to detail the information regarding this discussion as all my trials are failing over standards.

I am using RHEL5 on AMD Opteron dual core, mvapich2-1.2 (dapl interconnect; with and without RDMA_FAST_PATH) with Mellanox network.
I am running Pallas (with check) with the above setup. I got the following error:

Fatal error in MPI_Recv:
Message truncated, error stack:
MPI_Recv(186)..........................: MPI_Recv(buf=0x7fff3072accc, count=896311571, MPI_INT, src=2, tag=1000, MPI_COMM_WORLD, status=0x7fff3072acb0) failed
MPIDI_CH3U_Post_data_receive_found(243): Message from rank 2 and tag 1000 truncated; 4 bytes received but buffer size is -709721012
rank 0 in job 5 test01_44984 caused collective abort of all ranks
exit status of rank 0: killed by signal 9

The above error occurs in the SendRecv benchmark most of the time.
I ran the same thing with gen2 and it worked fine, but with the dapl interconnect it is failing.

waiting for reply,
Nilesh

Nilesh Awate
C-DAC R&D

________________________________
From: Justin
To: nilesh awate
Cc: MVAPICH2
Sent: Friday, 21 November, 2008 9:09:51 PM
Subject: Re: [mvapich-discuss] messege truncated

One thing that I have used to track down bugs of this nature in the past is to use the MPI_Errhandler functionality.
Try placing this in your code after MPI_Init:

    MPI_Errhandler_set(MPI_COMM_WORLD,MPI_ERRORS_RETURN);

Then at your MPI_Recv's add an if around them and some debugging output:

    if(MPI_Recv(...)!=MPI_SUCCESS)
    {
        char hostname[100];
        gethostname(hostname,100);
        cout << "MPI Recv returned error on " << hostname << ":" << getpid() << endl;
        cout << "Waiting for a debugger\n";
        while(1);
    }

Then from here you should be able to ssh into the back node doing the processing (specified by the hostname above) and then attach gdb to the process (specified by the pid above). Make sure you have compiled with -g. Then look at the parameters to MPI_Recv and see if something doesn't look right.

Good Luck,
Justin

nilesh awate wrote:
>
> Hi Justin,
>
> We are running Pallas over mpi (dapl interconnect). I got the same error while running Pallas with a tcp-ip (ethernet) network.
>
> Fatal error in MPI_Recv:
> Message truncated, error stack:
> MPI_Recv(186)..........................: MPI_Recv(buf=0x7fff23cdd22c, count=976479459, MPI_INT, src=2, tag=1000,MPI_COMM_WORLD, status=0x7fff23cdd210) failed
> MPIDI_CH3U_Post_data_receive_found(163): Message from rank 2 and tag 1000 truncated; 4 bytes received but buffersize is -389049460
>
> I am running it over an AMD 5 nodes cluster having this (1Ghz Dual-Core AMD Opteron Processor 1216) configuration.
>
> I don't know how MPI_Recv got such a huge count when Pallas is sending at most 4194304 bytes.
>
> Is this some garbage value it receives?
>
> waiting for reply,
>
> Nilesh
>
> ------------------------------------------------------------------------
> *From:* Justin
> *To:* nilesh awate
> *Cc:* Dhabaleswar Panda ; MVAPICH2
> *Sent:* Thursday, 20 November, 2008 9:27:42 PM
> *Subject:* Re: [mvapich-discuss] messege truncated
>
> The message means mpi received a message larger than the buffer size you specified. Namely in this case the buffer length is '-514665432' thus any length of message would be bigger than it.
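The negative buffer lengths in these error stacks can be reproduced arithmetically. The sketch below assumes that MPI_INT is 4 bytes and that the byte size is computed as count times element size and then stored in a 32-bit signed int; both are assumptions made for illustration, not something confirmed from the MVAPICH2 source. The names wrap_to_int32 and reported_buffer_size are hypothetical helpers, not library functions.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce a 64-bit byte count to the value a signed 32-bit field would hold.
   The conversion to uint32_t is well defined (modulo 2^32); the signed
   result is then obtained by explicit subtraction instead of a cast. */
int32_t wrap_to_int32(int64_t bytes)
{
    uint32_t low = (uint32_t)bytes;
    if (low > (uint32_t)INT32_MAX)
        return (int32_t)((int64_t)low - 4294967296LL);
    return (int32_t)low;
}

/* Hypothetical helper (not an MVAPICH2 function): size in bytes of a buffer
   holding `count` elements of `elem_size` bytes each, as it would appear
   after being squeezed into a 32-bit signed int. */
int32_t reported_buffer_size(int64_t count, int64_t elem_size)
{
    assert(count >= 0 && elem_size > 0);
    return wrap_to_int32(count * elem_size);
}
```

Under those assumptions, count=945075466 yields -514665432, count=976479459 yields -389049460, count=952788905 yields -483811676 and count=896311571 yields -709721012, which are the buffer sizes printed in the error stacks in this thread.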
> What I find odd is the parameters you are sending MPI_Recv. You are sending a count of '945075466'; are you really sending a message that is a gigabyte in size? It might be possible that the count is being converted to a signed int causing it to wrap to a negative number. Check the size that you are specifying for the buffer. It is odd that you have it specified to be a GB in size when you are only receiving 4 bytes.
>
> nilesh awate wrote:
> >
> > Thanks for suggestion (use mvapich2-1.2) sir,
> >
> > I have tried the same but still we are facing same problem
> >
> > Fatal error in MPI_Recv:
> > Message truncated, error stack:
> > MPI_Recv(186)........................: MPI_Recv(buf=0x7fff1faf6008, count=945075466, MPI_INT, src=2, tag=1000, MPI_COMM_WORLD, status=0x7fff1faf5fe0) failed
> > MPIDI_CH3U_Request_unpack_uebuf(590): Message truncated; 4 bytes received but buffer size is -514665432
> > rank 0 in job 4 test01_52519 caused collective abort of all ranks
> > exit status of rank 0: killed by signal 9
> >
> > is there any suggestion ?
> >
> > what does this error mean ?
> >
> > is this a result of data corruption/packet missing, or something else ?
> >
> > waiting for reply
> > Nilesh Awate
> >
> > ------------------------------------------------------------------------
> > *From:* Dhabaleswar Panda
> > *To:* nilesh awate
> > *Cc:* MVAPICH2
> > *Sent:* Wednesday, 19 November, 2008 9:27:36 PM
> > *Subject:* Re: [mvapich-discuss] messege truncated
> >
> > MVAPICH2 1.2 was released around two weeks back. Can you try the latest
> > version.
> >
> > DK
> >
> > On Wed, 19 Nov 2008, nilesh awate wrote:
> >
> > > Hi all,
> > > I am using mvapich2-1.0.3 with dapl interconnect (it's a proprietary nic & dapl library)
> > > I got following error while running pallas over (amd dual core) 5 nodes cluster.
> > >
> > > Fatal error in MPI_Recv:
> > > Message truncated, error stack:
> > > MPI_Recv(186)..........................: MPI_Recv(buf=0x7fff24744cec, count=952788905, MPI_INT, src=2, tag=1000,MPI_COMM_WORLD, status=0x7fff24744cd0) failed
> > > MPIDI_CH3U_Post_data_receive_found(243): Message from rank 2 and tag 1000 truncated; 4 bytes received but buffersize is -483811676
> > > rank 0 in job 2 test01_40634 caused collective abort of all ranks
> > > exit status of rank 0: killed by signal 9
> > >
> > > will you suggest where we should look for solving above error ?
> > > what can we interpret from above message ?
> > >
> > > waiting for reply
> > > thanking
> > > Nilesh

From panda at cse.ohio-state.edu Wed Nov 26 16:55:02 2008
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Wed Nov 26 16:55:17 2008
Subject: [mvapich-discuss] messege truncated
In-Reply-To: <422288.41824.qm@web94107.mail.in2.yahoo.com>
Message-ID:

Which version of Pallas are you running? As you may know, the Pallas benchmarks are outdated.
They have been replaced with Intel MPI Benchmarks (IMB). The latest version is 3.1. Can you try your tests with IMB 3.1.

Thanks,

DK

On Tue, 25 Nov 2008, nilesh awate wrote:

> Hi all,
> I want to detail the information regarding this discussion as all my trials are failing over standards
> I am using RHEL5 on AMD opteron dual core, mvapich2-1.2(dapl interconnect; with and without RDMA_FAST_PATH) with mellanox network.
> I am running Pallas (with check) with above setup. I got following error
>
> Fatal error in MPI_Recv:
> Message truncated, error stack:
> MPI_Recv(186)..........................: MPI_Recv(buf=0x7fff3072accc, count=896311571, MPI_INT, src=2, tag=1000, MPI_COMM_WORLD, status=0x7fff3072acb0) failed
> MPIDI_CH3U_Post_data_receive_found(243): Message from rank 2 and tag 1000 truncated; 4 bytes received but buffer size is -709721012
> rank 0 in job 5 test01_44984 caused collective abort of all ranks
> exit status of rank 0: killed by signal 9
>
> Above error occurs in SendRecv benchmark most of the time.
> I ran same thing with gen2, it worked fine . . . but with dapl interconnect its failing
>
> waiting for reply,
> Nilesh
>
> Nilesh Awate
> C-DAC R&D

From bachth at uni-mainz.de Fri Nov 28 06:04:02 2008
From: bachth at uni-mainz.de (Thomas Bach)
Date: Fri Nov 28 22:47:02 2008
Subject: [mvapich-discuss] Error while compiling mvapich-1.1 with sunstudio
In-Reply-To: (Matthew Koop's message of "Mon, 24 Nov 2008 17:21:28 +0100")
References:
Message-ID: <87skpct8m5.fsf@taris.box>

Hi Matt,

please excuse the late reply!

Matthew Koop writes:

> What version of OFED do you have installed on your machine? (you may have
> the ofed_info command installed that can tell you what version is
> installed).

$ ofed_info | sed -ne '1p'
OFED-1.2.5

> The error you describe looks to be related to support for eXtended
> Reliable Connection (XRC) mode, which allows greater scalability for
> ConnectX cards. This requires OFED 1.3 or higher to be installed.
>
> You can compile without this support by removing the "-DXRC" part of the
> CFLAGS line within the make.mvapich.gen2 script.

Yes, that helped. At least compilation and installation-routines are
working fine now. But during testing make.mvapich.gen2 breaks.
I'm going to compile some programs on my own and test the installation.

I attached all log files and a cut-and-paste snippet of the last part I
got on my terminal.

Thank you,
Thomas Bach.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.log
Type: application/octet-stream
Size: 7065 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20081128/5410e037/config-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make-mine.log
Type: application/octet-stream
Size: 336756 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20081128/5410e037/make-mine-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: output
Type: application/octet-stream
Size: 4862 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20081128/5410e037/output-0001.obj

From bachth at uni-mainz.de Fri Nov 28 08:32:39 2008
From: bachth at uni-mainz.de (Thomas Bach)
Date: Fri Nov 28 22:47:03 2008
Subject: [mvapich-discuss] Error while compiling mvapich-1.1 with sunstudio
In-Reply-To: (Matthew Koop's message of "Mon, 24 Nov 2008 17:21:28 +0100")
References:
Message-ID: <87y6z4kmbs.fsf@taris.box>

Hi,

I misinterpreted some signs in my last mail. Compilation still
fails. With the original make.mvapich.gen2 shipped by your distribution
it stops somewhere at viainit.c.

I changed that a bit to get shared-libs, f90-modules, and to explicitly
compile the f77 stuff (which it doesn't by default). The configure
command now looks like this:

./configure --with-device=ch_gen2 --with-arch=LINUX -prefix=${PREFIX} \
    --with-romio --without-mpe -lib="$LIBS" --enable-cxx --enable-f77 \
    --enable-f90modules --enable-f90 --enable-sharedlib 2>&1 |tee config-mine.log

Now viainit.c compiles, but I get errors with cpplib which are ignored.
The test-routine still crashes with the same output as the last
compilation without adapted configure. Also I can't find any fortran
files in the installation.

Greets,
Thomas Bach.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: config-mine.log
Type: application/octet-stream
Size: 18172 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20081128/d9412633/config-mine-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make-mine.log
Type: application/octet-stream
Size: 332260 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20081128/d9412633/make-mine-0001.obj