<div class="gmail_quote"><div><div class="Wj3C7c"><div class="gmail_quote">I am using MVAPICH2 on a small set of workstations equipped with InfiniBand. I am also using CUDA, NVIDIA's platform for GPU computing. CUDA uses page-locked (pinned) host memory, and I suspect this conflicts with MVAPICH2. If I run a series of broadcasts of increasing size (1024 bytes, 2048 bytes, ..., 2 MB) and repeat each size a number of times (30 repetitions seems to be enough to trigger it), the nodes consistently abort. A typical call stack:<br>
<br>2: 0x00002adab034db90 in memset () from /lib/libc.so.6<br>2: (gdb) 2: (gdb) bt<br>2: #0 0x00002adab034db90 in memset () from /lib/libc.so.6<br>2: #1 0x0000000000443bdd in allocate_vbuf_region ()<br>2: #2 0x0000000000443fe5 in get_vbuf ()<br>
2: #3 0x000000000043716f in MRAILI_Get_Vbuf ()<br>2: #4 0x000000000043739e in MPIDI_CH3I_MRAILI_Eager_send ()<br>2: #5 0x000000000042f483 in MPIDI_CH3_Rendezvous_r3_push ()<br>2: #6 0x000000000042f721 in MPIDI_CH3_Rendezvous_push ()<br>
2: #7 0x000000000042f9b1 in MPIDI_CH3I_MRAILI_Process_rndv ()<br>2: #8 0x000000000042d904 in MPIDI_CH3I_Progress ()<br>2: #9 0x0000000000420b6e in MPIC_Wait ()<br>2: #10 0x00000000004215b9 in MPIC_Send ()<br>2: #11 0x000000000041f62e in MPIR_Bcast ()<br>
2: #12 0x000000000042097e in PMPI_Bcast ()<br>2: #13 0x000000000041dff9 in dcgn::BroadcastRequest::performCollectiveGlobal (<br>2: this=0xc5c6f0) at infrastructure/src/BroadcastRequest.cpp:18<br>2: #14 0x000000000041e7f0 in dcgn::CollectiveRequest::poll (this=0xc5c6f0, <br>
2: ioRequests=@0x5f3708) at infrastructure/src/CollectiveRequest.cpp:75<br>2: #15 0x000000000041e75c in dcgn::CollectiveRequest::poll (this=0xc5c6f0, <br>2: ioRequests=@0x5f3708) at infrastructure/src/CollectiveRequest.cpp:84<br>
2: #16 0x00000000004146e3 in dcgn::MPIWorker::serviceRequest (this=0x5f3680, <br>2: req=0xc5c6f0, isShutdown=<value optimized out>)<br>2: at infrastructure/src/MPIWorker.cpp:118<br>2: #17 0x000000000041499f in dcgn::MPIWorker::loop (this=0x5f3680)<br>
2: at infrastructure/src/MPIWorker.cpp:78<br>2: #18 0x0000000000415986 in dcgn::MPIWorker::launchThread (<br>2: param=<value optimized out>) at infrastructure/src/MPIWorker.cpp:221<br>2: #19 0x000000000041ce1a in dcgn::Thread::run (p=0x801de0)<br>
2: at infrastructure/src/Thread.cpp:18<br>2: #20 0x00002adaaf78e297 in start_thread () from /lib/libpthread.so.0<br>2: #21 0x00002adab039a51e in clone () from /lib/libc.so.6<br>2: (gdb) up 13<br>2: #13 0x000000000041dff9 in dcgn::BroadcastRequest::performCollectiveGlobal (<br>
2: this=0xc5c6f0) at infrastructure/src/BroadcastRequest.cpp:18<br>2: 18 MPI_Bcast(buf, numBytes, MPI_BYTE, mpiWorker->getMPIRankByTarget(root), MPI_COMM_WORLD);<br>2: Current language: auto; currently c++<br>
2: (gdb) print numBytes<br>2: $1 = 2097152<br><br><br>Before this crash, broadcasts of every size from 1 KB up to 1 MB completed successfully. The next broadcast is a 2 MB (2097152-byte) broadcast, and it causes this crash. I am not sure whether any 2 MB broadcasts completed before this one, or whether the very first 2 MB broadcast is the one that fails.<br>
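For reference, the test pattern described above (doubling sizes from 1 KB to 2 MB, 30 broadcasts per size) can be sketched roughly as follows. This is a minimal sketch, not the actual dcgn code: the names bcastSizes and runBcastSweep are hypothetical, and the broadcast is passed in as a callable so the loop can be exercised without MPI. In the real program the callable would wrap MPI_Bcast.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: the message sizes from the report, i.e.
// minBytes, 2*minBytes, 4*minBytes, ... up to and including maxBytes.
// With the defaults this is 1 KB, 2 KB, ..., 2 MB (12 sizes).
std::vector<std::size_t> bcastSizes(std::size_t minBytes = 1024,
                                    std::size_t maxBytes = 2u * 1024 * 1024) {
    assert(minBytes > 0 && minBytes <= maxBytes);
    std::vector<std::size_t> sizes;
    for (std::size_t n = minBytes; n <= maxBytes; n *= 2) {
        sizes.push_back(n);
    }
    return sizes;
}

// Hypothetical driver: issues `reps` broadcasts of each size through `bcast`,
// which is expected to wrap something like
//   MPI_Bcast(buf, n, MPI_BYTE, root, MPI_COMM_WORLD).
// Uses one ordinary (pageable) buffer big enough for the largest size.
// Returns the total number of broadcasts issued.
std::size_t runBcastSweep(const std::function<void(char*, std::size_t)>& bcast,
                          int reps = 30) {
    const std::vector<std::size_t> sizes = bcastSizes();
    std::vector<char> buf(sizes.back());
    std::size_t issued = 0;
    for (std::size_t n : sizes) {
        for (int r = 0; r < reps; ++r) {
            bcast(buf.data(), n);
            ++issued;
        }
    }
    return issued;
}
```

With MPI, the callable would be something like [&](char* p, std::size_t n) { MPI_Bcast(p, static_cast<int>(n), MPI_BYTE, 0, MPI_COMM_WORLD); }. Since this sketch uses a plain pageable buffer, running it once as written and once with the buffer replaced by memory from cudaMallocHost might help show whether page-locked memory is really what triggers the abort.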
<br>Is there any way to either disable VBUF usage (I know this will degrade performance, but slow performance is better than no performance) or limit it? I tried limiting the number of vbufs to 1024, each 16 KB in size (mpiexec -gdb -env MV2_VBUF_MAX 1024 -env MV2_VBUF_TOTAL_SIZE 16384), and I end up with <br>
<br>0: [0] Abort: VBUF alloc failure, limit exceeded at line 138 in file vbuf.c<br>
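Some rough arithmetic on those limits: 1024 vbufs of 16384 bytes is a 16 MB pool, and if a 2 MB (2097152-byte) message is pushed through vbufs, it needs at least 2097152 / 16384 = 128 of them. That it goes through vbufs is my reading of the MPIDI_CH3_Rendezvous_r3_push, MPIDI_CH3I_MRAILI_Eager_send and get_vbuf frames in the backtrace, not something I have confirmed; "at least" assumes each vbuf also carries some header. As a stopgap that does not depend on any MVAPICH2 setting, a large broadcast could be split into several smaller MPI_Bcast calls, for example 1 MB chunks, since 1 MB broadcasts completed above. This is only a hedged sketch with hypothetical names (piecesNeeded, chunkedBcast), and it may not avoid the failure if the real cause is cumulative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>

// Ceiling division: how many pieces of at most pieceBytes cover totalBytes.
std::size_t piecesNeeded(std::size_t totalBytes, std::size_t pieceBytes) {
    assert(pieceBytes > 0);
    return (totalBytes + pieceBytes - 1) / pieceBytes;
}

// Hypothetical workaround: broadcast numBytes starting at buf as a sequence
// of smaller broadcasts of at most chunkBytes each. `bcast` is expected to
// wrap MPI_Bcast with the same root and communicator for every chunk, so all
// ranks must call this with the same numBytes and chunkBytes.
// Returns the number of chunks broadcast.
std::size_t chunkedBcast(char* buf, std::size_t numBytes, std::size_t chunkBytes,
                         const std::function<void(char*, std::size_t)>& bcast) {
    assert(chunkBytes > 0);
    std::size_t chunks = 0;
    for (std::size_t off = 0; off < numBytes; off += chunkBytes) {
        const std::size_t len = std::min(chunkBytes, numBytes - off);
        bcast(buf + off, len);  // last chunk may be shorter than chunkBytes
        ++chunks;
    }
    return chunks;
}
```

In BroadcastRequest.cpp this might be called as chunkedBcast(static_cast<char*>(buf), numBytes, 1u << 20, [&](char* p, std::size_t n) { MPI_Bcast(p, static_cast<int>(n), MPI_BYTE, root, MPI_COMM_WORLD); }), where root stands for the value returned by mpiWorker->getMPIRankByTarget(root) in the original call.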
</div><br>
</div></div></div><br>