From huangwei at mvapich.cse.ohio-state.edu Fri Aug 1 15:51:13 2008
From: huangwei at mvapich.cse.ohio-state.edu (huangwei@mvapich.cse.ohio-state.edu)
Date: Fri Aug 1 15:51:24 2008
Subject: [mvapich-commit] r2892 - mvapich2/trunk/src/mpid/ch3/include
Message-ID: <200808011951.m71JpDuc011550@mvapich.cse.ohio-state.edu>

Author: huangwei
Date: 2008-08-01 15:51:11 -0400 (Fri, 01 Aug 2008)
New Revision: 2892

Modified:
   mvapich2/trunk/src/mpid/ch3/include/mpidpre.h
Log:
Using int32_t for rank for larger jobs with 32k processes or more

Modified: mvapich2/trunk/src/mpid/ch3/include/mpidpre.h
===================================================================
--- mvapich2/trunk/src/mpid/ch3/include/mpidpre.h 2008-07-31 17:56:19 UTC (rev 2891)
+++ mvapich2/trunk/src/mpid/ch3/include/mpidpre.h 2008-08-01 19:51:11 UTC (rev 2892)
@@ -65,7 +65,7 @@
 typedef struct MPIDI_Message_match
 {
     int32_t tag;
-    int16_t rank;
+    int32_t rank;
     int16_t context_id;
 }
 MPIDI_Message_match;

From koop at mvapich.cse.ohio-state.edu Fri Aug 1 21:34:48 2008
From: koop at mvapich.cse.ohio-state.edu (koop@mvapich.cse.ohio-state.edu)
Date: Fri Aug 1 21:34:59 2008
Subject: [mvapich-commit] r2894 - mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2
Message-ID: <200808020134.m721YmmL012134@mvapich.cse.ohio-state.edu>

Author: koop
Date: 2008-08-01 21:34:46 -0400 (Fri, 01 Aug 2008)
New Revision: 2894

Modified:
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/arena.c
Log:
* Increase the maximum heap size

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/arena.c
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/arena.c 2008-08-01 19:52:00 UTC (rev 2893)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/arena.c 2008-08-02 01:34:46 UTC (rev 2894)
@@ -19,7 +19,7 @@
 #define HEAP_MIN_SIZE (32*1024)
 #ifndef HEAP_MAX_SIZE
-#define HEAP_MAX_SIZE (1024*1024) /* must be a power of two */
+#define HEAP_MAX_SIZE (4*1024*1024) /* must be a power of two */
 #endif
 /* HEAP_MIN_SIZE and HEAP_MAX_SIZE limit the size of mmap()ed heaps

From chail at mvapich.cse.ohio-state.edu Mon Aug 4 13:44:06 2008
From: chail at mvapich.cse.ohio-state.edu (chail@mvapich.cse.ohio-state.edu)
Date: Mon Aug 4 13:44:18 2008
Subject: [mvapich-commit] r2896 - in mvapich2/trunk/src/mpid/ch3/channels/mrail: include src/rdma src/udapl
Message-ID: <200808041744.m74Hi6a2023728@mvapich.cse.ohio-state.edu>

Author: chail
Date: 2008-08-04 13:44:04 -0400 (Mon, 04 Aug 2008)
New Revision: 2896

Modified:
   mvapich2/trunk/src/mpid/ch3/channels/mrail/include/mpidi_ch3_impl.h
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma/ch3_init.c
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma/ch3_read_progress.c
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/rdma_udapl_init.c
Log:
For uDAPL device, terminating the on demand connection-setup server thread as soon as all the connections have been established.
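
A minimal sketch of the stop condition this log describes, written as standalone C rather than as the MVAPICH2 code. The struct `od_state`, its field names, and `od_should_stop_server` are hypothetical and exist only for illustration. The sketch assumes the local-process count includes the calling process, so that `pg_size - num_local` is the number of remote peers that need a connection.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for an on-demand connection server. */
struct od_state {
    int  num_conn;        /* connections established so far            */
    int  pg_size;         /* total number of processes in the job      */
    int  num_local;       /* processes on this node (assumed to        */
                          /* include the caller); these need no server */
    bool server_running;  /* true while the setup thread is alive      */
};

/*
 * Returns true at most once: when the server is still running and every
 * remote peer has a connection. The flag is cleared here so that a later
 * shutdown path does not try to stop the thread a second time.
 */
bool od_should_stop_server(struct od_state *s)
{
    if (!s->server_running) {
        return false;
    }
    if (s->num_conn < s->pg_size - s->num_local) {
        return false;
    }
    s->server_running = false;
    return true;
}
```

A caller would cancel and join the server thread only when the function returns true, which mirrors the guard the diff below adds around the cancel call in the finalize path.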

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/include/mpidi_ch3_impl.h
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/include/mpidi_ch3_impl.h 2008-08-02 01:35:47 UTC (rev 2895)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/include/mpidi_ch3_impl.h 2008-08-04 17:44:04 UTC (rev 2896)
@@ -56,6 +56,7 @@
     MPIDI_CH3I_CM_type_t cm_type;
     /*a flag to indicate whether new connection been established*/
     volatile int new_conn_complete;
+    int num_conn;
 #ifdef CKPT
     /*a flag to indicate some reactivation has finished*/
     volatile int reactivation_complete;

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma/ch3_init.c
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma/ch3_init.c 2008-08-02 01:35:47 UTC (rev 2895)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma/ch3_init.c 2008-08-04 17:44:04 UTC (rev 2896)
@@ -51,6 +51,7 @@
     if (pg_size > threshold) {
         MPIDI_CH3I_Process.cm_type = MPIDI_CH3I_CM_ON_DEMAND;
+        MPIDI_CH3I_Process.num_conn = 0;
 #if defined(DISABLE_PTMALLOC) && !defined(SOLARIS)
         MPIU_Error_printf("Error: On-demand connection management does "

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma/ch3_read_progress.c
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma/ch3_read_progress.c 2008-08-02 01:35:47 UTC (rev 2895)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma/ch3_read_progress.c 2008-08-04 17:44:04 UTC (rev 2896)
@@ -23,6 +23,9 @@
 #ifdef DAPL_DEFAULT_PROVIDER
 #include "rdma_impl.h"
 extern MPIDI_CH3I_RDMA_Process_t MPIDI_CH3I_RDMA_Process;
+extern struct smpi_var g_smpi;
+extern int od_server_thread;
+extern int cached_pg_size;
 #endif
 #ifdef DEBUG
@@ -52,6 +55,20 @@
     MPIDI_STATE_DECL(MPID_STATE_MPIDI_CH3I_READ_PROGRESS);
     MPIDI_FUNC_ENTER(MPID_STATE_MPIDI_CH3I_READ_PROGRESS);
+#ifdef DAPL_DEFAULT_PROVIDER
+    if (od_server_thread &&
+        MPIDI_CH3I_Process.num_conn >= cached_pg_size - g_smpi.num_local_nodes) {
+        int ret;
+        ret = pthread_cancel(MPIDI_CH3I_RDMA_Process.server_thread);
+        MPIU_Assert(ret == 0);
+
+        ret = pthread_join(MPIDI_CH3I_RDMA_Process.server_thread, NULL);
+        MPIU_Assert(ret == 0);
+
+        od_server_thread = 0;
+    }
+#endif
+
     *vc_pptr = NULL;
     *v_ptr = NULL;
     pg = MPIDI_Process.my_pg;

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/rdma_udapl_init.c
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/rdma_udapl_init.c 2008-08-02 01:35:47 UTC (rev 2895)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/rdma_udapl_init.c 2008-08-04 17:44:04 UTC (rev 2896)
@@ -41,6 +41,8 @@
 pthread_mutex_t cm_conn_state_lock;
 pthread_mutex_t cm_conn_state_lock_udapl;
+int od_server_thread = 0;
+
 extern DAT_EP_HANDLE temp_ep_handle;
 extern void MPIDI_CH3I_RDMA_util_atos (char *str, DAT_SOCK_ADDR * addr);
@@ -720,7 +722,8 @@
         return error;
     }
-    if (MPIDI_CH3I_Process.cm_type == MPIDI_CH3I_CM_ON_DEMAND) {
+    if (MPIDI_CH3I_Process.cm_type == MPIDI_CH3I_CM_ON_DEMAND &&
+        od_server_thread) {
         ret = pthread_cancel(MPIDI_CH3I_RDMA_Process.server_thread);
         CHECK_RETURN (ret, "could not cancel server thread");
@@ -1522,6 +1525,7 @@
     /* Start this thread after the EPs have been created and before any connection request has been made */
     pthread_create(&MPIDI_CH3I_RDMA_Process.server_thread, NULL, od_conn_server, (void*)pg);
+    od_server_thread = 1;
     error = PMI_Barrier ();
     pthread_mutex_init(&cm_conn_state_lock, NULL);
@@ -1612,6 +1616,7 @@
             MRAILI_Init_vc (vc, cached_pg_rank);
             vc->ch.state = MPIDI_CH3I_VC_STATE_IDLE;
             MPIDI_CH3I_Process.new_conn_complete = 1;
+            MPIDI_CH3I_Process.num_conn++;
         } else {
             /* someone just accepted my other request */
@@ -1619,6 +1624,7 @@
             MRAILI_Init_vc (peer_vc, cached_pg_rank);
             peer_vc->ch.state = MPIDI_CH3I_VC_STATE_IDLE;
             MPIDI_CH3I_Process.new_conn_complete = 1;
+            MPIDI_CH3I_Process.num_conn++;
         }
     } while (peer != i);
 #else
@@ -1639,6 +1645,7 @@
             MRAILI_Init_vc (vc, cached_pg_rank);
             vc->ch.state = MPIDI_CH3I_VC_STATE_IDLE;
             MPIDI_CH3I_Process.new_conn_complete = 1;
+            MPIDI_CH3I_Process.num_conn++;
         } else {
             /* someone just accepted my other request */
@@ -1647,6 +1654,7 @@
             MRAILI_Init_vc (peer_vc, cached_pg_rank);
             peer_vc->ch.state = MPIDI_CH3I_VC_STATE_IDLE;
             MPIDI_CH3I_Process.new_conn_complete = 1;
+            MPIDI_CH3I_Process.num_conn++;
         }
     } while (size != 0);
 #endif
@@ -1716,6 +1724,7 @@
             MRAILI_Init_vc (vc, cached_pg_rank);
             vc->ch.state = MPIDI_CH3I_VC_STATE_IDLE;
             MPIDI_CH3I_Process.new_conn_complete = 1;
+            MPIDI_CH3I_Process.num_conn++;
         } else {
             /* someone just accepted my request */
@@ -1723,6 +1732,7 @@
             MRAILI_Init_vc (peer_vc, cached_pg_rank);
             peer_vc->ch.state = MPIDI_CH3I_VC_STATE_IDLE;
             MPIDI_CH3I_Process.new_conn_complete = 1;
+            MPIDI_CH3I_Process.num_conn++;
         }
     } while (peer != i);
 #else
@@ -1743,6 +1753,7 @@
             MRAILI_Init_vc (vc, cached_pg_rank);
             vc->ch.state = MPIDI_CH3I_VC_STATE_IDLE;
             MPIDI_CH3I_Process.new_conn_complete = 1;
+            MPIDI_CH3I_Process.num_conn++;
         } else {
             /* someone just accepted my request */
@@ -1751,6 +1762,7 @@
             MRAILI_Init_vc (peer_vc, cached_pg_rank);
             peer_vc->ch.state = MPIDI_CH3I_VC_STATE_IDLE;
             MPIDI_CH3I_Process.new_conn_complete = 1;
+            MPIDI_CH3I_Process.num_conn++;
         }
     } while (size != 0);
 #endif
@@ -1852,6 +1864,7 @@
             MRAILI_Init_vc (vc, cached_pg_rank);
             vc->ch.state = MPIDI_CH3I_VC_STATE_IDLE;
             MPIDI_CH3I_Process.new_conn_complete = 1;
+            MPIDI_CH3I_Process.num_conn++;
         } else {
             ret = dat_ep_reset (vc->mrail.qp_hndl[0]);

From curtisbr at mvapich.cse.ohio-state.edu Wed Aug 6 14:25:23 2008
From: curtisbr at mvapich.cse.ohio-state.edu (curtisbr@mvapich.cse.ohio-state.edu)
Date: Wed Aug 6 14:25:32 2008
Subject: [mvapich-commit] r2906 - mvapich2/trunk/src/mpid/ch3/channels/mrail
Message-ID: <200808061825.m76IPNjd010118@mvapich.cse.ohio-state.edu>

Author: curtisbr
Date: 2008-08-06 14:25:21 -0400 (Wed, 06 Aug 2008)
New Revision: 2906

Modified:
   mvapich2/trunk/src/mpid/ch3/channels/mrail/configure.in
Log:
Remove /lib requirement from path. The path specified must now be to the BLCR library.

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/configure.in
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/configure.in 2008-08-06 17:45:04 UTC (rev 2905)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/configure.in 2008-08-06 18:25:21 UTC (rev 2906)
@@ -181,7 +181,7 @@
     AC_CHECK_HEADER([infiniband/verbs.h],,[AC_MSG_ERROR(['infiniband/verbs.h not found. Did you specify --with-ib-include=?'])])
     mrail_libs="-libverbs -libumad $mrail_libs"
-    AC_MSG_CHECKING([for path to BLCR])
+    AC_MSG_CHECKING([for path to BLCR library])
     if test "$with_blcr" != "no"; then
         if test "$enable_rdma_cm" = "default"; then
@@ -194,10 +194,10 @@
             AC_MSG_ERROR([Registration caching is required for BLCR support.])
         fi
-        mrail_ld_library_path=${with_blcr}/lib:$mrail_ld_library_path
-        mrail_fflags="-L${with_blcr}/lib $mrail_fflags"
-        mrail_ldflags="-L${with_blcr}/lib $mrail_ldflags"
-        LDFLAGS="-L${with_blcr}/lib $LDFLAGS"
+        mrail_ld_library_path=${with_blcr}:$mrail_ld_library_path
+        mrail_fflags="-L${with_blcr} $mrail_fflags"
+        mrail_ldflags="-L${with_blcr} $mrail_ldflags"
+        LDFLAGS="-L${with_blcr} $LDFLAGS"
         AC_SEARCH_LIBS(cr_init, cr,,[AC_MSG_ERROR(['libcr not found.'])],)
         AC_CHECK_HEADER([libcr.h],,[AC_MSG_ERROR(['libcr.h not found.'])])
         mrail_libs="-lcr $mrail_libs"

From curtisbr at mvapich.cse.ohio-state.edu Wed Aug 6 16:40:41 2008
From: curtisbr at mvapich.cse.ohio-state.edu (curtisbr@mvapich.cse.ohio-state.edu)
Date: Wed Aug 6 16:40:57 2008
Subject: [mvapich-commit] r2908 - in mvapich2/trunk/src: mpid/ch3/channels/mrail pm/mpd pm/mpirun
Message-ID: <200808062040.m76KefLX010377@mvapich.cse.ohio-state.edu>

Author: curtisbr
Date: 2008-08-06 16:40:41 -0400 (Wed, 06 Aug 2008)
New Revision: 2908

Modified:
   mvapich2/trunk/src/mpid/ch3/channels/mrail/configure.in
   mvapich2/trunk/src/mpid/ch3/channels/mrail/setup_channel.args
   mvapich2/trunk/src/pm/mpd/configure.in
   mvapich2/trunk/src/pm/mpirun/configure.in
Log:
Support non-standard path for BLCR header files.

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/configure.in
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/configure.in 2008-08-06 20:39:19 UTC (rev 2907)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/configure.in 2008-08-06 20:40:41 UTC (rev 2908)
@@ -48,8 +48,12 @@
     pthread.h \
 )
-AC_ARG_WITH(blcr,
-[--with-blcr=path - Enable Berkeley Lab Checkpoint/Restart support.],,with_blcr=no)
+AC_ARG_ENABLE(blcr,
+[--enable-blcr - Enable Berkeley Lab Checkpoint/Restart support.],,enable_blcr=no)
+AC_ARG_WITH(blcr-include,
+[--with-blcr-include=path - Specify the path to the BLCR header files.],,)
+AC_ARG_WITH(blcr-libpath,
+[--with-blcr-libpath=path - Specify the path to the BLCR library.],,with_blcr_libpath=default)
 AC_ARG_WITH(cluster-size,
 [--with-cluster-size=level - Specify the cluster size.],,with_cluster_size=small)
 AC_ARG_WITH(dapl-include,
@@ -181,9 +185,10 @@
     AC_CHECK_HEADER([infiniband/verbs.h],,[AC_MSG_ERROR(['infiniband/verbs.h not found. Did you specify --with-ib-include=?'])])
     mrail_libs="-libverbs -libumad $mrail_libs"
-    AC_MSG_CHECKING([for path to BLCR library])
+    AC_MSG_CHECKING([whether to enable support for BLCR])
+    AC_MSG_RESULT($enable_blcr)
-    if test "$with_blcr" != "no"; then
+    if test "$enable_blcr" = "yes"; then
         if test "$enable_rdma_cm" = "default"; then
             enable_rdma_cm=no
         elif test "$enable_rdma_cm" = "yes"; then
@@ -194,18 +199,35 @@
             AC_MSG_ERROR([Registration caching is required for BLCR support.])
         fi
-        mrail_ld_library_path=${with_blcr}:$mrail_ld_library_path
-        mrail_fflags="-L${with_blcr} $mrail_fflags"
-        mrail_ldflags="-L${with_blcr} $mrail_ldflags"
-        LDFLAGS="-L${with_blcr} $LDFLAGS"
+        AC_MSG_CHECKING([for the BLCR includes path])
+
+        if test -n "$with_ib_include"; then
+            AC_MSG_RESULT($with_ib_include)
+        else
+            AC_MSG_RESULT([default])
+        fi
+
+        AC_MSG_CHECKING([for the BLCR library path])
+
+        if test "$with_blcr_libpath" != "default"; then
+            if test ! -d $with_blcr_libpath; then
+                AC_MSG_ERROR([The specified BLCR library path is invalid.])
+            fi
+
+            mrail_ld_library_path=${with_blcr_libpath}:$mrail_ld_library_path
+            mrail_fflags="-L${with_blcr_libpath} $mrail_fflags"
+            mrail_ldflags="-L${with_blcr_libpath} $mrail_ldflags"
+            LDFLAGS="-L${with_blcr_libpath} $LDFLAGS"
+        fi
+
+        AC_MSG_RESULT($with_blcr_libpath)
+
         AC_SEARCH_LIBS(cr_init, cr,,[AC_MSG_ERROR(['libcr not found.'])],)
         AC_CHECK_HEADER([libcr.h],,[AC_MSG_ERROR(['libcr.h not found.'])])
         mrail_libs="-lcr $mrail_libs"
         AC_DEFINE(CKPT, 1, [Define to enable Berkeley Lab Checkpoint-Restart support.])
     fi
-    AC_MSG_RESULT($with_blcr)
-
     AC_MSG_CHECKING([whether to enable support for RDMA CM])
     if test "$enable_rdma_cm" = "default" -o "$enable_rdma_cm" = "yes"; then

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/setup_channel.args
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/setup_channel.args 2008-08-06 20:39:19 UTC (rev 2907)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/setup_channel.args 2008-08-06 20:40:41 UTC (rev 2908)
@@ -18,9 +18,12 @@
     gen2)
         for arg in $ac_configure_args; do
             case $arg in
-                \'--with-blcr=*)
-                    with_blcr=`echo $arg | sed -e 's/--with-blcr=//' -e "s/'//g"`
-                    CPPFLAGS="-I${with_blcr}/include $CPPFLAGS"
+                \'--with-blcr-include=*)
+                    with_blcr_include=`echo $arg | sed -e 's/--with-blcr-include=//' -e "s/'//g"`
+
+                    if test -n "$with_blcr_include"; then
+                        CPPFLAGS="-I${with_blcr_include}/include $CPPFLAGS"
+                    fi
                     ;;
                 \'--with-ib-include=*)
                     with_ib_include=`echo $arg | sed -e 's/--with-ib-include=//' -e "s/'//g"`

Modified: mvapich2/trunk/src/pm/mpd/configure.in
===================================================================
--- mvapich2/trunk/src/pm/mpd/configure.in 2008-08-06 20:39:19 UTC (rev 2907)
+++ mvapich2/trunk/src/pm/mpd/configure.in 2008-08-06 20:40:41 UTC (rev 2908)
@@ -72,10 +72,10 @@
 echo Using INSTALL=$INSTALL
 dnl <_OSU_MVAPICH_>
-AC_ARG_WITH(blcr,
-[--with-blcr - Enable support for Berkeley Lab Checkpoint/Restart],,with_blcr=no)
+AC_ARG_ENABLE(blcr,
+[--enable-blcr - Enable support for Berkeley Lab Checkpoint/Restart],,enable_blcr=no)
-if test "$with_blcr" != "no"; then
+if test "$enable_blcr" = "yes"; then
     MPD_OTHERS=mpiexec_cr
 fi

Modified: mvapich2/trunk/src/pm/mpirun/configure.in
===================================================================
--- mvapich2/trunk/src/pm/mpirun/configure.in 2008-08-06 20:39:19 UTC (rev 2907)
+++ mvapich2/trunk/src/pm/mpirun/configure.in 2008-08-06 20:40:41 UTC (rev 2908)
@@ -32,8 +32,8 @@
 AC_CONFIG_AUX_DIR(../../../confdb)
 AC_CANONICAL_BUILD
-AC_ARG_WITH(blcr,
-[--with-blcr - Enable Berkeley Lab Checkpoint/Restart support.],,with_blcr=no)
+AC_ARG_ENABLE(blcr,
+[--enable-blcr - Enable Berkeley Lab Checkpoint/Restart support.],,enable_blcr=no)
 AC_ARG_ENABLE(rsh,
 [--enable-rsh - Enable use of rsh for command execution by default.],,enable_rsh=no)
@@ -61,7 +61,7 @@
     mpispawn_other_libs="-lresolv -lsocket -lnsl"
 fi
-if test "$with_blcr" != "no"; then
+if test "$enable_blcr" = "yes"; then
     AC_MSG_ERROR([BLCR is not supported by mpirun.])
 fi

From curtisbr at mvapich.cse.ohio-state.edu Wed Aug 6 17:39:05 2008
From: curtisbr at mvapich.cse.ohio-state.edu (curtisbr@mvapich.cse.ohio-state.edu)
Date: Wed Aug 6 17:39:15 2008
Subject: [mvapich-commit] r2910 - mvapich2/trunk/src/mpid/ch3/channels/mrail
Message-ID: <200808062139.m76Ld5Ow010519@mvapich.cse.ohio-state.edu>

Author: curtisbr
Date: 2008-08-06 17:39:05 -0400 (Wed, 06 Aug 2008)
New Revision: 2910

Modified:
   mvapich2/trunk/src/mpid/ch3/channels/mrail/configure.in
Log:
Test with_blcr_include.

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/configure.in
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/configure.in 2008-08-06 21:38:29 UTC (rev 2909)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/configure.in 2008-08-06 21:39:05 UTC (rev 2910)
@@ -201,8 +201,8 @@
         AC_MSG_CHECKING([for the BLCR includes path])
-        if test -n "$with_ib_include"; then
-            AC_MSG_RESULT($with_ib_include)
+        if test -n "$with_blcr_include"; then
+            AC_MSG_RESULT($with_blcr_include)
         else
             AC_MSG_RESULT([default])
         fi

From curtisbr at mvapich.cse.ohio-state.edu Wed Aug 6 19:14:07 2008
From: curtisbr at mvapich.cse.ohio-state.edu (curtisbr@mvapich.cse.ohio-state.edu)
Date: Wed Aug 6 19:14:17 2008
Subject: [mvapich-commit] r2912 - mvapich2/trunk/src/mpid/ch3/channels/mrail
Message-ID: <200808062314.m76NE7ee010704@mvapich.cse.ohio-state.edu>

Author: curtisbr
Date: 2008-08-06 19:14:06 -0400 (Wed, 06 Aug 2008)
New Revision: 2912

Modified:
   mvapich2/trunk/src/mpid/ch3/channels/mrail/setup_channel.args
Log:
Remove appending of /include to the new blcr include path variable.

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/setup_channel.args
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/setup_channel.args 2008-08-06 23:13:29 UTC (rev 2911)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/setup_channel.args 2008-08-06 23:14:06 UTC (rev 2912)
@@ -22,7 +22,7 @@
                     with_blcr_include=`echo $arg | sed -e 's/--with-blcr-include=//' -e "s/'//g"`
                     if test -n "$with_blcr_include"; then
-                        CPPFLAGS="-I${with_blcr_include}/include $CPPFLAGS"
+                        CPPFLAGS="-I${with_blcr_include} $CPPFLAGS"
                     fi
                     ;;
                 \'--with-ib-include=*)

From gopalakk at mvapich.cse.ohio-state.edu Wed Aug 6 21:42:53 2008
From: gopalakk at mvapich.cse.ohio-state.edu (gopalakk@mvapich.cse.ohio-state.edu)
Date: Wed Aug 6 21:43:04 2008
Subject: [mvapich-commit] r2913 - mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2
Message-ID: <200808070142.m771grfd010951@mvapich.cse.ohio-state.edu>

Author: gopalakk
Date: 2008-08-06 21:42:51 -0400 (Wed, 06 Aug 2008)
New Revision: 2913

Modified:
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/cr.c
Log:
Fix a bug in CR code.

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/cr.c
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/cr.c 2008-08-06 23:14:06 UTC (rev 2912)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/cr.c 2008-08-07 01:42:51 UTC (rev 2913)
@@ -832,7 +832,7 @@
         /*
          * Don't try to destroy the QP when SMP is used.
          */
-        if (!SMP_INIT) {
+        if (!(SMP_INIT && (vc->smp.local_nodes >= 0))) {
             for (rail_index = 0; rail_index < vc->mrail.num_rails; ++rail_index) {
                 ibv_destroy_qp(vc->mrail.rails[rail_index].qp_hndl);

From koop at mvapich.cse.ohio-state.edu Thu Aug 7 20:13:37 2008
From: koop at mvapich.cse.ohio-state.edu (koop@mvapich.cse.ohio-state.edu)
Date: Thu Aug 7 20:13:49 2008
Subject: [mvapich-commit] r2917 - in mvapich2/trunk/src/mpid/ch3/channels/mrail/src: gen2 udapl
Message-ID: <200808080013.m780DbtC020247@mvapich.cse.ohio-state.edu>

Author: koop
Date: 2008-08-07 20:13:35 -0400 (Thu, 07 Aug 2008)
New Revision: 2917

Modified:
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.c
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.h
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.c
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.h
Log:
* Instead of trying to deregister entries from within the malloc/free hooks, just queue it to be done when the next entry into the dreg cache is made. This prevents a deadlock condition from being able to occur.
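
The queue-then-flush idea in this log can be sketched in a few lines of standalone C. This is an illustration under stated assumptions, not the MVAPICH2 implementation: `PENDING_MAX`, `queue_region`, `flush_regions`, and `count_release` are hypothetical names, the release step is a caller-supplied callback standing in for the real cache lookup and deregistration, and locking is left out (the commit itself guards its queue with a separate spinlock).

```c
#include <stddef.h>

#define PENDING_MAX 64          /* hypothetical queue capacity */

/* One freed address range waiting to be processed. */
typedef struct {
    void  *buf;
    size_t len;
} region;

static region pending[PENDING_MAX];
static size_t n_pending = 0;
static size_t released_bytes = 0;   /* updated by count_release below */

/*
 * Called from the free hook. It only records the range and never calls
 * into code that might itself allocate or free, which is what avoids
 * re-entering the hook. Returns 0 on success, -1 if the queue is full.
 */
int queue_region(void *buf, size_t len)
{
    if (n_pending == PENDING_MAX) {
        return -1;
    }
    pending[n_pending].buf = buf;
    pending[n_pending].len = len;
    n_pending++;
    return 0;
}

/*
 * Called on the next ordinary entry into the cache, outside any hook.
 * Hands every queued range to release() and empties the queue.
 * Returns the number of ranges processed.
 */
size_t flush_regions(void (*release)(void *buf, size_t len))
{
    size_t done = n_pending;
    for (size_t i = 0; i < n_pending; i++) {
        release(pending[i].buf, pending[i].len);
    }
    n_pending = 0;
    return done;
}

/* Stand-in release step: just counts the bytes it was handed. */
void count_release(void *buf, size_t len)
{
    (void)buf;
    released_bytes += len;
}
```

The sketch rejects new entries when the queue is full; a real cache would need some policy for that case, since a fixed-size array can otherwise be overrun between flushes.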

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.c
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.c 2008-08-07 20:21:53 UTC (rev 2916)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.c 2008-08-08 00:13:35 UTC (rev 2917)
@@ -97,13 +97,14 @@
 #if !defined(DISABLE_PTMALLOC)
 static pthread_spinlock_t dreg_lock = 0;
+static pthread_spinlock_t dereg_lock = 0;
 static pthread_t th_id_of_lock;
 /* Array which stores the memory regions
  * ptrs which are to be deregistered after
  * free hook pulls them out of the reg cache
  */
-static struct ibv_mr** deregister_mr_array;
+static dreg_region *deregister_mr_array;
 /* Number of pending deregistration
  * operations
@@ -638,18 +639,18 @@
 #if !defined(DISABLE_PTMALLOC)
     pthread_spin_init(&dreg_lock, 0);
+    pthread_spin_init(&dereg_lock, 0);
-    deregister_mr_array = (struct ibv_mr**) MPIU_Malloc(sizeof(struct ibv_mr*) * rdma_ndreg_entries * MAX_NUM_HCAS);
+    deregister_mr_array = (dreg_region*) MPIU_Malloc(sizeof(dreg_region) * rdma_ndreg_entries * MAX_NUM_HCAS);
-    if (NULL == deregister_mr_array)
-    {
+    if (NULL == deregister_mr_array) {
         ibv_va_error_abort(
             GEN_EXIT_ERR,
             "dreg_init: unable to malloc %d bytes",
-            (int) sizeof(struct ibv_mr*) * rdma_ndreg_entries * MAX_NUM_HCAS);
+            (int) sizeof(dreg_region) * rdma_ndreg_entries * MAX_NUM_HCAS);
     }
-    memset(deregister_mr_array, 0, sizeof(struct ibv_mr*) * rdma_ndreg_entries * MAX_NUM_HCAS);
+    memset(deregister_mr_array, 0, sizeof(dreg_region) * rdma_ndreg_entries * MAX_NUM_HCAS);
     n_dereg_mr = 0;
     INIT_FREE_LIST(&vma_free_list);
@@ -659,6 +660,17 @@
 #if !defined(DISABLE_PTMALLOC)
+static void lock_dereg()
+{
+    pthread_spin_lock(&dereg_lock);
+}
+
+static void unlock_dereg()
+{
+    pthread_spin_unlock(&dereg_lock);
+}
+
+
 static void lock_dreg()
 {
     pthread_spin_lock(&dreg_lock);
@@ -681,22 +693,85 @@
 static void flush_dereg_mrs()
 {
-    int i = 0;
+    unsigned long i, j, k;
+    unsigned long pagenum_low, pagenum_high;
+    unsigned long npages, begin, end;
+    unsigned long user_low_a, user_high_a;
+    unsigned long pagebase_low_a, pagebase_high_a;
+    struct dreg_entry *d;
+    void *addr;
-    for (; i < n_dereg_mr; ++i)
-    {
-        if (deregister_mr_array[i])
-        {
-            if (ibv_dereg_mr(deregister_mr_array[i]))
-            {
-                ibv_error_abort(IBV_RETURN_ERR, "deregistration failed\n");
+    lock_dereg();
+
+    for(j = 0; j < n_dereg_mr; j++) {
+        void *buf;
+        size_t len;
+
+        buf = deregister_mr_array[j].buf;
+        len = deregister_mr_array[j].len;
+
+        /* calculate base page address for registration */
+        user_low_a = (unsigned long) buf;
+        user_high_a = user_low_a + (unsigned long) len - 1;
+
+        pagebase_low_a = user_low_a & ~DREG_PAGEMASK;
+        pagebase_high_a = user_high_a & ~DREG_PAGEMASK;
+
+        /* info to store in hash table */
+        pagenum_low = pagebase_low_a >> DREG_PAGEBITS;
+        pagenum_high = pagebase_high_a >> DREG_PAGEBITS;
+        npages = 1 + (pagenum_high - pagenum_low);
+
+        /* For every page in this buffer find out whether
+         * it is registered or not. This is fine, since
+         * we register only at a page granularity */
+
+        for (i = 0; i < npages; ++i) {
+            addr = (void *) ((uintptr_t) pagebase_low_a + i * DREG_PAGESIZE);
+            begin = ((unsigned long)addr) >> DREG_PAGEBITS;
+            end = (unsigned long)((char*)addr + DREG_PAGESIZE - 1) >> DREG_PAGEBITS;
+
+            while ((d = dreg_lookup(begin, end)) != NULL) {
+                if (d->refcount != 0 || d->is_valid == 0) {
+                    /* This memory area is still being referenced
+                     * by other pending MPI operations, which are
+                     * expected to call dreg_unregister and thus
+                     * unpin the buffer. We cannot deregister this
+                     * page, since other ops are pending from here. */
+
+                    /* OR: This memory region is in the process of
+                     * being deregistered. Leave it alone! */
+                    continue;
+                }
+
+                d->is_valid = 0;
+
+                for (k = 0; k < rdma_num_hcas; ++k) {
+                    if (d->memhandle[k]) {
+                        if(ibv_dereg_mr(d->memhandle[k])) {
+                            ibv_error_abort(IBV_RETURN_ERR, "deregistration failed\n");
+                        }
+                    }
+
+                    d->memhandle[i] = NULL;
+                }
+
+                if (d->refcount == 0) {
+                    if (MPIDI_CH3I_RDMA_Process.has_lazy_mem_unregister) {
+                        DREG_REMOVE_FROM_UNUSED_LIST(d);
+                    }
+                } else {
+                    --d->refcount;
+                }
+
+                dreg_remove(d);
+                DREG_ADD_TO_FREE_LIST(d);
             }
         }
-
-        deregister_mr_array[i] = NULL;
     }
     n_dereg_mr = 0;
+    unlock_dereg();
 }
 #endif /* !defined(DISABLE_PTMALLOC) */
@@ -1006,106 +1081,20 @@
 #if !defined(DISABLE_PTMALLOC)
 void find_and_free_dregs_inside(void* buf, size_t len)
 {
-    unsigned long i = 0;
-    unsigned long begin;
-    unsigned long end;
-
-    /* Calculate base page address for registration. */
-    unsigned long user_low_a = (unsigned long) buf;
-    unsigned long user_high_a = user_low_a + (unsigned long) len - 1;
-    unsigned long pagebase_low_a = user_low_a & ~DREG_PAGEMASK;
-    unsigned long pagebase_high_a = user_high_a & ~DREG_PAGEMASK;
-
-    /* Info to store in hash table. */
-    unsigned long pagenum_low = pagebase_low_a >> DREG_PAGEBITS;
-    unsigned long pagenum_high = pagebase_high_a >> DREG_PAGEBITS;
-    unsigned long npages = 1 + (pagenum_high - pagenum_low);
-
-    struct dreg_entry* d = NULL;
-    void* addr = NULL;
-
-    /* For every page in this buffer find out whether
-     * it is registered or not. This is fine, since
-     * we register only at a page granularity */
-
     if (!g_is_dreg_initialized || !MPIDI_CH3I_RDMA_Process.has_lazy_mem_unregister)
     {
         return;
     }
-    if (pthread_self() == th_id_of_lock)
-    {
-        /*
-         * This comparison is necessary to distinguish
-         * between recursive and multi-threaded calls to
-         * the registration cache.
-         *
-         * The recursive calls are possible since
-         * ibv_dereg_mr calls free after de-registering
-         * memory regions. However, this free should be
-         * for a smaller memory region, which is not
-         * handled by the MPI cache. We shouldn't
-         * try to do anything more in this routine.
-         */
-        return;
-    }
+    lock_dereg();
-    lock_dreg();
+    deregister_mr_array[n_dereg_mr].buf = buf;
+    deregister_mr_array[n_dereg_mr].len = len;
-    for (; i < npages; ++i)
-    {
-        addr = (void *) ((uintptr_t) pagebase_low_a + i * DREG_PAGESIZE);
-        begin = ((unsigned long)addr) >> DREG_PAGEBITS;
-        end = (unsigned long)((char*)addr + DREG_PAGESIZE - 1) >> DREG_PAGEBITS;
+    n_dereg_mr++;
+    unlock_dereg();
-        while ((d = dreg_lookup(begin, end)) != NULL)
-        {
-            if (d->refcount != 0 || d->is_valid == 0)
-            {
-                /* This memory area is still being referenced
-                 * by other pending MPI operations, which are
-                 * expected to call dreg_unregister and thus
-                 * unpin the buffer. We cannot deregister this
-                 * page, since other ops are pending from here. */
-
-                /* OR: This memory region is in the process of
-                 * being deregistered. Leave it alone! */
-                continue;
-            }
-
-            for (i = 0; i < rdma_num_hcas; ++i)
-            {
-                d->is_valid = 0;
-
-                if (d->memhandle[i])
-                {
-                    MPIU_Assert(n_dereg_mr < (rdma_ndreg_entries * MAX_NUM_HCAS));
-                    deregister_mr_array[n_dereg_mr] = d->memhandle[i];
-                    ++n_dereg_mr;
-                }
-
-                d->memhandle[i] = NULL;
-            }
-
-            if (d->refcount == 0)
-            {
-                if (MPIDI_CH3I_RDMA_Process.has_lazy_mem_unregister)
-                {
-                    DREG_REMOVE_FROM_UNUSED_LIST(d);
-                }
-            }
-            else
-            {
-                --d->refcount;
-            }
-
-            dreg_remove(d);
-            DREG_ADD_TO_FREE_LIST(d);
-        }
-    }
-
-    unlock_dreg();
 }
 #endif /* !defined(DISABLE_PTMALLOC) */

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.h
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.h 2008-08-07 20:21:53 UTC (rev 2916)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.h 2008-08-08 00:13:35 UTC (rev 2917)
@@ -44,6 +44,11 @@
 typedef struct dreg_entry dreg_entry;
+typedef struct {
+    void *buf;
+    size_t len;
+} dreg_region;
+
 struct dreg_entry {
     unsigned long pagenum;
     struct ibv_mr *memhandle[MAX_NUM_HCAS];

Modified:
mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.c
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.c 2008-08-07 20:21:53 UTC (rev 2916)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.c 2008-08-08 00:13:35 UTC (rev 2917)
@@ -93,13 +93,14 @@
 #ifndef DISABLE_PTMALLOC
 static pthread_spinlock_t dreg_lock = 0;
+static pthread_spinlock_t dereg_lock = 0;
 static pthread_t th_id_of_lock = -1;
 /* Array which stores the memory regions
  * ptrs which are to be deregistered after
  * free hook pulls them out of the reg cache
  */
-static VIP_MEM_HANDLE **deregister_mr_array;
+static dreg_region *deregister_mr_array;
 /* Number of pending deregistration
  * operations
@@ -592,20 +593,21 @@
 #ifndef DISABLE_PTMALLOC
     pthread_spin_init(&dreg_lock, 0);
+    pthread_spin_init(&dereg_lock, 0);
-    deregister_mr_array = (VIP_MEM_HANDLE **)
-        MPIU_Malloc(sizeof(VIP_MEM_HANDLE *) *
+    deregister_mr_array = (dreg_region *)
+        MPIU_Malloc(sizeof(dreg_region) *
                     rdma_ndreg_entries * MAX_NUM_HCAS);
     if(NULL == deregister_mr_array) {
         udapl_error_abort(GEN_EXIT_ERR,
                           "dreg_init: unable to malloc %d bytes",
-                          (int) sizeof(VIP_MEM_HANDLE *) *
+                          (int) sizeof(dreg_region) *
                           rdma_ndreg_entries * MAX_NUM_HCAS);
     }
     memset(deregister_mr_array, 0,
-           sizeof(VIP_MEM_HANDLE *) *
+           sizeof(dreg_region) *
            rdma_ndreg_entries * MAX_NUM_HCAS);
@@ -620,6 +622,16 @@
 #ifndef DISABLE_PTMALLOC
+static void lock_dereg()
+{
+    pthread_spin_lock(&dereg_lock);
+}
+
+static void unlock_dereg()
+{
+    pthread_spin_unlock(&dereg_lock);
+}
+
 static void lock_dreg()
 {
     pthread_spin_lock(&dreg_lock);
@@ -642,22 +654,73 @@
 static void flush_dereg_mrs()
 {
-    int i;
+    int i, j;
+    unsigned long pagenum_low, pagenum_high;
+    unsigned long npages, begin, end;
+    unsigned long user_low_a, user_high_a;
+    unsigned long pagebase_low_a, pagebase_high_a;
+    struct dreg_entry *d;
+    void *addr;
-    for(i = 0; i < n_dereg_mr; i++) {
+    lock_dereg();
-        if(deregister_mr_array[i]) {
+    /* For every page in this buffer find out whether
+     * it is registered or not. This is fine, since
+     * we register only at a page granularity */
-            if(dat_lmr_free(deregister_mr_array[i]->hndl)) {
-                udapl_error_abort(UDAPL_RETURN_ERR,
-                                  "deregistration failed\n");
+    for(j = 0; j < n_dereg_mr; j++) {
+        void *buf;
+        size_t len;
+
+        buf = deregister_mr_array[j].buf;
+        len = deregister_mr_array[j].len;
+
+        /* calculate base page address for registration */
+        user_low_a = (unsigned long) buf;
+        user_high_a = user_low_a + (unsigned long) len - 1;
+
+        pagebase_low_a = user_low_a & ~DREG_PAGEMASK;
+        pagebase_high_a = user_high_a & ~DREG_PAGEMASK;
+
+        /* info to store in hash table */
+        pagenum_low = pagebase_low_a >> DREG_PAGEBITS;
+        pagenum_high = pagebase_high_a >> DREG_PAGEBITS;
+        npages = 1 + (pagenum_high - pagenum_low);
+
+        /* For every page in this buffer find out whether
+         * it is registered or not. This is fine, since
+         * we register only at a page granularity */
+
+        for (i = 0; i < npages; ++i) {
+            addr = (void *) ((uintptr_t) pagebase_low_a + i * DREG_PAGESIZE);
+            begin = ((unsigned long)addr) >> DREG_PAGEBITS;
+            end = (unsigned long)((char*)addr + DREG_PAGESIZE - 1) >> DREG_PAGEBITS;
+
+            while ((d = dreg_lookup(begin, end)) != NULL) {
+                if (d->refcount != 0 || d->is_valid == 0) {
+                    /* This memory area is still being referenced
+                     * by other pending MPI operations, which are
+                     * expected to call dreg_unregister and thus
+                     * unpin the buffer. We cannot deregister this
+                     * page, since other ops are pending from here. */
+
+                    /* OR: This memory region is in the process of
+                     * being deregistered. Leave it alone! */
+                    continue;
+                }
+
+                d->is_valid = 0;
+
+                if(dat_lmr_free(d->memhandle.hndl)) {
+                    udapl_error_abort(UDAPL_RETURN_ERR,
+                                      "deregistration failed\n");
+                }
             }
         }
-
-        deregister_mr_array[i] = NULL;
     }
     n_dereg_mr = 0;
+    unlock_dereg();
 }
 #endif
@@ -956,103 +1019,17 @@
 #ifndef DISABLE_PTMALLOC
 void find_and_free_dregs_inside(void *buf, size_t len)
 {
-    int i;
-    unsigned long pagenum_low, pagenum_high;
-    unsigned long npages, begin, end;
-    unsigned long user_low_a, user_high_a;
-    unsigned long pagebase_low_a, pagebase_high_a;
-    struct dreg_entry *d;
-    void *addr;
-
-    /* calculate base page address for registration */
-    user_low_a = (unsigned long) buf;
-    user_high_a = user_low_a + (unsigned long) len - 1;
-
-    pagebase_low_a = user_low_a & ~DREG_PAGEMASK;
-    pagebase_high_a = user_high_a & ~DREG_PAGEMASK;
-
-    /* info to store in hash table */
-    pagenum_low = pagebase_low_a >> DREG_PAGEBITS;
-    pagenum_high = pagebase_high_a >> DREG_PAGEBITS;
-    npages = 1 + (pagenum_high - pagenum_low);
-
-    /* For every page in this buffer find out whether
-     * it is registered or not. This is fine, since
-     * we register only at a page granularity */
-
     if(!is_dreg_initialized ||
        !MPIDI_CH3I_RDMA_Process.has_lazy_mem_unregister) {
         return;
     }
-    if(pthread_self() == th_id_of_lock) {
+    lock_dereg();
+    deregister_mr_array[n_dereg_mr].buf = buf;
+    deregister_mr_array[n_dereg_mr].len = len;
-        /*
-         * This comparison is necessary to distinguish
-         * between recursive and multi-threaded calls to
-         * the registration cache.
-         *
-         * The recursive calls are possible since
-         * dat_lmr_free calls free after de-registering
-         * memory regions. However, this free should be
-         * for a smaller memory region, which is not
-         * handled by the MPI cache. We shouldn't
-         * try to do anything more in this routine.
-         */
-
-        return;
-    }
-
-    lock_dreg();
-
-    for(i = 0; i < npages; i++) {
-
-        addr = (void *) ((uintptr_t) pagebase_low_a + i * DREG_PAGESIZE);
-
-        begin = ((unsigned long)addr) >> DREG_PAGEBITS;
-
-        end = ((unsigned long)(((char*)addr) +
-                               DREG_PAGESIZE - 1)) >> DREG_PAGEBITS;
-
-        while( (d = dreg_lookup (begin, end)) != NULL) {
-
-            if((d->refcount != 0) || (d->is_valid == 0)) {
-                /* This memory area is still being referenced
-                 * by other pending MPI operations, which are
-                 * expected to call dreg_unregister and thus
-                 * unpin the buffer. We cannot deregister this
-                 * page, since other ops are pending from here. */
-
-                /* OR: This memory region is in the process of
-                 * being deregistered. Leave it alone! */
-                continue;
-            }
-
-            for(i = 0; i < rdma_num_hcas; i++) {
-
-                d->is_valid = 0;
-
-                MPIU_Assert(n_dereg_mr <
-                            (rdma_ndreg_entries * MAX_NUM_HCAS));
-
-                deregister_mr_array[n_dereg_mr] = &(d->memhandle);
-                n_dereg_mr++;
-
-            }
-
-            if(d->refcount == 0) {
-                if(MPIDI_CH3I_RDMA_Process.has_lazy_mem_unregister) {
-                    DREG_REMOVE_FROM_UNUSED_LIST(d);
-                }
-            } else {
-                d->refcount--;
-            }
-
-            dreg_remove (d);
-            DREG_ADD_TO_FREE_LIST(d);
-        }
-    }
-    unlock_dreg();
+    n_dereg_mr++;
+    unlock_dereg();
 }
 #endif

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.h
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.h 2008-08-07 20:21:53 UTC (rev 2916)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.h 2008-08-08 00:13:35 UTC (rev 2917)
@@ -45,6 +45,11 @@
 typedef struct dreg_entry dreg_entry;
+typedef struct {
+    void *buf;
+    size_t len;
+} dreg_region;
+
 struct dreg_entry {
     unsigned long pagenum;
     VIP_MEM_HANDLE memhandle;

From koop at mvapich.cse.ohio-state.edu Sun Aug 10 18:16:10 2008
From: koop at mvapich.cse.ohio-state.edu (koop@mvapich.cse.ohio-state.edu)
Date: Sun Aug 10 18:16:21 2008
Subject: [mvapich-commit] r2923 - in mvapich2/trunk/src/mpid/ch3/channels/mrail/src: gen2 memory memory/ptmalloc2 udapl
Message-ID: <200808102216.m7AMGA4C032734@mvapich.cse.ohio-state.edu>

Author: koop
Date: 2008-08-10 18:16:09 -0400 (Sun, 10 Aug 2008)
New Revision: 2923

Modified:
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.c
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.h
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/mem_hooks.c
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/arena.c
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/mvapich_malloc.c
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.c
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.h
Log:
* Check the free'd addresses whenever it reaches above a threshold (20). This prevents an overrun of addresses.

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.c
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.c 2008-08-08 04:57:32 UTC (rev 2922)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.c 2008-08-10 22:16:09 UTC (rev 2923)
@@ -34,9 +34,11 @@
 #include
 #include "dreg.h"
+#include "mem_hooks.h"
 #include "avl.h"
 #include "rdma_impl.h"
 #include "mpiutil.h"
+#include "assert.h"
 #undef DEBUG_PRINT
 #if defined(DEBUG)
@@ -106,14 +108,6 @@
  */
 static dreg_region *deregister_mr_array;
-/* Number of pending deregistration
- * operations
- * Note: This number can never exceed
- * the total number of reg.
cache - * entries - */ -static int n_dereg_mr; - /* Keep free list of VMA data structs * and entries */ static vma_t vma_free_list; @@ -651,7 +645,7 @@ } memset(deregister_mr_array, 0, sizeof(dreg_region) * rdma_ndreg_entries * MAX_NUM_HCAS); - n_dereg_mr = 0; + mvapich2_minfo.n_dereg_mr = 0; INIT_FREE_LIST(&vma_free_list); INIT_FREE_LIST(&entry_free_list); @@ -703,7 +697,7 @@ lock_dereg(); - for(j = 0; j < n_dereg_mr; j++) { + for(j = 0; j < mvapich2_minfo.n_dereg_mr; j++) { void *buf; size_t len; @@ -770,9 +764,17 @@ } } - n_dereg_mr = 0; + mvapich2_minfo.n_dereg_mr = 0; unlock_dereg(); } + +void flush_dereg_mrs_lock() +{ + lock_dreg(); + flush_dereg_mrs(); + unlock_dreg(); +} + #endif /* !defined(DISABLE_PTMALLOC) */ /* will return a NULL pointer if registration fails */ @@ -782,7 +784,9 @@ #if !defined(DISABLE_PTMALLOC) lock_dreg(); - flush_dereg_mrs(); + if(mvapich2_minfo.n_dereg_mr) { + flush_dereg_mrs(); + } #endif /* !defined(DISABLE_PTMALLOC) */ struct dreg_entry* d = dreg_find(buf, len); @@ -1089,10 +1093,12 @@ lock_dereg(); - deregister_mr_array[n_dereg_mr].buf = buf; - deregister_mr_array[n_dereg_mr].len = len; + assert(rdma_ndreg_entries * MAX_NUM_HCAS > mvapich2_minfo.n_dereg_mr); - n_dereg_mr++; + deregister_mr_array[mvapich2_minfo.n_dereg_mr].buf = buf; + deregister_mr_array[mvapich2_minfo.n_dereg_mr].len = len; + + mvapich2_minfo.n_dereg_mr++; unlock_dereg(); } Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.h =================================================================== --- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.h 2008-08-08 04:57:32 UTC (rev 2922) +++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.h 2008-08-10 22:16:09 UTC (rev 2923) @@ -245,6 +245,7 @@ #ifndef DISABLE_PTMALLOC void find_and_free_dregs_inside(void *buf, size_t len); +void flush_dereg_mrs_lock(); #endif #ifdef CKPT Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/mem_hooks.c 
=================================================================== --- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/mem_hooks.c 2008-08-08 04:57:32 UTC (rev 2922) +++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/mem_hooks.c 2008-08-10 22:16:09 UTC (rev 2923) @@ -36,6 +36,11 @@ } #endif /* !defined(DISABLE_MUNMAP_HOOK) */ +void mvapich2_mem_flush() +{ + flush_dereg_mrs_lock(); +} + void mvapich2_mem_unhook(void *ptr, size_t size) { if((size > 0) && Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/arena.c =================================================================== --- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/arena.c 2008-08-08 04:57:32 UTC (rev 2922) +++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/arena.c 2008-08-10 22:16:09 UTC (rev 2923) @@ -19,7 +19,7 @@ #define HEAP_MIN_SIZE (32*1024) #ifndef HEAP_MAX_SIZE -#define HEAP_MAX_SIZE (4*1024*1024) /* must be a power of two */ +#define HEAP_MAX_SIZE (1024*1024) /* must be a power of two */ #endif /* HEAP_MIN_SIZE and HEAP_MAX_SIZE limit the size of mmap()ed heaps Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/mvapich_malloc.c =================================================================== --- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/mvapich_malloc.c 2008-08-08 04:57:32 UTC (rev 2922) +++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/mvapich_malloc.c 2008-08-10 22:16:09 UTC (rev 2923) @@ -218,6 +218,11 @@ /* <_OSU_MVAPICH_> */ #include "mpidi_ch3i_rdma_conf.h" #if !defined(DISABLE_PTMALLOC) +#define FLUSH_DREG() { \ + if (mvapich2_minfo.n_dereg_mr > 20) { \ + mvapich2_mem_flush(); \ + } \ +} /* */ /* @@ -3415,6 +3420,7 @@ /* <_OSU_MVAPICH_> */ mvapich2_minfo.is_our_malloc = 1; + FLUSH_DREG(); /* */ return victim; @@ -3438,6 +3444,7 @@ /* <_OSU_MVAPICH_> */ mvapich2_minfo.is_our_free = 1; + FLUSH_DREG(); #if defined(DISABLE_TRAP_SBRK) if 
(!mvapich2_minfo.is_inside_free) @@ -3498,6 +3505,7 @@ return (*hook)(oldmem, bytes, RETURN_ADDRESS (0)); /* <_OSU_MVAPICH_> */ mvapich2_minfo.is_our_realloc = 1; + FLUSH_DREG(); /* */ #if REALLOC_ZERO_BYTES_FREES @@ -3522,6 +3530,7 @@ if(newp) /* <_OSU_MVAPICH_> */ { + FLUSH_DREG(); mvapich2_minfo.is_our_realloc = 1; /* */ return chunk2mem(newp); @@ -3533,6 +3542,7 @@ if(oldsize - SIZE_SZ >= nb) /* <_OSU_MVAPICH_> */ { + FLUSH_DREG(); mvapich2_minfo.is_our_realloc = 1; /* */ return oldmem; /* do nothing */ @@ -3546,6 +3556,7 @@ munmap_chunk(oldp); /* <_OSU_MVAPICH_> */ mvapich2_minfo.is_our_realloc = 1; + FLUSH_DREG(); /* */ return newmem; } @@ -3575,6 +3586,7 @@ ar_ptr == arena_for_chunk(mem2chunk(newp))); /* <_OSU_MVAPICH_> */ mvapich2_minfo.is_our_realloc = 1; + FLUSH_DREG(); /* */ return newp; } @@ -3626,6 +3638,7 @@ ar_ptr == arena_for_chunk(mem2chunk(p))); /* <_OSU_MVAPICH_> */ mvapich2_minfo.is_our_memalign = 1; + FLUSH_DREG(); /* */ return p; } @@ -3648,6 +3661,7 @@ (void)mutex_unlock(&ar_ptr->mutex); /* <_OSU_MVAPICH_> */ mvapich2_minfo.is_our_valloc = 1; + FLUSH_DREG(); /* */ return p; } @@ -3665,6 +3679,7 @@ (void)mutex_unlock(&ar_ptr->mutex); /* <_OSU_MVAPICH_> */ mvapich2_minfo.is_our_valloc = 1; + FLUSH_DREG(); /* */ return p; } @@ -3704,6 +3719,7 @@ while(sz > 0) ((char*)mem)[--sz] = 0; /* rather inefficient */ /* <_OSU_MVAPICH_> */ mvapich2_minfo.is_our_calloc = 1; + FLUSH_DREG(); /* */ return mem; #endif @@ -3766,6 +3782,7 @@ if (chunk_is_mmapped(p)) /* <_OSU_MVAPICH_> */ { + FLUSH_DREG(); mvapich2_minfo.is_our_calloc = 1; /* */ return mem; @@ -3814,6 +3831,7 @@ /* <_OSU_MVAPICH_> */ mvapich2_minfo.is_our_calloc = 1; + FLUSH_DREG(); /* */ return mem; } @@ -3832,6 +3850,7 @@ (void)mutex_unlock(&ar_ptr->mutex); /* <_OSU_MVAPICH_> */ mvapich2_minfo.is_our_calloc = 1; + FLUSH_DREG(); /* */ return m; } Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.c =================================================================== --- 
mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.c 2008-08-08 04:57:32 UTC (rev 2922) +++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.c 2008-08-10 22:16:09 UTC (rev 2923) @@ -33,6 +33,7 @@ #include #include "dreg.h" +#include "mem_hooks.h" #include "avl.h" #include "rdma_impl.h" #include "udapl_util.h" @@ -102,14 +103,6 @@ */ static dreg_region *deregister_mr_array; -/* Number of pending deregistration - * operations - * Note: This number can never exceed - * the total number of reg. cache - * entries - */ -static int n_dereg_mr; - /* Keep free list of VMA data structs * and entries */ static vma_t vma_free_list; @@ -611,7 +604,7 @@ rdma_ndreg_entries * MAX_NUM_HCAS); - n_dereg_mr = 0; + mvapich2_minfo.n_dereg_mr = 0; INIT_FREE_LIST(&vma_free_list); @@ -668,7 +661,7 @@ * it is registered or not. This is fine, since * we register only at a page granularity */ - for(j = 0; j < n_dereg_mr; j++) { + for(j = 0; j < mvapich2_minfo.n_dereg_mr; j++) { void *buf; size_t len; @@ -719,9 +712,15 @@ } } - n_dereg_mr = 0; + mvapich2_minfo.n_dereg_mr = 0; unlock_dereg(); } + +void flush_dereg_mrs_lock() { + lock_dreg(); + flush_dereg_mrs(); + unlock_dreg(); +} #endif /* will return a NULL pointer if registration fails */ @@ -1025,10 +1024,10 @@ } lock_dereg(); - deregister_mr_array[n_dereg_mr].buf = buf; - deregister_mr_array[n_dereg_mr].len = len; + deregister_mr_array[mvapich2_minfo.n_dereg_mr].buf = buf; + deregister_mr_array[mvapich2_minfo.n_dereg_mr].len = len; - n_dereg_mr++; + mvapich2_minfo.n_dereg_mr++; unlock_dereg(); } #endif Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.h =================================================================== --- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.h 2008-08-08 04:57:32 UTC (rev 2922) +++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.h 2008-08-10 22:16:09 UTC (rev 2923) @@ -250,6 +250,7 @@ #ifndef DISABLE_PTMALLOC void find_and_free_dregs_inside(void *buf, 
size_t len);
+void flush_dereg_mrs_lock();
 #endif
 #ifdef CKPT


From koop at mvapich.cse.ohio-state.edu Sun Aug 10 19:44:33 2008
From: koop at mvapich.cse.ohio-state.edu (koop@mvapich.cse.ohio-state.edu)
Date: Sun Aug 10 19:44:43 2008
Subject: [mvapich-commit] r2924 - mvapich2/trunk/src/mpid/ch3/channels/mrail/include
Message-ID: <200808102344.m7ANiXYF000411@mvapich.cse.ohio-state.edu>

Author: koop
Date: 2008-08-10 19:44:32 -0400 (Sun, 10 Aug 2008)
New Revision: 2924

Modified:
   mvapich2/trunk/src/mpid/ch3/channels/mrail/include/mem_hooks.h
Log:
* Add changes to include file to support last check in

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/include/mem_hooks.h
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/include/mem_hooks.h 2008-08-10 22:16:09 UTC (rev 2923)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/include/mem_hooks.h 2008-08-10 23:44:32 UTC (rev 2924)
@@ -28,6 +28,7 @@
 #endif /* ifndef DISABLE_MUNMAP_HOOK */

 typedef struct {
+    int n_dereg_mr;
     int is_our_malloc;
     int is_our_free;
     int is_our_calloc;
@@ -44,6 +45,7 @@
 mvapich2_malloc_info_t mvapich2_minfo;

 void mvapich2_mem_unhook(void *mem, size_t size);
+void mvapich2_mem_flush();
 int mvapich2_minit(void);
 void mvapich2_mfin(void);


From curtisbr at mvapich.cse.ohio-state.edu Mon Aug 11 13:51:23 2008
From: curtisbr at mvapich.cse.ohio-state.edu (curtisbr@mvapich.cse.ohio-state.edu)
Date: Mon Aug 11 13:51:34 2008
Subject: [mvapich-commit] r2928 - mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma
Message-ID: <200808111751.m7BHpNd1009160@mvapich.cse.ohio-state.edu>

Author: curtisbr
Date: 2008-08-11 13:51:23 -0400 (Mon, 11 Aug 2008)
New Revision: 2928

Modified:
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma/ch3_progress.c
Log:
Change state to reflect function name (trac 264).

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma/ch3_progress.c
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma/ch3_progress.c 2008-08-11 17:50:03 UTC (rev 2927)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma/ch3_progress.c 2008-08-11 17:51:23 UTC (rev 2928)
@@ -312,10 +312,10 @@
 {
     int mpi_errno = MPI_SUCCESS;

-    MPIDI_STATE_DECL(MPID_STATE_MPIDI_CH3_PROGRESS);
+    MPIDI_STATE_DECL(MPID_STATE_MPIDI_CH3_PROGRESS_TEST);
     MPIDI_STATE_DECL(MPID_STATE_MPIDU_YIELD);

-    MPIDI_FUNC_ENTER(MPID_STATE_MPIDI_CH3_PROGRESS);
+    MPIDI_FUNC_ENTER(MPID_STATE_MPIDI_CH3_PROGRESS_TEST);

 #if defined(CKPT)
     MPIDI_CH3I_CR_lock();
@@ -427,7 +427,7 @@
     MPIDI_CH3I_CR_unlock();
 #endif /* defined(CKPT) */

-    MPIDI_FUNC_EXIT(MPID_STATE_MPIDI_CH3_PROGRESS);
+    MPIDI_FUNC_EXIT(MPID_STATE_MPIDI_CH3_PROGRESS_TEST);
     DEBUG_PRINT("Exiting ch3 progress test\n");
     return mpi_errno;
 }


From kumarra at mvapich.cse.ohio-state.edu Thu Aug 14 10:09:38 2008
From: kumarra at mvapich.cse.ohio-state.edu (kumarra@mvapich.cse.ohio-state.edu)
Date: Thu Aug 14 10:09:49 2008
Subject: [mvapich-commit] r2932 - mvapich2/trunk/src/mpi/coll
Message-ID: <200808141409.m7EE9cpd008363@mvapich.cse.ohio-state.edu>

Author: kumarra
Date: 2008-08-14 10:09:35 -0400 (Thu, 14 Aug 2008)
New Revision: 2932

Modified:
   mvapich2/trunk/src/mpi/coll/bcast.c
Log:
The datatypes of different processes taking part in broadcast collective can be different. shmem-bcast did not handle this scenario.

Modified: mvapich2/trunk/src/mpi/coll/bcast.c
===================================================================
--- mvapich2/trunk/src/mpi/coll/bcast.c 2008-08-14 02:08:22 UTC (rev 2931)
+++ mvapich2/trunk/src/mpi/coll/bcast.c 2008-08-14 14:09:35 UTC (rev 2932)
@@ -264,9 +264,14 @@
         }
     }
 #if defined(_OSU_MVAPICH_)
-    else if (enable_shmem_collectives && (comm_ptr->shmem_coll_ok == 1) && (nbytes < shmem_bcast_threshold)
-            && is_contig && is_homogeneous && enable_shmem_bcast){
-        mpi_errno = intra_shmem_Bcast_Large(buffer, count, datatype, nbytes, root, comm_ptr);
+    else if (enable_shmem_collectives && (comm_ptr->shmem_coll_ok == 1) && (nbytes < shmem_bcast_threshold) && enable_shmem_bcast) {
+
+        if( !is_contig || !is_homogeneous) {
+            mpi_errno = intra_shmem_Bcast_Large(tmp_buf, nbytes, MPI_BYTE, nbytes, root, comm_ptr);
+        } else {
+            mpi_errno = intra_shmem_Bcast_Large(buffer, count, datatype, nbytes, root, comm_ptr);
+        }
+
         if(mpi_errno == -1) {
             /* use long message algorithm: binomial tree scatter followed by an allgather */


From kumarra at mvapich.cse.ohio-state.edu Thu Aug 14 10:12:12 2008
From: kumarra at mvapich.cse.ohio-state.edu (kumarra@mvapich.cse.ohio-state.edu)
Date: Thu Aug 14 10:12:24 2008
Subject: [mvapich-commit] r2933 - mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma
Message-ID: <200808141412.m7EECCAR008388@mvapich.cse.ohio-state.edu>

Author: kumarra
Date: 2008-08-14 10:12:07 -0400 (Thu, 14 Aug 2008)
New Revision: 2933

Modified:
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma/ch3_shmem_coll.c
Log:
Typo mistake in shmem_bcast: replacing %d with %s and printf with fprintf.

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma/ch3_shmem_coll.c
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma/ch3_shmem_coll.c 2008-08-14 14:09:35 UTC (rev 2932)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/rdma/ch3_shmem_coll.c 2008-08-14 14:12:07 UTC (rev 2933)
@@ -101,7 +101,7 @@
     sprintf(shmem_file, "/tmp/ib_shmem_coll-%s-%s-%d.tmp",
             pg->ch.kvs_name, hostname, getuid());
-    sprintf(bcast_file,"/tmp/ib_shmem_bcast_coll-%d-%s-%d",
+    sprintf(bcast_file,"/tmp/ib_shmem_bcast_coll-%s-%s-%d",
             pg->ch.kvs_name, hostname, getuid());
     /* open the shared memory file */
@@ -458,7 +458,7 @@
     char *buf;
     buf = (char *) calloc(*bcast_seg_size + 1, sizeof(char));
     if (write(*fd, buf, *bcast_seg_size) != *bcast_seg_size) {
-        printf("[%d] shmem_coll_init:error in writing " "shared memory file: %d\n", my_rank, errno);
+        fprintf(stderr, "[%d] shmem_coll_init:error in writing " "shared memory file: %d\n", my_rank, errno);
         MPIU_Free(buf);
         return 0;
     }


From koop at mvapich.cse.ohio-state.edu Thu Aug 14 11:35:08 2008
From: koop at mvapich.cse.ohio-state.edu (koop@mvapich.cse.ohio-state.edu)
Date: Thu Aug 14 11:35:21 2008
Subject: [mvapich-commit] r2934 - in mvapich2/trunk/src/mpid/ch3/channels/mrail: include src/gen2 src/memory src/memory/ptmalloc2 src/udapl
Message-ID: <200808141535.m7EFZ8wU008534@mvapich.cse.ohio-state.edu>

Author: koop
Date: 2008-08-14 11:35:06 -0400 (Thu, 14 Aug 2008)
New Revision: 2934

Modified:
   mvapich2/trunk/src/mpid/ch3/channels/mrail/include/mem_hooks.h
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.c
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.h
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/mem_hooks.c
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/arena.c
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/mvapich_malloc.c
   mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.c
mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.h Log: * Reverting r2917, 2923, and 2924 Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/include/mem_hooks.h =================================================================== --- mvapich2/trunk/src/mpid/ch3/channels/mrail/include/mem_hooks.h 2008-08-14 14:12:07 UTC (rev 2933) +++ mvapich2/trunk/src/mpid/ch3/channels/mrail/include/mem_hooks.h 2008-08-14 15:35:06 UTC (rev 2934) @@ -28,7 +28,6 @@ #endif /* ifndef DISABLE_MUNMAP_HOOK */ typedef struct { - int n_dereg_mr; int is_our_malloc; int is_our_free; int is_our_calloc; @@ -45,7 +44,6 @@ mvapich2_malloc_info_t mvapich2_minfo; void mvapich2_mem_unhook(void *mem, size_t size); -void mvapich2_mem_flush(); int mvapich2_minit(void); void mvapich2_mfin(void); Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.c =================================================================== --- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.c 2008-08-14 14:12:07 UTC (rev 2933) +++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.c 2008-08-14 15:35:06 UTC (rev 2934) @@ -34,11 +34,9 @@ #include #include "dreg.h" -#include "mem_hooks.h" #include "avl.h" #include "rdma_impl.h" #include "mpiutil.h" -#include "assert.h" #undef DEBUG_PRINT #if defined(DEBUG) @@ -99,15 +97,22 @@ #if !defined(DISABLE_PTMALLOC) static pthread_spinlock_t dreg_lock = 0; -static pthread_spinlock_t dereg_lock = 0; static pthread_t th_id_of_lock; /* Array which stores the memory regions * ptrs which are to be deregistered after * free hook pulls them out of the reg cache */ -static dreg_region *deregister_mr_array; +static struct ibv_mr** deregister_mr_array; +/* Number of pending deregistration + * operations + * Note: This number can never exceed + * the total number of reg. 
cache + * entries + */ +static int n_dereg_mr; + /* Keep free list of VMA data structs * and entries */ static vma_t vma_free_list; @@ -633,19 +638,19 @@ #if !defined(DISABLE_PTMALLOC) pthread_spin_init(&dreg_lock, 0); - pthread_spin_init(&dereg_lock, 0); - deregister_mr_array = (dreg_region*) MPIU_Malloc(sizeof(dreg_region) * rdma_ndreg_entries * MAX_NUM_HCAS); + deregister_mr_array = (struct ibv_mr**) MPIU_Malloc(sizeof(struct ibv_mr*) * rdma_ndreg_entries * MAX_NUM_HCAS); - if (NULL == deregister_mr_array) { + if (NULL == deregister_mr_array) + { ibv_va_error_abort( GEN_EXIT_ERR, "dreg_init: unable to malloc %d bytes", - (int) sizeof(dreg_region) * rdma_ndreg_entries * MAX_NUM_HCAS); + (int) sizeof(struct ibv_mr*) * rdma_ndreg_entries * MAX_NUM_HCAS); } - memset(deregister_mr_array, 0, sizeof(dreg_region) * rdma_ndreg_entries * MAX_NUM_HCAS); - mvapich2_minfo.n_dereg_mr = 0; + memset(deregister_mr_array, 0, sizeof(struct ibv_mr*) * rdma_ndreg_entries * MAX_NUM_HCAS); + n_dereg_mr = 0; INIT_FREE_LIST(&vma_free_list); INIT_FREE_LIST(&entry_free_list); @@ -654,17 +659,6 @@ #if !defined(DISABLE_PTMALLOC) -static void lock_dereg() -{ - pthread_spin_lock(&dereg_lock); -} - -static void unlock_dereg() -{ - pthread_spin_unlock(&dereg_lock); -} - - static void lock_dreg() { pthread_spin_lock(&dreg_lock); @@ -687,94 +681,23 @@ static void flush_dereg_mrs() { - unsigned long i, j, k; - unsigned long pagenum_low, pagenum_high; - unsigned long npages, begin, end; - unsigned long user_low_a, user_high_a; - unsigned long pagebase_low_a, pagebase_high_a; - struct dreg_entry *d; - void *addr; + int i = 0; - lock_dereg(); - - for(j = 0; j < mvapich2_minfo.n_dereg_mr; j++) { - void *buf; - size_t len; - - buf = deregister_mr_array[j].buf; - len = deregister_mr_array[j].len; - - /* calculate base page address for registration */ - user_low_a = (unsigned long) buf; - user_high_a = user_low_a + (unsigned long) len - 1; - - pagebase_low_a = user_low_a & ~DREG_PAGEMASK; - 
pagebase_high_a = user_high_a & ~DREG_PAGEMASK; - - /* info to store in hash table */ - pagenum_low = pagebase_low_a >> DREG_PAGEBITS; - pagenum_high = pagebase_high_a >> DREG_PAGEBITS; - npages = 1 + (pagenum_high - pagenum_low); - - /* For every page in this buffer find out whether - * it is registered or not. This is fine, since - * we register only at a page granularity */ - - for (i = 0; i < npages; ++i) { - addr = (void *) ((uintptr_t) pagebase_low_a + i * DREG_PAGESIZE); - begin = ((unsigned long)addr) >> DREG_PAGEBITS; - end = (unsigned long)((char*)addr + DREG_PAGESIZE - 1) >> DREG_PAGEBITS; - - while ((d = dreg_lookup(begin, end)) != NULL) { - if (d->refcount != 0 || d->is_valid == 0) { - /* This memory area is still being referenced - * by other pending MPI operations, which are - * expected to call dreg_unregister and thus - * unpin the buffer. We cannot deregister this - * page, since other ops are pending from here. */ - - /* OR: This memory region is in the process of - * being deregistered. Leave it alone! 
*/ - continue; - } - - d->is_valid = 0; - - for (k = 0; k < rdma_num_hcas; ++k) { - if (d->memhandle[k]) { - if(ibv_dereg_mr(d->memhandle[k])) { - ibv_error_abort(IBV_RETURN_ERR, "deregistration failed\n"); - } - } - - d->memhandle[i] = NULL; - } - - if (d->refcount == 0) { - if (MPIDI_CH3I_RDMA_Process.has_lazy_mem_unregister) { - DREG_REMOVE_FROM_UNUSED_LIST(d); - } - } else { - --d->refcount; - } - - dreg_remove(d); - DREG_ADD_TO_FREE_LIST(d); + for (; i < n_dereg_mr; ++i) + { + if (deregister_mr_array[i]) + { + if (ibv_dereg_mr(deregister_mr_array[i])) + { + ibv_error_abort(IBV_RETURN_ERR, "deregistration failed\n"); } } + + deregister_mr_array[i] = NULL; } - mvapich2_minfo.n_dereg_mr = 0; - unlock_dereg(); + n_dereg_mr = 0; } - -void flush_dereg_mrs_lock() -{ - lock_dreg(); - flush_dereg_mrs(); - unlock_dreg(); -} - #endif /* !defined(DISABLE_PTMALLOC) */ /* will return a NULL pointer if registration fails */ @@ -784,9 +707,7 @@ #if !defined(DISABLE_PTMALLOC) lock_dreg(); - if(mvapich2_minfo.n_dereg_mr) { - flush_dereg_mrs(); - } + flush_dereg_mrs(); #endif /* !defined(DISABLE_PTMALLOC) */ struct dreg_entry* d = dreg_find(buf, len); @@ -1085,22 +1006,106 @@ #if !defined(DISABLE_PTMALLOC) void find_and_free_dregs_inside(void* buf, size_t len) { + unsigned long i = 0; + unsigned long begin; + unsigned long end; + + /* Calculate base page address for registration. */ + unsigned long user_low_a = (unsigned long) buf; + unsigned long user_high_a = user_low_a + (unsigned long) len - 1; + unsigned long pagebase_low_a = user_low_a & ~DREG_PAGEMASK; + unsigned long pagebase_high_a = user_high_a & ~DREG_PAGEMASK; + + /* Info to store in hash table. */ + unsigned long pagenum_low = pagebase_low_a >> DREG_PAGEBITS; + unsigned long pagenum_high = pagebase_high_a >> DREG_PAGEBITS; + unsigned long npages = 1 + (pagenum_high - pagenum_low); + + struct dreg_entry* d = NULL; + void* addr = NULL; + + /* For every page in this buffer find out whether + * it is registered or not. 
This is fine, since + * we register only at a page granularity */ + if (!g_is_dreg_initialized || !MPIDI_CH3I_RDMA_Process.has_lazy_mem_unregister) { return; } - lock_dereg(); + if (pthread_self() == th_id_of_lock) + { + /* + * This comparison is necessary to distinguish + * between recursive and multi-threaded calls to + * the registration cache. + * + * The recursive calls are possible since + * ibv_dereg_mr calls free after de-registering + * memory regions. However, this free should be + * for a smaller memory region, which is not + * handled by the MPI cache. We shouldn't + * try to do anything more in this routine. + */ + return; + } - assert(rdma_ndreg_entries * MAX_NUM_HCAS > mvapich2_minfo.n_dereg_mr); + lock_dreg(); - deregister_mr_array[mvapich2_minfo.n_dereg_mr].buf = buf; - deregister_mr_array[mvapich2_minfo.n_dereg_mr].len = len; + for (; i < npages; ++i) + { + addr = (void *) ((uintptr_t) pagebase_low_a + i * DREG_PAGESIZE); + begin = ((unsigned long)addr) >> DREG_PAGEBITS; + end = (unsigned long)((char*)addr + DREG_PAGESIZE - 1) >> DREG_PAGEBITS; - mvapich2_minfo.n_dereg_mr++; - unlock_dereg(); + while ((d = dreg_lookup(begin, end)) != NULL) + { + if (d->refcount != 0 || d->is_valid == 0) + { + /* This memory area is still being referenced + * by other pending MPI operations, which are + * expected to call dreg_unregister and thus + * unpin the buffer. We cannot deregister this + * page, since other ops are pending from here. */ + /* OR: This memory region is in the process of + * being deregistered. Leave it alone! 
*/ + continue; + } + + for (i = 0; i < rdma_num_hcas; ++i) + { + d->is_valid = 0; + + if (d->memhandle[i]) + { + MPIU_Assert(n_dereg_mr < (rdma_ndreg_entries * MAX_NUM_HCAS)); + deregister_mr_array[n_dereg_mr] = d->memhandle[i]; + ++n_dereg_mr; + } + + d->memhandle[i] = NULL; + } + + if (d->refcount == 0) + { + if (MPIDI_CH3I_RDMA_Process.has_lazy_mem_unregister) + { + DREG_REMOVE_FROM_UNUSED_LIST(d); + } + } + else + { + --d->refcount; + } + + dreg_remove(d); + DREG_ADD_TO_FREE_LIST(d); + } + } + + unlock_dreg(); } #endif /* !defined(DISABLE_PTMALLOC) */ Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.h =================================================================== --- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.h 2008-08-14 14:12:07 UTC (rev 2933) +++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.h 2008-08-14 15:35:06 UTC (rev 2934) @@ -44,11 +44,6 @@ typedef struct dreg_entry dreg_entry; -typedef struct { - void *buf; - size_t len; -} dreg_region; - struct dreg_entry { unsigned long pagenum; struct ibv_mr *memhandle[MAX_NUM_HCAS]; @@ -245,7 +240,6 @@ #ifndef DISABLE_PTMALLOC void find_and_free_dregs_inside(void *buf, size_t len); -void flush_dereg_mrs_lock(); #endif #ifdef CKPT Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/mem_hooks.c =================================================================== --- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/mem_hooks.c 2008-08-14 14:12:07 UTC (rev 2933) +++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/mem_hooks.c 2008-08-14 15:35:06 UTC (rev 2934) @@ -36,11 +36,6 @@ } #endif /* !defined(DISABLE_MUNMAP_HOOK) */ -void mvapich2_mem_flush() -{ - flush_dereg_mrs_lock(); -} - void mvapich2_mem_unhook(void *ptr, size_t size) { if((size > 0) && Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/arena.c =================================================================== --- 
mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/arena.c 2008-08-14 14:12:07 UTC (rev 2933)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/arena.c 2008-08-14 15:35:06 UTC (rev 2934)
@@ -19,7 +19,7 @@

 #define HEAP_MIN_SIZE (32*1024)
 #ifndef HEAP_MAX_SIZE
-#define HEAP_MAX_SIZE (1024*1024) /* must be a power of two */
+#define HEAP_MAX_SIZE (4*1024*1024) /* must be a power of two */
 #endif

 /* HEAP_MIN_SIZE and HEAP_MAX_SIZE limit the size of mmap()ed heaps

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/mvapich_malloc.c
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/mvapich_malloc.c 2008-08-14 14:12:07 UTC (rev 2933)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/mvapich_malloc.c 2008-08-14 15:35:06 UTC (rev 2934)
@@ -218,11 +218,6 @@
 /* <_OSU_MVAPICH_> */
 #include "mpidi_ch3i_rdma_conf.h"
 #if !defined(DISABLE_PTMALLOC)
-#define FLUSH_DREG() { \
-    if (mvapich2_minfo.n_dereg_mr > 20) { \
-        mvapich2_mem_flush(); \
-    } \
-}
 /* */

 /*
@@ -3420,7 +3415,6 @@

 /* <_OSU_MVAPICH_> */
   mvapich2_minfo.is_our_malloc = 1;
-  FLUSH_DREG();
 /* */

   return victim;
@@ -3444,7 +3438,6 @@

 /* <_OSU_MVAPICH_> */
   mvapich2_minfo.is_our_free = 1;
-  FLUSH_DREG();

 #if defined(DISABLE_TRAP_SBRK)
   if (!mvapich2_minfo.is_inside_free)
@@ -3505,7 +3498,6 @@
     return (*hook)(oldmem, bytes, RETURN_ADDRESS (0));
 /* <_OSU_MVAPICH_> */
   mvapich2_minfo.is_our_realloc = 1;
-  FLUSH_DREG();
 /* */

 #if REALLOC_ZERO_BYTES_FREES
@@ -3530,7 +3522,6 @@
     if(newp)
 /* <_OSU_MVAPICH_> */
     {
-      FLUSH_DREG();
       mvapich2_minfo.is_our_realloc = 1;
 /* */
       return chunk2mem(newp);
@@ -3542,7 +3533,6 @@
     if(oldsize - SIZE_SZ >= nb)
 /* <_OSU_MVAPICH_> */
     {
-      FLUSH_DREG();
       mvapich2_minfo.is_our_realloc = 1;
 /* */
       return oldmem; /* do nothing */
@@ -3556,7 +3546,6 @@
     munmap_chunk(oldp);
 /* <_OSU_MVAPICH_> */
     mvapich2_minfo.is_our_realloc = 1;
-    FLUSH_DREG();
 /* */
     return newmem;
   }
@@ -3586,7 +3575,6 @@
          ar_ptr == arena_for_chunk(mem2chunk(newp)));
 /* <_OSU_MVAPICH_> */
   mvapich2_minfo.is_our_realloc = 1;
-  FLUSH_DREG();
 /* */
   return newp;
 }
@@ -3638,7 +3626,6 @@
          ar_ptr == arena_for_chunk(mem2chunk(p)));
 /* <_OSU_MVAPICH_> */
   mvapich2_minfo.is_our_memalign = 1;
-  FLUSH_DREG();
 /* */
   return p;
 }
@@ -3661,7 +3648,6 @@
   (void)mutex_unlock(&ar_ptr->mutex);
 /* <_OSU_MVAPICH_> */
   mvapich2_minfo.is_our_valloc = 1;
-  FLUSH_DREG();
 /* */
   return p;
 }
@@ -3679,7 +3665,6 @@
   (void)mutex_unlock(&ar_ptr->mutex);
 /* <_OSU_MVAPICH_> */
   mvapich2_minfo.is_our_valloc = 1;
-  FLUSH_DREG();
 /* */
   return p;
 }
@@ -3719,7 +3704,6 @@
     while(sz > 0) ((char*)mem)[--sz] = 0; /* rather inefficient */
 /* <_OSU_MVAPICH_> */
     mvapich2_minfo.is_our_calloc = 1;
-    FLUSH_DREG();
 /* */
     return mem;
 #endif
@@ -3782,7 +3766,6 @@
   if (chunk_is_mmapped(p))
 /* <_OSU_MVAPICH_> */
   {
-    FLUSH_DREG();
     mvapich2_minfo.is_our_calloc = 1;
 /* */
     return mem;
@@ -3831,7 +3814,6 @@

 /* <_OSU_MVAPICH_> */
   mvapich2_minfo.is_our_calloc = 1;
-  FLUSH_DREG();
 /* */
   return mem;
 }
@@ -3850,7 +3832,6 @@
   (void)mutex_unlock(&ar_ptr->mutex);
 /* <_OSU_MVAPICH_> */
   mvapich2_minfo.is_our_calloc = 1;
-  FLUSH_DREG();
 /* */
   return m;
 }

Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.c
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.c 2008-08-14 14:12:07 UTC (rev 2933)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.c 2008-08-14 15:35:06 UTC (rev 2934)
@@ -33,7 +33,6 @@
 #include

 #include "dreg.h"
-#include "mem_hooks.h"
 #include "avl.h"
 #include "rdma_impl.h"
 #include "udapl_util.h"
@@ -94,15 +93,22 @@

 #ifndef DISABLE_PTMALLOC
 static pthread_spinlock_t dreg_lock = 0;
-static pthread_spinlock_t dereg_lock = 0;
 static pthread_t th_id_of_lock = -1;

 /* Array which stores the memory regions
  * ptrs which are to be deregistered after
  * free hook pulls them out of the reg cache
  */
-static dreg_region *deregister_mr_array;
+static VIP_MEM_HANDLE **deregister_mr_array;

+/* Number of pending deregistration
+ * operations
+ * Note: This number can never exceed
+ * the total number of reg. cache
+ * entries
+ */
+static int n_dereg_mr;
+
 /* Keep free list of VMA data structs
  * and entries */
 static vma_t vma_free_list;
@@ -586,25 +592,24 @@

 #ifndef DISABLE_PTMALLOC
     pthread_spin_init(&dreg_lock, 0);
-    pthread_spin_init(&dereg_lock, 0);

-    deregister_mr_array = (dreg_region *)
-        MPIU_Malloc(sizeof(dreg_region) *
+    deregister_mr_array = (VIP_MEM_HANDLE **)
+        MPIU_Malloc(sizeof(VIP_MEM_HANDLE *) *
                 rdma_ndreg_entries * MAX_NUM_HCAS);

     if(NULL == deregister_mr_array) {
         udapl_error_abort(GEN_EXIT_ERR,
                 "dreg_init: unable to malloc %d bytes",
-                (int) sizeof(dreg_region) *
+                (int) sizeof(VIP_MEM_HANDLE *) *
                 rdma_ndreg_entries * MAX_NUM_HCAS);
     }

     memset(deregister_mr_array, 0,
-            sizeof(dreg_region) *
+            sizeof(VIP_MEM_HANDLE *) *
             rdma_ndreg_entries *
             MAX_NUM_HCAS);

-    mvapich2_minfo.n_dereg_mr = 0;
+    n_dereg_mr = 0;

     INIT_FREE_LIST(&vma_free_list);

@@ -615,16 +620,6 @@

 #ifndef DISABLE_PTMALLOC

-static void lock_dereg()
-{
-    pthread_spin_lock(&dereg_lock);
-}
-
-static void unlock_dereg()
-{
-    pthread_spin_unlock(&dereg_lock);
-}
-
 static void lock_dreg()
 {
     pthread_spin_lock(&dreg_lock);
@@ -647,80 +642,23 @@

 static void flush_dereg_mrs()
 {
-    int i, j;
-    unsigned long pagenum_low, pagenum_high;
-    unsigned long npages, begin, end;
-    unsigned long user_low_a, user_high_a;
-    unsigned long pagebase_low_a, pagebase_high_a;
-    struct dreg_entry *d;
-    void *addr;
+    int i;

-    lock_dereg();
+    for(i = 0; i < n_dereg_mr; i++) {

-    /* For every page in this buffer find out whether
-     * it is registered or not. This is fine, since
-     * we register only at a page granularity */
+        if(deregister_mr_array[i]) {

-    for(j = 0; j < mvapich2_minfo.n_dereg_mr; j++) {
-        void *buf;
-        size_t len;
-
-        buf = deregister_mr_array[j].buf;
-        len = deregister_mr_array[j].len;
-
-        /* calculate base page address for registration */
-        user_low_a = (unsigned long) buf;
-        user_high_a = user_low_a + (unsigned long) len - 1;
-
-        pagebase_low_a = user_low_a & ~DREG_PAGEMASK;
-        pagebase_high_a = user_high_a & ~DREG_PAGEMASK;
-
-        /* info to store in hash table */
-        pagenum_low = pagebase_low_a >> DREG_PAGEBITS;
-        pagenum_high = pagebase_high_a >> DREG_PAGEBITS;
-        npages = 1 + (pagenum_high - pagenum_low);
-
-        /* For every page in this buffer find out whether
-         * it is registered or not. This is fine, since
-         * we register only at a page granularity */
-
-        for (i = 0; i < npages; ++i) {
-            addr = (void *) ((uintptr_t) pagebase_low_a + i * DREG_PAGESIZE);
-            begin = ((unsigned long)addr) >> DREG_PAGEBITS;
-            end = (unsigned long)((char*)addr + DREG_PAGESIZE - 1) >> DREG_PAGEBITS;
-
-            while ((d = dreg_lookup(begin, end)) != NULL) {
-                if (d->refcount != 0 || d->is_valid == 0) {
-                    /* This memory area is still being referenced
-                     * by other pending MPI operations, which are
-                     * expected to call dreg_unregister and thus
-                     * unpin the buffer. We cannot deregister this
-                     * page, since other ops are pending from here. */
-
-                    /* OR: This memory region is in the process of
-                     * being deregistered. Leave it alone! */
-                    continue;
-                }
-
-                d->is_valid = 0;
-
-                if(dat_lmr_free(d->memhandle.hndl)) {
-                    udapl_error_abort(UDAPL_RETURN_ERR,
-                            "deregistration failed\n");
-                }
+            if(dat_lmr_free(deregister_mr_array[i]->hndl)) {
+                udapl_error_abort(UDAPL_RETURN_ERR,
+                        "deregistration failed\n");
             }
         }
+
+        deregister_mr_array[i] = NULL;
     }

-    mvapich2_minfo.n_dereg_mr = 0;
-    unlock_dereg();
+    n_dereg_mr = 0;
 }
-
-void flush_dereg_mrs_lock() {
-    lock_dreg();
-    flush_dereg_mrs();
-    unlock_dreg();
-}
 #endif

 /* will return a NULL pointer if registration fails */
@@ -1018,17 +956,103 @@
 #ifndef DISABLE_PTMALLOC
 void find_and_free_dregs_inside(void *buf, size_t len)
 {
+    int i;
+    unsigned long pagenum_low, pagenum_high;
+    unsigned long npages, begin, end;
+    unsigned long user_low_a, user_high_a;
+    unsigned long pagebase_low_a, pagebase_high_a;
+    struct dreg_entry *d;
+    void *addr;
+
+    /* calculate base page address for registration */
+    user_low_a = (unsigned long) buf;
+    user_high_a = user_low_a + (unsigned long) len - 1;
+
+    pagebase_low_a = user_low_a & ~DREG_PAGEMASK;
+    pagebase_high_a = user_high_a & ~DREG_PAGEMASK;
+
+    /* info to store in hash table */
+    pagenum_low = pagebase_low_a >> DREG_PAGEBITS;
+    pagenum_high = pagebase_high_a >> DREG_PAGEBITS;
+    npages = 1 + (pagenum_high - pagenum_low);
+
+    /* For every page in this buffer find out whether
+     * it is registered or not. This is fine, since
+     * we register only at a page granularity */
+
     if(!is_dreg_initialized ||
             !MPIDI_CH3I_RDMA_Process.has_lazy_mem_unregister) {
         return;
     }

-    lock_dereg();
-    deregister_mr_array[mvapich2_minfo.n_dereg_mr].buf = buf;
-    deregister_mr_array[mvapich2_minfo.n_dereg_mr].len = len;
+    if(pthread_self() == th_id_of_lock) {

-    mvapich2_minfo.n_dereg_mr++;
-    unlock_dereg();
+        /*
+         * This comparison is necessary to distinguish
+         * between recursive and multi-threaded calls to
+         * the registration cache.
+         *
+         * The recursive calls are possible since
+         * dat_lmr_free calls free after de-registering
+         * memory regions. However, this free should be
+         * for a smaller memory region, which is not
+         * handled by the MPI cache. We shouldn't
+         * try to do anything more in this routine.
+         */
+
+        return;
+    }
+
+    lock_dreg();
+
+    for(i = 0; i < npages; i++) {
+
+        addr = (void *) ((uintptr_t) pagebase_low_a + i * DREG_PAGESIZE);
+
+        begin = ((unsigned long)addr) >> DREG_PAGEBITS;
+
+        end = ((unsigned long)(((char*)addr) +
+                    DREG_PAGESIZE - 1)) >> DREG_PAGEBITS;
+
+        while( (d = dreg_lookup (begin, end)) != NULL) {
+
+            if((d->refcount != 0) || (d->is_valid == 0)) {
+                /* This memory area is still being referenced
+                 * by other pending MPI operations, which are
+                 * expected to call dreg_unregister and thus
+                 * unpin the buffer. We cannot deregister this
+                 * page, since other ops are pending from here. */
+
+                /* OR: This memory region is in the process of
+                 * being deregistered. Leave it alone! */
+                continue;
+            }
+
+            for(i = 0; i < rdma_num_hcas; i++) {
+
+                d->is_valid = 0;
+
+                MPIU_Assert(n_dereg_mr <
+                        (rdma_ndreg_entries * MAX_NUM_HCAS));
+
+                deregister_mr_array[n_dereg_mr] = &(d->memhandle);
+                n_dereg_mr++;
+
+            }
+
+            if(d->refcount == 0) {
+                if(MPIDI_CH3I_RDMA_Process.has_lazy_mem_unregister) {
+                    DREG_REMOVE_FROM_UNUSED_LIST(d);
+                }
+            } else {
+                d->refcount--;
+            }
+
+            dreg_remove (d);
+            DREG_ADD_TO_FREE_LIST(d);
+        }
+    }
+    unlock_dreg();
 }
 #endif


Modified: mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.h
===================================================================
--- mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.h 2008-08-14 14:12:07 UTC (rev 2933)
+++ mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.h 2008-08-14 15:35:06 UTC (rev 2934)
@@ -45,11 +45,6 @@

 typedef struct dreg_entry dreg_entry;

-typedef struct {
-    void *buf;
-    size_t len;
-} dreg_region;
-
 struct dreg_entry {
     unsigned long pagenum;
     VIP_MEM_HANDLE memhandle;
@@ -250,7 +245,6 @@

 #ifndef DISABLE_PTMALLOC
 void find_and_free_dregs_inside(void *buf, size_t len);
-void flush_dereg_mrs_lock();
 #endif

 #ifdef CKPT

From koop at cse.ohio-state.edu Thu Aug 14 11:56:42 2008
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Thu Aug 14 11:56:52 2008
Subject: [mvapich-commit] r2934 - in mvapich2/trunk/src/mpid/ch3/channels/mrail: include src/gen2 src/memory src/memory/ptmalloc2 src/udapl
In-Reply-To: <200808141535.m7EFZ8wU008534@mvapich.cse.ohio-state.edu>
Message-ID:

Not sure why this didn't come through until now, but the changes are
reverted (as of 30 minutes ago). I'll also back out the changes from exp2.

Matt

On Thu, 14 Aug 2008 koop@mvapich.cse.ohio-state.edu wrote:

> Author: koop
> Date: 2008-08-14 11:35:06 -0400 (Thu, 14 Aug 2008)
> New Revision: 2934
>
> Modified:
>    mvapich2/trunk/src/mpid/ch3/channels/mrail/include/mem_hooks.h
>    mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.c
>    mvapich2/trunk/src/mpid/ch3/channels/mrail/src/gen2/dreg.h
>    mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/mem_hooks.c
>    mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/arena.c
>    mvapich2/trunk/src/mpid/ch3/channels/mrail/src/memory/ptmalloc2/mvapich_malloc.c
>    mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.c
>    mvapich2/trunk/src/mpid/ch3/channels/mrail/src/udapl/dreg.h
> Log:
>
> * Reverting r2917, 2923, and 2924

From gopalakk at mvapich.cse.ohio-state.edu Sat Aug 16 18:43:35 2008
From: gopalakk at mvapich.cse.ohio-state.edu (gopalakk@mvapich.cse.ohio-state.edu)
Date: Sat Aug 16 18:43:47 2008
Subject: [mvapich-commit] r2937 - mvapich2/trunk/src/pm/mpd
Message-ID: <200808162243.m7GMhZ1I011882@mvapich.cse.ohio-state.edu>

Author: gopalakk
Date: 2008-08-16 18:43:34 -0400 (Sat, 16 Aug 2008)
New Revision: 2937

Modified:
   mvapich2/trunk/src/pm/mpd/mpiexec_cr.c
Log:
Fix the bind() failure in mpiexec_cr

Modified: mvapich2/trunk/src/pm/mpd/mpiexec_cr.c
===================================================================
--- mvapich2/trunk/src/pm/mpd/mpiexec_cr.c	2008-08-14 17:54:14 UTC (rev 2936)
+++ mvapich2/trunk/src/pm/mpd/mpiexec_cr.c	2008-08-16 22:43:34 UTC (rev 2937)
@@ -254,10 +254,12 @@

 int CR_connect_mpd(int port)
 {
+    int i = 1;
     struct sockaddr_in sa;
     mpiexec_listen_fd = socket(AF_INET, SOCK_STREAM, 0);
     if (mpiexec_listen_fd < 0)
         CR_ERR_ABORT("socket failed\n");
+    setsockopt(mpiexec_listen_fd, SOL_SOCKET, SO_REUSEADDR, (int *) &i, sizeof(i));
     sa.sin_family = AF_INET;
     sa.sin_port = htons(port);
     CR_DBG("Listen port %d\n",port);

From perkinjo at mvapich.cse.ohio-state.edu Wed Aug 20 13:30:01 2008
From: perkinjo at mvapich.cse.ohio-state.edu (perkinjo@mvapich.cse.ohio-state.edu)
Date: Wed Aug 20 13:30:14 2008
Subject: [mvapich-commit] r2943 - mvapich2/trunk/maint
Message-ID: <200808201730.m7KHU1Wr017055@mvapich.cse.ohio-state.edu>

Author: perkinjo
Date: 2008-08-20 13:30:01 -0400 (Wed, 20 Aug 2008)
New Revision: 2943

Modified:
   mvapich2/trunk/maint/Version
Log:
Reflect current release candidate.

Modified: mvapich2/trunk/maint/Version
===================================================================
--- mvapich2/trunk/maint/Version	2008-08-20 15:33:57 UTC (rev 2942)
+++ mvapich2/trunk/maint/Version	2008-08-20 17:30:01 UTC (rev 2943)
@@ -1 +1 @@
-1.2.0
+1.2rc2

From chail at mvapich.cse.ohio-state.edu Wed Aug 20 14:16:19 2008
From: chail at mvapich.cse.ohio-state.edu (chail@mvapich.cse.ohio-state.edu)
Date: Wed Aug 20 14:16:31 2008
Subject: [mvapich-commit] r2944 - mvapich2/trunk
Message-ID: <200808201816.m7KIGJqD017167@mvapich.cse.ohio-state.edu>

Author: chail
Date: 2008-08-20 14:16:18 -0400 (Wed, 20 Aug 2008)
New Revision: 2944

Modified:
   mvapich2/trunk/CHANGELOG
Log:
Update changelog

Modified: mvapich2/trunk/CHANGELOG
===================================================================
--- mvapich2/trunk/CHANGELOG	2008-08-20 17:30:01 UTC (rev 2943)
+++ mvapich2/trunk/CHANGELOG	2008-08-20 18:16:18 UTC (rev 2944)
@@ -3,8 +3,31 @@
 This file briefly describes the changes to the MVAPICH2 software package.
 The logs are arranged in the "most recent first" order.

-MVAPICH2-1.2 (07/02/08)
+MVAPICH2-1.2-RC2 (08/20/2008)

+* Following bugs are fixed in RC2
+
+    - Properly handle the scenario in shared memory broadcast code when the
+      datatypes of different processes taking part in broadcast are different.
+
+    - Fix a bug in Checkpoint-Restart code to determine whether a connection is a
+      shared memory connection or a network connection.
+
+    - Support non-standard path for BLCR header files.
+
+    - Increase the maximum heap size to avoid race condition in realloc().
+
+    - Use int32_t for rank for larger jobs with 32k processes or more.
+
+    - Improve mvapich2-1.2 bandwidth to the same level of mvapich2-1.0.3.
+
+    - An error handling patch for uDAPL interface. Thanks for Nilesh Awate for the patch.
+
+    - Explicitly set some of the EP attributes when on demand connection is used
+      in uDAPL interface.
+
+MVAPICH2-1.2-RC1 (07/02/08)
+
 * Following features are added for this new mvapich2-1.2 release:

     - Based on MPICH2 1.0.7

From perkinjo at mvapich.cse.ohio-state.edu Mon Aug 25 17:07:36 2008
From: perkinjo at mvapich.cse.ohio-state.edu (perkinjo@mvapich.cse.ohio-state.edu)
Date: Mon Aug 25 17:07:51 2008
Subject: [mvapich-commit] r2955 - mvapich2/trunk/maint
Message-ID: <200808252107.m7PL7a56032538@mvapich.cse.ohio-state.edu>

Author: perkinjo
Date: 2008-08-25 17:07:35 -0400 (Mon, 25 Aug 2008)
New Revision: 2955

Modified:
   mvapich2/trunk/maint/Version
Log:
Fix syntax of version string

Modified: mvapich2/trunk/maint/Version
===================================================================
--- mvapich2/trunk/maint/Version	2008-08-25 13:52:19 UTC (rev 2954)
+++ mvapich2/trunk/maint/Version	2008-08-25 21:07:35 UTC (rev 2955)
@@ -1 +1 @@
-1.2rc2
+1.2.0rc2
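
Editorial note on r2937 (mpiexec_cr.c) above: the commit log does not state why bind() was failing. One common cause is that the port is still held by connections in TIME_WAIT from an earlier listener, and setting SO_REUSEADDR on the socket before bind() is the usual remedy. The sketch below shows that pattern in isolation. It is not the mpiexec_cr code: the helper name listen_reuseaddr, the backlog of 16, and binding to INADDR_ANY are assumptions made for illustration only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of mpiexec_cr.c): open a TCP listening
 * socket on `port` with SO_REUSEADDR set before bind().
 * Returns the socket descriptor, or -1 on failure. */
int listen_reuseaddr(int port)
{
    int on = 1;
    struct sockaddr_in sa;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* The option has to be set before bind() to influence it. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);   /* assumption: any interface */
    sa.sin_port = htons((unsigned short) port);

    if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) < 0 ||
        listen(fd, 16) < 0) {                 /* assumption: backlog of 16 */
        close(fd);
        return -1;
    }
    return fd;
}
```

Unlike the one-line change in the diff, this sketch checks the return value of setsockopt(), so a failure to set the option is reported instead of surfacing later as a bind() error. Passing port 0 asks the kernel to pick a free port, which is convenient for trying the helper out.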