From leonleon77 at gmail.com Tue Apr 15 03:00:26 2008
From: leonleon77 at gmail.com (leon zadorin)
Date: Tue Apr 15 03:00:37 2008
Subject: [mvapich-commit] dynamic process connections (accept/connect or MPI_Comm_join) and Infiniband...
Message-ID: <26d2cb010804150000r56b8ad62n4128d6c7a5dc170c@mail.gmail.com>

Hello everyone,

I am relatively new to the whole MPI/Infiniband scene, so my apologies
if some of the questions/thoughts of mine are naive...

I am currently experiencing difficulties with dynamic process
connections (MPI_Comm_join) between 2 hosts (each with Infiniband and
Ethernet card).

The setup is:

2 hosts, each with Ethernet (Gigabit) card and with Infiniband card
(PCI-e), running Linux 32 bits.

Hosts are connected via Infiniband switch (w.r.t. Infiniband cards) and
via Ethernet/IP network (w.r.t. Ethernet cards).

mvapich2 has been made with "make.make.mvapich2.ofa"

mpdboot has been executed and mpd daemons are running on both hosts

I would like to know if it is currently possible to achieve the
following:

1) start 1 app on 1 host (without using mpirun);
2) then later, after some time, start another app on 2nd host (without
using mpirun);
3) make the app in step 2 automatically connect to the app started in
step 1

I was able to achieve the above when running with mpich2 library, using
sock channels and only when using 'MPI_Comm_join' call (using
MPI_Publish_name, etc. did not work when starting apps without mpirun
[even with all mpds being active]).

However, the MPI_Comm_join tactic fails when attempting to use mvapich2
(mvapich2-1.0-2008-04-10) over Infiniband...
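For illustration only, the three steps above could be wired up roughly as follows. This is a minimal sketch and not the code used in the tests described in this message: the helper names (open_listener, connection_pending, connect_to, join_over_socket) and the WITH_MPI macro are hypothetical, and only the POSIX socket calls and MPI_Comm_join itself are real APIs. Without WITH_MPI defined, the file builds with plain POSIX headers and covers the socket handshake only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Step 1 (hypothetical helper): open a listening TCP socket on all
 * interfaces. If *port is 0 the kernel picks a free port and *port is
 * updated. Returns the listening fd, or -1 on error. */
int open_listener(unsigned short *port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(*port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 1) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Poll the listening fd so the server can keep doing other work.
 * Returns 1 if a connection is pending, 0 on timeout, -1 on error. */
int connection_pending(int listen_fd, int timeout_ms)
{
    struct pollfd pfd;
    pfd.fd = listen_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Steps 2 and 3 (hypothetical helper): the later-started app connects
 * to the first one. Returns the connected fd, or -1 on error. */
int connect_to(const char *ipv4, unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

#ifdef WITH_MPI
#include <mpi.h>
/* Both apps call this on their connected fd, after MPI_Init.
 * MPI_Comm_join is the real MPI-2 call; this wrapper is hypothetical. */
int join_over_socket(int fd, MPI_Comm *intercomm)
{
    return MPI_Comm_join(fd, intercomm);
}
#endif
```

In this sketch the first app would call open_listener, check connection_pending from its main loop, then accept() and pass the accepted fd to join_over_socket; the second app would call connect_to and pass that fd to join_over_socket. Whether the join then succeeds depends on the channel the MPI library was built with, which is the question raised here.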

I wonder if the following has something to do with it:

http://lists.openfabrics.org/pipermail/commits/2006-January/004707.html
"
--------------------------------------------------------------------------------
-                            Known Deficiencies
--------------------------------------------------------------------------------
-
-- The sock channel is the only channel that implements dynamic process support
-  (i.e., MPI_COMM_SPAWN, MPI_COMM_CONNECT, MPI_COMM_ACCEPT, etc.). All other
-  channels will experience failures for tests exercising dynamic process
-  functionality.
"

and in
http://lists.openfabrics.org/pipermail/commits/2006-May/007209.html
we have:
"
-- MPI_COMM_JOIN has been implemented; although like the other dynamic process
-  routines, it is only supported by the Sock channel.
"

Given that the above quotes mention both MPI_Comm_join and
MPI_Comm_connect ... is there any way at all to currently achieve the
above 3 steps when using Infiniband cards (and maybe having Ethernet
cards on all of the hosts as well)?

I would imagine that, albeit theoretically, it is plausible to use the
sock channel to 'bootstrap' the Infiniband channel?

http://www.mpi-forum.org/docs/mpi-20-html/node115.htm
"
MPI uses the socket to bootstrap creation of the intercommunicator,
and for nothing else.
"

Perhaps I need to build mvapich2 not via "make.make.mvapich2.ofa" but
something else, so that both socket and Infiniband channels are
supported?

Of course the same aforementioned link
(http://www.mpi-forum.org/docs/mpi-20-html/node115.htm) says:
"
Advice to users.
An MPI implementation may require a specific communication medium for
MPI communication, such as a shared memory segment or a special switch.
In this case, it may not be possible for two processes to successfully
join even if there is a socket connecting them and they are using the
same MPI implementation. (End of advice to users.)
"

If this is the case here and there is no way to use MPI_Comm_join to
achieve the originally described 3 steps (connecting apps started at
different times and without the use of mpirun) - is that then at all
possible by other means (e.g. using MPI's open port, publish name,
lookup name, accept/connect calls)? Are the limitations purely
theoretical or more of a practical nature?

Ideally, for async. server design purposes and, given that
MPI_Comm_accept is blocking and there is no 'test'/'poll' for it, it
would be good to be able to use the sockets channel to coordinate
Infiniband channel bootstrapping with MPI_Comm_join (even if
MPI_Comm_join in itself is blocking, at least one can 'poll' the TCP
socket's fd before calling 'accept' and subsequently MPI_Comm_join)...

If mvapich2 is unable to provide dynamic process connectivity over
Infiniband... are there any other libs that could do that?

Kind regards
Leon.

From chail at mvapich.cse.ohio-state.edu Thu Apr 24 14:28:08 2008
From: chail at mvapich.cse.ohio-state.edu (chail@mvapich.cse.ohio-state.edu)
Date: Thu Apr 24 14:28:20 2008
Subject: [mvapich-commit] r2410 - mvapich2/trunk/src/mpid/osu_ch3/channels/mrail/src/gen2
Message-ID: <200804241828.m3OIS8QM018097@mvapich.cse.ohio-state.edu>

Author: chail
Date: 2008-04-24 14:28:06 -0400 (Thu, 24 Apr 2008)
New Revision: 2410

Modified:
   mvapich2/trunk/src/mpid/osu_ch3/channels/mrail/src/gen2/rdma_iba_priv.c
Log:
Exchanging hostid in MPI_Ring_exchange so that it doesn't need to be
done in smpi_exchange_info and this improves startup performance.

Modified: mvapich2/trunk/src/mpid/osu_ch3/channels/mrail/src/gen2/rdma_iba_priv.c
===================================================================
--- mvapich2/trunk/src/mpid/osu_ch3/channels/mrail/src/gen2/rdma_iba_priv.c 2008-04-23 19:26:58 UTC (rev 2409)
+++ mvapich2/trunk/src/mpid/osu_ch3/channels/mrail/src/gen2/rdma_iba_priv.c 2008-04-24 18:28:06 UTC (rev 2410)
@@ -1590,10 +1590,8 @@
                 recv_packet->host_id;
 
 #ifdef _SMP_
-            if (SMP_INIT) {
                 MPIDI_PG_Get_vc(cached_pg, recv_packet->rank, &vc);
                 vc->smp.hostid = recv_packet->host_id;
-            }
 #endif
 
             rdma_iba_addr_table.qp_num_rdma[recv_packet->rank][rail_index] =

From chail at mvapich.cse.ohio-state.edu Thu Apr 24 14:32:12 2008
From: chail at mvapich.cse.ohio-state.edu (chail@mvapich.cse.ohio-state.edu)
Date: Thu Apr 24 14:32:23 2008
Subject: [mvapich-commit] r2411 - mvapich2/branches/1.0/src/mpid/osu_ch3/channels/mrail/src/gen2
Message-ID: <200804241832.m3OIWCbk018114@mvapich.cse.ohio-state.edu>

Author: chail
Date: 2008-04-24 14:32:12 -0400 (Thu, 24 Apr 2008)
New Revision: 2411

Modified:
   mvapich2/branches/1.0/src/mpid/osu_ch3/channels/mrail/src/gen2/rdma_iba_priv.c
Log:
Exchanging hostid in MPI_Ring_exchange so that it doesn't need to be
done in smpi_exchange_info and this improves startup performance.

Modified: mvapich2/branches/1.0/src/mpid/osu_ch3/channels/mrail/src/gen2/rdma_iba_priv.c
===================================================================
--- mvapich2/branches/1.0/src/mpid/osu_ch3/channels/mrail/src/gen2/rdma_iba_priv.c 2008-04-24 18:28:06 UTC (rev 2410)
+++ mvapich2/branches/1.0/src/mpid/osu_ch3/channels/mrail/src/gen2/rdma_iba_priv.c 2008-04-24 18:32:12 UTC (rev 2411)
@@ -1590,10 +1590,8 @@
                 recv_packet->host_id;
 
 #ifdef _SMP_
-            if (SMP_INIT) {
                 MPIDI_PG_Get_vc(cached_pg, recv_packet->rank, &vc);
                 vc->smp.hostid = recv_packet->host_id;
-            }
 #endif
 
             rdma_iba_addr_table.qp_num_rdma[recv_packet->rank][rail_index] =