From panda at cse.ohio-state.edu Mon Jun 5 23:23:40 2006
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon Jun 5 23:23:45 2006
Subject: [mvapich] Announcing the release of MVAPICH2 0.9.3 with multi-threading
In-Reply-To: from "James D Ballew" at Jun 01, 2006 03:47:51 PM
Message-ID: <200606060323.k563Ne6I025470@xi.cse.ohio-state.edu>

Dear Jim,

My apologies for not being able to reply to your e-mail earlier. I have
been swamped with multiple deadlines ...

> Noticed this in the MVAPICH 0.9.7 User guide:
>
> - (NEW) Support for Fault Tolerance
> - (NEW) Mem-to-mem reliable data transfer (detection of I/O bus error
>   with 32bit CRC)
> - Additional fault tolerance support (such as checkpoint restart,
>   automatic path migration (APM), etc.) will be introduced in
>   successive releases
>
> How far along are you on your support for Fault Tolerance in dual-rail
> networks.
> We are very interested in testing anything you have available and would
> be very interested in working with your group as a test site.

You must have come across the recent announcement of full support (both
detection and retransmission) for mem-to-mem reliable data transfer.

Since the current version of MVAPICH has separate single-rail and
multi-rail devices, most features are available on the single-rail side,
because that is the one many people have been using. We are gradually
migrating things to the multi-rail side. To bring all features to both
the single-rail and multi-rail sides, we are unifying the single-rail and
multi-rail designs starting with the next release of MVAPICH2 (i.e.,
single-rail will be one instance of multi-rail). After this, all other
features will be available on both sides (single-rail and multi-rail) on
the MVAPICH2 front.

With respect to fault-tolerance features like APM and checkpoint restart,
we have these working in the lab in prototype form. With more in-depth
testing, we plan to release these during the next 1-3 months.

Many thanks for offering your organization as a test site.
Once these features are completely available, we will definitely take you
up on your offer. In fact, for recent releases, we have been releasing
RCs (after some reasonable testing) through our SVN and announcing them
on the mvapich-discuss mailing list. If somebody from Raytheon is on this
mailing list, they should be getting these e-mails and can try out the
early RCs.

Overall, as you know, we are in a university environment. Our progress on
adding new features depends completely on the demand from the IB
community and the funding available to us. We are always trying to
achieve a balance between these two. If Raytheon is interested in some
critical features and would like to extend some funding to us, we will be
happy to put more resources in these directions, work closely with
Raytheon, and accelerate the addition of those features.

Please let us know your thoughts.

Best Regards,

DK

> Regards,
> Jim
>