<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7638.1">
<TITLE>RE: [mvapich-discuss] Failed to Initialize HCA type for mvapich2-0.9.8</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->
<BR>
<P><FONT SIZE=2>Hi Wei,<BR>
<BR>
Over ssh, ulimit -l was returning the default value of 32. After I added 'ulimit -l unlimited' to /etc/init.d/sshd itself, it works. Thanks a lot for the help.<BR>
<BR>
Prakashan<BR>
<BR>
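A minimal sketch of confirming that an init script already carries the line described above. The helper name has_memlock_line is hypothetical (not part of mvapich2 or OFED), and the script path is whatever the caller passes in.<BR>
<BR>
<PRE>
```shell
#!/bin/sh
# Hypothetical helper: succeed if the given init script contains an
# uncommented 'ulimit -l unlimited' line, fail otherwise.
has_memlock_line() {
    grep -q '^[[:space:]]*ulimit -l unlimited' "$1"
}
```
</PRE>
Usage, assuming the path from this thread: has_memlock_line /etc/init.d/sshd. A new limit is presumably inherited only by daemons started afterwards, so sshd and the mpd ring would likely need to be restarted before rechecking with 'ssh n11 ulimit -l'.<BR>
<BR>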
<BR>
-----Original Message-----<BR>
From: wei huang [<A HREF="mailto:huanwei@cse.ohio-state.edu">mailto:huanwei@cse.ohio-state.edu</A>]<BR>
Sent: Fri 3/2/2007 1:54 PM<BR>
To: Korambath, Prakashan<BR>
Cc: mvapich-discuss@cse.ohio-state.edu<BR>
Subject: Re: [mvapich-discuss] Failed to Initialize HCA type for mvapich2-0.9.8<BR>
<BR>
Hi Prakashan,<BR>
<BR>
Thanks for using mvapich2.<BR>
<BR>
This is pretty weird, because the locked-memory ulimit is typically the<BR>
reason for a "cannot create cq" failure. Could you make sure that ulimit -l<BR>
is unlimited on both nodes? It would also be good to verify with the following<BR>
commands (so that you see the limit that actually applies when you run the program):<BR>
<BR>
ssh n11 ulimit -l<BR>
ssh grid4 ulimit -l<BR>
<BR>
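An illustrative sketch of scripting the check above, assuming passwordless ssh to the hosts named in this thread. The function names check_memlock and check_hosts are hypothetical and not part of mvapich2 or OFED.<BR>
<BR>
<PRE>
```shell
#!/bin/sh
# Classify a value printed by `ulimit -l` (the locked-memory limit).
# Hypothetical helper, not part of mvapich2 or OFED.
check_memlock() {
    case "$1" in
        unlimited) echo "ok" ;;
        *)         echo "limited: $1" ;;
    esac
}

# Query each host over ssh, mirroring the `ssh n11 ulimit -l` check,
# and print one line per host. Assumes passwordless ssh is configured.
check_hosts() {
    for host in "$@"; do
        limit=$(ssh "$host" ulimit -l)
        echo "$host: $(check_memlock "$limit")"
    done
}
```
</PRE>
Usage: check_hosts n11 grid4. Any host reported as "limited" would be a candidate for the create cq failure.<BR>
<BR>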
Also, would you please verify on both machines that the port is active (for example, with ibstat)?<BR>
<BR>
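A small sketch of checking the port state from ibstat output, assuming the "State: Active" line format quoted later in this thread. The helper name port_states is hypothetical.<BR>
<BR>
<PRE>
```shell
#!/bin/sh
# Read `ibstat` output on stdin and print the logical state of each port.
# Matches lines whose first field is exactly "State:", so the
# "Physical state: LinkUp" line is skipped.
# Hypothetical helper; relies on the output format shown in this thread.
port_states() {
    awk '$1 == "State:" { print $2 }'
}
```
</PRE>
Usage, assuming ibstat is installed on the remote host: ssh grid4 ibstat | port_states. Each port that is up should print Active.<BR>
<BR>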
Finally, if all of them are fine, would you please make sure the IB-level<BR>
micro-benchmarks run successfully?<BR>
<BR>
Thanks.<BR>
<BR>
Regards,<BR>
Wei Huang<BR>
<BR>
774 Dreese Lab, 2015 Neil Ave,<BR>
Dept. of Computer Science and Engineering<BR>
Ohio State University<BR>
OH 43210<BR>
Tel: (614)292-8501<BR>
<BR>
<BR>
On Fri, 2 Mar 2007, Korambath, Prakashan wrote:<BR>
<BR>
> Hi,<BR>
> I just set up two nodes connected through an IB cable running Fedora<BR>
> Core6 OS kernel 2.6.19-1.2911.fc6 and OFED-1.1. ibstat and ibnodes<BR>
> outputs are below. I ran the make.mvapich2.gen2 file in order to create<BR>
> the MPI-related files. I am getting the following error when I run<BR>
> mpiexec. Could you please tell me what I am doing wrong? The<BR>
> configure is using --with-device=osu_ch3:mrail inside<BR>
> make.mvapich2.gen2. I don't know whether I have the wrong device or<BR>
> something. Also, ulimit -l shows unlimited. Thanks for your help.<BR>
><BR>
><BR>
> Prakashan Korambath<BR>
> UCLA<BR>
><BR>
> ------------------------------------------<BR>
><BR>
><BR>
><BR>
> -bash-3.1$ mpd &<BR>
> [1] 13652<BR>
> -bash-3.1$ !mpdboot<BR>
> mpdboot -n 2 -f hostfile<BR>
> [1]+ Done mpd<BR>
> -bash-3.1$ mpicc -o bones bones.c<BR>
> -bash-3.1$ which mpicc<BR>
> ~/mvapich2/bin/mpicc<BR>
> -bash-3.1$ mpiexec -n 2 ./bones<BR>
> cannot create cq<BR>
> Failed to Initialize HCA type<BR>
> Fatal error in MPI_Init: Other MPI error, error stack:<BR>
> MPIR_Init_thread(230): Initialization failed<BR>
> MPID_Init(81)........: channel initialization failed<BR>
> (unknown)(): Other MPI error<BR>
> rank 1 in job 1 grid4.ats.ucla.edu_33136 caused collective abort of all ranks<BR>
> exit status of rank 1: killed by signal 9<BR>
> -bash-3.1$<BR>
> -bash-3.1$ mpdtrace<BR>
> grid4<BR>
> n11<BR>
><BR>
><BR>
><BR>
> -----------------------<BR>
> [root@grid4 ~]# ibstat<BR>
> CA 'mthca0'<BR>
> CA type: MT25204<BR>
> Number of ports: 1<BR>
> Firmware version: 1.0.800<BR>
> Hardware version: a0<BR>
> Node GUID: 0x00066a0098007a39<BR>
> System image GUID: 0x00066a0098007a39<BR>
> Port 1:<BR>
> State: Active<BR>
> Physical state: LinkUp<BR>
> Rate: 20<BR>
> Base lid: 1<BR>
> LMC: 0<BR>
> SM lid: 2<BR>
> Capability mask: 0x02510a6a<BR>
> Port GUID: 0x00066a00a0007a39<BR>
> [root@grid4 ~]# ibnodes<BR>
> Ca : 0x00066a0098007a25 ports 1 "n11 HCA-1"<BR>
> Ca : 0x00066a0098007a39 ports 1 "grid4 HCA-1"<BR>
><BR>
<BR>
<BR>
</FONT>
</P>
</BODY>
</HTML>