I installed Torque and MPICH2 on a cluster. I can run
jobs using MPICH2 directly. Before running a job, I start the
daemons (mpd) on all the machines with mpdboot, and then run the job.

The problem is that I cannot do the same through
Torque, because when more than one job from the queue is
running, some of the jobs die.

I think that the mpd daemon of one job kills the mpd daemon of another.

I am using the following script to run jobs in the queue:


#PBS -l nodes=3
#PBS -j eo
#PBS -m bae
#PBS -M fokerman at yahoo.com

NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')

echo "nodes ($NP cpu total):"
sort $PBS_NODEFILE | uniq


cat $PBS_NODEFILE > mpd.hosts

mpdboot -n $NP

mpirun -machinefile $PBS_NODEFILE -np $NP


exit 0
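
One detail of the script above that may be worth checking: NP counts every line of $PBS_NODEFILE, while mpdboot -n, as far as I understand it, expects the number of mpds (hosts) to start. With nodes=3 the two numbers are the same, but they would differ if several CPUs per node were requested. Below is a minimal sketch of counting both separately. The helper names count_slots and count_hosts are hypothetical (they are not part of Torque or MPICH2), and the node file is made-up demo data standing in for $PBS_NODEFILE.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Torque or MPICH2.

# Total number of non-empty entries (CPU slots) in a node file.
count_slots() {
    grep -c . "$1"
}

# Number of distinct hosts in a node file.
count_hosts() {
    sort -u "$1" | grep -c .
}

# Demo with a made-up node file; inside a real job the argument
# would be "$PBS_NODEFILE" instead.
demo_nodefile=$(mktemp)
printf 'node1\nnode1\nnode2\nnode3\n' > "$demo_nodefile"

echo "slots: $(count_slots "$demo_nodefile")"
echo "hosts: $(count_hosts "$demo_nodefile")"

rm -f "$demo_nodefile"
```

In the job script, the host count would then presumably go to mpdboot -n and the slot count to mpirun -np.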


Thanks a lot!





