Friday, 15 January 2010

c - MPI_Comm_spawn error: MPI Application rank 0 killed before MPI_Finalize() with signal 11




I'm trying to run an MPI program using MPI_Comm_spawn. I spawn 1 worker program and call MPI_Reduce in both programs to add the results together. For some reason, the application hangs at MPI_Comm_spawn, then aborts after a minute. The spawned process only gets to the code segment that calls MPI_Reduce after this happens. The application continues to hang and prints more errors in the command prompt. What should happen is that both the spawned and the master programs reach the MPI_Reduce call, and the master program gets the sum and outputs it.

Here's the output. I've put <> in front of MPI's output to set it apart from my own:

    world size = 1
    about to call MPI_Comm_spawn with 2 workers...
    parent result 3.141668952
    numdarts child: 500000000
    argv[1] = 500000000
    <>MPI Application rank 0 killed before MPI_Finalize() with signal 11
    spawned process got result: 3.141668952
    spawned process send message parent
    <>piworker: rank 1:0: MPI_Finalize: ibv connection 0 on card 0 broken
    <>piworker: rank 1:0: MPI_Finalize: ibv_poll_cq(): bad status 12
    <>piworker: rank 1:0: MPI_Finalize: self n93 peer n93 (rank: 0)
    <>piworker: rank 1:0: MPI_Finalize: error message: transport retry exceeded error

Here's the master program's code:

    #include "mpi.h"
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include "globals.h"

    int randsign();
    double randfloat();
    double dboard();

    int main(int argc, char *argv[])
    {
        int world_size, flag;
        MPI_Comm everyone; /* intercommunicator */
        char worker_program[100];
        int universe_size;

        // MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &universe_size, &flag);
        // printf("universe size: %i\n", universe_size);
        int numdarts = 1000000000;
        int numworkers = 2;
        char* args[1];

        if (argc >= 2) {
            numworkers = atoi(argv[1]);
        }
        if (argc >= 3)
            numdarts = atoi(argv[2]);

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);
        printf("world size = %i\n", world_size);
        if (world_size != 1)
            printf("top heavy management\n");

        int numdartsworker = numdarts/numworkers;
        // the master computes the leftover
        int numdartsmaster = numdarts/numworkers + (numdarts % numworkers);

        args[0] = malloc(256 * sizeof(char));
        sprintf(args[0], "%i", numdartsworker);
        printf("argument passing workers: %s\n", args[0]);

        /*
         * Now spawn the workers. Note that there is a run-time determination
         * of what type of worker to spawn, and presumably this calculation must
         * be done at run time and cannot be calculated before starting
         * the program. If everything is known when the application is
         * first started, it is generally better to start them all at once
         * in a single MPI_COMM_WORLD.
         */
        printf("about to call MPI_Comm_spawn with %i workers...\n", numworkers);
        int resultlen = 0;
        double myresult = dboard(numdartsmaster);
        printf("parent result %.9f\n", myresult);

        // the master counts as a worker, hence the -1
        MPI_Comm_spawn("piworker", args, numworkers-1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &everyone, MPI_ERRCODES_IGNORE);

        double pisum = 24;
        int rc = MPI_Reduce(&myresult, &pisum, 1, MPI_DOUBLE, MPI_SUM, 0, everyone);
        if (rc != MPI_SUCCESS)
            printf("failure on mpi_reduce\n");
        free(args);

        /*
         * Parallel code here. The communicator "everyone" can be used
         * to communicate with the spawned processes, which have ranks 0,..
         * MPI_UNIVERSE_SIZE-1 in the remote group of the intercommunicator
         * "everyone".
         */

        // receive results
        int i=1;
        MPI_Status status;
        double avgpi = pisum/(double)numworkers;
        printf("with %i workers, %i darts, estimated value of pi is: %.9f\n",
               numworkers, numdarts, avgpi);

        MPI_Finalize();
        return 0;
    }

And the code for the worker (spawned) program:

    int main(int argc, char *argv[])
    {
        int size;
        MPI_Comm parent;

        MPI_Init(&argc, &argv);
        MPI_Comm_get_parent(&parent);
        if (parent == MPI_COMM_NULL)
            printf("no parent!");

        int taskid;
        MPI_Comm_remote_size(parent, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
        double pisum = 0;
        int resultlen = 0;
        char parentname[256];
        int numdarts;

        if (size != 1) {
            printf("something's wrong parent");
            return 1;
        }

        /*
         * Parallel code here.
         * The manager is represented as the process with rank 0 in (the remote
         * group of) the parent communicator. If the workers need to communicate
         * among themselves, they can use MPI_COMM_WORLD.
         */
        if (argc >= 2)
            numdarts = atoi(argv[1]);
        else {
            printf("error for: %i, number of darts not specified.\n", taskid);
        }
        printf("numdarts child: %i\n", numdarts);
        printf("argv[1] = %s\n", argv[1]);

        double mypisum = dboard(numdarts);
        printf("spawned process got result: %.9f\n", mypisum);
        printf("spawned process send message parent\n");

        // MPI_Send((void *)&mypisum, 1, MPI_DOUBLE, 0, 1, parent);
        int rc = MPI_Reduce(&mypisum, &pisum, 1, MPI_DOUBLE, MPI_SUM, 0, parent);
        if (rc != MPI_SUCCESS)
            printf("%d: problem mpi_reduce\n");
        printf("sent message parent");

        MPI_Finalize();
        return 0;
    }

Hopefully the cause of this is more apparent to someone with more experience. I've been trying all sorts of things, which is why there are so many printf calls.

The problem is that the master process dies because of a wrong usage of free():

    char* args[1];
    ...
    args[0] = malloc(256 * sizeof(char));
    ...
    free(args);

You are trying to free non-heap (stack) memory, and free(args) triggers an abort in modern glibc versions. The right invocation should be:

    free(args[0]);
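The ownership rule behind that fix can be sketched without MPI: only pointers returned by malloc go to free, while the array holding them lives on the stack. The helper names build_worker_args and release_worker_args are hypothetical and not part of the original programs. The second array slot is an addition too: in C, the argv passed to MPI_Comm_spawn must be NULL-terminated, which the one-element array in the question does not provide.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: fills a caller-owned (stack) array with one
 * heap-allocated argument string followed by the terminating NULL.
 * Returns 0 on success, -1 if malloc fails. */
int build_worker_args(char *args[2], int numdarts)
{
    assert(args != NULL);
    args[0] = malloc(256);
    if (args[0] == NULL)
        return -1;
    snprintf(args[0], 256, "%i", numdarts);
    args[1] = NULL; /* terminator expected by MPI_Comm_spawn */
    return 0;
}

/* Frees only what malloc returned; the array itself is not heap memory. */
void release_worker_args(char *args[2])
{
    assert(args != NULL);
    free(args[0]);
    args[0] = NULL;
}
```

In the master this would amount to declaring char *args[2], calling build_worker_args(args, numdartsworker) before MPI_Comm_spawn, and calling release_worker_args(args) afterwards.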

Other than that, MPI_Reduce does not work the way you expect it to when called on an intercommunicator. You must change the master code so that it passes MPI_ROOT as the root argument to MPI_Reduce, and you have to add the master's value manually, since it is not used during the reduction (only the values from the processes in the remote group are reduced).
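A minimal sketch of that change on the master side, assuming the same variable meanings as in the question (numworkers counts the master as one worker). The function names combine_estimates and reduce_on_master and the HAVE_MPI macro are hypothetical; the macro only exists so the arithmetic part can be compiled and checked without an MPI installation.

```c
#ifdef HAVE_MPI /* hypothetical macro: define it when compiling with an MPI wrapper */
#include <mpi.h>
#endif

/* Adds the master's own estimate to the sum reduced from the workers
 * and averages over all participants (master included). */
double combine_estimates(double master_result, double remote_sum, int numworkers)
{
    return (master_result + remote_sum) / (double)numworkers;
}

#ifdef HAVE_MPI
/* Master side: "everyone" is the intercommunicator from MPI_Comm_spawn. */
double reduce_on_master(MPI_Comm everyone, double master_result, int numworkers)
{
    double remote_sum = 0.0;

    /* MPI_ROOT marks this process as the receiver of the reduction.
     * The send buffer is not significant at the root of an
     * intercommunicator reduce, so master_result is NOT part of
     * remote_sum and has to be added afterwards. */
    MPI_Reduce(&master_result, &remote_sum, 1, MPI_DOUBLE, MPI_SUM,
               MPI_ROOT, everyone);

    return combine_estimates(master_result, remote_sum, numworkers);
}
#endif
```

The workers can keep their existing call with root 0 on the parent communicator, because on their side the root argument names the rank of the receiving process in the remote (master) group.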

c mpi
