如果找到了对您有用的资料,烦请点击右手边的Google广告支持我继续共享知识,谢谢! http://dengpeng.spaces.live.com/

2007年5月3日星期四

Opensource: Parallel Matrix Multiply

Requirements are here: http://www.cs.mu.oz.au/678/assignment2.html or http://www.csse.unimelb.edu.au/678/assignment2.html

 

/*
Name: 433-678 Cluster and Grid Computing
Student Number: 263497
Author: Peng Deng
Login Name: pdeng
Date: 18-04-07 07:50
*/

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
int Number_Of_Nodes, My_Rank, Source=0;

int Number_of_Elements = 0 ;
int Number_of_Rows = 0;
int Remainder;
int Remaind_Elements;

int i, j, k;

double StartTime, EndTime;

char Processor_Name[MPI_MAX_PROCESSOR_NAME];
int NameLength;

int * matrixA; //pointer to matrix A
int * matrixB; //pointer to matrix B
int * matrixC; //pointer to matrix C
int * matrixTemp;
int * matrixTempResult;

int * Data_Counts;
int * Data_Displs;

int size;
size=atoi(argv[1]); // The first argument of the program receives the size of matrices
/* NOTE: The MPICH NT 1.2.5 implementation is different from Linux distributation.*/
/* ONLY Rank = 0 can get the values from command line arguments in Linux distributation. But in mpich nt, every node has this value */

MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &Number_Of_Nodes);
MPI_Comm_rank(MPI_COMM_WORLD, &My_Rank);
MPI_Get_processor_name(Processor_Name,&NameLength);

MPI_Bcast (&size, 1,MPI_INTEGER,Source,MPI_COMM_WORLD); // In The MPICH NT 1.2.5, this line is not necessary

matrixA = (int *) malloc(size * size * sizeof(int)); // this matrix will be scatter to every nodes
matrixB = (int *) malloc(size * size * sizeof(int)); // This matrix will be broadcast to every nodes
matrixC = (int *) malloc(size * size * sizeof(int)); // this matrix will be gather to My_Rank==0 from every nodes

Data_Counts = (int *) malloc(Number_Of_Nodes * sizeof(int)); //int array. Every element in this array contains a value describe how many elements per node.
Data_Displs = (int *) malloc(Number_Of_Nodes * sizeof(int)); //int array. Every element in this array contains a value describe the offset related to matrixA

if(My_Rank == 0)
{
/* fill random number into matrix */
for (k=0; k<size*size; k++)
{
matrixA[k]=rand()/1000;
matrixB[k]=rand()/1000;
}
}

/* check the size and number of nodes in MPI_COMM_WORLD */
if(size % Number_Of_Nodes ==0)
{
Number_of_Rows = size / Number_Of_Nodes; // How many rows per node
Number_of_Elements = size * Number_of_Rows; // Length multiply width. How many elements per node
matrixTemp = (int *) malloc(Number_of_Elements * sizeof(int)); //Store elements in temp array from matrix A
matrixTempResult = (int *) malloc(Number_of_Elements * sizeof(int)); //Store result to temp array

for(i=0; i<Number_Of_Nodes; i++)
{
Data_Counts[i]=Number_of_Elements; // How many elements in the i node
Data_Displs[i]=i * Number_of_Elements; // The start offset of data which will be send to node i related to matrixA(send buffer)
}
}else{
/* Last node takes all remain rows */
/*
Number_of_Rows = size / Number_Of_Nodes; // How many rows per node on average. For example, 8/3=2; 5/2=2; ......
Number_of_Elements = size * Number_of_Rows; // Length multiply width. How many elements per node on average
Remainder = size % Number_Of_Nodes; // How many rows remains. For example, 8%3=2; 5%2=1; ......
Remaind_Elements = size * Remainder; // How many elements remains
matrixTemp = (int *) malloc((Number_of_Elements + Remaind_Elements)* sizeof(int)); //Store elements from matrix A
matrixTempResult = (int *) malloc((Number_of_Elements + Remaind_Elements)* sizeof(int)); //Store result to temp array
for(j=0; j<Number_Of_Nodes-1; j++) // Fill properties of Number_Of_Nodes-1 nodes into int array.
{
Data_Counts[j]=Number_of_Elements; // How many elements in j node, j starts from 0 to Number_Of_Nodes-1. The last node is reserved.
Data_Displs[j]=j * Number_of_Elements; // The start offset of data which will be send to node i related to matrixA(send buffer)
}
//Deal with The last node
Data_Counts[Number_Of_Nodes-1]=Remaind_Elements + Number_of_Elements; // The last node take more workload to do -- Remaind_Elements
Data_Displs[Number_Of_Nodes-1]=(Number_Of_Nodes-1) * Number_of_Elements;
*/


/* The difference number of rows to process on every node is restrict to 1 */
Number_of_Rows = size / Number_Of_Nodes; // How many rows per node on average. For example, 8/3=2; 5/2=2; ......
Number_of_Elements = size * Number_of_Rows; // Length multiply width. How many elements per node on average

Remainder = size % Number_Of_Nodes; // How many rows remains. For example, 8%3=2; 5%2=1; ......
Remaind_Elements = size * Remainder; // How many elements remains

matrixTemp = (int *) malloc((Number_of_Elements + size)* sizeof(int)); //Store elements from matrix A
matrixTempResult = (int *) malloc((Number_of_Elements + size)* sizeof(int)); //Store result to temp array

for(j=0; j<Remainder; j++) // Fill properties of first Number_Of_Nodes nodes into int array.
{
Data_Counts[j]=Number_of_Elements + size; // How many elements in j node, j starts from 0 to Remainder. Every node takes one more row from Remainder.
Data_Displs[j]=j * (Number_of_Elements + size); // The start offset of data which will be send to node i related to matrixA(send buffer)
}

for(i=Remainder; i<Number_Of_Nodes; i++) // Fill properties of latter nodes into int array.
{
Data_Counts[i]=Number_of_Elements; // How many elements in i node, i starts from Remainder to Number_Of_Nodes. These nodes take one less row compare to previous nodes.
Data_Displs[i]=Remainder * (Number_of_Elements + size) + (i - Remainder) * Number_of_Elements; // The start offset of data which will be send to node i related to matrixA(send buffer)
}
}

MPI_Barrier(MPI_COMM_WORLD); // Wait untill all nodes reach this point

StartTime=MPI_Wtime(); // Start time recorded

/* Broadcast matrix B */
MPI_Bcast (matrixB, size*size,MPI_INTEGER,Source,MPI_COMM_WORLD);

/* Scatter matrixA to nodes */
MPI_Scatterv(matrixA, Data_Counts, Data_Displs, MPI_INTEGER, matrixTemp, Data_Counts[My_Rank], MPI_INTEGER, Source, MPI_COMM_WORLD);

/* Do computation matrixTempResult = matrixTemp * matrixB */
for(i=0; i<Data_Counts[My_Rank]/size; i++)
{
for(j=0; j<size; j++)
{
matrixTempResult[size*i+j]=0;
for(k=0; k<size; k++)
{
matrixTempResult[size*i+j] += matrixTemp[i*size+k] * matrixB[k*size+j];
}
}
}
/* Gather matrixTempResult from all nodes to matrixC on source node */
MPI_Gatherv(matrixTempResult, Data_Counts[My_Rank], MPI_INTEGER, matrixC, Data_Counts, Data_Displs, MPI_INTEGER, Source, MPI_COMM_WORLD);

EndTime=MPI_Wtime(); // End time recorded

if(My_Rank==0){
printf("Time: %f\n",EndTime-StartTime); // Print out th time used in the communication and computation
}

/* Free the memory allocations */
free(matrixA);
free(matrixB);
free(matrixC);

MPI_Finalize();
return 0;
}

没有评论: