如果找到了对您有用的资料,烦请点击右手边的Google广告支持我继续共享知识,谢谢! http://dengpeng.spaces.live.com/

2007年6月15日星期五

A Chinese man throws bottles at Ex-Taiwanese President

李登辉这个狗贼,人人得而诛之!

2007年6月13日星期三

433-678 Cluster and Grid Computing Model Paper

433-678 Cluster and Grid Computing

Model Paper

Question 1:

A) Discuss the major trends in computing that have led to the emergence of Cluster computing. [4]

More and more computing power is required in growing scientific research and industry applications, such as weather forecasting and earthquake analysis, etc. Such tasks often require huge amount of computing, and thus needs a more powerful processor to finish the job efficiently. To finish the job faster, we need more powerful processor. But now, the processing power of a single processor has reached its saturation point. Even if not, the price performance ratio of developing a new more powerful CPU is very big. One way to overcome this limitation is to link together two or more computers to jointly solve the computational problem. In this way, we divide the task into parts, and distribute them among those linked computers, make them work in parallel. Therefore, the overall time used to finish the task will be greatly reduced. These linked computers constitute a cluster.

The trend in parallel computing is to move away from specialized traditional supercomputing platform to cheaper, general purpose systems consisting of loosely coupled components built up from PCs or workstations. The following points describe the advantages of cluster over specialized computers:

l Individual workstations are becoming increasing powerful.

l Communication bandwidth between workstations is increase while latency is decreasing.

l Clusters are easier to integrate into existing networks than specialized parallel computers.

l Typical low user utilization of personal workstations.

l Cluster has many standards on development tools, but specialized computer do not

l Clusters can be easily grown; node’s capability can be easily increased by adding memory or additional processors.

Because of these advantages and requirements, cluster is getting more and more popular.

B) Describe the design issues and the architecture of Cluster computing systems. [6]

Cluster design issues:

• Enhanced Performance (performance @ low cost)

The aim of designing a cluster is to develop a system with better performance but at a relatively low cost, which is known as performance/cost ratio.

• Enhanced Availability (failure management)

When designing a cluster, one must take the failure issue into account. Faults, errors, failures can not be avoided from a system. A cluster with high availability must has a robust mechanism to be able to recovery from failure. The fault tolerance characteristic must be considered when designing a cluser.

• Single System Image (look-and-feel of one system)

A cluster should hide the complexity of the lower level architecture by providing the users a single unified abstraction of resource. Designing this way allows the user to use a cluster easily and effectively without the knowledge of the underlying system architecture.

• Size Scalability (physical & application)

A cluster should have great scalability no matter in physical architecture or in applications size. As the cluster is expanded, the performance of the cluster should also be scaled up. Ideally, if the rate of increase in the system size and the rate of increase in the application size are the same, the performance should remain the same as what it is before scale up. But no existing system has accomplished that goal, but wee-designed systems approximate it.

• Fast Communication (networks & protocols)

The network is the most critical part of a cluster. Its capabilities and performance directly influences the applicability of the whole system for high performance computing. Besides choosing high speed networks, a simple communication protocol is desired. In cluster, communications between nodes (constitute the cluster) are done by passing messages. A high level protocol will reduce the efficiency of the underlying message passing. So we need a lower level communication protocol for a lightweight messaging system. Because, the higher level the protocol is, the longer time needed for extracting the message from the physical message.

• Load Balancing (CPU, Net, Memory, Disk)

The cluster should be able to handle load balancing issues, that is, given a task, the system can partition it into parts so that each nodes is assigned work in proportion to its performance, no nodes in the cluster will be overloaded. This load-balancing requirement not only means the balance of workload, but also means the balance of network traffic load across the cluster.

• Security and Encryption (clusters of clusters)

Security is a must-think-of issue in cluster system design, especially for a cluster of cluster. The communication channels between local safe cluster is unsafe. To ensure the safety of these channels , one often builds up secured channels between clusters. However, secured channel often involves encryption overheads. Thus, designer should make a tradeoff between performance lost and security.

• Distributed Environment (Social issues)

• Manageability (admin. And control)

Building a cluster is not simply connecting a set of PC. Cluster is a complicated system, so administration system of a cluster is very important.

• Programmability (simple API if required)

Cluster should provides a set of simple APIs if required. This approach will increase the programmability of the cluster systems since the programmer will have more time to spend in optimizing the application itself, rather than on low-level details of the underlying programming system.

• Applicability (cluster-aware and non-aware app.)

The typical architecture of a cluster is shown in figure below:

 

A cluster is type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers working together as a single integrated computing resource.

The cluster architecture can be classified into three layers:

l Hardware fabric layer:

The hardware layer includes multiple stand-alone PCs or workstations interconnected by high speed network/Switch. Each node is running its own OS. (PC/Workstation in the Figure). The network interface hardware within a single PC/Workstation acts as a communication processor and is responsible for transmitting and receiving packets of data between cluster nodes via a network/switch. Communication software offers a means of fast and reliable data communication among cluster nodes and to the outside world.

l Middleware layer:

Middleware layer is the most important layer. It is denoted as Cluster Middleware in the figure. It is responsible for offering an illusion of a unified system image (single system image) and availability out of a collection on independent but interconnected computers to higher layers and hides complexity of lower layer (hardware layer). With the help of middleware, nodes in cluster can work cooperatively as a single integrated computing resource.

l Application Layer:

Contains applications both in sequential and parallel forms, which utilize the power of cluster to finish certain tasks. Parallel programming environments supports parallel application which can be regarded as a middleware between parallel application and underlying cluster. It offers portable, efficient, standard and easy to use tools to develop and run applications.

Question 2:

A) What is a Single System Image (SSI) ? Describe different SSI services that cluster middleware need to support. [5]

Single System Image is the property of a system that hides the heterogeneous and distributed nature of the available resources and presents them to users and applications as a single unified computing resource. SSI provides benefits below:

l Use of system resources transparent.

l Transparent process migration and load balancing across nodes.

l Improved reliability and higher availability.

l Improved system response time and performance

l Simplified system management.

l Reduction in the risk of operator errors.

l No need to be aware of the underlying system architecture to use these machines effectively.

SSI needs to support services below:

l Single Entry Point: User can connect to cluster as a virtual host, although the virtual host includes multiple stand-alone computer nodes. The SSI transparently distributes the user request to each node to balance the load.

l Single User Interface: For easy to use and ease to manage, the SSI should provide a single GUI window to user. User can manage all nodes like managing a single computer.

l Single Process Space: Process has a cluster wide unique ID. A process can create child process on any other node. A process can communicate with any other process. Cluster should support globalize process management.

l Single Memory Space: Users have an illusion of a big and centralized main memory which reality is distributed on stand-alone nodes.

l Single I/O Space: This allows node to do I/O operations on other node.

l Single File Hierarchy: From users’ view, cluster provides a single hierarchical file system rather than distributed file on different nodes.

l Single Control Point: The entire cluster and each individual node can be configured through one GUI interface.

l Single Virtual Networking: Any node can access any network connected to the cluster, no matter which node the connection is connected to.

l Single Job Management System: Under a global job scheduler a user job can be submitted from any node and dispatched to any node to run.

l Check pointing and process migration: Check pointing is a software mechanism to periodically save process state and intermediate results in memory or disks. This allows the rollback after a failure. Process migration is needed in dynamic load balancing among the cluster nodes.

B) Discuss SSI architecture of implementing at Operation System and Tool levels with a suitable example. [5]

SSI can be implemented on Operating System Kernel Level. Offers full SSI, but expensive to develop and maintain due to limited market share. In the kernel-level approach, unless all components are specifically developed to support SSI, it cannot be used or released to the market. Due to this, the kernel-level approach appears as a risky and economically nonviable approach.

Cluster operating systems support an efficient execution of parallel applications in an environment shared with sequential applications. A goal is to pool resources in a cluster to provide better performance for both sequential and parallel applications. To realize this goal, the operating system must support gang scheduling of parallel programs, identify idle resources in the system (such as processors, memory, and networks), and offer globalize access to them. It should optimally support process migration to provide dynamic load balancing as well as fast inter process communication for both the system- and user-level applications.

Cluster OS can be built as a layer on top of existing OS or at Kernel level.

The first method is the approach followed by GLUnix from Berkeley. This strategy makes the system easily portable and reduces development time. GLUnix is an OS layer designed to provide support for transparent remote execution, interactive parallel and sequential jobs, load balancing, and backward compatibility for existing application binaries.

The second method need to modify existing kernel, but provide more transparent to user.

MOSIX is another software package specifically designed to enhance the Linux kernel with cluster-computing capabilities. The core of MOSIX is adaptive load balancing, memory ushering, and file I/O optimization algorithms that respond to variations in the use of the cluster resources. In such cases, MOSIX initiates process migration from one node to another to balance the load, move a process to a node that has sufficient free memory, or reduce the number of remote file I/O operations.

C) Discuss SSI architecture of implementing at Hardware levels with a suitable example. <Extended Question>

SSI can be implemented at hardware level. Hardware level implementation offers the highest level of transparency, but it has rigid architecture which is not flexible while extending or enhancing the system.

In detail, SSI can implement on memory and CPU:

l Memory:

n Boundary: memory space

n Importance: better communication and synchronization

n Examples: SCI (Scalable Coherent Interface), Stanford DASH

l CPU:

n Boundary: memory and I/O device space

n Importance: lower overhead cluster I/O

n Examples: SCI, SMP techniques

D) Discuss SSI architecture of implementing at Application level with a suitable example. <Extended Question>

Applications can also support SSI. The application-level SSI is the highest and, in a sense, most important because this is what the end user sees. At this level, multiple cooperative components of an application are presented to the user as a single application. For instances, the Linux Virtual Server (LVS) directs network connections to the different servers according to scheduling algorithms and makes parallel services of the cluster to appear as a virtual service on a single IP address. Linux Virtual Server extends the TCP/IP stack of Linux kernel to support three IP load-balancing techniques: NAT, IP tunneling, and direct routing. It also provides four scheduling algorithms for selecting servers from the cluster for new connections: round-robin, weighted round-robin, least connection, and weighted least connection. Client applications interact with the cluster as if it were a single server. The clients are not affected by interaction with the cluster and do not need modification. Scalability is achieved by transparently adding or removing a node in the cluster. High availability is provided by detecting node or daemon failures and reconfiguring the system appropriately.

An application-level approach helps realize SSI partially and requires that each application be developed as SSI aware separately. A key advantage of application-level SSI compared with the kernel level is that it can be realized in stages, and the user can benefit from it immediately, but in the kernel-level approach, unless all components are specifically developed to support SSI, it cannot be used or released to the market.

D) Discuss SSI architecture of implementing at Middleware level with a suitable example. <Extended Question>

Middleware, a layer that resides between OS and applications, is one of the common mechanisms used to implement SSI in clusters. They include cluster file system, programming environments such as PVM, job management and scheduling systems such as CODINE and Condor (Litzkow, Livny, and Mutka, 1988), and cluster-enabled Java Virtual Machine (JVM) such as JESSICA (Ma,Wang, and Lau, forthcoming). SSI offered by cluster file systems makes disks attached to cluster nodes appear as a single large storage system and ensure that every node in the cluster has the same view of the data. Global job scheduling systems manage resources and enable the scheduling of system activities and execution of applications while offering high availability services transparently. Cluster-enabled JVM allows execution of Java threads-based applications on clusters without any modifications.

l Sub-system

n Boundary: A sub-system

n Importance: SSI for all applications of the sub-system

n Examples: Distributed DB (e.g., Oracle 10g), OSF DME, Lotus Notes, MPI, PVM

l File system

n Boundary: Shared portion of the file system

n Importance: Implicitly supports many applications and subsystems

n Examples: Sun NFS, OSF, DFS, NetWare, and so on

l Toolkit

n Boundary: Explicit toolkit facilities: user, service name, time

n Importance: Best level of support for heterogeneous system

n Examples: OSF DCE, Sun ONC+, Apollo Domain

Question 3:

A) What are the key distinctions between Cluster and Grid computing? [3]

A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource.

Grid is a type of parallel and distributed system that enables the sharing, exchange, selection and aggregation of geographically distributed “autonomous” resources depending on their availability, capability, cost and user QoS requirements.

Distinctions:

1. Grid aims to aggregate different resources together, and provides the users with multi-purposes resources, such as computing, data storage, application, etc. While a single cluster tends to provide single-purpose service to the users.

2. Cluster normally lives in one physical location and has one administrative domain. A grid is usually built out of separate administrative domains and aggregate resources distributed geographically. Thus the resources in grid are tending to be unreliable.

3. The nodes in cluster are homogeneous, but in grid the resources are mostly heterogeneous. Grid can be built to connect and gather the resources from clusters, namely a grid of clusters.

B) Discuss two commercial applications of Clusters and Grids. [3]

l Google Search Engine: A large Cluster was built by Google to provide web search service to users around the world. Due to the huge amount of request and complexity of this task, one single computer can not handle this. So, cluster is built to balance the load through distributes requests to different nodes. A single system image is provided to the users.

l Global Radio Telescope Project: Radio telescopes and high performance computers around the world are connected and shared by Grid. So scientists from different countries can collaborate together to share and process data faster and get more precise result.

C) Discuss Grid Security Issues along with PKI (public key infrastructure) and its usage in Grid Computing. [4]

Grid Security Infrastructure provides methods for authentication of Grid users and secure communication. It is based on SSL (Secure Sockets Layer), PKI (Public Key Infrastructure) and X.509 Certificate Architecture.

The primary motivations behind the GSI are:

l The need for secure communication (authenticated and perhaps confidential) between elements of a computational Grid.

l The need to support security across organizational boundaries, thus prohibiting a centrally-managed security system.

l The need to support "single sign-on" for users of the Grid, including delegation of credentials for computations that involve multiple resources and/or sites.

PKI includes:

X.509 certificate:

Users gain access to resources by having their Grid certificate subjects mapped to an account on the remote machine by its system administrators. This certificate contains information vital to identifying and authenticating the user or service. A GSI certificate includes four primary pieces of information:

l A subject name, which identifies the person or object that the certificate represents.

l The public key belonging to the subject.

l The identity of a Certificate Authority (CA) that has signed the certificate to certify that the public key and the identity both belong to the subject.

l The digital signature of the named CA.

A third party (a Certificate Authority) is used to certify the link between the public key and the subject in the certificate. In order to trust the certificate and its contents, the CA's certificate must be trusted. The link between the CA and its certificate must be established via some non-cryptographic means, or else the system is not trustworthy.

The following figure describes the Mutual authentication process in GSI. (GSI certificates are encoded in the X.509 certificate format)

 

1. AàB: cert(A)

2. B: validate cert(A)

3. BàA:

4. AàB:

5. B: ,if equals to randomMessage, then A is who he says he is.

6. The same process on B side.

To mutually authenticate, the first person (A) establishes a connection to the second person (B). To start the authentication process, A gives B his certificate. The certificate tells B who A is claiming to be (the identity), what A's public key is, and what CA is being used to certify the certificate. B will first make sure that the certificate is valid by checking the CA's digital signature to make sure that the CA actually signed the certificate and that the certificate hasn't been tampered with.

Once B has checked out A's certificate, B must make sure that A really is the person identified in the certificate. B generates a random message and sends it to A, asking A to encrypt it. A encrypts the message using his private key, and sends it back to B. B decrypts the message using A's public key. If this results in the original random message, then B knows that A is who he says he is.

Now that B trusts A's identity, the same operation must happen in reverse. B sends A her certificate, A validates the certificate and sends a challenge message to be encrypted. B encrypts the message and sends it back to A, and A decrypts it and compares it with the original. If it matches, then A knows that B is who she says she is.

At this point, A and B have established a connection to each other and are certain that they know each others' identities.

Besides, GSI provides a delegation capability: an extension of the standard SSL protocol which reduces the number of times the user must enter his pass phrase. If a Grid computation requires that several Grid resources be used (each requiring mutual authentication), or if there is a need to have agents (local or remote) requesting services on behalf of a user, the need to re-enter the user's pass phrase can be avoided by creating a proxy.

A proxy consists of a new certificate (with a new public key in it) and a new private key. The new certificate contains the owner's identity, modified slightly to indicate that it is a proxy. The new certificate is signed by the owner, rather than a CA. (See diagram below.) The certificate also includes a time notation after which the proxy should no longer be accepted by others. Proxies have limited lifetimes.

 

The proxy's private key must be kept secure, but because the proxy isn't valid for very long, it doesn't have to kept quite as secure as the owner's private key. It is thus possible to store the proxy's private key in a local storage system without being encrypted, as long as the permissions on the file prevent anyone else from looking at them easily. Once a proxy is created and stored, the user can use the proxy certificate and private key for mutual authentication without entering a password.

When proxies are used, the mutual authentication process differs slightly. The remote party receives not only the proxy's certificate (signed by the owner), but also the owner's certificate. During mutual authentication, the owner's public key (obtained from her certificate) is used to validate the signature on the proxy certificate. The CA's public key is then used to validate the signature on the owner's certificate. This establishes a chain of trust from the CA to the proxy through the owner.

Question 4:

A) Discuss in detail the architecture of Grid Computing systems. [4]

 

Figure shows the hardware and software stack within a typical Grid architecture. It consists of four layers: fabric, core middleware, user-level middleware, and applications and portals layers.

Grid Fabric level

Consists of distributed resources such as computers, networks, storage devices and scientific instruments. The computational resources represent multiple architectures such as clusters, supercomputers, servers and ordinary PCs which run a variety of operating systems (such as UNIX variants or Windows). Scientific instruments such as telescope and sensor networks provide real-time data that can be transmitted directly to computational sites or are stored in a database.

Core Grid middleware

Offers services such as remote process management, co-allocation of resources, storage access, information registration and discovery, security, and aspects of Quality of Service (QoS) such as resource reservation and trading. These services abstract the complexity and heterogeneity of the fabric level by providing a consistent method for accessing distributed resources.

User-level Grid middleware

Utilizes the interfaces provided by the low-level middleware to provide higher level abstractions and services. These include application development environments, programming tools and resource brokers for managing resources and scheduling application tasks for execution on global resources.

Grid applications and portals

Are typically developed using Grid-enabled languages and utilities such as HPC++ or MPI. An example application, such as parameter simulation or a grand-challenge problem, would require computational power, access to remote data sets, and may need to interact with scientific instruments. Grid portals offer Web-enabled application services, where users can submit and collect results for their jobs on remote resources through the Web.

B) Discuss the design issues of Grid Resource management systems. [3]

The goal of designing a Grid Resource Management System is :

1. To manage the Supply and Demand for Resources. Resources in Grid;

2. Allocate resources such that:

They are allocated on fairly

They are effectively utilised

Most users are satisfied

High priority jobs are given prominence

3. Additionally, we need to make sure that Resource Providers are given appropriate “incentive” for their contribution and to ensure sustained resource sharing.

4. Plus, the resources in a grid system have many unique characteristics, such as, autonomous, heterogeneous, substrate, varying availability, Size (large number of nodes, providers, consumers), Geographic distribution and different time zones, Differing goals (producers and consumers have different objectives and strategies), Insecure and Unreliable environment.

Therefore, design a GRMS is a complex task.

The RMS can be classified into three categories: centralized, decentralized and Hierarchical:

Centralized scheduling model:

This can be used for managing single or multiple resources located either in a single or multiple domains. It can only support uniform policy. It is not suitable for grid resource management systems as they are expected to honor (local) policies imposed by resource owners.

Decentralized scheduling model:

In this model schedulers interact among themselves in order to decide which resource should be applied to the jobs being executed. In this scheme, there is no central leader responsible for scheduling; hence this model appears to be highly scalable and fault-tolerant. As resource owners can define the policy that schedulers can enforce, the decentralized scheme suits grid systems.

Hierarchical scheduling model:

This model fits for grid systems as it allows remote resource owners to enforce their own policy on external users. This model looks like a hybrid model (combination of central and decentralized model), but appears more like centralized model and therefore suits grid systems.

 

When we design GRMS we must consider Consumer and Provider:

l Grid Consumers: Execute jobs for solving varying problem size and complexity. Benefit by selecting and aggregating resources wisely. Tradeoff timeframe and cost using Strategy: minimize expenses.

l Grid Providers: Contribute (“idle”) resource for executing consumer jobs. Benefit by maximizing resource utilization. Tradeoff local requirements & market opportunity. Strategy: maximize return on investment

To inspire users to provide and use resources, the Economic based RMS is desired.

C) Describe the need for Grid/Computational Economy and its benefits. [3]

As a Grid is constructed by coupling resources distributed across various organizations and administrative domains that may be owned by different organizations, it is essential to support mechanisms and policies that help in regulate resource supply and demand. Owners of grid want to sell idle resources to customers to earn money and clients want to solve problems in low cost. The pricing of resource will be driven by demand and supply and is one of the best mechanisms to regulate and control access to computational resources. An economic approach is one means of managing resources in a complex and decentralized manner. This approach provides incentives for resource owners, and users to be part of the Grid and develop and using strategies that help maximize their objectives. This approach, namely, Grid/Computational Economy is desired.

The benefits of economy-based resource management include:

l It helps in building large-scale grid as it motivates resource owners to contribute their idle resources for others to use and profit from it.

l It provides fair basis for access to grid resources for everyone.

l It helps in regulating the demand and supply.

l It offers uniform treatment to all resources.

l It offers an efficient mechanism for allocation and management of resources.

l It helps building a highly scalable system.

l It places the power in the hand of both resource owners and users.

Question 5:

A) Discuss the architecture of a Grid Resource Broker with a suitable example. [3]

 

l Interface layer: It provides interface to both human and applications. People can access Gridbus Broker through web portal and applications can interact with it using Web Services. The inputs from the external entities are translated by this layer to create the objects in the Core layer. Three kinds of inputs are provided to the broker: a description of the application requirements, a set of services that can be utilized for executing the application, and the set of credentials for accessing the services.

l Core layer: This layer contains entities that represent the properties of the Grid infrastructure independent of the middleware and the functionality of the broker itself. Therefore, it abstracts the details of the actual interaction with the Grid resources performed by the Execution layer. This interaction is driven by the decisions made by the functional components of the broker present in the Core layer. These components can be broadly classified into two categories - entities and workers.

n Entities: Entities exist as information containers representing the properties, functions and instantaneous states of the various architectural elements that are proxies for the actual Grid entities and constructs involved in the execution.

n Workers: Workers represent the functionality of the broker, that is, they implement the actual logic and manipulate the entities in order to achieve the application objectives.

l Execution layer: The actual task of dispatching the jobs is taken care of by the Execution layer which provides Dispatchers for various middleware. These dispatchers create middleware-specific Agents from the jobs and are executed on the remote resources.

l Persistence Sub-system: The persistence subsystem extends across the three layers described previously and maintains the state of the various entities within the broker. It is primarily used to interface with the database into which the state is stored at regular intervals. The persistence sub-system satisfies two purposes: it allows for recovery in case of unexpected failure of the broker and is also used as a medium of synchronization among the components in the broker.

B) Discuss Deadline-and-Budget Constrained Time and Cost optimization scheduling algorithms for Grid Computing. [5]

1. Deadline-and-Budget Constrained Time:

1. For each resource, calculate the next completion time for an assigned job, taking into account previously assigned jobs.

2. Sort resources by next completion time.

3. Assign one job to the first resource for which the cost per job is less than the remaining budget per job.

4. Repeat all steps until all jobs are processed. (This is performed periodically or at each scheduling-event.)

2. Cost optimization

It allocates the as many jobs that the first cheapest resource can complete by the deadline and then allocates the remaining jobs to the next cheapest resources.

1. Sort resources by increasing cost.

2. For each resource in order, assign as many jobs as possible to the resource, without exceeding the deadline.

3. Repeat all steps until all jobs are processed.

C) Briefly describe the key components of Grid Simulation (GridSim) toolkit and its benefits [2].

D) Briefly describe the key components of Globus Toolkit and its benefits [2].

 

Globus Toolkit is the de facto open source toolkit for building computing grids.

3. GRAM: Grid Resource Management is a unify remote interface to Resource Managers. GRAM is for stateful job control. It creates an environment for job; stage files to /from environment; cause execution of job process; monitor execution; signal important access to client; enable client access to output files.

4. GIS: Grid Information Service. The system information is critical to operation of the grid and construction of applications. We need to use this information to determine available resource, tuning methods and state of jobs.

5. GDM: Grid Data Management manages data transfer and access, data replication.

6. GSI: Grid Security Infrastructure. It provides users single-sign-on property and secure communication with grid. It uses X.509 certificate, SSL and PKI to achieve this.

Question 6:

A) Discuss different models or strategies for parallelization of applications. [4]

Parallel applications can be classified into some well defined programming paradigms. A few programming paradigms are used repeatedly to develop many parallel programs. Each paradigm is a class of algorithms that have the same control structure. The following paradigms are popularly used in parallel programming:

1. Task-Farming (or Master/Slave):

Master decomposes the problem into small tasks, distributes to workers and gathers partial results to produce the final result. Mapping/Load Balancing: 1. Static; 2. Dynamic.

In the first case, the distribution of tasks is all performed at the beginning of the computation, which allows the master to participate in the computation after each slave has been allocated a fraction of the work. Figure below presents this way.

 

The dynamic load balancing is used when the number of tasks exceeds the number of available processors, or when the number of tasks is unknown at the start of the application, or when the execution times are not predictable, or when we are dealing with unbalanced problems.

2. Single Program Multiple Data (SPMD):

Most commonly used model. Each process executes the same piece of code, but on different parts of the data. This involves splitting the data among the available processors. Different names: geometric/domain decomposition, data parallelism.

This paradigm is highly sensitive to the loss of some process. Usually, the loss of a single process is enough to cause a deadlock in the calculation in which none of the processes can advance beyond a global synchronization point.

Figure below presents this paradigm:

 

3. Data Pipelining:

This is based on a functional decomposition approach: the tasks of the algorithm, which are capable of concurrent operation, are identified and each processor executes a small part of the total algorithm. The pipeline is one of the simplest and most popular functional decomposition paradigms. Figure below presents the structure of this model.

Processes are organized in a pipeline. Each process corresponds to a stage of the pipeline and is responsible for a particular task. The communication pattern can be very simple since the data flows between the adjacent stages of the pipeline. The efficiency of this paradigm is directly dependent on the ability to balance the load across the stages of the pipeline. The robustness of this paradigm against reconfigurations of the system can be achieved by providing multiple independent paths across the stages. This paradigm is often used in data reduction or image processing applications

 

4. Divide and Conquer:

A problem is divided into two or more sub problems, and each of these sub problems are solved independently, and their results are combined to give a final result. Because the sub problems are independent, no communication is necessary between processes working on different sub problems. We can identify three generic computational operations for divide and conquer: split, compute, and join. Master-worker/task-farming paradigm is like divide and conquer with master doing both split and join operation. Figure below presents this model.

 

5. Speculative Parallelism:

It used when it is quite difficult to achieve parallelism through one of the previous paradigms. Problems with complex dependencies – use “look ahead “execution. Another use is to employ different algorithms for solving the same problem—the first one to give the final solution is the one that is chosen.

B) Discuss the design of a parallel algorithm for matrix multiplication? Discuss its implementation using the standard MPI (message passing interface). [6]

The design of the matrix multiplication algorithm can be organized into 4 stages. And this belongs to Task-Farming paradigm.

1. Partitioning:

In my design, each processor/node will be assigned the whole matrix B. I only decompose matrix A into smaller sub matrixes. And distribute each of these sub matrix to one processor/node in the cluster. This technique is of domain/data decomposition. How the decomposition of data is done according to rows of matrix A, and the process is as follows:

Denote the number of processors/nodes as Number_of_Nodes, and the size of the matrix A as Matrix_Size. And Number_of_Rows_per_Node equals to , and Remainder equals to Matrix_Size MOD Number_of_Nodes. If Remainder equals to 0, which means every node in the group is assigned Number_of_Rows_per_Node rows of work exactly. But if Remainder does not equal to 0, I simply assign each row in Remainder to one node in the group. Noted that the Remainder is less than Number_of_Nodes, so there is a maximum of 1 difference in the work load between every two nodes.

2. Communication:

I my design, there is no communication between worker process. The master process first broadcasts matrix B to all the worker processes. And then partitions matrix A according to scheme described previously, and scatters them to each worker process. Right after all the worker finish their tasks, the master process gathers all the results from them and generates the final whole result of the multiplication.

3. Agglomeration:

The communication paradigm is very simple in my design, no need to group smaller tasks into larger task or combine individual communication to a super communication to improve performance or reduce communication cost.

4. Mapping:

In my design, I assume that each node/processor has the same processing power. So mapping tasks to processors is very easy. Just assign each sub matrix resulted in partitioning stage to every node/processor in the cluster, and assign the whole matrix B to every node/processor.

Implementation using MPI:

1. Initialization of MPI environment, and get current process’s rank number:

MPI_Init(&argc,&argv);

MPI_Comm_rank(MPI_COMM_WORLD, &My_Rank);

2. Get the size of worker nodes;

MPI_Comm_size(MPI_COMM_WORLD, &Num_of_Nodes);

3. Using partition strategy described previously to partition matrix A;

4. Matrix B is broadcasted to all nodes/processors:

MPI_Bcast(matrixB,Matrix_Size*Matrix_Size,MPI_INT,0,MPI_COMM_WORLD);

5. Matrix A is scattered to all nodes according to the previous partition strategy.

MPI_Scatterv(matrixA, MPI_INTEGER, matrixTemp, MPI_INTEGER, Source, MPI_COMM_WORLD);

6. Do multiplications parallel among all worker nodes/processors.

7. Gather all results from nodes to master process with rank number 0

MPI_Gatherv(matrixTempResult,MPI_INTEGER, matrixC, MPI_INTEGER, Source, MPI_COMM_WORLD);

Question 7:

A) Write a multithread program for file copy operation. [5]

reader()

{

- - - - - - - - - -

lock(buff[i]);

read(src,buff[i]);

unlock(buff[i]);

- - - - - - - - - -

}

writer()

{

- - - - - - - - - -

lock(buff[i]);

write(src,buff[i]);

unlock(buff[i]);

- - - - - - - - - -

}

B) Discuss parametric processing programming model and its applications in Grid computing. [5]

When a user describes an experiment to Nimrod, they develop a declarative “plan” file which describes the parameters, their default values, and the commands necessary for performing the work. The system then uses this information to transport the necessary files and schedule the work on the first available machine.

Parametric computing is a killer application model for grid computing.

A declarative parametric language is powerful enough to support expression of many applications as grid applications.

As demonstrated in Drug Discovery application, parametric model saves a lot software engineering cost.

2007年6月6日星期三

433-654 Sensor Networks and Applications Model Paper and Solutions Ver. 0.8

PDF Version contains many figures

433-654 Sensor Networks and Applications

1. Describe major problems in every component, address features of WSN and explain major problems in one sentence.

 

/***********************************

Solution:

WSN features:

l Small-scale sensor nodes

l Limited power they can harvest or store

l Harsh environmental conditions

l Node failures

l Mobility of nodes

l Dynamic network topology

l Communication failures

l Heterogeneity of nodes

l Large scale of deployment

l Unattended operation

Power Supply: Most nodes are battery powered, have limited lifetime. It is almost impossible to replace battery for each node, since large quantity and deployed site.

Sensors:

Memory: Limited memory space, since node is small and low cost.

Micro Processor: Limited CPU processing power since to reserve battery and keep low cost.

Radio: Short range wireless communication, cost large amount of energy (1000 times larger than computation).

***********************************/

2. While designing algorithms to run on WSN using some generic sensors like Xbow Mote for example. What is the main constraint that we have to consider that exists in our sensors and which component and operation has effect on this.

/***********************************

Solution:

Energy is the main constraint in sensor. Since most of sensor nodes are powered by battery, the lifetime of whole system is depend on battery life. And replace battery is impossible due to large amount of nodes and deploy environment like battlefield.

Every component in sensor needs energy to work. The communication component is a big consumer. It takes 1000 times more energy to transmit data than to process it. So, to extend lifetime, the system should keep a low duty cycle and keep communication as low as possible. In this case, the in-network process and in-network data aggregation is preferred.

***********************************/

3. Multi-hop RF network for establishing WSN rather than requiring each sensor to directly connect to a base station. A multi-hop network is more power efficient. Explain it in detail.

/***********************************

Solution:

 

l r: r is a distance value in multi-hop style.

l Nr: Nr is the distance from sender to receiver in one-hop style.

l : The power to send message from sender to receiver.

l : The minimum receive signal strength from receiver’s view.

l : The power to send message from one sender node o another receiver node in multi-hop style.

The curve in figure above shows the signal attenuation. The power advantage of multi-hop over one-hop is:

 

The α value is signal attenuation constant.

***********************************/

4. What is duty cycle in WSN? What is the aim of design engineering with this parameter?

/***********************************

Solution:

 

 

For low-powered electronic device like nodes in WSN, they at least have two states “Sleep” and “Awake”. In “Awake” mode, all components in node working and consume energy. In “Sleep” mode, most of components are in energy saving mode which cost few energy. Tasks and events can awake sleeping components. The duty cycle is defined as above. The aim of WSN duty cycle is 0.1% to 1%.

***********************************/

5. What is APIT test and how does the location algorithm that we saw in class work that uses this test?

/***********************************

Solution:

APIT stands for Approximate Point-In-triangulation Test. It is a range-free location algorithm.

1. The unknown node collects location information from all neighbor landmarks.

2. Randomly select tree of them, and do a triangulation test to check whether the unknown node is in the triangle. Iteratively do this step until get desired accuracy.

3. Use centroid approach to compute all triangles and the overlapped area is the location of unknown node.

***********************************/

6. Routing data in WSN. A greedy algorithm can be utilized. Although greedy algorithms are simple and powerful, they may get stuck in local. Why this will happen? How to deal with this?

/***********************************

Solution:

If all neighbor nodes’ costs are larger than the cost of the node itself to reach a destination, but the destination is still out of reach. This is called routing void.

To deal with it, the routing void node can select the neighbor node which cost least and report this node to previous node. Next time, the previous node can choose the neighbor node to communicate rather than routing void node. Route along the perimeter of a face using right hand rule on the sub-graph.

 

***********************************/

7. Write a COUGAR like query language according to the statement below

“Find the max temperature level for towns in Tasmania for two hours starting from now by sampling every 10 minutes if that town also has an average humidity for 90% or above.”

/***********************************

SELECT MIN(humidity), town

FROM sensors

WHERE state = ‘Queensland’

GROUP BY town

HAVING MAX(temperature) > 30

DURATION [now, now + 600 min]

SAMPLING PERIOD 30 min

Solution:

SELECT MAX(temperature), town

FROM sensor

WHERE state=’ Tasmania’

GROUPBY town

HAVING AVG(humidity)>90

DURATION[now, now+7200]

EVERY 600

***********************************/

8. Compare tree-based and multi-path-based data aggregation schemas from at list two points of view.

/***********************************

Solution:

l Tree-based: The base station is the root of routing tree. The tree, that is utilized to distribute the query, is also utilized to collect the data. TinyDB is an example.

l Multi-path-based: To prevent failures, the same sensor value can be sent along multiple paths.

Main disadvantage of tree-based is that it can loose large sub-trees/data due to central point of failure. Multi-path-based schema consumes more energy than tree-based. And data aggregation can be applied to tree-based schema which significantly reduces energy cost. The multi-path-based schema also can do data aggregation, but will lead to multiple different approximation value when reaching sink node.

***********************************/

9. Fractional Cascading is a distributed hierarchical aggregation and storage method. Explain how Fractional Cascading works.

/***********************************

Solution:

Fractional Cascading is a distributed hierarchical aggregation and storage method.

In most cases, queries in WSN are highly spatially correlated. Because WSN measure physical phenomena and physical interaction are mostly localized in space and/ or time.

The key idea is each does not need to know the exact value of other node, but a “fraction’ information of that area. Further the node between the areas, less and coarser value the node has.

For example, Fractional Cascading works in this style:

1. Setup Quad-tree in sensor field

2. Each sensor stores values from three siblings

 

From P’s point of view, the P has exact value of P’s three siblings (Darkest area). P has highest value of parent’s three siblings (Darker area). P has a few knowledge of lightest area. When the query is injected into sensor field, the closer the query to the hot area, the precise value it will get.

l The total amount of information duplication across all sensors is kept small, because of the geometric decrease with distance.

l The communication costs required to build and maintain the index remain reasonable, as on the average information travels only a short distance.

l Neighboring sensors have highly correlated world views.

l It is very fast when region is local and small.

GHT stores data far away from the node generated it.

It restricted geographic flooding, but the communication cost of visiting hot spots and canonical pieces dominates the total query cost.

***********************************/

10. WSN can observe various kinds of security problems. One of these attacks is known as “Sink hole”. Describe how “Sink hole” attack works and main effect.

/***********************************

Solution:

In this attack, the adversary lures all the traffic of the network to pass through a compromised node creating a metaphorical sinkhole with the adversary at the centre. This is achieved by the routing algorithm being used. As a result, the surrounding nodes route all traffic to the base station via the compromised node.

The compromised node is yielding to other node saying: “Hello! I have best route to transmit your data!” But actually, do nothing and drop all packets. The real sink node then can not get any information.

***********************************/

11. What is SensorML? Describe the main reasons of why one may want to use SensorML.

/***********************************

Solution:

SensorML is a XML file describing sensor properties and information needed for data processing.

SensorML enables discovery and control of sensors on the Web and enables geo-location of measurements. It provides node’s performance and calibration characteristics, describes the sequence of processes by which an observation was made. It enables derived observations. It also gives function model of sensor system.

Advantages:

l Tree style, some parts can be ignored which saves time.

l Platform independent, the XML file is platform neutral.

l If some parts loose, it still works

Disadvantages:

l Verbose and redundant, data occupies a lot of space. This causes overhead in storage, energy cost, communication bandwidth, process speed and so on. The nodes right have limited capability to deal resource hungry XML.

l SensorML is a complex and developing standard.

l WSN is data-centric network, no one want to spend time on annotate nodes.

***********************************/

12. The figure below presents different strategies for routing and communication single targets state information in WSN. State advantages and disadvantages of each method.

 

a b c

/***********************************

Solution:

l a: Every node in the network stores and updates target state information. It is very robust. The challenges are how to maintain information consistency and it is difficult to response user’s queries effective and correct.

l b: Leader nodes are selected in the succession according to information such as vehicle movement. A leader is elected combines with measurements from nearby sensor. As the target moves, the initial node is no longer desired. The current leader handoff target state information to new elected leader until task finish. It is advantage in reducing the overall bandwidth consumption and increase the network lifetime. The main challenge is how to elect a good leader node.

l c: A fix node stores the target state from other relevant nodes through communication. The communication cost would be very high and the fix node is a single point of failure reducing the robustness of the whole task.

***********************************/

13. Why developing software in WSN a programmer can choose from different programming paradigms. We have mentioned two such. Name and describe each.

/***********************************

Solution:

l Node-centric: Programmers think of how a node should behave in environment. Node-level OS provides hardware and networking abstractions of a sensor node o programmers, or it can be a language platform, which provides a library of components to programmers. TinyOS is a node-centric OS.

l State-centric: Take PIECES (Programming and Interaction Environment for Collaborative Embedded System) as example. It contains two parts principals and port agent. Principal maintains state corresponding to certain aspects of physical phenomenon of interest. It updates state from time to time. Port agents facilitate communication between principals Agents can be active or passive. Active – pushes data autonomously to destinations; Passive – sends data only when consumers request it

***********************************/

14. What is ZigBee? Wht purpose of ZigBee? Why? What advantages and disadvantages it has? Describe ZigBee architecture and application domain.

/***********************************

Solution:

ZigBee is a suit of protocol designed for low-cost, low-power, low-date-rate and small-scale wireless devices. It is based on the IEEE 802.15.4. ZigBee is targeted at RF applications that require a low data rate, long battery life, and secure networking.

 

ZigBee is mainly used in monitor and control applications.

Advantage

l Network size: support mesh, star, hybrid network architecture.

l Low power consumption: Devices can run for years.

Disadvantages

l Cost still high: Like Blootooth years before.

l Interfere with other wireless protocol: Use free ISM band.

l Dynamic routing: It is lack of dynamic routing, making it impossible to build self healing networks that can restructure itself when a router or coordinator is disconnected.

l Secure transfer: Although encryption exists on the hardware level there is no support for setting the current key or performing key exchange with other nodes. These features are lacking because they where not specified in version 0.92 of the ZigBee standard.

***********************************/

Extend Question

15. Describe PDT.

/***********************************

Solution:

Pocket-Driven Trajectories (PDT) Algorithm is selectivity-aware, localized and adaptive. It is highly efficient for queries with predicates on spatially correlated attributes and long-running queries.

It aims to minimize the use of non-selected parts by computing selectivity-aware data collection paths and reconfigure spatial network efficiently when nodes can move or when they die.

 

 

Typical performance improvement of 30%−40% for selective queries.

***********************************/

16. Describe tracking issues and DEH (Distributed Euler Histogram) algorithm

/***********************************

Solution:

Distributed Euler Histogram is a low cost algorithm designed for Distinct Entries query but also can answer Total Traffic query and performs well for the Distinct Objects query.

DEH maintains Face Counts and Edge Counts in a Voronoi diagram. The answer for Distinct Entries query is Face Count – Edge Count.

 

For example:

Distinct Entries in P= 6 – 5 = 1;

Distinct Entries in Q= 2 – 1 = 1;

Distinct Entries in R= 4 – 2 = 2.

Data is commonly aggregated using a random tree in WSNs that we also use in our method

GPSR is used for routing.

l Distributed Euler Histogram is an efficient approach to solve aggregate queries in WSNs

l Distributed Euler Histogram is better than ID-based System because:

l DEH achieves similar accuracy for Distinct Objects query

l DEH achieves higher accuracy for Distinct Entries query

l DEH has much less cost than keeping IDs

***********************************/

17. Describe “Worm hole”.

/***********************************

Solution:

In a wormhole attack, the adversary tunnels messages received in one part of the network to another part of the network over a low-latency link and replays the messages in this part of the network. The wormhole attack requires two at least two malicious nodes. An adversary situated close to the base station may be able to convince all the nodes that are several hops away from the base station that they are only one hop away via the wormhole. All routes will then pass through the wormhole-thereby creating a sinkhole. The attacker can then selectively forward packets or manipulate data and thereby pose various kinds of security threats.

Figure below depicts the wormhole attack scenario discussed above. The malicious nodes/adversaries are depicted in red color. The wormhole is depicted in gray color and the generated as a result of the false belief that the base station is closer via the adversary. There is a sinkhole created at both the adversaries as a result of this wormhole.

 

***********************************/

18. Describe the detection and SNR advantage.

/***********************************

Solution:

In sensor field, denser sensor field can get higher accurate result by 10logN due to Signal Noise Ratio. Increasing the sensor density by a factor of N gives a SNR advantage of:

 

***********************************/

19. Describe proactive and reactive protocol

/***********************************

Solution:

l Proactive Protocol: These algorithms maintain fresh lists of destinations and their routes by distributing routing tables in the network periodically. The main disadvantages are maintaining a large routing table, communication overhead. But the first message would be very fast and optimal.

l Reactive Protocol: The protocol finds the route on demand by flooding the network with Route Request packets. The main disadvantages of such algorithms are 1. Delay in route finding. 2. Excessive flooding can lead to network clogging.

***********************************/

20. Describe Rumor Routing

/***********************************

Solution:

It is a agent-based routing protocol. Source nodes send out packets through random nodes and sink node also send query packet through random nodes. If two routs overlap then the route is found.

The disadvantages are:

l Routs are not optimal

l No delivery guarantee

l Performance depends on topology

 

***********************************/

21. Describe ADOV

/***********************************

Solution:

ADOV (ad hoc on-demand distance vector protocol) is a reactive routing protocol.

***********************************/

22. Describe GLS

/***********************************

Solution:

In GLS (Geographic location service), nodes act as a location server in their neighborhood. Sensor field is organized in a hierarchy Quad-tree mode. GLS tries to map the node IDs to node location.

 

***********************************/

23. Attack from layers in WSN

/***********************************

Solution:

Layer

Attack

Defend

Physical

Tampering

Fake node,

Compromised self-check

Capture and subversion

Link Layer

Jamming

Frequency hop

Contention

Exhaustion

Black list

Network Layer

Flooding

Challenge and response

Traffic redirection

Authentication

Spoofing

Packet dropping

Homing

Encryption,

node to node authentication

Application Layer

Eavesdropping

Encryption

Replay

Nonce

Data insertion

MAC

Data modification

MAC

***********************************/

2007年6月4日星期一

433-678 Cluster and Grid Computing Quiz Ver. 0.91

PDF Version Download

433-678 Cluster and Grid Computing Quiz

(Brain dump, enjoy~)

 

1. Which of the following is a reliable communication and delivery protocol?

a)TCP/IP, b)UDP, c)MPI, d)None of the above

Answer: a

/*****************************

Solution:

TCP is a connection- oriented protocol.

UDP is a connectionless protocol. It sends and forgets.

MPI (Message Passing Interface) is a de facto standard in parallel computing.

*****************************/

2. In Java Threads, which of the following methods execute threads without blocking?

a)Thread.run(), b)Thread.join(), c)Thread.start(), d)Thread.interrupt()

Answer: c

/*****************************

Solution:

.run():If this thread was constructed using a separate Runnable run object, then that Runnable object's run method is called; otherwise, this method does nothing and returns.

.join():Waits for this thread to die.

.start():Causes this thread to begin execution; the Java Virtual Machine calls the run method of this thread.

.interrupt():Interrupts this thread.

*****************************/

3. Which of the models below follows SIMD?

a)Stream Processing, b)Uniprocessor, c)Vector Processing, d)None of the above

Answer: c

/*****************************

Solution:

Flynn’s Law is a classification of computer architecture. It is based upon the number of current instruction and data streams available.

  Single instruction

Multiple instruction

Single data SISD MISD
Multiple data SIMD MIMD

1. Single Instruction, Single Data stream (SISD) - a sequential computer which exploits no parallelism in either the instruction or data streams. Examples of SISD architecture are the traditional uniprocessor machines like a PC or old mainframes.

2. Multiple Instruction, Single Data stream (MISD) - unusual due to the fact that multiple instruction streams generally require multiple data streams to be effective. However, this type is used when it comes to redundant parallelism, as for example on airplanes that need to have several backup systems in case one fails. Some theoretical computer architectures have also been proposed which make use of MISD, but none have entered mass production.

3. Single Instruction, Multiple Data streams (SIMD) - a computer which exploits multiple data streams against a single instruction stream to perform operations which may be naturally parallelised. For example, an array processor or GPU.

4. Multiple Instruction, Multiple Data streams (MIMD) - multiple autonomous processors simultaneously executing different instructions on different data. Distributed systems are generally recognised to be MIMD architectures; either exploiting a single shared memory space or a distributed memory space.

As of 2006, all the top 10 and most of the TOP500 supercomputers are based on a MIMD architecture.

l Single Program, Multiple Data streams (SPMD) - multiple autonomous processors simultaneously executing the same program (but at independent points, rather than in the lockstep that SIMD imposes) on different data. Also referred to as 'Single Process, multiple data'[6]. SPMD is the most common style of parallel programming.

l Multiple Program Multiple Data (MPMD) -- multiple autonomous processors simultaneously operating at least 2 independent programs. Typically such systems pick one node to be the "host" ("the explicit host/node programming model") or "manager" (the "Manager/Worker" strategy), which runs one program that farms out data to all the other nodes which all run a second program. Those other nodes then return their results directly to the manager.

Stream Processing: MISD

Uniprocessor: SISD

Vector Processing: SIMD

*****************************/

4. “Speedup obtained by distributed execution of a problem is limited by the non-parallelizable (or serial) component of the problem”. This law is better known as:

a)Gustafson’s Law, b)Amdahl’s Law, c)Moore’s Law, d)Brooks’ Law

Answer: b

/*****************************

Solution:

1. Gustafson’s Law: The speed up of program depends on the non-parallelized part of process.

2. Moore’s Law: Transistors on a single chip doubles ~ every 18–24 months.

3. Brooks’ Law: Adding manpower to a late software project makes it later.

*****************************/

5. List 4 points of difference between Cluster and Symmetric Multi-Processing (SMP) systems:

Answer:

/*****************************

Solution:

l Single point of failure in SMP

l Cluster can be built on heterogeneity hardware and OS

l Cluster is more scalable than SMP

l Cluster is easier to implement

*****************************/

6. Which of these is NOT an operating system used in cluster?

a)Linux, b)PBS, c)Windows NT, d)Solaris MC

Answer: b

/*****************************

Solution:

1. PBS (Portable Batch System): is a computer software job scheduler that allocates network resources to batch jobs. It can schedule jobs to execute on networked, multi-platform UNIX environments.

2. Solaris MC (Multi-Computer): Solaris MC is a prototype distributed operating system for multi-computers (i.e., clusters of nodes) that provides a single-system image: a cluster appears to the user and applications as a single computer running the Solaris operating system.

*****************************/

7. MPI programming is generally carried out at which level of granularity?

a)Large/Task, b)Medium/Function, c)Fine/Compiler, d)Very Fine/Hardware

Answer: a

/*****************************

Solution:

1. Large/Task: PVM, MPI

2. Medium/Function: Threads

3. Fine/Compiler: lines of code

4. Very Fine/Hardware: CPU, Hardware

*****************************/

8. “MPI_Comm_rank(MPI_COMM_WORLD, &rank);” this function call:

a) Initializes the MPI communicators

b) Asks for the rank of the host process

c) Sends a message to the process related “rank”

d) None of the above

Answer: b

/*****************************

Solution:

l Initializes the MPI communicators: MPI_Comm_size(MPI_COMM_WORLD, & numtasks);

l Sends a message to the process related “rank”: MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm);

buf the address of the data to be sent

count the number of elements of datatype buf contains

datatype the MPI datatype

dest rank of destination in communicator comm

tag a marker used to distinguish different message types

comm the communicator shared by sender and receiver

ierror the fortran return value of the send

*****************************/

9. List 4 stages of the methodical design of parallel programs

Answer:

/*****************************

Solution:

1. Partitioning: Decomposition of computational activities and the data into small tasks – there exist number of paradigms – e.g. master worker, pipeline, divide and conquer, SPMD, and speculation.

2. Communication: Flow of information and coordination among tasks that are created in the portioning stage.

3. Agglomeration (分块): Tasks and communication structure created in the above stages are evaluated for performance and implementation cost. Tasks may be grouped into larger tasks to improve communication. Individual communications can be bundled.

4. Mapping / Scheduling: Assigning tasks to processors such that job completion time is minimized and resource utilization is maximized. Even the cost of computation can be minimized based on QoS requirements.

*****************************/

10. MOSIX provides single system image at which level

a)Middleware, b)Application, c)File system, d)Kernel/OS

Answer: d

/*****************************

Solution:

MOSIX is a management system for Linux clusters and organizational Grids that provides a Single-System Image (SSI). In a MOSIX cluster/Grid there is no need to modify or to link applications with any library, to copy files or login to remote nodes, or even to assign processes to different nodes - it is all done automatically, like in an SMP.

*****************************/

11. At what level Google implements its Search Engine

a) Application Level, b) Operating System Level, c)Hardware Level, d)Compiler Level

Answer: a

/*****************************

Solution:

From users’ view, Google provides Application Level Single System Image, but we do not know whether Google implements SSI on other level.

 

Application and Subsystem Level

Google, PBS, Oracle 10g, SUN NFS, …

Operating System Kernel Level

Solaris MC, MOSIX, …

Hardware Level

SCI (Scalable Coherent Interface), …

*****************************/

2007年6月3日星期日

股市要你死亡,必先叫你疯狂

有曰:"上帝要你死亡,必先叫你疯狂。"套在股市,也是放诸四海而皆准。特别是新兴股市,没有经过几次疯狂与死亡,那个股市就不会成熟。中国股市的历史才十几年,老百姓习惯于共产党领导下的,例如疯狂的大?跃进与文革的群众运动,要他们在股市不疯狂一下,是不可能的事情。最近网络流传的"股歌",改编了国歌"义勇军进行曲"与流行曲"梦醒时分",就具有"中国特色"。
要使股民疯狂,并且有越来越多的民众加入股民的队伍,并且随之疯狂,必须是要股民相信中国股市长升不跌的神话,即使有调整,也是短暂调整立刻恢复升市。而成交量也必须大增,中国股市今年年初单日成交爆天量,首破一千亿元,远远超过香港的成交量;四月二十四日成交量再创三一七九亿元新高。四月十一日深沪股市总市值破人民币十四兆元,首次超过港股约十三兆七千亿元的总市值。去年中国股市指数大约上升一倍,今年第一季度上升了四成,屡创新高,到四月三十日以三八四一点收市。其中出现几次小股灾,但都出现V形反弹,加上官方的信心喊话,即使加息,也从开始市场会有反应到后来完全没有反应,使民众越来越相信,股市不会跌。
在这个情况下,股市每次暴跌,反而引来投资者大批入市,最明显的表现是登记的新股民大增,因为他们认为这是趁低入市的大好机会。例如"二二七"股灾,中国增加十九万四千户新股民。全国有多少股市与基金户口?有一个报导,四月二十六日,总数已达九千二百五十五万户;另一个报导,到四月十日为止,开户总数超过一亿户。都超过了美国七千六百万户的股民数量。对中国这个新兴的市场来说,这些数字不可谓不大,但是还不到两位数,应该还有发展空间。在"五一"黄金周前几天,每天有逾二十万人开户。一些证券行利用长假开设炒股班,报名者踊跃,因此估计节后股民人数还会激增。
股市出现泡沫,必须到"全民皆股"的地步,大户才可能把火棒交给小投资者而使自己全身而退。所谓"全民皆股",当然不是每个人都炒股票,但是稍微有经济能力,也就是有存款的,都会买上几手。看目前中国股市的情况,已经出现不少使人担忧的情况。根据中国媒体报导的若干情况,比香港股市爆破前的情况还要严重。
一月分就有报导说,中国的保母、司机都在买股票、买基金;刚从上海出差回香港的瑞士信贷董事总经理、亚洲区首席经济分析师陶冬,对于这种全民疯股市的景象印象深刻。他在上海见识到从出租车到餐馆,每个人都在谈股票。对"走资"先知先觉的泉州市,证券公司门口卖茶叶蛋的老太太也能推荐股票。
大学校园出现一批炒股大军,部份学校还开设了炒股课程,成为最热门的选修课,有的学生以拖欠学费来炒股。广州大学生里出现"炒股大户",有一成学生开户。深圳的调查 ,新生一成炒股,大四学生则有八成。
更惊人的是广州一家小学三十多个小学生成为股民,而且获利颇佳,有的还不到十岁,却有了两年"股龄"。在南京力学小学读四年班有八名炒股炒基金。学校、家长对此竟面看待,认为可以培养他们的数学头脑,也不必为他们的利是钱如何用烦恼,还有所谓心理学家说:又不是放弃学业全天候炒股,没什么好大惊小怪的,用利是钱来学经济知识,比乱花金钱好。
上班族也卷入炒股。有人在上班时频频如厕,其实是查看股市行情。由于要留意股市走势,因此不少上班族都将见客时间安排在下午三点,亦即是股市收市之后。因为彼此是"同道中人",三点后才见客,已是内地职场的"潜规则"。上海的保姆市场冷却,"上海阿姨"有一大半参与炒股,有近一成辞工专门炒股;钟点女工只肯接下午三点股市收盘后或周末的工作。而保姆在主人家天天与外界通电话打听行情,无心工作。 二○○五年底,也就是中国股市进入大牛市前夕,中国的银行储蓄存款达十四万亿元,因此投入股市的潜力很大。然而人行上海总部发布的一月上海市货币信贷运行报告显示,当月,上海本外币储蓄减少一九二亿元;中资银行同业存款则增加一三五六亿元,其中大部份是证券公司和结算公司的存放款。这说明主要资金并不是来自居民的储蓄存款。
据中国报章披露,掌控一万两千亿美元的国际对冲基金正以各种合法或非法途径大举杀入中国资本市场。地下外资"入市"规模可能高达八百亿美元(折合六千多亿人民币),主要来自欧美和日本。值得注意的是,他们全把资金推给"境外",有意回避"境内"资金。中国对境外资金都有限制,如果能进来这样多,也是"境内"有"里通外国"者。不论是境外还是境内,都和国内的特权阶层分不开。
由于股市热潮。垃圾也可以变为凤凰,一旦爆破,这些垃圾势必最早打回原形。而垃圾公司能被炒作,或者一般上市公司股价被炒作到远离其实际价值,都要拜大机构炒作所赐。他们的资金来源是金融机构,如银行、社保机构等,这也是为何要收紧银根的原因。但是为何不能从根本上惩处违规行动呢?因为他们都有后台。
中国股市是政府与国有企业的圈钱市场,因此本来就有"政策市"之说。为了"引钱出洞",他们必须投入资金造势,那是"吃小亏,占大便宜",因此小投资者只能量力而行,不能太贪心而上钩,因此千万不要借钱或典当房子来买股票。即使纽约、香港这样法制健全的国际金融中心,还是会被金融蛀虫钻空子,何况中国这样法制不彰、贪官污吏横行的中国市场!
一些中国专家与国际投资银行不断警告中国股市已经出现泡沫。泡沫的最好说明是中国A股与在香港上市的中国H股的差价。四月底,第二家以A、H股同步形式上市的中信银行首日挂牌,中港两地股价表现悬殊,H股收市升幅14%;A股升势则较招股价高出近一倍。香港的投资者比较理智,两者的差价,可能就是它的"泡沫",泡沫越大,当然越容易爆破。从中港股市平均市盈利(本益比)来说,香港约十四倍,中国是四十倍,中国的经济与企业前途再光明,差距也不会如此之大,何况中国上市公司有许多黑幕。
中国政府当然要尽量维持这个五彩缤纷的泡沫,让更多的民间资金投入,才可以多多榨取人民的财产。也由于中国市场的封闭,以及政府对市场的深度介入,因此在其他国家看来早该的爆破,在中国却可能显得更加绚丽壮观。然而天下无不散的宴席,股市不可能永远升,也不会永远跌,脱离实际的大升,跌起来必然很可怕。而参与的股民越多,对社会的震荡也越大。也许,中国社会矛盾的总爆发,就是从股市崩盘引发出来的呢。

林保华