Parallel Algorithm


Parallel Algorithm – Introduction

An algorithm is a sequence of steps that take inputs from the user and, after some computation, produce an output. A parallel algorithm is an algorithm that can execute several instructions simultaneously on different processing devices and then combine all the individual outputs to produce the final result.

Concurrent Processing

The easy availability of computers along with the growth of the Internet has changed the way we store and process data. We are living in a day and age where data is available in abundance. Every day we deal with huge volumes of data that require complex computing, and that too in quick time. Sometimes, we need to fetch data from similar or interrelated events that occur simultaneously. This is where we require concurrent processing that can divide a complex task and process it on multiple systems to produce the output in quick time.

Concurrent processing is essential where the task involves processing a huge bulk of complex data. Examples include accessing large databases, aircraft testing, astronomical calculations, atomic and nuclear physics, biomedical analysis, economic planning, image processing, robotics, weather forecasting, web-based services, etc.

What is Parallelism?

Parallelism is the process of processing several sets of instructions simultaneously. It reduces the total computational time. Parallelism can be implemented by using parallel computers, i.e. a computer with many processors. Parallel computers require parallel algorithms, programming languages, compilers and operating systems that support multitasking.

In this tutorial, we will discuss only parallel algorithms. Before moving further, let us first discuss algorithms and their types.

What is an Algorithm?

An algorithm is a sequence of instructions followed to solve a problem. While designing an algorithm, we should consider the architecture of the computer on which the algorithm will be executed. As per the architecture, there are two types of computers −

  • Sequential Computer
  • Parallel Computer

Depending on the architecture of computers, we have two types of algorithms −

  • Sequential Algorithm − An algorithm in which some consecutive steps of instructions are executed in a chronological order to solve a problem.

  • Parallel Algorithm − The problem is divided into sub-problems that are executed in parallel to get individual outputs. Later on, these individual outputs are combined together to get the final desired output.

It is not easy to divide a large problem into sub-problems. Sub-problems may have data dependencies among them. Therefore, the processors have to communicate with each other to solve the problem.

It has been found that the time the processors need to communicate with each other can exceed the actual processing time. So, while designing a parallel algorithm, proper CPU utilization should be considered to get an efficient algorithm.

To design an algorithm properly, we must have a clear idea of the basic model of computation in a parallel computer.

Model of Computation

Both sequential and parallel computers operate on a set (stream) of instructions called algorithms. These sets of instructions (algorithms) instruct the computer about what it has to do in each step.

Depending on the instruction stream and data stream, computers can be classified into four categories −

  • Single Instruction stream, Single Data stream (SISD) computers
  • Single Instruction stream, Multiple Data stream (SIMD) computers
  • Multiple Instruction stream, Single Data stream (MISD) computers
  • Multiple Instruction stream, Multiple Data stream (MIMD) computers

SISD Computers

SISD computers contain one control unit, one processing unit, and one memory unit.

SISD Computers

In this type of computer, the processor receives a single stream of instructions from the control unit and operates on a single stream of data from the memory unit. During computation, at each step, the processor receives one instruction from the control unit and operates on a single piece of data received from the memory unit.

SIMD Computers

SIMD computers contain one control unit, multiple processing units, and shared memory or an interconnection network.

SIMD Computers

Here, one single control unit sends instructions to all processing units. During computation, at each step, all the processors receive a single set of instructions from the control unit and operate on different sets of data from the memory unit.

Each of the processing units has its own local memory unit to store both data and instructions. In SIMD computers, processors need to communicate among themselves. This is done by shared memory or by an interconnection network.

While some of the processors execute a set of instructions, the remaining processors wait for their next set of instructions. Instructions from the control unit determine which processor will be active (execute instructions) or inactive (wait for the next instruction).

MISD Computers

As the name suggests, MISD computers contain multiple control units, multiple processing units, and one common memory unit.

MISD Computers

Here, each processor has its own control unit and they share a common memory unit. All the processors get instructions individually from their own control unit and they operate on a single stream of data as per the instructions they have received from their respective control units. These processors operate simultaneously.

MIMD Computers

MIMD computers have multiple control units, multiple processing units, and a shared memory or interconnection network.

MIMD Computers

Here, each processor has its own control unit, local memory unit, and arithmetic and logic unit. They receive different sets of instructions from their respective control units and operate on different sets of data.

Note

  • An MIMD computer that shares a common memory is known as a multiprocessor, while one that uses an interconnection network is known as a multicomputer.

  • Based on the physical distance between the processors, multicomputers are of two types −

    • Multicomputer − When all the processors are very close to one another (e.g., in the same room).

    • Distributed system − When all the processors are far away from one another (e.g., in different cities).

Parallel Algorithm – Analysis

Analysis of an algorithm helps us determine whether the algorithm is useful or not. Generally, an algorithm is analyzed based on its execution time (Time Complexity) and the amount of space it requires (Space Complexity).

Since we have sophisticated memory devices available at reasonable cost, storage space is no longer an issue. Hence, space complexity is not given so much importance.

Parallel algorithms are designed to improve the computation speed of a computer. For analyzing a parallel algorithm, we normally consider the following parameters −

  • Time complexity (Execution Time),
  • Total number of processors used, and
  • Total cost.

Time Complexity

The main reason behind developing parallel algorithms was to reduce the computation time of an algorithm. Thus, evaluating the execution time of an algorithm is extremely important in analyzing its efficiency.

Execution time is measured on the basis of the time taken by the algorithm to solve a problem. The total execution time is calculated from the moment when the algorithm starts executing to the moment it stops. If all the processors do not start or end execution at the same time, then the total execution time of the algorithm is from the moment the first processor starts its execution to the moment the last processor stops its execution.

Time complexity of an algorithm can be classified into three categories −

  • Worst-case complexity − When the amount of time required by an algorithm for a given input is maximum.

  • Average-case complexity − When the amount of time required by an algorithm for a given input is average.

  • Best-case complexity − When the amount of time required by an algorithm for a given input is minimum.

Asymptotic Analysis

The complexity or efficiency of an algorithm is the number of steps executed by the algorithm to get the desired output. Asymptotic analysis is done to calculate the complexity of an algorithm in its theoretical analysis. In asymptotic analysis, a large input length is used to calculate the complexity function of the algorithm.

Note − Asymptotic is a condition in which a line tends to meet a curve, but they do not intersect. Here the line and the curve are asymptotic to each other.

Asymptotic notation is the easiest way to describe the fastest and slowest possible execution times of an algorithm using upper and lower bounds on speed. For this, we use the following notations −

  • Big O notation
  • Omega notation
  • Theta notation

Big O notation

In mathematics, Big O notation is used to represent the asymptotic characteristics of functions. It represents the behavior of a function for large inputs in a simple and accurate way. It is a method of representing the upper bound of an algorithm's execution time − the longest amount of time the algorithm could take to complete its execution. The function −

f(n) = O(g(n))

iff there exist positive constants c and n0 such that f(n) ≤ c * g(n) for all n where n ≥ n0.
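
For example, if f(n) = 3n^2 + 5, then f(n) = O(n^2), because 3n^2 + 5 ≤ 4n^2 for every n ≥ 3; the definition is satisfied with c = 4 and n0 = 3.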

Omega notation

Omega notation is a method of representing the lower bound of an algorithm’s execution time. The function −

f(n) = Ω (g(n))

iff there exist positive constants c and n0 such that f(n) ≥ c * g(n) for all n where n ≥ n0.

Theta Notation

Theta notation is a method of representing both the lower bound and the upper bound of an algorithm’s execution time. The function −

f(n) = θ(g(n))

iff there exist positive constants c1, c2, and n0 such that c1 * g(n) ≤ f(n) ≤ c2 * g(n) for all n where n ≥ n0.

Speedup of an Algorithm

The performance of a parallel algorithm is determined by calculating its speedup. Speedup is defined as the ratio of the worst-case execution time of the fastest known sequential algorithm for a particular problem to the worst-case execution time of the parallel algorithm.

Speedup = (Worst-case execution time of the fastest known sequential algorithm for a particular problem) / (Worst-case execution time of the parallel algorithm)
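
For instance, if the fastest known sequential algorithm takes 80 seconds on a given input and the parallel algorithm takes 20 seconds on the same input, the speedup is 80 / 20 = 4.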

Number of Processors Used

The number of processors used is an important factor in analyzing the efficiency of a parallel algorithm. The cost to buy, maintain, and run the computers is calculated. The larger the number of processors used by an algorithm to solve a problem, the more costly the obtained result becomes.

Total Cost

Total cost of a parallel algorithm is the product of time complexity and the number of processors used in that particular algorithm.

Total Cost = Time complexity × Number of processors used
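
As a quick illustration, adding n numbers with n processors can be done in O(log n) time, so the total cost is O(n log n), whereas a single processor does the same job in O(n) time; comparing the two costs shows how much extra work the parallel version spends to gain speed.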

Therefore, the efficiency of a parallel algorithm is −

Efficiency = (Worst-case execution time of the sequential algorithm) / (Worst-case execution time of the parallel algorithm)

Parallel Algorithm – Models

The model of a parallel algorithm is developed by considering a strategy for dividing the data and the processing method, and by applying a suitable strategy to reduce interactions. In this chapter, we will discuss the following parallel algorithm models −

  • Data parallel model
  • Task graph model
  • Work pool model
  • Master-slave model
  • Producer-consumer or pipeline model
  • Hybrid model

Data Parallel

In the data parallel model, tasks are assigned to processes and each task performs similar types of operations on different data. Data parallelism is a consequence of a single operation being applied to multiple data items.

The data-parallel model can be applied to shared-address spaces and message-passing paradigms. In the data-parallel model, interaction overheads can be reduced by selecting a locality-preserving decomposition, by using optimized collective interaction routines, or by overlapping computation with interaction.

The primary characteristic of data-parallel model problems is that the intensity of data parallelism increases with the size of the problem, which in turn makes it possible to use more processes to solve larger problems.

Example − Dense matrix multiplication.

Data Parallel Model

Task Graph Model

In the task graph model, parallelism is expressed by a task graph. A task graph can be either trivial or nontrivial. In this model, the correlation among the tasks is utilized to promote locality or to minimize interaction costs. This model is used to solve problems in which the quantity of data associated with the tasks is large compared to the amount of computation associated with them. The tasks are assigned in a way that helps reduce the cost of data movement among the tasks.

Examples − Parallel quick sort, sparse matrix factorization, and parallel algorithms derived via the divide-and-conquer approach.

Task Graph Model

Here, problems are divided into atomic tasks and implemented as a graph. Each task is an independent unit of work that has dependencies on one or more antecedent tasks. After the completion of a task, the output of an antecedent task is passed to the dependent task. A task with an antecedent task starts execution only when its entire antecedent task is completed. The final output of the graph is received when the last dependent task is completed (Task 6 in the above figure).

Work Pool Model

In the work pool model, tasks are dynamically assigned to the processes for balancing the load. Therefore, any process may potentially execute any task. This model is used when the quantity of data associated with tasks is comparatively smaller than the computation associated with the tasks.

There is no desired pre-assigning of tasks onto the processes. Assigning of tasks is centralized or decentralized. Pointers to the tasks are saved in a physically shared list, in a priority queue, or in a hash table or tree, or they could be saved in a physically distributed data structure.

Tasks may be available in the beginning, or may be generated dynamically. If tasks are generated dynamically and the assignment of tasks is decentralized, then a termination detection algorithm is required so that all the processes can actually detect the completion of the entire program and stop looking for more tasks.

Example − Parallel tree search

Work Pool Model

Master-Slave Model

In the master-slave model, one or more master processes generate tasks and allocate them to slave processes. The tasks may be allocated beforehand if −

  • the master can estimate the volume of the tasks, or
  • a random assignment can do a satisfactory job of balancing the load, or
  • slaves are assigned smaller pieces of work at different times.

This model is generally equally suitable to shared-address-space or message-passing paradigms, since the interaction is naturally two-way.

In some cases, a task may need to be completed in phases, and the task in each phase must be completed before the tasks in the next phases can be generated. The master-slave model can be generalized to a hierarchical or multi-level master-slave model in which the top-level master feeds a large portion of the tasks to the second-level masters, who further subdivide the tasks among their own slaves and may perform a part of the task themselves.

Master-Slave Model

Precautions in using the master-slave model

Care should be taken to ensure that the master does not become a bottleneck, which may happen if the tasks are too small or the workers are comparatively fast.

The tasks should be selected in a way that the cost of performing a task dominates the cost of communication and the cost of synchronization.

Asynchronous interaction may help overlap interaction with the computation associated with work generation by the master.

Pipeline Model

It is also known as the producer-consumer model. Here a set of data is passed on through a series of processes, each of which performs some task on it. Here, the arrival of new data triggers the execution of a new task by a process in the queue. The processes could form a queue in the shape of linear or multidimensional arrays, trees, or general graphs with or without cycles.

This model is a chain of producers and consumers. Each process in the queue can be considered as a consumer of a sequence of data items for the process preceding it in the queue and as a producer of data for the process following it in the queue. The queue does not need to be a linear chain; it can be a directed graph. The most common interaction minimization technique applicable to this model is overlapping interaction with computation.

Example − Parallel LU factorization algorithm.

Pipeline Model

Hybrid Models

A hybrid algorithm model is required when more than one model may be needed to solve a problem.

A hybrid model may be composed of either multiple models applied hierarchically or multiple models applied sequentially to different phases of a parallel algorithm.

Example − Parallel quick sort

Parallel Random Access Machines

Parallel Random Access Machine (PRAM) is a model that is considered for most of the parallel algorithms. Here, multiple processors are attached to a single block of memory. A PRAM model contains −

  • A set of similar type of processors.

  • All the processors share a common memory unit. Processors can communicate among themselves through the shared memory only.

  • A memory access unit (MAU) connects the processors with the single shared memory.

PRAM Architecture

Here, n number of processors can perform independent operations on n number of data in a particular unit of time. This may result in simultaneous access of the same memory location by different processors.

To solve this problem, the following constraints have been enforced on the PRAM model −

  • Exclusive Read Exclusive Write (EREW) − Here no two processors are allowed to read from or write to the same memory location at the same time.

  • Exclusive Read Concurrent Write (ERCW) − Here no two processors are allowed to read from the same memory location at the same time, but they are allowed to write to the same memory location at the same time.

  • Concurrent Read Exclusive Write (CREW) − Here all the processors are allowed to read from the same memory location at the same time, but are not allowed to write to the same memory location at the same time.

  • Concurrent Read Concurrent Write (CRCW) − All the processors are allowed to read from or write to the same memory location at the same time.

There are many methods to implement the PRAM model, but the most prominent ones are −

  • Shared memory model
  • Message passing model
  • Data parallel model

Shared Memory Model

Shared memory emphasizes control parallelism more than data parallelism. In the shared memory model, multiple processes execute on different processors independently, but they share a common memory space. Due to any processor activity, if there is any change in any memory location, it is visible to the rest of the processors.

As multiple processors access the same memory location, it may happen that at any particular point of time, more than one processor is accessing the same memory location. Suppose one is reading that location and the other is writing to that location. It may create confusion. To avoid this, some control mechanism, like a lock / semaphore, is implemented to ensure mutual exclusion.

Shared Memory Model

Shared memory programming has been implemented in the following −

  • Thread libraries − A thread library allows multiple threads of control that run concurrently in the same memory space. A thread library provides an interface that supports multithreading through a library of subroutines. It contains subroutines for −

    • Creating and destroying threads
    • Scheduling execution of threads
    • Passing data and messages between threads
    • Saving and restoring thread contexts

Examples of thread libraries include − SolarisTM threads for Solaris, POSIX threads as implemented in Linux, Win32 threads available in Windows NT and Windows 2000, and JavaTM threads as part of the standard JavaTM Development Kit (JDK).
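
As a small illustration of the thread-library approach, the following sketch uses POSIX threads to sum disjoint slices of a shared array; the array size, thread count, and helper names are chosen for the example only.

#include <pthread.h>
#include <stdio.h>

#define N 8            /* number of elements (illustrative)       */
#define THREADS 4      /* number of worker threads (illustrative) */

static int data[N] = {1, 2, 3, 4, 5, 6, 7, 8};
static long partial[THREADS];          /* one result slot per thread */

/* Each thread sums one contiguous slice of the shared array. */
static void *worker(void *arg)
{
   long id = (long)arg;
   long sum = 0;
   int chunk = N / THREADS;
   for (int i = id * chunk; i < (id + 1) * chunk; i++)
      sum += data[i];
   partial[id] = sum;                  /* no lock needed: slots are disjoint */
   return NULL;
}

int main(void)
{
   pthread_t tid[THREADS];
   for (long id = 0; id < THREADS; id++)
      pthread_create(&tid[id], NULL, worker, (void *)id);

   long total = 0;
   for (long id = 0; id < THREADS; id++) {
      pthread_join(tid[id], NULL);     /* wait for the thread to finish */
      total += partial[id];
   }
   printf("total = %ld\n", total);     /* prints 36 */
   return 0;
}

All threads read and write the same process memory, which is exactly the shared-memory style described above.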

  • Distributed Shared Memory (DSM) Systems − DSM systems create an abstraction of shared memory on loosely coupled architectures in order to implement shared memory programming without hardware support. They implement standard libraries and use the advanced user-level memory management features present in modern operating systems. Examples include TreadMarks, Munin, IVY, Shasta, Brazos, and Cashmere.

  • Program Annotation Packages − This is implemented on architectures having uniform memory access characteristics. The most notable example of a program annotation package is OpenMP. OpenMP implements functional parallelism. It mainly focuses on the parallelization of loops, as sketched below.
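
A minimal sketch of the annotation style, assuming a C compiler with OpenMP support (compile with -fopenmp or the equivalent flag); the loop itself is only illustrative:

#include <stdio.h>

int main(void)
{
   int n = 1000;
   long sum = 0;

   /* The pragma annotates an ordinary loop; an OpenMP compiler splits
      the iterations among threads and combines the partial sums.     */
   #pragma omp parallel for reduction(+:sum)
   for (int i = 1; i <= n; i++)
      sum += i;

   printf("sum = %ld\n", sum);   /* prints 500500 */
   return 0;
}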

The concept of shared memory provides low-level control of a shared memory system, but it tends to be tedious and error-prone. It is more applicable for system programming than application programming.

Merits of Shared Memory Programming

  • Global address space gives a user-friendly programming approach to memory.

  • Due to the closeness of memory to the CPU, data sharing among processes is fast and uniform.

  • There is no need to specify explicitly the communication of data among processes.

  • Process-communication overhead is negligible.

  • It is very easy to learn.

Demerits of Shared Memory Programming

  • It is not portable.
  • Managing data locality is very difficult.

Message Passing Model

Message passing is the most commonly used parallel programming approach in distributed memory systems. Here, the programmer has to determine the parallelism. In this model, all the processors have their own local memory unit and they exchange data through a communication network.

Message Passing Model

Processors use message-passing libraries for communication among themselves. Along with the data being sent, the message contains the following components −

  • The address of the processor from which the message is being sent;

  • Starting address of the memory location of the data in the sending processor;

  • Data type of the data being sent;

  • Data size of the data being sent;

  • The address of the processor to which the message is being sent;

  • Starting address of the memory location for the data in the receiving processor.

Processors can communicate with each other by any of the following methods −

  • Point-to-Point Communication
  • Collective Communication
  • Message Passing Interface

Point-to-Point Communication

Point-to-point communication is the simplest form of message passing. Here, a message can be sent from the sending processor to a receiving processor by any of the following transfer modes (a short MPI sketch appears after the list) −

  • Synchronous mode − The next message is sent only after receiving a confirmation that the previous message has been delivered, to maintain the sequence of the messages.

  • Asynchronous mode − To send the next message, receipt of the confirmation of the delivery of the previous message is not required.
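
A minimal point-to-point sketch using MPI, assuming an MPI implementation is installed (the tag, buffer, and ranks are illustrative). Process 0 sends an integer and process 1 receives it with matching blocking calls; run it with something like mpirun -np 2:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
   int rank, value;
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   if (rank == 0) {
      value = 42;
      /* blocking send to rank 1, tag 0 */
      MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
   } else if (rank == 1) {
      /* blocking receive from rank 0, tag 0 */
      MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      printf("rank 1 received %d\n", value);
   }

   MPI_Finalize();
   return 0;
}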

Collective Communication

Collective communication involves more than two processors for message passing. The following modes allow collective communications −

  • Barrier − Barrier mode is possible if all the processors included in the communication run a particular block (known as the barrier block) for message passing.

  • Broadcast − Broadcasting is of two types −

    • One-to-all − Here, one processor with a single operation sends the same message to all other processors.

    • All-to-all − Here, all processors send messages to all other processors.

Messages that are broadcast may be of three types −

  • Personalized − Unique messages are sent to all other destination processors.

  • Non-personalized − All the destination processors receive the same message.

  • Reduction − In reduction broadcasting, one processor of the group collects all the messages from all the other processors in the group and combines them into a single message which all the other processors in the group can access (see the sketch below).
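
The following sketch, again assuming an MPI installation, shows a one-to-all broadcast followed by a reduction that sums one value from every process on the root; the values are illustrative:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
   int rank, n = 0, global_sum = 0;
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   if (rank == 0)
      n = 10;                       /* value known only to the root */

   /* one-to-all broadcast: every process now holds n = 10 */
   MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

   int local = rank + n;            /* each process contributes one value */

   /* reduction: the root collects and sums all the contributions */
   MPI_Reduce(&local, &global_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

   if (rank == 0)
      printf("sum over all processes = %d\n", global_sum);

   MPI_Finalize();
   return 0;
}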

Merits of Message Passing

  • Provides low-level control of parallelism;
  • It is portable;
  • Less error prone;
  • Less overhead in parallel synchronization and data distribution.

Demerits of Message Passing

  • As compared to parallel shared-memory code, message-passing code generally needs more software overhead.

Message Passing Libraries

There are many message-passing libraries. Here, we will discuss two of the most widely used message-passing libraries −

  • Message Passing Interface (MPI)
  • Parallel Virtual Machine (PVM)

Message Passing Interface (MPI)

It is a universal standard to provide communication among all the concurrent processes in a distributed memory system. Most of the commonly used parallel computing platforms provide at least one implementation of the message passing interface. It has been implemented as a collection of predefined functions called a library and can be called from languages such as C, C++, Fortran, etc. MPI implementations are both fast and portable compared to other message passing libraries.

Merits of Message Passing Interface

  • Runs on shared memory or distributed memory architectures;

  • Each processor has its own local variables;

  • As compared to large shared memory computers, distributed memory computers are less expensive.

Demerits of Message Passing Interface

  • More programming changes are required for a parallel algorithm;
  • Sometimes difficult to debug; and
  • Does not perform well in the communication network between the nodes.

Parallel Virtual Machine (PVM)

PVM is a portable message passing system designed to connect separate heterogeneous host machines to form a single virtual machine. It is a single manageable parallel computing resource. Large computational problems like superconductivity studies, molecular dynamics simulations, and matrix algorithms can be solved more cost-effectively by using the memory and the aggregate power of many computers. It manages all message routing, data conversion, and task scheduling in the network of incompatible computer architectures.

Features of PVM

  • Very easy to install and configure;
  • Multiple users can use PVM at the same time;
  • One user can execute multiple applications;
  • It is a small package;
  • Supports C, C++, Fortran;
  • For a given run of a PVM program, users can select the group of machines;
  • It is a message-passing model;
  • Process-based computation;
  • Supports heterogeneous architectures.

Data Parallel Programming

The major focus of the data parallel programming model is on performing operations on a data set simultaneously. The data set is organized into some structure like an array, hypercube, etc. Processors perform operations collectively on the same data structure. Each task is performed on a different partition of the same data structure.

It is restrictive, as not all algorithms can be specified in terms of data parallelism. This is the reason why data parallelism is not universal.

Data parallel languages help to specify the data decomposition and mapping to the processors. They also include data distribution statements that allow the programmer to have control over the data − for example, which data will go on which processor − to reduce the amount of communication among the processors.

Parallel Algorithm – Structure

To apply any algorithm properly, it is very important that you select a proper data structure, because a particular operation performed on one data structure may take more time than the same operation performed on another data structure.

Example − To access the ith element in a set, an array may take constant time, whereas a linked list may need time proportional to the position of the element.

Therefore, the selection of a data structure must be done considering the architecture and the type of operations to be performed.

The following data structures are commonly used in parallel programming −

  • Linked List
  • Arrays
  • Hypercube Network

Linked List

A linked list is a data structure having zero or more nodes connected by pointers. Nodes may or may not occupy consecutive memory locations. Each node has two or three parts − one data part that stores the data and the other parts are link fields that store the address of the previous or next node. The first node's address is stored in an external pointer called the head. The last node, known as the tail, generally does not contain any address.

There are three types of linked lists −

  • Singly Linked List
  • Doubly Linked List
  • Circular Linked List

Singly Linked List

A node of a singly linked list contains data and the address of the next node. An external pointer called the head stores the address of the first node.

Singly Linked List
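
A minimal sketch of a singly linked list in C; the node fields and helper function are illustrative:

#include <stdio.h>
#include <stdlib.h>

/* one node: a data part and the address of the next node */
struct node {
   int data;
   struct node *next;
};

/* insert a new node at the front and return the new head */
static struct node *push(struct node *head, int value)
{
   struct node *n = malloc(sizeof *n);
   n->data = value;
   n->next = head;
   return n;
}

int main(void)
{
   struct node *head = NULL;          /* external pointer to the first node */
   for (int i = 3; i >= 1; i--)
      head = push(head, i);           /* builds 1 -> 2 -> 3 */

   for (struct node *p = head; p != NULL; p = p->next)
      printf("%d ", p->data);         /* prints 1 2 3 */
   printf("\n");

   while (head) {                     /* release the nodes */
      struct node *t = head;
      head = head->next;
      free(t);
   }
   return 0;
}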

Doubly Linked List

A node of a doubly linked list contains data and the addresses of both the previous and the next node. An external pointer called the head stores the address of the first node and an external pointer called the tail stores the address of the last node.

Doubly Linked List

Circular Linked List

A circular linked list is very similar to the singly linked list except that the last node stores the address of the first node.

Circular Linked List

Arrays

An array is a data structure where we can store similar types of data. It can be one-dimensional or multi-dimensional. Arrays can be created statically or dynamically.

  • In statically declared arrays, the dimension and size of the array are known at the time of compilation.

  • In dynamically declared arrays, the dimension and size of the array are known at runtime.

For shared memory programming, arrays can be used as a common memory, and for data parallel programming, they can be used by partitioning them into sub-arrays.
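
A quick sketch of the two declaration styles in C (the sizes are illustrative):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
   int fixed[10];                                /* static: size fixed at compile time */

   int n = 10;                                   /* could come from user input */
   int *dynamic = malloc(n * sizeof *dynamic);   /* dynamic: size chosen at runtime */

   for (int i = 0; i < n; i++) {
      fixed[i] = i;
      dynamic[i] = i * i;
   }
   printf("%d %d\n", fixed[9], dynamic[9]);      /* prints 9 81 */

   free(dynamic);
   return 0;
}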

Hypercube Network

Hypercube architecture is helpful for those parallel algorithms where each task has to communicate with other tasks. Hypercube topology can easily embed other topologies such as ring and mesh. It is also known as n-cubes, where n is the number of dimensions. A hypercube can be constructed recursively.

Hypercube
Hypercube 1

Parallel Algorithm – Design Techniques

Selecting a proper design technique for a parallel algorithm is the most difficult and important task. Most parallel programming problems may have more than one solution. In this chapter, we will discuss the following design techniques for parallel algorithms −

  • Divide and conquer
  • Greedy Method
  • Dynamic Programming
  • Backtracking
  • Branch & Bound
  • Linear Programming

Divide and Conquer Method

In the divide and conquer approach, the problem is divided into several small sub-problems. Then the sub-problems are solved recursively and combined to get the solution of the original problem.

The divide and conquer approach involves the following steps at each level −

  • Divide − The original problem is divided into sub-problems.

  • Conquer − The sub-problems are solved recursively.

  • Combine − The solutions of the sub-problems are combined together to get the solution of the original problem.

The divide and conquer approach is applied in the following algorithms −

  • Binary search
  • Quick sort
  • Merge sort
  • Integer multiplication
  • Matrix inversion
  • Matrix multiplication

Greedy Method

In a greedy algorithm, the best available solution is chosen at each moment. A greedy algorithm is very easy to apply to complex problems. It decides which step will provide the most accurate solution in the next step.

This algorithm is called greedy because when the optimal solution to the smaller instance is provided, the algorithm does not consider the complete program as a whole. Once a solution is considered, the greedy algorithm never considers the same solution again.

A greedy algorithm works recursively, creating a group of objects from the smallest possible component parts. Recursion is a procedure to solve a problem in which the solution to a specific problem is dependent on the solution of a smaller instance of that problem.

Dynamic Programming

Dynamic programming is an optimization technique which divides the problem into smaller sub-problems and, after solving each sub-problem, combines all the solutions to get the ultimate solution. Unlike the divide and conquer method, dynamic programming reuses the solutions to the sub-problems many times.

Computing the Fibonacci series while storing the already-computed values is an example of dynamic programming.

Backtracking Algorithm

Backtracking is an optimization technique to solve combinatorial problems. It is applied to both programmatic and real-life problems. The eight queens problem, the Sudoku puzzle, and traversing a maze are popular examples where the backtracking algorithm is used.

In backtracking, we start with a possible solution, which satisfies all the required conditions. Then we move to the next level and if that level does not produce a satisfactory solution, we return one level back and start with a new option.

Branch and Bound

A branch and bound algorithm is an optimization technique to get an optimal solution to the problem. It looks for the best solution for a given problem in the entire space of solutions. The bounds on the function to be optimized are merged with the value of the latest best solution, which lets the algorithm rule out regions of the solution space that cannot contain a better solution.

The purpose of a branch and bound search is to maintain the lowest-cost path to a target. Once a solution is found, it can keep improving the solution. Branch and bound search is implemented in depth-bounded search and depth-first search.

Linear Programming

Linear programming describes a wide class of optimization jobs where both the optimization criterion and the constraints are linear functions. It is a technique to get the best outcome, like maximum profit, shortest path, or lowest cost.

In this programming, we have a set of variables and we have to assign values to them to satisfy a set of linear equations and to maximize or minimize a given linear objective function.

Parallel Algorithm – Matrix Multiplication

A matrix is a set of numerical and non-numerical data arranged in a fixed number of rows and columns. Matrix multiplication is an important computation pattern in parallel algorithms. Here, we will discuss the implementation of matrix multiplication on different communication networks like mesh and hypercube. Mesh and hypercube have higher network connectivity, so they allow faster algorithms than other networks like the ring network.

Mesh Network

A topology where a set of nodes forms a p-dimensional grid is called a mesh topology. Here, all the edges are parallel to the grid axes and all adjacent nodes can communicate among themselves.

Total number of nodes = (number of nodes in a row) × (number of nodes in a column)

A mesh network can be evaluated using the following factors −

  • Diameter
  • Bisection width

Diameter − In a mesh network, the longest distance between two nodes is its diameter. A p-dimensional mesh network having k^p nodes has a diameter of p(k–1).

Bisection width − Bisection width is the minimum number of edges that need to be removed from a network to divide the mesh network into two halves.

Matrix Multiplication Using Mesh Network

We have considered a 2D mesh network SIMD model having wraparound connections. We will design an algorithm to multiply two n × n arrays using n^2 processors in a particular amount of time.

Matrices A and B have elements aij and bij respectively. Processing element PEij represents aij and bij. Arrange the matrices A and B in such a way that every processor has a pair of elements to multiply. The elements of matrix A will move in the left direction and the elements of matrix B will move in the upward direction. These changes in the position of the elements in matrices A and B present each processing element, PE, a new pair of values to multiply.

Steps in Algorithm

  • Stagger the two matrices.
  • Calculate all products, aik × bkj
  • Calculate the sums when step 2 is complete.

Algorithm

Procedure MatrixMulti

Begin
   for k = 1 to n-1

      for all Pij; where i and j range from 1 to n
         if i is greater than k then
            rotate a in left direction
         end if

         if j is greater than k then
            rotate b in the upward direction
         end if

   for all Pij; where i and j lie between 1 and n
      compute the product of a and b and store it in c

   for k = 1 to n-1 step 1
      for all Pij; where i and j range from 1 to n
         rotate a in left direction
         rotate b in the upward direction
         c = c + a × b
End
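
For comparison with the mesh algorithm above, here is a hedged shared-memory sketch of the same computation − a plain parallel matrix multiplication using an OpenMP annotation rather than a mesh network; the matrix size and test values are illustrative:

#include <stdio.h>

#define N 3

int main(void)
{
   double a[N][N], b[N][N], c[N][N];

   /* fill A and B with simple test values */
   for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++) {
         a[i][j] = i + j;
         b[i][j] = (i == j) ? 1.0 : 0.0;   /* B is the identity matrix */
      }

   /* each (i, j) entry of C can be computed independently */
   #pragma omp parallel for collapse(2)
   for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++) {
         double sum = 0.0;
         for (int k = 0; k < N; k++)
            sum += a[i][k] * b[k][j];
         c[i][j] = sum;
      }

   printf("c[1][2] = %g\n", c[1][2]);      /* equals a[1][2] = 3 */
   return 0;
}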

Hypercube Network

A hypercube is an n-dimensional construct where edges are perpendicular among themselves and are of the same length. An n-dimensional hypercube is also known as an n-cube or an n-dimensional cube.

Features of a Hypercube with 2^k nodes

  • Diameter = k
  • Bisection width = 2^(k−1)
  • Number of edges per node = k

Matrix Multiplication using Hypercube Network

General specification of hypercube networks −

  • Let N = 2^m be the total number of processors. Let the processors be P0, P1 … PN-1.

  • Let i and i^b be two integers, 0 ≤ i, i^b ≤ N−1, whose binary representations differ only in position b, 0 ≤ b ≤ k−1.

  • Let us consider two n × n matrices, matrix A and matrix B.

  • Step 1 − The elements of matrix A and matrix B are assigned to the n^3 processors such that the processor in position (i, j, k) will have aji and bik.

  • Step 2 − All the processors in position (i, j, k) compute the product

    C(i,j,k) = A(i,j,k) × B(i,j,k)

  • Step 3 − The sum C(0,j,k) = ΣC(i,j,k) for 0 ≤ i ≤ n−1, where 0 ≤ j, k ≤ n−1.

Block Matrix

A block matrix or partitioned matrix is a matrix where each element itself represents an individual matrix. These individual sections are known as blocks or sub-matrices.

Example

Block Matrix
Block Matrix 1

In Figure (a), X is a block matrix where A, B, C, D are matrices themselves. Figure (f) shows the complete matrix.

Block Matrix Multiplication

When two block matrices are square matrices, they are multiplied just the way we perform simple matrix multiplication. For example,

Block Matrix Multiplication

Parallel Algorithm – Sorting

Sorting is a process of arranging elements in a group in a particular order, i.e., ascending order, descending order, alphabetical order, etc. Here we will discuss the following −

  • Enumeration Sort
  • Odd-Even Transposition Sort
  • Parallel Merge Sort
  • Hyper Quick Sort

Sorting a list of elements is a very common operation. A sequential sorting algorithm may not be efficient enough when we have to sort a huge volume of data. Therefore, parallel algorithms are used in sorting.

Enumeration Sort

Enumeration sort is a method of arranging all the elements in a list by finding the final position of each element in the sorted list. It is done by comparing each element with all other elements and finding the number of elements having a smaller value.

Therefore, for any two elements ai and aj, exactly one of the following cases must be true −

  • ai < aj
  • ai > aj
  • ai = aj

Algorithm

procedure ENUM_SORTING (n)

begin
   for each process P1,j do
      C[j] := 0;

   for each process Pi,j do

      if (A[i] < A[j]) or (A[i] = A[j] and i < j) then
         C[j] := 1;
      else
         C[j] := 0;

   for each process P1,j do
      A[C[j]] := A[j];

end ENUM_SORTING
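
A sequential C sketch of the same idea: each element's final position is its rank, i.e., the count of elements that must precede it (the array contents are illustrative). The outer loop over j is independent across iterations, which is exactly the work a parallel version would distribute across processes:

#include <stdio.h>

#define N 5

int main(void)
{
   int a[N] = {40, 10, 30, 10, 20};
   int sorted[N];

   /* for every element, count how many elements must come before it */
   for (int j = 0; j < N; j++) {
      int rank = 0;
      for (int i = 0; i < N; i++)
         if (a[i] < a[j] || (a[i] == a[j] && i < j))
            rank++;                    /* ties broken by original index */
      sorted[rank] = a[j];             /* place a[j] at its final position */
   }

   for (int j = 0; j < N; j++)
      printf("%d ", sorted[j]);        /* prints 10 10 20 30 40 */
   printf("\n");
   return 0;
}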

Odd-Even Transposition Sort

Odd-Even Transposition Sort is based on the Bubble Sort technique. It compares two adjacent numbers and switches them if the first number is greater than the second number, so as to get an ascending order list. The opposite applies for a descending order sequence. Odd-even transposition sort operates in two phases − an odd phase and an even phase. In both phases, processes exchange numbers with the adjacent number on the right.

Odd-Even Transposition Sort

Algorithm

procedure ODD-EVEN_PAR (n)

begin
   id := process's label

   for i := 1 to n do
   begin

      if i is odd and id is odd then
         compare-exchange_min(id + 1);
      else
         compare-exchange_max(id - 1);

      if i is even and id is even then
         compare-exchange_min(id + 1);
      else
         compare-exchange_max(id - 1);

   end for

end ODD-EVEN_PAR
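
A shared-memory C sketch of the same phases, written sequentially for clarity (in a true parallel version each compare-exchange pair is handled by a different process; the array is illustrative):

#include <stdio.h>

#define N 6

static void compare_exchange(int *a, int i)   /* swap a[i], a[i+1] if out of order */
{
   if (a[i] > a[i + 1]) {
      int t = a[i];
      a[i] = a[i + 1];
      a[i + 1] = t;
   }
}

int main(void)
{
   int a[N] = {5, 3, 8, 1, 9, 2};

   for (int phase = 0; phase < N; phase++) {
      int start = (phase % 2 == 0) ? 1 : 0;   /* alternate odd and even phases */
      for (int i = start; i + 1 < N; i += 2)
         compare_exchange(a, i);              /* pairs are disjoint, so they
                                                 could all run in parallel   */
   }

   for (int i = 0; i < N; i++)
      printf("%d ", a[i]);                    /* prints 1 2 3 5 8 9 */
   printf("\n");
   return 0;
}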

Parallel Merge Sort

Merge sort first divides the unsorted list into the smallest possible sub-lists, compares each with the adjacent list, and merges them in sorted order. It implements parallelism very well by following the divide and conquer algorithm.

Parallel Merge Sort

Algorithm

procedure parallelmergesort(id, n, data, newdata)

begin
   data = sequentialmergesort(data)

      for dim = 1 to n
         data = parallelmerge(id, dim, data)
      endfor

   newdata = data
end

Hyper Quick Sort

Hyper quick sort is an implementation of quick sort on a hypercube. Its steps are as follows −

  • Divide the unsorted list among the nodes.
  • Sort each node locally.
  • From node 0, broadcast the median value.
  • Split each list locally, then exchange the halves across the highest dimension.
  • Repeat steps 3 and 4 in parallel until the dimension reaches 0.

Algorithm

procedure HYPERQUICKSORT (B, n)
begin

   id := process's label;

   for i := 1 to d do
      begin
      x := pivot;
      partition B into B1 and B2 such that B1 ≤ x < B2;
      if ith bit is 0 then

      begin
         send B2 to the process along the ith communication link;
         C := subsequence received along the ith communication link;
         B := B1 U C;
      endif

      else
         send B1 to the process along the ith communication link;
         C := subsequence received along the ith communication link;
         B := B2 U C;
         end else
      end for

   sort B using sequential quicksort;

end HYPERQUICKSORT

Parallel Search Algorithm

Searching is one of the fundamental operations in computer science. It is used in all applications where we need to find whether an element is in a given list or not. In this chapter, we will discuss the following search algorithms −

  • Divide and Conquer
  • Depth-First Search
  • Breadth-First Search
  • Best-First Search

Divide and Conquer

In the divide and conquer approach, the problem is divided into several small sub-problems. Then the sub-problems are solved recursively and combined to get the solution of the original problem.

The divide and conquer approach involves the following steps at each level −

  • Divide − The original problem is divided into sub-problems.

  • Conquer − The sub-problems are solved recursively.

  • Combine − The solutions of the sub-problems are combined to get the solution of the original problem.

Binary search is an example of a divide and conquer algorithm.

Pseudocode

BinarySearch(a, b, low, high)

if low > high then
   return NOT FOUND
else
   mid ← (low+high) / 2
   if b = key(mid) then
      return key(mid)
   else if b < key(mid) then
      return BinarySearch(a, b, low, mid−1)
   else

      return BinarySearch(a, b, mid+1, high)
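
A hedged C version of the same search, written iteratively over a sorted integer array (the array and the target are illustrative):

#include <stdio.h>

/* returns the index of target in a[0..n-1], or -1 if not found */
static int binary_search(const int a[], int n, int target)
{
   int low = 0, high = n - 1;
   while (low <= high) {
      int mid = low + (high - low) / 2;   /* avoids overflow of low + high */
      if (a[mid] == target)
         return mid;
      else if (target < a[mid])
         high = mid - 1;                  /* search the left half */
      else
         low = mid + 1;                   /* search the right half */
   }
   return -1;                             /* NOT FOUND */
}

int main(void)
{
   int a[] = {2, 5, 8, 12, 16, 23, 38};
   printf("%d\n", binary_search(a, 7, 23));   /* prints 5 */
   return 0;
}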

Depth-First Search

Depth-First Search (or DFS) is an algorithm for searching a tree or an undirected graph data structure. Here, the concept is to start from the starting node known as the root and traverse as far as possible in the same branch. If we reach a node with no successor node, we return and continue with the vertex which is yet to be visited.

Steps of Depth-First Search

  • Consider a node (root) that has not been visited previously and mark it visited.

  • Visit the first adjacent successor node and mark it visited.

  • If all the successor nodes of the considered node are already visited or it does not have any more successor nodes, return to its parent node.

Pseudocode

Let v be the vertex where the search starts in graph G.

DFS(G,v)

   Stack S := {};

   for each vertex u, set visited[u] := false;
   push S, v;
   while (S is not empty) do
      u := pop S;

      if (not visited[u]) then
         visited[u] := true;
         for each unvisited neighbour w of u
            push S, w;
      end if

   end while

END DFS()

Breadth-First Search

Breadth-First Search (or BFS) is an algorithm for searching a tree or an undirected graph data structure. Here, we start with a node and then visit all the adjacent nodes in the same level, and then move to the adjacent successor nodes in the next level. This is also known as level-by-level search.

Steps of Breadth-First Search

  • Start with the root node and mark it visited.
  • As the root node has no node in the same level, go to the next level.
  • Visit all adjacent nodes and mark them visited.
  • Go to the next level and visit all the unvisited adjacent nodes.
  • Continue this process until all the nodes are visited.

Pseudocode

Let v be the vertex where the search starts in graph G.

BFS(G,v)

   Queue Q := {};

   for each vertex u, set visited[u] := false;
   insert Q, v;
   while (Q is not empty) do
      u := delete Q;

      if (not visited[u]) then
         visited[u] := true;
         for each unvisited neighbour w of u
            insert Q, w;
      end if

   end while

END BFS()
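
A compact C sketch of BFS on a small undirected graph stored as an adjacency matrix (the graph itself is illustrative, not taken from the figures above):

#include <stdio.h>

#define V 5

int main(void)
{
   /* adjacency matrix of a small undirected graph: edges 0-1, 0-2, 1-3, 2-4 */
   int adj[V][V] = {
      {0, 1, 1, 0, 0},
      {1, 0, 0, 1, 0},
      {1, 0, 0, 0, 1},
      {0, 1, 0, 0, 0},
      {0, 0, 1, 0, 0}
   };

   int visited[V] = {0};
   int queue[V], head = 0, tail = 0;

   int start = 0;
   visited[start] = 1;
   queue[tail++] = start;               /* enqueue the root */

   while (head < tail) {
      int u = queue[head++];            /* dequeue */
      printf("%d ", u);
      for (int w = 0; w < V; w++)
         if (adj[u][w] && !visited[w]) {
            visited[w] = 1;             /* mark before enqueueing so each
                                           vertex enters the queue once   */
            queue[tail++] = w;
         }
   }
   printf("\n");                        /* prints 0 1 2 3 4 */
   return 0;
}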

Best-First Search

Best-First Search is an algorithm that traverses a graph to reach a target by the shortest possible path. Unlike BFS and DFS, best-first search follows an evaluation function to determine which node is the most appropriate to traverse next.

Steps of Best-First Search

  • Start with the root node and mark it visited.
  • Find the next appropriate node and mark it visited.
  • Go to the next level, find the appropriate node and mark it visited.
  • Continue this process until the target is reached.

Pseudocode

BFS( m )

   Insert( m.StartNode )
   Until PriorityQueue is empty
      c ← PriorityQueue.DeleteMin
      If c is the goal
      Exit
   Else

      For each neighbour n of c
         If n "Unvisited"
            Mark n "Visited"
            Insert( n )
      Mark c "Examined"

End procedure

Graph Algorithm

A graph is an abstract notation used to represent the interconnection between pairs of objects. A graph consists of −

  • Vertices − Interconnected objects in a graph are called vertices. Vertices are also known as nodes.

  • Edges − Edges are the links that connect the vertices.

There are two types of graphs −

  • Directed graph − In a directed graph, edges have direction, i.e., edges go from one vertex to another.

  • Undirected graph − In an undirected graph, edges have no direction.

Graph Coloring

Graph colouring is a method to assign colours to the vertices of a graph so that no two adjacent vertices have the same colour. Some graph colouring problems are −

  • Vertex colouring − A way of colouring the vertices of a graph so that no two adjacent vertices share the same colour.

  • Edge colouring − The method of assigning a colour to each edge so that no two adjacent edges have the same colour.

  • Face colouring − It assigns a colour to each face or region of a planar graph so that no two faces that share a common boundary have the same colour.

Chromatic Number

The chromatic number is the minimum number of colours required to colour a graph. For example, the chromatic number of the following graph is 3.

Graph

The concept of graph colouring is applied in preparing timetables, mobile radio frequency assignment, Sudoku, register allocation, and colouring of maps.

Steps for graph colouring

  • Set the initial value of each processor in the n-dimensional array to 1.

  • Now to assign a particular colour to a vertex, determine whether that colour is already assigned to the adjacent vertices or not.

  • If a processor detects the same colour in the adjacent vertices, it sets its value in the array to 0.

  • After making n^2 comparisons, if any element of the array is 1, then it is a valid colouring.

Pseudocode for graph colouring

begin

   create the processors P(i0,i1,...in-1) where 0 ≤ iv < m, 0 ≤ v < n
   status[i0,..in-1] = 1

   for j varies from 0 to n-1 do
      begin

         for k varies from 0 to n-1 do
         begin
            if aj,k = 1 and ij = ik then
               status[i0,..in-1] = 0
         end

      end
      ok = ΣStatus

   if ok > 0, then display valid colouring exists
   else
      display invalid colouring

end
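
A sequential C sketch of the same validity check (the adjacency matrix and the candidate colouring are illustrative; a parallel version would spread the pairwise checks over the processors):

#include <stdio.h>

#define V 4

int main(void)
{
   /* adjacency matrix of a 4-cycle: 0-1-2-3-0 */
   int adj[V][V] = {
      {0, 1, 0, 1},
      {1, 0, 1, 0},
      {0, 1, 0, 1},
      {1, 0, 1, 0}
   };
   int colour[V] = {0, 1, 0, 1};   /* candidate colouring to verify */

   int valid = 1;
   /* check every pair: adjacent vertices must differ in colour */
   for (int j = 0; j < V; j++)
      for (int k = 0; k < V; k++)
         if (adj[j][k] == 1 && colour[j] == colour[k])
            valid = 0;

   printf(valid ? "valid colouring\n" : "invalid colouring\n");   /* valid */
   return 0;
}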

Minimal Spanning Tree

A spanning tree whose sum of edge weights (or lengths) is less than that of all other possible spanning trees of graph G is known as a minimal spanning tree or minimum cost spanning tree. The following figure shows a weighted connected graph.

Minimal Spanning Tree

Some possible spanning trees of the above graph are shown below −

Spanning Tree
Spanning Tree 1
Spanning Tree 2
Minimum Spanning Tree
Spanning Tree 3
Spanning Tree 4
Spanning Tree 5

Among all the above spanning trees, figure (d) is the minimum spanning tree. The concept of the minimum cost spanning tree is applied in the travelling salesman problem, designing electronic circuits, designing efficient networks, and designing efficient routing algorithms.

To implement the minimum cost spanning tree, the following two methods are used −

  • Prim’s Algorithm
  • Kruskal’s Algorithm

Prim's Algorithm

Prim’s algorithm is a greedy algorithm which helps us find the minimum spanning tree of a weighted undirected graph. It selects a vertex first and finds an edge with the lowest weight incident on that vertex.

Steps of Prim’s Algorithm

  • Select any vertex, say v1, of graph G.

  • Select an edge, say e1, of G such that e1 = v1 v2, v1 ≠ v2, and e1 has minimum weight among the edges incident on v1 in graph G.

  • Now, following step 2, select the minimum weighted edge incident on v2.

  • Continue this till n–1 edges have been chosen. Here n is the number of vertices.

Graph Prim’s Algorithm

The minimum spanning tree is −

Prim’s Algorithm Minimum Spanning Tree
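
A hedged C sketch of Prim's algorithm on an adjacency matrix, the simple O(n^2) version (the weights below are illustrative and are not the graph from the figure):

#include <stdio.h>

#define V 5
#define INF 1000000

int main(void)
{
   /* weighted undirected graph; 0 means no edge */
   int g[V][V] = {
      {0, 2, 0, 6, 0},
      {2, 0, 3, 8, 5},
      {0, 3, 0, 0, 7},
      {6, 8, 0, 0, 9},
      {0, 5, 7, 9, 0}
   };

   int inTree[V] = {0};
   int key[V], parent[V];

   for (int v = 0; v < V; v++) { key[v] = INF; parent[v] = -1; }
   key[0] = 0;                       /* start from vertex 0 */

   for (int step = 0; step < V; step++) {
      /* pick the cheapest vertex not yet in the tree */
      int u = -1;
      for (int v = 0; v < V; v++)
         if (!inTree[v] && (u == -1 || key[v] < key[u]))
            u = v;
      inTree[u] = 1;

      /* update the best known edge to every neighbour still outside */
      for (int w = 0; w < V; w++)
         if (g[u][w] && !inTree[w] && g[u][w] < key[w]) {
            key[w] = g[u][w];
            parent[w] = u;
         }
   }

   int total = 0;
   for (int v = 1; v < V; v++) {
      printf("edge %d - %d (weight %d)\n", parent[v], v, key[v]);
      total += key[v];
   }
   printf("total weight = %d\n", total);   /* prints 16 */
   return 0;
}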

Kruskal's Algorithm

Kruskal’s algorithm is a greedy algorithm which helps us find the minimum spanning tree of a connected weighted graph, adding increasing-cost arcs at each step. It is a minimum-spanning-tree algorithm that finds an edge of the minimum possible weight that connects any two trees in the forest.

Steps of Kruskal’s Algorithm

  • Select an edge of minimum weight, say e1, of graph G such that e1 is not a loop.

  • Select the next minimum weighted edge that does not form a cycle with the edges already chosen.

  • Continue this till n–1 edges have been chosen. Here n is the number of vertices.

Kruskal’s Algorithm Graph

The minimum spanning tree of the above graph is −

Minimum Spanning Tree of Kruskal’s Algorithm

Shortest Path Algorithm

A shortest path algorithm is a method of finding the least-cost path from the source node (S) to the destination node (D). Here, we will discuss Moore’s algorithm, also known as the Breadth First Search algorithm.

Moore’s algorithm

  • Label the source vertex S with i, where i = 0.

  • Find all unlabelled vertices adjacent to the vertices labelled i. If no vertices are connected to the vertex S, then vertex D is not connected to S. If there are vertices connected to S, label them i+1.

  • If D is labelled, go to step 4; else go to step 2 and increase i by 1 (i = i + 1).

  • Stop once the length of the shortest path is found.
