Sorting on two dimensional processor arrays

Algorithm 4-way mergesort

In practice, there is a need for an algorithm that is simple and easy to implement. In the following, the algorithm 4-way mergesort of Schimmler [Schi 87] is presented.

Algorithm

Definition:  An m×n-array of data is called roughly sorted, if sorting of the rows suffices to sort the array completely.

In a roughly sorted array each data element is already in its proper row. The idea of 4-way mergesort is to merge four roughly sorted k/2×k/2-arrays to one roughly sorted k×k-array.

 

Procedure 4-way merge (k)

Input:

k×k-array, whose four k/2×k/2-subarrays are roughly sorted

Output:

the roughly sorted k×k-array

Method:

  1. sort the rows of the subarrays:
    1. ascending in the upper subarrays,
    2. descending in the lower subarrays;
  2. sort the columns;
  3. sort the rows:
    1. odd rows ascending,
    2. even rows descending;
  4. sort the columns;

 

The correctness of procedure 4-way merge can be shown in an illustrative way by the 0-1-principle. (The 0-1-principle is also valid for the term "roughly sorted", since by subsequent sorting of the rows a sorted array can be obtained.)

Starting point is an array containing zeroes and ones, whose four subarrays are roughly sorted. In Figure 2(a) this situation is illustrated (zeroes are drawn white, ones gray).

Sorting the rows of the subarrays in step 1 results in a situation similar to Figure 2(b).

Sorting the columns in step 2 yields two roughly sorted k×k/2-arrays (Figure 2(c)).

After sorting the rows in alternating order in step 3 two different situations are possible: either the "dirty" halfrows (i.e. halfrows in which zeroes and ones occur) of situation (c) are in different halves (Figure 2(d)) or in the same half of the array (in the right half in Figure 2(e)). This depends on the number of full 1-halfrows in situation (c). In both cases however, sorting of the columns in step 4 leads to the situation shown in Figure 2(f), namely a roughly sorted k×k-array.

 

Figure 1: Proof of the correctness of procedure 4-way merge using the 0-1-principle 

Figure 1: Proof of the correctness of procedure 4-way merge using the 0-1-principle

 

By recursive application of procedure 4-way merge a totally unsorted array is roughly sorted:

 

Procedure roughsort (k)

Input:

unsorted k×k-array

Output:

roughly sorted k×k-array

Method:

  1. if k>1 then
    1. apply roughsort (k/2) to the four subarrays;
    2. 4-way merge (k);

 

An n×n-array is sorted by roughly sorting it and then sorting the rows:

 

Algorithm 4-way mergesort (n)

Input:

unsorted n×n-array

Output:

sorted n×n-array

Method:

  1. roughsort (n);
  2. sort the rows;

 

Following from the 0-1-principle algorithm 4-way mergesort sorts every n×n-array filled with arbitrary data, since it sorts all 0-1-arrays.

Analysis

Sorting a row or a column of length k with odd-even transposition sort takes k steps. In procedure 4-way merge this amounts to the following:

step 1: k/2 steps
step 2: ksteps
step 3: ksteps
step 4: k/2steps
altogether:   3ksteps

In step 4 only k/2 steps of odd-even transposition sort are necessary, since each column consists of two sorted subsequences shuffled into each other.

The recursive execution of 4-way merge in roughsort takes 3n + 3n/2 + 3n/4 + ... + 3 ≤ 6n steps. Thus, including n steps for the subsequent sorting of the rows 4-way mergesort has a time complexity of

T(n) ≤ 7n

Simulation

The following applet shows how 4-way mergesort sorts a 32×32-array of 0's and 1's. For more clarity, operations that are performed in parallel on a two-dimensional processor array are shown sequentially here.

 

(Java applet for simulation of 4-way mergesort)

Conclusions

Algorithm 4-way mergesort for sorting an n×n-array has been presented. It has the same asymptotic complexity as the algorithm of Thompson/Kung [TK 77], namely O(n). Concerning the constant it is slower by a factor of about 2. However, its quality lies in its simplicity. An implementation of the algorithm on an instruction systolic array is given in [Schi 87] and in [Lan 90].

References

[Lan 90]   H.W. Lang: Das befehlssystolische Prozessorfeld -- Architektur und Programmierung. Dissertation, Christian-Albrechts-Universität Kiel (1990)

[Schi 87]   M. Schimmler: Fast Sorting on the Instruction Systolic Array. Bericht Nr. 8709, Institut für Informatik, Christian-Albrechts-Universität Kiel (1987)

[TK 77]   C.D. Thompson, H.T. Kung: Sorting on a Mesh-Connected Parallel Computer. Communications of the ACM, 20, 4, 263-271 (1977)

 

Next:   [Rotatesort]   or   [up]

 


H.W. Lang   mail@hwlang.de   Impressum   Datenschutz
Created: 01.03.1997   Updated: 07.02.2023