Some aggregating variants of common sorting algorithms
Posted: Tue Feb 28, 2017 4:20 pm
After watching some videos visualizing the performance of various sorting algorithms, I got curious about the idea of aggregation or accretion of repeated data points, something I didn't know of any examples of. I did some reading, and my impression is that while clustering repeated values is a fairly common 'optimization trick', the only algorithms I found that do it, or anything quite like it, are Timsort (a hybrid of merge sort and insertion sort, mainly) and the various counting sorts (which tally the number of occurrences of each value rather than actually rearranging the original data). I have not found any formal literature on what I am calling 'aggregating' or 'accretive' sorting algorithms in general. Most of the related discussion I did find was about Timsort specifically, which has caught on recently for this very property: it performs better than most other algorithms on real-world data, even though its worst-case and average-case bounds are the same as those of the much less complex merge sort it is partly based on.
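For anyone who hasn't run into the counting sorts, the core idea in its simplest form looks something like this. This is just a minimal sketch of my own (the name is mine, and it assumes small non-negative integer keys), not any particular published version:

Code:
def countingSortSketch(data):
    """Minimal counting-sort sketch: tally how many times each value
    occurs, then rebuild the list in key order. Repeats are handled
    as a single count, never compared or moved individually."""
    if not data:
        return data
    counts = [0] * (max(data) + 1)    # one slot per possible key
    for value in data:
        counts[value] += 1
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)
    return result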
Out of curiosity, I decided to try devising aggregating variants of two of the simpler sorts: the short-circuiting version of Bubble Sort, and Gnome Sort in both its basic and 'teleporting' forms. They aren't really all that interesting in and of themselves, but I wanted to see how the change altered best-case performance. I figure that, if nothing else, they can serve as a baseline for the effect this sort of modification has.
I thought I would share the ones I have done so far before going any further. If anyone is curious, I can post the instrumented test versions of these (and the base algorithms for comparison) along with a log of some trials as an attachment.
Code:
def musterSort(data):
    """muster sort - an accretive (or aggregating) variant of the
    optimized bubblesort.
    I chose the name because repeated elements aggregate into a
    'troop' or 'squadron' which moves together from then on."""
    endpoint = len(data)
    for i in range(endpoint):
        is_sorted = True
        tagged = 0                  # front of the current run of equal values
        for pos in range(1, endpoint - i):
            diff = data[tagged] - data[pos]
            if diff > 0:
                # out of order: a single swap jumps the smaller element
                # over the whole run of equal values
                data[pos], data[tagged] = data[tagged], data[pos]
                is_sorted = False
                tagged = pos
            elif diff < 0:
                tagged = pos        # a larger value starts a new run
            # diff == 0: the run grows and tagged stays at its front
        if is_sorted:               # short-circuit: a pass with no
            break                   # swaps means everything is in order
    return data
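For example, once the 2s have mustered into a troop, the 1 hops over all three of them with a single swap on the third pass:

Code:
>>> musterSort([4, 2, 2, 2, 7, 1])
[1, 2, 2, 2, 4, 7]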
Code:
def smurfSort(data):
    """ an aggregating variant of a Gnome Sort
    Basically, gnomes that congregate in a village..."""
    endpoint = len(data)
    tagged, pos = 0, 1      # tagged marks the current run of equal values
    direction = 1           # +1 walking forward, -1 walking back
    while pos < endpoint:
        if pos < 1:
            # back at the front (a swap at index 0 can push pos to -1,
            # so test for < 1 rather than == 0): turn around and walk
            # forward again
            direction = 1
            tagged, pos = 0, 1
        diff = (data[tagged] - data[pos]) * direction
        if diff < 0:
            tagged = pos    # in order: the new value starts a new run
        elif diff > 0:
            # out of order: one swap jumps the offending element over
            # the whole run, then the walk changes direction
            data[tagged], data[pos] = data[pos], data[tagged]
            tagged, pos = pos, tagged
            direction = -direction
        # diff == 0: the run grows and tagged stays put
        pos += direction
    return data
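Same idea here: each time a smaller element turns up behind the run of 3s, a single swap carries it over the entire run, so this input takes only two swaps in total:

Code:
>>> smurfSort([3, 3, 3, 1, 2])
[1, 2, 3, 3, 3]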
Code:
def bamfSort(data):
    """ an aggregating version of the 'optimized' Gnome Sort
    Probably only X-Men fans of old will get the name."""
    def innerBamfSort(top):
        # walk data[top] back through the sorted prefix; equal values
        # are stepped over rather than swapped, so a run of repeats
        # holds together and costs no extra exchanges
        pos, tagged = top - 1, top
        while pos >= 0 and data[pos] >= data[tagged]:
            if data[tagged] != data[pos]:
                data[tagged], data[pos] = data[pos], data[tagged]
                tagged = pos
            pos -= 1
    for pos in range(len(data)):
        innerBamfSort(pos)
    return data
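Until I attach the instrumented versions, a quick sanity check along these lines will exercise all three against Python's built-in sorted(). The trial count and value span are arbitrary choices of mine; the small span just forces plenty of repeats:

Code:
from random import randint

def quickCheck(trials=200, size=30, span=8):
    """Compare each sort against sorted() on random lists that are
    dense with repeated values (small span forces repeats)."""
    for sortFunc in (musterSort, smurfSort, bamfSort):
        for _ in range(trials):
            sample = [randint(0, span) for _ in range(size)]
            assert sortFunc(sample[:]) == sorted(sample), sortFunc.__name__
    print("all", trials * 3, "trials passed")

quickCheck()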