C# LSD Radix Sort

The Least Significant Digit Radix sort algorithm is a classic near-linear-time sorting algorithm. It is a good example of how additional information about the inputs for an algorithm allow us to improve our code’s efficiency.

In the real world, for an overwhelming majority of applications, using the native .NET Array.Sort() implementation is efficient and adequate. The native sort algorithm implemented in the .NET library is a smart combination of three different sort algorithms (insertion sort, heapsort, and quicksort) depending on the input parameters. These provide a worst-case runtime in the order of O(nlogn) where n is the input size.

However, theory is very powerful, and for some applications, when you know more about the input (for example – the range and distribution of the population), you can achieve near-linear time sorting. Counting sort is a classic simple example of the concept, useful for sorting integers.

Following is a C# implementation of Least Significant Digit Radix Sort. This algorithm sorts strings (or anything that can be represented as a string) in O(n*k) time (where k is the average length of each string key). For many languages (DNA, words in the English language, ISBNs, etc.), this means near-linear performance. The complete implementation is available here.

How the Algorithm Works

During each step we sort the strings according to one of their characters, starting from the rightmost character and working our way left to the first character:

sort_steps

Note the number of these steps depends on the length of the longest key. This is why the performance of the algorithm is of O(k*n).

Between each step, sorting the keys by one character is performed via Bucket Sort. We create R buckets (one bucket for each letter in our alphabet) and add the strings to the buckets in order. The buckets are then combined in order and the result is fed to the next step. In the C# implementation I used queues to represent the buckets:

sort_queue_steps

LSD Radix Sort vs. Array.Sort()

The C# implementation of Radix LSD sort linked above performs much faster than the native .NET Array.Sort() implementation:

radix_sort_chart

LSD Radix sort also has the benefit of being stable (relative order of elements remains the same for elements with the same key), and allows for very easy reconfiguration of the alphanumerical order (by changing the alphabet definition).

However, at peak it does end up using roughly 2n+k memory – and for most cases, the O(nlogn) performance of the native algorithm is more than adequate. The LSD Radix algorithm also does not lend itself well to parallelization.

Optimization

The implementation provided can be optimized in a variety of ways:

  • Scanning the input once for the alphabet. The implementation relies on us already knowing the alphabet and the length of the longest key string – a real world implementation would likely need a quick pass of the input to gather this information upfront.
  • Use of a more efficient data structure than queues –  The algorithm spends a lot of time merging the queues between the different iterations. An array-based data-structure that handles these merges in constant time (by modifying index references rather than copying over values) would significantly improve the runtime of the algorithm.
  • Compound alphabets could be used (ie combining every adjacent two letters and increasing the number of queues at each step to RxR) to optimize performance for cases where you know some two letter combinations are very common (ie consonants and vowels in the English language). This would come at a cost of much higher memory consumption.

Let me know if you have any ideas as to how the implementation could be improved.

 

Modeling Complex Tracks with OpenScad

I have been working to create 3d-printed structures that would include marbles rolling down specific tracks (maybe a Rube Goldberg machine). To do this, a generic mechanism for creating tracks for arbitrarily complex paths is needed. OpenScad is a great tool for parametric 3D modeling, and seemed like a natural choice for solving the problem. A sample track looks like:
parabola_filled_with_trail

The Basic Shape

As always, we start with some spheres .These are copies of the same object, representing the location of the marble at different points in time. This can be done with the OpenScad ‘sphere’ command (note the approach is generic and should work with any object type – not just spheres):

3-spheres

Connecting the Spheres

We can use the convex hull function in OpenScad to create a convex hull around the spheres. The choice of the number of spheres included in each convex hull is important – the more sphere we include, the more area under the track will be included in the convex hull. In order to keep the structure free-standing we will need to use a union of multiple convex hulls, with each hull encompassing three spheres:

3-spheres-hull

Generating a Full Path of Spheres

We can run this for a full set of spheres, and define the x,y,z coordinates for each sphere arbitrarily. This is where I plan to use a defined set of twists, turns and drop to generate the complex tracks for the marbles to follow. In the interim, for proving out this method, I am using a parametric path for a simple concave parabola on the x-z access with a right intercept at x=rightx and peak at z=topz.  Note ‘n’ in this example represents the total number of spheres we are using (the path resolution):

function xi(i) =  rightx*i/n;
function yi(i) = 0;
function zi(i) = topz*(-i*i+n*i)/(n*n/4);

The convex hull around 3-sphere segments of the parabola defined above looks like:

parabola_filled

Generated via:

module convexTrail(){
    // start from 0 for a closed shape; 1 for an open shape
    for(i=[0:parts-1:n-2]){
        hull()
        {
            for(j=[i:1:i+2]) // connect each three items
                basic_shape(xi(j),yi(j),zi(j));       
        }    
    }
}

Creating a Groove for the Track

Finally, for a track to be useful it needs to have a groove for the object to pass through. We recreate the same shape, offset on the z access by the sphere radius, and subtract it from the original shape:

parabola_filled_with_trail

Generated via:

difference(){
    convexTrail();
    translate([0,0,r])
        convexTrail();
}

The full code is available on GitHub:

https://github.com/asternberg/OpenScad/blob/master/convex_hull_parabola_spheres.scad

 

Follow-Ups

There are a few different directions I can see taking this. If anyone reading this has thoughts or ideas, please share in the comments below.

  • Create a generic module to load path locations from a data file
  • Improvements to render time:
    • Look at generating the convex hull for all spheres at once and then substracting the bottom portion (should make rendering more efficient)
    • Use the OpenScad 2d sub-system for some of the initial work
  • Research equivalent Blender Python script
  • 3D-printing a track and run some marbles through it!

 

DFS/BFS Graph Search with C#

This is a basic implementation of DFS/BFS graph search with C#. I have not seen many implementations in C# (most educational ones use Java) utilizing the .NET built-in data structures (Stack vs. Queue).

The search method returns a list corresponding to a path in order. The sample dataset is setup to illustrate the different results: BFS will always return the shortest path.

Code: https://dotnetfiddle.net/HCR4O8