.NET Framework - Sorting IList<T> *in place*
Asked By Marcel_Müller on 25-Jul-12 09:21 AM
Is there a way to sort an IList<T> in place?
The solutions I found imply that you copy the current list to List<T>,
sort it and throw away the current IList<T> by replacing the reference
with a new list.
But this might be a very bad Idea if the IList<T> have been something
else than a simple List<T> before. Furthermore it requires by reference
access to the list parameter.
OK, I could copy the list, sort it, clear the original list and copy the
sorted content back to the original list. Seriously?
Isn't there any generic sort algorithm that can be applied to a generic
interface?
Marcel
Peter Duniho replied to Marcel_Müller on 25-Jul-12 10:26 AM
Sure, as long as it is not a read-only IList<T>. Of course there is.
Now, there is nothing in .NET that I know of that will do that. The built-in
sort implementations apply to arrays or List<T> instances. But there is
nothing stopping you from writing your own.
Indeed, it is possible that someone has already done so. The built-in .NET
sort algorithms use quicksort, which is great for many scenarios, but not
the most efficient in others. Even ignoring IList<T> there is good reason to
have alternatives, so in any case maybe it is already been done.
You should use your favorite web search engine to look for a third-party
library that does that. Even if it does not support IList<T> directly, with
an open-source library it would be easy enough to port the existing
algorithm to do so.
Heck...Arne seems to have lots of free time. Maybe he will even write it for
you and post it here. :) It would not even actually be that many lines of
code...it is just a matter of writing it down.
In the meantime, I recommend you take a look at these articles, with an eye
toward just writing it yourself:
http://en.wikipedia.org/wiki/Quicksort
http://en.wikipedia.org/wiki/Heapsort
Those are two of the best general-purpose sorts, both of which can be
implemented to be both memory- and time-efficient.
Pete
Arne_Vajhøj replied to Marcel_Müller on 25-Jul-12 10:52 AM
I would be skeptical about its usage, because IList does only
guarantee that elements can be accessed by index not that such
access will be efficient, so if access by index is O(n) instead
of O(1) performance will be very bad.
But a quicksort of IList<T> is easy to write.
Simple version:
public class QS
{
private static void SortHelp<T>(int n1, int n2, IList<T> lst)
where T : IComparable<T>
{
int l = n1;
int r = n2;
T pivot = lst[(n1 + n2) / 2];
do
{
while (lst[l].CompareTo(pivot) < 0) l++;
while (lst[r].CompareTo(pivot) > 0) r--;
if (l <= r)
{
T tmp = lst[l];
lst[l] = lst[r];
lst[r] = tmp;
l++;
r--;
}
} while (l <= r);
if (n1 < r) SortHelp(n1, r, lst);
if (l < n2) SortHelp(l, n2, lst);
return;
}
public static void Sort<T>(IList<T> lst) where T : IComparable<T>
{
SortHelp(0, lst.Count - 1, lst);
}
}
Arne
Marcel Müller replied to Peter Duniho on 25-Jul-12 01:20 PM
Oh, that is the point that I wanted to avoid.
it is unbelievable that business applications that use a modern
development environment need to ship with their own general purpose sort
algorithmn, is not it?.
I have implemented quicksort several time on several platforms. But most
times I had old environments like REXX or C.
I know.
Marcel
Arne Vajhøj replied to Marcel Müller on 25-Jul-12 01:46 PM
No.
You do not want to create a sort method that may perform very
poorly depending on the IList implementation.
You want to create a sort method for something where you know
access via index is fast and then developers can convert as
mentioned in original post.
Arne
Arne Vajhøj replied to Arne Vajhøj on 25-Jul-12 01:52 PM
Java Collections.sort sort List<T> which is the interface
in Java.
But note what they write in the docs:
This algorithm offers guaranteed n log(n) performance. This
implementation dumps the specified list into an array, sorts the array,
and iterates over the list resetting each element from the corresponding
position in the array. This avoids the n2 log(n) performance that would
result from attempting to sort a linked list in place.
Arne
Marcel Müller replied to Arne Vajhøj on 25-Jul-12 04:56 PM
Hmm, in C++ STL it is best practice not to expose interfaces that cannot
be implemented reasonably fast. Consequently linked lists do not provide
direct access.
Most of the time .NET follows this rule too. I.e. properties (like Item)
should be O(1) or at most O(log(n)). AFAIK there is no .NET container
that implements IList<> with O(n) member access.
Looking back 20 years, I also prefer this rule, since programmers tend
not to look at the performance of each function that they call in a
loop. I had hundreds of sessions where I had to remove code like that
because it was incredibly slow or it used (shared) system resources
excessively.
From that point of view, LINQ is a pain. While I personally like the
expressive syntax, performance problems have become much more common
with LINQ. Entire sub expressions are evaluated over and over
accidentally. LINQ does not fit very well into non-functional languages.
But there are still many advantages too.
Marcel
Arne Vajhøj replied to Marcel Müller on 25-Jul-12 05:25 PM
.NET LinkedList<T> does not implement IList<T>.
I do not even remember anything besides the array backed List<T> that
implements IList<T>.
But I assume that the original poster must have some class in
mind otherwise the problem would not exist.
And the most likely reason for the existence of such a class
must be that for whatever reason array backed it not possible.
Classic tradeoff.
I have noted the effect as well.
But for small data sizes and modern hardware it does not matter.
Arne
Marcel Müller replied to Arne Vajhøj on 25-Jul-12 06:28 PM
Hmm, seems you are right. And the implicit conversion from T[] to
IList<T> seems to be some secret too.
No, I prefer IList<T> in public interfaces because it offers the
implementer the option to use T[] for long lived or constant size arrays
and List<T> for arrays that are subject to change. Furthermore I can
replace IList<T> with a read-only proxy in debug builds to ensure
constness in some places where anything else would result in hard to
find race-conditions. Of course, not in this case.
Marcel
Arne Vajhøj replied to Marcel Müller on 25-Jul-12 06:52 PM
I would make it readonly in all build for those cases.
Allowing caller code to access the data directly is
breaking encapsulation.
If a sort is needed then the class should have a sort
method that could sort the implementation.
But OK I have no guarantee that the original poster sees
it that way, so I see your point.
Arne
Peter Duniho replied to Marcel Müller on 25-Jul-12 08:48 PM
.NET does already include at least two different sorting APIs (not even
counting "order by" in LINQ, and other corner cases).
You have a specific requirement that is more constraining that what .NET
offers. But you have not explained the requirement, and it is unusual enough
that it is not surprising that you might have to "roll your own". For most
applications, the difference between O(n log n) and O(n (log n + 2)) is
insignificant, especially given how incredibly fast sequential memory
copies are on modern hardware.
I am not surprised at all that .NET has not addressed this narrow business
case. it is a general-purpose framework, not an "end-all, be-all" one.
What, exactly, is wrong with copying your data to a temporary array,
sorting that, and then copying it back to the IList<T> object?
Pete
Arne Vajhøj replied to Peter Duniho on 25-Jul-12 09:11 PM
Strictly speaking there are no difference between O(n log n)
and O(n (log n + 2)) at all.
Not that it really matters for your argument, which I agree with.
Arne