From AS3 to C#, Part 19: SQL-Style Queries With LINQ
The series continues today by looking at C#’s SQL-style queries called LINQ. These don’t run on a database, but rather query in-memory objects like arrays. Read on to learn about this powerful tool for writing extremely concise, readable code.
Table of Contents
- From AS3 to C#, Part 1: Class Basics
- From AS3 to C#, Part 2: Extending Classes and Implementing Interfaces
- From AS3 to C#, Part 3: AS3 Class Parity
- From AS3 to C#, Part 4: Abstract Classes and Functions
- From AS3 to C#, Part 5: Static Classes, Destructors, and Constructor Tricks
- From AS3 to C#, Part 6: Extension Methods and Virtual Functions
- From AS3 to C#, Part 7: Special Functions
- From AS3 to C#, Part 8: More Special Functions
- From AS3 to C#, Part 9: Even More Special Functions
- From AS3 to C#, Part 10: Alternatives to Classes
- From AS3 to C#, Part 11: Generic Classes, Interfaces, Methods, and Delegates
- From AS3 to C#, Part 12: Generics Wrapup and Annotations
- From AS3 to C#, Part 13: Where Everything Goes
- From AS3 to C#, Part 14: Built-in Types and Variables
- From AS3 to C#, Part 15: Loops, Casts, and Operators
- From AS3 to C#, Part 16: Lambdas and Delegates
- From AS3 to C#, Part 17: Conditionals, Exceptions, and Iterators
- From AS3 to C#, Part 18: Resource Allocation and Cleanup
- From AS3 to C#, Part 19: SQL-Style Queries With LINQ
- From AS3 to C#, Part 20: Preprocessor Directives
- From AS3 to C#, Part 21: Unsafe Code
- From AS3 to C#, Part 22: Multi-Threading and Miscellany
- From AS3 to C#, Part 23: Conclusion
LINQ can be thought of as a more general version of the E4X support in AS3. Instead of querying only relatively-rare XML objects, it queries extremely-common objects like arrays, lists, dictionaries, and so forth. As a reminder, here’s how E4X looks in AS3:
function findAdultSmiths(xml:XML): XMLList { // E4X expression here return xml.(@lastName == "Smith" && Number(@age) >= 18); } // Data to query var xml:XML = <people> <person lastName="Jones" age="11" /> <person lastName="Adams" age="22" /> <person lastName="Smith" age="40" /> <person lastName="Smith" age="5" /> </people>; // Loop over results of the query var adultSmiths:XMLList = findAdultSmiths(people.person) for each (var person:XML in adultSmiths) { trace("LastName: " + person.@lastName + ", Age: " + person.@age); } // Output: // LastName: Smith, Age: 40
But what if you had a simple Array
or Vector
of Person
objects? E4X would be of no use unless you first converted it to an XML document. That’d be extremely slow and messy. C#’s LINQ queries provide an answer to that scenario. Here’s how the above E4X expression would look as a LINQ query:
public class Person { public string LastName { get; private set; } public int Age{ get; private set; } public Person(string lastName, int age) { LastName = lastName; Age = age; } } public IEnumerable<Person> FindAdultSmiths(Person[] people) { // LINQ query here return from p in people where p.LastName == "Smith" && p.Age >= 18 select p; } // Data to query var people = new Person[]{ new Person("Jones", 11), new Person("Adams", 22), new Person("Smith", 40), new Person("Smith", 5) }; // Loop over results of the query var adultSmiths = FindAdultSmiths(people) foreach (var person in adultSmiths) { Debug.Log("LastName: " + person.LastName + ", Age: " + person.Age); } // Output: // LastName: Smith, Age: 40
You might have noticed that LINQ queries return an IEnumerable
of some type, just like iterators. This is because they basically are iterators that have not yet gone through the data source evaluating the query expressions. LINQ queries are “lazy” evaluated during the foreach
loop or as you manually go through the IEnumerable
‘s values.
If you want, you can force the query to complete right away with some extension methods found in the System.Linq
namespace. Here’s an example:
using System.Linq; public Person[] FindAdultSmiths(Person[] people) { var e = from p in people where p.LastName == "Smith" && p.Age >= 18 select p; return e.ToArray(); // force it }
Other ways of forcing the LINQ query to execute include ToList()
that returns a System.Collections.Generic.List<T>>
(similar to Vector
in AS3) and FirstOrDefault
that returns the first result or the default value (usually null
) if not found. These are all just convenience methods that replace boilerplate code like this FirstOrDefault
sample implementation:
public static class IEnumerableExtensions { public static T MyFirstOrDefault<T>(this IEnumerable<T> e) { var e2 = e.GetEnumerator(); return e2.MoveNext() ? e2.Current : default(T); } } var e = from p in people where p.LastName == "Smith" && p.Age >= 18 select p; var first = e.MyFirstOrDefault(); Debug.Log("LastName: " + first.LastName + ", Age: " + first.Age);
Now let’s delve more deeply into what you can do inside LINQ queries. Here is the same query as above but broken onto multiple lines:
from p in people where p.LastName == "Smith" && p.Age >= 18 select p
The from
part always comes first and specifies the data source to query: people
. It also names a loop iterator variable p
that will represent the current Person
the query is being run on.
Next is the where
part that acts like a filter. This has access to the iterator variable p
and must return a bool
based on that. You can write arbitrarily-complex expressions here, including function calls or even nested LINQ queries.
Lastly there is the select
part. This determines the type of IEnumerable<T>
that the LINQ query will evaluate to. For example, if you wanted it to evaluate to an IEnumerable<int>
of ages, you could have written select p.Age
instead. Like the where
part, this expression can also be arbitrarily complex and can return any type.
These are three basic parts of simple LINQ queries and they already replace quite a lot of code:
public Person[] FindAdultSmiths(Person[] people) { var list = new List<People>(); foreach (var p in people) { if (p.LastName == "Smith" && p.Age >= 18) { list.Add(p); } } return list.ToArray(); }
However, LINQ queries can be much more complex. Let’s start exploring their features by looking at orderby
clauses. These allow you to sort the results either ascending or descending:
from p in people where p.LastName == "Smith" && p.Age >= 18 orderby p.Age // sort ascending by age select p
The default is an ascending sort, but you can specify explicitly if you want:
from p in people where p.LastName == "Smith" && p.Age >= 18 orderby p.Age descending select p
You can also add more than one sorting criteria to allow for tie-breaking:
from p in people where p.LastName == "Smith" && p.Age >= 18 orderby p.Age ascending, p.LastName descending select p
There is also the group
clause that allows you to group results together into an IGrouping<T>
instead of an IEnumerable<T>
. Here’s a query that would group people
by the first letter of their last name:
var grouping = from p in people group p by p.LastName[0]; foreach (var group in grouping) { Debug.Log("Group: " + group.Key); foreach (var person in group) { Debug.Log(" LastName: " + person.LastName + ", Age: " + person.Age); } } /* Output: Group: J LastName: Jones, Age: 11 Group: A LastName: Adams, Age: 22 Group: S LastName: Smith, Age: 40 LastName: Smith, Age: 5 */
The by
sub-clause allows for an arbitrary expression and can return whatever it wants to.
You can also nest LINQ queries. The following example shows an outer query that goes over sentences and an inner query that goes over the words in those sentences:
var sentences = new string[] { "Once upon a time", "They lived happily ever after" }; var query = from sentence in sentences from word in sentence.Split(' ') where word.ToLower()[0] == 'a' || word.ToLower()[0] == 'e' || word.ToLower()[0] == 'i' || word.ToLower()[0] == 'o' || word.ToLower()[0] == 'u' select word; foreach (var word in query) { Debug.Log(word); } /* Output: Once upon a ever after */
One problem with this query is that it does a lot of redundant work by calling ToLower
over and over. This is where the let
clause comes in. It’s useful for defining a constant variable so it’s not evaluated over and over in the where
clause. Here’s how that looks:
var query = from sentence in sentences from word in sentence.Split(' ') let firstLetter = word.ToLower()[0] where firstLetter == 'a' || firstLetter == 'e' || firstLetter == 'i' || firstLetter == 'o' || firstLetter == 'u' select word;
Lastly, there is the concept of a “join”. The simplest type of join is called an “inner join” and it’s where you take two lists and join them together based on something they have in common. For example, we could join the list of people with a list of houses that have people that own them:
public class House { public string Address { get; private set; } public Person Owner { get; private set; } public House(string address, Person owner) { Address = address; Owner = owner; } } var jones11 = new Person("Jones", 11); var adams22 = new Person("Adams", 22); var smith40 = new Person("Smith", 40); var smith05 = new Person("Smith", 5); var people = new Person[]{ jones11, adams22, smith40, smith05 }; var houses = new House[]{ new House("123 Main St", adams22), new House("211 Flower Blvd", smith40) } var query = from p in people join h in houses on p equals h.Owner select new { Person=p, House=h }; foreach (var result in query) { Debug.Log( "LastName: " + result.Person.LastName + " lives at: " + result.House.Address ); } /* Output: LastName: Adams lives at: 123 Main St LastName: Smith lives at: 211 Flower Blvd */
Notice that the join
declares a new iterator variable as it essentially is doing a nested search to find what the equals
sub-clause is searching for. The select
clause here is returning an anonymous type since there’s rarely a class for “person and house” available.
If you add an into
sub-clause at the end of the join
clause, you get what’s called a “group join”. Just with the group
clause we used before, this will produce groups as a result:
var query = from p in people join h in houses on p equals h.Owner into g select new { Person=p, Group=g }; foreach (var result in query) { var person = result.Person; Debug.Log(person.LastName + "(Age=" + person.Age + ") lives at:"); foreach (var house in result.Group) { Debug.Log(" " + house.Address); } } /* Output: Jones(Age=11) lives at: Adams(Age=22) lives at: 123 Main St Smith(Age=40) lives at: 211 Flower Blvd Smith(Age=5) lives at: */
Here the results are a little different. Rather than just return the matches, an empty group is returned if there is no match. The children (Jones, 11 and Smith, 5) don’t own any houses, so they have an empty group.
The final kind of join is called a “left outer join” and can help address this issue. It’ll allow you to specify an alternate value to return if there is an empty group. You do this by nesting a from
clause and using the DefaultIfEmpty
helper function:
var query = from p in people join h in houses on p equals h.Owner into g from h2 in g.DefaultIfEmpty() select new { Person = p, Residence = (h2 == null ? "is homeless" : "lives at " + h2.Address) }; foreach (var result in query) { var person = result.Person; Debug.Log(person.LastName + " (Age=" + person.Age + ") " + result.Residence); } /* Output: Jones (Age=11) is homeless Adams (Age=22) lives at 123 Main St Smith (Age=40) lives at 211 Flower Blvd Smith (Age=5) is homeless */
Notice that the results are once again a single-dimensional list, not a list of groups.
To summarize the conciseness of LINQ queries compared to the code we’d have to manually write without them, the following table will show the C# LINQ query and the equivalent in AS3 where they’re not available (aside from XML). Which would you rather write, read, and maintain?
//////// // C# // //////// var query = from p in people join h in houses on p equals h.Owner into g orderby p.Age descending, p.LastName ascending select new { Person=p, Group=g }; foreach (var result in query) { var person = result.Person; Debug.Log(person.LastName + "(Age=" + person.Age + ") lives at:"); foreach (var house in result.Group) { Debug.Log(" " + house.Address); } } //
///////// // AS3 // ///////// class PersonAndHouses { var person:Person; var houses:Vector.<House> = new <House>[]; } private function RunQuery( people:Vector.<Person>, houses:Vector.<House> ): Vector.<PersonAndHouses> { var results:Vector.<PersonAndHouses> = new <PersonAndHouses>[]; for each (var p:Person in people) { var result:PersonAndHouses = new PersonAndHouses(); result.person = p; results.push(result); for each (var h:House in houses) { if (h.owner == p) { result.houses.push(h); } } } results.sort( function(a:PersonAndHouses, b:PersonAndHouses): Number { var primary:int = b.person.age - a.person.age; if (primary == 0) { return a.person.lastName < b.person.lastName ? -1 : 1; } else { return Number(primary); } } ); return results; } for each (var result:PersonAndHouses in RunQuery(people, houses)) { var person:Person = result.person; trace(person.lastName + "(Age=" + person.age + ") lives at:"); for each (var house:House in result.houses) { trace(" " + house.address); } }
That’s all for today. Stay tuned for next week when we’ll continue the series with even more exciting new features in C#!
Spot a bug? Have a question or suggestion? Post a comment!
#1 by henke37 on November 24th, 2014 ·
Are you sure that it isn’t “order by”, with the space?
#2 by jackson on November 24th, 2014 ·
Yep, it’s really orderby.