JacksonDunstan.com

The series continues today by looking at C#’s SQL-style queries called LINQ. These don’t run on a database, but rather query in-memory objects like arrays. Read on to learn about this powerful tool for writing extremely concise, readable code.

Table of Contents

LINQ can be thought of as a more general version of the E4X support in AS3. Instead of querying only relatively-rare XML objects, it queries extremely-common objects like arrays, lists, dictionaries, and so forth. As a reminder, here’s how E4X looks in AS3:

function findAdultSmiths(xml:XML): XMLList
{
	// E4X expression here
	return xml.(@lastName == "Smith" && Number(@age) >= 18);
}
 
// Data to query
var xml:XML = <people>
	<person lastName="Jones" age="11" />
	<person lastName="Adams" age="22" />
	<person lastName="Smith" age="40" />
	<person lastName="Smith" age="5" />
</people>;
 
// Loop over results of the query
var adultSmiths:XMLList = findAdultSmiths(people.person)
for each (var person:XML in adultSmiths)
{
	trace("LastName: " + person.@lastName + ", Age: " + person.@age);
}
 
// Output:
// LastName: Smith, Age: 40

But what if you had a simple Array or Vector of Person objects? E4X would be of no use unless you first converted it to an XML document. That’d be extremely slow and messy. C#’s LINQ queries provide an answer to that scenario. Here’s how the above E4X expression would look as a LINQ query:

public class Person
{
	public string LastName { get; private set; }
	public int Age{ get; private set; }
	public Person(string lastName, int age)
	{
		LastName = lastName;
		Age = age;
	}
}
 
public IEnumerable<Person> FindAdultSmiths(Person[] people)
{
	// LINQ query here
	return from p in people where p.LastName == "Smith" && p.Age >= 18 select p;
}
 
// Data to query
var people = new Person[]{
	new Person("Jones", 11),
	new Person("Adams", 22),
	new Person("Smith", 40),
	new Person("Smith", 5)
};
 
// Loop over results of the query
var adultSmiths = FindAdultSmiths(people)
foreach (var person in adultSmiths)
{
	Debug.Log("LastName: " + person.LastName + ", Age: " + person.Age);
}
 
// Output:
// LastName: Smith, Age: 40

You might have noticed that LINQ queries return an IEnumerable of some type, just like iterators. This is because they basically are iterators that have not yet gone through the data source evaluating the query expressions. LINQ queries are “lazy” evaluated during the foreach loop or as you manually go through the IEnumerable‘s values.

If you want, you can force the query to complete right away with some extension methods found in the System.Linq namespace. Here’s an example:

using System.Linq;
 
public Person[] FindAdultSmiths(Person[] people)
{
	var e = from p in people where p.LastName == "Smith" && p.Age >= 18 select p;
	return e.ToArray(); // force it
}

Other ways of forcing the LINQ query to execute include ToList() that returns a System.Collections.Generic.List<T>> (similar to Vector in AS3) and FirstOrDefault that returns the first result or the default value (usually null) if not found. These are all just convenience methods that replace boilerplate code like this FirstOrDefault sample implementation:

public static class IEnumerableExtensions
{
	public static T MyFirstOrDefault<T>(this IEnumerable<T> e)
	{
		var e2 = e.GetEnumerator();
		return e2.MoveNext() ? e2.Current : default(T);
	}
}
 
var e = from p in people where p.LastName == "Smith" && p.Age >= 18 select p;
var first = e.MyFirstOrDefault();
Debug.Log("LastName: " + first.LastName + ", Age: " + first.Age);

Now let’s delve more deeply into what you can do inside LINQ queries. Here is the same query as above but broken onto multiple lines:

from p in people
where p.LastName == "Smith" && p.Age >= 18
select p

The from part always comes first and specifies the data source to query: people. It also names a loop iterator variable p that will represent the current Person the query is being run on.

Next is the where part that acts like a filter. This has access to the iterator variable p and must return a bool based on that. You can write arbitrarily-complex expressions here, including function calls or even nested LINQ queries.

Lastly there is the select part. This determines the type of IEnumerable<T> that the LINQ query will evaluate to. For example, if you wanted it to evaluate to an IEnumerable<int> of ages, you could have written select p.Age instead. Like the where part, this expression can also be arbitrarily complex and can return any type.

These are three basic parts of simple LINQ queries and they already replace quite a lot of code:

public Person[] FindAdultSmiths(Person[] people)
{
	var list = new List<People>();
	foreach (var p in people)
	{
		if (p.LastName == "Smith" && p.Age >= 18)
		{
			list.Add(p);
		}
	}
	return list.ToArray();
}

However, LINQ queries can be much more complex. Let’s start exploring their features by looking at orderby clauses. These allow you to sort the results either ascending or descending:

from p in people
where p.LastName == "Smith" && p.Age >= 18
orderby p.Age // sort ascending by age
select p

The default is an ascending sort, but you can specify explicitly if you want:

from p in people
where p.LastName == "Smith" && p.Age >= 18
orderby p.Age descending
select p

You can also add more than one sorting criteria to allow for tie-breaking:

from p in people
where p.LastName == "Smith" && p.Age >= 18
orderby p.Age ascending, p.LastName descending
select p

There is also the group clause that allows you to group results together into an IGrouping<T> instead of an IEnumerable<T>. Here’s a query that would group people by the first letter of their last name:

var grouping = from p in people group p by p.LastName[0];
foreach (var group in grouping)
{
	Debug.Log("Group: " + group.Key);
	foreach (var person in group)
	{
		Debug.Log("  LastName: " + person.LastName + ", Age: " + person.Age);
	}
}
 
/* Output:
Group: J
  LastName: Jones, Age: 11
Group: A
  LastName: Adams, Age: 22
Group: S
  LastName: Smith, Age: 40
  LastName: Smith, Age: 5
*/

The by sub-clause allows for an arbitrary expression and can return whatever it wants to.

You can also nest LINQ queries. The following example shows an outer query that goes over sentences and an inner query that goes over the words in those sentences:

var sentences = new string[] {
	"Once upon a time",
	"They lived happily ever after"
};
var query =
	from sentence in sentences
		from word in sentence.Split(' ')
		where word.ToLower()[0] == 'a' || word.ToLower()[0] == 'e'
			|| word.ToLower()[0] == 'i' || word.ToLower()[0] == 'o'
			|| word.ToLower()[0] == 'u'
		select word;
foreach (var word in query)
{
	Debug.Log(word);
}
 
/* Output:
Once
upon
a
ever
after
*/

One problem with this query is that it does a lot of redundant work by calling ToLower over and over. This is where the let clause comes in. It’s useful for defining a constant variable so it’s not evaluated over and over in the where clause. Here’s how that looks:

var query =
	from sentence in sentences
		from word in sentence.Split(' ')
		let firstLetter = word.ToLower()[0]
		where firstLetter == 'a' || firstLetter == 'e'
			|| firstLetter == 'i' || firstLetter == 'o'
			|| firstLetter == 'u'
		select word;

Lastly, there is the concept of a “join”. The simplest type of join is called an “inner join” and it’s where you take two lists and join them together based on something they have in common. For example, we could join the list of people with a list of houses that have people that own them:

public class House
{
	public string Address { get; private set; }
	public Person Owner { get; private set; }
	public House(string address, Person owner)
	{
		Address = address;
		Owner = owner;
	}
}
 
var jones11 = new Person("Jones", 11);
var adams22 = new Person("Adams", 22);
var smith40 = new Person("Smith", 40);
var smith05 = new Person("Smith", 5);
 
var people = new Person[]{ jones11, adams22, smith40, smith05 };
 
var houses = new House[]{
	new House("123 Main St", adams22),
	new House("211 Flower Blvd", smith40)
}
 
var query = from p in people
	join h in houses on p equals h.Owner
	select new { Person=p, House=h };
foreach (var result in query)
{
	Debug.Log(
		"LastName: " + result.Person.LastName
		+ " lives at: " + result.House.Address
	);
}
 
/* Output:
LastName: Adams lives at: 123 Main St
LastName: Smith lives at: 211 Flower Blvd
*/

Notice that the join declares a new iterator variable as it essentially is doing a nested search to find what the equals sub-clause is searching for. The select clause here is returning an anonymous type since there’s rarely a class for “person and house” available.

If you add an into sub-clause at the end of the join clause, you get what’s called a “group join”. Just with the group clause we used before, this will produce groups as a result:

var query = from p in people
	join h in houses on p equals h.Owner into g
	select new { Person=p, Group=g };
foreach (var result in query)
{
	var person = result.Person;
	Debug.Log(person.LastName + "(Age=" + person.Age + ") lives at:");
	foreach (var house in result.Group)
	{
		Debug.Log("  " + house.Address);
	}
}
 
/* Output:
Jones(Age=11) lives at:
Adams(Age=22) lives at:
  123 Main St
Smith(Age=40) lives at:
  211 Flower Blvd
Smith(Age=5) lives at:
*/

Here the results are a little different. Rather than just return the matches, an empty group is returned if there is no match. The children (Jones, 11 and Smith, 5) don’t own any houses, so they have an empty group.

The final kind of join is called a “left outer join” and can help address this issue. It’ll allow you to specify an alternate value to return if there is an empty group. You do this by nesting a from clause and using the DefaultIfEmpty helper function:

var query = from p in people
	join h in houses on p equals h.Owner into g
	from h2 in g.DefaultIfEmpty()
	select new {
		Person = p,
		Residence = (h2 == null ? "is homeless" : "lives at " + h2.Address)
	};
foreach (var result in query)
{
	var person = result.Person;
	Debug.Log(person.LastName + " (Age=" + person.Age + ") " + result.Residence);
}
 
/* Output:
Jones (Age=11) is homeless
Adams (Age=22) lives at 123 Main St
Smith (Age=40) lives at 211 Flower Blvd
Smith (Age=5) is homeless
*/

Notice that the results are once again a single-dimensional list, not a list of groups.

To summarize the conciseness of LINQ queries compared to the code we’d have to manually write without them, the following table will show the C# LINQ query and the equivalent in AS3 where they’re not available (aside from XML). Which would you rather write, read, and maintain?

////////
// C# //
////////
var query = from p in people
	join h in houses on p equals h.Owner into g
	orderby p.Age descending, p.LastName ascending
	select new { Person=p, Group=g };
foreach (var result in query)
{
	var person = result.Person;
	Debug.Log(person.LastName + "(Age=" + person.Age + ") lives at:");
	foreach (var house in result.Group)
	{
		Debug.Log("  " + house.Address);
	}
}
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
//

/////////
// AS3 //
/////////
class PersonAndHouses
{
	var person:Person;
	var houses:Vector.<House> = new <House>[];
}
 
private function RunQuery(
	people:Vector.<Person>,
	houses:Vector.<House>
): Vector.<PersonAndHouses>
{
	var results:Vector.<PersonAndHouses> = new <PersonAndHouses>[];
	for each (var p:Person in people)
	{
		var result:PersonAndHouses = new PersonAndHouses();
		result.person = p;
		results.push(result);
 
		for each (var h:House in houses)
		{
			if (h.owner == p)
			{
				result.houses.push(h);
			}
		}
	}
	results.sort(
		function(a:PersonAndHouses, b:PersonAndHouses): Number
		{
			var primary:int = b.person.age - a.person.age;
			if (primary == 0)
			{
				return a.person.lastName < b.person.lastName
					? -1
					: 1;
			}
			else
			{
				return Number(primary);
			}
		}
	);
	return results;
}
 
for each (var result:PersonAndHouses in RunQuery(people, houses))
{
	var person:Person = result.person;
	trace(person.lastName + "(Age=" + person.age + ") lives at:");
	for each (var house:House in result.houses)
	{
		trace("  " + house.address);
	}
}

That’s all for today. Stay tuned for next week when we’ll continue the series with even more exciting new features in C#!

Continue to Part 20

Spot a bug? Have a question or suggestion? Post a comment!

From AS3 to C#, Part 19: SQL-Style Queries With LINQ

Comments