I started writing this post more than a year ago. Bought one year of hosting, installed BlogEngine and started writing. Then…a new job, workload, procrastination. Completely forgot about it. Thanks to newfound programming and writing energy I have now picked it up again.
Anyone who works extensively with Linq to SQL (L2S) will eventually notice that a lot of performance is lost in the phase between executing the Linq query and seeing it end up as a SQL query. Linq to SQL is really good at converting complex queries and projections into working SQL - quite a lot better than EF actually – but it seems to come at a cost.
Thankfully, L2S provides a remedy, in the form of CompiledQuery. It lets us statically define a query as a delegate type, which will perform most of the heavy query conversion stuff only once, and then just sit and wait for parameters.
public static Func<Northwnd, string, IQueryable<Customer>>
CustomersByCity =
CompiledQuery.Compile((Northwnd db, string city) =>
from c in db.Customers where c.City == city select c);
public static Func<Northwnd, string, IQueryable<Customer>>
CustomersById = CompiledQuery.Compile((Northwnd db,
string id) => db.Customers.Where(c => c.CustomerID == id));
Very useful and in most cases significantly faster than standard queries.
What’s the problem? Well…
- CompiledQuery can’t project to anonymous types
- It doesn’t work in dynamic scenarios where criteria are added based on e.g an input form
- The delegate can only take 3 input parameters (can be solved by wrapping the parameters in containers)
And above all…
- Developers are lazy….Well, not all, but many. I know I am..
The boring part about creating CompiledQuery:s can be reduced with code generators (e.g T4), but being able to just type the queries when you need them and where you need them is darn productive…
Could it be possible to get the speed from CompiledQuery and keep the flexibility of standard queries? Well, my fist initial response/thought was a resounding “Maybe!”, so I set off to find out.
My idea was to:
- Intercept the queries before they’re sent off to be executed
- Examine the expression tree, replacing everything that resembles a variable with an input parameter
- Check if the rewritten query existed in the cache
- If not, convert it into a compiled query and store it in the cache. If yes, retrieve it.
- Execute the query, passing the extracted variables as parameters.
Absolutely not a novel idea, perhaps it’s even a bad idea, but I wanted to give it a shot anyway.
Intercept the queries before they’re sent off to be executed
This was the easy part. I borrowed the concept used in LinqKit’s AsExpandable extension method, which wraps the IQueryable<T> into an ExpandableQuery<T>, which in turn fires up an ExpandableQueryProvider<T> which all queries invoke when they are executed.
I created a CompilableQuery<T> and a CompilableQueryProvider<T> where all the shady stuff should happen.
Examine the expression tree, replacing stuff…
Given the query:
var name = "Something"
var customers = from c in db.Customers
where c.Name=name and c.Orders.Count()>5
select c;
we have one clear parametrization candidate, the “name” variable. But should we also extract the contant value “5”? No, it’s likely a constant for a reason and unless the query is the result of Dynamic Linq, we should just let it stay constant. Or, should we not? We may need to revisit this…
Speaking of revisiting. That’s exactly what we should do. Send an expression visitor on its merry way along the branches of the expression tree. Let’s start off with the simplest query imaginable..
var product=(from p in db.Products.AsCompilable()
where p.ProductID==1
select p).Single();
Which results in this expression tree

We want to convert that picture (it’s a PNG) into a CompiledQuery like this one..
Func<Northwind,Product> theQuery = CompiledQuery.Compile((Northwind db)=>
(from p in db.Products
where p.ProductID==1
select p).Single()
);
In other words, we need to extract the DataContext from the query, since it is the only required parameter for the delegate. But it’s nowhere to be found… The closest we get is the ConstantExpression with “Table(Products)”, which means that we have no way of knowing that what actually was used was the “Products” property of a Northwind DataContext. Not a biggie, actually. We can get the DataContext instance from the Context property of the Table and the Products property is only a wrapper around GetTable<Product>, so we can use that as a replacement. Let’s try that in the newcomer, ParametrizingExpressionVisitor.
protected override Expression VisitConstant(ConstantExpression c)
{
if (c.Type.IsGenericType && c.Type.GetGenericTypeDefinition() == typeof(Table<>))
{
var tableType = c.Type.GetGenericArguments()[0];
var dataContext = c.Value.GetType().GetProperty("Context").GetValue(c.Value, null) as DataContext;
return Expression.Call(Expression.Constant(dataContext), "GetTable", new Type[] { tableType });
}
return base.VisitConstant(c);
}
Ladida! The query still works.
Moving on to extract the parameters. Let’s modify the query slightly.
var product=(from p in db.Products.AsCompilable()
where p.ProductID==productId
select p).Single();
The constant “1” is now a variable, which we want to extract and use as a parameter. Let’s look at the expression tree.

The simple ConstantExpression, representing the “1” has now become a slightly more complex MemberExpression, referring to a strange type called “<>c__DisplayClass13”. That’s actually a compiler generated class, which is needed in order to access “outside” variables inside the scope of a lambda function. Matt Manela explains it nicely.
So, in order to replace this little MemberExpression with a ParameterExpression, we do this…
protected override Expression VisitMemberAccess(MemberExpression m)
{
if (m.Member.DeclaringType.Name.StartsWith("<>c__") && m.Member.MemberType == System.Reflection.MemberTypes.Field)
{
FieldInfo field = m.Member as FieldInfo;
if (field != null)
{
object obj = ((ConstantExpression)m.Expression).Value;
//Add the value to the extraction list
_parameterValues.Add(field.GetValue(obj));
return Expression.Parameter("__param" + _parameterValues.Count,field.FieldType);
}
}
return base.VisitMemberAccess(m);
}
This worked fine, and I was able to compile and execute a CompiledQuery from the rewritten expression tree, using Reflection. But when adding a large set of input parameters, I got this:
System.ArgumentException : An incorrect number of type args were specified for the declaration of a Func type.
at System.Linq.Expressions.Expression.GetFuncType(Type[] typeArgs)
Arg(h)! That means we instead have to use some kind of “parameter container” and send that into the query instead. We could use a Dictionary or List or something like that. In fact, let’s just keep things really simple and go for an object[], i.e an array of System.Object. The parameter replacement is now looking like this:
protected override Expression VisitMemberAccess(MemberExpression m)
{
//If the member is a DisplayClass
if (m.IsCapturedField())
{
FieldInfo field = m.Member as FieldInfo;
if (field != null)
{
//Get the value
object obj = ((ConstantExpression)m.Expression).Value;
var value = field.GetValue(obj);
Type arrayType;
//Expression arrayIndex;
//If SupportEnumerableParameters and the type is an enumerable
if (_compilationParameters.SupportEnumerableParameters && (arrayType = field.FieldType.GetGenericIEnumerables().FirstOrDefault()) != null)
{
var values = new List<Expression>();
foreach (var item in ((IEnumerable)value))
{
values.Add(AddParameterWithValue(arrayType, item));
}
return Expression.NewArrayInit(arrayType, values);
}
else
{
return AddParameterWithValue(field.FieldType, value);
}
}
}
return base.VisitMemberAccess(m);
}
private Expression AddParameterWithValue(Type valueType,object value)
{
//Create a ConstantExpression to hold the index in the array
var arrayIndex = Expression.Constant(_parameterValues.Count);
//Add the value to the list that is returned after visiting hours
_parameterValues.Add(value);
//Return a replacement expression, which retreives the value from the array and converts (casts) it to its original type
return Expression.Convert(Expression.ArrayIndex(_parameterContainerParam, arrayIndex), valueType);
}
And since we’re now back to having just one parameter, we won’t need the reflection stuff to use CompiledQuery. The result from the parametrization phase is stored in a class exposing the rewritten Expression, references to the ParameterExpressions used, a reference to the actual DataContext instance, and a list of parameter values.
var lambda = Expression.Lambda(result.Expression, new ParameterExpression[] {
result.DataContextParameter, result.ParameterContainerParameter }
) as Expression<Func<DataContext, object[], TResult>>;
var query = CompiledQuery.Compile<DataContext, object[], TResult>(lambda);
return query.Invoke(result.DataContext, result.ParametersValues.ToArray());
Caching
We now have a query provider which precompiles every query, every time. It’s good to see that it works, but clearly, we need some kind of cache to make this useful.
To be able to cache something we need an identifying key from the query. Expression.ToString() should do the trick.
var product = (from p in products
where p.Category.CategoryID == categoryId1
select p).Count();
gives us…
__context.GetTable().Where(p => (p.Category.CategoryID = Convert(__params[0]))).Count()
which is exactly what we n……..oooo, wait a minute….That the “Convert” doesn’t say anything about what type it’s converting to may be a problem, but the bigger problem is that GetTable() doesn’t say anything about by which Table<T> it is getting. If we have more than one type of entity exposing something named Category.CategoryID we will get the same cache key. We could get around that by maintaining separate caches for all types of T, and that may be a good idea anyway, but still…. I’d prefer to get a uniquely identifying query string.
Clearly then, Expression.ToString isn’t enough, which leaves us no other option than to find another solution. Modifying the code from the existing ToString method would be an easy way out, but it’s not really legal, so I chose to use the ExpressionPrinter class from Mono and modify it slightly. While I was at it I added a bit of parameter name normalization to the ParametrizingExpressionVisitor. The result…
__context.GetTable<Product>().Where<Product>(p0 => (p0.Category.CategoryID = ((Int32)__params[0]))).Count<Product>()
A bit too verbose, perhaps, but it will do for now.
With that problem out of the way we can now check a cache for an existing CompiledQuery delegate, before trying to compile one. I decided to abstract the cache into a small interface.
public interface ICompiledQueryCache
{
Delegate GetOrAdd<TResult>(string key, Func<Delegate> valueFactory);
}
And the complete Execute method now looks like this:
TResult IQueryProvider.Execute<TResult>(Expression expression)
{
expression = ExpressionFlattener.Flatten(expression);
var result = ParametrizingExpressionVisitor.Process(expression,_compilationParameters);
string cacheKey = ExpressionPrinter.ToString(result.Expression);
var query = _cache.GetOrAdd<TResult>(cacheKey, () => {
var lambda = Expression.Lambda(result.Expression, new ParameterExpression[] {
result.DataContextParameter, result.ParameterContainerParameter }
) as Expression<Func<DataContext, object[], TResult>>;
return CompiledQuery.Compile<DataContext, object[], TResult>(lambda);
});
var compiledQuery = query as Func<DataContext, object[], TResult>;
return compiledQuery.Invoke(result.DataContext, result.ParameterValues.ToArray());
}
The ExpressionFlattener is derived from Matt Warren’s super smart PartialEval code, and is used to evaluate as much as possible of the expression tree before it’s sent further. Non compiled queries actually perform similar flattening, but CompiledQuery:s don’t, making it impossible to use something like int.Parse(col["productId"]) inside a query. My goal is feature parity with standard queries, so the flattener is needed.
For testing purposes I used a simple Dictionary as the backing store for the cache. I know that there are concurrency issues with this, but it can easily be swapped out for something smarter, such as .NET 4:s ConcurrentDictionary.
Does it work?
In theory, the result should be that the non-compiled queries take the same amount of time every time, whereas the compiled version would take the of compilation the first time, and stay consistent after that. In reality, we have external factors, not least the actual SQL query execution, which makes each iteration different from the other. In order to avoid as much influence from the SQL server as possible, a dry run of each query was run at the beginning of each iteration sequence.
Every query was run 10000 times, in a round robin fashion. To be able to get only the time consumed up until the actual SQL execution I created a small class called StopWatchStoppingTextWriter, which I hooked up to the DataContext’s Log property. The idea was that the StopWatch should be stopped as soon as the context tries to send the constructed SQL query to the logger, which it does just before it executes.
class StopWatchStoppingTextWriter : TextWriter
{
Stopwatch _watch;
public StopWatchStoppingTextWriter(Stopwatch watch)
{
_watch = watch;
}
public override void WriteLine(string value)
{
_watch.Stop();
}
public override Encoding Encoding
{
get { return Encoding.UTF8; }
}
}
Test query 1
This is a really simple query, retrieving the name from the “first” product. It should be an easy job for the query provider and precompilation shouldn’t help much.
(from x in products
orderby x.ProductID
select x.ProductName).First()

Ouch!
- The auto compiled version is a LOT slower on the first run. Not surprising, really, but still…
- It’s interesting to see how the standard CompiledQuery version seems to defer most of its work until the first run.
- The auto compiled version regains it’s glory on the subsequent runs. Given the nature of this query, it’s likely that it would be reused many times, so the initial hit should be worth taking.
Test query 2
The second query is a strange one, with parameters from the outside, subqueries, “paging” etc.
(from x in context.Products
where (x.CategoryID == categoryId1 || x.CategoryID == categoryId2)
&& x.SupplierID == iteration
|| x.Category.Products.Any(y => y.Order_Details.Any(od => od.Order.CustomerID.Length > 0))
orderby x.ProductID
select x).Skip(iteration).Take(50).Select(d => d.ProductID)

Nice!
- The auto compiled version is very close the non compiled version, even on the first run!
- Subsequent runs reduces the pre-SQL execution phase to 10% of the time required by the standard query.
- The CompiledQuery version remains the winner, but the lead is mostly eaten up the time required for SQL and materialization.
Test query 3
The last test query isn’t very complex, but it uses 11 input parameters. For the CompiledQuery version I passed an array of integers as a single parameter.
(from x in products
where (
x.CategoryID == categoryId1 ||
x.CategoryID == categoryId2 ||
x.CategoryID == categoryId3 ||
x.CategoryID == categoryId4 ||
x.CategoryID == categoryId5 ||
x.CategoryID == categoryId6 ||
x.CategoryID == categoryId7 ||
x.CategoryID == categoryId8 ||
x.CategoryID == categoryId9 ||
x.CategoryID == categoryId10)
orderby x.ProductID
select x).Skip(iteration).Take(50).Select(d => d.ProductID).ToList()

Wow!
- The auto compiled query wipes the floor with the non compiled query, even on the first run!
- It’s really close to the standard CompiledQuery!
- Subsequent queries increases the lead over the non compiled query, making it more than 10 times faster, including SQL.
- The distance to the standard CompiledQuery stays roughly the same, giving a good indication of the actual overhead of the parameterization and caching.
Conclusions
I think the tests show that this is indeed a useful approach. Provided that a working cache solution is used, it should be perfectly production worthy. It would require a very high query variance factor in order to not gain a lot from using this. The standard CompiledQuery is clearly faster, but it has its drawbacks. I mentioned some of them above, and here is what MSDN says:
It is an error to re-execute a compiled query using a DataContext that contains a DataLoadOptions different from the one originally used to compile the query (the first time it is executed). The reason for this behavior is that compiling the query requires information that is contained in the DataShape. When a query is executed again using a different DataShape, the cached query becomes invalid and an error is generated.
CompilableQuery could handle this by including the DataContext’s DataLoadOptions in the cache key.
If a new query operator is applied to the result of the delegate execution, an exception is generated.
This won’t happen, since the full query is compiled.
When you want to execute a query operator over the result of executing a compiled query, you must translate the result to a list before operating on it.
Again, this won’t happen.
You cannot currently store (in static variables) queries that return an anonymous type, because the type has no name to provide as a generic argument.
Not a problem. The anonymous types won’t be anonymous to the CompilableQueryProvider.
Next steps
- Completing the EF4 version. It works, but needs a bit of tweaking. Suffice to say, the EF implementation of CompiledQuery is a lot more picky about what you try to put into it.
- Cleaning up and refactoring, hopefully making it possible to keep a reusable rewriting core for both EF and L2S.
- Investigating caching options
- Real life tests
- Providing fallback when queries fails
- And a lot more stuff which I can’t remember right now
You can download the solution here, but I’ll upload it to GitHub or CodePlex as soon as I can decide which I like best.
As always (really, Gunnar, you’ve only written 3 blog posts…), I welcome any ideas or opinions.
PS. While Googling for a solution to a problem I had with the EF4 version, I found an article by Dan Simmons, showing a solution which is similar to mine. After the initial “bummer!” feeling, I must admit that it is an elegant solution, but not quite as transparent as AutoCompiler. But, maybe that’s good thing? DS
