Automatic compilation of Linq to SQL queries

by Gunnar Liljas 26. November 2010 00:03

I started writing this post more than a year ago. Bought one year of hosting, installed BlogEngine and started writing. Then…a new job, workload, procrastination. Completely forgot about it. Thanks to newfound programming and writing energy I have now picked it up again.

Anyone who works extensively with Linq to SQL (L2S) will eventually notice that a lot of performance is lost in the phase between executing the Linq query and seeing it end up as a SQL query. Linq to SQL is really good at converting complex queries and projections into working SQL - quite a lot better than EF actually – but it seems to come at a cost.

Thankfully, L2S provides a remedy, in the form of CompiledQuery. It lets us statically define a query as a delegate type, which will perform most of the heavy query conversion stuff only once, and then just sit and wait for parameters.

public static Func<Northwnd, string, IQueryable<Customer>>
    CustomersByCity =
        CompiledQuery.Compile((Northwnd db, string city) =>
            from c in db.Customers where c.City == city select c);

public static Func<Northwnd, string, IQueryable<Customer>>
    CustomersById = CompiledQuery.Compile((Northwnd db,
    string id) => db.Customers.Where(c => c.CustomerID == id));

Very useful and in most cases significantly faster than standard queries.

What’s the problem? Well…

  • CompiledQuery can’t project to anonymous types
  • It doesn’t work in dynamic scenarios where criteria are added based on e.g an input form
  • The delegate can only take 3 input parameters (can be solved by wrapping the parameters in containers)

And above all…

  • Developers are lazy….Well, not all, but many. I know I am.. Smile

The boring part about creating CompiledQuery:s can be reduced with code generators (e.g T4), but being able to just type the queries when you need them and where you need them is darn productive…

Could it be possible to get the speed from CompiledQuery and keep the flexibility of standard queries? Well, my fist initial response/thought was a resounding “Maybe!”, so I set off to find out.

My idea was to:

  1. Intercept the queries before they’re sent off to be executed
  2. Examine the expression tree, replacing everything that resembles a variable with an input parameter
  3. Check if the rewritten query existed in the cache
  4. If not, convert it into a compiled query and store it in the cache. If yes, retrieve it.
  5. Execute the query, passing the extracted variables as parameters.

Absolutely not a novel idea, perhaps it’s even a bad idea, but I wanted to give it a shot anyway.

 

Intercept the queries before they’re sent off to be executed

This was the easy part. I borrowed the concept used in LinqKit’s AsExpandable extension method, which wraps the IQueryable<T> into an ExpandableQuery<T>, which in turn fires up an ExpandableQueryProvider<T> which all queries invoke when they are executed.

I created a CompilableQuery<T> and a CompilableQueryProvider<T> where all the shady stuff should happen.

 

Examine the expression tree, replacing stuff…

Given the query:

var name = "Something" 
var customers = from c in db.Customers 
                where c.Name=name and c.Orders.Count()>5 
                select c;

we have one clear parametrization candidate, the “name” variable. But should we also extract the contant value “5”? No, it’s likely a constant for a reason and unless the query is the result of Dynamic Linq, we should just let it stay constant. Or, should we not? We may need to revisit this…

Speaking of revisiting. That’s exactly what we should do. Send an expression visitor on its merry way along the branches of the expression tree. Let’s start off with the simplest query imaginable..

var product=(from p in db.Products.AsCompilable()
             where p.ProductID==1
             select p).Single();  

Which results in this expression tree

expression

We want to convert that picture (it’s a PNG) into a CompiledQuery like this one..

Func<Northwind,Product> theQuery = CompiledQuery.Compile((Northwind db)=> 
    (from p in db.Products
     where p.ProductID==1
     select p).Single()
);

In other words, we need to extract the DataContext from the query, since it is the only required parameter for the delegate. But it’s nowhere to be found… The closest we get is the ConstantExpression with “Table(Products)”, which means that we have no way of knowing that what actually was used was the “Products” property of a Northwind DataContext. Not a biggie, actually. We can get the DataContext instance from the Context property of the Table and the Products property is only a wrapper around GetTable<Product>, so we can use that as a replacement. Let’s try that in the newcomer, ParametrizingExpressionVisitor.

protected override Expression VisitConstant(ConstantExpression c)
{
    if (c.Type.IsGenericType && c.Type.GetGenericTypeDefinition() == typeof(Table<>))
    {
        var tableType = c.Type.GetGenericArguments()[0];
        var dataContext = c.Value.GetType().GetProperty("Context").GetValue(c.Value, null) as DataContext;
        return Expression.Call(Expression.Constant(dataContext), "GetTable", new Type[] { tableType });
    }
    return base.VisitConstant(c);
}

Ladida! The query still works.

Moving on to extract the parameters. Let’s modify the query slightly.

var product=(from p in db.Products.AsCompilable()
             where p.ProductID==productId
             select p).Single();

The constant “1” is now a variable, which we want to extract and use as a parameter. Let’s look at the expression tree.

productid

The simple ConstantExpression, representing the “1” has now become a slightly more complex MemberExpression, referring to a strange type called “<>c__DisplayClass13”. That’s actually a compiler generated class, which is needed in order to access “outside” variables inside the scope of a lambda function. Matt Manela explains it nicely.

So, in order to replace this little MemberExpression with a ParameterExpression, we do this…

protected override Expression VisitMemberAccess(MemberExpression m)
{
    if (m.Member.DeclaringType.Name.StartsWith("<>c__") && m.Member.MemberType == System.Reflection.MemberTypes.Field)
    {
        FieldInfo field = m.Member as FieldInfo;
        if (field != null)
        {
            object obj = ((ConstantExpression)m.Expression).Value;

    //Add the value to the extraction list

            _parameterValues.Add(field.GetValue(obj));

            return Expression.Parameter("__param" + _parameterValues.Count,field.FieldType);
        }
    }
    return base.VisitMemberAccess(m);
}

This worked fine, and I was able to compile and execute a CompiledQuery from the rewritten expression tree, using Reflection. But when adding a large set of input parameters, I got this:

System.ArgumentException : An incorrect number of type args were specified for the declaration of a Func type.
at System.Linq.Expressions.Expression.GetFuncType(Type[] typeArgs)

Arg(h)! That means we instead have to use some kind of “parameter container” and send that into the query instead. We could use a Dictionary or List or something like that. In fact, let’s just keep things really simple and go for an object[], i.e an array of System.Object. The parameter replacement is now looking like this:

protected override Expression VisitMemberAccess(MemberExpression m)
{
    //If the member is a DisplayClass
    if (m.IsCapturedField())
    {
        FieldInfo field = m.Member as FieldInfo;
        if (field != null)
        {
            //Get the value
            object obj = ((ConstantExpression)m.Expression).Value;
            var value = field.GetValue(obj);

            Type arrayType;
            //Expression arrayIndex;
            //If SupportEnumerableParameters and the type is an enumerable
            if (_compilationParameters.SupportEnumerableParameters && (arrayType = field.FieldType.GetGenericIEnumerables().FirstOrDefault()) != null)
            {
                var values = new List<Expression>();
                foreach (var item in ((IEnumerable)value))
                {
                    values.Add(AddParameterWithValue(arrayType, item));    
                }
                return Expression.NewArrayInit(arrayType, values);
            }
            else
            {
                return AddParameterWithValue(field.FieldType, value);
            }
        }
    }
    return base.VisitMemberAccess(m);
}

private Expression AddParameterWithValue(Type valueType,object value)
{
    //Create a ConstantExpression to hold the index in the array                
    var arrayIndex = Expression.Constant(_parameterValues.Count);
    //Add the value to the list that is returned after visiting hours
    _parameterValues.Add(value);
    //Return a replacement expression, which retreives the value from the array and converts (casts) it to its original type
    return Expression.Convert(Expression.ArrayIndex(_parameterContainerParam, arrayIndex), valueType);
}

And since we’re now back to having just one parameter, we won’t need the reflection stuff to use CompiledQuery. The result from the parametrization phase is stored in a class exposing the rewritten Expression, references to the ParameterExpressions used, a reference to the actual DataContext instance, and a list of parameter values.

var lambda = Expression.Lambda(result.Expression, new ParameterExpression[] { 
                            result.DataContextParameter, result.ParameterContainerParameter }
                        ) as Expression<Func<DataContext, object[], TResult>>;

var query = CompiledQuery.Compile<DataContext, object[], TResult>(lambda);

return query.Invoke(result.DataContext, result.ParametersValues.ToArray());

 

Caching

We now have a query provider which precompiles every query, every time. It’s good to see that it works, but clearly, we need some kind of cache to make this useful.

To be able to cache something we need an identifying key from the query. Expression.ToString() should do the trick.

var product = (from p in products 
               where p.Category.CategoryID == categoryId1
               select p).Count();

gives us…

__context.GetTable().Where(p => (p.Category.CategoryID = Convert(__params[0]))).Count()

which is exactly what we n……..oooo, wait a minute….That the “Convert” doesn’t say anything about what type it’s converting to may be a problem, but the bigger problem is that GetTable() doesn’t say anything about by which Table<T> it is getting. If we have more than one type of entity exposing something named Category.CategoryID we will get the same cache key. We could get around that by maintaining separate caches for all types of T, and that may be a good idea anyway, but still…. I’d prefer to get a uniquely identifying query string.

Clearly then, Expression.ToString isn’t enough, which leaves us no other option than to find another solution. Modifying the code from the existing ToString method would be an easy way out, but it’s not really legal, so I chose to use the ExpressionPrinter class from Mono and modify it slightly. While I was at it I added a bit of parameter name normalization to the ParametrizingExpressionVisitor. The result…

__context.GetTable<Product>().Where<Product>(p0 => (p0.Category.CategoryID = ((Int32)__params[0]))).Count<Product>()

A bit too verbose, perhaps, but it will do for now.

With that problem out of the way we can now check a cache for an existing CompiledQuery delegate, before trying to compile one. I decided to abstract the cache into a small interface.

public interface ICompiledQueryCache
{
    Delegate GetOrAdd<TResult>(string key, Func<Delegate> valueFactory);
}

And the complete Execute method now looks like this:

TResult IQueryProvider.Execute<TResult>(Expression expression)
{
    expression = ExpressionFlattener.Flatten(expression);
    
    var result = ParametrizingExpressionVisitor.Process(expression,_compilationParameters);

    string cacheKey = ExpressionPrinter.ToString(result.Expression);

    var query = _cache.GetOrAdd<TResult>(cacheKey, () => {
        
        var lambda = Expression.Lambda(result.Expression, new ParameterExpression[] { 
                    result.DataContextParameter, result.ParameterContainerParameter }
        ) as Expression<Func<DataContext, object[], TResult>>;
        
        return CompiledQuery.Compile<DataContext, object[], TResult>(lambda);
    });

    var compiledQuery = query as Func<DataContext, object[], TResult>;

    return compiledQuery.Invoke(result.DataContext, result.ParameterValues.ToArray());
}

The ExpressionFlattener is derived from Matt Warren’s super smart PartialEval code, and is used to evaluate as much as possible of the expression tree before it’s sent further. Non compiled queries actually perform similar flattening, but CompiledQuery:s don’t, making it impossible to use something like int.Parse(col["productId"]) inside a query. My goal is feature parity with standard queries, so the flattener is needed.

For testing purposes I used a simple Dictionary as the backing store for the cache. I know that there are concurrency issues with this, but it can easily be swapped out for something smarter, such as .NET 4:s ConcurrentDictionary.

 

Does it work?

In theory, the result should be that the non-compiled queries take the same amount of time every time, whereas the compiled version would take the of compilation the first time, and stay consistent after that. In reality, we have external factors, not least the actual SQL query execution, which makes each iteration different from the other. In order to avoid as much influence from the SQL server as possible, a dry run of each query was run at the beginning of each iteration sequence.

Every query was run 10000 times, in a round robin fashion. To be able to get only the time consumed up until the actual SQL execution I created a small class called StopWatchStoppingTextWriter, which I hooked up to the DataContext’s Log property. The idea was that the StopWatch should be stopped as soon as the context tries to send the constructed SQL query to the logger, which it does just before it executes.

class StopWatchStoppingTextWriter : TextWriter
{
    Stopwatch _watch;
    public StopWatchStoppingTextWriter(Stopwatch watch)
    {
        _watch = watch;
    }

    public override void WriteLine(string value)
    {
        _watch.Stop();
    }

    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

Test query 1

This is a really simple query, retrieving the name from the “first” product. It should be an easy job for the query provider and precompilation shouldn’t help much.

(from x in products
orderby x.ProductID
select x.ProductName).First()

Query1

 

Ouch!

  • The auto compiled version is a LOT slower on the first run. Not surprising, really, but still…
  • It’s interesting to see how the standard CompiledQuery version seems to defer most of its work until the first run.
  • The auto compiled version regains it’s glory on the subsequent runs. Given the nature of this query, it’s likely that it would be reused many times, so the initial hit should be worth taking.

Test query 2

The second query is a strange one, with parameters from the outside, subqueries, “paging” etc.

(from x in context.Products
where (x.CategoryID == categoryId1 || x.CategoryID == categoryId2)
&& x.SupplierID == iteration
|| x.Category.Products.Any(y => y.Order_Details.Any(od => od.Order.CustomerID.Length > 0))
orderby x.ProductID
select x).Skip(iteration).Take(50).Select(d => d.ProductID)

Query2

Nice!

  • The auto compiled version is very close the non compiled version, even on the first run!
  • Subsequent runs reduces the pre-SQL execution phase to 10% of the time required by the standard query.
  • The CompiledQuery version remains the winner, but the lead is mostly eaten up the time required for SQL and materialization.

 

Test query 3

The last test query isn’t very complex, but it uses 11 input parameters. For the CompiledQuery version I passed an array of integers as a single parameter.

(from x in products
where (
x.CategoryID == categoryId1 ||
x.CategoryID == categoryId2 ||
x.CategoryID == categoryId3 ||
x.CategoryID == categoryId4 ||
x.CategoryID == categoryId5 ||
x.CategoryID == categoryId6 ||
x.CategoryID == categoryId7 ||
x.CategoryID == categoryId8 ||
x.CategoryID == categoryId9 ||
x.CategoryID == categoryId10)
orderby x.ProductID
select x).Skip(iteration).Take(50).Select(d => d.ProductID).ToList()

 

Query3

 

Wow!

  • The auto compiled query wipes the floor with the non compiled query, even on the first run!
  • It’s really close to the standard CompiledQuery!
  • Subsequent queries increases the lead over the non compiled query, making it more than 10 times faster, including SQL.
  • The distance to the standard CompiledQuery stays roughly the same, giving a good indication of the actual overhead of the parameterization and caching.

Conclusions

I think the tests show that this is indeed a useful approach. Provided that a working cache solution is used, it should be perfectly production worthy. It would require a very high query variance factor in order to not gain a lot from using this. The standard CompiledQuery is clearly faster, but it has its drawbacks. I mentioned some of them above, and here is what MSDN says:

It is an error to re-execute a compiled query using a DataContext that contains a DataLoadOptions different from the one originally used to compile the query (the first time it is executed). The reason for this behavior is that compiling the query requires information that is contained in the DataShape. When a query is executed again using a different DataShape, the cached query becomes invalid and an error is generated.

CompilableQuery could handle this by including the DataContext’s DataLoadOptions in the cache key.

If a new query operator is applied to the result of the delegate execution, an exception is generated.

This won’t happen, since the full query is compiled.

When you want to execute a query operator over the result of executing a compiled query, you must translate the result to a list before operating on it.

Again, this won’t happen.

You cannot currently store (in static variables) queries that return an anonymous type, because the type has no name to provide as a generic argument.

Not a problem. The anonymous types won’t be anonymous to the CompilableQueryProvider.

Next steps

  • Completing the EF4 version. It works, but needs a bit of tweaking. Suffice to say, the EF implementation of CompiledQuery is a lot more picky about what you try to put into it.
  • Cleaning up and refactoring, hopefully making it possible to keep a reusable rewriting core for both EF and L2S.
  • Investigating caching options
  • Real life tests
  • Providing fallback when queries fails
  • And a lot more stuff which I can’t remember right now

You can download the solution here, but I’ll upload it to GitHub or CodePlex as soon as I can decide which I like best.

As always (really, Gunnar, you’ve only written 3 blog posts…), I welcome any ideas or opinions.

PS. While Googling for a solution to a problem I had with the EF4 version, I found an article by Dan Simmons, showing a solution which is similar to mine. After the initial “bummer!” feeling, I must admit that it is an elegant solution, but not quite as transparent as AutoCompiler. But, maybe that’s good thing? DS

 

kick it on DotNetKicks.com

Tags: , , ,

Linq

Comments

1/26/2011 8:31:10 AM #

Tobias

Hi!

Great project, thanks =)

I am actively using it but hit two issues - one larger, one smaller.

Queries of type:

from t in table.AsCompilable() orderby t.prop select t

will fail.
I managed to glean from your test cases that

from t in table.AsCompilable().OrderBy(o => o.prop) select t

will work.

Do you have any suggestion how to work around that? Currently, I have this (obviously bad code) in use, because there are quite a few such queries in the codebase:

        public IEnumerator<T> GetEnumerator()
        {
            IQueryable<T> val;

            try
            {
                val = _provider.Execute<IQueryable<T>>(_inner.Expression);
            }
            catch (Exception)
            {
                val = _provider.Execute<IOrderedQueryable<T>>(_inner.Expression);
            }
            return val.GetEnumerator();
        }

The other issue is, that using the simple type name in the cacheKey does cause issues in ASP.NET Website projects, because partial recompilation causes cached objects to be invalid (differening assemblies...).

Thanks again for this code!

Best

Tobias Germany |

1/27/2011 8:45:15 AM #

gunnar

Hi! Thanks for your input. Good to hear from someone else who's using it!

I tried to reproduce the orderby problem, but was unable to get any error. Maybe I've fixed it without realising it. I added this test:

[Test]
        public void Can_Select_Products_Ordered_By_ProductID_Using_Linq_Syntax()
        {
            using (NorthwindExpressDataContext db = new NorthwindExpressDataContext())
            {
                var products = from t in db.Products.AsCompilable() orderby t.ProductID select t;
                Assert.AreEqual(1, products.First().ProductID);
                Assert.AreEqual(2, products.Skip(1).First().ProductID);
            }
        }

and it passes.

I used a crude approach to solving the cache key problem, by simply using the assembly qualified name instead.

Give it a try!

You can download the latest binaries and source from http://linqautocompiler.codeplex.com/ (you'd be the first...)

Best regards
Gunnar

gunnar Sweden |

9/5/2011 6:36:21 PM #

Geof

Hi Gunnar,

I read your article after reading the following article: http://blog.csharplearners.com/category/linq/

I am still not sure, what I gain from the new released Microsoft Entity Framework (EF) June 2011 CTP. What do we really gain by the new "Auto-Compiled Linq Queries?"

- Geof

Geof Canada |

9/6/2011 7:56:43 AM #

gunnar

Hi Geof!

What we gain is of course faster queries. The query compilation phase takes a lot of time, so reusing already compiled query plans makes a lot of sense.

As to what is gained compared to my implementation.... Well, I certainly think that MS solution may be more complete than mine. There may be fringe cases where my solution simply doesn't work. I do use it in production code, and there are a few (very few) others who do the same.

I've looked at the code in the CTP, and it's orders of magnitude more complex than Linq.Autocompiler, so I think that it may also be less efficient. Linq.Autocompiler also supports per-query compilation (EF CTP does it on a per-datacontext level) and parametrization of constant expressions.

Once MS reenables CompiledQuery in EF 4.x I will port my solution, and well see if it has any merit.

gunnar Sweden |

Comments are closed

Month List

Page List