In this blog post I will describe the steps I took in order to find out why the title above was the case, and how it could be fixed.
On Stackoverflow the general opinion was that the reported slowness was “by design” and could not be fixed, but looking at recent tests posted on Stackoverflow pointed to the fact that something was not done right.
Since Entity Framework is now Open Source and available on CodePlex, I decided to have a deeper look.
To test if the process could be improved, I created the following console app:
1: namespace EF6SqlCETest
2: {
3: using System;
4: using System.Data.Entity;
5: using System.Diagnostics;
6:
7: class Program
8: {
9: static void Main(string[] args)
10: {
11: using (var db = new StudentContext())
12: {
13: Stopwatch sw = new Stopwatch();
14: db.Database.Delete();
15: sw.Start();
16: db.Database.CreateIfNotExists();
17: db.Configuration.AutoDetectChangesEnabled = false;
18: db.Configuration.ProxyCreationEnabled = false;
19: Console.WriteLine(
20: "Db created in {0}", sw.Elapsed.ToString());
21: sw.Restart();
22: for (int i = 0; i < 4000; i++)
23: {
24: var student = new Student { Name = Guid.NewGuid().ToString() };
25: db.Students.Add(student);
26: }
27: Console.WriteLine(
28: "Entities added in {0}", sw.Elapsed.ToString());
29:
30: sw.Restart();
31: int recordsAffected = db.SaveChanges();
32: sw.Stop();
33: Console.WriteLine(
34: "Saved {0} entities to the database, press any key to exit.",
35: recordsAffected);
36: Console.WriteLine(
37: "Saved entities in {0}", sw.Elapsed.ToString());
38: Console.ReadKey();
39: }
40:
41: }
42: }
43:
44: public class Student
45: {
46: public int Id { get; set; }
47: public string Name { get; set; }
48: }
49:
50: public class StudentContext : DbContext
51: {
52: public DbSet<Student> Students { get; set; }
53: }
54:
55: }
The test project and the related app.config is available for download here: http://sdrv.ms/UCL2j5
The test code is a simple Code First DbContext model. For each run I start with a new blank database, and creates it before doing SaveChanges, so that part of the process can be timed individually. The 2 options on lines 17 and 18 are there to ensure that the for loop runs quickly, without these option the loop adding objects takes much longer (test for yourself).
The resulting table looks like this:
CREATE TABLE [Students] (
[Id] int NOT NULL IDENTITY (1,1)
, [Name] nvarchar(4000) NULL
);
GO
ALTER TABLE [Students] ADD CONSTRAINT [PK_dbo.Students] PRIMARY KEY ([Id]);
GO
In order to find out where time was spent during SaveChanges, I ran a Visual Studio Performance Analysis. It turned out that all the time was spent in sqlceqp40.dll, the SQL Server Compact 4.0 unmanaged query processor – so something was amiss.
As described in my earlier blogpost, the SQL statements generated in order to return the server generated id (the IDENTITY value), looked like the following:
SELECT [Id] FROM [Students] WHERE [Id] = @@IDENTITY
So using the SQL Server Compact Toolbox, I coulde analyze the 2 statements:
And got the following result:
So for every INSERT, a table scan was performed, as for some reason, the SQL Server Compact query processor could not figure out to do an Index Seek. And the more rows to be scanned, the worse the performance got. And all the time for the operation was spent doing this.
In order to avoid this, I decided that the goal of the statement executed should be to avoid table scans, but return a value with the exact same shape as the previous statement executed, that is; it should have the name of the IDENTITY column, and be of the correct type (only bigint and int are supported as IDENTITY types with SQL Server Compact).
The return value of @@IDENTITY is numeric, so simply using “SELECT @@IDENTITY AS [Id]” would not work. So the statement could be:
SELECT [Id] FROM [Students] WHERE [Id] = CAST(@@IDENTITY as int)
The type could then be either int or bigint and the column alias should of course be the correct column name.
I could then analyze the modified statement:
And no table scan, instead an index seek, consuming only 19% of the total query cost instead of 100%.
And so this is what I have implemented in my fix, that I now need to figure out how to “submit a pull request” for.
8 comments:
Thanks for your good work! How will we know if and when the EF-team will act on this problem?
Andre: I think the team will take the fix seriously, but I just need to submit it, and then they will revei/polish and include in the master branch - after taht it will be available in the source and all future builds.
Alex: Use the Toolbox file version checker to determine the version, and I will tell you how to proceed
This is also a problem in LINQ-to-SQL.
Note that in L2S (and maybe also EF?), the query to return the server generated ID will also potentially include other columns, such as a rowversion column, e.g:
SELECT [Id], [RowVersion] FROM [Student] WHERE [Id] = @@IDENTITY
As such, I'm not sure that simply doing "SELECT CAST(@@IDENTITY AS int) AS [Id]" is sufficient.
However, the query optimizer *will* do the right thing if you cast the @@IDENTITY variable to int before doing the select:
SELECT [Id], [RowVersion] FROM [Student] WHERE [Id] = CAST(@@IDENTITY AS INT)
I think the 'correct' place to fix this would be in SQL CE itself, I'd see this as being a issue with the CE query optimizer rather than the queries generated by EF/L2S.
Unfortunately I can't see any obvious way to bodge L2S to do the right thing...
Anthony: Thanks, that was very useful information, I will do some testing with rowversion in my model. The only place this can be fixed as I see it is in the EF provider, as SQL Compact and LINQ to SQL are both eol.
Anthony: Thanks very much, your proposal is the "correct" fix. And LINQ to SQL suffers from the same issue, yes, but has much less overhead when it comes to database ceation and adding objects (both less than 500 ms)
Thanks Eric, got it working. Your blog was invaluable for that.
Post a Comment