In this blog post I will describe the steps I took to find out why the behaviour described in the title above occurs, and how it could be fixed.
On Stack Overflow the general opinion was that the reported slowness was “by design” and could not be fixed, but recent tests posted there suggested that something was not being done right.
Since Entity Framework is now Open Source and available on CodePlex, I decided to have a deeper look.
To test if the process could be improved, I created the following console app:
 1: namespace EF6SqlCETest
 2: {
 3:     using System;
 4:     using System.Data.Entity;
 5:     using System.Diagnostics;
 6:
 7:     class Program
 8:     {
 9:         static void Main(string[] args)
10:         {
11:             using (var db = new StudentContext())
12:             {
13:                 Stopwatch sw = new Stopwatch();
14:                 db.Database.Delete();
15:                 sw.Start();
16:                 db.Database.CreateIfNotExists();
17:                 db.Configuration.AutoDetectChangesEnabled = false;
18:                 db.Configuration.ProxyCreationEnabled = false;
19:                 Console.WriteLine(
20:                     "Db created in {0}", sw.Elapsed.ToString());
21:                 sw.Restart();
22:                 for (int i = 0; i < 4000; i++)
23:                 {
24:                     var student = new Student { Name = Guid.NewGuid().ToString() };
25:                     db.Students.Add(student);
26:                 }
27:                 Console.WriteLine(
28:                     "Entities added in {0}", sw.Elapsed.ToString());
29:
30:                 sw.Restart();
31:                 int recordsAffected = db.SaveChanges();
32:                 sw.Stop();
33:                 Console.WriteLine(
34:                     "Saved {0} entities to the database, press any key to exit.",
35:                     recordsAffected);
36:                 Console.WriteLine(
37:                     "Saved entities in {0}", sw.Elapsed.ToString());
38:                 Console.ReadKey();
39:             }
40:
41:         }
42:     }
43:
44:     public class Student
45:     {
46:         public int Id { get; set; }
47:         public string Name { get; set; }
48:     }
49:
50:     public class StudentContext : DbContext
51:     {
52:         public DbSet<Student> Students { get; set; }
53:     }
54:
55: }
The test project and the related app.config are available for download here: http://sdrv.ms/UCL2j5
The test code is a simple Code First DbContext model. For each run I start with a new blank database, and create it before doing SaveChanges, so that part of the process can be timed individually. The two options on lines 17 and 18 (AutoDetectChangesEnabled and ProxyCreationEnabled) are there to ensure that the for loop runs quickly; without these options the loop adding objects takes much longer (test for yourself).
The resulting table looks like this:
CREATE TABLE [Students] (
[Id] int NOT NULL IDENTITY (1,1)
, [Name] nvarchar(4000) NULL
);
GO
ALTER TABLE [Students] ADD CONSTRAINT [PK_dbo.Students] PRIMARY KEY ([Id]);
GO
In order to find out where time was spent during SaveChanges, I ran a Visual Studio Performance Analysis. It turned out that all the time was spent in sqlceqp40.dll, the SQL Server Compact 4.0 unmanaged query processor – so something was amiss.
As described in my earlier blog post, the SQL statement generated to return the server-generated id (the IDENTITY value) looked like the following:
SELECT [Id] FROM [Students] WHERE [Id] = @@IDENTITY
So using the SQL Server Compact Toolbox, I could analyze the two statements:
And got the following result:
So for every INSERT, a table scan was performed, because for some reason the SQL Server Compact query processor could not figure out that it could do an Index Seek. The more rows there were to scan, the worse the performance got, and all the time for the operation was spent doing this.
To avoid this, I decided that the replacement statement should avoid table scans but return a value with exactly the same shape as the original statement: it should carry the name of the IDENTITY column and be of the correct type (only bigint and int are supported as IDENTITY types with SQL Server Compact).
The return value of @@IDENTITY is numeric, so simply using “SELECT @@IDENTITY AS [Id]” would not work, as the result would not have the correct type. So the statement could be:
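To get a feel for why a scan per INSERT hurts as the table grows, here is a rough back-of-the-envelope sketch. It is only a model, not a measurement: it assumes that the scan after the k-th insert reads all k rows in the table, and that an index seek can be counted as touching a single row. The names ScanCostModel, RowsReadWithTableScan and RowsReadWithIndexSeek are made up for this illustration.

```csharp
using System;

static class ScanCostModel
{
    // Assumption: the scan after the k-th insert reads all k rows,
    // so the total is 1 + 2 + ... + inserts.
    public static long RowsReadWithTableScan(int inserts)
    {
        long total = 0;
        for (int k = 1; k <= inserts; k++)
        {
            total += k;
        }
        return total;
    }

    // Assumption: an index seek is counted as reading one row per insert.
    public static long RowsReadWithIndexSeek(int inserts)
    {
        return inserts;
    }
}

class ScanCostDemo
{
    static void Main()
    {
        const int inserts = 4000; // same number of entities as the test app
        Console.WriteLine("Table scan: {0} rows read",
            ScanCostModel.RowsReadWithTableScan(inserts));
        Console.WriteLine("Index seek: {0} rows read",
            ScanCostModel.RowsReadWithIndexSeek(inserts));
    }
}
```

Under these assumptions, the 4000 inserts from the test app add up to 8,002,000 rows read with table scans versus 4000 with seeks, which is consistent with the slowdown getting worse the more rows are saved.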
SELECT [Id] FROM [Students] WHERE [Id] = CAST(@@IDENTITY as int)
The type could then be either int or bigint and the column alias should of course be the correct column name.
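As a minimal sketch of how such a statement could be put together for either IDENTITY type, consider the helper below. This is not the code from the actual fix in Entity Framework; IdentityType, IdentitySelectBuilder and Build are hypothetical names used only for illustration, and the table and column names are assumed to be plain identifiers that need no escaping.

```csharp
using System;

// Hypothetical: the two IDENTITY types SQL Server Compact supports.
enum IdentityType { Int, BigInt }

static class IdentitySelectBuilder
{
    // Builds the SELECT that returns the generated key, casting @@IDENTITY
    // to the column's own type so the comparison matches the key column.
    public static string Build(string table, string column, IdentityType type)
    {
        string sqlType = type == IdentityType.BigInt ? "bigint" : "int";
        return string.Format(
            "SELECT [{0}] FROM [{1}] WHERE [{0}] = CAST(@@IDENTITY AS {2})",
            column, table, sqlType);
    }
}

class IdentitySelectDemo
{
    static void Main()
    {
        // Prints: SELECT [Id] FROM [Students] WHERE [Id] = CAST(@@IDENTITY AS int)
        Console.WriteLine(
            IdentitySelectBuilder.Build("Students", "Id", IdentityType.Int));
    }
}
```

The result keeps the same shape as the original statement (one column, named after the IDENTITY column), and only the right-hand side of the comparison changes.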
I could then analyze the modified statement:
The result: no table scan, but an index seek instead, consuming only 19% of the total query cost instead of 100%.
This is what I have implemented in my fix, for which I now need to figure out how to “submit a pull request”.