查询在客户端应用程序中需要很长时间,但在SQL Server Management Studio中运行速度很快

我正在开发一个存储图像和相关元数据的应用程序.我在使用NHibernate执行某个查询时遇到了问题.查询耗时过长(在我的机器上类似于31秒),尽管在SQL Server Management Studio中执行时相同的查询只需要几分之一秒.

我已经将问题简化并解决了一个小的测试应用程序:

实体:

标记,由Id组成(字符串,标记值本身)

public class Tag
{
    public virtual string Id { get; set; }
}

图像,包括Id(int),Name(字符串)和标签(多对多,Tag实例集)

public class Image
{
    private Iesi.Collections.Generic.ISet<Tag> tags = new HashedSet<Tag>();

    public virtual int Id { get; set; }

    public virtual string Name { get; set; }

    public virtual IEnumerable<Tag> Tags
    {
        get { return tags; }
    }

    public virtual void AddTag(Tag tag)
    {
        tags.Add(tag);
    }
}

我使用“按代码映射”使用以下映射:

public class TagMapping : ClassMapping<Tag>
{
    public TagMapping()
    {
        Id(x => x.Id, map => map.Generator(Generators.Assigned));
    }
}

public class ImageMapping : ClassMapping<Image>
{
    public ImageMapping()
    {
        Id(x => x.Id, map => map.Generator(Generators.Native));
        Property(x => x.Name);
        Set(x => x.Tags, 
            map => map.Access(Accessor.Field),
            map => map.ManyToMany(m2m => { }));
    }
}

NHibernate /数据库配置如下所示:

  <hibernate-configuration xmlns="urn:nhibernate-configuration-2.2">
    <session-factory>
      <property name="dialect">NHibernate.Dialect.MsSql2008Dialect</property>
      <property name="connection.connection_string_name">PrimaryDatabase</property>
        <property name="format_sql">true</property>
    </session-factory>
  </hibernate-configuration>
  <connectionStrings>
    <add name="PrimaryDatabase" providerName="System.Data.SqlClient" connectionString="Data Source=.\SQLEXPRESS;Initial Catalog=PerfTest;Integrated Security=True" />
  </connectionStrings>

我想实现以下查询:给我所有图像,其中名称包含特定字符串或任何标记包含特定字符串.为了找到后者,我使用了一个子查询,它给了我带有匹配标签的所有图像的ID.因此,最终搜索条件是:图像具有包含特定字符串的名称,或者其ID是子查询返回的ID之一.

这是执行查询的代码:

var term = "abc";
var mode = MatchMode.Anywhere;

var imagesWithMatchingTag = QueryOver.Of<Image>()
    .JoinQueryOver<Tag>(x => x.Tags)
    .WhereRestrictionOn(x => x.Id).IsLike(term, mode)
    .Select(x => x.Id);

var qry = session.QueryOver<Image>()
    .Where( Restrictions.On<Image>(x => x.Name).IsLike(term, mode) ||
            Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag))
    .List();

我运行此查询的测试数据库(DBMS:SQL Server 2008 Express R2)是专门为此测试创建的,不包含任何其他内容.我用随机数据填充:10.000张图像(表格图像),4.000张标签(表格标签)和图像与标签之间大约200,000个关联(表格标签),即.每个图像有大约20个相关标签.数据库

SQL NHibernate声称使用的是:

SELECT
    this_.Id as Id1_0_,
    this_.Name as Name1_0_
FROM
    Image this_
WHERE
    (
        this_.Name like @p0
        or this_.Id in (
            SELECT
                this_0_.Id as y0_
            FROM
                Image this_0_
            inner join
                Tags tags3_
                    on this_0_.Id=tags3_.image_key
            inner join
                Tag tag1_
                    on tags3_.elt=tag1_.Id
            WHERE
                tag1_.Id like @p1
        )
    );
@p0 = '%abc%' [Type: String (4000)], @p1 = '%abc%' [Type: String (4000)]

鉴于我正在创建的查询,这看起来很合理.

如果我使用NHibernate运行此查询,则查询大约需要30秒(NHibernate.AdoNet.AbstractBatcher – ExecuteReader需要32964毫秒)并返回98个实体.

但是,如果我直接在Sql Server Management studio中执行等效查询:

DECLARE @p0 nvarchar(4000)
DECLARE @p1 nvarchar(4000)

SET @p0 = '%abc%'
SET @p1 = '%abc%'    

SELECT
    this_.Id as Id1_0_,
    this_.Name as Name1_0_
FROM
    Image this_
WHERE
    (
        this_.Name like @p0
        or this_.Id in (
            SELECT
                this_0_.Id as y0_
            FROM
                Image this_0_
            inner join
                Tags tags3_
                    on this_0_.Id=tags3_.image_key
            inner join
                Tag tag1_
                    on tags3_.elt=tag1_.Id
            WHERE
                tag1_.Id like @p1
        )
    );

查询花费不到一秒钟(并返回98个结果).

进一步实验:

如果我只按名称搜索或仅按标签搜索,即:

var qry = session.QueryOver<Image>()
    .Where( Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag))
    .List();

要么

var qry = session.QueryOver<Image>()
    .Where(Restrictions.On<Image>(x => x.Name).IsLike(term, mode))
    .List();

查询很快.

如果我在子查询中不使用like但完全匹配:

var imagesWithMatchingTag = QueryOver.Of<Image>()
    .JoinQueryOver<Tag>(x => x.Tags)
    .Where(x => x.Id == term)
    .Select(x => x.Id);

查询也很快.

将名称的匹配模式更改为Exact不会更改任何内容.

当我调试程序并在查询执行时暂停时,托管调用堆栈的顶部如下所示:

[Managed to Native Transition]   
System.Data.dll!SNINativeMethodWrapper.SNIReadSync(System.Runtime.InteropServices.SafeHandle pConn, ref System.IntPtr packet, int timeout) + 0x53 bytes  
System.Data.dll!System.Data.SqlClient.TdsParserStateObject.ReadSni(System.Data.Common.DbAsyncResult asyncResult, System.Data.SqlClient.TdsParserStateObject stateObj) + 0xa3 bytes   
System.Data.dll!System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket() + 0x24 bytes  
System.Data.dll!System.Data.SqlClient.TdsParserStateObject.ReadBuffer() + 0x1f bytes     
System.Data.dll!System.Data.SqlClient.TdsParserStateObject.ReadByte() + 0x46 bytes   
System.Data.dll!System.Data.SqlClient.TdsParser.Run(System.Data.SqlClient.RunBehavior runBehavior, System.Data.SqlClient.SqlCommand cmdHandler, System.Data.SqlClient.SqlDataReader dataStream, System.Data.SqlClient.BulkCopySimpleResultSet bulkCopyHandler, System.Data.SqlClient.TdsParserStateObject stateObj) + 0x67 bytes     
System.Data.dll!System.Data.SqlClient.SqlDataReader.ConsumeMetaData() + 0x22 bytes   
System.Data.dll!System.Data.SqlClient.SqlDataReader.MetaData.get() + 0x57 bytes  
System.Data.dll!System.Data.SqlClient.SqlCommand.FinishExecuteReader(System.Data.SqlClient.SqlDataReader ds, System.Data.SqlClient.RunBehavior runBehavior, string resetOptionsString) + 0xe1 bytes  
...

所以,我的问题是:

>为什么NHibernate执行查询的时间要长得多,即使使用的SQL是相同的?
>我怎样才能摆脱这种差异?是否有可能导致此行为的设置?

我知道查询一般不是世界上最有效的东西,但是我在这里引人注目的是使用NHibernate和manualy查询之间的区别.这里确实存在一些奇怪的事情.

很抱歉这篇文章很长,但我希望尽可能多地包含这个问题.非常感谢您的帮助!

更新1:
我已经使用NHProf测试了应用程序而没有太多附加值:NHProf显示执行的SQL是

SELECT this_.Id   as Id1_0_,
       this_.Name as Name1_0_
FROM   Image this_
WHERE  (this_.Name like '%abc%' /* @p0 */
         or this_.Id in (SELECT this_0_.Id as y0_
                         FROM   Image this_0_
                                inner join Tags tags3_
                                  on this_0_.Id = tags3_.image_key
                                inner join Tag tag1_
                                  on tags3_.elt = tag1_.Id
                         WHERE  tag1_.Id like '%abc%' /* @p1 */))

这正是我之前发布的内容(因为这是NHibernate首先写入日志的内容).

这是NHProf的截图

警告是可以理解的,但不解释行为.

更新2
@surfen首先将子查询的结果从数据库中提取出来并将它们重新放回主查询中:

var imagesWithMatchingTag = QueryOver.Of<Image>()
    .JoinQueryOver<Tag>(x => x.Tags)
    .WhereRestrictionOn(x => x.Id).IsLike(term, mode)
    .Select(x => x.Id);

var ids = imagesWithMatchingTag.GetExecutableQueryOver(session).List<int>().ToArray();

var qry = session.QueryOver<Image>()
    .Where(
            Restrictions.On<Image>(x => x.Name).IsLike(term, mode) ||
            Restrictions.On<Image>(x => x.Id).IsIn(ids))
    .List();

虽然这确实使主查询再次快速,但我宁愿不采用这种方法,因为它不适合现实世界应用程序中的预期用法.有趣的是,这要快得多.我希望子查询方法同样快,因为它不依赖于外部查询.

更新3
这似乎与NHibernate无关.如果我使用普通的ADO.NET对象运行查询,我会得到相同的行为:

var cmdText = @"SELECT this_.Id   as Id1_0_,
                        this_.Name as Name1_0_
                FROM   Image this_
                WHERE  (this_.Name like  @p0 
                            or this_.Id in 
                        (SELECT this_0_.Id as y0_
                        FROM   Image this_0_
                            inner join Tags tags3_
                                on this_0_.Id = tags3_.image_key
                            inner join Tag tag1_
                                on tags3_.elt = tag1_.Id
                        WHERE  tag1_.Id like  @p1 ));";

using (var con = new SqlConnection(ConfigurationManager.ConnectionStrings["PrimaryDatabase"].ConnectionString))
{
    con.Open();
    using (var txn = con.BeginTransaction())
    {
        using (var cmd = new SqlCommand(cmdText, con, txn))
        {
            cmd.CommandTimeout = 120;
            cmd.Parameters.AddWithValue("p0", "%abc%");
            cmd.Parameters.AddWithValue("p1", "%abc%");

            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine("Match");
                }
            }

        }
        txn.Commit();
    }
}

更新4

查询计划(点击放大):

慢查询

快速查询

确切地说,计划有所不同.

更新5

因为Sql Server确实认为子查询是相关的,所以我尝试了不同的东西:我将与名称相关的标准单独移动到子查询:

var term = "abc";
var mode = MatchMode.Anywhere;

var imagesWithMatchingTag = QueryOver.Of<Image>()
    .JoinQueryOver<Tag>(x => x.Tags)
    .WhereRestrictionOn(x => x.Id).IsLike(term, mode)
    .Select(x => x.Id);

var imagesWithMatchingName = QueryOver.Of<Image>()
    .WhereRestrictionOn(x => x.Name).IsLike(term, mode)
    .Select(x => x.Id);

var qry = session.QueryOver<Image>()
    .Where(
      Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingName) ||        
      Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag) 
    ).List();

生成的SQL:

SELECT
    this_.Id as Id1_0_,
    this_.Name as Name1_0_
FROM
    Image this_
WHERE
    (
        this_.Id in (
            SELECT
                this_0_.Id as y0_
            FROM
                Image this_0_
            inner join
                Tags tags3_
                    on this_0_.Id=tags3_.image_key
            inner join
                Tag tag1_
                    on tags3_.elt=tag1_.Id
            WHERE
                tag1_.Id like @p0
        )
        or this_.Id in (
            SELECT
                this_0_.Id as y0_
            FROM
                Image this_0_
            WHERE
                this_0_.Name like @p1
        )
    );
@p0 = '%abc%' [Type: String (4000)], @p1 = '%abc%' [Type: String (4000)]

这似乎打破了相关性,结果查询再次“快速”(“快速”,如“当下可接受”).查询时间从30秒下降到~170ms.仍然不是轻量级查询,但至少可以让我从这里继续.我知道“喜欢’%foo%’”永远不会超级快.如果遇到最糟糕的情况,我仍然可以转移到专门的搜索服务器(Lucene,solr)或真正的全文搜索.

更新6
我能够将查询重写为不使用子查询:

var qry = session.QueryOver(() => img)
    .Left.JoinQueryOver(x => x.Tags, () => tag)
    .Where(
        Restrictions.Like(Projections.Property(() => img.Name), term, mode) ||
        Restrictions.Like(Projections.Property(() => tag.Id), term, mode))
    .TransformUsing(Transformers.DistinctRootEntity)
    .List();

SQL:

SELECT
    this_.Id as Id1_1_,
    this_.Name as Name1_1_,
    tags3_.image_key as image1_3_,
    tag1_.Id as elt3_,
    tag1_.Id as Id0_0_
FROM
    Image this_
left outer join
    Tags tags3_
        on this_.Id=tags3_.image_key
left outer join
    Tag tag1_
        on tags3_.elt=tag1_.Id
WHERE
    (
        this_.Name like @p0
        or tag1_.Id like @p1
    );
@p0 = '%abc%' [Type: String (4000)], @p1 = '%abc%' [Type: String (4000)]

但是,查询现在比具有子查询的版本稍差.我会进一步研究这个问题.

最佳答案
我敢打赌,这是第二个慢的查询:

var qry = session.QueryOver<Image>()
.Where( Restrictions.On<Image>(x => x.Name).IsLike(term, mode) ||
        Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag))
.List();

您仅为第一个查询提供了SQL.那第二个怎么样?你在SQL Management Studio下测试了吗?使用SQL Server Profiler作为@JoachimIsaksson建议找出NHibernate执行服务器端的查询.

这看起来就像是将97个图像对象加载到内存中.它们各有多大?

编辑

另一个赌注是您的第一个查询执行第二个查询的广告内部查询.尝试在第一个查询上执行.List()以将标记加载到内存中.

编辑2

从查询计划看,您的查询看起来像是Correlated subquery.
您提到这些查询很快:

var qry = session.QueryOver<Image>()
.Where( Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag))
.List();

要么

var qry = session.QueryOver<Image>()
.Where(Restrictions.On<Image>(x => x.Name).IsLike(term, mode))
.List();

只需UNION它们就可以获得与单独运行它们相同的结果.
还要确保所有连接列都有索引.

这就是IS IN(查询)的问题 – 您无法确定数据库是如何执行它的(除非您以某种方式强迫它使用某个计划).也许你可以改变.In()它以某种方式变成JoinQueryOver()?

转载注明原文:查询在客户端应用程序中需要很长时间,但在SQL Server Management Studio中运行速度很快 - 代码日志