Raven DB: What is wrong with this multi-map/reduce index?



I have an application that tracks page visits for a website. Here is my model:

public class VisitSession {
    public string SessionId { get; set; }
    public DateTime StartTime { get; set; }
    public string UniqueVisitorId { get; set; }
    public IList<PageVisit> PageVisits { get; set; }
}

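When a visitor arrives at the website, a visit session starts. One visit session has many page visits. The first time a visitor arrives, the tracker writes a UniqueVisitorId (GUID) cookie, so we can tell whether a visitor is a returning one.

Now I want to write a view that shows TotalVisitSessions, TotalPageVisits and TotalUniqueVisitors per day. So I wrote this multi-map/reduce index: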

public class VisitSummaryByDateIndex : AbstractMultiMapIndexCreationTask<VisitSummaryByDate>
{
    public VisitSummaryByDateIndex()
    {
        AddMap<VisitSession>(sessions => from s in sessions
                                            select new VisitSummaryByDate
                                            {
                                                Date = s.StartTime.Date,
                                                TotalVisitSessions = 1,
                                                TotalPageVisits = 0,
                                                TotalNewVisitors = s.IsNewVisit ? 1 : 0,
                                                TotalUniqueVisitors = 0,
                                                UniqueVisitorId = s.UniqueVisitorId
                                            });
        AddMap<PageVisit>(visits => from v in visits
                                    select new VisitSummaryByDate
                                    {
                                        Date = v.VisitTime.Date,
                                        TotalVisitSessions = 0,
                                        TotalPageVisits = 1,
                                        TotalNewVisitors = 0,
                                        TotalUniqueVisitors = 0,
                                        UniqueVisitorId = String.Empty
                                    });
        Reduce = results => from result in results
                            group result by result.Date into g
                            select new VisitSummaryByDate
                            {
                                Date = g.Key,
                                TotalVisitSessions = g.Sum(it => it.TotalVisitSessions),
                                TotalPageVisits = g.Sum(it => it.TotalPageVisits),
                                TotalNewVisitors = g.Sum(it => it.TotalNewVisitors),
                                TotalUniqueVisitors = g.Select(it => it.UniqueVisitorId).Where(it => it.Length > 0).Distinct().Count(),
                                UniqueVisitorId = String.Empty
                            };
    }
}

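The problem is the TotalUniqueVisitors calculation: sometimes the index result shows TotalUniqueVisitors as 1, sometimes as 2. But I checked the data, and it should never be that low. Is there something wrong with my map/reduce syntax?

Related post: Raven DB: How to create "UniqueVisitorCount by date" index

The code, including sample data, can be found here: https://gist.github.com/2702071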

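Reduce is actually run multiple times over the results, and the output of one pass is fed back in as input to the next. Your index assumes that it runs only once and has access to the entire result set.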
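
To make the effect concrete, here is a minimal in-memory sketch in plain LINQ. It does not use RavenDB at all: the Entry class, the batch size and the two-pass RunInBatches helper are all hypothetical stand-ins, and the real engine decides for itself how to batch and re-reduce. It only illustrates that a reduce which counts distinct IDs and then throws them away gives a different answer once its own output is reduced again.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for one map/reduce entry (not a RavenDB type).
public class Entry
{
    public string[] VisitorIds = new string[0];
    public int UniqueCount;
}

public static class ReduceDemo
{
    // Mirrors the question's reduce: counts distinct IDs, then discards them.
    public static Entry LossyReduce(IEnumerable<Entry> batch)
    {
        var ids = batch.SelectMany(e => e.VisitorIds).Distinct().ToArray();
        return new Entry { VisitorIds = new string[0], UniqueCount = ids.Length };
    }

    // Carries the distinct IDs forward so a later pass can still deduplicate.
    public static Entry CarryingReduce(IEnumerable<Entry> batch)
    {
        var ids = batch.SelectMany(e => e.VisitorIds).Distinct().ToArray();
        return new Entry { VisitorIds = ids, UniqueCount = ids.Length };
    }

    // Assumed batching scheme: reduce fixed-size batches, then reduce the batch outputs.
    public static Entry RunInBatches(IList<Entry> mapped, int batchSize,
                                     Func<IEnumerable<Entry>, Entry> reduce)
    {
        var partials = new List<Entry>();
        for (int i = 0; i < mapped.Count; i += batchSize)
            partials.Add(reduce(mapped.Skip(i).Take(batchSize)));
        return reduce(partials);
    }

    public static void Main()
    {
        // Four sessions on one day from three distinct visitors.
        var mapped = new[] { "a", "b", "a", "c" }
            .Select(id => new Entry { VisitorIds = new[] { id } })
            .ToList();

        Console.WriteLine(LossyReduce(mapped).UniqueCount);                     // 3: single pass
        Console.WriteLine(RunInBatches(mapped, 2, LossyReduce).UniqueCount);    // 0: IDs lost before the second pass
        Console.WriteLine(RunInBatches(mapped, 2, CarryingReduce).UniqueCount); // 3: IDs survive the second pass
    }
}
```

The exact wrong number depends on how the entries happen to be batched, which would be consistent with the count varying between runs. The point is only that whatever the reduce needs in order to deduplicate has to be present in its own output.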

The index needs to look like this:

public class VisitSummaryByDateIndex : AbstractMultiMapIndexCreationTask<VisitSummaryByDate>
{
    public VisitSummaryByDateIndex()
    {
        AddMap<VisitSession>(sessions => from s in sessions
                                         select new VisitSummaryByDate
                                         {
                                             Date = s.StartTime.Date,
                                             TotalVisitSessions = 1,
                                             TotalPageVisits = 0,
                                             TotalNewVisitors = s.IsNewVisit ? 1 : 0,
                                             TotalUniqueVisitors = 1,
                                             UniqueVisitorId = new[]{s.UniqueVisitorId}
                                         });
        AddMap<PageVisit>(visits => from v in visits
                                    select new VisitSummaryByDate
                                    {
                                        Date = v.VisitTime.Date,
                                        TotalVisitSessions = 0,
                                        TotalPageVisits = 1,
                                        TotalNewVisitors = 0,
                                        TotalUniqueVisitors = 0,
                                        UniqueVisitorId = new string[0]
                                    });
        Reduce = results => from result in results
                            group result by result.Date into g
                            select new VisitSummaryByDate
                            {
                                Date = g.Key,
                                TotalVisitSessions = g.Sum(it => it.TotalVisitSessions),
                                TotalPageVisits = g.Sum(it => it.TotalPageVisits),
                                TotalNewVisitors = g.Sum(it => it.TotalNewVisitors),
                                TotalUniqueVisitors = g.Sum(it => it.TotalUniqueVisitors),
                                UniqueVisitorId =  g.Select(x=>x.UniqueVisitorId).Distinct()
                             };
    }
}

The correct index is:

public class VisitSummaryByDateIndex : AbstractMultiMapIndexCreationTask<VisitSummaryByDate>
{
    public VisitSummaryByDateIndex()
    {
        AddMap<VisitSession>(sessions => from s in sessions
                                         select new VisitSummaryByDate
                                         {
                                             Date = s.StartTime.Date,
                                             TotalVisitSessions = 1,
                                             TotalPageVisits = 0,
                                             TotalNewVisitors = s.IsNewVisit ? 1 : 0,
                                             TotalUniqueVisitors = 0,
                                             UniqueVisitorId = s.UniqueVisitorId
                                         });
        AddMap<PageVisit>(visits => from v in visits
                                    select new VisitSummaryByDate
                                    {
                                        Date = v.VisitTime.Date,
                                        TotalVisitSessions = 0,
                                        TotalPageVisits = 1,
                                        TotalNewVisitors = 0,
                                        TotalUniqueVisitors = 0,
                                        UniqueVisitorId = string.Empty,
                                    });
        Reduce = results => from result in results
                            group result by result.Date into g
                            select new VisitSummaryByDate
                            {
                                Date = g.Key,
                                TotalVisitSessions = g.Sum(it => it.TotalVisitSessions),
                                TotalPageVisits = g.Sum(it => it.TotalPageVisits),
                                TotalNewVisitors = g.Sum(it => it.TotalNewVisitors),
                                TotalUniqueVisitors = g.Select(it => it.UniqueVisitorId).Where(x => x.Length > 0).Distinct().Count(),
                                UniqueVisitorId = g.FirstOrDefault().UniqueVisitorId,
                            };
    }
}

The difference is that UniqueVisitorId is now set in the reduce instead of being reset to an empty string. I must admit I am not yet 100% sure why this is needed.
