I have an application that tracks page visits for a website. Here is my model:
public class VisitSession {
    public string SessionId { get; set; }
    public DateTime StartTime { get; set; }
    public string UniqueVisitorId { get; set; }
    public IList<PageVisit> PageVisits { get; set; }
}
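A visit session starts when a visitor arrives at the website. One visit session has many page visits. The first time a visitor comes to the website, the tracker writes a UniqueVisitorId (GUID) cookie, so we can tell whether a visitor is a returning visitor.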
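As a rough illustration of the cookie logic described above, here is a minimal sketch. It is not the tracker's actual code: the class `VisitorCookieSketch`, the method `Resolve`, and the use of a plain dictionary in place of a real cookie collection are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

public static class VisitorCookieSketch
{
    // Hypothetical cookie name, chosen to match the model property.
    public const string CookieName = "UniqueVisitorId";

    // Returns the visitor ID and whether this is a first-time visitor.
    // `cookies` is a stand-in for the request/response cookie collection.
    public static (string VisitorId, bool IsNewVisit) Resolve(IDictionary<string, string> cookies)
    {
        // A returning visitor already carries the cookie.
        if (cookies.TryGetValue(CookieName, out var existing) && !string.IsNullOrEmpty(existing))
            return (existing, false);

        // A first-time visitor gets a fresh GUID written to the cookie.
        var id = Guid.NewGuid().ToString();
        cookies[CookieName] = id;
        return (id, true);
    }
}
```

Calling `Resolve` twice with the same cookie collection returns the same ID both times, with `IsNewVisit` true only on the first call.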
Now I want to write a view that shows TotalVisitSessions, TotalPageVisits and TotalUniqueVisitors for each day. So I wrote this multi-map/reduce index:
public class VisitSummaryByDateIndex : AbstractMultiMapIndexCreationTask<VisitSummaryByDate>
{
    public VisitSummaryByDateIndex()
    {
        AddMap<VisitSession>(sessions => from s in sessions
            select new VisitSummaryByDate
            {
                Date = s.StartTime.Date,
                TotalVisitSessions = 1,
                TotalPageVisits = 0,
                TotalNewVisitors = s.IsNewVisit ? 1 : 0,
                TotalUniqueVisitors = 0,
                UniqueVisitorId = s.UniqueVisitorId
            });
        AddMap<PageVisit>(visits => from v in visits
            select new VisitSummaryByDate
            {
                Date = v.VisitTime.Date,
                TotalVisitSessions = 0,
                TotalPageVisits = 1,
                TotalNewVisitors = 0,
                TotalUniqueVisitors = 0,
                UniqueVisitorId = String.Empty
            });
        Reduce = results => from result in results
            group result by result.Date into g
            select new VisitSummaryByDate
            {
                Date = g.Key,
                TotalVisitSessions = g.Sum(it => it.TotalVisitSessions),
                TotalPageVisits = g.Sum(it => it.TotalPageVisits),
                TotalNewVisitors = g.Sum(it => it.TotalNewVisitors),
                TotalUniqueVisitors = g.Select(it => it.UniqueVisitorId).Where(it => it.Length > 0).Distinct().Count(),
                UniqueVisitorId = String.Empty
            };
    }
}
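The problem is the "TotalUniqueVisitors" calculation: sometimes the index result has TotalUniqueVisitors equal to 1, sometimes 2. But I checked the data, and it should never be that low. Is there something wrong with my map/reduce syntax?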
Related post: Raven DB: How to create "UniqueVisitorCount by date" index
Code with sample data can be found here: https://gist.github.com/2702071
The reduce is actually run multiple times over the results. Your index assumes that it runs only once and has access to the entire result set.
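To make that concrete, here is a minimal sketch in plain LINQ, with no RavenDB involved. It simulates a store that reduces mapped entries in batches and then reduces the batch outputs again. The types and names (`Entry`, `ReReduceDemo`, `TwoPass`, the sample data) are hypothetical, and the batching is only an assumption about how a multi-pass reduce might be scheduled, not a description of RavenDB internals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for one index entry (not a RavenDB type).
public sealed class Entry
{
    public DateTime Date { get; }
    public int UniqueCount { get; }
    public string[] VisitorIds { get; }

    public Entry(DateTime date, int uniqueCount, string[] visitorIds)
    {
        Date = date;
        UniqueCount = uniqueCount;
        VisitorIds = visitorIds;
    }
}

public static class ReReduceDemo
{
    // Like the question's reduce: counts distinct IDs, then throws the IDs away.
    public static IEnumerable<Entry> LossyReduce(IEnumerable<Entry> entries) =>
        from e in entries
        group e by e.Date into g
        select new Entry(
            g.Key,
            g.SelectMany(x => x.VisitorIds).Distinct().Count(),
            new string[0]);

    // Keeps the distinct IDs in the output so a later pass can merge them again.
    public static IEnumerable<Entry> ReReducibleReduce(IEnumerable<Entry> entries) =>
        from e in entries
        group e by e.Date into g
        let ids = g.SelectMany(x => x.VisitorIds).Distinct().ToArray()
        select new Entry(g.Key, ids.Length, ids);

    // Reduces each batch separately, then reduces the batch outputs once more.
    public static Entry TwoPass(
        Func<IEnumerable<Entry>, IEnumerable<Entry>> reduce,
        IEnumerable<Entry> mapped,
        int batchSize)
    {
        var partials = mapped
            .Select((entry, index) => new { entry, index })
            .GroupBy(x => x.index / batchSize, x => x.entry)
            .SelectMany(batch => reduce(batch));
        return reduce(partials).Single();
    }

    // Four sessions on one day from three distinct visitors: a, b, a, c.
    public static Entry[] SampleMapped()
    {
        var day = new DateTime(2012, 5, 15);
        return new[] { "a", "b", "a", "c" }
            .Select(id => new Entry(day, 0, new[] { id }))
            .ToArray();
    }
}
```

With the sample data split into batches of two, `TwoPass(LossyReduce, ...)` ends up with a `UniqueCount` of 0, because the second pass only sees entries whose IDs were already discarded. `TwoPass(ReReducibleReduce, ...)` gives 3, since each pass hands its distinct IDs on to the next one.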
The index needs to look like this:
public class VisitSummaryByDateIndex : AbstractMultiMapIndexCreationTask<VisitSummaryByDate>
{
    public VisitSummaryByDateIndex()
    {
        AddMap<VisitSession>(sessions => from s in sessions
            select new VisitSummaryByDate
            {
                Date = s.StartTime.Date,
                TotalVisitSessions = 1,
                TotalPageVisits = 0,
                TotalNewVisitors = s.IsNewVisit ? 1 : 0,
                TotalUniqueVisitors = 1,
                UniqueVisitorId = new[]{s.UniqueVisitorId}
            });
        AddMap<PageVisit>(visits => from v in visits
            select new VisitSummaryByDate
            {
                Date = v.VisitTime.Date,
                TotalVisitSessions = 0,
                TotalPageVisits = 1,
                TotalNewVisitors = 0,
                TotalUniqueVisitors = 0,
                UniqueVisitorId = new string[0]
            });
        Reduce = results => from result in results
            group result by result.Date into g
            select new VisitSummaryByDate
            {
                Date = g.Key,
                TotalVisitSessions = g.Sum(it => it.TotalVisitSessions),
                TotalPageVisits = g.Sum(it => it.TotalPageVisits),
                TotalNewVisitors = g.Sum(it => it.TotalNewVisitors),
                TotalUniqueVisitors = g.Sum(it => it.TotalUniqueVisitors),
                UniqueVisitorId = g.Select(x=>x.UniqueVisitorId).Distinct()
            };
    }
}
The correct index is:
public class VisitSummaryByDateIndex : AbstractMultiMapIndexCreationTask<VisitSummaryByDate>
{
    public VisitSummaryByDateIndex()
    {
        AddMap<VisitSession>(sessions => from s in sessions
            select new VisitSummaryByDate
            {
                Date = s.StartTime.Date,
                TotalVisitSessions = 1,
                TotalPageVisits = 0,
                TotalNewVisitors = s.IsNewVisit ? 1 : 0,
                TotalUniqueVisitors = 0,
                UniqueVisitorId = s.UniqueVisitorId
            });
        AddMap<PageVisit>(visits => from v in visits
            select new VisitSummaryByDate
            {
                Date = v.VisitTime.Date,
                TotalVisitSessions = 0,
                TotalPageVisits = 1,
                TotalNewVisitors = 0,
                TotalUniqueVisitors = 0,
                UniqueVisitorId = string.Empty,
            });
        Reduce = results => from result in results
            group result by result.Date into g
            select new VisitSummaryByDate
            {
                Date = g.Key,
                TotalVisitSessions = g.Sum(it => it.TotalVisitSessions),
                TotalPageVisits = g.Sum(it => it.TotalPageVisits),
                TotalNewVisitors = g.Sum(it => it.TotalNewVisitors),
                TotalUniqueVisitors = g.Select(it => it.UniqueVisitorId).Where(x => x.Length > 0).Distinct().Count(),
                UniqueVisitorId = g.FirstOrDefault().UniqueVisitorId,
            };
    }
}
The difference is that UniqueVisitorId is set in the reduce. I must admit I'm still not 100% sure why this is necessary.