Jaccard score/distance or overlap percentage



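I want to be able to calculate the Jaccard score/distance (the distance is 1 - score) of a rectangle against a grid of rectangles. My grid is 50x50 (1625625 rectangles in total).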
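For reference, the score and distance for two rectangles can be written as below. The symbols A, B, J and d_J are introduced here only for illustration, and |·| denotes area.

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
        = \frac{|A \cap B|}{|A| + |B| - |A \cap B|},
\qquad
d_J(A, B) = 1 - J(A, B)
```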

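I can compute the score of my input rectangle against all of them in 0.34 seconds, but that is not fast enough, because I either need to be able to process 10k rectangles or store the results in the DB (updating 10 of the many thousands of rows on each call). So I would like the DB to do the calculation for me without having to pull anything out of it; however, I cannot think of how to do this without a cursor...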

sourceRectangles contains my single rectangle (although in reality there will be 10k), rectangles contains my grid, and temporaryRectangleList contains the summed scores.

Dictionary<UInt32, Rectangle> temporaryRectangleList = new Dictionary<UInt32, Rectangle>();
foreach (var sourceRectangle in sourceRectangles)
{
    foreach (var rectangle in rectangles)
    {
        // Bounds of the candidate overlap along X and Y
        int max_MinX = Math.Max(sourceRectangle.MinX, rectangle.MinX);
        int min_MaxX = Math.Min(sourceRectangle.MaxX, rectangle.MaxX);
        int max_MinY = Math.Max(sourceRectangle.MinY, rectangle.MinY);
        int min_MaxY = Math.Min(sourceRectangle.MaxY, rectangle.MaxY);

        // Calculate the area of the overlap. Each side is clamped to zero so that
        // rectangles that do not overlap get an area of 0 instead of a bogus
        // product of negative lengths.
        int area = Math.Max(0, min_MaxX - max_MinX) * Math.Max(0, min_MaxY - max_MinY);
        // Store the Jaccard score
        var score = (double) area / ((sourceRectangle.Area + rectangle.Area) - area);
        if (temporaryRectangleList.ContainsKey(rectangle.ID))
        {
            temporaryRectangleList[rectangle.ID].Weight += score;
        }
        else
        {
            temporaryRectangleList.Add(rectangle.ID, new Rectangle(rectangle, score));
        }
    }
}

I need to be able to look items up in the dictionary, because I need to pull data out of it by the rectangle's ID.

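If you think you can speed up the C# (10k rectangles processed in < 1s), then go for it, but 0.34s per rectangle is the best I have managed, so I am looking for an SQL equivalent of this code (preferably a better one... haha).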

Unfortunately, the SQL table is too large to dump here, so I can only give you its structure:

USE [Rectangles]
GO
/****** Object:  Table [dbo].[PreProcessed]    Script Date: 14/01/2014 16:39:33 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[PreProcessed](
    [ID] [int] NOT NULL,
    [MinX] [int] NOT NULL,
    [MinY] [int] NOT NULL,
    [MaxX] [int] NOT NULL,
    [MaxY] [int] NOT NULL,
    [Area] [int] NOT NULL,
    CONSTRAINT [PK_PreProcessed] PRIMARY KEY CLUSTERED
    (
        [ID] ASC,
        [Area] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

The Rectangle class:

public class Rectangle
{
    public Rectangle(UInt32 id, int minX, int maxX, int minY, int maxY, double weight)
    {
        ID = id;
        MinX = minX;
        MaxX = maxX;
        MinY = minY;
        MaxY = maxY;
        Area = (maxX - minX)*(maxY - minY);
        Weight = weight;
    }
    public Rectangle(Rectangle input, double weight)
    {
        ID = input.ID;
        MinX = input.MinX;
        MaxX = input.MaxX;
        MinY = input.MinY;
        MaxY = input.MaxY;
        Area = input.Area;
        Weight = weight;
    }
    public int Area { get; set; }
    public int MinX { get; set; }
    public int MaxX { get; set; }
    public int MinY { get; set; }
    public int MaxY { get; set; }
    public UInt32 ID { get; set; }
    public double Weight { get; set; }
}

SQL Server has a geometry data type. It provides methods for computing the intersection and union of polygons.
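As a rough sketch of that idea, the T-SQL below builds two illustrative rectangles as polygons and divides the area of their intersection by the area of their union. The coordinates and variable names are made up for the example; STGeomFromText, STIntersection, STUnion and STArea are methods of the geometry type.

```sql
-- Two illustrative axis-aligned rectangles as closed polygons (SRID 0).
DECLARE @a geometry = geometry::STGeomFromText('POLYGON((0 0, 4 0, 4 4, 0 4, 0 0))', 0);
DECLARE @b geometry = geometry::STGeomFromText('POLYGON((2 2, 6 2, 6 6, 2 6, 2 2))', 0);

-- Jaccard score = area of intersection / area of union.
-- Here the intersection has area 4 and the union has area 28, so about 0.1429.
SELECT @a.STIntersection(@b).STArea() / @a.STUnion(@b).STArea() AS JaccardScore;
```

Because the rectangles in the question are axis-aligned, the same score could probably also be computed with plain integer arithmetic in one set-based statement, with no cursor and no geometry column. The following is only a sketch against the dbo.PreProcessed table from the question; the @MinX, @MinY, @MaxX and @MaxY variables are assumptions standing in for a single input rectangle.

```sql
-- Hypothetical input rectangle; in practice these would be parameters.
DECLARE @MinX int = 2, @MinY int = 2, @MaxX int = 6, @MaxY int = 6;
DECLARE @Area int = (@MaxX - @MinX) * (@MaxY - @MinY);

SELECT p.ID,
       CAST(o.OverlapArea AS float)
           / (@Area + p.Area - o.OverlapArea) AS JaccardScore
FROM dbo.PreProcessed AS p
CROSS APPLY (
    -- Overlap width times overlap height: min of the max edges minus max of the min edges.
    SELECT (CASE WHEN p.MaxX < @MaxX THEN p.MaxX ELSE @MaxX END
          - CASE WHEN p.MinX > @MinX THEN p.MinX ELSE @MinX END)
         * (CASE WHEN p.MaxY < @MaxY THEN p.MaxY ELSE @MaxY END
          - CASE WHEN p.MinY > @MinY THEN p.MinY ELSE @MinY END) AS OverlapArea
) AS o
-- Keep only grid rectangles that overlap the input in both X and Y.
WHERE p.MinX < @MaxX AND p.MaxX > @MinX
  AND p.MinY < @MaxY AND p.MaxY > @MinY;
```

Grid rectangles that do not overlap the input are filtered out by the WHERE clause, which corresponds to a score of 0 for them. Whether this beats the C# loop would need to be measured on the real table.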
