每个产品都有一个LatLng
。随着时间的推移,有时产品可以被复制,但很难发现。价格或图片可能略有不同。
我想比较Products
在每个位置的ProductType
和Price
,并把它们放在地图上,这样更容易找到重复的。
对于每个产品,我计算其价格的5%,以便我可以将其与其他产品价格相加/减去,以找到大致匹配的产品。
Products
ProductID | ProductType | 价格 | 纬度 | 经度 | ABC | 红色小部件 | 500 | 12.34 | 67.89 |
---|---|---|---|---|
DEF | 红色小部件 | 505 | 12.34 | 67.89 |
MNO | 红色小部件 | 480 | 12.34 | 67.89 |
RST | 红色小部件 | 500 | 12.34 | 67.89 |
UVW | 红色小部件 | 300 | 12.34 | 67.89 |
. JKL | 蓝色小部件 | 800 | 76.54 | 32.10 |
蓝色小部件 | 800 | 45.67 | 23.45 |
第一件事:
AND (/*same price*/
b.Price = a.Price
OR
/*b.price is within percentage over a.price*/
(b.Price > a.Price AND b.Price < (a.Price+a.PricePercent))
OR
/*b.price is within percentage under a.price */
(b.Price < a.Price AND b.Price >(a.Price-a.PricePercent))
)
是(或应该是)与以下相同:
AND (
-- price within range +/- percent
b.Price < (a.Price+a.PricePercent)
AND b.Price >(a.Price-a.PricePercent)
)
第二个对我来说是有效的:
SQL小提琴
MS SQL Server 2017 Schema Setup:
查询1:
WITH cte AS
(
SELECT
p.ProductID,
p.ProductType,
p.Price,
(p.Price / 100) * 5 AS PricePercent,
ROUND(p.Latitude, 3) AS Latitude,
ROUND(p.Longitude, 3) AS Longitude
FROM
Products p
WHERE
p.Latitude IS NOT NULL AND p.Longitude IS NOT NULL
)
SELECT
DISTINCT a.ProductID,
a.Price,
a.Latitude, a.Longitude
FROM
cte a
INNER JOIN
/* ProductIDs don't match */
cte b ON a.ProductID <> b.ProductID
/* match only where location is the same*/
AND a.Latitude = b.Latitude
AND a.Longitude = b.Longitude
/* match only where Product Type is the same*/
AND a.ProductType = b.ProductType
/*match only if price is the same, or within 5% above or 5% below price*/
AND (
-- price within range +/- percent
b.Price < (a.Price+a.PricePercent)
AND b.Price >(a.Price-a.PricePercent)
)
结果:
ProductID | 价格 | 纬度 | 经度 | ABC | 500 | 12 | 68 |
---|---|---|---|
DEF | 505 | 12 | 68 |
RST | 500 | 12 | 68 |