OPENJSON没有检测到重复的值



我需要将一些销售记录导入并解析到数据库中。
有些客户有不止一种产品,因此在JSONArray中有几个产品不同但收货地址相同的订单。

当我试图将地址保存在Addresses表中时出现问题:
显然,我只需要每个地址的一条记录,为了实现这一点,我计算了地址字段的Hash,并将其与表中已经存在的哈希进行比较:

看起来(好吧,我敢肯定),查询检查只有第一次,如果Hash已经存在,如果它不存在,它添加尽可能多的行作为订单计数:

我可以想象这是OPENJSON函数的标准行为,因为JSON有效载荷的解析似乎是由内循环进行的,因此我认为我必须使用不同的方法....但是我不知道该用什么。

这里是JSON有效载荷

declare @json nvarchar(max)=N'[
{
"id": 21660,
"currency": "USD",
"total": "15.00",
"shipping": {
"first_name": "Charles",
"last_name": "Leuschke",
"address_1": "3121 W Olive Ave",
"city": "Burbank",
"state": "CA",
"postcode": "91505",
"country": "US"
},
"line_items": [
{
"id": 1052
}
]
},
{
"id": 21659,
"currency": "USD",
"total": "38.00",
"shipping": {
"first_name": "Charles",
"last_name": "Leuschke",
"address_1": "3121 W Olive Ave",
"city": "Burbank",
"state": "CA",
"postcode": "91505",
"country": "US"
},
"line_items": [
{
"id": 1050
}
]
},
{
"id": 21658,
"currency": "USD",
"total": "38.00",
"shipping": {
"first_name": "Charles",
"last_name": "Leuschke",
"address_1": "3121 W Olive Ave",
"city": "Burbank",
"state": "CA",
"postcode": "91505",
"country": "US"
},
"line_items": [
{
"id": 1048
}
]
}
]'

和(简化的)query

Insert Into @Addresses
(
orderId,
fullName,
addressLine1,
city,
stateOrProvince,
postalCode,
countryCode,
addressCode
)
SELECT
o.orderId,
concat(s.firstName,' ',s.lastName),
s.addressLine1,
s.city,
s.stateOrProvince,
s.postalCode,
s.countryCode,
convert(nvarchar(64),hashbytes('SHA1',concat(s.firstName, ' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2)
FROM OPENJSON(@json) 
WITH  (
orderId                     nvarchar(64)    '$.id',
shipping                    nvarchar(max)   '$.shipping' AS JSON
) o  
CROSS APPLY OPENJSON(shipping) 
WITH  (
firstName                   nvarchar(128)   '$.first_name',
lastName                    nvarchar(128)   '$.last_name',
addressLine1                nvarchar(128)   '$.address_1',
city                        nvarchar(128)   '$.city',
stateOrProvince             nvarchar(64)    '$.state',
postalCode                  nvarchar(64)    '$.postcode',
countryCode                 nvarchar(4)     '$.country'
) s
left join @Addresses a on a.addressCode=convert(nvarchar(64),hashbytes('SHA1',concat(s.firstName,' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2)
where a.addressCode is null

我还准备了一个sqlfiddle,您可以看到它返回3行,而目标是只获得一个

https://dbfiddle.uk/?rdbms=sqlserver_2019&小提琴= 3031 a99e3cd24f6bf383c0de29cf19a1

请使用标准的ROW_NUMBER()重复数据删除方法,例如


WITH cte AS (
SELECT
o.orderId,
concat(s.firstName,' ',s.lastName) fullName,
s.addressLine1,
s.city,
s.stateOrProvince,
s.postalCode,
s.countryCode,
h.addressCode,
ROW_NUMBER() OVER (PARTITION BY h.addressCode ORDER BY o.OrderId ASC) rn
FROM OPENJSON(@json) 
WITH  (
orderId nvarchar(64) '$.id',
shipping nvarchar(max) '$.shipping' AS JSON
) o  
CROSS APPLY OPENJSON(shipping) 
WITH  (
firstName nvarchar(128) '$.first_name',
lastName nvarchar(128) '$.last_name',
addressLine1 nvarchar(128) '$.address_1',
city nvarchar(128) '$.city',
stateOrProvince nvarchar(64) '$.state',
postalCode nvarchar(64) '$.postcode',
countryCode nvarchar(4) '$.country'
) s
CROSS APPLY (
VALUES (CONVERT(nvarchar(64), HASHBYTES('SHA1', CONCAT(s.firstName ,' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2))
) h (addressCode)
LEFT JOIN @Addresses a ON a.addressCode = h.AddressCode
WHERE a.addressCode IS NULL
)
INSERT INTO @Addresses
(
orderId,
fullName,
addressLine1,
city,
stateOrProvince,
postalCode,
countryCode,
addressCode
)
SELECT orderId,
fullName,
addressLine1,
city,
stateOrProvince,
postalCode,
countryCode,
addressCode
FROM cte
WHERE rn = 1;

注意:如果您使用CROSS APPLY来计算哈希码,可以避免多次计算。

最新更新