在oracle sql中使用regexp_replace规范地址



非常感谢大家的帮助。

我们正在规范客户的地址。

我有一个主客户表,其中包含客户数据,地址字段要标准化。

我有一个映射表,包含from_str到to_str值的映射。

我需要将to_str对应于from_str到地址数据中。

如果标准化后的地址长度超过35个字符,则从字符串右侧到末尾的前一个空格(' ')作为单独的字段创建address2

你可以帮助与sql或pl/sql?使用Oracle 12c数据库

我在下面写的代码并不适用于所有from_str值…

任何帮助都是感激的。

code so far:

with addresses as
(
select cust_id,address addr from
(
select 10 cust_id,'9 Help Street, Level 4' address from dual union all
select 11 cust_id,'22 Victoria Street' address from dual union all
select 12 cust_id,'1495 Franklin Str' address from dual union all
select 13 cust_id,'30 Hasivim St.,Petah-Tikva' address from dual union all
select 14 cust_id,'2 Jakaranda St' address from dual union all
select 15 cust_id,'61, Science Park Rd' address from dual union all
select 16 cust_id,'61, Social park road' address from dual union all
select 17 cust_id,'Av. Hermanos Escobar 5756' address from dual union all
select 18 cust_id,'Ave. Hermanos Escobar 5756' address from dual union all
select 19 cust_id,'8000 W FLORISSANT AVE' address from dual union all
select 20 cust_id,'8600 MEMORIAL PKWY SW' address from dual union all
select 21 cust_id,'8200 FLORISSANTMEMORIALWAYABOVE SW' address from dual union all
select 22 cust_id,'8600 MEMORIALFLORISSANT PKWY SW' address from dual
) t1
),
replacements as
(
select id,to_str,from_string from_str from
(
select 1 id,'St' to_str,'Street' from_string from dual union all
select 2 id,'St' to_str,'St.' from_string from dual union all
select 3 id,'St' to_str,'Str' from_string from dual union all
select 4 id,'St' to_str,'St' from_string from dual union all
select 5 id,'Rd' to_str,'Rd.' from_string from dual union all
select 6 id,'Rd' to_str,'road' from_string from dual union all
select 7 id,'Av' to_str,'Av.' from_string from dual union all
select 8 id,'Av' to_str,'Ave.' from_string from dual union all
select 9 id,'Av' to_str,'Avenue' from_string from dual union all
select 10 id,'Av' to_str,'Aven.' from_string from dual union all
select 11 id,'West' to_str,'W' from_string from dual union all
select 12 id,'South West' to_str,'SW.' from_string from dual
) t2
),
r(addr,test_addr,l) as 
(
select  addr,regexp_replace(addr,'(^|W)' || from_str || '(W|$)','1' || to_str || '2') test_addr,
id - 1
from  
addresses,
replacements
where id = (select count(*) from replacements)
union all
select addr,regexp_replace(addr,'(^|W)' || from_str || '(W|$)','1' || to_str || '2') test_addr,
l - 1
from  r,
replacements
where id = l
)
select addr,test_addr,l
from  r
where l=0
;

预期输出:

cust_id address
10      9 Help St, Level 4
11      22 Victoria St
12      1495 Franklin St
13      30 Hasivim St ,Petah-Tikva
14      2 Jakaranda St
15      61, Science Park Rd
16      61, Social park Rd
17      Av Hermanos Escobar 5756
18      Av Hermanos Escobar 5756
19      8000 West FLORISSANT Ave
20      8600 MEMORIAL PKWY South West

如果地址长度大于35个字符,则预期输出为:

cust_id address                              address2
21      8200 FLORISSANTMEMORIALWAYABOVE      South West
22      8600 MEMORIALFLORISSANT PKWY         South West

这是一个…应用多个替换操作的有趣方式。你有两个问题,正如你提到的。对于第一个,CTE的递归部分在addr上执行regexp_replace(),而不是test_addr(前一个递归步骤的修改输出)。因此,只有列表中的最后一条规则才会被应用。

r(addr,test_addr, l) as 
(
select  addr,regexp_replace(addr,'(^|W)' || from_str || '(W|$)','1' || to_str || '2') test_addr, 
id - 1
from  
addresses,
replacements
where id = (select count(*) from replacements)
union all
-- if you do regexp_replace on addr, it throws out the previous replace (which is in r.test_addr)
select addr,regexp_replace(test_addr,'(^|W)' || from_str || '(W|$)','1' || to_str || '2') test_addr, 
l - 1
from  r,
replacements
where id = l
)

对于"超过35个字符"的字符问题,我建议使用substr/instr -虽然读起来很尴尬,但它们通常很快。

select addr,test_addr,l, 
case when length(test_addr) > 35 then
substr(test_addr, 1, instr(substr(test_addr,1,35), ' ', -1))
else test_addr
end as addr1,
case when length(test_addr) > 35 then
substr(test_addr, instr(substr(test_addr,1,35), ' ', -1))
else null
end as addr2
from  r
where l=0
;

可能有更优雅的方式来做这部分,这只是我想到的第一件事。

最新更新