SQL根据数字范围选择的行，这是长文本列的一部分

我在Oracle SQL DB中具有以下数据：

create table test_tbl(userid varchar2(10) not null, username varchar2(50), employment_data varchar2(500));
insert into test_tbl(userid, username, employment_data) values
(1, 'jsmith', 'Microsoft-Programmer-2000-April 15,Google-Tester-2002-March 3,Apple-Manager-2008-October 29');
insert into test_tbl(userid, username, employment_data) values
(2, 'mblack', 'Facebook-Tester-2002-May 12');
insert into test_tbl(userid, username, employment_data) values
(3, 'fnlee', 'Apple-Builder-1998-June 2,Google-Tester-2004-July 4');
insert into test_tbl(userid, username, employment_data) values
(4, 'jstone', 'Microsoft-Manager-2002-July 1');
insert into test_tbl(userid, username, employment_data) values
(5, 'msfeng', 'Microsoft-Tester-1998-May 12,Facebook-Programmer-2001-January 23,Microsoft-Manager-2005-May 21');
insert into test_tbl(userid, username, employment_data) values
(6, 'jdoe', null);

我可以将连字符和逗号更改为其他分离器，但是applyment_data列的内容始终需要是[逗号/其他一些定界线]分离的[连字符/其他一些定界符]分离的列表，其中包含公司，位置，雇用年，雇用月份。

我想让所有在给定的年份中雇用在一家公司工作的人的用户群。示例输入和所需输出：

输入：Facebook，1995，2001 --->输出：5
输入：Facebook，2000，2004 --->输出：2，5
输入：Microsoft，1996，2009 --->输出：1，4，5
输入：Microsoft，2001，2003 --->输出：4

我尝试使用XMLTABLE，但不知道如何处理空间。

这次我还尝试了REGEXP_SUBSTR，并取得了更大的成功，但是我仍然无法完成我想要的方式。另外，这要求我使用WITH子句，这是一个问题，这是一个问题，因为查询在代码中的构建方式，其解释是一个长篇小说，但您必须对此表示敬意;(p>谢谢

这个复杂的任务应分为较小的步骤。
解除步骤的最佳方法是使用WITH子句构建查询。

当您了解这些步骤时，您可以根据您的要求轻松将查询转换为嵌套的步骤。

with commas as (
  select t.*, regexp_substr(t.employment_data, '(^|,)([^,]+)',1,level,'',2) job
  from test_tbl t
  connect by prior dbms_random.value() is not null 
      and prior userid=userid 
      and regexp_substr(t.employment_data, '(^|,)([^,]+)',1,level,'',2) is not null
)
, hyphens as (
  select t.*
    , regexp_substr(t.job, '^s*(.+?)s*-s*(.+?)s*-(d{4})-(.+)$', 1, 1, '', 1) company
    , regexp_substr(job, '^s*(.+?)s*-s*(.+?)s*-(d{4})-(.+)$', 1, 1, '', 2) title
    , to_number(regexp_substr(job, '^s*(.+?)s*-s*(.+?)s*-(d{4})-(.+)$', 1, 1, '', 3), '9999') year
    , regexp_substr(job, '^s*(.+?)s*-s*(.+?)s*-(d{4})-(.+)$', 1, 1, '', 4) day_month
  from commas t
)
, filters as (
  select 1 id, 'Facebook' company, 1995 start_year, 2001 end_year from dual union all
  select 2, 'Facebook', 2000, 2004 from dual union all
  select 3, 'Microsoft', 1996, 2009 from dual union all
  select 4, 'Microsoft', 2001, 2003 from dual 
)
, dst as (
  select distinct f.company,f.start_year,f.end_year,t.userid
  from hyphens t join filters f 
    on t.company=f.company and t.year >=f.start_year and t.year <=f.end_year
)
select t.company, t.start_year, t.end_year, listagg(t.userid, ',') within group(order by t.userid) userids
from dst t 
group by t.company,t.start_year,t.end_year
order by t.company,t.start_year,t.end_year

解释：

首先，我们将employment_data字段分开，并产生commas CTE。为此，我们使用层次查询和regexp_substr。请注意，我们使用虚拟prior dbms_random.value()检查以防止ORA-01436'通过'循环错误连接。结果放在job字段中。

然后，我们从job提取所需的字段，在连字符上分裂并产生hyphens CTE。我们使用regexp_substr，它很简单。我们使用s*来照顾多余的空格。

之后，我们定义过滤器。他们本来可以放在查询的顶部，没关系。

这一刻，我们拥有hyphens CTE，该CCE持有company，year，userid进行过滤。接下来，我们将这些字段与过滤器匹配并产生dst CTE。

和在最后一个选择中，我们使用ListAgg来连接唯一的匹配ID和输出结果userids字段。

结果是：

COMPANY    START_YEAR  END_YEAR  USERIDS  
---------- ----------- --------- -------- 
Facebook   1995        2001      5        
Facebook   2000        2004      2,5      
Microsoft  1996        2009      1,4,5    
Microsoft  2001        2003      4

您想要的是正则是在这里查看

REGEX_SUBSTR将要大得多，您需要尽可能缩小可能性，因此可以使用Regex_like，表达式可能会出现类似的内容：

^(company_name_you_searchfor)-[A-Za-z]*-(2000|2001|2002|2003|2004).*

另一个选择是您使用Regexp_instr查询两个Regex。公司的第一个年度第二年

最后我可以想到仪器功能

相关内容

最新更新

热门标签：