最多可连续两年



如果我有这样的字符串:

"我从1998年开始,到2012年左右结束;

我想提取";年;字符串的一部分(年份之间的字符串可以是任何长度的任何字符串(,然后取最近的一年(因此是最长的一年(,我试图用这个来识别年份:

SUBSTRING(YourValue, NULLIF(PATINDEX('%[0-9][0-9][0-9][0-9]%', YourValue), 0), 4)

但我不知道如何获得多年,然后进行比较。

此外,最近的一年不一定是字符串的末尾:

"我在2013年毕业,但从1997年开始;

我使用的是SQL Server 2012(v11.0.7507(

DECLARE @sometext VARCHAR(max) = 'I started in 1998 and finished around 2012';
select @sometext;
select TOP 1 * from string_split(@sometext,' ') where value between '1000' and '9999' order by value desc;

DBFIDDLE

或者,也许:

select TOP 1 *  
from string_split(replace(replace(@sometext,'.',' '),',',' '),' ')
where value between '1000' and '9999' 
order by value desc;

当你的句子以"."结尾时。

这并不漂亮,但由于你无法判断日期,这肯定是。

CTE搜索日期,否则返回NULL,即没有日期或只有一个日期。

下面的选择检查最低日期,因为它必须是开始日期,还捕获CTE的一个或两个值是否为NULL

CREATE TABLE t1 (id int, YourValue varchar(299))
INSERT INTO t1 VALUEs(1,'I started in 1998 and finished around 2012'),(2,'I was done in 2013 but started in 1997.')
,(2,'I started in 2022.')
WITH CTE AS 
(SELECT CASE WHEN PATINDEX('%[0-9][0-9][0-9][0-9]%', YourValue) > 0 THEN
SUBSTRING(YourValue, NULLIF(PATINDEX('%[0-9][0-9][0-9][0-9]%', YourValue), 0), 4) ELSE NULL END
val1 ,
CASE WHEN PATINDEX('%[0-9][0-9][0-9][0-9]%', RIGHT(YourValue, LEn(YourValue) - PATINDEX('%[0-9][0-9][0-9][0-9]%', YourValue) - 4)) > 0 
THEN SUBSTRING(RIGHT(YourValue, LEn(YourValue) - PATINDEX('%[0-9][0-9][0-9][0-9]%', YourValue) - 4) ,
PATINDEX('%[0-9][0-9][0-9][0-9]%', RIGHT(YourValue, LEn(YourValue) - PATINDEX('%[0-9][0-9][0-9][0-9]%', YourValue) - 4)),4) 
ELSE NULL END as val2
FROM t1)
SELECT CASE WHEN val2 IS NULL THEn val1 
ELSE
CASE WHeN val1 <= Val2 THEN val1 ELSE val2 END END as [STartdate]
,CASE WHEN val2 IS NULL THEn NULL
ELSE CASe WHEN val1 > Val2 THEN val1 ELSE val2 END END as [ENDdate] FROM CTE
STartdate|ENDdate:--------|:------1998 | 20121997 | 20132022 |null

db<gt;小提琴这里

请尝试以下解决方案。

它使用XML和XQuery。

它将从SQL Server 2012开始工作。

注意事项:

  • CROSS APPLY正在将输入标记列转换为XML。它完成了用于句子中每个单词的标记化
  • XQuery.query()方法使用FLWOR表达式遍历所有令牌,在检查YEAR数据类型的过程中(xs:gYear用XML表示(,以及获得最大年份

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, tokens varchar(MAX));
INSERT INTO @tbl (tokens) VALUES
('I started in 1998 and finished around 2012'),
('I was done in 2013 but started in 1997.'),
('I started in 2022.');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = SPACE(1)
, @dot CHAR(1) = '.';
SELECT t.*
, c.query('
for $x in max(/root/r[xs:gYear(./text()[1]) instance of xs:gYear]/text())
return $x
').value('.','CHAR(4)') AS Result
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' + 
REPLACE(REPLACE(tokens,@dot, ''), @separator, ']]></r><r><![CDATA[') + 
']]></r></root>' AS XML)) AS t1(c);

输出

+----+--------------------------------------------+--------+
| id |                   tokens                   | Result |
+----+--------------------------------------------+--------+
|  1 | I started in 1998 and finished around 2012 |   2012 |
|  2 | I was done in 2013 but started in 1997.    |   2013 |
|  3 | I started in 2022.                         |   2022 |
+----+--------------------------------------------+--------+

最新更新