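I've been working on this for a few hours and none of the solutions here have really helped me. I have a text file in the format "NN:NN string goes here"; the actual file is below. I need a regex that separates each chapter:verse from the verse string itself. As you can see, not all verses are separated by newlines. The closest I've gotten is (\d{1,2}:\d{1,2})[^\d]*
but it only really separates out the NN:NN.
How can I complete the string separation?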
1:1 The book of the generation of Jesus Christ, the son of David, the son of Abraham.
1:2 Abraham begat Isaac; and Isaac begat Jacob; and Jacob begat Judas and his brethren; 1:3 And Judas begat Phares and Zara of Thamar; and Phares begat Esrom; and Esrom begat Aram; 1:4 And Aram begat Aminadab; and Aminadab begat Naasson; and Naasson begat Salmon; 1:5 And Salmon begat Booz of Rachab; and Booz begat Obed of Ruth; and Obed begat Jesse; 1:6 And Jesse begat David the king; and David the king begat Solomon of her that had been the wife of Urias; 1:7 And Solomon begat Roboam; and Roboam begat Abia; and Abia begat Asa; 1:8 And Asa begat Josaphat; and Josaphat begat Joram; and Joram begat Ozias; 1:9 And Ozias begat Joatham; and Joatham begat Achaz; and Achaz begat Ezekias; 1:10 And Ezekias begat Manasses; and Manasses begat Amon; and Amon begat Josias; 1:11
And Josias begat Jechonias and his brethren, about the time they were carried away to Babylon: 1:12 And after they were brought to Babylon, Jechonias begat Salathiel; and Salathiel begat Zorobabel; 1:13 And Zorobabel begat Abiud; and Abiud begat Eliakim; and Eliakim begat Azor; 1:14 And Azor begat Sadoc; and Sadoc begat Achim; and Achim begat Eliud; 1:15 And Eliud begat Eleazar; and Eleazar begat Matthan; and Matthan begat Jacob; 1:16 And Jacob begat Joseph the husband of Mary, of whom was born Jesus, who is called Christ.
1:17 So all the generations from Abraham to David are fourteen generations; and from David until the carrying away into Babylon are fourteen generations; and from the carrying away into Babylon unto Christ are fourteen generations.
1:18 Now the birth of Jesus Christ was on this wise: When as his mother Mary was espoused to Joseph, before they came together, she was found with child of the Holy Ghost.
You're close. The following should work:
preg_match_all("/(\d{1,2}:\d{1,2})([^\d]*)/", $str, $output_array);
print_r(array_combine($output_array[1], $output_array[2]));
http://sandbox.onlinephpfunctions.com/code/e5522443d16558890431519ec6dd03a308ca1e32
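A minimal sketch of this approach, run on a shortened excerpt of the question's sample (verses 1:1 and 1:10 to 1:12, chosen here for brevity) and assuming the text is already in `$str`. Because `[^\d]*` also captures the spaces and newlines around each verse, the sketch passes the captured text through `trim()`. Note that `[^\d]*` stops at the first digit, so a verse that itself contains a number would be cut short.

```php
<?php
// Shortened excerpt of the question's sample: verses 1:1 and 1:10-1:12.
$str = "1:1 The book of the generation of Jesus Christ, the son of David, the son of Abraham.\n"
     . "1:10 And Ezekias begat Manasses; and Manasses begat Amon; and Amon begat Josias; 1:11\n"
     . "And Josias begat Jechonias and his brethren, about the time they were carried away to Babylon: "
     . "1:12 And after they were brought to Babylon, Jechonias begat Salathiel; and Salathiel begat Zorobabel;";

// Group 1: chapter:verse. Group 2: every following non-digit character.
preg_match_all('/(\d{1,2}:\d{1,2})([^\d]*)/', $str, $output_array);

// [^\d]* also captures the spaces/newlines around each verse, so trim them off.
$verses = array_combine($output_array[1], array_map('trim', $output_array[2]));
print_r($verses);
```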
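Regex: (\d+:\d+)\R?\s*(.+?(?=\s*\d+:\d+|$))
Details:
- \d matches a digit (equivalent to [0-9])
- \R matches any Unicode newline sequence
- \s matches any whitespace character
- .+? matches any character except line terminators, as few times as possible
- $ asserts position at the end of the string (with the m modifier used below, also at the end of each line)
- ? matches between zero and one times
- | alternation ("or")
- + matches between one and unlimited times
- * matches between zero and unlimited times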
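To illustrate two of the tokens above, a small sketch on shortened pieces of the sample (the Windows `\r\n` line ending is an assumption about how the file might have been saved): `\R` consumes `\r\n` as a single newline, and the lazy `.+?` is what stops each capture at the next verse number instead of running to the end of the string.

```php
<?php
// \R matches any newline sequence, so a Windows "\r\n" counts as one newline.
$crlf = "1:11\r\nAnd Josias begat Jechonias";
preg_match('/^(\d+:\d+)\R/', $crlf, $m);
echo strlen($m[0]), "\n"; // 6: "1:11" plus the two characters "\r\n"

// Shortened sample line with two verses on it.
$line = '1:2 Abraham begat Isaac; 1:3 And Judas begat Phares';

// Lazy .+? stops at the first position where the lookahead sees the next "nn:nn".
preg_match('/\d+:\d+\s*(.+?)(?=\s*\d+:\d+|$)/', $line, $lazy);
echo $lazy[1], "\n"; // Abraham begat Isaac;

// Greedy .+ runs to the end of the string, where $ satisfies the lookahead,
// so verse 1:3 ends up inside the captured text.
preg_match('/\d+:\d+\s*(.+)(?=\s*\d+:\d+|$)/', $line, $greedy);
echo $greedy[1], "\n"; // Abraham begat Isaac; 1:3 And Judas begat Phares
```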
PHP code:
$text = "1:1 The book of the generation of Jesus Christ, the son of David, the son of Abraham............";
preg_match_all("/(\d+:\d+)\R?\s*(.+?(?=\s*\d+:\d+|$))/m", $text, $matches);
print_r(array_combine($matches[1], $matches[2]));
Output:
Array
(
[1:1] => The book of the generation of Jesus Christ, the son of David, the son of Abraham.
[1:2] => Abraham begat Isaac; and Isaac begat Jacob; and Jacob begat Judas and his brethren;
[1:3] => And Judas begat Phares and Zara of Thamar; and Phares begat Esrom; and Esrom begat Aram;
[1:4] => And Aram begat Aminadab; and Aminadab begat Naasson; and Naasson begat Salmon;
[1:5] => And Salmon begat Booz of Rachab; and Booz begat Obed of Ruth; and Obed begat Jesse;
[1:6] => And Jesse begat David the king; and David the king begat Solomon of her that had been the wife of Urias;
[1:7] => And Solomon begat Roboam; and Roboam begat Abia; and Abia begat Asa;
[1:8] => And Asa begat Josaphat; and Josaphat begat Joram; and Joram begat Ozias;
[1:9] => And Ozias begat Joatham; and Joatham begat Achaz; and Achaz begat Ezekias;
[1:10] => And Ezekias begat Manasses; and Manasses begat Amon; and Amon begat Josias;
[1:11] => And Josias begat Jechonias and his brethren, about the time they were carried away to Babylon:
[1:12] => And after they were brought to Babylon, Jechonias begat Salathiel; and Salathiel begat Zorobabel;
[1:13] => And Zorobabel begat Abiud; and Abiud begat Eliakim; and Eliakim begat Azor;
[1:14] => And Azor begat Sadoc; and Sadoc begat Achim; and Achim begat Eliud;
[1:15] => And Eliud begat Eleazar; and Eleazar begat Matthan; and Matthan begat Jacob;
[1:16] => And Jacob begat Joseph the husband of Mary, of whom was born Jesus, who is called Christ.
[1:17] => So all the generations from Abraham to David are fourteen generations; and from David until the carrying away into Babylon are fourteen generations; and from the carrying away into Babylon unto Christ are fourteen generations.
[1:18] => Now the birth of Jesus Christ was on this wise: When as his mother Mary was espoused to Joseph, before they came together, she was found with child of the Holy Ghost.
)
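Since `$text` in the snippet above is abbreviated, here is a self-contained run of the same pattern on a shortened excerpt of the sample (verses 1:1 and 1:10 to 1:12), which still includes the line break after `1:11`. The excerpt is an assumption chosen for brevity, not the full file.

```php
<?php
// Shortened excerpt of the question's sample: verses 1:1 and 1:10-1:12.
$text = "1:1 The book of the generation of Jesus Christ, the son of David, the son of Abraham.\n"
      . "1:10 And Ezekias begat Manasses; and Manasses begat Amon; and Amon begat Josias; 1:11\n"
      . "And Josias begat Jechonias and his brethren, about the time they were carried away to Babylon: "
      . "1:12 And after they were brought to Babylon, Jechonias begat Salathiel; and Salathiel begat Zorobabel;";

// Group 1: verse number; \R? skips a newline right after it (as after "1:11").
// Group 2: lazily grab text until the next "nn:nn" (or the end of a line).
preg_match_all('/(\d+:\d+)\R?\s*(.+?(?=\s*\d+:\d+|$))/m', $text, $matches);

$verses = array_combine($matches[1], $matches[2]);
print_r($verses);
```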
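This is not only quite fast, it also trims all leading/trailing whitespace from the text values. All of the text lines end with :, ;, or ., and I'm leveraging that fact to make the pattern more efficient.
If, in your real project, some sentences contain newlines (your sample doesn't), add s after the second pattern delimiter so that . also matches newlines.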
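A minimal sketch of that caveat, using this answer's pattern on a hypothetical input in which verse 1:1 is wrapped onto two lines (the sample itself never does this). Without `s`, `.` cannot cross the newline inside the verse, so verse 1:1 is silently skipped; appending `s` after the closing delimiter picks it up, with the newline kept inside the text.

```php
<?php
// Hypothetical input: verse 1:1 from the sample, artificially wrapped onto two lines.
$text = "1:1 The book of the generation of Jesus Christ,\nthe son of David, the son of Abraham.\n"
      . "1:2 Abraham begat Isaac;";

$pattern = '/(\d{1,2}:\d{1,2})\s+(.*?[:;.](?=\s*(?:\d{1,2}:\d{1,2})|$))/';

// Without "s": "." stops at the newline inside verse 1:1, so only 1:2 matches.
$without = preg_match_all($pattern, $text, $out) ? array_combine($out[1], $out[2]) : [];

// With "s" appended after the closing delimiter, "." also matches newlines.
$with = preg_match_all($pattern . 's', $text, $out) ? array_combine($out[1], $out[2]) : [];

var_export($without);
var_export($with);
```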
~(\d{1,2}:\d{1,2})\s+(.*?[:;.](?=\s*(?:\d{1,2}:\d{1,2})|$))~
2193 steps
Pattern Demo
Code: (Demo)
$text="1:1 The book of the generation of Jesus Christ, the son of David, the son of Abraham.
1:2 Abraham begat Isaac; and Isaac begat Jacob; and Jacob begat Judas and his brethren; 1:3 And Judas begat Phares and Zara of Thamar; and Phares begat Esrom; and Esrom begat Aram; 1:4 And Aram begat Aminadab; and Aminadab begat Naasson; and Naasson begat Salmon; 1:5 And Salmon begat Booz of Rachab; and Booz begat Obed of Ruth; and Obed begat Jesse; 1:6 And Jesse begat David the king; and David the king begat Solomon of her that had been the wife of Urias; 1:7 And Solomon begat Roboam; and Roboam begat Abia; and Abia begat Asa; 1:8 And Asa begat Josaphat; and Josaphat begat Joram; and Joram begat Ozias; 1:9 And Ozias begat Joatham; and Joatham begat Achaz; and Achaz begat Ezekias; 1:10 And Ezekias begat Manasses; and Manasses begat Amon; and Amon begat Josias; 1:11
And Josias begat Jechonias and his brethren, about the time they were carried away to Babylon: 1:12 And after they were brought to Babylon, Jechonias begat Salathiel; and Salathiel begat Zorobabel; 1:13 And Zorobabel begat Abiud; and Abiud begat Eliakim; and Eliakim begat Azor; 1:14 And Azor begat Sadoc; and Sadoc begat Achim; and Achim begat Eliud; 1:15 And Eliud begat Eleazar; and Eleazar begat Matthan; and Matthan begat Jacob; 1:16 And Jacob begat Joseph the husband of Mary, of whom was born Jesus, who is called Christ.
1:17 So all the generations from Abraham to David are fourteen generations; and from David until the carrying away into Babylon are fourteen generations; and from the carrying away into Babylon unto Christ are fourteen generations.
1:18 Now the birth of Jesus Christ was on this wise: When as his mother Mary was espoused to Joseph, before they came together, she was found with child of the Holy Ghost.";
$pattern='/(\d{1,2}:\d{1,2})\s+(.*?[:;.](?=\s*(?:\d{1,2}:\d{1,2})|$))/';
var_export(preg_match_all($pattern,$text,$out)?array_combine($out[1],$out[2]):[]);
Output:
array (
'1:1' => 'The book of the generation of Jesus Christ, the son of David, the son of Abraham.',
'1:2' => 'Abraham begat Isaac; and Isaac begat Jacob; and Jacob begat Judas and his brethren;',
'1:3' => 'And Judas begat Phares and Zara of Thamar; and Phares begat Esrom; and Esrom begat Aram;',
'1:4' => 'And Aram begat Aminadab; and Aminadab begat Naasson; and Naasson begat Salmon;',
'1:5' => 'And Salmon begat Booz of Rachab; and Booz begat Obed of Ruth; and Obed begat Jesse;',
'1:6' => 'And Jesse begat David the king; and David the king begat Solomon of her that had been the wife of Urias;',
'1:7' => 'And Solomon begat Roboam; and Roboam begat Abia; and Abia begat Asa;',
'1:8' => 'And Asa begat Josaphat; and Josaphat begat Joram; and Joram begat Ozias;',
'1:9' => 'And Ozias begat Joatham; and Joatham begat Achaz; and Achaz begat Ezekias;',
'1:10' => 'And Ezekias begat Manasses; and Manasses begat Amon; and Amon begat Josias;',
'1:11' => 'And Josias begat Jechonias and his brethren, about the time they were carried away to Babylon:',
'1:12' => 'And after they were brought to Babylon, Jechonias begat Salathiel; and Salathiel begat Zorobabel;',
'1:13' => 'And Zorobabel begat Abiud; and Abiud begat Eliakim; and Eliakim begat Azor;',
'1:14' => 'And Azor begat Sadoc; and Sadoc begat Achim; and Achim begat Eliud;',
'1:15' => 'And Eliud begat Eleazar; and Eleazar begat Matthan; and Matthan begat Jacob;',
'1:16' => 'And Jacob begat Joseph the husband of Mary, of whom was born Jesus, who is called Christ.',
'1:17' => 'So all the generations from Abraham to David are fourteen generations; and from David until the carrying away into Babylon are fourteen generations; and from the carrying away into Babylon unto Christ are fourteen generations.',
'1:18' => 'Now the birth of Jesus Christ was on this wise: When as his mother Mary was espoused to Joseph, before they came together, she was found with child of the Holy Ghost.',
)
Explanation:
~ #Pattern delimiter
(\d{1,2}:\d{1,2}) #Capture nn:nn as Group1
\s+ #Match one or more whitespaces (including newlines)
( #Start Capture Group2
.*? #Lazily match zero or more non-newline characters
[:;.] #Match a colon, semi-colon, or dot
(?= #Start "lookahead" (aka: match but don't consume)
\s* #Match zero or more whitespace characters
(?:\d{1,2}:\d{1,2}) #Match nn:nn
| #Or
$ #Match the end of the entire string
) #End "lookahead"
) #End Capture Group2
~ #Pattern delimiter
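The commented layout above can also serve as the actual pattern: PHP's x (extended) modifier ignores unescaped whitespace in the pattern and treats # through the end of the line as a comment. A sketch, assuming a nowdoc string so the backslashes need no escaping, run on a shortened excerpt of the sample (verses 1:1 and 1:10 to 1:12). Two adjustments are needed: nothing but modifiers may follow the closing ~, so the delimiter lines carry no comments, and no comment may contain a ~.

```php
<?php
// Shortened excerpt of the question's sample: verses 1:1 and 1:10-1:12.
$text = "1:1 The book of the generation of Jesus Christ, the son of David, the son of Abraham.\n"
      . "1:10 And Ezekias begat Manasses; and Manasses begat Amon; and Amon begat Josias; 1:11\n"
      . "And Josias begat Jechonias and his brethren, about the time they were carried away to Babylon: "
      . "1:12 And after they were brought to Babylon, Jechonias begat Salathiel; and Salathiel begat Zorobabel;";

// Nowdoc: no escaping needed. With the x modifier, whitespace is ignored and
// "#" starts a comment that runs to the end of the line.
$pattern = <<<'REGEX'
~
(\d{1,2}:\d{1,2})            # Capture nn:nn as Group1
\s+                          # Match one or more whitespaces (including newlines)
(                            # Start Capture Group2
  .*?                        # Lazily match zero or more non-newline characters
  [:;.]                      # Match a colon, semi-colon, or dot
  (?=                        # Start lookahead
    \s*(?:\d{1,2}:\d{1,2})   # Match optional whitespace, then nn:nn
    |                        # Or
    $                        # Match the end of the entire string
  )                          # End lookahead
)                            # End Capture Group2
~x
REGEX;

$verses = preg_match_all($pattern, $text, $out) ? array_combine($out[1], $out[2]) : [];
var_export($verses);
```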
A regex lookahead will make your task easier:
/(?:\d+:\d+).*?(?=(?:\d+:\d+)|$)/s
See https://regex101.com/r/5UDJOz/1
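This pattern has no capture groups, so each full match is the "nn:nn" followed by the verse text and any trailing whitespace. A sketch of one way to turn the matches into key/value pairs, on a shortened excerpt of the sample (verses 1:1 and 1:10 to 1:12); the split on the first whitespace run is an added step, not part of the answer.

```php
<?php
// Shortened excerpt of the question's sample: verses 1:1 and 1:10-1:12.
$text = "1:1 The book of the generation of Jesus Christ, the son of David, the son of Abraham.\n"
      . "1:10 And Ezekias begat Manasses; and Manasses begat Amon; and Amon begat Josias; 1:11\n"
      . "And Josias begat Jechonias and his brethren, about the time they were carried away to Babylon: "
      . "1:12 And after they were brought to Babylon, Jechonias begat Salathiel; and Salathiel begat Zorobabel;";

// The answer's pattern: "nn:nn", then everything up to the next "nn:nn" or the end.
// The s modifier lets .*? run across the line breaks in the text.
preg_match_all('/(?:\d+:\d+).*?(?=(?:\d+:\d+)|$)/s', $text, $matches);

$verses = [];
foreach ($matches[0] as $chunk) {
    // Added step: split e.g. "1:11\nAnd Josias ..." at the first whitespace run.
    $parts = preg_split('/\s+/', trim($chunk), 2);
    $verses[$parts[0]] = $parts[1] ?? '';
}
var_export($verses);
```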