Regex attempt
(\\section{|\\subsection{|\\subsubsection{|\\paragraph[^{]*{)(\w)\w*([ |}]*)
Search text
\section{intro to installation of apps}
\subsection{another heading for \myformatting{special}}
\subsubsection{good morning, San Francisco}
\paragraph{installation of backend services}
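Desired output
All initial letters capitalized, except for prepositions, conjunctions, and the other parts of speech that are usually left lowercase in titles.
I suppose I should narrow this down, so let me borrow from the U.S. Government Printing Office Style Manual:
The articles a, an, and the; the prepositions at, by, for, in, of, on, to, and up; the conjunctions and, as, but, if, or, and nor; and the second element of a compound numeral are not capitalized.
(p. 41)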
So
\subsection{Installation guide for the server-side app \myapp{webgen}}
would become
\subsection{Installation Guide for the Server-side App \myapp{Webgen}}
or
\subsection{Installation Guide for the Server-side App \myapp{webgen}}
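What would you call this type of string modification?
Applying a regex to a string between strings?
Applying a regex to part of a string when that part falls between two other strings?
Applying a regex to a substring that occurs between two other substrings within a string?
<something else>?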
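I match each LaTeX heading command, including the {. This means my expression matches nothing but the first word of the actual heading text. I cannot surround the whole heading pattern with an "or" for spaces, because then I would match nearly every word in the document. I also have to be careful with formatting commands inside the headings themselves.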
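To make the limitation concrete, here is a small Python check of the pattern above against one of the sample lines. It is only an illustration: the variable names are made up for this sketch, and the braces are escaped so that Python's `re` module treats them as literal characters.

```python
import re

# The pattern from the question, with backslashes written out and the
# literal braces escaped for Python's re module.
HEADING_FIRST_WORD = re.compile(
    r"(\\section\{|\\subsection\{|\\subsubsection\{|\\paragraph[^{]*\{)(\w)\w*([ |}]*)"
)

line = r"\section{intro to installation of apps}"
match = HEADING_FIRST_WORD.search(line)

# Group 1 is the heading command plus "{"; group 2 is the first letter
# of the first word. Nothing after the first word is part of the match.
print(match.group(1))        # \section{
print(match.group(2))        # i
print(repr(match.group(0)))  # '\\section{intro '

# Only one match per heading line: the remaining words are not preceded
# by a heading command, so the pattern never reaches them.
print(len(HEADING_FIRST_WORD.findall(line)))  # 1
```

So a single substitution with this pattern can only ever capitalize "intro"; the words "to installation of apps" are never visited.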
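Other useful related questions
- Uppercasing the first letter of words using sed
- https://superuser.com/questions/749164/how-to-use-regex-to-capitalise-the-first-letter-of-each-word-in-a-sentence
- Capitalize the first letter of each word using sed
- Capitalize the first letter of each word in a selection using vim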
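So it seems to me that you need to implement pseudocode like this:
- Are we on the first word? If so, capitalize it and continue.
- Is the current word "reserved"? If so, lowercase it and continue.
- Is the current word a numeral? If so, lowercase it and continue.
- Are we still in the list? If so, print the line verbatim and continue.
Another useful rule might be to preserve words that are entirely uppercase, in case they are acronyms.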
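The word-level rules above, together with the acronym rule, could be sketched in Python roughly as follows. The reserved words come from the style-manual rule quoted in the question; the numeral list and the function name `title_case_words` are assumptions made for illustration. The sketch handles plain heading text only, not LaTeX commands.

```python
# Assumption: lowercase words follow the GPO rule quoted in the question.
RESERVED = {"a", "an", "the", "at", "by", "for", "in", "of", "on", "to", "up",
            "and", "as", "but", "if", "or", "nor"}
# Assumption: a guess at "second element of a compound numeral".
NUMERALS = {"hundred", "thousand", "million", "billion"}


def title_case_words(text: str) -> str:
    """Title-case plain heading text word by word (no LaTeX handling)."""
    out = []
    for index, word in enumerate(text.split()):
        lower = word.lower()
        if len(word) > 1 and word.isupper():
            # Entirely uppercase: probably an acronym, keep it as-is.
            out.append(word)
        elif index == 0:
            # The first word is always capitalized.
            out.append(word[:1].upper() + word[1:].lower())
        elif lower in RESERVED or lower in NUMERALS:
            # Reserved words and numerals are lowercased.
            out.append(lower)
        else:
            # Everything else gets an initial capital.
            out.append(word[:1].upper() + word[1:].lower())
    return " ".join(out)


print(title_case_words("installation guide for the server-side app"))
# Installation Guide for the Server-side App
```

The acronym check runs before the first-word rule here so that a leading acronym is not flattened; that ordering is a design choice of this sketch, not something the pseudocode prescribes.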
The following awk script might do what you need.
#!/usr/bin/awk -f

# Capitalize the first character of a word and lowercase the rest.
function toformal(subject) {
  return toupper(substr(subject,1,1)) tolower(substr(subject,2))
}

BEGIN {
  # Reserved word list gets split into an array for easy matching.
  reserved="a an the at by for in of on to up and as but if or nor";
  split(reserved,a_reserved," "); for(i in a_reserved) r[a_reserved[i]]=1;
  # Same with the list of compound numerals. If this isn't what you mean, say so.
  numerals="hundred thousand million billion";
  split(numerals,a_numerals," "); for(i in a_numerals) n[a_numerals[i]]=1;
}

# This awk condition matches the lines we're interested in modifying.
# It assumes the heading is the only thing on the line.
/^\\(section|subsection|subsubsection|paragraph)[{]/ {
  # Separate the particular section and the text, then split text to an array.
  section=$0; sub(/^\\/,"",section); sub(/[{].*/,"",section);
  # Strip the command and only the last closing brace, so nested {...} survive.
  text=$0; sub(/^[^{]*[{]/,"",text); sub(/[}][^}]*$/,"",text);
  size=split(text,atext,/[[:space:]]+/);
  # First word...
  newtext=toformal(atext[1]);
  for(i=2; i<=size; i++) {
    # Reserved word...
    if (r[tolower(atext[i])]) { newtext=newtext " " tolower(atext[i]); continue; }
    # Compound numerals...
    if (n[tolower(atext[i])]) { newtext=newtext " " tolower(atext[i]); continue; }
    # # Acronyms maybe...
    # if (atext[i] == toupper(atext[i])) { newtext=newtext " " atext[i]; continue; }
    # Everything else...
    newtext=newtext " " toformal(atext[i]);
  }
  # Rebuild the heading: command, opening brace, new text, closing brace.
  print "\\" section "{" newtext "}";
  next;
}

# Print the line if we get this far. This is a non-condition with
# a print-only statement.
1
Here is an example of how to do this in Perl, using the module Lingua::EN::Titlecase and a recursive regular expression:
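use strict;
use warnings;
use Lingua::EN::Titlecase;

my $tc = Lingua::EN::Titlecase->new();
# Slurp the whole input so headings can be matched anywhere in the text.
my $data = do { local $/; <> };
my ($kw_regex) = map { qr/$_/ }
    join '|', qw(section subsection subsubsection paragraph);
# (?2) recurses into group 2, so the braces of the heading argument stay balanced.
$data =~ s/(\\(?: $kw_regex))(\{(?:[^{}]++|(?2))*\})/title_case($tc,$1,$2)/gex;
print $data;

sub title_case {
    my ($tc, $p1, $p2) = @_;
    # Drop the outer braces; they are added back on return.
    $p2 =~ s/^\{//;
    $p2 =~ s/\}$//;
    if ($p2 =~ /\\/ ) {
        # The text contains nested commands: title-case the text before each
        # command and, recursively, the command's argument.
        while ($p2 =~ /\G(.*?)(\\.*?)(\{(?:[^{}]++|(?3))*\})/ ) {
            my $next_pos = $+[0];
            substr($p2, $-[1], $+[1] - $-[1], $tc->title($1));
            substr($p2, $-[3], $+[3] - $-[3], title_case($tc,'',$3));
            pos($p2) = $next_pos;
        }
        # Title-case whatever follows the last nested command.
        $p2 =~ s/\G(.+)$/$tc->title($1)/e;
    }
    else {
        $p2 = $tc->title($p2);
    }
    return $p1 . '{' . $p2 . '}';
}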
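For comparison, the same brace-balancing idea can be sketched in Python. The standard `re` module has no recursive patterns, so this version swaps the recursive regex for an explicit depth counter. All names here (`find_closing_brace`, `title_case_body`, `title_case_headings`, `SMALL_WORDS`) are hypothetical. Unlike the Perl code, it leaves nested commands and their arguments untouched, which corresponds to the second of the two desired outputs in the question. It also ignores escaped braces and starred commands such as \section*.

```python
import re

HEADING = re.compile(r"\\(?:section|subsection|subsubsection|paragraph)\{")
# Assumption: lowercase words follow the GPO rule quoted in the question.
SMALL_WORDS = {"a", "an", "the", "at", "by", "for", "in", "of", "on", "to",
               "up", "and", "as", "but", "if", "or", "nor"}


def find_closing_brace(text: str, open_pos: int) -> int:
    """Return the index of the brace that closes the '{' at open_pos."""
    depth = 0
    for i in range(open_pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces")


def title_case_body(body: str) -> str:
    """Title-case heading text, copying nested commands verbatim."""
    pieces = []
    depth = 0     # brace depth inside the heading argument
    first = True  # the first word is always capitalized
    for token in re.split(r"(\s+)", body):
        if not token or token.isspace():
            pieces.append(token)
            continue
        # Tokens inside a nested {...} or starting a command stay as-is.
        protected = depth > 0 or token.startswith("\\")
        depth += token.count("{") - token.count("}")
        if protected:
            pieces.append(token)
        elif not first and token.lower() in SMALL_WORDS:
            pieces.append(token.lower())
        else:
            pieces.append(token[0].upper() + token[1:])
        first = False
    return "".join(pieces)


def title_case_headings(document: str) -> str:
    """Rewrite the argument of every heading command in the document."""
    result = []
    pos = 0
    for match in HEADING.finditer(document):
        if match.start() < pos:
            continue  # heading command inside an argument already handled
        open_pos = match.end() - 1
        close_pos = find_closing_brace(document, open_pos)
        result.append(document[pos:open_pos + 1])
        result.append(title_case_body(document[open_pos + 1:close_pos]))
        pos = close_pos
    result.append(document[pos:])
    return "".join(result)


sample = r"\subsection{Installation guide for the server-side app \myapp{webgen}}"
print(title_case_headings(sample))
# \subsection{Installation Guide for the Server-side App \myapp{webgen}}
```

Getting the first desired output instead (with `\myapp{Webgen}`) would mean calling `title_case_body` recursively on each nested argument, much as the Perl `title_case` does.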