How to find and delete a long pattern in a text file using Linux sed with regex



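I am parsing many BibTeX files into R for some data analysis. However, the abstracts often cause problems, so I would like to strip them out beforehand with sed.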

I found that sed 's/Abstract\s=\s[{][{]//' < file.bib

successfully removes the start of the Abstract entry, and

sed 's/[}][}],//' < file.bib removes the closing braces and comma.

However, I cannot find any way to combine the two so that everything in between is deleted as well. For example, I tried:

sed 's/^Abstract\s=\s[{][{][\s\S]*[}][}],$//' < file.bib

This is what a BibTeX reference looks like:

@article{ ISI:000072671200001,
Author = {Edmondson, A and Moingeon, B},
Title = {{From organizational learning to the learning organization}},
Journal = {{MANAGEMENT LEARNING}},
Year = {{1998}},
Volume = {{29}},
Number = {{1}},
Pages = {{5-20}},
Month = {{MAR}},
Abstract = {{This article reviews theories of organizational learning and presents a
framework with which to organize the literature. We argue that unit of
analysis provides one critical distinction in the organizational
learning literature and research objective provides another. The
resulting two-by-two matrix contains four categories of research, which
we have called: (2) residues (organizations as residues of past
learning); (2) communities (organizations as collections of individuals
who can learn and develop); (3) participation (organizational
improvement gained through intelligent activity of individual members),
and (4) accountability (organizational improvement gained through
developing individuals' mental models). We also propose a distinction
between the terms organizational learning and the learning organization.
Our subsequent analysis identifies relationships between disparate parts
of the literature and shows that these relationships point to individual
mental models as a critical source of leverage for creating learning
organizations. A brief discussion of the work of two of the most visible
researchers in this field, Peter Senge and Chris Argyris, provides
additional support for this type of change strategy.}},
DOI = {{10.1177/1350507698291001}},
ISSN = {{1350-5076}},
Unique-ID = {{ISI:000072671200001}},
}

And this is what I would like it to look like:

@article{ ISI:000072671200001,
Author = {Edmondson, A and Moingeon, B},
Title = {{From organizational learning to the learning organization}},
Journal = {{MANAGEMENT LEARNING}},
Year = {{1998}},
Volume = {{29}},
Number = {{1}},
Pages = {{5-20}},
Month = {{MAR}},
DOI = {{10.1177/1350507698291001}},
ISSN = {{1350-5076}},
Unique-ID = {{ISI:000072671200001}},
}

You could try piping the sed commands into one another in sequence. Something like this:

sed 's/Abstract\s=\s[{][{]//' < file.bib | sed 's/[}][}],//'

You could also try the OR regex operator in your pattern (written \| in GNU sed's default syntax), for example:

sed 's/Abstract\s=\s[{][{]\|[}][}],//' < file.bib

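Both approaches remove the opening and closing markers, but be aware that they leave the abstract text in between untouched, and that the second pattern also matches the }}, closing every other field. I hope this helps.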

This might work for you (GNU sed):

sed '/^Abstract = {{/,/.*}},$/d' file

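This uses the range operator (,) which, combined with the delete command d, deletes the lines from the one beginning with Abstract = {{ through the next one ending in }},.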
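
To see the range delete in action, here is a minimal sketch run against a tiny made-up entry (the "demo" key, the abstract text and the DOI are placeholders, not taken from the question's file). It also covers one case the answer does not mention: if an abstract opens and closes on a single line, sed only starts looking for the end of the range on the following line, so the range would run on and remove the next field too. Placing a single-line delete in front of the range is one possible way around that, assuming no abstract line other than the last ends in }},.

```shell
# Build a small sample file; the entry is a shortened, made-up stand-in.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
@article{ demo,
Title = {{Example}},
Abstract = {{First line of the abstract
second line of the abstract.}},
DOI = {{10.0000/example}},
}
EOF

# Range delete: from the line that starts the Abstract field through
# the first later line that ends in "}},".
result=$(sed '/^Abstract = {{/,/}},$/d' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"

# Hypothetical edge case: an abstract that fits on one line.
# The first expression deletes it outright, so the range in the second
# expression never opens and the DOI line survives.
one_line=$(printf '%s\n' 'Abstract = {{Short.}},' 'DOI = {{10.0000/example}},' |
  sed -e '/^Abstract = {{.*}},$/d' -e '/^Abstract = {{/,/}},$/d')
printf '%s\n' "$one_line"
```

The first command prints the entry without its Abstract field; the second prints only the DOI line. To rewrite a file in place with GNU sed, the same expressions can be used with -i, ideally on a copy first.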
