用于捕获电子邮件编码附件字符串的正则表达式



我想编写一个正则表达式,从电子邮件MIME消息字符串(eml)中捕获编码单词部分。 例如,这是电子邮件的一部分:

<div dir=3D"ltr"><br clear=3D"all"><div><div dir=3D"ltr"><div style=3D"dire=
ction:rtl">-------------</div><div style=3D"direction:rtl">=D7=91=D7=91=D7=
=A8=D7=9B=D7=94,</div><div style=3D"direction:rtl">=D7=90=D7=91=D7=99=D7=A2=
=D7=93 =D7=9B=D7=94=D7=9F</div></div></div>
</div>
--20cf3003bc2e044e980500f755dc--
--20cf3003bc2e044e9d0500f755de
Content-Type: text/plain; charset=US-ASCII; name="EhudBanay.txt"
Content-Disposition: attachment; filename="EhudBanay.txt"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_hz0z4us30
aHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRz
aGFyZS5jb20lMkZmaWxlcyUyRjM4NzAxNTA2MDclMkZFaHVkX0JhbmFpXy1fVGlwX1RpcGFfXzE5
OThfLnJhciZoPTdBUUZRb0RMQQ0KDQpodHRwczovL3d3dy5mYWNlYm9vay5jb20vbC5waHA/dT1o
dHRwcyUzQSUyRiUyRnd3dy5yYXBpZHNoYXJlLmNvbSUyRmZpbGVzJTJGMzk5MzMyNjg1MSUyRkVo
dWRfQmFuYWlfLV9UYWhhdF9TaWFoX0hhWWFzbWluXzE5ODkucmFyJmg9QkFRRWhJY3djDQoNCmh0
dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hh
cmUuY29tJTJGZmlsZXMlMkYzMjQwMTM5MTMyJTJGRWh1ZF9CYW5haV8tX1Jlc2lzZXlfTGFpbGFf
MjAxMS5yYXImaD1RQVFHN0pGWXUNCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9
aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjE5NTE2ODA4MTglMkZF
aHVkX0JhbmFpXy1fT2RfTWVhdF9fMTk5Nl8ucmFyJmg9YUFRRUVuaUIxDQoNCmh0dHBzOi8vd3d3
LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJG
ZmlsZXMlMkYyMjc2NTc5MTgzJTJGRWh1ZF9CYW5haV8tX0thcm92X18xOTg5Xy5yYXImaD1mQVFH
a2dYVXENCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3
d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjQwOTg0NjQzNjYlMkZFaHVkX0JhbmFpXy1fSGFT
aGxpc2hpX18xOTkyXy5yYXImaD1GQVFGNjRmY3gNCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29t
L2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjMxNDY1
NDc2OTElMkZFaHVkX0JhbmFpXy1fRWh1ZF9CYW5haV9WZUhhUGxpdGltX18xOTg3X19GLnBhcnQy
LnJhciZoPUJBUUVoSWN3Yw0KDQpodHRwczovL3d3dy5mYWNlYm9vay5jb20vbC5waHA/dT1odHRw
cyUzQSUyRiUyRnd3dy5yYXBpZHNoYXJlLmNvbSUyRmZpbGVzJTJGMjYwNDg2Njc1MiUyRkVodWRf
QmFuYWlfLV9FaHVkX0JhbmFpX1ZlSGFQbGl0aW1fXzE5ODdfX0YucGFydDEucmFyJmg9REFRSHpG
LXZBDQoNCmh0dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3
LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMlMkYyNjQxMzIwNzg2JTJGRWh1ZF9CYW5haV8tX0Ryb3Bz
X09mX1RoZV9OaWdodF9fMjAxMV8ucmFyJmg9Y0FRRlRZQ1pTDQoNCmh0dHBzOi8vd3d3LmZhY2Vi
b29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMl
MkYzMTQ3NzUzNzAwJTJGRWh1ZCUyNTIwQmFuYWklMjUyMC0lMjUyMEtlZXAlMjUyMERyaXZpbmcu
cGFydDEucmFyJmg9S0FRRWtPUkZTDQoNCmh0dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91
PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMlMkYxNzc1NDI5NDY3JTJG
RWh1ZF9CYW5haV8tX0FuZV9MaV9fMjAwNF8ucmFyJmg9dkFRRWlEWXFu
--20cf3003bc2e044e9d0500f755de--

我只想抓住这部分:

aHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRz
aGFyZS5jb20lMkZmaWxlcyUyRjM4NzAxNTA2MDclMkZFaHVkX0JhbmFpXy1fVGlwX1RpcGFfXzE5
OThfLnJhciZoPTdBUUZRb0RMQQ0KDQpodHRwczovL3d3dy5mYWNlYm9vay5jb20vbC5waHA/dT1o
dHRwcyUzQSUyRiUyRnd3dy5yYXBpZHNoYXJlLmNvbSUyRmZpbGVzJTJGMzk5MzMyNjg1MSUyRkVo
dWRfQmFuYWlfLV9UYWhhdF9TaWFoX0hhWWFzbWluXzE5ODkucmFyJmg9QkFRRWhJY3djDQoNCmh0
dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hh
cmUuY29tJTJGZmlsZXMlMkYzMjQwMTM5MTMyJTJGRWh1ZF9CYW5haV8tX1Jlc2lzZXlfTGFpbGFf
MjAxMS5yYXImaD1RQVFHN0pGWXUNCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9
aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjE5NTE2ODA4MTglMkZF
aHVkX0JhbmFpXy1fT2RfTWVhdF9fMTk5Nl8ucmFyJmg9YUFRRUVuaUIxDQoNCmh0dHBzOi8vd3d3
LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJG
ZmlsZXMlMkYyMjc2NTc5MTgzJTJGRWh1ZF9CYW5haV8tX0thcm92X18xOTg5Xy5yYXImaD1mQVFH
a2dYVXENCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29tL2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3
d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjQwOTg0NjQzNjYlMkZFaHVkX0JhbmFpXy1fSGFT
aGxpc2hpX18xOTkyXy5yYXImaD1GQVFGNjRmY3gNCg0KaHR0cHM6Ly93d3cuZmFjZWJvb2suY29t
L2wucGhwP3U9aHR0cHMlM0ElMkYlMkZ3d3cucmFwaWRzaGFyZS5jb20lMkZmaWxlcyUyRjMxNDY1
NDc2OTElMkZFaHVkX0JhbmFpXy1fRWh1ZF9CYW5haV9WZUhhUGxpdGltX18xOTg3X19GLnBhcnQy
LnJhciZoPUJBUUVoSWN3Yw0KDQpodHRwczovL3d3dy5mYWNlYm9vay5jb20vbC5waHA/dT1odHRw
cyUzQSUyRiUyRnd3dy5yYXBpZHNoYXJlLmNvbSUyRmZpbGVzJTJGMjYwNDg2Njc1MiUyRkVodWRf
QmFuYWlfLV9FaHVkX0JhbmFpX1ZlSGFQbGl0aW1fXzE5ODdfX0YucGFydDEucmFyJmg9REFRSHpG
LXZBDQoNCmh0dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3
LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMlMkYyNjQxMzIwNzg2JTJGRWh1ZF9CYW5haV8tX0Ryb3Bz
X09mX1RoZV9OaWdodF9fMjAxMV8ucmFyJmg9Y0FRRlRZQ1pTDQoNCmh0dHBzOi8vd3d3LmZhY2Vi
b29rLmNvbS9sLnBocD91PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMl
MkYzMTQ3NzUzNzAwJTJGRWh1ZCUyNTIwQmFuYWklMjUyMC0lMjUyMEtlZXAlMjUyMERyaXZpbmcu
cGFydDEucmFyJmg9S0FRRWtPUkZTDQoNCmh0dHBzOi8vd3d3LmZhY2Vib29rLmNvbS9sLnBocD91
PWh0dHBzJTNBJTJGJTJGd3d3LnJhcGlkc2hhcmUuY29tJTJGZmlsZXMlMkYxNzc1NDI5NDY3JTJG
RWh1ZF9CYW5haV8tX0FuZV9MaV9fMjAwNF8ucmFyJmg9dkFRRWlEWXFu

我无法用正则表达式做到这一点,所以我尝试使用以下行,该行始终以"--"开头并以"--"结尾。 请注意,在所需部件之前始终有一个空行。我试过:"(\s).*(--)$",但它只返回以下行。

有人可以帮忙吗?

听起来您正在尝试解析多部分/混合电子邮件。已经有大多数语言的库可以做到这一点。如果你想写你自己的,我建议遵循多部分消息的结构。

  • 查找内容类型标头中定义的边界
  • 将消息拆分为由边界分隔的部分(前缀为 --)。
  • 对于每个部分,请查找表示标头末尾的两个连续换行符的第一个实例。

虽然正则表达式可能对此有所帮助。我不确定它是解析结构化消息的正确工具。

您可以使用此正则表达式:

ns*nK(?:[^-]{2})*.?(?=n--)

在线演示

最新更新