How to distinguish inline images from signature images and other blank images in IMAP email



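I am using MailKit to fetch emails from a mailbox and save them to a database so that they can be displayed in my MVC application.

I save the HTML email as plain text in the database. I can fetch attachments and save them to the file system, but I run into a problem when an email contains inline images, because signature images and other blank images are also saved to the file system as attachments.

Is there a way to distinguish inline attachments from signature images or other blank images?

Thanks in advance.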

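It doesn't matter which IMAP library you use; you will need some creativity to solve this.

What you can do is start with the HtmlPreviewVisitor sample from the FAQ and modify it ever so slightly to split the attachments into 2 lists:

  1. The list of actual attachments
  2. The list of images actually referenced by the HTML (found by walking the HTML and tracking which images are referenced)

Code: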

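using System;
using System.Collections.Generic;
using MimeKit;
using MimeKit.Text;
using MimeKit.Tnef;

/// <summary>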
/// Visits a MimeMessage and splits attachments into those that are
/// referenced by the HTML body vs regular attachments.
/// </summary>
class AttachmentVisitor : MimeVisitor
{
    List<MultipartRelated> stack = new List<MultipartRelated> ();
    List<MimeEntity> attachments = new List<MimeEntity> ();
    List<MimePart> embedded = new List<MimePart> ();
    bool foundBody;
    /// <summary>
    /// Creates a new AttachmentVisitor.
    /// </summary>
    public AttachmentVisitor ()
    {
    }
    /// <summary>
    /// The list of attachments that were in the MimeMessage.
    /// </summary>
    public IList<MimeEntity> Attachments {
        get { return attachments; }
    }
    /// <summary>
    /// The list of embedded images that were in the MimeMessage.
    /// </summary>
    public IList<MimePart> EmbeddedImages {
        get { return embedded; }
    }
    protected override void VisitMultipartAlternative (MultipartAlternative alternative)
    {
        // walk the multipart/alternative children backwards from greatest level of faithfulness to the least faithful
        for (int i = alternative.Count - 1; i >= 0 && !foundBody; i--)
            alternative[i].Accept (this);
    }
    protected override void VisitMultipartRelated (MultipartRelated related)
    {
        var root = related.Root;
        // push this multipart/related onto our stack
        stack.Add (related);
        // visit the root document
        root.Accept (this);
        // pop this multipart/related off our stack
        stack.RemoveAt (stack.Count - 1);
    }
    // look up the image based on the img src url within our multipart/related stack
    bool TryGetImage (string url, out MimePart image)
    {
        UriKind kind;
        int index;
        Uri uri;
        if (Uri.IsWellFormedUriString (url, UriKind.Absolute))
            kind = UriKind.Absolute;
        else if (Uri.IsWellFormedUriString (url, UriKind.Relative))
            kind = UriKind.Relative;
        else
            kind = UriKind.RelativeOrAbsolute;
        try {
            uri = new Uri (url, kind);
        } catch {
            image = null;
            return false;
        }
        for (int i = stack.Count - 1; i >= 0; i--) {
            if ((index = stack[i].IndexOf (uri)) == -1)
                continue;
            image = stack[i][index] as MimePart;
            return image != null;
        }
        image = null;
        return false;
    }
    // called when an HTML tag is encountered
    void HtmlTagCallback (HtmlTagContext ctx, HtmlWriter htmlWriter)
    {
        if (ctx.TagId == HtmlTagId.Image && !ctx.IsEndTag && stack.Count > 0) {
            // search for the src= attribute
            foreach (var attribute in ctx.Attributes) {
                if (attribute.Id == HtmlAttributeId.Src) {
                    MimePart image;
                    if (!TryGetImage (attribute.Value, out image))
                        continue;
                    if (!embedded.Contains (image))
                        embedded.Add (image);
                }
            }
        }
    }
    protected override void VisitTextPart (TextPart entity)
    {
        TextConverter converter;
        if (foundBody) {
            // since we've already found the body, treat this as an
            // attachment
            attachments.Add (entity);
            return;
        }
        if (entity.IsHtml) {
            converter = new HtmlToHtml {
                HtmlTagCallback = HtmlTagCallback
            };
            converter.Convert (entity.Text);
        }
        foundBody = true;
    }
    protected override void VisitTnefPart (TnefPart entity)
    {
        // extract any attachments in the MS-TNEF part
        attachments.AddRange (entity.ExtractAttachments ());
    }
    protected override void VisitMessagePart (MessagePart entity)
    {
        // treat message/rfc822 parts as attachments
        attachments.Add (entity);
    }
    protected override void VisitMimePart (MimePart entity)
    {
        // realistically, if we've gotten this far, then we can treat
        // this as an attachment even if the IsAttachment property is
        // false.
        attachments.Add (entity);
    }
}

To use it:

var visitor = new AttachmentVisitor ();
message.Accept (visitor);
// Now you can use visitor.Attachments and visitor.EmbeddedImages

A simpler, although less error-proof, way of doing it (since it doesn't actually verify that the image is referenced by the HTML) is this:

var embeddedImages = message.BodyParts.OfType<MimePart> ().
    Where (x => x.ContentType.IsMimeType ("image", "*") &&
           x.ContentDisposition != null &&
           x.ContentDisposition.Disposition.Equals ("inline", StringComparison.OrdinalIgnoreCase));

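Now that you have a list of embeddedImages, you will have to figure out a way to determine whether they are used only in the signature or elsewhere in the HTML.

Most likely you will have to analyze the HTML itself as well.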
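As a rough illustration of that kind of HTML analysis, here is a minimal sketch that collects the Content-IDs referenced by img tags through cid: URLs, so they can be compared against the Content-Id of each inline image. The CidReferences class and its Find method are hypothetical names, not part of MimeKit, and a regular expression is only a crude stand-in for a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of MimeKit): finds Content-IDs referenced by <img src="cid:...">.
static class CidReferences
{
    // Matches an <img> tag whose src attribute starts with "cid:" and captures the id.
    // This is a heuristic; unusual markup may not be matched.
    static readonly Regex ImgSrc = new Regex (
        "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']?cid:([^\"'\\s>]+)",
        RegexOptions.IgnoreCase);

    public static ISet<string> Find (string html)
    {
        var ids = new HashSet<string> ();

        foreach (Match match in ImgSrc.Matches (html))
            ids.Add (Uri.UnescapeDataString (match.Groups[1].Value));

        return ids;
    }
}
```

An inline image whose Content-Id is not in the returned set is probably not displayed by the HTML body at all. Telling signature images apart from the rest would still need an extra heuristic on top of this, such as where in the HTML the reference appears.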

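It may also be worth noting that some HTML mail references images located on the web that are not embedded in the message. If you need those images as well, you will need to modify TryGetImage to download the image from the web whenever the code I provided fails to locate it within the MIME of the message.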
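A minimal sketch of what such a fallback could look like, assuming HttpClient is available. RemoteImages, IsRemote and TryDownloadAsync are hypothetical names introduced for illustration; wiring the result back into TryGetImage is left out.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: decides whether an img src points at the web and, if so, downloads it.
static class RemoteImages
{
    static readonly HttpClient http = new HttpClient ();

    // True for absolute http/https URLs; false for cid: references and relative paths.
    public static bool IsRemote (string src)
    {
        Uri uri;

        return Uri.TryCreate (src, UriKind.Absolute, out uri) &&
            (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }

    // Returns the image bytes, or null when the URL is not remote or the download fails.
    public static async Task<byte[]> TryDownloadAsync (string src)
    {
        if (!IsRemote (src))
            return null;

        try {
            return await http.GetByteArrayAsync (src);
        } catch (HttpRequestException) {
            return null;
        } catch (TaskCanceledException) {
            // the request timed out
            return null;
        }
    }
}
```

Keep in mind that remote images are often tracking pixels, so downloading them automatically may not be what you want.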

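For text/plain messages (which can't use images at all), the common convention for separating the signature from the rest of the message body is a line consisting of only 2 dashes and a space: "-- "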
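A minimal sketch of splitting a text/plain body on that delimiter line. PlainTextSignature and StripSignature are hypothetical names, and the sketch assumes the sending client did not strip the trailing space from the delimiter.

```csharp
using System;

// Hypothetical helper: separates a text/plain body from its signature.
static class PlainTextSignature
{
    // Splits the text at the first line that is exactly "-- ".
    // Returns the body without the signature; 'signature' is null when no delimiter line is found.
    public static string StripSignature (string text, out string signature)
    {
        var lines = text.Replace ("\r\n", "\n").Split ('\n');

        for (int i = 0; i < lines.Length; i++) {
            if (lines[i] == "-- ") {
                signature = string.Join ("\n", lines, i + 1, lines.Length - i - 1);
                return string.Join ("\n", lines, 0, i);
            }
        }

        signature = null;
        return text;
    }
}
```

Usage would be along the lines of: string signature; var body = PlainTextSignature.StripSignature (textPart.Text, out signature); where textPart is the text/plain TextPart of the message.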

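From my limited experience with HTML messages that have signatures, they do not seem to follow a similar convention. Looking at a few of the HTML messages I have received from co-workers at Microsoft who use Outlook, the signatures seem to be inside a <table> at the end of the message. However, this assumes that the message is not a reply. Once you start parsing message replies, this <table> ends up somewhere in the middle of the message, because the original message being replied to is at the end.

Since everyone's signature is different as well, I'm not sure whether this <table> is an Outlook convention or whether people construct their signatures by hand and all just happen to use tables (I have also only seen a few, since most people do not use signatures, so my sample size is very small).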

Using https://mailsystem.codeplex.com/:

A class for reading email:

// Requires the ActiveUp.Net.Mail namespace (MailSystem.NET) plus System,
// System.Collections.Generic and System.Linq.
class readMail : IDisposable
{
    public Imap4Client client = new Imap4Client();

    public readMail(string mailServer, int port, bool ssl, string login, string password)
    {
        if (ssl)
            client.ConnectSsl(mailServer, port);
        else
            client.Connect(mailServer, port);
        client.Login(login, password);
    }

    public IEnumerable<Message> GetAllMails(string mailBox)
    {
        return GetMails(mailBox, "ALL").Cast<Message>();
    }

    protected Imap4Client Client
    {
        get { return client ?? (client = new Imap4Client()); }
    }

    private MessageCollection GetMails(string mailBox, string searchPhrase)
    {
        // Exceptions are left to propagate to the caller.
        Mailbox mails = Client.SelectMailbox(mailBox);
        return mails.SearchParse(searchPhrase);
    }

    public void Dispose()
    {
        // Close the IMAP connection when the using block ends.
        if (client != null)
        {
            client.Disconnect();
            client = null;
        }
    }
}

and then:

using (readMail read = new readMail("host.name.information", port, true, username, password))
{
    var emailList = read.GetAllMails(this.folderEmail);
    int k = 0;
    Mailbox bbb = read.client.SelectMailbox(this.folderEmail);
    int[] unseen = bbb.Search("UNSEEN");
    foreach (Message email in emailList)
    {
        // Contains all parts for which no Content-Disposition header was found. Disposition is left to the final agent.
        MimePartCollection im1 = email.UnknownDispositionMimeParts;
        // Collection containing embedded MIME parts of the message (including text parts)
        EmbeddedObjectCollection im2 = email.EmbeddedObjects;
        // Collection containing attachments of the message.
        AttachmentCollection attach = email.Attachments;
    }
}

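In my case, all of the signature images were in UnknownDispositionMimeParts, but this could be a specific case (different email clients and so on). So, as far as I know, I have not found any library that separates embedded images used in the content from signature images.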
