Undo the "marked as read" status of emails fetched with imaplib



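I wrote a Python script to fetch all of my Gmail. I have hundreds of thousands of old emails, of which about 10,000 were unread.

After successfully fetching all of my emails, I found that Gmail had marked every fetched email as "read". This is disastrous for me, because I only need to check the emails that were unread.

How can I recover the information about which emails were unread? I dumped each mail object to a file; the core of my code looks like this: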

import email
import imaplib

m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login(user, pwd)
m.select("[Gmail]/All Mail")
resp, items = m.uid('search', None, 'ALL')
uids = items[0].split()
for uid in uids:
    # This fetch is what made Gmail mark each message as read
    resp, data = m.uid('fetch', uid, "(RFC822)")
    email_body = data[0][1]
    mail = email.message_from_string(email_body)
    dumpobj(uid, mail)

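I am hoping that there is either an option to undo this in Gmail, or a member of the stored mail objects that reflects the seen-state information.

For anyone who wants to prevent this headache, consider this answer here. However, it does not help me, since the damage has already been done.

Edit: I wrote the following function to recursively "grep" all strings in an object, and applied it to a dumped email object with the following keywords: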

regex = "(?i)((marked)|(seen)|(unread)|(read)|(flag)|(delivered)|(status)|(state))"

So far, no results (only an unrelated "Delivered-To"). What other keywords could I try?

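import re

def grep_object(obj, regex, cycle=None, matched=None):
    # Python 2 code: recursively collect every string reachable from obj
    # that matches regex. Fresh sets are created per top-level call, so
    # results do not leak between calls via shared default arguments.
    if cycle is None:
        cycle = set()
    if matched is None:
        matched = set()
    if id(obj) in cycle:  # already visited; guards against reference cycles
        return matched
    cycle.add(id(obj))
    if isinstance(obj, basestring):
        if re.search(regex, obj):
            matched.add(obj)
    def grep_dict(adict):
        try:
            for key, value in adict.iteritems():
                grep_object(key, regex, cycle, matched)
                grep_object(value, regex, cycle, matched)
        except Exception:  # adict is not a dict
            pass
    grep_dict(obj)
    try:
        grep_dict(obj.__dict__)
    except Exception:  # obj has no __dict__
        pass
    try:
        for elm in obj:
            grep_object(elm, regex, cycle, matched)
    except Exception:  # obj is not iterable
        pass
    return matched

grep_object(mail_object, regex)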

I ran into a similar problem (not with Gmail). For me the biggest hurdle was producing a reproducible test case, which I finally managed to do (see below).

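As far as the \Seen flag is concerned, this is what I now gather:

  • If a message is new/unseen, an IMAP fetch of its flags returns nothing for \Seen (i.e. the flag is simply absent for that email).
  • When you SELECT a mailbox (e.g. INBOX) over IMAP, the server returns an UNSEEN response code that refers to unseen mail in that folder (messages without the \Seen flag). Per RFC 3501 this is the sequence number of the first unseen message rather than a full list; SEARCH UNSEEN returns all of them.
  • In my test case, fetching a message's headers with BODY.PEEK does not set \Seen on the message; fetching them with BODY does.
  • In my test case, fetching (RFC822) does not set \Seen either (unlike what you saw with Gmail).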
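
The observations above suggest a way to download mail without touching the read state. The following is only a sketch in Python 3, not the code from the test case: `fetch_without_marking_read` is a hypothetical helper name, and it assumes the server honours a read-only mailbox and BODY.PEEK as described in RFC 3501. It also records the flags for each message, so that the read/unread information is stored next to the dump.

```python
import imaplib

def fetch_without_marking_read(conn, mailbox="INBOX"):
    """Yield (uid, flags, raw_message) for each message without setting \\Seen."""
    # readonly=True makes imaplib send EXAMINE instead of SELECT,
    # which opens the mailbox read-only.
    typ, _ = conn.select(mailbox, readonly=True)
    if typ != "OK":
        raise RuntimeError("could not open mailbox {!r}".format(mailbox))
    typ, data = conn.uid("search", None, "ALL")
    for uid in data[0].split():
        # Record the flags first, so the read/unread state is kept with the dump.
        typ, flag_data = conn.uid("fetch", uid, "(FLAGS)")
        flags = imaplib.ParseFlags(flag_data[0])
        # BODY.PEEK[] returns the whole message but does not set \Seen.
        typ, msg_data = conn.uid("fetch", uid, "(BODY.PEEK[])")
        yield uid, flags, msg_data[0][1]
```

Here `conn` would be an already logged-in `imaplib.IMAP4` or `imaplib.IMAP4_SSL` object. Mailbox names that contain spaces, such as "[Gmail]/All Mail", may need to be wrapped in double quotes before being passed in.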

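In the test case, I tried running pprint.pprint(inspect.getmembers(mail)) (in place of your dumpobj(uid, mail)), but only after I was sure that \Seen had been set. The output I got is posted as mail_object_inspect.txt. As far as I can see, none of the readable fields mentions "new/read/seen" or the like; moreover, mail.as_string() prints: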

'From: jesse@example.com\nTo: user@example.com\nSubject: This is a test message!\n\nHello. I am executive assistant to the director of\nBear Stearns, a failed investment Bank.  I have\naccess to USD6,000,000. ...\n'

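Even worse, "fields" are not mentioned anywhere in the imaplib code (the command below prints a filename only if the file does not contain "field", case-insensitively, anywhere):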

$ grep -L -i field /usr/lib/python{2.7,3.2}/imaplib.py
/usr/lib/python2.7/imaplib.py
/usr/lib/python3.2/imaplib.py

So I suppose that information was not saved along with your dumps.
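
Put differently, the read state lives in the server's FLAGS data rather than in the message itself, so it can only be read from a FETCH (FLAGS) response. A minimal Python 3 sketch of that check follows; `is_seen` is a hypothetical helper name, while `imaplib.ParseFlags` is the standard-library function that splits a FLAGS response into individual flags.

```python
import imaplib

def is_seen(fetch_item):
    """Return True if a FETCH (FLAGS) response item carries the \\Seen flag."""
    if isinstance(fetch_item, str):  # tolerate text that was already decoded
        fetch_item = fetch_item.encode("ascii")
    # ParseFlags returns a tuple of bytes, one entry per flag.
    return b"\\Seen" in imaplib.ParseFlags(fetch_item)

print(is_seen(b"1 (FLAGS (\\Seen))"))
print(is_seen(b"1 (FLAGS ())"))
```

Compared with a plain substring test for "Seen", matching whole flags avoids a false positive on a custom keyword that merely contains that word.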


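Here is some information on recreating the test case. The hardest part was finding a small IMAP server that can be run quickly with some arbitrary users and emails, without having to install a lot of things on your system. In the end I found one: trivial-server.pl, the example file of Perl's Net::IMAP::Server; tested on Ubuntu 11.04.

The test case is pasted in this gist, which contains two files (with many comments) that I will try to post here in abridged form:

  • trivial-serverB.pl - Perl (v5.10.1) Net::IMAP::Server server (with a terminal output paste of a telnet client session at the end of the file)
  • testimap.py - Python 2.7/3.2 imaplib client (with a terminal output paste at the end of the file, showing it running against the server)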

trivial-serverB.pl

First, make sure you have Net::IMAP::Server installed. Note that it has many dependencies, so the following command may take a while:

sudo perl -MCPAN -e 'install Net::IMAP::Server'

Then, in the directory where you saved trivial-serverB.pl, create a subdirectory containing the SSL certificate:

mkdir certs
openssl req \
  -x509 -nodes -days 365 \
  -subj '/C=US/ST=Oregon/L=Portland/CN=localhost' \
  -newkey rsa:1024 -keyout certs/server-key.pem -out certs/server-cert.pem

Finally, run the server with administrative privileges:

sudo perl trivial-serverB.pl

Note that trivial-serverB.pl contains a hack that lets a client connect without SSL. Here is trivial-serverB.pl:

#!/usr/bin/perl
use v5.10.1;
use feature qw(say);
use Net::IMAP::Server;
package Demo::IMAP::Hack;
$INC{'Demo/IMAP/Hack.pm'} = 1;
sub capabilityb {
  my $self = shift;
  print STDERR "Capabilitin'\n";
  my $base = $self->server->capability;
  my @words = split " ", $base;
  @words = grep {$_ ne "STARTTLS"} @words
    if $self->is_encrypted;
  unless ($self->auth) {
    my $auth = $self->auth || $self->server->auth_class->new;
    my @auth = $auth->sasl_provides;
    # hack:
    #unless ($self->is_encrypted) {
    #  # Lack of encryption makes us turn off all plaintext auth
    #  push @words, "LOGINDISABLED";
    #  @auth = grep {$_ ne "PLAIN"} @auth;
    #}
    push @words, map {"AUTH=$_"} @auth;
  }
  return join(" ", @words);
}
package Demo::IMAP::Auth;
$INC{'Demo/IMAP/Auth.pm'} = 1;
use base 'Net::IMAP::Server::DefaultAuth';
sub auth_plain {
    my ( $self, $user, $pass ) = @_;
    # XXX DO AUTH CHECK
    $self->user($user);
    return 1;
}
package Demo::IMAP::Model;
$INC{'Demo/IMAP/Model.pm'} = 1;
use base 'Net::IMAP::Server::DefaultModel';
sub init {
    my $self = shift;
    $self->root( Demo::IMAP::Mailbox->new() );
    $self->root->add_child( name => "INBOX" );
}
###########################################
package Demo::IMAP::Mailbox;
use base qw/Net::IMAP::Server::Mailbox/;
use Data::Dumper;
my $data = <<'EOF';
From: jesse@example.com
To: user@example.com
Subject: This is a test message!

Hello. I am executive assistant to the director of
Bear Stearns, a failed investment Bank.  I have
access to USD6,000,000. ...
EOF
my $msg = Net::IMAP::Server::Message->new($data);
sub load_data {
    my $self = shift;
    $self->add_message($msg);
}
my %ports = ( port => 143, ssl_port => 993 );
$ports{$_} *= 10 for grep {$> > 0} keys %ports;
$myserv = Net::IMAP::Server->new(
    auth_class  => "Demo::IMAP::Auth",
    model_class => "Demo::IMAP::Model",
    user        => 'nobody',
    log_level   => 3, # at least 3 to output 'CONNECT TCP Peer: ...' message; 4 to output IMAP commands too
    %ports,
);
# apparently, this overload MUST be after the new?! here:
{
no strict 'refs';
*Net::IMAP::Server::Connection::capability = \&Demo::IMAP::Hack::capabilityb;
}
# https://stackoverflow.com/questions/27206371/printing-addresses-of-perl-object-methods
say " -", $myserv->can('validate'), " -", $myserv->can('capability'), " -", \&Net::IMAP::Server::Connection::capability, " -", \&Demo::IMAP::Hack::capabilityb;
$myserv->run();

testimap.py

With the server above running in one terminal, you can run the following in another terminal:

python testimap.py

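The code simply reads the flags and content of the one (and only) message served by the server above, and finally restores the message to unseen by removing the \Seen flag.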

import sys
if sys.version_info[0] < 3: # python 2.7
  def uttc(x):
    return x
else:                       # python 3+
  def uttc(x):
    return x.decode("utf-8")
import imaplib
import email
import pprint,inspect
imap_user = 'nobody'
imap_password = 'whatever'
imap_server = 'localhost'
conn = imaplib.IMAP4(imap_server)
conn.debug = 3
try:
  (retcode, capabilities) = conn.login(imap_user, imap_password)
except:
  print(sys.exc_info()[1])
  sys.exit(1)
# not conn.select(readonly=1), else we cannot modify the Seen flag later
conn.select() # Select inbox or default namespace
(retcode, messages) = conn.search(None, '(UNSEEN)')
if retcode == 'OK':
  for num in uttc(messages[0]).split(' '):
    if not(num):
      print("No messages available: num is `{0}`!".format(num))
      break
    print('Processing message: {0}'.format(num))
    typ, data = conn.fetch(num,'(FLAGS)')
    isSeen = ( "Seen" in uttc(data[0]) )
    print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. ['1 (FLAGS ())']
            "Seen" if isSeen else "NEW"))
    print('Peeking headers, message: {0} '.format(num))
    typ, data = conn.fetch(num,'(BODY.PEEK[HEADER])')
    pprint.pprint(data)
    typ, data = conn.fetch(num,'(FLAGS)')
    isSeen = ( "Seen" in uttc(data[0]) )
    print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. ['1 (FLAGS ())']
            "Seen" if isSeen else "NEW"))
    print('Get RFC822 body, message: {0} '.format(num))
    typ, data = conn.fetch(num,'(RFC822)')
    mail = email.message_from_string(uttc(data[0][1]))
    #pprint.pprint(inspect.getmembers(mail))
    typ, data = conn.fetch(num,'(FLAGS)')
    isSeen = ( "Seen" in uttc(data[0]) )
    print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. ['1 (FLAGS ())']
            "Seen" if isSeen else "NEW"))
    print('Get headers, message: {0} '.format(num))
    typ, data = conn.fetch(num,'(BODY[HEADER])') # note, FLAGS (\Seen) is now in data, even if not explicitly requested!
    pprint.pprint(data)
    print('Get RFC822 body, message: {0} '.format(num))
    typ, data = conn.fetch(num,'(RFC822)')
    mail = email.message_from_string(uttc(data[0][1]))
    pprint.pprint(inspect.getmembers(mail)) # this is in mail_object_inspect.txt
    pprint.pprint(mail.as_string())
    typ, data = conn.fetch(num,'(FLAGS)')
    isSeen = ( "Seen" in uttc(data[0]) )
    print('Got flags: {2}: {0} .. {1}'.format(typ,data, # Seen: OK .. ['1 (FLAGS (\Seen))']
            "Seen" if isSeen else "NEW"))
    conn.select() # select again, to see flags server side
    # * OK [UNSEEN 0] # no more unseen messages (if there was only one msg in folder)
    print('Restoring flag to unseen/new, message: {0} '.format(num))
    ret, data = conn.store(num,'-FLAGS','\\Seen')
    if ret == 'OK':
      print("Set back to unseen; Got OK: {0}{1}{2}".format(data,'\n',30*'-'))
      print(mail)
      typ, data = conn.fetch(num,'(FLAGS)')
      isSeen = ( "Seen" in uttc(data[0]) )
      print('Got flags: {2}: {0} .. {1}'.format(typ,data, # NEW: OK .. [b'1 (FLAGS ())']
              "Seen" if isSeen else "NEW"))
conn.close()
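
The script above clears \Seen on one message by sequence number. If a list of UIDs that should be unread were available (the dumps in the question do not provide one, so this is purely an assumption), the same STORE command could be applied to all of them at once. A hedged Python 3 sketch, where `mark_unseen` is a hypothetical helper name and the mailbox is assumed to be selected read-write:

```python
def mark_unseen(conn, uids):
    """Remove \\Seen from the given UIDs in the currently selected mailbox."""
    if not uids:
        return True
    # STORE accepts a comma-separated message set, so one command covers all UIDs.
    message_set = ",".join(str(int(u)) for u in uids)
    typ, _ = conn.uid("store", message_set, "-FLAGS", "(\\Seen)")
    return typ == "OK"
```

For very long UID lists it is probably safer to send the UIDs in smaller batches, since servers may limit the length of a single command line.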

References

  • How do I mock an IMAP server in Python, despite extreme laziness?
  • Get only NEW Emails imaplib and python
  • Undo "marked as read" status of emails fetched with imaplib
  • http://www.skytale.net/blog/archives/23-Manual-IMAP.html
  • IMAP FETCH Subject
  • https://mail.python.org/pipermail/python-list/2009-March/527020.html
  • http://www.thecodingforums.com/threads/re-imaplib-fetch-message-flags.673872/
