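I have Python code that pulls Twitter data through the streaming API. I want a separate file for each day, so my plan is to let the script run for 24 hours, then kill it and restart it, because the file name changes when the program restarts.
How can I make sure the script stops at 00:00 and restarts immediately? The code is below. If you have any other ideas for creating a new text file each day, that would be even better.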
import tweepy
import datetime

key_words = ["xx"]
# date_today was not defined in the original snippet; today's date is assumed.
date_today = str(datetime.date.today())
# xx is a placeholder for the file prefix; the API credentials are likewise defined elsewhere.
twitter_data_title = "".join([xx, "_", date_today, ".txt"])


class TwitterStreamer():
    def __init__(self):
        pass

    def stream_tweets(self, twitter_data_title, key_words):
        listener = StreamListener(twitter_data_title)
        auth = tweepy.OAuthHandler(api_key, api_secret_key)
        auth.set_access_token(access_token, access_secret_token)
        stream = tweepy.Stream(auth, listener)
        stream.filter(track=key_words)


class StreamListener(tweepy.StreamListener):
    def __init__(self, twitter_data_title):
        self.fetched_tweets_filename = twitter_data_title

    def on_data(self, data):
        try:
            print(data)
            # Append each raw tweet to the file for this run.
            with open(self.fetched_tweets_filename, 'a') as tf:
                tf.write(data)
            return True
        except BaseException as e:
            print("Error on_data %s" % str(e))
        return True

    def on_exception(self, exception):
        print('exception', exception)
        # Restart the stream via the module-level function below.
        stream_tweets(twitter_data_title, key_words)

    def on_error(self, status):
        print(status)


def stream_tweets(twitter_data_title, key_words):
    listener = StreamListener(twitter_data_title)
    auth = tweepy.OAuthHandler(api_key, api_secret_key)
    auth.set_access_token(access_token, access_secret_token)
    stream = tweepy.Stream(auth, listener)
    stream.filter(track=key_words)


if __name__ == '__main__':
    twitter_streamer = TwitterStreamer()
    twitter_streamer.stream_tweets(twitter_data_title, key_words)
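It looks like the "blocking" code in your example comes from another library, so you have no (easy) way to change the inner loop so that it checks a condition and exits.
Using a background process (not ideal)
You could change your entry point to start the code in a background process and check whether the file title should change: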
from multiprocessing import Process
from time import sleep
...
if __name__ == "__main__":
    twitter_streamer = TwitterStreamer()
    twitter_data_title, process = None, None
    while True:
        new_data_title = "".join([xx, "_", str(datetime.date.today()), ".txt"])
        if new_data_title == twitter_data_title:  # Nothing to do.
            sleep(60)  # Sleep for a minute
            continue  # And check again
        # Set the new title.
        twitter_data_title = new_data_title
        # If the process is already running, terminate and join it.
        if process is not None:
            process.terminate()
            process.join()
        process = Process(target=twitter_streamer.stream_tweets, args=[twitter_data_title, key_words])
        process.start()
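Changing StreamListener
A better option might be to encode knowledge of the date into StreamListener. Instead of passing a file name (twitter_data_title), pass a file prefix (xx in your example) and build the file name in a property: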
...
class StreamListener(tweepy.StreamListener):
    def __init__(self, file_prefix):
        self.prefix = file_prefix

    @property
    def fetched_tweets_filename(self):
        """The file name for the tweets."""
        date = datetime.date.today()
        return f"{self.prefix}_{date}.txt"
    ...
...
if __name__ == "__main__":
    twitter_streamer = TwitterStreamer()
    twitter_streamer.stream_tweets(xx, key_words)
Because StreamListener.on_data takes the file name from self.fetched_tweets_filename, this should mean that tweets are written to a new file once the date changes.
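To see the rollover behaviour without connecting to Twitter, here is a minimal sketch of the same property idea. DailyFileWriter, its today parameter, and demo are hypothetical names, not part of tweepy; the injectable clock exists only so the date change can be simulated.

```python
import datetime
import os
import tempfile


class DailyFileWriter:
    """Appends text to a file whose name is rebuilt from the date on every write."""

    def __init__(self, prefix, today=datetime.date.today):
        self.prefix = prefix
        self._today = today  # assumption: a callable returning a datetime.date

    @property
    def filename(self):
        # Evaluated on each access, so the name follows the current date.
        return f"{self.prefix}_{self._today()}.txt"

    def write(self, data):
        with open(self.filename, "a") as f:
            f.write(data)


def demo():
    """Write on two simulated days and return the file names that were created."""
    tmp = tempfile.mkdtemp()
    current = [datetime.date(2020, 1, 1)]  # mutable stand-in for the real clock
    writer = DailyFileWriter(os.path.join(tmp, "xx"), today=lambda: current[0])
    writer.write("tweet on day one\n")
    current[0] = datetime.date(2020, 1, 2)  # simulate passing midnight
    writer.write("tweet on day two\n")
    return sorted(os.listdir(tmp))
```

Because the file is opened inside write on every call, nothing needs to be restarted or closed at midnight; the next write simply lands in the new file.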
I would add this to your code:
from threading import Timer

def stopTheScript():
    exec(open("anotherscript.py").read())
    exit()

Timer(86400, stopTheScript).start()  # 86400 s = 24 h
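Note that a fixed 86400 s timer fires 24 hours after the script starts, which is 00:00 only if the script happened to start at 00:00. A minimal sketch of computing the delay to the next midnight instead follows; seconds_until_midnight is a hypothetical helper name, and it assumes naive local time, so daylight-saving transitions are not handled.

```python
import datetime


def seconds_until_midnight(now=None):
    """Return the seconds from `now` (default: current local time) to the next 00:00."""
    if now is None:
        now = datetime.datetime.now()
    tomorrow = now.date() + datetime.timedelta(days=1)
    next_midnight = datetime.datetime.combine(tomorrow, datetime.time.min)
    return (next_midnight - now).total_seconds()
```

The result could then replace the constant, for example Timer(seconds_until_midnight(), stopTheScript).start().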