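I'm new to Python, so I'm struggling a bit with this. Basically, the code below fetches the text of tweets with the #bitcoin hashtag, and I want to extract the date and author along with the text. I've tried several approaches but I'm still stuck. Any help would be much appreciated.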
import pandas as pd
import numpy as np
import tweepy
api_key = '*'
api_secret_key = '*'
access_token = '*'
access_token_secret = '*'
authentication = tweepy.OAuthHandler(api_key, api_secret_key)
authentication.set_access_token(access_token, access_token_secret)
api = tweepy.API(authentication, wait_on_rate_limit=True)
#Get tweets about Bitcoin and filter out any retweets
search_term = '#bitcoin -filter:retweets'
tweets = tweepy.Cursor(api.search_tweets, q=search_term, lang='en', since='2018-11-01', tweet_mode='extended').items(50)
all_tweets = [tweet.full_text for tweet in tweets]
df = pd.DataFrame(all_tweets, columns=['Tweets'])
df.head()
If you use dir(tweet), you will see all the variables and functions in the tweet object:
author
contributors
coordinates
created_at
destroy
display_text_range
entities
extended_entities
favorite
favorite_count
favorited
full_text
geo
id
id_str
in_reply_to_screen_name
in_reply_to_status_id
in_reply_to_status_id_str
in_reply_to_user_id
in_reply_to_user_id_str
is_quote_status
lang
metadata
parse
parse_list
place
possibly_sensitive
retweet
retweet_count
retweeted
retweets
source
source_url
truncated
user
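To narrow that listing down to data fields only, one option is to drop private names and callables. This is a minimal sketch: `data_attributes` is a hypothetical helper name, and `fake_tweet` is a stand-in object used for illustration, not a real tweepy object.

```python
from types import SimpleNamespace


def data_attributes(obj):
    """Return the public, non-callable attribute names of obj, sorted."""
    return sorted(
        name for name in dir(obj)
        if not name.startswith('_') and not callable(getattr(obj, name))
    )


# Stand-in object for illustration only; a real tweet has many more fields.
fake_tweet = SimpleNamespace(
    full_text='hello',
    created_at='2022-03-26',
    retweet=lambda: None,  # a method-like attribute, filtered out below
)
print(data_attributes(fake_tweet))
```

Calling `data_attributes(tweet)` on a real tweet inside the loop should leave names such as `author`, `created_at` and `full_text`, and hide methods such as `destroy` and `retweet`.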
Among them is created_at, which you can collect together with the text:
all_tweets = []
for tweet in tweets:
    # print('\n'.join(dir(tweet)))  # uncomment to list every attribute
    all_tweets.append([tweet.full_text, tweet.created_at])
df = pd.DataFrame(all_tweets, columns=['Tweets', 'Created At'])
df.head()
Result:
Tweets Created At
0 @Ralvero Of course $KAWA ready for 100x 🚀#ETH ... 2022-03-26 13:51:06+00:00
1 Pairs:1INCHUSDT \n SELL:1.58500\n Time :3/26/2... 2022-03-26 13:51:06+00:00
2 @hotcrosscom @iSafePal 🌐 First LIVE Dapp: Cylu... 2022-03-26 13:51:04+00:00
3 @Justdoitalex @Isabel_Schnabel Finally a truth... 2022-03-26 13:51:03+00:00
4 #Bitcoin has rejected for the fourth time the ... 2022-03-26 13:50:55+00:00
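The author can be collected the same way, since `author` also appears in the `dir(tweet)` listing. The sketch below assumes `tweet.author` is a user object with a `screen_name` attribute (the @handle). `tweet_rows` is a hypothetical helper name, and the sample data is made up so the snippet runs without API credentials.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def tweet_rows(tweets):
    """Turn tweet-like objects into dicts holding text, date and author."""
    rows = []
    for tweet in tweets:
        rows.append({
            'Tweets': tweet.full_text,
            'Created At': tweet.created_at,
            # Assumption: author is a user object exposing screen_name
            'Author': tweet.author.screen_name,
        })
    return rows


# Made-up stand-in for one tweet, for illustration only
sample = [
    SimpleNamespace(
        full_text='#Bitcoin test tweet',
        created_at=datetime(2022, 3, 26, 13, 51, 6, tzinfo=timezone.utc),
        author=SimpleNamespace(screen_name='example_user'),
    )
]
rows = tweet_rows(sample)
print(rows[0]['Author'], rows[0]['Created At'])
```

With real data, `df = pd.DataFrame(tweet_rows(tweets))` should give one column per key.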
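However, your code has a problem with since, because it seems to have been removed in version 3.8.
See: Collect tweets in a specific time period in Tweepy, until doesn't work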
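If `since` is not accepted, one workaround is to filter on `created_at` after fetching. This is only a sketch: `created_on_or_after` is a hypothetical helper name, the sample tweets are made up, and it assumes `created_at` is a timezone-aware datetime, as the `+00:00` in the result above suggests. Comparing it with a naive datetime would raise a TypeError.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def created_on_or_after(tweets, since):
    """Yield only the tweets whose created_at is on or after `since`."""
    for tweet in tweets:
        if tweet.created_at >= since:
            yield tweet


# Made-up stand-ins for two tweets, for illustration only
sample = [
    SimpleNamespace(full_text='old',
                    created_at=datetime(2022, 3, 1, tzinfo=timezone.utc)),
    SimpleNamespace(full_text='new',
                    created_at=datetime(2022, 3, 26, tzinfo=timezone.utc)),
]
cutoff = datetime(2022, 3, 20, tzinfo=timezone.utc)
recent = [t.full_text for t in created_on_or_after(sample, cutoff)]
print(recent)
```

Note that the standard search endpoint only covers roughly the past week, so a cutoff like 2018-11-01 would not bring back older tweets either way.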