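I'm trying to get price data for each company in a list of tuples [(company_name, symbol)]. In this example, I'm using the TD Ameritrade API.
I have the same problem with Reddit. The only difference is that there I'm trying to retrieve all comments for each post 'id', and with the Reddit code I pull the IDs from a pandas df instead of a list.
Here's where I am now: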
This is the TD Ameritrade API code:
async def run_app(symbols):
    # empty list to append dataframes to
    all_dfs = []
    # symbols is a list of (company_name, symbol) tuples, so the name comes first
    for name, sym in symbols:
        # gets the price history data
        # (assumes get_price_data is a separate helper; asyncio itself has no such function)
        rdata = get_price_data(symbol=sym, period='10', periodType='day',
                               frequency='minute', frequencyType='1')
        # prepares data for pandas df
        can = unpack_ph_data(rdata, 'candles', 'symbol')
        # function for creating a pandas df
        df = create_ph_df(can)
        # append to all_dfs
        all_dfs.append(df)
    # return every DataFrame, not just the last raw response
    return all_dfs
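For comparison, here is a minimal sketch of the same per-symbol loop run concurrently with asyncio.gather. fetch_all_price_histories and fetch_candles are hypothetical names, not part of any TD Ameritrade client; fetch_candles(symbol) is assumed to be a coroutine function that returns the list of candle dicts (the 'candles' part of the price-history response).

```python
import asyncio

import pandas as pd


async def fetch_all_price_histories(symbols, fetch_candles):
    """Fetch candles for every (company_name, symbol) pair concurrently.

    fetch_candles is a hypothetical coroutine function: fetch_candles(symbol)
    is assumed to return a list of candle dicts for that symbol.
    """

    async def fetch_one(name, sym):
        candles = await fetch_candles(sym)
        df = pd.DataFrame(candles)
        # tag each row so the combined frame stays traceable
        df["symbol"] = sym
        df["company_name"] = name
        return df

    # gather keeps results in the same order as the input list
    frames = await asyncio.gather(*(fetch_one(name, sym) for name, sym in symbols))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

This only helps if fetch_candles really awaits network I/O (for example through an async HTTP library). A blocking call wrapped in async def still blocks the event loop, so the requests would run one after another anyway. Usage: prices = asyncio.run(fetch_all_price_histories(symbols, fetch_candles)).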
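My idea was to use a for loop that runs each step for every item in the symbols list. At first I tried it without asyncio; then I saw a similar example (one that didn't use an API), so I thought I'd try it.
Reddit: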
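I'm trying to do something similar with the praw package. For this one, though, I'm pulling data from each row of a pandas df, and I run into the same problem. I have a function that takes a subreddit and returns all of its data in a pandas df: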
import pandas as pd

# assumes reddit = praw.Reddit(...) has been created earlier

def get_subreddit_data(subreddit="all", limit=25):
    """
    :param subreddit: which subreddit to get top posts from.
    :param limit: number of desired posts (used as .hot(limit=num_of_posts); will
                  eventually come from user input). Default 25.
    :returns: top posts in the subreddit, or the top posts on reddit by default.
              Data is returned in a pandas df with the columns
              title, score, id, subreddit, url, comments, selftext, created.
    """
    # empty list to insert data into
    posts = []
    # listing of hot posts (limit and subreddit params)
    top_posts = reddit.subreddit(subreddit).hot(limit=limit)
    # FOR loop to append data to posts
    for post in top_posts:
        posts.append([post.title, post.score, post.id, post.subreddit,
                      post.url, post.num_comments, post.selftext, post.created])
    # Create df
    df = pd.DataFrame(posts, columns=["title", "score", "id", "subreddit",
                                      "url", "comments", "selftext", "created"])
    return df
df = get_subreddit_data(subreddit, limit)
This works fine and returns a pandas df. This is where I run into problems:
IDs = []
for ID in [df["id"]]:
    IDs.append(ID)  # Add IDs to ID list

def return_comments_for(ID_list):
    """
    :param ID_list: list of post IDs
    :returns: list of comments for each post ID in ID_list
    """
    # empty list to append comments to
    _comments = []
    # for loop to extract each ID one by one
    for ID in ID_list:
        # Create submission instance
        submission = reddit.submission(id=ID)
        submission.comments.replace_more(limit=None)
        for comment in submission.comments.list():
            _comments.append(comment.body)

comments = return_comments_for(IDs)
That didn't work, so I tried it without a function, using a queue instead:
# Empty list for all IDs
queue = []
IDs = [df["id"]]  # get IDs from DF
for i in IDs:
    queue.append(i)  # Add IDs to queue

# list to append comments to
_comments = []
while queue:
    # pop item index 0 and assign to ID
    ID = queue.pop(0)
    # create submission instance for ID
    submission = reddit.submission(id=ID)
    submission.comments.replace_more(limit=None)
    # for each comment in submission instance
    for comment in submission.comments.list():
        _comments.append(comment.body)  # append to main comment list
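That isn't the only way I tried using a queue or stack; I tried many different approaches that I can't all remember. Either way, none of them worked, so I'm missing something.
This is the full error message I get every time, no matter what I try: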
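If a queue is really wanted, here is a minimal sketch with collections.deque. The important part is building the queue from the Series values, deque(df["id"]), rather than from a list that wraps the Series. drain_queue and handle_id are hypothetical names; handle_id stands for whatever returns the comment bodies for one post ID.

```python
from collections import deque


def drain_queue(ids, handle_id):
    """Process IDs first-in, first-out and collect everything handle_id returns.

    ids: any iterable of IDs, e.g. df["id"] (iterating a Series yields its values).
    handle_id: hypothetical callable taking one ID and returning a list.
    """
    # one queue entry per ID; deque([df["id"]]) would hold a single Series instead
    queue = deque(ids)
    results = []
    while queue:
        ID = queue.popleft()  # O(1), whereas list.pop(0) shifts every remaining item
        results.extend(handle_id(ID))
    return results
```

A plain for loop over df["id"] does the same job here; the queue only pays off if new IDs get appended while it is being drained.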
ValueError                                Traceback (most recent call last)
<ipython-input-9-2cddf98f54ba> in <module>
     20 
     21 
---> 22 comments = return_comments_for(IDs)
     23 print(comments)

<ipython-input-9-2cddf98f54ba> in return_comments_for(ID_list)
     14     for ID in ID_list:
     15         # Create submission instance
---> 16         submission = reddit.submission(id=ID)
     17         submission.comments.replace_more(limit=None)
     18         for comment in submission.comments.list():

C:\ProgramData\Anaconda3\lib\site-packages\praw\reddit.py in submission(self, id, url)
    847 
    848         """
--> 849         return models.Submission(self, id=id, url=url)

C:\ProgramData\Anaconda3\lib\site-packages\praw\models\reddit\submission.py in __init__(self, reddit, id, url, _data)
    532 
    533         """
--> 534         if (id, url, _data).count(None) != 2:
    535             raise TypeError("Exactly one of `id`, `url`, or `_data` must be provided.")
    536         self.comment_limit = 2048

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1476 
   1477     def __nonzero__(self):
-> 1478         raise ValueError(
   1479             f"The truth value of a {type(self).__name__} is ambiguous. "
   1480             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
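After

IDs = []
for ID in [df["id"]]:
    IDs.append(ID)  # Add IDs to ID list

IDs is not a list of post IDs; it is a list containing exactly one pandas Series, namely df["id"]. That whole Series is what gets passed to reddit.submission(id=ID), and praw's (id, url, _data).count(None) check then fails with "The truth value of a Series is ambiguous".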
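To see where the error comes from, here is a minimal reproduction that needs only pandas (the IDs are made up). The traceback shows praw's Submission.__init__ running (id, url, _data).count(None); tuple.count compares each element with None using ==, and for a Series that comparison returns a boolean Series, whose truth value pandas refuses to guess.

```python
import pandas as pd


def reproduce_ambiguous_truth_error():
    """Return the message of the ValueError from the traceback, without calling Reddit."""
    ids = pd.Series(["abc123", "def456"], name="id")  # made-up stand-in for df["id"]
    wrapped = [ids]  # what [df["id"]] builds: a one-element list holding the whole Series
    for ID in wrapped:  # the loop runs once, and ID is the Series itself
        try:
            # Same check as praw's Submission.__init__: count() evaluates ID == None,
            # which yields a boolean Series, and converting that Series to bool raises.
            (ID, None, None).count(None)
        except ValueError as exc:
            return str(exc)
    return None


print(reproduce_ambiguous_truth_error())
```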
# This does what you were trying to do:
IDs = []
for ID in df["id"]:
    IDs.append(ID)
# Which can be shortened to
IDs = list(df["id"])
# But I think just passing the Series to your function should work fine:
comments = return_comments_for(df["id"])
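A separate bug: comments will be None after this, because return_comments_for never returns anything, so it implicitly returns None. Add return _comments as the last line of the function.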
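Putting both fixes together, a sketch of the corrected function. Passing reddit in as a parameter instead of relying on a global is a choice made here, not something the original code requires; it is assumed to be a praw.Reddit instance, and ID_list can be a plain list or the df["id"] Series itself.

```python
def return_comments_for(reddit, ID_list):
    """Return the body of every comment for each post ID in ID_list.

    reddit: assumed to be a praw.Reddit instance (passed in rather than global).
    ID_list: a list of post IDs, or df["id"] directly.
    """
    _comments = []
    for ID in ID_list:  # iterating a Series yields its values, one ID at a time
        submission = reddit.submission(id=ID)
        # expand every "load more comments" placeholder in the comment tree
        submission.comments.replace_more(limit=None)
        # list() flattens the comment tree into one list
        for comment in submission.comments.list():
            _comments.append(comment.body)
    return _comments  # without this line the function returns None
```

Usage: comments = return_comments_for(reddit, df["id"]). Note that replace_more(limit=None) issues an extra request per "load more" stub, so large threads can be slow.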
The way you're trying to extract the list of IDs is wrong; just do this:
IDs = df["id"].values.tolist()
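For reference, the ways of turning the id column into a list, side by side on a small DataFrame with made-up IDs. Every form except the wrapped one yields the same plain list of IDs; Series.tolist() gives that result directly without going through .values.

```python
import pandas as pd

df = pd.DataFrame({"id": ["abc123", "def456"], "score": [10, 3]})  # toy frame, made-up IDs

wrapped = [df["id"]]               # wrong: a one-element list holding the Series
looped = [ID for ID in df["id"]]   # iterating a Series yields its values
as_list = list(df["id"])
via_values = df["id"].values.tolist()
via_tolist = df["id"].tolist()

print(len(wrapped))   # 1
print(via_tolist)     # ['abc123', 'def456']
```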