How to make an API call for each item in a list



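I am trying to get price data for each company in a list of tuples, [(company_name, symbol)]. In this example I am using the TD Ameritrade API.

I have the same problem with Reddit. The only difference is that there I am trying to retrieve all the comments for each post 'id', and in the Reddit code I pull the IDs from a pandas DataFrame rather than from a list.

This is where I am now: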

  • This is the TD Ameritrade API:

    async def run_app(symbols):
        # empty list to append dataframes to
        all_dfs = []
        for sym, name in symbols:
            # gets the price history data
            rdata = asyncio.get_price_data(symbol=sym, period='10', periodType='day', frequency='minute', frequencyType='1')
            # prepares data for pandas df
            can = unpack_ph_data(rdata, 'candles', 'symbol')
            # function for creating a pandas df
            df = create_ph_df(can)
            # append to all_dfs
            all_dfs.append(df)
        return rdata
    

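    My idea is to use a for loop that runs each step for every item in the symbols list. I first tried it without asyncio; then I saw a similar example (though one without an API), so I thought I would try asyncio.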

  • Reddit:

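    I am trying to do something similar with the praw package. Here, though, I extract the data from each row of a pandas DataFrame, and I run into the same problem.

    I have a function that takes a given subreddit and returns all of its data in a pandas DataFrame: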

    def get_subreddit_data(subreddit="all", limit=25):
        """
        :param subreddit: Which subreddit to get top posts.
        :param limit: number of desired posts (This will be for setting the limit
        after .hot(limit=num_of_posts) and eventually determined from user input).
        Default 25
        :returns: top posts in subreddit or default (top posts on reddit).
        data is returned in pandas df with the following columns:
        >>> title, score, id, subreddit, url, comments, selftext, created <<<
        """
        # empty list to insert data to:
        posts = []
        # variable for data
        top_posts = reddit.subreddit(subreddit).hot(limit=limit) # limit and subreddit params
        # FOR loop to append data to posts
        for post in top_posts:
            posts.append([post.title, post.score, post.id, post.subreddit, post.url, post.num_comments, post.selftext, post.created])
        # Create df
        df = pd.DataFrame(posts, columns=["title","score", "id", "subreddit", "url", "comments","selftext", "created"])
        return df
    

    df = get_subreddit_data(subreddit, limit) works fine and returns a pandas DataFrame.

    This is where I run into problems:

    IDs = []
    for ID in [df["id"]]:
        IDs.append(ID) # Add IDs to ID list

    def return_comments_for(ID_list):
        # empty list to append comments to
        _comments = []
        """
        :param ID_list: list of post IDs
        :returns: list of comments for each post ID
        in ID_list
        """
        # for loop to extract each ID one by one
        for ID in ID_list:
            # Create submission instance
            submission = reddit.submission(id=ID)
            submission.comments.replace_more(limit=None)
            for comment in submission.comments.list():
                _comments.append(comment.body)


    comments = return_comments_for(IDs)
    

    That did not work, so I tried it without defining a function, using a queue instead:

    # Empty list for all IDs
    queue = []
    IDs = [df["id"]] # get IDs from DF
    for i in IDs:
        queue.append(i) # Add IDs to queue
    # list to append comments to
    _comments = []
    while queue:
        # pop item index 0 and assign to ID
        ID = queue.pop(0)
        # create submission instance for ID
        submission = reddit.submission(id=ID)
        submission.comments.replace_more(limit=None)
        # for each comment in submission instance
        for comment in submission.comments.list():
            _comments.append(comment.body) # append to main comment list
    

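    This is not the only way I tried to use a queue or a stack. I tried many different approaches that I can no longer remember. Either way, none of them worked, so I am missing something.

    This is the full error message I get every time, no matter what I try.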

    ValueError                                Traceback (most recent call last)
    <ipython-input-9-2cddf98f54ba> in <module>
         20 
         21 
    ---> 22 comments = return_comments_for(IDs)
         23 print(comments)

    <ipython-input-9-2cddf98f54ba> in return_comments_for(ID_list)
         14     for ID in ID_list:
         15         # Create submission instance
    ---> 16         submission = reddit.submission(id=ID)
         17         submission.comments.replace_more(limit=None)
         18         for comment in submission.comments.list():

    C:\ProgramData\Anaconda3\lib\site-packages\praw\reddit.py in submission(self, id, url)
        847 
        848         """
    --> 849         return models.Submission(self, id=id, url=url)

    C:\ProgramData\Anaconda3\lib\site-packages\praw\models\reddit\submission.py in __init__(self, reddit, id, url, _data)
        532 
        533         """
    --> 534         if (id, url, _data).count(None) != 2:
        535             raise TypeError("Exactly one of `id`, `url`, or `_data` must be provided.")
        536         self.comment_limit = 2048

    C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
       1476 
       1477     def __nonzero__(self):
    -> 1478         raise ValueError(
       1479             f"The truth value of a {type(self).__name__} is ambiguous. "
       1480             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
    

After

    IDs = []
    for ID in [df["id"]]:
        IDs.append(ID) # Add IDs to ID list

IDs is not a list of individual post IDs but a list of pandas objects: it contains exactly one Series, namely df["id"].
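
The effect can be reproduced without pandas. In the minimal sketch below, a plain list stands in for df["id"] (an assumption made only so the snippet runs on its own): wrapping it in brackets yields a single element, while iterating it directly yields the individual values.

```python
# A plain list stands in for df["id"] here (assumption for illustration only).
id_column = ["abc123", "def456", "ghi789"]

# Wrapping the column in brackets creates a one-element list,
# so the loop body runs once and appends the whole column.
wrapped = []
for item in [id_column]:
    wrapped.append(item)

# Iterating the column itself visits every value.
flat = []
for item in id_column:
    flat.append(item)

print(len(wrapped))  # 1
print(len(flat))     # 3
```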

    # This does what you were trying to do:
    IDs = []
    for ID in df["id"]:
        IDs.append(ID)

    # Which can be shortened to
    IDs = list(df["id"])
    # But I think just passing the Series to your function should work fine:
    comments = return_comments_for(df["id"])
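
Why a one-element list of Series produces this particular ValueError can be read off the traceback: PRAW evaluates (id, url, _data).count(None), and tuple.count compares each element with == and then takes the truth value of the result. For a Series, == is element-wise, so the result has no single truth value. The sketch below imitates that mechanism with two hypothetical stand-in classes (FakeSeries and ElementwiseResult are not pandas or PRAW names), so it runs without either library.

```python
class ElementwiseResult:
    """Hypothetical stand-in for the result of `series == None`."""

    def __bool__(self):
        # pandas raises here because a whole column has no single truth value
        raise ValueError("The truth value of a Series is ambiguous.")


class FakeSeries:
    """Hypothetical stand-in for a pandas Series; only `==` is modelled."""

    def __eq__(self, other):
        return ElementwiseResult()


def count_none(post_id, url=None, _data=None):
    # Same kind of check as the PRAW line shown in the traceback
    return (post_id, url, _data).count(None)


print(count_none("abc123"))  # a plain string ID passes the check

try:
    count_none(FakeSeries())  # a Series-like object triggers the error
except ValueError as exc:
    print("ValueError:", exc)
```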

There is also a second, real bug: after this, comments will be None, because return_comments_for does not return anything and therefore implicitly returns None.
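
A minimal corrected sketch of return_comments_for follows. It uses only the PRAW calls already present in the question; passing the reddit instance in as a parameter is a change made here so that the function does not depend on a global.

```python
def return_comments_for(reddit, id_list):
    """
    :param reddit: object exposing submission(id=...), e.g. a praw.Reddit instance
    :param id_list: iterable of post IDs (a list or a pandas Series both work)
    :returns: flat list with the body of every comment of every post
    """
    comments = []
    for post_id in id_list:
        # Create a submission instance for this single ID
        submission = reddit.submission(id=post_id)
        # Resolve the "load more comments" placeholders before listing
        submission.comments.replace_more(limit=None)
        for comment in submission.comments.list():
            comments.append(comment.body)
    # Without this return the caller would receive None
    return comments
```

With a configured reddit instance, this would be called as comments = return_comments_for(reddit, df["id"]).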

The way you are trying to extract the list of IDs is wrong. Just do the following:

    IDs = df["id"].values.tolist()
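
The answers above only cover the Reddit code. In the TD Ameritrade snippet from the question, three things stand out: the tuples are described as (company_name, symbol) but unpacked as sym, name; the asyncio module has no get_price_data function, so that call presumably belongs to the asker's own helper; and the function returns rdata (the last response) rather than all_dfs. The values given for frequency and frequencyType also look swapped, which is worth checking against the API documentation. A minimal sketch of the per-item loop under those assumptions is below. The fetch callable and its keyword arguments are placeholders for the asker's get_price_data, which is not shown in the question.

```python
def run_app(symbols, fetch, **request_params):
    """Call `fetch` once per (company_name, symbol) tuple and collect the results."""
    results = []
    for name, sym in symbols:  # unpack in the order the tuples are built
        data = fetch(symbol=sym, **request_params)
        results.append((sym, data))
    return results  # return everything collected, not just the last response


# Placeholder fetcher so the sketch runs on its own
# (hypothetical, not the TD Ameritrade API).
def fake_fetch(symbol, **params):
    return {"symbol": symbol, "candles": [], "params": params}


symbols = [("Apple Inc.", "AAPL"), ("Microsoft Corp.", "MSFT")]
for sym, data in run_app(symbols, fake_fetch, periodType="day", period="10"):
    print(sym, data["params"])
```

In the real code, fetch would be get_price_data, and each response would go through unpack_ph_data and create_ph_df before being appended. Since nothing in the loop is awaited, a plain def is enough; async def only helps if the fetch itself is a coroutine.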
