How to scrape Facebook data using the Graph API and a user token



I am trying to scrape Facebook data from public pages.

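The code I am using worked fine a few months ago (maybe 10 months ago). Now that I want to continue the project, the code no longer works. I used to use my private user token, which expired after a few minutes, but that was enough for my use case. I did not need an app, app review, etc. to get a permanent token.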

Here is the code:

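import json
from urllib.request import urlopen

import numpy as np
import pandas as pd

def getData(page, urlToConnect, startTime, filterStart, filterEnd):
    posts = []
    found = False
    df = pd.DataFrame()  # returned empty if the request fails before any data arrives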
    try:
        while True:
            facebook_connection = urlopen(urlToConnect)
            data = facebook_connection.read().decode('utf8')
            json_object = json.loads(data)
            allposts = json_object["data"]
            created = pd.to_datetime(startTime, utc=True)
            for i, post in enumerate(allposts):
                # Keep every post up to and including the first one at or before startTime
                posts.append(post)
                if pd.to_datetime(post['created_time'], utc=True) <= created:
                    print(" found date at this index: ", i)
                    found = True
                    break
            # Stop at the start date, on an empty page, or when there is no next page
            next_url = json_object.get("paging", {}).get("next")
            if found or not allposts or next_url is None:
                break
            urlToConnect = next_url

        # Build the frame from all collected posts, not only the last page
        df = pd.DataFrame(posts)
        # Strip each nested summary dict (rendered as a string) down to its bare count
        df['Angry'] = df['Angry'].astype(str).str.replace(r"\{'data':(.*?)count': ", '', regex=True)
        df['Angry'] = df['Angry'].str.replace(r",(.*?)\}\}", '', regex=True)
        df['Haha'] = df['Haha'].astype(str).str.replace(r"\{'data':(.*?)count': ", '', regex=True)
        df['Haha'] = df['Haha'].str.replace('}}', '', regex=False)
        df['Love'] = df['Love'].astype(str).str.replace(r"\{'data':(.*?)count': ", '', regex=True)
        df['Love'] = df['Love'].str.replace('}}', '', regex=False)
        df['Sad'] = df['Sad'].astype(str).str.replace(r"\{'data':(.*?)count': ", '', regex=True)
        df['Sad'] = df['Sad'].str.replace(r",(.*?)\}\}", '', regex=True)
        df['Wow'] = df['Wow'].astype(str).str.replace(r"\{'data':(.*?)count': ", '', regex=True)
        df['Wow'] = df['Wow'].str.replace('}}', '', regex=False)
        df['comments'] = df['comments'].astype(str).str.replace(r"\{'data':(.*?)count': ", '', regex=True)
        df['comments'] = df['comments'].str.replace(r",(.*?)\}\}", '', regex=True)
        df['likes'] = df['likes'].astype(str).str.replace(r"\{'(.*?)count':", '', regex=True)
        df['likes'] = df['likes'].str.replace(r",(.*?)\}\}", '', regex=True)
        df['shares'] = df['shares'].astype(str).str.replace("{'count': ", '', regex=False)
        df['shares'] = df['shares'].str.replace('}', '', regex=False)
        # Split the ISO timestamp into separate date and time columns
        df[['date', 'time']] = df['created_time'].astype(str).str.split('T', n=1, expand=True)
        df['time'] = df['time'].str.replace('+0000', '', regex=False)
        # Convert NaN's to 0 (as string)
        df['shares'] = df['shares'].str.replace('nan','0')
        df['shares'] = df['shares'].str.replace('Nan','0')
        df['shares'] = df['shares'].str.replace('NaN','0')
        # Convert Series values from str to int
        df['shares'] = df['shares'].astype(int)
        df['likes'] = df['likes'].astype(int)
        df['comments'] = df['comments'].astype(int)
        df['Love'] = df['Love'].astype(int)
        df['Wow'] = df['Wow'].astype(int)
        df['Sad'] = df['Sad'].astype(int)
        df['Angry'] = df['Angry'].astype(int)
        df['Haha'] = df['Haha'].astype(int)

        # Sum over all numeric columns of one row
        df['total_reac'] = df.select_dtypes(include='number').sum(axis=1)
        # Sort values by 'total_reac' column, descending
        df = df.sort_values(by='total_reac', ascending=False)
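        # Convert column from str to timezone-aware datetime (UTC)
        df['created_time'] = pd.to_datetime(df['created_time'], utc=True)
        # Filter for dates needed, using the function's own parameters
        df = df[(df['created_time'] > pd.to_datetime(filterStart, utc=True)) & (df['created_time'] <= pd.to_datetime(filterEnd, utc=True))]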

        # Save Dataframe as csv
        df.to_csv("Facebook_Posts_" + page + ".csv" )

    except Exception as ex:
        print (ex)
    return df

token="my_User__Token_Here (got from my personal  https://developers.facebook.com/tools/explorer)"
sTime = '2018-05-01'
fStart = '2018-05-01'
fEnd = '2018-05-29'

page_id="nytimes"
url="https://graph.facebook.com/3.2/"+page_id+"/posts/?fields=id,created_time,message,shares.summary(true).limit(0),comments.summary(true).limit(0),likes.summary(true),reactions.type(LOVE).limit(0).summary(total_count).as(Love),reactions.type(WOW).limit(0).summary(total_count).as(Wow),reactions.type(HAHA).limit(0).summary(total_count).as(Haha),reactions.type(SAD).limit(0).summary(1).as(Sad),reactions.type(ANGRY).limit(0).summary(1).as(Angry)&access_token="+token+"&limit=100"
dataNYT = getData(page_id, url, sTime, fStart, fEnd)

dataNYT.to_csv("NYT_posts.csv")
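As a side note on the string-replacement step in the code above: the counts can also be read directly from the nested dictionaries instead of stripping their string form with regexes. The following is only a sketch under assumptions. The helper names `summary_count` and `flatten_post` and the sample data are made up for illustration, and the field shapes are inferred from the patterns the regexes above are written against.

```python
COUNT_FIELDS = ["shares", "comments", "likes", "Love", "Wow", "Haha", "Sad", "Angry"]


def summary_count(cell):
    """Return the count held in one field value, or 0 if the field is absent."""
    if not isinstance(cell, dict):
        return 0  # missing field (None, or NaN once loaded into pandas)
    if "count" in cell:
        return int(cell["count"])  # shape assumed for 'shares'
    # shape assumed for likes, comments and the aliased reactions
    return int(cell.get("summary", {}).get("total_count", 0))


def flatten_post(post):
    """Reduce one post dict to flat columns plus a reaction total."""
    row = {"id": post.get("id"), "created_time": post.get("created_time")}
    for field in COUNT_FIELDS:
        row[field] = summary_count(post.get(field))
    row["total_reac"] = sum(row[field] for field in COUNT_FIELDS)
    return row


# Hypothetical sample data, not real API output.
sample_posts = [
    {"id": "1", "created_time": "2018-05-02T10:00:00+0000",
     "shares": {"count": 3},
     "likes": {"data": [], "summary": {"total_count": 10, "can_like": True}},
     "Love": {"data": [], "summary": {"total_count": 2}}},
    {"id": "2", "created_time": "2018-05-01T09:00:00+0000",
     "likes": {"data": [], "summary": {"total_count": 4, "can_like": True}}},
]

rows = [flatten_post(post) for post in sample_posts]
for row in rows:
    print(row["id"], row["total_reac"])
```

The resulting list of flat dictionaries can be passed to `pd.DataFrame(rows)`, which yields integer columns without any string cleanup.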

This is the error I get now:

HTTP Error 400: Bad Request

And when I enter the request URL in the browser, this error appears:

{
   "error": {
      "message": "Unknown path components: /nytimes/posts",
      "type": "OAuthException",
      "code": 2500,
      "fbtrace_id": "HsN9zi+byTD"
   }
}
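The script only prints "HTTP Error 400: Bad Request" because `urlopen` raises before the response body is read, while the browser shows the JSON body above. A minimal sketch of how the same detail could be surfaced from Python follows. `describe_graph_error` and `fetch_json` are hypothetical helper names, and the sketch assumes the error body is JSON shaped like the one shown.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def describe_graph_error(body):
    """Turn an error body (bytes or str) into a one-line summary."""
    if isinstance(body, bytes):
        body = body.decode("utf8")
    error = json.loads(body).get("error", {})
    return "{type} (code {code}): {message}".format(
        type=error.get("type", "unknown"),
        code=error.get("code", "?"),
        message=error.get("message", ""),
    )


def fetch_json(url):
    """Fetch a URL; on an HTTP error, re-raise with the API's own message."""
    try:
        with urlopen(url) as response:
            return json.loads(response.read().decode("utf8"))
    except HTTPError as err:
        # HTTPError is file-like, so the response body can still be read.
        raise RuntimeError(describe_graph_error(err.read())) from err


# The error body quoted above, used here as sample input.
sample_body = b"""{
   "error": {
      "message": "Unknown path components: /nytimes/posts",
      "type": "OAuthException",
      "code": 2500,
      "fbtrace_id": "HsN9zi+byTD"
   }
}"""
print(describe_graph_error(sample_body))
```

Calling `fetch_json(urlToConnect)` in place of the bare `urlopen` call would then report the `OAuthException` text rather than only the status line.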

Does anyone have an idea?

Not sure why you get that error. When I try that API call in the API Explorer, I get the correct one:

{
  "error": {
    "message": "(#10) To use 'Page Public Content Access', your use of this endpoint must be reviewed and approved by Facebook. To submit this 'Page Public Content Access' feature for review please read our documentation on reviewable features: https://developers.facebook.com/docs/apps/review.",
    "type": "OAuthException",
    "code": 10,
    "fbtrace_id": "AZJ2HjKFmkW"
  }
}

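You do need an app, and you do need app review. To access pages you do not own, you must get "Page Public Content Access" approved by Facebook. After that, you can even use an app access token, which never expires. But you always need an app for any API access anyway.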
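As a rough illustration of the app access token mentioned above, here is a sketch of how a request URL could be built once the app has been approved. It assumes the `app_id|app_secret` form of an app access token; `build_posts_url` and the credentials are placeholders, and the field list is shortened. The version segment is written as `v3.2`, the usual spelling of Graph API versions. Whether the missing `v` in the question's URL explains the "Unknown path components" message is a guess that this thread does not confirm.

```python
from urllib.parse import urlencode


def build_posts_url(page_id, app_id, app_secret, version="v3.2", limit=100):
    """Build a /posts request URL that authenticates with an app access token."""
    # Assumed token form: app ID and app secret joined by a pipe character.
    access_token = f"{app_id}|{app_secret}"
    query = urlencode({
        "fields": "id,created_time,message,shares",  # shortened field list
        "access_token": access_token,
        "limit": limit,
    })
    return f"https://graph.facebook.com/{version}/{page_id}/posts?{query}"


# Placeholder credentials; real ones belong on a server, never in shared code.
url = build_posts_url("nytimes", "123", "abc")
print(url)
```

Because the token embeds the app secret, a URL like this should only ever be built and used server-side.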

More information: https://developers.facebook.com/docs/apps/review/feature/#reference-PAGES_ACCESS
