I want to store my URL in a variable named "url" and save that URL to an Excel sheet / CSV



I want to store my URL in a variable named "url" and save the URL to an Excel sheet as CSV, but it gives me `UnboundLocalError: local variable 'url' referenced before assignment`.

class NewsSpider(scrapy.Spider):
    name = "article"

    def start_requests(self):
        url = input("Enter the article url: ")
        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        url = url   # <-- raises UnboundLocalError
        yield {
            'Category': Category,
            'Headlines': Headlines,
            'Author': Author,
            'Source': Source,
            'Publication Date': Published_Date,
            'Feature_Image': Feature_Image,
            'Skift Take': skift_take,
            'Article Content': Content
        }
        # =============== Data Store ===============
        Data = [[Category, Headlines, Author, Source, Published_Date,
                 Feature_Image, Content, url]]
        try:
            df = pd.DataFrame(Data, columns=['Category', 'Headlines', 'Author', 'Source',
                                             'Published_Date', 'Feature_Image', 'Content', 'URL'])
            print(df)
            with open('C:/Users/Public/pagedata.csv', 'a') as f:
                df.to_csv(f, header=False)
        except:
            df = pd.DataFrame(Data, columns=['Category', 'Headlines', 'Author', 'Source',
                                             'Published_Date', 'Feature_Image', 'Content', 'URL'])
            print(df)
            df.to_csv('C:/Users/Public/pagedata.csv', mode='a')

You can just use `response.url` instead of `url = url`:

url = response.url

# or

def parse_dir_contents(self, response):
    yield {
        'Category': Category,
        'Headlines': Headlines,
        'Author': Author,
        'Source': Source,
        'Publication Date': Published_Date,
        'Feature_Image': Feature_Image,
        'Skift Take': skift_take,
        'Article Content': Content,
        'url': response.url
    }
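Separately, the try/except around the CSV write in the question is only there to handle the header. A simpler approach is to write the header exactly once, based on whether the file already exists. A minimal sketch with pandas (the column list matches the question's; the file path and row values are placeholders — inside `parse_dir_contents` the last value would be `response.url`):

```python
import os
import pandas as pd

COLUMNS = ['Category', 'Headlines', 'Author', 'Source',
           'Published_Date', 'Feature_Image', 'Content', 'URL']

def append_row(path, row):
    """Append one article row to the CSV, writing the header only on first write."""
    df = pd.DataFrame([row], columns=COLUMNS)
    write_header = not os.path.exists(path)   # header only if the file is new
    df.to_csv(path, mode='a', header=write_header, index=False)
```

Each scraped page then just calls `append_row('C:/Users/Public/pagedata.csv', [...])`, and no try/except is needed.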
