Adding a CSV loop to a Python script



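I'm sure this is very basic Python, but I'm worn out from trying.

I've cobbled together a script that works for a single input, and I'd like to take it to the next level by feeding it a CSV of YouTube video IDs to loop over.

I know my code must be messy, and I mix single and double quotes at random, so any help cleaning it up would be much appreciated.

My CSV is called "url.csv" and has a single column, "url", containing the list of YouTube video IDs.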

url
1whRd_c_irk
prlK8iY7blk
SnFaRXeep5Y

How can I get it to process these one after another, essentially replacing videoid = "RvCBzhhydNk" with a loop?

import re
import requests
import mechanize
from bs4 import BeautifulSoup
from youtube_transcript_api import YouTubeTranscriptApi

videoid = "RvCBzhhydNk"

#DATE
source = requests.get('https://www.youtube.com/watch?v=' + videoid).text
soup = BeautifulSoup(source, features="html.parser")
published = soup.find("meta", attrs={'itemprop': 'datePublished'})

#VIDEO TITLE and CLEAN-UP
br = mechanize.Browser()
br.open('https://www.youtube.com/watch?v=' + videoid)
title = re.sub('[^A-Za-z0-9]+', ' ', br.title().replace("YouTube", "")).strip()

#TRANSCRIPT
outlines = []
transcript = YouTubeTranscriptApi.get_transcript(videoid)
for i in transcript:
    outtext = (i['text'])
    outlines.append(outtext)
    out = outtext.replace(" so ", "\n\nSo ")
    #CREATE TEXT FILE
    with open((published["content"]) + " " + (title) + ".txt", "a") as opf:
        opf.write(out + " ")

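Here is my solution. If only I had Googled it sooner. Thanks for the advice.

It was inspired by a YouTube transcript scraper I found on GitHub, which I had trouble getting to work properly.

I used Beautiful Soup because I found it much faster than Selenium. I removed some features, added others, and eventually got it working as required.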

# Settings
filename = 'url.csv'
colname = 'url'
delimiter = '\t'
breakword = 'however'
prefix = 'transcript_'

import re
import csv
import requests
import mechanize
from bs4 import BeautifulSoup
from youtube_transcript_api import YouTubeTranscriptApi


def gettranscript(videoid):
    try:
        #DATE
        source = requests.get('https://www.youtube.com/watch?v=' + videoid).text
        soup = BeautifulSoup(source, features="html.parser")
        published = soup.find("meta", attrs={'itemprop': 'datePublished'})

        #VIDEO TITLE and CLEAN-UP
        br = mechanize.Browser()
        br.open('https://www.youtube.com/watch?v=' + videoid)
        title = re.sub('[^A-Za-z0-9]+', ' ', br.title().replace("YouTube", "")).strip()

        #TRANSCRIPT
        outlines = []
        transcript = YouTubeTranscriptApi.get_transcript(videoid)
        for i in transcript:
            outtext = i['text']
            outlines.append(outtext)
            # Start a new paragraph before each occurrence of the break word
            out = outtext.replace(" " + breakword + " ", "\n\n" + breakword + " ")
            #CREATE TEXT FILE
            with open(prefix + published["content"] + " " + title + ".txt", "a") as opf:
                opf.write(out + " ")
    except Exception as exc:
        # Skip videos that fail (no transcript, network error, ...) but report them
        print("Skipping " + videoid + ": " + str(exc))


#READ CSV
with open(filename, newline='') as csvread:
    csvreader = csv.DictReader(csvread, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in csvreader:
        gettranscript(row[colname])
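
In case it helps anyone debugging the loop: the CSV-reading step can be checked on its own, without touching YouTube at all. Below is a minimal sketch of that idea. The helper name `read_video_ids` and the UTF-8 encoding are my own assumptions, not part of the script above; the column name and tab delimiter match the settings used there.

```python
import csv


def read_video_ids(path, colname="url", delimiter="\t"):
    """Return the non-empty values of one CSV column as a list of strings.

    Hypothetical helper for illustration; assumes the file has a header row
    containing `colname` and is encoded as UTF-8.
    """
    # newline="" is what the csv module documentation recommends for reader input
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        ids = []
        for row in reader:
            value = (row.get(colname) or "").strip()
            if value:  # ignore rows where the column is empty
                ids.append(value)
        return ids
```

With that in place, the main loop could be written as `for videoid in read_video_ids('url.csv'): gettranscript(videoid)`, and printing the returned list first is a quick way to confirm the IDs are being read as expected.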
