I am extracting text from a PDF into a string named text:
text = "● A justification of your prediction, including the following information that helped form\n\no Angle of the sun relative to the surface on September 22, 2021\no Materials of the surface (include three materials) and heat absorption\n\ncharacteristics\n\no Length of exposure of the surface to the sun (i.e., the amount of time the surface\n\nhas had to warm on that day), including slopes of the stadium and a consideration\nof the angles of the seats\n\n1 Yes, I know that’s a Wednesday but just go with it…\n\n\x0c● Sources: Be sure to include in-text citations as appropriate as well as provide a list of\n\nsources that were used for your report, use MLA or APA citation style\n\n● Your report can assume any format you chose, and should be between 300-400 words in\n\nlength\n\nResources:\n\n"
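I want to split this text at "\x0c". I tried re.split(r'[\x0c]+', text), but that only seems to remove the "\x0c" rather than split the text. Likewise, text.splitlines() does not do what I want.
What am I missing?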
What's wrong with
text.split("\x0c")
? It gives me a list with two elements, which looks like exactly what you want here.
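
A quick way to check this is to run both calls on a shortened stand-in for the extracted text. The `sample` string below is an assumption made up for illustration, not the original PDF output:

```python
import re

# Shortened, hypothetical stand-in for the extracted PDF text.
sample = "first page\n\n\x0csecond page\n\n"

# str.split cuts at every form feed and drops the delimiter itself.
parts = sample.split("\x0c")
print(len(parts))
print(parts)

# re.split with the pattern from the question produces the same list.
regex_parts = re.split(r"[\x0c]+", sample)
print(regex_parts == parts)
```

In both cases the result is a list, so the "\x0c" disappearing is the split itself: the delimiter is consumed and the text on either side becomes a separate element.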
If needed, you can split each piece further into lines:
sections = [x.split("\n") for x in text.split("\x0c")]
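
Because the extracted text contains many doubled newlines, the per-line lists end up with empty strings. A minimal sketch that filters them out, again using a made-up `sample` string rather than the real text:

```python
# Hypothetical stand-in: two "pages" separated by a form feed.
sample = "a\n\nb\n\n\x0cc\n\nd\n\n"

# Split into pages at the form feed, then into non-empty lines.
# split("\n") is used instead of splitlines() because str.splitlines()
# also treats "\x0c" as a line boundary.
sections = [
    [line for line in page.split("\n") if line]
    for page in sample.split("\x0c")
]
print(sections)  # [['a', 'b'], ['c', 'd']]
```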
There may be a cleaner way, but this would be my approach:
splittext = text.split('\x0c')
splittext[0] += '\x0c'  # put the form feed back on the end of the first piece
string1 = splittext[0]
string2 = splittext[1]
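
If the text can contain more than one form feed, the same idea can be generalized. This is only a sketch; the helper name `split_keep_formfeed` and the `sample` string are hypothetical:

```python
def split_keep_formfeed(text):
    """Split at form feeds, leaving each form feed on the end of the preceding piece."""
    pieces = text.split("\x0c")
    # Re-attach the delimiter to every piece except the last one.
    return [p + "\x0c" for p in pieces[:-1]] + pieces[-1:]


# Hypothetical three-page input.
sample = "page one\x0cpage two\x0cpage three"
pages = split_keep_formfeed(sample)
print(pages)

# Joining the pieces reproduces the input, so nothing was lost.
print("".join(pages) == sample)
```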