Getting data from complex HTML tags with Python BeautifulSoup



I have the following HTML data:

<div class="display-info">
<div class="record-icon pubtype"><span class="pubtype-icon pt-academicJournal" title="Academic Journal"> </span>
<p class="caption">Academic Journal</p>
</div>By: Stein, Mark. <strong>Organization Studies</strong>. 2007, Vol. 28 Issue 8, p1223-1241. 19p. Abstract: While the literature on front-line service work utilizes a variety of productive images, I argue that these images do not capture certain of the more problematic experiences of front-line service employees. Drawing on words used by these workers themselves, and using concepts from psychoanalysis and its application to organizational dynamics, I therefore propose a new image, that of toxicity. I argue that — especially when under severe pressure from customers — front-line workers may have the unconscious fantasy that they have been polluted by toxic substances. The unconscious experience of the entry of toxic material is likely to result in further <strong>contagion</strong> of relationships such as those among employees and between employees and customers. This may also result in workers retaliating against customers by exacting revenge on them. A downward spiralling of relationships may follow, with the result that large parts of the work environment are experienced as toxic. The implications for theory are explored. In conclusion, I argue that the theme of toxicity helps us connect the employee-customer interface with a deep reservoir of primordial human experience that links the body with emotions. [ABSTRACT FROM AUTHOR] DOI: 10.1177/0170840607079527. (<cite>AN: 26198405</cite>)
<p class="subjectResults"><strong>Subjects:
</strong>Industrial relations; Personnel management; Customer relations; Corporate image; Public relations; Consumer behavior; Sales personnel; Administration of Human Resource Programs (except Education, Public Health, and Veterans' Affairs Programs); Human Resources Consulting Services; Public Relations Agencies; Psychoanalysis; Social interaction</p><span class="record-additional"><span class="item add-to-folder"><a class="folder-toggle item-not-in-folder" data-folder='{"db":"bth","uiTerm":"26198405","uiTag":"AN","ebookFormat":"false","abookFormat":"false","title":"Toxicity and the Unconscious Experience of the Body at the Employee--Customer Interface. ","resultID":"50","doid":"","segid":""}' data-isaddtofolder="true" data-itemid="50" href="#" id="add_50" name="addToFolder" title="To print, e-mail, or save multiple items">Add to folder</a> <a class="folder-toggle item-in-folder" data-folder='{"db":"bth","uiTerm":"26198405","uiTag":"AN","ebookFormat":"false","abookFormat":"false","title":"Toxicity and the Unconscious Experience of the Body at the Employee--Customer Interface. ","resultID":"50","doid":"","segid":""}' data-isaddtofolder="false" data-itemid="50" href="#" id="added_50" style="display: none;" title="Remove result from folder">Remove from folder</a></span><span class="result-list-cite-ref-label"><a data-title="Cited References" href="javascript:__doLinkPostBack('','sl~~ref||su~~50','_top');" id="references50" title="Cited References">Cited References: (92) </a></span><span class="result-list-cite-link"><a data-title="Times Cited in this Database" href="javascript:__doLinkPostBack('','sl~~cit||su~~50','_top');" id="citations50" title="Times Cited in this Database">Times Cited in this Database: (20) </a></span> </span>
<div class="record-formats-wrapper externalLinks"><span><span class="custom-link"><a class="ils-link" href="/ehost/SmartLink/OpenIlsLink?sid=42487fcc-c655-469f-b8ed-2802260b3983@sessionmgr102&amp;vid=15&amp;sl=smartlink&amp;st=ilslink_new&amp;sv=sdbn%253Dbth%2526pbt%253DAcademic%2520Journal%2526issn%253D01708406%2526ttl%253DOrganization%252520Studies%2526stp%253DC%2526asi%253DY%2526ldc%253DCheck%252520full%252520text%252520availability%2526lna%253DFull%252520Text%252520Finder%252520%25252D%252520INSEAD%2526lca%253DfullText%2526lo%255Fan%253D26198405&amp;su=http%3A%2F%2Fresolver%2Eebscohost%2Ecom%2Fopenurl%3Fcustid%3Ds8362180%26group%3Dmain%26authtype%3Dip%2Cuid%26sid%3DEBSCO%3Abth%26genre%3Darticle%26issn%3D01708406%26ISBN%3D%26volume%3D28%26issue%3D8%26date%3D20070801%26spage%3D1223%26pages%3D1223%2D1241%26title%3DOrganization%20Studies%26atitle%3DToxicity%2520and%2520the%2520Unconscious%2520Experience%2520of%2520the%2520Body%2520at%2520the%2520Employee%2D%2DCustomer%2520Interface%2E%26aulast%3DStein%252C%2520Mark%26id%3DDOI%3A10%2E1177%2F0170840607079527" id="linkILSLink50_1" onblur="self.status='';return true" onfocus="self.status='check full text availability.';return true" onmouseout="self.status='';return true" onmouseover="self.status='check full text availability.';return true" target="_new" title="check full text availability."><img align="middle" alt="check full text availability." border="0" class="icon-image" data-defer-image="https://s3.amazonaws.com/libapps/customers/2023/images/logo-INSEAD_blanc-sur-vert_250.jpg" id="imgILSLink50_1" src="https://if.ebsco-content.com/interfacefiles/17.232.0.2749/blank.gif"/>Check full text availability</a></span></span>
</div>
</div>

I need only the author and the abstract:

By: Stein, Mark.
Abstract: While the literature on front-line service work utilizes a variety of productive images, I argue that these images do not capture certain of the more problematic experiences of front-line service employees. Drawing on words used by these workers themselves, and using concepts from psychoanalysis and its application to organizational dynamics, I therefore propose a new image, that of toxicity. I argue that — especially when under severe pressure from customers — front-line workers may have the unconscious fantasy that they have been polluted by toxic substances. The unconscious experience of the entry of toxic material is likely to result in further contagion of relationships such as those among employees and between employees and customers. This may also result in workers retaliating against customers by exacting revenge on them. A downward spiralling of relationships may follow, with the result that large parts of the work environment are experienced as toxic. The implications for theory are explored. In conclusion, I argue that the theme of toxicity helps us connect the employee-customer interface with a deep reservoir of primordial human experience that links the body with emotions.

Using soup.select(".display-info")[0].text, I get:

Academic JournalBy: Stein, Mark. Organization Studies. 2007, Vol. 28 Issue 8, p1223-1241. 19p. Abstract: While the literature on front-line service work utilizes a variety of productive images, I argue that these images do not capture certain of the more problematic experiences of front-line service employees. Drawing on words used by these workers themselves, and using concepts from psychoanalysis and its application to organizational dynamics, I therefore propose a new image, that of toxicity. I argue that — especially when under severe pressure from customers — front-line workers may have the unconscious fantasy that they have been polluted by toxic substances. The unconscious experience of the entry of toxic material is likely to result in further contagion of relationships such as those among employees and between employees and customers. This may also result in workers retaliating against customers by exacting revenge on them. A downward spiralling of relationships may follow, with the result that large parts of the work environment are experienced as toxic. The implications for theory are explored. In conclusion, I argue that the theme of toxicity helps us connect the employee-customer interface with a deep reservoir of primordial human experience that links the body with emotions. [ABSTRACT FROM AUTHOR] DOI: 10.1177/0170840607079527. (AN: 26198405)Subjects:
Industrial relations; Personnel management; Customer relations; Corporate image; Public relations; Consumer behavior; Sales personnel; Administration of Human Resource Programs (except Education, Public Health, and Veterans' Affairs Programs); Human Resources Consulting Services; Public Relations Agencies; Psychoanalysis; Social interactionAdd to folder Remove from folderCited References: (92) Times Cited in this Database: (20)  Check full text availability 
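The fields run together because .text joins every text node with an empty string, which is how "Academic Journal" and "By:" become "Academic JournalBy:". A minimal sketch on a shortened, illustrative copy of the record (SNIPPET is not the full markup from the question) compares that with get_text(separator="\n", strip=True), which strips each text node and puts it on its own line:

```python
from bs4 import BeautifulSoup

# Shortened, illustrative copy of the question's markup.
SNIPPET = (
    '<div class="display-info">'
    '<div class="record-icon pubtype"><p class="caption">Academic Journal</p></div>'
    'By: Stein, Mark. <strong>Organization Studies</strong>. 2007'
    '</div>'
)

soup = BeautifulSoup(SNIPPET, "html.parser")
div = soup.select_one(".display-info")

# .text concatenates all text nodes with no separator.
glued = div.text
# get_text() strips each text node and joins them with the given separator.
separated = div.get_text(separator="\n", strip=True)

print(repr(glued))
print(repr(separated))
```

With one text node per line, a regular expression such as By:.* can stop cleanly at the end of the author line, since "." does not match a newline by default.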

For this task, it is best to use re together with bs4.

If the variable txt contains the HTML text from the question, this script:

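import re
from textwrap import wrap

from bs4 import BeautifulSoup

soup = BeautifulSoup(txt, 'html.parser')
# Join text nodes with newlines so "Academic Journal" and "By:" stay separate
txt = soup.select_one('.display-info').get_text(strip=True, separator='\n')
# "." does not match "\n", so this stops at the end of the author line
author = re.findall(r'By:.*', txt)[0]
# The brackets are escaped: "[ABSTRACT FROM AUTHOR]" is literal text, not a character class
abstract = re.findall(r'Abstract:.*?(?=\[ABSTRACT FROM AUTHOR\])', txt, flags=re.S)[0]
print(author)
print(*wrap(abstract.replace('\n', ' ')), sep='\n')
# or, on Python 2, simply:
# print author
# print abstract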

Prints:

By: Stein, Mark.
Abstract: While the literature on front-line service work utilizes a
variety of productive images, I argue that these images do not capture
certain of the more problematic experiences of front-line service
employees. Drawing on words used by these workers themselves, and
using concepts from psychoanalysis and its application to
organizational dynamics, I therefore propose a new image, that of
toxicity. I argue that — especially when under severe pressure from
customers — front-line workers may have the unconscious fantasy that
they have been polluted by toxic substances. The unconscious
experience of the entry of toxic material is likely to result in
further contagion of relationships such as those among employees and
between employees and customers. This may also result in workers
retaliating against customers by exacting revenge on them. A downward
spiralling of relationships may follow, with the result that large
parts of the work environment are experienced as toxic. The
implications for theory are explored. In conclusion, I argue that the
theme of toxicity helps us connect the employee-customer interface
with a deep reservoir of primordial human experience that links the
body with emotions.
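If many records have to be processed, the two regular expressions can be wrapped in a small helper. This is a sketch under assumptions: parse_record_text is a hypothetical name, it expects the string produced by get_text(strip=True, separator='\n') as in the script above, and the sample is a shortened excerpt of the record rather than its full text.

```python
import re

# Author: "By:" up to the end of its text line ("." does not cross "\n").
AUTHOR_RE = re.compile(r"By:.*")
# Abstract: lazily up to the literal "[ABSTRACT FROM AUTHOR]" marker;
# re.S lets "." span the newlines inserted between text nodes.
ABSTRACT_RE = re.compile(r"Abstract:.*?(?=\[ABSTRACT FROM AUTHOR\])", re.S)


def parse_record_text(text):
    """Hypothetical helper: return the author and abstract, or None if absent."""
    author = AUTHOR_RE.search(text)
    abstract = ABSTRACT_RE.search(text)
    return {
        "author": author.group(0) if author else None,
        # Collapse the "\n" separators (and trailing space) into single spaces.
        "abstract": " ".join(abstract.group(0).split()) if abstract else None,
    }


sample = (
    "Academic Journal\nBy: Stein, Mark.\nOrganization Studies\n"
    ". 2007, Vol. 28 Issue 8, p1223-1241. 19p. Abstract: The unconscious experience "
    "of the entry of toxic material is likely to result in further\ncontagion\n"
    "of relationships such as those among employees and between employees and "
    "customers. [ABSTRACT FROM AUTHOR] DOI: 10.1177/0170840607079527."
)
record = parse_record_text(sample)
print(record["author"])
print(record["abstract"])
```

Using re.search with an explicit None check avoids the IndexError that findall(...)[0] raises when a record lacks one of the fields.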

Use the following regular expressions:

from bs4 import BeautifulSoup
import re
html='''<div class="display-info">
<div class="record-icon pubtype"><span class="pubtype-icon pt-academicJournal" title="Academic Journal"> </span>
<p class="caption">Academic Journal</p>
</div>By: Stein, Mark. <strong>Organization Studies</strong>. 2007, Vol. 28 Issue 8, p1223-1241. 19p. Abstract: While the literature on front-line service work utilizes a variety of productive images, I argue that these images do not capture certain of the more problematic experiences of front-line service employees. Drawing on words used by these workers themselves, and using concepts from psychoanalysis and its application to organizational dynamics, I therefore propose a new image, that of toxicity. I argue that — especially when under severe pressure from customers — front-line workers may have the unconscious fantasy that they have been polluted by toxic substances. The unconscious experience of the entry of toxic material is likely to result in further <strong>contagion</strong> of relationships such as those among employees and between employees and customers. This may also result in workers retaliating against customers by exacting revenge on them. A downward spiralling of relationships may follow, with the result that large parts of the work environment are experienced as toxic. The implications for theory are explored. In conclusion, I argue that the theme of toxicity helps us connect the employee-customer interface with a deep reservoir of primordial human experience that links the body with emotions. [ABSTRACT FROM AUTHOR] DOI: 10.1177/0170840607079527. (<cite>AN: 26198405</cite>)
<p class="subjectResults"><strong>Subjects:
</strong>Industrial relations; Personnel management; Customer relations; Corporate image; Public relations; Consumer behavior; Sales personnel; Administration of Human Resource Programs (except Education, Public Health, and Veterans' Affairs Programs); Human Resources Consulting Services; Public Relations Agencies; Psychoanalysis; Social interaction</p><span class="record-additional"><span class="item add-to-folder"><a class="folder-toggle item-not-in-folder" data-folder='{"db":"bth","uiTerm":"26198405","uiTag":"AN","ebookFormat":"false","abookFormat":"false","title":"Toxicity and the Unconscious Experience of the Body at the Employee--Customer Interface. ","resultID":"50","doid":"","segid":""}' data-isaddtofolder="true" data-itemid="50" href="#" id="add_50" name="addToFolder" title="To print, e-mail, or save multiple items">Add to folder</a> <a class="folder-toggle item-in-folder" data-folder='{"db":"bth","uiTerm":"26198405","uiTag":"AN","ebookFormat":"false","abookFormat":"false","title":"Toxicity and the Unconscious Experience of the Body at the Employee--Customer Interface. ","resultID":"50","doid":"","segid":""}' data-isaddtofolder="false" data-itemid="50" href="#" id="added_50" style="display: none;" title="Remove result from folder">Remove from folder</a></span><span class="result-list-cite-ref-label"><a data-title="Cited References" href="javascript:__doLinkPostBack('','sl~~ref||su~~50','_top');" id="references50" title="Cited References">Cited References: (92) </a></span><span class="result-list-cite-link"><a data-title="Times Cited in this Database" href="javascript:__doLinkPostBack('','sl~~cit||su~~50','_top');" id="citations50" title="Times Cited in this Database">Times Cited in this Database: (20) </a></span> </span>
<div class="record-formats-wrapper externalLinks"><span><span class="custom-link"><a class="ils-link" href="/ehost/SmartLink/OpenIlsLink?sid=42487fcc-c655-469f-b8ed-2802260b3983@sessionmgr102&amp;vid=15&amp;sl=smartlink&amp;st=ilslink_new&amp;sv=sdbn%253Dbth%2526pbt%253DAcademic%2520Journal%2526issn%253D01708406%2526ttl%253DOrganization%252520Studies%2526stp%253DC%2526asi%253DY%2526ldc%253DCheck%252520full%252520text%252520availability%2526lna%253DFull%252520Text%252520Finder%252520%25252D%252520INSEAD%2526lca%253DfullText%2526lo%255Fan%253D26198405&amp;su=http%3A%2F%2Fresolver%2Eebscohost%2Ecom%2Fopenurl%3Fcustid%3Ds8362180%26group%3Dmain%26authtype%3Dip%2Cuid%26sid%3DEBSCO%3Abth%26genre%3Darticle%26issn%3D01708406%26ISBN%3D%26volume%3D28%26issue%3D8%26date%3D20070801%26spage%3D1223%26pages%3D1223%2D1241%26title%3DOrganization%20Studies%26atitle%3DToxicity%2520and%2520the%2520Unconscious%2520Experience%2520of%2520the%2520Body%2520at%2520the%2520Employee%2D%2DCustomer%2520Interface%2E%26aulast%3DStein%252C%2520Mark%26id%3DDOI%3A10%2E1177%2F0170840607079527" id="linkILSLink50_1" onblur="self.status='';return true" onfocus="self.status='check full text availability.';return true" onmouseout="self.status='';return true" onmouseover="self.status='check full text availability.';return true" target="_new" title="check full text availability."><img align="middle" alt="check full text availability." border="0" class="icon-image" data-defer-image="https://s3.amazonaws.com/libapps/customers/2023/images/logo-INSEAD_blanc-sur-vert_250.jpg" id="imgILSLink50_1" src="https://if.ebsco-content.com/interfacefiles/17.232.0.2749/blank.gif"/>Check full text availability</a></span></span>
</div>
</div>'''
soup=BeautifulSoup(html,'html.parser')
divtext=soup.find('div',class_='display-info')
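# "\s" matches whitespace and "\[" is a literal bracket; note that "Mark" is specific to this record
print(re.findall(r"By:?\s.*Mark\.", divtext.text)[0])
# Greedy match up to "[", then drop the bracket itself with [:-1]
print(re.findall(r"Abstract:?\s.*\[", divtext.text)[0][:-1])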

Output:

By: Stein, Mark.
Abstract: While the literature on front-line service work utilizes a variety of productive images, I argue that these images do not capture certain of the more problematic experiences of front-line service employees. Drawing on words used by these workers themselves, and using concepts from psychoanalysis and its application to organizational dynamics, I therefore propose a new image, that of toxicity. I argue that — especially when under severe pressure from customers — front-line workers may have the unconscious fantasy that they have been polluted by toxic substances. The unconscious experience of the entry of toxic material is likely to result in further contagion of relationships such as those among employees and between employees and customers. This may also result in workers retaliating against customers by exacting revenge on them. A downward spiralling of relationships may follow, with the result that large parts of the work environment are experienced as toxic. The implications for theory are explored. In conclusion, I argue that the theme of toxicity helps us connect the employee-customer interface with a deep reservoir of primordial human experience that links the body with emotions.
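Both answers depend on backslashes in their patterns: without them, an unescaped "[" opens a character class that is never closed and re.compile raises re.error. The sketch below checks that, builds the marker pattern with re.escape instead of escaping by hand, and uses an illustrative author pattern (an assumption, not part of either answer) that avoids hard-coding "Mark" by stopping at the first period followed by whitespace; it would cut short names that themselves contain ". ".

```python
import re

text = ("By: Stein, Mark. Organization Studies. 2007 Abstract: The implications "
        "for theory are explored. [ABSTRACT FROM AUTHOR] DOI: 10.1177/0170840607079527.")

# An unescaped "[" starts a character class that is never closed.
try:
    re.compile(r"Abstract:?\s.*[")
    unescaped_ok = True
except re.error:
    unescaped_ok = False

# re.escape turns the literal marker into a safe pattern fragment.
marker = re.escape("[ABSTRACT FROM AUTHOR]")
abstract = re.search(r"Abstract:?\s.*?(?=" + marker + ")", text).group(0).rstrip()

# Illustrative author pattern: "By:" up to the first "." followed by whitespace.
author = re.search(r"By:\s.*?\.(?=\s)", text).group(0)

print(unescaped_ok)
print(author)
print(abstract)
```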
