I have a list of values extracted from a website using BeautifulSoup. It looks like this:
tables_values1 = soup.find_all('td',attrs={'class':'x1'})
print(tables_values1)
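Output: [123value1, 123value2, 123value3] (note that there are no ' or " quotes)
I am trying to cut off the first x characters using the following approach (which I also found on Stack Exchange):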
tables_values = [x[2:] for x in tables_values1]
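However, this returns:
TypeError: unhashable type: 'slice'
Can anyone help me figure out why this happens and how to fix it? Thanks a lot!
Edit: Please let me know whether this is even a valid list!
Edit 3: Printing the exact repr, as requested below: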
[<td class="views-field views-field-field-category-value-2018">136 </td>, <td class="views-field views-field-field-category-value-2018">SFD </td>, <td class="views-field views-field-field-category-value-2018">136 </td>, <td class="views-field views-field-field-category-value-2018">$33,657,146 </td>, <td class="views-field views-field-field-category-value-2018">9.7 </td>, <td class="views-field views-field-field-category-value-2018">$33,657,146 </td>, <td class="views-field views-field-field-category-value-2018">61 </td>, <td class="views-field views-field-field-category-value-2018">34 </td>, <td class="views-field views-field-field-category-value-2018">5 </td>, <td class="views-field views-field-field-category-value-2018">61 </td>, <td class="views-field views-field-field-category-value-2018">34 </td>, <td class="views-field views-field-field-category-value-2018">5 </td>, <td class="views-field views-field-field-category-value-2018">5 </td>, <td class="views-field views-field-field-category-value-2018">95 </td>]
<td class="views-field views-field-field-category-value-2018">136 </td>
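The items in the list are BeautifulSoup Tag objects, not strings. You are trying to slice them as if they were strings. You really should work with them as tags rather than attempting string manipulation; for example, if you are trying to get the text between the tags, that would be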
contents = [x.string for x in tables_values1]
where the string attribute is a helper that returns the tag's single string child (if there is one).
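As a rough sketch of the tag-based approach, the snippet below parses a small piece of markup and pulls out the cell text before slicing it. The HTML is a hypothetical sample modelled on the cells printed in the question, and dropping one leading character is only an illustration; get_text(strip=True) is used here instead of .string so that the trailing space in each cell is removed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the cells shown in the question.
html = """
<table><tr>
<td class="views-field views-field-field-category-value-2018">136 </td>
<td class="views-field views-field-field-category-value-2018">SFD </td>
<td class="views-field views-field-field-category-value-2018">$33,657,146 </td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

# .string keeps the whitespace exactly as it appears in the markup.
raw = [cell.string for cell in cells]

# get_text(strip=True) returns the text with surrounding whitespace removed.
texts = [cell.get_text(strip=True) for cell in cells]

# These are ordinary strings now, so slicing works as expected.
trimmed = [text[1:] for text in texts]

print(raw)
print(texts)
print(trimmed)
```

Once the values are plain strings, the list comprehension from the question works unchanged on them.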
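If you really do want to do this through string manipulation rather than through the BeautifulSoup interface, you can convert the tag objects to strings, which will include the <td class="..."></td> part: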
strings = [str(x) for x in tables_values1]
You can then slice the strings however you need.
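To make the difference concrete, here is a small sketch using made-up markup that mirrors the simplified example at the top of the question. It shows that indexing a Tag looks up an HTML attribute (which is why a slice fails), and that str() gives the full markup, which can then be sliced. The exact exception raised for the slice may depend on the Python version, so the sketch catches both TypeError and KeyError.

```python
from bs4 import BeautifulSoup

# Made-up markup mirroring the simplified example in the question.
html = '<td class="x1">123value1</td><td class="x1">123value2</td>'
soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all("td", attrs={"class": "x1"})

# Square brackets on a Tag look up an HTML attribute by name.
print(tags[0]["class"])

# A slice is not an attribute name, so the lookup fails.
slice_failed = False
try:
    tags[0][2:]
except (TypeError, KeyError) as exc:
    slice_failed = True
    print(type(exc).__name__, exc)

# str() yields the whole element, including the surrounding <td> markup.
strings = [str(tag) for tag in tags]
print(strings[0])

# Slicing now works, but it cuts into the markup, not the cell text.
print(strings[0][:4])
```

Because the slice applies to the markup rather than the cell text, converting with str() is usually only useful when the tag itself is what you want to manipulate.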