Pandas cannot print lists of objects collected from the web with xpath in a Jupyter notebook



This is the code I am using. I am running the web version of Jupyter Notebook. I upgraded the XML library (lxml), and my Python version is 3.8.

import numpy as np
import requests 
from lxml import html
import csv
import pandas as pd
# getting the web content
r = requests.get('http://www.pro-football-reference.com/years/2017/draft.htm')
data = html.fromstring(r.text)

Collecting the specific data:

pick = data.xpath('//td[@data_stat="draft_pick"]//text()')
player = data.xpath('//td[@data_stat="player"]//text()')
position = data.xpath('//td[@data_stat="pos"]//text()')
age= data.xpath('//td[@data_stat="age"]//text()')
games_played = data.xpath('//td[@data_stat="g"]//text()')
cmp = data.xpath('//td[@data_stat="pass_cmp"]//text()')
att = data.xpath('//td[@data_stat="pass_att"]//text()')
college = data.xpath('//td[@data_stat="college_id"]//text()')
data = list(zip(pick,player,position,age,games_played,cmp,att,college))
df = pd.DataFrame(data)
df

It shows two errors in the two separate files I tried:

  1. An error pointing to __init__.py
  2. AttributeError: 'list' object has no attribute 'xpath'

The code does not give me the lists of data I want from the web page. Can anyone help me? Thanks in advance.

You can use pandas.read_html:

Load the HTML table directly into a DataFrame:
import pandas as pd
df = pd.read_html('http://www.pro-football-reference.com/years/2017/draft.htm')[0]
df.columns = df.columns.droplevel(0) # drop top header row
df = df[df['Rnd'].ne('Rnd')] # remove mid-table header rows 
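
To see what the two cleanup lines do without hitting the network, here is a minimal sketch on a small hand-built frame. The top-level column labels ("Unnamed: ...", "Passing", "Rushing") are assumptions about the shape read_html returns for this page; the player values are taken from the output below.

```python
import pandas as pd

# Assumed two-level header, mimicking what read_html might return for this page.
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Rnd"),
    ("Unnamed: 1_level_0", "Pick"),
    ("Unnamed: 2_level_0", "Player"),
    ("Passing", "Att"),
    ("Rushing", "Att"),
])
rows = [
    ["1", "1", "Myles Garrett", "0", "0"],
    ["1", "2", "Mitchell Trubisky", "1577", "190"],
    ["Rnd", "Pick", "Player", "Att", "Att"],  # header row repeated inside the table body
    ["1", "3", "Solomon Thomas", "0", "0"],
]

df = pd.DataFrame(rows, columns=columns)
df.columns = df.columns.droplevel(0)   # keep only the bottom header level
df = df[df["Rnd"].ne("Rnd")]           # drop rows that merely repeat the header
df = df.reset_index(drop=True)

# After droplevel, "Att" appears twice (passing and rushing),
# so df["Att"] is a two-column DataFrame, not a Series.
passing_att = df["Att"].iloc[:, 0]

print(df["Player"].tolist())
print(passing_att.tolist())
```

Because the flattened header contains duplicate names such as Att, Yds and TD, selecting those columns by position (or renaming them first) is safer than selecting by label.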

Output:

|    |   Rnd |   Pick | Tm   | Player            | Pos   |   Age |   To |   AP1 |   PB |   St |   CarAV |   DrAV |   G |   Cmp |   Att |   Yds |   TD |   Int |   Att |   Yds |   TD |   Rec |   Yds |   TD |   Solo |   Int |    Sk | College/Univ   | Unnamed: 28_level_1   |
|---:|------:|-------:|:-----|:------------------|:------|------:|-----:|------:|-----:|-----:|--------:|-------:|----:|------:|------:|------:|-----:|------:|------:|------:|-----:|------:|------:|-----:|-------:|------:|------:|:---------------|:----------------------|
|  0 |     1 |      1 | CLE  | Myles Garrett     | DE    |    21 | 2020 |     1 |    2 |    4 |      35 |     35 |  51 |     0 |     0 |     0 |    0 |     0 |     0 |     0 |    0 |     0 |     0 |    0 |    107 |   nan |  42.5 | Texas A&M      | College Stats         |
|  1 |     1 |      2 | CHI  | Mitchell Trubisky | QB    |    23 | 2020 |     0 |    1 |    3 |      33 |     33 |  51 |  1010 |  1577 | 10609 |   64 |    37 |   190 |  1057 |    8 |     0 |     0 |    0 |    nan |   nan | nan   | North Carolina | College Stats         |
|  2 |     1 |      3 | SFO  | Solomon Thomas    | DE    |    22 | 2020 |     0 |    0 |    2 |      15 |     15 |  48 |     0 |     0 |     0 |    0 |     0 |     0 |     0 |    0 |     0 |     0 |    0 |     73 |   nan |   6   | Stanford       | College Stats         |
|  3 |     1 |      4 | JAX  | Leonard Fournette | RB    |    22 | 2020 |     0 |    0 |    3 |      25 |     20 |  49 |     0 |     0 |     0 |    0 |     0 |   763 |  2998 |   23 |   170 |  1242 |    2 |    nan |   nan | nan   | LSU            | College Stats         |
|  4 |     1 |      5 | TEN  | Corey Davis       | WR    |    22 | 2020 |     0 |    0 |    4 |      25 |     25 |  56 |     0 |     0 |     0 |    0 |     0 |     6 |    55 |    0 |   207 |  2851 |   11 |    nan |   nan | nan   | West. Michigan | College Stats         |
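
As for the AttributeError in the question: the name data is first bound to the parsed document and later rebound to a list by data = list(zip(...)), so re-running the xpath lines in the notebook calls .xpath on a list. The table cells on the page also appear to use the attribute data-stat (with a hyphen) rather than data_stat; that is an assumption worth checking in the page source. A minimal sketch of the idea, using the standard library's xml.etree.ElementTree on a made-up inline snippet instead of lxml and the live page (cell_texts is a hypothetical helper):

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed snippet standing in for the real page.
snippet = """
<table>
  <tr><td data-stat="draft_pick">1</td><td data-stat="player"><a>Myles Garrett</a></td></tr>
  <tr><td data-stat="draft_pick">2</td><td data-stat="player"><a>Mitchell Trubisky</a></td></tr>
</table>
"""

tree = ET.fromstring(snippet)  # the parsed document keeps its own name


def cell_texts(root, stat):
    """Return the text of every <td> whose data-stat attribute equals `stat`."""
    cells = root.findall(f".//td[@data-stat='{stat}']")
    return ["".join(cell.itertext()).strip() for cell in cells]


pick = cell_texts(tree, "draft_pick")
player = cell_texts(tree, "player")

# Store the zipped rows under a new name so `tree` is never overwritten.
rows = list(zip(pick, player))
print(rows)
```

With lxml the same two changes apply: query '//td[@data-stat="player"]//text()' and assign the zipped result to a name other than the one holding the parsed document.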
