Can only use .str accessor with string values, even though the DataFrame column has been converted to strings



I am trying to extract some information from an FDA web page. I am using this code:

import pandas as pd
# STEP 1: Get CEDI HTML tables
CEDI_inv_url = "https://www.accessdata.fda.gov/scripts/sda/sdNavigation.cfm?sd=edisrev&displayAll=true"
CEDI_HTML_tables = pd.read_html(CEDI_inv_url)
# STEP 2: Extract information from HTML tables (i.e., scraping of information)
CEDI_table_data = CEDI_HTML_tables[0]
CEDI_df = pd.DataFrame(CEDI_table_data, columns = ['MAINTERM','CAS NO','CUM DC (ppb)','CEDI','REGNUM'])
CEDI_df['CAS NO'].to_string()
CEDI_df['CAS NO'] = CEDI_df['CAS NO'].str.extract(r'([0-9]+[\u2011|-][0-9]{2}[\u2011|-][0-9](?![0-9]))')
CEDI_df.head()

I get a "Can only use .str accessor with string values!" error. I have tried many ways of converting the DataFrame to strings. Any ideas?

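This error occurs because the column you are accessing does not have a string dtype. Note that CEDI_df['CAS NO'].to_string() only returns a text rendering of the column and leaves the column itself unchanged. Casting with .astype(str) before using .str should fix it: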

import pandas as pd

CEDI_inv_url = "https://www.accessdata.fda.gov/scripts/sda/sdNavigation.cfm?sd=edisrev&displayAll=true"
CEDI_HTML_tables = pd.read_html(CEDI_inv_url)
# Keep only the columns of interest from the first table on the page
CEDI_df = pd.DataFrame(CEDI_HTML_tables[0], columns = ['MAINTERM','CAS/ID NO','CUM DC (ppb)','CEDI (mg/kg bw/d)','REGNUM'])
# Cast to str first so the .str accessor is available, then extract the CAS number
CEDI_df['CAS/ID NO'] = CEDI_df['CAS/ID NO'].astype(str).str.extract(r'([0-9]+[\u2011|-][0-9]{2}[\u2011|-][0-9](?![0-9]))')
print(CEDI_df.head())
Output:

                                            MAINTERM CAS/ID NO  CUM DC (ppb)  CEDI             REGNUM
0  (1,1,4,4- TETRAMETHYLTETRAMETHYLENE)BIS(TERT-B...       NaN           0.2   NaN  177.2600 177.1520
1  (2,4,4-TRIMETHYLPENT-2-YL)-N-PHENYL-1-NAPHTHYL...       NaN          50.0   NaN                NaN
2  (2- (METHACRYLOYLOXY)ETHYL)TRIMETHYLAMMONIUM M...       NaN           0.4   NaN   178.3520 176.170
3              (2-ALKENYL(C15-21))SUCCINIC ANHYDRIDE       NaN           5.0   NaN            176.170
4  (N-OCTYL)TIN S,S'S" TRIS(ISOOCTYLMERCAPTOACETATE)       NaN           7.7   NaN           178.2650
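
As a minimal sketch of why the cast matters: a purely numeric column rejects .str, while casting with .astype(str) first lets the extraction run. The sample values, the CAS_PATTERN constant, and the helper name extract_cas are made up for illustration and are not taken from the FDA table. The last few lines also show that asking pd.DataFrame for a column name the source table does not contain gives an all-NaN column rather than an error, so it may be worth comparing the names you request against CEDI_HTML_tables[0].columns.

```python
import pandas as pd

# Assumed CAS-style pattern: digits, hyphen, two digits, hyphen, one digit.
# \u2011 is the non-breaking hyphen, accepted alongside the ASCII hyphen.
CAS_PATTERN = r'([0-9]+[\u2011-][0-9]{2}[\u2011-][0-9](?![0-9]))'


def extract_cas(column: pd.Series) -> pd.Series:
    """Hypothetical helper: cast every value to str, then pull out the first CAS-like number."""
    return column.astype(str).str.extract(CAS_PATTERN, expand=False)


# A purely numeric column has an integer dtype, so the .str accessor is rejected.
numeric = pd.Series([7732185, 64175])
try:
    numeric.str.extract(CAS_PATTERN)
except AttributeError as exc:
    print("numeric column:", exc)

# After the cast, strings, None, and numbers are all accepted; non-matches become NaN.
mixed = pd.Series(["50-00-0", "64\u201117\u20115 (see note)", None, 12345])
print(extract_cas(mixed))

# Requesting a column that is missing from the source gives NaN values, not an error.
source = pd.DataFrame({"MAINTERM": ["A", "B"], "CAS/ID NO": ["50-00-0", "64-17-5"]})
subset = pd.DataFrame(source, columns=["MAINTERM", "CAS NO"])
print(subset["CAS NO"].isna().all())
```

Passing expand=False makes str.extract return a Series for a single capture group, which keeps the assignment back into one column straightforward.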
