I am trying to read a CSV from https://raw.githubusercontent.com/LinkedInLearning/data_cleaning_python_2883183/main/Ch04/challenge/traffic.csv:
df = pd.read_csv('https://raw.githubusercontent.com/LinkedInLearning/data_cleaning_python_2883183/main/Ch04/challenge/traffic.csv', parse_dates=['time'])
However, the time column is still stored as strings:
df.dtypes
[output]
ip object
time object
path object
status int64
size int64
dtype: object
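Interestingly, when I read a similar CSV from a different URL, it works:
df = pd.read_csv('https://raw.githubusercontent.com/LinkedInLearning/data_cleaning_python_2883183/main/Ch04/solution/traffic.csv', parse_dates=['time'])
This call does convert the time column to datetime objects. Why does parse_dates fail for the first link, and how can I fix it?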
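The first file contains datetimes that fall outside the range pandas can represent at nanosecond resolution, so parse_dates leaves the column as strings. For example: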
1017-06-19 14:46:24
One possible solution is to convert those values to NaT:
df['time'] = pd.to_datetime(df['time'], errors='coerce')
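To see which rows are affected before overwriting the column, something like the following sketch may help. The two-row CSV here is made-up sample data that only mimics the file's columns, and the names csv_text, sample, parsed and bad_rows are placeholders. It assumes a pandas version that parses strings to nanosecond-resolution timestamps, where the year 1017 is out of bounds and gets coerced to NaT; other versions may parse that row instead.

```python
import io

import pandas as pd

# Made-up sample that mimics the layout of traffic.csv (not the real data).
csv_text = """ip,time,path,status,size
10.0.0.1,2017-06-19 14:46:24,/index.html,200,1024
10.0.0.2,1017-06-19 14:46:24,/about.html,200,2048
"""

sample = pd.read_csv(io.StringIO(csv_text))

# Lower bound of nanosecond-resolution timestamps (in the year 1677).
print(pd.Timestamp.min)

# Parse separately so the original strings stay available for inspection.
# Values that cannot be represented become NaT instead of raising.
parsed = pd.to_datetime(sample["time"], errors="coerce")

# Rows whose time value failed to parse.
bad_rows = sample[parsed.isna()]
print(bad_rows)
```

Keeping the parsed result in a separate variable makes it possible to look at the offending strings first and decide whether to drop those rows, correct them by hand, or accept NaT.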