- I can print two columns of a pandas DataFrame like this, but how do I format the output row by row?
- Here is my "ugly" solution, followed by what I expected to work:
import pandas

def date_normalization(data: pandas.DataFrame) -> None:
    # EDIT: add completed code
    # convert to desired date format (unparseable values become NaN)
    data[normalized] = pandas.to_datetime(
        data[original],
        errors="coerce",
    ).dt.strftime('%d/%m/%Y')

original = "start"
normalized = "normalized"

data = pandas.DataFrame({
    original:
    {
        0: "AUG 26 2016",
        1: "JAN-FEB 2021",
        2: "2017-06-01 00:00:00"
    }})

date_normalization(data)

# remove rows with invalid date
data = data[data[normalized].notnull()]

# arrggghh ... this is working, but ugly 👹👹👹 ...
for i, before in enumerate(data[original]):
    for j, after in enumerate(data[normalized]):
        if i == j:
            print(f"row {i}: {before} -> {after}")

print("\n")

# surprisingly (?) this doesn't work 🥴
for row in data:
    print(f"{row[original]} -> {row[normalized]}")
This is the output of the first loop, followed by the error I get on my second attempt:
row 0: AUG 26 2016 -> 26/08/2016
row 1: 2017-06-01 00:00:00 -> 01/06/2017
Traceback (most recent call last):
  File "/home/oren/Downloads/GGG/main.py", line 36, in <module>
    print(f"{row[original]} -> {row[normalized]}")
TypeError: string indices must be integers
Because the new column normalized has been created, you can iterate over both columns in parallel with zip:
import pandas as pd

def date_normalization(data: pd.DataFrame) -> pd.DataFrame:
    # EDIT: add completed code
    # convert to desired date format (unparseable values become NaN)
    data[normalized] = pd.to_datetime(
        data[original],
        errors="coerce",
    ).dt.strftime('%d/%m/%Y')
    # drop the rows whose date could not be parsed
    return data.dropna(subset=[normalized])

original = "start"
normalized = "normalized"

data = pd.DataFrame({
    original:
    {
        0: "AUG 26 2016",
        1: "JAN-FEB 2021",
        2: "2017-06-01 00:00:00"
    }})

data = date_normalization(data)
print(data)
                 start  normalized
0          AUG 26 2016  26/08/2016
2  2017-06-01 00:00:00  01/06/2017
for o, n in zip(data[original], data[normalized]):
    print(f"{o} -> {n}")
AUG 26 2016 -> 26/08/2016
2017-06-01 00:00:00 -> 01/06/2017
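After dropping the NaN rows, you can reset the index with data.reset_index(drop=True, inplace=True). If you do not reset it, the original index labels are kept even though some rows have been removed (note the 0 and 2 in the printed frame above).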
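To make the point about the index concrete, here is a minimal sketch. It assumes a small hand-written frame with the same column names as above and a missing value already in place (so it does not depend on how the dates are parsed); the variable names `filtered`, `index_before` and `index_after` are only for illustration.

```python
import pandas as pd

# Hypothetical frame mirroring the example data; the second row has no
# normalized date, as if parsing had failed for it.
data = pd.DataFrame({
    "start": ["AUG 26 2016", "JAN-FEB 2021", "2017-06-01 00:00:00"],
    "normalized": ["26/08/2016", None, "01/06/2017"],
})

# Dropping the invalid row keeps the original labels: 0 and 2 remain.
filtered = data.dropna(subset=["normalized"])
index_before = list(filtered.index)

# reset_index(drop=True) renumbers the rows from 0 and discards the old labels.
filtered = filtered.reset_index(drop=True)
index_after = list(filtered.index)

print(index_before, "->", index_after)
```

This version assigns the result of reset_index back to the variable instead of using inplace=True; either form should give the same renumbered index.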
You can use DataFrame.iterrows, which yields (index, row) pairs:
for index, row in data.iterrows():
    print(f"{row[original]} -> {row[normalized]}")
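As a side note on why the plain `for row in data` loop raised the TypeError: iterating over a DataFrame directly yields its column labels (strings), not its rows, so `row[original]` ends up indexing a string with another string. The sketch below illustrates the difference on a small hand-written frame; the frame contents and the names `labels` and `lines` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical two-row frame with the same column names as in the question.
data = pd.DataFrame({
    "start": ["AUG 26 2016", "2017-06-01 00:00:00"],
    "normalized": ["26/08/2016", "01/06/2017"],
})

# Iterating the DataFrame itself gives the column labels, one per iteration.
labels = [row for row in data]

# iterrows() gives (index, Series) pairs, so values can be looked up by column name.
lines = [f"{row['start']} -> {row['normalized']}" for _, row in data.iterrows()]

print(labels)
for line in lines:
    print(line)
```

For a frame this small either approach is fine; the zip version shown earlier avoids building a Series per row, which may matter on larger data.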