需要一些与转置相关的python代码的帮助。有一个csv文件在下面查看(a1
&b1
,a2
&b2
,c1
&c2
,d1
&d2
像组)
r1;a1;b1;a2;b2;c1;c2;d1;d2
12;ec;em;ed;em;ec;ek;;
23;ec;ek;ec;rk;;;;
34;ec;rk;ec;jk;ek;45;;
需要将输出文件获取为
12;ec;em
12;ed;em
12;ec;ek
23;ec;ek
23;ec;rk
34;ec;rk
34;ec;jk
34;ek;45
一种选择是根据列成对循环访问数据帧,然后连接成一个数据帧:
temp = df.set_index('r1')
outcome = [temp.iloc[:, n:n+2]
.set_axis(['left', 'right'], axis = 'columns')
for n in range(0, len(temp.columns), 2)]
pd.concat(outcome).dropna().sort_index().reset_index()
r1 left right
0 12 ec em
1 12 ed em
2 12 ec ek
3 23 ec ek
4 23 ec rk
5 34 ec rk
6 34 ec jk
7 34 ek 45
一个可能更快的选择是将所有东西都转储到 numpy 中,在返回熊猫清理之前重新塑造:
temp = df.set_index('r1')
# since the column reshape is in pairs, we use 2:
outcome = np.reshape(temp.to_numpy(), (-1, 2))
# index will be repeated for the pairs of columns:
index = temp.index.repeat(len(temp.columns)//2)
pd.DataFrame(outcome, index = index).sort_index().dropna().reset_index()
r1 0 1
0 12 ec em
1 12 ed em
2 12 ec ek
3 23 ec ek
4 23 ec rk
5 34 ec rk
6 34 ec jk
7 34 ek 45
另一个选项是从 pyjanitor pivot_longer,您可以在其中传递重塑所需的模式:
(
df
.pivot_longer(index = 'r1',
names_to = ['left', 'right'],
names_pattern = ["a1|a2|c1|d1", "b1|b2|c2|d2"],
sort_by_appearance = True)
.dropna()
)
r1 left right
0 12 ec em
1 12 ed em
2 12 ec ek
4 23 ec ek
5 23 ec rk
8 34 ec rk
9 34 ec jk
10 34 ek 45
我将尝试仅建议一个简单的示例。我试图让它尽可能明确和模块化,例如它很容易看到并随心所欲地更改。
import csv
def flatten(t):
return [item for sublist in t for item in sublist]
def reformat_first_with_pairs(p_list):
reformated = []
for i in range(1, len(p_list), 2):
reformated.append([p_list[0], p_list[i], p_list[i + 1]])
return reformated
def disqualify_if_contains(p_list, value):
filtered = []
for key, sub_list in enumerate(p_list):
if not value in sub_list:
filtered.append(sub_list)
return filtered
def rows_from_csv_file(file_name, f_delimiter=';'):
input_file_rows = []
try:
with open(file_name, newline='') as csvfile:
reader = csv.reader(csvfile)
for row in reader:
input_file_rows.append(row[0].split(f_delimiter))
except EnvironmentError:
raise
return input_file_rows
def rows_to_csv_file(file_name, rows, f_delimiter=';'):
try:
with open(file_name, 'w+', newline='') as csvfile:
writer = csv.writer(csvfile, delimiter=f_delimiter,
quotechar='|', quoting=csv.QUOTE_MINIMAL)
for row in rows:
writer.writerow(row)
return True
except EnvironmentError:
raise
def custom_csv_reformat_00(input_file, output_file):
try:
input_file_rows = rows_from_csv_file(input_file)
except EnvironmentError as e:
raise
input_file_rows.pop(0)
reformated_list = flatten(map(reformat_first_with_pairs, input_file_rows))
reformated_list = disqualify_if_contains(reformated_list, '')
try:
rows_to_csv_file(output_file, reformated_list)
except EnvironmentError as e:
raise
IF_NAME = 'input.csv'
OF_NAME = 'output.csv'
try:
custom_csv_reformat_00(IF_NAME, OF_NAME)
except Exception as e:
print(e)
再次指出,我不知道您的目标是否是删除第一行,如下所示:
input_file_rows.pop(0)