如何根据一系列组数据项转置 csv 文件数据



需要一些与转置相关的python代码的帮助。有一个csv文件在下面查看(a1&b1a2&b2c1&c2d1&d2像组)

r1;a1;b1;a2;b2;c1;c2;d1;d2
12;ec;em;ed;em;ec;ek;;
23;ec;ek;ec;rk;;;;
34;ec;rk;ec;jk;ek;45;;

需要将输出文件获取为

12;ec;em
12;ed;em
12;ec;ek
23;ec;ek
23;ec;rk
34;ec;rk
34;ec;jk
34;ek;45

一种选择是根据列成对循环访问数据帧,然后连接成一个数据帧:


temp = df.set_index('r1')
outcome = [temp.iloc[:, n:n+2]
.set_axis(['left', 'right'], axis = 'columns')
for n in range(0, len(temp.columns), 2)]
pd.concat(outcome).dropna().sort_index().reset_index()
r1 left right
0  12   ec    em
1  12   ed    em
2  12   ec    ek
3  23   ec    ek
4  23   ec    rk
5  34   ec    rk
6  34   ec    jk
7  34   ek    45

一个可能更快的选择是将所有东西都转储到 numpy 中,在返回熊猫清理之前重新塑造:

temp = df.set_index('r1')
# since the column reshape is in pairs, we use 2:
outcome = np.reshape(temp.to_numpy(), (-1, 2))
# index will be repeated for the pairs of columns:
index = temp.index.repeat(len(temp.columns)//2)
pd.DataFrame(outcome, index = index).sort_index().dropna().reset_index()
r1   0   1
0  12  ec  em
1  12  ed  em
2  12  ec  ek
3  23  ec  ek
4  23  ec  rk
5  34  ec  rk
6  34  ec  jk
7  34  ek  45

另一个选项是从 pyjanitor pivot_longer,您可以在其中传递重塑所需的模式:

(
df
.pivot_longer(index = 'r1', 
names_to = ['left', 'right'], 
names_pattern = ["a1|a2|c1|d1", "b1|b2|c2|d2"], 
sort_by_appearance = True)
.dropna()
)
r1 left right
0   12   ec    em
1   12   ed    em
2   12   ec    ek
4   23   ec    ek
5   23   ec    rk
8   34   ec    rk
9   34   ec    jk
10  34   ek    45

我将尝试仅建议一个简单的示例。我试图让它尽可能明确和模块化,例如它很容易看到并随心所欲地更改。

import csv

def flatten(t):
return [item for sublist in t for item in sublist]

def reformat_first_with_pairs(p_list):
reformated = []
for i in range(1, len(p_list), 2):
reformated.append([p_list[0], p_list[i], p_list[i + 1]])
return reformated

def disqualify_if_contains(p_list, value):
filtered = []
for key, sub_list in enumerate(p_list):
if not value in sub_list:
filtered.append(sub_list)
return filtered

def rows_from_csv_file(file_name, f_delimiter=';'):
input_file_rows = []
try:
with open(file_name, newline='') as csvfile:
reader = csv.reader(csvfile)
for row in reader:
input_file_rows.append(row[0].split(f_delimiter))
except EnvironmentError:
raise
return input_file_rows

def rows_to_csv_file(file_name, rows, f_delimiter=';'):
try:
with open(file_name, 'w+', newline='') as csvfile:
writer = csv.writer(csvfile, delimiter=f_delimiter,
quotechar='|', quoting=csv.QUOTE_MINIMAL)
for row in rows:
writer.writerow(row)
return True
except EnvironmentError:
raise

def custom_csv_reformat_00(input_file, output_file):
try:
input_file_rows = rows_from_csv_file(input_file)
except EnvironmentError as e:
raise
input_file_rows.pop(0)
reformated_list = flatten(map(reformat_first_with_pairs, input_file_rows))
reformated_list = disqualify_if_contains(reformated_list, '')
try:
rows_to_csv_file(output_file, reformated_list)
except EnvironmentError as e:
raise

IF_NAME = 'input.csv'
OF_NAME = 'output.csv'
try:
custom_csv_reformat_00(IF_NAME, OF_NAME)
except Exception as e:
print(e)

再次指出,我不知道您的目标是否是删除第一行,如下所示:

input_file_rows.pop(0)

相关内容

  • 没有找到相关文章

最新更新