How do we select and concatenate a range of columns within a single DataFrame?



I'm trying to achieve this:

[Screenshot of the desired output]

But when I run the script below, I get an empty DataFrame.

import pandas as pd
df1 = pd.DataFrame({'Column1': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                   'Column2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Column3': ['I', 'II', 'III', 'IV', 'V', 'VI', 'VII', 'VIII', 'IX', 'X'],
                   'Column4': [pd.NA, pd.NA, pd.NA, pd.NA, pd.NA, pd.NA, pd.NA, pd.NA, pd.NA, pd.NA],
                   'Column5': ['K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T'],
                   'Column6': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
                   'Column7': ['XI', 'XII', 'XIII', 'XIV', 'XV', 'XVI', 'XVII', 'XVIII', 'XIX', 'XX'],
                   'Column8': [pd.NA, pd.NA, pd.NA, pd.NA, pd.NA, pd.NA, pd.NA, pd.NA, pd.NA, pd.NA],
                   'Column9': ['U', 'V', 'W', 'X', 'Y', 'Z', '', '', '', ''],
                   'Column10': [21, 22, 23, 24, 25, 26, pd.NA, pd.NA, pd.NA, pd.NA],
                   'Column11': ['XXI', 'XXII', 'XXIII', 'XXIV', 'XXV', 'XXVI', '', '', '', '']})
column_names = ['Letters', 'Numbers', 'RomanNumerals']
df4 = pd.DataFrame(columns = column_names)
while i<len(df1.columns):
    df2 = df1.iloc[:, i:i+3]
    df3 = df2.rename(index={0: 'Letters', 1: 'Numbers', 2: 'RomanNumerals'})
    df4 = pd.concat(df4, df3)
    i+=4
    
print(df4)
Empty DataFrame
Columns: [Letters, Numbers, RomanNumerals]
Index: []

Am I missing something?

Your code looks fine. You don't need df3. Just name the columns properly and it will work.

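i = 0  # the loop counter has to be initialized before the while loop
column_names = ['Letters', 'Numbers', 'RomanNumerals']
result = pd.DataFrame(columns=column_names)
while i < len(df1.columns):
    # Take three data columns; copy so that renaming does not touch df1
    df2 = df1.iloc[:, i:i+3].copy()
    df2.columns = column_names
    # pd.concat expects a list of frames as its first argument
    result = pd.concat([result, df2], axis=0)
    i += 4  # skip the empty spacer column after each group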
result.dropna(inplace=True)
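
To see why naming the columns matters, here is a minimal sketch on a tiny stand-in frame (hypothetical data, not the asker's `df1`). As far as I can tell, the original loop had three separate problems: `i` was never initialized, `rename(index=...)` relabels rows rather than columns, and `pd.concat(df4, df3)` passes the frames positionally instead of as a list. The names `df`, `by_index`, `by_columns` and `stacked` are only for illustration.

```python
import pandas as pd

# Tiny stand-in frame (hypothetical data)
df = pd.DataFrame({"Column1": ["A", "B"], "Column2": [1, 2], "Column3": ["I", "II"]})
new_names = ["Letters", "Numbers", "RomanNumerals"]

# rename(index=...) maps ROW labels; the column names stay unchanged
by_index = df.rename(index={0: "Letters", 1: "Numbers", 2: "RomanNumerals"})

# set_axis(..., axis=1) replaces the column labels by position and returns a new frame
by_columns = df.set_axis(new_names, axis=1)

# pd.concat takes a list (or other iterable) of frames as its first argument
stacked = pd.concat([by_columns, by_columns], ignore_index=True)

print(list(by_index.columns))    # still the original column names
print(list(by_columns.columns))  # the new column names
print(stacked.shape)
```

Because the frames only line up under a shared header when their column labels match, `concat` would otherwise place each group in its own set of columns and fill the rest with NaN.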
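
import numpy as np  # needed for np.nan below

# Normalize pd.NA and empty strings to NaN so that dropna can detect them
df1 = df1.replace([pd.NA, ''], np.nan)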
df1 = df1.dropna(axis=1, how='all')
num_cols = 3
column_names = ['Letters', 'Numbers', 'RomanNumerals']
columns = df1.columns
dfs = []
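for i in range(len(columns)//num_cols):
    # Slice the i-th group of three columns; copy before renaming
    temp_df = df1[columns[i*num_cols:(i+1)*num_cols]].copy()
    temp_df.columns = column_names
    dfs.append(temp_df)
# Stack the groups vertically, then drop the rows that are entirely empty
df1 = pd.concat(dfs, ignore_index=True)
df1 = df1.dropna(how='all')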
print(df1)

Output:

   Letters  Numbers RomanNumerals
0        A      1.0             I
1        B      2.0            II
2        C      3.0           III
3        D      4.0            IV
4        E      5.0             V
5        F      6.0            VI
6        G      7.0           VII
7        H      8.0          VIII
8        I      9.0            IX
9        J     10.0             X
10       K     11.0            XI
11       L     12.0           XII
12       M     13.0          XIII
13       N     14.0           XIV
14       O     15.0            XV
15       P     16.0           XVI
16       Q     17.0          XVII
17       R     18.0         XVIII
18       S     19.0           XIX
19       T     20.0            XX
20       U     21.0           XXI
21       V     22.0          XXII
22       W     23.0         XXIII
23       X     24.0          XXIV
24       Y     25.0           XXV
25       Z     26.0          XXVI
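
If you need this more than once, the same steps could be wrapped in a small helper. This is only a sketch under assumptions: the function name `stack_column_groups` and the `demo` frame are made up for illustration, and it assumes that every group has exactly `len(names)` columns and that spacer columns are entirely empty.

```python
import numpy as np
import pandas as pd

def stack_column_groups(df, names):
    """Stack consecutive groups of len(names) columns under the given names."""
    # Treat pd.NA and empty strings as missing, then drop all-empty spacer columns
    cleaned = df.replace([pd.NA, ""], np.nan).dropna(axis=1, how="all")
    width = len(names)
    if cleaned.shape[1] % width:
        raise ValueError("column count is not a multiple of the group width")
    # One renamed slice per group; set_axis returns a new frame each time
    parts = [
        cleaned.iloc[:, start:start + width].set_axis(names, axis=1)
        for start in range(0, cleaned.shape[1], width)
    ]
    # Stack the groups and drop rows where every value is missing
    return pd.concat(parts, ignore_index=True).dropna(how="all")

# Hypothetical two-group frame with one spacer column and a short second group
demo = pd.DataFrame({
    "a": ["A", "B"], "b": [1, 2], "c": ["I", "II"],
    "gap": [pd.NA, pd.NA],
    "d": ["C", ""], "e": [3, pd.NA], "f": ["III", ""],
})
out = stack_column_groups(demo, ["Letters", "Numbers", "RomanNumerals"])
print(out)
```

Using `set_axis` on each slice avoids assigning to `.columns` on a view of the original frame, and the explicit check fails early if the layout does not match the assumption.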

Try this:

import numpy as np  # needed for np.nan below

df1 = df1.replace([pd.NA, ""], np.nan)
df1 = df1.dropna(axis=1, how="all")
num_cols = 3
column_names = ["Letters", "Numbers", "RomanNumerals"]
columns = df1.columns

pd.concat(
    [
        df1[df1.columns[i :: num_cols]].unstack().reset_index(drop=True)
        for i in range(num_cols)
    ],
    axis=1,
).dropna().set_axis(column_names, axis=1)

Output:

   Letters  Numbers RomanNumerals
0        A      1.0             I
1        B      2.0            II
2        C      3.0           III
3        D      4.0            IV
4        E      5.0             V
5        F      6.0            VI
6        G      7.0           VII
7        H      8.0          VIII
8        I      9.0            IX
9        J     10.0             X
10       K     11.0            XI
11       L     12.0           XII
12       M     13.0          XIII
13       N     14.0           XIV
14       O     15.0            XV
15       P     16.0           XVI
16       Q     17.0          XVII
17       R     18.0         XVIII
18       S     19.0           XIX
19       T     20.0            XX
20       U     21.0           XXI
21       V     22.0          XXII
22       W     23.0         XXIII
23       X     24.0          XXIV
24       Y     25.0           XXV
25       Z     26.0          XXVI
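
In case the one-liner is hard to read, here is a rough sketch of its two building blocks on a small made-up frame (the column names `L1`, `N1`, and so on are hypothetical). Slicing the column index with a step picks every third column, and `unstack()` on a DataFrame with a plain index returns a Series that reads those columns one after another.

```python
import pandas as pd

# Hypothetical frame: two groups of (letter, number, roman) columns, no spacers
small = pd.DataFrame({
    "L1": ["A", "B"], "N1": [1, 2], "R1": ["I", "II"],
    "L2": ["C", "D"], "N2": [3, 4], "R2": ["III", "IV"],
})

# Every 3rd column starting at position 0 gives the "letters" columns
letter_cols = list(small.columns[0::3])

# unstack() flattens the selected columns into one Series, column by column
letters = small[small.columns[0::3]].unstack().reset_index(drop=True)

print(letter_cols)
print(list(letters))
```

Repeating this for start positions 0, 1 and 2 and concatenating the three Series side by side with `axis=1` is what the answer's comprehension does.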
