如何使用NAN选择Pandas数据集中的所有行,然后将其转换为新列



[原始数据][1]我有一组全年阅读的书名数据。我只想在原始数据的基础上添加一个名为Date的新列。


Tittle         Page        Author    Rating
0   Monday, Mar. 1  NaN         NaN      NaN
1   Tittle 1        5.0         JHK      1.50
2   Tittle 2        13.0        ABB      0.03
3   Tittle 3        100.0       ACC      3.5
4   Tittle 4        9.0          NN      5.40
5   Tuesday, Jan. 2 NaN         NaN       NaN
6   Tittle 5        6.0         BBB      6.50
7   Tittle 7        14.0        CCC      10.00
8   Tittle 8        10.0        CNN      2.50
9   Wednesday, Dec.3 NaN        NaN      NaN
10  Tittle 10       5.0         CBS      1.00
11  Title 20        5.0         ABC      1.00
12  Title 21        25.0        JJJ      3.50
13  Title 22        1.0         NNN      7.50
14  Thursday, Mar.4 NaN         NaN      NaN
15  Title 25        100.0       VVV      9.00
16  Title 30        6.0         YYYY     9.00
17  Title 35        2.0         QQQ      9.00

我试过使用dropna((,但最终它只是去掉了整行。

dfs = pd.read_csv('Book2.csv')
df = dfs.dropna()
display(df)
Tittle  Page    Author  Rating
1   Tittle 1    5.0     JHK     1.50
2   Tittle 2    13.0    ABB     0.03
4   Tittle 4    9.0     tvN     5.40
6   Tittle 5    6.0     BBB     6.50
7   Tittle 7    14.0    CCC     10.00
8   Tittle 8    10.0    CNN     2.50
10  Tittle 10   5.0     CBS     1.00
11  Title 20    5.0     ABC     1.00
12  Title 21    25.0    JJJ     3.50
13  Title 22    1.0     NNN     7.50
15  Title 25    100.0   VVV     9.00
16  Title 30    6.0     YYYY    9.00
17  Title 35    2.0     QQQ     9.00

我曾尝试使用pd.isna((来制作一个新的数据帧,但结果并不是我想要的样子。

dfs = pd.read_csv('Book2.csv')
df = dfs[dfs.isnull().any(axis=1)]
display(df)
Tittle  Page    Author  Rating
0   Monday, Mar. 1  NaN NaN NaN
5   Tuesday, Jan. 2 NaN NaN NaN
9   Wednesday, Dec. 3   NaN NaN NaN
14  Thursday, Mar. 4    NaN NaN NaN

最后,我只是在excel中手动编辑它,使它看起来像我想要的样子。[我希望它看起来像这样。][2]

Tittle  Page    Author  Rating  Date
0   Tittle 1    5   JHK 1.50    Monday, Mar. 1
1   Tittle 2    13  ABB 0.03    Monday, Mar. 1
2   Tittle 3    100 ACC 4.50    Monday, Mar. 1
3   Tittle 4    9   tvN 5.40    Monday, Mar. 1
4   Tittle 5    6   BBB 6.50    Tuesday, Jan. 2
5   Tittle 7    14  CCC 10.00   Tuesday, Jan. 2
6   Tittle 8    10  CNN 2.50    Tuesday, Jan. 2
7   Tittle 10   5   CBS 1.00    Wednesday, Dec. 3
8   Title 20    5   ABC 1.00    Wednesday, Dec. 3
9   Title 21    25  JJJ 3.50    Wednesday, Dec. 3
10  Title 22    1   NNN 7.50    Wednesday, Dec. 3
11  Title 25    100 VVV 9.00    Thursday, Mar. 4
12  Title 30    6   YYYY    9.00    Thursday, Mar. 4
13  Title 35    2   QQQ 9.00    Thursday, Mar. 4
```


So I could sort or groupby the data by date later on if i want to see if there is a trend on the students reading habits in the future but for now I just want to add another date column and use the date that is all ready there in the data set if possible. I have to comb through a total of 181 pages of this and I am hoping pandas could help cut down the hours I will have to spend editing this manually in excel and using the copy and paste.
if you have any other recommendation in how to efficiently wrangle this data-set where the title will not mix in the same column and the NAN will be taken cared off. It will be greatly appreciated. 

[1]: https://i.stack.imgur.com/JBihm.png
[2]: https://i.stack.imgur.com/nlJbE.png

fillna()填充具有三个或更多NA编号的条件提取的数据帧。将生成的日期序列与NA为零的数据组合。

df1 = df.loc[df.isnull().sum(axis=1) == 3].reindex(index=df.index)
df1.fillna(method='ffill', inplace=True)
df1 = pd.concat([df1[['Tittle']], df.loc[df.isnull().sum(axis=1) == 0]], axis=1, join='inner')
df1.columns =['Date', 'Tittle', 'Page', 'Author', 'Rating']
df1
Date    Tittle  Page    Author  Rating
1   Monday, Mar. 1  Tittle 1    5.0     JHK     1.50
2   Monday, Mar. 1  Tittle 2    13.0    ABB     0.03
3   Monday, Mar. 1  Tittle 3    100.0   ACC     3.50
4   Monday, Mar. 1  Tittle 4    9.0     NN  5.40
6   Tuesday, Jan. 2     Tittle 5    6.0     BBB     6.50
7   Tuesday, Jan. 2     Tittle 7    14.0    CCC     10.00
8   Tuesday, Jan. 2     Tittle 8    10.0    CNN     2.50
10  Wednesday, Dec. 3   Tittle 10   5.0     CBS     1.00
11  Wednesday, Dec. 3   Title 20    5.0     ABC     1.00
12  Wednesday, Dec. 3   Title 21    25.0    JJJ     3.50
13  Wednesday, Dec. 3   Title 22    1.0     NNN     7.50
15  Thursday, Mar. 4    Title 25    100.0   VVV     9.00
16  Thursday, Mar. 4    Title 30    6.0     YYYY    9.00
17  Thursday, Mar. 4    Title 35    2.0     QQQ     9.00

相关内容

最新更新