Extract strings starting with "Unit" from a column and copy them to a new column: pandas



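Below is what my input data looks like. I want to use pandas/python/regex to extract every string that starts with "Unit" into a new second column, each at the same row position it has in column A. Any help would be appreciated.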

Input:
A
MARYLAND
Unit6
Unit7
Unit8
NEW SECTOR
Unit1
Unit2
NORTH SECTOR
Unit1
Unit2
PVT SECTOR
PUBLIC SECTOR
Unit1
Unit2
CENTRAL SECTOR
THERMAL
SOUTH SECTOR
Unit1
Unit2
Unit3
ACCOUNT SECTOR
DOLBY DIGITAL
WASHINGTON

Output:
A              B
MARYLAND            
Unit6           Unit6
Unit7           Unit7
Unit8           Unit8
NEW SECTOR          
Unit1           Unit1
Unit2           Unit2
NORTH SECTOR            
Unit1           Unit1
Unit2           Unit2
PVT SECTOR          
PUBLIC SECTOR           
Unit1           Unit1
Unit2           Unit2
CENTRAL SECTOR          
THERMAL         
SOUTH SECTOR            
Unit1           Unit1
Unit2           Unit2
Unit3           Unit3
ACCOUNT SECTOR          
DOLBY DIGITAL           
WASHINGTON          

Finally, now that the "Unit" strings have been copied to the new column, I want to remove those values from column A:

A               B
MARYLAND
                Unit6
                Unit7
                Unit8
NEW SECTOR
                Unit1
                Unit2
NORTH SECTOR
                Unit1
                Unit2
PVT SECTOR
PUBLIC SECTOR
                Unit1
                Unit2
CENTRAL SECTOR
THERMAL
SOUTH SECTOR
                Unit1
                Unit2
                Unit3
ACCOUNT SECTOR
DOLBY DIGITAL
WASHINGTON

Use str.extract together with fillna:

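# capture "Unit" followed by digits at the start of the string; non-matching rows become NaN
df['B'] = df['A'].str.extract(r'(^Unit\d+)', expand=False)
# blank out column A wherever a Unit value was captured
df.loc[df['B'].notnull(), 'A'] = ''
# replace the remaining NaN values in B with empty strings
df['B'] = df['B'].fillna('')
print(df)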
                 A      B
0         MARYLAND       
1                   Unit6
2                   Unit7
3                   Unit8
4       NEW SECTOR       
5                   Unit1
6                   Unit2
7     NORTH SECTOR       
8                   Unit1
9                   Unit2
10      PVT SECTOR       
11   PUBLIC SECTOR       
12                  Unit1
13                  Unit2
14  CENTRAL SECTOR       
15         THERMAL       
16    SOUTH SECTOR       
17                  Unit1
18                  Unit2
19                  Unit3
20  ACCOUNT SECTOR       
21   DOLBY DIGITAL       
22      WASHINGTON       
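
For anyone who wants to run the str.extract approach end to end, here is a minimal self-contained sketch. The function name split_units and the shortened sample frame are made up for illustration; only column A comes from the question. Note that the pattern is a raw string with `\d` for digits, and that it captures only "Unit" plus the digits that follow, so a value such as "Unit6A" would lose its trailing text.

```python
import pandas as pd


def split_units(df: pd.DataFrame) -> pd.DataFrame:
    """Move values like 'Unit6' from column A into a new column B."""
    out = df.copy()
    # expand=False returns a Series; rows without a match become NaN
    out["B"] = out["A"].str.extract(r"^(Unit\d+)", expand=False)
    # clear column A on the rows where a Unit value was captured
    out.loc[out["B"].notna(), "A"] = ""
    # turn the remaining NaN values in B into empty strings
    out["B"] = out["B"].fillna("")
    return out


# shortened, hypothetical sample based on the question's input
sample = pd.DataFrame(
    {"A": ["MARYLAND", "Unit6", "Unit7", "NEW SECTOR", "Unit1", "THERMAL"]}
)
result = split_units(sample)
print(result)
```

Working on a copy keeps the original frame untouched, which makes it easier to compare before and after.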

Another approach, using a boolean mask built from column A to index it:

df["B"] = df["A"][df['A'].str.contains('^Unit', regex=True)]
df["B"] = df["B"].fillna("")
A        B
0   MARYLAND    
1   Unit6    Unit6
2   Unit7    Unit7
3   Unit8    Unit8
4   NEW SECTOR  
5   Unit1    Unit1
6   Unit2    Unit2
7   NORTH SECTOR    
8   Unit1    Unit1
9   Unit2    Unit2
10  PVT SECTOR  
11  PUBLIC SECTOR   
12  Unit1    Unit1
13  Unit2    Unit2
14  CENTRAL SECTOR  
15  THERMAL 
16  SOUTH SECTOR    
17  Unit1    Unit1
18  Unit2    Unit2
19  Unit3    Unit3
20  ACCOUNT SECTOR  
21  DOLBY DIGITAL   
22  WASHINGTON  
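
The output above still has the "Unit" values in column A, so the last step of the question (clearing them from A) is not covered by this variant. One possible way to do both steps with the same boolean mask is sketched below, using Series.where and Series.mask. It assumes a plain prefix test with str.startswith is enough (no digits required after "Unit"); the sample frame and the name is_unit are illustrative only.

```python
import pandas as pd

# shortened, hypothetical sample based on the question's input
df = pd.DataFrame(
    {"A": ["PVT SECTOR", "PUBLIC SECTOR", "Unit1", "Unit2", "CENTRAL SECTOR"]}
)

# True for rows whose value starts with "Unit"; na=False treats missing values as non-matches
is_unit = df["A"].str.startswith("Unit", na=False)

# where() keeps values where the mask is True and substitutes "" elsewhere
df["B"] = df["A"].where(is_unit, "")
# mask() is the inverse: it substitutes "" where the mask is True
df["A"] = df["A"].mask(is_unit, "")

print(df)
```

Column B has to be filled before column A is cleared, since both are derived from the original values in A.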
