Generate a new field in a CSV with Python



I have this CSV file:

movieId;title;genres
1;Toy Story (1995);Adventure|Animation|Children|Comedy|Fantasy
2;Jumanji (1995);Adventure|Children|Fantasy
3;Grumpier Old Men (1995);Comedy|Romance
4;Waiting to Exhale (1995);Comedy|Drama|Romance
5;Father of the Bride Part II (1995);Comedy
6;Heat (1995);Action|Crime|Thriller
7;Sabrina (1995);Comedy|Romance
8;Tom and Huck (1995);Adventure|Children
9;Hate (Haine, La) (1995);Crime|Drama
10;Seven (a.k.a. Se7en) (1995);Mystery|Thriller

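I want to generate a new field named year from the title field, since the title also contains the movie's release year. I tried the following, but it does not work: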

import pandas
df=pandas.read_csv("/Users/Desktop/IMDB.csv")
str=df
str1="(19"
str2="(20"
str3="(21"
str.find(str1, beg=0, end=len(string))
str.find(str1, beg=0, end=len(string)) 
str.find(str1, beg=0, end=len(string))

If the title contains a four-digit number in parentheses, use str.extract with a regular expression to capture the value between the parentheses:

df['year'] = df['title'].str.extract(r'\((\d{4})\)', expand=False).astype(int)
print (df)
   movieId                               title  
0        1                    Toy Story (1995)   
1        2                      Jumanji (1995)   
2        3             Grumpier Old Men (1995)   
3        4            Waiting to Exhale (1995)   
4        5  Father of the Bride Part II (1995)   
5        6                         Heat (1995)   
6        7                      Sabrina (1995)   
7        8                 Tom and Huck (1995)   
8        9             Hate (Haine, La) (1995)   
9       10         Seven (a.k.a. Se7en) (1995)   
                                        genres  year  
0  Adventure|Animation|Children|Comedy|Fantasy  1995  
1                   Adventure|Children|Fantasy  1995  
2                               Comedy|Romance  1995  
3                         Comedy|Drama|Romance  1995  
4                                       Comedy  1995  
5                        Action|Crime|Thriller  1995  
6                               Comedy|Romance  1995  
7                           Adventure|Children  1995  
8                                  Crime|Drama  1995  
9                             Mystery|Thriller  1995  
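Two details may be worth handling in practice. The file is semicolon-delimited, so read_csv likely needs sep=";" to split the columns, and .astype(int) will raise if any title has no year. Below is a minimal sketch, assuming the data from the question plus one made-up row without a year (the "Untitled Project" title is hypothetical, added only to show the missing-value case). Anchoring the pattern at the end of the title is also an assumption about how the titles are formatted.

```python
import io
import pandas as pd

# Sample rows from the question; the last row is hypothetical (no year).
raw = """movieId;title;genres
1;Toy Story (1995);Adventure|Animation|Children|Comedy|Fantasy
9;Hate (Haine, La) (1995);Crime|Drama
10;Seven (a.k.a. Se7en) (1995);Mystery|Thriller
11;Untitled Project;Drama
"""

# The file uses semicolons as the delimiter, so pass sep=";".
df = pd.read_csv(io.StringIO(raw), sep=";")

# Capture four digits inside parentheses at the end of the title.
years = df["title"].str.extract(r"\((\d{4})\)\s*$", expand=False)

# Titles without a match become NaN; the nullable Int64 dtype keeps
# them as <NA> instead of raising the way .astype(int) would.
df["year"] = pd.to_numeric(years).astype("Int64")

print(df[["movieId", "title", "year"]])
```

With a real file, the io.StringIO(raw) argument would be replaced by the path to the CSV.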
