How do I create a column in Python from a string extracted from another column?



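I'm trying to read information from one column of my CSV file and use it to create a new column. Please help.

I imported the CSV file and printed the first 10 rows (plus the header), but now I want to create a year column from the title column.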

```
import csv
from itertools import islice
from operator import itemgetter

# Open the CSV file
with open('/home/raymondossai/movies.csv', mode='r') as file:
    # Read the CSV file
    csvFile = csv.reader(file)
    # Display the header plus the first 10 rows
    for row in islice(csvFile, 11):
        print(row)
```

Result:

['movieId', 'title', 'genres']
['1', 'Toy Story (1995)', 'Adventure|Animation|Children|Comedy|Fantasy']
['2', 'Jumanji (1995)', 'Adventure|Children|Fantasy']
['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']
['4', 'Waiting to Exhale (1995)', 'Comedy|Drama|Romance']
['5', 'Father of the Bride Part II (1995)', 'Comedy']
['6', 'Heat (1995)', 'Action|Crime|Thriller']
['7', 'Sabrina (1995)', 'Comedy|Romance']
['8', 'Tom and Huck (1995)', 'Adventure|Children']
['9', 'Sudden Death (1995)', 'Action']
['10', 'GoldenEye (1995)', 'Action|Adventure|Thriller']

You can use `re` to extract the year from the title:

import re

rows = [
    ["movieId", "title", "genres"],
    ["1", "Toy Story (1995)", "Adventure|Animation|Children|Comedy|Fantasy"],
    ["2", "Jumanji (1995)", "Adventure|Children|Fantasy"],
    ["3", "Grumpier Old Men (1995)", "Comedy|Romance"],
    ["4", "Waiting to Exhale (1995)", "Comedy|Drama|Romance"],
    ["5", "Father of the Bride Part II (1995)", "Comedy"],
    ["6", "Heat (1995)", "Action|Crime|Thriller"],
    ["7", "Sabrina (1995)", "Comedy|Romance"],
    ["8", "Tom and Huck (1995)", "Adventure|Children"],
    ["9", "Sudden Death (1995)", "Action"],
    ["10", "GoldenEye (1995)", "Action|Adventure|Thriller"],
]

# Match a four-digit year inside literal parentheses, e.g. "(1995)"
pat = re.compile(r"\((\d{4})\)")

# Skip the header row, then append the year (or "N/A") to each row
for movie_id, title, genres in rows[1:]:
    year = pat.search(title)
    print([movie_id, title, genres, year.group(1) if year else "N/A"])

This prints:

['1', 'Toy Story (1995)', 'Adventure|Animation|Children|Comedy|Fantasy', '1995']
['2', 'Jumanji (1995)', 'Adventure|Children|Fantasy', '1995']
['3', 'Grumpier Old Men (1995)', 'Comedy|Romance', '1995']
['4', 'Waiting to Exhale (1995)', 'Comedy|Drama|Romance', '1995']
['5', 'Father of the Bride Part II (1995)', 'Comedy', '1995']
['6', 'Heat (1995)', 'Action|Crime|Thriller', '1995']
['7', 'Sabrina (1995)', 'Comedy|Romance', '1995']
['8', 'Tom and Huck (1995)', 'Adventure|Children', '1995']
['9', 'Sudden Death (1995)', 'Action', '1995']
['10', 'GoldenEye (1995)', 'Action|Adventure|Thriller', '1995']
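
To apply the same pattern to the file itself and keep the year as a real column, one option is to append the value to each row and write the rows back out with `csv.writer`. The following is only a minimal sketch: it reads from an in-memory string instead of `movies.csv` so that it runs on its own, the helper name `add_year_column` and the `year` header are illustrative choices, and the third data row is a made-up title without a year, included to show the fallback.

```python
import csv
import io
import re

# Matches a four-digit year in parentheses, e.g. "(1995)".
YEAR_PATTERN = re.compile(r"\((\d{4})\)")


def add_year_column(rows):
    """Yield each row with a year appended; the first row is treated as the header."""
    rows = iter(rows)
    header = next(rows)
    yield header + ["year"]
    for movie_id, title, genres in rows:
        match = YEAR_PATTERN.search(title)
        yield [movie_id, title, genres, match.group(1) if match else "N/A"]


# Stand-in for the real file; the last row is hypothetical (no year in the title).
sample = io.StringIO(
    "movieId,title,genres\n"
    "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy\n"
    "2,Jumanji (1995),Adventure|Children|Fantasy\n"
    "3,Untitled Movie,Drama\n"
)

# Write the extended rows to an in-memory buffer instead of a file on disk.
output = io.StringIO()
writer = csv.writer(output, lineterminator="\n")
writer.writerows(add_year_column(csv.reader(sample)))
result = output.getvalue()
print(result)
```

With a real file, `sample` would be replaced by the handle from `open(...)` and `output` by a second file opened for writing with `newline=''`.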

You should definitely use pandas; it makes working with tables easier and cleaner.

Try reading the CSV file like this:

import pandas as pd
df = pd.read_csv('/home/raymondossai/movies.csv')

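The `df` object is essentially your CSV table represented as a Python object.

To add the year as an extra column, you can use the `str.split()` method, since the year always follows the ' (' expression: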

# Take the first 4 characters after the ' (' separator as the year.
# A multi-character pat is treated as a regular expression, so the parenthesis is escaped.
df['Year'] = df['title'].str.split(pat=r' \(', expand=True)[1].str[:4].astype(int)
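
If some titles contain additional parentheses or have no year at all, splitting on ' (' may pick the wrong piece or make `astype(int)` fail. A possible alternative, which is not the approach used in the answer above, is `Series.str.extract` with a regular expression followed by `pd.to_numeric`. This sketch builds a small DataFrame inline instead of reading `movies.csv`; the last two titles are hypothetical and only there to show the edge cases.

```python
import pandas as pd

# Inline stand-in for the DataFrame read from the CSV file.
# The last two titles are made up to illustrate the edge cases.
df = pd.DataFrame(
    {
        "movieId": [1, 2, 3, 4],
        "title": [
            "Toy Story (1995)",
            "Jumanji (1995)",
            "Some Film (Alternate Title) (2001)",
            "Untitled Movie",
        ],
    }
)

# Capture four digits in parentheses at the end of the title.
year = df["title"].str.extract(r"\((\d{4})\)\s*$", expand=False)

# Titles without a year become missing values instead of raising an error;
# the nullable "Int64" dtype keeps the remaining years as integers.
df["Year"] = pd.to_numeric(year).astype("Int64")
print(df)
```

Anchoring the pattern at the end of the title is an assumption about the data: it relies on the year being the last parenthesised group, as in the sample rows shown in the question.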
