Consider the following code:
import pprint
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

ratings = tfds.load("movielens/100k-ratings", split="train")
movies = tfds.load("movielens/100k-movies", split="train")

ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
    "user_rating": x["user_rating"],
    # "timestamp": x["timestamp"],
})
movies = movies.map(lambda x: x["movie_title"])
type(movies)

for example in movies.take(2):
    # pprint.pprint(tf.reshape(example['movie_title'],[3,5,1]))
    pprint.pprint(example)
The code above gives me the following output:
<tf.Tensor: shape=(), dtype=string, numpy=b'You So Crazy (1994)'>
<tf.Tensor: shape=(), dtype=string, numpy=b'Love Is All There Is (1996)'>
The code below, however, gives a different result.
Suppose we have the following data in a file named songs_details.csv:
,song_id,title,release,artist_name,year,count
0,SOAAAGQ12A8C1420C8,Orgelblut,Dolores,Bohren Der Club Of Gore,2008,1
1,SOAACPJ12A81C21360,Cearc Agus Coileach The Hen And Cock,CasadhTurning,Mchel Silleabhin,1,1
2,SOAAEJI12AB0188AB5,Godlovesugly,God Loves Ugly,Atmosphere,1,1
3,SOAAFAC12A67ADF7EB,Rome Wasnt Built In A Day,Parts Of The Process,Morcheeba,2000,2
4,SOAAKPM12A58A77210,So Confused feat Butta Creame amended album version,Late Night Special,Pretty Ricky,2007,1
5,SOAAOYI12AB01831CE,Criminal,Gotan Project live,Gotan Project,2006,2
Now let's read this CSV file and process it:
songs = tf.data.experimental.make_csv_dataset(
    "./songs_details.csv",
    batch_size=128,
    select_columns=['song_id', 'title', 'release', 'artist_name', 'year'],
    num_epochs=1,
    ignore_errors=True,
)
songs = songs.unbatch().map(lambda x: {
    "song_id": x["song_id"],
    "release": x["release"],
    "artist_name": x["artist_name"],
    "title": x["title"],
    "year": x["year"],
})

for example in songs.map(lambda x: x['title']).take(2):
    print(example)
This produces the following output:
tf.Tensor(b'Skip The Youth', shape=(), dtype=string)
tf.Tensor(b'Teenage Dirtbag', shape=(), dtype=string)
Is there any difference between these two representations of the variable, that is, between tf.Tensor() and <tf.Tensor: >?
The TF version used is 2.9.1.
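There is no significant difference between the two representations. The only difference is that you used pprint for one variable and print for the other. print shows an object's str() form, while pprint shows its repr() form, and TensorFlow tensors format the two slightly differently. This does not affect the tensor itself. Printing both variables with pprint, or both with print, gives the same format.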
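To illustrate the str() versus repr() distinction without needing TensorFlow installed, here is a minimal sketch. FakeTensor and captured are hypothetical names invented for this example, not part of TensorFlow; the two format strings are simply modeled on the outputs shown in the question.

```python
import io
import pprint
from contextlib import redirect_stdout


class FakeTensor:
    """Hypothetical stand-in for a scalar string tensor (not a TensorFlow class)."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        # Format modeled on what print() showed in the question.
        return f"tf.Tensor({self.value!r}, shape=(), dtype=string)"

    def __repr__(self):
        # Format modeled on what pprint.pprint() showed in the question.
        return f"<tf.Tensor: shape=(), dtype=string, numpy={self.value!r}>"


def captured(func, obj):
    """Return what func(obj) writes to stdout, without the trailing newline."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(obj)
    return buffer.getvalue().rstrip("\n")


t = FakeTensor(b"You So Crazy (1994)")
printed = captured(print, t)          # print() calls str(t)
pretty = captured(pprint.pprint, t)   # pprint.pprint() uses repr(t)

print(printed)
print(pretty)
```

The same object produces both formats, so the difference comes only from which function does the printing. With a real tensor, calling print(repr(example)) should likewise give the <tf.Tensor: ...> form.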