文本向量化如何使用Tensorflow和Python应用于stackoverflow问题数据集？

Tensorflow是Google提供的一种机器学习框架。它是一个开放源代码框架，与Python结合使用以实现算法，深度学习应用程序等等。它用于研究和生产目的。

可以使用下面的代码行在Windows上安装'tensorflow'软件包-

pip install tensorflow

Tensor是TensorFlow中使用的数据结构。它有助于连接流程图中的边缘。该流程图称为“数据流程图”。张量不过是多维数组或列表。

我们正在使用Google合作实验室来运行以下代码。Google Colab或Colaboratory可帮助在浏览器上运行Python代码，并且需要零配置并免费访问GPU（图形处理单元）。合作已建立在Jupyter Notebook的基础上。

示例

以下是代码片段-

print("1234 ---> ", int_vectorize_layer.get_vocabulary()[1289])
print("321 ---> ", int_vectorize_layer.get_vocabulary()[313])
print("Vocabulary size is : {}".format(len(int_vectorize_layer.get_vocabulary())))

print("The text vectorization is applied to the training dataset")
binary_train_ds = raw_train_ds.map(binary_vectorize_text)
print("The text vectorization is applied to the validation dataset")
binary_val_ds = raw_val_ds.map(binary_vectorize_text)
print("The text vectorization is applied to the test dataset")
binary_test_ds = raw_test_ds.map(binary_vectorize_text)

int_train_ds = raw_train_ds.map(int_vectorize_text)
int_val_ds = raw_val_ds.map(int_vectorize_text)
int_test_ds = raw_test_ds.map(int_vectorize_text)

代码信用-https://www.tensorflow.org/tutorials/load_data/text

输出结果

1234 ---> substring
321 ---> 20
Vocabulary size is : 10000
The text vectorization is applied to the training dataset
The text vectorization is applied to the validation dataset
The text vectorization is applied to the test dataset

说明

作为最后的预处理步骤，将“ TextVectorization”层应用于训练数据，测试数据和验证数据集。

基础教程