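A bigram is a pair of two consecutive words taken from a sentence; sliding across the sentence one word at a time yields all of its bigrams. In Python, this technique is widely used in text analysis. Below are two ways to implement it.
The first approach uses enumerate() and split(). We split the sentence into words with split(), then use enumerate() to get each word's index so that it can be paired with the word that follows it. The last word has no successor, so it is skipped.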
    # The variable is named sentences rather than list, to avoid shadowing the built-in list type
    sentences = ['Stop. look left right. go']
    print("The given list is : \n" + str(sentences))

    # Using enumerate() and split() for bigram formation
    output = [(word, s.split()[i + 1])
              for s in sentences
              for i, word in enumerate(s.split())
              if i < len(s.split()) - 1]
    print("Bigram formation from given list is: \n" + str(output))
Output
Running the above code gives the following result:
    The given list is : 
    ['Stop. look left right. go']
    Bigram formation from given list is: 
    [('Stop.', 'look'), ('look', 'left'), ('left', 'right.'), ('right.', 'go')]
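The comprehension above can also be unrolled into an explicit loop, which may make the role of enumerate() easier to follow. This is only a minimal sketch: the function name bigrams_enumerate is hypothetical and not part of the original example, and it splits the sentence once instead of on every iteration.

```python
def bigrams_enumerate(sentence):
    """Pair each word with its successor using enumerate() and split()."""
    words = sentence.split()  # split once and reuse the word list
    pairs = []
    for i, word in enumerate(words):
        # The last word has no successor, so skip it.
        if i < len(words) - 1:
            pairs.append((word, words[i + 1]))
    return pairs


print(bigrams_enumerate('Stop. look left right. go'))
```

Note that split() only separates on whitespace, so punctuation stays attached to the words ('Stop.' and 'right.'), just as in the output above.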
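We can also create bigrams using the zip() and split() functions. split() breaks the sentence into words, and zip() then pairs the word list without its last element against the same list without its first element, so each word is matched with the one that follows it.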
    # The variable is named sentences rather than list, to avoid shadowing the built-in list type
    sentences = ['Stop. look left right. go']
    print("The given list is : \n" + str(sentences))

    # Using zip() and split() for bigram formation
    output = [pair
              for s in sentences
              for pair in zip(s.split()[:-1], s.split()[1:])]
    print("Bigram formation from given list is: \n" + str(output))
Output
Running the above code gives the following result:
    The given list is : 
    ['Stop. look left right. go']
    Bigram formation from given list is: 
    [('Stop.', 'look'), ('look', 'left'), ('left', 'right.'), ('right.', 'go')]
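If the input list holds more than one sentence, the zip() approach can be wrapped in a small helper. The sketch below is an illustration only: the name bigrams_zip and the two-sentence input are assumptions, not part of the original example. It relies on the fact that zip() stops at the shorter of its inputs, so pairing the word list with words[1:] is enough.

```python
def bigrams_zip(sentences):
    """Return the bigrams of every sentence in a list, using zip() and split()."""
    result = []
    for sentence in sentences:
        words = sentence.split()
        # zip() stops when words[1:] runs out, so no [:-1] slice is needed.
        result.extend(zip(words, words[1:]))
    return result


print(bigrams_zip(['Stop. look', 'left right. go']))
```

Each sentence is processed separately, so no bigram spans two list items: the call above prints [('Stop.', 'look'), ('left', 'right.'), ('right.', 'go')], with no ('look', 'left') pair.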