
Programming Q&A. Published: 2022-06-02. Site: 大佬教程 (code.js-code.com)


How do I remove stop words from a pandas dataset?


I'm trying to remove stop words from a pandas dataset in which each row holds a tokenized list of words. The word lists look like this:

['Uno', ',', 'dos', 'One', 'two', 'tres', 'quatro', 'Yes', 'Wooly', 'Bully', 'Watch', 'it', 'Now', 'watch', 'Here', 'he', 'come', 'here', 'git', 'ya', 'Matty', 'told', 'Hattie', 'about', 'a', 'thing', 'she', 'saw', 'Had', 'big', 'horns', 'and', 'wooly', 'jaw', 'yes', 'drive', '``', 'Let', "'s", 'do', "n't", 'take', 'no', 'chance', 'not', 'be', 'L-seven', 'learn', 'to', 'dance', "''", 'Yeah', 'That', 'the', 'Get', 'you', 'someone', 'really', 'pull', 'wool', 'with', 'You', 'got', 'it']

I'm doing it with the following code:
ret = df['tokenized_lyric'].apply(lambda x: [item for item in x if item.lower() not in stops])

print(ret)

This gives me lists like these:

0       [n,n,e,w,r,...
2165    [,l,p,...

It seems to have removed almost all of the characters. How do I make it remove only the stop words I defined?

Solution

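Your list comprehension is iterating over the characters of a string. Instead, split the string into words with split(), iterate over those word tokens, and call lower() on each token for the stop-word check. The difference: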

print([i for i in 'hi there'])          # iterating over the characters
print([i for i in 'hi there'.split()])  # iterating over the words

['h','i',' ','t','h','e','r','e']
['hi','there']
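Inside apply, the lambda is simply called once per cell, so running the same comprehension on a single cell reproduces the symptom. A minimal sketch, assuming the cells hold plain strings rather than lists, and using a hypothetical stop list that, like NLTK's English list, contains single letters such as 'a', 'i', 's' and 't':

```python
# Hypothetical stop list; like NLTK's English list it includes single letters
stops = {'a', 'i', 's', 't', 'the', 'and'}

# Assumption: one cell holds a string, not a list of tokens
cell = "Uno dos tres"

# The asker's comprehension: iterating a str yields single characters
by_chars = [item for item in cell if item.lower() not in stops]

# The fix: split into words first, then compare each word in lowercase
by_words = [item for item in cell.split() if item.lower() not in stops]

print(by_chars)  # ['U', 'n', 'o', ' ', 'd', 'o', ' ', 'r', 'e']
print(by_words)  # ['Uno', 'dos', 'tres']
```

Only the characters that happen to be stop words ('s' and 't' here) disappear, which is why the output looks like a list of scattered letters.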

Try this lambda function:

s = 'Hello World And Underworld'

stops = ['and','or','the']

f = lambda x: [item for item in x.split() if item.lower() not in stops]
f(s)

['Hello', 'World', 'Underworld']

Applied to your code, it would be:

df['tokenized_lyric'].apply(lambda x: [item for item in x.split() if item.lower() not in stops])
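The question says each row already holds a list, yet the output comes back as characters, which suggests the cells are really strings. One common cause (an assumption here, not confirmed in the question) is saving the DataFrame to CSV and reading it back, which turns each list into its string representation. In that case split() would yield tokens such as "['Watch'," with the brackets and quotes still attached, so parsing the strings back into lists with ast.literal_eval may fit better. The column name comes from the question; the data and stop list are made up for illustration:

```python
import ast

import pandas as pd

stops = {'a', 'the', 'and', 'it'}  # hypothetical stop list

# Assumption: after a CSV round trip each cell is the *string*
# "['Watch', 'it', 'Now']" rather than a Python list
df = pd.DataFrame({'tokenized_lyric': ["['Watch', 'it', 'Now']",
                                       "['Had', 'a', 'big', 'horns']"]})

# Parse each string back into a real list; literal_eval only accepts
# Python literals, so it is safer than eval
df['tokenized_lyric'] = df['tokenized_lyric'].apply(ast.literal_eval)

# With real lists, the original comprehension works unchanged (no split())
ret = df['tokenized_lyric'].apply(
    lambda x: [item for item in x if item.lower() not in stops])

print(ret.tolist())  # [['Watch', 'Now'], ['Had', 'big', 'horns']]
```

If the data is loaded with pd.read_csv, passing converters={'tokenized_lyric': ast.literal_eval} should do the same parsing at load time.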

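Another approach uses NLTK's English stop-word list plus your own custom words:

import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')  # one-time download of the stop-word corpus

# stop words from the NLTK library (all lowercase)
nltk_stopwords = stopwords.words('english')

# user-defined stop words (lowercase, because tokens are lowercased before the check)
custom_stopwords = ['hey', 'hello']

# complete set of stop words; a set makes each membership test fast
complete_stopwords = set(nltk_stopwords) | set(custom_stopwords)

# split each lyric string into words and drop stop words, ignoring case
df['lyrics_clean'] = df['lyrics'].apply(
    lambda x: [word for word in x.split() if word.lower() not in complete_stopwords])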
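The NLTK approach can also be wrapped in a small reusable function that accepts either a raw string or an already-tokenized list, so it works whichever form the column turns out to be in. In this sketch remove_stopwords is a hypothetical helper, and base_stopwords is a short stand-in for stopwords.words('english') so it runs without downloading the NLTK corpus:

```python
import pandas as pd

# Stand-in for stopwords.words('english') (assumption: any list of str works)
base_stopwords = ['and', 'or', 'the', 'a', 'it']
custom_stopwords = ['hey', 'Hello']

# Lowercase everything once and keep it in a set for fast lookups
stop_set = {w.lower() for w in base_stopwords + custom_stopwords}


def remove_stopwords(value, stop_set):
    """Hypothetical helper: drop stop words from a str or a list of tokens."""
    tokens = value.split() if isinstance(value, str) else value
    return [tok for tok in tokens if tok.lower() not in stop_set]


df = pd.DataFrame({'lyrics': ['Hello it is the Wooly Bully',
                              'Hey Matty and Hattie']})

# Series.apply passes the extra positional args on to the function
df['lyrics_clean'] = df['lyrics'].apply(remove_stopwords, args=(stop_set,))

print(df['lyrics_clean'].tolist())  # [['is', 'Wooly', 'Bully'], ['Matty', 'Hattie']]
print(remove_stopwords(['Hey', 'Matty', 'and', 'Hattie'], stop_set))  # ['Matty', 'Hattie']
```

Building stop_set once outside the function and passing it through args avoids rebuilding the lookup structure on every row.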
