Programming Q&A. Published: 2022-06-02. Source site: 大佬教程 code.js-code.com

How do I count the strings from column a that are found in column b, in any order, and return the count in a new column?

I am trying to get the number of substrings in column b that match column a, in any order.

Example:

[col a]                   [col b]                             [frequency]
big red car            elon musk drives a big red car              1
elon musk car          elon musk drives a big red car              1
red big car            elon musk drives a big red car              1

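The maximum number of matches needs to be capped at 1. For example, big red car should match only once, not once for every combination.

If possible, I need whole-word matches only: car should not match card, and so on.

What I have tried: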

df["frequency"] = df.apply(lambda x: x['col b'].count(x['col a']),axis=1)

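This only finds exact matches of the whole phrase, but I need the words to match in any order.

Any help is appreciated.

Solution

Assuming you want to check whether all the words in "[col a]" are present in "[col b]":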

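def ismatch(s):
    # Split both cells into sets of whole words, so "car" never matches "card".
    A = set(s['[col a]'].split())
    B = set(s['[col b]'].split())
    # True only when every word of [col a] occurs in [col b], in any order.
    return A.issubset(B)

df.apply(ismatch, axis=1)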

Input:

         [col a]                         [col b]  [frequency]
0    big red car  elon musk drives a big red car            1
1  elon musk car  elon musk drives a big red car            1
2    red big car  elon musk drives a big red car            1
3   red big card  elon musk drives a big red car            1

Output:

0    True
1    True
2    True
3    False
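
To get the 0/1 [frequency] column the question asks for, the boolean result can be cast to int. The following is a minimal sketch, assuming the column names from the sample above and words separated by whitespace; the helper name all_words_present is made up for illustration.

```python
import pandas as pd

# Sample data copied from the input table above.
df = pd.DataFrame({
    "[col a]": ["big red car", "elon musk car", "red big car", "red big card"],
    "[col b]": ["elon musk drives a big red car"] * 4,
})

def all_words_present(row):
    """Return 1 if every whitespace-separated word of [col a] is in [col b], else 0."""
    needed = set(row["[col a]"].split())
    available = set(row["[col b]"].split())
    return int(needed.issubset(available))

# One value per row, capped at 1 because the check is a yes/no test.
df["[frequency]"] = df.apply(all_words_present, axis=1)
print(df["[frequency]"].tolist())  # [1, 1, 1, 0]
```

Because the comparison is done on sets of whole words, word order does not matter and card is not counted as a match for car.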

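Try it with str.contains(). Note that this tests whether "[col b]" contains any one of the phrases from "[col a]" as a contiguous substring; it is not a per-row, any-order, whole-word check:

# Build one alternation pattern from the distinct phrases in [col a]
words = '|'.join(df['[col a]'].unique())
# Finally: 1 where [col b] contains any of those phrases, else 0
df['[frequency]'] = df['[col b]'].str.contains(words).astype(int)
# OR (Series.view is deprecated in recent pandas versions; astype is the safer choice)
df['[frequency]'] = df['[col b]'].str.contains(words).view('i1')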

Output of df:

[col a]                   [col b]                             [frequency]
big red car            elon musk drives a big red car              1
elon musk car          elon musk drives a big red car              1
red big car            elon musk drives a big red car              1
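
A per-row, whole-word variant of the regex idea can be sketched with \b word boundaries, which also tolerate punctuation next to a word. This is only a rough sketch: count_if_all_words is a hypothetical helper name, and case-sensitive matching is an assumption, not something the question specifies.

```python
import re

def count_if_all_words(phrase, text):
    """Return 1 if every word of `phrase` occurs in `text` as a whole word, else 0."""
    return int(all(
        # re.escape guards against regex metacharacters inside a word.
        re.search(rf"\b{re.escape(word)}\b", text) is not None
        for word in phrase.split()
    ))

text = "elon musk drives a big red car."
print(count_if_all_words("red big car", text))   # 1
print(count_if_all_words("red big card", text))  # 0
```

With a DataFrame this could be applied row by row, for example df.apply(lambda r: count_if_all_words(r['[col a]'], r['[col b]']), axis=1).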