
C&C++ Published: 2022-04-03 Source site: 大佬教程 code.js-code.com

c – Which string search algorithm is suitable for this?

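I have one large string, say "aaaaaaaaaaabbbbbbbbbcccccccccccddddddddddd", and a collection of small strings. I want to count (overlaps are fine) how many times the small strings are found in the large string. I only care about speed. KMP seems good, but it looks like Rabin-Karp handles multiple patterns, although it is slow.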

Solution

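The problem with most string searching algorithms is that they take at least O(k) time to return k matches. So if you have a string of, say, 1 million "a"s, and 1 million queries for the small string "a", it will take roughly a million times a million iterations to count all the matches!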
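
To make that cost concrete, here is a minimal baseline sketch that counts overlapping matches one at a time. The helper name count_overlapping is made up for this illustration; it relies only on Python's built-in str.find. Every match costs at least one loop iteration, which is exactly the O(k) behaviour described above.

```python
def count_overlapping(big, small):
    """Count occurrences of small in big, allowing overlaps (one step per match)."""
    count = 0
    pos = big.find(small)
    while pos != -1:
        count += 1
        # Restart one character later so that overlapping matches are also found.
        pos = big.find(small, pos + 1)
    return count


big = "aaaaaaaaaaabbbbbbbbbcccccccccccddddddddddd"
for needle in ["a", "ab", "abc"]:
    print(needle, count_overlapping(big, needle))
```

Calling this once per small string repeats the scan of the large string for every query, so it is only useful as a reference to check faster approaches against.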

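An alternative linear-time approach is:

>Construct a suffix tree of the large string: O(n), where n is len(large string)
>Precompute the number of suffixes below each node in the suffix tree: O(n)
>For each small string, walk down from the root to the node reached by spelling out the small string (every suffix below that node starts with the small string): O(m), where m is len(small string)
>Add the number of suffixes below that node to the running total. (Each suffix corresponds to a distinct match of the small string in the large string.)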

This takes O(n + p) time, where n is the length of the large string and p is the total length of all the small strings.
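
The counting idea in the steps above can be sketched with a much simpler structure. The following is an illustration only, not the linear-time construction: it builds an uncompressed suffix trie in O(n^2) time and memory, and the class name CountingSuffixTrie is hypothetical. It shows why "number of suffixes below a node" equals the number of matches, and why each query then costs only O(m).

```python
class CountingSuffixTrie:
    """Uncompressed suffix trie; each node records how many suffixes pass through it.

    Assumption: this naive build is quadratic, so it only suits short texts.
    """

    def __init__(self, text):
        self.children = [{}]  # children[node] maps a character to a child node id
        self.count = [0]      # count[node] = number of suffixes of text through node
        for start in range(len(text)):
            node = 0
            for ch in text[start:]:
                nxt = self.children[node].get(ch)
                if nxt is None:
                    nxt = len(self.children)
                    self.children[node][ch] = nxt
                    self.children.append({})
                    self.count.append(0)
                node = nxt
                self.count[node] += 1

    def count_matches(self, needle):
        """Walk down the trie along needle; the count stored there is the answer."""
        node = 0
        for ch in needle:
            node = self.children[node].get(ch)
            if node is None:
                return 0
        return self.count[node]


trie = CountingSuffixTrie("aaaaaaaaaaabbbbbbbbbcccccccccccddddddddddd")
for needle in ["a", "ab", "abc"]:
    print(needle, trie.count_matches(needle))
```

A real suffix tree compresses chains of single-child nodes into edges, which is what brings construction down to O(n).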

Example code

As requested, here is some small(ish) example code in Python that uses this approach:

class SuffixTree:
    def __init__(self):
        """Returns an empty suffix tree"""
        self.T=''
        self.E={}
        self.nodes=[-1] # 0th node is empty string

    def add(self,s):
        """Adds the input string to the suffix tree.

        This inserts all substrings into the tree.
        End the string with a unique character if you want a leaf-node for every suffix.

        Produces an edge graph keyed by (node,character) that gives (first,last,end)
        This means that the edge has characters from T[first:last+1] and goes to node end."""
        # Active point: (origin node, T[first:last+1]); empty when first>last
        origin,first,last = 0,len(self.T),len(self.T)-1
        self.T+=s
        nc = len(self.nodes) # id of the next node to create
        self.nodes += [-1]*(2*len(s)) # suffix links, indexed by node id
        T=self.T
        E=self.E
        nodes=self.nodes

        Lm1=len(T)-1
        for last_char_index in range(first,len(T)):
            c=T[last_char_index]
            last_parent_node = -1
            while 1:
                parent_node = origin
                if first>last:
                    # Active point is on a node: stop if c already continues from it
                    if (origin,c) in E:
                        break
                else:
                    # Active point is inside an edge: stop if the next character is c,
                    # otherwise split the edge at the active point
                    key = origin,T[first]
                    edge_first,edge_last,edge_end = E[key]
                    span = last - first
                    A = edge_first+span
                    m = T[A+1]
                    if m==c:
                        break
                    E[key] = (edge_first,A,nc)
                    nodes[nc] = origin
                    E[nc,m] = (A+1,edge_last,edge_end)
                    parent_node = nc
                    nc+=1
                # Add a new leaf edge for c
                E[parent_node,c] = (last_char_index,Lm1,nc)
                nc+=1
                if last_parent_node>0:
                    nodes[last_parent_node] = parent_node
                last_parent_node = parent_node
                # Move to the next shorter suffix
                if origin==0:
                    first+=1
                else:
                    origin = nodes[origin]

                # Canonize: walk down whole edges covered by the active string
                if first <= last:
                    edge_first,edge_last,edge_end=E[origin,T[first]]
                    span = edge_last-edge_first
                    while span <= last - first:
                        first+=span+1
                        origin = edge_end
                        if first <= last:
                            edge_first,edge_last,edge_end = E[origin,T[first]]
                            span = edge_last - edge_first

            if last_parent_node>0:
                nodes[last_parent_node] = parent_node
            last+=1
            # Canonize again after extending the active string by one character
            if first <= last:
                edge_first,edge_last,edge_end=E[origin,T[first]]
                span = edge_last-edge_first
                while span <= last - first:
                    first+=span+1
                    origin = edge_end
                    if first <= last:
                        edge_first,edge_last,edge_end = E[origin,T[first]]
                        span = edge_last - edge_first
        return self


    def make_choices(self):
        """Construct a sorted list for each node of the possible continuing characters"""
        choices = [list() for n in range(len(self.nodes))] # Contains set of choices for each node
        for (origin,c),edge in self.E.items():
            choices[origin].append(c)
        choices=[sorted(s) for s in choices] # should not have any repeats by construction
        self.choices=choices
        return choices


    def count_suffixes(self,term):
        """Recurses through the tree finding how many suffixes are based at each node.
        Strings assumed to use term as the terminating character"""
        C = self.suffix_counts = [0]*len(self.nodes)
        choices = self.make_choices()
        def f(node=0):
            t=0
            X=choices[node]
            if len(X)==0:
                t+=1 # this node is a leaf node
            else:
                for c in X:
                    if c==term:
                        t+=1
                        continue
                    first,last,end = self.E[node,c]
                    t+=f(end)
            C[node]=t
            return t
        return f()

    def count_matches(self,needle):
        """Return the count of matches for this needle in the suffix tree"""
        i=0
        node=0
        E=self.E
        T=self.T
        while i<len(needle):
            c=needle[i]
            key=node,c
            if key not in E:
                return 0
            first,last,node = E[key]
            # Compare the needle against the characters stored on this edge
            while i<len(needle) and first<=last:
                if needle[i]!=T[first]:
                    return 0
                i+=1
                first+=1
        return self.suffix_counts[node]


big="aaaaaaaaaaabbbbbbbbbcccccccccccddddddddddd"
small_strings=["a","ab","abc"]
s=SuffixTree()
term=chr(0)
s.add(big+term)
s.count_suffixes(term)
for needle in small_strings:
    x=s.count_matches(needle)
    print(needle,'has',x,'matches')

It prints:

a has 11 matches 
ab has 1 matches 
abc has 0 matches

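However, in practice I would recommend simply using a pre-existing Aho-Corasick implementation, as I would expect that to be faster in your particular case.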
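
For reference, here is a minimal pure-Python sketch of the Aho-Corasick idea. It is not the API of any particular library; the function name aho_corasick_counts is made up for this illustration, and it assumes non-empty patterns. It scans the large string once, records how often each automaton state is visited, and then pushes those counts along the failure links, so counting does not pay per individual match.

```python
from collections import deque


def aho_corasick_counts(big, patterns):
    """Return {pattern: number of (possibly overlapping) occurrences in big}."""
    goto = [{}]   # goto[state] maps a character to the next state
    fail = [0]    # fail[state] = state of the longest proper suffix present in the trie
    ends = [[]]   # ends[state] = patterns that end exactly at this state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto[state][ch] = len(goto)
                goto.append({})
                fail.append(0)
                ends.append([])
            state = goto[state][ch]
        ends[state].append(pat)

    # Breadth-first pass to compute failure links; remember the BFS order.
    order = []
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        order.append(state)
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            queue.append(nxt)

    # Single scan of the large string, counting visits to each state.
    visits = [0] * len(goto)
    state = 0
    for ch in big:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        visits[state] += 1

    # Deepest states first: a visit to a state is also a visit to its failure state.
    for state in reversed(order):
        visits[fail[state]] += visits[state]

    return {pat: visits[s] for s in range(len(goto)) for pat in ends[s]}


big = "aaaaaaaaaaabbbbbbbbbcccccccccccddddddddddd"
counts = aho_corasick_counts(big, ["a", "ab", "abc"])
for needle in ["a", "ab", "abc"]:
    print(needle, 'has', counts[needle], 'matches')
```

The scan is linear in the length of the large string plus the total length of the patterns. A tuned, pre-existing implementation would normally be much faster than this sketch.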
