LeetCode_30_困难_串联所有单词的子串

发布于:2024-03-21 ⋅ 阅读:(78) ⋅ 点赞:(0)


1. 题目

给定一个字符串 s s s 和一个字符串数组 w o r d s words words w o r d s words words 中所有字符串 长度相同

s s s 中的 串联子串 是指一个包含 w o r d s words words 中所有字符串以任意顺序排列连接起来的子串。

例如,如果 w o r d s = [ " a b " , " c d " , " e f " ] words = ["ab","cd","ef"] words=["ab","cd","ef"], 那么 " a b c d e f " , " a b e f c d " , " c d a b e f " , " c d e f a b " , " e f a b c d " , " e f c d a b " "abcdef", "abefcd","cdabef", "cdefab","efabcd", "efcdab" "abcdef""abefcd""cdabef""cdefab""efabcd""efcdab" 都是串联子串。 " a c d b e f " "acdbef" "acdbef" 不是串联子串,因为他不是任何 w o r d s words words 排列的连接。

返回所有串联子串在 s s s 中的开始索引。你可以以 任意顺序 返回答案。

示例 1:

输入: s = " b a r f o o t h e f o o b a r m a n " , w o r d s = [ " f o o " , " b a r " ] s = "barfoothefoobarman", words = ["foo","bar"] s="barfoothefoobarman",words=["foo","bar"]
输出: [ 0 , 9 ] [0,9] [0,9]
解释:因为 w o r d s . l e n g t h = = 2 words.length == 2 words.length==2 同时 w o r d s [ i ] . l e n g t h = = 3 words[i].length == 3 words[i].length==3,连接的子字符串的长度必须为 6 6 6
子串 " b a r f o o " "barfoo" "barfoo" 开始位置是 0 0 0。它是 w o r d s words words 中以 [ " b a r " , " f o o " ] ["bar","foo"] ["bar","foo"] 顺序排列的连接。
子串 " f o o b a r " "foobar" "foobar" 开始位置是 9 9 9。它是 w o r d s words words 中以 [ " f o o " , " b a r " ] ["foo","bar"] ["foo","bar"] 顺序排列的连接。
输出顺序无关紧要。返回 [ 9 , 0 ] [9,0] [9,0] 也是可以的。

示例 2:

输入: s = " w o r d g o o d g o o d g o o d b e s t w o r d " , w o r d s = [ " w o r d " , " g o o d " , " b e s t " , " w o r d " ] s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"] s="wordgoodgoodgoodbestword",words=["word","good","best","word"]
输出: [ ] [] []
解释:因为 w o r d s . l e n g t h = = 4 words.length == 4 words.length==4 并且 w o r d s [ i ] . l e n g t h = = 4 words[i].length == 4 words[i].length==4,所以串联子串的长度必须为 16 16 16
s s s 中没有子串长度为 16 16 16 并且等于 w o r d s words words 的任何顺序排列的连接。
所以我们返回一个空数组。

示例 3:

输入: s = " b a r f o o f o o b a r t h e f o o b a r m a n " , w o r d s = [ " b a r " , " f o o " , " t h e " ] s = "barfoofoobarthefoobarman", words = ["bar","foo","the"] s="barfoofoobarthefoobarman",words=["bar","foo","the"]
输出: [ 6 , 9 , 12 ] [6,9,12] [6,9,12]
解释:因为 w o r d s . l e n g t h = = 3 words.length == 3 words.length==3 并且 w o r d s [ i ] . l e n g t h = = 3 words[i].length == 3 words[i].length==3,所以串联子串的长度必须为 9 9 9
子串 " f o o b a r t h e " "foobarthe" "foobarthe" 开始位置是 6 6 6。它是 w o r d s words words 中以 [ " f o o " , " b a r " , " t h e " ] ["foo","bar","the"] ["foo","bar","the"] 顺序排列的连接。
子串 " b a r t h e f o o " "barthefoo" "barthefoo" 开始位置是 9 9 9。它是 w o r d s words words 中以 [ " b a r " , " t h e " , " f o o " ] ["bar","the","foo"] ["bar","the","foo"] 顺序排列的连接。
子串 " t h e f o o b a r " "thefoobar" "thefoobar" 开始位置是 12 12 12。它是 w o r d s words words 中以 [ " t h e " , " f o o " , " b a r " ] ["the","foo","bar"] ["the","foo","bar"] 顺序排列的连接。


提示

  • 1 < = s . l e n g t h < = 1 0 4 1 <= s.length <= 10^4 1<=s.length<=104
  • 1 < = w o r d s . l e n g t h < = 5000 1 <= words.length <= 5000 1<=words.length<=5000
  • 1 < = w o r d s [ i ] . l e n g t h < = 30 1 <= words[i].length <= 30 1<=words[i].length<=30
  • w o r d s [ i ] words[i] words[i] s s s 由小写英文字母组成

2. 思路及代码实现详解(Python)

2.1 滑动窗口

words \textit{words} words 的长度为 m m m words \textit{words} words 中每个单词的长度为 n n n s s s 的长度为 ls \textit{ls} ls。由于已知单词的长度和数量,因此可以直到串联子串的长度,并通过建立一个固定长度的窗口,并在不断滑动窗口的过程中,判断窗口中的固定长度字符串的出现次数是否与 w o r d s words words 中的一致,如果一致,则获取窗口的开始索引位置。

具体做法是,首先需要将 s s s 划分为单词组,每个单词的大小均为 n n n (首尾除外)。这样的划分方法有 n n n 种( l e n ( s ) / n len(s)/n len(s)/n 的商的情况),即先删去前 i i i i = 0 ∼ n − 1 i = 0 \sim n-1 i=0n1)个字母后,将剩下的字母进行划分,如果末尾有不到 n n n 个字母也删去。对这 n n n 种划分得到的单词数组分别使用滑动窗口对 words \textit{words} words 进行搜寻,每次向右滑动 n n n 个距离,直到 s s s 右端剩余字符数不足 n n n

划分成单词组后,一个窗口包含 s s s 中从 i i i 起始的前 m m m 个单词,用一个哈希表 differ \textit{differ} differ 表示窗口中单词频次和 words \textit{words} words 中单词频次之差,当差值为 0 0 0 时认为窗口中单词数量与 w o r d s words words 当中单词数量一致。初始化 differ \textit{differ} differ 时,出现在窗口中的单词,每出现一次,相应的值增加 1 1 1,出现在 words \textit{words} words 中的单词,每出现一次,相应的值减少 1 1 1。然后将窗口右移,右侧会加入一个单词,左侧会移出一个单词,并对 differ \textit{differ} differ 做相应的更新。窗口移动时,若出现 differ \textit{differ} differ 中值不为 0 0 0 的键的数量为 0 0 0,则表示这个窗口中的单词频次和 words \textit{words} words 中单词频次相同,窗口的左端点是一个待求的起始位置。划分的方法有 n n n 种,做 n n n 次滑动窗口后,即可找到所有可能的起始位置。

class Solution:
    def findSubstring(self, s: str, words: List[str]) -> List[int]:
        res = []
        m, n, ls = len(words), len(words[0]), len(s)
        for i in range(n):
            if i + m * n > ls:
                break
            differ = Counter()
            for j in range(m):
                word = s[i + j * n: i + (j + 1) * n]
                differ[word] += 1
            for word in words:
                differ[word] -= 1
                if differ[word] == 0:
                    del differ[word]
            for start in range(i, ls - m * n + 1, n):
                if start != i:
                    word = s[start + (m - 1) * n: start + m * n]
                    differ[word] += 1
                    if differ[word] == 0:
                        del differ[word]
                    word = s[start - n: start]
                    differ[word] -= 1
                    if differ[word] == 0:
                        del differ[word]
                if len(differ) == 0:
                    res.append(start)
        return res

该算法遍历了 n n n 种窗口的左端起始位置,每次窗口滑动的时间复杂度为 O ( l s ) O(ls) O(ls),总的渐进复杂度为 O ( l s × n ) O(ls\times n) O(ls×n),空间复杂度为 O ( m × n ) O(m\times n) O(m×n),主要是保存单词出现频次的哈希表开销。

执行用时:97 ms
消耗内存:11.21 MB

题解来源:力扣官方题解

本文含有隐藏内容,请 开通VIP 后查看

网站公告

今日签到

点亮在社区的每一天
去签到