271.Encode and Decode Strings

Tags: [string], [encode], [decode], [serialization], [deserialization], [escaping]

Com: {g}

Link: https://leetcode.com/problems/encode-and-decode-strings/#/description

Design an algorithm to encode a list of strings to a string. The encoded string is then sent over the network and is decoded back to the original list of strings.

Machine 1 (sender) has the function:

string encode(vector<string> strs) {
  // ... your code
  return encoded_string;
}

Machine 2 (receiver) has the function:

vector<string>
 decode(string s) {
  //... your code
  return strs;
}

So Machine 1 does:

string encoded_string = encode(strs);

and Machine 2 does:

vector<string> strs2 = decode(encoded_string);

strs2in Machine 2 should be the same as strsin Machine 1.

Implement the encodeand decodemethods.

Note:

  • The string may contain any possible characters out of 256 valid ascii characters. Your algorithm should be generalized enough to work on any possible characters.
  • Do not use class member/global/static variables to store states. Your encode and decode algorithms should be stateless.
  • Do not rely on any library method such as evalor serialize methods. You should implement your own encode/decode algorithm.

Better Solution:

class Codec:

    def encode(self, strs):
        """Encodes a list of strings to a single string.

        :type strs: List[str]
        :rtype: str
        """
        if not strs:
            return ''

        # here DELIMITER we can choose any non-numeral char.
        DELIMITER = 'a'
        result = []
        for s in strs:
            result.append(str(len(s)))
            result.append(DELIMITER)
            result.append(s)

        return ''.join(result)


    def decode(self, s):
        """Decodes a single string to a list of strings.

        :type s: str
        :rtype: List[str]
        """
        if not s:
            return []

        result = []
        index = 0
        while index < len(s):
            i = index
            while i < len(s) and '0' <= s[i] <= '9':
                i += 1

            sub_s_len = int(s[index:i])
            # Skip the delimiter, so we start from i + 1.
            result.append(s[i + 1:i + 1 + sub_s_len])
            index = i + 1 + sub_s_len

        return result


# Your Codec object will be instantiated and called as such:
# codec = Codec()
# codec.decode(codec.encode(strs))

Revelation:

  • The idea is that we can use any non-numeral char to split the s_len and s.

Solution:

class Codec:

    def encode(self, strs):
        """Encodes a list of strings to a single string.

        :type strs: List[str]
        :rtype: str
        """
        if not strs:
            return ''

        result = []
        for s in strs:
            s_len = len(s)
            s_len_str = '0' * (10 - len(str(s_len))) + str(s_len)
            result.append(s_len_str)
            result.append(s)

        return ''.join(result)


    def decode(self, s):
        """Decodes a single string to a list of strings.

        :type s: str
        :rtype: List[str]
        """
        if not s:
            return []

        result = []
        index = 0
        while index < len(s):
            sub_s_len = int(s[index:index + 10])
            result.append(s[index + 10: index + 10 + sub_s_len])
            index += 10 + sub_s_len

        return result

# Your Codec object will be instantiated and called as such:
# codec = Codec()
# codec.decode(codec.encode(strs))

Revelation:

  • We cannot use delimiter to concat the strings, because we cannot choose a 100% different char as delimiter.
  • The main idea is that put the len of each string in front of the string, but if the len of s_len is not fixed, it's also very hard to parse it. So we assume the max length of the string is in billions, so the max len of str(s_len) is 10.
  • But there are two drawbacks: One is the max length of each string is limited in billions, the other one is we will waste some spaces.

Note:

  • Time complexity of encode = O(n), n is the number of elements in strs.
  • Time complexity of decode = O(n), n is the number of elements in strs.

Other idea:

  • Escaping, we can use '#' as the delimiter to separate the strings, and we don't need to record the s_len, but we need to first convert all existing '#' to '##', and then we using ' # ' (the '#' surrounding with two spaces) to separate each string, so when we do decoding, once we meet the single '#', we know how to separate the strings.

results matching ""

    No results matching ""