271.Encode and Decode Strings
Tags: [string], [encode], [decode], [serialization], [deserialization], [escaping]
Com: {g}
Link: https://leetcode.com/problems/encode-and-decode-strings/#/description
Design an algorithm to encode a list of strings to a string. The encoded string is then sent over the network and is decoded back to the original list of strings.
Machine 1 (sender) has the function:
string encode(vector<string> strs) {
// ... your code
return encoded_string;
}
Machine 2 (receiver) has the function:
vector<string>
decode(string s) {
//... your code
return strs;
}
So Machine 1 does:
string encoded_string = encode(strs);
and Machine 2 does:
vector<string> strs2 = decode(encoded_string);
strs2
in Machine 2 should be the same as strs
in Machine 1.
Implement the encode
and decode
methods.
Note:
- The string may contain any possible characters out of 256 valid ascii characters. Your algorithm should be generalized enough to work on any possible characters.
- Do not use class member/global/static variables to store states. Your encode and decode algorithms should be stateless.
- Do not rely on any library method such as
eval
or serialize methods. You should implement your own encode/decode algorithm.
Better Solution:
class Codec:
def encode(self, strs):
"""Encodes a list of strings to a single string.
:type strs: List[str]
:rtype: str
"""
if not strs:
return ''
# here DELIMITER we can choose any non-numeral char.
DELIMITER = 'a'
result = []
for s in strs:
result.append(str(len(s)))
result.append(DELIMITER)
result.append(s)
return ''.join(result)
def decode(self, s):
"""Decodes a single string to a list of strings.
:type s: str
:rtype: List[str]
"""
if not s:
return []
result = []
index = 0
while index < len(s):
i = index
while i < len(s) and '0' <= s[i] <= '9':
i += 1
sub_s_len = int(s[index:i])
# Skip the delimiter, so we start from i + 1.
result.append(s[i + 1:i + 1 + sub_s_len])
index = i + 1 + sub_s_len
return result
# Your Codec object will be instantiated and called as such:
# codec = Codec()
# codec.decode(codec.encode(strs))
Revelation:
- The idea is that we can use any non-numeral char to split the s_len and s.
Solution:
class Codec:
def encode(self, strs):
"""Encodes a list of strings to a single string.
:type strs: List[str]
:rtype: str
"""
if not strs:
return ''
result = []
for s in strs:
s_len = len(s)
s_len_str = '0' * (10 - len(str(s_len))) + str(s_len)
result.append(s_len_str)
result.append(s)
return ''.join(result)
def decode(self, s):
"""Decodes a single string to a list of strings.
:type s: str
:rtype: List[str]
"""
if not s:
return []
result = []
index = 0
while index < len(s):
sub_s_len = int(s[index:index + 10])
result.append(s[index + 10: index + 10 + sub_s_len])
index += 10 + sub_s_len
return result
# Your Codec object will be instantiated and called as such:
# codec = Codec()
# codec.decode(codec.encode(strs))
Revelation:
- We cannot use delimiter to concat the strings, because we cannot choose a 100% different char as delimiter.
- The main idea is that put the len of each string in front of the string, but if the len of s_len is not fixed, it's also very hard to parse it. So we assume the max length of the string is in billions, so the max len of str(s_len) is 10.
- But there are two drawbacks: One is the max length of each string is limited in billions, the other one is we will waste some spaces.
Note:
- Time complexity of encode = O(n), n is the number of elements in strs.
- Time complexity of decode = O(n), n is the number of elements in strs.
Other idea:
- Escaping, we can use '#' as the delimiter to separate the strings, and we don't need to record the s_len, but we need to first convert all existing '#' to '##', and then we using ' # ' (the '#' surrounding with two spaces) to separate each string, so when we do decoding, once we meet the single '#', we know how to separate the strings.