python - Large list generation optimization
I needed a Python function that takes a list of strings in the form of:

    seq = ['a[0]', 'b[2:5]', 'a[4]']

and returns a new list of "expanded" elements with the original order preserved, so:

    expanded = ['a[0]', 'b[2]', 'b[3]', 'b[4]', 'b[5]', 'a[4]']

To achieve this I wrote a simple function:
    def expand_seq(seq):
        # expand each 'name[i]' or 'name[start:end]' entry into individual 'name[i]' strings
        return ['%s[%s]' % (item.split('[')[0], i)
                for item in seq
                for i in xrange(int(item.split('[')[-1][:-1].split(':')[0]),
                                int(item.split('[')[-1][:-1].split(':')[-1]) + 1)]
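For reference, the one-liner packs all of the parsing into a single list comprehension. An equivalent step-by-step version (the name expand_seq_readable and its structure are mine, added here just to illustrate what the comprehension does) would be:

    def expand_seq_readable(seq):
        result = []
        for item in seq:
            name, _, index = item.partition('[')    # e.g. 'b' and '2:5]'
            index = index[:-1]                       # drop the trailing ']'
            parts = index.split(':')                 # ['2', '5'] or ['0']
            start, end = int(parts[0]), int(parts[-1])
            for i in xrange(start, end + 1):         # inclusive upper bound
                result.append('%s[%s]' % (name, i))
        return result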
When the sequence generates fewer than 500k items this works well, but it slows down quite a bit when generating larger lists (more than 1 million items). For example:
    import time

    # let's generate 10 million items!
    seq = ['a[1:5000000]', 'b[1:5000000]']

    t1 = time.clock()
    seq = expand_seq(seq)
    t2 = time.clock()

    print round(t2 - t1, 3)
    # result: 9.541 seconds
I'm looking for ways to improve this function and speed it up when dealing with large lists. If anyone has suggestions, I'd love to hear them!
The following seems to give about a 35% speedup:
    import re

    r = re.compile(r"(\w+)\[(\d+)(?::(\d+))?\]")

    def expand_seq(seq):
        result = []
        for item in seq:
            m = r.match(item)
            # name, start index and optional end index of the range
            name, start, end = m.group(1), int(m.group(2)), m.group(3)
            rng = xrange(start, int(end) + 1) if end else (start,)  # inclusive upper bound
            t = name + "["
            result.extend(t + str(i) + "]" for i in rng)
        return result
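As a quick sanity check (my own example, not from the original answer), this version reproduces the expansion shown in the question:

    print expand_seq(['a[0]', 'b[2:5]', 'a[4]'])
    # ['a[0]', 'b[2]', 'b[3]', 'b[4]', 'b[5]', 'a[4]']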
With this code:

- we compile the regular expression once and reuse it for every item, instead of re-parsing the pattern inside the loop;
- we concatenate our strings directly rather than building them with % formatting.
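To compare the two versions on your own machine, a minimal timing harness along the lines of the benchmark in the question (using timeit is my own choice here, and it assumes expand_seq is defined in the same script) could look like this:

    import timeit

    setup = "from __main__ import expand_seq"
    stmt = "expand_seq(['a[1:5000000]', 'b[1:5000000]'])"

    # run the 10-million-item expansion a few times and report the best run
    print min(timeit.repeat(stmt, setup=setup, repeat=3, number=1))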