python - re.search becomes unresponsive
When I run this code, it prints neither 'checked' nor 'not matching'. It stops responding completely.
    import re

    url = 'http://hoswifi.bblink.cn/v3/2-fd1cc0657845832e5e1248e6539a50fa/topic/55-13950.html?from=home'
    m = re.search(r'/\d-(b|(\w+){10,64})/index.html', url)
    if m:
        print('checked')
    else:
        print('not matching')
Suppose you have the following script:

    import re

    s = '1234567890'
    m = re.search(r'(\w+)*z', s)
Our string contains 10 digits and does not contain 'z'. This is intentional: it forces re.search to check all possible combinations, since otherwise it would stop on the first match.
I can't calculate the number of possible combinations, since the math involved is rather tricky, but here is a small demonstration of what happens when s gets more digits:

The time goes from 1μs for a single-digit s to 100 seconds for a 30-digit s, that is, 10^8 times longer.
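As a rough illustration only, here is a simplified model of that growth. It counts the ways a group like (\w+)* could cut a run of n word characters into non-empty consecutive pieces. This is not a description of what Python's regex engine does internally, and the exact number of steps it takes is harder to pin down. The helper name count_splits is made up for this sketch.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_splits(n):
    """Count the ways to cut n characters into non-empty consecutive pieces."""
    if n == 0:
        return 1  # nothing left to cut: exactly one way
    # Pick the length of the first piece, then split whatever remains.
    return sum(count_splits(n - first) for first in range(1, n + 1))


for n in (1, 5, 10, 20, 30):
    print(n, count_splits(n))
```

In this model the count doubles with every extra character, which is the kind of exponential growth the timings suggest.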
My guess is that something similar happens when you use (\w+){10,64}. Instead, you should use \w{10,64}.
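A minimal sketch of the suggested change, applied to the pattern from the question. Only the quantifier is changed. The second URL and the helper name check are made up for illustration.

```python
import re

# Pattern from the question, with (\w+){10,64} replaced by \w{10,64}.
pattern = re.compile(r'/\d-(b|\w{10,64})/index.html')

# URL from the question: it has no '/index.html' part, so no match is expected.
url = 'http://hoswifi.bblink.cn/v3/2-fd1cc0657845832e5e1248e6539a50fa/topic/55-13950.html?from=home'

# Hypothetical URL (an assumption, not from the question) that should match.
other_url = 'http://example.com/v3/2-fd1cc0657845832e5e1248e6539a50fa/index.html'


def check(u):
    """Return the same messages the question's script prints."""
    return 'checked' if pattern.search(u) else 'not matching'


print(check(url))
print(check(other_url))
```

With a single \w{10,64} there is only one way to consume a run of word characters of a given length, so a failed match is rejected quickly instead of retrying every possible split.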
The code used for the demo:

    import timeit
    import matplotlib.pyplot as plt

    setup = "import re"

    # Statement template (the searched string becomes '1', '11', '111', ...)
    _base_stmt = r"m = re.search(r'(\w+)*z', '{}')"

    statements = {}
    for i in range(1, 18):
        statements[i] = _base_stmt.format('1' * i)

    # Create the x, y values
    x = []
    y = []
    for i in sorted(statements):
        x.append(i)
        y.append(timeit.timeit(statements[i], setup, number=1))

    # Plot
    plt.plot(x, y)
    plt.xlabel('string length')
    plt.ylabel('time(sec)')
    plt.show()