python - re.search becomes unresponsive
When I run this code, it prints neither 'checked' nor 'not matching'. It stops responding completely.

import re

url = 'http://hoswifi.bblink.cn/v3/2-fd1cc0657845832e5e1248e6539a50fa/topic/55-13950.html?from=home'
m = re.search(r'/\d-(b|(\w+){10,64})/index.html', url)
if m:
    print('checked')
else:
    print('not matching')
Suppose you have the following script:

import re

s = '1234567890'
m = re.search(r'(\w+)*z', s)

Our string contains 10 digits and does not contain 'z'. This is intentional: it forces re.search to check all possible combinations, whereas otherwise it would stop on the first match.

I can't calculate the exact number of possible combinations, since the math involved is rather tricky, but here is a small demonstration of what happens when s gets more digits:

The time goes from 1 μs for a single-digit s to 100 seconds for a 30-digit s, that is, 10^8 times longer.
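As a rough way to see where that growth comes from, the sketch below counts how many ways a run of n word characters can be cut into consecutive non-empty pieces, which is what (\w+)* is allowed to do. It assumes, as a simplification, that a backtracking engine may end up trying each such split before giving up; the real step count also depends on the start position and on engine optimizations. The function name count_splits is made up for this illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_splits(n):
    """Count the ways to cut a string of length n into consecutive non-empty pieces."""
    if n == 0:
        return 1  # nothing left to cut: exactly one (empty) way
    # The first piece takes k characters; the remainder is split recursively.
    return sum(count_splits(n - k) for k in range(1, n + 1))


for n in (1, 10, 20, 30):
    print(n, count_splits(n))
```

The count doubles with every extra character (it equals 2^(n-1)), which is in line with the measured times growing by many orders of magnitude between 1 and 30 digits.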
My guess is that something similar happens when you use (\w+){10,64}. Instead, you should use \w{10,64}.
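A minimal sketch of that fix applied to the pattern from the question, with the nested quantifier (\w+){10,64} replaced by \w{10,64}. The helper name is_index_url and the second URL are made up for illustration; only the first URL comes from the question.

```python
import re

# Same pattern as in the question, but without the nested quantifier.
PATTERN = re.compile(r'/\d-(b|\w{10,64})/index.html')


def is_index_url(url):
    """Return True if the URL contains the /<digit>-<token>/index.html part."""
    return PATTERN.search(url) is not None


# URL from the question: the token is followed by /topic/, not /index.html.
url = 'http://hoswifi.bblink.cn/v3/2-fd1cc0657845832e5e1248e6539a50fa/topic/55-13950.html?from=home'
print('checked' if is_index_url(url) else 'not matching')

# Hypothetical URL (an assumption, not from the question) that should match.
other = 'http://hoswifi.bblink.cn/v3/2-fd1cc0657845832e5e1248e6539a50fa/index.html'
print('checked' if is_index_url(other) else 'not matching')
```

With a single quantifier there is only one way to consume a given number of characters, so a failed match is rejected after a small number of attempts instead of hanging.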
The code I used for the timing demo:

import timeit
import matplotlib.pyplot as plt

setup = """
import re
"""

# Template for the timed statement (raw string keeps \w intact)
_base_stmt = r"m = re.search(r'(\w+)*z','{}')"

# One statement per string length
# (searched string becomes '1', '11', '111'...)
statements = {}
for i in range(1, 18):
    statements.update({i: _base_stmt.format('1'*i)})

# Create x, y values
x = []
y = []
for i in sorted(statements):
    x.append(i)
    y.append(timeit.timeit(statements[i], setup, number=1))

# Plot
plt.plot(x, y)
plt.xlabel('string length')
plt.ylabel('time(sec)')
plt.show()