python - Numpy: Multiprocessing a matrix multiplication with pool
I am trying to calculate a dot product using Pool:

    pool = Pool(8)
    x = np.array([2,3,1,0])
    y = np.array([1,3,1,0])
    print np.dot(x,y)           # works
    print pool.map(np.dot,x,y)  # error below

    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I also tried:

    ne.evaluate('dot(x, y)')

    TypeError: 'VariableNode' object is not callable
What you are trying to do is, unfortunately, not possible in the ways you're trying to do it, and not possible in a simple way either.
To make things worse, the multiprocessing.Pool documentation for Python 2.7 is utterly wrong and lies about Pool.map: it isn't at all equivalent to the builtin map. The builtin map can take multiple argument iterators to pass to the function, while Pool.map can't. This has been known, and not fixed or even documented in the docstring for Pool.map, since at least 2011. There's a partial fix, of course, in Python 3 with starmap.
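To make the difference concrete, here is a minimal Python 3 sketch. It assumes ThreadPool (from multiprocessing.pool) purely so that it runs without a __main__ guard; ThreadPool exposes the same map and starmap methods as multiprocessing.Pool. The variable names are only illustrative.

```python
import operator
from multiprocessing.pool import ThreadPool

xs = [2, 3, 1, 0]
ys = [1, 3, 1, 0]

# The builtin map accepts several iterables and passes one element
# of each to the function on every call.
builtin_result = list(map(operator.mul, xs, ys))

with ThreadPool(2) as pool:
    # pool.map(operator.mul, xs, ys) would not do the same thing:
    # the third positional parameter of Pool.map is chunksize,
    # not a second iterable.
    # starmap takes ONE iterable of argument tuples and unpacks each tuple.
    starmap_result = pool.starmap(operator.mul, zip(xs, ys))

print(builtin_result)
print(starmap_result)
```

Note that this still yields the per-element products, not a dot product; it only shows how multiple arguments reach the worker function.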
Honestly, though, the multiprocessing module isn't terribly useful for speeding up numerical code. For example, see here for a long discussion of situations where many numpy operations were slower when done through multiprocessing.
However, there's another issue here as well: you can't parallelize this. map takes lists/iterators of arguments and applies the function to each in turn. That's not going to do what you want: in this case, try map(np.dot,x,y), and note that the result is the product of each element of x and y as a list, not the dot product. Running a function many times in parallel is easy. Making a function parallel on a single call is hard, because it requires making the function itself parallel. In this case, that would mean rewriting the function.
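A small sketch of both points, written for Python 3 (where map returns an iterator, hence the list call). The chunked_dot helper is hypothetical and only illustrates what "rewriting the function" could look like: the work is split into independent pieces whose partial results are summed. It runs the pieces serially here; each piece is the kind of call that could be handed to a worker.

```python
import numpy as np

x = np.array([2, 3, 1, 0])
y = np.array([1, 3, 1, 0])

# map calls np.dot once per PAIR of scalars, so this is a list of
# element-wise products, not a single dot product.
elementwise = list(map(np.dot, x, y))


def chunked_dot(a, b, n_chunks=2):
    """Hypothetical helper: dot product of 1D arrays computed chunk by chunk."""
    a_parts = np.array_split(a, n_chunks)
    b_parts = np.array_split(b, n_chunks)
    # Each partial dot product is independent of the others.
    partials = map(np.dot, a_parts, b_parts)
    return sum(partials)


print(elementwise)
print(chunked_dot(x, y), np.dot(x, y))
```

For arrays this small, any parallel dispatch would cost far more than the arithmetic itself.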
Except that np.dot is already parallelized, if you have a version of numpy built with BLAS or ATLAS (try np.__config__.show()). So you don't need to do any work at all in this case: np.dot(x,y) should already use all your cores without any work!
I should note that this is, however, restricted to certain dtypes; floats are supported. On my computer, for example, behold the striking differences between float and int:
    In [19]: a = np.matrix(np.random.randint(0,10,size=(1000,1000)),dtype='int')
    In [20]: b = a.astype('float')
    In [23]: %timeit np.dot(a,a)
    1 loops, best of 3: 6.91 s per loop
    In [24]: %timeit np.dot(b,b)
    10 loops, best of 3: 28.1 ms per loop
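If you want to repeat this comparison outside IPython, the following is a rough sketch using the standard timeit module. The time_dot helper and the 200x200 size are my own choices to keep it quick, and it uses plain arrays rather than np.matrix; the actual numbers depend entirely on your machine and on how your numpy was built.

```python
import timeit

import numpy as np


def time_dot(dtype, n=200, repeats=3):
    """Hypothetical helper: best-of-`repeats` time in seconds for one n x n np.dot."""
    rng = np.random.default_rng(0)
    a = rng.integers(0, 10, size=(n, n)).astype(dtype)
    # number=1 times a single call; min() takes the best of the repeats.
    return min(timeit.repeat(lambda: np.dot(a, a), number=1, repeat=repeats))


int_seconds = time_dot(np.int64)
float_seconds = time_dot(np.float64)
print("int64:   %.6f s" % int_seconds)
print("float64: %.6f s" % float_seconds)
```

Increase n toward 1000 to get closer to the session above.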
As for numexpr (and in asking questions, it's useful to point out what abbreviations you're using that others might not know), there's a limited set of supported functions; check the documentation for a list. The error occurs because dot isn't a supported function. Since you're dealing with 1D arrays, and dot is pretty simple to define, the following will work: ne.evaluate('sum(x*y)'). I doubt, however, that you're planning on only using 1D arrays.
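As a self-contained sketch of the 1D case: dot_1d is a hypothetical wrapper name, and the fallback to plain numpy is an assumption added only so the snippet still runs where numexpr isn't installed. The arrays are passed explicitly through local_dict rather than picked up from the calling frame.

```python
import numpy as np

try:
    import numexpr as ne
except ImportError:
    # Assumption: fall back to numpy if numexpr is unavailable.
    ne = None


def dot_1d(x, y):
    """Hypothetical helper: dot product of two 1D arrays as sum(x*y)."""
    if ne is None:
        return np.sum(x * y)
    # 'sum' is one of numexpr's supported reductions; 'dot' is not.
    return ne.evaluate('sum(x*y)', local_dict={'x': x, 'y': y})


x = np.array([2, 3, 1, 0])
y = np.array([1, 3, 1, 0])
print(dot_1d(x, y), np.dot(x, y))
```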
If you really want to parallelize things on a large scale, I'd suggest using IPython and its parallel system, which, unlike Python's multiprocessing, is useful for numerical work. As an added bonus, you can parallelize across computers. However, this sort of parallelization is only useful for things that take a while per run; if you just want to use all your cores for simple things, it's probably best to hope that numpy has multiprocessor support for the functions you want to use.