Thursday, 15 March 2012

python - How to vectorize this loop difference in numpy? -



python - How to vectorize this loop difference in numpy? -

i sense there should quick way of speeding code. think reply here, cannot seem problem in format. underlying problem attempting solve find point wise difference in terms of parallel , perpendicular components , create 2d histogram of these differences.

out = np.zeros((len(rpbins)-1,len(pibins)-1)) tmp = np.zeros((len(x),2)) in xrange(len(x)): tmp[:,0] = x - x[i] tmp[:,1] = y - y[i] para = np.sum(tmp**2,axis=-1)**(1./2) perp = np.abs(z - z[i]) h, _, _ = np.histogram2d(para, perp, bins=[rpbins, pibins]) out += h

vectorizing things tricky, because rid of loop on n elements have build array of (n, n), big inputs worse performance python loop. can done:

mask = np.triu_indices(x.shape[0], 1) para = np.sqrt((x[:, none] - x)**2 + (y[:, none] - y)**2) perp = np.abs(z[:, none] - z) hist, _, _ = np.histogram2d(para[mask], perp[mask], bins=[rpbins, pibins])

the mask avoid counting each distance twice. have set diagonal offset 1 avoid including 0 distances of each point in histogram. if don't index para , perp it, exact same result code.

with sample data:

items = 100 rpbins, pibins = np.linspace(0, 1, 3), np.linspace(0, 1, 3) x = np.random.rand(items) y = np.random.rand(items) z = np.random.rand(items)

i hist , out:

>>> hist array([[ 1795., 651.], [ 1632., 740.]]) >>> out array([[ 3690., 1302.], [ 3264., 1480.]])

and out[i, j] = 2 * hist[i, j] except i = j = 0, out[0, 0] = 2 * hist[0, 0] + items because of 0 distance of each item itself.

edit tried next after tcaswell's comment:

items = 1000 rpbins, pibins = np.linspace(0, 1, 3), np.linspace(0, 1, 3) x, y, z = np.random.rand(3, items) def hist1(x, y, z, rpbins, pibins) : mask = np.triu_indices(x.shape[0], 1) para = np.sqrt((x[:, none] - x)**2 + (y[:, none] - y)**2) perp = np.abs(z[:, none] - z) hist, _, _ = np.histogram2d(para[mask], perp[mask], bins=[rpbins, pibins]) homecoming hist def hist2(x, y, z, rpbins, pibins) : mask = np.triu_indices(x.shape[0], 1) para = np.sqrt((x[:, none] - x)[mask]**2 + (y[:, none] - y)[mask]**2) perp = np.abs((z[:, none] - z)[mask]) hist, _, _ = np.histogram2d(para, perp, bins=[rpbins, pibins]) homecoming hist def hist3(x, y, z, rpbins, pibins) : mask = np.triu_indices(x.shape[0], 1) para = np.sqrt(((x[:, none] - x)**2 + (y[:, none] - y)**2)[mask]) perp = np.abs((z[:, none] - z)[mask]) hist, _, _ = np.histogram2d(para, perp, bins=[rpbins, pibins]) homecoming hist in [10]: %timeit -n1 -r10 hist1(x, y, z, rpbins, pibins) 1 loops, best of 10: 289 ms per loop in [11]: %timeit -n1 -r10 hist2(x, y, z, rpbins, pibins) 1 loops, best of 10: 294 ms per loop in [12]: %timeit -n1 -r10 hist3(x, y, z, rpbins, pibins) 1 loops, best of 10: 278 ms per loop

it seems of time spent instantiating new arrays, not doing actual computations, while there efficiency scrape off, there isn't much.

python matlab numpy linear-algebra

No comments:

Post a Comment