Friday, 15 August 2014

algorithm - Transform a set of large integers into a set of small ones -

How do I recode a set of strictly increasing (or strictly decreasing) positive integers P, so as to decrease the number of positive integers that can occur between the integers in our set?

Why I want this: I want to randomly sample P, but 1.) P is too big to enumerate, and 2.) members of P are related in a nonrandom way, but in a way that is too complicated to sample by. However, I know a member of P when I see it. I know P[0] and P[n], but can't entertain the thought of enumerating all of P or understanding exactly how members of P are related. Likewise, the number of possible integers occurring between P[0] and P[n] is many times greater than the size of P, making the chance of randomly drawing a member of P very small.

Example: Let P[0] = 2101010101 and P[n] = 505050505. Now, maybe we're only interested in integers between P[0] and P[n] that have a specific quality (e.g. all integers in P[x] sum to Q or less, each member of P has 7 or less as its largest integer). So, not all positive integers P[n] <= x <= P[0] belong to P. The P I'm interested in is discussed in the comments below.
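For concreteness, a membership test for a set like this might look like the sketch below. The threshold q=25 and the digit cap of 7 are illustrative assumptions, not the actual definition of P (which is only given in the comments):

```python
# Hypothetical membership test for a set P like the one described.
# The defaults q=25 and max_digit=7 are illustrative guesses, not the
# OP's actual definition of P.
def in_p(x, q=25, max_digit=7):
    """Say "yup, x belongs to P" if x's decimal digits sum to at most q
    and no single digit exceeds max_digit."""
    digits = [int(c) for c in str(x)]
    return sum(digits) <= q and max(digits) <= max_digit

print(in_p(2101010101))  # True: digits sum to 7, largest digit is 2
print(in_p(999))         # False: digits sum to 27
```

Whatever the real predicate is, it is exactly this "yup, that belongs to P" check that rejection sampling needs.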

What I've tried: If P is a strictly decreasing set and we know P[0] and P[n], then we can treat each member as if it were subtracted from P[0]. Doing so decreases each number, perhaps greatly, and keeps each member as a unique integer. For the P I'm interested in (below), one can treat each decreased value of P as being divided by a common denominator (9, 11, 99), which decreases the number of possible integers between members of P. I've found that, used in conjunction, these approaches decrease the set of all P[0] <= x <= P[n] by a few orders of magnitude, but the chance of randomly drawing a member of P from all positive integers P[n] <= x <= P[0] is still small.
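The shift-and-divide recoding described above can be sketched generically, using the gcd of the shifted values in place of the specific denominators (9, 11, 99):

```python
from functools import reduce
from math import gcd

def recode(p):
    """Shift a set of integers down by its minimum, then divide out the
    gcd of the shifted values. Both steps are reversible, so membership
    checks can be run in the much smaller recoded range."""
    base = min(p)
    shifted = [x - base for x in p]
    d = reduce(gcd, shifted) or 1   # guard: a single-element set gives gcd 0
    return [x // d for x in shifted], base, d

def decode(coded, base, d):
    """Invert recode()."""
    return [x * d + base for x in coded]

coded, base, d = recode([990, 297, 198, 99])
print(coded, base, d)  # [9, 2, 1, 0] 99 99
```

As the question notes, this shrinks the search range by a few orders of magnitude at best; it cannot exploit any deeper structure in P.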

Note: As should be clear, we have to know something about P. If we don't, that basically means we have no clue what we're looking for. When we randomly sample integers between P[0] and P[n] (recoded or not), we need to be able to say "Yup, that belongs to P.", if it indeed does.

A good answer would greatly increase the practical application of a computing algorithm I have developed. An example of the kind of P I'm interested in is given in comment 2. I am adamant about giving due credit.

While the original question is asking about a very generic scenario concerning integer encodings, I would suggest that it is unlikely there exists an approach that works in complete generality. For example, if the P[i] are more or less random (from an information-theoretic standpoint), I would be surprised if anything here would work.

So, instead, let us turn our attention to the OP's actual problem of generating partitions of an integer N containing exactly K parts. When encoding combinatorial objects as integers, it behooves us to preserve as much of the combinatorial structure as possible. For this, we turn to the classic text Combinatorial Algorithms by Nijenhuis and Wilf, specifically Chapter 13. In fact, in this chapter, they demonstrate a framework to enumerate and sample from a number of combinatorial families -- including partitions of N where the largest part is equal to K. Using the well-known duality between partitions with K parts and partitions whose largest part is K (take the transpose of the Ferrers diagram), we find that we only need to make a change to the decoding process.
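The duality is cheap to compute directly. A minimal sketch of the Ferrers-diagram transpose (the conjugate partition):

```python
def conjugate(partition):
    """Transpose the Ferrers diagram of a partition (given as a
    nonincreasing list of parts). A partition with exactly k parts maps
    to a partition whose largest part is k, and vice versa."""
    if not partition:
        return []
    # Row i of the transpose counts the parts taller than i.
    return [sum(1 for part in partition if part > i)
            for i in range(partition[0])]

# A partition of 20 with 5 parts <-> a partition of 20 with largest part 5
print(conjugate([7, 6, 5, 1, 1]))             # [5, 3, 3, 3, 3, 2, 1]
print(conjugate(conjugate([7, 6, 5, 1, 1])))  # back to [7, 6, 5, 1, 1]
```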

Anyways, here's the source code:

import sys
import random
import time

if len(sys.argv) < 4:
    sys.stderr.write("usage: {0} n k iter\n".format(sys.argv[0]))
    sys.stderr.write("\tn = number to be partitioned\n")
    sys.stderr.write("\tk = number of parts\n")
    sys.stderr.write("\titer = number of iterations (if iter=0, enumerate all partitions)\n")
    quit()

n = int(sys.argv[1])
k = int(sys.argv[2])
iters = int(sys.argv[3])

if n < k:
    sys.stderr.write("error: n<k ({0}<{1})\n".format(n, k))
    quit()

# b[m][j] = number of partitions of m with largest part equal to j
b = [[0 for j in range(k+1)] for i in range(n+1)]

def calc_b(n, k):
    for j in xrange(1, k+1):
        for m in xrange(j, n+1):
            if j == 1:
                b[m][j] = 1
            elif m - j > 0:
                b[m][j] = b[m-1][j-1] + b[m-j][j]
            else:
                b[m][j] = b[m-1][j-1]

def generate(n, k, r=None):
    path = []
    append = path.append

    # invalid input
    if n < k or n == 0 or k == 0:
        return []

    # pick a random number between 1 and b[n][k] if r is not specified
    if r is None:
        r = random.randrange(1, b[n][k]+1)

    # build the path from r
    while r > 0:
        if n == 1 and k == 1:
            append('n')
            r = 0                  # terminate the loop
        elif r <= b[n-k][k] and b[n-k][k] > 0:   # east/west move
            append('e')
            n = n-k
        else:                                    # north/south move
            append('n')
            r -= b[n-k][k]
            n = n-1
            k = k-1

    # decode the path into a partition
    partition = []
    l = 0
    d = 0
    append = partition.append
    for i in reversed(path):
        if i == 'n':
            if d > 0:              # apply accumulated east moves all at once
                for j in xrange(l):
                    partition[j] += d
                d = 0              # reset east moves
            append(1)              # apply north move
            l += 1
        else:
            d += 1                 # accumulate east moves
    if d > 0:                      # apply any remaining east moves
        for j in xrange(l):
            partition[j] += d
    return partition

t = time.clock()
sys.stderr.write("generating b table... ")
calc_b(n, k)
sys.stderr.write("done ({0} seconds)\n".format(time.clock()-t))

bmax = b[n][k]
bits = 0
sys.stderr.write("b[{0}][{1}]: {2}\t".format(n, k, bmax))
while bmax > 1:
    bmax //= 2
    bits += 1
sys.stderr.write("bits: {0}\n".format(bits))

if iters == 0:
    # enumerate all partitions
    for i in xrange(1, b[n][k]+1):
        print i, "\t", generate(n, k, i)
else:
    # generate random partitions
    t = time.clock()
    for i in xrange(1, iters+1):
        q = generate(n, k)
        print q
        if i % 1000 == 0:
            sys.stderr.write("{0} written ({1:.3f} seconds)\r".format(i, time.clock()-t))
    sys.stderr.write("{0} written ({1:.3f} seconds total) ({2:.3f} iterations per second)\n".format(
        i, time.clock()-t, float(i)/(time.clock()-t) if time.clock()-t else 0))

And here are some examples of performance (on a MacBook Pro 8.3, 2GHz i7, 4 GB RAM, Mac OSX 10.6.3, Python 2.6.1):

mhum$ python part.py 20 5 10
generating b table... done (6.7e-05 seconds)
b[20][5]: 84	bits: 6
[7, 6, 5, 1, 1]
[6, 6, 5, 2, 1]
[5, 5, 4, 3, 3]
[7, 4, 3, 3, 3]
[7, 5, 5, 2, 1]
[8, 6, 4, 1, 1]
[5, 4, 4, 4, 3]
[6, 5, 4, 3, 2]
[8, 6, 4, 1, 1]
[10, 4, 2, 2, 2]
10 written (0.000 seconds total) (37174.721 iterations per second)

mhum$ python part.py 20 5 1000000 > /dev/null
generating b table... done (5.9e-05 seconds)
b[20][5]: 84	bits: 6
100000 written (2.013 seconds total) (49665.478 iterations per second)

mhum$ python part.py 200 25 100000 > /dev/null
generating b table... done (0.002296 seconds)
b[200][25]: 147151784574	bits: 37
100000 written (8.342 seconds total) (11987.843 iterations per second)

mhum$ python part.py 3000 200 100000 > /dev/null
generating b table... done (0.313318 seconds)
b[3000][200]: 3297770929953648704695235165404132029244952980206369173	bits: 181
100000 written (59.448 seconds total) (1682.135 iterations per second)

mhum$ python part.py 5000 2000 100000 > /dev/null
generating b table... done (4.829086 seconds)
b[5000][2000]: 496025142797537184410324290349759736884515893324969819660	bits: 188
100000 written (255.328 seconds total) (391.653 iterations per second)

mhum$ python part-final2.py 20 3 0
generating b table... done (0.0 seconds)
b[20][3]: 33	bits: 5
1 	[7, 7, 6]
2 	[8, 6, 6]
3 	[8, 7, 5]
4 	[9, 6, 5]
5 	[10, 5, 5]
6 	[8, 8, 4]
7 	[9, 7, 4]
8 	[10, 6, 4]
9 	[11, 5, 4]
10 	[12, 4, 4]
11 	[9, 8, 3]
12 	[10, 7, 3]
13 	[11, 6, 3]
14 	[12, 5, 3]
15 	[13, 4, 3]
16 	[14, 3, 3]
17 	[9, 9, 2]
18 	[10, 8, 2]
19 	[11, 7, 2]
20 	[12, 6, 2]
21 	[13, 5, 2]
22 	[14, 4, 2]
23 	[15, 3, 2]
24 	[16, 2, 2]
25 	[10, 9, 1]
26 	[11, 8, 1]
27 	[12, 7, 1]
28 	[13, 6, 1]
29 	[14, 5, 1]
30 	[15, 4, 1]
31 	[16, 3, 1]
32 	[17, 2, 1]
33 	[18, 1, 1]

I'll leave it to the OP to verify that this code indeed generates partitions according to the desired (uniform) distribution.
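As a starting point for that verification, here is a sketch (Python 3, independent of the script above) that checks the b-table recurrence against a brute-force partition count for small inputs:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def b(m, j):
    """The recurrence from the script: number of partitions of m whose
    largest part equals j (b[m][1] = 1; b[m][j] = b[m-1][j-1] + b[m-j][j])."""
    if j < 1 or m < j:
        return 0
    if j == 1:
        return 1
    return b(m - 1, j - 1) + (b(m - j, j) if m - j > 0 else 0)

def brute_count(n, k):
    """Count partitions of n with largest part exactly k by fixing the
    first part to k and enumerating the remainder recursively."""
    def parts(remaining, max_part):
        if remaining == 0:
            return 1
        return sum(parts(remaining - p, p)
                   for p in range(min(remaining, max_part), 0, -1))
    return parts(n - k, k) if n >= k else 0

print(b(20, 5), brute_count(20, 5))  # 84 84 (matches b[20][5] above)
print(b(20, 3))                      # 33 (matches the enumeration above)
```

If uniformity itself is in doubt, one can also run the enumeration mode, sample many times, and compare the empirical frequency of each index against 1/b[n][k].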

EDIT: Added an example of the enumeration functionality.

algorithm combinatorics sampling random-sample number-theory
