opencv - Recognizing similar shapes at random scale and translation
Playing around with finding stuff on a graphical screen, I'm at a loss as to how to find a given shape within an image. The shape in the image may be at a different scale and will be at an unknown x,y offset, of course.
Aside from the pixel artifacts resulting from the different scales, there is a little noise in both images, so I need a tolerant search.
Here's the image I'm looking for.
It should show up somewhere in a screen dump of my (dual) screen buffer, 3300 x 1200 pixels in size. I'd of course expect to find it in a browser window, but that information shouldn't be necessary.
The object of the exercise (so far) is to come up with a result that says:
yes, the wooden frame (of approximately that color and, perhaps truncated, that shape) was found on screen (or not); and the game's client area (the black area within the frame) occupies the rectangle (x1,y1)
to (x2,y2)
. I want to be robust against scaling and the noise that's introduced by dithering. On the other hand, I can rule out some of the usual CV challenges, such as rotation or non-rigidity. The frame shape is dead easy for a human brain to discern, so how hard can it be for a dedicated piece of software? It's an Adobe Flash application, and until now I had thought that perceiving images from a game GUI would be easy as pie.
I'm looking for an algorithm that finds the x,y translation at which the greatest possible overlap between needle and haystack occurs, if possible without having to iterate through a series of possible scale factors. Ideally, the algorithm would abstract out the "shape-ness" of the images in a way that's independent of scale.
I've read interesting things about Fourier transforms being used to accomplish something similar: given a target image at the same scale, an FFT and some matrix math yielded the points in the bigger image that corresponded to the search pattern. I don't have the theoretical background to put that into practice, nor do I know whether the approach would gracefully handle the scale problem. Any help is appreciated!
Technology: I'm programming in Clojure/Java but can adapt algorithms from other languages. I think I should be able to interface with libraries that follow C calling conventions, but I'd prefer a pure Java solution.
You may now understand why I've shied away from presenting the actual image. It's just a silly game, but the task of screen-reading is proving much more challenging than I had thought.
I'm able to do an exhaustive search of the screen buffer pixels (excluding black) to create the image, and it runs in under a minute. My ambition is to find the wooden frame using a technique that matches the shape regardless of the differences that might arise from scaling and dithering.
Dithering is, in fact, one of the many frustrations I'm having with this project. I've been working on extracting useful vectors by border extraction, but the edges are woefully elusive because the pixels of a given area have inconsistent colors - it's hard to tell real edges from local dithering artifacts. I had no idea that such a simple-looking game would produce graphics so hard for software to perceive.
Should I start off by locally averaging the pixels before I start looking for features? Should I cut down the color depth by throwing away the least significant bits of the pixel color values?
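To illustrate the second idea, here is a minimal sketch of reducing color depth by masking off the low bits of each ARGB channel, so that dither-adjacent colors collapse to the same value. The class name, method names, and the choice of keeping the top 4 bits per channel are my own assumptions to experiment with, not anything from the question.

```java
public class ColorQuantize {
    // Keep only the top `bitsKept` bits of each 8-bit color channel.
    // Hypothetical helper; tune `bitsKept` experimentally.
    public static int quantize(int argb, int bitsKept) {
        int mask = (0xFF << (8 - bitsKept)) & 0xFF;
        int r = (argb >> 16) & mask;
        int g = (argb >> 8) & mask;
        int b = argb & mask;
        return (0xFF << 24) | (r << 16) | (g << 8) | b;
    }

    public static void main(String[] args) {
        // Two slightly different "wooden" browns, as dithering might
        // produce them, land on the same quantized value.
        int a = quantize(0xFF8A5521, 4);
        int b = quantize(0xFF8F5A2E, 4);
        System.out.println(a == b);  // prints "true"
    }
}
```

The pixel values would come from something like `BufferedImage.getRGB`, which packs pixels in exactly this ARGB layout.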
I'm trying for a pure Java solution (I'm actually programming in a Clojure/Java mix), so I'm not wild about OpenCV (which installs .dll's or .so's of C code). Please don't worry about my choice of language; the learning experience is much more interesting to me than performance.
Being a computer vision guy, I would normally point you to feature extraction and matching (SIFT, SURF, LBP, etc.), but that is overkill here, since most of these methods offer more invariances (= tolerances against transformations) than you require (e.g. against rotation, luminance change, ...). Also, using features would involve either OpenCV or lots of programming.
So here is my proposal for a simple solution - you be the judge of whether it passes the smartness threshold:
It looks like the image you are looking for has some distinct structures (the letters, logos, etc.). I would suggest a pixel-to-pixel match for every possible translation, and for a number of different scales (I assume the range of scales is limited) - but only for a small distinctive patch of the image you are looking for (say, a square portion of the yellowish text). That is much faster than matching the whole thing. If you want a fancy name for it: in image processing this is called template matching by correlation. The "template" is the thing you are looking for.
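A minimal sketch of this idea, scoring each translation by sum of squared differences on grayscale arrays (a common correlation-style score; all names are my own, and a real version would repeat this over the few candidate scales):

```java
public class TemplateMatch {
    /** Returns {bestX, bestY}, the translation minimizing the SSD score. */
    public static int[] match(int[][] haystack, int[][] needle) {
        int hh = haystack.length, hw = haystack[0].length;
        int nh = needle.length, nw = needle[0].length;
        long bestScore = Long.MAX_VALUE;
        int bestX = 0, bestY = 0;
        for (int y = 0; y <= hh - nh; y++) {
            for (int x = 0; x <= hw - nw; x++) {
                long score = 0;
                // Early exit once this offset is already worse than the best.
                for (int j = 0; j < nh && score < bestScore; j++) {
                    for (int i = 0; i < nw; i++) {
                        long d = haystack[y + j][x + i] - needle[j][i];
                        score += d * d;  // sum of squared differences
                    }
                }
                if (score < bestScore) {
                    bestScore = score;
                    bestX = x;
                    bestY = y;
                }
            }
        }
        return new int[] { bestX, bestY };
    }

    public static void main(String[] args) {
        int[][] haystack = {
            {0, 0, 0, 0},
            {0, 9, 8, 0},
            {0, 7, 6, 0},
            {0, 0, 0, 0},
        };
        int[][] needle = { {9, 8}, {7, 6} };
        int[] best = match(haystack, needle);
        System.out.println(best[0] + "," + best[1]);  // prints "1,1"
    }
}
```

Because the patch is small, the brute-force loop over all translations of a 3300 x 1200 buffer stays tractable, and the same routine can be rerun per candidate scale.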
Once you have found a few candidate locations for your small distinctive patch, you can verify that you have a hit by testing either the whole image or, more efficiently, a few other distinctive patches of the image (using, of course, the translation/scale you found). This makes your search robust against accidental matches of the original patch without costing much performance.
Regarding dithering tolerance, I would go for simple pre-filtering of both images (the template you are looking for, and the image that is your search space). Depending on the properties of the dithering, you can start by experimenting with a simple box blur, and move on to a median filter with a small kernel (3 x 3) if that does not work. This will not give you 100% identity between template and searched image, but robust numerical scores that you can compare.
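The box-blur pre-filter could be sketched like this on a grayscale array (names are my own; border pixels are averaged over the neighbors that exist, which is one of several reasonable border policies):

```java
public class BoxBlur {
    /** 3 x 3 box blur; smooths dithering speckles before matching. */
    public static int[][] blur3x3(int[][] img) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int sum = 0, count = 0;
                // Average over the in-bounds 3 x 3 neighborhood.
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int ny = y + dy, nx = x + dx;
                        if (ny >= 0 && ny < h && nx >= 0 && nx < w) {
                            sum += img[ny][nx];
                            count++;
                        }
                    }
                }
                out[y][x] = sum / count;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A lone bright pixel (a dithering speckle) gets spread out.
        int[][] img = {
            {0, 0, 0},
            {0, 90, 0},
            {0, 0, 0},
        };
        int[][] smooth = blur3x3(img);
        System.out.println(smooth[1][1]);  // prints "10" (90 / 9)
    }
}
```

Blurring both the template and the screen buffer with the same filter keeps their SSD scores comparable.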
Edit in light of the comments
I understand that (1) you want a more robust, more "CV-like" and a bit fancier solution, and (2) you are skeptical about achieving scale invariance by scanning through a big stack of different scales.
Regarding (1), the canonical approach is, as mentioned above, to use feature descriptors. Feature descriptors do not describe a complete image (or shape), but a small portion of an image, in a way that is invariant against various transformations. Have a look at SIFT and SURF, and at VLFeat, which has a good SIFT implementation and also implements MSER and HOG (and is much smaller than OpenCV). SURF is easier to implement than SIFT, but both are heavily patented. Both have an "upright" version, which has no rotation invariance; that should increase robustness in your case.
The strategy you describe in your comment goes more in the direction of shape descriptors than image feature descriptors. Make sure you understand the difference between the two! 2D shape descriptors aim at shapes that are typically described by an outline or a binary mask. Image feature descriptors (in the sense used above) aim at images with intensity values, typically photographs. An interesting shape descriptor is shape context; many others are summarized here. I don't think your problem is best solved by shape descriptors, but maybe I misunderstood something. Be careful with shape descriptors on image edges, as edges, being first derivatives, can be strongly altered by dithering noise.
Regarding (2): I'd like to convince you that scanning through a bunch of different scales is not a stupid hack by people who don't know computer vision! Actually, it is done a lot in vision; we just have a fancy name for it to mislead the uninitiated - scale space search. That's a bit of an oversimplification, but only a bit. The image feature descriptors that are used in practice achieve their scale invariance using a scale space, a stack of increasingly downscaled (and low-pass filtered) images. The only trick they add is to find extrema in the scale space and compute the descriptors only at those extrema. Still, the complete scale space is computed and traversed to find those extrema. Have a look at the original SIFT paper for a good explanation of this.
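As a rough illustration of the scale stack, here is a sketch that builds an image pyramid by repeatedly halving a grayscale image, averaging 2 x 2 blocks (a crude stand-in for proper low-pass filtering; all names here are my own). Running the template match at each level approximates the scan over scale factors:

```java
import java.util.ArrayList;
import java.util.List;

public class Pyramid {
    /** Halves the image, averaging each 2 x 2 block of pixels. */
    public static int[][] downscale(int[][] img) {
        int h = img.length / 2, w = img[0].length / 2;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                out[y][x] = (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
                           + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4;
            }
        }
        return out;
    }

    /** Builds pyramid levels until the image gets smaller than minSize. */
    public static List<int[][]> build(int[][] img, int minSize) {
        List<int[][]> levels = new ArrayList<>();
        while (img.length >= minSize && img[0].length >= minSize) {
            levels.add(img);
            img = downscale(img);
        }
        return levels;
    }

    public static void main(String[] args) {
        int[][] img = new int[16][16];  // dummy 16 x 16 buffer
        List<int[][]> pyr = build(img, 4);
        System.out.println(pyr.size());  // levels of size 16, 8, 4 -> prints "3"
    }
}
```

A real scale space uses finer scale steps and Gaussian filtering, but the structure - match once per level of a precomputed stack - is the same.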
opencv image-processing computer-vision fft image-recognition