Wednesday, 15 January 2014

php - Compare table rows, big data amount -



php - Compare table rows, big data amount -

i have quite interesting task. don't know how phone call in 1 word in order search related topics. topic title might not reflect need. so, if has improve title - welcome.

i'll seek explain problem.

i have 100,000 rows in mysql db table. , need "compare" entries table.

"compare" doesn't mean equal. there algorithm calculation comparing level. have weight coefficient each table column. means if entry#1's column1 equals entry#2's column2 give, say, 5 point pair. , on each column.

the straight forwards way - apply calculation rules each couple of entries. why afraid of this? 100,000 entries means 5 billion "compare" operations. sure, can calculate on demand , store result somewhere in cache. believe obvious way not effective.

so, first question is: is there other improve way achive goal except of brute force?

my sec question related tool improve calculations.

application language php. hence, need load memory whole table , iterate on data. create stored procedure in mysql. using mongodb's aggregation framework or mapreduce.

the to the lowest degree of first way. of - last.

i'm looking suggestion or advice people have experience in such sort of cases.

since, don't know how inquire google help, links appreciated.

update:

calculation rules bit more complicated described...

table has set of related columns used @ 1 time group(not 1 one). let's assume:

table has fields, say, tag_1, tag_2, .., tag_n. row_1 , row_2 - entries in table.

the rule(pseudo-code):

if(row_1.tag_1==row_2.tag_1) { // gives 10 points } elseif(row_1.tag_1 in row_2.tags && row_1.tag_1!=row_2.tag_1) { // gives 5 points } .... // , on

basically, need check find intersection of 2 arrays. if not empty - points given. if indexes of tags in 2 rows match additional points given.

i'm wondering, how can accomplished using stored procedures language? because can done pretty easy using programming language.

if stored procedure can choice.

if have static table, doesn't create difference choose, long store results somewhere (presumably in database).

if info changing, need compare each new row rows, full-table scan. best done in database.

if info fits memory (and 500,000 rows should fit memory), (2) faster (3) on equivalent hardware. "equivalent hardware" of import consideration.

in cases, opt (2). sounds query like:

select t.id, t2.id, ((case when t1.col1 = t2.col1 5 else 0 end) + (case when t2.col2 = t2.col2 7 else 0 end) + . . . ) t cross bring together t2

if much more comfortable map-reduce, might find easier code there. know both languages , prefer sql this.

php mysql mongodb mapreduce aggregation

No comments:

Post a Comment