sql - MySQL query optimization: how to optimize voting calculations?
Hope you're doing fine. I need a bit of help with my database.

The database stores votes. Users pick sound tracks they like and vote on them. A vote can be 'up' or 'down'. Easy as pie. But when it comes to calculating stats, things get hairy.

meta - a key-value styled table that stores commonly used stats (just a sort of caching):
mysql> select * from meta;
+-------------+-------+
| key         | value |
+-------------+-------+
| track_count | 2620  |
| vote_count  | 3821  |
| user_count  | 371   |
+-------------+-------+
vote - the vote table holds the votes themselves. The interesting field here is type, whose value means:

0 - app-made vote: the user voted for the track using the UI
1 - imported vote (from an external service)
2 - merged vote: the same imported vote, but it marks that the user who voted for the track using the external service is now repeating himself using the app

track - the track table holds the total stats for the track itself: the amount of likes, dislikes, likes from the external service (likesrp), dislikes from the external service (dislikesrp), and likes/dislikes adjustments.
The app requires vote stats for:

- the 5 most up-voted tracks during the last 7 days
- the 5 most down-voted tracks during the last 7 days
- the 5 most up-voted tracks during the last 7 days, counting only votes imported from the external service (vote.type = 1)
- the 100 most up-voted tracks during the last month

To get the 100 most up-voted tracks I use this query:
select t.hash, t.title, t.artist,
       coalesce(x.votestotal, 0) + t.likesadjust as votesadjusted
from (
    select v.trackhash, sum(v.vote) as votestotal
    from vote v
    where v.createdat > now() - interval 1 month
      and v.vote = 'up'
    group by v.trackhash
    order by votestotal desc
) x
right join track t on t.hash = x.trackhash
order by votesadjusted desc
limit 0, 100;
This query works OK and honors the adjustments (the client wanted to be able to adjust a track's position in the lists). The same query is used for the 5 most up/down-voted tracks. And the query for task #3 looks like this:
select t.hash, t.title, t.artist,
       coalesce(x.votestotal, 1) as votestotal
from (
    select v.trackhash, sum(v.vote) as votestotal
    from vote v
    where v.type = '1'
      and v.createdat > now() - interval 1 week
      and v.vote = 'up'
    group by v.trackhash
    order by votestotal desc
) x
right join track t on t.hash = x.trackhash
order by votestotal desc
limit 0, 5;
The problem is that the first query takes about 2 seconds to perform, and I have less than 4k votes. By the end of the year I figure there will be around 200k votes, which will kill the database. I'm trying to work out how to solve this puzzle.

And I came down to these questions:

- Did I get the database design wrong? I mean, could it be better?
- Did I get the query wrong?
- Anything else to improve?

The first thing I did was caching. And OK, it solves the problem drastically, but I'm still curious about an SQL-related solution (always leaning towards perfection).

The second thing I had in mind was to put the calculated values into the meta table and alter them during the voting procedure. But I'm quite short on time to try that out. Is it worth going that way? Or, how do enterprise-class apps solve these problems?

Thanks.
EDIT: I can't believe I forgot to include the indices. Here they are:
mysql> show indexes in vote;
+-------+------------+-------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| table | non_unique | key_name                | seq_in_index | column_name | collation | cardinality | sub_part | packed | null | index_type | comment |
+-------+------------+-------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vote  |          0 | unique_userid_trackhash |            1 | userid      |           |         890 |     null | null   |      | btree      |         |
| vote  |          0 | unique_userid_trackhash |            2 | trackhash   |           |        4450 |     null | null   |      | btree      |         |
| vote  |          1 | index_trackhash         |            1 | trackhash   |           |        4450 |     null | null   |      | btree      |         |
| vote  |          1 | index_createdat         |            1 | createdat   |           |        1483 |     null | null   |      | btree      |         |
| vote  |          1 | userid                  |            1 | userid      |           |        1483 |     null | null   |      | btree      |         |
+-------+------------+-------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

mysql> show indexes in track;
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| table | non_unique | key_name       | seq_in_index | column_name | collation | cardinality | sub_part | packed | null | index_type | comment |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| track |          0 | primary        |            1 | hash        |           |        2678 |     null | null   |      | btree      |         |
| track |          1 | index_likes    |            1 | likes       |           |          66 |     null | null   |      | btree      |         |
| track |          1 | index_dislikes |            1 | dislikes    |           |          27 |     null | null   |      | btree      |         |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
This is a somewhat subjective question, because so much depends on your exact requirements, and on performance testing that nobody here can do on your data. I can answer your questions, though, and add some generic solutions that might work for you:
Did I get the database design wrong? I mean, could it be better?

No. It is an ideal design for OLTP.
Did I get the query wrong?

No (although the ORDER BY in the subqueries is redundant). The performance of the query is largely dependent on the indexes on the vote table, since the main columns are queried in this part:

select v.trackhash, sum(v.vote) as votestotal
from vote v
where v.createdat > now() - interval 1 month
  and v.vote = 'up'
group by v.trackhash
I would suggest 2 indexes: one on trackhash, and one on (createdat, vote, type) (this may perform better as 3 separate indexes, so it is worth testing both ways). 200k rows is not that much data, and with the right indexes it shouldn't take long to query the data for the last month.
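As a minimal sketch of that suggestion (the index name is illustrative; per the EDIT, trackhash is already covered by index_trackhash, so only the composite index needs adding):

```sql
-- Composite index so the subquery's WHERE clause (createdat range, vote = 'up')
-- can be served from the index. The name is illustrative, not prescribed.
alter table vote
    add index idx_createdat_vote_type (createdat, vote, type);
```

Running EXPLAIN on the subquery before and after should show whether MySQL picks this index over the existing single-column index_createdat.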
Anything else to improve?
This is very much a balancing act, and it depends on your exact requirements which is the best way to proceed. There are 3 main ways to approach the problem.
1. Current approach (query the vote table each time)

As mentioned before, I think this approach should be scalable enough for your application. The advantage is that it requires no maintenance, and the data sent to the application is up to date and accurate. The disadvantage is performance: it might take a bit longer to insert data (due to updating the indexes), and to select data. This would be my preferred approach.
2. OLAP approach

This would involve maintaining a summary table such as:

create table votearchive
(
    trackhash         char(40) not null,
    createddate       date not null,
    appmadeupvotes    int not null,
    appmadedownvotes  int not null,
    importedupvotes   int not null,
    importeddownvotes int not null,
    mergedupvotes     int not null,
    mergeddownvotes   int not null,
    primary key (createddate, trackhash)
);
This can be populated nightly by running a fairly simple query:

insert into votearchive
select trackhash,
       date(createdat),
       count(case when vote = 'up'   and type = 0 then 1 end),
       count(case when vote = 'down' and type = 0 then 1 end),
       count(case when vote = 'up'   and type = 1 then 1 end),
       count(case when vote = 'down' and type = 1 then 1 end),
       count(case when vote = 'up'   and type = 2 then 1 end),
       count(case when vote = 'down' and type = 2 then 1 end)
from vote
where createdat > date(current_timestamp)
group by trackhash, date(createdat);
You can then use this table in place of the live data. It has the advantage that, with the date being part of the clustered index, queries limited by date should be fast. The disadvantage is that if you query this table, the statistics are only accurate as of the last time it was populated; you do get much faster queries, though. It is also additional work to maintain. This would be my second choice, if you do not need to query live data.
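To illustrate, the top-100 query from the question could then be rewritten against the summary table (a sketch using the column names defined above; the per-type up-vote counts are simply summed back together):

```sql
select t.hash, t.title, t.artist,
       coalesce(x.votestotal, 0) + t.likesadjust as votesadjusted
from (
    -- Aggregate the pre-summarized daily counts instead of raw votes.
    select trackhash,
           sum(appmadeupvotes + importedupvotes + mergedupvotes) as votestotal
    from votearchive
    where createddate > curdate() - interval 1 month
    group by trackhash
) x
right join track t on t.hash = x.trackhash
order by votesadjusted desc
limit 0, 100;
```

Because the primary key leads with createddate, the date range scan in the subquery touches only the rows for the last month.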
3. Update statistics during voting

I am including this for completeness, but I would implore you not to use this method. It can be accomplished either in the application layer or via a trigger, and although it would allow you to query up-to-date data without hitting the "production" table, it is open to errors, and I have never come across anyone who advocates this approach. Every vote would need insert/update logic, which turns what should be a fast insert into a longer process, and depending on how the maintenance is done there is a chance (albeit a small one) of concurrency issues.
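Purely for illustration (not a recommendation), such a trigger might look like this sketch, assuming the likes/dislikes counter columns on track described in the question:

```sql
delimiter //

-- Keep the denormalized counters on track in sync on every new vote.
create trigger trg_vote_after_insert
after insert on vote
for each row
begin
    if new.vote = 'up' then
        update track set likes = likes + 1 where hash = new.trackhash;
    else
        update track set dislikes = dislikes + 1 where hash = new.trackhash;
    end if;
end //

delimiter ;
```

Note that handling vote deletion or changes, and the external-service counters, would each need further triggers, which is exactly the maintenance burden described above.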
4. A combination of the above

You could have 2 tables of the same format as the vote table, plus 1 table set out as in solution 2: one vote table storing today's votes, one storing historic votes, while still maintaining the summary table. You can then combine today's data with the summary table to get up-to-date results without querying a lot of data. Again, though, this is additional maintenance, and more potential things to go wrong.
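A sketch of such a combined query, assuming a hypothetical vote_today table with the same layout as vote, plus the votearchive table from solution 2:

```sql
select trackhash, sum(votestotal) as votestotal
from (
    -- Historic, pre-aggregated votes (everything up to last night's load).
    select trackhash,
           sum(appmadeupvotes + importedupvotes + mergedupvotes) as votestotal
    from votearchive
    where createddate > curdate() - interval 1 month
    group by trackhash

    union all

    -- Today's live votes, which the summary table does not yet contain.
    select trackhash, count(*) as votestotal
    from vote_today
    where vote = 'up'
    group by trackhash
) combined
group by trackhash
order by votestotal desc
limit 0, 100;
```

The outer GROUP BY merges the two sources, so each track's historic and same-day counts are added together before ranking.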
mysql sql sql-optimization