sql - MySQL query optimization: how to optimize voting calculations?
Hope you're doing fine. I need a bit of help with my database.

The database stores votes. Users pick sound tracks they like and vote on them. A vote can be 'up' or 'down'. Easy as pie. But when it comes to calculating stats, things get hairy.

meta - a key-value styled table that stores commonly used stats (just a sort of caching):
mysql> select * from meta;
+-------------+-------+
| key         | value |
+-------------+-------+
| track_count | 2620  |
| vote_count  | 3821  |
| user_count  | 371   |
+-------------+-------+
vote - the vote table holds the votes themselves. The interesting field here is type, whose value means:

0 - app-made vote: the user voted for the track using the UI
1 - imported vote (from an external service)
2 - merged vote: the same imported vote, but it marks that the user who voted for the track using the external service is now repeating himself using the app

track - the track table holds the total stats for the track itself: the amount of likes, dislikes, likes from the external service (likesrp), dislikes from the external service (dislikesrp), and likes/dislikes adjustments.
The app requires vote stats for:

- the 5 most up-voted tracks during the last 7 days
- the 5 most down-voted tracks during the last 7 days
- the 5 most up-voted tracks during the last 7 days, counting only votes imported from the external service (vote.type = 1)
- the 100 most up-voted tracks during the last month

To get the 100 most up-voted tracks I use this query:
select t.hash, t.title, t.artist,
       coalesce(x.votestotal, 0) + t.likesadjust as votesadjusted
from (
    select v.trackhash, sum(v.vote) as votestotal
    from vote v
    where v.createdat > now() - interval 1 month
      and v.vote = 'up'
    group by v.trackhash
    order by votestotal desc
) x
right join track t on t.hash = x.trackhash
order by votesadjusted desc
limit 0, 100;
This query works OK and honors the adjustments (the client wanted to be able to adjust a track's position in the lists). The same query is used for the 5 most up/down-voted tracks. And the query for task #3 looks like this:
select t.hash, t.title, t.artist,
       coalesce(x.votestotal, 1) as votestotal
from (
    select v.trackhash, sum(v.vote) as votestotal
    from vote v
    where v.type = '1'
      and v.createdat > now() - interval 1 week
      and v.vote = 'up'
    group by v.trackhash
    order by votestotal desc
) x
right join track t on t.hash = x.trackhash
order by votestotal desc
limit 0, 5;
The problem is that the first query takes about 2 seconds to perform, and I have less than 4k votes. By the end of the year I figure there will be around 200k votes, which will kill the database. I'm trying to work out how to solve this puzzle.

And I came down to these questions:

- Did I get the database design wrong? I mean, could it be better?
- Did I get the query wrong?
- Anything else to improve?

The first thing I did was caching. And OK, it solves the problem drastically, but I'm still curious about an SQL-related solution (always leaning towards perfection).

The second thing I had in mind was to put the calculated values into the meta table and alter them during the voting procedure. But I'm quite short on time to try that out. Is it worth going that way? Or, how do enterprise-class apps solve these problems?

Thanks.
EDIT: I can't believe I forgot to include the indices. Here they are:
mysql> show indexes in vote;
+-------+------------+-------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| table | non_unique | key_name                | seq_in_index | column_name | collation | cardinality | sub_part | packed | null | index_type | comment |
+-------+------------+-------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| vote  |          0 | unique_userid_trackhash |            1 | userid      |           |         890 |     null | null   |      | btree      |         |
| vote  |          0 | unique_userid_trackhash |            2 | trackhash   |           |        4450 |     null | null   |      | btree      |         |
| vote  |          1 | index_trackhash         |            1 | trackhash   |           |        4450 |     null | null   |      | btree      |         |
| vote  |          1 | index_createdat         |            1 | createdat   |           |        1483 |     null | null   |      | btree      |         |
| vote  |          1 | userid                  |            1 | userid      |           |        1483 |     null | null   |      | btree      |         |
+-------+------------+-------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

mysql> show indexes in track;
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| table | non_unique | key_name       | seq_in_index | column_name | collation | cardinality | sub_part | packed | null | index_type | comment |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| track |          0 | primary        |            1 | hash        |           |        2678 |     null | null   |      | btree      |         |
| track |          1 | index_likes    |            1 | likes       |           |          66 |     null | null   |      | btree      |         |
| track |          1 | index_dislikes |            1 | dislikes    |           |          27 |     null | null   |      | btree      |         |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
This is a somewhat subjective question, because so much depends on your exact requirements, and on performance testing that nobody here can do on your data. I can answer your questions, though, and add some generic solutions that might work for you:
Did I get the database design wrong? I mean, could it be better?

No. It is an ideal design for OLTP.
Did I get the query wrong?

No (although the ORDER BY in the subqueries is redundant). The performance of the query is largely dependent on the indexes on the vote table, since the main columns are queried in this part:

select v.trackhash, sum(v.vote) as votestotal
from vote v
where v.createdat > now() - interval 1 month
  and v.vote = 'up'
group by v.trackhash
I would suggest 2 indexes: one on trackhash, and one on (createdat, vote, type) (this may perform better as 3 separate indexes, so it is worth testing both ways). 200k rows is not that much data, and with the right indexes it shouldn't take long to query the data for the last month.
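As a minimal sketch of that suggestion (the index name is illustrative; per the EDIT, trackhash is already covered by index_trackhash, so only the composite index needs adding):

```sql
-- Composite index so the subquery's WHERE clause (createdat range, vote = 'up')
-- can be served from the index. The name is illustrative, not prescribed.
alter table vote
    add index idx_createdat_vote_type (createdat, vote, type);
```

Running EXPLAIN on the subquery before and after should show whether MySQL picks this index over the existing single-column index_createdat.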
Anything else to improve?
This is very much a balancing act, and it depends on your exact requirements which is the best way to proceed. There are 3 main ways to approach the problem.
1. Current approach (query the vote table each time)

As mentioned before, I think this approach should be scalable enough for your application. The advantage is that it requires no maintenance, and the data sent to the application is up to date and accurate. The disadvantage is performance: it might take a bit longer to insert data (due to updating the indexes), and to select data. This would be my preferred approach.
2. OLAP approach

This would involve maintaining a summary table such as:

create table votearchive
(
    trackhash         char(40) not null,
    createddate       date not null,
    appmadeupvotes    int not null,
    appmadedownvotes  int not null,
    importedupvotes   int not null,
    importeddownvotes int not null,
    mergedupvotes     int not null,
    mergeddownvotes   int not null,
    primary key (createddate, trackhash)
);
This can be populated nightly by running a fairly simple query:

insert into votearchive
select trackhash,
       date(createdat),
       count(case when vote = 'up'   and type = 0 then 1 end),
       count(case when vote = 'down' and type = 0 then 1 end),
       count(case when vote = 'up'   and type = 1 then 1 end),
       count(case when vote = 'down' and type = 1 then 1 end),
       count(case when vote = 'up'   and type = 2 then 1 end),
       count(case when vote = 'down' and type = 2 then 1 end)
from vote
where createdat > date(current_timestamp)
group by trackhash, date(createdat);
You can then use this table in place of the live data. It has the advantage that, with the date being part of the clustered index, queries limited by date should be fast. The disadvantage is that if you query this table, the statistics are only accurate as of the last time it was populated; you do get much faster queries, though. It is also additional work to maintain. This would be my second choice, if you do not need to query live data.
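To illustrate, the top-100 query from the question could then be rewritten against the summary table (a sketch using the column names defined above; the per-type up-vote counts are simply summed back together):

```sql
select t.hash, t.title, t.artist,
       coalesce(x.votestotal, 0) + t.likesadjust as votesadjusted
from (
    -- Aggregate the pre-summarized daily counts instead of raw votes.
    select trackhash,
           sum(appmadeupvotes + importedupvotes + mergedupvotes) as votestotal
    from votearchive
    where createddate > curdate() - interval 1 month
    group by trackhash
) x
right join track t on t.hash = x.trackhash
order by votesadjusted desc
limit 0, 100;
```

Because the primary key leads with createddate, the date range scan in the subquery touches only the rows for the last month.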
3. Update statistics during voting

I am including this for completeness, but I would implore you not to use this method. It can be accomplished either in the application layer or via a trigger, and although it would allow you to query up-to-date data without hitting the "production" table, it is open to errors, and I have never come across anyone who advocates this approach. Every vote would need insert/update logic, which turns what should be a fast insert into a longer process, and depending on how the maintenance is done there is a chance (albeit a small one) of concurrency issues.
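Purely for illustration (not a recommendation), such a trigger might look like this sketch, assuming the likes/dislikes counter columns on track described in the question:

```sql
delimiter //

-- Keep the denormalized counters on track in sync on every new vote.
create trigger trg_vote_after_insert
after insert on vote
for each row
begin
    if new.vote = 'up' then
        update track set likes = likes + 1 where hash = new.trackhash;
    else
        update track set dislikes = dislikes + 1 where hash = new.trackhash;
    end if;
end //

delimiter ;
```

Note that handling vote deletion or changes, and the external-service counters, would each need further triggers, which is exactly the maintenance burden described above.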
4. A combination of the above

You could have 2 tables of the same format as the vote table, plus 1 table set out as in solution 2: one vote table storing today's votes, one storing historic votes, while still maintaining the summary table. You can then combine today's data with the summary table to get up-to-date results without querying a lot of data. Again, though, this is additional maintenance, and more potential things to go wrong.
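A sketch of such a combined query, assuming a hypothetical vote_today table with the same layout as vote, plus the votearchive table from solution 2:

```sql
select trackhash, sum(votestotal) as votestotal
from (
    -- Historic, pre-aggregated votes (everything up to last night's load).
    select trackhash,
           sum(appmadeupvotes + importedupvotes + mergedupvotes) as votestotal
    from votearchive
    where createddate > curdate() - interval 1 month
    group by trackhash

    union all

    -- Today's live votes, which the summary table does not yet contain.
    select trackhash, count(*) as votestotal
    from vote_today
    where vote = 'up'
    group by trackhash
) combined
group by trackhash
order by votestotal desc
limit 0, 100;
```

The outer GROUP BY merges the two sources, so each track's historic and same-day counts are added together before ranking.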
mysql sql sql-optimization