data structures - Which is the most efficient way to persist a string for retrieving spans of text?
I need a way to store one big text on disk without loading it entirely into memory.
My queries are in the form of spans of text, such as: give me the text between position x and position x + n, nothing more, nothing less. The text does not change frequently.
I probably need a "persistent" B-tree.
It needs DBMS features like:
- a client/server architecture
- a cache system

Thanks
"It needs DBMS features like: ..."
So, why not use a DBMS? Or a NoSQL solution with query capabilities, such as OrientDB?
I am thinking of something like this.
Split the text into chunks (chapters? paragraphs? fixed size?) and save them in a table with (at least) 3 fields:
- text (the chunk of text)
- begin (the offset of the chunk's start within the total text)
- end (the offset of the chunk's end within the total text)

Now you can write a query to extract the text between position x and position x+n.
SELECT text, begin, end FROM text_table WHERE end >= x AND begin <= (x+n) ORDER BY begin
Finally, you have to extract the text from the result rows like this:
- first row: substring(text, (x-begin))
- "inner" rows: text
- last row: substring(text, 0, (x+n-begin))
Obviously, you should take care of the "edge cases" (a result of only 1 or 2 rows, a requested span that is out of range, ...). I think this approach should solve your problem without much effort.
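
To make the idea concrete, here is a minimal sketch of the chunk-table approach using Python's built-in sqlite3 module. It is only an illustration under a few assumptions of mine: the function names store_text and read_span are made up, the chunks are fixed-size, the columns are called chunk_begin and chunk_end (to stay clear of SQL keywords), and chunk_end is exclusive, so the comparisons are strict instead of >= and <=. Slicing every row with max/min also covers the case where the first and last row are the same row.

```python
import sqlite3

CHUNK_SIZE = 8  # hypothetical chunk size, kept tiny for the demo


def store_text(conn, text, chunk_size=CHUNK_SIZE):
    """Split text into fixed-size chunks and save each with its offsets."""
    conn.execute(
        "CREATE TABLE text_table ("
        "chunk_begin INTEGER PRIMARY KEY, "  # offset of the chunk's first character
        "chunk_end INTEGER, "                # offset just past its last character
        "chunk TEXT)"
    )
    rows = [
        (i, min(i + chunk_size, len(text)), text[i:i + chunk_size])
        for i in range(0, len(text), chunk_size)
    ]
    conn.executemany("INSERT INTO text_table VALUES (?, ?, ?)", rows)
    conn.commit()


def read_span(conn, x, n):
    """Return the text between position x and x + n, reading only the
    chunks that overlap that span."""
    if x < 0 or n <= 0:
        return ""
    rows = conn.execute(
        "SELECT chunk, chunk_begin FROM text_table "
        "WHERE chunk_end > ? AND chunk_begin < ? "
        "ORDER BY chunk_begin",
        (x, x + n),
    ).fetchall()
    parts = []
    for chunk, begin in rows:
        lo = max(x - begin, 0)                # trims the first row
        hi = min(x + n - begin, len(chunk))   # trims the last row
        parts.append(chunk[lo:hi])            # inner rows are kept whole
    return "".join(parts)


conn = sqlite3.connect(":memory:")
store_text(conn, "The quick brown fox jumps over the lazy dog")
print(read_span(conn, 4, 11))  # quick brown
```

A span that starts past the end of the text simply matches no rows and gives back an empty string, and a span that runs past the end is cut at the last chunk. With a server-based DBMS the same table and query should carry over, though the column names and substring functions may need adapting.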
Hope it helps. Bye, Raf
text data-structures nosql bigdata storage-engines