Similarly to Most Blocks Accounts on Steem Blockchain I was going to have a page that lists the most popular accounts (most followed).
However, after analysis, I decide to pause it because the query to get the most popular accounts is very slow - it takes more than 20 minutes to retrieve results.
On the other hand, the query to get the most blocked accounts is fast - less than a minute - which makes it feasible to do so.
When a query takes ages, the SQLite suffers a known database lock problem. SQLite is not very good at concurrency - when a thread is reading all rows, all other write operations (such as insertion) will be locked:
database is locked
Traceback (most recent call last):
File "test.py", line 135, in <module>
con.commit()
sqlite3.OperationalError: database is locked
Almost Same query - but with different parameters. For 'most blocked' the filtering is where what = 'ignore'
and for 'most popular' the filtering is where what = 'blog'
This is the table schema and indices:
sqlite> .schema
CREATE TABLE mute (
who text,
whom text,
what text,
time text,
block integer,
constraint pkey primary key (who, whom)
);
CREATE INDEX index_from on mute(who);
CREATE INDEX index_to on mute(whom);
CREATE INDEX index_time on mute(time);
CREATE INDEX index_what on mute(what);
CREATE INDEX index_block on mute(block);
And if we run the query to see the number of the candidate rows:
sqlite> select count(rowid) from mute where what='ignore';
332096
sqlite> select count(rowid) from mute where what='blog';
100757608
It is a huge difference! Yes, for both queries, SQLite will utilize the index what
and process the result set by grouping - the more rows - the more time it requires!
This can be confirmed by the explain
statement which takes you how the SQLite will process your query.
sqlite> explain select whom,count(1) from mute where what='blog' group by whom order by count(1) desc limit 100;
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 57 0 00 Start at 57
1 OpenEphemeral 1 4 0 k(1,-B) 00 nColumn=4
2 Integer 100 1 0 00 r[1]=100; LIMIT counter
3 SorterOpen 2 1 0 k(1,B) 00
4 Integer 0 5 0 00 r[5]=0; clear abort flag
5 Integer 0 4 0 00 r[4]=0; indicate accumulator empty
6 Null 0 8 8 00 r[8..8]=NULL
7 Gosub 7 49 0 00
8 OpenRead 0 2 0 3 00 root=2 iDb=0; mute
9 OpenRead 3 7 0 k(2,,) 02 root=7 iDb=0; index_what
10 String8 0 10 0 blog 00 r[10]='blog'
11 SeekGE 3 18 10 1 00 key=r[10]
12 IdxGT 3 18 10 1 00 key=r[10]
13 DeferredSeek 3 0 0 00 Move 0 to 3.rowid if needed
14 Column 0 1 11 00 r[11]=mute.whom
15 MakeRecord 11 1 12 00 r[12]=mkrec(r[11])
16 SorterInsert 2 12 0 00 key=r[12]
17 Next 3 12 1 00
18 OpenPseudo 4 11 1 00 1 columns in r[11]
19 SorterSort 2 51 0 00 GROUP BY sort
20 SorterData 2 11 4 00 r[11]=data
21 Column 4 0 9 00 r[9]=
22 Compare 8 9 1 k(1,B) 00 r[8] <-> r[9]
23 Jump 24 28 24 00
24 Move 9 8 1 00 r[8]=r[9]
25 Gosub 6 37 0 00 output one row
26 IfPos 5 51 0 00 if r[5]>0 then r[5]-=0, goto 51; check abort flag
27 Gosub 7 49 0 00 reset accumulator
28 Integer 1 12 0 00 r[12]=1
29 AggStep0 0 12 3 count(1) 01 accum=r[3] step(r[12])
30 Column 4 0 2 00 r[2]=
31 Integer 1 4 0 00 r[4]=1; indicate data in accumulator
32 SorterNext 2 20 0 00
33 Gosub 6 37 0 00 output final row
34 Goto 0 51 0 00
35 Integer 1 5 0 00 r[5]=1; set abort flag
36 Return 6 0 0 00
37 IfPos 4 39 0 00 if r[4]>0 then r[4]-=0, goto 39; Groupby result generator entry point
38 Return 6 0 0 00
39 AggFinal 3 1 0 count(1) 00 accum=r[3] N=1
40 Copy 2 15 0 00 r[15]=r[2]
41 Copy 3 13 0 00 r[13]=r[3]
42 Sequence 1 14 0 00 r[14]=cursor[1].ctr++
43 MakeRecord 13 3 17 00 r[17]=mkrec(r[13..15])
44 IdxInsert 1 17 13 3 00 key=r[17]
45 IfNotZero 1 48 0 00 if r[1]!=0 then r[1]--, goto 48
46 Last 1 0 0 00
47 Delete 1 0 0 00
48 Return 6 0 0 00 end groupby result generator
49 Null 0 2 3 00 r[2..3]=NULL
50 Return 7 0 0 00
51 Sort 1 56 0 00
52 Column 1 2 15 00 r[15]=whom
53 Column 1 0 16 00 r[16]=count(1)
54 ResultRow 15 2 0 00 output=r[15..16]
55 Next 1 52 0 00
56 Halt 0 0 0 00
57 Transaction 0 0 20 0 01 usesStmtJournal=0
58 Goto 0 1 0 00
SQLite is a relational database - which makes it inefficient in handling a large dataset. The steem blockchain has been running for years - the data accumulates quickly.
Every little helps! I hope this helps!
Steem On!~
Reposted to Blog
If you like my work, please consider voting for me, thanks!
https://steemit.com/~witnesses type in justyy and click VOTE
Alternatively, you could proxy to me if you are too lazy to vote!
Also: you can vote me at the tool I made: https://steemyy.com/witness-voting/?witness=justyy
A few things that come to my mind:
I'm pretty sure that this is not related to SQLite's performance :)
This might also help: https://www.sqlitetutorial.net/sqlite-having/
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
Great suggestions!
yes, the query has a
limit 100
.the
type
field is indexed anyway .Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
Hello, a indian man by the name of Rupender Sharma IT specialist at TELUS in Abbotsford, BC Canada ripped me off.
I need your help if you can?
On june 4th and 10th - 175 ETH Stolen, 29,093 TUSD Stolen, 2402 LINK Stolen.
Do you know anyone that can trace the addresses the stolen tokens were sent to?
I will reward you $10k in Tokens or Fiat of choice for the full recovery.
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit