aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/libraries/sqlite/unix/sqlite-3.5.1/www/optimizer.tcl
diff options
context:
space:
mode:
authordan miller2007-10-20 02:49:29 +0000
committerdan miller2007-10-20 02:49:29 +0000
commite36d23a85ebff914d74bb541558c2b6082b78edb (patch)
tree54b58fdf162e78af64055282a6035c8d2443389d /libraries/sqlite/unix/sqlite-3.5.1/www/optimizer.tcl
parent* Fixed an issue whereby avatar chat distances were being calculated against ... (diff)
downloadopensim-SC-e36d23a85ebff914d74bb541558c2b6082b78edb.zip
opensim-SC-e36d23a85ebff914d74bb541558c2b6082b78edb.tar.gz
opensim-SC-e36d23a85ebff914d74bb541558c2b6082b78edb.tar.bz2
opensim-SC-e36d23a85ebff914d74bb541558c2b6082b78edb.tar.xz
sqlite source (unix build) added to libraries
Diffstat (limited to 'libraries/sqlite/unix/sqlite-3.5.1/www/optimizer.tcl')
-rw-r--r--libraries/sqlite/unix/sqlite-3.5.1/www/optimizer.tcl265
1 files changed, 265 insertions, 0 deletions
diff --git a/libraries/sqlite/unix/sqlite-3.5.1/www/optimizer.tcl b/libraries/sqlite/unix/sqlite-3.5.1/www/optimizer.tcl
new file mode 100644
index 0000000..5b2897e
--- /dev/null
+++ b/libraries/sqlite/unix/sqlite-3.5.1/www/optimizer.tcl
@@ -0,0 +1,265 @@
1#
2# Run this TCL script to generate HTML for the goals.html file.
3#
4set rcsid {$Id: optimizer.tcl,v 1.1 2005/08/30 22:44:06 drh Exp $}
5source common.tcl
6header {The SQLite Query Optimizer}
7
8proc CODE {text} {
9 puts "<blockquote><pre>"
10 puts $text
11 puts "</pre></blockquote>"
12}
13proc IMAGE {name {caption {}}} {
14 puts "<center><img src=\"$name\">"
15 if {$caption!=""} {
16 puts "<br>$caption"
17 }
18 puts "</center>"
19}
20proc PARAGRAPH {text} {
21 puts "<p>$text</p>\n"
22}
23proc HEADING {level name} {
24 puts "<h$level>$name</h$level>"
25}
26
27HEADING 1 {The SQLite Query Optimizer}
28
29PARAGRAPH {
30 This article describes how the SQLite query optimizer works.
31 This is not something you have to know in order to use SQLite - many
32 programmers use SQLite successfully without the slightest hint of what
33 goes on in the inside.
34 But a basic understanding of what SQLite is doing
35 behind the scenes will help you to write more efficient SQL. And the
36 knowledge gained by studying the SQLite query optimizer has broad
37 application since most other relational database engines operate
38 similarly.
39 A solid understanding of how the query optimizer works is also
40 required before making meaningful changes or additions to the SQLite, so
41 this article should be read closely by anyone aspiring
42 to hack the source code.
43}
44
45HEADING 2 Background
46
47PARAGRAPH {
48 It is important to understand that SQL is a programming language.
49 SQL is a perculiar programming language in that it
50 describes <u>what</u> the programmer wants to compute not <u>how</u>
51 to compute it as most other programming languages do.
52 But perculiar or not, SQL is still just a programming language.
53}
54
55PARAGRAPH {
56 It is very helpful to think of each SQL statement as a separate
57 program.
58 An important job of the SQL database engine is to translate each
59 SQL statement from its descriptive form that specifies what the
60 information is desired (the <u>what</u>)
61 into a procedural form that specifies how to go
62 about acquiring the desired information (the <u>how</u>).
63 The task of translating the <u>what</u> into a
64 <u>how</u> is assigned to the query optimizer.
65}
66
67PARAGRAPH {
68 The beauty of SQL comes from the fact that the optimizer frees the programmer
69 from having to worry over the details of <u>how</u>. The programmer
70 only has to specify the <u>what</u> and then leave the optimizer
71 to deal with all of the minutae of implementing the
72 <u>how</u>. Thus the programmer is able to think and work at a
73 much higher level and leave the optimizer to stress over the low-level
74 work.
75}
76
77HEADING 2 {Database Layout}
78
79PARAGRAPH {
80 An SQLite database consists of one or more "b-trees".
81 Each b-tree contains zero or more "rows".
82 A single row contains a "key" and some "data".
83 In general, both the key and the data are arbitrary binary
84 data of any length.
85 The keys must all be unique within a single b-tree.
86 Rows are stored in order of increasing key values - each
87 b-tree has a comparision functions for keys that determines
88 this order.
89}
90
91PARAGRAPH {
92 In SQLite, each SQL table is stored as a b-tree where the
93 key is a 64-bit integer and the data is the content of the
94 table row. The 64-bit integer key is the ROWID. And, of course,
95 if the table has an INTEGER PRIMARY KEY, then that integer is just
96 an alias for the ROWID.
97}
98
99PARAGRAPH {
100 Consider the following block of SQL code:
101}
102
103CODE {
104 CREATE TABLE ex1(
105 id INTEGER PRIMARY KEY,
106 x VARCHAR(30),
107 y INTEGER
108 );
109 INSERT INTO ex1 VALUES(NULL,'abc',12345);
110 INSERT INTO ex1 VALUES(NULL,456,'def');
111 INSERT INTO ex1 VALUES(100,'hello','world');
112 INSERT INTO ex1 VALUES(-5,'abc','xyz');
113 INSERT INTO ex1 VALUES(54321,NULL,987);
114}
115
116PARAGRAPH {
117 This code generates a new b-tree (named "ex1") containing 5 rows.
118 This table can be visualized as follows:
119}
120IMAGE table-ex1b2.gif
121
122PARAGRAPH {
123 Note that the key for each row if the b-tree is the INTEGER PRIMARY KEY
124 for that row. (Remember that the INTEGER PRIMARY KEY is just an alias
125 for the ROWID.) The other fields of the table form the data for each
126 entry in the b-tree. Note also that the b-tree entries are in ROWID order
127 which is different from the order that they were originally inserted.
128}
129
130PARAGRAPH {
131 Now consider the following SQL query:
132}
133CODE {
134 SELECT y FROM ex1 WHERE x=456;
135}
136
137PARAGRAPH {
138 When the SQLite parser and query optimizer are handed this query, they
139 have to translate it into a procedure that will find the desired result.
140 In this case, they do what is call a "full table scan". They start
141 at the beginning of the b-tree that contains the table and visit each
142 row. Within each row, the value of the "x" column is tested and when it
143 is found to match 456, the value of the "y" column is output.
144 We can represent this procedure graphically as follows:
145}
146IMAGE fullscanb.gif
147
148PARAGRAPH {
149 A full table scan is the access method of last resort. It will always
150 work. But if the table contains millions of rows and you are only looking
151 a single one, it might take a very long time to find the particular row
152 you are interested in.
153 In particular, the time needed to access a single row of the table is
154 proportional to the total number of rows in the table.
155 So a big part of the job of the optimizer is to try to find ways to
156 satisfy the query without doing a full table scan.
157}
158PARAGRAPH {
159 The usual way to avoid doing a full table scan is use a binary search
160 to find the particular row or rows of interest in the table.
161 Consider the next query which searches on rowid instead of x:
162}
163CODE {
164 SELECT y FROM ex1 WHERE rowid=2;
165}
166
167PARAGRAPH {
168 In the previous query, we could not use a binary search for x because
169 the values of x were not ordered. But the rowid values are ordered.
170 So instead of having to visit every row of the b-tree looking for one
171 that has a rowid value of 2, we can do a binary search for that particular
172 row and output its corresponding y value. We show this graphically
173 as follows:
174}
175IMAGE direct1b.gif
176
177PARAGRAPH {
178 When doing a binary search, we only have to look at a number of
179 rows with is proportional to the logorithm of the number of entries
180 in the table. For a table with just 5 entires as in the example above,
181 the difference between a full table scan and a binary search is
182 negligible. In fact, the full table scan might be faster. But in
183 a database that has 5 million rows, a binary search will be able to
184 find the desired row in only about 23 tries, whereas the full table
185 scan will need to look at all 5 million rows. So the binary search
186 is about 200,000 times faster in that case.
187}
188PARAGRAPH {
189 A 200,000-fold speed improvement is huge. So we always want to do
190 a binary search rather than a full table scan when we can.
191}
192PARAGRAPH {
193 The problem with a binary search is that the it only works if the
194 fields you are search for are in sorted order. So we can do a binary
195 search when looking up the rowid because the rows of the table are
196 sorted by rowid. But we cannot use a binary search when looking up
197 x because the values in the x column are in no particular order.
198}
199PARAGRAPH {
200 The way to work around this problem and to permit binary searching on
201 fields like x is to provide an index.
202 An index is another b-tree.
203 But in the index b-tree the key is not the rowid but rather the field
204 or fields being indexed followed by the rowid.
205 The data in an index b-tree is empty - it is not needed or used.
206 The following diagram shows an index on the x field of our example table:
207}
208IMAGE index-ex1-x-b.gif
209
210PARAGRAPH {
211 An important point to note in the index are that they keys of the
212 b-tree are in sorted order. (Recall that NULL values in SQLite sort
213 first, followed by numeric values in numerical order, then strings, and
214 finally BLOBs.) This is the property that will allow use to do a
215 binary search for the field x. The rowid is also included in every
216 key for two reasons. First, by including the rowid we guarantee that
217 every key will be unique. And second, the rowid will be used to look
218 up the actual table entry after doing the binary search. Finally, note
219 that the data portion of the index b-tree serves no purpose and is thus
220 kept empty to save space in the disk file.
221}
222PARAGRAPH {
223 Remember what the original query example looked like:
224}
225CODE {
226 SELECT y FROM ex1 WHERE x=456;
227}
228
229PARAGRAPH {
230 The first time this query was encountered we had to do a full table
231 scan. But now that we have an index on x, we can do a binary search
232 on that index for the entry where x==456. Then from that entry we
233 can find the rowid value and use the rowid to look up the corresponding
234 entry in the original table. From the entry in the original table,
235 we can find the value y and return it as our result. The following
236 diagram shows this process graphically:
237}
238IMAGE indirect1b1.gif
239
240PARAGRAPH {
241 With the index, we are able to look up an entry based on the value of
242 x after visiting only a logorithmic number of b-tree entries. Unlike
243 the case where we were searching using rowid, we have to do two binary
244 searches for each output row. But for a 5-million row table, that is
245 still only 46 searches instead of 5 million for a 100,000-fold speedup.
246}
247
248HEADING 3 {Parsing The WHERE Clause}
249
250
251
252# parsing the where clause
253# rowid lookup
254# index lookup
255# index lookup without the table
256# how an index is chosen
257# joins
258# join reordering
259# order by using an index
260# group by using an index
261# OR -> IN optimization
262# Bitmap indices
263# LIKE and GLOB optimization
264# subquery flattening
265# MIN and MAX optimizations