diff options
author | dan miller | 2007-10-20 02:49:29 +0000 |
---|---|---|
committer | dan miller | 2007-10-20 02:49:29 +0000 |
commit | e36d23a85ebff914d74bb541558c2b6082b78edb (patch) | |
tree | 54b58fdf162e78af64055282a6035c8d2443389d /libraries/sqlite/unix/sqlite-3.5.1/www/optimizer.tcl | |
parent | * Fixed an issue whereby avatar chat distances were being calculated against ... (diff) | |
download | opensim-SC-e36d23a85ebff914d74bb541558c2b6082b78edb.zip opensim-SC-e36d23a85ebff914d74bb541558c2b6082b78edb.tar.gz opensim-SC-e36d23a85ebff914d74bb541558c2b6082b78edb.tar.bz2 opensim-SC-e36d23a85ebff914d74bb541558c2b6082b78edb.tar.xz |
sqlite source (unix build) added to libraries
Diffstat (limited to 'libraries/sqlite/unix/sqlite-3.5.1/www/optimizer.tcl')
-rw-r--r-- | libraries/sqlite/unix/sqlite-3.5.1/www/optimizer.tcl | 265 |
1 files changed, 265 insertions, 0 deletions
diff --git a/libraries/sqlite/unix/sqlite-3.5.1/www/optimizer.tcl b/libraries/sqlite/unix/sqlite-3.5.1/www/optimizer.tcl new file mode 100644 index 0000000..5b2897e --- /dev/null +++ b/libraries/sqlite/unix/sqlite-3.5.1/www/optimizer.tcl | |||
@@ -0,0 +1,265 @@ | |||
1 | # | ||
2 | # Run this TCL script to generate HTML for the goals.html file. | ||
3 | # | ||
4 | set rcsid {$Id: optimizer.tcl,v 1.1 2005/08/30 22:44:06 drh Exp $} | ||
5 | source common.tcl | ||
6 | header {The SQLite Query Optimizer} | ||
7 | |||
8 | proc CODE {text} { | ||
9 | puts "<blockquote><pre>" | ||
10 | puts $text | ||
11 | puts "</pre></blockquote>" | ||
12 | } | ||
13 | proc IMAGE {name {caption {}}} { | ||
14 | puts "<center><img src=\"$name\">" | ||
15 | if {$caption!=""} { | ||
16 | puts "<br>$caption" | ||
17 | } | ||
18 | puts "</center>" | ||
19 | } | ||
20 | proc PARAGRAPH {text} { | ||
21 | puts "<p>$text</p>\n" | ||
22 | } | ||
23 | proc HEADING {level name} { | ||
24 | puts "<h$level>$name</h$level>" | ||
25 | } | ||
26 | |||
27 | HEADING 1 {The SQLite Query Optimizer} | ||
28 | |||
29 | PARAGRAPH { | ||
30 | This article describes how the SQLite query optimizer works. | ||
31 | This is not something you have to know in order to use SQLite - many | ||
32 | programmers use SQLite successfully without the slightest hint of what | ||
33 | goes on in the inside. | ||
34 | But a basic understanding of what SQLite is doing | ||
35 | behind the scenes will help you to write more efficient SQL. And the | ||
36 | knowledge gained by studying the SQLite query optimizer has broad | ||
37 | application since most other relational database engines operate | ||
38 | similarly. | ||
39 | A solid understanding of how the query optimizer works is also | ||
40 | required before making meaningful changes or additions to the SQLite, so | ||
41 | this article should be read closely by anyone aspiring | ||
42 | to hack the source code. | ||
43 | } | ||
44 | |||
45 | HEADING 2 Background | ||
46 | |||
47 | PARAGRAPH { | ||
48 | It is important to understand that SQL is a programming language. | ||
49 | SQL is a perculiar programming language in that it | ||
50 | describes <u>what</u> the programmer wants to compute not <u>how</u> | ||
51 | to compute it as most other programming languages do. | ||
52 | But perculiar or not, SQL is still just a programming language. | ||
53 | } | ||
54 | |||
55 | PARAGRAPH { | ||
56 | It is very helpful to think of each SQL statement as a separate | ||
57 | program. | ||
58 | An important job of the SQL database engine is to translate each | ||
59 | SQL statement from its descriptive form that specifies what the | ||
60 | information is desired (the <u>what</u>) | ||
61 | into a procedural form that specifies how to go | ||
62 | about acquiring the desired information (the <u>how</u>). | ||
63 | The task of translating the <u>what</u> into a | ||
64 | <u>how</u> is assigned to the query optimizer. | ||
65 | } | ||
66 | |||
67 | PARAGRAPH { | ||
68 | The beauty of SQL comes from the fact that the optimizer frees the programmer | ||
69 | from having to worry over the details of <u>how</u>. The programmer | ||
70 | only has to specify the <u>what</u> and then leave the optimizer | ||
71 | to deal with all of the minutae of implementing the | ||
72 | <u>how</u>. Thus the programmer is able to think and work at a | ||
73 | much higher level and leave the optimizer to stress over the low-level | ||
74 | work. | ||
75 | } | ||
76 | |||
77 | HEADING 2 {Database Layout} | ||
78 | |||
79 | PARAGRAPH { | ||
80 | An SQLite database consists of one or more "b-trees". | ||
81 | Each b-tree contains zero or more "rows". | ||
82 | A single row contains a "key" and some "data". | ||
83 | In general, both the key and the data are arbitrary binary | ||
84 | data of any length. | ||
85 | The keys must all be unique within a single b-tree. | ||
86 | Rows are stored in order of increasing key values - each | ||
87 | b-tree has a comparision functions for keys that determines | ||
88 | this order. | ||
89 | } | ||
90 | |||
91 | PARAGRAPH { | ||
92 | In SQLite, each SQL table is stored as a b-tree where the | ||
93 | key is a 64-bit integer and the data is the content of the | ||
94 | table row. The 64-bit integer key is the ROWID. And, of course, | ||
95 | if the table has an INTEGER PRIMARY KEY, then that integer is just | ||
96 | an alias for the ROWID. | ||
97 | } | ||
98 | |||
99 | PARAGRAPH { | ||
100 | Consider the following block of SQL code: | ||
101 | } | ||
102 | |||
103 | CODE { | ||
104 | CREATE TABLE ex1( | ||
105 | id INTEGER PRIMARY KEY, | ||
106 | x VARCHAR(30), | ||
107 | y INTEGER | ||
108 | ); | ||
109 | INSERT INTO ex1 VALUES(NULL,'abc',12345); | ||
110 | INSERT INTO ex1 VALUES(NULL,456,'def'); | ||
111 | INSERT INTO ex1 VALUES(100,'hello','world'); | ||
112 | INSERT INTO ex1 VALUES(-5,'abc','xyz'); | ||
113 | INSERT INTO ex1 VALUES(54321,NULL,987); | ||
114 | } | ||
115 | |||
116 | PARAGRAPH { | ||
117 | This code generates a new b-tree (named "ex1") containing 5 rows. | ||
118 | This table can be visualized as follows: | ||
119 | } | ||
120 | IMAGE table-ex1b2.gif | ||
121 | |||
122 | PARAGRAPH { | ||
123 | Note that the key for each row if the b-tree is the INTEGER PRIMARY KEY | ||
124 | for that row. (Remember that the INTEGER PRIMARY KEY is just an alias | ||
125 | for the ROWID.) The other fields of the table form the data for each | ||
126 | entry in the b-tree. Note also that the b-tree entries are in ROWID order | ||
127 | which is different from the order that they were originally inserted. | ||
128 | } | ||
129 | |||
130 | PARAGRAPH { | ||
131 | Now consider the following SQL query: | ||
132 | } | ||
133 | CODE { | ||
134 | SELECT y FROM ex1 WHERE x=456; | ||
135 | } | ||
136 | |||
137 | PARAGRAPH { | ||
138 | When the SQLite parser and query optimizer are handed this query, they | ||
139 | have to translate it into a procedure that will find the desired result. | ||
140 | In this case, they do what is call a "full table scan". They start | ||
141 | at the beginning of the b-tree that contains the table and visit each | ||
142 | row. Within each row, the value of the "x" column is tested and when it | ||
143 | is found to match 456, the value of the "y" column is output. | ||
144 | We can represent this procedure graphically as follows: | ||
145 | } | ||
146 | IMAGE fullscanb.gif | ||
147 | |||
148 | PARAGRAPH { | ||
149 | A full table scan is the access method of last resort. It will always | ||
150 | work. But if the table contains millions of rows and you are only looking | ||
151 | a single one, it might take a very long time to find the particular row | ||
152 | you are interested in. | ||
153 | In particular, the time needed to access a single row of the table is | ||
154 | proportional to the total number of rows in the table. | ||
155 | So a big part of the job of the optimizer is to try to find ways to | ||
156 | satisfy the query without doing a full table scan. | ||
157 | } | ||
158 | PARAGRAPH { | ||
159 | The usual way to avoid doing a full table scan is use a binary search | ||
160 | to find the particular row or rows of interest in the table. | ||
161 | Consider the next query which searches on rowid instead of x: | ||
162 | } | ||
163 | CODE { | ||
164 | SELECT y FROM ex1 WHERE rowid=2; | ||
165 | } | ||
166 | |||
167 | PARAGRAPH { | ||
168 | In the previous query, we could not use a binary search for x because | ||
169 | the values of x were not ordered. But the rowid values are ordered. | ||
170 | So instead of having to visit every row of the b-tree looking for one | ||
171 | that has a rowid value of 2, we can do a binary search for that particular | ||
172 | row and output its corresponding y value. We show this graphically | ||
173 | as follows: | ||
174 | } | ||
175 | IMAGE direct1b.gif | ||
176 | |||
177 | PARAGRAPH { | ||
178 | When doing a binary search, we only have to look at a number of | ||
179 | rows with is proportional to the logorithm of the number of entries | ||
180 | in the table. For a table with just 5 entires as in the example above, | ||
181 | the difference between a full table scan and a binary search is | ||
182 | negligible. In fact, the full table scan might be faster. But in | ||
183 | a database that has 5 million rows, a binary search will be able to | ||
184 | find the desired row in only about 23 tries, whereas the full table | ||
185 | scan will need to look at all 5 million rows. So the binary search | ||
186 | is about 200,000 times faster in that case. | ||
187 | } | ||
188 | PARAGRAPH { | ||
189 | A 200,000-fold speed improvement is huge. So we always want to do | ||
190 | a binary search rather than a full table scan when we can. | ||
191 | } | ||
192 | PARAGRAPH { | ||
193 | The problem with a binary search is that the it only works if the | ||
194 | fields you are search for are in sorted order. So we can do a binary | ||
195 | search when looking up the rowid because the rows of the table are | ||
196 | sorted by rowid. But we cannot use a binary search when looking up | ||
197 | x because the values in the x column are in no particular order. | ||
198 | } | ||
199 | PARAGRAPH { | ||
200 | The way to work around this problem and to permit binary searching on | ||
201 | fields like x is to provide an index. | ||
202 | An index is another b-tree. | ||
203 | But in the index b-tree the key is not the rowid but rather the field | ||
204 | or fields being indexed followed by the rowid. | ||
205 | The data in an index b-tree is empty - it is not needed or used. | ||
206 | The following diagram shows an index on the x field of our example table: | ||
207 | } | ||
208 | IMAGE index-ex1-x-b.gif | ||
209 | |||
210 | PARAGRAPH { | ||
211 | An important point to note in the index are that they keys of the | ||
212 | b-tree are in sorted order. (Recall that NULL values in SQLite sort | ||
213 | first, followed by numeric values in numerical order, then strings, and | ||
214 | finally BLOBs.) This is the property that will allow use to do a | ||
215 | binary search for the field x. The rowid is also included in every | ||
216 | key for two reasons. First, by including the rowid we guarantee that | ||
217 | every key will be unique. And second, the rowid will be used to look | ||
218 | up the actual table entry after doing the binary search. Finally, note | ||
219 | that the data portion of the index b-tree serves no purpose and is thus | ||
220 | kept empty to save space in the disk file. | ||
221 | } | ||
222 | PARAGRAPH { | ||
223 | Remember what the original query example looked like: | ||
224 | } | ||
225 | CODE { | ||
226 | SELECT y FROM ex1 WHERE x=456; | ||
227 | } | ||
228 | |||
229 | PARAGRAPH { | ||
230 | The first time this query was encountered we had to do a full table | ||
231 | scan. But now that we have an index on x, we can do a binary search | ||
232 | on that index for the entry where x==456. Then from that entry we | ||
233 | can find the rowid value and use the rowid to look up the corresponding | ||
234 | entry in the original table. From the entry in the original table, | ||
235 | we can find the value y and return it as our result. The following | ||
236 | diagram shows this process graphically: | ||
237 | } | ||
238 | IMAGE indirect1b1.gif | ||
239 | |||
240 | PARAGRAPH { | ||
241 | With the index, we are able to look up an entry based on the value of | ||
242 | x after visiting only a logorithmic number of b-tree entries. Unlike | ||
243 | the case where we were searching using rowid, we have to do two binary | ||
244 | searches for each output row. But for a 5-million row table, that is | ||
245 | still only 46 searches instead of 5 million for a 100,000-fold speedup. | ||
246 | } | ||
247 | |||
248 | HEADING 3 {Parsing The WHERE Clause} | ||
249 | |||
250 | |||
251 | |||
252 | # parsing the where clause | ||
253 | # rowid lookup | ||
254 | # index lookup | ||
255 | # index lookup without the table | ||
256 | # how an index is chosen | ||
257 | # joins | ||
258 | # join reordering | ||
259 | # order by using an index | ||
260 | # group by using an index | ||
261 | # OR -> IN optimization | ||
262 | # Bitmap indices | ||
263 | # LIKE and GLOB optimization | ||
264 | # subquery flattening | ||
265 | # MIN and MAX optimizations | ||