aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/libraries/sqlite/unix/sqlite-3.5.1/www/fileformat.tcl
diff options
context:
space:
mode:
authordan miller2007-10-20 02:49:29 +0000
committerdan miller2007-10-20 02:49:29 +0000
commite36d23a85ebff914d74bb541558c2b6082b78edb (patch)
tree54b58fdf162e78af64055282a6035c8d2443389d /libraries/sqlite/unix/sqlite-3.5.1/www/fileformat.tcl
parent* Fixed an issue whereby avatar chat distances were being calculated against ... (diff)
downloadopensim-SC-e36d23a85ebff914d74bb541558c2b6082b78edb.zip
opensim-SC-e36d23a85ebff914d74bb541558c2b6082b78edb.tar.gz
opensim-SC-e36d23a85ebff914d74bb541558c2b6082b78edb.tar.bz2
opensim-SC-e36d23a85ebff914d74bb541558c2b6082b78edb.tar.xz
sqlite source (unix build) added to libraries
Diffstat (limited to 'libraries/sqlite/unix/sqlite-3.5.1/www/fileformat.tcl')
-rw-r--r--libraries/sqlite/unix/sqlite-3.5.1/www/fileformat.tcl785
1 files changed, 785 insertions, 0 deletions
diff --git a/libraries/sqlite/unix/sqlite-3.5.1/www/fileformat.tcl b/libraries/sqlite/unix/sqlite-3.5.1/www/fileformat.tcl
new file mode 100644
index 0000000..d143f08
--- /dev/null
+++ b/libraries/sqlite/unix/sqlite-3.5.1/www/fileformat.tcl
@@ -0,0 +1,785 @@
1#
2# Run this script to generated a fileformat.html output file
3#
4set rcsid {$Id: fileformat.tcl,v 1.13 2004/10/10 17:24:55 drh Exp $}
5source common.tcl
6header {SQLite Database File Format (Version 2)}
7puts {
8<h2>SQLite 2.X Database File Format</h2>
9
10<p>
11This document describes the disk file format for SQLite versions 2.1
12through 2.8. SQLite version 3.0 and following uses a very different
13format which is described separately.
14</p>
15
16<h3>1.0 &nbsp; Layers</h3>
17
18<p>
19SQLite is implemented in layers.
20(See the <a href="arch.html">architecture description</a>.)
21The format of database files is determined by three different
22layers in the architecture.
23</p>
24
25<ul>
26<li>The <b>schema</b> layer implemented by the VDBE.</li>
27<li>The <b>b-tree</b> layer implemented by btree.c</li>
28<li>The <b>pager</b> layer implemented by pager.c</li>
29</ul>
30
31<p>
32We will describe each layer beginning with the bottom (pager)
33layer and working upwards.
34</p>
35
36<h3>2.0 &nbsp; The Pager Layer</h3>
37
38<p>
39An SQLite database consists of
40"pages" of data. Each page is 1024 bytes in size.
41Pages are numbered beginning with 1.
42A page number of 0 is used to indicate "no such page" in the
43B-Tree and Schema layers.
44</p>
45
46<p>
47The pager layer is responsible for implementing transactions
48with atomic commit and rollback. It does this using a separate
49journal file. Whenever a new transaction is started, a journal
50file is created that records the original state of the database.
51If the program terminates before completing the transaction, the next
52process to open the database can use the journal file to restore
53the database to its original state.
54</p>
55
56<p>
57The journal file is located in the same directory as the database
58file and has the same name as the database file but with the
59characters "<tt>-journal</tt>" appended.
60</p>
61
62<p>
63The pager layer does not impose any content restrictions on the
64main database file. As far as the pager is concerned, each page
65contains 1024 bytes of arbitrary data. But there is structure to
66the journal file.
67</p>
68
69<p>
70A journal file begins with 8 bytes as follows:
710xd9, 0xd5, 0x05, 0xf9, 0x20, 0xa1, 0x63, and 0xd6.
72Processes that are attempting to rollback a journal use these 8 bytes
73as a sanity check to make sure the file they think is a journal really
74is a valid journal. Prior version of SQLite used different journal
75file formats. The magic numbers for these prior formats are different
76so that if a new version of the library attempts to rollback a journal
77created by an earlier version, it can detect that the journal uses
78an obsolete format and make the necessary adjustments. This article
79describes only the newest journal format - supported as of version
802.8.0.
81</p>
82
83<p>
84Following the 8 byte prefix is a three 4-byte integers that tell us
85the number of pages that have been committed to the journal,
86a magic number used for
87sanity checking each page, and the
88original size of the main database file before the transaction was
89started. The number of committed pages is used to limit how far
90into the journal to read. The use of the checksum magic number is
91described below.
92The original size of the database is used to restore the database
93file back to its original size.
94The size is expressed in pages (1024 bytes per page).
95</p>
96
97<p>
98All three integers in the journal header and all other multi-byte
99numbers used in the journal file are big-endian.
100That means that the most significant byte
101occurs first. That way, a journal file that is
102originally created on one machine can be rolled back by another
103machine that uses a different byte order. So, for example, a
104transaction that failed to complete on your big-endian SparcStation
105can still be rolled back on your little-endian Linux box.
106</p>
107
108<p>
109After the 8-byte prefix and the three 4-byte integers, the
110journal file consists of zero or more page records. Each page
111record is a 4-byte (big-endian) page number followed by 1024 bytes
112of data and a 4-byte checksum.
113The data is the original content of the database page
114before the transaction was started. So to roll back the transaction,
115the data is simply written into the corresponding page of the
116main database file. Pages can appear in the journal in any order,
117but they are guaranteed to appear only once. All page numbers will be
118between 1 and the maximum specified by the page size integer that
119appeared at the beginning of the journal.
120</p>
121
122<p>
123The so-called checksum at the end of each record is not really a
124checksum - it is the sum of the page number and the magic number which
125was the second integer in the journal header. The purpose of this
126value is to try to detect journal corruption that might have occurred
127because of a power loss or OS crash that occurred which the journal
128file was being written to disk. It could have been the case that the
129meta-data for the journal file, specifically the size of the file, had
130been written to the disk so that when the machine reboots it appears that
131file is large enough to hold the current record. But even though the
132file size has changed, the data for the file might not have made it to
133the disk surface at the time of the OS crash or power loss. This means
134that after reboot, the end of the journal file will contain quasi-random
135garbage data. The checksum is an attempt to detect such corruption. If
136the checksum does not match, that page of the journal is not rolled back.
137</p>
138
139<p>
140Here is a summary of the journal file format:
141</p>
142
143<ul>
144<li>8 byte prefix: 0xd9, 0xd5, 0x05, 0xf9, 0x20, 0xa1, 0x63, 0xd6</li>
145<li>4 byte number of records in journal</li>
146<li>4 byte magic number used for page checksums</li>
147<li>4 byte initial database page count</li>
148<li>Zero or more instances of the following:
149 <ul>
150 <li>4 byte page number</li>
151 <li>1024 bytes of original data for the page</li>
152 <li>4 byte checksum</li>
153 </ul>
154</li>
155</ul>
156
157<h3>3.0 &nbsp; The B-Tree Layer</h3>
158
159<p>
160The B-Tree layer builds on top of the pager layer to implement
161one or more separate b-trees all in the same disk file. The
162algorithms used are taken from Knuth's <i>The Art Of Computer
163Programming.</i></p>
164
165<p>
166Page 1 of a database contains a header string used for sanity
167checking, a few 32-bit words of configuration data, and a pointer
168to the beginning of a list of unused pages in the database.
169All other pages in the
170database are either pages of a b-tree, overflow pages, or unused
171pages on the freelist.
172</p>
173
174<p>
175Each b-tree page contains zero or more database entries.
176Each entry has an unique key of one or more bytes and data of
177zero or more bytes.
178Both the key and data are arbitrary byte sequences. The combination
179of key and data are collectively known as "payload". The current
180implementation limits the amount of payload in a single entry to
1811048576 bytes. This limit can be raised to 16777216 by adjusting
182a single #define in the source code and recompiling. But most entries
183contain less than a hundred bytes of payload so a megabyte limit seems
184more than enough.
185</p>
186
187<p>
188Up to 238 bytes of payload for an entry can be held directly on
189a b-tree page. Any additional payload is contained on a linked list
190of overflow pages. This limit on the amount of payload held directly
191on b-tree pages guarantees that each b-tree page can hold at least
1924 entries. In practice, most entries are smaller than 238 bytes and
193thus most pages can hold more than 4 entries.
194</p>
195
196<p>
197A single database file can hold any number of separate, independent b-trees.
198Each b-tree is identified by its root page, which never changes.
199Child pages of the b-tree may change as entries are added and removed
200and pages split and combine. But the root page always stays the same.
201The b-tree itself does not record which pages are root pages and which
202are not. That information is handled entirely at the schema layer.
203</p>
204
205<h4>3.1 &nbsp; B-Tree Page 1 Details</h4>
206
207<p>
208Page 1 begins with the following 48-byte string:
209</p>
210
211<blockquote><pre>
212** This file contains an SQLite 2.1 database **
213</pre></blockquote>
214
215<p>
216If you count the number of characters in the string above, you will
217see that there are only 47. A '\000' terminator byte is added to
218bring the total to 48.
219</p>
220
221<p>
222A frequent question is why the string says version 2.1 when (as
223of this writing) we are up to version 2.7.0 of SQLite and any
224change to the second digit of the version is suppose to represent
225a database format change. The answer to this is that the B-tree
226layer has not changed any since version 2.1. There have been
227database format changes since version 2.1 but those changes have
228all been in the schema layer. Because the format of the b-tree
229layer is unchanged since version 2.1.0, the header string still
230says version 2.1.
231</p>
232
233<p>
234After the format string is a 4-byte integer used to determine the
235byte-order of the database. The integer has a value of
2360xdae37528. If this number is expressed as 0xda, 0xe3, 0x75, 0x28, then
237the database is in a big-endian format and all 16 and 32-bit integers
238elsewhere in the b-tree layer are also big-endian. If the number is
239expressed as 0x28, 0x75, 0xe3, and 0xda, then the database is in a
240little-endian format and all other multi-byte numbers in the b-tree
241layer are also little-endian.
242Prior to version 2.6.3, the SQLite engine was only able to read databases
243that used the same byte order as the processor they were running on.
244But beginning with 2.6.3, SQLite can read or write databases in any
245byte order.
246</p>
247
248<p>
249After the byte-order code are six 4-byte integers. Each integer is in the
250byte order determined by the byte-order code. The first integer is the
251page number for the first page of the freelist. If there are no unused
252pages in the database, then this integer is 0. The second integer is
253the number of unused pages in the database. The last 4 integers are
254not used by the b-tree layer. These are the so-called "meta" values that
255are passed up to the schema layer
256and used there for configuration and format version information.
257All bytes of page 1 past beyond the meta-value integers are unused
258and are initialized to zero.
259</p>
260
261<p>
262Here is a summary of the information contained on page 1 in the b-tree layer:
263</p>
264
265<ul>
266<li>48 byte header string</li>
267<li>4 byte integer used to determine the byte-order</li>
268<li>4 byte integer which is the first page of the freelist</li>
269<li>4 byte integer which is the number of pages on the freelist</li>
270<li>36 bytes of meta-data arranged as nine 4-byte integers</li>
271<li>928 bytes of unused space</li>
272</ul>
273
274<h4>3.2 &nbsp; Structure Of A Single B-Tree Page</h4>
275
276<p>
277Conceptually, a b-tree page contains N database entries and N+1 pointers
278to other b-tree pages.
279</p>
280
281<blockquote>
282<table border=1 cellspacing=0 cellpadding=5>
283<tr>
284<td align="center">Ptr<br>0</td>
285<td align="center">Entry<br>0</td>
286<td align="center">Ptr<br>1</td>
287<td align="center">Entry<br>1</td>
288<td align="center"><b>...</b></td>
289<td align="center">Ptr<br>N-1</td>
290<td align="center">Entry<br>N-1</td>
291<td align="center">Ptr<br>N</td>
292</tr>
293</table>
294</blockquote>
295
296<p>
297The entries are arranged in increasing order. That is, the key to
298Entry 0 is less than the key to Entry 1, and the key to Entry 1 is
299less than the key of Entry 2, and so forth. The pointers point to
300pages containing additional entries that have keys in between the
301entries on either side. So Ptr 0 points to another b-tree page that
302contains entries that all have keys less than Key 0, and Ptr 1
303points to a b-tree pages where all entries have keys greater than Key 0
304but less than Key 1, and so forth.
305</p>
306
307<p>
308Each b-tree page in SQLite consists of a header, zero or more "cells"
309each holding a single entry and pointer, and zero or more "free blocks"
310that represent unused space on the page.
311</p>
312
313<p>
314The header on a b-tree page is the first 8 bytes of the page.
315The header contains the value
316of the right-most pointer (Ptr N) and the byte offset into the page
317of the first cell and the first free block. The pointer is a 32-bit
318value and the offsets are each 16-bit values. We have:
319</p>
320
321<blockquote>
322<table border=1 cellspacing=0 cellpadding=5>
323<tr>
324<td align="center" width=30>0</td>
325<td align="center" width=30>1</td>
326<td align="center" width=30>2</td>
327<td align="center" width=30>3</td>
328<td align="center" width=30>4</td>
329<td align="center" width=30>5</td>
330<td align="center" width=30>6</td>
331<td align="center" width=30>7</td>
332</tr>
333<tr>
334<td align="center" colspan=4>Ptr N</td>
335<td align="center" colspan=2>Cell 0</td>
336<td align="center" colspan=2>Freeblock 0</td>
337</tr>
338</table>
339</blockquote>
340
341<p>
342The 1016 bytes of a b-tree page that come after the header contain
343cells and freeblocks. All 1016 bytes are covered by either a cell
344or a freeblock.
345</p>
346
347<p>
348The cells are connected in a linked list. Cell 0 contains Ptr 0 and
349Entry 0. Bytes 4 and 5 of the header point to Cell 0. Cell 0 then
350points to Cell 1 which contains Ptr 1 and Entry 1. And so forth.
351Cells vary in size. Every cell has a 12-byte header and at least 4
352bytes of payload space. Space is allocated to payload in increments
353of 4 bytes. Thus the minimum size of a cell is 16 bytes and up to
35463 cells can fit on a single page. The size of a cell is always a multiple
355of 4 bytes.
356A cell can have up to 238 bytes of payload space. If
357the payload is more than 238 bytes, then an additional 4 byte page
358number is appended to the cell which is the page number of the first
359overflow page containing the additional payload. The maximum size
360of a cell is thus 254 bytes, meaning that a least 4 cells can fit into
361the 1016 bytes of space available on a b-tree page.
362An average cell is usually around 52 to 100 bytes in size with about
36310 or 20 cells to a page.
364</p>
365
366<p>
367The data layout of a cell looks like this:
368</p>
369
370<blockquote>
371<table border=1 cellspacing=0 cellpadding=5>
372<tr>
373<td align="center" width=20>0</td>
374<td align="center" width=20>1</td>
375<td align="center" width=20>2</td>
376<td align="center" width=20>3</td>
377<td align="center" width=20>4</td>
378<td align="center" width=20>5</td>
379<td align="center" width=20>6</td>
380<td align="center" width=20>7</td>
381<td align="center" width=20>8</td>
382<td align="center" width=20>9</td>
383<td align="center" width=20>10</td>
384<td align="center" width=20>11</td>
385<td align="center" width=100>12 ... 249</td>
386<td align="center" width=20>250</td>
387<td align="center" width=20>251</td>
388<td align="center" width=20>252</td>
389<td align="center" width=20>253</td>
390</tr>
391<tr>
392<td align="center" colspan=4>Ptr</td>
393<td align="center" colspan=2>Keysize<br>(low)</td>
394<td align="center" colspan=2>Next</td>
395<td align="center" colspan=1>Ksz<br>(hi)</td>
396<td align="center" colspan=1>Dsz<br>(hi)</td>
397<td align="center" colspan=2>Datasize<br>(low)</td>
398<td align="center" colspan=1>Payload</td>
399<td align="center" colspan=4>Overflow<br>Pointer</td>
400</tr>
401</table>
402</blockquote>
403
404<p>
405The first four bytes are the pointer. The size of the key is a 24-bit
406where the upper 8 bits are taken from byte 8 and the lower 16 bits are
407taken from bytes 4 and 5 (or bytes 5 and 4 on little-endian machines.)
408The size of the data is another 24-bit value where the upper 8 bits
409are taken from byte 9 and the lower 16 bits are taken from bytes 10 and
41011 or 11 and 10, depending on the byte order. Bytes 6 and 7 are the
411offset to the next cell in the linked list of all cells on the current
412page. This offset is 0 for the last cell on the page.
413</p>
414
415<p>
416The payload itself can be any number of bytes between 1 and 1048576.
417But space to hold the payload is allocated in 4-byte chunks up to
418238 bytes. If the entry contains more than 238 bytes of payload, then
419additional payload data is stored on a linked list of overflow pages.
420A 4 byte page number is appended to the cell that contains the first
421page of this linked list.
422</p>
423
424<p>
425Each overflow page begins with a 4-byte value which is the
426page number of the next overflow page in the list. This value is
4270 for the last page in the list. The remaining
4281020 bytes of the overflow page are available for storing payload.
429Note that a full page is allocated regardless of the number of overflow
430bytes stored. Thus, if the total payload for an entry is 239 bytes,
431the first 238 are stored in the cell and the overflow page stores just
432one byte.
433</p>
434
435<p>
436The structure of an overflow page looks like this:
437</p>
438
439<blockquote>
440<table border=1 cellspacing=0 cellpadding=5>
441<tr>
442<td align="center" width=20>0</td>
443<td align="center" width=20>1</td>
444<td align="center" width=20>2</td>
445<td align="center" width=20>3</td>
446<td align="center" width=200>4 ... 1023</td>
447</tr>
448<tr>
449<td align="center" colspan=4>Next Page</td>
450<td align="center" colspan=1>Overflow Data</td>
451</tr>
452</table>
453</blockquote>
454
455<p>
456All space on a b-tree page which is not used by the header or by cells
457is filled by freeblocks. Freeblocks, like cells, are variable in size.
458The size of a freeblock is at least 4 bytes and is always a multiple of
4594 bytes.
460The first 4 bytes contain a header and the remaining bytes
461are unused. The structure of the freeblock is as follows:
462</p>
463
464<blockquote>
465<table border=1 cellspacing=0 cellpadding=5>
466<tr>
467<td align="center" width=20>0</td>
468<td align="center" width=20>1</td>
469<td align="center" width=20>2</td>
470<td align="center" width=20>3</td>
471<td align="center" width=200>4 ... 1015</td>
472</tr>
473<tr>
474<td align="center" colspan=2>Size</td>
475<td align="center" colspan=2>Next</td>
476<td align="center" colspan=1>Unused</td>
477</tr>
478</table>
479</blockquote>
480
481<p>
482Freeblocks are stored in a linked list in increasing order. That is
483to say, the first freeblock occurs at a lower index into the page than
484the second free block, and so forth. The first 2 bytes of the header
485are an integer which is the total number of bytes in the freeblock.
486The second 2 bytes are the index into the page of the next freeblock
487in the list. The last freeblock has a Next value of 0.
488</p>
489
490<p>
491When a new b-tree is created in a database, the root page of the b-tree
492consist of a header and a single 1016 byte freeblock. As entries are
493added, space is carved off of that freeblock and used to make cells.
494When b-tree entries are deleted, the space used by their cells is converted
495into freeblocks. Adjacent freeblocks are merged, but the page can still
496become fragmented. The b-tree code will occasionally try to defragment
497the page by moving all cells to the beginning and constructing a single
498freeblock at the end to take up all remaining space.
499</p>
500
501<h4>3.3 &nbsp; The B-Tree Free Page List</h4>
502
503<p>
504When information is removed from an SQLite database such that one or
505more pages are no longer needed, those pages are added to a list of
506free pages so that they can be reused later when new information is
507added. This subsection describes the structure of this freelist.
508</p>
509
510<p>
511The 32-bit integer beginning at byte-offset 52 in page 1 of the database
512contains the address of the first page in a linked list of free pages.
513If there are no free pages available, this integer has a value of 0.
514The 32-bit integer at byte-offset 56 in page 1 contains the number of
515free pages on the freelist.
516</p>
517
518<p>
519The freelist contains a trunk and many branches. The trunk of
520the freelist is composed of overflow pages. That is to say, each page
521contains a single 32-bit integer at byte offset 0 which
522is the page number of the next page on the freelist trunk.
523The payload area
524of each trunk page is used to record pointers to branch pages.
525The first 32-bit integer in the payload area of a trunk page
526is the number of branch pages to follow (between 0 and 254)
527and each subsequent 32-bit integer is a page number for a branch page.
528The following diagram shows the structure of a trunk freelist page:
529</p>
530
531<blockquote>
532<table border=1 cellspacing=0 cellpadding=5>
533<tr>
534<td align="center" width=20>0</td>
535<td align="center" width=20>1</td>
536<td align="center" width=20>2</td>
537<td align="center" width=20>3</td>
538<td align="center" width=20>4</td>
539<td align="center" width=20>5</td>
540<td align="center" width=20>6</td>
541<td align="center" width=20>7</td>
542<td align="center" width=200>8 ... 1023</td>
543</tr>
544<tr>
545<td align="center" colspan=4>Next trunk page</td>
546<td align="center" colspan=4># of branch pages</td>
547<td align="center" colspan=1>Page numbers for branch pages</td>
548</tr>
549</table>
550</blockquote>
551
552<p>
553It is important to note that only the pages on the trunk of the freelist
554contain pointers to other pages. The branch pages contain no
555data whatsoever. The fact that the branch pages are completely
556blank allows for an important optimization in the paging layer. When
557a branch page is removed from the freelist to be reused, it is not
558necessary to write the original content of that page into the rollback
559journal. The branch page contained no data to begin with, so there is
560no need to restore the page in the event of a rollback. Similarly,
561when a page is not longer needed and is added to the freelist as a branch
562page, it is not necessary to write the content of that page
563into the database file.
564Again, the page contains no real data so it is not necessary to record the
565content of that page. By reducing the amount of disk I/O required,
566these two optimizations allow some database operations
567to go four to six times faster than they would otherwise.
568</p>
569
570<h3>4.0 &nbsp; The Schema Layer</h3>
571
572<p>
573The schema layer implements an SQL database on top of one or more
574b-trees and keeps track of the root page numbers for all b-trees.
575Where the b-tree layer provides only unformatted data storage with
576a unique key, the schema layer allows each entry to contain multiple
577columns. The schema layer also allows indices and non-unique key values.
578</p>
579
580<p>
581The schema layer implements two separate data storage abstractions:
582tables and indices. Each table and each index uses its own b-tree
583but they use the b-tree capabilities in different ways. For a table,
584the b-tree key is a unique 4-byte integer and the b-tree data is the
585content of the table row, encoded so that columns can be separately
586extracted. For indices, the b-tree key varies in size depending on the
587size of the fields being indexed and the b-tree data is empty.
588</p>
589
590<h4>4.1 &nbsp; SQL Table Implementation Details</h4>
591
592<p>Each row of an SQL table is stored in a single b-tree entry.
593The b-tree key is a 4-byte big-endian integer that is the ROWID
594or INTEGER PRIMARY KEY for that table row.
595The key is stored in a big-endian format so
596that keys will sort in numerical order using memcmp() function.</p>
597
598<p>The content of a table row is stored in the data portion of
599the corresponding b-tree table. The content is encoded to allow
600individual columns of the row to be extracted as necessary. Assuming
601that the table has N columns, the content is encoded as N+1 offsets
602followed by N column values, as follows:
603</p>
604
605<blockquote>
606<table border=1 cellspacing=0 cellpadding=5>
607<tr>
608<td>offset 0</td>
609<td>offset 1</td>
610<td><b>...</b></td>
611<td>offset N-1</td>
612<td>offset N</td>
613<td>value 0</td>
614<td>value 1</td>
615<td><b>...</b></td>
616<td>value N-1</td>
617</tr>
618</table>
619</blockquote>
620
621<p>
622The offsets can be either 8-bit, 16-bit, or 24-bit integers depending
623on how much data is to be stored. If the total size of the content
624is less than 256 bytes then 8-bit offsets are used. If the total size
625of the b-tree data is less than 65536 then 16-bit offsets are used.
62624-bit offsets are used otherwise. Offsets are always little-endian,
627which means that the least significant byte occurs first.
628</p>
629
630<p>
631Data is stored as a nul-terminated string. Any empty string consists
632of just the nul terminator. A NULL value is an empty string with no
633nul-terminator. Thus a NULL value occupies zero bytes and an empty string
634occupies 1 byte.
635</p>
636
637<p>
638Column values are stored in the order that they appear in the CREATE TABLE
639statement. The offsets at the beginning of the record contain the
640byte index of the corresponding column value. Thus, Offset 0 contains
641the byte index for Value 0, Offset 1 contains the byte offset
642of Value 1, and so forth. The number of bytes in a column value can
643always be found by subtracting offsets. This allows NULLs to be
644recovered from the record unambiguously.
645</p>
646
647<p>
648Most columns are stored in the b-tree data as described above.
649The one exception is column that has type INTEGER PRIMARY KEY.
650INTEGER PRIMARY KEY columns correspond to the 4-byte b-tree key.
651When an SQL statement attempts to read the INTEGER PRIMARY KEY,
652the 4-byte b-tree key is read rather than information out of the
653b-tree data. But there is still an Offset associated with the
654INTEGER PRIMARY KEY, just like any other column. But the Value
655associated with that offset is always NULL.
656</p>
657
658<h4>4.2 &nbsp; SQL Index Implementation Details</h4>
659
660<p>
661SQL indices are implement using a b-tree in which the key is used
662but the data is always empty. The purpose of an index is to map
663one or more column values into the ROWID for the table entry that
664contains those column values.
665</p>
666
667<p>
668Each b-tree in an index consists of one or more column values followed
669by a 4-byte ROWID. Each column value is nul-terminated (even NULL values)
670and begins with a single character that indicates the datatype for that
671column value. Only three datatypes are supported: NULL, Number, and
672Text. NULL values are encoded as the character 'a' followed by the
673nul terminator. Numbers are encoded as the character 'b' followed by
674a string that has been crafted so that sorting the string using memcmp()
675will sort the corresponding numbers in numerical order. (See the
676sqliteRealToSortable() function in util.c of the SQLite sources for
677additional information on this encoding.) Numbers are also nul-terminated.
678Text values consists of the character 'c' followed by a copy of the
679text string and a nul-terminator. These encoding rules result in
680NULLs being sorted first, followed by numerical values in numerical
681order, followed by text values in lexicographical order.
682</p>
683
684<h4>4.4 &nbsp; SQL Schema Storage And Root B-Tree Page Numbers</h4>
685
686<p>
687The database schema is stored in the database in a special tabled named
688"sqlite_master" and which always has a root b-tree page number of 2.
689This table contains the original CREATE TABLE,
690CREATE INDEX, CREATE VIEW, and CREATE TRIGGER statements used to define
691the database to begin with. Whenever an SQLite database is opened,
692the sqlite_master table is scanned from beginning to end and
693all the original CREATE statements are played back through the parser
694in order to reconstruct an in-memory representation of the database
695schema for use in subsequent command parsing. For each CREATE TABLE
696and CREATE INDEX statement, the root page number for the corresponding
697b-tree is also recorded in the sqlite_master table so that SQLite will
698know where to look for the appropriate b-tree.
699</p>
700
701<p>
702SQLite users can query the sqlite_master table just like any other table
703in the database. But the sqlite_master table cannot be directly written.
704The sqlite_master table is automatically updated in response to CREATE
705and DROP statements but it cannot be changed using INSERT, UPDATE, or
706DELETE statements as that would risk corrupting the database.
707</p>
708
709<p>
710SQLite stores temporary tables and indices in a separate
711file from the main database file. The temporary table database file
712is the same structure as the main database file. The schema table
713for the temporary tables is stored on page 2 just as in the main
714database. But the schema table for the temporary database named
715"sqlite_temp_master" instead of "sqlite_master". Other than the
716name change, it works exactly the same.
717</p>
718
719<h4>4.4 &nbsp; Schema Version Numbering And Other Meta-Information</h4>
720
721<p>
722The nine 32-bit integers that are stored beginning at byte offset
72360 of Page 1 in the b-tree layer are passed up into the schema layer
724and used for versioning and configuration information. The meaning
725of the first four integers is shown below. The other five are currently
726unused.
727</p>
728
729<ol>
730<li>The schema version number</li>
731<li>The format version number</li>
732<li>The recommended pager cache size</li>
733<li>The safety level</li>
734</ol>
735
736<p>
737The first meta-value, the schema version number, is used to detect when
738the schema of the database is changed by a CREATE or DROP statement.
739Recall that when a database is first opened the sqlite_master table is
740scanned and an internal representation of the tables, indices, views,
741and triggers for the database is built in memory. This internal
742representation is used for all subsequent SQL command parsing and
743execution. But what if another process were to change the schema
744by adding or removing a table, index, view, or trigger? If the original
745process were to continue using the old schema, it could potentially
746corrupt the database by writing to a table that no longer exists.
747To avoid this problem, the schema version number is changed whenever
748a CREATE or DROP statement is executed. Before each command is
749executed, the current schema version number for the database file
750is compared against the schema version number from when the sqlite_master
751table was last read. If those numbers are different, the internal
752schema representation is erased and the sqlite_master table is reread
753to reconstruct the internal schema representation.
754(Calls to sqlite_exec() generally return SQLITE_SCHEMA when this happens.)
755</p>
756
757<p>
758The second meta-value is the schema format version number. This
759number tells what version of the schema layer should be used to
760interpret the file. There have been changes to the schema layer
761over time and this number is used to detect when an older database
762file is being processed by a newer version of the library.
763As of this writing (SQLite version 2.7.0) the current format version
764is "4".
765</p>
766
767<p>
768The third meta-value is the recommended pager cache size as set
769by the DEFAULT_CACHE_SIZE pragma. If the value is positive it
770means that synchronous behavior is enable (via the DEFAULT_SYNCHRONOUS
771pragma) and if negative it means that synchronous behavior is
772disabled.
773</p>
774
775<p>
776The fourth meta-value is safety level added in version 2.8.0.
777A value of 1 corresponds to a SYNCHRONOUS setting of OFF. In other
778words, SQLite does not pause to wait for journal data to reach the disk
779surface before overwriting pages of the database. A value of 2 corresponds
780to a SYNCHRONOUS setting of NORMAL. A value of 3 corresponds to a
781SYNCHRONOUS setting of FULL. If the value is 0, that means it has not
782been initialized so the default synchronous setting of NORMAL is used.
783</p>
784}
785footer $rcsid