From e36d23a85ebff914d74bb541558c2b6082b78edb Mon Sep 17 00:00:00 2001 From: dan miller Date: Sat, 20 Oct 2007 02:49:29 +0000 Subject: sqlite source (unix build) added to libraries --- .../sqlite/unix/sqlite-3.5.1/www/fileformat.tcl | 785 +++++++++++++++++++++ 1 file changed, 785 insertions(+) create mode 100644 libraries/sqlite/unix/sqlite-3.5.1/www/fileformat.tcl (limited to 'libraries/sqlite/unix/sqlite-3.5.1/www/fileformat.tcl') diff --git a/libraries/sqlite/unix/sqlite-3.5.1/www/fileformat.tcl b/libraries/sqlite/unix/sqlite-3.5.1/www/fileformat.tcl new file mode 100644 index 0000000..d143f08 --- /dev/null +++ b/libraries/sqlite/unix/sqlite-3.5.1/www/fileformat.tcl @@ -0,0 +1,785 @@ +# +# Run this script to generated a fileformat.html output file +# +set rcsid {$Id: fileformat.tcl,v 1.13 2004/10/10 17:24:55 drh Exp $} +source common.tcl +header {SQLite Database File Format (Version 2)} +puts { +

SQLite 2.X Database File Format

+ +

+This document describes the disk file format for SQLite versions 2.1 +through 2.8. SQLite version 3.0 and following uses a very different +format which is described separately. +

+ +

1.0   Layers

+ +

+SQLite is implemented in layers. +(See the architecture description.) +The format of database files is determined by three different +layers in the architecture. +

+ + + +

+We will describe each layer beginning with the bottom (pager) +layer and working upwards. +

+ +

2.0   The Pager Layer

+ +

+An SQLite database consists of +"pages" of data. Each page is 1024 bytes in size. +Pages are numbered beginning with 1. +A page number of 0 is used to indicate "no such page" in the +B-Tree and Schema layers. +

+ +

+The pager layer is responsible for implementing transactions +with atomic commit and rollback. It does this using a separate +journal file. Whenever a new transaction is started, a journal +file is created that records the original state of the database. +If the program terminates before completing the transaction, the next +process to open the database can use the journal file to restore +the database to its original state. +

+ +

+The journal file is located in the same directory as the database +file and has the same name as the database file but with the +characters "-journal" appended. +

+ +

+The pager layer does not impose any content restrictions on the +main database file. As far as the pager is concerned, each page +contains 1024 bytes of arbitrary data. But there is structure to +the journal file. +

+ +

+A journal file begins with 8 bytes as follows: +0xd9, 0xd5, 0x05, 0xf9, 0x20, 0xa1, 0x63, and 0xd6. +Processes that are attempting to rollback a journal use these 8 bytes +as a sanity check to make sure the file they think is a journal really +is a valid journal. Prior version of SQLite used different journal +file formats. The magic numbers for these prior formats are different +so that if a new version of the library attempts to rollback a journal +created by an earlier version, it can detect that the journal uses +an obsolete format and make the necessary adjustments. This article +describes only the newest journal format - supported as of version +2.8.0. +

+ +

+Following the 8 byte prefix is a three 4-byte integers that tell us +the number of pages that have been committed to the journal, +a magic number used for +sanity checking each page, and the +original size of the main database file before the transaction was +started. The number of committed pages is used to limit how far +into the journal to read. The use of the checksum magic number is +described below. +The original size of the database is used to restore the database +file back to its original size. +The size is expressed in pages (1024 bytes per page). +

+ +

+All three integers in the journal header and all other multi-byte +numbers used in the journal file are big-endian. +That means that the most significant byte +occurs first. That way, a journal file that is +originally created on one machine can be rolled back by another +machine that uses a different byte order. So, for example, a +transaction that failed to complete on your big-endian SparcStation +can still be rolled back on your little-endian Linux box. +

+ +

+After the 8-byte prefix and the three 4-byte integers, the +journal file consists of zero or more page records. Each page +record is a 4-byte (big-endian) page number followed by 1024 bytes +of data and a 4-byte checksum. +The data is the original content of the database page +before the transaction was started. So to roll back the transaction, +the data is simply written into the corresponding page of the +main database file. Pages can appear in the journal in any order, +but they are guaranteed to appear only once. All page numbers will be +between 1 and the maximum specified by the page size integer that +appeared at the beginning of the journal. +

+ +

+The so-called checksum at the end of each record is not really a +checksum - it is the sum of the page number and the magic number which +was the second integer in the journal header. The purpose of this +value is to try to detect journal corruption that might have occurred +because of a power loss or OS crash that occurred which the journal +file was being written to disk. It could have been the case that the +meta-data for the journal file, specifically the size of the file, had +been written to the disk so that when the machine reboots it appears that +file is large enough to hold the current record. But even though the +file size has changed, the data for the file might not have made it to +the disk surface at the time of the OS crash or power loss. This means +that after reboot, the end of the journal file will contain quasi-random +garbage data. The checksum is an attempt to detect such corruption. If +the checksum does not match, that page of the journal is not rolled back. +

+ +

+Here is a summary of the journal file format: +

+ + + +

3.0   The B-Tree Layer

+ +

+The B-Tree layer builds on top of the pager layer to implement +one or more separate b-trees all in the same disk file. The +algorithms used are taken from Knuth's The Art Of Computer +Programming.

+ +

+Page 1 of a database contains a header string used for sanity +checking, a few 32-bit words of configuration data, and a pointer +to the beginning of a list of unused pages in the database. +All other pages in the +database are either pages of a b-tree, overflow pages, or unused +pages on the freelist. +

+ +

+Each b-tree page contains zero or more database entries. +Each entry has an unique key of one or more bytes and data of +zero or more bytes. +Both the key and data are arbitrary byte sequences. The combination +of key and data are collectively known as "payload". The current +implementation limits the amount of payload in a single entry to +1048576 bytes. This limit can be raised to 16777216 by adjusting +a single #define in the source code and recompiling. But most entries +contain less than a hundred bytes of payload so a megabyte limit seems +more than enough. +

+ +

+Up to 238 bytes of payload for an entry can be held directly on +a b-tree page. Any additional payload is contained on a linked list +of overflow pages. This limit on the amount of payload held directly +on b-tree pages guarantees that each b-tree page can hold at least +4 entries. In practice, most entries are smaller than 238 bytes and +thus most pages can hold more than 4 entries. +

+ +

+A single database file can hold any number of separate, independent b-trees. +Each b-tree is identified by its root page, which never changes. +Child pages of the b-tree may change as entries are added and removed +and pages split and combine. But the root page always stays the same. +The b-tree itself does not record which pages are root pages and which +are not. That information is handled entirely at the schema layer. +

+ +

3.1   B-Tree Page 1 Details

+ +

+Page 1 begins with the following 48-byte string: +

+ +
+** This file contains an SQLite 2.1 database **
+
+ +

+If you count the number of characters in the string above, you will +see that there are only 47. A '\000' terminator byte is added to +bring the total to 48. +

+ +

+A frequent question is why the string says version 2.1 when (as +of this writing) we are up to version 2.7.0 of SQLite and any +change to the second digit of the version is suppose to represent +a database format change. The answer to this is that the B-tree +layer has not changed any since version 2.1. There have been +database format changes since version 2.1 but those changes have +all been in the schema layer. Because the format of the b-tree +layer is unchanged since version 2.1.0, the header string still +says version 2.1. +

+ +

+After the format string is a 4-byte integer used to determine the +byte-order of the database. The integer has a value of +0xdae37528. If this number is expressed as 0xda, 0xe3, 0x75, 0x28, then +the database is in a big-endian format and all 16 and 32-bit integers +elsewhere in the b-tree layer are also big-endian. If the number is +expressed as 0x28, 0x75, 0xe3, and 0xda, then the database is in a +little-endian format and all other multi-byte numbers in the b-tree +layer are also little-endian. +Prior to version 2.6.3, the SQLite engine was only able to read databases +that used the same byte order as the processor they were running on. +But beginning with 2.6.3, SQLite can read or write databases in any +byte order. +

+ +

+After the byte-order code are six 4-byte integers. Each integer is in the +byte order determined by the byte-order code. The first integer is the +page number for the first page of the freelist. If there are no unused +pages in the database, then this integer is 0. The second integer is +the number of unused pages in the database. The last 4 integers are +not used by the b-tree layer. These are the so-called "meta" values that +are passed up to the schema layer +and used there for configuration and format version information. +All bytes of page 1 past beyond the meta-value integers are unused +and are initialized to zero. +

+ +

+Here is a summary of the information contained on page 1 in the b-tree layer: +

+ + + +

3.2   Structure Of A Single B-Tree Page

+ +

+Conceptually, a b-tree page contains N database entries and N+1 pointers +to other b-tree pages. +

+ +
+ + + + + + + + + + + +
Ptr
0
Entry
0
Ptr
1
Entry
1
...Ptr
N-1
Entry
N-1
Ptr
N
+
+ +

+The entries are arranged in increasing order. That is, the key to +Entry 0 is less than the key to Entry 1, and the key to Entry 1 is +less than the key of Entry 2, and so forth. The pointers point to +pages containing additional entries that have keys in between the +entries on either side. So Ptr 0 points to another b-tree page that +contains entries that all have keys less than Key 0, and Ptr 1 +points to a b-tree pages where all entries have keys greater than Key 0 +but less than Key 1, and so forth. +

+ +

+Each b-tree page in SQLite consists of a header, zero or more "cells" +each holding a single entry and pointer, and zero or more "free blocks" +that represent unused space on the page. +

+ +

+The header on a b-tree page is the first 8 bytes of the page. +The header contains the value +of the right-most pointer (Ptr N) and the byte offset into the page +of the first cell and the first free block. The pointer is a 32-bit +value and the offsets are each 16-bit values. We have: +

+ +
+ + + + + + + + + + + + + + + + +
01234567
Ptr NCell 0Freeblock 0
+
+ +

+The 1016 bytes of a b-tree page that come after the header contain +cells and freeblocks. All 1016 bytes are covered by either a cell +or a freeblock. +

+ +

+The cells are connected in a linked list. Cell 0 contains Ptr 0 and +Entry 0. Bytes 4 and 5 of the header point to Cell 0. Cell 0 then +points to Cell 1 which contains Ptr 1 and Entry 1. And so forth. +Cells vary in size. Every cell has a 12-byte header and at least 4 +bytes of payload space. Space is allocated to payload in increments +of 4 bytes. Thus the minimum size of a cell is 16 bytes and up to +63 cells can fit on a single page. The size of a cell is always a multiple +of 4 bytes. +A cell can have up to 238 bytes of payload space. If +the payload is more than 238 bytes, then an additional 4 byte page +number is appended to the cell which is the page number of the first +overflow page containing the additional payload. The maximum size +of a cell is thus 254 bytes, meaning that a least 4 cells can fit into +the 1016 bytes of space available on a b-tree page. +An average cell is usually around 52 to 100 bytes in size with about +10 or 20 cells to a page. +

+ +

+The data layout of a cell looks like this: +

+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
0123456789101112 ... 249250251252253
PtrKeysize
(low)
NextKsz
(hi)
Dsz
(hi)
Datasize
(low)
PayloadOverflow
Pointer
+
+ +

+The first four bytes are the pointer. The size of the key is a 24-bit +where the upper 8 bits are taken from byte 8 and the lower 16 bits are +taken from bytes 4 and 5 (or bytes 5 and 4 on little-endian machines.) +The size of the data is another 24-bit value where the upper 8 bits +are taken from byte 9 and the lower 16 bits are taken from bytes 10 and +11 or 11 and 10, depending on the byte order. Bytes 6 and 7 are the +offset to the next cell in the linked list of all cells on the current +page. This offset is 0 for the last cell on the page. +

+ +

+The payload itself can be any number of bytes between 1 and 1048576. +But space to hold the payload is allocated in 4-byte chunks up to +238 bytes. If the entry contains more than 238 bytes of payload, then +additional payload data is stored on a linked list of overflow pages. +A 4 byte page number is appended to the cell that contains the first +page of this linked list. +

+ +

+Each overflow page begins with a 4-byte value which is the +page number of the next overflow page in the list. This value is +0 for the last page in the list. The remaining +1020 bytes of the overflow page are available for storing payload. +Note that a full page is allocated regardless of the number of overflow +bytes stored. Thus, if the total payload for an entry is 239 bytes, +the first 238 are stored in the cell and the overflow page stores just +one byte. +

+ +

+The structure of an overflow page looks like this: +

+ +
+ + + + + + + + + + + + +
01234 ... 1023
Next PageOverflow Data
+
+ +

+All space on a b-tree page which is not used by the header or by cells +is filled by freeblocks. Freeblocks, like cells, are variable in size. +The size of a freeblock is at least 4 bytes and is always a multiple of +4 bytes. +The first 4 bytes contain a header and the remaining bytes +are unused. The structure of the freeblock is as follows: +

+ +
+ + + + + + + + + + + + + +
01234 ... 1015
SizeNextUnused
+
+ +

+Freeblocks are stored in a linked list in increasing order. That is +to say, the first freeblock occurs at a lower index into the page than +the second free block, and so forth. The first 2 bytes of the header +are an integer which is the total number of bytes in the freeblock. +The second 2 bytes are the index into the page of the next freeblock +in the list. The last freeblock has a Next value of 0. +

+ +

+When a new b-tree is created in a database, the root page of the b-tree +consist of a header and a single 1016 byte freeblock. As entries are +added, space is carved off of that freeblock and used to make cells. +When b-tree entries are deleted, the space used by their cells is converted +into freeblocks. Adjacent freeblocks are merged, but the page can still +become fragmented. The b-tree code will occasionally try to defragment +the page by moving all cells to the beginning and constructing a single +freeblock at the end to take up all remaining space. +

+ +

3.3   The B-Tree Free Page List

+ +

+When information is removed from an SQLite database such that one or +more pages are no longer needed, those pages are added to a list of +free pages so that they can be reused later when new information is +added. This subsection describes the structure of this freelist. +

+ +

+The 32-bit integer beginning at byte-offset 52 in page 1 of the database +contains the address of the first page in a linked list of free pages. +If there are no free pages available, this integer has a value of 0. +The 32-bit integer at byte-offset 56 in page 1 contains the number of +free pages on the freelist. +

+ +

+The freelist contains a trunk and many branches. The trunk of +the freelist is composed of overflow pages. That is to say, each page +contains a single 32-bit integer at byte offset 0 which +is the page number of the next page on the freelist trunk. +The payload area +of each trunk page is used to record pointers to branch pages. +The first 32-bit integer in the payload area of a trunk page +is the number of branch pages to follow (between 0 and 254) +and each subsequent 32-bit integer is a page number for a branch page. +The following diagram shows the structure of a trunk freelist page: +

+ +
+ + + + + + + + + + + + + + + + + +
012345678 ... 1023
Next trunk page# of branch pagesPage numbers for branch pages
+
+ +

+It is important to note that only the pages on the trunk of the freelist +contain pointers to other pages. The branch pages contain no +data whatsoever. The fact that the branch pages are completely +blank allows for an important optimization in the paging layer. When +a branch page is removed from the freelist to be reused, it is not +necessary to write the original content of that page into the rollback +journal. The branch page contained no data to begin with, so there is +no need to restore the page in the event of a rollback. Similarly, +when a page is not longer needed and is added to the freelist as a branch +page, it is not necessary to write the content of that page +into the database file. +Again, the page contains no real data so it is not necessary to record the +content of that page. By reducing the amount of disk I/O required, +these two optimizations allow some database operations +to go four to six times faster than they would otherwise. +

+ +

4.0   The Schema Layer

+ +

+The schema layer implements an SQL database on top of one or more +b-trees and keeps track of the root page numbers for all b-trees. +Where the b-tree layer provides only unformatted data storage with +a unique key, the schema layer allows each entry to contain multiple +columns. The schema layer also allows indices and non-unique key values. +

+ +

+The schema layer implements two separate data storage abstractions: +tables and indices. Each table and each index uses its own b-tree +but they use the b-tree capabilities in different ways. For a table, +the b-tree key is a unique 4-byte integer and the b-tree data is the +content of the table row, encoded so that columns can be separately +extracted. For indices, the b-tree key varies in size depending on the +size of the fields being indexed and the b-tree data is empty. +

+ +

4.1   SQL Table Implementation Details

+ +

Each row of an SQL table is stored in a single b-tree entry. +The b-tree key is a 4-byte big-endian integer that is the ROWID +or INTEGER PRIMARY KEY for that table row. +The key is stored in a big-endian format so +that keys will sort in numerical order using memcmp() function.

+ +

The content of a table row is stored in the data portion of +the corresponding b-tree table. The content is encoded to allow +individual columns of the row to be extracted as necessary. Assuming +that the table has N columns, the content is encoded as N+1 offsets +followed by N column values, as follows: +

+ +
+ + + + + + + + + + + + +
offset 0offset 1...offset N-1offset Nvalue 0value 1...value N-1
+
+ +

+The offsets can be either 8-bit, 16-bit, or 24-bit integers depending +on how much data is to be stored. If the total size of the content +is less than 256 bytes then 8-bit offsets are used. If the total size +of the b-tree data is less than 65536 then 16-bit offsets are used. +24-bit offsets are used otherwise. Offsets are always little-endian, +which means that the least significant byte occurs first. +

+ +

+Data is stored as a nul-terminated string. Any empty string consists +of just the nul terminator. A NULL value is an empty string with no +nul-terminator. Thus a NULL value occupies zero bytes and an empty string +occupies 1 byte. +

+ +

+Column values are stored in the order that they appear in the CREATE TABLE +statement. The offsets at the beginning of the record contain the +byte index of the corresponding column value. Thus, Offset 0 contains +the byte index for Value 0, Offset 1 contains the byte offset +of Value 1, and so forth. The number of bytes in a column value can +always be found by subtracting offsets. This allows NULLs to be +recovered from the record unambiguously. +

+ +

+Most columns are stored in the b-tree data as described above. +The one exception is column that has type INTEGER PRIMARY KEY. +INTEGER PRIMARY KEY columns correspond to the 4-byte b-tree key. +When an SQL statement attempts to read the INTEGER PRIMARY KEY, +the 4-byte b-tree key is read rather than information out of the +b-tree data. But there is still an Offset associated with the +INTEGER PRIMARY KEY, just like any other column. But the Value +associated with that offset is always NULL. +

+ +

4.2   SQL Index Implementation Details

+ +

+SQL indices are implement using a b-tree in which the key is used +but the data is always empty. The purpose of an index is to map +one or more column values into the ROWID for the table entry that +contains those column values. +

+ +

+Each b-tree in an index consists of one or more column values followed +by a 4-byte ROWID. Each column value is nul-terminated (even NULL values) +and begins with a single character that indicates the datatype for that +column value. Only three datatypes are supported: NULL, Number, and +Text. NULL values are encoded as the character 'a' followed by the +nul terminator. Numbers are encoded as the character 'b' followed by +a string that has been crafted so that sorting the string using memcmp() +will sort the corresponding numbers in numerical order. (See the +sqliteRealToSortable() function in util.c of the SQLite sources for +additional information on this encoding.) Numbers are also nul-terminated. +Text values consists of the character 'c' followed by a copy of the +text string and a nul-terminator. These encoding rules result in +NULLs being sorted first, followed by numerical values in numerical +order, followed by text values in lexicographical order. +

+ +

4.4   SQL Schema Storage And Root B-Tree Page Numbers

+ +

+The database schema is stored in the database in a special tabled named +"sqlite_master" and which always has a root b-tree page number of 2. +This table contains the original CREATE TABLE, +CREATE INDEX, CREATE VIEW, and CREATE TRIGGER statements used to define +the database to begin with. Whenever an SQLite database is opened, +the sqlite_master table is scanned from beginning to end and +all the original CREATE statements are played back through the parser +in order to reconstruct an in-memory representation of the database +schema for use in subsequent command parsing. For each CREATE TABLE +and CREATE INDEX statement, the root page number for the corresponding +b-tree is also recorded in the sqlite_master table so that SQLite will +know where to look for the appropriate b-tree. +

+ +

+SQLite users can query the sqlite_master table just like any other table +in the database. But the sqlite_master table cannot be directly written. +The sqlite_master table is automatically updated in response to CREATE +and DROP statements but it cannot be changed using INSERT, UPDATE, or +DELETE statements as that would risk corrupting the database. +

+ +

+SQLite stores temporary tables and indices in a separate +file from the main database file. The temporary table database file +is the same structure as the main database file. The schema table +for the temporary tables is stored on page 2 just as in the main +database. But the schema table for the temporary database named +"sqlite_temp_master" instead of "sqlite_master". Other than the +name change, it works exactly the same. +

+ +

4.4   Schema Version Numbering And Other Meta-Information

+ +

+The nine 32-bit integers that are stored beginning at byte offset +60 of Page 1 in the b-tree layer are passed up into the schema layer +and used for versioning and configuration information. The meaning +of the first four integers is shown below. The other five are currently +unused. +

+ +
    +
  1. The schema version number
  2. +
  3. The format version number
  4. +
  5. The recommended pager cache size
  6. +
  7. The safety level
  8. +
+ +

+The first meta-value, the schema version number, is used to detect when +the schema of the database is changed by a CREATE or DROP statement. +Recall that when a database is first opened the sqlite_master table is +scanned and an internal representation of the tables, indices, views, +and triggers for the database is built in memory. This internal +representation is used for all subsequent SQL command parsing and +execution. But what if another process were to change the schema +by adding or removing a table, index, view, or trigger? If the original +process were to continue using the old schema, it could potentially +corrupt the database by writing to a table that no longer exists. +To avoid this problem, the schema version number is changed whenever +a CREATE or DROP statement is executed. Before each command is +executed, the current schema version number for the database file +is compared against the schema version number from when the sqlite_master +table was last read. If those numbers are different, the internal +schema representation is erased and the sqlite_master table is reread +to reconstruct the internal schema representation. +(Calls to sqlite_exec() generally return SQLITE_SCHEMA when this happens.) +

+ +

+The second meta-value is the schema format version number. This +number tells what version of the schema layer should be used to +interpret the file. There have been changes to the schema layer +over time and this number is used to detect when an older database +file is being processed by a newer version of the library. +As of this writing (SQLite version 2.7.0) the current format version +is "4". +

+ +

+The third meta-value is the recommended pager cache size as set +by the DEFAULT_CACHE_SIZE pragma. If the value is positive it +means that synchronous behavior is enable (via the DEFAULT_SYNCHRONOUS +pragma) and if negative it means that synchronous behavior is +disabled. +

+ +

+The fourth meta-value is safety level added in version 2.8.0. +A value of 1 corresponds to a SYNCHRONOUS setting of OFF. In other +words, SQLite does not pause to wait for journal data to reach the disk +surface before overwriting pages of the database. A value of 2 corresponds +to a SYNCHRONOUS setting of NORMAL. A value of 3 corresponds to a +SYNCHRONOUS setting of FULL. If the value is 0, that means it has not +been initialized so the default synchronous setting of NORMAL is used. +

+} +footer $rcsid -- cgit v1.1