aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/libraries/sqlite/unix/sqlite-3.5.1/www/lockingv3.tcl
diff options
context:
space:
mode:
authordan miller2007-10-20 02:49:29 +0000
committerdan miller2007-10-20 02:49:29 +0000
commite36d23a85ebff914d74bb541558c2b6082b78edb (patch)
tree54b58fdf162e78af64055282a6035c8d2443389d /libraries/sqlite/unix/sqlite-3.5.1/www/lockingv3.tcl
parent* Fixed an issue whereby avatar chat distances were being calculated against ... (diff)
downloadopensim-SC-e36d23a85ebff914d74bb541558c2b6082b78edb.zip
opensim-SC-e36d23a85ebff914d74bb541558c2b6082b78edb.tar.gz
opensim-SC-e36d23a85ebff914d74bb541558c2b6082b78edb.tar.bz2
opensim-SC-e36d23a85ebff914d74bb541558c2b6082b78edb.tar.xz
sqlite source (unix build) added to libraries
Diffstat (limited to 'libraries/sqlite/unix/sqlite-3.5.1/www/lockingv3.tcl')
-rw-r--r--libraries/sqlite/unix/sqlite-3.5.1/www/lockingv3.tcl570
1 files changed, 570 insertions, 0 deletions
diff --git a/libraries/sqlite/unix/sqlite-3.5.1/www/lockingv3.tcl b/libraries/sqlite/unix/sqlite-3.5.1/www/lockingv3.tcl
new file mode 100644
index 0000000..c51cf79
--- /dev/null
+++ b/libraries/sqlite/unix/sqlite-3.5.1/www/lockingv3.tcl
@@ -0,0 +1,570 @@
1#
2# Run this script to generated a lockingv3.html output file
3#
4set rcsid {$Id: }
5source common.tcl
6header {File Locking And Concurrency In SQLite Version 3}
7
8proc HEADING {level title {label {}}} {
9 global pnum
10 incr pnum($level)
11 foreach i [array names pnum] {
12 if {$i>$level} {set pnum($i) 0}
13 }
14 set h [expr {$level+1}]
15 if {$h>6} {set h 6}
16 set n $pnum(1).$pnum(2)
17 for {set i 3} {$i<=$level} {incr i} {
18 append n .$pnum($i)
19 }
20 if {$label!=""} {
21 puts "<a name=\"$label\">"
22 }
23 puts "<h$h>$n $title</h$h>"
24}
25set pnum(1) 0
26set pnum(2) 0
27set pnum(3) 0
28set pnum(4) 0
29set pnum(5) 0
30set pnum(6) 0
31set pnum(7) 0
32set pnum(8) 0
33
34HEADING 1 {File Locking And Concurrency In SQLite Version 3}
35
36puts {
37<p>Version 3 of SQLite introduces a more complex locking and journaling
38mechanism designed to improve concurrency and reduce the writer starvation
39problem. The new mechanism also allows atomic commits of transactions
40involving multiple database files.
41This document describes the new locking mechanism.
42The intended audience is programmers who want to understand and/or modify
43the pager code and reviewers working to verify the design
44of SQLite version 3.
45</p>
46}
47
48HEADING 1 {Overview} overview
49
50puts {
51<p>
52Locking and concurrency control are handled by the the
53<a href="http://www.sqlite.org/cvstrac/getfile/sqlite/src/pager.c">
54pager module</a>.
55The pager module is responsible for making SQLite "ACID" (Atomic,
56Consistent, Isolated, and Durable). The pager module makes sure changes
57happen all at once, that either all changes occur or none of them do,
58that two or more processes do not try to access the database
59in incompatible ways at the same time, and that once changes have been
60written they persist until explicitly deleted. The pager also provides
61an memory cache of some of the contents of the disk file.</p>
62
63<p>The pager is unconcerned
64with the details of B-Trees, text encodings, indices, and so forth.
65From the point of view of the pager the database consists of
66a single file of uniform-sized blocks. Each block is called a
67"page" and is usually 1024 bytes in size. The pages are numbered
68beginning with 1. So the first 1024 bytes of the database are called
69"page 1" and the second 1024 bytes are call "page 2" and so forth. All
70other encoding details are handled by higher layers of the library.
71The pager communicates with the operating system using one of several
72modules
73(Examples:
74<a href="http://www.sqlite.org/cvstrac/getfile/sqlite/src/os_unix.c">
75os_unix.c</a>,
76<a href="http://www.sqlite.org/cvstrac/getfile/sqlite/src/os_win.c">
77os_win.c</a>)
78that provides a uniform abstraction for operating system services.
79</p>
80
81<p>The pager module effectively controls access for separate threads, or
82separate processes, or both. Throughout this document whenever the
83word "process" is written you may substitute the word "thread" without
84changing the truth of the statement.</p>
85}
86
87HEADING 1 {Locking} locking
88
89puts {
90<p>
91From the point of view of a single process, a database file
92can be in one of five locking states:
93</p>
94
95<p>
96<table cellpadding="20">
97<tr><td valign="top">UNLOCKED</td>
98<td valign="top">
99No locks are held on the database. The database may be neither read nor
100written. Any internally cached data is considered suspect and subject to
101verification against the database file before being used. Other
102processes can read or write the database as their own locking states
103permit. This is the default state.
104</td></tr>
105
106<tr><td valign="top">SHARED</td>
107<td valign="top">
108The database may be read but not written. Any number of
109processes can hold SHARED locks at the same time, hence there can be
110many simultaneous readers. But no other thread or process is allowed
111to write to the database file while one or more SHARED locks are active.
112</td></tr>
113
114<tr><td valign="top">RESERVED</td>
115<td valign="top">
116A RESERVED lock means that the process is planning on writing to the
117database file at some point in the future but that it is currently just
118reading from the file. Only a single RESERVED lock may be active at one
119time, though multiple SHARED locks can coexist with a single RESERVED lock.
120RESERVED differs from PENDING in that new SHARED locks can be acquired
121while there is a RESERVED lock.
122</td></tr>
123
124<tr><td valign="top">PENDING</td>
125<td valign="top">
126A PENDING lock means that the process holding the lock wants to write
127to the database as soon as possible and is just waiting on all current
128SHARED locks to clear so that it can get an EXCLUSIVE lock. No new
129SHARED locks are permitted against the database if
130a PENDING lock is active, though existing SHARED locks are allowed to
131continue.
132</td></tr>
133
134<tr><td valign="top">EXCLUSIVE</td>
135<td valign="top">
136An EXCLUSIVE lock is needed in order to write to the database file.
137Only one EXCLUSIVE lock is allowed on the file and no other locks of
138any kind are allowed to coexist with an EXCLUSIVE lock. In order to
139maximize concurrency, SQLite works to minimize the amount of time that
140EXCLUSIVE locks are held.
141</td></tr>
142</table>
143</p>
144
145<p>
146The operating system interface layer understands and tracks all five
147locking states described above.
148The pager module only tracks four of the five locking states.
149A PENDING lock is always just a temporary
150stepping stone on the path to an EXCLUSIVE lock and so the pager module
151does not track PENDING locks.
152</p>
153}
154
155HEADING 1 {The Rollback Journal} rollback
156
157puts {
158<p>Any time a process wants to make a changes to a database file, it
159first records enough information in the <em>rollback journal</em> to
160restore the database file back to its initial condition. Thus, before
161altering any page of the database, the original contents of that page
162must be written into the journal. The journal also records the initial
163size of the database so that if the database file grows it can be truncated
164back to its original size on a rollback.</p>
165
166<p>The rollback journal is a ordinary disk file that has the same name as
167the database file with the suffix "<tt>-journal</tt>" added.</p>
168
169<p>If SQLite is working with multiple databases at the same time
170(using the ATTACH command) then each database has its own journal.
171But there is also a separate aggregate journal
172called the <em>master journal</em>.
173The master journal does not contain page data used for rolling back
174changes. Instead the master journal contains the names of the
175individual file journals for each of the ATTACHed databases. Each of
176the individual file journals also contain the name of the master journal.
177If there are no ATTACHed databases (or if none of the ATTACHed database
178is participating in the current transaction) no master journal is
179created and the normal rollback journal contains an empty string
180in the place normally reserved for recording the name of the master
181journal.</p>
182
183<p>A individual file journal is said to be <em>hot</em>
184if it needs to be rolled back
185in order to restore the integrity of its database.
186A hot journal is created when a process is in the middle of a database
187update and a program or operating system crash or power failure prevents
188the update from completing.
189Hot journals are an exception condition.
190Hot journals exist to recover from crashes and power failures.
191If everything is working correctly
192(that is, if there are no crashes or power failures)
193you will never get a hot journal.
194</p>
195
196<p>
197If no master journal is involved, then
198a journal is hot if it exists and its corresponding database file
199does not have a RESERVED lock.
200If a master journal is named in the file journal, then the file journal
201is hot if its master journal exists and there is no RESERVED
202lock on the corresponding database file.
203It is important to understand when a journal is hot so the
204preceding rules will be repeated in bullets:
205</p>
206
207<ul>
208<li>A journal is hot if...
209 <ul>
210 <li>It exists, and</li>
211 <li>It's master journal exists or the master journal name is an
212 empty string, and</li>
213 <li>There is no RESERVED lock on the corresponding database file.</li>
214 </ul>
215</li>
216</ul>
217}
218
219HEADING 2 {Dealing with hot journals} hot_journals
220
221puts {
222<p>
223Before reading from a a database file, SQLite always checks to see if that
224database file has a hot journal. If the file does have a hot journal, then
225the journal is rolled back before the file is read. In this way, we ensure
226that the database file is in a consistent state before it is read.
227</p>
228
229<p>When a process wants to read from a database file, it followed
230the following sequence of steps:
231</p>
232
233<ol>
234<li>Open the database file and obtain a SHARED lock. If the SHARED lock
235 cannot be obtained, fail immediately and return SQLITE_BUSY.</li>
236<li>Check to see if the database file has a hot journal. If the file
237 does not have a hot journal, we are done. Return immediately.
238 If there is a hot journal, that journal must be rolled back by
239 the subsequent steps of this algorithm.</li>
240<li>Acquire a PENDING lock then an EXCLUSIVE lock on the database file.
241 (Note: Do not acquire a RESERVED lock because that would make
242 other processes think the journal was no longer hot.) If we
243 fail to acquire these locks it means another process
244 is already trying to do the rollback. In that case,
245 drop all locks, close the database, and return SQLITE_BUSY. </li>
246<li>Read the journal file and roll back the changes.</li>
247<li>Wait for the rolled back changes to be written onto
248 the surface of the disk. This protects the integrity of the database
249 in case another power failure or crash occurs.</li>
250<li>Delete the journal file.</li>
251<li>Delete the master journal file if it is safe to do so.
252 This step is optional. It is here only to prevent stale
253 master journals from cluttering up the disk drive.
254 See the discussion below for details.</li>
255<li>Drop the EXCLUSIVE and PENDING locks but retain the SHARED lock.</li>
256</ol>
257
258<p>After the algorithm above completes successfully, it is safe to
259read from the database file. Once all reading has completed, the
260SHARED lock is dropped.</p>
261}
262
263HEADING 2 {Deleting stale master journals} stale_master_journals
264
265puts {
266<p>A stale master journal is a master journal that is no longer being
267used for anything. There is no requirement that stale master journals
268be deleted. The only reason for doing so is to free up disk space.</p>
269
270<p>A master journal is stale if no individual file journals are pointing
271to it. To figure out if a master journal is stale, we first read the
272master journal to obtain the names of all of its file journals. Then
273we check each of those file journals. If any of the file journals named
274in the master journal exists and points back to the master journal, then
275the master journal is not stale. If all file journals are either missing
276or refer to other master journals or no master journal at all, then the
277master journal we are testing is stale and can be safely deleted.</p>
278}
279
280HEADING 1 {Writing to a database file} writing
281
282puts {
283<p>To write to a database, a process must first acquire a SHARED lock
284as described above (possibly rolling back incomplete changes if there
285is a hot journal).
286After a SHARED lock is obtained, a RESERVED lock must be acquired.
287The RESERVED lock signals that the process intends to write to the
288database at some point in the future. Only one process at a time
289can hold a RESERVED lock. But other processes can continue to read
290the database while the RESERVED lock is held.
291</p>
292
293<p>If the process that wants to write is unable to obtain a RESERVED
294lock, it must mean that another process already has a RESERVED lock.
295In that case, the write attempt fails and returns SQLITE_BUSY.</p>
296
297<p>After obtaining a RESERVED lock, the process that wants to write
298creates a rollback journal. The header of the journal is initialized
299with the original size of the database file. Space in the journal header
300is also reserved for a master journal name, though the master journal
301name is initially empty.</p>
302
303<p>Before making changes to any page of the database, the process writes
304the original content of that page into the rollback journal. Changes
305to pages are held in memory at first and are not written to the disk.
306The original database file remains unaltered, which means that other
307processes can continue to read the database.</p>
308
309<p>Eventually, the writing process will want to update the database
310file, either because its memory cache has filled up or because it is
311ready to commit its changes. Before this happens, the writer must
312make sure no other process is reading the database and that the rollback
313journal data is safely on the disk surface so that it can be used to
314rollback incomplete changes in the event of a power failure.
315The steps are as follows:</p>
316
317<ol>
318<li>Make sure all rollback journal data has actually been written to
319 the surface of the disk (and is not just being held in the operating
320 system's or disk controllers cache) so that if a power failure occurs
321 the data will still be there after power is restored.</li>
322<li>Obtain a PENDING lock and then an EXCLUSIVE lock on the database file.
323 If other processes are still have SHARED locks, the writer might have
324 to wait until those SHARED locks clear before it is able to obtain
325 an EXCLUSIVE lock.</li>
326<li>Write all page modifications currently held in memory out to the
327 original database disk file.</li>
328</ol>
329
330<p>
331If the reason for writing to the database file is because the memory
332cache was full, then the writer will not commit right away. Instead,
333the writer might continue to make changes to other pages. Before
334subsequent changes are written to the database file, the rollback
335journal must be flushed to disk again. Note also that the EXCLUSIVE
336lock that the writer obtained in order to write to the database initially
337must be held until all changes are committed. That means that no other
338processes are able to access the database from the
339time the memory cache first spills to disk until the transaction
340commits.
341</p>
342
343<p>
344When a writer is ready to commit its changes, it executes the following
345steps:
346</p>
347
348<ol>
349<li value="4">
350 Obtain an EXCLUSIVE lock on the database file and
351 make sure all memory changes have been written to the database file
352 using the algorithm of steps 1-3 above.</li>
353<li>Flush all database file changes to the disk. Wait for those changes
354 to actually be written onto the disk surface.</li>
355<li>Delete the journal file. This is the instant when the changes are
356 committed. Prior to deleting the journal file, if a power failure
357 or crash occurs, the next process to open the database will see that
358 it has a hot journal and will roll the changes back.
359 After the journal is deleted, there will no longer be a hot journal
360 and the changes will persist.
361 </li>
362<li>Drop the EXCLUSIVE and PENDING locks from the database file.
363 </li>
364</ol>
365
366<p>As soon as PENDING lock is released from the database file, other
367processes can begin reading the database again. In the current implementation,
368the RESERVED lock is also released, but that is not essential. Future
369versions of SQLite might provide a "CHECKPOINT" SQL command that will
370commit all changes made so far within a transaction but retain the
371RESERVED lock so that additional changes can be made without given
372any other process an opportunity to write.</p>
373
374<p>If a transaction involves multiple databases, then a more complex
375commit sequence is used, as follows:</p>
376
377<ol>
378<li value="4">
379 Make sure all individual database files have an EXCLUSIVE lock and a
380 valid journal.
381<li>Create a master-journal. The name of the master-journal is arbitrary.
382 (The current implementation appends random suffixes to the name of the
383 main database file until it finds a name that does not previously exist.)
384 Fill the master journal with the names of all the individual journals
385 and flush its contents to disk.
386<li>Write the name of the master journal into
387 all individual journals (in space set aside for that purpose in the
388 headers of the individual journals) and flush the contents of the
389 individual journals to disk and wait for those changes to reach the
390 disk surface.
391<li>Flush all database file changes to the disk. Wait for those changes
392 to actually be written onto the disk surface.</li>
393<li>Delete the master journal file. This is the instant when the changes are
394 committed. Prior to deleting the master journal file, if a power failure
395 or crash occurs, the individual file journals will be considered hot
396 and will be rolled back by the next process that
397 attempts to read them. After the master journal has been deleted,
398 the file journals will no longer be considered hot and the changes
399 will persist.
400 </li>
401<li>Delete all individual journal files.
402<li>Drop the EXCLUSIVE and PENDING locks from all database files.
403 </li>
404</ol>
405}
406
407HEADING 2 {Writer starvation} writer_starvation
408
409puts {
410<p>In SQLite version 2, if many processes are reading from the database,
411it might be the case that there is never a time when there are
412no active readers. And if there is always at least one read lock on the
413database, no process would ever be able to make changes to the database
414because it would be impossible to acquire a write lock. This situation
415is called <em>writer starvation</em>.</p>
416
417<p>SQLite version 3 seeks to avoid writer starvation through the use of
418the PENDING lock. The PENDING lock allows existing readers to continue
419but prevents new readers from connecting to the database. So when a
420process wants to write a busy database, it can set a PENDING lock which
421will prevent new readers from coming in. Assuming existing readers do
422eventually complete, all SHARED locks will eventually clear and the
423writer will be given a chance to make its changes.</p>
424}
425
426HEADING 1 {How To Corrupt Your Database Files} how_to_corrupt
427
428puts {
429<p>The pager module is robust but it is not completely failsafe.
430It can be subverted. This section attempts to identify and explain
431the risks.</p>
432
433<p>
434Clearly, a hardware or operating system fault that introduces incorrect data
435into the middle of the database file or journal will cause problems.
436Likewise,
437if a rogue process opens a database file or journal and writes malformed
438data into the middle of it, then the database will become corrupt.
439There is not much that can be done about these kinds of problems
440so they are given no further attention.
441</p>
442
443<p>
444SQLite uses POSIX advisory locks to implement locking on Unix. On
445windows it uses the LockFile(), LockFileEx(), and UnlockFile() system
446calls. SQLite assumes that these system calls all work as advertised. If
447that is not the case, then database corruption can result. One should
448note that POSIX advisory locking is known to be buggy or even unimplemented
449on many NFS implementations (including recent versions of Mac OS X)
450and that there are reports of locking problems
451for network filesystems under windows. Your best defense is to not
452use SQLite for files on a network filesystem.
453</p>
454
455<p>
456SQLite uses the fsync() system call to flush data to the disk under Unix and
457it uses the FlushFileBuffers() to do the same under windows. Once again,
458SQLite assumes that these operating system services function as advertised.
459But it has been reported that fsync() and FlushFileBuffers() do not always
460work correctly, especially with inexpensive IDE disks. Apparently some
461manufactures of IDE disks have defective controller chips that report
462that data has reached the disk surface when in fact the data is still
463in volatile cache memory in the disk drive electronics. There are also
464reports that windows sometimes chooses to ignore FlushFileBuffers() for
465unspecified reasons. The author cannot verify any of these reports.
466But if they are true, it means that database corruption is a possibility
467following an unexpected power loss. These are hardware and/or operating
468system bugs that SQLite is unable to defend against.
469</p>
470
471<p>
472If a crash or power failure occurs and results in a hot journal but that
473journal is deleted, the next process to open the database will not
474know that it contains changes that need to be rolled back. The rollback
475will not occur and the database will be left in an inconsistent state.
476Rollback journals might be deleted for any number of reasons:
477</p>
478
479<ul>
480<li>An administrator might be cleaning up after an OS crash or power failure,
481 see the journal file, think it is junk, and delete it.</li>
482<li>Someone (or some process) might rename the database file but fail to
483 also rename its associated journal.</li>
484<li>If the database file has aliases (hard or soft links) and the file
485 is opened by a different alias than the one used to create the journal,
486 then the journal will not be found. To avoid this problem, you should
487 not create links to SQLite database files.</li>
488<li>Filesystem corruption following a power failure might cause the
489 journal to be renamed or deleted.</li>
490</ul>
491
492<p>
493The last (fourth) bullet above merits additional comment. When SQLite creates
494a journal file on Unix, it opens the directory that contains that file and
495calls fsync() on the directory, in an effort to push the directory information
496to disk. But suppose some other process is adding or removing unrelated
497files to the directory that contains the database and journal at the the
498moment of a power failure. The supposedly unrelated actions of this other
499process might result in the journal file being dropped from the directory and
500moved into "lost+found". This is an unlikely scenario, but it could happen.
501The best defenses are to use a journaling filesystem or to keep the
502database and journal in a directory by themselves.
503</p>
504
505<p>
506For a commit involving multiple databases and a master journal, if the
507various databases were on different disk volumes and a power failure occurs
508during the commit, then when the machine comes back up the disks might
509be remounted with different names. Or some disks might not be mounted
510at all. When this happens the individual file journals and the master
511journal might not be able to find each other. The worst outcome from
512this scenario is that the commit ceases to be atomic.
513Some databases might be rolled back and others might not.
514All databases will continue to be self-consistent.
515To defend against this problem, keep all databases
516on the same disk volume and/or remount disks using exactly the same names
517after a power failure.
518</p>
519}
520
521HEADING 1 {Transaction Control At The SQL Level} transaction_control
522
523puts {
524<p>
525The changes to locking and concurrency control in SQLite version 3 also
526introduce some subtle changes in the way transactions work at the SQL
527language level.
528By default, SQLite version 3 operates in <em>autocommit</em> mode.
529In autocommit mode,
530all changes to the database are committed as soon as all operations associated
531with the current database connection complete.</p>
532
533<p>The SQL command "BEGIN TRANSACTION" (the TRANSACTION keyword
534is optional) is used to take SQLite out of autocommit mode.
535Note that the BEGIN command does not acquire any locks on the database.
536After a BEGIN command, a SHARED lock will be acquired when the first
537SELECT statement is executed. A RESERVED lock will be acquired when
538the first INSERT, UPDATE, or DELETE statement is executed. No EXCLUSIVE
539lock is acquired until either the memory cache fills up and must
540be spilled to disk or until the transaction commits. In this way,
541the system delays blocking read access to the file file until the
542last possible moment.
543</p>
544
545<p>The SQL command "COMMIT" does not actually commit the changes to
546disk. It just turns autocommit back on. Then, at the conclusion of
547the command, the regular autocommit logic takes over and causes the
548actual commit to disk to occur.
549The SQL command "ROLLBACK" also operates by turning autocommit back on,
550but it also sets a flag that tells the autocommit logic to rollback rather
551than commit.</p>
552
553<p>If the SQL COMMIT command turns autocommit on and the autocommit logic
554then tries to commit change but fails because some other process is holding
555a SHARED lock, then autocommit is turned back off automatically. This
556allows the user to retry the COMMIT at a later time after the SHARED lock
557has had an opportunity to clear.</p>
558
559<p>If multiple commands are being executed against the same SQLite database
560connection at the same time, the autocommit is deferred until the very
561last command completes. For example, if a SELECT statement is being
562executed, the execution of the command will pause as each row of the
563result is returned. During this pause other INSERT, UPDATE, or DELETE
564commands can be executed against other tables in the database. But none
565of these changes will commit until the original SELECT statement finishes.
566</p>
567}
568
569
570footer $rcsid