<h1>SQLite<aclass="headerlink"href="#sqlite"title="Permalink to this heading">#</a></h1>
<spanclass="target"id="index-0"></span><pid="index-1">We want something like sqlite, but for <spanclass="target"id="index-2"></span>Graph Databases!</p>
<p>Most of the existing triple stores and graph databases are very heavyweight services that would be impractical for packaging in a portable daemon in the same way that sqlite works. Maybe we can learn from how sqlite works and do something similar for graph databases?</p>
<p>Questions:</p>
<ulclass="simple">
<li><p>Why can these be faster than, say, reading a plain .json file?</p></li>
<li><p>How do they differ architecturally from a traditional SQL server?</p></li>
</ul>
<sectionid="file-structure">
<h2>File Structure<aclass="headerlink"href="#file-structure"title="Permalink to this heading">#</a></h2>
<ulclass="simple">
<li><p>Main file</p></li>
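<li><p>Rollback Journal - stores the information needed to restore the database after a crash. Before a page is changed, its original content is copied into the journal; the changes are then written directly into the DB file. COMMIT occurs when the rollback journal is deleted (in the default DELETE journal mode).</p></li>
<li><p>Write-ahead Log - if in <a class="reference external" href="https://www.sqlite.org/wal.html">WAL mode</a>, updates are appended to the WAL file and the main DB file is left untouched until a checkpoint. COMMIT occurs when a commit record is appended to the WAL file (not when the main DB is written). Multiple transactions can accumulate in the WAL before being checkpointed back into the main DB.</p></li>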
</ul>
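<p>A minimal sketch of how the two journaling schemes show up on disk, using Python's standard <code class="docutils literal notranslate"><span class="pre">sqlite3</span></code> module. The helper name <code class="docutils literal notranslate"><span class="pre">sidecar_files</span></code> and the file name <code class="docutils literal notranslate"><span class="pre">demo.db</span></code> are made up for illustration. It assumes the filesystem supports WAL mode.</p>

```python
import os
import sqlite3
import tempfile


def sidecar_files(journal_mode):
    """Create a throwaway DB in the given journal mode, commit one
    transaction, and report which sidecar files exist afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")  # hypothetical file name
        conn = sqlite3.connect(path)
        # The PRAGMA returns the journal mode that is actually in effect.
        mode = conn.execute(f"PRAGMA journal_mode={journal_mode}").fetchone()[0]
        conn.execute("CREATE TABLE t (x)")
        conn.execute("INSERT INTO t VALUES (1)")
        conn.commit()
        # Look while the connection is still open: closing the last
        # connection in WAL mode normally checkpoints and removes the WAL.
        present = {s for s in ("-journal", "-wal") if os.path.exists(path + s)}
        conn.close()
        return mode, present
```

<p>In DELETE mode the journal should be gone once the commit returns, while in WAL mode the <code class="docutils literal notranslate"><span class="pre">-wal</span></code> file is expected to stick around until a checkpoint or the last connection closes.</p>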
<sectionid="pages">
<h3>Pages<aclass="headerlink"href="#pages"title="Permalink to this heading">#</a></h3>
<p>Pages are the basic unit of an sqlite file.</p>
<p>Numeracy:</p>
<ulclass="simple">
<li><p>The page size is a power of 2 between 512 and 65536 (inclusive)</p></li>
<li><p>All pages are the same size</p></li>
<li><p>Max <codeclass="docutils literal notranslate"><spanclass="pre">2^32</span><spanclass="pre">-</span><spanclass="pre">2</span></code> pages in a single DB.</p></li>
</ul>
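<p>A rough sketch of recovering these numbers from a file. It relies on the documented header layout, where the page size is a 2-byte big-endian integer at offset 16 and the value 1 stands for 65536. The function name <code class="docutils literal notranslate"><span class="pre">page_geometry</span></code> is hypothetical.</p>

```python
import os
import struct


def page_geometry(path):
    """Return (page_size, page_count) for the SQLite file at `path`."""
    with open(path, "rb") as f:
        header = f.read(100)
    # Page size: 2-byte big-endian integer at offset 16; 1 means 65536.
    (raw,) = struct.unpack_from(">H", header, 16)
    page_size = 65536 if raw == 1 else raw
    # Every page has the same size, so the count follows from the file size.
    return page_size, os.path.getsize(path) // page_size
```

<p>Deriving the count from the file size sidesteps the in-header size field, which older library versions did not always keep up to date.</p>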
<sectionid="types">
<h4>Types<aclass="headerlink"href="#types"title="Permalink to this heading">#</a></h4>
<p>Each page has a single type:</p>
<blockquote>
<div><ulclass="simple">
<li><p>The lock-byte page</p></li>
<li><p>A freelist page</p>
<ul>
<li><p>A freelist trunk page</p></li>
<li><p>A freelist leaf page</p></li>
</ul>
</li>
<li><p>A b-tree page</p>
<ul>
<li><p>A table b-tree interior page</p></li>
<li><p>A table b-tree leaf page</p></li>
<li><p>An index b-tree interior page</p></li>
<li><p>An index b-tree leaf page</p></li>
</ul>
</li>
<li><p>A payload overflow page</p></li>
<li><p>A pointer map page</p></li>
</ul>
</div></blockquote>
<sectionid="lock-byte">
<h5>Lock-byte<aclass="headerlink"href="#lock-byte"title="Permalink to this heading">#</a></h5>
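<p>The single page that contains the bytes at file offsets 1073741824 through 1073742335. It is never used for storage and only exists in files larger than 1 GiB. (An artifact of Windows 95 compatibility.)</p>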
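<p>For a given page size, the lock-byte page number can be worked out from the fixed byte offset 1073741824 given in the SQLite file format documentation. A tiny sketch (the names are made up for illustration):</p>

```python
LOCK_BYTE_OFFSET = 1 << 30  # 1073741824, offset of the first lock byte


def lock_byte_page(page_size):
    """1-based number of the page that contains the lock bytes."""
    return LOCK_BYTE_OFFSET // page_size + 1
```

<p>With 4096-byte pages this gives page 262145, so a file needs to grow past 1 GiB before the page exists at all.</p>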
</section>
<sectionid="freelist">
<h5>Freelist<aclass="headerlink"href="#freelist"title="Permalink to this heading">#</a></h5>
<p>Linked list of “trunks and leaves” to keep track of unused pages:</p>
<ulclass="simple">
<li><p>Trunk pages:</p>
<ul>
<li><p>An array of 4-byte big-endian integers that fills the page</p></li>
<li><p>First integer is the page number of the next trunk page (zero if it’s the last trunk)</p></li>
<li><p>Second integer is the number of leaf page numbers that follow</p></li>
</ul>
</li>
<li><p>Leaf pages:</p>
<ul>
<li><p>contain nothing!</p></li>
</ul>
</li>
</ul>
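<p>The trunk layout above can be walked with a few lines of Python. This is a sketch, not SQLite's own code: it assumes the page number of the first trunk page sits at header offset 32 (as per the file format documentation), and the function name <code class="docutils literal notranslate"><span class="pre">walk_freelist</span></code> is hypothetical.</p>

```python
import struct


def walk_freelist(path):
    """Return (trunk_pages, leaf_pages) found by following the trunk chain."""
    trunks, leaves = [], []
    with open(path, "rb") as f:
        header = f.read(100)
        (raw,) = struct.unpack_from(">H", header, 16)
        page_size = 65536 if raw == 1 else raw
        # Header offset 32: page number of the first freelist trunk page.
        (trunk,) = struct.unpack_from(">I", header, 32)
        while trunk != 0:
            trunks.append(trunk)
            f.seek((trunk - 1) * page_size)  # pages are numbered from 1
            page = f.read(page_size)
            # First integer: next trunk page. Second: number of leaf pointers.
            next_trunk, n_leaves = struct.unpack_from(">II", page, 0)
            leaves.extend(struct.unpack_from(f">{n_leaves}I", page, 8))
            trunk = next_trunk
    return trunks, leaves
```

<p>Trunks and leaves together should add up to what <code class="docutils literal notranslate"><span class="pre">PRAGMA</span> <span class="pre">freelist_count</span></code> reports. The sketch does no cycle detection, so it is only suitable for well-formed files.</p>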
</section>
<sectionid="b-tree">
<h5><spanclass="target"id="index-3"></span>B-tree<aclass="headerlink"href="#b-tree"title="Permalink to this heading">#</a></h5>
<p>(<aclass="reference external"href="https://en.wikipedia.org/wiki/B-tree">B-tree wiki page</a>)</p>
<p>Two types of b-trees: table and index</p>
<ulclass="simple">
<li><p><strong>Table B-Trees</strong>:</p>
<ul>
<li><p>One table b-tree in the db file for each <codeclass="docutils literal notranslate"><spanclass="pre">rowid</span></code> table in the database schema</p></li>
<li><p>Keys are 64-bit signed integers: the <code class="docutils literal notranslate"><span class="pre">rowid</span></code> of each row</p></li>
<li><p>Store all data in leaves (interior pages hold only keys and pointers to child pages)</p></li>
</ul>
</li>
<li><p><strong>Index B-Trees</strong>:</p>
<ul>
<li><p>One index b-tree for each index in the schema</p></li>
<li><p>Arbitrary keys</p></li>
<li><p>Store keys only; there is no separate data portion.</p></li>
</ul>
</li>
</ul>
<p>Two types of b-tree pages:</p>
<ulclass="simple">
<li><p><strong>Interior</strong></p></li>
<li><p><strong>Leaf</strong></p></li>
</ul>
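<p>Which of the four b-tree page kinds a page is can be read from the first byte of its b-tree page header. A minimal sketch, assuming the flag values from the SQLite file format documentation (2, 5, 10, 13) and remembering that on page 1 the b-tree header starts after the 100-byte database header. The function name is made up.</p>

```python
import struct

# Flag byte values for the four b-tree page kinds.
BTREE_PAGE_TYPES = {
    2: "interior index",
    5: "interior table",
    10: "leaf index",
    13: "leaf table",
}


def btree_page_type(path, page_number):
    """Return the b-tree page kind of `page_number`, or None if the
    first header byte is not a known b-tree flag."""
    with open(path, "rb") as f:
        header = f.read(100)
        (raw,) = struct.unpack_from(">H", header, 16)
        page_size = 65536 if raw == 1 else raw
        # Page 1 begins with the database header, so skip 100 bytes there.
        offset = (page_number - 1) * page_size
        if page_number == 1:
            offset += 100
        f.seek(offset)
        flag = f.read(1)[0]
    return BTREE_PAGE_TYPES.get(flag)
```

<p>A freshly created small table and its index should both have leaf pages as their roots, since everything still fits on a single page.</p>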
<divclass="admonition-todo admonition"id="id1">
<pclass="admonition-title">Todo</p>
<p>Describe freeblocks</p>
</div>
</section>
</section>
<sectionid="payload-overflow">
<h4>Payload Overflow<aclass="headerlink"href="#payload-overflow"title="Permalink to this heading">#</a></h4>
<blockquote>
<div><p>Define the “payload” of a cell to be the arbitrary length section of the cell.</p>
<ulclass="simple">
<li><p>For an index b-tree, the key is always arbitrary in length and hence the payload is the key.</p></li>
<li><p>There are no arbitrary length elements in the cells of interior table b-tree pages and so those cells have no payload.</p></li>
<li><p>Table b-tree leaf pages contain arbitrary length content and so for cells on those pages the payload is the content.</p></li>
</ul>
</div></blockquote>
<p>When a payload is bigger than some threshold<a class="footnote-reference brackets" href="#overflowthreshold" id="id2" role="doc-noteref"><span class="fn-bracket">[</span>1<span class="fn-bracket">]</span></a>, the excess is stored on a linked list of payload overflow pages. The first four bytes of each overflow page are a 4-byte big-endian integer giving the page number of the next page in the chain, or zero for the final page.</p>
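<p>Following an overflow chain is then just repeated pointer chasing. A small sketch that works on raw file bytes; the function name and arguments are hypothetical, and finding the first overflow page (which is recorded in the b-tree cell) is left out.</p>

```python
import struct


def overflow_chain(db_bytes, page_size, first_page):
    """Page numbers of an overflow chain, in order, starting at `first_page`."""
    chain = []
    page = first_page
    while page != 0:
        chain.append(page)
        # Pages are numbered from 1; the next-page pointer is the
        # big-endian integer in the first 4 bytes of the page.
        (page,) = struct.unpack_from(">I", db_bytes, (page - 1) * page_size)
    return chain
```

<p>There is no cycle check here, so a corrupt file could make this loop forever.</p>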
</section>
<sectionid="pointer-maps">
<h4>Pointer Maps<aclass="headerlink"href="#pointer-maps"title="Permalink to this heading">#</a></h4>
<p>Backlinks from child pages to their parents, present only in auto-vacuum and incremental-vacuum databases, to assist with vacuuming :)</p>
<p>Each pointermap page provides backlinks for the pages immediately following it.</p>
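<p>A sketch of the arithmetic for locating the ptrmap page that covers a given page. It assumes 5-byte entries, that the first ptrmap page is page 2, and that no bytes are reserved at the end of each page (usable size equals page size). It follows the calculation in SQLite's b-tree code as far as I understand it; the function names are made up.</p>

```python
LOCK_BYTE_OFFSET = 1 << 30  # byte offset of the lock-byte page


def ptrmap_page_for(page_number, page_size):
    """Page number of the ptrmap page holding the entry for `page_number`.

    Returns `page_number` itself if that page is a ptrmap page, and 0 for page 1.
    """
    if page_number < 2:
        return 0
    # One ptrmap page plus the pages whose 5-byte entries fit on it.
    pages_per_group = page_size // 5 + 1
    group = (page_number - 2) // pages_per_group
    ptrmap = group * pages_per_group + 2
    # The lock-byte page is never used, so a ptrmap page that would
    # land on it moves to the next page.
    if ptrmap == LOCK_BYTE_OFFSET // page_size + 1:
        ptrmap += 1
    return ptrmap


def ptrmap_entry_offset(page_number, page_size):
    """Byte offset of the entry for `page_number` within its ptrmap page."""
    return 5 * (page_number - ptrmap_page_for(page_number, page_size) - 1)
```

<p>With 4096-byte pages, page 2 covers pages 3 through 821, and page 822 is the next ptrmap page.</p>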
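<p>Each 5-byte ptrmap entry consists of:</p>
<ul class="simple">
<li><p>1 byte of page type information:</p>
<ul>
<li><p><code class="docutils literal notranslate"><span class="pre">1</span></code>: A b-tree root page</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">2</span></code>: A free page</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">3</span></code>: The first page of an overflow chain</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">4</span></code>: A later page of an overflow chain</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">5</span></code>: A non-root b-tree page</p></li>
</ul>
</li>
<li><p>4 bytes holding the big-endian parent page number (zero for root and free pages)</p></li>
</ul>
<p>Notable fields in the database header (the first 100 bytes of the file):</p>
<ul class="simple">
<li><p>Magic header string makes it easy to identify sqlite files</p></li>
<li><p>File change counter &amp; schema cookie - 4-byte integers. The change counter increments whenever the db file is unlocked after having been modified; the schema cookie increments whenever the schema changes. Useful for cache invalidation</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">version-valid-for-number</span></code> - stores the change counter value at the most recent modification, alongside the version number of the SQLite library that made it. Useful for judging whether behaviors like updating the in-header db size can be trusted, by knowing which version made a given change.</p></li>
</ul>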
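<p>The header fields mentioned here can be pulled out with <code class="docutils literal notranslate"><span class="pre">struct</span></code>. A sketch under the assumption that the offsets match the SQLite file format documentation (change counter at 24, schema cookie at 40, version-valid-for at 92, library version number at 96). The function and dictionary key names are made up.</p>

```python
import struct

MAGIC = b"SQLite format 3\x00"  # 16-byte magic header string


def read_header_fields(path):
    """Return a few fields from the 100-byte database header."""
    with open(path, "rb") as f:
        header = f.read(100)
    if header[:16] != MAGIC:
        raise ValueError("not an SQLite 3 database file")
    (change_counter,) = struct.unpack_from(">I", header, 24)
    (schema_cookie,) = struct.unpack_from(">I", header, 40)
    # The last 8 bytes: version-valid-for number, then the library version.
    version_valid_for, library_version = struct.unpack_from(">II", header, 92)
    return {
        "change_counter": change_counter,
        "schema_cookie": schema_cookie,
        "version_valid_for": version_valid_for,
        "library_version": library_version,
    }
```

<p>If <code class="docutils literal notranslate"><span class="pre">change_counter</span></code> and <code class="docutils literal notranslate"><span class="pre">version_valid_for</span></code> disagree, the file was probably last written by a library too old to maintain the newer header fields.</p>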
</section>
</section>
<sectionid="schema">
<h2>Schema<aclass="headerlink"href="#schema"title="Permalink to this heading">#</a></h2>
<sectionid="records">
<h3>Records<aclass="headerlink"href="#records"title="Permalink to this heading">#</a></h3>
</section>
<sectionid="tables">
<h3>Tables<aclass="headerlink"href="#tables"title="Permalink to this heading">#</a></h3>
</section>
<sectionid="indices">
<h3>Indices<aclass="headerlink"href="#indices"title="Permalink to this heading">#</a></h3>
</section>
</section>
<sectionid="i-o">
<h2>I/O<aclass="headerlink"href="#i-o"title="Permalink to this heading">#</a></h2>
<divclass="admonition-todo admonition"id="id3">
<pclass="admonition-title">Todo</p>
<p><strong>How does writing and querying an sqlite file actually work???</strong></p>
</div>
<p>All reads from and writes to the main database file happen at a page boundary.</p>
<p>All writes are an integer number of pages in size.</p>
<p>Most reads are also an integer number of pages in size, except opening the database which reads the header (first 100 bytes).</p>
</section>
<sectionid="see-also">
<h2>See also<aclass="headerlink"href="#see-also"title="Permalink to this heading">#</a></h2>
<ul class="simple">
<li><p><a class="reference external" href="https://www.sqlite.org/quirks.html">SQLite Quirks</a> - useful for understanding some design decisions</p></li>
<li><p><a class="reference external" href="https://www.sqlite.org/custombuild.html">Customization and Porting</a></p></li>
</ul>
<div><p>The overflow thresholds are designed to give a minimum fanout of 4 for index b-trees and to make sure enough of the payload is on the b-tree page that the record header can usually be accessed without consulting an overflow page. In hindsight, the designer of the SQLite b-tree logic realized that these thresholds could have been made much simpler. However, the computations cannot be changed without resulting in an incompatible file format. And the current computations work well, even if they are a little complex.</p>
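<p>For reference, a sketch of the threshold computation as I read it from the SQLite file format documentation, with U the usable page size and P the payload size. The function name and the <code class="docutils literal notranslate"><span class="pre">tree</span></code> argument are made up for illustration.</p>

```python
def local_payload_size(payload_size, usable_size, tree="table"):
    """Bytes of a payload that stay on the b-tree page itself.

    `tree` is "table" for table b-tree leaf cells, anything else for
    index b-tree cells. The remainder goes to overflow pages.
    """
    u, p = usable_size, payload_size
    if tree == "table":
        x = u - 35                     # largest payload kept fully local
    else:
        x = (u - 12) * 64 // 255 - 23  # same limit for index cells
    if p <= x:
        return p                       # fits, no overflow pages needed
    m = (u - 12) * 32 // 255 - 23      # minimum local portion
    # Try to size the local part so the overflow pages are filled exactly.
    k = m + (p - m) % (u - 4)
    return k if k <= x else m
```

<p>For example, with 4096 usable bytes a 5000-byte table row keeps 908 bytes on the leaf page and spills the remaining 4092 bytes, which is exactly one overflow page.</p>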