move docs back to top level

This commit is contained in:
Dustin J. Mitchell 2020-11-23 16:07:35 -05:00
parent 2296d0fa35
commit 084c978b31
12 changed files with 0 additions and 0 deletions


@@ -1 +0,0 @@
book


@@ -1,3 +0,0 @@
This is an [mdbook](https://rust-lang.github.io/mdBook/index.html) book.
Minor modifications can be made without installing the mdbook tool, as the content is simple Markdown.
Changes are verified on pull requests.


@@ -1,6 +0,0 @@
[book]
authors = ["Dustin J. Mitchell"]
language = "en"
multilingual = false
src = "src"
title = "TaskChampion"


@@ -1,11 +0,0 @@
# Summary
- [Installation](./installation.md)
- [Usage](./usage.md)
---
- [Data Model](./data-model.md)
- [Replica Storage](./storage.md)
- [Task Database](./taskdb.md)
- [Tasks](./tasks.md)
- [Synchronization](./sync.md)
- [Planned Functionality](./plans.md)


@@ -1,5 +0,0 @@
# Data Model
A client manages a single offline instance of a single user's task list, called a replica.
This section covers the structure of that data.
Note that this data model is visible only on the client; the server does not have access to client data.


@@ -1,3 +0,0 @@
# Installation
As this is currently in development, installation is by cloning the repository and running `cargo build`.


@@ -1,35 +0,0 @@
# Planned Functionality
This section is a bit of a to-do list for additional functionality to add to the synchronization system.
Each feature has some discussion of how it might be implemented.
## Snapshots
As designed, storage required on the server would grow with time, as would the time required for new clients to update to the latest version.
As an optimization, the server also stores "snapshots" containing a full copy of the task database at a given version.
Based on configurable heuristics, it may delete older operations and snapshots, as long as enough data remains for active clients to synchronize and for new clients to initialize.
Since snapshots must be computed by clients, the server may "request" a snapshot when providing the latest version to a client.
This request comes with a number indicating how much it "wants" the snapshot.
Clients which can easily generate and transmit a snapshot should be generous to the server, while clients with more limited resources can wait until the server's requests are more desperate.
The intent is, where possible, to request snapshots created on well-connected desktop clients over mobile and low-power clients.
## Encryption and Signing
From the server's perspective, all data except for version numbers are opaque binary blobs.
Clients encrypt and sign these blobs using a symmetric key known only to the clients.
This secures the data at-rest on the server.
Note that privacy is not complete, as the server still has some information about users, including source and frequency of synchronization transactions and size of those transactions.
## Backups
In this design, the server is little more than an authenticated storage for encrypted blobs provided by the client.
To allow for failure or data loss on the server, clients are expected to cache these blobs locally for a short time (a week), along with a server-provided HMAC signature.
When data loss is detected -- such as when a client expects the server to have a version N or higher, and the server only has N-1 -- the client can send those blobs to the server.
The server can validate the HMAC and, if successful, add the blobs to its datastore.
## Expiration
Deleted tasks remain in the task database, and are simply hidden in most views.
All tasks have an expiration time after which they may be flushed, preventing unbounded increase in task database size.
However, purging of a task does not satisfy the necessary OT guarantees, so some further formal design work is required before this is implemented.


@@ -1,41 +0,0 @@
# Replica Storage
Each replica has a storage backend.
The interface for this backend is given in `crate::taskstorage::TaskStorage` and `TaskStorageTxn`.
The storage is transaction-protected, with the expectation of a serializable isolation level.
The storage contains the following information:
- `tasks`: a set of tasks, indexed by UUID
- `base_version`: the number of the last version sync'd from the server (a single integer)
- `operations`: all operations performed since base_version
- `working_set`: a mapping from integer -> UUID, used to keep stable small-integer indexes into the tasks for users' convenience. This data is not synchronized with the server and does not affect any consistency guarantees.
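A minimal in-memory sketch of these contents, with `String` standing in for real UUID and operation types; the struct and helper here are illustrative, not the `TaskStorage` API:

```rust
use std::collections::HashMap;

// Hypothetical in-memory sketch of the storage contents listed above.
// The real backend sits behind the transactional `TaskStorage` /
// `TaskStorageTxn` traits.
struct StorageContents {
    // tasks indexed by UUID; each task is a key-value map (see Tasks)
    tasks: HashMap<String, HashMap<String, String>>,
    // number of the last version sync'd from the server
    base_version: u64,
    // operations performed since base_version
    operations: Vec<String>,
    // small-integer index -> UUID; never synchronized with the server
    working_set: HashMap<usize, String>,
}

// Version 0 is implicitly the empty database, so a fresh replica starts here.
fn empty_contents() -> StorageContents {
    StorageContents {
        tasks: HashMap::new(),
        base_version: 0,
        operations: Vec::new(),
        working_set: HashMap::new(),
    }
}

fn main() {
    let contents = empty_contents();
    assert_eq!(contents.base_version, 0);
    assert!(contents.tasks.is_empty());
    println!("tasks: {}", contents.tasks.len());
}
```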
## Tasks
The tasks are stored as an un-ordered collection, keyed by task UUID.
Each task in the database is represented by a key-value map.
See [Tasks](./tasks.md) for details on the content of that map.
## Operations
Every change to the task database is captured as an operation.
In other words, operations act as deltas between database states.
Operations are crucial to synchronization of replicas, using a technique known as Operational Transforms.
Each operation has one of the forms
* `Create(uuid)`
* `Delete(uuid)`
* `Update(uuid, property, value, timestamp)`
The Create form creates a new task.
It is invalid to create a task that already exists.
Similarly, the Delete form deletes an existing task.
It is invalid to delete a task that does not exist.
The Update form updates the given property of the given task, where property and value are both strings.
Value can also be `None` to indicate deletion of a property.
It is invalid to update a task that does not exist.
The timestamp on updates serves as additional metadata and is used to resolve conflicts.
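These forms and their validity rules can be sketched as follows; `String` stands in for real UUID and timestamp types, and `apply` is a hypothetical helper, not the crate's API:

```rust
use std::collections::HashMap;

type Task = HashMap<String, String>;

// Sketch of the three operation forms described above.
enum Operation {
    Create { uuid: String },
    Delete { uuid: String },
    Update { uuid: String, property: String, value: Option<String>, timestamp: String },
}

// Apply an operation, enforcing the validity rules from the text.
fn apply(tasks: &mut HashMap<String, Task>, op: &Operation) -> Result<(), String> {
    match op {
        Operation::Create { uuid } => {
            // invalid to create a task that already exists
            if tasks.contains_key(uuid) {
                return Err(format!("task {} already exists", uuid));
            }
            tasks.insert(uuid.clone(), Task::new());
        }
        Operation::Delete { uuid } => {
            // invalid to delete a task that does not exist
            if tasks.remove(uuid).is_none() {
                return Err(format!("task {} does not exist", uuid));
            }
        }
        Operation::Update { uuid, property, value, .. } => {
            // invalid to update a task that does not exist
            let task = tasks
                .get_mut(uuid)
                .ok_or_else(|| format!("task {} does not exist", uuid))?;
            match value {
                Some(v) => { task.insert(property.clone(), v.clone()); }
                None => { task.remove(property); } // None deletes the property
            }
        }
    }
    Ok(())
}

fn main() {
    let mut tasks = HashMap::new();
    let uuid = "abc-d123".to_string();
    apply(&mut tasks, &Operation::Create { uuid: uuid.clone() }).unwrap();
    apply(&mut tasks, &Operation::Update {
        uuid: uuid.clone(),
        property: "priority".into(),
        value: Some("H".into()),
        timestamp: "2020-11-23T14:21:22Z".into(),
    }).unwrap();
    assert_eq!(tasks[&uuid]["priority"], "H");
    // creating the same task again is invalid
    assert!(apply(&mut tasks, &Operation::Create { uuid }).is_err());
}
```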


@@ -1,120 +0,0 @@
# Synchronization
The [task database](./taskdb.md) also implements synchronization.
Synchronization occurs between disconnected replicas, mediated by a server.
The replicas never communicate directly with one another.
The server does not have access to the task data; it sees only opaque blobs of data with a small amount of metadata.
The synchronization process is a critical part of the task database's functionality, and it cannot function efficiently without occasional synchronization operations.
## Operational Transformations
Synchronization is based on [operational transformation](https://en.wikipedia.org/wiki/Operational_transformation).
This section will assume some familiarity with the concept.
## State and Operations
At a given time, the set of tasks in a replica's storage is the essential "state" of that replica.
All modifications to that state occur via operations, as defined in [Replica Storage](./storage.md).
We can draw a network, or graph, with the nodes representing states and the edges representing operations.
For example:
```text
o -- State: {abc-d123: 'get groceries', priority L}
|
| -- Operation: set abc-d123 priority to H
|
o -- State: {abc-d123: 'get groceries', priority H}
```
For those familiar with distributed version control systems, a state is analogous to a revision, while an operation is analogous to a commit.
Fundamentally, synchronization involves all replicas agreeing on a single, linear sequence of operations and the state that those operations create.
Since the replicas are not connected, each may have additional operations that have been applied locally, but which have not yet been agreed on.
The synchronization process uses operational transformation to "linearize" those operations.
This process is analogous (vaguely) to rebasing a sequence of Git commits.
### Versions
Occasionally, database states are named with an integer, called a version.
The system as a whole (all replicas) constructs a monotonic sequence of versions and the operations that separate each version from the next.
No gaps are allowed in the version numbering.
Version 0 is implicitly the empty database.
The server stores the operations to change a state from a version N to a version N+1, and provides that information as needed to replicas.
Replicas use this information to update their local task databases, and to generate new versions to send to the server.
Replicas generate a new version to transmit changes made locally to the server.
The changes are represented as a sequence of operations with the state resulting from the final operation corresponding to the version.
In order to keep the gap-free monotonic numbering, the server will only accept a proposed version from a replica if its number is one greater than the latest version on the server.
In the non-conflict case (such as with a single replica), then, a replica's synchronization process involves gathering up the operations it has accumulated since its last synchronization; bundling those operations into version N+1; and sending that version to the server.
### Transformation
When the latest version on the server contains operations that are not present in the replica, then the states have diverged.
For example (with lower-case letters designating operations):
```text
o -- version N
w|\a
o o
x| \b
o o
y| \c
o o -- replica's local state
z|
o -- version N+1
```
In this situation, the replica must "rebase" the local operations onto the latest version from the server and try again.
This process is performed using operational transformation (OT).
The result of this transformation is a sequence of operations based on the latest version, and a sequence of operations the replica can apply to its local task database to reach the same state.
Continuing the example above, the resulting operations are shown with `'`:
```text
o -- version N
w|\a
o o
x| \b
o o
y| \c
o o -- replica's intermediate local state
z| |w'
o-N+1 o
a'\ |x'
o o
b'\ |y'
o o
c'\|z'
o -- version N+2
```
The replica applies w' through z' locally, and sends a' through c' to the server as the operations to generate version N+2.
Either path through this graph, a-b-c-w'-x'-y'-z' or a'-b'-c'-w-x-y-z, must generate *precisely* the same final state at version N+2.
Careful selection of the operations and the transformation function ensure this.
See the comments in the source code for the details of how this transformation process is implemented.
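As an illustration only (not the crate's actual transform), one case can be sketched: two concurrent `Update`s to the same property of the same task, resolved by the later timestamp, as described under Tasks:

```rust
// Illustrative sketch of one case of the OT transformation function.
// `Update` here is a simplified stand-in: strings for UUIDs, and a plain
// integer (UNIX epoch seconds) for the timestamp.
#[derive(Clone, Debug, PartialEq)]
struct Update {
    uuid: String,
    property: String,
    value: String,
    timestamp: u64,
}

// Given local operation o1 and server operation o2, return (o1', o2') such
// that the paths o2;o1' and o1;o2' reach *precisely* the same state.
// For conflicting updates, the later timestamp wins and the other operation
// is dropped (becomes a no-op) on the opposite path.
fn transform(o1: Update, o2: Update) -> (Option<Update>, Option<Update>) {
    if o1.uuid == o2.uuid && o1.property == o2.property {
        if o1.timestamp >= o2.timestamp {
            (Some(o1), None)
        } else {
            (None, Some(o2))
        }
    } else {
        // independent updates commute unchanged
        (Some(o1), Some(o2))
    }
}

fn main() {
    let a = Update {
        uuid: "t1".into(), property: "tags".into(),
        value: "oldtag,newtag1".into(), timestamp: 100,
    };
    let b = Update {
        uuid: "t1".into(), property: "tags".into(),
        value: "oldtag,newtag2".into(), timestamp: 200,
    };
    let (a2, b2) = transform(a, b.clone());
    assert_eq!(a2, None);      // the earlier update is dropped
    assert_eq!(b2, Some(b));   // the later update survives on both paths
}
```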
## Replica Implementation
The replica's [storage](./storage.md) contains the current state in `tasks`, the as-yet un-synchronized operations in `operations`, and the last version at which synchronization occurred in `base_version`.
To perform a synchronization, the replica first requests any versions greater than `base_version` from the server, and rebases any local operations on top of those new versions, updating `base_version`.
If there are no un-synchronized local operations, the process is complete.
Otherwise, the replica creates a new version containing those local operations and uploads that to the server.
In most cases, this will succeed, but if another replica has created a new version in the interim, then the new version will conflict with that other replica's new version.
In this case, the process repeats.
The replica's un-synchronized operations are already reflected in `tasks`, so the following invariant holds:
> Applying `operations` to the set of tasks at `base_version` gives a set of tasks identical
> to `tasks`.
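The loop described above can be sketched as follows, with operations as plain strings, the OT rebase step elided, and `Server` as a hypothetical in-process stand-in for the real server:

```rust
// Sketch only: the point is the control flow of synchronization, not the
// real types. Version N is stored at index N-1.
struct Server { versions: Vec<Vec<String>> }

impl Server {
    fn get_version(&self, n: u64) -> Option<&Vec<String>> {
        self.versions.get(n as usize - 1)
    }
    // Accept a new version only if its number is one greater than the latest.
    fn add_version(&mut self, n: u64, ops: Vec<String>) -> bool {
        if n == self.versions.len() as u64 + 1 { self.versions.push(ops); true }
        else { false }
    }
}

struct Replica { base_version: u64, operations: Vec<String> }

fn sync(replica: &mut Replica, server: &mut Server) {
    loop {
        // 1. fetch any versions newer than base_version
        while let Some(_ops) = server.get_version(replica.base_version + 1) {
            // ...the real implementation rebases replica.operations over
            // _ops here, using operational transformation...
            replica.base_version += 1;
        }
        // 2. no un-synchronized local operations: done
        if replica.operations.is_empty() { return; }
        // 3. propose a new version
        let ops = std::mem::take(&mut replica.operations);
        if server.add_version(replica.base_version + 1, ops.clone()) {
            replica.base_version += 1;
            return;
        }
        // conflict: another replica published first; restore and repeat
        replica.operations = ops;
    }
}

fn main() {
    let mut server = Server { versions: vec![] };
    let mut r1 = Replica { base_version: 0, operations: vec!["create t1".into()] };
    sync(&mut r1, &mut server);
    assert_eq!(r1.base_version, 1);

    // a second replica catches up to version 1, then publishes version 2
    let mut r2 = Replica { base_version: 0, operations: vec!["create t2".into()] };
    sync(&mut r2, &mut server);
    assert_eq!(r2.base_version, 2);
}
```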
## Server Implementation
The server implementation is simple.
It supports fetching versions keyed by number, and adding a new version.
In adding a new version, the version number must be one greater than the greatest existing version.
Critically, the server operates on nothing more than numbered, opaque blobs of data.


@@ -1,28 +0,0 @@
# Task Database
The task database is a layer of abstraction above the replica storage layer, responsible for maintaining some important invariants.
While the storage is pluggable, there is only one implementation of the task database.
## Reading Data
The task database provides read access to the data in the replica's storage through a variety of methods on the struct.
Each read operation is executed in a transaction, so data may not be consistent between read operations.
In practice, this is not an issue for TaskChampion's purposes.
## Working Set
The task database maintains the working set.
The working set maps small integers to current tasks, for easy reference by command-line users.
This is done in such a way that the task numbers remain stable until the working set is rebuilt, at which point gaps in the numbering, such as for completed tasks, are removed by shifting all higher-numbered tasks downward.
The working set is not replicated, and is not considered a part of any consistency guarantees in the task database.
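A sketch of the rebuild step, assuming a hypothetical `pending` predicate for the status check:

```rust
// Sketch of a working-set rebuild: tasks that are no longer pending drop
// out, and higher-numbered tasks shift down to close the gaps, keeping the
// small integers dense. Index 0 holds task number 1, and so on.
fn rebuild(working_set: &[&str], pending: &dyn Fn(&str) -> bool) -> Vec<String> {
    working_set
        .iter()
        .copied()
        .filter(|uuid| pending(uuid))
        .map(str::to_string)
        .collect()
}

fn main() {
    let ws = ["uuid-a", "uuid-b", "uuid-c"];
    // suppose uuid-b was completed since the last rebuild
    let pending = |uuid: &str| uuid != "uuid-b";
    let rebuilt = rebuild(&ws, &pending);
    // uuid-c shifts downward from number 3 to number 2
    assert_eq!(rebuilt, vec!["uuid-a", "uuid-c"]);
}
```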
## Modifying Data
Modifications to the data set are made by applying operations.
Operations are described in [Replica Storage](./storage.md).
Each operation is added to the list of operations in the storage, and simultaneously applied to the tasks in that storage.
Operations are checked for validity as they are applied.


@@ -1,38 +0,0 @@
# Tasks
Tasks are stored internally as a key/value map with string keys and values.
All fields are optional: the `Create` operation creates an empty task.
Display layers should apply appropriate defaults where necessary.
## Atomicity
The synchronization process does not support read-modify-write operations.
For example, suppose tags are updated by reading a list of tags, adding a tag, and writing the result back.
This would be captured as an `Update` operation containing the amended list of tags.
Suppose two such `Update` operations are made in different replicas and must be reconciled:
* `Update("d394be59-60e6-499e-b7e7-ca0142648409", "tags", "oldtag,newtag1", "2020-11-23T14:21:22Z")`
* `Update("d394be59-60e6-499e-b7e7-ca0142648409", "tags", "oldtag,newtag2", "2020-11-23T15:08:57Z")`
The result of this reconciliation will be `oldtag,newtag2`, while the user almost certainly intended `oldtag,newtag1,newtag2`.
The key names given below avoid this issue, allowing user updates such as adding a tag or deleting a dependency to be represented in a single `Update` operation.
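For illustration, applying per-key updates with a hypothetical helper shows why the two concurrent tag additions no longer collide:

```rust
use std::collections::HashMap;

// With per-tag keys like `tag.<tag>`, concurrent tag additions touch
// different keys, so both survive reconciliation in either order.
// `apply_update` is a hypothetical helper, not the crate's API.
fn apply_update(task: &mut HashMap<String, String>, key: &str, value: Option<&str>) {
    match value {
        Some(v) => { task.insert(key.to_string(), v.to_string()); }
        None => { task.remove(key); } // a None value deletes the property
    }
}

fn main() {
    let mut task = HashMap::new();
    apply_update(&mut task, "tag.oldtag", Some(""));
    // replica A adds newtag1; replica B concurrently adds newtag2
    apply_update(&mut task, "tag.newtag1", Some(""));
    apply_update(&mut task, "tag.newtag2", Some(""));
    // no read-modify-write: both tags are present after reconciliation
    assert!(task.contains_key("tag.newtag1"));
    assert!(task.contains_key("tag.newtag2"));
}
```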
## Representations
Integers are stored in decimal notation.
Timestamps are stored as UNIX epoch timestamps, in the form of an integer.
## Keys
The following keys, and key formats, are defined:
* `status` - one of `P` for a pending task (the default), `C` for completed, or `D` for deleted
* `description` - the one-line summary of the task
* `modified` - the time of the last modification of this task
The following are not yet implemented:
* `dep.<uuid>` - indicates this task depends on `<uuid>` (value is an empty string)
* `tag.<tag>` - indicates this task has tag `<tag>` (value is an empty string)
* `annotation.<timestamp>` - value is an annotation created at the given time
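For illustration, a pending task using the implemented keys might be stored as follows (UUID omitted and values invented):

```rust
use std::collections::HashMap;

// A hypothetical pending task expressed with the keys defined above.
fn sample_task() -> HashMap<String, String> {
    HashMap::from([
        ("status".to_string(), "P".to_string()),
        ("description".to_string(), "get groceries".to_string()),
        ("modified".to_string(), "1606141282".to_string()), // UNIX epoch seconds
    ])
}

fn main() {
    let task = sample_task();
    assert_eq!(task["status"], "P"); // pending is the default status
}
```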


@@ -1,4 +0,0 @@
# Usage
The main interface to your tasks is the `task` command, which supports various subcommands.
You can find a quick list of all subcommands with `task help`.