diff --git a/docs/src/SUMMARY.md b/docs/src/SUMMARY.md index 4555c20c6..91a180ab5 100644 --- a/docs/src/SUMMARY.md +++ b/docs/src/SUMMARY.md @@ -7,5 +7,7 @@ - [Replica Storage](./storage.md) - [Task Database](./taskdb.md) - [Tasks](./tasks.md) -- [Synchronization](./sync.md) +- [Synchronization and the Sync Server](./sync.md) + - [Synchronization Model](./sync-model.md) + - [Server-Replica Protocol](./sync-protocol.md) - [Planned Functionality](./plans.md) diff --git a/docs/src/sync-model.md b/docs/src/sync-model.md new file mode 100644 index 000000000..691312efa --- /dev/null +++ b/docs/src/sync-model.md @@ -0,0 +1,128 @@ +# Synchronization Model + +The [task database](./taskdb.md) also implements synchronization. +Synchronization occurs between disconnected replicas, mediated by a server. +The replicas never communicate directly with one another. +The server does not have access to the task data; it sees only opaque blobs of data with a small amount of metadata. + +The synchronization process is a critical part of the task database's functionality, and it cannot function efficiently without occasional synchronization operations. + +## Operational Transformations + +Synchronization is based on [operational transformation](https://en.wikipedia.org/wiki/Operational_transformation). +This section will assume some familiarity with the concept. + +## State and Operations + +At a given time, the set of tasks in a replica's storage is the essential "state" of that replica. +All modifications to that state occur via operations, as defined in [Replica Storage](./storage.md). +We can draw a network, or graph, with the nodes representing states and the edges representing operations. 
+For example: + +```text + o -- State: {abc-d123: 'get groceries', priority L} + | + | -- Operation: set abc-d123 priority to H + | + o -- State: {abc-d123: 'get groceries', priority H} +``` + +For those familiar with distributed version control systems, a state is analogous to a revision, while an operation is analogous to a commit. + +Fundamentally, synchronization involves all replicas agreeing on a single, linear sequence of operations and the state that those operations create. +Since the replicas are not connected, each may have additional operations that have been applied locally, but which have not yet been agreed on. +The synchronization process uses operational transformation to "linearize" those operations. +This process is analogous (vaguely) to rebasing a sequence of Git commits. + +### Versions + +Occasionally, database states are given a name (that takes the form of a UUID). +The system as a whole (all replicas) constructs a branch-free sequence of versions and the operations that separate each version from the next. +The version with the nil UUID is implicitly the empty database. + +The server stores the operations to change a state from a "parent" version to a "child" version, and provides that information as needed to replicas. +Replicas use this information to update their local task databases, and to generate new versions to send to the server. + +Replicas generate a new version to transmit local changes to the server. +The changes are represented as a sequence of operations with the state resulting from the final operation corresponding to the version. +In order to keep the versions in a single sequence, the server will only accept a proposed version from a replica if its parent version matches the latest version on the server. 
+ +In the non-conflict case (such as with a single replica), then, a replica's synchronization process involves gathering up the operations it has accumulated since its last synchronization; bundling those operations into a version; and sending that version to the server. + +### Replica Invariant + +The replica's [storage](./storage.md) contains the current state in `tasks`, the as-yet un-synchronized operations in `operations`, and the last version at which synchronization occurred in `base_version`. + +The replica's un-synchronized operations are already reflected in its local `tasks`, so the following invariant holds: + +> Applying `operations` to the set of tasks at `base_version` gives a set of tasks identical +> to `tasks`. + +### Transformation + +When the latest version on the server contains operations that are not present in the replica, then the states have diverged. +For example: + +```text + o -- version N + w|\a + o o + x| \b + o o + y| \c + o o -- replica's local state + z| + o -- version N+1 +``` + +(diagram notation: `o` designates a state, lower-case letters designate operations, and versions are presented as if they were numbered sequentially) + +In this situation, the replica must "rebase" the local operations onto the latest version from the server and try again. +This process is performed using operational transformation (OT). +The result of this transformation is a sequence of operations based on the latest version, and a sequence of operations the replica can apply to its local task database to reach the same state. +Continuing the example above, the resulting operations are shown with `'`: + +```text + o -- version N + w|\a + o o + x| \b + o o + y| \c + o o -- replica's intermediate local state + z| |w' + o-N+1 o + a'\ |x' + o o + b'\ |y' + o o + c'\|z' + o -- version N+2 +``` + +The replica applies w' through z' locally, and sends a' through c' to the server as the operations to generate version N+2. 
+Either path through this graph, a-b-c-w'-x'-y'-z' or w-x-y-z-a'-b'-c', must generate *precisely* the same final state at version N+2. +Careful selection of the operations and the transformation function ensures this. + +See the comments in the source code for the details of how this transformation process is implemented. + +## Synchronization Process + +To perform a synchronization, the replica first requests the child version of `base_version` from the server (GetChildVersion). +It applies that version to its local `tasks`, rebases its local `operations` as described above, and updates `base_version`. +The replica repeats this process until the server indicates no additional child versions exist. +If there are no un-synchronized local operations, the process is complete. + +Otherwise, the replica creates a new version containing its local operations, giving its `base_version` as the parent version, and transmits that to the server (AddVersion). +In most cases, this will succeed, but if another replica has created a new version in the interim, then the new version will conflict with that other replica's new version and the server will respond with the new expected parent version. +In this case, the process repeats. +If the server indicates a conflict twice with the same expected base version, that is an indication that the replica has diverged (something serious has gone wrong). + +## Servers + +A replica depends on periodic synchronization for performant operation. +Without synchronization, its list of pending operations would grow indefinitely, and tasks could never be expired. +So all replicas, even "singleton" replicas which do not replicate task data with any other replica, must synchronize periodically. + +TaskChampion provides a `LocalServer` for this purpose. +It implements the `get_child_version` and `add_version` operations as described, storing data on-disk locally, all within the `task` binary. 
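The `get_child_version` and `add_version` operations described in sync-model.md can be illustrated with a minimal in-memory sketch. This is not TaskChampion's `LocalServer`: the names `InMemoryServer`, `VersionId`, `Version`, and `AddVersionResult` are hypothetical, version IDs are modeled as `u128` values with `0` standing in for the nil UUID, and new IDs come from a counter rather than being generated as UUIDs.

```rust
use std::collections::HashMap;

/// Assumption: a version ID is modeled as a u128, with 0 standing in for the nil UUID.
pub type VersionId = u128;
pub const NIL_VERSION_ID: VersionId = 0;

/// One stored version: the operations leading from the parent state to this state.
#[derive(Debug, Clone, PartialEq)]
pub struct Version {
    pub version_id: VersionId,
    pub parent_version_id: VersionId,
    pub history_segment: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub enum AddVersionResult {
    /// The version was accepted and given this ID.
    Accepted(VersionId),
    /// The version was rejected; the payload is the parent version the server expects.
    Conflict(VersionId),
}

/// Hypothetical single-client server that keeps everything in memory.
#[derive(Default)]
pub struct InMemoryServer {
    latest_version_id: VersionId,
    /// Versions keyed by parent version ID, so each parent has at most one child.
    versions: HashMap<VersionId, Version>,
    /// Counter used to mint IDs (a stand-in for UUID generation).
    next_id: VersionId,
}

impl InMemoryServer {
    pub fn new() -> Self {
        Self::default()
    }

    /// Accept the version only if there are no versions yet, or if its parent
    /// is the latest version; otherwise report the expected parent.
    pub fn add_version(
        &mut self,
        parent_version_id: VersionId,
        history_segment: Vec<u8>,
    ) -> AddVersionResult {
        if self.latest_version_id != NIL_VERSION_ID
            && parent_version_id != self.latest_version_id
        {
            return AddVersionResult::Conflict(self.latest_version_id);
        }
        self.next_id += 1;
        let version_id = self.next_id;
        self.versions.insert(
            parent_version_id,
            Version { version_id, parent_version_id, history_segment },
        );
        self.latest_version_id = version_id;
        AddVersionResult::Accepted(version_id)
    }

    /// Return the version whose parent is `parent_version_id`, if any.
    pub fn get_child_version(&self, parent_version_id: VersionId) -> Option<Version> {
        self.versions.get(&parent_version_id).cloned()
    }
}
```

Keying the map by parent version ID is one way to keep the history branch-free: a second child for the same parent cannot be stored alongside the first. A replica holding a stale `base_version` receives `Conflict` with the latest version ID, at which point it would fetch child versions, rebase, and retry as described above.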
diff --git a/docs/src/sync-protocol.md b/docs/src/sync-protocol.md new file mode 100644 index 000000000..9a5caa247 --- /dev/null +++ b/docs/src/sync-protocol.md @@ -0,0 +1,92 @@ +# Server-Replica Protocol + +The server-replica protocol is defined abstractly in terms of request/response transactions from the replica to the server. +This is made concrete in an HTTP representation. + +The protocol builds on the model presented in the previous chapter, and in particular on the synchronization process. + +## Clients + +From the server's perspective, replicas are indistinguishable, so this protocol uses the term "client" to refer generically to all replicas replicating a single task history. + +## Server + +For each client, the server is responsible for storing the task history, in the form of a branch-free sequence of versions. + +For each client, it stores a set of versions as well as the latest version ID, defaulting to the nil UUID. +Each version has a version ID, a parent version ID, and a history segment (opaque data containing the operations for that version). +The server should maintain the following invariants: + +1. Given a client c, c.latestVersion is nil or exists in the set of versions. +1. Given versions v1 and v2 for a client, with v1.versionId != v2.versionId and v1.parentVersionId != nil, v1.parentVersionId != v2.parentVersionId. + In other words, versions do not branch. + +Note that versions form a linked list beginning with the version stored in the client. +This linked list need not continue back to a version with v.parentVersionId = nil. +It may end at any point when v.parentVersionId is not found in the set of versions. +This observation allows the server to discard older versions. + +## Transactions + +### AddVersion + +The AddVersion transaction requests that the server add a new version to the client's task history. 
+The request contains the following: + + * parent version ID + * history segment + +The server determines whether the new version is acceptable, atomically with respect to other requests for the same client. +If it has no versions for the client, it accepts the version. +If it already has one or more versions for the client, then it accepts the version only if the given parent version ID matches its stored latest version ID. + +If the version is accepted, the server generates a new version ID for it. +The version is added to the set of versions for the client, and the client's latest version ID is set to the new version ID. +The new version ID is returned in the response to the client. + +If the version is not accepted, the server makes no changes, but responds to the client with a conflict indication containing the latest version ID. +The client may then "rebase" its operations and try again. +Note that if a client receives two conflict responses with the same parent version ID, it is an indication that the client's version history has diverged from that on the server. + +### GetChildVersion + +The GetChildVersion transaction is a read-only request for a version. +The request consists of a parent version ID. +The server searches its set of versions for a version with the given parent ID. +If found, it returns the version's + + * version ID, + * parent version ID (matching that in the request), and + * history segment. + +If not found, the server returns a negative response. + +## HTTP Representation + +The transactions above are realized for an HTTP server at `<origin>` using the HTTP requests and responses described here. +The `origin` *should* be an HTTPS endpoint on general principle, but nothing in the functionality or security of the protocol depends on connection encryption. + +The replica identifies itself to the server using a `clientId` in the form of a UUID. + +### AddVersion + +The request is a `POST` to `/client//add-version/`. 
+The request body contains the history segment, optionally encoded using any encoding supported by actix-web. +The content-type must be `application/vnd.taskchampion.history-segment`. + +The success response is a 200 OK with an empty body. +The new version ID appears in the `X-Version-Id` header. + +On conflict, the response is a 409 CONFLICT with an empty body. +The expected parent version ID appears in the `X-Parent-Version-Id` header. + +Other error responses (4xx or 5xx) may be returned and should be treated according to their meanings in the HTTP specification. + +### GetChildVersion + +The request is a `GET` to `/client//get-child-version/`. +The response is 404 NOT FOUND if no such version exists. +Otherwise, the response is a 200 OK. +The version's history segment is returned in the response body, with content-type `application/vnd.taskchampion.history-segment`. +The version ID appears in the `X-Version-Id` header. +The response body may be encoded, in accordance with any `Accept-Encoding` header in the request. diff --git a/docs/src/sync.md b/docs/src/sync.md index cd2621cdc..fed75d17f 100644 --- a/docs/src/sync.md +++ b/docs/src/sync.md @@ -1,128 +1,7 @@ -# Synchronization +# Synchronization and the Sync Server -The [task database](./taskdb.md) also implements synchronization. -Synchronization occurs between disconnected replicas, mediated by a server. -The replicas never communicate directly with one another. -The server does not have access to the task data; it sees only opaque blobs of data with a small amount of metadata. +This section covers *synchronization* of *replicas* containing the same set of tasks. +A replica can perform all operations locally without connecting to a sync server, then share those operations with other replicas when it connects. +Sync is a critical feature of TaskChampion, allowing users to consult and update the same task list on multiple devices, without requiring constant connection. 
-The synchronization process is a critical part of the task database's functionality, and it cannot function efficiently without occasional synchronization operations - -## Operational Transformations - -Synchronization is based on [operational transformation](https://en.wikipedia.org/wiki/Operational_transformation). -This section will assume some familiarity with the concept. - -## State and Operations - -At a given time, the set of tasks in a replica's storage is the essential "state" of that replica. -All modifications to that state occur via operations, as defined in [Replica Storage](./storage.md). -We can draw a network, or graph, with the nodes representing states and the edges representing operations. -For example: - -```text - o -- State: {abc-d123: 'get groceries', priority L} - | - | -- Operation: set abc-d123 priority to H - | - o -- State: {abc-d123: 'get groceries', priority H} -``` - -For those familiar with distributed version control systems, a state is analogous to a revision, while an operation is analogous to a commit. - -Fundamentally, synchronization involves all replicas agreeing on a single, linear sequence of operations and the state that those operations create. -Since the replicas are not connected, each may have additional operations that have been applied locally, but which have not yet been agreed on. -The synchronization process uses operational transformation to "linearize" those operations. -This process is analogous (vaguely) to rebasing a sequence of Git commits. - -### Versions - -Occasionally, database states are given a name (that takes the form of a UUID). -The system as a whole (all replicas) constructs a branch-free sequence of versions and the operations that separate each version from the next. -The version with the nil UUID is implicitly the empty database. - -The server stores the operations to change a state from a "parent" version to a "child" version, and provides that information as needed to replicas. 
-Replicas use this information to update their local task databases, and to generate new versions to send to the server. - -Replicas generate a new version to transmit local changes to the server. -The changes are represented as a sequence of operations with the state resulting from the final operation corresponding to the version. -In order to keep the versions in a single sequence, the server will only accept a proposed version from a replica if its parent version matches the latest version on the server. - -In the non-conflict case (such as with a single replica), then, a replica's synchronization process involves gathering up the operations it has accumulated since its last synchronization; bundling those operations into a version; and sending that version to the server. - -### Replica Invariant - -The replica's [storage](./storage.md) contains the current state in `tasks`, the as-yet un-synchronized operations in `operations`, and the last version at which synchronization occurred in `base_version`. - -The replica's un-synchronized operations are already reflected in its local `tasks`, so the following invariant holds: - -> Applying `operations` to the set of tasks at `base_version` gives a set of tasks identical -> to `tasks`. - -### Transformation - -When the latest version on the server contains operations that are not present in the replica, then the states have diverged. -For example: - -```text - o -- version N - w|\a - o o - x| \b - o o - y| \c - o o -- replica's local state - z| - o -- version N+1 -``` - -(diagram notation: `o` designates a state, lower-case letters designate operations, and versions are presented as if they were numbered sequentially) - -In this situation, the replica must "rebase" the local operations onto the latest version from the server and try again. -This process is performed using operational transformation (OT). 
-The result of this transformation is a sequence of operations based on the latest version, and a sequence of operations the replica can apply to its local task database to reach the same state -Continuing the example above, the resulting operations are shown with `'`: - -```text - o -- version N - w|\a - o o - x| \b - o o - y| \c - o o -- replica's intermediate local state - z| |w' - o-N+1 o - a'\ |x' - o o - b'\ |y' - o o - c'\|z' - o -- version N+2 -``` - -The replica applies w' through z' locally, and sends a' through c' to the server as the operations to generate version N+2. -Either path through this graph, a-b-c-w'-x'-y'-z' or a'-b'-c'-w-x-y-z, must generate *precisely* the same final state at version N+2. -Careful selection of the operations and the transformation function ensure this. - -See the comments in the source code for the details of how this transformation process is implemented. - -## Synchronization Process - -To perform a synchronization, the replica first requests the child version of `base_version` from the server (`get_child_version`). -It applies that version to its local `tasks`, rebases its local `operations` as described above, and updates `base_version`. -The replica repeats this process until the server indicates no additional child versions exist. -If there are no un-synchronized local operations, the process is complete. - -Otherwise, the replica creates a new version containing its local operations, giving its `base_version` as the parent version, and transmits that to the server (`add_version`). -In most cases, this will succeed, but if another replica has created a new version in the interim, then the new version will conflict with that other replica's new version and the server will respond with the new expected parent version. -In this case, the process repeats. -If the server indicates a conflict twice with the same expected base version, that is an indication that the replica has diverged (something serious has gone wrong). 
- -## Servers - -A replica depends on periodic synchronization for performant operation. -Without synchronization, its list of pending operations would grow indefinitely, and tasks could never be expired. -So all replicas, even "singleton" replicas which do not replicate task data with any other replica, must synchronize periodically. - -TaskChampion provides a `LocalServer` for this purpose. -It implements the `get_child_version` and `add_version` operations as described, storing data on-disk locally, all within the `task` binary. +This is a complex topic, and the section is broken into several chapters, beginning at the lower levels of the implementation and working up.
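The HTTP representation added in sync-protocol.md maps status codes and headers onto the AddVersion and GetChildVersion outcomes. The sketch below shows one way a replica might interpret a response once an HTTP client has produced a status code, a header list, and a body. It is not TaskChampion's client code: every type and function name here (`AddVersionOutcome`, `GetChildVersionOutcome`, `ProtocolError`, `find_header`, `interpret_add_version`, `interpret_get_child_version`) is hypothetical, and version IDs are kept as plain strings rather than parsed as UUIDs.

```rust
/// Outcome of an AddVersion request, as seen by the replica.
#[derive(Debug, PartialEq)]
pub enum AddVersionOutcome {
    /// 200 OK: the server assigned this version ID (from `X-Version-Id`).
    Accepted { version_id: String },
    /// 409 CONFLICT: the server expects this parent (from `X-Parent-Version-Id`).
    Conflict { expected_parent_version_id: String },
}

/// Outcome of a GetChildVersion request, as seen by the replica.
#[derive(Debug, PartialEq)]
pub enum GetChildVersionOutcome {
    /// 200 OK: a child version exists.
    Found { version_id: String, history_segment: Vec<u8> },
    /// 404 NOT FOUND: no child version exists yet.
    NotFound,
}

#[derive(Debug, PartialEq)]
pub enum ProtocolError {
    MissingHeader(&'static str),
    UnexpectedStatus(u16),
}

/// Look up a header value; HTTP header names are compared case-insensitively.
fn find_header(headers: &[(&str, &str)], name: &'static str) -> Result<String, ProtocolError> {
    headers
        .iter()
        .find(|(key, _)| key.eq_ignore_ascii_case(name))
        .map(|(_, value)| value.to_string())
        .ok_or(ProtocolError::MissingHeader(name))
}

pub fn interpret_add_version(
    status: u16,
    headers: &[(&str, &str)],
) -> Result<AddVersionOutcome, ProtocolError> {
    match status {
        200 => Ok(AddVersionOutcome::Accepted {
            version_id: find_header(headers, "X-Version-Id")?,
        }),
        409 => Ok(AddVersionOutcome::Conflict {
            expected_parent_version_id: find_header(headers, "X-Parent-Version-Id")?,
        }),
        // Any other 4xx or 5xx is left to the caller to handle per HTTP semantics.
        other => Err(ProtocolError::UnexpectedStatus(other)),
    }
}

pub fn interpret_get_child_version(
    status: u16,
    headers: &[(&str, &str)],
    body: &[u8],
) -> Result<GetChildVersionOutcome, ProtocolError> {
    match status {
        200 => Ok(GetChildVersionOutcome::Found {
            version_id: find_header(headers, "X-Version-Id")?,
            history_segment: body.to_vec(),
        }),
        404 => Ok(GetChildVersionOutcome::NotFound),
        other => Err(ProtocolError::UnexpectedStatus(other)),
    }
}
```

Keeping the interpretation separate from the HTTP transport means the mapping can be exercised without a running server, and a `Conflict` outcome carries exactly the value a replica needs in order to rebase and retry.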