Add support for cloud sync, specifically GCP (#3223)

* Add support for cloud sync, specifically GCP

This adds generic support for sync to cloud services, with specific
support for GCP. Adding others -- so long as they support a
compare-and-set operation -- should be comparatively straightforward.

The cloud support includes cleanup of unnecessary data, and should keep
total space usage roughly proportional to the number of tasks.

Co-authored-by: ryneeverett <ryneeverett@gmail.com>
Dustin J. Mitchell 2024-01-21 12:36:37 -05:00 committed by GitHub
parent 6f1c16fecd
commit 9566c929e2
36 changed files with 4012 additions and 401 deletions


@ -11,4 +11,7 @@
* [Synchronization Model](./sync-model.md)
* [Snapshots](./snapshots.md)
* [Server-Replica Protocol](./sync-protocol.md)
* [Encryption](./encryption.md)
* [HTTP Implementation](./http.md)
* [Object-Store Implementation](./object-store.md)
* [Planned Functionality](./plans.md)


@ -0,0 +1,38 @@
# Encryption
The client configuration includes an encryption secret of arbitrary length.
This section describes how that information is used to encrypt and decrypt data sent to the server (versions and snapshots).
Encryption is not used for local (on-disk) sync, but is used for all cases where data is sent from the local host.
## Key Derivation
The client derives the 32-byte encryption key from the configured encryption secret using PBKDF2 with HMAC-SHA256 and 100,000 iterations.
The salt value depends on the implementation of the protocol, as described in subsequent chapters.
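
As a minimal sketch of the derivation described above, using only Python's standard library: the function name `derive_key` and the constant names are illustrative, not part of TaskChampion, and the salt is passed in because its value depends on the protocol implementation.

```python
import hashlib

# Parameters taken from the text above.
PBKDF2_ITERATIONS = 100_000
KEY_LENGTH = 32  # bytes


def derive_key(secret: bytes, salt: bytes) -> bytes:
    """Derive the 32-byte encryption key with PBKDF2 / HMAC-SHA256.

    `derive_key` is a hypothetical helper name; the salt is supplied by
    the caller because it differs between protocol implementations.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", secret, salt, PBKDF2_ITERATIONS, dklen=KEY_LENGTH
    )
```

The same secret and salt always yield the same key, so every replica configured with the same secret can decrypt data written by the others.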
## Encryption
The client uses [AEAD](https://commondatastorage.googleapis.com/chromium-boringssl-docs/aead.h.html), with algorithm CHACHA20_POLY1305.
The client should generate a random nonce, noting that AEAD is _not secure_ if a nonce is used repeatedly for the same key.
AEAD supports additional authenticated data (AAD) which must be provided for both open and seal operations.
In this protocol, the AAD is always 17 bytes of the form:
* `app_id` (byte) - always 1
* `version_id` (16 bytes) - 16-byte form of the version ID associated with this data
* for versions (AddVersion, GetChildVersion), the _parent_ version_id
* for snapshots (AddSnapshot, GetSnapshot), the snapshot version_id
The `app_id` field is for future expansion to handle other, non-task data using this protocol.
Including it in the AAD ensures that such data cannot be confused with task data.
Although the AEAD specification distinguishes ciphertext and tags, for purposes of this specification they are considered concatenated into a single bytestring as in BoringSSL's `EVP_AEAD_CTX_seal`.
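
The 17-byte AAD layout above can be sketched as follows. This is an illustration only: `build_aad` and `APP_ID` are hypothetical names, and the caller is assumed to pass the parent version ID for versions or the snapshot version ID for snapshots.

```python
import uuid

APP_ID = 1  # always 1 for task data, per the text above


def build_aad(version_id: uuid.UUID) -> bytes:
    """Return the 17-byte AAD: one app_id byte followed by the
    16-byte form of the associated version ID."""
    return bytes([APP_ID]) + version_id.bytes
```

The resulting bytes would be passed unchanged as the additional data to both the seal and open operations of whichever CHACHA20_POLY1305 implementation is in use.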
## Representation
The final byte-stream is comprised of the following structure:
* `version` (byte) - format version (always 1)
* `nonce` (12 bytes) - encryption nonce
* `ciphertext` (remaining bytes) - ciphertext from sealing operation
The `version` field identifies this data format, and future formats will have a value other than 1 in this position.
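
A minimal sketch of this framing, assuming the sealing step has already produced the ciphertext (with tag appended). The helper names `pack_envelope` and `unpack_envelope` are hypothetical.

```python
from typing import Tuple

ENVELOPE_VERSION = 1
NONCE_LEN = 12


def pack_envelope(nonce: bytes, ciphertext: bytes) -> bytes:
    """Concatenate format version, nonce and ciphertext into one byte-stream."""
    if len(nonce) != NONCE_LEN:
        raise ValueError("nonce must be exactly 12 bytes")
    return bytes([ENVELOPE_VERSION]) + nonce + ciphertext


def unpack_envelope(data: bytes) -> Tuple[bytes, bytes]:
    """Split a byte-stream into (nonce, ciphertext), rejecting unknown formats."""
    if len(data) < 1 + NONCE_LEN:
        raise ValueError("envelope too short")
    if data[0] != ENVELOPE_VERSION:
        raise ValueError("unrecognized envelope version")
    return data[1 : 1 + NONCE_LEN], data[1 + NONCE_LEN :]
```

Rejecting any leading byte other than 1 leaves room for future formats to be introduced without being misread as this one.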


@ -0,0 +1,65 @@
# HTTP Representation
The transactions in the sync protocol are realized for an HTTP server at `<origin>` using the HTTP requests and responses described here.
The `origin` *should* be an HTTPS endpoint on general principle, but nothing in the functionality or security of the protocol depends on connection encryption.
The replica identifies itself to the server using a `client_id` in the form of a UUID.
This value is passed with every request in the `X-Client-Id` header, in its dashed-hex format.
The salt used in key derivation is the SHA256 hash of the 16-byte form of the client ID.
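
For illustration, the salt computation for the HTTP case might look like the following sketch. The function name `http_salt` is an assumption, not an identifier from the codebase.

```python
import hashlib
import uuid


def http_salt(client_id: uuid.UUID) -> bytes:
    """Salt for key derivation: SHA256 of the 16-byte form of the client ID."""
    return hashlib.sha256(client_id.bytes).digest()
```

Note that the hash is taken over the 16 raw bytes of the UUID, not over the dashed-hex string sent in the `X-Client-Id` header.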
## AddVersion
The request is a `POST` to `<origin>/v1/client/add-version/<parentVersionId>`.
The request body contains the history segment, optionally encoded using any encoding supported by actix-web.
The content-type must be `application/vnd.taskchampion.history-segment`.
The success response is a 200 OK with an empty body.
The new version ID appears in the `X-Version-Id` header.
If included, a snapshot request appears in the `X-Snapshot-Request` header with value `urgency=low` or `urgency=high`.
On conflict, the response is a 409 CONFLICT with an empty body.
The expected parent version ID appears in the `X-Parent-Version-Id` header.
Other error responses (4xx or 5xx) may be returned and should be treated appropriately to their meanings in the HTTP specification.
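
A client might interpret these responses roughly as in the sketch below. It assumes the HTTP library has already produced a status code and a header mapping; the function name and the returned tuple shape are hypothetical.

```python
from typing import Mapping, Optional, Tuple


def interpret_add_version(
    status: int, headers: Mapping[str, str]
) -> Tuple[str, Optional[str], Optional[str]]:
    """Map an AddVersion HTTP response to (outcome, version_id, snapshot_request).

    For "ok", version_id is the new version ID; for "conflict", it is the
    parent version ID the server expected.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    if status == 200:
        return ("ok", h["x-version-id"], h.get("x-snapshot-request"))
    if status == 409:
        return ("conflict", h["x-parent-version-id"], None)
    raise RuntimeError(f"unexpected AddVersion response status {status}")
```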
## GetChildVersion
The request is a `GET` to `<origin>/v1/client/get-child-version/<parentVersionId>`.
The response is determined as described above.
The _not-found_ response is 404 NOT FOUND.
The _gone_ response is 410 GONE.
Neither has a response body.
On success, the response is a 200 OK.
The version's history segment is returned in the response body, with content-type `application/vnd.taskchampion.history-segment`.
The version ID appears in the `X-Version-Id` header.
The response body may be encoded, in accordance with any `Accept-Encoding` header in the request.
On failure, a client should treat a 404 NOT FOUND as indicating that it is up-to-date.
Clients should treat a 410 GONE as a synchronization error.
If the client has pending changes to send to the server, based on a now-removed version, then those changes cannot be reconciled and will be lost.
The client should, optionally after consulting the user, download and apply the latest snapshot.
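
The three outcomes can be distinguished on the client side as in this sketch, which again assumes a status code, header mapping and body are already available. The name `interpret_get_child_version` and the outcome labels are illustrative.

```python
from typing import Mapping, Optional, Tuple


def interpret_get_child_version(
    status: int, headers: Mapping[str, str], body: bytes
) -> Tuple[str, Optional[str], Optional[bytes]]:
    """Map a GetChildVersion HTTP response to (outcome, version_id, history_segment)."""
    h = {name.lower(): value for name, value in headers.items()}
    if status == 200:
        # A child version exists; the body is its (encrypted) history segment.
        return ("version", h["x-version-id"], body)
    if status == 404:
        # not-found: the replica is up-to-date.
        return ("up-to-date", None, None)
    if status == 410:
        # gone: a synchronization error; the caller should fall back to
        # the latest snapshot, optionally after consulting the user.
        return ("gone", None, None)
    raise RuntimeError(f"unexpected GetChildVersion response status {status}")
```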
## AddSnapshot
The request is a `POST` to `<origin>/v1/client/add-snapshot/<versionId>`.
The request body contains the snapshot data, optionally encoded using any encoding supported by actix-web.
The content-type must be `application/vnd.taskchampion.snapshot`.
If the version is invalid, as described above, the response should be 400 BAD REQUEST.
The server response should be 200 OK on success.
## GetSnapshot
The request is a `GET` to `<origin>/v1/client/snapshot`.
The response is a 200 OK.
The snapshot is returned in the response body, with content-type `application/vnd.taskchampion.snapshot`.
The version ID appears in the `X-Version-Id` header.
The response body may be encoded, in accordance with any `Accept-Encoding` header in the request.
After downloading and decrypting a snapshot, a client must replace its entire local task database with the content of the snapshot.
Any local operations that had not yet been synchronized must be discarded.
After the snapshot is applied, the client should begin the synchronization process again, starting from the snapshot version.


@ -0,0 +1,9 @@
# Object Store Representation
TaskChampion also supports use of a generic key-value store to synchronize replicas.
In this case, the salt used in key derivation is a random 16-byte value, stored
in the object store and retrieved as needed.
The details of the mapping from this protocol to keys and values are private to the implementation.
Other applications should not access the key-value store directly.


@ -2,7 +2,7 @@
The basic synchronization model described in the previous page has a few shortcomings:
* servers must store an ever-increasing quantity of versions
* a new replica must download all versions since the beginning in order to derive the current state
* a new replica must download all versions since the beginning (the nil UUID) in order to derive the current state
Snapshots allow TaskChampion to avoid both of these issues.
A snapshot is a copy of the task database at a specific version.
@ -37,12 +37,3 @@ This saves resources in these restricted environments.
A snapshot must be made on a replica with no unsynchronized operations.
As such, it only makes sense to request a snapshot in response to a successful AddVersion request.
## Handling Deleted Versions
When a replica requests a child version, the response must distinguish two cases:
1. No such child version exists because the replica is up-to-date.
1. No such child version exists because it has been deleted, and the replica must re-initialize itself.
The details of this logic are covered in the [Server-Replica Protocol](./sync-protocol.md).


@ -32,7 +32,10 @@ For those familiar with distributed version control systems, a state is analogou
Fundamentally, synchronization involves all replicas agreeing on a single, linear sequence of operations and the state that those operations create.
Since the replicas are not connected, each may have additional operations that have been applied locally, but which have not yet been agreed on.
The synchronization process uses operational transformation to "linearize" those operations.
This process is analogous (vaguely) to rebasing a sequence of Git commits.
Critically, though, operations cannot merge; in effect, the only option is rebasing.
Furthermore, once an operation has been sent to the server it cannot be changed; in effect, the server does not permit "force push".
### Sync Operations
@ -135,4 +138,4 @@ Without synchronization, its list of pending operations would grow indefinitely,
So all replicas, even "singleton" replicas which do not replicate task data with any other replica, must synchronize periodically.
TaskChampion provides a `LocalServer` for this purpose.
It implements the `get_child_version` and `add_version` operations as described, storing data on-disk locally, all within the `ta` binary.
It implements the `get_child_version` and `add_version` operations as described, storing data on-disk locally.


@ -1,91 +1,42 @@
# Server-Replica Protocol
The server-replica protocol is defined abstractly in terms of request/response transactions from the replica to the server.
This is made concrete in an HTTP representation.
The server-replica protocol is defined abstractly in terms of request/response transactions.
The protocol builds on the model presented in the previous chapter, and in particular on the synchronization process.
The protocol builds on the model presented in the previous chapters, and in particular on the synchronization process.
## Clients
From the server's perspective, replicas accessing the same task history are indistinguishable, so this protocol uses the term "client" to refer generically to all replicas replicating a single task history.
Each client is identified and authenticated with a "client_id key", known only to the server and to the replicas replicating the task history.
From the protocol's perspective, replicas accessing the same task history are indistinguishable, so this protocol uses the term "client" to refer generically to all replicas replicating a single task history.
## Server
A server implements the requests and responses described below.
Where the logic is implemented depends on the specific implementation of the protocol.
For each client, the server is responsible for storing the task history, in the form of a branch-free sequence of versions.
It also stores the latest snapshot, if any exists.
From the server's perspective, snapshots and versions are opaque byte sequences.
* versions: a set of {versionId: UUID, parentVersionId: UUID, historySegment: bytes}
* latestVersionId: UUID
* snapshotVersionId: UUID
* snapshot: bytes
## Version Invariant
For each client, it stores a set of versions as well as the latest version ID, defaulting to the nil UUID.
Each version has a version ID, a parent version ID, and a history segment (opaque data containing the operations for that version).
The server should maintain the following invariants for each client:
The following invariant must always hold:
1. latestVersionId is nil or exists in the set of versions.
2. Given versions v1 and v2 for a client, with v1.versionId != v2.versionId and v1.parentVersionId != nil, v1.parentVersionId != v2.parentVersionId.
In other words, versions do not branch.
3. If snapshotVersionId is nil, then there is a version with parentVersionId == nil.
4. If snapshotVersionId is not nil, then there is a version with parentVersionId = snapshotVersionId.
Note that versions form a linked list beginning with the latestVersionId stored for the client.
This linked list need not continue back to a version with v.parentVersionId = nil.
It may end at any point when v.parentVersionId is not found in the set of Versions.
This observation allows the server to discard older versions.
The third invariant prevents the server from discarding versions if there is no snapshot.
The fourth invariant prevents the server from discarding versions newer than the snapshot.
> All versions are linked by parent-child relationships to form a single chain.
> That is, each version must have no more than one parent and one child, and no more than one version may have zero parents or zero children.
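
One way to check this invariant is sketched below, assuming the versions are available as a mapping from each version ID to its parent version ID (a representation chosen for this example, not taken from the implementation). A version whose parent ID is not in the mapping is treated as having zero parents.

```python
import uuid
from typing import Dict


def is_single_chain(parents: Dict[uuid.UUID, uuid.UUID]) -> bool:
    """Return True if the versions form one unbranched chain."""
    if not parents:
        return True
    # No two versions may share a parent (at most one child each).
    parent_ids = list(parents.values())
    if len(set(parent_ids)) != len(parent_ids):
        return False
    # Exactly one version may lack a parent within the set.
    roots = [v for v, p in parents.items() if p not in parents]
    if len(roots) != 1:
        return False
    # Walk child links from the root; every version must be reached.
    child_of = {p: v for v, p in parents.items()}
    seen = 0
    current = roots[0]
    while current is not None:
        seen += 1
        current = child_of.get(current)
    return seen == len(parents)
```

The final walk is what rules out a chain that coexists with a disconnected cycle, which the per-version parent and child counts alone would not detect.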
## Data Formats
### Encryption
The client configuration includes an encryption secret of arbitrary length and a clientId to identify itself.
This section describes how that information is used to encrypt and decrypt data sent to the server (versions and snapshots).
#### Key Derivation
The client derives the 32-byte encryption key from the configured encryption secret using PBKDF2 with HMAC-SHA256 and 100,000 iterations.
The salt is the SHA256 hash of the 16-byte form of the client ID.
#### Encryption
The client uses [AEAD](https://commondatastorage.googleapis.com/chromium-boringssl-docs/aead.h.html), with algorithm CHACHA20_POLY1305.
The client should generate a random nonce, noting that AEAD is _not secure_ if a nonce is used repeatedly for the same key.
AEAD supports additional authenticated data (AAD) which must be provided for both open and seal operations.
In this protocol, the AAD is always 17 bytes of the form:
* `app_id` (byte) - always 1
* `version_id` (16 bytes) - 16-byte form of the version ID associated with this data
* for versions (AddVersion, GetChildVersion), the _parent_ version_id
* for snapshots (AddSnapshot, GetSnapshot), the snapshot version_id
The `app_id` field is for future expansion to handle other, non-task data using this protocol.
Including it in the AAD ensures that such data cannot be confused with task data.
Although the AEAD specification distinguishes ciphertext and tags, for purposes of this specification they are considered concatenated into a single bytestring as in BoringSSL's `EVP_AEAD_CTX_seal`.
#### Representation
The final byte-stream is comprised of the following structure:
* `version` (byte) - format version (always 1)
* `nonce` (12 bytes) - encryption nonce
* `ciphertext` (remaining bytes) - ciphertext from sealing operation
The `version` field identifies this data format, and future formats will have a value other than 1 in this position.
Task data sent to the server is encrypted by the client, using the scheme described in the "Encryption" chapter.
### Version
The decrypted form of a version is a JSON array containing operations in the order they should be applied.
Each operation has the form `{TYPE: DATA}`, for example:
* `{"Create":{"uuid":"56e0be07-c61f-494c-a54c-bdcfdd52d2a7"}}`
* `{"Delete":{"uuid":"56e0be07-c61f-494c-a54c-bdcfdd52d2a7"}}`
* `{"Update":{"uuid":"56e0be07-c61f-494c-a54c-bdcfdd52d2a7","property":"prop","value":"v","timestamp":"2021-10-11T12:47:07.188090948Z"}}`
* `{"Update":{"uuid":"56e0be07-c61f-494c-a54c-bdcfdd52d2a7","property":"prop","value":null,"timestamp":"2021-10-11T12:47:07.188090948Z"}}` (to delete a property)
* `[{"Create":{"uuid":"56e0be07-c61f-494c-a54c-bdcfdd52d2a7"}}]`
* `[{"Delete":{"uuid":"56e0be07-c61f-494c-a54c-bdcfdd52d2a7"}}]`
* `[{"Update":{"uuid":"56e0be07-c61f-494c-a54c-bdcfdd52d2a7","property":"prop","value":"v","timestamp":"2021-10-11T12:47:07.188090948Z"}}]`
* `[{"Update":{"uuid":"56e0be07-c61f-494c-a54c-bdcfdd52d2a7","property":"prop","value":null,"timestamp":"2021-10-11T12:47:07.188090948Z"}}]` (to delete a property)
Timestamps are in RFC3339 format with a `Z` suffix.
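
As a rough sketch of reading this format, the decrypted bytes can be parsed into `(type, data)` pairs with the standard `json` module. The helper name `parse_version` is hypothetical, and the sketch does not validate the fields inside each operation.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_version(decrypted: bytes) -> List[Tuple[str, Dict[str, Any]]]:
    """Parse a decrypted version into a list of (operation type, data) pairs,
    preserving the order in which the operations should be applied."""
    operations = json.loads(decrypted)
    if not isinstance(operations, list):
        raise ValueError("a version must be a JSON array of operations")
    result = []
    for op in operations:
        # Each operation has the form {TYPE: DATA}: exactly one key.
        if not isinstance(op, dict) or len(op) != 1:
            raise ValueError("each operation must be a single-key object")
        ((op_type, op_data),) = op.items()
        result.append((op_type, op_data))
    return result
```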
@ -108,24 +59,25 @@ For example (pretty-printed for clarity):
## Transactions
All interactions between the client and server are defined in terms of request/response transactions, as described here.
### AddVersion
The AddVersion transaction requests that the server add a new version to the client's task history.
The request contains the following:
* parent version ID
* history segment
* parent version ID, and
* encrypted version data.
The server determines whether the new version is acceptable, atomically with respect to other requests for the same client.
If it has no versions for the client, it accepts the version.
If it already has one or more versions for the client, then it accepts the version only if the given parent version ID matches its stored latest parent ID.
If it already has one or more versions for the client, then it accepts the version only if the given parent version has no children, thereby maintaining the version invariant.
If the version is accepted, the server generates a new version ID for it.
The version is added to the set of versions for the client, the client's latest version ID is set to the new version ID.
The new version ID is returned in the response to the client.
The version is added to the chain of versions for the client, and the new version ID is returned in the response to the client.
The response may also include a request for a snapshot, with associated urgency.
If the version is not accepted, the server makes no changes, but responds to the client with a conflict indication containing the latest version ID.
If the version is not accepted, the server makes no changes, but responds to the client with a conflict indication containing the ID of the version which has no children.
The client may then "rebase" its operations and try again.
Note that if a client receives two conflict responses with the same parent version ID, it is an indication that the client's version history has diverged from that on the server.
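
The acceptance rule can be illustrated with a small in-memory model for a single client. This is a sketch under stated assumptions, not the server's implementation: the class and method names are invented, the model is single-threaded (a real server must make the check and the insert atomic, for example with a compare-and-set), and "the parent version has no children" is checked by comparing against the tip of the chain, which is equivalent while the version invariant holds.

```python
import uuid
from typing import Dict, Optional, Tuple

NIL = uuid.UUID(int=0)


class InMemoryServer:
    """Hypothetical single-client, single-threaded model of AddVersion."""

    def __init__(self) -> None:
        # version ID -> (parent version ID, encrypted version data)
        self.versions: Dict[uuid.UUID, Tuple[uuid.UUID, bytes]] = {}
        self.latest: Optional[uuid.UUID] = None  # the version with no children

    def add_version(self, parent_id: uuid.UUID, data: bytes) -> Tuple[str, uuid.UUID]:
        # With existing versions, accept only a child of the chain's tip.
        if self.versions and parent_id != self.latest:
            return ("conflict", self.latest)
        new_id = uuid.uuid4()  # the server generates the new version ID
        self.versions[new_id] = (parent_id, data)
        self.latest = new_id
        return ("ok", new_id)
```

A second replica that submits a version based on a stale parent receives the conflict result with the current tip, fetches the versions it is missing, rebases, and retries.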
@ -138,23 +90,17 @@ If found, it returns the version's
* version ID,
* parent version ID (matching that in the request), and
* history segment.
* encrypted version data.
The response is either a version (success), _not-found_, or _gone_, as determined by the first of the following to apply:
* If a version with parentVersionId equal to the requested parentVersionId exists, it is returned.
* If the requested parentVersionId is the nil UUID ..
* ..and snapshotVersionId is nil, the response is _not-found_ (the client has no versions).
* ..and snapshotVersionId is not nil, the response is _gone_ (the first version has been deleted).
* If a version with versionId equal to the requested parentVersionId exists, the response is _not-found_ (the client is up-to-date)
* Otherwise, the response is _gone_ (the requested version has been deleted).
If not found, it returns an indication that no such version exists.
### AddSnapshot
The AddSnapshot transaction requests that the server store a new snapshot, generated by the client.
The request contains the following:
* version ID at which the snapshot was made
* snapshot data (opaque to the server)
* version ID at which the snapshot was made, and
* encrypted snapshot data.
The server should validate that the snapshot is for an existing version and is newer than any existing snapshot.
It may also validate that the snapshot is for a "recent" version (e.g., one of the last 5 versions).
@ -167,66 +113,3 @@ The server response is empty.
The GetSnapshot transaction requests that the server provide the latest snapshot.
The response contains the snapshot version ID and the snapshot data, if those exist.
## HTTP Representation
The transactions above are realized for an HTTP server at `<origin>` using the HTTP requests and responses described here.
The `origin` *should* be an HTTPS endpoint on general principle, but nothing in the functionality or security of the protocol depends on connection encryption.
The replica identifies itself to the server using a `client_id` in the form of a UUID.
This value is passed with every request in the `X-Client-Id` header, in its dashed-hex format.
### AddVersion
The request is a `POST` to `<origin>/v1/client/add-version/<parentVersionId>`.
The request body contains the history segment, optionally encoded using any encoding supported by actix-web.
The content-type must be `application/vnd.taskchampion.history-segment`.
The success response is a 200 OK with an empty body.
The new version ID appears in the `X-Version-Id` header.
If included, a snapshot request appears in the `X-Snapshot-Request` header with value `urgency=low` or `urgency=high`.
On conflict, the response is a 409 CONFLICT with an empty body.
The expected parent version ID appears in the `X-Parent-Version-Id` header.
Other error responses (4xx or 5xx) may be returned and should be treated appropriately to their meanings in the HTTP specification.
### GetChildVersion
The request is a `GET` to `<origin>/v1/client/get-child-version/<parentVersionId>`.
The response is determined as described above.
The _not-found_ response is 404 NOT FOUND.
The _gone_ response is 410 GONE.
Neither has a response body.
On success, the response is a 200 OK.
The version's history segment is returned in the response body, with content-type `application/vnd.taskchampion.history-segment`.
The version ID appears in the `X-Version-Id` header.
The response body may be encoded, in accordance with any `Accept-Encoding` header in the request.
On failure, a client should treat a 404 NOT FOUND as indicating that it is up-to-date.
Clients should treat a 410 GONE as a synchronization error.
If the client has pending changes to send to the server, based on a now-removed version, then those changes cannot be reconciled and will be lost.
The client should, optionally after consulting the user, download and apply the latest snapshot.
### AddSnapshot
The request is a `POST` to `<origin>/v1/client/add-snapshot/<versionId>`.
The request body contains the snapshot data, optionally encoded using any encoding supported by actix-web.
The content-type must be `application/vnd.taskchampion.snapshot`.
If the version is invalid, as described above, the response should be 400 BAD REQUEST.
The server response should be 200 OK on success.
### GetSnapshot
The request is a `GET` to `<origin>/v1/client/snapshot`.
The response is a 200 OK.
The snapshot is returned in the response body, with content-type `application/vnd.taskchampion.snapshot`.
The version ID appears in the `X-Version-Id` header.
The response body may be encoded, in accordance with any `Accept-Encoding` header in the request.
After downloading and decrypting a snapshot, a client must replace its entire local task database with the content of the snapshot.
Any local operations that had not yet been synchronized must be discarded.
After the snapshot is applied, the client should begin the synchronization process again, starting from the snapshot version.