Merge pull request #36 from djmitche/issue20: refactor working-set support, improve data model docs

Commit c2c2a00ed5: 14 changed files with 394 additions and 199 deletions
README.md (11 lines changed)

@@ -5,7 +5,16 @@ TaskChampion is an open-source personal task-tracking application.

Use it to keep track of what you need to do, with a quick command-line interface and flexible sorting and filtering.

It is modeled on [TaskWarrior](https://taskwarrior.org), but not a drop-in replacement for that application.

Goals:

* Feature parity with TaskWarrior (but not compatibility)
* Approachable, maintainable codebase
* Active development community
* Reasonable privacy: user's task details not visible on server
* Reliable concurrency - clients do not diverge
* Storage performance O(n) with n number of tasks

See:

* [Development Notes](docs/development-notes.md)
* [Documentation](docs/src/SUMMARY.md) (will be published as an mdbook eventually)
* [Progress on the first version](https://github.com/djmitche/taskwarrior-rust/projects/1)
docs/src/SUMMARY.md (changed)

@@ -2,7 +2,9 @@

- [Installation](./installation.md)
- [Usage](./usage.md)
- [Internal Details](./internals.md)
- [Data Model](./data-model.md)
---
- [Development Notes](./development-notes.md)
- [Data Model](./data-model.md)
  - [Replica Storage](./storage.md)
  - [Task Database](./taskdb.md)
  - [Synchronization](./sync.md)
  - [Planned Functionality](./plans.md)
docs/src/data-model.md (changed)

@@ -1,42 +1,5 @@

# Data Model

A client manages a single offline instance of a single user's task list.
The data model is only seen from the clients' perspective.

## Task Database

The task database is composed of an un-ordered collection of tasks, each keyed by a UUID.
Each task in the database has an arbitrary-sized set of key/value properties, with string values.

Tasks are only created and modified; "deleted" tasks continue to stick around and can be modified and even un-deleted.
Tasks have an expiration time, after which they may be purged from the database.

## Task Fields

Each task can have any of the following fields.
Timestamps are stored as UNIX epoch timestamps, in the form of an integer expressed in decimal notation.
Note that it is possible for any field to be omitted.

NOTE: This structure is based on https://taskwarrior.org/docs/design/task.html, but will diverge from that model over time.

* `status` - one of `Pending`, `Completed`, `Deleted`, `Recurring`, or `Waiting`
* `entry` (timestamp) - time that the task was created
* `description` - the one-line summary of the task
* `start` (timestamp) - if set, the task is active and this field gives the time the task was started
* `end` (timestamp) - the time at which the task was deleted or completed
* `due` (timestamp) - the time at which the task is due
* `until` (timestamp) - the time after which recurrent child tasks should not be created
* `wait` (timestamp) - the time before which this task is considered waiting and should not be shown
* `modified` (timestamp) - time that the task was last modified
* `scheduled` (timestamp) - time that the task is available to start
* `recur` - recurrence frequency
* `mask` - recurrence history
* `imask` - for children of recurring tasks, the index into the `mask` property on the parent
* `parent` - for children of recurring tasks, the uuid of the parent task
* `project` - the task's project (usually a short identifier)
* `priority` - the task's priority, one of `L`, `M`, or `H`.
* `depends` - a comma (`,`) separated list of uuids of tasks on which this task depends
* `tags` - a comma (`,`) separated list of tags for this task
* `annotation_<timestamp>` - an annotation for this task, with the timestamp as part of the key
* `udas` - user-defined attributes

A client manages a single offline instance of a single user's task list, called a replica.
This section covers the structure of that data.
Note that this data model is visible only on the client; the server does not have access to client data.
docs/src/development-notes.md (deleted, 93 lines)

@@ -1,93 +0,0 @@

Goals:

* Reasonable privacy: user's task details not visible on server
* Reliable concurrency - clients do not diverge
* Storage O(n) with n number of tasks

# Operations

Every change to the task database is captured as an operation.
Each operation has one of the forms

* `Create(uuid)`
* `Delete(uuid)`
* `Update(uuid, property, value, timestamp)`

The Create form creates a new task.
It is invalid to create a task that already exists.

Similarly, the Delete form deletes an existing task.
It is invalid to delete a task that does not exist.

The Update form updates the given property of the given task, where property and value are both strings.
Value can also be `None` to indicate deletion of a property.
It is invalid to update a task that does not exist.
The timestamp on updates serves as additional metadata and is used to resolve conflicts.

Operations act as deltas between database states.

## Versions and Synchronization

Occasionally, database states are named with an integer, called a version.
The system as a whole (server and clients) constructs a monotonic sequence of versions and the operations that separate each version from the next.
No gaps are allowed in the version numbering.
Version 0 is implicitly the empty database.

The server stores the operations for each version, and provides them as needed to clients.
Clients use this information to update their local task databases, and to generate new versions to send to the server.

Clients generate a new version to transmit changes made locally to the server.
The changes are represented as a sequence of operations with the final operation being tagged as the version.
In order to keep the gap-free monotonic numbering, the server will only accept a proposed version from a client if its number is one greater than the latest version on the server.
When this is not the case, the client must "rebase" the local changes onto the latest version from the server and try again.
This operation is performed using operational transformation (OT).
The result of this transformation is a sequence of operations based on the latest version, and a sequence of operations the client can apply to its local task database to "catch up" to the version on the server.

## Snapshots

As designed, storage required on the server would grow with time, as would the time required for new clients to update to the latest version.
As an optimization, the server also stores "snapshots" containing a full copy of the task database at a given version.
Based on configurable heuristics, it may delete older operations and snapshots, as long as enough data remains for active clients to synchronize and for new clients to initialize.

Since snapshots must be computed by clients, the server may "request" a snapshot when providing the latest version to a client.
This request comes with a number indicating how much it "wants" the snapshot.
Clients which can easily generate and transmit a snapshot should be generous to the server, while clients with more limited resources can wait until the server's requests are more desperate.
The intent is, where possible, to request snapshots created on well-connected desktop clients over mobile and low-power clients.

## Encryption and Signing

From the server's perspective, all data except for version numbers are opaque binary blobs.
Clients encrypt and sign these blobs using a symmetric key known only to the clients.
This secures the data at-rest on the server.
Note that privacy is not complete, as the server still has some information about users, including source and frequency of synchronization transactions and size of those transactions.

## Backups

In this design, the server is little more than an authenticated storage for encrypted blobs provided by the client.
To allow for failure or data loss on the server, clients are expected to cache these blobs locally for a short time (a week), along with a server-provided HMAC signature.
When data loss is detected -- such as when a client expects the server to have a version N or higher, and the server only has N-1 -- the client can send those blobs to the server.
The server can validate the HMAC and, if successful, add the blobs to its datastore.

## Expiration

TBD

.. conditions on flushing to allow consistent handling

# Implementation Notes

## Client / Server Protocol

TBD

.. using HTTP
.. user auth
.. user setup process

## Batching Operations

TBD

## Recurrence

TBD
docs/src/internals.md (deleted)

@@ -1,4 +0,0 @@

# Internal Details

This section describes some of the internal details of TaskChampion.
While this section is not required to use TaskChampion, understanding some of these details may help to understand how TaskChampion behaves.
docs/src/plans.md (new file, 35 lines)

@@ -0,0 +1,35 @@

# Planned Functionality

This section is a bit of a to-do list for additional functionality to add to the synchronization system.
Each feature has some discussion of how it might be implemented.

## Snapshots

As designed, storage required on the server would grow with time, as would the time required for new clients to update to the latest version.
As an optimization, the server also stores "snapshots" containing a full copy of the task database at a given version.
Based on configurable heuristics, it may delete older operations and snapshots, as long as enough data remains for active clients to synchronize and for new clients to initialize.

Since snapshots must be computed by clients, the server may "request" a snapshot when providing the latest version to a client.
This request comes with a number indicating how much it "wants" the snapshot.
Clients which can easily generate and transmit a snapshot should be generous to the server, while clients with more limited resources can wait until the server's requests are more desperate.
The intent is, where possible, to request snapshots created on well-connected desktop clients over mobile and low-power clients.
## Encryption and Signing

From the server's perspective, all data except for version numbers are opaque binary blobs.
Clients encrypt and sign these blobs using a symmetric key known only to the clients.
This secures the data at-rest on the server.
Note that privacy is not complete, as the server still has some information about users, including source and frequency of synchronization transactions and size of those transactions.
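The shape of this trust model can be sketched at the type level. Everything below is illustrative, not a proposed cipher choice or the project's actual API; the function bodies are placeholders.

```rust
/// Shared symmetric key, known to every client but never to the server.
struct SymmetricKey([u8; 32]);

/// What the server stores: only `version` is meaningful to it.
struct Blob {
    version: u64,
    payload: Vec<u8>, // encrypted and authenticated by the clients
}

/// Encrypt-and-sign a serialized bundle of operations for upload. A real
/// implementation would use an AEAD construction (e.g. ChaCha20-Poly1305)
/// so that confidentiality and integrity come together.
fn seal(_key: &SymmetricKey, _version: u64, _plaintext: &[u8]) -> Blob {
    todo!("AEAD encryption")
}

/// Decrypt a downloaded blob, returning None if authentication fails.
fn open(_key: &SymmetricKey, _blob: &Blob) -> Option<Vec<u8>> {
    todo!("AEAD decryption and verification")
}
```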
## Backups

In this design, the server is little more than an authenticated storage for encrypted blobs provided by the client.
To allow for failure or data loss on the server, clients are expected to cache these blobs locally for a short time (a week), along with a server-provided HMAC signature.
When data loss is detected -- such as when a client expects the server to have a version N or higher, and the server only has N-1 -- the client can send those blobs to the server.
The server can validate the HMAC and, if successful, add the blobs to its datastore.

## Expiration

Deleted tasks remain in the task database, and are simply hidden in most views.
All tasks have an expiration time after which they may be flushed, preventing unbounded increase in task database size.
However, purging a task does not satisfy the necessary OT guarantees, so some further formal design work is required before this is implemented.
docs/src/storage.md (new file, 73 lines)

@@ -0,0 +1,73 @@

# Replica Storage

Each replica has a storage backend.
The interface for this backend is given in `crate::taskstorage::TaskStorage` and `TaskStorageTxn`.

The storage is transaction-protected, with the expectation of a serializable isolation level.
The storage contains the following information (a sketch of these contents as a Rust struct follows the list):

- `tasks`: a set of tasks, indexed by UUID
- `base_version`: the number of the last version sync'd from the server
- `operations`: all operations performed since `base_version`
- `working_set`: a mapping from integer -> UUID, used to keep stable small-integer indexes into the tasks for users' convenience. This data is not synchronized with the server and does not affect any consistency guarantees.
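As a rough illustration only (not the crate's actual types), an in-memory backend holding these four components might look like this, with `Operation` as sketched under "Operations" below:

```rust
use std::collections::HashMap;
use uuid::Uuid;

/// A task is an arbitrary set of string properties (see "Task Fields").
type TaskMap = HashMap<String, String>;

enum Operation {} // placeholder; see the sketch under "Operations" below

/// Illustrative shape of a storage backend's contents.
struct StorageContents {
    /// All tasks, keyed by UUID.
    tasks: HashMap<Uuid, TaskMap>,
    /// The last version sync'd from the server.
    base_version: u64,
    /// All operations performed since `base_version`.
    operations: Vec<Operation>,
    /// Stable small-integer indexes into `tasks`; `None` marks a freed slot.
    /// Not synchronized with the server.
    working_set: Vec<Option<Uuid>>,
}
```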
## Tasks

The tasks are stored as an un-ordered collection, keyed by task UUID.
Each task in the database has an arbitrary-sized set of key/value properties, with string values.

Tasks are only created and modified; "deleted" tasks continue to stick around and can be modified and even un-deleted.
Tasks have an expiration time, after which they may be purged from the database.

### Task Fields

Each task can have any of the following fields.
Timestamps are stored as UNIX epoch timestamps, in the form of an integer expressed in decimal notation.
Note that it is possible, in task storage, for any field to be omitted.
An example property map follows the list.

NOTE: This structure is based on https://taskwarrior.org/docs/design/task.html, but will diverge from that model over time.

* `status` - one of `Pending`, `Completed`, `Deleted`, `Recurring`, or `Waiting`
* `entry` (timestamp) - time that the task was created
* `description` - the one-line summary of the task
* `start` (timestamp) - if set, the task is active and this field gives the time the task was started
* `end` (timestamp) - the time at which the task was deleted or completed
* `due` (timestamp) - the time at which the task is due
* `until` (timestamp) - the time after which recurrent child tasks should not be created
* `wait` (timestamp) - the time before which this task is considered waiting and should not be shown
* `modified` (timestamp) - time that the task was last modified
* `scheduled` (timestamp) - time that the task is available to start
* `recur` - recurrence frequency
* `mask` - recurrence history
* `imask` - for children of recurring tasks, the index into the `mask` property on the parent
* `parent` - for children of recurring tasks, the uuid of the parent task
* `project` - the task's project (usually a short identifier)
* `priority` - the task's priority, one of `L`, `M`, or `H`.
* `depends` - a comma (`,`) separated list of uuids of tasks on which this task depends
* `tags` - a comma (`,`) separated list of tags for this task
* `annotation_<timestamp>` - an annotation for this task, with the timestamp as part of the key
* `udas` - user-defined attributes
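For example, a freshly created pending task might be stored as the following property map (illustrative values; note that the code in this commit compares against the lowercase string `"pending"`):

```rust
use std::collections::HashMap;

fn main() {
    // An illustrative task: every field is just a string key/value pair,
    // and any field may be omitted entirely.
    let mut task: HashMap<String, String> = HashMap::new();
    task.insert("status".into(), "pending".into()); // stored lowercase in the code
    task.insert("entry".into(), "1585665412".into()); // UNIX epoch, decimal
    task.insert("description".into(), "get groceries".into());
    // Annotations carry their timestamp in the property key itself:
    task.insert("annotation_1585666011".into(), "cucumbers are on sale".into());

    assert_eq!(task.get("status").map(String::as_str), Some("pending"));
}
```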
## Operations

Every change to the task database is captured as an operation.
In other words, operations act as deltas between database states.
Operations are crucial to synchronization of replicas, using a technique known as Operational Transforms.

Each operation has one of the forms

* `Create(uuid)`
* `Delete(uuid)`
* `Update(uuid, property, value, timestamp)`

The Create form creates a new task.
It is invalid to create a task that already exists.

Similarly, the Delete form deletes an existing task.
It is invalid to delete a task that does not exist.

The Update form updates the given property of the given task, where property and value are both strings.
Value can also be `None` to indicate deletion of a property.
It is invalid to update a task that does not exist.
The timestamp on updates serves as additional metadata and is used to resolve conflicts.
A sketch of these forms as a Rust enum follows.
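A natural Rust encoding of these three forms is an enum along the following lines; this is a sketch, close to (but not necessarily identical to) the enum in the crate:

```rust
use chrono::{DateTime, Utc};
use uuid::Uuid;

/// Sketch of the three operation forms described above.
#[derive(Clone, Debug, PartialEq)]
enum Operation {
    /// Create a new, empty task. Invalid if the task already exists.
    Create { uuid: Uuid },
    /// Delete an existing task. Invalid if the task does not exist.
    Delete { uuid: Uuid },
    /// Set one property of an existing task; `value: None` deletes the
    /// property. The timestamp is metadata used to resolve conflicts.
    Update {
        uuid: Uuid,
        property: String,
        value: Option<String>,
        timestamp: DateTime<Utc>,
    },
}
```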
docs/src/sync.md (new file, 120 lines)

@@ -0,0 +1,120 @@

# Synchronization

The [task database](./taskdb.md) also implements synchronization.
Synchronization occurs between disconnected replicas, mediated by a server.
The replicas never communicate directly with one another.
The server does not have access to the task data; it sees only opaque blobs of data with a small amount of metadata.

The synchronization process is a critical part of the task database's functionality: the database cannot function efficiently without occasional synchronization operations.
## Operational Transformations

Synchronization is based on [operational transformation](https://en.wikipedia.org/wiki/Operational_transformation).
This section will assume some familiarity with the concept.

## State and Operations

At a given time, the set of tasks in a replica's storage is the essential "state" of that replica.
All modifications to that state occur via operations, as defined in [Replica Storage](./storage.md).
We can draw a network, or graph, with the nodes representing states and the edges representing operations.
For example:

```text
o -- State: {abc-d123: 'get groceries', priority L}
|
| -- Operation: set abc-d123 priority to H
|
o -- State: {abc-d123: 'get groceries', priority H}
```

For those familiar with distributed version control systems, a state is analogous to a revision, while an operation is analogous to a commit.

Fundamentally, synchronization involves all replicas agreeing on a single, linear sequence of operations and the state that those operations create.
Since the replicas are not connected, each may have additional operations that have been applied locally, but which have not yet been agreed on.
The synchronization process uses operational transformation to "linearize" those operations.
This process is analogous (vaguely) to rebasing a sequence of Git commits.
### Versions

Occasionally, database states are named with an integer, called a version.
The system as a whole (all replicas) constructs a monotonic sequence of versions and the operations that separate each version from the next.
No gaps are allowed in the version numbering.
Version 0 is implicitly the empty database.

The server stores the operations to change a state from a version N to a version N+1, and provides that information as needed to replicas.
Replicas use this information to update their local task databases, and to generate new versions to send to the server.

Replicas generate a new version to transmit changes made locally to the server.
The changes are represented as a sequence of operations with the state resulting from the final operation corresponding to the version.
In order to keep the gap-free monotonic numbering, the server will only accept a proposed version from a replica if its number is one greater than the latest version on the server.

In the non-conflict case (such as with a single replica), then, a replica's synchronization process involves gathering up the operations it has accumulated since its last synchronization; bundling those operations into version N+1; and sending that version to the server.
### Transformation

When the latest version on the server contains operations that are not present in the replica, then the states have diverged.
For example (with lower-case letters designating operations):

```text
  o -- version N
 w|\a
  o o
 x| \b
  o o
 y| \c
  o o -- replica's local state
 z|
  o -- version N+1
```

In this situation, the replica must "rebase" the local operations onto the latest version from the server and try again.
This process is performed using operational transformation (OT).
The result of this transformation is a sequence of operations based on the latest version, and a sequence of operations the replica can apply to its local task database to reach the same state.
Continuing the example above, the resulting operations are shown with `'`:

```text
  o -- version N
 w|\a
  o o
 x| \b
  o o
 y| \c
  o o -- replica's intermediate local state
 z| |w'
  o o -- version N+1 (the node on the left)
 a'\ |x'
   o o
  b'\ |y'
    o o
   c'\|z'
      o -- version N+2
```

The replica applies w' through z' locally, and sends a' through c' to the server as the operations to generate version N+2.
Either path through this graph, a-b-c-w'-x'-y'-z' or w-x-y-z-a'-b'-c', must generate *precisely* the same final state at version N+2.
Careful selection of the operations and the transformation function ensures this.

See the comments in the source code for the details of how this transformation process is implemented.
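The core of OT is a transformation taking two operations that were applied to the same parent state and returning adjusted operations for each side. A minimal sketch follows, reusing the `Operation` sketch from [Replica Storage](./storage.md); it covers only the update-update conflict, resolved by timestamp as described there, while a complete implementation must also reconcile Create/Delete interactions:

```rust
/// Decide whether two operations (applied to the same parent state) are
/// conflicting updates, and if so which side wins: Some(true) means the
/// first operation wins. Ties go to the first, deterministically.
fn update_conflict(op1: &Operation, op2: &Operation) -> Option<bool> {
    use Operation::*;
    match (op1, op2) {
        (
            Update { uuid: u1, property: p1, timestamp: t1, .. },
            Update { uuid: u2, property: p2, timestamp: t2, .. },
        ) if u1 == u2 && p1 == p2 => Some(t1 >= t2),
        _ => None,
    }
}

/// Transform two divergent operations so that each can be applied on the
/// other side; `None` means the operation is dropped entirely.
fn transform(op1: Operation, op2: Operation) -> (Option<Operation>, Option<Operation>) {
    match update_conflict(&op1, &op2) {
        Some(true) => (Some(op1), None),  // op1's update wins on both sides
        Some(false) => (None, Some(op2)), // op2's update wins on both sides
        None => (Some(op1), Some(op2)),   // otherwise they commute unchanged
    }
}
```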
## Replica Implementation

The replica's [storage](./storage.md) contains the current state in `tasks`, the as-yet un-synchronized operations in `operations`, and the last version at which synchronization occurred in `base_version`.

To perform a synchronization, the replica first requests any versions greater than `base_version` from the server, and rebases any local operations on top of those new versions, updating `base_version`.
If there are no un-synchronized local operations, the process is complete.
Otherwise, the replica creates a new version containing those local operations and uploads that to the server.
In most cases, this will succeed, but if another replica has created a new version in the interim, then the new version will conflict with that other replica's new version.
In this case, the process repeats.
A sketch of this loop appears below.

The replica's un-synchronized operations are already reflected in `tasks`, so the following invariant holds:

> Applying `operations` to the set of tasks at `base_version` gives a set of tasks identical
> to `tasks`.
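The loop can be sketched self-containedly as follows. All of the types here are simplified stand-ins for the crate's real `Replica`, `Server`, and `Operation` types, and the OT rebase itself is elided:

```rust
/// Stand-in operation type; the real type is the Create/Delete/Update enum.
type Operation = String;

/// Minimal replica state, as described in Replica Storage.
struct Replica {
    base_version: u64,
    operations: Vec<Operation>,
}

/// Minimal server: versions[n - 1] holds the operations leading to version n.
struct Server {
    versions: Vec<Vec<Operation>>,
}

impl Server {
    fn get_version(&self, version: u64) -> Option<&Vec<Operation>> {
        // Callers always pass version >= 1, so the subtraction is safe here.
        self.versions.get((version - 1) as usize)
    }

    /// Accept a proposed version only if it is exactly one past the latest.
    fn add_version(&mut self, version: u64, ops: Vec<Operation>) -> bool {
        if version as usize == self.versions.len() + 1 {
            self.versions.push(ops);
            true
        } else {
            false
        }
    }
}

fn sync(replica: &mut Replica, server: &mut Server) {
    loop {
        // 1. Catch up: fetch newer versions, apply them to the local tasks,
        //    and rebase the local operations over them via OT (elided here).
        while let Some(_remote_ops) = server.get_version(replica.base_version + 1) {
            replica.base_version += 1;
        }

        // 2. Nothing local to push? Synchronization is complete.
        if replica.operations.is_empty() {
            return;
        }

        // 3. Propose the local operations as the next version.
        let ops = std::mem::take(&mut replica.operations);
        if server.add_version(replica.base_version + 1, ops.clone()) {
            replica.base_version += 1;
            return;
        }
        // Another replica created that version first; restore the local
        // operations and go around again to rebase onto the new version.
        replica.operations = ops;
    }
}
```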
## Server Implementation

The server implementation is simple.
It supports fetching versions keyed by number, and adding a new version.
In adding a new version, the version number must be one greater than the greatest existing version.

Critically, the server operates on nothing more than numbered, opaque blobs of data.
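That contract can be restated at the blob level; the trait below is illustrative, not the crate's actual server API:

```rust
/// Sketch of the narrow server interface: numbered, opaque blobs only.
trait VersionStore {
    /// Fetch the blob stored for `version`, if the server still has it.
    fn get_version(&self, version: u64) -> Option<Vec<u8>>;

    /// Store `blob` as `version`. Fails unless `version` is exactly one
    /// greater than the greatest version already stored, preserving the
    /// gap-free monotonic numbering.
    fn add_version(&mut self, version: u64, blob: Vec<u8>) -> Result<(), VersionGap>;
}

/// Error: the proposed version was not latest + 1.
struct VersionGap;
```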
docs/src/taskdb.md (new file, 28 lines)

@@ -0,0 +1,28 @@

# Task Database

The task database is a layer of abstraction above the replica storage layer, responsible for maintaining some important invariants.
While the storage is pluggable, there is only one implementation of the task database.

## Reading Data

The task database provides read access to the data in the replica's storage through a variety of methods on the struct.
Each read operation is executed in its own transaction, so data may not be consistent between read operations.
In practice, this is not an issue for TaskChampion's purposes.
A short usage sketch follows.
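For instance, using two of the `Replica` methods visible in the code changes in this commit (`all_tasks` and `get_working_set_task`); `Replica` itself is assumed to be in scope from the crate:

```rust
use failure::Fallible;

/// Usage sketch. Each call below runs in its own storage transaction,
/// so the two reads are not guaranteed to be mutually consistent.
fn inspect(replica: &mut Replica) -> Fallible<()> {
    // All tasks, as a map keyed by UUID.
    let all = replica.all_tasks()?;
    println!("{} tasks in the replica", all.len());

    // One task, looked up by its small working-set index.
    if let Some(_task) = replica.get_working_set_task(1)? {
        println!("task 1 exists");
    }
    Ok(())
}
```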
## Working Set

The task database maintains the working set.
The working set maps small integers to current tasks, for easy reference by command-line users.
This is done in such a way that the task numbers remain stable until the working set is rebuilt, at which point gaps in the numbering, such as those left by completed tasks, are removed by shifting all higher-numbered tasks downward.
The short illustration below shows one such rebuild.

The working set is not replicated, and is not considered a part of any consistency guarantees in the task database.
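For illustration only (plain vectors, not the crate's types), a rebuild renumbers like this:

```rust
fn main() {
    // A working set before a rebuild. Indexes are one-based;
    // None marks a slot freed earlier.
    let before: Vec<Option<&str>> = vec![
        Some("uuid-a"), // 1: pending   -> keeps a (renumbered) slot
        None,           // 2: already vacated
        Some("uuid-b"), // 3: completed -> dropped by the rebuild
        Some("uuid-c"), // 4: pending   -> shifts downward
    ];
    let pending = ["uuid-a", "uuid-c"];
    let after: Vec<&str> = before
        .into_iter()
        .flatten()
        .filter(|u| pending.contains(u))
        .collect();
    assert_eq!(after, vec!["uuid-a", "uuid-c"]); // now at indexes 1 and 2
}
```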
## Modifying Data

Modifications to the data set are made by applying operations.
Operations are described in [Replica Storage](./storage.md).

Each operation is added to the list of operations in the storage, and simultaneously applied to the tasks in that storage.
Operations are checked for validity as they are applied, as sketched below.
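A minimal sketch of applying a single operation, reusing the `Operation` enum sketched in [Replica Storage](./storage.md) and performing the validity checks described there:

```rust
use std::collections::HashMap;
use uuid::Uuid;

type TaskMap = HashMap<String, String>;

/// Apply one operation to the task set, with validity checks. The real
/// implementation also appends `op` to the stored operations list within
/// the same transaction.
fn apply(tasks: &mut HashMap<Uuid, TaskMap>, op: &Operation) -> Result<(), String> {
    match op {
        Operation::Create { uuid } => {
            if tasks.contains_key(uuid) {
                return Err(format!("task {} already exists", uuid));
            }
            tasks.insert(*uuid, TaskMap::new());
        }
        Operation::Delete { uuid } => {
            if tasks.remove(uuid).is_none() {
                return Err(format!("task {} does not exist", uuid));
            }
        }
        Operation::Update { uuid, property, value, .. } => {
            let task = tasks
                .get_mut(uuid)
                .ok_or_else(|| format!("task {} does not exist", uuid))?;
            match value {
                Some(v) => task.insert(property.clone(), v.clone()),
                None => task.remove(property),
            };
        }
    }
    Ok(())
}
```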
Changes to `Replica` and `TaskMut` (working-set support and tests):

```diff
@@ -35,6 +35,21 @@ impl Replica {
         })
     }

+    /// Return true if this status string is such that the task should be included in
+    /// the working set.
+    fn is_working_set_status(status: Option<&String>) -> bool {
+        if let Some(status) = status {
+            status == "pending"
+        } else {
+            false
+        }
+    }
+
+    /// Add the given uuid to the working set, returning its index.
+    fn add_to_working_set(&mut self, uuid: &Uuid) -> Fallible<u64> {
+        self.taskdb.add_to_working_set(uuid)
+    }
+
     /// Get all tasks represented as a map keyed by UUID
     pub fn all_tasks<'a>(&'a mut self) -> Fallible<HashMap<Uuid, Task>> {
         Ok(self
@@ -72,6 +87,17 @@ impl Replica {
         Ok(self.taskdb.get_task(&uuid)?.map(|t| (&t).into()))
     }

+    /// Get an existing task by its working set index
+    pub fn get_working_set_task(&mut self, i: u64) -> Fallible<Option<Task>> {
+        let working_set = self.taskdb.working_set()?;
+        if (i as usize) < working_set.len() {
+            if let Some(uuid) = working_set[i as usize] {
+                return Ok(self.taskdb.get_task(&uuid)?.map(|t| (&t).into()));
+            }
+        }
+        return Ok(None);
+    }
+
     /// Create a new task. The task must not already exist.
     pub fn new_task(
         &mut self,
@@ -115,9 +141,10 @@ impl Replica {
     }

     /// Perform "garbage collection" on this replica. In particular, this renumbers the working
-    /// set.
+    /// set to contain only pending tasks.
     pub fn gc(&mut self) -> Fallible<()> {
-        self.taskdb.rebuild_working_set()?;
+        self.taskdb
+            .rebuild_working_set(|t| Replica::is_working_set_status(t.get("status")))?;
         Ok(())
     }
 }
@@ -177,9 +204,14 @@ impl<'a> TaskMut<'a> {
         )
     }

-    /// Set the task's status
+    /// Set the task's status. This also adds the task to the working set if the
+    /// new status puts it in that set.
     pub fn status(&mut self, status: Status) -> Fallible<()> {
-        self.set_string("status", Some(String::from(status.as_ref())))
+        let status = String::from(status.as_ref());
+        if Replica::is_working_set_status(Some(&status)) {
+            self.replica.add_to_working_set(&self.uuid)?;
+        }
+        self.set_string("status", Some(status))
     }

     /// Set the task's description
@@ -326,6 +358,22 @@ mod tests {
         assert_eq!(t.project, Some("work".into()));
     }

+    #[test]
+    fn set_pending_adds_to_working_set() {
+        let mut rep = Replica::new(DB::new_inmemory().into());
+        let uuid = Uuid::new_v4();
+
+        rep.new_task(uuid.clone(), Status::Pending, "to-be-pending".into())
+            .unwrap();
+
+        let mut tm = rep.get_task_mut(&uuid).unwrap().unwrap();
+        tm.status(Status::Pending).unwrap();
+
+        let t = rep.get_working_set_task(1).unwrap().unwrap();
+        assert_eq!(t.status, Status::Pending);
+        assert_eq!(t.description, String::from("to-be-pending"));
+    }
+
     #[test]
     fn get_does_not_exist() {
         let mut rep = Replica::new(DB::new_inmemory().into());
```
Changes to the task database (`DB`):

```diff
@@ -104,24 +104,27 @@ impl DB {
         txn.get_task(uuid)
     }

-    /// Rebuild the working set. This renumbers the pending tasks to eliminate gaps, and also
-    /// finds any tasks whose statuses changed without being noticed.
-    pub fn rebuild_working_set(&mut self) -> Fallible<()> {
-        // TODO: this logic belongs in Replica
-        // TODO: it's every status but Completed and Deleted, I think?
+    /// Rebuild the working set using a function to identify tasks that should be in the set. This
+    /// renumbers the existing working-set tasks to eliminate gaps, and also adds any tasks that
+    /// are not already in the working set but should be. The rebuild occurs in a single
+    /// transaction against the storage backend.
+    pub fn rebuild_working_set<F>(&mut self, in_working_set: F) -> Fallible<()>
+    where
+        F: Fn(&TaskMap) -> bool,
+    {
         let mut txn = self.storage.txn()?;

         let mut new_ws = vec![];
         let mut seen = HashSet::new();
-        let pending = String::from("pending");

-        // The goal here is for existing working-set items to be "compressed" down to index
-        // 1, so we begin by scanning the current working set and inserting any still-pending
-        // tasks into the new list
+        // The goal here is for existing working-set items to be "compressed" down to index 1, so
+        // we begin by scanning the current working set and inserting any tasks that should still
+        // be in the set into new_ws, implicitly dropping any tasks that are no longer in the
+        // working set.
         for elt in txn.get_working_set()? {
             if let Some(uuid) = elt {
                 if let Some(task) = txn.get_task(&uuid)? {
-                    if task.get("status") == Some(&pending) {
+                    if in_working_set(&task) {
                         new_ws.push(uuid.clone());
                         seen.insert(uuid);
                     }
@@ -129,24 +132,43 @@ impl DB {
                 }
             }
         }

-        // Now go hunting for tasks that are pending and are not already in this list
+        // Now go hunting for tasks that should be in this list but are not, adding them at the
+        // end of the list.
         for (uuid, task) in txn.all_tasks()? {
             if !seen.contains(&uuid) {
-                if task.get("status") == Some(&pending) {
+                if in_working_set(&task) {
                     new_ws.push(uuid.clone());
                 }
             }
         }

         // clear and re-write the entire working set, in order
         txn.clear_working_set()?;
         for uuid in new_ws.drain(0..new_ws.len()) {
-            txn.add_to_working_set(uuid)?;
+            txn.add_to_working_set(&uuid)?;
         }

         txn.commit()?;
         Ok(())
     }

+    /// Add the given uuid to the working set and return its index; if it is already in the working
+    /// set, its index is returned. This does *not* renumber any existing tasks.
+    pub fn add_to_working_set(&mut self, uuid: &Uuid) -> Fallible<u64> {
+        let mut txn = self.storage.txn()?;
+        // search for an existing entry for this task..
+        for (i, elt) in txn.get_working_set()?.iter().enumerate() {
+            if *elt == Some(*uuid) {
+                // (note that this drops the transaction with no changes made)
+                return Ok(i as u64);
+            }
+        }
+        // and if not found, add one
+        let i = txn.add_to_working_set(uuid)?;
+        txn.commit()?;
+        Ok(i)
+    }
+
     /// Sync to the given server, pulling remote changes and pushing local changes.
     pub fn sync(&mut self, username: &str, server: &mut Server) -> Fallible<()> {
         let mut txn = self.storage.txn()?;
@@ -448,7 +470,7 @@ mod tests {
         txn.clear_working_set()?;

         for i in &[1usize, 3, 4] {
-            txn.add_to_working_set(uuids[*i])?;
+            txn.add_to_working_set(&uuids[*i])?;
         }

         txn.commit()?;
@@ -464,7 +486,13 @@ mod tests {
             ]
         );

-        db.rebuild_working_set()?;
+        db.rebuild_working_set(|t| {
+            if let Some(status) = t.get("status") {
+                status == "pending"
+            } else {
+                false
+            }
+        })?;

         // uuids[1] and uuids[4] are already in the working set, so are compressed
         // to the top, and then uuids[0] is added.
```
Corresponding changes to the in-memory storage backend:

```diff
@@ -109,9 +109,9 @@ impl<'t> TaskStorageTxn for Txn<'t> {
         Ok(self.data_ref().working_set.clone())
     }

-    fn add_to_working_set(&mut self, uuid: Uuid) -> Fallible<u64> {
+    fn add_to_working_set(&mut self, uuid: &Uuid) -> Fallible<u64> {
         let working_set = &mut self.mut_data_ref().working_set;
-        working_set.push(Some(uuid));
+        working_set.push(Some(uuid.clone()));
         Ok(working_set.len() as u64)
     }
@@ -194,8 +194,8 @@ mod test {

         {
             let mut txn = storage.txn()?;
-            txn.add_to_working_set(uuid1.clone())?;
-            txn.add_to_working_set(uuid2.clone())?;
+            txn.add_to_working_set(&uuid1)?;
+            txn.add_to_working_set(&uuid2)?;
             txn.commit()?;
         }
@@ -216,15 +216,15 @@ mod test {

         {
             let mut txn = storage.txn()?;
-            txn.add_to_working_set(uuid1.clone())?;
-            txn.add_to_working_set(uuid2.clone())?;
+            txn.add_to_working_set(&uuid1)?;
+            txn.add_to_working_set(&uuid2)?;
             txn.commit()?;
         }

         {
             let mut txn = storage.txn()?;
             txn.remove_from_working_set(1)?;
-            txn.add_to_working_set(uuid1.clone())?;
+            txn.add_to_working_set(&uuid1)?;
             txn.commit()?;
         }
@@ -244,7 +244,7 @@ mod test {

         {
             let mut txn = storage.txn()?;
-            txn.add_to_working_set(uuid1.clone())?;
+            txn.add_to_working_set(&uuid1)?;
             txn.commit()?;
         }
@@ -267,16 +267,16 @@ mod test {

         {
             let mut txn = storage.txn()?;
-            txn.add_to_working_set(uuid1.clone())?;
-            txn.add_to_working_set(uuid2.clone())?;
+            txn.add_to_working_set(&uuid1)?;
+            txn.add_to_working_set(&uuid2)?;
             txn.commit()?;
         }

         {
             let mut txn = storage.txn()?;
             txn.clear_working_set()?;
-            txn.add_to_working_set(uuid2.clone())?;
-            txn.add_to_working_set(uuid1.clone())?;
+            txn.add_to_working_set(&uuid2)?;
+            txn.add_to_working_set(&uuid1)?;
             txn.commit()?;
         }
```
And to the key-value storage backend:

```diff
@@ -307,7 +307,7 @@ impl<'t> TaskStorageTxn for Txn<'t> {
         Ok(res)
     }

-    fn add_to_working_set(&mut self, uuid: Uuid) -> Fallible<u64> {
+    fn add_to_working_set(&mut self, uuid: &Uuid) -> Fallible<u64> {
         let working_set_bucket = self.working_set_bucket();
         let numbers_bucket = self.numbers_bucket();
         let kvtxn = self.kvtxn();
@@ -321,7 +321,7 @@ impl<'t> TaskStorageTxn for Txn<'t> {
         kvtxn.set(
             working_set_bucket,
             next_index.into(),
-            Msgpack::to_value_buf(uuid)?,
+            Msgpack::to_value_buf(uuid.clone())?,
         )?;
         kvtxn.set(
             numbers_bucket,
@@ -666,8 +666,8 @@ mod test {

         {
             let mut txn = storage.txn()?;
-            txn.add_to_working_set(uuid1.clone())?;
-            txn.add_to_working_set(uuid2.clone())?;
+            txn.add_to_working_set(&uuid1)?;
+            txn.add_to_working_set(&uuid2)?;
             txn.commit()?;
         }
@@ -689,15 +689,15 @@ mod test {

         {
             let mut txn = storage.txn()?;
-            txn.add_to_working_set(uuid1.clone())?;
-            txn.add_to_working_set(uuid2.clone())?;
+            txn.add_to_working_set(&uuid1)?;
+            txn.add_to_working_set(&uuid2)?;
             txn.commit()?;
         }

         {
             let mut txn = storage.txn()?;
             txn.remove_from_working_set(1)?;
-            txn.add_to_working_set(uuid1.clone())?;
+            txn.add_to_working_set(&uuid1)?;
             txn.commit()?;
         }
@@ -718,7 +718,7 @@ mod test {

         {
             let mut txn = storage.txn()?;
-            txn.add_to_working_set(uuid1.clone())?;
+            txn.add_to_working_set(&uuid1)?;
             txn.commit()?;
         }
@@ -742,16 +742,16 @@ mod test {

         {
             let mut txn = storage.txn()?;
-            txn.add_to_working_set(uuid1.clone())?;
-            txn.add_to_working_set(uuid2.clone())?;
+            txn.add_to_working_set(&uuid1)?;
+            txn.add_to_working_set(&uuid2)?;
             txn.commit()?;
         }

         {
             let mut txn = storage.txn()?;
             txn.clear_working_set()?;
-            txn.add_to_working_set(uuid2.clone())?;
-            txn.add_to_working_set(uuid1.clone())?;
+            txn.add_to_working_set(&uuid2)?;
+            txn.add_to_working_set(&uuid1)?;
             txn.commit()?;
         }
```
Finally, changes to the `TaskStorageTxn` and `TaskStorage` traits:

```diff
@@ -70,7 +70,7 @@ pub trait TaskStorageTxn {

     /// Add a task to the working set and return its (one-based) index. This index will be one
     /// greater than the highest used index.
-    fn add_to_working_set(&mut self, uuid: Uuid) -> Fallible<u64>;
+    fn add_to_working_set(&mut self, uuid: &Uuid) -> Fallible<u64>;

     /// Remove a task from the working set. Other tasks' indexes are not affected.
     fn remove_from_working_set(&mut self, index: u64) -> Fallible<()>;
@@ -86,20 +86,6 @@ pub trait TaskStorageTxn {
 /// A trait for objects able to act as backing storage for a DB. This API is optimized to be
 /// easy to implement, with all of the semantic meaning of the data located in the DB
 /// implementation, which is the sole consumer of this trait.
-///
-/// Conceptually, task storage contains the following:
-///
-/// - tasks: a set of tasks indexed by uuid
-/// - base_version: the number of the last version sync'd from the server
-/// - operations: all operations performed since base_version
-/// - working_set: a mapping from integer -> uuid, used to keep stable small-integer indexes
-///   into the tasks. The replica maintains this list. It is not covered by operations.
-///
-/// The `operations` are already reflected in `tasks`, so the following invariant holds:
-/// > Applying `operations` to the set of tasks at `base_version` gives a set of tasks identical
-/// > to `tasks`.
-///
-/// It is up to the caller (DB) to maintain this invariant.
 pub trait TaskStorage {
     /// Begin a transaction
     fn txn<'a>(&'a mut self) -> Fallible<Box<dyn TaskStorageTxn + 'a>>;
```