TaskChampion

TaskChampion implements the task storage and synchronization behind Taskwarrior. It includes an implementation with Rust and C APIs, allowing any application to maintain and manipulate its own replica. It also includes a specification for tasks and how they are synchronized, inviting alternative implementations of replicas or task servers.

Relationship with Taskwarrior

TaskChampion was originally developed as a project "inspired by" Taskwarrior, and later incorporated into Taskwarrior in its 3.0 release. Taskwarrior depends on TaskChampion, but does not have any kind of privileged access to its implementation details. Any other application can also use TaskChampion to implement similar functionality, and even interoperate with Taskwarrior either in the same replica or via sync.

Usage

Conceptual Overview

The following provide a brief overview of the core concepts in TaskChampion. Subsequent chapters, and API documentation, provide more detail.

Replica

A TaskChampion replica is a local copy of a user’s task data. As the name suggests, several replicas of the same data can exist (such as on a user’s laptop and on their phone) and can synchronize with one another.

A replica contains a collection of tasks, indexed by UUID. It also stores a working set, and ancillary information to support synchronization.

Task

A task is the unit of work that TaskChampion tracks. A task is represented as a map of strings to strings. The meaning of those strings are given in the task model.

Working Set

A working set contains, roughly, the tasks that are currently pending. It assigns a short, integer identifier to each such task, which is easier for users to remember and type. The working set can be "rebuilt" as the task list changes, updating the identifiers for some tasks.

Storage

Storage defines where and how tasks are stored.

Server

A server supports synchronizing tasks among several replicas. This may refer to an instance of taskchampion-sync-server or a number of other options.

APIs

Rust

TaskChampion is implemented in Rust, and this implementation represents its primary public API. It is documented at docs.rs/taskchampion.

Python

TaskChampion is available from Python via taskchampion-py. The Python API is closely modeled on the Rust API.

C/C++

C/C++ users should use a tool like cxx to wrap this package. See Taskwarrior for an example.

Task Model

Tasks are stored internally as a key/value map with string keys and values. Display layers should apply appropriate defaults where necessary.

Validity

Any key/value map is a valid task, including an empty task. Consumers of task data must make a best effort to interpret any map, even if it contains apparently contradictory information. For example, a task with status "completed" but no "end" key present should be interpreted as completed at an unknown time.

Atomicity

Replicas only synchronize with one another occasionally, so it is impossible to know the "current" state of a task with certainty. This makes some kinds of modifications challenging. For example, suppose task tags were updated by reading a list of tags from a property of the key/value map, adding a tag, and writing the result back. Suppose two such modifications are made in different replicas, one setting tags to "oldtag,newtag1" and one setting tags to "oldtag,newtag2". Reconciling these two changes on a sync operation would result in one change winning, losing one of the new tags.

The key names given below avoid this issue, allowing user updates such as adding a tag or deleting a dependency to be represented in a single modification.

Value Representations

Integers are stored in decimal notation.

Timestamps are stored as UNIX epoch timestamps, in the form of an integer.

Keys

The following keys, and key formats, are defined:

status - one of pending for a pending task (the default), completed, deleted, or recurring
description - the one-line summary of the task
modified - the time of the last modification of this task
start - the most recent time at which this task was started (a task with no start key is not active)
end - if present, the time at which this task was completed or deleted (note that this key may not agree with status: it may be present for a pending task, or absent for a deleted or completed task)
tag_<tag> - indicates this task has tag <tag> (value is ignored)
wait - indicates the time before which this task should be hidden, as it is not actionable
entry - the time at which the task was created
annotation_<timestamp> - value is an annotation created at the given time; for example, annotation_1693329505.
dep_<uuid> - indicates this task depends on another task identified by <uuid>; the value is ignored; for example, dep_8c4fed9c-c0d2-40c2-936d-36fc44e084a0

Note that while TaskChampion recognizes "R" as a status, it does not implement recurrence directly.

UDAs

Any unrecognized keys are treated as "user-defined attributes" (UDAs). These attributes can be used to store additional data associated with a task. For example, applications that synchronize tasks with other systems such as calendars or team planning services might store unique identifiers for those systems as UDAs. The application defining a UDA defines the format of the value.

Synchronization and the Sync Server

This section covers synchronization of replicas containing the same set of tasks. A replica can perform all operations locally without connecting to a sync server, then share those operations with other replicas when it connects. Sync is a critical feature of TaskChampion, allowing users to consult and update the same task list on multiple devices, without requiring constant connection.

This is a complex topic, and the section is broken into several chapters, beginning at the lower levels of the implementation and working up.

Synchronization Model

Synchronization occurs between disconnected replicas, mediated by a server. The replicas never communicate directly with one another. The server does not have access to the task data; it sees only opaque blobs of data with a small amount of metadata.

Operational Transforms

Synchronization is based on operational transformation. This section will assume some familiarity with the concept.

State and Operations

At a given time, the set of tasks in a replica's storage is the essential "state" of that replica. All modifications to that state occur via operations, as defined below. We can draw a network, or graph, with the nodes representing states and the edges representing operations. For example:

  o -- State: {abc-d123: 'get groceries', priority L}
  |
  | -- Operation: set abc-d123 priority to H
  |
  o -- State: {abc-d123: 'get groceries', priority H}

For those familiar with distributed version control systems, a state is analogous to a revision, while an operation is analogous to a commit.

Fundamentally, synchronization involves all replicas agreeing on a single, linear sequence of operations and the state that those operations create. Since the replicas are not connected, each may have additional operations that have been applied locally, but which have not yet been agreed on. The synchronization process uses operational transformation to "linearize" those operations.

This process is analogous (vaguely) to rebasing a sequence of Git commits. Critically, though, operations cannot merge; in effect, the operations must be rebased. Furthermore, once an operation has been sent to the server it cannot be changed; in effect, the server does not permit "force push".

Sync Operations

The operations are:

Create(uuid)
Delete(uuid)
Update(uuid, property, value, timestamp)

The Create form creates a new task. Creating a task that already exists has no effect.

Similarly, the Delete form deletes an existing task. Deleting a task that does not exist has no effect.

The Update form updates the given property of the given task, where the property and values are strings. The property is the property being updated, and the value gives its new value (or None to delete a property). The timestamp on updates serves as additional metadata and is used to resolve conflicts. Updating a task that does not exist has no effect.

Versions

Occasionally, database states are given a name (that takes the form of a UUID). The system as a whole (all replicas) constructs a branch-free sequence of versions and the operations that separate each version from the next. The version with the nil UUID is implicitly the empty database.

The server stores the operations to change a state from a "parent" version to a "child" version, and provides that information as needed to replicas. Replicas use this information to update their local task databases, and to generate new versions to send to the server.

In order to transmit new changes to the server, replicas generate a new version. The changes are represented as a sequence of operations with the state resulting from the final operation corresponding to the version. In order to keep the versions in a single sequence, the server will only accept a proposed version from a replica if its parent version matches the latest version on the server.

In the non-conflict case (such as with a single replica), then, a replica's synchronization process involves gathering up the operations it has accumulated since its last synchronization; bundling those operations into a version; and sending that version to the server.

Replica Invariant

A replica stores the current state of all tasks, a sequence of as-yet un-synchronized operations, and the last version at which synchronization occurred (the "base version"). The replica's un-synchronized operations are already reflected in its local tasks, so the following invariant holds:

Applying the replica's sequence of operations to the set of tasks at the base version gives a set of tasks identical to the replica's current state.

Transformation

When the latest version on the server contains operations that are not present in the replica, then the states have diverged. For example, here the replica has local operations a-b, but the server has a new version with operations w-z:

  o  -- version N
 w|\a
  o o
 x|  \b
  o   o
 y|    \c
  o     o -- replica's local state
 z|
  o -- version N+1

(diagram notation: o designates a state, lower-case letters designate operations, and versions are presented as if they were numbered sequentially)

In this situation, the replica must "rebase" the local operations onto the latest version from the server and try again. This process is performed using operational transformation (OT). The result of this transformation is a sequence of operations based on the latest version, and a sequence of operations the replica can apply to its local task database to reach the same state Continuing the example above, the resulting operations are shown with ':

  o  -- version N
 w|\a
  o o
 x|  \b
  o   o
 y|    \c
  o     o -- replica's intermediate local state
 z|     |w'
  o-N+1 o
 a'\    |x'
    o   o
   b'\  |y'
      o o
     c'\|z'
        o  -- version N+2

The replica applies w' through z' locally, and sends a' through c' to the server as the operations to generate version N+2. Either path through this graph, a-b-c-w'-x'-y'-z' or a'-b'-c'-w-x-y-z, must generate precisely the same final state at version N+2. Careful selection of the operations and the transformation function ensures this.

See the comments in the source code for the details of how this transformation process is implemented.

Synchronization Process

To perform a synchronization, the replica first requests the child version of its stored base version from the server (GetChildVersion). If such a version exists, it applies the transformation described above, resulting in an updated state and an updated list of local operations. The replica repeats this process until the server indicates no additional child versions exist. If there are no un-synchronized local operations, the process is complete.

Otherwise, the replica creates a new version containing its local operations, giving its base version as the parent version, and transmits that to the server (AddVersion). In most cases, this will succeed, but if another replica has created a new version in the interim, then the new version will conflict with that other replica's new version and the server will respond with the new expected parent version. In this case, the entire process repeats. If the server indicates a conflict twice with the same expected base version, that is an indication that the replica has diverged (something serious has gone wrong).

Servers

A replica depends on periodic synchronization for performant operation. Without synchronization, its list of pending operations would grow indefinitely, and tasks could never be expired. So all replicas, even "singleton" replicas which do not replicate task data with any other replica, should synchronize periodically.

TaskChampion provides a LocalServer for this purpose. It implements the get_child_version and add_version operations as described, storing data on-disk locally.

Snapshots

The basic synchronization model described in the previous page has a few shortcomings:

servers must store an ever-increasing quantity of versions
a new replica must download all versions since the beginning (the nil UUID) in order to derive the current state

Snapshots allow TaskChampion to avoid both of these issues. A snapshot is a copy of the task database at a specific version. It is created by a replica, encrypted, and stored on the server. A new replica can simply download a recent snapshot and apply any additional versions synchronized since that snapshot was made. Servers can delete and reclaim space used by older versions, as long as newer snapshots are available.

Snapshot Heuristics

A server implementation must answer a few questions:

How often should snapshots be made?
When can versions be deleted?
When can snapshots be deleted?

A critical invariant is that at least one snapshot must exist for any database that does not have a child of the nil version. This ensures that a new replica can always derive the latest state.

Aside from that invariant, the server implementation can vary in its answers to these questions, with the following considerations:

Snapshots should be made frequently enough that a new replica can initialize quickly.

Existing replicas will fail to synchronize if they request a child version that has been deleted. This failure can cause data loss if the replica had local changes. It's conceivable that replicas may not sync for weeks or months if, for example, they are located on a home computer while the user is on holiday.

Requesting New Snapshots

The server requests snapshots from replicas, indicating an urgency for the request. Some replicas, such as those running on PCs or servers, can produce a snapshot even at low urgency. Other replicas, in more restricted environments such as mobile devices, will only produce a snapshot at high urgency. This saves resources in these restricted environments.

A snapshot must be made on a replica with no unsynchronized operations. As such, it only makes sense to request a snapshot in response to a successful AddVersion request.

Server-Replica Protocol

The server-replica protocol is defined abstractly in terms of request/response transactions.

The protocol builds on the model presented in the previous chapters, and in particular on the synchronization process.

Clients

From the protocol's perspective, replicas accessing the same task history are indistinguishable, so this protocol uses the term "client" to refer generically to all replicas replicating a single task history.

Server

A server implements the requests and responses described below. Where the logic is implemented depends on the specific implementation of the protocol.

For each client, the server is responsible for storing the task history, in the form of a branch-free sequence of versions. It also stores the latest snapshot, if any exists. From the server's perspective, snapshots and versions are opaque byte sequences.

Version Invariant

The following invariant must always hold:

All versions are linked by parent-child relationships to form a single chain. That is, each version must have no more than one parent and one child, and no more than one version may have zero parents or zero children.

Data Formats

Task data sent to the server is encrypted by the client, using the scheme described in the "Encryption" chapter.

Version

The decrypted form of a version is a JSON array containing operations in the order they should be applied. Each operation has the form {TYPE: DATA}, for example:

[{"Create":{"uuid":"56e0be07-c61f-494c-a54c-bdcfdd52d2a7"}}]
[{"Delete":{"uuid":"56e0be07-c61f-494c-a54c-bdcfdd52d2a7"}}]
[{"Update":{"uuid":"56e0be07-c61f-494c-a54c-bdcfdd52d2a7","property":"prop","value":"v","timestamp":"2021-10-11T12:47:07.188090948Z"}}]
[{"Update":{"uuid":"56e0be07-c61f-494c-a54c-bdcfdd52d2a7","property":"prop","value":null,"timestamp":"2021-10-11T12:47:07.188090948Z"}}] (to delete a property)

Timestamps are in RFC3339 format with a Z suffix.

Snapshot

The decrypted form of a snapshot is a JSON object mapping task UUIDs to task properties. For example (pretty-printed for clarity):

{
 "56e0be07-c61f-494c-a54c-bdcfdd52d2a7": {
   "description": "a task",
   "priority": "H"
 },
 "4b7ed904-f7b0-4293-8a10-ad452422c7b3": {
   "description": "another task"
 }
}

Transactions

All interactions between the client and server are defined in terms of request/response transactions, as described here.

AddVersion

The AddVersion transaction requests that the server add a new version to the client's task history. The request contains the following;

parent version ID, and
encrypted version data.

The server determines whether the new version is acceptable, atomically with respect to other requests for the same client. If it has no versions for the client, it accepts the version. If it already has one or more versions for the client, then it accepts the version only if the given parent version has no children, thereby maintaining the version invariant.

If the version is accepted, the server generates a new version ID for it. The version is added to the chain of versions for the client, and the new version ID is returned in the response to the client. The response may also include a request for a snapshot, with associated urgency.

If the version is not accepted, the server makes no changes, but responds to the client with a conflict indication containing the ID of the version which has no children. The client may then "rebase" its operations and try again. Note that if a client receives two conflict responses with the same parent version ID, it is an indication that the client's version history has diverged from that on the server.

GetChildVersion

The GetChildVersion transaction is a read-only request for a version. The request consists of a parent version ID. The server searches its set of versions for a version with the given parent ID. If found, it returns the version's

version ID,
parent version ID (matching that in the request), and
encrypted version data.

If found, it returns a not-found indication. Note that this circumstance is not an error, and occurs during every successful sync process.

Implementations may return a different error code if the version is "gone" (not found and would not be accepted by AddVersion) in order to provide better error messages to the user. This may occur if a client has not been synced for a long time and its base version has expired, or if the server storage has been corrupted.

Note that GetChildVersion should return the not-found when no versions exist. This allows a client to switch to a new server and continue uploading the same sequence of versions.

AddSnapshot

The AddSnapshot transaction requests that the server store a new snapshot, generated by the client. The request contains the following:

version ID at which the snapshot was made, and
encrypted snapshot data.

The server may validate that the snapshot is for an existing version and is newer than any existing snapshot. It may also validate that the snapshot is for a "recent" version (e.g., one of the last 5 versions). If a snapshot already exists for the given version, the server may keep or discard the new snapshot but should return a success indication to the client.

The server response is empty.

GetSnapshot

The GetSnapshot transaction requests that the server provide the latest snapshot. The response contains the snapshot version ID and the snapshot data, if those exist.

Encryption

The client configuration includes an encryption secret of arbitrary length. This section describes how that information is used to encrypt and decrypt data sent to the server (versions and snapshots).

Encryption is not used for local (on-disk) sync, but is used for all cases where data is sent from the local host.

Key Derivation

The client derives the 32-byte encryption key from the configured encryption secret using PBKDF2 with HMAC-SHA256 and 600,000 iterations. The salt value depends on the implementation of the protocol, as described in subsequent chapters.

Encryption

The client uses AEAD, with algorithm CHACHA20_POLY1305. The client should generate a random nonce, noting that AEAD is not secure if a nonce is used repeatedly for the same key.

AEAD supports additional authenticated data (AAD) which must be provided for both open and seal operations. In this protocol, the AAD is always 17 bytes of the form:

app_id (byte) - always 1
version_id (16 bytes) - 16-byte form of the version ID associated with this data
- for versions (AddVersion, GetChildVersion), the parent version_id
- for snapshots (AddSnapshot, GetSnapshot), the snapshot version_id

The app_id field is for future expansion to handle other, non-task data using this protocol. Including it in the AAD ensures that such data cannot be confused with task data.

Although the AEAD specification distinguishes ciphertext and tags, for purposes of this specification they are considered concatenated into a single bytestring as in BoringSSL's EVP_AEAD_CTX_seal.

Representation

The final byte-stream is comprised of the following structure:

version (byte) - format version (always 1)
nonce (12 bytes) - encryption nonce
ciphertext (remaining bytes) - ciphertext from sealing operation

The version field identifies this data format, and future formats will have a value other than 1 in this position.

HTTP Representation

The transactions in the sync protocol are realized for an HTTP server at <base_url> using the HTTP requests and responses described here. The base_url should be an HTTPS endpoint on general principle, but nothing in the functionality or security of the protocol depends on connection encryption.

The replica identifies itself to the server using a client_id in the form of a UUID. This value is passed with every request in the X-Client-Id header, in its dashed-hex format.

The salt used in key derivation is the 16-byte client ID.

AddVersion

The request is a POST to <base_url>/v1/client/add-version/<parentVersionId>. The request body contains the history segment, optionally encoded using any encoding supported by actix-web. The content-type must be application/vnd.taskchampion.history-segment.

The success response is a 200 OK with an empty body. The new version ID appears in the X-Version-Id header. If included, a snapshot request appears in the X-Snapshot-Request header with value urgency=low or urgency=high.

On conflict, the response is a 409 CONFLICT with an empty body. The expected parent version ID appears in the X-Parent-Version-Id header.

Other error responses (4xx or 5xx) may be returned and should be treated appropriately to their meanings in the HTTP specification.

GetChildVersion

The request is a GET to <base_url>/v1/client/get-child-version/<parentVersionId>.

The response is determined as described above. The not-found response is 404 NOT FOUND. The gone response is 410 GONE. Neither has a response body.

On success, the response is a 200 OK. The version's history segment is returned in the response body, with content-type application/vnd.taskchampion.history-segment. The version ID appears in the X-Version-Id header. The requested parent version ID appears in the X-Parent-Version-Id header. The response body may be encoded, in accordance with any Accept-Encoding header in the request.

On failure, a client should treat a 404 NOT FOUND as indicating that it is up-to-date. Clients should treat a 410 GONE as a synchronization error. If the client has pending changes to send to the server, based on a now-removed version, then those changes cannot be reconciled and will be lost. The client should, optionally after consulting the user, download and apply the latest snapshot.

AddSnapshot

The request is a POST to <base_url>/v1/client/add-snapshot/<versionId>. The request body contains the snapshot data, optionally encoded using any encoding supported by actix-web. The content-type must be application/vnd.taskchampion.snapshot.

If the version is invalid, as described above, the response should be 400 BAD REQUEST. The server response should be 200 OK on success.

GetSnapshot

The request is a GET to <base_url>/v1/client/snapshot.

The response is a 200 OK. The snapshot is returned in the response body, with content-type application/vnd.taskchampion.snapshot. The version ID appears in the X-Version-Id header. The response body may be encoded, in accordance with any Accept-Encoding header in the request.

After downloading and decrypting a snapshot, a client must replace its entire local task database with the content of the snapshot. Any local operations that had not yet been synchronized must be discarded. After the snapshot is applied, the client should begin the synchronization process again, starting from the snapshot version.

Object Store Representation

TaskChampion also supports use of a generic key-value store to synchronize replicas.

In this case, the salt used in key derivation is a random 16-byte value, stored in the object store and retrieved as needed.

The details of the mapping from this protocol to keys and values are private to the implementation. Other applications should not access the key-value store directly.

Internal Details

The following sections get into the details of how TaskChampion works. None of this information is necessary to use TaskChampion, but might be helpful in understanding its behavior. Developers of TaskChampion and of tools that integrate with TaskChampion should be familiar with this information.

Replica Storage

Each replica has a storage backend. The interface for this backend is given in crate::taskstorage::Storage and StorageTxn.

The storage is transaction-protected, with the expectation of a serializable isolation level. The storage contains the following information:

tasks: a set of tasks, indexed by UUID
base_version: the number of the last version sync'd from the server (a single integer)
operations: all operations performed since base_version
working_set: a mapping from integer -> UUID, used to keep stable small-integer indexes into the tasks for users' convenience. This data is not synchronized with the server and does not affect any consistency guarantees.

Tasks

The tasks are stored as an un-ordered collection, keyed by task UUID. Each task in the database is represented by a key-value map. See Tasks for details on the content of that map.

Operations

Every change to the task database is captured as an operation. In other words, operations act as deltas between database states. Operations are crucial to synchronization of replicas, described in Synchronization Model.

Operations are entirely managed by the replica, and some combinations of operations are described as "invalid" here. A replica must not create invalid operations, but should be resilient to receiving invalid operations during the synchronization process.

Each operation has one of the forms

Create(uuid)
Delete(uuid, oldTask)
Update(uuid, property, oldValue, newValue, timestamp)
UndoPoint()

The Create form creates a new task. It is invalid to create a task that already exists.

Similarly, the Delete form deletes an existing task. It is invalid to delete a task that does not exist. The oldTask property contains the task data from before it was deleted.

The Update form updates the given property of the given task, where the property and values are strings. The oldValue gives the old value of the property (or None to create a new property), while newValue gives the new value (or None to delete a property). It is invalid to update a task that does not exist. The timestamp on updates serves as additional metadata and is used to resolve conflicts.

Application

Each operation can be "applied" to a task database in a natural way:

Applying Create creates a new, empty task in the task database.
Applying Delete deletes a task, including all of its properties, from the task database.
Applying Update modifies the properties of a task.
Applying UndoPoint does nothing.

Undo

Each operation also contains enough information to reverse its application:

Undoing Create deletes a task.
Undoing Delete creates a task, including all of the properties in oldTask.
Undoing Update modifies the properties of a task, reverting to oldValue.
Undoing UndoPoint does nothing.

The UndoPoint operation serves as a marker of points in the operation sequence to which the user might wish to undo. For example, creation of a new task with several properties involves several operations, but is a single step from the user's perspective. An "undo" command reverses operations, removing them from the operations sequence, until it reaches an UndoPoint operation.

Synchronizing Operations

After operations are synchronized to the server, they can no longer be undone. As such, the synchronization model uses simpler operations. Replica operations are converted to sync operations as follows:

Create(uuid) -> Create(uuid) (no change)
Delete(uuid, oldTask) -> Delete(uuid)
Update(uuid, property, oldValue, newValue, timestamp) -> Update(uuid, property, newValue, timestamp)
UndoPoint() -> Ø (dropped from operation sequence)

Storage

The storage backend stores all operations that apply to existing tasks, tracking which have and have not been synchronized. Only unsynchronized operations are used in the synchronization process. Synchronized operations are kept as a log of changes to the relevant tasks. Synchronized UndoPoint operations are not stored.

Task Database

The task database is a layer of abstraction above the replica storage layer, responsible for maintaining some important invariants. While the storage is pluggable, there is only one implementation of the task database.

Reading Data

The task database provides read access to the data in the replica's storage through a variety of methods on the struct. Each read operation is executed in a transaction, so data may not be consistent between read operations. In practice, this is not an issue for TaskChampion's purposes.

Working Set

The task database maintains the working set. The working set maps small integers to current tasks, for easy reference by command-line users. This is done in such a way that the task numbers remain stable until the working set is rebuilt, at which point gaps in the numbering, such as for completed tasks, are removed by shifting all higher-numbered tasks downward.

The working set is not replicated, and is not considered a part of any consistency guarantees in the task database.

Modifying Data

Modifications to the data set are made by applying operations. Operations are described in Replica Storage.

Each operation is added to the list of operations in the storage, and simultaneously applied to the tasks in that storage. Operations are checked for validity as they are applied.

Deletion and Expiration

Deletion of a task merely changes the task's status to "deleted", leaving it in the Task database. Actual removal of tasks from the task database takes place as part of expiration. TaskChampion does not automatically expire tasks, although applications using TaskChampion, such as Taskwarrior, may do so. The expiration process removes tasks with a modified property more than 180 days in the past, by creating a Delete(uuid) operation.