Data repository

From ATProto Wiki
Revision as of 19:15, 12 March 2025 by Baldemo.to (talk | contribs) (Page creation)

A data repository (repo) in the AT Protocol is a self-authenticating collection of data published by a single actor, stored in one or more Personal Data Servers (PDSes). Each repository functions as a personal datastore that maintains an actor's content (such as posts, profile information, and social connections) in a cryptographically verifiable structure.

Structure

Repositories are built on a Merkle Search Tree (MST) data structure, which reduces the entire state to a single root hash.

The repository structure consists of three main layers:

  • Commit: The signed root that authenticates the entire repository
  • Tree Nodes: The internal structure organizing the data
  • Records: The actual content (posts, likes, follows, etc.)

Each element in this structure is a DAG-CBOR object referenced by a Content Identifier (CID) hash.
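As a rough illustration of how a CID is derived from an object's bytes, the sketch below builds a CIDv1 string from bytes that are assumed to be DAG-CBOR encoded already. It assumes the common combination of the DAG-CBOR codec, a SHA-256 digest, and lowercase base32 text encoding. The function name cid_for_dag_cbor is hypothetical, and no CBOR encoding is performed here.

```python
import base64
import hashlib

CID_VERSION = 0x01     # CIDv1
CODEC_DAG_CBOR = 0x71  # multicodec code for DAG-CBOR
HASH_SHA2_256 = 0x12   # multihash code for SHA-256
DIGEST_LENGTH = 0x20   # SHA-256 digests are 32 bytes long


def cid_for_dag_cbor(encoded: bytes) -> str:
    """Return a base32 CIDv1 string for bytes assumed to be valid DAG-CBOR.

    Hypothetical helper for illustration; it does not validate the input.
    """
    digest = hashlib.sha256(encoded).digest()
    header = bytes([CID_VERSION, CODEC_DAG_CBOR, HASH_SHA2_256, DIGEST_LENGTH])
    # The multibase prefix "b" marks lowercase base32 without padding.
    text = base64.b32encode(header + digest).decode("ascii").lower().rstrip("=")
    return "b" + text


# The single byte 0xa0 is the CBOR encoding of an empty map, used as a tiny stand-in object.
empty_map_cid = cid_for_dag_cbor(b"\xa0")
```

Because the identifier is derived from the content, any change to the encoded bytes yields a different CID, which is what lets a parent object commit to its children by reference.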

Addressing

Content within repositories can be addressed using AT URIs, which follow a hierarchical pattern:

at://alice.com                           # Repository root
at://alice.com/app.bsky.feed.post        # Collection
at://alice.com/app.bsky.feed.post/1234   # Specific record
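The three forms above can be split into their parts with a few lines of string handling. The following is a minimal sketch, not a full parser: the names AtUri and parse_at_uri are hypothetical, and query strings, fragments, and syntax validation of each segment are deliberately left out.

```python
from typing import NamedTuple, Optional


class AtUri(NamedTuple):
    """Parts of an AT URI; collection and rkey are None when absent."""
    authority: str
    collection: Optional[str]
    rkey: Optional[str]


def parse_at_uri(uri: str) -> AtUri:
    """Split an AT URI into authority, collection, and record key."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    # Allow one to three non-empty segments: authority[/collection[/rkey]].
    if len(parts) > 3 or any(part == "" for part in parts):
        raise ValueError(f"malformed AT URI: {uri!r}")
    collection = parts[1] if len(parts) > 1 else None
    rkey = parts[2] if len(parts) > 2 else None
    return AtUri(parts[0], collection, rkey)


record = parse_at_uri("at://alice.com/app.bsky.feed.post/1234")
```

Here record.authority is "alice.com", record.collection is "app.bsky.feed.post", and record.rkey is "1234".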

Identifiers

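Repositories use several identifier types:

  • DID: The permanent identifier of the account that owns the repository
  • Handle: A human-readable domain name, such as alice.com, that resolves to the DID
  • NSID: A namespaced identifier, such as app.bsky.feed.post, that names a collection and its record type
  • Record key: The final path segment, such as 1234, that identifies a record within a collection
  • CID: A content hash that references a commit, tree node, or record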

Data Integrity

When a repository is updated, the change propagates up the Merkle tree and produces a new root CID. The repository owner's private key then signs a new commit that references this root. As a result, any record can be cryptographically verified as authored by the repository owner, and each commit carries a revision marker that orders it relative to earlier commits.
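The way a single edit changes the root can be sketched with a plain binary Merkle tree. This is a simplification under stated assumptions: it is not the Merkle Search Tree layout, it uses hex SHA-256 digests rather than CIDs, and the signing step is omitted because it needs an elliptic-curve library. The helper names sha256_hex and merkle_root are hypothetical.

```python
import hashlib
from typing import Mapping


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(records: Mapping[str, bytes]) -> str:
    """Reduce all records to one root hash (simplified binary tree, not an MST)."""
    # Each leaf hashes a record path together with its content, in sorted path order.
    level = [
        sha256_hex(path.encode("utf-8") + b"\x00" + body)
        for path, body in sorted(records.items())
    ]
    if not level:
        return sha256_hex(b"")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [
            sha256_hex((level[i] + level[i + 1]).encode("ascii"))
            for i in range(0, len(level), 2)
        ]
    return level[0]


before = {
    "app.bsky.feed.post/1234": b"hello",
    "app.bsky.feed.post/5678": b"world",
}
after = dict(before, **{"app.bsky.feed.post/5678": b"world, edited"})

root_before = merkle_root(before)
root_after = merkle_root(after)  # differs, because one leaf hash changed
```

In a real repository the owner would sign a commit that references the new root, so a verifier only needs the signature and the hashes along one path to check a single record.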

Synchronization

Repositories can be synchronized across multiple PDSes, allowing users to maintain multiple distributed copies of their data.