Collection

From ATProto Wiki
Revision as of 22:16, 14 March 2025

A collection in the AT Protocol is a fundamental organizational framework that groups related records within a user's data repository. Collections serve as namespaces for storing specific types of content to allow for efficient organization, retrieval, and indexing of data across the ATmosphere.

Each record in a user's repository belongs to exactly one collection, which is specified by a Namespaced Identifier (NSID). The collection NSID corresponds to the record's type, creating a direct relationship between the data's structure and its storage location. For example, posts in the Bluesky social application are stored in the app.bsky.feed.post collection.

Collections provide a logical separation of different types of content while maintaining a unified repository structure. This organization enables efficient querying and indexing of specific content types across the network.

Architecture

Structure and Addressing

Within a collection, individual records are identified by a unique key, known as a record key (rkey). The combination of a user's Decentralized Identifier (DID) or handle, a collection NSID, and a record key forms an AT URI that uniquely identifies any record within the network. For example:

at://did:plc:abcdef123456/app.bsky.feed.post/3jui7kdo2ck

This URI points to a specific post (with key 3jui7kdo2ck) in the app.bsky.feed.post collection of the user with the specified DID.
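The three-part structure described above can be illustrated with a minimal parsing sketch. The names AtUri and parse_at_uri are hypothetical and are not part of any official SDK. The sketch handles only record-level URIs of the form shown in the example and does not validate the DID, NSID, or record key syntax.

```python
from typing import NamedTuple


class AtUri(NamedTuple):
    """Hypothetical container for the parts of a record-level AT URI."""
    authority: str   # the user's DID or handle
    collection: str  # the collection NSID
    rkey: str        # the record key


def parse_at_uri(uri: str) -> AtUri:
    """Split at://authority/collection/rkey into its three parts."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    # Require exactly three non-empty segments for a record-level URI.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://authority/collection/rkey, got {uri!r}")
    return AtUri(*parts)


record = parse_at_uri("at://did:plc:abcdef123456/app.bsky.feed.post/3jui7kdo2ck")
print(record.authority, record.collection, record.rkey)
```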

Record keys are typically generated as Timestamp Identifiers (TIDs), which embed creation time information, though other formats are supported for specific use cases. The key format requirements are defined by the lexicon for each collection's record type.
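How a TID embeds its creation time can be sketched as follows. This assumes the layout given in the AT Protocol TID specification as understood here: a 64-bit integer holding a 53-bit microsecond timestamp followed by a 10-bit clock identifier, written as 13 characters in a sortable base32 alphabet. The function names are hypothetical, and the values used are synthetic rather than taken from a real record.

```python
from typing import Tuple

# Sortable base32 alphabet (assumed from the TID specification).
B32_SORTABLE = "234567abcdefghijklmnopqrstuvwxyz"
TID_LENGTH = 13


def encode_tid(timestamp_us: int, clock_id: int) -> str:
    """Pack a microsecond timestamp and a clock ID into a 13-character TID."""
    if not 0 <= timestamp_us < (1 << 53):
        raise ValueError("timestamp must fit in 53 bits")
    if not 0 <= clock_id < (1 << 10):
        raise ValueError("clock ID must fit in 10 bits")
    value = (timestamp_us << 10) | clock_id
    chars = []
    for _ in range(TID_LENGTH):
        chars.append(B32_SORTABLE[value & 0x1F])  # lowest 5 bits first
        value >>= 5
    return "".join(reversed(chars))


def decode_tid(tid: str) -> Tuple[int, int]:
    """Recover (timestamp_us, clock_id) from a TID string."""
    if len(tid) != TID_LENGTH:
        raise ValueError("a TID is exactly 13 characters long")
    value = 0
    for ch in tid:
        # str.index raises ValueError for characters outside the alphabet.
        value = (value << 5) | B32_SORTABLE.index(ch)
    return value >> 10, value & 0x3FF


tid = encode_tid(1_700_000_000_000_000, 5)
print(tid, decode_tid(tid))
```

Because the timestamp occupies the high bits and the alphabet is in ascending ASCII order, TIDs generated later sort after earlier ones as plain strings.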

Indexing and Querying

Collections are an indispensable part of the network's indexing infrastructure. Relays and AppViews use collection NSIDs to filter and process specific types of records from the firehose of network activity. This allows them to build specialized indices and provide efficient query capabilities for particular content types. For example, an AppView might specifically index the app.bsky.feed.post and app.bsky.feed.like collections to build a timeline view, while ignoring other collections that aren't relevant to that particular feature.
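The filtering step in the timeline example can be sketched as below. The event shape (a dictionary with "action" and "path" keys, where the path is "collection/rkey") is a simplified, hypothetical stand-in for decoded firehose operations, not the actual wire format. The record keys in the sample data are made up.

```python
from typing import Dict, Iterable, Iterator, Set

# Collections a hypothetical timeline indexer cares about.
WANTED_COLLECTIONS = {"app.bsky.feed.post", "app.bsky.feed.like"}


def filter_by_collection(
    events: Iterable[Dict[str, str]], wanted: Set[str]
) -> Iterator[Dict[str, str]]:
    """Yield only events whose record path falls in a wanted collection."""
    for event in events:
        # The collection NSID is everything before the first slash.
        collection = event["path"].split("/", 1)[0]
        if collection in wanted:
            yield event


# Synthetic events for illustration only.
sample_events = [
    {"action": "create", "path": "app.bsky.feed.post/3jui7kdo2ck"},
    {"action": "create", "path": "app.bsky.graph.follow/3jui7kdo2cl"},
    {"action": "create", "path": "app.bsky.feed.like/3jui7kdo2cm"},
]

for event in filter_by_collection(sample_events, WANTED_COLLECTIONS):
    print(event["path"])
```

In this sketch the follow record is skipped because its collection is not in the wanted set, which mirrors how an indexer can ignore collections irrelevant to a feature.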

Data Portability

The collection-based organization of repositories facilitates data portability within the AT Protocol. When users migrate between Personal Data Servers (PDSes), their entire repository, including all collections and records, can be exported and imported as a unit, preserving the organizational structure and relationships between records.

Security and Access Control

Collections provide a natural boundary for access control policies. While most collections in the current AT Protocol implementation contain public records, the architecture supports future extensions for collection-level privacy settings. For example, future implementations might include private or group-restricted collections that are only visible to specified users or groups, enabling private messaging and selective sharing features.

Repository Implementation

In the underlying data repository implementation, collections are represented as paths within the Merkle Search Tree (MST) data structure. The MST organizes records hierarchically, with collection NSIDs forming part of the path for each record.
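The effect of using the collection NSID as part of each record's path can be illustrated with a small sketch. It assumes that a record's key in the tree is the collection NSID and the record key joined by a slash, and it models the tree as a plain sorted list. The actual MST node layout and hashing are omitted entirely, and the function names and record keys are hypothetical.

```python
from bisect import bisect_left
from typing import List


def mst_key(collection: str, rkey: str) -> str:
    """Build the path under which a record is stored (assumed format)."""
    return f"{collection}/{rkey}"


def records_in_collection(sorted_keys: List[str], collection: str) -> List[str]:
    """Return the contiguous slice of keys belonging to one collection."""
    start = bisect_left(sorted_keys, collection + "/")
    # "0" is the character immediately after "/" in ASCII, so this bound
    # ends the range of keys that begin with "<collection>/".
    end = bisect_left(sorted_keys, collection + "0")
    return sorted_keys[start:end]


# Synthetic record keys for illustration only.
keys = sorted([
    mst_key("app.bsky.feed.post", "3jui7kdo2ck"),
    mst_key("app.bsky.graph.follow", "3jui7kdo2cl"),
    mst_key("app.bsky.feed.like", "3jui7kdo2cm"),
    mst_key("app.bsky.feed.post", "3jui7kdo2cn"),
])

print(records_in_collection(keys, "app.bsky.feed.post"))
```

Because keys sort by their full path, all records of one collection sit next to each other, so listing a collection amounts to a range scan over the sorted keys.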

When a PDS receives a request to create or update a record, it validates that the record's $type field matches the NSID of the collection where the record is being stored. This ensures type consistency within collections and prevents misplaced records.
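The type check described above can be sketched in a few lines. The function name validate_record_type is hypothetical, and the sketch covers only the comparison between $type and the collection NSID. Any further schema validation a real PDS performs against the lexicon is out of scope here.

```python
from typing import Any, Dict


def validate_record_type(collection: str, record: Dict[str, Any]) -> None:
    """Raise ValueError unless the record's $type equals the collection NSID."""
    record_type = record.get("$type")
    if record_type != collection:
        raise ValueError(
            f"record $type {record_type!r} does not match collection {collection!r}"
        )


# A post record written to the post collection passes the check.
post = {"$type": "app.bsky.feed.post", "text": "hello"}
validate_record_type("app.bsky.feed.post", post)

# Writing the same record to a different collection is rejected.
try:
    validate_record_type("app.bsky.feed.like", post)
except ValueError as err:
    print(err)
```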