Relay
A Relay is a core piece of indexing infrastructure in the AT Protocol. Relays aggregate data from across the network into a unified stream called the firehose.
Relays occupy an intermediary role in the AT Protocol: they aggregate data from Personal Data Servers (PDSes) and provide the firehose to downstream services. AppViews, Feed Generators, Labelers, and other application-specific services consume the firehose to build specialized indexes and features.
Core Functions
The primary purpose of a relay is to collect, validate, and redistribute repository data from across the network. It does this by crawling user repositories hosted on PDSes throughout the network and consuming their real-time update streams. The relay verifies the cryptographic signatures and Merkle tree proofs on all updates to ensure data integrity.
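To illustrate the proof-checking step, here is a minimal sketch of Merkle proof verification using a plain binary hash tree. The real protocol uses a Merkle Search Tree over DAG-CBOR records identified by CIDs, so the structure below is a simplification; the function names and the `("sibling", "side")` proof format are illustrative, not part of the AT Protocol API.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_merkle_proof(leaf: bytes, proof: list[tuple[bytes, str]], root: bytes) -> bool:
    """Walk from a leaf up to the root, hashing with the sibling on the
    stated side ('left' or 'right') at each level, and compare to root."""
    node = sha256(leaf)
    for sibling, side in proof:
        node = sha256(sibling + node) if side == "left" else sha256(node + sibling)
    return node == root

# Build a tiny two-leaf tree and verify a proof for the first leaf.
leaf_a, leaf_b = b"record-a", b"record-b"
root = sha256(sha256(leaf_a) + sha256(leaf_b))
assert verify_merkle_proof(leaf_a, [(sha256(leaf_b), "right")], root)
assert not verify_merkle_proof(b"tampered", [(sha256(leaf_b), "right")], root)
```

The key property the relay relies on is the same in both the toy and the real tree: any modification to a record changes the hashes on the path to the root, so a forged update cannot match a signed root commit.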
Importantly, relays themselves do not interpret or index the records in repositories; they simply store and forward them. As a result, developers can create new social experiences on top of the AT Protocol by defining new record types with a lexicon, and these records flow through relays without requiring any changes to relay code.
Each relay maintains its own replica of the repositories it tracks, allowing it to detect changes even when real-time notifications are interrupted. This replication provides resilience against network issues and ensures complete data coverage. When network interruptions occur, relays can periodically re-crawl repositories and compare them against their local replicas to determine which records have been added or deleted.
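The re-crawl comparison described above amounts to diffing two snapshots of a repository. A minimal sketch, assuming records are keyed by path and identified by a content hash (CIDs in the real protocol; the field names here are illustrative):

```python
def diff_repo(replica: dict[str, str], crawled: dict[str, str]):
    """Compare the local replica of a repo against a freshly crawled
    snapshot. Keys are record paths, values are content hashes."""
    added = sorted(k for k in crawled if k not in replica)
    deleted = sorted(k for k in replica if k not in crawled)
    updated = sorted(k for k in crawled if k in replica and crawled[k] != replica[k])
    return added, deleted, updated

replica = {"app.bsky.feed.post/1": "cid-a", "app.bsky.feed.post/2": "cid-b"}
crawled = {"app.bsky.feed.post/2": "cid-b2", "app.bsky.feed.post/3": "cid-c"}
added, deleted, updated = diff_repo(replica, crawled)
```

Because repositories are content-addressed, comparing hashes is sufficient to detect every change made during an outage without replaying the missed events.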
Through continuous monitoring, relays produce a firehose: a consolidated stream of all network activity that notifies subscribers whenever records are added or deleted in any tracked repository. This firehose becomes the foundation for building network-wide services.
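The publish/subscribe pattern behind the firehose can be sketched in-process. This is a toy stand-in, not the real WebSocket-based event stream: the class name, event fields, and callback interface are all illustrative. The one detail borrowed from the real firehose is the monotonically increasing sequence number, which lets subscribers resume from a cursor after a disconnect.

```python
from typing import Callable

class Firehose:
    """Toy in-process firehose: commits from tracked repositories are
    published as events, and every subscriber receives every event."""

    def __init__(self):
        self.seq = 0
        self.subscribers: list[Callable[[dict], None]] = []

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        self.subscribers.append(callback)

    def publish(self, repo: str, op: str, path: str) -> None:
        self.seq += 1  # monotonic cursor so consumers can resume after a gap
        event = {"seq": self.seq, "repo": repo, "op": op, "path": path}
        for cb in self.subscribers:
            cb(event)

seen: list[dict] = []
fh = Firehose()
fh.subscribe(seen.append)
fh.publish("did:plc:alice", "create", "app.bsky.feed.post/1")
fh.publish("did:plc:alice", "delete", "app.bsky.feed.post/1")
```

A downstream AppView or Feed Generator plays the role of the subscriber here, consuming every event and building whatever index its application needs.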
As part of their processing, relays perform initial data cleaning by discarding malformed updates, filtering out illegal content, and reducing high-volume spam, which helps downstream services focus on their specific functions rather than basic data validation.
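Two of those hygiene steps, dropping malformed updates and throttling spam bursts, can be sketched with a simple per-repository sliding-window rate limiter. The field names and thresholds are illustrative assumptions, not the actual relay implementation.

```python
from collections import defaultdict, deque

class UpdateFilter:
    """Sketch of relay-side hygiene: reject events missing required
    fields, and throttle repos exceeding a per-window event budget."""
    REQUIRED = {"repo", "op", "path"}

    def __init__(self, max_per_window: int = 5, window_secs: float = 60.0):
        self.max_per_window = max_per_window
        self.window_secs = window_secs
        self.history: dict[str, deque] = defaultdict(deque)  # repo -> timestamps

    def accept(self, event: dict, now: float) -> bool:
        if not self.REQUIRED <= event.keys():
            return False  # malformed: missing required fields
        ts = self.history[event["repo"]]
        while ts and now - ts[0] > self.window_secs:
            ts.popleft()  # forget events outside the window
        if len(ts) >= self.max_per_window:
            return False  # over budget: likely a spam burst
        ts.append(now)
        return True

f = UpdateFilter(max_per_window=2, window_secs=60.0)
ev = {"repo": "did:plc:bob", "op": "create", "path": "app.bsky.feed.post/1"}
```

Downstream services still apply their own moderation and validation; the relay's filtering is a coarse first pass that keeps obvious junk out of the firehose.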
Architecture
Operational Considerations
Running a relay requires significant resources compared to other AT Protocol services. As of mid-2024, maintaining a real-time copy of all user repositories for a network of 6 million users cost approximately $153 per month in storage and bandwidth alone, not including computational resources for processing and serving the data.
Due to these resource requirements, there are likely to be fewer relays than self-hosted PDSes. However, the protocol is designed to support multiple relays operating independently, such that no single entity has control over data distribution in the network.
Relays can be operated at different scales. Full-network relays track all repositories across the entire network, providing complete coverage. Partial-network relays might focus on specific communities, applications, or regions, reducing resource requirements. Specialized relays could serve particular use cases, such as academic research, brand monitoring, or archive preservation.
Decentralization Benefits
The relay architecture provides several key benefits for the AT Protocol's decentralization goals. No privileged access is required to operate a relay: since repositories are public, anyone can crawl and index them using the same protocols. If a relay operator violates user expectations, whether through censorship or otherwise, alternative relays without those problems can be created.
Client applications can switch between different relays or even combine data from multiple relays to get a more complete view of the network. The separation between data storage (PDSes) and indexing (relays) allows for innovation at both layers without requiring changes to the entire system.
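Combining data from multiple relays reduces to merging their event streams and discarding duplicates. Since each relay assigns its own sequence numbers, deduplication has to key on something relay-independent, such as the commit's content identifier. A minimal sketch, with illustrative `cid` and `time` fields rather than the real event schema:

```python
def merge_relay_streams(*streams: list[dict]) -> list[dict]:
    """Merge events from several relays into one ordered stream,
    dropping duplicates by commit identifier. Relay-local sequence
    numbers are not comparable across relays, so the content
    identifier of the commit is used as the dedup key instead."""
    seen: set[str] = set()
    merged: list[dict] = []
    for stream in streams:
        for event in stream:
            if event["cid"] not in seen:
                seen.add(event["cid"])
                merged.append(event)
    merged.sort(key=lambda e: e["time"])
    return merged

relay_a = [{"cid": "c1", "time": 1}, {"cid": "c2", "time": 2}]
relay_b = [{"cid": "c2", "time": 2}, {"cid": "c3", "time": 3}]
merged = merge_relay_streams(relay_a, relay_b)
```

Because every relay verifies the same signed, content-addressed repositories, a consumer can cross-check overlapping events from independent relays and detect an operator that omits or alters data.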