Personal Data Server (PDS)

From ATProto Wiki

A Personal Data Server (PDS) is the main entry point and digital home of a user within the AT Protocol. A PDS stores each user's data repository and blobs, manages user identity, and provides the APIs necessary for data queries, cryptographic signing, and other interactions with the broader network. Each PDS also provides an update stream for its data repositories, which relays crawl in order to broadcast new records in their firehoses.

Core Functions

Repository Hosting and Management

The primary function of a PDS is hosting and managing public AT Protocol repositories. The PDS maintains the Merkle Search Tree (MST) data structure that stores user content, handling mutations and generating diffs between repository versions. These diffs are streamed as real-time updates to relays via WebSockets, allowing the network to efficiently aggregate and broadcast updates to user repositories.
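The idea of a diff between two repository versions can be illustrated with a minimal sketch. It assumes each version is flattened into a plain mapping from record path to content identifier (CID); a real PDS walks the MST itself rather than comparing flat dictionaries. The names `Op` and `diff_repo`, and the `com.example.post` collection, are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Op(NamedTuple):
    action: str          # "create", "update", or "delete"
    path: str            # record path, "<collection>/<record key>"
    cid: Optional[str]   # new content identifier; None for a delete


def diff_repo(old: Dict[str, str], new: Dict[str, str]) -> List[Op]:
    """Compare two flat path -> CID snapshots and list the changes.

    Assumption: snapshots are plain dicts. This is a simplification of
    the tree-based comparison an MST allows.
    """
    ops: List[Op] = []
    for path in sorted(old.keys() | new.keys()):
        before, after = old.get(path), new.get(path)
        if before is None:
            ops.append(Op("create", path, after))
        elif after is None:
            ops.append(Op("delete", path, None))
        elif before != after:
            ops.append(Op("update", path, after))
    return ops


old_version = {"com.example.post/a": "cid1", "com.example.post/b": "cid2"}
new_version = {"com.example.post/a": "cid1x", "com.example.post/c": "cid3"}
changes = diff_repo(old_version, new_version)
```

Only the resulting list of operations would need to be sent to subscribers, which is why diffs are cheaper to stream than full repository snapshots.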

The PDS validates all records against their lexicon schemas before adding them to the repository to ensure data integrity. It also provides a repository event stream that notifies subscribers of changes, including sequence numbers for ordering and a backfill period for catching up after disconnections.
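The interplay of sequence numbers and backfill can be sketched as follows. This is an in-memory illustration, not the actual stream implementation: the class `EventLog` is hypothetical, and the backfill window is assumed to be a fixed number of retained events rather than a time period.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class EventLog:
    """Hypothetical in-memory event log with a bounded backfill window."""

    def __init__(self, backfill_size: int) -> None:
        # Assumption: the window is counted in events; the oldest events
        # fall out automatically once the deque is full.
        self._events: Deque[Tuple[int, str]] = deque(maxlen=backfill_size)
        self._next_seq = 1

    def emit(self, payload: str) -> int:
        """Append an event and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._events.append((seq, payload))
        return seq

    def replay(self, cursor: Optional[int]) -> List[Tuple[int, str]]:
        """Return retained events newer than the cursor.

        A cursor of None means the subscriber wants only live events.
        """
        if cursor is None:
            return []
        return [(seq, p) for seq, p in self._events if seq > cursor]


log = EventLog(backfill_size=3)
for payload in ["a", "b", "c", "d"]:
    log.emit(payload)
```

A subscriber that reconnects with the last sequence number it saw receives everything newer that is still retained; events older than the window (here, event 1) can no longer be replayed.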

For data portability, the PDS supports importing and exporting repositories as CAR files, allowing users to migrate between providers without losing their content or social connections.

Account and Identity Management

PDSes handle the complete lifecycle of user accounts, from creation through deletion or migration. They manage account security through email verification, password reset flows, and email change processes.

For AT Protocol identities, PDSes resolve and maintain the connection between a user's handle and Decentralized Identifier (DID), and often manage a default handle namespace (base domain) for their users.
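The handle-to-DID connection is checked in both directions: the handle must resolve to a DID, and that DID's document must list the handle in return. The sketch below illustrates the check using stub dictionaries in place of real network lookups. The function names and all handles and DIDs are made up for illustration; the only protocol detail assumed is that a DID document lists handles under `alsoKnownAs` as `at://` URIs.

```python
from typing import Dict, Optional

# Stub data standing in for network lookups (all values are made up).
HANDLE_TO_DID: Dict[str, str] = {
    "alice.example.com": "did:plc:abc123",
    # This handle points at Alice's DID, but her DID document does not list it.
    "mallory.example.com": "did:plc:abc123",
}
DID_DOCUMENTS: Dict[str, dict] = {
    "did:plc:abc123": {
        "id": "did:plc:abc123",
        "alsoKnownAs": ["at://alice.example.com"],
    },
}


def resolve_handle(handle: str) -> Optional[str]:
    """Hypothetical stand-in for resolving a handle to a DID."""
    return HANDLE_TO_DID.get(handle.lower())


def verify_handle(handle: str) -> Optional[str]:
    """Return the DID only if the handle and the DID document agree."""
    did = resolve_handle(handle)
    if did is None:
        return None
    doc = DID_DOCUMENTS.get(did)
    if doc is None or f"at://{handle.lower()}" not in doc.get("alsoKnownAs", []):
        return None
    return did
```

Requiring agreement in both directions prevents someone from pointing an arbitrary domain at another user's DID and having it accepted as that user's handle.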

PDSes handle the user's cryptographic keys - both the AT Protocol signing key used to authenticate repository changes and the PLC rotation key used for identity operations. They implement the necessary cryptographic operations to generate keys, sign data, and validate signatures according to the protocol's specifications.

The PDS also handles identity operations through the PLC system, including handle changes and identity verification. When these changes occur, the PDS outputs identity and account events on its repository event stream to notify the network.

Blob Storage

Beyond text-based content, PDSes host and manage blobs - binary files such as images, videos, and other media shared by users. They validate the content type of uploaded blobs, store them securely, and serve them publicly. Some implementations also detect sensitive metadata in blobs (like EXIF location data) and prevent upload unless explicitly overridden by the client.

PDSes track repository references to blobs and enforce restrictions on size and type. They manage the blob lifecycle by expiring never-referenced blobs and deleting blobs after their last referencing record is deleted. For moderation purposes, they also provide mechanisms to purge individual blobs from an account.
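The lifecycle rules described here can be sketched as a small reference-tracking store. Everything in this example is an assumption made for illustration: the `BlobStore` class, its method names, the limits, and the use of plain timestamps are hypothetical and do not reflect any particular PDS implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Blob:
    size: int
    mime_type: str
    uploaded_at: float
    refs: Set[str] = field(default_factory=set)  # record paths referencing this blob


class BlobStore:
    """Hypothetical blob store enforcing size/type limits and lifecycle rules."""

    def __init__(self, max_size: int, allowed_types: Set[str], unreferenced_ttl: float) -> None:
        self.max_size = max_size
        self.allowed_types = allowed_types
        self.unreferenced_ttl = unreferenced_ttl  # seconds a never-referenced blob may live
        self.blobs: Dict[str, Blob] = {}

    def upload(self, cid: str, size: int, mime_type: str, now: float) -> None:
        if size > self.max_size:
            raise ValueError("blob too large")
        if mime_type not in self.allowed_types:
            raise ValueError("content type not allowed")
        self.blobs.setdefault(cid, Blob(size, mime_type, now))

    def add_ref(self, cid: str, record_path: str) -> None:
        self.blobs[cid].refs.add(record_path)

    def remove_ref(self, cid: str, record_path: str) -> None:
        """Drop a reference; delete the blob once its last reference is gone."""
        blob = self.blobs[cid]
        if record_path in blob.refs:
            blob.refs.remove(record_path)
            if not blob.refs:
                del self.blobs[cid]

    def expire_unreferenced(self, now: float) -> List[str]:
        """Delete blobs that are unreferenced and older than the TTL."""
        expired = [cid for cid, b in self.blobs.items()
                   if not b.refs and now - b.uploaded_at > self.unreferenced_ttl]
        for cid in expired:
            del self.blobs[cid]
        return expired


store = BlobStore(max_size=1000, allowed_types={"image/png"}, unreferenced_ttl=60.0)
store.upload("blob1", 10, "image/png", now=0.0)
store.upload("blob2", 10, "image/png", now=0.0)
store.add_ref("blob1", "com.example.post/1")
expired = store.expire_unreferenced(now=100.0)
```

In this run `blob2` expires because no record ever referenced it, while `blob1` survives until its one referencing record is removed.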

Authentication and Network Proxying

In the current network architecture, almost all client requests go through the user's PDS, which then proxies them to other services as needed. PDSes forward appropriate HTTP headers and handle authentication for these proxied requests. PDSes also handle authentication and authorization via OAuth.

PDSes implement authentication mechanisms for clients, including session management and App Password generation. Current development aims to make each PDS a full-fledged OAuth authorization server that tracks registered client software, provides web interfaces to approve or reject client privileges, and enforces scope limitations on API and repository access.

Private Data and Preferences

PDSes store arbitrary lexicon-defined private preferences and configuration for users. This keeps personal settings synchronized across multiple devices and clients, and allows them to be exported and re-imported during migration. The PDS provides endpoints for storing preference data and for bulk import/export operations.

Architecture

PDSes are designed to be lightweight and modular. A single PDS can host anywhere from one to hundreds of thousands of user accounts, depending on its resources and configuration. PDSes are designed such that users can self-host their own PDS on modest hardware (even a Raspberry Pi). Service providers can host PDSes for many users efficiently, and users can migrate between PDSes without losing their identity or data.

Hosting Models

The AT Protocol supports various PDS hosting models. Users can run their own PDS on personal hardware or a virtual private server, giving them complete control over their data and server configuration. Most users use a PDS operated by a service provider, which may offer free or paid tiers with different features and capabilities.

PDS Entryway

For large-scale PDS hosting operations, an "Entryway" service provides centralized account distribution, session management, request routing, and OAuth authorization. Users of such services don't need to know which specific PDS hosts their account - they simply interact with the entryway domain. For example, Bluesky's PDS Entryway service (at bsky.social) distributes users across multiple physical PDSes while presenting a single logical service to users and applications.

When a user creates an account through an entryway like bsky.social, the service assigns them to a specific PDS behind the scenes. The entryway then handles routing requests to the appropriate server and manages authentication across the entire service. This architecture allows for efficient scaling while maintaining a simple user experience.
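The assign-then-route pattern can be sketched in a few lines. This is an illustration only: the `Entryway` class, the host names, and the least-loaded assignment policy are all assumptions, and nothing here describes how Bluesky's entryway actually distributes accounts.

```python
from typing import Dict, List


class Entryway:
    """Hypothetical entryway that assigns accounts to PDS hosts and routes to them."""

    def __init__(self, pds_hosts: List[str]) -> None:
        self.pds_hosts = pds_hosts
        self.assignments: Dict[str, str] = {}  # account DID -> PDS host

    def create_account(self, did: str) -> str:
        # Assumed policy: pick the host with the fewest accounts
        # (ties go to the host listed first).
        counts = {host: 0 for host in self.pds_hosts}
        for host in self.assignments.values():
            counts[host] += 1
        host = min(self.pds_hosts, key=lambda h: counts[h])
        self.assignments[did] = host
        return host

    def route(self, did: str) -> str:
        """Look up which PDS should receive requests for this account."""
        return self.assignments[did]


entryway = Entryway(["pds-a.example", "pds-b.example"])
first = entryway.create_account("did:example:1")
second = entryway.create_account("did:example:2")
third = entryway.create_account("did:example:3")
```

Clients only ever talk to the entryway; the mapping from account to backing host stays internal to the service.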

Data Portability

One of the key advantages of the AT Protocol is that users aren't locked into a specific PDS. If a user wants to change providers, they can export their repository and blobs from their current PDS, import this data to a new PDS, update their DID Document to point to the new PDS, and continue using the network with all their content and social connections intact. This portability ensures that users maintain ownership of their data and social graph regardless of which PDS they use.
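The order of the migration steps can be sketched with in-memory stand-ins. The `Pds` class, the `migrate` function, and the simplified DID document (a dictionary with a single `serviceEndpoint` field) are hypothetical; a real migration goes through the PDS APIs and the identity system rather than copying dictionaries.

```python
from typing import Dict


class Pds:
    """Hypothetical in-memory stand-in for a PDS."""

    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint
        self.repos: Dict[str, bytes] = {}             # DID -> exported repository (CAR bytes)
        self.blobs: Dict[str, Dict[str, bytes]] = {}  # DID -> {blob CID: data}


def migrate(did: str, old: Pds, new: Pds, did_docs: Dict[str, dict]) -> None:
    """Copy an account's data to a new PDS, then repoint its DID document."""
    # 1. Export the repository and blobs from the current PDS.
    repo_export = old.repos[did]
    blob_export = dict(old.blobs.get(did, {}))
    # 2. Import both into the new PDS.
    new.repos[did] = repo_export
    new.blobs[did] = blob_export
    # 3. Only after the import succeeds, update the DID document
    #    (simplified here to one field) to point at the new PDS.
    did_docs[did]["serviceEndpoint"] = new.endpoint


old_pds = Pds("https://old-pds.example")
new_pds = Pds("https://new-pds.example")
old_pds.repos["did:example:alice"] = b"car-bytes"
old_pds.blobs["did:example:alice"] = {"blob1": b"image-bytes"}
docs = {"did:example:alice": {"serviceEndpoint": old_pds.endpoint}}
migrate("did:example:alice", old_pds, new_pds, docs)
```

Because the DID itself never changes, other services that follow the DID document find the account at its new location without any change to the user's content or social connections.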

Anti-Abuse Measures

While content moderation primarily happens at the AppView and Labeller level, PDSes implement several anti-abuse mechanisms:

  • PDSes can control and limit signups through mechanisms like invite codes, vouching, payment processing, CAPTCHAs, or phone verification to prevent bot-created accounts.
  • PDSes apply per-account and global rate limits on operations, including HTTP requests, repository operations, identity operations, proxied requests, blob uploads, and login attempts.
  • PDSes provide either a built-in interface or the ability to delegate the administration of accounts and blobs to an external moderation authority or service.
  • PDSes maintain a public contact mechanism for the administrative team (typically an email address) for abuse reports and other communications.
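The combination of per-account and global rate limits from the list above can be sketched with a fixed-window counter. The `RateLimiter` class, its limits, and the fixed-window technique are assumptions chosen for brevity; real implementations may use other algorithms (such as token buckets) and separate limits per operation type.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class RateLimiter:
    """Hypothetical fixed-window limiter with per-account and global caps."""

    def __init__(self, per_account_limit: int, global_limit: int, window_seconds: float) -> None:
        self.per_account_limit = per_account_limit
        self.global_limit = global_limit
        self.window_seconds = window_seconds
        self.global_counts: DefaultDict[int, int] = defaultdict(int)
        self.account_counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, account: str, now: float) -> bool:
        """Return True and count the request if both limits permit it."""
        window = int(now // self.window_seconds)
        if self.global_counts[window] >= self.global_limit:
            return False
        if self.account_counts[(account, window)] >= self.per_account_limit:
            return False
        self.global_counts[window] += 1
        self.account_counts[(account, window)] += 1
        return True


limiter = RateLimiter(per_account_limit=2, global_limit=3, window_seconds=60.0)
results = [
    limiter.allow("alice", now=0.0),   # allowed
    limiter.allow("alice", now=1.0),   # allowed
    limiter.allow("alice", now=2.0),   # rejected: per-account limit reached
    limiter.allow("bob", now=3.0),     # allowed: uses the last global slot
    limiter.allow("carol", now=4.0),   # rejected: global limit reached
    limiter.allow("alice", now=61.0),  # allowed: a new window has started
]
```

Rejected requests are not counted, so a client that keeps retrying does not extend its own lockout beyond the current window.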

Technical Requirements

Running a PDS requires HTTP and WebSocket server capabilities, database storage for repositories and metadata, blob storage for media files, cryptographic signing capabilities, and network connectivity for federation. Resource requirements scale with the number of hosted users and their activity levels, but are generally modest compared to traditional social media platforms.

Future Developments

Upcoming upgrades and developments to PDSes include:

  • Full implementation of OAuth for more secure client authentication and authorization.
  • Direct involvement with private direct messaging (DMs), likely involving PDS-to-PDS communication.
  • Support for group-private content, also likely involving PDS-to-PDS communication.
  • Live resolution of unknown lexicons for validation at record creation time, requiring PDSes to dynamically fetch and interpret new schema definitions.
  • Standardized email delivery mechanisms, with users controlling which services are allowed to send mail to them.

Further Reading