Securing the code that writes code: A look inside Augment Code's AI platform
At Augment, we're building cutting-edge AI to help software engineering organizations write better code faster. But great power comes with great responsibility, and security is paramount when dealing with something as sensitive as source code. We've built our platform from the ground up with a security-first mindset, incorporating multiple layers of protection to safeguard your data and ensure the integrity of our AI models. Here’s a peek under the hood.
Code Context Awareness with Proof of Possession
Our AI doesn't just generically suggest code snippets. It deeply understands the context of your project. It can analyze your codebase, identify dependencies, and recognize coding style conventions. This "context awareness" is powered by our retrieval system, which allows the AI to access and understand the relevant parts of your codebase. But how do we ensure that the AI only accesses the code it needs and nothing more?
Imagine our AI as a diligent librarian who needs to locate the exact books (your code files) to answer your questions. Instead of granting the AI free rein over the entire library, we utilize a system of proof of possession.
Here's how it works:
- Hashing for Identification: Our VS Code or IntelliJ extension acts as a librarian's assistant, meticulously cataloging each file in your workspace (your "library") by calculating its unique SHA256 hash. Think of this hash as a fingerprint for each file.
- Secure Upload: This "assistant" then checks our per-tenant index. If a file's fingerprint isn't already present (meaning it's new or modified), it's securely uploaded for indexing. This ensures our AI has access to the latest version of your code.
- Contextualized Requests: When you use our AI features (like code completion or chat), the extension sends the fingerprints of all relevant files to the AI. This acts as a "request slip" for the librarian, specifying exactly which "books" (files) the AI is allowed to access.
- Restricted Access: Our retrieval system acts as the librarian, carefully retrieving only the files corresponding to the provided fingerprints. This ensures the AI only sees the code you've explicitly granted it access to.
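The four steps above can be sketched in a few lines of Python. This is a minimal, illustrative model; the function names and the in-memory index are hypothetical stand-ins, not Augment's actual implementation:

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Step 1: compute the SHA-256 'fingerprint' of a file's content."""
    return "0x" + hashlib.sha256(content).hexdigest()

# A toy stand-in for the per-tenant index (the real one is a server-side store).
index: dict[str, bytes] = {}

def upload_if_missing(content: bytes) -> str:
    """Step 2: upload a file only if its fingerprint is not yet indexed."""
    fp = fingerprint(content)
    if fp not in index:
        index[fp] = content  # securely uploaded for indexing in the real system
    return fp

def retrieve(requested: set[str]) -> list[bytes]:
    """Steps 3-4: return only files whose fingerprints the client presented."""
    return [index[fp] for fp in requested if fp in index]
```

The key property is in `retrieve`: the retrieval system never searches beyond the fingerprints on the "request slip," so a request can only surface files the client has already proven it possesses.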
Example
Let's say you have three files in your project: user_auth.js, product_catalog.js, and payment_gateway.js. Their fingerprints are 0xca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb (shortened here to 0xca…bb), 0x3e…9d, and 0x2e…c6. Another developer in your organization might have access to user_auth.js, api_integration.js, and database_config.js, with fingerprints 0xca…bb, 0x7a…e3, and 0x25…11. So the index between the two users overlaps by one file (user_auth.js), and each user has two files that are local to them. With proof of possession, our AI will only consider the files you have access to when generating code for your requests. For example, if you send the first set of three fingerprints, the AI cannot access or utilize information from files like api_integration.js or database_config.js, because it never receives their fingerprints.
Suppose a user wants to use the Augment context system to obtain the contents of a database_config.js file they do not have access to. The retrieval will not include database_config.js in the search unless they present the hash of the file's content. But to produce the correct hash, they would need to know the file's content already, at which point they would have the file itself. You need to know the file content to generate the correct hash; you cannot guess it.
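A quick check makes this concrete: the fingerprint is a one-way function of the content, so even a near-miss guess at the content produces an unrelated hash. The file contents below are made up for illustration:

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """SHA-256 fingerprint of a file's content."""
    return "0x" + hashlib.sha256(content).hexdigest()

# Only someone who already possesses the exact bytes can produce this hash.
secret = b"DB_PASSWORD = 'hunter2'\n"
fp = fingerprint(secret)

# A single-character difference yields a completely different fingerprint,
# so guessing "close" content does not help an attacker converge on the hash.
near_miss = fingerprint(b"DB_PASSWORD = 'hunter3'\n")
assert near_miss != fp
```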
This granular control ensures that your proprietary code remains secure and confidential, preventing any accidental or intentional access to sensitive information by the AI or other users.
Secure Multi-Tenancy with Service Tokens
In a cloud-based platform like ours, where multiple customers share the same infrastructure, data isolation is crucial. We guarantee your code and data remain completely separate from other users through a robust system of authentication, authorization, and service tokens.
Authentication & Authorization
When you log in, you're not just verifying your identity; you're also gaining access to your organization's specific workspace within our platform. This is where service tokens come in: after your identity is verified, a short-lived service token is attached to all internal network communications related to your request.
Service Tokens: Your Secure Key
Think of a service token as a personalized keycard granting access to your organization's resources. Each time you interact with our platform, this "keycard" is checked to ensure you have the right permissions.
These tokens aren't just a simple "yes/no" key. They contain detailed information about the user, the organization, and the permissions within the organization. This means you can be granted specific access rights, like the ability to upload code, while being restricted from accessing other sensitive data. We call these fine-grained permissions "scopes."
Every request you make to our system – whether it's uploading a new code file or requesting AI assistance – is accompanied by your service token. This token is rigorously validated at every step, especially when accessing our data and storage infrastructure. This ensures that every action is authorized and within your permitted scope.
For example, file contents are stored in Google Cloud Bigtable, a low-latency, scale-out NoSQL database. Every access to Bigtable goes through a transparent Bigtable proxy, which validates each read and write operation against the service token attached to the RPC. This ensures that only authorized requests are permitted: for instance, a user can only read a customer tenant's retrieval embedding file if their service token proves they have permission to do so.
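The check a proxy like this performs can be sketched as follows. The token fields, scope names, and function are hypothetical illustrations of the idea, not Augment's actual token format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceToken:
    user: str
    tenant: str           # the organization the token was issued for
    scopes: frozenset     # fine-grained permissions, e.g. "code:upload"

def check_access(token: ServiceToken, tenant: str, required_scope: str) -> None:
    """What a storage proxy might verify on every read/write RPC."""
    if token.tenant != tenant:
        raise PermissionError("token was issued for a different tenant")
    if required_scope not in token.scopes:
        raise PermissionError(f"token lacks required scope {required_scope!r}")
```

Because the check runs on every operation rather than once at login, a token scoped only to uploading code can never be used to read another tenant's data.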
mTLS Encryption
To further enhance security, all internal communication within our platform is encrypted using mutual TLS (mTLS). This ensures that data exchanged between different services is protected from eavesdropping and tampering, even within our own infrastructure.
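In mTLS, both sides of a connection present certificates, unlike ordinary TLS where only the server does. Here is a minimal sketch using Python's standard `ssl` module; the certificate file paths in the comments are hypothetical, and real deployments typically delegate this to a service mesh or RPC framework:

```python
import ssl

# Server side: demand and verify a client certificate (the "mutual" in mTLS).
server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
server_ctx.verify_mode = ssl.CERT_REQUIRED      # reject clients without a valid cert
server_ctx.minimum_version = ssl.TLSVersion.TLSv1_2
# server_ctx.load_cert_chain("service.crt", "service.key")   # hypothetical paths
# server_ctx.load_verify_locations("internal-ca.pem")        # trust only the internal CA

# Client side: verify the server and present our own certificate.
client_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
client_ctx.verify_mode = ssl.CERT_REQUIRED
# client_ctx.load_cert_chain("client.crt", "client.key")     # hypothetical paths
```

With both contexts requiring certificates, a connection only succeeds when each side can cryptographically prove its identity to the other.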
This multi-layered approach ensures that your data remains completely invisible and inaccessible to other users, guaranteeing confidentiality and preventing any cross-contamination.
Audited Access Control to Production Services via Genie
Protecting your code and data also means securing our own infrastructure. We achieve this through strict access control and auditing, ensuring only authorized personnel can access sensitive systems and information.
- Limited Access by Default: Our engineers have very limited access to production infrastructure. They can see operational information like logs and service health, but not sensitive data or the ability to make changes.
- Genie for Elevated Access: When engineers need access to sensitive information or to make changes like restarting a service during an oncall incident, they must request permission through Genie, our internal access control system. Each request is time-limited, audited and requires approval from a second person, ensuring accountability and preventing unauthorized access.
- Auditing All Actions: All actions taken within the production environment are meticulously logged and audited, providing a comprehensive trail of activity. This allows us to quickly identify and address any potential security issues.
- Protecting Customer Support Data: We understand the sensitive nature of your interactions with our AI. That's why we have a system called Request Insight, which records customer requests, including chat messages, for a limited time to help us provide effective support. Access to this system is strictly controlled. Only in specific support cases can an engineer or AI researcher request time-limited access. This ensures that your sensitive data is handled with the utmost care and in accordance with our data processing and privacy policies. We do not train on your code or your messages, nor can engineers read them by default.
Building Trust Through Transparency
We believe that security is not just about implementing robust systems, but also about building trust through transparency. We're committed to open communication about our security practices, empowering our customers and potential employees to understand how we protect their data and the integrity of our platform.
The work of security is never done. We're currently working on additional security improvements like customer-managed encryption keys to further secure the Augment Code platform.
Join Us in Building the Future of Code
Software teams don't need to choose between fast, powerful AI coding tools and secure code. To see the power of Augment in your IDE, install Augment for free today.
If you're an engineer passionate about security and AI, we invite you to join us! We're always looking for talented individuals to contribute to our mission of building a secure and innovative AI platform for code. Check out our careers page to explore exciting opportunities.