We are excited to share an update on how GitHub will use data to improve our coding assistance capabilities. Starting April 24, we will use interaction data (including inputs, outputs, code snippets, and contextual information) from users of Copilot Free, Pro, and Pro+ to train and refine our AI models. Users of Copilot Business and Copilot Enterprise are not affected by this update.
If you prefer not to participate, you can opt out in the “Privacy” section of your settings. If you previously opted out of data collection for product improvements, that preference carries over: your data will not be used for training unless you decide to opt in.
This initiative follows established industry practice and aims to improve model performance for all users. By participating, you help our models better understand development workflows, provide more accurate and secure code suggestions, and catch potential bugs before they are deployed.
Real-world Data Equals Smarter Models
Our initial models were built using a combination of publicly available data and carefully curated code samples. Over the past year, we have also incorporated interaction data from Microsoft employees, which has produced significant improvements, particularly in acceptance rates across various programming languages.
The gains we saw from Microsoft employees' interaction data suggest that training on real-world interaction data can improve model performance across a broader range of use cases. If you participate in this program, the interaction data we may collect includes:
- Outputs you have accepted or modified
- Inputs submitted to GitHub Copilot, including code snippets presented to the model
- Code context surrounding your cursor position
- Your written comments and documentation
- File names, repository structure, and navigation patterns
- Your interactions with Copilot features (e.g., chat, inline suggestions)
- Your feedback on suggestions (such as thumbs up/down ratings)
Please note that this program does not collect:
- Interaction data from Copilot Business, Copilot Enterprise, or enterprise-owned repositories
- Data from users who choose to opt out of model training in their Copilot settings
- Content from your issues, discussions, or private repositories at rest. Note that Copilot does process code from private repositories while you are actively using it; that interaction data is required to operate the service and may be used for model training unless you opt out.
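The inclusion and exclusion rules above can be read as a simple decision. The sketch below is only an illustration of that reading, not how GitHub implements it: the `Interaction` record, its field names, and the `may_be_used_for_training` function are all hypothetical, and only the plan names and the three conditions come from this announcement.

```python
from dataclasses import dataclass

# Plans this announcement names as included in the update.
INCLUDED_PLANS = {"Free", "Pro", "Pro+"}


@dataclass(frozen=True)
class Interaction:
    """Hypothetical record of one Copilot interaction (illustrative only)."""
    plan: str                    # "Free", "Pro", "Pro+", "Business", or "Enterprise"
    opted_out: bool              # the user's model-training preference
    enterprise_owned_repo: bool  # True if the repository is enterprise-owned


def may_be_used_for_training(event: Interaction) -> bool:
    """Apply the three exclusions described in the list above."""
    if event.plan not in INCLUDED_PLANS:
        return False  # Copilot Business and Enterprise are excluded
    if event.enterprise_owned_repo:
        return False  # enterprise-owned repositories are excluded
    return not event.opted_out  # an opt-out always wins


# A Pro user who has not opted out, working in a personal repository.
print(may_be_used_for_training(Interaction("Pro", False, False)))
```

Under these assumptions, an interaction is eligible only when all three checks pass, so opting out is sufficient on its own to exclude your data.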
The data gathered in this program may be shared with GitHub affiliates, including Microsoft, but will not be shared with third-party AI providers or independent service vendors.
We believe that the future of AI-assisted development depends on real-world interaction data from developers like you. That’s why we are using interaction data from Microsoft employees and now plan to include data from GitHub employees as well.
If you decide to contribute your interaction data to help improve our models, we greatly appreciate it. Your input is invaluable in creating AI tools that better serve the entire developer community. If you choose not to participate, that’s perfectly fine too: you will keep access to the full suite of AI features you’re accustomed to.
Together, we can continue to develop AI that streamlines your workflows and enables you to build better, more secure software at unprecedented speeds.
If you have any questions, please visit our FAQ and related discussion.