Generative AI/LLM Contribution Policy#
We ask that all contributions (through issues and pull requests) are made by a human who takes responsibility for the code, documentation, or comments they submit.
See our LLM policy below. Here’s a brief summary:
Responsibility: You are responsible for any code you submit to JupyterHub’s repositories, regardless of whether it was manually written or generated by AI.
Disclosure: You must disclose whether AI has been used to assist in the development of your pull request.
Code Quality: We will reject pull requests that we deem to be AI slop.
Copyright: We reserve the right to reject any pull requests, AI generated or not, where the copyright is in question.
Communication: When interacting with developers (forum, discussions, issues, pull requests, etc.) do not use AI to speak for you (except for translation).
AI Agents: The use of an AI agent that writes code and then submits a pull request autonomously is not permitted.
Principles#
We, the JupyterHub community, value:
human co-creation: we are proud to collaborate with developers, researchers, educators, and infrastructure specialists across our global community. We value contributors over their contributions.
We respect their copyright over their work and appreciate their shared investment in the scientific open source ecosystem by licensing the materials under the BSD 3-Clause “New” or “Revised” License.
We respect the time taken to read and review contributions, maintaining a high standard and supportive community engagement.
security: JupyterHub is trusted infrastructure for hundreds of thousands of users and we prioritize keeping their data, code, and personal configuration information secure.
veracity: beyond security, we ensure our tools do what we think they do, and that we respect accuracy in communications within and beyond the scientific open source ecosystem.
our global society: we seek to minimise environmental impact and human exploitation in the development and deployment of JupyterHub infrastructure.
Our concerns#
“Generative AI” tools, such as LLMs, are:
contributing to the rapid over-burdening of the reviewing capacity of many open source projects. Unlike human contributors, models cannot learn from feedback, vastly diminishing the long-term community benefit of the review process.
trained on very large datasets, leaving many unsettled questions about copyright and consent. See: US Copyright guidance.
In particular, the models in use by the commercial AI industry, which account for the vast majority of LLM-generated contributions today, are especially destructive, as they are:
trained on very large datasets without credit or consent and do not respect the licenses (or lack thereof) of this input data.
Even if the copyright status of LLM output is settled safely worldwide, training models on open-licensed and/or proprietary inputs and generating outputs stripped of credit remain objectionable.
consuming very large amounts of energy and potable water, both during their training phases and as they are used to generate outputs. See the MIT Technology Review’s summaries of the literature at their Super Topic: AI and our energy future.
Policy#
Responsibility#
You are responsible for any code you submit to JupyterHub’s repositories, regardless of whether it was manually written or generated by AI. You must understand and be able to explain the code you submit as well as the existing related code. It is not acceptable to submit a patch that you cannot understand and explain yourself. In explaining your contribution, do not use AI to automatically generate descriptions. Always make sure you are comfortable saying “I am the author” before submitting code or a comment.
Disclosure#
If you’ve used AI in any part of the workflow that led to your contribution, you must disclose that you used AI and how you used it. Document which tool(s) have been used, how they were used, and specify what code or text is AI generated. We will reject any pull request that does not include the disclosure.
Code Quality#
Code generated by AI can be of low quality. Contributors are expected to submit code that meets JupyterHub’s standards. We will reject pull requests that we deem to be AI slop. Do not waste developers’ time by submitting code that is fully or mostly generated by AI and doesn’t meet our standards.
Copyright#
All code in JupyterHub is released under the BSD 3-Clause license. Contributors to JupyterHub license their code under the same license when it is included in JupyterHub’s version control repository. That means contributors must own the copyright of any code submitted to JupyterHub or must include, in the patch, the BSD 3-Clause-compatible open source license(s) associated with the submitted code. Code generated by AI may infringe on copyright, and it is the submitter’s responsibility not to infringe. We reserve the right to reject any pull requests, AI generated or not, where the copyright is in question.
Communication#
When interacting with developers (forum, discussions, issues, pull requests, etc.) do not use AI to speak for you (except for translation). If the developers want to chat with a chatbot, they can do so themselves. Human-to-human communication is essential for an open source community to thrive.
AI Agents#
The use of an AI agent that writes code and then submits a pull request autonomously is not permitted.
We will not include or accept supporting resources in AGENTS.md, CLAUDE.md or similar files in our repos, as we prefer that human contributors follow the contribution guidelines that already exist.
If present, these files shall only include instructions to prevent generating contributions, such as those used by lobste.rs.
Other Resources#
While these do not formally form part of JupyterHub’s AI policy, the following resources may be helpful in understanding some pitfalls associated with using AI to contribute to JupyterHub:
Acknowledgements#
We thank the SciPy developers for their AI policy, upon which the policy section of this document is largely based. They in turn credit the SymPy AI Policy.
Alignment and agreement#
The JupyterHub team is diverse and passionate. We do not all hold the same perspectives around the differential - and different - harms and benefits that can come from the training and deployment of large language models.
In our creation of this policy we sought to maximise our interpersonal alignment, compromising where appropriate to reach agreement on an actionable policy.