A structured hierarchy of energy-efficiency principles for guiding LLM agents in automated code optimization.
About
A principled rulebook that turns any LLM coding agent into an energy-efficiency auditor.
Instead of ad-hoc prompting, the agent evaluates every refactoring against a formal hierarchy of principles spanning algorithmic complexity, memory behavior, and infrastructure utilization.
Designed for real-world codebases. The agent scans repositories, proposes principled refactorings, and measures energy impact using built-in profiling tools.
Meta-principles guarantee correctness, safety, and API stability. The agent self-critiques every proposal against a checklist before presenting it to a human reviewer.
The constitution defines what to optimize. Language and infrastructure-specific skill documents define how. Skills are fetched on demand based on project detection.
Four reproducible benchmark suites (llama.cpp, FFmpeg, Pasteur, Scenarios) with prepare, build, test, and profiling scripts for consistent evaluation.
Built-in profiler measures CPU and GPU energy via RAPL, perf, SPBM hwmon, or nvidia-smi. Optional smart plug integration for wall power measurement.
Principles
When principles conflict, the higher tier always wins. Within the same tier, prefer the refactoring with the higher measured energy impact.
Order-of-magnitude waste: N+1 queries, unbuffered I/O, O(n²) string concatenation, missing indexes, recursive event loops, unmemoized recursion, idle cloud resources, unbatched DB operations.
Significant waste: blocking I/O on hot paths, heap allocation in tight loops, resource leaks, SELECT *, uncompressed payloads, linear-search lookups, no auto-scaling, missing caching, unbounded queries, chatty microservices.
Moderate waste: unsized collections, invariant computation in loops, JSON for internal APIs, oversized pods, interpreted CPU-bound services, bloated container images, missing circuit breakers.
Minor or contextual: dead code, redundant environments, non-ephemeral CI, carbon-unaware scheduling, unoptimized client bundles.
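The tier ordering above reduces to a simple sort key. A minimal sketch, assuming the agent represents each candidate refactoring as a record with a tier (1 = order-of-magnitude, 4 = minor/contextual) and a measured energy impact; the record shape and field names are illustrative, not part of the constitution:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    tier: int             # 1 = order-of-magnitude ... 4 = minor/contextual
    impact_joules: float  # measured energy saving

def prioritize(candidates):
    # Lower tier number wins first; within a tier,
    # higher measured energy impact comes first.
    return sorted(candidates, key=lambda c: (c.tier, -c.impact_joules))

fixes = [
    Candidate("dead code removal", 4, 0.5),
    Candidate("batch DB writes", 1, 120.0),
    Candidate("cache hot lookup", 2, 300.0),
    Candidate("fix N+1 query", 1, 95.0),
]
print([c.name for c in prioritize(fixes)])
```

Note that the tier-2 caching fix saves more joules than either tier-1 fix, yet still ranks below both: tier dominates impact.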
Architecture
The constitution provides the principles. Benchmark prompts define the goal. The agent profiles, optimizes in a loop, and commits each improvement with measured results.
Setup
Measure
Optimize
Finalize
Usage
Add the GreenCode Constitution as a skill to your LLM coding agent. The agent fetches technology-specific guidance on demand.
The generated skill.md contains the full constitution plus a skill resolution table. Give it to your agent as a system prompt, tool, or skill file.
curl -sfL https://greencode-constitution.org/skill.md -o skill.md
Depending on your framework, attach skill.md as a system prompt, a skill document, or a file the agent can read. For Claude Code, place it in your project's .claude/ directory or reference it as a custom skill.
# Example: add as a Claude Code project instruction
cp skill.md .claude/skill-greencode.md

# Or reference the URL directly in your agent config
# skill_url: https://greencode-constitution.org/skill.md
The skill document instructs the agent to scan the repository for technology markers (e.g., requirements.txt, pom.xml, Dockerfile) and fetch matching skill docs from greencode-constitution.org/docs/.
# The agent runs detection automatically, then fetches e.g.:
# https://greencode-constitution.org/docs/code/python.md
# https://greencode-constitution.org/docs/architecture/docker.md
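The detection step amounts to scanning for marker files and mapping them to skill-doc URLs. A hypothetical sketch; the marker-to-doc mapping below is illustrative, and the authoritative resolution table lives in skill.md:

```python
import os

# Illustrative marker -> doc path mapping (the real table is in skill.md)
MARKERS = {
    "requirements.txt": "docs/code/python.md",
    "pom.xml": "docs/code/java.md",
    "Dockerfile": "docs/architecture/docker.md",
}

BASE_URL = "https://greencode-constitution.org/"

def detect_skills(repo_root):
    """Return skill-doc URLs for every technology marker found in the repo."""
    found = set()
    for _dirpath, _dirnames, filenames in os.walk(repo_root):
        for marker, doc in MARKERS.items():
            if marker in filenames:
                found.add(BASE_URL + doc)
    return sorted(found)
```

An agent would fetch each returned URL and load it alongside the constitution before proposing refactorings.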
Run the built-in energy profiler to measure actual CPU and GPU energy consumption before and after optimization.
bash <(curl -sfL https://greencode-constitution.org/profile.sh) -- python my_app.py
Outputs CPU joules, GPU joules, wall time, and CPU time. Supports multiple measurement backends: RAPL, perf, SPBM hwmon, and nvidia-smi.
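For the RAPL backend, energy is exposed as a cumulative microjoule counter (`energy_uj` under `/sys/class/powercap/`) that wraps around at `max_energy_range_uj`. A minimal sketch of the wrap-safe joule computation, not the profiler's actual code:

```python
def rapl_joules(start_uj, end_uj, max_range_uj):
    """Energy in joules between two RAPL counter readings,
    tolerating at most one counter wraparound."""
    delta_uj = end_uj - start_uj
    if delta_uj < 0:  # counter wrapped past max_energy_range_uj
        delta_uj += max_range_uj
    return delta_uj / 1e6  # microjoules -> joules
```

A real profiler reads the counter before and after the workload; if the run is long enough for more than one wrap, it must sample periodically instead.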
Languages
Infrastructure
Guides
Benchmarks
Reproducible benchmarks for measuring energy improvements. Each suite includes prepare, build, test, and profiling scripts.
Qwen3-8B quantized model. Measures prompt processing and text generation throughput with GPU energy profiling via nvidia-smi and Nsight.
1080p CPU and 4K GPU transcoding pipelines with complex filter graphs. Auto-detects NVENC for GPU-accelerated encoding.
Kedro-based data synthesis pipeline with the mimic_core dataset. CPU-intensive workload exercising Python data processing.
~45 scenarios from LeetCode, CLBG, and micro-benchmarks across C, C++, Java, Python, Ruby, and Rust.
Results
Evaluated on open-source projects. All measurements via SPBM hardware energy accumulators on DGX Spark (GB10), except llama.cpp kernel-only (nvidia-smi).
Every proposed refactoring passes an 8-point checklist before the agent presents it to a human reviewer: profiling verification, correctness, safety, scope, principle citation, impact estimation, trade-off assessment, and test compatibility. A 4-point post-proposal review then checks the proposal as if reviewing a pull request. If any check fails, the proposal is revised or discarded.
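The gating logic is an all-or-nothing filter over the checklist. A hypothetical sketch; the check names paraphrase the 8-point checklist above, and the boolean-flag proposal shape is a stand-in for whatever the agent actually records:

```python
PRE_CHECKS = [
    "profiling verification", "correctness", "safety", "scope",
    "principle citation", "impact estimation", "trade-off assessment",
    "test compatibility",
]

def gate(proposal, checks=PRE_CHECKS):
    """Return True only if the proposal passes every check;
    failing proposals are revised or discarded, never presented."""
    return all(proposal.get(check, False) for check in checks)
```

A passing proposal then goes through the 4-point post-proposal review; any failure at either stage sends it back for revision.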