If you run GitHub Actions on self-hosted runners, you've probably hit the capacity problem: your fleet handles normal load fine, but a release day floods the queue. One option is to overprovision your fleet -- more nodes, more idle runners, higher monthly cost. A better option is automatic overflow to GitHub-hosted runners. This post explains how a routing layer intercepts the webhook, makes a routing decision, and provisions a runner -- all before GitHub assigns the job.
The problem with runs-on labels
Standard GitHub Actions gives you two options:
- runs-on: ubuntu-latest → GitHub-hosted runner
- runs-on: self-hosted → your ARC runners (or whatever self-hosted fleet you operate)
These are static. You pick one per workflow step, and that's where the job runs. If your self-hosted fleet is at capacity when a job arrives, GitHub queues it there and waits -- it doesn't fall back to cloud runners on its own.
Hybrid routing solves this at the infrastructure layer. The routing decision happens outside your workflow files entirely. No workflow changes required. The evaluator intercepts jobs as they arrive and decides where they go based on current fleet capacity and your policy rules.
The webhook-first architecture
When GitHub Actions queues a job, GitHub emits a workflow_job webhook event with action: queued. This is the entry point.
GitHub → workflow_job.queued webhook → [Routing Evaluator] → decision
The routing evaluator must decide before GitHub assigns the job. The decision window is short -- typically a few seconds. If you miss it, GitHub has already queued the job with its original runner assignment.
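For reference, the fields the evaluator needs from a workflow_job payload with action: queued look roughly like this (abridged, with illustrative values; see GitHub's webhook event documentation for the full shape):

```json
{
  "action": "queued",
  "workflow_job": {
    "id": 987654321,
    "run_id": 123456789,
    "name": "build",
    "labels": ["ci-standard-runc-x64"],
    "status": "queued"
  },
  "repository": { "full_name": "acme/widgets" },
  "organization": { "login": "acme" }
}
```

The labels array is what the routing policy matches against, and the organization login selects which org's policy to load.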
Three possible outcomes:
- self_hosted -- dispatch to an ARC runner via JIT provisioning; a runner is created specifically for this job
- github_hosted -- let GitHub proceed with its own cloud runner assignment; the evaluator does nothing
- overflow -- the self-hosted fleet is at capacity; GitHub falls through to a hosted runner
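In code, the three outcomes can be modeled as a small discriminated union that the webhook handler switches on. This is an illustrative sketch, not Stratus's actual types:

```typescript
// Each outcome carries only the data the handler needs to act on it.
type RoutingDecision =
  | { outcome: "self_hosted"; scaleSet: string } // provision a JIT runner
  | { outcome: "github_hosted" }                 // do nothing; GitHub assigns
  | { outcome: "overflow"; reason: string };     // fleet full; fall through

function actOn(decision: RoutingDecision): string {
  switch (decision.outcome) {
    case "self_hosted":
      return `provision JIT runner in ${decision.scaleSet}`;
    case "github_hosted":
      return "no-op: GitHub assigns a hosted runner";
    case "overflow":
      return `fall through to hosted (${decision.reason})`;
  }
}
```

Note that github_hosted and overflow produce the same observable behavior (GitHub assigns a hosted runner); they are kept distinct so monitoring can tell "policy chose cloud" apart from "fleet was full".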
The routing policy
The evaluator follows a policy stored in Firestore and hot-reloaded without a redeploy. A simplified version looks like this:
routes:
- label_match: "ci-standard-runc-x64"
strategy: self_hosted_first
overflow: github_hosted
max_queue_depth: 20
- label_match: "agents-high-runc-x64"
strategy: self_hosted_only
overflow: reject
self_hosted_first means: use a self-hosted runner if capacity is available, fall back to GitHub-hosted if not. This is the typical setting for general CI workloads where cost matters but strict isolation is not required.
self_hosted_only means: the job runs on your fleet or not at all. If the fleet is at capacity, the evaluator rejects the overflow -- it does not send the job to cloud. This applies to workloads that need private network access, specific hardware, or must not leave your infrastructure.
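A minimal evaluator over this policy shape might look like the following. This is an illustrative sketch: field names mirror the YAML above, and the capacity check is passed in as a boolean rather than read from Firestore:

```typescript
interface Route {
  labelMatch: string;
  strategy: "self_hosted_first" | "self_hosted_only";
  overflow: "github_hosted" | "reject";
}

// "reject" is folded into the outcome type here to cover the
// self_hosted_only case described above.
type Outcome = "self_hosted" | "github_hosted" | "overflow" | "reject";

function route(jobLabels: string[], routes: Route[], hasCapacity: boolean): Outcome {
  // First route whose label appears on the job wins; unmatched jobs
  // are left alone and GitHub assigns them a hosted runner.
  const match = routes.find((r) => jobLabels.includes(r.labelMatch));
  if (!match) return "github_hosted";
  if (hasCapacity) return "self_hosted";
  // At capacity: self_hosted_first overflows to cloud, self_hosted_only rejects.
  return match.overflow === "github_hosted" ? "overflow" : "reject";
}
```

With the two routes from the YAML above, a ci-standard-runc-x64 job at capacity returns overflow, while an agents-high-runc-x64 job at capacity returns reject.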
Policies are stored per-org in the installations Firestore collection. Changes take effect immediately -- the evaluator reads the current policy on each webhook.
The JIT provisioning flow
When the decision is self_hosted, the routing evaluator triggers JIT runner provisioning. "JIT" here means a runner is created specifically for the single arriving job -- not pulled from a warm pool.
Here is the sequence:
1. The routing evaluator writes the decision to Firestore (workflowJobs collection) and calls the GitHub API to request a JIT configuration token for the job.
2. The JIT token is scoped to a single job -- it expires after use, and it identifies the specific queued job the runner must pick up.
3. The token is written to Firestore (pendingRunners collection with status: pending).
4. The cluster-side controller detects the new document (via Firestore onSnapshot) and claims it in a transaction.
5. The controller creates a Kubernetes resource for the runner, injecting the JIT token as a flag (--jitconfig).
6. The runner container starts, registers with GitHub using the JIT token, picks up the queued job, and executes it.
7. When the job completes, the runner exits. The resource is cleaned up and the pendingRunners document is marked scheduled.
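The "claims it in a transaction" part of step 4 matters when more than one controller replica watches the collection: exactly one replica may win each document. The semantics are a compare-and-set on the document's status field, sketched here against an in-memory map rather than the real Firestore transaction API:

```typescript
interface PendingRunner {
  status: "pending" | "scheduling" | "scheduled" | "failed";
  claimedBy?: string;
}

const pendingRunners = new Map<string, PendingRunner>();

// Mirrors the shape of a Firestore runTransaction: read the document,
// verify the precondition, then write. Returns true only for the single
// controller that wins the claim; all others see status !== "pending".
function claim(docId: string, controllerId: string): boolean {
  const doc = pendingRunners.get(docId);
  if (!doc || doc.status !== "pending") return false; // another replica won
  pendingRunners.set(docId, { status: "scheduling", claimedBy: controllerId });
  return true;
}
```

In the real system, Firestore's transaction retry semantics provide the atomicity that the single-threaded map gives this sketch for free.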
The Kubernetes resource hierarchy in ARC's standard CRD model looks like this:
AutoScalingRunnerSet (CRD)
 └ EphemeralRunnerSet (CRD)
    └ EphemeralRunner (CRD)
       └ Kubernetes Pod ← the actual runner
If you want to understand how to set up ARC from scratch before adding routing on top, see Setting Up ARC on Kubernetes.
Fleet capacity check
Before committing to self_hosted vs overflow, the evaluator checks two Firestore collections:
Firestore: pendingRunners → count of in-flight runner provisions per scale set
Firestore: workflowJobs → count of currently running jobs per scale set
If (pendingRunners + runningJobs) >= maxRunners for the matched scale set, the evaluator returns overflow and lets GitHub handle runner assignment from its own pool.
This is an approximate check, not a lock. There is a small window where multiple workflow_job.queued webhooks can arrive in rapid succession, each passing the capacity check independently and each triggering a JIT provision before the Firestore counts update. In practice this means a brief overcommit on runner count -- you might provision one or two runners past your configured maximum. The overcommit resolves itself quickly as the counts catch up.
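The check itself is a pure comparison over the two counts. A sketch with hypothetical names, mirroring the description above:

```typescript
interface FleetCounts {
  pendingRunners: number; // in-flight JIT provisions for this scale set
  runningJobs: number;    // jobs currently executing on this scale set
}

// Approximate capacity gate: true means "provision another runner".
// Two webhooks racing through this check before either count updates
// can both pass -- the brief overcommit described above.
function hasCapacity(counts: FleetCounts, maxRunners: number): boolean {
  return counts.pendingRunners + counts.runningJobs < maxRunners;
}
```

Keeping this a strict less-than (rather than <=) means maxRunners is an actual ceiling under non-racy conditions, not a target that gets exceeded by one on every full fleet.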
What this looks like end-to-end
1. PR pushed → workflow triggers
2. GitHub emits workflow_job.queued webhook
3. Cloud Function (githubWebhook) receives webhook
4. Routing evaluator: checks policy + fleet capacity in Firestore
└ capacity available → JIT provisioning → runner Pod created → registers → job starts on self-hosted
└ at capacity → overflow → GitHub assigns to hosted runner → job starts on GitHub infra
5. Job completes → runner Pod terminates (self-hosted) OR GitHub cleans up (hosted)
6. Cost attribution written to Firestore → exported to BigQuery
Steps 3 and 4 happen synchronously inside the Cloud Function handler. The webhook response (HTTP 200) is returned to GitHub regardless of whether JIT provisioning succeeds or fails -- a provisioning failure is non-fatal. GitHub continues to hold the job in queue; if the JIT runner never registers, GitHub eventually times out and you can retry.
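The "always return 200" behavior is worth making explicit in the handler. A sketch, with the provisioning call passed in as a stand-in function:

```typescript
type Provision = () => Promise<void>;

// Always acknowledge the webhook: a failed provision must not cause
// GitHub to retry delivery or mark the hook unhealthy. The job stays
// queued on GitHub's side either way, so failure here is non-fatal.
async function handleQueued(provision: Provision): Promise<number> {
  try {
    await provision();
  } catch (err) {
    console.error("JIT provisioning failed (non-fatal):", err);
  }
  return 200;
}
```

The tradeoff is that a persistent provisioning failure is only visible in logs and metrics, not in webhook delivery status, which is one reason the routing decisions are all recorded in Firestore (see below).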
Cold start overhead
When the decision is self_hosted, there is latency between the webhook arriving and the job actually starting. With pre-pulled runner images on a healthy cluster, the typical overhead is 30-60 seconds from webhook receipt to job start. GitHub-hosted runners typically start in around 10 seconds, so there is a measurable tradeoff.
For latency-sensitive workloads, keep minRunners: 1 in your scale set configuration. A warm runner that is already registered with GitHub can pick up a queued job in 2-5 seconds -- no provisioning wait.
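In ARC's gha-runner-scale-set chart values, that warm floor is a single line (surrounding values shown for context; the numbers are illustrative):

```yaml
# values.yaml for the gha-runner-scale-set Helm chart
minRunners: 1    # keep one registered runner warm for instant job pickup
maxRunners: 20   # should agree with the evaluator's capacity ceiling
```

The cost of this is one always-on runner per scale set, so it is worth reserving for the scale sets where the 30-60 second cold start actually hurts.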
Monitoring routing decisions
Every routing decision is recorded. The relevant Firestore collections:
- workflowJobs: one document per job, keyed by {orgName}_{runId}_{jobId}. Fields include routingDecision, routingReason, runnerId, runnerName, durationSeconds, and costUsd.
- pendingRunners: one document per in-flight JIT provision. Status transitions from pending → scheduling → scheduled (or failed).
- costAttribution: daily cost rollups per org, keyed by {orgName}_YYYY-MM-DD.
The Grafana dashboard shows routing split percentage, queue depth per scale set label, and runner provisioning latency from webhook to job start. BigQuery receives the cost attribution data from Firestore for historical analysis: what you actually spent on GitHub-hosted overflow vs. what it would have cost to run everything on GitHub-hosted runners.
This is how Stratus handles routing: the webhook layer, the routing evaluator, and JIT provisioning. If your team runs GitHub Actions and you want overflow routing without building the evaluator yourself:
Join the waitlist →