If you run GitHub Actions on self-hosted runners, you've probably hit the capacity problem: your fleet handles normal load fine, but a release day floods the queue. One option is to overprovision your fleet -- more nodes, more idle runners, higher monthly cost. A better option is automatic overflow to GitHub-hosted runners. This post explains how a routing layer intercepts the webhook, makes a routing decision, and provisions a runner -- all before GitHub assigns the job.
The problem with runs-on labels
Standard GitHub Actions gives you two options:
- runs-on: ubuntu-latest → GitHub-hosted runner
- runs-on: self-hosted → your ARC runners (or whatever self-hosted fleet you operate)
These are static. You pick one per workflow step, and that's where the job runs. If your self-hosted fleet is at capacity when a job arrives, GitHub queues it there and waits -- it doesn't fall back to cloud runners on its own.
Hybrid routing solves this at the infrastructure layer. The routing decision happens outside your workflow files entirely. No workflow changes required. The evaluator intercepts jobs as they arrive and decides where they go based on current fleet capacity and your policy rules.
The webhook-first architecture
When GitHub Actions queues a job, GitHub emits a workflow_job webhook event with action: queued. This is the entry point.
GitHub → workflow_job.queued webhook → [Routing Evaluator] → decision
The routing evaluator must decide before GitHub assigns the job. The decision window is short -- typically a few seconds. If you miss it, GitHub has already queued the job with its original runner assignment.
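For reference, the fields the evaluator needs from a workflow_job payload with action: queued look roughly like this (abridged, with illustrative values; see GitHub's webhook event documentation for the full shape):

```json
{
  "action": "queued",
  "workflow_job": {
    "id": 987654321,
    "run_id": 123456789,
    "name": "build",
    "labels": ["ci-standard-runc-x64"],
    "status": "queued"
  },
  "repository": { "full_name": "acme/widgets" },
  "organization": { "login": "acme" }
}
```

The labels array is what the routing policy matches against, and the organization login selects which org's policy to load.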
Three possible outcomes:
- self_hosted -- dispatch to an ARC runner via JIT provisioning; a runner is created specifically for this job
- github_hosted -- let GitHub proceed with its own cloud runner assignment; the evaluator does nothing
- overflow -- the self-hosted fleet is at capacity; GitHub falls through to a hosted runner
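In code, the three outcomes can be modeled as a small discriminated union that the webhook handler switches on. This is an illustrative sketch, not Stratus's actual types:

```typescript
// Each outcome carries only the data the handler needs to act on it.
type RoutingDecision =
  | { outcome: "self_hosted"; scaleSet: string } // provision a JIT runner
  | { outcome: "github_hosted" }                 // do nothing; GitHub assigns
  | { outcome: "overflow"; reason: string };     // fleet full; fall through

function actOn(decision: RoutingDecision): string {
  switch (decision.outcome) {
    case "self_hosted":
      return `provision JIT runner in ${decision.scaleSet}`;
    case "github_hosted":
      return "no-op: GitHub assigns a hosted runner";
    case "overflow":
      return `fall through to hosted (${decision.reason})`;
  }
}
```

Note that github_hosted and overflow produce the same observable behavior (GitHub assigns a hosted runner); they are kept distinct so monitoring can tell "policy chose cloud" apart from "fleet was full".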
The routing policy
The evaluator follows a policy stored in Firestore and hot-reloaded without a redeploy. A simplified version looks like this:
routes:
- label_match: "ci-standard-runc-x64"
strategy: self_hosted_first
overflow: github_hosted
max_queue_depth: 20
- label_match: "agents-high-runc-x64"
strategy: self_hosted_only
overflow: reject
self_hosted_first means: use a self-hosted runner if capacity is available, fall back to GitHub-hosted if not. This is the typical setting for general CI workloads where cost matters but strict isolation is not required.
self_hosted_only means: the job runs on your fleet or not at all. If the fleet is at capacity, the evaluator rejects the overflow -- it does not send the job to cloud. This applies to workloads that need private network access, specific hardware, or must not leave your infrastructure.
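A minimal evaluator over this policy shape might look like the following. This is an illustrative sketch: field names mirror the YAML above, and the capacity check is passed in as a boolean rather than read from Firestore:

```typescript
interface Route {
  labelMatch: string;
  strategy: "self_hosted_first" | "self_hosted_only";
  overflow: "github_hosted" | "reject";
}

// "reject" is folded into the outcome type here to cover the
// self_hosted_only case described above.
type Outcome = "self_hosted" | "github_hosted" | "overflow" | "reject";

function route(jobLabels: string[], routes: Route[], hasCapacity: boolean): Outcome {
  // First route whose label appears on the job wins; unmatched jobs
  // are left alone and GitHub assigns them a hosted runner.
  const match = routes.find((r) => jobLabels.includes(r.labelMatch));
  if (!match) return "github_hosted";
  if (hasCapacity) return "self_hosted";
  // At capacity: self_hosted_first overflows to cloud, self_hosted_only rejects.
  return match.overflow === "github_hosted" ? "overflow" : "reject";
}
```

With the two routes from the YAML above, a ci-standard-runc-x64 job at capacity returns overflow, while an agents-high-runc-x64 job at capacity returns reject.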
Policies are stored per-org in the installations Firestore collection. Changes take effect immediately -- the evaluator reads the current policy on each webhook.
The JIT provisioning flow
When the decision is self_hosted, the routing evaluator triggers JIT runner provisioning. "JIT" here means a runner is created specifically for the single arriving job -- not pulled from a warm pool.
Here is the sequence:
1. The routing evaluator writes the decision to Firestore (workflowJobs collection) and calls the GitHub API to request a JIT configuration token for the job.
2. The JIT token is scoped to a single job -- it expires after use, and it identifies the specific queued job the runner must pick up.
3. The token is written to Firestore (pendingRunners collection with status: pending).
4. The cluster-side controller detects the new document (via Firestore onSnapshot) and claims it in a transaction.
5. The controller creates a Kubernetes resource for the runner, injecting the JIT token as a flag (--jitconfig).
6. The runner container starts, registers with GitHub using the JIT token, picks up the queued job, and executes it.
7. When the job completes, the runner exits. The resource is cleaned up and the pendingRunners document is marked scheduled.
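The "claims it in a transaction" part of step 4 matters when more than one controller replica watches the collection: exactly one replica may win each document. The semantics are a compare-and-set on the document's status field, sketched here against an in-memory map rather than the real Firestore transaction API:

```typescript
interface PendingRunner {
  status: "pending" | "scheduling" | "scheduled" | "failed";
  claimedBy?: string;
}

const pendingRunners = new Map<string, PendingRunner>();

// Mirrors the shape of a Firestore runTransaction: read the document,
// verify the precondition, then write. Returns true only for the single
// controller that wins the claim; all others see status !== "pending".
function claim(docId: string, controllerId: string): boolean {
  const doc = pendingRunners.get(docId);
  if (!doc || doc.status !== "pending") return false; // another replica won
  pendingRunners.set(docId, { status: "scheduling", claimedBy: controllerId });
  return true;
}
```

In the real system, Firestore's transaction retry semantics provide the atomicity that the single-threaded map gives this sketch for free.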
The Kubernetes resource hierarchy in ARC's standard CRD model looks like this:
AutoScalingRunnerSet (CRD)
 └ EphemeralRunnerSet (CRD)
    └ EphemeralRunner (CRD)
       └ Kubernetes Pod ← the actual runner
If you want to understand how to set up ARC from scratch before adding routing on top, see Setting Up ARC on Kubernetes.
Fleet capacity check
Before committing to self_hosted vs overflow, the evaluator checks two Firestore collections:
Firestore: pendingRunners → count of in-flight runner provisions per scale set
Firestore: workflowJobs → count of currently running jobs per scale set
If (pendingRunners + runningJobs) >= maxRunners for the matched scale set, the evaluator returns overflow and lets GitHub handle runner assignment from its own pool.
This is an approximate check, not a lock. There is a small window where multiple workflow_job.queued webhooks can arrive in rapid succession, each passing the capacity check independently and each triggering a JIT provision before the Firestore counts update. In practice this means a brief overcommit on runner count -- you might provision one or two runners past your configured maximum. The overcommit resolves itself quickly as the counts catch up.
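The check itself is a pure comparison over the two counts. A sketch with hypothetical names, mirroring the description above:

```typescript
interface FleetCounts {
  pendingRunners: number; // in-flight JIT provisions for this scale set
  runningJobs: number;    // jobs currently executing on this scale set
}

// Approximate capacity gate: true means "provision another runner".
// Two webhooks racing through this check before either count updates
// can both pass -- the brief overcommit described above.
function hasCapacity(counts: FleetCounts, maxRunners: number): boolean {
  return counts.pendingRunners + counts.runningJobs < maxRunners;
}
```

Keeping this a strict less-than (rather than <=) means maxRunners is an actual ceiling under non-racy conditions, not a target that gets exceeded by one on every full fleet.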
What this looks like end-to-end
1. PR pushed → workflow triggers
2. GitHub emits workflow_job.queued webhook
3. Cloud Function (githubWebhook) receives webhook
4. Routing evaluator: checks policy + fleet capacity in Firestore
└ capacity available → JIT provisioning → runner Pod created → registers → job starts on self-hosted
└ at capacity → overflow → GitHub assigns to hosted runner → job starts on GitHub infra
5. Job completes → runner Pod terminates (self-hosted) OR GitHub cleans up (hosted)
6. Cost attribution written to Firestore → exported to BigQuery
Steps 3 and 4 happen synchronously inside the Cloud Function handler. The webhook response (HTTP 200) is returned to GitHub regardless of whether JIT provisioning succeeds or fails -- a provisioning failure is non-fatal. GitHub continues to hold the job in queue; if the JIT runner never registers, GitHub eventually times out and you can retry.
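The "always return 200" behavior is worth making explicit in the handler. A sketch, with the provisioning call passed in as a stand-in function:

```typescript
type Provision = () => Promise<void>;

// Always acknowledge the webhook: a failed provision must not cause
// GitHub to retry delivery or mark the hook unhealthy. The job stays
// queued on GitHub's side either way, so failure here is non-fatal.
async function handleQueued(provision: Provision): Promise<number> {
  try {
    await provision();
  } catch (err) {
    console.error("JIT provisioning failed (non-fatal):", err);
  }
  return 200;
}
```

The tradeoff is that a persistent provisioning failure is only visible in logs and metrics, not in webhook delivery status, which is one reason the routing decisions are all recorded in Firestore (see below).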
Cold start overhead
When the decision is self_hosted, there is latency between the webhook arriving and the job actually starting. With pre-pulled runner images on a healthy cluster, the typical overhead is 30-60 seconds from webhook receipt to job start. GitHub-hosted runners typically start in around 10 seconds, so there is a measurable tradeoff.
For latency-sensitive workloads, keep minRunners: 1 in your scale set configuration. A warm runner that is already registered with GitHub can pick up a queued job in 2-5 seconds -- no provisioning wait.
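In ARC's gha-runner-scale-set chart values, that warm floor is a single line (surrounding values shown for context; the numbers are illustrative):

```yaml
# values.yaml for the gha-runner-scale-set Helm chart
minRunners: 1    # keep one registered runner warm for instant job pickup
maxRunners: 20   # should agree with the evaluator's capacity ceiling
```

The cost of this is one always-on runner per scale set, so it is worth reserving for the scale sets where the 30-60 second cold start actually hurts.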
Monitoring routing decisions
Every routing decision is recorded. The relevant Firestore collections:
- workflowJobs: one document per job, keyed by {orgName}_{runId}_{jobId}. Fields include routingDecision, routingReason, runnerId, runnerName, durationSeconds, and costUsd.
- pendingRunners: one document per in-flight JIT provision. Status transitions from pending → scheduling → scheduled (or failed).
- costAttribution: daily cost rollups per org, keyed by {orgName}_YYYY-MM-DD.
The Grafana dashboard shows routing split percentage, queue depth per scale set label, and runner provisioning latency from webhook to job start. BigQuery receives the cost attribution data from Firestore for historical analysis: what you actually spent on GitHub-hosted overflow vs. what it would have cost to run everything on GitHub-hosted runners.
This is how Stratus handles routing: the webhook layer, the routing evaluator, and JIT provisioning. If your team runs GitHub Actions and you want overflow routing without building the evaluator yourself:
Join the waitlist →