
20 CPU/tick: how I hit the ceiling and found 4.8 CPU in untracked logic

The bot was eating a steady 17.7 of its 20 CPU per tick, and the bucket sat at 2-22. Decomposition showed that 8.25 CPU is a hard intent ceiling that can't be reduced, while 4.8 CPU sits in untracked logic, and that's where 90% of the potential lives. A story about module-level cache in an ephemeral runtime.

#screeps #performance #profiling #architecture #javascript

A continuation of the article on monitoring a Screeps bot via Grafana. That one was about infrastructure; this one is about the specific crisis that infrastructure helped solve.

In Screeps every player has a hard CPU budget. At GCL 6 it's 20 CPU/tick. Exceed it — the bucket drops; bucket runs out — creeps stop executing, the bot stalls. This isn't "optimization for optimization's sake," it's a hard limit past which the bot physically can't play.

In April my bot was steadily burning 17.7 CPU/tick out of the 20 ceiling, with the bucket bouncing 2-22 against a normal of 10,000. Every 5 minutes the dashboard showed 1-2 overrun spikes after which a couple of rooms skipped a tick. I knew I had to optimize. But I didn't know what exactly.

This article is about how I broke those 17.7 CPU down by category, found that 4.8 of them are "air" (untracked logic), and about the canonical module-level cache pattern, which works differently than it seems in Screeps' ephemeral runtime.

Bucket as a health indicator

First, the CPU model in Screeps. Every tick the bot has:

  • CPU limit — fixed, 20 at GCL 6
  • Bucket — a reservoir. If you spent under the limit, the leftover drips into the bucket (up to 10,000). If you spent over, it pulls the difference from the bucket.
  • Overrun — when the bucket is empty and the tick didn't fit: code stops, Game.cpu.tickLimit = 5 for the next tick, the bot effectively skips.
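The bucket rules above can be sketched as a small pure function. This is a simplified model for illustration, not engine code: stepBucket and simulate are hypothetical names, and the model assumes the bucket just gains (limit - used) each tick, clamped between 0 and 10,000.

```javascript
// Simplified model of the bucket rules described above (not engine code).
const BUCKET_MAX = 10000;

// One tick: leftover CPU is added to the bucket, an overrun is drained from it.
function stepBucket(bucket, limit, used) {
    const next = bucket + (limit - used);
    return Math.max(0, Math.min(BUCKET_MAX, next));
}

// Run the model over a list of per-tick CPU usages and return the final bucket.
function simulate(bucket, limit, usagePerTick) {
    for (const used of usagePerTick) {
        bucket = stepBucket(bucket, limit, used);
    }
    return bucket;
}
```

In this model, an average of 17.7 against a limit of 20 refills the bucket by only about 2.3 per tick, so a single 30-CPU tick wipes out more than four ticks of savings.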

A healthy system: avg < 15, bucket steadily > 5000, you can afford a one-off 25-30 CPU spike without pain. My system: avg 17.7, bucket 2-22 — on the edge, any surge (spawn wave, combat) → overrun.

The bucket graph in Grafana looked like a dying patient's cardiogram: crawling near zero, sometimes twitching to 22, then back to zero. Without the chart I'd have written it off as "well, that happens."

Decomposition: where the 17.7 CPU goes

The bot writes Memory._cpu metrics every 100 ticks. That goes into segment 0 → PostgreSQL → Grafana. With a Grafana filter I broke down the 17.7 CPU/tick like this:

Component                            | CPU   | Reducible?
Intents (tracked)                    | 8.25  | No (hard ceiling)
Task generation                      | ~1-2  | Yes
Overhead (findTask, Logger, pickup)  | ~1.5  | Yes
Role state machines                  | ~0.3  | Don't touch
Init + Memory parse                  | 1.6   | A little
Rooms                                | 4.2   | A little
Untracked in creeps                  | 4.8   | Yes, the main potential

Intents are creep.move(), creep.harvest(), creep.transfer(), any in-game actions. Each has a fixed CPU price you can't lower — that's the rule of the game. 8.25 CPU of intents at 43-45 creeps is the ceiling; the only way through is fewer creeps or fewer actions.

But 4.8 CPU untracked — that's interesting. That's not intents, that's all the rest of the JS logic: task selection, the runRoom loop, task-pool generation, state machines, filters. That's where 90% of the optimization potential sits.
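For readers who want to reproduce the "tracked vs untracked" split, here is a minimal sketch of one way to measure it. It is not the bot's actual instrumentation: createTracker, measure and report are hypothetical names, and the CPU clock is injected so the sketch also runs outside Screeps. In the game you would pass () => Game.cpu.getUsed().

```javascript
// Hypothetical section tracker. getUsed is any function returning CPU used so far
// in the current tick, e.g. () => Game.cpu.getUsed() inside Screeps.
function createTracker(getUsed) {
    const sections = {};
    return {
        // Run fn and attribute the CPU it consumed to the named section.
        measure(name, fn) {
            const start = getUsed();
            try {
                return fn();
            } finally {
                sections[name] = (sections[name] || 0) + (getUsed() - start);
            }
        },
        // Untracked = everything spent so far that no section claimed.
        report() {
            const total = getUsed();
            const tracked = Object.values(sections).reduce((sum, v) => sum + v, 0);
            return { total, tracked, untracked: total - tracked, sections };
        },
    };
}
```

One caveat with this design: nested measure() calls count the inner section twice, so sections should not overlap if the untracked remainder is meant to be trusted.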

CPU per role

Role      | Count | CPU | CPU/creep
hauler    | 17    | 5.9 | 0.35
miner     | 12    | 3.3 | 0.27
upgrader  | 3     | 1.2 | 0.40
skMiner   | 3     | 0.9 | 0.30
skHauler  | 3     | 0.9 | 0.30
worker    | 5     | 0.4 | 0.08
skKiller  | 1     | 0.3 | 0.30

The worker is cheap because its logic is simple: came to a build site, built, went for energy. The hauler is expensive because every tick it picks a task again from a pool of 30+ options with priorities. With 17 haulers that's 17 walks through the pool per tick — that's where the 5.9 CPU sits.

CPU per method

The bot also writes which game methods spent how much:

Method       | CPU  | Calls | CPU/call
moveTo       | 2.11 | 8     | 0.26
transfer     | 1.69 | 11    | 0.15
harvest      | 1.48 | 14    | 0.11
withdraw     | 0.81 | 5     | 0.16
roomFind     | 0.69 | 124   | 0.006
upgrade      | 0.59 | 3     | 0.20
findInRange  | 0.50 | 41    | 0.012

roomFind 124 times per tick sounds like a lot, but it adds up to only 0.69 CPU, so it's not the worst pain. transfer at 1.69 CPU on 11 calls, though, is a hint that someone is calling transfer every tick.
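Per-method numbers like these can be collected by wrapping prototype methods. The sketch below is an illustration of that technique, not the bot's real profiler: wrapMethod and the stats shape are hypothetical, and the CPU clock is injected as a parameter.

```javascript
// Replace proto[name] with a wrapper that records call count and CPU per method.
// stats is a plain object: { [methodName]: { cpu, calls } }.
function wrapMethod(proto, name, stats, getUsed) {
    const original = proto[name];
    proto[name] = function (...args) {
        const start = getUsed();
        try {
            // Preserve `this` and the return code (OK, ERR_NOT_IN_RANGE, ...).
            return original.apply(this, args);
        } finally {
            const s = stats[name] || (stats[name] = { cpu: 0, calls: 0 });
            s.cpu += getUsed() - start;
            s.calls += 1;
        }
    };
}

// In Screeps (assumed usage):
// wrapMethod(Creep.prototype, 'transfer', stats, () => Game.cpu.getUsed());
```

The wrapper itself costs a little CPU per call, so it makes sense to wrap only the handful of methods you actually want in the table.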

Module-level cache: the canonical pattern

The most counterintuitive part of Screeps optimization is where to keep your cache.

In a regular Node.js process you can do room.taskPool = [...] — and it lives as long as the process lives. In Screeps that doesn't work:

Game.* objects are recreated every tick. room.foo, creep.bar, structure.baz do NOT survive the tick boundary.

My first two optimization attempts went through room._taskPool — the cache "worked" for that tick, but a tick later it was gone. And Memory.rooms[name].taskPool is also bad: Memory deserialization is expensive, plus task objects (with pos, target) after deserialization are dead pojos, no methods.

The right pattern is module-level state, which lives in the require cache of the global sandbox:

// top of module — persists via require cache
const _cache = {};

function getCached(key, ttl, compute) {
    const tick = Game.time;
    const e = _cache[key];
    if (e && (tick - e.tick) < ttl) return e.value;
    const v = compute();
    _cache[key] = { value: v, tick };
    return v;
}

The Screeps global sandbox lives between ticks, until the server does a global reset (on code deploy, or once every few hundred to a few thousand ticks). So _cache in module scope survives the tick boundary, unlike fields on game objects. After a reset the cache simply starts empty and refills on the next call.

What you can't cache in the module cache: the Room, Creep, Structure wrapper objects themselves. They're dead next tick — you can't call methods. You store ids, and resolve via Game.getObjectById(id) each time.
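A minimal sketch of the "store ids, resolve each tick" rule. The helper name getContainers and the choice of containers are hypothetical, picked only for illustration; room.find, FIND_STRUCTURES, STRUCTURE_CONTAINER and Game.getObjectById are the regular Screeps API.

```javascript
// Module scope: holds ids only, never live game objects.
const _containerIds = {}; // roomName -> { ids, tick }

function getContainers(room, ttl) {
    const entry = _containerIds[room.name];
    if (!entry || Game.time - entry.tick >= ttl) {
        // Expensive part: runs at most once per ttl ticks per room.
        const found = room.find(FIND_STRUCTURES, {
            filter: s => s.structureType === STRUCTURE_CONTAINER,
        });
        _containerIds[room.name] = { ids: found.map(s => s.id), tick: Game.time };
    }
    // Cheap part: turn ids back into this tick's live objects.
    // Destroyed structures resolve to null and are dropped.
    return _containerIds[room.name].ids
        .map(id => Game.getObjectById(id))
        .filter(Boolean);
}
```

The returned objects are valid only for the current tick, so callers should not stash them anywhere that outlives it.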

This pattern is used by every well-known public bot (Overmind, the-international, bonzAI). I just didn't know.

Phase 1: Task pool TTL=2

The most expensive bottleneck — generating the task pool for a room. TaskGenerator.generate(room) walks structures, looks for drops, computes priorities — and returns an array of 30+ tasks. This was being called every tick in every room: 6 rooms × ~0.7 CPU = ~4.2 CPU in task generation.

But the pool changes rarely: new tasks appear on events such as a creep dying, a structure filling up, or a drop disappearing after pickup. Between events the 30+ tasks in the pool stay identical, so the pool can be cached and held for 2-3 ticks.

Implementation:

// core.task.queue.js
getPool: function(room) {
    // Make sure room._cache is available for this tick.
    Cache.init(room);
    const TTL = 2;
    // Cache hit: the stored pool is younger than TTL ticks.
    if (room._cache.taskPool && room._cache.taskPoolTick &&
        Game.time - room._cache.taskPoolTick < TTL) {
        return room._cache.taskPool;
    }
    // Cache miss: regenerate, sort by priority, stamp with the current tick.
    const pool = TaskGenerator.generate(room);
    pool.sort((a, b) => a.priority - b.priority);
    room._cache.taskPool = pool;
    room._cache.taskPoolTick = Game.time;
    return pool;
}

And invalidations on events:

  • TaskQueue.assign() with maxAssigned=1 → drop the cache (slot taken)
  • TaskQueue.complete/release() → drop the cache (slot freed)
  • A creep died in the room → drop the cache
  • Pickup removed a ground resource → drop the cache
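The TTL plus event invalidation above can be sketched in a few lines. This is a stripped-down illustration, not the bot's code: getPool here takes the current tick and a generator function as parameters, and invalidatePool is a hypothetical name for what "drop the cache" means.

```javascript
// Module-level pool cache keyed by room name.
const _pools = {}; // roomName -> { pool, tick }
const POOL_TTL = 2;

// Return the cached pool while it is younger than POOL_TTL ticks,
// otherwise call generate() and cache the result.
function getPool(roomName, now, generate) {
    const entry = _pools[roomName];
    if (entry && now - entry.tick < POOL_TTL) return entry.pool;
    const pool = generate();
    _pools[roomName] = { pool, tick: now };
    return pool;
}

// Call this from assign/complete/release, on creep death, and after a pickup.
function invalidatePool(roomName) {
    delete _pools[roomName];
}
```

Dropping the whole entry is the bluntest possible invalidation. It costs one regeneration per event, which is still far cheaper than regenerating every tick.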

The risk was around pickup tasks: a dropped resource pile decays by ceil(amount / 1000) per tick, so a 500e pile loses 1e per tick, or 50e over 50 ticks. At TTL=2 that's negligible. Tomb decay is a bit faster, but still within acceptable limits.

Effect after deploy:

  • Hit rate 73% (i.e. 73% of getPool calls return the cached value)
  • avg CPU ~1 lower
  • bucket crawled from 2-22 to 13-52

Not the 10,000 you'd want, but for the first time the bucket is steadily growing instead of pinned at zero.

What turned out NOT to be the problem

After profiling I thought the main culprits were room.find() (124 calls/tick) and moveTo. Turned out no. The real bottlenecks showed up in decomposition, not intuition. Candidates I rejected:

Candidate                           | Why I didn't touch it
manager.colonize.js 26 finds/tick   | Already throttled via Game.time % 50 === 0
room.find in hauler dead branch     | Only fires in starter colonies, not prod
Caching 124 roomFind calls          | Sums to 0.69 CPU, bad ROI on refactoring 30 sites
Removing moveTo reusePath           | 8 calls/tick, pathfinding fires rarely already

Lesson: intuition said "optimize find()", numbers said "optimize task generation". Numbers won.

Next step: miner transfer throttle

The funniest find came after Phase 1. I noticed that transfer at 1.69 CPU on 11 calls/tick was mostly miners doing creep.transfer(link, ENERGY) every tick when store > 0.

A 5W miner harvests 10 e/tick, store cap 50. So after the first harvest they have 10 in store, after the transfer 0, a tick later 10 again. Same cycle every tick, transfer every tick: 0.15 CPU × 12 miners = 1.8 CPU on a single action that could be done once every 4-5 ticks.

One-line fix:

// WAS:
if (creep.store[RESOURCE_ENERGY] > 0) {
    creep.transfer(link, RESOURCE_ENERGY);
}

// IS:
if (creep.store[RESOURCE_ENERGY] >= 40) {
    creep.transfer(link, RESOURCE_ENERGY);
}

The miner accumulates to 40, then dumps in one shot: a 40→0 cycle once every 4 ticks. With 12 miners that's 3 transfers per tick instead of 12. Expected savings: 9 transfers/tick × 0.15 CPU = 1.35 CPU. That's more than all of Phase 1 gave me.
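The transfer count is easy to sanity-check with a toy simulation. This is a simplified model under stated assumptions, not game code: simulateMiner is a hypothetical helper, the decision uses the store as it was at the start of the tick, a transfer empties the store, and the harvest of 10 lands in the same tick.

```javascript
// Count how many transfer intents one miner issues over `ticks` ticks.
// threshold = 1 models the old `store > 0` check, threshold = 40 the new one.
function simulateMiner(threshold, ticks, harvestPerTick = 10, capacity = 50) {
    let store = 0;
    let transfers = 0;
    for (let t = 0; t < ticks; t++) {
        let next = store;
        if (store >= threshold) {
            transfers++;   // transfer intent issued this tick
            next = 0;      // the link takes everything the miner carried
        }
        // Harvest happens every tick, capped at carry capacity.
        store = Math.min(capacity, next + harvestPerTick);
    }
    return transfers;
}
```

Over 400 ticks this model gives 399 transfers with the old check and 99 with the threshold of 40, which matches the one-in-four cycle.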

Risk is zero. harvest is in pipeline P1, transfer is in P3 — they don't conflict (see [the upcoming article on action pipelines, if I write it]). Threshold 40 + harvest 10 = 50 = capacity — overflow is impossible.

This one-liner is queued for deploy as the next phase.

What I learned

Hard ceiling vs soft ceiling. In Screeps intents cost a fixed amount — that's a hard ceiling. No matter how much you optimize, 8.25 CPU on 43 creeps doesn't go anywhere. If a system has a hard ceiling — find it first so you don't optimize the impossible. The SaaS analogue is the cost of a SQL query in the DB: if the query is mandatory, optimizing the code around it has a ceiling.

Hot path ≠ hot logic. room.find() 124 times/tick sounds scary, but adds up to 0.69 CPU. And TaskGenerator.generate() 6 times/tick is 4.2 CPU. Optimize logic that does a lot per call, not points that get called often.

Module-level state in ephemeral runtimes. In Screeps the global sandbox lives between ticks, but game objects don't. That's counterintuitive: it feels like room.foo is more persistent than a global variable, but it's the opposite. FaaS / Lambda has a similar rule: a warm container survives requests, request-scope state doesn't.

Numbers beat intuition. Until I sorted the 17.7 CPU into categories, I'd have optimized room.find() and saved 0.5 CPU. Decomposition showed that untracked logic holds 4.8 CPU out of 13.1, and with it 90% of the potential. Without a Grafana dashboard I wouldn't have seen this.


In the next article — about action pipelines in Screeps: how two methods can both return OK but one of them is silently ignored by the engine; and how a drainer (5T+18RA+17M+10H) becomes immortal under three towers precisely because of correct pipeline choices. That's not about performance anymore — it's about silent failures and why OK ≠ "done."


20 CPU/tick: how I hit the ceiling and found 4.8 CPU in untracked logic · Grigoriy Masich