Hunting Coroutine Deadlocks with CodeQL
We recently used CodeQL to do a static analysis pass over our Kotlin codebase, looking for a specific coroutine anti-pattern: calling runBlocking from inside a running coroutine. This post walks through the problem, the query we wrote, the debugging journey to get it working, and what we found.
The Problem
Kotlin coroutines let you write asynchronous code that looks sequential. At the
edges — where coroutine code meets legacy blocking code — you need a bridge.
runBlocking is that bridge. It blocks the calling thread until the coroutine
inside it finishes.
That’s fine when called from a plain, non-coroutine thread. But if you call
runBlocking from inside an already-running coroutine, you can deadlock:
Dispatcher thread pool
┌──────────────────────────────────────────────────────────────┐
│ Thread 1: suspend fun A() ──► calls non-suspend B()          │
│           B() calls runBlocking { ... }                      │
│           runBlocking blocks Thread 1, waiting for a         │
│           coroutine to finish... but the dispatcher has      │
│           no free thread to run it on.                       │
│                                                              │
│ Thread 2: (also blocked, waiting for something else)         │
│                                                              │
│ DEADLOCK                                                     │
└──────────────────────────────────────────────────────────────┘
The tricky version isn’t a direct suspend fun calling runBlocking — that’s
obvious. The real risk is a suspend function calling a non-suspend helper, which
calls another non-suspend helper, somewhere deep in which there’s a
runBlocking. That call chain is what we wanted to detect automatically.
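To make the failure mode concrete, here is a minimal sketch that shrinks the dispatcher to a single thread so the starvation is guaranteed rather than load-dependent. The function names (loadConfig, buildReport, handleRequest, nestedCallFinishes) are hypothetical and not from our codebase; the pool thread is a daemon so the JVM can still exit.

```kotlin
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical helper: looks like ordinary blocking code, but waits on a coroutine.
fun loadConfig(dispatcher: CoroutineDispatcher): String =
    runBlocking(dispatcher) { "config" }

// Hypothetical middle frame: nothing in its signature hints at runBlocking.
fun buildReport(dispatcher: CoroutineDispatcher): String =
    "report using " + loadConfig(dispatcher)

// The suspend entry point that reaches runBlocking two frames down.
suspend fun handleRequest(dispatcher: CoroutineDispatcher): String =
    buildReport(dispatcher)

// Returns true if the coroutine finished within the timeout, false if it is stuck.
fun nestedCallFinishes(timeoutSeconds: Long): Boolean {
    // A one-thread pool stands in for a dispatcher whose threads are all busy.
    val executor = Executors.newSingleThreadExecutor { r ->
        Thread(r).apply { isDaemon = true }
    }
    val dispatcher = executor.asCoroutineDispatcher()
    val done = CountDownLatch(1)
    CoroutineScope(dispatcher).launch {
        // The only pool thread blocks inside runBlocking, so the inner
        // coroutine it is waiting for can never be scheduled.
        handleRequest(dispatcher)
        done.countDown()
    }
    return done.await(timeoutSeconds, TimeUnit.SECONDS)
}

fun main() {
    val finished = nestedCallFinishes(2)
    println(if (finished) "completed" else "stuck: pool thread is blocked inside runBlocking")
}
```

With a larger pool the same code usually completes, which is why the pattern tends to survive testing.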
In our codebase, nearly all runBlocking calls go through a local wrapper:
fun <T> runBlockingWithMdc(
    dispatcher: CoroutineDispatcher = Dispatchers.IO,
    body: suspend CoroutineScope.() -> T,
): T = runBlocking(dispatcher + newMDCContext(), body)
So we needed to detect both direct runBlocking calls and calls to
runBlockingWithMdc.
Choosing CodeQL
CodeQL lets you write queries over a compiled snapshot of your codebase (the “database”). You express what you’re looking for in QL, a declarative language, and it does the call-graph traversal for you. It handles transitive reachability naturally, which would be painful to do by hand.
The database for a Java/Kotlin project is built by letting CodeQL watch your build:
codeql database create codeql-db \
  --language=java \
  --command="./gradlew build -x test" \
  --source-root=.
For our codebase, this produces a queryable snapshot of around 767k lines of code.
The Query
The core detection query in RunBlockingLeakDetection.ql:
/**
* @name runBlocking called in non-suspending function reachable from suspending context
* @kind problem
* @problem.severity warning
* @id kotlin/run-blocking-leak
*/
import java

/** A call to kotlinx.coroutines runBlocking or to our local wrapper. */
class RunBlockingCall extends MethodCall {
  RunBlockingCall() {
    (
      this.getMethod().hasName("runBlocking") and
      this.getMethod().getDeclaringType().hasQualifiedName("kotlinx.coroutines", "BuildersKt")
    )
    or
    this.getMethod().hasName("runBlockingWithMdc")
  }
}

/** A suspend function defined in our own source. */
class SuspendMethod extends Method {
  SuspendMethod() {
    this.isSuspend() and
    this.fromSource()
  }
}

/** A non-suspend method defined in our own source. */
class NonSuspendMethod extends Method {
  NonSuspendMethod() {
    not this instanceof SuspendMethod and
    this.fromSource()
  }
}

/** Holds if `caller` reaches `callee` through one or more (polymorphic) calls. */
predicate calls(Method caller, Method callee) {
  // `+` is QL's transitive closure; equivalent to the hand-written recursion
  caller.polyCalls+(callee)
}

from SuspendMethod suspendCaller, NonSuspendMethod nonSuspendMiddle, RunBlockingCall rb
where
  calls(suspendCaller, nonSuspendMiddle) and
  rb.getEnclosingCallable() = nonSuspendMiddle
select rb,
  "runBlocking in non-suspending '" + nonSuspendMiddle.getName() +
    "' is reachable from suspend fun '" + suspendCaller.getName() + "'"
The structure:
┌─────────────────────────────────────────────────────────────────┐
│ SuspendMethod ──(calls, transitively)──► NonSuspendMethod       │
│                                                 │               │
│                                                 ▼               │
│                                          RunBlockingCall        │
│                                     (runBlocking or wrapper)    │
└─────────────────────────────────────────────────────────────────┘
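For readers who don't write QL, the logic of the query can be modelled on a toy call graph in plain Kotlin. This is only an illustration of the reachability idea, not how CodeQL evaluates it; Fn, reachableFrom, and findLeaks are hypothetical names, and the graph is hand-built.

```kotlin
// Toy model of a method: a name plus the two facts the query cares about.
data class Fn(
    val name: String,
    val isSuspend: Boolean,
    val callsRunBlocking: Boolean = false,
)

// Everything reachable from `start` through one or more call edges.
fun reachableFrom(start: String, edges: Map<String, List<String>>): Set<String> {
    val seen = mutableSetOf<String>()
    val queue = ArrayDeque(edges[start].orEmpty())
    while (queue.isNotEmpty()) {
        val next = queue.removeFirst()
        if (seen.add(next)) queue.addAll(edges[next].orEmpty())
    }
    return seen
}

// Pairs of (suspend caller, non-suspend method containing runBlocking).
fun findLeaks(fns: List<Fn>, edges: Map<String, List<String>>): List<Pair<String, String>> {
    val byName = fns.associateBy { it.name }
    return fns.filter { it.isSuspend }.flatMap { caller ->
        reachableFrom(caller.name, edges)
            .mapNotNull { byName[it] }
            .filter { !it.isSuspend && it.callsRunBlocking }
            .map { caller.name to it.name }
    }
}

fun main() {
    val fns = listOf(
        Fn("invoke", isSuspend = true),
        Fn("execute", isSuspend = false),
        Fn("guzzle", isSuspend = false, callsRunBlocking = true),
    )
    val edges = mapOf(
        "invoke" to listOf("execute"),
        "execute" to listOf("guzzle"),
    )
    println(findLeaks(fns, edges)) // prints [(invoke, guzzle)]
}
```

As in the query, intermediate frames are not inspected: only the start of the chain (suspend) and the method containing the call (non-suspend) are constrained.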
Debugging the Query
Getting to a working query took three rounds of debugging.
Round 1: Wrong class name
The CodeQL Java library was updated in version 9.x: MethodAccess was renamed
to MethodCall. The original query used MethodAccess and failed to compile:
ERROR: could not resolve type MethodAccess
Fix: extends MethodAccess → extends MethodCall.
Round 2: Parameterized type resolution
The original SuspendMethod detection used:
this.getAParameter().getType().(RefType)
.hasQualifiedName("kotlin.coroutines", "Continuation")
This matched nothing. The reason: in Kotlin bytecode, the continuation parameter
has a parameterized type — Continuation<? super T> — not the raw
Continuation. hasQualifiedName on the raw type doesn’t match the
parameterized form.
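The JVM-level shape described above can be inspected with plain Java reflection. The sketch below uses a hypothetical Sample class and assumes the Kotlin compiler emits generic signature metadata for the method, which it does in the builds we have looked at; it compares the erased and generic views of the trailing parameter the compiler adds to a suspend function.

```kotlin
import java.lang.reflect.ParameterizedType
import java.lang.reflect.Type
import kotlin.coroutines.Continuation

// Hypothetical sample; any suspend function would do.
class Sample {
    suspend fun fetch(id: Int): String = "item-$id"
}

// Returns the erased and the generic type of the last JVM parameter of fetch().
fun continuationParameter(): Pair<Class<*>, Type> {
    val method = Sample::class.java.declaredMethods.first { it.name == "fetch" }
    return method.parameterTypes.last() to method.genericParameterTypes.last()
}

fun main() {
    val (erased, generic) = continuationParameter()
    // Erased view: the raw Continuation interface
    println("erased is Continuation: ${erased == Continuation::class.java}")
    // Generic view: a parameterized type, not the raw class
    println("generic is parameterized: ${generic is ParameterizedType}")
    println("generic type name: ${generic.typeName}")
}
```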
There’s a better way. The CodeQL Java library has isSuspend() built in:
class SuspendMethod extends Method {
  SuspendMethod() {
    this.isSuspend() and   // ← native predicate, handles Kotlin source
    this.fromSource()
  }
}
Switching to isSuspend() returned 1,034 suspend methods from source, which confirmed the database had the right content.
Round 3: Standard library packs not found
Running the query without first installing the QL pack produced:
ERROR: could not resolve module java
ERROR: could not resolve module semmle.code.java.dataflow.DataFlow
The CodeQL CLI ships without the standard query libraries. They’re installed separately:
codeql pack download codeql/java-all
cd codeql-analysis && codeql pack install
The Finding
After those fixes, the query ran in about 3 seconds and returned one result:
runBlockingWithMdc(...) — runBlocking in non-suspending 'guzzle'
is reachable from suspend fun 'invoke'
Tracing the call chain:
ScheduledCommand.run()
  └─ runBlockingWithMdc(Dispatchers.IO) {
       launch { scheduler.start() }   ← coroutine starts here
     }
       └─ scheduler calls task { execute() }
            ↑ task: suspend (TaskRequest) -> Unit
            └─ execute()   [non-suspend]
                 └─ guzzler.guzzle()   [non-suspend]
                      └─ runBlockingWithMdc(Dispatchers.IO) { ... }
                           ↑ runBlocking inside a running coroutine!
The task type is:
typealias TaskFunction = (suspend (TaskRequest) -> Unit)
So the lambda { execute() } is a suspend function. Its invoke method is the
SuspendMethod CodeQL found. Inside that suspend invoke, the chain goes
through two non-suspend frames (execute → guzzle) and terminates in
runBlockingWithMdc.
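The shape of this finding is easy to reproduce in isolation. In the sketch below, TaskRequest, guzzle, execute, and runScheduledTask are simplified stand-ins rather than our real classes, and plain runBlocking replaces runBlockingWithMdc (MDC propagation is left out). Note that the compiler accepts the lambda without complaint: the expected type alone turns it into a suspend lambda.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical stand-ins for the real types; only the shape matters.
class TaskRequest(val id: Int)
typealias TaskFunction = suspend (TaskRequest) -> Unit

// Non-suspend, but blocks on a coroutine internally (stands in for the wrapper).
fun guzzle(): String = runBlocking(Dispatchers.IO) { "guzzled" }

// Non-suspend middle frame.
fun execute(): String = guzzle()

fun runScheduledTask(): String {
    var result = ""
    // Because of the expected type, this is a suspend lambda:
    // execute() now runs inside a coroutine.
    val task: TaskFunction = { result = execute() }
    runBlocking(Dispatchers.IO) {
        launch { task(TaskRequest(1)) }
    }
    return result
}

fun main() {
    println(runScheduledTask())
}
```

On an idle machine this completes normally, because Dispatchers.IO has spare threads for the nested runBlocking to use.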
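Whether this actually deadlocks depends on dispatcher thread availability:
Dispatchers.IO has a large but bounded pool (by default 64 threads, or the
number of CPU cores if that is larger). Each nested runBlocking pins one of
those threads while it waits, so this is the kind of thing that only fails
under load, in production, on a bad day. Worth fixing.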
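One possible direction for a fix, sketched under assumptions rather than taken from our actual change: make the chain suspend all the way down and switch context with withContext, keeping a blocking entry point only for callers that are genuinely outside any coroutine. The function names are hypothetical and MDC propagation is omitted for brevity.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical suspend variant: withContext suspends instead of blocking,
// so the calling thread goes back to the pool while the work runs.
suspend fun guzzleSuspending(): String =
    withContext(Dispatchers.IO) { "guzzled" }

// The middle frame becomes suspend too, so the chain never leaves coroutine land.
suspend fun executeSuspending(): String = guzzleSuspending()

// Blocking bridge for plain, non-coroutine callers only.
fun guzzleBlocking(): String = runBlocking { guzzleSuspending() }

fun main() {
    // Called from inside a coroutine: no nested runBlocking anywhere in the chain.
    val fromCoroutine = runBlocking(Dispatchers.IO) {
        async { executeSuspending() }.await()
    }
    println(fromCoroutine)
    // Called from a plain thread: the bridge is safe here.
    println(guzzleBlocking())
}
```

With this shape the query has nothing to report for the suspend path, since guzzleBlocking is no longer reachable from a suspend function.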
Running the Analysis
The run-analysis.sh script handles a full rebuild + query run:
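#!/usr/bin/env bash
set -euo pipefail

# Paths are assumptions inferred from the manual commands in this post;
# adjust them to your checkout layout.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
DB_PATH="$PROJECT_ROOT/codeql-db"
QUERY="$SCRIPT_DIR/RunBlockingLeakDetection.ql"

# Rebuild the database by tracing the Gradle build
codeql database create "$DB_PATH" \
  --language=java \
  --command="./gradlew build -x test" \
  --overwrite \
  --source-root="$PROJECT_ROOT"

# Run the query, then print the results as text
codeql query run \
  --database="$DB_PATH" \
  --output="$SCRIPT_DIR/results.bqrs" \
  "$QUERY"
codeql bqrs decode \
  --format=text \
  "$SCRIPT_DIR/results.bqrs"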
If the database is already current, skip straight to the query:
codeql query run \
  --database=codeql-db \
  --output=codeql-analysis/results.bqrs \
  codeql-analysis/RunBlockingLeakDetection.ql
codeql bqrs decode --format=text codeql-analysis/results.bqrs
What We Learned
The CodeQL Java library has first-class Kotlin support. isSuspend(),
fromSource(), and the rest of the modifier predicates work correctly for Kotlin
source code extracted through the Java/Kotlin extractor. The extractor sees the
Kotlin source AST, not the JVM bytecode, so the Continuation parameter trick
doesn’t apply — isSuspend() is the right way to detect suspend functions.
Wrapper functions matter. If we’d only looked for direct runBlocking calls,
we’d have found nothing. Real codebases wrap runBlocking with project-specific
context (MDC propagation in our case), and the query needs to know about those
wrappers.
Transitive reachability is where the value is. The actual runBlocking call
is buried two frames below the suspend function boundary. Neither a simple grep
for runBlocking nor a single-level call check would have found this.
One finding is a good result. We scanned about 767k lines of Kotlin/Java across 30+ modules and got one actionable issue with no false positives. That is noise-free enough to be useful, and the low count suggests the codebase generally avoids the pattern. A static query like this cannot tell us how many cases it missed, so it complements rather than replaces review.