Sharat Visweswara

Hunting Coroutine Deadlocks with CodeQL

We recently used CodeQL to do a static analysis pass over our Kotlin codebase, looking for a specific coroutine anti-pattern: calling runBlocking from inside a running coroutine. This post walks through the problem, the query we wrote, the debugging journey to get it working, and what we found.


The Problem

Kotlin coroutines let you write asynchronous code that looks sequential. At the edges — where coroutine code meets legacy blocking code — you need a bridge. runBlocking is that bridge. It blocks the calling thread until the coroutine inside it finishes.

That’s fine when called from a plain, non-coroutine thread. But if you call runBlocking from inside an already-running coroutine, you can deadlock:

Dispatcher thread pool
┌──────────────────────────────────────────────────────────────┐
│  Thread 1: suspend fun A() ──► calls non-suspend B()         │
│              B() calls runBlocking { ... }                   │
│              runBlocking blocks Thread 1, waiting for a      │
│              coroutine to finish... but the dispatcher has   │
│              no free thread to run it on.                    │
│                                                              │
│  Thread 2: (also blocked, waiting for something else)        │
│                                                              │
│  DEADLOCK                                                    │
└──────────────────────────────────────────────────────────────┘

The tricky version isn’t a direct suspend fun calling runBlocking — that’s obvious. The real risk is a suspend function calling a non-suspend helper, which calls another non-suspend helper, somewhere deep in which there’s a runBlocking. That call chain is what we wanted to detect automatically.
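
A minimal sketch of that shape, assuming kotlinx-coroutines-core is on the classpath. The function names (loadConfig, render, handleRequest) are made up for illustration, and the single-thread executor is a stand-in chosen to make the hang deterministic; a larger pool such as Dispatchers.IO would only hang once every thread is busy.

```kotlin
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical non-suspend helper: looks harmless from the outside,
// but blocks the calling thread until the inner coroutine completes.
fun loadConfig(dispatcher: CoroutineDispatcher): String =
    runBlocking(dispatcher) { "config" }

// Hypothetical non-suspend middle frame between the coroutine and runBlocking.
fun render(dispatcher: CoroutineDispatcher): String = "page:" + loadConfig(dispatcher)

// The suspend entry point. Nothing in its body mentions runBlocking.
suspend fun handleRequest(dispatcher: CoroutineDispatcher): String = render(dispatcher)

// Returns true if the launched coroutine did not finish within the timeout.
fun hangsOnSingleThread(): Boolean {
    // Daemon thread so the JVM can still exit while the thread is stuck.
    val executor = Executors.newSingleThreadExecutor { r -> Thread(r).apply { isDaemon = true } }
    val dispatcher = executor.asCoroutineDispatcher()
    val finished = CountDownLatch(1)
    CoroutineScope(dispatcher).launch {
        // Runs on the pool's only thread; runBlocking then waits for work
        // that is queued behind this very task.
        handleRequest(dispatcher)
        finished.countDown()
    }
    return !finished.await(2, TimeUnit.SECONDS)
}

fun main() {
    println(if (hangsOnSingleThread()) "stuck: pool thread is blocked in runBlocking" else "completed")
}
```

With one thread in the pool, the inner coroutine can never be scheduled, so the latch should time out. Reading handleRequest alone gives no hint of this, which is why the check has to follow the call chain.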

In our codebase, nearly all runBlocking calls go through a local wrapper:

fun <T> runBlockingWithMdc(
    dispatcher: CoroutineDispatcher = Dispatchers.IO,
    body: suspend CoroutineScope.() -> T,
): T = runBlocking(dispatcher + newMDCContext(), body)

So we needed to detect both direct runBlocking calls and calls to runBlockingWithMdc.


Choosing CodeQL

CodeQL lets you write queries over a compiled snapshot of your codebase (the “database”). You express what you’re looking for in QL, a declarative language, and it does the call-graph traversal for you. It handles transitive reachability naturally, which would be painful to do by hand.

The database for a Java/Kotlin project is built by letting CodeQL watch your build:

codeql database create codeql-db \
  --language=java \
  --command="./gradlew build -x test" \
  --source-root=.

This produces a queryable snapshot of the codebase, which in our case is around 767k lines of code.


The Query

The core detection query in RunBlockingLeakDetection.ql:

/**
 * @name runBlocking called in non-suspending function reachable from suspending context
 * @kind problem
 * @problem.severity warning
 * @id kotlin/run-blocking-leak
 */

import java

class RunBlockingCall extends MethodCall {
  RunBlockingCall() {
    (
      this.getMethod().hasName("runBlocking") and
      this.getMethod().getDeclaringType().hasQualifiedName("kotlinx.coroutines", "BuildersKt")
    )
    or
    this.getMethod().hasName("runBlockingWithMdc")
  }
}

class SuspendMethod extends Method {
  SuspendMethod() {
    this.isSuspend() and
    this.fromSource()
  }
}

class NonSuspendMethod extends Method {
  NonSuspendMethod() {
    not this instanceof SuspendMethod and
    this.fromSource()
  }
}

predicate calls(Method caller, Method callee) {
  caller.polyCalls(callee)
  or
  exists(Method mid |
    caller.polyCalls(mid) and
    calls(mid, callee)
  )
}

from SuspendMethod suspendCaller, NonSuspendMethod nonSuspendMiddle, RunBlockingCall rb
where
  calls(suspendCaller, nonSuspendMiddle) and
  rb.getEnclosingCallable() = nonSuspendMiddle
select rb,
  "runBlocking in non-suspending '" + nonSuspendMiddle.getName() +
  "' is reachable from suspend fun '" + suspendCaller.getName() + "'"

The structure:

┌─────────────────────────────────────────────────────────────────┐
│  SuspendMethod ──(calls, transitively)──► NonSuspendMethod      │
│                                                   │             │
│                                                   ▼             │
│                                          RunBlockingCall        │
│                                   (runBlocking or wrapper)      │
└─────────────────────────────────────────────────────────────────┘
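
For readers who don't write QL, the same idea can be modelled in a few lines of plain Kotlin. This is only a toy illustration, not how CodeQL evaluates the query: the call graph is a hand-built map, and Fn, reachable, and findLeaks are names invented for this sketch. The three functions mirror the names that appear in the finding described later in this post.

```kotlin
// Toy model of a function in the call graph. All names are illustrative.
data class Fn(val name: String, val isSuspend: Boolean, val callsRunBlocking: Boolean = false)

// Everything reachable from `start` through one or more calls (breadth-first).
fun reachable(start: Fn, edges: Map<Fn, List<Fn>>): Set<Fn> {
    val seen = LinkedHashSet<Fn>()
    val queue = ArrayDeque(edges[start].orEmpty())
    while (queue.isNotEmpty()) {
        val next = queue.removeFirst()
        // add() returns false for already-visited nodes, which also handles cycles.
        if (seen.add(next)) queue.addAll(edges[next].orEmpty())
    }
    return seen
}

// Pairs of (suspend caller, non-suspend function containing a runBlocking call).
fun findLeaks(edges: Map<Fn, List<Fn>>): List<Pair<Fn, Fn>> {
    val all = edges.keys + edges.values.flatten()
    return all.filter { it.isSuspend }.flatMap { caller ->
        reachable(caller, edges)
            .filter { !it.isSuspend && it.callsRunBlocking }
            .map { caller to it }
    }
}

fun main() {
    val invoke = Fn("invoke", isSuspend = true)
    val execute = Fn("execute", isSuspend = false)
    val guzzle = Fn("guzzle", isSuspend = false, callsRunBlocking = true)
    val edges = mapOf(invoke to listOf(execute), execute to listOf(guzzle))
    for ((caller, leak) in findLeaks(edges)) {
        println("runBlocking in '${leak.name}' is reachable from suspend fun '${caller.name}'")
    }
}
```

The recursive calls predicate plays the role of reachable here, and the from/where clause plays the role of findLeaks.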

Debugging the Query

Getting to a working query took three rounds of debugging.

Round 1: Wrong class name

The CodeQL Java library was updated in version 9.x: MethodAccess was renamed to MethodCall. The original query used MethodAccess and failed to compile:

ERROR: could not resolve type MethodAccess

Fix: extends MethodAccess → extends MethodCall.

Round 2: Parameterized type resolution

The original SuspendMethod detection used:

this.getAParameter().getType().(RefType)
    .hasQualifiedName("kotlin.coroutines", "Continuation")

This matched nothing. The reason: in Kotlin bytecode, the continuation parameter has a parameterized type — Continuation<? super T> — not the raw Continuation. hasQualifiedName on the raw type doesn’t match the parameterized form.
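
The JVM-level shape described above can be checked with plain Java reflection, independently of CodeQL. This is a small sketch using only the standard library; Demo and fetch are hypothetical names that exist only so there is a compiled suspend function to inspect.

```kotlin
import java.lang.reflect.ParameterizedType

object Demo {
    // Hypothetical suspend function, used only to inspect its compiled signature.
    suspend fun fetch(id: Int): String = "item-$id"
}

fun main() {
    val method = Demo::class.java.declaredMethods.first { it.name == "fetch" }
    val last = method.genericParameterTypes.last()

    // Erased view: the extra trailing parameter the compiler appends.
    println(method.parameterTypes.last().name)

    // Generic view: a parameterized Continuation rather than the raw type.
    println(last is ParameterizedType)
    println(last.typeName)
}
```

The erased parameter type is kotlin.coroutines.Continuation, while the generic view carries a type argument, which is the mismatch that tripped up the raw-name comparison.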

There’s a better way. The CodeQL Java library has isSuspend() built in:

class SuspendMethod extends Method {
  SuspendMethod() {
    this.isSuspend() and   // ← native predicate, handles Kotlin source
    this.fromSource()
  }
}

Using that returned 1,034 suspend methods from source — confirming the database had the right content.

Round 3: Standard library packs not found

Running the query without first installing the QL pack produced:

ERROR: could not resolve module java
ERROR: could not resolve module semmle.code.java.dataflow.DataFlow

The CodeQL CLI ships without the standard query libraries. They’re installed separately:

codeql pack download codeql/java-all
cd codeql-analysis && codeql pack install

The Finding

After those fixes, the query ran in about 3 seconds and returned one result:

runBlockingWithMdc(...) — runBlocking in non-suspending 'guzzle'
is reachable from suspend fun 'invoke'

Tracing the call chain:

ScheduledCommand.run()
    └─ runBlockingWithMdc(Dispatchers.IO) {
           launch { scheduler.start() }   ← coroutine starts here
       }
           └─ scheduler calls task { execute() }
                    ↑ task: suspend (TaskRequest) -> Unit
                        └─ execute()              [non-suspend]
                               └─ guzzler.guzzle()  [non-suspend]
                                      └─ runBlockingWithMdc(Dispatchers.IO) { ... }
                                               ↑ runBlocking inside a running coroutine!

The task type is:

typealias TaskFunction = (suspend (TaskRequest) -> Unit)

So the lambda { execute() } is a suspend function. Its invoke method is the SuspendMethod CodeQL found. Inside that suspend invoke, the chain goes through two non-suspend frames (execute → guzzle) and terminates in runBlockingWithMdc.

Whether this actually deadlocks depends on dispatcher thread availability — Dispatchers.IO has a large (but bounded) thread pool. But it’s the kind of thing that only fails under load, in production, on a bad day. Worth fixing.
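
One common way to remove this shape is to make the inner frames suspend and switch dispatchers with withContext, which suspends the coroutine instead of blocking its thread. The sketch below shows the general idea only and is not the actual change made in the codebase: the Guzzler class and fetch are hypothetical stand-ins, it assumes kotlinx-coroutines-core is available, and it leaves out the MDC context that the wrapper adds.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical stand-in for the class behind guzzler.guzzle().
class Guzzler {
    // Before (blocking): fun guzzle() = runBlockingWithMdc(Dispatchers.IO) { fetch() }
    // After: suspend all the way down; withContext suspends rather than blocks.
    suspend fun guzzle(): Int = withContext(Dispatchers.IO) { fetch() }

    // Placeholder for the real work.
    private suspend fun fetch(): Int = 42
}

// The middle frame becomes suspend too, so no bridge is needed inside the coroutine.
suspend fun execute(guzzler: Guzzler): Int = guzzler.guzzle()

fun main() {
    // runBlocking stays only at the outermost edge, on a plain non-coroutine thread.
    val result = runBlocking { execute(Guzzler()) }
    println(result)
}
```

The trade-off is that suspend spreads up the call chain, so every caller between the coroutine and the former runBlocking site has to change its signature.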


Running the Analysis

The run-analysis.sh script handles a full rebuild + query run:

#!/usr/bin/env bash
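set -euo pipefail

# Assumed layout: this script lives in codeql-analysis/ directly under the project root.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"
DB_PATH="$PROJECT_ROOT/codeql-db"
QUERY="$SCRIPT_DIR/RunBlockingLeakDetection.ql"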

codeql database create "$DB_PATH" \
  --language=java \
  --command="./gradlew build -x test" \
  --overwrite \
  --source-root="$PROJECT_ROOT"

codeql query run \
  --database="$DB_PATH" \
  --output="$SCRIPT_DIR/results.bqrs" \
  "$QUERY"

codeql bqrs decode \
  --format=text \
  "$SCRIPT_DIR/results.bqrs"

If the database is already current, skip straight to the query:

codeql query run \
  --database=codeql-db \
  --output=codeql-analysis/results.bqrs \
  codeql-analysis/RunBlockingLeakDetection.ql

codeql bqrs decode --format=text codeql-analysis/results.bqrs

What We Learned

The CodeQL Java library has first-class Kotlin support. isSuspend(), fromSource(), and the rest of the modifier predicates work correctly for Kotlin source code extracted through the Java/Kotlin extractor. The extractor sees the Kotlin source AST, not the JVM bytecode, so the Continuation parameter trick doesn’t apply — isSuspend() is the right way to detect suspend functions.

Wrapper functions matter. If we’d only looked for direct runBlocking calls, we’d have found nothing. Real codebases wrap runBlocking with project-specific context (MDC propagation in our case), and the query needs to know about those wrappers.

Transitive reachability is where the value is. The actual runBlocking call is buried two frames below the suspend function boundary. Neither a simple grep for runBlocking nor a single-level call check would have found this.

One finding is a good result. We scanned 767k lines of Kotlin/Java across 30+ modules and got one actionable issue. That’s noise-free enough to be useful: the false-positive rate was zero in this case, and a single hit suggests that the codebase generally avoids the pattern.