Skip to content

Replace synchronized blocks in OkHttpCall with atomic CAS #4656

Open
Amir-yazdanmanesh wants to merge 4 commits intosquare:trunkfrom
Amir-yazdanmanesh:trunk
Open

Replace synchronized blocks in OkHttpCall with atomic CAS #4656
Amir-yazdanmanesh wants to merge 4 commits intosquare:trunkfrom
Amir-yazdanmanesh:trunk

Conversation

@Amir-yazdanmanesh
Copy link
Contributor

Summary

  • Replace synchronized(this) blocks in OkHttpCall with lock-free atomic operations to avoid pinning virtual threads (Project Loom)
  • Merge rawCall and creationFailure fields into a single AtomicReference<Object> rawCallState with state transitions null → Call or null → Throwable
    (write-once semantics)
  • Replace boolean executed with AtomicBoolean using compareAndSet for the execute-once guard

Closes #4297

Motivation

synchronized blocks pin virtual threads to platform threads, defeating the purpose of Project Loom. Since the guarded code protects single-instance creation with
minimal contention, atomic CAS is a better fit than ReentrantLock. See #4297.

Design

The key insight is merging rawCall + creationFailure into one AtomicReference. This avoids the race condition from #4301 where separate references could get out of
sync. Once the reference is set to a non-null value (either a Call or a Throwable), it never changes.

If two threads race to create the call, both call createRawCall() but only one CAS wins. The loser uses the winner's result. This is safe because createRawCall() is
side-effect-free.

Test plan

  • ./gradlew :retrofit:java-test:check — all tests pass across JDK 8, 10-24 (including concurrency tests in CallTest.java)
…e#4072) The testJdkN tasks copied classpath and testClassesDirs from the built-in test task, which broke the Gradle task dependency graph. Re-running a test task would not recompile main/test sources because no dependency on compilation tasks was established. Wire directly to sourceSets.test outputs instead, giving Gradle implicit task dependencies through file collection producers. This also eliminates the tasks.getByName("test") eager realization anti-pattern.
1. Imports: Added AtomicBoolean and AtomicReference; removed GuardedBy 2. Fields: Replaced rawCall + creationFailure + executed with AtomicBoolean executed and AtomicReference<Object> rawCallState (merged call/failure into one atomic ref) 3. getRawCall(): Rewrote with CAS — reads state atomically, creates call on miss, uses compareAndSet to publish. Loser of a race uses the winner's result 4. throwStateFailure(): New static helper to rethrow stored failures 5. request() / timeout(): Removed synchronized keyword (now lock-free via getRawCall()) 6. execute(): Uses executed.compareAndSet(false, true) instead of synchronized block 7. enqueue(): Same CAS for executed check; delegates to getRawCall() instead of inlining creation 8. isExecuted(): Returns executed.get() instead of synchronized read 9. cancel() / isCanceled(): Read rawCallState.get() instead of synchronized block
@JakeWharton
Copy link
Collaborator

The locks don't guard long-running work, only state changes. I don't think this is a big deal. Has anyone demonstrated an actual problem or are we just speculating?

@Amir-yazdanmanesh
Copy link
Contributor Author

The locks don't guard long-running work, only state changes. I don't think this is a big deal. Has anyone demonstrated an actual problem or are we just speculating?

You're right, the synchronized blocks here are short and don't guard long-running work, so pinning should be brief. I don't have benchmarks demonstrating an actual
throughput problem.

The change was motivated by the general guidance to avoid synchronized with virtual threads, but I understand that without evidence of a real issue, the added complexity
of CAS may not be justified.

Happy to close this if you'd prefer to keep the current approach until a concrete problem is reported.

@JakeWharton
Copy link
Collaborator

I need to review it more closely. The last attempt at this had subtle races because it used two atomic states. This one is different, but it still warrants exhaustively thinking through the state transitions in the context of concurrency.

@Amir-yazdanmanesh
Copy link
Contributor Author

I need to review it more closely. The last attempt at this had subtle races because it used two atomic states. This one is different, but it still warrants exhaustively thinking through the state transitions in the context of concurrency.

Happy to walk through the state transitions while you review:
The main thing I'd flag is the cancel window, if cancel() is called after executed CAS succeeds but before rawCallState is set, it reads null and silently does nothing. The original synchronized approach probably had the same gap, but worth confirming there wasn't a canceled boolean that got checked post-creation.
The two-thread race in getRawCall() should be fine as long as createRawCall() has no side effects, both threads may call it, but only the CAS winner's result survives. I believe that holds, but interceptors/event listeners are worth a glance.
Let me know if a test for the cancel-before-create scenario would help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants