Thread-Safe Architecture
This document describes the internal design of Stichotrope's thread-safe profiling implementation in v0.2.0+. It is intended for developers who contribute to the project or want to understand the implementation details.
Overview
Stichotrope combines thread-local storage, a global registry, and a minimal locking strategy to achieve thread-safe profiling with negligible overhead.
Design Principles
- Per-thread lock-free operations: Each thread's profiling data is independent, requiring no synchronization for track/block profiling
- Minimal lock contention: Locks are only acquired for cross-thread aggregation operations
- Automatic cleanup: Thread-local data is automatically cleaned up when threads terminate
- Zero overhead when disabled: Global enable/disable avoids profiling completely when not needed
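One way the "zero overhead when disabled" principle could be realized is to check the global switch when the decorator is applied and hand back the original function untouched. The sketch below is an illustration under that assumption, not Stichotrope's actual code: `_state`, `timings`, and this simplified `track` are hypothetical names.

```python
import functools
import time

_state = {"enabled": False}   # hypothetical global enable/disable switch
timings = []                  # hypothetical sink for measured durations (ns)


def track(func):
    """Wrap func with timing only if profiling is enabled at decoration time."""
    if not _state["enabled"]:
        return func           # no wrapper is created, so calls cost nothing extra

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter_ns()
        try:
            return func(*args, **kwargs)
        finally:
            timings.append(time.perf_counter_ns() - start)

    return wrapper


def add(a, b):
    return a + b


disabled_add = track(add)     # profiling off: the very same function object
_state["enabled"] = True
enabled_add = track(add)      # profiling on: a timing wrapper
result = enabled_add(1, 2)
```

Because the disabled path returns the original function object, there is no extra call frame or timer read on each invocation.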
Data Structures
Thread-Local Storage
# Per-thread structure (threading.local())
_tracks: Dict[int, Track]
Each thread maintains its own _tracks dictionary mapping track indices to Track objects. Track objects contain:
class Track:
    track_idx: int                   # Track identifier
    track_name: Optional[str]        # User-friendly track name
    enabled: bool                    # Whether track is enabled
    blocks: Dict[int, ProfileBlock]  # Blocks in this track
ProfileBlock objects store timing data:
class ProfileBlock:
    name: str           # Block name
    file: str           # Source file
    line: int           # Source line
    hit_count: int      # Number of times executed
    total_time_ns: int  # Total execution time
    min_time_ns: int    # Minimum execution time
    max_time_ns: int    # Maximum execution time
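To show how these fields fit together, here is a minimal sketch of folding one timing sample into a ProfileBlock. The `record` method and the zero defaults are assumptions made for illustration; the source only lists the fields.

```python
from dataclasses import dataclass


@dataclass
class ProfileBlock:
    """Timing statistics for one profiled block (fields as listed above)."""
    name: str
    file: str
    line: int
    hit_count: int = 0
    total_time_ns: int = 0
    min_time_ns: int = 0   # assumption: stays 0 until the first sample arrives
    max_time_ns: int = 0

    def record(self, elapsed_ns: int) -> None:
        """Hypothetical helper: fold one measurement into the statistics."""
        if self.hit_count == 0 or elapsed_ns < self.min_time_ns:
            self.min_time_ns = elapsed_ns
        if elapsed_ns > self.max_time_ns:
            self.max_time_ns = elapsed_ns
        self.hit_count += 1
        self.total_time_ns += elapsed_ns


block = ProfileBlock(name="load", file="app.py", line=10)
for sample_ns in (300, 100, 200):
    block.record(sample_ns)
```

Since each thread owns its own blocks, an update like this needs no lock.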
Global Registry
_all_thread_data: Dict[int, threading.local]
The global registry maps thread IDs to their threading.local() objects. This allows aggregating results across threads while maintaining per-thread independence.
Lock Strategy
Global Lock
_lock = threading.RLock()
A single RLock (reentrant lock) protects the global _all_thread_data registry and cross-thread aggregation operations.
Lock Acquisition Points
Locked operations (require global lock):
- __init__(): Register profiler instance in global registry
- get_all_thread_data(): Aggregate results from all threads
- Cross-thread cache validation
Lock-free operations (no lock needed):
- track() decorator application
- block() context manager entry/exit
- get_results() for current thread
- Per-thread profiling updates
Lock Contention Impact
- Minimal contention: Locks are held for microseconds (only during aggregation)
- No locks in hot path: Decorator and context manager execution don't acquire locks
- Per-thread efficiency: 99%+ of profiling operations are lock-free
Call-Site Caching
Stichotrope uses call-site caching to minimize overhead:
# Global cache maps (file, line, function_name) to ProfileBlock
_call_site_cache: Dict[Tuple[str, int, str], ProfileBlock]
When a decorated function is called:
- Inspect call site (file, line, function name)
- Check if this call site is already cached
- If cached: retrieve existing ProfileBlock
- If not cached: create new ProfileBlock and cache it
This reduces overhead for repeated calls to the same function.
Thread Lifecycle
Thread Creation
When a thread begins profiling:
- Thread-local storage is automatically created by threading.local()
- Global registry entry is created on first profiling operation
- _tracks dictionary is initialized for this thread
Thread Termination
When a thread terminates:
- Thread-local storage is automatically cleaned up (no manual cleanup needed)
- Global registry entry becomes inactive
- Data can still be accessed via get_all_thread_data() until profiler is destroyed
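The per-thread isolation that this lifecycle depends on is standard threading.local() behavior and can be observed directly. In the sketch below, `seen` is a hypothetical dictionary used only to carry observations out of the worker thread.

```python
import threading

_local = threading.local()
seen = {}                      # hypothetical: collects observations for the demo


def worker() -> None:
    # A new thread starts with no attributes on the shared threading.local().
    seen["fresh"] = not hasattr(_local, "_tracks")
    _local._tracks = {0: "track-0"}
    seen["worker"] = dict(_local._tracks)


_local._tracks = {}            # the main thread's own, separate value
t = threading.Thread(target=worker)
t.start()
t.join()

# The worker's assignment never touched the main thread's value.
seen["main"] = dict(_local._tracks)
```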
Implementation Reference
The complete implementation is in stichotrope/profiler.py. Key sections:
- Thread-local initialization: _setup_thread_local() method
- Global registry management: _register_thread() and _unregister_thread() methods
- Decorator/context manager: track() and block() methods
- Lock acquisition: get_all_thread_data() method uses lock for aggregation
- Call-site caching: _get_or_create_block() method implements cache logic
Testing
Thread-safety is verified through:
- Unit tests: tests/unit/test_thread_safety.py - Basic thread-safety tests
- Integration tests: tests/integration/test_threaded_profiling.py - Real-world scenarios
- Performance tests: tests/performance/test_thread_safety_overhead.py - Overhead measurement
- Stress tests: Multiple threads with high contention scenarios
See Also
- API Reference - Profiler - User-facing API documentation
- Performance Characteristics - Performance expectations and guarantees
- Getting Started - Multi-Threading - Usage example