Rendering Optimization Strategies
A summary of all discussed optimization techniques for achieving high-performance rendering (e.g., 144fps with large design documents).
Transform & Geometry
-
Transform Cache
- Store
local_transform
and derivedworld_transform
. - Use dirty flags and top-down updates.
- Store
-
Geometry Cache
- Cache
local_bounds
,world_bounds
. - Used for culling, layout, and hit-testing.
- Cache
-
Flat Scene Graph + Parent Pointers
- Flat arena with parent/children relationships.
- Enables O(1) access and traversal.
Rendering Pipeline
-
GPU Acceleration (Skia Backend::GL/Vulkan)
- Use hardware compositing, filters, transforms.
-
Scene-Level Picture Caching
- Use
SkPicture
to record full-scene vector draw ops. - Serves as the always-up-to-date canonical snapshot.
- Resolution-independent; ideal for rerendering or tile regeneration.
- Use
-
Tile-Based Raster Cache (Hybrid Rendering)
- Render the full viewport, take snapshot. debounced (after no more changes. e.g. 150ms)
- Divide the snapshot into fixed-size tiles (e.g., 512×512).
- When new area discovered, render the cached, non-overlapping parts with tile cache. only render newly discovered area.
- Repeat step 1.
- Optional padding per tile to account for effects (blur, shadows).
-
Dynamic Mode Switching (Picture vs Tile)
- Render from
SkPicture
directly during normal zoom or active edits. - Fallback to raster tiles for zoomed-out or complex views.
- Tile invalidation/redraw is driven by zoom level, camera transform, or frame budget.
- Render from
-
Dirty & Re-Cache Strategy
- Nodes marked dirty will trigger re-recording of affected picture regions or tiles.
- Use change tracking to only re-record minimum needed areas.
- Recording large subtrees is expensive—optimize granularity based on tree structure.
-
Scene Cache Config / Strategy
-
Defines how scene caching is organized.
-
Properties include:
-
depth
:0
→ Entire scene is one cache.1
→ Cache per top-level container.n
→ Cache at depthn
, chunking deeper layers.
-
mode
:AlwaysPicture
,Hybrid
,AlwaysTile
-
tile_size
,tile_padding
-
zoom_threshold_for_tiles
-
frame_budget_threshold_ms
-
use_bbh
,enable_lod
, etc.
-
-
Cache accessors like
get_picture_cache_by_id()
support scoped re-rendering.
-
-
Will-Change Optimization
-
Nodes marked with "will-change" are expected to become dirty soon.
-
Examples:
- Image node waiting on async src resolution
- Text node waiting on font availability
-
Tree holders of such nodes are chunked for localized re-recording.
-
Prevents re-recording full subtrees—minimizes recording cost.
-
-
Flattened Render Command List
-
Scene is compiled into a flat list of
RenderCommand
structs with resolved:- Transform
- Clip bounds
- Opacity
- Z-order
-
Enables non-recursive rendering and independent layer recording.
-
Required for tiling at arbitrary depths and for caching subtrees.
Example:
Logical Tree:
Frame
└── Group
├── Rect1
├── Rect2
└── Rect3
Flattened:
[
RenderCommand { node_id: Rect1, transform: ..., clip: ..., z: ... },
RenderCommand { node_id: Rect2, transform: ..., clip: ..., z: ... },
RenderCommand { node_id: Rect3, transform: ..., clip: ..., z: ... },
]- Each command can be grouped and recorded separately into its own
SkPicture
. - Nesting is preserved logically via sort order, but rendering is flat.
- This model is essential for dynamic caching, parallel planning, and GPU-aware scheduling.
-
-
Dirty-Region Culling
- Use camera’s
visible_rect
to cullworld_bounds
. - Optional: accelerate with quadtree or BVH.
- Use camera’s
-
Minimize Canvas State Changes
- Reuse transforms and paints.
- Precompute common values like DPI × Zoom × ViewMatrix.
-
Text & Path Caching
- Cache laid-out paragraphs and SVG paths keyed by node ID.
- Each entry stores a hash of the text/style or path string and the current font repository generation.
- Caches are invalidated when fonts or the original data change.
- Hit testing reuses these paths for
path.contains
checks.
-
Render Pass Flattening
- Group nodes with same blend/composite states.
- Sort draw calls for fewer GPU flushes.
Image Optimization
-
LoD / Mipmapped Image Swapping
- Use lower-res versions of images at low zoom.
- Prevents high GPU bandwidth use at low visibility.
-
ImageRepository with Transform-Aware Access
- Pick image resolution based on projected screen size.
- TODO: currently we select mipmap levels solely by the size of the drawing rectangle. This is a temporary strategy until a proper cache invalidation mechanism based on zoom is introduced.
Text & Glyph Optimization
-
Glyph Cache (Atlas or Paragraph Caching)
- Cache rasterized or vector glyphs used across the document.
- Prevents redundant layout or rendering of text.
- Essential for high-DPI or frequently zoomed views.
Engine-Level
-
Precomputed World Transforms
- Avoid recalculating transforms per draw call.
- Essential for random-access rendering.
-
Flat Table Architecture
- All node data (transforms, bounds, styles) stored in flat maps.
- Enables fast diffing, syncing, and concurrent access.
-
Callback-Based Traversal with Fn/FnMut
- Owner controls child behavior via inlined, zero-cost closures.
-
Scene Planner & Scheduler
- A dynamic system that builds the flat render list per frame.
- Reacts to scene changes, memory pressure, or frame budget changes.
- Drives the decision to re-record, cache, evict, or downgrade fidelity.
Optional Advanced
-
Multithreaded Scene Update
- Parallelize transform/bounds resolution.
-
CRDT-Ready Data Stores
- Flat table model enables future collaboration support.
-
BVH or Quadtree Spatial Index
- Build dynamic index from
world_bounds
for fast spatial queries.
- Build dynamic index from
With Compromises
Practical, UX-safe tradeoffs that simplify implementation and improve performance, especially under load. These techniques sacrifice exactness for speed — but in ways users won’t notice.
-
Quantize Camera Transform
Instead of using fully continuous float precision for the camera position and zoom, round them to the nearest N units (e.g., 0.1 for position, 0.01 for zoom):
This list is designed to help evolve a renderer from minimal single-threaded mode to scalable, GPU-friendly real-time performance.