Decoupling Compute and Memory for Async GPUs
Posted by yiyingzhang | 2 hours ago | 1 comment
bobbyzhu2008 2 hours ago
67% less kernel code is the more interesting number here — Hopper's async capabilities have been underutilized largely because the programming model is painful. Curious how it handles cases where compute and memory phases aren't cleanly separable.