wardc an hour ago
For Chinchilla / scaling laws, doesn't it seem a bit weird that they aren't using wall-clock? Like, FA4 backwards is bandwidth bound, not FLOPs bound. It seems like you'd care about dollars or time in relation to loss, not just clean-room FLOPs. MFUs are likely not equivalent across different model sizes / shapes.
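Rough sketch of what I mean, with made-up numbers: two configs with identical training FLOPs (using the usual C ≈ 6·N·D approximation) but different MFU come out with different wall-clock and dollar cost. The peak throughput, MFU values, GPU price, and the `train_cost` helper are all hypothetical placeholders, not measurements.

```python
def train_cost(n_params, n_tokens, peak_flops, mfu, dollars_per_gpu_hour, n_gpus=1):
    """Estimate training FLOPs, wall-clock seconds, and dollars.

    Assumes the common C ~= 6 * N * D approximation for dense transformers
    and a constant MFU for the whole run (both are simplifications).
    """
    flops = 6 * n_params * n_tokens
    seconds = flops / (peak_flops * mfu * n_gpus)
    dollars = seconds / 3600 * n_gpus * dollars_per_gpu_hour
    return flops, seconds, dollars


# Hypothetical hardware numbers, chosen only for illustration.
PEAK_FLOPS = 1e15        # assumed peak FLOP/s per GPU
PRICE_PER_GPU_HOUR = 2.0  # assumed dollars per GPU-hour

# Same FLOP budget, different shapes, different (assumed) MFU.
a_flops, a_sec, a_usd = train_cost(1_000_000_000, 20_000_000_000,
                                   PEAK_FLOPS, mfu=0.5,
                                   dollars_per_gpu_hour=PRICE_PER_GPU_HOUR)
b_flops, b_sec, b_usd = train_cost(2_000_000_000, 10_000_000_000,
                                   PEAK_FLOPS, mfu=0.3,
                                   dollars_per_gpu_hour=PRICE_PER_GPU_HOUR)

print(f"A: {a_flops:.2e} FLOPs, {a_sec / 3600:.1f} h, ${a_usd:.2f}")
print(f"B: {b_flops:.2e} FLOPs, {b_sec / 3600:.1f} h, ${b_usd:.2f}")
```

Same point on a FLOPs axis, but B takes longer and costs more purely because of the assumed MFU gap, which is what a loss-vs-FLOPs curve would hide.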
adamzwasserman 3 hours ago