Transformer Circuits - Decomposing Small Language Models
Can we understand what’s going on in Large Language Models by dissecting small ones?
This post is available here.