Extending AFM-4.5B to 64k Context Length
Via Nathan Lambert, an extremely fun write-up of the journey to a 64k context length for Arcee’s 4.5B foundation model. There are a lot of good takeaways, but this one particularly resonated with me:
Experimentation is Key: As in everything I write, I am unable to stress enough the importance of trying dumb things. If you try enough dumb things, eventually one of them will turn into a smart thing. Embrace the chaos.