Support Cohere Command-A (Cohere2ForCausalLM arch)

It would be great to support this new model! https://cohere.com/blog/command-a

They use a fairly unusual architecture: some layers use sliding window attention, while others use global attention with no position embeddings. I read through the documentation on how to add a model, but I'm still a little lost on how to do this myself.
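
To make the request more concrete, here is a rough Python sketch of how I understand the layer layout. It is not taken from the model's code. The names `LayerPlan`, `plan_layers`, and `causal_mask` are hypothetical, and the interleave ratio (`pattern`) and window size are placeholder assumptions that would need to be read from the model's actual config.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LayerPlan:
    index: int
    attention: str   # "sliding_window" or "global"
    use_rope: bool   # whether position embeddings are applied in this layer


def plan_layers(n_layers: int, pattern: int = 4) -> List[LayerPlan]:
    """Assign an attention type to each layer.

    ASSUMPTION: every `pattern`-th layer is a global layer without position
    embeddings, and all other layers use sliding window attention with them.
    The real ratio should come from the model config.
    """
    plans = []
    for i in range(n_layers):
        is_global = (i + 1) % pattern == 0
        plans.append(
            LayerPlan(
                index=i,
                attention="global" if is_global else "sliding_window",
                use_rope=not is_global,
            )
        )
    return plans


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """Return mask[q][k], True when query position q may attend to key k.

    With window=None this is a plain causal (global) mask. With a window,
    each query only sees the most recent `window` positions, itself included.
    """
    return [
        [k <= q and (window is None or q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


if __name__ == "__main__":
    for plan in plan_layers(8):
        print(plan)
```

If this reading is right, the main work would be choosing the mask and skipping the position embedding step per layer, rather than using one setting for the whole model.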