What is Collama?

Collama is a free and open-source VSCode AI coding assistant powered by a self-hosted llama.cpp endpoint.

Under the hood

It uses llama.cpp, which runs inference of Meta's LLaMA model (and others) in pure C/C++.

Llama.cpp is designed to facilitate LLM inference with minimal setup and optimal performance on a wide range of hardware, both locally and in the cloud. It is a plain C/C++ implementation with no dependencies, optimized for Apple silicon and x86 architectures, and it supports integer quantization for faster inference and reduced memory use.

It also includes custom CUDA kernels for NVIDIA GPUs, supports Vulkan, SYCL, and OpenCL backends, and enables CPU+GPU hybrid inference for models larger than the total VRAM capacity.
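
In practice, the extension only needs the HTTP API that llama.cpp's built-in server exposes. The TypeScript sketch below shows roughly what a completion request against such an endpoint looks like; the server address and the request parameters are illustrative assumptions, and the exact fields the extension sends may differ.

    // Minimal sketch: send a completion request to a self-hosted llama.cpp server.
    // The address matches the example used in "Getting started" below; adjust it
    // to wherever your server instance is listening.
    async function complete(prompt: string): Promise<string> {
      const res = await fetch("http://192.168.0.101:8080/completion", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          prompt,            // text to continue
          n_predict: 128,    // cap on the number of generated tokens
          temperature: 0.2,  // low temperature keeps code suggestions focused
        }),
      });
      const data = await res.json();
      return data.content;   // the server returns the generated text in "content"
    }

    complete("// a TypeScript function that reverses a string\n")
      .then((text) => console.log(text));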

Features

It comes as a VSCode extension that can ease a developer's life, as it can:

  1. Chat
  2. Generate code
  3. Explain code

Getting started

  • Install the Open Copilot extension from the VSCode Marketplace.
  • Set your llama.cpp server's address (e.g. http://192.168.0.101:8080) in the Cody > llama Server Endpoint setting; see the sketch after this list.
  • Now enjoy coding with your locally deployed models.
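
Concretely, the endpoint ends up in your VSCode settings. The settings.json sketch below is only an assumption of what that entry looks like; the setting key shown is inferred from the Cody > llama Server Endpoint label above, so use the extension's settings UI to find the real name.

    {
      // Hypothetical setting key, shown for illustration only; configure it
      // through the extension's "llama Server Endpoint" setting in VSCode.
      "cody.llamaServerEndpoint": "http://192.168.0.101:8080"
    }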

Platforms

  • Linux
  • macOS
  • Windows

License

Apache-2.0 License

Resources & Downloads

  • GitHub - iohub/collama: VSCode AI coding assistant powered by a self-hosted llama.cpp endpoint
  • GitHub - ggerganov/llama.cpp: LLM inference in C/C++