Getting to Know llama.cpp!
Reliability of This Article
by Our Founder/CEO&CTO Hiroyuki Chishiro
- He has 12 years of research experience in real-time systems.
- He teaches OS (Linux kernel) in English at the University of Tokyo.
- From September 2012 to August 2013, he was a visiting researcher at the Department of Computer Science, the University of North Carolina at Chapel Hill (UNC), Chapel Hill, North Carolina, United States. He has been involved in research and development of real-time Linux in C language.
- He has more than 15 years of programming experience across many languages: C/C++, Python, Solidity/Vyper, Java, Ruby, Go, Rust, D, HTML/CSS/JS/PHP, MATLAB, Verse (UEFN), and Assembly (x64, ARM).
- While a faculty member at the University of Tokyo, he developed the "Extension of LLVM Compiler" in C++ language and his own real-time OS "Mcube Kernel" in C language, which he published as open source on GitHub.
- Since January 2020, he has been CTO of Guarantee Happiness LLC, Chapel Hill, North Carolina, United States, in charge of e-commerce site development and web/social network marketing. Since June 2022, he has also been CEO & CTO of Japanese Tar Heel, Inc. in Chapel Hill, North Carolina, United States.
- We have been engaged in disseminating useful information on AI and Crypto (Web3).
- We have written more than 20 articles on AI, including AI chatbots such as ChatGPT, Auto-GPT, and Gemini (formerly Bard). He has experience in contract work as a prompt engineer, manager, and quality assurance (QA) specialist for several companies in San Francisco, United States (Silicon Valley in the broad sense).
- We have written more than 40 articles on cryptocurrency (including smart contract programming). He has experience as an outsourced translator of English articles on cryptocurrency into Japanese for a company in London, England.
You can learn from us.
If you would like to know the recommended job sites for AI Engineers, please click the following.
If you would like to know the recommended job sites for Prompt Engineers, please click the following.
What is llama.cpp?
llama.cpp is a framework, written in C/C++, for running inference on Large Language Models (LLMs) such as LLaMA.
The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a variety of hardware, both locally and in the cloud.
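As a rough sketch of that "minimal setup," the typical workflow is to clone the repository, build it with CMake, and run the bundled CLI on a GGUF model file. The model path below is a placeholder you must supply yourself; it is not part of the repository.

```shell
# Clone and build llama.cpp (assumes git, cmake, and a C/C++ toolchain are installed)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j "$(nproc)"

# Run inference; ./models/model.gguf is a placeholder for a GGUF model you provide
./build/bin/llama-cli -m ./models/model.gguf -p "Hello, my name is" -n 32
```

GPU backends (CUDA, Metal, Vulkan, etc.) are enabled by passing the corresponding CMake options at configure time.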
For reference, LLaMA is a family of open large language models developed by Meta (formerly Facebook); its reference implementation is written in Python.
$ git clone https://github.com/meta-llama/llama
The license is LLAMA 2 COMMUNITY LICENSE AGREEMENT.
If you want to know more about AI chatbots with large language models including LLaMA, please click the following.
llama.cpp is available as open source on GitHub.
$ git clone https://github.com/ggerganov/llama.cpp
The open source license is MIT License.
The features of llama.cpp are as follows.
- Implemented in plain C/C++ language with no dependencies
- Support mainly for Apple silicon: optimized by ARM NEON, Accelerate, and Metal frameworks
- Support for AVX, AVX2, and AVX512 on x86 architecture
- 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory usage
- Custom CUDA kernels for running LLMs on NVIDIA GPUs (AMD GPU support via HIP)
- Vulkan and SYCL backend support
- Hybrid CPU+GPU inference to partially accelerate models larger than the total VRAM capacity
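To see why low-bit quantization matters, here is a back-of-the-envelope estimate of the weight storage for a 7-billion-parameter model at different bit widths. This is a sketch only: real GGUF files are somewhat larger because quantized formats also store per-block scales and metadata.

```shell
# Approximate weight storage for 7 billion parameters at a given bit width
params=7000000000
for bits in 16 8 4 2; do
  bytes=$(( params * bits / 8 ))
  # Integer GiB via shell arithmetic (1 GiB = 1073741824 bytes); 4-bit gives about 3 GiB
  echo "${bits}-bit: $(( bytes / 1073741824 )) GiB"
done
```

Dropping from 16-bit to 4-bit weights cuts the storage to a quarter, which is what makes CPU-only and small-GPU inference practical.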
Large Language Models Supported by llama.cpp
As of August 2024, the following large language models are supported by llama.cpp.
- LLaMA
- LLaMA 2
- LLaMA 3
- Mistral 7B
- Mixtral MoE
- DBRX
- Falcon
- Chinese LLaMA / Alpaca and Chinese LLaMA-2 / Alpaca-2
- Vigogne (French)
- BERT
- Koala
- Baichuan 1 & 2 and derivations
- Aquila 1 & 2
- Starcoder models
- Refact
- MPT
- Bloom
- Yi models
- StableLM models
- Deepseek models
- Qwen models
- PLaMo-13B
- Phi models
- GPT-2
- Orion 14B
- InternLM2
- CodeShell
- Gemma
- Mamba
- Grok-1
- Xverse
- Command-R models
- SEA-LION
- GritLM-7B and GritLM-8x7B
- OLMo
- GPT-NeoX and Pythia
- ChatGLM3-6B and ChatGLM4-9B
Projects that Run llama.cpp in Other Languages and Environments
Projects are underway to run llama.cpp in other languages and environments.
Typical projects are listed below.
- Python: abetlen/llama-cpp-python
- Go: go-skynet/go-llama.cpp
- Node.js: withcatai/node-llama-cpp
NOTE: The list of projects can be found on the official llama.cpp page.
Introductory Videos of llama.cpp
These are introductory videos of llama.cpp.
Summary
We introduced llama.cpp, a framework that enables inference of large language models such as LLaMA in C/C++.
llama.cpp supports many large language models.
If you would like to know the recommended job sites for AI Engineers, please click the following.
If you would like to know the recommended job sites for Prompt Engineers, please click the following.