Getting to Know llama.cpp!
Reliability of This Article
by Our Founder/CEO&CTO Hiroyuki Chishiro
- He has 12 years of research experience in real-time systems.
- He teaches OS (Linux kernel) in English at the University of Tokyo.
- From September 2012 to August 2013, he was a visiting researcher at the Department of Computer Science, the University of North Carolina at Chapel Hill (UNC), Chapel Hill, North Carolina, United States. He has been involved in research and development of real-time Linux in C language.
- He has more than 15 years of programming experience across many languages: C/C++, Python, Solidity/Vyper, Java, Ruby, Go, Rust, D, HTML/CSS/JS/PHP, MATLAB, Verse (UEFN), and Assembly (x64, ARM).
- While a faculty member at the University of Tokyo, he developed the "Extension of LLVM Compiler" in C++ language and his own real-time OS "Mcube Kernel" in C language, which he published as open source on GitHub.
- Since January 2020, he has been CTO of Guarantee Happiness LLC, Chapel Hill, North Carolina, United States, in charge of e-commerce site development and web/social network marketing. Since June 2022, he has also been CEO & CTO of Japanese Tar Heel, Inc. in Chapel Hill, North Carolina, United States.
- We have been engaged in disseminating useful information on AI and Crypto (Web3).
- We have written more than 20 articles on AI, including AI chatbots such as ChatGPT, Auto-GPT, and Gemini (formerly Bard). He has experience in contract work as a prompt engineer, manager, and quality assurance (QA) specialist for several companies in San Francisco, United States (Silicon Valley in the broad sense).
- We have written more than 40 articles on cryptocurrency (including smart contract programming). He has experience as an outsourced translator of English articles on cryptocurrency into Japanese for a company in London, England.
You can learn from us.
If you would like to know the recommended job sites for AI Engineers, please click the following.
If you would like to know the recommended job sites for Prompt Engineers, please click the following.
What is llama.cpp?
llama.cpp is a framework, written in C/C++, for running inference on Large Language Models (LLMs) such as LLaMA.
The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a variety of hardware, both locally and in the cloud.
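As a rough sketch of that "minimal setup," the typical workflow is to clone the repository, build it with CMake, and run the bundled CLI on a GGUF model file. The model path below is a placeholder you must supply yourself; it is not part of the repository.

```shell
# Clone and build llama.cpp (assumes git, cmake, and a C/C++ toolchain are installed)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j "$(nproc)"

# Run inference; ./models/model.gguf is a placeholder for a GGUF model you provide
./build/bin/llama-cli -m ./models/model.gguf -p "Hello, my name is" -n 32
```

GPU backends (CUDA, Metal, Vulkan, etc.) are enabled by passing the corresponding CMake options at configure time.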
For reference, LLaMA is a family of open large language models developed by Meta (formerly Facebook); its reference implementation is written in Python.
$ git clone https://github.com/meta-llama/llama
The license is LLAMA 2 COMMUNITY LICENSE AGREEMENT.
If you want to know more about AI chatbots with large language models including LLaMA, please click the following.
llama.cpp is available as open source on GitHub.
$ git clone https://github.com/ggerganov/llama.cpp
The open source license is MIT License.
The features of llama.cpp are as follows.
- Implemented in plain C/C++ language with no dependencies
- Support mainly for Apple silicon: optimized by ARM NEON, Accelerate, and Metal frameworks
- Support for AVX, AVX2, and AVX512 on x86 architecture
- 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory usage
- Custom CUDA kernels for running LLMs on NVIDIA GPUs (AMD GPU support via HIP)
- Vulkan and SYCL backend support
- Hybrid CPU+GPU inference to partially accelerate models larger than the total VRAM capacity
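To see why low-bit quantization matters, here is a back-of-the-envelope estimate of the weight storage for a 7-billion-parameter model at different bit widths. This is a sketch only: real GGUF files are somewhat larger because quantized formats also store per-block scales and metadata.

```shell
# Approximate weight storage for 7 billion parameters at a given bit width
params=7000000000
for bits in 16 8 4 2; do
  bytes=$(( params * bits / 8 ))
  # Integer GiB via shell arithmetic (1 GiB = 1073741824 bytes); 4-bit gives about 3 GiB
  echo "${bits}-bit: $(( bytes / 1073741824 )) GiB"
done
```

Dropping from 16-bit to 4-bit weights cuts the storage to a quarter, which is what makes CPU-only and small-GPU inference practical.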
Large Language Models Supported by llama.cpp
As of August 2024, the following large language models are supported by llama.cpp.
- LLaMA
- LLaMA 2
- LLaMA 3
- Mistral 7B
- Mixtral MoE
- DBRX
- Falcon
- Chinese LLaMA / Alpaca and Chinese LLaMA-2 / Alpaca-2
- Vigogne (French)
- BERT
- Koala
- Baichuan 1 & 2 and derivations
- Aquila 1 & 2
- Starcoder models
- Refact
- MPT
- Bloom
- Yi models
- StableLM models
- Deepseek models
- Qwen models
- PLaMo-13B
- Phi models
- GPT-2
- Orion 14B
- InternLM2
- CodeShell
- Gemma
- Mamba
- Grok-1
- Xverse
- Command-R models
- SEA-LION
- GritLM-7B and GritLM-8x7B
- OLMo
- GPT-NeoX and Pythia
- ChatGLM3-6B and ChatGLM4-9B
Projects that Run llama.cpp in Other Languages and Environments
Projects are underway to run llama.cpp in other languages and environments.
Typical projects are listed below.
- Python: abetlen/llama-cpp-python
- Go: go-skynet/go-llama.cpp
- Node.js: withcatai/node-llama-cpp
NOTE: The list of projects can be found on the official llama.cpp page.
Introductory Videos of llama.cpp
These are introductory videos of llama.cpp.
Summary
We introduced llama.cpp, a framework that enables inference of large language models such as LLaMA in C/C++.
llama.cpp supports many large language models.
If you would like to know the recommended job sites for AI Engineers, please click the following.
If you would like to know the recommended job sites for Prompt Engineers, please click the following.