Introduction to this article series
My goal for this series is to decipher how capabilities are built into Large Language Model (LLM) products. These capabilities are divided between the LLM layer and the agentic layer, which is comparable to a software layer; the intelligence (or thinking) still lives in the LLM layer.
This understanding helps in choosing an LLM that has the capabilities your application needs, and in knowing what is happening behind the scenes. A much deeper level of understanding is how LLMs process and predict through the attention mechanism, but that is out of scope here.
I don't have a predetermined sequence of topics yet; I am going where my curiosity takes me.
Why does prompting work?
This line of enquiry started for me with the question "why does prompting work?". Prompting is what makes LLMs so powerful and accessible, because we think in language. Without language we wouldn't be able to function so well as a society.
“Language serves not only to express thought but to make possible thoughts which could not exist without it.” - Bertrand Russell
The computational power of Deep Neural Networks (DNNs) to process data at scale and produce output in simplified form has been known for a while, but LLMs let us interact with them in words, just as we think to ourselves, making them a good thinking companion. But why does this work? What makes them accessible in this form? Like any powerful system, there are a few layers to answering this question completely.
LLMs are trained in two stages. The first is pre-training, where a neural network with the attention mechanism is trained on a very large body of text: roughly the text found on the general internet, forums, books, etc. This stage produces a base model (also called a foundation model).
In the second stage, called fine-tuning, the base model is trained on instruction-based text: high-quality, human-generated examples that pair a prompt or question with an answer. The base model thus learns these interactive, assistant-style tasks. The quantity of data is much smaller than in pre-training. Chatbots operate in a similar prompt-and-response scenario. This is just one kind of fine-tuning; other kinds teach LLMs to use the tools provided, or improve their reasoning.
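A concrete way to see what fine-tuning changes is the prompt format: an instruction-tuned model expects its input wrapped in a chat template, while a base model just continues raw text. The sketch below uses Gemma's turn markers as an illustration; treat the exact tokens as an assumption, since templates vary by model family and version.

```python
def build_gemma_chat_prompt(user_message: str) -> str:
    """Wrap a user message in a Gemma-style chat template (illustrative).

    A base (pre-trained) model sees only raw text and simply continues
    the string. An instruction-tuned model was fine-tuned on text
    wrapped in special turn markers like these, which is why it answers
    as an assistant instead of just completing the text.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = build_gemma_chat_prompt("What is the capital of France?")
print(prompt)
```

In practice you rarely build these strings by hand; tokenizer utilities apply the model's own template for you, but seeing the raw markers makes it clear what "instruction-based text" looks like during fine-tuning.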
The model cards for Gemma and Llama models on Hugging Face are labelled "it" (Instruct) or "pt" (Pre-Trained). From this label we can tell what kind of training the model went through.
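For instance, the suffix convention can be checked directly from a model id; a minimal sketch, assuming the "-it"/"-pt" naming that Gemma repos use (other model families may follow different conventions):

```python
def training_variant(model_id: str) -> str:
    """Classify a Hugging Face model id by its Gemma-style suffix."""
    name = model_id.rsplit("/", 1)[-1]  # drop the "org/" prefix
    if name.endswith("-it"):
        return "instruction-tuned"
    if name.endswith("-pt"):
        return "pre-trained only"
    return "unknown"

print(training_variant("google/gemma-3-1b-it"))  # instruction-tuned
print(training_variant("google/gemma-3-1b-pt"))  # pre-trained only
```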
In short, prompting works well only with models that are instruction-tuned ("it") and not so well with models that only underwent pre-training ("pt"). This code on Github compares the performance of Gemma 3 "pt" and "it" models on a few tasks to distinguish their capabilities clearly.
Although "it" model results are not perfect (probably because of its small size), it follows the instructions much better than the "pt" model. Pre-trained only models are not of much use directly and hence are not available easily. I use Ollama to run local models and it only hosts fine-tuned models by default. The examples provided use hugging face API to access the "pt" model hence you need a hugging face API key.
Other reference: this video by Andrej Karpathy is very helpful for understanding LLMs.