Chain of Draft Prompting: A Simple Way to Make LLMs Think Faster

Mar 30, 2025

Large Language Models (LLMs) are great at solving complex problems. One common way to guide them is called “Chain-of-Thought” (CoT) prompting. This method helps the model think step by step, writing full explanations at each stage. While it often gives accurate answers, it can be slow and use a lot of tokens. That means higher cost and longer wait times — especially in real-world applications where speed matters.

A newer method, called Chain of Draft (CoD) and proposed in this paper, addresses the problem by having the model write its thoughts as short, simple drafts: more like notes than full sentences. This blog explains what CoD is, how it works, and when you should use it instead of other prompting styles.

What Is Chain of Draft?

Think about how people solve problems on paper. We usually don't write full sentences; we just jot down quick notes or numbers to help us think. For example, when solving a math word problem, you might write:

20 - x = 12
x = 8

You wouldn’t write out a paragraph explaining each step. That’s exactly how Chain of Draft works. It tells the model to solve the problem in small steps, but each step should be short and focused — just a few words or an equation. Then, at the end, it gives the final answer.

This makes the output shorter and faster, while still letting the model reason step by step. In tests, CoD has achieved nearly the same accuracy as CoT, but with a small fraction of the token usage — sometimes only 10%. That’s a big deal if you’re using LLMs in apps that handle lots of queries or need fast replies.

How CoD Is Different from CoT and Other Techniques

CoT encourages detailed thinking and full explanations. It’s helpful when clarity is important, but it can get too wordy. CoD, on the other hand, keeps things short. It still breaks problems into steps but skips the long text.

Another technique, called Tree-of-Thought (ToT), explores multiple possible solutions like branches in a tree. It’s useful for really hard or open-ended problems but takes a lot more time and resources.

CoD sits in the middle. It’s faster and simpler than ToT, more efficient than CoT, and far more reliable than just asking the model for a direct answer without any reasoning.

An Example: Lollipops and Math

Let’s take a simple example.

Question: Jason had 20 lollipops. He gave some to Denny. Now he has 12. How many did he give?

With a standard direct prompt, the model might just say “8” without showing how it got there.

With Chain-of-Thought, the model might explain:
“Jason started with 20. After giving some away, he has 12. So he gave away 20 - 12 = 8.”

With Chain of Draft, the model would write:
20 - x = 12; x = 8; #### 8

Same answer, but fewer words. The reasoning is clear, and it takes less space and time.
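To make the "fewer words" point concrete, here is a small sketch that counts tokens for the two styles of answer above. It uses the tiktoken library as a stand-in for whatever tokenizer your model actually uses; the texts and counts are illustrative, not figures from the paper.

```python
# Rough comparison of output length for CoT vs. CoD on the lollipop question.
# Assumes the tiktoken package is installed; counts are illustrative only.
import tiktoken

cot_answer = (
    "Jason started with 20 lollipops. After giving some away, he has 12. "
    "So he gave away 20 - 12 = 8 lollipops. The answer is 8."
)
cod_answer = "20 - x = 12; x = 8; #### 8"

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by many chat models

for name, text in [("Chain-of-Thought", cot_answer), ("Chain of Draft", cod_answer)]:
    print(f"{name}: {len(enc.encode(text))} tokens")
```

The drafted version carries the same reasoning in a fraction of the tokens, which is exactly the saving CoD is after.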

How to Use Chain of Draft

To use CoD in your own prompts, you just need to tell the model what to do. Start your prompt by saying something like:

“Think step by step. Keep each step short — no more than five words. Give the final answer after ####.”

It also helps to give a couple of examples so the model understands the style. After that, it usually follows the pattern well.
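Putting that together, here is a minimal sketch of a CoD prompt in Python. It assumes the OpenAI Python SDK and an illustrative model name; the same system prompt and few-shot pattern work with any chat-style LLM.

```python
# Minimal Chain-of-Draft prompt sketch, assuming the OpenAI Python SDK
# (pip install openai). The model name is illustrative, not prescriptive.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM = (
    "Think step by step, but keep each step to a short draft of at most five words. "
    "Return the final answer after ####."
)

# One few-shot example so the model picks up the drafting style.
EXAMPLE_Q = "Jason had 20 lollipops. He gave some to Denny. Now he has 12. How many did he give?"
EXAMPLE_A = "20 - x = 12; x = 8; #### 8"

def chain_of_draft(question: str, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": EXAMPLE_Q},
            {"role": "assistant", "content": EXAMPLE_A},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(chain_of_draft("A train travels 60 km in 1.5 hours. What is its average speed?"))
```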

You can use CoD for math problems, logic puzzles, planning, even programming. For example, if a line of code causes an error, the model can give you a short chain of drafts like:
Loop too long; Fix: use len(list); #### Adjust loop

This is much faster than getting a full paragraph and is often easier to follow — especially for people who just want the solution.
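Because the final answer always follows the #### marker, it is also easy to pull out programmatically. A small helper (the function name is my own, not from the paper) might look like this:

```python
def split_draft_and_answer(output: str) -> tuple[str, str]:
    """Split a Chain-of-Draft response into (reasoning drafts, final answer)."""
    drafts, _, answer = output.partition("####")
    return drafts.strip(), answer.strip()

drafts, answer = split_draft_and_answer("20 - x = 12; x = 8; #### 8")
print(drafts)  # 20 - x = 12; x = 8;
print(answer)  # 8
```

This kind of separation is handy in apps: show or log the drafts for debugging, and pass only the answer on to the user.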

When Is CoD Useful?

Chain of Draft works best when your task needs clear thinking, but not a long explanation. It’s great for:

  • Math problems
  • Logic and reasoning tasks
  • Programming help
  • Answering multi-step questions
  • Planning or outlining longer content

It’s not ideal for topics that need detailed discussion, like history essays or explanations for beginners. In those cases, Chain-of-Thought may be better because it shows all the details.

Final Thoughts

Chain of Draft is a helpful tool if you’re using LLMs and want to save time and tokens without losing accuracy. It encourages the model to “think like a human” by taking notes, not writing essays. This makes it perfect for apps that need to answer questions fast, like tutoring tools, chatbots, or even automated agents.

Once you try it, you’ll see that many problems can be solved just as well — or better — without the extra words. It’s efficient, simple, and surprisingly powerful. Give it a shot the next time you’re crafting prompts, and you might be surprised at how well your model performs when it’s asked to “think in drafts.”

Written by Isaac Kargar

AI Researcher | Ph.D. candidate at the Intelligent Robotics Group at Aalto University | https://kargarisaac.github.io/
