implement Power Law sampling #17927
Conversation
Nevermind, sorry, I think we want to do a little more testing. I'm going to mark this as draft again temporarily.
pnb left a comment:
This looks very interesting! I wish the original had compared it to XTC, since the goals seem highly similar.
As an aside, I am curious if there is some way to make it work without selecting a token (i.e., only steps 1-3). I see why token selection is necessary, given the need to save the original probability to the history for the adaptive adjustment part. But, for example, maybe it would suffice instead to save the original probability of the highest-probability token after transforming, regardless of which one is eventually selected by a downstream sampler.
src/llama-sampling.cpp
```cpp
// fixed power law transform parameters (from original implementation)
const float distribution_width = 0.2f;
const float peak_logit_value = 3.0f;
```
Should these parameters be configurable like in the original implementation? There is probably a tradeoff with feature creep, having too many options for users to control, but some of these seem potentially important (especially distribution_width). Also, I noticed peak_logit_value is outside the range suggested in the original implementation; is that intentional?
The original author and I are discussing the parameters over the next few days. I agree that the current implementation is probably not ideal, which is why I marked it back as draft.
I will post a comment in the main thread with an update once we've got it more figured out. Thank you!
my git skills are lacking
force-pushed from 778a00e to 1c58e9a
last commit with debug logging!
@pnb I've basically redone the entire PR since your last comment, and updated the top comment with a much clearer explanation. Let me know if I can clear anything up.
This is a fantastic sampler for creative tasks, and truly a game changer in this regard. It's difficult to understand how effective it is until you try it for yourself. I've found a target value of 0.4-0.7 to be excellent for creative tasks, with higher values for more deterministic tasks. It manages to do what XTC does without many of that sampler's pitfalls: a self-correcting, dynamic algorithm keeps it in check much better than a fixed random chance of applying top truncation. A lot of so-called 'AI slop' is heavily over-represented in the top tokens, so this sampler really helps a model shine in the still-strong and coherent mid range, while not drifting too far from established probabilities and the model's natural distribution (unlike adjusting temperature). I hope to see this merged!
Also chiming in here as an early tester of this sampler: it's really refreshing for creative tasks, like Geechan mentioned. It breaks the streak of high-confidence token selections that leads to the familiar patterns you get used to, while not impacting coherence. Overall, excited to see this merged and tested more widely.
How does this sampler handle cases where high probability is justified? For example, punctuation at the end of a sentence. Or what about tokens in the middle of a word? Say there is a text about a man and his tractor, and the prompt ends partway through a word. And if these high-probability tokens won't be discarded, then how will the sampler differentiate between useful high-probability tokens and high-probability slop or repetition? This is all theoretical and maybe it doesn't matter in practice, but I'm just interested in whether the above cases are somehow accounted for.
Very interesting sampler, thank you for the implementation! I like the effect so far; it stays on topic even on long results. One question: if this sampler must be the last in the chain, why include it alongside the other samplers at all? For now it looks like a user can make a mistake by putting it elsewhere, which is probably not what we want. Maybe it's worth adding it into the chain at the end automatically, where the dist sampler would normally go.
The idea is that you're supposed to configure your truncation samplers (like top-k and/or min-p) in such a way that they remove garbage tokens from the candidate pool before it even hits Power Law. It's the same as with temperature: if you're using a high temperature, you should cut out the nonsense before you apply it. (@z80maniac)
This is good feedback, thank you. I will consider how to change it so that the Power Law sampler is guaranteed to always be at the end of the chain when it's active. (@MaggotHATE)
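For illustration, that intended ordering with llama.cpp's sampler-chain API might look like the sketch below. `llama_sampler_init_power_law` and its `(target, decay, seed)` signature are hypothetical stand-ins for whatever this PR actually exposes; the other calls are the existing API.

```cpp
#include "llama.h"

// Sketch: light truncation first, Power Law last (it selects the token).
// llama_sampler_init_power_law(...) is a hypothetical stand-in for the
// PR's actual constructor; everything else is the current llama.cpp API.
static llama_sampler * make_chain() {
    llama_sampler * chain = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(chain, llama_sampler_init_top_k(64));       // light truncation
    llama_sampler_chain_add(chain, llama_sampler_init_min_p(0.05f, 1)); // cut very unlikely tokens
    llama_sampler_chain_add(chain, llama_sampler_init_power_law(0.5f, 0.9f, LLAMA_DEFAULT_SEED)); // must be last
    return chain;
}
```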
I took another look through the code and I think the choice of what is a tunable parameter vs. what is a fixed default is great. The knobs to tune make sense, and I tried playing with the other parameters (that are now constants) without seeing much obvious effect in the text. Overall I would say the effect of this sampler is a little subtle compared to XTC, but it is noticeable with a low target like 0.05, where lots of excessively popular adverbs disappear from the results.
This is addressed now.
Gentle poke to @ggerganov - are there any more changes needed here? What are your thoughts?
This PR implements a new sampler that reshapes token probability distributions to favor tokens near a configurable target probability, rather than selecting from the highest-probability candidates. The technique is called Power Law sampling and it was originally described and implemented by @MrJackSpade here.
How it works
Traditional samplers ask: "which of the most likely tokens should I pick?"
Power Law sampling asks: "which tokens are closest to my target probability?"
This allows controlled exploration of the probability space. Setting a lower target (e.g., 0.45-0.65) favors "interesting but plausible" tokens from the mid-range of the distribution, while higher targets (e.g., 0.85-0.95) behave more like standard samplers. The sampler evolved from ideas similar to Mirostat, but targets probability directly rather than perplexity, for more intuitive control.
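A rough sketch of the reshaping step, for intuition only: the `candidate` struct and the exact falloff curve are invented here (the PR's real transform may differ), with the two constants taken from the review discussion above.

```cpp
#include <vector>

struct candidate {
    int   id;    // token id
    float p;     // original probability
    float logit; // transformed logit
};

// Assign each candidate a new logit based on how far its original
// probability sits from the target; a softmax over these logits then
// yields the reshaped distribution. Peak when p == target.
static void power_law_transform(std::vector<candidate> & cands, float target) {
    const float distribution_width = 0.2f; // controls how sharply logits fall off
    const float peak_logit_value   = 3.0f; // logit assigned at the target itself
    for (auto & c : cands) {
        const float d = (c.p - target) / distribution_width; // normalized distance
        c.logit = peak_logit_value - 0.5f * d * d;           // peaked, smooth falloff
    }
}
```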
Adaptive target tracking
The sampler maintains a weighted history of the original probabilities of selected tokens. If recent selections have been higher-probability than the target, it compensates by temporarily lowering the effective target, and vice versa. This keeps the average selection probability near your configured target over time.
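A minimal sketch of how that history could work, assuming an exponentially decayed running average (all names here are invented for illustration, not taken from the PR):

```cpp
// After each selection, fold the chosen token's ORIGINAL probability into
// a decayed history; the effective target then compensates for drift.
struct power_law_state {
    float decay    = 0.9f; // ~10-token effective history at the default
    float hist_sum = 0.0f; // decayed sum of selected tokens' original probabilities
    float hist_wt  = 0.0f; // decayed sum of weights, bounded by 1 / (1 - decay)
};

static float effective_target(const power_law_state & s, float target) {
    if (s.hist_wt == 0.0f) {
        return target; // no history yet
    }
    const float recent_avg = s.hist_sum / s.hist_wt; // recent average selection probability
    // if recent picks ran hotter than the target, aim lower (and vice versa)
    return target + (target - recent_avg);
}

static void accept_token(power_law_state & s, float orig_p) {
    s.hist_sum = s.hist_sum * s.decay + orig_p;
    s.hist_wt  = s.hist_wt  * s.decay + 1.0f;
}
```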
Parameters
--power-law-target (default: -1.0)
--power-law-decay (default: 0.9)
In most cases, just play with --power-law-target. The decay default of 0.9 (~10 token history) works well. Lower decay values make adaptation more reactive, but the model may start to feel unstable; higher values like 0.99 equate to extremely slow adaptation over time. Decay is clamped to 0.99 to prevent unbounded accumulation, giving a maximum "effective history size" of ~100 tokens.
Negative target values disable the sampler and just sample a token from the un-transformed distribution. Since the default target is -1.0, the sampler is disabled by default. This is intentional, since it's a specialized sampler.
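For intuition (a quick check of the numbers above, not from the PR itself): with exponentially decayed weights, the total weight is a geometric series summing to 1/(1 - decay), so decay = 0.9 gives an effective history of 1/(1 - 0.9) = 10 tokens, and the 0.99 clamp gives 1/(1 - 0.99) = 100.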
Usage notes
This sampler must be last in the chain, like the existing greedy, dist, or mirostat samplers, because it selects a token ID rather than just transforming logits.
The sampler works best when the only other samplers are light truncation, e.g. --top-k 64 combined with --min-p 0.05 to remove very unlikely tokens. You should disable penalties, DRY, and most other samplers, as they are not expected to play nice.
Example usage
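A plausible invocation combining the new flag with the light truncation suggested above (the model path and prompt are placeholders; --power-law-target is this PR's flag, the rest are existing llama.cpp options):

```sh
llama-cli -m model.gguf \
    --top-k 64 --min-p 0.05 \
    --power-law-target 0.5 \
    -p "Write a short story about a lighthouse."
```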