Releases: ggerganov/llama.cpp

b3931

17 Oct 00:40
73afe68
fix: use `vm_allocate` to allocate CPU backend buffer on macOS (#9875)

* fix: use `vm_allocate` to allocate CPU backend buffer on macOS

* fix: switch to `posix_memalign` to keep existing `free()` usages working

* feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS

* style: formatting

* fix: move const outside of `#ifndef`

* style: formatting

* fix: unused var

* fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h`

* fix: unused var

* fix: page align to `GGUF_DEFAULT_ALIGNMENT`

* fix: page align to `TENSOR_ALIGNMENT`

* fix: convert `TENSOR_ALIGNMENT` to a macro

* fix: increase page size to `32` on iOS

* fix: iOS page size

* fix: `hbw_posix_memalign` alignment

b3930

16 Oct 19:03
9e04102
llama : suppress conversion from 'size_t' to 'int' (#9046)

* llama : suppress conversion from 'size_t' to 'int'

This commit updates llm_tokenizer_spm.tokenize to suppress/remove the
following warnings that are generated on Windows when using MSVC:

```console
src\llama-vocab.cpp(211,1): warning C4267: 'argument':
    conversion from 'size_t' to 'int', possible loss of data
src\llama-vocab.cpp(517,1): warning C4267: 'argument':
    conversion from 'size_t' to 'int', possible loss of data
```

This is done by adding a cast for the size_t returned from
symbols.size(). I believe this is safe, as it seems unlikely that
symbols, which stores an entry for each UTF-8 character, would become
larger than INT_MAX.

The motivation for this change is to reduce the number of warnings that
are currently generated when building on Windows.

* squash! llama : suppress conversion from 'size_t' to 'int'

Move cast into for loop.

b3927

16 Oct 12:22
10433e8
llama : add tensor name for "result_norm" (#9907)

Signed-off-by: Molly Sophia <[email protected]>

b3926

16 Oct 12:16
1f66b69
server : fix the disappearance of the end of the text (#9867)

* server: fix the disappearance of the end of the text when streaming with stop strings

* simplify "send text" checks

b3925

16 Oct 12:06
0e41b30
sync : ggml

b3923

16 Oct 01:50
becfd38
[CANN] Fix cann compilation error (#9891)

Fix the CANN compilation error introduced after llama.cpp merged support for dynamically loadable backends.

b3922

15 Oct 14:52
755a9b2
llama : add infill sampler (#9896)

ggml-ci

b3921

15 Oct 14:32
223c25a
server : improve infill context reuse (#9894)

ggml-ci

b3920

15 Oct 12:49
fbc98b7
sampling : add XTC sampler (#9742)

* Initial XTC commit

Adds the XTC sampler; it is not activated by default, but ships with recommended settings.

* Cleanup

* Simplified chances calculation

To be more in line with the original implementation, the chance is calculated once at the beginning.

* First fixes by comments

Still need to look into sorting

* Fixed trailing backspaces

* Fixed RNG to be reproducible

Thanks to @slaren for directions

* Fixed forgotten header

* Moved `min_keep` 

Moved from conditions to a simple check at the end.

* Fixed broken randomization

Thanks to @slaren for explanation

* Swapped sorting for a custom algorithm

Shifts tokens to remove the penalized ones, then puts the penalized at the back. Should make `min_keep` still viable.

* Algorithm rework

1. Scan tokens from the top until the first non-penalizable one
2. Remove the last captured token (the least probable above the threshold)
3. Shift all tokens to overwrite the remaining penalizable ones
4. Penalize them and put them at the bottom.

* Added XTC to `test-sampling`

* Simplified algorithm and more tests

* Updated info in common and args

* Merged back lost commits in common and arg

* Update dump info in common

* Fixed incorrect min_keep check

* Added XTC to README

* Renamed parameters, fixed info and defaults

* probability is at 0 by default, but XTC is included in sampling queue
* threshold higher than 0.5 switches XTC off

* Initial server support

* Added XTC to server UIs

* Fixed labels in old server UI

* Made algorithm safer and more readable

* Removed xtc_threshold_max

* Fixed arg after update

* Quick fixes by comments

* Simplified algorithm since threshold_max is removed

* Renamed random distribution

* Fixed tests and outdated README

* Small fixes

b3917

14 Oct 07:59
a89f75e
server : handle "logprobs" field with false value (#9871)

Co-authored-by: Gimling <[email protected]>