Releases: ggerganov/llama.cpp

b3931

17 Oct 00:40
73afe68
fix: use `vm_allocate` to allocate CPU backend buffer on macOS (#9875)

* fix: use `vm_allocate` to allocate CPU backend buffer on macOS

* fix: switch to `posix_memalign` to keep existing `free()` usages working

* feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS

* style: formatting

* fix: move const outside of `#ifndef`

* style: formatting

* fix: unused var

* fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h`

* fix: unused var

* fix: page align to `GGUF_DEFAULT_ALIGNMENT`

* fix: page align to `TENSOR_ALIGNMENT`

* fix: convert `TENSOR_ALIGNMENT` to a macro

* fix: increase page size to `32` on iOS

* fix: iOS page size

* fix: `hbw_posix_memalign` alignment

b3930

16 Oct 19:03
9e04102
llama : suppress conversion from 'size_t' to 'int' (#9046)

* llama : suppress conversion from 'size_t' to 'int'

This commit updates llm_tokenizer_spm.tokenize to suppress/remove the
following warnings that are generated on Windows when using MSVC:

```console
src\llama-vocab.cpp(211,1): warning C4267: 'argument':
    conversion from 'size_t' to 'int', possible loss of data
src\llama-vocab.cpp(517,1): warning C4267: 'argument':
    conversion from 'size_t' to 'int', possible loss of data
```

This is done by adding a cast for the size_t returned from
symbols.size(). I believe this is safe, as it seems unlikely that
symbols, which stores an entry for each UTF-8 character, would become
larger than INT_MAX.

The motivation for this change is to reduce the number of warnings that
are currently generated when building on Windows.

* squash! llama : suppress conversion from 'size_t' to 'int'

Move cast into for loop.

b3927

16 Oct 12:22
10433e8
llama : add tensor name for "result_norm" (#9907)

Signed-off-by: Molly Sophia <[email protected]>

b3926

16 Oct 12:16
1f66b69
server : fix the disappearance of the end of the text (#9867)

* server: fix the disappearance of the end of the text when streaming with stop strings

* simplify "send text" checks

b3925

16 Oct 12:06
0e41b30
sync : ggml

b3923

16 Oct 01:50
becfd38
[CANN] Fix cann compilation error (#9891)

Fix the CANN compilation error introduced after llama.cpp merged support for dynamically loadable backends.

b3922

15 Oct 14:52
755a9b2
llama : add infill sampler (#9896)

ggml-ci

b3921

15 Oct 14:32
223c25a
server : improve infill context reuse (#9894)

ggml-ci

b3920

15 Oct 12:49
fbc98b7
sampling : add XTC sampler (#9742)

* Initial XTC commit

Adds the XTC sampler; it is not activated by default, but ships with recommended settings.

* Cleanup

* Simplified chances calculation

To be more in line with the original implementation, the chance is calculated once at the beginning.

* First fixes by comments

Still need to look into sorting

* Fixed trailing backspaces

* Fixed RNG to be reproducible

Thanks to @slaren for directions

* Fixed forgotten header

* Moved `min_keep` 

Moved from conditions to a simple check at the end.

* Fixed broken randomization

Thanks to @slaren for explanation

* Swapped sorting for a custom algorithm

Shifts tokens to remove the penalized ones, then puts the penalized at the back. Should make `min_keep` still viable.

* Algorithm rework

1. Scan tokens from the top until the first non-penalizable one
2. Remove the last captured token (the least probable above the threshold)
3. Shift all tokens to overwrite the remaining penalizable ones
4. Penalize them and put them at the bottom.

* Added XTC to `test-sampling`

* Simplified algorithm and more tests

* Updated info in common and args

* Merged back lost commits in common and arg

* Update dump info in common

* Fixed incorrect min_keep check

* Added XTC to README

* Renamed parameters, fixed info and defaults

* probability is at 0 by default, but XTC is included in sampling queue
* threshold higher than 0.5 switches XTC off

* Initial server support

* Added XTC to server UIs

* Fixed labels in old server UI

* Made algorithm safer and more readable

* Removed xtc_threshold_max

* Fixed arg after update

* Quick fixes by comments

* Simplified algorithm since threshold_max is removed

* Renamed random distribution

* Fixed tests and outdated README

* Small fixes

b3917

14 Oct 07:59
a89f75e
server : handle "logprobs" field with false value (#9871)

Co-authored-by: Gimling <[email protected]>