path: root/examples
Age  Commit message  Author

2024-01-09  server : update readme about token probs (#4777)  [Behnam M]
* updated server readme to reflect the gg/server-token-probs-4088 commit; added explanation for the API's completion result which now includes `completion_probabilities`. Also added a JSON schema that shows the type/structure of `completion_probabilities`.
* simplified the `completion_probabilities` JSON schema: it's now easier to understand what the structure of `completion_probabilities` looks like
* minor : fix trailing whitespace
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
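For orientation, here is a hedged sketch of what a completion result carrying `completion_probabilities` could look like, built with the nlohmann json header (`json.hpp`) that the server example already uses. The field names and nesting are an assumption based on the commit message, not a copy of the README's schema.

```cpp
// Illustrative only: assumed shape of a completion result that includes
// per-token probabilities; the real schema lives in examples/server/README.md.
#include "json.hpp"

using json = nlohmann::json;

json example_completion_result() {
    return json{
        {"content", "Hello"},
        {"completion_probabilities", json::array({
            json{
                {"content", "Hello"},
                {"probs", json::array({
                    json{{"tok_str", "Hello"}, {"prob", 0.93}},
                    json{{"tok_str", "Hi"},    {"prob", 0.05}}
                })}
            }
        })}
    };
}
```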
2024-01-09  server : add api-key flag to documentation (#4832)  [Zsapi]
Document the api-key flag added to server in https://github.com/ggerganov/llama.cpp/pull/4441
2024-01-08  llama.swiftui : update readme  [Georgi Gerganov]
2024-01-08  main : add self-extend support (#4815)  [Georgi Gerganov]
* examples : add passkey test
* passkey : better prints
* passkey : select pass key pos from CLI
* passkey : simplify n_past logic
* llama : "self-extend"-like context extension
* passkey : add comment
* main : add Self-Extend support
* llama : add comment about llama_kv_cache_seq_div
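As a rough, hedged illustration of the "self-extend"-like step referenced above: the idea is to integer-divide a range of KV-cache positions by a grouping factor so that several consecutive tokens share one effective position, stretching the usable context. Only the single grouping call is shown; the parameter names are illustrative and the real main.cpp logic also shifts the neighbouring ranges.

```cpp
// Hedged sketch, not the actual main.cpp code: positions of sequence 0 in
// [p0, p1) are divided by n_grp, so n_grp consecutive tokens end up sharing
// one effective position. llama_kv_cache_seq_div is named in the commit
// message; the argument order here is an assumption.
#include "llama.h"

static void group_kv_positions(llama_context * ctx, llama_pos p0, llama_pos p1, int n_grp) {
    llama_kv_cache_seq_div(ctx, 0, p0, p1, n_grp);
}
```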
2024-01-08  examples : add passkey test (#3856)  [Georgi Gerganov]
* examples : add passkey test
* passkey : better prints
* passkey : select pass key pos from CLI
* passkey : simplify n_past logic
* make : add passkey target
* passkey : add "self-extend"-like context extension (#4810)
* llama : "self-extend"-like context extension
* passkey : add comment
* passkey : add readme
2024-01-07  llama-bench : add no-kv-offload parameter (#4812)  [slaren]
2024-01-07  llama.swiftui : use llama.cpp as SPM package (#4804)  [Alex Azarov]
2024-01-07  llama.swiftui : add visionOS target (#4805)  [Alex Azarov]
2024-01-07  server : fix n_predict check (#4798)  [Georgi Gerganov]
2024-01-06  llama.swiftui : use correct pointer for llama_token_eos (#4797)  [Daniel Illescas Romero]
2024-01-06  examples : improve base-translate.sh script (#4783)  [Georgi Gerganov]
2024-01-05  metal : switch back to default.metallib (ggml/681)  [Georgi Gerganov]
ggml-ci
2024-01-05  examples : add few-shot translation example (#4783)  [Georgi Gerganov]
2024-01-04  finetune : remove unused includes (#4756)  [Daniel Bevenius]
This commit removes unused includes from finetune.cpp. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-01-04  server : send token probs for "stream == false" (#4714)  [Georgi Gerganov]
2024-01-04  llama.swiftui : support loading custom model from file picker (#4767)  [singularity]
* swiftui: support load model from file picker
* swiftui: remove trailing whitespace
2024-01-04  server : fix options in README.md (#4765)  [Michael Coppola]
* fix examples/server/README.md
* minor : fix whitespace
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-04  llama.swiftui : fix build of ggml.metallib (#4754)  [singularity]
* metal: fix metal backend init failure in swiftui
* metal: build ggml.metallib instead of copy src
* llama.swift : remove debug flags from metallib build
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-03  server : throw an error when `slot unavailable` (#4741)  [Justin Parker]
2024-01-02  server : add token counts to html footer (#4738)  [Phil H]
* server: add token counts to stats
* server: generate hpp
Co-authored-by: phiharri <ph@got-root.co.uk>
2024-01-02  editorconfig : fix whitespace and indentation #4710  [Georgi Gerganov]
2024-01-02  server : add --override-kv parameter (#4710)  [minarchist]
* Changes to server to allow metadata override
* documentation
* flake.nix: expose full scope in legacyPackages
* flake.nix: rocm not yet supported on aarch64, so hide the output
* flake.nix: expose checks
* workflows: nix-ci: init; build flake outputs
* workflows: nix-ci: add a job for eval
* workflows: weekly `nix flake update`
* workflows: nix-flakestry: drop tag filters ...and add a job for flakehub.com
* workflows: nix-ci: add a qemu job for jetsons
* flake.nix: suggest the binary caches
* flake.lock: update to a commit recently cached by nixpkgs-cuda-ci
Co-authored-by: John <john@jLap.lan>
Co-authored-by: Someone Serge <sergei.kozlukov@aalto.fi>
2024-01-02  finetune: fix typo in README.md (#4733)  [Daniel Bevenius]
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2023-12-30  clip : refactor + bug fixes (#4696)  [Georgi Gerganov]
* clip : refactor + bug fixes (ggml-ci)
* server : add log message
2023-12-29  clip : use ggml_backend_buffer_is_host (#4205)  [Georgi Gerganov]
2023-12-29  clip : enable gpu backend (#4205)  [Steward Garcia]
* clip: enable CUDA backend
* add missing kernels
* add enough padding for alignment
* remove ggml_repeat of clip.cpp
* add metal backend
* llava : fixes
  - avoid ggml_repeat
  - use GGML_USE_ instead of CLIP_USE_ macros
  - remove unused vars
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-29  cmake : fix ld warning duplicate libraries libllama.a (#4671)  [Cuong Trinh Manh]
* fix "ld: warning: ignoring duplicate libraries: '../libllama.a'"
* fix warning in example
2023-12-29  llava-cli : refactor to use sampling library (#4669)  [Justine Tunney]
This change makes it possible to use flags like `--grammar` when using the `llava-cli` program. The rest is just code cleanup deleting a long standing TODO comment. This change also ensures that logging information is emitted to stderr which helps the `llava-cli` command be more friendly to shell scripts. See Mozilla-Ocho/llamafile@1cd334f
2023-12-29  server : replace sleep with condition variables (#4673)  [Justine Tunney]
The server currently schedules tasks using a sleep(5ms) busy loop. This adds unnecessary latency, since most sleep implementations round up to the system scheduling quantum (usually 10ms). Other libc sleep implementations spin for smaller time intervals, which results in the server's busy loop consuming all available CPU. Having the explicit notify() / wait() code also aids the readability of the server code. See Mozilla-Ocho/llamafile@711344b
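A minimal sketch of the pattern this commit describes, assuming a simple FIFO of tasks: a std::condition_variable lets the worker sleep until a task is actually posted instead of polling on a timer. This is the generic idiom, not the server's actual task queue.

```cpp
// Generic notify()/wait() idiom that replaces a sleep-based polling loop.
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>

struct task_queue {
    std::mutex mtx;
    std::condition_variable cv;
    std::deque<std::function<void()>> tasks;

    void post(std::function<void()> t) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            tasks.push_back(std::move(t));
        }
        cv.notify_one();                               // wake the worker immediately
    }

    void run_one() {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [&] { return !tasks.empty(); }); // no fixed 5 ms sleep
        auto t = std::move(tasks.front());
        tasks.pop_front();
        lock.unlock();
        t();
    }
};
```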
2023-12-29  server : fix OpenAI server sampling w.r.t. penalty. (#4675)  [SakuraUmi]
2023-12-29  server : allow to generate multimodal embeddings (#4681)  [Karthik Sethuraman]
2023-12-29  main-cmake-pkg : fix build issue (#4665)  [andrijdavid]
* Fix main-cmake-pkg compilation
* Use glob to load common files
* cmake : fix trailing whitespace
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-29  llama.swiftui : fix infinite loop, output timings, buff UI (#4674)  [Peter Sugihara]
* fix infinite loop
* slight UI simplification, clearer UX
* clearer UI text, add timings to completion log
2023-12-28  Fix OpenAI server sampling w.r.t. temp and seed (#4668)  [Justine Tunney]
The default values for tfs_z and typical_p were being set to zero, which caused the token candidates array to get shrunk down to one element, thus preventing any sampling. Note this only applies to OpenAI API-compatible HTTP server requests. The solution is to use the default values that OpenAI documents, as well as ensuring we use the llama.cpp defaults for the rest. I've tested that this change still ensures deterministic output by default. If a "temperature" greater than 0 is explicitly passed, then output is unique each time. If "seed" is specified in addition to "temperature", then the output becomes deterministic once more. See Mozilla-Ocho/llamafile#117 and Mozilla-Ocho/llamafile@9e4bf29
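A hedged sketch of the defaulting pattern described above. The request fields follow OpenAI-style naming and the defaults shown are llama.cpp's usual ones (1.0 disables tfs/typical sampling); treat both as assumptions rather than the server's exact code.

```cpp
#include "json.hpp"

using json = nlohmann::json;

struct sampling_params {
    float temp      = 0.80f; // assumed llama.cpp default
    float tfs_z     = 1.00f; // 1.0 = disabled, rather than an accidental 0
    float typical_p = 1.00f; // 1.0 = disabled, rather than an accidental 0
};

// Fall back to the defaults when a request omits a field, instead of 0.
static sampling_params parse_sampling(const json & body) {
    sampling_params p;
    p.temp      = body.value("temperature", p.temp);
    p.tfs_z     = body.value("tfs_z",       p.tfs_z);
    p.typical_p = body.value("typical_p",   p.typical_p);
    return p;
}
```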
2023-12-27  finetune : fix output formatting in print_params (#4653)  [Daniel Bevenius]
This commit fixes the output formatting in the print_params function, which currently looks like this:
```console
print_params: n_vocab: 32000
print_params: n_ctx: 128
print_params: n_embd: 4096
print_params: n_ff: 11008
print_params: n_head: 32
print_params: n_head_kv: 32
print_params: n_layer: 32
print_params: norm_rms_eps : 0.000010
print_params: rope_freq_base : 10000.000000
print_params: rope_freq_scale : 1.000000
```
With this commit the output will look like this:
```console
print_params: n_vocab         : 32000
print_params: n_ctx           : 128
print_params: n_embd          : 4096
print_params: n_ff            : 11008
print_params: n_head          : 32
print_params: n_head_kv       : 32
print_params: n_layer         : 32
print_params: norm_rms_eps    : 0.000010
print_params: rope_freq_base  : 10000.000000
print_params: rope_freq_scale : 1.000000
```
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
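The alignment in the second block is the kind of thing a fixed-width format specifier gives you; a small hedged sketch of that idea, not the actual finetune.cpp code:

```cpp
#include <cstdint>
#include <cstdio>

// Pad each field name to a fixed width so the ':' and the values line up.
static void print_param(const char * name, uint32_t value) {
    printf("print_params: %-15s : %u\n", name, value);
}

// e.g. print_param("n_vocab", 32000);
```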
2023-12-23  server : allow to specify custom prompt for penalty calculation (#3727)  [Alexey Parfenov]
2023-12-22  lookup : add prompt lookup decoding example (#4484)  [LeonEricsson]
* initial commit, going through initializations
* main loop finished, starting to debug
* BUG: generates gibberish/repeating tokens after a while
* kv_cache management
* Added colors to distinguish drafted tokens (--color). Updated README
* lookup : fix token positions in the draft batch
* lookup : use n_draft from CLI params
* lookup : final touches
Co-authored-by: Leon Ericsson <leon.ericsson@icloud.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
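The core of prompt-lookup decoding is simple: take the most recent n-gram of the context, search for an earlier occurrence of it in the prompt, and propose the tokens that followed that occurrence as a speculative draft. A self-contained, hedged sketch of that search (not the example's actual code; the n-gram size and draft length are arbitrary here):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Return up to n_draft tokens that followed an earlier occurrence of the
// trailing n-gram of `tokens`; empty if no match is found.
static std::vector<int32_t> draft_from_prompt(const std::vector<int32_t> & tokens,
                                              size_t ngram = 3, size_t n_draft = 8) {
    if (tokens.size() <= ngram) {
        return {};
    }
    const size_t end = tokens.size() - ngram;   // start of the trailing n-gram
    for (size_t i = end; i-- > 0; ) {           // scan backwards for a match
        bool match = true;
        for (size_t j = 0; j < ngram; ++j) {
            if (tokens[i + j] != tokens[end + j]) { match = false; break; }
        }
        if (match) {
            const size_t from = i + ngram;
            const size_t to   = std::min(from + n_draft, tokens.size());
            return { tokens.begin() + from, tokens.begin() + to };
        }
    }
    return {};
}
```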
2023-12-21  ggml : change ggml_scale to take a float instead of tensor (#4573)  [Georgi Gerganov]
* ggml : change ggml_scale to take a float instead of tensor
* ggml : fix CPU implementation
* tests : fix test-grad0 (ggml-ci)
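In practice the API change means the scale factor is passed directly instead of being wrapped in a 1-element tensor first; a hedged before/after sketch (the old helper call is from memory and may not match the pre-#4573 code exactly):

```cpp
#include "ggml.h"

// Scale a tensor by 0.5 under the new API.
struct ggml_tensor * scale_half(struct ggml_context * ctx, struct ggml_tensor * a) {
    // old (roughly): ggml_scale(ctx, a, ggml_new_f32(ctx, 0.5f));
    return ggml_scale(ctx, a, 0.5f); // new: plain float argument
}
```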
2023-12-21  gguf : simplify example dependencies  [Georgi Gerganov]
2023-12-18  llama.swiftui : add tinyllama 1.1B F16  [Georgi Gerganov]
2023-12-18  llama.swiftui : add more models  [Georgi Gerganov]
2023-12-17  llama.swiftui : add bench functionality (#4483)  [Georgi Gerganov]
* llama.swiftui : add bench button
* llama.swiftui : initial bench functionality
* force to use n_gpu_layers on simulator
* add download buttons & expose llamaState.loadModel
* update project.pbxproj
* comment #Preview & fix editorconfig check
* gitignore : xcode stuff
* llama.swiftui : UX improvements
* llama.swiftui : avoid data copy via "downloadTask"
* llama.swiftui : remove model from project
* llama : remove "mostly" from model infos
* llama.swiftui : improve bench
Co-authored-by: jhen <developer@jhen.me>
2023-12-17  finetune : keep allocs alive until all allocations are done (#4486)  [slaren]
2023-12-17  server : disable llm logs if SERVER_VERBOSE is off (#3792)  [olexiyb]
2023-12-17  server : fix grammar being ignored (#4494)  [AdithyanI]
Fix bug in identifying the grammar.
2023-12-17  server : fix possible ambiguity in content type charset (#4501)  [Alexey Parfenov]
2023-12-17  server : allow requests larger than 8K (#4500)  [mzcu]
2023-12-15  server : add optional API Key Authentication example (#4441)  [ShadovvBeast]
* Add API key authentication for enhanced server-client security
* server : to snake_case
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-14  ggml : remove n_dims from ggml_tensor (#4469)  [slaren]
ggml-ci
2023-12-14  ggml : add ggml_row_size() (fixes llama out of space) (#4461)  [LostRuins]
* Fixes "Not enough space in the context's memory pool" encountered on certain models, which seems to be caused by some imprecision related to the automatic casting of floating point values
* do not cast to size_t, instead just use doubles
* ggml : add ggml_row_size(), deprecate ggml_type_sizef()
* ggml : fix row size compute to avoid overflows
* tests : fix sizey -> sizez
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>