ik_llama.cpp.git

Age	Commit message	Author
2024-02-18	common, server : surface min_keep as its own parameter (#5567)	Robey Holderith
	* Feature - surface min_keep as its own parameter * Updated README with min_keep param
2024-02-06	server : remove model.json endpoint (#5371)	Alexey Parfenov

2024-01-03	server : throw an error when `slot unavailable` (#4741)	Justin Parker

2024-01-02	server : add token counts to html footer (#4738)	Phil H
	* server: add token counts to stats * server: generate hpp --------- Co-authored-by: phiharri <ph@got-root.co.uk>
2023-12-15	server : add optional API Key Authentication example (#4441)	ShadovvBeast
	* Add API key authentication for enhanced server-client security * server : to snake_case --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-12	server : tweak default sampling parameters (#4367)	kalomaze
	* Set a more typical Top P setting as the default * Update temp max
2023-12-12	english : use `typos` to fix comments and logs (#4354)	Richard Kiss

2023-11-19	server : relay error messages (#4131)	SoftwareRenderer

2023-11-10	server : allow continue edit on completion mode (#3950)	Jhen-Jie Hong
	* server : allow continue edit on completion mode * server : handle abort case in runCompletion * server : style improvement
2023-11-08	server : add min_p param (#3877)	Mihai
	* Update server.cpp with min_p after it was introduced in https://github.com/ggerganov/llama.cpp/pull/3841 * Use spaces instead of tabs * Update index.html.hpp after running deps.sh * Fix test - fix line ending
2023-10-22	server : parallel decoding and multimodal (#3677)	Georgi Gerganov
	* implementing parallel decoding in server example * crash fixed * save dev progress * refactored sampling function * completion endpoint working * multiple client support * grammar + no stream completion * cached prompt support * chat.mjs support cached prompt + some fixes * server ui now support multiple clients * unused change reverted * fixed timings per slot * add context swap * add changes to README.md * llava multimodal integration * fixed tokens probs * add multimodal input - alfa * refactor code + remove unused comments + improved README.md * fix compilation errors with llvm * notify the user from server ui that multimodality is unavialable * some ci fixes * fix ci make build undefined ref errors * fix long prompt than ctx proposed in #3639 * fixed premature end due stop word * context shift fixed * fix llava implementation * sync README.md changes * readme change * update api like OpenAI * multimodal support enabled by default * fix make bui;d errors * fix multiple clients * fix zig build * new sampling API * latest changes of sampling API * server : coding-style normalization * server : coding-style normalization (part 2) * server : remove beam-search functionality * server : bug fix in ingest_images n_tokens is incremented internally by llama_batch_add * server : use refs + use llama_batch_clear() * server : snake case * server : minor sync * added thread safe pipeline * server : bach has to be allocated for n_parallel sequences * server : no need for atomic int - already using mutex * server : logs + minor code style * server : fix multibyte handle in partial response (#3706) * fix image load + view image in chat * make : silence stb warnings * clip : link to ggml, not to llama * server : fix switch fallthrough * server : fix crash in Debug on macOS (I have no idea why this fixes it!?) 
* server : refactor ctx_sampling init + n_ctx + names * server : bug fix for prompt caching * Do not save/load image_data to localStorage * editorconfig : new line in index.html * server : completion requests remember slot_id * Update readme to document multimodal in server * server : minor style * Update readme to document multimodal in server * server : hide ctx_sampling->prev behind API (#3696) * server : apply fix from #3722 * server : fix slot reuse * server : add comment about changing slot_state to bool --------- Co-authored-by: FSSRepo <go778sgt@gmail.com> Co-authored-by: Damian Stewart <d@damianstewart.com> Co-authored-by: Steward Garcia <57494570+FSSRepo@users.noreply.github.com> Co-authored-by: Jhen-Jie Hong <iainst0409@gmail.com> Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
2023-10-12	server : add completion mode (no chat) (#3582)	Aarni Koskela

2023-09-04	server : add a subtle loading animation to the edit box (#2466)	Aarni Koskela
	* editorconfig: add override for the server HTML (which already is 2-space indented) * server: add a subtle loading animation to the edit box
2023-08-25	server : display token probabilities in the UI (#2489)	Jhen-Jie Hong
	* server : add n_probs param in chat UI * server : keep message data array & show in probabilites component * server : add simple popover component * server : fix completion_probabilities undefined if not set n_probs * server : implement Probabilites * server : handle bytes * server : make n_probs max to 10 for easy scroll * server : adjust for dark/light mode * server : Fix regenerated prompt * server : update index.html.hpp * server : convert prob to percentage + show original value as div title * server : fix Probabilites not used if included empty str * server : skip byte pair in display probabilites * server : remove array check of completion_probabilities in messages * skip empty array or byte pair (> 1) in Probabilites * generate index.html.hpp * fix incorrect prob convert if the str is already a known token * use final response to show probabilities on stop * revert unnecessary change * correct probabilites usage * remove unused function * always send partial response for get correct probs of last to_send * fix typo * fix content of format_final_response * refactor probs render & make pColor transparent if not found * send empty string when got stop_pos in partial * avoid unnecessary empty data event & send rest of partial tokens on stop * use <br /> for new line * skip -1 tok in loop to avoid send '' on end * trim last new lines on stop * revert unnecessary change
2023-08-19	server : better default prompt (#2646)	Georgi Gerganov

2023-08-18	server : support for saving templates in browser LocalStorage (#2486)	staviq
	* support for templates in browser LocalStorage * sync accepted #2409 fix from upstream * convert autosave invocation to useEffect * Apply suggestions from code review Co-authored-by: Jhen-Jie Hong <iainst0409@gmail.com> * Regen index.html.cpp, suggested from code review --------- Co-authored-by: Jhen-Jie Hong <iainst0409@gmail.com>
2023-08-14	server : fix default grammar by use empty string in the UI (#2604)	Jhen-Jie Hong

2023-08-14	server : implement json-schema-to-grammar.mjs & add grammar param in the UI (#2588)	Jhen-Jie Hong
	* server : implement json-schema-to-grammar.mjs by follow python impl * server : add grammar support in chat.mjs * server : implement grammer param in the UI * server : generate .hpp * server : remove trailing whitespaces * server : generate .hpp * server : fix sort of prop pairs * server : optimize regex & iteration
2023-08-04	fix firefox autoscroll (#2519)	Jonas Wunderlich

2023-08-04	Fixing race condition in server and partial stream handling in frontend. (#2391)	Stephen Nichols
	* Fixing race condition in server.cpp and partial stream handling in completion.js * Reverting assert edits. * Adding newline to eof
2023-08-01	server : Support dark mode (#2414)	ebraminio
	* server : Support dark mode So it respects user system light / dark settings. * Update index.html.hpp by running ./deps.sh
2023-07-25	[Server] Escape HTML in webchat (#2368)	Henri Vasserman
	* escape HTML in webchat * add amp
2023-07-24	Chat UI extras (#2366)	Aarni Koskela
	* makefile: correct deps for server * server: tighten settings layout a little * server: expose all currently configured generation params in UI * server: expose remaining generation params, for the adventurous * server: embetter mirostat fields
2023-07-05	Expose generation timings from server & update completions.js (#2116)	Tobias Lütke
	* use javascript generators as much cleaner API Also add ways to access completion as promise and EventSource * export llama_timings as struct and expose them in server * update readme, update baked includes * llama : uniform variable names + struct init --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-04	Simple webchat for server (#1998)	Tobias Lütke
	* expose simple web interface on root domain * embed index and add --path for choosing static dir * allow server to multithread because web browsers send a lot of garbage requests we want the server to multithread when serving 404s for favicon's etc. To avoid blowing up llama we just take a mutex when it's invoked. * let's try this with the xxd tool instead and see if msvc is happier with that * enable server in Makefiles * add /completion.js file to make it easy to use the server from js * slightly nicer css * rework state management into session, expose historyTemplate to settings --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>