1 files changed, 101 insertions, 31 deletions
diff --git a/examples/server/public_simplechat/readme.md b/examples/server/public_simplechat/readme.md
index de0dfc99..36a46885 100644
--- a/examples/server/public_simplechat/readme.md
+++ b/examples/server/public_simplechat/readme.md
@@ -11,18 +11,29 @@ in a simple way with minimal code from a common code base. Inturn additionally i
 multiple independent back and forth chatting to an extent, with the ai llm model at a basic level, with their
 own system prompts.
 
+This allows seeing the generated text / ai-model response in oneshot at the end, after it is fully generated,
+or potentially as it is being generated, in a streamed manner from the server/ai-model.
+
+Auto saves the chat session locally as the chat progresses. When you open SimpleChat at a later time,
+an option is provided to restore the old chat session, if a matching one exists.
+
 The UI follows a responsive web design so that the layout can adapt to available display space in a usable
 enough manner, in general.
 
 Allows developer/end-user to control some of the behaviour by updating gMe members from browser's devel-tool
-console.
+console. In parallel, some of the settings directly useful to the end-user can also be changed using the
+provided settings ui.
 
-NOTE: Given that the idea is for basic minimal testing, it doesnt bother with any model context length and
-culling of old messages from the chat by default. However by enabling the sliding window chat logic, a crude
-form of old messages culling can be achieved.
+NOTE: Current web service api doesnt expose the model context length directly, so client logic doesnt provide
+any adaptive culling of old messages, nor replacing them with a summary of their content, etc. However there
+is an optional sliding window based chat logic, which provides a simple minded culling of old messages from
+the chat history before sending to the ai model.
 
-NOTE: It doesnt set any parameters other than temperature and max_tokens for now. However if someone wants
-they can update the js file or equivalent member in gMe as needed.
+NOTE: Wrt options sent with the request, it mainly sets temperature, max_tokens and optionally stream for now.
+However if someone wants they can update the js file or equivalent member in gMe as needed.
+
+NOTE: One may be able to use this to chat with openai api web-service /chat/completions endpoint, in a very
+limited / minimal way. One will need to set model, openai url and authorization bearer key in settings ui.
 
 
 ## usage
@@ -52,9 +63,15 @@ Open this simple web front end from your local browser
 
 Once inside
 
-* Select between chat and completion mode. By default it is set to chat mode.
+* If you want to, you can change many of the default global settings
+  * the base url (ie ip addr / domain name, port)
+  * chat (default) vs completion mode
+  * whether to try trimming garbage in the response or not
+  * amount of chat history in the context sent to server/ai-model
+  * oneshot or streamed mode.
 
 * In completion mode
+  * one normally doesnt use a system prompt in completion mode.
   * logic by default doesnt insert any role specific "ROLE: " prefix wrt each role's message.
     If the model requires any prefix wrt user role messages, then the end user has to
     explicitly add the needed prefix, when they enter their chat message.
@@ -88,12 +105,16 @@ Once inside
 * Wait for the logic to communicate with the server and get the response.
   * the user is not allowed to enter any fresh query during this time.
   * the user input box will be disabled and a working message will be shown in it.
+  * if trim garbage is enabled, the logic will try to trim repeating text kind of garbage to some extent.
 
 * just refresh the page, to reset wrt the chat history and or system prompt and start afresh.
 
 * Using NewChat one can start independent chat sessions.
   * two independent chat sessions are setup by default.
 
+* When you want to print, switch ChatHistoryInCtxt to Full and click on the chat session button of
+  interest; this will display the full chat history of that session till then, ready for printing.
+
 
 ## Devel note
 
@@ -104,14 +125,31 @@ by developers who may not be from web frontend background (so inturn may not be
 end-use-specific-language-extensions driven flows) so that they can use it to explore/experiment things.
 
 And given that the idea is also to help explore/experiment for developers, some flexibility is provided
-to change behaviour easily using the devel-tools/console, for now. And skeletal logic has been implemented
-to explore some of the end points and ideas/implications around them.
+to change behaviour easily using the devel-tools/console or the provided minimal settings ui (wrt a few aspects).
+Skeletal logic has been implemented to explore some of the end points and ideas/implications around them.
 
 
 ### General
 
 Me/gMe consolidates the settings which control the behaviour into one object.
 One can see the current settings, as well as change/update them using browsers devel-tool/console.
+It is attached to the document object. Some of these can also be updated using the Settings UI.
+
+  baseURL - the domain-name/ip-address and inturn the port to send the request.
+
+  bStream - control between oneshot-at-end and live-stream-as-its-generated collating and showing
+  of the generated response.
+
+    the logic assumes that the text sent from the server follows utf-8 encoding.
+
+    in streaming mode - if there is any exception, the logic traps the same and tries to ensure
+    that text generated till then is not lost.
+
+      if a very long text is being generated, which leads to no user interaction for sometime and
+      inturn the machine goes into power saving mode or so, the platform may stop network connection,
+      leading to exception.
+
+  apiEP - select between /completions and /chat/completions endpoint provided by the server/ai-model.
 
   bCompletionFreshChatAlways - whether Completion mode collates complete/sliding-window history when
   communicating with the server or only sends the latest user query/message.
@@ -119,6 +157,19 @@ One can see the current settings, as well as change/update them using browsers d
   bCompletionInsertStandardRolePrefix - whether Completion mode inserts role related prefix wrt the
   messages that get inserted into prompt field wrt /Completion endpoint.
 
+  bTrimGarbage - whether garbage repetition at the end of the generated ai response should be
+  trimmed or left as is. If enabled, it will be trimmed so that it wont be sent back as part of
+  subsequent chat history. At the same time the actual trimmed text is shown to the user once,
+  when it is generated, so the user can check if any useful info/data was there in the response.
+
+    One may be able to request the ai-model to continue (wrt the last response) (if chat-history
+    is enabled as part of the chat-history-in-context setting), and chances are the ai-model will
+    continue starting from the trimmed part, thus allowing a long response to be recovered/continued
+    indirectly, in many cases.
+
+    The histogram/freq based trimming logic is currently tuned for english language wrt its
+    is-it-an-alphabetic|numeral-char regex match logic.
+
   chatRequestOptions - maintains the list of options/fields to send along with chat request,
   irrespective of whether /chat/completions or /completions endpoint.
 
@@ -126,6 +177,14 @@ One can see the current settings, as well as change/update them using browsers d
     modify the existing options value or remove them, for now you can update this global var
     using browser's development-tools/console.
 
+    For string and numeric fields in chatRequestOptions, including even those added by a user
+    at runtime by directly modifying gMe.chatRequestOptions, settings ui entries will be auto
+    created.
+
+  headers - maintains the list of http headers sent when request is made to the server. By default
+  Content-Type is set to application/json. Additionally an Authorization entry is provided, which can
+  be set if needed using the settings ui.
+
   iRecentUserMsgCnt - a simple minded SlidingWindow to limit context window load at Ai Model end.
   This is disabled by default. However if enabled, then in addition to latest system message, only
   the last/latest iRecentUserMsgCnt user messages after the latest system prompt and its responses
@@ -140,7 +199,8 @@ One can see the current settings, as well as change/update them using browsers d
 
 By using gMe's iRecentUserMsgCnt and chatRequestOptions.max_tokens one can try to control the
 implications of loading of the ai-model's context window by chat history, wrt chat response to
-some extent in a simple crude way.
+some extent in a simple crude way. You may also want to control the context size enabled when
+the server loads the ai-model, on the server end.
 
 
 Sometimes the browser may be stuborn with caching of the file, so your updates to html/css/js
@@ -149,28 +209,15 @@ matter clearing site data, dont directly override site caching in all cases. Wor
 have to change port. Or in dev tools of browser, you may be able to disable caching fully.
 
 
-Concept of multiple chat sessions with different servers, as well as saving and restoring of
-those across browser usage sessions, can be woven around the SimpleChat/MultiChatUI class and
-its instances relatively easily, however given the current goal of keeping this simple, it has
-not been added, for now.
+Currently the server to communicate with is maintained globally and not as part of a specific
+chat session. So if one changes the server ip/url in settings, then all chat sessions will auto
+switch to this new server, when you try using those sessions.
 
 
 By switching between chat.add_system_begin/anytime, one can control whether one can change
 the system prompt, anytime during the conversation or only at the beginning.
 
 
-read_json_early, is to experiment with reading json response data early on, if available,
-so that user can be shown generated data, as and when it is being generated, rather than
-at the end when full data is available.
-
-  the server flow doesnt seem to be sending back data early, atleast for request (inc options)
-  that is currently sent.
-
-  if able to read json data early on in future, as and when ai model is generating data, then
-  this helper needs to indirectly update the chat div with the recieved data, without waiting
-  for the overall data to be available.
-
-
 ### Default setup
 
 By default things are setup to try and make the user experience a bit better, if possible.
@@ -179,7 +226,8 @@ However a developer when testing the server of ai-model may want to change these
 Using iRecentUserMsgCnt reduce chat history context sent to the server/ai-model to be
 just the system-prompt, prev-user-request-and-ai-response and cur-user-request, instead of
 full chat history. This way if there is any response with garbage/repeatation, it doesnt
-mess with things beyond the next question/request/query, in some ways.
+mess with things beyond the next question/request/query, in some ways. The trim garbage
+option also tries to help avoid issues with garbage in the context to an extent.
 
 Set max_tokens to 1024, so that a relatively large previous reponse doesnt eat up the space
 available wrt next query-response. However dont forget that the server when started should
@@ -189,11 +237,33 @@ also be started with a model context size of 1k or more, to be on safe side.
   internal n_predict, for now add the same here on the client side, maybe later add max_tokens
   to /completions endpoint handling code on server side.
 
-Frequency and presence penalty fields are set to 1.2 in the set of fields sent to server
-along with the user query. So that the model is partly set to try avoid repeating text in
-its response.
+NOTE: One may want to experiment with frequency/presence penalty fields in chatRequestOptions
+wrt the set of fields sent to server along with the user query, to check how the model behaves
+wrt repetitions in general in the generated text response.
+
+An end-user can change this behaviour by editing gMe from browser's devel-tool/console or by
+using the provided settings ui.
+
+
+### OpenAi / Equivalent API WebService
+
+One may be able to handshake with OpenAI/Equivalent api web service's /chat/completions endpoint
+for a minimal chatting experimentation by setting the below.
+
+* the baseUrl in settings ui
+  * https://api.openai.com/v1 or similar
+
+* Wrt request body - gMe.chatRequestOptions
+  * model (settings ui)
+  * any additional fields if required in future
+
+* Wrt request headers - gMe.headers
+  * Authorization (available through settings ui)
+    * Bearer THE_OPENAI_API_KEY
+  * any additional optional header entries like "OpenAI-Organization", "OpenAI-Project" or so
 
-A end-user can change these behaviour by editing gMe from browser's devel-tool/console.
+NOTE: Not tested, as there is no free tier api testing available. However logically this might
+work.
 
 
 ## At the end