Differences

This shows you the differences between two versions of the page.

--- 712:configure_stages [2026/03/02 12:38] – Prinz, Patrick
+++ 712:configure_stages [2026/03/25 09:44] (current) – [AI Feature Tokenizer Configuration] Prinz, Patrick
@@ Line 526: / Line 526: @@
 </cg>
 </code>
+**When using models newer than gpt4o (e.g. gpt5):**
+>The temperature must be set to its default value of 1. Add <cg-property name="temperature" value="1" /> to the relevant <cg-hosts>.
+>Replace <chatbot-property name="maxTokens" value="500"></chatbot-property> with <chatbot-property name="maxCompletionTokens" value="500"></chatbot-property>.
 **AI content generation feature (OpenAI compatible):**
@@ Line 687: / Line 693: @@
 ===== AI Feature Tokenizer Configuration =====
-LLM that are NOT using the standard byte-pair-encoding algorithm CANNOT be used with the Azure tokenizer. Stages implements the HuggingFaceTokenizer from deep java library to support customized tokenizer.
+LLMs that do NOT use the standard byte-pair-encoding algorithm, or that are provided via the openAICompatible adapter, CANNOT be used with the Azure tokenizer. Stages implements the HuggingFaceTokenizer from the Deep Java Library (DJL) to support custom tokenizers.
 Add the property “tokenizerFolder” to the cg-host section
@@ Line 730: / Line 736: @@
 . Add tokenizer configuration:
-As the custom models are not processed by our Standard Azure Tokenizer, there has to be added a custom configuration and put into conf/tokenizer. The custom tokenizer can use tokenizer.json and tokenizer-config.json files.
+Because custom models are not processed by the standard Azure tokenizer, a custom tokenizer configuration must be added and placed in a folder under conf/tokenizer that matches the folder configured in config.xml (e.g. conf/tokenizer/DeepSeek_V3_2). The custom tokenizer can use the tokenizer.json and tokenizer_config.json files.
 For many models, suitable configurations can be downloaded from Huggingface.co, e.g.:
@@ Line 741: / Line 747: @@
 >In tokenizer_config.json, look up token objects (e.g. bos_token) and use only the value of their “content” attribute instead of the whole object. See the following file examples:
-**Original tokenizer-config.json :**
+**Original tokenizer_config.json :**
 <code ->
@@ Line 781: / Line 787: @@
 </code>
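The manual edit described in the note on tokenizer_config.json (replacing a token object such as bos_token with the value of its “content” attribute) could also be scripted. The following is a minimal Python sketch and not part of Stages. The function names `flatten_token_objects` and `rewrite_config` are hypothetical, and it assumes that only top-level entries holding an object with a string “content” attribute need to be changed. Nested structures are left untouched.

```python
import json
from pathlib import Path


def flatten_token_objects(config: dict) -> dict:
    """Return a copy of the config in which every top-level token object
    (a JSON object with a string "content" attribute) is replaced by that
    "content" string. All other entries are copied unchanged.

    Assumption: only top-level entries such as bos_token or eos_token
    have to be flattened.
    """
    flattened = {}
    for key, value in config.items():
        if isinstance(value, dict) and isinstance(value.get("content"), str):
            flattened[key] = value["content"]
        else:
            flattened[key] = value
    return flattened


def rewrite_config(path: Path) -> None:
    """Read a tokenizer_config.json file, flatten it, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8"))
    updated = flatten_token_objects(config)
    # ensure_ascii=False keeps special token characters readable in the file.
    path.write_text(
        json.dumps(updated, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )


# Example call (the path is illustrative, following the folder example above):
# rewrite_config(Path("conf/tokenizer/DeepSeek_V3_2/tokenizer_config.json"))
```

Keep a backup of the downloaded file before running such a script, and compare the result with the updated file example to confirm that it matches what the tokenizer expects.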
-**Updated tokenizer-config.json**
+**Updated tokenizer_config.json**
 <code ->