Differences
This shows you the differences between two versions of the page.
| 712:configure_stages [2026/03/02 12:38] – Prinz, Patrick | 712:configure_stages [2026/03/25 09:44] (current) – [AI Feature Tokenizer Configuration] Prinz, Patrick | ||
|---|---|---|---|
| Line 526: | Line 526: | ||
| </cg> | </cg> | ||
| </ | </ | ||
| + | |||
| + | **When using models newer than GPT-4o (such as GPT-5)** | ||
| + | |||
| + | >The temperature must be set to its default value of 1. Add < | |
| + | |||
| + | >The < | ||
| **AI content generation feature (OpenAI compatible): | **AI content generation feature (OpenAI compatible): | ||
| Line 687: | Line 693: | ||
| ===== AI Feature Tokenizer Configuration ===== | ===== AI Feature Tokenizer Configuration ===== | ||
| - | LLMs that do NOT use the standard byte-pair-encoding algorithm CANNOT be used with the Azure tokenizer. Stages implements the HuggingFaceTokenizer from the Deep Java Library (DJL) to support customized tokenizers. | + | LLMs that do NOT use the standard byte-pair-encoding algorithm |
| Add the property “tokenizerFolder” to the cg-host section | Add the property “tokenizerFolder” to the cg-host section | ||
| Line 730: | Line 736: | ||
| 3. Add tokenizer configuration: | 3. Add tokenizer configuration: | ||
| - | As the custom models are not processed by our Standard Azure Tokenizer, there has to be added a custom configuration and put into conf/ | + | As the custom models are not processed by the standard Azure tokenizer, there has to be a custom configuration |
| For many models, various configurations can be downloaded from Huggingface.co, | For many models, various configurations can be downloaded from Huggingface.co, | ||
| Line 741: | Line 747: | ||
| >In tokenizer_config.json, look up the token objects (e.g. bos_token) and replace each whole object with just the value of its “content” attribute. See the following file examples: | >In tokenizer_config.json, look up the token objects (e.g. bos_token) and replace each whole object with just the value of its “content” attribute. See the following file examples: | ||
| - | **Original | + | **Original |
| <code -> | <code -> | ||
| Line 781: | Line 787: | ||
| </ | </ | ||
| - | **Updated | + | **Updated |
| <code -> | <code -> | ||