Commit c2e6e52

scosman authored and gitbook-bot committed
GITBOOK-151: No subject
1 parent 5031fa3 commit c2e6e52

3 files changed, +14 -1 lines changed


SUMMARY.md

+1 -1
@@ -8,9 +8,9 @@
 * [Models and AI Providers](docs/models-and-ai-providers.md)
 * [Synthetic Data Generation](docs/synthetic-data-generation.md)
 * [Fine Tuning Guide](docs/fine-tuning-guide.md)
+* [Evaluations](docs/evaluations.md)
 * [Guide: Train a Reasoning Model](docs/guide-train-a-reasoning-model.md)
 * [Reasoning & Chain of Thought](docs/reasoning-and-chain-of-thought.md)
-* [Evaluations](docs/evaluations.md)
 * [Prompts](docs/prompts.md)
 * [Reviewing and Rating](docs/reviewing-and-rating.md)
 * [Collaboration](docs/collaboration.md)

docs/evaluations.md

+7
@@ -413,6 +413,13 @@ Like mean squared error, but scores are normalized to the range 0-1. For example
 
 </details>
 
+#### Resolving "N/A" Correlation Scores
+
+If you see "N/A" scores in your correlation table, it means more data is needed. This can be one of two cases:
+
+* _**Simply not enough data**_: if your eval method dataset is very small (<10 items), it can be impossible to produce confident correlation scores. Add more data to resolve this case.
+* _**Not enough variation of human ratings in the eval method dataset**_: if you have a larger dataset but still get N/A, it's likely there isn't enough variation in your dataset for the given score. For example, if all of the golden samples for a score pass, the evaluator won't produce a confident correlation score, as it has no failing examples and everything is a tie. Add more content to your eval method dataset, designing the content to fill out the missing score ranges. You can use synthetic data gen [human guidance](synthetic-data-generation.md#human-guidance) to generate examples that fail.
+
 #### Select a Default Eval Method
 
 Once you have a winner, click the "Set as default" button to make this eval-method the default for your eval.
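
To illustrate the "everything is a tie" case described in the new "Resolving N/A Correlation Scores" section above, here is a minimal sketch using scipy's kendalltau as a stand-in rank-correlation metric (an assumption; it is not necessarily the metric Kiln computes). The ratings and scores are made up for illustration.

```python
# Why zero variation in human ratings yields an undefined ("N/A") correlation.
from scipy.stats import kendalltau

evaluator_scores = [0.9, 0.8, 0.95, 0.7, 0.85]

# Case 1: every golden sample passes. The human ratings are all ties, so the
# rank correlation is undefined and comes back as NaN, surfaced as "N/A".
human_all_pass = [1, 1, 1, 1, 1]
tau, _ = kendalltau(human_all_pass, evaluator_scores)
print(tau)  # nan

# Case 2: the dataset has both passing and failing examples, so a
# meaningful correlation can be computed.
human_varied = [1, 0, 1, 0, 1]
tau, _ = kendalltau(human_varied, evaluator_scores)
print(tau)  # a real number (positive here, since the evaluator ranks passes higher)
```

Adding failing examples, or simply more items, is what turns that NaN into a usable correlation score.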

docs/synthetic-data-generation.md

+6
@@ -58,6 +58,12 @@ Adding a short guidance prompt can quickly improve the quality of the generated
 
 <figure><img src="../.gitbook/assets/Screenshot 2025-02-07 at 9.31.39 AM.png" alt="" width="152"><figcaption><p>Click "Add Guidance"</p></figcaption></figure>
 
+{% hint style="info" %}
+Human guidance is often used to produce adversarial content: poor quality or inappropriate content. This is done to ensure an [evaluation](evaluations.md) can detect and fail this sort of content.
+
+However, LLMs will often do their best to avoid producing poor or inappropriate content, even when asked for it. If you find that's the case, use an uncensored and unaligned model like Dolphin 8x22B or Grok. These models will follow instructions more closely, and do not attempt to censor their content.
+{% endhint %}
+
 #### Interactive Curation UX
 
 Kiln synthetic data generation is designed to be used in our interactive UI.
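
As a purely hypothetical illustration of the guidance described in the hint above (not taken from the Kiln docs), an adversarial guidance prompt might read something like: "Generate low-quality examples: factually wrong, incomplete, or off-topic. These will be used as failing samples for our evaluation."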
