Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate fields based on their mappings and the dynamic templates set #2285

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

mrodm
Copy link
Contributor

@mrodm mrodm commented Dec 17, 2024

Relates #2207

Add support to validate mappings after ingesting documents based on dynamic templates.

It also updates how mappings not found in the preview are compared/validated with ECS fields.

Assumptions:

  • Filtered out dynamic templates created by import_mappings
  • Filtered out dynamic templates created by ecs@mappings component template
  • Filtered out the field definitions added via appendECSMappingMultifields function.
  • All the mappings present in the actual mappings (after ingesting documents) must fulfill one of these conditions to be valid:
    • All the fields from the actual mapping must be present in the preview (not viceversa).
    • All the fields from the actual mapping must be present in the dynamic template (not viceversa).
  • Multi-fields are just validated with the definitions in the preview (not in ECS or in dynamic templates).

Author's Checklist

  • Clean log/debug statements
  • Run tests on packages from integration repository

multi_fields are now compared directly with dynamic templates or ECS
fields. It was required to know whether or not that field was a
multi_field or not to apply some specific validations.
@mrodm mrodm self-assigned this Dec 17, 2024
@mrodm
Copy link
Contributor Author

mrodm commented Dec 19, 2024

test integrations

@elastic-vault-github-plugin-prod

Created or updated PR in integrations repository to test this version. Check elastic/integrations#12164

…ing for other dynamic templates if parameters do not match
@mrodm
Copy link
Contributor Author

mrodm commented Dec 19, 2024

test integrations

@elastic-vault-github-plugin-prod

Created or updated PR in integrations repository to test this version. Check elastic/integrations#12164

@mrodm
Copy link
Contributor Author

mrodm commented Dec 19, 2024

As a reference, this is the previous errors found without support of dynamic templates: #2214 (comment)

Looking at the errors in this build: https://buildkite.com/elastic/integrations/builds/19707

There are some packages that are failing due to missing parameters:

There are packages that fail due to different types comparing with ECS:

  • box_events: different type for threat.enrichments.indicator.first_seen and threat.enrichments.indicator.last_seen
  • claroty_ctd: different type for threat.indicator.modified_at
  • mimecast: different type for threat.indicator.first_seen and threat.indicator.modified_at
  • ti_anomali: different type for threat.indicator.modified_at

Packages with issues related to multi_fields:

  • fortinet_fortimail: missing multi_field (probably due to ecs@mappings component template)
  • cef: missing multi_field (probably due to ecs@mappings component template)

@@ -448,7 +450,7 @@ func appendECSMappingMultifields(schema []FieldDefinition, prefix string) []Fiel
{
Name: "text",
Type: "match_only_text",
External: "ecs",
External: externalFieldAppendedTag,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Workaround to be able to filter multifieds generated here during the validation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be a better approach for this ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have many failures if we don't do this? Maybe we can leave this for a separate change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As talked offline, we decided to ignore the validation of these multi-fields.
Added a comment about the reasoning behind this in https://github.com/elastic/elastic-package/pull/2285/files#diff-67b718cebdeb840d99361c07166f75df079827a75df20850ef6cada861304521R318-R322

Comment on lines 854 to 859
var matches bool
if fullRegex {
matches, err = stringMatchesPatterns(values, name)
} else {
matches, err = stringMatchesWildcards(values, name)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does match_pattern applies also for unmatch ?

According to this, maybe it just applies for match.
https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html#match-unmatch

WDYT ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the docs seem to indicate that it only applies to match, but it would be weird that it is not possible to set this option for unmatch. Maybe match_pattern applies to both?

In any case, do we generate templates with these options?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the docs seem to indicate that it only applies to match, but it would be weird that it is not possible to set this option for unmatch. Maybe match_pattern applies to both?

According to the tests it looks like it applies to both parameters
https://github.com/elastic/elasticsearch/blob/fe3a3cb70e3dd18b3e5ddaf6a63389b4cb16b776/server/src/test/java/org/elasticsearch/index/mapper/DynamicTemplatesTests.java#L2391-L2397

And also it looks like it could apply to path_match and path_unmatch, lookign at this test
https://github.com/elastic/elasticsearch/blob/75bf27c8b69b7c32fb200afd5a244e7b1f8aea32/x-pack/plugin/core/template-resources/src/main/resources/watch-history.json#L23

Currently, I think that math_pattern parameter is not used. Fleet does not generate that and currently the ECS component template does not use it neither.

Comment on lines 156 to 157
// TODO: Should we remove this code to load external and local fields?
fieldsDir := filepath.Join(fieldsParentDir, "fields")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code to load the external and local fields maybe can be removed. That would mean that just the schema set (optionally) via options is taken into account.

WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sounds good.

internal/fields/mappings.go Outdated Show resolved Hide resolved
Comment on lines 896 to 924
case "match_mapping_type", "unmatch_mapping_type":
// Do nothing
// These comparisons are done with the original data, and it's likely that the
// resulting mapping does not have the same type since it could change by the `mapping` field
// case "match_mapping_type":
// logger.Debugf("> Check match_mapping_type: %q (type %s)", currentPath, fieldType)
// values, err := parseSetting(value)
// if err != nil {
// return fmt.Errorf("failed to check match_mapping_type setting: %w", err)
// }
// logger.Debugf(">> Comparing to values: %s", values)
// if slices.Contains(values, "*") {
// continue
// }
// if !slices.Contains(values, fieldType) {
// logger.Debugf(">> Issue: not matches")
// matched = false
// }
// case "unmatch_mapping_type":
// logger.Debugf("> Check unmatch_mapping_type: %q (type %s)", currentPath, fieldType)
// values, err := parseSetting(value)
// if err != nil {
// return fmt.Errorf("failed to check unmatch_mapping_type setting: %w", err)
// }
// logger.Debugf(">> Comparing to values: %s", values)
// if slices.Contains(values, fieldType) {
// logger.Debugf(">> Issue: matches")
// matched = false
// }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like that these options would be better to skip them, since they would require to have the original type before the documents are stored into Elasticsearch.

If there is no objection, this whole case could be left as empty (with a comment).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we can ignore this, but we should still check that the mapping type in the template matches with the actual mapping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then I'll remove the code and add a comment there about this.

Comparing and validating the type would be achieved via the validateObjectMappingAndParameters function once it is found a dynamic template that matches the given parameters.

@mrodm mrodm requested a review from a team December 19, 2024 17:14
@mrodm mrodm marked this pull request as ready for review December 19, 2024 17:14
@mrodm mrodm changed the title [WIP] Validate dynamic mappings Validate fields based on their mappings and the dynamic templates set Dec 19, 2024
return fmt.Errorf("missing type parameter for field: %q", currentPath)
}

for _, template := range dynamicTemplates {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. This loop mixes parsing of the dynamic templates and its execution. Consider unmarshalling the dynamic templates first, so this loop becomes something like:

for _, template := dynamicTemplates {
    matches, err := template.Matches(currentPath, definition)
    if err != nil {
        return err
    }
    if matches {
        return nil
    }
}

The parsing of the dynamic templates could be done once, in validateMappingsNotInPreview.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separated that logic in two stages. First one parses the dynamic templates from map[string]any into a specific struct.

Once the dynamic templates are parsed, the required mappings are validated with those dynamic templates.

Comment on lines 854 to 859
var matches bool
if fullRegex {
matches, err = stringMatchesPatterns(values, name)
} else {
matches, err = stringMatchesWildcards(values, name)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the docs seem to indicate that it only applies to match, but it would be weird that it is not possible to set this option for unmatch. Maybe match_pattern applies to both?

In any case, do we generate templates with these options?

Comment on lines 896 to 924
case "match_mapping_type", "unmatch_mapping_type":
// Do nothing
// These comparisons are done with the original data, and it's likely that the
// resulting mapping does not have the same type since it could change by the `mapping` field
// case "match_mapping_type":
// logger.Debugf("> Check match_mapping_type: %q (type %s)", currentPath, fieldType)
// values, err := parseSetting(value)
// if err != nil {
// return fmt.Errorf("failed to check match_mapping_type setting: %w", err)
// }
// logger.Debugf(">> Comparing to values: %s", values)
// if slices.Contains(values, "*") {
// continue
// }
// if !slices.Contains(values, fieldType) {
// logger.Debugf(">> Issue: not matches")
// matched = false
// }
// case "unmatch_mapping_type":
// logger.Debugf("> Check unmatch_mapping_type: %q (type %s)", currentPath, fieldType)
// values, err := parseSetting(value)
// if err != nil {
// return fmt.Errorf("failed to check unmatch_mapping_type setting: %w", err)
// }
// logger.Debugf(">> Comparing to values: %s", values)
// if slices.Contains(values, fieldType) {
// logger.Debugf(">> Issue: matches")
// matched = false
// }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we can ignore this, but we should still check that the mapping type in the template matches with the actual mapping.

@@ -448,7 +450,7 @@ func appendECSMappingMultifields(schema []FieldDefinition, prefix string) []Fiel
{
Name: "text",
Type: "match_only_text",
External: "ecs",
External: externalFieldAppendedTag,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have many failures if we don't do this? Maybe we can leave this for a separate change.

@mrodm
Copy link
Contributor Author

mrodm commented Jan 8, 2025

test integrations

@elastic-vault-github-plugin-prod

Created or updated PR in integrations repository to test this version. Check elastic/integrations#12164

@@ -1528,12 +1528,10 @@ func (r *tester) validateTestScenario(ctx context.Context, result *testrunner.Re
logger.Warn("Validate mappings found (technical preview)")
exceptionFields := listExceptionFields(scenario.docs, fieldsValidator)

mappingsValidator, err := fields.CreateValidatorForMappings(r.dataStreamPath, r.esClient,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the process to load the fields from the package and ECS has been removed, these parameters are not needed anymore.

The schema to be used to validate fields with ECS is the one set (optionally) via the method WithMappingValidatorFallbackSchema

@mrodm
Copy link
Contributor Author

mrodm commented Jan 8, 2025

test integrations

@elastic-vault-github-plugin-prod

Created or updated PR in integrations repository to test this version. Check elastic/integrations#12164

@mrodm
Copy link
Contributor Author

mrodm commented Jan 9, 2025

test integrations

@elastic-vault-github-plugin-prod

Created or updated PR in integrations repository to test this version. Check elastic/integrations#12164

@mrodm
Copy link
Contributor Author

mrodm commented Jan 9, 2025

test integrations

@elastic-vault-github-plugin-prod

Created or updated PR in integrations repository to test this version. Check elastic/integrations#12164

@mrodm mrodm requested a review from jsoriano January 9, 2025 17:02
@mrodm
Copy link
Contributor Author

mrodm commented Jan 9, 2025

Follow-up of the above comment #2285 (comment)

Summary of the errors found in the build: https://buildkite.com/elastic/integrations/builds/20226

There are some packages that are failing due to missing parameters/definitions:

  • auditd_manager: auditd.data.a0, auditd.data.a1, auditd.data.a2 and auditd.data.a3
  • teleport: some examples: teleport.audit.account_id, teleport.audit.aws_host, teleport.audit.aws_region, teleport.audit.instance_id,...
  • mongodb_atlas: missing definition for two fields mongodb_atlas.project.additional_info.hidden and mongodb_atlas.project.additional_info.is_mms_admin
    • They do not match the dynamic template since they are boolean and not string.
  • crowdstrike: errors related to flattened type
  • sublime_security: field does not match the dynamic template sublime_security.email_message.headers.hops.fields.position

There are packages that fail due to different types comparing with ECS:

  • box_events: different type for threat.enrichments.indicator.first_seen and threat.enrichments.indicator.last_seen
  • claroty_ctd: different type for threat.indicator.modified_at
  • mimecast: different type for threat.indicator.first_seen and threat.indicator.modified_at
  • ti_anomali: different type for threat.indicator.modified_at
  • cef: different type for url.original (keyword vs wildcard)

Packages that do not have any test failing now:

While inspecting the mappings and dynamic templates created for the docker package, I found some discrepancies in the mappings created
taking into account the definition in the package:

  • field definition:
    - name: core.*.pct
      type: scaled_float
      format: percent
      unit: percent
      metric_type: gauge
      description: |
        Percentage of CPU time in this core.
  • dynamic template created:
        {
          "docker.cpu.core.*.pct": {
            "path_match": "docker.cpu.core.*.pct",
            "match_mapping_type": "*",
            "mapping": {
              "scaling_factor": 1000,
              "type": "scaled_float"
            }
          }
        },

cc @jsoriano

@mrodm
Copy link
Contributor Author

mrodm commented Jan 13, 2025

Among the errors mentioned above:

cef: different type for url.original (keyword vs wildcard)

One option to solve this error in cef package is to update the ECS mappings that elastic-package adds automatically via import_mappings setting.

Currently, the embedded dynamic template created by elastic-package when import_mappings is enabled does not take any effect (it does not match path_match):

        {
          "_embedded_ecs-url_original_to_multifield": {
            "path_match": "*.url.original",
            "mapping": {
              "fields": {
                "text": {
                  "type": "match_only_text"
                }
              },
              "type": "wildcard"
            }
          }
        },

This dynamic template could be updated as in the ecs@mappings component template (without unmatch_mapping_type parameter since it is not supported in 8.6.x):

        {
          "ecs_path_match_wildcard_and_match_only_text": {
            "path_match": [
              "*.body.content",
              "*url.full",
              "*url.original"
            ],
            "unmatch_mapping_type": "object",
            "mapping": {
              "fields": {
                "text": {
                  "type": "match_only_text"
                }
              },
              "type": "wildcard"
            }
          }
        },

The inconvenient is that it would be needed to do a new cef release in order to include these new dynamic templates, since they are included in the built package.

WDYT about updating the embedded dynamic templates (in a follow-up PR)? Should it be created a new "empty" release in cef (with the new embedded dynamic templates) if this change is applied? Or just wait for the next cef version whenever it is created? @jsoriano

EDIT:

@jsoriano
Copy link
Member

cef: different type for url.original (keyword vs wildcard)

Why is this issue not detected with current validation? 🤔

WDYT about updating the embedded dynamic templates (in a follow-up PR)?

We probably need to do that in any case.

Should it be created a new "empty" release in cef (with the new embedded dynamic templates) if this change is applied? Or just wait for the next cef version whenever it is created?

We will need this to apply the new validation in integrations, we can wait till then.

@mrodm
Copy link
Contributor Author

mrodm commented Jan 14, 2025

cef: different type for url.original (keyword vs wildcard)

Why is this issue not detected with current validation? 🤔

Looking at the current logic to validate the fields (without mappings), IIUC wildcards are not validated or at least considered as not blocking due to this switch case (parseSingleElementValue function):

// All other types are considered valid not blocking validation.
default:
return nil
}

WDYT about updating the embedded dynamic templates (in a follow-up PR)?

We probably need to do that in any case.

Should it be created a new "empty" release in cef (with the new embedded dynamic templates) if this change is applied? Or just wait for the next cef version whenever it is created?

We will need this to apply the new validation in integrations, we can wait till then.

As this requires the ECS templates file to be updated in elastic-package, I'll move forward with this PR to prepare it for review.

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @mrodm

@mrodm
Copy link
Contributor Author

mrodm commented Jan 14, 2025

Created this PR elastic/integrations#12300 to start testing different approaches/workarounds that could be followed in the failing packages found in the integrations repository.

As a note, that integrations PR is just for testing purposes, and it is not going to be merged as it is.

@mrodm mrodm mentioned this pull request Jan 16, 2025
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants