[GCS] Adding new integration for Custom GCS Input #4692

Merged
merged 10 commits into from
Feb 7, 2023
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -87,6 +87,7 @@
/packages/gcp_pubsub @elastic/security-external-integrations
/packages/github @elastic/security-external-integrations
/packages/golang @elastic/obs-service-integrations
/packages/google_cloud_storage @elastic/security-external-integrations
/packages/google_workspace @elastic/security-external-integrations
/packages/hadoop @elastic/obs-service-integrations
/packages/haproxy @elastic/obs-service-integrations
4 changes: 4 additions & 0 deletions packages/google_cloud_storage/_dev/build/build.yml
@@ -0,0 +1,4 @@
dependencies:
ecs:
reference: git@v8.6.0
import_mappings: true
28 changes: 28 additions & 0 deletions packages/google_cloud_storage/_dev/build/docs/README.md
@@ -0,0 +1,28 @@
# Custom GCS (Google Cloud Storage) Input

Use the `google cloud storage input` to read content from files stored in buckets that reside in your Google Cloud account.
The input can be configured to work with or without polling. Currently, if polling is disabled, the input only performs a
one-time pass: it lists the file contents and ends the process. Polling is generally recommended for most use cases,
even though it can become expensive when dealing with a very large number of files.

*To mitigate errors and ensure a stable processing environment, this input employs the following features:*

1. When processing Google Cloud buckets, if an outage occurs, the process is able to resume from the last file it processed and was successfully able to save the state for.

2. If errors occur for certain files, they are logged appropriately, but the rest of the
files continue to be processed normally.

3. If a major error occurs and stops the main thread, a log describing that error is generated.


NOTE: Currently only `JSON` object/file formats are supported, including gzipped JSON objects/files. As for authentication types,
`json credential keys` and `credential files` are currently supported. If a download for a file/object fails or gets interrupted, the download is retried up to 2 times.
This is currently not user configurable.
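
For illustration, the `Buckets` configuration option accepts a YAML list of bucket definitions. The following is a minimal sketch modeled on this package's system test configuration; the bucket name and interval are placeholders:

```yaml
- name: testbucket      # GCS bucket to read objects from
  poll: true            # keep polling the bucket for new objects
  poll_interval: 15s    # how often to check the bucket for new objects
```

Options such as `max_workers` and `bucket_timeout` can also be set per bucket or at the root level.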


## ECS Field Mapping
This integration includes the ECS dynamic template. All fields that follow the ECS schema are automatically assigned the correct index field mapping and do not need to be added manually.

## Ingest Pipelines
Custom ingest pipelines may be added by setting the pipeline name in the pipeline configuration option. Custom ingest pipelines can be created either through the API or the [Ingest Node Pipeline UI](/app/management/ingest/ingest_pipelines/).
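
For example, a minimal custom pipeline could be created through the API and then referenced by name in the integration's Ingest Pipeline option. The sketch below is hypothetical (pipeline name and processor chosen for illustration):

```
PUT _ingest/pipeline/my-gcs-custom-pipeline
{
  "description": "Hypothetical example pipeline for custom GCS events",
  "processors": [
    { "set": { "field": "event.category", "value": ["file"] } }
  ]
}
```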
@@ -0,0 +1,9 @@
version: '2.3'
services:
google-cloud-storage-emulator:
image: fsouza/fake-gcs-server:latest
command: -host=0.0.0.0 -public-host=elastic-package-service_google-cloud-storage-emulator_1 -port=4443 -scheme=http
volumes:
- ./sample_logs:/data
ports:
- 4443/tcp
@@ -0,0 +1 @@
{ "testmessage": "success" }
5 changes: 5 additions & 0 deletions packages/google_cloud_storage/changelog.yml
@@ -0,0 +1,5 @@
- version: "0.1.0"
changes:
- description: Initial Release
type: enhancement
link: https://github.com/elastic/integrations/pull/4692
@@ -0,0 +1,10 @@
service: google-cloud-storage-emulator
input: gcs
data_stream:
vars:
project_id: testproject
alternative_host: "http://{{Hostname}}:{{Port}}"
buckets: |
- name: testbucket
poll: true
poll_interval: 15s
@@ -0,0 +1,34 @@
data_stream:
dataset: {{data_stream.dataset}}
{{#if pipeline}}
pipeline: {{pipeline}}
{{/if}}
{{#if project_id}}
project_id: {{project_id}}
{{/if}}
{{#if alternative_host}}
alternative_host: {{alternative_host}}
{{/if}}
{{#if service_account_key}}
auth.credentials_json.account_key: {{service_account_key}}
{{/if}}
{{#if service_account_file}}
auth.credentials_file.path: {{service_account_file}}
{{/if}}
{{#if buckets}}
Contributor comment: Another field `parse_json` could be added as per the doc.
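
A hypothetical sketch of such an optional block, mirroring the template's other vars (not part of this change):

{{#if parse_json}}
parse_json: {{parse_json}}
{{/if}}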

buckets:
{{buckets}}
{{/if}}
{{#if tags}}
tags:
{{#each tags as |tag i|}}
- {{tag}}
{{/each}}
{{/if}}
{{#contains "forwarded" tags}}
publisher_pipeline.disable_host: true
{{/contains}}
{{#if processors}}
processors:
{{processors}}
{{/if}}
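
For illustration, rendering this template with the system test variables shown earlier (plus the manifest defaults for dataset and tags) would produce approximately the following input configuration; the emulator host is the one used by the tests:

data_stream:
  dataset: google_cloud_storage.generic
project_id: testproject
alternative_host: http://elastic-package-service_google-cloud-storage-emulator_1:4443
buckets:
  - name: testbucket
    poll: true
    poll_interval: 15s
tags:
  - forwarded
  - google_cloud_storage-generic
publisher_pipeline.disable_host: true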
@@ -0,0 +1,12 @@
- name: data_stream.type
type: constant_keyword
description: Data stream type.
- name: data_stream.dataset
type: constant_keyword
description: Data stream dataset.
- name: data_stream.namespace
type: constant_keyword
description: Data stream namespace.
- name: "@timestamp"
type: date
description: Event timestamp.
@@ -0,0 +1,9 @@
- name: input.type
description: Type of Filebeat input.
type: keyword
- name: tags
type: keyword
description: User defined tags
- name: log.offset
type: long
description: Log offset
@@ -0,0 +1,15 @@
- name: gcs.storage
type: group
fields:
- name: bucket.name
type: keyword
description: The name of the Google Cloud Storage Bucket.
- name: object.json_data
type: keyword
description: When parse_json is true, the resulting JSON data is stored in this field.
- name: object.name
type: keyword
description: The name of the Google Cloud Storage object.
- name: object.content_type
type: keyword
description: The content type of the Google Cloud Storage object.
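
As an illustrative sketch (values are placeholders, not from this PR), an event using these fields would carry a fragment shaped like:

{
  "gcs": {
    "storage": {
      "bucket": { "name": "testbucket" },
      "object": {
        "name": "testdata.log",
        "content_type": "application/json"
      }
    }
  }
}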
77 changes: 77 additions & 0 deletions packages/google_cloud_storage/data_stream/generic/manifest.yml
@@ -0,0 +1,77 @@
title: Custom GCS (Google Cloud Storage) Input
type: logs
streams:
- input: gcs
description: Collect JSON data from configured GCS Bucket with Elastic Agent.
title: Custom GCS (Google Cloud Storage) Input
template_path: gcs.yml.hbs
vars:
- name: data_stream.dataset
type: text
title: Dataset name
description: |
Dataset to write data to. Changing the dataset will send the data to a different index. You cannot use `-` in the name of a dataset, and only characters that are valid for [Elasticsearch index names](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html) are permitted.
default: google_cloud_storage.generic
required: true
show_user: true
- name: pipeline
type: text
title: Ingest Pipeline
description: |
The Ingest Node pipeline ID to be used by the integration.
required: false
show_user: true
- name: project_id
type: text
title: Project ID
description: |
This attribute is required for various internal operations such as authentication, creating storage clients, and logging, which are used internally for various processing purposes.
required: true
show_user: true
- name: service_account_key
type: text
title: Service Account Key
description: |
This attribute contains the JSON service account credentials string, which can be generated from the Google Cloud console. See [Service Account Keys](https://cloud.google.com/iam/docs/creating-managing-service-account-keys).
Required if a Service Account File is not provided.
required: false
show_user: true
- name: service_account_file
type: text
title: Service Account File
description: |
This attribute contains the path to the service account credentials file, which can be generated from the Google Cloud console. See [Service Account Keys](https://cloud.google.com/iam/docs/creating-managing-service-account-keys).
Required if a Service Account Key is not provided.
required: false
show_user: true
- name: alternative_host
type: text
title: Alternative Host
description: Used to override the default host for the storage client (default is storage.googleapis.com)
required: false
multi: false
show_user: false
- name: buckets
type: yaml
title: Buckets
description: "This attribute contains the details about a specific bucket like name, max_workers, poll, poll_interval and bucket_timeout. The attribute name is specific to a bucket as it describes the bucket name, while the fields max_workers, poll, poll_interval and bucket_timeout can exist both at the bucket level and the root level. \nIt is internally represented as an array, so multiple buckets can be provided.\nFor more information about each attribute, please see the relevant [Documentation](https://www.elastic.co/guide/en/beats/filebeat/8.5/filebeat-input-gcs.html#attrib-buckets).\n"
required: true
show_user: true
- name: processors
type: yaml
title: Processors
multi: false
required: false
show_user: false
description: |
Processors are used to reduce the number of fields in the exported event or to enhance the event with metadata. This executes in the agent before the logs are parsed. See [Processors](https://www.elastic.co/guide/en/beats/filebeat/current/filtering-and-enhancing-data.html) for details.
- name: tags
type: text
title: Tags
description: Tags to include in the published event
required: false
default:
- forwarded
- google_cloud_storage-generic
multi: true
show_user: true
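
To illustrate how the two authentication vars map onto the rendered input configuration (see the gcs.yml.hbs template above), a hypothetical sketch with placeholder values follows; exactly one of the two should be provided:

# Option 1: Service Account Key (inline JSON credentials string)
auth.credentials_json.account_key: '{"type": "service_account", "project_id": "my-project", "private_key": "..."}'

# Option 2: Service Account File (path to a credentials file on the Elastic Agent host)
auth.credentials_file.path: /etc/gcs-credentials/service-account.json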
@@ -0,0 +1,36 @@
{
"@timestamp": "2023-02-06T21:52:46.349Z",
"agent": {
"ephemeral_id": "3cf23ee4-8245-40b5-b4d2-81fddcb3d7d3",
"id": "754dc12a-6bfa-45fb-ab62-c686495d3d66",
"name": "docker-fleet-agent",
"type": "filebeat",
"version": "8.7.0"
},
"data_stream": {
"dataset": "google_cloud_storage.generic",
"namespace": "ep",
"type": "logs"
},
"ecs": {
"version": "8.0.0"
},
"elastic_agent": {
"id": "754dc12a-6bfa-45fb-ab62-c686495d3d66",
"snapshot": true,
"version": "8.7.0"
},
"event": {
"agent_id_status": "verified",
"dataset": "google_cloud_storage.generic",
"ingested": "2023-02-06T21:52:47Z"
},
"input": {
"type": "gcs"
},
"message": "job with jobId testbucket-testdata.log-worker-0 encountered an error: content-type text/plain; charset=utf-8 not supported",
"tags": [
"forwarded",
"google_cloud_storage-generic"
]
}
28 changes: 28 additions & 0 deletions packages/google_cloud_storage/docs/README.md
@@ -0,0 +1,28 @@
# Custom GCS (Google Cloud Storage) Input

Use the `google cloud storage input` to read content from files stored in buckets that reside in your Google Cloud account.
The input can be configured to work with or without polling. Currently, if polling is disabled, the input only performs a
one-time pass: it lists the file contents and ends the process. Polling is generally recommended for most use cases,
even though it can become expensive when dealing with a very large number of files.

*To mitigate errors and ensure a stable processing environment, this input employs the following features:*

1. When processing Google Cloud buckets, if an outage occurs, the process is able to resume from the last file it processed and was successfully able to save the state for.

2. If errors occur for certain files, they are logged appropriately, but the rest of the
files continue to be processed normally.

3. If a major error occurs and stops the main thread, a log describing that error is generated.


NOTE: Currently only `JSON` object/file formats are supported, including gzipped JSON objects/files. As for authentication types,
`json credential keys` and `credential files` are currently supported. If a download for a file/object fails or gets interrupted, the download is retried up to 2 times.
This is currently not user configurable.


## ECS Field Mapping
This integration includes the ECS dynamic template. All fields that follow the ECS schema are automatically assigned the correct index field mapping and do not need to be added manually.

## Ingest Pipelines
Custom ingest pipelines may be added by setting the pipeline name in the pipeline configuration option. Custom ingest pipelines can be created either through the API or the [Ingest Node Pipeline UI](/app/management/ingest/ingest_pipelines/).
4 changes: 4 additions & 0 deletions packages/google_cloud_storage/img/icon.svg
24 changes: 24 additions & 0 deletions packages/google_cloud_storage/manifest.yml
@@ -0,0 +1,24 @@
format_version: 2.3.0
name: google_cloud_storage
title: Custom GCS (Google Cloud Storage) Input
description: Collect JSON data from configured GCS Bucket with Elastic Agent.
type: integration
version: "0.1.0"
conditions:
kibana.version: "^8.6.2"
categories:
- custom
- cloud
policy_templates:
- name: gcs
title: Custom GCS (Google Cloud Storage) Input
description: Collect JSON data from configured GCS Bucket with Elastic Agent.
inputs:
- type: gcs
title: Custom GCS (Google Cloud Storage) Input
description: Collect JSON data from configured GCS Bucket with Elastic Agent.
icons:
- src: "/img/icon.svg"
type: "image/svg+xml"
owner:
github: elastic/security-external-integrations