Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS Neptune Event Stream as a Source #22067

Open
r2690698 opened this issue Dec 20, 2024 · 0 comments
Open

AWS Neptune Event Stream as a Source #22067

r2690698 opened this issue Dec 20, 2024 · 0 comments
Labels
source: new A request for a new source type: feature A value-adding code addition that introduce new functionality.

Comments

@r2690698
Copy link

r2690698 commented Dec 20, 2024

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

process and route graph database events generated by Amazon Neptune, a direct integration with Neptune’s event stream would greatly enhance Vector’s capabilities. Neptune’s event streams provide real-time updates for changes to graph data, such as node and edge modifications, which are crucial for applications that depend on immediate data consistency and downstream event-driven processing.

The specific use cases include:
1. Real-Time Processing of Graph Data:
Applications that rely on Neptune for graph data analysis often need to propagate updates (e.g., node additions, deletions, or property changes) to downstream systems such as message queues, databases, or analytics platforms.
Example: Propagating event changes from Neptune to an AWS SQS queue for further processing.
2. Enabling Event-Driven Architectures:
Many applications using Neptune adopt an event-driven design to react to data changes. Without direct support for Neptune streams, users must rely on intermediary systems such as AWS Lambda or Kinesis, introducing extra latency and complexity.
3. Simplifying Event Pipeline Architectures:
Direct support for Neptune streams in Vector would remove the need for additional infrastructure, making it easier to build lightweight and efficient pipelines for graph event processing.

Attempted Solutions

No response

Proposal

Proposal
Add support in Vector for Neptune event streams as a source, similar to the existing support for AWS Kinesis Streams or S3 Sources. This would allow users to directly consume, filter, and route Neptune events using Vector.

Proposed Configuration
1. Neptune Stream as a Source
A new source type, neptune_stream, can be introduced to handle data streams emitted by Neptune.
Example Configuration:

[sources.neptune]
type = "neptune_stream"
region = "us-east-1"
endpoint = "neptune-cluster.cluster-xxxxxxxxxx.us-east-1.neptune.amazonaws.com"
stream_arn = "arn:aws:neptune:us-east-1:123456789012:stream:my-stream"
credentials = { access_key_id = "xxx", secret_access_key = "xxx" }

[transforms.filter_events]
type = "remap"
inputs = ["neptune"]
source = '''
if .event_type == "ADD" && .node_type == "Customer" {
emit
}
'''

[sinks.sqs]
type = "aws_sqs"
region = "us-east-1"
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"
inputs = ["filter_events"]

2.	Features of the neptune_stream Source:
•	Support for connecting to Neptune clusters via HTTPS endpoints.
•	Automatic handling of stream events, including:
•	Node and edge additions.
•	Deletions.
•	Property updates.
•	Built-in parsing of Neptune event formats (JSON or Gremlin-based events).
•	Integration with Vector’s existing filtering and enrichment capabilities.
3.	Advanced Options:
•	Batching: Support for batching events for downstream sinks like S3 or Kafka.
•	Event Filtering: Allow users to filter events at the source level, based on metadata (e.g., node type, labels, or event type).
4.	Fallback and Error Handling:
•	Retry mechanisms for transient network issues.
•	Dead-letter queue integration for failed event processing.

Benefits
1. Simplifies Neptune event processing pipelines by removing dependencies on intermediary services (e.g., Lambda, Kinesis).
2. Reduces latency in propagating graph updates to downstream systems.
3. Leverages Vector’s efficient transformation and routing engine directly for graph event streams.
4. Offers a consistent interface for working with Neptune streams alongside other AWS services.

References

No response

Version

0.42.0

@r2690698 r2690698 added the type: feature A value-adding code addition that introduce new functionality. label Dec 20, 2024
@jszwedko jszwedko added the source: new A request for a new source label Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
source: new A request for a new source type: feature A value-adding code addition that introduce new functionality.
Projects
None yet
Development

No branches or pull requests

2 participants