Key Capabilities
Why teams choose DataStream for Elastic
A purpose-built pipeline that unlocks the full value of Elastic – without the infrastructure and storage costs that usually come with comprehensive log coverage.
Wide source coverage
On-premises, cloud, legacy systems, OT/ICS networks, IoT devices, and custom applications — all widely used sources are supported out of the box with no custom Filebeat plugin or Beats configuration required.
ECS normalization engine
Automated mapping and validation against Elastic Common Schema. Every event is enriched with the correct ECS fields so Kibana dashboards, detection rules, and analytics work on day one.
Data ingest cost optimization
Filter, deduplicate, sample, and route at the edge before data hits Elasticsearch. Typical customers reduce their daily ingest volume by 50–90% while expanding security coverage.
Multi-stage routing architecture
Intelligent routing sends security-relevant data to Elasticsearch, full data to Elasticsearch storage tiers, and raw data as JSON or Parquet to cloud storage via dedicated cloud integrations, while supporting Elasticsearch-native features – all from a single pipeline.
Schema drift detection
Automated validation prevents schema changes from breaking detection rules, Kibana visualizations, or compliance reports, and proactive alerting flags data quality issues before they reach Elasticsearch.
Elastic Security ready
ECS-compliant output ensures Elastic Security detection rules, alerts, and investigation workflows function optimally – with no manual field extraction or schema mapping required.
ARCHITECTURE
How DataStream works
A multi-stage pipeline that processes raw logs before they reach your Elastic environment: each step improves data quality and reduces infrastructure cost.
Ingest
Logs from any source via Elastic Agents, Beats (Filebeat, etc.), agentless collection (WinRM/SSH), Syslog, CEF, LEEF, HTTP, and direct APIs
Parse & normalize
ECS-aware parsers and field mapping for Windows Events, Syslog, JSON, and custom formats. All transformations use the native Elasticsearch Ingest Pipeline format.
Enrich & validate
Contextual metadata enrichment + schema validation against Elastic Common Schema (ECS) requirements
Filter & optimize
Deduplication, sampling, and field extraction reduce volume by 50–90%
Route & deliver
Multi-stage routing: ECS-normalized security events → Elastic Security, full data → Elasticsearch, raw data → cloud storage (JSON or Parquet)
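To make the routing stage concrete, here is a minimal sketch of how a per-event routing decision could look. It is an illustration only, not DataStream's actual configuration: the destination names and the set of "security-relevant" categories are assumptions, while `event.category` and its values are standard ECS.

```python
from typing import Dict, List

# Assumption: which ECS event.category values count as security-relevant.
SECURITY_CATEGORIES = {"authentication", "intrusion_detection", "malware"}


def route(event: Dict) -> List[str]:
    """Return the (hypothetical) destinations for one ECS-normalized event."""
    # Every event is archived raw and sent in full; names are illustrative.
    destinations = ["cloud_storage_raw", "elasticsearch_full"]

    # ECS defines event.category as an array, but tolerate a single string.
    categories = event.get("event", {}).get("category", [])
    if isinstance(categories, str):
        categories = [categories]

    # Only security-relevant events also go to the security destination.
    if SECURITY_CATEGORIES.intersection(categories):
        destinations.append("elastic_security")
    return destinations
```

In this sketch the routing rule is a pure function of the event, which keeps it easy to test before any destination is wired up.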
API & authentication
- Elasticsearch Bulk API with native protocol support
- API key and basic authentication with TLS
- Elasticsearch Ingest Pipeline execution with error handling
- High-throughput batching with retry logic
- Rate limiting & throttle prevention built-in
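The batching and retry behavior listed above can be sketched as follows. This is a hedged illustration rather than DataStream's implementation: the helper names, batch size, and backoff values are assumptions. The request body follows the Elasticsearch Bulk API's newline-delimited JSON format (one action line, then one source line per document), and the actual HTTP call is left to a caller-supplied `send` function.

```python
import json
import time
from typing import Callable, Dict, Iterable, List

RETRYABLE = (429, 502, 503, 504)  # assumption: statuses worth retrying


def build_bulk_body(index: str, events: Iterable[Dict]) -> str:
    """Build an NDJSON body for POST /_bulk (Content-Type: application/x-ndjson)."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(event))                         # source line
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


def batches(events: List[Dict], size: int) -> Iterable[List[Dict]]:
    """Split events into fixed-size batches."""
    for start in range(0, len(events), size):
        yield events[start:start + size]


def send_with_retry(send: Callable[[str], int], body: str,
                    max_attempts: int = 3, base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call send(body) -> HTTP status, retrying with exponential backoff."""
    for attempt in range(max_attempts):
        status = send(body)
        if status < 300:
            # A real client must still inspect the response's per-item results,
            # because a bulk request can succeed overall with failed items.
            return True
        if status not in RETRYABLE:
            return False
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return False
```

Backing off on HTTP 429 is what keeps a high-throughput sender from making throttling worse.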
ECS components
- Pre-built ECS field definitions and aliases for common security data sources
- Field mapping aligned to ECS data structures
- Anomaly detection on schema drift
- Compatible with Elastic Security detection rules
- Optimized for Kibana visualizations and dashboards
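As an illustration of what schema drift detection involves, the sketch below compares an incoming document against an expected set of fields and types. The baseline and the finding labels are hypothetical; the document does not describe how DataStream implements this check.

```python
from typing import Any, Dict, List

# Hypothetical baseline: flattened ECS field path -> expected Python type.
EXPECTED = {"@timestamp": str, "source.ip": str, "destination.port": int}


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dotted paths, e.g. {'source': {'ip': x}} -> 'source.ip'."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def detect_drift(doc: Dict[str, Any], expected: Dict[str, type] = EXPECTED) -> List[str]:
    """Report missing fields, changed types, and fields outside the baseline."""
    flat = flatten(doc)
    findings = []
    for field, expected_type in expected.items():
        if field not in flat:
            findings.append(f"missing: {field}")
        elif not isinstance(flat[field], expected_type):
            findings.append(f"type changed: {field}")
    for field in sorted(set(flat) - set(expected)):
        findings.append(f"unexpected: {field}")
    return findings
```

A port that starts arriving as the string "443" instead of a number is the kind of silent change that breaks a detection rule, and a check of this shape would surface it.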
Deployment options
- Docker / Kubernetes container deployment
- On-premises agent deployment
- Cloud-native (AWS, Azure, GCP, Elastic Cloud)
- Air-gapped & data-residency configurations
- Multi-tenant MSSP configurations
Built for enterprise challenges
Elasticsearch ingest cost reduction
Elasticsearch and Elastic Cloud pricing scales with data volume and retention. DataStream acts as an intelligent optimization layer that reduces ingest volume by 50–90% through ECS-aware filtering, deduplication, and field extraction, without creating security blind spots or breaking existing Kibana dashboards and detection rules.
Legacy & OT integration
Legacy systems, OT networks, IoT devices, and custom applications often lack native Elastic Agents or Beats. DataStream provides ready-made connectors for all widely used sources and flexible transformation pipelines that eliminate custom development and reduce deployment complexity.
Elastic Security activation
Elastic Security’s detection rules and investigation workflows rely on ECS-compliant data to function optimally. DataStream’s automated ECS mapping ensures all ingested data is immediately available for security alerting, threat detection, and investigation without manual field extraction or schema mapping.
MSSP customer onboarding
MSSPs and security integrators can onboard new customers to Elasticsearch and Elastic Cloud faster with reduced integration complexity. Pre-built connectors and automated ECS normalization reduce custom development effort and accelerate time-to-deployment.
ROI & SAVINGS
Measurable impact on your Elastic infrastructure
With DataStream, you can extend coverage to more sources without increasing your Elastic infrastructure costs.
Data ingest volume before indexing
Field-level optimization removes empty values, null fields, and operational metadata for an immediate 55–60% reduction. Optional event-level filtering and sampling push total reduction to 70–90% – with security-critical events always retained.
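Field-level optimization of this kind can be sketched in a few lines. The example below is an assumption-laden illustration, not DataStream's code: it treats `None`, empty strings, empty lists, and empty objects as removable, and measures the reduction on the serialized JSON size. Actual savings depend entirely on your data.

```python
import json
from typing import Any


def is_empty(value: Any) -> bool:
    """None, '', [] and {} count as empty; 0 and False are real values."""
    return value is None or (isinstance(value, (str, list, dict)) and len(value) == 0)


def prune(value: Any) -> Any:
    """Recursively drop empty fields, including objects left empty after pruning."""
    if isinstance(value, dict):
        cleaned = {key: prune(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if not is_empty(item)}
    if isinstance(value, list):
        cleaned = [prune(item) for item in value]
        return [item for item in cleaned if not is_empty(item)]
    return value


def reduction_percent(original: Any, optimized: Any) -> float:
    """Percentage saved, measured on UTF-8 encoded JSON."""
    before = len(json.dumps(original).encode("utf-8"))
    after = len(json.dumps(optimized).encode("utf-8"))
    return 100.0 * (1 - after / before)
```

Running `reduction_percent(event, prune(event))` over a sample of your own logs gives a quick, source-specific estimate before any filtering or sampling is applied.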
Deployment effort reduction
Pre-built connectors and automated ECS normalization eliminate custom development work and accelerate time-to-value.
Comparison
VirtualMetric vs. Alternatives
How does DataStream compare to other data pipeline solutions for Elastic Stack?
| | Cribl Stream | Logstash | Native Beats | VirtualMetric DataStream |
|---|---|---|---|---|
| Native Elasticsearch integration | | | | |
| Automated ECS normalization | Manual | Plugin-based | Per Beat | |
| Elastic Security ready | Generic | Manual | Beat-dependent | |
| OT / Legacy / IoT connectors | Add-ons required | Custom config | Limited | |
| Multi-stage routing | Basic | Pipeline only | | |
| Data ingest cost reduction | Generic | | | |
| Raw data – cloud storage (Parquet) | Manual | | | |
| Schema drift detection | | | | |
Frequently asked questions
How can I reduce Elasticsearch ingest costs without losing security visibility?
Elasticsearch pricing scales with the volume of data stored and indexed. DataStream reduces that volume through a layered approach. By default, field-level optimization removes empty values, null fields, and operational metadata that Elastic Security detection rules never reference, achieving a 55–60% reduction with no security risk. Optional event-level filtering and statistical sampling can push total reduction to 70–90%, with security-critical events always protected. Full raw logs are simultaneously routed to low-cost cloud storage (AWS S3, Azure Blob, Google Cloud Storage) with a Correlation ID, so analysts can retrieve complete records for forensic investigations when needed.
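One way to sample while always protecting security-critical events is sketched below. This is an assumption, not DataStream's documented algorithm: the policy that only `event.kind: alert` is critical is hypothetical, and the hash-based bucket is simply one deterministic way to sample.

```python
import hashlib
import json
from typing import Dict

# Hypothetical policy: ECS event.kind values that must never be dropped.
CRITICAL_KINDS = {"alert"}


def keep_event(event: Dict, sample_rate: float) -> bool:
    """Keep critical events always; keep roughly sample_rate of the rest."""
    if event.get("event", {}).get("kind") in CRITICAL_KINDS:
        return True
    # Hash the canonical JSON so identical events get the same decision.
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2 ** 64  # value in [0, 1)
    return bucket < sample_rate
```

Because the decision is derived from the event content rather than a random number, replaying the same logs produces the same sample.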
Read more: How to Reduce SIEM Costs Without Losing Security Visibility
Why do I need a pipeline tool if Elasticsearch already has Ingest Pipelines and Beats?
Elasticsearch Ingest Pipelines and Beats handle ingestion and basic transformations within the Elastic Stack, but they have significant limitations. Beats can’t collect from systems where an agent can’t be installed, Ingest Pipelines don’t automatically normalize data to ECS across diverse source types, and both send everything to a single Elasticsearch cluster. DataStream operates before data reaches Elasticsearch: it collects from any source via agentless collection (WinRM/SSH) or agents where needed, applies vendor-specific ECS normalization using the native Elasticsearch Ingest Pipeline format, reduces volume before it ever hits Elasticsearch’s storage meter, and routes different data types to the right destination based on security value.
How does ECS normalization work, and do I need to set it up manually?
Elastic Common Schema (ECS) provides a standardized structure for security events. DataStream handles ECS mapping automatically: when logs arrive from a supported source, the multi-schema processing engine applies vendor-specific field mappings using the native Elasticsearch Ingest Pipeline format, validates the output against ECS schema requirements, and routes the normalized data to Elasticsearch. No manual parser writing or field mapping is required for supported sources.
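For readers who want to see what a vendor-specific field mapping looks like, here is a minimal sketch. The vendor field names on the left are hypothetical; the ECS field names on the right and the `rename` processor of Elasticsearch Ingest Pipelines are real. It shows the general shape of the technique, not DataStream's shipped content packs.

```python
from typing import Any, Dict, Tuple

# Hypothetical firewall field names mapped to ECS field names.
FIREWALL_TO_ECS = {
    "srcip": "source.ip",
    "dstip": "destination.ip",
    "srcport": "source.port",
    "dstport": "destination.port",
    "user": "user.name",
    "action": "event.action",
}


def set_nested(doc: Dict[str, Any], dotted: str, value: Any) -> None:
    """Write value at a dotted path, creating intermediate objects."""
    parts = dotted.split(".")
    node = doc
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


def to_ecs(raw: Dict[str, Any],
           mapping: Dict[str, str] = FIREWALL_TO_ECS) -> Tuple[Dict, Dict]:
    """Return (ecs_document, unmapped_fields) for one raw vendor event."""
    ecs: Dict[str, Any] = {}
    unmapped: Dict[str, Any] = {}
    for key, value in raw.items():
        if key in mapping:
            set_nested(ecs, mapping[key], value)
        else:
            unmapped[key] = value
    return ecs, unmapped


def to_ingest_pipeline(mapping: Dict[str, str]) -> Dict[str, Any]:
    """Express the same mapping as an Ingest Pipeline built from rename processors."""
    return {
        "description": "Illustrative vendor-to-ECS renames",
        "processors": [
            {"rename": {"field": src, "target_field": dst, "ignore_missing": True}}
            for src, dst in mapping.items()
        ],
    }
```

Returning the unmapped fields separately makes gaps in a mapping visible instead of silently dropping data.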
How do I connect sources that don’t have native Elastic Agents or Beats?
DataStream supports both agentless and agent-based collection. Agentless collection connects directly via WinRM (Windows) or SSH (Linux, macOS, Solaris, AIX) with no software installation. For network devices, OT/ICS systems, and security appliances, it supports Syslog, CEF, LEEF, and REST APIs. Pre-built content packs cover all widely used vendors – Fortinet, Palo Alto, Check Point, CrowdStrike, CyberArk, Zscaler, and more – each activating automatically when logs from that vendor are detected.
How does DataStream handle sensitive data before it reaches Elasticsearch?
DataStream applies policy-based redaction and masking in the pipeline, before data leaves your environment. You define rules through a no-code UI to automatically remove or obfuscate sensitive fields: usernames, passwords, tokens, PII such as email addresses and phone numbers, or any custom field. Redaction is applied consistently across all incoming data, eliminating the risk of uneven coverage from manual processes. The structure and security context of each log remain intact, so Elasticsearch detection rules and correlations continue to work accurately. Policies are designed to support GDPR, HIPAA, and PCI DSS requirements.
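The effect of a redaction policy can be illustrated with a short sketch. DataStream configures these rules through its UI, so the code below is only an approximation of the outcome: the sensitive key names, the placeholder strings, and the simplified email pattern are all assumptions.

```python
import re
from typing import Any

# Assumptions: a simplified email pattern and a small set of sensitive keys.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SENSITIVE_KEYS = {"password", "token"}


def redact(value: Any, key: str = "") -> Any:
    """Mask sensitive keys entirely and email addresses inside any string."""
    if key.lower() in SENSITIVE_KEYS:
        return "[REDACTED]"
    if isinstance(value, dict):
        return {name: redact(item, name) for name, item in value.items()}
    if isinstance(value, list):
        return [redact(item, key) for item in value]
    if isinstance(value, str):
        return EMAIL.sub("[EMAIL]", value)
    return value
```

Note that the field names and document structure are unchanged, which is what lets detection rules keep matching after redaction.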
Can DataStream send data to both Elasticsearch and cloud storage simultaneously?
Yes, multi-destination routing is a core capability. DataStream can simultaneously send ECS-normalized, security-relevant events to Elasticsearch or Elastic Cloud for real-time analytics, full data volume to Elasticsearch storage tiers for long-term retention, and raw logs in JSON or Parquet format to AWS S3, Azure Blob Storage, or Google Cloud Storage for archival at a fraction of Elasticsearch’s storage cost. Each destination receives the appropriate data tier based on security value, with a Correlation ID linking optimized Elasticsearch data back to complete raw logs in archive storage.
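The Correlation ID idea can be sketched as follows: the same identifier is stamped on the optimized copy and on the raw copy, so a record in Elasticsearch can later be matched to its full original in archive storage. The field name `correlation_id` and the use of a random UUID are assumptions; the document does not specify how DataStream generates or names the ID.

```python
import uuid
from typing import Callable, Dict, Tuple


def split_for_destinations(raw_event: Dict,
                           optimize: Callable[[Dict], Dict]) -> Tuple[Dict, Dict]:
    """Return (optimized_copy, raw_copy) sharing one correlation ID."""
    correlation_id = str(uuid.uuid4())  # assumption: random UUID per event
    # Copies are built with dict(...) so the caller's event is not modified.
    raw_copy = dict(raw_event, correlation_id=correlation_id)
    optimized = dict(optimize(raw_event), correlation_id=correlation_id)
    return optimized, raw_copy
```

An analyst who finds a trimmed event in Elasticsearch can then search the archive for the same `correlation_id` to retrieve the complete record.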
How long does it take to deploy DataStream for Elasticsearch?
Initial deployment setup takes under 30 minutes. DataStream’s guided configuration automatically handles Elasticsearch authentication, endpoint configuration, and pipeline routing – no manual Elasticsearch cluster reconfiguration required. The actual time-to-value depends on the complexity of your data sources and normalization requirements. Contact our team for a technical consultation on your specific deployment timeline.
Talk to our experts
Schedule a technical session with our engineering team to see how DataStream compares to what you’re running today.