Big Data Processing

Managed Hadoop & Spark clusters.

Fully managed big data clusters. Run Hadoop, Spark, Presto, and Hive at petabyte scale. Auto-scaling, spot instances, and HDFS.

⚡ Hadoop & Spark

Scale: Petabytes · Engines: Hadoop/Spark · Scaling: Auto · Cost: Spot ready

Data, at scale.

Petabyte Hadoop & Spark clusters.

Petabyte scale

Process petabytes with auto-scaling clusters.

Multi-engine

Hadoop, Spark, Presto, Hive, and Flink.

Spot instances

Up to 80% cost savings with spot instance support.

HDFS & S3

Native HDFS and cloud object storage.

Auto-scaling

Scale workers based on workload demand.

Notebook integration

Jupyter and Zeppelin notebooks built in.
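The auto-scaling behavior above can be sketched as a simple bounds-clamped policy. The actual scaling algorithm isn't documented here; the heuristic below (one worker per fixed batch of pending tasks) and all names in it are illustrative assumptions, using the 5–50 worker bounds shown in the getting-started example.

```python
# Minimal sketch of worker auto-scaling: choose a worker count between
# the configured bounds based on queued task load. The one-worker-per-
# 100-tasks heuristic is an illustrative assumption, not the real policy.
def target_workers(pending_tasks: int, min_workers: int = 5,
                   max_workers: int = 50, tasks_per_worker: int = 100) -> int:
    desired = -(-pending_tasks // tasks_per_worker)  # ceiling division
    return max(min_workers, min(max_workers, desired))

print(target_workers(0))       # → 5   (idle: stays at the floor)
print(target_workers(2_000))   # → 20  (mid load)
print(target_workers(50_000))  # → 50  (capped at the ceiling)
```

Clamping to a floor keeps the cluster responsive for new jobs; the ceiling caps cost during load spikes.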

Getting started

Launch your first cluster in three steps using the CLI, console, or API.

Terminal
ur data cluster create analytics \
  --engine=spark --workers=10 \
  --spot=true --auto-scale=5-50

Big data patterns.

ETL and interactive analytics.

ETL pipelines

Transform petabytes with Spark ETL.

View tutorial
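The ETL pattern above can be sketched in plain Python: on a managed cluster, the same extract → transform → load shape runs as a Spark job over distributed DataFrames. The record schema and all names below are illustrative assumptions, not part of the product API.

```python
# Toy extract-transform-load pipeline in plain Python. On a Spark
# cluster, each stage would operate on distributed DataFrames; the
# event schema here is an illustrative assumption.
raw_events = [
    {"user": "a", "bytes": 1200, "ok": True},
    {"user": "b", "bytes": 0,    "ok": False},
    {"user": "a", "bytes": 800,  "ok": True},
]

def extract(source):
    # In a real job: read from HDFS or S3 (e.g. Parquet files).
    return iter(source)

def transform(events):
    # Filter out failed records, then aggregate bytes per user.
    totals = {}
    for e in events:
        if e["ok"]:
            totals[e["user"]] = totals.get(e["user"], 0) + e["bytes"]
    return totals

def load(totals):
    # In a real job: write results back to object storage or a warehouse.
    return sorted(totals.items())

print(load(transform(extract(raw_events))))  # → [('a', 2000)]
```

The three stages stay decoupled, so the same transform logic can be re-pointed at different sources and sinks.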

Suggested configuration

Spark · Petabyte · Auto-scale

Estimate your costs

Create detailed configurations to see exactly how much your architecture will cost. Pay for what you use, down to the second.

Configuration 1 · Estimated: $569.50/mo

Inputs: Spark engine · Compute resources (TB) · Storage & output (GB)
Managed Spark and Hadoop. Unified data lake.

Configuration 1: $569.50/mo
10 Processing Unit(s)   $500.00
Data Processed          $50.00
Storage                 $11.50
Egress                  $8.00
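The line items above sum to the monthly estimate. As a sanity check, the arithmetic can be reproduced directly; the $50-per-processing-unit rate is inferred from the 10-unit/$500.00 line and is an assumption, not a published price.

```python
# Reproduce the example estimate from the line items above.
# The $50/processing-unit monthly rate is inferred from
# 10 units = $500.00 and is an assumption, not a published price.
line_items = {
    "processing_units": 10 * 50.00,  # 10 Processing Unit(s)
    "data_processed": 50.00,
    "storage": 11.50,
    "egress": 8.00,
}
total = sum(line_items.values())
print(f"${total:.2f}/mo")  # → $569.50/mo
```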

Works seamlessly with

S3 · Kafka · VPC · IAM · Monitoring · BI


Big data, managed.

Petabyte-scale Hadoop & Spark.