Big Data Processing

Managed Hadoop & Spark clusters.

Fully managed big data clusters. Run Hadoop, Spark, Presto, and Hive at petabyte scale. Auto-scaling, spot instances, and HDFS.

⚡ Hadoop & Spark

Scale: Petabytes · Engines: Hadoop/Spark · Scaling: Auto · Cost: Spot ready

Data, at scale.

Petabyte Hadoop & Spark clusters.

Petabyte scale

Process petabytes with auto-scaling clusters.

Multi-engine

Hadoop, Spark, Presto, Hive, and Flink.

Spot instances

Up to 80% cost savings with spot instance support.

HDFS & S3

Native HDFS and cloud object storage.

Auto-scaling

Scale workers based on workload demand.

Notebook integration

Jupyter and Zeppelin notebooks built in.
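The auto-scaling behavior above can be sketched as a simple bounds-clamped policy. The actual scaling algorithm isn't documented here; the heuristic below (one worker per fixed batch of pending tasks) and all names in it are illustrative assumptions, using the 5–50 worker bounds shown in the getting-started example.

```python
# Minimal sketch of worker auto-scaling: choose a worker count between
# the configured bounds based on queued task load. The one-worker-per-
# 100-tasks heuristic is an illustrative assumption, not the real policy.
def target_workers(pending_tasks: int, min_workers: int = 5,
                   max_workers: int = 50, tasks_per_worker: int = 100) -> int:
    desired = -(-pending_tasks // tasks_per_worker)  # ceiling division
    return max(min_workers, min(max_workers, desired))

print(target_workers(0))       # → 5   (idle: stays at the floor)
print(target_workers(2_000))   # → 20  (mid load)
print(target_workers(50_000))  # → 50  (capped at the ceiling)
```

Clamping to a floor keeps the cluster responsive for new jobs; the ceiling caps cost during load spikes.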

Getting started

Launch your first cluster in three steps using the CLI, console, or API.

Terminal
ur data cluster create analytics \
  --engine=spark --workers=10 \
  --spot=true --auto-scale=5-50

Big data patterns.

ETL and interactive analytics.

ETL pipelines

Transform petabytes with Spark ETL.

View tutorial
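The ETL pattern above can be sketched in plain Python: on a managed cluster, the same extract → transform → load shape runs as a Spark job over distributed DataFrames. The record schema and all names below are illustrative assumptions, not part of the product API.

```python
# Toy extract-transform-load pipeline in plain Python. On a Spark
# cluster, each stage would operate on distributed DataFrames; the
# event schema here is an illustrative assumption.
raw_events = [
    {"user": "a", "bytes": 1200, "ok": True},
    {"user": "b", "bytes": 0,    "ok": False},
    {"user": "a", "bytes": 800,  "ok": True},
]

def extract(source):
    # In a real job: read from HDFS or S3 (e.g. Parquet files).
    return iter(source)

def transform(events):
    # Filter out failed records, then aggregate bytes per user.
    totals = {}
    for e in events:
        if e["ok"]:
            totals[e["user"]] = totals.get(e["user"], 0) + e["bytes"]
    return totals

def load(totals):
    # In a real job: write results back to object storage or a warehouse.
    return sorted(totals.items())

print(load(transform(extract(raw_events))))  # → [('a', 2000)]
```

The three stages stay decoupled, so the same transform logic can be re-pointed at different sources and sinks.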

Suggested configuration

Spark · Petabyte · Auto-scale

Estimate your costs

Create detailed configurations to see exactly how much your architecture will cost. Pay for what you use, down to the second.

Configuration 1 · Estimated: $569.50/mo

Inputs: Spark engine · Compute resources (TB) · Storage & output (GB)
Managed Spark and Hadoop. Unified data lake.

Configuration 1: $569.50/mo
10 Processing Unit(s)   $500.00
Data Processed          $50.00
Storage                 $11.50
Egress                  $8.00
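The line items above sum to the monthly estimate. As a sanity check, the arithmetic can be reproduced directly; the $50-per-processing-unit rate is inferred from the 10-unit/$500.00 line and is an assumption, not a published price.

```python
# Reproduce the example estimate from the line items above.
# The $50/processing-unit monthly rate is inferred from
# 10 units = $500.00 and is an assumption, not a published price.
line_items = {
    "processing_units": 10 * 50.00,  # 10 Processing Unit(s)
    "data_processed": 50.00,
    "storage": 11.50,
    "egress": 8.00,
}
total = sum(line_items.values())
print(f"${total:.2f}/mo")  # → $569.50/mo
```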

Works seamlessly with

S3 · Kafka · VPC · IAM · Monitoring · BI


Big data, managed.

Petabyte-scale Hadoop & Spark.