Features of Spark
The features listed below explain how Spark overcomes the limitations of Hadoop MapReduce.
1. Fast Processing
Apache Spark processes data up to 100x faster in memory and up to 10x faster on disk than Hadoop MapReduce. This speed is possible because Spark reduces the number of read/write operations to disk.
2. Dynamic in Nature
Spark offers over 80 high-level operators, which makes it easy to develop parallel applications. Although Scala is the default language, we can also work with Java, Python, and R.
3. In-memory Processing
Spark increases processing speed by caching data in memory, so the time spent fetching data from disk decreases.
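Spark keeps a dataset in memory after the first computation so that later steps reuse it instead of recomputing or re-reading it. As a rough plain-Python analogy (not Spark's API), `functools.lru_cache` shows the same principle: the first call computes, later calls are served from the in-memory cache.

```python
from functools import lru_cache

calls = {"count": 0}  # track how often the expensive step actually runs

@lru_cache(maxsize=None)
def expensive_load(key):
    # Stand-in for an expensive disk read or computation.
    calls["count"] += 1
    return key * 2

first = expensive_load(21)   # computed on the first request
second = expensive_load(21)  # served from the in-memory cache

print(first, second, calls["count"])  # the expensive load ran only once
```

The same trade-off applies in Spark: caching costs memory but removes repeated disk fetches from the critical path.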
4. Fault Tolerant
Spark's primary abstraction, the Resilient Distributed Dataset (RDD), is designed to handle failures: a lost partition can be recomputed from its lineage, so data loss in case of failure is minimized.
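RDD fault tolerance rests on lineage: Spark records the recipe by which a dataset was derived and recomputes lost partitions from the source, rather than replicating the data. A toy plain-Python sketch of that idea (the `LineageRDD` class is hypothetical, not Spark's API):

```python
class LineageRDD:
    """Toy model of an RDD: stores the recipe (parent + function), not the data."""
    def __init__(self, source=None, parent=None, fn=None):
        self.source = source   # base data, only for the root RDD
        self.parent = parent   # lineage pointer to the parent RDD
        self.fn = fn           # transformation applied to the parent

    def map(self, fn):
        # A transformation records lineage; it does not materialize data.
        return LineageRDD(parent=self, fn=fn)

    def compute(self):
        # Computing walks the lineage back to the source data,
        # which is how a lost partition can be rebuilt after a failure.
        if self.parent is None:
            return list(self.source)
        return [self.fn(x) for x in self.parent.compute()]

base = LineageRDD(source=[1, 2, 3])
doubled = base.map(lambda x: x * 2)

result = doubled.compute()     # normal computation
recovered = doubled.compute()  # "after a failure": rebuilt from the same lineage
print(result, recovered)
```

Because only the recipe is stored, recovery costs recomputation time rather than duplicated storage.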
5. Stream Processing
Support for stream processing makes Apache Spark a popular framework for working with live streams of data.
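Spark Streaming handles live data as a series of small batches (micro-batching), updating results as each batch arrives. A minimal plain-Python analogy of a running word count over micro-batches (the batch contents here are made up for illustration):

```python
from collections import Counter

# Hypothetical micro-batches arriving from a live stream.
batches = [["spark", "hadoop"], ["spark"], ["streaming", "spark"]]

running = Counter()
for batch in batches:
    running.update(batch)  # update the state as each micro-batch arrives
    print(dict(running))   # emit the updated running counts
```

In real Spark Streaming the state update and output are distributed, but the micro-batch rhythm is the same.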
6. Lazy Evaluation Supportive
Transformations on an RDD do not execute immediately; each transformation simply creates a new RDD that records the operation. The computation runs only when an action is called.
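The same lazy pattern can be illustrated with Python generators: each "transformation" builds a new lazy pipeline, and nothing runs until a terminal "action" consumes it (this is an analogy, not Spark code).

```python
log = []

def source():
    # Stand-in for reading records from storage.
    for x in [1, 2, 3]:
        log.append(f"read {x}")
        yield x

# "Transformations": building these generators executes nothing yet.
mapped = (x * 10 for x in source())
filtered = (x for x in mapped if x > 10)

assert log == []         # nothing has run so far, just like RDD transformations

result = list(filtered)  # the "action": triggers the whole pipeline at once
print(result, log)
```

Deferring work like this is what lets Spark see the whole pipeline and optimize it before anything executes.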
7. Support Many Languages
Although Spark is written in Scala, it provides APIs in many languages, including Java, Python, and R.
8. Integration with Hadoop
Since Spark does not have its own file storage system, it integrates with Hadoop HDFS as well as other storage systems.
9. Spark GraphX
This component of Spark handles graphs and graph-parallel computation. GraphX exposes a set of fundamental operators as well as an optimized variant of the Pregel API.
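Pregel-style computation proceeds in supersteps: each vertex receives messages, updates its value, and sends messages along its edges until no messages remain. A compact plain-Python sketch computing single-source hop distances this way (the graph and variable names are made up for illustration; GraphX's actual Pregel API is a Scala interface):

```python
import math

# Hypothetical directed graph: vertex -> list of neighbors.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

# Vertex state: best-known hop distance from the source vertex "a".
dist = {v: math.inf for v in graph}
messages = {"a": 0}  # initial message delivered to the source vertex

while messages:  # one superstep per loop iteration
    next_messages = {}
    for vertex, proposed in messages.items():
        if proposed < dist[vertex]:      # vertex program: keep the better value
            dist[vertex] = proposed
            for nbr in graph[vertex]:    # send messages along out-edges
                candidate = proposed + 1
                if candidate < next_messages.get(nbr, math.inf):
                    next_messages[nbr] = candidate
    messages = next_messages             # computation halts when no messages remain

print(dist)
```

In GraphX the vertex program, message sending, and message merging are supplied as three functions to `Pregel`, and the supersteps run in parallel across the cluster.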
10. Cost Efficient
Apache Spark is a cost-effective solution for Big Data problems, whereas Hadoop requires large amounts of storage and data-center resources for replication.