Features of Spark
The features listed below explain how Spark overcomes the limitations of Hadoop MapReduce.
1. Fast Processing
Apache Spark processes data up to 100x faster in memory and up to 10x faster on disk than Hadoop MapReduce. This speed is possible because Spark reduces the number of read/write operations to disk.
2. Dynamic in Nature
Spark offers over 80 high-level operators, which makes it easy to develop parallel applications. Although Scala is the default language, we can also work with Java, Python, and R.
3. In-memory Processing
Spark increases processing speed by caching data in memory, so the time spent fetching data from disk decreases.
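Spark keeps a dataset in memory after the first computation so that later steps reuse it instead of recomputing or re-reading it. As a rough plain-Python analogy (not Spark's API), `functools.lru_cache` shows the same principle: the first call computes, later calls are served from the in-memory cache.

```python
from functools import lru_cache

calls = {"count": 0}  # track how often the expensive step actually runs

@lru_cache(maxsize=None)
def expensive_load(key):
    # Stand-in for an expensive disk read or computation.
    calls["count"] += 1
    return key * 2

first = expensive_load(21)   # computed on the first request
second = expensive_load(21)  # served from the in-memory cache

print(first, second, calls["count"])  # the expensive load ran only once
```

The same trade-off applies in Spark: caching costs memory but removes repeated disk fetches from the critical path.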
4. Fault Tolerant
Spark's primary abstraction, the Resilient Distributed Dataset (RDD), is designed to handle failures: a lost partition can be recomputed from its lineage, so data loss in case of failure is minimized.
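RDD fault tolerance rests on lineage: Spark records the recipe by which a dataset was derived and recomputes lost partitions from the source, rather than replicating the data. A toy plain-Python sketch of that idea (the `LineageRDD` class is hypothetical, not Spark's API):

```python
class LineageRDD:
    """Toy model of an RDD: stores the recipe (parent + function), not the data."""
    def __init__(self, source=None, parent=None, fn=None):
        self.source = source   # base data, only for the root RDD
        self.parent = parent   # lineage pointer to the parent RDD
        self.fn = fn           # transformation applied to the parent

    def map(self, fn):
        # A transformation records lineage; it does not materialize data.
        return LineageRDD(parent=self, fn=fn)

    def compute(self):
        # Computing walks the lineage back to the source data,
        # which is how a lost partition can be rebuilt after a failure.
        if self.parent is None:
            return list(self.source)
        return [self.fn(x) for x in self.parent.compute()]

base = LineageRDD(source=[1, 2, 3])
doubled = base.map(lambda x: x * 2)

result = doubled.compute()     # normal computation
recovered = doubled.compute()  # "after a failure": rebuilt from the same lineage
print(result, recovered)
```

Because only the recipe is stored, recovery costs recomputation time rather than duplicated storage.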
5. Stream Processing
Support for stream processing makes Apache Spark a popular framework for working with live streams of data.
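Spark Streaming handles live data as a series of small batches (micro-batching), updating results as each batch arrives. A minimal plain-Python analogy of a running word count over micro-batches (the batch contents here are made up for illustration):

```python
from collections import Counter

# Hypothetical micro-batches arriving from a live stream.
batches = [["spark", "hadoop"], ["spark"], ["streaming", "spark"]]

running = Counter()
for batch in batches:
    running.update(batch)  # update the state as each micro-batch arrives
    print(dict(running))   # emit the updated running counts
```

In real Spark Streaming the state update and output are distributed, but the micro-batch rhythm is the same.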
6. Lazy Evaluation Supportive
Transformations on an RDD do not execute immediately; each transformation simply creates a new RDD that records the operation. The computation runs only when an action is called.
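The same lazy pattern can be illustrated with Python generators: each "transformation" builds a new lazy pipeline, and nothing runs until a terminal "action" consumes it (this is an analogy, not Spark code).

```python
log = []

def source():
    # Stand-in for reading records from storage.
    for x in [1, 2, 3]:
        log.append(f"read {x}")
        yield x

# "Transformations": building these generators executes nothing yet.
mapped = (x * 10 for x in source())
filtered = (x for x in mapped if x > 10)

assert log == []         # nothing has run so far, just like RDD transformations

result = list(filtered)  # the "action": triggers the whole pipeline at once
print(result, log)
```

Deferring work like this is what lets Spark see the whole pipeline and optimize it before anything executes.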
7. Support Many Languages
Although Spark is written in Scala, it provides APIs in many languages, including Java, Python, and R.
8. Integration with Hadoop
Since Spark does not have its own file storage system, it integrates with Hadoop HDFS as well as other storage systems.
9. Spark GraphX
This component of Spark handles graphs and graph-parallel computation. GraphX exposes a set of fundamental operators as well as an optimized variant of the Pregel API.
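Pregel-style computation proceeds in supersteps: each vertex receives messages, updates its value, and sends messages along its edges until no messages remain. A compact plain-Python sketch computing single-source hop distances this way (the graph and variable names are made up for illustration; GraphX's actual Pregel API is a Scala interface):

```python
import math

# Hypothetical directed graph: vertex -> list of neighbors.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

# Vertex state: best-known hop distance from the source vertex "a".
dist = {v: math.inf for v in graph}
messages = {"a": 0}  # initial message delivered to the source vertex

while messages:  # one superstep per loop iteration
    next_messages = {}
    for vertex, proposed in messages.items():
        if proposed < dist[vertex]:      # vertex program: keep the better value
            dist[vertex] = proposed
            for nbr in graph[vertex]:    # send messages along out-edges
                candidate = proposed + 1
                if candidate < next_messages.get(nbr, math.inf):
                    next_messages[nbr] = candidate
    messages = next_messages             # computation halts when no messages remain

print(dist)
```

In GraphX the vertex program, message sending, and message merging are supplied as three functions to `Pregel`, and the supersteps run in parallel across the cluster.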
10. Cost Efficient
Apache Spark is a cost-effective solution for Big Data problems, whereas Hadoop requires large amounts of storage and data-center resources for replication.