Sharing the slides and the recording (in Russian) of my talk at the JavaDay 2016 conference.
In this talk I discuss how we built InsightEdge (http://insightedge.io), an open-source Spark distribution that runs on top of an in-memory database.
The agenda of the talk:
- the need for hybrid transactional and analytical processing
- an overview of in-memory data grid features
- how we designed InsightEdge RDD partitions for scalability
- implementing the Spark DataSource API to support DataFrame/SQL
- optimization techniques: predicate pushdown and column pruning
- how InsightEdge can run 30 times faster than regular Spark
- designing an API with Scala: the good and the unpleasant parts
- extending the Spark API with geospatial queries
- testing with Docker
Video recording (in Russian):