How we built InsightEdge. Slides and talk recording

Dec 6th, 2017 12:36 pm

Sharing the slides and recording of my talk (in Russian) from the JavaDay 2016 conference.

In this talk I discuss how we built InsightEdge (http://insightedge.io), an open-source Spark distribution that runs on top of an in-memory database.

The agenda of the talk:

the need for hybrid transactional and analytical processing
an overview of in-memory data grid features
how we designed InsightEdge RDD partitions for scalability
implementing the Spark DataSource API to support DataFrame/SQL
optimization techniques: predicate push-down and column pruning
how InsightEdge can run 30 times faster than regular Spark
designing the API with Scala: the good and the unpleasant parts
extending the Spark API with geospatial queries
testing with Docker

Video recording (in Russian):

Slides:
