How we built InsightEdge. Slides and talk recording

Sharing my slides and talk recording (in Russian) from the JavaDay 2016 conference.

In this talk I discuss how we built an open-source Spark distribution that runs on top of an in-memory database.

The agenda of the talk:

  • the need for hybrid transactional and analytical processing
  • an overview of in-memory data grid features
  • how we designed InsightEdge RDD partitions for scalability
  • implementing the Spark DataSource API to support DataFrame/SQL
  • optimization techniques: predicate pushdown and column pruning
  • how InsightEdge can run 30 times faster than regular Spark
  • designing an API with Scala: the good and the unpleasant parts
  • extending the Spark API with geospatial queries
  • testing with Docker

Video recording (in Russian):