Brief Description

The RapidMiner Streaming extension allows users to easily build streaming processes in a guided, visual workflow designer. It supports Apache Flink and Spark Streaming clusters as back-ends and includes connection management for switching effortlessly between different platforms or cluster set-ups. The extension is an outcome of the H2020 research project INFORE (http://infore-project.eu/).

Main Features
  • Graphically design data processing workflows and data analytics tasks with minimal or no programming overhead

  • Real-time, interactive machine learning and data mining tools

  • Distributed Complex Event Forecasting

  • Code-free development

  • Platform and back-end independent

  • Pluggable connection management

  • Easy to share and collaborate

Areas of Application
  • Streaming Analytics

  • Designing Spark or Flink streaming processes

  • Complex Event Forecasting

  • Machine Learning

Market Trends and Opportunities

Monitoring and analyzing data streams is becoming increasingly common. However, accessing these data sources still requires either predefined solutions or the coding skills to write one's own applications. A code-free, user-friendly workflow designer makes tapping into streaming data more accessible to analysts and data scientists.

Combining streaming data sources with an enterprise-ready data science platform opens up new use cases for the user-centric development and deployment of AI methods.

Customer Benefits

Users can easily design and execute streaming applications without having to write custom code.

The connection management makes it easy to switch between different set-ups, and because all processes are platform-agnostic, supporting multiple platforms is straightforward. The extension also combines the ability to retrieve data from streaming sources via Apache Kafka with the machine learning capabilities of the RapidMiner data science tools.

Technological Novelty

The multi-platform support and workflow optimization features are new developments that address the need to customize resource allocation and process performance in multi-cluster environments (e.g., by moving load-heavy operations closer to the data source).