SOON, the Station Observation Outlier fiNder, is an application that identifies errors in large data flows that need real-time filtering and cleaning. Through an easy-to-use interface, SOON provides real-time alerts on detected anomalies, and further statistics on historical data.
SOON builds on artificial intelligence and machine learning to learn the expected behavior of a complex system. A big data approach makes the solution scalable and flexible.
The idea behind SOON is that data flows can be screened from different perspectives simultaneously. This approach makes the most of the available information. At the same time, the process becomes resilient to missing information: if one perspective is unavailable, the others can still be used to examine the data.
To detect errors, SOON uses machine learning to check the internal consistency of the data flow from temporal, spatial and parametric perspectives.
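The multi-perspective screening described above can be illustrated with a minimal sketch. It is not SOON's actual implementation: it swaps the machine learning models for a simple robust score (distance from the median in scaled MAD units), and the names `robust_score` and `screen`, the threshold of 3.5, and the majority-vote rule are all assumptions made for illustration. The point it shows is that a perspective with no data is skipped while the remaining ones still produce a verdict.

```python
from statistics import median


def robust_score(value, reference):
    """Return |value - median(reference)| in scaled-MAD units.

    Returns None when this perspective cannot be evaluated
    (missing value or too few reference points).
    """
    if value is None or reference is None or len(reference) < 3:
        return None
    med = median(reference)
    mad = median([abs(x - med) for x in reference])
    if mad == 0:
        # Degenerate reference sample: any deviation is infinitely unusual.
        return 0.0 if value == med else float("inf")
    return abs(value - med) / (1.4826 * mad)


def screen(views, threshold=3.5):
    """Screen one observation from several perspectives.

    `views` maps a perspective name to (value, reference sample).
    Hypothetical decision rule: flag the observation when more than
    half of the *available* perspectives exceed the threshold.
    """
    scores = {}
    for name, (value, reference) in views.items():
        score = robust_score(value, reference)
        if score is not None:  # unavailable perspectives are skipped
            scores[name] = score
    votes = [s > threshold for s in scores.values()]
    is_anomaly = bool(votes) and sum(votes) * 2 > len(votes)
    return is_anomaly, scores


# Illustrative data only: a temperature reading of 35.0 checked against
# the station's recent history (temporal) and its neighbours (spatial).
# The parametric perspective is missing and is therefore ignored.
views = {
    "temporal": (35.0, [12.1, 12.4, 12.0, 12.6, 12.3]),
    "spatial": (35.0, [11.8, 12.9, 12.2, 13.0]),
    "parametric": (None, None),
}
is_anomaly, scores = screen(views)
print(is_anomaly, sorted(scores))
```

In a real deployment each perspective would presumably be backed by a trained model rather than a median, but the structure of combining whatever perspectives are available would stay the same.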
The machine learning modeling is further optimized and generalized by identifying groups of stations with similar behaviors through advanced clustering techniques. Downstream of SOON, additional external processing can then be applied to treat the anomalies for cleaning and further filtering.
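As a rough illustration of grouping stations with similar behavior, the sketch below clusters stations by two summary features (mean and standard deviation of their readings) using a small k-means loop. The source does not say which clustering technique SOON uses, so k-means, the choice of features, and the names `station_features` and `cluster_stations` are assumptions, not a description of the product.

```python
from statistics import mean, pstdev


def station_features(series):
    """Summarise a station's readings as (mean, standard deviation)."""
    return (mean(series), pstdev(series))


def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_stations(series_by_station, k, iterations=20):
    """Assign each station to one of k clusters with plain k-means.

    Features are used unscaled for brevity; real data would usually
    need normalisation first.
    """
    names = list(series_by_station)
    points = [station_features(series_by_station[n]) for n in names]

    # Deterministic farthest-point initialisation of the centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        )

    def assign():
        return [
            min(range(k), key=lambda i: dist2(p, centroids[i])) for p in points
        ]

    for _ in range(iterations):
        labels = assign()
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(mean(dim) for dim in zip(*members))

    return dict(zip(names, assign()))


# Illustrative data only: two cool, stable stations and two warm, variable ones.
stations = {
    "A": [10, 11, 10, 12],
    "B": [10.5, 11.5, 10, 12],
    "C": [25, 30, 20, 28],
    "D": [26, 29, 21, 27],
}
groups = cluster_stations(stations, k=2)
print(groups)
```

Once stations are grouped, a single model per group could be trained instead of one per station, which is one plausible reading of how clustering helps generalize the modeling.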
- Big Data approach (Hadoop HDFS, Apache Spark, Apache Kafka, Apache NiFi)
- Machine Learning analytics engine (PySpark, the Python API for Apache Spark)
- Visualization web application (D3.js based)
SOON is a highly generalized framework that can be adapted to many environments. The current target areas are Weather Analysis and Water Utilities.
• SOON works with all types of sensors
• SOON manages large networks
• SOON can scale up
• SOON visualizes your problems immediately