The I-BiDaaS project aims to empower end-users to use and interact with big data technologies more easily. It does so by designing, building, and demonstrating a unified solution that (i) increases the speed of data analysis, which is necessary to cope with the rate of data asset growth, and (ii) facilitates cross-domain data flow, matching the needs of a thriving data-driven EU economy. I-BiDaaS pursues these goals through a methodological approach. With access to real-world big data from three industry domains, it applies concise methodologies for breaking inter- and intra-sectorial data silos that also support data sharing, exchange, and interoperability. It also supports methodological big data experimentation by putting in place a safe data processing environment. To foster experimentation, I-BiDaaS develops data processing tools and techniques applicable in real-world settings. Finally, I-BiDaaS has been validated through real-world, industry-led experiments in the domains of banking, manufacturing, and telecommunications.
The objectives of the I-BiDaaS project are the following:
- Develop, validate, demonstrate, and support a complete and solid big data solution that can be easily configured and adopted by practitioners.
- Break inter- and intra-sectorial data silos, create a data market, offer new business opportunities, and support data sharing, exchange, and interoperability.
- Construct a safe environment for methodological big data experimentation, for the development of new products, services, and tools.
- Develop data processing tools and techniques applicable in real-world settings, and demonstrate a significant increase in the speed of data throughput and access.
- Develop technologies that will increase the efficiency and competitiveness of all EU companies and organisations that need to manage vast and complex amounts of data.
The I-BiDaaS platform offers three different modes, tailored to different categories of users, namely:
- The Self-service mode: This mode provides a set of built-in algorithms to run on users’ data. Users select an offered algorithm based on their domain knowledge of the problem and the data, and perform standard data analytics tasks on their datasets without needing any coding knowledge.
- The Expert mode: In this mode, users can upload and run their own code, or modify the available code templates, while using the platform's resources. Every user-created project is stored under the user’s profile, and experiments within a project can be executed multiple times, giving users the freedom to experiment on their data with different algorithmic setups and resource allocations.
- The Co-Develop mode: This mode corresponds to projects that have been tailor-made to match specific needs in the domains of the project’s pilot partners. The I-BiDaaS infrastructure has been configured to address the challenges of these specific use cases, and the results showcase the potential of the platform in real-world business scenarios.
The architecture behind the I-BiDaaS solution is divided into three main conceptual layers:
- The Infrastructure layer is the lowest layer and comprises the underlying storage and processing infrastructure of the I-BiDaaS solution.
- The Distributed large-scale layer is responsible for the orchestration and management of the underlying physical computational and storage infrastructure. It allows the effective and efficient use of the cloud infrastructure and enables the application layer to provide effective big data analytics.
- The Application layer sits on top of the distributed large-scale layer and comprises the software modules involved in the actual workflow of extracting actionable knowledge from big data: data ingestion, preparation, and fabrication; batch and streaming analytics; and visualization and delivery of analytics results that support decision making.
The I-BiDaaS platform is an end-to-end solution for Big Data analytics that offers the following functionalities: data ingestion and fabrication of realistic synthetic data for experimentation and testing; batch and streaming analytics; development of batch analytics solutions via sequential programming (at run time, the platform automatically parallelizes the task over the actual distributed infrastructure); a pool of built-in algorithms for Big Data processing tasks; data sampling and interactive querying; and advanced visualizations and monitoring. The platform also comes in an open-source variant that offers a subset of these functionalities in a dockerized environment. The open-source variant includes a pool of built-in implementations of standard machine learning algorithms that can be used as is or serve as code templates for expert developers, and that expose tunable hyperparameters through a visual interface.
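The idea of writing batch analytics as sequential-looking code that a runtime then parallelizes can be illustrated with a minimal Python sketch. It is not the platform's actual programming model, which the text does not specify: the standard-library `concurrent.futures` thread pool stands in for the distributed runtime, and the names `partial_sum_of_squares` and `sum_of_squares`, as well as the chunk size and worker count, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_of_squares(chunk):
    """Task applied to one chunk of the data (hypothetical example task)."""
    return sum(x * x for x in chunk)


def sum_of_squares(data, chunk_size=1000, max_workers=4):
    """Reads like a sequential pass over chunks; the executor (a stand-in
    for a distributed runtime) decides where and when each task runs."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so the result matches a sequential run.
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)


if __name__ == "__main__":
    data = list(range(10_000))
    print(sum_of_squares(data))
```

The caller only sees `sum_of_squares(data)`; how the chunks are scheduled is hidden behind the executor. Threads keep the sketch portable, whereas a real deployment would dispatch the tasks to separate processes or cluster nodes.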
I-BiDaaS has been validated in the domains of finance (banking), telecommunications, and manufacturing. Nevertheless, the I-BiDaaS solution can be applied in many more sectors, such as insurance, education, energy, healthcare, and retail.
The digital transition is reshaping the way organizations operate. Today, companies leverage industrial and commercial data to improve customer experiences, open new markets, make employees and processes more productive, and create new sources of competitive advantage. According to the European Commission, the number of data user companies in the European Union (EU) will reach 726,110 in 2020 and between 753,380 and 845,330 in 2025. Moreover, recent reports indicate a lack of talent in the EU data economy: by 2020, 769,000 'data worker' positions will be unfilled. New technologies such as Big Data analytics will create new jobs, and flexible Big Data tools such as I-BiDaaS will be available to users with different levels of expertise, including non-experts in the Big Data domain. This gives users in large enterprises and in small and medium-sized enterprises (SMEs) crucial knowledge for long-term, sustainable growth, productivity, innovation, and competitiveness.
I-BiDaaS delivers the following benefits for large enterprises and for small and medium-sized enterprises (SMEs):
- A safe environment for methodological big data experimentation, even for short-term projects that rely on specific algorithms, and therefore specific code, to achieve the desired result
- A do-it-yourself, self-service solution that lets users test and evaluate the solution’s usability, thereby developing skills and knowledge and gaining experience
- Faster data analysis, resulting in time savings and cost reduction
- Advanced visualization of results, improving the user experience
The I-BiDaaS platform offers a highly flexible solution with three different modes of operation (expert, self-service, and co-develop). This allows users to tailor their platform usage to their level of expertise in developing Big Data applications and to their domain knowledge of the industrial problem of interest. Compared with existing offerings in the market, which are tailored either to experts only or to non-experts only (the latter usually at a high cost), I-BiDaaS may offer a higher degree of flexibility. In addition, the solution exhibits a high degree of reusability of the developed implementations of machine learning algorithms, either through code templates or through tuning the parameters of existing implementations via a visual interface. Furthermore, unlike most existing solutions, the I-BiDaaS framework can fabricate realistic synthetic data via IBM’s TDF tool. This is useful, for example, in early development stages when insufficient real data is available, and it can help make Big Data application development more agile. I-BiDaaS also innovates in how the data is fabricated: the correlations produced by the batch analytics module are fed back to TDF, where they are used for training and to help build the data fabrication rules that drive future data generation. Finally, regarding streaming analytics, the I-BiDaaS framework allows the parallelizable parts of the streaming analytics to be offloaded to the GPU-accelerated streaming analytics and pattern matching tool.
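The feedback loop between batch analytics and data fabrication can be sketched in a few lines of Python. This is an illustration under stated assumptions, not TDF's API or rule format: the "rule" here is a single Pearson correlation coefficient, and the helper names `pearson` and `fabricate_pairs`, along with the Gaussian model used to generate data, are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fabricate_pairs(rho, n, seed=0):
    """Generate n synthetic (x, y) pairs whose expected correlation is rho.

    Hypothetical fabrication rule: y = rho * x + sqrt(1 - rho^2) * noise,
    with x and noise drawn from independent standard normal distributions.
    """
    rng = random.Random(seed)
    # Guard against rounding pushing rho slightly outside [-1, 1].
    scale = math.sqrt(max(0.0, 1.0 - rho * rho))
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + scale * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    # Stand-in for "real" data analysed by the batch analytics module.
    real = fabricate_pairs(0.8, 2000, seed=1)
    rho = pearson([x for x, _ in real], [y for _, y in real])
    # Feed the measured correlation back as a fabrication rule.
    synthetic = fabricate_pairs(rho, 2000, seed=2)
    rho_syn = pearson([x for x, _ in synthetic], [y for _, y in synthetic])
    print(f"measured: {rho:.3f}, synthetic: {rho_syn:.3f}")
```

The point of the sketch is the direction of the data flow: a statistic measured on existing data becomes a parameter of the generator, so that later synthetic data reproduces the observed relationship.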