Smart Data Lake Builder
The efficient way to lay the foundations of a smarter Data Lake
Building, operating and maintaining a multi-layered data architecture is usually complex and expensive. The Smart Data Lake Builder developed by ELCA helps you save up to 50% of the usual implementation and maintenance cost.
Building a Smart Data Lake unleashes the true power of modern Analytics Platforms. It enables the extraction of meaningful insights in support of key decision-making processes driven by data consumers, data scientists and analysts. It also feeds into key applications within your organization.
Classical Data Lakes are often reduced to basic but cheap raw data storage, neglecting significant aspects like transformation, data quality and security. These topics are left to data scientists, who end up spending up to 80% of their time acquiring, understanding and cleaning data before they can apply their core competencies.
In addition, classical Data Lakes are often implemented by separate departments using different standards and tools, which makes it harder to implement comprehensive analytical use cases.
Smart Data Lakes solve these issues by providing architectural and methodological guidelines, together with an efficient tool to build a strong, high-quality data foundation.
The Smart Data Lake Builder leverages metadata and automation to reduce complexity and generate significant savings across implementation and maintenance:
- Significant savings (30 to 50%) on data lake implementation, operation and maintenance
- Faster & cost-effective implementation of new analytical apps
- Increased productivity of data scientists and improved self-service for data consumers
- Higher clarity of data structure and origin
- No binding to any platform or vendor
Smart Data Lakes are at the core of any modern analytics platform. Their structure easily integrates prevalent Data Science tools and open source technologies, as well as AI and ML. Their storage is cheap and scalable, supporting both unstructured data and complex data structures.
The underlying technologies of Smart Data Lakes scale horizontally. They can thus be leveraged to adjust performance on demand and to grow steadily in support of your business.
Smart Data Lake’s key components
To leverage the full potential of an analytics platform, a core with a strong, high-quality data foundation is needed, in which data is standardized, enriched, transformed, and secured. To be labelled Smart Data, it must also be structured semantically and fulfill data privacy requirements.
The Smart Data Lake is built upon a multi-layered data architecture, where raw data is collected in a staging layer, as it would be in a classic data lake. This data is then transformed and curated through multiple layers into a secured, high-quality business view. Generic and customized transformations help prepare data for efficient use in varied analytical tasks and applications. The underlying technologies allow data to be processed in streaming or batch mode.
The multi-layered data architecture of a Smart Data Lake
Metadata-driven
Smart Data Lake Builder is built on top of easy-to-maintain metadata, thus keeping a holistic view of all data objects and transformations. Visual data lineage graphs and a data catalog can be generated or exported.
Automated
The metadata allows data pipelines to be created and executed automatically and dynamically. For sources with a large number of data objects, the metadata can easily be generated.
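As an illustration, a pipeline definition might look like the following sketch: data objects and the actions connecting them are declared in a HOCON configuration, and the Smart Data Lake Builder derives the executable pipeline from it. Object and attribute names here are simplified and illustrative; the exact configuration reference is part of the documentation on https://www.smartdatalake.io.

```
dataObjects {
  ext-customers {                      # source system, accessed via JDBC
    type = JdbcTableDataObject
    connectionId = crm-db
    table = { db = "crm", name = "customers" }
  }
  stg-customers {                      # staging layer: raw copy of the source
    type = CsvFileDataObject
    path = "stg/customers"
  }
  int-customers {                      # integration layer: curated, query-ready table
    type = DeltaLakeTableDataObject
    path = "int/customers"
    table = { db = "integration", name = "customers" }
  }
}

actions {
  load-customers {                     # source -> staging
    type = CopyAction
    inputId = ext-customers
    outputId = stg-customers
    metadata { feed = customers }
  }
  curate-customers {                   # staging -> integration
    type = CopyAction
    inputId = stg-customers
    outputId = int-customers
    metadata { feed = customers }
  }
}
```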
Connected
Smart Data Lake has out-of-the-box connectivity for most current technologies, including HadoopFS, Hive, Kafka, JDBC, Splunk, webservices, SFTP and JMS, as well as Excel and Access.
Customizable
Custom transformations can be defined using SQL, Java/Scala, or Python. The product can be easily extended in Java/Scala.
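For instance, the curate-customers action sketched above could be given an inline SQL transformation along the following lines. Transformer types and attributes are illustrative; custom Scala/Java classes or Python code are declared in the same way.

```
actions {
  curate-customers {
    type = CopyAction
    inputId = stg-customers
    outputId = int-customers
    transformers = [{
      # inline SQL transformation; a custom Scala/Java or Python transformer
      # can be referenced here instead
      type = SQLDfTransformer
      code = "select id, upper(name) as name, country from stg_customers where active = true"
    }]
  }
}
```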
Reusable
Generic transformations like historization or deduplication are supported out-of-the-box.
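Switching to such a generic transformation is essentially a matter of configuration, as in this illustrative sketch (action names and keys are simplified):

```
actions {
  historize-customers {
    # keeps a full history of changed records instead of overwriting them
    type = HistorizeAction
    inputId = stg-customers
    outputId = int-customers-history
  }
  deduplicate-customers {
    # keeps only the latest version of each record, based on the output table's primary key
    type = DeduplicateAction
    inputId = stg-customers
    outputId = int-customers
  }
}
```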
Cloud-Ready and scalable
Smart Data Lake is built with the cloud in mind. While you can run it locally in very small setups, the Smart Data Lake Builder is ready to run out-of-the-box on most popular private or public cloud infrastructures and to scale horizontally.
Open Source
The solution is built on top of many open-source technologies like Apache Spark®. In turn, ELCA provides the Smart Data Lake Builder as an open-source tool under the GPL license on GitHub (see https://www.smartdatalake.io).
No Vendor Lock-In
The whole ecosystem of Smart Data Lake is vendor neutral. A comprehensive feature list is available on the project website.
As maintainer and major contributor, ELCA has gained extensive know-how on Smart Data Lake Builder and the concepts behind it within its “Data, Analytics and AI” Business Line.
ELCA will:
- Help you build a strong, modern data foundation with Smart Data Lake Builder (projects and mandates)
- Support you in developing sophisticated analytical apps on top of your Data Lake (projects and mandates)
- Customize Smart Data Lake Builder to your needs and integrate it into your environment
- Provide subscriptions to support your Smart Data Lake Builder installation in production.