Smart Data Lake Builder

The efficient way to lay the foundations of a smarter Data Lake

Building, operating and maintaining a multi-layered data architecture is usually complex and expensive. The Smart Data Lake Builder, developed by ELCA, helps you save up to 50% of the usual implementation and maintenance cost.

Zacharias Kull
Senior Expert Data Analytics

Building a Smart Data Lake unleashes the true power of modern Analytics Platforms. It enables the extraction of meaningful insights in support of key decision-making processes driven by data consumers, data scientists and analysts. It also feeds into key applications within your organization.

Classical vs Smart Data Lakes

Classical Data Lakes are often reduced to basic but cheap raw data storage, neglecting significant aspects like transformation, data quality and security. These topics are left to data scientists, who end up spending up to 80% of their time acquiring, understanding and cleaning data before they can start using their core competencies.


In addition, classical Data Lakes are often implemented by separate departments using different standards and tools, which makes it harder to implement comprehensive analytical use cases.


Smart Data Lakes solve these various issues by providing architectural and methodical guidelines, together with an efficient tool to build a strong high-quality data foundation.

Key benefits of Smart Data Lake Builder

The Smart Data Lake Builder leverages metadata and automation to reduce complexity and generate significant savings across implementation and maintenance:

  • Significant savings (30 to 50%) on data lake implementation, operation and maintenance
  • Faster & cost-effective implementation of new analytical apps
  • Increased productivity of data scientists and improved self-service for data consumers
  • Higher clarity of data structure and origin
  • No binding to any platform or vendor

What’s a Smart Data Lake made of?

Smart Data Lakes are at the core of any modern analytics platform. Their structure easily integrates prevalent Data Science tools and open source technologies, as well as AI and ML. Their storage is cheap and scalable, supporting both unstructured data and complex data structures.


Smart Data Lakes’ underlying technologies are scalable horizontally. They can thus be leveraged to adjust performance on demand and steadily grow in support of your business.

Smart Data Lake’s key components

How to build a strong, high-quality Data Foundation

To leverage the full potential of an analytics platform, a strong, high-quality data foundation is needed at its core, in which data is standardized, enriched, transformed, and secured. It must also be structured semantically and fulfill data privacy requirements to be labelled Smart Data.


The Smart Data Lake is built upon a multi-layered data architecture: raw data is collected in a staging layer, as it would be in a classical data lake, then transformed and curated through multiple layers into a secured, high-quality business view. Generic and custom transformations help prepare data for efficient use in varied analytical tasks and applications. The underlying technologies allow data to be processed in stream or batch mode.
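The layered flow described above can be sketched in SDLB's HOCON configuration format. This is an illustrative example, not taken from the article; the data object names, paths and table details are hypothetical:

```hocon
dataObjects {
  # raw data landing in the staging layer
  stg-customers {
    type = CsvFileDataObject
    path = "stg-customers"
  }
  # curated, queryable table in the integration layer
  int-customers {
    type = DeltaLakeTableDataObject
    path = "int-customers"
    table {
      db = "default"
      name = "int_customers"
      primaryKey = [customer_id]
    }
  }
}
actions {
  # copies and standardizes staging data into the curated layer
  load-customers {
    type = CopyAction
    inputId = stg-customers
    outputId = int-customers
    metadata { feed = customers }
  }
}
```

Each action declares its input and output data objects, so the overall pipeline and its lineage follow directly from the declared metadata.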


The multi-layered data architecture of a Smart Data Lake

Key features of Smart Data Lake Builder


Smart Data Lake Builder is built on top of easy-to-maintain metadata, keeping a holistic view of all data objects and transformations. Visual data lineage graphs and a data catalog can be generated or exported.



The metadata allows for the automatic, dynamic creation and execution of data pipelines. For sources with a large number of data objects, metadata can easily be generated.



Smart Data Lake has out-of-the-box connectivity for most current technologies, including HadoopFS, Hive, Kafka, JDBC, Splunk, Webservice, SFTP and JMS, as well as Excel and Access.
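As a rough illustration of how such connectivity is declared, a connection can be defined once and referenced by data objects. This sketch follows SDLB's HOCON format, but the connection name, URL and table are hypothetical:

```hocon
connections {
  # hypothetical JDBC connection to an external database
  my-postgres {
    type = JdbcTableConnection
    url = "jdbc:postgresql://localhost:5432/sales"
    driver = org.postgresql.Driver
  }
}
dataObjects {
  # reads the external table through the connection defined above
  ext-orders {
    type = JdbcTableDataObject
    connectionId = my-postgres
    table {
      db = "public"
      name = "orders"
    }
  }
}
```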



Custom transformations can be defined using SQL, Java/Scala, or Python. The product can be easily extended in Java/Scala.
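A custom SQL transformation, for instance, can be attached to an action as a transformer. This is a hedged sketch in SDLB's HOCON format; the action and data object names are made up for illustration:

```hocon
actions {
  aggregate-orders {
    type = CopyAction
    inputId = int-orders
    outputId = btl-order-summary
    transformers = [{
      # custom SQL transformation applied while copying between layers
      type = SQLDfTransformer
      code = "select customer_id, sum(amount) as total_amount from int_orders group by customer_id"
    }]
  }
}
```

Java/Scala or Python transformations plug in the same way, via a transformer referencing a custom class or script instead of inline SQL.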



Generic transformations like historization or deduplication are supported out-of-the-box.
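These generic transformations are selected simply by choosing the action type. A minimal sketch, again with hypothetical data object names:

```hocon
actions {
  # keeps a validity period per record version (slowly changing dimension style)
  historize-customers {
    type = HistorizeAction
    inputId = stg-customers
    outputId = int-customers-hist
  }
  # removes duplicate records based on the target table's primary key
  dedup-events {
    type = DeduplicateAction
    inputId = stg-events
    outputId = int-events
  }
}
```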


Cloud-Ready and scalable

Smart Data Lake is built with the cloud in mind. While you can run it in very small setups locally, the Smart Data Lake Builder is ready to run out-of-the-box in most popular private or public cloud infrastructures to scale horizontally.


Open Source

The solution is built on top of many open-source technologies such as Apache Spark®. In turn, ELCA provides the Smart Data Lake Builder as an open-source tool under the GPL license on GitHub.


No Vendor Lock-In

The whole ecosystem of Smart Data Lake is vendor neutral. A comprehensive feature list is available here.

What ELCA offers

As maintainer and major contributor, ELCA has gained extensive know-how on Smart Data Lake Builder and the concepts behind it within its “Data, Analytics and AI” Business Line.


ELCA will:

  • Help you build a strong, modern data foundation with Smart Data Lake Builder (project and mandates)
  • Support you in developing sophisticated analytical apps on top of your Data Lake (project and mandates)
  • Customize Smart Data Lake Builder to your needs and integrate it into your environment
  • Provide subscriptions to support your Smart Data Lake Builder installation in production.
Contact: Zacharias Kull
