What is a data lake
A data lake is a huge repository that stores all kinds of data in raw, unorganized form. Data in a data lake is like fish in a lake that swam in from a river: you don't know exactly what kind of fish are there or where they are. And to "cook" the fish, that is, to process the data, you first have to catch it.
In everyday life, we mostly deal with unstructured data. Videos, books, magazines, Word and PDF documents, audio recordings, and photos are all unstructured data, and all of them can be stored in a data lake.
How the data lake works
A data lake is a huge store that accepts files in any format. The source of the data doesn't matter either: a data lake can take data from CRM or ERP systems, product catalogs, banking software, sensors, or smart devices, that is, from any system a business uses.
Later, once the data is saved, you can work with it: extract it according to a certain template into classic databases, or analyze and process it right inside the data lake.
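To make the two stages concrete, here is a minimal sketch in Python. It assumes the lake is just a local folder and the "classic database" is SQLite. The names `ingest`, `extract_orders`, the `orders` table, and the `customer`/`amount` template are all hypothetical and chosen only for illustration. A real data lake would normally sit on distributed or object storage.

```python
import csv
import json
import shutil
import sqlite3
import tempfile
from pathlib import Path


def ingest(lake: Path, source: str, file: Path) -> Path:
    """Copy a file into the lake unchanged, grouped only by its source system."""
    target_dir = lake / source
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(file, target_dir / file.name))


def extract_orders(lake: Path, db: sqlite3.Connection) -> int:
    """Apply a template (customer, amount) to raw files and load matching rows."""
    db.execute("CREATE TABLE IF NOT EXISTS orders (customer TEXT, amount REAL)")
    rows = []
    for path in sorted(lake.rglob("*")):
        if path.suffix == ".csv":
            with path.open(newline="") as fh:
                records = list(csv.DictReader(fh))
        elif path.suffix == ".json":
            data = json.loads(path.read_text())
            records = data if isinstance(data, list) else [data]
        else:
            continue  # photos, PDFs and other files stay in the lake untouched
        for record in records:
            if isinstance(record, dict) and "customer" in record and "amount" in record:
                rows.append((record["customer"], float(record["amount"])))
    db.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    db.commit()
    return len(rows)


# Usage: three files from different "systems" go into the lake as they are.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "incoming"
    src.mkdir()
    (src / "orders.csv").write_text("customer,amount\nalice,10.5\nbob,4\n")
    (src / "orders.json").write_text(json.dumps([{"customer": "carol", "amount": 7}]))
    (src / "photo.jpg").write_bytes(b"\xff\xd8")

    lake = Path(tmp) / "lake"
    ingest(lake, "crm", src / "orders.csv")
    ingest(lake, "erp", src / "orders.json")
    ingest(lake, "sensors", src / "photo.jpg")

    db = sqlite3.connect(":memory:")
    loaded = extract_orders(lake, db)
    total = db.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
    db.close()
```

Nothing is validated or reshaped at ingestion time. The structure is imposed only when the data is read back out, which is the main difference from loading everything into a database up front.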
The collected data can be organized and structured, analytics can be set up to build models and test hypotheses, and machine learning can be applied.
BI systems are another example of a data processing tool for the data lake. They help businesses with in-depth analytics (data mining) and predictive modeling, and they visualize the results. Their uses range from financial management to risk management and marketing.
Specialists who work with these tools have access to data in the data lake and can process it using various analytical systems and approaches. In a data lake, data can be processed without extraction: it is enough to set up the analysis systems right inside the lake.
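As a rough illustration of processing without extraction, the sketch below reads raw files where they sit and computes an aggregate, without copying anything into a separate database. It assumes the lake is a local folder holding JSON Lines files. The function name `average_by_sensor` and the `sensor` and `value` fields are hypothetical.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path
from statistics import mean


def average_by_sensor(lake: Path) -> dict[str, float]:
    """Scan raw JSON Lines files in place and average the readings per sensor."""
    readings = defaultdict(list)
    for path in sorted(lake.rglob("*.jsonl")):
        with path.open() as fh:
            for line in fh:
                if not line.strip():
                    continue  # skip blank lines
                record = json.loads(line)
                readings[record["sensor"]].append(record["value"])
    return {sensor: mean(values) for sensor, values in readings.items()}


# Usage: two raw files written by a hypothetical sensor feed.
with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)
    (lake / "sensors").mkdir()
    (lake / "sensors" / "day1.jsonl").write_text(
        '{"sensor": "t1", "value": 20}\n{"sensor": "t2", "value": 5}\n'
    )
    (lake / "sensors" / "day2.jsonl").write_text('{"sensor": "t1", "value": 22}\n')
    averages = average_by_sensor(lake)
```

The raw files are never modified or moved. The analysis is just another reader of the lake.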
Special Thanks To
@suboohi
@siz-official