An Introduction to Data Engineering for the Everyman

I’m going to give a high-level introduction to the world of Data Engineering. This post is designed so that the average person can get an appreciation of the role, so there will be some inaccuracies for the sake of giving an approachable explanation of a complex role. With that disclaimer out of the way, allow me to explain what a Data Engineer is.

Where they come from

The need for a Data Engineer comes from an issue that businesses often run into: they have data in different places and formats and need it to be combined, analysed or otherwise made useful. Most business users do not have the technical skills necessary to achieve this, especially in a performant and cost-effective manner. The Data Engineer’s role is to facilitate the requirements of Analysts, Developers, Data Scientists, and other data users.

It’s no secret that the number of Data Engineering jobs has jumped in recent years. I attribute this to two main pushes. Firstly, management are looking to become more “data driven”, i.e. to make decisions with data to back them up rather than on a gut feeling. Secondly, data has value. Companies such as Netflix, Google and Meta (the owner of Facebook) have attributed some of their success to their use of data.

What do they do?

A Data Engineer concerns themselves with moving data from one place to another. They need to consider a variety of factors to build a pipeline that goes from a source system to a destination system. One of the major complicating factors is that the data may not be in the correct ‘shape’ to fit the destination system when it comes out of the source system. Data Engineers employ various methods of transforming the data so that it fits when it reaches its destination.
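To make the idea of ‘shape’ a little more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the source column names (CustName, OrderDt, Amt), the destination column names and the date format are all assumptions, not taken from any real system.

```python
from datetime import datetime

# Hypothetical rows as a source system might export them: every value is text
# and the column names are the source system's own.
source_rows = [
    {"CustName": " Alice Smith ", "OrderDt": "03/01/2024", "Amt": "19.99"},
    {"CustName": "Bob Jones", "OrderDt": "15/02/2024", "Amt": "5.00"},
]


def reshape(row):
    """Convert one source row into the shape the (assumed) destination expects."""
    return {
        # Trim stray spaces around the name.
        "customer_name": row["CustName"].strip(),
        # Assumes day/month/year text in the source; the destination wants ISO dates.
        "order_date": datetime.strptime(row["OrderDt"], "%d/%m/%Y").date().isoformat(),
        # Store money as whole pence so later sums avoid floating-point drift.
        "amount_pence": round(float(row["Amt"]) * 100),
    }


destination_rows = [reshape(row) for row in source_rows]
```

Real pipelines usually do this with dedicated tooling rather than hand-written loops, but the principle is the same: rename, clean and convert each field until it fits the destination.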

A topic that you will often hear Data Engineers discuss is “data quality”. Data can be inconsistent or incomplete, which causes a lot of issues for the person doing the analysis when they come to analyse it with a visualisation tool (such as Power BI). A commonly used phrase is “garbage in, garbage out”, which neatly sums up the problem. Data Engineers have a range of tools at their disposal to deal with these situations.
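As a rough illustration of what a data quality check can look like, the sketch below scans a handful of records for a few common problems. The records, the field names and the rules are all hypothetical; a real project would pick rules that match its own data.

```python
# Hypothetical records with the kinds of problems analysts tend to run into.
records = [
    {"id": 1, "country": "UK", "revenue": 120.0},
    {"id": 2, "country": "uk", "revenue": None},  # inconsistent case, missing value
    {"id": 2, "country": "UK", "revenue": 80.0},  # duplicate id
    {"id": 3, "country": "", "revenue": 45.5},    # empty country
]


def find_quality_issues(rows):
    """Return a list of (row_index, description) pairs, one per problem found."""
    issues = []
    seen_ids = set()
    for index, row in enumerate(rows):
        # An id that has already appeared suggests a duplicated record.
        if row["id"] in seen_ids:
            issues.append((index, "duplicate id"))
        seen_ids.add(row["id"])
        if row["revenue"] is None:
            issues.append((index, "missing revenue"))
        if not row["country"]:
            issues.append((index, "missing country"))
        elif row["country"] != row["country"].upper():
            issues.append((index, "country code not upper case"))
    return issues


issues = find_quality_issues(records)
```

Catching these problems inside the pipeline means the analyst never sees them in a dashboard, which is the whole point of “garbage in, garbage out”.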

Many Hats

Depending on the size of the business, a Data Engineer may cross multiple disciplines. They may have to do some database administration, data analysis or maybe some business analysis. The majority of what they do will be about moving and processing data, but often they will have to add extra skills to their repertoire.

One role that will often be mentioned in the same breath as Data Engineer is the Data Scientist. I want to make clear that these are separate but complementary roles. The Data Engineer will handle gathering, processing and serving up data for the Data Scientist to use in their model training or machine learning work.

Real world

An example scenario for a Data Engineer would be to extract data from the finance & accounting system and load it into a data warehouse so Finance Analysts can build dashboards and query multiple years of data. For the Data Engineer, this means understanding how the data is stored in the finance system, how it’s going to be analysed in the dashboard tools and the volume of data involved. From here they will design a pipeline that moves the data and processes it into a new structure or “schema”, so that when the Finance Analysts are building their reports it’s readily understood what each column represents.
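The scenario above can be sketched in a few lines of Python, using an in-memory SQLite database as a stand-in for the data warehouse. The export rows, the account lookup and the table and column names are all invented for this example; a real finance system and warehouse would look quite different and be far larger.

```python
import sqlite3

# Hypothetical export from a finance system:
# (transaction id, posting date, account code, amount as text).
finance_export = [
    ("T001", "2023-11-30", "4000", "1500.00"),
    ("T002", "2023-12-01", "4000", "250.50"),
    ("T003", "2024-01-15", "5000", "-320.00"),
]

# Hypothetical lookup so analysts see readable names rather than account codes.
account_names = {"4000": "Sales", "5000": "Office costs"}


def run_pipeline(rows, connection):
    """Load rows into a warehouse table with self-explanatory column names."""
    connection.execute(
        """
        CREATE TABLE IF NOT EXISTS finance_transactions (
            transaction_id TEXT PRIMARY KEY,
            posting_date   TEXT,
            posting_year   INTEGER,
            account_name   TEXT,
            amount         REAL
        )
        """
    )
    # Transform: derive the year, swap codes for names, turn text into numbers.
    transformed = [
        (tx_id, date, int(date[:4]), account_names[code], float(amount))
        for tx_id, date, code, amount in rows
    ]
    connection.executemany(
        "INSERT INTO finance_transactions VALUES (?, ?, ?, ?, ?)", transformed
    )
    connection.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
run_pipeline(finance_export, warehouse)

# The kind of question an analyst might then ask: total amount per year.
totals_by_year = warehouse.execute(
    "SELECT posting_year, SUM(amount) FROM finance_transactions "
    "GROUP BY posting_year ORDER BY posting_year"
).fetchall()
```

The new schema is the important part: column names such as account_name and posting_year tell the analyst what they are looking at without needing to know how the finance system stores things internally.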

