Every time we open an article titled something like “What is a data engineer?” or “The difference between a data engineer and a data scientist”, we get the same cliché answer: data engineers are like plumbers.
No! No! No! That is wrong. A data engineer may work with pipelines like a plumber, but the role is very different.
In this article, I will show you that a data engineer is much closer to another profession: the hydraulic and water resources engineer.
I will explain why in three simple arguments:
- Job titles;
- Task goals;
- Tools needed to do the work.
1. The titles are the same
Well, this is obvious, right? 🙂 They are both engineers.
According to the Oxford English Dictionary, an engineer is “someone who designs, builds or maintains engines, machines, or structures”.
Each role focuses on a different resource, product, or process (“water” for hydraulic and water resources engineers, “data” for data engineers), but both apply engineering to it.
They are more “thinking” roles than “manual” roles (like a plumber's), since they have to reason about and calculate the best solutions for their processes rather than simply follow guidelines.
2. Both have similar working goals
This section is similar to the previous one, but here we focus on the goal of each engineer.
For example, a mechanical engineer builds, maintains, and improves machines that perform specific tasks.
I consider that a data engineer's objective is to build, maintain, and improve data pipelines (ETL or ELT processes) and data storage structures (data warehouses or data lakes), and to provide solid data to the stakeholders.
In more detail, the data engineer is responsible for guaranteeing a) the extraction of data from various sources (both internal, such as relational databases, and external), b) the transformation of data using solid programming skills or software, c) good organization of the data in the correct storage structures, and d) the quality and organization of all end-to-end processes and data, using orchestration, monitoring, or other control tools.
A data engineer needs to think of the process as a whole, considering both downstream and upstream mechanisms.
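To make steps a) to d) more concrete, here is a minimal sketch of such a pipeline using only the Python standard library. Everything in it is an assumption made for illustration: the `RAW_CSV` sample, the `customers` table, and the function names are hypothetical, and a real pipeline would read from actual sources and write to an actual warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (assumption for illustration).
RAW_CSV = """customer_id,name,signup_date,amount
1, Alice ,2023-01-05,10.50
2,Bob,2023-02-11,7
2,Bob,2023-02-11,7
3,Carla,2023-03-02,not_a_number
"""


def extract(raw_text):
    """a) Extraction: read rows from a source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """b) Transformation: trim text, fix data types, drop duplicates and bad rows."""
    seen = set()
    clean = []
    for row in rows:
        try:
            record = (
                int(row["customer_id"]),
                row["name"].strip(),
                row["signup_date"].strip(),
                float(row["amount"]),
            )
        except ValueError:
            continue  # skip rows whose values cannot be converted
        if record not in seen:
            seen.add(record)
            clean.append(record)
    return clean


def load(records, connection):
    """c) Storage: write the clean records into a table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        "customer_id INTEGER, name TEXT, signup_date TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", records)
    connection.commit()


def check_quality(connection):
    """d) Control: check that the table is non-empty and has no duplicate ids."""
    total, distinct = connection.execute(
        "SELECT COUNT(*), COUNT(DISTINCT customer_id) FROM customers"
    ).fetchone()
    return total > 0 and total == distinct


def run_pipeline(raw_text=RAW_CSV):
    """Run extraction, transformation, and loading against an in-memory database."""
    connection = sqlite3.connect(":memory:")
    load(transform(extract(raw_text)), connection)
    return connection
```

Calling `run_pipeline()` and then `check_quality(connection)` on the result mirrors the end-to-end view described above: each step only makes sense when you know what comes before and after it.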
McGill University, in Canada, describes the two disciplines of its Hydraulic and Water Resources Engineering specialization as follows:
“Water resources engineering is the quantitative study of the hydrologic cycle — the distribution and circulation of water linking the earth’s atmosphere, land and oceans. (…) Applications include the management of the urban water supply, the design of urban storm-sewer systems, and flood forecasting.” and “Hydraulic engineering consists of the application of fluid mechanics to water flowing in an isolated environment (pipe, pump) or in an open channel (river, lake, ocean). Applications include the design of hydraulic structures, such as sewage conduits, dams and breakwaters, the management of waterways, such as erosion protection and flood protection, and environmental management, such as prediction of the mixing and transport of pollutants in surface water.”.
Therefore, I consider that hydraulic and water resources engineers need to guarantee (among other tasks):
a) the extraction of water from various sources,
b) the proper cleaning of water in water treatment facilities (see image above),
c) good organization of the water in the correct storage structures, and
d) quality/organization of all the end-to-end processes with several control tools.
The table below shows how similar the two roles are in terms of processes, tasks, and goals (with some examples).
Process / Task / Scope | Data Engineer | Hydraulic and Water Resources Engineer |
---|---|---|
Extraction of raw product from sources | Relational databases, external APIs, or CRM data. | Surface water, groundwater, or wastewater. |
Development and maintenance of transformation processes | Data transformation by cleaning, deduplication, or data type correction. | Water cleaning by removing organic or inorganic compounds. |
Development and maintenance of storage structures | Data warehouses, data lakes. | Water towers, dams. |
Design and construction of the full process | Data orchestration tools. | Computer tools to draw the full systems, including wastewater treatment plants. |
Controlling/monitoring processes and product | Software tools for data lineage or process control. | Sensors all over the process. |
Stakeholders | Data analysts, data scientists. | Cities, industry. |
So you can see that, even though they work on different resources, both engineers perform similar tasks.
3. They use identical tools
The “data engineer equals plumber” cliché often points out that both use tools. However, a plumber's tools are very different from a data engineer's, whereas data engineers and hydraulic and water resources engineers use similar kinds of tools.
Following the processes in the table above, here are some examples for each role.
For data engineers:
- SQL for analysis of the data sources;
- Python, Scala, or other programming languages for development;
- Airflow, Luigi, or another orchestrator for building the full process;
- Grafana and data testing tools for control and monitoring.
For hydraulic and water resources engineers:
- Geospatial analysis (GIS) tools for analysis of the source areas;
- Excel or a similar tool for calculations;
- CAD software for designing the full process;
- Sensors to control water quality and quantity.
So the tools both engineers use are complex tools (mostly software) whose purpose is to work out the best solution. They are not manual tools like hammers.
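To give a feel for what an orchestrator such as Airflow or Luigi does for a data engineer, here is a small sketch of its core idea: running tasks in an order that respects their upstream dependencies. It does not use Airflow's or Luigi's API; it relies on `graphlib` from the Python standard library (Python 3.9+), and the task names and placeholder actions are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each one maps to the set of tasks it depends on.
DEPENDENCIES = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform"},
    "quality_checks": {"load_warehouse"},
}


def run_in_order(dependencies, actions):
    """Run every task after all of its upstream tasks and return the order used."""
    executed = []
    for task in TopologicalSorter(dependencies).static_order():
        actions[task]()
        executed.append(task)
    return executed


# Placeholder actions that only print; real tasks would call pipeline code.
ACTIONS = {
    name: (lambda name=name: print(f"running {name}")) for name in DEPENDENCIES
}

order = run_in_order(DEPENDENCIES, ACTIONS)
```

Real orchestrators add scheduling, retries, and monitoring on top of this, but the dependency graph is the part that forces you to think about upstream and downstream steps, much like drawing a full water system before building it.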
Conclusion
In summary, three simple arguments show that data engineers are less like plumbers and more like hydraulic and water resources engineers.
Hydraulic and water resources engineers and data engineers resemble each other because:
- Both are engineers, a “mind role” and not a “manual role” like a plumber;
- They have a similar working scope: extracting or studying a raw product, transforming it, storing it, and delivering it to the stakeholders;
- Both positions always have to understand the whole process end-to-end, being aware of downstream and upstream operations;
- The tools that both positions use are complex tools aimed at calculation and analysis.
And the cliché is down!
What do you think? Do you agree with me?
Do you think I am going to be attacked by Mario Bros? 🧑‍🔧
Did you like this article? Follow me for more articles on Medium.