How to Switch From Data Science to Data Engineering

James Phoenix

Data science and data engineering are neighbouring disciplines within a freelance, startup, agency or SME setting. But scale them up, and you’ll find that the two roles sit within different departments of a larger business or enterprise. 

While some data practitioners start their career as a jack-of-all-trades who can do a bit of data engineering and a bit of data science, it usually becomes necessary to stack the chips on either side. Otherwise, you’ll end up spreading your resources too thin. 

If you’re a data scientist or a “recovering data scientist”, you might be wondering whether to reskill to data engineering. Maybe you’re bored of your job remit, maybe you need engineering skills to thrive, or maybe you’re just looking for a new challenge, or all or none of those things! 

So, how do you switch from data science to data engineering?


Data Science and Data Engineering: The Same But Different?

The difference between data science and data engineering is analogous to the difference between science and engineering. 

Science seeks to contribute understanding to a body of knowledge. It then seeks to make that knowledge communicable to others. Data science looks to synthesise data into understanding, insight and action. Data scientists animate the entity of data and give it impact. Here is a spider chart of the data engineer’s core skills:

Engineering focuses on the construction of systems. Data exists in everything, but it’s only made available through data engineering. Data engineers build and support knowledge-giving systems. Data engineering is fundamental in the data science hierarchy of needs, as below:

In short, data engineers enable data scientists to do data science. Here’s an excellent quote to capture that:

“Data engineers are the plumbers building a data pipeline, while data scientists are the painters and storytellers, giving meaning to an otherwise static entity” – David Bianco from UrTheCast

While this depicts data science as perhaps the more desirable job as ‘painters’ vs the ‘plumbers’, that’s really not the case. Data engineering is in incredible demand right now and possibly outstrips data science in job availability – though it’s very marginal. Plus, data engineering will add something to any IT professional’s career, perhaps more so than data science. 


Why Convert From a Data Scientist to a Data Engineer?

Data science and data engineering both align with job requirements in the present and future, and both are in high demand across numerous sectors and industries. Becoming either a data scientist or data engineer is likely to lead to long-term employability within startups, SMEs and enterprises. 

But recent data suggests that data engineering is getting the upper hand in the job market. For example, Interview Query’s Data Science Interview report found that data science grew by 10% from 2019 to 2020, while data engineering grew by 40% in the same period. Similarly, Mihail Eric found there were some 70% more open data engineering roles than data scientist roles.

As far as salary is concerned, data engineers and data scientists are almost level. One study suggests data engineers earn slightly more ($137,000 vs. $121,000), but others contradict that. 

But, of course, this comes down to more than pay and employability. 


Data Engineers Build Systems

Data engineers build systems that make data available for insight and understanding. That’s fundamental for a myriad of different businesses and organisations, from the science and medical industries to economics and finance, marketing and governmental/public sector. 

If you enjoy building the fundamental systems that make data ‘work’, then data engineering might be for you.

From a technical perspective, this includes at least the following:

  • Data pipelines and ETL/ELT 
  • SQL and NoSQL
  • Data structures 
  • Python and some of its libraries like BeautifulSoup, Keras, Matplotlib, NumPy, Pandas, PyTorch, SciKit-Learn, SciPy, Scrapy and TensorFlow
  • Cloud databasing
  • Testing
  • Data streaming
  • Distributed systems 
  • Machine learning
  • Big Data and Spark, Kafka and Hadoop
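To make the first two items on that list concrete, here is a minimal sketch of an extract-transform-load (ETL) pipeline. It assumes only the Python standard library (csv and sqlite3) rather than a cloud warehouse. The sample data, the orders table and the function names are hypothetical and chosen purely for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or an API.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and cast column types."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        clean.append((int(row["order_id"]), row["customer"], float(row["amount"])))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run all three stages, then query total spend per customer."""
    load(transform(extract(text)), conn)
    query = (
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    print(run_pipeline(RAW_CSV, conn))
```

Keeping extract, transform and load as separate functions means each stage can be tested on its own, which is also where the ‘Testing’ item on the list comes in.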

For a more complete data engineering skills roadmap, head here. 


Data Engineers Solve Problems Without Necessarily Answering Them 

Data science is typically focused on answering business questions and providing solutions. 

It’s the data scientist’s job to synthesise data into answers to put forward to other business teams, like product teams or marketing teams. This is a more people-facing role that involves more interpretation and subjective decision-making.

Data engineering, on the other hand, has a more objective, technical remit.

The fundamental task is to obtain clean, usable data that drive insight and action. The process of communicating that data with other teams and stakeholders is often someone else’s job! 

That might appeal to some who are fed up with persuading stakeholders with their data science. 


Data Engineering Improves Machine Learning Skills

Data engineering skills are unequivocally useful if you want to get into machine learning. In machine learning, data scientists often feed algorithms data, but engineers help write ML code and deploy ML products.

For those who want to work at the cutting edge, it’s necessary to blend skills from data science and data engineering.

You can’t get away with being a data scientist or a data engineer without picking up the general stats, maths, and IT skills that straddle both remits. 


Switching From Data Science to Engineering

To switch from data science to engineering, you’ll probably end up shelving some of your maths and statistics skills in favour of more systems-based knowledge. Data engineering is less theoretical and revolves around tools and processing techniques. As a data engineer, you’ll need to add skills in specific tools to your repertoire and pivot your coding skills towards the engineering side. 

Developing Python, database and data pipeline skills is crucial. From there, look to build your first data engineering projects, like building a REST API with Flask. BigQuery, AWS Redshift and Snowflake are essential technologies to familiarise oneself with, but always be ready to transfer knowledge to your company or clients’ data stack. 

As a data engineer, you’ll need soft skills such as clear and efficient communication, as you’ll need to find out what data tools a client or company has and describe how you can improve their workflow. 

In the end, data engineering and data science cross over and collaborate significantly in most small-to-medium-sized teams. So converting from one to the other won’t be the trickiest thing in the world. Plus, you can also go back or switch between roles once you’re well-versed in either.


Summary: How To Switch From Data Science to Data Engineering

Data science and data engineering are interconnected. Data science is probably the more people-facing role, and you could argue that it’s the more stressful of the two jobs. After all, a data scientist’s job is to solve problems and demonstrate results. That’s not easy when things don’t quite work as planned or when other stakeholders aren’t receptive. 

On the other hand, data engineers have a tighter, more objective job role. So long as their pipelines and infrastructure work and work well, there isn’t much room to criticise. That isn’t to say that data engineering can’t become extremely challenging when working with complex distributed systems – it’s just that the end product is nearly always objectively measurable. 

As the analogy goes, data engineers focus on ‘plumbing’. If you build an efficient, effective plumbing system, then that’s it – the job’s done. Data scientists are instead compared to ‘storytellers’ or ‘painters’ – their work can be judged subjectively. 

From a technical standpoint, switching from data science to engineering involves re-tooling and re-skilling. You’ll need to learn new tools and pivot your coding knowledge towards engineering. Once you’ve done that, focus on building projects that demonstrate your data engineering ability.


FAQ

Can I switch from data science to data engineering?

Absolutely! Data science and data engineering are allied and overlap considerably at a general level. However, data science leans more on maths and statistics, whereas data engineering leans more on data cleaning, data infrastructure, Python, etc. Data engineering is more infrastructure-focused than the more client-focused data science.

Which is better, data science or data engineering?

It’s impossible to say! Both data science and data engineering are in hot demand right now. For careers and work, both are excellent choices. Data engineering is a little more about making clean data work for outcomes, whereas data science is about animating data to drive insight and action.

Who gets paid more, data scientists or data engineers?

Studies show that both data scientists and data engineers enjoy great salaries. There is no robust difference between the salaries of each. Data science is perhaps the more flexible role at lower levels, but data engineering might scale better to expanding businesses and enterprises.

