Embracing Large Language Models in Data Engineering: An Optimistic Perspective
A field in flux.
Data engineering has changed a lot over the past few years. Amazon Redshift started the competition in cloud data warehousing in 2012, followed by Snowflake and Databricks. About six years ago, Maxime Beauchemin wrote an important article about the role of data engineering. More recently, dbt has become popular for letting people build analytics solutions the way a software engineer would. We can expect more changes in the near future, and people working in this field will need to keep learning. Since the end of 2022, people have been talking a lot about ChatGPT and Large Language Models (LLMs), wondering about their impact. In this article, I share my thoughts on how widespread use of LLM technology might affect data engineers.
AI is the UI, and it’s going to democratize data querying.
The Stack Overflow blog post “AI Isn’t the App, It’s the UI” argues that interfaces like ChatGPT are a great way to access apps and databases. People can ask for what they need in everyday language, while a solid system behind the scenes provides the action or information. This way, we keep the benefits of LLMs while reducing some of the uncertainty of using them.
Tools like ChatGPT could change the game by making database queries as simple as asking a question. They could be the answer to self-service Business Intelligence: people won’t need to know SQL or wait for data analysts to answer their questions. They can just ask a chatbot for the information they need for their work. And if we get tools that create visualizations from code (which I think will happen soon), an AI assistant could even add nice-looking, relevant graphs to the data on request.
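To make the "AI is the UI" idea a bit more concrete, here is a minimal sketch in Python. It assumes a setup the article does not prescribe: the language model only picks one of a few vetted queries, and the database does the actual work. The table, the query names, and the `choose_query` function are hypothetical, and `choose_query` uses simple keyword matching as a stand-in for a real LLM call so the example runs on its own.

```python
import sqlite3

# Vetted queries maintained by the data team (hypothetical names and schema).
VETTED_QUERIES = {
    "revenue_by_region": (
        "SELECT region, SUM(amount) AS revenue "
        "FROM orders GROUP BY region ORDER BY region"
    ),
    "order_count": "SELECT COUNT(*) AS order_count FROM orders",
}


def choose_query(question: str) -> str:
    """Stand-in for the LLM: map a plain-language question to a vetted query name.

    A real system would ask a language model to make this choice. Keyword
    matching keeps the sketch runnable without any external service.
    """
    text = question.lower()
    if "revenue" in text and "region" in text:
        return "revenue_by_region"
    if "how many" in text and "order" in text:
        return "order_count"
    raise ValueError(f"No vetted query matches: {question!r}")


def answer(conn: sqlite3.Connection, question: str) -> list:
    """Run the vetted SQL behind a question and return the result rows."""
    sql = VETTED_QUERIES[choose_query(question)]
    return conn.execute(sql).fetchall()


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a tiny in-memory table that plays the role of the warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("EMEA", 100.0), ("EMEA", 50.0), ("APAC", 75.0)],
    )
    return conn


if __name__ == "__main__":
    warehouse = build_demo_warehouse()
    print(answer(warehouse, "What is our revenue by region?"))
    print(answer(warehouse, "How many orders do we have?"))
```

The point of the design is that the uncertain part (understanding the question) is kept separate from the reliable part (the SQL that runs). A question that matches no vetted query fails loudly instead of producing a guessed answer.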
AI will further fuel digitalization.
There’s been a lot of talk about how ChatGPT can write code. LLMs won’t completely replace software engineers or turn software engineering into a job anyone can do. But if they help software engineers do their jobs better, as expected, then making software will get easier and less expensive. This could help companies digitize business processes that are currently too costly to put into software.
This would also be great for data engineers. More apps in a company means more data, more possible insights, and more pipelines to build and keep aligned with the data warehouse. I also predict that cheaper cloud services, thanks to more efficient hardware lowering computing and storage costs, will encourage more digitization in businesses. Lower project costs are another advantage for digitization and data engineering.
The data warehouse will become even more valuable.
Both of these changes are good news for data engineers. As data becomes more abundant and easier to access, a good, up-to-date data warehouse becomes very valuable for a company. After all, your chatbot needs to get the data from somewhere, right? The reliable SQL-based data warehouse might be the best answer: the technology is easy to use, and lots of people know SQL. Data engineers will use a mix of modern tools like Airbyte or Fivetran, dbt, Git, and a cost-effective cloud database, together with their business knowledge, to build well-designed data warehouses that are very beneficial for a company. Well-made datasets with correct, consistent naming will help natural language queries return correct results every time. This is how data engineers will be valuable: they’ll combine data from different company apps into reliable, accurate, and timely datasets that reflect the company’s business logic and strategic goals.
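As a small illustration of what "named correctly and consistently" could mean in practice, here is a sketch of an automated naming check in Python. The snake_case convention, the example schema, and the `find_naming_violations` function are assumptions made for this example, not something a particular tool ships with.

```python
import re

# Assumed convention: lowercase snake_case for table and column names.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def find_naming_violations(schema: dict) -> list:
    """Return table and column names that do not follow snake_case.

    `schema` maps each table name to a list of its column names.
    Columns are reported as "table.column", tables by name alone.
    """
    violations = []
    for table, columns in schema.items():
        if not SNAKE_CASE.fullmatch(table):
            violations.append(table)
        for column in columns:
            if not SNAKE_CASE.fullmatch(column):
                violations.append(f"{table}.{column}")
    return violations


if __name__ == "__main__":
    # Hypothetical warehouse schema with two inconsistent names.
    example_schema = {
        "dim_customer": ["customer_id", "CustomerName"],
        "FactOrders": ["order_id", "customer_id"],
    }
    print(find_naming_violations(example_schema))
```

A check like this could run in the same Git-based workflow as the rest of the warehouse code, so inconsistent names are caught before a chatbot, or a colleague, has to guess what a column means.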
Conclusion.
I now give the word to ChatGPT to write the conclusion of this article on its own:
“As we move towards a future where AI becomes more tightly integrated with the way businesses operate, data engineers will play a vital role in connecting the dots between these transformative tools and the vast amounts of data being generated. Embracing LLM technology not only offers a multitude of opportunities for data engineers to thrive but also serves as a catalyst for the ever-increasing value of a well-structured, insightful data warehouse.”
A bit generic maybe, but the gist of it is correct and writing it myself would be a hassle, so I’ll leave it at that. 🤖
Further Reading
- For those who aren’t data engineers but are still worried about how LLMs might affect their jobs, this article from The Economist is extremely well written and looks beyond the knee-jerk reactions to the technology’s promise for transforming labor markets: Your job is (probably) safe from artificial intelligence | The Economist
- Seriously, the future is bright for data practitioners. 70% of Fortune 500 companies now have a C-level role dedicated to data. 75% of these roles have existed for less than five years. CIOs, Meet Your New Colleagues: Chief Data, Analytics and AI Officers – WSJ
- The people over at dbt Labs have more or less come to the same conclusion, and have fleshed out the ideas touched upon in this article: Analytics Intelligence Everywhere – by Jason Ganz (getdbt.com)