DataTalksClub Week 1: Data engineering essentials
Introduction
I have recently enrolled in the DataTalksClub Data Engineering Zoomcamp, so I will be posting a series of blogs detailing what I have learned and the journey along the way - enjoy!
In this post, I'll explain why I decided to start this course and share my experiences from the first week.
Why this course?
- Completely free: The course costs nothing, which is great for those starting out or who are budget conscious.
- Beginner-friendly: The course assumes no prior knowledge of data engineering, although you are expected to know the basics of Python.
- Hands-on learning: You get the opportunity to complete an example project plus weekly homework exercises and code-alongs. Great for building up your GitHub profile!
- Community support: DataTalksClub has a large and active community of learners and professionals. This is so handy if you are stuck on a problem or even if you are looking for career advice.
- Up to date: The content covers modern data engineering tools and is updated regularly.
- Flexible schedule: The course is self-paced, so you can fit it around your existing commitments - ideal for those working full time. The homework is optional, although it does have deadlines.
Week 1 overview: building the basics
The first week focused on setting up essential environments and becoming familiar with the fundamental tools involved in data engineering. The primary areas of focus were:
- Docker: A platform designed to simplify the creation, deployment, and running of applications by using containers. Containers allow developers to package an application with all its dependencies, ensuring consistency across various environments. Please see a previous blog post I did all about Docker!
- PostgreSQL: An open-source relational database system known for its robustness and versatility.
- pgAdmin: An easy-to-use, web-based interface for managing and visualising PostgreSQL databases, making it easier to interact with and query data.
- SQL: The standard language for querying and manipulating relational databases.
- Terraform: An infrastructure as code tool that allows for the automation of infrastructure management. It enables the definition and provisioning of data centre infrastructure using a high-level configuration language.
Key takeaways from week 1
- The Power of Containerisation with Docker: Docker's ability to create isolated environments ensures that applications run uniformly across different systems. Docker provides consistency for developers, which anyone who has ever worked on a team project will know is so important.
- Managing Data with PostgreSQL and pgAdmin: Setting up a PostgreSQL database and interacting with it through pgAdmin provided hands-on experience in data management. The web-based interface of pgAdmin made the whole thing feel really similar to something like MySQL Workbench.
- Attention to Detail is Key: Setting up the environments needed for the course took me longer than expected. The delays were mostly self-made, caused by rushing ahead or not paying attention to small details. Working within the terminal makes these mistakes feel even more frustrating to track down and fix. A lesson learned for next week is to take my time!
An example of the SQL queries done on the pgAdmin web tool
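For readers who want something they can run, here is a minimal sketch of the kind of aggregate query you might type into a SQL tool. It is not taken from the course material: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a database server, and the trips table, its columns, and its rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption:
# chosen only so the example runs without any server or Docker setup).
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, purely for illustration.
conn.execute(
    "CREATE TABLE trips (id INTEGER PRIMARY KEY, pickup_zone TEXT, fare REAL)"
)
conn.executemany(
    "INSERT INTO trips (pickup_zone, fare) VALUES (?, ?)",
    [("Airport", 52.0), ("Downtown", 12.5), ("Airport", 48.0), ("Midtown", 9.0)],
)

# Count trips and average the fare per pickup zone, busiest zone first.
query = """
    SELECT pickup_zone, COUNT(*) AS trip_count, AVG(fare) AS avg_fare
    FROM trips
    GROUP BY pickup_zone
    ORDER BY trip_count DESC, pickup_zone
"""
rows = conn.execute(query).fetchall()

for zone, trip_count, avg_fare in rows:
    print(f"{zone}: {trip_count} trips, average fare {avg_fare:.2f}")

conn.close()
```

The SELECT statement itself uses only standard SQL, so the same query text should also work when pasted into a query window against a PostgreSQL table with matching columns.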
Reflections from a beginner
As is obvious from my other blog posts, I am a beginner in the tech world trying to up-skill as much as I can. Most of my past experience has been in the data analysis and visualisation space, but I am keen to go back a few steps and look at the data engineering side of things in 2025.
I struggled with some of the setup this week, mostly because of my lack of patience. Looking back now whilst writing this blog, I feel like I learned a lot from fixing the errors; it forced me to research more into what I was doing, rather than blindly copying. That is the great thing about coding/data - there is always something to learn and investigate. The feeling of having all systems connected and querying the database correctly was such a relief!
As I progress through the Data Engineering Zoomcamp, I look forward to delving deeper into more advanced topics and sharing my insights. Stay tuned!
For those interested in exploring the course further, more information is available on the DataTalks.Club website.