Azure Databricks and Spark SQL (Python)


About This Course

<div>I’m Malvik Vaghadia, a Data Engineer and Architect with nearly 15 years of professional experience. I’m also a recognised Databricks Champion, an honour given to a small global community for deep platform expertise and contribution to the wider ecosystem.</div><div><br></div><div>I’ve worked on multiple large-scale lakehouse implementations and consulted for enterprise clients. As an instructor, I’ve taught 200,000+ students worldwide and hold a 4.6+ instructor rating. Since its launch, this course has become one of Udemy’s best-sellers in the Databricks category, and this new version (Sept 2025) has been completely rebuilt with 17 hours of brand-new content.</div><div><br></div><div>Why Learn Databricks</div><div><br></div><div>Databricks is recognised as a Leader in the Gartner Magic Quadrant for Data & AI platforms. It has become the go-to lakehouse platform for modern data engineering, enabling organisations to build, orchestrate, and optimise pipelines at scale. By mastering Databricks, you’ll be learning one of the most in-demand skills in today’s data landscape.</div><div><br></div><div>Course Delivery Style</div><div><br></div><div>This course is designed with the right balance of theory, hands-on coding, and practical projects. Every concept is explained clearly, then demonstrated live in Databricks, and reinforced with a multi-phase, end-to-end project that you’ll build step by step. 
You’ll also get all course notebooks as downloadable materials, containing the full code, step-by-step documentation, and extra resources so you can follow along easily.</div><div><br></div><div><span style="font-size: 1rem;">Curriculum Highlights:</span></div><div><ul><li><span style="font-size: 1rem;">Four Part Course Project: End-to-end NYC Taxi project and further pipeline builds across multiple parts as you develop your knowledge.</span></li><li><span style="font-size: 1rem;">Foundations: What data engineering is, why Databricks, the Spark architecture, PySpark, and the Lakehouse.</span></li><li><span style="font-size: 1rem;">Azure setup: Account creation, resources, role-based access control, naming conventions, and cost management.</span></li><li><span style="font-size: 1rem;">Databricks setup: Creating and configuring a workspace, navigating the UI, and handling personal email restrictions.</span></li><li><span style="font-size: 1rem;">Databricks notebooks and workspace: Markdown, comments, organising objects, mixing languages, and notebook tips.</span></li><li><span style="font-size: 1rem;">Databricks compute: Clusters, DBU pricing, runtimes, serverless vs all-purpose compute, instance pools, and SQL warehouses.</span></li><li><span style="font-size: 1rem;">Spark SQL (Python): Writing Spark SQL code using both SQL syntax and DataFrame APIs, reading/writing different file formats, defining schemas, and managing tables and views.</span></li><li><span style="font-size: 1rem;">PySpark Transformations: Column operations, functions, filtering, sorting, joining, aggregations, pivots, and conditional logic.</span></li><li><span style="font-size: 1rem;">Medallion architecture: Bronze, Silver, and Gold layers explained and implemented.</span></li><li><span style="font-size: 1rem;">Delta Lake: Transaction log, schema enforcement and evolution, time travel, and DML operations (MERGE, UPDATE, DELETE).</span></li><li><span style="font-size: 1rem;">Workflows and jobs: 
Passing parameters, handling failures, concurrency, conditional tasks, and monitoring.</span></li><li><span style="font-size: 1rem;">Git & local development: VS Code setup, linking with GitHub, repos, and workflow best practices.</span></li><li><span style="font-size: 1rem;">Functions and modularization: Creating and importing Python modules, UDFs, and project structuring.</span></li><li><span style="font-size: 1rem;">Unity Catalog & governance: Metastores, securable objects, workspace roles, external locations, and permissions.</span></li><li><span style="font-size: 1rem;">Streaming & Lakeflow pipelines: Structured Streaming concepts, Auto Loader, watermarking, triggers, and the new Lakeflow (DLT) pipeline model.</span></li><li><span style="font-size: 1rem;">Performance: Lazy evaluation, explain plans, caching, shuffles, broadcast joins, partitioning, Z-ORDER, and Liquid Clustering.</span></li><li><span style="font-size: 1rem;">Automation & CI/CD: Programmatic interaction with Databricks, CLI demo, and high-level CI/CD overview.</span></li></ul></div><div><span style="font-size: 1rem;">By the end of the course, you’ll have both the knowledge and confidence to design, build, and optimise production-grade data pipelines on Databricks.</span></div>
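To give a feel for the medallion architecture named in the curriculum, here is a minimal sketch of the Bronze, Silver, and Gold idea. It is not taken from the course notebooks: it uses plain Python lists and dictionaries instead of Spark DataFrames and Delta tables so that it runs anywhere, and the trip records, field names, and the `to_silver` / `to_gold` helpers are all hypothetical.

```python
from collections import defaultdict

# Bronze: hypothetical raw trip records, kept exactly as they arrived.
bronze = [
    {"trip_id": "1", "zone": "Midtown", "fare": "12.50"},
    {"trip_id": "2", "zone": "midtown ", "fare": "8.00"},
    {"trip_id": "3", "zone": "Harlem", "fare": ""},        # missing fare
    {"trip_id": "2", "zone": "midtown ", "fare": "8.00"},  # duplicate row
]


def to_silver(rows):
    """Clean Bronze rows: skip missing fares and duplicate ids, fix types."""
    seen, silver = set(), []
    for row in rows:
        if not row["fare"] or row["trip_id"] in seen:
            continue
        seen.add(row["trip_id"])
        silver.append({
            "trip_id": int(row["trip_id"]),
            "zone": row["zone"].strip().title(),
            "fare": float(row["fare"]),
        })
    return silver


def to_gold(rows):
    """Aggregate Silver rows into a reporting view: total fare per zone."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["zone"]] += row["fare"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

On Databricks each layer would typically be stored as its own table rather than an in-memory list, but the flow is the same: raw data lands untouched, is cleaned once, and is then aggregated for reporting.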

What you'll learn:

How to use Databricks to build and run data engineering workflows
The principles of the Lakehouse architecture with Delta Lake
How to process data with Spark SQL and PySpark
Best practices for Databricks compute, jobs, and orchestration
How to apply governance with Unity Catalog and manage secure access
Working with streaming pipelines using Structured Streaming and Lakeflow
Applying concepts to real-world projects with modular code and version control
Real World Scenarios
