
Azure Databricks and Spark SQL (Python)

Your Hands-On Guide to Databricks Data Engineering with PySpark and Spark SQL, including a 4-Part Course Project

$9.99 (90% OFF)

About This Course

<div>I’m Malvik Vaghadia, a Data Engineer and Architect with nearly 15 years of professional experience. I’m also a recognised Databricks Champion, an honour given to a small global community for deep platform expertise and contribution to the wider ecosystem.</div><div><br></div><div>I’ve worked on multiple large-scale lakehouse implementations and consulted for enterprise clients. As an instructor, I’ve taught 200,000+ students worldwide and hold a 4.6+ instructor rating. Since its launch, this course has become one of Udemy’s best-sellers in the Databricks category, and this new version (Sept 2025) has been completely rebuilt with 17 hours of brand-new content.</div><div><br></div><div>Why Learn Databricks</div><div><br></div><div>Databricks is recognised as a Leader in the Gartner Magic Quadrant for Data &amp; AI platforms. It has become the go-to lakehouse platform for modern data engineering, enabling organisations to build, orchestrate, and optimise pipelines at scale. By mastering Databricks, you’ll be learning one of the most in-demand skills in today’s data landscape.</div><div><br></div><div>Course Delivery Style</div><div><br></div><div>This course balances theory, hands-on coding, and practical projects. Every concept is explained clearly, then demonstrated live in Databricks, and reinforced with a multi-phase, end-to-end project that you’ll build step by step. 
You’ll also get all course notebooks as downloadable materials, containing the full code, step-by-step documentation, and extra resources so you can follow along easily.</div><div><br></div><div><span style="font-size: 1rem;">Curriculum Highlights:</span></div><div><ul><li><span style="font-size: 1rem;">Four-Part Course Project: An end-to-end NYC Taxi project, with further pipeline builds added in each part as your knowledge develops.</span></li><li><span style="font-size: 1rem;">Foundations: What data engineering is, why Databricks, the Spark architecture, PySpark, and the Lakehouse.</span></li><li><span style="font-size: 1rem;">Azure setup: Account creation, resources, role-based access control, naming conventions, and cost management.</span></li><li><span style="font-size: 1rem;">Databricks setup: Creating and configuring a workspace, navigating the UI, and handling personal email restrictions.</span></li><li><span style="font-size: 1rem;">Databricks notebooks and workspace: Markdown, comments, organising objects, mixing languages, and notebook tips.</span></li><li><span style="font-size: 1rem;">Databricks compute: Clusters, DBU pricing, runtimes, serverless vs all-purpose compute, instance pools, and SQL warehouses.</span></li><li><span style="font-size: 1rem;">Spark SQL (Python): Writing Spark SQL code using both SQL syntax and DataFrame APIs, reading/writing different file formats, defining schemas, and managing tables and views.</span></li><li><span style="font-size: 1rem;">PySpark Transformations: Column operations, functions, filtering, sorting, joining, aggregations, pivots, and conditional logic.</span></li><li><span style="font-size: 1rem;">Medallion architecture: Bronze, Silver, and Gold layers explained and implemented.</span></li><li><span style="font-size: 1rem;">Delta Lake: Transaction log, schema enforcement and evolution, time travel, and DML operations (MERGE, UPDATE, DELETE).</span></li><li><span style="font-size: 1rem;">Workflows and jobs: 
Passing parameters, handling failures, concurrency, conditional tasks, and monitoring.</span></li><li><span style="font-size: 1rem;">Git &amp; local development: VS Code setup, linking with GitHub, repos, and workflow best practices.</span></li><li><span style="font-size: 1rem;">Functions and modularisation: Creating and importing Python modules, UDFs, and project structuring.</span></li><li><span style="font-size: 1rem;">Unity Catalog &amp; governance: Metastores, securable objects, workspace roles, external locations, and permissions.</span></li><li><span style="font-size: 1rem;">Streaming &amp; Lakeflow pipelines: Structured Streaming concepts, Auto Loader, watermarking, triggers, and the new Lakeflow (DLT) pipeline model.</span></li><li><span style="font-size: 1rem;">Performance: Lazy evaluation, explain plans, caching, shuffles, broadcast joins, partitioning, Z-ORDER, and Liquid Clustering.</span></li><li><span style="font-size: 1rem;">Automation &amp; CI/CD: Programmatic interaction with Databricks, a CLI demo, and a high-level CI/CD overview.</span></li></ul></div><div><span style="font-size: 1rem;">By the end of the course, you’ll have both the knowledge and the confidence to design, build, and optimise production-grade data pipelines on Databricks.</span></div>

What you'll learn:

  • How to use Databricks to build and run data engineering workflows
  • The principles of the Lakehouse architecture with Delta Lake
  • How to process data with Spark SQL and PySpark
  • Best practices for Databricks compute, jobs, and orchestration
  • How to apply governance with Unity Catalog and manage secure access
  • How to work with streaming pipelines using Structured Streaming and Lakeflow
  • How to apply concepts to real-world projects with modular code and version control
  • Real-world scenarios