
Azure Databricks and Spark SQL (Python)

Your Hands-On Guide to Databricks Data Engineering with PySpark and Spark SQL, including a 4-Part Course Project

$9.99 (90% OFF)

About This Course

I’m Malvik Vaghadia, a Data Engineer and Architect with nearly 15 years of professional experience. I’ve worked on multiple large-scale lakehouse implementations and consulted for enterprise clients. As an instructor, I’ve taught 200,000+ students worldwide and hold a 4.6+ instructor rating. Since launching this course, it has become one of Udemy’s best-sellers in the Databricks category, and this new version (Jan 2026) has been completely rebuilt with 17 hours of brand-new content.

Why Learn Databricks

Databricks is recognised as a Leader in the Gartner Magic Quadrant for Data & AI platforms. It has become the go-to lakehouse platform for modern data engineering, enabling organisations to build, orchestrate, and optimise pipelines at scale. By mastering Databricks, you’ll be learning one of the most in-demand skills in today’s data landscape.

Course Delivery Style

This course is designed with the right balance of theory, hands-on coding, and practical projects. Every concept is explained clearly, then demonstrated live in Databricks, and reinforced with a multi-phase, end-to-end project that you’ll build step by step. You’ll also get all course notebooks as downloadable materials, containing the full code, step-by-step documentation, and extra resources so you can follow along easily.

Curriculum Highlights:

  • Four-Part Course Project: An end-to-end NYC Taxi project and further pipeline builds, developed across multiple parts as your knowledge grows.
  • Foundations: What data engineering is, why Databricks, the Spark architecture, PySpark, and the Lakehouse.
  • Azure setup: Account creation, resources, role-based access control, naming conventions, and cost management.
  • Databricks setup: Creating and configuring a workspace, navigating the UI, and handling personal email restrictions.
  • Databricks notebooks and workspace: Markdown, comments, organising objects, mixing languages, and notebook tips.
  • Databricks compute: Clusters, DBU pricing, runtimes, serverless vs all-purpose compute, instance pools, and SQL warehouses.
  • Spark SQL (Python): Writing Spark SQL code using both SQL syntax and the DataFrame API, reading and writing different file formats, defining schemas, and managing tables and views.
  • PySpark transformations: Column operations, functions, filtering, sorting, joining, aggregations, pivots, and conditional logic.
  • Medallion architecture: Bronze, Silver, and Gold layers explained and implemented.
  • Delta Lake: The transaction log, schema enforcement and evolution, time travel, and DML operations (MERGE, UPDATE, DELETE).
  • Workflows and jobs: Passing parameters, handling failures, concurrency, conditional tasks, and monitoring.
  • Git and local development: VS Code setup, linking with GitHub, repos, and workflow best practices.
  • Functions and modularisation: Creating and importing Python modules, UDFs, and project structuring.
  • Unity Catalog and governance: Metastores, securable objects, workspace roles, external locations, and permissions.
  • Streaming and Lakeflow pipelines: Structured Streaming concepts, Auto Loader, watermarking, triggers, and the new Lakeflow (DLT) pipeline model.
  • Performance: Lazy evaluation, explain plans, caching, shuffles, broadcast joins, partitioning, Z-ORDER, and Liquid Clustering.
  • Automation and CI/CD: Programmatic interaction with Databricks, a CLI demo, and a high-level CI/CD overview.

By the end of the course, you’ll have both the knowledge and confidence to design, build, and optimise production-grade data pipelines on Databricks.
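The Delta Lake DML section covers statements like MERGE. As a rough sketch of the kind of upsert you'll write there (the table and column names here are hypothetical, not from the course materials), merging a staging table into a Silver-layer Delta table might look like:

```sql
-- Upsert new and changed trip records from a staging table
-- into a Silver Delta table (names are illustrative).
MERGE INTO silver.trips AS target
USING staging.trips_updates AS source
ON target.trip_id = source.trip_id
WHEN MATCHED THEN
  UPDATE SET *
WHEN NOT MATCHED THEN
  INSERT *
```

Because Delta Lake records each MERGE atomically in the transaction log, a failed run leaves the table unchanged, which is what makes this pattern safe for repeated pipeline runs.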

What you'll learn:

  • How to use Databricks to build and run data engineering workflows
  • The principles of the Lakehouse architecture with Delta Lake
  • How to process data with Spark SQL and PySpark
  • Best practices for Databricks compute, jobs, and orchestration
  • How to apply governance with Unity Catalog and manage secure access
  • Working with streaming pipelines using Structured Streaming and Lakeflow
  • Applying concepts to real-world projects with modular code and version control
  • How to apply these skills to real-world scenarios