Job Description

Job Summary:

As a Databricks Lead, you will be a critical member of our data engineering team, responsible for designing, developing, and optimizing our data pipelines and platforms on Databricks, primarily leveraging AWS services. You will play a key role in implementing robust data governance with Unity Catalog and ensuring cost-effective data solutions. This role requires a strong technical leader who can mentor junior engineers, drive best practices, and contribute hands-on to complex data challenges.

Responsibilities:

* Databricks Platform Leadership:

  * Lead the design, development, and deployment of large-scale data solutions on the Databricks platform.

  * Establish and enforce best practices for Databricks usage, including notebook development, job orchestration, and cluster management.

  * Stay abreast of the latest Databricks features and capabilities, recommending and implementing improvements.

* Data Ingestion and Streaming (Kafka):

  * Architect and implement real-time and batch data ingestion pipelines using Apache Kafka for high-volume data streams.

  * Integrate Kafka with Databricks for seamless data processing and analysis.

  * Optimize Kafka consumers and producers for performance and reliability.

* Data Governance and Management (Unity Catalog):

  * Implement and manage data governance policies and access controls using Databricks Unity Catalog.

  * Define and enforce data cataloging, lineage, and security standards within the Databricks Lakehouse.

  * Collaborate with data governance teams to ensure compliance and data quality.

* AWS Cloud Integration:

  * Leverage various AWS services (S3, EC2, Lambda, Glue, etc.) to build a robust and scalable data infrastructure.

  * Manage and optimize AWS resources for Databricks workloads.

  * Ensure secure and compliant integration between Databricks and AWS.

* Cost Optimization:

  * Proactively identify and implement strategies for cost optimization across Databricks and AWS resources.

  * Monitor DBU consumption, cluster utilization, and storage costs, providing recommendations for efficiency gains.

  * Implement autoscaling, auto-termination, and right-sizing strategies to minimize operational expenses.

* Technical Leadership & Mentoring:

  * Provide technical guidance and mentorship to a team of data engineers.

  * Conduct code reviews, promote coding standards, and foster a culture of continuous improvement.

  * Lead technical discussions and decision-making for complex data engineering problems.

* Data Pipeline Development & Optimization:

  * Develop, test, and maintain robust and efficient ETL/ELT pipelines using PySpark/Spark SQL.

  * Optimize Spark jobs for performance, scalability, and resource utilization.

  * Troubleshoot and resolve complex data pipeline issues.

* Collaboration:

  * Work closely with data scientists, analysts, and other engineering teams to understand data requirements and deliver solutions.

  * Communicate technical concepts effectively to both technical and non-technical stakeholders.

Qualifications:

* Bachelor's or Master's degree in Computer Science, Data Engineering, or a related quantitative field.

* 7+ years of experience in data engineering, with at least 3 years in a lead or senior role.

* Proven expertise in designing and implementing data solutions on Databricks.

* Strong hands-on experience with Apache Kafka for real-time data streaming.

* In-depth knowledge and practical experience with Databricks Unity Catalog for data governance and access control.

* Solid understanding of AWS cloud services and their application in data architectures (S3, EC2, Lambda, VPC, IAM, etc.).

* Demonstrated ability to optimize cloud resource usage and implement cost-saving strategies.

* Proficiency in Python and Spark (PySpark/Spark SQL) for data processing and analysis.

* Experience with Delta Lake and other modern data lake formats.

* Excellent problem-solving, analytical, and communication skills.

Added Advantage (Bonus Skills):

* Experience with Apache Flink for stream processing.

* Databricks certifications.

* Experience with CI/CD pipelines for Databricks deployments.

* Knowledge of other cloud platforms (Azure, GCP).

Lead Databricks Engineer Job at Anblicks, Dallas, TX