Senior Data Engineer
No salary information provided
Senior · Full-time · Employment contract (Umowa o pracę) · B2B
#325566 · Added 20 days ago · 26
Source: theprotocol.it
Tech Stack / Keywords
SQL, PostgreSQL, Delta Lake, PySpark, Databricks Workflows, Unity Catalog, Python, Apache Airflow, Kafka, Microsoft Azure, Claude Code, Windows
Company and position
Webellian is a well-established digital transformation and IT consulting company committed to creating a positive impact for our clients. We are driven by shared values, strong principles, and a passion for innovative and disruptive technologies. We are a community of truly passionate engineers and senior advisors who work with our clients across industries, playing a deep and meaningful role in accelerating and realizing their vision and strategy.
Requirements
- 6+ years of professional data engineering experience, with a strong track record of delivering production data pipelines at scale.
- Expert-level SQL and strong PostgreSQL expertise: advanced query optimisation, schema design, indexing, partitioning, and understanding of MVCC and connection management.
- Strong Databricks experience: Delta Lake, PySpark, Databricks Workflows, Unity Catalog, and performance tuning of large-scale Spark jobs.
- Proficiency in Python for data pipeline development: pandas, PySpark, data validation libraries (Great Expectations or equivalent), and scripting for automation.
- Experience with data orchestration frameworks: Apache Airflow, Databricks Workflows, or equivalent DAG-based scheduling tools.
- Solid understanding of data integration patterns: CDC with Debezium or equivalent, Kafka-based event streaming, and batch ingestion strategies.
- Hands-on experience with data lakehouse architecture: medallion architecture (Bronze/Silver/Gold), Delta Lake ACID transactions, and table optimisation.
- Experience implementing data quality frameworks and data contracts in production pipelines.
- Familiarity with Azure data services: Azure Data Factory, Azure Event Hubs, Azure Data Lake Storage, or equivalent cloud-native data tooling.
- Hands-on proficiency with Claude Code: using it daily for pipeline development, SQL authoring, data exploration, and documentation tasks.
- Strong communication skills: able to collaborate with data consumers (ML Engineers, analysts, product teams) to understand requirements and deliver reliable data products.
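The data-quality and data-contract requirement above can be illustrated with a minimal sketch (pure Python, standard library only; the contract rules, column names, and sample records are hypothetical — in production a library such as Great Expectations would formalise these checks):

```python
# Minimal data-contract check, illustrating the kind of validation a
# framework like Great Expectations formalises. All rule names and the
# sample records below are hypothetical.

def validate_batch(records, contract):
    """Return a list of (row_index, column, reason) violations for a batch of row dicts."""
    violations = []
    for i, row in enumerate(records):
        for column, rules in contract.items():
            value = row.get(column)
            # not_null rule: the column must carry a value
            if rules.get("not_null") and value is None:
                violations.append((i, column, "null value"))
            # min rule: numeric lower bound, skipped for missing values
            if "min" in rules and value is not None and value < rules["min"]:
                violations.append((i, column, f"below min {rules['min']}"))
    return violations

# Hypothetical contract for an orders feed.
contract = {
    "order_id": {"not_null": True},
    "amount": {"not_null": True, "min": 0},
}

batch = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": None, "amount": -5.0},  # violates both rules
]

print(validate_batch(batch, contract))
# → [(1, 'order_id', 'null value'), (1, 'amount', 'below min 0')]
```

In a real pipeline such checks would run as a pipeline stage (e.g. between Bronze and Silver tables), with violations feeding the automated alerting the role describes.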
Responsibilities
- Design and build scalable data pipelines for ingestion, transformation, and serving of structured and unstructured data — supporting both batch and real-time AI workloads.
- Develop and maintain Databricks-based data processing workflows: Delta Lake table management, PySpark transformations, notebook orchestration, and Unity Catalog governance.
- Architect and optimise PostgreSQL data models: schema design, indexing strategies, partitioning, query performance tuning, and integration patterns for AI service consumption.
- Build and maintain data orchestration workflows using Apache Airflow, Databricks Workflows, or equivalent — ensuring reliable scheduling, dependency management, and failure recovery.
- Implement data quality frameworks: validation rules, anomaly detection, data contracts, and automated alerting on pipeline health and data freshness.
- Design and manage feature engineering pipelines: transforming raw data into ML-ready feature sets, integrating with feature stores, and versioning feature definitions.
- Own data integration patterns between operational PostgreSQL databases and the Databricks lakehouse: CDC (Change Data Capture), event-driven ingestion via Kafka, and batch export strategies.
- Implement data governance standards: lineage tracking, cataloguing, access control, PII handling, data retention policies, and audit logging.
- Collaborate with ML Engineers to design and deliver data pipelines supporting model training, batch inference, and real-time feature serving.
- Monitor and operate data infrastructure: pipeline observability dashboards, SLA tracking, incident response, and root-cause analysis for data issues.
- Champion Claude Code as an active daily tool for pipeline development, SQL generation, data exploration scripting, and documentation.
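The DAG-based orchestration these responsibilities describe comes down to dependency-ordered task execution with reliable scheduling on top. A minimal standard-library sketch of the ordering itself (task names are hypothetical, following the Bronze/Silver/Gold convention mentioned above; Airflow or Databricks Workflows add scheduling, retries, and failure recovery around this core idea):

```python
# Toy DAG ordering, illustrating the dependency resolution that tools
# like Apache Airflow or Databricks Workflows provide. Task names are
# hypothetical medallion-style pipeline stages.
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
dag = {
    "bronze_ingest": set(),
    "silver_clean": {"bronze_ingest"},
    "gold_aggregate": {"silver_clean"},
    "publish_features": {"gold_aggregate"},
}

# static_order() yields tasks so every dependency runs before its dependents.
order = list(TopologicalSorter(dag).static_order())
print(order)
# → ['bronze_ingest', 'silver_clean', 'gold_aggregate', 'publish_features']
```

Real orchestrators express the same graph declaratively (Airflow operators with `>>` dependencies, Databricks Workflows task lists) and layer on retries, SLAs, and alerting.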
Offer
- Contract under Polish law: B2B or Umowa o Pracę
- Benefits such as private medical care, group insurance, Multisport card
- English classes available
- Hybrid work (at least 1 day/week on-site) in Warsaw (Mokotów)
- Opportunity to work with excellent professionals
- High standards of work and focus on the quality of code
- New technologies in use
- Continuous learning and growth
- International team
- Pinball, PlayStation & much more (on-site)
- Life insurance
- Remote work opportunities
- Fruits, coffee/tea, and drinks
- Parking space for employees
- Leisure zone
Flexible working hours
Leisure package
Team events
Webellian
44 active job offers