#299621 · Source: nofluffjobs.com
CodiLime

Mid/Senior Data Engineer

16 500 - 26 500 PLN (normalized)
Experience

Senior

Location

Warszawa

Work mode

Remote

Hours

Full-time

Snowflake · dbt · SQL · Apache Airflow · Azure Data Factory · Docker · Kubernetes · CI/CD · Data Lake · OOP · Git · Apache Spark · Azure Databricks · GitHub Actions · API Gateway · FastAPI · REST API · Azure AI Search · AWS OpenSearch · ETL · ELT

About the offer

CodiLime is a software and network engineering industry expert and the first-choice service partner for top global networking hardware providers, software providers, and telecoms. The company has 250+ employees and has been operating since 2011. Their clients include tech startups and large companies across the US, Japan, Israel, and Europe. The project involves building a centralized, large-scale business data platform for a major global consulting firm, combining data from over 10 sources into a unified dataset with over 300 million company records.

Requirements

  • Strong experience with Snowflake and dbt
  • Strong SQL skills including query optimization
  • Experience with orchestration tools like Apache Airflow, Azure Data Factory, or similar
  • Experience with Docker, Kubernetes, and CI/CD practices for data workflows
  • Experience working with large-scale datasets
  • Very good understanding of data pipeline design concepts and best practices
  • Experience with data lake architectures for large-scale data processing and analytics
  • Very good coding skills in Python including clean, scalable, and testable code with unit tests
  • Understanding and applying object-oriented programming (OOP)
  • Experience with version control systems: Git
  • Good knowledge of English (minimum C1 level)

Nice to have:

  • Experience with Apache Spark (ideally on Azure Databricks)
  • Experience with GitHub Actions for CI/CD workflows
  • Experience with API Gateway, FastAPI (REST, async)
  • Experience with Azure AI Search or AWS OpenSearch
  • Familiarity with designing and developing ETL/ELT processes
  • Familiarity with LLMs, Azure OpenAI, or Agentic AI systems

Responsibilities

Data Pipeline Development:

  • Designing, building, and maintaining scalable, end-to-end data pipelines for ingesting, cleaning, transforming, and integrating large structured and semi-structured datasets
  • Optimizing data collection, processing, and storage workflows
  • Conducting periodic data refresh processes through data pipelines
  • Building a robust ETL infrastructure using SQL technologies
  • Assisting with data migration to the new platform
  • Automating manual workflows and optimizing data delivery
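The ingestion-cleaning-loading duties above can be sketched as a minimal extract-transform-load flow. This is an illustrative sketch only: the function names, record fields, and the dict standing in for a warehouse are assumptions, not part of the actual platform.

```python
# Minimal ETL sketch: ingest semi-structured records, clean them,
# and load into a keyed store. All names and fields are illustrative.
import json

def extract(raw_lines):
    """Parse newline-delimited JSON records, skipping malformed lines."""
    records = []
    for line in raw_lines:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # a real pipeline would log/quarantine bad rows
    return records

def transform(records):
    """Normalize company names and drop records missing an id."""
    cleaned = []
    for rec in records:
        if "id" not in rec:
            continue
        rec["name"] = rec.get("name", "").strip().upper()
        cleaned.append(rec)
    return cleaned

def load(records, store):
    """Upsert records into a dict keyed by id (stand-in for a warehouse)."""
    for rec in records:
        store[rec["id"]] = rec
    return store

raw = ['{"id": 1, "name": " acme corp "}', 'not json', '{"name": "no id"}']
store = load(transform(extract(raw)), {})
```

In a production pipeline each stage would be a separately schedulable, idempotent task so that periodic refreshes can re-run safely.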

Data Transformation & Modeling:

  • Developing data transformation logic using SQL and dbt for Snowflake
  • Designing and implementing scalable and high-performance data models
  • Creating matching logic to deduplicate and connect entities across multiple sources
  • Ensuring data quality, consistency, and performance to support downstream applications
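The entity-matching bullet above could look roughly like the following toy sketch, which pairs company records from two sources by fuzzy name similarity. The similarity threshold, the suffix list, and the quadratic pairwise comparison are all assumptions for illustration; a real system for 300M+ records would use blocking/indexing rather than comparing every pair.

```python
# Toy entity-matching sketch: connect company records across two sources
# by fuzzy name similarity. Threshold and normalization are assumptions.
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase and strip common legal suffixes before comparison."""
    name = name.lower().strip()
    for suffix in (" inc", " ltd", " gmbh", " sp. z o.o."):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name

def match_entities(source_a, source_b, threshold=0.85):
    """Pair records whose normalized names are similar enough."""
    pairs = []
    for a in source_a:
        for b in source_b:
            score = SequenceMatcher(
                None, normalize(a["name"]), normalize(b["name"])
            ).ratio()
            if score >= threshold:
                pairs.append((a["id"], b["id"]))
    return pairs

a = [{"id": "a1", "name": "Acme Inc"}, {"id": "a2", "name": "Globex"}]
b = [{"id": "b1", "name": "ACME"}, {"id": "b2", "name": "Initech Ltd"}]
pairs = match_entities(a, b)
```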

Workflow Orchestration:

  • Orchestrating data workflows using Apache Airflow running on Kubernetes
  • Monitoring and troubleshooting data pipeline performance and operations
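Airflow itself is too heavy for a snippet, but the core idea it implements for the workflows above (declaring task dependencies as a DAG and executing them in topological order) can be sketched with Python's stdlib `graphlib`. The task names are illustrative, not the project's actual DAG.

```python
# DAG-style orchestration sketch using stdlib graphlib, standing in for
# what Airflow does at much larger scale. Task names are illustrative.
from graphlib import TopologicalSorter

# task -> set of tasks it depends on (a simple linear pipeline here)
dag = {
    "extract": set(),
    "clean": {"extract"},
    "transform": {"clean"},
    "load": {"transform"},
    "refresh_marts": {"load"},
}

# static_order() yields tasks so every dependency runs before its dependents
execution_order = list(TopologicalSorter(dag).static_order())
```

An orchestrator adds scheduling, retries, and monitoring on top of this ordering; on Kubernetes, each task typically runs as its own pod.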

Data Platform & Integration:

  • Enabling integration of 3rd-party and pre-cleaned data into a unified schema with rich metadata and hierarchical relationships
  • Working with relational (Snowflake, PostgreSQL) and non-relational (Elasticsearch) databases

Software Engineering & DevOps:

  • Writing data processing logic in Python
  • Applying software engineering best practices including version control (Git), CI/CD pipelines (GitHub Actions), and DevOps workflows
  • Ensuring code quality using tools like SonarQube
  • Documenting data processes and workflows
  • Participating in code reviews

Future-Readiness & Integration:

  • Preparing the platform for future integrations such as REST APIs and LLM/agentic AI
  • Leveraging Azure-native tools for secure and scalable data operations
  • Communicating and collaborating effectively with other developers
  • Maintaining project documentation in Confluence

Benefits

  • Flexible working hours and approach to work: fully remote, in the office, or hybrid
  • Professional growth supported by internal training sessions and a training budget
  • Solid onboarding with a hands-on approach
  • A great atmosphere among professionals
  • Ability to change the project you work on
  • Sport subscription
  • Training budget
  • Private healthcare
  • Flat structure
  • Small teams
  • International projects
  • Masterclazz training
  • Free coffee
  • Bike parking
  • Playroom
  • Free beverages
  • Free lunch
  • In-house trainings
  • Modern office
  • No dress code