Mid/Senior Data Engineer
Senior
Warszawa
Remote
Full-time
About the offer
CodiLime is an industry expert in software and network engineering and the first-choice service partner for top global networking hardware providers, software providers, and telecoms. Operating since 2011, the company has more than 250 employees, and its clients include tech startups and large companies across the US, Japan, Israel, and Europe. The project involves building a centralized, large-scale business data platform for a major global consulting firm, combining data from more than 10 sources into a unified dataset of over 300 million company records.
Requirements
- Strong experience with Snowflake and dbt
- Strong SQL skills including query optimization
- Experience with orchestration tools like Apache Airflow, Azure Data Factory, or similar
- Experience with Docker, Kubernetes, and CI/CD practices for data workflows
- Experience working with large-scale datasets
- Very good understanding of data pipeline design concepts and best practices
- Experience with data lake architectures for large-scale data processing and analytics
- Very good Python coding skills, including writing clean, scalable, and testable code covered by unit tests
- Understanding of object-oriented programming (OOP) and the ability to apply it
- Experience with version control (Git)
- Good command of English (C1 level or higher)
Nice to have:
- Experience with Apache Spark (ideally on Azure Databricks)
- Experience with GitHub Actions for CI/CD workflows
- Experience with API Gateway, FastAPI (REST, async)
- Experience with Azure AI Search or AWS OpenSearch
- Familiarity with designing and developing ETL/ELT processes
- Familiarity with LLMs, Azure OpenAI, or Agentic AI systems
Responsibilities
Data Pipeline Development:
- Designing, building, and maintaining scalable, end-to-end data pipelines for ingesting, cleaning, transforming, and integrating large structured and semi-structured datasets
- Optimizing data collection, processing, and storage workflows
- Running periodic data refreshes through the data pipelines
- Building robust ETL infrastructure using SQL-based technologies
- Assisting with data migration to the new platform
- Automating manual workflows and optimizing data delivery
Data Transformation & Modeling:
- Developing data transformation logic using SQL and dbt for Snowflake
- Designing and implementing scalable and high-performance data models
- Creating matching logic to deduplicate and connect entities across multiple sources
- Ensuring data quality, consistency, and performance to support downstream applications
Workflow Orchestration:
- Orchestrating data workflows using Apache Airflow running on Kubernetes
- Monitoring and troubleshooting data pipeline performance and operations
Data Platform & Integration:
- Enabling integration of third-party and pre-cleaned data into a unified schema with rich metadata and hierarchical relationships
- Working with relational (Snowflake, PostgreSQL) and non-relational (Elasticsearch) databases
Software Engineering & DevOps:
- Writing data processing logic in Python
- Applying software engineering best practices including version control (Git), CI/CD pipelines (GitHub Actions), and DevOps workflows
- Ensuring code quality using tools like SonarQube
- Documenting data processes and workflows
- Participating in code reviews
Future-Readiness & Integration:
- Preparing the platform for future integrations such as REST APIs and LLM/agentic AI
- Leveraging Azure-native tools for secure and scalable data operations
- Communicating and collaborating effectively with other developers
- Maintaining project documentation in Confluence
Benefits
- Flexible working hours and work arrangements: fully remote, in the office, or hybrid
- Professional growth supported by internal training sessions and a training budget
- Solid onboarding with a hands-on approach
- A great atmosphere among professionals
- Option to switch to a different project
- Sport subscription
- Training budget
- Private healthcare
- Flat structure
- Small teams
- International projects
- Masterclazz training
- Free coffee
- Bike parking
- Playroom
- Free beverages
- Free lunch
- In-house training
- Modern office
- No dress code