[online recruitment]
You’ll be working with various ETL (Extract, Transform, Load) pipelines, built mainly on Apache Spark in AWS, for batch and streaming processing, and communicating daily with colleagues based locally and abroad. The project covers a range of activities, including design, technical supervision, and development. You should be comfortable discussing chosen technical solutions with the customer, assessing design risks and flaws, and resolving issues. We expect you to coordinate technical work among Big Data developers, QAs, and DevOps engineers to ensure appropriate code quality, continuous integration, and continuous delivery in line with software development best practices.
Example project pipelines:
- Spark pipeline that cleanses a dataset for further analysis by other teams' ML algorithms. Deployed on AWS EMR using CDK; runs daily and processes ~1 TB of data
- Workflow that pulls pre-processed data from a Teradata database, hashes sensitive data, and stores the partitioned data on S3 in Parquet format. Runs daily as an Oozie workflow on an on-premises server
- Hourly Spark pipeline that normalizes a highly dynamic JSON dataset so that it conforms to SQL standards. Deployed on AWS EMR using CDK
- Daily Spark pipeline that ingests data into a Redshift database for BA analysis. The dataset is extremely dynamic (anywhere from 1k to 7k columns per day), so it won’t fit a classic SQL database out of the box
Responsibilities:
- Gathering requirements from users (Data Analytics) and designing and implementing ETL pipelines from scratch
- Supporting existing workflows and migrating them from on-premises to AWS
- Understanding of Data Lake and Data Warehouse concepts
- Analysis of business requirements and task estimation
- Refactoring of the existing application; defect fixing
- Design and implementation of ETL pipelines from scratch on distributed systems
- Support for PROD releases; release documentation
- Release automation using AWS CDK and CloudFormation Templates
Requirements:
- 5+ years' experience in Java/Scala software development
- 2+ years' experience with AWS
Tech stack
- Programming languages: Scala, Java
- Hadoop: Spark, Kafka, YARN, HDFS, Oozie (Spark is the core of the work, so Scala knowledge is required)
- AWS: EMR, S3, Redshift, Glue, Athena, CloudFormation, CDK, CloudWatch, Secrets Manager
- Non-AWS: Couchbase, Elasticsearch
- Data formats: Parquet, Avro, JSON
Benefits
- International projects for clients all over the world
- Competitive salary
- Individual development plan
- Managerial Targeted Training programs
- BRIDGE Mentoring Program
- Luxoft Training Center
- Language Classes
- Self-learning online library
- Global Relocation Program
- Internal Mobility (a chance to gain experience in varied projects and technologies)
- Professional communities for knowledge-sharing (Agile, Tech, Business)
- Group Life Insurance
- Travel Insurance
- Private Healthcare (dental care, unlimited consultations with specialist physicians)
- Reimbursement of medical costs for employees
- Benefit Program (Cafeteria and Multisport Card)
- LuxGood Program (a wide range of health and well-being initiatives)
- After-hours groups (sport, trips, board games, cultural activities)
- Company and Team events
- BeLux: discount offers program (banking, car leasing, and more)
- Convenient locations in modern offices