Hyderabad, India
Full-time
We are seeking an experienced and detail-oriented Data Integration Engineer to contribute to the development and expansion of a suite of systems and tools, with a primary focus on ETL processes. The ideal candidate will have a deep understanding of modern data engineering concepts and will have shipped or supported code and infrastructure serving a user base in the millions and datasets with billions of records. You will routinely implement features, fix bugs, perform maintenance, consult with product managers, and troubleshoot problems. Changes you make will be accompanied by tests that confirm the desired behavior, and code reviews, in the form of pull requests reviewed by peers, are a regular and expected part of the job.
Key Responsibilities:
ETL Development with Talend:
Architect and build complex ETL pipelines in Talend Data Integration, ensuring scalability, reusability, and maintainability of workflows.
Develop data pipelines or features related to data ingestion, transformation, or storage using Python and relational databases (e.g., PostgreSQL) or cloud-based data warehousing (e.g., BigQuery).
Automate data ingestion from REST APIs, FTP servers, cloud platforms, and relational databases into cloud or on-premises storage (see the Python sketch after this list).
Optimize Talend jobs by using efficient memory settings, parallelization, and dependency injection for high-volume data processing.
Manage Talend deployments using Talend Management Console (TMC) for scheduling, monitoring, and lifecycle management.
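To give a flavor of the ingestion-automation work above, here is a minimal Python sketch of REST-to-cloud-storage ingestion. It is illustrative only: the endpoint URL, bucket name, and object layout are hypothetical placeholders, not part of this role's actual stack.

    import datetime
    import json

    import requests
    from google.cloud import storage  # pip install google-cloud-storage

    API_URL = "https://api.example.com/v1/orders"  # hypothetical endpoint
    BUCKET = "example-raw-zone"                    # hypothetical bucket

    def ingest_to_gcs() -> str:
        """Pull one payload from a REST API and land it raw in GCS."""
        resp = requests.get(API_URL, timeout=30)
        resp.raise_for_status()
        records = resp.json()

        # Partition the landing path by ingestion date for easy downstream loads.
        today = datetime.date.today().isoformat()
        blob_name = f"orders/ingest_date={today}/orders.json"
        storage.Client().bucket(BUCKET).blob(blob_name).upload_from_string(
            json.dumps(records), content_type="application/json"
        )
        return blob_name

A production job would add retries, pagination, and secret management; the point here is simply the REST-to-object-storage pattern.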
BigQuery Data Management:
Build high-performance BigQuery datasets, implementing advanced partitioning (DATE, RANGE) and clustering for cost-effective queries.
Work with JSON and ARRAY data structures, leveraging BigQuery to efficiently nest and unnest objects as required for complex data transformations and analysis (see the sketch below).
Write advanced SQL queries for analytics, employing techniques like window functions, CTEs, and array operations for complex transformations.
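To illustrate the kind of BigQuery work described above, here is a short, hypothetical sketch using the google-cloud-bigquery Python client: a DATE-partitioned, clustered table, and a CTE that unnests an ARRAY of STRUCTs and ranks rows with a window function. The dataset, table, and column names are invented for the example.

    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()

    # DATE partitioning plus clustering keeps scans, and therefore cost, bounded.
    client.query("""
        CREATE TABLE IF NOT EXISTS demo_ds.events (
          event_date DATE,
          user_id    STRING,
          items      ARRAY<STRUCT<sku STRING, qty INT64>>
        )
        PARTITION BY event_date
        CLUSTER BY user_id
    """).result()

    # CTE + UNNEST flattens the nested items; a window function ranks SKUs per user.
    rows = client.query("""
        WITH flat AS (
          SELECT event_date, user_id, item.sku, item.qty
          FROM demo_ds.events, UNNEST(items) AS item
          WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
        )
        SELECT user_id, sku,
               SUM(qty) AS total_qty,
               RANK() OVER (PARTITION BY user_id ORDER BY SUM(qty) DESC) AS sku_rank
        FROM flat
        GROUP BY user_id, sku
    """).result()

    for row in rows:
        print(row.user_id, row.sku, row.total_qty, row.sku_rank)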
Real-time Data Pipelines with Google Pub/Sub and Dataflow:
Implement Pub/Sub topics and subscriptions to manage real-time data ingestion pipelines effectively (a minimal publisher/subscriber sketch follows this list).
Integrate Pub/Sub with Talend for real-time ETL workflows, ensuring low-latency data delivery.
Implement dynamic windowing and triggers for efficient aggregation and event handling.
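For context, below is a minimal, hypothetical publisher/subscriber pair using the google-cloud-pubsub Python client; the project, topic, and subscription IDs are placeholders. A production workflow would hand each message to a Talend or Dataflow stage rather than print it.

    import json
    from concurrent.futures import TimeoutError

    from google.cloud import pubsub_v1  # pip install google-cloud-pubsub

    PROJECT = "example-project"  # hypothetical project ID

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(PROJECT, "orders-events")

    # Publish one event; Pub/Sub payloads are bytes.
    future = publisher.publish(topic_path, json.dumps({"order_id": 42}).encode())
    print("published message", future.result())

    subscriber = pubsub_v1.SubscriberClient()
    sub_path = subscriber.subscription_path(PROJECT, "orders-events-etl")

    def callback(message: pubsub_v1.subscriber.message.Message) -> None:
        # Ack only after the record is safely handed to the ETL step.
        record = json.loads(message.data)
        print("received", record)
        message.ack()

    streaming = subscriber.subscribe(sub_path, callback=callback)
    try:
        streaming.result(timeout=30)  # serve callbacks for 30 s in this demo
    except TimeoutError:
        streaming.cancel()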
PostgreSQL Database Development and Optimization:
Enhance and modify existing PostgreSQL queries and functions.
Write advanced PL/pgSQL functions and triggers for procedural data logic.
Develop materialized views and expression indexes, as needed, to speed up query execution for large datasets.
Monitor and optimize queries using EXPLAIN and EXPLAIN ANALYZE (see the sketch below).
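A small sketch of that tuning loop, using psycopg2; the DSN, the orders table, and its columns are hypothetical. An expression index is created, then EXPLAIN ANALYZE confirms the planner actually uses it.

    import psycopg2  # pip install psycopg2-binary

    conn = psycopg2.connect("dbname=example user=etl")  # hypothetical DSN
    conn.autocommit = True

    with conn.cursor() as cur:
        # Expression index: lets case-insensitive lookups use an index scan.
        cur.execute(
            "CREATE INDEX IF NOT EXISTS idx_orders_email_lower "
            "ON orders (lower(customer_email))"
        )

        # EXPLAIN ANALYZE runs the query and reports the real plan and timings.
        cur.execute(
            "EXPLAIN ANALYZE SELECT * FROM orders "
            "WHERE lower(customer_email) = %s",
            ("alice@example.com",),
        )
        for (plan_line,) in cur.fetchall():
            print(plan_line)

    conn.close()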
Qualifications:
6+ years of professional data engineering experience (equivalent education and/or experience may be considered)
Strong experience with Talend Data Integration for designing and optimizing ETL pipelines
Excellent Python and PostgreSQL development and debugging skills
Experience in data extraction, transformation, and loading (ETL) using Python
Experience working with JSON and ARRAY data structures in BigQuery, including nesting and unnesting
Experience in integrating and optimizing streaming data pipelines in a cloud environment
Experience with deployment tools such as Jenkins to build automated CI/CD pipelines
Hands-on experience with Google Cloud Storage, Pub/Sub, Dataflow, and Dataprep for ETL and real-time data processing
How to Apply:
To apply for this position, click the Apply button and send us your resume.