Remote

Data Integration Engineer

O'Reilly Media Inc
United States
Jan 30, 2025
Description
About O'Reilly Media
O'Reilly's mission is to change the world by sharing the knowledge of innovators. For over 40 years, we've inspired companies and individuals to do new things, and do things better, by providing them with the skills and understanding that are necessary for success.
At the heart of our business is a unique network of experts and innovators who share their knowledge through us. O'Reilly Learning offers exclusive live training, interactive learning, a certification experience, books, videos, and more, making it easier for our customers to develop the expertise they need to get ahead. And our books have been heralded for decades as the definitive place to learn about the technologies that are shaping the future. Everything we do is to help professionals from a variety of fields learn best practices and discover emerging trends that will shape the future of the tech industry.
Our customers are hungry to build the innovations that propel the world forward. And we help you do just that.
Learn more: https://www.oreilly.com/about/

Diversity
At O'Reilly, we believe that true innovation depends on hearing from, and listening to, people with a variety of perspectives. We want our whole organization to recognize, include, and encourage people of all races, ethnicities, genders, ages, abilities, religions, sexual orientations, and professional roles.
Learn more: https://www.oreilly.com/diversity

About the Team
Our data platform team is dedicated to establishing a robust data infrastructure, facilitating easy access to quality, reliable, and timely data for reporting, analytics, and actionable insights. We focus on designing and building a sustainable and scalable data architecture, treating data as a core corporate asset. Our efforts also include process improvement, governance enhancement, and addressing application, functional, and reporting needs. We value teammates who are helpful, respectful, communicate openly, and prioritize the best interests of our users. Operating across various cities and timezones in the US, our team fosters collaboration to deliver work that brings pride and fulfillment.

About the Role
We are seeking an experienced and detail-oriented Data Integration Engineer to contribute to the development and expansion of a suite of systems and tools, with a primary focus on ETL processes. The ideal candidate will have a deep understanding of modern data engineering concepts and will have shipped or supported code and infrastructure with a user base in the millions and datasets with billions of records. The candidate will be routinely implementing features, fixing bugs, performing maintenance, consulting with product managers, and troubleshooting problems. Changes you make will be accompanied by tests to confirm desired behavior. Code reviews, in the form of pull requests reviewed by peers, are a regular and expected part of the job as well.

Salary Range: $110,000 - $138,000
What You'll Do
  • ETL Development with Talend:
    • Architect and build complex ETL pipelines in Talend Data Integration, ensuring scalability, reusability, and maintainability of workflows.
    • Implement sophisticated data transformations, including lookups, joins, aggregates, and custom routines, using Talend's tMap, tJavaRow, tSqlRow, and JSON components.
    • Develop data pipelines or features related to data ingestion, transformation, or storage using Python and relational databases (e.g., PostgreSQL) or cloud-based data warehousing (e.g., BigQuery); see the first sketch after this list.
    • Automate data ingestion from REST APIs, FTP servers, cloud platforms, and relational databases into cloud or on-premises storage.
    • Leverage Talend's integration with BigQuery for seamless data flow into analytical systems, employing native connectors.
    • Use Talend's debugging tools, logs, and monitoring dashboards to troubleshoot and resolve job execution issues.
    • Optimize Talend jobs by using efficient memory settings, parallelization, and dependency injection for high-volume data processing.
    • Integrate Talend with Google Cloud Storage, Pub/Sub, and Dataflow to create hybrid workflows combining batch and real-time data processing.
    • Manage Talend deployments using Talend Management Console (TMC) for scheduling, monitoring, and lifecycle management.
  • BigQuery Data Management:
    • Build high-performance BigQuery datasets, implementing advanced partitioning (DATE, RANGE) and clustering for cost-effective queries.
    • Work with JSON and ARRAY data structures, leveraging BigQuery to efficiently nest and unnest objects as required for complex data transformations and analysis (see the BigQuery sketch after this list).
    • Write advanced SQL queries for analytics, employing techniques like window functions, CTEs, and array operations for complex transformations.
    • Implement BigQuery federated queries to integrate external datasets from Cloud Storage or other data warehouses.
    • Design and manage BigQuery reservations and slots, allocating compute resources effectively to balance performance, cost, and workload demands across teams and projects.
  • Real-time Data Pipelines with Google Pub/Sub and Dataflow:
    • Implement Pub/Sub topics and subscriptions to manage real-time data ingestion pipelines effectively.
    • Integrate Pub/Sub with Talend for real-time ETL workflows, ensuring low-latency data delivery (see the Pub/Sub sketch after this list).
    • Implement dynamic windowing and triggers for efficient aggregation and event handling.
    • Optimize streaming pipelines by fine-tuning autoscaling policies, worker counts, and resource configurations.
  • PostgreSQL Database Development and Optimization:
    • Enhance and modify existing PostgreSQL queries and functions.
    • Write advanced PL/pgSQL functions and triggers for procedural data logic (see the PostgreSQL sketch after this list).
    • Develop materialized views and expression indexes as needed to speed up query execution for large datasets.
    • Monitor and optimize queries with EXPLAIN and EXPLAIN ANALYZE.
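
The sketches below illustrate, in miniature, the kind of work described above; every endpoint, project, dataset, table, and connection string in them is a hypothetical stand-in. First, a minimal Python ingestion pipeline (REST API to PostgreSQL) of the sort described in the ETL bullets:

    import requests
    import psycopg2

    # Hypothetical endpoint and connection string.
    API_URL = "https://api.example.com/v1/orders"
    DSN = "dbname=analytics user=etl host=localhost"

    def extract():
        """Pull one page of records from the REST API."""
        resp = requests.get(API_URL, timeout=30)
        resp.raise_for_status()
        return resp.json()["results"]

    def transform(records):
        """Keep only the fields the warehouse needs, coercing types."""
        return [(r["id"], r["customer_id"], float(r["total"])) for r in records]

    def load(rows):
        """Upsert into PostgreSQL so that reruns are idempotent."""
        with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
            cur.executemany(
                """
                INSERT INTO orders (id, customer_id, total)
                VALUES (%s, %s, %s)
                ON CONFLICT (id) DO UPDATE
                    SET customer_id = EXCLUDED.customer_id,
                        total = EXCLUDED.total
                """,
                rows,
            )

    if __name__ == "__main__":
        load(transform(extract()))

The upsert makes the load step idempotent, so a failed run can simply be retried.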
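
Next, a BigQuery sketch using the google-cloud-bigquery client and a hypothetical demo_ds.events table: DATE partitioning with clustering, a CTE plus a window function, and UNNEST over an ARRAY column.

    from google.cloud import bigquery

    client = bigquery.Client()  # uses application-default credentials

    # Partitioning and clustering bound each query's scan (and cost) to the
    # partitions and blocks it actually touches.
    client.query("""
        CREATE TABLE IF NOT EXISTS demo_ds.events (
            event_date DATE,
            user_id STRING,
            tags ARRAY<STRING>,
            amount NUMERIC
        )
        PARTITION BY event_date
        CLUSTER BY user_id
    """).result()

    # CTE + window function: rank users by spend within each day.
    ranked = client.query("""
        WITH daily AS (
            SELECT event_date, user_id, SUM(amount) AS total
            FROM demo_ds.events
            WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
            GROUP BY event_date, user_id
        )
        SELECT event_date, user_id, total,
               RANK() OVER (PARTITION BY event_date ORDER BY total DESC) AS day_rank
        FROM daily
    """).result()

    # UNNEST flattens the ARRAY column into one row per tag.
    tags = client.query("""
        SELECT event_date, tag
        FROM demo_ds.events, UNNEST(tags) AS tag
    """).result()

    for row in ranked:
        print(row.event_date, row.user_id, row.total, row.day_rank)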
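
A Pub/Sub sketch with hypothetical project, topic, and subscription names: publish a message, then consume it on a streaming pull, acking only after the record is handled.

    from concurrent.futures import TimeoutError
    from google.cloud import pubsub_v1

    PROJECT = "my-project"          # hypothetical names throughout
    TOPIC = "events-raw"
    SUBSCRIPTION = "events-raw-etl"

    # Publish: the client batches messages and returns a future per call.
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(PROJECT, TOPIC)
    future = publisher.publish(topic_path, b'{"user_id": "42"}', source="web")
    print("published message id:", future.result())

    # Subscribe: messages arrive on a background thread via the callback.
    subscriber = pubsub_v1.SubscriberClient()
    sub_path = subscriber.subscription_path(PROJECT, SUBSCRIPTION)

    def callback(message):
        print("received:", message.data)
        message.ack()  # ack only once the record is safely handed off

    streaming_pull = subscriber.subscribe(sub_path, callback=callback)
    try:
        streaming_pull.result(timeout=30)  # short demo window
    except TimeoutError:
        streaming_pull.cancel()
        streaming_pull.result()  # block until shutdown completes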
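
Finally, a PostgreSQL sketch run through psycopg2 (the orders table and its columns are hypothetical): a PL/pgSQL trigger function, an expression index, a materialized view, and EXPLAIN ANALYZE for plan inspection.

    import psycopg2

    DSN = "dbname=analytics user=etl host=localhost"  # hypothetical connection

    DDL = """
    -- PL/pgSQL trigger function: keep an audit column current on every UPDATE.
    CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS trigger AS $$
    BEGIN
        NEW.updated_at := now();
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    DROP TRIGGER IF EXISTS orders_touch ON orders;
    CREATE TRIGGER orders_touch            -- EXECUTE FUNCTION needs PG 11+
        BEFORE UPDATE ON orders
        FOR EACH ROW EXECUTE FUNCTION touch_updated_at();

    -- Expression index (created_at assumed to be a plain timestamp, so
    -- date_trunc is immutable) and a materialized view for a hot aggregate.
    CREATE INDEX IF NOT EXISTS orders_day_idx
        ON orders (date_trunc('day', created_at));
    CREATE MATERIALIZED VIEW IF NOT EXISTS daily_totals AS
        SELECT date_trunc('day', created_at) AS day, SUM(total) AS total
        FROM orders
        GROUP BY 1;
    """

    with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
        cur.execute(DDL)
        # EXPLAIN ANALYZE runs the query and reports the actual plan and timings.
        cur.execute("EXPLAIN ANALYZE SELECT * FROM daily_totals ORDER BY day")
        for (line,) in cur.fetchall():
            print(line)
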
What You'll Have
Required:
  • 6+ years of professional data engineering experience (equivalent education and/or experience may be considered)
  • Strong experience with Talend Data Integration for designing and optimizing ETL pipelines
  • Excellent Python and PostgreSQL development and debugging skills
  • Experience in data extraction, transformation, and loading (ETL) using Python
  • Experience working with JSON and ARRAY data structures in BigQuery, including nesting and unnesting
  • Experience in integrating and optimizing streaming data pipelines in a cloud environment
  • Experience with deployment tools such as Jenkins to build automated CI/CD pipelines
  • Hands-on experience with Google Cloud Storage, Pub/Sub, Dataflow, and Dataprep for ETL and real-time data processing
  • Proficient in building and managing real-time data pipelines with Google Pub/Sub and Dataflow
  • Proficient in BigQuery, including dataset management, advanced SQL, partitioning, clustering, and federated queries
  • Solid understanding of PostgreSQL, including PL/pgSQL, query optimization, and advanced functions
  • Familiarity with optimizing BigQuery performance through reservations, slots, and cost-effective query techniques
  • Proven experience in creating, managing, and merging branches in Git, following best practices for version control
  • Expertise in resolving merge conflicts, with a deep understanding of branching strategies, rebasing, and other Git workflows
  • Extensive experience with GitHub pull requests (PRs), including creating, reviewing, and approving code changes in a collaborative environment
  • Excellent problem-solving skills and ability to optimize high-volume data workflows
  • Strong communication skills to collaborate effectively with cross-functional teams
  • Strong drive to experiment, learn and improve your skills
  • Respect for the craft: you write self-documenting code with modern techniques
  • Great written communication skills: we do a lot of work asynchronously in Slack and Google Docs
  • Empathy for our users: a willingness to spend time understanding their needs and difficulties is central to the team
  • Desire to be part of a compact, fun, and hard-working team
Preferred:
  • Experience integrating BigQuery ML for advanced machine learning use cases, including regression, classification, and time-series forecasting

Additional Information: At this time, O'Reilly Media Inc. is not able to provide visa sponsorship or any immigration support (e.g., H-1B, STEM OPT, CPT, EAD, and the Permanent Residency process).
