Data Engineer
Data engineers work closely with Subject Matter Experts (SMEs) to design
the data model, develop data pipelines, and integrate our system
with external systems containing the data. Data engineers also provide guidance and support on how to access and leverage the data foundation to create new workflows or analyze data.
Data Pipeline Development & Maintenance
• Integrate new data sources into Foundry using Data Connection
• Implement two-way integrations between Foundry and external systems
• Develop pipelines transforming tabular or unstructured data
• Implement data transformations in PySpark or Pipeline Builder
to derive new datasets or create ontology objects
• Set up support structures for pipelines running in production
• Monitor and debug critical issues such as data staleness or data quality regressions
• Improve performance of data pipelines (latency, resource usage)
• Design and implement an ontology based on business requirements and
available data
• Provide data engineering context for application development
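The pipeline responsibilities above center on deriving new datasets from raw ones. A minimal, framework-agnostic sketch of that kind of derivation is shown below; in practice this logic would live in a PySpark transform or Pipeline Builder, and the dataset and column names here are hypothetical illustrations only:

```python
import csv
import io

# Toy raw extract, standing in for a tabular dataset ingested via
# Data Connection. Columns are hypothetical, not from any real system.
RAW = """order_id,customer,amount,currency
1,ACME,100.50,USD
2,Globex,80.00,EUR
3,ACME,20.25,USD
"""

def derive_customer_totals(raw_csv: str) -> dict:
    """Derive a new 'dataset': total order amount per customer.

    In Foundry this aggregation would typically be a PySpark
    groupBy/agg inside a transform; the logic is the same.
    """
    totals = {}
    for row in csv.DictReader(io.StringIO(raw_csv)):
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + float(row["amount"])
    return totals

print(derive_customer_totals(RAW))  # → {'ACME': 120.75, 'Globex': 80.0}
```

The derived table would then back an ontology object (e.g., a Customer with a total-spend property), per the design responsibilities above.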
Minimum Criteria Details
• Between 1 and 3 years of experience, ideally in a customer-facing role
• Experience in Python/PySpark, or experience in another
programming language and a willingness to learn Python and PySpark
independently
• Data engineering experience preferred over data science
• Programming experience involving collaborative software development
Coding Skills
• Python – complete language proficiency
• SQL – proficiency in querying language (join types, filtering, aggregation)
and data modeling (relationship types, constraints)
• PySpark – basic familiarity (DataFrame operations, PySpark SQL
functions) and awareness of the differences from other DataFrame
implementations (e.g., Pandas)
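As a rough calibration of the SQL proficiency described above (join types, aggregation, constraints), here is a minimal sqlite3 sketch; the tables and column names are hypothetical examples, not part of any real schema:

```python
import sqlite3

# In-memory database with two toy tables illustrating a relationship
# (foreign key), a join type, and an aggregation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),  -- relationship + constraint
        amount REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'ACME'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0);
""")

# LEFT JOIN keeps customers with no orders; COALESCE turns the
# resulting NULL sum into 0.
rows = conn.execute("""
    SELECT c.name, COALESCE(SUM(o.amount), 0) AS total
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # ACME has orders; Globex still appears, with a 0 total
```

Swapping LEFT JOIN for an INNER JOIN would drop Globex entirely, which is exactly the kind of join-type distinction the bullet above refers to.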
Desirable but not essential:
• Knowledge of industrial processes
• Knowledge of SAP, Oracle, Salesforce, or similar ERPs
Frameworks and Conceptual Familiarity
• Distributed compute – conceptual knowledge of Hadoop and Spark (driver,
executors, partitions)
• Databases – general familiarity with common relational database models
and proprietary implementations, such as SAP, Salesforce, etc.
• Git – knowledge of version control / collaboration workflows and
best practices
• Iterative working – familiarity with agile and iterative working
methodology and rapid user feedback gathering concepts
• Data quality – familiarity with data quality best practices