Data Engineer

  • Standard Bank
  • Johannesburg, Gauteng, South Africa
  • May 22, 2020
Full Time Information Technology

Job Description

Job Details

Standard Bank is a firm believer in technical innovation, to help us guarantee exceptional client service and leading-edge financial solutions. Our growing global success reflects our commitment to the latest solutions, the best people, and a uniquely flexible and vibrant working culture. To help us drive our success into the future, we are looking for an experienced Data Engineer to join our Risk IT team at our Johannesburg offices. Standard Bank is a leading African banking group focused on emerging markets globally. It has been a mainstay of South Africa's financial system for more than 150 years, and now spans 17 countries across the African continent.

Job Purpose

Provide the infrastructure, tools and frameworks used to deliver end-to-end solutions to business problems. Build scalable infrastructure that supports the delivery of clear business insights from raw data sources, with a focus on collecting, managing, analysing and visualising data and on developing analytical solutions. Expand and optimise Standard Bank's data and data pipeline architecture, and optimise data flow and collection to support data initiatives.

Key Responsibilities/Accountabilities

- Create and maintain optimal data pipeline architecture: create databases optimised for performance, implement schema changes, and maintain data architecture standards across the required Standard Bank databases. Work alongside data scientists to help make use of the data they collect.
- Assemble large, complex data sets that meet functional and non-functional business requirements, and align data architecture with those requirements. Process, cleanse, and verify the integrity of data used for analysis.
- Build analytics tools that utilise the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics. Create data tools for analytics and data scientist team members that assist them in building and optimising Standard Bank into an innovative industry leader.
- Utilise data to discover tasks that can be automated and identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Design and develop scalable ETL packages from the business source systems, and develop ETL routines to populate databases from sources and to create aggregates. Oversee large-scale Hadoop data platforms to support the fast-growing data within the business.
- Enable and run data migrations across different databases and servers, and define and implement data stores based on system and consumer requirements.
- Perform thorough testing and validation to support the accuracy of data transformations and data verification used in machine learning models.
- Perform ad-hoc analyses of data stored in Standard Bank's databases and write SQL scripts, stored procedures, functions, and views. Proactively analyse and evaluate Standard Bank's databases in order to identify and recommend improvements and optimisation. Deploy sophisticated analytics programs, machine learning and statistical methods.
- Analyse complex data elements and systems, data flow, dependencies, and relationships in order to contribute to conceptual, physical and logical data models.
- Liaise and collaborate with the entire EDO team, providing support to the entire department for its data-centric needs. Collaborate with subject matter experts to select the relevant sources of information and translate the business requirements into data mining/science outcomes. Present findings and observations to the team for development of recommendations.
- Act as a subject matter expert from a data perspective and provide input into all decisions relating to data engineering and the use thereof. Educate the organisation on data engineering perspectives on new approaches, such as testing hypotheses and statistical validation of results. Ensure ongoing knowledge of industry standards and best practice, and identify gaps between these definitions/data elements and company data elements/definitions.

Preferred Qualification and Experience

Qualifications:
- Degree in Information Technology

Experience:
- 5-7 years: Experience with big data tools: Hadoop, Spark, Kafka, etc. Experience with relational SQL and NoSQL databases, including Postgres and Cassandra. Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc. Experience with AWS cloud services: EC2, EMR, RDS, Redshift. Experience with stream-processing systems: Storm, Spark-Streaming, etc. Experience with object-oriented/object function scripting languages: Python, Java, C++, Scala, etc.
- 5-7 years: Advanced working SQL knowledge and experience working with relational databases and query authoring (SQL), as well as working familiarity with a variety of databases. Experience building and optimising 'big data' data pipelines, architectures and data sets. Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- 5-7 years: Strong analytic skills related to working with unstructured datasets. Experience building processes supporting data transformation, data structures, metadata, dependency and workload management. A successful history of manipulating, processing and extracting value from large disconnected datasets. Working knowledge of message queuing, stream processing, and highly scalable 'big data' data stores.

Knowledge/Technical Skills/Expertise

- Architectural methodologies used in the design and development of IT systems.

- The ability to ensure the accuracy and consistency of data for as long as it is stored, and to prevent unintentional alteration or loss of data.
- Knowledge and understanding of IT applications and architecture.
- Ability to analyse statistics and other data, interpret and evaluate results, and create reports and presentations for use by others.
- The ability to apply metadata to information to make it easy for other people to find.
- Knowledge and experience required to manage the installation, configuration, upgrade, administration, monitoring and maintenance of physical databases.