Big Data & Database Systems Fundamentals Training Course
The course is part of the Data Scientist skill set (Domain: Data and Technology).
Course Outline
Data Warehousing Concepts
- What is a Data Warehouse?
- Difference between OLTP and Data Warehousing
- Data Acquisition
- Data Extraction
- Data Transformation
- Data Loading
- Data Marts
- Dependent vs. Independent Data Marts
- Database Design
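The extraction, transformation and loading steps listed above, plus a dependent data mart built from the warehouse table, can be illustrated with a small sketch. It assumes a hypothetical CSV export from an OLTP system (the `RAW_ORDERS` text, its columns and the `fact_orders`/`sales_mart` tables are illustrative only) and uses Python's built-in `sqlite3` module as a stand-in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical OLTP export; in practice this would come from a source system.
RAW_ORDERS = """order_id,region,amount
1,north,120.50
2,south,80.00
3,north,42.25
4, South ,10.00
"""

def extract(text):
    """Extraction: read raw rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation: clean labels and convert types for each row."""
    return [
        (
            int(row["order_id"]),
            row["region"].strip().lower(),  # normalize inconsistent region labels
            float(row["amount"]),
        )
        for row in rows
    ]

def load(conn, rows):
    """Loading: write cleaned rows to a fact table, then derive a data mart."""
    conn.execute(
        "CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    # A dependent data mart: sourced from the warehouse table, not from the OLTP system.
    conn.execute(
        "CREATE TABLE sales_mart AS "
        "SELECT region, SUM(amount) AS total_sales, COUNT(*) AS orders "
        "FROM fact_orders GROUP BY region"
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_ORDERS)))
for region, total, n in conn.execute(
    "SELECT region, total_sales, orders FROM sales_mart ORDER BY region"
):
    print(region, total, n)
```

Keeping each phase in its own function mirrors how ETL testing usually checks extraction, transformation and loading separately.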
ETL Testing Concepts
- Introduction
- Software Development Life Cycle
- Testing Methodologies
- ETL Testing Workflow Process
- ETL Testing Responsibilities in DataStage
Big Data Fundamentals
- Big Data and its role in the corporate world
- The phases of development of a Big Data strategy within a corporation
- The rationale underlying a holistic approach to Big Data
- Components needed in a Big Data Platform
- Big Data storage solutions
- Limits of Traditional Technologies
- Overview of database types
- NoSQL Databases
- Hadoop
- MapReduce
- Apache Spark
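The MapReduce model named in the outline can be sketched in plain Python: a map phase emits key/value pairs for each input split, a shuffle groups the values by key, and a reduce phase combines each group. This single-process sketch only mimics the phases that Hadoop distributes across a cluster; the function names and sample splits are illustrative, not Hadoop's API.

```python
from collections import defaultdict
from itertools import chain

def map_phase(split):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}

# Each string stands in for one input split that a separate mapper would process.
splits = ["big data big ideas", "data lakes and data marts"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle(mapped))
print(sorted(counts.items()))
```

Because each split is mapped independently and each key is reduced independently, both phases can run in parallel on different machines, which is the property Hadoop and Spark exploit.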
Open Training Courses require 5+ participants.
Testimonials (4)
How the trainer shows his knowledge in the subject he's teaching
john ernesto ii fernandez - Philippine AXA Life Insurance Corporation
Course - Data Vault: Building a Scalable Data Warehouse
During the exercises, James explained every step in more detail whenever I got stuck. I was completely new to NiFi. He explained the actual purpose of NiFi, even basics such as open source. He covered every NiFi concept, from beginner level to developer level.
Firdous Hashim Ali - MOD A BLOCK
Course - Apache NiFi for Administrators
That I had it in the first place.
Peter Scales - CACI Ltd
Course - Apache NiFi for Developers
Practice tasks
Pawel Kozikowski - GE Medical Systems Polska Sp. z o.o.
Course - Python and Spark for Big Data (PySpark)
Related Courses
Unified Batch and Stream Processing with Apache Beam
14 Hours
Apache Beam is an open source, unified programming model for defining and executing parallel data processing pipelines. Its power lies in its ability to run both batch and streaming pipelines, with execution carried out by one of Beam's supported distributed processing back-ends: Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Apache Beam is useful for ETL (Extract, Transform, and Load) tasks such as moving data between different storage media and data sources, transforming data into a more desirable format, and loading data onto a new system.
In this instructor-led, live training (onsite or remote), participants will learn how to use the Apache Beam SDKs in a Java or Python application to define a data processing pipeline that decomposes a big data set into smaller chunks for independent, parallel processing.
By the end of this training, participants will be able to:
- Install and configure Apache Beam.
- Use a single programming model to carry out both batch and stream processing from within their Java or Python application.
- Execute pipelines across multiple environments.
Format of the Course
- Part lecture, part discussion, exercises and heavy hands-on practice
Note
- A Scala version of this course will be available in the future. Please contact us to arrange.
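The idea of one programming model for both batch and streaming input can be illustrated without Beam itself. The sketch below is plain Python, not the Beam SDK (in Beam the equivalent would be a chain of transforms such as `beam.Map` and `beam.Filter` applied to a `PCollection`): a single pipeline of generator stages is applied unchanged to a bounded list (batch) and to a generator that yields records one at a time (a stand-in for a stream). The `user,clicks` record format and the stage names are hypothetical.

```python
from collections.abc import Iterable, Iterator

def parse(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Extract: turn hypothetical 'user,clicks' lines into (user, clicks) records."""
    for line in lines:
        user, clicks = line.split(",")
        yield user.strip(), int(clicks)

def keep_active(records: Iterable[tuple[str, int]]) -> Iterator[tuple[str, int]]:
    """Transform: drop records with no clicks."""
    return (r for r in records if r[1] > 0)

def pipeline(source: Iterable[str]) -> Iterator[tuple[str, int]]:
    """One pipeline definition, whether the source is bounded or unbounded."""
    return keep_active(parse(source))

# Batch: a bounded, in-memory data set.
batch = ["ana,3", "ben,0", "cy,5"]
print(list(pipeline(batch)))

# Streaming stand-in: records arrive one at a time from a generator.
def live_feed():
    for line in ["dee,1", "eve,0", "fay,2"]:
        yield line  # a real stream source would wait here for new events

for record in pipeline(live_feed()):
    print("processed", record)
```

Because the stages are lazy, records from the generator are processed as they arrive rather than after the whole input is collected, which is the behavior a streaming runner needs.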
NoSQL Database with Microsoft Azure Cosmos DB
14 Hours
This instructor-led, live training in Guatemala (online or onsite) is aimed at database administrators or developers who wish to use Microsoft Azure Cosmos DB to develop and manage highly responsive, low-latency applications.
By the end of this training, participants will be able to:
- Provision the necessary Cosmos DB resources to start building databases and applications.
- Scale application performance and storage by utilizing APIs in Cosmos DB.
- Manage database operations and reduce cost by optimizing Cosmos DB resources.
Data Vault: Building a Scalable Data Warehouse
28 Hours
In this instructor-led, live training in Guatemala, participants will learn how to build a Data Vault.
By the end of this training, participants will be able to:
- Understand the architecture and design concepts behind Data Vault 2.0, and its interaction with Big Data, NoSQL and AI.
- Use data vaulting techniques to enable auditing, tracing, and inspection of historical data in a data warehouse.
- Develop a consistent and repeatable ETL (Extract, Transform, Load) process.
- Build and deploy highly scalable and repeatable warehouses.
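The auditing and historical-tracing goals above rest on a few Data Vault 2.0 ideas: hubs keyed by a hash of the business key, and append-only satellites that store descriptive attributes with a load timestamp. A minimal sketch, assuming MD5 hashing, a `||` delimiter and trim/upper-case normalization (common choices, but conventions vary by project); the `hub_customer` and `sat_customer` structures and the `load_customer` helper are hypothetical, with in-memory Python objects standing in for warehouse tables.

```python
import hashlib
from datetime import datetime, timezone

def hash_key(*business_key_parts: str) -> str:
    """Hub hash key: MD5 of the trimmed, upper-cased business key parts (assumed rules)."""
    normalized = "||".join(part.strip().upper() for part in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def hash_diff(attributes: dict) -> str:
    """Hash of the descriptive attributes, used to detect changes cheaply."""
    payload = "||".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

hub_customer: dict = {}   # hypothetical hub: hash key -> business key
sat_customer: list = []   # hypothetical satellite: append-only history rows

def load_customer(customer_no: str, attributes: dict, load_ts: datetime) -> None:
    hk = hash_key(customer_no)
    hub_customer.setdefault(hk, customer_no.strip().upper())
    # Append a satellite row only if attributes differ from the latest row for this key.
    latest = next((row for row in reversed(sat_customer) if row["hk"] == hk), None)
    diff = hash_diff(attributes)
    if latest is None or latest["hash_diff"] != diff:
        sat_customer.append({"hk": hk, "load_ts": load_ts, "hash_diff": diff, **attributes})

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2024, 2, 1, tzinfo=timezone.utc)
load_customer("c-001", {"city": "Lima"}, t0)
load_customer(" C-001 ", {"city": "Lima"}, t1)   # same key, no change: no new row
load_customer("C-001", {"city": "Quito"}, t1)    # changed attribute: new history row
print(len(hub_customer), len(sat_customer))
```

Because satellite rows are never updated in place, every earlier state stays queryable by its load timestamp, which is what makes the history auditable.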
Apache Flink Fundamentals
28 Hours
This instructor-led, live training in Guatemala (online or onsite) introduces the principles and approaches behind distributed stream and batch data processing, and walks participants through the creation of a real-time data streaming application in Apache Flink.
By the end of this training, participants will be able to:
- Set up an environment for developing data analysis applications.
- Understand how Apache Flink's graph-processing library (Gelly) works.
- Package, execute, and monitor Flink-based, fault-tolerant data streaming applications.
- Manage diverse workloads.
- Perform advanced analytics.
- Set up a multi-node Flink cluster.
- Measure and optimize performance.
- Integrate Flink with different Big Data systems.
- Compare Flink capabilities with those of other big data processing frameworks.
Introduction to Graph Computing
28 Hours
In this instructor-led, live training in Guatemala, participants will learn about the technology offerings and implementation approaches for processing graph data. The aim is to identify real-world objects, their characteristics and relationships, then model these relationships and process them as data using a Graph Computing (also known as Graph Analytics) approach. We start with a broad overview and narrow in on specific tools as we step through a series of case studies, hands-on exercises and live deployments.
By the end of this training, participants will be able to:
- Understand how graph data is persisted and traversed.
- Select the best framework for a given task (from graph databases to batch processing frameworks).
- Use Hadoop, Spark, GraphX and Pregel to carry out graph computing across many machines in parallel.
- View real-world big data problems in terms of graphs, processes and traversals.
Confluent KSQL
7 Hours
This instructor-led, live training in Guatemala (online or onsite) is aimed at developers who wish to implement Apache Kafka stream processing without writing code.
By the end of this training, participants will be able to:
- Install and configure Confluent KSQL.
- Set up a stream processing pipeline using only SQL commands (no Java or Python coding).
- Carry out data filtering, transformations, aggregations, joins, windowing, and sessionization entirely in SQL.
- Design and deploy interactive, continuous queries for streaming ETL and real-time analytics.
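Windowed aggregation, one of the operations listed above, groups events into fixed, non-overlapping time buckets. In KSQL this is written in SQL with a tumbling window clause (along the lines of `WINDOW TUMBLING (SIZE 1 MINUTE)`); the Python sketch below only illustrates the semantics of such a window over a finite list of events and is not KSQL itself. The event tuples and the 60-second window size are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical window size

def tumbling_counts(events, window=WINDOW_SECONDS):
    """Count events per (window start, key) using fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % window)  # align the timestamp to the start of its window
        counts[(window_start, key)] += 1
    return dict(counts)

# (epoch seconds, page) click events; values are illustrative only.
events = [(0, "home"), (15, "home"), (59, "cart"), (60, "home"), (125, "cart")]
for (start, key), n in sorted(tumbling_counts(events).items()):
    print(f"[{start}, {start + WINDOW_SECONDS}) {key}: {n}")
```

Each event belongs to exactly one window, so an event at second 60 starts a new bucket rather than being counted with the events at 0 to 59. A continuous query would instead emit updated counts as new events arrive.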
Apache NiFi for Administrators
21 Hours
In this instructor-led, live training in Guatemala (onsite or remote), participants will learn how to deploy and manage Apache NiFi in a live lab environment.
By the end of this training, participants will be able to:
- Install and configure Apache NiFi.
- Source, transform and manage data from disparate, distributed data sources, including databases and big data lakes.
- Automate dataflows.
- Enable streaming analytics.
- Apply various approaches for data ingestion.
- Transform Big Data into business insights.
Apache NiFi for Developers
7 Hours
In this instructor-led, live training in Guatemala, participants will learn the fundamentals of flow-based programming as they develop a number of demo extensions, components and processors using Apache NiFi.
By the end of this training, participants will be able to:
- Understand NiFi's architecture and dataflow concepts.
- Develop extensions using NiFi and third-party APIs.
- Develop their own custom Apache NiFi processors.
- Ingest and process real-time data from disparate and uncommon file formats and data sources.
Python and Spark for Big Data (PySpark)
21 Hours
In this instructor-led, live training in Guatemala, participants will learn how to use Python and Spark together to analyze big data as they work on hands-on exercises.
By the end of this training, participants will be able to:
- Use Spark with Python to analyze Big Data.
- Work on exercises that mimic real-world cases.
- Use different tools and techniques for big data analysis using PySpark.
Spark Streaming with Python and Kafka
7 Hours
This instructor-led, live training in Guatemala (online or onsite) is aimed at data engineers, data scientists, and programmers who wish to use Spark Streaming features in processing and analyzing real-time data.
By the end of this training, participants will be able to use Spark Streaming to process live data streams for use in databases, filesystems, and live dashboards.