From Data to Decision with Big Data and Predictive Analytics Training Course
Audience
If you are trying to make sense of the data you have access to, or want to analyse unstructured data available on the net (such as Twitter, LinkedIn, etc.), this course is for you.
It is aimed mostly at decision makers and people who need to choose which data is worth collecting and which is worth analysing.
It is not aimed at people configuring the solution, although those people will benefit from the big picture.
Delivery Mode
During the course delegates will be presented with working examples of mostly open source technologies.
Short lectures will be followed by presentations and simple exercises for the participants.
Content and Software used
All software used is updated each time the course is run, so we work with the newest versions available.
The course covers the process from obtaining, formatting, processing and analysing data through to automating the decision-making process with machine learning.
Course Outline
Quick Overview
- Data Sources
- Mining Data
- Recommender systems
- Target Marketing
Datatypes
- Structured vs unstructured
- Static vs streamed
- Attitudinal, behavioural and demographic data
- Data-driven vs user-driven analytics
- Data validity
- Volume, velocity and variety of data
Models
- Building models
- Statistical Models
- Machine learning
Data Classification
- Clustering
- k-groups, k-means, k-nearest neighbours
- Ant colonies, birds flocking
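To give a flavour of the clustering topic above, here is a minimal k-means sketch in plain Python; the data points and the choice of k=2 are hypothetical, and a real course exercise would use an R or Python library implementation instead:

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Minimal k-means sketch: assign each point to the nearest centroid,
    then recompute each centroid as the mean of its cluster."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # nearest centroid by squared Euclidean distance
            i = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[i].append(p)
        # mean of each cluster; keep the old centroid if a cluster is empty
        centroids = [
            tuple(sum(c) / len(c) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# two well-separated hypothetical groups of 2-D points
data = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (8.0, 8.1), (7.9, 8.0), (8.1, 7.9)]
centroids, clusters = kmeans(data, k=2)
```

With this data, the two centroids converge near (1, 1) and (8, 8), matching the two visible groups.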
Predictive Models
- Decision trees
- Support vector machine
- Naive Bayes classification
- Neural networks
- Markov Model
- Regression
- Ensemble methods
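As a taste of one of these models, the sketch below implements a tiny Naive Bayes classifier with add-one smoothing; the churn data, feature names and labels are invented purely for illustration:

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(samples):
    """samples: list of (feature_tuple, label).
    Returns label counts and per-(feature, label) value counts."""
    labels = Counter(label for _, label in samples)
    cond = defaultdict(Counter)  # (feature_index, label) -> value counts
    for features, label in samples:
        for i, v in enumerate(features):
            cond[(i, label)][v] += 1
    return labels, cond

def predict(labels, cond, features):
    """Pick the label maximising log P(label) + sum of log P(feature|label),
    with add-one (Laplace) smoothing for unseen values."""
    total = sum(labels.values())
    best, best_score = None, float("-inf")
    for label, n in labels.items():
        score = math.log(n / total)
        for i, v in enumerate(features):
            counts = cond[(i, label)]
            score += math.log((counts[v] + 1) / (n + len(counts) + 1))
        if score > best_score:
            best, best_score = label, score
    return best

# hypothetical churn data: (contract type, support calls) -> outcome
train = [
    (("monthly", "many"), "churn"),
    (("monthly", "many"), "churn"),
    (("yearly", "few"), "stay"),
    (("yearly", "few"), "stay"),
    (("monthly", "few"), "stay"),
]
labels, cond = train_naive_bayes(train)
prediction = predict(labels, cond, ("monthly", "many"))  # -> "churn"
```

A customer on a monthly contract with many support calls matches the churn pattern in the training data, so the classifier predicts "churn".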
ROI
- Benefit/Cost ratio
- Cost of software
- Cost of development
- Potential benefits
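The benefit/cost ratio itself is simple arithmetic; the following sketch uses entirely hypothetical figures for a predictive-analytics project:

```python
def benefit_cost_ratio(benefits, costs):
    """Benefit/cost ratio: total expected benefits divided by total costs.
    A ratio above 1.0 suggests the project pays for itself."""
    return sum(benefits) / sum(costs)

# hypothetical figures
costs = [50_000, 30_000]     # software licences, development
benefits = [70_000, 40_000]  # e.g. reduced churn, better targeting
ratio = benefit_cost_ratio(benefits, costs)  # 110000 / 80000 = 1.375
```

A ratio of 1.375 means every unit of currency spent is expected to return 1.375 units, so the project clears the break-even threshold of 1.0.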
Building Models
- Data Preparation (MapReduce)
- Data cleansing
- Choosing methods
- Developing model
- Testing Model
- Model evaluation
- Model deployment and integration
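The MapReduce pattern mentioned under data preparation can be sketched in a few lines of plain Python; this simulates the model on a single machine, whereas a real Hadoop job distributes the same map and reduce phases across a cluster (the log lines below are hypothetical):

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to each record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def reduce_phase(pairs, reducer):
    """Group values by key, then reduce each group to a single result."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# data preparation example: count events per user from raw log lines
logs = ["alice login", "bob login", "alice click", "alice logout"]
mapper = lambda line: [(line.split()[0], 1)]     # emit (user, 1) per event
reducer = lambda key, values: sum(values)        # sum counts per user
counts = reduce_phase(map_phase(logs, mapper), reducer)
# counts == {"alice": 3, "bob": 1}
```

The same split between a stateless map step and a grouped reduce step is what lets Hadoop parallelise data preparation over very large inputs.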
Overview of Open Source and commercial software
- Selection of R-project package
- Python libraries
- Hadoop and Mahout
- Selected Apache projects related to Big Data and Analytics
- Selected commercial solutions
- Integration with existing software and data sources
Requirements
Understanding of traditional data management and analysis methods such as SQL, data warehouses, business intelligence and OLAP. Understanding of basic statistics and probability (mean, variance, probability, conditional probability, etc.).
Open Training Courses require 5+ participants.
Testimonials (2)
The content, as I found it very interesting and think it would help me in my final year at University.
Krishan - NBrown Group
Course - From Data to Decision with Big Data and Predictive Analytics
Richard's training style kept it interesting, the real world examples used helped to drive the concepts home.
Jamie Martin-Royle - NBrown Group
Course - From Data to Decision with Big Data and Predictive Analytics
Related Courses
Unified Batch and Stream Processing with Apache Beam
14 Hours
Apache Beam is an open source, unified programming model for defining and executing parallel data processing pipelines. Its power lies in its ability to run both batch and streaming pipelines, with execution carried out by one of Beam's supported distributed processing back-ends: Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Apache Beam is useful for ETL (Extract, Transform, and Load) tasks such as moving data between different storage media and data sources, transforming data into a more desirable format, and loading data onto a new system.
In this instructor-led, live training (onsite or remote), participants will learn how to implement the Apache Beam SDKs in a Java or Python application that defines a data processing pipeline for decomposing a big data set into smaller chunks for independent, parallel processing.
By the end of this training, participants will be able to:
- Install and configure Apache Beam.
- Use a single programming model to carry out both batch and stream processing from within their Java or Python application.
- Execute pipelines across multiple environments.
Format of the Course
- Part lecture, part discussion, exercises and heavy hands-on practice
Note
- This course will be available in Scala in the future. Please contact us to arrange.
Data Vault: Building a Scalable Data Warehouse
28 Hours
In this instructor-led, live training in Guatemala, participants will learn how to build a Data Vault.
By the end of this training, participants will be able to:
- Understand the architecture and design concepts behind Data Vault 2.0, and its interaction with Big Data, NoSQL and AI.
- Use data vaulting techniques to enable auditing, tracing, and inspection of historical data in a data warehouse.
- Develop a consistent and repeatable ETL (Extract, Transform, Load) process.
- Build and deploy highly scalable and repeatable warehouses.
Apache Flink Fundamentals
28 Hours
This instructor-led, live training in Guatemala (online or onsite) introduces the principles and approaches behind distributed stream and batch data processing, and walks participants through the creation of a real-time, data streaming application in Apache Flink.
By the end of this training, participants will be able to:
- Set up an environment for developing data analysis applications.
- Understand how Apache Flink's graph-processing library (Gelly) works.
- Package, execute, and monitor Flink-based, fault-tolerant, data streaming applications.
- Manage diverse workloads.
- Perform advanced analytics.
- Set up a multi-node Flink cluster.
- Measure and optimize performance.
- Integrate Flink with different Big Data systems.
- Compare Flink capabilities with those of other big data processing frameworks.
Generative & Predictive AI for Developers
21 Hours
This instructor-led, live training in Guatemala (online or onsite) is aimed at intermediate-level developers who wish to build AI-powered applications using predictive analytics and generative models.
By the end of this training, participants will be able to:
- Understand the fundamentals of predictive AI and generative models.
- Utilize AI-powered tools for predictive coding, forecasting, and automation.
- Implement LLMs (Large Language Models) and transformers for text and code generation.
- Apply time-series forecasting and AI-based recommendations.
- Develop and fine-tune AI models for real-world applications.
- Evaluate ethical considerations and best practices in AI deployment.
Introduction to Predictive AI
21 Hours
This instructor-led, live training in Guatemala (online or onsite) is aimed at beginner-level IT professionals who wish to grasp the fundamentals of Predictive AI.
By the end of this training, participants will be able to:
- Understand the core concepts of Predictive AI and its applications.
- Collect, clean, and preprocess data for predictive analysis.
- Explore and visualize data to uncover insights.
- Build basic statistical models to make predictions.
- Evaluate the performance of predictive models.
- Apply Predictive AI concepts to real-world scenarios.
Confluent KSQL
7 Hours
This instructor-led, live training in Guatemala (online or onsite) is aimed at developers who wish to implement Apache Kafka stream processing without writing code.
By the end of this training, participants will be able to:
- Install and configure Confluent KSQL.
- Set up a stream processing pipeline using only SQL commands (no Java or Python coding).
- Carry out data filtering, transformations, aggregations, joins, windowing, and sessionization entirely in SQL.
- Design and deploy interactive, continuous queries for streaming ETL and real-time analytics.
Apache NiFi for Administrators
21 Hours
In this instructor-led, live training in Guatemala (onsite or remote), participants will learn how to deploy and manage Apache NiFi in a live lab environment.
By the end of this training, participants will be able to:
- Install and configure Apache NiFi.
- Source, transform and manage data from disparate, distributed data sources, including databases and big data lakes.
- Automate dataflows.
- Enable streaming analytics.
- Apply various approaches for data ingestion.
- Transform Big Data into business insights.
Apache NiFi for Developers
7 Hours
In this instructor-led, live training in Guatemala, participants will learn the fundamentals of flow-based programming as they develop a number of demo extensions, components and processors using Apache NiFi.
By the end of this training, participants will be able to:
- Understand NiFi's architecture and dataflow concepts.
- Develop extensions using NiFi and third-party APIs.
- Custom-develop their own Apache NiFi processor.
- Ingest and process real-time data from disparate and uncommon file formats and data sources.
Predictive AI in DevOps: Enhancing Software Delivery
14 Hours
This instructor-led, live training in Guatemala (online or onsite) is aimed at intermediate-level DevOps professionals who wish to integrate predictive AI into their DevOps practices.
By the end of this training, participants will be able to:
- Implement predictive analytics models to forecast and solve challenges in the DevOps pipeline.
- Utilize AI-driven tools for enhanced monitoring and operations.
- Apply machine learning techniques to improve software delivery workflows.
- Design AI strategies for proactive issue resolution and optimization.
- Navigate the ethical considerations of using AI in DevOps.
Python and Spark for Big Data (PySpark)
21 Hours
In this instructor-led, live training in Guatemala, participants will learn how to use Python and Spark together to analyze big data as they work on hands-on exercises.
By the end of this training, participants will be able to:
- Learn how to use Spark with Python to analyze Big Data.
- Work on exercises that mimic real-world cases.
- Use different tools and techniques for big data analysis using PySpark.
Spark Streaming with Python and Kafka
7 Hours
This instructor-led, live training in Guatemala (online or onsite) is aimed at data engineers, data scientists, and programmers who wish to use Spark Streaming features in processing and analyzing real-time data.
By the end of this training, participants will be able to use Spark Streaming to process live data streams for use in databases, filesystems, and live dashboards.