Data Analysis and Preparation for ML Sep 2023

Online Course: Data Analysis and Data Preparation for Machine Learning

6 September 2023, 10:00 - 16:00 CEST (CANCELLED)

Training for professionals from all domains who would like to get a handle on their data
Organised by EIT Manufacturing CLC East in cooperation with EuroCC Austria and VSC Research Center (TU Wien) 
Language: English
Location: Online
Level: Beginner
Price: € 120,00 (including VAT)
Registration closed

This one-day online course shows participants how to get a feel for their data, how to visualize it, and how to clean it up where necessary. It also shows how to get the data into a suitable shape before feeding it into Machine Learning (ML) algorithms further down the line.

Please note that this course does not teach ML methods themselves; these are covered in follow-up courses.

This training is ideal for:

  • Marketing professionals with programming experience
  • Professionals in quality management with programming experience
  • Professionals in machine maintenance with programming experience

Participants will learn how to:

  • Get data into a suitable form
  • Visualize data
  • Clean data
  • Transform data
  • Analyze data
  • Handle data that does not fit in memory
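Several of the goals above (getting data into a suitable form, cleaning it and transforming it) can be illustrated with a minimal Pandas and NumPy sketch. The small in-memory table, its column names (`machine`, `temperature`, `status`) and the cleaning rules are hypothetical, chosen only for illustration; they are not taken from the course material.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical raw sensor log, as it might arrive from a CSV export.
RAW_CSV = """machine,temperature,status
A,71.5,ok
B,,ok
A,68.0,FAULT
C,not available,ok
B,74.5,ok
"""


def prepare(raw: str) -> pd.DataFrame:
    """Load, clean and transform the raw log into an ML-friendly table."""
    df = pd.read_csv(io.StringIO(raw))

    # Clean: coerce non-numeric entries such as "not available" to NaN,
    # then fill missing temperatures with the column median.
    df["temperature"] = pd.to_numeric(df["temperature"], errors="coerce")
    df["temperature"] = df["temperature"].fillna(df["temperature"].median())

    # Clean: normalise inconsistent spelling of the status labels.
    df["status"] = df["status"].str.lower()

    # Transform: standardise temperature to zero mean and unit variance
    # (NumPy's std uses ddof=0 by default).
    temp = df["temperature"].to_numpy()
    df["temperature_std"] = (temp - temp.mean()) / temp.std()

    # Transform: one-hot encode the categorical machine column.
    return pd.get_dummies(df, columns=["machine"], dtype=int)


if __name__ == "__main__":
    table = prepare(RAW_CSV)
    print(table.head())      # first rows: a quick look at the result
    print(table.describe())  # summary statistics of the numeric columns
```

The median is used for filling gaps here because it is less sensitive to outliers than the mean; other strategies (dropping rows, interpolation) may suit other data better.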


  • Overview
    Participants learn why data needs to be pre-processed before it is passed to ML methods, and what the typical challenges of data wrangling are.
  • Pandas
    Participants get to know this powerful Python library and find out how to load data into a data frame, get a feel for it, and transform it in the most suitable way.
  • NumPy
    NumPy, the core Python library for fast numerical array operations, underpins most of the Python ML ecosystem: Pandas, Scikit-Learn and Matplotlib all build on it. Participants get to know the most important aspects of its API and what can be achieved with it.
  • Matplotlib
    Humans are visual beings, which is why we prefer looking at graphs rather than endless tables of data. Matplotlib is a widely used Python library for creating all kinds of graphs, which makes data much easier to understand. Participants will learn how to create the most common types of graph with Matplotlib.
  • Dask
    In ML problems, we often reach a point where our data does not fit into memory. Even when it does fit, we would like some operations to run faster. Dask addresses both problems by dividing the data into smaller, more manageable chunks. Because only some of the chunks need to be in memory at any one time, it can handle data that is larger than memory, and because the chunks can be processed in parallel, computations often run faster. Participants will get to know this tool and see how closely its DataFrame and array interfaces resemble the Pandas and NumPy APIs learned earlier.
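The chunking idea described for Dask can be sketched without Dask itself. The example below is deliberately not Dask code: it uses plain Pandas (`read_csv` with `chunksize`) to compute a column mean chunk by chunk, so that only one chunk is held in memory at a time, and then combines the partial results. The simulated file and its column name `value` are hypothetical.

```python
import io

import numpy as np
import pandas as pd


def chunked_mean(source, column: str, chunksize: int) -> float:
    """Mean of `column`, reading only `chunksize` rows into memory at a time."""
    total = 0.0
    count = 0
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of loading the whole file into one frame.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Each chunk contributes a partial sum and a count; unlike
        # per-chunk means, these combine exactly even for unequal chunks.
        total += chunk[column].sum()
        count += chunk[column].count()
    return total / count


# Hypothetical "large" file, simulated in memory with 10,000 rows.
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=50.0, scale=5.0, size=10_000)
csv_text = pd.DataFrame({"value": values}).to_csv(index=False)

streamed = chunked_mean(io.StringIO(csv_text), "value", chunksize=1_000)
in_memory = values.mean()
print(streamed, in_memory)  # the two agree up to floating-point rounding
```

This loop runs the chunks one after another. Dask's `dask.dataframe` automates the same split-and-combine pattern across many files and also schedules the chunks in parallel; the equivalent there would be roughly `dd.read_csv("data-*.csv")["value"].mean().compute()`, where the file pattern is again hypothetical.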




Course format

The training will be held online from 10:00 to 16:00 CEST, with a one-hour break at 12:00. Participation links will be provided after purchase and before the training.


  • Participants are expected to have at least basic programming skills in Python.
  • The course uses Python throughout, and participants will get to know libraries such as NumPy, Pandas, Scikit-Learn, Matplotlib and Dask.
  • The content is delivered with Jupyter notebooks on Google Colab, so participants should have a Google account in order to be able to participate fully.


Trainer: Simeon Harrison (EuroCC Austria and VSC Research Center, TU Wien)


Full price for the course with course documentation: € 120,00 (including VAT)


Upon completion of the online training, participants will receive a certificate of attendance.




Rosina Preis (Competence and Knowledge Manager for EU Projects CLC East)

Simeon Harrison (Trainer and Coordinator Training for Industries, EuroCC Austria and VSC)
