Machine Learning Group


We are widely known for developing WEKA (Waikato Environment for Knowledge Analysis), our open source machine learning (ML) software "workbench". We have also developed MOA (Massive Online Analysis), a software tool that enables users to analyse large, continuous data sources such as data streams. We have expertise in creating machine learning applications that solve specific user problems (especially in relation to scientific instruments) and that users can in turn integrate effectively into their business operations.


The overall goal of the Machine Learning Group is to build state-of-the-art software for machine learning (ML) and to apply this to real-world problems and opportunities. We are seeking collaborative opportunities with businesses to further extend and commercialise tools we have developed, and/or develop new machine learning applications.

Research Focus

The Machine Learning Group's current research focuses are:

  1. Application of machine learning to lab instrument data, particularly gas chromatography-mass spectrometry (GC-MS) and near-infrared (NIR) spectroscopy, together with tools to develop and document machine learning applications (particularly for pre-processing and workflow management)
  2. Identifying and applying machine learning in other domains with significant potential benefit (e.g. health records, image processing, horticulture, online)
  3. Data stream mining and analysis (MOA)
  4. New machine learning algorithms for WEKA

Unique Capability Proposition

The University of Waikato's Machine Learning Group is a world leader in the application of machine learning and data mining technologies to real-world problems, particularly those involving data from complex instruments (such as GC-MS and NIR).

Past Successes

WEKA is a globally successful open source project, having been downloaded over 3.2 million times since 2000, with the most downloads coming from the US, India and China. Downloads have increased steadily over that time, peaking at nearly 80,000 in the month of May 2012.

In our work with companies, we have successfully implemented ML techniques for the analysis of data by testing laboratories in New Zealand and internationally. This includes software for exploiting near-infrared (NIR) spectroscopy analysis that has reduced the time taken for soil testing from several days to minutes. We have also developed ADAMS, a unique software platform for documentation and rapid prototyping of new machine learning applications, which has been used to generate applications with significant commercial benefits. One such application detects inadvertent sample swapping; another calibrates results across instruments of the same type from different manufacturers.