# [UI-part3] Incremental learning from data streams

### Lecture notes of Artificial Intelligence

Course: https://ucilnica.fri.uni-lj.si/course/view.php?id=81
Lecturer: Assoc. Prof. Dr. Zoran Bosnić
Language: English
Date: 2014-05-14

Course overview (part 3):

• comparison of stationary learning vs. incremental learning; streams, design, limitations
• data summarization: statistics, histogram, wavelets, condensing datasets
• incremental learning models: incremental decision trees (VFDT, functional tree leaves), Hoeffding bound
• concept drift/novelty detection, reacting to changes: extreme values, decision structure, frequency, distances
• clustering from data streams: hierarchical, micro-cluster, and grid-based clustering
• evaluation of streaming algorithms
• matrix factorization with temporal dynamics (recommendation systems)

Lectures include hands-on exercises in the statistical package R. Graded homework problem: a competition in modeling given streaming data. The project will be discussed individually: take some data related to your PhD and apply incremental learning to it.

## Learning from data streams

Date: 2014-05-14

### Introduction

Static learning:

• fixed dataset
• classification/regression prediction problems

Incremental learning:

• data streams = changing data
• concept drift = unpredictable data distribution
• time series = measurements with timestamps, transformation into relational data is possible
• sliding window = holds the most recent examples
  • sequence-based window – the 20 last measurements
  • timestamp-based window:
    • linear – 20 samples with 15 min between them
    • non-linear – 20 samples with increasing spacing between them
    • adaptive – shrinks on demand (to handle concept drift)
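The sequence-based window above can be sketched with a bounded deque (a minimal illustration; the size 20 matches the example):

```python
from collections import deque

# Sequence-based sliding window: keeps only the 20 most recent measurements.
window = deque(maxlen=20)

for value in range(100):  # simulated stream of 100 measurements
    window.append(value)  # oldest element is evicted automatically

print(len(window))   # 20
print(window[0])     # 80 -- oldest retained measurement
```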

• split the window into two subwindows and test (over all split points) whether their means are equal
• if the means differ significantly, drop the elements of the older subwindow
• the comparison of the two subwindow means must take their stability into account (from how many elements each mean was computed)
• the method has a single parameter, $$\delta$$
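A simplified sketch of this subwindow check (in the spirit of adaptive windowing; the cut condition here uses a Hoeffding-style bound with the single parameter $$\delta$$, and function and variable names are illustrative):

```python
import math

def detect_change(window, delta=0.05, value_range=1.0):
    """Test all split points; report a change if two subwindows have
    means that differ more than a Hoeffding-style bound allows."""
    n = len(window)
    for cut in range(1, n):
        left, right = window[:cut], window[cut:]
        # harmonic mean of subwindow sizes accounts for their stability
        m = 1.0 / (1.0 / len(left) + 1.0 / len(right))
        eps = math.sqrt(value_range ** 2 * math.log(2.0 / delta) / (2.0 * m))
        if abs(sum(left) / len(left) - sum(right) / len(right)) > eps:
            return cut  # elements older than this cut would be dropped
    return None

stable = [0.5] * 30
drifted = [0.5] * 30 + [0.9] * 30
print(detect_change(stable))    # None -- no drift
print(detect_change(drifted))   # a cut point is found
```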

### Data synopsis

Compress incoming information from a stream to make processing feasible, not store all data.

• representative sample: reservoir sampling algorithm
• histograms
• sketches
• discrete transforms and wavelets: Haar wavelets
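Reservoir sampling maintains a uniform random sample of fixed size $$k$$ without knowing the stream length in advance; a minimal sketch:

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)        # fill the reservoir first
        else:
            j = rng.randint(0, i)         # item survives with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(10_000), k=5)
print(sample)  # 5 items, each equally likely to be any stream element
```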

## Predicting from data streams

Date: 2014-05-21

Options:

• for time series: statistics, transformations, fit coefficients
• for relational/attributional data: adapt common learning algorithms, specialized incremental learning algorithms
• transform time series into attributional form:
  • temporal attributes – periodic attributes based on assumptions about what makes sense (e.g. daily/weekly sin/cos functions)
  • historical attributes – e.g. measurements from some hours before
  • real values – for manual checking
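A sketch of such a transformation, combining daily sin/cos temporal attributes with lag-based historical attributes (attribute names and the toy data are illustrative):

```python
import math

def make_attributes(series, hours, n_lags=3):
    """Turn a time series into attribute vectors: daily sin/cos of the hour
    plus the last n_lags historical measurements."""
    rows = []
    for t in range(n_lags, len(series)):
        angle = 2 * math.pi * hours[t] / 24.0
        row = {
            "sin_day": math.sin(angle),   # temporal attribute (daily period)
            "cos_day": math.cos(angle),
            **{f"lag_{k}": series[t - k]  # historical attributes
               for k in range(1, n_lags + 1)},
            "target": series[t],
        }
        rows.append(row)
    return rows

series = [10, 12, 15, 14, 13, 11]
hours = [0, 1, 2, 3, 4, 5]
rows = make_attributes(series, hours)
print(rows[0]["lag_1"], rows[0]["target"])  # 15 14
```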

Incremental models do not require all historical data to be available and can be built incrementally. Incremental learning slides a window through the data and updates the model with new examples as they arrive.

### Algorithm: Very fast decision tree algorithms (VFDT)

• only discrete attributes
• tree is not binary
• split a leaf node only if there is sufficient statistical evidence

Growing the tree:

1. update statistics in the leaf
2. evaluate $$H()$$ on all attributes and compute the difference between the two best attributes: $\Delta H = H(a_1) - H(a_2)$
3. compare against the Hoeffding bound (Hoeffding, 1963): $\epsilon = \sqrt{\frac{R^2 \ln(2/\delta)}{2n}}$, where $$R$$ is the range of $$H$$ and $$n$$ the number of examples seen in the leaf
4. if $$\Delta H > \epsilon$$, split the leaf on the best attribute; otherwise wait for more examples
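Plugging numbers into the bound (assuming a two-class problem, where entropy ranges over $$[0, 1]$$ so $$R = 1$$; the gap value is made up for illustration):

```python
import math

def hoeffding_bound(R, delta, n):
    """Epsilon such that the true mean differs from the observed mean
    by more than epsilon with probability at most delta."""
    return math.sqrt(R ** 2 * math.log(2.0 / delta) / (2.0 * n))

# Two-class problem: entropy ranges over [0, 1], so R = 1.
eps = hoeffding_bound(R=1.0, delta=0.05, n=1000)
delta_H = 0.08  # assumed gap between the two best attributes
print(round(eps, 4))  # about 0.043
print(delta_H > eps)  # True: the gap exceeds the bound, so split the leaf
```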

## Concept drift detection

The distribution of examples, or the concept being modeled, can change over time – e.g. as a consequence of a hidden variable or process. We must be careful to distinguish genuine drift from temporary noise.

Categorizations based on:

• data management:
  • full memory (e.g. by weighting recent examples more heavily)
  • partial memory (fixed or adaptive window)
• detection methods:
  • monitoring performance indicators
  • comparing time windows
  • blind methods (restart at a regular interval)
  • informed methods (react only when a change is detected)

Tests:

• Page-Hinkley test (PH test): accumulates the deviations of observed values from their running mean; an alarm is raised when the cumulative deviation exceeds an allowed threshold
• Statistical process control: compute probability of error $$p_i$$ and its standard deviation $$s_i$$, system switches between states according to bounds (“in control”, “warning”, “out of control”)
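A minimal sketch of the Page-Hinkley test (parameter values and names are illustrative choices, not fixed by the lecture):

```python
class PageHinkley:
    """Page-Hinkley test: accumulate deviations of each observation from
    the running mean; alarm when the cumulative deviation exceeds lam."""

    def __init__(self, delta=0.005, lam=1.0):
        self.delta, self.lam = delta, lam
        self.mean, self.n = 0.0, 0
        self.cum, self.min_cum = 0.0, 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n      # running mean
        self.cum += x - self.mean - self.delta     # cumulative deviation
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.lam  # drift alarm

ph = PageHinkley(lam=0.5)
alarms = [ph.update(x) for x in [0.2] * 50 + [0.9] * 50]
print(alarms.index(True))  # 50 -- alarm right after the shift
```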

## Clustering from data streams

Date: 2014-05-28

• online clustering
• evaluating incremental algorithm

In online clustering the task is to maintain a consistently good clustering over time using small amounts of memory and time (grouping objects into clusters; unsupervised learning).

Approaches:

• hierarchical clustering: build micro clusters described by cluster features (CF), maintaining statistics for each one (not storing the examples), and later combine them into macro clusters
• partitioning clustering (k-means)
• density-based clustering
• grid-based clustering
• model-based clustering

### Algorithm: BIRCH

Hierarchical clustering approach with acronym meaning Balanced Iterative Reducing and Clustering using Hierarchies (Zhang et al., 1996).

• build a tree (the CF tree)
• every node holds at most $$B$$ cluster-feature (CF) entries with descendant subtrees
• each node summarizes the statistics of its descendant nodes
• parameters: $$B$$ – branching factor, $$T$$ – maximum absorption distance
• when a new example arrives, the closest CF absorbs it if it lies within radius $$T$$; otherwise a new CF entry is added, and if there is no room, the parent node is split

An extension is CluStream (Aggarwal et al., 2003), which maintains only $$q$$ micro clusters and discards the oldest ones.
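The per-cluster statistics can be sketched as the triple $$(N, LS, SS)$$ (count, linear sum, squared sum), from which centroid and radius are derivable without storing the examples (a simplified single-cluster illustration; the class is not a full BIRCH implementation):

```python
import math

class ClusterFeature:
    """Cluster feature: (N, linear sum, squared sum) per dimension."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = [0.0] * dim

    def absorb(self, x):
        """Update the summary statistics; the example itself is not stored."""
        self.n += 1
        for d, v in enumerate(x):
            self.ls[d] += v
            self.ss[d] += v * v

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # root mean squared distance of the examples to the centroid
        c = self.centroid()
        return math.sqrt(sum(self.ss[d] / self.n - c[d] ** 2
                             for d in range(len(c))))

cf = ClusterFeature(dim=2)
for point in [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]:
    cf.absorb(point)
print(cf.centroid())  # [2.0, 2.0]
```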

## Evaluation of streaming algorithms

Performance measures: MSE, classification accuracy (CA), …

The approach in batch learning is to split the data into a training set (itself split into training and validation sets) and a test set.

Issues in streaming data:

• dataset not fixed
• concept drifts
• decision models change over time

Approaches influenced by order of examples:

• holdout and independent test set
• predictive sequential (prequential) evaluation

Solution is to use prequential estimate with forgetting mechanism (using either time window or fading factors).

The Q-estimate is used to compare two algorithms A and B via the logarithm of the ratio of their accumulated losses:

$Q_i(A,B) = \log\left(\frac{S^A_i}{S^B_i}\right)$

Values near zero indicate similar performance; since $$S$$ denotes accumulated loss, negative values mean the first algorithm (A) performs better, and positive values favor the second (B).
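A sketch of prequential evaluation with a fading factor, comparing two toy incremental models through the Q-estimate (the models, the stream, and the $$\alpha$$ value are all illustrative):

```python
import math

def prequential_q(stream, model_a, model_b, alpha=0.995):
    """Prequential evaluation: each example is first used for testing, then
    for training. Accumulated squared losses fade with factor alpha; the
    Q-estimate compares the two models."""
    s_a = s_b = 0.0
    for x, y in stream:
        s_a = alpha * s_a + (model_a.predict(x) - y) ** 2  # test first
        s_b = alpha * s_b + (model_b.predict(x) - y) ** 2
        model_a.update(x, y)                               # then train
        model_b.update(x, y)
    return math.log(s_a / s_b)

class MeanPredictor:
    """Toy incremental model predicting the running mean of the targets."""
    def __init__(self):
        self.mean, self.n = 0.0, 0
    def predict(self, x):
        return self.mean
    def update(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n

class ZeroPredictor(MeanPredictor):
    """Baseline that always predicts zero."""
    def predict(self, x):
        return 0.0

stream = [(i, 5.0) for i in range(200)]
q = prequential_q(stream, MeanPredictor(), ZeroPredictor())
print(q < 0)  # True: the first model accumulates less loss, so it is better
```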