Data streaming algorithms for estimating entropy of network traffic

Ashwin Lall, Vyas Sekar, Mitsunori Ogihara, Jun Xu, Hui Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

86 Citations (Scopus)

Abstract

Using entropy of traffic distributions has been shown to aid a wide variety of network monitoring applications such as anomaly detection, clustering to reveal interesting patterns, and traffic classification. However, realizing this potential benefit in practice requires accurate algorithms that can operate on high-speed links, with low CPU and memory requirements. In this paper, we investigate the problem of estimating the entropy in a streaming computation model. We give lower bounds for this problem, showing that neither approximation nor randomization alone will let us compute the entropy efficiently. We present two algorithms for randomly approximating the entropy in a time- and space-efficient manner, applicable for use on very-high-speed (greater than OC-48) links. The first algorithm for entropy estimation is inspired by the structural similarity with the seminal work of Alon et al. for estimating frequency moments, and we provide strong theoretical guarantees on the error and resource usage. Our second algorithm utilizes the observation that the performance of the streaming algorithm can be enhanced by separating the high-frequency items (or elephants) from the low-frequency items (or mice). We evaluate our algorithms on traffic traces from different deployment scenarios.
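The abstract does not spell out the estimator, so the following is only a minimal sketch of how the sampling idea of Alon et al. can be carried over to entropy; it is not the authors' exact algorithm. It relies on the identity H = log2(m) - S/m, where m is the stream length and S is the sum of m_i * log2(m_i) over the item counts m_i, and estimates S by sampling random stream positions. The function names, the `groups` and `per_group` parameters, the base-2 logarithm, and the use of an in-memory list (rather than a true one-pass stream with bounded memory) are all assumptions made for illustration.

```python
import math
import random
from collections import Counter
from statistics import median


def exact_entropy(stream):
    """Empirical entropy (in bits) of the item frequencies in `stream`."""
    m = len(stream)
    return -sum((c / m) * math.log2(c / m) for c in Counter(stream).values())


def _xlog2x(x):
    # Convention: 0 * log2(0) is taken to be 0.
    return x * math.log2(x) if x > 0 else 0.0


def ams_entropy_estimate(stream, groups=5, per_group=200, rng=None):
    """Estimate entropy by AMS-style sampling of S = sum_i m_i * log2(m_i).

    Hypothetical helper: `stream` is a list here so that suffixes can be
    rescanned; a real streaming version would maintain the counters in one pass.
    """
    rng = rng or random.Random()
    m = len(stream)
    if m == 0:
        return 0.0
    group_means = []
    for _ in range(groups):
        total = 0.0
        for _ in range(per_group):
            pos = rng.randrange(m)  # uniformly random stream position
            item = stream[pos]
            # c = occurrences of the sampled item from pos to the end.
            c = sum(1 for x in stream[pos:] if x == item)
            # Unbiased estimator of S: m * (c*log2(c) - (c-1)*log2(c-1)).
            total += m * (_xlog2x(c) - _xlog2x(c - 1))
        group_means.append(total / per_group)
    s_hat = median(group_means)  # median of means tightens the error bound
    return math.log2(m) - s_hat / m
```

Comparing `ams_entropy_estimate(stream)` against `exact_entropy(stream)` on a small trace gives a feel for the accuracy; increasing `per_group` lowers the variance, and increasing `groups` lowers the chance of a large miss.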

Original language: English (US)
Title of host publication: Performance Evaluation Review
Pages: 145-156
Number of pages: 12
Volume: 34
Edition: 1
DOIs: https://doi.org/10.1145/1140103.1140295
State: Published - Jun 2006
Externally published: Yes
Event: SIGMETRICS 2006/Performance 2006 - Joint International Conference on Measurement and Modeling of Computer Systems - Saint Malo, France
Duration: Jun 26 2006 to Jun 30 2006

Other

Other: SIGMETRICS 2006/Performance 2006 - Joint International Conference on Measurement and Modeling of Computer Systems
Country: France
City: Saint Malo
Period: 6/26/06 to 6/30/06

Fingerprint

Entropy
Program processors
Data storage equipment
Monitoring

Keywords

  • Data streaming
  • Traffic analysis

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

Lall, A., Sekar, V., Ogihara, M., Xu, J., & Zhang, H. (2006). Data streaming algorithms for estimating entropy of network traffic. In Performance Evaluation Review (1 ed., Vol. 34, pp. 145-156) https://doi.org/10.1145/1140103.1140295

@inproceedings{f24e7bba4ceb4fb5a574a854ede78944,
title = "Data streaming algorithms for estimating entropy of network traffic",
keywords = "Data streaming, Traffic analysis",
author = "Ashwin Lall and Vyas Sekar and Mitsunori Ogihara and Jun Xu and Hui Zhang",
year = "2006",
month = "6",
doi = "10.1145/1140103.1140295",
language = "English (US)",
isbn = "1595933204",
volume = "34",
pages = "145--156",
booktitle = "Performance Evaluation Review",
edition = "1",

}

TY - GEN

T1 - Data streaming algorithms for estimating entropy of network traffic

AU - Lall, Ashwin

AU - Sekar, Vyas

AU - Ogihara, Mitsunori

AU - Xu, Jun

AU - Zhang, Hui

PY - 2006/6

Y1 - 2006/6

KW - Data streaming

KW - Traffic analysis

UR - http://www.scopus.com/inward/record.url?scp=33750373763&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33750373763&partnerID=8YFLogxK

U2 - 10.1145/1140103.1140295

DO - 10.1145/1140103.1140295

M3 - Conference contribution

SN - 1595933204

SN - 9781595933201

VL - 34

SP - 145

EP - 156

BT - Performance Evaluation Review

ER -