The widespread use of the Internet has created two problems: document retrieval latency and network traffic. Caching documents 'close' to users has helped alleviate both problems. Different caching policies have been proposed and implemented to make the best use of the limited cache available at each caching server. A mesh of caching servers, aided by data diffusion algorithms and the natural hierarchical structure of the Internet topology, has increased the 'virtual' size of the cache. Yet the available cache remains small compared to the total size of all documents served, and it remains a major resource constraint. In this work, we examined how to improve document download time by distributing a fixed amount of total storage across a network, or mesh, of caches. The intuition behind our cache distribution approach is to give more storage to the caching nodes that experience more traffic, in the hope that this will reduce the average latency of document retrieval in the network. A heuristic was developed to estimate the traffic at each cache in the network; each cache then receives a corresponding percentage of the network's total storage capacity. Through extensive simulation, we found that the proposed cache distribution algorithm can reduce latency by up to 80% over prior work, including both Harvest-type and demand-driven data diffusion algorithms. Furthermore, the best improvement was achieved over a range of cache sizes that corresponds to practical, real-world deployments.
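The core allocation step described above — estimate traffic per node, then split a fixed storage budget in proportion — can be sketched as follows. This is a minimal illustration, not the paper's actual heuristic; the function name, the node labels, and the uniform fallback for the no-traffic case are all assumptions.

```python
def allocate_storage(traffic_estimates, total_storage):
    """Distribute a fixed total_storage budget across caching nodes
    in proportion to each node's estimated traffic.

    traffic_estimates: dict mapping node name -> estimated traffic
    total_storage: total cache capacity to distribute (same units as output)
    """
    total_traffic = sum(traffic_estimates.values())
    if total_traffic == 0:
        # No traffic information available: fall back to a uniform split
        # (an assumption of this sketch, not specified by the paper).
        share = total_storage / len(traffic_estimates)
        return {node: share for node in traffic_estimates}
    # Each node's share of storage equals its share of observed traffic.
    return {
        node: total_storage * traffic / total_traffic
        for node, traffic in traffic_estimates.items()
    }

# Example: three hypothetical caching nodes with unequal traffic.
alloc = allocate_storage({"root": 500, "mid": 300, "leaf": 200},
                         total_storage=100.0)
# -> {"root": 50.0, "mid": 30.0, "leaf": 20.0}
```

A busier node such as "root" receives a proportionally larger slice of the shared budget, which is the mechanism the abstract credits for the reduction in average retrieval latency.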
- Heuristic algorithm
- Internet topology
- World Wide Web
ASJC Scopus subject areas
- Computer Networks and Communications