Duplicate Transfer Problem inside a Proxy’s Cache
This paper presents an overview of the current web client – web proxy – web server mechanism and examines in depth one of its main disadvantages: the replication, in the proxy’s cache, of web objects that have different URLs but the same content. This problem is known as the Duplicate Transfer problem. It arises mainly because web objects are currently indexed by their URL, which serves as the primary key in the cache repository, so identical content reached through two URLs is treated as two distinct objects. We present a statistical analysis based on real traffic measurements, which shows that more than 10% of a proxy’s cache consists of replicated objects, fetched from the Internet needlessly and stored at least twice. These results call for the development of a scalable, practical solution to the duplicate transfer problem: some solutions have been proposed previously, but none has been deployed on a large scale on the Internet.
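To make the problem concrete, the following minimal sketch shows how replicated objects could be detected in a URL-keyed cache by grouping entries on a digest of their content. It is an illustration only, not the measurement method used in the paper: the function name duplicate_stats, the choice of SHA-256, and the toy cache contents are all assumptions introduced here.

```python
import hashlib
from collections import defaultdict


def duplicate_stats(cache):
    """Find replicated objects in a URL-keyed cache (hypothetical helper).

    cache: dict mapping URL (str) -> response body (bytes).
    Returns (redundant_copies, total_objects, duplicate_groups).
    """
    by_digest = defaultdict(list)
    for url, body in cache.items():
        # Index by content digest instead of by URL (SHA-256 is an assumption).
        digest = hashlib.sha256(body).hexdigest()
        by_digest[digest].append(url)

    total = len(cache)
    # In each group of identical bodies, every copy beyond the first is redundant.
    redundant = sum(len(urls) - 1 for urls in by_digest.values())
    # Keep only the digests that are shared by more than one URL.
    groups = {d: sorted(urls) for d, urls in by_digest.items() if len(urls) > 1}
    return redundant, total, groups


# Toy data, invented for illustration: two URLs serve the same bytes.
toy_cache = {
    "http://a.example/logo.png": b"PNG-bytes",
    "http://b.example/img/logo.png": b"PNG-bytes",
    "http://a.example/index.html": b"<html>a</html>",
    "http://c.example/index.html": b"<html>c</html>",
}

redundant, total, groups = duplicate_stats(toy_cache)
print(f"{redundant} of {total} cached objects are redundant copies")
for urls in groups.values():
    print("same content:", ", ".join(urls))
```

A cache keyed by URL alone sees four distinct objects here, while keying by content digest reveals that one of them is a redundant copy. Note that this sketch only detects duplicates after they have been downloaded; avoiding the duplicate transfer itself requires the proxy to learn the digest before fetching the body.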