Leading communications surveillance platform, MirrorWeb, has announced wholesale changes to its web crawling technology in order to become more energy efficient.
MirrorWeb operates Amazon Web Services (AWS) accounts in London, Ohio, Virginia and Frankfurt, which are used depending on its clients’ preferences. Over any given 24-hour period, each of these accounts runs thousands of web crawls, a vital element of the digital archiving service that MirrorWeb provides.
In recent months, the company has been moving from Intel-based crawl servers to ARM (Advanced RISC Machine)-based crawl servers. ARM processors were originally developed by Acorn Computers, with Apple later becoming a founding partner in the joint venture that took the architecture forward, and provide a low-power, energy-efficient alternative to their Intel counterparts.
AWS’s ARM-based processor, Graviton, uses up to 60% less energy for the same performance than comparable EC2 instances, such as those running on Intel processors. Thanks to the performance gains achieved with Graviton, MirrorWeb was able to halve the size of its crawl servers.
Additionally, MirrorWeb is introducing the practice of ‘upload on rotation’. Traditionally, archival data was stored on the crawl server for the duration of each web crawl. Because it was unclear how large the crawl would end up being, the storage capacity had to expand as the crawl progressed, with additional space repeatedly requested from Amazon. At the end of the crawl, the archive then had to be uploaded to the cloud, which took some time depending on its size.
Under the new ‘upload on rotation’ process, each time an archive file is completed, a new file is created and the completed file is uploaded right away. This avoids the energy spent on repeatedly growing the storage and removes the long upload period at the end of the crawl, further increasing energy efficiency.
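The idea can be illustrated with a minimal Python sketch. It is not MirrorWeb's implementation: the class name `RotatingArchiveWriter`, the size-based rotation rule, the `archive-00000.dat` file naming and the `upload` callback are all assumptions made for illustration. The sketch shows only the pattern described above, in which a completed file is handed to an uploader as soon as its replacement has been opened, so local storage never holds more than the file currently being written.

```python
import os
import tempfile
from typing import Callable, List, Tuple


class RotatingArchiveWriter:
    """Hypothetical writer: size-capped archive files, each uploaded on rotation."""

    def __init__(self, directory: str, max_bytes: int, upload: Callable[[str], None]):
        self.directory = directory
        self.max_bytes = max_bytes   # assumed rotation rule: rotate by file size
        self.upload = upload         # assumed callback that sends one file to the cloud
        self._index = 0
        self._open_next()

    def _open_next(self) -> None:
        # Start a fresh local archive file.
        self._path = os.path.join(self.directory, f"archive-{self._index:05d}.dat")
        self._file = open(self._path, "wb")
        self._size = 0
        self._index += 1

    def _ship(self, path: str, size: int) -> None:
        # Upload a completed file (skipping empty ones), then free the local space.
        if size > 0:
            self.upload(path)
        os.remove(path)

    def write(self, record: bytes) -> None:
        # Rotate first if this record would push the current file over the cap.
        if self._size > 0 and self._size + len(record) > self.max_bytes:
            finished_path, finished_size = self._path, self._size
            self._file.close()
            self._open_next()                          # a new file is created...
            self._ship(finished_path, finished_size)   # ...and the previous one is uploaded
        self._file.write(record)
        self._size += len(record)

    def close(self) -> None:
        # At the end of the crawl only the last, partly filled file remains.
        self._file.close()
        self._ship(self._path, self._size)


def demo() -> Tuple[List[str], List[str]]:
    uploaded: List[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        writer = RotatingArchiveWriter(
            tmp, max_bytes=10,
            upload=lambda path: uploaded.append(os.path.basename(path)),
        )
        for _ in range(5):
            writer.write(b"123456")  # 6-byte records, so every file holds one record
        writer.close()
        leftover = os.listdir(tmp)
    return uploaded, leftover
```

In this sketch the `upload` callback only records file names. In a real deployment it might wrap a cloud storage client call, but the press release does not describe which upload mechanism is used, so that part is left open.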
Philip Clegg, Chief Technology Officer of MirrorWeb, said: “The changes that we’ve made have been on the agenda for a while now, and we’re very happy to make the transition over to ARM processors. The performance benefits are remarkable, and we can use up to 60% less energy to get the same results. From an environmental perspective, it’s a no-brainer.
“Further tweaks to our crawling process should complement that perfectly. ‘Upload on rotation’ saves energy on every one of our crawls. It shows our commitment to honing our processes while embracing our responsibilities.”
For more information about MirrorWeb web archiving, visit https://www.mirrorweb.com/solutions/capabilities/website-archiving