By CivilServiceWorld

30 Apr 2013

Information storage can be complex and costly, not to mention its environmental impact. As Whitehall organisations look for good places to squirrel away their ever-growing mounds of data, Gill Hitchcock sets out the options.


Row upon row of servers, cabling to rival an old-school telephone exchange, expensive staff and buildings, and an electricity bill as big as a small town’s. Many Whitehall departments still own their own data storage facilities, operating outside their core business and paying high fixed costs even as their needs change. But like many organisations in the corporate world, civil servants are beginning to explore other options – and there’s plenty of room for improvement: the Cabinet Office says the government’s data storage is enormously inefficient, with serious over-capacity and utilisation below 10 per cent in some organisations.

Two years ago the Cabinet Office launched a data centre consolidation programme, with the ambitious aim of cutting costs by 35 per cent and delivering savings of £20m in 2012-13, £60m this financial year and £80m in 2014-15. These savings were to be achieved by reducing the number of smaller data centres, and by moving data and software from multiple pieces of government-owned hardware to a single central system. Where legislation and security matters permit, the Cabinet Office urged departments to use ‘cloud’ services – accessing IT on demand from private providers via a network.

So how far has the programme got? The Cabinet Office doesn’t yet know how much has been cut from power bills, but it’s optimistic that considerable cost savings have been identified and delivered in some large central departments. These savings, a spokesperson says, have been realised through a combination of contract negotiation; implementing best practice; decommissioning unused servers; virtualisation (the amalgamation of multiple network storage devices into a single unit); and greater use of shared platforms and tiered storage – where data is stored on different media depending on its performance, recovery and availability requirements. The Cabinet Office cannot yet put a figure on those savings, however. “The savings associated with data-centre consolidation in central government are in the process of being assured through the Cabinet Office and are not yet available for release,” the spokesperson tells CSW.

Identifying how the government’s use of data centres has changed is complex, the spokesperson admits: “The best data available shows central government now operating from 80 data centres, and with 70 per cent of the estate in 20 data centres.”

The figures do indicate progress – and a significant decline in data centre numbers since the CIO Council commissioned a survey in 2010. That research identified 220 data centres across central government, plus 88 across the police services, and in excess of 600 across local government and the wider public sector.

Cloud atlas
The Cabinet Office never thought that the shift towards cloud would be easy, and admits that a culture change is required in order for civil servants to work differently across the ICT functions of government and embrace cloud-based commodity services.

Some parts of government have been moving relatively quickly on this agenda, however – and learning some useful lessons along the way. Among them is HM Revenue and Customs (HMRC), which last autumn became the first department to sign a contract for the delivery of G-Cloud services over the Public Services Network (PSN). Its intention is to complete a move to centralised cloud storage this year: housing its data in a single centre, accessed online from all its locations, should provide cheaper, more secure and greener data storage for the department.

HMRC says that pre-cloud, much of its data was held on hundreds of local file and print servers (FAPs), and the data they contained had to be regularly backed-up onto electronic tapes, many of which were moved off-site to a tape library. According to the department, FAP failures were costing thousands of lost hours each year, and by eliminating these failures it expects an increase in tax yield of more than £3m a year.

“FAPs use a lot of power, and need special air-conditioned rooms to keep them cool,” an HMRC spokesperson says: the cloud “will reduce electricity bills by about £75,000 a year, plus approximately £500,000 a year in the back-up, transport and accommodation costs of tape libraries.

“The central storage hardware is also a lot cheaper – several million pounds less than it would cost us to replace all the FAPs with new ones. And moving to central storage will also simplify office moves, as only the normal office IT will need to be moved.”

Not all organisations are convinced about the benefits of cloud storage, however. Oliver Morley, chief executive of The National Archives (TNA), sees cloud as an unlikely solution for TNA. He’s not too worried about security, he says, but does have concerns about the ease of data extraction from the cloud: “You do need to know that you can remove your data easily and quickly from the cloud,” he says. “If someone calls at 3am and says there are security issues and they need to extract data, you need to be able to do that.”

He also has fears about ‘lock-in’, whereby buyers end up tied into a contract that no longer meets their needs. It’s happened in the past with big IT projects, he says, and “there’s a high potential that you have a similar problem with lock-in on the cloud. We are extremely strong on saying: ‘You don’t want lock-in. You want to make sure that this is a service and you can switch it on and switch it off’.”

The university sector can offer some lessons on data storage, too. Jisc, a registered charity which supports further and higher education institutions in their use of digital technologies, manages a £10m programme – paid for by the Higher Education Funding Council for England’s modernisation fund – around shared services and cloud computing.

Jisc’s Dr Simon Hodson, who is responsible for managing research data, warns of the hidden costs associated with cloud storage. He points out that the business model for a number of cloud providers is to provide storage cheaply, but to impose very high costs when organisations ask to access their data. “You think: ‘Okay that looks a pretty good offer for storage’, but then the sting in the tail comes when you try to access the data. So these are things that institutions and companies need to be aware of,” he explains.
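Hodson’s warning can be made concrete with a little arithmetic. The sketch below is illustrative only: the two tariffs and every price in the usage note are invented for the example, not quotes from any real provider, and a real contract would have more components (requests, network transfer, minimum terms).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tariff:
    """A hypothetical cloud tariff; all prices are invented for illustration."""
    name: str
    storage_per_tb_year: int   # pounds per terabyte stored per year
    retrieval_per_tb: int      # pounds per terabyte read back out


def annual_cost(tariff: Tariff, stored_tb: int, retrieved_tb: int) -> int:
    """Total yearly bill: the storage charge plus the charge for data read back."""
    return (tariff.storage_per_tb_year * stored_tb
            + tariff.retrieval_per_tb * retrieved_tb)


def cheaper(a: Tariff, b: Tariff, stored_tb: int, retrieved_tb: int) -> Tariff:
    """Return whichever tariff costs less for this usage pattern (a wins ties)."""
    if annual_cost(a, stored_tb, retrieved_tb) <= annual_cost(b, stored_tb, retrieved_tb):
        return a
    return b
```

With an assumed "cheap to store" tariff of `Tariff("A", 10, 100)` and a "cheap to read" tariff of `Tariff("B", 40, 5)`, tariff A wins for 100 terabytes that are never touched, but tariff B wins once about a third of that data is read back in a year. The headline storage price alone does not settle the choice.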

In its guidance on cloud security, the Information Commissioner’s Office (ICO) says that processing data in the cloud may pose new security risks. It urges data controllers to take time to understand the data protection dangers that cloud computing presents.

Securing the system
HMRC says that although its cloud storage programme is removing the risks associated with the use of tape, it has had to work hard to ensure that its cloud system is secure. “The programme is an early adopter of PSN, so we’ve had to work through the inevitable issues involved in connecting offices to the central cloud storage service,” explains an HMRC spokesperson. “The big advantage in using PSN, though, is that it’s designed specifically for government use and so the evaluation and confirmation that security requirements are being met is well understood.” This isn’t the case with all approaches to cloud IT, but under HMRC’s plans “the data is held in secure UK data centres, and the service is being accredited for use across government.”

Meanwhile, the Land Registry continues to hold its core database of land and property ownership in England and Wales – along with maps, plans and addresses – on mainframe computers with more than 70 terabytes of storage. But its local office data has been centralised onto a NetApp virtual storage infrastructure, as part of a managed service, and its financial and business information resides on Oracle SAS storage, an integrated system of storage products.

The registry says that its data is replicated between two sites using ‘metro mirror’ technology, which maintains a full copy of the data at the second site. Data is also replicated to a disaster recovery site using ‘global mirror’ technology, ensuring that a copy of the data is made just a few seconds behind real time.

For the British Library, security means spreading its data storage across four major nodes: two at the British Library itself, and one each at the national libraries of Wales and Scotland. “When the content is ingested into one node, it is replicated to the other sites,” says Sean Martin, head of architecture and development at the library. “We find that storage isn’t ultra-reliable over a long period. You occasionally get phenomena which are colloquially called ‘bit rot’, and you get errors on disks and various things. At each site we self-check the content to ensure it hasn’t changed, and if we find there has been a corruption, we will request a fresh copy from one of the other sites.”
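The self-checking Martin describes is often implemented by comparing checksums. The following is a minimal sketch of that general idea, not the British Library’s actual system: the node names, the in-memory dictionaries standing in for storage sites, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 checksum of a piece of content."""
    return hashlib.sha256(data).hexdigest()


def is_intact(store: dict[str, bytes], item_id: str, expected: str) -> bool:
    """True if the node holds the item and its checksum matches the record."""
    return item_id in store and digest(store[item_id]) == expected


def check_and_repair(
    nodes: dict[str, dict[str, bytes]],
    manifest: dict[str, str],
) -> list[tuple[str, str]]:
    """Verify every item at every node against the checksum recorded at ingest.

    A corrupted or missing copy is replaced with a fresh copy from another
    node whose copy still matches. Returns the (node, item) pairs repaired.
    """
    repaired = []
    for node_name, store in nodes.items():
        for item_id, expected in manifest.items():
            if is_intact(store, item_id, expected):
                continue
            # Ask the other sites for a copy that still passes the check.
            for other_name, other_store in nodes.items():
                if other_name != node_name and is_intact(other_store, item_id, expected):
                    store[item_id] = other_store[item_id]
                    repaired.append((node_name, item_id))
                    break
    return repaired
```

Here `manifest` maps each item to the checksum recorded when it was first ingested. If no other site holds a good copy, the damaged item is left as it is and does not appear in the returned list, which is one reason to keep more than two replicas.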

As well as keeping its data geographically dispersed, the British Library uses a range of technology suppliers. Martin explains that the library did not want all of the storage sites to be identical, in case a problem at one spread to the others. And Jisc’s Hodson backs this approach: “It’s good practice that there is at least one copy and the data storage has a ‘three, two, one principle’, whereby ideally you should have at least three copies on at least two different storage media and at least one of those back-ups in another location.”
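The ‘three, two, one’ rule Hodson cites is simple enough to check mechanically. This sketch assumes each copy is described by just a storage medium and a location; the `Copy` class and the example values in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset: what it is stored on and where it is kept."""
    medium: str     # for example "disk" or "tape"
    location: str   # for example a site or city name


def meets_three_two_one(copies: list[Copy], primary_location: str) -> bool:
    """Check the rule: at least three copies, on at least two different
    storage media, with at least one copy away from the primary location."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.location != primary_location for c in copies)
    return enough_copies and enough_media and has_offsite
```

Two disk copies in a head office plus one tape copy at a second site would pass; three disk copies in the same building would fail on both the media and the location tests.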

Byte-size storage
For Tim Gollins, head of digital preservation at TNA, the biggest challenge is not data storage, but the management of that data: how it is protected and made available, and the way usage can be optimised. The archives currently hold 300 terabytes of material including the whole of the government web – the most used web archive in the world, Gollins says. But its storage requirements are growing fast, with a further 50 terabytes arriving – including Locog’s Olympic records, and a major influx from public hearings. “We will be taking the archive of Leveson, and that is very significant in terms of the total volume,” says Gollins. “By moving to a video form for the record of these hearings – whatever hearings they may be – that increases the volume of data enormously.”

TNA’s solution for coping with this massive increase is to expand its existing tape storage capacity – about 310 terabytes – to 13.4 petabytes. As Morley explains, “it is a case of buying tapes – and we can ride the benefit of improvement in tape technology”: capacity continues to rise as the systems develop.

Gollins says that, per byte, tape storage is much cheaper than disk; and because it doesn’t require cooling, its green profile is “staggering”. Asked whether government departments could make more use of tape, he replies: “The thing about storage is that you have to be very clear about how it’s going to be used. One of the reasons we can make use of tape is that our need to access the master copy is very infrequent. And that means it can sit quietly on the tape and do nothing. If you’re an organisation that has a collection of data that needs to be constantly processed and served to customers or users then tapes don’t work, because it takes too long for the data to get from the tape to the person. You have to have disks.”
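Gollins’s rule of thumb can be written down as a small decision function. It is a sketch under stated assumptions: the five-minute tape recall time and the threshold of 12 reads a year are placeholder values chosen for illustration, not figures from TNA or any supplier.

```python
from enum import Enum


class Medium(Enum):
    DISK = "disk"
    TAPE = "tape"


def choose_medium(
    reads_per_year: int,
    max_wait_seconds: float,
    tape_recall_seconds: float = 300.0,   # assumed time to fetch data from tape
    frequent_reads_threshold: int = 12,   # assumed cut-off for "constantly" used
) -> Medium:
    """Pick tape only when data is rarely read and users can tolerate the wait."""
    if max_wait_seconds < tape_recall_seconds:
        # Users cannot wait for a tape to be loaded and read.
        return Medium.DISK
    if reads_per_year >= frequent_reads_threshold:
        # Data that is constantly processed or served belongs on disk.
        return Medium.DISK
    return Medium.TAPE
```

A master copy read once a year, with a day’s wait acceptable, comes out as tape; anything served to users within seconds comes out as disk, however rarely it is read.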

Hodson says that data reuse is a big issue for universities, and they have been undertaking surveys to identify the best storage infrastructure for their needs. Although he says that sensitive data is generally handled very carefully, he’s worried about the amount of non-sensitive data held on external hard drives or memory sticks. One of the challenges is that universities do not have very good self-knowledge about the amount of data they hold, because a lot of it is on ad hoc systems, he says.

It’s important, Hodson argues, to focus on value for money and getting the right system; a single-minded emphasis on price may lead to less obvious costs being overlooked. “It will take some more work to uncover some of the hidden costs in data storage,” he says.

Weighing up the options
So what lessons can organisations such as Jisc, the Land Registry and the British Library impart to Whitehall departments looking to optimise their own data storage? Although each department’s needs will vary, considerations should include the potential hidden costs of accessing data stored cheaply in the cloud; and an awareness of the pros and cons of keeping data on tape, where the benefits of a more eco-friendly mode of storage must be weighed against the fact that it takes far longer to access data. Then there are the security issues: wherever data is stored, the data controller – the department which owns the information – must ensure that none of its personal data is used in ways that break the Data Protection Act (DPA).

The ICO cautions that an important part of selecting the right cloud provider is an assessment of the supplier’s security. A large cloud provider could have a number of data centres in different countries, but the watchdog points out that the DPA requires that personal data is not processed in a country outside the European Economic Area unless that country ensures an adequate level of data protection. Organisations must ask potential cloud providers for a list of countries where data is likely to be processed and for information about safeguards, the ICO urges.

At a time when Whitehall’s requirement for data storage is growing fast, there are numerous factors that civil servants must consider before the government can reap the benefits of cheaper, greener data storage. And they may often find that the cheapest option is not the best: as Hodson says, “optimal storage often comes at a higher cost – but it’s better practice.”
