Improving Access to Accelerate Innovation: Two of NREL’s Largest Datasets Now Publicly Available on the Cloud
Information Available as Part of DOE-Sponsored Open Energy Data Initiative
July 5, 2019
Historically, sharing large datasets publicly has required gathering the data, loading it on a hard drive, and mailing it to the requestor, but thanks to a migration to the cloud in April, NREL’s Wind Integration National Dataset (WIND Toolkit) and National Solar Radiation Database (NSRDB) are now publicly available via the Amazon Web Services (AWS) open data registry. This allows companies, research organizations, and universities faster, simpler, and more thorough retrieval of massive solar and wind resource data.
Anyone can now download the data from the cloud and perform large-scale solar or wind analysis at a nominal cost. The WIND Toolkit and NSRDB were previously publicly available for download on NREL servers, but only subsets of the data could be downloaded, and NREL staff couldn’t meet the demand for larger data pulls, which ran up to thousands of requests every day.
“We get constant requests from people who can’t fulfill their [data] requests with the [existing] NSRDB Viewer and the Wind Prospector,” NREL data engineer Michael Rossol said. “Those were [previously] the only ways to access these datasets.”
Rossol said universities are some of the most common requestors, with the Massachusetts Institute of Technology (MIT) among the most frequent to reach out. MIT and other academic institutions can now not only access the datasets directly but also work with the data in the cloud using the HDF Group’s Highly Scalable Data Service (HSDS). Before this migration, the WIND Toolkit, for example, was nearly impossible to access in full because of its size, and researchers are often interested in only a small sliver of the data. Until now, no one had been able to aggregate the slew of useful but disparate datasets in one easily accessible location.
Amazon funded this dataset migration through its Sustainability Data Initiative, an effort to make renewable energy datasets widely available to the public.
According to Amazon.com: “Providing access to large datasets in the cloud can help researchers and innovators address a wide range of sustainability challenges. The Amazon Sustainability Data Initiative significantly reduces the cost, time, and technical barriers associated with analyzing large datasets to generate sustainability insights—regardless of an organization’s size or computing power.”
Chris Webber, who leads NREL’s cloud computing team, said the move allows internal and external researchers to collaborate more effectively and reduces the workload on NREL employees who handle data requests. Right now, NREL has a couple hundred terabytes of raw data in the cloud, and that figure is growing.
NREL’s efforts to improve access to the lab’s wealth of energy data are part of a broader U.S. Department of Energy (DOE)-funded effort called the Open Energy Data Initiative (OEDI), which aims to improve and automate access to high-value energy datasets across DOE programs, offices, and national laboratories.
NREL’s collaboration with Amazon on the NSRDB and WIND Toolkit migration marks the beginning of a three-year OEDI project to create a “data lake” that enables faster, easier, and more advanced analysis and computation to accelerate innovation. The plan is to launch the lake with 10 databases of DOE information, including the WIND Toolkit and NSRDB. Webber said he expects NREL to have made significant headway on the lake within the next year.
“[Our] big datasets are now available to the public,” Webber said. “OEDI is going to be a game-changer.”
Learn more about NREL’s computational science research.