About the project
The Reference Wind Farm Reanalysis Data project is developed at the Center for Big Data Analysis (CBDA) with the aim of performing reanalysis of wind farm data. A key part of the project is to store the data in the local Hadoop ecosystem and make it available for further research through a simple interface.
Reanalysis is a meteorological data interpolation technique which assimilates historical observational data spanning an extended period, using a single consistent assimilation (or "analysis") scheme throughout. The available observational data usually do not include all of the model's prognostic fields and may include additional fields. They also have a different spatial distribution from the forecast model grid, are valid over a range of times rather than a single time, and are subject to observational error. Reanalysis is therefore used to produce an analysis of the initial state, which is a best fit of the numerical model to the available data, taking into account the errors in both the model and the data.
For meteorological data that come from wind farms in NetCDF format and are used in projects such as NORCOWE, CBDA manages the data volume by:
- providing fast and easy ingestion into the local Hadoop infrastructure and an optimal representation of the data for further processing: since the existing interfaces provided by the MapReduce and Spark frameworks cannot efficiently handle array-based data formats such as NetCDF, new interfaces have been created and a data model has been designed specifically for NetCDF. These NetCDF-based interfaces allow both MapReduce and Spark to efficiently extract, transform, and process the datasets and to store them in HDFS as ORC tables.
- designing an accessible interface that makes the data easily available for the relevant research tasks.
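To illustrate why array-based data need their own data model before they can be stored as tables, the sketch below flattens a small gridded variable into one row per grid cell, which is the general shape a columnar table expects. It is a minimal pure-Python illustration, not the project's actual data model or interface: the function name flatten_grid, the column name wind_speed, the sample coordinates and values, and the -999.0 missing-data marker are all hypothetical, and reading NetCDF files or writing ORC tables is deliberately left out.

```python
from itertools import product


def flatten_grid(times, lats, lons, values, fill_value=None):
    """Turn a (time, lat, lon) grid into a flat list of row dictionaries.

    `values` is a nested list indexed as values[t][i][j]. Cells equal to
    `fill_value` (a missing-data marker) are skipped.
    """
    rows = []
    for (t, time), (i, lat), (j, lon) in product(
        enumerate(times), enumerate(lats), enumerate(lons)
    ):
        value = values[t][i][j]
        if fill_value is not None and value == fill_value:
            continue  # drop missing cells instead of storing the marker
        rows.append({"time": time, "lat": lat, "lon": lon, "wind_speed": value})
    return rows


# Hypothetical 2 x 2 x 2 wind-speed grid (m/s); -999.0 marks a missing cell.
times = ["2015-01-01T00:00", "2015-01-01T01:00"]
lats = [54.0, 54.1]
lons = [6.5, 6.6]
wind_speed = [
    [[7.2, 7.5], [6.9, -999.0]],
    [[8.1, 8.0], [7.7, 7.4]],
]

rows = flatten_grid(times, lats, lons, wind_speed, fill_value=-999.0)
print(len(rows))  # number of non-missing cells
print(rows[0])    # first row: coordinates plus the value at that cell
```

Each row carries its own coordinates, so a framework that processes records independently can filter or aggregate by time or location without needing to know the original array layout.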
The code base is open source and is made available by the author, P. Thongtra, through her GitHub account: https://github.com/pthongtra/netcdf-load-utils
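As a side illustration of the "best fit" idea in the description of reanalysis above, the sketch below blends a single model value with a single observation, weighting each by its error variance. This is the textbook scalar form of the analysis step, shown only to make the idea concrete. It is not the assimilation scheme used in this project, and the function name, parameter names, and numbers are hypothetical.

```python
def analysis(background, observation, background_var, observation_var):
    """Blend a model background with an observation by error variance.

    Returns the analysed value and its error variance. The less certain
    the background is relative to the observation, the closer the result
    moves towards the observation.
    """
    gain = background_var / (background_var + observation_var)
    value = background + gain * (observation - background)
    variance = (1.0 - gain) * background_var
    return value, variance


# Hypothetical wind speeds in m/s: the model says 8.0, the mast measures 10.0.
# With equal error variances the analysis lands halfway between the two.
value, variance = analysis(
    background=8.0, observation=10.0, background_var=1.0, observation_var=1.0
)
print(value, variance)
```

The analysed variance is smaller than either input variance, which is the sense in which combining model and data gives a better estimate than either alone.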