Predicting clear-sky global horizontal irradiance at eight locations in South Africa using four models

Solar radiation under clear-sky conditions provides information about the maximum possible magnitude of the solar resource available at a location of interest. This information is useful for determining the limits of solar energy use in applications such as thermal and electrical energy generation. Measurements of solar irradiance to provide this information are limited by the associated cost. It is therefore of great interest and importance to develop models that generate these data in lieu of measurements. This study focused on four such models: Ineichen-Perez (I-P), European Solar Radiation Atlas model (ESRA), multilayer perceptron neural network (MLPNN) and radial basis function neural network (RBFNN) models. These models were calibrated and tested using solar irradiance data measured at eight different locations in South Africa. The I-P model showed the best performance, recording relative root mean square errors of less than 2% across all hours, months and locations. The performances of the MLPNN and RBFNN were poor when averaged over all stations, but tended to show performance similar to that of the I-P model for some of the stations. The ESRA model showed performance that was in between that of the Artificial Neural Networks and that of the I-P model.


Introduction
Solar radiation exhibits variation that depends on astronomical and weather factors. The astronomically driven variation is predictable from well-established equations [1,2]. In general, weather-induced variations are less predictable and result in long- and short-term solar irradiance fluctuations that can only be predicted in a statistical sense [1]. Clear-sky conditions, on the other hand, present atmospheric conditions that produce predictable effects on solar irradiance. There is growing interest in models that predict clear-sky solar irradiance, which has resulted in the development of many models that vary in complexity and accuracy of prediction [3-11]. The majority of these models predict broadband clear-sky irradiance, where the clear-sky atmospheric effects are accounted for by broadband attenuation parameters such as the Linke turbidity coefficient [12,13] and the Angstrom coefficient [14,15]. Calibration of the models for local conditions involves an empirical process that computes the relevant broadband attenuation parameters by running the clear-sky irradiance models backwards, with a selection of measured local clear-sky solar irradiance data as input [16].
It is also possible to generate clear-sky irradiance from a set of astronomical and weather parameters using artificial neural networks (ANNs). ANNs approximate the functional relationship between random input and output variables by learning from examples made up of historical input and output data [17]. Published applications of ANNs in the field of solar energy include time-series forecasting of solar radiation quantities [18-21] and other function approximation or regression models that map a set of input parameters, such as temperature, into radiation quantities [22-26]. One major attraction of ANN methods is their ability to find relations between input and output even when an explicit representation is intractable [19]. An ANN can, therefore, map a wide range of possible combinations of input or explanatory variables to a single desired output. This, however, does not diminish the importance of carefully selecting the variables. Koca et al. [26], for example, showed that different combinations of inputs affected the performance of ANN models that predicted global solar irradiation.
Solar energy is one of the promising sources of energy in South Africa. It is therefore important to investigate the performance of solar radiation models for South African conditions. A growing database of solar irradiance data from measurements by the Southern African Universities Radiometric Network (SAURAN) [27] provides opportunities to investigate and develop clear-sky models for South African conditions. The present investigation considered four models, two of which are semi-empirical broadband models: Ineichen-Perez (I-P) [10] and European Solar Radiation Atlas (ESRA) [11], which take the Linke turbidity index, Earth-sun geometrical parameters and other geographical parameters as inputs. These models have been extensively investigated in regions outside South Africa, where relative root mean square errors (rRMSE) of less than 10% were reported [5,6,9,28]. The other two models considered in this investigation are ANN-based models, one a multilayer perceptron neural network (MLPNN) and the other a radial basis function neural network (RBFNN). All four models were calibrated to predict horizontal clear-sky solar irradiance from similar inputs that carry information about location, time of day and year, as well as atmospheric conditions. Model performance was investigated across eight different locations. The theoretical details of these models are discussed, followed by a methodology that describes data preparation and model evaluation criteria.
where $\theta_z$ is the apparent zenith angle.

European solar radiation atlas model
The GHIclear for the ESRA model is given in Rigollier et al. [11] as the sum of the beam horizontal clear-sky irradiance (BHIclear) and the diffuse horizontal clear-sky irradiance (DHIclear). Equation 3 gives the expression for calculating the BHIclear.
where $\delta_R(m)$ is the Rayleigh optical thickness, and its computation from air mass m is given by Kasten [13].
The diffuse irradiance on a horizontal surface, as shown by Equation 4, is expressed as a product of a diffuse transmission function (Trd) and a diffuse angular function (Fd).

Artificial neural network models
General structure
A neural network model learns the statistical model that generates the data in a set of examples. The functional mapping of the model can be stated as in Equation 5.
$$y = y(\mathbf{x}; \mathbf{w}) \quad (5)$$
where:
• x is a vector of inputs;
• w is a vector of model parameters, usually referred to as weights; and
• y is the model output.
During learning, the ANN optimises the weights w so that the error between the desired output t, for input x, and the corresponding predicted output, y = y(x;w), is minimised. In ANNs that are applied to regression problems, the sum-of-squares error (SSE) function E is normally the preferred objective [30]. Equation 6 defines E.
where n = 1, 2, …, N indexes the training patterns or features making up the training input matrix and the corresponding target output vector t. In this investigation, the ANN models estimate the clear-sky global horizontal irradiance GHIclear from three inputs.
Equations 7 and 8 define the model input and output variables.
$$\mathbf{x} = \left[\cos\theta_z,\ \mathrm{doy}/N_y,\ \exp(-m T_L)\right] \quad (7)$$
$$t = (\mathrm{GHI}_{clear})_M / I_0 \quad (8)$$
where:
• doy denotes the day-of-year number (it equals 1 for the first day of January);
• $N_y$ is the number of days in a year; and
• $(\mathrm{GHI}_{clear})_M$ is the target clear-sky irradiance selected from records of measured irradiance data.
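The input and target definitions in Equations 7 and 8 can be sketched in a few lines of Python. This is only an illustration: the function and argument names are hypothetical, `i0` stands for the paper's I_0, and the default of 365 days per year is an assumption (a leap year would use 366).

```python
import math

def ann_features(cos_zenith, doy, air_mass, linke_turbidity, days_in_year=365):
    """Return the three-element input vector x of Equation 7."""
    return [
        cos_zenith,                              # cos(theta_z)
        doy / days_in_year,                      # doy / N_y
        math.exp(-air_mass * linke_turbidity),   # exp(-m * T_L)
    ]

def ann_target(ghi_clear_measured, i0):
    """Return the normalised target t of Equation 8."""
    return ghi_clear_measured / i0
```

Normalising the target by I_0 keeps it dimensionless, on a scale comparable to the three inputs.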
Figure 1 shows the general form of the architecture of an ANN that implements this model.
The following sections give a more detailed account of the specific forms of the functional mapping of the MLPNN and RBFNN.

Multilayer perceptron neural network
Bishop [30] and Nabney [31] gave detailed descriptions of the functional mapping of an MLPNN. For a two-layer MLPNN with M hidden units, which maps three inputs to one output, the functional mapping can be written in the form of Equation 9.
The MLPNN is usually trained by the 'backpropagation method' [17], which optimises the input-layer and output-layer weights until a set objective (usually a target SSE) is achieved.
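As a rough illustration of such a mapping, the sketch below evaluates a two-layer perceptron with M hidden units and one output, together with the sum-of-squares error that training would minimise. The tanh hidden activation and linear output unit are assumptions made for this sketch, not details confirmed by the text, and all names are hypothetical. Some texts scale the error sum by 1/2; that factor is omitted here.

```python
import math

def mlp_forward(x, w1, b1, w2, b2):
    """Two-layer perceptron: tanh hidden units (assumed), one linear output.

    w1 is an M x 3 list of input-layer weights, b1 the M hidden biases,
    w2 the M output-layer weights and b2 the output bias.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

def sum_of_squares_error(inputs, targets, w1, b1, w2, b2):
    """Sum over training patterns of the squared prediction error."""
    return sum(
        (mlp_forward(x, w1, b1, w2, b2) - t) ** 2
        for x, t in zip(inputs, targets)
    )
```

Backpropagation would supply the gradient of this error with respect to the weights; only the forward evaluation is shown.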

Radial basis function neural network
The RBFNN is considered the main practical alternative to the MLPNN for non-linear modelling [31]. The general radial basis function form of the network mapping is given by Bishop [30], and Equation 10 specialises it to the three inputs and one output of the clear-sky model.
where:
• $w_j^{(2)}$ are the elements of the output-layer vector of weights $\mathbf{w}^{(2)}$; and
• $\phi_j(\mathbf{x}) = \phi(\lVert \mathbf{x} - \mathbf{x}_j \rVert)$ are basis functions, where the jth input data point $\mathbf{x}_j$ defines the centre of the radial basis function, and the vector $\mathbf{x}$ is the vector of inputs applied to the input layer [17].
Gaussian and thin-plate spline functions are among the preferred basis functions in RBFNNs.
Training of the RBFNN goes through a two-stage process. The first stage optimises the radial basis function kernels, and the second stage optimises the output-layer weights $\mathbf{w}^{(2)}$ by the least squares method.
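The two-stage idea can be sketched with NumPy as follows. This is a simplified stand-in rather than the procedure used in the study: stage one here merely picks the centres as a random subset of the training inputs, with a single fixed Gaussian width, instead of optimising the kernels. Stage two solves for the output-layer weights by linear least squares. All function and parameter names are hypothetical.

```python
import numpy as np

def gaussian_design_matrix(x, centres, width):
    """Gaussian basis activations, plus a column of ones for the bias."""
    sq_dist = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-sq_dist / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((x.shape[0], 1))])

def train_rbf(x, t, n_centres, width, seed=0):
    # Stage 1 (simplified assumption): centres are a random subset of inputs.
    rng = np.random.default_rng(seed)
    idx = rng.choice(x.shape[0], size=n_centres, replace=False)
    centres = x[idx]
    # Stage 2: output-layer weights by linear least squares.
    phi = gaussian_design_matrix(x, centres, width)
    weights, *_ = np.linalg.lstsq(phi, t, rcond=None)
    return centres, weights

def predict_rbf(x, centres, width, weights):
    return gaussian_design_matrix(x, centres, width) @ weights
```

Because the output layer is linear in its weights once the centres are fixed, the second stage has a closed-form solution and needs no iterative optimisation.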

Methodology
Experimental
The irradiance information required for the calibration and evaluation of the models was obtained from measurements performed by eight radiometric stations spread across South Africa, as shown in Figure 2. The stations form part of the Southern African Universities Radiometric Network. Detailed information about the equipment used at the respective stations can be obtained from Brooks et al. [27] or by accessing the data portal webpage at http://www.sauran.net/.

Data preparation
Model inputs: The Linke turbidity indexes evaluated at air mass 2, $T_L(\mathrm{AM2})$, can be computed from measured clear-sky direct normal irradiance (DNI) using Kasten's pyrheliometric formula, given as Equation 11.
Using Equation 11, turbidity indexes limited to air mass in the range 1.99 ≤ m ≤ 2.2 were computed using yearlong samples of DNI data measured at each station. Linke turbidity indexes that fell outside the range 2 ≤ $T_L$ ≤ 5 were disregarded. This range was shown to be representative of clear skies for at least one location in South Africa [32].
The resulting time series of indexes were resampled, filtered and interpolated to produce yearlong time series of daily Linke turbidity indexes for each of the eight locations. The rest of the inputs, which include the solar zenith angle $\theta_z$, the air mass $m(\theta_z)$ and $I_0$, were all computed from well-known astronomical equations.
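The two range checks described above (air mass near 2 and Linke turbidity between 2 and 5) amount to a simple filter. The sketch below assumes the turbidity values have already been computed from DNI. The function name and the representation of samples as (air mass, turbidity) pairs are hypothetical, and the later resampling and interpolation steps are not shown.

```python
def select_turbidity(samples, m_range=(1.99, 2.2), tl_range=(2.0, 5.0)):
    """Keep (air_mass, linke_turbidity) pairs that fall inside both ranges."""
    return [
        (m, tl)
        for m, tl in samples
        if m_range[0] <= m <= m_range[1] and tl_range[0] <= tl <= tl_range[1]
    ]
```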

Model training and validation data.
The training patterns for the ANNs consisted of an N × 3 input data matrix, with rows as defined in Equation 7, and a corresponding output vector of N target elements, as defined in Equation 8. The number of features, N, corresponds to the number of clear-sky GHI data points selected from one-minute averages of GHI data measured at the eight stations. The selection was made according to the criterion 0.97 ≤ $k_c$ ≤ 1.01, where $k_c$ is a clear-sky index calculated as the ratio of measured GHI to the clear-sky GHI given by the Ineichen-Perez model [10]. The training data were selected from measurements of GHI gathered from 1 January 2014 to 31 December 2014. Using the same criterion, the validation data for all the models were selected from a sample of GHI data measured from 1 January 2015 to 31 December 2015 at each of the eight stations.
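The selection criterion can be illustrated with a short sketch that keeps only the samples whose clear-sky index lies between 0.97 and 1.01. It assumes the reference clear-sky GHI (from the Ineichen-Perez model in the study) is already available as a list aligned with the measurements. The names are hypothetical, and skipping samples with a non-positive reference value is an added safeguard, not something stated in the text.

```python
def select_clear_sky(ghi_measured, ghi_clear_model, lower=0.97, upper=1.01):
    """Return indices of samples whose clear-sky index is in [lower, upper]."""
    selected = []
    for i, (meas, model) in enumerate(zip(ghi_measured, ghi_clear_model)):
        # Skip night-time or invalid reference values to avoid division by zero.
        if model > 0 and lower <= meas / model <= upper:
            selected.append(i)
    return selected
```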

Evaluation of model accuracy
The bias, precision, and accuracy of the models were evaluated from the following usual statistics. The relative mean bias error (rMBE) represents the mean bias of model prediction. Equation 12 defines the mathematical computation of the rMBE for the GHIclear prediction models.
$$\mathrm{rMBE} = \frac{\operatorname{mean}(\text{model GHI}_{clear} - \text{measured GHI}_{clear})}{\operatorname{mean}(\text{measured GHI}_{clear})} \quad (12)$$
The rMBE gives an indication of how much a model under-estimates or over-estimates the observations. A sample of the model predictions may, however, consist of an even distribution of over- and under-estimated observations, resulting in the errors compensating each other and giving a false sense of unbiased predictions. The rRMSE gives a measure of the precision and bias, or accuracy, of the model. It is defined by Equation 13 for the GHIclear prediction models.
A potential drawback of rRMSE is its sensitivity to outlying estimates far away from the true value [33].
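A minimal sketch of the two statistics, following Equations 12 and 13, is given below. The function names are hypothetical, and both inputs are assumed to be equal-length sequences of model and measured clear-sky GHI values.

```python
import math

def rmbe(model, measured):
    """Relative mean bias error: mean(model - measured) / mean(measured)."""
    n = len(measured)
    mean_error = sum(p - o for p, o in zip(model, measured)) / n
    return mean_error / (sum(measured) / n)

def rrmse(model, measured):
    """Relative RMSE: sqrt(mean((model - measured)^2)) / mean(measured)."""
    n = len(measured)
    mean_sq = sum((p - o) ** 2 for p, o in zip(model, measured)) / n
    return math.sqrt(mean_sq) / (sum(measured) / n)
```

For example, with measured values [500, 500] and predictions [510, 490], the two errors cancel, so `rmbe` returns 0 while `rrmse` returns 0.02 (2%). This is the compensation effect described above.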

Training and validation data
The training and testing data were derived from each of the eight stations. Figure 3(a) shows that the University of Pretoria (UPR) and Vanrhynsdorp (VAN) contributed the smallest and largest amounts of data, respectively. The hourly contributions were also not uniform, and followed a normal distribution centred about solar noon, as shown in Figure 3(b). The periods 6:00-7:00 and 17:00-18:00 provided the least training and validation data.

General performance of the models
It is important to select the best possible network architectures for the ANN models. Since the number of inputs and the number of outputs were fixed at three and one, respectively, the optimal architectures were determined by choosing the number of hidden units M that resulted in the least error. Architectures with 4, 8, 12, 16, and 32 hidden units were considered. Figure 4 shows the rRMSE averaged over all sites as a function of the number of hidden units for both the MLPNN and RBFNN models. The 3-12-1 architecture gave the best performance for the MLPNN, while the 3-32-1 architecture showed the best performance for the RBFNN. There was, however, only a small difference between the performance of the 3-16-1 and 3-32-1 RBFNN architectures. A compromise between complexity and performance would thus favour the 3-16-1 RBFNN architecture.
The results also reveal that the 3-16-1 RBFNN architecture performed better than the 3-12-1 MLPNN. Figure 4 also shows the performance of the I-P and ESRA models and reveals that these two models performed better than the ANN models. The I-P model, with rRMSE < 2%, had the least prediction error.

Performance as a function of time
In Figure 5(a), the rRMSE averaged over all locations is plotted as a function of time of day for the 3-12-1 MLPNN, 3-16-1 RBFNN, I-P, and ESRA models. The ANN and ESRA models showed similar trends, with significantly larger rRMSE during the early morning and late evening hours than during the flat stretch between 7:00 and 17:00. This contrasted with the I-P model, which consistently performed with rRMSE below 2% for all hours. The prediction biases of the models are shown in Figure 5(b), where rMBE is plotted as a function of time of day. Again, the performance of the I-P model was consistent for all hours of the day, revealing a positive bias that indicated overestimation of the clear-sky GHI. The ESRA model also overestimated the GHI for the hours from 9:00 to 15:00. The biases for the ANNs showed a similar trend, with rMBE close to zero between 8:00 and 16:00. Given the large rRMSE, this trend suggests that the ANNs both overestimated and underestimated the GHI during this time interval, resulting in a net bias error close to zero.
Further insight into the variation of model performance with time is given in Figure 6, where rRMSE is plotted as a function of month of year. The performance of the ANNs varied the most with month of year and exhibited the largest monthly rRMSE. A comparison of the two ANN models revealed that the RBFNN performed better than the MLPNN, except for the months of October to January. The I-P model again showed the best monthly prediction performance, with rRMSE below 2% for all months.
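Breaking the error down by hour of day or by month amounts to grouping the samples before computing the rRMSE. The sketch below shows one way this could be done. It assumes each sample carries a group label (an hour or a month number) and that each group's error is normalised by that group's own mean measured GHI, which may differ from the averaging used for the figures. The names are hypothetical.

```python
import math
from collections import defaultdict

def rrmse_by_group(groups, model, measured):
    """Return {group: rRMSE} for samples labelled by hour or month."""
    sq_err = defaultdict(float)   # running sum of squared errors per group
    total = defaultdict(float)    # running sum of measured values per group
    count = defaultdict(int)      # number of samples per group
    for g, p, o in zip(groups, model, measured):
        sq_err[g] += (p - o) ** 2
        total[g] += o
        count[g] += 1
    return {
        g: math.sqrt(sq_err[g] / count[g]) / (total[g] / count[g])
        for g in count
    }
```

Normalising by the group mean makes errors at low-irradiance hours appear large in relative terms, which is one possible reading of the larger rRMSE reported for early morning and late evening.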

Conclusions
This paper presented four models for predicting clear-sky global horizontal irradiance: the Ineichen-Perez (I-P), European Solar Radiation Atlas (ESRA), multilayer perceptron neural network (MLPNN) and radial basis function neural network (RBFNN) models. The I-P model produced the most consistent and most accurate performance, recording relative root mean square error (rRMSE) values of less than 2% across all hours of day, all months of year and all locations. On the other hand, the two artificial neural network (ANN) models, MLPNN and RBFNN, showed poor performance across all hours and months for an all-stations-averaged evaluation. The evaluation of the ESRA model, when averaged over all stations, hours and months, revealed a performance close to that of the ANNs, with rRMSE values of close to 3%. The performance of the ANNs matched that of the I-P model for some of the stations, indicating that the ANNs 'remembered' the input and output relationships for these locations better than for other locations. It is, therefore, useful to explore ways to improve the generalisation capabilities of the ANNs for this clear-sky irradiance generation application.

Figure 1: Architecture of an artificial neural network having two layers of adaptive weights applied in this work. (Adapted from Nabney [31])

Figure 2: Map showing the locations and altitudes of the radiometric stations that provided irradiance data, where RVD = Richtersveld, VAN = Vanrhynsdorp, GRT = Graaff-Reinet, NMU = Nelson Mandela Metropolitan University, UFS = University of Free State, UPR = University of Pretoria, VRY = Vryheid, and KZH = University of KwaZulu-Natal Howard College.
$$\mathrm{rRMSE} = \frac{\sqrt{\operatorname{mean}\left[(\text{model GHI}_{clear} - \text{measured GHI}_{clear})^2\right]}}{\operatorname{mean}(\text{measured GHI}_{clear})} \quad (13)$$
In plots (a) and (b) of Figure 3, the populations of the training and testing data, expressed as percentages of the total number of training sample data points N = 340 426, are plotted as functions of location and time of day, respectively. The contributions were not uniform.

Figure 3: Distribution of training and testing data by (a) location and (b) time of day.

Figure 4: Relative root mean square error of the multilayer perceptron neural network (MLPNN) and radial basis function neural network (RBFNN) as a function of number of hidden units, compared with that of the Ineichen-Perez (I-P) and European Solar Radiation Atlas (ESRA) clear-sky models.

Examples of clear-sky GHI model predictions
Figure 8 illustrates a visual comparison of the models' predictions with the clear-sky GHI measured at the Graaff-Reinet radiometric station. At the scale shown by the figures, the model predictions matched the measured clear-sky GHI without perceptible differences, except for the hours close to solar noon. In these examples, the I-P model and the ANN models underestimated the clear-sky GHI at solar noon, while the ESRA model produced the closest match.