THE PARAMETRIC AND NONPARAMETRIC ESTIMATOR IN SEMIPARAMETRIC REGRESSION FOR LONGITUDINAL DATA WITH SPLINE APPROACH

Regression analysis aims to determine the relationship between response variables and predictor variables. There are three approaches to estimating regression curves: parametric, nonparametric, and semiparametric regression. In this study, the form of the spline semiparametric regression curve estimator for longitudinal data is assessed. The estimator, obtained by Weighted Least Square (WLS) optimization, is applied to model electricity consumption in Madura, with a model for longitudinal data chosen based on a linear spline estimator with two knots. The goodness of the model is judged by the GCV value, the coefficient of determination, and the MSE. The best model is the one with a high coefficient of determination and a small MSE. This spline model has a coefficient of determination of 99.72911% and an MSE of 32.50458.


INTRODUCTION
Regression analysis is used to determine the pattern of the relationship between response variables and predictor variables [1] [2]. There are three approaches to determining the shape of the regression curve: parametric, nonparametric, and semiparametric. In the parametric regression model, the form of the function is assumed to be known, for example linear, quadratic, cubic, polynomial, or exponential. In the nonparametric regression model, the form of the function is assumed to be unknown, and the curve is estimated with approaches such as spline, neural network, kernel, polynomial, wavelet, histogram, MARS, and Fourier series. If a regression model contains both a parametric component and a nonparametric component, it is a semiparametric model [3][4].
In this research, we use a semiparametric regression approach with a linear estimator for the parametric component and a spline estimator for the nonparametric component. The spline estimator is highly flexible and can estimate data whose behavior tends to differ over different intervals, and the spline model offers a good statistical and visual interpretation [5]. For these reasons, the spline method has been developed over the last decade. Pratiwi [6] introduced a semiparametric regression to estimate the average age at first marriage in East Java with a linear spline estimator. Budiantara [7] developed a spline estimator in nonparametric regression by using a family of spline basis functions. The truncated spline estimator gives easier and simpler mathematical calculations than other estimators.
With the development of data analysis in the big data era, regression analysis requires not only the cross-section data that are often used, but also longitudinal data. Longitudinal data combine cross-section and time series data. Research with longitudinal data is more reliable in finding answers about the dynamics of change, and longitudinal data potentially provide more complete information. Another advantage of longitudinal data is that the changes occurring within a subject can be observed, because the observations are repeated for each subject [8]. Semiparametric regression studies that have used longitudinal data include spline [9], mixed-effects modeling [10], and kernel logistic [11].
However, a study of longitudinal data with a spline estimator is still needed to accommodate repetitive data patterns, in this case seasonal, periodic, and seasonal-trend combinations.
This manuscript consists of four parts. The first part, this introduction, explains the importance of determining the spline estimator in semiparametric regression for longitudinal data and gives some motivation based on previous studies. The second part presents the longitudinal data structure, the general form of the spline estimator in semiparametric regression for longitudinal data, and its matrix equation. Based on the matrix equation, Weighted Least Square (WLS) optimization is carried out to determine the estimator of the parameter vector. That part ends by presenting the form of the spline estimator in semiparametric regression for longitudinal data. The third part gives the results and discussion: an application case based on simulation data with the spline estimator in semiparametric regression for longitudinal data. The fourth part gives the conclusion based on the results.

MATERIAL AND METHODS
Consider the longitudinal data structure presented in Table 1. In the semiparametric regression model for these data, equation (1), $f_i(t_{ij})$ represents a regression curve, and $\varepsilon_{ij}$ denotes the random error for the $j$-th observation on the $i$-th subject; the errors are independent and identically normally distributed with mean 0 and variance $\sigma^2$. The curve $f_i(t_{ij})$ is approximated by the spline function in equation (2). Substituting equation (2) into equation (1) gives equation (3), the semiparametric regression equation for longitudinal data approximated by a spline function. In equation (3), $p$ is the order of the spline, $K_l$ is the $l$-th knot point, and $r$ is the number of knots. A spline function with $p = 1$ is called a linear spline function, one with $p = 2$ a quadratic spline function, and so on. The coefficients in equation (3) are unknown parameters whose values are estimated, and the truncated function is defined as

$$(t_{ij} - K_l)_+^{p} = \begin{cases} (t_{ij} - K_l)^{p}, & t_{ij} \geq K_l, \\ 0, & t_{ij} < K_l. \end{cases}$$

Equation (3) can be written in matrix form as

$$y = X\beta + T\eta + \varepsilon, \qquad (4)$$

where $\beta$ is the parameter vector of the parametric component and $\eta$ is the parameter vector of the spline component.

Weighted Least Square (WLS) optimization is carried out on the matrix form in equation (4):

$$\min_{\beta,\,\eta}\ (y - X\beta - T\eta)^{T} W (y - X\beta - T\eta), \qquad (5)$$

with $W$ the weight matrix given in equation (6). The estimator of $\beta$, denoted $\hat{\beta}$, is obtained by expanding the right-hand side of equation (5), differentiating with respect to $\beta$, and setting the derivative equal to the zero vector. The result is

$$\hat{\beta} = (X^{T}WX)^{-1}\{X^{T}Wy - X^{T}WT\eta\}. \qquad (8)$$

The same procedure applied to $\eta$ gives

$$\hat{\eta} = (T^{T}WT)^{-1}\{T^{T}Wy - T^{T}WX\beta\}. \qquad (9)$$

The estimators in equation (8) and equation (9) are not yet free of parameters, since each still contains the other parameter vector, so parameter-free estimators are obtained by mutual substitution. Substituting equation (9) into equation (8) gives equation (10). Based on equation (10), the weight matrix affects the determination of $\hat{\beta}$. Like the Ordinary Least Square (OLS) optimization method, WLS is not tied to an error distribution. The weight matrix plays a role in the smoothing process for nonparametric regression with longitudinal data (G, 1990).
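As a worked step for the mutual substitution, the following sketch solves the two first-order conditions behind equations (8) and (9) simultaneously. It assumes that $W$ is symmetric and that every matrix being inverted is nonsingular. The shorthand matrices $C$ and $D$ are introduced only for this sketch and are not notation taken from the original derivation, so the result may differ in form from the paper's equation (10).

```latex
% Assumptions: W symmetric; X^T W X, T^T W T, X^T C X and T^T D T nonsingular.
% C and D are shorthand introduced for this sketch only.
\begin{align*}
\text{Normal equations:}\quad
  & X^{T}WX\,\beta + X^{T}WT\,\eta = X^{T}Wy, \\
  & T^{T}WX\,\beta + T^{T}WT\,\eta = T^{T}Wy. \\
\text{From the second:}\quad
  & \eta = (T^{T}WT)^{-1}T^{T}W\,(y - X\beta). \\
\text{Insert into the first:}\quad
  & X^{T}CX\,\beta = X^{T}C\,y,
    \qquad C = W - WT(T^{T}WT)^{-1}T^{T}W, \\
  & \hat{\beta} = (X^{T}CX)^{-1}X^{T}C\,y. \\
\text{By the same argument:}\quad
  & \hat{\eta} = (T^{T}DT)^{-1}T^{T}D\,y,
    \qquad D = W - WX(X^{T}WX)^{-1}X^{T}W.
\end{align*}
```

Both estimators are linear in $y$ and depend on the weight matrix only through $C$ and $D$, which is consistent with the remark that the weight matrix affects the estimator.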
The spline estimator in semiparametric regression for longitudinal data is built from the estimators in equation (10) and equation (11) and can be written in matrix form as

$$\hat{y} = X\hat{\beta} + T\hat{\eta}. \qquad (13)$$

Substituting equation (10) and equation (11) into equation (13) expresses the fitted values as a hat matrix applied to $y$. One part of this matrix is the hat matrix for the parametric component and the other is the hat matrix for the nonparametric component; together they form the hat matrix of the semiparametric regression model with the spline approach. The vector $K = (K_1, \ldots, K_r)$ contains the knot points on which this hat matrix depends.
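To make the estimator and its hat matrix concrete, the following minimal sketch fits $y = X\beta + T\eta + \varepsilon$ by weighted least squares for a single subject. It solves for $\beta$ and $\eta$ jointly through the combined design $[X \mid T]$, which gives the same solution as mutual substitution when the matrices involved are invertible. The data, weights, knot positions, and all function names are assumptions made for illustration, and plain Python lists are used only to keep the sketch free of dependencies.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    cols = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]

def solve(a, b):
    """Solve a @ z = b (b may have several columns) by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [v / p for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def truncated(t, knot, order=1):
    """Truncated power function (t - knot)_+^order."""
    return (t - knot) ** order if t >= knot else 0.0

def wls_fit(X, T, w, y):
    """Joint WLS fit with diagonal weights w; returns (beta_hat, eta_hat, hat)."""
    Z = [x_row + t_row for x_row, t_row in zip(X, T)]       # combined design [X | T]
    ZtW = [[z * wi for z, wi in zip(col, w)] for col in transpose(Z)]  # Z^T W
    B = solve(matmul(ZtW, Z), ZtW)                          # (Z^T W Z)^{-1} Z^T W
    theta = [sum(b * yi for b, yi in zip(row, y)) for row in B]
    hat = matmul(Z, B)                                      # y_hat = hat @ y
    k = len(X[0])
    return theta[:k], theta[k:], hat

# Hypothetical single-subject data: 12 time points, knots at t = 4 and t = 8.
t = [float(j) for j in range(1, 13)]
x = [2.0, 3.5, 1.0, 4.2, 3.3, 5.1, 2.8, 4.9, 6.0, 3.7, 5.5, 4.4]
X = [[1.0, xj] for xj in x]                                 # parametric part
T = [[tj, truncated(tj, 4.0), truncated(tj, 8.0)] for tj in t]  # linear spline part
beta_true, eta_true = [1.0, 0.5], [0.3, -0.8, 1.2]          # made-up coefficients
y = [sum(b * v for b, v in zip(beta_true, xr)) + sum(e * v for e, v in zip(eta_true, tr))
     for xr, tr in zip(X, T)]                               # noise-free response
w = [1.0 / (1.0 + 0.1 * j) for j in range(12)]              # arbitrary positive weights

beta_hat, eta_hat, hat = wls_fit(X, T, w, y)
y_hat = [sum(h * yi for h, yi in zip(row, y)) for row in hat]
trace_hat = sum(hat[i][i] for i in range(len(y)))
print(beta_hat, eta_hat, trace_hat)
```

Because the response is generated without noise, the fit recovers the made-up coefficients up to rounding, and the trace of the hat matrix equals the number of fitted parameters (five here). With several subjects, the same computation applies to the stacked response vector and design matrices.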
The goodness indicators often used in semiparametric regression are the Mean Square Error (MSE), Generalized Cross Validation (GCV), and the coefficient of determination ($R^2$). All of these indicators can be applied to the spline estimator in semiparametric regression for longitudinal data.
For the spline estimator in semiparametric regression for longitudinal data, the optimal knot points must be determined. The GCV formula can be used for this purpose; GCV is often used because it has asymptotically optimal properties [9]. The optimal knot points are those that give the smallest GCV value. Choosing the optimal knot points tends to produce a high coefficient of determination, close to 100%. In the coefficient of determination formula, $\hat{y}$ is the vector of estimated values for all subjects and $\bar{y}$ is the vector of mean values for each subject. The best model for prediction is the one that meets the goodness criteria: the smallest GCV value at the optimal knot points, the smallest Mean Square Error (MSE), and a large coefficient of determination.
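The three indicators can be sketched as follows. The forms used are common textbook definitions and are assumptions here, not formulas recovered from this paper: MSE as the mean squared residual, GCV as $\mathrm{MSE} / (1 - \operatorname{tr}(A)/N)^2$ with $A$ the hat matrix and $N$ the total number of observations, and $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$ with the total sum of squares taken around each subject's own mean. The function names and the toy values are hypothetical.

```python
def mse(y, y_hat):
    """Mean squared residual over all observations."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def gcv(y, y_hat, trace_hat):
    """GCV = MSE / (1 - tr(A)/N)^2, where trace_hat is the trace of the hat matrix."""
    n = len(y)
    return mse(y, y_hat) / (1.0 - trace_hat / n) ** 2

def r_squared(y, y_hat, subject):
    """R^2 = 1 - SSE/SST, with SST measured around each subject's mean (assumed form)."""
    totals = {}
    for s, v in zip(subject, y):
        total, count = totals.get(s, (0.0, 0))
        totals[s] = (total + v, count + 1)
    mean = {s: total / count for s, (total, count) in totals.items()}
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - mean[s]) ** 2 for a, s in zip(y, subject))
    return 1.0 - sse / sst

# Toy values: two subjects with two observations each, hat-matrix trace of 2.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
subject = ["A", "A", "B", "B"]

print(mse(y, y_hat), gcv(y, y_hat, 2.0), r_squared(y, y_hat, subject))
```

For these toy values the residual sum of squares is about 0.10, so the MSE is about 0.025, the GCV is about 0.025 / (1 - 2/4)^2 = 0.1, and $R^2$ is about 1 - 0.10/1.0 = 0.9.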
In this part, the spline estimator in semiparametric regression for longitudinal data is applied to electricity consumption data. The data consist of one response and two predictors. The response variable is the monthly peak load. The first predictor is the monthly current, and the second predictor is the observation time. There are three subjects in this application, and each subject is observed for 12 months. The data are analyzed with the spline estimator in semiparametric regression for longitudinal data.
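One plausible way to carry out the knot selection in such an analysis is a grid search that refits the model for each candidate knot and keeps the knot with the smallest GCV. The sketch below does this for a single subject with 12 observations, one knot, a linear spline, and unit weights (so WLS reduces to OLS). The response values, the candidate grid, and the function names are hypothetical, and the GCV form $\mathrm{MSE} / (1 - \operatorname{tr}(A)/N)^2$ is the standard definition rather than a formula taken from this paper.

```python
def solve(a, b):
    """Solve a @ z = b for a vector b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [v / p for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n] for row in m]

def design(t, knot):
    """Design rows [1, t, (t - knot)_+] for a linear spline with one knot."""
    return [[1.0, tj, max(tj - knot, 0.0)] for tj in t]

def gcv_for_knot(t, y, knot):
    """Fit by least squares for the given knot and return the GCV value."""
    Z = design(t, knot)
    q, n = len(Z[0]), len(y)
    ZtZ = [[sum(row[a] * row[b] for row in Z) for b in range(q)] for a in range(q)]
    Zty = [sum(row[a] * yi for row, yi in zip(Z, y)) for a in range(q)]
    theta = solve(ZtZ, Zty)
    resid = [yi - sum(c * v for c, v in zip(theta, row)) for row, yi in zip(Z, y)]
    mse = sum(e * e for e in resid) / n
    return mse / (1.0 - q / n) ** 2          # tr(A) = q for a full-rank design

# Hypothetical response: slope change at t = 6 plus a small alternating disturbance.
t = [float(j) for j in range(1, 13)]
y = [2.0 + 0.5 * tj + 1.5 * max(tj - 6.0, 0.0) + 0.05 * (-1) ** j
     for j, tj in enumerate(t)]

candidates = [float(k) for k in range(2, 12)]    # interior knots only
scores = {k: gcv_for_knot(t, y, k) for k in candidates}
best_knot = min(scores, key=scores.get)
print(best_knot, scores[best_knot])
```

Every candidate here has the same number of parameters, so the GCV ranking coincides with the MSE ranking; the penalty in the denominator starts to matter when candidates differ in the number of knots.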

RESULT AND DISCUSSION
The open source software (OSS) R is used for the analysis based on the spline estimator in semiparametric regression for longitudinal data. Scatter plots showing the pattern of the data are presented in Figure 1 for Sampang Regency, Figure 2 for Pamekasan Regency, and Figure 3 for Sumenep Regency. The parameter estimates in this study depend on the optimum knot values, which are obtained from the minimum GCV value.
Based on the optimal knot points and the parameter estimates obtained for each subject, the truncated spline semiparametric regression model for longitudinal data can be written for each subject. The model estimate for the first subject is $\hat{y}_1 = 0.033 + 0.147 \ldots$
Several interpretations follow from the truncated model in equation (18) for Sampang Regency. When the average monthly electric current in Sampang is below 1616.263, an increase of 1 unit in the average monthly current tends to increase the peak load by 0.288. When the average monthly current in Sampang Regency is between 1616.263 and 1643.132, the peak load tends to decrease by 15.934 per unit of current, with 26219.018 added as a constant. When the current in Sampang Regency is 1643.132 or more, the peak load tends to change by 0.144 per unit of current, with 26418.276 subtracted as a constant. A similar interpretation applies to the other two subjects. The truncated models of equations (16) and (17), for the second and third subjects respectively, are given next.
The truncated model of equation (16) for Pamekasan Regency is
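The segment-wise reading of the Sampang Regency model follows from expanding a linear truncated spline. The sketch below considers only the spline part in the current, $g(x) = a x + b_1 (x - K_1)_+ + b_2 (x - K_2)_+$, and lists the slope and constant on each segment. The knots are those reported for Sampang Regency. The coefficients $a$, $b_1$, and $b_2$ are not quoted from the paper: they are back-computed here from the reported segment slopes (0.288, -15.934, 0.144) and should be treated as assumptions, as should all variable and function names.

```python
K1, K2 = 1616.263, 1643.132      # knots reported for Sampang Regency
a = 0.288                        # slope below the first knot (reported)
b1 = -15.934 - a                 # assumed: slope change at K1, inferred from reported slopes
b2 = 0.144 - (a + b1)            # assumed: slope change at K2, inferred from reported slopes

def spline_part(x):
    """Linear truncated spline a*x + b1*(x - K1)_+ + b2*(x - K2)_+."""
    return a * x + b1 * max(x - K1, 0.0) + b2 * max(x - K2, 0.0)

# Constants contributed by each truncated term once it becomes active.
const_k1 = -b1 * K1
const_k2 = -b2 * K2

def segments():
    """Slope and constant of the spline part on each of the three segments."""
    return [
        ("x < K1", a, 0.0),
        ("K1 <= x < K2", a + b1, const_k1),
        ("x >= K2", a + b1 + b2, const_k1 + const_k2),
    ]

for label, slope, const in segments():
    print(f"{label}: slope = {slope:.3f}, constant = {const:.3f}")
```

Under these assumptions the products $-b_1 K_1$ and $-b_2 K_2$ come out at about 26219.018 and -26418.276, which are the constants quoted in the interpretation, so the reported slopes and constants are mutually consistent with a linear spline with two knots.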

CONCLUSION
The relationship between indicators in the field of electricity consumption, arranged according to a longitudinal data structure, can be modeled with a spline semiparametric regression estimator for longitudinal data. Based on the results of the semiparametric spline regression estimator for longitudinal data and the goodness criteria of the model, the best model for the electricity consumption indicators is a linear spline model with two knots. The data analysis gives satisfactory goodness indicators, namely a small MSE and a high coefficient of determination. This spline model has a coefficient of determination of 99.72911% and an MSE of 32.50458.

Table 2. Comparison of GCV value and determination coefficient