Function approximation
Given a sequence of \( n \) points generated by a function \( f(x) \) and a number of elements \( k \), find the subset of \( k \) elements plus the first and last points of the sequence for which the approximation error is minimized
Let us define the error function as
\[e = \sum^n_{i=1} (y_i - g(x_i))^2\]i.e. the square of the difference between \( y_i \) and the ordinate \( g(x_i) \) for point \( x_i \) on a segment connecting two other points (i.e. approximating the \( i^{th} \) point); the algorithm should choose \( k \) points and minimize \( e \).
The problem can be solved via dynamic programming by recursively considering each point \( i \) as the first point of the set (excluding the first one of the entire \( n \) sequence and the last one of the sequence) and consider the error given by the approximation on the right and left of it.
Suppose the input data is
\[X = \{ 4,3,2,1 \} \\ Y = \{ 4,2,3,1 \} \\ K = 1\]which corresponds to the following function
we’re allowed to choose only one point except the first and last one to minimize the error. If we were to choose point \( (x_3, y_3) \) i.e. the third one with coordinates \( (3;2) \), there would be a segment from the point \( (1;1) \) to \( (3;2) \) and point \( 2;3 \) would yield an approximation error \( e \) equal to
\[e_{\overline{13}} = \left( y_2 - \left( \frac{y_3 - y_1}{x_3 - x_1} (x_2 - x_1) + y_1 \right) \right)^2\]Similar calculations would have to be performed if point \( (x_2;y_2) \) had been chosen instead. It follows that
\[e_{\overline{14}} = min \left\{ e_{\overline{13}} + e_{\overline{34}}, e_{\overline{12}} + e_{\overline{24}} \right\}\]where
\[e_{\overline{12}} = e_{\overline{34}} = 0\]The proposed solution runs in \( O(n) \) to calculate the error threshold (and \( O(n \log{n}) \)to sort the data) while the dynamic programming loop runs in \( O(n^2) \) therefore yielding a total of \( O(n^3) \).