Abstract:Time series data are widely generated in many fields of science, technology and economy. Time series feature generation algorithm based on Symbolic Fourier Approximation (SFA) and sliding window transformation mechanism is one of the most effective feature dictionary construction algorithms, but there are some obvious shortcomings in this kind of methods. Firstly, the number of optimal Fourier values cannot be dynamically selected for different sliding window lengths in the process of transformation. Secondly, there is a lack of effective algorithm to select discriminant features from the generated massive features. To this end, a new variable length feature dictionary building algorithm is proposed in this study. First, a variable length word extraction method based on SFA is proposed. The method dynamically selects the optimal number of Fourier values for different sliding window lengths. Second, a new feature discriminant evaluation indicator is designed, and the generated features are selected according to its dynamic threshold. Experimental results show that, based on the proposed time series dictionary, the logistic regression model can achieve high classification accuracy and find the discriminant features in the prediction process.