caret includes two functions, minDiss and sumDiss, that can be used to maximize the minimum and total dissimilarities, respectively.

As an example, the figure below shows a scatter plot of two chemical descriptors for the Cox2 data. Using an initial random sample of 5 compounds, we can select 20 more compounds from the data so that the new compounds are most dissimilar from the initial 5 that were specified. The panels in the figure show the results using several combinations of distance metrics and scoring functions. For these data, the distance measure has less of an impact than the scoring method for determining which compounds are most dissimilar.

newSamp <- maxDissim(start, samplePool, n = 20)

The visualization below shows the data set (small points), the starting samples (larger blue points) and the order in which the other 20 samples are added.

Simple random sampling is probably not the best way to resample time series data. Hyndman and Athanasopoulos (2013) discuss rolling forecasting origin techniques that move the training and test sets in time. caret contains a function called createTimeSlices that can create the indices for this type of splitting.

The three parameters for this type of splitting are:

initialWindow: the initial number of consecutive values in each training set sample.
horizon: the number of consecutive values in each test set sample.
fixedWindow: a logical; if FALSE, the training set always starts at the first sample and the training set size will vary over data splits.

As an example, suppose we have a time series with 20 data points.
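To make the rolling forecasting origin idea concrete, here is a minimal base R sketch of how such indices could be built for a series of 20 points. The helper rolling_origin_slices and its argument names are hypothetical and are not part of caret; the settings of an initial window of 5 and a horizon of 2 are assumptions chosen only for illustration.

```r
# Hypothetical helper (not part of caret): builds rolling-origin index sets
# from the three parameters described in the text.
rolling_origin_slices <- function(n, initial_window, horizon = 1,
                                  fixed_window = TRUE) {
  stopifnot(n >= initial_window + horizon)
  # Last index of each training set; the origin moves forward one step at a time
  stops <- seq(initial_window, n - horizon)
  # Fixed window: the start slides forward. Otherwise it stays at the first sample.
  starts <- if (fixed_window) stops - initial_window + 1 else rep(1, length(stops))
  list(
    train = Map(seq, starts, stops),
    test  = Map(seq, stops + 1, stops + horizon)
  )
}

# Assumed settings: 20 points, initial window of 5, horizon of 2
slices <- rolling_origin_slices(n = 20, initial_window = 5, horizon = 2)
length(slices$train)  # 14 splits
slices$train[[1]]     # 1 2 3 4 5
slices$test[[1]]      # 6 7

# With fixed_window = FALSE the training set grows from 5 to 18 samples
growing <- rolling_origin_slices(20, 5, 2, fixed_window = FALSE)
lengths(growing$train)
```

With caret installed, a call such as createTimeSlices(1:20, initialWindow = 5, horizon = 2, fixedWindow = TRUE) should return comparable train and test index lists, which can then be used to define the resampling splits.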