Another question: are there rules of thumb for deciding how to split the data into training and testing subsamples? Imagine one has a lot of data (say 500K observations). Cross-validation takes a lot of time. If the training set were only 25% of the data, things would go quicker. But what constitutes "enough" data to train the model?
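For concreteness, here is a minimal sketch of the kind of split being described, assuming scikit-learn is available; the synthetic data, variable names, and the 25% figure are purely illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the ~500K-observation dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(500_000, 10))
y = rng.integers(0, 2, size=500_000)

# Keep 25% for training so cross-validation runs faster;
# hold out the remaining 75% for the final test evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.25, random_state=0
)
```

Cross-validation would then be run on `X_train`/`y_train` only, with `X_test`/`y_test` touched once at the end.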