If True, shuffle the data before splitting it. Recommended for most ML applications.
array of categories, e.g. class labels, used to stratify the holdout train/test split
number greater than 0 and smaller than 1, indicating the fraction of data used for the training set; typically 0.8
seed for the random number generator
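
To illustrate how these four options could interact, here is a minimal sketch of a stratified holdout split using only the Python standard library. The function name `holdout_split` and the parameter names `train_size`, `shuffle`, `stratify`, and `seed` are hypothetical, chosen for illustration; they are not taken from the documented API, whose actual names and rounding behavior may differ.

```python
import random
from collections import defaultdict


def holdout_split(data, train_size=0.8, shuffle=True, stratify=None, seed=None):
    """Return (train, test) lists from a single holdout split of data.

    Hypothetical helper for illustration only.
    """
    if not 0 < train_size < 1:
        raise ValueError("train_size must be greater than 0 and smaller than 1")
    if stratify is not None and len(stratify) != len(data):
        raise ValueError("stratify must have one label per data item")

    rng = random.Random(seed)  # a fixed seed makes the split reproducible

    # Group item indices by category; without stratify, everything is one group.
    groups = defaultdict(list)
    for index in range(len(data)):
        label = stratify[index] if stratify is not None else None
        groups[label].append(index)

    train_idx, test_idx = [], []
    for indices in groups.values():
        if shuffle:
            rng.shuffle(indices)
        # Apply the training fraction within each group (rounded to an integer),
        # so every category keeps roughly the same train/test proportion.
        cut = round(len(indices) * train_size)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])

    return [data[i] for i in train_idx], [data[i] for i in test_idx]


data = list(range(10))
labels = ["a"] * 5 + ["b"] * 5
train, test = holdout_split(data, train_size=0.8, stratify=labels, seed=0)
```

With ten items in two equally sized categories and a training fraction of 0.8, each category contributes four items to the training set and one to the test set. Calling the function again with the same seed returns the same split.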