The getting_started guide says to train on 500k rows, but datagen only yields 174,000 rows, which matches the number of rows in the first slice of data. It looks like dl.batch() only loads one slice of data.
Also, the 4 GB RAM limit is very tight for this dataset, considering that we’ll probably use a lot of tree-based models, which need the full dataset in memory at once.
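
For anyone who wants to reproduce the row count, here is a minimal sketch of how I'd check it. It only assumes that the generator is iterable and that each batch supports `len()`; `count_rows` and `fake_datagen` are hypothetical helpers I made up for illustration (not part of the starter kit), and the slice sizes are toy numbers.

```python
from typing import Iterable, Iterator, List, Sized


def count_rows(batches: Iterable[Sized]) -> int:
    """Sum the lengths of all batches produced by an iterable."""
    return sum(len(batch) for batch in batches)


def fake_datagen(slice_sizes: List[int], batch_size: int) -> Iterator[List[int]]:
    """Hypothetical stand-in for the real generator.

    Walks every slice and yields lists of row indices, at most
    batch_size rows per batch.
    """
    for size in slice_sizes:
        for start in range(0, size, batch_size):
            yield list(range(start, min(start + batch_size, size)))


# Toy slice sizes, not the real ones: a loader that reads only the first
# slice reports fewer rows than one that walks all slices.
first_slice_only = count_rows(fake_datagen([7], batch_size=3))
all_slices = count_rows(fake_datagen([7, 7, 6], batch_size=3))
print(first_slice_only, all_slices)
```

With the real loader, the same `count_rows` call could be pointed at whatever dl.batch() returns, assuming it behaves like an iterable of batches. If the total equals the size of the first slice, that would support the guess that only one slice is being read.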