For the complex pricing model, how do I read all the data using DataLoader()? I tried using a for loop, but it wasn't efficient, and the batches I got from the loop were overlapping with one another.
Hi Anshuman,
There is a tutorial notebook (accessible through your dashboard) where we show how to iterate efficiently over the dataset. This is a large dataset that won't fit into your RAM, so reading it in batches is the efficient and standard way to work with data of this size.
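To make the memory argument concrete, here is a minimal sketch of computing a column mean one batch at a time, so that only one batch is held in memory at once. It does not use Alphien: `fake_batches` and `streaming_mean` are hypothetical names, and each batch is a plain list of floats standing in for whatever batch object the real loader returns. Only the accumulation pattern is the point.

```python
from typing import Iterable, Iterator, List


def fake_batches(n_rows: int, batch_size: int) -> Iterator[List[float]]:
    """Hypothetical stand-in for a data loader: yields one batch (values of a
    single column) at a time instead of building the full dataset in memory."""
    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)
        # The batch is only created when the consumer asks for it.
        yield [float(i) for i in range(start, stop)]


def streaming_mean(batches: Iterable[List[float]]) -> float:
    """Mean over all batches, keeping only a running sum and count in memory."""
    total = 0.0
    count = 0
    for batch in batches:
        total += sum(batch)
        count += len(batch)
    if count == 0:
        raise ValueError("no rows were read")
    return total / count


print(streaming_mean(fake_batches(n_rows=10, batch_size=4)))  # prints 4.5
```

With a real loader, the same loop would update its running totals from each batch it receives and then let that batch go.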
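DataLoader() gives you a generator through its batch() method: instead of loading everything at once, it hands you one batch at a time as you iterate over it. If you're not familiar with how generators work in Python, they are worth reading about. Here is a quick snippet that requests the first 500,000 rows of the dataset: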
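```python
import alphien

dl = alphien.data.DataLoader()
gen = dl.batch(toRow=500000)  # generator over the first 500,000 rows
for data in gen:
    print(data.shape)  # (rows, columns) of the current batch
```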
Here is the output:
```
(174000, 165)
(189000, 165)
(136999, 165)
```
You can see that we have iterated over 3 consecutive batches to get back the first 500,000 rows (the three batch sizes add up to 499,999). To find out how big the dataset is, you can print dl.size(). However, during the first few weeks, only the first 75% of the dataset is available; we'll open the remaining 25% at a later stage.
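If a hand-written loop returns overlapping batches, a likely culprit is that each iteration computes its start offset from the wrong place. A generator avoids this because it remembers where it stopped between iterations. The sketch below is a hypothetical helper (`row_ranges`), not Alphien's implementation: it assumes rows are numbered from 0 and uses half-open [start, stop) ranges, fed with the batch sizes from the output above.

```python
from typing import Iterator, List, Tuple


def row_ranges(batch_sizes: List[int], start: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield consecutive half-open [start, stop) row ranges, one per batch.

    The generator keeps `start` alive between calls to next(), so each range
    begins exactly where the previous one ended: no gaps and no overlap."""
    for size in batch_sizes:
        stop = start + size
        yield start, stop
        start = stop  # the next batch resumes here


sizes = [174000, 189000, 136999]  # batch sizes from the output above
ranges = list(row_ranges(sizes))
for lo, hi in ranges:
    print(lo, hi, hi - lo)

# Sanity check: every batch starts where the previous one stopped.
assert all(prev[1] == nxt[0] for prev, nxt in zip(ranges, ranges[1:]))
```

Because the position lives inside the generator, the calling loop never has to track offsets itself, which is exactly where overlaps tend to creep in.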
Thanks, that helped!