UBSPricing datagen not functioning as expected

getting_started says to train on 500k rows, but datagen yields only 174000 rows, which coincides with the number of rows in the first slice of data. It seems that dl.batch() only loads one slice of data.


Also, the 4GB RAM limit is very small for the given dataset, considering that we’ll probably use a lot of tree-based models that need the full dataset at their disposal.

There have been other questions on the forum about the RAM limitation. The data is HUGE, so you have to slice it, and that’s part of the challenge. You should load a batch, train on it, free the memory, and start again!
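The load / train / free / repeat loop described above can be sketched as follows. This is only an illustration under stated assumptions: `iter_slices` is a hypothetical stand-in for the competition's data loader (it is not `dl.batch()`, whose behaviour is not documented in this thread), the data is synthetic, and `OnlineLinearModel` is a toy one-feature model chosen only because it can be updated one slice at a time.

```python
import gc
import random


def iter_slices(n_slices=3, rows_per_slice=1000, seed=0):
    """Hypothetical loader: yields one slice of (xs, ys) at a time."""
    rng = random.Random(seed)
    for _ in range(n_slices):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(rows_per_slice)]
        ys = [3.0 * x + 1.0 for x in xs]  # synthetic target: y = 3x + 1
        yield xs, ys


class OnlineLinearModel:
    """Toy linear model updated by SGD, so it never needs all rows at once."""

    def __init__(self, lr=0.05):
        self.w = 0.0
        self.b = 0.0
        self.lr = lr
        self.rows_seen = 0

    def partial_fit(self, xs, ys):
        # One SGD step per row of the current slice.
        for x, y in zip(xs, ys):
            err = (self.w * x + self.b) - y
            self.w -= self.lr * err * x
            self.b -= self.lr * err
        self.rows_seen += len(xs)


model = OnlineLinearModel()
for xs, ys in iter_slices():
    model.partial_fit(xs, ys)  # train on the slice currently in memory
    del xs, ys                 # drop references to the slice
    gc.collect()               # ask Python to reclaim the memory now

print(model.rows_seen, round(model.w, 3), round(model.b, 3))
```

Only one slice is held in memory at any time, so peak usage is set by the slice size rather than by the full dataset. scikit-learn estimators such as `SGDRegressor` expose a `partial_fit` method that fits the same loop shape; whether a given tree-based model can be trained this way depends on the library.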


Rise to the challenge! Happy coding, Lionel.

Data streaming can work well with NNs, but for some tree-based models it’s not very practical. So this competition is all about working with tight constraints, then. In that case, could you elaborate on exactly how much CPU and RAM we have? I’ve noticed that my RAM usage sometimes exceeds the 4GB limit, but the instance is not instantly SIGKILLed. Is there a timeout set for this behaviour?

Thanks.


edit: I found that I am the only user on the VM, which has a 12C24T CPU and 32GB of RAM. However, it seems that I have access to only 1 thread of the CPU but the entire pool of RAM. That doesn’t seem very reasonable to me.

Hello,


Everyone has their own container with 200% CPU (2T) and 4GB of RAM. If you reach the 4GB limit, you can still kill some of your kernels to release the memory back to the heap. All that happens is that your code cannot allocate any more memory.
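To see how close a single kernel gets to the 4GB limit, a minimal sketch using only the standard library could look like this. It reports the peak memory of the current Python process only, not of the whole container, and the helper names `peak_rss_mb` and `headroom_mb` are made up for illustration.

```python
import resource
import sys

LIMIT_MB = 4 * 1024  # the 4GB per-container limit quoted above


def peak_rss_mb():
    """Peak resident set size of this process, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def headroom_mb(limit_mb=LIMIT_MB):
    """Rough distance between this process's peak usage and the limit."""
    return limit_mb - peak_rss_mb()


print(f"peak RSS: {peak_rss_mb():.1f} MiB, headroom: {headroom_mb():.1f} MiB")
```

Since several kernels share one container, the headroom reported here is an upper bound on what is actually left.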

Apart from that, if your session is idle for 2 hours, it is killed off and starts again when you come back online.

Thanks,

Manas



I checked again. In a single notebook I can still only access one thread, whether from the main process or from the processes created by multiprocessing.Pool(2). With multiple notebooks I do have access to two threads, but that is not my use case.
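A small reproduction along these lines might make the behaviour easier to compare between users. It is a sketch, not a diagnosis: `usable_cpus`, `burn` and `timed_run` are illustrative names, and the workload size is arbitrary. If two threads are really available, the run with two workers should take roughly half as long as the run with one.

```python
import os
import time
from multiprocessing import Pool


def usable_cpus():
    """Number of CPUs this process is allowed to be scheduled on."""
    if hasattr(os, "sched_getaffinity"):  # available on Linux only
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def burn(n):
    """CPU-bound loop; returns the sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_run(n_workers, n_tasks=2, n=2_000_000):
    """Wall-clock seconds to run n_tasks CPU-bound tasks on n_workers processes."""
    start = time.perf_counter()
    with Pool(n_workers) as pool:
        pool.map(burn, [n] * n_tasks)
    return time.perf_counter() - start


if __name__ == "__main__":
    print("usable CPUs:", usable_cpus())
    print("1 worker :", round(timed_run(1), 2), "s")
    print("2 workers:", round(timed_run(2), 2), "s")
```

Note that the affinity count and a container's CPU quota are separate mechanisms, so `usable_cpus()` may report more CPUs than the quota effectively allows; the timing comparison is the more telling number.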

I can have another look, but everyone gets a 200% limit, so you should be able to use 100% of 2 threads or 50% of 4 threads.

If you have sample code, we could look into it. Or if you’re uncomfortable posting it here, please look for me on Alphachat.


Thanks,

Manas