UBSPricing datagen does not function as expected

realethanzou · September 19, 2020, 7:17am

In getting_started, the instructions say to train on 500k rows, but datagen yields only 174000 rows, which matches the number of rows in the first slice of data. It seems that dl.batch() loads only one slice of data.

Also, the 4GB RAM limit is very small for the given dataset, considering that we’ll probably use a lot of tree-based models that need the full dataset at their disposal.

ls · September 19, 2020, 7:39pm

There have been other questions on the forum about the RAM limitation. The data is HUGE, so you have to slice it, and that’s part of the challenge. You should load a batch, train on it, free the memory, and start again!
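The load, train, free, repeat loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `make_slices` is a hypothetical stand-in for the competition's data loader (it is not the real `dl.batch()` API), the data is synthetic, and a tiny SGD-trained linear model is used simply because it can be updated one slice at a time.

```python
import gc
import random

def make_slices(n_slices=3, rows_per_slice=1000, seed=0):
    """Hypothetical stand-in for the real loader: yields one slice at a time."""
    rng = random.Random(seed)
    for _ in range(n_slices):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(rows_per_slice)]
        ys = [2.0 * x + 1.0 for x in xs]  # synthetic target: y = 2x + 1
        yield xs, ys

def train_incrementally(slices, lr=0.1, epochs_per_slice=5):
    """Update a linear model y = w*x + b slice by slice with plain SGD."""
    w, b = 0.0, 0.0
    rows_seen = 0
    for xs, ys in slices:
        for _ in range(epochs_per_slice):
            for x, y in zip(xs, ys):
                err = (w * x + b) - y
                w -= lr * err * x
                b -= lr * err
        rows_seen += len(xs)
        # Drop the references to this slice and ask the collector to run
        # before the next slice is loaded.
        del xs, ys
        gc.collect()
    return w, b, rows_seen

w, b, rows_seen = train_incrementally(make_slices())
print(f"w={w:.3f} b={b:.3f} rows_seen={rows_seen}")
```

Only one slice is held in memory at a time, so peak usage depends on the slice size rather than the full dataset. Some libraries expose the same pattern directly, for example `partial_fit` on certain scikit-learn estimators; whether a given tree-based model supports incremental updates depends on the library.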

Rise to the challenge! Happy coding, Lionel.

realethanzou · September 19, 2020, 9:24pm

Data streaming can work well with NNs, but for some tree-based models it’s not very practical. So this competition is all about working with tight constraints, then. In that case, could you elaborate on exactly how much CPU and RAM we have? I’ve noticed that my RAM usage sometimes exceeds the 4GB limit, but the instance is not instantly SIGKILLed. Is there a timeout set for this behaviour?

Thanks.

Edit: I found that I am the only user on the VM, which has a 12C24T CPU and 32GB of RAM. However, it seems that I have access to only 1 THREAD of the CPU but the entire pool of RAM. That doesn’t seem very reasonable to me.

manas · September 20, 2020, 12:25am

Hello,

Everyone has their own container with 200% CPU (2T) and 4GB of RAM. If you reach the 4GB limit, you can still kill some of your kernels to release the memory back to the heap. All that happens is that your code cannot allocate any more memory.
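Tools run inside a container often report the host's CPUs and RAM rather than the container's limits, which may explain seeing 24 threads and 32GB. A rough way to look at the limits themselves is to read the Linux cgroup files, as in the sketch below. It assumes the container exposes the standard cgroup v2 or v1 paths under `/sys/fs/cgroup`; how this particular platform is configured is not known, so every lookup falls back to `None`.

```python
from pathlib import Path

def parse_cpu_max(text):
    """Parse cgroup v2 cpu.max ('<quota> <period>' or 'max <period>').

    Returns the CPU allowance as a float, or None if unlimited.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)

def parse_memory_max(text):
    """Parse cgroup v2 memory.max (a byte count or 'max')."""
    text = text.strip()
    return None if text == "max" else int(text)

def read_text(path):
    """Return stripped file contents, or None if missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None

def container_limits():
    """Best-effort (cpus, memory_bytes); either value may be None."""
    cpus = mem = None
    cpu_v2 = read_text("/sys/fs/cgroup/cpu.max")
    if cpu_v2:
        cpus = parse_cpu_max(cpu_v2)
    else:
        # cgroup v1 layout; a quota of -1 means "no limit".
        quota = read_text("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
        period = read_text("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
        if quota and period and int(quota) > 0:
            cpus = int(quota) / int(period)
    mem_v2 = read_text("/sys/fs/cgroup/memory.max")
    if mem_v2:
        mem = parse_memory_max(mem_v2)
    else:
        # cgroup v1 reports a very large number when there is no limit.
        mem_v1 = read_text("/sys/fs/cgroup/memory/memory.limit_in_bytes")
        if mem_v1:
            mem = int(mem_v1)
    return cpus, mem

print(container_limits())
```

Under cgroup v2, a `cpu.max` of `200000 100000` corresponds to a 200% allowance, that is, two full threads' worth of CPU time per period, however many threads it is spread across.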

Apart from that, if your session is idle for 2 hours, it will be killed and will start again when you come back online.

Thanks,

Manas

realethanzou · September 20, 2020, 1:04am

I checked again. In a single notebook I can still access only one thread, whether from the main process or from processes created by multiprocessing.Pool(2). With multiple notebooks I do have access to two threads, but that is not my use case.
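One way to test this kind of observation is to time one CPU-bound process against two running at once. The sketch below is an assumption-laden illustration, not the code used in the post: it launches child interpreters with `subprocess` instead of `multiprocessing.Pool(2)` so that it behaves the same regardless of the multiprocessing start method, and the loop size is arbitrary.

```python
import os
import subprocess
import sys
import time

# CPU-bound busy loop executed in each child interpreter.
BURN = "x = 0\nfor i in range(3_000_000):\n    x += i * i\n"

def run_burners(n_procs):
    """Start n_procs busy children at once; return wall seconds until all exit."""
    start = time.perf_counter()
    procs = [subprocess.Popen([sys.executable, "-c", BURN]) for _ in range(n_procs)]
    for p in procs:
        p.wait()
    return time.perf_counter() - start

def parallel_ratio():
    """Wall time of two concurrent burners divided by that of one burner."""
    one = run_burners(1)
    two = run_burners(2)
    return two / one

# Affinity lists the CPUs the scheduler may use; it does not reflect a
# cgroup CPU quota, so it can show every host thread. Linux only.
if hasattr(os, "sched_getaffinity"):
    print("CPUs in affinity mask:", len(os.sched_getaffinity(0)))
print("os.cpu_count():", os.cpu_count())

ratio = parallel_ratio()
print(f"two-process / one-process wall time: {ratio:.2f}")
```

A ratio near 1 suggests the two processes ran in parallel on separate threads, while a ratio near 2 suggests they shared a single thread. Timings are noisy and include interpreter start-up, so the measurement is worth repeating a few times.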

manas · September 20, 2020, 1:08am

I could have a look again, but everyone gets a 200% limit, so you should be able to use 100% of 2 threads or 50% of 4 threads.

If you have sample code, we could look into it. Or, if you’re uncomfortable posting it here, please look for me on Alphachat.

Thanks,

Manas