Accessing the embeddings from the leaderboard #2448
Replies: 6 comments
-
No, we don't store them. You need to run the benchmarks yourself.
-
Ok, thanks for the fast reply. I tried using CachedEmbeddingWrapper for this. Is there a way to store all of the embeddings for different tasks and models in separate directories, or do I have to do this separately for each model and task?
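For reference, here is roughly how I am using it - a minimal sketch, assuming the import path `mteb.models.cache_wrapper` and the `cache_path` argument from the current source (adjust if they differ), with one cache directory per model and all tasks sharing it:

```python
# Minimal sketch of what I tried (import path and cache_path argument
# assumed from the current mteb source; adjust if they differ).
import mteb
from sentence_transformers import SentenceTransformer
from mteb.models.cache_wrapper import CachedEmbeddingWrapper

model_name = "sentence-transformers/all-MiniLM-L6-v2"
model = SentenceTransformer(model_name)

# One cache directory per model; all tasks currently share this one map.
cached_model = CachedEmbeddingWrapper(
    model, cache_path=f"cache/{model_name.replace('/', '__')}"
)

tasks = mteb.get_tasks(tasks=["STS12"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(cached_model, output_folder="results")
```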
-
I'm not sure that the current implementation supports this.
-
Ok, thanks. I'll have a look; maybe I can contribute a PR.
-
Thanks for the question here - as this is more of a discussion, I will move it over :)
-
Hi, I created a PR for this: #2467. I made some modifications to the CachedEmbeddingWrapper class. So far, the cache is initialized as a single TextVectorMap when the class is instantiated. I simply updated the code to initialize a separate TextVectorMap per task. The encode function receives the task_name through kwargs, so I can use that to cache the embeddings into the corresponding map.

The memmaps are initialized with 100,000 vectors, so with many tasks this might lead to some unused disk space, but I checked and the files are only a few megabytes in size. Alternatively, one could make the initial number of vectors smaller, e.g. 10,000. Not sure if this is necessary, but storing per task could also be made optional (otherwise it just appends to one large map). I also
Looking forward to feedback :)
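To make the routing idea concrete, here is a minimal sketch; a plain dict stands in for the memmap-backed TextVectorMap, and the PR applies the same logic to the real class:

```python
# Sketch of the per-task routing: one cache per task_name instead of a
# single shared map. A dict stands in for the memmap-backed TextVectorMap.
import numpy as np

class PerTaskCachedEncoder:
    def __init__(self, model):
        self.model = model
        self.caches = {}  # task_name -> {text: vector}

    def encode(self, sentences, **kwargs):
        # mteb passes task_name via kwargs, so we can pick the right map.
        task_name = kwargs.get("task_name", "default")
        cache = self.caches.setdefault(task_name, {})
        missing = [s for s in sentences if s not in cache]
        if missing:
            vectors = self.model.encode(missing)
            cache.update(zip(missing, vectors))
        return np.stack([cache[s] for s in sentences])
```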
-
Hi all,
is there any way to access the raw embedding arrays for all models and tasks on the leaderboard? Are they stored somewhere?
Otherwise one has to download all models and tasks and re-run the entire benchmark.
Thanks!