I have an embedding matrix of shape (100000, 100). I want to compute all pairwise cosine distances between its rows. I tried the sklearn.metrics.pairwise.cosine_distances function, but it crashes when RAM runs out. I also tried doing the calculation in batches, like so:
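```python
import numpy as np
from tqdm import tqdm
from sklearn.metrics.pairwise import cosine_distances

# Note: astype returns a new array; the result is not assigned here,
# so embeddings keeps its original dtype.
embeddings.astype(np.float32)

distances_matrix = []
batch_size = 1000
df_size = len(embeddings)

for i in tqdm(range(0, df_size, batch_size)):
    end = min(i + batch_size, df_size)
    batch = embeddings[i:end]
    # Distances from this batch of rows to every row: shape (batch_size, df_size)
    batch_distances = cosine_distances(batch, embeddings)
    # Every batch result is kept in RAM, so memory use grows with each iteration
    distances_matrix.append(batch_distances)
```

but it also crashes after about 11 iterations.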
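For scale, here is a rough back-of-the-envelope estimate of how much memory the result would need. It assumes the distances are stored as a dense NumPy array and ignores any temporaries that cosine_distances allocates internally. The helper names (matrix_bytes, to_gib) are made up for illustration.

```python
# Rough memory estimate for a dense pairwise distance matrix.
# Assumption: dense storage, no temporaries counted.

def matrix_bytes(n_rows, n_cols, itemsize):
    """Bytes needed for a dense n_rows x n_cols array with the given item size."""
    return n_rows * n_cols * itemsize

def to_gib(n_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30

n = 100_000        # number of embeddings (rows)
batch_size = 1_000

for dtype_name, itemsize in (("float64", 8), ("float32", 4)):
    full = matrix_bytes(n, n, itemsize)                # all pairwise distances
    per_batch = matrix_bytes(batch_size, n, itemsize)  # one batch of rows
    print(
        f"{dtype_name}: full matrix ~{to_gib(full):.1f} GiB, "
        f"one batch ~{to_gib(per_batch):.2f} GiB"
    )
```

If the embeddings are float64, each batch result takes roughly 0.75 GiB, so 11 appended batches already hold about 8 GiB, and the full matrix would need about 75 GiB (about 37 GiB in float32). So batching alone probably cannot help as long as every batch result stays in memory.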
Any suggestions on how to approach this? Thanks.