0
$\begingroup$

Assume that you want to look up a car's price based on the name of the car and its production year.

    Name, production year, price
    Volvo, 2021, 50000
    Volvo, 2012, 16000
    Toyota, 2022, 30000

Then if the input data is:

Volvo, 2018 

It should return the most similar rows, for example:

    Volvo, 2021, 50000
    Volvo, 2012, 16000

Then I can manually choose the right price.

$\endgroup$
  • $\begingroup$ So what is your question? $\endgroup$ Commented Oct 12, 2022 at 10:17
  • $\begingroup$ df.query("Name == 'Volvo'")? $\endgroup$ Commented Oct 12, 2022 at 10:40

1 Answer

2
$\begingroup$

If you want to choose the result manually, you can simply filter the rows with df.query. You could also filter on the production_year column, but in your example it does not seem to matter.

    import pandas as pd

    data = pd.DataFrame({
        "name": ['Volvo', 'Volvo', 'Toyota'],
        "production_year": [2021, 2012, 2022],
        "price": [50000, 16000, 30000],
    })
    # Keep only the rows whose name is Volvo
    q = data.query("name == 'Volvo'")

Returns:

    Volvo, 2021, 50000
    Volvo, 2012, 16000
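To bring production_year into the lookup, one option is to keep the rows with a matching name and order them by how far their year is from the requested one. This is only a minimal sketch built on the same three rows; the helper name closest_by_year and the year_gap column are hypothetical.

```python
import pandas as pd

data = pd.DataFrame({
    "name": ["Volvo", "Volvo", "Toyota"],
    "production_year": [2021, 2012, 2022],
    "price": [50000, 16000, 30000],
})


def closest_by_year(df, name, year):
    """Hypothetical helper: rows with the given name, nearest production year first."""
    matches = df[df["name"] == name]
    # Absolute distance in years between each row and the requested year
    gap = (matches["production_year"] - year).abs()
    return matches.assign(year_gap=gap).sort_values("year_gap")


candidates = closest_by_year(data, "Volvo", 2018)
print(candidates)
```

For the input Volvo, 2018 this puts the 2021 row first (3 years away) and the 2012 row second (6 years away), so the final choice of price can still be made by hand.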

I don't know whether your data is in text documents or in a table, but it seems like the latter. In that case, you can go straight to building something like a recommender system; see https://www.kaggle.com/code/alexanderstefanov/goodreads-search-engine/notebook
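For tabular data with more columns than you can query by hand, a simple starting point is a nearest-neighbour lookup over encoded features. The sketch below is not the method used in the linked notebook; it assumes one-hot encoding for the name, standardisation for the year, and Euclidean distance, all of which are illustrative choices that you may need to change for 40+ features.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "name": ["Volvo", "Volvo", "Toyota"],
    "production_year": [2021, 2012, 2022],
    "price": [50000, 16000, 30000],
})
query = pd.DataFrame({"name": ["Volvo"], "production_year": [2018]})

# Price is the value to look up, so it is not used as a feature
feature_cols = ["name", "production_year"]
combined = pd.concat([data[feature_cols], query], ignore_index=True)

# One-hot encode the categorical column so that it becomes numeric
encoded = pd.get_dummies(combined, columns=["name"], dtype=float)
# Standardise the year so it is on a scale comparable to the 0/1 columns
year = encoded["production_year"]
encoded["production_year"] = (year - year.mean()) / year.std()

rows = encoded.iloc[:-1].to_numpy()    # the known cars
target = encoded.iloc[-1].to_numpy()   # the query, appended as the last row

# Euclidean distance from the query to every known car
distances = np.linalg.norm(rows - target, axis=1)
ranked = data.assign(distance=distances).sort_values("distance")
print(ranked)
```

With these three rows the two Volvos rank ahead of the Toyota, and the 2021 Volvo comes first. How the columns are scaled and weighted decides what counts as "similar", so that choice deserves more care than this sketch gives it.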

In the former case, I would use sentence transformers for semantic search.

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')

    # Embed the query and the candidate passages
    query_embedding = model.encode('Volvo made in 2012 with a price of 10000')
    passage_embedding = model.encode([
        'Volvo made in 2022 with a price of 500000',
        'Volvo made in 2010 with a price of 12000',
        'Volvo made in 2013 with a price of 13337',
        'London is known for manufacturing Volvos',
    ])

    # Dot-product similarity between the query and each passage
    print("Similarity:", util.dot_score(query_embedding, passage_embedding))

Returns the following similarity scores, one per passage in the order given: 0.9150, 0.9557, 0.9385, 0.6478
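To turn these scores into a shortlist, you can sort the passages by score and pick from the top. The sketch below reuses the four scores reported above as plain numbers so that it runs without downloading the model; in practice you would take them from the tensor returned by util.dot_score.

```python
import numpy as np

passages = [
    "Volvo made in 2022 with a price of 500000",
    "Volvo made in 2010 with a price of 12000",
    "Volvo made in 2013 with a price of 13337",
    "London is known for manufacturing Volvos",
]
# Scores copied from the output above, in the same order as the passages
scores = np.array([0.9150, 0.9557, 0.9385, 0.6478])

# Indices of the passages, highest score first
order = np.argsort(-scores)
for rank, idx in enumerate(order, start=1):
    print(f"{rank}. {scores[idx]:.4f}  {passages[idx]}")

best = passages[order[0]]
```

Here the 2010 Volvo ranks first and the sentence about London ranks last, which matches the intuition that it is the least relevant to the query.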

$\endgroup$
  • $\begingroup$ Thanks for the reply, and my apologies for the ambiguity. The similar results should come out of a clustering algorithm. This example is easy enough to do manually, but once you have 40+ features you cannot query by hand, and you probably do not know how to measure similarity. $\endgroup$ Commented Oct 12, 2022 at 11:21
  • $\begingroup$ @user141467 I am not sure that clustering is what you need. You could approach it that way, but it seems like you're in the field of neural search / sentence similarity. $\endgroup$ Commented Oct 12, 2022 at 12:16
