Questions tagged [natural-language-processing]
For questions related to natural language processing (NLP), which is concerned with the interactions between computers and human (or natural) languages, in particular how to create programs that process and analyze large amounts of natural language data.
758 questions
0 votes
2 answers
101 views
If LLMs like OpenAI / DeepSeek / Gemini exist, why do we still need ML or NLP libraries, now and in the future?
I’m new to AI and NLP, and I’m trying to understand how different tools fit together. Large Language Models (LLMs) like OpenAI, DeepSeek, or Gemini can already handle many NLP tasks text ...
0 votes
1 answer
43 views
How to manage a discussion between two characters
I want to code (in C++) a method allowing a character C1 to ask or request something from another character C2. The answer of C2 will be environment-related: does it know the thing C1 is looking for? ...
0 votes
0 answers
45 views
Extracting services mentioned in short reports — rules vs ML?
I’m trying to identify which home services are present vs explicitly excluded in short, free-text reports. I also need to normalize synonyms (e.g., “pressure washing” → “power washing”). Goal: decide ...
1 vote
0 answers
64 views
Are traditional NLP tasks solved well by modern transformer/LLM technologies?
Before the LLM explosion, the traditional NLP tasks, such as parsing, coreference resolution, translation to logical representation, temporal and event sequence resolution -- all have approached a ...
0 votes
0 answers
33 views
How can I give context to the BLIP model when generating captions?
I'm using HuggingFace's 'blip-image-captioning-base' model for image captioning. I trained it on both existing and domain-specific datasets I created specifically for generating Turkish language ...
2 votes
0 answers
39 views
How can I integrate BERT Tokenizer into BLIP model for image captioning?
Lately, I've been working on generating alt text for images using the BLIP model. The model I use is "blip-image-captioning-base" from HuggingFace. However, to generate alt text in Turkish ...
0 votes
0 answers
49 views
How can a symbolic "traveler equation" help AI detect signal-carrying works across time?
In my personal project, "The Circular Vision: an equation to find signal‑carrying humans", I propose a symbolic framework to think about how some human works (songs, poems, scientific ...
0 votes
0 answers
24 views
Research on machine learning for people with sensory impairments (visual or hearing)
I'm exploring the potential of Visual-Language Models (like CLIP, BLIP, etc.) in assistive technology, particularly for people with visual impairment. I have explored a few research papers in this area, ...
2 votes
1 answer
107 views
How can text models handle misspellings?
To my knowledge, the string of text is first tokenized and then the tokens are fed to a transformer. Handling simple typos ("exsmple" rather than "example") can be done by ...
1 vote
1 answer
120 views
Are there any research papers with guidelines or tricks on how to use LLMs effectively?
There are several prompting techniques that can significantly enhance the performance of LLMs across a wide range of tasks, including programming. For example, a complex problem can be recursively ...
1 vote
0 answers
64 views
Probability of training data in language models
Are there estimates of the total probability neural language models assign to the corpus they were trained on?
0 votes
0 answers
40 views
BERT Adapter and LoRA for Multi-Label Classification (301 classes)
I’m working on a multi-label classification task with 301 labels. I’m using a BERT model with Adapters and LoRA. My dataset is relatively large (~1.5M samples), but I reduced it to around 1.1M to ...
0 votes
0 answers
34 views
Classifying review as real/fake based solely on text
Hey, I just wanted to know if there are any papers on classifying reviews, news, or any text as fake based solely on the text. I've been digging and can only find papers that use other data such ...
0 votes
1 answer
60 views
How do BPE Tokenizers with `add_prefix_space` handle natural language, such as quotations or poetry, where there won't be a prefix space?
BPE Tokenizers are the standard for modern LLMs. By default, most add_prefix_space, so that John went away is pretokenized to <...
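The `add_prefix_space` behavior this question asks about can be illustrated with a toy sketch. This is not the actual HuggingFace implementation, just a minimal model of GPT-2-style pretokenization, where the space before a word is folded into that word and rendered as the "Ġ" marker; the function name and logic here are illustrative assumptions:

```python
def pretokenize(text, add_prefix_space=True):
    # Toy GPT-2-style pretokenizer: the space before a word is merged
    # into that word and shown with the "Ġ" marker. With
    # add_prefix_space=True the first word is treated as if it were
    # preceded by a space, so "John" at the start of a line pretokenizes
    # the same way as " John" mid-sentence.
    if add_prefix_space and not text.startswith(" "):
        text = " " + text
    pieces = []
    for i, word in enumerate(text.split(" ")):
        if not word:
            continue  # skip the empty piece produced by a leading space
        pieces.append("Ġ" + word if i > 0 else word)
    return pieces

print(pretokenize("John went away"))
# ['ĠJohn', 'Ġwent', 'Ġaway']
print(pretokenize('"Hi," she said', add_prefix_space=False))
# ['"Hi,"', 'Ġshe', 'Ġsaid']
```

The second call shows the situation the question title raises: a quotation or a line of poetry starts with no preceding space, so without `add_prefix_space` its first token has no "Ġ" marker and may map to a different vocabulary entry than the same word seen mid-sentence.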
1 vote
0 answers
66 views
Applying the RTD task to a model trained with MLM leads to a decrease in performance as training progresses
We are developing a new LLM based on the CodeBERT architecture. As part of this effort, we initially trained our model using the Masked Language Modeling (MLM) objective with the HuggingFace API. To ...