Java String - See if a string contains only numbers and characters not words?

Question

I have an array of strings that I load throughout my application, and it contains different words. I have a simple if statement to check whether a string contains letters or numbers, but not words.

I mean I only want strings like AB2CD5X, and I want to remove everything else, such as Hello 3, 3 word, or any other string that is an English word. Is it possible to keep only alphanumeric strings and exclude those that contain real English words?

I know how to check whether a string contains alphanumeric characters:

    Pattern p = Pattern.compile("[\\p{Alnum},.']*");

I also know:

    if (string.contains("[a-zA-Z]+") || string.contains("[0-9]+"))

How would you identify the difference between a series of letters and a word?
– Hirak, Commented May 28, 2014 at 12:01
Handling real English words across the whole language needs a large implementation. Just check the user input for alphanumeric strings, add those to a key-value style store, and discard the rest. Use a regex for the alphanumeric check.
– Akash kumar, Commented May 28, 2014 at 12:03

Community · Accepted Answer · 2017-05-23 12:09:52Z

What you need is a dictionary of English words. Then you basically scan your input and check if each token exists in your dictionary. You can find text files of dictionary entries online, such as in Jazzy spellchecker. You might also check Dictionary text file.

Here is sample code that assumes your dictionary is a simple text file in UTF-8 encoding with exactly one (lower case) word per line:

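    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.LineNumberReader;
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Scanner;
    import java.util.Set;

    // Class name is illustrative; the original snippet showed only the methods.
    public class DictionaryFilter {

        public static void main(String[] args) throws IOException {
            final Set<String> dictionary = loadDictionary();
            final String text = loadInput();
            final List<String> output = new ArrayList<>();
            // by default splits on whitespace
            final Scanner scanner = new Scanner(text);
            while (scanner.hasNext()) {
                final String token = scanner.next().toLowerCase();
                // keep only tokens that are not dictionary words
                if (!dictionary.contains(token))
                    output.add(token);
            }
            System.out.println(output);
        }

        private static String loadInput() {
            return "This is a 5gse5qs sample f5qzd fbswx test";
        }

        private static Set<String> loadDictionary() throws IOException {
            final File dicFile = new File("path_to_your_flat_dic_file");
            final Set<String> dictionaryWords = new HashSet<>();
            String line;
            final LineNumberReader reader = new LineNumberReader(
                    new BufferedReader(new InputStreamReader(new FileInputStream(dicFile), "UTF-8")));
            try {
                // one dictionary word per line
                while ((line = reader.readLine()) != null)
                    dictionaryWords.add(line);
                return dictionaryWords;
            } finally {
                reader.close();
            }
        }
    }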

If you need more accurate results, you need to extract the stems of your words. See Apache Lucene and its EnglishStemmer.
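
Since the question starts from an array of strings rather than one block of text, the dictionary lookup can also be applied per entry. The following is only a rough sketch under stated assumptions: the class and method names (EntryFilterSketch, keepCodes) are hypothetical, the dictionary is passed in as an already loaded set of lower-case words, and "alphanumeric" is taken to mean ASCII letters and digits only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the answer's original code.
class EntryFilterSketch {

    // Keeps an entry only if it is a single alphanumeric token
    // that does not appear in the dictionary.
    static List<String> keepCodes(List<String> entries, Set<String> dictionary) {
        List<String> kept = new ArrayList<>();
        for (String entry : entries) {
            String trimmed = entry.trim();
            // Assumption: ASCII letters and digits only; spaces or punctuation disqualify the entry.
            boolean singleAlnumToken = trimmed.matches("[A-Za-z0-9]+");
            // Assumption: the dictionary holds lower-case words.
            boolean isWord = dictionary.contains(trimmed.toLowerCase(Locale.ROOT));
            if (singleAlnumToken && !isWord) {
                kept.add(trimmed);
            }
        }
        return kept;
    }
}
```

With a dictionary containing hello and word, the entries AB2CD5X and fbswx would be kept, while Hello 3, 3 word and hello would be dropped. Note that a bare number such as 3 would also be kept by this sketch, so add a digit-only check if that is not wanted.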

Federico Piazza · Accepted Answer · 2014-06-10 17:13:20Z

You can use Cambridge Dictionaries to check whether a token is a real word. If you find a valid human-language word, you can skip it.

As the documentation says, to use the library, you need to initialize a request handler and an API object:

    DefaultHttpClient httpClient = new DefaultHttpClient(new ThreadSafeClientConnManager());
    SkPublishAPI api = new SkPublishAPI(baseUrl + "/api/v1", accessKey, httpClient);
    api.setRequestHandler(new SkPublishAPI.RequestHandler() {
        public void prepareGetRequest(HttpGet request) {
            System.out.println(request.getURI());
            request.setHeader("Accept", "application/json");
        }
    });

To use the "api" object:

    try {
        System.out.println("*** Dictionaries");
        JSONArray dictionaries = new JSONArray(api.getDictionaries());
        System.out.println(dictionaries);

        JSONObject dict = dictionaries.getJSONObject(0);
        System.out.println(dict);
        String dictCode = dict.getString("dictionaryCode");

        System.out.println("*** Search");
        System.out.println("*** Result list");
        JSONObject results = new JSONObject(api.search(dictCode, "ca", 1, 1));
        System.out.println(results);

        System.out.println("*** Spell checking");
        JSONObject spellResults = new JSONObject(api.didYouMean(dictCode, "dorg", 3));
        System.out.println(spellResults);

        System.out.println("*** Best matching");
        JSONObject bestMatch = new JSONObject(api.searchFirst(dictCode, "ca", "html"));
        System.out.println(bestMatch);

        System.out.println("*** Nearby Entries");
        JSONObject nearbyEntries = new JSONObject(
                api.getNearbyEntries(dictCode, bestMatch.getString("entryId"), 3));
        System.out.println(nearbyEntries);
    } catch (Exception e) {
        e.printStackTrace();
    }

Community · Accepted Answer · 2020-06-20 09:12:55Z

ANTLR might help you. ANTLR stands for ANother Tool for Language Recognition.

Hibernate uses ANTLR to parse its query language, HQL (keywords like SELECT and FROM).

jhobbie · Accepted Answer · 2014-06-13 16:47:00Z

    if (string.contains("[a-zA-Z]+") || string.contains("[0-9]+"))

I think this is a good starting point, but since you're looking for strings that contain both letters and numbers, you need && rather than ||. Also note that String.contains looks for a literal substring, not a regular expression, so use matches for the pattern checks:

    if (string.matches(".*[a-zA-Z].*") && string.matches(".*[0-9].*"))

I guess you might also want to check for spaces, because a space could indicate separate words or a sequence like 3 word. So in the end you could use:

    if (string.matches(".*[a-zA-Z].*") && string.matches(".*[0-9].*") && !string.contains(" "))

Hope this helps
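
The three conditions above (at least one letter, at least one digit, no spaces) can also be folded into a single precompiled pattern. This is a minimal sketch, not a definitive implementation: the class and method names are hypothetical, and it assumes that only ASCII letters and digits are allowed, which is stricter than just rejecting spaces.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the combined check.
class MixedAlnumCheck {

    // Two lookaheads require at least one letter and at least one digit;
    // the character class then allows only ASCII letters and digits overall.
    private static final Pattern MIXED_ALNUM =
            Pattern.compile("^(?=.*[A-Za-z])(?=.*[0-9])[A-Za-z0-9]+$");

    static boolean isMixedAlphanumeric(String s) {
        return s != null && MIXED_ALNUM.matcher(s).matches();
    }
}
```

Under these assumptions, isMixedAlphanumeric("AB2CD5X") is true, while "Hello 3", "3 word", "Hello" and "12345" are all rejected. Compiling the pattern once avoids recompiling it on every call, which String.matches does internally.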

Amit P · Accepted Answer · 2014-06-14 09:03:11Z

You may try this:

First, tokenize the string using StringTokenizer with the default delimiters. For each token, discard it if it contains only digits or only letters; the remaining tokens are the ones that combine digits and letters. You can use regular expressions to identify the digits-only and letters-only tokens.
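
The steps above could be sketched roughly as follows. The class and method names are hypothetical, and the sketch assumes that "characters" means ASCII letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper illustrating the tokenize-and-discard approach.
class TokenFilterSketch {

    static List<String> mixedTokens(String input) {
        List<String> kept = new ArrayList<>();
        // The default delimiters are whitespace characters (space, tab, newline, ...).
        StringTokenizer tokenizer = new StringTokenizer(input);
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            boolean onlyDigits = token.matches("[0-9]+");
            boolean onlyLetters = token.matches("[a-zA-Z]+");
            // Discard digits-only and letters-only tokens, keep everything else.
            if (!onlyDigits && !onlyLetters) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

For the input "Hello 3 AB2CD5X word f5qzd" this keeps AB2CD5X and f5qzd. One caveat of this sketch: a token with punctuation, such as "word,", is neither digits-only nor letters-only, so it would be kept as well unless punctuation is stripped first.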
