I am trying to get the word count and detect the language of a text file stored in AWS S3, using Python code in an AWS Lambda function. The code below returns the line count, but I don't know how to get the word count or detect the language. How can I get the file's word count and its language?

My attempt, which returns the line count:

    import boto3

    def lambda_handler(event, context):
        # create the S3 resource
        s3 = boto3.resource('s3')
        # get the file object
        obj = s3.Object('bucket name', 'sample.txt')
        # read the file contents into memory; read() returns bytes, so decode
        # them (assuming the file is UTF-8) before using str methods on Python 3
        file_contents = obj.get()["Body"].read().decode('utf-8')
        # count the newline characters to get the number of lines
        return {
            'Line Count': file_contents.count('\n')
        }

Current response:

    {
        "Line Count": 48
    }

Expected response:

    {
        "Line Count": 48,
        "Word Count": ?,   // here I want to show the word count
        "Language": ?      // here I want to show the language name
    }

  • You say it's not working, could you perhaps give more details about what's not working? Could you also provide a sample file and what you expect to get back from that file? Commented Jan 9, 2019 at 17:03
  • Hi @NickChapman, I updated my question. Could you please check it? Commented Jan 9, 2019 at 17:10

1 Answer

To get the number of words you can try any of the things listed here: How to count the number of words in a sentence, ignoring numbers, punctuation and whitespace?
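As a rough illustration of how much the result depends on the definition of a word, here is a minimal sketch using only the standard library. The function names `count_words_whitespace` and `count_words_alpha` and the sample string are made up for this example; the input is assumed to be the already-decoded text of the file.

```python
import re

def count_words_whitespace(text):
    # Definition 1: a word is any run of non-whitespace characters,
    # so numbers and stray punctuation are counted too.
    return len(text.split())

def count_words_alpha(text):
    # Definition 2: a word is a run of letters, optionally joined by
    # apostrophes (e.g. "It's"); digits and punctuation are ignored.
    return len(re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text))

sample = "Hello, world! It's 2019 - 48 lines."
print(count_words_whitespace(sample))  # 7
print(count_words_alpha(sample))       # 4
```

In the Lambda handler, either function could be called on the decoded file contents and the result returned under a "Word Count" key.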

To detect the language you can try one of the things listed here: NLTK and language detection
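One simple heuristic, sketched here with the standard library only, is to count how many words of the text appear in a stopword list for each candidate language and pick the best match. The `STOPWORDS` table below is a tiny hand-written assumption for illustration, not taken from any library, and `guess_language` is a hypothetical helper name. A dedicated language-detection library or service is likely to be far more reliable on real text.

```python
import re

# Tiny illustrative stopword lists (an assumption for this sketch; a real
# solution would use much larger lists or a dedicated detection library).
STOPWORDS = {
    "English": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "Spanish": {"el", "la", "y", "es", "de", "que", "en", "los"},
    "German": {"der", "die", "und", "ist", "das", "nicht", "ein", "zu"},
}

def guess_language(text):
    # Lowercase the text and split it into runs of letters.
    words = re.findall(r"[^\W\d_]+", text.lower())
    # Score each language by how many words are in its stopword list.
    scores = {
        lang: sum(1 for word in words if word in stopwords)
        for lang, stopwords in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # If no stopword matched at all, do not guess.
    return best if scores[best] > 0 else "Unknown"

print(guess_language("The cat is in the garden and it is happy"))
print(guess_language("El gato es de la casa y que bueno"))
```

This approach fails on very short texts and on languages missing from the table, which is part of why language detection is hard to get right.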

Unfortunately, your question is rather broad. Additionally, detecting a text's language is difficult to get right. Getting the word count is easy, but the result depends heavily on how you define a word.
