How to get the word count of a particular file in AWS S3 storage using Lambda?

Question

I am trying to get the word count of a particular text file stored in AWS S3, and to detect its language, using Python code in AWS Lambda. The code below is what I have tried. It gives me the line count, but I don't know how to get the word count or detect the language. Please suggest how to get the file's word count and language.

What I tried for the line count:

    import boto3

    def lambda_handler(event, context):
        # create the s3 resource
        s3 = boto3.resource('s3')
        # get the file object
        obj = s3.Object('bucket name', 'sample.txt')
        # read the file contents in memory
        file_contents = obj.get()["Body"].read()
        # print the occurrences of the new line character to get the number of lines
        # print file_contents.count('\n')
        # TODO implement
        return {
            'Line Count': file_contents.count('\n')
        }

Current Response:

    { "Line Count": 48 }

Expected Response:

    {
      "Line Count": 48,
      "Word Count": ?,   // Here I want to show the word count
      "Language": ?      // Here the language name
    }

You say it's not working; could you perhaps give more details about what's not working? Could you also provide a sample file and what you expect to get back from that file?
– Nick Chapman, Commented Jan 9, 2019 at 17:03
Hi @NickChapman, I updated my question. Could you please check it?
– sai, Commented Jan 9, 2019 at 17:10

Nick Chapman · Accepted Answer · 2019-01-09 20:44:22Z

To get the number of words, you can try any of the approaches listed here: "How to count the number of words in a sentence, ignoring numbers, punctuation and whitespace?"
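As a rough sketch of what that could look like for the bytes read from S3, the helper below returns the line count alongside two word counts. The function name `count_lines_and_words` and the `"Alphabetic Word Count"` key are made up for illustration, and the file is assumed to be UTF-8 text; which definition of "word" is right depends on your data.

```python
import re

# Hypothetical helper (not from the question): summarises raw file bytes.
def count_lines_and_words(data, encoding="utf-8"):
    """Return line and word counts for the raw bytes of a text file."""
    # Assumption: the file is UTF-8; undecodable bytes are replaced, not fatal.
    text = data.decode(encoding, errors="replace")
    return {
        # Same approach as the question: count newline characters.
        "Line Count": text.count("\n"),
        # Loose definition: anything separated by whitespace is a word.
        "Word Count": len(text.split()),
        # Stricter definition: runs of letters (apostrophes allowed inside),
        # so numbers and stand-alone punctuation are ignored.
        "Alphabetic Word Count": len(
            re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text)
        ),
    }
```

In the handler from the question, this would be called on the bytes that are already being read, for example `count_lines_and_words(obj.get()["Body"].read())`.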

To detect the language, you can try one of the approaches listed here: "NLTK and language detection"
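Since the code runs on Lambda, one alternative to the NLTK-based approaches is Amazon Comprehend, whose boto3 client has a `detect_dominant_language` call. The sketch below is only an illustration under a few assumptions: the helper name `detect_language` and the `MAX_BYTES` cut-off are made up, the client is passed in by the caller, and the Lambda role is assumed to be allowed to call Comprehend.

```python
# Assumed cut-off (not an official figure): only a sample of the file is sent,
# because Comprehend limits the size of the text it accepts per request.
MAX_BYTES = 5000

# Hypothetical helper: comprehend_client is expected to be the object
# returned by boto3.client("comprehend").
def detect_language(text, comprehend_client):
    """Return the most likely language code for text, or None if unknown."""
    # Truncate on the byte level, then drop any character cut in half.
    sample = text.encode("utf-8")[:MAX_BYTES].decode("utf-8", errors="ignore")
    response = comprehend_client.detect_dominant_language(Text=sample)
    languages = response.get("Languages", [])
    if not languages:
        return None
    # Each entry carries a LanguageCode and a confidence Score.
    best = max(languages, key=lambda lang: lang["Score"])
    return best["LanguageCode"]
```

Inside the handler this could be used as `detect_language(file_contents.decode("utf-8"), boto3.client("comprehend"))`. Note that it returns a language code such as `en` rather than a language name, so you would still need your own mapping from codes to names for the "Language" field.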

Unfortunately, your question is rather broad. Additionally, the task of detecting a text's language is rather difficult to get right. Getting the word count is easy but depends a lot on what you are going to define a word as.
