1,988 questions
0 votes
1 answer
96 views
How to format a vocabulary list into a table using Python for Google Sheets
I am a teacher and I want to use Python to create a worksheet for my students. I have a vocabulary PDF with content like this: do your best duː jɔː best 33, 81 do ...
0 votes
2 answers
150 views
How to extract text outside parentheses using regex? [duplicate]
I'm trying to extract the part of a string that is outside parentheses using regular expressions. For example, from the input string: "Hello (this is extra) world (more text)" I want to ...
0 votes
0 answers
102 views
How to remove duplicate footers from pdf text
An ASP.NET Core 9 MVC / C# controller extracts text from invoices using PdfPig, based on code from the answer to "How to group text to lines if there is small difference in Y position". Invoices can have multiple ...
2 votes
5 answers
200 views
Picking out file names from grep results where file name contains numbers and hyphens
I have a script that runs a grep command and formats the results nicely for me, asking if I want to open any of the resulting files in an editor etc. The core of my script is a command like this: grep ...
0 votes
0 answers
43 views
Swift Regex -- RegexBuilder for extracting blocks enclosed in tags
I'm trying to use RegexBuilder in Swift to write a method that extracts, for example, lists enclosed by <ul> and </ul> from an HTML string. In this example let htmlText = ""&...
3 votes
3 answers
136 views
Add ) at End of Lines Containing ( but Not )
I need help with a regular expression in Notepad++. I want to: Find lines that contain a ( character but do not contain a ) character. Add a ) at the end of those lines. I tried using this regex to ...
1 vote
1 answer
91 views
How do I remove escape characters from output of nltk.word_tokenize?
How do I get rid of non-printing (escaped) characters from the output of the nltk.word_tokenize method? I am working through the book 'Natural Language Processing with Python' and am following the ...
1 vote
1 answer
114 views
Extra newline before -----END CERTIFICATE----- when processing CSV in Python
I'm processing CSV files in Python to extract and format data from another file. However, when writing the output, I get an extra newline before -----END CERTIFICATE-----. I want the output to have ...
2 votes
2 answers
97 views
How can I sort a file on specific lines, preserving headers?
The raw text file contains these lines: cat raw.txt ID DESCRIPTION ----- -------------- 2 item2 4 item4 1 item1 3 item3 How can I reorder it by ID as ...
0 votes
1 answer
114 views
How to properly structure and clean extracted text from DOCX in Python?
I am working on a Flask-based web application that processes multilingual agenda documents. The documents are in DOC/DOCX format and contain structured agenda items that I need to extract and format ...
3 votes
1 answer
151 views
Parse text file, change some strings to camel case, add other strings - follow-up question
Note that this is a follow-up question to "Parse text file, change some strings to camel case, add other strings". The parsing rules are similar but different: The input order in the output is ...
1 vote
1 answer
96 views
Parse text file, change some strings to camel case, add other strings
The parsing rules are: Replace the string "public static final String" with the string "export const" if that string occurs only once. Replace the string "public static final ...
2 votes
1 answer
97 views
How to Dynamically Map Directory-Based Identifiers to a Specific Column in Annotated Output Files?
Problem: I have a two-step bioinformatics pipeline where: Code 1 generates output files (.marked.bam) and places them into a directory structure. Code 2 processes annotated files (annotated....
1 vote
1 answer
62 views
Calculating richness of text from NLTK book package
I am trying to return the richness of an NLTK text provided in the NLTK book, but for some reason I get None. Can someone please explain what I am doing wrong? from nltk.book import * def ...
-5 votes
1 answer
120 views
How to Separate Text and Code in Python Strings?
I've encountered an issue in Python. I have a string that contains both a message and code, and I need to separate them and pass each to different functions. An example: text = """ Can ...