Questions tagged [pandas]
A Python library widely used in data science, designed around database-style operations and calculations
28 questions
1 vote
0 answers
63 views
Trying to plot the results of GSEA in python
I'm having major trouble with the generated output table and with passing it to my plot function. I'm new to this (first time); could somebody review this piece of code and suggest some corrections? I ...
1 vote
2 answers
1k views
ValueError: could not convert string to float
I'm trying to use a Python script to plot with a rolling window using pandas and seaborn. This code worked for the longest time but now it's giving me an error that I don't know how to fix. Here is ...
1 vote
2 answers
62 views
Getting rid of duplicates in a dictionary
I have a tsv file that lists the reads and read lengths from a FASTA file but some reads are duplicated - that's just from the analysis I did previously - but I want to only take one instance of the ...
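One way to keep a single instance of each read is sketched below with pandas. The column names `read_id` and `length` and the inline sample data are assumptions for illustration; the actual TSV layout is not shown in the excerpt.

```python
import io
import pandas as pd

# Hypothetical two-column TSV: read name and read length (column names are assumed).
tsv = io.StringIO(
    "read_id\tlength\n"
    "read1\t150\n"
    "read2\t200\n"
    "read1\t150\n"
    "read3\t175\n"
)

reads = pd.read_csv(tsv, sep="\t")

# Keep the first occurrence of each read name and drop later repeats.
unique_reads = reads.drop_duplicates(subset="read_id", keep="first")

# Build a plain dict keyed by read name, in case downstream code expects a dictionary.
read_lengths = dict(zip(unique_reads["read_id"], unique_reads["length"]))
```

If duplicated reads could carry different lengths, `keep="first"` silently picks one of them, so it may be worth checking for that case before deduplicating.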
0 votes
1 answer
574 views
Refactoring pandas using an iterator via chunksize
This question was also asked on Stack Overflow. Bioinformatics rationale: eggNOG files can be very big and consume all available RAM on regular to medium-sized desktops. I am looking for advice on using ...
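A minimal sketch of chunked reading with `pandas.read_csv(..., chunksize=...)` follows. The helper `count_values_in_chunks`, the column names `query` and `category`, and the sample table are hypothetical and do not reflect the real eggNOG file layout.

```python
import io
from collections import Counter

import pandas as pd


def count_values_in_chunks(handle, column, chunksize=2):
    """Tally the values of `column` while holding only one chunk in memory at a time."""
    totals = Counter()
    # With chunksize set, read_csv returns an iterator of DataFrames.
    for chunk in pd.read_csv(handle, sep="\t", chunksize=chunksize):
        totals.update(chunk[column].value_counts().to_dict())
    return dict(totals)


# Hypothetical annotation table; a real file would be opened from disk instead.
table = io.StringIO(
    "query\tcategory\n"
    "g1\tK\n"
    "g2\tL\n"
    "g3\tK\n"
    "g4\tS\n"
    "g5\tK\n"
)

category_counts = count_values_in_chunks(table, "category", chunksize=2)
```

The pattern only helps when each chunk can be reduced to a small summary (counts, sums, filtered rows); concatenating every chunk back into one DataFrame would use as much memory as reading the file in one go.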
1 vote
1 answer
71 views
Tidying MEGAN Taxonomy for Python/R Analysis
I am analysing some WGS data in MEGAN and would like to do some additional analysis in Python/R. I am having trouble tidying the taxonomic data into a format that would be conducive to this. Originally ...
1 vote
1 answer
50 views
How to get enriched pathways in the data using a continuous statistic measure?
I was doing pathway enrichment analysis using the below code ...
1 vote
1 answer
53 views
How to calculate frequency of a category with respect to the value in another column?
I was trying to calculate the frequency of disease_present (yes) when smoking status is y (yes) for each group (A, B, C, D) ...
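The calculation described above could be sketched as a filter followed by a group-wise mean. The column names `group`, `smoking` and `disease_present` and the sample rows are assumptions, since the original data frame is not shown.

```python
import pandas as pd

# Hypothetical data; the real column names and values may differ.
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "C"],
    "smoking": ["y", "y", "n", "y", "y", "n"],
    "disease_present": ["yes", "no", "yes", "yes", "yes", "no"],
})

# Restrict to smokers, then take the share of "yes" within each group.
smokers = df[df["smoking"] == "y"]
freq = smokers["disease_present"].eq("yes").groupby(smokers["group"]).mean()
```

Groups with no smokers (group C in this sample) do not appear in `freq`; reindexing against the full list of groups would make them show up as missing values.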
3 votes
3 answers
2k views
How to run a Python request for a list of URLs with multiple page numbers?
Hi, I am trying to get the cancer ontologies (obo_id and label) from EBI-OLS. Earlier I used the code below to get the obo_id terms and ...
4 votes
1 answer
279 views
calculating mutation frequencies for every gene
I have a mutation dataset and I want to calculate mutation frequencies across all genes. df (this is only a small subset of the data) ...
2 votes
1 answer
249 views
Math on Pandas Columns
I have a pandas dataframe that reads in a PAF file from minimap2. What I would like to do is take the first 5 columns of the data to create a BED file. I used this to extract the first 5 columns: ...
2 votes
1 answer
69 views
2 votes
1 answer
104 views
Python list comprehension to calculate the set of categories for each row
My data = data ...
3 votes
2 answers
103 views
expansions and contractions from OrthoFinder
The input file looks like this, and the complete file can be found here: ...
2 votes
1 answer
830 views
How to get separate histogram plots based on a column value? How to detect which plot has the most deviation?
I have a data frame (df) which has correlations calculated for different genes with respect to different ID combinations. I want to get a separate histogram plot based on the gene name (separate plot ...