2

I have a column with a bunch of ingredients lists in it. I'm trying to figure out how many times different individual ingredients appear. There are 73,000 rows. The answers on this question works for a small amount of data in Google Sheets.

Formula is =UNIQUE(TRANSPOSE(SPLIT(JOIN(", ";A2:A);", ";FALSE)))

But I've overwhelmed JOIN with more than 50000 characters here. Is there another way to tackle this?

Sheet: https://docs.google.com/spreadsheets/d/1t0P9hMmVpwhI2IbATmIMjobuALTg8VWhl8-AQaq3zIo/edit?usp=sharing

enter image description here

0

5 Answers 5

2
=ARRAYFORMULA(UNIQUE(TRIM(TRANSPOSE(SPLIT(TRANSPOSE( QUERY(","&A1:A,,5000000)),","))))) 

0


=QUERY(QUERY(ARRAYFORMULA(TRIM(TRANSPOSE(SPLIT(TRANSPOSE( QUERY(","&A1:A,,5000000)),",")))), "select Col1,count(Col1) where Col1 is not null group by Col1 label count(Col1)''"), "order by Col2 desc") 

0

demo spreadsheet

Sign up to request clarification or add additional context in comments.

4 Comments

That seems to have done it. Not totally sure. Possible to get counts as well? Also, thanks!
sure, it's possible... just load time could be tricky (even crash) if you got weak PC. answer updated
dang...you should offer an online sheets course...amazing work.
if this helped you then consider accepting the answer
1
=UNIQUE(TRANSPOSE(SPLIT(REGEXREPLACE(TRANSPOSE( QUERY(ARRAYFORMULA(","&A1:A),,5000000))," ,",","),","))) 

0

1 Comment

"Text result of REGEXREPLACE is longer than the limit of 50000 characters." as well
1

but maybe you need this (?):

=QUERY(TRANSPOSE(SPLIT(REGEXREPLACE(TRANSPOSE( QUERY(ARRAYFORMULA(","&A1:A),,5000000))," ,",","),",")), "select Col1,count(Col1) where Col1 is not null group by Col1 label count(Col1)''") 

0

2 Comments

"Text result of REGEXREPLACE is longer than the limit of 50000 characters." Could be time for me to learn Python/pandas
nah, there is always a way with formulas
1

I did a google scripting solution because I wanted to play with key map pairs.

function myFunction() { var myMap = {"candy":0}; var sh = SpreadsheetApp.getActiveSpreadsheet(); var ss = sh.getSheetByName("FIRSTSHEETNAME"); var os = sh.getSheetByName("Ingredients"); var data = ss.getDataRange().getValues(); for (var i=0; i<data.length;i++)//full //for (var i=1; i<4000;i++)//test { var array = data[i][0].split( ","); for (var j=0; j<array.length;j++) { var item = array[j]; //Logger.log(array[j]); if (myMap[item]>-1){ //Logger.log("REPEAT INGREDIENT"); var num = parseInt(myMap[item]); num++; myMap[item]=num; //Logger.log(item +" "+num); } else { myMap[item]=1; //Logger.log("New Ingredient: "+item); //Logger.log(myMap); } } } //Logger.log(myMap); var output=[]; for (var key in myMap){ //Logger.log("Ack"); output.push([key,myMap[key]]); } //Logger.log(output); os.getRange(2,1,output.length,output[0].length).setValues(output); } 

You'll need to add an "Ingredients" tab for the output and change your first tab to be called FIRSTSHEETNAME (or change the code). In my testing it took 4 seconds for 4 items, 5 seconds for 400 items, and 6 seconds for 4000 items. there might be an issue with leading spaces but this gives you a place to start.

3 Comments

I just counted them... there are 1,965,918 items which could take more than 30 seconds (?) in which case script will fail before it's completed
Did you try it? The script limit is 5 minutes now for non corporate accounts, 30 minutes for corporate accounts. and by items I meant rows.
I didn't, did you? 5/30min... interesting. I heard 30sec... good to know then. as for the rows, there are: 98,268
1

A fast running formula that works with columns of at least 40,000 rows:

 =query(arrayformula(TRIM(flatten(split(A2:A20000,",")))),"select Col1,Count(Col1) Where NOT (Col1='' OR Col1 contains '#VALUE!') Group By Col1 order by Count(Col1) desc label Col1 'Ingredient',Count(Col1) 'Freq.'") 

FLATTEN function, combined with SQL (QUERY function) can be a solution for fast filtering of values (such as empty or error messages).

TRIM function avoids artifacts in the result due to meaningless spaces before/after each string.

Sheet: https://docs.google.com/spreadsheets/d/1m9EvhQB1Leg2H7L52WhPe66_jRrTc8VsnZcliQsxJ7s/edit?usp=sharing

*In case of false case differences, you could normalize all characters of the strings to uppercase before within the same formula with UPPER(A2:A20000).

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.