0

Alright, so basically I have a Google script that searches for a keyword. The results look like:

    http://www.example.com/user/1234
    http://www.youtube.com/user/125
    http://www.forum.com/user/12

How could I organize these results like this?

    Forums:
    http://www.forum.com/user/12

    YouTubes:
    http://www.youtube.com/user/125

    Unidentified:
    http://www.example.com/user/1234

By the way, I'm organizing them with keywords. If the URL has "forum" in it, it goes to the forum list; if it has "youtube" in it, it goes to the YouTube list; and if no keyword matches, it goes to unidentified.

5
  • I don't understand the question. Are both input and output strings? What are your rules of organizing things? By domain? Why is example.com unidentified? And finally: what have you tried? Commented Feb 10, 2014 at 14:23
  • I'm organizing them with keywords. If the url has "forum" in it then it goes to the forum list, if it has youtube it goes to the youtube list, but if no keywords match up then it goes to unidentified. Commented Feb 10, 2014 at 14:26
  • Did you try to solve this by yourself in any fashion? Commented Feb 10, 2014 at 14:27
  • Yes, but I was using bash to run the Python script, then trying to organize the results with grep, sed, etc. All tries have failed, lol. I have no idea how I would go about doing this solely in Python. Commented Feb 10, 2014 at 14:28
  • What happens when a URL contains both "forum" and "youtube"? Commented Feb 10, 2014 at 14:31

5 Answers

2

1. Create a dict and assign an empty list to each keyword you have, e.g. my_dict = {'forums': [], 'youtube': [], 'unidentified': []}

2. Iterate over your URLs.

3. Generate a key for each URL (the domain name, in your case). You can extract the key using the re regex module.

4. Check the dictionary from step 1 for this key. If it exists, append the URL to the list stored under that key; if it does not, append the URL to the 'unidentified' list.
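The four steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the names `KEYWORDS`, `HOST_RE` and `group_by_keyword` are made up for the example, the keyword is looked up in the host name only, and when several keywords match, the first one in `KEYWORDS` wins.

```python
import re

# Hypothetical keyword list; "unidentified" is the fallback bucket.
KEYWORDS = ("forum", "youtube")

# Captures the host part of an http(s) URL, e.g. "www.forum.com".
HOST_RE = re.compile(r"^https?://([^/:]+)", re.IGNORECASE)


def group_by_keyword(urls):
    # Step 1: one empty list per keyword, plus the fallback.
    groups = {keyword: [] for keyword in KEYWORDS}
    groups["unidentified"] = []

    # Step 2: iterate over the URLs.
    for url in urls:
        # Step 3: derive the key from the host name.
        match = HOST_RE.match(url)
        host = match.group(1).lower() if match else ""
        key = next((k for k in KEYWORDS if k in host), "unidentified")
        # Step 4: append to the matching list (or to the fallback).
        groups[key].append(url)
    return groups


urls = [
    "http://www.example.com/user/1234",
    "http://www.youtube.com/user/125",
    "http://www.forum.com/user/12",
]
groups = group_by_keyword(urls)
print(groups)
```

Matching against the host rather than the whole URL means a path such as /useryoutube/12 does not affect the result; whether that is what the asker wants is an assumption.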


1 Comment

I don't think he always wants the domain name to be the key. For instance, the key of example.com is "Unidentified".
1

Something like this? I guess you will be able to adapt this example to your needs.

    import pprint
    import re

    urls = ['http://www.example.com/user/1234',
            'http://www.youtube.com/user/126',
            'http://www.youtube.com/user/125',
            'http://www.forum.com/useryoutube/12']

    # Raw string so the backslashes reach the regex engine unchanged.
    pattern = re.compile(r'//www\.(\w+)\.')
    keys = ['forum', 'youtube']
    results = dict()

    for u in urls:
        ms = pattern.search(u)
        if ms is None:  # no "//www.<name>." part in this URL, skip it
            continue
        key = ms.group(1)
        if key in keys:
            results.setdefault(key, []).append(u)

    pprint.pprint(results)

3 Comments

It would be better not to hardcode the keys. They should be generated dynamically, since he doesn't know all the domain names or keys in advance.
Ah, I see, thanks. I edited my post. I gave the OP the possibility to select the keys he's interested in.
And now with more solid pattern matching, so that the last URL is classified as forum.
1
    import urlparse  # Python 2; in Python 3 this module is urllib.parse

    urls = """
    http://www.example.com/user/1234
    http://www.youtube.com/user/125
    http://www.forum.com/user/12
    """.split()

    categories = {
        "youtube.com": [],
        "forum.com": [],
        "unknown": [],
    }

    for url in urls:
        netloc = urlparse.urlparse(url).netloc
        if netloc.count(".") == 2:
            # chop sub-domain
            netloc = netloc.split(".", 1)[1]

        if netloc in categories:
            categories[netloc].append(url)
        else:
            categories["unknown"].append(url)

    print categories

Parse the URLs, find the category for each one, and append the full URL to that category's list.
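The same parse, find, append idea might look like this on Python 3, where the module is `urllib.parse`. This is a sketch, not a drop-in replacement: the function name `categorize` and its `domains` parameter are invented for the example, and it matches a domain or any of its sub-domains instead of counting dots.

```python
from urllib.parse import urlparse


def categorize(urls, domains=("youtube.com", "forum.com")):
    # One list per known domain, plus "unknown" for everything else.
    categories = {domain: [] for domain in domains}
    categories["unknown"] = []

    for url in urls:
        # hostname is lower-cased and has no port; it is None for bad input.
        host = urlparse(url).hostname or ""
        # Accept the domain itself or any sub-domain of it (www., m., ...).
        domain = next(
            (d for d in domains if host == d or host.endswith("." + d)),
            "unknown",
        )
        categories[domain].append(url)
    return categories


result = categorize("""
http://www.example.com/user/1234
http://www.youtube.com/user/125
http://www.forum.com/user/12
""".split())
print(result)
```

Comparing with `endswith("." + d)` avoids the assumption that every host has exactly one sub-domain level.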


1

You should probably keep your sorted results in a dictionary and the unsorted ones in a list. You could then sort it like so:

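    categorized_results = {"forum": [], "youtube": []}
    uncategorized_results = []

    # results is assumed to be the list of URL strings from the search.
    for url in results:
        parts = url.split(".")  # e.g. ['http://www', 'forum', 'com/user/12']
        matched = False
        for keyword in categorized_results:
            if keyword in parts:
                categorized_results[keyword].append(url)
                matched = True
        # Only URLs that matched no keyword at all are uncategorized.
        if not matched:
            uncategorized_results.append(url)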

If you'd like to output it neatly:

    category_aliases = {"forum": "Forums:", "youtube": "Youtubes:"}

    for i in categorized_results:
        print(category_aliases[i])
        for j in categorized_results[i]:
            print(j)
        print("\n")

    print("Unidentified:")
    print("\n".join(uncategorized_results))  # Let's not put in another for loop.
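If the text is needed as a value rather than printed directly, the output step could be wrapped in a small function. This is a sketch only: `format_report` and its parameters are hypothetical names, and the sample data is taken from the question.

```python
def format_report(categorized, uncategorized, aliases):
    lines = []
    for keyword, urls in categorized.items():
        # Fall back to "<keyword>:" when no alias is defined.
        lines.append(aliases.get(keyword, keyword + ":"))
        lines.extend(urls)
        lines.append("")  # blank line between sections
    lines.append("Unidentified:")
    lines.extend(uncategorized)
    return "\n".join(lines)


report = format_report(
    {"forum": ["http://www.forum.com/user/12"],
     "youtube": ["http://www.youtube.com/user/125"]},
    ["http://www.example.com/user/1234"],
    {"forum": "Forums:", "youtube": "YouTubes:"},
)
print(report)
```

Building a string first makes it easy to write the report to a file or compare it in a test.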


0

How about this:

    from functools import reduce   # built in on Python 2, required on Python 3
    from urlparse import urlparse  # Python 2; on Python 3 use urllib.parse


    class Organizing_Results(object):
        # Keyword -> parsed URLs; 'example' doubles as the fallback bucket.
        CATEGORY = {'example': [], 'youtube': [], 'forum': []}

        def __init__(self):
            self.url_list = []

        def add_single_url(self, url):
            self.url_list.append(urlparse(url))

        def _reduce_result_list(self, acc, element):
            for c in self.CATEGORY:
                if c in element[1]:  # element[1] is the netloc
                    return self.CATEGORY[c].append(element)
            return self.CATEGORY['example'].append(element)

        def get_result(self):
            # Fold every parsed URL into its category list.
            reduce(lambda x, y: self._reduce_result_list(x, y), self.url_list, [])
            return self.CATEGORY


    c = Organizing_Results()
    c.add_single_url('http://www.example.com/user/1234')
    c.add_single_url('http://www.youtube.com/user/1234')
    c.add_single_url('http://www.unidentified.com/user/1234')
    c.get_result()

You can easily extend the class with more methods as you need them.

