python - Find a value by substring in a nested dictionary


Just to give the problem some context: I'm writing a Django webapp that includes several applications. One of them is used to display articles from RSS feeds. For now, I'm displaying the link, the source and the description. I want to add thumbnails to these articles. I'm trying to grab these thumbnails from any RSS or Atom feed. Parts of these feeds (e.g. images) are constructed in totally arbitrary ways. Since I don't want to write a specific script for every feed on the web, my idea is to look for ".jpg" or ".png" substrings in every article I fetch and get that URL. Getting the articles from RSS or Atom feeds is handled by the Python feedparser module, which outputs this, for example:

{'guidislink': False,
 'href': '',
 'id': 'http://www.bbc.co.uk/sport/football/39426760',
 'link': 'http://www.bbc.co.uk/sport/football/39426760',
 'links': [{'href': 'http://www.bbc.co.uk/sport/football/39426760',
            'rel': 'alternate',
            'type': 'text/html'}],
 'media_thumbnail': [{'height': '576',
                      'url': 'http://c.files.bbci.co.uk/44a9/production/_95477571_joshking2.jpg',
                      'width': '1024'}],
 'published': 'wed, 05 apr 2017 21:49:14 gmt',
 'published_parsed': time.struct_time(tm_year=2017, tm_mon=4, tm_mday=5, tm_hour=21, tm_min=49, tm_sec=14, tm_wday=2, tm_yday=95, tm_isdst=0),
 'summary': 'joshua king scores dramatic late equaliser bournemouth '
            'liverpool drop 2 crucial points @ anfield.',
 'summary_detail': {'base': 'http://feeds.bbci.co.uk/news/rss.xml',
                    'language': None,
                    'type': 'text/html',
                    'value': 'joshua king scores dramatic late equaliser '
                             'for bournemouth liverpool drop 2 crucial '
                             'points @ anfield.'},
 'title': 'liverpool 2-2 bournemouth',
 'title_detail': {'base': 'http://feeds.bbci.co.uk/news/rss.xml',
                  'language': None,
                  'type': 'text/plain',
                  'value': 'liverpool 2-2 bournemouth'}}

Here, http://c.files.bbci.co.uk/44a9/production/_95477571_joshking2.jpg is nested somewhere in lists and dictionaries. While I know how to access it in this specific case, the structures of feeds vary a lot. Mainly:

  • the dictionary key holding the URL is not always the same
  • the depth at which the URL is nested is not always the same

However, it is always the case that a URL with an image extension is the thumbnail of the article. How do I get that URL?

To frame it out a little more: for now, I use helper functions (based on the feedparser module) that process the feeds into a context variable, a dictionary, usable in templates. The looping and the displaying of title, description etc. are done directly in the templates, since those are consistently part of the dictionary from feedparser:

...
{% for feed in feeds %}
  <h3>{{ feed.feed.title }}</h3>
  {% for entry in feed.entries %}
...

On the backend:

import feedparser
from django.views import generic


def parse_feeds(urls):
    # Parse every feed URL with feedparser and collect the results.
    parsed_feeds = []
    for url in urls:
        parsed_feed = feedparser.parse(url)
        parsed_feeds.append(parsed_feed)
    return parsed_feeds


class IndexView(generic.ListView):
    template_name = 'publisher/index.html'

    def get_context_data(self, **kwargs):
        context = super(IndexView, self).get_context_data(**kwargs)
        reacted_feeds = RssArticle.objects.all()
        context['reacted_feeds'] = reacted_feeds
        parsed_feeds = parse_feeds(urls)
        delete_existing_entries(parsed_feeds)
        context['feeds'] = parsed_feeds
        return context

So every time I call the index view, I get the list of all articles from the feeds I'm subscribed to. That's where I want to include the image, which is not provided by feedparser due to the inconsistent nature of its location in feeds.

If I want to include these pictures, at a macro level I have two solutions:

  • writing something in addition to the existing system, which might hurt performance because of too many things having to happen at the same time
  • rewriting the whole thing, which might hurt performance and consistency because I don't take advantage of feedparser's power anymore

Maybe I should keep the raw XML and try my luck with BeautifulSoup instead of translating it to a dictionary with feedparser.

PS: Here is another example, where the image is located somewhere else.

{'guidislink': False,
 'id': 'http://www.lemonde.fr/tiny/5106451/',
 'link': 'http://www.lemonde.fr/les-decodeurs/article/2017/04/05/presidentielle-les-grands-clivages-qui-divisent-les-onze-candidats_5106451_4355770.html?xtor=rss-3208',
 'links': [{'href': 'http://www.lemonde.fr/les-decodeurs/article/2017/04/05/presidentielle-les-grands-clivages-qui-divisent-les-onze-candidats_5106451_4355770.html?xtor=rss-3208',
            'rel': 'alternate',
            'type': 'text/html'},
           {'href': 'http://s1.lemde.fr/image/2017/04/05/644x322/5106578_3_0f2b_sur-le-plateau-du-debat-de-bfmtv-et-cnews_0e90a3db44861847870cfa1e4c3793b1.jpg',
            'length': '40057',
            'rel': 'enclosure',
            'type': 'image/jpeg'}],
 'published': 'wed, 05 apr 2017 17:02:38 +0200',
 'published_parsed': time.struct_time(tm_year=2017, tm_mon=4, tm_mday=5, tm_hour=15, tm_min=2, tm_sec=38, tm_wday=2, tm_yday=95, tm_isdst=0),
 'summary': 'protection sociale, europe, identité… avec leurs programmes, les '
            'proximités idéologiques entre candidats bousculent de plus en '
            'plus le traditionnel axe «\xa0gauche-droite\xa0».',
 'summary_detail': {'base': 'http://www.lemonde.fr/rss/une.xml',
                    'language': None,
                    'type': 'text/html',
                    'value': 'protection sociale, europe, identité… avec leurs '
                             'programmes, les proximités idéologiques entre '
                             'candidats bousculent de plus en plus le '
                             'traditionnel axe «\xa0gauche-droite\xa0».'},
 'title': 'présidentielle\xa0: les grands clivages qui divisent les onze '
          'candidats',
 'title_detail': {'base': 'http://www.lemonde.fr/rss/une.xml',
                  'language': None,
                  'type': 'text/plain',
                  'value': 'présidentielle\xa0: les grands clivages qui '
                           'divisent les onze candidats'}}

If you need only the thumbnail, I think the easy way is to ignore everything else and search every value string for the desired tail. There are plenty of links on how to traverse the structure, should you care about that, but I'd just turn it into a string and parse that.

Your trigger is a colon followed by white-space and a quotation mark. Grab what's between the quotation marks. Call it value.

extensions = [".jpg", ".png"]
...
if value[-4:] in extensions:
    # you've found the desired URL
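To make the string-parsing idea concrete, here is a minimal sketch. It assumes the entry behaves like a plain dict when passed to `str()`, so values show up as `: 'value'`. The names `QUOTED_VALUE`, `EXTENSIONS` and `find_image_urls_in_text` are made up for illustration, and the sample entry uses placeholder example.com URLs.

```python
import re

# Assumed list of image extensions; extend it to whatever you want to accept.
EXTENSIONS = (".jpg", ".png")

# The trigger: a colon, white-space, an opening quote, then everything
# up to the matching closing quote.
QUOTED_VALUE = re.compile(r""":\s+(['"])(.*?)\1""")


def find_image_urls_in_text(entry):
    """Return quoted values in str(entry) that end with an image extension."""
    urls = []
    for _quote, value in QUOTED_VALUE.findall(str(entry)):
        # Case-insensitive tail check; skip values already collected.
        if value.lower().endswith(EXTENSIONS) and value not in urls:
            urls.append(value)
    return urls


# Hypothetical entry shaped like the feedparser output in the question.
sample_entry = {
    'links': [{'href': 'http://example.com/a.html', 'type': 'text/html'}],
    'media_thumbnail': [{'url': 'http://example.com/img/pic.jpg',
                         'width': '1024'}],
}
print(find_image_urls_in_text(sample_entry))
```

This is deliberately crude: an image URL with a query string after the extension would be missed, and values that themselves contain quotes may be cut short.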

Does that get you moving?
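If you would rather traverse the structure than parse its string form, a recursive walk is one possible alternative. The sketch below assumes the entry is made of nested dicts, lists, tuples and strings (feedparser's entries are dict subclasses, so they should fit). `IMAGE_EXTENSIONS`, `iter_image_urls` and `first_thumbnail` are illustrative names, not part of feedparser.

```python
# Assumed list of image extensions; adjust as needed.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def iter_image_urls(node):
    """Yield every string ending in an image extension, at any depth."""
    if isinstance(node, str):
        if node.lower().endswith(IMAGE_EXTENSIONS):
            yield node
    elif isinstance(node, dict):
        # The key name does not matter, only the values are inspected.
        for value in node.values():
            yield from iter_image_urls(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from iter_image_urls(item)
    # Anything else (numbers, None, ...) is ignored.


def first_thumbnail(entry):
    """Return the first image URL found in the entry, or None."""
    return next(iter_image_urls(entry), None)


# Hypothetical entry with the image stored as an enclosure link.
sample_entry = {
    'links': [{'href': 'http://example.com/a.html', 'type': 'text/html'},
              {'href': 'http://example.com/img/photo.JPG',
               'rel': 'enclosure'}],
}
print(first_thumbnail(sample_entry))
```

Because it ignores key names and nesting depth, the same function covers both the `media_thumbnail` layout and the enclosure-in-`links` layout shown in the question.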

