python - Merging POS tags by noun phrase chunk


My question is similar to an earlier question. In spaCy, I can do part-of-speech tagging and noun phrase identification separately, e.g.

    import spacy

    nlp = spacy.load('en')  # on spaCy 3+, load 'en_core_web_sm' instead
    sentence = ('for instance , consider one simple phenomena : '
                'a question typically followed answer , '
                'or explicit statement of inability or refusal answer .')
    token = nlp(sentence)
    token_tag = [(word.text, word.pos_) for word in token]

The output looks like:

    [('for', 'ADP'),
     ('instance', 'NOUN'),
     (',', 'PUNCT'),
     ('consider', 'VERB'),
     ('one', 'NUM'),
     ('simple', 'ADJ'),
     ('phenomena', 'NOUN'),
     ...]

For noun phrases or chunks, I can use noun_chunks to get chunks of words as follows:

    [nc for nc in token.noun_chunks]  # [instance, one simple phenomena, answer, ...]

I'm wondering if there is a way to cluster the POS tags based on noun_chunks, so that I get output such as

    [('for', 'ADP'),
     ('instance', 'NOUN'),  # or noun_chunks
     (',', 'PUNCT'),
     ('one simple phenomena', 'noun_chunks'),
     ...]

I figured out how to do it. Basically, we can get the start and end position of each noun phrase, as token indices, as follows:

    noun_phrase_position = [(s.start, s.end) for s in token.noun_chunks]
    noun_phrase_text = dict([(s.start, s.text) for s in token.noun_chunks])
    token_pos = [(i, t.text, t.pos_) for i, t in enumerate(token)]
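The (start, end) pairs are token indices with an exclusive end, so they work directly as slice bounds. Here is a minimal sketch on a hand-written token list. The `tokens` and `spans` values are typed in by hand for illustration and are not produced by spaCy:

```python
# Hand-written stand-in for the first tokens of the sentence (not spaCy output).
tokens = ['for', 'instance', ',', 'consider', 'one', 'simple', 'phenomena', ':']

# Hypothetical (start, end) pairs, in the same form as (s.start, s.end).
spans = [(1, 2), (4, 7)]

# The end index is exclusive, so tokens[start:end] covers exactly one chunk.
phrases = {start: ' '.join(tokens[start:end]) for start, end in spans}
print(phrases)
```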

Then I combine this with a solution for merging the list of token_pos based on the start and stop positions:

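    result = []
    index = 0  # position of the first token not yet copied into result
    for start, end in noun_phrase_position:
        result += token_pos[index:start]     # tokens before the chunk, one by one
        result.append(token_pos[start:end])  # the chunk's tokens, as a nested list
        index = end
    result += token_pos[index:]              # tokens after the last chunk

    result_merge = []
    for i, r in enumerate(result):
        if len(r) > 0 and isinstance(r, list):
            # nested list: replace it with a single noun_phrase entry
            result_merge.append((r[0][0], noun_phrase_text.get(r[0][0]), 'noun_phrase'))
        else:
            result_merge.append(r)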

Output:

    [(0, 'for', 'ADP'),
     (1, 'instance', 'noun_phrase'),
     (2, ',', 'PUNCT'),
     (3, 'consider', 'VERB'),
     (4, 'one simple phenomena', 'noun_phrase'),
     (7, ':', 'PUNCT'),
     (8, 'a', 'DET'), ...
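The two passes above can probably be folded into a single function. The sketch below is one way to do that, not the only one. It assumes the spans are sorted and non-overlapping, which is how noun_chunks yields them. The helper name `merge_by_spans` and the sample data are hypothetical, and the sample data is hand-written so the block runs without spaCy installed:

```python
def merge_by_spans(token_pos, spans, label='noun_phrase'):
    """Collapse each (start, end) span of token_pos into one labelled tuple.

    token_pos: list of (index, text, pos) tuples, one per token.
    spans: sorted, non-overlapping (start, end) pairs, end exclusive.
    """
    merged = []
    index = 0  # first token not yet copied into merged
    for start, end in spans:
        merged.extend(token_pos[index:start])  # tokens before the span
        # Join the token texts with spaces (an approximation of Span.text).
        text = ' '.join(t[1] for t in token_pos[start:end])
        merged.append((start, text, label))
        index = end
    merged.extend(token_pos[index:])  # tokens after the last span
    return merged


# Hand-written sample data (not spaCy output).
token_pos = [(0, 'for', 'ADP'), (1, 'instance', 'NOUN'), (2, ',', 'PUNCT'),
             (3, 'consider', 'VERB'), (4, 'one', 'NUM'), (5, 'simple', 'ADJ'),
             (6, 'phenomena', 'NOUN'), (7, ':', 'PUNCT')]
spans = [(1, 2), (4, 7)]

result = merge_by_spans(token_pos, spans)
for row in result:
    print(row)
```

With real spaCy data, the same call would take the `token_pos` and `noun_phrase_position` lists built earlier. Joining with single spaces may differ slightly from `s.text` when the original whitespace is irregular.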
