python - Trying to print the doctype declaration

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import re

    html = urlopen("http://www.bbc.co.uk/iplayer/live/bbcone?area=london")
    bsobj = BeautifulSoup(html, "html.parser")
    version = bsobj.find(string=re.compile('doctype html'))

    if version in bsobj:
        print("yes")
    else:
        print("no")

I know the doctype declaration of "http://www.bbc.co.uk/iplayer/live/bbcone?area=london" is HTML 5 (<!DOCTYPE html>), but when I run the script the output is "no". What am I doing wrong?

The doctype is an instruction to the browser, not an HTML tag, so find and find_all won't find it when you search by tag.

On top of that, your regex won't work because the string value of the doctype in BeautifulSoup is "html" rather than "doctype html".
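To make those two points concrete, here is a minimal sketch that parses a small in-memory document instead of the BBC page. `SAMPLE_HTML` and the variable names are made up for illustration; the only assumption is that bs4 is installed and the built-in "html.parser" backend is used, as in the question.

```python
import re
from bs4 import BeautifulSoup, Doctype

# Hypothetical sample document, used here so the example needs no network access.
SAMPLE_HTML = (
    "<!DOCTYPE html>"
    "<html><head><title>t</title></head><body><p>hi</p></body></html>"
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The doctype is stored as a special string node at the top of the tree.
first_node = soup.contents[0]
is_doctype = isinstance(first_node, Doctype)

# Its text is just the part after "DOCTYPE ", so the value is "html".
doctype_text = str(first_node)

# Searching by tag name finds nothing, because the doctype is not a tag.
tag_lookup = soup.find("doctype")

# The regex from the question finds nothing either: no string in the
# parsed tree contains the text "doctype html".
regex_lookup = soup.find(string=re.compile("doctype html"))

print(is_doctype, repr(doctype_text), tag_lookup, regex_lookup)
```

Both lookups come back as None, which is why the script in the question ends up in the "no" branch.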

You can use the link that user kindall mentioned, or do it this way:

    import requests
    from bs4 import BeautifulSoup, Doctype

    html = requests.get("http://www.bbc.co.uk/iplayer/live/bbcone?area=london")
    soup = BeautifulSoup(html.content, "html.parser")

    # Find every string node whose text is exactly "html" ...
    version = soup.find_all(string="html")
    # ... and keep the first one that is a Doctype node.
    doctype = next(item for item in version if isinstance(item, Doctype))

    print(doctype)

which prints:

html
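If the page might have no doctype at all, the bare `next(...)` call raises StopIteration. A possible variation, sketched under the assumption that the doctype sits at the top level of the parsed tree, is shown below. `find_doctype` and `format_doctype` are hypothetical helper names, not part of bs4, and the two sample documents are made up so the example runs offline.

```python
from bs4 import BeautifulSoup, Doctype


def find_doctype(soup):
    """Return the first top-level Doctype node, or None if there is none."""
    return next(
        (node for node in soup.contents if isinstance(node, Doctype)),
        None,
    )


def format_doctype(doctype):
    """Rebuild the declaration text from the Doctype node's string value."""
    return f"<!DOCTYPE {doctype}>"


# Hypothetical sample documents, one with and one without a doctype.
with_doctype = BeautifulSoup(
    "<!DOCTYPE html><html><body></body></html>", "html.parser"
)
without_doctype = BeautifulSoup("<html><body></body></html>", "html.parser")

found = find_doctype(with_doctype)
missing = find_doctype(without_doctype)

declaration = format_doctype(found) if found is not None else None
print(declaration)
print(missing)
```

Passing a default to `next` trades the exception for a None check, which is usually easier to handle when scanning many pages.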

