```python
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

html = urlopen("http://www.bbc.co.uk/iplayer/live/bbcone?area=london")
bsObj = BeautifulSoup(html, "html.parser")

# Try to locate the doctype by matching its text with a regex
version = bsObj.find(string=re.compile('DOCTYPE html'))
if version in bsObj:
    print("yes")
else:
    print("no")
```
I know the doctype declaration of "http://www.bbc.co.uk/iplayer/live/bbcone?area=london" is HTML5 (`<!DOCTYPE html>`), but when I run the script the output is "no". What am I doing wrong?
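The doctype is an instruction to the browser, not an HTML tag, so `find` and `find_all` won't find it by tag name. Your regex won't match either, because BeautifulSoup stores the doctype's string value as `html` rather than `DOCTYPE html`. As a result `find` returns `None`, the `in` check is false, and the script prints "no".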
You can use the link user kindall mentioned, or use this approach:
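```python
import requests
from bs4 import BeautifulSoup, Doctype

html = requests.get("http://www.bbc.co.uk/iplayer/live/bbcone?area=london")
soup = BeautifulSoup(html.content, "html.parser")

# A string search also matches the Doctype object, whose text is "html"
version = soup.find_all(string="html")
# Keep only the match that is actually a Doctype (None if there is none)
doctype = next((item for item in version if isinstance(item, Doctype)), None)
print(doctype)
```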
Which prints:
html
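If you only need a yes/no check for HTML5 and want to avoid bs4, here is a minimal sketch using just the standard library's `html.parser`. It also shows why the doctype isn't a tag: the parser reports `<!DOCTYPE html>` through `handle_decl`, never through `handle_starttag`. `DoctypeFinder` and `is_html5` are hypothetical names for illustration, and the check assumes the HTML5 doctype is the bare `<!DOCTYPE html>` compared case-insensitively.

```python
from html.parser import HTMLParser
from typing import Optional


class DoctypeFinder(HTMLParser):
    """Hypothetical helper: records the first <!DOCTYPE ...> declaration."""

    def __init__(self) -> None:
        super().__init__()
        self.doctype: Optional[str] = None

    def handle_decl(self, decl: str) -> None:
        # <!DOCTYPE html> arrives here as the text between "<!" and ">",
        # i.e. "DOCTYPE html"; it is a declaration, not a start tag.
        if self.doctype is None:
            self.doctype = decl


def is_html5(markup: str) -> bool:
    """Assumption: HTML5 means exactly <!DOCTYPE html>, case-insensitive."""
    finder = DoctypeFinder()
    finder.feed(markup)
    finder.close()
    return finder.doctype is not None and finder.doctype.strip().lower() == "doctype html"


print(is_html5("<!DOCTYPE html><html><body></body></html>"))  # True
print(is_html5('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"><html></html>'))  # False
print(is_html5("<html><body></body></html>"))  # False
```

To check a live page, pass it the decoded response text, for example `is_html5(requests.get(url).text)`.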