python - Scraping Yahoo/Oath New Earnings Calendar Format


I can't figure out why the modified Python script below isn't working with the new earnings calendar format. It does not appear to be matching the href, which may be different from the old format (dynamic JavaScript?).

import datetime
import requests
import bs4
import csv

def get_earning_data(date, date2):
    url = "http://finance.yahoo.com/calendar/earnings?&day={}".format(date)
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"}
    html = requests.get(url, headers=headers).text
    soup = bs4.BeautifulSoup(html, "html.parser")
    quotes = []
    for tr in soup.find_all("tr"):
        if len(tr.contents) > 3:
            if len(tr.contents[1].contents) > 0:
                if tr.contents[1].contents[0].name == "a":
                    if tr.contents[1].contents[0]["href"].startswith("/quote/"):
                        # Skip symbols that contain a "."
                        if "." not in tr.contents[1].contents[0].text:
                            quotes.append(tr.contents[1].contents[0].text)
                            quotes.append(date2)
    return quotes

outfile = "earningscalendar.csv"
# Truncate the output file before appending to it
open(outfile, 'wb').close()
index = 0
while index < 7:
    date = (datetime.date.today() + datetime.timedelta(index)).strftime("%Y-%m-%d")
    date2 = (datetime.date.today() + datetime.timedelta(index)).strftime("%d/%m/%y")
    mylist = get_earning_data(date, date2)
    print (mylist)
    with open(outfile, 'ab') as csvfile:
        writer = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_NONE)
        # mylist alternates symbol, date; write one pair per row
        for i in range(0, len(mylist), 2):
            writer.writerow(mylist[i:i+2])
    index += 1

Here's a sample of the page source for a row from 04-05-2017:

<tr class="data-rowkmx9 bgc($extralightblue):h h(36px) bgc($altrowcolor)" data-reactid="490"><td class="data-col0 ta(start) pend(15px) pstart(6px) w(10%)" data-reactid="491"><a href="/quote/kmx?p=kmx" title="carmax inc" data-symbol="kmx" class="fw(b)" data-reactid="492">kmx</a></td><td class="data-col1 ta(start) pend(10px) w(20%)" data-reactid="493">carmax inc</td><td class="data-col2 ta(end) pstart(15px) w(10%)" data-reactid="494">0.79</td><td class="data-col3 ta(end) pstart(15px) w(10%)" data-reactid="495">-</td><td class="data-col4 ta(end) pstart(15px) w(10%)" data-reactid="496"><span class="" data-reactid="497">-</span></td><td class="data-col5 ta(end) pend(6px) pstart(15px) w(13%)" data-reactid="498"><span data-reactid="499">before market open</span></td></tr>

Here's a sample page showing the old format: http://web.archive.org/web/20170301070135/https://biz.yahoo.com/research/earncal/today.html

The difference between the old and new formats can be seen in the .startswith check. Using "http://finance.yahoo.com/quote/" as the prefix does not work either.
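To narrow this down, here is a minimal sketch that checks the href prefix against the sample row on its own, using only the standard library's html.parser rather than bs4. The QuoteLinkParser class and the trimmed SAMPLE_ROW string are made up for illustration (the class and data-reactid attributes are dropped), and it says nothing about whether the live page is rendered by JavaScript. It collects the text of every anchor whose href starts with "/quote/", wherever that anchor sits in the row.

```python
from html.parser import HTMLParser

# Trimmed copy of the sample row (class and data-reactid attributes dropped).
SAMPLE_ROW = (
    '<tr><td><a href="/quote/kmx?p=kmx" title="carmax inc" data-symbol="kmx">kmx</a></td>'
    '<td>carmax inc</td><td>0.79</td><td>-</td>'
    '<td><span>-</span></td><td><span>before market open</span></td></tr>'
)


class QuoteLinkParser(HTMLParser):
    """Hypothetical helper: collects the text of <a> tags whose href starts with /quote/."""

    def __init__(self):
        super().__init__()
        self.symbols = []
        self._in_quote_link = False

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a" and href.startswith("/quote/"):
            self._in_quote_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_quote_link = False

    def handle_data(self, data):
        text = data.strip()
        # Skip symbols containing ".", mirroring the check in the script.
        if self._in_quote_link and text and "." not in text:
            self.symbols.append(text)


parser = QuoteLinkParser()
parser.feed(SAMPLE_ROW)
print(parser.symbols)  # ['kmx']
```

On the sample row the relative "/quote/" prefix does match, and the anchor sits in the first cell of the row. That suggests the positional checks on tr.contents[1] may be worth a look, at least for markup shaped like this sample.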

