How do I grab all the links within an element in HTML using Python?

- February 15, 2011

First, please check the image below, which can better explain the question:

[image: the opened HTML of the page]

I am trying to take user input and select one of the links below "Course Search Term" (e.g. Winter 2015).

The HTML I opened shows part of the code for the webpage. I would like to grab all the href links in the element, which consists of the 5 term links that I want. I am following the instructions from this website (www.gregreda.com/2013/03/03/web-scraping-101-with-python/), but it doesn't explain this part. Here is the code I have been trying.

    from bs4 import BeautifulSoup
    from urllib2 import urlopen

    base_url = "http://classes.uoregon.edu/"

    def get_category_links(section_url):
        html = urlopen(section_url).read()
        soup = BeautifulSoup(html, "lxml")
        pldefault = soup.find("td", "pldefault")
        ul_links = pldefault.find("ul")
        category_links = [base_url + ul.a["href"] for ul in ul_links.findAll("ul")]
        return category_links

Any help is appreciated! Thanks. Or, if you want to see the website, it is classes.uoregon.edu/

I would keep it simple and locate all links that contain 2015 in the text and term in the href:

    for link in soup.find_all("a",
                              href=lambda href: href and "term" in href,
                              text=lambda text: text and "2015" in text):
        print link["href"]

This prints:

    /pls/prod/hwskdhnt.p_search?term=201402
    /pls/prod/hwskdhnt.p_search?term=201403
    /pls/prod/hwskdhnt.p_search?term=201404
    /pls/prod/hwskdhnt.p_search?term=201406
    /pls/prod/hwskdhnt.p_search?term=201407
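The filter above can be tried without touching the network. The following is a minimal Python 3 sketch, not the code from the answer: the sample markup, the link texts, and the helper name `find_term_links` are all made up for illustration, and it uses `string=`, the newer spelling of the `text=` argument in Beautiful Soup 4.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the real page's structure and link texts may differ.
SAMPLE_HTML = """
<td class="pldefault">
  <ul>
    <li><a href="/pls/prod/hwskdhnt.p_search?term=201402">Winter 2015</a></li>
    <li><a href="/pls/prod/hwskdhnt.p_search?term=201403">Spring 2015</a></li>
    <li><a href="/pls/prod/hwskdhnt.p_search?term=201301">Fall 2013</a></li>
    <li><a href="/about.html">About 2015 changes</a></li>
    <li><a name="top">2015</a></li>
  </ul>
</td>
"""


def find_term_links(html, year="2015"):
    """Return hrefs of <a> tags whose href contains 'term' and whose text contains `year`."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.find_all(
        "a",
        # href is None for anchors without the attribute, so guard before testing
        href=lambda href: href and "term" in href,
        # the text is None when a tag has no single string child, so guard here too
        string=lambda text: text and year in text,
    )
    return [link["href"] for link in links]


term_links = find_term_links(SAMPLE_HTML)
for href in term_links:
    print(href)
```

Both conditions must hold for a link to match, so in this sample the 2013 term link, the link without "term" in its href, and the anchor without an href are all skipped.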

If you want full URLs, use urlparse.urljoin() to join the links with a base URL:

    from urlparse import urljoin

    ...

    for link in soup.find_all("a",
                              href=lambda href: href and "term" in href,
                              text=lambda text: text and "2015" in text):
        print urljoin(url, link["href"])

This would print:

    http://classes.uoregon.edu/pls/prod/hwskdhnt.p_search?term=201402
    http://classes.uoregon.edu/pls/prod/hwskdhnt.p_search?term=201403
    http://classes.uoregon.edu/pls/prod/hwskdhnt.p_search?term=201404
    http://classes.uoregon.edu/pls/prod/hwskdhnt.p_search?term=201406
    http://classes.uoregon.edu/pls/prod/hwskdhnt.p_search?term=201407
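Since the question was about letting a user pick one term, here is a rough Python 3 sketch of that last step. In Python 3, `urljoin` lives in `urllib.parse` rather than `urlparse`. The `term_links` mapping, its term names, and the helper `pick_term_url` are assumptions made for illustration; in practice the mapping would be built from the scraped links.

```python
from urllib.parse import urljoin

BASE_URL = "http://classes.uoregon.edu/"

# Assumed mapping from link text to relative href; the term names are illustrative only.
term_links = {
    "Winter 2015": "/pls/prod/hwskdhnt.p_search?term=201402",
    "Spring 2015": "/pls/prod/hwskdhnt.p_search?term=201403",
}


def pick_term_url(choice, links, base_url=BASE_URL):
    """Return the absolute URL for the term named `choice`, or None if there is no match.

    The comparison ignores case and surrounding whitespace.
    """
    wanted = choice.strip().lower()
    for name, href in links.items():
        if name.lower() == wanted:
            # urljoin resolves the relative href against the base URL
            return urljoin(base_url, href)
    return None


print(pick_term_url("winter 2015", term_links))
```

Returning None for an unknown term leaves it to the caller to decide whether to ask the user again or to stop.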
