BeautifulSoup Basics
From here on, everything assumes Python 3.
BeautifulSoup
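BeautifulSoup parses HTML (or XML) into a tree of Python objects that mirrors the document's structure.
That makes it easy to pull just the content you need out of a web page.
Installation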
    $ pip3 install beautifulsoup4
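One thing that trips people up after installing: the package is named beautifulsoup4 on PyPI, but the module you import is bs4. Here is a small sketch for checking that the install worked. The helper name `is_installed` is my own, not part of any library.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the named top-level module can be found for import."""
    # find_spec returns None when no module of that name is available.
    return importlib.util.find_spec(module_name) is not None


if is_installed("bs4"):
    import bs4
    print("beautifulsoup4 version:", bs4.__version__)
else:
    print("bs4 not found; run: pip3 install beautifulsoup4")
```

If the second message prints, pip3 probably installed into a different interpreter than the one running the script.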
Code
    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    # Fetch the page and parse it with Python's built-in HTML parser.
    html = urlopen("http://pythonscraping.com/pages/page1.html")
    bsObj = BeautifulSoup(html.read(), "html.parser")
    # Print the first <h1> tag in the document.
    print(bsObj.h1)
Result

    <h1>An Interesting Title</h1>
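You can explore the same object model without touching the network by parsing an HTML string directly. This is a minimal sketch, assuming beautifulsoup4 is installed. The markup in `SAMPLE_HTML` is made up for illustration and is not the real contents of page1.html.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; not the real contents of page1.html.
SAMPLE_HTML = """
<html>
  <head><title>A Useful Page</title></head>
  <body>
    <h1>An Interesting Title</h1>
    <div>First paragraph.</div>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Attribute access returns the first matching tag below the node.
first_heading = soup.h1
print(first_heading)             # the Tag object, printed as markup
print(first_heading.get_text())  # only the text inside the tag

# The same tag can be reached by spelling out the path or by searching.
same_by_path = soup.html.body.h1
same_by_find = soup.find("h1")
```

All three lookups end up at the same `<h1>` tag; the short `soup.h1` form is just the most convenient when you only want the first match.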
Reference
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Getting the Title
    from urllib.request import urlopen
    from urllib.error import HTTPError
    from bs4 import BeautifulSoup


    def getTitle(url):
        """Return the page's <title> tag, or None if it cannot be retrieved."""
        try:
            html = urlopen(url)
        except HTTPError as e:
            # The server returned an error status, so there is nothing to parse.
            print(e)
            return None

        try:
            bsObj = BeautifulSoup(html.read(), "html.parser")
            # Raises AttributeError when the page has no <head> (bsObj.head is None).
            title = bsObj.head.title
        except AttributeError as e:
            print(e)
            return None
        return title


    title = getTitle("http://pythonscraping.com/pages/page1.html")
    if title is None:
        print("Title could not be found")
    else:
        print(title)
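The "title could not be found" branch is hard to trigger against a live page, so it helps to separate parsing from fetching and feed in HTML strings. This is a rough sketch, assuming beautifulsoup4 is installed. The function name `title_from_html` and the sample strings are my own, for illustration only.

```python
from typing import Optional

from bs4 import BeautifulSoup


def title_from_html(html: str) -> Optional[str]:
    """Return the text of <head><title>, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    try:
        # soup.head is None when there is no <head>, so .title raises.
        title = soup.head.title
    except AttributeError:
        return None
    # soup.head.title is itself None when <head> has no <title>.
    return title.get_text() if title is not None else None


print(title_from_html("<html><head><title>Hi</title></head></html>"))  # Hi
print(title_from_html("<html><head></head></html>"))                   # None
print(title_from_html("<p>no head</p>"))                               # None
```

There are two different ways the title can be missing: no `<head>` at all (an AttributeError) and a `<head>` without a `<title>` (a plain None). Both end up as None here, so the caller only needs one check.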
Hoyuo: I am an Android developer.