BeautifulSoup Basics


Everything from here on assumes Python 3!

BeautifulSoup

BeautifulSoup parses HTML into a tree of Python objects, structured much like an XML document.
This makes it easy to pull just the content you need from a web page.
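
To make the object tree concrete, here is a minimal sketch that parses an inline HTML string instead of a live page. The markup and variable names are made up for this example, and it assumes beautifulsoup4 is already installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this sketch
html_doc = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h1>Heading</h1>
    <p class="intro">First paragraph.</p>
    <p>Second paragraph.</p>
  </body>
</html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Tags are reachable as attributes of the parsed tree
page_title = soup.title.get_text()

# find_all returns every matching tag in document order
paragraphs = [p.get_text() for p in soup.find_all("p")]

# Attributes of a tag are read like dictionary entries;
# class is multi-valued, so it comes back as a list
intro_class = soup.find("p")["class"]

print(page_title)
print(paragraphs)
print(intro_class)
```

Only the pieces asked for (the title text and the paragraph texts) are extracted; the rest of the markup is ignored.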

Installation

$ pip3 install beautifulsoup4

Code

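from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the page and parse the response body with Python's built-in HTML parser
html = urlopen("http://pythonscraping.com/pages/page1.html")
bsObj = BeautifulSoup(html.read(), "html.parser")
# Accessing a tag name as an attribute returns the first matching tag
print(bsObj.h1)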

Result

<h1>An Interesting Title</h1>
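
The output above is the whole tag, markup included. If only the text is wanted, the tag object can be inspected further. The sketch below uses a stand-in HTML string so it runs without a network connection; only the h1 content is taken from the result above, and the rest of the markup is an assumption.

```python
from bs4 import BeautifulSoup

# Stand-in markup for this sketch; only the <h1> comes from the result above
html = "<html><body><h1>An Interesting Title</h1></body></html>"
bsObj = BeautifulSoup(html, "html.parser")

h1 = bsObj.h1                 # shortcut for bsObj.find("h1")
tag_name = h1.name            # the tag's name as a string
heading_text = h1.get_text()  # the text without the surrounding markup

print(h1)
print(tag_name)
print(heading_text)
```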

Reference

https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Getting the Title

from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup


def getTitle(url):
    try:
        html = urlopen(url)
    except HTTPError as e:
        # The server answered with an error status (e.g. 404), so there is nothing to parse
        print(e)
        return None

    try:
        bsObj = BeautifulSoup(html.read(), "html.parser")
        # bsObj.head is None when the page has no <head>, so .title raises AttributeError
        title = bsObj.head.title
    except AttributeError as e:
        print(e)
        return None
    else:
        return title


title = getTitle("http://pythonscraping.com/pages/page1.html")
if title is None:
    print("Title could not be found")
else:
    print(title)
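
The AttributeError branch is easier to see when the parsing step is separated from the download. The following is a rough sketch under that assumption: title_from_html and both input strings are hypothetical names made up here, not part of the code above. It relies on html.parser not inserting a missing head element, so bsObj.head is None for a bare fragment.

```python
from bs4 import BeautifulSoup


def title_from_html(html_text):
    """Return the <title> tag inside <head>, or None if either is missing."""
    soup = BeautifulSoup(html_text, "html.parser")
    try:
        # soup.head is None when there is no <head>, so .title raises AttributeError
        return soup.head.title
    except AttributeError:
        return None


# Hypothetical inputs for this sketch
with_title = "<html><head><title>A Useful Page</title></head><body></body></html>"
without_head = "<p>No head element here</p>"

found = title_from_html(with_title)
missing = title_from_html(without_head)

print(found)
print(missing)
```

Keeping the parser separate from urlopen also means the missing-tag case can be exercised without a network connection.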