BeautifulSoup Basics

2017-01-10 22:35

From here on, everything assumes Python 3!

BeautifulSoup

BeautifulSoup parses HTML into a tree of Python objects, much like an XML document tree.
That makes it easy to pull just the content you need out of a web page.
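
To make the "tree of Python objects" idea concrete, here is a minimal sketch that parses an inline HTML string instead of a live page. The markup in `SAMPLE_HTML` and the `intro` class are made up for illustration, and it assumes beautifulsoup4 is already installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, inlined so the example needs no network access.
SAMPLE_HTML = """
<html>
  <head><title>A Useful Page</title></head>
  <body>
    <h1>An Interesting Title</h1>
    <p class="intro">First paragraph.</p>
    <p>Second paragraph.</p>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Tags are reachable as attributes of the parsed tree.
title_text = soup.title.string

# find_all() collects every matching tag; get_text() strips the markup.
paragraphs = [p.get_text() for p in soup.find_all("p")]

# find() returns the first match; class_ filters on the CSS class.
intro = soup.find("p", class_="intro")

print(title_text)
print(paragraphs)
print(intro.get_text())
```

Only the paragraphs you ask for come back, which is the "just the content you need" part.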

Installation

$ pip3 install beautifulsoup4

Code

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://pythonscraping.com/pages/page1.html")
bsObj = BeautifulSoup(html.read(), "html.parser")
print(bsObj.h1)

Result

<h1>An Interesting Title</h1>
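
Note that printing a tag prints the whole element, markup included. The sketch below, using a made-up inline document with two `<h1>` tags, suggests how to get only the text and shows that attribute access such as `.h1` gives back just the first match.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two headings, inlined for illustration.
html = (
    "<html><body>"
    "<h1>An Interesting Title</h1>"
    "<h1>Another Title</h1>"
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")

first_h1 = soup.h1                 # shorthand for soup.find("h1")
as_markup = str(first_h1)          # the tag including its markup
as_text = first_h1.get_text()      # only the text inside the tag
all_h1 = soup.find_all("h1")       # every <h1>, not just the first

print(as_markup)
print(as_text)
print(len(all_h1))
```

If you need every heading rather than the first one, `find_all` is the call to reach for.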

Reference

https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Getting the Title

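from urllib.error import URLError
from urllib.request import urlopen

from bs4 import BeautifulSoup


def getTitle(url):
    """Return the page's <title> tag, or None if it cannot be retrieved."""
    try:
        html = urlopen(url)
    except URLError as e:
        # URLError also covers HTTPError, which is its subclass.
        # Return early: html is undefined when the request fails.
        print(e)
        return None

    try:
        bsObj = BeautifulSoup(html.read(), "html.parser")
        # bsObj.head is None when the page has no <head>,
        # so .title raises AttributeError.
        title = bsObj.head.title
    except AttributeError as e:
        print(e)
        return None
    return title


title = getTitle("http://pythonscraping.com/pages/page1.html")
if title is None:
    print("Title could not be found")
else:
    print(title)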
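
The `AttributeError` handler in `getTitle` exists because BeautifulSoup returns `None` for a tag that is not in the document, and reaching further into `None` fails. Here is a small sketch of that behavior, using a made-up fragment with no `<head>` and assuming the `html.parser` backend (other parsers such as lxml may insert missing elements).

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with no <head> element at all.
soup = BeautifulSoup("<p>No head here</p>", "html.parser")

missing = soup.head      # a tag that does not exist comes back as None
print(missing)

try:
    soup.head.title      # None has no attribute "title"
    raised = False
except AttributeError:
    raised = True
print(raised)
```

When `<head>` exists but `<title>` does not, no exception is raised: `head.title` is simply `None`, which is why the caller still checks the return value.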

Hoyuo

I am an Android developer.