Using Regular Expressions and Excel with Python

Python Crawling, 2018. 12. 9. 01:23


※ Using Regular Expressions ※

The basic way to use a regular expression is to compile a pattern with the re module and then apply it to a string, as follows.

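import re

# Sample text containing three phone numbers; only two use the 010 prefix.
data = "My number is 010-1234-5678. Please call me. You can also call 017-1234-4567. Is your number 010-9999-1111?"

# Match 010 followed by two hyphen-separated groups of four digits.
compile_text = re.compile(r'010-\d{4}-\d{4}')
match_text = compile_text.findall(data)
print(match_text)  # ['010-1234-5678', '010-9999-1111']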

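student_email = ["gildong@gmail.com", "hello@hello", "myname@name.com", "abcd@.com", "abcd@abc.co.kr"]

# Raw string so that \. reaches the regex engine unchanged; the hyphen sits last in the class to keep it literal.
compile_text = re.compile(r'^[a-zA-Z0-9+_.]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$')
for i in student_email:
    # match() returns None when the address does not fit the pattern.
    print(i, ": ", compile_text.match(i) is not None)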
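The phone pattern above only accepts the 010 prefix, so the 017 number in the sample text is skipped. As a minimal sketch, assuming you also want other 01x prefixes and want each part of the number separately, capturing groups can be used. The pattern and the sample text here are illustrative assumptions, not part of the original example.

```python
import re

# Sample text modeled on the example above.
data = ("My number is 010-1234-5678. Please call me. "
        "You can also call 017-1234-4567. "
        "Is your number 010-9999-1111?")

# Assumed format: a 01x prefix, then 3 or 4 digits, then 4 digits.
# Each parenthesized group captures one part of the number.
phone_pattern = re.compile(r"(01\d)-(\d{3,4})-(\d{4})")

# When the pattern has groups, findall returns one tuple per match.
parts = phone_pattern.findall(data)
print(parts)

# finditer yields match objects; group(0) is the whole matched number.
numbers = [m.group(0) for m in phone_pattern.finditer(data)]
print(numbers)
```

Note that findall changes its return shape once groups are present, which is why finditer is used when the full matched string is wanted.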

Likewise, if you want to keep only the numeric data from parsed content and collect it in a list, you can do the following.

import urllib.request
from bs4 import BeautifulSoup
import re


def main():
    url = "http://ndb796.tistory.com/109"
    soup = BeautifulSoup(urllib.request.urlopen(url).read(), "html.parser")
    texts = soup.find("div", class_="tt_article_useless_p_margin").find_all("p")

    # Compile once, outside the loop; the raw string keeps \d intact.
    compile_text = re.compile(r"\d+")
    result = []

    for i in texts:
        # Collect every run of digits found in this paragraph.
        # (The original named this variable "list", which shadows the built-in.)
        numbers = compile_text.findall(i.get_text())
        result.extend(numbers)

    print(result)


if __name__ == "__main__":
    main()
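Note that findall returns strings, so the list built above holds values like "12" rather than 12. The following is a minimal sketch of the same extraction step with a conversion to int, run on a few hypothetical paragraph strings instead of a live page so that it works without a network connection. The function name extract_numbers and the sample texts are assumptions made for illustration.

```python
import re

# Hypothetical paragraph texts standing in for the parsed <p> elements.
paragraphs = [
    "Chapter 3 has 12 examples.",
    "No digits here.",
    "Released on 2018-12-09.",
]

number_pattern = re.compile(r"\d+")


def extract_numbers(texts):
    """Return every run of digits found in texts, converted to int."""
    numbers = []
    for text in texts:
        # findall returns an empty list when nothing matches,
        # so no length check is needed before extending.
        numbers.extend(int(token) for token in number_pattern.findall(text))
    return numbers


print(extract_numbers(paragraphs))
```

Converting to int drops leading zeros ("09" becomes 9), so keep the strings if the exact digits matter, for example with phone numbers or postal codes.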

※ Using Excel ※

In Python, the openpyxl library lets you export crawled data to an Excel (.xlsx) file.

import urllib.request
from bs4 import BeautifulSoup
from openpyxl import Workbook


def main():
    url = "http://www.newsis.com/realnews/"
    sourcecode = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(sourcecode, "html.parser")

    # Collect the text of every <strong class="title"> element.
    articles = []

    for i in soup.find_all("strong", class_="title"):
        articles.append(i.get_text())

    wb = Workbook()
    sheet1 = wb.active
    file_name = 'output.xlsx'

    # Column 1 holds a running number, column 2 the article title (rows start at 1).
    for i, title in enumerate(articles, start=1):
        sheet1.cell(row=i, column=1).value = i
        sheet1.cell(row=i, column=2).value = title

    wb.save(filename=file_name)


if __name__ == "__main__":
    main()
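The example above writes each cell individually. As a rough sketch of an alternative, assuming you would like a header row and prefer to write a whole row at a time, the rows can be built first and then passed to the worksheet's append method. The helper names build_rows and save_rows, the header labels, and the sample titles are all hypothetical.

```python
def build_rows(titles):
    """Return a header row followed by (number, title) rows, numbered from 1."""
    rows = [("No.", "Title")]
    rows.extend((index, title) for index, title in enumerate(titles, start=1))
    return rows


def save_rows(rows, file_name):
    """Write rows to an .xlsx file, one worksheet row per tuple."""
    # Imported here so build_rows can be used even where openpyxl is not installed.
    from openpyxl import Workbook

    wb = Workbook()
    sheet = wb.active
    for row in rows:
        # append places the values in the next empty row, starting at column 1.
        sheet.append(row)
    wb.save(file_name)


# Hypothetical titles standing in for crawled headlines.
sample_titles = ["First headline", "Second headline"]
print(build_rows(sample_titles))
```

Calling save_rows(build_rows(sample_titles), "output.xlsx") would then produce a sheet with the header in row 1 and the numbered titles below it. Keeping the row-building step separate from the file-writing step also makes it easy to inspect the data before saving.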

License: Attribution, NonCommercial, NoDerivatives

안경잡이개발자
