Python: Parsing JTBC News Articles and Computing Per-Date Statistics

Python Crawling | 2018. 12. 9. 01:58

The source code is as follows.

import urllib.request
from collections import Counter

from bs4 import BeautifulSoup
import matplotlib.pyplot as plt


def main():
    url = "http://news.jtbc.joins.com/section/index.aspx?scode=70"
    sourcecode = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(sourcecode, "html.parser")

    # Query the page once and keep at most the first 20 date strings.
    # Slicing avoids an IndexError when the page lists fewer than 20 articles.
    spans = soup.find_all("span", class_="date")[:20]
    times = [span.get_text().strip() for span in spans]

    # The slice [8:10] picks out the two-digit day (e.g. "08") from each date string.
    edited = [t[8:10] for t in times]

    # Count the articles published on the 5th, 6th, 7th, and 8th.
    activities = ['05', '06', '07', '08']
    counts = Counter(edited)
    days = [counts[day] for day in activities]

    # Draw the per-day share as a pie chart.
    colors = ['red', 'blue', 'green', 'yellow']
    plt.pie(days, labels=activities, colors=colors, startangle=90, autopct='%.2f%%')
    plt.show()


if __name__ == "__main__":
    main()
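
The code above tallies only the days 05 through 08. As a minimal sketch of the same counting step without fixed days, the following uses `collections.Counter` on a list of date strings. The `count_by_day` helper and the sample strings are hypothetical, and the `YYYY-MM-DD HH:MM` layout is only inferred from the `[8:10]` slice; it has not been checked against the live page.

```python
from collections import Counter


def count_by_day(date_strings):
    """Count how many date strings fall on each day of the month.

    Assumes each string starts with a YYYY-MM-DD prefix, so that
    characters 8-9 hold the two-digit day (the same slice as above).
    """
    return Counter(text.strip()[8:10] for text in date_strings)


# Hypothetical sample data standing in for the scraped date texts.
sample_dates = [
    "2018-12-08 21:10",
    "2018-12-08 20:45",
    "2018-12-07 09:30",
    "2018-12-05 13:00",
]

counts = count_by_day(sample_dates)
labels = sorted(counts)                  # days that actually occur, in order
sizes = [counts[day] for day in labels]  # article count for each of those days
print(labels, sizes)
```

`labels` and `sizes` could then be passed to `plt.pie` in place of `activities` and `days`.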

License: Attribution, NonCommercial, NoDerivatives

Python: Using Regular Expressions and Excel

Python Crawling | 2018. 12. 9. 01:23

※ Using Regular Expressions ※

The basic way to use regular expressions is as follows.

import re

data = "My number is 010-1234-5678. Please call me. You can also reach me at 017-1234-4567. Your number is 010-9999-1111, right?"

# Find every phone number that starts with 010.
compile_text = re.compile(r'010-\d{4}-\d{4}')
match_text = compile_text.findall(data)
print(match_text)

student_email = ["gildong@gmail.com", "hello@hello", "myname@name.com", "abcd@.com", "abcd@abc.co.kr"]

# Check each address against a simple e-mail pattern.
# The raw string keeps the backslash in \. from being read as a string escape.
compile_text = re.compile(r'^[a-zA-Z0-9+_.]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$')
for i in student_email:
    print(i, ": ", compile_text.match(i) is not None)
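
The phone pattern above matches only numbers that start with 010, so 017-1234-4567 is skipped. As a small sketch, a character class in the prefix picks up both. The `find_mobile_numbers` helper, the sample sentence, and the choice of the prefixes 010 and 017 are assumptions made for illustration.

```python
import re

# Matches 010-XXXX-XXXX or 017-XXXX-XXXX; the prefix set is an assumption.
MOBILE_PATTERN = re.compile(r'01[07]-\d{4}-\d{4}')


def find_mobile_numbers(text):
    """Return every phone number in text that matches MOBILE_PATTERN."""
    return MOBILE_PATTERN.findall(text)


# Hypothetical sample sentence containing two mobile numbers and one landline.
sample = "Call 010-1234-5678 or 017-1234-4567, not 02-123-4567."
print(find_mobile_numbers(sample))
```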

If you want to collect only the numeric data from the parsed content into a list, you can do the following.

import urllib.request
from bs4 import BeautifulSoup
import re


def main():
    url = "http://ndb796.tistory.com/109"
    soup = BeautifulSoup(urllib.request.urlopen(url).read(), "html.parser")
    texts = soup.find("div", class_="tt_article_useless_p_margin").find_all("p")

    # Compile the pattern once, outside the loop.
    compile_text = re.compile(r"\d+")
    result = []

    for i in texts:
        # Named `numbers` rather than `list` to avoid shadowing the built-in.
        numbers = compile_text.findall(i.get_text())
        result.extend(numbers)

    print(result)


if __name__ == "__main__":
    main()
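
`findall` returns the matched digits as strings. If the values are needed for arithmetic, they can be converted afterwards. The following is a minimal sketch on plain strings, so it runs without network access; `extract_numbers` and the sample paragraphs are hypothetical.

```python
import re

NUMBER_PATTERN = re.compile(r"\d+")


def extract_numbers(paragraphs):
    """Collect every run of digits from the given strings as integers."""
    result = []
    for text in paragraphs:
        result.extend(int(match) for match in NUMBER_PATTERN.findall(text))
    return result


# Hypothetical paragraph texts, standing in for the get_text() results.
paragraphs = ["Posted on 2018.12.09", "no digits here", "3 comments, 120 views"]
print(extract_numbers(paragraphs))
```

Note that converting to `int` drops leading zeros, so "09" becomes 9; keep the strings if the original formatting matters.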

※ Using Excel ※

With the openpyxl library, Python can export crawled data to an Excel file.

import urllib.request
from bs4 import BeautifulSoup
from openpyxl import Workbook


def main():
    url = "http://www.newsis.com/realnews/"
    sourcecode = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(sourcecode, "html.parser")

    # Collect the text of every article title on the page.
    articles = []

    for i in soup.find_all("strong", class_="title"):
        articles.append(i.get_text())

    wb = Workbook()
    sheet1 = wb.active
    file_name = 'output.xlsx'

    # Column 1 holds the running number, column 2 the article title.
    for row, title in enumerate(articles, start=1):
        sheet1.cell(row=row, column=1).value = row
        sheet1.cell(row=row, column=2).value = title

    wb.save(filename=file_name)


if __name__ == "__main__":
    main()
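
One way to check the exported file is to read it back with `load_workbook`. The sketch below also adds a header row with `sheet.append`. The helper names `save_titles` and `load_titles`, the header labels, and the sample titles are all hypothetical, and the file is written to a temporary directory rather than next to the script.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def save_titles(titles, path):
    """Write a header row plus one numbered row per title to an .xlsx file."""
    wb = Workbook()
    sheet = wb.active
    sheet.append(["No.", "Title"])  # header row (labels are an assumption)
    for number, title in enumerate(titles, start=1):
        sheet.append([number, title])
    wb.save(path)


def load_titles(path):
    """Read the data rows back as tuples, skipping the header row."""
    sheet = load_workbook(path).active
    return list(sheet.iter_rows(min_row=2, values_only=True))


# Hypothetical titles standing in for the scraped article list.
path = os.path.join(tempfile.mkdtemp(), "output.xlsx")
save_titles(["First headline", "Second headline"], path)
print(load_titles(path))
```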

Blog: 안경잡이개발자
