  • Web-Crawling (Naver Books)
    Python 2020. 6. 4. 18:37

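    Naver Book Search Scraper

    • Search the Naver Books menu for "big data" and print each book's title, author, publisher, publication date, list price, and discounted price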

    %%html
    
    <!-- Adjust the editor font. -->
    <style type='text/css'>
    .CodeMirror{
        font-size: 14px;
        font-family: Bitstream Vera Sans Mono;}
    </style>
    import requests
    from bs4 import BeautifulSoup

    Building the URL

    url = 'https://book.naver.com/search/search.nhn'
    params = {'sm':'sta_hty.book', 
              'sug':'', 
              'where':'nexearch',
              'query':'bigdata'}
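
    To preview the address this dictionary turns into, the query string can be built by hand with the standard library. This is only a sketch: requests does its own encoding internally, and the result here should be roughly equivalent rather than guaranteed identical.

```python
from urllib.parse import urlencode

url = 'https://book.naver.com/search/search.nhn'
params = {'sm': 'sta_hty.book',
          'sug': '',
          'where': 'nexearch',
          'query': 'bigdata'}

# urlencode joins the pairs as key=value&key=value and escapes unsafe characters.
full_url = url + '?' + urlencode(params)
print(full_url)
```

    Printing the full URL is a quick way to paste the same search into a browser and compare it with what the scraper receives.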

    Sending the GET request

    response = requests.get(url, params=params)
    status_code = response.status_code
    print(status_code)
    if status_code == 200:
        text = response.text
    200
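
    Note that text is only assigned when the status code is 200, so any other reply would surface later as a NameError. One possible guard is sketched below; body_or_raise is a hypothetical helper, and the SimpleNamespace objects are stand-ins for real responses so the example runs without network access.

```python
from types import SimpleNamespace


def body_or_raise(response):
    """Return response.text for a 200 reply; raise for anything else."""
    if response.status_code != 200:
        raise RuntimeError(f'unexpected status code: {response.status_code}')
    return response.text


# Stand-ins for requests.Response objects, used only for illustration.
ok = SimpleNamespace(status_code=200, text='<html></html>')
missing = SimpleNamespace(status_code=404, text='')

print(body_or_raise(ok))
try:
    body_or_raise(missing)
except RuntimeError as err:
    print(err)
```

    With a real response object, calling response.raise_for_status() is a built-in alternative that raises on 4xx and 5xx replies.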

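    Converting the str into a BeautifulSoup object

    #print(text)
    # Name the parser explicitly; if it is omitted, bs4 picks one itself and emits a warning.
    soup = BeautifulSoup(text, 'html.parser')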

    Getting the full book list ==> inspect the page with Chrome DevTools (F12)

    book_all = soup.find(id='searchBiblioList')
    #book_all = soup.find(attrs={'id':'searchBiblioList'})
    #book_all = soup.select_one('#searchBiblioList')
    #book_all
    book_all_li_all = book_all.select('li') # all books
    book_all_li_one = book_all.select_one('li') # a single book
    book_all_li_one
    <li style="position:relative;">
    <div class="thumb type_search">
    <div class="thumb_type thumb_type2">
    <a class="N=a:bls.thumb,r:1,i:98000001_000000000000000000ECC37F" href="http://book.naver.com/bookdb/book_detail.nhn?bid=15516543" target="_blank">
    <img alt="KNIME을 활용한 Big Data분석" onerror="this.src='https://ssl.pstatic.net/static/book/image/noimg3.gif';" src="https://bookthumb-phinf.pstatic.net/cover/155/165/15516543.jpg?type=m1&amp;udate=Thu Jun 04 18:35:29 KST 2020"/><span class="mask"><span class="bg1"></span><span class="bg2"></span></span>
    </a>
    </div>
    </div>
    <dl style="width:654px">
    <dt>
    <a class="N=a:bls.title,r:1,i:98000001_000000000000000000ECC37F" href="http://book.naver.com/bookdb/book_detail.nhn?bid=15516543" target="_blank">KNIME을 활용한 <strong>Big</strong> <strong>Data</strong>분석</a><span> (Click 하나로 끝내는 데이터 분석 KNIME)</span> </dt>
    <dd class="txt_block">
    <a class="N=a:bls.author,r:1,i:4570282" href="http://book.naver.com/search/search.nhn?query=%EC%A1%B0%EC%B9%98%EC%84%A0&amp;frameFilterType=1&amp;frameFilterValue=4570282">조치선</a>, <a class="N=a:bls.author,r:1,i:9162" href="http://book.naver.com/search/search.nhn?query=%EC%A0%95%EC%98%81%EC%A7%84&amp;frameFilterType=1&amp;frameFilterValue=9162">정영진</a> 외 5명 저 <span class="bar">|</span> <a class="N=a:bls.publisher,r:1,i:" href="http://book.naver.com/search/search.nhn?filterType=7&amp;query=%EC%97%91%EC%85%88">엑셈</a> <span class="bar">|</span> 2019.09.25</dd>
    <dd class="txt_desc">
    <div class="review_point">
    <span style="width:100.0%;">별점</span>
    </div>
                    10.0<span class="blind">점</span>
    <span class="bar"> | </span>
    <a class="N=a:bls.review,r:1,i:" href="/bookdb/review.nhn?bid=15516543">네티즌리뷰</a>
                        3건
                    <span class="bar">|</span>
    <a class="N=a:bls.bookbuy,r:1,i:98000001_000000000000000000ECC37F" href="javascript:showBuyLayerByBid('15516543')" id="buy_btn_15516543" onclick="return showAdultLayer('15516543', 'false', 'false', 'false');"><img alt="도서구매" class="btn v2" height="20" id="btn_buy_img_15516543" src="https://ssl.pstatic.net/static/book/image/btn_book_buy.gif" title="구매 가능한 도서입니다." width="48"/></a>
                         <strike>25000원</strike> → <em class="price">22500원(-10%)</em>
    </dd>
    <dd id="searchDescrition_15516543" line="3" type="책소개">
    <span class="txt_g1">소개 </span>
                    쉽고 빠르게 활용할 수 있는 KNIME ANALYTICS PLATFORM 기반 데이터 분석KNIME은 독일의 KONSTANZ 대학에서 개발된 워크플로우 기반의 통합 데이터 분석 플랫폼입니다. ‘KNIME을 활용한 빅데이터 분석’은 데이터 분석을 처음 접하는 학생들과 현업 담당자들을 위하여 집필하였고 데이터 선택...</dd>
    <!-- 연관도서 노출 -->
    </dl>
    </li>

    Getting one book's bid

    bid_one = book_all_li_one.select_one('a')['href'].split('=')[1]
    print(bid_one)
    15516543
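
    Splitting on '=' works because bid is the only query parameter in these links. If a link ever carried more parameters, parsing the query string would be more robust. A minimal sketch, where extract_bid is a hypothetical helper name:

```python
from urllib.parse import urlparse, parse_qs


def extract_bid(href):
    """Return the value of the bid query parameter, or None if it is absent."""
    query = parse_qs(urlparse(href).query)
    values = query.get('bid')
    return values[0] if values else None


href = 'http://book.naver.com/bookdb/book_detail.nhn?bid=15516543'
print(extract_bid(href))
# An extra parameter does not leak into the result.
print(extract_bid(href + '&foo=bar'))
```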

    Collecting every book's bid in a list

    bid_list = []
    for book_all_bid_one in book_all_li_all:
        b_id = book_all_bid_one.select_one('a')['href'].split('=')[1]
        bid_list.append(b_id)

    Checking the bid list

    print(bid_list)
    ['15516543', '13587569', '13783099', '16338249', '16327795', '16346530', '14594752', '13784550', '13399152', '16276774', '10390764', '15748262', '13409559', '15746028', '15744672', '16113809', '15811120', '10220466', '15136965', '15760214']

    Getting one book title

    book_title = book_all_li_one.select_one('img')['alt']
    book_title
    # Equivalent shortcut: tag.img returns the first <img> descendant.
    title_one = book_all_li_one.img['alt']
    print(title_one)
    KNIME을 활용한 Big Data분석

    Collecting all book titles in a list

    title_list = []
    for book_name_one in book_all_li_all:
        b_title = book_name_one.select_one('img')['alt']
        title_list.append(b_title)
    print(title_list)
    ['KNIME을 활용한 Big Data분석', '빅데이터', '빅데이터 리더십', 'BIG DATA를 활용한 K-뷰티경영학', 'Knowledge Discovery in Big Data from Astronomy and Earth Observation: Astrogeoinformatics', 'Ocean Energy Modeling and Simulation with Big Data: Computational Intelligence for System Optimizati', '헬스케어.의료분야 인공지능(AI)과 빅데이터(Big Data)의 핵심기술 개발동향과 국내외 시장 분석', '빅데이터 빅마인드', 'Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us about Who We Really Are', 'Systems Simulation and Modeling for Cloud Computing and Big Data Applications', 'Big Data', 'Big Data', '지식의 방주039 대한민국 여행트렌드 2018 Ⅹ. 빅데이터(Big Data)', 'Big Data', 'Big Data', 'Sharing Economy and Big Data Analytics', 'Big Data Mining for Climate Change', 'MY BIG DATA', 'Big Data', 'Spatial Analysis Using Big Data: Methods and Urban Applications']

    Processing the data (one book)

    book_text = book_all_li_one.select_one('dd.txt_block').text
    book_text = book_text.replace('\n', '').replace('\xa0','')
    book_text = book_text.replace('\r','').replace('\t','')
    book_text_list = book_text.split('|')
    book_author = book_text_list[0]
    book_publisher = book_text_list[1]
    book_pubdate = book_text_list[2]

    Processing the data (all books)

    author_list = []
    publisher_list = []
    pubdate_list = []
    for book_all_li_one in book_all_li_all:
        book_text = book_all_li_one.select_one('dd.txt_block').text
        book_text = book_text.replace('\n','').replace('\r','')
        book_text = book_text.replace('\t','').replace('\xa0','')
        book_text_list = book_text.split('|')
        # Four fields mean an extra contributor (e.g. illustrator or translator) follows the author.
        if len(book_text_list) == 4:
            book_author = book_text_list[0] + book_text_list[1]
            book_publish = book_text_list[2]
            book_pubdate = book_text_list[3]
        else:
            book_author = book_text_list[0]
            book_publish = book_text_list[1]
            book_pubdate = book_text_list[2]
        author_list.append(book_author)
        publisher_list.append(book_publish)
        pubdate_list.append(book_pubdate)
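
    The three-or-four-field branching can also be written by unpacking from the right, since the publisher and date are always the last two fields. The sketch below assumes that layout; split_txt_block is a hypothetical helper, and the whitespace in the sample strings is guessed for illustration rather than copied from the page.

```python
def split_txt_block(raw):
    """Split 'credits | [more credits |] publisher | date' into three fields."""
    cleaned = raw
    for ch in ('\n', '\r', '\t', '\xa0'):
        cleaned = cleaned.replace(ch, '')
    parts = cleaned.split('|')
    # The last two fields are publisher and date; everything before them is credits.
    # Fewer than two fields raises ValueError, which flags an unexpected layout.
    *credits, publisher, pubdate = parts
    return ''.join(credits), publisher, pubdate


three = '\n조치선, 정영진 외 5명 저 |\xa0엑셈\xa0|\xa02019.09.25'
four = '안지선 글 | 송진욱 그림|\xa0봄볕\xa0|\xa02018.05.16'

print(split_txt_block(three))
print(split_txt_block(four))
```

    Because the helper uses its own local names, it does not overwrite variables such as book_author that were set in the single-book step.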

    Getting one book's author

    # The loop above reassigned book_author, so this prints the last book's author.
    print(book_author)
    Yoshiki Yamagata 저 

    Getting every book's author

    print(author_list)
    ['조치선, 정영진 외 5명 저 ', '안지선 글  송진욱 그림', '김진호(대학교수), 최용주(대학부총장) 저 ', '이범식 김은주 전소현 이상범 저 ', 'Petr Skoda 저 ', 'Vikas Khare 저 ', '편집부 저 ', '박형준 저 ', 'Stephens-davidowitz, Seth 저 ', 'Dinesh Peter 저 ', '버나드 마 저  Ann Lee 역', 'Pedersen, John S. (EDT), Wilkinson, Adrian (EDT) 저 ', '조명화(여행작가) 저 ', 'Sarangi, Saswat, Sharma, Pankaj 저 ', 'Sarangi, Saswat, Sharma, Pankaj 저 ', 'Soraya Sedkaoui 저 ', 'Zhihua Zhang 저 ', '이랑(가수), 황국영 저 ', 'Zgurovsky, Michael Z., Zaychenko, Yuriy P. 저 ', 'Yoshiki Yamagata 저 ']

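    Getting one book's publisher

    # book_publisher still holds the first book's value: the loop above stored its result in book_publish.
    print(book_publisher)
    엑셈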

    Getting every book's publisher

    print(publisher_list)
    ['엑셈', '봄볕', '북카라반', '구민사', 'Elsevier', 'Elsevier', 'IRSGlobal', '리드리드출판', 'DeyStreetBooks', 'Elsevier', '교학사', 'EdwardElgarPub', '테마여행신문TTNThemeTravelNewsKorea', 'RoutledgeIndia', 'RoutledgeIndia', 'Wiley-ISTE', 'Elsevier', '소시민워크', 'Springer-NatureNewYorkInc', 'Elsevier']

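    Getting one book's publication date

    # book_pubdate was reassigned in the loop above, so this prints the last book's date.
    print(book_pubdate)
    2019.11.03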

    Getting every book's publication date

    print(pubdate_list)
    ['2019.09.25', '2018.05.16', '2018.07.25', '2020.04.20', '2020.04.22', '2020.04.21', '2019.02.26', '2018.07.27', '2018.02.27', '2020.03.09', '2016.03.20', '2019.11.29', '2018.03.03', '2019.10.05', '2019.10.03', '2020.01.09', '2019.12.03', '2016.02.01', '2019.07.05', '2019.11.03']

    Getting one book's prices (list price, discounted price)

    book_txt_desc = book_all_li_all[1].select_one('dd.txt_desc')
    price_old = book_txt_desc.select_one('strike').text
    price_old = price_old.split('원')[0]
    price_new = book_txt_desc.select_one('em.price').text
    price_new = price_new.split('원')[0]
    print(price_old, price_new)
    13000 11700

    Getting every book's prices (list price, discounted price)

    price_list = []
    for book_all_one in book_all_li_all:
        book_txt_descs = book_all_one.select_one('dd.txt_desc')
        price_olds = book_txt_descs.select_one('strike')
        price_news = book_txt_descs.select_one('em.price')
        # select_one returns None when the tag is absent; record 0 in that case.
        if price_olds is None:
            price_olds = 0
        else:
            price_olds = price_olds.text.split('원')[0]
        if price_news is None:
            price_news = 0
        else:
            price_news = price_news.text.split('원')[0]
        price_list.append((price_olds, price_news))
    print(price_list)
    [('25000', '22500'), ('13000', '11700'), ('16000', '14400'), ('21000', '20370'), (0, 0), (0, 0), ('390000', '351000'), ('15800', '14220'), ('22220', '13880'), (0, 0), ('14000', '12600'), ('189540', '188320'), (0, 0), ('69650', '57120'), ('196560', '177940'), (0, 0), (0, 0), (0, 0), ('214180', '190630'), (0, 0)]
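
    The list above mixes strings such as '25000' with the integer 0 for missing prices. If the prices are going to be compared or summed later, converting them to one type may help. A minimal sketch, where parse_price is a hypothetical helper that returns None instead of 0 for a missing price:

```python
import re


def parse_price(text):
    """Return the leading amount of a string such as '22500원(-10%)' as an int."""
    if text is None:
        return None
    # Accept an optional thousands separator, then stop at the first non-digit.
    match = re.match(r'\s*(\d[\d,]*)', text)
    return int(match.group(1).replace(',', '')) if match else None


print(parse_price('25000원'))
print(parse_price('22500원(-10%)'))
print(parse_price(None))
```

    Returning None keeps "no price listed" distinguishable from a price of zero; whether that distinction matters depends on how the data is used afterwards.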

    Building book_info_list

    book_info_list = []
    for i in range(len(bid_list)):
        book_info_dict = dict()
        book_info_dict['bid'] = bid_list[i]
        book_info_dict['title'] = title_list[i]
        book_info_dict['author'] = author_list[i]
        book_info_dict['publisher'] = publisher_list[i]
        book_info_dict['pubdate'] = pubdate_list[i]
        book_info_dict['price_old_new'] = price_list[i]
        book_info_list.append(book_info_dict)
    book_info_list
    [{'bid': '15516543',
      'title': 'KNIME을 활용한 Big Data분석',
      'author': '조치선, 정영진 외 5명 저 ',
      'publisher': '엑셈',
      'pubdate': '2019.09.25',
      'price_old_new': ('25000', '22500')},
     {'bid': '13587569',
      'title': '빅데이터',
      'author': '안지선 글  송진욱 그림',
      'publisher': '봄볕',
      'pubdate': '2018.05.16',
      'price_old_new': ('13000', '11700')},
     {'bid': '13783099',
      'title': '빅데이터 리더십',
      'author': '김진호(대학교수), 최용주(대학부총장) 저 ',
      'publisher': '북카라반',
      'pubdate': '2018.07.25',
      'price_old_new': ('16000', '14400')},
     {'bid': '16338249',
      'title': 'BIG DATA를 활용한 K-뷰티경영학',
      'author': '이범식 김은주 전소현 이상범 저 ',
      'publisher': '구민사',
      'pubdate': '2020.04.20',
      'price_old_new': ('21000', '20370')},
     {'bid': '16327795',
      'title': 'Knowledge Discovery in Big Data from Astronomy and Earth Observation: Astrogeoinformatics',
      'author': 'Petr Skoda 저 ',
      'publisher': 'Elsevier',
      'pubdate': '2020.04.22',
      'price_old_new': (0, 0)},
     {'bid': '16346530',
      'title': 'Ocean Energy Modeling and Simulation with Big Data: Computational Intelligence for System Optimizati',
      'author': 'Vikas Khare 저 ',
      'publisher': 'Elsevier',
      'pubdate': '2020.04.21',
      'price_old_new': (0, 0)},
     {'bid': '14594752',
      'title': '헬스케어.의료분야 인공지능(AI)과 빅데이터(Big Data)의 핵심기술 개발동향과 국내외 시장 분석',
      'author': '편집부 저 ',
      'publisher': 'IRSGlobal',
      'pubdate': '2019.02.26',
      'price_old_new': ('390000', '351000')},
     {'bid': '13784550',
      'title': '빅데이터 빅마인드',
      'author': '박형준 저 ',
      'publisher': '리드리드출판',
      'pubdate': '2018.07.27',
      'price_old_new': ('15800', '14220')},
     {'bid': '13399152',
      'title': 'Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us about Who We Really Are',
      'author': 'Stephens-davidowitz, Seth 저 ',
      'publisher': 'DeyStreetBooks',
      'pubdate': '2018.02.27',
      'price_old_new': ('22220', '13880')},
     {'bid': '16276774',
      'title': 'Systems Simulation and Modeling for Cloud Computing and Big Data Applications',
      'author': 'Dinesh Peter 저 ',
      'publisher': 'Elsevier',
      'pubdate': '2020.03.09',
      'price_old_new': (0, 0)},
     {'bid': '10390764',
      'title': 'Big Data',
      'author': '버나드 마 저  Ann Lee 역',
      'publisher': '교학사',
      'pubdate': '2016.03.20',
      'price_old_new': ('14000', '12600')},
     {'bid': '15748262',
      'title': 'Big Data',
      'author': 'Pedersen, John S. (EDT), Wilkinson, Adrian (EDT) 저 ',
      'publisher': 'EdwardElgarPub',
      'pubdate': '2019.11.29',
      'price_old_new': ('189540', '188320')},
     {'bid': '13409559',
      'title': '지식의 방주039 대한민국 여행트렌드 2018 Ⅹ. 빅데이터(Big Data)',
      'author': '조명화(여행작가) 저 ',
      'publisher': '테마여행신문TTNThemeTravelNewsKorea',
      'pubdate': '2018.03.03',
      'price_old_new': (0, 0)},
     {'bid': '15746028',
      'title': 'Big Data',
      'author': 'Sarangi, Saswat, Sharma, Pankaj 저 ',
      'publisher': 'RoutledgeIndia',
      'pubdate': '2019.10.05',
      'price_old_new': ('69650', '57120')},
     {'bid': '15744672',
      'title': 'Big Data',
      'author': 'Sarangi, Saswat, Sharma, Pankaj 저 ',
      'publisher': 'RoutledgeIndia',
      'pubdate': '2019.10.03',
      'price_old_new': ('196560', '177940')},
     {'bid': '16113809',
      'title': 'Sharing Economy and Big Data Analytics',
      'author': 'Soraya Sedkaoui 저 ',
      'publisher': 'Wiley-ISTE',
      'pubdate': '2020.01.09',
      'price_old_new': (0, 0)},
     {'bid': '15811120',
      'title': 'Big Data Mining for Climate Change',
      'author': 'Zhihua Zhang 저 ',
      'publisher': 'Elsevier',
      'pubdate': '2019.12.03',
      'price_old_new': (0, 0)},
     {'bid': '10220466',
      'title': 'MY BIG DATA',
      'author': '이랑(가수), 황국영 저 ',
      'publisher': '소시민워크',
      'pubdate': '2016.02.01',
      'price_old_new': (0, 0)},
     {'bid': '15136965',
      'title': 'Big Data',
      'author': 'Zgurovsky, Michael Z., Zaychenko, Yuriy P. 저 ',
      'publisher': 'Springer-NatureNewYorkInc',
      'pubdate': '2019.07.05',
      'price_old_new': ('214180', '190630')},
     {'bid': '15760214',
      'title': 'Spatial Analysis Using Big Data: Methods and Urban Applications',
      'author': 'Yoshiki Yamagata 저 ',
      'publisher': 'Elsevier',
      'pubdate': '2019.11.03',
      'price_old_new': (0, 0)}]
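
    The index-based loop that builds book_info_list can also be expressed with zip, which walks the parallel lists together. The sketch below uses two-item sample lists taken from the output above instead of the scraped lists, so it runs on its own.

```python
# Two-item samples standing in for the scraped lists.
bid_list = ['15516543', '13587569']
title_list = ['KNIME을 활용한 Big Data분석', '빅데이터']
author_list = ['조치선, 정영진 외 5명 저 ', '안지선 글  송진욱 그림']
publisher_list = ['엑셈', '봄볕']
pubdate_list = ['2019.09.25', '2018.05.16']
price_list = [('25000', '22500'), ('13000', '11700')]

keys = ('bid', 'title', 'author', 'publisher', 'pubdate', 'price_old_new')
columns = (bid_list, title_list, author_list,
           publisher_list, pubdate_list, price_list)

# zip(*columns) yields one tuple per book; zip(keys, row) pairs it with the field names.
book_info_list = [dict(zip(keys, row)) for row in zip(*columns)]
print(book_info_list[1])
```

    One caveat: zip stops at the shortest list without complaint, so a length mismatch between the lists would silently drop books. On Python 3.10 or later, zip(*columns, strict=True) raises an error instead.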
