윤영준 2023-12-13
The existing cafe info scraper no longer works because of Naver's laziness: only the first 100 pages of results are relevant, and everything after that is just repeated. This is a rework of that scraper.
@3ac64686feb4e872ab6bfc69beaaf1b8eafaea87
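The pagination problem described in the commit message (results past roughly page 100 only repeat) can be guarded against in a general way. The following is a minimal sketch, not the code from this commit: `collect_unique_posts`, `fetch_page`, and `PAGE_LIMIT` are hypothetical names, and `fetch_page` stands in for whatever function actually downloads and parses one result page. It stops paging as soon as a page adds nothing new or the assumed page cap is reached.

```python
from typing import Callable, Iterable, List, Set

# Assumption taken from the commit message: pages past 100 only repeat.
PAGE_LIMIT = 100


def collect_unique_posts(
    fetch_page: Callable[[int], Iterable[str]],
    start_page: int = 1,
    page_limit: int = PAGE_LIMIT,
) -> List[str]:
    """Collect post identifiers page by page, skipping repeats.

    `fetch_page` is a hypothetical callable that returns the post
    identifiers (for example URLs) found on the given page number.
    """
    seen: Set[str] = set()
    ordered: List[str] = []
    for page in range(start_page, page_limit + 1):
        added = 0
        for post_id in fetch_page(page):
            if post_id not in seen:
                seen.add(post_id)
                ordered.append(post_id)
                added += 1
        # A page that contributes nothing new means the results
        # have started repeating, so further requests are wasted.
        if added == 0:
            break
    return ordered


def fake_fetch(page: int) -> List[str]:
    """Stand-in fetcher: pages after 3 repeat the content of page 3."""
    effective = min(page, 3)
    return [f"post-{effective}-{i}" for i in range(2)]


if __name__ == "__main__":
    print(collect_unique_posts(fake_fetch))
```

Passing the fetcher in as a parameter keeps the stopping logic testable without any network access; a real fetcher would also need request throttling and error handling, which this sketch leaves out.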
DEPRECIATED_naver_cafe_info_gatherer.py (Renamed from naver_cafe_info_gatherer.py)
--- naver_cafe_info_gatherer.py
+++ DEPRECIATED_naver_cafe_info_gatherer.py
@@ -24,6 +24,8 @@
 HEADER = {"User-Agent": "Mozilla/119.0 (Windows NT 10.0; Win64; x64) Chrome/98.0.4758.102"}
 
 
+
+
 # Function to remove tags
 def remove_tags(html):
     # parse html content
cafe_scrapping_executor.py (Renamed from cafe_scarapping_executor.py)
--- cafe_scarapping_executor.py
+++ cafe_scrapping_executor.py
@@ -1,7 +1,7 @@
-from naver_blog_url_gatherer import blog_url_scrapper
+from DEPRECIATED_naver_cafe_info_gatherer import naver_cafe_scrapper
 
 if __name__ == "__main__":
-    # blog_url_scrapper("선산읍", "2022-01-01", "2023-10-31")
+    naver_cafe_scrapper("선산읍", 100, 101)
     # blog_url_scrapper("고아읍", "2022-01-01", "2023-10-31")
     # blog_url_scrapper("산동읍", "2022-01-01", "2023-10-31")
     # blog_url_scrapper("무을면", "2022-01-01", "2023-10-31")
@@ -25,4 +25,4 @@
     # blog_url_scrapper("인동동", "2022-01-01", "2023-10-31")
     # blog_url_scrapper("진미동", "2022-01-01", "2023-10-31")
     # blog_url_scrapper("양포동", "2022-01-01", "2023-10-31")
-    blog_url_scrapper("구미", "2022-01-01", "2023-10-31")
\ No newline at end of file
+    # blog_url_scrapper("구미", "2022-01-01", "2023-10-31")
\ No newline at end of file