
How do I set a proxy IP when scraping web pages with Selenium and Chrome in Python?

Posted by a user on 2022-04-05 23:41


2 answers

懂视网, 2022-04-06 04:03

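To use a proxy IP in Python: first create a proxy handler object and build a custom opener from it; then install that opener, so that every later urlopen call goes through the proxy address.

How to set a proxy IP in Python: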

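Method 1:

First create the proxy handler object (this needs import urllib.request)

proxy_support = urllib.request.ProxyHandler({'https':'117.64.149.137:808'})

Build a custom opener object from it

opener = urllib.request.build_opener(proxy_support)

Install the opener; from now on every urlopen call uses this proxy address

urllib.request.install_opener(opener)

Any request sent after that goes through the proxy

html = urllib.request.urlopen('xxxxxxxxxx').read()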

Method 2:

First create the proxy handler object

proxy_support = urllib.request.ProxyHandler({'https':'117.64.149.137:808'})

Build a custom opener object from it

opener = urllib.request.build_opener(proxy_support)

Here the opener can send the request directly, without being installed

html = opener.open('xxxxxxxxx').read()
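
Both methods register the proxy only under the 'https' key. ProxyHandler chooses a proxy by the scheme of the URL being requested, so with that mapping a plain http:// URL would be fetched directly. A minimal sketch, assuming the same proxy should serve both schemes; build_proxy_opener is a hypothetical helper name, and the address is the one quoted above, which may no longer be live:

```python
import urllib.request


def build_proxy_opener(address, schemes=("http", "https")):
    """Return an opener that routes every listed URL scheme through `address`.

    `address` is a "host:port" string. build_proxy_opener is a hypothetical
    helper name, not part of urllib.
    """
    # ProxyHandler picks a proxy by the scheme of the requested URL,
    # so each scheme that should be proxied needs its own entry.
    proxies = {scheme: address for scheme in schemes}
    handler = urllib.request.ProxyHandler(proxies)
    return urllib.request.build_opener(handler)


opener = build_proxy_opener("117.64.149.137:808")
# Method 2 style: html = opener.open(url).read()
# Method 1 style: urllib.request.install_opener(opener), then urlopen(url)
```

The returned opener works with either method: call opener.open() on it directly, or pass it to urllib.request.install_opener() to make it the default for urlopen.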

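Sample code:

import urllib.request

# Send a browser User-Agent with the request, because some sites return 403
# as soon as they see a crawler's default identifier.
# The request submits no data, so no data argument is given.
url = 'https://whatismyipaddress.com/'
header = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}
req = urllib.request.Request(url, headers=header)

# Switch between using the proxy and going back to a direct connection.
# USE_PROXY is the switch: set it to False to bypass the proxy.
USE_PROXY = True
use_proxy = urllib.request.ProxyHandler({'https':'117.64.149.137:808'})
# An empty dict disables proxying; ProxyHandler() with no argument would
# instead pick up any proxy configured in the environment.
null_proxy = urllib.request.ProxyHandler({})
if USE_PROXY:
    opener = urllib.request.build_opener(use_proxy)
else:
    opener = urllib.request.build_opener(null_proxy)

# Install the opener chosen by the switch above, with or without the proxy.
urllib.request.install_opener(opener)

# Fetch the response.
# html = opener.open(req).read() would work as well.
html = urllib.request.urlopen(req).read()

# The page is too long to read comfortably in the console, and the body is
# returned as bytes, so write it straight to a file and open it as a web page.
with open('E:/whatismyip.html', 'wb') as file:
    file.write(html)
    print('OK')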
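
The on/off switch in the sample can also be expressed as a function argument, which avoids editing the script to change modes and avoids install_opener's effect on every later urlopen call. A minimal sketch under those assumptions; fetch and USER_AGENT are hypothetical names, and the timeout value is an illustrative choice:

```python
import urllib.request

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
)


def fetch(url, proxy=None, timeout=10.0):
    """Download `url` and return the raw response bytes.

    proxy: "host:port" string to route the request through, or None to
    connect directly. fetch is a hypothetical helper, not a urllib function.
    """
    if proxy:
        proxies = {"http": proxy, "https": proxy}
    else:
        # An empty dict disables proxying; ProxyHandler() with no argument
        # would instead read proxy settings from the environment.
        proxies = {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    # Calling opener.open() directly leaves the global urlopen() untouched.
    with opener.open(request, timeout=timeout) as response:
        return response.read()
```

Calling fetch(url, proxy='117.64.149.137:808') corresponds to the proxied branch of the sample, and fetch(url) to the direct branch.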


Community user, 2022-04-06 01:11

# coding: utf-8
# Python 3 syntax. The webdriver calls below use the Selenium 3.x API
# (firefox_profile, executable_path and chrome_options keyword arguments,
# and the PhantomJS driver).
import socket
import time
from socket import error as socket_error

from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy, ProxyType
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

proxyFilePath = time.strftime("%Y%m%d")


def testSocket(ip, port):
    '''
    Socket connection test: checks whether the proxy ip and port accept a connection.
    '''
    print('Testing socket connection...')
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(10)
        sock.connect((ip, int(port)))
        #sock.send('meta')
        sock.close()
        print('%s:%s--status:ok' % (ip, port))
        return 1
    except socket_error as serr:  # connection error
        sock.close()
        print('%s:%s--status:error--Connection refused.' % (ip, port))
        return 0


def getDriver(httpProxy='', type='Firefox'):
    '''
    Return a webdriver for the given browser type, routed through httpProxy ("ip:port").
    '''
    if type == 'Firefox':
        proxy = Proxy({
            'proxyType': ProxyType.MANUAL,
            'httpProxy': httpProxy,
            'ftpProxy': httpProxy,
            'sslProxy': httpProxy,
            'noProxy': ''  # set this value as desired
        })
        firefox_profile = FirefoxProfile()
        #firefox_profile.add_extension("firefox_extensions/adblock_plus-2.5.1-sm+tb+an+fx.xpi")
        firefox_profile.add_extension("firefox_extensions/webdriver_element_locator-1.rev312-fx.xpi")
        firefox_profile.set_preference("browser.download.folderList", 2)
        firefox_profile.set_preference("webdriver.load.strategy", "unstable")
        #driver = webdriver.Firefox(firefox_profile = firefox_profile, proxy=proxy, firefox_binary=FirefoxBinary('/usr/bin/firefox'))
        #driver = webdriver.Firefox(firefox_profile = firefox_profile, proxy=proxy, firefox_binary=FirefoxBinary("/cygdrive/c/Program\ Files\ (x86)/Mozilla\ Firefox/firefox.exe"))
        driver = webdriver.Firefox(firefox_profile=firefox_profile, proxy=proxy)
    elif type == 'PhantomJS':  # PhantomJS
        service_args = [
            '--proxy=' + httpProxy,
            '--proxy-type=http',
        ]
        webdriver.DesiredCapabilities.PHANTOMJS['phantomjs.page.customHeaders.Accept'] = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
        webdriver.DesiredCapabilities.PHANTOMJS['phantomjs.page.customHeaders.User-Agent'] = 'Mozilla/5.0 (X11; Windows x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.117 Safari/537.36'
        driver = webdriver.PhantomJS(executable_path='windows/phantomjs.exe', service_args=service_args)
    else:  # Chrome: the proxy is passed as a command-line switch
        chrome_options = webdriver.ChromeOptions()
        #chrome_options.add_extension('firefox_extensions/adblockplus_1_7_4.crx')
        chrome_options.add_argument('--proxy-server=%s' % httpProxy)
        driver = webdriver.Chrome(executable_path='windows/chromedriver.exe', chrome_options=chrome_options)
    return driver
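
Since the question is specifically about Chrome, the Chrome branch above can be isolated. The sketch below assumes a Selenium 4 installation that can locate a Chrome driver on its own, which is why it passes options= instead of the older executable_path and chrome_options keyword arguments. The function names are hypothetical, and the reachability check is a Python 3 variant of testSocket:

```python
import socket


def proxy_is_reachable(host, port, timeout=10.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


def chrome_proxy_arguments(proxy):
    """Build the Chrome command-line switches for a "host:port" proxy."""
    return ["--proxy-server=%s" % proxy]


def get_chrome_driver(proxy):
    """Start Chrome through Selenium with its traffic sent via `proxy`."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in chrome_proxy_arguments(proxy):
        options.add_argument(argument)
    # Assumption: Selenium 4 style call; the driver binary is found automatically.
    return webdriver.Chrome(options=options)
```

A typical call sequence would be to check proxy_is_reachable('117.64.149.137', 808) first, then call get_chrome_driver('117.64.149.137:808'), load a page such as https://whatismyipaddress.com/ with driver.get(), and finish with driver.quit().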
