Scraping proxy IPs with Python: how to use a proxy IP when crawling in Python

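‘1’ A Python crawler gets a 403 when scraping Douban movie reviews, even with a proxy IP and cookie set. What should I do?

If you are only scraping reviews, there is no need to log in.
The 403 comes back because the cookie you are sending is stale.
Drop the cookie and fetch the pages normally.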

‘2’ Scraping web pages with selenium and chrome in Python: how do I set a proxy IP?

# coding: utf-8
# Written for Python 2 (urllib2, cookielib) and an older Selenium API.
import os
import random
import re
import sys
import time
import socket
from socket import error as socket_error
import threading
import urllib2
import cookielib
from bs4 import BeautifulSoup

from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy, ProxyType
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

proxyFilePath = time.strftime("%Y%m%d")

def testSocket(ip, port):
    '''
    Socket connection test: checks whether the proxy ip:port accepts connections.
    '''
    print('Testing socket connection...')
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(10)
        sock.connect((ip, int(port)))
        #sock.send('meta')
        sock.close()
        # str(port) so the message works whether port is passed as int or str
        print(ip + ':' + str(port) + '--status:ok')
        return 1
    except socket_error as serr:  # connection error (refused, timed out, ...)
        sock.close()
        print(ip + ':' + str(port) + '--status:error--Connection refused.')
        return 0

def getDriver(httpProxy = '', type='Firefox'):
    # Uses older Selenium keyword arguments (firefox_profile, chrome_options,
    # executable_path) and PhantomJS; newer Selenium releases may not accept them.
    if type == 'Firefox':
        proxy = Proxy({
            'proxyType': ProxyType.MANUAL,
            'httpProxy': httpProxy,
            'ftpProxy': httpProxy,
            'sslProxy': httpProxy,
            'noProxy': '' # set this value as desired
        })
        firefox_profile = FirefoxProfile()
        #firefox_profile.add_extension("firefox_extensions/adblock_plus-2.5.1-sm+tb+an+fx.xpi")
        firefox_profile.add_extension("firefox_extensions/webdriver_element_locator-1.rev312-fx.xpi")
        firefox_profile.set_preference("browser.download.folderList",2)
        firefox_profile.set_preference("webdriver.load.strategy", "unstable")
        #driver = webdriver.Firefox(firefox_profile = firefox_profile, proxy=proxy, firefox_binary=FirefoxBinary('/usr/bin/firefox'))
        #driver = webdriver.Firefox(firefox_profile = firefox_profile, proxy=proxy, firefox_binary=FirefoxBinary("/cygdrive/c/Program\ Files\ (x86)/Mozilla\ Firefox/firefox.exe"))
        driver = webdriver.Firefox(firefox_profile = firefox_profile, proxy=proxy)
    elif type == 'PhantomJS': # PhantomJS
        service_args = [
            '--proxy='+httpProxy,
            '--proxy-type=http',
        ]
        webdriver.DesiredCapabilities.PHANTOMJS['phantomjs.page.customHeaders.Accept'] = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
        webdriver.DesiredCapabilities.PHANTOMJS['phantomjs.page.customHeaders.User-Agent'] = 'Mozilla/5.0 (X11; Windows x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.117 Safari/537.36'
        driver = webdriver.PhantomJS(executable_path='windows/phantomjs.exe', service_args=service_args)
    else: # Chrome
        chrome_options = webdriver.ChromeOptions()
        #chrome_options.add_extension('firefox_extensions/adblockplus_1_7_4.crx')
        chrome_options.add_argument('--proxy-server=%s' % httpProxy)
        driver = webdriver.Chrome(executable_path='windows/chromedriver.exe', chrome_options=chrome_options)
    return driver
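
The script above was written for Python 2 and an older Selenium release. As a rough sketch of the Chrome branch only, assuming Python 3 and Selenium 4 with a matching Chrome driver available, the same `--proxy-server` switch can be passed through the `options` keyword. The helper names `proxy_argument` and `make_chrome_driver` are made up for this illustration and are not part of the original script.

```python
def proxy_argument(proxy_addr, scheme="http"):
    """Build the --proxy-server switch for Chrome from a "host:port" string."""
    return "--proxy-server=%s://%s" % (scheme, proxy_addr)


def make_chrome_driver(proxy_addr):
    """Start Chrome through Selenium with its traffic routed via proxy_addr."""
    # Imported lazily so proxy_argument() can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_argument(proxy_addr))
    # Selenium 4 takes the options object through the `options` keyword.
    return webdriver.Chrome(options=options)
```

Calling `make_chrome_driver("127.0.0.1:8080")` (a placeholder address) would then open a browser whose requests go through that proxy, provided the proxy is reachable.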

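‘3’ How do I build an IP pool for a Python crawler?

Run your own proxy server and have it forward to an upstream proxy. Alternatively, have the crawler go through a proxy directly by setting its HTTP proxy parameter. Proxy pools are usually rented or found by scanning, and most scanned proxies turn out to be unusable. There are hundreds of ways to build a crawler; the usual advice is to start with Scrapy.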
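
Since most scanned proxies are unusable, a pool needs some way to drop the bad ones. The following is a minimal in-memory sketch of that idea, not a production design; the class name `ProxyPool`, the `max_failures` threshold, and the addresses are assumptions made for illustration.

```python
import random


class ProxyPool:
    """Hold proxy addresses, hand them out at random, and evict repeat failures."""

    def __init__(self, proxies, max_failures=3):
        # Map each proxy to its count of consecutive failures.
        self._failures = {proxy: 0 for proxy in proxies}
        self._max_failures = max_failures

    def get(self):
        """Return a random live proxy, or None when the pool is empty."""
        if not self._failures:
            return None
        return random.choice(list(self._failures))

    def report_failure(self, proxy):
        """Count a failed request; evict the proxy once it reaches max_failures."""
        if proxy not in self._failures:
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self._max_failures:
            del self._failures[proxy]

    def report_success(self, proxy):
        """Reset the failure count after a successful request."""
        if proxy in self._failures:
            self._failures[proxy] = 0

    def __len__(self):
        return len(self._failures)


# Placeholder addresses, for illustration only.
pool = ProxyPool(["10.0.0.1:8080", "10.0.0.2:3128"], max_failures=2)
proxy = pool.get()
```

The crawler would call `report_failure` or `report_success` after each request, so dead proxies leave the pool without a separate scanning pass.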

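‘4’ How can I use a proxy IP when crawling in Python?

There are two ways to use a proxy server with a Python crawler. ① Configure the proxy on the machine where the crawler is deployed, so that all outbound traffic from that machine, the crawler's included, goes through the proxy server. Search for "set up a proxy server on Windows" or "set up a proxy server on Linux"; the setting is usually under "Settings -> Network -> Connection -> Proxy".
② To have only Python use the proxy server, search for "python proxy config" or "configure a proxy server in Python"; some libraries support simple BM proxy server connections.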
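
One way to approach option ② without touching the system settings is to set the proxy environment variables inside the Python process itself. This is a small sketch using only the standard library; the address in `PROXY_URL` is a hypothetical local proxy chosen for illustration.

```python
import os
import urllib.request

# Hypothetical local proxy address, used only for illustration.
PROXY_URL = "http://127.0.0.1:8080"

# Variables set here affect only this Python process and its child processes.
os.environ["http_proxy"] = PROXY_URL
os.environ["https_proxy"] = PROXY_URL

# urllib reads these variables when it builds its default proxy settings.
detected = urllib.request.getproxies()
print(detected.get("http"), detected.get("https"))
```

The `requests` library also reads these variables by default, so the same setting usually covers crawlers built on it.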

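‘5’ My IP got restricted while learning to write Python crawlers. What should I do?

To get around IP restrictions on your crawler, you can use Zhima proxy IPs (芝麻代理ip).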

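‘6’ How do I set an HTTP proxy server for a Python crawler?

The fix is simple: use a proxy server.
When you crawl a site through a proxy server, the site sees the proxy server's IP address instead of your real one. Setting up a proxy server in a Python crawler is also straightforward.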

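‘7’ How to implement a crawler proxy IP pool in Python

Step 1: find IP resources

IP resources are scarce; demand outstrips supply, so dynamic IPs are generally used.

Free option: look on the web. A search engine turns up many sites that list IP resources, which you can then scrape.

Paid option: buy IP resources from Zhima IP (芝麻ip), extract them, and build the IP pool.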
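
Whichever source is used, the collected text has to be turned into a clean list of addresses before it can feed a pool. The sketch below assumes the scraped text contains entries in `ip:port` form; the function name `extract_proxies` and the sample addresses are made up for illustration, and sites that put the IP and port in separate table cells would need different parsing.

```python
import re

# Matches "ip:port" pairs such as 1.2.3.4:8080 inside arbitrary text.
PROXY_RE = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3}):(\d{2,5})\b")


def extract_proxies(text):
    """Return unique, plausibly valid "ip:port" strings in order of appearance."""
    seen = set()
    result = []
    for ip, port in PROXY_RE.findall(text):
        octets_ok = all(0 <= int(part) <= 255 for part in ip.split("."))
        port_ok = 0 < int(port) <= 65535
        candidate = "%s:%s" % (ip, port)
        if octets_ok and port_ok and candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result


sample = "1.2.3.4:8080 | 1.2.3.4:8080 | 999.1.1.1:80 | 10.0.0.5:3128"
proxies = extract_proxies(sample)
```

This only checks that an address is well formed. Whether a proxy actually works still has to be tested with a real connection.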

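‘8’ How important are proxy IPs for a Python crawler?

Hmm, I have been crawling with proxy IPs for a long time and have never run into this problem. Perhaps the availability rate of your proxy IPs is low, or they are not actually high-anonymity proxies and the site is working out your local IP from some pattern. I have always used 618IP代理 HTTP without trouble; crawling is fast and stable. Try eliminating the possible causes one by one so you can fix the problem quickly.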

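‘9’ How does a Python crawler fetch through a proxy server?

If the second of your two requests works, simply add the proxy to all of them; some sites probably restrict crawlers based on their request headers. You could decide case by case from the data that urlopen returns, but this is not recommended because it adds cost.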

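‘10’ How to set up your own proxy server on a cloud server for a Python crawler and test the proxy with requests

1. Introduction
Crawling a site repeatedly from the same IP will eventually get that IP blocked by the site's server. This is when a proxy server is needed: requests go out through the proxy, so the site sees the proxy's IP rather than yours.

Many proxy server addresses can be found at http://yum.iqianyue.com.com/proxy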

2. Usage
# -*- coding: utf-8 -*-
from urllib import request


def use_proxy(proxy_addr, url):
    # Route plain-HTTP requests through the given proxy ("host:port").
    proxy = request.ProxyHandler({'http': proxy_addr})
    opener = request.build_opener(proxy)
    # Install globally so request.urlopen() uses this opener from now on.
    request.install_opener(opener)
    data = request.urlopen(url).read().decode('utf-8')
    return data


data = use_proxy("114.115.182.59:128", "http://www..com")
print(len(data))
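
The heading mentions requests, while the snippet above uses urllib. A rough equivalent with the third-party `requests` package might look like the sketch below, assuming `requests` is installed. The helper names `build_proxies` and `fetch_via_proxy` are made up for illustration.

```python
def build_proxies(proxy_addr):
    """Map both URL schemes to the same HTTP proxy, in the form requests expects."""
    proxy_url = "http://" + proxy_addr
    return {"http": proxy_url, "https": proxy_url}


def fetch_via_proxy(url, proxy_addr, timeout=10):
    """Fetch url through proxy_addr; return the body text, or None on failure."""
    # Third-party package, imported lazily so build_proxies() works without it.
    import requests

    try:
        response = requests.get(
            url, proxies=build_proxies(proxy_addr), timeout=timeout
        )
        response.raise_for_status()
        return response.text
    except requests.RequestException:
        # Covers refused connections, timeouts, and HTTP error statuses.
        return None
```

Pointing `fetch_via_proxy` at a page that echoes the caller's IP is a simple way to confirm that the site sees the proxy's address. A `None` result means the proxy should be treated as unusable.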
