October 25, 2012 at 4:54 pm #1315
But what I really don’t understand is this:
the work done by 1 VPN is equal to how many private proxies?
10, 20, 50 ?
because if 1 VPN could do what 20 or 50 semi-dedicated proxies do, then it would be cheaper 😛
Nic
October 25, 2012 at 10:47 pm #1316
You would actually need both. The VPN is there because you probably don’t have enough computers or computing power at home to run all the software. For example, Matt W. says that he spreads the building of Tier 1 over 2 weeks. This means that the content is being built a little at a time, automatically, 24/7 for 2 weeks. If you did this on a home machine, you would basically need a dedicated machine running 24/7 for the whole two weeks. Matt W. has said before that this is essentially what he has (two dedicated machines). If you tried to do this on your main computer, you could never turn it off, and it would run slowly with all this software running, which would make it difficult to do the other tasks you normally do on your main machine.
In addition, you would need enough bandwidth on your internet connection to handle all the traffic your dedicated machines are putting out. Since having a dedicated computer and enough bandwidth is expensive, a solution is to have a VPN, which is essentially a remotely located PC that you rent monthly. It is always on, always running, and connected to the internet with a lot of bandwidth. You load all of your software on that remote PC and run it from there.
The private proxies serve a different purpose. They ensure that your IP address doesn’t get banned by Google, or by the Web 2.0 sites to which you’ll be posting. You’ll need these in addition to your VPN (or dedicated computer, if you choose to go that route). Even though your VPN will have a lot of bandwidth, it will still only be running one IP address, and if you run all of your software on the VPN with only one IP address, THAT IP address will get banned in a hurry. To keep that from happening, you use the private proxies in all the software, whether it is running on a VPN or on a dedicated machine in your home.
October 26, 2012 at 8:07 am #1323
Countrystarr has got the terms VPN and VPS mixed up. He is talking about a VPS in his post above so ignore that part.
Both VPNs and proxies will allow you to hide your IP address, but really you need to be using proxies directly within the software, so that each tool has control over rotating the IP address as and when it needs to.
Also, if you use multi-threaded software, proxies allow you to run lots of threads/connections under different IPs, whereas a VPN can only ever work under 1 IP at any one time.
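To illustrate the point about threads and IPs, here is a minimal Python sketch of handing a different proxy to each job in round-robin order. Everything in it is an assumption for illustration: the proxy addresses are placeholder documentation-range IPs, `fake_fetch` makes no network request, and this is not how Scrapebox or any other tool mentioned in this thread is actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

# Hypothetical proxy addresses (documentation-range IPs, not real proxies).
PROXIES = ["203.0.113.10:8080", "203.0.113.11:8080", "203.0.113.12:8080"]


def assign_proxies(tasks, proxies):
    """Pair each task with a proxy, cycling through the list in order."""
    return list(zip(tasks, cycle(proxies)))


def fake_fetch(job):
    """Stand-in for a real request; only reports which proxy would be used."""
    keyword, proxy = job
    return f"{keyword} via {proxy}"


def run(tasks, proxies, threads=3):
    """Run every job on a thread pool, each job bound to its own proxy."""
    jobs = assign_proxies(tasks, proxies)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns results in the same order as the jobs.
        return list(pool.map(fake_fetch, jobs))


if __name__ == "__main__":
    for line in run(["keyword one", "keyword two", "keyword three"], PROXIES):
        print(line)
```

With a VPN alone, every one of those threads would leave from the same single IP, which is the limitation described above.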
If you’re doing SEO link building, you want proxies.
October 26, 2012 at 10:00 am #1330
Thanks Matt and Countrystarr,
I understand better now.
October 26, 2012 at 10:38 am #1332
What about using Proxygoblin (a proxy scraping software) instead of purchasing proxies? Has anyone tried it?
October 26, 2012 at 6:56 pm #1347
“What about using Proxygoblin (a proxy scraping software) instead of purchasing proxies? Has anyone tried it?”
Yes, that is OK, but it only scrapes and tests public proxies, which get hammered by everyone.
If you’re going to be scraping Google or creating accounts on websites, you need semi-dedicated private proxies as a minimum. Please make sure you avoid SquidProxies, though; I had consistent issues with them for a long time.
October 26, 2012 at 7:20 pm #1353
OOPS! Yes…I did mix up the two! Thanks for clarifying Matt.
So…just to clarify, you use the semi-dedicated proxies at buyproxies.org for scraping?
Thanks!
October 28, 2012 at 5:42 pm #1360
January 30, 2013 at 11:18 pm #2547
@Matthew – You stated above:
“If you’re going to be scraping Google or creating accounts on websites you need semi-dedicated private proxies”
You advise the use of semi-dedicated proxies when creating accounts. Is this because Web2.0 sites log IP addresses and, if they notice you created an account using an IP that later gets onto a spam blacklist, they might decide you’re a spammer and delete the account?
If that’s the logic, I follow and understand.
However, I don’t fully understand the need for private proxies when scraping Google; surely public proxies would do in this case? When scraping Google, we just need to ensure that we are not blocked from completing the scraping task. As long as Google has not yet banned the IP of the proxy I’m using when attempting to scrape google.com then I can complete the task I’m trying to perform. If that IP is subsequently banned by Google because of spam/scrapers, it doesn’t bother me because I have already done my scraping.
Do you think private or semi-dedicated proxies are needed for scraping Google (and perhaps other sites, too)? If so, please explain why public proxies should be avoided.
January 31, 2013 at 9:51 am #2554
Yes, that is the logic.
Yes, you can use public proxies to scrape Google; however, it is much slower and Scrapebox often stops scraping halfway through.
Whereas if you load it with some semi-dedicated proxies, you can turn up the threads and it will harvest all day, every day, and never stop until it runs out of keywords.
Given the cost of the proxies compared with the time I would have to spend harvesting public proxies and restarting scrapes, public proxies just don’t make sense.
January 31, 2013 at 9:31 pm #2560
Right, thanks for the explanation. I must admit I do get fed up of harvesting public proxies. I’ve been using the automator addon for Scrapebox, which helps, but I think it’s time to invest in some semi-dedis.
I had been avoiding using anything other than public proxies because I had assumed that my own activities would result in the IPs being banned after I’d been using them for scraping tasks for a while. It seems I have nothing to worry about on that front. I suppose it makes sense because I can never hope to match the level of spammy delight that public proxies are subjected to, even if I leave my software running 24/7?
Do you have any rules of thumb for the ratio of number of threads to number of proxies when scraping Google?
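As a quick sanity check on that ratio, the arithmetic can be sketched in a few lines of Python using the figures reported in the update to this post (10 proxies, 25 connections, approx 24 URLs/s, 80,000 URLs). The function names are made up for illustration, and the calculation assumes a constant harvest rate, which the real run almost certainly did not have.

```python
def connections_per_proxy(connections, proxies):
    """Average number of simultaneous connections each proxy carries."""
    return connections / proxies


def minutes_to_harvest(url_count, urls_per_second):
    """Time to collect url_count URLs, assuming the rate never changes."""
    return url_count / urls_per_second / 60


ratio = connections_per_proxy(25, 10)     # 2.5 connections per proxy
minutes = minutes_to_harvest(80_000, 24)  # roughly 55.6 minutes

if __name__ == "__main__":
    print(f"{ratio} connections per proxy")
    print(f"about {minutes:.1f} minutes to reach 80,000 URLs at 24 URLs/s")
```

The result of a little under an hour lines up with the "last hour or so" mentioned in the update.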
UPDATE: I grabbed 10 semi-dedis from buyproxies using your link posted above. I’ve been running Scrapebox with them for the last hour or so with 25 connections (seems to work out as 11 to Google and 14 to Bing at the moment). So far so good – I’ve scraped over 80,000 URLs and that number is going up at the rate of approx 24 URLs/s, so 10 semi-dedis and 25 connections seems fine so far.
February 1, 2013 at 9:50 am #2563
Can Anyone Explain This?
I’ve left Scrapebox harvesting overnight and now have a list of nearly 380,000 URLs; it is still harvesting. The URLs/s figure reported by Scrapebox for Google had dropped off this morning when I checked, so I wondered whether Google had banned my proxy IPs.
So I checked this by manually running a query on Google.com in Firefox using each of my proxies. Every time I ran a query, I had to enter a captcha, indicating the proxy was blocked. However, Scrapebox continues to chug along, scraping around 6 URLs/s from Google.
It looks as though my manual search queries using Firefox are blocked, but for some reason Scrapebox (using the very same proxies) can carry on scraping. I don’t understand how this is possible.
Has anyone else seen this before, or can you offer a potential explanation?
February 4, 2013 at 12:22 pm #2590
Glad to see you took the jump and are seeing the benefits of private proxies already.
It is likely that not every single proxy is blocked. They get blocked and unblocked all the time, and each will be on a different timer.
February 5, 2013 at 2:52 pm #2613
Just carrying on with the theme of proxies, a couple of further questions.
Does the software that you suggest we use rotate to another proxy if it finds one blocked?
Once a proxy is blocked do you have to write it off and plan on getting another set of proxies?
If a proxy does get unblocked how long on average does it take for this to happen?
In my mind I have this idea of buying 20 proxies but only uploading 10 and letting them run until they are no good, then deleting those and uploading another 10. I suppose the answers to the questions above will define if this is a sensible idea or not.
Ian
February 7, 2013 at 9:16 am #2636
1. Which software are you talking about? For the most part, they do.
2. No, just wait until it’s unblocked.
3. It depends on the service. If it’s Google, around 15-20 minutes, sometimes longer.
Just use all 20 proxies at once, all of the questions above will become irrelevant and you’ll never think about it.
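For anyone curious what "rotate and wait until it’s unblocked" might look like in code, here is a rough Python sketch of a proxy pool that skips a blocked proxy for a cooldown period and then puts it back into rotation. The class name, the method names, and the 20 minute default are assumptions taken loosely from the figures above; this is not the actual logic of Scrapebox or any other tool discussed in this thread.

```python
import time


class ProxyPool:
    """Rotate through proxies, resting any that were reported blocked."""

    def __init__(self, proxies, cooldown_seconds=20 * 60, clock=time.monotonic):
        self._proxies = list(proxies)
        self._cooldown = cooldown_seconds  # assumed rest time after a block
        self._clock = clock                # injectable so it can be tested
        self._blocked_until = {}           # proxy -> time it may be reused
        self._next = 0

    def get(self):
        """Return the next usable proxy, or None if all are resting."""
        now = self._clock()
        for _ in range(len(self._proxies)):
            proxy = self._proxies[self._next]
            self._next = (self._next + 1) % len(self._proxies)
            if self._blocked_until.get(proxy, float("-inf")) <= now:
                return proxy
        return None

    def mark_blocked(self, proxy):
        """Record that a proxy hit a block; it rests for the cooldown."""
        self._blocked_until[proxy] = self._clock() + self._cooldown


if __name__ == "__main__":
    pool = ProxyPool(["203.0.113.10:8080", "203.0.113.11:8080"])
    first = pool.get()
    pool.mark_blocked(first)  # pretend this proxy just got a captcha
    print("blocked:", first, "now using:", pool.get())
```

Because every proxy carries its own timer, loading all 20 at once simply means more of them are available at any given moment, which is why holding half back gains nothing.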