
SCM

shared this idea
3 years ago

Employees Involved

SCM

Admin

Statistics

7
Comments
1
Views

659
votes

Allow BING or BING Cache or Google/Google Cache selection for custom sources

Allow fine tune control of how SCM searches for content when using a custom source.

Tools like GSA SER scrape Google all the time and it works great. I think that to scrape Google successfully we need more options to set up, such as:

a) time between queries (e.g. 30 seconds)
b) proxies (already done)
c) choice of search engine (country)

If we get banned we change the proxy, but SCM could change proxies automatically during processing: SCM scrapes Google with proxy1, then uses the next proxy for the next query. It could also pause for some seconds after each query, or before using the same proxy again.

Let's say we have 2 proxies. SCM could process like this:

query1 - use proxy1
query2 - use proxy2
pause for a specific time, like 30 seconds, between query 1 and 3
query3 - use proxy1
pause for a specific time, like 30 seconds, between query 2 and 4
query4 - use proxy2
etc.
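The schedule above could be sketched roughly as follows. This is only an illustration of the idea, not how SCM works internally: `rotate_proxies`, its parameters, and the injectable `clock` and `sleep` arguments are all hypothetical names chosen for this example.

```python
import itertools
import time
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def rotate_proxies(
    queries: Iterable[str],
    proxies: List[str],
    cooldown: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, str]]:
    """Yield (query, proxy) pairs, cycling through the proxies in order.

    Before a proxy is reused, wait until `cooldown` seconds have passed
    since that same proxy was last handed out (hypothetical helper).
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    last_used: Dict[str, float] = {}
    # zip stops when the queries run out; cycle repeats the proxy list.
    for query, proxy in zip(queries, itertools.cycle(proxies)):
        if proxy in last_used:
            remaining = cooldown - (clock() - last_used[proxy])
            if remaining > 0:
                sleep(remaining)
        last_used[proxy] = clock()
        yield query, proxy
```

With two proxies and four queries this hands out proxy1, proxy2, proxy1, proxy2, and only pauses when a proxy would otherwise be reused less than `cooldown` seconds after its previous query. Tracking the last use per proxy, rather than sleeping after every query, means that adding more proxies shortens the total waiting time.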

Official Answer
Employee
SCM Posted 2 years ago

4.0.8.2

+ Time between queries is randomized between 20 and 30 seconds

+ Proxies are supported in custom site google searches

- Google cache has not been added and will be a new feature request

http://vote.seocontentmachine.com/responses/google-cache-scraping
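A randomized pause like the one described in this answer could look something like the sketch below. SCM's actual implementation is not shown anywhere in this thread, so the function name `random_pause`, its parameters, and the choice of a uniform distribution are assumptions made for illustration.

```python
import random
import time
from typing import Callable, Optional


def random_pause(
    low: float = 20.0,
    high: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> float:
    """Sleep for a random time between `low` and `high` seconds.

    Assumption: the delay is drawn uniformly; the real tool may differ.
    Returns the delay that was used.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    source = rng if rng is not None else random
    delay = source.uniform(low, high)
    sleep(delay)
    return delay
```

Calling `random_pause()` between two searches waits somewhere in the 20-30 second range. A varying delay is less regular than a fixed 30 second pause.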

Comments (7)

189

Not replace, but give a choice

Employee
236

Mirosław Lasok wrote:

Not replace, but give a choice
Vote item edited!

163

Example of using Google to search for articles:

Query: site:ezinearticles.com birthday

Bing results: 26

Google results: 53 700

Webcrawler.com results: ~740

And Google gives the most accurate results, so the articles will be much better!

Employee
150

Mirek wrote:

Example of using Google to search for articles:

Query: site:ezinearticles.com birthday

Bing results: 26

Google results: 53 700

Webcrawler.com results: ~740

And Google gives the most accurate results, so the articles will be much better!

Not entirely accurate, as SCM limits all Bing results to 25.

If you run "site:ezinearticles.com birthday" in Bing directly, it gives the full result set.

Finally, Bing is used because ezinearticles.com bans scrapers, so the Bing cache is used instead.

125

SCM wrote:

Not entirely accurate, as SCM limits all Bing results to 25.

If you run "site:ezinearticles.com birthday" in Bing directly, it gives the full result set.

Finally, Bing is used because ezinearticles.com bans scrapers, so the Bing cache is used instead.

Well, Google has a cache as well :) It also gives more accurate results for the chosen keywords. Google definitely has to become an option, even with proxies if needed; it does belong there.

People are comparing this feature with rival software and saying google scraping works much better. I don't want to hear anyone ever comparing other software with SCM! It is the top of the line software and it should stay that way!

Employee
133

Goce Ristov wrote:

Well, Google has a cache as well :) It also gives more accurate results for the chosen keywords. Google definitely has to become an option, even with proxies if needed; it does belong there.

People are comparing this feature with rival software and saying google scraping works much better. I don't want to hear anyone ever comparing other software with SCM! It is the top of the line software and it should stay that way!

OK, I will rush this update out. I just thought of a way we can get Google + proxies, but users be warned: Google bans are most likely to happen.

A temporary workaround is...

Set search to Google, then in the search field type in site:ezinearticles.com my keyword.

This will actually do a Google search for your keyword on that site.
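For anyone generating these searches in bulk, the workaround query can be assembled as in the sketch below. The helper names `site_query` and `as_query_string` are made up for this example, and the `q` parameter name is an assumption about how the query would be passed in a search URL; nothing here reflects SCM's own code.

```python
from urllib.parse import urlencode


def site_query(domain: str, keyword: str) -> str:
    """Return a search query restricted to one site (hypothetical helper)."""
    domain = domain.strip()
    keyword = keyword.strip()
    if not domain or not keyword:
        raise ValueError("domain and keyword are both required")
    return f"site:{domain} {keyword}"


def as_query_string(query: str) -> str:
    """URL-encode the query as a `q` parameter (parameter name is assumed)."""
    return urlencode({"q": query})


query = site_query("ezinearticles.com", "my keyword")
print(query)                   # site:ezinearticles.com my keyword
print(as_query_string(query))  # q=site%3Aezinearticles.com+my+keyword
```

The first string is what you would type into the search field by hand; the encoded form is only needed if the query ends up inside a URL.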

114

I don't want the headache with proxies. Getting Google-passed proxies is not an easy job today, and spending more money on private proxies is not possible for users. If there is any way to scrape Google without a proxy, I will go with that... maybe a long pause is a better option.
