TheGypsy

shared this question
1 year ago

Is it possible somehow to scrape only specific content from pages?

I would like to get the content from a div omitting everything else from the page:

<p>some CONTENT outside div</p>
<h1>some CONTENT outside div</h1>
<div class="container">
  <p>CONTENT1</p>
  <h1>CONTENT2</h1>
  <h2>CONTENT3</h2>
</div>

I would like to get:

CONTENT1

CONTENT2

CONTENT3

Is this possible with SCM?

Thanks!

Official Answer
SCM (Employee), posted 1 year ago

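When you add the page as a custom source, there is an XPath box. Enter //div in it. Note that //div matches every div on the page; to match only the div from your example, a more specific expression such as //div[@class="container"] should do it.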

That way SCM will only take content inside the div tag.

https://seocontentmachine.com/how-to-scrape-content-from-any-website-using-css-or-xpath/
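
For anyone curious what an XPath like this actually selects, here is a minimal sketch outside of SCM, using only Python's standard library. It is not how SCM works internally; the function name extract_container_text is made up for illustration, and the page fragment from the question is wrapped in a root element so it parses as XML. ElementTree supports only a subset of XPath, and real-world HTML that is not well-formed would need a proper HTML parser instead.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, wrapped in a single root element
# so that it can be parsed as well-formed XML.
PAGE = """
<html>
  <p>some CONTENT outside div</p>
  <h1>some CONTENT outside div</h1>
  <div class="container">
    <p>CONTENT1</p>
    <h1>CONTENT2</h1>
    <h2>CONTENT3</h2>
  </div>
</html>
"""


def extract_container_text(markup, class_name="container"):
    """Return the non-empty text pieces found inside matching divs.

    Hypothetical helper: class_name is matched exactly against the
    whole class attribute, as @class='...' does in XPath.
    """
    root = ET.fromstring(markup)
    pieces = []
    # ".//div[@class='...']" finds div descendants whose class attribute
    # equals class_name; everything outside those divs is ignored.
    for div in root.findall(f".//div[@class='{class_name}']"):
        for text in div.itertext():
            text = text.strip()
            if text:  # skip whitespace between tags
                pieces.append(text)
    return pieces


if __name__ == "__main__":
    for piece in extract_container_text(PAGE):
        print(piece)
```

With the fragment above, the two "some CONTENT outside div" lines are skipped because they sit outside the matched div.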

Comments (1)

That's awesome! You have just saved me from learning Python. Again.
