Maxim

shared this idea
2 months ago

Employees Involved

SCM

Admin

23
votes

Search & Replace in "Custom Sites" and in "Article Downloader"

For example, we have an article with a list in the text:

  La la la
  <ul>
  <li>text bla bla 1</li>
  <li>text bla bla 2</li>
  <li>text bla bla 3</li>
  </ul>
  Ow ow ow

If we scrape this article, we get:

  La la la
  text bla bla 1 text bla bla 2 text bla bla 3
  Ow ow ow

That's no good.

We need a "Search and Replace" option.

We add:

  1. <li> to *
  2. </li> to \n

And as a result we have:

  La la la
  * text bla bla 1
  * text bla bla 2
  * text bla bla 3
  Ow ow ow

Sorry for bad English.
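
The two rules above can be sketched in a few lines of Python. This is only an illustration of the requested behaviour, not how the scraper works internally: the names RULES, apply_rules, strip_tags and clean_lines are made up for this example, and it assumes the article has been scraped as HTML first.

```python
import re

# Hypothetical rule list mirroring the request: literal search text -> replacement.
RULES = [
    ("<li>", "* "),
    ("</li>", "\n"),
]

def apply_rules(html, rules):
    """Run every search/replace rule over the raw HTML, in order."""
    for search, replacement in rules:
        html = html.replace(search, replacement)
    return html

def strip_tags(html):
    """Drop any tags the rules did not handle (naive, regex-based)."""
    return re.sub(r"<[^>]+>", "", html)

def clean_lines(text):
    """Trim each line and remove the empty lines left behind by removed tags."""
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())

article = """La la la
<ul>
<li>text bla bla 1</li>
<li>text bla bla 2</li>
<li>text bla bla 3</li>
</ul>
Ow ow ow"""

result = clean_lines(strip_tags(apply_rules(article, RULES)))
print(result)
```

The rules run before the remaining tags are stripped, so the list markers survive while <ul> and </ul> disappear.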

Under Consideration
Official Answer
Employee
SCM Posted 2 months ago

You need to scrape as HTML, then perform your transformations with find/replace.

I will see if there is an easy way to do this in the article downloader.

Comments (4)

13

But what if I use 50 sites to grab articles?

I would need to open each file, find out which site the article came from, and run the replacements separately for each article.

Example Article 1

  1. First, open the file with the article

  2. Second, replace <ul class="bla1">, <li class="list">, <font data-title="bla">

  3. Then manually clean out the other HTML tags

Example Article 2

  1. First, open the file with the article

  2. Second, replace <ul style="color: red" id="b1">, <li>, <br>

  3. Then manually clean out the other HTML tags

And I have to do this for every article.

What if there are 50 articles, or 250?

That takes too much of my time.

P.S. Sorry for my bad English.
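
To make the per-site problem concrete, here is a rough Python sketch of rules stored once per site and applied automatically to every scraped article. Everything in it is an assumption for illustration: the hostnames, the SITE_RULES table and the clean_article function are hypothetical and not part of the scraper.

```python
import re
from urllib.parse import urlparse

# Hypothetical per-site rules: hostname -> list of (regex pattern, replacement).
SITE_RULES = {
    "site-one.example": [
        (r'<li class="list">', "* "),
        (r"</li>", "\n"),
    ],
    "site-two.example": [
        (r"<li>", "* "),
        (r"</li>", "\n"),
        (r"<br>", "\n"),
    ],
}

def clean_article(url, html):
    """Apply the rules registered for the article's site, then strip leftover tags."""
    host = urlparse(url).hostname
    for pattern, replacement in SITE_RULES.get(host, []):
        html = re.sub(pattern, replacement, html)
    # Any tag without a rule (e.g. <ul ...> or <font ...>) is simply removed.
    text = re.sub(r"<[^>]+>", "", html)
    # Trim lines and drop the empty ones left behind by removed tags.
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())
```

With a table like this, each site's rules are written once, and 50 or 250 articles can be cleaned in a loop without opening any file by hand.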

Employee
15

I will have to add a search and replace module for you in the article downloader.

8

Hello.

When will you add "Search & Replace" to "Article Downloader"?

Employee
7

Maxim wrote:

Hello.

When will you add "Search & Replace" to "Article Downloader"?

It's on the to-do list.

Have you thought about scraping the article as HTML? That way you get the proper formatting.
