
Commit 0b51037

Merge remote-tracking branch 'origin/master'
2 parents: e4b8f45 + 9771cd6

File tree: 1 file changed


README.md

Lines changed: 10 additions & 10 deletions
@@ -66,7 +66,7 @@ arg | Long | Description
### As Extractor:
To extract just a single webpage to the terminal:

-```
+```shell
$ python torcrawl.py -u http://www.github.com
<!DOCTYPE html>
...
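
Since the extractor writes the page to stdout, plain shell redirection can also save it to a file; a minimal sketch (redirection is standard shell, not a torcrawl flag; the `-o` option shown in the next hunk is the tool's built-in way):

```shell
# Save the extracted page with shell redirection instead of -o
$ python torcrawl.py -u http://www.github.com > github.htm
```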
@@ -75,21 +75,21 @@ $ python torcrawl.py -u http://www.github.com

Extract into a file (github.htm) without the use of TOR:

-```
+```shell
$ python torcrawl.py -w -u http://www.github.com -o github.htm
## File created on /script/path/github.htm
```

Extract to the terminal and find only the line with google-analytics:

-```
+```shell
$ python torcrawl.py -u http://www.github.com | grep 'google-analytics'
<meta name="google-analytics" content="UA-*******-*">
```

Extract a set of webpages (imported from a file) to the terminal:

-```
+```shell
$ python torcrawl.py -i links.txt
...
```
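
The hunk above doesn't show what `links.txt` contains; a minimal sketch, assuming the crawler reads one URL per line (the URLs here are placeholders):

```shell
# Hypothetical links.txt: one URL per line (format assumed, not shown in this diff)
$ cat links.txt
http://www.github.com
http://www.github.com/about

# Feed the list to the extractor
$ python torcrawl.py -i links.txt
```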
@@ -99,7 +99,7 @@ $ python torcrawl.py -i links.txt
Crawl the links of the webpage without the use of TOR,
also show verbose output (really helpful):

-```
+```shell
$ python torcrawl.py -v -w -u http://www.github.com/ -c
## URL: http://www.github.com/
## Your IP: *.*.*.*
@@ -110,7 +110,7 @@ $ python torcrawl.py -v -w -u http://www.github.com/ -c

Crawl the webpage with depth 2 (2 clicks) and a 5-second pause before crawling the next page:

-```
+```shell
$ python torcrawl.py -v -u http://www.github.com/ -c -d 2 -p 5
## TOR is ready!
## URL: http://www.github.com/
@@ -123,7 +123,7 @@ $ python torcrawl.py -v -u http://www.github.com/ -c -d 2 -p 5
### As Both:
You can crawl a page and also extract the webpages into a folder with a single command:

-```
+```shell
$ python torcrawl.py -v -u http://www.github.com/ -c -d 2 -p 5 -e
## TOR is ready!
## URL: http://www.github.com/
@@ -136,9 +136,9 @@ $ python torcrawl.py -v -u http://www.github.com/ -c -d 2 -p 5 -e
```
***Note:*** *The default (and for now the only) file for the crawler's links is the `links.txt` document. Also, to extract right after the crawl you have to pass the `-e` argument.*

-With the same logic you can parse all these pages to grep (for example) and search for a specific text:
+Following the same logic, you can pipe all these pages to grep (for example) and search for specific text:

-```
+```shell
$ python torcrawl.py -u http://www.github.com/ -c -e | grep '</html>'
</html>
</html>
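
Because the output is ordinary stdout, any standard filter can follow the pipe; for instance, counting the matches with grep's `-c` flag (an extension of the example above, not taken from the README):

```shell
# Count closing </html> tags instead of printing them (-c here is standard grep)
$ python torcrawl.py -u http://www.github.com/ -c -e | grep -c '</html>'
2
```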
@@ -149,7 +149,7 @@ $ python torcrawl.py -u http://www.github.com/ -c -e | grep '</html>'
![peek 2018-12-08 16-11](https://user-images.githubusercontent.com/9204902/49687660-f72f8280-fb0e-11e8-981e-1bbeeac398cc.gif)

## Contributors:
-Feel free to contribute on this project! Just fork it, make any change on your fork and add a pull request on current branch! Any advice, help or questions will be great for me :)
+Feel free to contribute to this project! Just fork it, make your changes on your fork and open a pull request against the current branch! Any advice, help or questions would be appreciated :shipit:

## License:
“GPL” stands for “General Public License”. Using the GNU GPL will require that all the released improved versions be free software. [source & more](https://www.gnu.org/licenses/gpl-faq.html)
