Learning Nutch 1: Installation

1. Add the following to $NUTCH_HOME/conf/nutch-site.xml:

<property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.hbase.store.HBaseStore</value>
</property>

Add the following to $NUTCH_HOME/ivy/ivy.xml:

<dependency org="org.apache.gora" name="gora-hbase" rev="0.6.1" conf="*->default" />
<dependency org="org.apache.hbase" name="hbase-common" rev="0.98.9-hadoop2" conf="*->default" />

Add the following to $NUTCH_HOME/conf/gora.properties:
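The line to add is not shown in the original post. As a hedged sketch: Nutch 2.x setups that use HBase commonly set `gora.datastore.default` to the same `HBaseStore` class configured in nutch-site.xml. The snippet below assumes that setting and, purely for illustration, falls back to the current directory when `NUTCH_HOME` is unset.

```shell
# Assumption: NUTCH_HOME is the Nutch source root; the fallback to $PWD is for illustration only.
NUTCH_HOME="${NUTCH_HOME:-$PWD}"
props="$NUTCH_HOME/conf/gora.properties"
mkdir -p "$(dirname "$props")"

# Assumed setting (not confirmed by the original post): make HBase the default Gora datastore.
line='gora.datastore.default=org.apache.gora.hbase.store.HBaseStore'

# Append the line only if it is not already present, so re-running is harmless.
grep -qxF "$line" "$props" 2>/dev/null || printf '%s\n' "$line" >> "$props"
```

Re-running the snippet leaves a single copy of the line in the file.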

2. Run the build command
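The command itself is missing from the original post. For a Nutch 2.x source tree the usual build target is `ant runtime`, so the sketch below assumes that; it also assumes Apache Ant is installed and that `NUTCH_HOME` points at the source root (falling back to the current directory for illustration).

```shell
# Assumption: NUTCH_HOME is the root of a Nutch 2.x source checkout.
NUTCH_HOME="${NUTCH_HOME:-$PWD}"

if command -v ant >/dev/null 2>&1 && [ -f "$NUTCH_HOME/build.xml" ]; then
    # Assumed build target: Ivy resolves the dependencies declared in ivy/ivy.xml,
    # then the runtime/ directory is assembled.
    (cd "$NUTCH_HOME" && ant runtime)
else
    echo "ant or $NUTCH_HOME/build.xml not found; skipping build" >&2
fi
```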

Dependencies are downloaded automatically during the build. A successful build creates the directory $NUTCH_HOME/runtime/.
Add the following to $NUTCH_HOME/runtime/local/conf/nutch-site.xml:

<property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.hbase.store.HBaseStore</value>
</property>
<property>
    <name>http.agent.name</name>
    <value>Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36</value>
</property>
<property>
    <name>http.accept.language</name>
    <value>zh-CN,zh;q=0.8,en;q=0.6</value>
</property>
<property>
    <name>parser.character.encoding.default</name>
    <value>utf-8</value>
</property>

3. Test
Create a directory named urls, create a file seed.txt inside it, and add the line http://www.sina.com.cn
Run the crawl command
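The seed setup described above can be sketched as follows. The post does not say where the urls directory should live; this sketch assumes the current directory is the one from which bin/nutch will later be run (for example $NUTCH_HOME/runtime/local).

```shell
# Create the seed directory (no error if it already exists).
mkdir -p urls

# Write the single seed URL from the post, one URL per line.
printf '%s\n' 'http://www.sina.com.cn' > urls/seed.txt

# Show the result for a quick visual check.
cat urls/seed.txt
```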
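The exact command is missing from the original post. One plausible way to run a single crawl round with the Nutch 2.3.x command-line tools is sketched below. The choice of individual steps and the `-topN 10` value are assumptions, not what the author necessarily ran. The sketch assumes it is run from $NUTCH_HOME/runtime/local with urls/seed.txt already in place.

```shell
# Assumption: the current directory is $NUTCH_HOME/runtime/local.
NUTCH_BIN="bin/nutch"

if [ -x "$NUTCH_BIN" ]; then
    "$NUTCH_BIN" inject urls          # load the seed URLs into the store
    "$NUTCH_BIN" generate -topN 10    # select up to 10 URLs for fetching (assumed value)
    "$NUTCH_BIN" fetch -all           # fetch all generated batches
    "$NUTCH_BIN" parse -all           # parse the fetched pages
    "$NUTCH_BIN" updatedb -all        # write parse results and new links back to the store
else
    echo "$NUTCH_BIN not found; run this from the runtime/local directory" >&2
fi
```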

Log in to HBase to check the result:
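The original output is not preserved here. As a hedged sketch, the crawl result can be inspected by piping commands into the HBase shell. The table name `webpage` is an assumption: it is the name Nutch 2.x normally uses when no crawl ID is given.

```shell
# Assumption: the hbase launcher is on PATH and HBase is running.
if command -v hbase >/dev/null 2>&1; then
    # List all tables, then print one row of the assumed 'webpage' table.
    printf '%s\n' "list" "scan 'webpage', {LIMIT => 1}" | hbase shell
else
    echo "hbase not found on PATH; skipping check" >&2
fi
HBASE_TABLE="webpage"   # assumed table name, recorded for reference
```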

