1. Add the following to $NUTCH_HOME/conf/nutch-site.xml:
<property>
  <name>storage.data.store.class</name>
  <value>org.apache.gora.hbase.store.HBaseStore</value>
</property>
Add the following to $NUTCH_HOME/ivy/ivy.xml:
<dependency org="org.apache.gora" name="gora-hbase" rev="0.6.1" conf="*->default" />
<dependency org="org.apache.hbase" name="hbase-common" rev="0.98.9-hadoop2" conf="*->default" />
Add the following to $NUTCH_HOME/conf/gora.properties:
gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
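Before building, it can help to confirm that the HBase store class really made it into the edited files. The following is only a minimal sketch: the helper name check_store is made up for illustration, and it only checks that the class name appears somewhere in the file, not that the XML is valid.

```shell
# check_store FILE
# Hypothetical helper: succeeds if FILE mentions the HBase store class.
check_store() {
  grep -q 'org.apache.gora.hbase.store.HBaseStore' "$1"
}
```

It could be used as `check_store "$NUTCH_HOME/conf/nutch-site.xml" && check_store "$NUTCH_HOME/conf/gora.properties" && echo ok`.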
2. Run the command
ant runtime
The build downloads the dependencies automatically. When it succeeds, it creates the directory $NUTCH_HOME/runtime/.
Add the following to $NUTCH_HOME/runtime/local/conf/nutch-site.xml:
<property>
  <name>storage.data.store.class</name>
  <value>org.apache.gora.hbase.store.HBaseStore</value>
</property>
<property>
  <name>http.agent.name</name>
  <value>Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36</value>
</property>
<property>
  <name>http.accept.language</name>
  <value>zh-CN,zh;q=0.8,en;q=0.6</value>
</property>
<property>
  <name>parser.character.encoding.default</name>
  <value>utf-8</value>
</property>
3. Test
Create a directory named urls, create a file seed.txt inside it, and add the line http://www.sina.com.cn
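The seed file can also be created from the shell. This is a small sketch that assumes it is run from the directory in which the nutch command will later be invoked, so that the relative path urls resolves correctly.

```shell
# Create the seed directory (no error if it already exists)
mkdir -p urls
# Write a single seed URL, one URL per line
printf '%s\n' 'http://www.sina.com.cn' > urls/seed.txt
```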
Run the command
nutch inject urls
Log in to HBase and check the result:
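One way to do the check is from the HBase shell. The sketch below rests on several assumptions: the hbase launcher is on PATH, HBase is running, and Nutch writes to a table named webpage (the usual default when no crawl id is given; with a crawl id the table name typically carries that id as a prefix). The function name check_webpage_table is made up for illustration.

```shell
# Hypothetical helper: list the HBase tables and print one row of the
# table that Nutch is assumed to write to ("webpage").
check_webpage_table() {
  if ! command -v hbase >/dev/null 2>&1; then
    echo "hbase not found on PATH" >&2
    return 1
  fi
  # Feed two HBase shell commands to the shell on standard input
  printf '%s\n' "list" "scan 'webpage', {LIMIT => 1}" | hbase shell
}
```

If the inject step worked, the scan should show a row for the seed URL.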