java - Port Lucene 3.6.2 Analyzer to Lucene 5.5.0
For Lucene 3.6.2 I have the following analyzer:

    public final class StandardAnalyzerV36 extends Analyzer {

        private Analyzer analyzer;

        public StandardAnalyzerV36() {
            analyzer = new StandardAnalyzer(Version.LUCENE_36);
        }

        public StandardAnalyzerV36(Set<?> stopWords) {
            analyzer = new StandardAnalyzer(Version.LUCENE_36, stopWords);
        }

        @Override
        public final TokenStream tokenStream(String fieldName, Reader reader) {
            return analyzer.tokenStream(fieldName, new HTMLStripCharFilter(CharReader.get(reader)));
        }

        @Override
        public final TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
            return analyzer.reusableTokenStream(fieldName, reader);
        }
    }

Could you please help me port this analyzer to Lucene 5.5.0? The Analyzer interface changed in the new version.
Update:
I have reimplemented the analyzer as follows:

    public final class StandardAnalyzerV36 extends Analyzer {

        public static final CharArraySet STOP_WORDS_SET = StopAnalyzer.ENGLISH_STOP_WORDS_SET;

        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            final ClassicTokenizer src = new ClassicTokenizer();
            TokenStream tok = new StandardFilter(src);
            tok = new StopFilter(new LowerCaseFilter(tok), STOP_WORDS_SET);
            return new TokenStreamComponents(src, tok);
        }

        @Override
        protected Reader initReader(String fieldName, Reader reader) {
            // Strip HTML markup before the text reaches the tokenizer.
            return new HTMLStripCharFilter(reader);
        }
    }

But the tests fail on the following call:

    tokens = LuceneUtils.tokenizeString(analyzer, "[{(rdbms)}]");

    public static List<String> tokenizeString(Analyzer analyzer, String string) {
        List<String> result = new ArrayList<String>();
        try {
            TokenStream stream = analyzer.tokenStream(null, new StringReader(string));
            stream.reset();
            while (stream.incrementToken()) {
                result.add(stream.getAttribute(CharTermAttribute.class).toString());
            }
        } catch (IOException e) {
            // not thrown b/c we're using a string reader...
            throw new RuntimeException(e);
        }
        return result;
    }

with the following exception:

    java.lang.IllegalStateException: TokenStream contract violation: close() call missing
        at org.apache.lucene.analysis.Tokenizer.setReader(Tokenizer.java:90)
        at org.apache.lucene.analysis.Analyzer$TokenStreamComponents.setReader(Analyzer.java:315)
        at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:143)

What is wrong with my code?
Finally I got it working:

    public final class StandardAnalyzerV36 extends Analyzer {

        public static final CharArraySet STOP_WORDS_SET = StopAnalyzer.ENGLISH_STOP_WORDS_SET;

        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            final ClassicTokenizer src = new ClassicTokenizer();
            TokenStream tok = new StandardFilter(src);
            tok = new StopFilter(new LowerCaseFilter(tok), STOP_WORDS_SET);
            return new TokenStreamComponents(src, tok);
        }

        @Override
        protected Reader initReader(String fieldName, Reader reader) {
            return new HTMLStripCharFilter(reader);
        }
    }

    public class LuceneUtils {

        public static List<String> tokenizeString(Analyzer analyzer, String string) {
            List<String> result = new ArrayList<String>();
            TokenStream stream = null;
            try {
                stream = analyzer.tokenStream(null, new StringReader(string));
                stream.reset();
                while (stream.incrementToken()) {
                    result.add(stream.getAttribute(CharTermAttribute.class).toString());
                }
            } catch (IOException e) {
                // not thrown b/c we're using a string reader...
                throw new RuntimeException(e);
            } finally {
                // Always close the stream so the analyzer can reuse its components.
                IOUtils.closeQuietly(stream);
            }
            return result;
        }
    }

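As far as I can tell, the exception appears because the analyzer hands out the same cached tokenizer on every call, and `setReader` refuses to accept new input while the previous stream is still open. The sketch below models that contract in plain Java so it runs without Lucene on the classpath. `ModelTokenizer`, `ModelAnalyzer` and `ReuseContractDemo` are hypothetical names invented for illustration, not Lucene classes, and the whitespace splitting only stands in for real tokenization. It also shows try-with-resources as one possible alternative to an explicit `finally` block.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reusable tokenizer; NOT Lucene's actual class.
final class ModelTokenizer implements Closeable {
    private Reader input; // null means "closed and ready for reuse"

    void setReader(Reader reader) {
        if (input != null) {
            // Models the check reported in the stack trace from the question.
            throw new IllegalStateException("contract violation: close() call missing");
        }
        input = reader;
    }

    // Reads all input and splits on whitespace; enough for a demonstration.
    List<String> readTokens() {
        StringBuilder text = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> tokens = new ArrayList<>();
        for (String token : text.toString().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    @Override
    public void close() {
        if (input == null) {
            return;
        }
        try {
            input.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            input = null; // mark the tokenizer as reusable again
        }
    }
}

// Hypothetical stand-in for an analyzer that caches and reuses one tokenizer.
final class ModelAnalyzer {
    private final ModelTokenizer cached = new ModelTokenizer();

    ModelTokenizer tokenStream(Reader reader) {
        cached.setReader(reader);
        return cached;
    }
}

final class ReuseContractDemo {
    // Never closes the stream, so the next call on the same analyzer fails.
    static List<String> tokenizeLeaky(ModelAnalyzer analyzer, String s) {
        return analyzer.tokenStream(new StringReader(s)).readTokens();
    }

    // try-with-resources closes the stream even if reading throws.
    static List<String> tokenizeClosing(ModelAnalyzer analyzer, String s) {
        try (ModelTokenizer stream = analyzer.tokenStream(new StringReader(s))) {
            return stream.readTokens();
        }
    }

    public static void main(String[] args) {
        ModelAnalyzer good = new ModelAnalyzer();
        System.out.println(tokenizeClosing(good, "first call"));
        System.out.println(tokenizeClosing(good, "second call"));

        ModelAnalyzer bad = new ModelAnalyzer();
        tokenizeLeaky(bad, "first call");
        try {
            tokenizeLeaky(bad, "second call");
        } catch (IllegalStateException e) {
            System.out.println("Reuse failed: " + e.getMessage());
        }
    }
}
```

In real Lucene code the same idea would presumably be `try (TokenStream stream = analyzer.tokenStream(...)) { ... }`, since `TokenStream` is `Closeable`. The consumer workflow documented for `TokenStream` also calls `end()` after the last `incrementToken()` and before `close()`.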