java - Port Lucene 3.6.2 Analyzer to Lucene 5.5.0


For Lucene 3.6.2 I have the following analyzer:

public final class StandardAnalyzerV36 extends Analyzer {

    private Analyzer analyzer;

    public StandardAnalyzerV36() {
        analyzer = new StandardAnalyzer(Version.LUCENE_36);
    }

    public StandardAnalyzerV36(Set<?> stopWords) {
        analyzer = new StandardAnalyzer(Version.LUCENE_36, stopWords);
    }

    @Override
    public final TokenStream tokenStream(String fieldName, Reader reader) {
        // Strip HTML markup before the text reaches the tokenizer
        return analyzer.tokenStream(fieldName, new HTMLStripCharFilter(CharReader.get(reader)));
    }

    @Override
    public final TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
        return analyzer.reusableTokenStream(fieldName, reader);
    }
}

Could you please help me port this analyzer to Lucene 5.5.0? The Analyzer API changed in the new version.

Update

I have reimplemented the analyzer as follows:

public final class StandardAnalyzerV36 extends Analyzer {

    public static final CharArraySet STOP_WORDS_SET = StopAnalyzer.ENGLISH_STOP_WORDS_SET;

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        final ClassicTokenizer src = new ClassicTokenizer();
        TokenStream tok = new StandardFilter(src);
        tok = new StopFilter(new LowerCaseFilter(tok), STOP_WORDS_SET);
        return new TokenStreamComponents(src, tok);
    }

    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        // Strip HTML markup before the text reaches the tokenizer
        return new HTMLStripCharFilter(reader);
    }
}

But the tests fail on the following call:

tokens = LuceneUtils.tokenizeString(analyzer, "[{(rdbms)}]");

public static List<String> tokenizeString(Analyzer analyzer, String string) {
    List<String> result = new ArrayList<String>();
    try {
        TokenStream stream = analyzer.tokenStream(null, new StringReader(string));
        stream.reset();
        while (stream.incrementToken()) {
            result.add(stream.getAttribute(CharTermAttribute.class).toString());
        }
    } catch (IOException e) {
        // not thrown b/c we're using a string reader...
        throw new RuntimeException(e);
    }
    return result;
}

with the following exception:

java.lang.IllegalStateException: TokenStream contract violation: close() call missing
    at org.apache.lucene.analysis.Tokenizer.setReader(Tokenizer.java:90)
    at org.apache.lucene.analysis.Analyzer$TokenStreamComponents.setReader(Analyzer.java:315)
    at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:143)

What is wrong with my code?

I finally got it working:

public final class StandardAnalyzerV36 extends Analyzer {

    public static final CharArraySet STOP_WORDS_SET = StopAnalyzer.ENGLISH_STOP_WORDS_SET;

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        final ClassicTokenizer src = new ClassicTokenizer();
        TokenStream tok = new StandardFilter(src);
        tok = new StopFilter(new LowerCaseFilter(tok), STOP_WORDS_SET);
        return new TokenStreamComponents(src, tok);
    }

    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        // Strip HTML markup before the text reaches the tokenizer
        return new HTMLStripCharFilter(reader);
    }
}

public class LuceneUtils {

    public static List<String> tokenizeString(Analyzer analyzer, String string) {
        List<String> result = new ArrayList<String>();
        TokenStream stream = null;
        try {
            stream = analyzer.tokenStream(null, new StringReader(string));
            stream.reset();
            while (stream.incrementToken()) {
                result.add(stream.getAttribute(CharTermAttribute.class).toString());
            }
        } catch (IOException e) {
            // not thrown b/c we're using a string reader...
            throw new RuntimeException(e);
        } finally {
            // Always close the stream so the analyzer can reuse its components
            IOUtils.closeQuietly(stream);
        }
        return result;
    }
}
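
As far as I understand it, the exception came from the analyzer reusing its tokenizer between calls: a second tokenStream() call hands the cached tokenizer a new reader, and that is rejected if the previous stream was never closed. The sketch below is only a simplified model of that contract using plain JDK classes, not Lucene code. ModelTokenizer, ModelAnalyzer and secondCallOutcome are hypothetical names made up for illustration.

```java
import java.io.Closeable;
import java.io.Reader;
import java.io.StringReader;

// Simplified stand-in for the reuse contract; this is NOT Lucene code.
final class ContractDemo {

    /** Hypothetical model of a tokenizer that gets cached and reused. */
    static final class ModelTokenizer implements Closeable {
        private Reader input; // null means "closed, ready for a new reader"

        void setReader(Reader reader) {
            if (input != null) {
                // The previous consumer never called close()
                throw new IllegalStateException("contract violation: close() call missing");
            }
            input = reader;
        }

        @Override
        public void close() {
            input = null; // release the reader so the instance can be reused
        }
    }

    /** Hypothetical model of an analyzer that hands out one cached tokenizer. */
    static final class ModelAnalyzer {
        private final ModelTokenizer cached = new ModelTokenizer();

        ModelTokenizer tokenStream(String text) {
            cached.setReader(new StringReader(text));
            return cached;
        }
    }

    /** Requests two streams in a row and reports what the second request does. */
    static String secondCallOutcome(boolean closeFirst) {
        ModelAnalyzer analyzer = new ModelAnalyzer();
        ModelTokenizer first = analyzer.tokenStream("first");
        if (closeFirst) {
            first.close();
        }
        // try-with-resources closes the second stream even if consuming it fails
        try (ModelTokenizer second = analyzer.tokenStream("second")) {
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(secondCallOutcome(false)); // contract violation: close() call missing
        System.out.println(secondCallOutcome(true));  // ok
    }
}
```

Since Lucene's TokenStream implements Closeable, the same try-with-resources form should also work in tokenizeString in place of the finally block. The TokenStream javadoc additionally lists a call to end() after the last incrementToken() and before close(), which the code above leaves out.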
