Solr has different kinds of transformers, which are used while importing data into the search engine. For the whole list of built-in transformers, you can take a look at the transformer wiki.
For the focus of this post, we will talk about RegexTransformer, which uses patterns to process the incoming data. It is quite useful when you need to process the data before indexing, but it has a shortcoming: multivalued fields are not supported.
For instance, assume that you have a field for storing a document's language. For single values, it is pretty straightforward. But what if we need multiple values for this field? You might have some serialized format in your db, and you may need to extract the language-related info. For instance, you may have something like
t:9:{u:5;s:7:"fr";k:1;y:2:"jp";k:2;y:2:"bg";}
With the built-in transformer, you can get "fr,jp,bg" into a single field. But what if you need to put each of these values into a multiValued field?
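To make the goal concrete, here is a minimal sketch in plain Java, independent of Solr, of what "one value per match" means for the sample above. The class name `LanguageExtractor` and the pattern are assumptions made up for illustration; this is not the code of the extension.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LanguageExtractor {
    // Assumed pattern: a quoted two-letter code, captured in group 1.
    private static final Pattern LANG = Pattern.compile("\"([a-z]{2})\"");

    // Collect every match as its own value instead of joining them.
    public static List<String> extract(String serialized) {
        List<String> values = new ArrayList<>();
        Matcher m = LANG.matcher(serialized);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String raw = "t:9:{u:5;s:7:\"fr\";k:1;y:2:\"jp\";k:2;y:2:\"bg\";}";
        System.out.println(extract(raw)); // prints [fr, jp, bg]
    }
}
```

Each element of the returned list would become one value of the multiValued field.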
The solution is to extend RegexTransformer to support it ;) You can find the code of the extension
Once you get the code and build the jar, simply add the attribute below to the entity tag in your import script:
transformer="com.hcetavaj.MultiValuedRegex"
As this is an extension to RegexTransformer, all of its properties are supported. To enable multiValued field support, simply add the attribute below to your field mapping:
multiRegex="pattern comes here"
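How such an attribute could be applied to a row can be sketched roughly as follows. This is a stand-in written against plain `Map`s rather than the Solr classes, so it runs without the Solr jar; it is an assumption about the general approach, not the actual `com.hcetavaj.MultiValuedRegex` implementation. The names `MultiRegexSketch`, `lang_raw` and `language` are hypothetical, while `column` and `sourceColName` mirror the usual field attributes of RegexTransformer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiRegexSketch {
    // For every field definition that carries a multiRegex attribute,
    // replace the target column with a list holding one entry per match.
    public static Map<String, Object> transformRow(
            Map<String, Object> row, List<Map<String, String>> fields) {
        for (Map<String, String> field : fields) {
            String expr = field.get("multiRegex");
            if (expr == null) {
                continue; // field does not ask for multi-valued extraction
            }
            String column = field.get("column");
            // Fall back to the column itself when no source column is given.
            String source = field.getOrDefault("sourceColName", column);
            Object raw = row.get(source);
            if (raw == null) {
                continue;
            }
            List<String> values = new ArrayList<>();
            Matcher m = Pattern.compile(expr).matcher(raw.toString());
            while (m.find()) {
                // Use group 1 when the pattern has one, else the whole match.
                values.add(m.groupCount() > 0 ? m.group(1) : m.group());
            }
            row.put(column, values);
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("lang_raw", "t:9:{u:5;s:7:\"fr\";k:1;y:2:\"jp\";k:2;y:2:\"bg\";}");

        Map<String, String> field = new HashMap<>();
        field.put("column", "language");
        field.put("sourceColName", "lang_raw");
        field.put("multiRegex", "\"([a-z]{2})\"");

        List<Map<String, String>> fields = new ArrayList<>();
        fields.add(field);

        System.out.println(transformRow(row, fields).get("language"));
    }
}
```

In a real extension the same loop would presumably live in an overridden row-transform method and read the field definitions from the import context; that wiring is left out here because it depends on the Solr version in use.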
Then simply start importing and watch ;)