Sackgesicht VIP
Total posts: 1,636
20 Oct 2013 22:28

As of now, the auto detection seems to work like this:

If one search word is entered, the mode is "%LIKE%"; if more words are entered, Cobalt uses fulltext boolean mode.

To me, this is not the best possible approach.

I would suggest the following scheme:

Case 1:

One search word is entered --> example: "Hotel"

The search starts with a fulltext search in boolean mode.

If no result is returned, it searches again in fulltext boolean mode, now with an asterisk as the wildcard operator --> "Hotel*"

If there is still no result, it shifts to a "%LIKE%" search as a last resort.

The search indicator ("search worning" in Cobalt terms) could have a small optional tooltip showing which search type was used to get the result.
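The single-word cascade above could be sketched like this. It assumes a hypothetical `count(sql, param)` helper that runs a parameterised COUNT query through a MySQL DB-API driver using `%s` placeholders; the table and column names (`#__js_res_record`, `fieldsdata`) come from this thread, and `#__` is Joomla's table-prefix placeholder. The returned mode name is what the tooltip could show. This is an illustration of the proposal, not Cobalt's actual code.

```python
from typing import Callable, Optional, Tuple

# Table/column names from the thread; "#__" is Joomla's table-prefix placeholder.
FULLTEXT_COUNT = (
    "SELECT COUNT(*) FROM #__js_res_record "
    "WHERE MATCH (fieldsdata) AGAINST (%s IN BOOLEAN MODE)"
)
LIKE_COUNT = "SELECT COUNT(*) FROM #__js_res_record WHERE fieldsdata LIKE %s"


def escape_like(word: str) -> str:
    """Escape LIKE wildcards so the word is matched literally."""
    return word.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def auto_detect_single_word(
    word: str, count: Callable[[str, str], int]
) -> Tuple[Optional[str], int]:
    """Try fulltext, then fulltext with a trailing "*", then %LIKE%.

    `count(sql, param)` is a hypothetical helper that executes a
    parameterised COUNT query and returns the number of matching rows.
    Returns (mode, count) for the first mode with results, else (None, 0).
    """
    attempts = [
        ("fulltext", FULLTEXT_COUNT, word),
        ("fulltext-wildcard", FULLTEXT_COUNT, word + "*"),
        ("like", LIKE_COUNT, "%" + escape_like(word) + "%"),
    ]
    for mode, sql, param in attempts:
        n = count(sql, param)
        if n > 0:
            return mode, n  # the mode name can feed the indicator tooltip
    return None, 0
```

With a stub such as `lambda sql, p: {"Hotel*": 12}.get(p, 0)`, the call `auto_detect_single_word("Hotel", stub)` returns `("fulltext-wildcard", 12)`: the plain fulltext attempt finds nothing, the wildcard attempt succeeds, and the %LIKE% query is never run.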

Real world example:

I have a section with 38K records.

In fulltext mode, the query counting the number of results takes:

0.3 ms (0 results)

1.05 ms (30 results)

7.19 ms (637 results)

If we follow this process, the overhead compared with the "%LIKE%" search would be minimal: on average maybe around 1-3 ms.

It only needs to run the count query (see also another search query optimization here).

Case 2:

Two or more search words are entered, like "Hotel Garden".

As it is now, the search returns ALL records with "Hotel" plus ALL records with "Garden", i.e. every record that contains either word.

Based on normal search behaviour, I presume the user would expect only the records that contain both "Hotel" AND "Garden" --> "+Hotel +Garden"

Therefore, the automatic detection should start with a boolean fulltext query like "+Hotel +Garden".

The search text indicator box can then contain "Hotel AND Garden"
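The "+Hotel +Garden" query and the "Hotel AND Garden" indicator text could be produced together. A minimal sketch, assuming words are separated by whitespace; characters that act as operators in MySQL boolean mode are stripped so user input cannot change the meaning of the query (which means "e-mail" would become "email"). MySQL's fulltext parser also ignores stopwords and words below the minimum token length, which a real implementation would still have to account for.

```python
import re

# Characters that act as operators in MySQL boolean-mode fulltext search.
BOOLEAN_OPERATORS = re.compile(r'[+\-<>()~*"@]')


def to_required_terms(text: str) -> tuple[str, str]:
    """Return (boolean-mode query, indicator label) requiring every word.

    "Hotel Garden" -> ("+Hotel +Garden", "Hotel AND Garden")
    """
    words = [BOOLEAN_OPERATORS.sub("", w) for w in text.split()]
    words = [w for w in words if w]  # drop "words" that were only operators
    query = " ".join("+" + w for w in words)
    label = " AND ".join(words)
    return query, label
```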

Maybe the "Auto detection" can then be renamed to "Intelligent search".

Another enhancement could be an additional (optional) icon that lets the user choose the search mode explicitly.

More to follow ...

Last modified: 02 March 2014


Sackgesicht VIP
Total posts: 1,636
20 Oct 2013 22:35

Search result highlighting (optional for the user)

Autocomplete for a wildcard search ...
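Optional result highlighting could look roughly like the sketch below; the function name and the use of `<mark>` tags are assumptions, not an existing Cobalt feature. The record text is HTML-escaped before highlighting so its content cannot inject markup. A production version would also need to avoid matching inside HTML entities such as `&amp;`.

```python
import html
import re


def highlight(text: str, terms: list[str]) -> str:
    """Return HTML-escaped text with case-insensitive term matches in <mark>."""
    escaped = html.escape(text)  # escape first so record content cannot inject markup
    words = sorted((re.escape(html.escape(t)) for t in terms if t), key=len, reverse=True)
    if not words:
        return escaped
    # Longer alternatives first, so "Hotels" is preferred over "Hotel".
    pattern = re.compile("|".join(words), re.IGNORECASE)
    return pattern.sub(lambda m: f"<mark>{m.group(0)}</mark>", escaped)
```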


Jeff VIP
Total posts: 745
24 Oct 2013 20:36

Following this very interesting topic.

I remember the days of the Altavista search engine, with its many boolean search options, which actually made things more complicated for the average end user.

I think it was Google who made searching much more intelligent and easier to use.

Just type in what you are looking for and thanks to their complex algorithms you "magically" find the best results.

The problem with human (written) language is that people are not only error-prone and inconsistent in their writing, but can also find it difficult to formulate the right question.

When I build a directory, I always define sets of predefined keyword attributes so that records can be found. These keywords should be mandatory and can be selected with check boxes, radio buttons, select lists and tags. But some things are difficult to dissect into keywords, such as titles, summaries and descriptions. That's what I think the text search field should be used for.

Global & quick search: text search

Specific search: text search + attribute search

In your example:

Find a hotel > attribute search

Find "Hotel Garden" > attribute search (hotel) + text search (Hotel Garden)
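The split between attribute search and text search could be sketched as follows. The `Record` shape (one `category` keyword attribute plus a `title`) is hypothetical, and the text part is a naive whole-word match over an in-memory list rather than a database query.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical record: one predefined keyword attribute plus free text."""
    id: int
    category: str  # e.g. chosen from a select list or radio buttons
    title: str


def search(records, category=None, text=None):
    """Attribute search (exact match) narrowed further by a text search."""
    hits = records
    if category is not None:
        hits = [r for r in hits if r.category == category]
    if text:
        words = text.lower().split()
        # Every search word must appear as a whole word in the title.
        hits = [r for r in hits if all(w in r.title.lower().split() for w in words)]
    return [r.id for r in hits]
```

"Find a hotel" is then `search(records, category="hotel")`, and "Find Hotel Garden" is `search(records, category="hotel", text="Hotel Garden")`.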

However, since you are quite an expert in optimizing queries, I'm very interested in your expert views on this :)

Best regards,

Jeff


Sackgesicht VIP
Total posts: 1,636
24 Oct 2013 21:36

Jeff,

For most ordinary Cobalt cases, the default behaviour might be good enough. If you have only a few records in your section/type, there will be no noticeable difference; in fact, the %LIKE% approach might even be preferable, since it also returns results when the search term is only part of a word.

But as the number of records grows, %LIKE% should no longer be used, especially when your articles contain a lot of description text (a product catalog with product descriptions, or any sort of directory where articles have descriptions). Cobalt took a step ahead here by offering the fulltext search option and by changing the way records are retrieved: all searchable information is stored in the fieldsdata column of the #__js_res_record table.

Cobalt now gives us three options: %LIKE%, Automatic and Fulltext, as described above. The implementation does exactly what it says, but I believe a component like Cobalt should implement it in a more "intelligent" way.

The slowest search mode is always %LIKE%, so why is it the default for auto detection when only one term is entered? It should try the fulltext search first. In my scenario, there is a difference of 1000 ms...
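A toy model of why %LIKE% gets slower as a section grows while a fulltext lookup does not: a `LIKE '%term%'` pattern with a leading wildcard cannot use an ordinary B-tree index, so every row has to be examined, whereas a fulltext index is essentially an inverted index from words to records. This is a simplified illustration (whitespace tokenizing, no stopwords or minimum word length), not how MySQL implements it internally.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lower-cased word to the ids of the documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def like_search(docs: dict[int, str], term: str) -> set[int]:
    """%LIKE%-style: scan every document, so cost grows with the data."""
    t = term.lower()
    return {doc_id for doc_id, text in docs.items() if t in text.lower()}


def index_search(index: dict[str, set[int]], term: str) -> set[int]:
    """Fulltext-style: one lookup, regardless of how many documents exist."""
    return index.get(term.lower(), set())
```

It also shows the trade-off mentioned at the start of this post: with documents "Hotel Garden" and "Hotels in Berlin", the scan finds "hotel" in both, while the word lookup only finds the exact word in the first.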

Your example of Altavista vs. Google is precisely what I am trying to get from Cobalt: a more "intuitive" way to get results.

My suggestions above would be the first step to bring Cobalt there.

With growing directories and more information to handle, the next level for Cobalt would be additional integration with a "search engine" to overcome some of the problems that have popped up recently.

At the moment, Cobalt has problems indexing types with several thousand records. I am not even sure how importing bigger chunks will work.

Using some filters with counting results in a lot of queries, putting additional load on the database.

Counting records in categories/subcategories has always been, and still is, a performance problem.

Because Cobalt can be used in so many different ways, it has to take many scenarios into consideration, which inevitably affects performance. Just look at the recent optimization efforts for the published and hidden parameters.

Then there is the way filters work: the parent/child filter, for example, currently loads all records into memory.

In my use case, it would load 59K records just to prepare the filter, even if I don't use the filter.

Cobalt acts like a hybrid, trying to get the best possible result through two approaches: one is the relational database structure, the other is the "document" approach of organizing the searchable content in a flat structure (#__js_res_record).

For bigger data sets, the logical consequence for me would be to use a real search engine for the data stored in Cobalt. Cobalt/MySQL would still be the source of all data, but at the same time the "document" structure of each record would be sent to the search engine.
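Sending the "document" copy of each record to the search engine could, for example, go through Elasticsearch's bulk endpoint, which takes newline-delimited JSON: one action line, then one source line per document. A minimal sketch, assuming each record dict carries its Cobalt record id under `"id"`; the index name `cobalt_hotels` in the usage below is hypothetical. The body would be POSTed to `/_bulk` with the `application/x-ndjson` content type.

```python
import json


def to_bulk_body(index_name: str, records: list[dict]) -> str:
    """Build a newline-delimited JSON body for Elasticsearch's _bulk endpoint.

    Each record must carry an "id" (e.g. the Cobalt record id); all other
    keys become the document source. MySQL stays the source of truth;
    this body is only a copy sent to the search engine.
    """
    lines = []
    for rec in records:
        action = {"index": {"_index": index_name, "_id": str(rec["id"])}}
        source = {k: v for k, v in rec.items() if k != "id"}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API expects every line, including the last, to end with "\n".
    return "\n".join(lines) + "\n"
```

For example, `to_bulk_body("cobalt_hotels", [{"id": 7, "title": "Hotel Garden"}])` yields two lines: the `index` action for id `"7"` and `{"title": "Hotel Garden"}`.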

Let's look at Apache Lucene/Solr or Elasticsearch as examples.

See how LinkedIn presents the results and lets you easily drill down (left side), with blazing-fast counting of results.
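In current Elasticsearch versions, the counts next to each drill-down option come from terms aggregations (the older facets API was superseded by aggregations). A sketch of such a search body, assuming a hypothetical `title` text field and keyword fields such as `city` for the facets; the values the user has already clicked are applied as filters.

```python
from typing import Optional


def faceted_query(
    text: str,
    facet_fields: list[str],
    selected: Optional[dict[str, str]] = None,
    size: int = 10,
) -> dict:
    """Build an Elasticsearch search body with drill-down facet counts."""
    filters = [{"term": {field: value}} for field, value in (selected or {}).items()]
    return {
        "query": {
            "bool": {
                "must": {"match": {"title": text}},
                # Filters narrow the result set without affecting scoring.
                "filter": filters,
            }
        },
        # One terms aggregation per facet; each bucket carries a doc_count.
        "aggs": {
            f"by_{field}": {"terms": {"field": field, "size": size}}
            for field in facet_fields
        },
    }
```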

Elasticsearch would be the perfect companion for Cobalt when using Cobalt with a bigger number of records, for comments, forums, directories... but it would also require more control over your hosting environment: dedicated server(s), a cloud setup, etc.

You would get super-fast geolocation search, faceted navigation, the percolator, search suggestions, etc.
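As an example of the geolocation part, a radius search sorted by distance could be expressed with a `geo_distance` filter and a `_geo_distance` sort. A sketch, assuming a hypothetical `location` field mapped as `geo_point` and a hypothetical `title` text field.

```python
def geo_query(text: str, lat: float, lon: float, radius_km: float) -> dict:
    """Build a search body: text match, radius filter, nearest results first.

    Assumes a hypothetical "location" field mapped as geo_point and a
    hypothetical "title" text field.
    """
    point = {"lat": lat, "lon": lon}
    return {
        "query": {
            "bool": {
                "must": {"match": {"title": text}},
                "filter": {
                    "geo_distance": {"distance": f"{radius_km}km", "location": point}
                },
            }
        },
        # Sort by distance from the same point, nearest first.
        "sort": [{"_geo_distance": {"location": point, "order": "asc", "unit": "km"}}],
    }
```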

I have to leave now; maybe I will write some more later...


Jeff VIP
Total posts: 745
24 Oct 2013 23:21

Hi Sackgesicht,

thanks for elaborating.

Your example of Altavista vs. Google is precisely what I am trying to get from Cobalt: a more "intuitive" way to get results.

My suggestions above would be the first step to bring Cobalt there.

We are on the same track. Good!

With growing directories and more information to handle, the next level for Cobalt would be additional integration with a "search engine" to overcome some of the problems that have popped up recently.

I think you are right. I have seen examples where they use Drupal for the UI and Solr to provide for the search functionality.

Elasticsearch would be the perfect companion for Cobalt when using Cobalt with a bigger number of records, for comments, forums, directories...

Looks and sounds very exciting, and might be interesting for my project in the long run...

Maybe I can convince my web hoster (SiteGround) to facilitate Elasticsearch and to throw it in their cPanel :)


Jeff VIP
Total posts: 745
24 Oct 2013 23:53

Does Elasticsearch support faceted search?


londoh VIP
Total posts: 137
25 Oct 2013 04:18

Elasticsearch would be the perfect companion for Cobalt when using Cobalt with a bigger number of records, for comments, forums, directories... but it would also require more control over your hosting environment: dedicated server(s), a cloud setup, etc.

I have that control through dedicated servers, and have had an exploratory look at Solr.

I have no experience with Solr or Elasticsearch, but I'd be interested in discussing this further.

Fast geo search is especially appealing.


londoh VIP
Total posts: 137
25 Oct 2013 04:32

I found an Elasticsearch Joomla extension on the JED:

JES

There is no J3 version, but I saw a note that there is a GitHub branch for J3, so it should be possible.

It needs a plugin, but there are fairly detailed developer instructions, so I expect that's also possible.

If I find time this weekend I'll see if I can have a go at getting it working.


Jeff VIP
Total posts: 745
25 Oct 2013 04:56

ETH000

I found an Elasticsearch Joomla extension on the JED:

JES

Please check your link


Sackgesicht VIP
Total posts: 1,636
25 Oct 2013 05:01

Jeff,

Since Elasticsearch does not offer authentication, it might not be made available in a shared hosting environment through a control panel.

I installed it here on a Mac OS X server via Homebrew. Very easy: ready to go within 2 minutes...

Check out this slideshow here.

Compare the Cobalt structure (Section-Type/multitype) with the Elasticsearch structure (Index/multi-index-Type/multitype):

Cobalt : Section-Type-ArticleID

Elasticsearch : Index-Type-ID

Cobalt uses JSON internally: all field info is stored in JSON format in the fields column of #__js_res_record.

Elasticsearch uses JSON too: data in = JSON; data out = JSON.
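The Section-Type-ArticleID to Index-Type-ID correspondence, plus the fact that the `fields` column is already JSON, could be sketched like this. The function and parameter names are hypothetical, and a real section name containing spaces or special characters would need sanitising (Elasticsearch index names must be lower case). The `/{index}/{type}/{id}` path matches Elasticsearch versions of this era; later versions removed mapping types and use `/{index}/_doc/{id}` instead.

```python
import json


def record_to_es(section: str, type_name: str, record_id: int, fields_json: str):
    """Map a Cobalt record to an Elasticsearch (path, document) pair.

    section     -> index name (lower-cased: ES index names must be lower case)
    type_name   -> mapping type (used by ES versions of this era)
    record_id   -> document id
    fields_json -> JSON text from the record's `fields` column
    """
    path = f"/{section.lower()}/{type_name}/{record_id}"
    # The stored value is already JSON, so it can become the document directly.
    document = json.loads(fields_json)
    return path, document
```

For a record 42 of type `hotel` in a section `Hotels`, this gives the path `/hotels/hotel/42` and the decoded fields as the document body.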

For an enterprise Cobalt, Elasticsearch would be a perfect fit...

Core support in Cobalt would be a priority on my Cobalt 9... no, even on my Cobalt 8 wishlist.

For example, if a section had Elasticsearch support, there would be no need to store the content in the fieldsdata column anymore, so the fulltext index in MySQL would be relieved. Filter data would go into ES as well --> faceted search/navigation.

If we change the filter or searchable settings of a field, we just create a new index in ES by exporting from our trusted Cobalt source...

If I find time this weekend I'll see if I can have a go at getting it working.

That sounds good; I hope you will find the time. Solr and Elasticsearch are not new in the Drupal world, but here in Joomla/CCK I am not aware of any attempts to integrate them...

PHP client Elastica

The developer link to JES


Sackgesicht VIP
Total posts: 1,636
25 Oct 2013 13:53

Elasticsearch articles:

Completion Suggester

Organizing Filters


londoh VIP
Total posts: 137
25 Oct 2013 14:06

I'm just reading some of the Elasticsearch site.

On a Friday evening too... I really should get out more!

Just now my main question is: Solr or Elasticsearch?

I have only limited experience with Solr, so I don't have much preference either way.

I read "Elasticsearch v Solr".

It seems at the level I need there is little difference.

So it mainly comes down to how easy it is to hook it into J! and Cobalt.

The JSolr package on GitHub looks useful and very up to date.

So I'm kinda minded to lean towards Solr on that basis.

But the JES info is well laid out.

Do you have any thoughts on which package to go for?


Sackgesicht VIP
Total posts: 1,636
25 Oct 2013 14:36

On a Friday evening too... I really should get out more!

On a Saturday morning ... I should sleep more ... :D

Just now my main question is: Solr or Elasticsearch?

Hmmm ... hard to tell ...

After reading quite a few articles, I have developed a preference for Elasticsearch.

But at the moment I have no real experience; I'm just starting to play around.

The schema-free approach of Elasticsearch might be a plus in combination with Cobalt, for a start.

I would not compare Solr directly with Elasticsearch; I would rather look at which solution fits Cobalt better...


londoh VIP
Total posts: 137
25 Oct 2013 14:43

The schema-free approach of Elasticsearch might be a plus in combination with Cobalt, for a start.

Yes, I saw that aspect. It seems it might offer easier integration.

I would not compare Solr directly with Elasticsearch; I would rather look at which solution fits Cobalt better...

Yeah, I agree.

I'll have a more detailed look at the existing joomla code and extensions tomorrow.


Sackgesicht VIP
Total posts: 1,636
25 Oct 2013 14:53

londoh VIP
Total posts: 137
27 Oct 2013 15:50

So I got a basic integration working...

Cobalt + ElasticSearch


Jeff VIP
Total posts: 745
27 Oct 2013 18:53

You guys have taken the initiative to implement and test Elasticsearch + Cobalt, which is pretty awesome :) That's why I love this community.

Keep up the good work. I can't wait to see this working.

Sorry I can't be of any help here. The technical aspects are beyond my skills...


Sackgesicht VIP
Total posts: 1,636
27 Oct 2013 19:52

In the course of this discussion, we should not forget the original topic, "Search mode auto detection improvements", which I consider an important improvement to the existing Cobalt behaviour.


Jeff VIP
Total posts: 745
27 Oct 2013 20:05

Sackgesicht

In the course of this discussion, we should not forget the original topic, "Search mode auto detection improvements", which I consider an important improvement to the existing Cobalt behaviour.

Agree
