As noted in previous CS@W posts, Joost Zwarts' Constructing Conceptual Spaces for Lexical Semantics was one of several examples at the conference of data analysis derived from large datasets. The abstract included:
The similarity structure of a conceptual space can be determined using lexical data or pile sorting, but it can also be based on some sort of analysis of the values involved. Using the work of Geeraerts et al. (1994) on Dutch clothing terminology, Zwarts (2010) demonstrates how a space of “shirts” can be constructed (either using graph or MDS techniques) along such lines, with fruitful results.
This talk outlined the way features can be identified and decomposed. Key to this are classifiers, which Zwarts listed as:
1 Psychological classifiers (piles)
From similarity judgments or sorted piles
2 Lexical classifiers (words)
From common lexical descriptions
3 Analytical classifiers (features)
The presentation provided a small example based on containers, which for me was very helpful, as data is a real issue for my study of Hodges' model. There are datasets out there (nursing, classification systems) and secondary data sources to consider. As mentioned above, Zwarts took a dataset comprising 38,000 possible items from the clothing domain and used 244 from the sub-domain of shirts. [Geeraerts, Grondelaers, Bakema (1994). The Structure of Lexical Variation. Berlin: Mouton de Gruyter.]
I am probably simplifying things, but the discussion of Hamming distance recalled for me old Byte articles on bit-classifiers. Whether it is a sign of progress (maturation) or just my focus, in the 1980s there were many articles on data structures and algorithms; for some reason quad trees proved quite an attraction. There was an approach to clinical classification by Johnson (1987) that adopted a ZIP code format. As Zwarts gave his presentation I wondered where a primary care problem might reside in Hodges' model: (1000, 0100, 0010, 0001)? Alternatively, where is the emerging problem that is nudging this individual towards possible relapse?
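To make the Hamming distance idea concrete for myself, here is a minimal Python sketch. It is my own illustration, not Zwarts' code: the item names and four-bit feature codes are invented, and I am assuming each item is coded as a string of binary features, with distance counted as the number of positions where two codes differ.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical feature codes, one bit per feature (invented for illustration).
items = {
    "item_a": "1100",
    "item_b": "1000",
    "item_c": "0011",
}


def distance_table(coded: dict) -> dict:
    """Return the Hamming distance for every unordered pair of items."""
    return {
        (p, q): hamming(coded[p], coded[q])
        for p, q in combinations(sorted(coded), 2)
    }


if __name__ == "__main__":
    for pair, dist in distance_table(items).items():
        print(pair, dist)
```

A table of pairwise distances like this is the kind of input that graph or MDS techniques could then work from, though how Zwarts actually prepared his data I can only guess.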
Graphviz and multidimensional scaling were used. The talk became more technical, with understanding aided by graphical examples, as classical categories were introduced: a category C is classical iff it can be defined by a particular set of feature values. The conclusion brought together the technical aspects of the data examined: convexity is too strong and connectedness somewhat too weak, but there is a clear notion of coherence. There was much here to learn from.
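The definition of a classical category can also be tried out in a few lines of Python. This is only a sketch under my own assumptions: I read "a particular set of feature values" as a conjunction of feature=value pairs, and the container items and their features below are hypothetical, not taken from the talk. The check finds the feature values shared by every member and asks whether exactly the members, and no other items, carry all of them.

```python
def shared_features(members):
    """Return the feature values common to every member of the category."""
    members = list(members)
    common = dict(members[0])
    for item in members[1:]:
        common = {f: v for f, v in common.items() if item.get(f) == v}
    return common


def is_classical(category, items):
    """True if the shared feature values pick out exactly the category.

    category: set of item names; items: name -> {feature: value}.
    """
    if not category:
        raise ValueError("category must contain at least one item")
    common = shared_features(items[name] for name in category)
    matching = {
        name
        for name, feats in items.items()
        if all(feats.get(f) == v for f, v in common.items())
    }
    return matching == set(category)


# Hypothetical containers and features (invented for illustration).
containers = {
    "mug": {"handle": True, "lid": False},
    "cup": {"handle": False, "lid": False},
    "jar": {"handle": False, "lid": True},
    "jug": {"handle": True, "lid": True},
}

if __name__ == "__main__":
    print(is_classical({"mug", "jug"}, containers))  # share handle=True
    print(is_classical({"mug", "jar"}, containers))  # share no feature value
```

In this toy data "mug" and "jug" form a classical category because handle=True picks out exactly those two, while "mug" and "jar" share no feature value, so no conjunction separates them from the rest.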
Johnson, B. (1987) Health Code, The Guardian, 23 July, 16 (see also (1990) Journal of Health Care Computing).
The following page is out of date but cites Johnson:
(I don't want to upset searches on Google for 'conceptual spaces', so I have two further posts this month on the conference, concerning CSML and OntoSpaces.)
Here is the view from my B&B. I enjoyed some lovely walks into Lund, as well as using the bus. With a map that stayed in my laptop bag, I enjoyed getting lost on two occasions.