Sampling and POS

The more data we have, the better we understand the world to which that data refers. Data is thus a representation of the world, not the world itself. One must keep that in mind at all times.

Corpus linguistics represents the world in data. The simplest layer is the distinction between types and tokens: a token is each occurrence of a word in a text, and a type is each distinct word form. The next layer is parts of speech, or POS. Every token is assigned a POS in its context, so every item in the language receives a label. POS is therefore a categorical representation of every word, and each word is itself a representation of the concepts and things in the world.
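
As a rough illustration of these two layers, the sketch below counts tokens and types in a tiny sample and then looks at the POS labels attached to each token. The sentence and its tags are invented and hand-assigned for this example; they are an assumption for illustration, not the output of any real tagger or a standard tagset.

```python
from collections import Counter

# Hypothetical hand-tagged sample: each token is paired with the POS
# it has in this particular context (labels are illustrative only).
tagged = [
    ("the", "DET"), ("can", "NOUN"), ("can", "AUX"), ("hold", "VERB"),
    ("the", "DET"), ("water", "NOUN"),
]

# First layer: tokens are occurrences, types are distinct word forms.
tokens = [word for word, _ in tagged]
types = set(tokens)
token_count = len(tokens)
type_count = len(types)

# Second layer: collect the POS labels each type receives across contexts.
pos_by_type = {}
for word, pos in tagged:
    pos_by_type.setdefault(word, set()).add(pos)

# Frequency of each POS category over all tokens.
pos_counts = Counter(pos for _, pos in tagged)

print(token_count, type_count)
print(pos_by_type["can"])
print(pos_counts)
```

Here the single type "can" carries two different labels, which is why the label belongs to the token in its context rather than to the word form alone.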

POS, then, is an attempt to account for the entire representation of the world in words. Its categories are the categories of our mind, a summary of our conceptualizations.