I have always been puzzled and amazed by the idea of “embedding”. A high-dimensional space, such as the vocabulary of a language corpus, can be represented using, say, only 50 dimensions. How amazing! This is a huge reduction in the dimension of the covariate space compared to one-hot encoding. In a previous course at UConn called Data Science in Action, I did some text classification based on one-hot encoding and tf-idf weighting of text messages after tokenization, but that was a rather naive application: there were 9,376 distinct words across 5,572 messages, and instead of trying to lower the dimension of the covariate space, I applied a bunch of classification algorithms directly. The project is on GitHub.
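
To make the contrast concrete, here is a minimal sketch (not the original project code) that builds a sparse tf-idf matrix with scikit-learn and then compresses it to a small number of dense dimensions with truncated SVD, one simple way to get a low-dimensional representation. The toy messages and the number of components are illustrative assumptions, not the setup from the actual SMS project.

```python
# A minimal sketch: sparse tf-idf representation vs. a dense low-dimensional
# one obtained via truncated SVD (LSA). Toy messages stand in for the SMS corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

messages = [
    "Free entry in a weekly competition, text WIN to claim",
    "Are we still meeting for lunch today?",
    "Congratulations, you have been selected for a prize",
    "Can you send me the notes from class?",
]  # hypothetical stand-ins for the 5,572 messages

# Sparse representation: one column per distinct token (9,376 in the project)
tfidf = TfidfVectorizer()
X_sparse = tfidf.fit_transform(messages)
print(X_sparse.shape)  # (number of messages, vocabulary size)

# Dense representation: project the same matrix onto a few components
svd = TruncatedSVD(n_components=2)  # on a real corpus this could be 50
X_dense = svd.fit_transform(X_sparse)
print(X_dense.shape)  # (number of messages, 2)
```

A learned word embedding goes further than this kind of matrix factorization, but the payoff is the same in spirit: each message is described by a short dense vector instead of thousands of mostly zero counts.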