State-of-the-art machine learning models for natural language processing
are powerful, but also complex and expensive, both computationally and financially. Some
generative tasks may require very large language models. Larger models are also
often inaccessible, as access is sold as a service or requires advanced technical expertise.
Some natural language processing tasks, however, such as short-sequence natural language
generation, may not require these complex and expensive models. Hidden Markov
models are a historically well-known class of models that are observable, interpretable, and better
suited to small-scale generative sequence tasks. To further improve their generative capabilities,
prior work introduced the constrained hidden Markov process (CHiMP) model, which
allows control over generated sequences by focusing on lexical categories and constraints on
those categories. This work extends the CHiMP model to improve the cohesion of generated
sequences by adding phrasal categories to the hidden state space and by applying floating
constraints to those phrasal categories.
Keywords: natural language processing, Markov model, constrained sequence generation,
machine learning, statistical models