The following program vectorizes a small corpus of documents, without stop-word filtering:

    from sklearn.feature_extraction.text import CountVectorizer

    corpus = [
        'Jobs was the chairman of Apple Inc., and he was very famous',
        'I like to use apple computer',
        'And I also like to eat apple',
    ]

    vectorizer = CountVectorizer()
    # Fit before reading vocabulary_: the attribute only exists after fitting.
    X = vectorizer.fit_transform(corpus)
    print(vectorizer.vocabulary_)
    print(X.todense())  # convert the sparse result to a full feature matrix

Given that print(vectorizer.vocabulary_) outputs:

    {u'and': 1, u'jobs': 9, u'apple': 2, u'very': 15, u'famous': 6, u'computer': 4, u'eat': 5, u'he': 7, u'use': 14, u'like': 10, u'to': 13, u'of': 11, u'also': 0, u'chairman': 3, u'the': 12, u'inc': 8, u'was': 16}

what is the vector for document D1, i.e. 'Jobs was the chairman of Apple Inc., and he was very famous', in the output of the last print statement?
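One way to check the answer by hand is to count terms against the given vocabulary without scikit-learn. The sketch below is only an approximation of `CountVectorizer`'s default behavior: it assumes lowercasing and a token pattern that keeps runs of two or more word characters (so single-letter tokens such as "I" are dropped). The helper name `count_vector` is hypothetical and not part of any library.

```python
import re
from collections import Counter

# Vocabulary copied from the question: term -> column index.
vocabulary = {
    'and': 1, 'jobs': 9, 'apple': 2, 'very': 15, 'famous': 6,
    'computer': 4, 'eat': 5, 'he': 7, 'use': 14, 'like': 10,
    'to': 13, 'of': 11, 'also': 0, 'chairman': 3, 'the': 12,
    'inc': 8, 'was': 16,
}


def count_vector(document, vocabulary):
    """Hypothetical helper: count each vocabulary term in one document."""
    # Assumption: lowercase, then keep tokens of at least two word characters.
    tokens = re.findall(r"\b\w\w+\b", document.lower())
    counts = Counter(tokens)
    vector = [0] * len(vocabulary)
    for term, index in vocabulary.items():
        vector[index] = counts[term]  # Counter returns 0 for absent terms
    return vector


d1 = 'Jobs was the chairman of Apple Inc., and he was very famous'
d1_vector = count_vector(d1, vocabulary)
print(d1_vector)
```

Under these assumptions, every term of D1 appears once except "was", which appears twice, so column 16 holds 2 and the columns for "also", "computer", "eat", "like", "to" and "use" stay 0.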