AnacondaCon 2018. David Sullivan. "DUKE: Dataset Understanding through Knowledge-base Embeddings." DUKE produces abstractive descriptions of datasets using a word2vec model trained on Wikipedia, paired with a curated ontology. For those familiar with word2vec, DUKE can be thought of as essentially "dataset2vec". This talk will discuss the technology behind DUKE, how DUKE can be used to improve the data science and data engineering process, and how the audience can download and use the software.