Dr. Mihai Surdeanu, University of Arizona: Data mining master’s course
Course Summary
It is estimated that by 2025 up to 80% of web content will be unstructured information, including text[1]. The same trend appears in domain-specific settings. For example, PubMed, a repository and search engine for biomedical literature[2], indexes more than 1 million scientific publications each year. The mere existence of this data is of little value unless users can quickly find the information relevant to their needs. This course covers the fundamental knowledge necessary to build information retrieval systems that meet this need, including web crawling, index construction and compression, and several retrieval and ranking methods: Boolean, vector-based, and neural (including transformer networks and large language models), as well as link analysis algorithms such as PageRank. Students will also complete one programming project, in which they will build a complex application that combines multiple algorithms into a system that solves a real-world problem.
The expected learning outcomes of this course are:
- Students will be able to identify instances of information retrieval problems in the real world, e.g., web search or music retrieval, and examine information retrieval techniques, e.g., search methods, language models, text classification, link analysis, which apply to the problem at hand.
- Students will be able to design and implement modern information retrieval systems that solve the above problems, including methods driven by word embeddings such as word2vec and by transformer networks.
- Students will be able to analyze the behavior of their implemented systems on the task addressed, and appraise the performance on real-world data.
Location: John Snow room, Hotel Universitas
Schedule:
23 Oct. 2023 | 16:00 – 20:00
24 Oct. 2023 | 14:00 – 18:00
25 Oct. 2023 | 14:00 – 18:00
26 Oct. 2023 | 14:00 – 18:00
27 Oct. 2023 | 12:00 – 16:00
30 Oct. 2023 | 16:00 – 20:00
31 Oct. 2023 | 14:00 – 18:00
1 Nov. 2023 | 14:00 – 18:00
2 Nov. 2023 | 14:00 – 18:00
3 Nov. 2023 | 14:00 – 18:00
Instructor
Dr. Mihai Surdeanu
Associate Professor
Computer Science Department
University of Arizona
Bio
Dr. Surdeanu works on natural language processing (NLP) systems that process and extract meaning from natural language texts, for tasks such as question answering (answering natural language questions), information extraction (converting free text into structured relations and events), and textual entailment. He focuses mostly on interpretable models, i.e., approaches where the computer can explain in human-understandable terms why it made a decision, and on machine reasoning, i.e., methods that approximate the human capacity to understand bigger things from knowing smaller facts. He has published more than 150 peer-reviewed articles, including four articles that were among the top three most cited articles at their respective venues that year. His work has been cited more than 18,000 times, and he has a current h-index of 44. Dr. Surdeanu’s work has been funded by several United States government organizations (DARPA, NIH, NSF), as well as private foundations (the Allen Institute for Artificial Intelligence, the Bill & Melinda Gates Foundation).
[1] https://solutionsreview.com/data-management/80-percent-of-your-data-will-be-unstructured-in-five-years
[2] https://www.ncbi.nlm.nih.gov/pubmed