Thuyloi University Digital Library: Search

Search results

Showing results 1 to 5 of 5 matching documents.

Documents matching the search criteria:

CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories

Author: Martínez-Plumed, Fernando; Advisor: -; Contributors: Contreras-Ochando, Lidia; Ferri, Cesar; Hernandez-Orallo, Jose; Kull, Meelis; Lachiche, Nicolas; Ramírez-Quintana, María José; Flach, Peter (2019)

CRISP-DM (CRoss-Industry Standard Process for Data Mining) has its origins in the second half of the nineties and is thus about two decades old. According to many surveys and user polls it is still the de facto standard for developing data mining and knowledge discovery projects. However, undoubtedly the field has moved on considerably in twenty years, with data science now the leading term being favoured over data mining. In this paper we investigate whether, and in what contexts, CRISP-DM is still fit for purpose for data science projects. We argue that if the project is goal-directed and process-driven the process model view still largely holds. On the other hand, when data science projects become more exploratory the paths that the project can take become more varied, and a more...

Improving Data Analytics with Fast and Adaptive Regularization

Author: Luo, Zhaojing; Advisor: -; Contributors: Cai, Shaofeng; Chen, Gang; Gao, Jinyang; Lee, Wang-Chien; Ngiam, Kee Yuan; Zhang, Meihui (2019)

Deep Learning and Machine Learning models have recently been shown to be effective in many real-world applications. While these models achieve increasingly better predictive performance, their structures have also become much more complex. A common and difficult problem for complex models is overfitting. Regularization is used to penalize the complexity of the model in order to avoid overfitting. However, in most learning frameworks, the regularization function is usually set with hyper-parameters whose best setting is difficult to find. In this paper, we propose an adaptive regularization method, as part of a large end-to-end healthcare data analytics software stack, which effectively addresses the above difficulty. First, we propose a general adaptive regularization method ba...

Measures of Scatter and Fisher Discriminant Analysis for Uncertain Data

Author: Tavakkol, Behnam; Advisor: -; Contributors: Jeong, Myong K.; Albin, Susan L. (2019)

Uncertain data objects are objects that can be characterized by either a probability density function (PDF) or with multiple points. Because of existing levels of uncertainty for uncertain data objects, the scatter of this type of objects might be very different than the scatter of certain data objects. Measures of scatter for uncertain objects have not been defined before. In this paper, we define covariance matrix, within scatter matrix, and between scatter matrix as the measures of scatter for uncertain data objects. Also, in this paper, we extend the idea of Fisher linear discriminant analysis for uncertain objects (UFLDA). We also develop kernel Fisher discriminant analysis for uncertain objects (UKFDA). The developed uncertain kernel Fisher discriminants are for two cases: 1) ...

Forecasting Gathering Events through Trajectory Destination Prediction: a Dynamic Hybrid Model

Author: Khezerlou, Amin Vahedian; Advisor: -; Contributors: Zhou, Xun; Tong, Ling; Li, Yanhua; Luo, Jun (2019)

Identifying urban gathering events is an important problem due to the challenges it brings to urban management. Recently, we proposed a hybrid model (H-VIGO-GIS) to predict future gathering events through trajectory destination prediction. Our approach consisted of two models, historical and recent, and continuously predicted future gathering events. However, H-VIGO-GIS has limitations. (1) The recent model does not capture the newly emerged abnormal patterns effectively, since it uses all recent trajectories, including normal ones. (2) The recent model is sparse due to the limited number of trajectories it learns from, i.e., it cannot produce predictions in many cases, forcing us to rely only on the historical model. (3) The accuracy of both recent and historical models varies by space and time. ...

Semi-supervised Topological Analysis for Elucidating Hidden Structures in High-Dimensional Transcriptome Datasets

Author: Feng, Tianshu; Advisor: -; Contributors: Davila, Jaime I.; Liu, Yuanhang; Lin, Sangdi; Huang, Shuai; Wang, Chen (2019)

Topological data analysis (TDA) is a powerful method for reducing data dimensionality, mining underlying data relationships, and intuitively representing the data structure. The Mapper algorithm is one such tool that projects high-dimensional data to 1-dimensional space by using a filter function that is subsequently used to reconstruct the data topology relationships. However, domain context information and prior knowledge have not been considered in current TDA modeling frameworks. Here, we report the development and evaluation of a semi-supervised topological analysis (STA) framework that incorporates discrete or continuously labeled data points and selects the most relevant filter functions accordingly. We validate the proposed STA framework with simulation data and then apply it...
