Professor of Statistics
University of Allahabad
Anoop Chaturvedi's research interests include regression analysis, time series modelling and forecasting, data mining, Bayesian modelling and econometrics. Recently he has worked on model averaging for linear models and Bayesian time series, and on analysing RNA-Seq data using data mining tools. His research team has applied various data mining and clustering tools to the study of exosomes, focusing on the neglected disease Chagas. Another major recent contribution is the development of a robust Bayes procedure for panel data models, published in the Journal of Econometrics, one of the leading journals in econometrics.
Anoop completed his postgraduate degree and PhD in Statistics at Lucknow University. Since 1981, he has worked at the University of Allahabad, holding various positions.
Deep Learning for Gene Expression Data using Artificial Neural Network, Model Averaging and Time Series Models
Choosing a suitable model is central to all statistical work with gene expression data. A typical approach is to carry out a model selection exercise leading to a single “best” model and then to make inferences as if the selected model were the true model. However, this approach ignores a major component of uncertainty, namely uncertainty about the model itself. As a consequence, uncertainty about quantities of interest can be underestimated. A complete solution to this problem involves averaging over all possible combinations of predictors when making inferences about quantities of interest. Combining the established concepts of shrinkage estimation and model averaging yields averaging estimation, which uses weighted combinations of estimators with different tuning parameters to improve the overall stability, standard errors and predictive performance of the estimators.
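As a rough illustration of averaging over all combinations of predictors, the following minimal sketch fits ordinary least squares to every subset of predictors and weights each fit by exp(-BIC/2), a common approximation to Bayesian model averaging. The BIC weighting, the function name `bic_model_average` and the simulated data are assumptions made for illustration; they are not taken from the project described here.

```python
import itertools
import numpy as np

def bic_model_average(X, y):
    """Average OLS fits over every predictor subset, weighted by exp(-BIC/2).

    Hypothetical helper for illustration only.
    """
    n, p = X.shape
    coefs, bics = [], []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            # Design matrix: intercept plus the chosen predictors.
            Z = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            rss = float(np.sum((y - Z @ beta) ** 2))
            bics.append(n * np.log(rss / n) + Z.shape[1] * np.log(n))
            # Store coefficients in a full-length vector (zeros for excluded predictors).
            full = np.zeros(p + 1)
            full[[0] + [j + 1 for j in subset]] = beta
            coefs.append(full)
    bics = np.array(bics)
    # Subtract the minimum BIC before exponentiating for numerical stability.
    w = np.exp(-0.5 * (bics - bics.min()))
    w /= w.sum()
    return w @ np.array(coefs), w

# Simulated data (an assumption): only the first of three predictors matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
beta_avg, weights = bic_model_average(X, y)
```

Here `beta_avg` holds the averaged intercept and slopes, and `weights` holds one weight per candidate model. Enumerating all subsets is only feasible for a handful of predictors; with thousands of genes some form of search or shrinkage would be needed instead.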
Artificial Neural Networks (ANNs) have become a popular tool for classification and predictive modelling when data show nonlinear patterns, because they are flexible and do not assume any parametric form. Their aim is to mimic the intelligence of the human brain in a machine. Deep learning uses a hierarchy of artificial neural network layers, which can be viewed as a cascade of nonlinear processing units. ANNs have found interesting applications in predictive modelling and classification of gene expression data. Gene expression data often involve a large number of explanatory variables. Using more explanatory variables may give a better fit to the data but can lead to overfitting and poor predictive performance. Similarly, increasing the size of a neural network may give a better fit on training data but result in overfitting and poor predictions. Thus one needs a method for choosing a best model, or a best set of models. The objective is to extend the model averaging technique to artificial neural networks, which will provide a viable way to overcome this problem. The LASSO and other penalized loss functions will be used to determine the model weights. Bayesian model averaging will also be explored for ANN models. The results will be applied to gene expression data as well as to other biological data sets.
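One possible reading of using the LASSO to determine model weights is sketched below: several networks of different sizes are fitted, and an L1-penalized regression of held-out responses on their predictions chooses the combination weights. Everything specific here is an assumption for illustration: the networks are simplified to a single hidden layer with fixed random tanh units (only the output layer is fitted), and the helper names, penalty value and simulated data are hypothetical.

```python
import numpy as np

def fit_random_net(X, y, hidden, rng, ridge=1e-3):
    """Simplified stand-in for an ANN: random tanh hidden layer, fitted output layer."""
    W = rng.normal(size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)

    def features(Z):
        # Hidden-layer activations plus a constant column for the output bias.
        return np.column_stack([np.tanh(Z @ W + b), np.ones(len(Z))])

    H = features(X)
    # Ridge-regularised least squares for the output weights.
    out = np.linalg.solve(H.T @ H + ridge * np.eye(hidden + 1), H.T @ y)
    return lambda Z: features(Z) @ out

def lasso_weights(P, y, lam, n_iter=500):
    """Coordinate descent for min_w (1/2n)||y - P w||^2 + lam * ||w||_1."""
    n, m = P.shape
    w = np.zeros(m)
    col_sq = (P ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(m):
            # Partial residual with model j's contribution added back.
            r = y - P @ w + P[:, j] * w[j]
            rho = P[:, j] @ r / n
            # Soft-thresholding update for coordinate j.
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

# Simulated nonlinear data (an assumption), split into fitting and validation halves.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)
X_fit, y_fit, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

sizes = [2, 5, 10, 20, 40]  # candidate hidden-layer sizes
nets = [fit_random_net(X_fit, y_fit, h, rng) for h in sizes]
P_val = np.column_stack([net(X_val) for net in nets])
weights = lasso_weights(P_val, y_val, lam=0.01)
averaged = P_val @ weights  # model-averaged predictions on the validation set
```

The weights are estimated on data not used to fit the networks, so that larger networks are not rewarded simply for fitting the training data more closely. The L1 penalty can set some weights exactly to zero, which drops the corresponding networks from the average.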
For time course gene expression data, monitoring the changes in gene expression patterns over time provides important information about the dynamic mechanism driving the process. When the time series has a linear structure, various time series models such as autoregressive (AR), moving average (MA) and mixed ARMA models are available in the statistical literature. However, these models fail when the inherent structure is nonlinear, which is often the case in modelling time course gene expression data. An alternative is ensemble modelling, a multi-model approach that exploits each component model’s ability to capture different patterns. If a time series exhibits a mixture of linear and nonlinear patterns, a hybrid in which an ARIMA model handles the linear part and a neural network handles the nonlinear part provides an interesting alternative. These models will be applied to time course gene expression data to explore their dynamic behaviour. This kind of analysis opens an opportunity to study the emergence of coherent temporal responses of many interacting components and may lead to improved predictions.
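The hybrid idea can be sketched in two stages: a linear model is fitted to the series, and a nonlinear model is then fitted to the residuals of the linear stage, with the final forecast being the sum of both. The sketch below makes several simplifying assumptions for illustration: a plain AR(p) model fitted by least squares stands in for ARIMA, a single hidden layer with fixed random tanh units stands in for a trained neural network, and the helper names and simulated series are hypothetical.

```python
import numpy as np

def lag_matrix(series, p):
    """Rows are [x_{t-1}, ..., x_{t-p}] for t = p, ..., n-1; targets are x_t."""
    n = len(series)
    X = np.column_stack([series[p - k - 1 : n - k - 1] for k in range(p)])
    return X, series[p:]

def fit_ar(series, p):
    """Least-squares AR(p) with intercept; returns coefficients and residuals."""
    X, y = lag_matrix(series, p)
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta, y - Z @ beta

def fit_residual_net(resid, p, hidden, rng, ridge=1e-2):
    """Random tanh hidden layer on lagged residuals; only the output layer is fitted."""
    X, y = lag_matrix(resid, p)
    W = rng.normal(size=(p, hidden))
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)
    out = np.linalg.solve(H.T @ H + ridge * np.eye(hidden), H.T @ y)
    return lambda lags: np.tanh(lags @ W + b) @ out

# Simulated series (an assumption) with a linear and a nonlinear component.
rng = np.random.default_rng(1)
n, p = 400, 2
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] + 0.8 * np.sin(2 * x[t - 2]) + rng.normal(scale=0.3)

beta, resid = fit_ar(x, p)                        # stage 1: linear part
net = fit_residual_net(resid, p, hidden=10, rng=rng)  # stage 2: nonlinear part

# In-sample comparison on the residual series.
R, r_target = lag_matrix(resid, p)
ar_mse = np.mean(r_target ** 2)
hybrid_mse = np.mean((r_target - net(R)) ** 2)

# One-step-ahead forecast: linear forecast plus predicted residual.
last_x = x[-1 : -p - 1 : -1]          # [x_{n-1}, ..., x_{n-p}]
last_r = resid[-1 : -p - 1 : -1]      # most recent residuals, newest first
forecast = beta[0] + beta[1:] @ last_x + net(last_r)
```

In-sample, the residual stage cannot increase the squared error relative to the linear stage alone, so any real benefit of the hybrid has to be judged on held-out time points.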
- Data Mining
- Time Series Modelling and forecasting
- Bayesian modelling
- Regression Analysis and Econometrics
- Model Averaging for big data
- Analysing gene expression data
Field of Expertise
Econometrics, Regression analysis, Bayesian modelling, Time Series Modelling and forecasting, Data mining
- Robust linear static panel data models using ε-contamination (with Badi H. Baltagi, Georges Bresson, and Guy Lacroix), Journal of Econometrics, 2018, 202, 108–123. ISSN: 0304-4076.
- Clustering and Candidate Motifs Detection in Exosomal miRNAs by Application of Machine Learning Algorithms (with Pallavi Gaur), Interdisciplinary Sciences: Computational Life Sciences, 2017.
- Shrinkage Estimation in Spatial Autoregressive Model (with Amresh Pal and Ashutosh Dubey), Journal of Multivariate Analysis, 2016, 143, 362-373.
- Mining SNPs in extracellular vesicular transcriptome of Trypanosoma cruzi: a step closer to early diagnosis of neglected Chagas disease (with Pallavi Gaur).
- Bayesian Analysis of a Linear Model Involving Structural Changes in Either Regression Parameters or Disturbances Precision (with Arvind Shrivastava), Communications in Statistics (Theory and Methods), 2016, 45(2), 307-320.
Department of Statistics
University of Allahabad
211002, UP, India
Phone: +91 9415214134