Abstract: Regularized regression models have gained popularity in recent years. The addition of a penalty term to the likelihood function allows parameter estimation where traditional methods fail, such as in the p ≫ n case. The use of an l1 penalty in particular leads to simultaneous parameter estimation and variable selection, which is convenient in practice. Moreover, computationally efficient algorithms make these methods attractive in many applications. This thesis is inspired by this literature and investigates the development of novel penalty functions and regression methods within this context. In particular, Chapter 2 deals with linear models for time-dependent response and explanatory variables. This goes beyond the independence framework that is common to many existing regularized regression models. We propose to account for the time dependency in the data by explicitly adding autoregressive terms to the response variable together with an autoregressive process for the residuals. In addition, the use of an l1-penalized likelihood approach for parameter estimation leads to automatic order and variable selection and makes this method feasible for high-dimensional data. Theoretical properties of the estimators are provided and an extensive simulation study is performed. Finally, we show the application of the model to air pollution and stock market data and discuss its implementation in the R package DREGAR, which is freely available on CRAN. In Chapter 3, we develop a new penalty function. Despite all the advantages of the l1 penalty, this penalty is not differentiable at zero, and neither are the alternatives that are proposed in the literature. The only exception is the ridge penalty, which does not lead to variable selection.
Motivated by this gap, and noting the advantages that a differentiable penalty can give, such as increased computational efficiency in some cases and the derivation of more accurate model selection criteria, we develop a new penalty function based on the error function. We study the theoretical properties of this function and of the estimators obtained in a regularized regression context. Finally, we perform a simulation study and use the new penalty to analyse a diabetes dataset and a prostate cancer dataset. The new method is implemented in the R package DLASSO, which is freely available on CRAN. Finally, Chapter 4 deals with regression models for discrete response data, which are frequently collected in many application areas. In particular, we consider a discrete Weibull regression model that has recently been introduced in the literature. In this chapter, we propose the first Bayesian implementation of this model. We consider a general parametrization, where both parameters of the discrete Weibull distribution can be conditioned on the predictors, and show theoretically that, under a uniform noninformative prior, the posterior distribution is proper with finite moments. In addition, we consider closely the case of Laplace priors for parameter shrinkage and variable selection. A simulation study and the analysis of four real datasets of medical records show the applicability of this approach to the analysis of count data. The method is implemented in the R package BDWreg, which is freely available on CRAN.
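As a rough illustration of the Chapter 4 model, the sketch below assumes the type I discrete Weibull distribution, whose probability mass function is P(Y = y) = q^(y^beta) - q^((y+1)^beta) for y = 0, 1, 2, ..., and evaluates an unnormalised log posterior under independent Laplace priors. The logit link for q, the fixed beta, and all function and parameter names are assumptions made for this example; it is not the BDWreg implementation.

```python
import math


def dw_pmf(y, q, beta):
    """Type I discrete Weibull pmf, with 0 < q < 1 and beta > 0."""
    if y < 0:
        return 0.0
    return q ** (y ** beta) - q ** ((y + 1) ** beta)


def log_posterior(theta, beta, X, y, scale=1.0):
    """Unnormalised log posterior for the regression coefficients theta.

    Assumptions (not taken from the thesis): q depends on the predictors
    through a logit link, beta is held fixed, and each coefficient has an
    independent Laplace(0, scale) prior.
    """
    loglik = 0.0
    for row, yi in zip(X, y):
        eta = sum(t * x for t, x in zip(theta, row))
        q = 1.0 / (1.0 + math.exp(-eta))  # assumed logit link
        loglik += math.log(dw_pmf(yi, q, beta))
    # Log density of Laplace(0, scale): -log(2 * scale) - |t| / scale
    log_prior = sum(-math.log(2.0 * scale) - abs(t) / scale for t in theta)
    return loglik + log_prior
```

With beta = 1 the pmf reduces to the geometric distribution, which gives a quick sanity check. The absolute-value term in the log prior is what shrinks coefficients towards zero.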
Abstract: The objective of this thesis is to establish whether online, objective questions in elementary graph theory can be written in a way that exploits the medium of computer-aided assessment (CAA). This required the identification and resolution of question design and programming issues. The resulting questions were trialled to give an extensive set of answer files, which were analysed to identify whether computer delivery affected the questions adversely and, if so, to identify practical ways round these issues. A library of questions spanning commonly taught topics in elementary graph theory has been designed, programmed and added to the graph theory topic within Mathletics, an online assessment and learning tool used at Brunel University. Distracters coded into the questions are based on errors students are likely to make, partially evidenced by final examination scripts. Questions were provided to students in Discrete Mathematics modules, and an extensive collection of results was compiled for analysis. Questions designed for use in practice environments were trialled on students from 2007 to 2008 and then from 2008 to 2014 inclusive under separate testing conditions. Particular focus is placed on the relationship between facility and discrimination for comparable questions during this period. Data are grouped by topic and also by year group for the 2008 to 2014 tests, namely 2008 to 2011 and 2011 to 2014, so that it may be determined what factors, if any, had an effect on the overall results for these questions. Based on the analyses performed, it may be concluded that although CAA questions provide students with a means of improving their learning in this field of mathematics, what makes a question more challenging depends not solely on the number of ways a student can work out a solution but also on several other factors that depend on the topic itself.
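The abstract does not say how facility and discrimination are computed. A minimal sketch under common conventions is given below, taking facility as the mean score on a question as a fraction of full marks and discrimination as the Pearson correlation between the question score and the overall test score. Both definitions, and the function names, are assumptions for illustration rather than the thesis's own formulas.

```python
from math import sqrt


def facility(item_scores, max_score=1.0):
    """Mean score on the question as a fraction of full marks (assumed definition)."""
    return sum(item_scores) / (len(item_scores) * max_score)


def discrimination(item_scores, total_scores):
    """Pearson correlation between question score and test total (assumed definition)."""
    n = len(item_scores)
    mean_item = sum(item_scores) / n
    mean_total = sum(total_scores) / n
    cov = sum((a - mean_item) * (b - mean_total)
              for a, b in zip(item_scores, total_scores))
    var_item = sum((a - mean_item) ** 2 for a in item_scores)
    var_total = sum((b - mean_total) ** 2 for b in total_scores)
    if var_item == 0 or var_total == 0:
        return float("nan")  # undefined when every student scores the same
    return cov / sqrt(var_item * var_total)
```

Under these definitions, a question answered correctly mainly by the students with the highest test totals has a discrimination close to 1, while a question that everyone gets right has a facility of 1 and an undefined discrimination.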
Abstract: This thesis describes the development of Computer-Aided Assessment (CAA) questions for elementary discrete and decision mathematics at the school/university interface, stressing the pedagogy behind the questions’ design and the development of a methodology for assessing their efficacy in improving students’ engagement and perceptions, as well as their exam results. The questions give instant and detailed feedback and hence are valuable as diagnostic, formative or summative tools. A total of 275 questions were designed and coded for five topics (numbers, sets, logic, linear programming and graph theory) commonly taught to students of mathematics, computer science, engineering and management. Pedagogical and programming problems in authoring questions were resolved and are discussed in specific topic contexts and beyond. The delivery of robust and valid objective questions, even within the constraints of CAA, is therefore feasible. Different question types and rich feedback comprising text, equations and diagrams that allow random parameters to produce millions of realisations at run time can give CAA an important role in teaching mathematics at this level. Questionnaires identified that CAA was generally popular with students, with the vast majority seeing CAA not only as assessment but also as a learning resource. To test the impact of CAA on students’ learning, an analysis of the exam scripts quantified its effect on class means and standard deviations. This also identified common student errors, which fed into the question design and editing processes by providing evidence-based mal-rules. Four easily identified indicators (correctly written remainders, conversion of binary/octal/hexadecimal numbers, use of correct set notation {…} and consistent layout of truth tables) were examined in student exam scripts to find out whether CAA helps students to improve examination answers.
The CAA answer files also provided the questions’ facilities and discriminations, potentially giving teachers specific information on which to base and develop their teaching and assessment strategies. We conclude that CAA is a successful tool for the formative/summative assessment of mathematics at this level and has a positive effect on students’ learning.
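To make the idea of random parameters and mal-rule distracters concrete, here is a minimal sketch of a randomised multiple-choice question on decimal-to-binary conversion. The three mal-rules (remainders read in the wrong order, conversion to octal instead of binary, and an off-by-one slip), the parameter range and the function name are hypothetical, chosen for illustration only; they are not taken from the thesis or from its question library.

```python
import random


def make_binary_question(rng):
    """Build one realisation of a decimal-to-binary question (hypothetical design)."""
    n = rng.randint(20, 200)  # random parameter, drawn at run time
    correct = format(n, "b")
    # Distracters generated by hypothetical mal-rules
    candidates = [
        correct[::-1],       # remainders read in the wrong order
        format(n, "o"),      # converted to octal instead of binary
        format(n + 1, "b"),  # off-by-one slip
    ]
    # Drop duplicates and any distracter that coincides with the correct answer
    # (this happens for the reversal rule when the binary form is a palindrome)
    distracters = [c for c in dict.fromkeys(candidates) if c != correct]
    options = [correct] + distracters
    rng.shuffle(options)
    return {
        "stem": f"Convert {n} (base 10) to binary.",
        "answer": correct,
        "options": options,
    }
```

Calling `make_binary_question(random.Random(seed))` with different seeds yields different realisations of the same question, and recording which distracter a student selects indicates which error was probably made.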
