
Open Source Software (OSS) and Statistics


Statistical OSS

General Statistical OSS

  • The R Project is an OSS tool under GNU for statistical analysis
  • GGobi is an OSS data visualization system for viewing high-dimensional data
  • GNUPlot is a portable command-line driven interactive data and function plotting utility, in use by INSEE
  • Dashboard is free software written by Jochen Jesinghaus of the JRC in Ispra to visualise complex indices and their relations
  • iPlots is a package for the R-project which provides high interaction statistical graphics, written in Java
  • Mondrian is an advanced statistical data-visualization system written in Java, with the main emphasis on categorical data, geographical data and large datasets
  • Demetra is the single interface in which the two seasonal adjustment methods Tramo/Seats and X-12-Arima are implemented. It facilitates the application of these modern time series techniques to large-scale sets of time series, with explicit consideration of the needs of production units in statistical institutes
  • Gretl is a cross-platform software package for econometric analysis, written in the C programming language.
  • τ-ARGUS is a software program designed to protect statistical tables, while µ-ARGUS is a software program designed to create safe micro-data files. They are both results of the CASC-project
  • Ploticus is a free, non-interactive software package for producing plots, charts, and graphics from data. It was developed in a Unix/C environment and runs on various Unix, Linux, and win32 systems. Ploticus is good for automated or just-in-time graph generation, handles date and time data, and has basic statistical capabilities. It produces a wide range of full-colour charts. Ploticus is script-driven, though for many uses no scripts need to be written because ready-made prefabs exist (see ploticus prefab). Ploticus has the following limitations: scant support for mathematical formulas and scientific notation; binary data cannot be plotted; files cannot be read directly from MySQL, Oracle, Excel, Access, etc. (though Ploticus graphs can be exported into PowerPoint, Word, etc.). New 16-Mar-2007
  • AutoClass C is an unsupervised Bayesian classification system that seeks a maximum posterior probability classification and thus aims to discover the 'natural' classes in the data. It can use mixed discrete and real-valued data. Cases have probabilistic class membership. AutoClass allows correlation between attributes within a class and predicts "test" case class memberships from a "training" classification. AutoClass C is limited by memory requirements that are roughly proportional to the number of data cases times the number of attributes (the data space), plus the number of classes times the number of modeled attributes (the model space), plus a fixed program space. AutoClass also has limitations in handling very large data sets, where processing time might become excessive.
  • Dap is a small statistics and graphics package, based on C, licensed under the GPL (version 2 or later). It provides core methods of data management, analysis, and graphics commonly used in statistical consulting practice. Dap was originally driven by C-style programs, so using it required familiarity with basic C syntax; as of version 3.0, Dap can also read SAS programs, freeing the user from having to learn any C at all. The manual contains a brief introduction to the C syntax needed for C-style programming for Dap. Because Dap processes files one line at a time, rather than reading entire files into memory, it can be, and has been, used on data sets that have many lines and/or many variables.
  • MCSim is a simulation and statistical inference tool for algebraic or differential equation systems, licensed under the GPL (version 2 or later). It was created specifically to perform Monte Carlo analyses in an optimized and easy-to-maintain environment. MCSim is a simulation package, written in C, which allows users to design statistical or simulation models (possibly dynamic, via ODEs), perform Monte Carlo stochastic simulations, and carry out Bayesian inference through Markov Chain Monte Carlo simulations
  • SalStat is a statistics package for the analysis of scientific data, for example in psychology. The tests it can perform range from descriptive statistics to analysis of variance tests and their nonparametric equivalents. The number of tests available in the current version (released in October 2003) is, however, limited. SalStat is available only as a beta version, i.e. most functionality has not yet been tested exhaustively in different environments. Another important shortcoming of SalStat is that it lets the user quit the work session without any warning that the data should be saved. SalStat is written in Python, the GUI uses wxPython (a cross-platform GUI toolkit for the Python programming language), and it relies on Numeric and SciPy (an open source library of scientific tools for the Python programming language). Its modular architecture enables people to design their own tests and incorporate them into the GUI. Source code is available under the GPL, and binary versions are also available for those who do not wish to install Python, wxPython and Numeric. New 05-Oct-2007
  • MacAnova is an interactive statistical analysis program for Windows and Macintosh. In spite of its name, MacAnova is not just for Macintosh computers and not just for doing analysis of variance. It is extensible via macros. MacAnova has many capabilities, including the design of experiments. Its strengths are analysis of variance and related models, matrix algebra, time series analysis (time and frequency domain), and (to a lesser extent) uni- and multivariate exploratory statistics. Core MacAnova has a functional, command-oriented interface, but an increasing number of capabilities are available through a menu/dialog/mouse interface. The lack of real menus, even in the Windows and Mac versions, might be considered unfriendly by users who are used to menus. New 05-Oct-2007
  • ViSta is an open source application that constructs very-high-interaction, dynamic graphics that show you multiple views of your data simultaneously. The graphics are designed to augment your visual intuition so that you can better understand your data. It runs on Windows, Macintosh and Unix and is available in English, French and Spanish. New 14-Jun-2008
  • OpenEpi is free and open source software for epidemiologic statistics, released under the MIT licence. OpenEpi provides statistics for counts and measurements in descriptive and analytic studies, stratified analysis with exact confidence limits, matched pair and person-time analysis, sample size and power calculations, random numbers, sensitivity, specificity and other evaluation statistics, R x C tables, and chi-square for dose-response. The software is available in English, French, Spanish and Italian, and because it is a web application it can be used online. New 14-Jun-2008
  • Statist is a small and portable statistics program written in C, released as open source software under the GPL. It runs on several systems, including Unix/Linux, Windows and Mac. It is terminal-based, but it can use GNUplot, if installed, for plotting. It is simple to use and can be run in scripts. It can handle big datasets on small machines and it opens csv files. Some features: data manipulation (recoding, transforming, selecting), descriptive statistics (including histograms and box-and-whisker plots), correlation and regression, and the common significance tests (chi-square, t-test, etc.). New 14-Jun-2008
  • Tanagra is free and open source software released under its own license. It is fundamentally data mining software for academic and research purposes. It supports the standard "stream diagram" paradigm used by most data-mining systems. It contains components for Data source (tab-delimited text), Visualization (grid, scatterplots), Descriptive statistics (cross-tab, ANOVA, correlation), Instance selection (sampling, stratified), Feature selection and construction, Regression (multiple linear), Factorial analysis (principal components, multiple correspondence), Clustering (kMeans, SOM, LVQ, HAC), Supervised learning (logistic regr., k-NN, multi-layer perceptron, prototype-NN, ID3, discriminant analysis, naive Bayes, radial basis function), Meta-spv learning (instance Spv, arcing, boosting, bagging), Learning assessment (train-test, cross-validation), and Association (Agrawal a-priori). New 14-Jun-2008
  • Dataplot is a free software system for scientific visualization, statistical analysis, and non-linear modeling. The target Dataplot user is the researcher and analyst engaged in the characterization, modeling, visualization, analysis, monitoring, and optimization of scientific and engineering processes. It is released into the public domain and runs on several systems, including Solaris, Mac OS X and Windows. It has built-in functions, such as trigonometric functions, and it recognizes a lot of data types. It was originally developed as a command language, but a graphical interface is now available for many systems. New 14-Jun-2008
  • Scilab is a scientific software package for numerical computations in a user-friendly environment. The syntax is similar to MATLAB, but the two are not completely compatible, though Scilab includes a converter for MATLAB-to-Scilab conversion. The software is often used for signal processing, statistical analysis, image enhancement and fluid dynamics simulations. It supports hundreds of built-in functions and libraries, 3-d graphics, and symbolic capabilities through a Maple interface. It is available for Windows, Mac and Unix computers. New 14-Jun-2008
  • Arc is a free statistical analysis tool for regression problems. It is released as free software under its own licence and is available for Macintosh, MS Windows and Linux/Unix systems. Fundamentally it works as a command-line tool encapsulated in a graphical interface: files can be loaded through the graphical interface and the extracted data shown in a plot window. For examples, it is suggested to download the data sets linked to the related book. New 15-Jun-2008
  • TextSTAT is open source software for the analysis of texts. It is available for Windows, Linux and Mac OS X, is written in Python, and is released under its own licence. The interface is available in English, German, and Dutch. It reads ASCII/ANSI texts and HTML files and produces word frequency lists showing where the words appear. New 15-Jun-2008
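
To make the idea of a word frequency list that also shows where words appear more concrete, here is a minimal Python sketch. It is not TextSTAT's code or interface: the function name `word_frequencies`, the tokenisation rule and the sample sentence are all assumptions made for illustration only.

```python
import re
from collections import Counter, defaultdict


def word_frequencies(text):
    """Return (counts, positions) for the words in `text`.

    counts    maps each lowercased word to how often it occurs
    positions maps each word to the 0-based token indices where it occurs
    """
    # Assumed tokenisation rule: runs of ASCII letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)
    return counts, dict(positions)


# Hypothetical sample text, not taken from any real corpus.
sample = "The cat saw the dog. The dog ran."
counts, positions = word_frequencies(sample)

# Print the words from most to least frequent, with their positions.
for word, n in counts.most_common():
    print(word, n, positions[word])
```

A real text-analysis tool would typically also handle non-ASCII characters, HTML markup and multiple input files, which this sketch leaves out.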

Data Transmission Tools

Software for Primary Data Collection

  • Chiba is an Open Source Java Implementation of the W3C XForms standard

Geographical Information Systems (GIS)

  • GeGIS is a new generation GIS (geGIS/Majas)
    GIS systems that use web 2.0 technologies, like geGIS/Majas, have enormous potential. What was formerly possible only on powerful desktop computers can today be matched quite well by web-based applications running in a standard internet browser. The possibilities now include not only presenting geographical objects, but also editing, managing and analyzing them. The engine behind geGIS is Majas, an open source web component for building rich Internet applications (RIA) with sophisticated capabilities for the display, analysis and management of geographic information. It is a building block that allows developers to add maps and other geographic data capabilities to their web applications.
  • GRASS GIS is an open source GIS with a long history
    GRASS GIS is a standalone open source application released under the GPL licence. It was initially developed by the U.S. Army and has been maintained since 2002 by a large community headquartered in Italy. It runs on several systems and represents data in two and three dimensions. The core module is written in C and it has a modular architecture. New 13-Jun-2008
  • gvSIG is a Spanish GIS application
    Despite its origin, gvSIG has been translated into several languages. It is intended for people who work in public administrations. It is an open source application released under the GPL licence, and it was commissioned by the Transport Council of Spain as the result of an open source migration. It runs on several systems and is also available as a mobile application. New 13-Jun-2008
  • QGIS is a customizable GIS software
    Quantum GIS (QGIS) is an open source GIS released under the GPL licence that runs on several systems and supports different format types. It also works as an interface for GRASS (see above). Like the others, it has a modular structure, with several plugins that can be developed using different programming languages. New 13-Jun-2008

More to come soon.

Updated 28-Feb-2006

Disclaimer: The Commission takes no responsibility for any external links on this page. In case of problems with any link, or if you want to add information here, please contact us.