Aceto-White Temporal Pattern Classification using k-NN to Identify Precancerous Cervical Lesion in Colposcopic Images.
Cervical cancer is the leading cause of death among Mexican women. After the Pap smear test, colposcopy is the most widely used technique for diagnosing this disease because of its higher sensitivity (98%) and specificity (48%). However, the major problem with this technique is its intrinsic subjectivity, which calls for new mechanisms that provide near-absolute quantitative measurements to improve the test. One of the most promising approaches to this goal has been the use of the aceto-white temporal patterns intrinsic to the color changes in digital images. Although some efforts have been made to characterize cervical lesions using these temporal patterns, there is still no complete understanding of how to use them to segment colposcopic images. In this work, we applied the supervised machine learning classification algorithm k-NN over the entire length of the aceto-white temporal pattern to automatically discriminate between normal and abnormal cervical tissue, reaching a sensitivity of 71% and a specificity of 59%.
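The idea of classifying a whole temporal pattern with k-NN can be illustrated with a minimal sketch. It assumes Euclidean distance over the full series and majority voting; the abstract does not state the distance measure or the value of k, and the toy intensity values and function names below are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length time series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training series.

    `train` is a list of (series, label) pairs.
    """
    neighbours = sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical toy patterns (made-up intensity values, not data from the paper)
train = [
    ([0.1, 0.2, 0.2, 0.1], "normal"),
    ([0.1, 0.3, 0.2, 0.1], "normal"),
    ([0.0, 0.2, 0.1, 0.1], "normal"),
    ([0.1, 0.8, 0.9, 0.7], "abnormal"),
    ([0.2, 0.9, 0.8, 0.8], "abnormal"),
    ([0.1, 0.7, 0.9, 0.6], "abnormal"),
]
label = knn_predict(train, [0.1, 0.8, 0.8, 0.7], k=3)
```

An odd k avoids tied votes when there are only two classes.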
Discovering Interobserver Variability in the Cytodiagnosis of Breast Cancer using Decision Tree and Bayesian Networks.
We evaluate the performance of two decision tree procedures and four Bayesian network classifiers as potential decision support systems in the cytodiagnosis of breast cancer. In order to test their performance thoroughly, we use two real-world databases containing 692 cases and 322 cases collected by a single observer and 19 observers respectively. The results show that, in general, there are considerable differences in all tests (accuracy, sensitivity, specificity, PV+, PV- and ROC) when a specific classifier uses the single-observer dataset compared to those when this same classifier uses the multiple-observer dataset. These results suggest that different observers see different things: a problem known as interobserver variability. We graphically unveil such a problem by presenting the structures of the decision trees and Bayesian networks resultant from running both databases.
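For reference, the scalar tests named above (accuracy, sensitivity, specificity, PV+ and PV-) can all be derived from a 2x2 confusion matrix. The sketch below uses the standard definitions; the counts are invented for illustration and are not results from the paper.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic test measures from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "pv_plus": tp / (tp + fp),       # positive predictive value
        "pv_minus": tn / (tn + fn),      # negative predictive value
    }

# Hypothetical counts, chosen only to show the arithmetic
m = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5)
```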
Diagnosis of Breast Cancer using Bayesian Networks: A Case Study.
We evaluate the effectiveness of seven Bayesian network classifiers as potential tools for the diagnosis of breast cancer using two real-world databases containing fine-needle aspiration of the breast lesion cases collected by a single observer and multiple observers, respectively. The results reveal a certain degree of subjectivity implicitly contained in these data: we get an average accuracy of 93.04% for the former and 83.31% for the latter. These findings suggest that observers see different things when looking at the samples under the microscope: a situation that significantly diminishes the performance of these classifiers in diagnosing such a disease.
Publications in the Index of Mexican Journals of Scientific and Technological Research (Índice de Revistas Mexicanas de Investigación Científica y Tecnológica)
The growing demand for admission to higher education institutions, combined with the slower expansion of supply, has created the need to apply selection criteria, entrance examinations among them. This work examines the relationships between the results obtained by students in the knowledge areas assessed by the EXANI-II and their academic trajectory. We analysed the available information on the EXANI-II results and the academic trajectory of 6,937 first-year students admitted to the Universidad Veracruzana (UV) in 1998. Conditional independence tests and simple correlation measures were used. Without exhausting all the possibilities, the data analysis suggests the degree of association between examination scores and university performance.
Selection Examination and Academic Trajectory (Examen de Selección y Trayectoria Escolar).
Although the use of examinations as instruments for deciding admission to higher education institutions (IES) has become widespread, few studies explore their ability to determine the probability of success of those admitted. This work examines the relationships between the results obtained by students in the knowledge areas assessed by the Examen Nacional de Ingreso a la Educación Superior (EXANI-II) and their academic trajectory. We analysed the available information on 6,937 first-year students admitted to the Universidad Veracruzana (UV) in 1998. Conditional independence tests and correlation measures were used. Without exhausting all the possibilities, the data analysis suggests the degree of association between examination scores and university performance, and the probability of success is also calculated.
Modeling Aceto-White Temporal Patterns to Segment Colposcopic Images.
The colposcopy test is the second most used technique for diagnosing cervical cancer. Some researchers have proposed using the temporal changes intrinsic to colposcopic image sequences to characterize cervical lesions automatically. Under this approach, every single pixel of the image is represented as a time series of length equal to the sampling frequency times acquisition points. Although this approach seems to show promising results, the data analysis procedures have to deal with huge data sets that grow rapidly with the number of cases (patients) considered in the analysis. In the present work, we perform principal component analysis (PCA) to reduce the dimensionality of the data in order to facilitate similarity measures for classification and clustering. The importance of this work is that we propose a model to parameterize the dynamics of the system using an efficient representation, obtaining a data compression ratio of 1.11% and a clustering similarity of 0.78. The feasibility of the proposed model is shown by testing the similarity of the clusters generated by the k-means algorithm over the raw data and over the compressed representation of real data.
Comparison of the Performance of Seven Classifiers as Effective Decision Support Tools for the Cytodiagnosis of Breast Cancer: A Case Study.
We evaluate the performance of seven classifiers as potential decision support tools in the cytodiagnosis of breast cancer. To this end, we use a real-world database containing 692 fine-needle aspiration of the breast lesion cases collected by a single observer. The results show, on average, good overall classification performance in terms of five different tests: accuracy of 93.62%, specificity of 96%, PV+ of 92% and PV- of 94.5%. With this comparison, we identify and discuss the advantages and disadvantages of each of these approaches. Finally, based on these results, we give some advice on selecting a classifier according to the user's needs.
On the Possibility of Reliably Constructing a Decision Support System for the Cytodiagnosis of Breast Cancer.
We evaluate the performance of three Bayesian network classifiers as decision support systems in the cytodiagnosis of breast cancer. In order to test their performance thoroughly, we use two real-world databases containing 692 cases collected by a single observer and 322 cases collected by multiple observers, respectively. Surprisingly enough, these classifiers generalize well only on the former dataset. For the latter, the results given by such procedures show a considerable reduction in the sensitivity and PV- tests. These results suggest that different observers see different things: a problem known as interobserver variability. Thus, it is necessary to carry out more tests to identify the cause of this subjectivity.
Automatic Construction of Bayesian Network Structures by Means of a Concurrent Search Mechanism.
The knowledge implicit in databases can be extracted automatically. One of several approaches to this problem is the construction of graphical models that represent the relations among the variables and the regularities in the data. In this work the problem is addressed by means of a search-and-score algorithm. This kind of algorithm uses a heuristic search mechanism and a scoring function to guide itself towards the best possible solution. The algorithm, which is implemented in the semi-functional language Lisp, searches for the structure of a Bayesian network (BN) using concurrent processes. Each process is assigned to a node of the BN and performs one of three possible operations between its node and one of the others: adding, removing or reversing an edge. The structure is built using the MDL metric (made up of three terms), which is computed in a distributed way; the search is thus guided by selecting those operations between nodes that minimize the MDL of the network. We present some results of the algorithm, comparing the structure of the obtained network with its gold-standard network.
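The three edge operations and the score-guided selection can be sketched as follows. This is a sequential Python illustration, not the concurrent Lisp implementation described above, and the `score` argument is a placeholder for an MDL function that the abstract does not spell out; all names are hypothetical.

```python
from itertools import permutations

def has_cycle(nodes, edges):
    """Return True if the directed graph (nodes, edges) contains a cycle."""
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
    state = {n: 0 for n in nodes}  # 0 = unvisited, 1 = in progress, 2 = done

    def visit(n):
        state[n] = 1
        for c in children[n]:
            if state[c] == 1 or (state[c] == 0 and visit(c)):
                return True
        state[n] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in nodes)

def neighbours(nodes, edges):
    """Yield (operation, edge, new_edge_set) for every acyclic one-edge change."""
    edges = frozenset(edges)
    for a, b in permutations(nodes, 2):
        if (a, b) in edges:
            yield "remove", (a, b), edges - {(a, b)}
            reversed_edges = (edges - {(a, b)}) | {(b, a)}
            if not has_cycle(nodes, reversed_edges):
                yield "reverse", (a, b), reversed_edges
        elif (b, a) not in edges:
            added = edges | {(a, b)}
            if not has_cycle(nodes, added):
                yield "add", (a, b), added

def greedy_step(nodes, edges, score):
    """Return the neighbouring structure with the lowest score, if it improves."""
    current = frozenset(edges)
    best = min(neighbours(nodes, current), key=lambda move: score(move[2]))
    return best[2] if score(best[2]) < score(current) else current

nodes = ["A", "B", "C"]
edges = {("A", "B"), ("B", "C")}
moves = list(neighbours(nodes, edges))
```

Repeating `greedy_step` until the structure stops changing gives a simple hill climber; a real score would be computed from the data rather than from the edge set alone.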
How Good are the Bayesian Information Criterion and the Minimum Description Length Principle for Model Selection? A Bayesian Networks Analysis.
The Bayesian Information Criterion (BIC) and the Minimum Description Length Principle (MDL) have been widely proposed as good metrics for model selection. Such scores basically include two terms: one for accuracy and the other for complexity. Their philosophy is to find a model that properly balances these terms. However, it is surprising that both metrics often do not work very well in practice because they overfit the data. In this paper, we present an analysis of the BIC and MDL scores, using the framework of Bayesian networks, that supports such a claim. To this end, we carry out different tests that include the recovery of gold-standard network structures as well as the construction and evaluation of Bayesian network classifiers. Finally, based on these results, we discuss the disadvantages of both metrics and propose some future work to examine these limitations more deeply.
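As a point of reference, one commonly used two-term form of these scores for a Bayesian network $B$ learned from a dataset $D$ of $n$ cases is given below. The symbols ($k$ for the number of free parameters, $\hat{\Theta}$ for the maximum-likelihood parameters) are notation assumed here, and the paper's exact definitions may differ.

```latex
\mathrm{MDL}(B \mid D) = -\log P(D \mid B, \hat{\Theta}) + \frac{k}{2}\log n,
\qquad
\mathrm{BIC}(B \mid D) = \log P(D \mid B, \hat{\Theta}) - \frac{k}{2}\log n = -\mathrm{MDL}(B \mid D)
```

The log-likelihood term measures accuracy (fit to the data) and the $\frac{k}{2}\log n$ term penalises complexity, so in this form minimising MDL is equivalent to maximising BIC.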
Diagnosis of Chronic Idiopathic Inflammatory Bowel Disease Using Bayesian Network.
In this paper, we evaluate the effectiveness of four Bayesian network classifiers as potential tools for the histopathological diagnosis of chronic idiopathic inflammatory bowel disease (CIIBD) using a database containing endoscopic colorectal biopsies. CIIBD is the generic term for two ailments known as Crohn's disease and ulcerative colitis. The results show that the defined histological attributes, considered relevant in the medical literature, are good for the diagnosis of CIIBD samples (Crohn's disease and ulcerative colitis combined into a single category) but less good for the explicit distinction between Crohn's disease and ulcerative colitis. The findings suggest an intrinsic impossibility of selecting a set of features that achieves a good balance between sensitivity and specificity for Crohn's disease and ulcerative colitis.
Bayesian Model Combination and its Application to Cervical Cancer Detection.
We have developed a novel methodology to combine several models using a Bayesian approach. The method selects the most relevant attributes from several models and produces a Bayesian classifier that has a higher classification rate than any of them and, at the same time, is very efficient. Based on conditional information measures, the method eliminates irrelevant variables, and joins or eliminates dependent variables, until an optimal Bayesian classifier is obtained. We have applied this method to the diagnosis of precursor lesions of cervical cancer. The temporal evolution of the color changes in a sequence of colposcopy images is analyzed, and the resulting curve is fitted to an approximate model. In previous work we developed three different mathematical models to describe the temporal evolution of each image region and, based on each model, to detect regions that could have cancer. In this paper we combine the three models using our methodology and show very high accuracy for cancer detection, superior to that of any of the three original models.
Digital Image Processing of Functional Magnetic Resonance Images to Identify Stereo-Sensitive Cortical Regions Using Dynamic Global Stimuli.
Functional magnetic resonance images (fMRI) were analyzed to investigate the cortical regions involved in stereoscopic vision using red/green anaglyphs to present random dot stereograms. Two experiments were conducted, both of which required high attentional demands. In the first experiment the subjects were instructed to follow the path of a square defined by depth and moving in the horizontal plane, contrasted with a similar sized square defined by a slight difference in luminance. Three main regions were identified: V3A, V3B and BA7. To test that the observed activations were not produced by the pursuit eye movements, a second experiment required the subjects to fixate whilst a shape was presented in different random orientations. Our results suggest that areas V1, V3A and precuneus are involved in stereo disparity processing. We hypothesise that the activation of the V3B region was produced by the second order motion component induced by the spatio-temporal changes in disparity.
Bayes-N: An Algorithm for Learning Bayesian Networks from Data using Local Measures of Information Gain Applied to Classification Problems
Bayes-N is an algorithm for learning Bayesian networks from data based on local measures of information gain, applied to problems in which there is a given dependent or class variable and a set of independent or explanatory variables from which we want to predict the class variable on new cases. Given this setting, Bayes-N induces an ancestral ordering of all the variables, generating a directed acyclic graph in which the class variable is a sink variable, with a subset of the explanatory variables as its parents. It is shown that classification using these variables as predictors performs better than the naive Bayes classifier, and at least as well as other algorithms that learn Bayesian networks, such as K2, PC and Bayes-9. It is also shown that the MDL measure of the networks generated by Bayes-N is comparable to those obtained by these other algorithms.
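To make "information gain" concrete, the sketch below computes the textbook quantity H(class) - H(class | variable) for discrete data. It illustrates the general measure only; how Bayes-N uses such local measures to order variables and select parents is not specified in the abstract, and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(feature, target):
    """H(target) - H(target | feature) for two aligned discrete sequences."""
    n = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - conditional

# Hypothetical toy data: one fully informative and one uninformative variable
target = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]
gain_hi = information_gain(informative, target)
gain_lo = information_gain(uninformative, target)
```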
Evaluation of the Potential of Bayesian Networks on the Classification of Medical Data (Evaluación del Potencial de Redes Bayesianas en la Clasificación de Datos Médicos).
In this paper, we present the evaluation of Bayesian networks on the classification of medical data. Their qualitative and quantitative nature permits representing the probabilistic relationships among variables as well as carrying out inferences such as prediction, diagnosis and decision-making. The medical area has used them for the analysis and processing of data. Here, we evaluate the performance of Bayesian networks on medical databases related to diseases such as breast cancer, tumors, diabetes and hepatitis. In order to carry out such a task, we tested different Bayesian networks, which are a powerful and reliable tool for diagnosis and decision-making in this area.
Entropy Based Linear Approximation Algorithm for Times Series Discretization.
Time series data mining is a relatively new sub-area of data mining, in which the temporal dimension of data introduces new challenges for classification and clustering tasks. The huge amount of information contained in temporal databases requires efficient representations, not only to reduce dimensionality but also to preserve the relevant information for efficient classification. Many approaches have been proposed to represent temporal data in discrete form. However, most of them are oriented to data compression rather than to information maximization. In this work we propose a new time series discretization algorithm called EBLA2. The basic idea behind EBLA2 is to minimize the entropy of the temporal patterns over their class labels after finding a minimum set of intervals from which the continuous values of the temporal databases can be discretized. Under a similar approach, the algorithm is able to find the minimum time series length needed to represent the complete time series database.
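The notion of choosing intervals that minimize class-label entropy can be illustrated with a generic entropy-based binary cut. This is a simplified stand-in and not the EBLA2 algorithm itself, whose details are not given in the abstract; the function names and values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut(values, labels):
    """Return (cut, weighted_entropy) for the binary split with lowest class entropy."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary can be placed between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        h = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if h < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, h)
    return best

# Hypothetical continuous values with their class labels
values = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
labels = ["n", "n", "n", "a", "a", "a"]
cut, h = best_cut(values, labels)
```

Applying the cut recursively to each side would yield several intervals; a stopping rule is needed to keep the set of intervals small.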
Knowledge Discovery in Databases using Bayesian Networks (Descubrimiento de Conocimiento en Bases de Datos usando Redes Bayesianas).
One of the main goals of a relatively new scientific discipline known as Knowledge Discovery in Databases is to provide methods capable of finding patterns, regularities and knowledge implicitly contained in data, so that we can better understand the phenomenon under study. Because the amount of data in almost any knowledge domain grows rapidly, new approaches are needed that can process it quickly, efficiently and reliably. To achieve this goal, the discipline combines ideas and techniques from a variety of areas such as databases, statistics, machine learning, artificial intelligence, neural networks and data visualization.
Classification of Temporal Patterns to Characterize Cervical Lesions in Colposcopic Images (Clasificación de Patrones Temporales para Caracterizar Lesiones Cervico Uterinas en Imágenes Colposcópicas).
This work proposes a methodology for analysing and classifying temporal patterns extracted from colposcopic images in order to characterize cervical lesions. The colposcopic images were acquired under white light and represented in several color spaces to identify which of them gives the best characterization of the time series. A supervised learning approach was chosen for the time series classification. Classification was performed with the k-nearest neighbours algorithm, and k-fold cross-validation was used to evaluate the performance of the classifier. The preliminary results of this work in progress are encouraging, reaching a sensitivity of 59%, which is expected to improve as more cases are included.
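The k-fold cross-validation procedure used to evaluate the classifier can be sketched as follows. The folds here are contiguous and unshuffled for simplicity, the majority-class baseline merely stands in for a real classifier such as k-NN, and all names and data are hypothetical.

```python
from collections import Counter

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(samples, labels, k, fit_predict):
    """Return the mean accuracy of `fit_predict` over k folds.

    `fit_predict(train_x, train_y, test_x)` must return one label per test item.
    """
    scores = []
    for test_idx in k_fold_indices(len(samples), k):
        held_out = set(test_idx)
        train_x = [s for i, s in enumerate(samples) if i not in held_out]
        train_y = [l for i, l in enumerate(labels) if i not in held_out]
        test_x = [samples[i] for i in test_idx]
        predicted = fit_predict(train_x, train_y, test_x)
        correct = sum(p == labels[i] for p, i in zip(predicted, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

def majority_baseline(train_x, train_y, test_x):
    """Placeholder classifier: always predict the most common training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return [most_common for _ in test_x]

# Hypothetical data: 8 cases of class "a" followed by 2 of class "b"
samples = list(range(10))
labels = ["a"] * 8 + ["b"] * 2
mean_accuracy = cross_validate(samples, labels, k=5, fit_predict=majority_baseline)
```

In practice the cases would usually be shuffled or stratified by class before splitting, so that every fold contains both classes.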
Assessing Cervical Cancer Lesion Predictability using Aceto-White Temporal Patterns with Bayesian Network Learning.
The colposcopic test is the second most used technique for diagnosing cervical cancer. In previous work, we proposed an analysis methodology to automatically segment colposcopic images using the temporal patterns, which produces a compact representation that facilitates similarity measures for classification. In the present work, we used different Bayesian network algorithms to assess their predictability scores when classifying different temporal patterns related to precursor lesions of cervical cancer. The aim of this work is to show evidence of the viability of this machine learning framework for segmenting colposcopic images.
Students, Entrance Examination and Academic Trajectory (Estudiantes, Examen de Ingreso y Trayectoria Escolar).
Higher education institutions found in selection examinations an instrument for leaving behind procedures lacking academic logic and for offering processes that guarantee applicants transparent criteria. These instruments are characterized by the soundness of their construction and application, in accordance with international standards. Designed to explore applicants' reasoning ability and mastery of content, they make it possible to rank students according to their results and to support the admission decision. The premise is that students with higher results are more likely to perform better in higher education. This work examines the relationships between the results obtained by students in the knowledge areas assessed by the EXANI-II, their high school performance and their academic trajectory. We analysed the available information on the EXANI-II results of 8,363 first-year students admitted to the Universidad Veracruzana in 2000. Conditional independence tests were used. Without exhausting all the possibilities, the data obtained are described, and their analysis suggests the degree and direction of the association between examination scores and university performance.
A Parsimonious Constraint-based Algorithm to Induce Bayesian Network Structures from Data.
In this paper, we present a novel algorithm, called MP-Bayes, which induces Bayesian network structures from data based on entropy measures. One of the main features of this method is its parsimonious nature: it tends to represent the joint probability distribution underlying the data with the least number of arcs. While other methods that build Bayesian networks tend to overfit the data, MP-Bayes creates models that seem to have an adequate trade-off between accuracy and complexity. To support such a claim, we compare the performance of MP-Bayes, in terms of classification, against that of four different Bayesian network classifiers. The results show that our procedure generalises well in a wide range of situations.
Cervical Cancer Detection using Colposcopic Images: A Temporal Approach.
In the present work we propose a methodology for analysing colposcopic images to help the expert make a robust diagnosis of precursor lesions of cervical cancer. Although other approaches have been used to assess cervical lesions, a complete methodology to evaluate temporal changes of tissue color is still missing. The different processes involved in the analysis are described. Image registration was implemented using the phase correlation method followed by a locally applied algorithm based on the normalized cross-correlation. During the parameterization process, each time series obtained from the image sequences was represented as a parabola in a parameter space. A supervised Bayesian learning approach is proposed to classify the features in the parameter space according to the classification made by the colposcopist. Those labels are then used as a criterion to categorize the tissue and perform the image segmentation. Some preliminary results are shown using unsupervised learning with real data.
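Representing a time series as a parabola amounts to reducing it to three coefficients. The sketch below assumes an ordinary least-squares fit of a*t^2 + b*t + c, which the abstract does not confirm as the fitting method; the function name and sample values are hypothetical.

```python
def fit_parabola(times, values):
    """Least-squares fit of values ~ a*t**2 + b*t + c; returns (a, b, c).

    Needs at least three distinct time points.
    """
    # Normal equations (X^T X) p = X^T y, with design rows [t^2, t, 1]
    rows = [[t * t, t, 1.0] for t in times]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the augmented 3x4 system
    m = [ata[i] + [aty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    # Back substitution
    params = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        known = sum(m[i][j] * params[j] for j in range(i + 1, 3))
        params[i] = (m[i][3] - known) / m[i][i]
    return tuple(params)

# Hypothetical pixel intensity over time, generated from -0.5*t^2 + 2*t + 1
times = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [1.0, 2.5, 3.0, 2.5, 1.0]
a, b, c = fit_parabola(times, values)
```

Each pixel's series then becomes a single point (a, b, c) in parameter space, which is the kind of compact feature a classifier can work with.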
Is the Signal-to-Noise Distance Enough to Determine the Gene-Markers' Class Label? A Preliminary Study using Bayesian Networks.
Tumour gene markers are intended to be molecular "fingerprints" for the effective identification of specific cancers. A representative feature is the distance between such markers and the actual phenotype. Intuitively, the smaller the distance, the better the estimate of the class to which the markers belong. In this paper, we assess the significance of this feature, along with three more features, for discriminating the class label using three Bayesian network classifiers. The results show that the "distance" feature helps discriminate between a normal sample and a malignant sample with high confidence, while this same feature is not highly significant for discriminating among different types of cancer.
Construction, Evaluation and Manipulation of Bayesian Networks from Biological Data (Construcción, Evaluación y Manipulación de Redes Bayesianas a partir de Datos Biológicos).
Applying different algorithmic methods, we present a general procedure for constructing, evaluating and manipulating Bayesian networks from biological data, with the aim of discovering new knowledge or confirming hypotheses. Several techniques are applied in the different phases of the procedure to show the range of possibilities of analysis with Bayesian networks. We present an example that uses a database of experimental origin; however, to corroborate the usefulness of the methods, we list several cases in which databases of a different nature and source have been used. To build the example, we used our own computational laboratory for data mining, MiningLab, since we did not find a software tool with a versatile, intuitive interface that brought together several algorithms for each phase of the procedure.
Aceto-White Temporal Pattern Classification using Naïve Bayes to Identify Precancerous Cervical Lesion in Colposcopic Images.
Cervical cancer is a global disease. However, the highest incidence and mortality rates have been found in Africa, Asia and Latin America. After the Pap smear test, colposcopy is the most widely used technique for diagnosing this disease because of its higher sensitivity (98%) and specificity (48%). However, the major problem with this technique is its intrinsic subjectivity, which calls for new mechanisms that provide near-absolute quantitative measurements to improve the test. One of the most promising approaches to this goal has been the use of the aceto-white temporal patterns intrinsic to the color changes in digital images. Although some efforts have been made to characterize cervical lesions using these temporal patterns, there is still no complete understanding of how to use them to segment colposcopic images. In this work, we explored the use of the supervised machine learning Naïve Bayes classification algorithm over a discretized version of the aceto-white temporal pattern to automatically discriminate between normal and abnormal cervical tissue.
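A Naïve Bayes classifier over a discretized pattern treats each time position as a categorical feature that is conditionally independent given the class. The sketch below assumes a three-symbol alphabet (L, M, H) and Laplace smoothing, neither of which comes from the paper; the training patterns are invented.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(patterns, labels, symbols):
    """Count class frequencies and per-position symbol frequencies."""
    class_counts = Counter(labels)
    counts = defaultdict(Counter)  # (label, position) -> symbol counts
    for pattern, label in zip(patterns, labels):
        for pos, sym in enumerate(pattern):
            counts[(label, pos)][sym] += 1
    return class_counts, counts, list(symbols)

def predict(model, pattern):
    """Return the class with the highest log-posterior for `pattern`."""
    class_counts, counts, symbols = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for pos, sym in enumerate(pattern):
            # Laplace-smoothed likelihood of this symbol at this position
            score += math.log((counts[(label, pos)][sym] + 1) / (n_label + len(symbols)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical discretized patterns: L = low, M = medium, H = high intensity
patterns = ["LHHH", "LHHM", "LLLL", "LMLL"]
labels = ["abnormal", "abnormal", "normal", "normal"]
model = train_naive_bayes(patterns, labels, symbols="LMH")
```

Smoothing keeps a symbol that was never seen for a class from forcing that class's probability to zero.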
Explorations of the BDI Multi-Agent Support for the Knowledge Discovery in Databases Process.
Knowledge Discovery in Databases (KDD) is the process of finding valid, novel, useful and understandable patterns in data, either to verify the user's hypotheses or to describe or predict the future behavior of some event. The KDD process involves diverse techniques provided by tools like the Waikato Environment for Knowledge Analysis (WEKA), but usually without guidance. This work is an exploration of the use of Multi-Agent Systems (MAS) methodologies and tools to provide support in the KDD process while using such tools. The Belief-Desire-Intention (BDI) model of agency provides the right level of abstraction to approach this problem. First, the Prometheus methodology is used to analyse the KDD process in terms of a MAS of BDI agents. Then, a MAS of decision tree inducers and Bayesian network builders, which compete to generate the "best" hypothesis for a given KDD problem, is implemented. The main result of this exploration is a framework in which it is possible to implement AgentSpeak(L) agents that use primitive actions of WEKA to form intentions for solving problems in the KDD process. Extensions in terms of the number of agents and their capabilities are easy to implement in this framework.
Computer Programs and their Legal Protection in Mexico (Los Programas de Cómputo y su Protección Jurídica en México).
If all human beings had ethical awareness and moral integrity, there would be no need for what we know as copyright, whose protection arises from the impossibility of governing these situations with the rules of ordinary property law, owing to the "... immaterial nature of its object and the presence of the author's non-pecuniary interests in the work". It was therefore necessary to create special legislation for the protection of copyright and, in general, of intellectual property. This paper discusses how the creators of computer programs protect their works through copyright, and the worldwide trend towards harmonizing criteria in this area in order to give authors greater security and encourage their creative activity. It also criticizes this same right which, when misunderstood, can lead to a monopoly on knowledge and act as a brake on scientific and technological progress.
The New Educational Model: The Teacher's Odyssey (El Nuevo Modelo Educativo: La Odisea del Profesor).
We are witnessing an international trend towards changing traditional educational models, understood as those in which teaching plays the dominant role while learning is regarded as a by-product of it. The change of vision, which implies a learning-centred education in which the main character is, of course, the student, poses a series of challenges for all educational actors. Students, teachers and institutions face the demand to reconsider what until recently was considered appropriate: that the student is a vessel into which the teacher pours wisdom, wisdom that the student receives with relatively little effort. The institutional challenge in this respect is significant and, clearly, the one facing the student is no less so; in this essay, however, we chose to address the challenges that teachers face on recognizing this paradigm shift.
Choosing the Value of K for the K-Nearest Neighbour Algorithm (Elección del Valor K para el Algoritmo K-Vecino más Cercano).
This article describes the algorithm known as "K-Nearest Neighbours", which some authors consider one of the best classifiers. When using the algorithm, however, the question arises of which value of K should be chosen to obtain reliable classification results. To this end, we present an analysis using the WEKA system, in which a comparative study was carried out to select the value of K that gives the most reliable classification. Based on the results obtained, it is possible to suggest to the decision maker which value of K may be most suitable under certain parameters, giving a degree of certainty in the classification when using this algorithm.
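One common way to compare candidate values of K is to estimate the accuracy of each by leave-one-out validation and keep the best. The sketch below illustrates that idea in plain Python on one-dimensional toy data; it is not the WEKA procedure used in the article, and the data and function names are hypothetical.

```python
from collections import Counter

def knn_label(train, query, k):
    """Majority label among the k training points closest to `query` (1-D features)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out accuracy of k-NN on `data`, a list of (value, label) pairs."""
    hits = 0
    for i, (value, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out the i-th case
        hits += knn_label(rest, value, k) == label
    return hits / len(data)

def choose_k(data, candidates):
    """Return the candidate K with the highest leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(data, k))

# Hypothetical data: two clusters plus one noisy "b" inside the "a" cluster
data = [(1.0, "a"), (1.2, "a"), (1.4, "a"),
        (3.0, "b"), (3.2, "b"), (3.4, "b"), (1.3, "b")]
best_k = choose_k(data, [1, 3, 5])
```

On this toy set K=1 is misled by the noisy point and K=5 reaches into the other cluster, which is the kind of trade-off a comparative study of K examines.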
Comparative Analysis of the Performance of Data Mining Classifiers for Use as a Tool in the Knowledge Management Process (Análisis Comparativo en el Desempeño de los Clasificadores de Minería de Datos para su empleo como Herramienta en el Proceso de Administración del Conocimiento).
This article presents a comparative study of the performance of the most widely used classifiers in data mining (C4.5, Bayesian networks and K-nearest neighbour) through an analysis with the WEKA system. Based on the results, we seek to indicate to the decision maker which classifier may be the most suitable, depending on the state of the available information, and thereby to generate knowledge that brings greater certainty to the decision-making process.
A Computational Tool for the Analysis of Self-Organizing Maps (Una Herramienta Computacional para el Análisis de Mapas AutoOrganizados).
The SOM (Self-Organizing Maps) artificial neural network is an unsupervised learning model that has been applied with extraordinary success in a wide variety of research fields. However, its operation has not been validated mathematically. For example, the conditions under which the projection to which the SOM learning algorithm converges preserves the topology of the data set have not been characterized. Taking advantage of the current capabilities of computing equipment, exploratory computational research can be carried out to deepen our knowledge of these algorithms and to discover mathematical results that justify their use in various application scenarios. With this aim, the Laboratorio de Dinámica no Lineal of the Facultad de Ciencias at UNAM has developed a software prototype called LabSOM (laboratory for computational research on the SOM algorithm), which allows the user to design experiments by generating data with a predetermined structure. This work presents a prototype of the system and an example that illustrates its usefulness for investigating this algorithm.
Dynamical Systems and Infometric Visualization: An Application of the SOM Neural Network (Sistemas Dinámicos y Visualización Infométrica: Una Aplicación de la Red Neuronal SOM).
The development of methods in informatics and computational intelligence has made it possible to exploit the vast stores of information available in digital format. A few decades ago, data analysis was based mainly on statistical methods. These methods start from pre-established models, and the task is to determine how well one of these models describes the data. The methods of computational intelligence involve learning processes that create models or representations, not established in advance, from the data. Learning processes are treated mathematically using dynamical systems, many of them inspired by the way living beings learn. Outstanding examples of these learning processes are Artificial Neural Networks (ANN). Some ANNs, such as the Self-Organizing Maps (SOM) algorithm, have proved useful for assisting knowledge discovery and bibliometric analysis. Methodologies such as ViBlioSOM (Bibliometric Visualization with the SOM algorithm) are used successfully in the analysis of the large volumes of information contained in the science and technology text databases available today. To illustrate the visualization capabilities of this technology, we present an example in which several maps are obtained automatically from a large set of MedLine records with the help of the ViBlioSOM system. The analysis of these maps yields useful information for appreciating how mathematics is being used as a tool in biomedical research.
A Method Based on Genetic Algorithms and Fuzzy Logic to Induce Bayesian Networks.
A method for inducing Bayesian networks from data is proposed to overcome some limitations of other learning algorithms. One of the main features of this method is a metric that evaluates Bayesian networks by combining different quality metrics in a fuzzy system. Within this fuzzy system, a classification metric is also proposed, a criterion that is not often used to guide the search when learning Bayesian networks. Finally, the fuzzy system is integrated into a genetic algorithm, resulting in a robust and flexible learning method whose performance is in the range of the best Bayesian network learning algorithms developed to date.
Digital Image Processing of Functional Magnetic Resonance Images to Identify Stereo-Sensitive Cortical Regions Using a Global Stereo Stimuli.
Functional magnetic resonance images (fMRI) were analyzed to investigate the cortical regions involved in stereoscopic vision, using red/green anaglyphs to present random dot stereograms. Two experiments were conducted, both of which placed high attentional demands on the subjects. In the first experiment, the subjects were instructed to follow the path of a square defined by depth and moving in the horizontal plane, contrasted with a similar-sized square defined by a slight difference in luminance. Three main regions were identified: V3A, V3B and BA7. To test that the observed activations were not produced by the pursuit eye movements, a second experiment required the subjects to fixate whilst a shape was presented in different random orientations. Our results suggest that areas V1, V3A and precuneus are involved in stereo disparity processing. We hypothesise that the activation of the V3B region was produced by the second-order motion component induced by the spatio-temporal changes in disparity.
Stereo-Sensitive Cortical Regions Identified Using Shape Discrimination from Stereopsis: an FMRI Study.
Functional magnetic resonance images (fMRI) were analyzed to investigate the cortical regions involved in stereoscopic vision, using red/green anaglyphs to present random dot stereograms. In previous work, we developed an experiment in which the subjects were instructed to follow the path of a square defined by depth and moving in the horizontal plane, contrasted with a similar-sized square defined by a slight difference in luminance. Three main regions were identified: V3A, V3B and BA7. To test that the observed activations were not produced by the pursuit eye movements, the second experiment, reported here, required the subjects to fixate whilst a shape was presented in different random orientations. The subject's task was to press a button when the shape was presented in a particular orientation. The sites of the activations found with this procedure were consistent with those identified in our previous experiment. In agreement with other fMRI studies, our results suggest that areas V1, V3A and precuneus (BA7) are involved in stereo disparity processing. We hypothesise that the activation of the V3B region was produced by the second-order motion component induced by spatio-temporal changes in disparity (stereoscopic motion). We found no evidence for the involvement of the V5 area in the processing of stereoscopic stimuli.
Admission Examination and Academic Trajectory [Examen de Selección y Trayectoria Escolar]
Although the use of examinations as instruments for deciding admission to higher education institutions (IES, instituciones de educación superior) has become widespread, few studies explore their capacity to determine the probability of success of those admitted. This is an approach to the relationships between the results obtained by students in the knowledge areas covered by the Examen Nacional de Ingreso a la Educación Superior (EXANI-II) and their academic trajectory. The available information on 6,937 students who first enrolled at the Universidad Veracruzana (UV) in 1998 was analyzed. Conditional independence tests and correlation measures were used. Without exhausting all possibilities, the analysis of the data suggests the degree of association between examination scores and university performance, in addition to calculating the probability of success.
An Algorithm for Generating Bayesian Networks from Statistical Data [Un Algoritmo para Generar Redes Bayesianas a partir de Datos Estadísticos]
A probabilistic network is a graphical representation for handling uncertainty in expert systems. Within this field there are two approaches to building the system: the traditional approach and the learning approach. In the traditional approach, the topology or structure of the network and the parameters associated with that topology are proposed by the human expert. In the learning approach, both the topology and the parameters are determined from a sample of statistical data without the direct intervention of the human expert (in most cases). To date, the search for learning schemes for probabilistic networks is an open and highly active research field. In this work we propose an original algorithm that "learns" the topology of the probabilistic network from a sample of statistical data by means of a series of statistical tests of conditional independence. The algorithm is applied to a class of problems in which the variables are divided into dependent and independent variables.
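The abstract does not say which statistical test the algorithm uses. As one common choice, the sketch below computes Pearson's chi-square statistic for testing whether two discrete variables X and Y are independent given a third variable Z. The function name `conditional_chi2` and the toy data are hypothetical, and the sketch covers only a single test, not the proposed structure-learning algorithm.

```python
from collections import Counter


def conditional_chi2(samples):
    """Pearson chi-square statistic and degrees of freedom for X independent of Y given Z.

    samples: iterable of (x, y, z) tuples of discrete observations.
    """
    samples = list(samples)
    n_xyz = Counter(samples)
    n_xz = Counter((x, z) for x, _, z in samples)
    n_yz = Counter((y, z) for _, y, z in samples)
    n_z = Counter(z for _, _, z in samples)
    xs = {x for x, _, _ in samples}
    ys = {y for _, y, _ in samples}

    stat = 0.0
    for z, nz in n_z.items():
        for x in xs:
            for y in ys:
                # Expected count in stratum z if X and Y were independent there.
                expected = n_xz[(x, z)] * n_yz[(y, z)] / nz
                if expected > 0:
                    stat += (n_xyz[(x, y, z)] - expected) ** 2 / expected
    # Assumption: every level of X and Y occurs in every stratum of Z.
    dof = (len(xs) - 1) * (len(ys) - 1) * len(n_z)
    return stat, dof


# Toy data: X and Y independent given Z, versus X always equal to Y.
independent = [(x, y, z) for z in (0, 1) for x in (0, 1) for y in (0, 1)] * 5
dependent = [(v, v, z) for z in (0, 1) for v in (0, 1)] * 10
print(conditional_chi2(independent))  # statistic 0.0 with 2 degrees of freedom
print(conditional_chi2(dependent))    # statistic 40.0 with 2 degrees of freedom
```

A structure-learning procedure of this family would compare the statistic with a chi-square critical value for the returned degrees of freedom (about 5.99 at the 5% level for 2 degrees of freedom) and keep or drop the link between X and Y accordingly.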