Schedule at a Glance

Time    | Assembly Hall | Campus Center 375
8:30am  | Welcome: Sue Faerman |
9:00am  | Keynote: Andy Deitsch |
        | The Industrial Internet |
10:00am | Mark Hughes | Loni Hagen
        | Tracking Corporate Culture: Using Text Analysis to Track Non-Financial Disclosures in Corporate Filings | Understanding citizen's direct policy suggestions to the federal government: a natural language process and machine learning approach
10:30am | Daniel Zhang | Mehrdad Mirzaei
        | Bus Stop Usage Evaluation and BRT Station Selection Strategy by Machine Learning Methods | Connecting Twitter activities with e-petition signatures using PSL
11:00am | Weijia Ran | Nic DePaula
        | Market Diffusion of Enabling Technologies for Sustainable Consumption and Production | Social media communication in the public sector: a content analysis
11:30am | Lunch and Poster Presentations in the Student Center |
1:30pm  | Keynote: Peter Aiken |
        | The Case for the Chief Data Officer: Recasting the C-Suite to Leverage your Most Valuable Asset |
2:30pm  | Catherine Dumas | Grace Begany
        | Project Petition: Examining Collective Action in Networks of Online Petitioning Behavior in WeThePeople Using Association Rules and Community Detection. | Investigating Health Information Behavior: A Mobile Diary Study
3:00pm  | Dima Kassab & Luis Ibanez | Jami Cotler
        | Analysis of an Open Source Community using gameful design | Cyberbullying in online gaming environments; A look at the psychological risk factors for adults.
3:30pm  | Tino DeMarco | Ning Sa
        | The Listing and Selling of Real Estate: an ethnography | Investigating the Sources of Noise in Interactive Information Search


Andrew Deitsch

The Industrial Internet

The confluence of networked machines with embedded sensors, software, big data, advanced analytics tools, and millions of people using social networks has given rise to the next industrial revolution that we at GE call the Industrial Internet. This talk will describe what GE is doing in this space and how we are developing and using technology to drive significant financial benefits for the industries GE serves.

Andrew (Andy) Deitsch is the Technology Leader for the Computational Science & Architectures organization at GE Global Research. Among other responsibilities, he leads research and development in areas such as big data and knowledge representation, high performance computing, robotics and autonomous systems, the Internet of Things, and cyber security. Most notably, he is a CS alumnus of UAlbany and currently serves on the CCI Advisory Board.

Peter Aiken

The Case for the Chief Data Officer: Recasting the C-Suite to Leverage your Most Valuable Asset.

Yes, we face a data deluge and big data seems to be largely about how to deal with it. But much of what has been written about big data is focused on selling hardware and services. The truth is that until the concept of big data can be objectively defined, any measurements, claims of success, quantifications, etc. must be viewed skeptically. While both the need for and approaches to these new requirements are faced by virtually every organization, jumping into the fray ill-prepared has (to date) reproduced the same dismal IT project results.

Learning Objectives

  • The very real, very rapid, very great increases in data of all forms (charts showing data types and volume increases)
  • Challenges faced by virtually all data management programs
  • Means by which big data techniques can complement existing data management practices
  • Necessary but insufficient prerequisites to exploiting big data techniques
  • Prototyping nature of practicing big data techniques

Peter Aiken, Ph.D., is widely acclaimed as one of the top ten data management authorities worldwide. As a practicing data consultant, author and researcher, he has been actively performing in and studying data management for more than 30 years. Throughout his career, he has held leadership positions and consulted with more than 50 organizations in 20 countries across numerous industries, including defense, banking, healthcare, telecommunications and manufacturing. He is a highly sought-after keynote speaker and author of multiple publications, including his latest book, “The Case for the Chief Data Officer: Recasting the C-Suite to Leverage your Most Valuable Asset.” In addition to being Data Blueprint’s Founding Director, Peter is also Associate Professor of Information Systems at Virginia Commonwealth University and President of the International Data Management Association (DAMA).


Mark Hughes

Tracking Corporate Culture: Using Text Analysis to Track Non-Financial Disclosures in Corporate Filings

Assembly Hall, 10:00am
In the United States, in addition to financial disclosures, publicly-traded corporations are required to include various qualitative corporate disclosures (“QCDs”) in their public filings. What do these qualitative or non-financial disclosures reveal about changing corporate values and culture? This study employs a longitudinal analysis, identifying the trends in non-financial corporate disclosures during the ten-year period from 2003 to 2013.
This study also demonstrates an application of text analysis programs. In the past, researchers hand-culled qualitative disclosures from corporate filings. However, current text analysis (NLP) software provides a more efficient method to retrieve this information.

Loni Hagen

Understanding citizen's direct policy suggestions to the federal government: a natural language process and machine learning approach

Campus Center 375, 10:00am
Citizens use online petition platforms to make policy suggestions. However, electronic-government researchers and practitioners lack methods to efficiently measure the priorities citizens emphasize in petitions. To address this problem, we will use multiple unsupervised learning methods to cluster the “We the People” (WtP) petition documents and to uncover their underlying semantic structure. We will apply these methods to a collection of 1,800 petition texts from November 2011 to May 2013.
The major contributions of this study include introducing an unsupervised automatic clustering method to process large volumes of text, and finding naturally occurring policy suggestions in petitions. For policy makers, it is important to understand citizens’ policy suggestions that are not mediated by other channels. Unsupervised automatic clustering methods can help in understanding citizens’ direct policy suggestions, which are too numerous for humans to process manually.
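
As an illustration only, the sketch below shows one common way to turn petition texts into vectors before clustering: TF-IDF weighting followed by cosine similarity. The abstract does not name the methods it uses, so the weighting scheme, the function names, and the four sample petitions are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical petition texts (not taken from the WtP collection).
petitions = [
    "reduce gun violence through stricter gun laws",
    "protect gun rights and oppose new gun laws",
    "forgive student loan debt for graduates",
    "lower interest rates on student loan debt",
]

def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vectors = tfidf_vectors(petitions)
sim_gun_pair = cosine(vectors[0], vectors[1])   # two gun-related petitions
sim_cross = cosine(vectors[0], vectors[2])      # gun petition vs. loan petition
sim_loan_pair = cosine(vectors[2], vectors[3])  # two loan-related petitions
```

Petitions on the same topic share weighted terms and therefore score higher than unrelated ones; a clustering method can then group texts using these similarities.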

Tianchi Zhang

Bus Stop Usage Evaluation and BRT Station Selection Strategy by Machine Learning Methods

Assembly Hall, 10:30am
According to “Commuting in the United States 2009”, in that year 86.1% of Americans commuted by car, light truck, or van, and about three-quarters of these individuals drove alone, causing traffic congestion and raising environmental and energy-saving concerns in society. Therefore, transportation experts encourage the public to take public transportation and recommend the development of Bus Rapid Transit (BRT). Currently, bus service restructuring and BRT plans are based on rider surveys, community meetings and on-street interviews. However, these methods require large investments in manpower and material resources, and produce potentially biased results. In this paper, the author used machine learning, in which a computer program automatically analyzes a large body of data and calculates what information is most relevant, to evaluate current bus station usage and determine potential BRT station locations. The station features considered include passenger activities (getting on and getting off), station distance to the prior/next station, topography, and so on. The passenger data, collected by a local transit agency in 2008/2009 in Albany, NY, were classified by time and weather conditions. The author also studied GTFS data in depth and developed a GTFS data visualization website, using the Google Maps API, to retrieve station distance and topography data. After testing different algorithms, the EM algorithm and K-Means were determined to be the best algorithms for clustering the stations. While the machine learning strategy can successfully make comprehensive evaluations of all stops, it is inadequate where specific routes are concerned. Therefore, future research should focus on how to redesign stop features to cluster a specific route’s stops.
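
A minimal sketch of the K-Means step mentioned above, written in plain Python. It is not the author's implementation: the stop names, the two features (daily passenger activity and distance to the next stop), the feature scaling, and the choice of the first k points as starting centroids are all assumptions made for illustration. The EM algorithm is not shown.

```python
import math

# Hypothetical stops: (daily boardings plus alightings, miles to next stop).
stops = {
    "Stop A": (520, 0.30),
    "Stop B": (35, 0.20),
    "Stop C": (480, 0.40),
    "Stop D": (50, 0.25),
    "Stop E": (610, 0.50),
    "Stop F": (20, 0.30),
}

def min_max_scale(rows):
    """Rescale each feature column to [0, 1] so no single unit dominates."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [tuple((v - lo) / sp for v, lo, sp in zip(row, lows, spans))
            for row in rows]

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as the initial centroids."""
    centroids = list(points[:k])
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, labels

points = min_max_scale(list(stops.values()))
centroids, labels = kmeans(points, k=2)
clusters = dict(zip(stops, labels))  # stop name -> cluster index
```

With these made-up numbers the busy stops (A, C, E) fall into one cluster and the lightly used stops (B, D, F) into the other, which is the kind of grouping a planner could inspect when shortlisting BRT station candidates.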

Mehrdad Mirzaei

Connecting Twitter activities with e-petition signatures using PSL

Campus Center 375, 10:30am
The WethePeople website provides a new way to petition the government to take action on a specific issue facing the country. Petitioners and signers can use Twitter to solicit signatures for the petitions they recommend. We will use probabilistic soft logic (PSL) to automatically detect advocacy tweets that reflect specific petitions. PSL will automatically retrieve tweets related to the e-petition text. Our result is unique and important because it provides a novel method for presenting “smart” information to online petition users. Further, our algorithm can be widely used to retrieve and present only relevant tweets to users.

Weijia Ran

Market Diffusion of Enabling Technologies for Sustainable Consumption and Production

Assembly Hall, 11:00am
Entrepreneurial activities and business models describe ways to start and maintain a business. Empirical data show that they play an important role in bringing technology-based products or services to market. However, the role of entrepreneurial activities and business models in the diffusion process has not been systematically explored and discussed in the adoption and diffusion literature. The purpose of our study is to contribute to this area by exploring the role of entrepreneurship and business models in the diffusion process through a System Dynamics modeling and simulation approach. We built a simulation model based on technology-diffusion-related literature and empirical data collected through the process of implementing a sustainable consumption and production initiative. Our analysis of simulation experiment results shows that different entrepreneurial activities and business models lead to different diffusion paths and associated market behaviors.

Nicolau DePaula

Social media communication in the public sector: a content analysis

Campus Center 375, 11:00am
Social media has become a widely used technology among the general population. Although studies have begun to examine the uses of this technology in the context of government operations, most propositions about such usage come from interviews with government officials. Although analyses of actual social media communication exist, these are few and have relied on overly broad categories. In this study we conduct a content analysis to both explore and make propositions about the purposes for which governments are using social media. We focus on the "posts" of municipal agencies on the social network site Facebook. To complement the existing literature on government social media practice, we provide a set of descriptive categories to understand social media communication in the public sector.

Catherine Dumas

Project Petition: Examining Collective Action in Networks of Online Petitioning Behavior in WeThePeople Using Association Rules and Community Detection.

Assembly Hall, 2:30pm
We present the case of WethePeople (WtP), an unprecedented national experiment in the use of technology to create a new form of communication between individuals and the White House. WtP is a web-enabled petitioning system, which gives users an opportunity to petition the Federal Government for actions of the petitioner’s choosing and to obtain a response from the Administration if the petitioner can demonstrate sufficient support for the petition in question. The data used for this study were obtained from a White House database containing information about all petitions publicly available on the WtP website between Sept 22, 2011 and April 30, 2013. On December 21, 2012, President Obama pledged a national conversation about gun control in response to 33 petitions initiated since December 14, 2012, all focused on the topic of gun control and motivated by the Sandy Hook elementary school shootings. We focus particularly on 21 of the 33 petitions initiated during this week that were in opposition to gun control as a policy response to Sandy Hook, against the backdrop of the single largest petition to appear on WtP, which gathered over 195,000 signatures during that same period of time, along with 12 other petitions also advocating various gun control options. We use market basket analysis to explore questions about whether individuals who sign one anti-gun control petition also sign other anti-gun control petitions. We use community detection and social network analysis to determine if there are groups of individuals who sign similar anti-gun control petitions, thus suggesting the creation of “communities” of individuals whose actions are similarly aligned in opposition to gun control or in support of policy proposals that are alternatives to gun control. This paper will present some of our preliminary results and analysis.
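
To make the market basket idea concrete, here is a small sketch that treats each signer as a "basket" of petitions and computes the standard association rule measures (support, confidence, lift) for a rule such as "signed P1, therefore signed P2". The signer IDs, petition labels, and function name are hypothetical and are not drawn from the study's data.

```python
# Hypothetical signer -> set of petitions signed (illustrative only).
signatures = {
    "signer1": {"P1", "P2"},
    "signer2": {"P1", "P2", "P3"},
    "signer3": {"P1"},
    "signer4": {"P2", "P3"},
    "signer5": {"P1", "P2"},
}

def rule_stats(antecedent, consequent, baskets):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(baskets)
    n_a = sum(antecedent in b for b in baskets)
    n_c = sum(consequent in b for b in baskets)
    n_both = sum(antecedent in b and consequent in b for b in baskets)
    support = n_both / n                  # share of signers who signed both
    confidence = n_both / n_a             # P(consequent | antecedent)
    lift = confidence / (n_c / n)         # > 1 suggests positive association
    return support, confidence, lift

baskets = list(signatures.values())
support, confidence, lift = rule_stats("P1", "P2", baskets)
```

In this toy data, three of the four P1 signers also signed P2, so the confidence is 0.75; because P2 is popular overall, the lift comes out slightly below 1, which is why lift is usually read alongside confidence.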

Grace Begany

Investigating Health Information Behavior: A Mobile Diary Study

Campus Center 375, 2:30pm
People seeking health information increasingly use information and communication technologies (ICTs) to do so. The use of ICTs (e.g., computers, laptops, tablets, smartphones, and cellphones) to connect to critical health information resources is an important and growing activity. With eight out of ten Internet users, or 59% of all U.S. adults, looking for health information online, this activity ranks as the third most popular online pursuit, next to email and using a search engine. Within this population, a small percentage of health information seekers go online to find others who share the same health concerns, so-called “peer-to-peer healthcare,” which typically occurs via social media. Additionally, the use of mobile devices on the part of health information seekers has grown tremendously in the past two years, nearly doubling between 2010 and 2012.
The proposed research study seeks to further understand the use of ICTs, such as the Internet, social media, and mobile devices, for health information seeking and use. To do so, the researchers will investigate the health information behaviors of ICT users in the University at Albany community via a mobile diary study. Diary studies are a common methodological approach in human-computer interaction (HCI) research and allow participants to report their behaviors and experiences in situ, or in natural everyday situations. The proposed study will employ a unique, custom-developed, mobile web-based diary as the primary participant tool of data collection. The study’s key research question asks: What is the ICT-related health information behavior of University at Albany community members? In particular, researchers are interested in (1) whether participants use the Internet, mobile devices and/or social media to seek health information for themselves, (2) whether they use the Internet, mobile devices and/or social media to seek health information for someone else, and (3) whether they have privacy concerns regarding use of ICTs to seek health information, among other health information behaviors and variables as enabled via the available data. The proposed study would provide valuable feedback and insights regarding the use of ICTs for health information seeking and use which could be used to improve health information resources and their user experiences.

Dima Kassab & Luis Ibanez

Analysis of an Open Source Community using gameful design

Assembly Hall, 3:00pm
The goal of this proposal is to analyze the ITK community in order to identify game elements that have an impact on the interactions between different participants in the community. ITK is an open source software library for medical image analysis. A developer typically joins the ITK community to learn how to use the library to solve technical issues related to a medical imaging software application. Engaging these developers and supporting their learning are the main objectives of this community, and a challenge that many other open source communities face. The proposed analysis will use gameful design theory, content analysis and discourse analysis to gain a better understanding of the interactions and learning as they happen in the community. Based on this analysis, suggestions will be made to improve the effectiveness of the game elements to produce a more engaging learning environment.
Gameful design is the practice of using game mechanics and elements in non-game contexts. Some current applications have shown the successful implementation of gameful design, such as Foursquare, Wikipedia and Stack Overflow. Different game elements can be used to gamify a system, such as points, rewards, reputation systems, levels and badges, among others.
ITK has been selected as the object of this study due to the familiarity and proximity that the authors have with its community, and their access to the online venues where communication and interaction happen in the community. Like many other open source communities, ITK has grown organically and not always in a seamless manner. As a result, the community does not always realize its full potential for community growth and efficiency of interactions.
Using content analysis and discourse analysis, the authors propose to analyze different venues of interaction among participants in the ITK community. Most of the daily interactions in the community happen online. The venues of interaction that will be used here as data sources for analysis include mailing lists, code reviews, git repositories, wiki pages and bug trackers. The analysis seeks to identify game elements that may have an impact on learning and engagement of the community members. Based on this analysis, recommendations will be made on how gameful design can be used to restructure ITK online communications and cultural rules to increase the effectiveness of interactions, engagement and learning.

Jami Cotler

Cyberbullying in online gaming environments; A look at the psychological risk factors for adults.

Campus Center 375, 3:00pm
The psychological impact of cyberbullying may be more profound than that of traditional bullying due to the negative comments, threats, and accusations that are often visible to a wide audience, are long-lasting, and can be viewed repeatedly by the victim and their peers, causing repeated victimization (Campbell, 2005; Strom and Strom, 2005; Rivituso, 2012). The negative impact of cyberbullying leads to feelings of frustration, anger, and sadness that are detrimental to the victim’s psychological well-being (Patchin & Hinduja, 2006). While K-12 students are the most researched groups regarding cyberbullying, researchers have found that cyberbullying is not limited to these groups and that it extends to adult populations on college campuses and in the workplace (Bond, Tuckey, & Dollard, 2010; Chapell et al., 2004; De Cuyper, Baillien, & De Witte, 2009; Keashly & Neuman, 2010; Lester, 2009; Privitera & Campbell, 2009; Cowie, Naylor, Smith, Rivers, & Pereira, 2002). However, scientific research on bullying and cyberbullying among older adolescents and adults is in its infancy, with less literature on the topic (Lester, 2009). Moreover, research in the area of cyberbullying in online multi-player video gaming is also in its infancy, with a focus on K-12 students.

Tino DeMarco

The Listing and Selling of Real Estate: an ethnography

Assembly Hall, 3:30pm
This ethnography attempts to learn how real estate owners navigate the listing and selling process for their own homes. The goal of the study is to view how information and communication technologies are chosen and employed, how owners value realtors and realtor services, and to witness how owners approach and make decisions in a complex environment involving new technology, multiple policies, and information.

Ning Sa

Investigating the Sources of Noise in Interactive Information Search

Campus Center 375, 3:30pm
During web search, users click and read documents they evaluate as relevant as well as those they evaluate as irrelevant. The irrelevant results that are clicked and read add noise to implicit feedback, which is the user information that can be obtained unobtrusively (Kelly, 2005). This paper investigates the sources and the indicators of this noise. In a lab experiment, each participant performs web searches on 4 pre-defined tasks. In an interview after the search session, participants are asked why they clicked and read each result and how they made their judgments. A pilot study was conducted, and some findings are reported.

Posters (Campus Center Ballroom, 11:30 - 1:30)

Stephen Hay

Detecting Relation Between Twitter and WethePeople E-Petition Using PSL

The WethePeople website provides a new way to petition the government to take action on a specific issue facing the country. Once a petition receives enough public support, the government will review it and issue an official response. This research project will use a specific data mining technique, the probabilistic soft logic (PSL) algorithm, to automatically detect tweets related to specific petitions on the WethePeople website.
We will target specific petitions and build a PSL model that will automatically retrieve related tweets and present them in a GUI that we will build. Doing so will allow us to establish a direct correlation between WethePeople petition activity and Twitter. Our result is unique and important because it provides a novel method for presenting “smart” information to online petition users. Further, our algorithm can be widely used to retrieve and present only relevant tweets to users who are mainly interested in retrieving related tweets.

Gloria Moran

Utilizing Bayesian Inferences to Compare the FDA’s MSM Blood Donor Screening Regulations against a High-Risk Behavior Approach

This study uses Bayesian inferences to compare the MSM (Men who have had Sex with Men) regulation with a High-Risk Behavior (HRB) approach in terms of their effectiveness for HIV and AIDS blood donor screening in NYS. The report’s HRB-defined behaviors were formulated through extensive research on individual behaviors that increase the HIV/AIDS transmission rate. This probabilistic model is constructed on quantitative ranges, permitting sensitivity analysis in determining the optimality of MSM and HRB screening regulations. Optimality is proposed by comparing effectiveness in screening out infection (= Pr(Deferred | Infected) / Pr(Accepted | Infected)) and the change in blood supply relative to accepted infected blood (= Pr(Accepted | Uninfected) / Pr(Accepted | Infected)).
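
The two ratios above can be sketched numerically as follows. This is an illustration under stated assumptions, not the study's model: it assumes every donor is either deferred or accepted, so Pr(Accepted | Infected) = 1 - Pr(Deferred | Infected), and every probability, prevalence figure, policy label, and function name below is hypothetical. The second function applies Bayes' theorem to obtain the share of accepted donations that are infected.

```python
def screening_ratios(p_deferred_given_infected, p_accepted_given_uninfected):
    """Return (effectiveness ratio, supply ratio) for one screening policy."""
    # Assumption: a donor is either deferred or accepted, nothing else.
    p_accepted_given_infected = 1.0 - p_deferred_given_infected
    effectiveness = p_deferred_given_infected / p_accepted_given_infected
    supply_ratio = p_accepted_given_uninfected / p_accepted_given_infected
    return effectiveness, supply_ratio

def p_infected_given_accepted(prevalence, p_deferred_given_infected,
                              p_accepted_given_uninfected):
    """Bayes' theorem: probability that an accepted donor is infected."""
    p_accepted_given_infected = 1.0 - p_deferred_given_infected
    numerator = prevalence * p_accepted_given_infected
    denominator = numerator + (1.0 - prevalence) * p_accepted_given_uninfected
    return numerator / denominator

# Hypothetical policies: (Pr(Deferred | Infected), Pr(Accepted | Uninfected)).
policies = {"policy_1": (0.90, 0.95), "policy_2": (0.95, 0.97)}
results = {name: screening_ratios(*params) for name, params in policies.items()}

# Hypothetical prevalence of 0.1% among prospective donors.
posterior = p_infected_given_accepted(0.001, 0.90, 0.95)
```

Repeating the calculation over a range of input probabilities, rather than single values, is one way to carry out the kind of sensitivity analysis the abstract describes.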
The FDA’s current zero-risk approach to blood donation screening regulations requires the indefinite deferral of MSM after 1974. Low-risk men, often in healthy long-term monogamous relationships, are routinely denied under this regulation. The HRB approach to blood donor screening would replace the MSM regulation while still screening out the high-risk MSM along with non-MSM also at high risk of transmission. This study aims for a non-discriminatory policy that has equal if not increased effectiveness in screening out donors at high risk for HIV/AIDS infection.
The relative strength of the “gift-relationship” (voluntary non-remunerated blood donations) is assessed for both the MSM and HRB regulatory approaches. The blood donor gift-relationship communicates social solidarity in building altruism and a sense of community. This paper suggests that the MSM regulatory policy therefore stigmatizes, alienates and discriminates against homosexual men in American society by denying access to this community. The gift-relationship is often linked to national self-sufficiency in blood and plasma products. In addition, this relationship closely relates to the perpetual issue of non-compliance and donor-recipient trust.
Lastly, the study provides an in-depth stakeholder analysis accounting for blood donor NGOs, for-profit plasma industries, the FDA, the CDC, recipient representative groups, key legislative officials and gay rights activists. Social, economic, ethical and political considerations assist in revealing the obstacles and pathways to revising the FDA’s MSM blood donor regulations. These are addressed in the recommendations for bridging the gap between the policy models and policy implementation.

Ersin Dincelli

*Diffusion of Ideas and Homophily: How Does Conformity Influence Formation of Social Movements in Online Social Networks?

Online social networks (OSNs) have become important means of communication and sources of influence on the masses in the last decade. This study aims to investigate the influence and contagion that take place in OSNs during events in which actors can choose a side by favoring a particular opinion, such as protests, social movements or elections. Network analysis can be used to explain this phenomenon, as it makes it possible to understand the structure and behavior of such networks. In particular, the study examines how people react to such events based on their beliefs and ideas as well as their particular network neighbors, and how they align their behaviors with their network neighbors.

Smita Sharma

*Predicting Response Rate of Survey Questions Using Support Vector Machine Algorithm

A travel survey (or travel diary or travel behavior inventory) is a survey of individual travel behavior. One of the most important travel data sets is the National Household Travel Survey (NHTS). The NHTS provides information to assist transportation planners and others who need comprehensive data on travel and transportation patterns in the United States. However, there are many local travel surveys done in cities, university campuses and local public transit areas to extract more focused information about residents.
As part of this project, I looked at the Naïve Bayesian classification of questions based on the response rate of questions in a travel survey. I also used the Sequential Minimal Optimization Regression (SMOReg) model to predict the response rates of questions based on associated attributes. To apply the classification algorithm and to generate a regression model, I used Weka software developed by the University of Waikato in New Zealand.
One of the main objectives of this project was to explore the software Weka to find out its possible applications for travel data. Both the regression model and the classification generated by Weka were found to be fairly good as per root mean square error, F-measure and confusion matrix.
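
For readers unfamiliar with the evaluation measures named above, the following sketch computes a binary confusion matrix, the F-measure, and the root mean square error by hand. It does not use Weka and does not reproduce the project's results; the class labels, response rates, and function names are made up for illustration.

```python
import math

def confusion_counts(actual, predicted, positive):
    """Return (tp, fp, fn, tn) for a binary problem with one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp, fp, fn, tn

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (the F1 score)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical classification of questions by response-rate class.
actual_class = ["high", "high", "low", "low", "high", "low"]
predicted_class = ["high", "low", "low", "low", "high", "high"]
tp, fp, fn, tn = confusion_counts(actual_class, predicted_class, "high")
f1 = f_measure(tp, fp, fn)

# Hypothetical observed vs. predicted response rates for three questions.
error = rmse([0.9, 0.8, 0.5], [0.8, 0.8, 0.7])
```

A lower RMSE indicates predictions closer to the observed response rates, while an F-measure closer to 1 indicates a better balance of precision and recall for the chosen class.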

Christopher Kotfila

*IPython Notebook: Bridging the Programmer-Researcher divide to create modern tools for data analysis and reproducible computational science

Problem: When working on teams with computationally intensive research methods such as data mining or large scale text analysis, researchers and programmers can feel like they're speaking two completely different languages. Programmers are usually tasked with obtaining, scrubbing, integrating and presenting data, while scientists are concerned with exploring, modeling and interpreting that data. These two areas of concern can often be difficult to bridge, leading to slow turnaround time on analysis, incomplete understanding of the problem, and frustration on both sides of the divide. With the validity of a dataset, and a researcher's reputation, hanging in the balance, it is vital that both programmers and researchers can clearly and cleanly communicate their needs to each other.
How IPython Notebook Helps: IPython provides a natural interface that allows research team programmers to deliver custom agile analysis tools to researchers with a minimal amount of overhead. The IPython Notebook workflow allows for tight iteration and refinement of analysis tools. This builds confidence between researchers and programmers, helping to ensure that they are operating with a shared understanding. Finally, the combined output of programmer and researcher, when working with the IPython Notebook, produces reproducible artifacts that can be easily distributed in online supplemental materials, providing real, practical solutions to the computational reproducibility crisis.

Carson Tao

Information Science in System Dynamics: A Review of the ISDC Bibliography

The Information Science and Information Special Interest Group (iSIG) wishes to understand its domain within the context of System Dynamics and how to advance its chosen field. To this end, the System Dynamics Society’s (SDS) bibliographic database is examined for titles and keywords relating to information science within SD. The Web of Science, a popular online publication database, is also examined for references to encourage future SD works. The majority of articles in the SDS database come from two sources: The System Dynamics Review and the ISDC conference programs. There is little overlap between the SDS database and the Web of Science, as the latter does not generally index conference materials. This lack of overlap may limit the visibility of the Society’s database to those individuals who already know of its existence rather than persons new to the field. There are also gaps within the Society’s current database that limit its effectiveness when searching for items in the iSIG domain, as well as other areas of interest to the SD community. Changes to the submission system process are recommended to capture more metadata and abstracts to increase the value of the database to the public.