Posted: August 6th, 2022


See the attached document (same as the previous one) and provide one paragraph of feedback based on the Proposal Review Guide.

Currently, I work as a marketing researcher for the Stockton Shelter for the Homeless in California. The shelter is expanding its services to further assist the local community by transitioning into a homeless navigation center. A homeless navigation center differs from a traditional homeless shelter in three ways: it is classified as “low-barrier,” its end goal is to transition clients into permanent housing, and, most importantly, clients must go through a referral process to enter the program. This means that the center’s outreach team must select a limited number of individuals to refer; in the Stockton center’s case, 180. Those who come to the center without a referral will, unfortunately, be turned away, as the center cannot feasibly assist the entire homeless community at once. My coworkers and I therefore face the problem of determining which individuals best qualify for the program and should be referred, based on their likelihood of successfully exiting into permanent housing.

This problem can be solved through predictive modeling that estimates which clients are most likely to successfully exit the program. Because I have a clearly defined categorical target variable (whether or not a client successfully exits into permanent housing), I would use supervised segmentation. To do this, I can use decision tree induction to determine which attributes are most important in predicting who will succeed in the program and, in turn, identify which specific individuals are most likely to succeed. This is valuable because the tree can estimate each individual’s probability of succeeding in the program from the segment (leaf) into which their attribute values place them. Once I have these estimates, my coworkers and I can use them to determine which 180 individuals to refer. The same model and process can be reused to select each subsequent group of referrals, adding value to the Stockton Navigation Center by minimizing the guesswork and error involved in choosing whom to refer and thus setting the program up for long-term success. The program’s success is particularly important because it determines how much government funding the center will receive each year to continue assisting the housing insecure.
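To make the probability idea concrete, here is a minimal Python sketch under stated assumptions. The tree structure (a split on employment, then on length of homelessness), the function names `leaf_of`, `leaf_probabilities`, and `select_referrals`, and the toy records are all hypothetical; a real tree would be learned from historical program outcomes rather than hard-coded. Each leaf's probability of success is the share of past clients in that leaf who exited into permanent housing, smoothed with the Laplace correction (n + 1) / (n + m + 2), and candidates are then ranked by that estimate so that the top k (180 for the Stockton center) can be referred.

```python
from collections import defaultdict


def leaf_of(client):
    # Hypothetical, hard-coded tree: split on employment, then on length of
    # homelessness. A real tree would be induced from historical data.
    if client["employed"]:
        return "employed"
    return "unemployed/recent" if client["recently_homeless"] else "unemployed/chronic"


def leaf_probabilities(history):
    """Estimate P(success) per leaf with the Laplace correction (n + 1) / (n + m + 2)."""
    counts = defaultdict(lambda: [0, 0])  # leaf -> [successes n, non-successes m]
    for client, succeeded in history:
        counts[leaf_of(client)][0 if succeeded else 1] += 1
    return {leaf: (n + 1) / (n + m + 2) for leaf, (n, m) in counts.items()}


def select_referrals(candidates, probs, k=180):
    """Rank candidates by estimated P(success) and keep the top k."""
    # A leaf with no training data falls back to 0.5, the Laplace estimate when n = m = 0.
    ranked = sorted(candidates, key=lambda c: probs.get(leaf_of(c), 0.5), reverse=True)
    return [c["id"] for c in ranked[:k]]


# Toy, made-up records: (attributes, exited into permanent housing?)
history = [
    ({"employed": True, "recently_homeless": False}, True),
    ({"employed": True, "recently_homeless": True}, True),
    ({"employed": False, "recently_homeless": True}, True),
    ({"employed": False, "recently_homeless": True}, False),
    ({"employed": False, "recently_homeless": False}, False),
]
candidates = [
    {"id": "a", "employed": False, "recently_homeless": False},
    {"id": "b", "employed": True, "recently_homeless": False},
    {"id": "c", "employed": False, "recently_homeless": True},
]
probs = leaf_probabilities(history)
print(select_referrals(candidates, probs, k=2))  # ['b', 'c']
```

Ranking rather than thresholding suits a fixed referral capacity: the center needs the best 180 candidates, not every candidate above some probability cutoff.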

Five attributes that will inform my predictive model are the individual’s employment status, their highest level of education completed, the length of their homelessness, their history of substance abuse, and their risk of being harmed by others. Employment status is useful for the classification model because it separates those who are currently employed (part-time or full-time) from those who are not. This is essential, as our previous research indicates that those who are ‘successful’ in similar programs commonly have some form of employment. Education level is important because it separates those with a lower education level (middle school or less) from those with a higher education level (high school or higher); our previous research on similar programs indicates that those with a high school education or higher are more likely to succeed in the program and achieve permanent housing. The length of homelessness identifies whether an individual is recently or chronically homeless, which matters because previous research indicates that those who succeed in similar programs are typically newly homeless. An individual’s history of substance abuse is important because it helps determine what risks they carry and what additional services (e.g., support groups, rehab) they will need to obtain and maintain permanent housing; results from similar programs indicate that those without a history of substance abuse typically perform better in navigation centers. Lastly, an individual’s risk of being harmed by others is important because the center aims to help individuals reach safety. Our current research indicates that those who are facing domestic violence or fleeing someone who wants to hurt them are in particular need of the center’s services and are at high risk of becoming chronically homeless. If an individual becomes chronically homeless, it may be more difficult to help them transition into permanent housing in the future.
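The review guide asks whether attributes are defined precisely and what values they can take. One way to pin that down is to map each survey response onto the binary splits described above. In this minimal Python sketch, the field names, the `SurveyResponse` and `to_features` names, the education scale, and the 12-month cutoff separating recent from chronic homelessness are all assumptions; the proposal does not specify them.

```python
from dataclasses import dataclass

# Assumed ordinal scale; the proposal only distinguishes
# "middle school or less" from "high school or higher".
EDUCATION_LEVELS = ["none", "elementary", "middle_school", "high_school", "some_college", "college"]

# Assumption: the proposal does not define the recent/chronic cutoff.
CHRONIC_MONTHS = 12


@dataclass
class SurveyResponse:
    employment: str                # "full_time", "part_time", or "none"
    education: str                 # one of EDUCATION_LEVELS
    months_homeless: int
    substance_abuse_history: bool
    at_risk_of_harm: bool          # e.g., fleeing domestic violence


def to_features(r: SurveyResponse) -> dict:
    """Convert one survey response into the five binary attributes."""
    return {
        "employed": r.employment in ("full_time", "part_time"),
        "high_school_or_higher": EDUCATION_LEVELS.index(r.education)
        >= EDUCATION_LEVELS.index("high_school"),
        "recently_homeless": r.months_homeless < CHRONIC_MONTHS,
        "substance_abuse_history": r.substance_abuse_history,
        "at_risk_of_harm": r.at_risk_of_harm,
    }


print(to_features(SurveyResponse("part_time", "high_school", 3, False, True)))
```

Writing the mapping down this explicitly also exposes the open questions the review guide raises, such as where exactly "recently" ends and "chronically" begins.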

The data for these attributes can be obtained by having our outreach team survey current homeless shelter attendees. The results would then be used to build the classification tree, which would show which attributes supply the highest information gain and which groups of individuals are most likely to succeed in the program. Using this model, my colleagues and I would be able to select specific individuals to refer based on their likelihood of success.
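Information gain measures how much splitting on one attribute reduces the entropy of the success/failure labels: IG = H(parent) minus the size-weighted sum of H(child), where H = -Σ p log2 p over the label proportions. The Python sketch below computes it for each candidate attribute; the attribute names and the four toy records are hypothetical and only show the mechanics, not real survey results.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Parent entropy minus the size-weighted entropy of the child segments."""
    n = len(labels)
    children = {}
    for row, label in zip(rows, labels):
        children.setdefault(row[attribute], []).append(label)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children.values())
    return entropy(labels) - weighted


# Made-up records; labels say whether the client exited into permanent housing.
rows = [
    {"employed": True, "substance_abuse_history": True},
    {"employed": True, "substance_abuse_history": False},
    {"employed": False, "substance_abuse_history": True},
    {"employed": False, "substance_abuse_history": False},
]
labels = [True, True, False, False]

for attr in ("employed", "substance_abuse_history"):
    print(attr, information_gain(rows, labels, attr))
# employed 1.0                  (perfectly separates the outcomes)
# substance_abuse_history 0.0   (tells us nothing in this toy data)
```

A tree-induction algorithm applies this calculation recursively, choosing the highest-gain attribute at each node.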

Appendix A. Proposal Review Guide
Effective data analytic thinking should allow you to assess potential data mining projects systematically. The material in this book should give you the necessary background to assess proposed data mining projects, and to uncover potential flaws in proposals. This skill can be applied both as a self-assessment for your own proposals and as an aid in evaluating proposals from internal data science teams or external consultants.
What follows contains a set of questions that one should have in mind when considering a data mining project. The questions are framed by the data mining process discussed in detail in Chapter 2, and used as a conceptual framework throughout the book. After reading this book, you should be able to apply these conceptually to a new business problem. The list that follows is not meant to be exhaustive (in general, the book isn’t meant to be exhaustive). However, the list contains a selection of some of the most important questions to ask.
Throughout the book we have concentrated on data science projects where the focus is to mine some regularities, patterns, or models from the data. The proposal review guide reflects this. There may be data science projects in an organization where these regularities are not so explicitly defined. For example, many data visualization projects initially do not have crisply defined objectives for modeling. Nevertheless, the data mining process can help to structure data-analytic thinking about such projects; they simply resemble unsupervised data mining more than supervised data mining.

Business and Data Understanding
What exactly is the business problem to be solved?
Is the data science solution formulated appropriately to solve this business problem? NB: sometimes we have to make judicious
What business entity does an instance/example correspond to?
Is the problem a supervised or unsupervised problem?
If supervised,

Is a target variable defined?
If so, is it defined precisely?
Think about the values it can take.

Are the attributes defined precisely?
Think about the values they can take.
For supervised problems: will modeling this target variable actually improve the stated business problem? An important
subproblem? If the latter, is the rest of the business problem addressed?
Does framing the problem in terms of expected value help to structure the subtasks that need to be solved?
If unsupervised, is there an “exploratory data analysis” path well defined? (That is, where is the analysis going?)

Data Preparation
Will it be practical to get values for attributes and create feature vectors, and put them into a single table?
If not, is an alternative data format defined clearly and precisely? Is this taken into account in the later stages of the project? (Many
of the later methods/techniques assume the dataset is in feature vector format.)
If the modeling will be supervised, is the target variable well defined? Is it clear how to get values for the target variable (for training
and testing) and put them into the table?
How exactly will the values for the target variable be acquired? Are there any costs involved? If so, are the costs taken into account
in the proposal?
Are the data being drawn from a population similar to that to which the model will be applied? If there are discrepancies, are any
selection biases noted clearly? Is there a plan for how to compensate for them?

Modeling
Is the choice of model appropriate for the choice of target variable?
Classification, class probability estimation, ranking, regression, clustering, etc.
Does the model/modeling technique meet the other requirements of the task?
Generalization performance, comprehensibility, speed of learning, speed of application, amount of data required, type of data,
missing values?
Is the choice of modeling technique compatible with prior knowledge of the problem (e.g., is a linear model being proposed for a
definitely nonlinear problem)?
Should various models be tried and compared (in evaluation)?
For clustering, is there a similarity metric defined? Does it make sense for the business problem?

Evaluation and Deployment
Is there a plan for domain-knowledge validation?
Will domain experts or stakeholders want to vet the model before deployment? If so, will the model be in a form they can
Is the evaluation setup and metric appropriate for the business task? Recall the original formulation.
Are business costs and benefits taken into account?
For classification, how is a classification threshold chosen?
Are probability estimates used directly?
Is ranking more appropriate (e.g., for a fixed budget)?
For regression, how will you evaluate the quality of numeric predictions? Why is this the right way in the context of the problem?
Does the evaluation use holdout data?
Cross-validation is one technique.
Against what baselines will the results be compared?
Why do these make sense in the context of the actual problem to be solved?
Is there a plan to evaluate the baseline methods objectively as well?
For clustering, how will the clustering be understood?
Will deployment as planned actually (best) address the stated business problem?
If the project expense has to be justified to stakeholders, what is the plan to measure the final (deployed) business impact?
