Large Language Model Reasoning Failures